Proximity algorithms for nearly-doubling spaces

Provided by: nyu48
Learn more at: https://cs.nyu.edu
1
Proximity algorithms for nearly-doubling spaces
  • Lee-Ad Gottlieb
  • Robert Krauthgamer
  • Weizmann Institute

2
Proximity problems
  • In an arbitrary metric space, some proximity
    problems are hard
  • For example, the nearest neighbor search problem
    requires Θ(n) time
  • The doubling dimension parameterizes the bad
    case

[Figure: query point q; all labeled distances are 1]
3
Doubling Dimension
  • Definition: the ball B(x,r) is the set of all
    points within distance r of x.
  • The doubling constant λ(M) of a metric M is the
    minimum value λ > 0 such that every ball can be
    covered by λ balls of half the radius
  • First used by [Ass-83], algorithmically by
    [Cla-97].
  • The doubling dimension is dim(M) = log₂ λ(M)
    [GKL-03]
  • A metric is doubling if its doubling dimension is
    constant
  • Packing property of doubling spaces: a set with
    diameter D and minimum inter-point distance a
    contains at most (D/a)^O(log λ) points

Here λ ≤ 7.
4
Applications
  • In the past few years, many algorithmic tasks
    have been analyzed via the doubling dimension
  • For example, approximate nearest neighbor search
    can be executed in time λ^O(1) log n
  • Some other algorithms analyzed via the doubling
    dimension:
  • Nearest neighbor search: [KL-04, BKL-06, CG-06]
  • Clustering: [Tal-04, ABS-08, FM-10]
  • Spanner construction: [GGN-06, CG-06, DPP-06,
    GR-08]
  • Routing: [KSW-04, Sil-05, AGGM-06, KRXY-07,
    KRX-08]
  • Travelling Salesperson: [Tal-04]
  • Machine learning: [BLL-09, GKK-10]
  • Message: this is an active line of research

5
Problem
  • Most algorithms developed for doubling spaces are
    not robust
  • Algorithmic guarantees don't hold for
    nearly-doubling spaces
  • If a small fraction of the working set possesses
    high doubling dimension, algorithmic performance
    degrades.
  • This problem motivates the following key task:
  • Given an n-point set S and target dimension d
  • Remove from S the fewest points so that the
    remaining set has doubling dimension at most d

6
Two paradigms
  • How can removing a few bad points help? Two
    models:
  • 1. Ignore the bad points
  • Outlier detection.
  • [GHPT-05] cluster based on similarity, seeking a
    large subset with low intrinsic dimension.
  • Algorithms with slack: throw the bad points into
    the slack
  • [KRXY-07] gave a routing algorithm with
    guarantees for most of the input points.
  • [FM-10] gave a kinetic clustering algorithm for
    most of the input points.
  • [GKK-10] gave a machine learning algorithm in
    which a small subset doesn't interfere with
    learning

7
Two paradigms
  • How can removing a few bad points help? Two
    models:
  • 2. Tailor a different algorithm for the bad
    points
  • Example: spanner construction. A spanner is an
    edge subset of the full graph
  • Good points: low doubling dimension, so a sparse
    spanner with nice properties (low stretch and
    degree)
  • Bad points: take the full graph
  • If the number of bad points is O(n^0.5), we have
    a spanner with O(n) edges
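
As a quick check of the edge count for the bad points, suppose their number is b ≤ C·n^0.5 for some constant C (b and C are notation introduced here, not taken from the slides). The full graph on the bad points then has

```latex
\[
  \binom{b}{2} \;=\; \frac{b(b-1)}{2} \;\le\; \frac{C^{2} n}{2} \;=\; O(n)
\]
```

edges, so taking the full graph on the bad points alone stays within the O(n) edge bound.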

8
Results
  • Recall our key problem:
  • Given an n-point set S and target dimension d
  • Remove from S the fewest points so that the
    remaining set has doubling dimension at most d
  • This problem is NP-hard
  • Even determining the doubling dimension of a
    point set exactly is NP-hard!
  • Proof on the next slide
  • But the doubling dimension can be approximated
    within a constant factor
  • Our contribution: a bicriteria approximation
    algorithm
  • In time 2^O(d) n^3, we remove a number of points
    arbitrarily close to optimal, while achieving
    doubling dimension 4d + O(1)
  • We can also achieve near-linear runtime, at the
    cost of slightly higher dimension

9
Warm up
  • Lemma: it is NP-hard to determine the doubling
    dimension of a set S
  • Reduction from vertex cover with degree bounded
    by n^½.
  • The size of any vertex cover is at least n^½.
  • Construction: a set S of n points corresponding
    to the vertex set V.
  • Let d(u,v) = ½ if the corresponding vertices are
    connected by an edge
  • Let d(u,v) = 1 if the corresponding vertices
    aren't connected
  • Analysis:
  • Any subset of S found in a ball of radius ½ has
    at most n^½ points (the degree of the original
    graph)
  • S is a ball of radius 1. The minimum covering of
    all of S with balls of radius ½ is equal to the
    minimum vertex cover of V.
  • Note: the reduction preserves hardness of
    approximation
  • Corollary: it is NP-hard to determine if removing
    k points from S can leave a set with doubling
    dimension d.
  • So our problem is hard as well.

[Figure: example distances labeled ½, ½ and 1]
10
Bicriteria algorithm
  • Recall that the doubling constant λ (of a metric
    M) is the minimum value λ > 0 such that every
    r-radius ball can be covered by λ balls of half
    the radius
  • Define the related notion of the density
    constant m as the minimum value m > 0 such that
    every r-radius ball contains at most m points at
    mutual interpoint distance at least r/2
  • Nice property: the density constant can only
    decrease under the removal of points, unlike the
    doubling constant.
  • We can show that
  • √m(S) ≤ λ(S) ≤ m(S)
  • It's NP-hard to compute the density constant
    (ratio-preserving reduction from independent set)

[Figure: an example with λ = 2, m = 3]
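
To make the two constants concrete, here is a minimal brute-force sketch for tiny point sets. It is not the authors' method and runs in exponential time. It assumes closed balls centered at points of S, covering balls also centered at points of S, and Euclidean points given as tuples. The names `ball`, `radii`, `density_constant` and `doubling_constant` are hypothetical.

```python
from itertools import combinations
import math

def ball(S, p, r, dist):
    """All points of S within distance r of p (closed ball)."""
    return [x for x in S if dist(p, x) <= r]

def radii(S, p, dist):
    """Candidate radii for center p. The ball only changes at these values,
    and between them a larger r only makes both counts smaller or equal."""
    return sorted({dist(p, x) for x in S if dist(p, x) > 0})

def density_constant(S, dist=math.dist):
    """Largest number of points in some r-ball at mutual distance >= r/2."""
    best = 1 if S else 0
    for p in S:
        for r in radii(S, p, dist):
            B = ball(S, p, r, dist)
            # Try subset sizes from large to small, stopping at the first hit.
            for k in range(len(B), best, -1):
                if any(all(dist(a, b) >= r / 2 for a, b in combinations(T, 2))
                       for T in combinations(B, k)):
                    best = k
                    break
    return best

def doubling_constant(S, dist=math.dist):
    """Largest, over all r-balls, of the fewest r/2-balls needed to cover it."""
    worst = 1 if S else 0
    for p in S:
        for r in radii(S, p, dist):
            B = ball(S, p, r, dist)
            # Smallest k such that some k centers from S cover all of B.
            for k in range(1, len(B) + 1):
                if any(all(any(dist(c, x) <= r / 2 for c in C) for x in B)
                       for C in combinations(S, k)):
                    worst = max(worst, k)
                    break
    return worst
```

For example, for the three points 0, 2 and 3 on a line, written as `[(0,), (2,), (3,)]`, this sketch gives a doubling constant of 2 and a density constant of 3, consistent with √m(S) ≤ λ(S) ≤ m(S). Removing any one point does not raise the density constant.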
11
Bicriteria algorithm
  • We will give a bicriteria algorithm for the
    density constant. Problem statement:
  • Given an n-point set S and target density
    constant m
  • Remove from S the fewest points so that the
    remaining set has density constant at most m
  • A bicriteria algorithm for the density constant
    is itself a bicriteria algorithm for the doubling
    constant, within a quadratic factor

12
Witness set
  • Given a set S, a subset S' of S is a witness set
    for the density constant if
  • All points of S' are at interpoint distance at
    least r/2 (and lie in a common r-radius ball, as
    in the definition of the density constant)
  • Note that S' is a concise proof that the density
    constant of S is at least |S'|
  • Theorem: fix a value m < m(S). A witness set of S
    of size at least √m can be found in time 2^O(m)
    n^3
  • Proof outline:
  • For each point p and radius r define the r-ball
    of p.
  • Greedily cover all points in the r-ball with
    disjoint balls of radius r/2.
  • Then cover all points in each r/2 ball with
    disjoint balls of radius r/4.
  • Since there exists in S a witness set of size
    m(S), there exists a p and r so that
  • either there are √m(S) r/2 balls, and these form
    a witness set, or
  • one r/2 ball covers √m(S) r/4 balls, and these
    form a witness set.
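
The proof outline above can be sketched as a two-level greedy cover. This is a minimal illustration under stated assumptions, not the authors' implementation, and it makes no claim about the running time or size guarantee in the theorem. It assumes Euclidean points given as tuples, tries only radii equal to distances out of each center, and uses the greedily chosen ball centers as the witness set. The names `greedy_centers` and `find_witness` are hypothetical.

```python
import math

def greedy_centers(points, radius, dist):
    """Greedy cover: every point ends up within `radius` of some center,
    and the chosen centers are pairwise more than `radius` apart."""
    centers = []
    for x in points:
        if all(dist(x, c) > radius for c in centers):
            centers.append(x)
    return centers

def find_witness(S, dist=math.dist):
    """Return (witness, center, radius) for the largest witness set found.
    The witness lies in the ball of that center and radius, and its points
    are pairwise at distance at least radius / 2."""
    best = (S[:1], S[0], 0.0) if S else ([], None, 0.0)
    for p in S:
        for r in sorted({dist(p, x) for x in S} - {0.0}):
            r_ball = [x for x in S if dist(p, x) <= r]
            # Level 1: centers of the r/2 cover of the r-ball of p.
            level1 = greedy_centers(r_ball, r / 2, dist)
            if len(level1) > len(best[0]):
                best = (level1, p, r)
            # Level 2: centers of the r/4 cover inside each r/2 ball.
            for c in level1:
                half_ball = [x for x in r_ball if dist(c, x) <= r / 2]
                level2 = greedy_centers(half_ball, r / 4, dist)
                if len(level2) > len(best[0]):
                    best = (level2, c, r / 2)
    return best
```

On the five-point cross `[(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]` the sketch returns all five points as a witness set for the unit ball around the origin.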

13
Bicriteria algorithm
  • Recall our problem:
  • Given an n-point set S and target density
    constant m
  • Remove from S the fewest points so that the
    remaining set has density constant at most m
  • Our bicriteria solution:
  • Let k be the true answer (the minimum number of
    points that must be removed).
  • We remove at most k·c/(c-1) points and the
    remaining set has density constant at most
    c^2 m^2

14
Bicriteria algorithm
  • Algorithm:
  • Run the subroutine to identify a witness set of
    size at least cm
  • Remove it
  • Repeat until no such witness set is found
  • Analysis:
  • The density constant of the resulting set is not
    greater than c^2 m^2
  • since we terminated without finding a witness set
    of size at least cm
  • Every time a witness set of size w ≥ cm is
    removed by our algorithm, the optimal algorithm
    must remove at least w-m points
  • or else the true solution would have density
    constant greater than m
  • It follows that our algorithm removes at most
    k·w/(w-m) ≤ k·c/(c-1) points
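
The loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The witness subroutine is a simplified two-level greedy cover that tries only radii equal to distances out of each center, so the c^2 m^2 guarantee is not claimed for this code. Points are assumed to be distinct tuples, with m ≥ 1 and c > 1. The names `greedy_centers`, `find_witness` and `remove_witness_sets` are hypothetical.

```python
import math

def greedy_centers(points, radius, dist):
    """Pick centers greedily; centers are pairwise more than `radius` apart."""
    centers = []
    for x in points:
        if all(dist(x, c) > radius for c in centers):
            centers.append(x)
    return centers

def find_witness(S, dist):
    """Largest witness set found by a two-level greedy cover (simplified)."""
    best = S[:1]
    for p in S:
        for r in sorted({dist(p, x) for x in S} - {0.0}):
            r_ball = [x for x in S if dist(p, x) <= r]
            level1 = greedy_centers(r_ball, r / 2, dist)
            candidates = [level1]
            for c in level1:
                half_ball = [x for x in r_ball if dist(c, x) <= r / 2]
                candidates.append(greedy_centers(half_ball, r / 4, dist))
            for cand in candidates:
                if len(cand) > len(best):
                    best = cand
    return best

def remove_witness_sets(S, m, c, dist=math.dist):
    """Repeatedly remove witness sets of size at least c*m.
    Returns (remaining points, removed points)."""
    remaining, removed = list(S), []
    while remaining:
        witness = find_witness(remaining, dist)
        if len(witness) < c * m:
            break  # no large witness set found: stop
        removed.extend(witness)
        remaining = [x for x in remaining if x not in witness]
    return remaining, removed
```

For instance, with a five-point cross around the origin plus the two far points (100, 0) and (101, 0), and m = 2, c = 2, the sketch removes the five cross points in one round and keeps the two far points.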

15
Conclusion
  • We conclude that there exists a bicriteria
    algorithm for the density constant
  • We remove at most k·c/(c-1) points and the
    remaining set has density constant at most
    c^2 m^2
  • It follows that there exists a bicriteria
    algorithm for the doubling constant
  • We remove at most k·c/(c-1) points and the
    remaining set has doubling constant at most
    c^4 λ^4