Streaming Algorithms for Geometric Problems

1
Streaming Algorithms for Geometric Problems
  • Piotr Indyk
  • MIT

2
Streaming: The Model
  • Single pass over the data: $e_1, e_2, \ldots, e_n$
  • Bounded storage
  • Fast processing time per element

3
Recap: Norm estimation
  • Norm estimation:
  • Stream elements $(i, b)$, $i = 1 \ldots m$
  • Interpretation: $x_i = x_i + b$
  • Want to $(1 \pm \epsilon)$-approximate $\|x\|_p$
  • Note: the frequency moment $F_p$ is a special case
  • Algorithms with $(\log n + 1/\epsilon)^{O(1)}$ space:
  • $p = 2$: [Alon-Matias-Szegedy'96]
  • $p \in (0, 2]$: [Indyk'00]
  • Idea: maintain $Ax$
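
The "maintain $Ax$" idea can be illustrated for $p = 2$ with a small sketch in the style of [Alon-Matias-Szegedy'96]. This is a simplified illustration, not the construction from the talk: the class name, the polynomial sign hash, and the plain averaging of rows (instead of a median of means) are all assumptions made for brevity.

```python
import random
import statistics


class AMSSketch:
    """Linear sketch Ax for estimating ||x||_2^2 under stream updates (i, b)."""

    PRIME = (1 << 61) - 1  # modulus of the sign hash (an illustrative choice)

    def __init__(self, rows=64, seed=0):
        rng = random.Random(seed)
        # One random cubic polynomial per row; its low bit serves as a
        # (nearly unbiased) 4-wise independent random sign.
        self.coeffs = [[rng.randrange(self.PRIME) for _ in range(4)]
                       for _ in range(rows)]
        self.counters = [0] * rows  # this vector plays the role of Ax

    def _sign(self, row, i):
        a0, a1, a2, a3 = self.coeffs[row]
        h = (a0 + i * (a1 + i * (a2 + i * a3))) % self.PRIME
        return 1 if h & 1 else -1

    def update(self, i, b):
        """Apply x_i = x_i + b; a negative b acts as a deletion."""
        for row in range(len(self.counters)):
            self.counters[row] += self._sign(row, i) * b

    def estimate_sq_norm(self):
        # Each squared counter is (approximately) an unbiased estimate of
        # ||x||_2^2; averaging the rows reduces the variance.
        return statistics.mean(z * z for z in self.counters)
```

Because the sketch is linear in $x$, an update followed by its inverse leaves the counters unchanged, which is what makes deletions possible in this model.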

4
Recap: Other algorithms
  • Compute the number of distinct elements $F_0$:
  • Exactly: $\Omega(n)$ bits of space
  • $(1 \pm \epsilon)$-approximation: $O(1/\epsilon^2 \log n)$ bits [Flajolet-Martin, JCSS'85], ...
  • Compute the median:
  • Exactly: $\Omega(n)$
  • $(50\% \pm \epsilon)$-approximation: $O(1/\epsilon \; \mathrm{polylog}\, n)$ [Paterson-Munro, TCS'80], ...

5
Geometric Data Stream Algorithms as Data Structures
  • Data structures that support:
  • Insert(p) to P
  • Possibly Delete(p) from P
  • Compute-Some-Property(P)
  • Use space that is sub-linear in $|P|$

6
Dichotomy
  • Insertions + Deletions: randomized linear mappings [FM'85, AMS'96]; randomized
  • Insertions only: merge and reduce [MP'80], core-sets; deterministic

7
Insertions and Deletions

8
Minimum Weight Bi-chromatic Matching
  • Estimate the cost of MWBM

9
Minimum Weight Matching
  • Estimate the cost of MWM

10
Minimum Spanning Tree
  • Estimate the cost of MST

11
Facility Location
  • Goal: choose a set F of facilities to minimize the sum of the distances from each point to its nearest facility, plus the number of facilities times f
  • Again, report the cost
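
Written as a formula, the objective described above reads as follows. The name $\mathrm{cost}(F)$ and the norm notation are introduced here for illustration; the slide states the objective only in words.

```latex
\[
  \mathrm{cost}(F) \;=\; \sum_{p \in P} \min_{q \in F} \lVert p - q \rVert \;+\; f \cdot \lvert F \rvert
\]
```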

12
Approach
  • Assume $P \subset \{1 \ldots \Delta\}^2$
  • Reduce to vector problems
  • Impose square grids $G_0, \ldots, G_k$, with side lengths $2^0, 2^1, \ldots, 2^k$, shifted at random.
  • For each square cell c in $G_i$, let $n_P(c)$ be the number of points from P in c.
  • The algorithms will maintain certain statistics over $n_P(\cdot)$, which will allow them to approximately solve the problems
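
A minimal sketch of the grid construction, assuming all levels share one random shift so that the grids are nested. The class and method names are hypothetical, and the counts are stored exactly here; a streaming algorithm would keep only small sketches of each count vector.

```python
import random
from collections import Counter


class ShiftedGrids:
    """Cell counts n_P(c) on grids G_0..G_k that share one random shift."""

    def __init__(self, delta, seed=0):
        rng = random.Random(seed)
        self.k = max(1, (delta - 1).bit_length())  # ensures 2^k >= delta
        side = 1 << self.k
        self.shift = (rng.randrange(side), rng.randrange(side))
        self.counts = [Counter() for _ in range(self.k + 1)]

    def cell(self, p, i):
        """Index of the cell of G_i (side length 2^i) that contains p."""
        return ((p[0] + self.shift[0]) >> i, (p[1] + self.shift[1]) >> i)

    def update(self, p, b=1):
        """Insert(p) with b=+1, Delete(p) with b=-1."""
        for i, level in enumerate(self.counts):
            c = self.cell(p, i)
            level[c] += b
            if level[c] == 0:
                del level[c]  # keep only the non-empty cells
```

Each update changes one entry of $n_P$ per level by $\pm 1$, so the count vectors fit the $(i, b)$ update model from the norm-estimation recap.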

13
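Estimators
  • MST: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) > 0]$
  • MWM: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$
  • MWBM: $\sum_i 2^i \sum_{c \in G_i} |n_G(c) - n_B(c)|$
  • Fac. Loc.: $\sum_i 2^i \sum_{c \in G_i} \min[n_P(c), T_i]$
  • K-median: $\sum_i 2^i \sum_{c \in G_i - B(Q, 2^i)} n_P(c)$
  • (const. factor)
  • Maintain the number of non-zero entries in $n_P$ [FM'85]
  • Maintain $L_1$ difference [I'00]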
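
The first three estimators can be evaluated directly from exact cell counts, as in the sketch below. The function names and the fixed `shift` argument are assumptions for illustration; the streaming algorithms would replace the exact counts with distinct-element and $L_1$ sketches, and the values approximate the true costs only up to the factors given in the talk.

```python
from collections import Counter


def cell_counts(points, i, shift=(0, 0)):
    """n_P(c) for the grid G_i with side length 2^i."""
    return Counter(((x + shift[0]) >> i, (y + shift[1]) >> i)
                   for x, y in points)


def mst_estimator(points, k, shift=(0, 0)):
    # sum_i 2^i * (number of cells of G_i with n_P(c) > 0)
    return sum(2**i * len(cell_counts(points, i, shift))
               for i in range(k + 1))


def mwm_estimator(points, k, shift=(0, 0)):
    # sum_i 2^i * (number of cells of G_i with n_P(c) odd)
    return sum(2**i * sum(n % 2 for n in cell_counts(points, i, shift).values())
               for i in range(k + 1))


def mwbm_estimator(green, blue, k, shift=(0, 0)):
    # sum_i 2^i * sum_c |n_G(c) - n_B(c)|, an L1 difference per level
    total = 0
    for i in range(k + 1):
        g = cell_counts(green, i, shift)
        b = cell_counts(blue, i, shift)
        total += 2**i * sum(abs(g[c] - b[c]) for c in set(g) | set(b))
    return total
```

For example, with the two points (0, 0) and (1, 0), `k = 1` and no shift, the MST estimator is 4 and the MWM estimator is 2.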

14
Results [Indyk'04]
  • $\rightarrow 1+\epsilon$ [Frahling-Indyk-Sohler'05]
  • Space: $(\log \Delta + \log n)^{O(1)}$
  • (follows from [Charikar'02, Indyk-Thaper'03])

15
Probabilistic embeddings into HSTs
[Figure: tree T]
  • Known [Bartal'96, Charikar-Chekuri-Goel-Guha-Plotkin'98]:
  • $\|p - q\| \le D_{tree}(p, q)$
  • $E[D_{tree}(p, q)] \le \|p - q\| \cdot O(\log \Delta)$

16
MST
  • $E[\mathrm{Cost}(\text{MST in } T)] = O(\log \Delta) \cdot \mathrm{Cost}(\mathrm{MST})$
  • $\mathrm{Cost}(\text{MST in } T) \approx \mathrm{Cost}(T)$
  • How to compute Cost(T)?
  • Sum over all levels i, of the number of nodes at level i, times $2^i$
  • Node c exists iff $n_i(c) > 0$

17
Matching
  • Algorithm:
  • Match what you can at the current level
  • Odd leftovers wait for the next level
  • Repeat
  • Optimal on the HST
  • Cost: $\sum_i 2^i \sum_{c \in G_i} [n_P(c) \text{ is odd}]$

18
Insertions-only

19
k-median/k-center
  • k is given
  • Goal: choose k medians/centers to minimize:
  • k-median: the sum of the distances
  • k-center: the max distance
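
The two objectives differ only in how the per-point distances are aggregated. A small sketch, assuming Euclidean distance and hypothetical function names:

```python
import math


def kmedian_cost(points, centers):
    """Sum over points of the distance to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def kcenter_cost(points, centers):
    """Maximum over points of the distance to the nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)
```

With points (0, 0), (0, 1), (3, 4), (10, 0) and centers (0, 0), (10, 0), the nearest-center distances are 0, 1, 5, 0, so the k-median cost is 6 and the k-center cost is 5.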

20
Metric clustering problems
  • k-center [Charikar-Chekuri-Feder-Motwani'97]
  • k-median [Guha-Mishra-Motwani-O'Callaghan'00, Meyerson'01, Charikar-O'Callaghan-Panigrahy'03]
  • Bounds:
  • Poly(k, log n) space
  • O(1)-approximation

21
Geometric Problems
  • Diameter, Minimum Enclosing Ball [Agarwal-Har-Peled'01, Feigenbaum-Kannan-Zhang'02 (Algorithmica), Hershberger-Suri'04]
  • k-center [AHP'01]
  • k-median [Har-Peled-Mazumdar'04]
  • Range searching via $\epsilon$-approximations:
  • [Suri-Toth-Zhou'04]
  • [Bagchi-Chaudhary-Eppstein-Goodrich'04]

22
Dominant Approach: Merge and Reduce
  • Main ideas:
  • Design an (off-line) algorithm that computes a
    sketch of the input
  • Small size
  • Sufficient to solve the problem
  • A sketch of sketches is a sketch

23
Tree Computation
[Figure: binary tree of sketches built over the stream elements p1, ..., p16]

24
Algorithm
  • Space: (sketch size) $\cdot \log n$
  • Time: sketch computation time
  • Question: where do sketches come from?
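
The tree computation can be carried out online by keeping at most one sketch per tree level, in the manner of a binary counter. The sketch below is a generic illustration under that assumption; `MergeReduce`, `interval` and `merge_intervals` are hypothetical names, and the toy interval sketch stands in for a real core-set or summary.

```python
class MergeReduce:
    """Keeps at most one sketch per level, like the digits of a binary counter."""

    def __init__(self, sketch_of, merge):
        self.sketch_of = sketch_of  # element -> sketch
        self.merge = merge          # (sketch, sketch) -> reduced sketch
        self.levels = []            # levels[j] summarizes 2^j elements, or None

    def insert(self, element):
        carry = self.sketch_of(element)
        j = 0
        # Merge upward while the level is occupied (a carry in the counter).
        while j < len(self.levels) and self.levels[j] is not None:
            carry = self.merge(self.levels[j], carry)
            self.levels[j] = None
            j += 1
        if j == len(self.levels):
            self.levels.append(None)
        self.levels[j] = carry

    def result(self):
        """Merge the stored sketches into one sketch of the whole stream."""
        out = None
        for s in self.levels:
            if s is not None:
                out = s if out is None else self.merge(s, out)
        return out


# Toy sketch: the (min, max) interval of a stream of numbers.
def interval(v):
    return (v, v)


def merge_intervals(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))
```

After n insertions the structure holds at most about $\log_2 n + 1$ sketches, which matches the "(sketch size) $\cdot \log n$" space bound. The toy interval sketch is exact; with an approximate sketch, each level of merging may add error.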

25
Idea I: solution = sketch
  • Consider k-median
  • [Guha-Mishra-Motwani-O'Callaghan'00]: an approximate k-median of approximate weighted k-medians is an approximate k-median
  • Result:
  • Constant depth tree
  • Space: $k n^{\epsilon}$, $\epsilon > 0$
  • O(1)-approximation
  • Works for any metric space

[Figure: example with k = 3]

26
Use the solution, ctd.
  • $\epsilon$-approximations: find a subset $S \subset P$ such that, for any rectangle/halfspace/etc. R,
  • $|R \cap S| / |S| = |R \cap P| / |P| \pm \epsilon$
  • [Matousek]: an approximation of a union of approximations is an approximation
  • [BCEG'04]: convert it into a streaming algorithm, applications
  • $\approx 1/\epsilon^2$ space
  • [STZ'04]: better/optimal bounds for rectangles and halfspaces

27
Idea 2: Core-Sets [AHP'01]
  • Assume we want to minimize $C_P(o)$
  • $S \subset P$ is an $\epsilon$-core-set for P if, for any o and any set T,
  • $C_{P \cup T}(o) < (1+\epsilon) \, C_{S \cup T}(o)$
  • Note: this must hold for all o, not just the optimal one
  • The latter is called a weak core-set
  • [Badoiu-Har-Peled-Indyk'02]

28
Example: Core-set for MEB
  • Compute extremal points
  • Choose densely spaced directions $v_1, \ldots, v_k$
  • I.e., for any u there is $v_i$ such that $u \cdot v_i \ge \|u\|_2 / (1+\epsilon)$
  • For each direction maintain the extremal point
  • $k = O(1/\epsilon)^{(d-1)/2}$ suffice
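
A two-dimensional illustration of the extremal-point idea, used here to approximate the diameter. The class name and the choice of k evenly spaced directions over a half-circle are assumptions for this sketch; higher dimensions need a denser direction set, as in the bound above.

```python
import math
from itertools import combinations


class DirectionalExtremes:
    """Keeps the extreme points of a 2D stream along k fixed directions."""

    def __init__(self, k):
        # Directions spaced pi/k apart; a direction and its negation give
        # the same pair of extreme points, so a half-circle is enough.
        self.dirs = [(math.cos(math.pi * j / k), math.sin(math.pi * j / k))
                     for j in range(k)]
        self.lo = [None] * k  # (projection, point) with the smallest projection
        self.hi = [None] * k  # (projection, point) with the largest projection

    def insert(self, p):
        for j, (dx, dy) in enumerate(self.dirs):
            t = p[0] * dx + p[1] * dy
            if self.lo[j] is None or t < self.lo[j][0]:
                self.lo[j] = (t, p)
            if self.hi[j] is None or t > self.hi[j][0]:
                self.hi[j] = (t, p)

    def coreset(self):
        """At most 2k stored points (points must be hashable tuples)."""
        return {e[1] for e in self.lo + self.hi if e is not None}

    def diameter(self):
        pts = self.coreset()
        return max((math.dist(a, b) for a, b in combinations(pts, 2)),
                   default=0.0)
```

Some direction lies within angle $\pi/(2k)$ of the line through the true diameter pair, so the reported value is at least $\cos(\pi/(2k))$ times the true diameter and never exceeds it.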

29
Stream Algorithms via Core-sets
  • Diameter/MEB/width: $O(1/\epsilon)^{(d-1)/2} \log n$ space [AHPV'01,'02,'04]
  • k-center: $O(k/\epsilon^d) \log n$ [HP'01]
  • k-median: $O(k/\epsilon^d) \log n$ [HPM'04]
  • $(k + d + \log n + 1/\epsilon)^{O(1)}$ [Chen'06]
  • Faster algorithms and other results: [Chan'04, Suri-Hershberger'03]

30
Core-sets + Randomized Maps
  • $(1+\epsilon)$-approximate k-median under insertions and deletions
  • [Frahling-Sohler'05]

31
Insertions and Deletions, ctd.

32
Histograms
  • View x as a function $x: [1 \ldots n] \rightarrow [1 \ldots M]$
  • Approximate it using a piecewise constant function h, with B pieces (buckets)
  • Problem can be formulated in 2D as well (buckets become rectangular tiles)

33
Results: 1D
  • [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss, STOC'02]:
  • Maintains h with B pieces such that
  • $\|x - h\|_2 \le (1+\epsilon) \|x - h_{OPT}\|_2$
  • Under increments/decrements of x
  • Space: poly(B, $1/\epsilon$, log n)
  • Time: poly(B, $1/\epsilon$, log n)

34
Results: 2D
  • [Thaper-Guha-Indyk-Koudas, SIGMOD'02]:
  • Maintains h with B log(nM) tiles such that
  • $\|x - h\|_2 \le (1+\epsilon) \|x - h_{OPT}\|_2$
  • Under increments/decrements of x
  • Space/Update time: poly(B, $1/\epsilon$, log n)
  • Histogram reconstruction time: poly(B, $1/\epsilon$, n)
  • [Muthukrishnan-Strauss, FSTTCS'03]:
  • Maintains h with 4B tiles
  • Time: poly(B, $1/\epsilon$, log(nM))

35
General Approach
  • Maintain sketches Ax of x
  • This allows us to estimate the error of any given h, via $\|x - h\| \approx \|Ax - Ah\|$
  • Construct h:
  • Enumeration
  • Greedy
  • Dynamic Programming
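
To illustrate the dynamic-programming option, here is the classical offline DP for the best B-bucket histogram under squared $L_2$ error. It is a sketch under stated assumptions: it works on the exact vector x, whereas the streaming algorithms estimate errors from the sketch Ax, and the function name and return format are hypothetical.

```python
def best_histogram(x, B):
    """Optimal piecewise-constant fit of x with exactly B buckets (B <= len(x)).

    Returns (squared L2 error, list of (start, end, value)) with half-open
    index ranges [start, end).
    """
    n = len(x)
    s, s2 = [0.0], [0.0]  # prefix sums of x and of x squared
    for v in x:
        s.append(s[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(a, b):
        """Squared error of replacing x[a:b] by its mean."""
        total = s[b] - s[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    err = [[inf] * (n + 1) for _ in range(B + 1)]
    cut = [[0] * (n + 1) for _ in range(B + 1)]
    err[0][0] = 0.0
    for k in range(1, B + 1):
        for j in range(1, n + 1):
            for a in range(k - 1, j):  # last bucket covers x[a:j]
                cand = err[k - 1][a] + sse(a, j)
                if cand < err[k][j]:
                    err[k][j], cut[k][j] = cand, a

    buckets, j = [], n
    for k in range(B, 0, -1):  # walk the stored cuts backwards
        a = cut[k][j]
        buckets.append((a, j, (s[j] - s[a]) / (j - a)))
        j = a
    buckets.reverse()
    return err[B][n], buckets
```

For `x = [1, 1, 1, 5, 5, 5]` and `B = 2` the fit is exact, with buckets covering indices 0-2 and 3-5; with `B = 1` the squared error is 24.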

36
Conclusions
  • Algorithms for geometric data streams
  • Insertions-only: merge and reduce
  • Insertions and deletions: randomized linear embeddings

37
Open Problems
  • High dimensions
  • Diameter:
  • $2^{1/2}$-approx, $O(d^2 n^{1/2})$ space, follows from [Goel-Indyk-Varadarajan, SODA'01]
  • c-approx, $O(d n^{1/(c^2 - 1)})$ [Indyk, SODA'03]
  • Conjecture: $\approx 2^{1/2}$-approx, $O(d \; \mathrm{polylog}\, n)$ space
  • Min-width cylinder: 18-approx, O(d) space [Chan'04]
  • Other problems?

38
Open Problems
  • Range queries
  • General lower bounds? (Not just for $\epsilon$-approximations)
  • $\Omega(1/\epsilon^2)$-bit bound for general queries follows from LB for dot product [Indyk-Woodruff, FOCS'03], and is tight (for randomized algorithms)
  • What about, e.g., half-space queries? $O(1/\epsilon^{4/3})$ is known [STZ'04]
  • Other problems [STZ'04]

39
Open Problems
  • Matchings, Facility Location, etc.
  • Replace $\log \Delta$ by O(1) or even $1+\epsilon$
  • Possible for MST [Frahling-Indyk-Sohler'??]
  • Related to computing bi-chromatic matching [Agarwal-Varadarajan'04]
  • Min-sum clustering?