Title: Lower Bound Techniques for Data Structures
1 Lower Bound Techniques for Data Structures
Mihai Pătraşcu
- Committee
- Erik Demaine (advisor)
- Piotr Indyk
- Mikkel Thorup
2 Data Structures
- I don't study stacks, queues and binary search trees!
- I do study data structure problems (a.k.a. Abstract Data Types):
  - predecessor search: preprocess T = n numbers; pred(q) = max{y ∈ T | y < q}
  - partial-sums problem: maintain an array A[n] under update(i, Δ): A[i] ← Δ; sum(i): return A[0] + ... + A[i]
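As a point of reference, both problem definitions can be written as executable specifications. This is a minimal sketch, assuming plain Python lists: `pred` uses the standard `bisect` module on a sorted copy of T (the "preprocessing"), and the hypothetical `NaivePartialSums` class costs O(1) per update and O(n) per sum. Neither is the structure analyzed in the talk; they only pin down the semantics.

```python
import bisect


def pred(T_sorted, q):
    """Predecessor search: max{y in T | y < q}, or None if no such y exists.

    T_sorted is the preprocessed (sorted) version of T.
    """
    i = bisect.bisect_left(T_sorted, q)  # i = number of elements strictly below q
    return T_sorted[i - 1] if i > 0 else None


class NaivePartialSums:
    """Partial-sums baseline: update in O(1), sum(i) in O(n)."""

    def __init__(self, n):
        self.A = [0] * n

    def update(self, i, delta):
        self.A[i] = delta  # A[i] <- delta

    def sum(self, i):
        return sum(self.A[: i + 1])  # builtin sum: A[0] + ... + A[i]
```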
3 Motivation?
- packet forwarding
4 Binary Search Trees: Upper Bound
- Binary search trees solve predecessor search ⇒ complexity of predecessor = O(lg n) per operation (my work)
- Augmented binary search trees solve partial sums ⇒ complexity of partial sums = O(lg n) per operation (my work)
5 What kind of lower bound?
Lower bounds you can trust.™
- Model of computation = real computers:
  - memory words of w > lg n bits (pointers = words)
  - random access to memory
  - any operation on CPU registers (arithmetic, bitwise)
- Just prove a lower bound on memory accesses
[Figure: a black-box CPU accessing an array Mem[1..S] of w-bit words]
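To make the cost model concrete, here is a minimal sketch in which memory is an array of w-bit words and the only thing charged is a cell probe; computation on "CPU registers" is free. The class name `CellProbeMemory` is hypothetical, cells are 0-indexed rather than Mem[1..S], and binary search serves only as a familiar O(lg n)-probe predecessor query to count against.

```python
class CellProbeMemory:
    """Cells of w-bit words; every read is one charged cell probe."""

    def __init__(self, words, w=64):
        self.w = w
        self.cells = [x & ((1 << w) - 1) for x in words]  # truncate to w bits
        self.probes = 0

    def read(self, addr):
        self.probes += 1  # the only cost in the cell-probe model
        return self.cells[addr]


def predecessor(mem, n, q):
    """Binary search over sorted cells 0..n-1 for max{y < q}."""
    lo, hi = 0, n  # invariant: cells [0, lo) are < q, cells [hi, n) are >= q
    while lo < hi:
        mid = (lo + hi) // 2
        if mem.read(mid) < q:
            lo = mid + 1
        else:
            hi = mid
    return mem.read(lo - 1) if lo > 0 else None
```

For n = 1024 sorted words the search makes at most 12 probes: at most 11 inside the loop plus one to fetch the answer.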
6 Why Data Structures?
I want to understand computation.
- Other settings:
  - streaming: many L.B.; not very computational (mostly storage / information theory)
  - space-bounded (P vs. L): a few L.B., O(n √lg n); unnatural questions
  - algebraic: some L.B.; cool, but not real computing
  - depth-3 circuits with mod-6 gates: ??
- The gospel:
  - data structures: some L.B.; understand some nontrivial computational phenomena
  - efficient algorithms: circuit L.B. not forthcoming
  - hard optimization: NP-completeness; L.B. one per STOC/FOCS?
Weak as some of the lower bounds may be, it's the area that has gotten farthest towards understanding computation.
8 History
- Yao, FOCS'78
- Ajtai '88 -- predecessor (static)
- Fredman, Saks '89 -- partial sums, union-find (dynamic)
Omitted: bounds for succinct data structures.
- Observations:
  - huge influence
  - 2nd papers
  - result wrong (better upper bound known)
  - no journal version; many claims without proof
9 History
- Yao, FOCS'78
- Ajtai '88 -- predecessor (static)
  - Bing Xiao, Stanford '92
  - Miltersen STOC'94
  - Miltersen, Nisan, Safra, Wigderson STOC'95
  - Beame, Fich STOC'99
  - Sen ICALP'01
- (1+ε)-nearest neighbor: Chakrabarti, Chazelle, Gum, Lvov STOC'99; Chakrabarti, Regev FOCS'04
- Fredman, Saks '89 -- partial sums, union-find (dynamic)
  - Ben-Amram, Galil FOCS'91
  - Miltersen, Subramanian, Vitter, Tamassia '93
  - Husfeldt, Rauhe, Skyum '96
  - Fredman, Henzinger '98: planar connectivity
  - Husfeldt, Rauhe ICALP'98: nondeterminism
  - Alstrup, Husfeldt, Rauhe FOCS'98: marked ancestor
  - Alstrup, Husfeldt, Rauhe SODA'01: dynamic 2D NN
  - Alstrup, Ben-Amram, Rauhe STOC'99: union-find
Omitted: bounds for succinct data structures.
- Richness lower bounds: Borodin, Ostrovsky, Rabani STOC'99 (p.m.); Barkol, Rabani STOC'00 (rand. NN); Jayram, Khot, Kumar, Rabani STOC'03 (p.m.); Liu '04 (det. ANN)
10 Three Main Ideas
1. Epochs
2. Asym. Communication, Rectangles
3. Round Elimination
12 Review: Epoch Lower Bounds
[Figure: marked ancestor problem on a time axis; update = mark/unmark node (cost t_u), query = marked ancestors? (cost t_q); going back in time from the query, epochs of r^0, r^1, r^2, r^3 updates write t_u·w, t_u·w·r, t_u·w·r^2, t_u·w·r^3 bits]
- epoch j: r^j updates
- epochs 0, ..., j-1 write O(t_u·w·r^{j-1}) bits
- pick r ≫ t_u·w ⇒ most updates from epoch j are not known outside epoch j ⇒ a random query needs to read a cell from epoch j
⇒ t_q = Ω(lg n / lg r) = Ω(lg n / lg(t_u·w))
⇒ max{t_q, t_u} = Ω(lg n / lglg n)
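The epoch bookkeeping can be sketched as follows, assuming the query comes right after the last update and epochs are carved out backwards in time. The helpers `epochs` and `bits_written_by_younger` are illustrative names; the second checks the geometric-sum step "epochs 0, ..., j-1 write O(t_u·w·r^{j-1}) bits" numerically.

```python
def epochs(num_updates, r):
    """Split updates 0..num_updates-1 (the query follows the last one) into
    epochs going back in time: epoch j is the block of r**j updates right
    before epochs 0..j-1. Returns half-open index ranges (start, end)."""
    result = []
    end, j = num_updates, 0
    while end > 0:
        start = max(0, end - r ** j)  # the oldest epoch may be truncated
        result.append((start, end))
        end, j = start, j + 1
    return result


def bits_written_by_younger(j, r, t_u, w):
    """Upper bound on bits written by epochs 0..j-1: each update writes
    at most t_u cells of w bits each."""
    return t_u * w * sum(r ** i for i in range(j))
```

Since the sum is geometric, it stays below 2·t_u·w·r^{j-1} for r ≥ 2, a small fraction of the r^j updates in epoch j once r ≫ t_u·w; there are about log_r n epochs in total.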
13 Review: Epoch Lower Bounds
See also: Fredman JACM'81; Fredman JACM'82; Yao SICOMP'85; Fredman, Saks STOC'89; Ben-Amram, Galil FOCS'91; Hampapuram, Fredman FOCS'93; Chazelle STOC'95; Husfeldt, Rauhe, Skyum SWAT'96; Husfeldt, Rauhe ICALP'98; Alstrup, Husfeldt, Rauhe FOCS'98
- Big challenges [Miltersen '99]:
  - prove some ω(lg n / lglg n) bound; candidate: Ω(lg n) for the partial sums problem
  - prove ω(lg n) in the bit-probe model
14 Our contribution
- P., Demaine SODA'04: Ω(lg n) for partial sums
- P., Demaine STOC'04: Ω(lg n) for dynamic trees, etc.
  - very simple proof, not based on epochs
- P., Tarnita ICALP'05: Ω(lg n) via an epoch argument!! ⇒ Ω(lg²n / lg²lg n) in the bit-probe model (Best Student Paper)
15 Ω(lg n) via Epoch Arguments?
[Figure: epoch j on the time axis]
- Old information about epoch j outside j:
  - cells written by epochs 0, ..., j-1: O(t_u·r^{j-1})
16 Ω(lg n) via Epoch Arguments?
[Figure: epoch j on the time axis]
- New information about epoch j outside j:
  - cells read by epochs 0, ..., j-1 from epoch j: still O(t_u·r^{j-1}) in the worst case
Foil worst-case by randomizing epoch construction!
17 Ω(lg n) via Epoch Arguments?
Cells read by epochs 0, ..., j-1 from epoch j: O((t_u / #epochs)·r^{j-1}) on average ⇒ max{t_u, t_q} = Ω(lg n)
Foil worst-case by randomizing epoch construction!
18 The Very Simple Ω(lg n) Proof
19 Partial sums: maintain an array A[n] under update(i, Δ): A[i] ← Δ; sum(i): return A[0] + ... + A[i]
- The hard instance:
  - π = random permutation
  - for t = 1 to n: query sum(π(t)); Δ_t = rand(); update(π(t), Δ_t)
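The hard instance can be generated directly from its description. This sketch assumes 0-indexed array positions and that rand() draws a uniform w-bit word; `hard_instance` and the tuple encoding of operations are hypothetical choices for illustration.

```python
import random


def hard_instance(n, w=64, seed=None):
    """Operation sequence of the hard instance (0-indexed): pi is a random
    permutation; at time t query sum(pi[t]), then set A[pi[t]] to a fresh
    random w-bit value Delta_t."""
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)
    ops = []
    for t in range(n):
        ops.append(("sum", pi[t]))
        delta = rng.getrandbits(w)  # Delta_t = rand(), assumed uniform over w bits
        ops.append(("update", pi[t], delta))
    return ops
```

Each index is updated exactly once, and each query asks about the index that is about to be updated.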
20 [Figure: time axis]
21 [Figure: updates Δ1, Δ5, Δ3, Δ7, Δ2, Δ8, Δ4, Δ9 along a time axis]
How much information needs to be transferred?
At least Δ5, Δ5+Δ7, Δ5+Δ7+Δ8 ⇒ i.e. at least 3 words (random values are incompressible)
22 The general principle
[Figure: two adjacent periods of k operations each]
- Lower bound = number of down arrows
- How many down arrows? (in expectation)
  (2k-1) · Pr[·] · Pr[·] = (2k-1) · ½ · ½ = Ω(k)
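One way to count "down arrows" is sketched below, as an assumption about the figure: sort the array indices updated in the earlier period together with the indices queried in the later period, and count adjacent pairs where an earlier update is immediately followed by a later query; each such pair forces a distinct random partial sum across the boundary. `down_arrows` and `average_down_arrows` are hypothetical names. For a random split the exact expectation is (2k-1)·(k/2k)·(k/(2k-1)) = k/2, matching the slide's (2k-1)·½·½ up to lower-order terms.

```python
import random


def down_arrows(left_updates, right_queries):
    """Count adjacent pairs (in sorted index order) where an index updated in
    the earlier period is immediately followed by an index queried in the
    later period. Indices are assumed distinct."""
    tagged = sorted([(i, "update") for i in left_updates] +
                    [(i, "query") for i in right_queries])
    return sum(1 for (_, a), (_, b) in zip(tagged, tagged[1:])
               if a == "update" and b == "query")


def average_down_arrows(k, trials=2000, seed=0):
    """Monte Carlo estimate of E[down arrows] when 2k distinct indices are
    split at random into k earlier updates and k later queries."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        idx = rng.sample(range(10 * k), 2 * k)  # distinct indices in random order
        total += down_arrows(idx[:k], idx[k:])
    return total / trials
```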
23 Recap
Communication between periods of k items = Ω(k)
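24 Putting it all together
[Figure: a binary tree over the time axis; the root transfers Ω(n/2), its two children Ω(n/4) each, the four grandchildren Ω(n/8) each, and so on]
Each level sums to Ω(n) and there are lg n levels ⇒ Ω(n lg n) in total, i.e. Ω(lg n) per operation (each cell read is charged to the single node whose left half contains the cell's last write and whose right half contains the read).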
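The tree summation can be checked on an actual random permutation. This sketch applies the same down-arrow counting at every node of a balanced binary tree over time (left child = earlier updates, right child = later queries), under the assumptions of the hard instance; the function names are hypothetical. A node spanning s time steps has k = s/2 and contributes about s/4 in expectation, so each level contributes about n/4 and the total is about (n/4)·lg n.

```python
import random


def down_arrows(left_updates, right_queries):
    """Adjacent (earlier update, later query) pairs in sorted index order."""
    tagged = sorted([(i, 0) for i in left_updates] + [(i, 1) for i in right_queries])
    return sum(1 for (_, a), (_, b) in zip(tagged, tagged[1:]) if a == 0 and b == 1)


def total_transfer(pi, lo, hi):
    """Sum the down-arrow counts over a balanced binary tree on time steps
    [lo, hi). At time t the hard instance queries and then updates index pi[t],
    so the left child contributes its updates and the right child its queries."""
    if hi - lo < 2:
        return 0
    mid = (lo + hi) // 2
    here = down_arrows(pi[lo:mid], pi[mid:hi])
    return here + total_transfer(pi, lo, mid) + total_transfer(pi, mid, hi)


def random_permutation(n, seed=0):
    rng = random.Random(seed)
    pi = list(range(n))
    rng.shuffle(pi)
    return pi
```

For n = 1024 the expected total is (1024/4)·10 = 2560, i.e. about 2.5 per operation, growing with lg n.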
25 Q.E.D.
- Augmented binary search trees are optimal.
- First ?(lg n) for any dynamic data structure.
26 Three Main Ideas
1. Epochs
2. Asym. Communication, Rectangles
3. Round Elimination
27 Review: Communication Complexity
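[Figure: the query algorithm (Alice, holding query(a,b,c)) sends lg S bits, a cell address; the memory (Bob, holding the database ⇒ space S) replies with w bits, the cell contents; repeated for every probe]
Traditional communication complexity: total bits communicated ≥ X ⇒ t_q·(lg S + w) ≥ X ⇒ t_q = Ω(X/w). But wait! X ≤ CPU input = O(w).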
29 Review: Communication Complexity
Asymmetric communication complexity: either Alice sends A bits or Bob sends B bits ⇒ either t_q·lg S ≥ A or t_q·w ≥ B ⇒ t_q ≥ min{A / lg S, B / w}
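The difference between the two accountings can be seen with a small calculator. The numbers below are hypothetical (S = 2^32 cells, w = 64, A = 1024 bits, B = 2^20 bits), and the function names are illustrative; constants hidden in the Ω are dropped.

```python
import math


def symmetric_bound(X, S, w):
    """Each probe exchanges lg S + w bits, so t_q >= X / (lg S + w)."""
    return X / (math.log2(S) + w)


def asymmetric_bound(A, B, S, w):
    """If Alice (the query) must send A bits or Bob (the memory) must send B
    bits, then t_q * lg S >= A or t_q * w >= B, hence t_q >= min(A/lg S, B/w)."""
    return min(A / math.log2(S), B / w)
```

With X ≤ O(w), the symmetric bound drops below a single probe (64 / 96 here), while the asymmetric bound gives min(1024/32, 2^20/64) = 32 probes, because the query side pays only lg S bits per probe.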
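30 Richness Lower Bounds
- Prove: either Alice sends A bits or Bob sends B bits
- Assume Alice sends o(A), Bob sends o(B) ⇒ big monochromatic rectangle
- Show any big rectangle is bichromatic (standard idea in communication complexity)
[Figure: communication matrix, Alice's inputs × Bob's inputs; a rectangle with output 1 covering a 1/2^{o(A)} fraction of Alice's inputs and a 1/2^{o(B)} fraction of Bob's inputs]
Example: Alice → q ∈ {0,1}^d; Bob → S = n points in {0,1}^d. Goal: find argmin_{x∈S} ‖x−q‖₂.
[Barkol, Rabani]: A = Ω(d), B = Ω(n^{1-ε}) ⇒ t_q ≥ min{d / lg S, n^{1-ε} / w}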
31 Richness Lower Bounds
- Upper bound: either
  - exponential space, or
  - near-linear query time
What does this really mean? An optimal space lower bound for constant query time.
[Plot: query time t_q (1, Θ(d/lg n), n^{1-o(1)}) versus space S (Θ(n), 2^{Θ(d)}); lower bound S = 2^{Ω(d/t_q)}]
Also an optimal lower bound for decision trees.
32 Results
- Partial match -- database of n strings in {0,1}^d, query ∈ {0,1,*}^d:
  - Borodin, Ostrovsky, Rabani STOC'99; Jayram, Khot, Kumar, Rabani STOC'03: A = Ω(d / lg n)
  - P. FOCS'08: A = Ω(d) (simplify)
- Nearest neighbor on the hypercube (ℓ1, ℓ2):
  - deterministic γ-approximate [Liu '04]: A = Ω(d / γ²)
  - randomized exact [Barkol, Rabani STOC'00]: A = Ω(d)
  - randomized (1+ε)-approximate [Andoni, Indyk, P. FOCS'06]: A = Ω(ε^{-2} lg n) ⇒ Johnson-Lindenstrauss space is optimal!
- Approximate nearest neighbor in ℓ∞:
  - Andoni, Croitoru, P. FOCS'08: Indyk FOCS'98 is optimal!
33 Limits of Communication Approach
[Plot: query time t_q (1, Θ(d/lg n), Θ(d/lg d), n^{1-o(1)}) versus space S (Θ(n), 2^{Θ(d)}), with a region labeled "branching programs"]
Alice must send Ω(A) bits ⇒ t_q ≥ Ω(A / lg S)
  ⇒ no separation between S = O(n) and S = n^{O(1)}!
⇒ t_q ≥ Ω(A / lg(Sd/n))
  ⇒ separation of Θ(lg n / lglg n) between S = O(n) and S = n^{O(1)}!
34 Richness Gets You More
- CPU(s) → memory communication:
  - one query: lg S bits
  - k queries: lg (S choose k) = Θ(k lg(S/k)) bits
Direct Sum
Any richness lower bound "Alice must send A or Bob must send B" ⇒ "k×Alice must send k·A or k×Bob must send k·B"
⇒ t_q ≥ Ω(A / lg(S/k))
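The per-probe saving from the direct sum can be checked numerically: naming k cells out of S costs lg C(S, k) bits, which lies between k·lg(S/k) and k·lg(e·S/k). This sketch assumes Python 3.8+ for `math.comb`; the helper names are hypothetical and constant factors are dropped.

```python
import math


def lg_choose(S, k):
    """Bits needed to name a set of k cells out of S: lg C(S, k)."""
    return math.log2(math.comb(S, k))


def direct_sum_bound(A, S, k):
    """Per-query bound t_q >= A / lg(S/k) from the direct-sum argument."""
    return A / math.log2(S / k)
```

With k queries answered together, each probe is worth only about lg(S/k) bits instead of lg S, which is what lets space S = O(n) be separated from S = n^{O(1)}.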
38 Three Main Ideas
1. Epochs
2. Asym. Communication, Rectangles
3. Round Elimination
4. Range Queries
39 Open Hunting Season
- Nice trick, but Ω(lg n / lglg n) with O(n polylg n) space is not an impressive argument for the curse of dimensionality
- But space n^{1+o(1)} is hugely important in data structures ⇒ open hunting season for range queries etc.
2D range counting:
SELECT count(*) FROM employees WHERE salary < 70000 AND startdate < 1998
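The SQL query is a dominance (2-sided) range count: count points with x < X and y < Y. A minimal offline sketch, assuming all points and queries are known in advance, sweeps over x and keeps a Fenwick tree over rank-compressed y values. This is a standard offline technique, not the online data structure whose query time the lower bound concerns; `dominance_counts` is a hypothetical name.

```python
import bisect


def dominance_counts(points, queries):
    """For each query (X, Y), count points (x, y) with x < X and y < Y.

    Offline sweep: process queries by increasing X, insert points with x < X
    into a Fenwick tree indexed by the rank of y, then take a prefix count.
    """
    ys = sorted({y for _, y in points})
    tree = [0] * (len(ys) + 1)  # 1-based Fenwick tree over y ranks

    def add(pos):
        while pos <= len(ys):
            tree[pos] += 1
            pos += pos & -pos

    def prefix(pos):  # number of inserted points with y rank in 1..pos
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    pts = sorted(points)
    answers = [0] * len(queries)
    p = 0
    for j in sorted(range(len(queries)), key=lambda q: queries[q][0]):
        X, Y = queries[j]
        while p < len(pts) and pts[p][0] < X:
            add(bisect.bisect_left(ys, pts[p][1]) + 1)
            p += 1
        answers[j] = prefix(bisect.bisect_left(ys, Y))  # ranks of y values < Y
    return answers
```

For example, with employees given as (salary, startdate) pairs, the query (70000, 1998) counts exactly the rows selected by the SQL statement.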
40 Open Hunting Season
P. STOC'07: Ω(lg n / lglg n) with O(n polylg n) space. N.B. tight! 1st bound beyond the semigroup model; question from Fredman JACM'82, Chazelle FOCS'86
41 The Power of Reductions
2D stabbing: preprocess S = n rectangles; stab(x,y): is (x,y) inside some R ∈ S?
(routing ACLs, dispatching in some OO languages)
42 The Power of Reductions
[Figure: reducing 2D stabbing to 2D range counting; the four corners of each rectangle become points of weight +1 or -1]
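The ±1 corner trick can be sketched as follows, assuming half-open rectangles [x1, x2) × [y1, y2): opposite corners get weight +1, the other two get -1, and a weighted dominance count at (x, y) then equals the number of rectangles containing (x, y). The counting query is done by brute force here, and all function names are hypothetical.

```python
def corner_points(rects):
    """Each half-open rectangle [x1, x2) x [y1, y2) becomes four weighted
    points; the weighted dominance count at (x, y) is then the number of
    rectangles that contain (x, y)."""
    pts = []
    for x1, x2, y1, y2 in rects:
        pts += [(x1, y1, +1), (x2, y1, -1), (x1, y2, -1), (x2, y2, +1)]
    return pts


def weighted_dominance(pts, x, y):
    """2D range counting (brute force here): total weight with px <= x, py <= y."""
    return sum(wt for px, py, wt in pts if px <= x and py <= y)


def stab(pts, x, y):
    """Is (x, y) inside some rectangle? Yes iff the count is positive."""
    return weighted_dominance(pts, x, y) > 0
```

A point to the right of a rectangle sees both its +1 at (x1, y1) and its -1 at (x2, y1), so the rectangle cancels out; only rectangles that actually contain the point leave a net +1.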
43 The Power of Reductions
- 2D stabbing (as defined above)
- Reachability oracles in the butterfly graph: preprocess G = a subgraph of the butterfly; reachable(x,y): is there a path x → y?
45 The Power of Reductions
Lopsided Set Disjointness: Alice has a set S, Bob has a set T; are S and T disjoint?
Hint: S = one edge out of every node ⇒ n queries from the 1st to the last level; T = deleted edges; S disjoint from T ⇒ all queries answer yes
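For the reachability side, here is a minimal sketch assuming one common butterfly convention: 2^d rows, levels 0..d, and the edge leaving level l either keeps the row or flips bit l. Under this convention there is exactly one path from any input row x to any output row y, so a reachability query on a subgraph (the butterfly minus Bob's deleted edges T) only has to check that path. The function names are hypothetical, and this brute-force check is not a space-efficient oracle.

```python
def butterfly_path(x, y, d):
    """Edges of the unique path from row x at level 0 to row y at level d.

    Assumed convention: the edge leaving level l keeps the row or flips bit l,
    so the path flips exactly the bits where x and y differ.
    """
    edges, row = [], x
    for l in range(d):
        nxt = row ^ ((((x ^ y) >> l) & 1) << l)  # flip bit l iff x, y differ there
        edges.append(((l, row), (l + 1, nxt)))
        row = nxt
    return edges


def reachable(x, y, d, deleted):
    """reachable(x, y) in G = butterfly minus the edge set `deleted`."""
    return all(e not in deleted for e in butterfly_path(x, y, d))
```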
46 Reachability in the Butterfly??
Marked ancestor problem: update(node): (un)mark node; query(leaf): any marked ancestor?
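A naive baseline pins down the marked ancestor semantics. This sketch assumes the tree is given by parent pointers and that a node counts as its own ancestor; `MarkedAncestor` is a hypothetical name. Updates cost O(1) and a query walks to the root, so it costs the depth of the tree (O(lg n) for a balanced tree); the epoch lower bound reviewed earlier says max{t_q, t_u} = Ω(lg n / lglg n) for this problem.

```python
class MarkedAncestor:
    """Naive marked ancestor on a rooted tree given by parent pointers
    (parent[root] is None): update is O(1), query walks to the root."""

    def __init__(self, parent):
        self.parent = parent
        self.marked = [False] * len(parent)

    def update(self, node, mark=True):
        self.marked[node] = mark  # (un)mark node

    def query(self, leaf):
        """Any marked ancestor? (Here a node counts as its own ancestor.)"""
        v = leaf
        while v is not None:
            if self.marked[v]:
                return True
            v = self.parent[v]
        return False
```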
47 [Figure: reduction tree rooted at lopsided set disjointness (LSD) [P. FOCS'08], covering: reachability oracles in the butterfly; partial match; (1+ε)-ANN in ℓ1, ℓ2; NN in ℓ1, ℓ2; dyn. marked ancestor; 3-ANN in ℓ∞; 2D stabbing; worst-case union-find; dyn. trees, graphs; 4D reporting; 2D counting; dyn. 1D stabbing; partial sums; dyn. 2D reporting; dyn. NN in 2D]
48 Three Main Ideas
1. Epochs
2. Asym. Communication, Rectangles
3. Round Elimination
4. Range Queries
49 Packet Forwarding / Predecessor Search
- Preprocess n prefixes of w bits:
  ⇒ make a hash table H with all prefixes of prefixes
  ⇒ |H| = O(n·w), can be reduced to O(n)
- Given a w-bit IP, find the longest matching prefix:
  ⇒ binary search for the longest ℓ such that IP[0:ℓ] ∈ H
O(lg w): van Emde Boas FOCS'75; Waldvogel, Varghese, Turner, Plattner SIGCOMM'97; Degermark, Brodnik, Carlsson, Pink SIGCOMM'97; Afek, Bremler-Barr, Har-Peled SIGCOMM'99
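The binary search on prefix lengths can be sketched as below, assuming prefixes and IPs are bit strings. It relies on H being closed under taking prefixes, so "IP[0:ℓ] ∈ H" is monotone in ℓ; as an assumption of this sketch, each entry of H also stores the longest original prefix it extends, so the final lookup returns the actual longest match. The function names are hypothetical, and the table keeps the unreduced O(n·w) size.

```python
def build_table(prefixes):
    """H maps every prefix of every stored prefix (a bit string) to the longest
    stored prefix that is a prefix of it, or None. Size O(n*w) in this sketch."""
    stored = set(prefixes)
    H = {}
    for p in prefixes:
        for length in range(len(p) + 1):
            q = p[:length]
            if q not in H:
                H[q] = next((q[:m] for m in range(len(q), -1, -1) if q[:m] in stored), None)
    return H


def longest_match(H, ip):
    """Binary search for the longest l with ip[:l] in H: O(lg w) hash lookups.
    Valid because H is closed under prefixes, so membership is monotone in l."""
    lo, hi = 0, len(ip)  # invariant: ip[:lo] is in H ("" is, if H is non-empty)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if ip[:mid] in H:
            lo = mid
        else:
            hi = mid - 1
    return H.get(ip[:lo])
```

For prefixes {10, 1011, 0}, the IP 10100000 stops at ℓ = 3 ("101" ∈ H, "1010" ∉ H) and returns the stored best match "10".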
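50 Review: Round Elimination
[Figure: the query is split into hi and lo halves; it probes hash(hi) and gets back 0/1: 0 ⇒ continue searching for pred(hi); 1 ⇒ continue searching for pred(lo)]
[Figure: Alice holds k inputs 1, 2, ..., k; Bob ("I want to talk to Alice") holds an index i; Alice's first message has o(k) bits]
Message has negligible info about the typical i ⇒ can be eliminated for fixed i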
51 The Lemma
- Observe: can't work in the worst case!
- Traditional fix: introduce 2-sided error
- Think outside the box:
  - easy proof with a different error model [P., Thorup STOC'06]
52 The Model
- Alice and Bob receive inputs
- they may reject inputs
- if they accept, they start communicating and must produce a correct output
- The point: error probability ½ is trivial; reject probability 0.99999 is still hard
- "We regret to inform you that your input has not been accepted for communication. We receive a large number of inputs, many of them of high quality, and scheduling constraints unfortunately make it impossible to accept all of them."
53 The Proof
- Trie for Alice's input (x1, ..., xk)
  - leaves: message sent
  - node: set of msgs in its subtree
- Say msg size is m ≤ k/2
- |leaf| = 1, |root| ≤ 2^{k/2} ⇒ (∀) root-to-leaf path, ½ of the nodes have |node| ≥ ½·|parent|
- averaging over node-child pairs ⇒ (∃) node: ½ of its children have |child| ≥ ½·|node|
- thus (∃) msg M: ¼ of the children have M ∈ child
[Figure: trie branching on x1, x2, x3, with message sets such as {m1, m2, m3}, {m1, m2}, {m1} at its nodes]
Fix i, x1, ..., x_{i-1} ⇒ fixed message (eliminate); reject ¾ of the inputs (x_i)
54 Predecessor Search Timeline
- after van Emde Boas FOCS'75: O(lg w) has to be tight!
- Beame, Fich STOC'99: slightly better bound with O(n²) space ⇒ must improve the algorithm for O(n) space!
- P., Thorup STOC'06: tight Θ(lg w) for space O(n polylg n)!
Idea: consider multiple queries; prove round elimination under direct sum
55 Predecessor Search Timeline
[Figure: round elimination with several queries at once; each Bob says "I want to talk to Alice", and Alice's inputs are numbered 1, 2, ..., k]
56 Round Eliminated!
57 The End
Questions?
59 The Partial Sums Problem
Textbook solution: augmented binary search trees; running time O(lg n) per operation
Maintain an array A[n] under update(i, Δ): A[i] ← Δ; sum(i): return A[0] + ... + A[i]
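The O(lg n) textbook bound can be illustrated with a complete binary tree stored in an array, where every internal node stores the sum of its subtree (the augmentation). This is a sketch of the idea under the assumption of a fixed n, so there are no insertions or rebalancing as in a general augmented BST; `AugmentedTree` is a hypothetical name.

```python
class AugmentedTree:
    """Partial sums in O(lg n) per operation: a complete binary tree stored in
    an array (leaf count rounded up to a power of two); leaves hold A[i] and
    each internal node stores the sum of its subtree."""

    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        self.node = [0] * (2 * self.size)  # node[1] is the root; leaves at [size, 2*size)

    def update(self, i, delta):
        """A[i] <- delta, then repair the sums on the leaf-to-root path."""
        v = self.size + i
        self.node[v] = delta
        v //= 2
        while v >= 1:
            self.node[v] = self.node[2 * v] + self.node[2 * v + 1]
            v //= 2

    def sum(self, i):
        """A[0] + ... + A[i]: walk up from leaf i, adding every left sibling."""
        v = self.size + i
        total = self.node[v]
        while v > 1:
            if v % 2 == 1:  # v is a right child: its left sibling lies entirely before i
                total += self.node[v - 1]
            v //= 2
        return total
```

Both operations touch one node per level, i.e. O(lg n) nodes, which is the upper bound the Ω(lg n) lower bound shows to be optimal.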