Succinct Data Structures: Upper, Lower - PowerPoint PPT Presentation



Slides: 36

Provided by: ianm8


Transcript and Presenter's Notes


1
Succinct Data Structures: Upper, Lower & Middle
Bounds

Ian Munro
University of Waterloo
Joint work with/of Arash Farzan, Alex Golynski,
Meng He
How do we encode a large combinatorial object
(e.g. a tree, string, graph, group),
even a static one,
in a small amount of space and still perform
the required operations in constant time?

2
Example of a Succinct Data Structure: The
(Static) Bounded Subset

Given: a universe of n elements, {0, ..., n-1},
and m arbitrary elements from this universe
Create a static structure to support search in
constant time (lg n bit word, usual operations)
Using essentially the minimum possible number of bits, $\lceil \lg \binom{n}{m} \rceil$
Operation: member query in O(1) time
(Brodnik & M.)
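To make "essentially minimum possible bits" concrete, the sketch below compares the information-theoretic minimum, lg C(n, m) rounded up, with two obvious encodings: a plain bitmap (n bits) and a sorted array of m elements at lg n bits each. The function name and the sample sizes are illustrative assumptions, not taken from the slides.

```python
import math

def subset_space_bits(n, m):
    """Return (minimum, bitmap, sorted_array) sizes in bits for an m-subset of {0..n-1}."""
    minimum = math.ceil(math.log2(math.comb(n, m)))   # lg C(n, m), rounded up
    bitmap = n                                        # one bit per universe element
    sorted_array = m * math.ceil(math.log2(n))        # one lg n-bit word per element
    return minimum, bitmap, sorted_array

if __name__ == "__main__":
    # Hypothetical sizes: sparse, medium and dense subsets of a 2^16 universe.
    for n, m in [(1 << 16, 16), (1 << 16, 1 << 10), (1 << 16, 1 << 15)]:
        print(n, m, subset_space_bits(n, m))
```

The bitmap is close to the minimum only when m is near n/2, and the sorted array only when m is tiny; a succinct structure aims to be near the minimum for every m while still answering membership in O(1) time.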

3
Careful .. Lower Bounds

Beame-Fich: finding the largest element less than i is tough in
some ranges of m (e.g. $m = 2^{\sqrt{\lg n}}$)
But OK if i is present; this can be added (Raman,
Raman, Rao)

4
Focus on Trees
.. Because Computer Science is .. Arbophilic
- Directories (Unix, all the rest)
- Search trees (B-trees, binary search trees, digital trees or tries)
- Graph structures (we do a tree based search)
- Search indices for text (including DNA)
5
A Big Patricia Trie / Suffix Trie

Given a large text file, treat it as a bit vector
Construct a trie with leaves pointing to the unique
locations in the text that match the path in the trie
(paths must start at character boundaries)
Skip the nodes where there is no branching (leaving n-1
internal nodes)

6
Space for Trees

Abstract data type: binary tree
Size: n-1 internal nodes, n leaves
Operations: child, parent, subtree size, leaf
data
Motivation: the obvious representation of an n node
tree takes about 6n words of lg n bits each (up, left, right,
size, memory manager, leaf reference)
i.e. a full suffix tree takes about 5 or 6 times
the space of a suffix array (i.e. leaf references
only)

7
Succinct Representations of Trees

Start with Jacobson, then others
There are about $4^n/(\sqrt{\pi}\,n^{3/2})$ ordered rooted trees,
and the same number of binary trees
Lower bound on specifying one is about 2n bits
What are the natural representations?
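The 2n-bit lower bound is the logarithm of the number of trees. As a small check, the sketch below computes the exact count (the Catalan number) and compares it with the standard leading-term estimate 4^n / (sqrt(pi) n^(3/2)); the function names are illustrative assumptions.

```python
import math

def catalan(n):
    """Number of binary trees on n nodes (equivalently, ordered rooted trees on n+1 nodes)."""
    return math.comb(2 * n, n) // (n + 1)

def catalan_estimate(n):
    """Leading-term asymptotic estimate 4^n / (sqrt(pi) * n^(3/2))."""
    return 4.0 ** n / (math.sqrt(math.pi) * n ** 1.5)

if __name__ == "__main__":
    for n in (10, 100, 500):
        exact = catalan(n)
        # lg(count) is the minimum number of bits; it falls short of 2n by Theta(lg n).
        print(n, round(math.log2(exact), 1), 2 * n, exact / catalan_estimate(n))
```

Because the count is 4^n divided by a polynomial in n, its logarithm is 2n minus a Theta(lg n) term, which is why any representation needs about 2n bits.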

8
Arbitrary Ordered Trees

Use parenthesis notation
Represent the tree as the binary string (((())())((())()())):
traverse the tree, writing "(" for a node, then its subtrees,
then ")"
Each node takes 2 bits
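A minimal sketch of this encoding, assuming the tree is given as nested Python lists (each node is the list of its children). The helper names are illustrative; the sample tree is one whose encoding reproduces the string on the slide.

```python
def encode(tree):
    """Balanced-parenthesis string for an ordered tree given as nested lists of children."""
    return "(" + "".join(encode(child) for child in tree) + ")"

def decode(parens):
    """Rebuild the nested-list tree from its parenthesis string."""
    stack = [[]]                      # sentinel list that will hold the root
    for ch in parens:
        if ch == "(":
            node = []
            stack[-1].append(node)    # attach the new node to its parent
            stack.append(node)
        else:
            stack.pop()               # ")" closes the current node
    return stack[0][0]

# A 10-node ordered tree: the root has two children with 2 and 3 children respectively.
TREE = [[[[]], []], [[[]], [], []]]

if __name__ == "__main__":
    print(encode(TREE))
```

Ten nodes give a 20-character string, i.e. 2 bits per node when "(" and ")" are stored as single bits.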

9
Heap-like Notation for a Binary Tree
Add external nodes
Enumerate level by level
Store vector 11110111001000000, length 2n+1
(Here we don't know the size of subtrees; this can be
overcome. Could use an isomorphism to flip between
notations)
[Figure: the binary tree, with internal nodes labelled 1 and added external nodes labelled 0]
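The level-by-level encoding can be sketched as a breadth-first traversal. The tree below is an assumption: it is reconstructed from the slide's bit vector, since the original figure is not available here. Internal nodes are (left, right) tuples and external nodes are None.

```python
from collections import deque

def level_order_bits(root):
    """Emit 1 for each internal node and 0 for each external node, level by level."""
    queue = deque([root])
    bits = []
    while queue:
        node = queue.popleft()
        if node is None:
            bits.append("0")          # external node: no children to enqueue
        else:
            bits.append("1")
            left, right = node
            queue.append(left)
            queue.append(right)
    return "".join(bits)

LEAF = (None, None)                   # internal node with two external children
ROOT = (((LEAF, None), None), ((None, LEAF), LEAF))

if __name__ == "__main__":
    print(level_order_bits(ROOT))
```

With n = 8 internal nodes there are n + 1 = 9 external nodes, so the vector has 2n + 1 = 17 bits.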
10
How do we Navigate?

Jacobson's key suggestion: operations on a bit
vector
rank(x) = number of 1s up to and including position x
select(x) = position of the xth 1
So in the binary tree
leftchild(x) = 2 rank(x)
rightchild(x) = 2 rank(x) + 1
parent(x) = select($\lfloor x/2 \rfloor$)
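The navigation formulas can be tried directly on the 17-bit vector from the previous slide. This is a sketch with naive linear-time rank and select standing in for the constant-time structures; positions are 1-indexed as on the slide.

```python
BITS = "11110111001000000"   # level-order vector from the previous slide

def rank(x):
    """Number of 1s in positions 1..x (inclusive)."""
    return BITS[:x].count("1")

def select(k):
    """Position of the k-th 1 (1-indexed)."""
    count = 0
    for pos, bit in enumerate(BITS, start=1):
        if bit == "1":
            count += 1
            if count == k:
                return pos
    raise ValueError("fewer than k ones")

def left_child(x):
    return 2 * rank(x)

def right_child(x):
    return 2 * rank(x) + 1

def parent(x):
    return select(x // 2)

if __name__ == "__main__":
    print(left_child(6), right_child(6), parent(11))
```

The child formulas apply to internal nodes (positions holding a 1); the root at position 1 has no parent, so parent(1) raises an error in this sketch.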

11
Rank & Select

Rank: auxiliary storage of about $2n \lg\lg n / \lg n$ bits
Number of 1s up to each $(\lg n)^2$-th bit
Number of 1s within these blocks, too, up to each $(\lg n)$-th bit
Table lookup after that
Select: more complicated (especially to get this
lower order term) but similar notions
Key issue: rank & select take O(1) time with lg n
bit word (M. et al)
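A minimal sketch of the two-level rank directory described above. Assumptions: block length is floor(lg n), superblock length is its square, and a plain sum over the final block stands in for the table lookup. The class and attribute names are illustrative.

```python
import math

class RankDirectory:
    def __init__(self, bits):
        self.bits = bits
        n = max(len(bits), 2)
        self.block = max(1, int(math.log2(n)))        # about lg n bits
        self.superblock = self.block * self.block     # about (lg n)^2 bits
        self.super_counts = []   # 1s before the start of each superblock
        self.block_counts = []   # 1s from superblock start to block start
        total = 0
        in_super = 0
        for i in range(0, len(bits), self.block):
            if i % self.superblock == 0:
                self.super_counts.append(total)
                in_super = 0
            self.block_counts.append(in_super)
            ones = sum(bits[i:i + self.block])
            total += ones
            in_super += ones

    def rank(self, x):
        """Number of 1s in bits[0..x] inclusive (x is 0-indexed)."""
        b = x // self.block
        start = b * self.block
        in_block = sum(self.bits[start:x + 1])        # stand-in for table lookup
        return self.super_counts[x // self.superblock] + self.block_counts[b] + in_block

if __name__ == "__main__":
    rd = RankDirectory([1, 0, 1, 1, 0, 0, 1, 0] * 32)
    print(rd.rank(0), rd.rank(7), rd.rank(255))
```

Superblock counters need about lg n bits each but are spaced (lg n)^2 apart; block counters need only about 2 lg lg n bits each and are spaced lg n apart, which is where the n lg lg n / lg n term comes from.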

12
Aside: Dynamic Rank & Select

Rank/Select structures: raw data plus some
cumulative arrays
Model: we keep a finger at a position and can
insert/delete/change at that spot or move 1 spot
left/right
When at position i, maintain structures up to i
and backwards from n down to i+1.
Problem: in most (tree) applications rank/select
updates are all over

13
Lower Bound for Rank & for Select

Theorem (Golynski): Given a bit vector of length
n and an index (extra data) of size r bits, let
t be the number of bits probed to perform rank
(or select); then $r = \Omega(n (\lg t)/t)$.
Proof idea: argue by reconstructing the entire
string with too few rank queries (similarly for
select)
Corollary (Golynski): Under the lg n bit RAM
model, an index of size $\Theta(n \lg\lg n / \lg n)$ is
necessary and sufficient to perform the rank and
the select operations.
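One way to see how the corollary follows from the theorem, sketched under the assumption that a constant-time query on the lg n bit RAM reads O(1) words and therefore probes t = O(lg n) bits:

```latex
t = O(\lg n)
\quad\Longrightarrow\quad
r \;=\; \Omega\!\left(\frac{n \lg t}{t}\right)
  \;=\; \Omega\!\left(\frac{n \lg\lg n}{\lg n}\right),
\qquad\text{since } \frac{\lg t}{t} \text{ is decreasing for } t \ge 3 .
```

The matching upper bound is the rank directory from the Rank & Select slide, which uses about 2n lg lg n / lg n bits of auxiliary storage.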

14
More on Trees

Updating trees: a simple mapping plus rank/select
does not work well
Other kinds of trees: for free trees (no root or
ordering on children), a simple mapping may not
exist
So break the tree into little hunks (say $(1-\varepsilon) \lg n$
size), small enough to explicitly keep in a
table, with special constraints (e.g. few edges
going out of a hunk)

15
More on Trees

Keep most nodes in these little hunks (or a
couple of levels of hunk size classes); a limited
number can be in a core tree with real pointers

16
Hunks Lead to

Updates on binary trees (M., Raman & Storm),
& more general trees (Farzan & M.)
Also representing special classes of trees
optimally (Farzan & M.)
e.g. free trees 1.56..n bits,
free binary trees 1.31..n bits

17
Other Combinatorial Objects

Planar Graphs (Lu et al, Barbay et al)
Permutations $[n] \to [n]$
Or more generally
Functions $[n] \to [n]$. But what operations?
Clearly $\pi(i)$, but also $\pi^{-1}(i)$
And then $\pi^k(i)$ and $\pi^{-k}(i)$
Suffix Arrays (special permutations) in linear
space
Arbitrary Graphs (Farzan & M.)

18
Permutations: Backpointer Notation

Let P be a simple array giving $\pi$: $P[i] = \pi(i)$
Also have B[i] be a pointer t positions back in
(the cycle of) the permutation
$B[i] = \pi^{-t}(i)$ .. But only define B for every t-th
position in the cycle. (t is a constant; ignore cycle
length round-off)
So array representation
i:    1  2  3  4  5  6  7  8  9 10 11 12 13
P[i]: 8  4 12  5 13  x  x  3  x  2  x 10  1

[Figure: the cycle 2 → 4 → 5 → 13 → 1 → 8 → 3 → 12 → 10 → 2]
19
Representing Shortcuts

In a cycle there is a B every t positions
But these positions can be in arbitrary order
Which i's have a B, and how do we store it?
Keep a bit vector over all positions: 0 = no B, 1 = B
Rank gives the position of B[i] in the B array
So $\pi(i)$ & $\pi^{-1}(i)$ in O(1) time & $(1+\varepsilon)\, n \lg n$
bits
Theorem: Under a pointer machine model with space
$(1+\varepsilon)\, n$ references, we need time $1/\varepsilon$ to answer $\pi$
and $\pi^{-1}$ queries; i.e. this is as good as it gets
in the pointer model.
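The back pointer scheme of the last two slides can be sketched as follows. Assumptions: 0-indexed permutations, back pointers placed at every t-th position of each cycle longer than t, and a linear-scan rank standing in for the O(1) rank structure. The function names are illustrative.

```python
import random

def build(perm, t):
    """Return (marks, B): marks[i] = 1 if i has a back pointer; B lists the
    pointers in increasing order of i, so i's pointer is B[rank(marks, i) - 1]."""
    n = len(perm)
    marks = [0] * n
    back = {}
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        j = start
        while not seen[j]:
            seen[j] = True
            cycle.append(j)
            j = perm[j]
        if len(cycle) > t:
            for k in range(0, len(cycle), t):
                marks[cycle[k]] = 1
                back[cycle[k]] = cycle[k - t]   # t steps back, wrapping around the cycle
    B = [back[i] for i in range(n) if marks[i]]
    return marks, B

def rank(marks, i):
    """Number of 1s in marks[0..i]; a linear scan standing in for O(1) rank."""
    return sum(marks[: i + 1])

def inverse(perm, marks, B, i):
    """Find j with perm[j] == i: walk forward, take one back pointer, walk forward again."""
    j, jumped = i, False
    while perm[j] != i:
        if marks[j] and not jumped:
            j = B[rank(marks, j) - 1]
            jumped = True
        else:
            j = perm[j]
    return j

if __name__ == "__main__":
    perm = list(range(20))
    random.Random(0).shuffle(perm)
    marks, B = build(perm, 3)
    print(all(perm[inverse(perm, marks, B, i)] == i for i in range(20)))
```

Each inverse query walks fewer than t steps to the next marked position, jumps t positions back, and then walks fewer than t steps to the predecessor, so the work is O(t) while B holds only about n/t pointers.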

20
Aside: Extending to powers of $\pi$

Consider the cycles of $\pi$
( 2 6 8)( 3 5 9 10)( 4 1 7)
Bit vector indicates start of each cycle
( 2 6 8 3 5 9 10 4 1 7)
Ignore parens, view as a new permutation, $\psi$.
Note $\psi^{-1}(i)$ is the position containing i
So we have $\psi$ and $\psi^{-1}$ as before
Use $\psi^{-1}(i)$ to find i, then the bit vector (rank,
select) to find $\pi^k$ or $\pi^{-k}$
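A sketch of this idea on the slide's example. Assumptions: a dictionary stands in for the inverse of the concatenated permutation, and a sorted list of cycle start positions with binary search stands in for rank/select on the bit vector.

```python
from bisect import bisect_right

CYCLES = [[2, 6, 8], [3, 5, 9, 10], [4, 1, 7]]    # cycles of pi from the slide

PSI = [x for cycle in CYCLES for x in cycle]      # cycles written out, parentheses dropped
PSI_INV = {value: pos for pos, value in enumerate(PSI)}   # position containing each value
STARTS = []                                       # start position of each cycle
_pos = 0
for _cycle in CYCLES:
    STARTS.append(_pos)
    _pos += len(_cycle)

def power(i, k):
    """Return pi^k(i) for any integer k (negative k gives inverse powers)."""
    p = PSI_INV[i]                                # where i sits in the concatenation
    c = bisect_right(STARTS, p) - 1               # which cycle (stand-in for rank)
    begin = STARTS[c]
    end = STARTS[c + 1] if c + 1 < len(STARTS) else len(PSI)   # stand-in for select
    length = end - begin
    return PSI[begin + (p - begin + k) % length]  # move k places around the cycle

if __name__ == "__main__":
    print(power(2, 1), power(9, 2), power(4, -1))
```

Once the position and cycle boundaries are known, any power is a single modular step around the cycle, so the cost does not depend on k.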

21
Aside: Functions

Consider an arbitrary function, $f: [n] \to [n]$
Note $f^{-1}(i)$ is a set
All tree edges lead to a cycle
A function is just a hairy permutation
Deal with level ancestors, result holds

22
Back to $\pi$ & $\pi^{-1}$ in Fewer Bits

This is the best we can do for O(1) operations
But using Benes networks
1-Benes network is a 2 input/2 output switch
(r+1)-Benes network: join tops to tops
$\mathrm{bits}(n) = 2\,\mathrm{bits}(n/2) + n = n \lg n - n + 1 = \min + O(n)$

[Figure: an (r+1)-Benes network built from two r-Benes networks; inputs 1 2 3 4 5 6 7 8, outputs 3 5 7 8 1 6 4 2]
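The bit count can be checked numerically, with one bit per switch. Counting n/2 input switches and n/2 output switches per level, with base case bits(2) = 1, gives n lg n - n/2. The closed form n lg n - n + 1 quoted on the slide matches the known refinement in which one switch is fixed at each recursive step; treating that as the intended count is an assumption here. Function names are illustrative.

```python
import math

def benes_switches(n):
    """Switches in a Benes network on n = 2^r inputs: B(2) = 1, B(n) = 2 B(n/2) + n."""
    if n == 2:
        return 1
    return 2 * benes_switches(n // 2) + n

def reduced_switches(n):
    """Variant with one switch fixed per recursive step: W(2) = 1, W(n) = 2 W(n/2) + n - 1."""
    if n == 2:
        return 1
    return 2 * reduced_switches(n // 2) + n - 1

def min_bits(n):
    """Information-theoretic minimum for a permutation of n items: lg(n!), rounded up."""
    return math.ceil(math.log2(math.factorial(n)))

if __name__ == "__main__":
    for r in range(1, 11):
        n = 2 ** r
        print(n, min_bits(n), reduced_switches(n), benes_switches(n))
```

Either count exceeds lg(n!) by a term linear in n, which is the "O(n) bits more than necessary" noted on the next slide.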
23
A Benes Network

Realizing the permutation (std $\pi(i)$ notation)
(3 5 7 8 1 6 4 2)
Note: O(n) bits more than necessary

24
What can we do with it?

Divide into blocks of lg lg n gates; encode
their actions in a word. Taking advantage of
regularity of address mechanism
and also
Modify approach to avoid power of 2 issue
Can trace a path in time O(lg n / lg lg n)
Beats previous lower bound by using micro
pointers

25
Backpointers & Benes: Both are Best

Recall the Benes method violates the pointer
machine lower bound by using micropointers.
Indeed: with (a lot of) care, the space required is
$\lg(n!) + O(n (\lg\lg n)^2/\lg n)$ bits
But more generally
Lower Bound (Golynski): both methods are optimal
for their respective extra space constraints

26
Permutation Lower Bound

Operations: $\pi(i)$, $\pi^{-1}(i)$ with times t and t'
Backpointers: natural index
Benes: just a pile of bits, in lg n bit words
General Model: memory of $\lg(n!) + r$ bits, in words
Lower bound: r, the extra space, is $\Omega(\lg(n!)/(t\,t'))$
It works out: both Backpointers and Benes are
optimal

27
Proof of Lower Bound: Model

Model: tree program
Separate tree for each $\pi(i)$ or $\pi^{-1}(i)$
Start at root, look at memory location (word)
based on value required
At depth d take the appropriate branch based on
which of n values is read

28
Proof of Lower Bound: Set up

Fix the permutation (for now)
Consider the table of locations inspected at every
step for every query

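[Figure: table of memory locations inspected; columns are the queries $\pi(1), \pi(2), \pi(3), \pi(4), \ldots, \pi^{-1}(n)$, entries are locations]
r 4 6 9 3 6 8
m 5 3 4 9 2
o 7 5 9 1 8
p 9 8 3 2 3
q 8 1 3 4 7
s 3 7 3 8
t 3 7 1 2 4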
29
Proof of Lower Bound, cont'd

Take the least used cell (over all queries for
this permutation)

30
Proof of Lower Bound, cont'd

Take the least used cell (over all queries for
this permutation)
And NUKE (eliminate) it

31
Proof of Lower Bound, cont'd

And continue removing cells for a while ..
This means some queries may become unanswerable
(no matter how many probes are made) but others are
still OK
e.g. removing a cell for $\pi(6)$ (= 56) & $\pi^{-1}(56)$
(= 6) makes these unanswerable, versus a cell for
$\pi(9)$ (= 52) but not $\pi^{-1}(52)$ (= 9)
We do have to remember what we removed (though
not the order)

32
Proof of Lower Bound: Saving Space

So we save the space for the values we no
longer need, but we do have to remember which
are destroyed
d locations destroyed, order doesn't matter
d lg(n/d) bits used to say what is gone
But
d lg(n) bits saved
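The arithmetic behind this trade can be sketched as follows, assuming the d destroyed cells are chosen from about n cells of lg n bits each, and using the standard bound $\binom{n}{d} \le (en/d)^d$:

```latex
\underbrace{d \lg n}_{\text{bits saved}}
\;-\;
\underbrace{\lg \binom{n}{d}}_{\text{bits to record what is gone}}
\;\ge\;
d \lg n - d \lg \frac{e\,n}{d}
\;=\;
d \lg \frac{d}{e} .
```

So each round of destruction is a net gain of roughly d lg d - O(d) bits, which is positive as soon as d exceeds a small constant.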

33
Proof of Lower Bound: Finishing

Now some queries don't work:
$\pi(i_s)$, s = 1, .., c;  $\pi^{-1}(j_s)$, s = 1, .., c
We know the $i_s$ & $j_s$ but not their correspondence;
encode it
After the reduction we still need lg(n!) bits
(averaging over all permutations)
So reduce to that point .. Do the arithmetic, the bound
follows

34
Text Search Lower Bound

Key point: reciprocal relation
Text search operations
F: access the substring of length p starting at ip+1,
i = 0, .., n/p
I: search(X, j): the jth (aligned) occurrence of X
Theorem (Golynski): $r\,t\,t' = \Omega(n p (\lg \sigma)^2 / w^2)$
r = extra space in words, $\sigma$ = alphabet size, w = word size
For lg n substring, linear extra space is needed;
same as Demaine & Lopez-Ortiz, but better model

35
Conclusion