Succinct Data Structures: Upper, Lower

1
Succinct Data Structures: Upper, Lower & Middle
Bounds
  • Ian Munro
  • University of Waterloo
  • Joint work with/of Arash Farzan, Alex Golynski,
    Meng He
  • How do we encode a large combinatorial object
    (e.g. a tree, string, graph, group),
  • even a static one,
  • in a small amount of space and still perform
    required operations in constant time?

2
Example of a Succinct Data Structure: The
(Static) Bounded Subset
  • Given a universe of n elements [0, ..., n-1]
  • and m arbitrary elements from this universe
  • Create a static structure to support search in
    constant time (lg n bit word and usual operations)
  • Using essentially the minimum possible number of
    bits, $\lceil \lg \binom{n}{m} \rceil$
  • Operation: member query in O(1) time
  • (Brodnik & M.)

3
Careful .. Lower Bounds
  • Beame-Fich: finding the largest element less than i
    is tough in some ranges of m (e.g. $m \approx 2^{\sqrt{\lg n}}$)
  • But OK if i is present; this can be added (Raman,
    Raman, Rao)

4
Focus on Trees
.. Because Computer Science is .. Arbophilic
  • Directories (Unix, all the rest)
  • Search trees (B-trees, binary search trees, digital
    trees or tries)
  • Graph structures (we do a tree based search)
  • Search indices for text (including DNA)

5
A Big Patricia Trie / Suffix Trie
  • Given a large text file, treat it as a bit vector
  • Construct a trie with leaves pointing to unique
    locations in the text that match the path in the trie
    (paths must start at character boundaries)
  • Skip the nodes where there is no branching (leaving
    n-1 internal nodes)

6
Space for Trees
  • Abstract data type: binary tree
  • Size: n-1 internal nodes, n leaves
  • Operations: child, parent, subtree size, leaf
    data
  • Motivation: the obvious representation of an n node
    tree takes about 6n words of lg n bits each (up, left,
    right, size, memory manager, leaf reference)
  • i.e. a full suffix tree takes about 5 or 6 times
    the space of a suffix array (i.e. leaf references
    only)

7
Succinct Representations of Trees
  • Start with Jacobson, then others
  • There are about $4^n / (\sqrt{\pi}\, n^{3/2})$ ordered rooted
    trees, and the same number of binary trees
  • So the lower bound on specifying one is about 2n bits
  • What are the natural representations?
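
The 2n-bit figure can be checked numerically. The sketch below assumes the count in question is the n-th Catalan number (the number of binary trees on n nodes); the function names are illustrative and not from the talk.

```python
import math

def catalan(n):
    """Number of binary trees on n nodes (the n-th Catalan number)."""
    return math.comb(2 * n, n) // (n + 1)

def min_bits(n):
    """ceil(lg catalan(n)): bits needed to give every tree a distinct code."""
    return (catalan(n) - 1).bit_length()

# Compare the information-theoretic minimum with the 2n-bit target.
for n in (10, 100, 1000):
    print(n, min_bits(n), 2 * n)
```

The gap between the two columns grows only like lg n, which is why 2n bits counts as essentially optimal.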

8
Arbitrary Ordered Trees
  • Use parenthesis notation
  • Represent the tree
  • as the binary string (((())())((())()())):
    traverse the tree, writing ( on reaching a node, then
    its subtrees, then )
  • Each node takes 2 bits
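
A minimal sketch of this encoding, assuming a tree is given as a nested Python list of children (an empty list is a leaf). The representation and the function names are choices made here for illustration.

```python
def encode(tree):
    """Write '(' on entering a node, then its subtrees, then ')'."""
    return "(" + "".join(encode(child) for child in tree) + ")"

def decode(s):
    """Rebuild the nested-list tree from a balanced parenthesis string."""
    stack = [[]]                  # sentinel list that will hold the root
    for ch in s:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        else:
            stack.pop()
    return stack[0][0]

def count_nodes(tree):
    return 1 + sum(count_nodes(child) for child in tree)

# The 10-node tree whose encoding is the string on the slide.
tree = [[[[]], []], [[[]], [], []]]
code = encode(tree)
```

Here `code` has exactly two characters, so two bits, per node.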

9
Heap-like Notation for a Binary Tree
  • Add external nodes
  • Enumerate level by level
  • Store vector 11110111001000000, length 2n+1
  • (Here we don't know the size of subtrees; this can be
    overcome. Could use isomorphism to flip between
    notations)

10
How do we Navigate?
  • Jacobson's key suggestion: operations on a bit
    vector
  • rank(x) = number of 1s up to and including x
  • select(x) = position of the x-th 1
  • So in the binary tree:
  • leftchild(x) = 2 rank(x)
  • rightchild(x) = 2 rank(x) + 1
  • parent(x) = select(⌊x/2⌋)
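
These navigation formulas can be tried on the level-order vector 11110111001000000 from the heap-like notation slide. The sketch below uses naive linear-time rank and select purely for illustration (the point of the talk is that they can be made O(1)); positions are 1-based and the helper names are assumptions.

```python
BITS = "11110111001000000"   # 1 = internal node, 0 = external node

def rank(x):
    """Number of 1s in positions 1..x (naive scan)."""
    return BITS[:x].count("1")

def select(j):
    """Position of the j-th 1 (naive scan)."""
    count = 0
    for pos, bit in enumerate(BITS, start=1):
        count += bit == "1"
        if count == j:
            return pos
    raise ValueError("fewer than j ones in the vector")

def leftchild(x):
    return 2 * rank(x)

def rightchild(x):
    return 2 * rank(x) + 1

def parent(x):
    return None if x == 1 else select(x // 2)

internal = [x for x in range(1, len(BITS) + 1) if BITS[x - 1] == "1"]
```

For every internal node x, `parent(leftchild(x))` and `parent(rightchild(x))` both return x, so the three formulas are mutually consistent.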

11
Rank & Select
  • Rank: auxiliary storage of about $2n \lg\lg n / \lg n$ bits
  • Number of 1s up to each $(\lg n)^2$-th bit
  • Number of 1s within these blocks, too, up to each
    $\lg n$-th bit
  • Table lookup after that
  • Select: more complicated (especially to get this
    lower order term) but similar notions
  • Key issue: rank & select take O(1) time with a lg n
    bit word (M. et al)
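
A rough sketch of the two-level rank directory, under simplifying assumptions: the bit vector is a Python list, block size is ⌊lg n⌋, superblock size is its square, and the final "table lookup" is replaced by a direct count inside one block. Class and attribute names are illustrative.

```python
import math

class RankDirectory:
    def __init__(self, bits):
        self.bits = bits
        n = max(len(bits), 2)
        self.b = max(1, int(math.log2(n)))   # block size, about lg n
        self.s = self.b * self.b             # superblock size, about (lg n)^2
        self.super_counts = []   # 1s before each superblock
        self.block_counts = []   # 1s from superblock start to each block start
        total = 0
        within = 0
        for i in range(0, len(bits), self.b):
            if i % self.s == 0:              # a new superblock starts here
                self.super_counts.append(total)
                within = 0
            self.block_counts.append(within)
            ones = sum(bits[i:i + self.b])
            total += ones
            within += ones

    def rank(self, x):
        """Number of 1s in bits[0..x], inclusive (0-based x)."""
        blk = x // self.b
        start = blk * self.b
        in_block = sum(self.bits[start:x + 1])   # stands in for table lookup
        return self.super_counts[x // self.s] + self.block_counts[blk] + in_block

bits = [1 if (i * i) % 7 in (1, 2) else 0 for i in range(200)]
directory = RankDirectory(bits)
```

Each query adds three numbers: one per superblock, one per block, and one for the remainder inside a block of about lg n bits.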

12
Aside: Dynamic Rank & Select
  • Rank/select structures: raw data plus some
    cumulative arrays
  • Model: we keep a finger at a position and can
    insert/delete/change at that spot or move 1 spot
    left/right
  • When at position i, maintain structures up to i
    and backwards from n down to i+1
  • Problem: in most (tree) applications rank/select
    updates are all over

13
Lower Bound for Rank & for Select
  • Theorem (Golynski): Given a bit vector of length
    n and an index (extra data) of size r bits, let
    t be the number of bits probed to perform rank
    (or select); then $r = \Omega(n (\lg t)/t)$.
  • Proof idea: argue that otherwise one could reconstruct
    the entire string with too few rank queries (similarly
    for select)
  • Corollary (Golynski): Under the lg n bit RAM
    model, an index of size $\Theta(n \lg\lg n / \lg n)$ is
    necessary and sufficient to perform the rank and
    the select operations.
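
The step from the theorem to the corollary can be spelled out as follows, assuming a constant-time query makes O(1) probes of lg n-bit words, so that it probes t = O(lg n) bits in total:

$$
t = O(\lg n)
\quad\Longrightarrow\quad
r = \Omega\!\left(\frac{n \lg t}{t}\right)
  = \Omega\!\left(\frac{n \lg\lg n}{\lg n}\right).
$$

The substitution is valid because $(\lg t)/t$ is decreasing for $t \ge 3$, so the largest admissible t gives the weakest bound. Sufficiency comes from the rank directory, whose auxiliary storage is about $2n \lg\lg n / \lg n$ bits.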

14
More on Trees
  • Updating trees: simple mapping plus rank/select
    does not work well
  • Other kinds of trees: for free trees (no root or
    ordering on children), a simple mapping may not
    exist
  • So break the tree into little hunks (say (1-ε) lg n
    size), small enough to explicitly keep in a
    table, with special constraints (e.g. few edges
    going out of a hunk)

15
More on Trees
  • Keep most nodes in these little hunks (or a
    couple of levels of hunk size classes), a limited
    number can be in a core tree with real pointers

16
Hunks Lead to
  • Updates on binary trees (M., Raman & Storm), and
    more general trees (Farzan & M.)
  • Also representing special classes of trees
    optimally (Farzan & M.)
  • e.g. free trees in 1.56..n bits,
  • free binary trees in 1.31..n bits

17
Other Combinatorial Objects
  • Planar graphs (Lu et al, Barbay et al)
  • Permutations [n] → [n]
  • Or more generally
  • Functions [n] → [n]. But what operations?
  • Clearly $\pi(i)$, but also $\pi^{-1}(i)$
  • And then $\pi^{k}(i)$ and $\pi^{-k}(i)$
  • Suffix arrays (special permutations) in linear
    space
  • Arbitrary graphs (Farzan & M.)

18
Permutations: Backpointer Notation
  • Let P be a simple array giving $\pi$: $P[i] = \pi(i)$
  • Also let $B[i]$ be a pointer t positions back in
    (the cycle of) the permutation:
  • $B[i] = \pi^{-t}(i)$ .. But only define B for every t-th
    position in the cycle. (t is a constant; ignore cycle
    length round-off)
  • So the array representation is
        i:     1  2  3  4  5  6  7  8  9 10 11 12 13
        P[i]:  8  4 12  5 13  x  x  3  x  2  x 10  1

(Figure: the cycle 2 → 4 → 5 → 13 → 1 → 8 → 3 → 12 → 10 → 2)

19
Representing Shortcuts
  • In a cycle there is a B every t positions
  • But these positions can be in arbitrary order
  • Which i's have a B, and how do we store it?
  • Keep a bit vector over all positions: 0 = no B, 1 = B
  • Rank gives the position of $B[i]$ in the B array
  • So $\pi(i)$ and $\pi^{-1}(i)$ in O(1) time and
    $(1+\varepsilon) n \lg n$ bits
  • Theorem: Under a pointer machine model with space
    $(1+\varepsilon) n$ references, we need time $1/\varepsilon$ to
    answer $\pi$ and $\pi^{-1}$ queries; i.e. this is as good as
    it gets in the pointer model.
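
A minimal sketch of the backpointer idea, assuming a 0-based permutation stored as a Python list, a naive rank over the mark vector, and back pointers only on cycles longer than t. The example permutation and all names are made up for illustration and are not the one on the slide.

```python
def build(perm, t):
    """Return (marked, B): a bit vector of positions holding a back pointer,
    and the compact array of those pointers in position order."""
    n = len(perm)
    marked = [0] * n
    pointer_at = {}
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        cycle = []
        j = start
        while not seen[j]:
            seen[j] = True
            cycle.append(j)
            j = perm[j]
        if len(cycle) > t:                    # short cycles need no shortcut
            for k in range(0, len(cycle), t):
                marked[cycle[k]] = 1
                pointer_at[cycle[k]] = cycle[(k - t) % len(cycle)]
    B = [pointer_at[i] for i in range(n) if marked[i]]
    return marked, B

def rank(marked, i):
    """Number of 1s in marked[0..i] (naive; O(1) with a rank directory)."""
    return sum(marked[:i + 1])

def inverse(perm, marked, B, i):
    """Find j with perm[j] == i using at most one backward jump."""
    j = i
    jumped = False
    while perm[j] != i:
        if marked[j] and not jumped:
            j = B[rank(marked, j) - 1]        # jump t positions back
            jumped = True
        else:
            j = perm[j]                       # one step forward
    return j

perm = [3, 0, 5, 7, 1, 8, 2, 4, 6, 9]
marked, B = build(perm, 2)
```

With t fixed, a query walks forward fewer than t steps to a marked position, jumps back t positions, and walks forward again to the predecessor of i, so the total work is O(t).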

20
Aside: Extending to Powers of $\pi$
  • Consider the cycles of $\pi$
  • ( 2 6 8)( 3 5 9 10)( 4 1 7)
  • Bit vector indicates start of each cycle
  • ( 2 6 8 3 5 9 10 4 1 7)
  • Ignore parens, view as a new permutation, $\psi$.
  • Note $\psi^{-1}(i)$ is the position containing i
  • So we have $\psi$ and $\psi^{-1}$ as before
  • Use $\psi^{-1}(i)$ to find i, then the bit vector (rank,
    select) to find $\pi^{k}(i)$ or $\pi^{-k}(i)$
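
The cycle layout can be sketched on the slide's own example. In this illustration the concatenated cycles are called `psi`, a dictionary plays the role of its inverse, and a sorted list of cycle start positions with binary search stands in for the bit vector with rank and select. All of these names and choices are assumptions made here.

```python
from bisect import bisect_right

cycles = [[2, 6, 8], [3, 5, 9, 10], [4, 1, 7]]   # the cycles on the slide

psi = [x for cycle in cycles for x in cycle]      # cycles written end to end
starts = []                                       # position where each cycle begins
position = 0
for cycle in cycles:
    starts.append(position)
    position += len(cycle)
psi_inv = {x: p for p, x in enumerate(psi)}       # position containing x

def power(i, k):
    """Apply the permutation k times to i (k may be negative)."""
    p = psi_inv[i]
    c = bisect_right(starts, p) - 1               # which cycle: a rank query
    begin = starts[c]                             # its start: a select query
    end = starts[c + 1] if c + 1 < len(starts) else len(psi)
    return psi[begin + (p - begin + k) % (end - begin)]
```

For instance `power(2, 1)` returns 6 and `power(2, -1)` returns 8, matching the cycle (2 6 8). The cost does not depend on k.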

21
Aside: Functions
  • Consider an arbitrary function, f: [n] → [n]
  • Note $f^{-1}(i)$ is a set
  • All tree edges lead to a cycle
  • A function is just a hairy permutation
  • Deal with level ancestors, and the result holds
22
Back to $\pi$ & $\pi^{-1}$ in Fewer Bits
  • This is the best we can do for O(1) operations
  • But using Benes networks:
  • 1-Benes network is a 2 input/2 output switch
  • (r+1)-Benes network: join tops to tops
  • $\#\mathrm{bits}(n) = 2\,\#\mathrm{bits}(n/2) + n - 1 = n \lg n - n + 1 = \min + O(n)$

(Figure: a Benes network built from two R-Benes networks, with inputs 1 2 3 4 5 6 7 8 and outputs 3 5 7 8 1 6 4 2)
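
The bit count can be checked against the information-theoretic minimum lg(n!). The sketch below assumes n is a power of two and that each merge step contributes n - 1 switch bits (one switch held fixed), an assumption made here so that the recurrence solves exactly to n lg n - n + 1. Function names are illustrative.

```python
import math

def benes_bits(n):
    """Switch bits for a network on n = 2^r inputs, under the n - 1 assumption."""
    if n == 1:
        return 0
    return 2 * benes_bits(n // 2) + n - 1

def lg_factorial(n):
    """lg(n!), the minimum number of bits needed to name a permutation."""
    return math.lgamma(n + 1) / math.log(2)

for r in (3, 10, 16):
    n = 2 ** r
    print(n, benes_bits(n), round(lg_factorial(n), 1))
```

The difference between the two columns stays below n, which is the "min + O(n)" on the slide.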
23
A Benes Network
  • Realizing the permutation (std $\pi(i)$ notation)
  • (3 5 7 8 1 6 4 2)
  • Note: O(n) bits more than necessary

24
What can we do with it?
  • Divide into blocks of lg lg n gates; encode
    their actions in a word, taking advantage of the
    regularity of the address mechanism
  • and also
  • Modify the approach to avoid the power of 2 issue
  • Can trace a path in time $O(\lg n / \lg\lg n)$
  • Beats the previous lower bound by using
    micropointers

25
Backpointers & Benes: Both are Best
  • Recall: the Benes method violates the pointer
    machine lower bound by using micropointers.
  • Indeed: with (a lot of) care, the space required is
  • $\lg(n!) + O(n (\lg\lg n)^2 / \lg n)$ bits
  • But more general:
  • Lower bound (Golynski): both methods are optimal
    for their respective extra space constraints

26
Permutation Lower Bound
  • Operations $\pi(i)$, $\pi^{-1}(i)$ with times t and t'
  • Backpointers: natural index
  • Benes: just a pile of bits, in lg n bit words
  • General model: memory of $\lg(n!) + r$ bits, in words
  • Lower bound: r (extra space) $= \Omega(\lg(n!) / (t\,t'))$
  • It works out: both Backpointers and Benes are
    optimal

27
Proof of Lower Bound: Model
  • Model: tree program
  • Separate tree for each $\pi(i)$ or $\pi^{-1}(i)$
  • Start at the root, look at a memory location (word)
    based on the value required
  • At depth d take the appropriate branch based on
    which of n values is read

28
Proof of Lower Bound: Set up
  • Fix the permutation (for now)
  • Consider the table of locations inspected at every
    step for every query

(Figure: table of the memory locations inspected at each step, one column per query $\pi(1)$, $\pi(2)$, $\pi(3)$, $\pi(4)$, ..., $\pi^{-1}(n)$)

29
Proof of Lower Bound, cont'd
  • Take the least used cell (over all queries for
    this permutation)

30
Proof of Lower Bound, cont'd
  • Take the least used cell (over all queries for
    this permutation)
  • And NUKE (eliminate) it

31
Proof of Lower Bound, cont'd
  • And continue removing cells for a while ..
  • This means some queries may become unanswerable
    (no matter how many probes are made) but others are
    still OK
  • e.g. removing a cell used for both $\pi(6)$ (= 56) and
    $\pi^{-1}(56)$ (= 6) makes these unanswerable, versus a
    cell used for $\pi(9)$ (= 52) but not for $\pi^{-1}(52)$ (= 9)
  • We do have to remember what we removed (though
    not the order)

32
Proof of Lower Bound: Saving Space
  • So we save the space for the values we no
    longer need, but we do have to remember which
    are destroyed
  • d locations destroyed, order doesn't matter
  • d lg(n/d) bits used to say what is gone
  • But
  • d lg(n) bits saved
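
Taking the two estimates above at face value and ignoring lower order terms, the net saving from destroying d locations works out as:

$$
\underbrace{d \lg n}_{\text{saved}} \;-\; \underbrace{d \lg (n/d)}_{\text{spent}}
\;=\; d \lg n - d \lg n + d \lg d
\;=\; d \lg d .
$$

So the saving grows with d, which is what lets the argument keep removing cells until the remaining space is squeezed towards lg(n!) bits.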

33
Proof of Lower Bound: Finishing
  • Now some queries don't work:
  • $\pi(i_s)$, $s = 1, \ldots, c$, and $\pi^{-1}(j_s)$, $s = 1, \ldots, c$
  • We know the $i_s$ and $j_s$ but not their correspondence:
  • encode it
  • After the reduction we still need $\lg(n!)$ bits
    (averaging over all permutations)
  • So reduce to that point .. do the arithmetic, and the
    bound follows

34
Text Search Lower Bound
  • Key point: reciprocal relation
  • Text search operations:
  • F: access a substring of length p starting at ip+1,
    i = 0, ..., n/p
  • I: search(X, j), the j-th (aligned) occurrence of X
  • Theorem (Golynski): $r\,t\,t' = \Omega(n p (\lg \sigma)^2 / w^2)$
  • r = extra space in words, $\sigma$ = alphabet size,
    w = word size
  • For substrings of length lg n, linear extra space is
    needed; same as Demaine & Lopez-Ortiz, but better model

35
Conclusion
  • Interesting, and useful, combinatorial objects
    can be
  • stored succinctly: lower bound + o()
  • so that
  • natural queries are performed in O(1) time (or at
    least very close)
  • Indeed our o() terms are often optimal
  • But the border on which operations can be supported
    is subtle