Title: An improved succinct representation for dynamic k-ary trees
1. An improved succinct representation for dynamic k-ary trees
- Diego Arroyuelo
- Department of Computer Science, University of Chile
- Yahoo! Research Latin America
2. k-ary trees (tries)
- Every node has at most k children
- Edges are labeled with a symbol in the set {1, ..., k}
- k is the alphabet size (k is large enough to be regarded as a constant)
- Pointer-based representation requires O(n log n) bits
- Applications: text searching (suffix trees), text compression (LZTries), DOM trees, etc.
3. Succinct data structures
- A succinct data structure requires space close to the information-theoretic lower bound
- There are C(kn+1, n) / (kn+1) different k-ary trees with n nodes, where C(a, b) denotes the binomial coefficient
- Therefore, the information-theoretic lower bound is about log2 [C(kn+1, n) / (kn+1)] bits
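As a rough numeric illustration of the lower bound, the following sketch counts k-ary trees and takes the base-2 logarithm of the count. The function names are hypothetical, and the pointer-based cost model (one pointer of ceil(log2 n) bits per node) is an assumption made only for comparison, not a figure from the slides.

```python
from math import ceil, comb, log2

def count_kary_trees(n: int, k: int) -> int:
    """Number of distinct k-ary trees with n nodes: C(kn+1, n) / (kn+1)."""
    return comb(k * n + 1, n) // (k * n + 1)

def lower_bound_bits(n: int, k: int) -> float:
    """Information-theoretic lower bound: log2 of the number of trees."""
    return log2(count_kary_trees(n, k))

def pointer_bits(n: int) -> int:
    """Assumed pointer-based cost: one ceil(log2 n)-bit pointer per node."""
    return n * ceil(log2(n))

if __name__ == "__main__":
    n, k = 1000, 4
    print(f"lower bound: {lower_bound_bits(n, k):.0f} bits")
    print(f"pointers   : {pointer_bits(n)} bits")
```

For k = 2 the count reduces to the Catalan numbers, which gives a quick sanity check of the formula.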
4. Succinct data structures
- We are interested in succinct representations that can be navigated
- We are interested in the operations:
- parent(x)
- child(x, i)
- child(x, a)
- depth(x)
- degree(x)
- subtree-size(x)
- preorder(x)
- insertions (in the leaves)
- deletions (in the leaves)
5. Succinct tree representations (static)
- Succinct representations for static trees
  - LOUDS [Jacobson, FOCS'89]
  - Balanced Parentheses [MR, STOC'97]
  - DFUDS [Benoit et al., Algorithmica 2005]
  - xbw [Ferragina et al., FOCS'05]
  - Ultra succinct trees [Jansson et al., SODA'07]
- These must be rebuilt from scratch upon insertion or deletion of nodes
6. Succinct tree representations (static)
DFUDS representation
2n + o(n) bits
Constant-time operations
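To make the 2n-bit figure concrete, here is a minimal sketch of how a DFUDS parenthesis sequence can be produced for a small static tree. The nested-list input format and the function name `dfuds` are assumptions for illustration; the rank/select machinery that gives constant-time navigation (the o(n) term) is not shown.

```python
def dfuds(tree) -> str:
    """Encode an ordinal tree given as nested lists of children.

    A leaf is the empty list []. Each node, visited in preorder,
    contributes one '(' per child followed by a single ')'.
    """
    out = ["("]                      # artificial opening parenthesis
    stack = [tree]
    while stack:
        node = stack.pop()
        out.append("(" * len(node) + ")")
        stack.extend(reversed(node))  # so children are visited left to right
    return "".join(out)

if __name__ == "__main__":
    # Hypothetical 9-node tree: a root with a leaf, a binary node and a ternary node.
    example = [[], [[], []], [[], [], []]]
    print(dfuds(example))            # 18 parentheses for 9 nodes
```

The sequence has exactly two parentheses per node, one bit each.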
7. Succinct tree representations (dynamic)
- The case of succinct dynamic trees was first studied for binary trees only
  - Munro, Raman, and Storm [SODA'01]: 2n + o(n) bits
  - Raman and Rao [ICALP'03]: 2n + o(n) bits
  - k-ary trees: basic navigation in O(k) time
- Chan et al. [TALG 2007]: 2n + n log k + o(n log k) bits
  - Operation times related to n rather than to k (O(log n) time)
  - It cannot take advantage of asymptotically smaller values of k, e.g., k = O(polylog(n))
We look to achieve o(log n) time whenever log k = o(log n)
8. Our basic tree representation
- We incrementally divide the tree into disjoint blocks [Munro et al.; Raman and Rao]
- Every block represents a connected component of N nodes such that Nmin ≤ N ≤ Nmax
- We arrange these blocks in a tree by adding inter-block pointers (the entire tree is a tree of subtrees)
9. Our basic tree representation
Blocks are trees by themselves. We only need to update one block upon tree updates.
10. Our basic tree representation
- We define Nmin (minimum block size) as follows
  - Inter-block pointers should use o(n) bits
  - By choosing Nmin = Θ(log^2 n) we have O(n / log^2 n) blocks (in general, Nmin = Θ(log n · f(n)), for f(n) = ω(1))
  - In this way we have one pointer out of Θ(log^2 n) nodes in the worst case
  - And hence o(n) bits for pointers
11. Defining block sizes
- We define Nmax (maximum block size) as follows
  - In case of block overflow we must be able to create a new block of size at least Nmin = Θ(log^2 n)
  - In the worst case, the root of the block has its k children, all of them having a subtree of the same size
  - By choosing Nmax = Θ(k log^2 n) we solve this problem
Remark: every time a block overflows, at least one of the root's children has a subtree of size at least Θ(log^2 n)
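The block-size choices can be checked with a little arithmetic. The sketch below plugs concrete values into Nmin and Nmax and estimates the space for inter-block pointers. The hidden constants are taken to be 1 and each pointer is assumed to cost ceil(log2 n) bits; both are assumptions, since the slides only give asymptotic bounds.

```python
from math import ceil, log2

def block_parameters(n: int, k: int):
    """Return (Nmin, Nmax, max number of blocks, pointer bits) for n nodes."""
    lg = log2(n)
    n_min = ceil(lg * lg)            # Nmin = Theta(log^2 n), constant 1 assumed
    n_max = k * n_min                # Nmax = Theta(k log^2 n)
    max_blocks = n // n_min          # every block holds at least Nmin nodes
    pointer_bits = max_blocks * ceil(lg)  # one log n-bit pointer per block
    return n_min, n_max, max_blocks, pointer_bits

if __name__ == "__main__":
    for exp in (10, 20, 30):
        n = 2 ** exp
        n_min, n_max, blocks, bits = block_parameters(n, k=4)
        # bits / n shrinks as n grows, which is what o(n) bits means here
        print(n, n_min, n_max, blocks, bits / n)
```

With Nmax = k · Nmin, an overflowing block has more than k · Nmin nodes below its root spread over at most k children, so by pigeonhole at least one child subtree reaches size Nmin, which is the remark on the slide.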
12. Our basic tree representation
- The blocks cannot be as small as we would like
- We support dynamic operations on the tree by
  - Dividing the tree into blocks (we only need to rebuild a block upon updates)
  - Making these smaller trees dynamic (different from other approaches)
- We represent the tree topology Tp of blocks using a dynamic DFUDS representation on top of Chan et al.'s data structure [TALG, 2007]
- Basic navigation inside blocks in O(log N) = O(log(k log^2 n)) = O(log k + log log n) time (including updates)
- Overall, this requires 2n + o(n) bits
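The chain of bounds for navigation inside a block follows from N ≤ Nmax = Θ(k log^2 n). A short worked derivation, with the constants hidden in the Θ left unspecified as on the slides:

```latex
\begin{align*}
O(\log N) &= O(\log N_{\max}) = O\bigl(\log(k \log^2 n)\bigr) \\
          &= O(\log k + 2\log\log n) \\
          &= O(\log k + \log\log n).
\end{align*}
% Special case: if k = O(\log^c n) for a constant c, then
% \log k = O(c \log\log n), so the bound becomes O(\log\log n).
```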
13. Representing the tree symbols
- We represent the symbols labeling the edges of the tree in the following way
[Figure: block topology Tp = ...((((()... ]
Overall: n log k + O(n log k / log log k) bits
O(log N) time for operation child_p(x, a)
14. Representing the frontier of a block
- We need to indicate which nodes in a block have a pointer to a child block
- This can be done by using a bit vector
  - However, this would require 3n + o(n) bits overall for the tree structure
- We define array Fp storing the preorders of the nodes having a child pointer
  - Since there are O(n / log^2 n) pointers, this requires o(n) bits
15. Representing the frontier of a block
Array Fp is represented in differential form with a data structure for Searchable Partial Sums
O(log N) time
[Figure: Tp = (((())(()))((()))) with the corresponding array Fp]
We must update all preorders in Fp from the updated position onward
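The benefit of the differential form is that shifting every preorder after some position costs a single update. The sketch below illustrates this with a Fenwick (binary indexed) tree over the gaps. This is a swapped-in, fixed-size stand-in, not the Searchable Partial Sums structure named on the slide, which also supports inserting and deleting entries; the class and method names are hypothetical.

```python
class GapArray:
    """Increasing integers stored as gaps in a Fenwick tree."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (self.n + 1)
        prev = 0
        for j, v in enumerate(values, start=1):
            self._add(j, v - prev)   # store the difference to the predecessor
            prev = v

    def _add(self, j, delta):
        while j <= self.n:
            self.tree[j] += delta
            j += j & -j

    def value(self, j):
        """j-th stored value (1-based), computed as a prefix sum of gaps."""
        total = 0
        while j > 0:
            total += self.tree[j]
            j -= j & -j
        return total

    def shift_from(self, j, delta=1):
        """Add delta to the j-th value and all later ones via one gap update."""
        self._add(j, delta)

    def rank(self, x):
        """Number of stored values <= x, by binary descent over the tree."""
        pos, step = 0, 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= x:
                pos = nxt
                x -= self.tree[nxt]
            step >>= 1
        return pos

if __name__ == "__main__":
    fp = GapArray([3, 7, 12])        # hypothetical frontier preorders
    first = fp.rank(4) + 1           # first entry with preorder >= 5
    fp.shift_from(first)             # a node inserted at preorder 5 shifts the rest
    print([fp.value(j) for j in (1, 2, 3)])
```

Each operation touches O(log N) cells, matching the O(log N) time stated for Fp.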
16. Solving the basic operations
- child(x, i)
- child(x, a)
- parent(x)
17. Solving the basic operations
- Insert
  - We use the corresponding insertion operation on the block
  - When a block p becomes full
    - Choose node z in block p with local subtree of size at least Nmin
    - Reinsert the nodes in the subtree of z in a new block q (time proportional to the size of the reinserted subtree)
    - Delete the subtree of z from p (time proportional to the size of the reinserted subtree)
  - To amortize the insertion cost, the overall reinsertion process must be carried out in time proportional to the size of the reinserted subtree
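The overflow steps above can be sketched on a plain pointer-based block. This is only an illustration of the split logic under simplifying assumptions: the `Node` class, `find_split` and `split_block` are hypothetical names, edge labels and child order across blocks are ignored, and subtree sizes are recomputed naively, so the sketch does not achieve the time bound required on the slide.

```python
class Node:
    def __init__(self, children=None):
        self.children = children or []   # children stored inside this block
        self.child_blocks = []           # roots of blocks hanging below this node

def size(node):
    """Number of nodes of the local subtree (inside the block)."""
    return 1 + sum(size(c) for c in node.children)

def find_split(root, n_min):
    """Return (parent, z): z is a deepest non-root node on one root-to-leaf
    path whose local subtree has at least n_min nodes, or (None, None)."""
    parent, z = None, None
    node = root
    while True:
        big = [c for c in node.children if size(c) >= n_min]
        if not big:
            return parent, z
        parent, z = node, big[0]
        node = z

def split_block(root, n_min):
    """Move the subtree of z out of block p into a new block q and return z."""
    parent, z = find_split(root, n_min)
    if z is None:
        raise ValueError("no node with a large enough subtree")
    parent.children.remove(z)        # delete the subtree of z from p
    parent.child_blocks.append(z)    # inter-block pointer to the new block q
    return z

if __name__ == "__main__":
    a = Node([Node(), Node(), Node()])
    root = Node([a, Node()])
    q = split_block(root, n_min=3)
    print(size(root), size(q))
```

After the split, the old block keeps only the nodes outside the subtree of z, and the parent of z records the pointer to the new block.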
18. Solving the basic operations
- Selecting the node to be reinserted
  - We define a list of candidate nodes Cp for every block p, maintaining (in preorder) the candidates to be reinserted in a new block upon overflow
  - Cp must be dynamically maintained, sampling nodes of Tp such that
    - Every time p overflows, there must be at least one candidate in Cp
    - The space for the Cp data structures must be o(n) bits overall
19. Solving the basic operations
- Maintaining the list of candidate nodes
  - Every time we descend in the tree we maintain the last node z in block p whose subtree has size at least Nmin
  - We add z to Cp whenever
    - z is not the root of p, and
    - There is no other candidate in the subtree of z
20. Solving the basic operations
In this way we mark one out of Nmin nodes. This means o(n) bits for the candidates.
[Figure: prospective candidate z and the candidate list Cp]
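The two conditions for adding a node to Cp imply that candidate subtrees are disjoint and each has at least Nmin nodes, so a block of N nodes holds at most N / Nmin candidates. The sketch below computes such a candidate set for a static block in one traversal. It is a static recomputation for illustration only, whereas the slides maintain Cp incrementally during descents; the nested-list input and the function name are assumptions.

```python
def candidates(tree, n_min):
    """Preorder numbers of candidate nodes in a block given as nested lists.

    A node is a candidate if it is not the block root, its subtree has at
    least n_min nodes, and no other candidate lies in its subtree.
    """
    result = []
    counter = [0]

    def visit(node, is_root):
        pre = counter[0]
        counter[0] += 1
        size, found_below = 1, False
        for child in node:
            child_size, child_found = visit(child, False)
            size += child_size
            found_below = found_below or child_found
        if not is_root and size >= n_min and not found_below:
            result.append(pre)
            found_below = True
        return size, found_below

    visit(tree, True)
    return sorted(result)            # candidates listed in preorder

if __name__ == "__main__":
    block = [[[], []], [[], [], []], []]   # hypothetical 9-node block
    print(candidates(block, n_min=3))
```

In the 9-node example with Nmin = 3, at most 9 / 3 = 3 candidates can exist, and the traversal finds two.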
21. Solving the basic operations
- Insert
  - Every time a block overflows, Cp has at least one candidate
  - The insertion cost is proportional to the size of the reinserted subtree
  - As we have already paid to insert these nodes, the total cost is amortized
22. Conclusions
- We have defined a representation for dynamic k-ary trees requiring space close to the information-theoretic lower bound
  - 2n + n log k + o(n log k) bits
- We can profit from small alphabets
  - O(log k + log log n) time for operations
  - In particular, O(log log n) time for k = O(polylog(n))
  - Versus the O(log n) time of Chan et al. for any alphabet size
23. Conclusions
- Corollary
  - New trade-off for succinct dynamic binary trees
    - faster updates: O(log log n) vs. O((log log n)^(1+ε)), for ε > 0
    - at the cost of slower navigation: O(log log n) vs. O(1)
- Our data structure works under the standard model of dynamic memory allocation (see the paper)
- We can support more involved operations
24. Questions?
- Thanks for your attention