Title: Practical Entropy-Compressed Rank/Select Dictionary
1 Practical Entropy-Compressed Rank/Select Dictionary
ALENEX 2007 at New Orleans
- Daisuke Okanohara (University of Tokyo)
- Kunihiko Sadakane (Kyushu University)
2 Abstract
- We propose four novel practical entropy-compressed Rank/Select dictionaries
  - esp, recrank, vcode, sdarray
- This is the first study of practical implementations of entropy-compressed Rank/Select dictionaries
  - A fundamental tool for succinct data structures
3 Preliminary: Rank/Select Dictionary
- Rank/Select dictionaries are data structures for an ordered set S ⊆ {0, 1, ..., n-1} that support the following operations
  - rank(x, S): the number of elements in S which are no greater than x
  - select(i, S): the position of the i-th smallest element in S
- S is represented by a bit array B[0..n-1] s.t. B[i] = 1 if i ∈ S, B[i] = 0 otherwise.
  - rank(x, B): the number of 1s in B[0..x]
  - select(i, B): the position of the i-th 1 from the left in B
4 Example of Rank/Select Dictionary
- rank(x, B)
  - The number of 1s in B[0..x]
- select(i, B)
  - The position of the i-th 1 from the left in B.
S = {1, 3, 4, 7, 10}
B = 01011001001
rank(6, B) = 3
select(4, B) = 7
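The two operations can be checked against the example with a naive linear-scan sketch. The universe size n = 11 is an assumption (just large enough to hold the largest element of S), and the function names are illustrative only; none of the compressed structures in this talk work this way.

```python
def rank(x, B):
    """Number of 1 bits in B[0..x] (inclusive)."""
    return sum(B[: x + 1])


def select(i, B):
    """Position of the i-th 1 from the left in B (i counts from 1)."""
    seen = 0
    for pos, bit in enumerate(B):
        seen += bit
        if bit and seen == i:
            return pos
    raise ValueError("B contains fewer than i ones")


S = {1, 3, 4, 7, 10}
n = 11  # assumed universe size: just large enough to hold max(S)
B = [1 if i in S else 0 for i in range(n)]

print(rank(6, B))    # 3
print(select(4, B))  # 7
```

Both calls take O(n) time here; the point of a Rank/Select dictionary is to answer them in O(1) time with a small auxiliary index.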
5 Size issue
- Given a bit array B of length n with m ones
  - A bit array representation requires n bits
  - This is much larger than the optimal one when m << n
  - B is called sparse if m << n and dense if m ≈ n/2
- The lower bound of the size of B is ⌈lg C(n, m)⌉ bits, where C(n, m) is the binomial coefficient
  - This can be approximated by nH0(B)
  - H0(B) = -p lg p - (1-p) lg(1-p) ≤ 1 is the 0-th order empirical entropy of B
  - When m << n, nH0(B) is close to m lg(n/m) + 1.44m
  - E.g. m = n/64, nH0(B)/n ≈ 0.12
p = m/n
lg = log2
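The size figures above can be reproduced numerically. This is a small sketch, not part of the talk: the array length n = 64000 is an arbitrary assumption chosen so that m = n/64 is an integer, and 1.44 is taken as lg e.

```python
from math import comb, log2, e


def h0(n, m):
    """0-th order empirical entropy H0 of a bit array with m ones out of n bits."""
    p = m / n
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


n = 64_000   # assumed example length
m = n // 64  # m = n/64, as in the example above

lower_bound = log2(comb(n, m))               # lg C(n, m), before rounding up
entropy_bits = n * h0(n, m)                  # n * H0(B)
sparse_approx = m * (log2(n / m) + log2(e))  # m lg(n/m) + 1.44m

print(round(h0(n, m), 2))  # 0.12
print(lower_bound <= entropy_bits)  # True: lg C(n, m) never exceeds n * H0(B)
```

For this ratio the three quantities differ by well under one percent, which is why nH0(B) is used as the target size throughout.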
6 Existing implementation of Rank/Select dictionary
- Theoretical study
  - O(1) time, n + o(n) bits [J. I. Munro 96]
  - O(1) time, nH0(B) + o(n) bits [R. Raman et al. 02]
- Practical implementation
  - O(1) time, n + o(n) bits [Kim et al. 05] [R. Gonzalez et al. 05]
  - o(n) time, gap(B) + o(n) bits [A. Gupta et al. 06]
    - gap(B) = Σ_i lg(select(i+1, B) - select(i, B))
- Our result: O(1) time, nH0(B) + o(n) bits in practice
7 For practical entropy-compressed Rank/Select dictionary
Figure: B partitioned into 12-bit blocks, with rank directory R and pointer information P
  i   block          R_i   P_i
  0   011111111100   0     0
  1   011100100000   9     8
  2   000000100000   13    12
  3   010101100001   14    16
  4   010100000001   19    22
- Basic implementation: O(1) time, nH0(B) + o(n) bits
  - Partition B into blocks, and compress each block by enumerative code [T. Cover 73].
  - Store a rank-directory (results of rank) at the boundaries of blocks
- Problem
  - Since compressed sizes are not constant, we need the pointer information (o(n) term), whose size is as large as nH0(B) bits
- Solution
  - Estimate pointer information: esp
  - Convert a sparse array into dense ones: recrank
  - Select-oriented data structures: vcode, sdarray
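The enumerative code used for each block can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a block with k ones out of u bits is replaced by its index among all such blocks in lexicographic order, which fits in ⌈lg C(u, k)⌉ bits once k is known (here k would come from the rank-directory). The names `enum_encode`, `enum_decode` and `code_length` are hypothetical.

```python
from math import comb


def code_length(u, k):
    """Bits needed for an index in [0, C(u, k)): ceil(lg C(u, k))."""
    return (comb(u, k) - 1).bit_length()


def enum_encode(block):
    """Return (k, index): k ones, and the lexicographic index of the block
    among all len(block)-bit strings that contain exactly k ones."""
    u = len(block)
    ones_left = sum(block)
    k = ones_left
    index = 0
    for pos, bit in enumerate(block):
        if bit:
            # Every string with a 0 here and the same prefix sorts earlier.
            index += comb(u - pos - 1, ones_left)
            ones_left -= 1
    return k, index


def enum_decode(u, k, index):
    """Inverse of enum_encode."""
    block = []
    ones_left = k
    for pos in range(u):
        zero_first = comb(u - pos - 1, ones_left)
        if ones_left > 0 and index >= zero_first:
            block.append(1)
            index -= zero_first
            ones_left -= 1
        else:
            block.append(0)
    return block


block = [int(c) for c in "011100100000"]
k, index = enum_encode(block)
print(k, code_length(len(block), k))  # 4 9
print(enum_decode(len(block), k, index) == block)  # True
```

Because the code length depends on k, consecutive compressed blocks start at irregular offsets, which is exactly why pointer information is needed.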
8 Summary of our data structures
o(n) terms in sarray and darray are O(1) in
almost all cases
9 Method 1: ESP (EStimation of Pointer information)
- Idea: Don't store pointer information, but estimate it from rank information
- Define L(B) to be the length of the code word for B using enumerative code; then [Prop. 1]
- Let B_i (i = 1, ..., n/u) be the partition of B; then [Prop. 2]
- Since H0(B) can be calculated from the rank-directory, we can estimate the pointer information
10 Example of ESP
- Estimate the pointer information using the rank-directory
Figure: rank directory R and the blocks, each compressed by enumerative code
  i   R_i   block
  0   0     011111111100
  1   9     011100100000
  2   13    000000100000
  3   14    010101100001
  4   19    010100000001
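The idea that block sizes are recoverable from rank information alone can be illustrated with the example's rank directory. This is a sketch under assumptions, not the esp data structure itself: each block is assumed to take exactly ⌈lg C(u, k)⌉ bits, the total number of ones (22) is read off the last block, and the entropy figure is used only as an upper bound on the total. The offsets it prints belong to this sketch's encoding and are not meant to reproduce pointer values drawn in the figures.

```python
from math import comb, log2


def code_length(u, k):
    """ceil(lg C(u, k)): bits for one enumeratively coded block."""
    return (comb(u, k) - 1).bit_length()


def entropy_bits(n, m):
    """n * H0 for a bit array of length n with m ones."""
    p = m / n
    if p == 0.0 or p == 1.0:
        return 0.0
    return n * (-p * log2(p) - (1 - p) * log2(1 - p))


u = 12                    # block length in the example
R = [0, 9, 13, 14, 19]    # rank directory: ones before each block
total_ones = 22           # 19 plus the 3 ones of the last block

# Ones per block follow from differences of consecutive rank entries.
ones = [b - a for a, b in zip(R, R[1:] + [total_ones])]
lengths = [code_length(u, k) for k in ones]

# Exact start offsets are prefix sums of the code lengths.
offsets = [0]
for length in lengths[:-1]:
    offsets.append(offsets[-1] + length)

exact_bits = sum(log2(comb(u, k)) for k in ones)
bound = entropy_bits(u * len(R), total_ones)

print(ones)      # [9, 4, 1, 5, 3]
print(lengths)   # [8, 9, 4, 10, 8]
print(offsets)   # [0, 8, 17, 21, 31]
print(exact_bits <= bound)  # True
```

The last line reflects the general fact that the product of the per-block binomials is at most C(n, m), which in turn is at most 2^(nH0(B)); an estimate derived from ranks can therefore bound where a block starts without storing a pointer for it.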
11 Method 2: recrank (Recursive Rank)
- Idea: Use the reduction of a sparse bit array into denser bit arrays recursively
- Partition B into blocks B_0, ..., B_k of length s
- Define Bc and Be as
  - Bc[0..k]: Bc[i] = 0 if B_i is all 0, Bc[i] = 1 otherwise
  - Be: the concatenation of all nonzero blocks of B in order.
12 recrank cont.
- Choose the block size s = -1/lg(1 - m/n), i.e. (1 - m/n)^s = 1/2, so that Bc would be dense.
- Use Be as the new input array, B^2 = Be, and apply the reduction to B^i (i ≥ 2) recursively
- Store Bc^1, Bc^2, ..., Bc^t, B^t.
Figure: reduction from a sparse array to a dense one
  B = B^1 (sparse) = 01000 00110 00000 00001
  Bc^1 = 1101
  B^2 = 01000 00110 00001
  Bc^2, ..., Bc^t
  B^t (dense)
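One level of the reduction, and a rank query answered through it, can be sketched as below. This is an illustration only: both Bc and Be are queried with a naive linear-scan rank, whereas recrank would apply the reduction again to Be and use a constant-time rank on the dense arrays. The function names are hypothetical.

```python
def reduce_once(B, s):
    """Split B into blocks of length s; return (Bc, Be)."""
    blocks = [B[i:i + s] for i in range(0, len(B), s)]
    Bc = [1 if any(b) else 0 for b in blocks]           # 1 marks a nonzero block
    Be = [bit for b in blocks if any(b) for bit in b]   # nonzero blocks, in order
    return Bc, Be


def naive_rank(x, B):
    """Number of 1s in B[0..x]; 0 when x < 0."""
    return sum(B[: x + 1]) if x >= 0 else 0


def rank_via_reduction(x, Bc, Be, s):
    """rank(x, B) computed from Bc and Be only."""
    blk, off = divmod(x, s)
    nonzero_before = naive_rank(blk - 1, Bc)  # nonzero blocks left of blk
    if Bc[blk] == 0:
        # Block is empty: count everything in the nonzero blocks before it.
        return naive_rank(nonzero_before * s - 1, Be)
    # Block is the nonzero_before-th block of Be; add the offset inside it.
    return naive_rank(nonzero_before * s + off, Be)


B = [int(c) for c in "01000001100000000001"]
Bc, Be = reduce_once(B, 5)
print("".join(map(str, Bc)))  # 1101
print("".join(map(str, Be)))  # 010000011000001
print(all(rank_via_reduction(x, Bc, Be, 5) == naive_rank(x, B)
          for x in range(len(B))))  # True
```

Each level trades one sparse array for a dense indicator array plus a shorter array, so after a few levels every stored array is dense enough for a plain n + o(n) rank structure.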
13 Method 3: vcode (Vertical code)
- vcode stores the results of select using gap coding
  - d_i = select(i+1, B) - select(i, B) - 1
- Idea: The gap sequence is aligned at each i-th bit so that all operations are always byte-aligned.
- Define d_i^k as the k-th bit of d_i, and V_k[i] = d_i^k
Byte-aligned if t is a multiple of 8
14 Example of vcode
We convert the original bit array B into the gap sequence d and then convert it to the bit arrays V
B = 0010000010011100100001000001
select(5, B) = (1 << 0) + (3 << 1) + (1 << 2) + 5 = 16
popcount(V0 & ((1 << 5) - 1))
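The vertical layout can be sketched with Python integers standing in for the bit arrays V_k. Several details are assumptions made for this sketch: select indices count from 0, the first gap is measured from position -1, all gaps sit in one block (the talk partitions them into blocks of t gaps), and `build_vcode` and `vcode_select` are hypothetical names.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def build_vcode(B):
    """Return bit planes V, where bit i of V[k] is the k-th bit of gap d_i."""
    gaps = []
    prev = -1
    for pos, bit in enumerate(B):
        if bit:
            gaps.append(pos - prev - 1)  # zeros between consecutive ones
            prev = pos
    width = max(gaps).bit_length() if gaps else 0
    V = []
    for k in range(width):
        plane = 0
        for i, d in enumerate(gaps):
            plane |= ((d >> k) & 1) << i
        V.append(plane)
    return V


def vcode_select(i, V):
    """Position of the i-th one (i counts from 0)."""
    mask = (1 << (i + 1)) - 1  # keep gaps d_0 .. d_i
    # The sum of the gaps is rebuilt plane by plane with one popcount each.
    gap_sum = sum(popcount(plane & mask) << k for k, plane in enumerate(V))
    return gap_sum + i  # i ones precede the i-th one


B = [int(c) for c in "0010000010011100100001000001"]
V = build_vcode(B)
mask = (1 << 6) - 1
print([popcount(plane & mask) for plane in V])  # [1, 3, 1]
print(vcode_select(5, V))  # 16
```

The per-plane counts 1, 3 and 1 are the multipliers of (1 << 0), (1 << 1) and (1 << 2) in the example, so a select costs one masked popcount per bit of gap width.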
15 Method 4: SDarray (Sparse array, Dense array)
- Idea: Use two different techniques to treat sparse arrays and dense arrays separately
  - This enables us to design the data structures simply
- Sparse array uses dense array as a part of its data structure
16 Method 4 (1): Sparse array
- Let x[0..m-1], s.t. x[i] = select(i+1, S)
- Each x is divided into upper z = lg t bits and lower w = lg(n/t) bits for t = 1.44m
- Lower bits are stored explicitly in L[0..m-1] using mw = m lg(n/1.44m) bits
- Upper bits are stored in H using unary coding of gaps, using m + t = 2.44m bits
- The total size is 1.92m + m lg(n/m) bits
- select(i, B) = (select1(i, H) - i) · 2^w + L[i]
- rank(i, B) uses select0(⌊i/2^w⌋, H) to find the smallest element which is greater than ⌊i/2^w⌋ · 2^w
- select(i, H) is calculated by Dense array because H is dense
Figure: x_i = 10001010100100 (binary), split into Upper and Lower bits
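The split into H and L and the select formula can be sketched as follows, using the positions from the sarray example slide (x = 1, 4, 7, 18, 24, 26, 30, 31 with w = 1). Assumptions for this sketch: indices into x, H and L count from 0, the unary code writes each upper-part gap as that many 0s followed by a 1, and `select1` is a naive scan standing in for the dense array. The function names are hypothetical.

```python
def build_sarray(x, w):
    """Return (H, L): unary-coded upper parts and explicit w-bit lower parts."""
    L = [v & ((1 << w) - 1) for v in x]
    H = []
    prev_upper = 0
    for v in x:
        upper = v >> w
        H.extend([0] * (upper - prev_upper))  # gap between upper parts, in unary
        H.append(1)                           # one 1 per element
        prev_upper = upper
    return H, L


def select1(i, H):
    """Position of the i-th 1 in H (i counts from 0); naive scan."""
    seen = -1
    for pos, bit in enumerate(H):
        if bit:
            seen += 1
            if seen == i:
                return pos
    raise IndexError("H contains too few ones")


def sarray_select(i, H, L, w):
    """x[i] recovered as (select1(i, H) - i) * 2^w + L[i]."""
    upper = select1(i, H) - i  # zeros before the i-th 1 equal the upper part
    return (upper << w) | L[i]


x = [1, 4, 7, 18, 24, 26, 30, 31]
H, L = build_sarray(x, w=1)
print("".join(map(str, H)))  # 10010100000010001010011
print("".join(map(str, L)))  # 10100001
print([sarray_select(i, H, L, 1) for i in range(len(x))] == x)  # True
```

H holds m ones and at most 2^z zeros, so it is dense regardless of how sparse B is, which is what lets the sparse array delegate select on H to the dense array.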
17 Method 4 (2): Dense Array
- Partition B into blocks such that each block contains L ones
- Store a select-dictionary (results of select) at each boundary of blocks
- Since the length of a block cannot be limited, we store all positions explicitly if the length of a block is large
- We can perform select0 on the same data structure by reversing bits in B at reading time
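The block layout for select can be sketched as below. It is a simplified illustration, not the darray from the paper: the parameters `ones_per_block` (the L of the slide) and `max_block_len` are arbitrary assumptions, short blocks are answered by scanning B from the stored boundary, and select0 is left out.

```python
class DenseSelect:
    """Select index that samples every ones_per_block-th one."""

    def __init__(self, B, ones_per_block=4, max_block_len=16):
        self.B = B
        self.k = ones_per_block
        ones = [i for i, bit in enumerate(B) if bit]
        self.blocks = []
        for start in range(0, len(ones), self.k):
            chunk = ones[start:start + self.k]
            if chunk[-1] - chunk[0] + 1 > max_block_len:
                # Long block: scanning would be slow, so keep every position.
                self.blocks.append(("explicit", chunk))
            else:
                # Short block: keep only the position of its first one.
                self.blocks.append(("scan", chunk[0]))

    def select(self, i):
        """Position of the i-th one (i counts from 0)."""
        kind, data = self.blocks[i // self.k]
        r = i % self.k
        if kind == "explicit":
            return data[r]
        pos = data
        while True:  # scan forward inside a block of bounded length
            if self.B[pos]:
                if r == 0:
                    return pos
                r -= 1
            pos += 1


positions = [0, 1, 3, 4, 10, 40, 41, 42, 50, 51, 53, 60]
B = [0] * 64
for p in positions:
    B[p] = 1

index = DenseSelect(B)
print([kind for kind, _ in index.blocks])  # ['scan', 'explicit', 'scan']
print([index.select(i) for i in range(len(positions))] == positions)  # True
```

Bounding the length of scanned blocks keeps the scan cost constant, while the explicitly stored blocks are rare enough that their extra space stays small.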
18 Experimental Setup
- Methods
  - esp (method 1)
  - rec (method 2)
  - vc (method 3)
  - sa (method 4, sparse array)
  - da (method 4, dense array)
  - Kim (Kim 05)
  - Kim2 (our re-implementation of Kim 05)
  - Navarro (R. Gonzalez 05)
- Entropy denotes the 0-th order empirical entropy
- All results use a bit array of length 10^7
  - The positions of the ones are determined randomly.
19 Experimental Result: Size
20 Experimental Result: Size (Ratio of 1 1,5)
The values are the percentage of the size of each data structure over the size of the original bit array
21 Experimental Result: Rank
22 Experimental Result: Select
23 Conclusion
- We propose four novel data structures
  - For sparse bit arrays, sarray is the smallest and fastest (indeed close to nH0)
  - esp and recrank are small for all conditions
  - vcode would be small if gap(B) is small, c.f. Ψ in Compressed Suffix Arrays
  - Easy to implement
- Question: Can we perform a fast rank operation in an entropy-compressed Rank/Select dictionary?
24 Thank you for your attention
26 Preliminary: Succinct data structure
- Represent an object from a universe with cardinality L by (1 + o(1)) log L bits
  - e.g. an ordinal tree with n nodes is encoded in a bit array of length 2n
- Rank/Select dictionary is a fundamental tool for succinct data structures
  - e.g. pointer information can be represented in a sparse bit array
27 Summary of our data structures
See the theoretical numbers in the paper
28 Example of recrank
  B = B^1 = 01000 00110 00000 00000 00000 00001 00 (blocks of length 5)
  Bc^1 = 1100010
  Be^1 = B^2 = 01000 00110 00001
       = 01 00 00 01 10 00 00 1 (blocks of length 2)
  Bc^2 = 10011001
  Be^2 = 01 01 10 1 = 0101101
- We recursively apply the reduction so that the final bit arrays (Bc^1, Bc^2, Be^2) are all dense
29 Example of vcode
We convert the original bit array B into the gap sequence and then convert it to the bit arrays V and T
B = 0100110011100100001000001100011
Select (6, S) 9 (0 << 0) (1 << 1) (1 << 2) 12
30 Example of sarray
B = 01001001000000000010000010100011
x[0..7] = 1, 4, 7, 18, 24, 26, 30, 31
  x[i]   upper (z = 4 bits)   lower
  1      0000                 1
  4      0010                 0
  7      0011                 1
  18     1001                 0
  24     1100                 0
  26     1101                 0
  30     1111                 0
  31     1111                 1
L = 10100001
H = 10010100000010001010011
z = 4
31 Experimental Result: Close-up of Size