Practical Entropy-Compressed Rank/Select Dictionary

1
Practical Entropy-Compressed Rank/Select Dictionary
ALENEX 2007 @ New Orleans
  • Daisuke Okanohara (University of Tokyo)
  • Kunihiko Sadakane (Kyushu University)

2
Abstract
  • We propose four novel practical
    entropy-compressed Rank/Select dictionaries
  • esp, recrank, vcode, sdarray
  • This is the first study of practical
    implementations of entropy-compressed Rank/Select
    dictionaries
  • Fundamental tool for succinct data structures

3
Preliminary: Rank/Select Dictionary
  • Rank/Select dictionaries are data structures for
    an ordered set $S \subseteq \{0, 1, \ldots, n-1\}$ that support the
    following operations
  • rank(x, S): the number of elements in S which are no
    greater than x
  • select(i, S): the position of the i-th smallest
    element in S
  • S is represented by a bit array $B[0 \ldots n-1]$ s.t.
    $B[i] = 1$ if $i \in S$, $B[i] = 0$ otherwise.
  • rank(x, B): the number of 1s in $B[0 \ldots x]$
  • select(i, B): the position of the i-th 1 from the
    left in B

4
Example of Rank/Select Dictionary
  • rank(x, B)
  • The number of 1s in $B[0 \ldots x]$
  • select(i, B)
  • The position of the i-th 1 from the left in B.

$S = \{1, 3, 4, 7, 10\}$
(Figure: the bit array B for S, with its 1s numbered from the left)
rank(6, B) = 3
select(4, B) = 7
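The two operations can be checked against the example with a naive linear-scan sketch. This is only an illustration of the definitions, not one of the proposed data structures; the array length n = 11 is an assumption, since the transcript does not state n.

```python
def rank(x, bits):
    """Number of 1s in bits[0..x] (inclusive)."""
    return bits[:x + 1].count("1")


def select(i, bits):
    """Position of the i-th 1 from the left (i starts at 1)."""
    seen = 0
    for pos, bit in enumerate(bits):
        if bit == "1":
            seen += 1
            if seen == i:
                return pos
    raise ValueError("fewer than i ones in bits")


S = [1, 3, 4, 7, 10]
n = 11  # assumed length; the slide does not state n
B = "".join("1" if pos in S else "0" for pos in range(n))

print(B)             # bit array built from S
print(rank(6, B))    # elements of S that are <= 6
print(select(4, B))  # 4th smallest element of S
```

Both functions take O(n) time here; the point of the dictionaries in this talk is to answer the same queries in O(1) time on a compressed representation.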
5
Size issue
  • Given a bit array B of length n with m ones
  • A bit array representation requires n bits
  • This is much larger than the optimal one when
    $m \ll n$
  • B is called sparse if $m \ll n$ and dense if $m \approx n/2$
  • The lower bound of the size of B is $\lceil \lg \binom{n}{m} \rceil$ bits
  • This can be approximated by $nH_0(B)$
  • $H_0(B) = -p \lg p - (1-p) \lg (1-p) \le 1$ is the 0-th order empirical entropy of B
  • When $m \ll n$, $nH_0(B)$ is close to $m \lg (n/m) + 1.44m$
  • E.g. $m = n/64$: $nH_0(B)/n \approx 0.12$

where $p = m/n$ and $\lg = \log_2$
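The figures on this slide can be reproduced numerically. The sketch below assumes the standard definition of the 0-th order empirical entropy of a bit array with density p = m/n; the concrete n is an arbitrary choice for illustration.

```python
import math


def empirical_entropy(n, m):
    """0-th order empirical entropy H0 of a length-n bit array with m ones."""
    if m == 0 or m == n:
        return 0.0
    p = m / n
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def lower_bound_bits(n, m):
    """ceil(lg C(n, m)): bits needed to tell apart all arrays with m ones."""
    return (math.comb(n, m) - 1).bit_length()


n = 64 * 1000          # arbitrary length chosen for the illustration
m = n // 64            # the slide's example density m = n/64
h0 = empirical_entropy(n, m)
approx = m * math.log2(n / m) + 1.44 * m

print(round(h0, 3))                 # bits per position, about 0.12
print(lower_bound_bits(n, m))       # exact information-theoretic bound
print(round(n * h0), round(approx)) # nH0 versus the sparse approximation
```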
6
Existing implementation of Rank/Select dictionary
  • Theoretical study
  • O(1) time, $n + o(n)$ bits [J. I. Munro 96]
  • O(1) time, $nH_0(B) + o(n)$ bits [R. Raman et al. 02]
  • Practical implementation
  • O(1) time, $n + o(n)$ bits [Kim et al. 05] [R.
    Gonzalez et al. 05]
  • o(n) time, $\mathrm{gap}(B) + o(n)$ bits [A. Gupta et al. 06]
  • $\mathrm{gap}(B) = \sum_i \lg (\mathrm{select}(i+1, B) - \mathrm{select}(i, B))$
  • Our result: O(1) time, $nH_0(B) + o(n)$ bits in
    practice
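As a rough illustration of the gap measure, the sketch below sums lg of the distance between consecutive 1s. How the first element is handled is not visible in the transcript, so this version only sums over consecutive pairs, which is an assumption.

```python
import math


def gap_measure(ones):
    """Sum of lg(distance) over consecutive 1 positions (sorted, distinct)."""
    return sum(math.log2(b - a) for a, b in zip(ones, ones[1:]))


print(gap_measure([1, 3, 4, 7, 10]))  # gaps 2, 1, 3, 3
print(gap_measure([5, 6, 7, 8]))      # a run of adjacent 1s costs 0
```

Clustered 1s give a small gap(B) even when the array is not sparse, which is why the gap measure and $nH_0(B)$ can differ widely.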

7
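For practical entropy-compressed Rank/Select dictionary
(Figure: B partitioned into blocks, with rank directory R and pointer information P)
Rank directory: R0 = 0, R1 = 9, R2 = 13, R3 = 14, R4 = 19
Blocks: 011111111100 011100100000 000000100000 010101100001 010100000001
Pointer information: P0 = 0, P1 = 8, P2 = 12, P3 = 16, P4 = 22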
  • Basic implementation: O(1) time, $nH_0(B) + o(n)$
    bits
  • Partition B into blocks, then compress each block
    with the enumerative code [T. Cover 73].
  • Store a rank directory (results of rank) at the
    boundaries of blocks
  • Problem
  • Since compressed sizes are not constant, we need
    the pointer information (the o(n) term), whose size
    is as large as $nH_0(B)$ bits
  • Solution
  • Estimate the pointer information: esp
  • Convert a sparse array into dense ones: recrank
  • Select-oriented data structures: vcode, sdarray
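A minimal sketch of enumerative coding for a single block, assuming the usual lexicographic ranking among all bit strings of the same length and the same number of 1s. The function names are hypothetical and the talk's actual implementation may differ, for instance by using lookup tables.

```python
from math import comb


def enum_encode(block):
    """Return (k, index): k ones, and the rank of block among all
    length-t strings with k ones, in lexicographic order."""
    t = len(block)
    k = block.count("1")
    index, remaining = 0, k
    for j, bit in enumerate(block):
        if bit == "1":
            # Skip every string that has 0 here and `remaining` ones later.
            index += comb(t - j - 1, remaining)
            remaining -= 1
    return k, index


def enum_decode(t, k, index):
    """Inverse of enum_encode for a block of length t with k ones."""
    bits, remaining = [], k
    for j in range(t):
        zeros_first = comb(t - j - 1, remaining)
        if index >= zeros_first:
            bits.append("1")
            index -= zeros_first
            remaining -= 1
        else:
            bits.append("0")
    return "".join(bits)


def code_length(t, k):
    """ceil(lg C(t, k)) bits are enough to store the index."""
    return (comb(t, k) - 1).bit_length()


block = "011111111100"
k, index = enum_encode(block)
print(k, index, code_length(len(block), k))
print(enum_decode(len(block), k, index) == block)
```

Because the code length depends on the number of 1s in each block, the compressed blocks have variable size, which is what makes the pointer information necessary.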

8
Summary of our data structures
o(n) terms in sarray and darray are O(1) in
almost all cases
9
Method 1: ESP (EStimation of Pointer information)
  • Idea: Don't store pointer information, but
    estimate it from rank information
  • Define L(B) to be the length of the code word for B
    using the enumerative code; then Prop. 1 holds
  • Let $B_i$ ($i = 1, \ldots, n/u$) be the partition of B; then Prop. 2 holds
  • Since $H_0(B)$ can be calculated from the rank directory,
    we can estimate the pointer information

(Prop. 1 and Prop. 2 are displayed as formulas on the slide)
10
Example of ESP
  • Estimate the pointer information using the rank
    directory

Rank directory: R0 = 0, R1 = 9, R2 = 13, R3 = 14, R4 = 19
Blocks: 011111111100 011100100000 000000100000 010101100001 010100000001
Each block is compressed by the enumerative code
11
Method 2: recrank (Recursive Rank)
  • Idea: Reduce a sparse bit array
    into denser bit arrays recursively
  • Partition B into blocks $B_0, \ldots, B_k$ of length s
  • Define $B_c$ and $B_e$ as
  • $B_c[0 \ldots k]$: $B_c[i] = 0$ if $B_i$ is all 0, $B_c[i] = 1$
    otherwise
  • $B_e$: the concatenation of all nonzero blocks of B, in
    order.

12
recrank cont.
  • Choose the block size $s = -1/\lg(1 - m/n)$ so that
    $B_c$ would be dense (a block is then all 0 with
    probability about 1/2).
  • Use $B_e$ as the new input array: $B_2 = B_e$. Apply the
    reduction to $B_i$ ($i \ge 2$) recursively
  • Store $B_{c1}, B_{c2}, \ldots, B_{ct}, B_t$.

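(Figure: recursive reduction from the sparse input to dense arrays)
B = B1 (sparse): 01000 00110 00000 00001
Bc1 = 1101
B2 = Be1: 01000 00110 00001
Repeating the reduction yields Bc2, ..., Bct and Bt (dense)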
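The reduction can be sketched as follows, using Python strings as bit arrays for readability. The input is the array from the "Example of recrank" slide later in the deck; the block sizes 5 and 2 are read off that slide's grouping, and `rank_via_reduction` is a hypothetical helper showing how one level of rank could be answered from $B_c$ and $B_e$ alone.

```python
def reduce_once(bits, s):
    """Split bits into blocks of length s and return (Bc, Be)."""
    blocks = [bits[i:i + s] for i in range(0, len(bits), s)]
    bc = "".join("1" if "1" in blk else "0" for blk in blocks)
    be = "".join(blk for blk in blocks if "1" in blk)
    return bc, be


def rank_via_reduction(bc, be, s, x):
    """Number of 1s in B[0..x], computed from (Bc, Be) only."""
    j = x // s                        # block containing position x
    nonzero_before = bc[:j].count("1")
    if bc[j] == "1":
        end = nonzero_before * s + x % s
    else:
        end = nonzero_before * s - 1  # block j contributes nothing
    return be[:end + 1].count("1")


B = "01000" "00110" "00000" "00000" "00000" "00001" "00"
bc1, be1 = reduce_once(B, 5)
bc2, be2 = reduce_once(be1, 2)
print(bc1, be1)
print(bc2, be2)
print(rank_via_reduction(bc1, be1, 5, 8))  # 1s in B[0..8]
```

In the real structure the counts over $B_c$ and $B_e$ would themselves be answered by rank dictionaries on those denser arrays rather than by `str.count`.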
13
Method 3: vcode (Vertical code)
  • vcode stores results of select using gap coding
  • $d_i = \mathrm{select}(i+1, B) - \mathrm{select}(i, B) - 1$
  • Idea: The gap sequence is laid out by bit position (all k-th bits together)
    so that all operations are always byte-aligned.
  • Define $d_i[k]$ as the k-th bit of $d_i$, and
    $V_k[i] = d_i[k]$

Byte-aligned if t is a multiple of 8
14
Example of vcode
We convert the original bit array B into the gap
sequence d and then convert it to the bit arrays V
B = 0010000010011100100001000001
Select(5, B) = (1 << 0) + (3 << 1) + (1 << 2) + 5 = 16
popcount(V0 & ((1 << 5) - 1))
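A minimal single-block sketch of the vertical layout, assuming each gap is the number of 0s preceding a 1 and that select takes a 0-based index i (this convention reproduces the value 16 for i = 5 on the slide). Each $V_k$ is held in a Python integer whose bit j is the k-th bit of $d_j$; the function names are hypothetical.

```python
def build_vcode(bits):
    """Return the list V, where V[k] has bit j set iff bit k of gap d_j is 1."""
    gaps, run = [], 0
    for bit in bits:
        if bit == "1":
            gaps.append(run)  # number of 0s preceding this 1
            run = 0
        else:
            run += 1
    width = max(max(gaps).bit_length(), 1)  # assumes at least one 1 in bits
    V = [0] * width
    for j, d in enumerate(gaps):
        for k in range(width):
            if (d >> k) & 1:
                V[k] |= 1 << j
    return V


def popcount(x):
    return bin(x).count("1")


def vcode_select(V, i):
    """Position of the 1 with 0-based index i."""
    mask = (1 << (i + 1)) - 1  # keep gaps d_0 .. d_i
    gap_sum = sum(popcount(v & mask) << k for k, v in enumerate(V))
    return gap_sum + i         # i ones precede the answer


B = "0010000010011100100001000001"
V = build_vcode(B)
print([popcount(v & 0b111111) for v in V])  # per-bit-plane counts for d_0..d_5
print(vcode_select(V, 5))
```

The sum of gaps is recovered as one popcount per bit plane, so the cost depends on the width of the gaps rather than on how many gaps are summed.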
15
Method 4: SDarray (Sparse array, Dense array)
  • Idea: Use two different techniques to treat
    sparse arrays and dense arrays separately
  • This enables us to design the data structures
    simply
  • The sparse array uses the dense array as a part of its data
    structure

16
Method 4 (1): Sparse array
(Figure: $x_i = 10001010100100_2$, split into upper and lower bits)
  • Let $x[0 \ldots m-1]$ s.t. $x_i = \mathrm{select}(i+1, S)$
  • Each x is divided into upper $z = \lg t$ bits and
    lower $w = \lg (n/t)$ bits, for $t = 1.44m$
  • Lower bits are stored explicitly in $L[0 \ldots m-1]$
    using $mw = m \lg (n/1.44m)$ bits
  • Upper bits are stored in H using unary coding of
    gaps, using $m + t = 2.44m$ bits
  • The total size is $1.92m + m \lg(n/m)$ bits
  • $\mathrm{select}(i, B) = (\mathrm{select}_1(i, H) - i) \cdot 2^w + L[i]$
  • rank(i, B) uses $\mathrm{select}_0(\lfloor i/2^w \rfloor, H)$ to find the
    smallest element which is greater than $\lfloor i/2^w \rfloor \cdot 2^w$
  • select(i, H) is calculated by the dense array because
    H is dense
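The upper/lower split can be sketched as below, using the positions from the "Example of sarray" slide later in the deck with w = 1 lower bit. The linear-scan `select1` is only a stand-in for the dense array, and all function names are hypothetical.

```python
def build_sarray(ones, w):
    """ones: sorted positions of 1s; w: number of lower bits kept verbatim."""
    L = [x & ((1 << w) - 1) for x in ones]
    H, prev_upper = [], 0
    for x in ones:
        upper = x >> w
        H.extend([0] * (upper - prev_upper))  # unary-coded gap of upper parts
        H.append(1)
        prev_upper = upper
    return H, L


def select1(H, i):
    """Position in H of the 1 with 0-based index i (stand-in for darray)."""
    seen = -1
    for pos, bit in enumerate(H):
        seen += bit
        if bit and seen == i:
            return pos
    raise IndexError(i)


def sarray_select(H, L, w, i):
    """x_i = (select1(i, H) - i) * 2^w + L[i], with 0-based i."""
    return ((select1(H, i) - i) << w) + L[i]


x = [1, 4, 7, 18, 24, 26, 30, 31]
H, L = build_sarray(x, w=1)
print("".join(map(str, H)))
print("".join(map(str, L)))
print([sarray_select(H, L, 1, i) for i in range(len(x))])
```

Subtracting i from the position of the i-th 1 in H counts the 0s before it, which is exactly the upper part of $x_i$.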
17
Method 4 (2) Dense Array
  • Partition B into blocks such that each block
    contains L ones
  • Store select-dictionary (results of select) at
    each boundary of blocks
  • Since the length of a block is unbounded, we
    store all positions explicitly if the block is
    long
  • We can perform select0 on the same data structure
    by reversing bits in B at reading time
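One way to read the scheme above is the following sketch: the 1s are grouped into blocks of a fixed number of 1s, a short block stores only its starting position and is scanned, and a long block stores every position. The parameter names and the threshold value are assumptions for illustration, not values from the talk.

```python
def build_darray(bits, ones_per_block=4, max_block_len=16):
    """Group the 1s into blocks; long blocks keep all positions explicitly."""
    ones = [pos for pos, b in enumerate(bits) if b == "1"]
    blocks = []
    for start in range(0, len(ones), ones_per_block):
        chunk = ones[start:start + ones_per_block]
        if chunk[-1] - chunk[0] + 1 > max_block_len:
            blocks.append(("explicit", chunk))    # positions stored as-is
        else:
            blocks.append(("scan", chunk[0]))     # only the block start
    return {"bits": bits, "blocks": blocks, "ones_per_block": ones_per_block}


def darray_select(da, i):
    """Position of the 1 with 0-based index i."""
    kind, payload = da["blocks"][i // da["ones_per_block"]]
    offset = i % da["ones_per_block"]
    if kind == "explicit":
        return payload[offset]
    pos = payload                                 # bounded scan in a short block
    while True:
        if da["bits"][pos] == "1":
            if offset == 0:
                return pos
            offset -= 1
        pos += 1


B = "01001001000000000010000010100011"
da = build_darray(B)
print([kind for kind, _ in da["blocks"]])
print([darray_select(da, i) for i in range(B.count("1"))])
```

The scan in a short block touches at most `max_block_len` bits, so select time stays bounded regardless of how the 1s are distributed.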

18
Experimental Setup
  • Methods
  • esp (method 1)
  • rec (method 2)
  • vc (method 3)
  • sa (method 4 sparse array)
  • da (method 4 dense array)
  • Kim (Kim 05)
  • Kim2 (our re-implementation of Kim 05)
  • Navarro (R. Gonzalez 05)
  • Entropy denotes the 0-th order empirical entropy
  • All results use a bit array of length $10^7$
  • The positions of the ones are determined randomly.

19
Experimental Result: Size
20
Experimental Result: Size (Ratio of 1: 1, 5)
The values are the size of each data structure as a
percentage of the size of the original bit
array
21
Experimental Result: Rank
22
Experimental Result: Select
23
Conclusion
  • We propose four novel data structures
  • For sparse bit arrays, sarray is the smallest and
    fastest (indeed close to $nH_0$)
  • esp and recrank are small under all conditions
  • vcode would be small if gap(B) is small, c.f. $\Psi$
    in Compressed Suffix Arrays
  • Easy to implement
  • Question: Can we perform a fast rank operation in an
    entropy-compressed Rank/Select dictionary?

24
Thank you for your attention
26
Preliminary: Succinct data structure
  • Represent an object from a universe with
    cardinality L by $(1 + o(1)) \log L$ bits
  • e.g. an ordinal tree with n nodes is encoded in a
    bit array of length 2n
  • The Rank/Select dictionary is a fundamental tool for
    succinct data structures
  • e.g. pointer information can be represented in a
    sparse bit array

27
Summary of our data structures
See the theoretical numbers in the paper
28
Example of recrank
B (blocks of 5 bits): 01000 00110 00000 00000 00000 00001 00
Bc1 = 1100010
Be1 = 01000 00110 00001
B2 = Be1 (blocks of 2 bits): 01 00 00 01 10 00 00 1
Bc2 = 10011001
Be2 = 01 01 10 1 = 0101101
  • We recursively apply the reduction so that the final
    bit arrays (Bc1, Bc2, Be2) are all dense

29
Example of vcode
We convert the original bit array B into the gap
sequence and then convert it to the bit arrays V
and T
B = 0100110011100100001000001100011
Select (6, S) 9 (0 << 0) (1 << 1) (1 << 2) 12
30
Example of sarray
B = 01001001000000000010000010100011
x[0...7] = 1, 4, 7, 18, 24, 26, 30, 31
Each x_i split into z = 4 upper bits and 1 lower bit:
  x_i   upper  lower
  1     0000   1
  4     0010   0
  7     0011   1
  18    1001   0
  24    1100   0
  26    1101   0
  30    1111   0
  31    1111   1
L = 10100001
H = 10010100000010001010011
31
Experimental Result: Close up result of Size