Title: Modeling Delta Encoding of Compressed Files
1 Modeling Delta Encoding of Compressed Files
- S.T. Klein, T.C. Serebro, D. Shapira
2 Delta Encoding
- Example
- S = The Prague Stringology Club
- T = The Prague Stringology Conference 06
- Δ(S,T) = (1, 24)onferenc(3, 2)06
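The example can be reproduced with a small greedy encoder. This is only a sketch of one simple way to obtain such a delta, not necessarily the authors' method: the names `delta_encode` and `delta_decode`, the 1-based positions, and the minimum copy length of 2 are assumptions made for illustration.

```python
def delta_encode(s, t, min_len=2):
    """Greedy delta of t against s: (pos, len) copies from s (1-based) or literal strings."""
    ops, i = [], 0
    while i < len(t):
        n = 0
        # Extend the match while the longer prefix of t[i:] still occurs somewhere in s.
        while i + n < len(t) and t[i:i + n + 1] in s:
            n += 1
        if n >= min_len:
            ops.append((s.find(t[i:i + n]) + 1, n))  # first occurrence in s
            i += n
        elif ops and isinstance(ops[-1], str):
            ops[-1] += t[i]                          # extend the current run of literals
            i += 1
        else:
            ops.append(t[i])                         # start a new run of literals
            i += 1
    return ops


def delta_decode(s, ops):
    """Rebuild t from s and a delta produced by delta_encode."""
    return "".join(
        op if isinstance(op, str) else s[op[0] - 1:op[0] - 1 + op[1]]
        for op in ops
    )


S = "The Prague Stringology Club"
T = "The Prague Stringology Conference 06"
delta = delta_encode(S, T)
print(delta)  # [(1, 24), 'onferenc', (3, 2), '06']
```

Decoding the result with `delta_decode(S, delta)` gives back T.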
3 Compressed Differencing
- Delta encoding: S, T → Δ(S,T)
- Semi Compressed Differencing: E(S), T → Δ(S,T)
- Full Compressed Differencing: E(S), E(T) → Δ(S,T)
- Goal: create a delta file of S and T without decompressing the compressed files.
4 LZW compression
- STR ← input character
- WHILE there are input characters
  - C ← input character
  - IF STR·C is in the dictionary T THEN
    - STR ← STR·C
  - ELSE
    - output the code for STR
    - add STR·C to T
    - STR ← C
- output the code for STR
5 Example
- S = abccbaaabccba
- E(S) = 1233219571
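The LZW pseudocode can be sketched in Python as follows. The function name `lzw_encode` and the numbering of the initial alphabet (a = 1, b = 2, c = 3) are assumptions. That numbering was chosen because it reproduces E(S) = 1233219571 for S = abccbaaabccba and E(T) = 33221247957 for T = ccbbabccbabccbba, the strings used in the later examples.

```python
def lzw_encode(text, alphabet):
    """LZW-encode text; single characters get codes 1..len(alphabet)."""
    table = {ch: i + 1 for i, ch in enumerate(alphabet)}
    if not text:
        return []
    out = []
    cur = text[0]                          # STR <- first input character
    for c in text[1:]:
        if cur + c in table:               # STR.C is already a phrase
            cur += c
        else:
            out.append(table[cur])         # output the code for STR
            table[cur + c] = len(table) + 1  # add STR.C with the next free code
            cur = c
    out.append(table[cur])                 # output the code for the last STR
    return out


print(lzw_encode("abccbaaabccba", "abc"))     # [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
print(lzw_encode("ccbbabccbabccbba", "abc"))  # [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7]
```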
6 Semi Compressed Differencing Algorithm
7 Example
- E(S) = 1233219571, T = ccbbabccbabccbba
- Δ(S,T) = (3,2) b (5,2) (9,3) (5,2) (9,3) b (5,2)
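The slides do not list the steps of the semi compressed algorithm, so the following is only a guess at one possible reading. It rebuilds the LZW phrases of S from E(S), remembers where each phrase first occurs in S, and then parses T greedily by longest phrase. The helper names `lzw_phrases` and `semi_delta`, the alphabet numbering a = 1, b = 2, c = 3, and the choice to emit unmatched characters as literals are all assumptions. Under these assumptions the sketch yields the pairs of the example, in the order that rebuilds T.

```python
def lzw_phrases(codes, alphabet):
    """Map every multi-character LZW phrase of S to (pos, len), pos being 1-based in S."""
    strings = {i + 1: ch for i, ch in enumerate(alphabet)}
    where = {}
    pos, prev = 1, None
    for code in codes:
        # A code that is not yet known refers to the phrase being defined right now.
        cur = strings[code] if code in strings else prev + prev[0]
        if prev is not None:
            phrase = prev + cur[0]
            strings[len(strings) + 1] = phrase
            where[phrase] = (pos - len(prev), len(phrase))
        pos += len(cur)
        prev = cur
    return where


def semi_delta(where, t):
    """Parse t greedily into phrases of S (as (pos, len)) and literal characters."""
    out, i = [], 0
    longest = max((length for _, length in where.values()), default=1)
    while i < len(t):
        for n in range(min(longest, len(t) - i), 1, -1):
            hit = where.get(t[i:i + n])
            if hit:
                out.append(hit)
                i += n
                break
        else:
            out.append(t[i])   # no phrase of length >= 2 starts here
            i += 1
    return out


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
print(semi_delta(lzw_phrases(E_S, "abc"), "ccbbabccbabccbba"))
# [(3, 2), 'b', (5, 2), (9, 3), (5, 2), (9, 3), 'b', (5, 2)]
```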
8 Full Compressed Differencing Algorithm
- 1 construct the trie of E(S)
- 2 flag ← 0 // output character k
- 3 counter ← 1 // position in T
- 4 input oldcw from E(T)
- 5 while oldcw ≠ NULL // still processing E(T)
  - 5.1 input cw from E(T)
  - 5.2 node ← Dictionary[oldcw]
  - 5.3 if (Dictionary[cw] ≠ NULL)
    - 5.3.1 k ← first character of string corresponding to Dictionary[cw]
  - 5.4 else
    - 5.4.1 k ← first character of string corresponding to node
  - 5.5 if ((node has a child k) and (cw ≠ NULL))
    - 5.5.1 output (pos+flag, len-flag) corresponding to child k of node
    - 5.5.2 flag ← 1
  - 5.6 else
    - 5.6.1 output (pos+flag, len-flag) corresponding to node
    - 5.6.2 create a new child of node corresponding to k
    - 5.6.3 flag ← 0
  - 5.7 oldcw ← cw
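The steps above can be sketched in Python. This is one reading of the pseudocode, not the authors' implementation, and it rests on several assumptions: `<pos, len>` is taken as a pointer into S and `(pos, len)` as a pointer into the part of T already produced; nodes are represented by their strings rather than by trie nodes; the trie of S is kept fixed; zero-length outputs are dropped; and a single character with no recorded position is emitted as a literal. The names `lzw_phrases` and `full_delta` are hypothetical. With these choices the sketch produces the pairs that the example slides end with.

```python
def lzw_phrases(codes, alphabet):
    """Map every multi-character LZW phrase of S to (pos, len), pos being 1-based in S."""
    strings = {i + 1: ch for i, ch in enumerate(alphabet)}
    where = {}
    pos, prev = 1, None
    for code in codes:
        cur = strings[code] if code in strings else prev + prev[0]
        if prev is not None:
            phrase = prev + cur[0]
            strings[len(strings) + 1] = phrase
            where[phrase] = (pos - len(prev), len(phrase))
        pos += len(cur)
        prev = cur
    return where


def full_delta(codes_s, codes_t, alphabet):
    s_trie = lzw_phrases(codes_s, alphabet)          # phrase -> (pos, len) in S
    # Dictionary for E(T): code -> (phrase, position in T or None for the alphabet)
    t_dict = {i + 1: (ch, None) for i, ch in enumerate(alphabet)}
    out, flag, counter = [], 0, 1
    for i, oldcw in enumerate(codes_t):
        cw = codes_t[i + 1] if i + 1 < len(codes_t) else None
        node, node_pos = t_dict[oldcw]
        # Steps 5.3/5.4: k is the first character of the next phrase, else of node.
        k = t_dict[cw][0][0] if cw in t_dict else node[0]
        child = node + k
        if cw is not None and child in s_trie:       # step 5.5
            pos, length = s_trie[child]
            out.append(("S", pos + flag, length - flag))
            flag = 1
        else:                                        # step 5.6
            length = len(node) - flag
            if length > 0 and node_pos is None:
                out.append(node)                     # literal character (assumption)
            elif length > 0:
                out.append(("T", node_pos + flag, length))
            flag = 0
        if cw is not None:
            t_dict[len(t_dict) + 1] = (child, counter)  # new phrase of T starts at counter
        counter += len(node)
    return out


E_S = [1, 2, 3, 3, 2, 1, 9, 5, 7, 1]
E_T = [3, 3, 2, 2, 1, 2, 4, 7, 9, 5, 7]
for item in full_delta(E_S, E_T, "abc"):
    print(item)
```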
9 Example
- E(S) = 1233219571, E(T) = 33221247957
10 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = c, oldcw = 3
- T = cc, oldcw = 3, cw = 3
- T = cc, oldcw = 3, cw = 3, k = c
11 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = cc, oldcw = 3, cw = 3, k = c
- Δ(S,T) = <3, 2>
- Dictionary: 4 (1,2,c)
12 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = cc, oldcw = 3, cw = 3, flag = 1, k = c
- T = ccb, oldcw = 3, cw = 3, flag = 1, k = c
- Δ(S,T) = <3, 2>
- Dictionary: 4 (1,2,c)
13 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccb, oldcw = 3, cw = 2, flag = 1, k = c
- T = ccb, oldcw = 3, cw = 2, flag = 1, k = b
- Δ(S,T) = <3, 2> <5, 1>
- Dictionary: 4 (1,2,c), 5 (2,2,c)
14 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccb, oldcw = 3, cw = 2, flag = 1, k = b
- T = ccbb, oldcw = 3, cw = 2, flag = 1, k = b
- T = ccbb, oldcw = 2, cw = 2, flag = 1, k = b
- T = ccbb, oldcw = 2, cw = 2, flag = 0, k = b
- Δ(S,T) = <3, 2> <5, 1> <b, 0>
- Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b)
15 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccbb, oldcw = 2, cw = 2, flag = 0, k = b
- T = ccbba, oldcw = 2, cw = 2, flag = 0, k = b
- T = ccbba, oldcw = 2, cw = 1, flag = 0, k = a
- T = ccbba, oldcw = 2, cw = 1, flag = 1, k = a
- Δ(S,T) = <3, 2> <5, 1> <5, 2>
- Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b)
16 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccbba, oldcw = 2, cw = 1, flag = 1, k = a
- T = ccbbab, oldcw = 2, cw = 1, flag = 1, k = a
- T = ccbbab, oldcw = 1, cw = 2, flag = 1, k = b
- Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1>
- Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a)
17 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccbbabcc, oldcw = 2, cw = 4, flag = 1, k = c
- Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1>
- Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a), 9 (6,2,b)
18 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccbbabccba, oldcw = 4, cw = 7, flag = 1, k = b
- T = ccbbabccba, oldcw = 4, cw = 7, flag = 0, k = b
- Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1)
- Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a), 9 (6,2,b)
19 Example
- E(S) = 1233219571, E(T) = 33221247957, S = abccbaaabccba
- T = ccbbabccbabc, oldcw = 7, cw = 9, flag = 0, k = b
- T = ccbbabccbabccb, oldcw = 9, cw = 5, flag = 0, k = c
- T = ccbbabccbabccb, oldcw = 9, cw = 5, flag = 1, k = c
- T = ccbbabccbabccbba, oldcw = 5, cw = 7, flag = 1, k = b
- T = ccbbabccbabccbba, oldcw = 7, cw = NULL, flag = 0, k = b
- Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)
- Dictionary: 4 (1,2,c), 5 (2,2,c), 6 (3,2,b), 7 (4,2,b), 8 (5,2,a), 9 (6,2,b), 10 (7,3,c), 12 (11,3,b)
20 Combination of Pairs
- If two consecutive ordered pairs are of the form <pos, len1> and <pos + len1, len2>, we combine them into a single ordered pair <pos, len1 + len2>.
- Δ(S,T) = <3, 2> <5, 1> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)
- <3, 2> <5, 1> → <3, 3>
- S = abccbaaabccba
21 Combination of Pairs
- If two consecutive ordered pairs are of the form <pos, len1> and <pos + len1, len2>, we combine them into a single ordered pair <pos, len1 + len2>.
- Δ(S,T) = <3, 3> <5, 2> <2, 1> <3, 1> (2, 1) (4, 2) <9, 3> (3, 1) (4, 2)
- <2, 1> <3, 1> → <2, 2>
- S = abccbaaabccba
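The combination step can be sketched in Python. The rule used here is inferred from the two merges shown on the slides, <3, 2> <5, 1> into <3, 3> and <2, 1> <3, 1> into <2, 2>. The tagged tuples `("S", pos, len)` and `("T", pos, len)` and the name `combine_pairs` are assumptions. The slides only show merges of `<pos, len>` pairs, so this sketch merges only those and leaves `(pos, len)` pairs untouched.

```python
def combine_pairs(delta):
    """Merge adjacent S-pointers that cover one contiguous stretch of S."""
    out = []
    for item in delta:
        prev = out[-1] if out else None
        if (isinstance(item, tuple) and isinstance(prev, tuple)
                and item[0] == prev[0] == "S"
                and item[1] == prev[1] + prev[2]):      # item starts where prev ends
            out[-1] = ("S", prev[1], prev[2] + item[2])
        else:
            out.append(item)
    return out


delta = [("S", 3, 2), ("S", 5, 1), ("S", 5, 2), ("S", 2, 1), ("S", 3, 1),
         ("T", 2, 1), ("T", 4, 2), ("S", 9, 3), ("T", 3, 1), ("T", 4, 2)]
print(combine_pairs(delta))
# [('S', 3, 3), ('S', 5, 2), ('S', 2, 2), ('T', 2, 1), ('T', 4, 2),
#  ('S', 9, 3), ('T', 3, 1), ('T', 4, 2)]
```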
22 Encoding the delta file
File consists of:
- (pos, len) in S
- (pos, len) in T
- Characters
- Flags
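To illustrate how the three kinds of items fit together, here is a minimal decoder sketch. The slides do not give the actual file layout, so the in-memory representation is an assumption: a tuple `("S", pos, len)` or `("T", pos, len)`, whose first field plays the role of the flag, and a plain string for literal characters. Positions are assumed to be 1-based, and a pointer into T refers to the part of T rebuilt so far.

```python
def apply_delta(s, delta):
    """Rebuild T from S and a list of S-pointers, T-pointers and literal characters."""
    t = []
    for item in delta:
        if isinstance(item, str):
            t.extend(item)                        # literal characters
            continue
        source, pos, length = item
        if source == "S":
            t.extend(s[pos - 1:pos - 1 + length])
        else:
            # Copy character by character, so a copy may overlap its own output.
            for j in range(pos - 1, pos - 1 + length):
                t.append(t[j])
    return "".join(t)


S = "abccbaaabccba"
delta = [("S", 3, 3), ("S", 5, 2), ("S", 2, 2), ("T", 2, 1), ("T", 4, 2),
         ("S", 9, 3), ("T", 3, 1), ("T", 4, 2)]
print(apply_delta(S, delta))  # ccbbabccbabccbba
```

Applied to the combined pairs of the running example, the decoder returns T = ccbbabccbabccbba.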
23 Experiments
- S = xfig.3.2.1
- T = xfig.3.2.2
- T: 812K
- Gzip(T): 325K
- LZW(T): 497K
- Δ(S,T) ≈ 3K