Title: Suffix arrays
Suffix array
- Compared with a suffix tree, we lose some of the functionality but we save space.
Let s = abab.
Sort the suffixes lexicographically: ab, abab, b, bab.
The suffix array gives the indices of the suffixes in sorted order:
2 0 3 1
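The definition translates directly into code: sort the start indices by the suffixes they begin. The Python sketch below (the function name is ours) is only a reference for checking results, since every comparison can cost O(n) and every key copies a suffix.

```python
def naive_suffix_array(s):
    """Return the start indices of the suffixes of s in lexicographic order."""
    # Each key s[i:] copies the suffix, so this is only suitable for small inputs.
    return sorted(range(len(s)), key=lambda i: s[i:])


print(naive_suffix_array("abab"))  # [2, 0, 3, 1]: ab, abab, b, bab
```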
How do we build it?
- Build a suffix tree
- Traverse the tree in DFS order, taking the outgoing edges of each node in lexicographic order, and fill in the suffix array with the leaves in the order they are reached.
- O(n) time
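To show the DFS step without building a full suffix tree, the sketch below uses an uncompressed suffix trie instead. It has the same leaves in the same lexicographic order, but it can take O(n^2) space, so it illustrates the traversal only, not the O(n) bound. The end marker `\0` is our assumption: it must sort before every character of the text and must not occur in it.

```python
END = "\0"  # assumed end marker: sorts before every character and must not occur in s


def suffix_array_via_trie(s):
    """Fill a suffix array by a lexicographic DFS over an uncompressed suffix trie."""
    root = {}
    for i in range(len(s)):
        node = root
        for c in s[i:]:
            node = node.setdefault(c, {})
        node[END] = i  # the suffix s[i:] ends here: store its start index as a leaf

    sa = []

    def dfs(node):
        for c in sorted(node):  # children in lexicographic order, END first
            if c == END:
                sa.append(node[c])  # reached a leaf: emit the next suffix index
            else:
                dfs(node[c])

    dfs(root)
    return sa


print(suffix_array_via_trie("abab"))  # [2, 0, 3, 1]
```

A suffix tree compresses each chain of single-child nodes into one edge, which is what brings the size down to O(n); the DFS order of the leaves is unchanged.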
How do we search for a pattern?
- If P occurs in T, then all its occurrences are consecutive in the suffix array.
- Do a binary search on the suffix array.
- Takes O(m log n) time, where m = |P| and n = |T|
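A sketch of the plain binary search, assuming the suffix array `sa` is already built (names are ours). Because the suffixes that start with P are consecutive, two binary searches delimit them: the first suffix whose length-m prefix is >= P, and the first whose prefix is > P. Each step compares at most m characters, which gives the O(m log n) bound.

```python
def find_occurrences(t, sa, p):
    """Return the sorted start positions of p in t, using binary search on sa."""
    m = len(p)

    def boundary(skip_equal):
        # First index whose m-character prefix is >= p (or > p when skip_equal).
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            head = t[sa[mid]:sa[mid] + m]  # one O(m) comparison per step
            if head < p or (skip_equal and head == p):
                lo = mid + 1
            else:
                hi = mid
        return lo

    # Suffixes starting with p form the contiguous block sa[first:last].
    first, last = boundary(False), boundary(True)
    return sorted(sa[first:last])


t = "mississippi"
sa = sorted(range(len(t)), key=lambda i: t[i:])
print(find_occurrences(t, sa, "issi"))  # [1, 4]
print(find_occurrences(t, sa, "issa"))  # []
```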
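Example
Let S = mississippi and P = issa.
The sorted suffixes of S, with L, M and R marking the left end, middle and right end of the search range:
i            L
ippi
issippi
ississippi
mississippi
pi           M
ppi
sippi
sissippi
ssippi
ssissippi    R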
How do we accelerate the search?
Maintain l = LCP(P, L)
Maintain r = LCP(P, R)
Assume l ≥ r (the case r > l is symmetric)
If l = r, then start comparing M to P at position l + 1
(every suffix between L and R, in particular M, shares its first min(l, r) characters with P)
If l > r:
Someone whispers LCP(L, M)
Case LCP(L, M) > l:
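Continue in the right half: L and P first differ at position l + 1, where L is smaller, and M has the same character as L there, so M < P; l is unchanged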
Case LCP(L, M) < l:
Continue in the left half: M first differs from L at position LCP(L, M) + 1, where M is larger and P still agrees with L, so M > P; set r = LCP(L, M)
Case LCP(L, M) = l:
Start comparing M to P at position l + 1
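The case analysis above can be sketched as follows. Assumptions: the function names are ours, the "whispered" LCP(L, M) and LCP(M, R) are computed on the fly by direct comparison (in the method they are precomputed so that each costs O(1)), and offsets are 0-based, so "start at position l + 1" becomes offset `l`. The function returns the first suffix-array index whose suffix is >= P; P occurs iff that suffix starts with P.

```python
def lcp_from(a, b, h=0):
    """Common-prefix length of a and b, given that their first h characters match."""
    while h < len(a) and h < len(b) and a[h] == b[h]:
        h += 1
    return h


def lower_bound(t, sa, p):
    """First index k such that t[sa[k]:] >= p (len(sa) if there is none)."""
    n = len(sa)
    if n == 0:
        return 0

    def suf(k):
        return t[sa[k]:]

    def compare(k, h):
        # Extend the match of p against suffix k from offset h.
        # Returns (LCP(p, suffix k), whether p is greater than suffix k).
        h = lcp_from(p, suf(k), h)
        greater = h < len(p) and (h == len(suf(k)) or p[h] > suf(k)[h])
        return h, greater

    l, greater = compare(0, 0)
    if not greater:
        return 0
    r, greater = compare(n - 1, 0)
    if greater:
        return n
    L, R = 0, n - 1  # invariant: suffix L < p <= suffix R, l = LCP(p, L), r = LCP(p, R)
    while R - L > 1:
        M = (L + R) // 2
        if l >= r:
            lm = lcp_from(suf(L), suf(M))  # LCP(L, M), computed directly in this sketch
            if lm > l:  # M agrees with L where L < p, so M < p; l is unchanged
                L = M
                continue
            if lm < l:  # M > p, and LCP(p, M) = LCP(L, M)
                R, r = M, lm
                continue
            h, greater = compare(M, l)  # lm == l: start at offset l
        else:  # symmetric case, using LCP(M, R)
            mr = lcp_from(suf(M), suf(R))
            if mr > r:
                R = M
                continue
            if mr < r:
                L, l = M, mr
                continue
            h, greater = compare(M, r)
        if greater:
            L, l = M, h
        else:
            R, r = M, h
    return R


t = "mississippi"
sa = sorted(range(len(t)), key=lambda i: t[i:])
k = lower_bound(t, sa, "issa")
print(k, t[sa[k]:].startswith("issa"))  # 2 False: issa would sit just before issippi
```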
Analysis
If we do more than a single character comparison in an iteration, then max(l, r) grows by 1 for each additional comparison. Since max(l, r) never decreases and never exceeds m, the search takes O(m + log n) time.
Construct the suffix array without the suffix tree
Linear time construction
Recursively?
Say we want to sort only the suffixes that start at even positions.
Change the alphabet
Every pair of characters is now a single character.
In effect, you sort the suffixes of a string that is shorter by a factor of 2!
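Change the alphabet
Names of the new characters:
a = 0, aa = 1, ab = 2, b = 3, ba = 4, bb = 5
(a single character names the last pair when the length is odd)
Example: abaaab = ab aa ab becomes 2 1 2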
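A small sketch of the renaming, assuming, as in the table, that a lone trailing character gets its own name, which sorts before every pair that starts with it. It then checks the claim that sorting the suffixes of the renamed string sorts the even suffixes of the original. Function names are ours, and `sorted` stands in for whatever sort one would really use.

```python
def pair_names(s):
    """Rename consecutive pairs of s (a lone last char if len(s) is odd) by rank."""
    alphabet = sorted(set(s))
    # The new alphabet: all single characters and all pairs, in sorted order.
    words = alphabet + [x + y for x in alphabet for y in alphabet]
    name = {w: k for k, w in enumerate(sorted(words))}
    return [name[s[i:i + 2]] for i in range(0, len(s), 2)]


def even_suffix_order(s):
    """Sort the even suffixes of s by sorting the suffixes of the renamed string."""
    r = pair_names(s)
    order = sorted(range(len(r)), key=lambda k: r[k:])
    return [2 * k for k in order]


print(pair_names("abaaab"))         # [2, 1, 2]
print(even_suffix_order("abaaab"))  # [2, 4, 0]: aaab < ab < abaaab
```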
But we do not gain anything
Divide into triples
Text: yabbadabbado (positions 0 to 11)
Triples starting at positions i ≡ 1 (mod 3), i.e. 1, 4, 7, 10: abb ada bba do
Triples starting at positions i ≡ 2 (mod 3), i.e. 2, 5, 8, 11: bba dab bad o
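Sort recursively 2/3 of the suffixes
Name each triple by its rank among the distinct triples:
abb = 1, ada = 2, bad = 3, bba = 4, dab = 5, do = 6, o = 7
Each sample suffix (i ≡ 1 or 2 mod 3) corresponds to a suffix of the name string
1 2 4 6 4 5 3 7
(the i ≡ 1 triples followed by the i ≡ 2 triples), so we sort the suffixes of this shorter string recursively.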
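The naming step can be sketched like this (names are ours). Two simplifications: the recursive call is replaced by a direct sort of the name string, and the dummy triple that the skew (DC3) algorithm of Kärkkäinen and Sanders appends when the length is ≡ 1 (mod 3) is left out. Without it the order can come out wrong for such lengths (for example aaaaaaa), so this sketch is only meant for other lengths, such as the 12-character example.

```python
def sample_order(s):
    """Order the suffixes at positions i % 3 != 0 via a string of triple names.

    Assumes len(s) % 3 != 1 (no dummy triple); a direct sort replaces the recursion.
    """
    n = len(s)
    positions = list(range(1, n, 3)) + list(range(2, n, 3))  # i = 1 (mod 3), then 2 (mod 3)
    triples = [s[i:i + 3] for i in positions]  # triples near the end may be shorter
    name = {tr: k + 1 for k, tr in enumerate(sorted(set(triples)))}
    reduced = [name[tr] for tr in triples]
    order = sorted(range(len(reduced)), key=lambda k: reduced[k:])  # stand-in for recursion
    return reduced, [positions[k] for k in order]


reduced, order = sample_order("yabbadabbado")
print(reduced)  # [1, 2, 4, 6, 4, 5, 3, 7]
print(order)    # [1, 4, 8, 2, 7, 5, 10, 11]
```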
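Sort the remaining third
Ranks of the sample suffixes, from the recursion:
position: 1  2  4  5  7  8  10 11
rank:     1  4  2  6  5  3  7  8
Represent each suffix i ≡ 0 (mod 3) by the pair (s[i], rank of suffix i + 1):
0: (y, 1)   3: (b, 2)   6: (a, 5)   9: (a, 7)
Sorting the pairs gives (a, 5), (a, 7), (b, 2), (y, 1), i.e. the suffixes 6, 9, 3, 0.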
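Given the sorted sample, the remaining third needs only one sort of pairs. The sketch below (names are ours) uses `sorted` on (character, rank) pairs where the algorithm uses a linear-time radix sort; rank 0 stands for the empty suffix when i + 1 runs past the end of the text.

```python
def sort_remaining_third(s, sorted_sample):
    """Sort the suffixes at positions i % 3 == 0, given the sorted sample suffixes."""
    rank = {p: r + 1 for r, p in enumerate(sorted_sample)}  # ranks start at 1
    # Key: (first character, rank of suffix i + 1); rank 0 means the empty suffix.
    return sorted(range(0, len(s), 3), key=lambda i: (s[i], rank.get(i + 1, 0)))


print(sort_remaining_third("yabbadabbado", [1, 4, 8, 2, 7, 5, 10, 11]))  # [6, 9, 3, 0]
```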
Merge
Merge the sorted sample suffixes with the sorted remaining suffixes, repeatedly moving the smaller head to the output:
sample suffixes:     1 4 8 2 7 5 10 11
remaining suffixes:  6 9 3 0
merged:              1 6 4 9 3 8 2 7 5 10 11 0
Summary
Suffix array of yabbadabbado: 1 6 4 9 3 8 2 7 5 10 11 0
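When comparing a suffix i ≡ 0 (mod 3) with a sample suffix j ≡ 1 (mod 3), we compare the first characters and break ties by the ranks of the following suffixes i + 1 and j + 1 (both are sample suffixes).
When comparing it with a sample suffix j ≡ 2 (mod 3), we compare the first characters, then the next characters if there is a tie, and finally the ranks of the suffixes i + 2 and j + 2 (again both sample suffixes).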
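The three steps (recursive sort of the sample, sort of the remaining third, merge with the comparison rules above) fit together as below. This is a sketch in the spirit of the skew (DC3) algorithm of Kärkkäinen and Sanders, with liberties: Python's `sorted` replaces the linear-time radix sorts, so it runs in O(n log n) rather than O(n), and the text is a list of non-negative integers (use `[ord(c) for c in s]` for a string). As in the published algorithm, a dummy sample position n is added when n ≡ 1 (mod 3), so that the block of i ≡ 1 triples always ends with a unique triple. Function names are ours.

```python
def skew_suffix_array(t):
    """Suffix array of a list of non-negative ints t, by the skew (DC3) recursion.

    Sketch only: sorted() stands in for the radix sorts, so this is O(n log n).
    """
    n = len(t)
    if n <= 2:  # tiny inputs: sort directly
        return sorted(range(n), key=lambda i: t[i:])

    def ch(i):  # character at i, or -1 (smaller than any character) past the end
        return t[i] if i < n else -1

    # 1. Sample positions i % 3 == 1, then i % 3 == 2; dummy position n if n % 3 == 1.
    ones = list(range(1, n + 1 if n % 3 == 1 else n, 3))
    sample = ones + list(range(2, n, 3))
    triples = [(ch(i), ch(i + 1), ch(i + 2)) for i in sample]
    name = {tr: k for k, tr in enumerate(sorted(set(triples)))}
    reduced = [name[tr] for tr in triples]

    # 2. Sort the sample suffixes: by name if all names differ, else recursively.
    if len(name) == len(sample):
        order = sorted(range(len(sample)), key=lambda k: reduced[k])
    else:
        order = skew_suffix_array(reduced)
    s12 = [sample[k] for k in order if sample[k] < n]  # drop the dummy
    rank = {p: r + 1 for r, p in enumerate(s12)}

    def rk(i):  # rank of a sample suffix; 0 for the empty suffix (i >= n)
        return rank.get(i, 0)

    # 3. Sort the remaining third by (first character, rank of suffix i + 1).
    s0 = sorted(range(0, n, 3), key=lambda i: (t[i], rk(i + 1)))

    # 4. Merge, comparing a sample suffix j with a suffix i (i % 3 == 0).
    def sample_first(j, i):
        if j % 3 == 1:  # j + 1 and i + 1 are both sample positions
            return (ch(j), rk(j + 1)) < (ch(i), rk(i + 1))
        # j % 3 == 2: j + 2 and i + 2 are both sample positions
        return (ch(j), ch(j + 1), rk(j + 2)) < (ch(i), ch(i + 1), rk(i + 2))

    out, a, b = [], 0, 0
    while a < len(s12) and b < len(s0):
        if sample_first(s12[a], s0[b]):
            out.append(s12[a])
            a += 1
        else:
            out.append(s0[b])
            b += 1
    return out + s12[a:] + s0[b:]


print(skew_suffix_array([ord(c) for c in "yabbadabbado"]))
# [1, 6, 4, 9, 3, 8, 2, 7, 5, 10, 11, 0]
```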
Compute LCPs
Sorted suffixes of yabbadabbado (start position, suffix):
1   abbadabbado
6   abbado
4   adabbado
9   ado
3   badabbado
8   bado
2   bbadabbado
7   bbado
5   dabbado
10  do
11  o
0   yabbadabbado
Rank (1-based) of the suffix starting at positions 0, 1, ..., 11: 12 1 7 5 3 9 2 8 6 4 10 11
Crucial observation
For positions i < j in the suffix array:
$\mathrm{LCP}(i, j) = \min\{\mathrm{LCP}(i, i+1), \mathrm{LCP}(i+1, i+2), \ldots, \mathrm{LCP}(j-1, j)\}$
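A quick way to check the observation: compute the LCPs of neighbouring suffixes directly, then read off the LCP of any two suffixes as a range minimum. The sketch below (names are ours) uses a plain `min` over a slice, which costs O(j - i) per query; a range-minimum structure would answer each query in O(1).

```python
def adjacent_lcps(s, sa):
    """lcp[k] = LCP of the suffixes at sa[k - 1] and sa[k]; lcp[0] = 0 by convention."""
    def lcp(a, b):  # direct character-by-character comparison
        h = 0
        while a + h < len(s) and b + h < len(s) and s[a + h] == s[b + h]:
            h += 1
        return h

    return [0] + [lcp(sa[k - 1], sa[k]) for k in range(1, len(sa))]


def lcp_between(lcp, i, j):
    """LCP of the suffixes at suffix-array positions i < j: the minimum of lcp[i+1..j]."""
    return min(lcp[i + 1:j + 1])


s = "yabbadabbado"
sa = sorted(range(len(s)), key=lambda i: s[i:])
lcp = adjacent_lcps(s, sa)
print(lcp)                     # [0, 5, 1, 2, 0, 3, 1, 4, 0, 1, 0, 0]
print(lcp_between(lcp, 0, 2))  # 1: abbadabbado vs adabbado
```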
Find LCPs of consecutive suffixes
Go over the suffixes in text order and compute the LCP of each suffix with the suffix just before it in the suffix array (written here with the start positions of the two suffixes):
LCP(11, 0) = 0    o, yabbadabbado
LCP(8, 2) = 1     bado, bbadabbado
LCP(9, 3) = 0     ado, badabbado
LCP(6, 4) = 1     abbado, adabbado
LCP(7, 5) = 0     bbado, dabbado
LCP(1, 6) = 5     abbadabbado, abbado
LCP(2, 7) = 4     bbadabbado, bbado
LCP(3, 8) = 3     badabbado, bado
LCP(4, 9) = 2     adabbado, ado
LCP(5, 10) = 1    dabbado, do
LCP(10, 11) = 0   do, o
Suffix 1 is first in the suffix array, so it has no predecessor.
Analysis
The position from which we start comparing decreases by at most 1 from one iteration to the next (the next suffix shares at least LCP - 1 characters with its predecessor), and it never exceeds n. So it cannot increase more than O(n) times in total, and the whole computation takes O(n) time.
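The procedure of the last slides is the linear-time LCP algorithm usually attributed to Kasai et al. A Python sketch, assuming the suffix array is given; `lcp[k]` holds the LCP of the suffixes at `sa[k - 1]` and `sa[k]`, and the function name is ours.

```python
def kasai_lcp(s, sa):
    """lcp[k] = LCP(s[sa[k-1]:], s[sa[k]:]) for k >= 1, lcp[0] = 0, in O(n) time."""
    n = len(s)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0  # characters already known to match at the start of this iteration
    for i in range(n):  # visit suffixes in text order 0, 1, ..., n - 1
        if rank[i] == 0:
            h = 0  # first suffix in the array: no predecessor
            continue
        j = sa[rank[i] - 1]  # the suffix just before s[i:] in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # suffix i + 1 shares at least h - 1 characters with its predecessor
    return lcp


s = "yabbadabbado"
sa = [1, 6, 4, 9, 3, 8, 2, 7, 5, 10, 11, 0]
print(kasai_lcp(s, sa))  # [0, 5, 1, 2, 0, 3, 1, 4, 0, 1, 0, 0]
```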
We need more LCPs for search
The accelerated search needs LCP(L, M) and LCP(M, R) for every range (L, R) that the binary search can visit.
There are linearly many such ranges; calculate them all bottom up, using LCP(L, R) = min(LCP(L, M), LCP(M, R)).
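A sketch of the bottom-up computation, assuming the search splits a range (L, R) at M = (L + R) // 2, as in a standard binary search over suffix-array positions 0 to n - 1. Each range's value is the minimum of its two halves, by the observation that the LCP of two suffixes is the minimum of the adjacent LCPs between them. The dictionary layout and the names are ours.

```python
def interval_lcps(lcp):
    """LCP(L, R) for every range (L, R) the binary search over 0..n-1 can visit.

    lcp[k] is the LCP of suffix-array neighbours k - 1 and k.
    """
    table = {}

    def fill(L, R):  # post-order: both halves are filled before (L, R), i.e. bottom up
        if R - L == 1:
            table[(L, R)] = lcp[R]
        else:
            M = (L + R) // 2
            table[(L, R)] = min(fill(L, M), fill(M, R))
        return table[(L, R)]

    if len(lcp) >= 2:
        fill(0, len(lcp) - 1)
    return table


table = interval_lcps([0, 5, 1, 2, 0, 3, 1, 4, 0, 1, 0, 0])
print(len(table))       # 21 ranges for 12 suffixes: linearly many
print(table[(0, 11)])   # 0
```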