Alpha skip Search Algorithm Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in Computer Science, Vol. 1448, 1998, pp. 55-64 - PowerPoint PPT Presentation



1
Alpha Skip Search Algorithm: Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Christian, C., Thierry, L. and Joseph, D.P., Lecture Notes in Computer Science, Vol. 1448, 1998, pp. 55-64
  • Advisor: Prof. R. C. T. Lee
  • Reporter: Z. H. Pan

2
The Exact String Matching Problem: We are given a
text string T of length n and a pattern string P
of length m, and we want to find all
occurrences of P in T.
Example
Input: [Figure: text T (positions 0 to 17) and pattern P]
There are two occurrences of P in T.
Output: 2, 10
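The following brute-force sketch in Python pins down what "find all occurrences" returns: every 0-based starting position of P in T, including overlapping ones. It is only a baseline for the problem statement and is not the Alpha Skip Search Algorithm. The function name `find_all_occurrences` and the sample strings are hypothetical, because the slide's own T and P were shown only in a figure.

```python
def find_all_occurrences(pattern: str, text: str) -> list[int]:
    """Return every 0-based position s with text[s:s+m] == pattern (brute force, O(mn))."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Try each of the n - m + 1 alignments and compare the whole window with P.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


print(find_all_occurrences("aba", "abababa"))  # [0, 2, 4]
```

In the example, the occurrences at 0, 2 and 4 overlap. A correct matcher must report all three.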
3
  • The Alpha Skip Search Algorithm is an improvement
    of the Skip Search Algorithm.
  • The Skip Search Algorithm uses Rule 2 (the
    Substring Matching Rule) and Rule 4 (the Two
    Window Rule).

4
Rule 2 The Substring Matching Rule
  • For any substring u in T, find the nearest occurrence
    of u in P to the left of its current position. If
    such a u exists in P, shift P so that the two u's
    are aligned; otherwise, we may define a new partial window.
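This is a minimal Python sketch of Rule 2. It rests on two stated assumptions. First, the substring u of T currently lies under position p of P. Second, "the nearest u to the left" is read as the largest q < p with P[q..q+|u|-1] = u. The function name `substring_rule_shift` is hypothetical. It returns the shift p - q, or None when no such u exists and a new partial window is needed.

```python
from typing import Optional


def substring_rule_shift(pattern: str, u: str, p: int) -> Optional[int]:
    """Rule 2 sketch: shift that aligns the nearest copy of u left of position p with u in T.

    Assumption: u in T currently sits under pattern position p.
    Returns None if P has no copy of u starting strictly left of p.
    """
    k = len(u)
    # Candidate starts q = p-1, p-2, ..., 0, nearest first; q + k must fit inside P.
    for q in range(min(p - 1, len(pattern) - k), -1, -1):
        if pattern[q:q + k] == u:
            return p - q  # moving P right by p - q puts P[q..q+k-1] under u
    return None


print(substring_rule_shift("GCAGAGAG", "AG", 6))  # 2
print(substring_rule_shift("GCAGAGAG", "T", 7))   # None
```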

5
Rule 2-2: The 1-Suffix Rule (a special version of
Rule 2)
  • Consider the 1-suffix x, that is, the last character
    of the window. Rule 2-2 is Rule 2 applied with u = x.
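Suppose x is taken to be the text character aligned with the last position of P. Rule 2-2 then reduces to a per-character shift table: align the rightmost x in P[0..m-2] with the text, or shift by m if x does not occur there. This is the same shift that Horspool's variant of Boyer-Moore uses. The Python sketch below assumes that reading, and the names `one_suffix_shifts` and `shift_for` are hypothetical.

```python
def one_suffix_shifts(pattern: str) -> dict[str, int]:
    """Rule 2-2 sketch: shift for each character x that can appear as the 1-suffix."""
    m = len(pattern)
    shifts: dict[str, int] = {}
    # Scan P[0..m-2] left to right, so later (nearer) occurrences overwrite earlier ones.
    for i, ch in enumerate(pattern[:-1]):
        shifts[ch] = m - 1 - i
    return shifts


def shift_for(shifts: dict[str, int], x: str, m: int) -> int:
    # If x is absent from P[0..m-2], no alignment can use it: move P entirely past x.
    return shifts.get(x, m)


table = one_suffix_shifts("GCAGAGAG")
print(table)                      # {'G': 2, 'C': 6, 'A': 1}
print(shift_for(table, "T", 8))   # 8
```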

6
Rule 4: The Two Window Rule
[Figure: text T divided into windows w1, w2, w3 and w4, with the pattern P aligned against them]
No prefix of P is a suffix of w1, and no suffix of P is a prefix of w2.
Matched!
7
The Skip Search Algorithm
  • The Skip Search Algorithm uses Rule 2-2 together
    with Rule 4 in a very clever way.

Example
T = GCATCGCAGAGAGTATACAGTACG (positions 0 to 23)
P = GCAGAGAG (positions 0 to 7)
[Figure: P shown at two alignments against T]
The length of the pattern is m. The two window, which is a wide window, has length 2m-1.
8
Example
T = GCATCGCAGAGAGTATACAGTACG (positions 0 to 23)
P = GCAGAGAG (positions 0 to 7)
[Figure: P shown at three alignments against T]
The length of the two window is 2m-1.
Positions of each character in P (decreasing order):
A: (6, 4, 2)   C: (1)   G: (7, 5, 3, 0)   T: ∅
9
Example
T = GCATCGCAGAGAGTATACAGTACG (positions 0 to 23)
[Figure: animation step of the same example, highlighting a two window of length 2m-1 in T]
10
Example
T = GCATCGCAGAGAGTATACAGTACG (positions 0 to 23)
P = GCAGAGAG (positions 0 to 7)
[Figure: animation step of the same example, with P aligned inside a two window of length 2m-1]
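The Skip Search example can be reproduced with the Python sketch below, which is a sketch of the idea on these slides and not code from the paper. It first builds the character position table (A: 6, 4, 2; C: 1; G: 7, 5, 3, 0; T: none), with positions in decreasing order. It then inspects only T[m-1], T[2m-1], and so on. Any occurrence of P covers exactly one of these positions, so each occurrence lies in the two window T[j-m+1..j+m-1] around one inspected T[j]. The names `build_buckets` and `skip_search` are hypothetical.

```python
def build_buckets(pattern: str) -> dict[str, list[int]]:
    """Map each character of P to its positions in P, in decreasing order."""
    buckets: dict[str, list[int]] = {}
    for i in range(len(pattern) - 1, -1, -1):
        buckets.setdefault(pattern[i], []).append(i)
    return buckets


def skip_search(pattern: str, text: str) -> list[int]:
    """Skip Search sketch: report all 0-based occurrences of P in T."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    buckets = build_buckets(pattern)
    found = []
    # Inspect only T[m-1], T[2m-1], ...: every occurrence of P contains exactly one of them.
    for j in range(m - 1, n, m):
        # Rule 2-2: try every position i of T[j] in P, i.e. align P[i] with T[j].
        for i in buckets.get(text[j], []):
            s = j - i  # candidate start, always inside the two window T[j-m+1..j+m-1]
            if s + m <= n and text[s:s + m] == pattern:
                found.append(s)
    return found  # already increasing: s grows within and across inspected positions


print(build_buckets("GCAGAGAG"))  # {'G': [7, 5, 3, 0], 'A': [6, 4, 2], 'C': [1]}
print(skip_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))  # [5]
```

On this example only T[7] = A, T[15] = T and T[23] = G are inspected. The alignment P[2] under T[7] gives the single occurrence at position 5.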
11
  • The Skip Search Algorithm uses a very special
    version of Rule 2, in which the substring is
    limited to one character.
  • The Alpha Skip Search Algorithm instead uses a
    substring whose length L may be greater than 1,
    together with a wide window of length 2m-L.
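A short derivation shows where the length 2m-L comes from. It assumes that the substring read from T is T[j..j+L-1] and that an occurrence of P starts at position s:

```latex
\begin{aligned}
T[j..j+L-1] \subseteq T[s..s+m-1]
  &\iff s \le j \;\text{and}\; j+L-1 \le s+m-1 \\
  &\iff j+L-m \le s \le j, \\
\text{so every such occurrence lies in } T[j+L-m\,..\,j+m-1], \text{ of length }
  &(j+m-1)-(j+L-m)+1 = 2m-L.
\end{aligned}
```

With L = 1 this gives the two window of length 2m-1 used by the Skip Search Algorithm.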

12
We assume that the alphabet Σ of the text and pattern has size s. In the
preprocessing phase, we first use a formula to determine L and then find all
substrings of P whose length is L. The positions where these substrings are
located in P are stored in a trie. In the searching phase, we use the
information stored in the trie to compare the text T with the pattern P.
13
Preprocessing phase
If $\log_s m > 1$, then $L = \lfloor \log_s m \rfloor$, where s is the size of the alphabet and m is the length of the pattern P; otherwise, $L = 1$.
Example
T = aaaababbababbbbbbaabababababbac
P = ababbaba
s = 3, m = 8, $L = \lfloor \log_3 8 \rfloor = 1$
Trie: a → 7, 5, 2, 0; b → 6, 4, 3, 1
In this case, s is 3 and the length of the pattern is 8, so L is 1; that is, the substrings have length 1.
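The formula for L can be sketched in Python as below. The floor is an assumption inferred from the two worked examples: s = 3, m = 8 gives L = 1, and s = 2, m = 8 gives L = 3. The sketch uses integer arithmetic instead of floating-point logarithms, so exact powers such as 1000 = 10^3 are not misrounded. The name `substring_length` is hypothetical.

```python
def substring_length(s: int, m: int) -> int:
    """L = floor(log_s m) when that is at least 1, otherwise L = 1."""
    if s < 2:
        return 1  # log base s is undefined for s < 2; fall back to L = 1
    # Largest k with s**k <= m, computed with integers to avoid float rounding.
    k, power = 0, 1
    while power * s <= m:
        power *= s
        k += 1
    return max(1, k)


print(substring_length(3, 8))  # 1  (slide 13)
print(substring_length(2, 8))  # 3  (slide 14)
```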
14
Each leaf of the trie stores, in decreasing order, the positions in P at which its substring occurs.
Example
T = aaaababbababbbbbbaabababababba (positions 0 to 29)
P = ababbaba (positions 0 to 7)
s = 2, m = 8, $L = \lfloor \log_2 8 \rfloor = 3$
Trie leaves: aba → 5, 0; abb → 2; bab → 4, 1; bba → 3
15
Trie
Example
P = ababbaba (positions 0 to 7)
[Figure: trie for L = 3 with paths root-a-b-a (leaf 5, 0), root-a-b-b (leaf 2), root-b-a-b (leaf 4, 1) and root-b-b-a (leaf 3)]
16
P = ababbaba (positions 0 to 7)
[Figure: the trie is built step by step by inserting the length-3 substrings of P that start at positions 0, 1, 2 and 3: aba (0), bab (1), abb (2) and bba (3)]
17
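P = ababbaba (positions 0 to 7)
[Figure: inserting bab at position 4 places 4 in front of 1 in its leaf; the leaves are now aba (0), abb (2), bab (4, 1) and bba (3)]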
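The trie construction on these slides can be sketched in Python with nested dictionaries. Substrings are inserted for start positions 0, 1, 2 and so on, and each new position is placed at the front of its leaf list. This is why the leaves end up in decreasing order: bab first holds 1 and then becomes 4, 1. The leaf key "$" and the names `build_trie` and `lookup` are hypothetical choices for this sketch.

```python
def build_trie(pattern: str, L: int) -> dict:
    """Insert every length-L substring of P into a trie of nested dicts.

    Assumption: a leaf's position list is stored under the key "$",
    which must not be a character of the alphabet.
    """
    root: dict = {}
    for i in range(len(pattern) - L + 1):
        node = root
        for ch in pattern[i:i + L]:
            node = node.setdefault(ch, {})
        node.setdefault("$", []).insert(0, i)  # prepend i: lists stay decreasing
    return root


def lookup(trie: dict, u: str) -> list[int]:
    """Return the positions stored for substring u, or [] if u does not occur in P."""
    node = trie
    for ch in u:
        if ch not in node:
            return []
        node = node[ch]
    return node.get("$", [])


trie = build_trie("ababbaba", 3)
print(lookup(trie, "bab"))  # [4, 1]
print(lookup(trie, "bbb"))  # []
```

Calling `build_trie("ababbab", 3)` on the first seven characters of P gives the intermediate trie after positions 0 to 4, with bab → 4, 1 and aba → 0.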
18
We use a wide window with length 2m-L.
Example
T = aaaababbababbbbbbaabababababba (positions 0 to 29)
This is a wide window with length $2m - L = 2 \cdot 8 - 3 = 13$.
P = ababbaba (positions 0 to 7)
s = 2, m = 8, $L = \lfloor \log_2 8 \rfloor = 3$
19
Example
T = aaaababbababbbbbbaabababababba (positions 0 to 29)
P = ababbaba (positions 0 to 7)
[Figure: P is first placed at position 0. The substring T[5..7] = abb, read from the wide window T[0..12], occurs in P at position 2, so P is aligned at position 5 - 2 = 3 and compared with T[3..10] = ababbaba.]
Match!
20
T = aaaababbababbbbbbaabababababba (positions 0 to 29)
P = ababbaba (positions 0 to 7)
[Figure: the next wide windows are examined in turn.]
No bbb in P: the substring T[11..13] = bbb is not in the trie, so no alignment is tried.
No aab in P: the substring T[17..19] = aab is not in the trie, so no alignment is tried.
Match!: the substring T[23..25] = bab is found in the trie.
21
T = aaaababbababbbbbbaabababababba (positions 0 to 29)
P = ababbaba (positions 0 to 7)
[Figure: bab occurs in P at positions 4 and 1, so P is aligned at positions 23 - 4 = 19 and 23 - 1 = 22 in turn and compared with T; neither alignment gives an occurrence.]
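The Python sketch below puts the pieces together. It reads one length-L substring from each wide window, starting at j = m - L and stepping by m - L + 1. The step is safe because every occurrence fully contains the m - L + 1 substrings of length L that start inside it, so it contains exactly one inspected start. For each inspected substring, the sketch tries every alignment listed for it. On this example the inspected substrings are T[5..7], T[11..13], T[17..19] and T[23..25], which appears to match the windows examined above. For brevity a dictionary stands in for the trie. The name `alpha_skip_search` is hypothetical, and this is not the authors' code.

```python
def alpha_skip_search(pattern: str, text: str, L: int) -> list[int]:
    """Alpha Skip Search sketch: all 0-based occurrences of P in T using length-L substrings."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n or not 1 <= L <= m:
        return []
    # Preprocessing: positions of each length-L substring of P, decreasing
    # (a dict stands in for the trie used on the slides).
    z: dict[str, list[int]] = {}
    for i in range(m - L + 1):
        z.setdefault(pattern[i:i + L], []).insert(0, i)
    found = []
    step = m - L + 1
    # Searching: one substring T[j..j+L-1] per wide window T[j+L-m..j+m-1].
    for j in range(m - L, n - L + 1, step):
        for i in z.get(text[j:j + L], []):  # missing key: substring not in P, skip window
            s = j - i  # align P[i..i+L-1] with T[j..j+L-1]
            if s + m <= n and text[s:s + m] == pattern:
                found.append(s)
    return found


T = "aaaababbababbbbbbaabababababba"
print(alpha_skip_search("ababbaba", T, 3))  # [3]
```

With L = 1 the loop inspects T[m-1], T[2m-1], and so on, exactly as the Skip Search Algorithm does.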

22
Time complexity
The preprocessing phase takes O(m) time and space; the searching phase takes O(mn) time in the worst case.