KMP algorithm - PowerPoint PPT Presentation


Provided by: algCsie


Transcript and Presenter's Notes

Title: KMP algorithm

1
KMP algorithm
KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., Fast
pattern matching in strings, SIAM Journal on
Computing 6(2), 1977, pp. 323-350.

Advisor: Prof. R. C. T. Lee
Reporter: Z. H. Pan

2
Definition

String Matching Problem
Input: A text string T of length n and a
pattern string P of length m.
Output: All positions at which P occurs in T.

Example
[Figure: the text T and the pattern P.]
The occurrences of P in T start at $T_4$, $T_{10}$, and $T_{16}$.
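As a reference point for the problem statement above, here is a minimal brute-force sketch in Python. The function name `find_occurrences` and the sample strings are illustrative assumptions, not taken from the slides; the sketch only pins down what "all positions" means, using 0-based indices.

```python
def find_occurrences(text: str, pattern: str) -> list[int]:
    """Return every 0-based start position s with text[s:s+m] == pattern."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


# Hypothetical input, chosen only for illustration.
print(find_occurrences("abcabcab", "abc"))  # [0, 3]
```

This checks up to m characters at each of the n - m + 1 alignments, which is the cost the KMP algorithm is designed to avoid.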
3
Knuth-Morris-Pratt algorithm

The design of the Knuth-Morris-Pratt (KMP) algorithm
follows a tight analysis of the Morris and Pratt (MP)
algorithm. The KMP algorithm is a refinement of the MP
algorithm.
The KMP algorithm performs the comparisons from left
to right.

$T_i$: the character at the $i$th position of text $T$.
$P_j$: the character at the $j$th position of pattern $P$.
$n$: the length of $T$. $m$: the length of $P$.
$T_{i,j}$: the string $T_i T_{i+1} \cdots T_j$, $0 \le i \le j \le n-1$.
$P_{i,j}$: the string $P_i P_{i+1} \cdots P_j$, $0 \le i \le j \le m-1$.

Example: We suppose that $P_0$ is aligned to $T_s$, with $s = 5$.
index   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T       a  a  a  a  a  a  t  c  a  c  a  t  t  a  g  c  a  a  a  a
                                         X  (mismatch; P is then shifted)
P                      a  t  c  a  c  a  g  t  a  t  c  a
index                  0  1  2  3  4  5  6  7  8  9 10 11
In the above example, $T_{11}$ = t, $P_6$ = g, $T_{5,8}$ = atca,
$P_{1,3}$ = tca, $n = 20$, and $m = 12$. Also
$T_{5,10} = P_{0,5}$, $T_{11} \ne P_6$, and $T_s = T_5$ = a.
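The alignment in this example can be checked mechanically. The sketch below assumes 0-based indexing as on the slide; the helper name `first_mismatch` is hypothetical and only illustrates the left-to-right comparison for a single alignment.

```python
def first_mismatch(text: str, pattern: str, s: int):
    """Compare pattern against text[s:] left to right.

    Returns the smallest pattern index i with text[s + i] != pattern[i],
    or None if the whole pattern matches at alignment s.
    """
    for i, ch in enumerate(pattern):
        if s + i >= len(text) or text[s + i] != ch:
            return i
    return None


T = "aaaaaatcacattagcaaaa"   # text from the example, n = 20
P = "atcacagtatca"           # pattern from the example, m = 12

i = first_mismatch(T, P, 5)
print(i, T[5 + i], P[i])     # 6 t g
```

With s = 5 the first six characters agree and the comparison stops at pattern index 6, which is text position 11.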
5
Let a, b, and c denote characters. When a mismatch
occurs, we use $u$ to denote the portion of the
pattern that precedes the mismatched character $P_i$,
that is, $u = P_{0,i-1}$. A substring equal to $u$
therefore occurs in the text $T$. Some prefixes of $u$
may be equal to suffixes of $u$. We use $v_p$ to
denote the longest proper prefix of $u$ that is equal
to a suffix of $u$; that suffix is $v_s$. We say that
$v_p$ is the border of $u$. Consider an attempt at
a left position $j$ on $T$, that is, when the window
is positioned on the substring $T_{j,j+m-1}$ of the
text. Assume that the first mismatch occurs
between $P_i$ and $T_{i+j}$ ($a \ne b$) with $0 \le i \le m-1$.
Then $P_{0,i-1} = T_{j,i+j-1} = u$ and $b = P_i \ne T_{i+j} = a$.
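[Figure: the text T, indexed 0 to n-1, with the window $T_{j,j+m-1}$ and the mismatch X at position i+j; below it the pattern P, indexed 0 to m-1, with $u = P_{0,i-1}$, its prefix $v_p = P_{0,x}$ and its suffix $v_s = P_{i-1-x,i-1}$ marked as the border.]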
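To make the notion of a border concrete, the following sketch finds the longest proper prefix of u that is also a suffix of u by direct comparison. The name `longest_border` is an assumption for illustration, and this quadratic check is not how the algorithm computes borders; it only restates the definition.

```python
def longest_border(u: str) -> str:
    """Longest proper prefix of u that is also a suffix of u ("" if none)."""
    # Try candidate lengths from longest to shortest; a proper prefix is
    # strictly shorter than u itself.
    for k in range(len(u) - 1, 0, -1):
        if u[:k] == u[-k:]:
            return u[:k]
    return ""


# u = P(0,5) from the earlier example with P = atcacagtatca.
print(longest_border("atcaca"))  # a
```

For u = atcaca the only prefix that reappears as a suffix is the single character a, so the border has length 1.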
6
Difference between MP and KMP algorithm
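In the same situation, the difference between the MP
algorithm and the KMP algorithm is that the KMP
algorithm may shift the pattern farther than the MP
algorithm does.
Example
[Figure: the MP algorithm and the KMP algorithm, each shown as the pattern P under the text T after a mismatch X that follows the matched portion u.]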
7
A: $P(0) = P(i)$
B: $P(0, j)$ is a suffix of $P(0, i-1)$
C: $P(j+1) = P(i)$
P(i) 1
P(i) ?-1
8
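A case where prefix(i) = 0
[Figure: schematic of P from position 0 to position i, with segments labeled v, c, and b.]
There is a suffix of $P(0, i)$ equal to a prefix of
$P$, and $P_i \ne P_0$.
P       a  b  c  c  c  a  b  c  c  c  c  a
index   0  1  2  3  4  5  6  7  8  9 10 11
Prefix function of P: prefix(i) = 0 at the marked position i.
In this case, we move P by i steps.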
9
A full example
index       0  1  2  3  4  5  6  7  8  9 10 11 12
P           a  b  c  c  a  b  c  b  a  b  c  c  a
prefix(i)  -1  0  0  0 -1  0  0  3 -1  0  0  0 -1
special
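The table in this full example can be reproduced with a short sketch. It assumes the usual two-step construction: first the MP border table, then the KMP adjustment that skips a border whose next character equals $P_i$. The names `kmp_prefix`, `mp`, and `kmp` are illustrative and do not come from the slides.

```python
def kmp_prefix(pattern: str) -> list[int]:
    """KMP prefix function for positions 0..m-1 (entry 0 is -1)."""
    m = len(pattern)
    if m == 0:
        return []
    # mp[i] = length of the longest border of pattern[0:i]; mp[0] = -1.
    mp = [-1] * (m + 1)
    k = -1
    for i in range(m):
        while k >= 0 and pattern[i] != pattern[k]:
            k = mp[k]
        k += 1
        mp[i + 1] = k
    # KMP adjustment: if the character after the border equals pattern[i],
    # retrying that border would fail the same way, so reuse the value
    # already computed for the border's own position.
    kmp = [-1] * m
    for i in range(1, m):
        j = mp[i]
        kmp[i] = kmp[j] if pattern[i] == pattern[j] else j
    return kmp


print(kmp_prefix("abccabcbabcca"))
# [-1, 0, 0, 0, -1, 0, 0, 3, -1, 0, 0, 0, -1]
```

Position 7 is the only one that keeps a positive value, because the border abc of abccabc is followed by c while $P_7$ is b.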
10
Example
[Figures, slides 10 to 13: successive alignments of P against T during the search, several of which end in an exact match.]
14
Time Complexity
Preprocessing phase in O(m) space and time
complexity. Searching phase in O(n+m) time
complexity.
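The linear bound can be observed with a sketch of the searching phase that counts character comparisons. This is one common formulation, assumed here rather than taken from the slides: the table is built in a single pass and has one extra entry at index m, used to continue after a full match. The names `kmp_table` and `kmp_search` are hypothetical.

```python
def kmp_table(p: str) -> list[int]:
    """One-pass KMP table with m + 1 entries (entry m: border of all of p)."""
    m = len(p)
    nxt = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and p[i] != p[j]:
            j = nxt[j]
        i += 1
        j += 1
        if i < m and p[i] == p[j]:
            nxt[i] = nxt[j]   # same character would fail again: skip ahead
        else:
            nxt[i] = j
    return nxt


def kmp_search(text: str, pattern: str):
    """Return (start positions of all matches, number of comparisons)."""
    n, m = len(text), len(pattern)
    if m == 0:
        return [], 0
    nxt = kmp_table(pattern)
    hits, comparisons = [], 0
    i = j = 0                 # i indexes the pattern, j indexes the text
    while j < n:
        while i > -1:
            comparisons += 1
            if pattern[i] == text[j]:
                break
            i = nxt[i]        # shift the pattern by i - nxt[i]
        i += 1
        j += 1
        if i >= m:            # full match ending at text position j - 1
            hits.append(j - i)
            i = nxt[i]
    return hits, comparisons


hits, comparisons = kmp_search("abababab", "abab")
print(hits, comparisons)      # [0, 2, 4] 8
```

The text index j never moves backward, and every failed comparison shifts the pattern to the right, which is why the number of comparisons stays at most 2n.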
15

The prefix function used in the KMP method can be
obtained by modifying the prefix function
algorithm used in the MP method, paying
attention to the conditions under which
prefix(i) = 0 and prefix(i) = -1.


Thank You!

To implement the KMP algorithm, we only have to
modify the prefix function of the MP algorithm.
That is, if the following condition is
satisfied, prefix(i) = -1.

[Figure: schematic of P from position 0 to position i, with the character b at both positions.]
Before b, no suffix of $P(0, i-1)$ is equal
to any prefix of $P$, and $P_i = P_0$. Whenever this
occurs, we move P by $i - (-1) = i + 1$ steps (the
largest step).
P       a  b  c  a  c  a  b  a  c  c  c  a
index   0  1  2  3  4  5  6  7  8  9 10 11
Prefix function of P: prefix(i) = -1 at the marked position i.
20
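Another case where prefix(i) = -1
[Figure: schematic of P from position 0 to position i, with segments labeled v and b.]
There is a suffix of $P(0, i)$ equal to a prefix of
$P$, and $P_i = P_0$.
P       a  b  a  c  c  a  b  a  c  c  c  a
index   0  1  2  3  4  5  6  7  8  9 10 11
Prefix function of P: prefix(i) = -1 at the marked position i.
In this case, we again move P by i + 1 steps.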
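The shift rule behind these cases, moving P by i - prefix(i) after a mismatch at pattern position i, can be tabulated directly. The sketch below hard-codes the prefix values from the full example for P = abccabcbabcca; the names `PREFIX` and `shift_after_mismatch` are illustrative assumptions.

```python
# Prefix function values from the full example, P = abccabcbabcca.
PREFIX = [-1, 0, 0, 0, -1, 0, 0, 3, -1, 0, 0, 0, -1]


def shift_after_mismatch(i: int, prefix: list[int]) -> int:
    """Number of positions P moves right after a mismatch at P[i]."""
    return i - prefix[i]


for i in (4, 5, 7):
    print(i, PREFIX[i], shift_after_mismatch(i, PREFIX))
# 4 -1 5   prefix(i) = -1: shift by i + 1, the largest step
# 5 0 5    prefix(i) = 0:  shift by i
# 7 3 4    prefix(i) = 3:  shift by i - 3
```

Because prefix(i) is always smaller than i, the shift is at least one position in every case.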
