KMP algorithm - PowerPoint PPT Presentation

Provided by: algCsie
1
KMP algorithm
KNUTH D.E., MORRIS (Jr) J.H., PRATT V.R., Fast
pattern matching in strings, SIAM Journal on
Computing 6(2), 1977, pp. 323-350.
  • Advisor: Prof. R. C. T. Lee
  • Reporter: Z. H. Pan

2
Definition
  • String Matching Problem
  • Input: A text string T of length n and a
    pattern string P of length m.
  • Output: All positions in T at which P
    occurs.
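
The problem statement above can be pinned down with a brute-force sketch. This is not the KMP algorithm; it only shows what "all positions" means. The function name `naive_find_all` and the sample strings are assumptions made for illustration, and positions are counted from 0.

```python
def naive_find_all(text: str, pattern: str) -> list[int]:
    """Return every 0-based start position at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    positions = []
    # Try every window of length m and compare it with the pattern.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            positions.append(s)
    return positions


# Hypothetical sample input, not taken from the slides.
print(naive_find_all("abcabcabc", "abc"))
```

This baseline compares up to m characters at each of the n - m + 1 windows, which is the cost that KMP avoids.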

Example
[Figure: a text T and a pattern P]
The occurrences of P in T start at T_4, T_10 and T_16.
3
Knuth-Morris-Pratt algorithm
  • The design of the Knuth-Morris-Pratt (KMP)
    algorithm follows a tight analysis of the Morris
    and Pratt (MP) algorithm. The KMP algorithm is a
    refinement of the MP algorithm.
  • The KMP algorithm performs the comparisons from
    left to right.

4
  • T_i: the character at position i of the text
    T.
  • P_j: the character at position j of the pattern
    P.
  • n: the length of T. m: the length of P.
  • T_{i,j}: the string T_i T_{i+1} ... T_j, 0 ≤ i ≤ j ≤ n-1.
  • P_{i,j}: the string P_i P_{i+1} ... P_j, 0 ≤ i ≤ j ≤ m-1.

Example: Suppose that P_0 is aligned with T_s, where s = 5.
index:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
T:       a  a  a  a  a  a  t  c  a  c  a  t  t  a  g  c  a  a  a  a
                                          X  (mismatch)
P:                      a  t  c  a  c  a  g  t  a  t  c  a
index:                  0  1  2  3  4  5  6  7  8  9 10 11
After the mismatch, P is shifted to the right.
In the above example: T_11 = t, P_6 = g,
T_{5,8} = atca, P_{1,3} = tca, n = 20, m = 12,
T_{5,10} = P_{0,5}, T_11 ≠ P_6, and T_s = T_5 = a.
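
The facts listed for this example can be checked mechanically. The sketch below assumes 0-based positions and inclusive ranges, as in the notation slide; the helper names `sub` and `first_mismatch` are hypothetical.

```python
T = "aaaaaatcacattagcaaaa"   # text from the example, n = 20
P = "atcacagtatca"           # pattern from the example, m = 12


def sub(s: str, i: int, j: int) -> str:
    """Inclusive substring s_i ... s_j, matching the slide notation."""
    return s[i:j + 1]


def first_mismatch(text: str, pattern: str, s: int) -> int:
    """Index i of the first mismatch when pattern[0] is aligned with text[s].

    Returns len(pattern) if the whole pattern matches at s.
    """
    i = 0
    while i < len(pattern) and s + i < len(text) and text[s + i] == pattern[i]:
        i += 1
    return i


i = first_mismatch(T, P, 5)
print(i, T[5 + i], P[i])           # mismatch position and the two characters
print(sub(T, 5, 8), sub(P, 1, 3))  # T_{5,8} and P_{1,3}
```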
5
Let a, b and c denote characters. When a mismatch
occurs, we use u to denote the portion of the
pattern that precedes the mismatched character
P_i, that is, u = P_{0,i-1}. A substring of the
text T is equal to u. Some prefixes of u may be
equal to suffixes of u. We use vp to denote the
longest proper prefix of u that is equal to a
suffix of u; that suffix is denoted vs. We say
that vp is the border of u. Consider an attempt at
a left position j on T, that is, when the window
is positioned on the substring T_{j,j+m-1} of the
text. Assume that the first mismatch occurs
between P_i and T_{i+j} (a ≠ b) with 0 ≤ i ≤ m-1.
Then P_{0,i-1} = T_{j,i+j-1} = u and
b = P_i ≠ T_{i+j} = a.
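[Figure: the window of P aligned with T_{j,j+m-1}
in the text T (positions 0 to n-1), with the
mismatch X between P_i and T_{i+j}. Within
u = P_{0,i-1}, the border vp ends at position x
and the matching suffix vs spans positions i-1-x
to i-1.]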
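
The notion of a border can be made concrete with a short sketch. It scans candidate lengths from longest to shortest, which is quadratic and meant for clarity only; `longest_border` is a hypothetical helper name, and the input u = "atcaca" is P_{0,5} from the earlier example.

```python
def longest_border(u: str) -> str:
    """Longest proper prefix of u that is also a suffix of u ("" if none)."""
    for length in range(len(u) - 1, 0, -1):
        if u[:length] == u[-length:]:
            return u[:length]
    return ""


# u is the matched portion P_{0,5} of the earlier example.
print(longest_border("atcaca"))
```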
6
Difference between the MP and KMP algorithms
In the same situation, the difference between the
MP algorithm and the KMP algorithm is that the KMP
algorithm may take more than one step in the
shifting.
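Example
  • MP algorithm
    [Figure: T and P with the matched portion u and
    the mismatch X]
  • KMP algorithm
    [Figure: T and P with the matched portion u and
    the mismatch X]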
7
A: P(0) = P(i)
B: P(0, j) is a suffix of P(0, i-1)
C: P(j+1) = P(i)
P(i) 1
P(i) ?-1
8
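A case where prefix(i) = 0
[Figure: the pattern P from position 0 to position
i, with a prefix v and an equal block v ending
before position i]
There is a suffix of P(0, i) equal to a prefix of
P and P_i ≠ P_0
P:       a  b  c  c  c  a  b  c  c  c  c  a
index:   0  1  2  3  4  5  6  7  8  9 10 11
Prefix function of P: 0 at position i
In this case, we move P by i steps.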
9
A full example
P:       a  b  c  c  a  b  c  b  a  b  c  c  a
prefix: -1  0  0  0 -1  0  0  3 -1  0  0  0 -1
index:   0  1  2  3  4  5  6  7  8  9 10 11 12
special
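
The table in this example can be reproduced with a linear-time sketch of the prefix function. It follows the common textbook formulation rather than code from the slides; the name `kmp_next` is an assumption, and the extra last entry (the border length of the whole pattern) is an addition that is only useful for continuing a search after a full match.

```python
def kmp_next(pattern: str) -> list[int]:
    """KMP prefix function, 0-based, with -1 meaning "no usable border".

    Entry i (i < m) is the length of the longest border of pattern[:i]
    whose following character differs from pattern[i], or -1 if none.
    Entry m is the border length of the whole pattern (an extra entry,
    not shown in the slide table).
    """
    m = len(pattern)
    nxt = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        # Fall back through shorter borders until pattern[i] can extend one.
        while j > -1 and pattern[i] != pattern[j]:
            j = nxt[j]
        i += 1
        j += 1
        if i < m and pattern[i] == pattern[j]:
            # Same character follows the border: it would mismatch again,
            # so reuse the value already computed for position j.
            nxt[i] = nxt[j]
        else:
            nxt[i] = j
    return nxt


P = "abccabcbabcca"
print(kmp_next(P)[:len(P)])
```

The first 13 entries printed correspond to the prefix row above.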
10
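Example
[Figure: successive alignments of P against T
during the search]
11
[Figure: further alignments of P against T, with
exact matches marked]
12
[Figure: further alignments of P against T, with
exact matches marked]
13
[Figure: final alignments of P against T, with
exact matches marked]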
14
Time Complexity
The preprocessing phase takes O(m) space and time.
The searching phase takes O(n + m) time.
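
A minimal sketch of the two phases follows, assuming the usual textbook formulation of KMP. The names `kmp_next` and `kmp_search` are hypothetical, and returning an empty list for an empty pattern is a design choice of this sketch. The text index j never moves backwards, which is where the linear searching bound comes from.

```python
def kmp_next(pattern: str) -> list[int]:
    """Preprocessing phase: prefix function with m + 1 entries, O(m)."""
    m = len(pattern)
    nxt = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and pattern[i] != pattern[j]:
            j = nxt[j]
        i += 1
        j += 1
        if i < m and pattern[i] == pattern[j]:
            nxt[i] = nxt[j]
        else:
            nxt[i] = j
    return nxt


def kmp_search(text: str, pattern: str) -> list[int]:
    """Searching phase: all 0-based start positions of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []  # assumption of this sketch: an empty pattern has no matches
    nxt = kmp_next(pattern)
    positions = []
    i = 0  # current position in the pattern
    for j, ch in enumerate(text):  # j only moves forward
        while i > -1 and pattern[i] != ch:
            i = nxt[i]  # shift the pattern by i - nxt[i]
        i += 1
        if i == m:
            positions.append(j - m + 1)
            i = nxt[m]  # continue with the border of the whole pattern
    return positions


# Hypothetical sample input, not taken from the slides.
print(kmp_search("abababa", "aba"))
```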
15
  • The prefix function used in the KMP method can
    be obtained by modifying the prefix function
    algorithm used in the MP method, paying
    attention to the conditions where prefix(i) = 0
    and prefix(i) = -1.
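
One way to read this modification is sketched below: compute the MP table first, then adjust each entry whose border is followed by the same character as the mismatched one. The names `mp_next` and `kmp_from_mp` are hypothetical, and this two-pass form is an illustration, not necessarily how the slides implement it.

```python
def mp_next(pattern: str) -> list[int]:
    """MP prefix function: entry i is the border length of pattern[:i]."""
    m = len(pattern)
    nxt = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and pattern[i] != pattern[j]:
            j = nxt[j]
        i += 1
        j += 1
        nxt[i] = j
    return nxt


def kmp_from_mp(pattern: str) -> list[int]:
    """Derive the KMP table from the MP table."""
    mp = mp_next(pattern)
    kmp = mp[:]
    for i in range(1, len(pattern)):
        j = mp[i]
        # If the character after the border equals pattern[i], that border
        # would mismatch again, so take the value already fixed for j
        # (this is where the 0 and -1 cases arise).
        if pattern[i] == pattern[j]:
            kmp[i] = kmp[j]
    return kmp


P = "abccabcbabcca"  # pattern of the full example
print(mp_next(P)[:len(P)])
print(kmp_from_mp(P)[:len(P)])
```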

18
  • Thank You!

19
  • To implement the KMP algorithm, we only have to
    modify the prefix function of the MP algorithm.
  • That is, if the following condition is
    satisfied, prefix(i) = -1.

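[Figure: the pattern P from position 0 to position
i, with the character b at both positions]
Before b, there is no suffix of P(0, i-1) equal
to any prefix of P and P_i = P_0. Whenever this
occurs, we move P by i - (-1) = i + 1 steps (the
largest step).
P:       a  b  c  a  c  a  b  a  c  c  c  a
index:   0  1  2  3  4  5  6  7  8  9 10 11
Prefix function of P: -1 at position i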
20
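Another case where prefix(i) = -1
[Figure: the pattern P from position 0 to position
i, with a prefix v and an equal block v, each
followed by the character b]
There is a suffix of P(0, i) equal to a prefix of
P and P_i = P_0
P:       a  b  a  c  c  a  b  a  c  c  c  a
index:   0  1  2  3  4  5  6  7  8  9 10 11
Prefix function of P: -1 at position i
In this case, we again move P by i + 1 steps.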
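
The shift lengths in these cases follow from shift = i - prefix(i). The sketch below computes prefix(i) directly from its definition (slow, for checking only). The function names are hypothetical, and the positions i = 3, 7 and 9 are our reading of the three example patterns, since the slides do not state them in text.

```python
def kmp_prefix_at(pattern: str, i: int) -> int:
    """prefix(i) from the definition: the largest k < i such that
    pattern[:k] is a suffix of pattern[:i] and pattern[k] != pattern[i];
    -1 if no such k exists."""
    for k in range(i - 1, -1, -1):
        if pattern[:k] == pattern[i - k:i] and pattern[k] != pattern[i]:
            return k
    return -1


def shift_on_mismatch(pattern: str, i: int) -> int:
    """Number of steps P moves after a mismatch at pattern position i."""
    return i - kmp_prefix_at(pattern, i)


# Assumed positions i for the three example patterns.
for p, i in [("abcacabaccca", 3), ("abaccabaccca", 7), ("abcccabcccca", 9)]:
    print(p, i, kmp_prefix_at(p, i), shift_on_mismatch(p, i))
```

With prefix(i) = -1 the shift is i + 1, and with prefix(i) = 0 it is i, matching the step counts stated on the slides.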