Knuth-Morris-Pratt Algorithm - PowerPoint PPT Presentation

Slides: 11
Provided by: erict9
Learn more at: http://www.cse.msu.edu

1
Knuth-Morris-Pratt Algorithm
  • left to right scan like the naïve algorithm
  • one main improvement
  • on a mismatch, calculate maximum possible shift
    to the right for the pattern

2
Basic Idea
  • Definition
  • For each position i in pattern P, define sp_i(P)
    to be the length of the longest proper suffix of
    P[1..i] that matches a prefix of P
  • Define sp'_i(P) the same way, with the added
    condition that P(i+1) is not equal to P(sp'_i(P)+1)
  • may be written sp_i and sp'_i when P is clear from
    context
  • Usage
  • a mismatch occurs between P(i+1) and T(k)
  • Shift P to the right so that P(sp'_i+1) aligns
    with T(k)
  • this shifts P by i - sp'_i positions in total
  • If P is found, shift by n - sp'_n positions,
    where n is the length of P
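
The two definitions can be checked with a short Python sketch. It is not taken from the slides: the names sp_values and sp_prime_values are made up for illustration, sp is computed with the standard prefix-function recurrence, and sp'_n is assumed to equal sp_n because P(n+1) does not exist. Lists are 1-based with index 0 as padding, to match the slide notation.

```python
def sp_values(P):
    """Return a list sp where sp[i] is sp_i(P) for i = 1..n (sp[0] is padding)."""
    n = len(P)
    sp = [0] * (n + 1)
    k = 0  # length of the current longest proper suffix that is also a prefix
    for i in range(2, n + 1):
        c = P[i - 1]                    # P(i) in the 1-based slide notation
        while k > 0 and P[k] != c:      # P[k] is P(k+1) in 1-based notation
            k = sp[k]                   # fall back to the next shorter candidate
        if P[k] == c:
            k += 1
        sp[i] = k
    return sp


def sp_prime_values(P):
    """Return a list spp where spp[i] is sp'_i(P) for i = 1..n (spp[0] is padding)."""
    n = len(P)
    sp = sp_values(P)
    spp = [0] * (n + 1)
    for i in range(1, n + 1):
        k = sp[i]
        if i == n or P[i] != P[k]:
            # P(i+1) differs from P(k+1), or there is no P(n+1): keep sp_i
            spp[i] = k
        else:
            # P(i+1) equals P(k+1): the answer is the sp' value of the shorter prefix
            spp[i] = spp[k]
    return spp


if __name__ == "__main__":
    pattern = "abcdabceabcdabcef"
    print(sp_values(pattern)[1:])
    print(sp_prime_values(pattern)[1:])
```

Storing sp' rather than sp only changes how far the pattern shifts after a mismatch; both tables take time linear in the length of P to build.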

3
Illustration of sp and sp'

  i      1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
  P      a  b  c  d  a  b  c  e  a  b  c  d  a  b  c  e  f
  sp_i   0  0  0  0  1  2  3  0  1  2  3  4  5  6  7  8  0
  sp'_i  0  0  0  0  0  0  3  0  0  0  0  0  0  0  3  8  0

4
Illustration 1 of KMP shift
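  pos  0                 1
       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
  T    x y a b c a b d a b c f q f e a b
  P        a b c a b d a b d
  P                    a b c a b d a b d

  • mismatch between P(9) = d and T(11) = c, so i = 8
    and sp'_8 = 2
  • P shifts 8 - 2 = 6 positions, so that P(3) = c
    aligns with T(11) = c

5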
Illustration 2 of KMP shift
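  pos  0                 1
       1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
  T    x y a b x a b d a b c f q f e a b
  P        a b x a b d a b d
  P                    a b x a b d a b d

  • mismatch between P(9) = d and T(11) = c, so i = 8
    and sp'_8 = 2
  • P shifts 8 - 2 = 6 positions, but P(3) = x still
    mismatches T(11) = c

6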
sp'_i and Z-boxes
  • Definitions
  • Position j > 1 maps to i if i is the right end of
    a Z-box that starts at j
  • Note that i = j + Z_j - 1 in this case
  • Observation
  • For any i > 1, sp'_i = 0 if no j maps to i
  • Otherwise, sp'_i = max Z_j over all j that map to i
  • Choosing the smallest j that maps to i leads to
    the maximum possible Z_j value

7
Z-based computation of sp'_i
  • for (i = 1; i <= n; i++)
  •     sp'_i = 0
  • for (j = n; j >= 2; j--)
  •     i = j + Z_j - 1
  •     sp'_i = Z_j
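
The two loops above can be tried out with the following Python sketch. The Z-values are not computed anywhere in these slides, so z_values is an assumed helper written here as the usual linear-time Z algorithm. The names z_values and sp_prime_from_z are illustrative only. Lists are 1-based with unused padding at the front.

```python
def z_values(P):
    """Return Z where Z[j] (j = 2..n, 1-based) is the length of the longest
    substring of P starting at j that matches a prefix of P. Z[0], Z[1] unused."""
    n = len(P)
    Z = [0] * (n + 1)
    left = right = 0                  # 0-based half-open window of the rightmost Z-box
    for j in range(1, n):             # 0-based start j is 1-based position j + 1
        k = 0
        if j < right:
            # reuse the value already known for the mirrored position in the prefix
            k = min(right - j, Z[j - left + 1])
        while j + k < n and P[k] == P[j + k]:
            k += 1                    # extend the match by explicit comparison
        Z[j + 1] = k
        if j + k > right:
            left, right = j, j + k
    return Z


def sp_prime_from_z(P):
    """Return spp where spp[i] is sp'_i for i = 1..n, using the Z-box mapping."""
    n = len(P)
    Z = z_values(P)
    spp = [0] * (n + 1)               # first loop of the slide: every sp'_i starts at 0
    for j in range(n, 1, -1):         # second loop: j = n down to 2
        i = j + Z[j] - 1              # right end of the Z-box starting at j
        spp[i] = Z[j]                 # a smaller j processed later overwrites this
    return spp


if __name__ == "__main__":
    print(sp_prime_from_z("abcdabceabcdabcef")[1:])
```

Running j from n down to 2 means the smallest j that maps to a given i is written last, which is what yields the maximum Z_j for that i.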

8
Observations
  • Original KMP defined in terms of failure
    functions F(i) and F'(i)
  • F(i) = sp_(i-1) and F'(i) = sp'_(i-1) for i = 1 to n+1
  • 2m upper bound on the number of character
    comparisons, where m is the length of T
  • once a position in T matches, it is never
    compared again to any position in P
  • there may be cases where positions in T that
    mismatch are compared against multiple positions
    in P, but this can happen at most m times total
  • Full implementation of KMP is on page 27
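
As a rough check of the 2m bound, here is a minimal Python sketch of a KMP search that counts character comparisons. It is not the implementation the slide refers to: kmp_search and sp_prime are hypothetical names, sp' is computed with the prefix-function recurrence, and sp'_n is assumed to equal sp_n.

```python
def sp_prime(P):
    """sp'[i] for i = 1..n (1-based; index 0 is padding)."""
    n = len(P)
    sp = [0] * (n + 1)
    k = 0
    for i in range(2, n + 1):
        while k > 0 and P[k] != P[i - 1]:
            k = sp[k]
        if P[k] == P[i - 1]:
            k += 1
        sp[i] = k
    spp = [0] * (n + 1)
    for i in range(1, n + 1):
        k = sp[i]
        spp[i] = k if (i == n or P[i] != P[k]) else spp[k]
    return spp


def kmp_search(P, T):
    """Return (1-based start positions of P in T, number of character comparisons)."""
    n, m = len(P), len(T)
    if n == 0:
        raise ValueError("pattern must be non-empty")
    spp = sp_prime(P)
    occurrences = []
    comparisons = 0
    p = 0                              # number of pattern characters matched so far
    c = 0                              # 0-based index of the next text character
    while c + (n - p) <= m:            # the rest of P must still fit inside T
        while p < n:
            comparisons += 1
            if P[p] != T[c]:
                break
            p += 1
            c += 1
        if p == n:
            occurrences.append(c - n + 1)
            p = spp[n]                 # shift by n - sp'_n after an occurrence
        elif p == 0:
            c += 1                     # mismatch on P(1): move on in T
        else:
            p = spp[p]                 # shift by p - sp'_p; T[c] is compared again
    return occurrences, comparisons


if __name__ == "__main__":
    text = "xyabcabdabdq"
    found, count = kmp_search("abcabdabd", text)
    print(found, count, count <= 2 * len(text))
```

The index c never moves backwards, so only a text character that mismatched can be compared a second time, which matches the observation in the bullets above.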

9
FSA KMP algorithm
  • Definition
  • For each position i in pattern P and each
    character x in Σ, define sp'_(i,x)(P) to be the
    length of the longest proper suffix of P[1..i]
    that matches a prefix of P, with the added
    condition that P(sp'_(i,x)+1) = x
  • Observation
  • Now each position in T will be compared exactly
    once, even on a mismatch

10
Z-based computation of sp'_(i,x)
  • for (i = 1; i <= n; i++)
  •     for (all x in Σ)
  •         sp'_(i,x) = 0
  • for (j = n; j >= 2; j--)
  •     i = j + Z_j - 1
  •     x = P(Z_j + 1)
  •     sp'_(i,x) = Z_j
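
The table construction above might look like this in Python. This is only a sketch under stated assumptions: z_values is the standard Z algorithm (not given in the slides), sp_table is a made-up name, and the alphabet is passed in as a string of characters. table[i][x] holds sp'_(i,x), with i 1-based and table[0] unused.

```python
def z_values(P):
    """Z[j] for j = 2..n (1-based): longest match between P[j..] and a prefix of P."""
    n = len(P)
    Z = [0] * (n + 1)
    left = right = 0                  # 0-based half-open window of the rightmost Z-box
    for j in range(1, n):             # 0-based start j is 1-based position j + 1
        k = 0
        if j < right:
            k = min(right - j, Z[j - left + 1])
        while j + k < n and P[k] == P[j + k]:
            k += 1
        Z[j + 1] = k
        if j + k > right:
            left, right = j, j + k
    return Z


def sp_table(P, alphabet):
    """Return table where table[i][x] is sp'_(i,x) for i = 1..n and x in alphabet."""
    n = len(P)
    Z = z_values(P)
    # first loop nest of the slide: every entry starts at 0
    table = [dict.fromkeys(alphabet, 0) for _ in range(n + 1)]
    for j in range(n, 1, -1):         # j = n down to 2
        i = j + Z[j] - 1              # right end of the Z-box starting at j
        x = P[Z[j]]                   # P(Z_j + 1): the character that follows the prefix
        table[i][x] = Z[j]
    return table


if __name__ == "__main__":
    table = sp_table("abcabdabd", "abcd")
    print(table[8])
```

For the pattern a b c a b d a b d, the entry for i = 8 and x = c is 2: after eight matched characters and a mismatch against text character c, a prefix of length 2 stays aligned and P(3) = c is already known to match.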