Title: Strings and Pattern Matching Algorithms
Pattern: P[0..m-1]    Text: T[0..n-1]
Brute Force Pattern Matching
Algorithm BruteForceMatch(T, P)
  Input: Strings T with n characters and P with m characters
  Output: String index of the first substring of T matching P, or an indication that P is not a substring of T
  for i ← 0 to n-m do    // for each candidate index in T
    j ← 0
    while (j < m and T[i+j] = P[j]) do
      j ← j+1
    if j = m then
      return i
  return "there is no substring of T matching P"
Time complexity: O(mn)
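The brute-force procedure above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function name `brute_force_match` and the choice of -1 as the "no match" indication are assumptions made for this example.

```python
def brute_force_match(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    The -1 return value is an assumed convention standing in for
    "there is no substring of T matching P".
    """
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):      # each candidate start index in text
        j = 0
        # Compare left to right until a mismatch or the pattern is exhausted.
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:                  # all m characters matched
            return i
    return -1
```

Each of the n-m+1 candidate positions may need up to m comparisons, which is where the O(mn) bound comes from.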
Boyer-Moore Algorithm
Improve the running time of the brute-force algorithm by adding two potentially time-saving heuristics:
Looking-Glass Heuristic: When testing a possible placement of P[0..m-1] against T[0..n-1], begin the comparisons from the end of P and move backward to the front of P.
Character-Jump Heuristic: Suppose that T[i] does not match P[j] and T[i] = c. If c is not contained anywhere in P, then shift P completely past T[i]; otherwise, shift P until an occurrence of character c in P gets aligned with T[i].
last(c): if c is in P, last(c) is the index of the last (rightmost) occurrence of c in P. Otherwise, define last(c) = -1.
Compute-Last-Occurrence(P, m, Σ)
  for each character c in Σ do
    last(c) ← -1
  for j ← 0 to m-1 do
    last(P[j]) ← j
Time complexity: O(m + |Σ|)
Example: P[0..5] = abacab
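As a rough sketch, the last-occurrence table can be built with a dictionary. The function name `compute_last_occurrence` and the alphabet {a, b, c, d} used in the usage line are assumptions for illustration; the slide gives only the pattern.

```python
def compute_last_occurrence(pattern, alphabet):
    """Map each character to the index of its rightmost occurrence in pattern.

    Characters of the alphabet that do not occur in pattern map to -1.
    """
    last = {c: -1 for c in alphabet}    # default: character not in pattern
    for j, c in enumerate(pattern):
        last[c] = j                     # later occurrences overwrite earlier ones
    return last


# Assumed alphabet {a, b, c, d}; the pattern is the one from the example above.
table = compute_last_occurrence("abacab", "abcd")
```

For P = abacab this gives last(a) = 4, last(b) = 5, last(c) = 3, and last(d) = -1.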
Algorithm BMMatch(T, P)
  Input: Strings T with n characters and P with m characters
  Output: String index of the first substring of T matching P, or an indication that P is not a substring of T
  Compute-Last-Occurrence(P, m, Σ)
  i ← m-1
  j ← m-1
  repeat
    if P[j] = T[i] then
      if j = 0 then
        return i    // a match!
      else
        i ← i-1
        j ← j-1
    else
      i ← i + (m-1) - min(j-1, last(T[i]))    // jump step
      j ← m-1
  until i > n-1
  return "there is no substring of T matching P"
[Figure: illustration of the jump step; labels: m-j, m-j-1, m-last(T[i])-1]
Time complexity (worst case): O(nm + |Σ|)
Example: T = aaaaaaaa, P = baaa
Usually it runs much faster.
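The BMMatch pseudocode can be sketched in Python roughly as below. This is a simplified version using only the two heuristics from these slides; the name `bm_match`, the -1 "no match" value, and the dictionary-based last table (which treats any character absent from P as last = -1 instead of iterating over Σ) are assumptions of this sketch.

```python
def bm_match(text, pattern):
    """Simplified Boyer-Moore search (looking-glass + character-jump only).

    Returns the index of the first match, or -1 (assumed convention).
    """
    n, m = len(text), len(pattern)
    if m == 0:
        return 0                        # empty pattern matches at index 0
    # Last-occurrence table; characters not in pattern default to -1 below.
    last = {c: j for j, c in enumerate(pattern)}
    i = m - 1                           # current index into text
    j = m - 1                           # current index into pattern
    while i <= n - 1:
        if pattern[j] == text[i]:
            if j == 0:
                return i                # a match
            i -= 1                      # keep comparing right to left
            j -= 1
        else:
            # Jump step, in the same form as the pseudocode above.
            i += (m - 1) - min(j - 1, last.get(text[i], -1))
            j = m - 1
    return -1
```

On the worst-case example (T = aaaaaaaa, P = baaa) every placement is compared all the way back to P[0] before the mismatch is found, and each jump advances the pattern by only one position.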
Knuth-Morris-Pratt Algorithm
T:  b a c b a b a b a a a b c b a b
P:          a b a b a c a
P:              a b a b a c a
In general:
[Figure: general case of pattern P placed against text T]
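Prefix function pre(i): the length of the longest proper prefix of P[1..i] that is also a suffix of P[1..i].
k: index of the last character in the prefix
Example:
i       1  2  3  4  5  6  7  8  9  10
P[i]    a  b  a  b  a  b  a  b  c  a
pre(i)  0  0  1  2  3  4  5  6  0  1
Time complexity: O(m)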
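One common way to compute the prefix function is sketched below. The slides index P from 1, while Python indexes from 0, so in this sketch `pre[i-1]` holds the value the table calls pre(i). The name `kmp_prefix_function` is an assumption made to match the name used in KMPMatch.

```python
def kmp_prefix_function(pattern):
    """Return a list pre where pre[i] is the length of the longest proper
    prefix of pattern[0..i] that is also a suffix of pattern[0..i]."""
    m = len(pattern)
    pre = [0] * m
    k = 0                               # length of the currently matched prefix
    for i in range(1, m):
        # Fall back to shorter borders until the next character extends one.
        while k > 0 and pattern[k] != pattern[i]:
            k = pre[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        pre[i] = k
    return pre


print(kmp_prefix_function("ababababca"))  # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
```

The printed list agrees with the pre(i) row of the example table.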
Algorithm KMPMatch(T, P)
  Input: Strings T[1..n] with n characters and P[1..m] with m characters
  Output: String index of the first substring of T matching P, or an indication that P is not a substring of T
  pre ← KMPPrefixFunction(P)
  j ← 0
  for i ← 1 to n do
    while j > 0 and P[j+1] ≠ T[i] do
      j ← pre(j)
    if P[j+1] = T[i] then
      j ← j+1
    if j = m then
      print "Pattern occurs with shift" i-m    // a match!
      j ← pre(j)    // look for the next match
Time complexity: O(m+n)
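A possible Python rendering of KMPMatch follows. It is a sketch under a few assumptions: indices are 0-based, so the shift i-m of the 1-indexed pseudocode becomes i-m+1; matches are collected in a list instead of printed; and the helper `_prefix_function` is a hypothetical name, included so the block is self-contained.

```python
def _prefix_function(pattern):
    """Border lengths for every prefix of pattern (0-based helper)."""
    pre = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[i]:
            k = pre[k - 1]
        if pattern[k] == pattern[i]:
            k += 1
        pre[i] = k
    return pre


def kmp_match(text, pattern):
    """Return the 0-based shifts of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []                       # assumed convention for an empty pattern
    pre = _prefix_function(pattern)
    shifts = []
    j = 0                               # number of pattern characters matched
    for i in range(n):
        while j > 0 and pattern[j] != text[i]:
            j = pre[j - 1]              # reuse the longest border, never back up in text
        if pattern[j] == text[i]:
            j += 1
        if j == m:
            shifts.append(i - m + 1)    # a match ends at text[i]
            j = pre[j - 1]              # look for the next match
    return shifts
```

Because i only moves forward and j can fall back at most as many times as it has advanced, the scan takes O(n) time after the O(m) preprocessing.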
Assignment
(1) How many character comparisons will the Boyer-Moore algorithm make in searching for each of the following patterns in the binary text?
    Text: repeat 01110 20 times
    Pattern: (a) 01111, (b) 01110
(2) (i) Compute the prefix function in the KMP pattern matching algorithm for the pattern ababbabbabbababbabb when the alphabet is Σ = {a, b}.
    (ii) How many character comparisons will the KMP pattern matching algorithm make in searching for each of the following patterns in the binary text?
    Text: repeat 010011 20 times
    Pattern: (a) 010010, (b) 010110