String Matching - PowerPoint PPT Presentation

About This Presentation
Title:

String Matching


Slides: 12
Provided by: toshi155


Transcript and Presenter's Notes

Title: String Matching


1
String Matching
Input: Strings P (pattern) and T (text), with |P| = m and |T| = n.
Output: Indices of all occurrences of P in T.
Example
T = discombobulate
P        Output
combo    4 (i.e., with shift 3)
ate      12
later    15 > |T| (no occurrence of P)
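The input/output contract above can be checked against Python's built-in `str.find`, used here purely as a reference oracle rather than as one of the algorithms in these slides. The helper name `all_occurrences` is hypothetical; it converts Python's 0-based positions to the 1-based indices used above and restarts the search one position after each hit, so overlapping occurrences are kept.

```python
def all_occurrences(text: str, pattern: str) -> list[int]:
    """Return the 1-based indices of every occurrence of pattern in text.

    Uses str.find as a reference oracle; restarting one position after each
    hit keeps overlapping occurrences.
    """
    indices = []
    start = text.find(pattern)
    while start != -1:
        indices.append(start + 1)               # 0-based position -> 1-based index
        start = text.find(pattern, start + 1)
    return indices


T = "discombobulate"
for P in ("combo", "ate", "later"):
    hits = all_occurrences(T, P)
    shifts = [i - 1 for i in hits]              # shift = index - 1
    print(P, hits, shifts)
# combo [4] [3]
# ate [12] [11]
# later [] []
```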
2
Applications
- Text retrieval
- Computational biology
  - DNA is a one-dimensional (1-D) string of characters A's, G's, C's, and T's.
  - All information for 3-D protein folding is contained in the protein sequence itself and is independent of the environment.
  - Searching for DNA patterns
  - Comparing two or more DNA strings for similarities
  - Reconstructing DNA strings from overlapping fragments
3
Sliding the Pattern Template
T = biology, P = logic (n = 7, m = 5)
[Figure: P aligned under T, slid one position to the right at each step]
T1 ≠ P1
T2 ≠ P1
T3 ≠ P1
T4 = P1, T5 = P2, T6 = P3, but T7 ≠ P4
No match!
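To make the picture concrete, here is a small Python sketch (the helper `first_mismatch` is a hypothetical name) that reports, for each shift, the first pattern position that disagrees with the text. One detail worth noting: with n = 7 and m = 5 the naive matcher only tries shifts 0, 1, and 2, because at shift 3 the pattern would need T[4..8] and T has only 7 characters. The loop below visits shift 3 anyway, only to reproduce the last alignment drawn on the slide.

```python
from typing import Optional


def first_mismatch(text: str, pattern: str, shift: int) -> Optional[int]:
    """Compare pattern with text at a 0-based shift, left to right.

    Returns the 1-based pattern position j of the first mismatch, i.e. the
    first j with T[shift + j] != P[j], or None if all m characters match.
    A text position past the end of T also counts as a mismatch.
    """
    for j, p_char in enumerate(pattern, start=1):
        k = shift + j                           # 1-based text position compared with P[j]
        if k > len(text) or text[k - 1] != p_char:
            return j
    return None


T, P = "biology", "logic"
n, m = len(T), len(P)
print("valid shifts for the naive matcher:", list(range(n - m + 1)))
for s in range(4):                              # shifts 0..3, as drawn on the slide
    j = first_mismatch(T, P, s)
    print(f"shift {s}: T{s + j} != P{j}")
# valid shifts for the naive matcher: [0, 1, 2]
# shift 0: T1 != P1
# shift 1: T2 != P1
# shift 2: T3 != P1
# shift 3: T7 != P4
```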
4
Another Example
T = biological, P = logic (n = 10, m = 5)
[Figure: P aligned under T starting at T4]
Match found! Return 4.
5
The Naive Matcher
Pattern P[1..m], Text T[1..n]

Naive-String-Matcher(T, P)
    // find all occurrences of P in T
    for s ← 1 to n - m + 1
        do if P[1..m] = T[s..s+m-1]
            then print "Pattern occurs at index" s

[Figure: P[1..m] aligned under T[s..s+m-1]]
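A direct Python transcription of Naive-String-Matcher, as a minimal sketch: it keeps the slide's 1-based index s and translates T[s..s+m-1] into the 0-based slice `T[s-1 : s-1+m]`. The function name `naive_string_matcher` is illustrative, and returning the list of indices in addition to printing them is an addition made so the result can be tested.

```python
def naive_string_matcher(T: str, P: str) -> list[int]:
    """Report every 1-based index s such that P[1..m] == T[s..s+m-1]."""
    n, m = len(T), len(P)
    occurrences = []
    for s in range(1, n - m + 2):               # s = 1 .. n - m + 1
        # T[s..s+m-1] in 1-based terms is the 0-based slice T[s-1 : s-1+m]
        if T[s - 1 : s - 1 + m] == P:
            print("Pattern occurs at index", s)
            occurrences.append(s)
    return occurrences


naive_string_matcher("discombobulate", "combo")  # prints: Pattern occurs at index 4
naive_string_matcher("biological", "logic")      # prints: Pattern occurs at index 4
```

If m > n, the range is empty and the function simply returns an empty list.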
6
Time Complexity
m(n - m + 1) comparisons in the worst case:
n - m + 1 blocks (one per shift), each requiring m character comparisons.
Since m(n - m + 1) ≤ mn, the time complexity is O(mn)!
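A quick way to check the count is to instrument a naive matcher. The sketch below assumes the common variant that compares left to right and stops at the first mismatch (the slide's pseudocode compares whole blocks, so this variant is an assumption); `naive_comparisons` is a hypothetical helper. Even with early exit, a text of all a's forces every one of the n - m + 1 shifts to do all m comparisons.

```python
def naive_comparisons(T: str, P: str) -> int:
    """Count character comparisons made by a naive matcher that checks each
    shift left to right and stops at the first mismatch (assumed variant)."""
    n, m = len(T), len(P)
    count = 0
    for s in range(n - m + 1):                  # n - m + 1 shifts ("blocks")
        for j in range(m):                      # up to m comparisons per shift
            count += 1
            if T[s + j] != P[j]:
                break
    return count


n, m = 20, 5
print(naive_comparisons("a" * n, "a" * m), m * (n - m + 1))   # 80 80
print(naive_comparisons("a" * n, "aaaab"))                     # 80: mismatch only at P[m]
print(naive_comparisons("biology", "logic"))                   # 3: each shift fails at once
```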
7
Finite Automaton
A finite automaton consists of
- a finite set Q of states,
- a start state q0 ∈ Q,
- a set A ⊆ Q of accepting states,
- a finite input alphabet Σ,
- a transition function δ: Q × Σ → Q.
[Figure: example automaton with its start state and accepting state labeled]
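The five components map naturally onto a small record type. This Python sketch is illustrative only: the class name `FiniteAutomaton` and the encoding of δ as a dictionary keyed by (state, symbol) are assumptions, not something the slides specify. It checks that q0 ∈ Q, that A ⊆ Q, and that δ is total, i.e. defined for every pair in Q × Σ with a result in Q.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, Tuple

State = Hashable
Symbol = str


@dataclass
class FiniteAutomaton:
    """Hypothetical container for the 5-tuple (Q, q0, A, Sigma, delta)."""
    states: FrozenSet[State]                    # Q
    start: State                                # q0, must lie in Q
    accepting: FrozenSet[State]                 # A, a subset of Q
    alphabet: FrozenSet[Symbol]                 # Sigma
    delta: Dict[Tuple[State, Symbol], State]    # transition function

    def __post_init__(self) -> None:
        if self.start not in self.states:
            raise ValueError("start state must belong to Q")
        if not self.accepting <= self.states:
            raise ValueError("A must be a subset of Q")
        # delta must be total: defined for every (q, a) and mapping into Q
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta({q!r}, {a!r}) must be defined and lie in Q")


Q = frozenset({0, 1})
complete = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}
M = FiniteAutomaton(Q, 0, frozenset({1}), frozenset("ab"), complete)

partial = dict(complete)
del partial[(1, "b")]                           # leave one transition undefined
try:
    FiniteAutomaton(Q, 0, frozenset({1}), frozenset("ab"), partial)
except ValueError as err:
    print("rejected:", err)                     # rejected: delta(1, 'b') must be defined and lie in Q
```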
8
Accepting a String
Input    State sequence    Accepts?
aabba    010001            Yes
bbabb    000100            No
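The transition diagram is not in this transcript, but the two state sequences pin down a consistent table: from state 0, a goes to 1 and b stays at 0; from state 1, both a and b go to 0, with state 1 accepting. Treating that reconstructed table as an assumption, this Python sketch (hypothetical helper `run`) replays both inputs. With this table, the automaton accepts exactly the strings that end in an odd number of a's.

```python
def run(delta, start, accepting, word):
    """Feed word to the automaton; return (state sequence, accepted?)."""
    q = start
    states = [q]
    for ch in word:
        q = delta[(q, ch)]                      # one transition per input character
        states.append(q)
    return states, q in accepting


# Transition table reconstructed from the state sequences above (assumption).
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 0}

for w in ("aabba", "bbabb"):
    seq, ok = run(DELTA, 0, {1}, w)
    print(w, "".join(map(str, seq)), "Yes" if ok else "No")
# aabba 010001 Yes
# bbabb 000100 No
```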
9
A String Matching Automaton
Pattern P = aaba
Ex. T = abbaaabaaba

i       -  1  2  3  4  5  6  7  8  9 10 11
T[i]    -  a  b  b  a  a  a  b  a  a  b  a
state   0  1  0  0  1  2  2  3  4  2  3  4

Pattern occurs at indices 5 and 8!
The characters aba are not rescanned, due to the transition 4 → 2.
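The automaton diagram itself did not survive in this transcript. As a hedged reconstruction, the Python sketch below builds δ for P = aaba with the usual textbook definition, δ(q, a) = length of the longest prefix of P that is a suffix of P[1..q]a, and then replays T to recover the state row and the match indices. `compute_transition_function` is an illustrative name, and this straightforward construction costs O(m^3 |Σ|), which is fine for a pattern this short.

```python
def compute_transition_function(P: str, alphabet: str) -> list:
    """Return delta as a list of dicts: delta[q][a] is the length of the
    longest prefix of P that is a suffix of P[:q] + a (states 0..m)."""
    m = len(P)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            seen = P[:q] + a                    # what state q knows, plus the new character
            k = min(m, q + 1)
            while not seen.endswith(P[:k]):     # P[:0] == "" always matches, so this stops
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


P, T = "aaba", "abbaaabaaba"
delta = compute_transition_function(P, "ab")
q, trace, hits = 0, [0], []
for i, ch in enumerate(T, start=1):
    q = delta[q][ch]
    trace.append(q)
    if q == len(P):
        hits.append(i - len(P) + 1)             # 1-based start index of the match

print(trace)   # [0, 1, 0, 0, 1, 2, 2, 3, 4, 2, 3, 4]
print(hits)    # [5, 8]
```

The row for state 4 shows the slide's point: on reading a after a full match, δ(4, a) = 2, so the automaton continues from "aa" instead of going back to re-examine earlier characters.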
10
Key Ideas of Automaton Matching
- Slide the pattern forward by more than one position if possible.
- Do not rescan characters of T that have already been examined.
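To see what rescanning costs on the automaton example (T = abbaaabaaba, P = aaba), this Python sketch counts how often a naive matcher reads each position of T. It assumes the variant that compares left to right and stops at the first mismatch; `naive_reads` is a hypothetical helper. An automaton matcher reads each of the 11 characters exactly once, whereas the naive matcher makes 18 reads and examines positions 6 and 7 three times each.

```python
from collections import Counter


def naive_reads(T: str, P: str) -> Counter:
    """Count how often a naive matcher (left to right, stop at the first
    mismatch) reads each 1-based position of T."""
    n, m = len(T), len(P)
    reads = Counter()
    for s in range(n - m + 1):
        for j in range(m):
            reads[s + j + 1] += 1               # position s + j + 1 of T is examined
            if T[s + j] != P[j]:
                break
    return reads


T, P = "abbaaabaaba", "aaba"
reads = naive_reads(T, P)
print(sum(reads.values()), "character reads for n =", len(T))   # 18 character reads for n = 11
print(max(reads.values()), "reads of the most-examined position")  # 3 reads of the most-examined position
```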
11
The Automaton Matcher
Finite-Automaton-Matcher(T, δ, m)
    n ← length[T]
    q ← 0                        // current state
    for i ← 1 to n
        do q ← δ(q, T[i])        // δ function precomputed
           if q = m              // match succeeds
              then print "Pattern occurs at index" i - m + 1

Running time: O(n) if the state transition function δ is available.
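Here is the matcher as runnable Python, a sketch under the assumption that δ is stored as a dictionary keyed by (state, character). To keep the block self-contained, the table for P = aaba is written out by hand from the longest-prefix-that-is-a-suffix definition; `finite_automaton_matcher` and `DELTA_AABA` are illustrative names. The loop does one table lookup per text character, which is where the O(n) bound comes from.

```python
def finite_automaton_matcher(T: str, delta: dict, m: int) -> list[int]:
    """Scan T once, following the precomputed delta; report 1-based match indices."""
    q = 0                                       # current state
    matches = []
    for i, ch in enumerate(T, start=1):         # i = 1 .. n
        q = delta[(q, ch)]                      # one table lookup per text character
        if q == m:                              # match succeeds
            print("Pattern occurs at index", i - m + 1)
            matches.append(i - m + 1)
    return matches


# delta for P = "aaba" over {a, b}, worked out by hand.
DELTA_AABA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 3,
    (3, "a"): 4, (3, "b"): 0,
    (4, "a"): 2, (4, "b"): 0,
}

finite_automaton_matcher("abbaaabaaba", DELTA_AABA, 4)
# Pattern occurs at index 5
# Pattern occurs at index 8
```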