Title: Parallel String Matching Algorithm(s) Using Associative Processors
1 Parallel String Matching Algorithm(s) Using Associative Processors
- Original work by Mary Esenwein and Dr. Johnnie Baker
- Presented by Shannon Steinfadt
- April 18, 2007
2 String Matching Problem
- Also known as pattern matching or string searching
- Useful in many applications, such as text editing, information retrieval, DNA analysis, and homeland security
3 What are we doing?
- Given a pattern and some text, find out if the pattern is IN the text
- Is pattern AB in the text ABAA? If so, where?
Pattern: AB
Text: ABAA
4 What's the notation?
- P is a pattern string of length m
- T is a text string of length n, usually n ≫ m
5 Goal of String Matching
- To find all occurrences of a pattern string in the text string
- Locate all positions i in T such that T[i+j-1] = P[j] for all j, 1 ≤ j ≤ m
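The goal stated above can be checked directly with a sequential reference routine. The following is a minimal sketch, not part of the original slides; the function name find_occurrences is hypothetical, and positions are reported 1-based to match the slide notation.

```python
def find_occurrences(pattern: str, text: str) -> list[int]:
    """Return every 1-based position i with T[i+j-1] = P[j] for 1 <= j <= m."""
    m, n = len(pattern), len(text)
    positions = []
    for i in range(1, n - m + 2):  # candidate start positions 1 .. n-m+1
        # Python strings are 0-based, so T[i+j-1] is text[i+j-2] and P[j] is pattern[j-1]
        if all(text[i + j - 2] == pattern[j - 1] for j in range(1, m + 1)):
            positions.append(i)
    return positions


print(find_occurrences("AB", "ABAA"))          # [1]
print(find_occurrences("BBA", "ABBBABBBABA"))  # [3, 7]
```

This brute-force check takes O(mn) time and is only meant as a baseline for comparing against the parallel algorithms.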
6 Pattern Variations
- An exact pattern
- A "don't care" character (*) in the pattern
- Provides flexibility in matching
- * indicates character(s) of the text that are irrelevant to the matching process
7 General Don't Care Characters (*): Characteristics
- A single character of text
- Multiple consecutive text characters
- No characters
- A combination of the above three
- Example
- Pattern AB*CD could match ABBCD, ABBBBBCD, or ABCD (* is null)
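The example above can be reproduced sequentially with a regular expression, which may help when checking expected answers by hand. This is only an illustrative sketch using Python's standard re module, not the associative algorithm; the helper name vldc_to_regex is hypothetical, and it assumes * is the only special character in the pattern.

```python
import re


def vldc_to_regex(pattern: str):
    """Translate a pattern with '*' don't cares into a compiled regular expression."""
    # Escape each literal segment, then let '.*' stand for zero or more text characters
    segments = [re.escape(seg) for seg in pattern.split("*")]
    return re.compile(".*".join(segments))


rx = vldc_to_regex("AB*CD")
for candidate in ["ABBCD", "ABBBBBCD", "ABCD", "ABDC"]:
    print(candidate, rx.fullmatch(candidate) is not None)
```

The first three candidates are the ones listed on the slide; ABDC is included as a non-matching case.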
8 String Matching using ASC
- Three parallel algorithms using associative computing (using a 1-D mesh)
- String matching for exact match
- String matching with fixed-length don't care
- I.e., exactly 1 character
- String matching with variable-length don't care (VLDC)
- A don't care can have any length or be null
9 ASC Exact Match Algorithm
- for (j = patt_length - 1; j >= 0; j--)
- {
- Responders are text == patt_string[j] and counter == patt_counter
- Responders add 1 to counter and store result in counter of preceding cell
- patt_counter++
- }
- /* When pattern has been processed */
- Responders are counter == patt_length
- Responders set match = 1 in next cell
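The responder steps above can be simulated sequentially to see how the counters evolve. The sketch below is an illustration in Python, not ASC code: each list index stands for one cell, the function name asc_exact_match is hypothetical, and the '@' filler in the first cell is taken from the trace slides (it is assumed not to occur in the pattern).

```python
def asc_exact_match(pattern: str, text: str):
    """Sequentially simulate the associative exact-match steps; returns (match, counter)."""
    cells = "@" + text               # first cell holds a filler symbol, as in the trace
    n = len(cells)
    counter = [0] * n
    match = [0] * n
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        # Responders: text equals pattern[j] and counter equals patt_counter.
        # The list is built before any write, mimicking one parallel step.
        responders = [c for c in range(n)
                      if cells[c] == pattern[j] and counter[c] == patt_counter]
        for c in responders:
            if c > 0:
                counter[c - 1] = patt_counter + 1   # store in the preceding cell
        patt_counter += 1
    # When the pattern has been processed: counter == patt_length marks the next cell
    for c in range(n - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return match, counter


match, counter = asc_exact_match("BBA", "ABBBABBBABA")
print(counter)  # [1, 0, 3, 2, 1, 0, 3, 2, 1, 2, 1, 0]
print(match)    # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

The loop over the pattern runs m times, and each iteration corresponds to a constant number of associative operations, which is where the O(m) running time on O(n) cells comes from.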
10 Initial State of Exact Match Algorithm
Pattern: BBA    Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3
Text  Match  Counter
@     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
A     0      0
11 (No Transcript)
12 Final State of Exact Match Algorithm
Pattern: BBA    Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
Text  Match  Counter
@     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      2
B     0      1
A     0      0
The pattern BBA begins at each cell where Match = 1.
13 Algorithm for unit length "don't cares" using ASC
- for (j = patt_length - 1; j >= 0; j--)
- {
- if (pattern[j] == '*')
- Responders are counter == patt_counter
- else // pattern[j] is not the don't care character
- Responders are text == pattern[j] and counter == patt_counter
- If no Responders are detected, exit
- Responders add 1 to counter and store result in counter of preceding cell
- patt_counter++
- }
- /* When pattern has been processed */
- Responders are counter == patt_length
- Responders set match = 1 in next cell
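The unit-length don't care variant differs from the exact-match steps only in how responders are selected. A sequential Python sketch of that difference follows; it is an illustration under assumptions, with a hypothetical function name, one list index per cell, and the '@' filler cell borrowed from the trace slides.

```python
def asc_match_unit_dont_care(pattern: str, text: str, wildcard: str = "*"):
    """Simulate matching where each wildcard stands for exactly one text character."""
    cells = "@" + text               # first cell holds a filler symbol, as in the trace
    n = len(cells)
    counter = [0] * n
    match = [0] * n
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        if pattern[j] == wildcard:
            # Don't care: any cell with the right counter responds, whatever its text
            responders = [c for c in range(n) if counter[c] == patt_counter]
        else:
            responders = [c for c in range(n)
                          if cells[c] == pattern[j] and counter[c] == patt_counter]
        if not responders:
            return match, counter    # no responders: exit early, nothing can match
        for c in responders:
            if c > 0:
                counter[c - 1] = patt_counter + 1   # store in the preceding cell
        patt_counter += 1
    for c in range(n - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return match, counter


match, counter = asc_match_unit_dont_care("B*A", "ABBBABBBABA")
print(counter)  # [1, 0, 3, 2, 1, 0, 3, 2, 1, 2, 1, 0]
print(match)    # [0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```

For this particular text, B*A ends in the same state as the exact pattern BBA, because every occurrence of B*A here happens to have B in the middle position.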
14 ASC Exact Match Algorithm (again)
- for (j = patt_length - 1; j >= 0; j--)
- {
- Responders are text == patt_string[j] and counter == patt_counter
- Responders add 1 to counter and store result in counter of preceding cell
- patt_counter++
- }
- /* When pattern has been processed */
- Responders are counter == patt_length
- Responders set match = 1 in next cell
15 Initial State with a Unit Length Don't Care
Pattern: B*A    Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0, patt_length = 3
Text  Match  Counter
@     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
A     0      0
16 (No Transcript)
17 Final State of Exact Match Algorithm
Pattern: B*A    Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
Text  Match  Counter
@     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      2
B     0      1
A     0      0
The pattern B*A begins at each cell where Match = 1.
18 VLDC Algorithm (added)
- Works on each segment of the pattern, where the segments are separated by the * character
- AB*BB*A has three segments
- Consecutive * characters are not necessary and not allowed
- This VLDC algorithm is unique
- Provides information to find all continuation points of all matches following each *
19 VLDC ALGORITHM USING ASC
- int patt_length = m
- int maxcell = n + 2
- /* Special handling for * at end of pattern */
- if (pattern[m-1] == '*')
- {
- Responders are cell index > 1
- Responders set segment[0] = 1
- patt_counter = 1
- k = 1 /* Reset initial segment index */
- }
- while ((patt_length - patt_counter) > 0 && maxcell > 0)
- {
- patt_counter = 0
- for (I = patt_length - 1; I >= 0 && pattern[I] != '*'; I--)
- {
- Responders are text == pattern[I] and counter == patt_counter and cell index < maxcell
- Responders add 1 to counter and store result in counter of preceding cell
- patt_counter++
- }
20 VLDC continued
- Responders set segment[k] = patt_counter in next cell
- Responders are segment[k] > 0
- maxcell = maximum cell index value of Responders
- else if no Responders maxcell = 0
- All cells become Responders and set counter = 0
- patt_counter++; k++
- }
- /* When pattern has been processed */
- Responders are segment[--k] > 0
- Responders set match = 1
- /* Special handling for * at start of pattern */
- if (pattern[0] == '*')
- {
- Responders are cell index < maxcell and cell index > 1
- Responders set match = 1
- }
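The segment-by-segment idea of the VLDC steps can be simulated sequentially. The Python sketch below is a simplification under stated assumptions rather than a transcription of the slide pseudocode: it splits the pattern into segments up front instead of tracking patt_length and patt_counter, it omits the special handling for * at the start or end of the pattern, and the function name asc_vldc and the '@' filler cell are illustrative.

```python
def asc_vldc(pattern: str, text: str, wildcard: str = "*"):
    """Simulate the VLDC segment passes; returns (match, [S0, S1, ...])."""
    segments = pattern.split(wildcard)       # "AB*BB*A" -> ["AB", "BB", "A"]
    if any(seg == "" for seg in segments):
        raise ValueError("sketch assumes no leading, trailing, or consecutive wildcards")
    cells = "@" + text                       # list index c corresponds to cell c + 1
    n = len(cells)
    maxcell = len(text) + 2                  # 1-based bound, n + 2 as on the slide
    seg_cols = []
    for seg in reversed(segments):           # rightmost segment is processed first
        counter = [0] * n                    # all cells reset counter to 0
        for step, ch in enumerate(reversed(seg)):
            # Responders: matching text, matching counter, and cell index < maxcell
            responders = [c for c in range(n)
                          if cells[c] == ch and counter[c] == step and c + 1 < maxcell]
            for c in responders:
                if c > 0:
                    counter[c - 1] = step + 1    # store in the preceding cell
        col = [0] * n
        for c in range(n - 1):
            if counter[c] == len(seg):
                col[c + 1] = len(seg)        # segment starts in the next cell
        seg_cols.append(col)
        starts = [c + 1 for c in range(n) if col[c] > 0]
        maxcell = max(starts) if starts else 0   # keeps segments in order
        if maxcell == 0:
            break                            # a segment never occurs: no match
    match = [0] * n
    if len(seg_cols) == len(segments) and maxcell > 0:
        match = [1 if v > 0 else 0 for v in seg_cols[-1]]
    return match, seg_cols


match, (s0, s1, s2) = asc_vldc("AB*BB*A", "ABBBABBBABA")
print(match)  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(s0)     # [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
print(s1)     # [0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0]
print(s2)     # [0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0]
```

For the pattern AB*BB*A and text ABBBABBBABA, maxcell takes the values 13, 12, 8, 6 across the passes, so the AB at cells 10 and 11 is correctly excluded because no BB segment can follow it.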
21 After third pattern segment in VLDC Algorithm
Pattern: AB*BB*A    Text: ABBBABBBABA
Patt_counter: 0→1→2
Maxcell: 13→12
Cell  T  M  C      S0   S1  S2  Responder
1     @  0  0→1→0  0    0   0   Y→N
2     A  0  0      0→1  0   0   Y
3     B  0  0      0    0   0
4     B  0  0      0    0   0
5     B  0  0→1→0  0    0   0   Y→N
6     A  0  0      0→1  0   0   Y
7     B  0  0      0    0   0
8     B  0  0      0    0   0
9     B  0  0→1→0  0    0   0   Y→N
10    A  0  0      0→1  0   0   Y
11    B  0  0→1→0  0    0   0   Y→N
12    A  0  0      0→1  0   0   Y
22 After second pattern segment in VLDC Algorithm
Pattern: AB*BB*A    Text: ABBBABBBABA
Patt_counter: 0→1→2, then 0→1→2→3
Maxcell: 13→12→8 (used to keep pattern segments in order, i.e. AB occurs before BB)
Cell  T  M  C        S0  S1   S2  Responder
1     @  0  0        0   0    0
2     A  0  0→1→2→0  1   0    0   Y
3     B  0  0→1→2→0  0   0→2  0   Y→Y→Y
4     B  0  0→1→0    0   0→2  0   Y→Y→N
5     B  0  0        0   0    0   Y→N
6     A  0  0→1→2→0  1   0    0   Y
7     B  0  0→1→2→0  0   0→2  0   Y→Y→Y
8     B  0  0→1→0    0   0→2  0   Y→Y→N
9     B  0  0        0   0    0   Y→N
10    A  0  0→1→0    1   0    0
11    B  0  0        0   0    0   Y→N
12    A  0  0        1   0    0
23 After first pattern segment in VLDC Algorithm
Pattern: AB*BB*A    Text: ABBBABBBABA
Patt_counter: 0→1→2, then 0→1→2→3, then 0→1→2→3
Maxcell: 13→12→8→6 (used to keep pattern segments in order, i.e. AB occurs before BB)
Cell  T  M  C      S0  S1  S2   Responder
1     @  0  0→2→0  0   0   0    Y
2     A  0  0→1→0  1   0   0→2  Y→N
3     B  0  0→1→0  0   2   0    Y→N
4     B  0  0→1→0  0   2   0    Y→N
5     B  0  0→2→0  0   0   0    Y→N→Y
6     A  0  0→1→0  1   0   0→2  Y→N
7     B  0  0      0   2   0    Y→N
8     B  0  0      0   2   0
9     B  0  0      0   0   0
10    A  0  0      1   0   0
11    B  0  0      0   0   0
12    A  0  0      1   0   0
24 Final State in VLDC Algorithm
Pattern: AB*BB*A    Text: ABBBABBBABA
Patt_counter: 0→1→2, then 0→1→2→3, then 0→1→2→3
Maxcell: 13→12→8→6 (used to keep pattern segments in order, i.e. AB occurs before BB)
Cell  T  M  Counter  S0  S1  S2  Responder
1     @  0  0        0   0   0
2     A  1  0        1   0   2   Y
3     B  0  0        0   2   0
4     B  0  0        0   2   0
5     B  0  0        0   0   0
6     A  1  0        1   0   2   Y
7     B  0  0        0   2   0
8     B  0  0        0   2   0
9     B  0  0        0   0   0
10    A  0  0        1   0   0
11    B  0  0        0   0   0
12    A  0  0        1   0   0
25 Finding All Continuation Points
- A match starts where M = 1
- A match to any pattern segment begins where Sx = segment length
- I.e., where any Sx > 0
- The match continues in S(x-1) at any cell/PE whose index is ≥ (Sx segment size) + Sx's cell/PE index
26 Using the Final State in VLDC Algorithm
Pattern: AB*BB*A    Text: ABBBABBBABA
Cell  T  M  C  S0  S1  S2
1     @  0  0  0   0   0
2     A  1  0  1   0   2
3     B  0  0  0   2   0
4     B  0  0  0   2   0
5     B  0  0  0   0   0
6     A  1  0  1   0   2
7     B  0  0  0   2   0
8     B  0  0  0   2   0
9     B  0  0  0   0   0
10    A  0  0  1   0   0
11    B  0  0  0   0   0
12    A  0  0  1   0   0
- Start with index 2, where there's a match (M = 1)
- Work from S2 down and left: count down 2 values and move into S1, count down 2 values and move to S0
- That produces 2→4→6: ABBBA
- Any index > 4 in S1 whose value is > 0 will also produce a correct match
- 2→7→10: ABBBABBBA
- 2→8→10: ABBBABBBA
- Some of the additional matches are
- 2→4→10: ABBBABBBA
- 2→4→12: ABBBABBBABA
- 2→8→12: ABBBABBBABA
- 6→8→10: ABBBA
- 6→8→12: ABBBABA
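The walk through the final state can be written as a small enumeration over the segment columns. The Python sketch below is illustrative only: the function name continuation_chains is hypothetical, the columns are typed in from the final-state table, and it assumes the pattern neither starts nor ends with *, so that the cells with M = 1 are exactly the cells where the leftmost segment column is nonzero.

```python
def continuation_chains(seg_cols):
    """List chains of 1-based cell indices, one start cell per pattern segment.

    seg_cols[0] is S0 (rightmost segment); the last column is the leftmost segment.
    """
    order = list(reversed(seg_cols))     # walk from the leftmost segment to S0
    chains = []

    def extend(chain, level, min_index):
        if level == len(order):
            chains.append(tuple(chain))
            return
        for c, seg_len in enumerate(order[level]):
            cell = c + 1
            # A segment may start here only at or after the end of the previous one
            if seg_len > 0 and cell >= min_index:
                extend(chain + [cell], level + 1, cell + seg_len)

    extend([], 0, 1)
    return chains


# Final-state columns for pattern AB*BB*A and text ABBBABBBABA (cells 1..12)
S0 = [0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1]
S1 = [0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0]
S2 = [0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0]
for chain in continuation_chains([S0, S1, S2]):
    print("→".join(str(cell) for cell in chain))
```

For this example the enumeration yields nine chains: the eight listed on the slide plus 2→7→12.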
27 Existing Algorithms
- Sequential algorithms
- Naïve algorithm: O(mn)
- Knuth-Morris-Pratt or Boyer-Moore: O(m+n)
- Parallel algorithms
- A PRAM exact string matching: O(n)
- On a reconfigurable mesh: O(1) on n(n-m+1) PEs
- On a SIMD hypercube (limited to the alphabet {0,1}): O(lg n) on n/lg n PEs
- On a neural network: O(1) on nm PEs
- ASC algorithms: O(m) time on O(n) PEs
28 Question to consider
- The don't care character allows a non-match of arbitrary length. This is discussed on slide 13. Instead, consider allowing a non-match for exactly two characters, and make the necessary changes to the trace in slides 15-16.