Title: Parallel String Matching Algorithm(s) Using Associative Processors


1
Parallel String Matching Algorithm(s) Using
Associative Processors
  • Original work by
  • Mary Esenwein and Dr. Johnnie Baker
  • Presented by Shannon Steinfadt
  • April 18, 2007

2
String Matching Problem
  • Also known as pattern matching or string searching
  • Useful in many applications, such as text editing,
    information retrieval, DNA analysis, and homeland
    security

3
What are we doing?
  • Given a pattern and some text, find out if the
    pattern is IN the text
  • Is pattern AB in the text ABAA? If so, where?

Pattern: AB
Text: ABAA
4
What's the notation?
  • P is a pattern string of length m
  • T is a text string of length n, usually n >> m

5
Goal of String Matching
  • To find all occurrences of a pattern string in
    the text string
  • Locate all positions i in T such that
    T[i+j-1] = P[j] for all j, 1 <= j <= m
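
The goal stated above can be checked directly with a sequential baseline. The following is a minimal sketch, not part of the original slides; the function name find_all is made up for illustration, and positions are 1-based to match the slide notation.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return every 1-based position i with T[i+j-1] == P[j] for 1 <= j <= m."""
    m, n = len(pattern), len(text)
    hits = []
    for i in range(1, n - m + 2):  # candidate positions 1 .. n-m+1
        # Python strings are 0-based, so T[i+j-1] is text[i+j-2] and P[j] is pattern[j-1].
        if all(text[i + j - 2] == pattern[j - 1] for j in range(1, m + 1)):
            hits.append(i)
    return hits


print(find_all("AB", "ABAA"))  # [1]
```

This is the naive sequential check that compares the pattern at every position; the associative algorithms on the later slides replace the outer loop over i with a parallel search across cells.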

6
Pattern Variations
  • An exact pattern
  • A "don't care" character (*) in the pattern
      • Flexibility in matching
      • * indicates character(s) of the text that are
        irrelevant to the matching process

7
General "Don't Care" Characters (*)
Characteristics
  • Single character of text
  • Multiple consecutive text characters
  • No characters
  • Combination of above three
  • Example
      • Pattern AB*CD could match ABBCD, ABBBBBCD, or
        ABCD (* is null)
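
For readers who know regular expressions, a general don't care behaves like ".*". The sketch below is only an illustration using Python's standard re module, not the associative algorithm from the slides; it assumes "*" is the don't care symbol, and the helper name vldc_to_regex is hypothetical.

```python
import re


def vldc_to_regex(pattern: str) -> re.Pattern:
    """Translate a pattern with '*' don't cares into an equivalent regex."""
    # Escape each literal segment, then let '.*' stand in for every don't care.
    segments = [re.escape(seg) for seg in pattern.split("*")]
    return re.compile(".*".join(segments))


rx = vldc_to_regex("AB*CD")
for candidate in ("ABBCD", "ABBBBBCD", "ABCD", "ABDC"):
    print(candidate, bool(rx.fullmatch(candidate)))
```

The first three candidates are the ones from the example, covering a single character, several characters, and no characters; the fourth is a non-match added for contrast.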

8
String Matching using ASC
  • Three parallel algorithms using associative
    computing (using 1-D mesh)
  • String matching for exact match
  • String matching with fixed length don't care
      • I.e., exactly 1 character
  • String matching with variable length don't care
      • A don't care can have any length or be null

9
ASC Exact Match Algorithm
  • for (j = patt_length - 1; j >= 0; j--)
      • Responders are text == patt_string[j]
        and counter == patt_counter
      • Responders add 1 to counter and store
        result in counter of preceding cell
      • patt_counter++
  • /* When pattern has been processed */
  • Responders are counter == patt_length
  • Responders set match = 1 in next cell
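
The steps above can be simulated sequentially to see how the counters evolve. This is a minimal sketch under stated assumptions, not the authors' ASC code: each list index stands for one cell, index 0 is the sentinel cell holding "@" (as in the trace slides), and the list comprehension plays the role of the parallel associative search. The function name is made up.

```python
def asc_exact_match(pattern: str, text: str):
    """Simulate the exact-match steps; returns (match, counter) per cell."""
    cells = "@" + text  # sentinel cell in front of the text
    counter = [0] * len(cells)
    match = [0] * len(cells)
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        # "Responders": every cell is tested against the same condition.
        responders = [c for c in range(len(cells))
                      if cells[c] == pattern[j] and counter[c] == patt_counter]
        # Each responder stores its counter + 1 in the preceding cell.
        for c in responders:
            if c > 0:
                counter[c - 1] = patt_counter + 1
        patt_counter += 1
    # Cells whose counter reached the pattern length flag the next cell.
    for c in range(len(cells) - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return match, counter


match, counter = asc_exact_match("BBA", "ABBBABBBABA")
print(match)
print(counter)
```

The loop over j runs m times and each pass is a constant number of associative operations, which is where the O(m) running time quoted for the ASC algorithms comes from.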

10
Text  Match  Counter
@     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
A     0      0
Pattern: BBA   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0
patt_length = 3
11
(No Transcript)
12
Final State of Exact Match Algorithm
Text  Match  Counter
@     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      2
B     0      1
A     0      0
Pattern: BBA   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
13
Algorithm for unit length "don't cares" using ASC
  • for (j = patt_length - 1; j >= 0; j--)
      • if (pattern[j] == '*')
          • Responders are counter == patt_counter
      • else // pattern[j] is not the don't care
        character
          • Responders are text == pattern[j]
            and counter == patt_counter
      • If no Responders are detected, exit
      • Responders add 1 to counter and store result
        in counter of preceding cell
      • patt_counter++
  • /* When pattern has been processed */
  • Responders are counter == patt_length
  • Responders set match = 1 in next cell
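
A sequential simulation of these steps shows that the only change from the exact match is the responder test. This is a hedged sketch, not the authors' ASC code; it assumes "*" is the unit-length don't care symbol and that list index 0 is the sentinel cell "@". The function name is hypothetical.

```python
def asc_unit_dont_care(pattern: str, text: str, dont_care: str = "*") -> list[int]:
    """Simulate the unit-length don't care steps; returns the match flag per cell."""
    cells = "@" + text  # sentinel cell in front of the text
    counter = [0] * len(cells)
    match = [0] * len(cells)
    patt_counter = 0
    for j in range(len(pattern) - 1, -1, -1):
        if pattern[j] == dont_care:
            # Any text character is acceptable; only the counter is tested.
            responders = [c for c in range(len(cells))
                          if counter[c] == patt_counter]
        else:
            responders = [c for c in range(len(cells))
                          if cells[c] == pattern[j] and counter[c] == patt_counter]
        if not responders:
            return match  # no partial match survives, so stop early
        for c in responders:
            if c > 0:
                counter[c - 1] = patt_counter + 1
        patt_counter += 1
    for c in range(len(cells) - 1):
        if counter[c] == len(pattern):
            match[c + 1] = 1
    return match


flags = asc_unit_dont_care("B*A", "ABBBABBBABA")
print([c + 1 for c, flag in enumerate(flags) if flag])  # 1-based cells: [4, 8]
```

For the text ABBBABBBABA, the pattern B*A flags the same cells as the exact pattern BBA, because the middle character of every B?A occurrence happens to be B.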

14
ASC Exact Match Algorithm (again)
  • for (j = patt_length - 1; j >= 0; j--)
      • Responders are text == patt_string[j]
        and counter == patt_counter
      • Responders add 1 to counter and store
        result in counter of preceding cell
      • patt_counter++
  • /* When pattern has been processed */
  • Responders are counter == patt_length
  • Responders set match = 1 in next cell

15
Text  Match  Counter
@     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
B     0      0
B     0      0
A     0      0
B     0      0
A     0      0
Pattern: B*A   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
patt_counter = 0
patt_length = 3
16
(No Transcript)
17
Final State of Exact Match Algorithm
Text  Match  Counter
@     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      0
B     0      3
B     1      2
B     0      1
A     0      2
B     0      1
A     0      0
Pattern: B*A   Text: ABBBABBBABA
m = pattern length, n = text length, j = pattern index, i = text index
18
VLDC Algorithm (added)
  • Works on each segment of the pattern broken up
    by the * character
      • AB*BB*A has three sections
      • Consecutive * characters not necessary, not
        allowed
  • This VLDC algorithm is unique
      • Provides information to find all continuation
        points of all matches following each *

19
VLDC ALGORITHM USING ASC
  • int patt_length = m
  • int maxcell = n + 2
  • /* Special handling for * at end of pattern */
  • if (pattern[m-1] == '*')
      • Responders are cell index > 1
      • Responders set segment[0] = 1
      • patt_counter = 1
      • k = 1 /* Reset initial segment index */
  • while ((patt_length -= patt_counter) > 0 &&
    maxcell > 0)
      • patt_counter = 0
      • for (I = patt_length - 1; I >= 0 && pattern[I]
        != '*'; I--)
          • Responders are text == pattern[I] and counter
            == patt_counter and cell index < maxcell
          • Responders add 1 to counter and store result
            in counter of preceding cell
          • patt_counter++

20
VLDC continued
      • Responders set segment[k] = patt_counter in next
        cell
      • Responders are segment[k] > 0
      • maxcell = maximum cell index value of Responders
      • else if no Responders, maxcell = 0
      • All cells become Responders and set counter = 0
      • patt_counter++; k++
  • /* When pattern has been processed */
  • Responders are segment[--k] > 0
  • Responders set match = 1
  • /* Special handling for * at start of pattern */
  • if (pattern[0] == '*')
      • Responders are cell index < maxcell and cell
        index > 1
      • Responders set match = 1
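
The VLDC steps from these two slides can be simulated sequentially as follows. This is a sketch of one reading of the transcript, not the authors' ASC code. Assumptions: "*" is the don't care symbol, cells are numbered from 1 with the sentinel "@" in cell 1, and the cells that write segment[k] are those whose counter equals the segment length (the transcript does not say which cells respond at that step; this choice reproduces the trace slides). The function name is made up.

```python
def asc_vldc(pattern: str, text: str):
    """Simulate the VLDC steps; returns (match, segment) with 1-based cell indices."""
    cells = " @" + text                      # index 0 is padding, cells[1] == '@'
    last = len(cells) - 1                    # highest cell index (n + 1)
    every = range(1, last + 1)
    counter = [0] * (last + 1)
    match = [0] * (last + 1)
    segment = []                             # segment[k][c], filled from the pattern's end
    patt_length = len(pattern)
    maxcell = len(text) + 2
    patt_counter = 0
    if pattern.endswith("*"):                # '*' at end of pattern
        segment.append([0, 0] + [1] * (last - 1))
        patt_counter = 1
    while True:
        patt_length -= patt_counter          # drop the processed segment and its '*'
        if patt_length <= 0 or maxcell <= 0:
            break
        patt_counter = 0
        i = patt_length - 1
        while i >= 0 and pattern[i] != "*":
            responders = [c for c in every
                          if cells[c] == pattern[i]
                          and counter[c] == patt_counter and c < maxcell]
            for c in responders:
                counter[c - 1] = patt_counter + 1   # store in preceding cell
            patt_counter += 1
            i -= 1
        # Assumption: cells whose counter equals the segment length respond here.
        column = [0] * (last + 1)
        for c in every:
            if counter[c] == patt_counter and c < last:
                column[c + 1] = patt_counter
        segment.append(column)
        starts = [c for c in every if column[c] > 0]
        maxcell = max(starts) if starts else 0
        counter = [0] * (last + 1)           # all cells reset their counter
        patt_counter += 1                    # also step over the '*' separator
    if not segment:                          # empty pattern: nothing to report
        return match, segment
    for c in every:                          # last segment processed starts the pattern
        if segment[-1][c] > 0:
            match[c] = 1
    if pattern.startswith("*"):              # '*' at start of pattern
        for c in every:
            if 1 < c < maxcell:
                match[c] = 1
    return match, segment


match, segment = asc_vldc("AB*BB*A", "ABBBABBBABA")
print(match[1:])
for k, column in enumerate(segment):
    print(f"S{k}", column[1:])
```

Because maxcell shrinks to the rightmost start of the segment just processed, an earlier pattern segment is only searched to the left of a place where the rest of the pattern can still follow, which keeps the segments in order.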

21
After third pattern segment in VLDC Algorithm
Pattern AB*BB*A   Text ABBBABBBABA
Cell  T  M  C      S0   S1  S2  Responder
1     @  0  0→1→0  0    0   0   Y→N
2     A  0  0      0→1  0   0   Y
3     B  0  0      0    0   0
4     B  0  0      0    0   0
5     B  0  0→1→0  0    0   0   Y→N
6     A  0  0      0→1  0   0   Y
7     B  0  0      0    0   0
8     B  0  0      0    0   0
9     B  0  0→1→0  0    0   0   Y→N
10    A  0  0      0→1  0   0   Y
11    B  0  0→1→0  0    0   0   Y→N
12    A  0  0      0→1  0   0   Y
Patt_counter: 0→1→2
Maxcell: 13→12
22
After second pattern segment in VLDC Algorithm
Pattern AB*BB*A   Text ABBBABBBABA
Cell  T  M  C        S0  S1   S2  Responder
1     @  0  0        0   0    0
2     A  0  0→1→2→0  1   0    0   Y
3     B  0  0→1→2→0  0   0→2  0   Y→Y→Y
4     B  0  0→1→0    0   0→2  0   Y→Y→N
5     B  0  0        0   0    0   Y→N
6     A  0  0→1→2→0  1   0    0   Y
7     B  0  0→1→2→0  0   0→2  0   Y→Y→Y
8     B  0  0→1→0    0   0→2  0   Y→Y→N
9     B  0  0        0   0    0   Y→N
10    A  0  0→1→0    1   0    0
11    B  0  0        0   0    0   Y→N
12    A  0  0        1   0    0
Patt_counter: 0→1→2; 0→1→2→3
Maxcell: 13→12→8
(Maxcell is used to keep pattern segments in order, i.e. AB
occurs before BB)
23
After first pattern segment in VLDC Algorithm
Pattern AB*BB*A   Text ABBBABBBABA
Cell  T  M  C      S0  S1  S2   Responder
1     @  0  0→2→0  0   0   0    Y
2     A  0  0→1→0  1   0   0→2  Y→N
3     B  0  0→1→0  0   2   0    Y→N
4     B  0  0→1→0  0   2   0    Y→N
5     B  0  0→2→0  0   0   0    Y→N→Y
6     A  0  0→1→0  1   0   0→2  Y→N
7     B  0  0      0   2   0    Y→N
8     B  0  0      0   2   0
9     B  0  0      0   0   0
10    A  0  0      1   0   0
11    B  0  0      0   0   0
12    A  0  0      1   0   0
Patt_counter: 0→1→2; 0→1→2→3; 0→1→2→3
Maxcell: 13→12→8→6
(Maxcell is used to keep pattern segments in order, i.e. AB
occurs before BB)
24
Final State in VLDC Algorithm
Pattern AB*BB*A   Text ABBBABBBABA
Cell  T  M  C  S0  S1  S2  Responder
1     @  0  0  0   0   0
2     A  1  0  1   0   2   Y
3     B  0  0  0   2   0
4     B  0  0  0   2   0
5     B  0  0  0   0   0
6     A  1  0  1   0   2   Y
7     B  0  0  0   2   0
8     B  0  0  0   2   0
9     B  0  0  0   0   0
10    A  0  0  1   0   0
11    B  0  0  0   0   0
12    A  0  0  1   0   0
Patt_counter: 0→1→2; 0→1→2→3; 0→1→2→3
Maxcell: 13→12→8→6
(Maxcell is used to keep pattern segments in order, i.e. AB
occurs before BB)
25
Finding All Continuation Points
  • Match starts where M = 1
  • Match to any pattern segment begins where Sx =
    segment length
      • i.e. where any Sx > 0
  • Continuation of match is in Sx-1, at any cell/PE
    whose index is >= (Sx segment size) + Sx's
    cell/PE index

26
Using the Final State in VLDC Algorithm
Pattern AB*BB*A   Text ABBBABBBABA
Cell  T  M  C  S0  S1  S2
1     @  0  0  0   0   0
2     A  1  0  1   0   2
3     B  0  0  0   2   0
4     B  0  0  0   2   0
5     B  0  0  0   0   0
6     A  1  0  1   0   2
7     B  0  0  0   2   0
8     B  0  0  0   2   0
9     B  0  0  0   0   0
10    A  0  0  1   0   0
11    B  0  0  0   0   0
12    A  0  0  1   0   0
  • Start with index 2, where there's a match (M = 1)
  • Work from S2 down and left: count down 2 values
    and move into S1, count down 2 values and move
    to S0
  • That produces 2→4→6 ABBBA
  • Any index > 4 in S1 whose value is > 0 will
    also produce a correct match
      • 2→7→10 ABBBABBBA
      • 2→8→10 ABBBABBBA
  • Some of the additional matches are
      • 2→4→10 ABBBABBBA
      • 2→4→12 ABBBABBBABA
      • 2→8→12 ABBBABBBABA
      • 6→8→10 ABBBA
      • 6→8→12 ABBBABA
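
The walk described on this slide can be automated. The sketch below is not from the original slides: it hard-codes the final-state columns M, S0, S1 and S2 from the trace (with a padding entry at index 0 so that list indices equal cell numbers) and enumerates every chain of segment start cells. The function name continuation_chains is hypothetical, and the sketch assumes the pattern neither starts nor ends with a don't care.

```python
# Final state from the trace, index 0 is padding so index == cell number.
FINAL_MATCH = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
FINAL_SEGMENT = [
    [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1],  # S0: last pattern segment "A"
    [0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0],  # S1: middle segment "BB"
    [0, 0, 2, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0],  # S2: first segment "AB"
]


def continuation_chains(match, segment):
    """Yield one tuple of segment start cells per complete match."""
    def extend(chain, k):
        if k < 0:                  # every segment has been placed
            yield tuple(chain)
            return
        prev = chain[-1]
        # Next segment may start at (previous start) + (previous segment size) or later.
        earliest = prev + segment[k + 1][prev]
        for c in range(earliest, len(segment[k])):
            if segment[k][c] > 0:
                yield from extend(chain + [c], k - 1)

    top = len(segment) - 1
    for start in range(1, len(match)):
        if match[start] == 1:
            yield from extend([start], top - 1)


for chain in continuation_chains(FINAL_MATCH, FINAL_SEGMENT):
    print("→".join(str(c) for c in chain))
```

Run on the final state above, the enumeration starts with 2→4→6 and includes each of the chains listed on the slide.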
27
Existing Algorithms
  • Sequential Algorithms
      • Naïve algorithm: O(mn)
      • Knuth, Morris, Pratt, or Boyer-Moore: O(m+n)
  • Parallel Algorithms
      • A PRAM exact string matching: O(n)
      • On a reconfigurable mesh: O(1) on n(n-m+1) PEs
      • On a SIMD hypercube (limited to the alphabet
        {0,1}): O(lg n) on n/lg n PEs
      • On a neural network: O(1) on nm PEs
      • ASC algorithms: O(m) time on O(n) PEs

28
Question to consider
  • The don't care character allows non-matching
    for an arbitrary length. This is discussed on
    slide 13. Instead, consider allowing a
    non-match of exactly two characters, and make the
    necessary changes to the trace in slides 15-16.