Title: String Matching of Regular Expression
1. String Matching of Regular Expression
2. Introduction
- Regular Expression (RE)
- A generalized string description built from
- Basic strings
- Kleene star (*)
- Concatenation
- Union (|)
- Nondeterministic Finite Automata (NFA)
- More than one possible next transition
- RE to NFA requires O(m) states
- Deterministic Finite Automata (DFA)
- Only one next transition
- RE to DFA may require O(2^m) states
- Using (m+1)(2^(m+1) + |Σ|) bits
3. RE to NFA Construction
- Thompson's construction
- Produces up to 2m states
- The NFA is not null-free
- Using m(2^(2m+1) + |Σ|) bits
- Glushkov's construction
- Produces exactly m+1 states
- The NFA is null-free
- Using (m+1)(2^(m+1) + |Σ|) bits
4. Thompson's Construction
5. Thompson's Construction
Example
6. Glushkov Construction
- RE = (AT|GA)((AG|AAA)*)
- Marked RE = (A1T2|G3A4)((A5G6|A7A8A9)*)
- Used in Glushkov construction
- First(RE)
- The set of positions at which the reading can start.
- Ex: First((A1T2|G3A4)((A5G6|A7A8A9)*)) = {1, 3}
- Last(RE)
- The set of positions at which a string read can be recognized.
- Ex: Last((A1T2|G3A4)((A5G6|A7A8A9)*)) = {2, 4, 6, 9}
- Follow(RE, x)
- All the positions in RE accessible from x
- Ex: Follow((A1T2|G3A4)((A5G6|A7A8A9)*), 6) = {7, 5}
- Empty(RE) is {ε} if ε belongs to L(RE) and Ø otherwise.
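The four definitions above can be computed in one recursive pass over the syntax tree of the marked RE. The following is a minimal sketch, not the slides' own code: the tuple-based tree and the helper names (`pos`, `cat`, `alt`, `star`, `analyze`) are assumptions made for illustration, and Empty(RE) is returned as a boolean (`True` when ε belongs to L(RE)).

```python
from functools import reduce

# Hypothetical tuple-based syntax tree for a marked RE (illustrative only).
def pos(i):
    return ("pos", i)

def cat(*parts):
    return reduce(lambda a, b: ("cat", a, b), parts)

def alt(a, b):
    return ("alt", a, b)

def star(e):
    return ("star", e)

def analyze(node, follow):
    """Return (First, Last, nullable) for node; fill the follow dict in place."""
    kind = node[0]
    if kind == "pos":
        follow.setdefault(node[1], set())
        return {node[1]}, {node[1]}, False
    if kind == "star":
        first, last, _ = analyze(node[1], follow)
        for x in last:              # the end of one iteration loops back to the start
            follow[x] |= first
        return first, last, True
    f1, l1, n1 = analyze(node[1], follow)
    f2, l2, n2 = analyze(node[2], follow)
    if kind == "alt":
        return f1 | f2, l1 | l2, n1 or n2
    if kind == "cat":
        for x in l1:                # the end of the left part continues into the right part
            follow[x] |= f2
        first = f1 | f2 if n1 else f1
        last = l1 | l2 if n2 else l2
        return first, last, n1 and n2
    raise ValueError(kind)

# Marked RE (A1T2|G3A4)((A5G6|A7A8A9)*)
marked = cat(alt(cat(pos(1), pos(2)), cat(pos(3), pos(4))),
             star(alt(cat(pos(5), pos(6)), cat(pos(7), pos(8), pos(9)))))
follow = {}
first, last, nullable = analyze(marked, follow)
print(sorted(first), sorted(last), sorted(follow[6]), nullable)
# prints: [1, 3] [2, 4, 6, 9] [5, 7] False
```

The printed sets agree with the First, Last and Follow examples given for this RE.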
7. Glushkov Construction
- Create the initial set of m+1 states
- Mark the final states using Last(RE)
- Create the transition links using Follow(RE, x)
RE = (A1T2|G3A4)((A5G6|A7A8A9)*)
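The three steps above can be sketched as follows. This is an illustrative sketch under stated assumptions: the First, Last and Follow sets are written out by hand for the example RE, state 0 is taken as the initial state with links to First(RE), and the names `glushkov` and `accepts` are hypothetical. The automaton is simulated with Python sets to check whole-string acceptance.

```python
# Hand-written data for RE = (A1T2|G3A4)((A5G6|A7A8A9)*); state 0 is the initial state.
CHARS = {1: "A", 2: "T", 3: "G", 4: "A", 5: "A", 6: "G", 7: "A", 8: "A", 9: "A"}
FIRST = {1, 3}
LAST = {2, 4, 6, 9}
FOLLOW = {1: {2}, 2: {5, 7}, 3: {4}, 4: {5, 7}, 5: {6},
          6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}
EMPTY = False  # ε is not in L(RE)

def glushkov(chars, first, last, follow, empty):
    """Build the transition map {(state, char): set of states} and the final states."""
    delta = {}

    def link(src, dst):
        # Every arrow entering position dst is labelled by the character at dst.
        delta.setdefault((src, chars[dst]), set()).add(dst)

    for y in first:
        link(0, y)
    for x, ys in follow.items():
        for y in ys:
            link(x, y)
    finals = set(last) | ({0} if empty else set())
    return delta, finals

def accepts(delta, finals, text):
    """Simulate the NFA on the whole text."""
    active = {0}
    for c in text:
        active = set().union(*(delta.get((s, c), set()) for s in active))
    return bool(active & finals)

delta, finals = glushkov(CHARS, FIRST, LAST, FOLLOW, EMPTY)
print(accepts(delta, finals, "GAAG"), accepts(delta, finals, "ATA"))
# prints: True False
```

Because every transition into a position carries that position's character, no null transitions are needed.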
8. Bit-Parallel Automata
- Ex: Shift-And
- Automaton
- Update function
- State mask
- Occurrence table
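As a concrete instance of the Shift-And example, here is a minimal sketch for a plain string pattern. It assumes the usual formulation in which the state mask D is updated as D = ((D << 1) | 1) & B[c], with B the occurrence table; the function name `shift_and` is illustrative.

```python
def shift_and(pattern, text):
    """Return the end positions (0-based) of all occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    # Occurrence table: bit i of B[c] is set when pattern[i] == c.
    B = {}
    for i, c in enumerate(pattern):
        B[c] = B.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    D = 0           # state mask: bit i set means pattern[:i+1] matches a suffix of the text read
    ends = []
    for j, c in enumerate(text):
        D = ((D << 1) | 1) & B.get(c, 0)
        if D & accept:
            ends.append(j)
    return ends

print(shift_and("ABA", "ABABA"))
# prints: [2, 4]
```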
9. Thompson BPA
Notation
- D: state mask
- E: null-closure of D
- B: precomputed table
- S: string length
- Tj: current char
- Null-closure: the states reachable from D with null input
- B table: bit mask of the states reachable by each letter
[Figure: B table, alphabet (Σ) by pattern (m+1)]
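The notation above can be illustrated with a small hand-built automaton. This sketch makes several assumptions that are not in the slides: the NFA for AB*C is numbered by hand so that every character transition goes from state i to state i+1, it is simplified relative to a textbook Thompson construction, and the update D = E[(D << 1) & B[c]] | E[1] keeps the initial state active so that matches can start anywhere in the text. All names are hypothetical.

```python
# Hand-numbered NFA for AB*C (illustrative). Character edges always go i -> i+1.
CHAR_EDGES = {0: "A", 2: "B", 4: "C"}       # state i --char--> state i+1
NULL_EDGES = {1: [2, 4], 3: [2, 4]}         # null transitions around the starred B
NUM_STATES = 6
FINAL = 1 << 5

def closure_mask(state):
    """Bit mask of the states reachable from state using only null transitions."""
    seen, stack = {state}, [state]
    while stack:
        s = stack.pop()
        for t in NULL_EDGES.get(s, []):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return sum(1 << s for s in seen)

def build_tables():
    # B[c]: states that are entered by reading c.
    B = {}
    for src, c in CHAR_EDGES.items():
        B[c] = B.get(c, 0) | (1 << (src + 1))
    # E[D]: null-closure of every possible state mask D.
    single = [closure_mask(s) for s in range(NUM_STATES)]
    E = [0] * (1 << NUM_STATES)
    for D in range(1, 1 << NUM_STATES):
        low = D & -D                          # lowest set bit of D
        E[D] = E[D ^ low] | single[low.bit_length() - 1]
    return B, E

def search(text):
    """Return the end positions of substrings of text matching AB*C."""
    B, E = build_tables()
    init = E[1]                               # closure of the initial state
    D, ends = init, []
    for j, c in enumerate(text):
        D = E[(D << 1) & B.get(c, 0)] | init
        if D & FINAL:
            ends.append(j)
    return ends

print(search("ABBC"), search("xxACyy"), search("ABxC"))
# prints: [3] [3] []
```

The E table has one entry per subset of states, which is why the number of states matters so much for the memory cost of this approach.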
10. Glushkov BPA
Notation
- D: state mask
- T[D]: Follow of D
- B: built by Glushkov
- Tj: current char
- B table: bit mask of the states reachable by each letter
- T table: which states can be reached from an active state
[Figure: B table, alphabet (Σ) by pattern (m+1); T table, active states D (2^(m+1)) by states (m+1)]
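With this notation, one search step can be written as D = T[D] & B[Tj]. The sketch below applies it to the example RE (AT|GA)((AG|AAA)*). It rests on assumptions beyond the slides: the Follow sets are written out by hand with Follow(0) = First(RE), the T table is filled naively for clarity, and bit 0 is re-set after each step so that a match may start at any text position. The names `build_tables` and `search` are illustrative.

```python
CHARS = "ATGAAGAAA"                 # characters at positions 1..9 of the marked RE
FOLLOW = {0: {1, 3}, 1: {2}, 2: {5, 7}, 3: {4}, 4: {5, 7},
          5: {6}, 6: {5, 7}, 7: {8}, 8: {9}, 9: {5, 7}}
LAST = {2, 4, 6, 9}

def mask(states):
    return sum(1 << s for s in states)

def build_tables():
    n = len(CHARS) + 1              # m+1 states
    # B[c]: states (positions) labelled with character c.
    B = {}
    for p, c in enumerate(CHARS, start=1):
        B[c] = B.get(c, 0) | (1 << p)
    # T[D]: union of Follow(s) over all states s active in D (naive fill).
    follow_mask = [mask(FOLLOW[s]) for s in range(n)]
    T = []
    for D in range(1 << n):
        reach = 0
        for s in range(n):
            if (D >> s) & 1:
                reach |= follow_mask[s]
        T.append(reach)
    return B, T

def search(text):
    """Return the end positions of substrings of text that match the RE."""
    B, T = build_tables()
    final = mask(LAST)
    D, ends = 1, []                 # only the initial state is active
    for j, c in enumerate(text):
        D = (T[D] & B.get(c, 0)) | 1
        if D & final:
            ends.append(j)
    return ends

print(search("CCATCC"), search("GAAG"))
# prints: [3] [1, 3]
```

The single mask B[c] suffices because, in a Glushkov automaton, all transitions entering a state carry the same character.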
11. Glushkov Search Algorithm
Build B Table
12. Glushkov Search Algorithm
Build T Table
- Initialize to zero
[Figure: T table, active states D (2^(m+1)) by states (m+1)]
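One way to fill the T table in time proportional to its size is to reuse entries that are already computed: the entry for a state set whose highest active state is i equals Follow(i) combined with the entry for the remaining states. The sketch below assumes that scheme; the function name `build_t` and the small four-state example masks are hypothetical, not data from the slides.

```python
def build_t(follow_masks):
    """follow_masks[i] is the bit mask of Follow(i); returns T with 2**n entries."""
    n = len(follow_masks)
    T = [0] * (1 << n)              # initialized to zero; T[0] stays 0
    for i in range(n):
        high = 1 << i
        for j in range(high):       # j ranges over all subsets of states 0..i-1
            T[high + j] = follow_masks[i] | T[j]
    return T

# Tiny illustrative automaton with 4 states.
follow_masks = [0b0110, 0b0100, 0b1000, 0b0000]
T = build_t(follow_masks)
print(bin(T[0b0101]))
# prints: 0b1110
```

Each entry is computed with a single OR, so the table costs one operation per state set rather than one per (state set, state) pair.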
13. Glushkov Search Algorithm
Compute First, Last, Follow and Empty
14. Performance Comparison
Algorithms compared:
- Forward algorithm (DFA)
- Glushkov with BuildT
- Thompson's construction
- Glushkov with BuildTree
Measured on the test patterns:
- Preprocessing time
- Searching time
15. Reference
- G. Navarro and M. Raffinot. Compact DFA representation for fast regular expression search. In Proceedings of the 5th Workshop on Algorithm Engineering, number 2141 in Lecture Notes in Computer Science, pages 1-12, 2001.