Title: Patterns, Automata and Regular Expressions
1 MCS680 Foundations Of Computer Science
- Patterns, Automata and Regular Expressions
2 Introduction
- A pattern is a set of objects with some recognizable property
- Example: programming language identifiers
- Start with a letter, may be followed by zero or more letters, digits or underscores
- [a-zA-Z][a-zA-Z0-9_]*
- Problems with patterns
- Definition of patterns
- Recognition of patterns
- Uses for patterns
- Programming language design
- Circuit design
- Text editors
- Searching for words
- Operating system command processors
- dir *.exe
3 State Machines and Automata
- Programs that search for patterns in data often have a special structure
- Track progress toward overall goal
- Manage states
- Overall behavior of the program can be viewed as moving from state to state as it reads its input
- We can use a graph to represent the behavior of programs that search for patterns in data
- Graph is called an automaton
- Special nodes
- Start node
- Accepting nodes (may be more than 1)
- Edges are called transitions
- The input is not accepted if we are not at an
accepting node after all of the input is read
4 Example Automaton (Finite State Machine)
- Consider an automaton to recognize a sequence of characters that contains the characters a, e, i, o, u in that order (not necessarily adjacent)
- Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, a-z and A-Z
- Sigma (Σ) is also used in many texts to represent the alphabet
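[Figure: automaton with states 0-5; state 0 is the start state and state 5 is the accepting state]
- Self-loops: state 0 on Λ-a, state 1 on Λ-e, state 2 on Λ-i, state 3 on Λ-o, state 4 on Λ-u, state 5 on Λ
- Transitions: 0 -a-> 1, 1 -e-> 2, 2 -i-> 3, 3 -o-> 4, 4 -u-> 5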
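The automaton above can be sketched as a short program. This is a minimal sketch, assuming lower-case vowels only and treating any character outside a-z and A-Z as a rejection. The function name accepts_vowels_in_order is made up for illustration.

```python
import string

VOWELS = "aeiou"
ALPHABET = set(string.ascii_letters)  # Lambda: a-z and A-Z


def accepts_vowels_in_order(text: str) -> bool:
    """Return True if text contains a, e, i, o, u in that order."""
    state = 0  # start state
    for ch in text:
        if ch not in ALPHABET:
            return False  # symbol outside the alphabet: reject
        # In state k < 5 only the k-th vowel advances the machine;
        # every other letter follows the self-loop and keeps the state.
        if state < len(VOWELS) and ch == VOWELS[state]:
            state += 1
    return state == len(VOWELS)  # state 5 is the only accepting state


print(accepts_vowels_in_order("facetious"))  # True
print(accepts_vowels_in_order("education"))  # False
```

The single integer state is enough here because each state has exactly one way forward and one self-loop.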
5 Example Automaton (Finite State Machine)
- Consider an automaton to recognize a signed integer
- May begin with a +, - or a digit in the range 0-9
- Followed by zero or more occurrences of digits 0-9
- Let Λ (Lambda) be the entire alphabet of acceptable digit characters. In this example, 0-9
[Figure: automaton with states 0-3 recognizing a signed integer; edges are labeled with a sign character or Λ]
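A table-driven version of a signed-integer recognizer might look like the sketch below. The state numbering (0 = start, 1 = sign seen, 2 = digits seen) is an assumption of this sketch and is not taken from the slide's figure. It also assumes that at least one digit is required, so a lone sign is rejected.

```python
DIGITS = set("0123456789")  # Lambda for this example

# (state, symbol class) -> next state. Missing entries mean "no transition".
TRANSITIONS = {
    (0, "sign"): 1,
    (0, "digit"): 2,
    (1, "digit"): 2,
    (2, "digit"): 2,
}
ACCEPTING = {2}


def classify(ch):
    """Map a character to the symbol class used in the table."""
    if ch in DIGITS:
        return "digit"
    if ch in "+-":
        return "sign"
    return None


def is_signed_integer(text):
    state = 0
    for ch in text:
        key = (state, classify(ch))
        if key not in TRANSITIONS:
            return False  # no transition defined: the input is rejected
        state = TRANSITIONS[key]
    return state in ACCEPTING


for sample in ["-42", "+7", "123", "-", "4-2"]:
    print(sample, is_signed_integer(sample))
```

Keeping the transitions in a dictionary means the program changes only in its table when the automaton changes.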
6 Deterministic and Nondeterministic Automata
- Deterministic Automata
- For any state s and any input x there is at most one transition out of state s whose label includes x
- Simulating deterministic automata
- Given that we are in state s and the next input is x
- We either transition out of state s or,
- We die at state s
- Easy to convert a deterministic automaton into a program
- Nondeterministic Automata
- Nondeterministic automata are allowed (but not required) to have two or more transitions containing the same symbol out of the same state
- Nondeterministic automata are allowed to have ε-moves, where we transition out of a state without reading any input (the edge is labeled with the empty string ε)
7 Nondeterministic Automata Example
- Consider the language that accepts strings ending with man
- Let Λ (Lambda) be the entire alphabet of acceptable characters. In this example, a-z and A-Z
- There is an error in your book's representation
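[Figure: DFA with states 0-3 and edges labeled Λ-m, m, a, n, Λ-a, Λ-n]
[Figure: NFA with states 0-3: state 0 loops on Λ; 0 -m-> 1, 1 -a-> 2, 2 -n-> 3; state 3 is accepting]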
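One way to run the NFA for strings ending with man is to track the set of states the machine could be in after each character. The sketch below hard-codes that NFA (state 0 loops on every symbol and also moves to state 1 on m). The function names are illustrative, and the check that input stays within a-z and A-Z is left out for brevity.

```python
def nfa_step(states, ch):
    """Return every state reachable from `states` by reading ch."""
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)  # state 0 loops on every symbol
            if ch == "m":
                nxt.add(1)  # nondeterministic choice: this m may start "man"
        elif s == 1 and ch == "a":
            nxt.add(2)
        elif s == 2 and ch == "n":
            nxt.add(3)
        # state 3 has no outgoing transitions, so that path dies
    return nxt


def ends_with_man(text):
    states = {0}  # start in state 0 only
    for ch in text:
        states = nfa_step(states, ch)
    return 3 in states  # accept if any surviving path is in state 3


for sample in ["man", "woman", "manner", "mman"]:
    print(sample, ends_with_man(sample))
```

The sets that show up during a run, such as {0, 1} after reading an m, are sets of NFA states, which is the idea the subset construction turns into DFA states.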
8 Nondeterministic Automata Example
- Consider the language that accepts zero or more occurrences of the sub-strings ab or aba
- L = (ab|aba)* (regular expression)
[Figure: DFA with states 0-4 over the inputs a and b]
[Figure: two alternative NFAs with states 0-2 over the inputs a and b, one of which uses an ε-move]
9 Deterministic and Nondeterministic Automata
- Deterministic Automata are easy to code because all possible transitions are accounted for
- From every possible state, every possible input must be accounted for
- Makes state machine tough to construct
- Nondeterministic automata are simpler to construct, however they cannot be directly coded due to the non-determinism
- Not every possible input needs to be accounted for at every possible state
- Input not accepted if a transition out of a state is not defined
- May use ε-moves to move to new states in the absence of input
- Nondeterministic automata can be converted to deterministic automata by using the subset construction method
10 Subset Construction
- Elimination of the nondeterminism from an automaton
- Given a nondeterministic automaton
- Build a new starting state by expanding ε-moves
- Start at the starting state
- Build new states based on allowable transitions out of the starting state
- Treat new states as sets of states from the original NFA
- Take the new states (from above) and build more new states by considering the allowable transitions out of the state that you started with
- Continue this process until no new states are developed
- Any state (which is a set of states from the original NFA) that contains an accepting state in the NFA is also an accepting state
- Construct the resultant DFA
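The steps above can be sketched in code. This is an illustrative sketch rather than a definitive implementation. It assumes the NFA is given as a dictionary from (state, symbol) to a set of next states, that ε-moves are stored under the empty-string label, and that "x" stands in for every letter other than m, a and n. All names are made up for the sketch.

```python
from collections import deque

EPS = ""  # label used for epsilon-moves (an assumption of this sketch)


def epsilon_closure(states, delta):
    """All NFA states reachable from `states` using only epsilon-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(delta, start, accepting, alphabet):
    """Return (dfa_delta, dfa_start, dfa_accepting); DFA states are frozensets."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:  # continue until no new states are developed
        current = queue.popleft()
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= set(delta.get((s, sym), ()))
            target = epsilon_closure(moved, delta)
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # A DFA state accepts if it contains any accepting NFA state.
    dfa_accepting = {S for S in seen if S & set(accepting)}
    return dfa_delta, dfa_start, dfa_accepting


# NFA for strings ending in "man"; "x" stands in for every other letter.
alphabet = ["m", "a", "n", "x"]
nfa = {(0, c): {0} for c in alphabet}
nfa[(0, "m")] = {0, 1}
nfa[(1, "a")] = {2}
nfa[(2, "n")] = {3}
dfa_delta, dfa_start, dfa_accepting = subset_construction(nfa, 0, {3}, alphabet)
print(sorted(sorted(S) for S in {dfa_start} | set(dfa_delta.values())))
```

For this NFA the sketch finds four DFA states, {0}, {0,1}, {0,2} and {0,3}, with {0,3} as the only accepting state.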
11 Subset Construction (Example) - No ε-moves
[Figure: NFA with states 0-3: state 0 loops on Λ; 0 -m-> 1, 1 -a-> 2, 2 -n-> 3; state 3 is accepting]
- Consider the NFA that accepts as input all strings that end with the substring man
- Begin at the starting state (state 0): {0}
- From state {0} we stay at state 0 for any letter other than m: ({0}, Λ-m, {0})
- We already have state {0}
- We go to state 0 or state 1 with the letter m: ({0}, m, {0,1})
- We create the new state {0,1}
- From state {0,1}
- If we get an a we go to state 2 (from state 1) or state 0 (from state 0): ({0,1}, a, {0,2})
- A new state
- If we get an m we go to state 0 or state 1 (from state 0); nowhere to go from state 1: ({0,1}, m, {0,1})
- Already have {0,1}
12 Subset Construction (Example)
[Figure: NFA for strings ending with man, states 0-3 (repeated)]
- Subset construction example continued
- From state {0,1} (cont)
- Anything besides an a or m: go to state 0 (from state 0); nowhere to go from state 1: ({0,1}, Λ-a-m, {0})
- Already have {0}
- From state {0,2}
- If we get an n we go to state 3 (from state 2) or state 0 (from state 0): ({0,2}, n, {0,3})
- A new state
- If we get an m we go to state 0 or state 1 (from state 0); nowhere to go from state 2: ({0,2}, m, {0,1})
- Already have {0,1}
- Anything besides an n or m: go to state 0 (from state 0); nowhere to go from state 2: ({0,2}, Λ-n-m, {0})
- Already have {0}
13 Subset Construction (Example)
[Figure: NFA for strings ending with man, states 0-3 (repeated)]
- Subset construction example continued
- From state {0,3}
- If we get an m we go to state 0 or state 1 (from state 0); nowhere to go from state 3: ({0,3}, m, {0,1})
- Already have {0,1}
- Anything besides an m: go to state 0 (from state 0); nowhere to go from state 3: ({0,3}, Λ-m, {0})
- Already have {0}
- There are no new states
- Recap on the set of found states
- {0}, {0,1}, {0,2}, {0,3}
- State {0} is the starting state
- State {0,3} is the only accepting state
- It is the only state containing an accepting state from the original NFA
14 Subset Construction (Example)
- Now let's recall the transitions that we discovered
- ({0}, Λ-m, {0}), ({0}, m, {0,1}), ({0,1}, a, {0,2}), ({0,1}, m, {0,1}), ({0,1}, Λ-a-m, {0}), ({0,2}, n, {0,3}), ({0,2}, m, {0,1}), ({0,2}, Λ-n-m, {0}), ({0,3}, m, {0,1}), ({0,3}, Λ-m, {0})
[Figure: resulting DFA with states {0}, {0,1}, {0,2}, {0,3} and the transitions listed above]
15 Subset Construction (Example) NFA with ε-moves
[Figure: NFA with states 0-4 and edges labeled a, b and ε; state 4 is accepting]
- Construct the DFA from the NFA
- Notice how the alphabet consists of only two characters, {a,b}
- Step 1: Build new start state by expanding the ε-moves
- From state 0 we can reach states 1,2,3 by ε-moves
- Thus the new logical start state is {0,1,2,3}
- State {0,1,2,3}
- Input a: get to states {0,1,2,3,4} (new state)
- Input b: get to states {2,3,4} (new state)
16 Subset Construction (Example) NFA with ε-moves
[Figure: NFA with states 0-4 and edges labeled a, b and ε (repeated)]
- Construct the DFA from the NFA (cont)
- State {0,1,2,3,4}
- Input a: get to state {0,1,2,3,4} (already known)
- Input b: get to state {2,3,4} (already known)
- State {2,3,4}
- Input a: get to state {3,4} (new state)
- Input b: get to state {3,4} (just discovered)
- State {3,4}
- Input a: get to state {3,4} (already known)
- Input b: get to state ∅ (new state)
- State ∅
- Input a or b: get to state ∅ (already known)
17 Subset Construction (Example) NFA with ε-moves
- States
- {0,1,2,3}, {0,1,2,3,4}, {2,3,4}, {3,4}, ∅
- Start state {0,1,2,3}, obtained by traversing the ε-moves from the start state in the NFA
- Accepting states {0,1,2,3,4}, {2,3,4}, {3,4}, because they all contain state 4, which was an accepting state in the NFA
- Construct the DFA using the states and discovered transitions
[Figure: resulting DFA with states {0,1,2,3}, {0,1,2,3,4}, {2,3,4}, {3,4}, ∅ and edges labeled a and b]
18 Regular Expressions
- An automaton graphically defines a pattern
- A regular expression algebraically defines a pattern
- A regular expression consists of a sequence of atomic operands
- A character
- The empty string character, ε
- The empty set character, ∅
- A variable that can be defined with any regular expression
- A regular expression represents a set of strings that is often called a language
- Language of atomic operands
- L(x) = {x}
- L(ε) = {ε}
- L(∅) = ∅
19 Regular Expression Operators
- Union
- Denoted by |
- If R and S are regular expressions then R|S denotes the union of the R and S languages
- L(R|S) = L(R) ∪ L(S)
- Concatenation
- No special symbol for concatenation operation
- If R and S are regular expressions then RS denotes the concatenation of language S onto the back of language R
- L(RS) = L(R)L(S)
- Closure
- Denoted by * (Kleene closure or closure)
- If R is a regular expression then R* indicates zero or more occurrences of language R
- L(R*) = {ε} ∪ L(R) ∪ L(R)L(R) ∪ L(R)L(R)L(R) ∪ L(R)L(R)L(R)L(R) ∪ ...
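The three operators can be tried out on small finite languages represented as Python sets of strings. This is only a sketch: the true closure is an infinite set, so the closure function below stops after a caller-chosen number of repetitions (the max_repeats parameter is an assumption of this sketch).

```python
def union(r, s):
    """L(R|S): every string that is in L(R) or in L(S)."""
    return r | s


def concat(r, s):
    """L(RS): every string of L(R) followed by every string of L(S)."""
    return {x + y for x in r for y in s}


def closure(r, max_repeats):
    """Finite approximation of L(R*): zero up to max_repeats copies of L(R)."""
    result = {""}  # zero occurrences gives the empty string
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, r)  # one more copy of L(R)
        result |= layer
    return result


print(sorted(union({"a"}, {"b"})))
print(sorted(concat({"a", "ab"}, {"c", "bc"})))
print(sorted(closure({"a"}, 3)))
```

Because the results are sets, a string that can be produced in two different ways appears only once.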
20 Regular Expression Examples
- Order of precedence of regular expression operators (highest first)
- Kleene star
- Concatenation
- Union
- Examples
- L(a|b) = {a, b}
- L(ab) = {ab}
- L(a|ab) = {a, ab}
- L(c|bc) = {c, bc}
- L((a|ab)(c|bc)) = {ac, abc, abbc} (the duplicate second abc is omitted)
- L(a*) = {ε, a, aa, aaa, aaaa, aaaaa, ...}
- L((a|b)*) = {ε, a, b, aa, ab, ba, bb, ...}
- L(a|bc*d) = {a, bd, bcd, bccd, bcccd, bccccd, ...}
- Simplify by using precedence
- a|bc*d = a|b(c*)d = a|(b(c*))d = a|((b(c*))d) = (a|((b(c*))d))
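The precedence rules can be checked with Python's re module, which uses the same ordering for *, concatenation and |. The sketch below compares the unparenthesized expression with its fully parenthesized form on a handful of sample strings chosen for illustration.

```python
import re

pattern = re.compile(r"a|bc*d")            # relies on operator precedence
explicit = re.compile(r"(a|((b(c*))d))")   # every grouping written out

samples = ["a", "bd", "bcd", "bccd", "ad", "abd", "bc", ""]

# Both forms should agree on every sample string.
for s in samples:
    assert bool(pattern.fullmatch(s)) == bool(explicit.fullmatch(s))

matches = [s for s in samples if pattern.fullmatch(s)]
print(matches)  # ['a', 'bd', 'bcd', 'bccd']
```

fullmatch is used rather than search so that the whole string has to belong to the language, not just a substring of it.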
21 Regular Expression Example
- Build a regular expression for programming language identifiers
- Begins with a letter
- Followed by any number of letters, digits or the underscore (_) character
- Solution
- letter = a|b|...|y|z|A|B|...|Y|Z
- integer = 0|1|2|3|4|5|6|7|8|9
- underscore = _
- identifier = letter(letter|integer|underscore)*
- Construct a regular expression for a signed integer
- signed integer = (+|-|ε)(0|1|2|3|4|5|6|7|8|9)+
- The + is usually used to indicate one or more occurrences. The + is only a simplification
- a+ = aa*
- (0|1|...|8|9)+ = (0|1|...|8|9)(0|1|...|8|9)*
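The two solutions translate fairly directly into Python's re syntax. In this sketch the character classes [a-zA-Z] and [0-9] stand in for the long unions of single characters, and an empty alternative plays the role of ε. The helper names are illustrative only.

```python
import re

LETTER = "[a-zA-Z]"   # a|b|...|z|A|B|...|Z
INTEGER = "[0-9]"     # 0|1|...|9
UNDERSCORE = "_"

# identifier = letter(letter|integer|underscore)*
IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{INTEGER}|{UNDERSCORE})*")

# signed integer = (+|-|ε)(0|1|...|9)+ ; the empty alternative stands for ε
SIGNED_INTEGER = re.compile(r"(\+|-|)[0-9]+")


def is_identifier(s):
    return IDENTIFIER.fullmatch(s) is not None


def is_signed_integer(s):
    return SIGNED_INTEGER.fullmatch(s) is not None


print(is_identifier("count_2"), is_identifier("9lives"))
print(is_signed_integer("-42"), is_signed_integer("+"))

# The + operator is shorthand: a+ matches the same strings as aa*.
for s in ["", "a", "aaa", "ab"]:
    assert bool(re.fullmatch("a+", s)) == bool(re.fullmatch("aa*", s))
```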
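22 Converting A Regular Expression into an NFA
- There are 3 simple rules for converting a regular expression into an NFA
- NFA can then be converted into a DFA using the subset construction method
[Figure: Union R1|R2: the automata for R1 and R2 joined in parallel by four ε-moves]
[Figure: Concatenation R1R2: the automata for R1 and R2 joined in sequence by three ε-moves]
[Figure: Closure R1*: the automaton for R1 with four ε-moves added]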
23 Example Converting A Regular Expression into an NFA
- Convert (ab|aab)* to an NFA
- This is ((ab)|(aab))* = ((ab)|((aa)b))*
- Concatenation has higher precedence than union
Step 1: Handle the concatenation
[Figure: NFA for ab: a, ε, b]
[Figure: NFA for aab: a, ε, a, ε, b]
Step 2: Handle the union
[Figure: NFA for ab|aab: the two NFAs from step 1 joined by ε-moves]
Step 3: Handle the closure
[Figure: NFA for (ab|aab)*: the NFA from step 2 with additional ε-moves]
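The conversion can be sketched in code by building small NFA fragments and gluing them together with ε-moves. The layout of ε-moves below (four for union, three for concatenation, four for closure) is one common arrangement and may differ in detail from the figures. The class and function names are made up for this sketch.

```python
import itertools

EPS = None  # label for epsilon-moves in this sketch


class NFA:
    """An NFA fragment with one start state and one accepting state."""
    _ids = itertools.count()  # shared counter so state numbers never clash

    def __init__(self):
        self.start = next(NFA._ids)
        self.accept = next(NFA._ids)
        self.edges = []  # (source, label, target) triples


def char(c):
    n = NFA()
    n.edges.append((n.start, c, n.accept))
    return n


def concat(r1, r2):
    n = NFA()
    n.edges = r1.edges + r2.edges + [
        (n.start, EPS, r1.start),
        (r1.accept, EPS, r2.start),
        (r2.accept, EPS, n.accept),
    ]
    return n


def union(r1, r2):
    n = NFA()
    n.edges = r1.edges + r2.edges + [
        (n.start, EPS, r1.start), (n.start, EPS, r2.start),
        (r1.accept, EPS, n.accept), (r2.accept, EPS, n.accept),
    ]
    return n


def closure(r1):
    n = NFA()
    n.edges = r1.edges + [
        (n.start, EPS, r1.start),    # enter R1
        (r1.accept, EPS, n.accept),  # leave R1
        (r1.accept, EPS, r1.start),  # repeat R1
        (n.start, EPS, n.accept),    # zero occurrences
    ]
    return n


def eps_closure(states, edges):
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, text):
    current = eps_closure({nfa.start}, nfa.edges)
    for ch in text:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == ch}
        current = eps_closure(moved, nfa.edges)
    return nfa.accept in current


# (ab|aab)*: concatenation first, then union, then closure
ab = concat(char("a"), char("b"))
aab = concat(concat(char("a"), char("a")), char("b"))
machine = closure(union(ab, aab))
print(accepts(machine, "abaab"), accepts(machine, "aba"))
```

The build order mirrors the three steps of the example: the concatenations are built first, the union combines them, and the closure wraps the result.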