Regular Expressions - PowerPoint PPT Presentation

About This Presentation
Title:

Regular Expressions

Description:

Title: Paradigmas y Perspectivas Futuras en Computación. Author: Manuel Bermudez. Last modified by: Manuel E. Bermudez. Created Date: 3/29/2000 4:40:24 PM. PowerPoint PPT presentation.


Transcript and Presenter's Notes

1
Regular Expressions
Programming Language Concepts Lecture 5
  • Prepared by
  • Manuel E. Bermúdez, Ph.D.
  • Associate Professor
  • University of Florida

2
Regular Expressions
  • A compact, easy-to-read language description.
  • Use operators to denote the language constructors
    described earlier, to build complex languages
    from simple atomic ones.

3
Regular Expressions
  • Definition: A regular expression over an alphabet
    Σ is recursively defined as follows:
  • ø denotes the language ø (the empty set).
  • ε denotes the language {ε}.
  • a denotes the language {a}, for all a ∈ Σ.
  • (P + Q) denotes L(P) ∪ L(Q), where P, Q are
    r.e.'s.
  • (PQ) denotes L(P)L(Q), where P, Q are r.e.'s.
  • P* denotes L(P)*, where P is an r.e.
  • To prevent excessive parentheses, we assume left
    associativity, with the following operator
    precedence hierarchy, from most to least binding:
    *, · (catenation), +

4
Regular Expressions
  • Examples
  • (0 + 1)*: any string of 0's and 1's.
  • (0 + 1)*1: any string of 0's and 1's, ending
    with a 1.
  • 1*01*: any string of 1's with a single 0
    inserted.
  • Letter (Letter + Digit)*: an identifier.
  • Digit Digit*: an integer.
  • Quote Char* Quote: a string.
  • Char* Eoln: a comment.
  • Char*: another comment.
  • Assuming that Char does not contain quotes,
    eolns, or comment delimiters.
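
As a rough illustration, several of the examples above can be tried with Python's `re` module, which writes union as `|` rather than `+`. The concrete character classes chosen here for Letter, Digit, Quote and Char are assumptions; the slides leave them abstract.

```python
import re

# Assumed character classes (the slides do not define Letter or Digit).
LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# Slide notation -> Python pattern; the slide's union "+" becomes "|".
PATTERNS = {
    "(0 + 1)*": r"(0|1)*",
    "(0 + 1)*1": r"(0|1)*1",
    "1*01*": r"1*01*",
    "identifier": rf"{LETTER}({LETTER}|{DIGIT})*",
    "integer": rf"{DIGIT}{DIGIT}*",
    # Assumption: Quote is '"' and Char is anything but a quote or newline.
    "string": r'"[^"\n]*"',
}


def matches(name, text):
    """Return True if the whole of `text` belongs to the named language."""
    return re.fullmatch(PATTERNS[name], text) is not None


print(matches("(0 + 1)*1", "0101"))   # True: ends with a 1
print(matches("1*01*", "1101"))       # True: exactly one 0
print(matches("1*01*", "1001"))       # False: two 0's
print(matches("identifier", "x1"))    # True
print(matches("identifier", "1x"))    # False: must start with a letter
```

`re.fullmatch` is used so that the whole string, not just a prefix, must belong to the language.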

5
Regular Expressions
  • Conversion from right-linear grammars to regular
    expressions
  • Example:
  • S → aS     R → aS
  •   → bR
  •   → ε
  • What does S → aS mean?
  • L(S) ⊇ aL(S)
  • S → bR means L(S) ⊇ bL(R)
  • S → ε means L(S) ⊇ {ε}

6
Regular Expressions
  • Together, they mean that
  • L(S) = aL(S) + bL(R) + ε
  • or S = aS + bR + ε
  • Similarly, R → aS means R = aS.
  • Thus, S = aS + bR + ε
  •       R = aS
  • System of simultaneous equations, in which the
    variables are nonterminals.

7
Regular Expressions
  • Solving systems of simultaneous equations.
  • S = aS + bR + ε
  • R = aS
  • Back substitute R = aS:
  • S = aS + baS + ε
  •   = (a + ba)S + ε
  • Question: What to do with equations of the form
  • X = αX + β ?

8
Regular Expressions
  • Answer: β ∈ L(X), so αβ ∈ L(X),
  • ααβ ∈ L(X),
  • αααβ ∈ L(X), ...
  • Thus α*β = L(X).
  • In our case,
  • S = (a + ba)S + ε
  •   = (a + ba)*ε
  •   = (a + ba)*
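
A short check, written with + for union and ε for the empty string, that X = α*β does satisfy the equation X = αX + β. It relies only on the identity α* = αα* + ε; the layout as an aligned derivation is an addition, not taken from the slides.

```latex
\begin{align*}
\alpha X + \beta
  &= \alpha(\alpha^{*}\beta) + \beta
     && \text{substitute } X = \alpha^{*}\beta \\
  &= (\alpha\alpha^{*} + \varepsilon)\,\beta
     && \text{factor } \beta \text{ out on the right} \\
  &= \alpha^{*}\beta = X
     && \text{since } \alpha^{*} = \alpha\alpha^{*} + \varepsilon
\end{align*}
```

With α = a + ba and β = ε this gives S = (a + ba)*ε = (a + ba)*, as above.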

9
Regular Expressions
  • Right-linear regular grammar
  • →
  • regular expression
  • 1. A = α1 + α2 + ... + αn, if A → α1
  •                                → α2
  •                                ...
  •                                → αn

10
Regular Expressions
  • 2. If equation is of the form X = α, where X does
    not appear in α, then replace every occurrence of
    X with α in all other equations, and delete
    equation X = α.
  • 3. If equation is of the form X = αX + β, where
    X does not occur in either α or β, then replace
    the equation with X = α*β.
  • Note: Some algebraic manipulations may be needed
    to obtain the form X = αX + β.
  • Important: Catenation is not commutative!!

11
Regular Expressions
  • Example:
  • S → a      R → abaU     U → aS
  •   → bU       → U          → b
  •   → bR
  • S = a + bU + bR
  • R = abaU + U = (aba + ε)U
  • U = aS + b
  • Back substitute R:
  • S = a + bU + b(aba + ε)U
  • U = aS + b

12
Regular Expressions
  • Back substitute U:
  • S = a + b(aS + b) + b(aba + ε)(aS + b)
  •   = a + baS + bb + babaaS + babab + baS + bb
        (the terms baS and bb repeat)
  •   = (ba + babaa)S + (a + bb + babab)
  • therefore
  • S = (ba + babaa)*(a + bb + babab)
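
One way to gain confidence in a solution obtained this way is a brute-force comparison: enumerate every string over {a, b} up to some length and check that the grammar and the regular expression agree. The sketch below does this for the example grammar S → a | bU | bR, R → abaU | U, U → aS | b; the encoding of the grammar as a Python dictionary and the helper names are assumptions made for illustration.

```python
import re
from itertools import product

# Right-linear grammar of the example:
# nonterminal -> list of (terminal prefix, next nonterminal or None).
GRAMMAR = {
    "S": [("a", None), ("b", "U"), ("b", "R")],
    "R": [("aba", "U"), ("", "U")],
    "U": [("a", "S"), ("b", None)],
}

# The derived solution, with the slide's "+" written as "|".
SOLUTION = re.compile(r"(ba|babaa)*(a|bb|babab)")


def derives(nonterminal, text):
    """Return True if `nonterminal` derives exactly `text`."""
    for prefix, nxt in GRAMMAR[nonterminal]:
        if not text.startswith(prefix):
            continue
        rest = text[len(prefix):]
        if nxt is None:
            if rest == "":
                return True
        elif derives(nxt, rest):
            return True
    return False


def agree_up_to(max_len):
    """Compare grammar and regular expression on all strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            text = "".join(chars)
            in_grammar = derives("S", text)
            in_regex = SOLUTION.fullmatch(text) is not None
            if in_grammar != in_regex:
                return False
    return True


print(agree_up_to(10))  # True
```

Agreement on short strings is evidence, not a proof; the algebraic derivation remains the actual argument.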
13
Regular Expressions
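  • Summarizing:
  • [Diagram: conversions among RGR, RGL, RE, NFA,
    DFA and minimum DFA. Conversions already covered
    are labeled "Done"; the remaining ones are
    labeled "Soon".]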
14
Regular Expressions
  • Regular Expression
  • →
  • NFA
  • Recursively build the FSA, mimicking the
    structure of the regular expression. Each FSA
    built has one start state, and one final state.
  • Conversions (Algorithm 1):
  • if ø
  • [Diagram: start state 1 and final state 2, with
    no transition between them.]
15
Regular Expressions
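  • if ε
  • if a
  • if P + Q
  • if PQ
  • or
  • [Diagrams: one FSA per case, each with start
    state 1 and final state 2. The fragments for P
    and Q are connected by ε-transitions; two
    alternative constructions are shown for PQ.]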
16
Regular Expressions
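  • if P*
  • [Diagram: start state 1 and final state 2 around
    the fragment for P, connected by ε-transitions.]
  • Example: (b (aba + ε) a)*
  • [Diagram: the fragments for the first symbols:
    b (states 1, 2), a (states 3, 4), b (states 5, 6).]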
17
Regular Expressions
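  • (b (aba + ε) a)*, continued
  • [Diagram: the fragments for a (states 7, 8),
    ε (state 9) and a (states 10, 11); the fragments
    for a, b, a (states 3 to 8) are then chained by
    ε-transitions to form aba.]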
18
Regular Expressions
  • (b (aba + ε) a)*, continued
  • [Diagram: new states 12 and 13 join aba and ε
    into aba + ε; the fragment for b (states 1, 2)
    is then attached in front by an ε-transition.]
19
Regular Expressions
  • (b (aba + ε) a)*, continued
  • [Diagram: the fragment for the final a (states
    10, 11) is attached by an ε-transition, giving
    b (aba + ε) a.]
20
Regular Expressions
  • (b (aba + ε) a)*, completed
  • [Diagram: new states 14 and 15 and further
    ε-transitions implement the outer *, giving the
    complete NFA with states 1 to 15.]
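
A minimal sketch of a bottom-up construction in the spirit of Algorithm 1, in which every fragment has exactly one start state and one final state and fragments are glued together with ε-transitions. The class and function names are hypothetical, and the state numbering will not match the diagrams; this is one common way to realize the idea, not necessarily the exact construction drawn on the slides.

```python
from itertools import count


class NFABuilder:
    """Builds NFA fragments bottom-up; a fragment is a (start, final) pair."""

    def __init__(self):
        self.edges = []            # (source, label, target); label None is ε
        self._ids = count(1)

    def _pair(self):
        return next(self._ids), next(self._ids)

    def symbol(self, a):
        s, f = self._pair()
        self.edges.append((s, a, f))
        return s, f

    def epsilon(self):
        s, f = self._pair()
        self.edges.append((s, None, f))
        return s, f

    def union(self, p, q):
        s, f = self._pair()
        for start, final in (p, q):
            self.edges.append((s, None, start))
            self.edges.append((final, None, f))
        return s, f

    def concat(self, p, q):
        self.edges.append((p[1], None, q[0]))
        return p[0], q[1]

    def star(self, p):
        s, f = self._pair()
        self.edges.append((s, None, p[0]))     # enter P
        self.edges.append((p[1], None, f))     # leave P
        self.edges.append((s, None, f))        # zero repetitions
        self.edges.append((p[1], None, p[0]))  # repeat P
        return s, f


def closure(edges, states):
    """All states reachable from `states` through ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for src, label, dst in edges:
            if src == state and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(edges, fragment, text):
    """Simulate the NFA by tracking the set of states it may be in."""
    start, final = fragment
    current = closure(edges, {start})
    for ch in text:
        moved = {d for s, lab, d in edges if s in current and lab == ch}
        current = closure(edges, moved)
    return final in current


# (b (aba + ε) a)*
nb = NFABuilder()
aba = nb.concat(nb.concat(nb.symbol("a"), nb.symbol("b")), nb.symbol("a"))
body = nb.concat(nb.concat(nb.symbol("b"), nb.union(aba, nb.epsilon())),
                 nb.symbol("a"))
nfa = nb.star(body)

print(accepts(nb.edges, nfa, "babaaba"))  # True: "babaa" followed by "ba"
print(accepts(nb.edges, nfa, "bab"))      # False
```

Because each builder method only adds states and ε-transitions around already finished fragments, the code mirrors the recursive structure of the regular expression directly.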
21
Regular Expressions
  • Regular Expression
  • →
  • NFA
  • Algorithm 2: start with
  • [Diagram: a single transition labeled with the
    whole expression E.]
22
Regular Expressions
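  • Apply rules:
  • [Diagram: rewrite rules for transitions. A
    transition labeled a* becomes an a-loop entered
    and left through ε-transitions; a transition
    labeled ab becomes a transition on a followed by
    one on b; a transition labeled a + b becomes two
    parallel transitions, on a and on b.]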
23
Regular Expressions
  • Algorithm 1
  • Builds FSA bottom up
  • Good for machines
  • Bad for humans
  • Algorithm 2
  • Builds FSA top down
  • Bad for machines
  • Good for humans

  • (These judgments are arguable.)
24
Regular Expressions
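  • Example (Algorithm 2): (a + b)* (aa + bb)
  • [Diagram: the transition labeled
    (a + b)* (aa + bb) is split into (a + b)* and
    aa + bb, which are then expanded step by step
    into single-symbol and ε-transitions.]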
25
Regular Expressions
  • Example (Algorithm 2): ba(a + b)* ab
  • [Diagram: the resulting NFA.]
26
Regular Expressions
  • Deterministic Finite-State Automata (DFAs)
  • Definition: A deterministic FSA is defined just
    like an NFA, except that
  • δ: Q × Σ → Q, rather than
  • δ: Q × (Σ ∪ {ε}) → 2^Q
  • Thus, both an ε-transition and two transitions
    on the same symbol a out of one state are
    impossible.
27
Regular Expressions
  • Every transition of a DFA consumes a symbol.
    Fortunately, DFAs are just as powerful as NFAs.
  • Theorem For every NFA there exists an equivalent
    (accepting the same language) DFA.

28
Regular Expressions
  • Conversion from NFAs to DFAs
  • Simulate all moves of the NFA with the DFA.
  • The start state of the DFA is the start state of
    the NFA (say, S), together with states that are
    e-reachable from S.
  • Each state in the DFA is a subset of the set of
    states of the NFA: it captures the notion of
    being in any one of a number of NFA states.
  • New states in the DFA are constructed by
    calculating the sets of states that are reachable
    through symbols, after the start state.
  • The final states in the DFA are those that
    contain any final state of the NFA.
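
The steps above can be sketched as a small subset construction. The NFA used here is an assumption: its transitions were chosen to be consistent with the DFA table of the a*b + ba* example that follows (the NFA diagram itself is not reproduced in this transcript), with state 1 as start state and state 6 as the only final state.

```python
EPS = None  # label used for ε-transitions

# Assumed NFA for a*b + ba*: (state, label) -> set of next states.
NFA = {
    (1, EPS): {2}, (1, "b"): {4},
    (2, "a"): {2}, (2, EPS): {3},
    (3, "b"): {6},
    (4, EPS): {5},
    (5, "a"): {5}, (5, EPS): {6},
}
NFA_START, NFA_FINALS = 1, {6}


def eps_closure(states):
    """States reachable from `states` using ε-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        for nxt in NFA.get((stack.pop(), EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def subset_construction(alphabet):
    """Return (start, table, finals) of the DFA; DFA states are frozensets."""
    start = eps_closure({NFA_START})
    table, pending = {}, [start]
    while pending:
        current = pending.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = set()
            for state in current:
                moved |= NFA.get((state, symbol), set())
            target = eps_closure(moved)
            if target:                      # an empty set means "no move"
                table[current][symbol] = target
                pending.append(target)
    finals = {s for s in table if s & NFA_FINALS}
    return start, table, finals


start, table, finals = subset_construction("ab")
for state, row in table.items():
    print(sorted(state), {sym: sorted(dst) for sym, dst in row.items()})
```

Only subsets that are actually reachable from the start state are ever created, which is why the result usually has far fewer states than the theoretical maximum.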

29
Regular Expressions
  • Example: a*b + ba*
  • [Diagram: NFA with states 1 to 6.]
30
Regular Expressions
  • DFA:
                Input
    State       a          b
    {1,2,3}     {2,3}      {4,5,6}
    {2,3}       {2,3}      {6}
    {4,5,6}     {5,6}      ---
    {6}         ---        ---
    {5,6}       {5,6}      ---
  • [Diagram: the NFA and the resulting DFA, side by
    side.]
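
For illustration, the table above can be run directly as a DFA. State names below are the NFA state sets written without braces (for example "123" for states 1, 2 and 3). The choice of final states is an assumption: it takes NFA state 6 to be the only final state, so every DFA state containing 6 is final.

```python
# Transition table of the DFA; a missing entry corresponds to "---".
DFA = {
    "123": {"a": "23", "b": "456"},
    "23": {"a": "23", "b": "6"},
    "456": {"a": "56"},
    "6": {},
    "56": {"a": "56"},
}
START = "123"
FINALS = {"456", "6", "56"}   # assumption: NFA state 6 is the final state


def run(text):
    """Return True if the DFA accepts `text`; every move consumes a symbol."""
    state = START
    for symbol in text:
        state = DFA[state].get(symbol)
        if state is None:         # undefined transition: reject
            return False
    return state in FINALS


for word in ["b", "aab", "baa", "", "abb"]:
    print(repr(word), run(word))
```

Under this assumption the accepted strings are exactly those of the form a*b or ba*.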
31
Regular Expressions
  • In general, if the NFA has N states, the DFA can
    have as many as 2^N states.
  • Example: ba (a + b)* ab

32
DFA
                          Input
    State                 a                     b
    {0}                   ---                   {1}
    {1}                   {2,3,4,6,8,9}         ---
    {2,3,4,6,8,9}         {3,4,5,6,8,9,10}      {3,4,6,7,8,9}
    {3,4,5,6,8,9,10}      {3,4,5,6,8,9,10}      {3,4,6,7,8,9,11}
    {3,4,6,7,8,9}         {3,4,5,6,8,9,10}      {3,4,6,7,8,9}
    {3,4,6,7,8,9,11}      {3,4,5,6,8,9,10}      {3,4,6,7,8,9}

33
Regular Expressions
  • [Diagram: transition graph of the DFA given in
    the table on the previous slide.]
34
Regular Expressions
  • State Minimization
  • Theorem: Given a DFA M, there exists an
    equivalent DFA M' that is minimal, i.e. no other
    equivalent DFA exists with fewer states than M'.
  • Definition A partition of a set S is a set of
    subsets of S such that every element of S appears
    in exactly one of the subsets.

35
Regular Expressions
  • Example: S = {1, 2, 3, 4, 5}
  • Π1 = { {1, 2, 3, 4}, {5} }
  • Π2 = { {1, 2, 3}, {4}, {5} }
  • Π3 = { {1, 3}, {2}, {4}, {5} }
  • Note: Π2 is a refinement of Π1, and Π3 is a
    refinement of Π2.
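
The two notions can be checked mechanically. In the sketch below the function names are hypothetical, and the blocks follow the reading of the example in which Π1 = { {1, 2, 3, 4}, {5} }, Π2 = { {1, 2, 3}, {4}, {5} } and Π3 = { {1, 3}, {2}, {4}, {5} }, which is an assumption about how the sets are grouped.

```python
def is_partition(blocks, universe):
    """Every element of `universe` appears in exactly one block."""
    elements = [x for block in blocks for x in block]
    return len(elements) == len(set(elements)) and set(elements) == set(universe)


def refines(fine, coarse):
    """Every block of `fine` lies inside some block of `coarse`."""
    return all(any(set(f) <= set(c) for c in coarse) for f in fine)


S = {1, 2, 3, 4, 5}
P1 = [{1, 2, 3, 4}, {5}]
P2 = [{1, 2, 3}, {4}, {5}]
P3 = [{1, 3}, {2}, {4}, {5}]

print(all(is_partition(p, S) for p in (P1, P2, P3)))  # True
print(refines(P2, P1), refines(P3, P2))               # True True
print(refines(P1, P2))                                # False
```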

36
Regular Expressions
  • Minimization Algorithm
  • 1. Remove all undefined transitions by introducing
    a TRAP state, i.e. a state from which no final
    state is reachable.
  • 2. Partition all states into two groups (final
    states and non-final states).
  • 3. Complete the Next State table for each group,
    by specifying transitions from group to group.
  • 4. Form the next partition: split groups in which
    Next State table entries differ.
  • 5. Repeat from step 3 until no further splitting
    is possible.
  • 6. Determine start and final states.

37
Regular Expressions
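  • Example
  • [Diagram: a DFA with states 1 to 5 over the
    alphabet {a, b}.]
  • Π0 = { {1, 2, 3, 4}, {5} }
    State    a            b
    1        {1,2,3,4}    {1,2,3,4}
    2        {1,2,3,4}    {1,2,3,4}
    3        {1,2,3,4}    {1,2,3,4}
    4        {1,2,3,4}    {5}
    5        {1,2,3,4}    {1,2,3,4}
  • Split 4 from group {1, 2, 3, 4}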
38
Regular Expressions
  • Π1 = { {1, 2, 3}, {4}, {5} }
    State    a          b
    1        {1,2,3}    {1,2,3}
    2        {1,2,3}    {4}
    3        {1,2,3}    {1,2,3}
    4        {1,2,3}    {5}
    5        {1,2,3}    {1,2,3}
  • Split 2 from group {1, 2, 3}

39
Regular Expressions
  • Π2 = { {1, 3}, {2}, {4}, {5} }
    State    a      b
    1        {2}    {1,3}
    3        {2}    {1,3}
    2        {2}    {4}
    4        {2}    {5}
    5        {2}    {1,3}
  • No more splitting is possible: the groups are
    the states of the minimal DFA.
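
A minimal sketch of the refinement loop, assuming a complete DFA (the TRAP-state step already done). The concrete transitions below are an assumption chosen to be consistent with the Next State tables of the example: every state goes to 2 on a, and on b the moves are 1 → 3, 2 → 4, 3 → 3, 4 → 5, 5 → 3, with 5 as the only final state. Unlike the hand-worked example, this version splits every group that needs it in the same pass.

```python
# Assumed DFA, consistent with the example's Next State tables.
DFA = {
    1: {"a": 2, "b": 3},
    2: {"a": 2, "b": 4},
    3: {"a": 2, "b": 3},
    4: {"a": 2, "b": 5},
    5: {"a": 2, "b": 3},
}
FINALS = {5}
ALPHABET = "ab"


def minimize(dfa, finals, alphabet):
    """Refine {non-final, final} until no group splits; return the groups."""
    partition = [g for g in (set(dfa) - set(finals), set(finals)) if g]
    while True:
        def group_of(state):
            # Index of the group of the current partition containing `state`.
            return next(i for i, g in enumerate(partition) if state in g)

        refined = []
        for group in partition:
            # States stay together only if their Next State rows agree
            # when each target is replaced by the group it belongs to.
            rows = {}
            for state in sorted(group):
                row = tuple(group_of(dfa[state][sym]) for sym in alphabet)
                rows.setdefault(row, set()).add(state)
            refined.extend(rows.values())
        if len(refined) == len(partition):
            return refined
        partition = refined


groups = minimize(DFA, FINALS, ALPHABET)
print(sorted(sorted(g) for g in groups))  # [[1, 3], [2], [4], [5]]
```

On this input the successive partitions are the same as Π0, Π1 and Π2 in the example, and the loop stops once a pass produces no new group.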

40
Regular Expressions
  • Summary of Regular Languages
  • Smallest class in the Chomsky hierarchy.
  • Appropriate for lexical analysis.
  • Four representations: RGR, RGL, RE and FSA.
  • All four are equivalent: there are algorithms to
    perform transformations among them.
  • Various advantages and disadvantages among these
    four, for language designer, implementor, and
    user.
  • FSAs can be made deterministic, and minimal.