Regular Expressions Finite State Automaton - PowerPoint PPT Presentation

Slides: 17
Provided by: ackr

Transcript and Presenter's Notes

1
Regular Expressions / Finite State Automaton
2
Regular expressions
  • Terminology on formal languages
    • alphabet: a finite set of symbols
    • string: a finite sequence of alphabet symbols
    • language: a (finite or infinite) set of strings
  • Regular operations on languages
    • Union: $R \cup S = \{\, x \mid x \in R \text{ or } x \in S \,\}$
    • Concatenation: $RS = \{\, xy \mid x \in R \text{ and } y \in S \,\}$
    • Kleene closure: $R^*$ is $R$ concatenated with itself 0 or more times
      • $R^* = \{\epsilon\} \cup R \cup RR \cup RRR \cup \cdots$
      • i.e., the strings obtained by concatenating a finite number of strings from the set $R$
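The three regular operations can be made concrete with finite languages represented as Python sets of strings. This is a minimal sketch. $R^*$ is infinite whenever $R$ contains a non-empty string, so the helper `kleene_closure` enumerates only the strings up to a length bound `max_len`. Both the function names and the bound are illustrative assumptions, not part of the slides.

```python
def union(r: set[str], s: set[str]) -> set[str]:
    """R ∪ S = { x | x ∈ R or x ∈ S }"""
    return r | s

def concat(r: set[str], s: set[str]) -> set[str]:
    """RS = { xy | x ∈ R and y ∈ S }"""
    return {x + y for x in r for y in s}

def kleene_closure(r: set[str], max_len: int) -> set[str]:
    """Strings of R* = {ε} ∪ R ∪ RR ∪ RRR ∪ ... whose length is at most max_len."""
    result = {""}      # R^0 = {ε}
    frontier = {""}
    while frontier:
        # Append one more string from R to every string found in the last round,
        # keeping only new strings that stay within the length bound.
        frontier = {w for w in concat(frontier, r) if len(w) <= max_len} - result
        result |= frontier
    return result

R, S = {"a", "bc"}, {"a", "d"}
print(sorted(union(R, S)))              # ['a', 'bc', 'd']
print(sorted(concat(R, S)))             # ['aa', 'ad', 'bca', 'bcd']
print(sorted(kleene_closure(R, 3)))     # ['', 'a', 'aa', 'aaa', 'abc', 'bc', 'bca']
```

The bounded loop terminates because only finitely many strings have length at most `max_len`, and each round keeps only strings not seen before.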

3
Regular Expressions
  • A pattern notation for describing certain kinds of sets of strings
  • Given an alphabet $\Sigma$:
    • $\epsilon$ is a regular exp. (denotes the language $\{\epsilon\}$)
    • for each $a \in \Sigma$, $a$ is a regular exp. (denotes the language $\{a\}$)
    • if $r$ and $s$ are regular exps. denoting $L(r)$ and $L(s)$ respectively, then so are
      • $(r) \mid (s)$ (denotes the language $L(r) \cup L(s)$)
      • $(r)(s)$ (denotes the language $L(r)L(s)$)
      • $(r)^*$ (denotes the language $L(r)^*$)
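The inductive definition translates directly into a small abstract syntax tree, with one case per rule. This is a sketch under stated assumptions: the class names `Eps`, `Sym`, `Alt`, `Cat` and `Star` are hypothetical, and `lang(e, n)` returns only the strings of $L(e)$ with length at most `n`, because $L(e)$ may be infinite.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Eps:            # ε denotes {ε}
    pass

@dataclass(frozen=True)
class Sym:            # a denotes {a}
    a: str

@dataclass(frozen=True)
class Alt:            # (r)|(s) denotes L(r) ∪ L(s)
    r: object
    s: object

@dataclass(frozen=True)
class Cat:            # (r)(s) denotes L(r)L(s)
    r: object
    s: object

@dataclass(frozen=True)
class Star:           # (r)* denotes L(r)*
    r: object

def lang(e, n: int) -> set[str]:
    """Return the strings of L(e) whose length is at most n."""
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):
        base = lang(e.r, n) - {""}       # ε adds nothing new to a closure
        result, frontier = {""}, {""}
        while frontier:                  # keep appending strings of L(r)
            frontier = {x + y for x in frontier for y in base if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")

# (a|b)*b, restricted to strings of length <= 2
print(sorted(lang(Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("b")), 2)))  # ['ab', 'b', 'bb']
```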

4
Common Extensions to r.e. Notation
  • One or more repetitions of r: r+
  • A range of characters: [a-zA-Z], [0-9]
  • An optional expression: r?
  • Any single character: .
  • Giving names to regular expressions, e.g.
    • letter = [a-zA-Z_]
    • digit = 0|1|2|3|4|5|6|7|8|9
    • ident = letter ( letter | digit )*
    • Integer_const = digit+
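These extensions add convenience but no new power. Each one abbreviates the core operators: r+ = rr*, r? = r|ε, and [0-9] = 0|1|…|9. The sketch below uses Python's `re` module, which supports this notation directly. It compares equivalent forms on a few arbitrary sample strings and builds `ident` and `Integer_const` from the named pieces by string substitution. The helper `same_on` and the sample list are hypothetical, and agreement on samples illustrates equivalence rather than proving it.

```python
import re

# Named sub-expressions mirroring the slide, written in Python's re syntax.
letter = r"[a-zA-Z_]"
digit = r"(?:0|1|2|3|4|5|6|7|8|9)"          # same language as [0-9]
ident = rf"{letter}(?:{letter}|{digit})*"    # letter ( letter | digit )*
integer_const = rf"{digit}+"                 # digit+

def same_on(p: str, q: str, samples: list[str]) -> bool:
    """True if patterns p and q accept exactly the same strings among samples."""
    return all((re.fullmatch(p, w) is None) == (re.fullmatch(q, w) is None)
               for w in samples)

samples = ["", "7", "42", "-3", "x", "_tmp1", "9lives"]
print(same_on(r"[0-9]+", r"[0-9][0-9]*", samples))     # True  (r+ versus r r*)
print(same_on(r"-?[0-9]+", r"(?:-|)[0-9]+", samples))  # True  (r? versus r|ε)
print([w for w in samples if re.fullmatch(ident, w)])          # ['x', '_tmp1']
print([w for w in samples if re.fullmatch(integer_const, w)])  # ['7', '42']
```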

5
Examples of Regular Expressions
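  • Identifiers
    • Letter → (a|b|c| … |z|A|B|C| … |Z)
    • Digit → (0|1|2| … |9)
    • Identifier → Letter ( Letter | Digit )*
  • Numbers
    • [0-9]+, equivalently [0-9][0-9]*
    • [1-9][0-9]*, then ([1-9][0-9]*)|0
    • -?[0-9]+
    • [0-9]*\.[0-9]+, then ([0-9]+)|([0-9]*\.[0-9]+)
    • [eE][-+]?[0-9]+, then ([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?
    • -?( ([0-9]+) | ([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)? )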
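The final number pattern can be tried directly with Python's `re.fullmatch`, which succeeds only if the whole string matches. This is a quick sketch with arbitrary test strings. As written, the pattern allows an exponent only after a decimal fraction, so `1e5` is rejected.

```python
import re

# -?( ([0-9]+) | ([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)? ), with the spaces removed
NUMBER = re.compile(r"-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)")

for text in ["42", "-7", ".5", "-3.14e-2", "1e5", "3.", "-"]:
    verdict = "accepted" if NUMBER.fullmatch(text) else "rejected"
    print(f"{text!r:12} {verdict}")
```

Here `"42"`, `"-7"`, `".5"` and `"-3.14e-2"` are accepted. The pattern rejects `"1e5"` (exponent on an integer), `"3."` (no digit after the point) and `"-"`.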

6
Examples of Regular Expressions
  • Numbers
    • Integer → (+|-|ε) (0 | (1|2|3| … |9)(Digit*))
    • Decimal → Integer . Digit*
    • Real → ( Integer | Decimal ) E (+|-|ε) Digit*
    • Complex → ( Real , Real )
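Named definitions compose mechanically: substitute each name's pattern wherever the name is used. The sketch below does this in Python `re` syntax. It assumes `E` is the literal uppercase letter and keeps `Digit*` exactly as written, so an exponent may have no digits (`1E` counts as a Real). The upper-case pattern names and the helper `is_a` are hypothetical.

```python
import re

DIGIT = r"[0-9]"
INTEGER = rf"[+-]?(?:0|[1-9]{DIGIT}*)"             # (+|-|ε) (0 | (1|...|9) Digit*)
DECIMAL = rf"{INTEGER}\.{DIGIT}*"                  # Integer . Digit*
REAL = rf"(?:{INTEGER}|{DECIMAL})E[+-]?{DIGIT}*"   # (Integer|Decimal) E (+|-|ε) Digit*
COMPLEX = rf"\({REAL},{REAL}\)"                    # ( Real , Real )

def is_a(pattern: str, text: str) -> bool:
    """True if the whole of text belongs to the language of pattern."""
    return re.fullmatch(pattern, text) is not None

print(is_a(COMPLEX, "(1.5E+2,-3E4)"))   # True
print(is_a(REAL, "1.5"))                # False: a Real needs the E part
print(is_a(INTEGER, "007"))             # False: no leading zeros
```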

7
Regular Expression Exercises
  • ???
  • [a-z], [a-zA-Z], [a-zA-Z0-9]
  • [a-zA-Z][a-zA-Z0-9]*
  • ???
  • "this is a string"
  • \".*\"  <- wrong!!! why?
  • \"[^"]*\"
  • ??? ??
  • 0? 1? ???? ??? ???...
  • 0?? ???? ??? 001
  • 0?? ???? 0?? ??? ??? 0010
  • 0? 1? ??? ??? ??? ______
  • 0? ?? ?? ??? ?? ??? ______
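One way to see why `\".*\"` is wrong for string literals: `.` also matches a double quote, and `*` is greedy. On a line holding two literals, the match therefore runs from the first quote to the last one. Replacing `.` with `[^"]` stops the match at the first closing quote. The Python sketch below uses an arbitrary input line and ignores escaped quotes inside strings. Python's raw strings do not need the backslash before `"`.

```python
import re

line = 'print("this is a string", "and another")'

wrong = re.search(r'".*"', line)      # . also matches '"', and * is greedy
right = re.search(r'"[^"]*"', line)   # [^"] cannot run past a closing quote

print(wrong.group())   # "this is a string", "and another"
print(right.group())   # "this is a string"
```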

8
Recognizing Tokens: Finite Automata
  • A finite automaton is a 5-tuple $(Q, \Sigma, T, q_0, F)$, where
    • $\Sigma$ is a finite alphabet
    • $Q$ is a finite set of states
    • $T : Q \times \Sigma \to Q$ is the transition function
    • $q_0 \in Q$ is the initial state, and
    • $F \subseteq Q$ is a set of final states.
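The 5-tuple maps directly onto a data structure. This minimal Python sketch assumes a partial transition function, where a missing entry means rejection. That is equivalent to routing every such move to an explicit error state. The example automaton accepts binary strings with an even number of 0s and is made up for illustration.

```python
from dataclasses import dataclass

@dataclass
class DFA:
    states: set[str]                     # Q
    alphabet: set[str]                   # Σ
    delta: dict[tuple[str, str], str]    # T : Q × Σ → Q
    start: str                           # q0 ∈ Q
    finals: set[str]                     # F ⊆ Q

    def accepts(self, word: str) -> bool:
        state = self.start
        for ch in word:
            if (state, ch) not in self.delta:   # undefined move: reject
                return False
            state = self.delta[(state, ch)]
        return state in self.finals

# Binary strings containing an even number of 0s.
even_zeros = DFA(
    states={"even", "odd"},
    alphabet={"0", "1"},
    delta={("even", "0"): "odd", ("even", "1"): "even",
           ("odd", "0"): "even", ("odd", "1"): "odd"},
    start="even",
    finals={"even"},
)
print(even_zeros.accepts("1001"))   # True
print(even_zeros.accepts("10"))     # False
```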

9
Finite Automata: An Example
  • A (deterministic) finite automaton (DFA) to match
    C-style comments
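One common way to build such a DFA is sketched below in Python. The state names are hypothetical and need not match the slide's diagram: `s0` is the start, `s1` follows the opening `/`, `s2` is inside the comment, `s3` follows a `*` inside the comment, and `s4` accepts after the closing `*/`. Any other move goes to an error state `err`.

```python
def is_c_comment(text: str) -> bool:
    """Return True if the entire text is one C-style /* ... */ comment."""
    state = "s0"
    for ch in text:
        if state == "s0":
            state = "s1" if ch == "/" else "err"
        elif state == "s1":
            state = "s2" if ch == "*" else "err"
        elif state == "s2":                  # inside the comment body
            state = "s3" if ch == "*" else "s2"
        elif state == "s3":                  # just read a '*' inside the body
            if ch == "/":
                state = "s4"
            elif ch == "*":
                state = "s3"                 # "**" may still be closed by '/'
            else:
                state = "s2"
        else:                                # s4 or err: nothing may follow
            state = "err"
    return state == "s4"

for text in ["/* hi */", "/***/", "/*/", "/* a */ b"]:
    print(repr(text), is_c_comment(text))
```

The `s3` self-loop on `*` handles comments such as `/***/`. Without that loop, the second `*` would send the automaton back to `s2` and the closing `/` would be missed.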

10
Example 2
  • Consider the problem of recognizing register names
    • Register → r (0|1|2| … |9) (0|1|2| … |9)*
    • Allows registers of arbitrary number
    • Requires at least one digit
  • RE corresponds to a recognizer (or DFA)
    • Transitions on other inputs go to an error state, se

11
Example 2 (continued)
  • DFA operation
    • Start in state s0 and take transitions on each input character
    • The DFA accepts a word x iff x leaves it in a final state (s2)
  • So,
    • r17 takes it through s0, s1, s2 and accepts
    • r takes it through s0, s1 and fails
    • a takes it straight to se
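The traces above can be reproduced with a short simulation of the s0/s1/s2/se recognizer for r Digit Digit*. In this Python sketch, the hypothetical function `trace` returns every state visited, so the walks for `r17`, `r` and `a` can be inspected. Note that `r17` visits s2 twice, once for each digit.

```python
DIGITS = set("0123456789")

def step(state: str, ch: str) -> str:
    """Transition function for Register -> r (0|1|...|9) (0|1|...|9)*."""
    if state == "s0" and ch == "r":
        return "s1"
    if state in ("s1", "s2") and ch in DIGITS:
        return "s2"
    return "se"                       # every other move goes to the error state

def trace(word: str) -> tuple[list[str], bool]:
    """Return the states visited on word and whether the word is accepted."""
    states = ["s0"]
    for ch in word:
        states.append(step(states[-1], ch))
    return states, states[-1] == "s2"

for w in ["r17", "r", "a"]:
    print(w, *trace(w))
# r17 ['s0', 's1', 's2', 's2'] True
# r ['s0', 's1'] False
# a ['s0', 'se'] False
```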

12
Example 2 (continued)
  • To be useful, the recognizer must be turned into code

δ  | r  | 0,1,2,3,4,5,6,7,8,9 | All others
s0 | s1 | se                  | se
s1 | se | s2                  | se
s2 | se | s2                  | se
se | se | se                  | se
Table encoding RE
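A table-driven recognizer keeps the automaton in a table and drives it with a small, fixed loop. Below is a Python sketch of that idea using the transition table on this slide. The names `char_class`, `TABLE`, `FINAL` and `recognize` are hypothetical. Classifying each character first keeps the table to three columns instead of one per character.

```python
DIGITS = set("0123456789")

def char_class(ch: str) -> str:
    """Map an input character to a column of the transition table."""
    if ch == "r":
        return "r"
    if ch in DIGITS:
        return "digit"
    return "other"

# δ: rows are states, columns are character classes.
TABLE = {
    "s0": {"r": "s1", "digit": "se", "other": "se"},
    "s1": {"r": "se", "digit": "s2", "other": "se"},
    "s2": {"r": "se", "digit": "s2", "other": "se"},
    "se": {"r": "se", "digit": "se", "other": "se"},
}
FINAL = {"s2"}

def recognize(word: str) -> bool:
    state = "s0"
    for ch in word:
        state = TABLE[state][char_class(ch)]
        if state == "se":          # se is absorbing, so stop early
            return False
    return state in FINAL
```

Changing the language only requires a new table (and possibly new character classes). The driver loop stays the same, and each input character still costs one table lookup.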
13
What if we need a tighter specification?
  • r Digit Digit* allows arbitrary numbers
    • Accepts r00000
    • Accepts r99999
    • What if we want to limit it to r0 through r31?
  • Write a tighter regular expression
    • Register → r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
    • Register → r0|r1|r2| … |r31|r00|r01|r02| … |r09
  • Produces a more complex DFA
    • Has more states
    • Same cost per transition
    • Same basic implementation
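The two forms of the tighter expression should denote the same language. The Python sketch below checks this exhaustively over a finite set of candidates: `r` followed by zero to three digits. The candidate range is an assumption chosen for illustration and is not taken from the slides.

```python
import re
from itertools import product

# Register -> r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
TIGHT = re.compile(r"r(?:(?:0|1|2)(?:[0-9]|)|(?:4|5|6|7|8|9)|(?:3|30|31))")

# The list form: r0 | r1 | ... | r31 | r00 | r01 | ... | r09
listed = {f"r{i}" for i in range(32)} | {f"r0{i}" for i in range(10)}

# Candidates: "r" followed by zero to three digits.
candidates = ["r" + "".join(ds)
              for n in range(4) for ds in product("0123456789", repeat=n)]
accepted = {w for w in candidates if TIGHT.fullmatch(w)}
print(accepted == listed)   # True
print(len(accepted))        # 42
```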

14
Tighter register specification (continued)
  • The DFA for
    • Register → r ( (0|1|2) (Digit | ε) | (4|5|6|7|8|9) | (3|30|31) )
  • Accepts a more constrained set of registers
  • Same set of actions, more states

15
Tighter register specification (continued)
δ  | r  | 0,1 | 2  | 3  | 4-9 | All others
s0 | s1 | se  | se | se | se  | se
s1 | se | s2  | s2 | s5 | s4  | se
s2 | se | s3  | s3 | s3 | s3  | se
s3 | se | se  | se | se | se  | se
s4 | se | se  | se | se | se  | se
s5 | se | s6  | se | se | se  | se
s6 | se | se  | se | se | se  | se
se | se | se  | se | se | se  | se
Table encoding RE for the tighter register specification
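Only the table changes; the driver loop has the same shape as before. This Python sketch encodes the tighter table. It takes s2 through s6 as the final states, inferred from the expression rather than stated on the slide: s2 follows r0 to r2, s3 follows a second digit after those, s4 follows r4 to r9, s5 follows r3, and s6 follows r30 or r31. The helper `column` and the other names are hypothetical.

```python
def column(ch: str) -> str:
    """Map a character to its column in the tighter table."""
    if ch == "r":
        return "r"
    if ch in ("0", "1"):
        return "0,1"
    if ch in ("2", "3"):
        return ch
    if ch in ("4", "5", "6", "7", "8", "9"):
        return "4-9"
    return "other"

COLS = ["r", "0,1", "2", "3", "4-9", "other"]
ROWS = {
    #      r     0,1   2     3     4-9   other
    "s0": ["s1", "se", "se", "se", "se", "se"],
    "s1": ["se", "s2", "s2", "s5", "s4", "se"],
    "s2": ["se", "s3", "s3", "s3", "s3", "se"],
    "s3": ["se", "se", "se", "se", "se", "se"],
    "s4": ["se", "se", "se", "se", "se", "se"],
    "s5": ["se", "s6", "se", "se", "se", "se"],
    "s6": ["se", "se", "se", "se", "se", "se"],
    "se": ["se", "se", "se", "se", "se", "se"],
}
TABLE = {state: dict(zip(COLS, row)) for state, row in ROWS.items()}
FINAL = {"s2", "s3", "s4", "s5", "s6"}

def recognize_tight(word: str) -> bool:
    state = "s0"
    for ch in word:
        state = TABLE[state][column(ch)]
    return state in FINAL

for w in ["r3", "r09", "r29", "r31", "r32", "r40"]:
    print(w, recognize_tight(w))
```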
16
Automating Scanner Construction
  • RE → NFA (Thompson's construction)
    • Build an NFA for each term
    • Combine them with ε-moves
  • NFA → DFA (subset construction)
    • Build the simulation
  • DFA → Minimal DFA
    • Hopcroft's algorithm
  • DFA → RE (not part of the scanner construction)
    • All pairs, all paths problem
    • Take the union of all paths from s0 to an accepting state
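Of the steps listed, the subset construction is the easiest to show compactly. The Python sketch below makes these assumptions. The NFA is a dict mapping (state, symbol) to a set of states, with `None` as a hypothetical marker for ε-moves. Each DFA state is the frozenset of NFA states it simulates, and an empty set is left implicit as the error state. The small NFA for r (0|1) (0|1)* is built by hand with ε-moves; it is not the exact output of Thompson's construction.

```python
from collections import deque

EPS = None  # hypothetical marker for ε-moves

def eps_closure(nfa, states):
    """All NFA states reachable from `states` using only ε-moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(nfa, states, symbol):
    """NFA states reachable from `states` by one transition on `symbol`."""
    return {t for s in states for t in nfa.get((s, symbol), set())}

def subset_construction(nfa, start, finals, alphabet):
    """Return (dfa_start, dfa_delta, dfa_finals) built by the subset construction."""
    d_start = eps_closure(nfa, {start})
    delta, seen, work = {}, {d_start}, deque([d_start])
    while work:
        d = work.popleft()
        for a in alphabet:
            target = eps_closure(nfa, move(nfa, d, a))
            if not target:
                continue                  # empty set: implicit error state
            delta[(d, a)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    d_finals = {d for d in seen if d & finals}
    return d_start, delta, d_finals

def dfa_accepts(dfa, word):
    start, delta, finals = dfa
    state = start
    for ch in word:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in finals

# Hand-built NFA with ε-moves for r (0|1) (0|1)*; the accepting NFA state is 3.
NFA = {
    (0, "r"): {1},
    (1, "0"): {2}, (1, "1"): {2},
    (2, EPS): {3},
    (3, "0"): {4}, (3, "1"): {4},
    (4, EPS): {3},
}
dfa = subset_construction(NFA, 0, {3}, ["r", "0", "1"])
for w in ["r", "r0", "r0110", "r2", "0"]:
    print(w, dfa_accepts(dfa, w))
```

For this NFA the construction yields four DFA states, {0}, {1}, {2, 3} and {3, 4}. The last two are accepting because each contains NFA state 3.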