Regular Languages and Expressions - PowerPoint PPT Presentation

1 / 34
About This Presentation
Title:

Regular Languages and Expressions

Description:

Surinder Kumar Jain, University of Sydney – PowerPoint PPT presentation

Number of Views:111
Avg rating:3.0/5.0
Slides: 35
Provided by: edua2239
Category:

less

Transcript and Presenter's Notes

Title: Regular Languages and Expressions


1
RegularLanguages and Expressions
  • Surinder Kumar Jain,
  • University of Sydney

2
Regular Languages Expressions
  • Automaton
  • DFA
  • NFA
  • ?-NFA
  • CFG as a DFA
  • Equivalence
  • Minimal DFA
  • Expressions
  • Definition
  • Conversion from/to Automaton
  • Regular Langauges
  • Pumping Lemma proving regularness
  • Closures
  • Equivalence

3
Deterministic Finite Automaton
  • A system with many states
  • Can transition from one state to another
  • Usually caused by external input
  • Set of states is finite
  • System is in one state at any given time

4
DFA
  • Mathematical Definition of a DFA
  • A (Q, S,d, q0,F)
  • Q States, DFA is in one of these finite states
    at any time.
  • S Input symbols, DFA changes its state from
    one state to another state on consuming an input
    symbol.
  • d Transition function.
  • Given a state and an input symbols, gives the
    next DFA state
  • Function over QxS -gt Q.
  • q0 Initial DFA state
  • F Accepting states. Once DFA reaches one of
    these states, it may not accept any more input
    symbols.

5
DFA Example
Q waiting, pending, rejected, approved, paid
S receive, reject, accept, pay d
(waiting -gt receive -gt pending), (pending -gt
reject -gt rejected), (pending -gt accept -gt
accepted), (accepted -gt pay -gt paid) q0
waiting F rejected, paid
6
Transition Diagrams
start
receive
accept
Accepted
pay
Paid
Waiting
Pending
Paid
reject
Paid
Rejected
Q waiting, pending, rejected, approved, paid
S receive, reject, accept, pay d
(waiting -gt receive -gt pending), (pending -gt
reject -gt rejected), (pending -gt accept -gt
accepted), (accepted -gt pay -gt paid) q0
waiting F rejected, paid
7
Language
  • Set of alphabets
  • Concatenation (joining)
  • Strings
  • A subset of strings is a language
  • A DFA defines a language
  • Alphabet set is the set of input symbols
  • Concatenation - one symbol follows another
  • Acceptance sequence of symbols takes DFA from
    start state to one of the accepting states

8
Non-deterministic Finite Automaton (DFA)
  • Five-tuple like a DFA, (Q, S,d, q0,F)
  • Transition function returns a set not one state
  • Several outgoing arcs with same symbol
  • In several states at the same time
  • Language of NFA

9
Equivalence of DFA NFA
  • Any NFA language can be described by some DFA
  • Adding non-determinism does not give any thing
    more
  • Why use NFAs then
  • Easier to make for some languages
  • May have fewer states and less complex
  • Algorithm to convert NFA to DFA
  • For n state NFA,DFA may have up to 2n states
  • Can throw away inaccessible states
  • Observation DFA has practically the same number
    of states as NFA though it often has more
    transitions

10
NFA to DFA conversion
  • For an NFA, N Q, S, d, q0, F,
  • Construct the DFA, D Qd, S, dd, q0, Fd
  • Qd Powerset of Q
  • dd(S, a) Up in S d(p,a) for every S in Qd.
  • Fd S S is subset of Q and S has an accepting
    state of NFA
  • DFA operates on one state at a time, NFA operates
    on sets of states.
  • Given a state, NFA gives a set of new states
  • Make all possible sets of DFA states as NFA
    states
  • Transit from one set of states to a new set of
    all possible state set
  • Any set with an accepting state is the accepting
    state in NFA

11
NFA to DFA conversion complexity
  • O(2n) (number of subsets of a set)
  • Efficient algorithm
  • Do not construct the entire power set
  • Start with start state
  • Only construct subsets that can reach an
    accepting state from the start state
  • The number of states in DFA is much less than 2n.
  • DFA has practically the same number of states as
    NFA though it often has more transitions

12
epsilon - NFA
  • Includes e (the empty string, not in alphabet
    set) as a transition
  • e is identity in concatenation
  • a.e e.a a for all a
  • Spontaneous transition without an input

13
Equivalence to NFA
  • An e-NFA language can be described by some NFA
  • Every NFA can be described by some DFA
  • Adding e transition does not give any thing more
  • Why use e-NFAs then
  • Easier to make for some languages
  • Useful in proving equivalence of languages

14
Conversion to NFA
  • Conversion aims to remove e transitions
  • Define a new set of states
  • e are contained inside the set
  • No e arc leaves or enters the new set of states
  • Epsilon closure (eclose)
  • For a state, set of all states reachable
    spontaneously
  • Follow the e arcs recursively and include
    reachable states in the epsilon closure

15
epsilon-NFA to DFA conversion
  • For an e-NFA, N Q, S, d, q0, F,
  • Construct the DFA, D Qd, S, dd, eclose(q0),
    Fd
  • Qd eclose(q) q eclose(q) and q in Q
  • dd(S, a) Up in S d(p,eclose(a)) for every S
    in Qd.
  • Fd S S is subset of Q and S has an accepting
    state of NFA
  • DFA operates on one state at a time, e-NFA
    operates on sets of states with no e transition
    leaving the set
  • Make all eclose sets as DFA states
  • Transit from one set of states to a new set of
    all eclose state set
  • Any set with an accepting state is the accepting
    state in NFA

16
Programs as Automatan
  • An imperative program can be represented as a
    Control Flow Graph (CFG) with
  • statements at nodes and
  • predicates at edges
  • It can be converted into a CFG with
  • both statements and predicates at edges
  • by pushing node statements up incoming edges
  • Such a CFG is a DFA
  • Program points are States
  • Statements are input symbols that change program
    state from program point to point

17
Regular Expression
  • Algebraic expression to denote languages
  • Composed of symbols e, Ø, , , ., (,
    ) and alphabets
  • The language is generated using rules
  • L(e) empty set
  • L(Ø) empty set
  • L(a) a for all alphabets a
  • L(pq) L(p) U L(q)
  • L(p.q) p.q p in L(p) q in L(q)
  • L(p) qn q in L(p) and n gt 0 , q0 e,
    qkq.qk-1

18
Regular Expression Example
  • ab.c
  • The language generated is
  • a, b.c
  • a.b.c.d
  • the language generated is
  • a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d,
  • A finite way to express an infinite language

19
Equality of Languages
  • DEFINITION
  • Two regular expression (or automaton)
  • are EQUAL
  • if they both generate same languages
  • Thus
  • (a.b) (b.a) a.(b.a) b.(b.a)
  • (e b).(a.b).(ea)

20
Algebraic laws of regular expressions
  • p q q p
  • (p q) r p (q r)
  • (p.q).r p.(q.r)
  • Ø p p Ø p
  • e.p p.e p
  • Ø.p p.Ø Ø
  • p.(qr) p.q p.r
  • (p q).r p.r q.r
  • p p p
  • (p) p
  • Ø e
  • e e
  • p.p p.p
  • (p q) (p.q)

21
Finite Automaton and Regular Expressions
  • Every language
  • defined by a finite automaton is also defined by
    some regular expression
  • defined by a regular expression is also defined
    by some DFA

22
DFA to Regular expression
  • Hopcrofts formula
  • Rij(k) Rij(k-1)Rik(k-1).(Rkk(k-1)).Rkj(k-1)
  • Rij(n) is the regular expression of all paths
    from i to j. (n is the number of states)
  • States are sorted in some order and numbered 1 to
    n
  • Rij(k) is regular expression of all paths from i
    to j passing thru nodes whose sort order is less
    than k
  • Computed for all i,j for k0, then k1,,kn
  • Rs,f1(n)Rs,fk(n) is the regular expression of
    the DFA
  • s is the start state, f1,,fk are accepting
    states, n is the number of states.

23
DFA to RE - complexity
  • Hopcroft formula is O(n34n),
  • n3 to compute the table and
  • 4n as size of regular expression grows by 4 every
    time.
  • In practice it is close to O(n3)
  • By simplifying the regular expression at every
    step and
  • using judicious algorithm avoiding recomputation
    of Rkk(k)
  • Most DFAs have almost n and not 2n accessible
    states
  • A faster state elimination method close to O(n2)
    is also available

24
RE to Automatan conversion
  • Regular expression is converted to e-NFA
  • e-NFA can the be converted to NFA and to DFA
  • RE to e-NFA conversion rules
  • e -gt One edge (two state) DFA with e
    transition
  • Ø -gt Two state DFA with no edges
  • a -gt Two state with a transition
  • -gt A new start/accept statejoining two
  • arguments of in parallel
  • . -gt Accept of first is start of second
  • -gt An e edge joining star/accept of
    argument and
  • a new start/accept state
  • Convert resulting e-NFA to a DFA

25
Direct conversion
  • Augment regular expression r to (r).
  • Position number for each occurrence of alphabet
  • Compute for each node of syntax tree
  • nullable (e in the language)
  • firstpos (set of possible first alphabets)
  • lastpos (set of possible last alphabets)
  • Compute for each position
  • followpos (set of possible next alphabet after
    this position)
  • Construct the DFA

26
Applications
  • Unix text search, search matching patterns (grep)
  • Lexical/Parser analysis
  • Parse text against a regular expression
  • find set of first tokens at this expression root
  • find set of last tkens at this expression root
  • can the expression at this root be null set
  • find set of next tokens after an alphabet
    position in a regular expression
  • Efficient search of patterns in very large
    repository (web text search)

27
Regular Language
  • DEFINITION
  • A language (a set of strings)
  • is defined to be a regular language if
  • it can be defined by a finite automaton
  • by a DFA or
  • by an NFA or
  • by an e-NFA or
  • by a regular expression
  • Four different ways to describe a regular
    language

28
Pumping Lemma
  • If L is a regular language then there exists
  • integer n such that
  • for every string w in L
  • we can break w into x, y, z such that wx.y.z
  • y ? e
  • x.y lt n
  • x.yk.z is in L (for all k gt 0)
  • Proof based on
  • For a DFA of length n
  • any string of length gt n
  • must revisit a state
  • Used to prove that a language is not regular

29
Closure property
  • Language is a set of string over finite alphabets
  • Language operators
  • Union of two languages L(A ? B) L(A) ? L(B) -
    re
  • Intersection
  • Concatenation L(A.B) a.b a in A, b in B
  • Kleene Closure L(A) an a in A, n gt 0
  • a0 e for all a and an an-1
  • Compliment L(A) a a not in A (with
    respect to some overall alphabet set) - dfa
  • Difference L(A-B) L(A) L(B) - dfa switch q0
    F
  • Reversal L (A) ak.ak-1a1 a1ak-1.ak in A
  • Homomorphism replace an alphabet with another
    regular expression
  • Inverse homomorphism

30
Decision properties
  • Is the language described empty?
  • Is a particualr string in the described language?
  • Do two different of languages actually describe
    the same language?

31
Conversions
  • Decision properties may require conversion
    between various forms.
  • Can the conversion be done in reasonable time?

Conversion Complexity
Computing e closures O(n3) Warshalls O(n)
Subset construction O(2n)
NFA to DFA O(n32n) (In practice O(n3s)
DFA to NFA conversion O(n)
NFA/DFA to Regular Expression O(n34n) (worst case) (Actual is much less)
Regular Expression to eNFA O(n)
Regular Expression to NFA O(n3)
Regular Expression to DFA O(n34n32n)
32
Equivalence of automata
  • Equivalence of two states
  • States p and q in an automaton are Defined to be
    equivalent if
  • For all input strings applied at state p or q
  • p ends up in an accepting state
  • if and only if
  • q also ends up in an accepting state
  • The accepting state reached by p does not have to
    be same accepting state as that reached by q

33
Minimization of DFA
  • If two states p and q are equivalent
  • we can combine them together into a single state
  • it wont affect the language accepted by the DFA
  • This process of combining states together is
    called Minimization
  • Table-filling algorithm can find if two states
    are equivalent or not. Complexity O(n2)
  • Non-equivalent pairs are distinguishable

34
MinimuM DFA
  • Minimum DFA is unique
  • Eliminate all states not reachable from start
  • Determine which states are equivalent
  • Partition states into blocks of equivalent states
  • Equivalence is transitive
  • Thus no state is in two blocks
  • Equivalence of two Regular Languages
  • Convert them into their minimum DFAs
  • and check for isomorphism
  • Union method
  • Make a minimum DFA of the union of the two
  • Start state of the two original DFAs must be
    equivalent if and only if DFAs are equivalent
Write a Comment
User Comments (0)
About PowerShow.com