CSE 3813 Introduction to Formal Languages and Automata

Provided by: genebo
1
CSE 3813 Introduction to Formal Languages and
Automata
  • Chapter 3
  • Regular languages and regular grammars
  • These class notes are based on material from our
    textbook, An Introduction to Formal Languages and
    Automata, 4th ed., by Peter Linz, published by
    Jones and Bartlett Publishers, Inc., Sudbury, MA,
    2006. They are intended for classroom use only
    and are not a substitute for reading the textbook.

2
Operations on formal languages
Let L1 = {10} and L2 = {011, 11}.
Union: L1 ∪ L2 = {10, 011, 11}
Concatenation: L1L2 = {10011, 1011}
Kleene star: L1* = {λ, 10, 1010, 101010, ...}
Other operations: intersection, complement, difference
3
Definition Of Regular Languages
  • A regular language over an alphabet Σ is one that
    contains either a single string of length 0 or 1,
    or strings that can be obtained by applying the
    operations of union, concatenation, or Kleene star
    to strings of length 0 or 1.

4
Alternative definition of regular languages
The simplest possible regular languages are the
empty set and languages consisting of a single
string that is either the empty string or has
length one. For example, if Σ = {a, b}, the
simplest languages are ∅, {λ}, {a}, and {b}. A
regular language is a language that can be built
from these simple languages, by using the three
operations of union, concatenation, and Kleene
star.
5
Regular Languages correspond to Regular
Expressions
  • L = ∅: RE is ∅
  • L = {λ}: RE is λ
  • L = {a}: RE is a
  • L = L1 ∪ L2: RE is (r1 + r2)
  • L = L1L2: RE is (r1r2)
  • L = L1*: RE is (r1*)

6
Regular expressions
A useful shorthand for describing regular
languages. Compare to arithmetic expressions,
such as (x + 3)/2. An arithmetic expression is
constructed using arithmetic operators, such as
addition and division. A regular expression is
constructed using operations on languages, such
as concatenation, union, and Kleene star. The
value of an arithmetic expression is a number.
The value of a regular expression is a language.
7
Recursive definition of a regular expression
∅ is a regular expression corresponding to the
language ∅.
λ is a regular expression corresponding to the
language {λ}.
For each symbol a ∈ Σ, a is a regular expression
corresponding to the language {a}.
For any regular expressions r and s, corresponding to
the regular languages L(r) and L(s),
respectively, each of the following is a
regular expression:
(r + s) corresponds to the language L(r) ∪ L(s)
(r · s) or (rs) corresponds to the language L(r)L(s)
(r*) corresponds to the language (L(r))*
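
The recursive definition above maps directly onto a recursive function. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical tuple encoding of regular expressions and computes only the strings of L(r) up to a chosen length, since L(r) may be infinite.

```python
# Hypothetical encoding of regular expressions as nested tuples:
#   ("empty",)       the expression ∅
#   ("lambda",)      the expression λ
#   ("sym", "a")     a single symbol
#   ("union", r, s)  (r + s)
#   ("cat", r, s)    (rs)
#   ("star", r)      (r*)

def language(r, max_len):
    """Return the strings of L(r) whose length is at most max_len."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "cat":
        left = language(r[1], max_len)
        right = language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        base = language(r[1], max_len)
        result = {""}
        frontier = {""}
        # Keep appending strings of the base language until nothing new fits.
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

# a* + b, restricted to strings of length at most 3
A_STAR_PLUS_B = ("union", ("star", ("sym", "a")), ("sym", "b"))
print(sorted(language(A_STAR_PLUS_B, 3)))
```

Each branch of the function corresponds to one clause of the definition, which is why the value of a regular expression can be described as a language.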
8
Examples
a* + b = {λ, a, b, aa, aaa, aaaa, aaaaa, ...}
a*ba* = {w ∈ Σ* : w has exactly one b}
(a + b)* = any string of a's and b's
(a + b)*aa(a + b)* = {w ∈ Σ* : w contains aa}
(a + b)*aa(a + b)* + (a + b)*bb(a + b)* = {w ∈ Σ* : w contains aa or bb}
(a + λ)b* = {ab^n : n ≥ 0} ∪ {b^n : n ≥ 0}
As with arithmetic expressions, there is an order of precedence for
operators -- unless you change it using parentheses.
The order is star closure first, then concatenation, then union.
9
More examples
All strings containing no more than two a's:
(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
All strings containing no runs of a's of length greater than two:
(b + c)*(λ + a + aa)(b + c)*((b + c)(b + c)*(λ + a + aa)(b + c)*)*
All strings in which all runs of a's have lengths that are multiples of three:
(aaa + b + c)*
10
Hints for writing regular expressions
Assume Σ = {a, b, c}.
Zero or more a's: a*
One or more a's: aa*
Any string at all: (a + b + c)*
Any nonempty string: (a + b + c)(a + b + c)*
Any string that does not contain a: (b + c)*
Any string containing exactly one a: (b + c)*a(b + c)*
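
These patterns can be tried out with Python's `re` module. This is only a sketch under one assumption: the slides' union operator + is written as | in Python syntax, and `re.fullmatch` is used so that the whole string must belong to the language. The dictionary name `HINTS` and the function `matches` are illustrative.

```python
import re

# The slide patterns over {a, b, c}, translated to Python syntax
# (assumption: "+" on the slides becomes "|").
HINTS = {
    "zero or more a's": r"a*",
    "one or more a's": r"aa*",
    "any string at all": r"(a|b|c)*",
    "any nonempty string": r"(a|b|c)(a|b|c)*",
    "no a": r"(b|c)*",
    "exactly one a": r"(b|c)*a(b|c)*",
}

def matches(name, s):
    """True if the whole string s belongs to the language named in HINTS."""
    return re.fullmatch(HINTS[name], s) is not None

print(matches("exactly one a", "bcab"))   # one a, surrounded by b's and c's
print(matches("exactly one a", "aa"))     # two a's
```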
11
Practice
  • Let Σ = {a, b, c}. Give a regular expression for
    the following languages:
  • all strings containing exactly two a's
  • all strings containing no more than three a's

12
Practice
Let Σ = {a, b, c}. Give a regular expression for
the following languages:
(a) all strings containing exactly two a's:
(b + c)*a(b + c)*a(b + c)*
(b) all strings containing no more than three a's:
(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
13
Practice
What languages correspond to the
following regular expressions? ab (aaa
bba) (ab)
14
More practice
Give regular expressions for the following
languages, where the alphabet is Σ = {a, b, c}:
-- all strings ending in b
-- all strings containing no more than two a's
-- all strings of even length
15
More practice
Give regular expressions for the following
languages, where the alphabet is Σ = {0, 1}:
-- all strings of one or more 0s followed by a 1
-- all strings of two or more symbols followed
   by three or more 0s
-- all strings that do not end with 01
16
Do these strings match the regular expression?
Regular expression        String
(01 + 1)                  0101
(a + λ)b                  b
(ab)a                     λ
(a + b)(ab)               bb
17
Big Question
Given a specific regular language L, is it safe
to assume that any subset of L is also regular?
NO! Remember that the more powerful a language
is, the more precisely it is able to discriminate
between strings that do and do not belong to the
language.
18
Big Question
Suppose L is described by the regular expression
a*b*. Then ab, aabb, aaabbb, etc. are strings in
this language. However, a, b, aab, aaaaaaab,
etc. are also strings in this language. The
language consisting only of the strings of the
form a^n b^n is NOT a regular language, even though
it is a subset of L(a*b*).
19
Accepting (review)
  • Let M = (Q, Σ, q0, δ, A) be an FA.
  • A string x ∈ Σ* is accepted by M if
  • δ*(q0, x) ∈ A
  • The language accepted (or recognized) by M is the
    set L(M) = {x ∈ Σ* : x is accepted by M}
  • A language L over the alphabet Σ is regular iff
    there is a finite automaton that accepts L.
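
Acceptance by a deterministic FA can be sketched in a few lines. The example automaton below (strings over {a, b} with an even number of a's) is a hypothetical illustration, not one taken from the slides; the transition function δ is stored as a dictionary.

```python
def accepts(delta, start, accepting, x):
    """Follow the transitions for each symbol of x and report whether
    the state reached is an accepting state."""
    state = start
    for symbol in x:
        state = delta[(state, symbol)]
    return state in accepting

# Hypothetical DFA: accepts strings with an even number of a's.
DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}

print(accepts(DELTA, "even", {"even"}, "abab"))  # two a's
print(accepts(DELTA, "even", {"even"}, "ab"))    # one a
```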

20
Kleene's theorem
1) For any regular expression r that represents
language L(r), there is a finite automaton that
accepts that same language. 2) For any finite
automaton M that accepts language L(M), there is
a regular expression that represents the same
language. Therefore, the class of languages that
can be represented by regular expressions is
equivalent to the class of languages accepted by
finite automata -- the regular languages.
21
[Diagram: regular expression to NFA (Kleene's theorem, part 1); NFA to DFA (proved); DFA to regular expression (Kleene's theorem, part 2)]
22
Theorem 3.1
First half of Kleene's theorem: Let r be a regular
expression. Then there exists some
nondeterministic finite accepter that accepts
L(r). Consequently, L(r) is a regular
language.
Proof strategy: for any regular
expression, we show how to construct an
equivalent NFA. Because regular expressions are
defined recursively, the proof is by induction.
23
Base step: Give an NFA that accepts each of the
simple or base languages, ∅, {λ}, and {a} for
each a ∈ Σ.
[Diagram: NFAs for the base languages]
24
Inductive step: For each of the operations --
union, concatenation, and Kleene star -- show how
to construct an accepting NFA.
Closure under union
[Diagram: NFAs M1 and M2 placed in parallel and joined by λ-transitions]
25
Closure under concatenation
26
Closure under Kleene Star
[Diagram: NFA M1 with added λ-transitions]
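
The base and inductive steps can be sketched as a recursive construction. This is one common way to wire the λ-transitions and may differ in detail from the diagrams on the slides; the tuple encoding of regular expressions and all function names are assumptions made for this sketch.

```python
from itertools import count

_ids = count()  # source of fresh state numbers

def build(r, edges):
    """Return (start, final) of an NFA fragment for r. edges collects
    (state, label, state) triples; the label None stands for λ."""
    start, final = next(_ids), next(_ids)
    kind = r[0]
    if kind == "empty":
        pass                                   # no path from start to final
    elif kind == "lambda":
        edges.append((start, None, final))
    elif kind == "sym":
        edges.append((start, r[1], final))
    elif kind == "union":                      # fragments in parallel
        for sub in r[1:]:
            s, f = build(sub, edges)
            edges.append((start, None, s))
            edges.append((f, None, final))
    elif kind == "cat":                        # fragments in series
        s1, f1 = build(r[1], edges)
        s2, f2 = build(r[2], edges)
        edges += [(start, None, s1), (f1, None, s2), (f2, None, final)]
    elif kind == "star":                       # skip, or loop back and repeat
        s, f = build(r[1], edges)
        edges += [(start, None, s), (f, None, final),
                  (start, None, final), (final, None, start)]
    else:
        raise ValueError(kind)
    return start, final

def closure(states, edges):
    """All states reachable from the given states by λ-transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for p, label, t in edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def nfa_accepts(r, x):
    """Build the NFA for r and simulate it on the string x."""
    edges = []
    start, final = build(r, edges)
    current = closure({start}, edges)
    for symbol in x:
        moved = {t for p, label, t in edges if p in current and label == symbol}
        current = closure(moved, edges)
    return final in current

AB_STAR = ("cat", ("sym", "a"), ("star", ("sym", "b")))   # ab*
print(nfa_accepts(AB_STAR, "abbb"), nfa_accepts(AB_STAR, "ba"))
```

Because every fragment has exactly one start and one final state, each inductive case only needs to add a handful of λ-transitions around the fragments it combines.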
27
Closure properties of Regular Languages
  • The union or concatenation of two regular
    languages, and the Kleene star of a regular
    language, is again a regular language, since we
    can write a regular expression for it.
  • The intersection and difference of two regular
    languages, and the complement of a regular
    language, are also regular languages.
  • The class of regular languages is said to be
    closed under these operations. (More in Ch. 4.)

28
Exercise
Use the construction of the first half of
Kleene's theorem to construct an NFA that accepts
the language L(abaa + bbaab).
29
Exercise
Use the construction of the first half of
Kleene's theorem to construct an NFA that accepts
the language L(abaa + bbaab).
[Diagram: start state q0 and final state qf, joined by λ-transitions to an FA accepting abaa and an FA accepting bbaab]
30
Homework
Construct an NFA that accepts the language
corresponding to the regular expression ((b(a
b)a) a)
31
Theorem 3.2
Kleene's theorem, part 2: Let L be a regular
language. Then there exists a regular expression
r such that L = L(r). Any language accepted
by a finite automaton can be represented by a
regular expression.
The proof strategy: For any
DFA, we show how to create an equivalent regular
expression. In other words, we describe an
algorithm for converting any DFA to a regular
expression.
32
Expression diagram
  • A labeled directed graph (similar to a finite
    state diagram) in which transitions are labeled
    by regular expressions
  • Has a single start state with no incoming
    transitions
  • Has a single accepting state with no outgoing
    transitions
  • Example

33
Algorithm for converting a DFA into an equivalent
regular expression
Initial step: Change every transition labeled
a,b to (a + b). Add a single start state with an
outgoing λ-transition to the current start state,
and add a single final state with incoming
λ-transitions from every previous final
state.
Main step: Until the expression diagram has
only two states (initial state and final state),
repeat the following:
-- pick some non-start, non-final state
-- remove it from the diagram and re-label
   transitions with regular expressions so that
   the same language is accepted
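
The algorithm above can be sketched as follows. This is an illustrative implementation under several assumptions: the output uses Python `re` syntax (| for union, the empty string for λ), the names `START` and `FINAL` for the added states are hypothetical and must not clash with existing state names, and the example DFA (even number of a's) is not from the slides.

```python
import re

def union(r, s):
    """Union of two edge labels; None means there is no edge."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def dfa_to_regex(states, delta, start, accepting):
    """Convert a DFA to a regular expression by removing states one at a time."""
    edges = {}
    # Initial step: merge parallel transitions into one labeled edge.
    for (p, a), q in delta.items():
        edges[(p, q)] = union(edges.get((p, q)), re.escape(a))
    edges[("START", start)] = ""                # λ-transition from new start
    for q in accepting:
        edges[(q, "FINAL")] = ""                # λ-transitions to new final
    # Main step: remove every original state.
    for k in states:
        loop = edges.pop((k, k), None)
        star = f"(?:{loop})*" if loop is not None else ""
        ins = [(p, r) for (p, q), r in edges.items() if q == k]
        outs = [(q, r) for (p, q), r in edges.items() if p == k]
        for key in [e for e in edges if k in e]:
            del edges[key]
        # Every path p -> k -> q is replaced by a direct edge p -> q.
        for p, r_in in ins:
            for q, r_out in outs:
                edges[(p, q)] = union(edges.get((p, q)), r_in + star + r_out)
    return edges.get(("START", "FINAL"))        # None if the language is empty

# Hypothetical DFA over {a, b}: even number of a's.
DELTA = {
    ("even", "a"): "odd", ("even", "b"): "even",
    ("odd", "a"): "even", ("odd", "b"): "odd",
}
pattern = dfa_to_regex(["even", "odd"], DELTA, "even", {"even"})
print(pattern)
print(re.fullmatch(pattern, "abba") is not None)
```

The resulting expression is usually far from the shortest one; the order in which states are removed affects its size but not the language it describes.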
34
The key step is removing states and
re-labeling transitions with regular expressions.
Here are some examples of how to do this.
[Diagrams: examples of removing a state and re-labeling the remaining transitions]
35
Exercise
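[Diagram: the initial step applied to a DFA, with the transition labeled a,b changed to (a + b) and λ-transitions added for the new start and final states]
Continue ...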
36
Exercise
[Diagrams: successive expression diagrams as states are removed]
37
Exercise
Find a regular expression that corresponds to the
language accepted by the following DFA.
38
Exercise
[Diagrams: successive expression diagrams for the solution]
39
Homework
Find a regular expression that corresponds to the
language accepted by the following DFA.
[Diagram: DFA with states q0, q1, q2 and transitions labeled 0 and 1]
40
Applications of regular expressions
  • Validation
  • checking that an input string is in valid format
  • example 1: checking format of email address on
    WWW entry form
  • example 2: UNIX regex command
  • Search and selection
  • looking for strings that match a certain pattern
  • example: UNIX grep command
  • Tokenization
  • converting sequence of characters (a string) into
    sequence of tokens (e.g., keywords, identifiers)
  • used in lexical analysis phase of compiler
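
As a small illustration of tokenization, the sketch below uses Python's `re` module with named groups. The token classes in `TOKEN_SPEC` are hypothetical and chosen only for this example; a real lexer would define many more.

```python
import re

# Hypothetical token classes for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 3 + y1"))
```

Each token class is itself a regular expression, and the combined pattern is their union.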

41
Grammar
  • A grammar G = (V, T, S, P) is a quadruple
    consisting of
  • a set V of variables (non-terminal symbols),
    including the start symbol
  • a set T of terminals (same as an alphabet, Σ)
  • a start symbol S ∈ V
  • a set P of production rules
  • Example
  • S → aS | A
  • A → bA | λ

42
Derivation
  • Strings are derived from a grammar
  • Example of a derivation
  • S ⇒ aS ⇒ aaS ⇒ aaA ⇒ aabA ⇒ aab
  • At each step, a nonterminal is replaced by the
    sentential form on the right-hand side of a rule
    (a sentential form can contain nonterminals
    and/or terminals)
  • Automata recognize languages; grammars generate
    languages
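
To see a grammar generating a language, the sketch below enumerates every terminal string of bounded length derivable from the example grammar S → aS | A, A → bA | λ. It assumes variables are single characters that appear as dictionary keys, writes λ as the empty string, and is intended for linear grammars such as this one (a grammar whose sentential forms can grow without adding terminals would make the search run forever).

```python
from collections import deque

# The example grammar: S → aS | A, A → bA | λ (λ written as "").
GRAMMAR = {"S": ["aS", "A"], "A": ["bA", ""]}

def generate(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable in the sentential form, if any.
        index = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if index is None:
            results.add(form)          # no variables left: a derived string
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results

print(sorted(generate(GRAMMAR, "S", 2)))
```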

43
Context-free grammar
  • A grammar is said to be context-free if every
    rule has a single non-terminal on the left-hand
    side
  • This means you can apply the rule in any context.
    More complicated languages (such as English)
    have context-dependent rules.
  • A language generated from a context-free grammar
    is called a context-free language.

44
The English language
  • In fact, it may not be possible to fully specify
    the syntax of the English language.
  • The language grows all the time, and new words
    and constructions are constantly being added.
  • In addition, language exists to be used to convey
    meaning; sometimes a particular meaning is better
    conveyed by not using standard syntax.

45
Consider this poem by e e cummings
  • l (a
  • le
  • af
  • fa
  • ll
  • s)
  • one
  • l
  • iness

46
e e cummings's poetry
  • A Cummings poem is spare and precise, employing
    a few key words eccentrically placed on the page.
    Some of these words were invented by Cummings,
    often by combining two common words into a new
    synthesis. He also revised grammatical and
    linguistic rules to suit his own purposes, using
    such words as "if," "am," and "because" as nouns,
    for example, or assigning his own private
    meanings to words.
  • - http://www.poetryfoundation.org/archive/poet.html?id=81323

47
Buffalo Bill's defunct who used to ride a
watersmooth-silver stallion and break
onetwothreefourfive pigeonsjustlikethat
Jesus he was a handsome man and
what I want to know is how do you like your
blueeyed boy Mister Death
e e cummings
48
Regular grammar
  • A grammar is said to be right-linear if all
    productions are of the form A → xB or A → x, where A
    and B are variables and x is a string of
    terminals. (This means that if there is a
    variable on the right side of the production
    rule, then it is the rightmost element in the
    rule.)
  • A grammar is said to be left-linear if all
    productions are of the form A → Bx or A → x.
  • A regular grammar is either right-linear or
    left-linear.

49
Linear grammar
  • A grammar can be linear without being right- or
    left-linear.
  • A linear grammar is a grammar in which at most
    one variable can occur on the right side of any
    production rule, without any restriction on the
    position of the variable.
  • Example
  • S → aS | A
  • A → Ab | λ

50
Another formalism for regular languages
  • Every regular grammar generates a regular
    language, and every regular language can be
    generated by a regular grammar.
  • A regular grammar is a simpler, special case of a
    context-free grammar
  • The regular languages are a proper subset of the
    context-free languages

51
Exercises
  • Find a regular grammar that generates the
    language on Σ = {a, b} consisting of all strings
    with no more than three a's.

52
Exercises
  • Find a regular grammar that generates the
    language on Σ = {a, b} consisting of all strings
    with no more than three a's.
  • S → bS | aA | λ
  • A → bA | aB | λ
  • B → bB | aC | λ
  • C → bC | λ

53
Exercises
  • Find a regular grammar that generates the
    language consisting of even-length strings over
    {a, b}.

54
Exercises
  • Find a regular grammar that generates the
    language consisting of even-length strings over
    {a, b}.
  • S → aaS | abS | baS | bbS | λ

55
Non-regular languages
  • There are non-regular languages that can be
    generated by context-free grammars
  • The language {a^n b^n : n ≥ 0} is generated by the
    grammar S → aSb | λ
  • The language L = {w : n_a(w) = n_b(w)} is generated
    by the grammar S → SS | λ | aSb | bSa

56
Exercise
  • What language is generated by the following
    context-free (but not regular) grammar?
  • S → aSa | bSb | a | b | λ

57
Exercise
  • What language is generated by the following
    context-free grammar?
  • S → aSa | bSb | a | b | λ
  • This is the odd/even palindrome language
  • L = {w(a + b + λ)w^R}

58
Programming languages
  • Programming languages are context-free, but not
    regular
  • Programming languages have the following features
    that require infinite stack memory
  • matching parentheses in algebraic expressions
  • nested if .. then .. else statements, and nested
    loops
  • block structure

59
Exercise
  • Given a grammar, you should be able to say what
    language it generates
  • Use set notation to define the language generated
    by the following grammars
  • 1) S → aaSB | λ
  • B → bB | b
  • 2) S → aSbb | A
  • A → cA | c

60
Exercise
S → aaSB | λ
B → bB | b
It helps to list some of the strings that can be formed:
S ⇒ aaSB ⇒ aaB ⇒ aab
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbb
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbbB ⇒ aabbbb
S ⇒ aaSB ⇒ aaaaSBB ⇒ aaaaBB ⇒ aaaaBb ⇒ aaaabb
S ⇒ aaSB ⇒ aaaaSBB ⇒ aaaaBB ⇒ aaaaBbB ⇒ aaaaBbb ⇒ aaaabbb
What is the pattern? L = {(aa)^n b^n b* : n ≥ 1} ∪ {λ}
61
Exercise
  • Given a language, you should be able to give a
    grammar that generates it.
  • For example, give a regular (right-linear)
    grammar for the language consisting of all
    strings over {a, b, c} that begin with a, contain
    exactly two b's, and end with cc.

62
Exercise
  • Give a regular (right-linear) grammar for the
    language consisting of all strings over {a, b, c}
    that begin with a, contain exactly two b's, and
    end with cc
  • S → aA
  • A → bB | aA | cA
  • B → bC | aB | cB
  • C → aC | cC | cD
  • D → c

63
Derivation
Given the grammar
S → aaSB | λ
B → bB | b
the string aab can be derived in two different ways:
S ⇒ aaSB ⇒ aaB ⇒ aab
S ⇒ aaSB ⇒ aaSb ⇒ aab
64
Parse tree
Both derivations on the previous slide correspond
to the following parse tree.
The tree structure shows the rule that is applied
to each nonterminal, without showing the order of
rule applications. Each internal node of the
tree corresponds to a nonterminal, and the leaves
of the derivation tree represent the string of
terminals.
65
Exercise
Let G be the grammar
S → abSc | A
A → cAd | cd
1) Give a derivation of ababccddcc.
2) Build the parse tree for the derivation of that string.
3) Use set notation to define L(G).
66
Leftmost (rightmost) derivation
In a leftmost derivation, the leftmost
nonterminal is replaced at each step. In a
rightmost derivation, the rightmost nonterminal
is replaced at each step. Many derivations are
neither leftmost nor rightmost. If there is a
single parse tree, there is also a single
leftmost derivation.
67
Ambiguity
A grammar is ambiguous if it can generate a
string with two possible parse trees. (A string
has more than one parse tree if and only if it
has more than one leftmost derivation.) English
can be ambiguous. Example: Disabled fly to see
Carter.
68
Example
Given the following grammar S ? S S S S
1 0 The string 1 1 0 has two different
parse trees.
69
Equivalent grammars
Here is a non-ambiguous grammar that generates
the same language. S ? S A 1 0 A ? A B
1 0 B ? 1 0 Two grammars that generate
the same language are said to be equivalent. To
make parsing easier, we prefer grammars that are
not ambiguous.
70
Dangling else
x = 3
if x > 2 then if x > 4 then x = 1 else x = 5
So, what is x?
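
The two readings of the statement can be written out explicitly. Python's indentation forces a choice between them, so the sketch below spells out each parse as a separate function; the function names are illustrative only.

```python
def else_binds_to_inner_if(x):
    # Reading 1: if x > 2 then (if x > 4 then x = 1 else x = 5)
    if x > 2:
        if x > 4:
            x = 1
        else:
            x = 5
    return x

def else_binds_to_outer_if(x):
    # Reading 2: if x > 2 then (if x > 4 then x = 1) else x = 5
    if x > 2:
        if x > 4:
            x = 1
    else:
        x = 5
    return x

print(else_binds_to_inner_if(3))  # → 5
print(else_binds_to_outer_if(3))  # → 3
```

Starting from x = 3, the two parse trees give different answers, which is why the ambiguity matters.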
71
Ambiguous vs. unambiguous
Ambiguous grammar
<statement> → IF <expression> THEN <statement>
            | IF <expression> THEN <statement> ELSE <statement>
            | <otherstatement>
Unambiguous grammar
<statement> → <st1> | <st2>
<st1> → IF <expression> THEN <st1> ELSE <st1>
      | <otherstatement>
<st2> → IF <expression> THEN <statement>
      | IF <expression> THEN <st1> ELSE <st2>
72
Exercise
Show that the following grammar is ambiguous.
S → AB | aaB
A → a | Aa
B → b
Construct an equivalent grammar that is unambiguous.
73
Parsing
  • In practical applications, it is usually not
    enough to decide whether a string belongs to a
    language. It is also important to know how to
    derive the string from the grammar.
  • Parsing uncovers the syntactical structure of a
    string, which is represented by a parse tree.
    (The syntactical structure is important for
    assigning semantics to the string -- for example,
    if it is a program).

74
Parsing
Let G be a context-free grammar for C. Let the
string w be a C program. One thing a compiler
does -- in particular, the part of the compiler
called the parser -- is determine whether w
is a syntactically correct C program. It also
constructs a parse tree for the program that is
used in code generation. There are many
sophisticated and efficient algorithms for
parsing. You may study them in more
advanced classes (for example, on compilers) or
come across them on your own. We will not discuss
them in this class.
75
Theorem 3.3
  • Every language generated by a right-linear
    grammar is regular.
  • Proof
  • Specify a procedure for automatically
    constructing an NFA that mimics the derivations
    of a right-linear grammar.

76
Theorem 3.3
  • Justification
  • The sentential forms produced by a right linear
    grammar have exactly one variable, which occurs
    as the rightmost symbol.
  • Assume that our grammar has a production rule
  • D → dE
  • and that, during the derivation of a string,
    there is a step wcD ⇒ wcdE
  • We can construct an NFA which has states D and E,
    and an arc labeled d from D to E.
  • NFAs can be converted to DFAs.
  • All languages accepted by DFAs are regular.

77
Theorem 3.3
  • Construction
  • For each variable Vi in the grammar there will be
    a state in the automaton labeled Vi.
  • The initial state of the automaton will be
    labeled V0 and will correspond to the S variable
    in the grammar.
  • For each production rule Vi → a1a2…amVj the
    automaton will have transitions such that
  • δ*(Vi, a1a2…am) = Vj
  • For each production rule Vi → a1a2…am the
    automaton will have transitions such that
  • δ*(Vi, a1a2…am) = Vfinal

78
Theorem 3.3
Construct an NFA that accepts the language
generated by the grammar
S → aA        (convert to V0 → aV1)
A → abS | b   (convert to V1 → abV0 | b)
[Diagram: NFA with states V0, V1, and Vf and transitions labeled a and b]
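
The construction of Theorem 3.3 can be sketched in code for this example. The sketch assumes every production contains at least one terminal (true here), represents each production as a hypothetical triple (variable, terminal string, next variable or None), and names the extra intermediate states `_0`, `_1`, and so on.

```python
def grammar_to_nfa(productions):
    """Build NFA transitions from right-linear productions.
    A production Vi → a1...am Vj becomes a chain of m transitions from
    Vi to Vj; a production Vi → a1...am ends in the final state Vf."""
    transitions = {}                       # (state, symbol) -> set of states
    fresh = 0
    for var, terminals, target in productions:
        if not terminals:
            raise ValueError("this sketch assumes at least one terminal per rule")
        state = var
        for i, symbol in enumerate(terminals):
            if i == len(terminals) - 1:
                nxt = target if target is not None else "Vf"
            else:
                nxt = f"_{fresh}"          # intermediate state inside the chain
                fresh += 1
            transitions.setdefault((state, symbol), set()).add(nxt)
            state = nxt
    return transitions

def nfa_accepts(transitions, start, x):
    """Simulate the NFA by tracking the set of states it could be in."""
    current = {start}
    for symbol in x:
        current = {t for q in current for t in transitions.get((q, symbol), ())}
    return "Vf" in current

# V0 → aV1, V1 → abV0 | b
RULES = [("V0", "a", "V1"), ("V1", "ab", "V0"), ("V1", "b", None)]
NFA = grammar_to_nfa(RULES)
print(nfa_accepts(NFA, "V0", "ab"), nfa_accepts(NFA, "V0", "aabab"))
```

The string aabab is accepted because it has the derivation S ⇒ aA ⇒ aabS ⇒ aabaA ⇒ aabab, and the NFA follows the same steps state by state.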
79
Theorem 3.4
  • Every regular language can be generated by a
    right-linear grammar.
  • Proof
  • Generate a DFA for the language.
  • Specify a procedure for automatically
    constructing a right-linear grammar from the DFA.

80
Theorem 3.4
  • Given a regular language L, let M = (Q, Σ, δ, q0,
    F) be a DFA that accepts L. Let Q = {q0, q1, …,
    qn} and Σ = {a1, a2, …, am}.
  • Construct the grammar G = (V, T, S, P) with
  • V = {q0, q1, …, qn}
  • T = {a1, a2, …, am}
  • S = q0.
  • P = { } initially.
  • P, the set of production rules, is constructed as
    follows

81
Theorem 3.4
  • For each transition
  • δ(qi, aj) = qk
  • in the transition table of M, add to P the
    production
  • qi → ajqk
  • If qk is in F, then add to P the production
  • qk → λ

82
Example
  • Construct a right-linear grammar for the language
    L = L(aab*a)
  • First, build an NFA for L

[Diagram: NFA with transitions q0 -a-> q1, q1 -a-> q2, q2 -b-> q2, q2 -a-> qf]
83
Example, cont.
[Diagram: the same NFA as on the previous slide]
P = { } initially. Add to P a rule for each
transition in the NFA:
q0 → aq1
q1 → aq2
q2 → bq2
q2 → aqf
Since qf is in F, add to P the production
qf → λ
84
Example
[Diagram: the same NFA as on the previous slide]
Now P = {q0 → aq1, q1 → aq2, q2 → bq2, q2 → aqf, qf → λ}.
You can convert to normal grammar notation:
q0 → aq1     S → aA
q1 → aq2     A → aB
q2 → bq2     B → bB
q2 → aqf     B → aC
qf → λ       C → λ
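
The construction of Theorem 3.4 is short enough to sketch directly. The representation of productions as triples (left side, terminal, next variable), with the empty string standing for λ, is an assumption of this sketch; the automaton is the one from the example above.

```python
def automaton_to_grammar(delta, accepting):
    """One production q → a p per transition δ(q, a) = p, plus q → λ
    for every accepting state q (λ written as "", with no next variable)."""
    productions = [(q, symbol, target) for (q, symbol), target in delta.items()]
    productions += [(q, "", None) for q in sorted(accepting)]
    return productions

# Transitions of the automaton for L(aab*a) from the example.
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "a"): "q2",
    ("q2", "b"): "q2",
    ("q2", "a"): "qf",
}

for left, symbol, target in automaton_to_grammar(DELTA, {"qf"}):
    right = (symbol + (target or "")) or "λ"
    print(f"{left} → {right}")
```

The printed rules are the same five productions listed in the example, before renaming the states to S, A, B, and C.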
85
Theorem 3.5
A language L is regular if and only if there
exists a left-linear grammar G such that L =
L(G). Proof: The strategy here is a little
tricky. We describe an algorithm to construct a
right-linear grammar that generates the reverse
of all the strings generated by the left-linear
grammar.
86
Theorem 3.5
Given any left-linear grammar G, we can construct
from it a right-linear grammar G′ by replacing
productions of the form A → Bv with A → v^R B
and A → v with A → v^R. Since L(G′) is
generated by a right-linear grammar, it is
regular. It can be demonstrated that L(G′) =
(L(G))^R. It can be proven that the reverse of
any regular language is also regular (see
exercise 12, section 2.3 in the Linz
text). Hence, L(G) is regular.
87
Theorem 3.6
A language L is regular if and only if there
exists a regular grammar G such that L = L(G).
Proof: Combine our definition of regular
grammars, which includes the statement "A
regular grammar is either right-linear or
left-linear," with theorems 3.4 and 3.5.
88
3 ways of specifying regular languages
Regular expressions describe regular languages.
DFAs and NFAs accept regular languages.
Regular grammars generate regular languages.