CSE 3813 Introduction to Formal Languages and Automata

Provided by: genebo

Category:

more less

Transcript and Presenter's Notes

Title: CSE 3813 Introduction to Formal Languages and Automata

1

Chapter 3
Regular languages and regular grammars
These class notes are based on material from our
textbook, An Introduction to Formal Languages and
Automata, 4th ed., by Peter Linz, published by
Jones and Bartlett Publishers, Inc., Sudbury, MA,
2006. They are intended for classroom use only
and are not a substitute for reading the textbook.

2
Operations on formal languages
Let L1 = {10} and L2 = {011, 11}.
Union: L1 ∪ L2 = {10, 011, 11}
Concatenation: L1L2 = {10011, 1011}
Kleene star: L1* = {λ, 10, 1010, 101010, ...}
Other operations: intersection, complement, difference
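As a small illustration (a sketch, not from the textbook), finite languages can be modeled as Python sets of strings. L1* is infinite, so the hypothetical helper `star_up_to` only collects strings built from at most `k` factors of L1.

```python
def union(l1, l2):
    """L1 ∪ L2: every string in either language."""
    return l1 | l2

def concat(l1, l2):
    """L1L2: every string xy with x in L1 and y in L2."""
    return {x + y for x in l1 for y in l2}

def star_up_to(l, k):
    """Finite approximation of L*: concatenations of at most k strings from L."""
    result, layer = {""}, {""}        # λ is always in L*
    for _ in range(k):
        layer = concat(layer, l)      # strings made of one more factor
        result |= layer
    return result

L1 = {"10"}
L2 = {"011", "11"}
print(sorted(union(L1, L2)))                 # ['011', '10', '11']
print(sorted(concat(L1, L2)))                # ['10011', '1011']
print(sorted(star_up_to(L1, 3), key=len))    # ['', '10', '1010', '101010']
```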
3
Definition Of Regular Languages

A regular language over an alphabet Σ is one that
contains either a single string of length 0 or 1,
or strings which can be obtained by using the
operations of union, concatenation, or Kleene star on
strings of length 0 or 1.

4
Alternative definition of regular languages
The simplest possible regular languages are the
empty set and languages consisting of a single
string that is either the empty string or has
length one. For example, if Σ = {a, b}, the
simplest languages are ∅, {λ}, {a}, and {b}. A
regular language is a language that can be built
from these simple languages by using the three
operations of union, concatenation, and Kleene
star.
5
Regular Languages correspond to Regular
Expressions

L = ∅              RE is ∅
L = {λ}            RE is λ
L = {a}            RE is a
L = L1 ∪ L2        RE is (r1 + r2)
L = L1L2           RE is (r1r2)
L = L1*            RE is (r1*)

6
Regular expressions
A useful shorthand for describing regular
languages. Compare to arithmetic expressions,
such as (x + 3)/2. An arithmetic expression is
constructed using arithmetic operators, such as
addition and division. A regular expression is
constructed using operations on languages, such
as concatenation, union, and Kleene star. The
value of an arithmetic expression is a number.
The value of a regular expression is a language.
7
Recursive definition of a regular expression
∅ is a regular expression corresponding to the
language ∅.
λ is a regular expression corresponding to the
language {λ}.
For each symbol a ∈ Σ, a is a regular expression
corresponding to the language {a}.
For any regular expressions r and s, corresponding to
the regular languages L(r) and L(s),
respectively, each of the following is a
regular expression:
(r + s) corresponds to the language L(r) ∪ L(s)
(r · s) or (rs) corresponds to the language L(r)L(s)
(r*) corresponds to the language (L(r))*
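The recursive definition maps directly onto a recursive data type. The sketch below mirrors the six cases. The class names and the function `lang` are illustrative and do not come from the text. Because L(r) can be infinite, `lang` returns only the strings of length at most `n`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:      # ∅
    pass

@dataclass(frozen=True)
class Lam:        # λ
    pass

@dataclass(frozen=True)
class Sym:        # a single symbol a ∈ Σ
    a: str

@dataclass(frozen=True)
class Union:      # (r + s)
    r: object
    s: object

@dataclass(frozen=True)
class Concat:     # (rs)
    r: object
    s: object

@dataclass(frozen=True)
class Star:       # (r*)
    r: object

def lang(e, n):
    """All strings of length <= n in L(e), following the recursive definition."""
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Lam):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Union):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Concat):
        return {x + y for x in lang(e.r, n) for y in lang(e.s, n) if len(x + y) <= n}
    if isinstance(e, Star):
        base, result = lang(e.r, n), {""}
        while True:   # keep appending factors until no new short strings appear
            bigger = result | {x + y for x in result for y in base if len(x + y) <= n}
            if bigger == result:
                return result
            result = bigger
    raise TypeError(f"not a regular expression: {e!r}")

r = Concat(Star(Sym("a")), Sym("b"))    # a*b
print(sorted(lang(r, 3), key=len))      # ['b', 'ab', 'aab']
```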
8
Examples
a* + b: {λ, a, b, aa, aaa, aaaa, aaaaa, ...}
a*ba*: {w ∈ Σ* | w has exactly one b}
(a + b)*: any string of a's and b's
(a + b)*aa(a + b)*: {w ∈ Σ* | w contains aa}
(a + b)*aa(a + b)* + (a + b)*bb(a + b)*: {w ∈ Σ* | w contains aa or bb}
(a + λ)b*: {ab^n | n ≥ 0} ∪ {b^n | n ≥ 0}
As with arithmetic expressions, there is an order of
precedence for operators, unless you change it
using parentheses. The order is: star closure
first, then concatenation, then union.
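Practical regex libraries use the same precedence. Python's standard `re` module writes union as `|` instead of `+`, because in `re` the `+` means "one or more". `re.fullmatch` checks whether the whole string matches. The quick check below assumes those conventions.

```python
import re

# Star binds tighter than concatenation: ab* means a(b*), not (ab)*.
print(bool(re.fullmatch(r"ab*", "abbb")))    # True
print(bool(re.fullmatch(r"ab*", "abab")))    # False
print(bool(re.fullmatch(r"(ab)*", "abab")))  # True

# Concatenation binds tighter than union: a|bc means a + (bc), not (a + b)c.
print(bool(re.fullmatch(r"a|bc", "ac")))     # False
print(bool(re.fullmatch(r"(a|b)c", "ac")))   # True

# (a + λ)b*: λ can be written as an empty alternative.
print(bool(re.fullmatch(r"(a|)b*", "bbb")))  # True
```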
9
More examples
All strings containing no more than two a's:
(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
All strings containing no runs of a's of length greater than two:
(b + c)*(λ + a + aa)(b + c)*((b + c)(b + c)*(λ + a + aa)(b + c)*)*
All strings in which all runs of a's have lengths that are multiples of three:
(aaa + b + c)*
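You can build confidence in expressions like these by comparing them with a direct description of the language on every short string. The sketch below does this in Python's `re` syntax, writing union as `|` and λ as an empty alternative. The length bound of 7 is an arbitrary choice, and passing checks are evidence, not a proof.

```python
import itertools
import re

SIGMA = "abc"

def all_strings(max_len):
    """Every string over SIGMA of length 0..max_len."""
    for n in range(max_len + 1):
        for t in itertools.product(SIGMA, repeat=n):
            yield "".join(t)

def runs_of_a(w):
    """Lengths of the maximal runs of a's in w."""
    return [len(run) for run in re.findall(r"a+", w)]

# (b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
at_most_two = re.compile(r"[bc]*(|a)[bc]*(|a)[bc]*")
# (b + c)*(λ + a + aa)(b + c)*((b + c)(b + c)*(λ + a + aa)(b + c)*)*
short_runs = re.compile(r"[bc]*(|a|aa)[bc]*([bc][bc]*(|a|aa)[bc]*)*")
# (aaa + b + c)*
runs_of_three = re.compile(r"(aaa|b|c)*")

for w in all_strings(7):
    assert bool(at_most_two.fullmatch(w)) == (w.count("a") <= 2)
    assert bool(short_runs.fullmatch(w)) == all(k <= 2 for k in runs_of_a(w))
    assert bool(runs_of_three.fullmatch(w)) == all(k % 3 == 0 for k in runs_of_a(w))
print("all checks passed")
```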
10
Hints for writing regular expressions
Assume Σ = {a, b, c}.
Zero or more a's: a*
One or more a's: aa*
Any string at all: (a + b + c)*
Any nonempty string: (a + b + c)(a + b + c)*
Any string that does not contain a: (b + c)*
Any string containing exactly one a: (b + c)*a(b + c)*
11
Practice

Let Σ = {a, b, c}. Give a regular expression for
the following languages:
all strings containing exactly two a's
all strings containing no more than three a's

12
Practice
Let Σ = {a, b, c}. Give a regular expression for
the following languages:
(a) all strings containing exactly two a's:
(b + c)*a(b + c)*a(b + c)*
(b) all strings containing no more than three a's:
(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
13
Practice
What languages correspond to the
following regular expressions? ab (aaa
bba) (ab)
14
More practice
Give regular expressions for the following
languages, where the alphabet is Σ = {a, b, c}:
-- all strings ending in b
-- all strings containing no more than two a's
-- all strings of even length
15
More practice
Give regular expressions for the following
languages, where the alphabet is Σ = {0, 1}:
-- all strings of one or more 0's followed by a 1
-- all strings of two or more symbols followed
by three or more 0's
-- all strings that do not end with 01
16
Do these strings match the regular expression?
Regular expression String (01
1) 0101 (a
?)b
b (ab)a
? (a b)(ab) bb
17
Big Question
Given a specific regular language L, is it safe
to assume that any subset of L is also regular?
NO! Remember that the more powerful a formalism
is, the more precisely it is able to discriminate
between strings that do and do not belong to a
language.
18
Big Question
Suppose L is described by the regular expression
a*b*. Then ab, aabb, aaabbb, etc. are strings in
this language. However, a, b, aab, aaaaaaab,
etc. are also strings in this language. The
language consisting only of the strings of the
form a^n b^n is NOT a regular language, even though
it is a subset of a*b*.
19
Accepting (review)

Let M = (Q, Σ, q0, δ, A) be an FA.
A string x ∈ Σ* is accepted by M if
δ*(q0, x) ∈ A.
The language accepted (or recognized) by M is the
set L(M) = {x ∈ Σ* | x is accepted by M}.
A language L over the alphabet Σ is regular iff
there is a finite automaton that accepts L.
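The extended transition function δ* can be computed by applying δ one symbol at a time. The minimal sketch below assumes the DFA is given as a Python dict from (state, symbol) to state. The example machine, which accepts strings over {a, b} that contain aa, is hypothetical.

```python
def delta_star(delta, q, x):
    """Extended transition function: the state reached from q after reading x."""
    for symbol in x:
        q = delta[(q, symbol)]
    return q

def accepts(dfa, x):
    """x is accepted iff δ*(q0, x) ∈ A."""
    delta, q0, accepting = dfa
    return delta_star(delta, q0, x) in accepting

# Hypothetical example: strings over {a, b} that contain aa.
delta = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q2", ("q1", "b"): "q0",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
M = (delta, "q0", {"q2"})
print(accepts(M, "baab"))   # True
print(accepts(M, "abab"))   # False
```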

20
Kleene's theorem
1) For any regular expression r that represents
language L(r), there is a finite automaton that
accepts that same language. 2) For any finite
automaton M that accepts language L(M), there is
a regular expression that represents the same
language. Therefore, the class of languages that
can be represented by regular expressions is
equivalent to the class of languages accepted by
finite automata -- the regular languages.
21
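[Figure: conversions among regular expressions, NFAs, and DFAs; regular expression to NFA is labeled "Kleene's theorem part 1", NFA to DFA is labeled "proved", and DFA to regular expression is labeled "Kleene's theorem part 2"]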
22
Theorem 3.1
1st half of Kleene's theorem: Let r be a regular
expression. Then there exists some
nondeterministic finite accepter that accepts
L(r). Consequently, L(r) is a regular
language. Proof strategy: for any regular
expression, we show how to construct an
equivalent NFA. Because regular expressions are
defined recursively, the proof is by induction.
23
Base step: Give an NFA that accepts each of the
simple or base languages, ∅, {λ}, and {a} for
each a ∈ Σ.
[Figure: NFAs for the base languages; the NFA for {a} has a single transition labeled a]
24
Inductive step: For each of the operations
(union, concatenation and Kleene star) show how
to construct an accepting NFA. Closure under
union:
[Figure: a new start state with λ-transitions to M1 and M2, and λ-transitions from M1 and M2 to a new final state]
25
Closure under concatenation
26
Closure under Kleene Star
[Figure: M1 placed between a new start state and a new final state, connected by λ-transitions]
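The base and inductive steps can be turned into a recursive construction. In the sketch below, each fragment has one start state and one final state, and λ-transitions (written as the empty string '') glue fragments together. Regular expressions are encoded as nested tuples, a representation chosen only for this example. The star case uses the common form with a new start state and a new final state connected by four λ-transitions.

```python
from itertools import count

# Regular expressions as nested tuples (an encoding chosen for this sketch):
#   ("empty",), ("lambda",), ("sym", "a"), ("union", r, s), ("concat", r, s), ("star", r)

def build_nfa(regex):
    """Return (start, final, transitions); transitions maps (state, symbol) -> set of states.
    The symbol '' stands for a λ-transition."""
    ids = count()
    trans = {}

    def edge(p, a, q):
        trans.setdefault((p, a), set()).add(q)

    def go(r):
        kind = r[0]
        if kind in ("empty", "lambda", "sym"):     # base step
            s, f = next(ids), next(ids)
            if kind == "lambda":
                edge(s, "", f)
            elif kind == "sym":
                edge(s, r[1], f)
            return s, f                             # "empty": no path from s to f
        if kind in ("union", "concat"):
            s1, f1 = go(r[1])
            s2, f2 = go(r[2])
            if kind == "concat":                    # run M1, then M2
                edge(f1, "", s2)
                return s1, f2
            s, f = next(ids), next(ids)             # new start/final around M1 and M2
            edge(s, "", s1)
            edge(s, "", s2)
            edge(f1, "", f)
            edge(f2, "", f)
            return s, f
        if kind == "star":
            s1, f1 = go(r[1])
            s, f = next(ids), next(ids)
            edge(s, "", s1)                         # enter M1
            edge(f1, "", f)                         # leave M1
            edge(s, "", f)                          # zero repetitions
            edge(f1, "", s1)                        # repeat M1
            return s, f
        raise ValueError(f"unknown node {r!r}")

    start, final = go(regex)
    return start, final, trans

def lambda_closure(states, trans):
    stack, seen = list(states), set(states)
    while stack:
        for q in trans.get((stack.pop(), ""), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def nfa_accepts(nfa, w):
    start, final, trans = nfa
    current = lambda_closure({start}, trans)
    for a in w:
        moved = set()
        for p in current:
            moved |= trans.get((p, a), set())
        current = lambda_closure(moved, trans)
    return final in current

# (a + b)*abb
r = ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))),
     ("concat", ("sym", "a"), ("concat", ("sym", "b"), ("sym", "b"))))
nfa = build_nfa(r)
print(nfa_accepts(nfa, "babb"), nfa_accepts(nfa, "abab"))   # True False
```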
27
Closure properties of Regular Languages

The union and concatenation of two regular
languages, and the Kleene star of a regular
language, will result in a regular language,
since we can write a regular expression for them.
The intersection and difference of two regular
languages, and the complement of a regular
language, will also produce a regular language.
The class of regular languages is said to be
closed under these operations. (More in Ch. 4.)

28
Exercise
Use the construction of the first half of
Kleene's theorem to construct an NFA that accepts
the language L(abaa bbaab).
29
Exercise
Use the construction of the first half of
Kleene's theorem to construct an NFA that accepts
the language L(abaa bbaab).
[Figure: start state q0 with λ-transitions to an FA accepting abaa and an FA accepting bbaab, each with a λ-transition to the final state qf]
30
Homework
Construct an NFA that accepts the language
corresponding to the regular expression ((b(a
b)a) a)
31
Theorem 3.2
Kleene's theorem part 2: Let L be a regular
language. Then there exists a regular expression
r such that L = L(r). Any language accepted
by a finite automaton can be represented by a
regular expression. Proof strategy: for any
DFA, we show how to create an equivalent regular
expression. In other words, we describe an
algorithm for converting any DFA to a regular
expression.
32
Expression diagram

A labeled directed graph (similar to a finite
state diagram) in which transitions are labeled
by regular expressions
Has a single start state with no incoming
transitions
Has a single accepting state with no outgoing
transitions
Example

33
Algorithm for converting a DFA into an equivalent
regular expression
Initial step: Change every transition labeled
a,b to (a + b). Add a single start state with an
outgoing λ-transition to the current start state,
and add a single final state with incoming
λ-transitions from every previous final
state.
Main step: Until the expression diagram has
only two states (initial state and final state),
repeat the following:
-- pick some non-start, non-final state
-- remove it from the diagram and re-label
transitions with regular expressions so that the
same language is accepted
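The elimination loop is short enough to sketch directly. In the sketch below, transition labels are strings in Python `re` syntax (union `|`, grouping `(?:...)`, λ as the empty string), so the result can be tested with `re.fullmatch`. This string encoding, the state names `__start__` and `__final__`, and the example DFA are all assumptions of the sketch. Removing a state k replaces every path p → k → q with the label R(p,k) R(k,k)* R(k,q), unioned with any existing p → q label.

```python
import itertools
import re

def dfa_to_regex(states, start, finals, delta):
    """State elimination: return a Python-`re` pattern for the DFA's language (None if empty).
    delta maps (state, symbol) -> state."""
    S, F = "__start__", "__final__"       # new start/final states (assumed not to clash)
    label = {}                             # (p, q) -> regex string on the arc p -> q

    def add(p, q, r):
        # parallel arcs are merged with union, e.g. "a,b" becomes a|b
        label[(p, q)] = r if (p, q) not in label else f"{label[(p, q)]}|{r}"

    for (p, a), q in delta.items():
        add(p, q, re.escape(a))
    add(S, start, "")                      # λ-transition into the old start state
    for f in finals:
        add(f, F, "")                      # λ-transitions out of the old final states

    for k in states:                       # remove every original state in turn
        loop = label.pop((k, k), None)
        star = f"(?:{loop})*" if loop is not None else ""
        ins = [(p, r) for (p, q), r in label.items() if q == k]
        outs = [(q, r) for (p, q), r in label.items() if p == k]
        for p, _ in ins:
            del label[(p, k)]
        for q, _ in outs:
            del label[(k, q)]
        for p, r_in in ins:                # re-route every path p -> k -> q
            for q, r_out in outs:
                add(p, q, f"(?:{r_in}){star}(?:{r_out})")
    return label.get((S, F))

# Hypothetical example DFA: strings over {a, b} that end in b.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
pattern = dfa_to_regex([0, 1], 0, {1}, delta)

for n in range(7):
    for w in map("".join, itertools.product("ab", repeat=n)):
        assert bool(re.fullmatch(pattern, w)) == w.endswith("b")
print(pattern)
```

The pattern this produces is correct but verbose. Simplifying it by hand is usually part of the exercise.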
34
The key step is removing states and
re-labeling transitions with regular expressions.
Here are some examples of how to do this.
[Figure: examples of removing a state and re-labeling the remaining transitions with regular expressions]
35
Exercise
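[Figure: a DFA with transitions labeled a, b, and a,b, and the expression diagram obtained by relabeling a,b as (a + b) and adding λ-transitions for the new start and final states]
Continue ...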
36
Exercise
[Figure: the expression diagram after the intermediate states are removed]
37
Exercise
Find a regular expression that corresponds to the
language accepted by the following DFA.
38
Exercise
[Figure: solution expression diagram]
39
Homework
Find a regular expression that corresponds to the
language accepted by the following DFA.
[Figure: DFA with states q0, q1, and q2 and transitions labeled 0 and 1]
40
Applications of regular expressions

Validation
checking that an input string is in a valid format
example 1: checking the format of an email address on a
WWW entry form
example 2: UNIX regex command
Search and selection
looking for strings that match a certain pattern
example: UNIX grep command
Tokenization
converting a sequence of characters (a string) into a
sequence of tokens (e.g., keywords, identifiers)
used in the lexical analysis phase of a compiler
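To illustrate tokenization, a lexer can be sketched as one combined regular expression with a named alternative for each token class. The token classes and the tiny input language below are hypothetical. Real lexers are often generated by tools, but the idea is the same.

```python
import re

# Hypothetical token classes, tried in order (keywords before identifiers).
TOKEN_SPEC = [
    ("KEYWORD", r"\b(?:if|then|else|while)\b"),
    ("IDENT",   r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER",  r"[0-9]+"),
    ("OP",      r"[=<>+\-*/]"),
    ("SKIP",    r"\s+"),
    ("ERROR",   r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Convert a string into a list of (token class, lexeme) pairs."""
    tokens = []
    for m in MASTER.finditer(text):
        kind = m.lastgroup                 # name of the alternative that matched
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {m.group()!r} at {m.start()}")
        tokens.append((kind, m.group()))
    return tokens

print(tokenize("if x > 2 then x = 1"))
```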

41
Grammar

A grammar G = (V, T, S, P) consists of the
following quadruple:
a set V of variables (non-terminal symbols),
including a starting symbol S ∈ V
a set T of terminals (same as an alphabet, Σ)
a start symbol S ∈ V
a set P of production rules
Example
S → aS | A
A → bA | λ

42
Derivation

Strings are derived from a grammar.
Example of a derivation:
S ⇒ aS ⇒ aaS ⇒ aaA ⇒ aabA ⇒ aab
At each step, a nonterminal is replaced by the
sentential form on the right-hand side of a rule
(a sentential form can contain nonterminals
and/or terminals).
Automata recognize languages; grammars generate
languages.
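Derivations can be explored mechanically. The sketch below assumes single-letter uppercase variables and writes λ as the empty string. It performs a breadth-first search over leftmost derivations of the example grammar (S → aS | A, A → bA | λ) and collects every terminal string up to a length bound. Pruning by the number of terminals guarantees termination only for linear grammars like this one, where each sentential form has at most one variable.

```python
from collections import deque

# S -> aS | A,  A -> bA | λ   (λ written as "")
GRAMMAR = {"S": ["aS", "A"], "A": ["bA", ""]}

def generate(grammar, start="S", max_len=3):
    """Collect every terminal string of length <= max_len reachable by leftmost derivations.
    Assumes a linear grammar, so the terminal-count bound keeps the search finite."""
    results, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if the form is all terminals
        i = next((k for k, ch in enumerate(form) if ch in grammar), None)
        if i is None:
            results.add(form)
            continue
        for rhs in grammar[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if sum(ch not in grammar for ch in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

# the strings of a*b* up to length 3
print(sorted(generate(GRAMMAR), key=lambda w: (len(w), w)))
```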

43
Context-free grammar

A grammar is said to be context-free if every
rule has a single non-terminal on the left-hand
side
This means you can apply the rule in any context.
More complicated languages (such as English)
have context-dependent rules.
A language generated from a context-free grammar
is called a context-free language.

44
The English language

In fact, it may not be possible to fully specify
the syntax of the English language.
The language grows all the time, and new words
and constructions are constantly being added.
In addition, language exists to convey
meaning; sometimes a particular meaning is better
conveyed by not using standard syntax.

45
Consider this poem by e e cummings

l(a
le
af
fa
ll
s)
one
l
iness

46
e e cummings's poetry

A Cummings poem is spare and precise, employing
a few key words eccentrically placed on the page.
Some of these words were invented by Cummings,
often by combining two common words into a new
synthesis. He also revised grammatical and
linguistic rules to suit his own purposes, using
such words as "if," "am," and "because" as nouns,
for example, or assigning his own private
meanings to words.
- http://www.poetryfoundation.org/archive/poet.html?id=81323

47
Buffalo Bill's defunct who used to ride a
watersmooth-silver stallion and break
onetwothreefourfivepigeonsjustlikethat
Jesus he was a handsome man and
what I want to know is how do you like your
blueeyed boy Mister Death
e e cummings
48
Regular grammar

A grammar is said to be right-linear if all
productions are of the form A → xB or A → x, where A
and B are variables and x is a string of
terminals. (This means that if there is a
variable on the right side of the production
rule, then it is the rightmost element in the
rule.)
A grammar is said to be left-linear if all
productions are of the form A → Bx or A → x.
A regular grammar is either right-linear or
left-linear.

49
Linear grammar

A grammar can be linear without being right- or
left-linear.
A linear grammar is a grammar in which at most
one variable can occur on the right side of any
production rule, without any restriction on the
position of the variable.
Example
S → aS | A
A → Ab | λ
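These definitions can be checked mechanically by finding where variables occur on each right-hand side. In the sketch below, grammars are Python dicts from a single uppercase variable to a list of right-hand sides, with λ written as ''. This encoding, and the `classify` function itself, are hypothetical. Any symbol that is not a key of the dict is treated as a terminal.

```python
def classify(grammar):
    """Report which linearity conditions a grammar satisfies."""
    variables = set(grammar)
    right = left = linear = True
    for rhss in grammar.values():
        for rhs in rhss:
            pos = [i for i, ch in enumerate(rhs) if ch in variables]
            if len(pos) > 1:                 # two or more variables: not linear
                right = left = linear = False
            elif pos:
                if pos[0] != len(rhs) - 1:   # variable not rightmost: violates A -> xB
                    right = False
                if pos[0] != 0:              # variable not leftmost: violates A -> Bx
                    left = False
    return {"right-linear": right, "left-linear": left,
            "regular": right or left, "linear": linear}

print(classify({"S": ["aS", "A"], "A": ["bA", ""]}))   # grammar from the Grammar slide
print(classify({"S": ["aS", "A"], "A": ["Ab", ""]}))   # linear, but neither right- nor left-linear
```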

50
Another formalism for regular languages

Every regular grammar generates a regular
language, and every regular language can be
generated by a regular grammar.
A regular grammar is a simpler, special case of a
context-free grammar.
The regular languages are a proper subset of the
context-free languages

51
Exercises

Find a regular grammar that generates the
language on Σ = {a, b} consisting of all strings
with no more than three a's.

52
Exercises

Find a regular grammar that generates the
language on Σ = {a, b} consisting of all strings
with no more than three a's:
S → bS | aA | λ
A → bA | aB | λ
B → bB | aC | λ
C → bC | λ

53
Exercises

Find a regular grammar that generates the
language consisting of even-length strings over
{a, b}.

54
Exercises

Find a regular grammar that generates the
language consisting of even-length strings over
{a, b}:
S → aaS | abS | baS | bbS | λ

55
Non-regular languages

There are non-regular languages that can be
generated by context-free grammars.
The language {a^n b^n | n ≥ 0} is generated by the
grammar S → aSb | λ
The language L = {w | n_a(w) = n_b(w)} is generated
by the grammar S → SS | λ | aSb | bSa

56
Exercise

What language is generated by the following
context-free (but not regular) grammar?
S → aSa | bSb | a | b | λ

57
Exercise

What language is generated by the following
context-free grammar?
S → aSa | bSb | a | b | λ
This is the language of palindromes (of odd and
even length) over {a, b}:
L = {w(a + b + λ)w^R | w ∈ {a, b}*}
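The rules S → aSa and S → bSb wrap a shorter derivable string in matching symbols. Membership can therefore be tested by peeling matching symbols off both ends. The small recursive sketch below (the function name `derives` is hypothetical) is checked against the definition w = w^R on all short strings.

```python
import itertools

def derives(w):
    """True iff S =>* w for S -> aSa | bSb | a | b | λ."""
    if w in ("", "a", "b"):                          # S -> λ | a | b
        return True
    if len(w) >= 2 and w[0] == w[-1] and w[0] in "ab":
        return derives(w[1:-1])                       # S -> aSa or S -> bSb
    return False

for n in range(9):
    for w in map("".join, itertools.product("ab", repeat=n)):
        assert derives(w) == (w == w[::-1])
```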

58
Programming languages

Programming languages are context-free, but not
regular.
Programming languages have the following features
that require unbounded stack memory:
matching parentheses in algebraic expressions
nested if .. then .. else statements, and nested
loops
block structure
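Matching parentheses show why a fixed, finite amount of memory is not enough: a recognizer must remember how many brackets are still open, and that number has no upper bound. The minimal sketch below uses an explicit stack, which also handles several bracket kinds. The bracket set is an assumption of the sketch.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}

def balanced(text):
    """True iff every bracket in text is properly matched and nested."""
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)                  # remember an open bracket
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False                  # close without a matching open
    return not stack                          # nothing left open

print(balanced("if (a[i] > (b + c)) { x = 1; }"))   # True
print(balanced("((a + b)"))                          # False
```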

59
Exercise

Given a grammar, you should be able to say what
language it generates.
Use set notation to define the language generated
by the following grammars:
1) S → aaSB | λ
   B → bB | b
2) S → aSbb | A
   A → cA | c

60
Exercise
S → aaSB | λ, B → bB | b
It helps to list some of the strings that can be formed:
S ⇒ aaSB ⇒ aaB ⇒ aab
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabb
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbb
S ⇒ aaSB ⇒ aaB ⇒ aabB ⇒ aabbB ⇒ aabbbB ⇒ aabbbb
S ⇒ aaSB ⇒ aaaaSBB ⇒ aaaaBB ⇒ aaaaBb ⇒ aaaabb
S ⇒ aaSB ⇒ aaaaSBB ⇒ aaaaBB ⇒ aaaaBbB ⇒ aaaaBbb ⇒ aaaabbb
What is the pattern? L = {(aa)^n b^n b* | n ≥ 0}, that is,
{(aa)^n b^m | m ≥ n ≥ 0}
61
Exercise

Given a language, you should be able to give a
grammar that generates it.
For example, give a regular (right-linear)
grammar for the language consisting of all
strings over {a, b, c} that begin with a, contain
exactly two b's, and end with cc.

62
Exercise

Give a regular (right-linear) grammar for the
language consisting of all strings over {a, b, c}
that begin with a, contain exactly two b's, and
end with cc:
S → aA
A → bB | aA | cA
B → bC | aB | cB
C → aC | cC | cD
D → c

63
Derivation
Given the grammar S → aaSB | λ, B → bB | b,
the string aab can be derived in two
different ways:
S ⇒ aaSB ⇒ aaB ⇒ aab
S ⇒ aaSB ⇒ aaSb ⇒ aab
64
Parse tree
Both derivations on the previous slide correspond
to the following parse tree.
The tree structure shows the rule that is applied
to each nonterminal, without showing the order of
rule applications. Each internal node of the
tree corresponds to a nonterminal, and the leaves
of the derivation tree represent the string of
terminals.
65
Exercise
Let G be the grammar S → abSc | A, A → cAd |
cd. 1) Give a derivation of ababccddcc. 2) Build
the parse tree for the derivation of that
string. 3) Use set notation to define L(G).
66
Leftmost (rightmost) derivation
In a leftmost derivation, the leftmost
nonterminal is replaced at each step. In a
rightmost derivation, the rightmost nonterminal
is replaced at each step. Many derivations are
neither leftmost nor rightmost. If there is a
single parse tree, there is also a single
leftmost derivation.
67
Ambiguity
A grammar is ambiguous if it can generate a
string with two possible parse trees. (A string
has more than one parse tree if and only if it
has more than one leftmost derivation.) English
can be ambiguous. Example: Disabled fly to see
Carter.
68
Example
Given the following grammar S ? S S S S
1 0 The string 1 1 0 has two different
parse trees.
69
Equivalent grammars
Here is a non-ambiguous grammar that generates
the same language. S ? S A 1 0 A ? A B
1 0 B ? 1 0 Two grammars that generate
the same language are said to be equivalent. To
make parsing easier, we prefer grammars that are
not ambiguous.
70
Dangling else
x = 3; if x > 2 then if x > 4 then x = 1
else x = 5. So, what is x?
71
Ambiguous vs. unambiguous
Ambiguous grammar:
<statement> → IF <expression> THEN <statement>
  | IF <expression> THEN <statement> ELSE <statement>
  | <otherstatement>
Unambiguous grammar:
<statement> → <st1> | <st2>
<st1> → IF <expression> THEN <st1> ELSE <st1>
  | <otherstatement>
<st2> → IF <expression> THEN <statement>
  | IF <expression> THEN <st1> ELSE <st2>
72
Exercise
Show that the following grammar is ambiguous:
S → AB | aaB, A → a | Aa, B → b.
Construct an equivalent grammar that is unambiguous.
73
Parsing

In practical applications, it is usually not
enough to decide whether a string belongs to a
language. It is also important to know how to
derive the string from the grammar.
Parsing uncovers the syntactical structure of a
string, which is represented by a parse tree.
(The syntactical structure is important for
assigning semantics to the string -- for example,
if it is a program).

74
Parsing
Let G be a context-free grammar for C. Let the
string w be a C program. One thing a compiler
does -- in particular, the part of the compiler
called the parser -- is determine whether w
is a syntactically correct C program. It also
constructs a parse tree for the program that is
used in code generation. There are many
sophisticated and efficient algorithms for
parsing. You may study them in more
advanced classes (for example, on compilers) or
come across them on your own. We will not discuss
them in this class.
75
Theorem 3.3

Every language generated by a right-linear
grammar is regular.
Proof
Specify a procedure for automatically
constructing an NFA that mimics the derivations
of a right-linear grammar.

76
Theorem 3.3

Justification
The sentential forms produced by a right-linear
grammar (other than the final string of terminals)
have exactly one variable, which occurs as the
rightmost symbol.
Assume that our grammar has a production rule
D → dE
and that, during the derivation of a string,
there is a step wcD ⇒ wcdE.
We can construct an NFA which has states D and E,
and an arc labeled d from D to E.
NFAs can be converted to DFAs.
All languages accepted by DFAs are regular.

77
Theorem 3.3

Construction
For each variable Vi in the grammar there will be
a state in the automaton labeled Vi.
The initial state of the automaton will be
labeled V0 and will correspond to the S variable
in the grammar.
For each production rule Vi → a1a2...amVj, the
automaton will have transitions such that
δ*(Vi, a1a2...am) = Vj.
For each production rule Vi → a1a2...am, the
automaton will have transitions such that
δ*(Vi, a1a2...am) = Vfinal.
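The construction can be sketched in a few lines. Each production is written as (left variable, terminal string, right variable or None). A production with several terminals gets a chain of fresh intermediate states, named `_t0`, `_t1`, ... here purely for illustration. A production whose terminal string is empty becomes a λ-transition. The final state is called `Vf`, as in the example on the next slide, which this sketch also runs.

```python
import itertools
import re

def grammar_to_nfa(productions, start="V0", final="Vf"):
    """Right-linear productions (A, x, B) for A -> xB, or (A, x, None) for A -> x."""
    trans = {}                                  # (state, symbol) -> set of states; '' means λ
    fresh = itertools.count()

    def add(p, a, q):
        trans.setdefault((p, a), set()).add(q)

    for lhs, word, var in productions:
        target = var if var is not None else final
        if word == "":
            add(lhs, "", target)                # A -> B or A -> λ
            continue
        p = lhs
        for i, a in enumerate(word):            # spell out a1 a2 ... am
            q = target if i == len(word) - 1 else f"_t{next(fresh)}"
            add(p, a, q)
            p = q
    return start, final, trans

def closure(states, trans):
    stack, seen = list(states), set(states)
    while stack:
        for q in trans.get((stack.pop(), ""), ()):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return seen

def accepts(nfa, w):
    start, final, trans = nfa
    current = closure({start}, trans)
    for a in w:
        current = closure({q for p in current for q in trans.get((p, a), ())}, trans)
    return final in current

# V0 -> aV1,  V1 -> abV0 | b   (generates (aab)*ab)
nfa = grammar_to_nfa([("V0", "a", "V1"), ("V1", "ab", "V0"), ("V1", "b", None)])
for n in range(8):
    for w in map("".join, itertools.product("ab", repeat=n)):
        assert accepts(nfa, w) == bool(re.fullmatch(r"(aab)*ab", w))
```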

78
Theorem 3.3
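Construct an NFA that accepts the language
generated by the grammar S → aA, A → abS | b.
Convert to V0 → aV1, V1 → abV0 | b.
[Figure: NFA with states V0, V1, and final state Vf; V0 goes to V1 on a, V1 returns to V0 on ab through an intermediate state, and V1 goes to Vf on b]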
79
Theorem 3.4

Every regular language can be generated by a
right-linear grammar.
Proof
Generate a DFA for the language.
Specify a procedure for automatically
constructing a right-linear grammar from the DFA.

80
Theorem 3.4

Given a regular language L, let M = (Q, Σ, δ, q0,
F) be a DFA that accepts L. Let Q = {q0, q1, ...,
qn} and Σ = {a1, a2, ..., am}.
Construct the grammar G = (V, T, S, P) with
V = {q0, q1, ..., qn}
T = {a1, a2, ..., am}
S = q0
P = ∅ initially.
P, the set of production rules, is constructed as
follows:

81
Theorem 3.4

For each transition
δ(qi, aj) = qk
in the transition table of M, add to P the
production
qi → ajqk
If qk is in F, then add to P the production
qk → λ
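The production rules can be generated directly from the transition table. In the sketch below, productions are tuples (lhs, terminal, variable) for qi → aj qk and (q, '', None) for q → λ; this encoding is an assumption. The example transitions are those of the machine in the next example. The sketch also adds a guard that the slide's procedure does not state: if the start state is final, it adds q0 → λ, in case no transition enters q0.

```python
def dfa_to_grammar(delta, start, finals):
    """Build right-linear productions from a transition table delta[(qi, a)] = qk."""
    productions = set()
    for (qi, a), qk in delta.items():
        productions.add((qi, a, qk))            # δ(qi, a) = qk  gives  qi -> a qk
        if qk in finals:
            productions.add((qk, "", None))     # qk ∈ F  gives  qk -> λ
    if start in finals:
        # Extra guard added in this sketch: a final start state needs q0 -> λ.
        productions.add((start, "", None))
    return start, productions

def show(p):
    lhs, a, var = p
    return f"{lhs} -> {a}{var}" if var is not None else f"{lhs} -> λ"

delta = {("q0", "a"): "q1", ("q1", "a"): "q2", ("q2", "b"): "q2", ("q2", "a"): "qf"}
S, P = dfa_to_grammar(delta, "q0", {"qf"})
for p in sorted(P, key=show):
    print(show(p))
```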

82
Example

Construct a right-linear grammar for the language
L = L(aab*a)
First, build an NFA for L:

[Figure: NFA with q0 → q1 on a, q1 → q2 on a, a loop on q2 labeled b, and q2 → qf on a]
83
Example, cont.
P = ∅ initially. Add to P a rule for each
transition in the NFA: q0 → aq1, q1 → aq2,
q2 → bq2, q2 → aqf. Since qf is in F, add to P the
production qf → λ.
84
Example
Now P = {q0 → aq1, q1 → aq2, q2 → bq2, q2 → aqf, qf → λ}.
You can convert to normal grammar notation:
q0 → aq1     S → aA
q1 → aq2     A → aB
q2 → bq2     B → bB
q2 → aqf     B → aC
qf → λ       C → λ
85
Theorem 3.5
A language L is regular if and only if there
exists a left-linear grammar G such that L =
L(G). Proof: The strategy here is a little
tricky. We describe an algorithm to construct a
right-linear grammar that generates the reverse
of all the strings generated by the left-linear
grammar.
86
Theorem 3.5
Given any left-linear grammar G, we can construct
from it a right-linear grammar Ĝ by replacing
productions of the form A → Bv with A → v^R B,
and A → v with A → v^R. Since L(Ĝ) is
generated by a right-linear grammar, it is
regular. It can be demonstrated that L(Ĝ) =
(L(G))^R. It can be proven that the reverse of
any regular language is also regular (see
exercise 12, section 2.3 in the Linz
text). Hence, L = L(G) = (L(Ĝ))^R is regular.
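The rewriting step is mechanical. In the sketch below, a left-linear production A → Bv is encoded as (A, B, v) and A → v as (A, None, v). The output uses (A, x, B) for the right-linear production A → xB. Both encodings are assumptions. The small generator `right_linear_strings` (hypothetical) then checks, on one example, that the new grammar generates exactly the reversed strings.

```python
def left_to_right_linear(productions):
    """A -> Bv becomes A -> v^R B; A -> v becomes A -> v^R."""
    return {(lhs, v[::-1], var) for lhs, var, v in productions}

def right_linear_strings(prods, start, max_len):
    """Terminal strings of length <= max_len generated by right-linear productions (A, x, B)."""
    out, frontier = set(), {("", start)}
    seen = set(frontier)
    while frontier:
        nxt = set()
        for prefix, var in frontier:
            for lhs, x, b in prods:
                if lhs != var or len(prefix + x) > max_len:
                    continue
                if b is None:
                    out.add(prefix + x)
                elif (prefix + x, b) not in seen:
                    seen.add((prefix + x, b))
                    nxt.add((prefix + x, b))
        frontier = nxt
    return out

# S -> Sab | c generates c(ab)*; the new grammar should generate its reverse, (ba)*c.
G = {("S", "S", "ab"), ("S", None, "c")}
H = left_to_right_linear(G)
expected = {("c" + "ab" * k)[::-1] for k in range(4)}   # reverses of c, cab, cabab, cababab
assert right_linear_strings(H, "S", 7) == expected
```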
87
Theorem 3.6
A language L is regular if and only if there
exists a regular grammar G such that L = L(G).
Proof: Combine our definition of regular
grammars, which includes the statement "A
regular grammar is either right-linear or
left-linear," with Theorems 3.3, 3.4, and 3.5.
88
3 ways of specifying regular languages
[Figure: regular expressions describe, DFAs and NFAs accept, and regular grammars generate the regular languages]
