Regular Expression - PowerPoint PPT Presentation

About This Presentation

Title:

Regular Expression



Slides: 20

Provided by: scie210

Learn more at: http://www.cs.rpi.edu


Transcript and Presenter's Notes


1
Regular Expression
While studying formal languages, we often
express languages in set notation,
such as $\{a^ib^i \mid i > 0\}$. Such set notation is
practical only when the language property is
simple enough to describe. However, every regular
language can be expressed succinctly by
a regular expression, which is defined as
follows.

Regular expressions over an alphabet $\Sigma$ are
defined recursively as follows.
(1) $\emptyset$, which denotes the empty set, is a
regular expression.
(2) $\varepsilon$ is a regular expression and denotes the
set $\{\varepsilon\}$.
(3) For every $a \in \Sigma$, a is a regular
expression and denotes the set $\{a\}$.
(4) If r and s are regular expressions that
denote the sets R and S, respectively,
then (r + s), (rs), and (r*) are regular
expressions that denote, respectively, the
sets $R \cup S$, $RS$, and $R^*$.
We may omit parentheses from a regular expression
if it still denotes the same set under the convention
that star has higher precedence than
concatenation and +, and that concatenation has
higher precedence than +. For example, a + bc*
means (a + (b(c*))). For a regular
expression r, L(r) denotes the set of
strings expressed by r.
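Because the definition is recursive, it maps directly onto a small syntax tree. The Python sketch below is only an illustration; the class names (Empty, Eps, Sym, Alt, Cat, Star) and the function lang are hypothetical, not from the slides. Each class mirrors one clause of the definition, and lang(r, n) returns the strings of L(r) of length at most n, since L(r) itself may be infinite.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Empty:          # clause (1): the empty set
    pass

@dataclass(frozen=True)
class Eps:            # clause (2): epsilon, denoting {""}
    pass

@dataclass(frozen=True)
class Sym:            # clause (3): a single symbol a, denoting {a}
    a: str

@dataclass(frozen=True)
class Alt:            # clause (4): (r + s) denotes R ∪ S
    r: Regex
    s: Regex

@dataclass(frozen=True)
class Cat:            # clause (4): (rs) denotes RS
    r: Regex
    s: Regex

@dataclass(frozen=True)
class Star:           # clause (4): (r*) denotes R*
    r: Regex

Regex = Union[Empty, Eps, Sym, Alt, Cat, Star]

def lang(e: Regex, n: int) -> set[str]:
    """Strings of L(e) whose length is at most n."""
    if isinstance(e, Empty):
        return set()
    if isinstance(e, Eps):
        return {""}
    if isinstance(e, Sym):
        return {e.a} if len(e.a) <= n else set()
    if isinstance(e, Alt):
        return lang(e.r, n) | lang(e.s, n)
    if isinstance(e, Cat):
        left, right = lang(e.r, n), lang(e.s, n)
        return {x + y for x in left for y in right if len(x + y) <= n}
    if isinstance(e, Star):
        base = lang(e.r, n) - {""}          # epsilon adds nothing new under star
        result, frontier = {""}, {""}
        while frontier:                     # append one more piece until no new strings
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {e!r}")
```

For example, lang(Cat(Star(Sym("a")), Star(Sym("b"))), 2) returns the six strings "", "a", "b", "aa", "ab", and "bb", the members of L(a*b*) of length at most 2.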

2
Regular expression (contd)
For example, the regular language $\{a^ib^j \mid i, j \ge 0\}$
can be expressed by the regular expression a*b*, and the
regular language $\{xaaybbz \mid x, y, z \in \{a, b\}^*\}$
can be expressed as (a+b)*aa(a+b)*bb(a+b)*. We
can easily prove that a*b* is a regular
expression according to the definition. By part (3),
a and b are regular expressions denoting the sets
$\{a\}$ and $\{b\}$, respectively, so by part (4)
a* and b* are regular expressions, which denote,
respectively, the sets $\{a\}^*$ and $\{b\}^*$. Since
a* and b* are regular expressions, their
concatenation a*b* is also a regular expression by
part (4); it denotes the set $\{a\}^*\{b\}^*$, which
is equal to $\{a^ib^j \mid i, j \ge 0\}$. By a similar
argument we can show that (a+b)*aa(a+b)*bb(a+b)*
is a regular expression that denotes the regular
language above. Later we will see that every regular
language can be expressed by a regular
expression, and that if a language is expressible by a
regular expression, then that language is regular.
3
Chomsky Hierarchy of Languages and Related Models
We have studied four types of formal grammars
and their languages, four different
computational models that recognize those
languages, and other related models,
such as L-systems, syntax flow graphs, and regular
expressions. Now we will study their relationships
more closely. The table on the next
page summarizes the relationships among these
models. This hierarchy, called the Chomsky
hierarchy (after Noam Chomsky, who defined the
classes of languages), is one of the most
significant achievements in computer science. In
the table, vertical relationships denote
proper containment and horizontal
relationships denote characterizations.
For example, the class of context-free languages
properly contains the regular languages, finite state
machines can recognize only regular languages,
and the languages recognized by finite state
machines can be expressed by regular expressions.
Many powerful models have been introduced (for
example, the ones shown at the upper right corner)
that turned out to be computationally equivalent
to Turing machines, whose languages are also
called recursively enumerable sets.
4
The Chomsky Hierarchy

5
Characterization Theorem among Regular Grammars,
FAs and Regular Expressions

We prove the characterization (i.e., the
horizontal relationship) only at the level of regular
languages; later we will prove the vertical relationships
for the lower two levels only.
Theorem. (1) A language L is regular if and only
if it is accepted by an FA M. (2) A language L
is expressible by a regular
expression if and only if L is accepted by an FA
M.
Proof of (1-a). If L is regular, then there is an
FA M that accepts L. We construct M from
a regular grammar G whose language is L.
Without loss of generality, assume G has only
production rules of the form $A \to xB$ or $A \to x$,
where x is $\varepsilon$ or a single terminal symbol, i.e.,
$|x| \le 1$. Otherwise, we can easily convert the
rules into these restricted forms without
affecting the language of the grammar. For
example, a rule $A \to abbB$ in a grammar
can be replaced by the rules
$A \to aB_1$, $B_1 \to bB_2$, $B_2 \to bB$
without changing the language, where the $B_i$
are new nonterminal symbols.
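The rule-splitting step generalizes to any rule $A \to wB$ or $A \to w$ with $|w| \ge 2$. Below is a minimal Python sketch, assuming rules are stored as (lhs, word, rhs) triples with rhs = None for $A \to w$; the function split_rule and the fresh-name generator are hypothetical helpers, not part of the slides.

```python
from itertools import count

def split_rule(lhs, word, rhs, fresh):
    """Replace A -> wB (or A -> w when rhs is None) by rules whose terminal part has length <= 1.

    `fresh` is an iterator yielding new nonterminal names not used in the grammar.
    """
    if len(word) <= 1:
        return [(lhs, word, rhs)]            # already in the restricted form
    rules, current = [], lhs
    for ch in word[:-1]:                     # one new nonterminal per symbol except the last
        nxt = next(fresh)
        rules.append((current, ch, nxt))
        current = nxt
    rules.append((current, word[-1], rhs))   # the last symbol leads to the original B
    return rules

fresh = (f"B{i}" for i in count(1))          # hypothetical naming scheme B1, B2, ...
for lhs, x, rhs in split_rule("A", "abb", "B", fresh):
    print(f"{lhs} -> {x}{rhs or ''}")
# A -> aB1
# B1 -> bB2
# B2 -> bB
```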

6
Proof of Characterization Theorem (contd)
Suppose the grammar is given as $G = (V_T, V_N, P, S)$.
We construct an FA M from G using the rules
shown below. Let $A, B \in V_N$ and $a \in V_T \cup \{\varepsilon\}$.
We can prove that L(G) = L(M), i.e., the language
accepted by M is exactly the language generated
by the grammar G.
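The rule table itself is not reproduced in this transcript. The sketch below follows the usual construction, which may differ in details from the slide: nonterminals become states, a rule $A \to aB$ becomes a transition from A to B on a, each terminating rule $A \to a$ leads to one extra accepting state (named qf here, an assumption), and S is the start state. Since a may be $\varepsilon$ (written "" in the code), the simulation follows ε-moves. The function names and the example grammar ($S \to aS \mid bA$, $A \to bA \mid \varepsilon$, which generates a*bb*) are hypothetical.

```python
from collections import defaultdict

FINAL = "qf"  # hypothetical extra accepting state; must not clash with a nonterminal

def grammar_to_nfa(rules, start):
    """rules: (A, x, B) for A -> xB and (A, x, None) for A -> x, with |x| <= 1."""
    delta = defaultdict(set)          # (state, symbol) -> set of next states; "" is ε
    for lhs, x, rhs in rules:
        delta[(lhs, x)].add(rhs if rhs is not None else FINAL)
    return delta, start, {FINAL}

def eps_closure(delta, states):
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, w):
    delta, start, finals = nfa
    current = eps_closure(delta, {start})
    for ch in w:
        step = set().union(*(delta.get((q, ch), set()) for q in current))
        current = eps_closure(delta, step)
    return bool(current & finals)

# Hypothetical grammar: S -> aS | bA, A -> bA | ε, which generates a*bb*.
rules = [("S", "a", "S"), ("S", "b", "A"), ("A", "b", "A"), ("A", "", None)]
m = grammar_to_nfa(rules, "S")
print([w for w in ["", "b", "ab", "aabbb", "ba"] if accepts(m, w)])  # ['b', 'ab', 'aabbb']
```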
7
Proof of Characterization Theorem (contd)
Proof of (1-b). If L is the language accepted by
an FA M, then there is a regular grammar G that
generates L. Let $M = (Q, \Sigma, \delta, q_0, F)$.
Construct a regular grammar G from M according
to the rules shown below, where $A, B \in Q$ and
$a, b \in \Sigma \cup \{\varepsilon\}$.
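Again the rule table is missing from the transcript. The sketch assumes the usual rules, $A \to aB$ for each transition $\delta(A, a) = B$ and $A \to \varepsilon$ for each accepting state A, with the FA given as a deterministic transition table. The function generate enumerates the words the grammar derives up to a length bound, so L(G) = L(M) can be spot-checked on short strings. The DFA (strings with an even number of a's) and all function names are hypothetical.

```python
from itertools import product

def fa_to_grammar(delta, finals):
    """delta(A, a) = B gives A -> aB; every accepting state A gives A -> ε.

    Productions are triples (A, a, B), with B = None for A -> ε.
    """
    prods = [(p, a, q) for (p, a), q in sorted(delta.items())]
    prods += [(p, "", None) for p in sorted(finals)]
    return prods

def generate(prods, start, n):
    """Words of length <= n derivable from start (assumes |a| = 1 in each A -> aB)."""
    words, forms = set(), [("", start)]   # each form: terminal prefix + one nonterminal
    while forms:
        prefix, nt = forms.pop()
        for lhs, a, rhs in prods:
            if lhs != nt or len(prefix + a) > n:
                continue
            if rhs is None:
                words.add(prefix + a)
            else:
                forms.append((prefix + a, rhs))
    return words

def run(delta, start, finals, w):
    """Deterministic run of the FA; a missing transition rejects."""
    q = start
    for ch in w:
        q = delta.get((q, ch))
        if q is None:
            return False
    return q in finals

# Hypothetical DFA over {a, b}: E = even number of a's so far (start, accepting), O = odd.
delta = {("E", "a"): "O", ("E", "b"): "E", ("O", "a"): "E", ("O", "b"): "O"}
prods = fa_to_grammar(delta, {"E"})
for lhs, a, rhs in prods:
    print(f"{lhs} -> {a}{rhs}" if rhs else f"{lhs} -> ε")
# prints E -> aO, E -> bE, O -> aE, O -> bO, E -> ε (one per line)

words_up_to_5 = {"".join(t) for n in range(6) for t in product("ab", repeat=n)}
assert generate(prods, "E", 5) == {w for w in words_up_to_5 if run(delta, "E", {"E"}, w)}
```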
8
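Characterization Theorem (examples)
Example 1. (Regular grammar → FA)
$S \to aS \mid bbcB$, $B \to bA \mid a$, $A \to aS \mid bB \mid \varepsilon$
Example 2. (FA → regular grammar)
Transform to grammar:
$S \to aS \mid aA$, $A \to bB$, $B \to bB \mid bS \mid aD \mid \varepsilon$,
$D \to aC$, $C \to aB \mid cE$, $E \to \varepsilon$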
9
Proof of Characterization Theorem (contd)
Proof of (2-a). If a language L is
expressible by a regular expression,
then L is accepted by an FA M.
Following the definition of regular expressions,
we show how to construct an FA for a given
regular expression. (This is a proof by induction.)
Assume that the alphabet is $\Sigma$.
1. If the regular expression is $\emptyset$, $\varepsilon$, or $a \in \Sigma$,
which respectively denote the empty set, $\{\varepsilon\}$,
and $\{a\}$, then for each case we construct the
following FA.
2. Suppose that for regular expressions $r_1$ and
$r_2$ we have constructed FAs $M_1$ and $M_2$, which
recognize the languages expressed by $r_1$ and $r_2$,
respectively. Then we can construct FAs $M_{1+2}$, $M_{12}$,
and $M_1^*$ that respectively recognize the
languages expressed by the regular expressions $r_1 +
r_2$, $r_1r_2$, and $(r_1)^*$, as follows.
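The figures for these constructions are not included in the transcript. One common way to realize them is Thompson's construction, sketched below in Python under the assumption that every automaton has a single start state and a single accepting state and may use ε-moves (written ""); the slides' figures may wire the states differently. Each function builds the automaton for one clause of the induction, and accepts simulates the result. All names are hypothetical.

```python
from itertools import count

_fresh = count()   # source of unique state numbers

# An automaton is (start, accept, edges); each edge is (p, symbol, q), "" meaning ε.

def empty_set():                 # the empty set: the accepting state is unreachable
    return next(_fresh), next(_fresh), []

def symbol(a):                   # a single symbol a, or ε when a == ""
    s, f = next(_fresh), next(_fresh)
    return s, f, [(s, a, f)]

def union(m1, m2):               # M_{1+2} for r1 + r2
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    s, f = next(_fresh), next(_fresh)
    return s, f, e1 + e2 + [(s, "", s1), (s, "", s2), (f1, "", f), (f2, "", f)]

def concat(m1, m2):              # M_{12} for r1 r2
    (s1, f1, e1), (s2, f2, e2) = m1, m2
    return s1, f2, e1 + e2 + [(f1, "", s2)]

def star(m1):                    # M_1* for (r1)*
    s1, f1, e1 = m1
    s, f = next(_fresh), next(_fresh)
    return s, f, e1 + [(s, "", s1), (s, "", f), (f1, "", s1), (f1, "", f)]

def accepts(m, w):
    start, accept, edges = m

    def closure(states):         # follow ε-edges until nothing new is reached
        stack, seen = list(states), set(states)
        while stack:
            p = stack.pop()
            for src, sym, dst in edges:
                if src == p and sym == "" and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for ch in w:
        current = closure({dst for src, sym, dst in edges
                           if src in current and sym == ch})
    return accept in current

# (a+b)*abb, built bottom-up in the same way the expression is parsed.
a, b = symbol("a"), symbol("b")
m = concat(concat(concat(star(union(a, b)), symbol("a")), symbol("b")), symbol("b"))
print([w for w in ["abb", "aabb", "babb", "ab", ""] if accepts(m, w)])  # ['abb', 'aabb', 'babb']
```

Each sub-automaton is used exactly once; reusing one fragment in two places would share its states and could change the language.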
10
Proof of Characterization Theorem (contd)
11
Proof of Characterization Theorem (contd)
Proof of (2-b). If L is a language accepted
by an FA M, then L is expressible by
a regular expression.
Definition (Generalized state transition graph).
If, for every string expressed by a regular expression
r, an FA M moves from state p to
state q, we write $\delta(p, r) = q$ and draw the state
transition as Figure (a) shows.
Figure (b) is an example.
The state transition graph of M can be
considered a special case of a generalized state
transition graph in which each edge label is
a regular expression expressing one string of
length 1, or of length 0 in the case of an $\varepsilon$-transition.
Generalizing $\delta$ further, for a path label
$w = r_1r_2\cdots r_i$ (i.e., a concatenated sequence of regular
expressions), let $\delta(p, w) = q$ denote the
sequence of transitions along a path labeled
with the regular expressions $r_1, r_2, \ldots, r_i$.
12
For a generalized state transition graph G, let
L(G) be the set of strings defined as follows,
where $q_0$ is the start state and F is the set of
accepting states:
$L(G) = \{x \mid x \in L(w),\ w \text{ is a path label such that } \delta(q_0, w) = q_f \in F\}$
Clearly L(G) = L(M). Given a generalized state
transition graph G of an FA, we can eliminate a
state from G and transform G into another
generalized state transition graph G' such
that L(G) = L(G'). Suppose that q is a
non-accepting state, other than the start state, in a
state transition graph G. Suppose q has a self-loop
and lies on a path between its two neighboring
states r and s, as shown in figure (a) below. (Dotted
arrows indicate other possible transitions.) State q
can be eliminated and generalized transitions can be
added without changing the language of the
automaton, as figure (b) shows.
(a) G
(b) G'
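Figure (b) is not reproduced here. In the standard form of this step, which the formula below assumes, $e_{rq}$ labels the edge from r to q, $e_{qq}$ the self-loop on q, $e_{qs}$ the edge from q to s, and $e_{rs}$ an existing direct edge from r to s; these symbol names are introduced only for illustration. The new label $e'_{rs}$ is computed for every pair of a predecessor r and a successor s of q, including r = s, which yields a self-loop on r.

```latex
e'_{rs} \;=\; e_{rs} \;+\; e_{rq}\,(e_{qq})^{*}\,e_{qs}
% If q has no self-loop, e_{qq} = \emptyset and \emptyset^{*} = \varepsilon,
% so the rule reduces to e'_{rs} = e_{rs} + e_{rq} e_{qs}.
% If r and s have no direct edge, e_{rs} = \emptyset and the first term drops out.
```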
13
Now we give an example of transforming a
state transition graph G into a regular
expression using the above technique. Consider an
FA whose state transition graph is shown in
figure (a) below. Clearly, if an automaton has
$k \ge 1$ accepting states, then the language of the
automaton is the union of the languages accepted
by the k accepting states (the language accepted by
a state is the set of strings that lead from the start
state to that state). So we compute a regular
expression $r_i$ for the language $L_i$ accepted by
each of the k accepting states, and obtain the
regular expression for the language of the
automaton as $r = r_1 + r_2 + \cdots + r_k$. For
example, the language accepted by the automaton
shown below is the union of the languages
accepted by states 0 and 4.
14
For this example, we first compute the regular
expression for the language accepted by state 4
by changing state 0 into a non-accepting state.
Keeping the start state and the accepting state,
we eliminate all other states, one at a time.
Eliminating state 2 gives the generalized
state transition graph shown in (b). We could
have eliminated state 1 or 3 first. In general, it is
better to choose a state whose elimination does not
induce too many new links. Before eliminating state 3,
we merge links that have the same origin and
destination using the + operator, and get figure
(c) below.
15
Eliminating state 3 gives the graph shown in
figure (d).
16
Finally, eliminating state 1, we get the graph in
figure (e). Notice that the regular expression b + bb
on the self-loop of state 4 has been simplified
to b, because looping on b or bb is equivalent to
looping on b; that is, (b + bb)* = b*.
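The simplification of the self-loop label rests on the identity (b + bb)* = b*: every string in b* is a sequence of b's, and any sequence of b's and bb's is again a string of b's. The short Python check below (the helper name same_on_short_strings is hypothetical) compares the two expressions with the standard re module on all strings over {a, b} up to length 8; it illustrates the identity but does not prove it.

```python
import re
from itertools import product

loop_b_or_bb = re.compile(r"(b|bb)*")
loop_b = re.compile(r"b*")

def same_on_short_strings(p, q, alphabet="ab", max_len=8):
    """True if p and q fully match exactly the same strings of length <= max_len."""
    return all(bool(p.fullmatch(s)) == bool(q.fullmatch(s))
               for n in range(max_len + 1)
               for s in map("".join, product(alphabet, repeat=n)))

print(same_on_short_strings(loop_b_or_bb, loop_b))   # True
```

Dropping the single b, by contrast, changes the language: (bb)* differs from b* already on the string b.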
17
By merging edges that have the same origin and
destination, we get the final transition graph
(f), from which we can construct a regular
expression $r_4$ whose language is exactly the
language accepted by state 4.
18
In general, suppose a generalized transition
graph with the start state and one accepting state
is given, with each edge labeled by a regular
expression, as shown in figure (g) below. Then the
regular expression $r_2$ shown in the figure
expresses the language accepted by the automaton.
By substituting each $r_{ij}$ in the expression in figure
(g) with the corresponding regular expression from
figure (f), we get the regular expression $r_4$ for
the language accepted by state 4.
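Figure (g) is not included in the transcript. Assuming the start state is numbered 1, the accepting state is numbered 2, and $r_{ij}$ labels the edge from state i to state j (an absent edge is labeled $\emptyset$), the standard expression for such a two-state graph is the following; it is given as an aid and may be written differently in the figure.

```latex
r_2 \;=\; \bigl(r_{11} + r_{12}\, r_{22}^{*}\, r_{21}\bigr)^{*}\; r_{12}\; r_{22}^{*}
```

Read from left to right: return to the start state any number of times, either directly ($r_{11}$) or by visiting the accepting state and coming back ($r_{12}r_{22}^{*}r_{21}$); then move to the accepting state ($r_{12}$) and loop there ($r_{22}^{*}$).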
19
Now, to construct a regular expression for the
language accepted by the other accepting state,
which is the start state, we can start with
figure (f), changing the start state back into an
accepting state and state 4 into a non-accepting
state, as shown in figure (h). This is the general
case shown in figure (i), whose regular
expression is given as $r_1$ in the figure.
Substituting the corresponding regular expressions
from figure (h), we get a regular expression $r_0$
that denotes the language accepted by state 0.
Finally, we get the regular expression $r = r_0 + r_4$,
which denotes the language accepted by automaton
M.
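The whole procedure can be sketched in Python: merge parallel edges with +, eliminate every state except the start state and one accepting state, read off the two-state expression (or, when the accepting state is the start state, the expression $r_{11}^*$ for its remaining self-loop), and take the union over all accepting states. This is an illustrative implementation, not the slides' own. Labels are Python re strings with | for +, None stands for $\emptyset$, and the test automaton, a hypothetical DFA that reads a binary number and tracks its value mod 3 with accepting states 0 and 2, stands in for the figure's automaton, which is not reproduced here.

```python
import re
from itertools import product

# Edge labels are regex strings in Python's re syntax, where "|" plays the role of "+".
# None stands for the empty set (no edge); "" stands for ε.

def alt(x, y):                      # x + y
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def cat(*parts):                    # concatenation; anything concatenated with ∅ is ∅
    return None if any(p is None for p in parts) else "".join(parts)

def star(x):                        # x*, using ∅* = ε* = ε
    return "" if x in (None, "") else f"(?:{x})*"

def regex_for(edges, states, start, final):
    """Expression for the strings leading from start to final, by state elimination."""
    g = dict(edges)                 # (p, q) -> label of the merged edge p -> q
    for q in states:
        if q in (start, final):
            continue
        loop = g.pop((q, q), None)
        ins = [(p, lab) for (p, t), lab in g.items() if t == q]
        outs = [(s, lab) for (t, s), lab in g.items() if t == q]
        for p, lab_in in ins:       # bypass q: e'_ps = e_ps + e_pq (e_qq)* e_qs
            for s, lab_out in outs:
                g[(p, s)] = alt(g.get((p, s)), cat(lab_in, star(loop), lab_out))
        g = {k: v for k, v in g.items() if q not in k}
    if final == start:              # only a self-loop r11 remains: r = r11*
        return star(g.get((start, start)))
    r11, r12 = g.get((start, start)), g.get((start, final))
    r21, r22 = g.get((final, start)), g.get((final, final))
    return cat(star(alt(r11, cat(r12, star(r22), r21))), r12, star(r22))

def fa_to_regex(delta, states, start, finals):
    edges = {}
    for (p, a), q in delta.items():  # merge parallel edges with +
        edges[(p, q)] = alt(edges.get((p, q)), re.escape(a))
    r = None
    for f in sorted(finals):         # r = r_f1 + r_f2 + ... over the accepting states
        r = alt(r, regex_for(edges, states, start, f))
    return r

# Hypothetical DFA: reads a binary number; the state is its value mod 3.
states = [0, 1, 2]
delta = {(q, c): (2 * q + int(c)) % 3 for q in states for c in "01"}
finals = {0, 2}
pattern = re.compile(fa_to_regex(delta, states, 0, finals))

def run(w):
    q = 0
    for c in w:
        q = delta[(q, c)]
    return q in finals

for n in range(9):
    for w in map("".join, product("01", repeat=n)):
        assert bool(pattern.fullmatch(w)) == run(w)
```

The final loop checks the produced pattern against the DFA on all strings of length at most 8. The expression is correct but far from minimal; as noted for the example above, the order in which states are eliminated affects its size.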
