Title: Module 28
Module 28
- Context-Free Grammars
- Definition of a grammar G
- Deriving strings and defining L(G)
- Context-Free Language definition
Context-Free Grammars
Definition
- A context-free grammar G = (V, Σ, S, P)
- V: finite set of variables (nonterminals)
- Σ: finite set of characters (terminals)
- S: start variable
- element of V
- role is similar to that of q0 for an FSA or NFA
- P: finite set of grammar rules or production rules
- Syntax of a production
- variable → string of variables and terminals
English Context-Free Grammar
- ECFG = (V, Σ, S, P)
- V = {<sentence>, <noun phrase>, <verb phrase>, ...}
- people sometimes use < > to delimit variables
- In this course, we generally will use capital letters to denote variables
- Σ = {a, b, c, ..., z, ',', '.', ...}
- S = <sentence>
- P = {<sentence> → <noun phrase> <verb phrase> <pct>, <noun phrase> → <article> <adj> <noun>, ...}
{a^i b^i | i > 0} CFG
- ABG = (V, Σ, S, P)
- V = {S}
- Σ = {a, b}
- S = S
- P = {S → aSb, S → ab} or S → aSb | ab
- second format saves some space
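A minimal Python sketch of how ABG's two productions build a^i b^i: apply S → aSb i - 1 times, then finish with S → ab. The function name `derive_ab` is hypothetical, and each sentential form is a plain string in which the single uppercase S is the only variable.

```python
def derive_ab(i):
    """Return the sentential forms in the derivation of a^i b^i from ABG (i >= 1)."""
    if i < 1:
        raise ValueError("ABG only generates a^i b^i with i > 0")
    forms = ["S"]
    for _ in range(i - 1):
        forms.append(forms[-1].replace("S", "aSb"))  # apply S -> aSb
    forms.append(forms[-1].replace("S", "ab"))  # finish with S -> ab
    return forms


print(" => ".join(derive_ab(3)))  # S => aSb => aaSbb => aaabbb
```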
Context-Free Grammars
- Deriving strings, defining L(G), and defining context-free languages
Defining → and ⇒ notation
- First: → notation
- This is used to define the productions of a grammar
- S → aSb | ab
- Second: ⇒_G notation
- This is used to denote the application of a production rule from a grammar G
- S ⇒_ABG aSb ⇒_ABG aaSbb ⇒_ABG aaabbb
- We say that string S derives string aSb (in one step)
- We say that string aSb derives string aaSbb (in one step)
- We say that string aaSbb derives string aaabbb (in one step)
- We often omit the grammar subscript when the intended grammar is unambiguous
Defining ⇒ continued
- Third: ⇒^k_G notation
- This is used to denote k applications of production rules from a grammar G
- S ⇒^2_ABG aaSbb
- We say that string S derives string aaSbb in two steps
- aSb ⇒^2_ABG aaabbb
- We say that string aSb derives string aaabbb in two steps
- We often omit the grammar subscript when the intended grammar is unambiguous
Defining ⇒ continued
- Fourth: ⇒*_G notation
- This is used to denote 0 or more applications of production rules from a grammar G
- S ⇒*_ABG S
- We say that string S derives string S in 0 or more steps
- S ⇒*_ABG aaSbb
- We say that string S derives string aaSbb in 0 or more steps
- aSb ⇒*_ABG aaSbb
- We say that string aSb derives string aaSbb in 0 or more steps
- aSb ⇒*_ABG aaabbb
- We say that string aSb derives string aaabbb in 0 or more steps
- We often omit the grammar subscript when the intended grammar is unambiguous
Defining derivations
- Derivation of a string x
- The complete step-by-step derivation of a string x from the start variable S
- Key fact: each step in a derivation makes only one application of a production rule from G
- Example: derivation of string aaabbb using ABG
- S ⇒_ABG aSb ⇒_ABG aaSbb ⇒_ABG aaabbb
- Example 2: AG = (V, Σ, S, P) where P = {S → SS | a}
- Deriving string aaa
- S ⇒ SS ⇒ Sa ⇒ SSa ⇒ aSa ⇒ aaa
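The key fact above (each step applies exactly one production) can be checked mechanically. This is a minimal Python sketch, assuming single-character variable and terminal names and productions given as (variable, right-hand side) pairs; `one_step` and `is_derivation` are hypothetical helper names.

```python
def one_step(u, v, productions):
    """True if v is obtained from u by rewriting one variable occurrence once."""
    for lhs, rhs in productions:
        for i, symbol in enumerate(u):
            if symbol == lhs and u[:i] + rhs + u[i + 1:] == v:
                return True
    return False


def is_derivation(forms, productions):
    """True if every consecutive pair of sentential forms is a single step."""
    return all(one_step(u, v, productions) for u, v in zip(forms, forms[1:]))


AG = [("S", "SS"), ("S", "a")]  # P = {S -> SS | a}

print(is_derivation(["S", "SS", "Sa", "SSa", "aSa", "aaa"], AG))  # True
print(is_derivation(["S", "SS", "aa"], AG))  # False: SS => aa needs two steps
```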
Defining L(G)
- Generating strings
- If S ⇒*_G x, then grammar G generates string x
- Note: G generates strings which contain terminals and nonterminals
- aSb contains nonterminals and terminals
- S contains only nonterminals
- aaabbb contains only terminals
- L(G)
- The set of strings over Σ generated by grammar G
- Note: we only consider terminal strings generated by G
- {a^i b^i | i > 0} = L(ABG)
- {a^i | i > 0} = L(AG)
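To make the definition of L(G) concrete, here is a minimal Python sketch that lists the terminal strings of length at most max_len that a grammar generates. It assumes single-character symbols and no λ-productions (true for AG and ABG), so sentential forms never shrink and any form longer than max_len can be discarded; `language_up_to` is a hypothetical name.

```python
def language_up_to(productions, start, max_len):
    """Terminal strings of length <= max_len derivable from start.

    Assumes no lambda-productions, so sentential forms never get shorter.
    """
    variables = {lhs for lhs, _ in productions}
    seen, frontier, generated = {start}, [start], set()
    while frontier:
        form = frontier.pop()
        if not any(symbol in variables for symbol in form):
            generated.add(form)  # only terminal strings belong to L(G)
            continue
        for lhs, rhs in productions:
            for i, symbol in enumerate(form):
                if symbol == lhs:
                    new = form[:i] + rhs + form[i + 1:]
                    if len(new) <= max_len and new not in seen:
                        seen.add(new)
                        frontier.append(new)
    return generated


AG = [("S", "SS"), ("S", "a")]
ABG = [("S", "aSb"), ("S", "ab")]
print(sorted(language_up_to(AG, "S", 3)))   # ['a', 'aa', 'aaa']
print(sorted(language_up_to(ABG, "S", 6)))  # ['aaabbb', 'aabb', 'ab']
```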
Context-Free Languages
- Context-Free Languages
- A language L is a context-free language (CFL) iff there exists a CFG G such that L(G) = L
- Results so far
- {a^i | i > 0} is a CFL
- One CFG G such that L(G) = this language is AG
- Note: this language is also regular
- {a^i b^i | i > 0} is a CFL
- One CFG G such that L(G) = this language is ABG
- Note: this language is NOT regular
Example
- Let BAL = the set of strings over {(, )} in which the parentheses are balanced
- Prove that BAL is a CFL
- To prove this, you need to come up with a CFG BALG such that L(BALG) = BAL
- BALG = (V, Σ, S, P)
- V = {S}
- Σ = {(, )}
- S = S
- P = ?
- Give derivations of ((())) and ()(()) with your grammar
Module 29
- Parse/Derivation Trees
- Leftmost derivations, rightmost derivations
- Ambiguous Grammars
- Examples
- Arithmetic expressions
- If-then-else Statements
- Inherently ambiguous CFLs
Context-Free Grammars
- Parse Trees
- Leftmost/rightmost derivations
- Ambiguous grammars
Parse Tree
- Parse/derivation trees are structured derivations
- The structure graphically illustrates semantic information about the string
- Formalization of a concept we encountered in the regular languages unit
- Note: what we saw before were not exactly parse trees as we define them now, but they were close
Parse Tree Example
- Parse tree for string ()(()) and grammar BALG
- BALG = (V, Σ, S, P)
- V = {S}, Σ = {(, )}, S = S
- P = {S → SS | (S) | λ}
- One derivation of ()(())
- S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()((S)) ⇒ ()(())
- Parse tree
Comments about Example
- Syntax
- Draw a unique arrow from each variable to each character that is a direct child of that variable
- A line instead of an arrow is OK
- The derived string can be read in a left-to-right traversal of the leaves
- Semantics
- The tree graphically illustrates the nesting structure of the string of parentheses
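One way to see both the syntax and semantics points in code: a minimal Python sketch, assuming a parse tree is stored as a (variable, children) pair in which each child is a terminal character or another node, and a node with no children records the production S → λ. The helper names `leaves` and `nesting_depth` are hypothetical.

```python
# Parse tree for ()(()) in BALG, where P = {S -> SS | (S) | λ}.
LAMBDA_NODE = ("S", [])                  # S -> λ
left = ("S", ["(", LAMBDA_NODE, ")"])    # S -> (S), giving ()
inner = ("S", ["(", LAMBDA_NODE, ")"])   # S -> (S), giving ()
right = ("S", ["(", inner, ")"])         # S -> (S), giving (())
tree = ("S", [left, right])              # S -> SS


def leaves(node):
    """Read the derived string from a left-to-right traversal of the leaves."""
    _, children = node
    return "".join(c if isinstance(c, str) else leaves(c) for c in children)


def nesting_depth(node):
    """Deepest parenthesis nesting, read off the tree structure."""
    _, children = node
    deepest = max((nesting_depth(c) for c in children if not isinstance(c, str)),
                  default=0)
    return deepest + (1 if "(" in children else 0)


print(leaves(tree), nesting_depth(tree))  # ()(()) 2
```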
Leftmost/Rightmost Derivations
- There is more than one derivation of the string ()(()).
- S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()((S)) ⇒ ()(())
- S ⇒ SS ⇒ (S)S ⇒ (S)(S) ⇒ ()(S) ⇒ ()((S)) ⇒ ()(())
- S ⇒ SS ⇒ S(S) ⇒ S((S)) ⇒ S(()) ⇒ (S)(()) ⇒ ()(())
- Leftmost derivation
- Leftmost variable is always expanded
- Which one of the above is leftmost?
- Rightmost derivation
- Rightmost variable is always expanded
- Which one of the above is rightmost?
Comments
- Fix a string and a grammar
- Any derivation corresponds to a unique parse tree
- Any parse tree can correspond to many different derivations
- Example
- The one parse tree corresponds to all three derivations
- Unique mappings
- For any parse tree, there is a unique leftmost derivation and a unique rightmost derivation that it corresponds to
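The unique-mapping claim can be illustrated by reading derivations off a parse tree: always expanding the leftmost unexpanded node gives the leftmost derivation, and always expanding the rightmost one gives the rightmost derivation. A minimal Python sketch, assuming a parse tree is a (variable, children) pair whose children are terminal characters or subtrees, with S → λ recorded as a node with no children; `derivation` is a hypothetical helper.

```python
def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation for tree."""
    def render(form):
        return "".join(x if isinstance(x, str) else x[0] for x in form)

    form = [tree]
    forms = [render(form)]
    while True:
        pending = [i for i, x in enumerate(form) if not isinstance(x, str)]
        if not pending:
            return forms
        i = pending[0] if leftmost else pending[-1]
        _, children = form[i]
        form = form[:i] + list(children) + form[i + 1:]
        forms.append(render(form))


# Parse tree for ()(()) with P = {S -> SS | (S) | λ}
empty = ("S", [])
tree = ("S", [("S", ["(", empty, ")"]),
              ("S", ["(", ("S", ["(", empty, ")"]), ")"])])

print(" => ".join(derivation(tree)))
# S => SS => (S)S => ()S => ()(S) => ()((S)) => ()(())
print(" => ".join(derivation(tree, leftmost=False)))
# S => SS => S(S) => S((S)) => S(()) => (S)(()) => ()(())
```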
Example
- S ⇒ SS ⇒ SSS ⇒ (S)SS ⇒ ()SS ⇒ ()S ⇒ ()
- The above is a leftmost derivation of the string () from the grammar BALG
- Draw the corresponding parse tree
- Draw the corresponding rightmost derivation
- S ⇒ (S) ⇒ (SS) ⇒ (S(S)) ⇒ (S()) ⇒ (())
- The above is a rightmost derivation of the string (()) from the grammar BALG
- Draw the corresponding parse tree
- Draw the corresponding leftmost derivation
Ambiguous Grammars
- Examples
- Arithmetic Expressions
- If-then-else statements
- Inherently ambiguous CFLs
Ambiguous Grammars
- A grammar G is ambiguous if there exists a string x in L(G) with two or more distinct parse trees
- (Equivalently, two or more distinct leftmost derivations, or two or more distinct rightmost derivations)
- Example
- Grammar AG is ambiguous
- String aaa in L(AG) has 2 rightmost derivations
- S ⇒ SS ⇒ SSS ⇒ SSa ⇒ Saa ⇒ aaa
- S ⇒ SS ⇒ Sa ⇒ SSa ⇒ Saa ⇒ aaa
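Ambiguity of AG can also be seen by counting parse trees. In a parse tree for a^n with n > 1, the root uses S → SS and its two subtrees yield a^k and a^(n-k) for some 1 ≤ k < n, so multiplying the counts and summing over k gives a recurrence. A minimal Python sketch (the function name is hypothetical); the counts are the Catalan numbers, and any count of 2 or more witnesses ambiguity.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_trees(n):
    """Number of distinct parse trees for a^n in AG, where P = {S -> SS | a}."""
    if n == 1:
        return 1  # only S -> a
    # Root uses S -> SS: left subtree yields a^k, right subtree yields a^(n-k).
    return sum(parse_trees(k) * parse_trees(n - k) for k in range(1, n))


print([parse_trees(n) for n in range(1, 6)])  # [1, 1, 2, 5, 14]
```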
2 Simple Examples
- Grammar BALG is ambiguous
- String () in L(BALG) has more than one leftmost derivation
- S ⇒ (S) ⇒ ()
- S ⇒ (S) ⇒ (SS) ⇒ (S) ⇒ ()
- Give another leftmost derivation of () from BALG
- Grammar ABG is NOT ambiguous
- Consider any string x in {a^i b^i | i > 0}
- There is a unique parse tree for x
Legal Arithmetic Expressions
- Develop a grammar MATHG = (V, Σ, S, P) for the language of legal arithmetic expressions
- Σ = {0, 1, +, *, -, /, (, )}
- Strings in the language include
- 0
- 10
- 1011111100
- 10(11111100)
- Strings not in the language include
- 10
- 11101
- )(
Grammar MATHG1
- V = {E, N}
- Σ = {0, 1, +, *, -, /, (, )}
- S = E
- P
- E → N | E+E | E*E | E/E | E-E | (E)
- N → N0 | N1 | 0 | 1
MATHG1 is ambiguous
E → N | E+E | E*E | E/E | E-E | (E)    N → N0 | N1 | 0 | 1
- Come up with two distinct leftmost derivations of the string 11+0*11
- E ⇒ E+E ⇒ N+E ⇒ N1+E ⇒ 11+E ⇒ 11+E*E ⇒ 11+N*E ⇒ 11+0*E ⇒ 11+0*N ⇒ 11+0*N1 ⇒ 11+0*11
- E ⇒ E*E ⇒ E+E*E ⇒ N+E*E ⇒ N1+E*E ⇒ 11+E*E ⇒ 11+N*E ⇒ 11+0*E ⇒ 11+0*N ⇒ 11+0*N1 ⇒ 11+0*11
- Draw the corresponding parse trees
Corresponding Parse Trees
- [Figure: the two parse trees, rooted at E, for the two leftmost derivations of 11+0*11]
Parse Tree Meanings
Note how the parse trees capture the semantic meaning of string 11+0*11. More specifically, what number does the first parse tree represent? What number does the second parse tree represent?
Implications
- Two interpretations of string 11+0*11 (numbers written in binary)
- 11+(0*11) = 11
- (11+0)*11 = 1001
- What if a line in a program is
- MSU_Tuition = 11+0*11
- What is MSU_Tuition?
- It depends on how the expression 11+0*11 is parsed.
- This is not good.
- Ambiguity in grammars is undesirable, particularly if the grammar is used to develop a compiler for a programming language like C.
- In this case, there is an unambiguous grammar for the language of arithmetic expressions
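The two readings can be evaluated directly. A minimal Python sketch, assuming each parse tree is written as a nested (operator, left, right) tuple with binary numerals at the leaves; the tuple encoding and the choice of floor division for / are assumptions, not part of MATHG1.

```python
import operator

# Operator semantics; treating "/" as integer floor division is an assumption.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.floordiv}


def evaluate(tree):
    """Evaluate a parse tree whose leaves are binary numerals such as '11'."""
    if isinstance(tree, str):
        return int(tree, 2)
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))


plus_at_root = ("+", "11", ("*", "0", "11"))   # 11+(0*11)
times_at_root = ("*", ("+", "11", "0"), "11")  # (11+0)*11

print(format(evaluate(plus_at_root), "b"))   # 11
print(format(evaluate(times_at_root), "b"))  # 1001
```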
If-Then-Else Statements
- A grammar ITEG = (V, Σ, S, P) for the language of legal if-then-else statements
- V = {S, BOOL}
- Σ = {D<85, D>50, grade=3.5, grade=3.0, if, then, else}
- S = S
- P
- S → if BOOL then S else S | if BOOL then S | grade=3.5 | grade=3.0
- BOOL → D<85 | D>50
ITEG is ambiguous
S → if BOOL then S | grade=3.5 | grade=3.0 | if BOOL then S else S    BOOL → D<85 | D>50
- Come up with two distinct leftmost derivations of the string
- if D<85 then if D>50 then grade=3.5 else grade=3.0
- S ⇒ if BOOL then S else S ⇒ if D<85 then S else S ⇒ if D<85 then if BOOL then S else S ⇒ if D<85 then if D>50 then S else S ⇒ if D<85 then if D>50 then grade=3.5 else S ⇒ if D<85 then if D>50 then grade=3.5 else grade=3.0
- S ⇒ if BOOL then S ⇒ if D<85 then S ⇒ if D<85 then if BOOL then S else S ⇒ if D<85 then if D>50 then S else S ⇒ if D<85 then if D>50 then grade=3.5 else S ⇒ if D<85 then if D>50 then grade=3.5 else grade=3.0
- Draw the corresponding parse trees
Corresponding Parse Trees
- [Figure: the two parse trees, rooted at S, for the two leftmost derivations of if D<85 then if D>50 then grade=3.5 else grade=3.0]
Parse Tree Meanings
[Figure: parse tree 1 and parse tree 2 for if D<85 then if D>50 then grade=3.5 else grade=3.0]
If you receive a 90 on type D points, what is your grade?
- By parse tree 1
- By parse tree 2
Implications
- Two interpretations of string
- if D<85 then if D>50 then grade=3.5 else grade=3.0
- Issue: which if-then does the last else attach to?
- This phenomenon is known as the dangling else
- Answer: typically, else binds to the NEAREST if-then
- In this case, there is an unambiguous grammar for handling if-thens as well as if-then-elses
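The dangling-else difference can be made concrete by executing both parse trees. A minimal Python sketch, assuming a statement is either ("assign", grade) or ("if", condition, then_part, else_part) with else_part set to None when there is no else; `run` returns None when no assignment executes. The encoding and all names are hypothetical.

```python
def run(stmt, d):
    """Return the grade assigned by stmt for D = d, or None if none is assigned."""
    if stmt[0] == "assign":
        return stmt[1]
    _, condition, then_part, else_part = stmt
    if condition(d):
        return run(then_part, d)
    return None if else_part is None else run(else_part, d)


def below_85(d):
    return d < 85


def above_50(d):
    return d > 50


# else attached to the outer "if D<85"
outer_else = ("if", below_85, ("if", above_50, ("assign", 3.5), None),
              ("assign", 3.0))
# else attached to the inner "if D>50" (the nearest-if convention)
inner_else = ("if", below_85,
              ("if", above_50, ("assign", 3.5), ("assign", 3.0)), None)

for d in (90, 70, 40):
    print(d, run(outer_else, d), run(inner_else, d))
# 90 3.0 None
# 70 3.5 3.5
# 40 None 3.0
```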
Inherently ambiguous CFLs
- A CFL L is inherently ambiguous iff for all CFGs G such that L(G) = L, G is ambiguous
- Examples so far
- None of the CFLs we've seen so far are inherently ambiguous
- While some of the CFGs we've seen are ambiguous, there do exist unambiguous CFGs for those CFLs.
- Later result
- There exist inherently ambiguous CFLs
- Example: {a^i b^j c^k | i = j or j = k or i = j = k}
- Note: i = j = k is unnecessary, but I added it here for clarity
Summary
- Parse trees illustrate semantic information about strings
- Ambiguous grammars are undesirable
- This means there are multiple parse trees for some string
- These strings can be interpreted in multiple ways
- There are some heuristics people use for taking an ambiguous grammar and making it unambiguous, but this is not the focus of this course
- There are some inherently ambiguous CFLs
- Thus, the above heuristics do not always work
Module 30
- EQUAL language
- Designing a CFG
- Proving the CFG is correct
EQUAL language
EQUAL
- EQUAL is the set of strings over {a, b} with an equal number of a's and b's
- Strings in EQUAL include
- aabbab
- bbbaaa
- abba
- Strings in {a, b}* not in EQUAL include
- aaa
- bbb
- aab
- ababa
Designing a CFG for EQUAL
- Think recursively
- Base Case
- What is the shortest possible string in EQUAL?
- Production Rule
Recursive Case
- Recursive Case
- Now consider a longer string x in EQUAL
- Since x has length > 0, x must have a first character
- This must be a or b
- Two possibilities for what x looks like
- x = ay
- What must be true about the relative number of a's and b's in y?
- x = bz
- What must be true about the relative number of a's and b's in z?
Case 1: x = ay
- x = ay where y has one extra b
- What must y look like?
- Some examples
- b
- babba
- aabbbab
- aaabbbb
- Is there a general pattern that applies to all of the above examples?
- More specifically, show how we can decompose all of the above strings y into 3 pieces, two of which belong to EQUAL.
- Some of these pieces might be the empty string λ
Decomposing y
- y has one extra b
- Possible examples
- b, babba, aabbbab, aaabbbb
- Decomposition
- y = ubv where
- u and v both have an equal number of a's and b's
- Decompose the 4 strings above into u, b, v
- λ·b·λ, λ·b·abba, aabb·b·ab, aaabbb·b·λ
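The decomposition can be computed with a single left-to-right scan that tracks (#b - #a) in the prefix read so far and stops at the first prefix with one extra b, the same idea the correctness proof uses. A minimal Python sketch; `decompose` is a hypothetical name.

```python
def decompose(y):
    """Split y, which has one more b than a, as y = u + "b" + v with u, v in EQUAL."""
    balance = 0  # (#b - #a) in the prefix y[:i+1]
    for i, ch in enumerate(y):
        balance += 1 if ch == "b" else -1
        if balance == 1:  # first prefix with one extra b; it must end in b
            return y[:i], y[i + 1:]
    raise ValueError("no prefix of y has one extra b")


for y in ["b", "babba", "aabbbab", "aaabbbb"]:
    u, v = decompose(y)
    print(f"{y}: u={u!r}, v={v!r}")
```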
Implication
- Case 1: x = ay
- y has one extra b
- Case 1 refined: x = aubv
- u, v belong to EQUAL
- Production rule for this case?
Case 2: x = bz
- Case 2: x = bz
- z has one extra a
- Case 2 refined: x = buav
- u, v belong to EQUAL
- Production rule for this case?
Final Grammar
- EG = (V, Σ, S, P)
- V = {S}
- Σ = {a, b}
- S = S
- P
EQUAL language
Is our grammar correct?
- How do we prove our grammar is correct?
- Informal
- Test some strings
- Review logic behind program (CFG) design
- Formal
- First, show every string derived by EG belongs to EQUAL
- That is, show L(EG) is a subset of EQUAL
- Second, show every string in EQUAL can be derived by EG
- That is, show EQUAL is a subset of L(EG)
- Both proofs will be inductive proofs
- Inductive proofs and recursive algorithms go well together
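For the informal check (test some strings), a bounded brute-force comparison is easy to script. A minimal Python sketch, assuming the productions S → aSbS | bSaS | λ that the case analysis leads to (the ones the proof slides use). It expands only the leftmost S, which suffices because leftmost derivations generate the same language, and it discards forms with more than max_len terminals because no production removes terminals. Function names are hypothetical.

```python
from itertools import product

RULES = ["aSbS", "bSaS", ""]  # assumed: S -> aSbS | bSaS | λ


def generated(max_len):
    """Terminal strings of length <= max_len derivable from S in EG."""
    seen, frontier, result = {"S"}, ["S"], set()
    while frontier:
        form = frontier.pop()
        i = form.find("S")
        if i == -1:
            result.add(form)
            continue
        for rhs in RULES:  # expand the leftmost S only
            new = form[:i] + rhs + form[i + 1:]
            terminals = len(new) - new.count("S")
            if terminals <= max_len and new not in seen:
                seen.add(new)
                frontier.append(new)
    return result


def equal_strings(max_len):
    """Strings over {a, b} of length <= max_len with as many a's as b's."""
    return {"".join(w) for n in range(max_len + 1)
            for w in product("ab", repeat=n) if w.count("a") == w.count("b")}


print(generated(6) == equal_strings(6), len(equal_strings(6)))  # True 29
```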
L(EG) subset of EQUAL
- Let x be an arbitrary string in L(EG)
- What does this mean?
- S ⇒*_EG x
- Follows from definition of x ∈ L(EG)
- We will prove the following
- If S ⇒^1_EG x, then x is in EQUAL
- If S ⇒^2_EG x, then x is in EQUAL
- If S ⇒^3_EG x, then x is in EQUAL
- If S ⇒^4_EG x, then x is in EQUAL
- ...
Base Case
- Statement to be proven
- For all n ≥ 1, if S ⇒^n_EG x, then x is in EQUAL
- Prove this by induction on n
- Base Case
- n = 1
- What is the set of strings x such that S ⇒^1_EG x?
- What do we need to prove about this set of strings?
Inductive Case
- Inductive Hypothesis
- For 1 ≤ j ≤ n, if S ⇒^j_EG x, then x is in EQUAL
- Note: this is a strong induction hypothesis
- Traditional inductive hypothesis would take the form
- For some n ≥ 1, if S ⇒^n_EG x, then x is in EQUAL
- The difference is that we assume the basic hypothesis for all integers between 1 and n, not just n
- Statement to be Proven in Inductive Case
- If S ⇒^(n+1)_EG x, then x is in EQUAL
Regular induction vs strong induction
- Infinite Set of Facts
- Fact 1
- Fact 2
- Fact 3
- Fact 4
- Fact 5
- Fact 6
- Base Case
- Prove fact 1
- Regular inductive case
- For n ≥ 1,
- Fact n --> Fact n+1
- Strong inductive case
- For n ≥ 1,
- Fact 1 to Fact n --> Fact n+1
Visualization of Induction
[Figure: Fact 1 through Fact 9 drawn twice, side by side, under the panel titles Regular Induction and Strong Induction]
Proving Inductive Case
- If S ⇒^(n+1)_EG x, then x is in EQUAL
- Let x be an arbitrary string such that S ⇒^(n+1)_EG x
- Examining EG, what are the three possible first derivation steps?
- Case 1: S ⇒ ___ ⇒^n_EG x
- Case 2: S ⇒ ___ ⇒^n_EG x
- Case 3: S ⇒ ___ ⇒^n_EG x
- One of the cases is impossible. Which one and why?
Case 2: S ⇒ ___ ⇒^n_EG x
- This means x has the form aubv where
- What can we conclude about u (don't apply IH)?
- What can we conclude about v (don't apply IH)?
- Apply the inductive hypothesis
- u and v belong to EQUAL
- Why do we need the strong inductive hypothesis?
- Conclude x belongs to EQUAL
- x = aubv where u and v belong to EQUAL
- Clearly the number of a's in x equals the number of b's in x
Case 3: S ⇒ ___ ⇒^n_EG x
- This means x has the form buav where
- What can we conclude about u (no IH)?
- What can we conclude about v (no IH)?
- Apply the inductive hypothesis
- u and v belong to EQUAL
- Why do we need the strong inductive hypothesis?
- Conclude x belongs to EQUAL
- x = buav where u and v belong to EQUAL
- Clearly the number of a's in x equals the number of b's in x
L(EG) subset of EQUAL
- Wrapping up inductive case
- In all possible derivations of x, we have shown that x belongs to EQUAL
- Thus, we have proven the inductive case
- Conclusion
- By the principle of mathematical induction, we have shown that L(EG) is a subset of EQUAL
EQUAL subset of L(EG)
- Let x be an arbitrary string in EQUAL
- What does this mean?
- We will prove the following
- If |x| = 0 and x is in EQUAL, then x is in L(EG)
- If |x| = 1 and x is in EQUAL, then x is in L(EG)
- If |x| = 2 and x is in EQUAL, then x is in L(EG)
- If |x| = 3 and x is in EQUAL, then x is in L(EG)
- ...
EQUAL subset of L(EG)
- Statement to be proven
- For all n ≥ 0, if |x| = n and x is in EQUAL, then x is in L(EG)
- Prove this by induction on n
- Base Case
- n = 0
- What is the only string x such that |x| = 0 and x is in EQUAL?
- Prove this string belongs to L(EG)
Inductive Case
- Inductive Hypothesis
- For 0 ≤ j ≤ n, if |x| = j and x is in EQUAL, then x is in L(EG)
- Again, this is a strong induction hypothesis
- Statement to be Proven in Inductive Case
- For n ≥ 0,
- if |x| = n+1 and x is in EQUAL, then x is in L(EG)
Proving Inductive Case
- If |x| = n+1 and x is in EQUAL, then x is in L(EG)
- Let x be an arbitrary string such that |x| = n+1 and x is in EQUAL
- Examining Σ, what are the two possibilities for the first character in x?
- Case 1: first character in x is ___
- Case 2: first character in x is ___
- In each case, what can we say about the remainder of x?
- Case 1: the remainder of x
- Case 2: the remainder of x
Case 1: x = ay
- What can we say about y in this case?
- This means x has the form aubv where
- u is in EQUAL and has length ≤ n
- v is in EQUAL and has length ≤ n
- Proving this statement true
- Consider all the prefixes of string y
- length 0: λ
- length 1: y1
- length 2: y1y2
- ...
- length n: y1y2...yn = y
Case 1: x = ay
- Consider all the prefixes of string y
- length 0: λ
- length 1: y1
- length 2: y1y2
- ...
- length n: y1y2...yn = y
- The first prefix λ has the same number of a's as b's
- The last prefix y has one extra b
- The difference (#b - #a) for the length-i prefix differs by exactly one from the difference for the length-(i-1) prefix
- Thus, there must be a first prefix t of y where t has one extra b
- Furthermore, the last character of t must be b
- Otherwise, t would not be the FIRST prefix of y with one extra b
- Break t into u and b, and let the remainder of y be v
- The statement follows
Case 1: x = aubv
- x = aubv
- u is in EQUAL and has length ≤ n
- v is in EQUAL and has length ≤ n
- Apply the induction hypothesis
- What can we conclude from applying the IH?
- Why did we need a strong inductive hypothesis?
- Conclude x is in L(EG) by constructing a derivation
- S ⇒ aSbS ⇒*_EG aubS ⇒*_EG aubv
Case 2: x = buav
- x = buav
- u is in EQUAL and has length ≤ n
- v is in EQUAL and has length ≤ n
- Apply the induction hypothesis
- What can we conclude about u and v?
- Conclude x is in L(EG) by constructing a derivation
- S ⇒ bSaS ⇒*_EG buaS ⇒*_EG buav
- Justify each of the steps in this derivation
EQUAL subset of L(EG)
- Wrapping up inductive case
- For all possible first characters of x, we have shown that x belongs to L(EG)
- Thus, we have proven the inductive case
- Conclusion
- By the principle of mathematical induction, we have shown that EQUAL is a subset of L(EG)
Module 31
- Closure Properties for CFLs
- Kleene Closure
- construction
- examples
- proof of correctness
- Others covered less thoroughly in lecture
- union, concatenation
- CFLs versus regular languages
- regular languages subset of CFL
Closure Properties for CFLs
CFL closed under Kleene Closure
- Let L be an arbitrary CFL
- Let G1 be a CFG s.t. L(G1) = L
- G1 exists by definition of L ∈ CFL
- Construct CFG G2 from CFG G1
- Argue L(G2) = L*
- There exists CFG G2 s.t. L(G2) = L*
- L* is a CFL
Visualization
[Figure: the Kleene closure proof outline, drawn over the language class CFL]
Algorithm Specification
- Input
- CFG G1
- Output
- CFG G2 such that L(G2) = (L(G1))*
Construction
- Input
- CFG G1 = (V1, Σ, S1, P1)
- Output
- CFG G2 = (V2, Σ, S2, P2)
- V2 = V1 ∪ {T}
- T is a new symbol not in V1 or Σ
- S2 = T
- P2 = P1 ∪ ??
Closure Properties for CFLs
Example 1
V2 = V1 ∪ {T}, where T is a new symbol not in V1 or Σ; S2 = T; P2 = P1 ∪ {T → S1T | λ}
- Input grammar
- V = {S}
- Σ = {a, b}
- S = S
- P
- S → aa | ab | ba | bb
- Output grammar
- V =
- Σ = {a, b}
- Start symbol is
- P
Example 2
V2 = V1 ∪ {T}, where T is a new symbol not in V1 or Σ; S2 = T; P2 = P1 ∪ {T → S1T | λ}
- Input grammar
- V = {S, T}
- Σ = {a, b}
- Start symbol is T
- P
- T → ST | λ
- S → aa | ab | ba | bb
- Output grammar
- V =
- Σ = {a, b}
- Start symbol is
- P
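The Kleene closure construction is short enough to code directly. A minimal Python sketch, assuming a grammar is a dict mapping each variable to a list of right-hand sides (each a list of symbols, with [] standing for λ) and using T → S1T | λ as the new rules, the form the example slides apply. The helper `fresh` picks T, or T1, T2, ... when T is already taken, which is exactly the issue Example 2 raises. All names are hypothetical.

```python
def fresh(used, base="T"):
    """Return base, or base followed by a number, not already in used."""
    if base not in used:
        return base
    k = 1
    while f"{base}{k}" in used:
        k += 1
    return f"{base}{k}"


def kleene_star(grammar, start):
    """Build G2 with L(G2) = L(G1)* from G1 = (grammar, start)."""
    symbols = set(grammar) | {s for rhss in grammar.values()
                              for rhs in rhss for s in rhs}
    t = fresh(symbols)  # new symbol not in V1 or Sigma
    g2 = {v: [list(rhs) for rhs in rhss] for v, rhss in grammar.items()}  # P1
    g2[t] = [[start, t], []]  # T -> S1 T | λ
    return g2, t


PAIRS = [["a", "a"], ["a", "b"], ["b", "a"], ["b", "b"]]
example1 = {"S": PAIRS}
example2 = {"T": [["S", "T"], []], "S": PAIRS}

g2, t = kleene_star(example1, "S")
print(t, g2[t])  # T [['S', 'T'], []]
g2, t = kleene_star(example2, "T")
print(t, g2[t])  # T1 [['T', 'T1'], []]
```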
Closure Properties for CFLs
- Kleene Closure: Proof of Correctness
Is our construction correct?
- How do we prove our construction is correct?
- Informal
- Test some strings
- Review logic behind construction
- Formal
- First, show every string derived by G2 belongs to (L(G1))*
- That is, show L(G2) is a subset of (L(G1))*
- Second, show every string in (L(G1))* can be derived by G2
- That is, show (L(G1))* is a subset of L(G2)
- Both proofs will be inductive proofs
- Inductive proofs and recursive algorithms go well together
L(G2) is a subset of (L(G1))*
- We want to prove the following
- If x ∈ L(G2), then x is in (L(G1))*
- This is equivalent to the following
- If T ⇒*_G2 x, then x is in (L(G1))*
- The two statements are equivalent because
- x ∈ L(G2) means that T ⇒*_G2 x
- We break the second statement down as follows
- If T ⇒^1_G2 x, then x is in (L(G1))*
- If T ⇒^2_G2 x, then x is in (L(G1))*
- If T ⇒^3_G2 x, then x is in (L(G1))*
- ...
L(G2) is a subset of (L(G1))*
- Statement to be proven
- For all n ≥ 1, if T ⇒^n_G2 x, then x is in (L(G1))*
- Prove this by induction on n
- Base Case
- n = 1
- Examining grammar G2, what is the only string x such that T ⇒^1_G2 x?
- Prove this string is in (L(G1))*
Inductive Case
- Inductive Hypothesis
- For 1 ≤ j ≤ n, if T ⇒^j_G2 x, then x is in (L(G1))*
- Note: this is a strong induction hypothesis
- Statement to be Proven in Inductive Case
- For n above, if T ⇒^(n+1)_G2 x, then x is in (L(G1))*
- Proving this statement
- Let x be an arbitrary string such that T ⇒^(n+1)_G2 x
- Examining G2, what are the two possible first derivation steps?
- Case 1: T ⇒_G2 ___ ⇒^n_G2 x
- Case 2: T ⇒_G2 ___ ⇒^n_G2 x
Case Analysis
- Case 1: T ⇒_G2 ___ ⇒^n_G2 x is not possible
- Why not?
- Case 2: T ⇒_G2 ___ ⇒^n_G2 x
- This means x has the form uv where
- What can we say about u (no IH)?
- What can we say about v (no IH)?
- Applying the inductive hypothesis, what can we conclude?
Concluding Case 2: T ⇒_G2 ___ ⇒^n_G2 x
- Concluding string u belongs to L(G1)
- Follows from S1 ⇒*_G2 u and
- Our construction ensures that all terminal strings derived from S1 in G2 are also in L(G1)
- How do we conclude that x belongs to (L(G1))*?
- Wrapping up inductive case
- In all possible derivations of x, we have shown that x belongs to (L(G1))*
- Thus, we have proven the inductive case
- Conclusion
- By the principle of mathematical induction, we have shown that L(G2) is a subset of (L(G1))*
(L(G1))* is a subset of L(G2)
- We want to prove the following
- If x is in (L(G1))*, then x is in L(G2)
- This is equivalent to the following
- If x is in (L(G1))*, then T ⇒*_G2 x
- The two statements are equivalent because
- x ∈ L(G2) means that T ⇒*_G2 x
- We break the second statement down as follows
- If x is in (L(G1))^0, then T ⇒*_G2 x
- If x is in (L(G1))^1, then T ⇒*_G2 x
- If x is in (L(G1))^2, then T ⇒*_G2 x
- ...
(L(G1))* is a subset of L(G2)
- Statement to be proven
- For all n ≥ 0, if x is in (L(G1))^n, then x is in L(G2)
- Prove this by induction on n
- Base Case
- n = 0
- What is the only string x in (L(G1))^0?
- Show this string belongs to L(G2)
Inductive Case
- Inductive Hypothesis
- For some n ≥ 0, if x is in (L(G1))^n, then T ⇒*_G2 x
- Note: this is a normal induction hypothesis
- Statement to be Proven in Inductive Case
- For n ≥ 0, if x is in (L(G1))^(n+1), then T ⇒*_G2 x
- Proving this statement
- Let x be an arbitrary string in (L(G1))^(n+1)
- This means x = uv where
- u ∈ L(G1)
- What can we say about v?
Deriving x
- x = uv where
- u is a string in L(G1)
- v is a string in (L(G1))^n
- Justify all the steps in the following derivation
- T ⇒_G2 S1T ⇒*_G2 S1v ⇒*_G2 uv = x
- First step
- Second step
- Third step
- Thus T ⇒*_G2 x
- The inductive case follows
- The result is proven by the principle of mathematical induction
Construction for Set Union
- Input
- CFG G1 = (V1, Σ, S1, P1)
- CFG G2 = (V2, Σ, S2, P2)
- Output
- CFG G3 = (V3, Σ, S3, P3)
- V3 = V1 ∪ V2 ∪ {T}
- Variable renaming to ensure no names are shared between V1 and V2
- T is a new symbol not in V1 or V2 or Σ
- S3 = T
- P3
Construction for Set Concatenation
- Input
- CFG G1 = (V1, Σ, S1, P1)
- CFG G2 = (V2, Σ, S2, P2)
- Output
- CFG G3 = (V3, Σ, S3, P3)
- V3 = V1 ∪ V2 ∪ {T}
- Variable renaming to ensure no names are shared between V1 and V2
- T is a new symbol not in V1 or V2 or Σ
- S3 = T
- P3
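The union and concatenation constructions differ only in the rules for the new start symbol. A minimal Python sketch, assuming grammars are dicts mapping each variable to a list of right-hand sides (lists of symbols), renaming by suffixing every variable of G1 with 1 and of G2 with 2, and filling P3 with P1 ∪ P2 plus T → S1 | S2 for union or T → S1S2 for concatenation (the standard choices; treat them as assumptions). It also assumes no terminal looks like a renamed variable. Names are hypothetical.

```python
def rename(grammar, start, tag):
    """Suffix every variable with tag so two grammars share no variable names."""
    new = {v: v + tag for v in grammar}
    renamed = {new[v]: [[new.get(s, s) for s in rhs] for rhs in rhss]
               for v, rhss in grammar.items()}
    return renamed, new[start]


def combine(g1, s1, g2, s2, mode):
    """Union (mode='union') or concatenation (mode='concat') of two CFGs."""
    g1, s1 = rename(g1, s1, "1")
    g2, s2 = rename(g2, s2, "2")
    g3 = {**g1, **g2}
    symbols = set(g3) | {s for rhss in g3.values() for rhs in rhss for s in rhs}
    t = "T"
    while t in symbols:  # T must not be in V1, V2, or Sigma
        t += "'"
    g3[t] = [[s1], [s2]] if mode == "union" else [[s1, s2]]
    return g3, t


ABG = {"S": [["a", "S", "b"], ["a", "b"]]}
AG = {"S": [["S", "S"], ["a"]]}

union, t = combine(ABG, "S", AG, "S", "union")
print(t, union[t])   # T [['S1'], ['S2']]
concat, t = combine(ABG, "S", AG, "S", "concat")
print(t, concat[t])  # T [['S1', 'S2']]
```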
CFLs and regular languages
CFL Closure Properties
- What have we just proven?
- CFLs are closed under Kleene closure
- CFLs are closed under set union
- CFLs are closed under set concatenation
- What can we conclude from these 3 results?
- It follows that regular languages are a subset of
CFLs
Regular languages subset of CFL
- Recursive definition of regular languages
- Base Case
- ∅, {λ}, {a}, {b} are regular languages over {a, b}
- P = { }, P = {S → λ}, P = {S → a}, P = {S → b}
- Inductive Case
- If L1 and L2 are regular languages, then L1*, L1L2, and L1 ∪ L2 are regular languages
- Use previous constructions to see that these resulting languages are also context-free
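The recursive argument can be turned into a converter from regular expressions to CFGs. A minimal Python sketch, assuming a regular expression is given as a nested tuple. Instead of renaming variables, it gives every subexpression its own fresh variable X0, X1, ..., which avoids name clashes in the same way; this is a variant of the slides' constructions, not a copy of them. The encoding and names are hypothetical.

```python
from itertools import count


def regex_to_cfg(regex):
    """Return (productions, start) for a regex given as nested tuples:
    ("empty",), ("lambda",), ("sym", c), ("union", r1, r2),
    ("concat", r1, r2), or ("star", r1).
    productions maps each variable to a list of right-hand sides ([] is λ).
    """
    productions = {}
    ids = count()

    def build(r):
        var = f"X{next(ids)}"
        kind = r[0]
        if kind == "empty":
            productions[var] = []                               # no rules: ∅
        elif kind == "lambda":
            productions[var] = [[]]                             # X -> λ
        elif kind == "sym":
            productions[var] = [[r[1]]]                         # X -> c
        elif kind == "union":
            productions[var] = [[build(r[1])], [build(r[2])]]   # X -> Y | Z
        elif kind == "concat":
            productions[var] = [[build(r[1]), build(r[2])]]     # X -> YZ
        elif kind == "star":
            productions[var] = [[build(r[1]), var], []]         # X -> YX | λ
        else:
            raise ValueError(f"unknown regex node: {kind!r}")
        return var

    start = build(regex)
    return productions, start


# (a ∪ b)* a
cfg, start = regex_to_cfg(
    ("concat", ("star", ("union", ("sym", "a"), ("sym", "b"))), ("sym", "a")))
for var in sorted(cfg, key=lambda v: int(v[1:])):
    print(var, "->", " | ".join("".join(rhs) or "λ" for rhs in cfg[var]))
```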
Other CFL Closure Properties
- We will show that CFLs are NOT closed under many other set operations
- Examples include
- set complement
- set intersection
- set difference
Language class hierarchy
[Figure: language class hierarchy, including the class REG]
Module 32
- Pushdown Automata (PDAs)
- definition
- Example
- We define configurations and computations of PDAs
- We define L(M) for PDAs
Pushdown Automata
- Definition and Motivating Example
Pushdown Automata (PDA)
- In this presentation we introduce the PDA model of computation (programming language).
- The key addition to a PDA (from an NFA-λ) is external memory in the form of an infinite-capacity stack
- The word pushdown comes from the stacks of trays in cafeterias, where you have to push down on the stack to add a tray to it.
NFA for {a^m b^n | m, n ≥ 0}
- Consider the language {a^n b^n | n ≥ 0}.
- This NFA can recognize strings which have the correct form: a's followed by b's.
- However, the NFA cannot remember the relative number of a's and b's seen at any point in time.
- What strings end up in each state of the above NFA?
- I
- B
- C
PDA for {a^n b^n | n ≥ 0}
Imagine we now have memory in the form of a stack, which we can use to help remember how many a's we have seen by pushing onto and popping from the stack. When we see an a in state I, we do the following two actions: 1) we push an a on the stack; 2) we stay in state I. When we see a b in state B, we do the following two actions: 1) we pop an a from the stack; 2) we stay in state B. From state B, we allow a λ-transition to state C only if the stack is empty. Finally, when we begin, the stack should be empty.
Formal PDA definition
- PDA M = (Q, Σ, Γ, q0, Z, A, δ)
- Modified elements
- Γ is the stack alphabet
- Z is a special character that is initially on the stack
- Often used to represent an empty stack
- δ is modified as follows
- Pop to read the top character on the stack
- Stack update action
- What to push back on the stack
- If we push λ, then the net result of the action is a pop
Example PDA
- Q = {I, B, C}
- Σ = {a, b}
- Γ = {Z, a}
- q0 = I
- Z is the initial stack character
- A = {C}
- δ:
State  Input  Top of stack  Next state  Stack update
I      a      a             I           push aa
I      a      Z             I           push aZ
I      λ      a             B           push a
I      λ      Z             B           push Z
B      b      a             B           push λ
B      λ      Z             C           push Z
Computing with PDAs
- Configurations change compared with NFA-λs
- Configuration components
- current state
- remaining input to be processed
- stack contents
- Computations are essentially the same as with NFA-λs given the modified configurations
- Determining which transitions of a PDA can be applied to a given configuration is more complicated, though
Computation Graph of PDA
Computation graph for this PDA on the input string aabb
[Figure: computation graph whose root is the configuration (I, aabb, Z)]
Definition of ⊢*
Input string aabb
(I, aabb, Z) ⊢ (I, abb, aZ)
(I, aabb, Z) ⊢ (B, aabb, Z)
(I, aabb, Z) ⊢^2 (C, aabb, Z)
(I, aabb, Z) ⊢^3 (B, bb, aaZ)
(I, aabb, Z) ⊢* (B, abb, aZ)
(I, aabb, Z) ⊢* (B, λ, Z)
(I, aabb, Z) ⊢* (C, λ, Z)
Acceptance and Rejection
Input string aabb
M accepts string x if one of the configurations reached is an accepting configuration: (q0, x, Z) ⊢* (f, λ, α), where f ∈ A and α ∈ Γ*. The stack contents can be anything.
M rejects string x if all configurations reached are either not halting configurations or are rejecting configurations.
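Acceptance can be tested by searching the configurations a PDA can reach. A minimal Python sketch for the a^n b^n PDA above, assuming a configuration is a (state, remaining input, stack) triple with the top of the stack as the first character and "" standing for λ. The breadth-first search terminates here because this PDA's λ-moves never grow the stack; a general PDA can have unbounded λ-push loops, so this is not a general decision procedure. Names are hypothetical.

```python
from collections import deque

# (state, input char or "" for λ, top of stack) -> set of (next state, push)
DELTA = {
    ("I", "a", "a"): {("I", "aa")},
    ("I", "a", "Z"): {("I", "aZ")},
    ("I", "", "a"): {("B", "a")},
    ("I", "", "Z"): {("B", "Z")},
    ("B", "b", "a"): {("B", "")},
    ("B", "", "Z"): {("C", "Z")},
}
ACCEPTING = {"C"}


def moves(config):
    """All configurations reachable from config in one step (the ⊢ relation)."""
    state, rest, stack = config
    if not stack:
        return
    top, below = stack[0], stack[1:]
    for nxt, push in DELTA.get((state, "", top), ()):
        yield (nxt, rest, push + below)          # λ-transition
    if rest:
        for nxt, push in DELTA.get((state, rest[0], top), ()):
            yield (nxt, rest[1:], push + below)  # consume one input character


def accepts(x, start="I"):
    """True if (start, x, Z) ⊢* (f, λ, α) for some f in ACCEPTING."""
    initial = (start, x, "Z")
    seen, queue = {initial}, deque([initial])
    while queue:
        config = queue.popleft()
        if config[0] in ACCEPTING and config[1] == "":
            return True
        for nxt in moves(config):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print([w for w in ["", "ab", "aabb", "aab", "abb", "ba"] if accepts(w)])
# ['', 'ab', 'aabb']
```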
Defining L(M) and LPDA
- L(M) (or Y(M))
- The set of strings ?
- N(M)
- The set of strings ?
- LPDA
- Language L is in language class LPDA iff ?
Deterministic PDAs
- A PDA is deterministic if its transition function satisfies both of the following properties
- For all q ∈ Q, a ∈ Σ ∪ {λ}, and X ∈ Γ,
- the set δ(q, a, X) has at most one element
- For all q ∈ Q and X ∈ Γ,
- if δ(q, λ, X) ≠ ∅, then δ(q, a, X) = ∅ for all a ∈ Σ
- A computation graph is now just a path again
- Our default assumption is that PDAs are nondeterministic
Two forms of nondeterminism
Trans  Current state  Input char.  Top of stack  Next state  Stack update
-----------------------------------------------------------------------
1      q0             a            Z             q0          aZ
2      q0             a            Z             q0          aa
3      q0             λ            Z             q0          aZ
4      q0             a            Z             q0          aa
LPDA and DCFL
- A language L is in language class LPDA if and only if there exists a PDA M such that L(M) = L
- A language L is in language class DCFL (Deterministic Context-Free Languages) if and only if there exists a deterministic PDA M such that L(M) = L
- To be proven
- LPDA = CFL
- CFL is a proper superset of DCFL
PDA Comments
- Note: we can use the stack for much more than just a counter
- See examples in chapter 7 for some details
Module 33
- Pushdown Automata (PDAs)
- Another example
Palindromes
- Let PAL be the set of palindromes over {a, b}
- Let PAL1 be the following related language
- {wcw^r | w consists only of a's and b's}
- we add c to the input alphabet as a special marker character
- Strings in PAL1
- aca, bcb, abcba, aabcbaa, c
- Strings not in PAL1
- aaca, aaccaa, abccba, abcb, abba
- Let PAL2 be the set of even-length palindromes
- {ww^r | w consists only of a's and b's}
PAL1
- Let's first construct a PDA for PAL1
- Basic ideas
- Have one state remember the first half of the string
- Have one state match the second half of the string to the first half
- Transition between these two states when the first c is encountered
PDA for PAL1
- M = (Q, Σ, Γ, q0, Z, A, δ)
- Q = {q0, qm, qf}
- Σ = {a, b, c}
- Γ = {Z, a, b}
- q0 = q0
- Z = Z
- A = {qf}
115 Transition Function
Trans  Current State  Input Char.  Top of Stack  Next State  Stack Update
-------------------------------------------------------------------------
1      q0             a            Z             q0          aZ
2      q0             a            a             q0          aa
3      q0             a            b             q0          ab
4      q0             b            Z             q0          bZ
5      q0             b            a             q0          ba
6      q0             b            b             q0          bb
7      q0             c            Z             qm          Z
8      q0             c            a             qm          a
9      q0             c            b             qm          b
10     qm             a            a             qm          λ
11     qm             b            b             qm          λ
12     qm             λ            Z             qf          Z
First three transitions push a on top of the stack.
Second three transitions push b on the stack.
Third three transitions switch the state from q0 to qm, with no change to the stack.
Transitions 10 and 11 match characters from the first and last halves of the input string.
Transition 12 moves to the accepting state qf once only Z remains on the stack.
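The table can be executed by a loop that follows a single path, because in this table at most one transition applies to any configuration (transition 12 is the only λ-move, and no input move starts from qm with Z on top). A minimal Python sketch, assuming our own encoding rather than the slides': a dictionary DELTA_PAL1 keyed by (state, input symbol, top of stack), the empty string for λ, and the stack as a string whose leftmost character is the top, so "aZ" means a above Z, as in the Stack Update column. The names DELTA_PAL1 and run_pal1 are illustrative.

```python
# Transitions 1-12 of the PAL1 PDA.
# Key: (state, input symbol or "" for λ, top of stack)
# Value: (next state, string that replaces the top symbol; "" means pop).
DELTA_PAL1 = {
    ("q0", "a", "Z"): ("q0", "aZ"),  # 1
    ("q0", "a", "a"): ("q0", "aa"),  # 2
    ("q0", "a", "b"): ("q0", "ab"),  # 3
    ("q0", "b", "Z"): ("q0", "bZ"),  # 4
    ("q0", "b", "a"): ("q0", "ba"),  # 5
    ("q0", "b", "b"): ("q0", "bb"),  # 6
    ("q0", "c", "Z"): ("qm", "Z"),   # 7
    ("q0", "c", "a"): ("qm", "a"),   # 8
    ("q0", "c", "b"): ("qm", "b"),   # 9
    ("qm", "a", "a"): ("qm", ""),    # 10
    ("qm", "b", "b"): ("qm", ""),    # 11
    ("qm", "", "Z"): ("qf", "Z"),    # 12 (λ-move)
}
ACCEPTING = {"qf"}


def run_pal1(word: str) -> bool:
    """Follow the single computation path; accept if all input is read and we end in qf."""
    state, rest, stack = "q0", word, "Z"
    while True:
        top = stack[:1]
        if rest and (state, rest[0], top) in DELTA_PAL1:
            state, push = DELTA_PAL1[(state, rest[0], top)]
            rest = rest[1:]  # an input move consumes one symbol
        elif (state, "", top) in DELTA_PAL1:
            state, push = DELTA_PAL1[(state, "", top)]  # λ-move: read nothing
        else:
            break  # no move applies: the computation halts
        stack = push + stack[1:]  # replace the top symbol by the update string
    return rest == "" and state in ACCEPTING


print([w for w in ["abcba", "abcab", "acab", "c"] if run_pal1(w)])  # ['abcba', 'c']
```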
116 Notation comment
- We might represent transition 1 in two other ways
- δ(q0, a, Z) = {(q0, aZ)}
- (q0, a, Z, q0, aZ)
- Question
- Is this PDA deterministic?
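One way to explore the question is to test the determinism conditions mechanically. The sketch below writes the transitions in the 5-tuple form above, with "" for λ, and checks two conditions, assuming the usual DPDA definition: at most one move for each (q, a, X), and no input move for (q, X) when a λ-move exists for (q, X). The names is_deterministic, PAL1 and PAL2 are illustrative; PAL2 here is obtained by replacing c with λ in transitions 7-9, which mirrors the PAL2 machine in the next part of this module.

```python
from collections import defaultdict

# Transitions 1-12 of the PAL1 PDA as 5-tuples (state, input, top, next state, stack update).
PAL1 = [
    ("q0", "a", "Z", "q0", "aZ"), ("q0", "a", "a", "q0", "aa"), ("q0", "a", "b", "q0", "ab"),
    ("q0", "b", "Z", "q0", "bZ"), ("q0", "b", "a", "q0", "ba"), ("q0", "b", "b", "q0", "bb"),
    ("q0", "c", "Z", "qm", "Z"), ("q0", "c", "a", "qm", "a"), ("q0", "c", "b", "qm", "b"),
    ("qm", "a", "a", "qm", ""), ("qm", "b", "b", "qm", ""), ("qm", "", "Z", "qf", "Z"),
]


def is_deterministic(transitions) -> bool:
    """Check the two DPDA conditions on a list of 5-tuples."""
    moves = defaultdict(set)  # (q, a, X) -> set of (p, alpha)
    for q, a, x, p, alpha in transitions:
        moves[(q, a, x)].add((p, alpha))
    for (q, a, x), targets in moves.items():
        if len(targets) > 1:
            return False  # more than one move for the same (q, a, X)
        if a != "" and (q, "", x) in moves:
            return False  # an input move competes with a λ-move on the same top symbol
    return True


# Same machine with the marker c replaced by λ in transitions 7-9.
PAL2 = [(q, "" if a == "c" else a, x, p, alpha) for q, a, x, p, alpha in PAL1]
print(is_deterministic(PAL1), is_deterministic(PAL2))  # True False
```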
117 Computation Graph 1
(q0, abcba, Z)
118 Computation Graph 2
(q0, abcab, Z)
119 Computation Graph 3
(q0, acab, Z)
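The three PAL1 computation graphs survive here only as their starting configurations. Because at most one transition of the PAL1 table applies to any configuration, each graph is a single path, and a short loop can list it. A minimal Python sketch, assuming our own dictionary encoding of the table (top of stack is the leftmost character, "" means λ); DELTA and trace are illustrative names:

```python
# Transitions 1-12 of the PAL1 PDA: (state, input or "" for λ, top) -> (next state, update).
DELTA = {
    ("q0", "a", "Z"): ("q0", "aZ"), ("q0", "a", "a"): ("q0", "aa"), ("q0", "a", "b"): ("q0", "ab"),
    ("q0", "b", "Z"): ("q0", "bZ"), ("q0", "b", "a"): ("q0", "ba"), ("q0", "b", "b"): ("q0", "bb"),
    ("q0", "c", "Z"): ("qm", "Z"), ("q0", "c", "a"): ("qm", "a"), ("q0", "c", "b"): ("qm", "b"),
    ("qm", "a", "a"): ("qm", ""), ("qm", "b", "b"): ("qm", ""), ("qm", "", "Z"): ("qf", "Z"),
}


def trace(word):
    """Return the configurations (state, unread input, stack) on the computation path."""
    state, rest, stack = "q0", word, "Z"
    path = []
    while True:
        path.append(f"({state}, {rest or 'λ'}, {stack or 'λ'})")
        top = stack[:1]
        if rest and (state, rest[0], top) in DELTA:
            state, push = DELTA[(state, rest[0], top)]
            rest = rest[1:]
        elif (state, "", top) in DELTA:
            state, push = DELTA[(state, "", top)]
        else:
            return path  # no move applies: the path ends here
        stack = push + stack[1:]


for w in ["abcba", "abcab", "acab"]:
    print(" ⊢ ".join(trace(w)))
```

For abcba the path ends in (qf, λ, Z) with all input read, so the string is accepted. The path for abcab gets stuck at (qm, ab, baZ), and the path for acab reaches qf with b still unread, so both are rejected.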
120 PAL2
- Let's now construct a PDA for PAL2
- What is harder this time?
- When do we switch from putting strings on the stack to matching?
- Example
- After seeing aab, should we switch to match mode or stay in stack mode?
- Solution
- Do both using nondeterminism
121 PDA for PAL2
- M = (Q, Σ, Γ, q0, Z, A, δ)
- Q = {q0, qm, qf}
- Σ = {a, b}
- Γ = {Z, a, b}
- q0 = q0
- Z = Z
- A = {qf}
122 Transition Relation
Trans  Current State  Input Char.  Top of Stack  Next State  Stack Update
-------------------------------------------------------------------------
1      q0             a            Z             q0          aZ
2      q0             a            a             q0          aa
3      q0             a            b             q0          ab
4      q0             b            Z             q0          bZ
5      q0             b            a             q0          ba
6      q0             b            b             q0          bb
7      q0             λ            Z             qm          Z
8      q0             λ            a             qm          a
9      q0             λ            b             qm          b
10     qm             a            a             qm          λ
11     qm             b            b             qm          λ
12     qm             λ            Z             qf          Z
First three transitions push a on top of the stack.
Second three transitions push b on the stack.
Third three transitions switch the state from q0 to qm without reading input or changing the stack.
Is the PDA deterministic or nondeterministic?
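Since transitions 7-9 let the machine jump from q0 to qm at any point, one simulation follows every branch and accepts if any branch ends in qf with all input read. A minimal Python sketch, assuming a breadth-first search over configurations (our choice; the slides do not prescribe a simulation strategy). Transitions are 5-tuples with "" for λ, the stack is a string with its top at the left, and successors and accepts are illustrative names.

```python
from collections import deque

# Transitions 1-12 of the PAL2 PDA as 5-tuples (state, input, top, next state, stack update).
PAL2 = [
    ("q0", "a", "Z", "q0", "aZ"), ("q0", "a", "a", "q0", "aa"), ("q0", "a", "b", "q0", "ab"),
    ("q0", "b", "Z", "q0", "bZ"), ("q0", "b", "a", "q0", "ba"), ("q0", "b", "b", "q0", "bb"),
    ("q0", "", "Z", "qm", "Z"), ("q0", "", "a", "qm", "a"), ("q0", "", "b", "qm", "b"),
    ("qm", "a", "a", "qm", ""), ("qm", "b", "b", "qm", ""), ("qm", "", "Z", "qf", "Z"),
]
ACCEPTING = {"qf"}


def successors(config, transitions):
    """Yield every configuration reachable in one move from (state, unread input, stack)."""
    state, rest, stack = config
    for q, a, x, p, alpha in transitions:
        # rest.startswith("") is always true, so λ-moves need no special case.
        if q == state and stack.startswith(x) and rest.startswith(a):
            yield (p, rest[len(a):], alpha + stack[1:])


def accepts(word, transitions=PAL2) -> bool:
    """Breadth-first search over all branches of the computation."""
    start = ("q0", word, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        config = queue.popleft()
        state, rest, _ = config
        if rest == "" and state in ACCEPTING:
            return True
        for nxt in successors(config, transitions):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(accepts("abba"), accepts("aba"))  # True False
```

The search terminates for this machine because every move in q0 except the λ-moves to qm consumes input, and there is no way back from qm to q0.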
123 Computation Graph 1
(q0, abba, Z)
124 Computation Graph 2
(q0, aba, Z)
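These two PAL2 computation graphs also survive only as their starting configurations. Unlike the PAL1 case, every q0 configuration here has a λ-successor in qm, so the graph branches. The sketch below rebuilds the whole graph by breadth-first search, assuming the same 5-tuple encoding as the table ("" for λ, top of stack at the left). The names computation_graph and show are illustrative. A string is accepted exactly when some reachable node is in qf with no unread input.

```python
from collections import deque

# Transitions 1-12 of the PAL2 PDA as 5-tuples (state, input, top, next state, stack update).
PAL2 = [
    ("q0", "a", "Z", "q0", "aZ"), ("q0", "a", "a", "q0", "aa"), ("q0", "a", "b", "q0", "ab"),
    ("q0", "b", "Z", "q0", "bZ"), ("q0", "b", "a", "q0", "ba"), ("q0", "b", "b", "q0", "bb"),
    ("q0", "", "Z", "qm", "Z"), ("q0", "", "a", "qm", "a"), ("q0", "", "b", "qm", "b"),
    ("qm", "a", "a", "qm", ""), ("qm", "b", "b", "qm", ""), ("qm", "", "Z", "qf", "Z"),
]


def computation_graph(word, transitions=PAL2):
    """Return (nodes, edges) for every configuration reachable from (q0, word, Z)."""
    start = ("q0", word, "Z")
    nodes, edges, queue = {start}, [], deque([start])
    while queue:
        cfg = queue.popleft()
        state, rest, stack = cfg
        for q, a, x, p, alpha in transitions:
            # The move applies if state and top symbol match and, for an input move,
            # the next unread symbol is a (a == "" marks a λ-move).
            if q == state and stack.startswith(x) and rest.startswith(a):
                nxt = (p, rest[len(a):], alpha + stack[1:])
                edges.append((cfg, nxt))
                if nxt not in nodes:
                    nodes.add(nxt)
                    queue.append(nxt)
    return nodes, edges


def show(cfg):
    """Format a configuration the way the slides write it, e.g. (q0, abba, Z)."""
    state, rest, stack = cfg
    return f"({state}, {rest or 'λ'}, {stack or 'λ'})"


for word in ["abba", "aba"]:
    nodes, edges = computation_graph(word)
    accepting = sorted(show(c) for c in nodes if c[0] == "qf" and c[1] == "")
    print(f"{show(('q0', word, 'Z'))}: {len(nodes)} configurations, accepting {accepting}")
```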
125 PAL
- Challenge
- Construct a PDA for PAL
- First step
- Construct a PDA for odd-length palindromes
- Then
- Combine the PDAs for odd-length and even-length palindromes
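One possible answer, sketched under our own assumptions rather than taken from the slides, keeps a single push phase and lets the machine guess the middle in two ways. It can take a λ-move, as in PAL2, for even length. Or it can read one middle symbol without pushing it, for odd length (PAL1's transitions 7-9 with c replaced by a or b). Another valid approach is a new start state with λ-moves into separate odd-length and even-length PDAs. The machine below is simulated by breadth-first search. The names PUSH, EVEN_MID, ODD_MID, MATCH and accepts are illustrative.

```python
from collections import deque

SIGMA = "ab"   # input alphabet
GAMMA = "Zab"  # stack alphabet; Z is the bottom marker

# 5-tuples (state, input or "" for λ, top, next state, stack update); top of stack is leftmost.
PUSH = [("q0", s, x, "q0", s + x) for s in SIGMA for x in GAMMA]        # like transitions 1-6
EVEN_MID = [("q0", "", x, "qm", x) for x in GAMMA]                      # guess: middle is empty
ODD_MID = [("q0", s, x, "qm", x) for s in SIGMA for x in GAMMA]         # guess: s is the middle
MATCH = [("qm", s, s, "qm", "") for s in SIGMA] + [("qm", "", "Z", "qf", "Z")]  # like 10-12
PAL = PUSH + EVEN_MID + ODD_MID + MATCH


def accepts(word, transitions=PAL) -> bool:
    """Accept if some branch reaches qf with all input read."""
    start = ("q0", word, "Z")
    seen, queue = {start}, deque([start])
    while queue:
        state, rest, stack = queue.popleft()
        if state == "qf" and rest == "":
            return True
        for q, a, x, p, alpha in transitions:
            if q == state and stack.startswith(x) and rest.startswith(a):
                nxt = (p, rest[len(a):], alpha + stack[1:])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


if __name__ == "__main__":
    from itertools import product

    # Exhaustive check against the definition for all strings of length <= 6.
    for n in range(7):
        for t in product(SIGMA, repeat=n):
            w = "".join(t)
            assert accepts(w) == (w == w[::-1]), w
    print("ok")
```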
126 Module 34
- CFG → PDA construction
- Shows that for any CFL L, there exists a PDA M such that L(M) = L
- The reverse is true as well, but we do not prove that here
127 CFL subset LPDA
- Let L be an arbitrary CFL
- Let G be a CFG such that L(G) = L
- G exists by the definition of L being CF
- Construct a PDA M such that L(M) = L(G)
- Argue L(M) = L
- There exists a PDA M such that L(M) = L
- L is in LPDA
- By the definition of LPDA
128 Visualization
- Let L be an arbitrary CFL
- Let G be a CFG such that L(G) = L
- G exists by the definition of L being CF
- Construct a PDA M such that L(M) = L
- M is constructed from CFG G
- Argue L(M) = L
- There exists a PDA M such that L(M) = L
- L is in LPDA
- By the definition of LPDA
(Figure: the language classes CFL and LPDA)
129 Algorithm Specification
- Input
- CFG G
- Output
- PDA M such that L(M) = L(G)
(Figure: input CFG G, output PDA M)
130 Construction Idea
- The basic idea is to have a 2-phase PDA
- Phase 1
- Derive all strings in L(G) on the stack nondeterministically
- Do not process any input while we are deriving the string on the stack
- Phase 2
- Match the input string against the derived string on the stack
- This is a deterministic process
- Move to an accepting state only when the stack is empty
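A PDA can only rewrite the symbol on top of its stack, so in practice the two phases are interleaved. One standard way to realize the idea is the top-down construction. When a variable is on top, a nondeterministic λ-move replaces it by the right-hand side of one of its productions. When a terminal is on top, the machine pops it if it matches the next input symbol. The Python sketch below builds such a transition list and tests it by breadth-first search. Several details are our assumptions, not taken from the slides: the state names q0, q1 and q2, the single-character symbols (upper-case variables, lower-case terminals, Z reserved as the bottom marker), the pruning rule and the search limit.

```python
from collections import deque


def cfg_to_pda(productions, start="S"):
    """Build PDA 5-tuples (state, input, top, next, update) from (variable, rhs) pairs.

    "" stands for λ; the leftmost character of a stack string is the top.
    """
    terminals = sorted({c for _, rhs in productions for c in rhs if c.islower()})
    delta = [("q0", "", "Z", "q1", start + "Z")]                    # push the start variable
    delta += [("q1", "", var, "q1", rhs) for var, rhs in productions]  # expand a variable
    delta += [("q1", t, t, "q1", "") for t in terminals]             # match a terminal
    delta.append(("q1", "", "Z", "q2", "Z"))                         # only Z left: accept
    return delta, {"q2"}


def accepts(word, delta, accepting, max_configs=100_000):
    """BFS over configurations; returning False after hitting max_configs is inconclusive."""
    start = ("q0", word, "Z")
    seen, queue = {start}, deque([start])
    while queue and len(seen) <= max_configs:
        state, rest, stack = queue.popleft()
        if state in accepting and rest == "":
            return True
        for q, a, x, p, alpha in delta:
            if q == state and stack.startswith(x) and rest.startswith(a):
                new_stack = alpha + stack[1:]
                remaining = rest[len(a):]
                # Prune: every terminal on the stack must still be matched by unread input.
                if sum(c.islower() for c in new_stack) > len(remaining):
                    continue
                nxt = (p, remaining, new_stack)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


if __name__ == "__main__":
    delta, acc = cfg_to_pda([("S", "aSb"), ("S", "")])  # S → aSb | λ
    print([w for w in ["", "ab", "aabb", "aab", "ba"] if accepts(w, delta, acc)])
    # ['', 'ab', 'aabb']
```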
131 Illustration
1. Derive all strings in L(G) on the stack
2. Match the derived string against the input
- Input Grammar G
- V = {S}
- Σ = {a, b}
- S = S
- P
- S → aSb | λ
- What is L(G)?
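Listing the short strings a grammar derives is a quick way to form a conjecture about L(G). A minimal Python sketch, assuming single-character symbols (upper-case variables, lower-case terminals) and "" for λ. The function language_up_to is an illustrative name. It rewrites the leftmost variable at each step and keeps only sentential forms with at most max_len terminals. Because terminals are never erased, this pruning loses nothing. It may fail to terminate for grammars whose variables can multiply without producing terminals (for example S → SS). The grammar on this slide does not have that problem.

```python
from collections import deque


def language_up_to(productions, start="S", max_len=6):
    """Return every terminal string of length <= max_len derivable from start."""
    found = set()
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:
            found.add(form)  # no variables left: form is in L(G)
            continue
        for var, rhs in productions:
            if var == form[i]:
                new = form[:i] + rhs + form[i + 1:]
                if sum(c.islower() for c in new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))


print(language_up_to([("S", "aSb"), ("S", "")]))  # ['', 'ab', 'aabb', 'aaabbb']
```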
Illustration of how the PDA might work, though
not completely accurate.
(q0, aabb, Z) / put