Title: CS 3240: Languages and Computation
Non-Context-Free Languages and Recursive Descent Parsing
Recall: Pumping Lemma for Regular Languages
- For every regular language $L$, there is a finite pumping length $p$ such that for any string $s \in L$ with $|s| \ge p$, we can write $s = xyz$ with
  1) $xy^iz \in L$ for every $i \in \{0, 1, 2, \dots\}$,
  2) $|y| \ge 1$,
  3) $|xy| \le p$.
Pumping Lemma for CFL
Theorem: For every context-free language $L$, there is a pumping length $p$ such that for any string $s \in L$ with $|s| \ge p$, we can write $s = uvxyz$ with
  1) $uv^ixy^iz \in L$ for every $i \in \{0, 1, 2, \dots\}$,
  2) $|vy| \ge 1$,
  3) $|vxy| \le p$.
Note that 1) implies that $uxz \in L$ (take $i = 0$), requirement 2) says that $v$ and $y$ cannot both be the empty string $\varepsilon$, and condition 3) is useful in proving that a language is not context-free.
Pumping a Parse Tree
[Figure: parse tree with root S in which a variable A appears twice on one path; the leaves spell u, v, x, y, z]
If $s = uvxyz \in L$ is long, then its parse tree is tall. Hence, there is a path on which a variable $A$ repeats itself. We can pump this $A \ldots A$ part.
$uvxyz \in L$
[Figure: parse tree of uvxyz with root S and two nested occurrences of A on one path]
By repeating the $A \ldots A$ part we get
$uv^2xy^2z \in L$
[Figure: parse tree in which the A ... A segment is repeated, so v and y each appear twice]
while removing the $A \ldots A$ part gives
$uxz \in L$
[Figure: parse tree with the A ... A segment removed, leaving u, x, z]
In general, $uv^ixy^iz \in L$ for all $i = 0, 1, 2, \dots$
Finding the Pumping Length of a CFL
- Let $b$ be the length of the longest right-hand side of any rule (assume $b \ge 2$)
- Each node in the parse tree has at most $b$ children
- At most $b^h$ nodes are $h$ steps from the start node
- Let $p = b^{|V|+2}$, where $|V|$ is the number of variables (could be huge!)
- Then any parse tree of a string $s$ with $|s| \ge p$ has height at least $|V| + 2$
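Formal Proof of Pumping Lemma
Let $G$ be a grammar for the CFL $L$, and let $b \ge 2$ be the maximum length of a right-hand side, as in $A \to X_1 \cdots X_b$. For $s \in L$, consider a smallest parse tree of $s$; since every node has at most $b$ children, $s$ requires a tree depth of at least $\log_b |s|$. If $|s| \ge p = b^{|V|+2}$, then the tree depth is at least $|V| + 2$, hence there is a path and a variable $A$ where $A$ repeats itself:
$S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvxyz$
It follows that $uv^ixy^iz \in L$ for all $i = 0, 1, 2, \dots$. Furthermore, $|vy| \ge 1$ because the tree is minimal (if $v$ and $y$ were both empty, replacing the upper $A$-subtree by the lower one would give a smaller parse tree of $s$), and $|vxy| \le p$ because the repeated variable can be chosen among the lowest $|V| + 1$ variables on a longest path, so the subtree rooted at the upper $A$ has height at most $|V| + 2$ and yields at most $b^{|V|+2} = p$ symbols.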
Using the Pumping Lemma
- Proof by contradiction. Assume that $L$ is context free. Let $p$ be the pumping length of $L$ and take $s \in L$ with $|s| \ge p$.
- By the pumping lemma it is possible to write $s = uvxyz$ (with $|vy| \ge 1$ and $|vxy| \le p$) such that $uv^ixy^iz \in L$ for all $i \ge 0$.
- However, for every such decomposition there is an $i$ with $uv^ixy^iz \notin L$.
- Thus the pumping lemma does not hold for $s$. Contradiction: the language $L$ is not context-free.
Here you have to be clever in picking the right $s$. Show that pumping $s$ up or down gives strings that are not in $L$. Be careful to consider all possible placements of $v$, $x$, $y$.
Example I: Pumping $a^nb^nc^n$
- Proof by contradiction: assume $B = \{a^nb^nc^n \mid n \ge 0\}$ is CF.
- According to the pumping lemma there is a pumping length $p$ such that for $s = a^pb^pc^p \in B$, we can write $s = uvxyz$ with $uv^ixy^iz \in B$ for all $i \ge 0$.
- Since $1 \le |vxy| \le p$, $vxy$ cannot touch all three blocks, so there are two options:
- 1) $vxy$ lies within $a^*b^*$: then the string $uv^2xy^2z$ does not have enough letters c, hence $uv^2xy^2z \notin B$
- 2) $vxy$ lies within $b^*c^*$: then the string $uv^2xy^2z$ does not have enough letters a, hence $uv^2xy^2z \notin B$
- Contradiction: the pumping lemma does not hold, hence $B$ is not context free.
Example II (Pumping Down)
- Prove that $C = \{a^ib^jc^k \mid 0 \le i \le j \le k\}$ is not context-free.
- Let $p$ be the pumping length, and $s = a^pb^pc^p \in C$.
- Pumping lemma: $s = uvxyz$, such that $uv^ixy^iz \in C$ for every $i \ge 0$. Two options for $1 \le |vxy| \le p$:
- 1) $vxy$ lies within $a^*b^*$: then the string $uv^2xy^2z$ does not have enough letters c, hence $uv^2xy^2z \notin C$
- 2) $vxy$ lies within $b^*c^*$: then the string $uv^0xy^0z = uxz$ has too many letters a, hence $uv^0xy^0z \notin C$
- Contradiction: the pumping lemma does not hold for the string $a^pb^pc^p$, hence $C$ is not context-free.
Example III: $ww$
- Prove that the language $D = \{ww \mid w \in \{0,1\}^*\}$ is not CF.
- You must be careful when picking the string $s \in D$.
- Let $p$ be the pumping length, and take $s = 0^p1^p0^p1^p$.
- Options for $s = uvxyz$ with $1 \le |vxy| \le p$:
- If $vxy$ is to the left of the middle of $0^p1^p\,0^p1^p$, then the second half of $uv^2xy^2z$ starts with a 1.
- If $vxy$ is to the right of the middle of $0^p1^p\,0^p1^p$, then the first half of $uv^2xy^2z$ ends with a 0.
- If $vxy$ straddles the middle of $0^p1^p\,0^p1^p$, then pumping down gives $uxz = 0^p1^i0^j1^p \notin D$ (because $i < p$ or $j < p$).
Final Note on CFL
- Let $A_1$ and $A_2$ be two context-free languages; then the union $A_1 \cup A_2$ is also context free.
- However, the intersection $A_1 \cap A_2$ is not necessarily context free.
- The complements $\overline{A_1}$ and $\overline{A_2}$ are not necessarily context free either.
Where Are We Now?
- So far: automata and languages
  - Regular languages
  - Context-free languages
- Next few weeks: construction of parsers
  - Top-down parsing
  - Bottom-up parsing
  - Semantic analysis
- Later in the semester: computability theory
Parser Classification
- Parsers are broadly broken down into
  - LL: top-down parsers
    - L: scan left to right
    - L: trace the leftmost derivation of the input string
  - LR: bottom-up parsers
    - L: scan left to right
    - R: trace the rightmost derivation of the input string (in reverse)
- LL grammars are a subset of LR grammars (every LL(k) grammar is also LR(k))
- Typical notation
  - LL(0), LL(1), LR(1), LR(k)
  - The number k refers to the maximum lookahead (in tokens)
  - Lower is better: less lookahead means a simpler parser
Recursive Descent Parsing
- Simple top-down parsing technique for hand-written parsers
- Convert every nontrivial variable into a function
- Assume we have a scanner from which we get tokens. We match a token by calling
  - matchToken(token), which consumes the token if it matches, or reports an error if it does not
- We also need
  - peekToken(), to get the current token without consuming it
- Output is an abstract syntax tree
A Familiar Example
- <expr> ::= <expr> <addop> <term> | <term>
- <term> ::= <term> * <factor> | <factor>
- <factor> ::= '(' <expr> ')' | num | id
- <addop> ::= + | -
- Notation is called Backus-Naur form (BNF)
- num and id are terminal symbols, supplied by the scanner
- How do we apply recursive descent to it?
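Problem
- The grammar contains left-recursive productions, such as <expr> ::= <expr> <addop> <term>
- Not suitable for top-down parsing, as it may run into an infinite loop: expr() would call expr() again before consuming any token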
Extended Backus-Naur Form
- For simple cases, one solution is EBNF
- Uses { } notation to indicate 0 or more repetitions
- <expr> ::= <term> { <addop> <term> }
- Concept is similar to the * operator of regular expressions
- Num ::= [0-9][0-9]*, with head [0-9] and tail [0-9]*
EBNF Back to BNF
- <expr> ::= <term> <e_tail>
- <e_tail> ::= <addop> <term> <e_tail> | ε
Example 123
Continued
- <term> ::= <term> * <factor> | <factor>
- EBNF
  - <term> ::= <factor> { * <factor> }
- BNF
  - <term> ::= <factor> <t_tail>
  - <t_tail> ::= * <factor> <t_tail> | ε
- Now top-down parsing will work!
Revised Grammar Rules
- <expr> ::= <term> <e_tail>
- <e_tail> ::= <addop> <term> <e_tail> | ε
- <term> ::= <factor> <t_tail>
- <t_tail> ::= * <factor> <t_tail> | ε
- <factor> ::= '(' <expr> ')' | num | id
- <addop> ::= + | -
Solution from EBNF: Nonrecursive Version
- Map the tail to a loop
- <addop> was mapped to token matching
    enum { PLUS, MINUS, MULT, LPAREN, RPAREN, NUM, ID };
    void expr() {
        int token;
        term();
        /* { <addop> <term> }: zero or more repetitions become a loop */
        while ((token = peekToken()) == PLUS || token == MINUS) {
            matchToken(token);
            term();
        }
    }
<expr> ::= <term> { <addop> <term> }    <addop> ::= + | -
Solution from BNF: Recursive Version
    enum { PLUS, MINUS, MULT, LPAREN, RPAREN, NUM, ID };
    void expr() {
        term();
        e_tail();
    }
<expr> ::= <term> <e_tail>    <e_tail> ::= <addop> <term> <e_tail> | ε    <addop> ::= + | -
    void e_tail() {
        int token;
        if ((token = peekToken()) == PLUS || token == MINUS) {
            matchToken(token);
            term();
            e_tail();
        } else {
            return;   /* ε: consume nothing */
        }
    }
<e_tail> ::= <addop> <term> <e_tail> | ε    <addop> ::= + | -
    void term(void) {
        factor();
        t_tail();
    }
    void t_tail() {
        if (peekToken() == MULT) {
            matchToken(MULT);
            factor();   /* <t_tail> ::= * <factor> <t_tail> */
            t_tail();
        } else {
            return;     /* ε */
        }
    }
<term> ::= <factor> <t_tail>    <t_tail> ::= * <factor> <t_tail> | ε
    void factor() {
        if (peekToken() == LPAREN) {
            matchToken(LPAREN);
            expr();
            matchToken(RPAREN);
        } else if (peekToken() == NUM) {
            matchToken(NUM);
        } else if (peekToken() == ID) {
            matchToken(ID);
        }   /* any other token: syntax error */
    }
<factor> ::= '(' <expr> ')' | num | id
Tracing the Parse of 1 + 2 * 3
expr
  term
    factor        finds num (1)
    t_tail        finds nothing (ε)
  e_tail          finds +
    term
      factor      finds num (2)
      t_tail      finds *
        factor    finds num (3)
        t_tail    finds nothing (ε)
    e_tail        finds nothing (ε)
Each call returns success to its caller, and finally expr succeeds.
What happened?
<expr>
  <term>
    <factor> → num (1)
    <t_tail> → ε
  <e_tail>
    <addop> → +
    <term>
      <factor> → num (2)
      <t_tail>
        *
        <factor> → num (3)
        <t_tail> → ε
    <e_tail> → ε
More on Left Recursion
- If a grammar is left recursive, we must first rewrite it to make it right recursive
- Case 1: simple immediate left recursion
  - A → A u | v, where v does not start with A
  - Change to A → v A' and A' → u A' | ε
- Example: change
  - exp → exp addop term | term
- to
  - exp → term exp'
  - exp' → addop term exp' | ε
More General Case
- General immediate left recursion
  - A → A u1 | A u2 | ... | A un | v1 | v2 | ... | vm, where no vi starts with A
- Example
  - exp → exp + term | exp - term | term
- Solution
  - A → v1 A' | v2 A' | ... | vm A'
  - A' → u1 A' | u2 A' | ... | un A' | ε
- Example
  - exp → term exp'
  - exp' → + term exp' | - term exp' | ε
Indirect Left Recursion
- A → A u1 | A u2 | ... | A un | v1 | v2 | ... | vm, where vi ⇒* A w for some vi
- Example: A → B a | A a | c,  B → B b | A b | d
- Solution
  - For each rule Ai → Aj v with j < i, where Aj → w1 | w2 | ... | wk, replace the former rule by Ai → w1 v | w2 v | ... | wk v, assuming Aj has no immediate left recursion
Example
- Example: A → B a | A a | c,  B → B b | A b | d
- It can be rewritten to A → B a A' | c A',  A' → a A' | ε
- Then substitute A into the RHS of B: B → B b | B a A' b | c A' b | d
- Finally, remove the left recursion in B: B → c A' b B' | d B',  B' → b B' | a A' b B' | ε
Left Factoring
- Required if two or more grammar rule choices share a common prefix string: A → u v | u w
- Would cause difficulties if we look ahead only one token
- Solution: A → u A',  A' → v | w
Problem: Left Association?
- Can we still maintain left association?
- The parse tree of
  - <expr> ::= <term> { <addop> <term> }
- is right-associative when the repetition is expanded through a tail rule such as <e_tail>
- One solution: introduce temporary variables
    int expr() {
        int temp = term();   /* value of everything parsed so far */
        int token;
        while ((token = peekToken()) == PLUS || token == MINUS) {
            matchToken(token);
            if (token == PLUS) temp += term();
            else temp -= term();
        }
        return temp;
    }
Construct Syntax Tree
    SyntaxTree *expr() {
        SyntaxTree *temp = term();   /* tree for everything parsed so far */
        int token;
        while ((token = peekToken()) == PLUS || token == MINUS) {
            matchToken(token);
            SyntaxTree *tree = makeOpNode(token);
            tree->leftChild = temp;       /* earlier operands go on the left */
            tree->rightChild = term();
            temp = tree;
        }
        return temp;
    }
- Question: How do we construct the tree in the recursive version?
Summary
- Change the grammar to remove left recursion
  - Tail becomes a loop, or
  - Map the tail into a new rule
- Convert each nontrivial variable into a function call
- Left association by introducing temporary variables or constructing a syntax tree
- Limitations of recursive descent
  - What if multiple options in the RHS start with variables?
  - Empty-string productions A → ε?