Title: COS 320 Compilers
last time
- context free grammars (Appel 3.1)
  - terminals, non-terminals, rules
  - derivations and parse trees
  - ambiguous grammars
- recursive descent parsers (Appel 3.2)
  - parse LL(k) grammars
  - easy to write as ML programs
  - algorithms for automatic construction from a CFG
non-terminals: S, E, L
terminals: NUM, IF, THEN, ELSE, BEGIN, END, PRINT, ;, =
rules:
  1. S → IF E THEN S ELSE S
  2.   | BEGIN S L
  3.   | PRINT E
  4. L → END
  5.   | ; S L
  6. E → NUM = NUM

datatype token = NUM | IF | THEN | ELSE | BEGIN | END | PRINT | SEMI | EQ

(* getToken and error are assumed to be defined elsewhere, e.g. by the lexer *)
val tok = ref (getToken ())
fun advance () = tok := getToken ()
fun eat t = if (!tok = t) then advance () else error ()

fun S () = case !tok of
      IF    => (eat IF; E (); eat THEN; S (); eat ELSE; S ())
    | BEGIN => (eat BEGIN; S (); L ())
    | PRINT => (eat PRINT; E ())
and L () = case !tok of
      END  => eat END
    | SEMI => (eat SEMI; S (); L ())
and E () = (eat NUM; eat EQ; eat NUM)
Constructing RD Parsers
- To construct an RD parser, we need to know what rule to apply when
  - we have seen a non-terminal X
  - we see the next terminal a in input
- We apply rule X → s when
  - a is the first symbol that can be generated by string s, OR
  - s reduces to the empty string (is nullable) and a is the first symbol in any string that can follow X
Computing Nullable Sets
- Non-terminal X is Nullable if and only if one of the following cases applies (computed by iterating to a fixed point)
  - base case
    - if (X → ε) then X is Nullable
  - inductive case
    - if (X → ABC...) and A, B, C, ... are all Nullable then X is Nullable
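The two cases above can be sketched as a fixed-point loop. This is a minimal Python sketch, not the course's ML code; the `GRAMMAR` encoding (a list of `(lhs, rhs)` pairs with an empty tuple for ε) and the name `nullable_set` are assumptions made for illustration, and the grammar is one reading of the running example used on the following slides.

```python
# Grammar as (lhs, rhs) pairs; an empty tuple stands for an epsilon rule.
# Assumed reading of the running example:
#   Z -> X Y Z | d,   Y -> c | epsilon,   X -> a | b Y e
GRAMMAR = [
    ("Z", ("X", "Y", "Z")), ("Z", ("d",)),
    ("Y", ("c",)), ("Y", ()),
    ("X", ("a",)), ("X", ("b", "Y", "e")),
]

def nullable_set(grammar):
    """Apply the base and inductive cases until no new non-terminal is added."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            # all() over an empty rhs is True, which covers the base case X -> epsilon.
            # Terminals are never in `nullable`, so a rhs containing one never qualifies.
            if lhs not in nullable and all(sym in nullable for sym in rhs):
                nullable.add(lhs)
                changed = True
    return nullable

print(sorted(nullable_set(GRAMMAR)))  # ['Y']
```

The loop stops after the first pass that adds nothing, which is the fixed point the slides refer to.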
Computing First Sets
- First(X) is computed iteratively
  - base case
    - if T is a terminal symbol then First(T) = {T}
  - inductive case
    - if X is a non-terminal and (X → ABC...) then
      - First(X) = First(X) ∪ First(ABC...)
      - where First(ABC...) = F1 ∪ F2 ∪ F3 ∪ ... and
        - F1 = First(A)
        - F2 = First(B), if A is Nullable
        - F3 = First(C), if A is Nullable and B is Nullable
        - ...
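A minimal Python sketch of the First computation described above. The grammar encoding, the hard-coded `NULLABLE` set, and the name `first_sets` are assumptions for illustration; the grammar is one reading of the running example (Z → X Y Z | d, Y → c | ε, X → a | b Y e).

```python
# Grammar as (lhs, rhs) pairs; an empty tuple stands for an epsilon rule.
GRAMMAR = [
    ("Z", ("X", "Y", "Z")), ("Z", ("d",)),
    ("Y", ("c",)), ("Y", ()),
    ("X", ("a",)), ("X", ("b", "Y", "e")),
]
NULLABLE = {"Y"}  # assumed to have been computed beforehand for this grammar

def first_sets(grammar, nullable):
    nonterminals = {lhs for lhs, _ in grammar}
    symbols = nonterminals | {sym for _, rhs in grammar for sym in rhs}
    # Base case: First(T) = {T} for a terminal T; non-terminals start empty.
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            for sym in rhs:
                before = len(first[lhs])
                first[lhs] |= first[sym]
                changed = changed or len(first[lhs]) != before
                if sym not in nullable:
                    break  # later symbols contribute only if this one is nullable
    return first

first = first_sets(GRAMMAR, NULLABLE)
for nt in "ZYX":
    print(nt, sorted(first[nt]))
```

For this grammar the loop needs two productive passes before a pass with no changes: Z picks up a and b only after First(X) has been filled in.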
Computing Follow Sets
- Follow(X) is computed iteratively
  - base case
    - initially, we assume nothing in particular follows X
    - (Follow(X) is initially { })
  - inductive case
    - if (Y → s1 X s2) for any strings s1, s2 then
      - Follow(X) = First(s2) ∪ Follow(X)
    - if (Y → s1 X s2) for any strings s1, s2 then
      - Follow(X) = Follow(Y) ∪ Follow(X), if s2 is Nullable
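A minimal Python sketch of the Follow computation described above. The encoding and the helper names `first_of_string` and `follow_sets` are assumptions for illustration, and `NULLABLE` and `FIRST` are hard-coded for one reading of the running example (Z → X Y Z | d, Y → c | ε, X → a | b Y e).

```python
GRAMMAR = [
    ("Z", ("X", "Y", "Z")), ("Z", ("d",)),
    ("Y", ("c",)), ("Y", ()),
    ("X", ("a",)), ("X", ("b", "Y", "e")),
]
NULLABLE = {"Y"}
# First sets of the non-terminals only; a terminal's First set is itself.
FIRST = {"Z": {"a", "b", "d"}, "Y": {"c"}, "X": {"a", "b"}}

def first_of_string(symbols):
    """Return (First(s), whether s is nullable) for a string s of grammar symbols."""
    result = set()
    for sym in symbols:
        result |= FIRST.get(sym, {sym})
        if sym not in NULLABLE:
            return result, False
    return result, True

def follow_sets(grammar):
    follow = {lhs: set() for lhs, _ in grammar}  # base case: nothing follows anything
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:              # rule  lhs -> s1 sym s2
            for i, sym in enumerate(rhs):
                if sym not in follow:
                    continue                  # terminals have no Follow set
                firsts, s2_nullable = first_of_string(rhs[i + 1:])
                new = firsts | (follow[lhs] if s2_nullable else set())
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow

follow = follow_sets(GRAMMAR)
for nt in "ZYX":
    print(nt, sorted(follow[nt]))
```

In this grammar Follow(Z) stays empty because Z only ever appears at the end of its own rule and no end-of-input marker is modelled.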
building a predictive parser

Grammar:
  Z → X Y Z      Z → d
  Y → c          Y → ε
  X → a          X → b Y e

Nullable, base case:
       nullable   first     follow
  Z    no
  Y    yes
  X    no
- after one round of induction, we realize we have reached a fixed point (the table is unchanged)

First, base case:
       nullable   first     follow
  Z    no         d
  Y    yes        c
  X    no         a,b
First, after one round of induction (no fixed point yet):
       nullable   first     follow
  Z    no         d,a,b
  Y    yes        c
  X    no         a,b
- after two rounds of induction, no more changes => fixed point

Follow, base case: every Follow set starts empty
Follow, after one round of induction (no fixed point yet):
       nullable   first     follow
  Z    no         d,a,b
  Y    yes        c         e,d,a,b
  X    no         a,b       c,d,a,b
- after two rounds of induction, fixed point (but notice, computing Follow(X) before Follow(Y) would have required a 3rd round)
Grammar:
  Z → X Y Z      Z → d
  Y → c          Y → ε
  X → a          X → b Y e

Computed Sets:
       nullable   first     follow
  Z    no         d,a,b
  Y    yes        c         e,d,a,b
  X    no         a,b       c,d,a,b

Build parsing table where row X, col T tells parser which clause to execute in function X with next-token T:
- if T ∈ First(s) then
  - enter (X → s) in row X, col T
- if s is Nullable and T ∈ Follow(X) then
  - enter (X → s) in row X, col T

Filling in the table one rule at a time (Z → X Y Z, then Z → d, Y → c, Y → ε, X → a, X → b Y e) gives:

       a          b            c        d        e
  Z    Z → XYZ    Z → XYZ               Z → d
  Y    Y → ε      Y → ε        Y → c    Y → ε    Y → ε
  X    X → a      X → b Y e

What are the blanks? => syntax errors
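To make the role of the table concrete, here is a minimal Python sketch of a parser driven by it, where a blank cell is reported as a syntax error. It uses an explicit stack rather than one function per non-terminal, which is a swapped-in but equivalent formulation; the `TABLE` dictionary is a hand transcription under the assumption that the grammar reads Z → X Y Z | d, Y → c | ε, X → a | b Y e, and the name `parse` is hypothetical.

```python
NONTERMINALS = {"Z", "Y", "X"}

# (row, column) -> right-hand side to expand; missing keys are the blank cells.
TABLE = {
    ("Z", "a"): ("X", "Y", "Z"), ("Z", "b"): ("X", "Y", "Z"), ("Z", "d"): ("d",),
    ("Y", "a"): (), ("Y", "b"): (), ("Y", "c"): ("c",), ("Y", "d"): (), ("Y", "e"): (),
    ("X", "a"): ("a",), ("X", "b"): ("b", "Y", "e"),
}

def parse(tokens, start="Z"):
    """Return True if tokens derive from start; raise SyntaxError otherwise."""
    stack = [start]
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = tokens[pos] if pos < len(tokens) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:                      # blank cell
                raise SyntaxError(f"no rule for {top} on {lookahead!r}")
            stack.extend(reversed(rhs))          # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                             # matched a terminal, like `eat`
        else:
            raise SyntaxError(f"expected {top}, saw {lookahead!r}")
    if pos != len(tokens):
        raise SyntaxError("unconsumed input")
    return True

print(parse("b c e d".split()))  # True
```

An input such as `c` alone hits the blank cell in row Z, column c, and is rejected.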
Is it possible to put 2 grammar rules in the same box?

Grammar (with the added rule Z → d e):
  Z → X Y Z      Z → d      Z → d e
  Y → c          Y → ε
  X → a          X → b Y e

Computed Sets:
       nullable   first     follow
  Z    no         d,a,b
  Y    yes        c         e,d,a,b
  X    no         a,b       c,d,a,b

Both Z → d and Z → d e land in row Z, col d:

       a          b            c        d                  e
  Z    Z → XYZ    Z → XYZ               Z → d, Z → d e
  Y    Y → ε      Y → ε        Y → c    Y → ε              Y → ε
  X    X → a      X → b Y e
predictive parsing tables
- if a predictive parsing table constructed this way contains no duplicate entries, the grammar is called LL(1)
  - Left-to-right parse, Left-most derivation, 1 symbol lookahead
- if not, the grammar is not LL(1)
- in an LL(k) parsing table, columns include every k-length sequence of terminals
  aa  ab  ba  bb  ac  ca  ...
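The duplicate-entry test for LL(1) can be sketched in Python as follows. The names `build_table` and `conflicts` are hypothetical, and the hard-coded `NULLABLE`, `FIRST` and `FOLLOW` sets assume the example grammar reads Z → X Y Z | d, Y → c | ε, X → a | b Y e. Adding the rule Z → d e leaves those sets unchanged, so the same sets are reused for the second check.

```python
from collections import defaultdict

GRAMMAR = [
    ("Z", ("X", "Y", "Z")), ("Z", ("d",)),
    ("Y", ("c",)), ("Y", ()),
    ("X", ("a",)), ("X", ("b", "Y", "e")),
]
NULLABLE = {"Y"}
FIRST = {"Z": {"a", "b", "d"}, "Y": {"c"}, "X": {"a", "b"}}
FOLLOW = {"Z": set(), "Y": {"a", "b", "d", "e"}, "X": {"a", "b", "c", "d"}}

def first_of_string(symbols):
    """Return (First(s), whether s is nullable) for a string s of grammar symbols."""
    result = set()
    for sym in symbols:
        result |= FIRST.get(sym, {sym})  # a terminal's First set is itself
        if sym not in NULLABLE:
            return result, False
    return result, True

def build_table(grammar):
    """Map (row X, col T) to the list of right-hand sides entered in that cell."""
    table = defaultdict(list)
    for lhs, rhs in grammar:
        firsts, rhs_nullable = first_of_string(rhs)
        columns = firsts | (FOLLOW[lhs] if rhs_nullable else set())
        for tok in columns:
            table[(lhs, tok)].append(rhs)
    return table

def conflicts(table):
    """Cells holding more than one rule; an empty result means the grammar is LL(1)."""
    return {cell: rules for cell, rules in table.items() if len(rules) > 1}

print(conflicts(build_table(GRAMMAR)))                          # {}
print(conflicts(build_table(GRAMMAR + [("Z", ("d", "e"))])))    # {('Z', 'd'): [('d',), ('d', 'e')]}
```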
another trick
- Previously, we saw that grammars with left-recursion were problematic, but could be transformed into LL(1) in some cases
- the example non-LL(1) grammar we just saw
  - how do we fix it?
  - solution here is left-factoring

Before:
  Z → X Y Z      Z → d      Z → d e
  Y → c          Y → ε
  X → a          X → b Y e

After left-factoring:
  Z → X Y Z      Z → d W
  Y → c          Y → ε
  X → a          X → b Y e
  W → ε          W → e
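The transformation can be sketched mechanically in Python. This is a simplified illustration, not a general algorithm: the function name `left_factor` is hypothetical, it handles a single non-terminal, and it assumes at most one group of alternatives shares a first symbol (so one fresh non-terminal suffices).

```python
def common_prefix(a, b):
    """Longest common prefix of two tuples of symbols."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]

def left_factor(lhs, alternatives, fresh):
    """Factor the alternatives of `lhs` that start with the same symbol.

    Simplifying assumption: at most one group needs factoring, so a single
    fresh non-terminal name `fresh` is enough.
    """
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[:1], []).append(rhs)   # group by first symbol
    rules = []
    for head, group in groups.items():
        if len(group) == 1 or not head:
            rules += [(lhs, rhs) for rhs in group]   # nothing to factor
            continue
        prefix = group[0]
        for rhs in group[1:]:
            prefix = common_prefix(prefix, rhs)
        rules.append((lhs, prefix + (fresh,)))       # Z -> d W
        rules += [(fresh, rhs[len(prefix):]) for rhs in group]  # W -> epsilon | e
    return rules

for rule in left_factor("Z", [("X", "Y", "Z"), ("d",), ("d", "e")], "W"):
    print(rule)
```

Applied to the Z rules, this yields Z → X Y Z, Z → d W, W → ε and W → e, matching the factored grammar above.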
summary of RD parsing
- CFGs are good at specifying programming language structure
- parsing general CFGs is expensive so we define parsers for simpler classes of CFG
  - LL(k), LR(k)
- we can build a recursive descent parser for LL(k) grammars by
  - computing nullable, first and follow sets
  - constructing a parse table from the sets
  - checking for duplicate entries, which indicates failure
  - creating an ML program from the parse table
- if parser construction fails we can
  - rewrite the grammar (left factoring, eliminating left recursion) and try again
  - try to build a parser using some other method, such as a bottom-up parsing technique
Bottom-up (Shift-Reduce) Parsing
- shift-reduce parsing
  - aka bottom-up parsing
  - aka LR(k): Left-to-right parse, Rightmost derivation, k-token lookahead
- more powerful than LL(k) parsers
- LALR variant
  - the basis for parsers for most modern programming languages
  - implemented in tools such as ML-Yacc
shift-reduce parsing example

Grammar:
  A → S EOF
  L → L ; S      L → S
  S → ( L )      S → id = num

Input from lexer: ( id = num ; id = num ) EOF

Each row shows the action taken, the state of the parse so far (the stack) after that action, and the input yet to read:

  action                  state of parse so far    yet to read
  (start)                                          ( id = num ; id = num ) EOF
  SHIFT                   (                        id = num ; id = num ) EOF
  SHIFT                   ( id                     = num ; id = num ) EOF
  SHIFT                   ( id =                   num ; id = num ) EOF
  SHIFT                   ( id = num               ; id = num ) EOF
  REDUCE S → id = num     ( S                      ; id = num ) EOF
  REDUCE L → S            ( L                      ; id = num ) EOF
  SHIFT                   ( L ;                    id = num ) EOF
  SHIFT SHIFT SHIFT       ( L ; id = num           ) EOF
  REDUCE S → id = num     ( L ; S                  ) EOF
  REDUCE L → L ; S        ( L                      ) EOF
  SHIFT                   ( L )                    EOF
  REDUCE S → ( L )        S                        EOF
  SHIFT                   S EOF
  REDUCE A → S EOF        A
  ACCEPT                  A

A successful parse! Is this grammar LL(1)?
Shift-reduce algorithm
- Parser keeps track of
  - position in current input (what input to read next)
  - a stack of terminal and non-terminal symbols representing the parse so far
- Based on next input symbol and stack, parser table indicates
  - shift: push next input on to top of stack
  - reduce R:
    - top of stack should match RHS of rule
    - replace top of stack with LHS of rule
  - error
  - accept (we shift EOF and can reduce what remains on stack to start symbol)
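The loop described above can be sketched in Python. This is an illustration only: the function `choose_reduction` is a hand-written stand-in for the parser table (a real LR parser derives its decisions from an automaton), it looks only at the stack because this small grammar never needs the lookahead, and the grammar is assumed to read A → S EOF, L → L ; S | S, S → ( L ) | id = num. All names are hypothetical.

```python
# Rules matched against the top of the stack, tried in this order.
RULES = [
    ("S", ("id", "=", "num")),
    ("L", ("L", ";", "S")),
    ("S", ("(", "L", ")")),
]

def choose_reduction(stack):
    """Stand-in for the parser table: return a rule to reduce by, or None to shift."""
    if stack == ["S", "EOF"]:
        return "A", ("S", "EOF")          # accept: reduce to the start symbol
    for lhs, rhs in RULES:
        if tuple(stack[-len(rhs):]) == rhs:
            return lhs, rhs
    if stack[-2:] == ["(", "S"]:
        return "L", ("S",)                # first statement of a parenthesised list
    return None

def parse(tokens):
    """Return the list of actions taken; raise SyntaxError if the parser gets stuck."""
    stack, pos, trace = [], 0, []
    while stack != ["A"]:
        choice = choose_reduction(stack)
        if choice is not None:
            lhs, rhs = choice
            del stack[-len(rhs):]         # top of stack matches the RHS: pop it ...
            stack.append(lhs)             # ... and replace it with the LHS
            trace.append(f"REDUCE {lhs} -> {' '.join(rhs)}")
        elif pos < len(tokens):
            stack.append(tokens[pos])     # shift: push next input onto the stack
            pos += 1
            trace.append("SHIFT")
        else:
            raise SyntaxError(f"stuck with stack {stack}")
    return trace

for action in parse("( id = num ; id = num ) EOF".split()):
    print(action)
```

On the input `( id = num ; id = num ) EOF` the trace consists of ten shifts and six reductions, ending with the reduction to A.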
Shift-reduce algorithm (a detail)
- The parser summarizes the current parse state using an integer
  - the integer is actually a state in a finite automaton
  - the current parse state can be computed by running the automaton over the current parse stack
- Revised algorithm: based on next input symbol and the parse state (as opposed to the entire stack), parser table indicates
  - shift s:
    - push next input on to top of stack and move automaton into state s
  - reduce R and goto s:
    - top of stack should match RHS of rule
    - replace top of stack with LHS of rule
    - move automaton into state s
  - error
  - accept
shift-reduce parsing
Grammar: ????
Input from lexer: ???? ???? EOF
State of parse so far: ????
Like LL parsing, shift-reduce parsing does not always work. What sort of grammar rules make shift-reduce parsing impossible?
- Shift-Reduce errors: can't decide whether to Shift or Reduce
- Reduce-Reduce errors: can't decide whether to Reduce by R1 or R2
shift-reduce errors
Grammar:
  A → S EOF
  S → S + S      S → S * S      S → id
notice, this is an ambiguous grammar; we are always going to need some mechanism for resolving the outstanding ambiguity before parsing
Input from lexer: id + id * id EOF
State of parse so far: S + S
- reduce by rule (S → S + S) or
- shift the ???
shift-reduce errors
Grammar:
  A → S id EOF
  S → E
  E → E E      E → id
some unambiguous grammars can't be parsed by LR(1) parsers either
Input from lexer: id id EOF
  (input might instead be id id id EOF, making shifting correct)
State of parse so far: E
- reduce by rule (S → E) or
- shift the id
reduce-reduce errors
Grammar:
  A → S EOF
  S → ( E )      S → E
  E → ( E )      E → E + E      E → id
Input from lexer: ( id ) EOF
State of parse so far: ( E )
- reduce by rule (S → ( E )) or
- reduce by rule (E → ( E ))
Summary
- Top-down Parsing
  - simple to understand and implement
  - you can code it yourself using nullable, first, follow sets
  - excellent for quick and dirty parsing jobs
- Bottom-up Parsing
  - more complex: uses stack and table
  - more powerful
  - Bonus: tools do the work for you => ML-Yacc
  - but you need to understand how shift-reduce and reduce-reduce errors can arise