Title: Bottom-up Parsing
1Bottom-up Parsing
2Parsing Techniques
- Top-down parsers (LL(1), recursive descent)
  - Start at the root of the parse tree and grow toward leaves
  - Pick a production and try to match the input
  - Bad pick ⇒ may need to backtrack
  - Some grammars are backtrack-free (predictive parsing)
- Bottom-up parsers (LR(1), operator precedence)
  - Start at the leaves and grow toward root
  - As input is consumed, encode possibilities in an internal state
  - Start in a state valid for legal first tokens
  - Bottom-up parsers handle a large class of grammars
3Bottom-up Parsing
(definitions)
- The point of parsing is to construct a derivation
- A derivation consists of a series of rewrite steps
  S ⇒ γ₀ ⇒ γ₁ ⇒ γ₂ ⇒ … ⇒ γₙ₋₁ ⇒ γₙ ⇒ sentence
- Each γᵢ is a sentential form
  - If γ contains only terminal symbols, γ is a sentence in L(G)
  - If γ contains ≥ 1 non-terminals, γ is a sentential form
- To get γᵢ from γᵢ₋₁, expand some NT A ∈ γᵢ₋₁ by using A → β
  - Replace the occurrence of A ∈ γᵢ₋₁ with β to get γᵢ
  - In a leftmost derivation, it would be the first NT A ∈ γᵢ₋₁
- A left-sentential form occurs in a leftmost derivation
- A right-sentential form occurs in a rightmost derivation
4Bottom-up Parsing
- Bottom-up parsing and reverse rightmost derivation
- A derivation consists of a series of rewrite steps
- A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S
  S ⇒ γ₀ ⇒ γ₁ ⇒ γ₂ ⇒ … ⇒ γₙ₋₁ ⇒ γₙ ⇒ sentence
  (bottom-up: the parser discovers these steps from right to left)
- In terms of the parse tree, this is working from leaves to root
  - Nodes with no parent in a partial tree form its upper fringe
  - Since each replacement of β with A shrinks the upper fringe, we call it a reduction
5Finding Reductions
(Handles)
- The parser must find a substring β of the tree's frontier that matches some production A → β that occurs as one step in the rightmost derivation
- Informally, we call this substring β a handle
- Formally,
  - A handle of a right-sentential form γ is a pair ⟨A→β, k⟩ where A→β ∈ P and k is the position in γ of β's rightmost symbol
  - If ⟨A→β, k⟩ is a handle, then replace β at k with A
- Handle Pruning
  - The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning
- Because γ is a right-sentential form, the substring to the right of a handle contains only terminal symbols
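Handle pruning can be sketched in a few lines of Python. This is an illustrative sketch only: it borrows the simplified right-recursive expression grammar from the example later in the deck, and the function names `reduce_handle` and `prune`, the `HANDLES` list, and the input `ident - ident * ident` are assumptions made for the illustration. The handles are supplied by hand; finding them automatically is the job of the LR(1) machinery.

```python
def reduce_handle(form, lhs, rhs, k):
    """Apply the handle (lhs -> rhs, k); k is the 1-based position of rhs's rightmost symbol."""
    start = k - len(rhs)
    if start < 0 or form[start:k] != list(rhs):
        raise ValueError(f"{rhs} does not end at position {k} of {form}")
    return form[:start] + [lhs] + form[k:]

# Hypothetical handle sequence for "ident - ident * ident": the reverse of the
# rightmost derivation under the simplified expression grammar.
HANDLES = [
    ("Factor", ("ident",), 1),
    ("Term", ("Factor",), 1),
    ("Factor", ("ident",), 3),
    ("Factor", ("ident",), 5),
    ("Term", ("Factor",), 5),
    ("Term", ("Factor", "*", "Term"), 5),
    ("Expr", ("Term",), 3),
    ("Expr", ("Term", "-", "Expr"), 3),
    ("Goal", ("Expr",), 1),
]

def prune(sentence, handles):
    """Return every right-sentential form visited, from the sentence back to the goal."""
    form = list(sentence)
    forms = [form]
    for lhs, rhs, k in handles:
        form = reduce_handle(form, lhs, rhs, k)
        forms.append(form)
    return forms

if __name__ == "__main__":
    for f in prune("ident - ident * ident".split(), HANDLES):
        print(" ".join(f))
```

In every step, the symbols to the right of position k are terminals, as the last bullet requires.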
6Example
(a very busy slide)
The expression grammar
Handles for rightmost derivation of x - 2 * y
7Handle-pruning, Bottom-up Parsers
- One implementation technique is the shift-reduce parser

  push INVALID
  token ← next_token( )
  repeat until (top of stack = Goal and token = EOF)
      if the top of the stack is a handle A→β
          then  /* reduce β to A */
              pop |β| symbols off the stack
              push A onto the stack
          else if (token ≠ EOF)
              then  /* shift */
                  push token
                  token ← next_token( )
8Back to x - 2 * y
5 shifts + 9 reduces + 1 accept
1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle and reduce
14Example
(Parse tree figure; leaf labels: ⟨id,x⟩, ⟨num,2⟩, ⟨id,y⟩)
15Shift-reduce Parsing
- Shift-reduce parsers are easily built and easily understood
- A shift-reduce parser has just four actions
  - Shift: next word is shifted onto the stack
  - Reduce: right end of handle is at top of stack
    - Locate left end of handle within the stack
    - Pop handle off stack and push appropriate lhs
  - Accept: stop parsing and report success
  - Error: call an error reporting/recovery routine
- Critical Question: How can we know when we have found a handle without generating lots of different derivations?
  - Answer: we use lookahead in the grammar along with tables produced as the result of analyzing the grammar
  - LR(1) parsers build a DFA that runs over the stack and finds them
- Handle finding is key
  - handle is on stack
  - finite set of handles ⇒ use a DFA!
16LR(1) Parsers
- A table-driven LR(1) parser looks like
  (figure; labels: source code, IR, grammar)
- Tables can be built by hand
- It is a perfect task to automate
17LR(1) Skeleton Parser
  stack.push(INVALID);
  stack.push(s0);
  not_found = true;
  token = scanner.next_token();
  do while (not_found)
      s = stack.top();
      if ( ACTION[s,token] == "reduce A→β" ) then
          stack.popnum(2*|β|);          // pop 2*|β| symbols
          s = stack.top();
          stack.push(A);
          stack.push(GOTO[s,A]);
      else if ( ACTION[s,token] == "shift si" ) then
          stack.push(token); stack.push(si);
          token ← scanner.next_token();
      else if ( ACTION[s,token] == "accept" and token == EOF ) then
          not_found = false;
      else report a syntax error and recover;
  report success;

- The skeleton parser
  - uses ACTION and GOTO tables
  - does |words| shifts
  - does |derivation| reductions
  - does 1 accept
  - detects errors by failure of 3 other cases
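The skeleton can be tried out with a small Python sketch. The `ACTION` and `GOTO` dictionaries below were worked out by hand for the simplified expression grammar used in the example later in the deck (Goal → Expr, Expr → Term - Expr | Term, Term → Factor * Term | Factor, Factor → ident); the production numbering, the tuple encoding of actions, and the name `parse` are assumptions of this sketch, not part of the slides.

```python
# Production number -> (lhs, length of rhs). Numbering is an assumption of this sketch.
PRODUCTIONS = {
    1: ("Expr", 3),    # Expr -> Term - Expr
    2: ("Expr", 1),    # Expr -> Term
    3: ("Term", 3),    # Term -> Factor * Term
    4: ("Term", 1),    # Term -> Factor
    5: ("Factor", 1),  # Factor -> ident
}
ACTION = {
    (0, "ident"): ("shift", 4),
    (1, "EOF"): ("accept",),
    (2, "-"): ("shift", 5), (2, "EOF"): ("reduce", 2),
    (3, "*"): ("shift", 6), (3, "-"): ("reduce", 4), (3, "EOF"): ("reduce", 4),
    (4, "*"): ("reduce", 5), (4, "-"): ("reduce", 5), (4, "EOF"): ("reduce", 5),
    (5, "ident"): ("shift", 4),
    (6, "ident"): ("shift", 4),
    (7, "EOF"): ("reduce", 1),
    (8, "-"): ("reduce", 3), (8, "EOF"): ("reduce", 3),
}
GOTO = {
    (0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
    (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
    (6, "Term"): 8, (6, "Factor"): 3,
}

def parse(tokens):
    """Run the skeleton parser; return the list of actions taken or raise SyntaxError."""
    words = list(tokens) + ["EOF"]
    pos = 0
    stack = ["INVALID", 0]            # symbols and states alternate; state on top
    trace = []
    while True:
        state, token = stack[-1], words[pos]
        act = ACTION.get((state, token))
        if act is None:               # none of the three other cases applies
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        trace.append(act)
        if act[0] == "shift":
            stack += [token, act[1]]
            pos += 1
        elif act[0] == "reduce":
            lhs, length = PRODUCTIONS[act[1]]
            del stack[len(stack) - 2 * length:]      # pop 2*|rhs| entries
            stack += [lhs, GOTO[(stack[-1], lhs)]]   # state exposed by the pop picks the GOTO
        else:                         # accept
            return trace

if __name__ == "__main__":
    for action in parse(["ident", "-", "ident", "*", "ident"]):
        print(action)
```

Calling `parse` on a token list returns the action trace ending in `("accept",)`; an input such as `["ident", "ident"]` falls through to the error case.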
18LR(1) Parsers
- How does this LR(1) stuff work?
  - Unambiguous grammar ⇒ unique rightmost derivation
  - Keep upper fringe on a stack
    - All active handles include top of stack (TOS)
    - Shift inputs until TOS is right end of a handle
  - Language of handles is regular (finite)
    - Build a handle-recognizing DFA
    - ACTION and GOTO tables encode the DFA
  - To match subterm, invoke subterm DFA and leave old DFA's state on stack
  - Final state in DFA ⇒ a reduce action
    - New state is GOTO[state at TOS (after pop), lhs]
    - For SN, this takes the DFA to s1
19Building LR(1) Parsers
- How do we generate the ACTION and GOTO tables?
  - Use the grammar to build a model of the DFA
  - Use the model to build ACTION and GOTO tables
  - If construction succeeds, the grammar is LR(1)
- The Big Picture
  - Model the state of the parser
  - Use two functions goto( s, X ) and closure( s ), where X is a terminal or non-terminal
    - goto() is analogous to move() in the subset construction
    - closure() adds information to round out a state
  - Build up the states and transition functions of the DFA
  - Use this information to fill in the ACTION and GOTO tables
20LR(k) items
- An LR(k) item is a pair [P, δ], where
  - P is a production A→β with a • at some position in the rhs
  - δ is a lookahead string of length ≤ k (words or EOF)
- The • in an item indicates the position of the top of the stack
  - [A→•βγ, a] means that the input seen so far is consistent with the use of A→βγ immediately after the symbol on top of the stack
  - [A→β•γ, a] means that the input seen so far is consistent with the use of A→βγ at this point in the parse, and that the parser has already recognized β
  - [A→βγ•, a] means that the parser has seen βγ, and that a lookahead symbol of a is consistent with reducing to A
- The table construction algorithm uses items to represent valid configurations of an LR(1) parser
21Computing Gotos
- Goto(s, X) computes the state that the parser would reach if it recognized an X while in state s
  - Goto( { [A→β•Xδ, a] }, X ) produces [A→βX•δ, a] (obviously)
  - It also includes closure( [A→βX•δ, a] ) to fill out the state
- The algorithm

  Goto( s, X )
      new ← Ø
      ∀ items [A→β•Xδ, a] ∈ s
          new ← new ∪ { [A→βX•δ, a] }
      return closure(new)

  - Not a fixed point method!
  - Straightforward computation
  - Uses closure( )
- Goto() advances the parse
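As a rough Python sketch of Goto, items can be encoded as `(lhs, rhs, dot, lookahead)` tuples, where `dot` is the index of the • in `rhs`. That encoding and the `closure` parameter are assumptions of this sketch. The default `closure=frozenset` is only a stand-in that adds nothing, so the call below returns the advanced items without filling out the state.

```python
def goto(state, X, closure=frozenset):
    """Advance the dot over X in every item of `state` that has the dot in front of X."""
    moved = set()                                  # new <- empty set
    for (lhs, rhs, dot, a) in state:
        if dot < len(rhs) and rhs[dot] == X:       # item has the form [A -> beta . X delta, a]
            moved.add((lhs, rhs, dot + 1, a))      # add [A -> beta X . delta, a]
    return closure(moved)                          # a real closure() would fill out the state

# Hypothetical state with two items: [Expr -> Term . - Expr, EOF] and [Expr -> Term ., EOF]
s2 = {
    ("Expr", ("Term", "-", "Expr"), 1, "EOF"),
    ("Expr", ("Term",), 1, "EOF"),
}

if __name__ == "__main__":
    print(goto(s2, "-"))
```

Only the first item can advance over `-`; the second has its dot at the right end and contributes nothing.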
22Computing Closures
- Closure(s) adds all the items implied by items already in s
  - Any item [A→β•Bδ, a] implies [B→•τ, x] for each production with B on the lhs, and each x ∈ FIRST(δa)
  - Since βBδ is valid, any way to derive βBδ is valid, too
- The algorithm

  Closure( s )
      while ( s is still changing )
          ∀ items [A→β•Bδ, a] ∈ s
              ∀ productions B→τ ∈ P
                  ∀ b ∈ FIRST(δa)        // δ might be ε
                      if [B→•τ, b] ∉ s
                          then add [B→•τ, b] to s

  - Classic fixed-point algorithm
  - Halts because s ⊂ ITEMS
  - Worklist version is faster
- Closure fills out a state
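The fixed-point loop translates fairly directly into Python. The sketch below assumes items are `(lhs, rhs, dot, lookahead)` tuples and that the grammar has no ε-productions, so FIRST(δa) is just FIRST of the first symbol of δ, or {a} when δ is empty. The grammar is the simplified expression grammar from the example later in the deck; the names `GRAMMAR`, `first_sets`, and `closure` are choices made for this illustration.

```python
GRAMMAR = {
    "Goal":   [("Expr",)],
    "Expr":   [("Term", "-", "Expr"), ("Term",)],
    "Term":   [("Factor", "*", "Term"), ("Factor",)],
    "Factor": [("ident",)],
}

def first_sets(grammar):
    """FIRST for every non-terminal; assumes the grammar has no epsilon-productions."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                add = first[head] if head in grammar else {head}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
    return first

def closure(items, grammar, first):
    """Add every item implied by the items already present (fixed-point version)."""
    s = set(items)
    changed = True
    while changed:                                    # while s is still changing
        changed = False
        for (lhs, rhs, dot, a) in list(s):
            if dot == len(rhs) or rhs[dot] not in grammar:
                continue                              # dot is not in front of a non-terminal
            B, delta = rhs[dot], rhs[dot + 1:]
            if delta:                                 # FIRST(delta a), no epsilon rules
                d = delta[0]
                lookaheads = first[d] if d in grammar else {d}
            else:
                lookaheads = {a}
            for tau in grammar[B]:
                for b in lookaheads:
                    item = (B, tau, 0, b)
                    if item not in s:
                        s.add(item)
                        changed = True
    return frozenset(s)

if __name__ == "__main__":
    s0 = closure({("Goal", ("Expr",), 0, "EOF")}, GRAMMAR, first_sets(GRAMMAR))
    for item in sorted(s0):
        print(item)
```

A grammar with ε-productions would need a fuller FIRST(δa) computation than the one-symbol shortcut used here.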
23LR(1) Items
- The production A→β, where β = B₁B₂B₃ with lookahead a, can give rise to 4 items
  [A→•B₁B₂B₃, a], [A→B₁•B₂B₃, a], [A→B₁B₂•B₃, a], and [A→B₁B₂B₃•, a]
- The set of LR(1) items for a grammar is finite
- What's the point of all these lookahead symbols?
  - Carry them along to choose correct reduction (if a choice occurs)
  - Lookaheads are bookkeeping, unless item has • at right end
    - Has no direct use in [A→β•γ, a]
    - In [A→β•, a], a lookahead of a implies a reduction by A→β
    - For { [A→β•, a], [B→γ•δ, b] }, a ⇒ reduce to A; FIRST(δ) ⇒ shift
- Limited right context is enough to pick the actions
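The count of four items is easy to check mechanically. The sketch below uses a hypothetical `Item` tuple and `items_for` helper (both invented for this illustration) to enumerate every dot position for one production and one lookahead.

```python
from typing import NamedTuple, Tuple

class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int            # index in rhs where the dot sits
    lookahead: str

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"[{self.lhs} → {' '.join(symbols)}, {self.lookahead}]"

def items_for(lhs, rhs, lookahead):
    """One item per dot position: len(rhs) + 1 items in total."""
    return [Item(lhs, tuple(rhs), d, lookahead) for d in range(len(rhs) + 1)]

if __name__ == "__main__":
    for item in items_for("A", ["B1", "B2", "B3"], "a"):
        print(item)
```

Since a grammar has finitely many productions, dot positions, and lookahead symbols, the full set of LR(1) items is finite as well.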
24LR(1) Table Construction
- High-level overview
  - Build the canonical collection of sets of LR(1) items, I
    - Begin in an appropriate state, s0
      - [S'→•S, EOF], along with any equivalent items
      - Derive equivalent items as closure( i0 )
    - Repeatedly compute, for each sk, and each X, goto(sk, X)
      - If the set is not already in the collection, add it
      - Record all the transitions created by goto( )
    - This eventually reaches a fixed point
  - Fill in the table from the collection of sets of LR(1) items
- The canonical collection completely encodes the transition diagram for the handle-finding DFA
25Building the Canonical Collection
- Start from s0 = closure( [S'→•S, EOF] )
- Repeatedly construct new states, until all are found
- The algorithm

  s0 ← closure( [S'→•S, EOF] )
  S ← { s0 }
  k ← 1
  while ( S is still changing )
      ∀ sj ∈ S and ∀ x ∈ ( T ∪ NT )
          sk ← goto(sj, x)
          record sj → sk on x
          if sk ∉ S then
              S ← S ∪ { sk }
              k ← k + 1

  - Fixed-point computation
  - Loop adds to S
  - S ⊆ 2^ITEMS, so S is finite
  - Worklist version is faster
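A worklist version of the construction might look like the following Python sketch. It is written for the simplified expression grammar from the example that follows, with FIRST sets hard-coded for that grammar and no ε-productions assumed; the item encoding `(lhs, rhs, dot, lookahead)` and all function names are choices made for this illustration.

```python
GRAMMAR = {
    "Goal":   [("Expr",)],
    "Expr":   [("Term", "-", "Expr"), ("Term",)],
    "Term":   [("Factor", "*", "Term"), ("Factor",)],
    "Factor": [("ident",)],
}
TERMINALS = ["-", "*", "ident"]
# Hard-coded for this grammar: every non-terminal starts with ident.
FIRST = {nt: {"ident"} for nt in GRAMMAR}

def first_of(delta, a):
    """FIRST(delta a) when no symbol derives epsilon."""
    return FIRST.get(delta[0], {delta[0]}) if delta else {a}

def closure(items):
    s, work = set(items), list(items)
    while work:                                       # worklist instead of "still changing"
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for tau in GRAMMAR[rhs[dot]]:
                for b in first_of(rhs[dot + 1:], a):
                    item = (rhs[dot], tau, 0, b)
                    if item not in s:
                        s.add(item)
                        work.append(item)
    return frozenset(s)

def goto(state, x):
    return closure({(l, r, d + 1, a) for (l, r, d, a) in state
                    if d < len(r) and r[d] == x})

def canonical_collection():
    """Return (states, transitions); a state's number is its index in `states`."""
    states = [closure({("Goal", ("Expr",), 0, "EOF")})]
    transitions = {}                                  # (j, x) -> k
    work = [0]
    symbols = list(GRAMMAR) + TERMINALS
    while work:
        j = work.pop(0)
        for x in symbols:
            sk = goto(states[j], x)
            if not sk:
                continue                              # no item in sj has the dot before x
            if sk not in states:
                states.append(sk)
                work.append(len(states) - 1)
            transitions[(j, x)] = states.index(sk)    # record sj -> sk on x
    return states, transitions

if __name__ == "__main__":
    states, transitions = canonical_collection()
    print(len(states), "states")
    for (j, x), k in sorted(transitions.items()):
        print(f"goto(s{j}, {x}) = s{k}")
```

With the symbol order used here, states are discovered in the same order as in the example slides, so the indices line up with s0 through s8.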
26Example
(grammar and sets)
- Simplified, right recursive expression grammar
  Goal → Expr
  Expr → Term - Expr
  Expr → Term
  Term → Factor * Term
  Term → Factor
  Factor → ident
27Example (building the collection)
- Initialization Step
  - s0 ← closure( { [Goal → • Expr, EOF] } )
    { [Goal → • Expr, EOF], [Expr → • Term - Expr, EOF], [Expr → • Term, EOF],
      [Term → • Factor * Term, EOF], [Term → • Factor * Term, -],
      [Term → • Factor, EOF], [Term → • Factor, -],
      [Factor → • ident, EOF], [Factor → • ident, -], [Factor → • ident, *] }
  - S ← { s0 }
28Example (building the collection)
- Iteration 1
  - s1 ← goto(s0, Expr)
  - s2 ← goto(s0, Term)
  - s3 ← goto(s0, Factor)
  - s4 ← goto(s0, ident)
- Iteration 2
  - s5 ← goto(s2, -)
  - s6 ← goto(s3, *)
- Iteration 3
  - s7 ← goto(s5, Expr)
  - s8 ← goto(s6, Term)
29Example
(Summary)
- S0: { [Goal → • Expr, EOF], [Expr → • Term - Expr, EOF], [Expr → • Term, EOF], [Term → • Factor * Term, EOF], [Term → • Factor * Term, -], [Term → • Factor, EOF], [Term → • Factor, -], [Factor → • ident, EOF], [Factor → • ident, -], [Factor → • ident, *] }
- S1: { [Goal → Expr •, EOF] }
- S2: { [Expr → Term • - Expr, EOF], [Expr → Term •, EOF] }
- S3: { [Term → Factor • * Term, EOF], [Term → Factor • * Term, -], [Term → Factor •, EOF], [Term → Factor •, -] }
- S4: { [Factor → ident •, EOF], [Factor → ident •, -], [Factor → ident •, *] }
- S5: { [Expr → Term - • Expr, EOF], [Expr → • Term - Expr, EOF], [Expr → • Term, EOF], [Term → • Factor * Term, -], [Term → • Factor, -], [Term → • Factor * Term, EOF], [Term → • Factor, EOF], [Factor → • ident, *], [Factor → • ident, -], [Factor → • ident, EOF] }
30Example
(Summary)
- S6: { [Term → Factor * • Term, EOF], [Term → Factor * • Term, -], [Term → • Factor * Term, EOF], [Term → • Factor * Term, -], [Term → • Factor, EOF], [Term → • Factor, -], [Factor → • ident, EOF], [Factor → • ident, -], [Factor → • ident, *] }
- S7: { [Expr → Term - Expr •, EOF] }
- S8: { [Term → Factor * Term •, EOF], [Term → Factor * Term •, -] }
31Filling in the ACTION and GOTO Tables
- The algorithm

  ∀ set sx ∈ S
      ∀ item i ∈ sx
          if i is [A→β•aδ, b] and goto(sx, a) = sk, a ∈ T
              then ACTION[x, a] ← "shift k"
          else if i is [S'→S•, EOF]
              then ACTION[x, EOF] ← "accept"
          else if i is [A→β•, a]
              then ACTION[x, a] ← "reduce A→β"
      ∀ n ∈ NT
          if goto(sx, n) = sk
              then GOTO[x, n] ← k

  (x is the state number)
- Many items generate no table entry
  - Closure( ) instantiates FIRST(X) directly for [A→β•Xδ, a]
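The table-filling loop can be sketched in Python as follows. The item encoding `(lhs, rhs, dot, lookahead)`, the dictionary layout of the tables, and the name `fill_tables` are assumptions of this sketch. The demo data is only a fragment of the example collection (states 0 through 4, with states 5 and 6 appearing only as transition targets), and the sketch overwrites entries silently, whereas a real generator would report a conflict.

```python
def fill_tables(states, transitions, terminals, nonterminals, goal="Goal"):
    """states: {x: set of items}; transitions: {(x, symbol): k}. Returns (ACTION, GOTO)."""
    action, goto_table = {}, {}
    for x, items in states.items():
        for (lhs, rhs, dot, a) in items:
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in terminals and (x, sym) in transitions:
                    action[(x, sym)] = ("shift", transitions[(x, sym)])
                # dot before a non-terminal: the item generates no table entry
            elif lhs == goal and a == "EOF":
                action[(x, "EOF")] = ("accept",)
            else:
                action[(x, a)] = ("reduce", lhs, rhs)
        for n in nonterminals:
            if (x, n) in transitions:
                goto_table[(x, n)] = transitions[(x, n)]
    return action, goto_table

E, T, F = ("Term", "-", "Expr"), ("Factor", "*", "Term"), ("ident",)
STATES = {   # fragment of the example collection: states 0..4 only
    0: {("Goal", ("Expr",), 0, "EOF"), ("Expr", E, 0, "EOF"), ("Expr", ("Term",), 0, "EOF"),
        ("Term", T, 0, "EOF"), ("Term", T, 0, "-"),
        ("Term", ("Factor",), 0, "EOF"), ("Term", ("Factor",), 0, "-"),
        ("Factor", F, 0, "EOF"), ("Factor", F, 0, "-"), ("Factor", F, 0, "*")},
    1: {("Goal", ("Expr",), 1, "EOF")},
    2: {("Expr", E, 1, "EOF"), ("Expr", ("Term",), 1, "EOF")},
    3: {("Term", T, 1, "EOF"), ("Term", T, 1, "-"),
        ("Term", ("Factor",), 1, "EOF"), ("Term", ("Factor",), 1, "-")},
    4: {("Factor", F, 1, "EOF"), ("Factor", F, 1, "-"), ("Factor", F, 1, "*")},
}
TRANSITIONS = {(0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3, (0, "ident"): 4,
               (2, "-"): 5, (3, "*"): 6}

if __name__ == "__main__":
    action, goto_table = fill_tables(STATES, TRANSITIONS,
                                     {"-", "*", "ident"}, {"Goal", "Expr", "Term", "Factor"})
    for key in sorted(action):
        print("ACTION", key, action[key])
    for key in sorted(goto_table):
        print("GOTO", key, goto_table[key])
```

Most of the ten items in state 0 produce nothing: only the three Factor items, which have the dot in front of the terminal `ident`, lead to a shift entry.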
32Example
(Summary)
- The Goto Relationship (from the construction)
33Example
(Filling in the tables)
- The algorithm produces the following table
34What can go wrong?
- What if set s contains [A→β•aγ, b] and [B→β•, a]?
  - First item generates "shift", second generates "reduce"
  - Both define ACTION[s, a]: cannot do both actions
  - This is a fundamental ambiguity, called a shift/reduce conflict
  - Modify the grammar to eliminate it (if-then-else)
  - Shifting will often resolve it correctly
- What if set s contains [A→γ•, a] and [B→γ•, a]?
  - Each generates "reduce", but with a different production
  - Both define ACTION[s, a]: cannot do both reductions
  - This is a fundamental ambiguity, called a reduce/reduce conflict
  - Modify the grammar to eliminate it (PL/I's overloading of (...))
- In either case, the grammar is not LR(1)
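Both kinds of conflict can be detected by inspecting one state at a time. The Python sketch below assumes items are `(lhs, rhs, dot, lookahead)` tuples; the function name `find_conflicts` and the two sample states (a dangling-else state and a pair of competing reductions) are hypothetical illustrations, not taken from the slides.

```python
def find_conflicts(items, terminals):
    """Return {lookahead: kind of conflict} for a single state."""
    shifts = set()          # terminals the state can shift
    reduces = {}            # lookahead -> set of productions that could be reduced
    for (lhs, rhs, dot, a) in items:
        if dot < len(rhs):
            if rhs[dot] in terminals:
                shifts.add(rhs[dot])
        else:
            reduces.setdefault(a, set()).add((lhs, rhs))
    conflicts = {}
    for a, productions in reduces.items():
        if len(productions) > 1:
            conflicts[a] = "reduce/reduce"      # reported first if both kinds occur
        elif a in shifts:
            conflicts[a] = "shift/reduce"
    return conflicts

# Hypothetical dangling-else state: after "if expr then Stmt" with "else" as lookahead.
DANGLING_ELSE = {
    ("Stmt", ("if", "expr", "then", "Stmt"), 4, "else"),
    ("Stmt", ("if", "expr", "then", "Stmt", "else", "Stmt"), 4, "else"),
}
# Hypothetical state in which two productions with the same rhs are both complete.
SAME_RHS = {
    ("A", ("x",), 1, "EOF"),
    ("B", ("x",), 1, "EOF"),
}

if __name__ == "__main__":
    print(find_conflicts(DANGLING_ELSE, {"if", "expr", "then", "else"}))
    print(find_conflicts(SAME_RHS, {"x"}))
```

A table generator would typically run a check like this while filling ACTION, and either report the conflict or apply a policy such as preferring the shift.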
35LR(k) versus LL(k) (Top-down Recursive
Descent )
- Finding Reductions
  - LR(k) ⇒ Each reduction in the parse is detectable with
    - the complete left context,
    - the reducible phrase, itself, and
    - the k terminal symbols to its right
  - LL(k) ⇒ Parser must select the reduction based on
    - the complete left context
    - the next k terminals
- Thus, LR(k) examines more context
- "... in practice, programming languages do not actually seem to fall in the gap between LL(1) languages and deterministic languages." J.J. Horning, "LR Grammars and Analysers", in Compiler Construction, An Advanced Course, Springer-Verlag, 1976
36Summary