Transcript: Bottom-up Parsing (PowerPoint presentation)

1
Bottom-up Parsing
2
Parsing Techniques
  • Top-down parsers (LL(1), recursive descent)
  • Start at the root of the parse tree and grow
    toward leaves
  • Pick a production and try to match the input
  • Bad pick ⇒ may need to backtrack
  • Some grammars are backtrack-free
    (predictive parsing)
  • Bottom-up parsers (LR(1), operator
    precedence)
  • Start at the leaves and grow toward root
  • As input is consumed, encode possibilities in an
    internal state
  • Start in a state valid for legal first tokens
  • Bottom-up parsers handle a large class of grammars

3
Bottom-up Parsing
(definitions)
  • The point of parsing is to construct a derivation
  • A derivation consists of a series of rewrite
    steps
  • S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn−1 ⇒ γn = sentence
  • Each γi is a sentential form
  • If γ contains only terminal symbols, γ is a
    sentence in L(G)
  • If γ contains ≥ 1 non-terminals, γ is a
    sentential form
  • To get γi from γi−1, expand some NT A ∈ γi−1 by
    using A → β
  • Replace the occurrence of A ∈ γi−1 with β to get
    γi
  • In a leftmost derivation, it would be the first
    NT A ∈ γi−1
  • A left-sentential form occurs in a leftmost
    derivation
  • A right-sentential form occurs in a rightmost
    derivation
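These definitions can be made concrete with a short sketch. It performs one rightmost derivation under the simplified right-recursive expression grammar that appears later in the deck (slide 26); the derivation target (ident + ident) and the hand-picked production choices are my own illustration, not from the slides.

```python
# One rightmost derivation under the simplified expression grammar.
# Each step expands the RIGHTMOST non-terminal, so every intermediate
# list of symbols is a right-sentential form.
GRAMMAR = {
    "Goal":   [["Expr"]],
    "Expr":   [["Term", "+", "Expr"], ["Term"]],
    "Term":   [["Factor", "*", "Term"], ["Factor"]],
    "Factor": [["ident"]],
}
NT = set(GRAMMAR)

def rewrite(form, choice):
    """Expand the rightmost non-terminal using production index `choice`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in NT:
            return form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
    return form  # only terminals left: already a sentence

forms = [["Goal"]]
for choice in [0, 0, 1, 1, 0, 1, 0]:   # hand-picked choices for this sketch
    forms.append(rewrite(forms[-1], choice))

for f in forms:                         # S = γ0 ⇒ γ1 ⇒ ... ⇒ sentence
    print(" ".join(f))
```

Read bottom-to-top, the printed forms are exactly the sequence of reductions a bottom-up parser would perform.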

4
Bottom-up Parsing
  • Bottom-up parsing and reverse rightmost
    derivation
  • A derivation consists of a series of rewrite
    steps
  • A bottom-up parser builds a derivation by working
    from the input sentence back toward the start
    symbol S
  • S = γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn−1 ⇒ γn
    = sentence
  • In terms of the parse tree, this is working from
    leaves to root
  • Nodes with no parent in a partial tree form its
    upper fringe
  • Since each replacement of β with A shrinks the
    upper fringe,
  • we call it a reduction.

5
Finding Reductions
(Handles)
  • The parser must find a substring β of the tree's
    frontier that
  • matches some production A → β that occurs as one
    step
  • in the rightmost derivation
  • Informally, we call this substring β a handle
  • Formally,
  • A handle of a right-sentential form γ is a pair
    ⟨A→β, k⟩ where
  • A→β ∈ P and k is the position in γ of β's
    rightmost symbol.
  • If ⟨A→β, k⟩ is a handle, then replace β at k with
    A
  • Handle Pruning
  • The process of discovering a handle and reducing
    it to the appropriate left-hand side is called
    handle pruning
  • Because γ is a right-sentential form, the
    substring to the right of a handle contains only
    terminal symbols

6
Example
(a very busy slide)
The expression grammar
Handles for rightmost derivation of x - 2 * y
7
Handle-pruning, Bottom-up Parsers
  • One implementation technique is the shift-reduce
    parser

push INVALID
token ← next_token( )
repeat until (top of stack = Goal and token = EOF)
    if the top of the stack is a handle A → β
    then   /* reduce β to A */
        pop |β| symbols off the stack
        push A onto the stack
    else if (token ≠ EOF)
    then   /* shift */
        push token
        token ← next_token( )
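The loop above can be sketched in runnable form. Since real handle-finding is the subject of the following slides, this sketch drives the loop with a scripted action sequence (an "oracle") for the input x - 2 * y; the classic left-recursive expression grammar assumed here is my own choice, and the script reproduces the deck's count of 5 shifts, 9 reduces, and 1 accept.

```python
# Shift-reduce loop for x - 2 * y, driven by a scripted action list
# (the handle-finding "oracle"). Grammar assumed (my choice):
#   Goal -> Expr ; Expr -> Expr - Term | Term
#   Term -> Term * Factor | Factor ; Factor -> id | num

tokens = ["id", "-", "num", "*", "id"]        # x - 2 * y
actions = [
    ("shift",), ("reduce", "Factor", 1), ("reduce", "Term", 1),
    ("reduce", "Expr", 1),
    ("shift",), ("shift",), ("reduce", "Factor", 1), ("reduce", "Term", 1),
    ("shift",), ("shift",), ("reduce", "Factor", 1),
    ("reduce", "Term", 3),                    # Term -> Term * Factor
    ("reduce", "Expr", 3),                    # Expr -> Expr - Term
    ("reduce", "Goal", 1),                    # Goal -> Expr
]

stack, pos = [], 0
for act in actions:
    if act[0] == "shift":                     # push next word onto the stack
        stack.append(tokens[pos])
        pos += 1
    else:                                     # pop |rhs| symbols, push the lhs
        _, lhs, rhs_len = act
        del stack[-rhs_len:]
        stack.append(lhs)

shifts = sum(a[0] == "shift" for a in actions)
reduces = len(actions) - shifts
print(stack, shifts, reduces)                 # ['Goal'] 5 9, then 1 accept
```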

8
Back to x - 2 * y
5 shifts + 9 reduces + 1 accept
1. Shift until the top of the stack is the right
   end of a handle
2. Find the left end of the handle and reduce
9
Back to x - 2 * y
5 shifts + 9 reduces + 1 accept
1. Shift until the top of the stack is the right
   end of a handle
2. Find the left end of the handle and reduce
(slides 10 through 13 repeat these two steps as the
parse trace advances)
14
Example


⟨id,y⟩
⟨id,x⟩
⟨num,2⟩
15
Shift-reduce Parsing
  • Shift-reduce parsers are easily built and easily
    understood
  • A shift-reduce parser has just four actions
  • Shift: next word is shifted onto the stack
  • Reduce: right end of handle is at top of stack
  • Locate left end of handle within the stack
  • Pop handle off stack, push appropriate lhs
  • Accept: stop parsing, report success
  • Error: call an error reporting/recovery routine
  • Critical Question: How can we know when we have
    found a handle without generating lots of
    different derivations?
  • Answer: we use lookahead in the grammar along
    with tables produced as the result of analyzing
    the grammar.
  • LR(1) parsers build a DFA that runs over the
    stack to find them
  • Handle finding is key
  • handle is on stack
  • finite set of handles
  • use a DFA!

16
LR(1) Parsers
  • A table-driven LR(1) parser looks like
  • Tables can be built by hand
  • It is a perfect task to automate

(figure: source code → scanner → parser → IR, with
the parser driven by tables built from the grammar)
17
LR(1) Skeleton Parser
stack.push(INVALID)
stack.push(s0)
not_found ← true
token ← scanner.next_token()
while (not_found)
    s ← stack.top()
    if ( ACTION[s,token] = "reduce A → β" ) then
        stack.popnum(2·|β|)       // pop 2·|β| symbols
        s ← stack.top()
        stack.push(A)
        stack.push(GOTO[s,A])
    else if ( ACTION[s,token] = "shift si" ) then
        stack.push(token)
        stack.push(si)
        token ← scanner.next_token()
    else if ( ACTION[s,token] = "accept" and token = EOF )
        then not_found ← false
    else
        report a syntax error and recover
report success
  • The skeleton parser
  • uses ACTION & GOTO tables
  • does |words| shifts
  • does |derivation| reductions
  • does 1 accept
  • detects errors by failure of the 3 other cases
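A minimal runnable version of the skeleton, assuming a tiny hand-built LR(1) table. The grammar (S′ → S; S → a S | a) and its ACTION/GOTO entries are my own toy example, not the deck's; the driver loop mirrors the pseudocode above, pushing (symbol, state) pairs and popping 2·|β| entries on a reduce.

```python
# Table-driven LR(1) skeleton for the toy grammar S' -> S ; S -> a S | a.
EOF = "$"
# ACTION[state][token]: ("shift", state), ("reduce", lhs, |rhs|), or ("accept",)
ACTION = {
    0: {"a": ("shift", 2)},
    1: {EOF: ("accept",)},
    2: {"a": ("shift", 2), EOF: ("reduce", "S", 1)},   # S -> a
    3: {EOF: ("reduce", "S", 2)},                      # S -> a S
}
GOTO = {0: {"S": 1}, 2: {"S": 3}}

def parse(tokens):
    tokens = tokens + [EOF]
    stack = [0]                       # state s0; the INVALID sentinel is implicit
    i = 0
    while True:
        s = stack[-1]
        act = ACTION.get(s, {}).get(tokens[i])
        if act is None:
            return False              # syntax error
        if act[0] == "shift":
            stack += [tokens[i], act[1]]
            i += 1
        elif act[0] == "reduce":
            _, lhs, rhs_len = act
            del stack[-2 * rhs_len:]  # pop 2*|rhs| symbols
            stack += [lhs, GOTO[stack[-1]][lhs]]
        else:                         # accept
            return True

print(parse(["a", "a", "a"]))   # True
print(parse([]))                # False: the empty string is not in L
```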

18
LR(1) Parsers
  • How does this LR(1) stuff work?
  • Unambiguous grammar ⇒ unique rightmost derivation
  • Keep upper fringe on a stack
  • All active handles include top of stack (TOS)
  • Shift inputs until TOS is right end of a handle
  • Language of handles is regular (finite)
  • Build a handle-recognizing DFA
  • ACTION & GOTO tables encode the DFA
  • To match subterm, invoke subterm DFA
  • leave old DFA's state on stack
  • Final state in DFA ⇒ a reduce action
  • New state is GOTO[state at TOS (after pop), lhs]
  • For S → N, this takes the DFA to s1

19
Building LR(1) Parsers
  • How do we generate the ACTION and GOTO tables?
  • Use the grammar to build a model of the DFA
  • Use the model to build ACTION & GOTO tables
  • If construction succeeds, the grammar is LR(1)
  • The Big Picture
  • Model the state of the parser
  • Use two functions goto( s, X ) and closure( s )
  • goto() is analogous to move() in the subset
    construction
  • closure() adds information to round out a state
  • Build up the states and transition functions of
    the DFA
  • Use this information to fill in the ACTION and
    GOTO tables

(X in goto( s, X ) can be a terminal or a non-terminal)
20
LR(k) items
  • An LR(k) item is a pair [P, δ], where
  • P is a production A→β with a • at some position
    in the rhs
  • δ is a lookahead string of length ≤ k
    (words or EOF)
  • The • in an item indicates the position of the
    top of the stack
  • [A→•βγ, a] means that the input seen so far is
    consistent with the use of A→βγ immediately
    after the symbol on top of the stack
  • [A→β•γ, a] means that the input seen so far is
    consistent with the use of A→βγ at this point in
    the parse, and that the parser has already
    recognized β.
  • [A→βγ•, a] means that the parser has seen βγ, and
    that a lookahead symbol of a is consistent with
    reducing to A.
  • The table construction algorithm uses items to
    represent valid
  • configurations of an LR(1) parser

21
Computing Gotos
  • Goto(s, X) computes the state that the parser
    would reach
  • if it recognized an X while in state s
  • Goto( { [A→β•Xδ, a] }, X ) produces [A→βX•δ, a]
    (obviously)
  • It also includes closure( [A→βX•δ, a] ) to fill
    out the state
  • The algorithm
  • Not a fixed-point method!
  • Straightforward computation
  • Uses closure( )
  • Goto() advances the parse

Goto( s, X )
    new ← Ø
    ∀ items [A→β•Xδ, a] ∈ s
        new ← new ∪ { [A→βX•δ, a] }
    return closure(new)
22
Computing Closures
  • Closure(s) adds all the items implied by items
    already in s
  • Any item [A→β•Bδ, a] implies [B→•τ, x] for each
    production
  • with B on the lhs, and each x ∈ FIRST(δa)
  • Since βBδ is valid, any way to derive βBδ is
    valid, too
  • The algorithm

Closure( s )
    while ( s is still changing )
        ∀ items [A → β•Bδ, a] ∈ s
            ∀ productions B → τ ∈ P
                ∀ b ∈ FIRST(δa)    // δ might be ε
                    if [B → •τ, b] ∉ s
                        then add [B → •τ, b] to s
  • Classic fixed-point algorithm
  • Halts because s ⊆ ITEMS
  • Worklist version is faster
  • Closure fills out a state
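The fixed-point loop above can be sketched directly, using the simplified right-recursive expression grammar from slide 26. The item encoding (lhs, rhs, dot position, lookahead) and the hard-coded FIRST sets are my own; with no ε-productions, FIRST(δa) is just the first symbol of δ, or the lookahead a when δ is empty.

```python
# closure( ) as a fixed-point, for slide 26's grammar. An item is
# (lhs, rhs, dot, lookahead); this encoding is my own.
GRAMMAR = {
    "Goal":   [("Expr",)],
    "Expr":   [("Term", "+", "Expr"), ("Term",)],
    "Term":   [("Factor", "*", "Term"), ("Factor",)],
    "Factor": [("ident",)],
}
NT = set(GRAMMAR)
FIRST = {"ident": {"ident"}, "+": {"+"}, "*": {"*"}}
FIRST.update({n: {"ident"} for n in NT})   # no ε-productions here

def closure(s):
    s = set(s)
    changed = True
    while changed:                 # classic fixed-point loop
        changed = False
        for (lhs, rhs, dot, la) in list(s):
            if dot < len(rhs) and rhs[dot] in NT:    # [A -> β•Bδ, a]
                delta = rhs[dot + 1:]
                lookaheads = FIRST[delta[0]] if delta else {la}  # FIRST(δa)
                for tau in GRAMMAR[rhs[dot]]:
                    for b in lookaheads:
                        item = (rhs[dot], tau, 0, b)
                        if item not in s:
                            s.add(item)
                            changed = True
    return frozenset(s)

s0 = closure({("Goal", ("Expr",), 0, "<EOF>")})
print(len(s0))   # 10 items, matching s0 on slide 27
```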

23
LR(1) Items
  • The production A→β, where β = B1B2B3, with
    lookahead a, can give rise to 4 items
  • [A→•B1B2B3, a], [A→B1•B2B3, a], [A→B1B2•B3, a],
    [A→B1B2B3•, a]
  • The set of LR(1) items for a grammar is finite
  • What's the point of all these lookahead symbols?
  • Carry them along to choose the correct reduction
    (if a choice occurs)
  • Lookaheads are bookkeeping, unless item has • at
    its right end
  • Has no direct use in [A→β•γ, a]
  • In [A→β•, a], a lookahead of a implies a reduction
    by A → β
  • For { [A→β•, a], [B→γ•δ, b] }, a ⇒ reduce to A;
    FIRST(δ) ⇒ shift
  • Limited right context is enough to pick the
    actions

24
LR(1) Table Construction
  • High-level overview
  • Build the canonical collection of sets of LR(1)
    Items, I
  • Begin in an appropriate state, s0
  • [S′ → •S, EOF], along with any equivalent items
  • Derive equivalent items as closure( i0 )
  • Repeatedly compute, for each sk, and each X,
    goto(sk,X)
  • If the set is not already in the collection, add
    it
  • Record all the transitions created by goto( )
  • This eventually reaches a fixed point
  • Fill in the table from the collection of sets of
    LR(1) items
  • The canonical collection completely encodes the
  • transition diagram for the handle-finding DFA

25
Building the Canonical Collection
  • Start from s0 = closure( { [S′ → •S, EOF] } )
  • Repeatedly construct new states, until all are
    found
  • The algorithm

s0 ← closure( { [S′ → •S, EOF] } )
S ← { s0 }
k ← 1
while ( S is still changing )
    ∀ sj ∈ S and ∀ x ∈ ( T ∪ NT )
        sk ← goto(sj, x)
        record sj → sk on x
        if sk ∉ S then
            S ← S ∪ { sk }
            k ← k + 1
  • Fixed-point computation
  • Loop adds to S
  • S ⊆ 2^ITEMS, so S is finite
  • Worklist version is faster
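A self-contained sketch of the whole construction (closure, goto, and the canonical-collection loop above) for the deck's simplified expression grammar; the item encoding (lhs, rhs, dot, lookahead) is my own. It should reach the same fixed point as slides 27-30: nine states, s0 through s8.

```python
# Canonical collection of sets of LR(1) items for:
#   Goal -> Expr ; Expr -> Term + Expr | Term
#   Term -> Factor * Term | Factor ; Factor -> ident
GRAMMAR = {
    "Goal":   [("Expr",)],
    "Expr":   [("Term", "+", "Expr"), ("Term",)],
    "Term":   [("Factor", "*", "Term"), ("Factor",)],
    "Factor": [("ident",)],
}
NT = set(GRAMMAR)
TERMINALS = {"ident", "+", "*"}
FIRST = {t: {t} for t in TERMINALS}
FIRST.update({n: {"ident"} for n in NT})   # no ε-productions here

def closure(s):
    s = set(s)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, la) in list(s):
            if dot < len(rhs) and rhs[dot] in NT:    # [A -> β•Bδ, a]
                delta = rhs[dot + 1:]
                for b in (FIRST[delta[0]] if delta else {la}):
                    for tau in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], tau, 0, b)
                        if item not in s:
                            s.add(item)
                            changed = True
    return frozenset(s)

def goto(s, X):
    moved = {(l, r, d + 1, a) for (l, r, d, a) in s
             if d < len(r) and r[d] == X}            # advance the dot over X
    return closure(moved) if moved else frozenset()

s0 = closure({("Goal", ("Expr",), 0, "<EOF>")})
states, edges = {s0}, {}
changed = True
while changed:                # S ⊆ 2^ITEMS, so this fixed point halts
    changed = False
    for s in list(states):
        for X in TERMINALS | NT:
            t = goto(s, X)
            if t:
                edges[(s, X)] = t
                if t not in states:
                    states.add(t)
                    changed = True

print(len(states))            # 9 states: s0 .. s8, as on slides 28-30
```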

26
Example
(grammar sets)
  • Simplified, right recursive expression grammar

Goal   → Expr
Expr   → Term + Expr
Expr   → Term
Term   → Factor * Term
Term   → Factor
Factor → ident
27
Example (building the collection)
  • Initialization Step
  • s0 ← closure( { [Goal → •Expr, EOF] } )
  • { [Goal → •Expr, EOF], [Expr → •Term + Expr, EOF],
    [Expr → •Term, EOF],
  • [Term → •Factor * Term, EOF], [Term → •Factor *
    Term, +],
  • [Term → •Factor, EOF], [Term → •Factor, +],
  • [Factor → •ident, EOF], [Factor → •ident, +],
    [Factor → •ident, *] }
  • S ← { s0 }

28
Example (building the collection)
  • Iteration 1
  • s1 ← goto(s0, Expr)
  • s2 ← goto(s0, Term)
  • s3 ← goto(s0, Factor)
  • s4 ← goto(s0, ident)
  • Iteration 2
  • s5 ← goto(s2, +)
  • s6 ← goto(s3, *)
  • Iteration 3
  • s7 ← goto(s5, Expr)
  • s8 ← goto(s6, Term)

29
Example
(Summary)
  • S0 : { [Goal → •Expr, EOF], [Expr → •Term +
    Expr, EOF], [Expr → •Term, EOF],
  • [Term → •Factor * Term, EOF], [Term → •Factor *
    Term, +],
  • [Term → •Factor, EOF], [Term → •Factor, +],
  • [Factor → •ident, EOF], [Factor → •ident, +],
    [Factor → •ident, *] }
  • S1 : { [Goal → Expr •, EOF] }
  • S2 : { [Expr → Term • + Expr, EOF], [Expr →
    Term •, EOF] }
  • S3 : { [Term → Factor • * Term, EOF], [Term →
    Factor • * Term, +], [Term → Factor •, EOF],
  • [Term → Factor •, +] }
  • S4 : { [Factor → ident •, EOF], [Factor → ident •,
    +], [Factor → ident •, *] }
  • S5 : { [Expr → Term + • Expr, EOF], [Expr →
    •Term + Expr, EOF], [Expr → •Term, EOF],
  • [Term → •Factor * Term, +], [Term → •Factor,
    +],
  • [Term → •Factor * Term, EOF], [Term → •Factor,
    EOF],
  • [Factor → •ident, +], [Factor → •ident, *],
    [Factor → •ident, EOF] }

30
Example
(Summary)
  • S6 : { [Term → Factor * • Term, EOF], [Term →
    Factor * • Term, +],
  •  [Term → •Factor * Term, EOF], [Term →
    •Factor * Term, +],
  • [Term → •Factor, EOF], [Term → •Factor,
    +],
  • [Factor → •ident, EOF], [Factor → •ident,
    +], [Factor → •ident, *] }
  • S7 : { [Expr → Term + Expr •, EOF] }
  • S8 : { [Term → Factor * Term •, EOF], [Term →
    Factor * Term •, +] }

31
Filling in the ACTION and GOTO Tables
  • The algorithm
  • Many items generate no table entry
  • Closure( ) instantiates FIRST(X) directly for
    [A→β•Xδ, a]

∀ set sx ∈ S
    ∀ item i ∈ sx
        if i is [A→β•aδ, b] and goto(sx, a) = sk, a ∈ T
            then ACTION[x,a] ← "shift k"
        else if i is [S′→S•, EOF]
            then ACTION[x, EOF] ← "accept"
        else if i is [A→β•, a]
            then ACTION[x,a] ← "reduce A→β"
    ∀ n ∈ NT
        if goto(sx, n) = sk
            then GOTO[x,n] ← k
x is the state number
32
Example
(Summary)
  • The Goto Relationship (from the construction)

33
Example
(Filling in the tables)
  • The algorithm produces the following table

34
What can go wrong?
  • What if set s contains [A→β•aγ, b] and [B→γ•, a] ?
  • First item generates shift, second generates
    reduce
  • Both define ACTION[s,a]; cannot do both actions
  • This is a fundamental ambiguity, called a
    shift/reduce conflict
  • Modify the grammar to eliminate it
    (if-then-else)
  • Shifting will often resolve it correctly
  • What if set s contains [A→γ•, a] and [B→γ•, a] ?
  • Each generates reduce, but with a different
    production
  • Both define ACTION[s,a]; cannot do both
    reductions
  • This is a fundamental ambiguity, called a
    reduce/reduce conflict
  • Modify the grammar to eliminate it
    (PL/I's overloading of (...))
  • In either case, the grammar is not LR(1)
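The two conflict cases above can be checked mechanically: apply the table-filling rules to one state's items and see whether two items claim the same ACTION[s,a] entry. The helper below and its item sets are my own hypothetical sketch, not from the slides.

```python
# Conflict check for one state's items: (lhs, rhs, dot, lookahead).
# The item sets below are hypothetical, built to trigger each case.

def actions_for(state, terminals):
    """Yield the (terminal, action) pairs this state's items demand."""
    for (lhs, rhs, dot, la) in state:
        if dot < len(rhs) and rhs[dot] in terminals:
            yield rhs[dot], ("shift",)            # [A -> β•aγ, b]
        elif dot == len(rhs):
            yield la, ("reduce", lhs, rhs)        # [A -> γ•, a]

def conflicts(state, terminals):
    seen, found = {}, []
    for a, act in actions_for(state, terminals):
        if a in seen and seen[a] != act:          # two claims on ACTION[s,a]
            kind = ("shift/reduce" if "shift" in (seen[a][0], act[0])
                    else "reduce/reduce")
            found.append((a, kind))
        seen[a] = act
    return found

# [A -> x • a y, b] together with [B -> x •, a]: shift/reduce on 'a'
state = {("A", ("x", "a", "y"), 1, "b"), ("B", ("x",), 1, "a")}
print(conflicts(state, {"a", "x", "y"}))   # [('a', 'shift/reduce')]
```

An LR(1) table generator reports exactly these collisions when the grammar is not LR(1).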

35
LR(k) versus LL(k) (Top-down Recursive
Descent )
  • Finding Reductions
  • LR(k) ⇒ Each reduction in the parse is detectable
    with
  • the complete left context,
  • the reducible phrase itself, and
  • the k terminal symbols to its right
  • LL(k) ⇒ Parser must select the reduction based on
  • The complete left context
  • The next k terminals
  • Thus, LR(k) examines more context
  • "In practice, programming languages do not
    actually seem to fall in the gap between LL(1)
    languages and deterministic languages" (J.J.
    Horning, "LR Grammars and Analysers", in Compiler
    Construction: An Advanced Course,
    Springer-Verlag, 1976)

36
Summary