Shift/Reduce and LR(1) - PowerPoint PPT Presentation

Provided by: Shmue2
1
Shift/Reduce and LR(1)
  • CMSC 431
  • Shon Vick

2
Table-driven parsing
  • Parsing performed by a finite state machine.
  • Parsing algorithm is language-independent.
  • FSM driven by table(s) generated automatically
    from grammar.
  • [Figure: a language generator produces the tables;
    the parser reads the input and uses the stack and
    the tables.]

3
Pushdown Automata
  • A context-free grammar can be recognized by a
    finite state machine with a stack: a PDA.
  • The PDA is defined by a set of internal states and
    a transition table.
  • The PDA can read the input and read/write on the
    stack.
  • The actions of the PDA are determined by its
    current state, the current top of the stack, and
    the current input symbol.
  • There are three distinguished states:
  • start state: nothing seen
  • accept state: sentence complete
  • error state: current symbol doesn't belong.

4
Review: Top-down parsing
  • Parse tree is synthesized from the root (sentence
    symbol).
  • Stack contains symbols of rhs of current
    production, and pending non-terminals.
  • Automaton is trivial (no need for explicit
    states).
  • Transition table indexed by grammar symbol G and
    input symbol a. Entries in table are terminals or
    productions P --> ABC.

5
Top-down parsing
  • Actions:
  • Initially, stack contains sentence symbol.
  • At each step, let S be symbol on top of stack,
    and a be the next token on input.
  • If T (S, a) is terminal a, read token, pop symbol
    from stack.
  • If T (S, a) is production P --> ABC,
    remove S from stack, push the symbols A, B, C on
    the stack (A on top).
  • If S is the sentence symbol and a is the end of
    file, accept.
  • If T (S, a) is undefined, signal error.
  • Semantic action: when starting a production,
    build tree node for non-terminal, attach to
    parent.
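
The actions listed here can be sketched as a small driver loop. This is a minimal illustration, not the course's implementation: the toy grammar S --> ( S ) | x, the names TABLE, NONTERMINALS, END and ll1_parse are all assumptions made for the example, and terminals on the stack are matched directly instead of being looked up in the table.

```python
# Hypothetical toy grammar (not from the slides): S --> ( S ) | x
# TABLE maps (non-terminal, lookahead token) to the rhs to push.
TABLE = {
    ("S", "("): ["(", "S", ")"],
    ("S", "x"): ["x"],
}
NONTERMINALS = {"S"}
END = "$"  # assumed end-of-file marker


def ll1_parse(tokens, start="S"):
    """Return True if tokens form a sentence, False on a syntax error."""
    tokens = list(tokens) + [END]
    pos = 0
    stack = [END, start]              # top of stack is the end of the list
    while stack:
        top = stack.pop()
        a = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, a))
            if rhs is None:           # T(S, a) undefined: signal error
                return False
            stack.extend(reversed(rhs))   # leftmost rhs symbol ends up on top
        elif top == a:                # terminal (or end marker) matches: read it
            pos += 1
        else:
            return False
    return pos == len(tokens)
```

For instance, ll1_parse("((x))") accepts, while ll1_parse("(x") reaches the end of input with ")" still on the stack and reports an error.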

6
Table-driven parsing and recursive descent
parsing
  • Recursive descent: every production is a
    procedure. Call stack holds active procedures
    corresponding to pending non-terminals.
  • Stack still needed for context-sensitive legality
    checks, error messages, etc.
  • Table-driven parser: recursion simulated with
    explicit stack.
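
For contrast with the explicit-stack driver, here is a rough recursive-descent sketch. The toy grammar S --> ( S ) | x and the names ParseError, parse_s and accepts are assumptions for illustration only; the point is that the Python call stack plays the role of the parser stack.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""


def parse_s(tokens, pos=0):
    """Procedure for the hypothetical grammar S --> ( S ) | x.

    Returns the position just after the parsed S.
    """
    if pos < len(tokens) and tokens[pos] == "x":
        return pos + 1
    if pos < len(tokens) and tokens[pos] == "(":
        pos = parse_s(tokens, pos + 1)    # recursive call = pending non-terminal
        if pos < len(tokens) and tokens[pos] == ")":
            return pos + 1
    raise ParseError(f"unexpected input at position {pos}")


def accepts(text):
    """True if the whole text is a sentence of the toy grammar."""
    try:
        return parse_s(text) == len(text)
    except ParseError:
        return False
```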

7
Building the parse table
  • Define two functions on the symbols of the
    grammar: FIRST and FOLLOW.
  • For a non-terminal N, FIRST (N) is the set of
    terminal symbols that can start any derivation
    from N.
  • First (If_Statement) = { if }
  • First (Expr) = { id, ( }
  • FOLLOW (N) is the set of terminals that can
    appear after a string derived from N.
  • Follow (Expr) = { +, ), ... }

8
Computing FIRST (N)
  • If N --> ε, First (N) includes ε
  • If N --> aABC, First (N) includes a
  • If N --> X1X2..., First (N) includes First
    (X1)
  • If N --> X1X2... and X1 derives ε,
    First (N) also includes First (X2)
  • Obvious generalization to First (α) where α is
    X1X2...

9
Computing First (N)
  • Grammar for expressions, without left-recursion:
  • E --> T E'
  • E' --> + T E' | ε
  • T --> F T'
  • T' --> * F T' | ε
  • F --> id | ( E )
  • First (F) = { id, ( }
  • First (T') = { *, ε }   First (T) = { id, ( }
  • First (E') = { +, ε }   First (E) = { id, ( }
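
The FIRST rules can be run as a fixed-point iteration. The sketch below is one possible encoding, not the course's code: the dictionary layout, the empty list standing for ε, and the names GRAMMAR, first_of_string and compute_first are assumptions. The grammar is the expression grammar without left recursion.

```python
EPS = "ε"

# Expression grammar without left recursion; an empty rhs list stands for ε.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}


def first_of_string(symbols, first):
    """First(X1 X2 ...): add First(Xi) while every earlier Xj can derive ε."""
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # a terminal starts only itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)                            # every Xi (or none) can vanish
    return result


def compute_first(grammar):
    """Iterate the inclusion rules until no First set grows."""
    first = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[n]:
                    first[n] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

With this encoding FIRST["T'"] comes out as the set containing * and ε, and FIRST["E"] as the set containing id and (.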

10
Computing Follow (N)
  • Follow (N) is computed from productions in which
    N appears on the rhs.
  • For the sentence symbol S, Follow (S) includes $
    (end of input).
  • If A --> α N β, Follow (N) includes First
    (β) except ε,
  • because an expansion of N will be followed by an
    expansion from β.
  • If A --> α N, Follow (N) includes Follow
    (A),
  • because N will be expanded in the context in
    which A is expanded.
  • If A --> α N B and B derives ε, Follow (N) includes
    Follow (A).

11
Computing Follow (N)
  • E --> T E'
  • E' --> + T E' | ε
  • T --> F T'
  • T' --> * F T' | ε
  • F --> id | ( E )
  • Follow (E) = { ), $ }   Follow (E') = { ), $ }
  • Follow (T) = First (E') ∪ Follow (E') = { +, ), $ }
  • Follow (T') = Follow (T) = { +, ), $ }
  • Follow (F) = First (T') ∪ Follow (T') = { *, +,
    ), $ }
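
The FOLLOW rules can be checked mechanically in the same fixed-point style. In this sketch the First sets are entered by hand for the expression grammar, $ is assumed as the end-of-input marker, and the names compute_follow and first_of_string are illustrative, not from the slides.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
# First sets for the expression grammar, entered by hand.
FIRST = {
    "E": {"id", "("}, "E'": {"+", EPS},
    "T": {"id", "("}, "T'": {"*", EPS},
    "F": {"id", "("},
}


def first_of_string(symbols):
    """First of a symbol string; a terminal starts only itself."""
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {n: set() for n in grammar}
    follow[start].add(END)                 # sentence symbol is followed by $
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, n in enumerate(rhs):
                    if n not in grammar:   # only non-terminals have Follow sets
                        continue
                    tail = first_of_string(rhs[i + 1:])
                    new = tail - {EPS}     # A --> α N β: First(β) without ε
                    if EPS in tail:        # β empty or nullable: add Follow(A)
                        new |= follow[a]
                    if not new <= follow[n]:
                        follow[n] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
```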

12
Building LL (1) parse tables
  • Table indexed by non-terminal and token. Table
    entry is a production.

      for each production P: A --> α loop
        for each terminal a in First (α) loop
          T (A, a) := P;
        end loop;
        if ε in First (α) then
          for each terminal b in Follow (A) loop
            T (A, b) := P;
          end loop;
        end if;
      end loop;

  • All other entries are errors.
  • If two assignments conflict, parse table cannot
    be built.
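
The table-filling loop translates almost line for line. This is a hedged sketch for the expression grammar: the First and Follow sets are entered by hand, productions are (lhs, rhs) pairs, and the names PRODUCTIONS, build_ll1_table and assign are assumptions for the example. A conflicting assignment raises an error, which corresponds to "parse table cannot be built".

```python
EPS, END = "ε", "$"

PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["id"]), ("F", ["(", "E", ")"]),
]
FIRST = {
    "E": {"id", "("}, "E'": {"+", EPS},
    "T": {"id", "("}, "T'": {"*", EPS},
    "F": {"id", "("},
}
FOLLOW = {
    "E": {")", END}, "E'": {")", END},
    "T": {"+", ")", END}, "T'": {"+", ")", END},
    "F": {"*", "+", ")", END},
}


def first_of_string(symbols):
    result = set()
    for x in symbols:
        fx = FIRST.get(x, {x})            # a terminal starts only itself
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    result.add(EPS)
    return result


def build_ll1_table(productions):
    table = {}

    def assign(nonterminal, token, production):
        if (nonterminal, token) in table:  # two assignments conflict
            raise ValueError(f"conflict at T({nonterminal}, {token}): not LL(1)")
        table[(nonterminal, token)] = production

    for production in productions:
        lhs, rhs = production
        firsts = first_of_string(rhs)
        for a in firsts - {EPS}:
            assign(lhs, a, production)
        if EPS in firsts:                  # rhs can vanish: use Follow(lhs)
            for b in FOLLOW[lhs]:
                assign(lhs, b, production)
    return table


TABLE = build_ll1_table(PRODUCTIONS)
```

Missing keys in TABLE are the error entries; for example ("E", "+") is absent.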

13
LL (1) grammars
  • If table construction is successful, grammar is
    LL (1): left-to-right, leftmost derivation with
    one-token lookahead.
  • If construction fails, can conceive of LL (2),
    etc.
  • Ambiguous grammars are never LL (k)
  • If a terminal is in First for two different
    productions of A, the grammar cannot be LL (1).
  • Grammars with left-recursion are never LL (k)
  • Some useful constructs are not LL (k)

14
Bottom-up parsing
  • Synthesize tree from fragments
  • Automaton performs two actions:
  • shift: push next symbol on stack
  • reduce: replace the rhs symbols on top of the
    stack with the lhs non-terminal
  • Automaton synthesizes (reduces) when end of a
    production is recognized
  • States of automaton encode synthesis so far, and
    expectation of pending non-terminals
  • Automaton has potentially large set of states
  • Technique more general than LL (k)

15
LR (k) parsing
  • Left-to-right, rightmost derivation with k-token
    lookahead.
  • Most general parsing technique for deterministic
    grammars.
  • In general, not practical: tables too large
    (10^6 states for C, Ada).
  • Common subsets: SLR, LALR (1).

16
LR(k) Parsing Algorithms
  • This is an efficient class of bottom-up parsing
    algorithms. Other bottom-up parsers include
    operator precedence parsers.
  • The name LR(k) means:
  • L - Left-to-right scanning of the input
  • R - Constructing a rightmost derivation in reverse
  • k - number of lookahead input symbols used to
    select a parser action

17
Yet Another Example
  • Consider a grammar that generates palindromes
    over a and b with a center marker c.
  • 1) S --> P
  • 2) P --> a P a
  • 3) P --> b P b
  • 4) P --> c
  • LR parsers work with an augmented grammar in
    which the start symbol never appears on the right
    side of a production.
  • In a given grammar, if the start symbol appears
    in the RHS, we can add a production S' --> S (S'
    is the new start symbol and S was the old start
    symbol).

18
Example Cont...
    STACK     INPUT BUFFER   ACTION
    $         abcba$         shift
    $a        bcba$          shift
    $ab       cba$           shift
    $abc      ba$            reduce (P --> c)
    $abP      ba$            shift
    $abPb     a$             reduce (P --> b P b)
    $aP       a$             shift
    $aPa      $              reduce (P --> a P a)
    $P        $              reduce (S --> P)
    $S        $              accept

19
LR(0) Parsers
  • Q: How do we select parser actions (namely shift,
    reduce, accept and error)?
  • A:
  • 1) By constructing a DFA that encodes all parser
    states, with transitions on terminals and
    nonterminals. The transitions on terminals are
    the parser actions (also called the action table),
    and the transitions on nonterminals lead to a
    new state (also called the goto table).
  • 2) By keeping a stack to simulate the PDA. This
    stack maintains the list of states.

20
LR(0) Items and Closure
  • An LR(0) parser state (like an FSA state) needs to
    capture how much of the rhs of a given production
    we have scanned so far.
  • An LR(0) item is a production with a mark/dot on
    the RHS.
  • For example, the items for the production
    P --> a P a are
    P --> . a P a, P --> a . P a, P --> a P . a,
    P --> a P a .

21
Items and Closure Contd
  • Intuitively, the symbols to the left of the dot
    have already been seen (as input symbols or as
    derivations).
  • Two kinds of items: kernel items and nonkernel
    items.
  • Kernel items: the initial item S' --> .S
    and all items in which the dot does not appear at
    the leftmost position.
  • Nonkernel items: all other items, which have the
    dot at the leftmost position.

22
Closure of Items
  • Let I be a set of items. Then Closure (I)
    consists of the set of items constructed
    as follows:
  • 1) Every item in I is also in Closure(I)
    (reflexive).
  • 2) If A --> α . B β is in Closure(I), and B --> γ
    is a production, then add the item B --> .γ to
    Closure(I), if it is not already a member. Repeat
    this until no more items can be added.

23
Intuition
  • Closure represents an equivalent state - all the
    possible ways that you could have reached that
    state.
  • Example: I = { S --> .P }
  • Closure (I) = { S --> .P, P --> .aPa, P --> .bPb,
    P --> .c }
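
The closure rule can be sketched with a worklist. The encoding here is an assumption made for the example: an item is a triple (lhs, rhs, dot), with rhs a tuple of symbols and dot an index into it, and S is the augmented start symbol of the palindrome grammar.

```python
# Augmented palindrome grammar; S is the start symbol.
GRAMMAR = {
    "S": [("P",)],
    "P": [("a", "P", "a"), ("b", "P", "b"), ("c",)],
}


def closure(items):
    """Closure of a set of (lhs, rhs, dot) items; dot is an index into rhs."""
    result = set(items)                  # rule 1: every item of I is included
    worklist = list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before non-terminal B
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)          # rule 2: add B --> .γ
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


START_STATE = closure({("S", ("P",), 0)})
```

START_STATE holds the four items of the example: S --> .P plus the three P productions with the dot at the left.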

24
GOTO Operation
  • Let I be a set of items and let X be a grammar
    symbol (nonterminal/terminal). Then
  • GOTO(I,X) = Closure({ A --> α X . β | A --> α . X β
    is in I })
  • It is a new set of items obtained by moving the
    dot over X. Intuitively, we have seen either an
    input symbol (terminal symbol) or a derivation
    starting with that nonterminal.
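
A possible rendering of GOTO, under the same assumed item encoding (lhs, rhs, dot) and with a compact closure repeated so that the block runs on its own:

```python
GRAMMAR = {
    "S": [("P",)],
    "P": [("a", "P", "a"), ("b", "P", "b"), ("c",)],
}


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, x):
    """Move the dot over x in every item that has x right after the dot."""
    moved = {
        (lhs, rhs, dot + 1)
        for lhs, rhs, dot in items
        if dot < len(rhs) and rhs[dot] == x
    }
    return closure(moved)


S0 = closure({("S", ("P",), 0)})
AFTER_A = goto(S0, "a")    # P --> a.Pa plus the three P --> .γ items
AFTER_P = goto(S0, "P")    # S --> P.
```

An empty result, as in goto(S0, "x"), means there is no transition on that symbol.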

25
Canonical set of Items (states)
  • Enumerate possible states for an LR(0) parser.
    Each state is a canonical set of items.
  • Algorithm:
  • 1) Start with a canonical set, Closure({S' --> .S}).
  • 2) If I is a canonical set and X is a grammar
    symbol such that I' = goto(I,X) is nonempty, then
    make I' a new canonical set (if it is not already
    a canonical set). Keep repeating this until no
    more canonical sets can be created.
  • The algorithm terminates, because a grammar has
    only finitely many items and hence finitely many
    sets of items.

26
Example
  • S0: S --> .P, P --> .aPa, P --> .bPb, P --> .c
  • S1: S --> P.
  • S2: P --> a.Pa, P --> .aPa, P --> .bPb, P --> .c
  • S3: P --> b.Pb, P --> .aPa, P --> .bPb, P --> .c
  • S4: P --> c.
  • S5: P --> aP.a
  • S6: P --> bP.b
  • S7: P --> aPa.
  • S8: P --> bPb.
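
The nine states can be reproduced by running the enumeration directly. The sketch below assumes the item encoding (lhs, rhs, dot) and tries symbols in the order P, a, b, c, an arbitrary choice that happens to yield the same state numbers as the listing; canonical_collection and SYMBOLS are illustrative names.

```python
GRAMMAR = {
    "S": [("P",)],
    "P": [("a", "P", "a"), ("b", "P", "b"), ("c",)],
}
SYMBOLS = ["P", "a", "b", "c"]   # order chosen to match the state numbering


def closure(items):
    result, worklist = set(items), list(items)
    while worklist:
        lhs, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for gamma in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], gamma, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)


def goto(items, x):
    return closure({(l, r, d + 1) for l, r, d in items
                    if d < len(r) and r[d] == x})


def canonical_collection(start_item):
    start = closure({start_item})
    states = [start]
    index = {start: 0}
    transitions = {}
    for state in states:                 # the list grows while we iterate
        for x in SYMBOLS:
            target = goto(state, x)
            if not target:               # empty set: no transition on x
                continue
            if target not in index:      # a new canonical set
                index[target] = len(states)
                states.append(target)
            transitions[(index[state], x)] = index[target]
    return states, transitions


STATES, TRANSITIONS = canonical_collection(("S", ("P",), 0))
```

TRANSITIONS is the FSA: entries on a, b, c become shift actions and entries on P become the goto column.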

27
Finite State Machine
  • Draw the FSA. The major difference is that
    transitions can be labeled with both terminal and
    nonterminal symbols.
  • The Goto and Action parts of the parsing table
    come from the FSA, as detailed in Galles on pp.
    89-92.

28
Key Idea in Canonical states
  • If a state contains an item of the form
    A --> β . , then the state prompts a reduce action
    (provided the correct symbols follow).
  • If a state contains A --> α . δ, then the state
    prompts the parser to perform a shift action (of
    course on the right symbols).
  • If a state contains S' --> S. and there are no
    more input symbols left, then the parser is
    prompted to accept.
  • Otherwise, an error is reported.

29
Parsing Table
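    state    a     b     c     $     goto P
    0        s2    s3    s4    -     1
    1        -     -     -     acc   -
    2        s2    s3    s4    -     5
    3        s2    s3    s4    -     6
    4        r4    r4    r4    r4    -
    5        s7    -     -     -     -
    6        -     s8    -     -     -
    7        r2    r2    r2    r2    -
    8        r3    r3    r3    r3    -

  • A dash marks an error entry. Reduce entries use
    the production numbers of the grammar listing
    (2: P --> a P a, 3: P --> b P b, 4: P --> c).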

30
Parsing Table Contd
  • si means shift the input symbol and go to state i.
  • rj means reduce by the jth production. Note that we
    are not storing all the items in the state in our
    table.
  • Example: abcba
  • If we go through the parsing algorithm, we get:

31
Example Contd
    STACK                INPUT     ACTION
    S0                   abcba$    shift
    S0 a S2              bcba$     shift
    S0 a S2 b S3         cba$      shift
    S0 a S2 b S3 c S4    ba$       reduce
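
The rest of the run can be followed with a small driver. This is a sketch under stated assumptions: the stack keeps states only (no grammar symbols), reduce states are recorded by left-hand side and rhs length instead of a production number, goto(0, P) is taken to be state 1 (the state holding S --> P.), and $ marks the end of input. The names SHIFT, REDUCE, GOTO and lr_parse are illustrative.

```python
# Shift entries for states 0-8 of the palindrome grammar.
SHIFT = {
    (0, "a"): 2, (0, "b"): 3, (0, "c"): 4,
    (2, "a"): 2, (2, "b"): 3, (2, "c"): 4,
    (3, "a"): 2, (3, "b"): 3, (3, "c"): 4,
    (5, "a"): 7, (6, "b"): 8,
}
# LR(0) reduce states: (lhs, length of rhs); no lookahead is consulted.
REDUCE = {4: ("P", 1), 7: ("P", 3), 8: ("P", 3)}
GOTO = {(0, "P"): 1, (2, "P"): 5, (3, "P"): 6}
ACCEPT_STATE = 1


def lr_parse(text):
    """Return (accepted, list of actions taken)."""
    tokens = list(text) + ["$"]
    stack = [0]                            # stack of states, S0 at the bottom
    pos = 0
    trace = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            lhs, length = REDUCE[state]
            del stack[-length:]            # pop one state per rhs symbol
            stack.append(GOTO[(stack[-1], lhs)])
            trace.append("reduce")
            continue
        target = SHIFT.get((state, tokens[pos]))
        if target is not None:
            stack.append(target)
            pos += 1
            trace.append("shift")
        elif state == ACCEPT_STATE and tokens[pos] == "$":
            trace.append("accept")
            return True, trace
        else:
            return False, trace            # empty table entry: error


ACCEPTED, TRACE = lr_parse("abcba")
```

For abcba the trace begins shift, shift, shift, reduce, as in the table, and ends in accept once state 1 sees the end of input.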

32
Shift/Reduce Conflicts
  • An LR(0) state contains a conflict if its
    canonical set has two items that recommend
    conflicting actions.
  • shift/reduce conflict - when one item prompts a
    shift action and another prompts a reduce action.
  • reduce/reduce conflict - when two items prompt
    reduce actions by different productions.
  • A grammar is said to be an LR(0) grammar if the
    table does not have any conflicts.
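
A conflict check over one state might look like the sketch below. The item encoding (lhs, rhs, dot), the function name find_conflicts and the three example states are assumptions for illustration and are not taken from the slides or from Appel's figure; the accept item gets no special treatment here.

```python
def find_conflicts(state, nonterminals):
    """Classify LR(0) conflicts in one state.

    state is a collection of (lhs, rhs, dot) items; rhs is a tuple of symbols.
    """
    # Complete items (dot at the end) prompt a reduce.
    reductions = {(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)}
    # Items with the dot before a terminal prompt a shift.
    wants_shift = any(
        dot < len(rhs) and rhs[dot] not in nonterminals
        for lhs, rhs, dot in state
    )
    found = set()
    if reductions and wants_shift:
        found.add("shift/reduce")
    if len(reductions) > 1:
        found.add("reduce/reduce")
    return found


# Hypothetical states for illustration.
SR_STATE = {("E", ("T", "+", "E"), 1), ("E", ("T",), 1)}   # E --> T.+E, E --> T.
RR_STATE = {("A", ("x",), 1), ("B", ("x",), 1)}            # A --> x., B --> x.
OK_STATE = {("P", ("c",), 1)}                              # P --> c.
```

A grammar passes the LR(0) test when find_conflicts returns an empty set for every canonical state.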

33
Examples
  • See Figure 3.24 of Appel and associated text
  • (to be provided)

34
References
  • Modern Compiler Implementation in Java, Andrew
    Appel, Cambridge University Press
  • http://www.cs.nyu.edu/courses/spring02/G22.2130-001/parsing2.ppt
  • http://www.cs.rpi.edu/moorthy/Courses/compiler98/Lectures/lecturesinppt/lecture7.ppt
  • Compilers: Principles, Techniques, and Tools, Aho,
    Sethi, Ullman, Addison Wesley