Title: LR Parsing
1LR Parsing
- Compiler
- Baojian Hua
- bjhua_at_ustc.edu.cn
2Front End
source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR
3Parsing
- The parser translates the source program into abstract syntax trees
- Token sequence
  - returned from the lexer
- Abstract syntax tree
  - checks the validity of programs
  - forms the compiler's internal data structures for programs
- Must take the program syntax into account
4Conceptually
[Figure: the parser takes a token sequence and the language syntax as inputs and produces an abstract syntax tree.]
5Predictive Parsing
- Grammars encode enough information on how to choose production rules when input terminals are seen
- LL(1) pros
  - simple, easy to implement
  - efficient
- Cons
  - grammar rewriting
  - ugly
6Today's Topic
- Bottom-up Parsing
  - shift-reduce parsing, LR parsing
- This is the predominant algorithm used by automatic YACC-like parser generators
  - YACC, bison, CUP, etc.
7Bottom-up Parsing
- 1 S -> exp
- 2 exp -> exp + term
- 3 exp -> term
- 4 term -> term * factor
- 5 term -> factor
- 6 factor -> ID
- 7 factor -> INT

2 + 3 * 4
factor + 3 * 4
term + 3 * 4
exp + 3 * 4
exp + factor * 4
exp + term * 4
exp + term * factor
exp + term
exp
S

A reverse of a rightmost derivation!
8Dot notation
- As a convenient notation, we will mark how much of the input we have consumed by using a • symbol

exp + 3 • * 4

(everything left of • is consumed; everything right of • is the remaining input)
9Bottom-up Parsing
2 • + 3 * 4
factor • + 3 * 4
term • + 3 * 4
exp + 3 • * 4
exp + factor • * 4
exp + term * 4 •
exp + term * factor •
exp + term •
exp •
S •
10Another View
Stack                   Remaining input
(empty)                 2 + 3 * 4
2                       + 3 * 4
factor                  + 3 * 4
term                    + 3 * 4
exp                     + 3 * 4
exp +                   3 * 4
exp + 3                 * 4
exp + factor            * 4
exp + term              * 4
exp + term *            4
exp + term * 4          (empty)
exp + term * factor     (empty)
exp + term              (empty)
exp                     (empty)
S                       (empty)

- S -> exp
- exp -> exp + term
- exp -> term
- term -> term * factor
- term -> factor
- factor -> ID
- factor -> INT

What's the data structure on the left?
11Producing a rightmost derivation in reverse
- We do two things
  - shift a token (terminal) onto the stack, or
  - reduce the top n symbols on the stack by a production
- When we reduce by a production A -> β
  - β is on the top of the stack; pop β
  - and push A
- Key problem: when to shift and when to reduce?
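The two stack operations can be sketched in a few lines of Python. This is only an illustration: the names shift, reduce, replay, and SCRIPT are made up here, and the sequence of actions for 2 + 3 * 4 (tokens INT + INT * INT) is written out by hand, because deciding when to shift and when to reduce is exactly the problem the parse tables solve.

```python
def shift(stack, tokens):
    """Move the next input token onto the stack."""
    stack.append(tokens.pop(0))


def reduce(stack, lhs, rhs):
    """Pop the right-hand side rhs off the stack and push lhs."""
    n = len(rhs)
    if stack[-n:] != rhs:
        raise ValueError(f"{rhs} is not on top of {stack}")
    del stack[-n:]
    stack.append(lhs)


# Hand-written actions for INT + INT * INT (an assumption of this sketch).
# "shift" shifts one token; a pair (lhs, rhs) reduces by lhs -> rhs.
SCRIPT = [
    "shift", ("factor", ["INT"]), ("term", ["factor"]), ("exp", ["term"]),
    "shift", "shift", ("factor", ["INT"]), ("term", ["factor"]),
    "shift", "shift", ("factor", ["INT"]),
    ("term", ["term", "*", "factor"]),
    ("exp", ["exp", "+", "term"]),
    ("S", ["exp"]),
]


def replay(tokens, script):
    """Apply the scripted actions and return the final stack and input."""
    stack, tokens = [], list(tokens)
    for step in script:
        if step == "shift":
            shift(stack, tokens)
        else:
            reduce(stack, *step)
    return stack, tokens


final_stack, rest = replay(["INT", "+", "INT", "*", "INT"], SCRIPT)
print(final_stack, rest)
```

Read top to bottom, the reductions in SCRIPT are the rightmost derivation of the input in reverse.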
12Yet Another View
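Stack       Remaining input
(empty)     2 + 3 * 4
2           + 3 * 4
factor      + 3 * 4
term        + 3 * 4
exp         + 3 * 4

[Figure: the parse tree built so far is the chain E - T - F - 2.]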
13Yet Another View
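Stack            Remaining input
(empty)          2 + 3 * 4
2                + 3 * 4
factor           + 3 * 4
term             + 3 * 4
exp              + 3 * 4
exp +            3 * 4
exp + 3          * 4
exp + factor     * 4
exp + term       * 4

[Figure: the parse tree grows bottom-up alongside the stack; its nodes are S, E, T, and F over the leaves 2, 3, and 4.]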
14A shift-reduce parser
- Two components
  - Stack: holds the viable prefixes
  - Input stream: holds the remaining source
- Four actions
  - shift: push a token from the input stream onto the stack
  - reduce: the right-hand side (β of A -> β) is at the top of the stack; pop β, push A
  - accept: success
  - error: syntax error discovered
15Table-driven LR(k) parsers
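[Figure: the parser generator reads the grammar and produces the action table and the GOTO table; the parser loop uses those tables and a stack to turn the tokens delivered by the lexer into an AST.]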
16An LR parser
- Put S on stack in state s0
- Parser configuration is (S, s0, X1, s1, X2, s2, ..., Xm, sm | ai ai+1 ... an $)
- do forever
  - read ai
  - if action[ai, sm] is shift s, then the configuration becomes
    (S, s0, X1, s1, X2, s2, ..., Xm, sm, ai, s | ai+1 ... an $)
  - if action[ai, sm] is reduce A -> β, then the configuration becomes
    (S, s0, X1, s1, X2, s2, ..., Xm-|β|, sm-|β|, A, s | ai ai+1 ... an $)
    where s = goto[sm-|β|, A]
  - if action[ai, sm] is accept, DONE
  - if action[ai, sm] is error, handle the error
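A minimal sketch of this loop in Python, under a few assumptions: the ACTION and GOTO entries are copied from the LR(0) table shown later in this deck for the grammar 0 S' -> S $, 1 S -> ( L ), 2 S -> x, 3 L -> S, 4 L -> L , S (with state 3 shifting x to state 2, as in the automaton), and the stack keeps only states, since the grammar symbols are implied by them. The names parse, ACTION, GOTO, and RULES are illustrative.

```python
# Rule number -> (left-hand side, length of right-hand side).
RULES = {1: ("S", 3), 2: ("S", 1), 3: ("L", 1), 4: ("L", 3)}

ACTION = {
    (1, "("): ("shift", 3), (1, "x"): ("shift", 2),
    (3, "("): ("shift", 3), (3, "x"): ("shift", 2),
    (4, "$"): ("accept", None),
    (5, ")"): ("shift", 6), (5, ","): ("shift", 8),
    (8, "("): ("shift", 3), (8, "x"): ("shift", 2),
}
# LR(0) reduce states reduce on every terminal.
for state, rule in ((2, 2), (6, 1), (7, 3), (9, 4)):
    for terminal in "()x,$":
        ACTION[(state, terminal)] = ("reduce", rule)

GOTO = {(1, "S"): 4, (3, "S"): 7, (3, "L"): 5, (8, "S"): 9}


def parse(tokens):
    """Return the rule numbers used, in reduction order."""
    tokens = list(tokens) + ["$"]
    stack = [1]            # stack of states; state 1 is the start state
    pos = 0
    reductions = []
    while True:
        state, token = stack[-1], tokens[pos]
        kind, arg = ACTION.get((state, token), ("error", None))
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]                  # pop |rhs| states
            stack.append(GOTO[(stack[-1], lhs)])  # then take the goto
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")


print(parse(["(", "x", ",", "x", ")"]))
```

On the input x ) the sketch first reduces by rule 2 in state 2 and only then reports the error in state 4, which is the delayed error detection that LR(0) is criticized for later in the deck.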
17Generating LR parsers
- In order to generate an LR parser, we must create the action and GOTO tables
- Many different ways to do this
- We will start here with the simplest approach, called LR(0)
  - Left-to-right parsing, Rightmost derivation, 0 lookahead
18Item
- LR(0) items have the form [production-with-dot]
- For example, X -> A B C has 4 forms of items
  - X -> • A B C
  - X -> A • B C
  - X -> A B • C
  - X -> A B C •
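As a small illustration, the items of a production can be enumerated by sliding the dot over its right-hand side. The helper names lr0_items and show are hypothetical, and the dot is printed as "." to keep the sketch ASCII-only.

```python
def lr0_items(lhs, rhs):
    """All LR(0) items of lhs -> rhs, as (lhs, rhs, dot position) triples."""
    return [(lhs, tuple(rhs), dot) for dot in range(len(rhs) + 1)]


def show(item):
    """Render an item with '.' marking the dot position."""
    lhs, rhs, dot = item
    symbols = list(rhs[:dot]) + ["."] + list(rhs[dot:])
    return f"{lhs} -> {' '.join(symbols)}"


for item in lr0_items("X", ["A", "B", "C"]):
    print(show(item))
```

A production with n symbols on its right-hand side always yields n + 1 items.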
19What do items mean?
- X -> • α β γ
  - input is consistent with X -> α β γ
- X -> α • β γ
  - input is consistent with X -> α β γ and we have already recognized α
- X -> α β • γ
  - input is consistent with X -> α β γ and we have already recognized α β
- X -> α β γ •
  - input is consistent with X -> α β γ and we can reduce to X
20LR(0) Items
0 S -> S    1 S -> x S    2 S -> y

[Figure: an LR(0) automaton whose states are item sets, numbered up to 9, with transitions labeled x, (, ), ",", S, and L.]
21LR(0) Items
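0 S' -> S $    1 S -> ( L )    2 S -> x    3 L -> S    4 L -> L , S

State 1: S' -> • S $ ; S -> • ( L ) ; S -> • x
State 2: S -> x •
State 3: S -> ( • L ) ; L -> • S ; L -> • L , S ; S -> • ( L ) ; S -> • x
State 4: S' -> S • $
State 5: S -> ( L • ) ; L -> L • , S
State 6: S -> ( L ) •
State 7: L -> S •
State 8: L -> L , • S ; S -> • ( L ) ; S -> • x
State 9: L -> L , S •

Transitions:
1 -x-> 2    1 -(-> 3    1 -S-> 4
3 -x-> 2    3 -(-> 3    3 -S-> 7    3 -L-> 5
5 -)-> 6    5 -,-> 8
8 -x-> 2    8 -(-> 3    8 -S-> 9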
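The item sets of this automaton can be computed mechanically with the usual closure and goto operations. The following Python sketch assumes the grammar 0 S' -> S $, 1 S -> ( L ), 2 S -> x, 3 L -> S, 4 L -> L , S and represents an item as a pair (rule number, dot position); the function names are illustrative, and the states come out in discovery order, so their numbers need not match the figure.

```python
GRAMMAR = [
    ("S'", ("S", "$")),
    ("S", ("(", "L", ")")),
    ("S", ("x",)),
    ("L", ("S",)),
    ("L", ("L", ",", "S")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add X -> . gamma for every nonterminal X that follows a dot."""
    items = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(r, d + 1) for r, d in items
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


def canonical_collection():
    """Return the list of item sets and the transition map."""
    states = [closure({(0, 0)})]
    edges = {}
    for state in states:               # the list grows while we iterate
        symbols = {GRAMMAR[r][1][d] for r, d in state
                   if d < len(GRAMMAR[r][1])}
        for symbol in sorted(symbols):
            if symbol == "$":          # end marker: accept, never shift
                continue
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
            edges[(states.index(state), symbol)] = states.index(target)
    return states, edges


states, edges = canonical_collection()
print(len(states), "states,", len(edges), "transitions")
```

For this grammar the sketch finds nine item sets, matching the nine states of the automaton.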
22LR(0) table construction
- Construct LR(0) items
- Item set Ii becomes state i
- Parsing actions at state i are
  - if A -> α • a β ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = shift j
  - if A -> α • ∈ Ii and A ≠ S', then action[i, a] = reduce by A -> α (for every terminal a)
  - if S' -> S • $ ∈ Ii, then action[i, $] = accept
23LR(0) table construction, cont'd
- GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij
- Empty entries are errors
24LR(0) Table
        action                                  goto
state   (       )       x       ,       $       S       L
1       s3              s2                      g4
2       r2      r2      r2      r2      r2
3       s3              s2                      g7      g5
4                                       accept
5               s6              s8
6       r1      r1      r1      r1      r1
7       r3      r3      r3      r3      r3
8       s3              s2                      g9
9       r4      r4      r4      r4      r4
25Problems with LR(0)
- For every item of the form X -> α •
  - blindly reduce by X -> α, followed by a goto
  - this never misses an error, but it may postpone the detection of some errors
26Problems with LR(0)
0 S' -> S $    1 S -> ( L )    2 S -> x    3 L -> S    4 L -> L , S

[Figure: the LR(0) automaton for this grammar, with states 1 to 9 and transitions labeled x, (, ), ",", S, and L.]

Consider this input: x 5
27Problems with LR(0)
        action                                  goto
state   (       )       x       ,       $       S       L
1       s3              s2                      g4
2       r2      r2      r2      r2      r2
3       s3              s2                      g7      g5
4                                       accept
5               s6              s8
6       r1      r1      r1      r1      r1
7       r3      r3      r3      r3      r3
8       s3              s2                      g9
9       r4      r4      r4      r4      r4
28Another Example
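0 S -> E $    1 E -> T + E    2 E -> T    3 T -> x

State 1: S -> • E $ ; E -> • T + E ; E -> • T ; T -> • x
State 2: S -> E • $
State 3: E -> T • + E ; E -> T •
State 4: E -> T + • E ; E -> • T + E ; E -> • T ; T -> • x
State 5: T -> x •
State 6: E -> T + E •

Transitions:
1 -E-> 2    1 -T-> 3    1 -x-> 5
3 -+-> 4
4 -E-> 6    4 -T-> 3    4 -x-> 5

A shift-reduce conflict! (state 3, on +)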
29LR(0) Parse Table
        action                  goto
state   x       +       $       E       T
1       s5                      g2      g3
2                       accept
3       r2      s4,r2   r2
4       s5                      g6      g3
5       r3      r3      r3
6       r1      r1      r1
30SLR table construction
- Construct LR(0) items
- Item set Ii becomes state i
- Parsing actions at state i are
  - if A -> α • a β ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = shift j
  - if A -> α • ∈ Ii and A ≠ S', then action[i, a] = reduce by A -> α for all a ∈ FOLLOW(A)
  - if S' -> S • $ ∈ Ii, then action[i, $] = accept
- GOTO table for non-terminals
  - GOTO[i, A] = j if GOTO(Ii, A) = Ij
- Empty entries are errors
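To see what the FOLLOW restriction buys, here is a hedged Python sketch for the grammar 0 S -> E $, 1 E -> T + E, 2 E -> T, 3 T -> x. It computes FOLLOW by fixpoint iteration (the grammar has no empty productions, so FIRST needs no nullable handling) and then fills the action row of the state containing E -> T • + E and E -> T •. The function names and the (rule, dot) item encoding are assumptions of this sketch.

```python
GRAMMAR = [
    ("S", ("E", "$")),        # rule 0
    ("E", ("T", "+", "E")),   # rule 1
    ("E", ("T",)),            # rule 2
    ("T", ("x",)),            # rule 3
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbol, first):
    """FIRST of a single symbol (no empty productions in this grammar)."""
    return first[symbol] if symbol in NONTERMINALS else {symbol}


def follow_sets():
    """Compute FOLLOW for every nonterminal by fixpoint iteration."""
    first = {nt: set() for nt in NONTERMINALS}
    follow = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            updates = [(first[lhs], first_of(rhs[0], first))]
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    if i + 1 < len(rhs):
                        source = first_of(rhs[i + 1], first)
                    else:
                        source = follow[lhs]
                    updates.append((follow[sym], source))
            for target, source in updates:
                if not source <= target:
                    target |= source
                    changed = True
    return follow


def slr_row(items, follow):
    """Action row of one state: terminal -> set of actions."""
    row = {}
    for rule, dot in items:
        lhs, rhs = GRAMMAR[rule]
        if dot < len(rhs):
            if rhs[dot] not in NONTERMINALS:
                row.setdefault(rhs[dot], set()).add("shift")
        else:
            for terminal in follow[lhs]:
                row.setdefault(terminal, set()).add(f"r{rule}")
    return row


FOLLOW = follow_sets()
STATE_3 = [(1, 1), (2, 1)]    # E -> T . + E   and   E -> T .
row = slr_row(STATE_3, FOLLOW)
print(FOLLOW, row)
```

Because FOLLOW(E) contains only $, the reduce by rule 2 no longer competes with the shift on +.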
31Reduce LR(0) Table
0 S' -> S $    1 S -> ( L )    2 S -> x    3 L -> S    4 L -> L , S

[Figure: the LR(0) automaton for this grammar, with states 1 to 9 and transitions labeled x, (, ), ",", S, and L.]

Follow sets: FOLLOW(S') = { }    FOLLOW(S) = { $, ",", ) }    FOLLOW(L) = { ",", ) }
32Reduce LR(0) Table
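        action                                  goto
state   (       )       x       ,       $       S       L
1       s3              s2                      g4
2       r2      r2      r2      r2      r2
3       s3              s2                      g7      g5
4                                       accept
5               s6              s8
6       r1      r1      r1      r1      r1
7       r3      r3      r3      r3      r3
8       s3              s2                      g9
9       r4      r4      r4      r4      r4

Under SLR, a reduce entry is kept only in the columns that belong to FOLLOW of the rule's left-hand side: r1 and r2 stay under ), "," and $; r3 and r4 stay under ) and ",".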
33Resolve Shift-reduce Conflict
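0 S -> E $    1 E -> T + E    2 E -> T    3 T -> x

[Figure: the LR(0) automaton for this grammar; state 3 contains E -> T • + E and E -> T •, and state 4 contains E -> T + • E, E -> • T + E, E -> • T, and T -> • x.]

Follow sets: FOLLOW(S) = { }    FOLLOW(E) = { $ }    FOLLOW(T) = { +, $ }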
34Resolve Shift-reduce Conflict
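        action                  goto
state   x       +       $       E       T
1       s5                      g2      g3
2                       accept
3       r2      s4,r2   r2
4       s5                      g6      g3
5       r3      r3      r3
6       r1      r1      r1

With FOLLOW(E) = { $ }, the reduce r2 in state 3 is kept only under $, so the entry under + becomes a plain s4 and the conflict disappears.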
35Problems with SLR
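S' -> S    S -> L = R | R    L -> * R | id    R -> L

[Figure: part of the LR(0) automaton for this grammar, with transitions labeled R, L, and id.]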
36Problems with SLR
- Reduce on ALL terminals in the FOLLOW set
- FOLLOW(R) = FOLLOW(L)
- But we should never reduce R -> L on = in state 2
- Thus, there should be no reduction on = in state 2
- Why does this happen, and how can we solve it?

S -> L = R | R    L -> * R | id    R -> L
37LR(1) Items
- [X -> α • β, a] means
  - α is at the top of the stack
  - the input string is derivable from β a
- In other words, when we reduce by X -> α β, a had better be the lookahead symbol.
- Or, put "reduce by X -> α β" in action[s, a] only
38LR(1) table construction
- Construct LR(1) items
- Item set Ii becomes state i
- Parsing actions at state i are
  - if [A -> α • a β, b] ∈ Ii and goto(Ii, a) = Ij, then action[i, a] = shift j
  - if [A -> α •, b] ∈ Ii and A ≠ S', then action[i, b] = reduce by A -> α
  - if [S' -> S •, $] ∈ Ii, then action[i, $] = accept
- GOTO table for non-terminals: GOTO[i, A] = j if GOTO(Ii, A) = Ij
- Empty entries are errors
- Initial state is the item set containing [S' -> • S, $]
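The only new ingredient compared with LR(0) is that closure must propagate lookaheads. A minimal Python sketch for the grammar S' -> S, S -> L = R | R, L -> * R | id, R -> L follows; items are triples (rule number, dot position, lookahead), the grammar has no empty productions so FIRST of the rest of an item is just FIRST of its first symbol, and all names are illustrative.

```python
GRAMMAR = [
    ("S'", ("S",)),           # rule 0
    ("S", ("L", "=", "R")),   # rule 1
    ("S", ("R",)),            # rule 2
    ("L", ("*", "R")),        # rule 3
    ("L", ("id",)),           # rule 4
    ("R", ("L",)),            # rule 5
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal (no empty productions assumed)."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """LR(1) closure: [A -> a . X b, la] adds [X -> . g, t], t in FIRST(b la)."""
    items = set(items)
    work = list(items)
    while work:
        rule, dot, lookahead = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot >= len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        rest = rhs[dot + 1:]
        if rest:
            heads = FIRST[rest[0]] if rest[0] in NONTERMINALS else {rest[0]}
        else:
            heads = {lookahead}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for t in heads:
                    if (i, 0, t) not in items:
                        items.add((i, 0, t))
                        work.append((i, 0, t))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over symbol, keeping each item's lookahead."""
    moved = {(r, d + 1, la) for r, d, la in items
             if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(moved)


start = closure({(0, 0, "$")})
after_L = goto(start, "L")
reduce_on = {la for r, d, la in after_L if d == len(GRAMMAR[r][1])}
print(sorted(after_L), reduce_on)
```

In the state reached on L, the item R -> L • carries only the lookahead $, so the reduce is entered under $ alone and no longer clashes with the shift on =.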
39LR(1) Items (part)
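S' -> S    S -> L = R | R    L -> * R | id    R -> L

[Figure: part of the LR(1) automaton for this grammar, with transitions labeled L.]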
40More
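S -> L = R | R    L -> * R | id    R -> L

[Figure: more of the LR(1) automaton, with transitions labeled R, id, and L; the remaining states are abbreviated as "others".]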
41Notice similar states?
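S -> L = R | R    L -> * R | id    R -> L

[Figure: the LR(1) automaton, with transitions labeled R, id, and L; the remaining states are abbreviated as "others".]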
42Notice similar states?
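S -> L = R | R    L -> * R | id    R -> L

[Figure: the LR(1) automaton, with transitions labeled R, id, and L; the remaining states are abbreviated as "others". Four states are highlighted:]

State 5:  [L -> id •, =/$]
State 8:  [R -> L •, =/$]
State 10: [R -> L •, $]
State 11: [L -> id •, $]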
43LALR
S -> C C    C -> c C | d

[Figure: automaton for this grammar, with transitions labeled c, C, d, and S.]
44LALR
S -> C C    C -> c C | d

[Figure: automaton for this grammar, with transitions labeled c, C, d, and S.]
45LALR
S -> C C    C -> c C | d

[Figure: automaton for this grammar, with transitions labeled c, C, d, and S.]
46LALR Construction
- Merge items with common cores
- Change GOTO table to reflect merges
- Can introduce reduce/reduce conflicts
- Cannot introduce shift/reduce conflicts
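The merge step itself is simple once the LR(1) states exist. The sketch below groups states by core, that is, by their items with the lookaheads stripped, and unions the lookaheads. The four sample states are the similar-looking ones from the "Notice similar states?" slide; their lookaheads (= and $) are assumed here, since the figure is only partly legible, and merge_by_core is a made-up name.

```python
# LR(1) states as sets of (core item, lookahead) pairs.
# State numbers and lookaheads are assumptions taken from the earlier slide.
LR1_STATES = {
    5: {("L -> id .", "="), ("L -> id .", "$")},
    8: {("R -> L .", "="), ("R -> L .", "$")},
    10: {("R -> L .", "$")},
    11: {("L -> id .", "$")},
}


def merge_by_core(states):
    """Group states with identical cores and union their items."""
    merged = {}
    for number, items in sorted(states.items()):
        core = frozenset(item for item, _ in items)
        numbers, union = merged.setdefault(core, ([], set()))
        numbers.append(number)
        union |= items          # in-place union of the lookahead pairs
    return merged


for numbers, items in merge_by_core(LR1_STATES).values():
    print(numbers, sorted(items))
```

After merging, every transition that used to lead to one of the old states has to be redirected to the merged state, which is the GOTO change mentioned above.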
47Ambiguous Grammars
- No ambiguous grammar can be LR(k)
  - hence ambiguous grammars cannot be parsed bottom-up as they stand
- Nevertheless, some ambiguous grammars are well understood and can be parsed by LR(k) with some tricks
  - precedence
  - associativity
  - dangling-else
48Precedence
E -> E + E | E * E | id

State: S -> • E ; E -> • E + E ; E -> • E * E ; E -> • id
  on E, go to
State: S -> E • ; E -> E • + E ; E -> E • * E

s/r on both + and *
49Precedence
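E -> E + E | E * E | id

State: S -> • E ; E -> • E + E ; E -> • E * E ; E -> • id
  on E, go to
State: S -> E • ; E -> E • + E ; E -> E • * E

What if we want both + and * right-associative?

With the usual precedence (* binds tighter than +, both left-associative):
- after E * E: reduce on +, reduce on *
- after E + E: reduce on +, shift on *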
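Yacc-style generators settle such conflicts by comparing the precedence of the rule being reduced with that of the lookahead token, and by falling back on associativity when the two are equal. A minimal Python sketch of that decision, with made-up names (PRECEDENCE, LEFT, RIGHT, resolve) and only left and right associativity covered:

```python
PRECEDENCE = {"+": 1, "*": 2}          # larger number binds tighter
LEFT = {"+": "left", "*": "left"}
RIGHT = {"+": "right", "*": "right"}


def resolve(rule_operator, lookahead, assoc=LEFT):
    """Pick shift or reduce when E -> E op E . meets an operator lookahead."""
    if PRECEDENCE[rule_operator] > PRECEDENCE[lookahead]:
        return "reduce"     # the finished rule binds tighter
    if PRECEDENCE[rule_operator] < PRECEDENCE[lookahead]:
        return "shift"      # the upcoming operator binds tighter
    # Equal precedence: associativity decides.
    return "reduce" if assoc[rule_operator] == "left" else "shift"


for op in "+*":
    for lookahead in "+*":
        print(f"after E {op} E, on {lookahead}:", resolve(op, lookahead))
```

Passing RIGHT instead of LEFT turns the equal-precedence cases into shifts, which is what making both operators right-associative amounts to.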
50Parser Implementation
- Implementation options
  - Write a parser from scratch
    - not as boring as writing a lexer, but not as simple as you may imagine
  - Use an automatic parser generator
    - very general and robust, though sometimes not quite as efficient as hand-written parsers
    - nevertheless, good for lazy compiler writers
- Both are used extensively in production compilers
51Yacc Tool
[Figure: specification -> Yacc -> parser, which feeds the semantic analyzer.]
Creates a parser from a declarative specification involving a context-free grammar
52Brief History
- YACC stands for Yet Another Compiler-Compiler
- It was first developed by Steve Johnson in 1975 for Unix
- There have been many later versions of YACC (e.g., GNU Bison), each offering minor improvements
- Ported to many languages
- YACC is now a standard tool, defined in IEEE POSIX standard P1003.2
53ML-Yacc
- User Declarations: declare values available in the rule actions
- %%
- ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts
- %%
- Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax
54ML-Yacc Definitions (preliminaries)
- Specify type of positions
  - %pos int * int
- Specify terminal and nonterminal symbols
  - %term IF | THEN | ELSE | PLUS | MINUS ...
  - %nonterm prog | exp | stm
- Specify end-of-parse token
  - %eop EOF
- Specify start symbol (by default, the nonterminal on the LHS of the first rule)
  - %start prog
55Example
%%
%term ASSIGN | ID | PLUS | NUM | SEMICOLON | TIMES
%nonterm s | e | p
%pos int
%start p
%eop EOF
%left PLUS
%left TIMES
%%
p : s SEMICOLON p   ()
  |                 ()
s : ID ASSIGN e     ()
e : e PLUS e        ()
  | e TIMES e       ()
  | ID              ()
  | NUM             ()
56Summary
- Bottom-up parsing
  - reverse order of derivations
- LR grammars are more powerful than LL(1) grammars
  - use of stacks and parse tables
  - yet more complex
- Bonus: tools do the hard work for you; read the online ML-Yacc manual