Title: Syntax Analysis
1Syntax Analysis
2Syntax Analysis
- Introduction to parsers
- Context-free grammars
- Push-down automata
- Top-down parsing
- Bottom-up parsing
- Bison - a parser generator
3Introduction to parsers
[Figure: the parser requests the next token from the tokenized source code, builds a syntax tree from the tokens it receives, and consults the symbol table.]
4Context-Free Grammars
- A set of terminals: basic symbols from which sentences are formed
- A set of nonterminals: syntactic categories denoting sets of sentences
- A set of productions: rules specifying how the terminals and nonterminals can be combined to form sentences
- The start symbol: a distinguished nonterminal denoting the language
5An Example
- Terminals: id, +, -, *, /, (, )
- Nonterminals: expr, op
- Productions:
    expr → expr op expr
    expr → ( expr )
    expr → - expr
    expr → id
    op → + | - | * | /
- The start symbol: expr
6Derivations
- A derivation step is an application of a production as a rewriting rule: E ⇒ - E
- A sequence of derivation steps E ⇒ - E ⇒ - ( E ) ⇒ - ( id ) is called a derivation of - ( id ) from E
- The symbol ⇒* denotes "derives in zero or more steps"; the symbol ⇒+ denotes "derives in one or more steps":
    E ⇒* - ( id )
    E ⇒+ - ( id )
7Context-Free Languages
- A context-free language L(G) is the language defined by a context-free grammar G
- A string of terminals ω is in L(G) if and only if S ⇒+ ω; ω is called a sentence of G
- If S ⇒* α, where α may contain nonterminals, then we call α a sentential form of G:
    E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
- G1 is equivalent to G2 if L(G1) = L(G2)
8Left- and Right-most Derivations
- Each derivation step needs to choose
- a nonterminal to rewrite
- a production to apply
- A leftmost derivation always chooses the leftmost nonterminal to rewrite:
    E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
- A rightmost derivation always chooses the rightmost nonterminal to rewrite:
    E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
9Parse Trees
- A parse tree is a graphical representation of a derivation that filters out the order in which nonterminals are chosen for rewriting
- Many derivations may correspond to the same parse tree, but every parse tree has associated with it a unique leftmost and a unique rightmost derivation
10An Example
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
11Ambiguous Grammar
- A grammar is ambiguous if it produces more than one parse tree for some sentence
    E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
    E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
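As an illustration of the definition above, the following sketch counts the distinct parse trees of a token string under the grammar E → E + E | E * E | id. The helper name count_trees and the token spelling are assumptions made for this example only; they do not come from the slides.

```python
from functools import lru_cache


def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways tokens[lo:hi] can be derived from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


# More than one tree for a sentence means the grammar is ambiguous.
print(count_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence is enough to show ambiguity; a count of one for a particular sentence proves nothing about the grammar as a whole.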
12Ambiguous Grammar
13Resolving Ambiguity
- Use disambiguating rules to throw away undesirable parse trees
- Rewrite grammars by incorporating disambiguating rules into grammars
14An Example
- The dangling-else grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
- Two parse trees for: if E1 then if E2 then S1 else S2
15An Example
16Disambiguating Rules
- Rule: match each else with the closest previous unmatched then
- Remove undesired state transitions in the pushdown automaton
17Grammar Rewriting
stmt → m_stmt | unm_stmt
m_stmt → if expr then m_stmt else m_stmt
       | other
unm_stmt → if expr then stmt
         | if expr then m_stmt else unm_stmt
18RE vs. CFG
- Every language described by a RE can also be described by a CFG
- Why use REs for lexical syntax?
- Lexical rules do not need a notation as powerful as CFGs
- REs are more concise and easier to understand than CFGs
- More efficient lexical analyzers can be constructed from REs than from CFGs
- Using both provides a way of modularizing the front end into two manageable-sized components
19Push-Down Automata
[Figure: a push-down automaton is a finite automaton that reads the input, uses a stack, and produces output.]
20An Example
S' → S
S → a S b
S → ε
21Nonregular Constructs
- REs can denote only a fixed number of repetitions or an unspecified number of repetitions of one given construct: a^n, a*
- A nonregular construct
- L = { a^n b^n | n ≥ 0 }
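The language a^n b^n needs unbounded counting, which is what the stack of a push-down automaton provides. The sketch below is a hand-written recognizer in that style; the function name accepts_anbn is an assumption made for this example.

```python
def accepts_anbn(s):
    """Recognize { a^n b^n | n >= 0 } with an explicit stack, PDA-style."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            stack.append("a")   # push one marker per a
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()         # each b cancels one pushed a
        else:
            return False        # a after b, extra b, or foreign symbol
    return not stack            # accept only if every a was cancelled


print(accepts_anbn("aaabbb"), accepts_anbn("aab"))
```

The stack depth grows with n, which is exactly the information a finite automaton has no way to remember.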
22Non-Context-Free Constructs
- CFGs can denote only a fixed number of repetitions or an unspecified number of repetitions of one or two given constructs
- Some non-context-free constructs
- L1 = { w c w | w is in (a | b)* }
- L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
- L3 = { a^n b^n c^n | n ≥ 0 }
24Top-Down Parsing
- Construct a parse tree from the root to the leaves using leftmost derivation
    1. S → c A B        input: cad
    2. A → a b
    3. A → a
    4. B → d
[Figure: parse tree under construction, rooted at S.]
25Predictive Parsing
- Top-down parsing without backtracking
- there is only one alternative production to choose at each derivation step
    stmt → if expr then stmt else stmt
         | while expr do stmt
         | begin stmt_list end
26LL(k) Parsing
- The first L stands for scanning the input from left to right
- The second L stands for producing a leftmost derivation
- The k stands for the number of lookahead input symbols used to choose alternative productions at each derivation step
27LL(1) Parsing
- Use one input symbol of lookahead
- Recursive-descent parsing
- Nonrecursive predictive parsing
28An Example
LL(1): S → a b e | c d e
LL(2): S → a b e | a d e
29Recursive Descent Parsing
- The parser consists of a set of (possibly recursive) procedures
- Each procedure is associated with a nonterminal of the grammar and is responsible for deriving the productions of that nonterminal
- Each procedure should be able to choose a unique production to derive based on the current token
30An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
Tokens that can begin simple: { integer, char, num }
31Recursive Descent Parsing
- For each terminal in the production, the terminal is matched with the current token
- For each nonterminal in the production, the procedure associated with the nonterminal is called
- The sequence of matchings and procedure calls in processing the input implicitly defines a parse tree for the input
32An Example
Input: array [ num dotdot num ] of integer
[Figure: parse tree for this input, rooted at type.]
33An Example
procedure match(t: terminal);
begin
  if lookahead = t then lookahead := nexttoken
  else error
end;
34An Example
procedure type;
begin
  if lookahead is in { integer, char, num } then simple
  else if lookahead = id then match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']');
    match(of); type
  end
  else error
end;
35An Example
procedure simple;
begin
  if lookahead = integer then match(integer)
  else if lookahead = char then match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end;
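The three procedures above can be sketched as a small runnable parser. This is one possible rendering, not the slides' own code: the class name Parser, the exception ParseError, the method name type_ (type is a Python builtin), and the "$" endmarker are assumptions introduced for the example.

```python
class ParseError(Exception):
    """Raised when the lookahead token fits no production."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" marks end of input (assumed)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, t):
        if self.lookahead == t:
            self.pos += 1
        else:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")

    def type_(self):
        # type -> simple | id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def parse(tokens):
    """Return True if `tokens` is a sentence derived from type."""
    parser = Parser(tokens)
    parser.type_()
    parser.match("$")  # the whole input must be consumed
    return True


print(parse("array [ num dotdot num ] of integer".split()))
```

Each method inspects only the current token to pick a production, which is what makes the parser predictive.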
36First Sets
- The first set of a string α is the set of terminals that begin the strings derived from α. If α ⇒* ε, then ε is also in the first set of α.
37First Sets
- If X is terminal, then FIRST(X) is { X }
- If X is nonterminal and X → ε is a production, then add ε to FIRST(X)
- If X is nonterminal and X → Y1 Y2 ... Yk is a production, then add a to FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj) for all j, then add ε to FIRST(X)
38An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(F) = { (, id }
FIRST(T') = { *, ε }
FIRST(T) = { (, id }
FIRST(E') = { +, ε }
FIRST(E) = { (, id }
39Follow Sets
- The follow set of a nonterminal A is the set of terminals that can appear immediately to the right of A in some sentential form; namely, if S ⇒* α A a β, then a is in the follow set of A.
40Follow Sets
- Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker
- If there is a production A → α B β, then everything in FIRST(β) except for ε is placed in FOLLOW(B)
- If there is a production A → α B, or A → α B β where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
41An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
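The FIRST and FOLLOW rules can be applied mechanically by iterating until nothing changes. The sketch below does this for the expression grammar of the example; the dictionary encoding of the grammar and the function names first_of, compute_first and compute_follow are assumptions made for illustration.

```python
EPS, END = "ε", "$"

# Each nonterminal maps to its list of right-hand sides.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of nonterminals."""
    out = set()
    for x in symbols:
        fx = first[x] if x in first else {x}  # a terminal's FIRST is itself
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    out.add(EPS)  # every symbol can vanish, or the string is empty
    return out


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:  # repeat until no set grows
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                new = first_of(body, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {a: set() for a in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue  # FOLLOW is defined for nonterminals only
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "E")
print(sorted(FIRST["E'"]), sorted(FOLLOW["F"]))
```

The fixed-point loops terminate because the sets only grow and are bounded by the finite set of terminals.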
42Nonrecursive Predictive Parsing
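[Figure: a nonrecursive predictive parser. The parsing driver reads the input, manipulates the stack, consults the parsing table, and produces the output.]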
43Stack Operations
- Match
- when the top stack symbol is a terminal and it matches the input token, pop the terminal and advance the input pointer
- Expand
- when the top stack symbol is a nonterminal, replace this symbol by the right-hand side of one of its productions (pop the nonterminal and push the right-hand side of a production in reverse order)
44An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
45An Example
E = expand, M = match; the top of the stack is on the right.

Action | Stack                     | Input
E      | type                      | array [ num dotdot num ] of integer
M      | type of ] simple [ array  | array [ num dotdot num ] of integer
M      | type of ] simple [        | [ num dotdot num ] of integer
E      | type of ] simple          | num dotdot num ] of integer
M      | type of ] num dotdot num  | num dotdot num ] of integer
M      | type of ] num dotdot      | dotdot num ] of integer
M      | type of ] num             | num ] of integer
M      | type of ]                 | ] of integer
M      | type of                   | of integer
E      | type                      | integer
E      | simple                    | integer
M      | integer                   | integer
46Parsing Driver
push $ and then S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$
repeat
  let X be the top stack symbol and a the symbol pointed to by ip
  if X is a terminal or $ then
    if X = a then pop X from the stack and advance ip
    else error
  else /* X is a nonterminal */
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error
until X = $ and a = $
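A runnable sketch of the driver, using a hand-written LL(1) table for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. The table encoding (a dictionary keyed by nonterminal and lookahead, with an empty list standing for ε), the function name ll1_parse, and the use of Python's built-in SyntaxError for the error action are assumptions of this example.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] -> right-hand side to expand; [] stands for an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens, start="E"):
    """Return the productions used, in leftmost-derivation order."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]  # top of the stack is the end of the list
    ip = 0
    output = []
    while stack:
        x, a = stack.pop(), tokens[ip]
        if x in NONTERMINALS:
            body = TABLE.get((x, a))
            if body is None:
                raise SyntaxError(f"no entry M[{x}, {a}]")
            output.append((x, body))
            stack.extend(reversed(body))  # push Yk ... Y1, so Y1 ends on top
        elif x == a:
            ip += 1  # match a terminal or the endmarker
        else:
            raise SyntaxError(f"expected {x!r}, found {a!r}")
    return output


for head, body in ll1_parse(["id", "+", "id", "*", "id"]):
    print(head, "→", " ".join(body) or "ε")
```

Popping X before deciding what to do merges the "pop" of both the match and expand cases; the loop ends once the bottom $ has been matched against the input endmarker.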
47Constructing Parsing Table
- Input. Grammar G.
- Output. Parsing table M.
- Method.
- 1. For each production A → α, do steps 2 and 3.
- 2. For each terminal a in FIRST(α), add A → α to M[A, a].
- 3. If ε is in FIRST(α), add A → α to M[A, b] for each symbol b in FOLLOW(A).
- 4. Make each undefined entry of M be error.
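The method can be sketched in a few lines once FIRST and FOLLOW are known. Here the sets for the expression grammar are written out by hand; the names PRODUCTIONS, first_of and build_table and the list-valued table cells (so that conflicts stay visible) are assumptions of this example.

```python
EPS = "ε"

# (head, body) pairs; an empty body stands for an ε-production.
PRODUCTIONS = [
    ("E", ["T", "E'"]), ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]), ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of(body):
    """FIRST of a right-hand side; terminals are their own FIRST set."""
    out = set()
    for x in body:
        fx = FIRST.get(x, {x})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}


def build_table(productions):
    table = {}
    for head, body in productions:
        fs = first_of(body)
        targets = fs - {EPS}           # step 2
        if EPS in fs:
            targets |= FOLLOW[head]    # step 3
        for a in targets:
            table.setdefault((head, a), []).append(body)
    return table                       # step 4: missing keys mean error


M = build_table(PRODUCTIONS)
print(all(len(entries) == 1 for entries in M.values()))
```

Keeping a list in every cell makes multiply-defined entries easy to detect: a cell with more than one production signals that the grammar is not LL(1).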
48An Example
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
49An Example
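Stack     | Input           | Output
$E        | id + id * id $  |
$E'T      | id + id * id $  | E → TE'
$E'T'F    | id + id * id $  | T → FT'
$E'T'id   | id + id * id $  | F → id
$E'T'     | + id * id $     |
$E'       | + id * id $     | T' → ε
$E'T+     | + id * id $     | E' → +TE'
$E'T      | id * id $       |
$E'T'F    | id * id $       | T → FT'
$E'T'id   | id * id $       | F → id
$E'T'     | * id $          |
$E'T'F*   | * id $          | T' → *FT'
$E'T'F    | id $            |
$E'T'id   | id $            | F → id
$E'T'     | $               |
$E'       | $               | T' → ε
$         | $               | E' → ε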
50LL(1) Grammars
- A grammar is an LL(1) grammar if its LL(1)
parsing table has no multiply-defined entries
51A Counter Example
S → i E t S S' | a        FOLLOW(S) = { $, e }
S' → e S | ε              FOLLOW(S') = { $, e }
E → b                     FOLLOW(E) = { t }

   | a     | b     | e                | i              | t | $
S  | S → a |       |                  | S → i E t S S' |   |
S' |       |       | S' → e S, S' → ε |                |   | S' → ε
E  |       | E → b |                  |                |   |
52LL(1) Grammars
- A grammar G is LL(1) iff whenever A → α | β are two distinct productions of G, the following conditions hold
- For no terminal a do both α and β derive strings beginning with a.
- At most one of α and β can derive the empty string.
- If β ⇒* ε, then α does not derive any string beginning with a terminal in FOLLOW(A).
53Left Recursion
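- A grammar is left recursive if it has a nonterminal A such that A ⇒+ A α
    A → A α | β
  can be rewritten as
    A → β R
    R → α R | ε
[Figure: parse trees for the same string under the left-recursive grammar (nodes A) and under the rewritten grammar (nodes A and R).]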
54Direct Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
is rewritten as
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
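The transformation is mechanical, as the following sketch suggests. The function name eliminate_direct_left_recursion and the list-of-symbols encoding (with an empty list for ε) are assumptions of this example, and the new nonterminal is named by appending a prime, which assumes that name is not already in use.

```python
def eliminate_direct_left_recursion(head, bodies):
    """Rewrite the productions of `head` so none starts with `head`.

    `bodies` is a list of right-hand sides, each a list of symbols;
    the empty list stands for ε.
    """
    new_head = head + "'"  # assumed to be a fresh nonterminal name
    # A -> A alpha_i : keep alpha_i.  A -> beta_j : keep beta_j.
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}  # nothing to do
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],
    }


print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T | T this yields E → T E' and E' → + T E' | ε, the same shape as the general rule.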
55An Example
E → E + T | T
T → T * F | F
F → ( E ) | id
is rewritten as
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
56Indirect Left Recursion
S → A a | b
A → A c | S d | ε
S ⇒ A a ⇒ S d a

Substituting S in A → S d:
S → A a | b
A → A c | A a d | b d | ε

Eliminating direct left recursion:
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
57Indirect Left Recursion
Input. Grammar G with no cycles (derivations of the form A ⇒+ A) or ε-productions (productions of the form A → ε).
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the current Aj-productions
     end
     eliminate direct left recursion among Ai-productions
   end
58Left Factoring
- Two alternatives of a nonterminal A have a nontrivial common prefix if α ≠ ε and
    A → α β1 | α β2
- Left factoring rewrites them as
    A → α A'
    A' → β1 | β2
59An Example
S → i E t S | i E t S e S | a
E → b
is rewritten as
S → i E t S S' | a
S' → e S | ε
E → b
60Error Recovery
- Panic mode: skip tokens until a token in a set of synchronizing tokens appears
- If a terminal on the stack cannot be matched, pop the terminal
- use FOLLOW(A) as sync set for A (pop A)
- use the first set of a higher construct as sync set for A
- use FIRST(A) as sync set for A
- use the production deriving ε as the default for A
61An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id

FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
62An Example
63An Example
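Stack     | Input          | Output
$E        | ) id * + id $  | error, skip )
$E        | id * + id $    |
$E'T      | id * + id $    | E → TE'
$E'T'F    | id * + id $    | T → FT'
$E'T'id   | id * + id $    | F → id
$E'T'     | * + id $       |
$E'T'F*   | * + id $       | T' → *FT'
$E'T'F    | + id $         | error
$E'T'     | + id $         | F has been popped
$E'       | + id $         | T' → ε
$E'T+     | + id $         | E' → +TE'
$E'T      | id $           |
$E'T'F    | id $           | T → FT'
$E'T'id   | id $           | F → id
$E'T'     | $              |
$E'       | $              | T' → ε
$         | $              | E' → ε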
65Bottom-Up Parsing
- Construct a parse tree from the leaves to the root using rightmost derivation in reverse
    S → a A B e        input: abbcde
    A → A b c | b
    B → d
    abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
66Handles
- A handle of a right-sentential form γ consists of
- a production A → β
- a position of γ where β can be replaced by A to produce the previous right-sentential form in a rightmost derivation of γ
67Handle Pruning
- The string ω to the right of the handle contains only terminals
- A is the bottommost leftmost interior node with all its children in the tree
68An Example
[Figure: handle pruning on parse trees rooted at S.]
69Shift-Reduce Parsing
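[Figure: a shift-reduce parser. The parsing driver reads the input, keeps the handle on top of the stack, consults the parsing table, and produces the output.]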
70Stack Operations
- Shift: shift the next input symbol onto the top of the stack
- Reduce: replace the handle at the top of the stack with the corresponding nonterminal
- Accept: announce successful completion of the parsing
- Error: call an error recovery routine
71An Example
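S = shift, R = reduce, A = accept; the top of the stack is on the right.

Action | Stack        | Input
S      | $            | a b b c d e $
S      | $ a          | b b c d e $
R      | $ a b        | b c d e $
S      | $ a A        | b c d e $
S      | $ a A b      | c d e $
R      | $ a A b c    | d e $
S      | $ a A        | d e $
R      | $ a A d      | e $
S      | $ a A B      | e $
R      | $ a A B e    | $
A      | $ S          | $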
72Shift/Reduce Conflict
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

Stack: ... if expr then stmt        Input: else ...

Shift ⇒ if expr then stmt else stmt
Reduce ⇒ if expr then stmt
73Reduce/Reduce Conflict
stmt → id ( para_list )
stmt → expr := expr
para_list → para_list , para
para_list → para
para → id
expr → id ( expr_list )
expr → id
expr_list → expr_list , expr
expr_list → expr

Stack: ... id ( id        Input: , id ) ...

With a distinct token procid for procedure names, the conflict disappears:
Stack: ... procid ( id    Input: , id ) ...
74LR(k) Parsing
- The L stands for scanning the input from left to right
- The R stands for constructing a rightmost derivation in reverse
- The k stands for the number of lookahead input symbols used to make parsing decisions
75LR Parsing
- The LR parsing algorithm
- Constructing SLR(1) parsing tables
- Constructing LR(1) parsing tables
- Constructing LALR(1) parsing tables
76Model of an LR Parser
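[Figure: model of an LR parser. The stack holds S0, ..., Xm-1, Sm-1, Xm, Sm with Sm on top; the parsing driver reads the input, consults the parsing table (Action and Goto parts), and produces the output.]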
77An Example
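      | Action                            | Goto
State | id  | +   | *   | (   | )   | $   | E | T | F
0     | s5  |     |     | s4  |     |     | 1 | 2 | 3
1     |     | s6  |     |     |     | acc |   |   |
2     |     | r2  | s7  |     | r2  | r2  |   |   |
3     |     | r4  | r4  |     | r4  | r4  |   |   |
4     | s5  |     |     | s4  |     |     | 8 | 2 | 3
5     |     | r6  | r6  |     | r6  | r6  |   |   |
6     | s5  |     |     | s4  |     |     |   | 9 | 3
7     | s5  |     |     | s4  |     |     |   |   | 10
8     |     | s6  |     |     | s11 |     |   |   |
9     |     | r1  | s7  |     | r1  | r1  |   |   |
10    |     | r3  | r3  |     | r3  | r3  |   |   |
11    |     | r5  | r5  |     | r5  | r5  |   |   |

(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id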
78An Example
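Action | Stack                  | Input
s5     | 0                      | id + id * id $
r6     | 0 id5                  | + id * id $
r4     | 0 F3                   | + id * id $
r2     | 0 T2                   | + id * id $
s6     | 0 E1                   | + id * id $
s5     | 0 E1 +6                | id * id $
r6     | 0 E1 +6 id5            | * id $
r4     | 0 E1 +6 F3             | * id $
s7     | 0 E1 +6 T9             | * id $
s5     | 0 E1 +6 T9 *7          | id $
r6     | 0 E1 +6 T9 *7 id5      | $
r3     | 0 E1 +6 T9 *7 F10      | $
r1     | 0 E1 +6 T9             | $
acc    | 0 E1                   | $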
79LR Parsing Driver
push s0 onto the stack, where s0 is the initial state
set ip to point to the first symbol of w$
repeat
  let s be the top state on the stack and a the symbol pointed to by ip
  if action[s, a] = shift s' then
    push a and s' onto the stack and advance ip
  else if action[s, a] = reduce A → β then
    pop 2 * |β| symbols off the stack
    s' := goto[top(), A]
    push A and s' onto the stack
  else if action[s, a] = accept then return
  else error
until false
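A runnable sketch of the driver for the expression grammar (1) E → E + T, (2) E → T, (3) T → T * F, (4) T → F, (5) F → ( E ), (6) F → id, with its SLR(1) table written out by hand. The dictionary encoding and the name lr_parse are assumptions of this example. One deliberate difference from the pseudocode: the stack here stores states only, so a reduction pops |β| entries rather than 2 * |β|.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
         4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """Return the production numbers used, in order of reduction."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # states only; grammar symbols are implied by the states
    ip = 0
    reductions = []
    while True:
        act = ACTION[stack[-1]].get(tokens[ip])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[ip]!r} in state {stack[-1]}")
        if act == "acc":
            return reductions
        if act[0] == "s":              # shift: push the new state, advance
            stack.append(int(act[1:]))
            ip += 1
        else:                          # reduce by production n
            n = int(act[1:])
            head, length = PRODS[n]
            del stack[-length:]        # every body here is non-empty
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(n)


print(lr_parse(["id", "+", "id", "*", "id"]))
```

Read backwards, the returned list of reductions is a rightmost derivation of the input.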
80LR(0) Items
- An LR(0) item of a grammar G is a production of G with a dot at some position of the right-hand side: A → α · β
- The production A → X Y Z yields the following four LR(0) items: A → · X Y Z, A → X · Y Z, A → X Y · Z, A → X Y Z ·
- An LR(0) item represents a state in an NPDA indicating how much of a production we have seen at a given point in the parsing process
81From CFG to NPDA
- The state A → α · B β will go to the state B → · γ via an edge of the empty string ε
- The state A → α · a β will go to the state A → α a · β via an edge of terminal a (a shifting)
- The state A → α · will cause a reduction on seeing a terminal in FOLLOW(A)
- The state A → α · B β will go to the state A → α B · β via an edge of nonterminal B (after a reduction)
82An Example
Augmented grammar (easier to identify the accepting state):
1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
83An Example
84From NPDA to DPDA
- There are two functions performed on sets of LR(0) items (DPDA states)
- The function closure(I) adds more items to I when there is a dot to the left of a nonterminal (corresponding to ε edges)
- The function goto(I, X) moves the dot past the symbol X in all items in I that contain X immediately after the dot (corresponding to non-ε edges)
85The Closure Function
function closure(I)
begin
  J := I
  repeat
    for each item A → α · B β in J and each production B → γ of G
        such that B → · γ is not in J do
      J := J ∪ { B → · γ }
  until no more items can be added to J
  return J
end
86An Example
s0 = E' → · E
I0 = closure({ s0 }) = { E' → · E, E → · E + T, E → · T, T → · T * F, T → · F, F → · ( E ), F → · id }

1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
87The Goto Function
function goto(I, X)
begin
  set J to the empty set
  for any item A → α · X β in I do
    add A → α X · β to J
  return closure(J)
end
88An Example
I0 = { E' → · E, E → · E + T, E → · T, T → · T * F, T → · F, F → · ( E ), F → · id }
goto(I0, E) = closure({ E' → E ·, E → E · + T }) = { E' → E ·, E → E · + T }
89Subset Construction
function items(G)
begin
  C := { closure({ S' → · S }) }
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ { J }
  until no more sets of items can be added to C
  return C
end
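The closure, goto and items functions can be sketched directly for the augmented expression grammar. The encoding of an item as a (production index, dot position) pair, the use of frozenset for item sets, and the parameter name item_set are assumptions of this example.

```python
# Augmented expression grammar; production 0 is E' -> E.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = {sym for _, body in GRAMMAR for sym in body}


def closure(item_set):
    """Add B -> . γ for every nonterminal B that follows a dot."""
    result = set(item_set)
    work = list(item_set)
    while work:
        p, dot = work.pop()
        body = GRAMMAR[p][1]
        if dot < len(body) and body[dot] in NONTERMINALS:
            for q, (head, _) in enumerate(GRAMMAR):
                if head == body[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)


def goto(item_set, x):
    """Move the dot past x in every item that has x right after the dot."""
    moved = {(p, dot + 1) for p, dot in item_set
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)


def items():
    """Collect all sets of LR(0) items reachable from the initial set."""
    start = closure({(0, 0)})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for x in SYMBOLS:
            nxt = goto(state, x)
            if nxt and nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states


print(len(items()))
```

A worklist replaces the "repeat until no more can be added" loops of the pseudocode; both compute the same fixed point.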
90An Example
1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
91
I0 = { E' → · E, E → · E + T, E → · T, T → · T * F, T → · F, F → · ( E ), F → · id }
goto(I0, E) = I1 = { E' → E ·, E → E · + T }
goto(I0, T) = I2 = { E → T ·, T → T · * F }
goto(I0, F) = I3 = { T → F · }
goto(I0, ( ) = I4 = { F → ( · E ), E → · E + T, E → · T, T → · T * F, T → · F, F → · ( E ), F → · id }
goto(I0, id) = I5 = { F → id · }
goto(I1, +) = I6 = { E → E + · T, T → · T * F, T → · F, F → · ( E ), F → · id }
goto(I2, *) = I7 = { T → T * · F, F → · ( E ), F → · id }
goto(I4, E) = I8 = { F → ( E · ), E → E · + T }
goto(I6, T) = I9 = { E → E + T ·, T → T · * F }
goto(I7, F) = I10 = { T → T * F · }
goto(I8, ) ) = I11 = { F → ( E ) · }
92An Example
[Figure: transition diagram of the DFA whose states 0 to 11 are the sets of LR(0) items I0 to I11; edges are labeled with grammar symbols such as E, T, F, id, ( and ).]
93SLR(1) Parsing Table Generation
procedure SLR(G)
begin
  for each state I in items(G) do begin
    if A → α · a β in I and goto(I, a) = J for a terminal a then
      action[I, a] := shift J
    if A → α · in I and A ≠ S' then
      action[I, a] := reduce A → α for all a in FOLLOW(A)
    if S' → S · in I then
      action[I, $] := accept
    if A → α · X β in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] := J
  end
  all other entries in action and goto are made error
end
94An Example
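State | +   | *   | (   | )   | id  | $   | E | T | F
0     |     |     | s4  |     | s5  |     | 1 | 2 | 3
1     | s6  |     |     |     |     | a   |   |   |
2     | r3  | s7  |     | r3  |     | r3  |   |   |
3     | r5  | r5  |     | r5  |     | r5  |   |   |
4     |     |     | s4  |     | s5  |     | 8 | 2 | 3
5     | r7  | r7  |     | r7  |     | r7  |   |   |
6     |     |     | s4  |     | s5  |     |   | 9 | 3
7     |     |     | s4  |     | s5  |     |   |   | 10
8     | s6  |     |     | s11 |     |     |   |   |
9     | r2  | s7  |     | r2  |     | r2  |   |   |
10    | r4  | r4  |     | r4  |     | r4  |   |   |
11    | r6  | r6  |     | r6  |     | r6  |   |   |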
96LR(1) Items
- An LR(1) item of a grammar G is a pair, ( A → α · β, a ), of an LR(0) item A → α · β and a lookahead symbol a
- The lookahead has no effect in an LR(1) item of the form ( A → α · β, a ), where β is not ε
- An LR(1) item of the form ( A → α ·, a ) calls for a reduction by A → α only if the next input symbol is a
97The Closure Function
function closure(I)
begin
  J := I
  repeat
    for each item (A → α · B β, a) in J and each production B → γ of G
        and each b ∈ FIRST(β a) such that (B → · γ, b) is not in J do
      J := J ∪ { (B → · γ, b) }
  until no more items can be added to J
  return J
end
98The Goto Function
function goto(I, X)
begin
  set J to the empty set
  for any item (A → α · X β, a) in I do
    add (A → α X · β, a) to J
  return closure(J)
end
99Subset Construction
function items(G)
begin
  C := { closure({ (S' → · S, $) }) }
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ { J }
  until no more sets of items can be added to C
  return C
end
100An Example
1. S' → S
2. S → C C
3. C → c C
4. C → d
101An Example
I0 = closure({ (S' → · S, $) }) = { (S' → · S, $), (S → · C C, $), (C → · c C, c/d), (C → · d, c/d) }
I1 = goto(I0, S) = { (S' → S ·, $) }
I2 = goto(I0, C) = { (S → C · C, $), (C → · c C, $), (C → · d, $) }
I3 = goto(I0, c) = { (C → c · C, c/d), (C → · c C, c/d), (C → · d, c/d) }
I4 = goto(I0, d) = { (C → d ·, c/d) }
I5 = goto(I2, C) = { (S → C C ·, $) }
102An Example
I6 = goto(I2, c) = { (C → c · C, $), (C → · c C, $), (C → · d, $) }
I7 = goto(I2, d) = { (C → d ·, $) }
I8 = goto(I3, C) = { (C → c C ·, c/d) }
goto(I3, c) = I3
goto(I3, d) = I4
I9 = goto(I6, C) = { (C → c C ·, $) }
goto(I6, c) = I6
goto(I6, d) = I7
103LR(1) Parsing Table Generation
procedure LR(G)
begin
  for each state I in items(G) do begin
    if (A → α · a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] := shift J
    if (A → α ·, a) in I and A ≠ S' then
      action[I, a] := reduce A → α
    if (S' → S ·, $) in I then
      action[I, $] := accept
    if (A → α · X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] := J
  end
  all other entries in action and goto are made error
end
104An Example
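State | c   | d   | $   | S | C
0     | s3  | s4  |     | 1 | 2
1     |     |     | a   |   |
2     | s6  | s7  |     |   | 5
3     | s3  | s4  |     |   | 8
4     | r4  | r4  |     |   |
5     |     |     | r2  |   |
6     | s6  | s7  |     |   | 9
7     |     |     | r4  |   |
8     | r3  | r3  |     |   |
9     |     |     | r3  |   |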
105The Core of LR(1) Items
- The core of a set of LR(1) items is the set of their first components (i.e., LR(0) items)
- The core of the set of LR(1) items { (C → c · C, c/d), (C → · c C, c/d), (C → · d, c/d) } is { C → c · C, C → · c C, C → · d }
106Merging Cores
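Sets with the same core are merged pairwise: I3 with I6, I4 with I7, and I8 with I9.

I3 = { (C → c · C, c/d), (C → · c C, c/d), (C → · d, c/d) }
I6 = { (C → c · C, $), (C → · c C, $), (C → · d, $) }

I4 = { (C → d ·, c/d) }
I7 = { (C → d ·, $) }

I8 = { (C → c C ·, c/d) }
I9 = { (C → c C ·, $) }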
107LALR(1) Parsing Table Generation
procedure LALR(G)
begin
  for each state I in mergeCore(items(G)) do begin
    if (A → α · a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] := shift J
    if (A → α ·, a) in I and A ≠ S' then
      action[I, a] := reduce A → α
    if (S' → S ·, $) in I then
      action[I, $] := accept
    if (A → α · X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] := J
  end
  all other entries in action and goto are made error
end
108An Example
State | c   | d   | $   | S | C
0     | s36 | s47 |     | 1 | 2
1     |     |     | a   |   |
2     | s36 | s47 |     |   | 5
36    | s36 | s47 |     |   | 89
47    | r4  | r4  | r4  |   |
5     |     |     | r2  |   |
89    | r3  | r3  | r3  |   |
109LR Grammars
- A grammar is SLR(1) iff its SLR(1) parsing table has no multiply-defined entries
- A grammar is LR(1) iff its LR(1) parsing table has no multiply-defined entries
- A grammar is LALR(1) iff its LALR(1) parsing table has no multiply-defined entries
110Hierarchy of Grammar Classes
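[Figure: hierarchy of grammar classes. Within the unambiguous grammars, SLR(1) is contained in LALR(1), which is contained in LR(1), which is contained in LR(k); LL(1) is contained in LL(k), which is contained in LR(k). Ambiguous grammars lie outside all of these classes.]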
111Hierarchy of Grammar Classes
- Why LL(k) ⊂ LR(k)?
- Why SLR(k) ⊂ LALR(k) ⊂ LR(k)?
112LL(k) vs. LR(k)
- For a grammar to be LL(k), we must be able to recognize the use of a production by seeing only the first k symbols of what its right-hand side derives
- For a grammar to be LR(k), we must be able to recognize the use of a production by having seen all of what is derived from its right-hand side with k more symbols of lookahead
113LALR(k) vs. LR(k)
- The merge of the sets of LR(1) items having the same core does not introduce shift/reduce conflicts
- Suppose there is a shift/reduce conflict on lookahead a in the merged set because of
    1. (A → α ·, a)
    2. (B → β · a γ, b)
- Then some set of items has item (A → α ·, a), and since the cores of all sets merged are the same, it must have an item (B → β · a γ, c) for some c
- But then this set has the same shift/reduce conflict on a
114LALR(k) vs. LR(k)
- The merge of the sets of LR(1) items having the same core may introduce reduce/reduce conflicts
- As an example, consider the grammar
    1. S' → S
    2. S → a A d | a B e | b A e | b B d
    3. A → c
    4. B → c
  that generates acd, ace, bce, bcd
- The set { (A → c ·, d), (B → c ·, e) } is valid for acx
- The set { (A → c ·, e), (B → c ·, d) } is valid for bcx
- But the union { (A → c ·, d/e), (B → c ·, d/e) } generates a reduce/reduce conflict
115SLR(k) vs. LALR(k)
1. S' → S
2. S → L = R
3. S → R
4. L → * R
5. L → id
6. R → L
116SLR(k) vs. LALR(k)
I0 = closure({ S' → · S }) = { S' → · S, S → · L = R, S → · R, L → · * R, L → · id, R → · L }
I1 = goto(I0, S) = { S' → S · }
I2 = goto(I0, L) = { S → L · = R, R → L · }
I3 = goto(I0, R) = { S → R · }
I4 = goto(I0, *) = { L → * · R, R → · L, L → · * R, L → · id }
I5 = goto(I0, id) = { L → id · }
FOLLOW(R) = { =, $ }
117SLR(k) vs. LALR(k)
I6 = goto(I2, =) = { S → L = · R, R → · L, L → · * R, L → · id }
I7 = goto(I4, R) = { L → * R · }
I8 = goto(I4, L) = { R → L · }
I9 = goto(I6, R) = { S → L = R · }
118SLR(k) vs. LALR(k)
I0 = closure({ (S' → · S, $) }) = { (S' → · S, $), (S → · L = R, $), (S → · R, $), (L → · * R, =/$), (L → · id, =/$), (R → · L, $) }
I1 = goto(I0, S) = { (S' → S ·, $) }
I2 = goto(I0, L) = { (S → L · = R, $), (R → L ·, $) }
I3 = goto(I0, R) = { (S → R ·, $) }
I4 = goto(I0, *) = { (L → * · R, =/$), (R → · L, =/$), (L → · * R, =/$), (L → · id, =/$) }
I5 = goto(I0, id) = { (L → id ·, =/$) }
119SLR(k) vs. LALR(k)
I6 = goto(I2, =) = { (S → L = · R, $), (R → · L, $), (L → · * R, $), (L → · id, $) }
I7 = goto(I4, R) = { (L → * R ·, =/$) }
I8 = goto(I4, L) = { (R → L ·, =/$) }
I9 = goto(I6, R) = { (S → L = R ·, $) }
I10 = goto(I6, L) = { (R → L ·, $) }
I11 = goto(I6, *) = { (L → * · R, $), (R → · L, $), (L → · * R, $), (L → · id, $) }
I12 = goto(I6, id) = { (L → id ·, $) }
I13 = goto(I11, R) = { (L → * R ·, $) }
120Bison: A Parser Generator
A language for specifying parsers and semantic analyzers
[Figure: the Bison compiler translates lang.y into lang.tab.c (and lang.tab.h with the -d option); the C compiler translates lang.tab.c into a.out; a.out reads tokens and produces a syntax tree.]
121Bison Programs
%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
122An Example
line → expr '\n'
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | DIGIT
123An Example
%token DIGIT
%start line
%%
line   : expr '\n'          { printf("line: expr \\n\n"); }
       ;
expr   : expr '+' term      { printf("expr: expr + term\n"); }
       | term               { printf("expr: term\n"); }
       ;
term   : term '*' factor    { printf("term: term * factor\n"); }
       | factor             { printf("term: factor\n"); }
       ;
factor : '(' expr ')'       { printf("factor: ( expr )\n"); }
       | DIGIT              { printf("factor: DIGIT\n"); }
       ;
124Functions and Variables
- yyparse(): the parser function
- yylex(): the lexical analyzer function. Bison recognizes any non-positive value as indicating the end of the input
- yylval: the attribute value of a token. Its default type is int, and it can be declared to be one of multiple types in the first section using
    %union { int ival; double dval; }
125Conflict Resolutions
- A reduce/reduce conflict is resolved by choosing the production listed first
- A shift/reduce conflict is resolved in favor of shift
- A mechanism for assigning precedences and associativities to terminals
126Precedence and Associativity
- The precedence and associativity of operators are declared simultaneously
    %nonassoc '<'      /* lowest */
    %left '+' '-'
    %right ...         /* highest */
- The precedence of a rule is determined by the precedence of its rightmost terminal
- The precedence of a rule can be modified by adding %prec <terminal> to its right end
127An Example
%{
#include <stdio.h>
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
128An Example
line : expr '\n'
     ;
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;
129Error Recovery
- Error recovery is performed via error productions
- An error production is a production containing the predefined terminal error
- After adding an error production, A → α B β | α error β, on encountering an error in the middle of B, the parser pops symbols from its stack until α, shifts error, and skips input tokens until a token in FIRST(β)
130Error Recovery
- The parser can report a syntax error by calling the user-provided function yyerror(char *)
- The parser will suppress the report of another error message for 3 tokens
- You can resume error reporting immediately by using the macro yyerrok
- Error productions are used for major nonterminals
131An Example
line : expr '\n'
     | error '\n'     { yyerror("reenter last line"); yyerrok; }
     ;
expr : expr '+' expr
     | expr '*' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;