Syntax Analysis - PowerPoint PPT Presentation

Slides: 133
Provided by: csCc1
Transcript and Presenter's Notes

Title: Syntax Analysis


1
Syntax Analysis
2
Syntax Analysis
  • Introduction to parsers
  • Context-free grammars
  • Push-down automata
  • Top-down parsing
  • Bottom-up parsing
  • Bison - a parser generator

3
Introduction to parsers
[Figure: the Parser requests the next token from the source code, receives a token, produces a syntax tree, and consults the Symbol Table]
4
Context-Free Grammars
  • A set of terminals: basic symbols from which
    sentences are formed
  • A set of nonterminals: syntactic categories
    denoting sets of sentences
  • A set of productions: rules specifying how the
    terminals and nonterminals can be combined to
    form sentences
  • The start symbol: a distinguished nonterminal
    denoting the language

5
An Example
  • Terminals: id, +, -, *, /, (, )
  • Nonterminals: expr, op
  • Productions:
    expr → expr op expr | ( expr ) | - expr | id
    op → + | - | * | /
  • The start symbol: expr

6
Derivations
  • A derivation step is an application of a
    production as a rewriting rule: E ⇒ - E
  • A sequence of derivation steps E ⇒ - E ⇒ - ( E )
    ⇒ - ( id ) is called a derivation of - ( id )
    from E
  • The symbol ⇒* denotes "derives in zero or more
    steps"; the symbol ⇒+ denotes "derives in one or
    more steps": E ⇒* - ( id ), E ⇒+ - ( id )

7
Context-Free Languages
  • A context-free language L(G) is the language
    defined by a context-free grammar G
  • A string of terminals ω is in L(G) if and only if
    S ⇒+ ω; ω is called a sentence of G
  • If S ⇒* α, where α may contain nonterminals, then
    we call α a sentential form of G:
    E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
  • G1 is equivalent to G2 if L(G1) = L(G2)

8
Left- & Right-most Derivations
  • Each derivation step needs to choose
  • a nonterminal to rewrite
  • a production to apply
  • A leftmost derivation always chooses the leftmost
    nonterminal to rewrite:
    E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
  • A rightmost derivation always chooses the
    rightmost nonterminal to rewrite:
    E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )

9
Parse Trees
  • A parse tree is a graphical representation for a
    derivation that filters out the order of choosing
    nonterminals for rewriting
  • Many derivations may correspond to the same parse
    tree, but every parse tree has associated with it
    a unique leftmost and a unique rightmost
    derivation

10
An Example
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E ) ⇒lm - ( id + E ) ⇒lm - ( id + id )
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E ) ⇒rm - ( E + id ) ⇒rm - ( id + id )
11
Ambiguous Grammar
  • A grammar is ambiguous if it produces more than
    one parse tree for some sentence

E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id
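One way to check ambiguity on a concrete sentence is to count its parse trees. The sketch below assumes the grammar E → E + E | E * E | id suggested by the two derivations; the function name `count_trees` is illustrative and not part of the slides.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

# More than one tree for a sentence means the grammar is ambiguous.
print(count_trees(["id", "+", "id", "*", "id"]))
```

A count above 1 for any sentence is enough to show ambiguity; a count of 1 for one sentence proves nothing about the grammar as a whole.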
12
Ambiguous Grammar
13
Resolving Ambiguity
  • Use disambiguating rules to throw away
    undesirable parse trees
  • Rewrite grammars by incorporating disambiguating
    rules into grammars

14
An Example
  • The dangling-else grammar:
    stmt → if expr then stmt
         | if expr then stmt else stmt
         | other
  • Two parse trees for: if E1 then if E2 then S1
    else S2

15
An Example
16
Disambiguating Rules
  • Rule: match each else with the closest previous
    unmatched then
  • Remove undesired state transitions in the
    pushdown automaton

17
Grammar Rewriting
stmt → m_stmt | unm_stmt
m_stmt → if expr then m_stmt else m_stmt
       | other
unm_stmt → if expr then stmt
         | if expr then m_stmt else unm_stmt
18
RE vs. CFG
  • Every language described by a RE can also be
    described by a CFG
  • Why use REs for lexical syntax?
  • Lexical rules are simple and do not need a
    notation as powerful as CFGs
  • REs are more concise and easier to understand than
    CFGs
  • More efficient lexical analyzers can be
    constructed from REs than from CFGs
  • Provide a way for modularizing the front end into
    two manageable-sized components

19
Push-Down Automata
[Figure: a push-down automaton: a finite automaton that reads the Input, uses a Stack, and produces the Output]

20
An Example
S' → S     S → a S b     S → ε
21
Nonregular Constructs
  • REs can denote only a fixed number of repetitions
    or an unspecified number of repetitions of one
    given construct: a^n, a*
  • A nonregular construct
  • L = { a^n b^n | n ≥ 0 }
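The language of a^n b^n strings needs unbounded memory, which is what the stack of a push-down automaton supplies. A minimal sketch of such a recognizer, with an explicit Python list standing in for the stack (the function name `accepts_anbn` is illustrative):

```python
def accepts_anbn(s):
    """Return True iff s has the form a^n b^n for some n >= 0."""
    stack = []
    i = 0
    # Phase 1: push one marker for every leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("A")
        i += 1
    # Phase 2: pop one marker for every 'b' that follows.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if the input and the stack are exhausted together.
    return i == len(s) and not stack
```

A finite automaton cannot do this because the number of pending a's is unbounded, while its state set is fixed.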

22
Non-Context-Free Constructs
  • CFGs can denote only a fixed number of
    repetitions or an unspecified number of
    repetitions of one or two given constructs
  • Some non-context-free constructs
  • L1 = { wcw | w is in (a | b)* }
  • L2 = { a^n b^m c^n d^m | n ≥ 1 and m ≥ 1 }
  • L3 = { a^n b^n c^n | n ≥ 0 }

24
Top-Down Parsing
  • Construct a parse tree from the root to the
    leaves using leftmost derivation
    1. S → c A B
    2. A → a b
    3. A → a
    4. B → d
    input: cad
25
Predictive Parsing
  • Top-down parsing without backtracking
  • There is only one alternative production to
    choose at each derivation step:
    stmt → if expr then stmt else stmt
         | while expr do stmt
         | begin stmt_list end

26
LL(k) Parsing
  • The first L stands for scanning the input from
    left to right
  • The second L stands for producing a leftmost
    derivation
  • The k stands for the number of lookahead input
    symbols used to choose alternative productions at
    each derivation step

27
LL(1) Parsing
  • Use one input symbol of lookahead
  • Recursive-descent parsing
  • Nonrecursive predictive parsing

28
An Example
LL(1): S → a b e | c d e
LL(2): S → a b e | a d e
29
Recursive Descent Parsing
  • The parser consists of a set of (possibly
    recursive) procedures
  • Each procedure is associated with a nonterminal
    of the grammar that is responsible to derive the
    productions of that nonterminal
  • Each procedure should be able to choose a unique
    production to derive based on the current token

30
An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
FIRST(simple) = { integer, char, num }
31
Recursive Descent Parsing
  • For each terminal in the production, the terminal
    is matched with the current token
  • For each nonterminal in the production, the
    procedure associated with the nonterminal is
    called
  • The sequence of matchings and procedure calls in
    processing the input implicitly defines a parse
    tree for the input

32
An Example
Input: array [ num dotdot num ] of integer
[Figure: parse tree rooted at type]
33
An Example
procedure match(t : terminal)
begin
  if lookahead = t then
    lookahead := nexttoken
  else error
end
34
An Example
procedure type
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']');
    match(of); type
  end
  else error
end
35
An Example
procedure simple
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end
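The three procedures translate almost line for line into a runnable program. The sketch below is one possible Python rendering, not the slides' own code: the class name `TypeParser`, the helper `accepts`, the "$" end marker, and the bracket tokens "[" and "]" around simple are assumptions.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""

class TypeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is an assumed end marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Consume the current token if it is the expected terminal.
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_type(self):
        # type -> simple | id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.parse_simple()
        elif self.lookahead == "id":
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

def accepts(tokens):
    """Return True iff the whole token list derives from type."""
    parser = TypeParser(tokens)
    try:
        parser.parse_type()
        parser.match("$")  # no trailing input allowed
    except ParseError:
        return False
    return True
```

Each method chooses its production from the single lookahead token, which is why no backtracking is needed.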
36
First Sets
  • The first set of a string α is the set of
    terminals that begin the strings derived from α.
    If α ⇒* ε, then ε is also in the first set of
    α.

37
First Sets
  • If X is terminal, then FIRST(X) is { X }
  • If X is nonterminal and X → ε is a production,
    then add ε to FIRST(X)
  • If X is nonterminal and X → Y1 Y2 ... Yk is a
    production, then add a to FIRST(X) if for some
    i, a is in FIRST(Yi) and ε is in all of
    FIRST(Y1), ..., FIRST(Yi-1). If ε is in FIRST(Yj)
    for all j, then add ε to FIRST(X)

38
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(F) = { (, id }
FIRST(T') = { *, ε }
FIRST(T) = { (, id }
FIRST(E') = { +, ε }
FIRST(E) = { (, id }
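The FIRST rules amount to a fixed-point iteration: keep applying them until no set grows. A minimal sketch, assuming the expression grammar of this example with + and * as the operator terminals and the string "ε" for the empty string; `GRAMMAR` and `first_sets` are illustrative names.

```python
EPS = "ε"

# Each nonterminal maps to its alternatives; [] is the ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_sets(grammar):
    """Compute FIRST for every nonterminal by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                before = len(first[nt])
                all_nullable = True
                for sym in rhs:
                    if sym not in grammar:          # terminal: FIRST is {sym}
                        first[nt].add(sym)
                        all_nullable = False
                        break
                    first[nt] |= first[sym] - {EPS}
                    if EPS not in first[sym]:       # stop at first non-nullable symbol
                        all_nullable = False
                        break
                if all_nullable:                    # every Yj derives ε (or rhs is empty)
                    first[nt].add(EPS)
                if len(first[nt]) != before:
                    changed = True
    return first

FIRST = first_sets(GRAMMAR)
```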
39
Follow Sets
  • The follow set of a nonterminal A is the set of
    terminals that can appear immediately to the
    right of A in some sentential form; namely, if
    S ⇒* α A a β, then a is in the follow set of A.

40
Follow Sets
  • Place $ in FOLLOW(S), where S is the start symbol
    and $ is the input right endmarker
  • If there is a production A → α B β, then
    everything in FIRST(β) except for ε is placed in
    FOLLOW(B)
  • If there is a production A → α B, or A → α B β
    where FIRST(β) contains ε, then everything in
    FOLLOW(A) is in FOLLOW(B)

41
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
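The FOLLOW rules can be run the same way, as a fixed-point iteration over the productions. The sketch below takes the FIRST sets of this example as given input and assumes + and * as the operator terminals; the names `first_of_string` and `follow_sets` are illustrative.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets of the nonterminals, copied from the example.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}

def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                    # every symbol was nullable (or none given)
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue                # FOLLOW is defined for nonterminals only
                    tail = first_of_string(rhs[i + 1:])
                    new = tail - {EPS}          # rule 2: FIRST(β) without ε
                    if EPS in tail:
                        new |= follow[a]        # rule 3: β is empty or nullable
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow

FOLLOW = follow_sets(GRAMMAR, "E")
```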
42
Nonrecursive Predictive Parsing
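[Figure: nonrecursive predictive parser. The parsing driver reads the Input, consults the Parsing table, manipulates the Stack, and produces the Output]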
43
Stack Operations
  • Match
  • when the top stack symbol is a terminal and it
    matches the input token, pop the terminal and
    advance the input pointer
  • Expand
  • when the top stack symbol is a nonterminal,
    replace this symbol by the right hand side of one
    of its productions (pop the nonterminal and push
    the right hand side of a production in reverse
    order)

44
An Example
type → simple
     | id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
45
An Example
(E = expand, M = match; the top of the stack is on the right)
Action  Stack                       Input
E       type                        array [ num dotdot num ] of integer
M       type of ] simple [ array    array [ num dotdot num ] of integer
M       type of ] simple [          [ num dotdot num ] of integer
E       type of ] simple            num dotdot num ] of integer
M       type of ] num dotdot num    num dotdot num ] of integer
M       type of ] num dotdot        dotdot num ] of integer
M       type of ] num               num ] of integer
M       type of ]                   ] of integer
M       type of                     of integer
E       type                        integer
E       simple                      integer
M       integer                     integer
46
Parsing Driver
push $ and then S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$
repeat
  let X be the top stack symbol and a the symbol pointed to by ip
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error
  else /* X is a nonterminal */
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error
until X = $ and a = $
47
Constructing Parsing Table
  • Input. Grammar G.
  • Output. Parsing Table M.
  • Method.
  • 1. For each production A → α, do steps 2 and 3.
  • 2. For each terminal a in FIRST(α), add A → α to
    M[A, a].
  • 3. If ε is in FIRST(α), add A → α to M[A, b]
    for each symbol b in FOLLOW(A).
  • 4. Make each undefined entry of M be error.

48
An Example
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
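With these sets, the table-construction method can be carried out mechanically. A minimal sketch, assuming the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the names `PRODUCTIONS`, `first_of_string` and `build_table` are illustrative.

```python
EPS = "ε"

# (left-hand side, right-hand side); [] is the ε-production.
PRODUCTIONS = [
    ("E", ["T", "E'"]),
    ("E'", ["+", "T", "E'"]), ("E'", []),
    ("T", ["F", "T'"]),
    ("T'", ["*", "F", "T'"]), ("T'", []),
    ("F", ["(", "E", ")"]), ("F", ["id"]),
]
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}

def first_of_string(symbols):
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def build_table(productions):
    """Map (nonterminal, lookahead) to the list of applicable right-hand sides."""
    table = {}
    for a, alpha in productions:
        f = first_of_string(alpha)
        lookaheads = f - {EPS}                  # step 2
        if EPS in f:
            lookaheads |= FOLLOW[a]             # step 3
        for t in lookaheads:
            table.setdefault((a, t), []).append(alpha)
    return table                                # missing keys are error entries (step 4)

TABLE = build_table(PRODUCTIONS)
```

Storing a list per entry makes multiply-defined entries visible: the grammar is LL(1) exactly when every list has length 1.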
49
An Example
Stack       Input             Output
$E          id + id * id $
$E'T        id + id * id $    E → T E'
$E'T'F      id + id * id $    T → F T'
$E'T'id     id + id * id $    F → id
$E'T'       + id * id $
$E'         + id * id $       T' → ε
$E'T+       + id * id $       E' → + T E'
$E'T        id * id $
$E'T'F      id * id $         T → F T'
$E'T'id     id * id $         F → id
$E'T'       * id $
$E'T'F*     * id $            T' → * F T'
$E'T'F      id $
$E'T'id     id $              F → id
$E'T'       $
$E'         $                 T' → ε
$           $                 E' → ε
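The trace can be reproduced with a small table-driven parser. The sketch below hard-codes the LL(1) table for the expression grammar (with + and * as the operator terminals) and follows the parsing driver; `TABLE`, `NONTERMINALS` and `ll1_parse` are illustrative names.

```python
# LL(1) table for E -> T E', E' -> + T E' | ε, T -> F T', T' -> * F T' | ε,
# F -> ( E ) | id. An empty list is the ε right-hand side.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    stack = ["$", start]                 # top of the stack is the end of the list
    inp = list(tokens) + ["$"]
    ip = 0
    output = []
    while stack:
        x, a = stack[-1], inp[ip]
        if x not in NONTERMINALS:        # terminal or $: match
            if x != a:
                raise SyntaxError(f"expected {x!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:                            # nonterminal: expand
            rhs = TABLE.get((x, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))  # push Yk ... Y1 so Y1 ends on top
            output.append((x, rhs))
    return output

steps = ll1_parse("id + id * id".split())
```

The returned list holds the productions of the Output column in the same order.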
50
LL(1) Grammars
  • A grammar is an LL(1) grammar if its LL(1)
    parsing table has no multiply-defined entries

51
A Counter Example
S → i E t S S' | a      FOLLOW(S) = { $, e }
S' → e S | ε            FOLLOW(S') = { $, e }
E → b                   FOLLOW(E) = { t }

      a        b        e           i                  t     $
S     S → a                         S → i E t S S'
S'                      S' → e S                             S' → ε
                        S' → ε
E              E → b
52
LL(1) Grammars
  • A grammar G is LL(1) iff whenever A → α | β are
    two distinct productions of G, the following
    conditions hold
  • For no terminal a do both α and β derive strings
    beginning with a.
  • At most one of α and β can derive the empty
    string.
  • If β ⇒* ε, then α does not derive any string
    beginning with a terminal in FOLLOW(A).

53
Left Recursion
  • A grammar is left recursive if it has a
    nonterminal A such that A ⇒+ A α

A → A α | β
A → β R     R → α R | ε
[Figure: parse trees of the left-recursive form (nested A nodes) and the rewritten form (nested R nodes)]
54
Direct Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
55
An Example
E → E + T | T
T → T * F | F
F → ( E ) | id

E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
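The transformation for direct left recursion is short enough to write out. A minimal sketch, where a right-hand side is a list of symbols and the empty list stands for ε; the function name and the convention of naming the new nonterminal by appending a prime are assumptions made for illustration.

```python
def eliminate_direct_left_recursion(nt, alternatives):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn without left recursion.

    `alternatives` is a list of right-hand sides (lists of symbols) for `nt`.
    Returns a dict mapping each nonterminal to its new alternatives.
    """
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == nt]
    betas = [alt for alt in alternatives if not alt or alt[0] != nt]
    if not alphas:                       # nothing to do
        return {nt: alternatives}
    new_nt = nt + "'"                    # assumed not to clash with existing names
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[]],   # [] is ε
    }

result = eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]])
```

Applied to E → E + T | T, this yields E → T E' and E' → + T E' | ε, matching the example.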
56
Indirect Left Recursion
S → A a | b
A → A c | S d | ε

S ⇒ A a ⇒ S d a

A → A c | A a d | b d | ε

S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
57
Indirect Left Recursion
Input. Grammar G with no cycles (derivations of
the form A ⇒+ A) or ε-productions (productions
of the form A → ε).
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
       where Aj → δ1 | δ2 | ... | δk are all the
       current Aj-productions
     end
     eliminate direct left recursion among the Ai-productions
   end
58
Left Factoring
  • Two alternatives of a nonterminal A have a
    nontrivial common prefix if α ≠ ε and
    A → α β1 | α β2
  • Left factoring rewrites them as
    A → α A'     A' → β1 | β2

59
An Example
S → i E t S | i E t S e S | a
E → b

S → i E t S S' | a
S' → e S | ε
E → b
60
Error Recovery
  • Panic mode: skip tokens until a token in a set of
    synchronizing tokens appears
  • If a terminal on stack cannot be matched, pop the
    terminal
  • use FOLLOW(A) as sync set for A (pop A)
  • use the first set of a higher construct as sync
    set for A
  • use FIRST(A) as sync set for A
  • use the production deriving ? as the default for A

61
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
62
An Example
63
An Example
Stack       Input            Output
$E          ) id * + id $    error, skip )
$E          id * + id $
$E'T        id * + id $      E → T E'
$E'T'F      id * + id $      T → F T'
$E'T'id     id * + id $      F → id
$E'T'       * + id $
$E'T'F*     * + id $         T' → * F T'
$E'T'F      + id $           error
$E'T'       + id $           F has been popped
$E'         + id $
$E'T+       + id $           E' → + T E'
$E'T        id $
$E'T'F      id $             T → F T'
$E'T'id     id $             F → id
$E'T'       $
$E'         $                T' → ε
$           $                E' → ε
65
Bottom-Up Parsing
  • Construct a parse tree from the leaves to the
    root using rightmost derivation in reverse
    S → a A B e
    A → A b c | b
    B → d
    input: abbcde

abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
66
Handles
  • A handle of a right-sentential form γ consists
    of
  • a production A → β
  • a position of γ where β can be replaced by A to
    produce the previous right-sentential form in a
    rightmost derivation of γ

67
Handle Pruning
  • The string ω to the right of the handle contains
    only terminals
  • A is the bottommost leftmost interior node with
    all its children in the tree

68
An Example
69
Shift-Reduce Parsing

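[Figure: shift-reduce parser. The parsing driver reads the Input, consults the Parsing table, keeps the Handle on top of the Stack, and produces the Output]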
70
Stack Operations
  • Shift: shift the next input symbol onto the top
    of the stack
  • Reduce: replace the handle at the top of the
    stack with the corresponding nonterminal
  • Accept: announce successful completion of the
    parsing
  • Error: call an error recovery routine

71
An Example
(S = shift, R = reduce, A = accept; the top of the stack is on the right)
Action  Stack        Input
S       (empty)      a b b c d e
S       a            b b c d e
R       a b          b c d e
S       a A          b c d e
S       a A b        c d e
R       a A b c      d e
S       a A          d e
R       a A d        e
S       a A B        e
R       a A B e
A       S

72
Shift/Reduce Conflict
stmt → if expr then stmt
     | if expr then stmt else stmt
     | other
Stack: - - - if expr then stmt
Input: else - - -
Shift ⇒ if expr then stmt else stmt
Reduce ⇒ if expr then stmt
73
Reduce/Reduce Conflict
stmt → id ( para_list )
stmt → expr := expr
para_list → para_list , para
para_list → para
para → id
expr → id ( expr_list )
expr → id
expr_list → expr_list , expr
expr_list → expr
Stack: - - - id ( id
Input: , id ) - - -
With a distinct token procid: - - - procid ( id     , id ) - - -
74
LR(k) Parsing
  • The L stands for scanning the input from left to
    right
  • The R stands for constructing a rightmost
    derivation in reverse
  • The k stands for the number of lookahead input
    symbols used to make parsing decisions

75
LR Parsing
  • The LR parsing algorithm
  • Constructing SLR(1) parsing tables
  • Constructing LR(1) parsing tables
  • Constructing LALR(1) parsing tables

76
Model of an LR Parser

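[Figure: model of an LR parser. The parsing driver reads the Input, produces the Output, consults the Parsing table (Action and Goto parts), and keeps a Stack S0 ... Xm-1 Sm-1 Xm Sm with state Sm on top]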

77
An Example
        Action                               Goto
State   id    +     *     (     )     $      E    T    F
0       s5                s4                 1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                 8    2    3
5             r6    r6          r6    r6
6       s5                s4                      9    3
7       s5                s4                           10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5

(1) E → E + T   (2) E → T   (3) T → T * F
(4) T → F   (5) F → ( E )   (6) F → id
78
An Example
Action  Stack                    Input
s5      0                        id + id * id $
r6      0 id5                    + id * id $
r4      0 F3                     + id * id $
r2      0 T2                     + id * id $
s6      0 E1                     + id * id $
s5      0 E1 +6                  id * id $
r6      0 E1 +6 id5              * id $
r4      0 E1 +6 F3               * id $
s7      0 E1 +6 T9               * id $
s5      0 E1 +6 T9 *7            id $
r6      0 E1 +6 T9 *7 id5        $
r3      0 E1 +6 T9 *7 F10        $
r1      0 E1 +6 T9               $
acc     0 E1                     $

79
LR Parsing Driver
push s0 onto the stack, where s0 is the initial state
set ip to point to the first symbol of w$
repeat
  let s be the top state on the stack and a the symbol pointed to by ip
  if action[s, a] = shift s' then
    push a and s' onto the stack and advance ip
  else if action[s, a] = reduce A → β then
    pop 2 * |β| symbols off the stack
    s' := goto[top(), A]
    push A and s' onto the stack
  else if action[s, a] = accept then
    return
  else error
until false
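The driver can be exercised against the parsing table of the expression grammar. The sketch below is an assumption-laden Python rendering: it keeps only states on the stack (so a reduction pops |β| entries rather than 2 * |β|), restores + and * as the operator terminals, and uses the illustrative names `ACTION`, `GOTO`, `PRODS` and `lr_parse`.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {
    0: {"E": 1, "T": 2, "F": 3},
    4: {"E": 8, "T": 2, "F": 3},
    6: {"T": 9, "F": 3},
    7: {"F": 10},
}

def lr_parse(tokens):
    """Return the sequence of actions taken, or raise SyntaxError."""
    stack = [0]                       # states only; symbols are implied by the states
    inp = list(tokens) + ["$"]
    ip = 0
    actions = []
    while True:
        act = ACTION[stack[-1]].get(inp[ip])
        if act is None:
            raise SyntaxError(f"unexpected {inp[ip]!r} in state {stack[-1]}")
        actions.append(act)
        if act == "acc":
            return actions
        if act[0] == "s":             # shift: push the new state, advance the input
            stack.append(int(act[1:]))
            ip += 1
        else:                         # reduce: pop |β| states, then follow goto
            lhs, length = PRODS[int(act[1:])]
            del stack[len(stack) - length:]
            stack.append(GOTO[stack[-1]][lhs])

trace = lr_parse("id + id * id".split())
```

Note that a reduction does not advance the input pointer; only a shift consumes a token.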
80
LR(0) Items
  • An LR(0) item of a grammar G is a production
    of G with a dot at some position of the
    right-hand side: A → α · β
  • The production A → X Y Z yields the following
    four LR(0) items: A → · X Y Z, A → X · Y Z,
    A → X Y · Z, A → X Y Z ·
  • An LR(0) item represents a state in an NPDA
    indicating how much of a production we have seen
    at a given point in the parsing process

81
From CFG to NPDA
  • The state A → α · B β will go to the state B → · γ
    via an edge of the empty string ε
  • The state A → α · a β will go to the state
    A → α a · β via an edge of terminal a (a shifting)
  • The state A → α · will cause a reduction on
    seeing a terminal in FOLLOW(A)
  • The state A → α · B β will go to the state
    A → α B · β via an edge of nonterminal B (after a
    reduction)

82
An Example
Augmented grammar
Easier to identify the accepting state
1. E' → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
83
An Example
84
From NPDA to DPDA
  • There are two functions performed on sets of
    LR(0) items (DPDA states)
  • The function closure(I) adds more items to I when
    there is a dot to the left of a nonterminal
    (corresponding to ε edges)
  • The function goto(I, X) moves the dot past the
    symbol X in all items in I that contain X
    (corresponding to non-ε edges)

85
The Closure Function
function closure(I)
begin
  J := I
  repeat
    for each item A → α · B β in J and
        each production B → γ of G such
        that B → · γ is not in J do
      J := J ∪ { B → · γ }
  until no more items can be added to J
  return J
end
86
An Example
s0 = E' → · E
I0 = closure({ s0 }) = { E' → · E, E → · E + T, E → · T,
     T → · T * F, T → · F, F → · ( E ), F → · id }

1. E' → E   2. E → E + T   3. E → T   4. T → T * F
5. T → F   6. F → ( E )   7. F → id
87
The Goto Function
function goto(I, X)
begin
  set J to the empty set
  for any item A → α · X β in I do
    add A → α X · β to J
  return closure(J)
end
88
An Example
I0 = { E' → · E, E → · E + T, E → · T,
       T → · T * F, T → · F, F → · ( E ), F → · id }
goto(I0, E) = closure({ E' → E ·, E → E · + T })
            = { E' → E ·, E → E · + T }

89
Subset Construction
function items(G)
begin
  C := { closure({ S' → · S }) }
  repeat
    for each set of items I in C and each symbol X do begin
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ { J }
    end
  until no more sets of items can be added to C
  return C
end
90
An Example
1. E' → E   2. E → E + T   3. E → T   4. T → T * F
5. T → F   6. F → ( E )   7. F → id
91
I0: E' → · E, E → · E + T, E → · T, T → · T * F,
    T → · F, F → · ( E ), F → · id
goto(I0, E) = I1: E' → E ·, E → E · + T
goto(I0, T) = I2: E → T ·, T → T · * F
goto(I0, F) = I3: T → F ·
goto(I0, () = I4: F → ( · E ), E → · E + T, E → · T,
    T → · T * F, T → · F, F → · ( E ), F → · id
goto(I0, id) = I5: F → id ·
goto(I1, +) = I6: E → E + · T, T → · T * F, T → · F,
    F → · ( E ), F → · id
goto(I2, *) = I7: T → T * · F, F → · ( E ), F → · id
goto(I4, E) = I8: F → ( E · ), E → E · + T
goto(I6, T) = I9: E → E + T ·, T → T · * F
goto(I7, F) = I10: T → T * F ·
goto(I8, )) = I11: F → ( E ) ·
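The closure, goto and items functions can be run on this grammar to confirm that the construction yields twelve item sets. A minimal sketch, assuming the augmented expression grammar with + and * as the operator terminals; an item is encoded as a pair (production index, dot position), and all names are illustrative.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
SYMBOLS = sorted({sym for _, rhs in GRAMMAR for sym in rhs})

def closure(items):
    """Add B -> . γ for every item with the dot in front of nonterminal B."""
    result = set(items)
    work = list(items)
    while work:
        p, dot = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (q, 0) not in result:
                    result.add((q, 0))
                    work.append((q, 0))
    return frozenset(result)

def goto(items, x):
    """Move the dot past x in every item where x follows the dot."""
    moved = {(p, dot + 1) for p, dot in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)

def canonical_collection():
    states = [closure({(0, 0)})]
    for state in states:              # the list grows while it is being scanned
        for x in SYMBOLS:
            nxt = goto(state, x)
            if nxt and nxt not in states:
                states.append(nxt)
    return states

STATES = canonical_collection()
```

The order in which states are discovered depends on the symbol ordering, so the indices need not match I0 to I11; only the collection as a whole does.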
92
An Example
[Figure: transition diagram of the DFA whose states 0 to 11 are the item sets I0 to I11, with edges labeled by the grammar symbols E, T, F, id, (, ), + and *]
93
SLR(1) Parsing Table Generation
procedure SLR(G)
begin
  for each state I in items(G) do begin
    if A → α · a β in I and goto(I, a) = J for a terminal a then
      action[I, a] := shift J
    if A → α · in I and A ≠ S' then
      action[I, a] := reduce A → α for all a in FOLLOW(A)
    if S' → S · in I then
      action[I, $] := accept
    if A → α · X β in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] := J
  end
  all other entries in action and goto are made error
end
94
An Example
State   +     *     (     )     id    $      E    T    F
0                   s4          s5           1    2    3
1       s6                            a
2       r3    s7          r3          r3
3       r5    r5          r5          r5
4                   s4          s5           8    2    3
5       r7    r7          r7          r7
6                   s4          s5                9    3
7                   s4          s5                     10
8       s6                s11
9       r2    s7          r2          r2
10      r4    r4          r4          r4
11      r6    r6          r6          r6

96
LR(1) Items
  • An LR(1) item of a grammar G is a pair,
    ( A → α · β, a ), of an LR(0) item A → α · β and a
    lookahead symbol a
  • The lookahead has no effect in an LR(1) item of
    the form ( A → α · β, a ), where β is not ε
  • An LR(1) item of the form ( A → α ·, a ) calls
    for a reduction by A → α only if the next input
    symbol is a

97
The Closure Function
function closure(I)
begin
  J := I
  repeat
    for each item (A → α · B β, a) in J and
        each production B → γ of G and
        each b ∈ FIRST(β a) such that
        (B → · γ, b) is not in J do
      J := J ∪ { (B → · γ, b) }
  until no more items can be added to J
  return J
end
98
The Goto Function
function goto(I, X)
begin
  set J to the empty set
  for any item (A → α · X β, a) in I do
    add (A → α X · β, a) to J
  return closure(J)
end
99
Subset Construction
function items(G)
begin
  C := { closure({ (S' → · S, $) }) }
  repeat
    for each set of items I in C and each symbol X do begin
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ { J }
    end
  until no more sets of items can be added to C
  return C
end
100
An Example
1. S' → S   2. S → C C   3. C → c C   4. C → d
101
An Example
I0 = closure({ (S' → · S, $) }): (S' → · S, $), (S → · C C, $),
     (C → · c C, c/d), (C → · d, c/d)
I1 = goto(I0, S): (S' → S ·, $)
I2 = goto(I0, C): (S → C · C, $), (C → · c C, $), (C → · d, $)
I3 = goto(I0, c): (C → c · C, c/d), (C → · c C, c/d), (C → · d, c/d)
I4 = goto(I0, d): (C → d ·, c/d)
I5 = goto(I2, C): (S → C C ·, $)
102
An Example
I6 = goto(I2, c): (C → c · C, $), (C → · c C, $), (C → · d, $)
I7 = goto(I2, d): (C → d ·, $)
I8 = goto(I3, C): (C → c C ·, c/d)
goto(I3, c) = I3     goto(I3, d) = I4
I9 = goto(I6, C): (C → c C ·, $)
goto(I6, c) = I6     goto(I6, d) = I7
103
LR(1) Parsing Table Generation
procedure LR(G)
begin
  for each state I in items(G) do begin
    if (A → α · a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] := shift J
    if (A → α ·, a) in I and A ≠ S' then
      action[I, a] := reduce A → α
    if (S' → S ·, $) in I then
      action[I, $] := accept
    if (A → α · X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] := J
  end
  all other entries in action and goto are made error
end
104
An Example
State   c     d     $      S    C
0       s3    s4           1    2
1                   a
2       s6    s7                5
3       s3    s4                8
4       r4    r4
5                   r2
6       s6    s7                9
7                   r4
8       r3    r3
9                   r3
105
The Core of LR(1) Items
  • The core of a set of LR(1) Items is the set of
    their first components (i.e., LR(0) items)
  • The core of the set of LR(1) items
    { (C → c · C, c/d), (C → · c C, c/d), (C → · d, c/d) }
    is { C → c · C, C → · c C, C → · d }

106
Merging Cores
I3: (C → c · C, c/d), (C → · c C, c/d), (C → · d, c/d)
I6: (C → c · C, $), (C → · c C, $), (C → · d, $)

I4: (C → d ·, c/d)
I7: (C → d ·, $)

I8: (C → c C ·, c/d)
I9: (C → c C ·, $)
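Merging by core can be checked mechanically on this grammar. The sketch below builds the LR(1) collection for S' → S, S → C C, C → c C | d and then unions the sets that share a core. It assumes hard-coded FIRST sets and relies on the grammar having no ε-productions, so FIRST(β a) is just FIRST of the first symbol; an item is (production index, dot position, lookahead), and all names are illustrative.

```python
GRAMMAR = [("S'", ("S",)), ("S", ("C", "C")), ("C", ("c", "C")), ("C", ("d",))]
NONTERMINALS = {"S'", "S", "C"}
# FIRST of single symbols; sufficient because no symbol derives ε here.
FIRST = {"S": {"c", "d"}, "C": {"c", "d"}, "c": {"c"}, "d": {"d"}, "$": {"$"}}

def closure(items):
    result, work = set(items), list(items)
    while work:
        p, dot, la = work.pop()
        rhs = GRAMMAR[p][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            # First symbol of "β a": the symbol after B, or the lookahead a.
            nxt = rhs[dot + 1] if dot + 1 < len(rhs) else la
            for q, (lhs, _) in enumerate(GRAMMAR):
                if lhs != rhs[dot]:
                    continue
                for b in FIRST[nxt]:
                    if (q, 0, b) not in result:
                        result.add((q, 0, b))
                        work.append((q, 0, b))
    return frozenset(result)

def goto(items, x):
    moved = {(p, dot + 1, la) for p, dot, la in items
             if dot < len(GRAMMAR[p][1]) and GRAMMAR[p][1][dot] == x}
    return closure(moved)

def lr1_collection():
    states = [closure({(0, 0, "$")})]
    for state in states:                  # the list grows while it is scanned
        for x in ("S", "C", "c", "d"):
            nxt = goto(state, x)
            if nxt and nxt not in states:
                states.append(nxt)
    return states

def merge_cores(states):
    """Union all LR(1) item sets that have the same set of LR(0) items."""
    merged = {}
    for state in states:
        core = frozenset((p, dot) for p, dot, _ in state)
        merged[core] = merged.get(core, frozenset()) | state
    return list(merged.values())

LR1 = lr1_collection()
LALR = merge_cores(LR1)
```

The ten LR(1) sets collapse to seven, corresponding to the merged pairs I3/I6, I4/I7 and I8/I9.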
107
LALR(1) Parsing Table Generation
procedure LALR(G)
begin
  for each state I in mergeCore(items(G)) do begin
    if (A → α · a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] := shift J
    if (A → α ·, a) in I and A ≠ S' then
      action[I, a] := reduce A → α
    if (S' → S ·, $) in I then
      action[I, $] := accept
    if (A → α · X β, a) in I and goto(I, X) = J for a nonterminal X then
      goto[I, X] := J
  end
  all other entries in action and goto are made error
end
108
An Example
State   c     d     $      S    C
0       s36   s47          1    2
1                   a
2       s36   s47               5
36      s36   s47               89
47      r4    r4    r4
5                   r2
89      r3    r3    r3
109
LR Grammars
  • A grammar is SLR(1) iff its SLR(1) parsing table
    has no multiply-defined entries
  • A grammar is LR(1) iff its LR(1) parsing table
    has no multiply-defined entries
  • A grammar is LALR(1) iff its LALR(1) parsing
    table has no multiply-defined entries

110
Hierarchy of Grammar Classes
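[Figure: nested grammar classes. Within the unambiguous grammars, LR(k) contains LR(1) ⊃ LALR(1) ⊃ SLR(1) and also LL(k) ⊃ LL(1); the ambiguous grammars lie outside all of these classes]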
111
Hierarchy of Grammar Classes
  • Why LL(k) ⊂ LR(k)?
  • Why SLR(k) ⊂ LALR(k) ⊂ LR(k)?

112
LL(k) vs. LR(k)
  • For a grammar to be LL(k), we must be able to
    recognize the use of a production by seeing only
    the first k symbols of what its right-hand side
    derives
  • For a grammar to be LR(k), we must be able to
    recognize the use of a production by having seen
    all of what is derived from its right-hand side
    with k more symbols of lookahead

113
LALR(k) vs. LR(k)
  • The merge of the sets of LR(1) items having the
    same core does not introduce shift/reduce
    conflicts
  • Suppose there is a shift/reduce conflict on
    lookahead a in the merged set because of
    1. (A → α ·, a)  2. (B → β · a γ, b)
  • Then some set of items has item (A → α ·, a),
    and since the cores of all sets merged are the
    same, it must have an item (B → β · a γ, c) for
    some c
  • But then this set has the same shift/reduce
    conflict on a

114
LALR(k) vs. LR(k)
  • The merge of the sets of LR(1) items having the
    same core may introduce reduce/reduce conflicts
  • As an example, consider the grammar
    1. S' → S
    2. S → a A d | a B e | b A e | b B d
    3. A → c
    4. B → c
    that generates acd, ace, bce, bcd
  • The set { (A → c ·, d), (B → c ·, e) } is valid
    for acx
  • The set { (A → c ·, e), (B → c ·, d) } is valid
    for bcx
  • But the union { (A → c ·, d/e), (B → c ·, d/e) }
    generates a reduce/reduce conflict

115
SLR(k) vs. LALR(k)
1. S' → S   2. S → L = R   3. S → R
4. L → * R   5. L → id   6. R → L
116
SLR(k) vs. LALR(k)
I0 = closure({ S' → · S }): S' → · S, S → · L = R, S → · R,
     L → · * R, L → · id, R → · L
I1 = goto(I0, S): S' → S ·
I2 = goto(I0, L): S → L · = R, R → L ·
I3 = goto(I0, R): S → R ·
I4 = goto(I0, *): L → * · R, R → · L, L → · * R, L → · id
I5 = goto(I0, id): L → id ·
FOLLOW(R) = { =, $ }
117
SLR(k) vs. LALR(k)
I6 = goto(I2, =): S → L = · R, R → · L, L → · * R, L → · id
I7 = goto(I4, R): L → * R ·
I8 = goto(I4, L): R → L ·
I9 = goto(I6, R): S → L = R ·
118
SLR(k) vs. LALR(k)
I0 = closure({ (S' → · S, $) }): (S' → · S, $), (S → · L = R, $),
     (S → · R, $), (L → · * R, =/$), (L → · id, =/$), (R → · L, $)
I1 = goto(I0, S): (S' → S ·, $)
I2 = goto(I0, L): (S → L · = R, $), (R → L ·, $)
I3 = goto(I0, R): (S → R ·, $)
I4 = goto(I0, *): (L → * · R, =/$), (R → · L, =/$),
     (L → · * R, =/$), (L → · id, =/$)
I5 = goto(I0, id): (L → id ·, =/$)
119
SLR(k) vs. LALR(k)
I6 = goto(I2, =): (S → L = · R, $), (R → · L, $),
     (L → · * R, $), (L → · id, $)
I7 = goto(I4, R): (L → * R ·, =/$)
I8 = goto(I4, L): (R → L ·, =/$)
I9 = goto(I6, R): (S → L = R ·, $)
I10 = goto(I6, L): (R → L ·, $)
I11 = goto(I6, *): (L → * · R, $), (R → · L, $),
      (L → · * R, $), (L → · id, $)
I12 = goto(I6, id): (L → id ·, $)
I13 = goto(I11, R): (L → * R ·, $)
120
Bison: A Parser Generator
A language for specifying parsers and semantic
analyzers
[Figure: lang.y → Bison compiler → lang.tab.c (plus lang.tab.h with the -d option); lang.tab.c → C compiler → a.out; tokens → a.out → syntax tree]
121
Bison Programs
%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
122
An Example
line → expr '\n'
expr → expr + term | term
term → term * factor | factor
factor → ( expr ) | DIGIT
123
An Example
%token DIGIT
%start line
%%
line:   expr '\n'        { printf("line: expr \\n\n"); }
    ;
expr:   expr '+' term    { printf("expr: expr + term\n"); }
    |   term             { printf("expr: term\n"); }
    ;
term:   term '*' factor  { printf("term: term * factor\n"); }
    |   factor           { printf("term: factor\n"); }
    ;
factor: '(' expr ')'     { printf("factor: ( expr )\n"); }
    |   DIGIT            { printf("factor: DIGIT\n"); }
    ;
124
Functions and Variables
  • yyparse(): the parser function
  • yylex(): the lexical analyzer function. Bison
    recognizes any non-positive value as indicating
    the end of the input
  • yylval: the attribute value of a token. Its
    default type is int, and it can be declared to
    have multiple types in the first section using
    %union { int ival; double dval; }

125
Conflict Resolutions
  • A reduce/reduce conflict is resolved by choosing
    the production listed first
  • A shift/reduce conflict is resolved in favor of
    shift
  • A mechanism for assigning precedences and
    associativities to terminals

126
Precedence and Associativity
  • The precedence and associativity of operators are
    declared simultaneously:
    %nonassoc '<'      /* lowest */
    %left '+' '-'
    %right             /* highest */
  • The precedence of a rule is determined by the
    precedence of its rightmost terminal
  • The precedence of a rule can be modified by
    adding %prec <terminal> to its right end

127
An Example
%{
#include <stdio.h>
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
128
An Example
%%
line:   expr '\n'
    ;
expr:   expr '+' expr
    |   expr '-' expr
    |   expr '*' expr
    |   expr '/' expr
    |   '-' expr %prec UMINUS
    |   '(' expr ')'
    |   NUMBER
    ;
129
Error Recovery
  • Error recovery is performed via error productions
  • An error production is a production containing
    the predefined terminal error
  • After adding an error production,
    A → α B β | α error β,
    on encountering an error in the middle of
    B, the parser pops symbols from its stack until
    α, shifts error, and skips input tokens until a
    token in FIRST(β)

130
Error Recovery
  • The parser can report a syntax error by calling
    the user-provided function yyerror(char *)
  • The parser will suppress the report of another
    error message for 3 tokens
  • You can resume error report immediately by using
    the macro yyerrok
  • Error productions are used for major nonterminals

131
An Example
line:   expr '\n'
    |   error '\n'       { yyerror("reenter last line");
                           yyerrok; }
    ;
expr:   expr '+' expr
    |   expr '*' expr
    |   '-' expr %prec UMINUS
    |   '(' expr ')'
    |   NUMBER
    ;
