CSCI 435 Compiler Design presentation

1
CSCI 435 Compiler Design

Week 5 Class 1
Sections 2.2.5.5 to 2.3, skipping 2.2.5.8
(pages 165-185)
Ray Schneider

2
Topics of the Day

Finish off Chapter 2
LR(1)
LALR(1) parsing
Yacc/Bison
Conclusion and Summary

3
LR(1) Parsing

Resolving conflicts with FOLLOW sets doesn't
work as well as we might wish, since it replaces
the look-ahead of a single item of N in an LR
state by the FOLLOW set of N, i.e. the union of
all the look-aheads of all alternatives of N in
all states.
LR(1) is more discriminating: it keeps a
look-ahead set for each item, used to resolve
conflicts when a reduce item has been reached.
This increases the strength of the parser, but
also the size of the parse tables.
We will demo LR(1) using a grammar that is
neither LL(1) nor SLR(1):

S → A | xb    A → aAb | B    B → x
4
not LL(1) ...
S → A | xb    A → aAb | B    B → x

The grammar produces the language { xb, a^n x b^n | n ≥ 0 }.
It is not LL(1): x is in FIRST(B), so it is also
in FIRST(A), and S exhibits a FIRST/FIRST
conflict on x.
It is not SLR(1) either, since SLR(1) bases its
decision to reduce by an item N → α• on the
FOLLOW set of N.
S2 contains both a shift item on b and a reduce
item B → x• {b, $}, and b is in FOLLOW(B).
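
The SLR(1) conflict can be checked mechanically. What follows is a minimal Python sketch, assuming the grammar S → A | xb, A → aAb | B, B → x with $ as the end-of-input marker. The names GRAMMAR, first and follow_sets are illustrative and do not come from the slides, and the shortcut used for FIRST is only valid because this grammar has no ε-alternatives and no left recursion.

```python
# Grammar of the slides: S -> A | x b ; A -> a A b | B ; B -> x
GRAMMAR = {
    "S": [("A",), ("x", "b")],
    "A": [("a", "A", "b"), ("B",)],
    "B": [("x",)],
}

def first(symbol):
    """FIRST set of one symbol (assumes no empty alternatives, no left recursion)."""
    if symbol not in GRAMMAR:          # a terminal is its own FIRST set
        return {symbol}
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs[0])
    return result

def follow_sets(start="S", end="$"):
    """Compute FOLLOW sets by iterating until nothing changes."""
    follow = {nonterminal: set() for nonterminal in GRAMMAR}
    follow[start].add(end)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for rhs in alternatives:
                for i, symbol in enumerate(rhs):
                    if symbol not in GRAMMAR:
                        continue
                    # What can follow this occurrence: FIRST of the next symbol,
                    # or FOLLOW of the left-hand side if it is the last symbol.
                    new = first(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow

FOLLOW = follow_sets()

# State S2 holds the shift item S -> x . b and the reduce item B -> x .
shift_tokens = {"b"}
conflict = shift_tokens & FOLLOW["B"]
print("FOLLOW(B) =", sorted(FOLLOW["B"]), "conflict on", sorted(conflict))
```

Because b is both a shift token in S2 and a member of FOLLOW(B), an SLR(1) parser cannot decide between shifting and reducing on b.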

5
SLR(1) automaton for grammar of fig 2.95
shift-reduce conflict
6
LR(1) keeps a specific look ahead

The LR(1) technique does not rely on FOLLOW sets,
but keeps a specific look-ahead set with each item.
We write N → α•β {σ}, where σ is the set of tokens
that can follow this specific item.
When the dot reaches the end of the item,
N → αβ• {σ}, the item can be reduced only if the
look-ahead at that moment is in σ; otherwise the
item is ignored.
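
The effect of per-item look-ahead sets can be sketched in a few lines of Python. This is an illustration under assumptions, not the book's algorithm: items are modelled as tuples (lhs, rhs, dot, sigma), the function name actions is hypothetical, and the two versions of state S2 are written out by hand for the grammar S → A | xb, A → aAb | B, B → x.

```python
def actions(items, lookahead):
    """List the parser actions that a state allows for one look-ahead token.

    Each item is (lhs, rhs, dot, sigma), where sigma is the look-ahead set
    attached to that item.
    """
    result = []
    for lhs, rhs, dot, sigma in items:
        if dot < len(rhs):
            # Shift item: applies if the symbol after the dot is the look-ahead.
            if rhs[dot] == lookahead:
                result.append(("shift", lookahead))
        elif lookahead in sigma:
            # Reduce item: applies only if the look-ahead is in its own set.
            result.append(("reduce", lhs, rhs))
    return result

# State S2 with LR(1) look-aheads: S -> x . b {$} and B -> x . {$}
S2_LR1 = [("S", ("x", "b"), 1, {"$"}), ("B", ("x",), 1, {"$"})]

# The same state when the reduce item uses FOLLOW(B) = {b, $}, as SLR(1) does
S2_SLR1 = [("S", ("x", "b"), 1, {"$"}), ("B", ("x",), 1, {"b", "$"})]

print(actions(S2_LR1, "b"))    # one action: no conflict
print(actions(S2_SLR1, "b"))   # two actions: shift/reduce conflict
```

With the LR(1) look-ahead {$} on B → x•, the token b leads only to a shift, and the reduce is chosen only at end of input.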

7
Rules for determining look aheads

Look-ahead sets of existing items do not change.
When a new item is created, a new look-ahead
set must be determined.
TWO SITUATIONS
1) When creating the initial item set S0: the
look-ahead set of the initial items is {$}, since
the end-of-input token $ is the only token that
can follow the start symbol.
2) When doing ε-moves: the prediction rule creates
new items for the alternatives of N in the presence
of an item of the form P → α•Nβ {σ}. The look-ahead
set of each new item is FIRST(β{σ}). If FIRST(β)
does not include ε, then FIRST(β{σ}) = FIRST(β); if
β can produce ε, then FIRST(β{σ}) consists of all
the tokens in FIRST(β) excluding ε, plus all the
tokens in σ.
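
The second rule can be written as a small function. The following Python sketch is one possible reading of it: the name lookahead_for_prediction, the EPSILON marker and the nullable non-terminal C in the last example are assumptions made for illustration, and FIRST sets are passed in as a precomputed dictionary rather than derived here.

```python
EPSILON = "ε"

def lookahead_for_prediction(beta, sigma, first_sets):
    """Return FIRST(beta {sigma}) for an item P -> alpha . N beta {sigma}.

    beta is the sequence of symbols after N, sigma the item's look-ahead set,
    first_sets maps each non-terminal to its FIRST set (possibly with EPSILON).
    """
    result = set()
    for symbol in beta:
        # A terminal is its own FIRST set.
        symbol_first = first_sets.get(symbol, {symbol})
        result |= symbol_first - {EPSILON}
        if EPSILON not in symbol_first:
            return result          # beta cannot vanish: sigma is not needed
    return result | set(sigma)     # beta can produce ε: add the tokens of sigma

# FIRST sets for the grammar S -> A | xb, A -> aAb | B, B -> x
FIRST = {"S": {"a", "x"}, "A": {"a", "x"}, "B": {"x"}}

# Item A -> a . A b {$}: beta = b, so new items for A get look-ahead {b}
print(lookahead_for_prediction(("b",), {"$"}, FIRST))

# Item S -> . A {$}: beta is empty, so new items for A inherit {$}
print(lookahead_for_prediction((), {"$"}, FIRST))

# Hypothetical nullable non-terminal C with FIRST(C) = {c, ε}
print(lookahead_for_prediction(("C",), {"$"}, {"C": {"c", EPSILON}}))
```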

8
LR(1) automaton
9
What can we say about LR(1)

More discriminating than SLR(1)
So strong that any language that CAN be parsed
from left to right with a one-token look-ahead in
linear time can be parsed using LR(1)
LR(1) is the strongest possible left-to-right
parsing method since, as Knuth demonstrated in
1965, the set of LR items implements the best
possible breadth-first search for handles.
BUT LR(1) parsing tables are one or two orders of
magnitude larger than SLR(1) tables
TANSTAAFL (There Ain't No Such Thing As A Free Lunch)

10
LALR(1) Parsing

The LR(1) state diagram includes many similar
states, i.e. states whose item sets are identical
except for their look-ahead sets.
Examples are S3,S10 and S4,S9 and S6,S11 and
S8,S12. If we ignore the look-aheads, what
remains is called the CORE of the LR(1) state.
CORES of LR(1) states correspond to LR(0) states.
LR(1) states are split-up versions of LR(0)
states, based on the look-ahead.

11
Combining CORE Sets Reduces States to SLR(1)/LR(0)

Combining States
S4 and S9
S3 and S10
S6 and S11
S8 and S12
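
The merging step can be tried out on the example grammar. The Python sketch below builds the LR(1) item sets for S → A | xb, A → aAb | B, B → x and then groups them by core. It is a simplified illustration: items carry a single look-ahead token, the start items are created directly for S without an extra start rule, FIRST is computed with a shortcut that works only because this grammar has no ε-alternatives and no left recursion, and all function names are assumptions.

```python
from collections import deque

GRAMMAR = {
    "S": [("A",), ("x", "b")],
    "A": [("a", "A", "b"), ("B",)],
    "B": [("x",)],
}
START, END = "S", "$"

def first(symbol):
    """FIRST of one symbol (no empty alternatives, no left recursion)."""
    if symbol not in GRAMMAR:
        return {symbol}
    return set().union(*(first(rhs[0]) for rhs in GRAMMAR[symbol]))

def closure(items):
    """Add predicted items; an item is (lhs, rhs, dot, lookahead_token)."""
    items = set(items)
    work = deque(items)
    while work:
        lhs, rhs, dot, la = work.popleft()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            rest = rhs[dot + 1:]
            # Look-ahead of predicted items: FIRST(rest) or, if rest is empty,
            # the look-ahead of the item that caused the prediction.
            lookaheads = first(rest[0]) if rest else {la}
            for alternative in GRAMMAR[rhs[dot]]:
                for token in lookaheads:
                    new = (rhs[dot], alternative, 0, token)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return frozenset(items)

def goto(state, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)

def lr1_states():
    start = closure({(START, rhs, 0, END) for rhs in GRAMMAR[START]})
    states, work = {start}, deque([start])
    while work:
        state = work.popleft()
        for symbol in {r[d] for (_, r, d, _) in state if d < len(r)}:
            successor = goto(state, symbol)
            if successor not in states:
                states.add(successor)
                work.append(successor)
    return states

def core(state):
    """The core of a state: its items with the look-aheads dropped."""
    return frozenset((l, r, d) for (l, r, d, _) in state)

lr1 = lr1_states()
lalr_cores = {core(state) for state in lr1}
print(len(lr1), "LR(1) states,", len(lalr_cores), "states after merging by core")
```

Under these assumptions the construction yields 13 LR(1) states and 9 distinct cores, which matches the four merged pairs listed above (S0 to S12 reduced by four).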

12
Resulting LALR(1) Automaton for fig. 2.95 Grammar
S → A | xb    A → aAb | B    B → x
[Figure: LALR(1) automaton with states S0, S1, S2, S3,10, S4,9, S5, S6,11, S7 and S8,12, and transitions on A, B, a, b and x. State S2 contains the items S → x•b and B → x•.]
13
Summing up LALR(1)

Reduces the number of LR(1) states to that of the
SLR(1)/LR(0) automaton by combining states with
the same CORE
Most popular method in use today
Combines most of the power of LR(1) with the
efficiency and memory requirements of LR(0)
State combination cannot cause shift/reduce
conflicts (see p. 172)

14
Making a grammar LR(1) or not

One still encounters grammars that are not LR(1),
usually because the grammar is ambiguous.
Example: the dangling else problem
if_statement → 'if' '(' expression ')' statement
             | 'if' '(' expression ')' statement 'else' statement
statement → ... | if_statement | ...
item 1
if_statement →
'if' '(' expression ')' statement • {... 'else' ...}
item 2
if_statement →
'if' '(' expression ')' statement • 'else' statement
Both items occur in the same state, so on
look-ahead 'else' we see a shift/reduce conflict.

15
Resolving shift-reduce conflicts

Traditionally resolved similarly to conflict
resolution in lexical analyzers:
the longest possible sequence of grammar symbols
is taken for reduction. This is easy to implement:
in a shift/reduce conflict, do the shift.
In the case of the dangling 'else', this results
in pairing the else with the latest if without an
else, as stipulated in the C manual.
Another useful technique:
precedence between tokens. It can be used
only if the reduce item in the conflict ends in a
token followed by at most one non-terminal.
P → α•tβ {...}       // the shift item
Q → γuR• {...t...}   // the reduce item, where R is
                     // empty or one non-terminal
One of three actions is taken on look-ahead t:
1) u has higher precedence than t → reduce by Q
2) t has higher precedence than u → shift, which continues P
3) same precedence → shift (see exercise 2.55)
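
The three precedence rules fit in one comparison. The Python sketch below follows the rules exactly as listed on this slide; the function name resolve_shift_reduce and the PRECEDENCE table are hypothetical. Note that yacc and bison treat the equal-precedence case differently: there a tie is decided by the declared associativity (%left reduces, %right shifts).

```python
def resolve_shift_reduce(lookahead_t, last_token_u, precedence):
    """Choose between shifting t and reducing an item that ends in u (or u R).

    precedence maps each token to an integer; a larger value binds tighter.
    """
    if precedence[last_token_u] > precedence[lookahead_t]:
        return "reduce"      # rule 1: u has higher precedence than t
    return "shift"           # rules 2 and 3: t is higher, or both are equal

# Hypothetical precedence table for illustration
PRECEDENCE = {"+": 1, "*": 2}

# Stack holds E * E, look-ahead is +: finish the multiplication first
print(resolve_shift_reduce("+", "*", PRECEDENCE))

# Stack holds E + E, look-ahead is *: shift so that * is grouped first
print(resolve_shift_reduce("*", "+", PRECEDENCE))
```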

16
Resolving reduce-reduce conflicts

Corresponds to the situation in the lexical
analyzer where two patterns match with the same
length, so the longest token still matches more
than one pattern
Usual resolution: the textually first grammar rule
in the parser generator input wins
easy to implement
generally satisfactory

17
A traditional bottom up parser generator

yacc started as a UNIX utility in the mid-1970s;
yacc and its GNU counterpart bison are LALR(1)
parser generators
Problem: yacc generates C, not ANSI C; bison
rectifies this problem
Unlike in Top-Down parsing, it is unsafe to
associate code with a Bottom-Up parse until the
entire alternative has been recognized
yacc associates exactly one parameter with each
member of the alternative, $1, $2, ... $n,
including terminal symbols; $$ is associated with
the rule non-terminal

18
Example yacc code
%{
#include <stdio.h>
#include "tree.h"
%}
/* Declarations */
%union {
    struct expr *expr;
    struct term *term;
}
%type <expr> expression
%type <term> term
%token IDENTIFIER
%start main
/* Start of grammar proper */
%%
/* Grammar rules */
main:
    expression  { print_expr($1); printf("\n"); }
    ;
expression:
    expression '-' term  { $$ = new_expr(); $$->type = '-'; $$->expr = $1; $$->term = $3; }
    | term               { $$ = new_expr(); $$->type = 'T'; $$->term = $1; }
    ;
term:
    IDENTIFIER  { $$ = new_term(); $$->type = 'I'; }
    ;
/* Start of auxiliary C code */
%%
19
Auxiliary Code for yacc parser
#include "lex.h"

int main(void) {
    start_lex();
    yyparse();    /* routine generated by yacc */
    return 0;
}

int yylex(void) {
    get_next_token();
    return Token.class;
}

Very High Level View of Analysis Techniques
(fig. 2.110)
20
Summary

Lexical Analysis and Syntax Analysis
input characters → tokens, tokens → parse tree (AST)
The Abstract Syntax Tree (AST) is a version of the
parse tree retaining only the semantically
important nodes
FSAs can be used to automate the process
Different methods for different grammars
Parsing
two ways: Top-Down and Bottom-Up
Top-Down (written manually or generated
automatically): a recursive descent parser works
for a relatively small subset of grammars;
generated top-down parsers use precomputation and
generate unambiguous transition tables for LL(1)
grammars
Bottom-Up methods, generally automated, repeatedly
identify a handle
LR(0), SLR(1), LR(1) and LALR(1) grammars were
covered, the last being the most popular in
current use

21
Homework for Week 7

Get the Lex/Flex generated code from page 95,
figure 2.41, to run under Visual C
Hints: resolve function conflicts by adding
#include <appropriate library> in the generated
C file, e.g. exit, malloc, realloc, and free are
in <stdlib.h>; the strcpy() function is in
<string.h>
You can include the default main() by adding a 1
to the line: #define YY_MAIN 1
Then amplify the default main to call
get_next_token() and print out appropriate results
