Title: CSCI 435 Compiler Design
- Week 5 Class 1
- Sections 2.2.5.5 to 2.3, skipping 2.2.5.8
- (pp. 165-185)
- Ray Schneider
Topics of the Day
- Finish off Chapter 2
- LR(1)
- LALR(1) parsing
- Yacc/Bison
- Conclusion and Summary
LR(1) Parsing
- Conflict resolution by FOLLOW set doesn't work as well as we might wish, since it replaces the look-ahead of a single item of N in an LR state by the FOLLOW set of N, i.e. the union of all the look-aheads of all alternatives of N in all states.
- LR(1) is more discriminating: it keeps a look-ahead set for each item and uses it to resolve conflicts when a reduce item has been reached.
- This increases the strength of the parser, but also the size of the parse tables.
- We will demo LR(1) using the following grammar, which is neither LL(1) nor SLR(1):
$S \to A \mid x\,b \qquad A \to a\,A\,b \mid B \qquad B \to x$
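Not LL(1) ...
$S \to A \mid x\,b \qquad A \to a\,A\,b \mid B \qquad B \to x$
- The grammar produces the language $\{x\,b\} \cup \{a^n x\, b^n \mid n \ge 0\}$.
- This grammar is not LL(1): x is in FIRST(B), so it is also in FIRST(A), and S exhibits a FIRST/FIRST conflict on x.
- It is not SLR(1) either, since SLR(1) bases its decision to reduce by an item $N \to \alpha \bullet$ on the FOLLOW set of N.
- S2 contains both the shift item $S \to x \bullet b$ and the reduce item $B \to x \bullet$; since b is in FOLLOW(B), look-ahead b gives a shift-reduce conflict.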
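Both claims can be checked mechanically. The sketch below computes FIRST and FOLLOW sets of the fig. 2.95 grammar by fixpoint iteration, storing token sets as bit masks. It is an illustration, not the book's algorithm; the character '#' for end of input and all names are assumptions of this sketch, and it relies on the fact that no rule of this grammar derives the empty string.

```c
#include <stddef.h>
#include <string.h>

/* Token bit masks; '#' stands for end of input (an assumption of this sketch). */
static unsigned tbit(char c) {
    switch (c) {
    case 'a': return 1u << 0;
    case 'b': return 1u << 1;
    case 'x': return 1u << 2;
    case '#': return 1u << 3;
    default:  return 0;
    }
}

/* Upper-case letters are the non-terminals S, A, B; lower-case letters are tokens. */
static int is_nt(char c) { return c >= 'A' && c <= 'Z'; }
static int nt_index(char c) { return c == 'S' ? 0 : c == 'A' ? 1 : 2; }

static const struct { char lhs; const char *rhs; } rules[] = {
    {'S', "A"}, {'S', "xb"}, {'A', "aAb"}, {'A', "B"}, {'B', "x"},
};
#define NRULES (sizeof rules / sizeof rules[0])

unsigned first_set[3], follow_set[3];   /* indexed by nt_index() */

/* No rule derives the empty string, so FIRST of a symbol sequence
   is FIRST of its first symbol. */
static unsigned first_of(char c) { return is_nt(c) ? first_set[nt_index(c)] : tbit(c); }

/* Repeat until no FIRST or FOLLOW set grows any more (a fixpoint). */
void compute_first_follow(void) {
    memset(first_set, 0, sizeof first_set);
    memset(follow_set, 0, sizeof follow_set);
    follow_set[nt_index('S')] = tbit('#');   /* end of input follows the start symbol */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (size_t r = 0; r < NRULES; r++) {
            int lhs = nt_index(rules[r].lhs);
            const char *rhs = rules[r].rhs;
            size_t n = strlen(rhs);
            unsigned f = first_set[lhs] | first_of(rhs[0]);
            if (f != first_set[lhs]) { first_set[lhs] = f; changed = 1; }
            for (size_t i = 0; i < n; i++) {
                if (!is_nt(rhs[i])) continue;
                int k = nt_index(rhs[i]);
                /* FIRST of the next symbol, or FOLLOW(lhs) if rhs[i] is last */
                unsigned add = (i + 1 < n) ? first_of(rhs[i + 1]) : follow_set[lhs];
                if ((follow_set[k] | add) != follow_set[k]) { follow_set[k] |= add; changed = 1; }
            }
        }
    }
}
```

After compute_first_follow(), FIRST(A) = {a, x}, so both alternatives of S can start with x, and FOLLOW(B) = {b, #}, so an SLR(1) parser would reduce $B \to x \bullet$ on b while $S \to x \bullet b$ wants to shift it.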
SLR(1) automaton for the grammar of fig. 2.95
[Figure: SLR(1) automaton, showing the shift-reduce conflict]
LR(1) keeps a specific look-ahead
- The LR(1) technique does not rely on FOLLOW sets, but keeps a specific look-ahead set with each item.
- We write $N \to \alpha \bullet \beta\,\{\sigma\}$, where $\sigma$ is the set of tokens that can follow this specific item.
- When the dot reaches the end of the item, $N \to \alpha\beta \bullet \{\sigma\}$, the item can be reduced only if the look-ahead token at that moment is in $\sigma$; otherwise the item is ignored.
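A minimal sketch of how an LR(1) item and the reduce test might be represented in C. The struct layout, the token names and the rule numbers are assumptions made for illustration; $\sigma$ is stored as a bit set with one bit per token.

```c
#include <stdbool.h>

/* Tokens of the example grammar; TOK_END stands for end of input (assumed names). */
enum { TOK_A, TOK_B, TOK_X, TOK_END };

/* An LR(1) item N -> alpha . beta {sigma}, identified by rule number and
   dot position; sigma is a bit set with one bit per token. */
typedef struct {
    int rule;            /* hypothetical index of the production */
    int dot;             /* number of right-hand-side symbols before the dot */
    int rhs_len;         /* length of the right-hand side */
    unsigned lookahead;  /* sigma */
} LR1Item;

/* A reduce item (dot at the end) applies only if the current look-ahead
   token is in sigma; otherwise the item is ignored. */
bool can_reduce(const LR1Item *it, int token) {
    return it->dot == it->rhs_len && (it->lookahead & (1u << token)) != 0;
}
```

For the grammar above, the item $B \to x \bullet$ that shares an LR(1) state with $S \to x \bullet b$ has only end of input in $\sigma$, so with look-ahead b can_reduce() returns false and only the shift remains.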
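Rules for determining look-aheads
- Look-ahead sets of existing items do not change.
- When a new item is created, a new look-ahead set must be determined. There are TWO SITUATIONS:
  - When creating the initial item set S0: the look-ahead set of its initial items contains only the end-of-input token, since that is the only token that can follow the start symbol.
  - When doing $\varepsilon$-moves: the prediction rule creates new items for the alternatives of N in the presence of items of the form $P \to \alpha \bullet N \beta\,\{\sigma\}$. The look-ahead set of each new item is $\mathrm{FIRST}(\beta\{\sigma\})$:
    - if $\mathrm{FIRST}(\beta)$ does not include $\varepsilon$, then $\mathrm{FIRST}(\beta\{\sigma\}) = \mathrm{FIRST}(\beta)$;
    - if $\beta$ can produce $\varepsilon$, then $\mathrm{FIRST}(\beta\{\sigma\})$ contains all the tokens in $\mathrm{FIRST}(\beta)$ excluding $\varepsilon$, plus all the tokens in $\sigma$.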
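The $\varepsilon$-move rule can be written as a small function. This sketch assumes the FIRST set of each symbol of $\beta$ is already known as a token bit mask (with $\varepsilon$ kept out of the mask) and that a separate nullable flag records whether the symbol can produce $\varepsilon$; the function and parameter names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Look-ahead set FIRST(beta {sigma}) for items predicted from P -> alpha . N beta {sigma}.
   first_beta[i] is FIRST of the i-th symbol of beta (without epsilon);
   nullable[i] says whether that symbol can produce epsilon. */
unsigned predicted_lookahead(const unsigned *first_beta, const bool *nullable,
                             size_t beta_len, unsigned sigma) {
    unsigned la = 0;
    for (size_t i = 0; i < beta_len; i++) {
        la |= first_beta[i];          /* tokens that can start this symbol */
        if (!nullable[i]) return la;  /* this symbol cannot vanish: sigma is hidden */
    }
    return la | sigma;                /* all of beta can produce epsilon: add sigma */
}
```

For the item $A \to a \bullet A b\,\{\sigma\}$, $\beta$ is the single token b, so the predicted A items get look-ahead {b} whatever $\sigma$ is; for $S \to \bullet A$ in S0, $\beta$ is empty and the predicted items inherit $\sigma$.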
LR(1) automaton
[Figure: LR(1) automaton for the grammar of fig. 2.95]
What can we say about LR(1)?
- More discriminating than SLR(1).
- So strong that any language that CAN be parsed from left to right with a one-token look-ahead in linear time can be parsed using LR(1).
- LR(1) is the strongest possible left-to-right parsing method since, as Knuth demonstrated in 1965, the set of LR items implements the best possible breadth-first search for handles.
- BUT LR(1) parsing tables are one or two orders of magnitude larger than SLR(1) tables.
- TANSTAAFL (There Ain't No Such Thing As A Free Lunch).
LALR(1) Parsing
- The LR(1) state diagram includes many similar states, i.e. states whose item sets are identical except for the look-ahead sets.
- Examples are S3 and S10, S4 and S9, S6 and S11, and S8 and S12. If we ignore the look-ahead, what remains is called the CORE of the LR(1) state.
- CORES of LR(1) states correspond to LR(0) states.
- LR(1) states are split-up versions of LR(0) states, based on the look-ahead.
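Finding the states to merge amounts to comparing cores. A minimal sketch, assuming items are stored as (rule number, dot position, look-ahead bit set) and a state holds at most 16 items; these representation choices are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int rule; int dot; unsigned lookahead; } Item;
typedef struct { size_t n; Item items[16]; } State;

/* Two items have the same core if they agree on rule and dot position. */
static bool same_item_core(Item a, Item b) { return a.rule == b.rule && a.dot == b.dot; }

static bool contains_core(const State *s, Item it) {
    for (size_t i = 0; i < s->n; i++)
        if (same_item_core(s->items[i], it)) return true;
    return false;
}

/* Two LR(1) states have the same CORE when every item of each state has an
   item with the same core in the other; look-ahead sets are ignored. */
bool same_core(const State *s, const State *t) {
    for (size_t i = 0; i < s->n; i++)
        if (!contains_core(t, s->items[i])) return false;
    for (size_t i = 0; i < t->n; i++)
        if (!contains_core(s, t->items[i])) return false;
    return true;
}
```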
Combining CORE Sets Reduces States to SLR(1)/LR(0)
- Combining States
- S4 and S9
- S3 and S10
- S6 and S11
- S8 and S12
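Merging two states with equal cores keeps their items and takes the union of the look-ahead sets of matching items. A sketch using the same hypothetical item representation (rule number, dot position, look-ahead bit set):

```c
#include <stddef.h>

typedef struct { int rule; int dot; unsigned lookahead; } Item;
typedef struct { size_t n; Item items[16]; } State;

/* Merge LR(1) state t into s, assuming both have the same core:
   OR each item's look-ahead set into the matching item of s. */
void merge_states(State *s, const State *t) {
    for (size_t j = 0; j < t->n; j++)
        for (size_t i = 0; i < s->n; i++)
            if (s->items[i].rule == t->items[j].rule && s->items[i].dot == t->items[j].dot)
                s->items[i].lookahead |= t->items[j].lookahead;
}
```

For example, merging an item $A \to B \bullet$ whose look-ahead is end of input with the same item whose look-ahead is {b} yields $A \to B \bullet$ with look-ahead {b, end of input}.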
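Resulting LALR(1) Automaton for fig. 2.95 Grammar
$S \to A \mid x\,b \qquad A \to a\,A\,b \mid B \qquad B \to x$
[Figure: LALR(1) automaton with states S0, S1, S2, S3,10, S4,9, S5, S6,11, S7 and S8,12; S0 contains $S \to \bullet A$, $S \to \bullet x\,b$, $A \to \bullet a A b$, $A \to \bullet B$, $B \to \bullet x$; S2 contains $S \to x \bullet b$ and $B \to x \bullet$]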
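To see the LALR(1) automaton at work, here is a table-driven shift-reduce parser for the same grammar, with the action and goto tables built by hand from the LALR(1) construction. The state numbers are this sketch's own (state 9 accepts once S is recognized) and do not match the figure's numbering; the characters a, b, x are the tokens and the end of the C string serves as end of input.

```c
#include <stddef.h>

enum { T_a, T_b, T_x, T_END, NTERMS };   /* terminals */
enum { N_S, N_A, N_B, NNONTERMS };       /* non-terminals */

/* kind: 's' shift to state arg, 'r' reduce by rule arg, 'a' accept, 0 error */
typedef struct { char kind; int arg; } Action;

/* Rules numbered from 1; entry 0 is unused. */
static const struct { int lhs; int len; } rule[] = {
    {0, 0},
    {N_S, 1},   /* 1: S -> A     */
    {N_S, 2},   /* 2: S -> x b   */
    {N_A, 3},   /* 3: A -> a A b */
    {N_A, 1},   /* 4: A -> B     */
    {N_B, 1},   /* 5: B -> x     */
};

static const Action action[10][NTERMS] = {
    /*          a         b         x         end      */
    /* 0 */ {{'s', 3}, {0, 0},   {'s', 2}, {0, 0}},
    /* 1 */ {{0, 0},   {0, 0},   {0, 0},   {'r', 1}},   /* S -> A .             */
    /* 2 */ {{0, 0},   {'s', 5}, {0, 0},   {'r', 5}},   /* S -> x . b, B -> x . */
    /* 3 */ {{'s', 3}, {0, 0},   {'s', 7}, {0, 0}},
    /* 4 */ {{0, 0},   {'r', 4}, {0, 0},   {'r', 4}},   /* A -> B .             */
    /* 5 */ {{0, 0},   {0, 0},   {0, 0},   {'r', 2}},   /* S -> x b .           */
    /* 6 */ {{0, 0},   {'s', 8}, {0, 0},   {0, 0}},     /* A -> a A . b         */
    /* 7 */ {{0, 0},   {'r', 5}, {0, 0},   {0, 0}},     /* B -> x . {b}         */
    /* 8 */ {{0, 0},   {'r', 3}, {0, 0},   {'r', 3}},   /* A -> a A b .         */
    /* 9 */ {{0, 0},   {0, 0},   {0, 0},   {'a', 0}},   /* S recognized         */
};

/* GOTO on non-terminals; 0 means "no entry" (state 0 is never a GOTO target). */
static const int go_to[10][NNONTERMS] = {
    [0] = {9, 1, 4},   /* S, A, B from state 0 */
    [3] = {0, 6, 4},   /* A, B from state 3    */
};

static int token_of(char c) {
    switch (c) {
    case 'a': return T_a;
    case 'b': return T_b;
    case 'x': return T_x;
    case '\0': return T_END;
    default: return -1;
    }
}

/* Returns 1 if the input is accepted, 0 otherwise. */
int lalr_parse(const char *input) {
    enum { MAXDEPTH = 64 };
    int stack[MAXDEPTH];
    int sp = 0;
    stack[0] = 0;
    for (;;) {
        int t = token_of(*input);
        if (t < 0) return 0;                        /* unknown character */
        Action act = action[stack[sp]][t];
        switch (act.kind) {
        case 's':
            if (sp + 1 >= MAXDEPTH) return 0;      /* input nested too deeply */
            stack[++sp] = act.arg;
            input++;
            break;
        case 'r':
            sp -= rule[act.arg].len;                /* pop the handle */
            stack[sp + 1] = go_to[stack[sp]][rule[act.arg].lhs];
            sp++;
            break;
        case 'a':
            return 1;
        default:
            return 0;                               /* error entry */
        }
    }
}
```

lalr_parse() accepts x, xb, axb and aaxbb and rejects ab, xbb and aaxb. Note state 2: it reduces B → x only on end of input and shifts on b. An SLR(1) table would also reduce there on b, because b is in FOLLOW(B), which is exactly the shift-reduce conflict of the SLR(1) automaton.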
Summing up LALR(1)
- Reduces the number of LR(1) states to that of the SLR(1) and LR(0) automaton by combining states with identical CORES.
- Most popular method in use today.
- Combines most of the power of LR(1) with efficiency, and has the memory requirements of LR(0).
- State combination cannot cause shift/reduce conflicts (see p. 172), although it can introduce reduce/reduce conflicts.
Making a grammar LR(1), or not
- One still encounters grammars that are not LR(1), usually because the grammar is ambiguous.
- Example: the dangling-else problem
  - if_statement → 'if' '(' expression ')' statement
      | 'if' '(' expression ')' statement 'else' statement
  - statement → ... if_statement ...
- Item 1 (reduce item):
  - if_statement → 'if' '(' expression ')' statement • {... 'else' ...}
- Item 2 (shift item):
  - if_statement → 'if' '(' expression ')' statement • 'else' statement
- Thus we see a shift/reduce conflict on 'else'.
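C itself contains this ambiguity and attaches the else to the nearest if that has no else yet, which is what resolving the conflict in favor of the shift produces. A small function makes the pairing observable; its name is illustrative, and compilers may warn about the ambiguous else.

```c
/* Which 'if' owns the 'else'? C pairs it with the inner if (b),
   so the else branch runs only when a is true and b is false. */
int dangling(int a, int b) {
    if (a)
        if (b)
            return 1;
        else
            return 2;
    return 3;   /* reached whenever a is false */
}
```

dangling(0, 0) returns 3, not 2: had the else belonged to the outer if, it would return 2.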
Resolving shift-reduce conflicts
- Traditionally resolved similarly to conflict resolution in lexical analyzers: the longest possible sequence of grammar symbols is taken for reduction.
- Easy to implement: in a shift/reduce conflict, do the shift.
- In the case of the dangling 'else' this pairs the 'else' with the latest 'if' that does not yet have an 'else', as stipulated in the C manual.
- Another useful technique: precedence between tokens. It can be used only if the reduce item in the conflict ends in a token followed by at most one non-terminal:
  - $P \to \alpha \bullet t \beta\,\{\ldots\}$   // the shift item
  - $Q \to \gamma u R \bullet \{\ldots t \ldots\}$   // the reduce item, where R is empty or one non-terminal
- One of three actions is taken:
  - 1) u has higher precedence than t → reduce by Q
  - 2) t has higher precedence than u → shift, continuing P
  - 3) same precedence → shift (see exercise 2.55)
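The three rules reduce to a single comparison. In this sketch the precedences of u and t are passed in as integers (higher binds tighter); the enum and function names are hypothetical. Real yacc/bison refine case 3 with associativity (%left reduces, %right shifts, %nonassoc reports an error), whereas this sketch follows the rule as stated on the slide.

```c
/* Outcome of a shift-reduce conflict between the shift item P -> alpha . t beta
   and the reduce item Q -> gamma u R . (R empty or one non-terminal). */
typedef enum { ACT_SHIFT, ACT_REDUCE } ConflictAction;

ConflictAction resolve_shift_reduce(int prec_u, int prec_t) {
    if (prec_u > prec_t)
        return ACT_REDUCE;   /* rule 1: u binds tighter, reduce by Q */
    return ACT_SHIFT;        /* rule 2 (t tighter) and rule 3 (equal): shift */
}
```

With '-' at precedence 1 and '*' at precedence 2, the reduce item $E \to E - E \bullet$ meeting look-ahead '*' gives resolve_shift_reduce(1, 2) == ACT_SHIFT, so a - b * c groups as a - (b * c).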
Resolving reduce-reduce conflicts
- Corresponds to the situation in the lexical analyzer where two patterns match a token of the same length, so the longest-match rule still leaves more than one pattern.
- Usual resolution: the textually first grammar rule in the parser-generator input wins.
  - easy to implement
  - generally satisfactory
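The textual-order rule is as simple as it sounds. A sketch, assuming each conflicting reduce item is identified by the position of its rule in the parser-generator input; the function name is hypothetical.

```c
#include <stddef.h>

/* rules[0..n-1] are the input positions of the conflicting reduce rules (n >= 1).
   The textually first rule, i.e. the smallest position, wins. */
int resolve_reduce_reduce(const int *rules, size_t n) {
    int best = rules[0];
    for (size_t i = 1; i < n; i++)
        if (rules[i] < best) best = rules[i];
    return best;
}
```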
A traditional bottom-up parser generator
- yacc/bison: yacc started as a UNIX utility in the mid-1970s; it is an LALR(1) parser generator.
- Problem: yacc generates pre-ANSI C rather than ANSI C; bison rectifies this problem.
- Unlike top-down parsing, it is unsafe to associate code with a bottom-up parse until the entire alternative has been recognized.
- yacc associates exactly one parameter with each member of the alternative ($1, $2, ..., $n), terminal symbols included, and one parameter ($$) with the rule's non-terminal.
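Why the code must wait for the whole alternative: the values $1 ... $n sit on a value stack kept in parallel with the parser states, and they are all present only at reduce time. A sketch of that bookkeeping for the rule expression → expression '-' term, computing a number instead of a tree; the stack type and function names are hypothetical, and this is not yacc's generated code.

```c
#include <stddef.h>

typedef int Value;
typedef struct { Value v[100]; size_t top; } ValueStack;   /* no overflow check in this sketch */

void push_value(ValueStack *s, Value x) { s->v[s->top++] = x; }

/* Reduce by expression -> expression '-' term, i.e. $$ = $1 - $3.
   $2 is the value attached to the token '-' and is unused here. */
void reduce_minus(ValueStack *s) {
    Value *dollar = &s->v[s->top - 3];   /* dollar[0] is $1, dollar[2] is $3 */
    Value result = dollar[0] - dollar[2];
    s->top -= 3;                         /* pop the values of the handle */
    push_value(s, result);               /* push $$ for the non-terminal */
}
```

Pushing 10, '-' and 4 and then calling reduce_minus() leaves a single value, 6, on the stack.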
Example yacc code
/* Declarations */
%{
#include <stdio.h>
#include "tree.h"
%}
%union {struct expr *expr; struct term *term;}
%type <expr> expression
%type <term> term
%token IDENTIFIER
%start main
%%
/* Start of grammar proper: grammar rules */
main:
    expression {print_expr($1); printf("\n");}
;
expression:
    expression '-' term
        {$$ = new_expr(); $$->type = '-'; $$->expr = $1; $$->term = $3;}
|   term
        {$$ = new_expr(); $$->type = 'T'; $$->term = $1;}
;
term:
    IDENTIFIER {$$ = new_term(); $$->type = 'I';}
;
%%
/* Start of auxiliary C code */
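The yacc example relies on tree.h, which the slide does not show. A hypothetical version that is enough to make the actions meaningful: the field names follow the actions ($$->type, $$->expr, $$->term), and everything else, including the printed format, is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical contents of "tree.h". */
struct term { int type; };          /* 'I': identifier (no name stored in this sketch) */
struct expr {
    int type;                       /* '-': expression '-' term, 'T': single term */
    struct expr *expr;              /* left operand when type == '-' */
    struct term *term;              /* right operand, or the only term */
};

struct expr *new_expr(void) {
    struct expr *e = calloc(1, sizeof *e);   /* zeroed: pointers start out NULL */
    if (!e) { perror("calloc"); exit(1); }
    return e;
}

struct term *new_term(void) {
    struct term *t = calloc(1, sizeof *t);
    if (!t) { perror("calloc"); exit(1); }
    return t;
}

static void print_term(const struct term *t) {
    (void)t;
    printf("I");                    /* every term in the example is an identifier */
}

/* Print the tree fully parenthesized, e.g. ((I - I) - I). */
void print_expr(const struct expr *e) {
    if (e->type == '-') {
        printf("(");
        print_expr(e->expr);
        printf(" - ");
        print_term(e->term);
        printf(")");
    } else {                        /* 'T' */
        print_term(e->term);
    }
}
```

For an input like p - q the actions build a '-' node whose left child is a 'T' node, and print_expr() prints (I - I).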
Auxiliary Code for yacc parser
#include "lex.h"
int main(void) {
    start_lex(); yyparse();   /* yyparse() is the routine generated by yacc */
    return 0;
}
int yylex(void) {
    get_next_token(); return Token.class;
}
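Likewise lex.h is not shown. A hypothetical stand-in that lets the yylex() adapter run: a global Token with a class field, start_lex() and get_next_token(), scanning a fixed string. The token code 258 for IDENTIFIER is a placeholder; in a real build it comes from the header that yacc generates. Returning 0 at end of input is what yyparse() expects from yylex().

```c
#include <ctype.h>

#define IDENTIFIER 258   /* placeholder; normally defined by the yacc-generated header */

struct token { int class; };
struct token Token;      /* the current token, as used by yylex() */

static const char *lex_input = "a - b";   /* hypothetical fixed input */
static const char *pos;

void start_lex(void) { pos = lex_input; }

void get_next_token(void) {
    while (*pos == ' ') pos++;
    if (*pos == '\0') { Token.class = 0; return; }   /* end of input */
    if (isalpha((unsigned char)*pos)) {
        while (isalnum((unsigned char)*pos)) pos++;
        Token.class = IDENTIFIER;
        return;
    }
    Token.class = *pos++;                            /* single-character token such as '-' */
}

/* The adapter from the slide: yacc calls yylex() and receives the token class. */
int yylex(void) {
    get_next_token();
    return Token.class;
}
```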
Very High Level View of Analysis Techniques
[Figure: fig. 2.110]
Summary
- Lexical Analysis and Syntax Analysis
  - input characters → tokens, tokens → parse tree (AST)
  - the Abstract Syntax Tree (AST) is a version of the parse tree retaining only the semantically important nodes
  - FSAs can be used to automate the process
  - different methods for different grammars
- Parsing
  - two ways: Top-Down and Bottom-Up
  - Top-Down (written manually or generated automatically): recursive descent parsers work for a relatively small subset of grammars; generated top-down parsers use precomputation and generate unambiguous transition tables for LL(1) grammars
  - Bottom-Up methods, generally automated, repeatedly identify a handle
  - LR(0), SLR(1), LR(1) and LALR(1) grammars were covered, the last being the most popular in current use
Homework for Week 7
- Get the Lex/Flex-generated code from page 95, figure 2.41, to run under Visual C.
- Hints: resolve function conflicts by adding #include <appropriate library> to the generated C file; e.g. exit, malloc, realloc and free are in <stdlib.h>, and the strcpy() function is in <string.h>.
- You can include the default main() by setting the line to #define YY_MAIN 1.
- Then amplify the default main() to call get_next_token() and print out appropriate results.
References
- Text: Modern Compiler Design