CSCI 435 Compiler Design presentation

1
CSCI 435 Compiler Design

Week 3 Class 2
Section 2.2 From Tokens to Syntax Tree through
Section 2.2.4.1 LL(1) Parsing
(pp. 110-126)
Ray Schneider

2
Topics of the Day

Tokens to Syntax Tree
Parsing Methods
Error Detection and Error Recovery
Top Down Parsing

3
Tokens to Syntax Tree

Two ways of parsing: TOP DOWN and BOTTOM UP
Top Down parsers can be
1) written by hand or 2) generated automatically
Bottom Up parsers
3) can only be generated
In all 3 cases the syntax structure is specified using context-free grammars

4
Importance of Grammars

1) Imposes a structure on the linear sequence of
tokens and a framework for erecting semantics on
the nodes of the structure
2) Allows automatic construction of parsers
through the field of formal languages
3) Helps create syntactically correct programs
and provide detailed answers about syntax

5
Two Ways to do Parsing

The LL method: deterministic, left-to-right, top-down
The LR and LALR methods: deterministic, left-to-right, bottom-up
Left-to-right: the sequence of tokens (program text) is processed from left to right, one token at a time
Deterministic: no searching (ideally); each token processed brings the parser one step closer to the final syntax tree, which implies LINEAR TIME
Deterministic methods only work for restricted classes of grammars
Grammars accepted by deterministic parsers are guaranteed to be non-ambiguous
Real grammars don't always cooperate, so transformation methods are needed to bring them into line
Non-ambiguous means that either exactly one syntax tree is generated (if the program is syntactically correct) or the program contains errors

6
Parsing Methods

A parsing method constructs the syntax tree for a given sequence of tokens, i.e. a tree of nodes labeled with grammar symbols, such that:
Leaf nodes are labeled with terminals
Inner nodes are labeled with non-terminals
The TOP NODE is labeled with the Start Symbol
Children of an inner node labeled N correspond to the members of an alternative of N, in the same order as they occur in that alternative
Terminals labeling the leaf nodes correspond to the sequence of tokens, in the same order as they occur in the input

7
Top Down or Bottom Up, i.e. Pre-order or Post-order

Parsing Methods are either Top Down or
Bottom up depending on how the nodes of the
syntax tree are constructed

TREE TRAVERSAL
Pre-order: visit node N and then N's sub-trees in left-to-right order.
Post-order: visit N's sub-trees in left-to-right order and then visit node N.
TERMS
Visiting a node: doing something to the node in support of the algorithm that motivates the traversal.
Traversing a node: visiting that node and traversing its sub-trees in some order.
Traversing a tree: traversing the top node, which recursively traverses the whole tree.
Traversing belongs to the control mechanism.
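The two traversal orders can be sketched in C as follows. This is a minimal illustration, not code from the course: the Node type, the MAX_CHILDREN limit, and the choice of "visiting" as appending a one-character label to a buffer are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node type: a label plus up to MAX_CHILDREN sub-trees. */
#define MAX_CHILDREN 4

typedef struct Node {
    char label;                        /* one-character label, for brevity */
    int n_children;
    struct Node *child[MAX_CHILDREN];
} Node;

/* "Visiting" here just appends the label to a NUL-terminated buffer. */
static void visit(const Node *n, char *out) {
    size_t len = strlen(out);
    out[len] = n->label;
    out[len + 1] = '\0';
}

/* Pre-order: visit N, then traverse N's sub-trees left to right. */
void traverse_preorder(const Node *n, char *out) {
    assert(n != NULL && out != NULL);
    visit(n, out);
    for (int i = 0; i < n->n_children; i++)
        traverse_preorder(n->child[i], out);
}

/* Post-order: traverse N's sub-trees left to right, then visit N. */
void traverse_postorder(const Node *n, char *out) {
    assert(n != NULL && out != NULL);
    for (int i = 0; i < n->n_children; i++)
        traverse_postorder(n->child[i], out);
    visit(n, out);
}
```

The caller supplies a buffer large enough for one character per node plus the terminator. A top-down parser creates its nodes in the order traverse_preorder visits them; a bottom-up parser creates them in the order traverse_postorder visits them.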
8
Top Down Parser

construct the top node
from the top node, construct the children in the order of the chosen alternative
determine the correct alternative
proceed downward until a leftmost terminal is reached
that terminal then matches the first token

9
Bottom Up Parser

constructs nodes in post-order
constructs a node only when all children have
been constructed
1st node constructed is the top of the first
complete sub-tree it meets going left to right
through the input

10
Error Detection and Error Recovery

An error is detected when the construction of the syntax tree fails
Since the tree is built by parsing methods that read the tokens from left to right, failure occurs at a SPECIFIC TOKEN; this raises two questions:
What error message to give to the user?
Whether and how to proceed after the error?
ex. x = a(p+q( - b(r-s);
The position of detection may not reflect the position of the error
We have to do error recovery to give users some idea of how many errors there are; two strategies:
Error correction: patch and continue
Non-correcting error recovery: discard and continue with a suffix grammar; this yields reliable error detection but is difficult to implement and rarely found in parser generators

[Figure label: point of detection]
11
Manual Creation of a Top Down Parser

Given non-terminal N and token t at position p in the input: (N, t, p)
The Top Down parser must decide:
Which alternative of N must be applied to obtain a sub-tree headed by N that is the correct sub-tree at position p?
How do you tell that a tree is incorrect?
IT HAS A DIFFERENT TOKEN THAN t AS ITS LEFTMOST LEAF AT POSITION p!
So a correct tree (or a reasonable approximation thereto) starts with t or is empty
The obvious implementation is a recursive Boolean function that tests possibilities until it finds a possible tree, called a RECURSIVE DESCENT PARSER

[Figure: token stream t1 t2 t3 t4 t5 t6 t7 t8 t9]
12
Recursive descent parsing

The next figure shows a Recursive Descent Parser
It is a RECOGNIZER: it lacks code for parse tree construction
The grammar is a simple arithmetic expression grammar, which is right-associative in the '+' operator (see example token stream)
IDENTIFIER + (IDENTIFIER + IDENTIFIER) EOF
The parser text shows a very direct relationship between parser and grammar (note the utility of the && and || lazy Boolean operators), one of the attractions of Recursive Descent Parsing
Each rule N corresponds to an integer routine that returns 1 if a terminal production of N was found at the present position in the input stream; otherwise it returns 0, i.e. no terminal production of N was found

13
The Driver
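#include <stdio.h>     /* for printf() */
#include <stdlib.h>    /* for exit() */
#include "lex.h"       /* for start_lex(), get_next_token(), Token */
/* DRIVER */
int main(void) {
    start_lex(); get_next_token();
    require(input());  /* call the routine for the start symbol */
    return 0;
}
void error(void) {
    printf("Error in expression\n"); exit(1);
}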
14
Recursive Descent Parser for grammar
#define EoF 256
#define IDENTIFIER 257
#include "tokennumbers.h"
/* PARSER */
int input(void) {
    return expression() && require(token(EoF));
}
int expression(void) {
    return term() && require(rest_expression());
}
int term(void) {
    return token(IDENTIFIER) || parenthesized_expression();
}
int parenthesized_expression(void) {
    return token('(') && require(expression()) && require(token(')'));
}
int rest_expression(void) {
    return token('+') && require(expression()) || 1;
}
int token(int tk) {
    if (tk != Token.class) return 0;
    get_next_token(); return 1;
}
int require(int found) {
    if (!found) error();
    return 1;
}

input → expression EOF
expression → term rest_expression
term → IDENTIFIER | parenthesized_expression
parenthesized_expression → '(' expression ')'
rest_expression → '+' expression | ε

grammar of 2.53 and fig 2.4
15
Three Drawbacks

1) Repeated backtracking over one token, due to repeated calls to token(int tk) causing repeated testing of Token.class
2) Operationally, the method often fails to produce a correct parser:
partial consumption of expressions causes the parser to be stranded (see examples pg. 119)
recursive descent parsers cannot handle left-recursive grammars, a serious disadvantage
3) Error handling leaves a lot to be desired (laconic error handling)

16
Automatic Creation of a Top down Parser

Grammars that allow automatic construction of a top down parser are called LL(1) grammars
LL(1) parsing uses a push down automaton (section 2.2.4.4)
Precomputation is based on the observation that when a routine for N is called with the same token t, the same sequence of operations is performed with the same result, so we can precompute for each N what is required for each token t
We don't need to call other routines to find the answer: ORTHOGONALITY
We avoid the search overhead, since only a single routine is called
Serendipitously, it also provides a solution to the problems on pages 119 and 120

17
LL(1) parsing

In the recursive descent parser, the final decision on success or failure was made by comparing the input token to the first token produced by the alternatives
So we can create FIRST sets: the sets of first tokens produced by all alternatives in the grammar, for both non-terminals N and terminals
FIRST(α), i.e. the FIRST set of the alternative α, contains all terminals α can start with; ε, the empty string, is included in FIRST(α) if α can produce ε
Trivial if α starts with a terminal, ex. parenthesized_expression → '(' expression ')'
Tougher if α starts with a non-terminal N
Then we have to find FIRST(N), the union of the FIRST sets of its alternatives, which can be computed with a closure algorithm

18
Closure Algorithm for FIRST sets in G

Data Definitions
Token sets called FIRST sets for all terminals, non-terminals and alternatives of non-terminals in G
A token set called FIRST for each alternative tail in G; a sequence α of zero or more grammar symbols is an alternative tail if Aα is an alternative or alternative tail in G
Initializations
For all terminals T, set FIRST(T) to {T}
For all non-terminals N, set FIRST(N) to the empty set
For all non-terminal alternatives and alternative tails α, set FIRST(α) to the empty set
Set the FIRST set of all empty alternatives and alternative tails to {ε}
Inference rules
For each rule N → α in G, FIRST(N) must contain all tokens in FIRST(α), including ε if FIRST(α) contains it
For each alternative or alternative tail α of the form Aβ, FIRST(α) must contain all tokens in FIRST(A), excluding ε, should FIRST(A) contain it
For each alternative or alternative tail α of the form Aβ where FIRST(A) contains ε, FIRST(α) must contain all tokens in FIRST(β), including ε if FIRST(β) contains it

figure 2.58
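The closure algorithm can be sketched in C for the example expression grammar. This is a simplified variant under stated assumptions, not the figure's exact formulation: token sets are bit sets, only the FIRST sets of grammar symbols are stored, and the FIRST set of an alternative tail is recomputed on demand by first_of_tail(). All identifiers are hypothetical.

```c
#include <assert.h>

/* Symbols of the example grammar; terminals come first. */
enum { T_EOF, T_IDENTIFIER, T_LPAREN, T_RPAREN, T_PLUS, N_TERMINALS,
       N_INPUT = N_TERMINALS, N_EXPRESSION, N_TERM, N_PAREN_EXPR, N_REST_EXPR,
       N_SYMBOLS };

#define BIT(s)  (1u << (s))
#define EPSILON BIT(N_SYMBOLS)   /* extra bit: the set contains the empty string */

#define MAX_RHS 3
typedef struct { int lhs; int len; int rhs[MAX_RHS]; } Rule;

static const Rule grammar[] = {
    { N_INPUT,      2, { N_EXPRESSION, T_EOF } },
    { N_EXPRESSION, 2, { N_TERM, N_REST_EXPR } },
    { N_TERM,       1, { T_IDENTIFIER } },
    { N_TERM,       1, { N_PAREN_EXPR } },
    { N_PAREN_EXPR, 3, { T_LPAREN, N_EXPRESSION, T_RPAREN } },
    { N_REST_EXPR,  2, { T_PLUS, N_EXPRESSION } },
    { N_REST_EXPR,  0, { 0 } },                  /* empty alternative */
};
#define N_RULES ((int)(sizeof grammar / sizeof grammar[0]))

unsigned first[N_SYMBOLS];   /* FIRST set of each grammar symbol */

/* FIRST of the tail rhs[from..len-1], using the current FIRST sets. */
unsigned first_of_tail(const Rule *r, int from) {
    assert(from >= 0 && from <= r->len);
    unsigned set = 0;
    for (int i = from; i < r->len; i++) {
        unsigned f = first[r->rhs[i]];
        set |= f & ~EPSILON;               /* tokens of FIRST(A), excluding ε */
        if (!(f & EPSILON)) return set;    /* A cannot vanish: stop here */
    }
    return set | EPSILON;                  /* the whole tail can produce ε */
}

/* Apply the inference rules until no FIRST set changes any more. */
void compute_first_sets(void) {
    for (int s = 0; s < N_SYMBOLS; s++)
        first[s] = (s < N_TERMINALS) ? BIT(s) : 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < N_RULES; i++) {
            unsigned before = first[grammar[i].lhs];
            first[grammar[i].lhs] |= first_of_tail(&grammar[i], 0);
            if (first[grammar[i].lhs] != before) changed = 1;
        }
    }
}
```

After compute_first_sets(), first[N_REST_EXPR] holds '+' and ε, and first_of_tail(&grammar[4], 1) gives the FIRST set of the tail expression ')', namely IDENTIFIER and '('.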
19
Initial FIRST sets of our example grammar
Rule / alternative (tail)        FIRST set
input                            { }
  expression EOF                 { }
    EOF                          { EOF }
expression                       { }
  term rest_expression           { }
    rest_expression              { }
term                             { }
  IDENTIFIER                     { IDENTIFIER }
  parenthesized_expression       { }
parenthesized_expression         { }
  '(' expression ')'             { '(' }
    expression ')'               { }
    ')'                          { ')' }
rest_expression                  { }
  '+' expression                 { '+' }
    expression                   { }
  ε                              { ε }
20
Final FIRST sets
Rule / alternative (tail)        FIRST set
input                            { IDENTIFIER '(' }
  expression EOF                 { IDENTIFIER '(' }
    EOF                          { EOF }
expression                       { IDENTIFIER '(' }
  term rest_expression           { IDENTIFIER '(' }
    rest_expression              { '+' ε }
term                             { IDENTIFIER '(' }
  IDENTIFIER                     { IDENTIFIER }
  parenthesized_expression       { '(' }
parenthesized_expression         { '(' }
  '(' expression ')'             { '(' }
    expression ')'               { IDENTIFIER '(' }
    ')'                          { ')' }
rest_expression                  { '+' ε }
  '+' expression                 { '+' }
    expression                   { IDENTIFIER '(' }
  ε                              { ε }
21
Predictive Recursive Descent Parser

FIRST sets are used to construct a predictive parser (probably ought to be called a grammar-directed parser, since it doesn't really predict anything)
The code for each alternative is preceded by a CASE label based on its FIRST set
Testing is done on tokens only (using switch statements in C)
The routine for a grammar rule is only called when it is certain (if there is no syntactic error) to produce a terminal production

22
Predictive parser 1 (first half)
void input(void) {
    switch (Token.class) {
    case IDENTIFIER: case '(':
        expression(); token(EoF); break;
    default: error();
    }
}
void expression(void) {
    switch (Token.class) {
    case IDENTIFIER: case '(':
        term(); rest_expression(); break;
    default: error();
    }
}
void term(void) {
    switch (Token.class) {
    case IDENTIFIER: token(IDENTIFIER); break;
    case '(': parenthesized_expression(); break;
    default: error();
    }
}
first part of 2.61
23
Predictive Parser 2
void parenthesized_expression(void) {
    switch (Token.class) {
    case '(': token('('); expression(); token(')'); break;
    default: error();
    }
}
void rest_expression(void) {
    switch (Token.class) {
    case '+': token('+'); expression(); break;
    case EoF: case ')': break;   /* empty alternative: tokens that can follow */
    default: error();
    }
}
void token(int tk) {
    if (tk != Token.class) error();
    get_next_token();
}
second part of 2.61
24
LL(1) parsing with nullable alternatives

Complication: how to handle the case label for the empty alternative, since it does not start with any token
Solution: when N produces an empty string we don't see the string, but we do see a token that can follow N
Create the FOLLOW set: the set of tokens that can immediately follow a given non-terminal N (see closure algorithm fig. 2.62)
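A FOLLOW-set computation for the example grammar can be sketched as below. It uses the common textbook formulation and may differ in detail from fig. 2.62. The non-terminal FIRST sets are copied from the "Final FIRST sets" slide; the bit-set encoding and all identifiers are assumptions made for this example.

```c
#include <assert.h>

/* Symbols of the example grammar; terminals come first. */
enum { T_EOF, T_IDENTIFIER, T_LPAREN, T_RPAREN, T_PLUS, N_TERMINALS,
       N_INPUT = N_TERMINALS, N_EXPRESSION, N_TERM, N_PAREN_EXPR, N_REST_EXPR,
       N_SYMBOLS };

#define BIT(s)  (1u << (s))
#define EPSILON BIT(N_SYMBOLS)   /* extra bit: the set contains the empty string */

#define MAX_RHS 3
typedef struct { int lhs; int len; int rhs[MAX_RHS]; } Rule;

static const Rule grammar[] = {
    { N_INPUT,      2, { N_EXPRESSION, T_EOF } },
    { N_EXPRESSION, 2, { N_TERM, N_REST_EXPR } },
    { N_TERM,       1, { T_IDENTIFIER } },
    { N_TERM,       1, { N_PAREN_EXPR } },
    { N_PAREN_EXPR, 3, { T_LPAREN, N_EXPRESSION, T_RPAREN } },
    { N_REST_EXPR,  2, { T_PLUS, N_EXPRESSION } },
    { N_REST_EXPR,  0, { 0 } },                  /* empty alternative */
};
#define N_RULES ((int)(sizeof grammar / sizeof grammar[0]))

/* FIRST sets, copied from the "Final FIRST sets" slide. */
static const unsigned first[N_SYMBOLS] = {
    [T_EOF]        = BIT(T_EOF),
    [T_IDENTIFIER] = BIT(T_IDENTIFIER),
    [T_LPAREN]     = BIT(T_LPAREN),
    [T_RPAREN]     = BIT(T_RPAREN),
    [T_PLUS]       = BIT(T_PLUS),
    [N_INPUT]      = BIT(T_IDENTIFIER) | BIT(T_LPAREN),
    [N_EXPRESSION] = BIT(T_IDENTIFIER) | BIT(T_LPAREN),
    [N_TERM]       = BIT(T_IDENTIFIER) | BIT(T_LPAREN),
    [N_PAREN_EXPR] = BIT(T_LPAREN),
    [N_REST_EXPR]  = BIT(T_PLUS) | EPSILON,
};

unsigned follow[N_SYMBOLS];   /* only the non-terminal entries are used */

/* FIRST of the tail rhs[from..len-1]. */
static unsigned first_of_tail(const Rule *r, int from) {
    assert(from >= 0 && from <= r->len);
    unsigned set = 0;
    for (int i = from; i < r->len; i++) {
        unsigned f = first[r->rhs[i]];
        set |= f & ~EPSILON;
        if (!(f & EPSILON)) return set;
    }
    return set | EPSILON;
}

/* For each occurrence of a non-terminal A in a rule N -> ... A tail:
   FOLLOW(A) gets FIRST(tail) minus ε, plus FOLLOW(N) if tail can vanish. */
void compute_follow_sets(void) {
    for (int s = 0; s < N_SYMBOLS; s++) follow[s] = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < N_RULES; i++) {
            const Rule *r = &grammar[i];
            for (int k = 0; k < r->len; k++) {
                int sym = r->rhs[k];
                if (sym < N_TERMINALS) continue;   /* skip terminals */
                unsigned tail = first_of_tail(r, k + 1);
                unsigned add = tail & ~EPSILON;
                if (tail & EPSILON) add |= follow[r->lhs];
                if ((follow[sym] | add) != follow[sym]) {
                    follow[sym] |= add;
                    changed = 1;
                }
            }
        }
    }
}
```

After compute_follow_sets(), follow[N_REST_EXPR] contains EOF and ')', which are exactly the case labels guarding the empty alternative in the predictive parser's rest_expression() routine.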

25
LL(1) parser/grammar

An LL(1) parser is called LL(1) because the parser works from Left to Right, identifying nodes in Leftmost derivation order, and '(1)' because all choices are based on a one-token look-ahead. A grammar for which this parsing works is called an LL(1) grammar.
What we've seen is a strong LL(1) grammar; there are lots of things to worry about (see the list on page 124)

26
Things to Worry About

repetition operators in the grammar
detecting and reporting parsing conflicts (to be
covered next time)
including code for the generation of the syntax
tree
including code and tables for syntax error
recovery
optimizations

27
Homework for Week 3

Objective (two weeks): get a version of lex running and run it on the LexByLex folder material, importing other files as necessary, and provide a "blow-by-blow" description of your efforts and the result (Failure Is Not An Option)
http://csmweb2.emcweb.com/durable/2000/08/10/p19s2.htm
Problem to turn in next Monday:
2.8 (185-186)
Some Flex/Lex
http://www.ug.bcc.bilkent.edu.tr/resat/Articles/article_1.htm
http://www.monmouth.com/wstreett/lex-yacc/lex-yacc.html
