Agenda - PowerPoint PPT Presentation

1
Agenda
  • Scanner vs. parser
  • Regular grammar vs. context-free grammar
  • Grammars (context-free grammars)
  • grammar rules
  • derivations
  • parse trees
  • ambiguous grammars
  • useful examples
  • Reading
  • Chapter 2, Sections 4.1 and 4.2

2
Characteristics of a Parser
  • Input sequence of tokens from scanner
  • Output parse tree of the program
  • parse tree is generated (implicitly or
    explicitly) if the input is a legal program
  • if input is an illegal program, syntax errors are
    issued
  • Note
  • Instead of a parse tree, some parsers directly
    produce
  • an abstract syntax tree (AST) + symbol table, or
  • intermediate code, or
  • object code
  • In the following lectures, we'll assume that a
    parse tree is generated.

3
Comparison with Lexical Analysis
Phase            | Input                | Output
Lexical Analysis | String of characters | String of tokens
Syntax Analysis  | String of tokens     | Parse tree
4
Example
  • The program
  • x * y + z
  • Input to parser
  • ID TIMES ID PLUS ID
  • we'll write tokens as follows
  • id * id + id
  • Output of parser
  • the parse tree:

              E
            / | \
           E  +  E
         / | \   |
        E  *  E  id
        |     |
        id    id
5
Why are Regular Grammars Not Enough?
  • Write an automaton that accepts the strings
  • a, (a), ((a)), and (((a)))
  • now: a, (a), ((a)), (((a))), …, with k nested
    parentheses for any k — no finite automaton can
    count unbounded nesting
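The gap between regular and context-free languages shows up directly in code: recognizing arbitrarily deep nesting needs unbounded memory (here, the recursion stack), which no finite automaton has. A minimal sketch for the rule S → ( S ) | a, not part of the lecture:

```python
def accepts(s: str) -> bool:
    """Recognize { '('*k + 'a' + ')'*k : k >= 0 }, i.e. S -> ( S ) | a."""
    def parse(i: int) -> int:
        # Return the index just past the S derived at position i, or -1.
        if i < len(s) and s[i] == 'a':
            return i + 1                       # S -> a
        if i < len(s) and s[i] == '(':
            j = parse(i + 1)                   # S -> ( S )
            if j != -1 and j < len(s) and s[j] == ')':
                return j + 1
        return -1
    return parse(0) == len(s)
```

The recursion depth tracks the nesting depth, so no fixed number of states suffices.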

6
What must the parser do?
  • Recognizer: not all strings of tokens are
    programs
  • must distinguish between valid and invalid
    strings of tokens
  • Translator: must expose program structure
  • e.g., associativity and precedence
  • hence must return the parse tree
  • We need
  • A language for describing valid strings of tokens
  • context-free grammars
  • (analogous to regular grammars in the scanner)
  • A method for distinguishing valid from invalid
    strings of tokens (and for building the parse
    tree)
  • the parser
  • (analogous to the state machine in the scanner)

7
Context-free grammars (CFGs)
  • Example: Simple Arithmetic Expressions Grammar
  • In English:
  • An integer is an arithmetic expression.
  • If exp1 and exp2 are arithmetic expressions,
    then so are the following:
  • exp1 - exp2
  • exp1 / exp2
  • ( exp1 )
  • The corresponding CFG (we'll write tokens as
    shown on the right):
  • exp → INTLITERAL            E → intlit
  • exp → exp MINUS exp         E → E - E
  • exp → exp DIVIDE exp        E → E / E
  • exp → LPAREN exp RPAREN     E → ( E )

8
Reading the CFG
  • The grammar has five terminal symbols
  • intlit, -, /, (, )
  • terminals of a grammar tokens returned by the
    scanner.
  • The grammar has one non-terminal symbol
  • E
  • non-terminals describe valid sequences of tokens
  • The grammar has four productions or rules,
  • each of the form E → α
  • left-hand side: a single non-terminal.
  • right-hand side: either
  • a sequence of one or more terminals and/or
    non-terminals, or
  • ε (an empty production)

9
Example, revisited
  • Note
  • a more compact way to write previous grammar
  • E → INTLITERAL | E - E | E / E | ( E )
  • or
  • E → INTLITERAL
  •   | E - E
  •   | E / E
  •   | ( E )

10
A formal definition of CFGs
  • A CFG consists of
  • A set of terminals T
  • A set of non-terminals N
  • A start symbol S (a non-terminal)
  • A set of productions
  • X → Y1 Y2 … Yn
  • where X ∈ N and Yi ∈ T ∪ N ∪ {ε}

11
Notational Conventions
  • In these lecture notes
  • Non-terminals are written upper-case
  • Terminals are written lower-case
  • The start symbol is the left-hand side of the
    first production

12
The Language of a CFG
  • The language defined by a CFG is the set of
    strings that can be derived from the start symbol
    of the grammar.
  • Derivation: read productions as rewrite rules
  • X → Y1 … Yn
  • means X can be replaced by Y1 … Yn

13
Derivation key idea
  • 1. Begin with a string consisting of the start
    symbol S
  • 2. Replace any non-terminal X in the string by
    the right-hand side of some production X → Y1 … Yn
  • 3. Repeat (2) until there are no non-terminals in
    the string



14
Derivation: an example
  • CFG
  • E → id
  • E → E + E
  • E → E * E
  • E → ( E )
  • Is the string id * id + id in the
  • language defined by the grammar?
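The question can be answered mechanically by exhibiting a derivation. The sketch below (illustration only, using this slide's grammar) applies one leftmost derivation of id * id + id and checks that every step uses a real production:

```python
# Grammar: E -> id | E + E | E * E | ( E )
PRODS = [["id"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]]

def expand_leftmost(form, rhs):
    """Replace the leftmost non-terminal E in `form` by `rhs`."""
    i = form.index("E")
    return form[:i] + rhs + form[i + 1:]

# E => E + E => E * E + E => id * E + E => id * id + E => id * id + id
form = ["E"]
for rhs in [["E", "+", "E"], ["E", "*", "E"], ["id"], ["id"], ["id"]]:
    assert rhs in PRODS          # each step uses a real production
    form = expand_leftmost(form, rhs)

print(" ".join(form))  # id * id + id
```

Since the final string contains only terminals, id * id + id is in the language.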

15
Terminals
  • Terminals are called so because there are no
    rules for replacing them
  • Once generated, terminals are permanent
  • Therefore, terminals are the tokens of the
    language

16
The Language of a CFG (Cont.)
  • More formally, write
  • X1 … Xi-1 Xi Xi+1 … Xn → X1 … Xi-1 Y1 … Ym Xi+1 … Xn
  • if there is a production
  • Xi → Y1 Y2 … Ym

17
The Language of a CFG (Cont.)
  • Write
  • X1 … Xn →* Y1 … Ym
  • if
  • X1 … Xn → … → Y1 … Ym
  • in 0 or more steps

18
The Language of a CFG
  • Let G be a context-free grammar with start
    symbol S. Then the language of G is
  • L(G) = { a1 a2 … an | S →* a1 a2 … an }
  • where each ai, i = 1, 2, …, n, is a terminal symbol

19
Examples
  • Strings of balanced parentheses
  • The grammar

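The grammar itself did not survive in this transcript; one standard grammar for balanced parentheses is S → ( S ) S | ε. A direct recursive recognizer for that (assumed) grammar:

```python
def balanced(s: str) -> bool:
    """Recognize balanced parentheses via S -> ( S ) S | eps."""
    def parse_S(i: int) -> int:
        # Consume one S starting at index i; return the index just
        # past it, or -1 on failure.
        while i < len(s) and s[i] == '(':        # use S -> ( S ) S
            j = parse_S(i + 1)                   # the inner S
            if j < 0 or j >= len(s) or s[j] != ')':
                return -1                        # unmatched '('
            i = j + 1                            # the trailing S: loop
        return i                                 # S -> eps
    return parse_S(0) == len(s)
```

Each grammar rule maps onto one control-flow construct, previewing the recursive-descent idea later in the deck.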
20
Arithmetic Expression Example
  • Simple arithmetic expressions
  • Some elements of the language

21
Notes
  • The idea of a CFG is a big step. But
  • Membership in a language is yes or no
  • we also need parse tree of the input!
  • furthermore, we must handle errors gracefully
  • Need an implementation of CFGs,
  • i.e. the parser
  • we'll create the parser using a parser generator
  • available generators CUP, bison, yacc

22
More Notes
  • Form of the grammar is important
  • Many grammars generate the same language
  • Parsers are sensitive to the form of the grammar
  • Example
  • E → E + E
  •   | E * E
  •   | intlit
  • is not suitable for an LL(1) parser (a common
    kind of parser).

23
Derivations and Parse Trees
  • A derivation is a sequence of productions
  • S → … → … → …
  • A derivation can be drawn as a tree
  • Start symbol is the tree's root
  • For a production X → Y1 Y2 … Yn, add children
    Y1 Y2 … Yn to node X

24
Derivation Example
  • Grammar: E → id | E + E | E * E | ( E )
  • String: id * id + id

25
Derivation Example (Cont.)
              E
            / | \
           E  +  E
         / | \   |
        E  *  E  id
        |     |
        id    id
26
Notes on Derivations
  • A parse tree has
  • Terminals at the leaves
  • Non-terminals at the interior nodes
  • An in-order traversal of the leaves is the
    original input
  • The parse tree shows the association of
    operations, the input string does not

27
Left-most and Right-most Derivations
  • The example is a left-most derivation
  • At each step, replace the left-most non-terminal
  • There is an equivalent notion of a right-most
    derivation

28
Derivations and Parse Trees
  • Note that right-most and left-most derivations
    have the same parse tree
  • The difference is the order in which branches are
    added

29
Remarks on Derivation
  • We are not just interested in whether s ∈ L(G)
  • We need a parse tree for s, (because we need to
    build the AST)
  • A derivation defines a parse tree
  • But one parse tree may have many derivations
  • Left-most and right-most derivations are
    important in parser implementation

30
Ambiguity(1)
  • Grammar: E → id | E + E | E * E | ( E )
  • String: id * id + id

31
Ambiguity (2)
  • This string has two parse trees

        E                            E
      / | \                        / | \
     E  +  E                      E  *  E
   / | \    \                    /   / | \
  E  *  E   id                 id   E  +  E
  |     |                           |     |
  id    id                          id    id
32
Ambiguity(3)
  • for each of the two parse trees, find the
    corresponding left-most derivation
  • for each of the two parse trees, find the
    corresponding right-most derivation

33
Ambiguity (4)
  • A grammar is ambiguous if, for some string of the
    language
  • it has more than one parse tree, or
  • there is more than one right-most derivation, or
  • there is more than one left-most derivation.
  • (the three conditions are equivalent)
  • Ambiguity leaves the meaning of some programs
    ill-defined
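To see concretely why an ill-defined meaning matters, evaluate the two parse trees with numbers substituted for id, say 2 * 3 + 4. Nested tuples stand in for the trees; an illustration, not lecture code:

```python
def evaluate(t):
    """Evaluate a tree written as (op, left, right) or a plain int."""
    if isinstance(t, int):
        return t
    op, l, r = t
    return evaluate(l) * evaluate(r) if op == '*' else evaluate(l) + evaluate(r)

tree1 = ('+', ('*', 2, 3), 4)   # ( id * id ) + id
tree2 = ('*', 2, ('+', 3, 4))   # id * ( id + id )

print(evaluate(tree1), evaluate(tree2))  # 10 14
```

The same token string yields two different values, so the grammar must be disambiguated before it can define a semantics.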

34
Dealing with Ambiguity
  • There are several ways to handle ambiguity
  • Most direct method is to rewrite the grammar
    unambiguously
  • Enforces precedence of / over -

35
Removing Ambiguity
  • Rewriting
  • Expression Grammars
  • precedence
  • associativity
  • IF-THEN-ELSE
  • the Dangling-ELSE problem

36
Handling operator precedence
  • Rewrite the grammar
  • use a different nonterminal for each precedence
    level
  • start with the lowest precedence (MINUS)
  • E → E - E | E / E | ( E ) | id
  • rewrite to
  • E → E - T | T
  • T → T / F | F
  • F → id | ( E )

37
Example
  • parse tree for id - id / id
  • E → E - T | T
  • T → T / F | F
  • F → id | ( E )

              E
            / | \
           E  -  T
           |   / | \
           T  T  /  F
           |  |     |
           F  F     id
           |  |
           id id
38
Handling Operator Associativity
  • The grammar captures operator precedence, but it
    is still ambiguous!
  • fails to express that both subtraction and
    division are left associative
  • e.g., 5-3-2 is equivalent to ((5-3)-2) and not
    to (5-(3-2)).

39
Recursion
  • A grammar is recursive in nonterminal X if
  • X →+ α X β
  • →+ means in one or more steps, X derives a
    sequence of symbols that includes an X
  • A grammar is left recursive in X if
  • X →+ X β
  • in one or more steps, X derives a sequence of
    symbols that starts with an X
  • A grammar is right recursive in X if
  • X →+ α X
  • in one or more steps, X derives a sequence of
    symbols that ends with an X

40
Resolving ambiguity due to associativity
  • The grammar given above is both left and right
    recursive in nonterminals E and T
  • To correctly express operator associativity
  • For left associativity, use left recursion.
  • For right associativity, use right recursion.
  • Here's the correct grammar
  • E → E - T | T
  • T → T / F | F
  • F → id | ( E )
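Read as an evaluator, the unambiguous grammar shows how the rewrite fixes both precedence and associativity at once: the left-recursive rules E → E - T and T → T / F become loops, and looping is exactly what makes - and / group to the left. A sketch only (tokens pre-split into a list, integers standing in for id), not the lecture's parser:

```python
def evaluate(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def parse_E():                      # E -> T { - T }
        nonlocal pos
        val = parse_T()
        while peek() == '-':
            pos += 1
            val -= parse_T()            # left-associative by the loop
        return val
    def parse_T():                      # T -> F { / F }
        nonlocal pos
        val = parse_F()
        while peek() == '/':
            pos += 1
            val //= parse_F()
        return val
    def parse_F():                      # F -> id | ( E )
        nonlocal pos
        if peek() == '(':
            pos += 1
            val = parse_E()
            pos += 1                    # skip ')', input assumed well-formed
            return val
        val = tokens[pos]
        pos += 1
        return val
    return parse_E()

print(evaluate([8, '-', 3, '-', 2]))    # 3, i.e. (8 - 3) - 2
print(evaluate([12, '-', 8, '/', 2]))   # 8: '/' binds tighter than '-'
```

Note that E calls T and T calls F, never the reverse except through parentheses, which is how the nonterminal-per-precedence-level trick plays out in code.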

41
The Dangling Else ambiguity
  • Consider the grammar
  • St → if E then St
  •    | if E then St else St
  •    | other
  • This grammar is also ambiguous

42
Resolving the dangling else
  • else matches the closest unmatched then
  • We can describe this in the grammar
  • E → MIF      /* all then are matched */
  •   | UIF     /* some then are unmatched */
  • MIF → if E then MIF else MIF
  •     | print
  • UIF → if E then E
  •     | if E then MIF else UIF
  • Describes the same set of strings
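A recursive-descent parser gets the same resolution almost for free: after parsing the then branch, greedily consume an else if one is present, so each else pairs with the closest unmatched then. A minimal sketch (E and other treated as single tokens; all names hypothetical):

```python
def parse(tokens):
    pos = 0
    def parse_St():
        nonlocal pos
        if tokens[pos] == 'other':
            pos += 1
            return 'other'
        # tokens[pos:pos+3] is ['if', 'E', 'then']; skip all three
        pos += 3
        then_branch = parse_St()
        if pos < len(tokens) and tokens[pos] == 'else':   # greedy match
            pos += 1
            return ('if', then_branch, parse_St())
        return ('if', then_branch, None)
    return parse_St()

tree = parse(['if', 'E', 'then', 'if', 'E', 'then', 'other',
              'else', 'other'])
print(tree)  # ('if', ('if', 'other', 'other'), None)
```

The output shows the else attached to the inner if, exactly the tree the MIF/UIF grammar allows.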

43
Precedence and Associativity Declarations in
Parser Generators
  • Instead of rewriting the grammar
  • Use the more natural (ambiguous) grammar
  • Along with disambiguating declarations
  • Most parser generators allow precedence and
    associativity declarations to disambiguate
    grammars

44
Parsing Approaches
  • Top-down parsing
  • build parse tree from start symbol (root)
  • match terminal symbols (tokens) in the production
    rules with tokens in the input stream
  • simple but limited in power
  • Bottom-up parsing
  • start from input token stream
  • build parse tree from terminal symbols (tokens)
    until get start symbol
  • complex but powerful

45
Top Down vs. Bottom Up
  • Top-down parsing: start at the start symbol and
    grow the tree down to match the input token stream
  • Bottom-up parsing: start from the input token
    stream and build the tree up to the start symbol
46
Top-down Parsing
  • A top-down parsing algorithm parses an input
    string of tokens by tracing out the steps in a
    leftmost derivation.
  • The parse tree associated with the input string
    is constructed using preorder traversal and hence
    the name top-down.

47
Top-down parsers
  • There are mainly two kinds of top-down parsers
  • 1. Predictive parsers
  • - Tries to make decisions about the
    structure of the tree below a node based on a few
    lookahead tokens (usually one!).
  • - Weakness Little program structure
    has been seen before predictive decisions must be
    made.
  • 2. Backtracking parsers
  • - Backtracking parsers solve the
    lookahead problem by backtracking if one decision
    turns out to be wrong and making a different
    choice.
  • - Weakness Backtracking parsers are
    slow (exponential time in general).

48
Recursive-descent parsing
  • Main idea
  • 1. Use the grammar rules as recipes for
    procedure code that parses the rule
  • 2. Each non-terminal corresponds to a
    procedure
  • 3. Each appearance of a terminal in the
    right hand side of a rule causes a token to be
    matched.
  • 4. Each appearance of a non-terminal
    corresponds to a call of the associated
    procedure.

49
Example Recursive-descent Parsing
  • F ? (E) num
  • Code
  • void F() {
  •   if (token == num) match(num);
  •   else {
  •     match( '(' );   // match token (
  •     E();
  •     match( ')' );   // match token )
  •   }
  • }

50
Example Recursive-descent Parsing (2)
  • Observation
  • Note how lookahead is not a problem in this
    example: if the token is num, go one way; if
    the token is '(', go the other; and if the token
    is neither, declare an error
  • void match(Token expect) {
  •   if (token == expect)
  •     getToken();   // get next token
  •   else
  •     error(token, expect);
  • }

51
Example Recursive-descent Parsing (3)
  • A recursive-descent procedure can also compute
    values or syntax trees
  • int F() {
  •   if (token == num) {
  •     int temp = atoi(lexeme);
  •     match(num); return temp;
  •   } else {
  •     match( '(' ); int temp = E();
  •     match( ')' ); return temp;
  •   }
  • }

52
When Recursive Descent Does Not Work
  • E → E - term | term
  • void E() {
  •   if (token == ??)
  •     E();   // uh, oh!!
  •     match('-');
  •     term();
  •   else term();
  • }
  • - A left-recursive grammar has a non-terminal A with
  •   A →+ A α for some α
  • - Recursive descent does not work in such cases

53
Elimination of Left Recursion
  • Consider the left-recursive grammar
  • A → A α | β   for some sentential forms α and β
  • A generates all strings starting with a β and
    followed by any number of α's
  • Can rewrite the grammar using right-recursion
  • A  → β A'
  • A' → α A' | ε
  • where A' is a new nonterminal

54
Elimination of Left Recursion (2)
  • In general
  • A → A α1 | … | A αn | β1 | … | βm
  • All strings derived from A start with one of
    β1, …, βm and continue with several instances of
    α1, …, αn
  • Rewrite as
  • A  → β1 A' | … | βm A'
  • A' → α1 A' | … | αn A' | ε
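The rewrite is mechanical enough to automate. A sketch handling immediate left recursion only (productions as symbol lists, [] standing for ε, A' the new nonterminal; function name is hypothetical):

```python
def eliminate_left_recursion(nt, prods):
    """A -> A a1 |...| A an | b1 |...| bm  becomes
       A -> b1 A' |...| bm A';  A' -> a1 A' |...| an A' | eps."""
    rec  = [p[1:] for p in prods if p and p[0] == nt]   # the alphas
    base = [p for p in prods if not p or p[0] != nt]    # the betas
    if not rec:
        return {nt: prods}                              # nothing to do
    new = nt + "'"
    return {
        nt:  [b + [new] for b in base],
        new: [a + [new] for a in rec] + [[]],           # [] is eps
    }

result = eliminate_left_recursion('E', [['E', '-', 'T'], ['T']])
print(result)
# {'E': [['T', "E'"]], "E'": [['-', 'T', "E'"], []]}
```

Applied to E → E - T | T it produces exactly the right-recursive form from the previous slide.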

55
General Left Recursion
  • The grammar
  • S → A α | δ
  • A → S β
  • is also left-recursive because
  • S →+ S β α
  • This left-recursion can also be eliminated
  • See book, Section 4.3, for the general algorithm

56
Summary of Recursive Descent with backtracking
  • Simple and general parsing strategy
  • Left-recursion must be eliminated first
  • but that can be done automatically
  • Unpopular because of backtracking
  • Thought to be too inefficient
  • In practice, backtracking is eliminated by
    restricting the grammar

57
Predictive Parsers
  • Like recursive-descent but parser can predict
    which production to use
  • - By looking at the next few tokens
  • - No backtracking
  • Predictive parsers accept LL(k) grammars
  • - L means left-to-right scan of input
  • - L means leftmost derivation
  • - k means predict based on k tokens of
    lookahead
  • In practice, LL(1) is used

58
LL(1) Languages
  • In recursive-descent, for each non-terminal and
    input token there may be a choice of production
  • LL(1) means that for each non-terminal and token
    there is only one production
  • Can be specified via 2D tables
  • - One dimension for current non-terminal to
    expand
  • - One dimension for next token
  • - A table entry contains one production

59
Predictive Parsing and Left Factoring
  • Consider the grammar
  • E → T + E | T
  • T → num | num * T | ( E )
  • Hard to predict because
  • For T, two productions start with num
  • For E, it is not clear how to predict
  • A grammar must be left-factored before use for
    predictive parsing

60
Left-Factoring Example
  • Recall the grammar
  • E → T + E | T
  • T → num | num * T | ( E )

  • Factor out common prefixes of productions
  • E → T X        X → + E | ε
  • T → ( E ) | num Y
  • Y → * T | ε
61
LL(1) Parsing Table Example
  • Left-factored grammar
  • E → T X        X → + E | ε
  • T → ( E ) | num Y        Y → * T | ε
  • The LL(1) parsing table:

         num      *      +       (        )      $
    E    T X                     T X
    X                    + E              ε      ε
    T    num Y                   ( E )
    Y             * T    ε                ε      ε

62
LL(1) Parsing Table Example (Cont.)
  • Consider the [E, num] entry
  • - When current non-terminal is E and next input
    is num, use production E → T X
  • - This production can generate a num in the first
    position
  • Consider the [Y, +] entry
  • - When current non-terminal is Y and current
    token is +, get rid of Y
  • Y can be followed by + only in a derivation in
    which Y → ε

63
LL(1) Parsing Tables. Errors
  • Blank entries indicate error situations
  • Consider the [E, *] entry
  • There is no way to derive a string starting with
    * from non-terminal E

64
Using Parsing Tables
  • Method similar to recursive descent, except
  • - For each non-terminal S
  • - We look at the next token a
  • - And choose the production shown at [S, a]
  • We use a stack to keep track of pending
    non-terminals
  • We reject when we encounter an error state
  • We accept when we encounter end-of-input

65
LL(1) Parsing Algorithm
  • S = start nonterminal, $ = end-of-input symbol
  • initialize stack = <S $> and Token = nextToken()

  • repeat
  •   case stack of
  •   <X, rest> : if T[X, Token] = Y1 … Yn
  •               then stack ← <Y1 … Yn rest>
  •               else error()
  •   <t, rest> : if t == Token
  •               then { stack ← <rest>; Token = nextToken() }
  •               else error()
  • until stack == < >   // empty

66
LL(1) Parsing Example
  • Stack            Input            Action
  • E $              num * num $      T X
  • T X $            num * num $      num Y
  • num Y X $        num * num $      terminal
  • Y X $            * num $          * T
  • * T X $          * num $          terminal
  • T X $            num $            num Y
  • num Y X $        num $            terminal
  • Y X $            $                ε
  • X $              $                ε
  • $                $                ACCEPT
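The algorithm and trace above can be run directly. A sketch of the table-driven driver using the left-factored grammar's table ([] stands for an ε entry, '$' for end-of-input); not the lecture's code:

```python
TABLE = {
    ('E', 'num'): ['T', 'X'],   ('E', '('): ['T', 'X'],
    ('X', '+'): ['+', 'E'],     ('X', ')'): [], ('X', '$'): [],
    ('T', 'num'): ['num', 'Y'], ('T', '('): ['(', 'E', ')'],
    ('Y', '*'): ['*', 'T'],     ('Y', '+'): [], ('Y', ')'): [], ('Y', '$'): [],
}
NONTERMINALS = {'E', 'X', 'T', 'Y'}

def ll1_parse(tokens):
    tokens = tokens + ['$']
    stack, i = ['$', 'E'], 0          # top of stack is the list's end
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False          # blank table entry: syntax error
            stack.extend(reversed(rhs))
        else:
            if top != tokens[i]:
                return False          # terminal mismatch
            i += 1                    # matched a terminal (or '$')
    return i == len(tokens)

print(ll1_parse(['num', '*', 'num']))   # True
print(ll1_parse(['num', '+']))          # False
```

Single-stepping this on num * num reproduces the stack trace in the slide above.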

67
Constructing Parsing Tables
  • LL(1) languages are those defined by a parsing
    table for the LL(1) algorithm
  • No table entry can be multiply defined
  • We want to generate parsing tables from CFG

68
Constructing Parsing Tables: First and Follow sets
  • If A → α, where in the row of A do we place α?
  • Answer: in the column of t where t can start a
    string derived from α
  • α →* t β
  • We say that t ∈ First(α)
  • Also in the column of t if α →* ε and t can follow
    an A
  • S →* β A t δ
  • We say t ∈ Follow(A)

69
Computing First Sets
  • Definition: First(X) = { t | X →* t α } ∪ { ε | X →* ε }
  • Algorithm sketch (see book for details)
  • 1. for all terminals t do First(t) ← { t }
  • 2. for each production X → ε do add ε to First(X)
  • 3. if X → A1 … An α and ε ∈ First(Ai), 1 ≤ i ≤ n,
    then add First(α) to First(X)
  • 4. for each X → A1 … An s.t. ε ∈ First(Ai), 1 ≤ i ≤ n,
    add ε to First(X)
  • repeat steps 3 & 4 until no First set can be grown
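Steps 1–4 above amount to a fixed-point computation. A sketch for the left-factored grammar ('eps' stands for ε; [] is the ε production), illustration only:

```python
GRAMMAR = {
    'E': [['T', 'X']],
    'X': [['+', 'E'], []],            # [] is the epsilon production
    'T': [['(', 'E', ')'], ['num', 'Y']],
    'Y': [['*', 'T'], []],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    def first_of(sym):
        # a terminal's First set is just itself (step 1)
        return first[sym] if sym in grammar else {sym}
    changed = True
    while changed:                    # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                acc = {'eps'}         # 'eps' marks a still-nullable prefix
                for sym in rhs:
                    if 'eps' not in acc:
                        break         # prefix no longer nullable: stop
                    acc.discard('eps')
                    acc |= first_of(sym)
                if acc - first[nt]:
                    first[nt] |= acc
                    changed = True
    return first

print(first_sets(GRAMMAR))
```

The result matches the First sets on the next slide: First(E) = {num, (}, First(X) = {+, ε}, and so on.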

70
First Sets. Example
  • Recall the grammar
  • E → T X        X → + E | ε
  • T → ( E ) | num Y        Y → * T | ε
  • First sets
  • First( ( ) = { ( }       First( T ) = { num, ( }
  • First( ) ) = { ) }       First( E ) = { num, ( }
  • First( num ) = { num }   First( X ) = { +, ε }
  • First( + ) = { + }       First( Y ) = { *, ε }
  • First( * ) = { * }
71
Computing Follow Sets
  • Definition
  • Follow(X) = { t | S →* β X t δ }
  • Intuition
  • If S is the start symbol then $ ∈ Follow(S)
  • If X → A B then First(B) ⊆ Follow(A) and
  •   Follow(X) ⊆ Follow(B)
  • Also if B →* ε then Follow(X) ⊆ Follow(A)

72
Computing Follow Sets (Cont.)
  • Algorithm sketch
  • 1. $ ∈ Follow(S)
  • 2. For each production A → α X β
  •    add First(β) \ {ε} to Follow(X)
  • 3. For each A → α X β where ε ∈ First(β)
  •    add Follow(A) to Follow(X)
  • repeat step(s) 2 and 3 until no Follow set grows
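Steps 1–3 likewise iterate to a fixed point. A sketch for the same grammar, taking the First sets from the earlier slide as given ('$' is end-of-input, 'eps' is ε); illustration only:

```python
GRAMMAR = {
    'E': [['T', 'X']],
    'X': [['+', 'E'], []],
    'T': [['(', 'E', ')'], ['num', 'Y']],
    'Y': [['*', 'T'], []],
}
FIRST = {'E': {'num', '('}, 'X': {'+', 'eps'},
         'T': {'num', '('}, 'Y': {'*', 'eps'}}

def first_of_string(beta):
    """First set of a symbol string, per the earlier algorithm."""
    out = set()
    for sym in beta:
        f = FIRST.get(sym, {sym})     # terminals are their own First
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')                    # whole string is nullable
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add('$')                      # step 1
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue                # only track non-terminals
                    fb = first_of_string(rhs[i + 1:])
                    add = fb - {'eps'}          # step 2
                    if 'eps' in fb:
                        add |= follow[a]        # step 3
                    if add - follow[x]:
                        follow[x] |= add
                        changed = True
    return follow

print(follow_sets(GRAMMAR, 'E'))
```

The result matches the Follow sets on the next slide, e.g. Follow(E) = {), $} and Follow(T) = {+, ), $}.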

73
Follow Sets. Example
  • Recall the grammar
  • E → T X        X → + E | ε
  • T → ( E ) | num Y        Y → * T | ε
  • Follow sets
  • Follow( + ) = { num, ( }     Follow( * ) = { num, ( }
  • Follow( ( ) = { num, ( }    Follow( E ) = { ), $ }
  • Follow( X ) = { $, ) }      Follow( T ) = { +, ), $ }
  • Follow( ) ) = { +, ), $ }   Follow( Y ) = { +, ), $ }
  • Follow( num ) = { *, +, ), $ }

74
Constructing LL(1) Parsing Tables
  • Construct a parsing table T for CFG G
  • For each production A → α in G do
  • For each terminal t ∈ First(α) do
  •   T[A, t] = α
  • If ε ∈ First(α), for each t ∈ Follow(A) do
  •   T[A, t] = α
  • If ε ∈ First(α) and $ ∈ Follow(A) do
  •   T[A, $] = α
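Applying the three rules above to the left-factored grammar reproduces the earlier table, and the assert catches any multiply defined entry (i.e., a non-LL(1) grammar). A sketch with the First and Follow sets hardcoded from the slides:

```python
FIRST  = {'E': {'num', '('}, 'X': {'+', 'eps'},
          'T': {'num', '('}, 'Y': {'*', 'eps'}}
FOLLOW = {'E': {')', '$'}, 'X': {')', '$'},
          'T': {'+', ')', '$'}, 'Y': {'+', ')', '$'}}
GRAMMAR = {
    'E': [['T', 'X']],
    'X': [['+', 'E'], []],            # [] is the epsilon production
    'T': [['(', 'E', ')'], ['num', 'Y']],
    'Y': [['*', 'T'], []],
}

def first_of_string(alpha):
    out = set()
    for sym in alpha:
        f = FIRST.get(sym, {sym})     # terminals are their own First
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')
    return out

def build_table(grammar):
    table = {}
    for a, prods in grammar.items():
        for alpha in prods:
            fa = first_of_string(alpha)
            targets = fa - {'eps'}            # rule 1: t in First(alpha)
            if 'eps' in fa:
                targets |= FOLLOW[a]          # rules 2 and 3 ('$' included)
            for t in targets:
                assert (a, t) not in table    # multiply defined => not LL(1)
                table[(a, t)] = alpha
    return table

table = build_table(GRAMMAR)
print(table[('E', 'num')])   # ['T', 'X']
print(table[('Y', '+')])     # []
```

Blank entries are simply the (A, t) pairs absent from the dictionary, matching the error slides earlier.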

75
Notes on LL(1) Parsing Tables
  • If any entry is multiply defined then G is not
    LL(1)
  • If G is ambiguous
  • If G is left recursive
  • If G is not left-factored
  • Most programming language grammars are not LL(1)
  • There are tools that build LL(1) tables

76
Review
  • For some grammars there is a simple parsing
    strategy
  • Predictive parsing
  • Next time Bottom-up parsing