Parsing II Topdown recursive descent parsers - PowerPoint PPT Presentation

Slides: 43
Provided by: cbo4
1
Parsing II Top-down recursive descent parsers
  • Top-down parsing
  • Recursive descent

Token Stream → Parser
2
Top-down Parsing
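(2 + 3) * 7  →  ( num + num ) * num

Grammar:
e → e + t | t
t → t * f | f
f → ( e ) | id | num

Derivation:
e ⇒ t
  ⇒ t * f
  ⇒ f * f
  ⇒ ( e ) * f
  ⇒ ( e + t ) * f
  ⇒ ( t + t ) * f
  ⇒ ( f + t ) * f
  ⇒ ( num + t ) * f
  ⇒ ( num + f ) * f
  ⇒ ( num + num ) * f
  ⇒ ( num + num ) * num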
3
Top-down Parsing Grammars
A problem with top-down parsing is that we need
to know what rule to apply at each step. We have
to revise the grammar to make that possible.
e → e + t | t          (e is the start nonterminal)
t → t * f | f
f → ( e ) | id | num
Which production should we use?
(2 + 3) * 7  →  ( num + num ) * num
e ⇒ t ⇒ t * f
4
A revised grammar
Original:
e → e + t | t
t → t * f | f
f → ( e ) | id | num

Revised:
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | id | num
idid ( num num ) num ididid
5
A simple top-down recursive-descent parser
This is a program where we write one function for
each nonterminal. Simplified grammar we'll use:
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
6
Trivial test program
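#include <iostream>
#include <string>
using namespace std;

int main(int argc, char **argv) {
    string line;
    while (true) {
        getline(cin, line);
        if (line.length() == 0)   // an empty line ends the session
            break;
        Parse(line);
    }
    return 0;
}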
7
Parse()
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
void Parse(string line) {
    cout << "...
}
8
A nonterminal function
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
void e(string &line) {
    cout << "e -> t e'" << endl;   // report the production being applied
    t(line);
    eprime(line);
}
9
Another
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
void eprime(string &line) {
    if (line.length() > 0 && line[0] == '+') {
        cout << "e' -> + t e'" << endl;
        // Remove the terminal
        line = line.substr(1);
        t(line);
        eprime(line);
    } else {
        cout << "e' -> epsilon" << endl;
    }
    return;
}
10
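// Forward declarations: the functions call each other recursively.
void t(string &line); void eprime(string &line);
void tprime(string &line); void f(string &line);

void e(string &line) {
    t(line);
    eprime(line);
}
void eprime(string &line) {
    if (line.length() > 0 && line[0] == '+') {
        // Remove the terminal
        line = line.substr(1);
        t(line);
        eprime(line);
    } else
        return;
}
void t(string &line) {
    f(line);
    tprime(line);
}
void tprime(string &line) {
    if (line.length() > 0 && line[0] == '*') {
        line = line.substr(1);
        f(line);
        tprime(line);
    } else
        return;
}
void f(string &line) {
    if (line.length() > 0 && line[0] == '(') {
        line = line.substr(1);
        e(line);
        if (line.length() == 0 || line[0] != ')')
            cout << "error!" << endl;      // missing closing parenthesis
        else
            line = line.substr(1);
    } else if (line.length() > 0 && line[0] == 'i') {
        line = line.substr(1);
    } else
        cout << "error!" << endl;          // neither ( nor i
}

e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i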
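The slide's functions can be collected into one self-contained recognizer. The following is only a sketch under a few assumptions that are not on the slides: it tracks an index into the input instead of repeatedly calling substr, each function returns a bool instead of printing an error, and the names Recognizer, accepts and inLanguage are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Recognizer for the simplified grammar:
//   e  -> t e'      e' -> + t e' | epsilon
//   t  -> f t'      t' -> * f t' | epsilon
//   f  -> ( e ) | i
// One member function per nonterminal; each returns false on a syntax error.
class Recognizer {
public:
    explicit Recognizer(const std::string &input) : in(input), pos(0) {}

    // The string is in the language only if e succeeds AND all input is used.
    bool accepts() { return e() && pos == in.size(); }

private:
    std::string in;
    std::size_t pos;

    // True if the next pending character is c.
    bool peek(char c) const { return pos < in.size() && in[pos] == c; }

    bool e() { return t() && eprime(); }

    bool eprime() {
        if (peek('+')) { ++pos; return t() && eprime(); }
        return true;  // epsilon: consume nothing
    }

    bool t() { return f() && tprime(); }

    bool tprime() {
        if (peek('*')) { ++pos; return f() && tprime(); }
        return true;  // epsilon: consume nothing
    }

    bool f() {
        if (peek('(')) {
            ++pos;
            if (!e() || !peek(')')) return false;
            ++pos;
            return true;
        }
        if (peek('i')) { ++pos; return true; }
        return false;
    }
};

// Hypothetical convenience wrapper.
bool inLanguage(const std::string &s) { return Recognizer(s).accepts(); }
```

Calling inLanguage("(i+i)*i") should report success, while inLanguage("i+") should fail because t finds no factor after the plus sign. Checking that the whole input was consumed is a design choice of this sketch; without it, a string such as "ii" would be accepted after reading only its first character.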
11
How might I use this?
This is a true parser: it just checks to see if
the string is in the language. But how would I
make the program actually compute the
expressions? Of course, add support for
numbers. What else?
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
12
Parse tree for 7 + 9
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i

e → t e'
    t → f t'
        f → 7
        t' → ε
    e' → + t e'
        t → f t'
            f → 9
            t' → ε
        e' → ε
13
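Parse tree for 7 + 9
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i

e → t e'                     (value 16)
    t → f t'
        f → 7
        t' → ε
    e' → + t e'
        t → f t'
            f → 9
            t' → ε
        e' → ε
The leaves f → 7 and f → 9 contribute the values 7 and 9,
which are combined on the way back up to give 16 at the root.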
14
Top-down recursive descent solvers
float tprime(string &line, float left) {
    if (line.length() > 0 && line[0] == '*') {
        line = line.substr(1);
        left *= f(line);
        left = tprime(line, left);
    }
    return left;
}
float f(string &line) {
    float ret = 0;
    if (line.length() > 0 && line[0] == '(') {
        line = line.substr(1);
        ret = e(line);
        if (line.length() == 0 || line[0] != ')')
            cout << "error!" << endl;
        else
            line = line.substr(1);
    } else if (line.length() > 0 && isdigit(line[0])) {
        ret = line[0] - '0';               // single-digit numbers only
        line = line.substr(1);
    } else
        cout << "error!" << endl;
    return ret;
}
void Parse(string line) {
    string linecopy = line;
    cout << ...
}
float e(string &line) {
    float left = t(line);
    return eprime(line, left);
}
float eprime(string &line, float left) {
    if (line.length() > 0 && line[0] == '+') {
        line = line.substr(1);
        left += t(line);
        left = eprime(line, left);
    }
    return left;
}
float t(string &line) {
    float left = f(line);
    return tprime(line, left);
}
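The solver idea on this slide, passing the value computed so far down into e' and t' as left, can be written as one self-contained sketch. Several details here are assumptions rather than the slide's code: the parser keeps an index into the input, syntax errors throw std::runtime_error, and the names Evaluator, run and evaluate are invented for the example. Numbers are single digits, as on the slide.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Evaluator for:
//   e  -> t e'      e' -> + t e' | epsilon
//   t  -> f t'      t' -> * f t' | epsilon
//   f  -> ( e ) | digit
class Evaluator {
public:
    explicit Evaluator(const std::string &input) : in(input), pos(0) {}

    // Evaluate the whole input; leftover characters are treated as an error.
    float run() {
        float value = e();
        if (pos != in.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string in;
    std::size_t pos;

    bool peek(char c) const { return pos < in.size() && in[pos] == c; }

    float e() { return eprime(t()); }

    // left is the value accumulated so far; threading it through the
    // recursion makes + group from left to right.
    float eprime(float left) {
        if (peek('+')) { ++pos; left += t(); return eprime(left); }
        return left;  // epsilon
    }

    float t() { return tprime(f()); }

    float tprime(float left) {
        if (peek('*')) { ++pos; left *= f(); return tprime(left); }
        return left;  // epsilon
    }

    float f() {
        if (peek('(')) {
            ++pos;
            float value = e();
            if (!peek(')')) throw std::runtime_error("expected ')'");
            ++pos;
            return value;
        }
        if (pos < in.size() && std::isdigit(static_cast<unsigned char>(in[pos]))) {
            return static_cast<float>(in[pos++] - '0');
        }
        throw std::runtime_error("expected '(' or a digit");
    }
};

// Hypothetical convenience wrapper.
float evaluate(const std::string &s) { return Evaluator(s).run(); }
```

With this sketch, evaluate("7+9") gives 16 and evaluate("(2+3)*7") gives 35, while evaluate("2+3*7") gives 23 because t' binds the multiplication before e' adds.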
15
Top-down parsing (theory)
If S ⇒* α is possible, α is a sentential form in G. If α
has only terminals, α is a sentence in G.
  • Given a context-free grammar G with start symbol
    S and a string of tokens w
  • Top-down parsing produces a derivation S ⇒ γ_1 ⇒ γ_2
    ⇒ … ⇒ γ_n = w
  • Start with our starting nonterminal S
  • Rewrite sentential form γ_k into a new sentential form
    γ_(k+1) by
  • Choosing some nonterminal symbol A at position
    x in γ_k
  • Choosing a rule of the form A → β
  • Replacing A with β at position x in γ_k
  • Eventually yielding w, if w ∈ L(G)

16
Degrees of Freedom in Parsing
Top-down parsing using a leftmost derivation
always replaces the leftmost nonterminal at every
step. (Alternative: rightmost derivation.)
Top-down recursive descent parsing
chooses the production to apply based on the
pending tokens (if there is a choice).
e  → t e'               (no choice, but unambiguous)
e' → + t e' | ε         (choice is made by the first terminal)
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
17
So, what does this mean about our grammar?
Top-down parsing using a leftmost derivation
always replaces the leftmost nonterminal at every
step. (Alternative: rightmost derivation.)
Top-down recursive descent parsing
chooses the production to apply based on the
pending tokens (if there is a choice).
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i

e → e + t | t
t → t * f | f
f → ( e ) | id | num
18
So, what does this mean about our grammar?
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
If a production has more than one choice, only
one may begin with a nonterminal! It must be
possible to determine the production choice by
pending terminals only!
What about:
e → t e | t
t → ...
19
What about this grammar?
e → e + t | t
t → t * f | f
f → ( e ) | id | num
What are the problems here? How did we get
around them?
20
Left recursion
A → A α
Our simple implementation would loop forever on
this production!
21
Left recursion elimination (simple)
A → A α | β
Replace this with the equivalent:
A  → β A'
A' → α A' | ε
This applies for many problems in practice. How do we apply it to:
e → e + t | t
t → t * f | f
f → ( e ) | id | num
22
Left-recursion elimination
e → e + t | t
t → t * f | f
f → ( e ) | id | num

A → A α | β
Replace this with the equivalent:
A  → β A'
A' → α A' | ε

For e → e + t | t:  A = e, α = + t, β = t, which gives
e  → t e'
e' → + t e' | ε

Result:
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
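The rewrite rule above is mechanical enough to code. Here is a minimal sketch for immediate left recursion only (a rule whose right-hand side starts with its own left-hand side). The representation is an assumption made for the example: a right-hand side is a vector of symbol names, an empty vector stands for ε, and the names Rewritten and eliminateImmediateLeftRecursion are hypothetical. It also assumes every α is non-empty.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Symbols = std::vector<std::string>;   // one right-hand side
using Alternatives = std::vector<Symbols>;  // all right-hand sides of one nonterminal

// Result of rewriting  A -> A a1 | A a2 | ... | b1 | b2 | ...  as
//   A  -> b1 A' | b2 A' | ...
//   A' -> a1 A' | a2 A' | ... | epsilon     (epsilon = empty Symbols)
struct Rewritten {
    Alternatives a;       // productions for A
    Alternatives aPrime;  // productions for A' (empty if nothing was rewritten)
};

Rewritten eliminateImmediateLeftRecursion(const std::string &name,
                                          const Alternatives &alts) {
    const std::string prime = name + "'";
    Rewritten out;
    for (const Symbols &rhs : alts) {
        if (!rhs.empty() && rhs[0] == name) {
            // A -> A alpha   becomes   A' -> alpha A'
            Symbols alpha(rhs.begin() + 1, rhs.end());
            alpha.push_back(prime);
            out.aPrime.push_back(alpha);
        } else {
            // A -> beta   becomes   A -> beta A'
            Symbols beta = rhs;
            beta.push_back(prime);
            out.a.push_back(beta);
        }
    }
    if (out.aPrime.empty()) {
        out.a = alts;                     // no left recursion: leave A unchanged
    } else {
        out.aPrime.push_back(Symbols{});  // add the epsilon alternative
    }
    return out;
}
```

Applied to e with the alternatives e + t and t, the sketch should produce e → t e' and e' → + t e' | ε, matching the hand derivation. Indirect left recursion (A → B …, B → A …) is outside what this sketch handles.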
23
Left factoring
For each nonterminal symbol A, find the longest
common prefix α for all A-productions. If α ≠ ε,
replace all A-productions
A → α β1 | α β2 | … | α βk
with
A  → α A'
A' → β1 | β2 | … | βk

s  → i = e | e
e  → t e'
e' → + t e' | ε
t  → ( e ) | i
?
24
Left factoring
For each nonterminal symbol A, find the longest
common prefix α for all A-productions. If α ≠ ε,
replace all A-productions
A → α β1 | α β2 | … | α βk
with
A  → α A'
A' → β1 | β2 | … | βk

s  → i = e | e
e  → t e'
e' → + t e' | ε
t  → ( e ) | i

s  → i = e | e
s  → i = t e' | t e'
s  → i = t e' | ( e ) e' | i e'
s  → i s' | ( e ) e'
s' → = t e' | e'
25
Ambiguity
Any grammar that can parse a sentence more than
one way is called ambiguous. Ambiguity is
undesirable in general. We'll want to rewrite
the grammar to eliminate ambiguity whenever
possible. If that is not possible, we'll use
disambiguation rules.
26
What about this grammar? Ambiguous?
e → e + e | e * e | ( e ) | id | num
27
What about this grammar? Ambiguous?
e → e + e | e * e | ( e ) | id | num
Of course it is. Try 7 8 9
(Figure: two different parse trees for the same input, each
built from two e-productions over three num leaves.)
How did we deal with this?
28
Our better grammar
e → e + t | t
t → t * f | f
f → ( e ) | id | num
Is this ambiguous? Does it enforce
associativity? It's often said that t = terms
and f = factors in this grammar.
Can 1 2 3 be parsed more than one way?
29
Associativity
We expect arithmetic statements to be
left-associative (operations are done left to
right). Can you think of any common language
statement that is right associative?
30
Associativity
We expect arithmetic statements to be
left-associative (operations are done left to
right). Can you think of any common language
statement that is right associative?
x = y = z = 0
What order should this be done in?
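Both groupings can be observed directly in C++. The two functions below are a small illustrative sketch; their names and the sample operands are invented for the example, and subtraction is used for the left-associative case only because the grouping changes its result.

```cpp
#include <cassert>

// Left-associative: a - b - c is grouped as (a - b) - c.
int leftAssociative(int a, int b, int c) {
    return a - b - c;
}

// Right-associative: x = y = z = value is grouped as x = (y = (z = value)),
// so z is assigned first and the value flows leftwards to y and then x.
int chainAssign(int value) {
    int x = 1, y = 2, z = 3;
    x = y = z = value;
    return x + y + z;
}
```

For example, leftAssociative(10, 4, 3) is 3 (not 9, which the grouping 10 - (4 - 3) would give), and chainAssign(0) leaves all three variables at 0.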
31
What about this grammar?
stmt  → if expr then stmt
      | if expr then stmt else stmt
      | begin stmts end
stmts → stmt stmts | stmt
32
What about this grammar?
stmt  → if expr then stmt
      | if expr then stmt else stmt
      | begin stmts end
stmts → stmt stmts | stmt
This is naturally ambiguous. How does C
disambiguate this case?
if (x == 1) then if (y == 2) then x = x + 1 else x = y
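C and C++ resolve the dangling else by attaching each else to the nearest unmatched if. The sketch below restates the slide's statement in C++ syntax; the == comparisons and the function name danglingElse are assumptions made for the example. The indentation shows how the compiler groups it, and some compilers warn about exactly this construct.

```cpp
#include <cassert>

// The else belongs to the inner if (y == 2), not to the outer if (x == 1).
int danglingElse(int x, int y) {
    if (x == 1)
        if (y == 2)
            x = x + 1;
        else
            x = y;
    return x;
}
```

If the else belonged to the outer if, danglingElse(3, 5) would return 5. Under the nearest-if rule the whole inner statement is skipped when x is not 1, so it returns 3.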
33
Choosing a production
e → e + t | t
t → t * f | f
f → ( e ) | id | num
How would a parser possibly know to use e → e + t
or e → t for 1 2 3?
34
Choosing a production to apply
We'll define two things that will be useful for
defining classes of grammars that can be parsed
by different parsers. This will also be useful
for building parsing tables later.
FIRST(α): Set of terminal symbols that may appear
first in any string derived from α.
FOLLOW(A): Set of terminal symbols that may appear
immediately to the right of nonterminal A in any
sentential form.
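Both sets can be computed by repeating the defining rules until nothing changes. The following is a sketch of that fixed-point computation, not a definitive implementation. The representation is assumed for the example: a grammar maps each nonterminal to its right-hand sides, any symbol without productions counts as a terminal, an empty right-hand side stands for ε, "eps" marks ε inside FIRST sets, and "$" marks end of input inside FOLLOW sets. All type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Symbol = std::string;
using Rhs = std::vector<Symbol>;                     // empty Rhs means epsilon
using Grammar = std::map<Symbol, std::vector<Rhs>>;  // nonterminal -> alternatives
using SetMap = std::map<Symbol, std::set<Symbol>>;

const Symbol EPS = "eps";  // epsilon marker inside FIRST sets
const Symbol END = "$";    // end-of-input marker inside FOLLOW sets

// FIRST of the symbol string rhs[from..], using the FIRST sets found so far.
std::set<Symbol> firstOf(const Rhs &rhs, std::size_t from,
                         const Grammar &g, const SetMap &first) {
    std::set<Symbol> out;
    for (std::size_t i = from; i < rhs.size(); ++i) {
        if (!g.count(rhs[i])) { out.insert(rhs[i]); return out; }  // terminal
        bool nullable = false;
        for (const Symbol &s : first.at(rhs[i])) {
            if (s == EPS) nullable = true; else out.insert(s);
        }
        if (!nullable) return out;
    }
    out.insert(EPS);  // every remaining symbol can derive epsilon
    return out;
}

SetMap computeFirst(const Grammar &g) {
    SetMap first;
    for (const auto &p : g) first[p.first];  // start from empty sets
    for (bool changed = true; changed;) {
        changed = false;
        for (const auto &p : g)
            for (const Rhs &rhs : p.second) {
                std::set<Symbol> f = firstOf(rhs, 0, g, first);
                std::size_t before = first[p.first].size();
                first[p.first].insert(f.begin(), f.end());
                if (first[p.first].size() != before) changed = true;
            }
    }
    return first;
}

SetMap computeFollow(const Grammar &g, const SetMap &first, const Symbol &start) {
    SetMap follow;
    for (const auto &p : g) follow[p.first];
    follow[start].insert(END);
    for (bool changed = true; changed;) {
        changed = false;
        for (const auto &p : g)
            for (const Rhs &rhs : p.second)
                for (std::size_t i = 0; i < rhs.size(); ++i) {
                    if (!g.count(rhs[i])) continue;  // only nonterminals have FOLLOW
                    std::set<Symbol> &target = follow[rhs[i]];
                    std::size_t before = target.size();
                    std::set<Symbol> rest = firstOf(rhs, i + 1, g, first);
                    if (rest.erase(EPS)) {
                        // the rest can vanish, so FOLLOW(lhs) follows this symbol too
                        const std::set<Symbol> lhsFollow = follow[p.first];
                        target.insert(lhsFollow.begin(), lhsFollow.end());
                    }
                    target.insert(rest.begin(), rest.end());
                    if (target.size() != before) changed = true;
                }
    }
    return follow;
}

// The simplified expression grammar used throughout the slides.
Grammar slideGrammar() {
    return {
        {"e",  {{"t", "e'"}}},
        {"e'", {{"+", "t", "e'"}, {}}},
        {"t",  {{"f", "t'"}}},
        {"t'", {{"*", "f", "t'"}, {}}},
        {"f",  {{"(", "e", ")"}, {"i"}}},
    };
}
```

For the slide grammar this should give FIRST(e) = FIRST(t) = FIRST(f) = { (, i }, FIRST(e') = { +, ε }, FIRST(t') = { *, ε }, and FOLLOW(e) = FOLLOW(e') = { ), $ }, FOLLOW(t) = FOLLOW(t') = { +, ), $ }, FOLLOW(f) = { *, +, ), $ }.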
35
LL(1) Grammars
Class of grammars for which predictive parsers
can unambiguously decide which production to
apply at any given point.
Notation:
First L: Input is scanned left to right.
Second L: Parse produces a leftmost derivation.
1: One token of lookahead is required.
LL(1) is rich enough to cover most languages.
Grammar must be written in LL(1) form. There will
be no ambiguity and no left recursion.
36
Formal definition of LL(1)
  • A grammar G is LL(1) if and only if whenever
    there are two productions A → α | β the following
    hold:
  • For no terminal a do both α and β derive strings
    beginning with a.
  • At most one of α and β can derive ε.
  • If β ⇒* ε then α does not derive any string
    beginning with a terminal in FOLLOW(A).
  • If α ⇒* ε then β does not derive any string
    beginning with a terminal in FOLLOW(A).

What does each of these rules mean?
37
Formal definition of LL(1)
e → e + e | e * e | ( e ) | id | num
LL(1)?
38
Formal definition of LL(1)
e → e + t | t
t → t * f | f
f → ( e ) | id | num
LL(1)?
39
Formal definition of LL(1)
e  → t e'
e' → + t e' | ε
t  → f t'
t' → * f t' | ε
f  → ( e ) | i
LL(1)?
40
Formal definition of LL(1)
s  → i = e | e
e  → t e'
e' → + t e' | ε
t  → ( e ) | i
LL(1)?
41
Formal definition of LL(1)
s  → i s' | ( e ) e'
s' → = e | e'
e  → t e'
e' → + t e' | ε
t  → ( e ) | i
LL(1)?
42
Formal definition of LL(1)
stmts → stmt stmts | ε
stmt  → stmts | if e then stmt else | while ( e ) stmt
else  → else stmt | ε
e     → id id
LL(1)?