CS 321 Programming Languages and Compilers

Transcript and Presenter's Notes

1
CS 321 Programming Languages and Compilers
  • VI. Parsing

2
Parsing
  • Calculate the grammatical structure of a program,
    like diagramming sentences, where
  • Tokens ≈ words
  • Programs ≈ sentences

For further information, read Aho, Sethi, and
Ullman, Compilers: Principles, Techniques, and
Tools (a.k.a. the Dragon Book).
3
Outline of coverage
  • Context-free grammars
  • Parsing
  • Tabular Parsing Methods
  • One pass
  • Top-down
  • Bottom-up
  • Yacc

4
What a parser does: extracts the grammatical
structure of a program
[Figure: parse tree for a small program. A function-def node has children name
(main), arguments, and stmt-list; the stmt expands to an expression whose children
are an expression (variable cout), an operator (<<), and an expression (string
"hello, world\n").]
5
Context-free languages
  • Grammatical structure defined by context-free
    grammar.
  • statement → labeled-statement
              | expression-statement
              | compound-statement
    labeled-statement → ident : statement
                      | case constant-expression : statement
    compound-statement → { declaration-list statement-list }

Context-free: only one non-terminal in the
left part of each production. (The slide labels an
example of a terminal and of a non-terminal in
these productions.)
6
Parse trees
  • Parse tree: a tree labeled with grammar symbols,
    such that
  • if a node is labeled A, and its children are
    labeled x1 ... xn, then there is a production
    A → x1 ... xn
  • Parse tree from A: root is labeled with A
  • Complete parse tree: all leaves are labeled with
    tokens

7
Parse trees and sentences
  • Frontier of a tree: the labels on its leaves (in
    left-to-right order)
  • The frontier of a tree from S is a sentential form.
  • The frontier of a complete tree from S is a sentence.

8
Example
  • G:  L → L E | E      E → a | b
  • Syntax trees from start symbol (L)
  • Sentential forms

9
Derivations
  • Alternate definition of sentence:
  • Given α, β in V*, say α ⇒ β is a derivation step
    if α = α1 A α2 and β = α1 γ α2, where A → γ is
    a production
  • β is a sentential form iff there exists a
    derivation (sequence of derivation steps)
    S ⇒ ... ⇒ β  (alternatively, we say that S ⇒* β)

Two definitions are equivalent, but note that
there are many derivations corresponding to each
parse tree.
10
Another example
  • H:  L → E L | E      E → a | b

[Figure: parse trees from L for grammar H; L is repeatedly expanded with L → E L
and finally L → E, and the E-leaves are rewritten to a's and b's.]
11
Ambiguity
  • For some purposes, it is important to know
    whether a sentence can have more than one parse
    tree.
  • A grammar is ambiguous if there is a sentence
    with more than one parse tree.
  • Example:  E → E + E | E * E | id

[Figure: two parse trees for id + id * id, one grouping it as (id + id) * id and
the other as id + (id * id).]
12
Ambiguity
  • Ambiguity is a function of the grammar rather
    than the language. An ambiguous grammar may have
    an equivalent unambiguous one.

13
Grammar Transformations
  • Grammars can be transformed without affecting the
    language generated.
  • Three transformations are discussed next
  • Eliminating Ambiguity
  • Eliminating Left Recursion (i.e. productions of
    the form A → A α)
  • Left Factoring

14
Grammar Transformation 1. Eliminating Ambiguity
  • Sometimes an ambiguous grammar can be rewritten
    to eliminate ambiguity.
  • For example, expressions involving additions and
    products can be written as follows
  • E → E + T | T
  • T → T * id | id
  • The language generated by this grammar is the
    same as that generated by the grammar on
    transparency 11. Both generate id + (id * id).
  • However, this grammar is not ambiguous.

15
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • One advantage of this grammar is that it
    represents the precedence between operators. In
    the parse tree, products appear nested within
    additions:

[Figure: parse tree for id + id * id under this grammar. The root uses E → E + T;
the right-hand T expands via T → T * id, so the product is nested inside the
addition.]
16
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • The most famous example of ambiguity in a
    programming language is the dangling else.
  • Consider
  • S → if b then S else S | if b then S | a

17
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • When there are two nested ifs and only one else,
    the sentence has two parse trees:

[Figure: two parse trees for "if b then if b then a else a". In one, the else is
attached to the inner if (the outer if has no else); in the other, the else is
attached to the outer if.]
18
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • In most languages (including C and Java), each
    else is assumed to belong to the nearest if that
    is not already matched by an else. This
    association is expressed in the following
    (unambiguous) grammar
  • S → Matched
      | Unmatched
  • Matched → if b then Matched else Matched
            | a
  • Unmatched → if b then S
              | if b then Matched else Unmatched

19
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • Ambiguity is a function of the grammar.
  • It is undecidable whether a context-free grammar
    is ambiguous.
  • The proof is by reduction from Post's
    correspondence problem.
  • Although there is no general algorithm, it is
    possible to isolate certain constructs in
    productions which lead to ambiguous grammars.

20
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • For example, a grammar containing the production
    A → A A | a would be ambiguous, because the
    string aaa has two parses:

[Figure: two parse trees for aaa, one grouping it as (aa)a and the other as a(aa).]
  • This ambiguity disappears if we use the
    productions
  • A → A B | B and B → a
  • or the productions
  • A → B A | B and B → a.

21
Grammar Transformation 1. Eliminating Ambiguity (Cont.)
  • Three other examples of ambiguous productions
    are:
  • A → A a A
  • A → a A | A b, and
  • A → a A | a A b A
  • A language generated by an ambiguous context-free
    grammar is inherently ambiguous if it has no
    unambiguous context-free grammar. (This can be
    proven formally.)
  • An example of such a language is
    L = { a^i b^j c^m : i = j or j = m }, which can
    be generated by the grammar
  • S → AB | DC
  • A → aA | ε      C → cC | ε
  • B → bBc | ε     D → aDb | ε

22
Grammar Transformations 2. Elimination of Left Recursion
  • A grammar is left recursive if it has a
    nonterminal A and a derivation A ⇒+ A α for some
    string α. Top-down parsing methods (to be
    discussed shortly) cannot handle left-recursive
    grammars, so a transformation to eliminate left
    recursion is needed.
  • Immediate left recursion (productions of the form
    A → A α) can be easily eliminated.
  • We group the A-productions as
  • A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
  • where no βi begins with A. Then we replace the
    A-productions by
  • A  → β1 A' | β2 A' | ... | βn A'
  • A' → α1 A' | α2 A' | ... | αm A' | ε
    (A Python sketch of this transformation follows
    below.)
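
To make this concrete, here is a minimal Python sketch of the transformation
(the function name and the encoding of a production as a list of symbols, with
the empty list standing for ε, are illustrative choices, not from the slides):

def eliminate_immediate_left_recursion(nt, prods):
    """Split the nt-productions into A -> A a1 | ... | A am | b1 | ... | bn and
    return the replacement productions for A and for the fresh nonterminal A'."""
    alphas = [p[1:] for p in prods if p and p[0] == nt]      # the A -> A ai alternatives
    betas  = [p     for p in prods if not p or p[0] != nt]   # the A -> bj alternatives
    if not alphas:
        return {nt: prods}                   # no immediate left recursion: nothing to do
    new = nt + "'"                           # fresh nonterminal A'
    return {
        nt:  [b + [new] for b in betas],             # A  -> b1 A' | ... | bn A'
        new: [a + [new] for a in alphas] + [[]],     # A' -> a1 A' | ... | am A' | eps
    }

# The expression grammar of transparency 14:  E -> E + T | T
print(eliminate_immediate_left_recursion("E", [["E", "+", "T"], ["T"]]))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}   i.e.  E -> T E',  E' -> + T E' | eps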

23
Grammar Transformations 2. Elimination of Left Recursion (Cont.)
  • The previous transformation, however, does not
    eliminate left recursion involving two or more
    steps. For example, consider the grammar
  • S → A a | b
  • A → A c | S d | ε
  • S is left recursive because S ⇒ Aa ⇒ Sda, but it
    is not immediately left recursive.

24
Grammar Transformations 2. Elimination of Left Recursion (Cont.)
  • Algorithm. Eliminate left recursion
  • Arrange the nonterminals in some order A1, A2, ..., An
  • for i := 1 to n
  •   for j := 1 to i - 1
  •     replace each production of the form Ai → Aj γ
        by the productions Ai → δ1 γ | δ2 γ | ... | δk γ,
        where Aj → δ1 | δ2 | ... | δk are all the
        current Aj-productions
  •   eliminate the immediate left recursion among
      the Ai-productions
    (A Python sketch of this algorithm follows below.)
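
The algorithm can be transcribed almost literally into Python. In the sketch
below the grammar is a dict mapping each nonterminal to a list of productions
(lists of symbols, with the empty list for ε); the function names and this
encoding are illustrative, not from the slides. As in the Dragon Book, the
algorithm assumes the grammar has no cycles, and ε-productions can need extra
care, though the example from transparency 23 works out:

def eliminate_immediate(nt, prods):
    """Replace A -> A a1 | ... | b1 | ... by A -> b1 A' | ..., A' -> a1 A' | ... | eps."""
    alphas = [p[1:] for p in prods if p and p[0] == nt]
    betas  = [p     for p in prods if not p or p[0] != nt]
    if not alphas:
        return {nt: prods}                      # no immediate left recursion
    new = nt + "'"
    return {nt:  [b + [new] for b in betas],
            new: [a + [new] for a in alphas] + [[]]}

def eliminate_left_recursion(grammar, order):
    """General elimination of left recursion; `order` is the chosen A1, ..., An."""
    g = {a: [list(p) for p in ps] for a, ps in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:                    # replace Ai -> Aj gamma by Ai -> dk gamma
            updated = []
            for p in g[ai]:
                if p and p[0] == aj:
                    updated += [d + p[1:] for d in g[aj]]
                else:
                    updated.append(p)
            g[ai] = updated
        g.update(eliminate_immediate(ai, g.pop(ai)))   # then remove immediate recursion
    return g

# Grammar from transparency 23:  S -> A a | b,   A -> A c | S d | eps
g = {"S": [["A", "a"], ["b"]],
     "A": [["A", "c"], ["S", "d"], []]}
print(eliminate_left_recursion(g, ["S", "A"]))
# S -> A a | b,   A -> b d A' | A',   A' -> c A' | a d A' | eps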

25
Grammar Transformations 2. Elimination of Left Recursion (Cont.)
  • To show that the previous algorithm actually
    works, all we need to notice is that iteration i
    only changes productions with Ai on the left-hand
    side, and that m > i in all resulting productions
    of the form Ai → Am α.
  • This can be shown by induction.
  • It is clearly true for i = 1.
  • If it is true for all i < k, then when the outer
    loop is executed for i = k, the inner loop will
    remove all productions Ai → Am α with m < i.
  • Finally, with the elimination of immediate (self)
    recursion, m in the Ai → Am α productions is
    forced to be > i.
  • So, at the end of the algorithm, all productions
    of the form Ai → Am α have m > i, and therefore
    left recursion is no longer possible.

26
Grammar Transformations 3. Left Factoring
  • Left factoring helps transform a grammar for
    predictive parsing.
  • For example, if we have the two productions
  • S → if b then S else S
  •   | if b then S
  • then on seeing the input token if, we cannot
    immediately tell which production to choose to
    expand S.
  • In general, if we have A → α β1 | α β2 and the
    input begins with a string derived from α, we do
    not know (without looking further) which
    production to use to expand A.

27
Grammar Transformations 3. Left Factoring (Cont.)
  • However, we may defer the decision by expanding A
    to α A'.
  • Then, after seeing the input derived from α, we
    expand A' to β1 or to β2. That is, left-factored,
    the original productions become
  • A  → α A'
  • A' → β1 | β2
    (A Python sketch of left factoring follows below.)
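
A rough Python sketch of one round of left factoring (the helper names, the
grouping of alternatives by their first symbol, and the list-of-symbols encoding
with the empty list for ε are illustrative assumptions):

def common_prefix(seqs):
    """Longest common prefix of a list of symbol sequences."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(nt, prods):
    """One round of left factoring: alternatives of nt sharing a common prefix alpha
    become  nt -> alpha nt'  and  nt' -> (their distinct tails)."""
    groups = {}
    for p in prods:                                   # group alternatives by first symbol
        groups.setdefault(p[0] if p else None, []).append(p)
    out = {nt: []}
    for head, group in groups.items():
        if head is None or len(group) == 1:
            out[nt] += group                          # nothing to factor in this group
            continue
        alpha = common_prefix(group)
        new = nt + "'"                                # fresh nonterminal A'
        out[nt].append(alpha + [new])                 # A  -> alpha A'
        out[new] = [p[len(alpha):] for p in group]    # A' -> beta1 | beta2 | ...
    return out

# The dangling-if productions of transparency 26
print(left_factor("S", [["if", "b", "then", "S", "else", "S"],
                        ["if", "b", "then", "S"]]))
# {'S': [['if', 'b', 'then', 'S', "S'"]], "S'": [['else', 'S'], []]}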

28
Non-Context-Free Language Constructs
  • Examples of non-context-free languages are
  • L1 = { wcw : w is of the form (a|b)* }
  • L2 = { a^n b^m c^n d^m : n ≥ 1 and m ≥ 1 }
  • L3 = { a^n b^n c^n : n ≥ 0 }
  • Languages similar to these that are context free:
  • L1' = { w c w^R : w is of the form (a|b)* }
    (w^R stands for w reversed)
  • This language is generated by the grammar
  • S → aSa | bSb | c
  • L2' = { a^n b^m c^m d^n : n ≥ 1 and m ≥ 1 }
  • This language is generated by the grammar
  • S → aSd | aAd
  • A → bAc | bc

29
Non-Context-Free Language Constructs (Cont.)
  • L2'' = { a^n b^n c^m d^m : n ≥ 1 and m ≥ 1 }
  • This language is generated by the grammar
  • S → AB
  • A → aAb | ab
  • B → cBd | cd
  • L3' = { a^n b^n : n ≥ 1 }
  • This language is generated by the grammar
  • S → aSb | ab
  • This language is not definable by any regular
    expression.

30
Non-Context-Free Language Constructs (Cont.)
  • Suppose we could construct a DFSM D accepting
    L3'.
  • D must have a finite number of states, say k.
  • Consider the sequence of states s0, s1, s2, ..., sk
    entered by D having read ε, a, aa, ..., a^k.
  • Since D only has k states, two of the states in
    the sequence have to be equal, say si = sj
    (i ≠ j).
  • From si, a sequence of i b's leads to an accepting
    (final) state. Therefore, the same sequence of i
    b's will also lead to an accepting state from sj.
    Therefore D would accept a^j b^i, which means that
    the language accepted by D is not identical to
    L3'. A contradiction.

31
Parsing
  • The parsing problem is: given a string of tokens
    w, find a parse tree whose frontier is w.
    (Equivalently, find a derivation of w from S.)
  • A parser for a grammar G reads a list of tokens
    and finds a parse tree if they form a sentence
    (or reports an error otherwise).
  • Two classes of algorithms for parsing:
  • Top-down
  • Bottom-up

32
Parser generators
  • A parser generator is a program that reads a
    grammar and produces a parser.
  • The best known parser generators are yacc and its
    derivatives (such as GNU bison); both produce
    bottom-up parsers.
  • Most parser generators - including yacc - do not
    work for every CFG; they accept a restricted
    class of CFGs that can be parsed efficiently
    using the method employed by that parser
    generator.

33
Top-down parsing
  • Starting from a parse tree containing just S,
    build the tree downward toward the input,
    expanding the left-most non-terminal at each step.
  • Algorithm (next slide)

34
Top-down parsing (cont.)
  • Let the input be a1 a2 ... an
  • current sentential form (csf) := S
  • loop
  •   suppose the csf is t1 ... tk A γ, where A is
      the leftmost non-terminal
  •   if t1 ... tk ≠ a1 ... ak, it's an error
  •   based on ak+1 ..., choose a production A → β
  •   the csf becomes t1 ... tk β γ

35
Top-down parsing example
  • Grammar H:  L → E L | E      E → a | b
  • Input: ab
  • Parse tree (shown growing in the figure), with
    the sentential form and the input at each step:

    Sentential form    Input
    L                  ab
    E L                ab
    a L                ab
36
Top-down parsing example (cont.)
  • Parse tree, sentential form, and input (continued):

    Sentential form    Input
    a E                ab
    a b                ab
37
LL(1) parsing
  • Efficient form of top-down parsing.
  • Use only the first symbol of the remaining input
    (ak+1) to choose the next production. That is,
    employ a function M : N × Σ → P in the "choose
    production" step of the algorithm.
  • When this works, the grammar is (usually) called
    LL(1). (More precise definition to follow.)

38
LL(1) examples
  • Example 1
  • H:  L → E L | E      E → a | b
  • Given input ab, the next symbol is a.
  • Which production to use? Can't tell.
  • ⇒ H is not LL(1).

39
LL(1) examples
  • Example 2
  • Exp  → Term Exp'
  • Exp' → $ | + Exp
  • Term → id
  • (Use $ for the end-of-input symbol.)

The grammar is LL(1): Exp and Term have only one
production; Exp' has two productions, but only
one is applicable at any time. (A recursive-descent
sketch of a parser for this grammar follows below.)
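
Because the grammar is LL(1), a recursive-descent parser needs only the next
input token to decide which production to apply. A minimal sketch, assuming the
tokens are given as a list of strings ending with the marker "$" (the class and
helper names are illustrative):

class ParseError(Exception):
    pass

def parse(tokens):
    """Recursive-descent parser for  Exp -> Term Exp',  Exp' -> $ | + Exp,  Term -> id."""
    pos = 0

    def peek():
        return tokens[pos]

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise ParseError(f"expected {expected!r}, found {peek()!r}")
        pos += 1

    def exp():            # Exp -> Term Exp'
        term()
        exp_prime()

    def exp_prime():
        if peek() == "+": # Exp' -> + Exp    chosen because FIRST(+ Exp) = {+}
            eat("+")
            exp()
        else:             # Exp' -> $        chosen because FIRST($) = {$}
            eat("$")

    def term():           # Term -> id
        eat("id")

    exp()
    return "accepted"

print(parse(["id", "+", "id", "+", "id", "$"]))   # accepted
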
40
Nonrecursive predictive parsing
  • It is possible to build a nonrecursive predictive
    parser by maintaining a stack explicitly, rather
    than implicitly via recursive calls.
  • The key problem during predictive parsing is
    determining the production to be applied for a
    non-terminal.

41
Nonrecursive predictive parsing
  • Algorithm. Nonrecursive predictive parsing
  • Set ip to point to the first symbol of w$.
  • repeat
  •   Let X be the top-of-stack symbol and a the
      symbol pointed to by ip
  •   if X is a terminal or $ then
  •     if X = a then
  •       pop X from the stack and advance ip
  •     else error()
  •   else  // X is a nonterminal
  •     if M[X, a] = X → Y1 Y2 ... Yk then
  •       pop X from the stack
  •       push Yk, Yk-1, ..., Y1 onto the stack, with
          Y1 on top
  •       (push nothing if Y1 Y2 ... Yk is ε)
  •       output the production X → Y1 Y2 ... Yk
  •     else error()
  • until X = $
    (A Python sketch of this algorithm follows below.)
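
A direct transcription of the algorithm into Python might look as follows; the
encoding of the table as a dict from (nonterminal, lookahead) pairs to
right-hand sides is an illustrative choice, and the example table is the one
for the Exp grammar of transparency 39 (the general construction of M is the
topic of transparency 54):

def predictive_parse(table, nonterminals, start, tokens):
    """Table-driven predictive parser.  table maps (nonterminal, token) to the
    right-hand side to expand (a list of symbols, [] meaning eps).  tokens must
    end with the end marker "$"."""
    stack = ["$", start]                    # "$" marks the bottom of the stack
    ip = 0
    while True:
        X, a = stack[-1], tokens[ip]
        if X not in nonterminals:           # X is a terminal or "$"
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()                     # matched: pop X and advance ip
            ip += 1
            if X == "$":
                return                      # accepted
        elif (X, a) in table:               # X is a nonterminal: consult M[X, a]
            rhs = table[(X, a)]
            print(X, "->", " ".join(rhs) or "eps")   # output the production used
            stack.pop()
            stack.extend(reversed(rhs))     # push Yk, ..., Y1, leaving Y1 on top
        else:
            raise SyntaxError(f"no table entry M[{X}, {a}]")

# Parsing table for  Exp -> Term Exp',  Exp' -> $ | + Exp,  Term -> id
M = {("Exp",  "id"): ["Term", "Exp'"],
     ("Exp'", "$"):  ["$"],
     ("Exp'", "+"):  ["+", "Exp"],
     ("Term", "id"): ["id"]}
predictive_parse(M, {"Exp", "Exp'", "Term"}, "Exp", ["id", "+", "id", "$"])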

42
LL(1) grammars
  • No left recursion.
  • A → A α : if this production is chosen, the parse
    makes no progress.
  • No common prefixes.
  • A → α β | α γ
  • Can fix by left factoring:
  • A  → α A'
  • A' → β | γ

43
LL(1) grammars (cont.)
  • No ambiguity.
  • The precise definition requires that the
    production to choose be unique (the "choose"
    function M is very hard to calculate otherwise).

44
Top-down Parsing
[Figure: top-down parsing grows the parse tree downward from the start symbol L
(the root), whose children are E0 ... En. As the input tokens <t0, t1, ..., ti, ...>
are consumed from left to right, the remaining input <ti, ...> is matched against
the growing frontier.]
45
Checking LL(1)-ness
  • For any sequence of grammar symbols α, define the
    set FIRST(α) ⊆ Σ to be those tokens a such that
    α ⇒* a β for some β.
  • (Notation: α ⇒* a β means α derives a β in zero
    or more steps.)

46
Checking LL(1)-ness
  • Define: Grammar G = (N, Σ, P, S) is LL(1) if,
    whenever there are two left-most derivations (in
    which the leftmost non-terminal is always
    expanded first)
  • S ⇒* w A γ ⇒ w α γ ⇒* w x
  • S ⇒* w A γ ⇒ w β γ ⇒* w y
  • such that FIRST(x) = FIRST(y), it follows that
    α = β.
  • In other words, given
  • 1. a string w A γ in V* and
  • 2. the first terminal symbol to be derived from
    A γ, say t,
  • there is at most one production that can be
    applied to A to yield a derivation of any
    terminal string beginning with w t.
  • FIRST sets can often be calculated by inspection.

47
FIRST Sets
Exp → Term Exp'    Exp' → $ | + Exp    Term → id
(Use $ for the end-of-input symbol.)

FIRST(Term Exp') = {id}
FIRST($) = {$} and FIRST(+ Exp) = {+}, which implies
FIRST($) ∩ FIRST(+ Exp) = ∅
FIRST(id) = {id}
⇒ the grammar is LL(1)
48
FIRST Sets
H:  L → E L | E      E → a | b
FIRST(E L) = {a, b} = FIRST(E), so
FIRST(E L) ∩ FIRST(E) ≠ ∅  ⇒  H is not LL(1).

49
How to compute FIRST Sets of Vocabulary Symbols
  • Algorithm. Compute FIRST(X) for all grammar
    symbols X
  • forall X ∈ V do FIRST(X) := ∅
  • forall X ∈ Σ (X is a terminal) do FIRST(X) := {X}
  • forall productions X → ε do
      FIRST(X) := FIRST(X) ∪ {ε}
  • repeat
  •   forall productions X → Y1 Y2 ... Yk do
  •     forall i ∈ [1, k] do
  •       FIRST(X) := FIRST(X) ∪ (FIRST(Yi) - {ε})
  •       if ε ∉ FIRST(Yi) then continue outer loop
  •     FIRST(X) := FIRST(X) ∪ {ε}
  • until no more terminals or ε are added to any
    FIRST set
    (A Python sketch of this algorithm follows below.)
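
A Python sketch of this algorithm (the dict-of-lists grammar encoding and the
"eps" marker standing for ε are illustrative assumptions):

EPS = "eps"   # stands for the empty string

def compute_first(grammar, terminals):
    """FIRST(X) for every grammar symbol X.
    grammar: dict nonterminal -> list of productions (lists of symbols, [] = eps)."""
    first = {X: set() for X in grammar}                 # nonterminals start empty
    first.update({t: {t} for t in terminals})           # FIRST(a) = {a} for terminals
    for X, prods in grammar.items():
        if [] in prods:                                 # X -> eps
            first[X].add(EPS)
    changed = True
    while changed:                                      # repeat until nothing is added
        changed = False
        for X, prods in grammar.items():
            for rhs in prods:
                before = len(first[X])
                for Y in rhs:
                    first[X] |= first[Y] - {EPS}        # FIRST(X) += FIRST(Yi) - {eps}
                    if EPS not in first[Y]:
                        break                           # Yi cannot vanish: stop here
                else:
                    first[X].add(EPS)                   # every Yi can derive eps
                changed |= len(first[X]) != before
    return first

# Grammar of transparency 51:  A -> T x | T y,   T -> w | eps
g = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
print(compute_first(g, {"w", "x", "y"}))
# gives FIRST(A) = {'w', 'x', 'y'} and FIRST(T) = {'w', 'eps'} (plus the terminals)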

50
How to compute FIRST Sets of Strings of Symbols
  • FIRST(X1 X2 ... Xn) is the union of FIRST(X1) and
    all FIRST(Xi) such that ε ∈ FIRST(Xk) for
    k = 1, 2, ..., i-1
  • FIRST(X1 X2 ... Xn) contains ε iff ε ∈ FIRST(Xk)
    for k = 1, 2, ..., n.
    (A Python sketch follows below.)
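
A corresponding sketch for strings of symbols. The FIRST sets of the individual
symbols are supplied here as a literal dict (matching the example above) so the
snippet stands on its own:

EPS = "eps"

def first_of_string(symbols, first):
    """FIRST(X1 X2 ... Xn), given the FIRST sets of the individual symbols."""
    result = set()
    for X in symbols:
        result |= first[X] - {EPS}          # add FIRST(Xi) as long as X1..Xi-1 can vanish
        if EPS not in first[X]:
            return result                   # Xi cannot derive eps: stop here
    result.add(EPS)                         # every Xi can derive eps, so eps is included
    return result

first = {"A": {"w", "x", "y"}, "T": {"w", EPS},
         "w": {"w"}, "x": {"x"}, "y": {"y"}}
print(sorted(first_of_string(["T", "x"], first)))   # ['w', 'x']
print(sorted(first_of_string(["T"], first)))        # ['eps', 'w']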

51
FIRST Sets do not Suffice
  • Given the productions
  • A → T x
  • A → T y      T → w      T → ε
  • T → w should be applied when the next input token
    is w.
  • T → ε should be applied whenever the next terminal
    (the one pointed to by ip) is either x or y.

52
FOLLOW Sets
  • For any nonterminal X, define the set
    FOLLOW(X) ⊆ Σ to be those tokens a such that
    S ⇒* α X a β for some α and β.

53
How to compute the FOLLOW Set
  • Algorithm. Compute FOLLOW(X) for all nonterminals
    X
  • FOLLOW(S) := {$}
  • forall productions A → α B β do
      FOLLOW(B) := FOLLOW(B) ∪ (FIRST(β) - {ε})
  • repeat
  •   forall productions A → α B, or A → α B β with
      ε ∈ FIRST(β), do
  •     FOLLOW(B) := FOLLOW(B) ∪ FOLLOW(A)
  • until all FOLLOW sets remain the same
    (A Python sketch of this algorithm follows below.)
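
A Python sketch of the FOLLOW computation, reusing the first_of_string helper of
transparency 50; the grammar and the FIRST sets are those of the transparency-51
example, given as literals so the snippet runs on its own (the encoding is
illustrative):

EPS = "eps"

def first_of_string(symbols, first):
    out = set()
    for X in symbols:
        out |= first[X] - {EPS}
        if EPS not in first[X]:
            return out
    return out | {EPS}

def compute_follow(grammar, start, first):
    """FOLLOW(X) for every nonterminal X, given precomputed FIRST sets."""
    follow = {A: set() for A in grammar}
    follow[start].add("$")                              # FOLLOW(S) starts as {$}
    for A, prods in grammar.items():                    # A -> alpha B beta:
        for rhs in prods:                               #   FOLLOW(B) += FIRST(beta) - {eps}
            for i, B in enumerate(rhs):
                if B in grammar:
                    follow[B] |= first_of_string(rhs[i + 1:], first) - {EPS}
    changed = True
    while changed:                                      # A -> alpha B beta, beta =>* eps:
        changed = False                                 #   FOLLOW(B) += FOLLOW(A)
        for A, prods in grammar.items():
            for rhs in prods:
                for i, B in enumerate(rhs):
                    if B in grammar and EPS in first_of_string(rhs[i + 1:], first):
                        before = len(follow[B])
                        follow[B] |= follow[A]
                        changed |= len(follow[B]) != before
    return follow

# Grammar of transparency 51:  A -> T x | T y,   T -> w | eps   (start symbol A)
g = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
first = {"A": {"w", "x", "y"}, "T": {"w", EPS}, "w": {"w"}, "x": {"x"}, "y": {"y"}}
print(compute_follow(g, "A", first))   # FOLLOW(A) = {'$'},  FOLLOW(T) = {'x', 'y'}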

54
Construction of a predictive parsing table
  • Algorithm. Construction of a predictive parsing
    table
  • M[A, a] := ∅ for all A and a
  • forall productions A → α do
  •   forall a ∈ FIRST(α) do
  •     M[A, a] := M[A, a] ∪ {A → α}
  •   if ε ∈ FIRST(α) then
  •     forall b ∈ FOLLOW(A) do
  •       M[A, b] := M[A, b] ∪ {A → α}
  • Make all remaining empty entries of M be error
    (A Python sketch of this algorithm follows below.)
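
A Python sketch of the table construction. Entries are kept as lists of
productions so that a multiply-defined entry shows up explicitly instead of
being overwritten; the first_of_string helper and the FIRST/FOLLOW literals
(for the transparency-51 grammar) are as in the previous sketches:

EPS = "eps"

def first_of_string(symbols, first):
    out = set()
    for X in symbols:
        out |= first[X] - {EPS}
        if EPS not in first[X]:
            return out
    return out | {EPS}

def build_table(grammar, first, follow):
    """Predictive parsing table M[A, a].  Empty entries (missing keys) mean error."""
    M = {}
    for A, prods in grammar.items():
        for rhs in prods:
            f = first_of_string(rhs, first)
            lookaheads = (f - {EPS}) | (follow[A] if EPS in f else set())
            for a in lookaheads:
                M.setdefault((A, a), []).append(rhs)    # M[A, a] := M[A, a] U {A -> rhs}
    return M

# Grammar of transparency 51:  A -> T x | T y,   T -> w | eps
g = {"A": [["T", "x"], ["T", "y"]], "T": [["w"], []]}
first = {"A": {"w", "x", "y"}, "T": {"w", EPS}, "w": {"w"}, "x": {"x"}, "y": {"y"}}
follow = {"A": {"$"}, "T": {"x", "y"}}
for (A, a), entries in sorted(build_table(g, first, follow).items()):
    print(f"M[{A}, {a}] =", " / ".join(" ".join(r) or "eps" for r in entries))

On this grammar the entry M[A, w] receives both A-productions (they share the
prefix T, so the grammar is not LL(1)), while FOLLOW(T) = {x, y} places T → ε in
M[T, x] and M[T, y], as motivated on transparency 51. The next transparency
turns this multiply-defined-entry condition into a definition.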

55
Another Definition of LL(1)
  • Define: Grammar G is LL(1) if, for every A ∈ N
    with productions A → α1 | ... | αn,
  • FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = ∅
    for all i ≠ j.
    (A Python sketch of this check follows below.)
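
This condition can be checked mechanically. A sketch (the FIRST and FOLLOW sets
are supplied as literals so the snippet stands alone; since FOLLOW(A) contains
only tokens, FIRST(αi FOLLOW(A)) is computed as FIRST(αi) with FOLLOW(A) added
when αi can derive ε):

from itertools import combinations

EPS = "eps"

def first_of_string(symbols, first):
    out = set()
    for X in symbols:
        out |= first[X] - {EPS}
        if EPS not in first[X]:
            return out
    return out | {EPS}

def is_ll1(grammar, first, follow):
    """Check that, for each nonterminal A, the sets FIRST(ai FOLLOW(A)) of its
    alternatives a1, ..., an are pairwise disjoint."""
    for A, prods in grammar.items():
        selection = []
        for rhs in prods:
            f = first_of_string(rhs, first)
            selection.append((f - {EPS}) | (follow[A] if EPS in f else set()))
        for s1, s2 in combinations(selection, 2):
            if s1 & s2:
                return False        # two alternatives of A compete for a lookahead token
    return True

# H:  L -> E L | E,  E -> a | b   (transparency 38) is not LL(1)
h = {"L": [["E", "L"], ["E"]], "E": [["a"], ["b"]]}
first_h = {"L": {"a", "b"}, "E": {"a", "b"}, "a": {"a"}, "b": {"b"}}
follow_h = {"L": {"$"}, "E": {"a", "b", "$"}}
print(is_ll1(h, first_h, follow_h))   # False

# The Exp grammar of transparency 39 is LL(1)
e = {"Exp": [["Term", "Exp'"]], "Exp'": [["$"], ["+", "Exp"]], "Term": [["id"]]}
first_e = {"Exp": {"id"}, "Exp'": {"$", "+"}, "Term": {"id"},
           "id": {"id"}, "+": {"+"}, "$": {"$"}}
follow_e = {"Exp": {"$"}, "Exp'": {"$"}, "Term": {"$", "+"}}
print(is_ll1(e, first_e, follow_e))   # True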