Parsing - PowerPoint PPT Presentation

Slides: 65
Provided by: samk153

Transcript and Presenter's Notes

Title: Parsing


1
Parsing
2
Parsing
  • Calculate grammatical structure of program, like
    diagramming sentences, where
  • Tokens = words
  • Programs = sentences

For further information, read Aho, Sethi, and
Ullman, Compilers: Principles, Techniques, and
Tools (a.k.a. the Dragon Book)
3
Outline of coverage
  • Context-free grammars
  • Parsing
  • Tabular Parsing Methods
  • One pass
  • Top-down
  • Bottom-up
  • Yacc

4
Parser extracts grammatical structure of program
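[Figure: parse tree of a function definition. The function-def node has children name (main), arguments, and stmt-list; the stmt is an expression built from an expression (variable cout), an operator (<<), and an expression (string "hello, world\n").]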
5
Context-free languages
  • Grammatical structure defined by context-free
    grammar
  • statement → labeled-statement
              | expression-statement
              | compound-statement
  • labeled-statement → ident : statement
              | case constant-expression : statement
  • compound-statement → { declaration-list statement-list }

Context-free: only one non-terminal in the
left part of each production
6
Parse trees
  • Parse tree: tree labeled with grammar symbols,
    such that
  • If node is labeled A, and its children are
    labeled x1...xn, then there is a production
    A → x1...xn
  • Parse tree from A: root labeled with A
  • Complete parse tree: all leaves labeled with
    tokens

7
Parse trees and sentences
  • Frontier of tree: labels on leaves (in
    left-to-right order)
  • Frontier of tree from S is a sentential form
  • Frontier of a complete tree from S is a sentence

8
Example
  • G: L → L E | E     E → a | b
  • Syntax trees from start symbol (L)

Sentential forms
9
Derivations
  • Alternate definition of sentence
  • Given α, β in V*, say α ⇒ β is a derivation step if
    α = α′Aα″ and β = α′γα″, where A → γ is a
    production
  • α is a sentential form iff there exists a
    derivation (sequence of derivation steps)
    S ⇒ ... ⇒ α (alternatively, we say that S ⇒* α)

Two definitions are equivalent, but note that
there are many derivations corresponding to each
parse tree
10
Another example
  • H: L → E L | E     E → a | b

[Figure: partial and complete parse trees rooted at L, with interior nodes L and E and leaves a and b.]
11
Ambiguity
  • For some purposes, it is important to know
    whether a sentence can have more than one parse
    tree
  • A grammar is ambiguous if there is a sentence
    with more than one parse tree
  • Example: E → E + E | E * E | id
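
To make the definition concrete, here is a minimal sketch that counts the parse trees of a token list under the example grammar, assuming the two operators lost from the slide are + and * (so E → E + E | E * E | id). The name count_trees is illustrative only and not part of the slides.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees deriving toks[i:j] from E.
        if j - i == 1:
            return 1 if toks[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if toks[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(toks))

print(count_trees("id + id * id".split()))  # prints 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous.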

12
  • If e then if b then d else f
  • int x y 0
  • A.b.c d
  • Id → s | s.id
  • E ⇒ E + T ⇒ E + T + T ⇒ T + T + T ⇒ id + T + T
    ⇒ id + T * id + T ⇒ id + id * id + T
    ⇒ id + id * id + id

13
Ambiguity
  • Ambiguity is a function of the grammar rather
    than the language
  • Certain ambiguous grammars may have equivalent
    unambiguous ones

14
Grammar Transformations
  • Grammars can be transformed without affecting the
    language generated
  • Three transformations are discussed next
  • Eliminating Ambiguity
  • Eliminating Left Recursion (i.e. productions of
    the form A → A α)
  • Left Factoring

15
Eliminating Ambiguity
  • Sometimes an ambiguous grammar can be rewritten
    to eliminate ambiguity
  • For example, expressions involving additions and
    products can be written as follows
  • E → E + T | T
  • T → T * id | id
  • The language generated by this grammar is the
    same as that generated by the grammar on
    transparency 11. Both generate id(+id|*id)*
  • However, this grammar is not ambiguous

16
Eliminating Ambiguity (Cont.)
  • One advantage of this grammar is that it
    represents the precedence between operators. In
    the parsing tree, products appear nested within
    additions

17
Eliminating Ambiguity (Cont.)
  • An example of ambiguity in a programming language
    is the dangling else
  • Consider
  • S → if b then S else S | if b then S | a

18
Eliminating Ambiguity (Cont.)
  • When there are two nested ifs and only one else,
    the else can be attached to either if

19
Eliminating Ambiguity (Cont.)
  • In most languages (including C and Java), each
    else is assumed to belong to the nearest if that
    is not already matched by an else. This
    association is expressed in the following
    (unambiguous) grammar
  • S → Matched
      | Unmatched
  • Matched → if b then Matched else Matched
            | a
  • Unmatched → if b then S
              | if b then Matched else Unmatched

20
Eliminating Ambiguity (Cont.)
  • Ambiguity is a property of the grammar
  • It is undecidable whether a context free grammar
    is ambiguous
  • The proof is done by reduction from Post's
    correspondence problem
  • Although there is no general algorithm, it is
    possible to isolate certain constructs in
    productions which lead to ambiguous grammars

21
Eliminating Ambiguity (Cont.)
  • For example, a grammar containing the productions
    A → AA | α would be ambiguous, because the
    substring ααα has two parses

[Figure: the two parse trees for ααα, one grouping the first two α's under a common A and the other grouping the last two.]

  • This ambiguity disappears if we use the
    productions
  • A → AB | B and B → α
  • or the productions
  • A → BA | B and B → α.

22
Eliminating Ambiguity (Cont.)
  • Examples of ambiguous productions
  • A → AαA
  • A → αA | Aβ and
  • A → αA | αAβA
  • A language generated by an ambiguous CFG is
    inherently ambiguous if it has no unambiguous CFG
  • An example of such a language is
  • L = {a^i b^j c^m | i = j or j = m}, which can be
    generated by the grammar
  • S → AB | DC
  • A → aA | ε     C → cC | ε
  • B → bBc | ε    D → aDb | ε

23
Elimination of Left Recursion
  • A grammar is left recursive if it has a
    nonterminal A and a derivation A ⇒+ Aα for some
    string α. Top-down parsing methods (to be
    discussed shortly) cannot handle left-recursive
    grammars, so a transformation to eliminate left
    recursion is needed.
  • Immediate left recursion (productions of the form
    A → Aα) can be easily eliminated.
  • We group the A-productions as
  • A → Aα1 | Aα2 | ... | Aαm | β1 | β2 | ... | βn
  • where no βi begins with A. Then we replace the
    A-productions by
  • A → β1A′ | β2A′ | ... | βnA′
  • A′ → α1A′ | α2A′ | ... | αmA′ | ε
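
The replacement rule above can be sketched in a few lines. This is only an illustration under stated assumptions: production bodies are tuples of symbols, the empty tuple stands for ε, the new nonterminal is named by appending a prime, and there is no production A → A. The function name eliminate_immediate is hypothetical.

```python
def eliminate_immediate(nt, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn as
    A -> b1 A' | ... | bn A' and A' -> a1 A' | ... | am A' | epsilon.

    Bodies are tuples of symbols; () stands for epsilon.
    Returns a dict mapping each nonterminal to its new list of bodies.
    """
    # The alpha parts: bodies that start with A, with the leading A removed.
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    # The beta parts: bodies that do not start with A.
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: list(productions)}
    new_nt = nt + "'"  # assumed to be a fresh name
    return {
        nt: [b + (new_nt,) for b in others],
        new_nt: [a + (new_nt,) for a in recursive] + [()],
    }

# E -> E + T | T  becomes  E -> T E'  and  E' -> + T E' | epsilon
print(eliminate_immediate("E", [("E", "+", "T"), ("T",)]))
```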

24
Elimination of Left Recursion (Cont.)
  • The previous transformation, however, does not
    eliminate left recursion involving two or more
    steps. For example, consider the grammar
  • S → Aa | b
  • A → Ac | Sd | ε
  • S is left-recursive because S ⇒ Aa ⇒ Sda, but it is
    not immediately left recursive

25
Elimination of Left Recursion (Cont.)
  • Algorithm. Eliminate left recursion
  • Arrange nonterminals in some order A1, A2, ..., An
  • for i = 1 to n
  •   for j = 1 to i-1
  •     replace each production of the form Ai → Aj γ
  •     by the productions Ai → δ1 γ | δ2 γ | ... | δk γ
  •     where Aj → δ1 | δ2 | ... | δk are all the current
        Aj-productions
  •   eliminate the immediate left recursion among the
      Ai-productions
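
A minimal sketch of this algorithm follows, applied to the grammar S → Aa | b, A → Ac | Sd | ε from the previous slide. The representation (a dict from nonterminal to a list of tuple bodies, with () for ε), the primed names for new nonterminals, and the function name are assumptions made for illustration. The sketch also assumes the grammar has no cycles A ⇒+ A.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict nonterminal -> list of bodies (tuples of symbols).

    `order` lists the nonterminals A1..An. Epsilon is the empty tuple ().
    Returns a new grammar dict without left recursion.
    """
    g = {a: list(ps) for a, ps in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body and body[0] == aj:
                    # Replace Ai -> Aj gamma by Ai -> delta gamma
                    # for every current production Aj -> delta.
                    expanded += [delta + body[1:] for delta in g[aj]]
                else:
                    expanded.append(body)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai-productions.
        rec = [b[1:] for b in g[ai] if b and b[0] == ai]
        if rec:
            new = ai + "'"  # assumed to be a fresh name
            rest = [b for b in g[ai] if not b or b[0] != ai]
            g[ai] = [b + (new,) for b in rest]
            g[new] = [a + (new,) for a in rec] + [()]
    return g

example = {"S": [("A", "a"), ("b",)], "A": [("A", "c"), ("S", "d"), ()]}
result = eliminate_left_recursion(example, ["S", "A"])
for nt, bodies in result.items():
    print(nt, "->", " | ".join(" ".join(b) or "ε" for b in bodies))
```

With the order S, A the sketch yields S → A a | b, A → b d A′ | A′, and A′ → c A′ | a d A′ | ε.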

26
Elimination of Left Recursion (Cont.)
  • To show that the previous algorithm actually
    works, all we need to notice is that iteration i only
    changes productions with Ai on the left-hand
    side, and that afterwards m > i in all productions
    of the form Ai → Am α
  • Induction proof
  • Clearly true for i = 1
  • If it is true for all i < k, then when the outer
    loop is executed for i = k, the inner loop will
    remove all productions Ai → Am α with m < i
  • Finally, with the elimination of self recursion,
    m in the Ai → Am α productions is forced to be > i
  • So, at the end of the algorithm, all derivations
    of the form Ai ⇒+ Am α will have m > i and therefore
    left recursion would not be possible

27
Left Factoring
  • Left factoring helps transform a grammar for
    predictive parsing
  • For example, if we have the two productions
  • S → if b then S else S
      | if b then S
  • on seeing the input token if, we cannot
    immediately tell which production to choose to
    expand S
  • In general, if we have A → αβ1 | αβ2 and the
    input begins with a nonempty string derived from α,
    we do not know (without looking further) which
    production to use to expand A

28
Left Factoring (Cont.)
  • However, we may defer the decision by expanding A
    to αA′
  • Then after seeing the input derived from α, we
    may expand A′ to β1 or to β2
  • Left-factored, the original productions become
  • A → αA′
  • A′ → β1 | β2
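
One round of left factoring can be sketched as follows. The helper names, the tuple representation of production bodies (with () for ε), and the primed names for new nonterminals are assumptions for illustration. Alternatives are grouped by their first symbol and the longest common prefix of each group is factored out.

```python
def common_prefix(bodies):
    """Longest common prefix (as a tuple) of a non-empty list of tuples."""
    prefix = bodies[0]
    for b in bodies[1:]:
        n = 0
        for x, y in zip(prefix, b):
            if x != y:
                break
            n += 1
        prefix = prefix[:n]
    return prefix

def left_factor(nt, bodies):
    """One round of left factoring for the alternatives of `nt`."""
    groups = {}
    for b in bodies:
        groups.setdefault(b[:1], []).append(b)  # group by first symbol
    result = {nt: []}
    for first, group in groups.items():
        if len(group) == 1 or not first:
            result[nt] += group            # nothing to factor
            continue
        alpha = common_prefix(group)
        new_nt = nt + "'" * len(result)    # assumed to be a fresh name
        result[nt].append(alpha + (new_nt,))
        result[new_nt] = [b[len(alpha):] for b in group]
    return result

stmt = [("if", "b", "then", "S", "else", "S"), ("if", "b", "then", "S"), ("a",)]
print(left_factor("S", stmt))
```

For the if-then-else productions this gives S → if b then S S′ | a and S′ → else S | ε, so the choice between the two forms is deferred until after the common prefix has been read.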

29
Non-Context-Free Language Constructs
  • Examples of non-context-free languages are
  • L1 = {wcw | w is of the form (a|b)*}
  • L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
  • L3 = {a^n b^n c^n | n ≥ 0}
  • Languages similar to these that are context free
  • L1′ = {wcw^R | w is of the form (a|b)*} (w^R stands
    for w reversed)
  • This language is generated by the grammar
  • S → aSa | bSb | c
  • L2′ = {a^n b^m c^m d^n | n ≥ 1 and m ≥ 1}
  • This language is generated by the grammar
  • S → aSd | aAd
  • A → bAc | bc

30
Non-Context-Free Language Constructs (Cont.)
  • L2″ = {a^n b^n c^m d^m | n ≥ 1 and m ≥ 1}
  • is generated by the grammar
  • S → AB
  • A → aAb | ab
  • B → cBd | cd
  • L3′ = {a^n b^n | n ≥ 1}
  • is generated by the grammar
  • S → aSb | ab
  • This language is not definable by any regular
    expression

31
Non-Context-Free Language Constructs (Cont.)
  • Suppose we could construct a DFSM D accepting
    L3′ = {a^n b^n | n ≥ 1}.
  • D must have a finite number of states, say k.
  • Consider the sequence of states s0, s1, s2, ..., sk
    entered by D having read ε, a, aa, ..., a^k.
  • Since D only has k states, two of the k+1 states in
    the sequence have to be equal. Say, si = sj
    (i ≠ j, with i ≥ 1; at most one of the two indices
    can be 0).
  • From si, a sequence of i b's leads to an accepting
    (final) state. Therefore, the same sequence of i
    b's will also lead to an accepting state from sj.
    Therefore D would accept a^j b^i, which means that
    the language accepted by D is not identical to
    L3′. A contradiction.

32
Parsing
  • The parsing problem is: given a string of tokens
    w, find a parse tree whose frontier is w.
    (Equivalently, find a derivation of w from S.)
  • A parser for a grammar G reads a list of tokens
    and finds a parse tree if they form a sentence
    (or reports an error otherwise)
  • Two classes of algorithms for parsing
  • Top-down
  • Bottom-up

33
Parser generators
  • A parser generator is a program that reads a
    grammar and produces a parser
  • The best known parser generator is yacc. It
    produces bottom-up parsers
  • Most parser generators, including yacc, do not
    work for every CFG; they accept a restricted
    class of CFGs that can be parsed efficiently
    using the method employed by that parser generator

34
Top-down parsing
  • Starting from parse tree containing just S, build
    tree down toward input. Expand left-most
    non-terminal.
  • Algorithm (next slide)

35
Top-down parsing (cont.)
  • Let input = a1a2...an
  • current sentential form (csf) = S
  • loop
  •   suppose csf = t1...tk A α
  •   if t1...tk ≠ a1...ak, it's an error
  •   based on ak+1..., choose production A → β
  •   csf becomes t1...tk β α
36
Top-down parsing example
  • Grammar H: L → E L | E
    E → a | b
  • Input: ab
  • Parse tree / Sentential form / Input (tree drawings not reproduced)

L      ab
E L    ab
a L    ab
37
Top-down parsing example (cont.)
  • Parse tree / Sentential form / Input (tree drawings not reproduced)

a E    ab
a b    ab
38
LL(1) parsing
  • Efficient form of top-down parsing
  • Use only first symbol of remaining input (ak+1)
    to choose next production. That is, employ a
    function M: Σ × N → P in the "choose production"
    step of the algorithm.
  • When this works, grammar is called LL(1)

39
LL(1) examples
  • Example 1
  • H: L → E L | E     E → a | b
  • Given input ab, so next symbol is a.
  • Which production to use? Can't tell.
  • ⇒ H not LL(1)

40
LL(1) examples
  • Example 2
  • Exp → Term Exp′
  • Exp′ → $ | + Exp
  • Term → id
  • (Use $ for end-of-input symbol.)

Grammar is LL(1): Exp and Term have only one
production; Exp′ has two productions but only
one is applicable at any time.
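
Because one token of lookahead selects the production for Exp′, this grammar can be parsed by recursive descent. The sketch below assumes the operator lost from the slide is + (Exp′ → $ | + Exp) and represents tokens as plain strings. The function names are illustrative.

```python
def parse(tokens):
    """Recursive-descent recognizer for
    Exp -> Term Exp',  Exp' -> $ | + Exp,  Term -> id.
    Returns the list of productions used, in order of expansion.
    """
    toks = list(tokens) + ["$"]   # append the end-of-input symbol
    pos = 0
    used = []

    def expect(t):
        nonlocal pos
        if toks[pos] != t:
            raise SyntaxError(f"expected {t!r}, got {toks[pos]!r}")
        pos += 1

    def exp():
        used.append("Exp -> Term Exp'")
        term()
        exp_rest()

    def exp_rest():
        # The next input symbol alone decides which production applies.
        if toks[pos] == "$":
            used.append("Exp' -> $")
            expect("$")
        elif toks[pos] == "+":
            used.append("Exp' -> + Exp")
            expect("+")
            exp()
        else:
            raise SyntaxError(f"unexpected {toks[pos]!r}")

    def term():
        used.append("Term -> id")
        expect("id")

    exp()
    return used

print(parse(["id", "+", "id"]))
```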
41
Nonrecursive predictive parsing
  • It is possible to build a nonrecursive predictive
    parser by maintaining a stack explicitly, rather
    than implicitly via recursive calls
  • The key problem during predictive parsing is that
    of determining the production to be applied for a
    non-terminal

42
Nonrecursive predictive parsing
  • Algorithm. Nonrecursive predictive parsing
  • Set ip to point to the first symbol of w$.
  • repeat
  •   Let X be the top of the stack symbol and a the
      symbol pointed to by ip
  •   if X is a terminal or $ then
  •     if X = a then
  •       pop X from the stack and advance ip
  •     else error()
  •   else // X is a nonterminal
  •     if M[X,a] = X → Y1 Y2 ... Yk then
  •       pop X from the stack
  •       push Yk, Yk-1, ..., Y1 onto the stack with Y1 on
          top
  •       (push nothing if Y1 Y2 ... Yk is ε)
  •       output the production X → Y1 Y2 ... Yk
  •     else error()
  • until X = $
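
The loop above can be sketched as follows. The parsing table here is written out by hand for a small grammar chosen for illustration (E → T E′, E′ → + T E′ | ε, T → id), which is not one of the grammars on the slides. The stack is assumed to start with $ below the start symbol, and the table is assumed to be a dict keyed by (nonterminal, lookahead).

```python
def predictive_parse(table, start, tokens, nonterminals):
    """Table-driven LL(1) parsing; returns the productions output."""
    w = list(tokens) + ["$"]
    stack = ["$", start]          # assumed initial stack: $ below the start symbol
    ip = 0
    output = []
    while stack:
        x, a = stack[-1], w[ip]
        if x not in nonterminals:          # X is a terminal or $
            if x != a:
                raise SyntaxError(f"expected {x!r}, got {a!r}")
            stack.pop()
            ip += 1
        else:                              # X is a nonterminal
            body = table.get((x, a))
            if body is None:
                raise SyntaxError(f"no entry M[{x}, {a}]")
            stack.pop()
            stack.extend(reversed(body))   # Y1 ends up on top
            output.append((x, body))
    return output

# Hand-written table for the illustrative grammar; () stands for epsilon.
NONTERMINALS = {"E", "E'", "T"}
TABLE = {
    ("E", "id"): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"),
    ("E'", "$"): (),
    ("T", "id"): ("id",),
}

for head, body in predictive_parse(TABLE, "E", ["id", "+", "id"], NONTERMINALS):
    print(head, "->", " ".join(body) or "ε")
```

The loop ends when $ on the stack is matched against $ in the input, which corresponds to the "until X = $" test.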

43
LL(1) grammars
  • No left recursion
  • A → Aα: If this production is chosen, parse
    makes no progress.
  • No common prefixes
  • A → αβ | αγ
  • Can fix by left factoring
  • A → αA′
  • A′ → β | γ

44
LL(1) grammars (cont.)
  • No ambiguity
  • Precise definition requires that production to
    choose be unique (choose function M very hard
    to calculate otherwise)

45
Top-down Parsing
[Figure: top-down parsing. L is the start symbol and root of the parse tree, with children E0 ... En above the input tokens t0, t1, ..., ti, ...; from left to right, the parse tree grows downwards.]
46
Checking LL(1)-ness
  • For any sequence of grammar symbols α, define set
    FIRST(α) ⊆ Σ to be
  • FIRST(α) = { a | α ⇒* aβ for some β }

47
Checking LL(1)-ness
  • Define: Grammar G = (N, Σ, P, S) is LL(1) iff
    whenever there are two left-most derivations (in
    which the leftmost non-terminal is always
    expanded first)
  • S ⇒* wAα ⇒ wβα ⇒* wx
  • S ⇒* wAα ⇒ wγα ⇒* wy
  • such that FIRST(x) = FIRST(y), it follows that
    β = γ
  • In other words, given
  • 1. A string wAα in V* and
  • 2. The first terminal symbol to be derived from
    Aα, say t
  • there is at most one production that can be
    applied to A to
  • yield a derivation of any terminal string
    beginning with wt
  • FIRST sets can often be calculated by inspection

48
FIRST Sets
Exp → Term Exp′     Exp′ → $ | + Exp     Term → id
(Use $ for end-of-input symbol)
FIRST($) = {$}     FIRST(+ Exp) = {+}
FIRST($) ∩ FIRST(+ Exp) = ∅  ⇒  grammar is LL(1)
49
FIRST Sets
L → E L | E     E → a | b

FIRST(E L) = {a, b} = FIRST(E)
FIRST(E L) ∩ FIRST(E) ≠ ∅  ⇒  grammar not LL(1).
50
Computing FIRST Sets
  • Algorithm. Compute FIRST(X) for all grammar
    symbols X
  • forall X ∈ V do FIRST(X) = {}
  • forall X ∈ Σ (X is a terminal) do FIRST(X) = {X}
  • forall productions X → ε do FIRST(X) = FIRST(X)
    ∪ {ε}
  • repeat
  •   c: forall productions X → Y1 Y2 ... Yk do
  •     forall i ∈ [1,k] do
  •       FIRST(X) = FIRST(X) ∪ (FIRST(Yi) − {ε})
  •       if ε ∉ FIRST(Yi) then continue c
  •     FIRST(X) = FIRST(X) ∪ {ε}
  • until no more terminals or ε are added to any
    FIRST set
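
The fixed-point computation above can be sketched as follows. The grammar representation (dict from nonterminal to a list of tuple bodies, with () for ε) and the string "ε" used as the epsilon marker are assumptions for illustration. The example grammar assumes the operator lost from the Exp grammar is +.

```python
EPS = "ε"   # marker for the empty string (an assumption of this sketch)

def first_sets(grammar, terminals):
    """grammar: dict nonterminal -> list of bodies (tuples; () is epsilon)."""
    first = {t: {t} for t in terminals}
    first.update({nt: set() for nt in grammar})
    changed = True
    while changed:                      # repeat until nothing is added
        changed = False
        for x, bodies in grammar.items():
            for body in bodies:
                before = len(first[x])
                for y in body:
                    first[x] |= first[y] - {EPS}
                    if EPS not in first[y]:
                        break           # Yi is not nullable: stop here
                else:
                    # Every Yi can derive epsilon (or the body is empty).
                    first[x].add(EPS)
                if len(first[x]) != before:
                    changed = True
    return first

exp_grammar = {
    "Exp": [("Term", "Exp'")],
    "Exp'": [("$",), ("+", "Exp")],
    "Term": [("id",)],
}
exp_first = first_sets(exp_grammar, {"id", "+", "$"})
print(exp_first["Exp"], exp_first["Exp'"])
```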

51
FIRST Sets of Strings of Symbols
  • FIRST(X1X2...Xn) is the union of FIRST(X1) and all
    FIRST(Xi) such that ε ∈ FIRST(Xk) for k = 1, 2, ...,
    i-1 (excluding ε itself)
  • FIRST(X1X2...Xn) contains ε iff ε ∈ FIRST(Xk) for
    k = 1, 2, ..., n

52
FIRST Sets do not Suffice
  • Given the productions
  • A → T x
  • A → T y     T → w     T → ε
  • T → w should be applied when the next input token
    is w.
  • T → ε should be applied whenever the next terminal
    (the one pointed to by ip) is either x or y

53
FOLLOW Sets
  • For any nonterminal X, define set FOLLOW(X) ⊆ Σ
    as
  • FOLLOW(X) = { a | S ⇒* αXaβ }

54
Computing the FOLLOW Set
  • Algorithm. Compute FOLLOW(X) for all nonterminals
    X
  • FOLLOW(S) = {$}
  • forall productions A → αBβ do FOLLOW(B) = FOLLOW(B)
    ∪ (FIRST(β) − {ε})
  • repeat
  •   forall productions A → αB or A → αBβ with ε ∈
      FIRST(β) do
  •     FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A)
  • until all FOLLOW sets remain the same
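
A minimal sketch of the FOLLOW computation, applied to the grammar A → T x | T y, T → w | ε from the "FIRST Sets do not Suffice" slide. It assumes the FIRST sets are already known and passed in as a dict, and it folds the two phases of the algorithm into a single fixed-point loop, which gives the same result. All names are illustrative.

```python
EPS, END = "ε", "$"   # epsilon and end-of-input markers (assumptions)

def first_of(symbols, first):
    """FIRST of a string of symbols, given FIRST of each symbol."""
    out = set()
    for s in symbols:
        out |= first[s] - {EPS}
        if EPS not in first[s]:
            return out
    out.add(EPS)          # every symbol is nullable, or the string is empty
    return out

def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for a, bodies in grammar.items():
            for body in bodies:
                for i, b in enumerate(body):
                    if b not in grammar:
                        continue            # only nonterminals have FOLLOW sets
                    rest = first_of(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:         # beta is empty or nullable
                        new |= follow[a]
                    if not new.issubset(follow[b]):
                        follow[b] |= new
                        changed = True
    return follow

at_grammar = {"A": [("T", "x"), ("T", "y")], "T": [("w",), ()]}
at_first = {"x": {"x"}, "y": {"y"}, "w": {"w"},
            "A": {"w", "x", "y"}, "T": {"w", EPS}}
at_follow = follow_sets(at_grammar, "A", at_first)
print(at_follow)
```

FOLLOW(T) comes out as {x, y}, which is exactly the set of lookahead tokens on which T → ε should be applied.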

55
Construction of a predictive parsing table
  • Algorithm. Construction of a predictive parsing
    table
  • M[A,a] = {} for all A, a
  • forall productions A → α do
  •   forall a ∈ FIRST(α) do
  •     M[A,a] = M[A,a] ∪ {A → α}
  •   if ε ∈ FIRST(α) then
  •     forall b ∈ FOLLOW(A) do
  •       M[A,b] = M[A,b] ∪ {A → α}
  • Make all empty entries of M be error
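
The table construction can be sketched as below. The grammar (E → T E′, E′ → + T E′ | ε, T → id) is chosen for illustration and is not one of the grammars on the slides; its FIRST and FOLLOW sets are written out by hand rather than computed. The table is a dict in which a missing key plays the role of an error entry.

```python
EPS = "ε"   # marker for the empty string (an assumption of this sketch)

def build_table(grammar, first, follow):
    """Return M as a dict (A, a) -> list of bodies; missing keys mean error."""
    def first_of(symbols):
        out = set()
        for s in symbols:
            out |= first[s] - {EPS}
            if EPS not in first[s]:
                return out
        return out | {EPS}

    table = {}
    for a, bodies in grammar.items():
        for body in bodies:
            f = first_of(body)
            lookaheads = f - {EPS}
            if EPS in f:                  # body can derive epsilon
                lookaheads |= follow[a]
            for t in lookaheads:
                table.setdefault((a, t), []).append(body)
    return table

GRAMMAR = {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()], "T": [("id",)]}
FIRST = {"id": {"id"}, "+": {"+"}, "E": {"id"}, "E'": {"+", EPS}, "T": {"id"}}
FOLLOW = {"E": {"$"}, "E'": {"$"}, "T": {"+", "$"}}

M = build_table(GRAMMAR, FIRST, FOLLOW)
# The grammar is LL(1) exactly when no entry holds more than one production.
is_ll1 = all(len(entries) == 1 for entries in M.values())
print(is_ll1)
```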

56
Another Definition of LL(1)
  • Define: Grammar G is LL(1) if for every A ∈ N
    with productions A → α1 | . . . | αn
  • FIRST(αi FOLLOW(A)) ∩ FIRST(αj FOLLOW(A)) = ∅
    for all i ≠ j

57
Regular Languages
  • Definition. A regular grammar is one whose
    productions are all of the type
  • A → aB
  • A → a
  • A Regular Expression is either
  • a
  • R1 | R2
  • R1 R2
  • R*

58
Nondeterministic Finite State Automaton
[Figure: nondeterministic finite state automaton with states 0, 1, 2, 3, start state 0, and transitions labeled a and b.]
59
Regular Languages
  • Theorem. The classes of languages
  • Generated by a regular grammar
  • Expressed by a regular expression
  • Recognized by a NDFS automaton
  • Recognized by a DFS automaton
  • coincide.

60
Deterministic Finite Automaton
[Figure: scanner DFA with states START, NUM, KEYWORD, and OPERATOR, and transitions labeled space/tab/new line, digit, letter, and operator characters such as -, /, (, ).]
61
Scanner code
  • state = start
  • loop
  •   if no input character buffered then read
      one, and add it to the accumulated token
  •   case state of
  •     start:
  •       case input_char of
  •         A..Z, a..z: state = id
  •         0..9: state = num
  •         else ...
  •       end
  •     id:
  •       case input_char of
  •         A..Z, a..z: state = id
  •         0..9: state = id
  •         else ...
  •       end
  •     num:
  •       case input_char of
  •         0..9: ...
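
A runnable sketch in the spirit of the scanner code above. The parts the slide leaves elided ("else ...") are filled in with assumptions: whitespace separates tokens, any other character becomes a one-character op token, and tokens are returned as (kind, lexeme) pairs. None of these choices come from the slides.

```python
import string

LETTERS = set(string.ascii_letters)   # A..Z, a..z
DIGITS = set(string.digits)           # 0..9

def scan(text):
    """Split text into (kind, lexeme) tokens with a three-state DFA."""
    tokens, state, lexeme = [], "start", ""
    for ch in text + " ":              # trailing blank flushes the last token
        if state == "id" and (ch in LETTERS or ch in DIGITS):
            lexeme += ch               # stay in state id
        elif state == "num" and ch in DIGITS:
            lexeme += ch               # stay in state num
        else:
            if state != "start":
                tokens.append((state, lexeme))   # emit the finished token
            if ch in LETTERS:
                state, lexeme = "id", ch
            elif ch in DIGITS:
                state, lexeme = "num", ch
            elif ch.isspace():
                state, lexeme = "start", ""
            else:
                tokens.append(("op", ch))  # assumed: any other char is an operator
                state, lexeme = "start", ""
    return tokens

print(scan("x1 + 42"))
```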

62
Table-driven DFA
63
Language Classes
[Figure: nested language classes, from largest to smallest: L0, CSL, CFL (NPA), LR(1), LL(1), RL (DFA = NFA).]
64
Question
  • Are regular expressions, as provided by Perl or
    other languages, sufficient for parsing nested
    structures, e.g. XML files?