Title: Parsing
1Parsing
Programming Language Principles Lecture 3
- Prepared by
- Manuel E. Bermúdez, Ph.D.
- Associate Professor
- University of Florida
2Context-Free Grammars
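- Definition: A context-free grammar (CFG) is a
quadruple G = (Φ, Σ, P, S), where Φ is the set of
nonterminals, Σ the set of terminals, P the set of
productions, and S ∈ Φ the start symbol. All productions
are of the form A → α, for A ∈ Φ and α ∈ (Φ ∪ Σ)*.
- Re-writing using grammar rules:
- βAγ => βαγ if A → α (derivation).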
3String Derivations
- Left-most derivation: At each step, the
left-most nonterminal is re-written.
- Right-most derivation: At each step, the
right-most nonterminal is re-written.
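The two derivation orders can be sketched in a few lines of Python. The toy grammar E → E + T | T, T → i is a hypothetical choice for illustration, not one taken from the slides, and the `derive` helper and its `choices` parameter are likewise assumptions. Each step rewrites either the left-most or the right-most nonterminal in the current sentential form.

```python
# Hypothetical toy grammar (not from the slides): E -> E + T | T ; T -> i
# Keys of GRAMMAR are nonterminals; every other symbol is a terminal.
GRAMMAR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["i"]],
}

def derive(start, choices, leftmost=True):
    """Rewrite the left-most (or right-most) nonterminal once per step.

    `choices` gives, for each step, the index of the production to apply
    to the nonterminal being rewritten. Returns all sentential forms.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps

left = derive("E", [0, 1, 0, 0], leftmost=True)
right = derive("E", [0, 0, 1, 0], leftmost=False)
print(" => ".join(left))   # E => E + T => T + T => i + T => i + i
print(" => ".join(right))  # E => E + T => E + i => T + i => i + i
```

Both derivations end in the same sentence and use the same productions; only the order of the re-writes differs.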
5Derivation Trees
- Derivation trees
- Describe re-writes, independently of the order
(left-most or right-most).
- Each tree branch matches a production rule in the
grammar.
7Derivation Trees
- Notes
- Leaves are terminals.
- Bottom contour is the sentence.
- Left recursion causes left branching.
- Right recursion causes right branching.
8Goal of Parsing
- Examine input string, determine whether it's
legal.
- Equivalent to building derivation tree.
- Added benefit: tree embodies syntactic structure
of input.
- Therefore, tree should be unique.
9Ambiguous Grammars
- Definition: A CFG is ambiguous if there exist
two different right-most (or left-most, but not
both) derivations for some sentence z.
- (Equivalent) Definition: A CFG is ambiguous if
there exist two different derivation trees for
some sentence z.
10Ambiguous Grammars
- Classic ambiguities:
- Simultaneous left/right recursion:
- E → E + E
-   → i
- Dangling else problem:
- S → if E then S
-   → if E then S else S
-   → ...
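To make the first ambiguity concrete, the sketch below counts derivation trees for a sentence under the rule E → E E with a binary operator between the two E's, assumed here to be `+`, plus E → i. The function name `count_trees` is hypothetical. It tries every operator position as the root of the tree and multiplies the counts for the two sides.

```python
from functools import lru_cache

def count_trees(tokens):
    """Count derivation trees of `tokens` under E -> E + E | i."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct trees with which E derives tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "i" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            if tokens[mid] == "+":      # try this '+' as the root operator
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_trees("i + i".split()))      # 1
print(count_trees("i + i + i".split()))  # 2
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous: `i + i + i` can branch to the left or to the right.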
12Operator Precedence and Associativity
- Let's build a CFG for expressions consisting of:
- elementary identifier i.
- + and - (binary ops) have lowest precedence, and
are left associative.
- * and / (binary ops) have middle precedence, and
are right associative.
- + and - (unary ops) have highest precedence, and
are right associative.
13Corresponding Grammar for Expressions
- E → E + T        E consists of T's,
-   → E - T        separated by -'s and +'s
-   → T            (lowest precedence).
- T → F * T        T consists of F's,
-   → F / T        separated by *'s and /'s
-   → F            (next precedence).
- F → - F          F consists of a single P,
-   → + F          preceded by +'s and -'s
-   → P            (next precedence).
- P → '(' E ')'    P consists of a parenthesized E,
-   → i            or a single i (highest precedence).
14Operator Precedence and Associativity
- Operator precedence:
- The lower in the grammar, the higher the
precedence.
- Operator associativity:
- Tie breaker for precedence.
- Left recursion in the grammar means
- left associativity of the operator,
- left branching in the tree.
- Right recursion in the grammar means
- right associativity of the operator,
- right branching in the tree.
15Building Derivation Trees
- Sample Input:
- - + i - i * ( i + i ) / i + i
- (Human) derivation tree construction:
- Bottom-up.
- On each pass, scan entire expression, process
operators with highest precedence (parentheses
are highest).
- Lowest precedence operators are last, at the top
of tree.
17Abstract Syntax Trees
- AST is a condensed version of the derivation
tree.
- No noise (intermediate nodes).
- String-to-tree transduction grammar:
- rules of the form A → α => 's'.
- Build 's' tree node, with one child per tree from
each nonterminal in α.
18Example
- E → E + T        => +
-   → E - T        => -
-   → T
- T → F * T        => *
-   → F / T        => /
-   → F
- F → - F          => neg
-   → + F          => +
-   → P
- P → '(' E ')'
-   → i            => i
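The transduction rules above can be sketched as a small hand-written parser that returns the AST as nested tuples. This is a minimal sketch, not the lecture's implementation. It assumes the operators are `+ - * /`, that unary minus builds a `neg` node, and that the input arrives as a list of tokens; the function name `parse` and the tuple encoding are hypothetical. The left-recursive E rules are realized as a loop (giving left branching), and the right-recursive T rules as a recursive call (giving right branching).

```python
def parse(tokens):
    """Build an AST (nested tuples) following the expression grammar."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_e():
        # E -> E + T | E - T | T : left recursion, realized as a loop
        node = parse_t()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, parse_t())
        return node

    def parse_t():
        # T -> F * T | F / T | F : right recursion
        left = parse_f()
        if peek() in ("*", "/"):
            op = advance()
            return (op, left, parse_t())
        return left

    def parse_f():
        # F -> - F => neg | + F => + | P
        if peek() == "-":
            advance()
            return ("neg", parse_f())
        if peek() == "+":
            advance()
            return ("+", parse_f())
        return parse_p()

    def parse_p():
        # P -> '(' E ')' | i   (parentheses leave no node in the AST)
        tok = advance()
        if tok == "i":
            return "i"
        if tok == "(":
            node = parse_e()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_e()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("i - i - i".split()))  # ('-', ('-', 'i', 'i'), 'i')
print(parse("i / i / i".split()))  # ('/', 'i', ('/', 'i', 'i'))
```

The two printed trees show the associativity directly: subtraction nests to the left, division to the right.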
19Sample Input: - + i - i * ( i + i ) / i + i
20String-to-Tree Transduction
- We transduce from vocabulary of input symbols, to
vocabulary of tree node names.
- Could eliminate construction of unary + node,
anticipating semantics.
- F → - F          => neg
-   → + F          // no more unary + node
-   → P
21The Game of Syntactic Dominoes
- The grammar:
- E → E+T      T → P*T      P → (E)
-   → T          → P          → i
- The playing pieces: An arbitrary supply of each
piece (one per grammar rule).
- The game board:
- Start domino at the top.
- Bottom dominoes are the "input."
23The Game of Syntactic Dominoes
- Game rules
- Add game pieces to the board.
- Match the flat parts and the symbols.
- Lines are infinitely elastic.
- Object of the game
- Connect start domino with the input dominoes.
- Leave no unmatched flat parts.
24Parsing Strategies
- Same as for the game of syntactic dominoes.
- Top-down parsing: start at the start symbol,
work toward the input string.
- Bottom-up parsing: start at the input string,
work towards the goal symbol.
- In either strategy, can process the input
left-to-right → or right-to-left ←
25Top-Down Parsing
- Attempt a left-most derivation, by predicting the
re-write that will match the remaining input.
- Use a string (a stack, really) from which the
input can be derived.
26Top-Down Parsing
- Start with S on the stack.
- At every step, two alternatives:
- α (the stack) begins with a terminal t. Match t
against the first input symbol.
- α begins with a nonterminal A. Consult an OPF
(Omniscient Parsing Function) to determine which
production for A would lead to a match with the
first symbol of the input.
- The OPF does the "predicting" in such a
predictive parser.
28Classical Top-Down Parsing Algorithm
Push (Stack, S);
while not Empty (Stack) do
  if Top(Stack) ∈ Σ
    then if Top(Stack) = Head(input)
           then input := tail(input);
                Pop(Stack)
           else error (Stack, input)
    else P := OPF (Stack, input);
         Push (Pop(Stack), RHS(P))
od
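The loop above translates almost line for line into Python. In this sketch the OPF is passed in as an ordinary function. The helper names (`top_down_parse`, `toy_opf`), the toy grammar S → A, A → b A d | (empty), and the final check for unconsumed input are assumptions added for illustration; they are not part of the algorithm as stated on the slide.

```python
def top_down_parse(tokens, start, terminals, opf):
    """Stack-driven predictive parse; returns the productions used, in order.

    `opf(stack, tokens)` stands in for the Omniscient Parsing Function: it
    returns the right-hand side (a list of symbols) to expand next.
    """
    stack = [start]                  # top of the stack is the end of the list
    tokens = list(tokens)
    used = []
    while stack:
        top = stack[-1]
        if top in terminals:
            if tokens and tokens[0] == top:
                tokens.pop(0)        # input := tail(input)
                stack.pop()
            else:
                raise SyntaxError(f"expected {top!r}, remaining input {tokens}")
        else:
            rhs = opf(stack, tokens)
            stack.pop()
            stack.extend(reversed(rhs))  # left-most RHS symbol ends up on top
            used.append((top, rhs))
    if tokens:                       # assumption: leftover input is an error
        raise SyntaxError(f"unconsumed input {tokens}")
    return used

def toy_opf(stack, tokens):
    """Hand-written OPF for the toy grammar S -> A ; A -> b A d | (empty)."""
    if stack[-1] == "S":
        return ["A"]
    # Top of stack is A: predict from the first input symbol.
    if tokens and tokens[0] == "b":
        return ["b", "A", "d"]
    return []

for lhs, rhs in top_down_parse("b b d d".split(), "S", {"b", "d"}, toy_opf):
    print(lhs, "->", " ".join(rhs) or "(empty)")
```

The productions are reported in the order of a left-most derivation, which is what a top-down parser constructs.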
30Top-Down Parsing
- Most parsing methods impose bounds on the amount
of stack lookback and input lookahead. For
programming languages, a common choice is (1,1).
- We must define OPF (A,t), where A is the top
element of the stack, and t is the first symbol
on the input.
- Storage requirements: O(n²), where n is the size
of the grammar vocabulary (a few hundred).
31LL(1) Grammars
- Definition:
- A CFG G is LL(1) (Left-to-right, Left-most,
one-symbol lookahead)
- iff for all A ∈ Φ, and for all A → α, A → β, α ≠ β,
- Select (A → α) ∩ Select (A → β) = ∅
- Previous example: Grammar is not LL(1).
- More later on why, and what to do about it.
32Example
- S → A            {b, ⊥}
- A → bAd          {b}
-   → ε            {d, ⊥}
Disjoint! Grammar is LL(1)!
        d          b            ⊥
S                  S → A        S → A
A       A → ε      A → bAd      A → ε
(At most) one production per entry.
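The Select sets can also be computed mechanically. The sketch below uses the usual First/Follow fixed-point iteration, which the slides have not introduced at this point, so treat it as one possible way to obtain them rather than the lecture's method. The function names, the dictionary encoding of the grammar, and the use of `⊥` as the end-of-input marker are assumptions.

```python
END = "⊥"  # assumed end-of-input marker

def select_sets(grammar, start):
    """Return {(lhs, rhs): select set} for a grammar {lhs: [rhs tuples]}."""
    nonterminals = set(grammar)
    first = {a: set() for a in grammar}
    follow = {a: set() for a in grammar}
    nullable = set()
    follow[start].add(END)

    def first_of(symbols):
        """First set of a symbol string, and whether it can derive empty."""
        out = set()
        for sym in symbols:
            if sym not in nonterminals:
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True

    changed = True
    while changed:                       # iterate until nothing grows
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                f, empty = first_of(rhs)
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                if empty and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                for k, sym in enumerate(rhs):
                    if sym in nonterminals:
                        f, empty = first_of(rhs[k + 1:])
                        new = f | (follow[lhs] if empty else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True

    select = {}
    for lhs, alternatives in grammar.items():
        for rhs in alternatives:
            f, empty = first_of(rhs)
            select[(lhs, rhs)] = f | (follow[lhs] if empty else set())
    return select

def is_ll1(grammar, start):
    """True if the select sets of each nonterminal's productions are disjoint."""
    select = select_sets(grammar, start)
    for lhs, alternatives in grammar.items():
        seen = set()
        for rhs in alternatives:
            if seen & select[(lhs, rhs)]:
                return False
            seen |= select[(lhs, rhs)]
    return True

example = {"S": [("A",)], "A": [("b", "A", "d"), ()]}
print(is_ll1(example, "S"))                          # True
print(is_ll1({"E": [("E", "+", "E"), ("i",)]}, "E"))  # False
```

For the example grammar this yields {b, ⊥} for S → A, {b} for A → bAd, and {d, ⊥} for the empty production, so each table entry holds at most one production. The ambiguous E → E + E | i grammar fails the check because both productions select on i.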