Agenda - PowerPoint PPT Presentation

1
Agenda
  • Scanner vs. parser
  • Regular grammar vs. context-free grammar
  • Grammars (context-free grammars)
  • grammar rules
  • derivations
  • parse trees
  • ambiguous grammars
  • useful examples
  • Reading
  • Chapter 2, Sections 4.1 and 4.2

2
Characteristics of a Parser
  • Input sequence of tokens from scanner
  • Output parse tree of the program
  • parse tree is generated (implicitly or
    explicitly) if the input is a legal program
  • if input is an illegal program, syntax errors are
    issued
  • Note
  • Instead of a parse tree, some parsers directly
    produce
  • an abstract syntax tree (AST) + symbol table, or
  • intermediate code, or
  • object code
  • In the following lectures, we'll assume that a
    parse tree is generated.

3
Comparison with Lexical Analysis
Phase            | Input                | Output
Lexical Analysis | String of characters | String of tokens
Syntax Analysis  | String of tokens     | Parse tree
4
Example
  • The program
  • x * y + z
  • Input to parser
  • ID TIMES ID PLUS ID
  • we'll write tokens as follows
  • id * id + id
  • Output of parser
  • the parse tree:

              E
            / | \
           E  +  E
         / | \   |
        E  *  E  id
        |     |
        id    id
5
Why are Regular Grammars Not Enough?
  • Write an automaton that accepts the strings
  • a, (a), ((a)), and (((a)))
  • now: a, (a), ((a)), (((a))), …, with k nested
    parentheses for any k — no finite automaton can
    count unbounded nesting
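The gap between regular and context-free languages shows up directly in code: recognizing arbitrarily deep nesting needs unbounded memory (here, the recursion stack), which no finite automaton has. A minimal sketch for the rule S → ( S ) | a, not part of the lecture:

```python
def accepts(s: str) -> bool:
    """Recognize { '('*k + 'a' + ')'*k : k >= 0 }, i.e. S -> ( S ) | a."""
    def parse(i: int) -> int:
        # Return the index just past the S derived at position i, or -1.
        if i < len(s) and s[i] == 'a':
            return i + 1                       # S -> a
        if i < len(s) and s[i] == '(':
            j = parse(i + 1)                   # S -> ( S )
            if j != -1 and j < len(s) and s[j] == ')':
                return j + 1
        return -1
    return parse(0) == len(s)
```

The recursion depth tracks the nesting depth, so no fixed number of states suffices.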

6
What must the parser do?
  • Recognizer: not all strings of tokens are
    programs
  • must distinguish between valid and invalid
    strings of tokens
  • Translator: must expose program structure
  • e.g., associativity and precedence
  • hence must return the parse tree
  • We need
  • A language for describing valid strings of tokens
  • context-free grammars
  • (analogous to regular grammars in the scanner)
  • A method for distinguishing valid from invalid
    strings of tokens (and for building the parse
    tree)
  • the parser
  • (analogous to the state machine in the scanner)

7
Context-free grammars (CFGs)
  • Example: Simple Arithmetic Expressions Grammar
  • In English:
  • An integer is an arithmetic expression.
  • If exp1 and exp2 are arithmetic expressions,
    then so are the following:
  • exp1 - exp2
  • exp1 / exp2
  • ( exp1 )
  • The corresponding CFG (we'll write tokens as
    shown on the right):
  • exp → INTLITERAL            E → intlit
  • exp → exp MINUS exp         E → E - E
  • exp → exp DIVIDE exp        E → E / E
  • exp → LPAREN exp RPAREN     E → ( E )

8
Reading the CFG
  • The grammar has five terminal symbols
  • intlit, -, /, (, )
  • terminals of a grammar tokens returned by the
    scanner.
  • The grammar has one non-terminal symbol
  • E
  • non-terminals describe valid sequences of tokens
  • The grammar has four productions or rules,
  • each of the form E → α
  • left-hand side: a single non-terminal.
  • right-hand side: either
  • a sequence of one or more terminals and/or
    non-terminals, or
  • ε (an empty production)

9
Example, revisited
  • Note
  • a more compact way to write previous grammar
  • E → INTLITERAL | E - E | E / E | ( E )
  • or
  • E → INTLITERAL
  •   | E - E
  •   | E / E
  •   | ( E )

10
A formal definition of CFGs
  • A CFG consists of
  • A set of terminals T
  • A set of non-terminals N
  • A start symbol S (a non-terminal)
  • A set of productions
  • X → Y1 Y2 … Yn
  • where X ∈ N and Yi ∈ T ∪ N ∪ {ε}

11
Notational Conventions
  • In these lecture notes
  • Non-terminals are written upper-case
  • Terminals are written lower-case
  • The start symbol is the left-hand side of the
    first production

12
The Language of a CFG
  • The language defined by a CFG is the set of
    strings that can be derived from the start symbol
    of the grammar.
  • Derivation: read productions as rewrite rules
  • X → Y1 … Yn
  • means X can be replaced by Y1 … Yn

13
Derivation key idea
  • 1. Begin with a string consisting of the start
    symbol S
  • 2. Replace any non-terminal X in the string by
    the right-hand side of some production X → Y1 … Yn
  • 3. Repeat (2) until there are no non-terminals in
    the string



14
Derivation: an example
  • CFG
  • E → id
  • E → E + E
  • E → E * E
  • E → ( E )
  • Is the string id * id + id in the
  • language defined by the grammar?
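The question can be answered mechanically by exhibiting a derivation. The sketch below (illustration only, using this slide's grammar) applies one leftmost derivation of id * id + id and checks that every step uses a real production:

```python
# Grammar: E -> id | E + E | E * E | ( E )
PRODS = [["id"], ["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"]]

def expand_leftmost(form, rhs):
    """Replace the leftmost non-terminal E in `form` by `rhs`."""
    i = form.index("E")
    return form[:i] + rhs + form[i + 1:]

# E => E + E => E * E + E => id * E + E => id * id + E => id * id + id
form = ["E"]
for rhs in [["E", "+", "E"], ["E", "*", "E"], ["id"], ["id"], ["id"]]:
    assert rhs in PRODS          # each step uses a real production
    form = expand_leftmost(form, rhs)

print(" ".join(form))  # id * id + id
```

Since the final string contains only terminals, id * id + id is in the language.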

15
Terminals
  • Terminals are called so because there are no
    rules for replacing them
  • Once generated, terminals are permanent
  • Therefore, terminals are the tokens of the
    language

16
The Language of a CFG (Cont.)
  • More formally, write
  • X1 … Xi-1 Xi Xi+1 … Xn → X1 … Xi-1 Y1 … Ym Xi+1 … Xn
  • if there is a production
  • Xi → Y1 Y2 … Ym

17
The Language of a CFG (Cont.)
  • Write
  • X1 … Xn →* Y1 … Ym
  • if
  • X1 … Xn → … → Y1 … Ym
  • in 0 or more steps

18
The Language of a CFG
  • Let G be a context-free grammar with start
    symbol S. Then the language of G is
  • L(G) = { a1 a2 … an | S →* a1 a2 … an }
  • where each ai, i = 1, 2, …, n, is a terminal symbol

19
Examples
  • Strings of balanced parentheses
  • The grammar

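The grammar itself did not survive in this transcript; one standard grammar for balanced parentheses is S → ( S ) S | ε. A direct recursive recognizer for that (assumed) grammar:

```python
def balanced(s: str) -> bool:
    """Recognize balanced parentheses via S -> ( S ) S | eps."""
    def parse_S(i: int) -> int:
        # Consume one S starting at index i; return the index just
        # past it, or -1 on failure.
        while i < len(s) and s[i] == '(':        # use S -> ( S ) S
            j = parse_S(i + 1)                   # the inner S
            if j < 0 or j >= len(s) or s[j] != ')':
                return -1                        # unmatched '('
            i = j + 1                            # the trailing S: loop
        return i                                 # S -> eps
    return parse_S(0) == len(s)
```

Each grammar rule maps onto one control-flow construct, previewing the recursive-descent idea later in the deck.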
20
Arithmetic Expression Example
  • Simple arithmetic expressions
  • Some elements of the language

21
Notes
  • The idea of a CFG is a big step. But
  • Membership in a language is yes or no
  • we also need parse tree of the input!
  • furthermore, we must handle errors gracefully
  • Need an implementation of CFGs,
  • i.e. the parser
  • we'll create the parser using a parser generator
  • available generators CUP, bison, yacc

22
More Notes
  • Form of the grammar is important
  • Many grammars generate the same language
  • Parsers are sensitive to the form of the grammar
  • Example
  • E → E + E
  •   | E * E
  •   | intlit
  • is not suitable for an LL(1) parser (a common
    kind of parser).

23
Derivations and Parse Trees
  • A derivation is a sequence of productions
  • S → … → … → …
  • A derivation can be drawn as a tree
  • Start symbol is the tree's root
  • For a production X → Y1 Y2 … Yn, add children
    Y1 Y2 … Yn to node X

24
Derivation Example
  • Grammar: E → id | E + E | E * E | ( E )
  • String: id * id + id

25
Derivation Example (Cont.)
              E
            / | \
           E  +  E
         / | \   |
        E  *  E  id
        |     |
        id    id
26
Notes on Derivations
  • A parse tree has
  • Terminals at the leaves
  • Non-terminals at the interior nodes
  • An in-order traversal of the leaves is the
    original input
  • The parse tree shows the association of
    operations, the input string does not

27
Left-most and Right-most Derivations
  • The example is a left-most derivation
  • At each step, replace the left-most non-terminal
  • There is an equivalent notion of a right-most
    derivation

28
Derivations and Parse Trees
  • Note that right-most and left-most derivations
    have the same parse tree
  • The difference is the order in which branches are
    added

29
Remarks on Derivation
  • We are not just interested in whether s ∈ L(G)
  • We need a parse tree for s, (because we need to
    build the AST)
  • A derivation defines a parse tree
  • But one parse tree may have many derivations
  • Left-most and right-most derivations are
    important in parser implementation

30
Ambiguity(1)
  • Grammar: E → id | E + E | E * E | ( E )
  • String: id * id + id

31
Ambiguity (2)
  • This string has two parse trees

        E                            E
      / | \                        / | \
     E  +  E                      E  *  E
   / | \    \                    /   / | \
  E  *  E   id                 id   E  +  E
  |     |                           |     |
  id    id                          id    id
32
Ambiguity(3)
  • for each of the two parse trees, find the
    corresponding left-most derivation
  • for each of the two parse trees, find the
    corresponding right-most derivation

33
Ambiguity (4)
  • A grammar is ambiguous if, for some string of the
    language
  • it has more than one parse tree, or
  • there is more than one right-most derivation, or
  • there is more than one left-most derivation.
  • (the three conditions are equivalent)
  • Ambiguity leaves the meaning of some programs
    ill-defined
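To see concretely why an ill-defined meaning matters, evaluate the two parse trees with numbers substituted for id, say 2 * 3 + 4. Nested tuples stand in for the trees; an illustration, not lecture code:

```python
def evaluate(t):
    """Evaluate a tree written as (op, left, right) or a plain int."""
    if isinstance(t, int):
        return t
    op, l, r = t
    return evaluate(l) * evaluate(r) if op == '*' else evaluate(l) + evaluate(r)

tree1 = ('+', ('*', 2, 3), 4)   # ( id * id ) + id
tree2 = ('*', 2, ('+', 3, 4))   # id * ( id + id )

print(evaluate(tree1), evaluate(tree2))  # 10 14
```

The same token string yields two different values, so the grammar must be disambiguated before it can define a semantics.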

34
Dealing with Ambiguity
  • There are several ways to handle ambiguity
  • Most direct method is to rewrite the grammar
    unambiguously
  • Enforces precedence of / over -

35
Removing Ambiguity
  • Rewriting
  • Expression Grammars
  • precedence
  • associativity
  • IF-THEN-ELSE
  • the Dangling-ELSE problem

36
Handling operator precedence
  • Rewrite the grammar
  • use a different nonterminal for each precedence
    level
  • start with the lowest precedence (MINUS)
  • E → E - E | E / E | ( E ) | id
  • rewrite to
  • E → E - T | T
  • T → T / F | F
  • F → id | ( E )

37
Example
  • parse tree for id - id / id
  • E → E - T | T
  • T → T / F | F
  • F → id | ( E )

              E
            / | \
           E  -  T
           |   / | \
           T  T  /  F
           |  |     |
           F  F     id
           |  |
           id id
38
Handling Operator Associativity
  • The grammar captures operator precedence, but it
    is still ambiguous!
  • fails to express that both subtraction and
    division are left associative
  • e.g., 5-3-2 is equivalent to ((5-3)-2) and not
    to (5-(3-2)).

39
Recursion
  • A grammar is recursive in nonterminal X if
  • X →+ α X β
  • →+ means in one or more steps, X derives a
    sequence of symbols that includes an X
  • A grammar is left recursive in X if
  • X →+ X β
  • in one or more steps, X derives a sequence of
    symbols that starts with an X
  • A grammar is right recursive in X if
  • X →+ α X
  • in one or more steps, X derives a sequence of
    symbols that ends with an X

40
Resolving ambiguity due to associativity
  • The grammar given above is both left and right
    recursive in nonterminals E and T
  • To correctly express operator associativity
  • For left associativity, use left recursion.
  • For right associativity, use right recursion.
  • Here's the correct grammar
  • E → E - T | T
  • T → T / F | F
  • F → id | ( E )
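Read as an evaluator, the unambiguous grammar shows how the rewrite fixes both precedence and associativity at once: the left-recursive rules E → E - T and T → T / F become loops, and looping is exactly what makes - and / group to the left. A sketch only (tokens pre-split into a list, integers standing in for id), not the lecture's parser:

```python
def evaluate(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else None
    def parse_E():                      # E -> T { - T }
        nonlocal pos
        val = parse_T()
        while peek() == '-':
            pos += 1
            val -= parse_T()            # left-associative by the loop
        return val
    def parse_T():                      # T -> F { / F }
        nonlocal pos
        val = parse_F()
        while peek() == '/':
            pos += 1
            val //= parse_F()
        return val
    def parse_F():                      # F -> id | ( E )
        nonlocal pos
        if peek() == '(':
            pos += 1
            val = parse_E()
            pos += 1                    # skip ')', input assumed well-formed
            return val
        val = tokens[pos]
        pos += 1
        return val
    return parse_E()

print(evaluate([8, '-', 3, '-', 2]))    # 3, i.e. (8 - 3) - 2
print(evaluate([12, '-', 8, '/', 2]))   # 8: '/' binds tighter than '-'
```

Note that E calls T and T calls F, never the reverse except through parentheses, which is how the nonterminal-per-precedence-level trick plays out in code.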

41
The Dangling Else ambiguity
  • Consider the grammar
  • St → if E then St
  •    | if E then St else St
  •    | other
  • This grammar is also ambiguous

42
Resolving the dangling else
  • else matches the closest unmatched then
  • We can describe this in the grammar
  • E → MIF      /* all then are matched */
  •   | UIF     /* some then are unmatched */
  • MIF → if E then MIF else MIF
  •     | print
  • UIF → if E then E
  •     | if E then MIF else UIF
  • Describes the same set of strings
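A recursive-descent parser gets the same resolution almost for free: after parsing the then branch, greedily consume an else if one is present, so each else pairs with the closest unmatched then. A minimal sketch (E and other treated as single tokens; all names hypothetical):

```python
def parse(tokens):
    pos = 0
    def parse_St():
        nonlocal pos
        if tokens[pos] == 'other':
            pos += 1
            return 'other'
        # tokens[pos:pos+3] is ['if', 'E', 'then']; skip all three
        pos += 3
        then_branch = parse_St()
        if pos < len(tokens) and tokens[pos] == 'else':   # greedy match
            pos += 1
            return ('if', then_branch, parse_St())
        return ('if', then_branch, None)
    return parse_St()

tree = parse(['if', 'E', 'then', 'if', 'E', 'then', 'other',
              'else', 'other'])
print(tree)  # ('if', ('if', 'other', 'other'), None)
```

The output shows the else attached to the inner if, exactly the tree the MIF/UIF grammar allows.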

43
Precedence and Associativity Declarations in
Parser Generators
  • Instead of rewriting the grammar
  • Use the more natural (ambiguous) grammar
  • Along with disambiguating declarations
  • Most parser generators allow precedence and
    associativity declarations to disambiguate
    grammars

44
Parsing Approaches
  • Top-down parsing
  • build parse tree from start symbol (root)
  • match terminal symbols (tokens) in the production
    rules with tokens in the input stream
  • simple but limited in power
  • Bottom-up parsing
  • start from input token stream
  • build parse tree from terminal symbols (tokens)
    until get start symbol
  • complex but powerful

45
Top Down vs. Bottom Up
  • Top-down parsing: start at the start symbol and
    grow the tree down to match the input token stream
  • Bottom-up parsing: start from the input token
    stream and build the tree up to the start symbol
46
Top-down Parsing
  • A top-down parsing algorithm parses an input
    string of tokens by tracing out the steps in a
    leftmost derivation.
  • The parse tree associated with the input string
    is constructed using preorder traversal and hence
    the name top-down.

47
Top-down parsers
  • There are mainly two kinds of top-down parsers
  • 1. Predictive parsers
  • - Tries to make decisions about the
    structure of the tree below a node based on a few
    lookahead tokens (usually one!).
  • - Weakness Little program structure
    has been seen before predictive decisions must be
    made.
  • 2. Backtracking parsers
  • - Backtracking parsers solve the
    lookahead problem by backtracking if one decision
    turns out to be wrong and making a different
    choice.
  • - Weakness Backtracking parsers are
    slow (exponential time in general).

48
Recursive-descent parsing
  • Main idea
  • 1. Use the grammar rules as recipes for
    procedure code that parses the rule
  • 2. Each non-terminal corresponds to a
    procedure
  • 3. Each appearance of a terminal in the
    right hand side of a rule causes a token to be
    matched.
  • 4. Each appearance of a non-terminal
    corresponds to a call of the associated
    procedure.

49
Example Recursive-descent Parsing
  • F ? (E) num
  • Code
  • void F() {
  •   if (token == num) match(num);
  •   else {
  •     match( '(' );   // match token (
  •     E();
  •     match( ')' );   // match token )
  •   }
  • }

50
Example Recursive-descent Parsing (2)
  • Observation
  • Note how lookahead is not a problem in this
    example: if the token is num, go one way; if
    the token is '(', go the other; and if the token
    is neither, declare an error
  • void match(Token expect) {
  •   if (token == expect)
  •     getToken();   // get next token
  •   else
  •     error(token, expect);
  • }

51
Example Recursive-descent Parsing (3)
  • A recursive-descent procedure can also compute
    values or syntax trees
  • int F() {
  •   if (token == num) {
  •     int temp = atoi(lexeme);
  •     match(num); return temp;
  •   } else {
  •     match( '(' ); int temp = E();
  •     match( ')' ); return temp;
  •   }
  • }

52
When Recursive Descent Does Not Work
  • E → E - term | term
  • void E() {
  •   if (token == ??)
  •     E();   // uh, oh!!
  •     match('-');
  •     term();
  •   else term();
  • }
  • - A left-recursive grammar has a non-terminal A with
  •   A →+ A α for some α
  • - Recursive descent does not work in such cases

53
Elimination of Left Recursion
  • Consider the left-recursive grammar
  • A → A α | β   for some sentential forms α and β
  • A generates all strings starting with a β and
    followed by any number of α's
  • Can rewrite the grammar using right-recursion
  • A  → β A'
  • A' → α A' | ε
  • where A' is a new nonterminal

54
Elimination of Left Recursion (2)
  • In general
  • A → A α1 | … | A αn | β1 | … | βm
  • All strings derived from A start with one of
    β1, …, βm and continue with several instances of
    α1, …, αn
  • Rewrite as
  • A  → β1 A' | … | βm A'
  • A' → α1 A' | … | αn A' | ε
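The rewrite is mechanical enough to automate. A sketch handling immediate left recursion only (productions as symbol lists, [] standing for ε, A' the new nonterminal; function name is hypothetical):

```python
def eliminate_left_recursion(nt, prods):
    """A -> A a1 |...| A an | b1 |...| bm  becomes
       A -> b1 A' |...| bm A';  A' -> a1 A' |...| an A' | eps."""
    rec  = [p[1:] for p in prods if p and p[0] == nt]   # the alphas
    base = [p for p in prods if not p or p[0] != nt]    # the betas
    if not rec:
        return {nt: prods}                              # nothing to do
    new = nt + "'"
    return {
        nt:  [b + [new] for b in base],
        new: [a + [new] for a in rec] + [[]],           # [] is eps
    }

result = eliminate_left_recursion('E', [['E', '-', 'T'], ['T']])
print(result)
# {'E': [['T', "E'"]], "E'": [['-', 'T', "E'"], []]}
```

Applied to E → E - T | T it produces exactly the right-recursive form from the previous slide.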

55
General Left Recursion
  • The grammar
  • S → A α | δ
  • A → S β
  • is also left-recursive because
  • S →+ S β α
  • This left-recursion can also be eliminated
  • See book, Section 4.3, for the general algorithm

56
Summary of Recursive Descent with backtracking
  • Simple and general parsing strategy
  • Left-recursion must be eliminated first
  • but that can be done automatically
  • Unpopular because of backtracking
  • Thought to be too inefficient
  • In practice, backtracking is eliminated by
    restricting the grammar

57
Predictive Parsers
  • Like recursive-descent but parser can predict
    which production to use
  • - By looking at the next few tokens
  • - No backtracking
  • Predictive parsers accept LL(k) grammars
  • - L means left-to-right scan of input
  • - L means leftmost derivation
  • - k means predict based on k tokens of
    lookahead
  • In practice, LL(1) is used

58
LL(1) Languages
  • In recursive-descent, for each non-terminal and
    input token there may be a choice of production
  • LL(1) means that for each non-terminal and token
    there is only one production
  • Can be specified via 2D tables
  • - One dimension for current non-terminal to
    expand
  • - One dimension for next token
  • - A table entry contains one production

59
Predictive Parsing and Left Factoring
  • Consider the grammar
  • E → T + E | T
  • T → num | num * T | ( E )
  • Hard to predict because
  • For T, two productions start with num
  • For E, it is not clear how to predict
  • A grammar must be left-factored before use for
    predictive parsing

60
Left-Factoring Example
  • Recall the grammar
  • E → T + E | T
  • T → num | num * T | ( E )

  • Factor out common prefixes of productions
  • E → T X        X → + E | ε
  • T → ( E ) | num Y
  • Y → * T | ε
61
LL(1) Parsing Table Example
  • Left-factored grammar
  • E → T X        X → + E | ε
  • T → ( E ) | num Y        Y → * T | ε
  • The LL(1) parsing table:

         num      *      +       (        )      $
    E    T X                     T X
    X                    + E              ε      ε
    T    num Y                   ( E )
    Y             * T    ε                ε      ε

62
LL(1) Parsing Table Example (Cont.)
  • Consider the [E, num] entry
  • - When current non-terminal is E and next input
    is num, use production E → T X
  • - This production can generate a num in the first
    position
  • Consider the [Y, +] entry
  • - When current non-terminal is Y and current
    token is +, get rid of Y
  • Y can be followed by + only in a derivation in
    which Y → ε

63
LL(1) Parsing Tables. Errors
  • Blank entries indicate error situations
  • Consider the [E, *] entry
  • There is no way to derive a string starting with
    * from non-terminal E

64
Using Parsing Tables
  • Method similar to recursive descent, except
  • - For each non-terminal S
  • - We look at the next token a
  • - And choose the production shown at [S, a]
  • We use a stack to keep track of pending
    non-terminals
  • We reject when we encounter an error state
  • We accept when we encounter end-of-input

65
LL(1) Parsing Algorithm
  • S = start nonterminal, $ = end-of-input symbol
  • initialize stack = <S $> and Token = nextToken()

  • repeat
  •   case stack of
  •   <X, rest> : if T[X, Token] = Y1 … Yn
  •               then stack ← <Y1 … Yn rest>
  •               else error()
  •   <t, rest> : if t == Token
  •               then { stack ← <rest>; Token = nextToken() }
  •               else error()
  • until stack == < >   // empty

66
LL(1) Parsing Example
  • Stack            Input            Action
  • E $              num * num $      T X
  • T X $            num * num $      num Y
  • num Y X $        num * num $      terminal
  • Y X $            * num $          * T
  • * T X $          * num $          terminal
  • T X $            num $            num Y
  • num Y X $        num $            terminal
  • Y X $            $                ε
  • X $              $                ε
  • $                $                ACCEPT
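The algorithm and trace above can be run directly. A sketch of the table-driven driver using the left-factored grammar's table ([] stands for an ε entry, '$' for end-of-input); not the lecture's code:

```python
TABLE = {
    ('E', 'num'): ['T', 'X'],   ('E', '('): ['T', 'X'],
    ('X', '+'): ['+', 'E'],     ('X', ')'): [], ('X', '$'): [],
    ('T', 'num'): ['num', 'Y'], ('T', '('): ['(', 'E', ')'],
    ('Y', '*'): ['*', 'T'],     ('Y', '+'): [], ('Y', ')'): [], ('Y', '$'): [],
}
NONTERMINALS = {'E', 'X', 'T', 'Y'}

def ll1_parse(tokens):
    tokens = tokens + ['$']
    stack, i = ['$', 'E'], 0          # top of stack is the list's end
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False          # blank table entry: syntax error
            stack.extend(reversed(rhs))
        else:
            if top != tokens[i]:
                return False          # terminal mismatch
            i += 1                    # matched a terminal (or '$')
    return i == len(tokens)

print(ll1_parse(['num', '*', 'num']))   # True
print(ll1_parse(['num', '+']))          # False
```

Single-stepping this on num * num reproduces the stack trace in the slide above.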

67
Constructing Parsing Tables
  • LL(1) languages are those defined by a parsing
    table for the LL(1) algorithm
  • No table entry can be multiply defined
  • We want to generate parsing tables from CFG

68
Constructing Parsing Tables: First and Follow sets
  • If A → α, where in the row of A do we place α?
  • Answer: in the column of t where t can start a
    string derived from α
  • α →* t β
  • We say that t ∈ First(α)
  • Also in the column of t if α →* ε and t can follow
    an A
  • S →* β A t δ
  • We say t ∈ Follow(A)

69
Computing First Sets
  • Definition: First(X) = { t | X →* t α } ∪ { ε | X →* ε }
  • Algorithm sketch (see book for details)
  • 1. for all terminals t do First(t) ← { t }
  • 2. for each production X → ε do add ε to First(X)
  • 3. if X → A1 … An α and ε ∈ First(Ai), 1 ≤ i ≤ n,
    then add First(α) to First(X)
  • 4. for each X → A1 … An s.t. ε ∈ First(Ai), 1 ≤ i ≤ n,
    add ε to First(X)
  • repeat steps 3 & 4 until no First set can be grown
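Steps 1–4 above amount to a fixed-point computation. A sketch for the left-factored grammar ('eps' stands for ε; [] is the ε production), illustration only:

```python
GRAMMAR = {
    'E': [['T', 'X']],
    'X': [['+', 'E'], []],            # [] is the epsilon production
    'T': [['(', 'E', ')'], ['num', 'Y']],
    'Y': [['*', 'T'], []],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    def first_of(sym):
        # a terminal's First set is just itself (step 1)
        return first[sym] if sym in grammar else {sym}
    changed = True
    while changed:                    # iterate to a fixed point
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                acc = {'eps'}         # 'eps' marks a still-nullable prefix
                for sym in rhs:
                    if 'eps' not in acc:
                        break         # prefix no longer nullable: stop
                    acc.discard('eps')
                    acc |= first_of(sym)
                if acc - first[nt]:
                    first[nt] |= acc
                    changed = True
    return first

print(first_sets(GRAMMAR))
```

The result matches the First sets on the next slide: First(E) = {num, (}, First(X) = {+, ε}, and so on.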

70
First Sets. Example
  • Recall the grammar
  • E → T X        X → + E | ε
  • T → ( E ) | num Y        Y → * T | ε
  • First sets
  • First( ( ) = { ( }       First( T ) = { num, ( }
  • First( ) ) = { ) }       First( E ) = { num, ( }
  • First( num ) = { num }   First( X ) = { +, ε }
  • First( + ) = { + }       First( Y ) = { *, ε }
  • First( * ) = { * }
71
Computing Follow Sets
  • Definition
  • Follow(X) = { t | S →* β X t δ }
  • Intuition
  • If S is the start symbol then $ ∈ Follow(S)
  • If X → A B then First(B) ⊆ Follow(A) and
  •   Follow(X) ⊆ Follow(B)
  • Also if B →* ε then Follow(X) ⊆ Follow(A)

72
Computing Follow Sets (Cont.)
  • Algorithm sketch
  • 1. $ ∈ Follow(S)
  • 2. For each production A → α X β
  •    add First(β) \ {ε} to Follow(X)
  • 3. For each A → α X β where ε ∈ First(β)
  •    add Follow(A) to Follow(X)
  • repeat step(s) 2 and 3 until no Follow set grows
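Steps 1–3 likewise iterate to a fixed point. A sketch for the same grammar, taking the First sets from the earlier slide as given ('$' is end-of-input, 'eps' is ε); illustration only:

```python
GRAMMAR = {
    'E': [['T', 'X']],
    'X': [['+', 'E'], []],
    'T': [['(', 'E', ')'], ['num', 'Y']],
    'Y': [['*', 'T'], []],
}
FIRST = {'E': {'num', '('}, 'X': {'+', 'eps'},
         'T': {'num', '('}, 'Y': {'*', 'eps'}}

def first_of_string(beta):
    """First set of a symbol string, per the earlier algorithm."""
    out = set()
    for sym in beta:
        f = FIRST.get(sym, {sym})     # terminals are their own First
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')                    # whole string is nullable
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add('$')                      # step 1
    changed = True
    while changed:
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                for i, x in enumerate(rhs):
                    if x not in grammar:
                        continue                # only track non-terminals
                    fb = first_of_string(rhs[i + 1:])
                    add = fb - {'eps'}          # step 2
                    if 'eps' in fb:
                        add |= follow[a]        # step 3
                    if add - follow[x]:
                        follow[x] |= add
                        changed = True
    return follow

print(follow_sets(GRAMMAR, 'E'))
```

The result matches the Follow sets on the next slide, e.g. Follow(E) = {), $} and Follow(T) = {+, ), $}.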

73
Follow Sets. Example
  • Recall the grammar
  • E → T X        X → + E | ε
  • T → ( E ) | num Y        Y → * T | ε
  • Follow sets
  • Follow( + ) = { num, ( }     Follow( * ) = { num, ( }
  • Follow( ( ) = { num, ( }    Follow( E ) = { ), $ }
  • Follow( X ) = { $, ) }      Follow( T ) = { +, ), $ }
  • Follow( ) ) = { +, ), $ }   Follow( Y ) = { +, ), $ }
  • Follow( num ) = { *, +, ), $ }

74
Constructing LL(1) Parsing Tables
  • Construct a parsing table T for CFG G
  • For each production A → α in G do
  • For each terminal t ∈ First(α) do
  •   T[A, t] = α
  • If ε ∈ First(α), for each t ∈ Follow(A) do
  •   T[A, t] = α
  • If ε ∈ First(α) and $ ∈ Follow(A) do
  •   T[A, $] = α
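Applying the three rules above to the left-factored grammar reproduces the earlier table, and the assert catches any multiply defined entry (i.e., a non-LL(1) grammar). A sketch with the First and Follow sets hardcoded from the slides:

```python
FIRST  = {'E': {'num', '('}, 'X': {'+', 'eps'},
          'T': {'num', '('}, 'Y': {'*', 'eps'}}
FOLLOW = {'E': {')', '$'}, 'X': {')', '$'},
          'T': {'+', ')', '$'}, 'Y': {'+', ')', '$'}}
GRAMMAR = {
    'E': [['T', 'X']],
    'X': [['+', 'E'], []],            # [] is the epsilon production
    'T': [['(', 'E', ')'], ['num', 'Y']],
    'Y': [['*', 'T'], []],
}

def first_of_string(alpha):
    out = set()
    for sym in alpha:
        f = FIRST.get(sym, {sym})     # terminals are their own First
        out |= f - {'eps'}
        if 'eps' not in f:
            return out
    out.add('eps')
    return out

def build_table(grammar):
    table = {}
    for a, prods in grammar.items():
        for alpha in prods:
            fa = first_of_string(alpha)
            targets = fa - {'eps'}            # rule 1: t in First(alpha)
            if 'eps' in fa:
                targets |= FOLLOW[a]          # rules 2 and 3 ('$' included)
            for t in targets:
                assert (a, t) not in table    # multiply defined => not LL(1)
                table[(a, t)] = alpha
    return table

table = build_table(GRAMMAR)
print(table[('E', 'num')])   # ['T', 'X']
print(table[('Y', '+')])     # []
```

Blank entries are simply the (A, t) pairs absent from the dictionary, matching the error slides earlier.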

75
Notes on LL(1) Parsing Tables
  • If any entry is multiply defined then G is not
    LL(1)
  • If G is ambiguous
  • If G is left recursive
  • If G is not left-factored
  • Most programming language grammars are not LL(1)
  • There are tools that build LL(1) tables

76
Review
  • For some grammars there is a simple parsing
    strategy
  • Predictive parsing
  • Next time Bottom-up parsing