Compiler Design - Chapter 3
Parsing Context-Free Grammars
Syntax
Symbols representing regular expressions
- Abbreviations
- digits = [0-9]+
- sum = (digits "+")* digits
- Defines sums of the form
- 28+301+9
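As a rough illustration (not from the slides), the two abbreviations can be written as Python regular expressions. The names `DIGITS`, `SUM`, and `is_sum` are hypothetical; `digits` is expanded into `sum` by ordinary string interpolation before the pattern is compiled.

```python
import re

# Hypothetical names: DIGITS and SUM stand for the abbreviations digits and sum.
DIGITS = r"[0-9]+"                    # digits = [0-9]+
SUM = rf"(?:{DIGITS}\+)*{DIGITS}"     # sum = (digits "+")* digits

def is_sum(text: str) -> bool:
    """Return True if the whole of text is a sum such as 28+301+9."""
    return re.fullmatch(SUM, text) is not None

for candidate in ["28+301+9", "61", "28++3", "28+"]:
    print(candidate, is_sum(candidate))
```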
Balanced Parentheses
- digits = [0-9]+
- sum = expr "+" expr
- expr = "(" sum ")" | digits
- Defines expressions such as
- (109+23)
- 61
- (1+(250+3))
- A finite automaton cannot recognize balanced parentheses:
a machine with N states cannot remember
a parenthesis-nesting depth greater than N
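A minimal recursive-descent recognizer for these three definitions, sketched in Python with hypothetical function names. Each open parenthesis adds a call frame, so the call stack provides the unbounded memory that an N-state automaton lacks (in practice, Python's recursion limit still bounds the nesting depth).

```python
DIGIT_CHARS = "0123456789"

def parse_expr(s: str, i: int) -> int:
    """expr = "(" sum ")" | digits. Return the index just past the expr at s[i]."""
    if i < len(s) and s[i] == "(":
        i = parse_sum(s, i + 1)          # recursion handles arbitrary nesting
        if i >= len(s) or s[i] != ")":
            raise ValueError(f"expected ')' at position {i}")
        return i + 1
    start = i
    while i < len(s) and s[i] in DIGIT_CHARS:
        i += 1
    if i == start:
        raise ValueError(f"expected a digit or '(' at position {start}")
    return i

def parse_sum(s: str, i: int) -> int:
    """sum = expr "+" expr. Return the index just past the sum at s[i]."""
    i = parse_expr(s, i)
    if i >= len(s) or s[i] != "+":
        raise ValueError(f"expected '+' at position {i}")
    return parse_expr(s, i + 1)

def is_expr(s: str) -> bool:
    """True if the whole of s is an expr."""
    try:
        return parse_expr(s, 0) == len(s)
    except ValueError:
        return False

print(is_expr("(1+(250+3))"), is_expr("((1+2)"))
```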
How does a lexical analyzer implement regular
expression abbreviations?
digits = [0-9]+
The RHS [0-9]+ is substituted for digits before
translation to a finite automaton. This is not possible
for sum and expr, which are
mutually recursive: substituting one into the other never terminates
Recursion is the important addition
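A sketch of abbreviation expansion by textual substitution, assuming a hypothetical `{name}` notation for references inside a pattern. Expansion succeeds for digits and sum, but for the mutually recursive sum and expr definitions it would never finish, so the sketch detects the cycle and raises an error instead.

```python
import re

# Hypothetical notation: {name} inside a pattern refers to another abbreviation.
# Assumes braces are used only for such references (no regex {m,n} repeats).
REF = re.compile(r"\{(\w+)\}")

def expand(name, defs, expanding=()):
    """Substitute abbreviations into defs[name] until only plain regex remains."""
    if name in expanding:
        raise ValueError(f"{name} is recursive: substitution never terminates")

    def substitute(match):
        return "(?:" + expand(match.group(1), defs, expanding + (name,)) + ")"

    return REF.sub(substitute, defs[name])

REGULAR = {"digits": r"[0-9]+", "sum": r"(?:{digits}\+)*{digits}"}
RECURSIVE = {
    "digits": r"[0-9]+",
    "sum": r"{expr}\+{expr}",
    "expr": r"\({sum}\)|{digits}",
}

print(expand("sum", REGULAR))      # (?:(?:[0-9]+)\+)*(?:[0-9]+)
try:
    expand("expr", RECURSIVE)
except ValueError as err:
    print(err)                     # expr is recursive: substitution never terminates
```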
Alternation at top level
can be re-written as
Elimination of alternation
can be re-written as
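The slide's own expressions did not survive extraction, so the sketch below uses a hypothetical example, `expr = ab(c | d)e`. With alternation eliminated, each alternative becomes a separate production: `aux → c`, `aux → d`, `expr → a b aux e`. The Python enumerates the language of this non-recursive grammar (encoded, by assumption, as a dict from non-terminal to a list of right-hand sides) and checks every string against the original regular expression.

```python
import re
from itertools import product

# aux -> c ; aux -> d ; expr -> a b aux e   (no "|" anywhere)
GRAMMAR = {
    "aux": [["c"], ["d"]],
    "expr": [["a", "b", "aux", "e"]],
}

def language(symbol, grammar):
    """All strings derivable from symbol; assumes the grammar is not recursive."""
    if symbol not in grammar:             # terminal symbol
        return {symbol}
    strings = set()
    for rhs in grammar[symbol]:           # one production per alternative
        for parts in product(*(language(s, grammar) for s in rhs)):
            strings.add("".join(parts))
    return strings

words = language("expr", GRAMMAR)
print(sorted(words))                      # ['abce', 'abde']
assert all(re.fullmatch(r"ab(c|d)e", w) for w in words)
```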
Kleene closure unnecessary
can be re-written as
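Similarly, a hypothetical `expr = (abc)*` can be replaced by the recursive productions `expr → a b c expr` and `expr → ε`. Because this grammar is recursive, the sketch below enumerates only the strings reachable within a bounded depth of expansions and checks them against the regular expression.

```python
import re

# expr = (a b c)*  rewritten without Kleene closure:
#   expr -> a b c expr
#   expr -> (empty)
GRAMMAR = {"expr": [["a", "b", "c", "expr"], []]}

def derive(symbol, grammar, depth):
    """Strings derivable from symbol with at most `depth` nested expansions."""
    if symbol not in grammar:             # terminal symbol
        return {symbol}
    if depth == 0:
        return set()
    results = set()
    for rhs in grammar[symbol]:
        strings = {""}                    # an empty RHS derives the empty string
        for s in rhs:
            tails = derive(s, grammar, depth - 1)
            strings = {x + y for x in strings for y in tails}
        results |= strings
    return results

words = derive("expr", GRAMMAR, 4)
print(sorted(words, key=len))             # ['', 'abc', 'abcabc', 'abcabcabc']
assert all(re.fullmatch(r"(abc)*", w) for w in words)
```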
Grammars
- Regular expressions describe the structure of lexical tokens
- Grammars define syntactic structure declaratively
- Parsing languages described by grammars needs a more powerful tool than finite automata
- Grammars can also describe the structure of lexical tokens, but regular expressions are adequate and more concise
Context-free Grammars
- Language: a set of strings
- Each string: a finite sequence of symbols taken from a finite alphabet
- For parsing:
- Strings are source programs
- Symbols are lexical tokens
- Alphabet is the set of token types returned by the lexical analyzer
Context-free Grammars
- A context-free grammar describes a language
- A grammar has a set of productions of the form
symbol → symbol symbol ... symbol
Zero or more symbols on the RHS
- A symbol can be
- Terminal: a token from the alphabet of strings in the language; a terminal cannot appear on the LHS
- Non-terminal: appears on the LHS of some production; it can also appear on the RHS
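A small Python sketch of how such productions could be encoded. The `Production` class and helper names are hypothetical; the helpers apply the rule above directly: non-terminals are the symbols that appear on some LHS, and every other symbol is a terminal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Production:
    lhs: str                  # exactly one symbol on the left-hand side
    rhs: tuple[str, ...]      # zero or more symbols on the right-hand side

def nonterminals(productions):
    """Non-terminals are the symbols that appear on the LHS of some production."""
    return {p.lhs for p in productions}

def terminals(productions):
    """Terminals are the remaining symbols; they never appear on an LHS."""
    lhs_symbols = nonterminals(productions)
    return {s for p in productions for s in p.rhs if s not in lhs_symbols}

# The sum/expr definitions from the balanced-parentheses slide, as productions.
GRAMMAR = [
    Production("sum", ("expr", "+", "expr")),
    Production("expr", ("(", "sum", ")")),
    Production("expr", ("digits",)),      # digits treated as a single token type
]
print(sorted(nonterminals(GRAMMAR)))      # ['expr', 'sum']
print(sorted(terminals(GRAMMAR)))         # ['(', ')', '+', 'digits']
```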
Context-free grammar example
A syntax for straight-line programs
Context-free grammar example
Sentence (statement) in this language
From the source language (before lexical analysis)
Token types (terminal symbols): id, num, and the punctuation tokens
Names (a, b, c, d) and numbers (7, 5, 6) are
semantic values associated with tokens
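The example sentence itself was lost in extraction. Its names (a, b, c, d) and numbers (7, 5, 6) match the straight-line-program example `a := 7; b := c + (d := 5 + 6, d)` from Appel's Modern Compiler Implementation, which this sketch assumes. The hypothetical `tokenize` function shows how a lexical analyzer reduces names and numbers to the token types id and num while keeping their semantic values; keywords such as `print` are not handled.

```python
import re

# Hypothetical lexer for the assumed example; keywords such as print are omitted.
TOKEN = re.compile(r"\s*(?:(?P<num>[0-9]+)|(?P<id>[a-z][a-z0-9]*)|(?P<punct>:=|[;+,()]))")

def tokenize(src):
    """Return (token type, semantic value) pairs; punctuation has value None."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character near position {pos}")
        if m.group("num") is not None:
            tokens.append(("num", int(m.group("num"))))   # value: the number
        elif m.group("id") is not None:
            tokens.append(("id", m.group("id")))          # value: the name
        else:
            tokens.append((m.group("punct"), None))       # type is the symbol itself
        pos = m.end()
    return tokens

tokens = tokenize("a := 7; b := c + (d := 5 + 6, d)")
print(" ".join(kind for kind, _ in tokens))
# id := num ; id := id + ( id := num + num , id )
```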
Derivation
- A derivation shows that a sentence is in the language of the grammar
- Start with the start symbol
- Repeatedly replace any non-terminal by the RHS of one of its productions
- Leftmost: always expand the leftmost non-terminal first
- Rightmost: always expand the rightmost non-terminal first
- Or neither leftmost nor rightmost
Left-most derivation
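The derivation on this slide did not survive extraction. The sketch below assumes the grammar is the straight-line-program grammar from Appel's textbook (its Grammar 3.1), numbered 0 to 8 in the code, and replays one leftmost derivation of the token string `id := num ; id := id + ( id := num + num , id )`. The production sequence was worked out for this illustration and is not taken from the slide.

```python
# Assumed grammar: the straight-line-program grammar (Grammar 3.1 in Appel's
# textbook), with productions numbered 0-8.
GRAMMAR = [
    ("S", ["S", ";", "S"]),              # 0
    ("S", ["id", ":=", "E"]),            # 1
    ("S", ["print", "(", "L", ")"]),     # 2
    ("E", ["id"]),                       # 3
    ("E", ["num"]),                      # 4
    ("E", ["E", "+", "E"]),              # 5
    ("E", ["(", "S", ",", "E", ")"]),    # 6
    ("L", ["E"]),                        # 7
    ("L", ["L", ",", "E"]),              # 8
]
NONTERMINALS = {"S", "E", "L"}

def leftmost_derivation(start, choices):
    """Apply production numbers in order, always to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for k in choices:
        lhs, rhs = GRAMMAR[k]
        i = next((j for j, sym in enumerate(form) if sym in NONTERMINALS), None)
        if i is None or form[i] != lhs:
            raise ValueError(f"production {k} cannot expand the leftmost non-terminal")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(" ".join(form))
    return steps

for step in leftmost_derivation("S", [0, 1, 4, 1, 5, 3, 6, 1, 5, 4, 4, 3]):
    print(step)
```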
Parse tree
- A parse tree connects each symbol in a derivation to the one from which it was derived
- Two different derivations can have the same parse tree
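To make the last point concrete, the Python sketch below (hypothetical `Node` and `derivation` names) builds one parse tree for `id := id + num` and replays both a leftmost and a rightmost derivation from it. The intermediate steps differ, but both derivations come from the same tree and end at the same sentence.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    symbol: str
    children: list["Node"] = field(default_factory=list)   # empty for a terminal

def derivation(tree, nonterminals, leftmost=True):
    """Replay a derivation of the tree's sentence, expanding leftmost or rightmost."""
    form = [tree]
    steps = [" ".join(n.symbol for n in form)]
    while True:
        positions = [i for i, n in enumerate(form) if n.symbol in nonterminals]
        if not positions:
            return steps
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + form[i].children + form[i + 1:]   # expand one node
        steps.append(" ".join(n.symbol for n in form))

# Parse tree for "id := id + num" using S -> id := E, E -> E + E, E -> id, E -> num.
tree = Node("S", [Node("id"), Node(":="),
                  Node("E", [Node("E", [Node("id")]), Node("+"), Node("E", [Node("num")])])])

left = derivation(tree, {"S", "E"}, leftmost=True)
right = derivation(tree, {"S", "E"}, leftmost=False)
print(left[3], "|", right[3])    # id := id + E | id := E + num
print(left[-1] == right[-1])     # True
```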
Ambiguous Grammars
A grammar is ambiguous if it can derive the same sentence with two different parse trees
Two parse trees for the same sentence
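One way to check ambiguity on small inputs is to count parse trees. This sketch assumes only the two productions `E → E + E` and `E → id` (a fragment chosen for illustration, not necessarily the grammar on the slide) and counts trees over token spans with memoization.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees deriving tokens from E with E -> E + E and E -> id."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees for toks[i:j]: one if it is a lone id, plus every split at a "+".
        n = 1 if j - i == 1 and toks[i] == "id" else 0
        for k in range(i + 1, j - 1):
            if toks[k] == "+":
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(toks))

print(count_parse_trees("id + id + id".split()))   # 2
```

Any count above 1 for a sentence shows the grammar is ambiguous.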
Another Ambiguous Grammar
Grammar 3.5
The two parse trees evaluate to -4 and 2
Another Ambiguous Grammar
Grammar 3.5
The two parse trees evaluate to 9 and 7
Different semantics!
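The sentences behind these values were lost in extraction. The values are consistent with `1-2-3`, where (1-2)-3 = -4 and 1-(2-3) = 2, and with `1+2*3`, where (1+2)*3 = 9 and 1+(2*3) = 7. Assuming those sentences, the sketch below evaluates every parse tree of a flat token list under the simplified rule `E → E op E | num` (division and parentheses omitted for brevity).

```python
from operator import add, mul, sub

# Division is left out so that every value stays an exact integer.
OPS = {"+": add, "-": sub, "*": mul}

def tree_values(tokens):
    """Value of every parse tree of num (op num)* under E -> E op E | num."""
    if len(tokens) == 1:
        return [int(tokens[0])]
    values = []
    for k in range(1, len(tokens), 2):          # each operator can be the root
        for a in tree_values(tokens[:k]):
            for b in tree_values(tokens[k + 1:]):
                values.append(OPS[tokens[k]](a, b))
    return values

print(tree_values("1 - 2 - 3".split()))   # [2, -4]
print(tree_values("1 + 2 * 3".split()))   # [7, 9]
```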
Ambiguous Grammars
- Problematic for compiling
- Need unambiguous grammars
- Ambiguous grammars can often be transformed to
unambiguous grammars
Transforming an Ambiguous Grammar
- * has higher precedence (binds tighter) than +
- Each operator associates to the left: (1-2)-3, not 1-(2-3)
- Introduce new non-terminal symbols
- E expression
- T term (things you add)
- F factor (things you multiply)
Unambiguous Grammar
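A hedged sketch of an evaluator for an unambiguous expression grammar, assuming the E/T/F grammar takes the usual form `E → E + T | E - T | T`, `T → T * F | T / F | F`, `F → num | ( E )` (identifiers omitted). Recursive descent cannot follow left-recursive rules directly, so each one is implemented as a loop; folding the loop from left to right preserves left associativity. `Fraction` keeps division exact.

```python
import re
from fractions import Fraction

def evaluate(src):
    """Evaluate num, + - * /, and parentheses with the E/T/F grammar."""
    tokens = re.findall(r"[0-9]+|[-+*/()]", src)
    if "".join(tokens) != re.sub(r"\s+", "", src):
        raise SyntaxError("unexpected character")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def parse_e():                      # E -> E + T | E - T | T
        value = parse_t()
        while peek() in ("+", "-"):     # loop instead of left recursion
            op = advance()
            rhs = parse_t()
            value = value + rhs if op == "+" else value - rhs
        return value

    def parse_t():                      # T -> T * F | T / F | F
        value = parse_f()
        while peek() in ("*", "/"):
            op = advance()
            rhs = parse_f()
            value = value * rhs if op == "*" else value / rhs
        return value

    def parse_f():                      # F -> num | ( E )
        tok = peek()
        if tok == "(":
            advance()
            value = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return Fraction(int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    value = parse_e()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value

print(evaluate("1-2-3"), evaluate("1+2*3"))   # -4 7
```

Because * binds tighter and every operator associates to the left, each of these sentences now has exactly one parse.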
Associate to the right
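The slide's grammar is not recoverable. As a hypothetical illustration, a right-recursive rule such as `E → num ^ E | num` for an exponent operator `^` makes the operator associate to the right, because the recursive call parses everything to the right of the operator first.

```python
import re

def evaluate_power(src):
    """Evaluate num (^ num)* with the right-recursive rule E -> num ^ E | num."""
    tokens = re.findall(r"[0-9]+|\^", src)
    if "".join(tokens) != re.sub(r"\s+", "", src):
        raise SyntaxError("unexpected character")
    pos = 0

    def parse_e():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError("expected a number")
        base = int(tokens[pos])
        pos += 1
        if pos < len(tokens) and tokens[pos] == "^":
            pos += 1
            return base ** parse_e()    # the recursion sits on the right
        return base

    value = parse_e()
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return value

print(evaluate_power("2^3^2"))   # 512, i.e. 2^(3^2)
```

A left-recursive rule would instead group the same input as (2^3)^2 = 64.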
Ambiguous Languages
- Ambiguity can usually be eliminated
- Some languages, however, have only ambiguous grammars
- These languages are problematic as programming languages
- Syntactic ambiguity causes problems in writing and understanding programs
EOF Marker
- Augmented grammar: add a new start symbol and a production that rewrites it to the original start symbol followed by the EOF marker
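A minimal sketch of augmentation, assuming the common textbook notation of `S'` for the new start symbol and `$` for the EOF marker. A parser for the augmented grammar accepts only when it has recognized a complete S followed by end of file, so trailing tokens after a valid statement are rejected.

```python
def augment(grammar, start, eof="$"):
    """Return (new grammar, new start) with the production new_start -> start eof."""
    new_start = start + "'"
    while new_start in grammar:          # keep the new start symbol fresh
        new_start += "'"
    augmented = dict(grammar)            # shallow copy; the original is untouched
    augmented[new_start] = [[start, eof]]
    return augmented, new_start

# A fragment of a statement grammar, encoded as non-terminal -> list of RHS lists.
grammar = {"S": [["S", ";", "S"], ["id", ":=", "E"]], "E": [["id"], ["num"]]}
augmented, new_start = augment(grammar, "S")
print(new_start, "->", " ".join(augmented[new_start][0]))   # S' -> S $
```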