Title: CS 3240: Languages and Computation
- Ambiguity and Chomsky Normal Form
Parsing
- Generative aspect: given a CFG G, it is easy to derive strings w ∈ L(G).
- Analytical aspect: given a CFG G and a string w, decide whether w ∈ L(G) and, if so, determine the derivation tree or the sequence of rules that produces w.
- This is the problem of parsing.
Derivation Examples
- Grammar: S → S + S | S * S | (S) | int
- String: int * int + int
- Derivation: S ⇒ S + S ⇒ S * S + S ⇒ int * S + S ⇒ int * int + S ⇒ int * int + int
- A derivation can be represented as a tree structure. In the tree, numbers indicate the order of substitution.
Left-Most and Right-Most Derivations
- Left-most derivations: at each step, replace the left-most non-terminal.
- Right-most derivations: at each step, replace the right-most non-terminal.
- E.g., S ⇒ S + S ⇒ S + int ⇒ S * S + int ⇒ S * int + int ⇒ int * int + int
- Note: a left-most and a right-most derivation are different traversal orders of a tree.
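Both derivation orders can be replayed mechanically. The Python sketch below assumes the grammar S → S + S | S * S | (S) | int, stores each sentential form as a tuple of symbols, and selects rules by hypothetical names (plus, times, paren, int). The only thing that changes between the two modes is which variable gets rewritten.

```python
# Grammar S -> S + S | S * S | ( S ) | int; the rule names are hypothetical labels.
RULES = {
    "plus": ("S", "+", "S"),
    "times": ("S", "*", "S"),
    "paren": ("(", "S", ")"),
    "int": ("int",),
}
VARIABLES = {"S"}


def derive(rule_names, leftmost=True):
    """Apply the named rules in order, each time rewriting the left-most
    (or right-most) variable, and return every sentential form produced."""
    form = ("S",)
    steps = [" ".join(form)]
    for name in rule_names:
        positions = [i for i, sym in enumerate(form) if sym in VARIABLES]
        if not positions:
            raise ValueError("no variable left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + RULES[name] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


left = derive(["plus", "times", "int", "int", "int"], leftmost=True)
right = derive(["plus", "int", "times", "int", "int"], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both calls end in int * int + int, but with different rule sequences, because the two modes visit the nodes of the same tree in a different order.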
Defining a Parse Tree
- For a CFG G = (V, Σ, R, S), a derivation tree has the following properties:
- The root is labeled S.
- Each leaf is labeled with a symbol from Σ ∪ {ε}.
- Each interior node is labeled with a variable in V.
- If a node has label A ∈ V and its children are labeled a1 … an (from left to right), then R must contain the rule A → a1 … an (with each aj ∈ V ∪ Σ ∪ {ε}).
- A leaf labeled ε is an only child (it has no siblings).
- Let G be a CFG. Then w ∈ L(G) if and only if there exists a derivation tree of G that yields w.
Another Example
- Consider the CFG S → 0 | 1 | ¬(S) | (S)∨(S) | (S)∧(S) over the terminals 0, 1, ¬, ∨, ∧, ( and ).
- Derivations of (0)∨((0)∧(1)):
- leftmost: S ⇒ (S)∨(S) ⇒ (0)∨(S) ⇒ (0)∨((S)∧(S)) ⇒ (0)∨((0)∧(S)) ⇒ (0)∨((0)∧(1))
- rightmost: S ⇒ (S)∨(S) ⇒ (S)∨((S)∧(S)) ⇒ (S)∨((S)∧(1)) ⇒ (S)∨((0)∧(1)) ⇒ (0)∨((0)∧(1))
- something else: S ⇒ (S)∨(S) ⇒ (0)∨(S) ⇒ (0)∨((S)∧(S)) ⇒ (0)∨((S)∧(1)) ⇒ (0)∨((0)∧(1))
- Note: we typically use either leftmost or rightmost derivations.
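Whether a derivation is leftmost, rightmost, or neither can be read off from which S each step rewrites. The Python sketch below assumes the three operator terminals are ¬, ∨ and ∧, treats every symbol as a single character, and checks each step against the rules of the grammar.

```python
# Assumed operator symbols ¬, ∨, ∧; each symbol is one character; S is the only variable.
RULES = ["0", "1", "¬(S)", "(S)∨(S)", "(S)∧(S)"]


def step_kind(before, after):
    """Classify a single step before => after by which S was rewritten."""
    positions = [i for i, ch in enumerate(before) if ch == "S"]
    for i in positions:
        for rhs in RULES:
            if before[:i] + rhs + before[i + 1:] == after:
                # No right-hand side starts with S, so at most one position matches.
                first, last = i == positions[0], i == positions[-1]
                if first and last:
                    return "both"           # only one S was present
                return "leftmost" if first else "rightmost" if last else "other"
    raise ValueError(f"{before!r} does not yield {after!r} in one step")


def classify(derivation):
    kinds = [step_kind(a, b) for a, b in zip(derivation, derivation[1:])]
    is_left = all(k in ("leftmost", "both") for k in kinds)
    is_right = all(k in ("rightmost", "both") for k in kinds)
    if is_left and is_right:
        return "leftmost and rightmost"
    return "leftmost" if is_left else "rightmost" if is_right else "neither"


leftmost = ["S", "(S)∨(S)", "(0)∨(S)", "(0)∨((S)∧(S))", "(0)∨((0)∧(S))", "(0)∨((0)∧(1))"]
rightmost = ["S", "(S)∨(S)", "(S)∨((S)∧(S))", "(S)∨((S)∧(1))", "(S)∨((0)∧(1))", "(0)∨((0)∧(1))"]
other = ["S", "(S)∨(S)", "(0)∨(S)", "(0)∨((S)∧(S))", "(0)∨((S)∧(1))", "(0)∨((0)∧(1))"]
for d in (leftmost, rightmost, other):
    print(classify(d))
```

It prints leftmost, rightmost, and neither for the three derivations above.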
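Its Parse Tree
The derivation S ⇒* (0)∨((0)∧(1)) can be expressed by the following parse tree:
S
├─ (
├─ S
│  └─ 0
├─ )
├─ ∨
├─ (
├─ S
│  ├─ (
│  ├─ S
│  │  └─ 0
│  ├─ )
│  ├─ ∧
│  ├─ (
│  ├─ S
│  │  └─ 1
│  └─ )
└─ )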
Ambiguity
A string w ∈ L(G) is derived ambiguously if it has more than one derivation tree (equivalently, more than one leftmost derivation, or more than one rightmost derivation). A grammar is ambiguous if it derives some string ambiguously.
Example: with the rule S → 0 | 1 | S+S | S×S, the string 0×1+1 has two leftmost derivations:
S ⇒ S+S ⇒ S×S+S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
versus
S ⇒ S×S ⇒ 0×S ⇒ 0×S+S ⇒ 0×1+S ⇒ 0×1+1
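Each parse tree of a string under this grammar corresponds to one way of fully parenthesizing it. The sketch below is a brute-force enumeration over substrings; it writes the second operator as × and assumes one character per symbol. An ambiguous string shows up as more than one entry.

```python
from functools import lru_cache

ATOMS = {"0", "1"}
OPERATORS = {"+", "×"}


def bracketings(w):
    """Return one fully parenthesized reading of w per parse tree under
    S -> 0 | 1 | S+S | S×S (one character per symbol)."""

    @lru_cache(maxsize=None)
    def trees(i, j):                        # readings of the substring w[i:j]
        found = []
        if j - i == 1 and w[i] in ATOMS:    # S -> 0 | 1
            found.append(w[i])
        for k in range(i + 1, j - 1):       # S -> S op S with the operator at k
            if w[k] in OPERATORS:
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        found.append(f"({left}{w[k]}{right})")
        return tuple(found)

    return list(trees(0, len(w)))


print(bracketings("0×1+1"))
```

For 0×1+1 it returns the two readings ((0×1)+1) and (0×(1+1)); a longer string such as 0+1+0+1 already has five.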
Ambiguity and Parse Trees
The ambiguity of 0×1+1 is shown by its two different parse trees: one groups the string as (0×1)+1, the other as 0×(1+1).
Inherently Ambiguous
- Languages that can only be generated by ambiguous grammars are inherently ambiguous.
- Example: L = {a^i b^j c^k | i = j or j = k}.
- A CFG for this L somehow has to involve the step S → S1 | S2, where S1 produces the strings a^n b^n c^m and S2 produces the strings a^n b^m c^m.
- Such a grammar is ambiguous on the strings a^n b^n c^n, which both S1 and S2 can produce.
- Proving this rigorously is difficult.
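One way to see the ambiguity concretely is to count leftmost derivations (one per parse tree) by brute force. The grammar in this sketch is an assumption, one natural choice of S1 and S2: S1 → XC, X → aXb | ε, C → cC | ε for a^n b^n c^m, and S2 → AY, A → aA | ε, Y → bYc | ε for a^n b^m c^m. It only demonstrates that this particular grammar is ambiguous on a^n b^n c^n; it does not prove that every grammar for L is.

```python
from collections import Counter

# One possible grammar for L = { a^i b^j c^k : i = j or j = k } (an assumption).
GRAMMAR = {
    "S": [["S1"], ["S2"]],
    "S1": [["X", "C"]],                 # a^n b^n c^m
    "X": [["a", "X", "b"], []],
    "C": [["c", "C"], []],
    "S2": [["A", "Y"]],                 # a^n b^m c^m
    "A": [["a", "A"], []],
    "Y": [["b", "Y", "c"], []],
}


def count_leftmost_derivations(max_len):
    """Map each string of length <= max_len to its number of leftmost
    derivations, which equals its number of parse trees."""
    counts = Counter()
    stack = [("S",)]
    while stack:
        form = stack.pop()
        if sum(sym not in GRAMMAR for sym in form) > max_len:
            continue                    # terminals never disappear, so prune
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:                   # no variables left: a finished string
            counts["".join(form)] += 1
            continue
        for rhs in GRAMMAR[form[i]]:    # rewrite the left-most variable
            stack.append(form[:i] + tuple(rhs) + form[i + 1:])
    return counts


counts = count_leftmost_derivations(6)
for w in ["abc", "aabbcc", "aabbc", "abbcc"]:
    print(repr(w), counts[w])
```

Strings with i = j = k get two derivations, one through S1 and one through S2, while strings satisfying only one of the conditions get exactly one.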
Abstract Syntax Trees
- (Abstract) syntax trees are simplified representations of parse trees.
- [Figure: parse tree and abstract syntax tree for an example expression over the numbers 3, 2, and 4]
Resolution of Ambiguities
- Some ambiguities are inessential, but others must be resolved.
- The following grammar is ambiguous:
- exp → exp op exp | ( exp ) | number
- op → + | - | *
- Sample ambiguous strings: 1+2*3 and 1-2-3
- Resolution of ambiguity:
- Precedence: * has higher precedence than + and -
- Left associativity: perform operations from left to right
- Full parenthesization
- Can we revise the grammar rules to incorporate these techniques?
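The ambiguity matters because different parse trees can give different values. The sketch below assumes single-digit numbers and no parentheses, and returns the set of values a string can take over all of its parse trees under exp → exp op exp | number.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def values(w):
    """Return the set of values w can take, one per parse tree under
    exp -> exp op exp | number (single-digit numbers, no parentheses)."""
    if len(w) == 1 and w.isdigit():
        return {int(w)}
    results = set()
    for k in range(1, len(w) - 1):          # try every operator as the root
        if w[k] in OPS:
            for a in values(w[:k]):
                for b in values(w[k + 1:]):
                    results.add(OPS[w[k]](a, b))
    return results


print(sorted(values("1+2*3")))   # [7, 9]
print(sorted(values("1-2-3")))   # [-4, 2]
```

In both cases the conventional reading (7 and -4) is only one of two, which is exactly what precedence and left associativity are meant to pin down.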
Resolution of Ambiguities
- Precedence: group operators into different groups, and make operations with lower precedence closer to the root
- exp → exp addop exp | term
- addop → + | -
- term → term mulop term | factor
- mulop → *
- factor → ( exp ) | number
- Associativity: allow recursion only on the left
- exp → exp addop term | term
- term → term mulop factor | factor
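A recursive-descent parser can follow the revised grammar with one adjustment: a left-recursive rule such as exp → exp addop term cannot be coded as a function that first calls itself, so this sketch turns each left-recursive rule into a loop that folds operands from left to right. The tokenizer and the tuple output format are assumptions for illustration.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*()])")


def tokenize(text):
    """Split text into numbers, operators, and parentheses."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def exp(self):
        # exp -> exp addop term | term, written as: term { addop term }
        node = self.term()
        while self.peek() in ("+", "-"):
            node = (self.take(), node, self.term())
        return node

    def term(self):
        # term -> term mulop factor | factor, written as: factor { mulop factor }
        node = self.factor()
        while self.peek() == "*":
            node = (self.take(), node, self.factor())
        return node

    def factor(self):
        # factor -> ( exp ) | number
        tok = self.take()
        if tok == "(":
            node = self.exp()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.exp()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1-2-3"))
print(parse("1+2*3"))
```

Because each loop combines the tree built so far with the next operand, 1-2-3 comes out grouped as (1-2)-3, and because term sits below exp, * binds tighter than + and -.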
Dangling Else Statement
- Ambiguous grammar
- statement → if-stmt | other
- if-stmt → if ( exp ) statement | if ( exp ) statement else statement
- exp → ...
- Ambiguous string: if ( exp ) if ( exp ) other else other (which if does the else belong to?)
- Resolution
- Bracketing with endif (e.g., shell scripts)
- Revise the grammar
- statement → matched-stmt | unmatched-stmt
- matched-stmt → if ( exp ) matched-stmt else matched-stmt | other
- unmatched-stmt → if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmt
- In practice, compilers typically use a disambiguating rule (each else matches the closest unmatched if) instead of changing the grammar rules.
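The usual disambiguating rule, that an else belongs to the closest unmatched if, falls out naturally from a recursive-descent parser for the original ambiguous grammar. In this sketch tokens are separated by spaces and a condition is a single token; both are simplifications.

```python
def expect(tokens, pos, tok):
    """Consume tok at pos or raise SyntaxError."""
    if pos >= len(tokens) or tokens[pos] != tok:
        raise SyntaxError(f"expected {tok!r} at token {pos}")
    return pos + 1


def statement(tokens, pos):
    """statement -> if ( c ) statement [else statement] | other"""
    if pos < len(tokens) and tokens[pos] == "other":
        return "other", pos + 1
    pos = expect(tokens, pos, "if")
    pos = expect(tokens, pos, "(")
    if pos >= len(tokens):
        raise SyntaxError("missing condition")
    cond = tokens[pos]
    pos = expect(tokens, pos + 1, ")")
    then_part, pos = statement(tokens, pos)
    # Disambiguating rule: the innermost open if claims the next else.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = statement(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return ("if", cond, then_part), pos


def parse(text):
    tokens = text.split()
    tree, pos = statement(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse("if ( a ) if ( b ) other else other"))
```

The recursive call for the inner if ( b ) sees the else first and consumes it, so the outer if gets no else branch.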
Chomsky normal form
- A method of simplifying a CFG
- Definition: A context-free grammar is in Chomsky normal form if every rule is of one of the following forms:
- A → BC
- A → a
- where a is any terminal, A is any variable, and B and C are any variables other than the start variable
- In addition, if S is the start variable, the rule S → ε is permitted; it is the only permitted ε-rule.
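A small checker makes the definition concrete. In this sketch (an assumed encoding, not a standard API), a grammar is a dict from each variable to a list of right-hand sides, each a tuple of symbols; any symbol that is not a key is a terminal, and the empty tuple stands for ε.

```python
def is_cnf(grammar, start):
    """Check the Chomsky normal form conditions rule by rule."""
    for var, bodies in grammar.items():
        for rhs in bodies:
            if len(rhs) == 0:               # ε-rule: only S -> ε is allowed
                ok = var == start
            elif len(rhs) == 1:             # A -> a: the symbol must be a terminal
                ok = rhs[0] not in grammar
            elif len(rhs) == 2:             # A -> BC: variables other than the start
                ok = all(s in grammar and s != start for s in rhs)
            else:                           # longer right-hand sides are not allowed
                ok = False
            if not ok:
                return False
    return True


cnf = {"S0": [("A", "B"), ()], "A": [("a",)], "B": [("A", "B"), ("b",)]}
print(is_cnf(cnf, "S0"))
print(is_cnf({**cnf, "A": [("a", "B")]}, "S0"))   # terminal inside A -> aB
print(is_cnf({**cnf, "B": [("S0", "A")]}, "S0"))  # start variable on a right-hand side
```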
CFGs and Chomsky normal form
- Theorem: Any context-free language is generated by a context-free grammar in Chomsky normal form.
- Proof idea: convert any CFG to one in Chomsky normal form by removing or replacing the rules that are in the wrong form
- Add a new start symbol
- Eliminate ε-rules of the form A → ε
- Eliminate unit rules of the form A → B
- Convert the remaining rules into the proper form
- Example (work out on whiteboard):
- S → ASA | aB
- A → B | S
- B → b | ε
Convert a CFG to Chomsky normal form
- Add a new start symbol
- Create the new rule S0 → S, where S is the original start symbol and S0 is not used in the CFG
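Under an assumed encoding (a dict from each variable to a list of right-hand-side tuples), the first step takes only a few lines. The name S0 follows the slide; the loop that appends primes is a hypothetical fallback in case that name is already taken. After this step the start variable never appears on a right-hand side.

```python
def add_new_start(grammar, start):
    """Return (new_grammar, new_start) with a fresh start rule S0 -> S."""
    new_start = start + "0"
    while new_start in grammar:            # make sure the new name is unused
        new_start += "'"
    new_grammar = {new_start: [(start,)]}
    new_grammar.update({var: list(bodies) for var, bodies in grammar.items()})
    return new_grammar, new_start


example = {"S": [("A", "S", "A"), ("a", "B")], "A": [("B",), ("S",)], "B": [("b",), ()]}
grammar, s0 = add_new_start(example, "S")
print(s0, grammar[s0])
```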
Convert a CFG to Chomsky normal form
- Eliminate all ε-rules A → ε, where A is not the start variable
- For each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence deleted
- R → uAv becomes R → uAv | uv
- R → uAvAw becomes R → uAvAw | uvAw | uAvw | uvw
- If we have R → A, add R → ε unless we had already removed R → ε
- Repeat until all ε-rules not involving the start variable have been eliminated
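The sketch below reaches the same kind of result with the common nullable-set formulation rather than the rule-by-rule repetition described above. It first computes every variable that can derive ε, then, for each rule, adds every version with some subset of the nullable occurrences deleted, and finally drops A → ε for every A except the start variable. The grammar encoding is an assumed dict of right-hand-side tuples, with () for ε.

```python
from itertools import product


def nullable_variables(grammar):
    """Variables that can derive ε (fixed-point iteration)."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(all(s in nullable for s in rhs) for rhs in bodies):
                nullable.add(var)
                changed = True
    return nullable


def eliminate_epsilon(grammar, start):
    nullable = nullable_variables(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = []
        for rhs in bodies:
            # every nullable occurrence may be kept or deleted independently
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                body = tuple(sym for part in choice for sym in part)
                if body == () and var != start:
                    continue                # no ε-rules except for the start variable
                if body not in new_bodies:
                    new_bodies.append(body)
        result[var] = new_bodies
    return result


g = {"S0": [("S",)], "S": [("A", "S", "A"), ("a", "B")], "A": [("B",), ("S",)], "B": [("b",), ()]}
for var, bodies in eliminate_epsilon(g, "S0").items():
    print(var, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

On the example grammar (after adding S0 → S), this yields S → ASA | AS | SA | S | aB | a, A → B | S and B → b.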
Convert a CFG to Chomsky normal form
- Eliminate all unit rules of the form A → B
- Remove A → B; then, for each rule B → u, add a new rule A → u, where u is a string of terminals and variables, unless A → u is a unit rule that was previously removed
- Repeat until all unit rules have been eliminated
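Likewise, this sketch replaces the repeat-until-done loop with a closure computation: for each variable A it finds every variable reachable from A through chains of unit rules and gives A all of their non-unit right-hand sides. This is a standard equivalent formulation, not necessarily the exact procedure described above, and it uses the same assumed dict-of-tuples encoding. Self-loops such as S → S simply disappear.

```python
def eliminate_units(grammar):
    """Remove unit rules A -> B by giving A every non-unit body of each
    variable reachable from A through a chain of unit rules."""
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in grammar

    result = {}
    for var in grammar:
        reach, todo = {var}, [var]
        while todo:                         # follow chains A -> B -> C ...
            current = todo.pop()
            for rhs in grammar[current]:
                if is_unit(rhs) and rhs[0] not in reach:
                    reach.add(rhs[0])
                    todo.append(rhs[0])
        bodies = []
        for other in grammar:               # grammar order keeps the output stable
            if other in reach:
                for rhs in grammar[other]:
                    if not is_unit(rhs) and rhs not in bodies:
                        bodies.append(rhs)
        result[var] = bodies
    return result


body_s = [("A", "S", "A"), ("A", "S"), ("S", "A"), ("S",), ("a", "B"), ("a",)]
g = {"S0": [("S",)], "S": body_s, "A": [("B",), ("S",)], "B": [("b",)]}
for var, bodies in eliminate_units(g).items():
    print(var, "->", " | ".join("".join(b) for b in bodies))
```

Applied to the ε-free example grammar, A ends up with ASA, AS, SA, aB, a and b.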
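Convert a CFG to Chomsky normal form
- Convert the remaining rules into the proper form
- What's left?
- Replace each rule A → u1u2…uk, where k ≥ 3 and each ui is a variable or a terminal, with the k-1 rules A → u1A1, A1 → u2A2, …, Ak-2 → uk-1uk, where the Ai are new variables
- Replace each terminal ui in a rule with at least two symbols on the right-hand side by a new variable Ui, and add the rule Ui → ui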
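The last step can be sketched as follows, assuming the input has already been cleared of ε-rules (except for the start variable) and of unit rules, and using the same assumed dict-of-tuples encoding. Terminals inside right-hand sides of length two or more get their own variables, then long right-hand sides are split into chains; the generated names (U2, S0_1, and so on) are arbitrary.

```python
from itertools import count


def to_proper_form(grammar):
    """Final CNF step: give every terminal inside a body of length >= 2 its
    own variable U -> a, then split bodies of length k >= 3 into k - 1 rules."""
    result = {var: [] for var in grammar}
    ids = count(1)
    terminal_var = {}                       # terminal -> new variable deriving it

    def fresh(prefix):
        """Return a new, unused variable name (the naming scheme is arbitrary)."""
        while True:
            name = f"{prefix}{next(ids)}"
            if name not in result:
                result[name] = []
                return name

    def as_variable(sym):
        if sym in grammar:                  # already a variable
            return sym
        if sym not in terminal_var:         # introduce U -> sym once per terminal
            terminal_var[sym] = fresh("U")
            result[terminal_var[sym]].append((sym,))
        return terminal_var[sym]

    for var, bodies in grammar.items():
        for rhs in bodies:
            if len(rhs) <= 1:               # A -> a and S0 -> ε are already fine
                result[var].append(rhs)
                continue
            syms = [as_variable(s) for s in rhs]
            head = var
            while len(syms) > 2:            # A -> u1 A1, A1 -> u2 A2, ...
                link = fresh(var + "_")
                result[head].append((syms[0], link))
                head, syms = link, syms[1:]
            result[head].append(tuple(syms))
    return result


body_s = [("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S")]
g = {"S0": list(body_s), "S": list(body_s), "A": [("b",)] + body_s, "B": [("b",)]}
for var, bodies in to_proper_form(g).items():
    print(var, "->", " | ".join(" ".join(b) for b in bodies))
```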
Chomsky Hierarchy
- Chomsky hierarchy of grammars
- Type 0: unrestricted grammars
- Type 1: context-sensitive grammars
- Type 2: context-free grammars
- Type 3: regular grammars
- Named after Noam Chomsky, who made seminal contributions to the field of theoretical linguistics in the 1950s and 1960s (cf. Chomsky hierarchy of languages).
Noam Chomsky (born 1928): linguist and political theorist