Title: Approximating Context-Free Grammar Ambiguity
1// Approximating Context-Free Grammar Ambiguity
- Claus Brabrand
- brabrand_at_brics.dk
- BRICS, Department of Computer Science
- University of Aarhus, Denmark
2// Abstract
Approximating Context-Free Grammar Ambiguity
Context-free grammar ambiguity is undecidable. However, just because it's undecidable doesn't mean there aren't (good) approximations! Indeed, the whole area of static analysis works on side-stepping undecidability. We exhibit a characterization of context-free ambiguity which induces a whole framework for approximating the problem. In particular, we give an approximation, A_MN, based on the Mohri-Nederhof (2000) regular approximation of context-free grammars, and show how to boost the precision even further.
3// Outline
- Introduction
- Vertical / Horizontal Ambiguity
- Characterization of Ambiguity
- (Over-)Approximation Framework
- Approximation (AMN)
- Assessment
- Related Work
- Conclusion
4// Context-Free Grammar
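- $N$: finite set of nonterminals
- $\Sigma$: finite set of terminals
- $s \in N$: start nonterminal
- $\pi : N \to \mathcal{P}(E^*)$: production function, where $E = N \cup \Sigma$
$G = \langle N, \Sigma, s, \pi \rangle$
- Assume
- All $n \in N$ are reachable (from $s$)
- All $n \in N$ derive some (finite) string
$\mathcal{L}(G) \in \mathcal{P}(\Sigma^*)$: the language of $G$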
5// Relevant CFG Decision Problems
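- Decidable
- Membership: $\omega \in \mathcal{L}(G_{CFG})$?
- Emptiness: $\mathcal{L}(G_{CFG}) = \emptyset$?
- Intersection (w/ REG): $\mathcal{L}(G_{CFG}) \cap \mathcal{L}(R_{REG}) = \mathcal{L}(C_{CFG})$, constructively
- Undecidable
- Intersection (w/ CFG): $\mathcal{L}(G_{CFG}) \cap \mathcal{L}(G'_{CFG}) = \emptyset$?
- Ambiguity: $\exists \omega \in \Sigma^*$ with $\geq 2$ derivation trees?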
6// Ambiguity Undecidable!
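- Algorithms
- Undecidable!
- However...
- Ambiguity: $\exists \omega \in \Sigma^*$ with $\geq 2$ derivation trees?
[Figure: two distinct derivation trees T, T' from s yielding the same string (ambiguous) vs. a single tree (unambiguous)]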
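Ambiguity is undecidable, but for one given string the number of derivation trees can be counted. The sketch below (our illustration, not from the slides) does this by memoized recursion over substrings, assuming single-character terminals, no ε-productions and no cycles of unit productions (otherwise a string may have infinitely many trees). Checking strings one at a time only semi-decides ambiguity: it can find a witness, never prove absence. The encoding and the name `count_trees` are hypothetical; the example grammars are the vertical and horizontal ambiguity examples from this deck.

```python
from functools import lru_cache

# Hypothetical encoding: nonterminal -> list of right-hand sides; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def count_trees(g: Grammar, start: str, w: str) -> int:
    """Number of derivation trees for w from `start`.

    Assumes single-character terminals, no epsilon-productions and no cycles of
    unit productions, so the recursion always terminates.
    """

    @lru_cache(maxsize=None)
    def sym(x: str, i: int, j: int) -> int:
        # trees deriving w[i:j] from the single symbol x
        if x not in g:  # terminal: matches exactly one character
            return 1 if j == i + 1 and w[i] == x else 0
        return sum(seq(rhs, i, j) for rhs in g[x])

    @lru_cache(maxsize=None)
    def seq(rhs: tuple[str, ...], i: int, j: int) -> int:
        # trees deriving w[i:j] from the sentential form rhs
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # no epsilon: every symbol covers >= 1 character, so leave room for the rest
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return sym(start, 0, len(w))


G_VERT = {"Z": [("x", "A", "y"), ("x", "B", "y")], "A": [("a",)], "B": [("a",)]}
G_HORIZ = {"Z": [("A", "B")], "A": [("x",), ("x", "a")], "B": [("a", "y"), ("y",)]}
print(count_trees(G_VERT, "Z", "xay"), count_trees(G_HORIZ, "Z", "xay"))  # 2 2
```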
7// Side-Stepping Undecidability
However, just because it's undecidable doesn't mean there aren't (good) approximations! Indeed, the whole area of static analysis works on side-stepping undecidability.
- Unsafe approximation
- Safe approximation
[Figure: ambiguous vs. unambiguous grammars, illustrating an unsafe approximation, a safe (over-)approximation and a safe (under-)approximation]
8// Motivation
- Use safe (over-)approximation
- Yes! ⇒ G guaranteed unambiguous!!!
- Safely use any GLR parser on G
- Because there will never be two parses at runtime!
- Hence
- dynamic parse ambiguity → static parse ambiguity
[Figure: ambiguous vs. unambiguous grammars; the answer "Yes!" lies entirely within the unambiguous ones]
9// Motivation (cont'd)
- Undecidability means there'll always be a slack
- However, still useful!
- Possible interpretations of "No?"
- Treat as error (reject grammar)
- "Please redesign your grammar" (as in LALR(k))
- Treat as warning
- "Here are some potential problems"
[Figure: the answer "No?" covers all ambiguous grammars and, due to the slack, some unambiguous ones]
10// Vertical Ambiguity
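- Vertical ambiguity
$G$ is vertically unambiguous iff $\forall n \in N,\ \forall \alpha, \alpha' \in \pi(n),\ \alpha \neq \alpha' : \mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') = \emptyset$
- Example
Z → x A y | x B y
A → a
B → a
- Ambiguous string: xay
- A reduce/reduce conflict in Yacc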
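For a non-recursive grammar every $\mathcal{L}(\alpha)$ is finite, so the vertical check can be run exactly. The sketch below assumes that restriction (a recursive grammar would make `lang` recurse without bound) and the same hypothetical dict encoding of grammars; `vertical_conflicts` is a name chosen for illustration. It enumerates $\mathcal{L}(\alpha)$ for each right-hand side and reports every pair of distinct productions of one nonterminal whose languages intersect, with the common strings as witnesses.

```python
from functools import cache
from itertools import combinations

# Hypothetical encoding: nonterminal -> list of right-hand sides; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def vertical_conflicts(g: Grammar) -> list[tuple[str, tuple, tuple, set[str]]]:
    """Pairs of distinct productions of the same nonterminal whose languages
    intersect, with the common strings as witnesses.

    Assumes g is non-recursive, so every L(alpha) is finite and enumerable.
    """

    @cache
    def lang(rhs: tuple[str, ...]) -> frozenset[str]:
        # L(alpha) for a sentential form alpha: concatenate the symbols' languages
        words = {""}
        for s in rhs:
            sub = {s} if s not in g else set().union(*(lang(r) for r in g[s]))
            words = {u + v for u in words for v in sub}
        return frozenset(words)

    conflicts = []
    for n, rhss in g.items():
        for alpha, beta in combinations(rhss, 2):
            common = lang(alpha) & lang(beta)
            if common:
                conflicts.append((n, alpha, beta, set(common)))
    return conflicts


G_VERT = {"Z": [("x", "A", "y"), ("x", "B", "y")], "A": [("a",)], "B": [("a",)]}
print(vertical_conflicts(G_VERT))
# [('Z', ('x', 'A', 'y'), ('x', 'B', 'y'), {'xay'})]
```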
11// Horizontal Ambiguity
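- Horizontal ambiguity
$G$ is horizontally unambiguous iff $\forall n \in N,\ \forall \alpha \in \pi(n),\ \forall i \in \{1, \dots, |\alpha|-1\} : \mathcal{L}(\alpha_0 \cdots \alpha_{i-1}) \,\widetilde{\cap}\, \mathcal{L}(\alpha_i \cdots \alpha_{|\alpha|-1}) = \emptyset$
- where $\widetilde{\cap} : \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Sigma^*)$ is the overlap operator
$X \,\widetilde{\cap}\, Y = \{\, xay \mid x, y \in \Sigma^*,\ a \in \Sigma^+,\ x, xa \in X,\ y, ay \in Y \,\}$
- Example
Z → A B
A → x a | x
B → a y | y
- Ambiguous string: xay
- A shift/reduce conflict in Yacc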
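On finite languages the overlap operator can be computed by brute force. The sketch below assumes a non-recursive grammar in the same hypothetical dict encoding; `overlap` returns the witnesses $(x, a, y)$ rather than the strings $xay$, and `horizontal_conflicts` (also a hypothetical name) applies it at every split point of every production.

```python
from functools import cache

# Hypothetical encoding: nonterminal -> list of right-hand sides; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def overlap(X: set[str], Y: set[str]) -> set[tuple[str, str, str]]:
    """Witnesses (x, a, y) of the overlap for finite X, Y:
    a nonempty, x and xa in X, y and ay in Y."""
    result = set()
    for x in X:
        for xa in X:
            if len(xa) > len(x) and xa.startswith(x):
                a = xa[len(x):]
                result |= {(x, a, y) for y in Y if a + y in Y}
    return result


def horizontal_conflicts(g: Grammar) -> list[tuple[str, tuple, int, set]]:
    """For each production alpha and split point i, the nonempty overlaps of
    L(alpha[:i]) and L(alpha[i:]). Assumes a non-recursive grammar."""

    @cache
    def lang(rhs: tuple[str, ...]) -> frozenset[str]:
        words = {""}
        for s in rhs:
            sub = {s} if s not in g else set().union(*(lang(r) for r in g[s]))
            words = {u + v for u in words for v in sub}
        return frozenset(words)

    conflicts = []
    for n, rhss in g.items():
        for alpha in rhss:
            for i in range(1, len(alpha)):
                w = overlap(set(lang(alpha[:i])), set(lang(alpha[i:])))
                if w:
                    conflicts.append((n, alpha, i, w))
    return conflicts


G_HORIZ = {"Z": [("A", "B")], "A": [("x",), ("x", "a")], "B": [("a", "y"), ("y",)]}
print(horizontal_conflicts(G_HORIZ))
# [('Z', ('A', 'B'), 1, {('x', 'a', 'y')})]
```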
12// Characterization of Ambiguity
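- Theorem 1: $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\iff$ $G$ unambiguous
- Lemma 1a ($\Rightarrow$): $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Rightarrow$ $G$ unambiguous
- Lemma 1b ($\Leftarrow$): $G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Leftarrow$ $G$ unambiguous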
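13// Proof (Lemma 1a) ⇒
$G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Rightarrow$ $G$ unambiguous
- or contrapositively: $G$ ambiguous $\Rightarrow$ $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
- Proof
- Assume $G$ ambiguous (i.e., $\exists$ 2 derivation trees for $\omega$)
- Show $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
- by induction in the max height of the 2 derivation trees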
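14// Proof (Lemma 1a) ⇒ (Base)
- Base case (height ≤ 1)
- The ambiguity means that (for $p \neq p'$) the two trees use productions $p: N \to \alpha$ and $p': N \to \alpha'$, both yielding $\omega$
- Which means $\omega \in \mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') \neq \emptyset$
- i.e., we have a vertical ambiguity: $G$ is vertically ambiguous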
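15// Proof (Lemma 1a) ⇒ (I.H.)
- Induction step (height ≤ n)
- Assume induction hypothesis (for height ≤ n-1)
- The ambiguity means: two trees rooted at $N$, using productions $p: N \to \alpha_0 \cdots \alpha_{|\alpha|-1}$ and $p': N \to \alpha'_0 \cdots \alpha'_{|\alpha'|-1}$, with subtrees $\alpha_i \Rightarrow^* \omega_i$ and $\alpha'_i \Rightarrow^* \omega'_i$ of height ≤ n-1, where $\omega_0 \cdots \omega_{|\alpha|-1} = \omega = \omega'_0 \cdots \omega'_{|\alpha'|-1}$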
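16// Proof (Lemma 1a) ⇒ (p ≠ p')
- Case $p \neq p'$ (different productions)
- but then $\omega \in \mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') \neq \emptyset$
- i.e., we have a vertical ambiguity: $G$ is vertically ambiguous
[Figure: the two trees rooted at N with different top productions p and p', subtrees of height ≤ n-1]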
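17// Proof (Lemma 1a) ⇒ (p = p', 1)
- Case $p = p'$ (same production ⇒ $\alpha = \alpha'$)
- i.e., the tops of the trees are the same
- Case $\forall i: \omega_i = \omega'_i$
- ⇒ ambiguity in some subtree$_i$ (both subtrees deriving the same $\omega_i$)
- Induction hypothesis (on this subtree) ⇒ $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
[Figure: the two trees with the same top production p and identical substrings $\omega_i$ below each child]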
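18// Proof (Lemma 1a) ⇒ (p = p', 2)
- Case $p = p'$ (same production ⇒ $\alpha = \alpha'$)
- Case $\exists i: \omega_i \neq \omega'_i$; let $i$ be the least such index
- but then (assume WLOG $|\omega_i| < |\omega'_i|$), since both sequences concatenate to the same $\omega$, there is a 2nd least index $j > i$ with $\omega_j \neq \omega'_j$
- Now pick any $k$ with $i \leq k < j$
- ...then $\omega \in \mathcal{L}(\alpha_0 \cdots \alpha_k) \,\widetilde{\cap}\, \mathcal{L}(\alpha_{k+1} \cdots \alpha_{|\alpha|-1}) \neq \emptyset$
- i.e., we have a horizontal ambiguity: $G$ is horizontally ambiguous
[Figure: the two trees with the same top production; the substring boundaries first differ at child i and again at child j, with the split point k in between]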
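19// Proof (Lemma 1b) ⇐
$G$ vertically unambiguous $\wedge$ $G$ horizontally unambiguous $\Leftarrow$ $G$ unambiguous
- Contrapositively: $G$ ambiguous $\Leftarrow$ $G$ vertically ambiguous $\vee$ $G$ horizontally ambiguous
- Assume (vertical conflict)
- Then for some $N \in N$: $N \to \alpha \Rightarrow^* a$ and $N \to \alpha' \Rightarrow^* a$ with $\alpha \neq \alpha'$, i.e., $a \in \mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') \neq \emptyset$
- But then derive (using reachability and derivability of $N$) two different derivation trees for $xay$:
$s \Rightarrow^* x N \beta \Rightarrow x \alpha \beta \Rightarrow^* x a \beta \Rightarrow^* x a y$
$s \Rightarrow^* x N \beta \Rightarrow x \alpha' \beta \Rightarrow^* x a \beta \Rightarrow^* x a y$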
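20// Proof (Lemma 1b) ⇐ (cont'd)
- Assume (horizontal conflict)
- Then for some $N \in N$: $N \to \alpha \alpha'$ with $\mathcal{L}(\alpha) \,\widetilde{\cap}\, \mathcal{L}(\alpha') \neq \emptyset$,
i.e., $\exists x, y \in \Sigma^*,\ \exists a \in \Sigma^+ : x, xa \in \mathcal{L}(\alpha) \wedge y, ay \in \mathcal{L}(\alpha')$
- But then derive (using reachability and derivability of $N$) two different derivation trees for $vxayw$:
$s \Rightarrow^* v N \beta \Rightarrow v \alpha \alpha' \beta \Rightarrow^* v x \alpha' \beta \Rightarrow^* v x a y \beta \Rightarrow^* v x a y w$
$s \Rightarrow^* v N \beta \Rightarrow v \alpha \alpha' \beta \Rightarrow^* v x a \alpha' \beta \Rightarrow^* v x a y \beta \Rightarrow^* v x a y w$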
21// (Over-)Approximation (A)
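- (Over-)Approximation $A : E^* \to \mathcal{P}(\Sigma^*)$
- $A$ decidable, and emptiness of $\cap$ and $\widetilde{\cap}$ decidable on co-dom($A$)
$\forall \alpha \in E^* : \mathcal{L}(\alpha) \subseteq A(\alpha)$
- Approximated vertical ambiguity: $G$ is $A$-vertically unambiguous iff
$\forall n \in N,\ \forall \alpha, \alpha' \in \pi(n),\ \alpha \neq \alpha' : A(\alpha) \cap A(\alpha') = \emptyset$
- Approximated horizontal ambiguity: $G$ is $A$-horizontally unambiguous iff
$\forall n \in N,\ \forall \alpha \in \pi(n),\ \forall i \in \{1, \dots, |\alpha|-1\} : A(\alpha_0 \cdots \alpha_{i-1}) \,\widetilde{\cap}\, A(\alpha_i \cdots \alpha_{|\alpha|-1}) = \emptyset$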
22// Ambiguity Approximation
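- Theorem 2: $G$ $A$-vertically unambiguous $\wedge$ $G$ $A$-horizontally unambiguous $\Rightarrow$ $G$ unambiguous
- Proof
- Conflicts w/ smaller sets ⇒ conflicts w/ larger sets, since $\mathcal{L}(\alpha) \subseteq A(\alpha)$ and both $\cap$ and $\widetilde{\cap}$ are monotone in each argument:
$A(\alpha) \cap A(\alpha') = \emptyset \Rightarrow \mathcal{L}(\alpha) \cap \mathcal{L}(\alpha') = \emptyset$
$A(\alpha) \,\widetilde{\cap}\, A(\alpha') = \emptyset \Rightarrow \mathcal{L}(\alpha) \,\widetilde{\cap}\, \mathcal{L}(\alpha') = \emptyset$
- Hence $G$ is vertically and horizontally unambiguous, and Theorem 1 gives $G$ unambiguous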
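To see Theorem 2 with a concrete (and deliberately crude) choice of $A$, consider a hypothetical approximation that is not A_MN and is used here only because it is short: $A_{FIRST}(\alpha) = \{\varepsilon \mid \alpha \text{ nullable}\} \cup \{cw \mid c \in FIRST(\alpha),\ w \in \Sigma^*\}$. It contains $\mathcal{L}(\alpha)$, and two such sets are disjoint exactly when the FIRST sets are disjoint and not both forms are nullable, which is essentially the LL(1) FIRST-set test. The sketch below assumes single-character terminals and the same hypothetical dict encoding; it only implements the approximated vertical check.

```python
from itertools import combinations

# Hypothetical encoding: nonterminal -> list of right-hand sides; non-keys are terminals.
Grammar = dict[str, list[tuple[str, ...]]]


def seq_first(rhs, g, first, nullable):
    """FIRST set and nullability of a sentential form, given per-nonterminal info."""
    f: set[str] = set()
    for s in rhs:
        if s not in g:  # terminal (assumed to be a single character)
            f.add(s)
            return f, False
        f |= first[s]
        if s not in nullable:
            return f, False
    return f, True  # every symbol can derive the empty string


def first_and_nullable(g: Grammar):
    """Least fixpoint of the usual FIRST/nullable equations."""
    first = {n: set() for n in g}
    nullable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for n, rhss in g.items():
            for rhs in rhss:
                f, nul = seq_first(rhs, g, first, nullable)
                if not f <= first[n]:
                    first[n] |= f
                    changed = True
                if nul and n not in nullable:
                    nullable.add(n)
                    changed = True
    return first, nullable


def approx_vertical_suspects(g: Grammar):
    """Pairs of productions whose A_FIRST approximations intersect.
    An empty result means G is A_FIRST-vertically unambiguous."""
    first, nullable = first_and_nullable(g)
    suspects = []
    for n, rhss in g.items():
        for alpha, beta in combinations(rhss, 2):
            fa, na = seq_first(alpha, g, first, nullable)
            fb, nb = seq_first(beta, g, first, nullable)
            if fa & fb or (na and nb):
                suspects.append((n, alpha, beta))
    return suspects


# Safe but coarse: Z -> xa | xb is unambiguous, yet both forms start with x.
print(approx_vertical_suspects({"Z": [("x", "a"), ("x", "b")]}))
# [('Z', ('x', 'a'), ('x', 'b'))]
```

The last line is a false positive: the answer is safe (no ambiguity is missed) but imprecise, which is exactly the slack a better $A$ such as A_MN tries to shrink.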
23// Compositionality (of A's)
- Corollary 3: $A, A'$ decidable (over-)approximations $\Rightarrow$ $A \cap A'$ decidable (over-)approximation, where $(A \cap A')(\alpha) = A(\alpha) \cap A'(\alpha)$
- Proof
- Follows from the definition (omitted)
- i.e., approximations are compositional!
[Figure: ambiguous vs. unambiguous as approximated by A, by A', and by A ∩ A']
24// Choice(s) of A?
- $A_\top(\alpha) = \Sigma^*$ (constant)
- Worst approximation
- but a safe approximation!
- Useless
- Cannot determine that any grammars are unambiguous
[Figure: the worst approximation classifies every grammar as potentially ambiguous]
25// Choice(s) of A? (cont'd)
- $A_{MN}(\alpha) = \text{Mohri-Nederhof}(\alpha)$
- CFG → DFA (NFA) approximation
- Properties of this black-box
- Good (over-)approximation!
- Works on the language, $\mathcal{L}(G)$
- not on the grammatical structure, $G$
- Approximation parameterizable
- E.g., unfold nonterminals $n$ times
[Black-box: Mohri & Nederhof (2000), "Regular Approximation of Context-Free Grammars through Transformation"]
26// Decidability (of AMN)
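- $\cap$: $X \cap Y = \emptyset$?
- Decidable (using DFAs)
- $O(|X_{NFA}| \cdot |Y_{NFA}|)$
- $\widetilde{\cap}$: $X \,\widetilde{\cap}\, Y = \emptyset$?
- Decidable (using DFAs)
- $O(|X_{NFA}| \cdot |Y_{NFA}|)$
- $A_{MN}$ decidable
- With potential counterexamples (using DFAs)
$G$ $A_{MN}$-vertically unambiguous $\wedge$ $G$ $A_{MN}$-horizontally unambiguous $\Rightarrow$ $G$ unambiguous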
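27// Decision Algorithm for ($X \,\widetilde{\cap}\, Y$)
- For $X, Y$ regular languages
- All overlappings, $xay$, as DFAs: a variant of the $\times$ (product) construction!
[Figure: $X_{NFA}$ and $Y_{NFA}$ combined into $XY_{NFA}$; an a-path read by X after x (reaching acceptance of xa) and by Y before y (with ay accepted) witnesses $xay \in X \,\widetilde{\cap}\, Y$]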
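A product-style search of this kind can be sketched as follows. This is our illustration under stated assumptions, not the slides' construction: X and Y are given as partial DFAs (a hypothetical `DFA` class with a transition dict), and the search proceeds in three phases: reach an accepting X-state `p` by some `x`; from the pair (p, start of Y) read a nonempty `a` that is accepted by X and leads Y to a state from which some `y` is accepted while Y, started afresh, also accepts `y`. Each phase is a breadth-first search over pairs of DFA states. We make no claim that this matches the complexity bound quoted on the previous slide.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class DFA:
    start: int
    accept: set[int]
    delta: dict[tuple[int, str], int]  # partial: missing entries reject

    def states(self) -> set[int]:
        return {self.start} | {p for (p, _) in self.delta} | set(self.delta.values())

    def alphabet(self) -> set[str]:
        return {c for (_, c) in self.delta}

    def run(self, p, w: str):
        for c in w:
            p = self.delta.get((p, c))
            if p is None:
                return None
        return p


def search(da: DFA, sa: int, db: DFA, sb: int, goal, nonempty: bool):
    """BFS over state pairs, feeding the same string to both DFAs. Returns a
    shortest w (nonempty if required) with goal(state_a, state_b), else None."""
    if not nonempty and goal(sa, sb):
        return ""
    seen, queue = {(sa, sb)}, deque([((sa, sb), "")])
    while queue:
        (p, q), w = queue.popleft()
        for c in sorted(da.alphabet() & db.alphabet()):
            nxt = (da.delta.get((p, c)), db.delta.get((q, c)))
            if None in nxt:
                continue
            if goal(*nxt):
                return w + c
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, w + c))
    return None


def overlap_witness(X: DFA, Y: DFA):
    """Some (x, a, y) with a != "", x, xa in L(X) and y, ay in L(Y); None if empty."""
    both_accept = lambda r, t: r in Y.accept and t in Y.accept
    # Y-states q such that some y is accepted both from q and from Y.start
    good = {q for q in Y.states()
            if search(Y, q, Y, Y.start, both_accept, False) is not None}
    for p in sorted(X.accept):
        x = search(X, X.start, X, X.start, lambda r, _t: r == p, False)  # x reaching p
        if x is None:
            continue
        a = search(X, p, Y, Y.start, lambda r, t: r in X.accept and t in good, True)
        if a is not None:
            y = search(Y, Y.run(Y.start, a), Y, Y.start, both_accept, False)
            return x, a, y
    return None


X = DFA(0, {1, 2}, {(0, "x"): 1, (1, "a"): 2})                 # L(X) = {x, xa}
Y = DFA(0, {2}, {(0, "a"): 1, (1, "y"): 2, (0, "y"): 2})        # L(Y) = {ay, y}
print(overlap_witness(X, Y))  # ('x', 'a', 'y')
```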
28// Three Approximation Answers
- Y!
- G definitely not ambiguous!
- ?/D?
- "Don't know?"
- Could not find any potential counterexamples.
- D? Don't know: look at the over-approximation, D?
- And here are all the potential counterexamples
- Note: some strings do not even parse!
- Improve: parse the potential counterexamples in D ⇒ a subset of the real counterexamples
[Figure: the three answers relative to the true answer]
29// Regaining Lost Precision!
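- Now parse all counterexamples!
- i.e., parse the DFA, $D_{DFA}$
- 1) i.e., construct $C_{CFG}$ with $\mathcal{L}(C_{CFG}) = \mathcal{L}(D_{DFA}) \cap \mathcal{L}(G_{CFG})$
- Decidable in $O(|D| \cdot |G|)$
- 2) Decide emptiness on $C$: $\mathcal{L}(C_{CFG}) = \emptyset$?
- Decidable in $O(|C|) = O(|D| \cdot |G|)$
- Only potential counterexamples that parse!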
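Steps 1) and 2) can be combined in one fixpoint. The sketch below is the classic product ("triple") construction, written by us as an illustration: a triple (p, N, q) records that N derives some string driving the DFA from state p to state q. The triples play the role of the nonterminals of C, and $\mathcal{L}(C)$ is nonempty iff (start of D, s, f) holds for some accepting f. The encodings and the name `intersection_nonempty` are hypothetical, and this naive fixpoint is not tuned to the bounds quoted above.

```python
# Hypothetical encodings: grammar as nonterminal -> list of right-hand sides
# (non-keys are terminals); DFA as a partial transition dict (state, char) -> state.
Grammar = dict[str, list[tuple[str, ...]]]
Delta = dict[tuple[int, str], int]


def intersection_nonempty(g: Grammar, start: str, delta: Delta,
                          d_start: int, d_accept: set[int]) -> bool:
    """Decide whether L(G) ∩ L(D) is nonempty via triples (p, N, q)."""
    states = {d_start} | {p for (p, _) in delta} | set(delta.values())
    derives: set[tuple[int, str, int]] = set()

    def ends(p: int, rhs: tuple[str, ...]) -> set[int]:
        # DFA states reachable from p by some string derivable from rhs (so far)
        current = {p}
        for s in rhs:
            if s in g:
                current = {q for (r, n, q) in derives if r in current and n == s}
            else:
                current = {delta[(r, s)] for r in current if (r, s) in delta}
        return current

    changed = True
    while changed:  # least fixpoint over the finite set of triples
        changed = False
        for n, rhss in g.items():
            for rhs in rhss:
                for p in states:
                    for q in ends(p, rhs):
                        if (p, n, q) not in derives:
                            derives.add((p, n, q))
                            changed = True
    return any((d_start, start, f) in derives for f in d_accept)


G_VERT = {"Z": [("x", "A", "y"), ("x", "B", "y")], "A": [("a",)], "B": [("a",)]}
D_XAY = {(0, "x"): 1, (1, "a"): 2, (2, "y"): 3}  # DFA accepting only "xay"
print(intersection_nonempty(G_VERT, "Z", D_XAY, 0, {3}))  # True
```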
30// Three Approximation Answers
- Y!
- G definitely not ambiguous!
- ?/C?
- "Don't know?"
- Could not find any counterexamples.
- C? Don't know: look at the over-approximation, C?
- And here are all the potential counterexamples
- Note: all strings actually parse (maybe not ambiguously)!
- Improve: extract a finite under-approximation...?
[Figure: the three answers relative to the true answer]
31// Asymptotic (Time) Complexity
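- Mohri-Nederhof: $O(n^2 v h)$
- Vertical ambiguity: $O(n^3 v^4 h^4)$
- Horizontal ambiguity: $O(n^3 v^3 h^5)$
- Total: $O(n^3 v^3 h^4 (v + h)) \subseteq O(g^5)$
[Figure: grammar dimensions: n nonterminals $N_1, \dots$, each with at most v productions, each of length at most h]
- $n = |N|$ (number of nonterminals)
- $v = \max |\pi(N)|$ over $N \in N$ (max number of productions per nonterminal)
- $h = \max |\alpha|$ over $\alpha \in \pi(N)$, $N \in N$ (max production length)
- $g = nvh \approx |G|$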
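The final containment can be checked directly. The following worked bound is ours, using only the definitions above and $n, v, h \geq 1$ (so $\max(v, h) \leq vh$):

```latex
\begin{aligned}
n^3v^4h^4 + n^3v^3h^5 &= n^3v^3h^4\,(v+h) \\
  &\le 2\,n^3v^3h^4\,\max(v,h) \le 2\,n^3v^4h^5 \le 2\,(nvh)^5 = 2\,g^5, \\
n^2vh &= n \cdot g \le g^2 \le g^5 .
\end{aligned}
```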
32// Related Work (Dynamic)
- Dynamic disambiguation
- Disambiguation-by-convention
- Longest match, most specific match, ...
- Customizable
- Bison (v. 1.5): %dprec, %merge
- ASF+SDF: disambiguation filters
- Dynamic ambiguity interception
- GLR (Tomita, Earley, Bison, ASF+SDF, ...)
33// Related Work (Static)
- Static disambiguation
- Disambiguation-by-convention
- First match, most specific match, ...
- Customizable
- Yacc: %left, %right, %nonassoc, %prec
- Static ambiguity interception
- LL(k), LALR(k), ...
- Our work goes here (but for GLR)!
34// Implementation
In progress!
35// Assessment
- Quality of approximation
- Quantity of false positives
- Precision
- Our \ LR(k) ?
- LR(k) \ Our ?
- False-positives ?
- Characterize ? / N?
- In terms of grammatical structure ?
- Efficiency (in practice)
In progress!
36// Example Expression chains
E → E + T | T
T → T * F | F
F → ( E ) | x
37// Example Balancing Structures
- Nasty
- Requires
- Unbounded memory (# of x's)
- i.e., CFG structure
- Unbounded lookahead
- i.e., any finite k is insufficient
- ⇒ False positives!
S → A A
A → x A x | y
Example string: xxyxxxyx
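Why this grammar produces a false positive can be illustrated on finite truncations. Assume, purely for illustration, that a regular approximation maps $\mathcal{L}(A) = \{x^n y x^n\}$ to $x^* y x^*$ (a plausible shape; we do not claim it is exactly what the Mohri-Nederhof transformation outputs). The horizontal check for S → A A compares the language of A with itself. The sketch enumerates both languages up to a hypothetical cut-off `BOUND` and computes the overlap on the finite sets. Because the overlap is monotone, a nonempty result on a truncation proves a nonempty overlap on the full languages; an empty result is only evidence, although for $\mathcal{L}(A)$ it holds in general, since $x^n y x^n$ is never a proper prefix of $x^m y x^m$.

```python
def overlap(X: set[str], Y: set[str]) -> set[tuple[str, str, str]]:
    """Witnesses (x, a, y) for finite X, Y: a nonempty, x and xa in X, y and ay in Y."""
    return {(x, xa[len(x):], y)
            for x in X for xa in X if len(xa) > len(x) and xa.startswith(x)
            for y in Y if xa[len(x):] + y in Y}


BOUND = 4  # hypothetical cut-off: only n, i, j < BOUND are enumerated
L_A = {"x" * n + "y" + "x" * n for n in range(BOUND)}                          # x^n y x^n
L_REG = {"x" * i + "y" + "x" * j for i in range(BOUND) for j in range(BOUND)}  # x* y x*

print(overlap(L_A, L_A))                         # set(): exact check finds no conflict
print(("y", "x", "y") in overlap(L_REG, L_REG))  # True: the approximation reports one
```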
38// Future Work
- Permit E → E ⊕ E
- With disambiguating conventions for
- Associativity
- Precedence
- Parsing optimization
- Exploit compile-time analysis information at runtime
39// Conclusion
Approximating Context-Free Grammar Ambiguity
Context-free grammar ambiguity is undecidable. However, just because it's undecidable doesn't mean there aren't (good) approximations! Indeed, the whole area of static analysis works on side-stepping undecidability. We exhibit a characterization of context-free ambiguity which induces a whole framework for (over-)approximation. In particular, we give an approximation based on the Mohri-Nederhof (2000) regular approximation of context-free grammars and show how to boost the precision even further.
But wait, there's more...
40// Lessons Learned
- Framework
- Plug in your favorite (over-)approximation of $\mathcal{L}(\alpha)$
- Even take the intersection of them: $A = \bigcap_i A_i$
- Approximations are closed under intersection
- Methodology
- Just because it's undecidable doesn't mean there aren't (good) approximations
- Quantity of false positives (practically motivated)
- What to do with false positives (practically motivated)
- Don't be scared of undecidability
41// Bonus slides
42// Membership Decidable!
- Membership (aka parsing)
- Given $\omega \in \Sigma^*$
- Is the string, $\omega$, in the language of $G$: $\omega \in \mathcal{L}(G)$?
- Algorithms
- LL(k): $O(|\omega|)$
- LALR(k): $O(|\omega|)$
- GLR: $O(|\omega|^3)$
43// Parsing Greedily Left-to-Right
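- The ambiguity problem for XY... may occur in 2 cases
- (too little) x stops before the split: not possible (due to greediness)
- (too much) x extends past the split: only this is a problem!
- In fact, it is already a problem if x goes too far
- Thus, we only have a problem if X eats into Y
- Essentially disambiguation by picking the longest match
[Figure: a string split into x and y, with x stopping too early vs. x extending into y]
$X \,\widetilde{\cap}\, Y \neq \emptyset \Rightarrow X \cap X(\mathrm{prefix}(Y) \setminus \{\varepsilon\}) \neq \emptyset$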