Title: Improved Inference for Unlexicalized Parsing
- Slav Petrov and Dan Klein
Unlexicalized Parsing
Petrov et al. 06
- Hierarchical, adaptive refinement of the grammar categories
- 91.2 F1 score on the development set (1600 sentences)
Coarse-to-Fine Parsing
Goodman 97; Charniak & Johnson 05
Prune?
- For each chart item X[i, j], compute its posterior probability:
  P(X over span [i, j] | sentence) = P_IN(X, i, j) * P_OUT(X, i, j) / P(sentence)
- Prune the item if this posterior is below a threshold (sketched below)
- E.g., consider the span 5 to 12
(Figure: coarse chart vs. refined chart; items pruned in the coarse pass are not built in the refined pass)
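The pruning step can be illustrated roughly as follows; this is a minimal sketch, assuming inside and outside scores from the coarse pass are already available as dictionaries (the names inside, outside, and threshold are illustrative, not the parser's actual API):

```python
def prune_chart(inside, outside, sentence_prob, threshold=1e-4):
    """Return the set of chart items whose coarse posterior survives pruning.

    inside, outside: dicts mapping (symbol, i, j) -> inside/outside score
    under the coarse grammar.
    sentence_prob: inside score of the root symbol over the whole sentence.
    """
    allowed = set()
    for (symbol, i, j), in_score in inside.items():
        out_score = outside.get((symbol, i, j), 0.0)
        posterior = in_score * out_score / sentence_prob
        if posterior >= threshold:   # keep only sufficiently probable items
            allowed.add((symbol, i, j))
    return allowed
```

The refined pass then only proposes items whose coarse projection is in the allowed set.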
Parsing time on the development set:
- 1621 min with no pruning
- 111 min with coarse-to-fine pruning
- (no search error)
Multilevel Coarse-to-Fine Parsing
Charniak et al. 06
- Add more rounds of pre-parsing
- Grammars coarser than X-Bar
Hierarchical Pruning
- Consider again the span 5 to 12
- Prune repeatedly, with a sequence of increasingly refined grammars: coarse, split in two, split in four, split in eight (see the sketch below)
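A minimal sketch of this loop under stated assumptions: the grammars are ordered from coarsest to finest, and a caller-supplied run_inside_outside function (hypothetical, not the parser's actual API) computes inside/outside scores while skipping any item whose projection was pruned at the previous level.

```python
def hierarchical_parse(grammars, sentence, run_inside_outside, threshold=1e-4):
    """Coarse-to-fine parsing with a chain of grammars G0 (coarsest) ... Gn (finest).

    run_inside_outside(grammar, sentence, allowed) -> (inside, outside, sentence_prob),
    where inside/outside map (symbol, i, j) -> score and items whose coarser
    projection is not in `allowed` are skipped (allowed=None means no pruning).
    """
    allowed = None
    inside, outside = {}, {}
    for grammar in grammars:
        inside, outside, sentence_prob = run_inside_outside(grammar, sentence, allowed)
        # Posterior-prune this level's chart before moving to the next refinement.
        allowed = {
            item for item, in_score in inside.items()
            if in_score * outside.get(item, 0.0) / sentence_prob >= threshold
        }
    return inside, outside
```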
Intermediate Grammars
X-Bar = G0, G1, G2, ..., Gn = G (the grammars produced at each stage of hierarchical training)
Parsing time on the development set:
- 1621 min with no pruning
- 111 min with coarse-to-fine pruning
- 35 min with hierarchical pruning
- (no search error)
State Drift (DT tag): the subcategories learned at different training stages drift apart, so the intermediate training grammars are not ideal pruning grammars
Projected Grammars
- Instead of reusing the grammars from training, project the final grammar G down onto each coarser level: X-Bar = G0, G1, ..., Gn = G (sketched below)
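A minimal sketch of such a projection, under the simplifying assumption that each refined symbol is encoded as a base category plus a subcategory index and that every stage split each subcategory in two, so projecting to a coarser level just drops the low-order bits of the index; this encoding is an assumption for illustration, not the parser's actual representation.

```python
def project_symbol(symbol, from_level, to_level):
    """Project a refined symbol like ('NP', 5) at `from_level` down to `to_level`.

    Assumes pure binary splitting: a symbol at level k carries a k-bit
    subcategory index, so projecting to a coarser level keeps the leading bits.
    """
    base, sub = symbol
    return (base, sub >> (from_level - to_level))

# Example: the level-3 subcategory NP_5 (binary 101) projects to NP_1 (binary 1)
# at level 1, and to plain NP_0 (the X-Bar symbol) at level 0.
assert project_symbol(("NP", 5), 3, 1) == ("NP", 1)
assert project_symbol(("NP", 5), 3, 0) == ("NP", 0)
```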
Estimating Projected Grammars
(Figure: the nonterminals of G — S0, S1, NP0, NP1, VP0, VP1 — map onto the nonterminals of π(G) — S, NP, VP)
Estimating Projected Grammars
Rule in π(G): S → NP VP
Refined rules in G that project onto it:
S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12
Estimating Projected Grammars
Corazza & Satta 06
- Estimate each projected rule's probability as a sum of the corresponding refined rule probabilities, weighted by the expected relative frequencies of the refined parent symbols (written out below; the slide's running example yields 0.56)
- Corazza & Satta 06 show that this estimate gives the projected grammar closest to the refined one
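Written out, and assuming c(A_x) denotes the expected count of the refined symbol A_x in trees generated by G (notation assumed here for illustration), the estimate has roughly this form:

```latex
% Projected probability of a coarse rule A -> B C as an expectation-weighted sum
% over the refined rules that project onto it (c(A_x): expected count of A_x).
P_{\pi(G)}(A \rightarrow B\, C) \;=\;
  \frac{\sum_{x} c(A_x) \sum_{y,z} P_{G}(A_x \rightarrow B_y\, C_z)}
       {\sum_{x} c(A_x)}
```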
Calculating Expectations
- Nonterminals: compute c_k(X), the expected count of X in trees of depth at most k, by iterating a recurrence over the rule probabilities
- Converges within 25 iterations (a few seconds)
- Rules: the expected count of a rule is then its parent's expected count times the rule's probability (sketched below)
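A minimal sketch of these expectations, assuming the refined grammar is given as a dict from parent symbol to a list of (children-tuple, probability) pairs with a designated root; this representation is assumed for illustration only.

```python
def expected_symbol_counts(rules, root, iterations=25):
    """Expected count of each symbol in trees generated by the grammar.

    rules: dict mapping parent -> list of (children, probability); terminals are
    symbols that never appear as a parent. Iterating the depth-bounded
    recurrence approximates the fixed point (25 iterations in the talk).
    """
    counts = {root: 1.0}
    for _ in range(iterations):
        new_counts = {root: 1.0}
        for parent, expansions in rules.items():
            parent_count = counts.get(parent, 0.0)
            for children, prob in expansions:
                for child in children:
                    new_counts[child] = new_counts.get(child, 0.0) + parent_count * prob
        counts = new_counts
    return counts


def expected_rule_counts(rules, symbol_counts):
    """Expected count of each rule = expected count of its parent x its probability."""
    return {
        (parent, children): symbol_counts.get(parent, 0.0) * prob
        for parent, expansions in rules.items()
        for children, prob in expansions
    }
```

The projected rule probabilities of the previous slides are then these expected rule counts, grouped by the coarse rule they project onto and normalized by the parent symbol's expected count.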
Parsing time on the development set:
- 1621 min with no pruning
- 111 min with coarse-to-fine pruning
- 35 min with hierarchical pruning
- 15 min with hierarchical pruning using projected grammars
- (no search error)
Parsing times
(Figure: parsing time at each grammar level, from X-Bar = G0 through the fully refined grammar G)
Bracket Posteriors (after G0)
Bracket Posteriors (after G1)
Bracket Posteriors (Movie / Final Chart)
Bracket Posteriors (Best Tree)
(Figures: bracket posterior probabilities at each stage of pruning)
Parse Selection
- Computing the most likely unsplit tree is NP-hard, since a tree's probability sums over all of its derivations (see below)
- Settle for the best derivation
- Rerank an n-best list
- Use an alternative objective function
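For concreteness, the quantity one would ideally maximize, with notation assumed here for illustration (T an unsplit tree, d ranging over its derivations under the refined grammar, w the sentence):

```latex
% The probability of an unsplit tree sums over all refined derivations d of T;
% maximizing this sum over trees is the NP-hard objective referred to above.
T^{*} \;=\; \arg\max_{T} \; P(T \mid w)
       \;=\; \arg\max_{T} \sum_{d \in \mathcal{D}(T)} P(d \mid w)
```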
Parse Risk Minimization
Titov & Henderson 06
- Choose the tree that minimizes the expected loss according to our beliefs:
  T_P* = argmin over T_P of the sum over T_T of P(T_T | sentence) * L(T_P, T_T)
- T_T: the true tree
- T_P: the predicted tree
- L: a loss function (0/1, precision, recall, F1)
- Use an n-best candidate list and approximate the expectation with samples (see the sketch below)
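A minimal sketch of that approximation, assuming the n-best list comes with renormalized posterior weights that stand in for the full distribution over true trees; the representation and the loss function are illustrative assumptions.

```python
def min_risk_parse(candidates, loss):
    """Select the candidate tree with minimum expected loss.

    candidates: list of (tree, weight) pairs, e.g. an n-best list whose weights
    are renormalized posterior probabilities (or sample frequencies).
    loss: function (predicted_tree, true_tree) -> float, e.g. 0/1 or 1 - F1.
    """
    total = sum(weight for _, weight in candidates)
    best_tree, best_risk = None, float("inf")
    for predicted, _ in candidates:
        # Expected loss of predicting this tree, under our beliefs about the truth.
        risk = sum(weight * loss(predicted, true) for true, weight in candidates) / total
        if risk < best_risk:
            best_tree, best_risk = predicted, risk
    return best_tree
```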
Reranking Results
Dynamic Programming
- Matsuzaki et al. 05: approximate the posterior parse distribution (variational approximation)
- À la Goodman 98: maximize the expected number of correct rules (see below)
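As a sketch of the second objective, with notation assumed here for illustration (P_IN/P_OUT are inside/outside scores under the refined grammar, r(·) the posterior probability of using a coarse rule over a particular span, and w the sentence):

```latex
% Posterior probability of an unsplit rule A -> B C over span (i, k, j),
% summing over all refinements x, y, z; P(w) is the sentence probability.
r(A \rightarrow B\,C,\; i, k, j) \;=\;
  \frac{\sum_{x,y,z} P_{\mathrm{OUT}}(A_x, i, j)\;
        P(A_x \rightarrow B_y\, C_z)\;
        P_{\mathrm{IN}}(B_y, i, k)\; P_{\mathrm{IN}}(C_z, k, j)}{P(w)}

% The tree maximizing the product (or sum) of these rule posteriors can be
% found with a CKY-style dynamic program over the coarse symbols.
T^{*} \;=\; \arg\max_{T} \prod_{(A \rightarrow B C,\, i,k,j) \in T}
            r(A \rightarrow B\,C,\; i, k, j)
```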
Dynamic Programming Results
Final Results (Efficiency)
- Berkeley Parser: 15 min, 91.2 F-score, implemented in Java
- Charniak & Johnson 05 parser: 19 min, 90.7 F-score, implemented in C
Final Results (Accuracy)
Conclusions
- Hierarchical coarse-to-fine inference
- Projections
- Marginalization
- Multi-lingual unlexicalized parsing
Thank You!
- Parser available at http://nlp.cs.berkeley.edu