Title: Improved Inference for Unlexicalized Parsing
Slide 1: Improved Inference for Unlexicalized Parsing
- Slav Petrov and Dan Klein
Slide 2: Unlexicalized Parsing
Petrov et al. 06
- Hierarchical, adaptive refinement
- 91.2 F1 score on the dev set (1600 sentences)
- 1,140 nonterminal symbols
- 531,200 rewrites
- 1621 min parsing time
Slides 3-4: Coarse-to-Fine Parsing
Goodman 97, Charniak & Johnson 05
Slide 5: Prune?
- For each chart item X[i,j], compute its posterior probability and prune the item if it falls below a threshold:
  P_out(X, i, j) · P_in(X, i, j) / P(s) < threshold
E.g., consider the span 5 to 12:
[Diagram: coarse chart items QP, NP, VP over the span, and their refined counterparts]
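As a minimal sketch, assuming inside/outside scores under the coarse grammar are already available (the array layout and names are illustrative, not the Berkeley Parser API):

```java
public class PosteriorPruner {
    /**
     * keep[i][j][X] is true iff chart item (X, i, j) survives pruning, i.e.
     * its posterior under the coarse grammar reaches the threshold.
     *
     * @param inside        inside[i][j][X]  = P_in(X, i, j)
     * @param outside       outside[i][j][X] = P_out(X, i, j)
     * @param sentenceProb  P(s), the inside score of the root over the whole sentence
     */
    static boolean[][][] prune(double[][][] inside, double[][][] outside,
                               double sentenceProb, double threshold) {
        int n = inside.length;            // span starts 0..n-1
        int m = inside[0].length;         // span ends (exclusive), m = n + 1
        int symbols = inside[0][0].length;
        boolean[][][] keep = new boolean[n][m][symbols];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < m; j++)
                for (int x = 0; x < symbols; x++) {
                    // posterior = P_out * P_in / P(s), as in the formula above
                    double posterior = outside[i][j][x] * inside[i][j][x] / sentenceProb;
                    keep[i][j][x] = posterior >= threshold;
                }
        return keep;
    }
}
```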
Slide 6:
- No pruning: 1621 min
- With coarse-to-fine pruning: 111 min
- (no search error)
Slide 7: Multilevel Coarse-to-Fine Parsing
Charniak et al. 06
- Add more rounds of pre-parsing
- Grammars coarser than X-bar
Slide 8: Hierarchical Pruning
- Consider again the span 5 to 12
coarse:          QP | NP | VP
split in two:    QP1 QP2 | NP1 NP2 | VP1 VP2
split in four:   QP1 QP2 QP3 QP4 | NP1 NP2 NP3 NP4 | VP1 VP2 VP3 VP4
split in eight:  …
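Propagating pruning decisions down the hierarchy can be sketched as follows; the split map and mask layout are assumptions for illustration, not the parser's actual data structures:

```java
public class HierarchicalPruning {
    /**
     * Expands a pruning mask from one level to the next, finer one: a refined
     * symbol is allowed over a span iff its coarser version survived there.
     *
     * @param coarseKeep  keep[i][j][X] from the previous pass (see Slide 5)
     * @param split       split[X] lists the refined symbols of coarse symbol X,
     *                    e.g. QP -> {QP1, QP2}
     * @param numFine     number of symbols in the finer grammar
     */
    static boolean[][][] refineMask(boolean[][][] coarseKeep, int[][] split, int numFine) {
        int n = coarseKeep.length, m = coarseKeep[0].length;
        boolean[][][] fineKeep = new boolean[n][m][numFine];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < m; j++)
                for (int x = 0; x < coarseKeep[i][j].length; x++)
                    if (coarseKeep[i][j][x])
                        for (int y : split[x])
                            fineKeep[i][j][y] = true;
        return fineKeep;
    }
}
```

Each pass therefore only scores refined items whose coarse projections the previous grammar found plausible.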
Slide 9: Intermediate Grammars
[Diagram: hierarchy of intermediate grammars, from X-Bar = G0 to the full grammar G]
Slide 10:
- No pruning: 1621 min
- Coarse-to-fine: 111 min
- Multilevel coarse-to-fine: 35 min
- (no search error)
Slide 11: State Drift (DT tag)
Slide 12: Projected Grammars
[Diagram: grammar hierarchy from X-Bar = G0 to G, with the coarser grammars obtained by projection from G]
Slide 13: Estimating Projected Grammars
[Diagram: nonterminals in G (S0, S1, NP0, NP1, VP0, VP1) mapped onto nonterminals in π(G)]
Slide 14: Estimating Projected Grammars
S → NP VP
S1 → NP1 VP1  0.20    S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02    S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11    S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08    S2 → NP2 VP2  0.12
Slide 15: Estimating Projected Grammars
Corazza & Satta 06
Projected rule estimate: S → NP VP  0.56
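A hedged reconstruction of the estimate, following the deck's appeal to Corazza & Satta 06 (the slide's exact notation is not recoverable): the projected probability of a coarse rule is a ratio of expected counts under the tree distribution of the refined grammar G,

```latex
P_{\pi(G)}(X \to Y\,Z)
  = \frac{\sum_{x \in \pi^{-1}(X)} \sum_{y \in \pi^{-1}(Y)} \sum_{z \in \pi^{-1}(Z)} c\,(x \to y\,z)}
         {\sum_{x \in \pi^{-1}(X)} c\,(x)}
```

where π⁻¹(X) is the set of refined symbols projecting to X, and c(·) denotes the expected number of occurrences in trees generated by G.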
Slide 16: Calculating Expectations
- Nonterminals: c_k(X), the expected count of X in trees up to depth k
- Converges within 25 iterations (a few seconds)
- Rules: expected counts follow from the nonterminal expectations
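The fixed-point iteration for the nonterminal expectations can be sketched as follows; the binary-rules-only grammar representation is a simplification for illustration, not the parser's actual code:

```java
import java.util.List;

public class ExpectedCounts {
    // A binary rule A -> B C with probability p (unary rules omitted for brevity).
    record Rule(int parent, int left, int right, double p) {}

    /**
     * c_{k+1}(X) = [X == root] + sum over rules A -> B C of
     *              p(A -> B C) * c_k(A) * (#occurrences of X among {B, C}).
     * After k iterations, c holds expected symbol counts in trees of depth <= k;
     * the slide reports convergence within 25 iterations.
     */
    static double[] iterate(List<Rule> rules, int numSymbols, int root, int iterations) {
        double[] c = new double[numSymbols];
        c[root] = 1.0;                         // the root occurs once in every tree
        for (int k = 0; k < iterations; k++) {
            double[] next = new double[numSymbols];
            next[root] = 1.0;
            for (Rule r : rules) {
                // each expected occurrence of the parent contributes p expected
                // occurrences of each child
                next[r.left()]  += r.p() * c[r.parent()];
                next[r.right()] += r.p() * c[r.parent()];
            }
            c = next;
        }
        return c;
    }
}
```

Rule expectations then follow directly: the expected count of A → B C is c(A) times the rule's probability.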
Slide 17:
- No pruning: 1621 min
- Coarse-to-fine: 111 min
- Multilevel, intermediate grammars: 35 min
- Multilevel, projected grammars: 15 min
- (no search error)
Slide 18: Parsing Times
[Chart: parsing time spent at each level of the hierarchy, X-Bar = G0 through G]
Slide 19: Bracket Posteriors (after G0)
Slide 20: Bracket Posteriors (after G1)
Slide 21: Bracket Posteriors (movie; final chart)
Slide 22: Bracket Posteriors (Best Tree)
Slide 23: Parse Selection
- Computing the most likely unsplit tree is NP-hard. Alternatives:
  - Settle for the best derivation.
  - Rerank an n-best list.
  - Use an alternative objective function.
Slide 24: Parse Risk Minimization
Titov & Henderson 06
- Choose the predicted tree that minimizes expected loss according to our beliefs:
  T_P* = argmin over T_P of  Σ_{T_T} P(T_T | s) · L(T_P, T_T)
- T_T: true tree, T_P: predicted tree
- L: loss function (0/1, precision, recall, F1)
- Use an n-best candidate list and approximate the expectation with samples.
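Under the sampled approximation, selection reduces to an average over samples; the sketch below assumes generic tree and loss types (illustrative names, not the parser's API):

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class RiskMinimizer {
    /**
     * Returns the candidate with the lowest Monte Carlo estimate of expected
     * loss: risk(T_P) ~= (1/m) * sum over m sampled trees T_T of L(T_P, T_T).
     */
    static <T> T minRisk(List<T> candidates, List<T> samples,
                         ToDoubleBiFunction<T, T> loss) {
        T best = null;
        double bestRisk = Double.POSITIVE_INFINITY;
        for (T cand : candidates) {
            double risk = 0.0;
            for (T sample : samples)
                risk += loss.applyAsDouble(cand, sample);  // L(T_P, T_T)
            risk /= samples.size();
            if (risk < bestRisk) { bestRisk = risk; best = cand; }
        }
        return best;
    }
}
```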
Slide 25: Reranking Results
Objective               Precision  Recall  F1    Exact
-- Best derivation --
Viterbi Derivation      89.6       89.4    89.5  37.4
-- Reranking --
Precision (sampled)     91.1       88.1    89.6  21.4
Recall (sampled)        88.2       91.3    89.7  21.5
F1 (sampled)            90.2       89.3    89.8  27.2
Exact (sampled)         89.5       89.5    89.5  25.8
Exact (non-sampled)     90.8       90.8    90.8  41.7
Exact/F1 (oracle)       95.3       94.4    95.0  63.9
Slide 26: Dynamic Programming
- Variational: approximate the posterior parse distribution [Matsuzaki et al. 05]
- Max-Rule: maximize the expected number of correct rules, à la [Goodman 98]
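A minimal sketch of the Max-Rule-Product chart recursion, assuming rule posteriors have already been computed from the refined grammar's inside/outside scores and summed over refinements (all types and names are illustrative):

```java
import java.util.List;

public class MaxRuleProduct {
    // A binary rule A -> B C over unsplit symbols; illustrative only.
    record Rule(int parent, int left, int right) {}

    /** Posterior probability q(r, i, k, j) of rule r spanning (i, j) with
     *  split point k, assumed precomputed. */
    interface RulePosteriors {
        double get(Rule r, int i, int k, int j);
    }

    /**
     * best[i][j][A] = score of the best subtree rooted in A over span (i, j),
     * where a tree is scored by the *product* of its rules' posteriors.
     * Assumes best[i][i+1][tag] was seeded with tag posteriors; backpointers
     * are omitted for brevity.
     */
    static double decode(double[][][] best, List<Rule> rules,
                         RulePosteriors q, int n, int root) {
        for (int len = 2; len <= n; len++)
            for (int i = 0; i + len <= n; i++) {
                int j = i + len;
                for (Rule r : rules)
                    for (int k = i + 1; k < j; k++) {
                        double s = q.get(r, i, k, j)
                                 * best[i][k][r.left()]
                                 * best[k][j][r.right()];
                        if (s > best[i][j][r.parent()]) best[i][j][r.parent()] = s;
                    }
            }
        return best[0][n][root];
    }
}
```

Replacing the product with a sum of rule posteriors gives the Max-Rule-Sum variant in the results table below.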
Slide 27: Dynamic Programming Results
Objective               Precision  Recall  F1    Exact
-- Best derivation --
Viterbi Derivation      89.6       89.4    89.5  37.4
-- Dynamic programming --
Variational             90.7       90.9    90.8  41.4
Max-Rule-Sum            90.5       91.3    90.9  40.4
Max-Rule-Product        91.2       91.1    91.2  41.4
Slide 28: Final Results (Efficiency)
- Berkeley Parser: 15 min, 91.2 F-score (implemented in Java)
- Charniak & Johnson 05 Parser: 19 min, 90.7 F-score (implemented in C)
Slide 29: Final Results (Accuracy)
     System                               ≤40 words F1   all F1
ENG  Charniak & Johnson 05 (generative)   90.1           89.6
ENG  This Work                            90.6           90.1
ENG  Charniak & Johnson 05 (reranked)     92.0           91.4
GER  Dubey 05                             76.3           -
GER  This Work                            80.8           80.1
CHN  Chiang et al. 02                     80.0           76.6
CHN  This Work                            86.3           83.4
Slide 30: Conclusions
- Hierarchical coarse-to-fine inference
- Projections
- Marginalization
- Multi-lingual unlexicalized parsing
Slide 31: Thank You!
- Parser available at http://nlp.cs.berkeley.edu