Improved Inference for Unlexicalized Parsing

Transcript and Presenter's Notes

Title: Improved Inference for Unlexicalized Parsing

1
Improved Inference for Unlexicalized Parsing
  • Slav Petrov and Dan Klein

2
Unlexicalized Parsing
Petrov et al. 06
  • Hierarchical, adaptive refinement

91.2 F1 score on the dev set (1600 sentences)

1,140 nonterminal symbols
531,200 rewrites
1621 min parsing time
3
  • 1621 min

4
Coarse-to-Fine Parsing
Goodman 97, Charniak & Johnson 05

5
Prune?
  • For each chart item X[i, j], compute its posterior probability from the inside and outside scores:

      P(X[i, j] | sentence) = inside(X, i, j) · outside(X, i, j) / P(sentence)

  • Prune the item if this posterior falls below a threshold (see the sketch below)
  • E.g., consider the span 5 to 12: only the coarse items that survive (here QP, NP, VP) license their refined counterparts in the next pass
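A minimal sketch of this pruning step in Python, assuming inside and outside scores have already been computed for the coarse chart (the data layout and threshold value are illustrative assumptions, not the Berkeley parser's actual API):

# Keep only the chart items whose posterior probability clears the threshold.
def prune_chart(inside, outside, sentence_prob, threshold=1e-4):
    """inside/outside: dicts mapping (label, i, j) to their scores."""
    allowed = set()
    for item, inside_score in inside.items():
        posterior = inside_score * outside.get(item, 0.0) / sentence_prob
        if posterior > threshold:
            allowed.add(item)
    return allowed

# Toy usage: the NP item survives, the low-posterior QP item is pruned.
inside = {("NP", 5, 12): 0.03, ("QP", 5, 12): 1e-7}
outside = {("NP", 5, 12): 0.5, ("QP", 5, 12): 0.5}
print(prune_chart(inside, outside, sentence_prob=0.02))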


6
  • 1621 min
  • 111 min
  • (no search error)

7
Multilevel Coarse-to-Fine Parsing
Charniak et al. 06
  • Add more rounds of pre-parsing
  • Grammars coarser than X-bar

8
Hierarchical Pruning
  • Consider again the span 5 to 12

coarse:          QP NP VP
split in two:    QP1 QP2 NP1 NP2 VP1 VP2
split in four:   QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4
split in eight:  ...
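A small, self-contained sketch of how surviving coarse items license items in the next, more refined pass (the split map and symbol names are assumptions made for illustration):

# Expand each surviving coarse item into the refined items it licenses.
def allowed_refined_items(kept_coarse, splits):
    """kept_coarse: set of (symbol, i, j) items that survived pruning.
    splits: map from each coarse symbol to its refined subsymbols."""
    return {(refined, i, j)
            for (symbol, i, j) in kept_coarse
            for refined in splits.get(symbol, [symbol])}

# The span (5, 12) keeps QP, NP, VP in the coarse pass, so the
# "split in two" pass may only build their subsymbols over that span.
kept = {("QP", 5, 12), ("NP", 5, 12), ("VP", 5, 12)}
splits = {s: [s + "1", s + "2"] for s in ("QP", "NP", "VP")}
print(sorted(allowed_refined_items(kept, splits)))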

9
Intermediate Grammars
X-Bar = G0 … G
10
  • 1621 min
  • 111 min
  • 35 min
  • (no search error)

11
State Drift (DT tag)
12
Projected Grammars
X-Bar = G0 … G
13
Estimating Projected Grammars
  • Nonterminals?

Nonterminals in G: NP0, NP1, VP0, VP1, S0, S1
Nonterminals in π(G): NP, VP, S
14
Estimating Projected Grammars
  • Rules?

S → NP VP

Rules in G:
S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12
15
Estimating Projected Grammars
Corazza & Satta 06
  • Probability of a projected rule: the average of the refined rules that collapse onto it, weighted by the expected counts of their parent symbols
  • E.g. P(S → NP VP) = 0.56
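A hedged sketch of this estimate (the expected counts below are made-up numbers for illustration; the next slide shows how the real counts are computed):

# Projected rule probability: expectation-weighted average of the refined
# rules S_x -> NP_y VP_z that all collapse onto S -> NP VP.
def project_rule_prob(split_rules, counts):
    """split_rules: (split parent, probability) for each refined rule.
    counts: expected count c(S_x) of each split parent symbol."""
    parents = {parent for parent, _ in split_rules}
    weighted = sum(counts[parent] * prob for parent, prob in split_rules)
    return weighted / sum(counts[parent] for parent in parents)

# The eight refined rules from the previous slide, keyed by parent symbol.
split_rules = [("S1", 0.20), ("S1", 0.12), ("S1", 0.02), ("S1", 0.03),
               ("S2", 0.11), ("S2", 0.05), ("S2", 0.08), ("S2", 0.12)]
counts = {"S1": 0.3, "S2": 0.7}  # assumed, not the paper's numbers
print(project_rule_prob(split_rules, counts))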
16
Calculating Expectations
  • Nonterminals: c_k(X) = expected count of nonterminal X in derivations of depth up to k
  • Converges within 25 iterations (a few seconds)
  • Rules: expected rule counts then follow as c(X) times the rule's probability (see the sketch below)
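A minimal sketch of the fixpoint computation, assuming a proper PCFG given as (parent, children, probability) triples (the toy grammar is an assumption for illustration):

# Iterate c_{k+1}(Y) = [Y is the root] + sum over rules X -> ... Y ... of
# c_k(X) * P(rule), which converges geometrically for a proper grammar.
def expected_counts(rules, root="ROOT", iterations=25):
    """rules: list of (parent, children, prob). Returns c(X) for each X."""
    counts = {root: 1.0}
    for _ in range(iterations):
        new = {root: 1.0}
        for parent, children, prob in rules:
            for child in children:
                new[child] = new.get(child, 0.0) + counts.get(parent, 0.0) * prob
        counts = new
    return counts

# Toy grammar; expected rule counts then follow as c(parent) * rule prob.
rules = [("ROOT", ("S1",), 0.4), ("ROOT", ("S2",), 0.6),
         ("S1", ("NP1", "VP1"), 1.0), ("S2", ("NP2", "VP2"), 1.0)]
print(expected_counts(rules))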

17
  • 1621 min
  • 111 min
  • 35 min
  • 15 min
  • (no search error)

18
Parsing times
(chart: parsing time with each grammar in the hierarchy, X-Bar = G0 … G)
19
Bracket Posteriors (after G0)
20
Bracket Posteriors (after G1)
21
Bracket Posteriors (Movie, Final Chart)
22
Bracket Posteriors (Best Tree)
23
Parse Selection
  • Computing the most likely unsplit tree is NP-hard; instead:
  • Settle for the best derivation
  • Rerank an n-best list
  • Use an alternative objective function

24
Parse Risk Minimization
Titov & Henderson 06
  • Choose the tree that minimizes the expected loss according to our beliefs:

      T*_P = argmin over T_P of  Σ_{T_T} P(T_T) · L(T_P, T_T)

  • T_T: true tree
  • T_P: predicted tree
  • L: loss function (0/1, precision, recall, F1)
  • Use an n-best candidate list and approximate the expectation with samples (see the sketch below)
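A compact sketch of sampled risk minimization over an n-best list (the set-of-brackets tree encoding and the 0/1 loss are simplifying assumptions for illustration):

# Pick the candidate whose average loss against posterior samples is lowest.
def min_risk_parse(candidates, samples, loss):
    """candidates: n-best trees; samples: trees sampled from the posterior."""
    def risk(predicted):
        return sum(loss(true, predicted) for true in samples) / len(samples)
    return min(candidates, key=risk)

def zero_one_loss(true_tree, predicted_tree):
    return 0.0 if true_tree == predicted_tree else 1.0

# Trees encoded as frozensets of labeled spans, purely for illustration.
candidates = [frozenset({("S", 0, 3), ("NP", 0, 1)}),
              frozenset({("S", 0, 3), ("NP", 0, 2)})]
samples = [candidates[0], candidates[0], candidates[1]]
print(min_risk_parse(candidates, samples, zero_one_loss))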

25
Reranking Results
Objective              Precision  Recall  F1    Exact

BEST DERIVATION
Viterbi Derivation       89.6      89.4   89.5  37.4

RERANKING
Precision (sampled)      91.1      88.1   89.6  21.4
Recall (sampled)         88.2      91.3   89.7  21.5
F1 (sampled)             90.2      89.3   89.8  27.2
Exact (sampled)          89.5      89.5   89.5  25.8
Exact (non-sampled)      90.8      90.8   90.8  41.7
Exact/F1 (oracle)        95.3      94.4   95.0  63.9
26
Dynamic Programming
  • Approximate the posterior parse distribution (Matsuzaki et al. 05)
  • Maximize the number of expected correct rules, à la Goodman 98 (see the sketch below)
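A simplified sketch of a max-rule style dynamic program (the Max-Rule-Product variant): a CKY-like pass in which each anchored rule is scored by its posterior probability rather than its rule probability. The post callback, the uniform seeding of word-level items, and the binary-rules-only restriction are all simplifying assumptions, not the parser's actual implementation:

# CKY-style DP over rule posteriors; scores combine by product.
def max_rule_product(n, labels, post):
    """post(X, Y, Z, i, k, j): posterior of rule X -> Y Z over that anchoring
    (in the full parser this comes from inside-outside scores)."""
    best = {(i, i + 1, X): 1.0 for i in range(n) for X in labels}
    back = {}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for X in labels:
                for k in range(i + 1, j):
                    for Y in labels:
                        for Z in labels:
                            score = (post(X, Y, Z, i, k, j)
                                     * best.get((i, k, Y), 0.0)
                                     * best.get((k, j, Z), 0.0))
                            if score > best.get((i, j, X), 0.0):
                                best[(i, j, X)] = score
                                back[(i, j, X)] = (k, Y, Z)
    return best, back

# Toy call with a uniform posterior, just to show the interface.
best, back = max_rule_product(3, ["S", "NP", "VP"], lambda *args: 0.5)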
27
Dynamic Programming Results
Objective              Precision  Recall  F1    Exact

BEST DERIVATION
Viterbi Derivation       89.6      89.4   89.5  37.4

DYNAMIC PROGRAMMING
Variational              90.7      90.9   90.8  41.4
Max-Rule-Sum             90.5      91.3   90.9  40.4
Max-Rule-Product         91.2      91.1   91.2  41.4
28
Final Results (Efficiency)
  • Berkeley Parser
  • 15 min
  • 91.2 F-score
  • Implemented in Java
  • Charniak & Johnson 05 Parser
  • 19 min
  • 90.7 F-score
  • Implemented in C

29
Final Results (Accuracy)
                                          ≤40 words F1   all F1
ENG  Charniak & Johnson 05 (generative)       90.1        89.6
ENG  This Work                                90.6        90.1
ENG  Charniak & Johnson 05 (reranked)         92.0        91.4

GER  Dubey 05                                 76.3         -
GER  This Work                                80.8        80.1

CHN  Chiang et al. 02                         80.0        76.6
CHN  This Work                                86.3        83.4
30
Conclusions
  • Hierarchical coarse-to-fine inference
  • Projections
  • Marginalization
  • Multi-lingual unlexicalized parsing

31
Thank You!
  • Parser available at http://nlp.cs.berkeley.edu