Title: Learning and Inference for Hierarchically Split PCFGs
1 Learning and Inference for Hierarchically Split PCFGs
- Slav Petrov and Dan Klein
4 The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
  - Parent annotation: Johnson 98
  - Head lexicalization: Collins 99, Charniak 00
  - Automatic clustering?
5 Learning Latent Annotations
Matsuzaki et al. 05
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
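A minimal sketch of the setup above, assuming a single observed binary tree and K latent subcategories per base symbol. Because the brackets and base categories are fixed, the upward (inside) pass only sums over subcategories, much as the forward pass of an HMM sums over hidden states. The rule tables, the toy tree, and the function name `inside` are hypothetical and chosen only for illustration.

```python
import itertools

K = 2  # number of latent subcategories per base symbol (assumption)

# Hypothetical parameters.
# binary[(A, B, C)][x][y][z] = P(A_x -> B_y C_z)
# lexical[(T, word)][x]      = P(T_x -> word)
binary = {
    ("S", "NP", "VP"): [[[0.4, 0.1], [0.3, 0.2]],
                        [[0.25, 0.25], [0.25, 0.25]]],
}
lexical = {
    ("NP", "she"): [0.2, 0.05],
    ("VP", "runs"): [0.1, 0.3],
}


def inside(tree):
    """Return, for each subcategory x of the root label, the probability
    of the words below the node given that the node is label_x.

    A tree is either (tag, word) or (label, left_subtree, right_subtree).
    """
    if len(tree) == 2:
        tag, word = tree
        return list(lexical[(tag, word)])
    label, left, right = tree
    in_left, in_right = inside(left), inside(right)
    rule = binary[(label, left[0], right[0])]
    # The tree shape is known, so only the child subcategories are summed out.
    return [
        sum(rule[x][y][z] * in_left[y] * in_right[z]
            for y, z in itertools.product(range(K), repeat=2))
        for x in range(K)
    ]


tree = ("S", ("NP", "she"), ("VP", "runs"))
scores = inside(tree)
```

A full E-step would add the matching downward (outside) pass and combine both into expected rule counts; that part is omitted here.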
6 Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
7 Refinement of the DT tag
[Figure: the DT tag refined into subcategories]
9 Hierarchical refinement of the DT tag
[Figure: the DT tag refined hierarchically]
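One way to picture hierarchical refinement is repeated binary splitting: each existing subcategory is copied into two children whose parameters are perturbed slightly, and training then pulls the copies apart. The sketch below assumes a tag is represented by one word distribution per subcategory. The function name `split_in_two`, the 1% noise level, and the DT probabilities are assumptions for illustration, not values taken from the slides.

```python
import random


def split_in_two(emissions, noise=0.01, seed=0):
    """Split every subcategory into two noisy copies.

    emissions: list of dicts, one per subcategory, mapping
               word -> P(word | subcategory).
    Returns a list twice as long; the children of subcategory i
    sit at positions 2*i and 2*i + 1.
    """
    rng = random.Random(seed)
    children = []
    for dist in emissions:
        for _ in range(2):
            # Perturb each probability a little so the copies are not identical.
            noisy = {w: p * (1.0 + noise * rng.uniform(-1.0, 1.0))
                     for w, p in dist.items()}
            total = sum(noisy.values())
            # Renormalize so each child is again a probability distribution.
            children.append({w: p / total for w, p in noisy.items()})
    return children


dt = [{"the": 0.6, "a": 0.3, "that": 0.1}]  # hypothetical DT distribution
dt_2 = split_in_two(dt)      # two subcategories
dt_4 = split_in_two(dt_2)    # four subcategories
```

In an actual training loop each call to `split_in_two` would be followed by re-estimation before splitting again, which is what makes the procedure hierarchical.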
10 Hierarchical Estimation Results
Model                   F1
Baseline                87.3
Hierarchical Training   88.4
11 Refinement of the , tag
- Splitting all categories the same amount is wasteful
12 Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, then roll back the splits that were least useful
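The roll-back step can be sketched as a ranking problem: estimate how much each split helps, then undo the least useful ones. The sketch below assumes usefulness is measured as the drop in training log-likelihood that undoing a split would cause. The function name `choose_merges`, the split labels, and all loss values are hypothetical.

```python
def choose_merges(loss_by_split, fraction=0.5):
    """Return the splits to roll back: the `fraction` with the smallest loss.

    loss_by_split maps a split name to the (hypothetical) drop in training
    log-likelihood that undoing that split would cause.
    """
    ranked = sorted(loss_by_split, key=loss_by_split.get)
    n_merge = int(len(ranked) * fraction)
    return set(ranked[:n_merge])


# Made-up losses: splits of simple categories cost almost nothing to undo.
losses = {
    "NP-0/1": 412.0,
    "VP-0/1": 380.5,
    ",-0/1": 0.3,
    "DT-0/1": 95.2,
    "TO-0/1": 0.1,
    "NNP-0/1": 510.7,
}
to_merge = choose_merges(losses)
```

With these numbers the splits of TO, the comma tag, and DT are rolled back, while NP, VP, and NNP keep their subcategories.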
14 Adaptive Splitting Results
Model              F1
Previous           88.4
With 50% Merging   89.5
15 Number of Phrasal Subcategories
[Figure: number of subcategories per phrasal category; labeled categories: NP, VP, PP, NAC, X]
18 Number of Lexical Subcategories
[Figure: number of subcategories per lexical category; labeled categories: POS, TO, comma (,), NNP, JJ, NNS, NN]
20 Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics
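A minimal sketch of pooling statistics, assuming smoothing is done by linear interpolation: each subcategory's probability for a rule is shrunk toward the mean over all subcategories of the same base symbol. The function name `smooth`, the weight `alpha`, and the probabilities are assumptions for illustration.

```python
def smooth(probs, alpha=0.01):
    """Shrink each subcategory's probability toward the sibling mean.

    probs: P(rule | subcategory x) for every subcategory x of one base symbol.
    alpha: interpolation weight given to the pooled estimate (assumption).
    """
    mean = sum(probs) / len(probs)
    return [(1.0 - alpha) * p + alpha * mean for p in probs]


# Hypothetical probabilities of one rule under four subcategories.
rule_probs = [0.30, 0.10, 0.00, 0.20]
smoothed = smooth(rule_probs, alpha=0.1)
```

The subcategory that never saw the rule now gets a small nonzero probability, and the total mass across subcategories is unchanged, so applying the same interpolation to every rule keeps each subcategory's distribution normalized.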
21 Result Overview
Model            F1
Previous         89.5
With Smoothing   90.7
22 Linguistic Candy
- Proper Nouns (NNP)
  NNP-14  Oct.   Nov.       Sept.
  NNP-12  John   Robert     James
  NNP-2   J.     E.         L.
  NNP-1   Bush   Noriega    Peters
  NNP-15  New    San        Wall
  NNP-3   York   Francisco  Street
- Personal pronouns (PRP)
  PRP-0   It     He         I
  PRP-1   it     he         they
  PRP-2   it     them       him
23 Linguistic Candy
- Comparative adverbs (RBR)
  RBR-0   further  lower    higher
  RBR-1   more     less     More
  RBR-2   earlier  Earlier  later
- Cardinal Numbers (CD)
  CD-7    one      two      Three
  CD-4    1989     1990     1988
  CD-11   million  billion  trillion
  CD-0    1        50       100
  CD-3    1        30       31
  CD-9    78       58       34
24 Inference
- Exhaustive parsing: 1 min per sentence
25 Coarse-to-Fine Parsing
Goodman 97, Charniak & Johnson 05
26 Hierarchical Pruning
- Consider again the span 5 to 12
- Prune symbols whose posterior is < t at each level
coarse:          QP NP VP
split in two:    QP1 QP2 NP1 NP2 VP1 VP2
split in four:   QP1 QP2 QP3 QP4 NP1 NP2 NP3 NP4 VP1 VP2 VP3 VP4
split in eight:  …
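The level-by-level pruning for a single span can be sketched as follows. The sketch assumes posteriors are already available at each refinement level and that symbol (base, i) at one level has children (base, 2i-1) and (base, 2i) at the next, matching the numbering QP1, QP2, ... above. The function names, the threshold, and every posterior value are hypothetical.

```python
def children(symbol):
    """Subcategories of `symbol` at the next finer level (1-based indices)."""
    base, i = symbol
    return [(base, 2 * i - 1), (base, 2 * i)]


def coarse_to_fine(posteriors_by_level, threshold):
    """Return the symbols that survive pruning at the finest level.

    posteriors_by_level[k] maps (base, i) -> posterior of that symbol
    over the span at refinement level k (made-up numbers here).
    """
    allowed = set(posteriors_by_level[0])  # every coarse symbol starts allowed
    survivors = set()
    for posteriors in posteriors_by_level:
        # Keep only allowed symbols whose posterior reaches the threshold.
        survivors = {s for s in allowed if posteriors.get(s, 0.0) >= threshold}
        # Only refinements of survivors are considered at the next level.
        allowed = {c for s in survivors for c in children(s)}
    return survivors


levels = [
    {("QP", 1): 0.001, ("NP", 1): 0.6, ("VP", 1): 0.3},
    {("NP", 1): 0.5, ("NP", 2): 0.002, ("VP", 1): 0.2, ("VP", 2): 0.1},
    {("NP", 1): 0.3, ("NP", 2): 0.2, ("VP", 1): 0.004, ("VP", 2): 0.19,
     ("VP", 3): 0.05, ("VP", 4): 0.05},
]
result = coarse_to_fine(levels, threshold=0.01)
```

In this toy run QP is pruned at the coarse level, so none of its refinements is ever scored. In a real parser the posteriors at each level would be computed only for the items still allowed, which is where the speed-up comes from.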
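27 Intermediate Grammars
[Figure: intermediate grammars between X-Bar = G0 and the final grammar G]
28 Projected Grammars
[Figure: projected grammars between X-Bar = G0 and the final grammar G]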
29 Final Results (Efficiency)
- Parsing the development set (1600 sentences)
  - Berkeley Parser
    - 10 min
    - Implemented in Java
  - Charniak & Johnson 05 Parser
    - 19 min
    - Implemented in C
30 Final Results (Accuracy)
Lang  Parser                               ≤ 40 words F1   all F1
ENG   Charniak & Johnson 05 (generative)   90.1            89.6
ENG   This Work                            90.6            90.1
GER   Dubey 05                             76.3            -
GER   This Work                            80.8            80.1
CHN   Chiang et al. 02                     80.0            76.6
CHN   This Work                            86.3            83.4
31 Extensions
- Acoustic modeling: Petrov, Pauls & Klein 07
- Infinite Grammars: Liang, Petrov, Jordan & Klein 07
  - Nonparametric Bayesian Learning
32 Conclusions
- Split & Merge Learning
  - Hierarchical Training
  - Adaptive Splitting
  - Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference
  - Projections
  - Marginalization
- Multi-lingual Unlexicalized Parsing
33 Thank You!
- http://nlp.cs.berkeley.edu