Title: Learning Accurate, Compact, and Interpretable Tree Annotation
Slide 1: Learning Accurate, Compact, and Interpretable Tree Annotation
- Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
Slides 2-4: The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
  - Parent annotation (Johnson 98; see the sketch below)
  - Head lexicalization (Collins 99, Charniak 00)
  - Automatic clustering?
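For instance, parent annotation relabels every node with its parent's category, so a subject NP under S becomes NP^S while an object NP under VP becomes NP^VP. A minimal sketch of the transform, assuming trees are nested (label, children) tuples with words as string leaves (a hypothetical representation, not the authors' code):

```python
# A minimal sketch of parent annotation (Johnson 98), assuming trees are
# nested (label, children) tuples with words as string leaves.
def parent_annotate(tree, parent="ROOT"):
    if isinstance(tree, str):          # leaf: a word, left unchanged
        return tree
    label, children = tree
    annotated = [parent_annotate(child, label) for child in children]
    return (f"{label}^{parent}", annotated)   # e.g. NP under S -> NP^S

tree = ("S", [("NP", ["They"]), ("VP", [("VBD", ["ate"]), ("NP", ["apples"])])])
print(parent_annotate(tree))
# The subject NP becomes NP^S and the object NP becomes NP^VP,
# so the two get separate rule distributions.
```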
Slide 5: Previous Work: Manual Annotation (Klein & Manning 03)
- Manually split categories
  - NP: subject vs. object
  - DT: determiners vs. demonstratives
  - IN: sentential vs. prepositional
- Advantages
  - Fairly compact grammar
  - Linguistic motivations
- Disadvantages
  - Performance leveled out
  - Manually annotated
Slide 6: Previous Work: Automatic Annotation Induction (Matsuzaki et al. 05, Prescher 05)
- Label all nodes with latent variables
- Same number k of subcategories for all categories
- Advantages
  - Automatically learned
- Disadvantages
  - Grammar gets too large
  - Most categories are oversplit while others are undersplit
Slide 7: Previous work is complementary
Slide 8: Learning Latent Annotations
- Brackets are known
- Base categories are known
- Only induce subcategories
- EM with inside/outside scores over the fixed trees, just like Forward-Backward for HMMs (see the sketch below)
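A minimal sketch of the E-step on one fixed, binarized parse tree, assuming rule probabilities are stored as numpy arrays R[a, b, c] over subcategories and preterminals emit words with per-subcategory probabilities; all names and layouts here are hypothetical, not the authors' implementation:

```python
# A minimal sketch of one E-step pass over a fixed, binarized parse tree.
# rule_prob[(A, B, C)] is a numpy array R with R[a, b, c] = P(A_a -> B_b C_c);
# lex_prob gives each preterminal a vector of per-subcategory word probabilities.
import numpy as np

class Node:
    def __init__(self, label, children=()):
        self.label = label        # base treebank category, e.g. "NP"
        self.children = children  # () for preterminals
        self.inside = None        # inside score per subcategory
        self.outside = None       # outside score per subcategory

def inside_pass(node, rule_prob, lex_prob):
    """Bottom-up: inside[a] = P(yield below node | node is subcategory a)."""
    if not node.children:
        node.inside = lex_prob[node.label]   # P(word | tag subcategory)
        return
    left, right = node.children
    inside_pass(left, rule_prob, lex_prob)
    inside_pass(right, rule_prob, lex_prob)
    R = rule_prob[(node.label, left.label, right.label)]
    node.inside = np.einsum('abc,b,c->a', R, left.inside, right.inside)

def outside_pass(node, rule_prob, is_root=True):
    """Top-down: outside[a] = P(rest of the tree | node is subcategory a)."""
    if is_root:                              # the root symbol stays unsplit
        node.outside = np.zeros_like(node.inside)
        node.outside[0] = 1.0
    if not node.children:
        return
    left, right = node.children
    R = rule_prob[(node.label, left.label, right.label)]
    left.outside = np.einsum('abc,a,c->b', R, node.outside, right.inside)
    right.outside = np.einsum('abc,a,b->c', R, node.outside, left.inside)
    outside_pass(left, rule_prob, is_root=False)
    outside_pass(right, rule_prob, is_root=False)
```

The expected count of rule A_a -> B_b C_c at a node is then outside[a] * R[a, b, c] * inside_left[b] * inside_right[c], divided by the tree likelihood; the M-step simply renormalizes the accumulated counts, exactly as Forward-Backward does for HMM transitions.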
Slide 9: Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Slides 10-11: Refinement of the DT tag
[Figure: the DT tag split directly into several subcategories]
Slide 12: Hierarchical refinement of the DT tag
[Figure: the DT tag refined by repeated binary splits, giving a hierarchy of subcategories]
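A minimal sketch of one such binary split step, reusing the hypothetical numpy layout from the E-step sketch above: every subcategory is duplicated, its probability mass is shared out, and a small random perturbation breaks the symmetry so that EM can differentiate the two halves.

```python
# A minimal sketch of one hierarchical split for a single binary rule,
# assuming the hypothetical layout R[a, b, c] = P(A_a -> B_b C_c).
import numpy as np

def split_rule(R, rng, noise=0.01):
    # Duplicate every subcategory along each axis; the four child
    # combinations of a split share the original mass equally, so each
    # parent subcategory's total rule probability is preserved.
    R2 = R.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2) / 4.0
    # Without a perturbation the two halves would be identical and EM
    # could never pull them apart.
    R2 *= 1.0 + noise * rng.uniform(-1.0, 1.0, size=R2.shape)
    return R2   # renormalize per parent subcategory before the next EM round

rng = np.random.default_rng(0)
R = np.full((1, 1, 1), 0.3)   # one unsplit X-Bar rule with probability 0.3
R = split_rule(R, rng)        # 2x2x2 table; each parent half still sums to ~0.3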
Slide 13: Hierarchical Estimation Results
Slide 14: Refinement of the , tag
- Splitting all categories the same amount is wasteful
Slide 15: The DT tag revisited
Slides 16-18: Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, then roll back the splits that were least useful
Slide 19: Adaptive Splitting
- Evaluate the loss in likelihood from removing each split:
  - data likelihood with the split reversed
  - data likelihood with the split
- No loss in accuracy when 50% of the splits are reversed (see the sketch below)
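A hedged sketch of that evaluation, reusing the inside/outside scores from the E-step; the tree and node fields are hypothetical. For a candidate merge of two sibling subcategories, the likelihood with the split reversed is approximated at each node independently:

```python
# A hedged sketch of scoring one candidate merge from inside/outside scores;
# tree.likelihood and tree.nodes_with_label are hypothetical helpers.
import math

def merge_loss(trees, label, x1, x2, p1, p2):
    """Approximate log-likelihood lost by merging subcategories x1 and x2
    of `label`. p1, p2 are their relative expected frequencies (p1 + p2 = 1),
    estimated over the whole treebank."""
    loss = 0.0
    for tree in trees:
        total = tree.likelihood            # P(T), from the inside pass
        for n in tree.nodes_with_label(label):
            split_part = (n.inside[x1] * n.outside[x1]
                          + n.inside[x2] * n.outside[x2])
            merged_in = p1 * n.inside[x1] + p2 * n.inside[x2]
            merged_out = n.outside[x1] + n.outside[x2]
            # P(T) if the split were undone at this node only
            merged_total = total - split_part + merged_in * merged_out
            loss += math.log(total) - math.log(merged_total)
    return loss   # the splits with the smallest loss are rolled back
```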
Slide 20: Adaptive Splitting Results
Slides 21-23: Number of Phrasal Subcategories
[Chart: subcategory counts per phrasal category; highlighted: NP, VP, and PP (heavily split), NAC and X (barely split)]
Slides 24-26: Number of Lexical Subcategories
[Chart: subcategory counts per part-of-speech tag; highlighted: POS, TO, and "," (barely split), RB, VBx, IN, and DT (moderately split), NNP, JJ, NNS, and NN (heavily split)]
Slide 27: Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics across the subcategories of a category
Slide 28: Linear Smoothing
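A minimal sketch of the linear smoothing step, assuming each category's rule probabilities sit in a numpy array with one row per subcategory (a hypothetical layout; the interpolation weight is likewise a placeholder): every subcategory's estimate is shrunk toward the average over all subcategories of the same base category.

```python
# A minimal sketch of linear smoothing, assuming P[x, r] = P(A_x -> r)
# with one row per subcategory of A (hypothetical layout; alpha is a
# placeholder interpolation weight, tuned on held-out data in practice).
import numpy as np

def linear_smooth(P, alpha=0.01):
    pooled = P.mean(axis=0, keepdims=True)     # average over A's subcategories
    return (1.0 - alpha) * P + alpha * pooled  # shrink each row toward the pool
```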
Slides 29-31: Result Overview
Slides 32-33: Final Results
Slide 34: Linguistic Candy
- Proper nouns (NNP)
- Personal pronouns (PRP)
Slide 35: Linguistic Candy
- Relative adverbs (RBR)
- Cardinal numbers (CD)
Slide 36: Conclusions
- New ideas
  - Hierarchical training
  - Adaptive splitting
  - Parameter smoothing
- State-of-the-art parsing performance
  - F1 improves from 63.4 (X-Bar initializer) to 90.2
- Linguistically interesting grammars to sift through
Slide 37: Thank You!
- petrov@eecs.berkeley.edu
Slide 38: Other things we tried
- X-Bar vs. structurally annotated grammar
  - The X-Bar grammar starts at lower performance but provides more flexibility
- Better smoothing
  - Tried different (hierarchical) smoothing methods; all worked about the same
- (Linguistically) constraining rewrite possibilities between subcategories
  - Hurts performance
  - EM automatically learns that most subcategory combinations are meaningless: 90% of the possible rewrites have 0 probability