Title: Learning Accurate, Compact, and Interpretable Tree Annotation
Slide 1: Learning Accurate, Compact, and Interpretable Tree Annotation
- Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
Slides 2-4: The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
  - Parent annotation (Johnson 98; see the sketch below)
  - Head lexicalization (Collins 99, Charniak 00)
  - Automatic clustering?
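For instance, parent annotation relabels every node with its parent's category, so a subject NP under S becomes NP^S while an object NP under VP becomes NP^VP. A minimal sketch of the transform, assuming trees are nested (label, children) tuples with words as string leaves (a hypothetical representation, not the authors' code):

```python
# A minimal sketch of parent annotation (Johnson 98), assuming trees are
# nested (label, children) tuples with words as string leaves.
def parent_annotate(tree, parent="ROOT"):
    if isinstance(tree, str):          # leaf: a word, left unchanged
        return tree
    label, children = tree
    annotated = [parent_annotate(child, label) for child in children]
    return (f"{label}^{parent}", annotated)   # e.g. NP under S -> NP^S

tree = ("S", [("NP", ["They"]), ("VP", [("VBD", ["ate"]), ("NP", ["apples"])])])
print(parent_annotate(tree))
# The subject NP becomes NP^S and the object NP becomes NP^VP,
# so the two get separate rule distributions.
```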
Slide 5: Previous Work: Manual Annotation (Klein & Manning 03)
- Manually split categories
  - NP: subject vs. object
  - DT: determiners vs. demonstratives
  - IN: sentential vs. prepositional
- Advantages
  - Fairly compact grammar
  - Linguistic motivations
- Disadvantages
  - Performance leveled out
  - Manually annotated
Slide 6: Previous Work: Automatic Annotation Induction (Matsuzaki et al. 05, Prescher 05)
- Label all nodes with latent variables
- Same number k of subcategories for all categories
- Advantages
  - Automatically learned
- Disadvantages
  - Grammar gets too large
  - Most categories are oversplit while others are undersplit
Slide 7: Previous work is complementary
Slide 8: Learning Latent Annotations
- Brackets are known
- Base categories are known
- Only induce subcategories
- EM with inside/outside scores over the fixed trees, just like Forward-Backward for HMMs (see the sketch below)
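A minimal sketch of the E-step on one fixed, binarized parse tree, assuming rule probabilities are stored as numpy arrays R[a, b, c] over subcategories and preterminals emit words with per-subcategory probabilities; all names and layouts here are hypothetical, not the authors' implementation:

```python
# A minimal sketch of one E-step pass over a fixed, binarized parse tree.
# rule_prob[(A, B, C)] is a numpy array R with R[a, b, c] = P(A_a -> B_b C_c);
# lex_prob gives each preterminal a vector of per-subcategory word probabilities.
import numpy as np

class Node:
    def __init__(self, label, children=()):
        self.label = label        # base treebank category, e.g. "NP"
        self.children = children  # () for preterminals
        self.inside = None        # inside score per subcategory
        self.outside = None       # outside score per subcategory

def inside_pass(node, rule_prob, lex_prob):
    """Bottom-up: inside[a] = P(yield below node | node is subcategory a)."""
    if not node.children:
        node.inside = lex_prob[node.label]   # P(word | tag subcategory)
        return
    left, right = node.children
    inside_pass(left, rule_prob, lex_prob)
    inside_pass(right, rule_prob, lex_prob)
    R = rule_prob[(node.label, left.label, right.label)]
    node.inside = np.einsum('abc,b,c->a', R, left.inside, right.inside)

def outside_pass(node, rule_prob, is_root=True):
    """Top-down: outside[a] = P(rest of the tree | node is subcategory a)."""
    if is_root:                              # the root symbol stays unsplit
        node.outside = np.zeros_like(node.inside)
        node.outside[0] = 1.0
    if not node.children:
        return
    left, right = node.children
    R = rule_prob[(node.label, left.label, right.label)]
    left.outside = np.einsum('abc,a,c->b', R, node.outside, right.inside)
    right.outside = np.einsum('abc,a,b->c', R, node.outside, left.inside)
    outside_pass(left, rule_prob, is_root=False)
    outside_pass(right, rule_prob, is_root=False)
```

The expected count of rule A_a -> B_b C_c at a node is then outside[a] * R[a, b, c] * inside_left[b] * inside_right[c], divided by the tree likelihood; the M-step simply renormalizes the accumulated counts, exactly as Forward-Backward does for HMM transitions.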
Slide 9: Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
Slides 10-11: Refinement of the DT tag
[Figure: the DT tag split directly into several subcategories]
Slide 12: Hierarchical refinement of the DT tag
[Figure: the DT tag refined by repeated binary splits, giving a hierarchy of subcategories]
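A minimal sketch of one such binary split step, reusing the hypothetical numpy layout from the E-step sketch above: every subcategory is duplicated, its probability mass is shared out, and a small random perturbation breaks the symmetry so that EM can differentiate the two halves.

```python
# A minimal sketch of one hierarchical split for a single binary rule,
# assuming the hypothetical layout R[a, b, c] = P(A_a -> B_b C_c).
import numpy as np

def split_rule(R, rng, noise=0.01):
    # Duplicate every subcategory along each axis; the four child
    # combinations of a split share the original mass equally, so each
    # parent subcategory's total rule probability is preserved.
    R2 = R.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2) / 4.0
    # Without a perturbation the two halves would be identical and EM
    # could never pull them apart.
    R2 *= 1.0 + noise * rng.uniform(-1.0, 1.0, size=R2.shape)
    return R2   # renormalize per parent subcategory before the next EM round

rng = np.random.default_rng(0)
R = np.full((1, 1, 1), 0.3)   # one unsplit X-Bar rule with probability 0.3
R = split_rule(R, rng)        # 2x2x2 table; each parent half still sums to ~0.3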
Slide 13: Hierarchical Estimation Results
Slide 14: Refinement of the , tag
- Splitting all categories the same amount is wasteful
Slide 15: The DT tag revisited
Slides 16-18: Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, then roll back the splits that were least useful
Slide 19: Adaptive Splitting
- Evaluate the loss in likelihood from removing each split:
  - data likelihood with the split reversed
  - data likelihood with the split
- No loss in accuracy when 50% of the splits are reversed (see the sketch below)
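A hedged sketch of that evaluation, reusing the inside/outside scores from the E-step; the tree and node fields are hypothetical. For a candidate merge of two sibling subcategories, the likelihood with the split reversed is approximated at each node independently:

```python
# A hedged sketch of scoring one candidate merge from inside/outside scores;
# tree.likelihood and tree.nodes_with_label are hypothetical helpers.
import math

def merge_loss(trees, label, x1, x2, p1, p2):
    """Approximate log-likelihood lost by merging subcategories x1 and x2
    of `label`. p1, p2 are their relative expected frequencies (p1 + p2 = 1),
    estimated over the whole treebank."""
    loss = 0.0
    for tree in trees:
        total = tree.likelihood            # P(T), from the inside pass
        for n in tree.nodes_with_label(label):
            split_part = (n.inside[x1] * n.outside[x1]
                          + n.inside[x2] * n.outside[x2])
            merged_in = p1 * n.inside[x1] + p2 * n.inside[x2]
            merged_out = n.outside[x1] + n.outside[x2]
            # P(T) if the split were undone at this node only
            merged_total = total - split_part + merged_in * merged_out
            loss += math.log(total) - math.log(merged_total)
    return loss   # the splits with the smallest loss are rolled back
```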
Slide 20: Adaptive Splitting Results
Slides 21-23: Number of Phrasal Subcategories
[Chart: subcategory counts per phrasal category; highlighted: NP, VP, and PP (heavily split), NAC and X (barely split)]
Slides 24-26: Number of Lexical Subcategories
[Chart: subcategory counts per part-of-speech tag; highlighted: POS, TO, and "," (barely split), RB, VBx, IN, and DT (moderately split), NNP, JJ, NNS, and NN (heavily split)]
Slide 27: Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics across the subcategories of a category
Slide 28: Linear Smoothing
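A minimal sketch of the linear smoothing step, assuming each category's rule probabilities sit in a numpy array with one row per subcategory (a hypothetical layout; the interpolation weight is likewise a placeholder): every subcategory's estimate is shrunk toward the average over all subcategories of the same base category.

```python
# A minimal sketch of linear smoothing, assuming P[x, r] = P(A_x -> r)
# with one row per subcategory of A (hypothetical layout; alpha is a
# placeholder interpolation weight, tuned on held-out data in practice).
import numpy as np

def linear_smooth(P, alpha=0.01):
    pooled = P.mean(axis=0, keepdims=True)     # average over A's subcategories
    return (1.0 - alpha) * P + alpha * pooled  # shrink each row toward the pool
```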
Slides 29-31: Result Overview
Slides 32-33: Final Results
Slide 34: Linguistic Candy
- Proper nouns (NNP)
- Personal pronouns (PRP)
Slide 35: Linguistic Candy
- Relative adverbs (RBR)
- Cardinal numbers (CD)
Slide 36: Conclusions
- New ideas
  - Hierarchical training
  - Adaptive splitting
  - Parameter smoothing
- State-of-the-art parsing performance
  - F1 improves from 63.4 (X-Bar initializer) to 90.2
- Linguistically interesting grammars to sift through
Slide 37: Thank You!
- petrov@eecs.berkeley.edu
Slide 38: Other things we tried
- X-Bar vs. structurally annotated grammar
  - The X-Bar grammar starts at lower performance but provides more flexibility
- Better smoothing
  - Tried different (hierarchical) smoothing methods; all worked about the same
- (Linguistically) constraining rewrite possibilities between subcategories
  - Hurts performance
  - EM automatically learns that most subcategory combinations are meaningless: 90% of the possible rewrites have 0 probability