Title: Learning and Inference for Hierarchically Split PCFGs
1. Learning and Inference for Hierarchically Split PCFGs
- Slav Petrov
- Joint work with Dan Klein
2. Motivation (Syntax)
He was right.
- Why?
- Information Extraction
- Syntactic Machine Translation
3. Treebank Parsing
4. The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Parent annotation [Johnson 98]
5. The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
6. The Game of Designing a Grammar
- Annotation refines base treebank symbols to improve statistical fit of the grammar
- Parent annotation [Johnson 98]
- Head lexicalization [Collins 99, Charniak 00]
- Automatic clustering?
7. Learning Accurate, Compact, and Interpretable Tree Annotation
- Slav Petrov, Leon Barrett, Romain Thibaux and Dan Klein
- ACL 2006
8. Previous Work: Manual Annotation
Klein & Manning 03
- Manually split categories
- NP: subject vs. object
- DT: determiners vs. demonstratives
- IN: sentential vs. prepositional
- Advantages
- Fairly compact grammar
- Linguistic motivations
- Disadvantages
- Performance leveled out
- Manually annotated
9. Previous Work: Automatic Annotation Induction
Matsuzaki et al. 05, Prescher 05
- Advantages
- Automatically learned
- Label all nodes with latent variables.
- Same number k of subcategories for all categories.
- Disadvantages
- Grammar gets too large
- Most categories are oversplit while others are undersplit.
10. Previous work is complementary
11. Learning Latent Annotations
- Brackets are known
- Base categories are known
- Only induce subcategories
Just like Forward-Backward for HMMs.
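A minimal sketch of the E-step here, using standard inside/outside notation (P_IN and P_OUT are my labels, not from the slide): for a node n of the observed tree with base category A, the posterior over its latent subcategories is

    P(A_x at n | w, T)  ∝  P_IN(n, A_x) · P_OUT(n, A_x)

These posteriors give expected rule counts, from which the split grammar is re-estimated in the M-step.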
12. Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
13. Refinement of the DT tag
14. Refinement of the DT tag
15. Hierarchical refinement of the DT tag
16. Hierarchical Estimation Results
17. Refinement of the , tag
- Splitting all categories the same amount is wasteful
18. The DT tag revisited
19. Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back the splits that were least useful
20. Adaptive Splitting
- Want to split complex categories more
- Idea: split everything, roll back the splits that were least useful
21. Adaptive Splitting
- Evaluate the loss in likelihood from removing each split:
- Data likelihood with the split reversed
- Data likelihood with the split
- No loss in accuracy when 50% of the splits are reversed (see the sketch below)
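A rough Python sketch of one split-merge round (split_all, train_em, new_splits, likelihood_loss and merge are hypothetical placeholder helpers, not the Berkeley Parser API):

    # One round of adaptive splitting: split every subcategory, re-train,
    # then roll back the half of the splits whose removal costs the least
    # data likelihood. Helper functions are hypothetical placeholders.
    def split_merge_round(grammar, treebank, rollback_fraction=0.5):
        grammar = split_all(grammar)               # split each subcategory in two
        grammar = train_em(grammar, treebank)      # re-estimate with EM

        # Approximate, for each new split, the loss in data likelihood
        # incurred by merging its two halves back together.
        losses = {s: likelihood_loss(grammar, treebank, s) for s in new_splits(grammar)}

        # Reverse the least useful splits (about 50% of them).
        for s in sorted(losses, key=losses.get)[:int(len(losses) * rollback_fraction)]:
            grammar = merge(grammar, s)

        return train_em(grammar, treebank)         # re-train after merging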
22. Adaptive Splitting Results
23. Number of Phrasal Subcategories
24. Number of Phrasal Subcategories (chart: NP, VP, PP)
25. Number of Phrasal Subcategories (chart: NAC, X)
26. Number of Lexical Subcategories (chart: POS, TO, ,)
27. Number of Lexical Subcategories (chart: RB, VBx, IN, DT)
28. Number of Lexical Subcategories (chart: NNP, JJ, NNS, NN)
29. Smoothing
- Heavy splitting can lead to overfitting
- Idea: smoothing allows us to pool statistics
30. Linear Smoothing
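A sketch of the linear smoothing, as I reconstruct it from the ACL 2006 paper (alpha is a small interpolation weight whose value is not given on the slide): for a category A with n subcategories,

    p'(A_x → B_y C_z) = (1 - alpha) · p(A_x → B_y C_z) + alpha · (1/n) · Σ_{x'} p(A_{x'} → B_y C_z)

so each subcategory's rule probability is pulled toward the average over its siblings, letting rarely observed subcategories pool statistics with the rest.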
31. Result Overview
32. Linguistic Candy
- Proper Nouns (NNP)
- Personal pronouns (PRP)
33. Linguistic Candy
- Relative adverbs (RBR)
- Cardinal Numbers (CD)
34. Improved Inference for Unlexicalized Parsing
- Slav Petrov and Dan Klein
- NAACL 2007
35. Time to parse 1576 sentences
36. Coarse-to-Fine Parsing
Goodman 97, Charniak & Johnson 05
37. Prune?
- For each chart item X_{i,j}, compute its posterior probability
- Prune if it is < threshold
- E.g. consider the span 5 to 12
(figure: coarse vs. refined chart for this span)
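Concretely, the pruning test (my formulation in standard inside/outside notation): the item X over span (i, j) is pruned if

    P(X_{i,j} | w) = P_OUT(X, i, j) · P_IN(X, i, j) / P_IN(root, 0, n)  <  threshold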
38. Time to parse 1576 sentences
- 1621 min
- 111 min
- (no search error)
39. Hierarchical Pruning
- Consider again the span 5 to 12 (see the sketch below)
(figure: charts for the coarse grammar, then split in two, split in four, split in eight)
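A rough Python sketch of the hierarchical coarse-to-fine loop (parse_chart, posteriors and best_tree are hypothetical placeholder helpers, not the Berkeley Parser API):

    # Parse with a sequence of increasingly refined grammars; an item is only
    # built at a level if its projection to the previous (coarser) level had
    # posterior probability above the threshold there.
    def coarse_to_fine_parse(sentence, grammars, threshold=1e-4):
        allowed = None                               # the coarsest pass is not pruned
        for grammar in grammars:                     # e.g. X-Bar = G0, G1, ..., G
            chart = parse_chart(sentence, grammar, allowed)
            # Keep only the items whose posterior probability clears the threshold;
            # their projections constrain the next, more refined pass.
            allowed = {item for item, post in posteriors(chart).items()
                       if post >= threshold}
        return best_tree(chart)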
40. Intermediate Grammars
(figure: grammars from X-Bar = G0 up to the final grammar G)
41. Time to parse 1576 sentences
- 1621 min
- 111 min
- 35 min
- (no search error)
42. State Drift (DT tag)
43. Projected Grammars
(figure: the final grammar G projected back down to X-Bar = G0)
44. Estimating Projected Grammars
(figure: split nonterminals in G, e.g. S0, S1, NP0, NP1, VP0, VP1, mapped to the unsplit nonterminals in π(G))
45. Estimating Projected Grammars
S → NP VP
S1 → NP1 VP1 0.20
S1 → NP1 VP2 0.12
S1 → NP2 VP1 0.02
S1 → NP2 VP2 0.03
S2 → NP1 VP1 0.11
S2 → NP1 VP2 0.05
S2 → NP2 VP1 0.08
S2 → NP2 VP2 0.12
46. Estimating Projected Grammars
Corazza & Satta 06
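A sketch of how the projected rule probabilities can be estimated (my reconstruction following Corazza & Satta 06 and the NAACL 2007 paper; e(A_x) is the expected count of subcategory A_x in a tree of G, computed on the next slide):

    p_{π(G)}(A → B C) = Σ_{x,y,z} e(A_x) · p(A_x → B_y C_z)  /  Σ_x e(A_x)

i.e. each refined rule contributes its probability weighted by how often its parent subcategory is expected to occur, normalized over all subcategories of A.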
47. Calculating Expectations
- Nonterminals: c_k(X), the expected counts up to depth k (see the sketch below)
- Converges within 25 iterations (a few seconds)
- Rules
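A minimal, self-contained Python sketch of this fixed-point iteration (the toy grammar and probabilities below are invented for illustration; only the iteration scheme reflects the slide):

    # Expected number of occurrences of each nonterminal in a tree of the PCFG,
    # computed from the root down by iterating to a fixed point.
    rules = {
        # parent: [(rule probability, nonterminal children)]
        "S":  [(1.0, ["NP", "VP"])],
        "NP": [(0.7, []), (0.3, ["NP", "PP"])],
        "VP": [(0.6, []), (0.4, ["VP", "PP"])],
        "PP": [(1.0, ["NP"])],
    }

    counts = {X: 0.0 for X in rules}
    counts["S"] = 1.0                        # the root occurs exactly once (depth 0)

    for _ in range(25):                      # converges within ~25 iterations
        new = {X: (1.0 if X == "S" else 0.0) for X in rules}
        for parent, expansions in rules.items():
            for prob, children in expansions:
                for child in children:
                    # each expected occurrence of `parent` contributes `prob`
                    # expected occurrences of every child on the right-hand side
                    new[child] += counts[parent] * prob
        counts = new

    print(counts)                            # c_k(X) for k = 25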
48. Time to parse 1576 sentences
- 1621 min
- 111 min
- 35 min
- 15 min
- (no search error)
49. Parsing times
(chart: parsing time for each grammar, from X-Bar = G0 to G)
50. Bracket Posteriors (after G0)
51. Bracket Posteriors (after G1)
52. Bracket Posteriors (Movie; Final Chart)
53. Bracket Posteriors (Best Tree)
54. Parse Selection
- Computing the most likely unsplit tree is NP-hard
- Settle for the best derivation
- Rerank an n-best list
- Use an alternative objective function (see the sketch below)
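One such alternative objective, as I reconstruct it from the NAACL 2007 paper (a sketch; r is the posterior probability of an unsplit rule over a span, obtained by summing over its refinements):

    T* = argmax_T  Π_{(A → B C, i, k, j) ∈ T}  r(A → B C, i, k, j)

    r(A → B C, i, k, j) = Σ_{x,y,z} P_OUT(A_x, i, j) · p(A_x → B_y C_z) · P_IN(B_y, i, k) · P_IN(C_z, k, j) / P(w)

This scores each unsplit tree by the product of its rules' posterior probabilities and can be maximized with an ordinary Viterbi-style dynamic program.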
55. Final Results (Efficiency)
- Berkeley Parser
- 15 min
- 91.2 F-score
- Implemented in Java
- Charniak & Johnson 05 Parser
- 19 min
- 90.7 F-score
- Implemented in C
56. Final Results (Accuracy)
57. Extensions
- Learning Structured Models for Phone Recognition (EMNLP 07)
- The Infinite PCFG Using Hierarchical Dirichlet Processes (EMNLP 07)
- Discriminative Log-Linear Grammars with Hidden Variables (NIPS 07)
58. Conclusions
- Split & Merge Learning
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
- Hierarchical Coarse-to-Fine Inference
- Projections
- Marginalization
- Multi-lingual Unlexicalized Parsing
60. Inside/Outside Scores
(figure: inside and outside scores around a tree node labeled A_x)
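A sketch of these scores over an observed training tree (my notation; for a node n whose observed rule is A → B C, with latent subscripts x, y, z):

    P_IN(n, A_x) = Σ_{y,z} p(A_x → B_y C_z) · P_IN(left(n), B_y) · P_IN(right(n), C_z)

    P_OUT(left(n), B_y) = Σ_{x,z} P_OUT(n, A_x) · p(A_x → B_y C_z) · P_IN(right(n), C_z)

computed bottom-up and top-down respectively, exactly as in Forward-Backward for HMMs.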
61. Learning Latent Annotations (Details)
62. Adaptive Splitting (Details)
- True data likelihood
- Approximate likelihood with split at n reversed
- Approximate loss in likelihood
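A sketch of these three quantities as I reconstruct them from the ACL 2006 paper (the equations themselves did not survive extraction; A_1 and A_2 are the two halves of a split and p_1, p_2 their relative frequencies at node n):

    True data likelihood (via any node n):      P(w, T) = Σ_x P_IN(n, A_x) · P_OUT(n, A_x)

    Likelihood with the split at n reversed:    P_n = [p_1 · P_IN(n, A_1) + p_2 · P_IN(n, A_2)] · [P_OUT(n, A_1) + P_OUT(n, A_2)] + Σ_{x ∉ {1,2}} P_IN(n, A_x) · P_OUT(n, A_x)

    Approximate loss in likelihood:             Π_n P_n / P(w, T), taken over all nodes n where the split category occurs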