Transcript and Presenter's Notes
Title: Learning and Inference for Hierarchically Split PCFGs
1
Learning and Inference for Hierarchically Split
PCFGs
  • Slav Petrov
  • Joint work with Dan Klein

2
Motivation (Syntax)
  • Task: parse sentences such as "He was right."
  • Why?
  • Information Extraction
  • Syntactic Machine Translation

3
Treebank Parsing
4
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson 98]

5
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson 98]
  • Head lexicalization [Collins 99, Charniak 00]

6
The Game of Designing a Grammar
  • Annotation refines base treebank symbols to
    improve statistical fit of the grammar
  • Parent annotation [Johnson 98]
  • Head lexicalization [Collins 99, Charniak 00]
  • Automatic clustering?

7
Learning accurate, compact, and interpretable
tree annotation
  • Slav Petrov, Leon Barrett, Romain Thibaux and Dan
    Klein
  • ACL 2006

8
Previous Work: Manual Annotation
[Klein & Manning 03]
  • Manually split categories
  • NP: subject vs. object
  • DT: determiners vs. demonstratives
  • IN: sentential vs. prepositional
  • Advantages
  • Fairly compact grammar
  • Linguistic motivations
  • Disadvantages
  • Performance leveled out
  • Manually annotated

9
Previous Work: Automatic Annotation Induction
[Matsuzaki et al. 05, Prescher 05]
  • Advantages
  • Automatically learned
  • Label all nodes with latent variables.
  • Same number k of subcategories for all
    categories.
  • Disadvantages
  • Grammar gets too large
  • Most categories are oversplit while others are
    undersplit.

10
Previous work is complementary
11
Learning Latent Annotations
  • EM algorithm
  • Brackets are known
  • Base categories are known
  • Only induce subcategories

Just like Forward-Backward for HMMs.
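In other words (a sketch in our notation, not text from the slides): the treebank trees T are observed, the subcategory assignments x at their nodes are latent, and EM maximizes the marginal likelihood

  L(\theta) = \sum_{T \in \text{treebank}} \log \sum_{x} P_\theta(T, x)

Because the brackets and base categories are fixed, the E-step only sums over subcategory assignments along each tree, exactly as Forward-Backward sums over hidden state assignments along a chain.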
12
Overview
- Hierarchical Training
- Adaptive Splitting
- Parameter Smoothing
13
Refinement of the DT tag
14
Refinement of the DT tag
15
Hierarchical refinement of the DT tag
16
Hierarchical Estimation Results
17
Refinement of the ',' (comma) tag
  • Splitting all categories the same amount is
    wasteful

18
The DT tag revisited
19
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, roll back the splits that
    were least useful

20
Adaptive Splitting
  • Want to split complex categories more
  • Idea: split everything, roll back the splits that
    were least useful

21
Adaptive Splitting
  • Evaluate loss in likelihood from removing each
    split
  • Data likelihood with split reversed
  • Data likelihood with split
  • No loss in accuracy when 50% of the splits are
    reversed (see the sketch below).
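A sketch of the criterion in our notation (not verbatim from the slide): rank each split by the likelihood it buys,

  \Delta(\text{split}) = \log P(\text{data} \mid \text{split kept}) - \log P(\text{data} \mid \text{split reversed})

and roll back the splits with the smallest \Delta first; roughly half of them can be reversed with no loss in accuracy. A more concrete approximation of the reversed-split likelihood appears on the "Adaptive Splitting (Details)" slide at the end.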

22
Adaptive Splitting Results
23
Number of Phrasal Subcategories
24
Number of Phrasal Subcategories (chart highlighting NP, VP, PP)
25
Number of Phrasal Subcategories (chart highlighting NAC, X)
26
Number of Lexical Subcategories (chart highlighting POS, TO, ',')
27
Number of Lexical Subcategories (chart highlighting RB, VBx, IN, DT)
28
Number of Lexical Subcategories (chart highlighting NNP, JJ, NNS, NN)
29
Smoothing
  • Heavy splitting can lead to overfitting
  • Idea: smoothing allows us to pool statistics

30
Linear Smoothing
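The linear smoothing here can be sketched (our notation; \alpha is a small constant) as interpolating each subcategory's rule probability with the average over all subcategories of the same base category:

  \tilde{p}(A_x \to B_y C_z) = (1 - \alpha)\, p(A_x \to B_y C_z) + \alpha\, \bar{p}(A \to B_y C_z),   where   \bar{p} = \frac{1}{n} \sum_{x'} p(A_{x'} \to B_y C_z)

This pools statistics across the subcategories of A, so heavily split categories are less prone to overfitting.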
31
Result Overview
32
Linguistic Candy
  • Proper Nouns (NNP)
  • Personal pronouns (PRP)

33
Linguistic Candy
  • Relative adverbs (RBR)
  • Cardinal Numbers (CD)

34
Improved Inference for Unlexicalized Parsing
  • Slav Petrov and Dan Klein
  • NAACL 2007

35
Time to parse 1576 sentences
  • 1621 min

36
Coarse-to-Fine Parsing
[Goodman 97, Charniak & Johnson 05]
37
Prune?
  • For each chart item X_{i,j}, compute its posterior
    probability
  • If the posterior is below a threshold, prune the item
  • E.g., consider the span 5 to 12
(figure: chart items for this span under the coarse vs. the refined grammar)
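In inside/outside terms, the test can be written as (a sketch; P_IN and P_OUT are the inside and outside scores over the sentence w):

  P(X_{i,j} \mid w) = \frac{P_{OUT}(X, i, j)\; P_{IN}(X, i, j)}{P(w)} < \text{threshold}  \Rightarrow  prune X_{i,j}

Items pruned under the coarse grammar are simply never built when parsing with the refined grammar.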
38
Time to parse 1576 sentences
  • 1621 min
  • 111 min
  • (no search error)

39
Hierarchical Pruning
  • Consider again the span 5 to 12

(figure: passes over the span with grammars that are coarse, split in two, split in four, and split in eight)
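A minimal sketch of how one pass feeds the next (hypothetical class name and array layout, not the Berkeley Parser's actual API): the posteriors computed with the coarser grammar, plus a map from each finer symbol to its coarser projection, determine which chart items the next pass is allowed to build.

import java.util.Arrays;

// Sketch: build the pruning mask for the next (finer) coarse-to-fine pass.
// posterior[i][j][c] is the posterior probability of coarse symbol c over span (i, j),
// computed in the previous pass; projection[f] maps fine symbol f to its coarse symbol.
public class HierarchicalPruning {

    public static boolean[][][] buildMask(double[][][] coarsePosterior,
                                          int[] projection,
                                          double threshold) {
        boolean[][][] allowed = new boolean[coarsePosterior.length][][];
        for (int i = 0; i < coarsePosterior.length; i++) {
            allowed[i] = new boolean[coarsePosterior[i].length][];
            for (int j = 0; j < coarsePosterior[i].length; j++) {
                allowed[i][j] = new boolean[projection.length];
                for (int f = 0; f < projection.length; f++) {
                    // A fine item survives only if its coarse projection
                    // was not pruned over the same span.
                    allowed[i][j][f] = coarsePosterior[i][j][projection[f]] >= threshold;
                }
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        // Toy example: a single span, two coarse symbols, four fine symbols.
        double[][][] posterior = {{{0.9, 1e-7}}};
        int[] projection = {0, 0, 1, 1};  // fine 0,1 project to coarse 0; fine 2,3 to coarse 1
        boolean[][][] mask = buildMask(posterior, projection, 1e-4);
        System.out.println(Arrays.toString(mask[0][0]));  // prints [true, true, false, false]
    }
}

Each successive grammar in the hierarchy (split in two, four, eight, ...) repeats this step: parse under the current mask, recompute posteriors, and build the mask for the next, finer grammar.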
40
Intermediate Grammars
(figure: hierarchy of intermediate grammars from X-Bar = G0 to the final grammar G)
41
Time to parse 1576 sentences
  • 1621 min
  • 111 min
  • 35 min
  • (no search error)

42
State Drift (DT tag)
43
Projected Grammars
(figure: grammars obtained by projecting the final grammar G back down toward X-Bar = G0)
44
Estimating Projected Grammars
  • Nonterminals?

(figure: split nonterminals NP0, NP1, VP0, VP1, S0, S1 in G are collapsed to their base symbols in π(G))
45
Estimating Projected Grammars
  • Rules?

S → NP VP

S1 → NP1 VP1  0.20
S1 → NP1 VP2  0.12
S1 → NP2 VP1  0.02
S1 → NP2 VP2  0.03
S2 → NP1 VP1  0.11
S2 → NP1 VP2  0.05
S2 → NP2 VP1  0.08
S2 → NP2 VP2  0.12
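One way to sketch the computation (following the idea attributed to Corazza & Satta 06 on the next slide; the notation is ours): weight each split rule by the expected count of its left-hand-side subcategory in trees generated by G, then normalize by the expected count of the unsplit symbol:

  p_\pi(S \to NP\ VP) = \frac{\sum_{x,y,z} c(S_x)\, p(S_x \to NP_y\, VP_z)}{\sum_x c(S_x)}

where c(S_x) is the expected count of subcategory S_x, estimated as on the "Calculating Expectations" slide.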
46
Estimating Projected Grammars
[Corazza & Satta 06]
(figure: the projected probability of S → NP VP in this example is 0.56)
47
Calculating Expectations
  • Nonterminals
  • c_k(X): expected counts up to depth k
  • Converges within 25 iterations (few seconds)
  • Rules
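The nonterminal expectations can be sketched (our notation) as the fixed point of the recurrence

  c_0(X) = [X = ROOT],   c_{k+1}(X) = c_0(X) + \sum_{A \to \beta} c_k(A)\, p(A \to \beta)\, \#_X(\beta)

where \#_X(\beta) counts occurrences of X in the right-hand side \beta. Once the c(X) have converged, the rule expectations follow directly: a rule A \to \beta is expected to fire c(A)\, p(A \to \beta) times.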

48
Time to parse 1576 sentences
  • 1621 min
  • 111 min
  • 35 min
  • 15 min
  • (no search error)

49
Parsing times
(chart: parsing time with each grammar in the hierarchy, from X-Bar = G0 to G)
50
Bracket Posteriors (after G0)
51
Bracket Posteriors (after G1)
52
Bracket Posteriors
(Movie)
(Final Chart)
53
Bracket Posteriors (Best Tree)
54
Parse Selection
  • Computing the most likely unsplit tree is NP-hard
  • Settle for the best derivation.
  • Rerank an n-best list.
  • Use an alternative objective function (one option is sketched below).
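One alternative objective used in this line of work (our sketch, not necessarily the exact variant on the slide) scores an unsplit tree by the product of the posterior probabilities of its rules, with the subcategories marginalized out:

  T^* = \arg\max_T \prod_{r \in T} P(r \mid w)

where P(r \mid w) accumulates inside/outside mass over all splits of the symbols in rule r. Decoding stays tractable while the objective is closer to the unsplit trees we actually evaluate on.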

55
Final Results (Efficiency)
  • Berkeley Parser
  • 15 min
  • 91.2 F-score
  • Implemented in Java
  • Charniak & Johnson 05 Parser
  • 19 min
  • 90.7 F-score
  • Implemented in C

56
Final Results (Accuracy)
57
Extensions
  • Learning Structured Models for Phone Recognition [EMNLP 07]
  • The Infinite PCFG Using Hierarchical Dirichlet Processes [EMNLP 07]
  • Discriminative Log-Linear Grammars with Latent Variables [NIPS 07]

58
Conclusions
  • Split & Merge Learning
  • Hierarchical Training
  • Adaptive Splitting
  • Parameter Smoothing
  • Hierarchical Coarse-to-Fine Inference
  • Projections
  • Marginalization
  • Multi-lingual Unlexicalized Parsing

59
  • Thank You!

60
Inside/Outside Scores
(figure: inside and outside scores for a node labeled A_x)
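In the notation of these backup slides, for a node n labeled with subcategory A_x in a fixed tree, the two scores are roughly

  P_IN(n, A_x) = P(words below n \mid A_x at n),   P_OUT(n, A_x) = P(words and structure outside n, A_x at n)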
61
Learning Latent Annotations (Details)
  • E-Step
  • M-Step
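A sketch of both steps for a binary node n with children n_L and n_R (our paraphrase of the standard update, using the inside/outside scores above):

  E-step:  q_n(A_x \to B_y C_z) \propto P_OUT(n, A_x)\, p(A_x \to B_y C_z)\, P_IN(n_L, B_y)\, P_IN(n_R, C_z)

  M-step:  \hat{p}(A_x \to B_y C_z) = \frac{E[\#(A_x \to B_y C_z)]}{E[\#(A_x)]}

where the expectations are the posterior counts accumulated over all treebank trees in the E-step.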

62
Adaptive Splitting (Details)
  • True data likelihood
  • Approximate likelihood with split at n reversed
  • Approximate loss in likelihood
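A sketch of the approximation (our rendering; the slide's exact bookkeeping may differ): suppose A_1 and A_2 were produced by splitting A, and let p_1, p_2 be their relative frequencies. To approximate the likelihood with the split reversed at a single node n, replace that node's contribution by merged inside/outside scores

  P'_IN(n, A) = p_1 P_IN(n, A_1) + p_2 P_IN(n, A_2),   P'_OUT(n, A) = P_OUT(n, A_1) + P_OUT(n, A_2)

The approximate loss for reversing the split everywhere is then the product, over all nodes where A_1 or A_2 occurs, of the ratio between the merged and the original node likelihoods; splits with the smallest loss are rolled back first.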