Title: Probabilistic Parsing II
1Probabilistic Parsing II
- (many slides adapted from slides by Michael Collins)
2How well do PCFGs work?
- Not very well
- a PCFG adequate to parse over 90% of the MIT Voyager Corpus was successful in picking the correct parse on only 35% of a reserved test set.
- Sample Sentences -- The MIT Voyager Corpus
- I'm currently at MIT
- What kind of food does LaGroceria serve
- Where is the closest library to MIT
- What's the closest ice cream parlor to Harvard University
- Is there a subway stop by the Mount Auburn Hospital
- Can you show me the intersection of Cambridge Street and Hampshire Street
- Which subway stop is closest to the library at forty five Pearl Street
3Voyager Experiments Results
- Results of parsing the reserved Voyager corpus. Ref: Magerman & Marcus 1991
4PP Attachment V-NP-PP forced choice
- (Parse tree figure; node labels: S, VP, V, NP, PP, P, NP)
- He joined the board as a nonexecutive director
- Quintuple: (V-attach, v=joined, n1=board, p=as, n2=director)
- Training set: 20,801 quintuples (V- or N-attach, v, n1, p, n2)
- Test set: 3097 quintuples
- Development set: 4059 quintuples
5Core Statistical Approach
- Estimate p(Noun-attach | v, n1, p, n2)
- If p(Noun-attach | v, n1, p, n2) >= 0.5
- Noun-attach
- Else
- Verb-attach
- Estimation using the Maximum Likelihood Estimate
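A minimal sketch of the maximum likelihood estimate and the forced-choice rule. It assumes the quantity being estimated is the probability of noun attachment given (v, n1, p, n2), computed as a ratio of training counts, and that the decision threshold is 0.5. The function names and the toy quintuples are hypothetical and are not taken from the slides.

```python
from collections import Counter

# Toy training data, invented for illustration: (label, v, n1, p, n2).
TOY_QUINTUPLES = [
    ("V", "joined", "board", "as", "director"),
    ("N", "ate", "pizza", "with", "anchovies"),
    ("N", "ate", "pizza", "with", "anchovies"),
    ("V", "ate", "pizza", "with", "anchovies"),
]


def train_mle(quintuples):
    """Count each full (v, n1, p, n2) context and its noun-attached cases."""
    noun, total = Counter(), Counter()
    for label, v, n1, p, n2 in quintuples:
        total[(v, n1, p, n2)] += 1
        if label == "N":
            noun[(v, n1, p, n2)] += 1
    return noun, total


def p_noun_mle(noun, total, v, n1, p, n2):
    """MLE of P(noun-attach | v, n1, p, n2); None if the context is unseen."""
    key = (v, n1, p, n2)
    if total[key] == 0:
        return None
    return noun[key] / total[key]


def decide(noun, total, v, n1, p, n2, threshold=0.5):
    """Return "N" or "V", or None when the MLE is undefined."""
    prob = p_noun_mle(noun, total, v, n1, p, n2)
    if prob is None:
        return None
    return "N" if prob >= threshold else "V"
```

With the toy data, `decide(*train_mle(TOY_QUINTUPLES), "joined", "board", "as", "director")` picks verb attachment. A quintuple that never occurred in full during training gets no estimate at all, which is the sparsity problem that backing off is meant to address.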
6The final algorithm w/ backoff
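The slide's formulas did not survive in this transcript. The sketch below assumes the algorithm is a backed-off estimate in the style of Collins and Brooks (1995): use the full quintuple counts if the context was seen, otherwise pool counts over all triples containing the preposition, then all pairs containing it, then the preposition alone, and finally default to noun attachment. The level structure, the default of 1.0, the function names, and the toy data are assumptions for illustration.

```python
from collections import Counter

# Toy training data, invented for illustration: (label, v, n1, p, n2).
TOY_QUINTUPLES = [
    ("V", "joined", "board", "as", "director"),
    ("N", "ate", "pizza", "with", "anchovies"),
    ("V", "ate", "pizza", "with", "fork"),
    ("N", "saw", "man", "with", "hat"),
]


def backoff_levels(v, n1, p, n2):
    """Contexts from most to least specific; every context keeps the preposition.

    Slots are tagged so that a word used as a verb is never confused with
    the same word used as a noun.
    """
    V, N1, P, N2 = ("v", v), ("n1", n1), ("p", p), ("n2", n2)
    return [
        [(V, N1, P, N2)],
        [(V, N1, P), (V, P, N2), (N1, P, N2)],
        [(V, P), (N1, P), (P, N2)],
        [(P,)],
    ]


def train_backoff(quintuples):
    """Count every backed-off context and how often it is noun-attached."""
    noun, total = Counter(), Counter()
    for label, v, n1, p, n2 in quintuples:
        for level in backoff_levels(v, n1, p, n2):
            for key in level:
                total[key] += 1
                if label == "N":
                    noun[key] += 1
    return noun, total


def p_noun_backoff(noun, total, v, n1, p, n2, default=1.0):
    """Use the most specific level whose pooled count is non-zero."""
    for level in backoff_levels(v, n1, p, n2):
        denom = sum(total[key] for key in level)
        if denom > 0:
            return sum(noun[key] for key in level) / denom
    return default  # assumed default: noun attachment


def decide(noun, total, v, n1, p, n2):
    return "N" if p_noun_backoff(noun, total, v, n1, p, n2) >= 0.5 else "V"
```

On the toy data, the unseen quintuple ("ate", "salad", "with", "fork") is resolved at the triple level through (ate, with, fork), while ("bought", "car", "with", "sunroof") falls all the way back to the counts for the preposition "with".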
7Q: How to make PCFGs sensitive to
- Lexical relationships
- Larger Contexts
8Lexical relationships: paths in the parse tree
9Lexical relationships are now coded locally in the tree itself
10 The SPATTER Parser (Magerman 95)
11A Lexicalized PCFG
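One way to picture a lexicalized PCFG is to annotate every nonterminal with the head word passed up from one of its children, so that each local rule carries lexical information. The sketch below does this for the V-attach reading of the earlier example sentence. The head table `HEAD_CHILD`, the tag set, and the tuple-based tree encoding are simplified assumptions for illustration, not the head rules of any particular parser.

```python
# Hypothetical head table: for each parent label, child labels to try in order.
HEAD_CHILD = {
    "S": ["VP"],
    "VP": ["V", "VP"],
    "NP": ["N", "NP"],
    "PP": ["P"],
}


def lexicalize(tree):
    """Return (lexicalized_tree, head_word) for a tuple-encoded tree.

    A preterminal is (tag, word); any other node is (label, child, child, ...).
    """
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):  # preterminal: the word is its own head
        word = children[0]
        return (f"{label}({word})", word), word

    results = [lexicalize(child) for child in children]
    head = results[-1][1]  # fallback: head of the rightmost child
    for wanted in HEAD_CHILD.get(label, []):
        matches = [h for child, (_, h) in zip(children, results)
                   if child[0] == wanted]
        if matches:
            head = matches[0]
            break
    new_children = tuple(subtree for subtree, _ in results)
    return (f"{label}({head})",) + new_children, head


# V-attach reading of "He joined the board as a nonexecutive director".
TREE = (
    "S",
    ("NP", ("N", "He")),
    ("VP",
     ("V", "joined"),
     ("NP", ("D", "the"), ("N", "board")),
     ("PP", ("P", "as"),
      ("NP", ("D", "a"), ("A", "nonexecutive"), ("N", "director")))),
)

LEXICALIZED, HEAD = lexicalize(TREE)
```

After this step the VP node is labeled `VP(joined)` and its PP child `PP(as)`, so the verb and the preposition both appear in a single local rule and no longer have to be related through a path in the tree.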
12Factoring rule expansion (Charniak 97)
13Smoothed Estimation
14Smoothed Estimation II
15Smoothed Estimation III
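The formulas for these slides are missing from the transcript. As a rough illustration only, the sketch below assumes that smoothing means linear interpolation between a sparse lexicalized maximum likelihood estimate and a more reliable unlexicalized one. The weight `lam`, the choice to fall back entirely to the unlexicalized estimate for an unseen head, the function name, and all counts are hypothetical.

```python
from collections import Counter

# Toy counts, invented for illustration only.
RULE_COUNTS = Counter({("S", "NP VP"): 8, ("S", "VP"): 2})
PARENT_COUNTS = Counter({"S": 10})
LEX_RULE_COUNTS = Counter({("S", "questioned", "NP VP"): 1})
LEX_PARENT_COUNTS = Counter({("S", "questioned"): 1})


def smoothed_rule_prob(parent, head, rhs, lam=0.5):
    """Interpolate P(rhs | parent, head) with P(rhs | parent).

    lam weights the lexicalized estimate; (1 - lam) weights the
    unlexicalized one.
    """
    context = PARENT_COUNTS[parent]
    unlex = RULE_COUNTS[(parent, rhs)] / context if context else 0.0

    lex_context = LEX_PARENT_COUNTS[(parent, head)]
    if lex_context == 0:
        # Unseen (parent, head) pair: the lexicalized estimate is undefined,
        # so this sketch uses the unlexicalized estimate alone.
        return unlex

    lex = LEX_RULE_COUNTS[(parent, head, rhs)] / lex_context
    return lam * lex + (1 - lam) * unlex
```

With these counts the lexicalized estimate for S(questioned) expanding to NP VP is 1/1 and the unlexicalized estimate is 8/10, so the interpolated value with `lam=0.5` is 0.9. The expansion to VP alone was never seen with this head, yet it still receives non-zero probability (0.1) from the unlexicalized term.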
16Independence Assumptions
17Head Probabilities
18Rule Probabilities
19Estimating Head Probabilities