Title: Introduction to Probabilistic Parsing (revised)
1 Introduction to Probabilistic Parsing (revised)
2 Parsing NL requires three components
- A grammar that specifies what sentences are legal.
  - Context Free Grammars provide one very simple specification.
  - Very large grammars have been written in various formalisms, with about 80-90% coverage.
- A parsing algorithm that assigns possible structures to new word strings.
  - The CKY algorithm and various top-down algorithms do this for CFGs.
- A method for resolving ambiguities, to decide which analysis of an ambiguous sentence is intended in the current context.
  - Standard parsing techniques fall short here: 40-60% correct is the best symbolic techniques have done.
  - Probabilistic grammars provide a natural declarative method for ordering alternative parses.
3 Why Corpus-Based Approaches?
- Informal IBM study in 1990
  - Compared a range of the best broad-coverage parsers in the U.S.
  - Test material: sentences of length 13 words from AP news
  - All but the best scored under 40% correct (hand checked)
  - The best claimed 60% (I don't believe it)
- How could this be true?
  - Most successful work in NLP previously was in interactive systems, where the user magically adapted to the capability of the system.
4 The Apparent Problem
- The grammars of natural languages are vast.
  - A very good descriptive grammar of English is over 1700 pages, and quite incomplete at that.
  - There may be a small core of very general, abstract grammatical phenomena, but there is a vast residue of lexically tied, idiosyncratic phenomena.
- Working Hypothesis (as of 1987): We need to build systems that learn.
5 Robust Systems will Combine NATURE and NURTURE
- Nature: Chomsky and the Generative Grammarians
  - Some linguistic phenomena are extremely abstract, far from surface apparent, and apparently universal.
- Nurture: Harris and the American Structuralists
  - Distributional Analysis
  - The grammar of a natural language is huge and largely idiosyncratic, but largely surface apparent.
- Neither theory alone appears to capture the facts.
6 Probabilistic CFGs
- A given CFG G can be expanded into a Probabilistic CFG (PCFG) by adding a probability to each production rule of G.
- Technical point: Every production rule must participate in some proper derivation, i.e. one that fully expands to a non-empty string of terminals.
- The probability of each production is conditional on the non-terminal being expanded.
7 A Sample PCFG
8 Production Probabilities are Conditional on LHSs
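A minimal sketch in Python of what this means in practice (the toy grammar below is an illustrative assumption, not the deck's sample PCFG): each rule carries a probability, and the probabilities of all expansions of a given LHS non-terminal sum to 1.

from collections import defaultdict

# Each rule maps (LHS, RHS) -> probability.
PCFG = {
    ("S",  ("NP", "VP")):       1.0,
    ("NP", ("Pronoun",)):       0.3,
    ("NP", ("Det", "N")):       0.5,
    ("NP", ("NP", "PP")):       0.2,
    ("VP", ("V", "NP")):        0.6,
    ("VP", ("V", "NP", "PP")):  0.4,
    ("PP", ("P", "NP")):        1.0,
}

def check_conditional_normalization(grammar):
    """Return, for each non-terminal, whether its rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in grammar.items():
        totals[lhs] += prob
    return {lhs: abs(total - 1.0) < 1e-9 for lhs, total in totals.items()}

print(check_conditional_normalization(PCFG))  # {'S': True, 'NP': True, 'VP': True, 'PP': True}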
9 Computing the Probability of a Derivation
- Given a PCFG G and a string α, the probability of deriving α, i.e. that S ⇒* α, is the sum of the probabilities of all the derivations of α.
- The probability of a particular derivation of α is the product of the probabilities of the rules used at each step of that derivation.
- The probability of each subconstituent is then just the product of the probabilities of the rules used at each step of the derivation of that subconstituent.
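A small follow-on sketch, reusing the rule-keyed toy grammar above: a derivation's probability is the product of its rule probabilities, and a string's probability sums over its derivations (how the derivations are enumerated, e.g. by a chart parser, is left out).

from math import prod

def derivation_probability(derivation, grammar):
    """Probability of one derivation: the product of the probabilities of the
    rules used at each step.  `derivation` is a sequence of (LHS, RHS) rules."""
    return prod(grammar[rule] for rule in derivation)

def string_probability(derivations, grammar):
    """Probability that G derives a string: the sum over all of its derivations."""
    return sum(derivation_probability(d, grammar) for d in derivations)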
10 An Example Derivation
11 How well do PCFGs work?
- Not very well:
  - A PCFG adequate to parse over 90% of the MIT Voyager Corpus was successful in picking the correct parse on only 35% of a reserved test set.
- Sample Sentences -- The MIT Voyager Corpus
  - I'm currently at MIT
  - What kind of food does LaGroceria serve
  - Where is the closest library to MIT
  - What's the closest ice cream parlor to Harvard University
  - Is there a subway stop by the Mount Auburn Hospital
  - Can you show me the intersection of Cambridge Street and Hampshire Street
  - Which subway stop is closest to the library at forty five Pearl Street
12 Adding More Linguistic Context Helps
- Hypothesis: Conditioning rule expansion only on the current nonterminal doesn't provide enough linguistic context for accurately capturing parse preferences.
- Evidence: In English, NP → Pronoun is much more likely as an expansion of the NP in S → NP VP than of the NP in VP → V NP.
- Experiment I: Parse the Voyager corpus with an expanded PCFG, with rule probabilities conditioned on both the non-terminal being expanded and the index of the immediately dominating rule (see the sketch below).
- Example: P(NP → Pronoun | NP, S → NP VP) = .05
13 Adding More Linguistic Context Helps II
- Experiment II: Extend the conditioning context of Experiment I to include the most likely parts of speech for the next two words in the input stream.
14 Results
- Results parsing the reserved Voyager corpus. Ref: Magerman & Marcus 1991.
15 A Key Subproblem of Parsing: Resolving PP Attachment Ambiguities
- The Problem: The role of prepositional phrases is often ambiguous.
  - I saw the man with the telescope.
    - The seeing was with the telescope: VP → V NP PP
    - The man had the telescope: NP → N' PP
- Desired: A workable solution which is not "AI complete."
16 Structural Approaches to PP Attachment
- Right Association -- a constituent tends to attach to another constituent immediately to its left (Kimball 1973).
- Minimal Attachment -- a constituent tends to attach so as to involve the fewest additional syntactic nodes (Frazier 1979).
- But these together only account for 55% of attachments in a travel information experiment (Whittemore et al. 1990).
17 Lexical Statistical Approach I: Hindle & Rooth 92
- Estimate which head of the potential attachment sites (e.g. "see" or "man") most often co-occurs with the key lexical items in the PP (e.g. "with"), and attach the PP accordingly.
- Unsupervised learner, given a parser.
18 Resolving PP Attachment Using T-scores
- Method: Given a verb--noun--prep ambiguity, determine whether the prep is significantly more likely to occur
  - following the preceding verb, or
  - following the preceding noun.
- This can be done using a t-score contrasting
  - the conditional probability of seeing a particular prep given the noun
  - with the conditional probability of seeing that prep given the verb.
19 T-scores (t-tests)
- Provide a measure of how different the means of two Gaussian distributions are.
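A minimal sketch of such a t-score computed from co-occurrence counts, assuming the common approximation var(p) ≈ p/n; the exact estimator Hindle & Rooth used may differ in detail.

from math import sqrt

def attachment_t_score(prep_after_noun, noun_count, prep_after_verb, verb_count):
    """Contrast P(prep | noun) with P(prep | verb).  Positive t favors the noun,
    negative t the verb (the sign convention is arbitrary)."""
    p_noun = prep_after_noun / noun_count
    p_verb = prep_after_verb / verb_count
    variance = p_noun / noun_count + p_verb / verb_count
    return (p_noun - p_verb) / sqrt(variance) if variance > 0 else 0.0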
20 Example
- Moscow sent more than 100,000 soldiers into Afghanistan.
  - (v = sent, n = soldiers, prep = into)
- For a year of AP Newswire, t ≈ -8.81, representing a significant association of "into" with the verb "sent", so the procedure associates "into" with "sent" rather than with "soldiers" in subject or pre-verbal position.
21 Estimating Lexical Associations I
22 Estimating Lexical Associations II
- For clear v-prep and n-prep pairs,
  - add to the bigram counts by assigning each prep to the n or v it occurs with.
- If an entire v-n-prep triple occurs,
  - if the absolute value of the t-score for the ambiguity is greater than 2.1,
  - then assign the prep according to the t-score.
- Iterate through all triples until no more attachments result.
- For the remaining unresolved triples,
  - split the attachment .5 and .5 between the noun and the verb.
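A simplified, self-contained sketch of this iterative procedure (the bookkeeping is a reconstruction under these assumptions, not Hindle & Rooth's code):

from collections import Counter
from math import sqrt

def iterative_attachment(clear_pairs, ambiguous_triples, threshold=2.1):
    bigrams = Counter()   # (head word, prep) -> count
    totals = Counter()    # head word -> number of preps assigned to it

    def attach(head, prep, weight=1):
        bigrams[(head, prep)] += weight
        totals[head] += weight

    def t_score(v, n, prep):
        # Contrast P(prep | n) with P(prep | v), as on the previous slides.
        p_n = bigrams[(n, prep)] / totals[n] if totals[n] else 0.0
        p_v = bigrams[(v, prep)] / totals[v] if totals[v] else 0.0
        var = (p_n / totals[n] if totals[n] else 0.0) + (p_v / totals[v] if totals[v] else 0.0)
        return (p_n - p_v) / sqrt(var) if var > 0 else 0.0

    # 1. Clear v-prep and n-prep pairs seed the bigram counts.
    for head, prep in clear_pairs:
        attach(head, prep)

    # 2. Repeatedly assign ambiguous triples whose |t| exceeds the threshold.
    unresolved, changed = list(ambiguous_triples), True
    while changed:
        changed, still_open = False, []
        for v, n, prep in unresolved:
            t = t_score(v, n, prep)
            if abs(t) > threshold:
                attach(n if t > 0 else v, prep)
                changed = True
            else:
                still_open.append((v, n, prep))
        unresolved = still_open

    # 3. Remaining triples split the attachment .5 / .5 between noun and verb.
    for v, n, prep in unresolved:
        attach(n, prep, 0.5)
        attach(v, prep, 0.5)
    return bigrams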
23 Test of Lexical Association
- Test corpus: 1000 reserved sentences from the same corpus.
- All verb-noun-prep triples in the test corpus hand-graded by two judges.
- All triples then regraded using full sentence context.
- 10% misidentified by the parser.
- Most surprisingly, 10% remained difficult even with full context.
- Examples:
  - But over time, misery has given way to mending.
  - We don't have preventive detention in the United States.
24 Test of Lexical Association
- RESULTS
  - Structural Methods
    - Right Association: 64% correct
    - Minimal Attachment: 36% correct
  - Statistical Algorithm
25 Lexical Approach II
- Supervised, using Transformation-Based Learning
  - Brill & Resnik 1994
- Terminology: We see/V the boy/N1 on/P the hill/N2
- Start State: Always attach to N1.
- Transformations: Change the attachment location from X to Y if
  - N1 is word w
  - N2 is word w
  - V is word w
  - P is word w
  - N1 is word w1 and V is word w2
  - etc.
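A sketch of the greedy transformation-based learning loop under these assumptions (the example and transformation encodings are illustrative, not Brill & Resnik's implementation):

def tbl_learn(examples, transforms):
    """examples: dicts with keys 'v', 'n1', 'p', 'n2', 'gold' ('N1' or 'V').
    transforms: (predicate, target) pairs, e.g. (lambda ex: ex['p'] == 'of', 'N1')."""
    for ex in examples:
        ex['guess'] = 'N1'                        # start state: always attach to N1

    learned = []
    while True:
        best, best_gain = None, 0
        for pred, target in transforms:
            fixed = sum(1 for ex in examples
                        if pred(ex) and ex['guess'] != ex['gold'] and target == ex['gold'])
            broken = sum(1 for ex in examples
                         if pred(ex) and ex['guess'] == ex['gold'] and target != ex['gold'])
            if fixed - broken > best_gain:
                best, best_gain = (pred, target), fixed - broken
        if best is None:
            break                                  # no transformation reduces errors
        pred, target = best
        for ex in examples:                        # apply the winning transformation
            if pred(ex):
                ex['guess'] = target
        learned.append(best)
    return learned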
26 First 20 Learned Transformations
27 Results (Penn Treebank training data)
28 Lexical Approach III
- Supervised, using a backed-off model
- Collins & Brooks 95
- Very simple, clean underlying statistical model
- Complex back-off strategy for smoothing
29 Attachment Quintuples
[Parse tree figure with nodes S, VP, PP, NP, NP, V, P for the example below]
- He joined the board as a nonexecutive director
- Quintuple: (V-attach, v=joined, n1=board, p=as, n2=director)
- Training set: 20,801 quintuples (V- or N-attach, v, n1, p, n2)
- Test set: 3,097 quintuples
- Development set: 4,059 quintuples
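As a concrete representation, one might encode these records as a small named tuple (an illustrative choice, not the paper's data format):

from typing import NamedTuple

class Quintuple(NamedTuple):
    """One training/test record: the attachment decision plus the four heads."""
    attach: str   # "V" or "N"
    v: str
    n1: str
    p: str
    n2: str

example = Quintuple("V", "joined", "board", "as", "director")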
30 Core Statistical Approach
- Estimate p(noun-attach | v, n1, p, n2).
- If p(noun-attach | v, n1, p, n2) ≥ 0.5,
  - Noun-attach
- Else
  - Verb-attach
- Estimation using the Maximum Likelihood Estimate:
  p(noun-attach | v, n1, p, n2) = f(noun-attach, v, n1, p, n2) / f(v, n1, p, n2)
- Ooops! Most (v, n1, p, n2) quadruples never occur in the training data.
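A sketch of this maximum-likelihood estimate over Quintuple-style records, making the sparse-data problem explicit:

def mle_noun_attach(training, v, n1, p, n2):
    """MLE of p(noun-attach | v, n1, p, n2).  Returns None when the quadruple was
    never seen -- the 'Ooops': with only 20,801 training quintuples, most test
    quadruples are unseen and the MLE is undefined."""
    matches = [q for q in training if (q.v, q.n1, q.p, q.n2) == (v, n1, p, n2)]
    if not matches:
        return None
    return sum(q.attach == "N" for q in matches) / len(matches)  # noun-attach if >= 0.5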
31 Remember Language Modeling?
- How to estimate an n-gram probability such as P(w3 | w1, w2)?
  - MLE
  - Backing off
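For comparison, a crude back-off sketch for a trigram language model (discounting and renormalization, as in real backing-off schemes such as Katz's, are omitted):

def backoff_trigram(w1, w2, w3, tri, bi, uni, total):
    """Use the trigram MLE when its context was seen, otherwise the bigram,
    otherwise the unigram estimate."""
    if tri.get((w1, w2, w3), 0) > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi.get((w2, w3), 0) > 0:
        return bi[(w2, w3)] / uni[w2]
    return uni.get(w3, 0) / total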
32 Let's Apply the Same Idea to Our Problem
33 Bug
34 Which Tuples to Use to Back Off?
- Tuples with Prepositions are Important
35 Combining tuples including the prep
36 The final algorithm
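A sketch of the final backed-off decision as described by Collins & Brooks: back off from the full quadruple to triples, then pairs, then the preposition alone, always keeping the preposition, with noun attachment as the default. The counting helpers here are assumed interfaces, not their code.

def backed_off_attachment(v, n1, p, n2, count, count_noun):
    """count(t) and count_noun(t) are assumed helpers returning how often tuple t
    occurs in the training quintuples, and how often it occurs noun-attached."""
    stages = [
        [(v, n1, p, n2)],                        # the full quadruple
        [(v, n1, p), (v, p, n2), (n1, p, n2)],   # triples containing the prep
        [(v, p), (n1, p), (p, n2)],              # pairs containing the prep
        [(p,)],                                  # the preposition alone
    ]
    for tuples in stages:
        denom = sum(count(t) for t in tuples)
        if denom > 0:
            return "N" if sum(count_noun(t) for t in tuples) / denom >= 0.5 else "V"
    return "N"                                    # default: noun attachment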
37 Some Baselines
38 Results
- Results on 3,097 test sentences
- With morphological processing
39 Comparison with Other Work
40 Next Time
- How can we apply this lexical insight to parsing?