Title: Introduction to Probabilistic Parsing (revised)
1 Introduction to Probabilistic Parsing (revised)
2 Parsing NL requires three components
- A grammar that specifies what sentences are legal.
  - Context Free Grammars provide one very simple specification.
  - Very large grammars have been written in various formalisms, with about 80-90% coverage.
- A parsing algorithm that assigns possible structures to new word strings.
  - The CKY algorithm and various top-down algorithms do this for CFGs.
- A method for resolving ambiguities, to decide which analysis of an ambiguous sentence is intended in the current context.
  - Standard parsing techniques fall short here: 40-60% correct is the best symbolic techniques have done.
  - Probabilistic grammars provide a natural declarative method for ordering alternative parses.
3 Why Corpus-Based Approaches?
- Informal IBM study in 1990
  - Compared a range of the best broad-coverage parsers in the U.S.
  - Test material: sentences of length 13 words from AP news
  - All but the best scored under 40% correct (hand checked)
  - The best claimed 60% (I don't believe it)
- How could this be true?
  - Most successful work in NLP previously was in interactive systems, where the user magically adapted to the capability of the system.
4 The Apparent Problem
- The grammars of natural languages are vast.
  - A very good descriptive grammar of English is over 1700 pages, and quite incomplete at that.
  - There may be a small core of very general, abstract grammatical phenomena, but there is a vast residue of lexically tied, idiosyncratic phenomena.
- Working Hypothesis (as of 1987): We need to build systems that learn.
5 Robust Systems will Combine NATURE and NURTURE
- Nature: Chomsky and the Generative Grammarians
  - Some linguistic phenomena are extremely abstract, far from surface apparent, and apparently universal.
- Nurture: Harris and the American Structuralists
  - Distributional Analysis
  - The grammar of a natural language is huge and largely idiosyncratic, but largely surface apparent.
- Neither theory alone appears to capture the facts.
6 Probabilistic CFGs
- A given CFG G can be expanded into a Probabilistic CFG (PCFG) by adding a probability to each production rule of G.
- Technical point: Every production rule must participate in some proper derivation, i.e. one that fully expands to a non-empty string of terminals.
- The probability of each production is conditional on the non-terminal being expanded.
7 A Sample PCFG
8 Production Probabilities are Conditional on LHSs
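A minimal sketch in Python of what this means in practice (the toy grammar below is an illustrative assumption, not the deck's sample PCFG): each rule carries a probability, and the probabilities of all expansions of a given LHS non-terminal sum to 1.

from collections import defaultdict

# Each rule maps (LHS, RHS) -> probability.
PCFG = {
    ("S",  ("NP", "VP")):       1.0,
    ("NP", ("Pronoun",)):       0.3,
    ("NP", ("Det", "N")):       0.5,
    ("NP", ("NP", "PP")):       0.2,
    ("VP", ("V", "NP")):        0.6,
    ("VP", ("V", "NP", "PP")):  0.4,
    ("PP", ("P", "NP")):        1.0,
}

def check_conditional_normalization(grammar):
    """Return, for each non-terminal, whether its rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in grammar.items():
        totals[lhs] += prob
    return {lhs: abs(total - 1.0) < 1e-9 for lhs, total in totals.items()}

print(check_conditional_normalization(PCFG))  # {'S': True, 'NP': True, 'VP': True, 'PP': True}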
9 Computing the Probability of a Derivation
- Given a PCFG G and a string α, the probability of deriving α, i.e. that S ⇒* α, is the sum of the probabilities of all the derivations of α.
- The probability of a particular derivation of α is the product of the probabilities of the rules used at each step of that derivation.
- The probability of each subconstituent is then just the product of the probabilities of the rules used at each step of the derivation of that subconstituent.
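A small follow-on sketch, reusing the rule-keyed toy grammar above: a derivation's probability is the product of its rule probabilities, and a string's probability sums over its derivations (how the derivations are enumerated, e.g. by a chart parser, is left out).

from math import prod

def derivation_probability(derivation, grammar):
    """Probability of one derivation: the product of the probabilities of the
    rules used at each step.  `derivation` is a sequence of (LHS, RHS) rules."""
    return prod(grammar[rule] for rule in derivation)

def string_probability(derivations, grammar):
    """Probability that G derives a string: the sum over all of its derivations."""
    return sum(derivation_probability(d, grammar) for d in derivations)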
10 An Example Derivation
11 How well do PCFGs work?
- Not very well:
  - A PCFG adequate to parse over 90% of the MIT Voyager Corpus was successful in picking the correct parse on only 35% of a reserved test set.
- Sample Sentences -- The MIT Voyager Corpus
  - I'm currently at MIT
  - What kind of food does LaGroceria serve
  - Where is the closest library to MIT
  - What's the closest ice cream parlor to Harvard University
  - Is there a subway stop by the Mount Auburn Hospital
  - Can you show me the intersection of Cambridge Street and Hampshire Street
  - Which subway stop is closest to the library at forty five Pearl Street
12 Adding More Linguistic Context Helps
- Hypothesis: Conditioning rule expansion only on the current nonterminal doesn't provide enough linguistic context for accurately capturing parse preferences.
- Evidence: In English, NP → Pronoun is much more likely as an expansion of the NP in S → NP VP than of the NP in VP → V NP.
- Experiment I: Parse the Voyager corpus with an expanded PCFG, with rule probabilities conditioned on both the non-terminal being expanded and the index of the immediately dominating rule (see the sketch below).
- Example: P(NP → Pronoun | NP, S → NP VP) = .05
13 Adding More Linguistic Context Helps II
- Experiment II: Extend the conditioning context of Experiment I to include the most likely parts of speech for the next two words in the input stream.
14 Results
- Results parsing the reserved Voyager corpus. Ref: Magerman & Marcus 1991.
15 A Key Subproblem of Parsing: Resolving PP Attachment Ambiguities
- The Problem: The role of prepositional phrases is often ambiguous.
  - I saw the man with the telescope.
    - The seeing was with the telescope: VP → V NP PP
    - The man had the telescope: NP → N' PP
- Desired: A workable solution which is not "AI complete."
16 Structural Approaches to PP Attachment
- Right Association -- a constituent tends to attach to another constituent immediately to its left (Kimball 1973).
- Minimal Attachment -- a constituent tends to attach so as to involve the fewest additional syntactic nodes (Frazier 1979).
- But these together only account for 55% of attachments in a travel information experiment (Whittemore et al. 1990).
17 Lexical Statistical Approach I: Hindle & Rooth 92
- Estimate which head of the potential attachment sites (e.g. "see" or "man") most often co-occurs with the key lexical items in the PP (e.g. "with"), and attach the PP accordingly.
- Unsupervised learner, given a parser.
18 Resolving PP Attachment Using T-scores
- Method: Given a verb--noun--prep ambiguity, determine whether the prep is significantly more likely to occur
  - following the preceding verb, or
  - following the preceding noun.
- This can be done using a t-score contrasting
  - the conditional probability of seeing a particular prep given the noun
  - with the conditional probability of seeing that prep given the verb.
19 T-scores (t-tests)
- Provide a measure of how different the means of two Gaussian distributions are.
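A minimal sketch of such a t-score computed from co-occurrence counts, assuming the common approximation var(p) ≈ p/n; the exact estimator Hindle & Rooth used may differ in detail.

from math import sqrt

def attachment_t_score(prep_after_noun, noun_count, prep_after_verb, verb_count):
    """Contrast P(prep | noun) with P(prep | verb).  Positive t favors the noun,
    negative t the verb (the sign convention is arbitrary)."""
    p_noun = prep_after_noun / noun_count
    p_verb = prep_after_verb / verb_count
    variance = p_noun / noun_count + p_verb / verb_count
    return (p_noun - p_verb) / sqrt(variance) if variance > 0 else 0.0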
20 Example
- Moscow sent more than 100,000 soldiers into Afghanistan.
  - (v = sent, n = soldiers, prep = into)
- For a year of AP Newswire, t ≈ -8.81, representing a significant association of "into" with the verb "sent", so the procedure associates "into" with "sent" rather than with "soldiers" in subject or pre-verbal position.
21 Estimating Lexical Associations I
22 Estimating Lexical Associations II
- For clear v-prep and n-prep pairs,
  - add to the bigram counts by assigning each prep to the n or v it occurs with.
- If an entire v-n-prep triple occurs,
  - if the absolute value of the t-score for the ambiguity is greater than 2.1,
  - then assign the prep according to the t-score.
- Iterate through all triples until no more attachments result.
- For the remaining unresolved triples,
  - split the attachment .5 and .5 between the noun and the verb.
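A simplified, self-contained sketch of this iterative procedure (the bookkeeping is a reconstruction under these assumptions, not Hindle & Rooth's code):

from collections import Counter
from math import sqrt

def iterative_attachment(clear_pairs, ambiguous_triples, threshold=2.1):
    bigrams = Counter()   # (head word, prep) -> count
    totals = Counter()    # head word -> number of preps assigned to it

    def attach(head, prep, weight=1):
        bigrams[(head, prep)] += weight
        totals[head] += weight

    def t_score(v, n, prep):
        # Contrast P(prep | n) with P(prep | v), as on the previous slides.
        p_n = bigrams[(n, prep)] / totals[n] if totals[n] else 0.0
        p_v = bigrams[(v, prep)] / totals[v] if totals[v] else 0.0
        var = (p_n / totals[n] if totals[n] else 0.0) + (p_v / totals[v] if totals[v] else 0.0)
        return (p_n - p_v) / sqrt(var) if var > 0 else 0.0

    # 1. Clear v-prep and n-prep pairs seed the bigram counts.
    for head, prep in clear_pairs:
        attach(head, prep)

    # 2. Repeatedly assign ambiguous triples whose |t| exceeds the threshold.
    unresolved, changed = list(ambiguous_triples), True
    while changed:
        changed, still_open = False, []
        for v, n, prep in unresolved:
            t = t_score(v, n, prep)
            if abs(t) > threshold:
                attach(n if t > 0 else v, prep)
                changed = True
            else:
                still_open.append((v, n, prep))
        unresolved = still_open

    # 3. Remaining triples split the attachment .5 / .5 between noun and verb.
    for v, n, prep in unresolved:
        attach(n, prep, 0.5)
        attach(v, prep, 0.5)
    return bigrams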
23 Test of Lexical Association
- Test corpus: 1000 reserved sentences from the same corpus.
- All verb-noun-prep triples in the test corpus hand-graded by two judges.
- All triples then regraded using full sentence context.
- 10% misidentified by the parser.
- Most surprisingly, 10% remained difficult even with full context.
- Examples:
  - But over time, misery has given way to mending.
  - We don't have preventive detention in the United States.
24 Test of Lexical Association
- RESULTS
  - Structural Methods
    - Right Association: 64% correct
    - Minimal Attachment: 36% correct
  - Statistical Algorithm
25 Lexical Approach II
- Supervised, using Transformation-Based Learning
  - Brill & Resnik 1994
- Terminology: We see/V the boy/N1 on/P the hill/N2
- Start State: Always attach to N1.
- Transformations: Change the attachment location from X to Y if
  - N1 is word w
  - N2 is word w
  - V is word w
  - P is word w
  - N1 is word w1 and V is word w2
  - etc.
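A sketch of the greedy transformation-based learning loop under these assumptions (the example and transformation encodings are illustrative, not Brill & Resnik's implementation):

def tbl_learn(examples, transforms):
    """examples: dicts with keys 'v', 'n1', 'p', 'n2', 'gold' ('N1' or 'V').
    transforms: (predicate, target) pairs, e.g. (lambda ex: ex['p'] == 'of', 'N1')."""
    for ex in examples:
        ex['guess'] = 'N1'                        # start state: always attach to N1

    learned = []
    while True:
        best, best_gain = None, 0
        for pred, target in transforms:
            fixed = sum(1 for ex in examples
                        if pred(ex) and ex['guess'] != ex['gold'] and target == ex['gold'])
            broken = sum(1 for ex in examples
                         if pred(ex) and ex['guess'] == ex['gold'] and target != ex['gold'])
            if fixed - broken > best_gain:
                best, best_gain = (pred, target), fixed - broken
        if best is None:
            break                                  # no transformation reduces errors
        pred, target = best
        for ex in examples:                        # apply the winning transformation
            if pred(ex):
                ex['guess'] = target
        learned.append(best)
    return learned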
26 First 20 Learned Transformations
27 Results (Penn Treebank training data)
28 Lexical Approach III
- Supervised, using a backed-off model
- Collins & Brooks 95
- Very simple, clean underlying statistical model
- Complex back-off strategy for smoothing
29 Attachment Quintuples
[Parse tree figure with nodes S, VP, PP, NP, NP, V, P for the example below]
- He joined the board as a nonexecutive director
- Quintuple: (V-attach, v=joined, n1=board, p=as, n2=director)
- Training set: 20,801 quintuples (V- or N-attach, v, n1, p, n2)
- Test set: 3,097 quintuples
- Development set: 4,059 quintuples
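As a concrete representation, one might encode these records as a small named tuple (an illustrative choice, not the paper's data format):

from typing import NamedTuple

class Quintuple(NamedTuple):
    """One training/test record: the attachment decision plus the four heads."""
    attach: str   # "V" or "N"
    v: str
    n1: str
    p: str
    n2: str

example = Quintuple("V", "joined", "board", "as", "director")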
30 Core Statistical Approach
- Estimate p(noun-attach | v, n1, p, n2).
- If p(noun-attach | v, n1, p, n2) ≥ 0.5,
  - Noun-attach
- Else
  - Verb-attach
- Estimation using the Maximum Likelihood Estimate:
  p(noun-attach | v, n1, p, n2) = f(noun-attach, v, n1, p, n2) / f(v, n1, p, n2)
- Ooops! Most (v, n1, p, n2) quadruples never occur in the training data.
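A sketch of this maximum-likelihood estimate over Quintuple-style records, making the sparse-data problem explicit:

def mle_noun_attach(training, v, n1, p, n2):
    """MLE of p(noun-attach | v, n1, p, n2).  Returns None when the quadruple was
    never seen -- the 'Ooops': with only 20,801 training quintuples, most test
    quadruples are unseen and the MLE is undefined."""
    matches = [q for q in training if (q.v, q.n1, q.p, q.n2) == (v, n1, p, n2)]
    if not matches:
        return None
    return sum(q.attach == "N" for q in matches) / len(matches)  # noun-attach if >= 0.5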
31 Remember Language Modeling?
- How to estimate an n-gram probability such as P(w3 | w1, w2)?
  - MLE
  - Backing off
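For comparison, a crude back-off sketch for a trigram language model (discounting and renormalization, as in real backing-off schemes such as Katz's, are omitted):

def backoff_trigram(w1, w2, w3, tri, bi, uni, total):
    """Use the trigram MLE when its context was seen, otherwise the bigram,
    otherwise the unigram estimate."""
    if tri.get((w1, w2, w3), 0) > 0:
        return tri[(w1, w2, w3)] / bi[(w1, w2)]
    if bi.get((w2, w3), 0) > 0:
        return bi[(w2, w3)] / uni[w2]
    return uni.get(w3, 0) / total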
32 Let's Apply the Same Idea to Our Problem
33 Bug
34 Which Tuples to Use to Back Off?
- Tuples with Prepositions are Important
35 Combining tuples including the prep
36 The final algorithm
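A sketch of the final backed-off decision as described by Collins & Brooks: back off from the full quadruple to triples, then pairs, then the preposition alone, always keeping the preposition, with noun attachment as the default. The counting helpers here are assumed interfaces, not their code.

def backed_off_attachment(v, n1, p, n2, count, count_noun):
    """count(t) and count_noun(t) are assumed helpers returning how often tuple t
    occurs in the training quintuples, and how often it occurs noun-attached."""
    stages = [
        [(v, n1, p, n2)],                        # the full quadruple
        [(v, n1, p), (v, p, n2), (n1, p, n2)],   # triples containing the prep
        [(v, p), (n1, p), (p, n2)],              # pairs containing the prep
        [(p,)],                                  # the preposition alone
    ]
    for tuples in stages:
        denom = sum(count(t) for t in tuples)
        if denom > 0:
            return "N" if sum(count_noun(t) for t in tuples) / denom >= 0.5 else "V"
    return "N"                                    # default: noun attachment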
37 Some Baselines
38 Results
- Results on 3,097 test sentences
- With morphological processing
39 Comparison with Other Work
40 Next Time
- How can we apply this lexical insight to parsing?