Introduction to Probabilistic Parsing (revised)
Transcript and Presenter's Notes
1
Introduction to Probabilistic Parsing (revised)
2
Parsing NL requires three components
  • A grammar that specifies what sentences are
    legal
  • Context Free Grammars provide one very simple
    specification.
  • Very large grammars have been written in various
    formalisms with about 80-90% coverage.
  • A parsing algorithm that assigns possible
    structures to new word strings.
  • The CKY algorithm and various top-down algorithms
    do this for CFGs (a small recognizer sketch follows
    below).
  • A method for resolving ambiguities to decide
    which analysis of an ambiguous sentence is
    intended in the current context.
  • Standard parsing techniques fall short here:
    40-60% correct is the best that symbolic
    techniques have done.
  • Probabilistic grammars provide a natural
    declarative method for ordering alternative
    parses.
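To make the second component concrete, below is a minimal CKY recognizer sketch for a CFG in Chomsky Normal Form; the toy grammar and lexicon are hypothetical and purely illustrative.

    from itertools import product

    # Toy CFG in Chomsky Normal Form (hypothetical): rules are A -> B C or A -> word.
    BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
    LEXICAL = {"saw": {"V"}, "the": {"Det"}, "man": {"N"}, "telescope": {"N"}}

    def cky_recognize(words, binary=BINARY, lexical=LEXICAL, start="S"):
        """Return True if the word string is derivable from `start`."""
        n = len(words)
        chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, w in enumerate(words):                     # lexical entries
            chart[i][i + 1] = set(lexical.get(w, ()))
        for span in range(2, n + 1):                      # longer spans, bottom up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for b, c in product(chart[i][k], chart[k][j]):
                        chart[i][j] |= binary.get((b, c), set())
        return start in chart[0][n]

    print(cky_recognize("the man saw the telescope".split()))   # True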

3
Why Corpus-Based Approaches?
  • Informal IBM study in 1990
  • Compared a range of the best broad-coverage parsers
    in the U.S.
  • Test material: sentences of length 13 words from
    AP news
  • All but the best were under 40% correct (hand-checked)
  • The best claimed 60% (I don't believe it)
  • How could this be true?
  • Most successful prior work in NLP was in
    interactive systems, where the user magically
    adapted to the capabilities of the system

4
The Apparent Problem
  • The grammars of natural languages are vast
  • A very good descriptive grammar of English is
    over 1700 pages, and quite incomplete at that
  • There may be a small core of very general,
    abstract grammatical phenomena, but there is a
    vast residue of lexically tied, idiosyncratic
    phenomena.
  • Working Hypothesis (as of 1987): We need to build
    systems that learn

5
Robust Systems will Combine NATURE and NURTURE
  • Nature: Chomsky and the Generative Grammarians
  • Some linguistic phenomena are extremely abstract,
    far from surface apparent, and apparently
    universal
  • Nurture: Harris and the American Structuralists
  • Distributional Analysis
  • The grammar of a natural language is huge and
    largely idiosyncratic, but largely surface
    apparent.
  • Neither Theory Alone Appears to Capture the Facts

6
Probabilistic CFGs
  • A given CFG G can be expanded into a
    Probabilistic CFG (PCFG) by adding a probability
    to each production rule of G.
  • Technical Point: Every production rule must
    participate in some proper derivation, i.e., it
    must fully expand to a non-empty string of
    terminals.
  • The probability of each production is conditional
    on the non-terminal being expanded.
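As a concrete illustration (a hypothetical toy grammar, not the sample PCFG on the next slide), a PCFG can be stored as a map from each left-hand side to its expansions, with the probabilities for each nonterminal summing to 1.

    # Toy PCFG: probabilities are conditional on the nonterminal being expanded,
    # so the expansions of each LHS must sum to 1. (Hypothetical numbers.)
    PCFG = {
        "S":  [(("NP", "VP"), 1.0)],
        "NP": [(("Det", "N"), 0.6), (("Pronoun",), 0.3), (("NP", "PP"), 0.1)],
        "VP": [(("V", "NP"), 0.7), (("V", "NP", "PP"), 0.3)],
    }

    def check_normalized(grammar, tol=1e-9):
        """Verify that each nonterminal's expansion probabilities sum to 1."""
        for lhs, expansions in grammar.items():
            total = sum(p for _, p in expansions)
            assert abs(total - 1.0) < tol, f"{lhs} sums to {total}"

    check_normalized(PCFG)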

7
A Sample PCFG
8
Production Probabilities are Conditional on LHSs
9
Computing the Probability of a Derivation
  • Given a PCFG G and a string w, the probability
    of deriving w, i.e. P(S ⇒* w), is the sum of the
    probabilities of all the derivations of w.
  • The probability of a particular derivation of w
    is the product of the probabilities of the rules
    used at each step of that derivation.
  • The probability of each subconstituent is then
    just the product of the probabilities of the rules
    used in each step of the derivation of that
    subconstituent.
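A minimal sketch of both quantities; the rule-probability lists below are hypothetical stand-ins for the probabilities read off derivations like the one on the next slide.

    from math import prod

    def derivation_probability(rule_probs):
        """Probability of one derivation: the product of the probabilities
        of the rules used at each step."""
        return prod(rule_probs)

    def string_probability(derivations):
        """Probability of a string: the sum over all of its derivations."""
        return sum(derivation_probability(d) for d in derivations)

    # Two hypothetical derivations of the same ambiguous string.
    parse_1 = [1.0, 0.3, 0.6, 0.6]
    parse_2 = [1.0, 0.7, 0.1, 0.6]
    print(string_probability([parse_1, parse_2]))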

10
An Example Derivation
11
How well do PCFGs work?
  • Not very well
  • A PCFG adequate to parse over 90% of the MIT
    Voyager Corpus picked the correct parse on
    only 35% of a reserved test set.
  • Sample Sentences -- The MIT Voyager Corpus
  • I'm currently at MIT
  • What kind of food does LaGroceria serve
  • Where is the closest library to MIT
  • What's the closest ice cream parlor to Harvard
    University
  • Is there a subway stop by the Mount Auburn
    Hospital
  • Can you show me the intersection of Cambridge
    Street and Hampshire Street
  • Which subway stop is closest to the library at
    forty five Pearl Street

12
Adding More Linguistic Context Helps
  • Hypothesis: Conditioning rule expansion only on the
    current nonterminal doesn't provide enough
    linguistic context for accurately capturing
    parse preferences.
  • Evidence: In English, NP → Pronoun is much more
    likely as an expansion of the NP in S → NP VP than
    of the NP in VP → V NP.
  • Experiment I: Parse the Voyager corpus with an
    expanded PCFG, with rule probabilities conditioned
    on both the non-terminal being expanded and the
    index of the immediately dominating rule.
  • Example: P(NP → Pronoun | NP, S → NP VP) = .05
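A minimal sketch of how such doubly conditioned probabilities could be tabulated from a parsed corpus; the observe/rule_prob helpers are assumptions for illustration, not the construction actually used in the experiment.

    from collections import defaultdict

    # Counts keyed on (immediately dominating rule, nonterminal being expanded).
    counts = defaultdict(lambda: defaultdict(int))

    def observe(parent_rule, lhs, rhs):
        """Record one rule use from a parsed training corpus."""
        counts[(parent_rule, lhs)][rhs] += 1

    def rule_prob(parent_rule, lhs, rhs):
        """Relative-frequency estimate of P(lhs -> rhs | lhs, parent_rule)."""
        context = counts[(parent_rule, lhs)]
        total = sum(context.values())
        return context[rhs] / total if total else 0.0

    # e.g. observe(("S", ("NP", "VP")), "NP", ("Pronoun",)) per rule use, then
    # compare rule_prob under the S -> NP VP and VP -> V NP contexts.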

13
Adding More Linguistic Context Helps II
  • Experiment II: Extend the conditioning context of
    Experiment I to include the most likely parts of
    speech for the next two words in the input
    stream.

14
Results
  • Results parsing the reserved Voyager corpus. Ref:
    Magerman & Marcus 1991

15
A Key Subproblem of Parsing: Resolving PP
Attachment Ambiguities
  • The Problem: The role of prepositional phrases is
    often ambiguous.
  • I saw the man with the telescope.
  • The seeing was with the telescope:
  • VP → V NP PP
  • The man had the telescope:
  • NP → N' PP
  • Desired: A workable solution which is not "AI
    complete."

16
Structural Approaches to PP Attachment
  • Right Association -- a constituent tends to
    attach to another constituent immediately to its
    left [Kimball 1973].
  • Minimal Attachment -- a constituent tends to
    attach so as to involve the fewest additional
    syntactic nodes [Frazier 1979].
  • But together these only account for 55% of
    attachments in a travel information experiment
    [Whittemore et al. 1990].

17
Lexical Statistical Approach I: Hindle & Rooth 92
  • Estimate which head of the potential attachment
    sites (e.g. "see" or "man") most often co-occurs
    with the key lexical item in the PP (e.g.
    "with"), and attach the PP accordingly
  • Unsupervised learner, given a parser

18
Resolving PP Attachment Using T-scores
  • Method: Given a verb-noun-prep ambiguity,
    determine whether the prep is significantly more
    likely to occur
  • following the preceding verb, or
  • following the preceding noun.
  • This can be done using a t-score contrasting
  • the conditional probability of seeing a
    particular prep given a noun
  • with the conditional probability of seeing that
    prep given a verb.

19
T-scores (t-tests)
  • provide a measure of how different the means of
    two Gaussian distributions are
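A minimal sketch of the contrast just described, assuming raw co-occurrence counts from a corpus; the variance approximation (p(1-p)/n taken as roughly p/n for small p) makes this an approximation of the slide's t-score rather than its exact formula, and the 2.1 cutoff reused below is the significance threshold mentioned later in the talk.

    from math import sqrt

    def t_score(c_noun_prep, c_noun, c_verb_prep, c_verb):
        """Approximate t contrasting P(prep | noun) with P(prep | verb)."""
        p_n = c_noun_prep / c_noun
        p_v = c_verb_prep / c_verb
        return (p_n - p_v) / sqrt(p_n / c_noun + p_v / c_verb)

    def attach_by_t(t, cutoff=2.1):
        """Positive, significant t favors the noun; negative favors the verb."""
        if t > cutoff:
            return "noun"
        if t < -cutoff:
            return "verb"
        return "undecided"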

20
Example
  • Moscow sent more than 100,000 soldiers into
    Afghanistan.
  • (v = sent, n = soldiers, prep = into)
  • For a year of AP Newswire, t ≈ -8.81,
    representing a significant association of into
    with the verb sent, so the procedure associates
    into with sent rather than soldiers in subject or
    pre-verbal position.

21
Estimating Lexical Associations I
22
Estimating Lexical Associations II
  • For clear v-prep and n-prep pairs,
  • Add to bigram counts by assigning each prep to
    the n or v it occurs with.
  • If the entire v-n-prep triple occurs,
  • If the absolute value of the t-score for the
    ambiguity is greater than 2.1,
  • then assign the prep according to the t-score.
  • Iterate through all triples until no more
    attachments result.
  • For the remaining unresolved triples,
  • Split the attachment between the noun and verb, 0.5
    and 0.5.
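A rough sketch of the iteration just described (the initial seeding of the counts from unambiguous v-prep and n-prep pairs is omitted, and the data structures are assumptions): each pass assigns the preps whose t-scores are decisive, updates the counts, and repeats until nothing changes; the leftovers are then split 0.5 and 0.5.

    from collections import defaultdict

    def resolve_attachments(triples, score, threshold=2.1):
        """triples: (verb, noun, prep) ambiguities proposed by a parser.
        score: function (counts, verb, noun, prep) -> t-score, e.g. built on
        t_score above. Returns association counts keyed ('N'|'V', word, prep)."""
        counts = defaultdict(float)
        unresolved, progress = list(triples), True
        while progress and unresolved:
            progress, still_open = False, []
            for verb, noun, prep in unresolved:
                t = score(counts, verb, noun, prep)
                if abs(t) > threshold:                      # decisive association
                    key = ("N", noun, prep) if t > 0 else ("V", verb, prep)
                    counts[key] += 1
                    progress = True
                else:
                    still_open.append((verb, noun, prep))
            unresolved = still_open
        for verb, noun, prep in unresolved:                 # split the leftovers
            counts[("N", noun, prep)] += 0.5
            counts[("V", verb, prep)] += 0.5
        return counts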

23
Test of Lexical Association
  • Test corpus: 1000 reserved sentences from the same
    corpus.
  • All verb-noun-prep triples in test corpus
    hand-graded by two judges.
  • All triples then regraded using full sentence
    context.
  • 10% misidentified by the parser.
  • Most surprisingly, 10% remained difficult even
    with full context.
  • Examples
  • But over time, misery has given way to mending.
  • We don't have preventive detention in the United
    States.

24
Test of Lexical Association
  • RESULTS
  • Structural Methods
  • Right Association: 64% correct
  • Minimal Attachment: 36% correct.
  • Statistical Algorithm

25
Lexical Approach II
  • Supervised, using Transformation-Based Learning
  • Brill & Resnik 1994
  • Terminology: We see/V the boy/N1 on/P the hill/N2
  • Start State: Always attach to N1.
  • Transformations: Change the attachment location
    from X to Y if
  • N1 is word w
  • N2 is word w
  • V is word w
  • P is word w
  • N1 is word w and V is word w2
  • etc.
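A minimal sketch of how learned transformations of this shape could be applied at test time; the example rules instantiate the templates above but are hypothetical, not the transformations Brill & Resnik actually learned.

    # Each transformation: change the attachment from `src` to `dst` if the
    # condition on (v, n1, p, n2) holds. Hypothetical template instantiations.
    RULES = [
        ("N1", "V", lambda v, n1, p, n2: p == "into"),                  # P is word w
        ("N1", "V", lambda v, n1, p, n2: v == "put"),                   # V is word w
        ("N1", "V", lambda v, n1, p, n2: n1 == "way" and v == "make"),  # N1 is w and V is w2
    ]

    def attach_tbl(v, n1, p, n2, rules=RULES):
        """Start state: attach to N1, then apply the transformations in order."""
        site = "N1"
        for src, dst, condition in rules:
            if site == src and condition(v, n1, p, n2):
                site = dst
        return site

    print(attach_tbl("see", "boy", "on", "hill"))   # 'N1': no rule fires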

26
First 20 Learned Transformations
27
Results (Penn Treebank training data)
28
Lexical Approach III
  • Supervised, using a backed-off model
  • Collins & Brooks 95
  • Very simple, clean underlying statistical model
  • Complex back-off strategy for smoothing

29
Attachment Quintuples
(Figure: parse tree with S, VP, NP, PP, V, and P nodes
for the example below)
  • He joined the board as a nonexecutive
    director
  • Quintuple: (V-attach, v=joined, n1=board, p=as,
    n2=director)
  • Training set: 20,801 quintuples (V- or N-attach,
    v, n1, p, n2)
  • Test set: 3097 quintuples
  • Development set: 4059 quintuples

30
Core Statistical Approach
  • Estimate p(N-attach | v, n1, p, n2)
  • If p(N-attach | v, n1, p, n2) ≥ 0.5,
  • Noun-attach
  • Else
  • Verb-attach
  • Estimation using the Maximum Likelihood Estimate:
    p(N-attach | v, n1, p, n2) =
    f(N-attach, v, n1, p, n2) / f(v, n1, p, n2)
  • Ooops! Most test quintuples never occur in the
    training data, so the raw MLE is undefined or zero
    (sketched below).
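A minimal sketch of this core decision rule with the raw MLE, under the assumed (attach, v, n1, p, n2) representation of the training quintuples; it also shows why back-off is needed, since an unseen test quintuple leaves the estimate undefined.

    from collections import Counter

    f_all = Counter()      # f(v, n1, p, n2)
    f_noun = Counter()     # f(N-attach, v, n1, p, n2)

    def train(quintuples):
        """quintuples: iterable of (attach, v, n1, p, n2), attach in {'N', 'V'}."""
        for attach, v, n1, p, n2 in quintuples:
            f_all[(v, n1, p, n2)] += 1
            if attach == "N":
                f_noun[(v, n1, p, n2)] += 1

    def mle_decision(v, n1, p, n2):
        """Noun-attach iff p(N-attach | v, n1, p, n2) >= 0.5 under the raw MLE."""
        denom = f_all[(v, n1, p, n2)]
        if denom == 0:
            raise ValueError("unseen quintuple: the raw MLE is undefined")  # the "Ooops!"
        return "N" if f_noun[(v, n1, p, n2)] / denom >= 0.5 else "V"

    train([("V", "joined", "board", "as", "director")])
    print(mle_decision("joined", "board", "as", "director"))   # 'V'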

31
Remember Language Modeling?
  • How do we estimate an n-gram probability such as
    P(w3 | w1 w2)?
  • MLE
  • Backing-off
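A minimal sketch of the back-off idea in the n-gram setting (a simplified scheme without the discounting a proper Katz back-off would apply, and not necessarily the scheme on the original slide): use the trigram relative frequency when the trigram was seen, otherwise fall back to the bigram, then the unigram.

    def backoff_prob(w1, w2, w3, tri, bi, uni, total):
        """Estimate P(w3 | w1 w2), backing off trigram -> bigram -> unigram.
        tri, bi, uni map n-gram tuples / words to counts; total is the corpus size."""
        if tri.get((w1, w2, w3), 0) > 0:
            return tri[(w1, w2, w3)] / bi[(w1, w2)]
        if bi.get((w2, w3), 0) > 0:
            return bi[(w2, w3)] / uni[w2]
        return uni.get(w3, 0) / total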

32
Let's Apply the Same Idea to Our Problem
  • Attempt 1

33
Bug
34
Which Tuples to Use to Back Off??
  • Tuples with Prepositions are Important

35
Combining tuples including prep
36
The final algorithm
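The algorithm itself is not reproduced in the transcript; the following is a sketch in the spirit of Collins & Brooks 95, always keeping the preposition in the conditioning tuples while backing off. The exact tuple combinations and the noun-attachment default are assumptions here, and f_noun and f_all are presumed to have been populated with counts of the lower-order tuples as well as the full ones.

    def backed_off_decision(v, n1, p, n2, f_noun, f_all):
        """Decide attachment from counts over progressively smaller tuples,
        keeping the preposition p in every conditioning context."""
        stages = [
            [(v, n1, p, n2)],                        # the full tuple
            [(v, n1, p), (v, p, n2), (n1, p, n2)],   # triples containing p
            [(v, p), (n1, p), (p, n2)],              # pairs containing p
            [(p,)],                                  # the preposition alone
        ]
        for tuples in stages:
            denom = sum(f_all.get(t, 0) for t in tuples)
            if denom > 0:
                prob = sum(f_noun.get(t, 0) for t in tuples) / denom
                return "N" if prob >= 0.5 else "V"
        return "N"                                   # default: noun attachment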
37
Some Baselines
38
Results
  • Results on 3097 test sentences
  • With morphological processing

39
Comparison with Other Work
40
Next Time
  • How can we apply this lexical insight to parsing?