Natural Language Parsing: Graphs, the A* Algorithm, and Modularity

1
Natural Language Parsing: Graphs, the A* Algorithm, and Modularity
  • Christopher Manning
  • and Dan Klein, Roger Levy
  • Depts. of Computer Science and Linguistics
  • Stanford University
  • http://nlp.stanford.edu/~manning/

2
1. Hasn't this been solved?
  • The time complexity of (general) CFG parsing is dominated by the number of traversals done.
  • Traversals represent the combination of two adjacent parse items into a larger one.

[Diagram: a traversal combines NP(0,2) and VP(2,3) into S(0,3); total traversals: O(g³n³).]
3
Is the problem just cycles?
  • Bill Gates, remarks to Gartner Symposium, October 6, 1997:
  • "Applications always become more demanding. Until the computer can speak to you in perfect English and understand everything you say to it and learn in the same way that an assistant would learn, until it has the power to do that, we need all the cycles. We need to be optimized to do the best we can. Right now linguistics are right on the edge of what the processor can do. As we get another factor of two, then speech will start to be on the edge of what it can do."

4
Why is Natural Language Understanding difficult?
  • The hidden structure of language is highly
    ambiguous
  • Tree for "Fed raises interest rates 0.5% in effort to control inflation" (NYT headline, 5/17/00)

5
Where are the ambiguities?
6
The bad effects of V/N ambiguities
7
The ambiguity of language: newspaper headlines
  • "Ban on Nude Dancing on Governor's Desk" (from a Georgia newspaper, discussing current legislation)
  • "Juvenile Court to Try Shooting Defendant"
  • "Teacher Strikes Idle Kids"
  • "Stolen Painting Found by Tree"
  • "Local High School Dropouts Cut in Half"
  • "Red Tape Holds Up New Bridges"
  • and a couple of new ones:
  • "China to orbit human on Oct. 15"
  • "Moon wants to go to space"

8
Goal: Information → Knowledge
  • Lots of unstructured text/web information
  • that we'd like to turn into usable knowledge
  • employs(stanfordUniversity, chrisManning)
  • ∃t ∃e ∃x1 ∃x2 (employing(e) ∧ employer(e, x1) ∧ employed(e, x2) ∧ name(x2, "Christopher Manning") ∧ name(x1, "Stanford University") ∧ at(e, t) ∧ t ∈ [1999, 2003])

9
Question answering (QA) from text
  • The TREC-8 Question Answering competition
  • With massive collections of on-line documents, manual translation of knowledge is impractical
  • We want answers from textbases, e.g. bioinformatics
  • Pasca and Harabagiu (2001):
  • Good IR is needed: SMART paragraph retrieval
  • A large taxonomy of question types and expected answer types is crucial
  • A statistical parser is used to parse questions and relevant text for answers, and to build the KB

10
Question Answering Example
  • "How hot does the inside of an active volcano get?"
  • get(TEMPERATURE, inside(volcano(active)))
  • "Lava fragments belched out of the mountain were as hot as 300 degrees Fahrenheit."
  • fragments(lava, TEMPERATURE(degrees(300)), belched(out, mountain))
  • volcano ISA mountain
  • lava ISPARTOF volcano → lava inside volcano
  • fragments of lava HAVEPROPERTIESOF lava
  • The needed semantic information is in WordNet definitions, and was successfully translated into a form usable for rough proofs

11
Parsing Goals
  • The goal: develop grammars and parsers that are
  • Accurate: produce good parses
  • Model-optimal: find their model's actual best parses
  • Fast: seconds to parse long sentences
  • Technology exists to get any two, but not all three:
  • Exhaustive parsing: not fast
  • Chart parsing [Earley 70]
  • Approximate parsing: not optimal
  • Beam parsing [Collins 97, Charniak 01]
  • Best-first parsing [Charniak et al. 98]
  • Always build right-branching structure: not accurate
  • The problem involves both learning and inference

12
Talk Outline
  1. Big picture overview
  2. Parsing and graphs: hypergraph parsing
  3. A* parsing: efficient unlexicalized parsing
  4. A factored, lexicalized parsing model
  5. Accurate unlexicalized parsing

13
2. Parsing as Search
[Diagram: the parse hypergraph for "Factory payrolls fell in September". Edges are labeled X[h](i,j). Start edges: NN[Factory](0,1), NN[payrolls](1,2), VBD[fell](2,3), IN[in](3,4), NN[September](4,5); intermediate edges: NP[payrolls](0,2), PP[in](3,5), VP[fell](2,5); goal edge: S[fell](0,5).]
14
CKY Parsing
  • In CKY parsing, we visit edges by span size
  • Guarantees correctness by working inside-out:
  • Build all small bits before any larger bits that could possibly require them (a minimal sketch follows below).
  • Exhaustive: the goal is among the nodes with largest span size!
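
For concreteness, a minimal Viterbi-CKY sketch (not from the slides; the toy CNF grammar and its probabilities are invented) that visits edges strictly by span size:

    import math
    from collections import defaultdict

    # Toy CNF grammar; scores are log-probabilities. All rules here are invented.
    BINARY = {("NP", "VP"): [("S", math.log(0.9))],
              ("NN", "NN"): [("NP", math.log(0.3))],
              ("VBD", "PP"): [("VP", math.log(0.4))],
              ("IN", "NN"): [("PP", math.log(0.8))]}
    LEXICON = {"Factory": [("NN", math.log(0.4))],
               "payrolls": [("NN", math.log(0.4))],
               "fell": [("VBD", math.log(1.0))],
               "in": [("IN", math.log(1.0))],
               "September": [("NN", math.log(0.2))]}

    def cky(words):
        n = len(words)
        # chart[i][j][X] = best log-probability of an X edge over span (i, j)
        chart = [[defaultdict(lambda: float("-inf")) for _ in range(n + 1)]
                 for _ in range(n + 1)]
        for i, w in enumerate(words):
            for tag, s in LEXICON[w]:
                chart[i][i + 1][tag] = s
        for size in range(2, n + 1):          # inside-out: small spans first
            for i in range(n - size + 1):
                j = i + size
                for k in range(i + 1, j):     # each split point is a "traversal"
                    for lc, ls in chart[i][k].items():
                        for rc, rs in chart[k][j].items():
                            for parent, rule_s in BINARY.get((lc, rc), []):
                                score = rule_s + ls + rs
                                if score > chart[i][j][parent]:
                                    chart[i][j][parent] = score
        return chart

    chart = cky("Factory payrolls fell in September".split())
    print(chart[0][5]["S"])  # goal: best S over the whole sentence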

15
What can go wrong?
  • We can build too many edges.
  • Most edges that can be built shouldn't be.
  • CKY builds them all!
  • We can build in a bad order.
  • We might find bad parses before good parses.
  • Will trigger best-first propagation.

Speed: build promising edges first.
Correctness: keep edges on the agenda until you're sure you've seen their best parse.
16
Uniform-Cost Parsing
  • We want to work on good parses inside-out.
  • CKY does this synchronously, by span size.
  • Uniform-cost orders edges by their best known score.
  • Why it's correct:
  • Adding structure incurs probability cost.
  • Trees have lower probability than their sub-parts.
  • What makes things tricky:
  • We don't have a full graph to explore.
  • The graph is built dynamically: correctness depends on the right bits of the graph being built before an edge is finished.
17
3. A* Search
  • Problem with uniform-cost:
  • Even unlikely small edges have high score.
  • We end up processing every small edge!
  • Solution: A* search
  • Small edges have to fit into a full parse.
  • The smaller the edge, the more the full parse will cost.
  • Consider both the cost to build (β) and the cost to complete (α).
  • We figure out β during parsing.
  • We GUESS at α in advance (pre-processing). A generic agenda sketch follows below.

[Diagram: uniform-cost orders edges by Score = β; A* orders edges by Score = β + α.]
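
A sketch of the idea (a generic agenda loop, not the authors' implementation; expand and alpha_estimate are caller-supplied assumptions). Edges come off a priority queue ordered by β plus the estimated α:

    import heapq
    import itertools

    def astar(start_edges, expand, alpha_estimate, is_goal):
        """Agenda-based A* over parse edges.

        start_edges: iterable of (edge, beta) for the word-level edges.
        expand(edge, finished): yields (new_edge, new_beta) built by combining
            `edge` with already-finished adjacent edges.
        alpha_estimate(edge): admissible bound on the log-probability of the
            best completion of `edge`.
        """
        finished = {}                       # edge -> best beta (inside score)
        tie = itertools.count()             # tie-breaker for equal priorities
        agenda = [(-(b + alpha_estimate(e)), next(tie), e, b)
                  for e, b in start_edges]
        heapq.heapify(agenda)
        while agenda:
            _, _, edge, beta = heapq.heappop(agenda)
            if edge in finished:            # stale entry: a better beta was seen
                continue
            finished[edge] = beta           # admissible alpha => beta is optimal
            if is_goal(edge):
                return edge, beta
            for new_edge, new_beta in expand(edge, finished):
                if new_edge not in finished:
                    heapq.heappush(agenda,
                                   (-(new_beta + alpha_estimate(new_edge)),
                                    next(tie), new_edge, new_beta))
        return None

With the trivial estimate alpha_estimate(e) = 0.0 (the log of the bound 1), this reduces exactly to the uniform-cost parser of the previous slide.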
18
A* Parsing
  • The true cost-to-completion is α.
  • We look for easy-to-compute bounds a, with a ≥ α.
  • The trivial bound, a(E,w) = 1, gives uniform-cost search.
  • The exact bound, a(E,w) = α(E,w), gives a perfect search, but is impractical.
  • Why is A* parsing not the standard thing to do?
  • Useful admissible estimates are hard to engineer.

19
Finding Estimates
  • Challenge: find estimates which are
  • Admissible/monotonic
  • Informative
  • Easy to precompute
  • Example: Span-State (SX)
  • In a sentence w1…w10, what's the best completion for NP(1,4)?
  • There is a best completion for an NP using 1 left and 6 right words.
  • That completion is probably not valid for this sentence.
  • BUT its probability is not less than the actual best completion!

SX completion score: -11.3
True completion score: -18.1
20
Pre-Calculating SX
  • Best way to parse an X using K words
  • Best way to parse αXβ with L words in α and R words in β (a sketch of the first table follows below)

[Diagrams: (1) a table of size |X| · maxLen: the best X over K words splits via X → Y Z into s and K−s words; (2) a table of size |X| · maxLen²: the best completion αXβ, with L words to the left and R to the right, splits the outside material analogously.]
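
A sketch of the first table (the |X| · maxLen one): an inside score by length only, ignoring which words fill the span, so each entry upper-bounds every real subtree of that length. The outside (αXβ) table of size |X| · maxLen² is built analogously. The grammar encoding is assumed, not from the slides:

    from collections import defaultdict

    def best_inside_by_length(binary, best_tag_score, max_len):
        """b[X][K]: best log-probability of any X subtree over K words,
        maximizing over words as well as structure. Assumes a binarized
        grammar with unary rules only at the tag level; binary maps (Y, Z)
        to [(X, log-prob)], best_tag_score maps each tag to its best
        lexical log-probability."""
        NEG_INF = float("-inf")
        b = defaultdict(lambda: defaultdict(lambda: NEG_INF))
        for tag, s in best_tag_score.items():
            b[tag][1] = s
        for K in range(2, max_len + 1):
            for (Y, Z), parents in binary.items():
                for s_len in range(1, K):              # split K into s and K-s
                    left, right = b[Y][s_len], b[Z][K - s_len]
                    if left == NEG_INF or right == NEG_INF:
                        continue
                    for X, rule_s in parents:
                        if rule_s + left + right > b[X][K]:
                            b[X][K] = rule_s + left + right
        return b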
21
Enriching Context
  • The more detailed the context, the sharper our
    estimate

[Diagram: fixing the outside size; completion score: -11.3.]
22
Context Summary Savings
Estimate   Time      Memory   Items Blocked (%)
NULL       0         0        11.2
S          1 min     2.5K     40.5
SX         1 min     5M       80.3
SXL        30 min    250M     83.5
SXR        30 min    250M     93.8
BEST       540 min   1G       94.6

Over WSJ sentences of length 18–26, to facilitate comparison to previous work.
23
Context Summary Sharpness
Adding local information changes the intercept,
but not the slope!
24
What to do?
  • Option 1: Find global estimates
  • Idea: instead of pre-building a table, build estimates tailored to each sentence.
  • It had better be fast, per sentence!
  • Option 2: Live dangerously
  • We could just boost our estimates.
  • Lose admissibility, monotonicity, correctness, the O(n³) bound.
  • Need to do substantial extra work per item.
  • Best-first parsing [Charniak et al. 98] resembles an inadmissible A* search.

25
Grammar Projection Estimates
  • An alternative to context summaries:
  • Pre-parse the full context exhaustively, but using a bounding grammar.
  • If the bounding grammar is simple enough, exhaustive parsing can be fast enough to be useful.
  • In general: an equivalence map π over grammar symbols.
  • Example: X-bar.

[Diagram: the equivalence map π carries grammar G to the projected grammar π(G); e.g. the dotted edge NP → NP · CC NP CC NP in G projects to a collapsed edge in π(G).]
26
Example: Forward Filter
  • Let π collapse all phrasal symbols to X
  • When can X → X · CC X CC X be completed?
  • Whenever the right context includes two CCs!
  • Gives an admissible lower bound on this projection that is very efficient to calculate (a sketch follows below).

[Diagram: NP → NP · CC NP CC NP projects under π to X → X · CC X CC X; the dotted edge can only complete if the remaining right context contains two CCs (e.g. "and", "or").]
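
A sketch of such a filter (the data layout is assumed, not from the slides): precompute, for each position, the counts of each tag to its right; an edge that still needs certain terminals is blocked whenever the right context cannot supply them:

    from collections import Counter

    def right_context_counts(tags):
        """suffix[i] counts each tag occurring in tags[i:]."""
        suffix = [Counter() for _ in range(len(tags) + 1)]
        for i in range(len(tags) - 1, -1, -1):
            suffix[i] = suffix[i + 1].copy()
            suffix[i][tags[i]] += 1
        return suffix

    def can_complete(needed_terminals, suffix, end):
        """Admissible filter: an edge ending at `end` that still needs the
        given terminals can only complete if the right context supplies at
        least that many of each (necessary, not sufficient)."""
        have = suffix[end]
        return all(have[t] >= n for t, n in Counter(needed_terminals).items())

    # e.g. an edge X -> X . CC X CC X ending at position 3 still needs two CCs:
    tags = ["NN", "NNS", "VBD", "CC", "NN", "CC", "NN"]
    suffix = right_context_counts(tags)
    print(can_complete(["CC", "CC"], suffix, 3))  # True: two CCs remain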
27
Grammar Projection Savings
Context estimates augmented by the FILTER estimate:

Estimate   Time      Memory   Items Blocked (%)
NULL       0         0        58.3
S          1 min     2.5K     77.8
SX         1 min     5M       95.3
SXL        30 min    250M     96.1
SXR        30 min    250M     96.9
BEST       540 min   1G       97.3

The price of optimality? Item thresholds for 96% parse success:
[Caraballo and Charniak 98] 10K; BEST+FILTER 6K; [Charniak et al. 98] 2K.
28
4. Lexicalized PCFG Models
  • Word-to-word affinities are useful for certain
    ambiguities

29
Modeling a Lexicalized Tree
  • Task: assign a score to each local tree
  • Joint generative models [Collins 97; Charniak 97, 00]:
  • P(NP[payrolls] VP[fell] | S[fell]) is modeled directly, through complex back-off models.
  • Is this necessary?
  • In linguistics, syntax is standardly described using categories, with little mention of words
  • This is possible because acceptable syntactic configurations are independent of words
  • Conversely, lexical preferences can effectively decide attachments of arguments and modifiers without paying much attention to syntax [Church and Gale 93]

P(NP[payrolls] VP[fell] | S[fell])
30
Lexicalized A* Parsing
  • Grammar projections shine for lexicalized parsing.
  • Use two coupled projections:
  • One strips the words and leaves the PCFG backbone.
  • One strips the PCFG symbols and leaves the word dependency tree.
  • Each projection can be parsed exhaustively much faster than exhaustive lexicalized parsing.
  • Use the projections to guide a search over the full lexicalized model.
  • Works best for a special factored form of lexical models.

31
Factored A* Estimates
  • If the score w factors over projections πᵢ, then so does the score of any path P.
  • Factored scores have a natural A* estimate (rendered below).
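
Rendered explicitly (a reconstruction, assuming log-space scores, consistent with the factored model P(T) = P(C)P(D) on the next slide):

    w(P) = \sum_i w_i(\pi_i(P))
    \qquad
    a(e) = \sum_i \alpha_i^{*}(\pi_i(e))

where \alpha_i^{*}(\pi_i(e)) is the exact cost-to-complete of the projected edge \pi_i(e), computed by exhaustive parsing in projection \pi_i. Each term bounds the true completion cost of e, so their sum is admissible.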

32
Projecting Syntax and Semantics
Lexicalized tree T = (C, D); P(T) = P(C) · P(D)
Syntax C: P(C) is a standard PCFG; captures structural patterns
Semantics D: P(D) is a dependency grammar; captures word-word patterns
(A projection sketch follows below.)
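
A minimal sketch of the two projections (the tree encoding is invented for illustration): a lexicalized tree projects down to its category backbone C and to its word dependency arcs D; under the factored model, log P(T) = log P(C) + log P(D):

    # Lexicalized trees as nested (category, head_word, children...) tuples.
    def backbone(t):
        """Project to the category backbone C (drop head words)."""
        cat, _head, *kids = t
        return (cat,) + tuple(backbone(k) for k in kids)

    def dependencies(t, deps=None):
        """Project to the word dependency arcs D (drop categories): each
        child's head depends on its parent's head."""
        if deps is None:
            deps = []
        _cat, head, *kids = t
        for k in kids:
            if k[1] != head:                 # same head = same lexical token
                deps.append((k[1], head))    # (dependent, governor)
            dependencies(k, deps)
        return deps

    t = ("S", "fell",
         ("NP", "payrolls", ("NN", "Factory"), ("NN", "payrolls")),
         ("VP", "fell", ("VBD", "fell")))
    print(backbone(t))      # ('S', ('NP', ('NN',), ('NN',)), ('VP', ('VBD',)))
    print(dependencies(t))  # [('payrolls', 'fell'), ('Factory', 'payrolls')]

Hypothetical component models pcfg_logprob(backbone(t)) and dep_logprob(dependencies(t)) would then be summed to score t.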
33
Parsing Results: Time
  • Total time is dominated by the calculation of the A* tables in each projection: O(n³)

34
Parsing Results: Nodes
  • Exact lexicalized parsing is in general infeasible
  • The suppressed work is 99.997% at length 10, and further approaches 100% as length goes up.

35
Michael Collins (2003, COLT)
36
Sparseness: 1 million words is like nothing
  • Much work uses bilexical statistics: likelihoods of relationships between pairs of words
  • Very sparse, even on topics central to the WSJ:
  • "stocks plummeted": 2 occurrences
  • "stocks stabilized": 1 occurrence
  • "stocks skyrocketed": 0 occurrences
  • "stocks discussed": 0 occurrences
  • So far, very little success in augmenting the treebank with extra unannotated materials or using semantic classes or clusters
  • [Gildea 01]: You only lose 0.5% by eliminating bilexical statistics on WSJ; nothing cross-domain

37
5. Accurate Unlexicalized Parsing with PCFGs
  • The symbols in a PCFG define independence assumptions:
  • At any node, the material inside that node is independent of the material outside that node, given the label of that node.
  • Any information that statistically connects behavior inside and outside a node must flow through that node.

[Diagram: a tree in which the rules S → NP VP and NP → DT NN apply; all inside-outside information flows through the node labels.]
38
Breaking Up the Symbols
  • We can relax independence assumptions by encoding dependencies into the PCFG symbols
  • A symbol like NP^NP-POS is equivalent to
  • NP with features [parent = NP, possessive]: we're doing GPSG.
  • Parent annotation
  • [Johnson 98]

Marking possessive NPs (a minimal annotation sketch follows below)
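
A minimal sketch of parent annotation in the sense of [Johnson 98] (the tree encoding is invented; preterminal tags are left unsplit here):

    def parent_annotate(tree, parent="ROOT"):
        """Relabel each phrasal node with its parent's category: an NP under
        S becomes NP^S. Trees are (label, children...) tuples; leaves are
        plain strings."""
        label, children = tree[0], tree[1:]
        if all(isinstance(c, str) for c in children):  # preterminal: keep tag
            return tree
        new_children = tuple(parent_annotate(c, label) for c in children)
        return ("%s^%s" % (label, parent),) + new_children

    t = ("S",
         ("NP", ("NN", "Factory"), ("NN", "payrolls")),
         ("VP", ("VBD", "fell")))
    print(parent_annotate(t))
    # ('S^ROOT', ('NP^S', ('NN', 'Factory'), ('NN', 'payrolls')),
    #            ('VP^S', ('VBD', 'fell')))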
39
Experimental Process
  • We'll take a highly conservative approach:
  • Annotate as sparingly as possible
  • Highest accuracy with fewest symbols
  • Error-driven, largely manual hill-climbing, adding one annotation type at a time

40
Unlexicalized PCFGs
  • What do we mean by an unlexicalized PCFG?
  • Grammar rules are not systematically specified down to the level of lexical items:
  • NP-stocks is not allowed
  • NP^S-CC is fine
  • Closed vs. open class words (PP^VP-for)
  • Long tradition in linguistics of using function words as features or markers for selection
  • Contrary to the bilexical idea of semantic heads
  • Open-class selection is really a proxy for semantics
  • Honesty checks:
  • Number of symbols: keep the grammar very small
  • No smoothing: over-annotating is a real danger

41
Tag Splits
  • Problem: Treebank tags are too coarse.
  • Example: sentential, PP, and other prepositions are all marked IN.
  • Partial solution:
  • Subdivide the IN tag.

Annotation   F1     Size
Previous     78.3   8.0K
SPLIT-IN     80.3   8.1K
42
Yield Splits
  • Problem: sometimes the behavior of a category depends on something inside its future yield.
  • Examples:
  • Possessive NPs
  • Finite vs. infinitival VPs
  • Lexical heads!
  • Solution: annotate future elements into nodes.

Annotation   F1     Size
Previous     82.3   9.7K
POSS-NP      83.1   9.8K
SPLIT-VP     85.7   10.5K
43
A Fully Annotated Tree
44
Unlexicalized Sec. 23 Results
Parser         LP     LR     F1     CB     0 CB (%)
Magerman 95    84.9   84.6   84.7   1.26   56.6
Collins 96     86.3   85.8   86.0   1.14   59.9
Current work   86.9   85.7   86.3   1.10   60.3
Charniak 97    87.4   87.5   87.4   1.00   62.1
Collins 99     88.7   88.6   88.6   0.90   67.1
  • Beats "first generation" lexicalized parsers.
  • Much of the power of lexicalization comes from closed-class monolexicalization.

45
Conclusions
  • Parsing as shortest-path finding is an appealing conceptual approach
  • A* parsing can give very considerable speedups while maintaining exact inference
  • A modularized lexicalized parser:
  • is fast through the use of sharp A* estimates
  • continues to provide exact inference in this more complex case
  • Accurate unlexicalized parsing:
  • shows component models can be improved
  • one can parse accurately without lexicalization

46
For more
  • Papers:
  • Dan Klein and Christopher D. Manning. 2002. Fast Exact Inference with a Factored Model for Natural Language Parsing. Advances in Neural Information Processing Systems 15 (NIPS 2002), December 2002.
  • Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.
  • Available at http://nlp.stanford.edu/~manning/papers/
  • Parser:
  • http://nlp.stanford.edu/
  • (Chinese and German included in the box)

47
The End
Thank you!