Transcript and Presenter's Notes

Title: RTE @ Stanford


1
RTE @ Stanford
  • Rajat Raina, Aria Haghighi, Christopher Cox,
  • Jenny Finkel, Jeff Michels, Kristina Toutanova,
  • Bill MacCartney, Marie-Catherine de Marneffe,
  • Christopher D. Manning and Andrew Y. Ng
  • PASCAL Challenges Workshop
  • April 12, 2005

2
Our approach
  • Represent using syntactic dependencies
  • But also use semantic annotations.
  • Try to handle language variability.
  • Perform semantic inference over this
    representation
  • Use linguistic knowledge sources.
  • Compute a cost for inferring hypothesis from
    text.
  • Low cost → Hypothesis is entailed.

3
Outline of this talk
  • Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g.,
    semantic roles)
  • Inference by graph matching
  • Inference by abductive theorem proving
  • A combined system
  • Results and error analysis

4
Sentence processing
  • Parse with a standard PCFG parser [Klein &
    Manning, 2003]
  • Al Qaeda / Aal-Qa'ieda
  • Train on some extra sentences from recent news.
  • Used a high-performing Named Entity Recognizer
    (next slide)
  • Force parse tree to be consistent with certain NE
    tags.
  • Example: American Ministry of Foreign Affairs
    announced that Russia called the United States...
  • (S
  •   (NP (NNP American_Ministry_of_Foreign_Affairs))
  •   (VP (VBD announced)
  •     (...)))

5
Named Entity Recognizer
  • Trained a robust conditional random field model
    [Finkel et al., 2003]
  • Interpretation of numeric quantity statements
  • Example:
  • T: Kessler's team conducted 60,643 face-to-face
    interviews with adults in 14 countries.
  • H: Kessler's team interviewed more than 60,000
    adults in 14 countries. TRUE
  • Annotate the numerical values implied by phrases
    such as 6.2 bn, more than 60,000, around 10, and
    by MONEY/DATE named entities (sketched below)
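For illustration, the sketch below shows one way such numeric-quantity normalization could work. It is a minimal Python sketch under assumed conventions: the function name normalize_quantity, the regular expressions, and the multiplier table are illustrative assumptions, not the system's actual CRF-based annotator.

  import re

  # Minimal sketch: map quantity phrases to (relation, numeric value) pairs.
  # The patterns and multipliers below are illustrative assumptions.
  MULTIPLIERS = {"thousand": 1e3, "k": 1e3, "million": 1e6, "m": 1e6,
                 "billion": 1e9, "bn": 1e9}

  def normalize_quantity(text):
      """Map strings like '6.2 bn' or 'more than 60,000' to (relation, value)."""
      relation = "="
      if re.search(r"\bmore than\b|\bover\b", text, re.I):
          relation = ">"
      elif re.search(r"\bless than\b|\bunder\b", text, re.I):
          relation = "<"
      elif re.search(r"\baround\b|\babout\b", text, re.I):
          relation = "~"
      match = re.search(r"([\d][\d.,]*)\s*([a-zA-Z]+)?", text)
      if match is None:
          return None
      value = float(match.group(1).rstrip(".,").replace(",", ""))
      unit = (match.group(2) or "").lower()
      return relation, value * MULTIPLIERS.get(unit, 1)

  # normalize_quantity("more than 60,000")  -> ('>', 60000.0)
  # normalize_quantity("6.2 bn")            -> ('=', 6200000000.0)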

6
Parse tree post-processing
  • Recognize collocations using WordNet (sketched
    below)
  • Example: Shrek 2 rang up $92 million.
  • (S
  •   (NP (NNP Shrek) (CD 2))
  •   (VP (VBD rang_up)
  •     (NP
  •       (QP ($ $) (CD 92) (CD million))))
  •   (. .))

Annotation: MONEY, 92000000
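For illustration, a minimal sketch of WordNet-based collocation detection, assuming NLTK's WordNet interface. The function name find_collocations and the verb-lemmatization step are assumptions of this sketch, not the system's actual post-processor.

  from nltk.corpus import wordnet as wn
  from nltk.stem import WordNetLemmatizer

  _lemmatizer = WordNetLemmatizer()

  def find_collocations(tokens):
      """Return (index, joined_form) pairs for adjacent tokens WordNet knows."""
      found = []
      for i in range(len(tokens) - 1):
          # Lemmatize the first token as a verb, so 'rang up' maps to 'ring_up'.
          head = _lemmatizer.lemmatize(tokens[i].lower(), pos="v")
          candidate = head + "_" + tokens[i + 1].lower()
          if wn.synsets(candidate):
              found.append((i, candidate))
      return found

  # find_collocations(["Shrek", "2", "rang", "up", "92", "million"])
  # -> [(2, "ring_up")], provided WordNet lists 'ring up' as an entry.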
7
Parse tree → Dependencies
  • Find syntactic dependencies
  • Transform parse tree representations into typed
    syntactic dependencies, including a certain
    amount of collapsing and normalization
  • Example: Bill's mother walked to the grocery
    store.
  • subj(walked, mother)
  • poss(mother, Bill)
  • to(walked, store)
  • nn(store, grocery)
  • Dependencies can also be written as a logical
    formula (sketched below):
  • mother(A) ∧ Bill(B) ∧ poss(B, A) ∧ grocery(C) ∧
    store(C) ∧ walked(E, A, C)
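For illustration, a minimal sketch of producing such a conjunctive formula from typed dependencies. The input format (a word-to-constant map plus (relation, head, dependent) triples) is an assumption, and the sketch skips the collapsing and normalization mentioned above (e.g., compound nouns sharing one constant, verbs taking event and argument variables).

  # Sketch only: turn content words and typed dependencies into a conjunction
  # of predicates. Input format and constant naming are illustrative.
  def dependencies_to_formula(constants, dependencies):
      """constants: {word: logical constant}; dependencies: [(rel, head, dep)]."""
      literals = [f"{word}({const})" for word, const in constants.items()]
      literals += [f"{rel}({constants[head]}, {constants[dep]})"
                   for rel, head, dep in dependencies]
      return " \u2227 ".join(literals)   # conjoin with the AND symbol

  constants = {"mother": "A", "Bill": "B", "store": "C", "walked": "E"}
  dependencies = [("poss", "mother", "Bill"), ("to", "walked", "store")]
  # dependencies_to_formula(constants, dependencies) ->
  # 'mother(A) ∧ Bill(B) ∧ store(C) ∧ walked(E) ∧ poss(A, B) ∧ to(E, C)'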

8
Representations
Logical formula: mother(A) ∧ Bill(B) ∧ poss(B, A) ∧
grocery(C) ∧ store(C) ∧ walked(E, A, C)
  • Dependency graph

[Figure: dependency graph for this example, with vertices annotated VBD,
PERSON, and ARGM-LOC]
  • Can make the representation richer (sketched below)
  • walked is a verb
  • Bill is a PERSON (named entity).
  • store is the location/destination of walked.
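For illustration, a small Python sketch of such an annotated dependency graph. The class and field names (Vertex, DependencyGraph, sem_role, ...) are assumptions for exposition, not the system's internal data structures.

  from dataclasses import dataclass, field
  from typing import Dict, List, Optional, Tuple

  @dataclass
  class Vertex:
      word: str
      pos: Optional[str] = None        # e.g. "VBD"
      entity: Optional[str] = None     # e.g. "PERSON"
      sem_role: Optional[str] = None   # e.g. "ARGM-LOC" with respect to its verb

  @dataclass
  class DependencyGraph:
      vertices: Dict[str, Vertex] = field(default_factory=dict)
      edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (rel, head, dep)

  g = DependencyGraph()
  g.vertices["Bill"] = Vertex("Bill", pos="NNP", entity="PERSON")
  g.vertices["mother"] = Vertex("mother", pos="NN")
  g.vertices["walked"] = Vertex("walked", pos="VBD")
  g.vertices["store"] = Vertex("store", pos="NN", sem_role="ARGM-LOC")
  g.edges += [("poss", "mother", "Bill"), ("to", "walked", "store")]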

9
Annotations
  • Parts-of-speech, named entities
  • Already computed.
  • Semantic roles
  • Example:
  • T: C and D Technologies announced that it has
    closed the acquisition of Datel, Inc.
  • H1: C and D Technologies acquired Datel Inc.
    TRUE
  • H2: Datel acquired C and D Technologies. FALSE
  • Use a state-of-the-art semantic role classifier
    to label verb arguments [Toutanova et al., 2005]

10
More annotations
  • Coreference
  • Example:
  • T: Since its formation in 1948, Israel ...
  • H: Israel was established in 1948. TRUE
  • Use a conditional random field model for
    coreference detection.
  • Note: Appositive references were previously
    detected.
  • T: Bush, the President of USA, went to Florida.
  • H: Bush is the President of USA. TRUE
  • Other annotations
  • Word stems (very useful)
  • Word senses (no performance gain in our system)

11
Event nouns
  • Use a heuristic to find event nouns
  • Augment text representation using WordNet
    derivational links.
  • Example:
  • T: ... witnessed the murder of police commander ...
  • H: Police officer killed. TRUE
  • Text logical formula:
  • murder(M) ∧ police_commander(P) ∧ of(M, P)
  • Augment with:
  • murder(E, M, P)

[Figure: WordNet derivational link from the NOUN murder to the VERB murder]
12
Outline of this talk
  • Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g.,
    semantic roles)
  • Inference by graph matching
  • Inference by abductive theorem proving
  • A combined system
  • Results and error analysis

13
Graph Matching Approach
  • Why Graph Matching?
  • Dependency trees have a natural graphical
    interpretation
  • Successful in other domains, e.g., lossy image
    matching
  • Input: Hypothesis (H) and Text (T) graphs
  • Toy example
  • Vertices are words and phrases
  • Edges are labeled dependencies
  • Output: Cost of matching H to T (next slide)

14
Graph Matching Idea
  • Idea: Align H to T so that vertices are similar
    and relations are preserved (as in machine
    translation)
  • A matching M is a mapping from vertices of H to
    vertices of T
  • Thus, for each vertex v in H, M(v) is a vertex
    in T

15
Graph Matching Costs
  • The cost of a matching, MatchCost(M), measures
    the quality of the matching M
  • VertexCost(M): Compare vertices in H with their
    matched vertices in T
  • RelationCost(M): Compare edges (relations) in H
    with the corresponding edges (relations) in T
  • MatchCost(M) = (1 - β) · VertexCost(M) + β ·
    RelationCost(M) (sketched below)
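For illustration, a minimal Python sketch of this cost combination. The per-vertex and per-edge cost functions are supplied by the caller (their checks are listed on the next slide); averaging the individual costs into VertexCost and RelationCost is an assumption consistent with the worked example on slide 19, and the exhaustive search is only meant for toy-sized graphs.

  from itertools import product

  def match_cost(matching, h_vertices, h_edges, beta, vertex_cost, relation_cost):
      """matching maps each H vertex to a T vertex; per-item costs lie in [0, 1]."""
      v = sum(vertex_cost(v_h, matching[v_h]) for v_h in h_vertices) / len(h_vertices)
      r = (sum(relation_cost(e, matching) for e in h_edges) / len(h_edges)
           if h_edges else 0.0)
      return (1 - beta) * v + beta * r          # MatchCost(M)

  def best_matching(h_vertices, t_vertices, cost_of_matching):
      """Exhaustively score every H-to-T vertex mapping (toy-sized graphs only)."""
      best, best_cost = None, float("inf")
      for image in product(t_vertices, repeat=len(h_vertices)):
          m = dict(zip(h_vertices, image))
          cost = cost_of_matching(m)
          if cost < best_cost:
              best, best_cost = m, cost
      return best, best_cost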

16
Graph Matching Costs
  • VertexCost(M): per-vertex checks (sketched below)
  • For each vertex v in H, and vertex M(v) in T
  • Do the vertex heads share the same stem and/or
    POS?
  • Is the T vertex head a hypernym of the H vertex
    head?
  • Are the vertex heads similar phrases? (next slide)
  • RelationCost(M): per-edge checks
  • For each edge (v, v') in H, and edge (M(v), M(v'))
    in T
  • Are parent/child pairs in H parent/child in T?
  • Are parent/child pairs in H ancestor/descendant
    in T?
  • Do parent/child pairs in H share a common
    ancestor in T?
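For illustration, a sketch of the vertex-level checks using NLTK's WordNet interface and a Porter stemmer. Only the checks themselves come from the list above; the specific cost values and the fallback to phrase similarity are assumptions of this sketch.

  from nltk.corpus import wordnet as wn
  from nltk.stem import PorterStemmer

  _stemmer = PorterStemmer()

  def t_is_hypernym_of_h(t_head, h_head):
      """Is some sense of the T head a hypernym of some sense of the H head?"""
      t_synsets = set(wn.synsets(t_head))
      for sense in wn.synsets(h_head):
          if t_synsets & set(sense.closure(lambda s: s.hypernyms())):
              return True
      return False

  def vertex_cost(h_head, t_head):
      """Per-vertex cost in [0, 1]; the numeric values here are made up."""
      if h_head.lower() == t_head.lower():
          return 0.0
      if _stemmer.stem(h_head.lower()) == _stemmer.stem(t_head.lower()):
          return 0.1                    # same stem
      if t_is_hypernym_of_h(t_head, h_head):
          return 0.3                    # T head is a hypernym of the H head
      return 1.0                        # otherwise fall back to phrase similarity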

17
Digression: Phrase similarity
  • Measures based on WordNet (Resnik/Lesk)
  • Distributional similarity
  • Example: run and marathon are related.
  • Latent Semantic Analysis to discover words that
    are distributionally similar (i.e., have common
    neighbors)
  • Used a web-search based measure (sketched below)
  • Query google.com for all pages with:
  • run
  • marathon
  • both run and marathon
  • Learning paraphrases, similar to DIRT [Lin and
    Pantel, 2001]
  • World knowledge (labor intensive)
  • CEO = Chief_Executive_Officer
  • Philippines → Filipino
  • Can add common facts: Paris is the capital of
    France, ...
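For illustration, a sketch of a hit-count-based relatedness score in the spirit of the web-search measure above. Fetching the counts is not shown; they are assumed to be given, and the PMI-style formula and the example numbers are assumptions of this sketch rather than the measure actually used.

  import math

  def hit_count_similarity(count_x, count_y, count_both, total_pages):
      """PMI-style relatedness from page counts for x, y, and both together."""
      if min(count_x, count_y, count_both) == 0:
          return 0.0
      p_x = count_x / total_pages
      p_y = count_y / total_pages
      p_both = count_both / total_pages
      return math.log(p_both / (p_x * p_y))   # > 0 means x and y co-occur often

  # With hypothetical counts for "run" and "marathon":
  # hit_count_similarity(3.2e8, 2.1e7, 9.5e6, 1e10) -> about 2.6 (related)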

18
Graph Matching Costs
  • VertexCost(M):
  • For each vertex v in H, and vertex M(v) in T
  • Do the vertex heads share the same stem and/or
    POS?
  • Is the T vertex head a hypernym of the H vertex
    head?
  • Are the vertex heads similar phrases? (next slide)
  • RelationCost(M):
  • For each edge (v, v') in H, and edge (M(v), M(v'))
    in T
  • Are parent/child pairs in H parent/child in T?
  • Are parent/child pairs in H ancestor/descendant
    in T?
  • Do parent/child pairs in H share a common
    ancestor in T?

19
Graph Matching Example
VertexCost = (0.0 + 0.2 + 0.4) / 3 = 0.2
RelationCost = 0 (graphs isomorphic)
β = 0.45 (say)
MatchCost = 0.55 · (0.2) + 0.45 · (0.0) = 0.11
20
Outline of this talk
  • Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g.,
    semantic roles)
  • Inference by graph matching
  • Inference by abductive theorem proving
  • A combined system
  • Results and error analysis

21
Abductive inference
  • Idea:
  • Represent text and hypothesis as logical
    formulae.
  • A hypothesis can be inferred from the text if and
    only if the hypothesis logical formula can be
    proved from the text logical formula.
  • Toy example:

Prove?
Allow assumptions at various costs:
  BMW(t) → car(t) (cost 2)
  bought(p, q, r) → purchased(p, q, r) (cost 1)
22
Abductive assumptions
  • Assign costs to all assumptions of the form shown
    on the previous slide (e.g., BMW(t) → car(t))
  • Build an assumption cost model

23
Abductive theorem proving
  • Each assumption provides a potential proof step.
  • Find the proof with the minimum total cost:
  • Uniform cost search (sketched below)
  • If there is a low-cost proof, the hypothesis is
    entailed.
  • Example:
  • T: John(A) ∧ BMW(B) ∧ bought(E, A, B)
  • H: John(x) ∧ car(y) ∧ purchased(z, x, y)
  • Here is a possible proof by resolution
    refutation (for the earlier costs):
  • 0: ¬John(x) ∨ ¬car(y) ∨ ¬purchased(z, x, y)
    [given: negation of hypothesis]
  • 0: ¬car(y) ∨ ¬purchased(z, A, y)
    [unify with John(A)]
  • 2: ¬purchased(z, A, B)
    [unify with BMW(B), via the BMW → car assumption
    at cost 2]
  • 3: NULL
    [unify with purchased(E, A, B), via the bought →
    purchased assumption at cost 1]
  • Proof cost: 3
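For illustration, a simplified sketch of minimum-cost proof search by uniform-cost search. It covers only the toy setting above: ground text facts, hypothesis literals with variables, and assumptions that rename one predicate to another at a cost; the encoding and function names are assumptions of this sketch, not the system's theorem prover.

  import heapq

  def is_var(term):
      return term[0].islower()              # convention: variables are lowercase

  def unify(goal_args, fact_args, bindings):
      """Extend bindings so the goal's arguments match the fact's constants."""
      if len(goal_args) != len(fact_args):
          return None
      b = dict(bindings)
      for g, f in zip(goal_args, fact_args):
          g = b.get(g, g)
          if is_var(g):
              b[g] = f
          elif g != f:
              return None
      return b

  def min_cost_proof(text_facts, hyp_goals, assumption_costs):
      """Uniform-cost search for the cheapest proof of all hypothesis literals."""
      frontier = [(0.0, tuple(hyp_goals), ())]   # (cost, goals left, bindings)
      while frontier:
          cost, goals, bindings = heapq.heappop(frontier)
          if not goals:
              return cost                        # all literals proved
          (pred, args), rest = goals[0], goals[1:]
          for fact_pred, fact_args in text_facts:
              if fact_pred == pred:
                  step = 0.0                     # exact predicate match
              else:
                  step = assumption_costs.get((fact_pred, pred))
              if step is None:
                  continue                       # no assumption licenses this step
              new_b = unify(args, fact_args, dict(bindings))
              if new_b is not None:
                  heapq.heappush(frontier,
                                 (cost + step, rest, tuple(sorted(new_b.items()))))
      return float("inf")                        # no proof at any cost

  # The toy example above:
  T = {("John", ("A",)), ("BMW", ("B",)), ("bought", ("E", "A", "B"))}
  H = [("John", ("x",)), ("car", ("y",)), ("purchased", ("z", "x", "y"))]
  costs = {("BMW", "car"): 2.0, ("bought", "purchased"): 1.0}
  # min_cost_proof(T, H, costs) -> 3.0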

24
Abductive theorem proving
  • Can automatically learn good assumption costs
  • Start from a labeled dataset (e.g., the PASCAL
    development set)
  • Intuition: Find assumptions that are used in the
    proofs for TRUE examples, and lower their costs
    (by framing a log-linear model). Iterate.
    Details: [Raina et al., in submission]

25
Some interesting features
  • Examples of handling complex constructions in
    graph matching/abductive inference.
  • Antonyms/Negation: High cost for matching verbs
    if they are antonyms, or if one is negated and
    the other is not (antonym check sketched below)
  • T: Stocks fell.    H: Stocks rose. FALSE
  • T: Clinton's book was not a hit.    H: Clinton's
    book was a hit. FALSE
  • Non-factive verbs
  • T: John was charged for doing X.    H: John
    did X. FALSE
  • Can detect this because doing in the text has the
    non-factive charged as a parent, but did does not
    have such a parent.
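For illustration, the antonym part of that check could use WordNet's antonym links, as in the sketch below (NLTK interface assumed). The negation detection and cost handling are not shown.

  from nltk.corpus import wordnet as wn

  def are_antonyms(word1, word2, pos=wn.VERB):
      """True if some sense of word1 lists word2 among its antonyms."""
      for synset in wn.synsets(word1, pos=pos):
          for lemma in synset.lemmas():
              if word2 in {antonym.name() for antonym in lemma.antonyms()}:
                  return True
      return False

  # are_antonyms("fall", "rise") should be True, since WordNet pairs rise/fall;
  # that would trigger a high matching cost for "Stocks fell" vs. "Stocks rose".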

26
Some interesting features
  • Superlative check
  • T: This is the tallest tower in western Japan.
  • H: This is the tallest tower in Japan. FALSE

27
Outline of this talk
  • Representation of sentences
  • Syntax: Parsing and post-processing
  • Adding annotations on representation (e.g.,
    semantic roles)
  • Inference by graph matching
  • Inference by abductive theorem proving
  • A combined system
  • Results and error analysis

28
Results
  • Combine inference methods
  • Each system produces a score.
  • Separately normalize each system's score
    variance.
  • Suppose the normalized scores are s1 and s2.
  • Final score: S = w1·s1 + w2·s2
  • Learn the classifier weights w1 and w2 on the
    development set using logistic regression
    (sketched below). Two submissions:
  • Train one set of classifier weights for all RTE
    tasks. (General)
  • Train different classifier weights for each RTE
    task. (ByTask)
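For illustration, a minimal sketch of this combination step with scikit-learn. Z-scoring as the variance normalization, and the function and variable names, are assumptions of the sketch.

  import numpy as np
  from sklearn.linear_model import LogisticRegression

  def combine_scores(dev_s1, dev_s2, dev_labels, test_s1, test_s2):
      """Normalize each system's scores, then learn combination weights w1, w2."""
      def znorm(dev, test):
          mu, sigma = np.mean(dev), np.std(dev)
          return (dev - mu) / sigma, (test - mu) / sigma

      s1_dev, s1_test = znorm(np.asarray(dev_s1, float), np.asarray(test_s1, float))
      s2_dev, s2_test = znorm(np.asarray(dev_s2, float), np.asarray(test_s2, float))

      clf = LogisticRegression()
      clf.fit(np.column_stack([s1_dev, s2_dev]), dev_labels)   # learns w1 and w2
      # The final score S for each test pair is the entailment probability
      # under the learned weights.
      return clf.predict_proba(np.column_stack([s1_test, s2_test]))[:, 1]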

29
Results
  • Best other results: Accuracy = 58.6%, CWS = 0.617
  • Balanced predictions: 55.4% and 51.2% predicted
    TRUE on the test set.

30
Results by task
31
Partial coverage results
Task-specific optimization seems better!
[Figure: partial-coverage results for the ByTask and General submissions]
  • Can also draw coverage-CWS curves. For example
    (computation sketched below):
  • at 50% coverage, CWS = 0.781
  • at 25% coverage, CWS = 0.873
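For illustration, a sketch of the coverage/CWS computation. It assumes the PASCAL RTE-1 definition of the confidence-weighted score (rank examples by decreasing confidence and average, over ranks i, the fraction of correct judgments among the top i); the variable names in the usage comment are hypothetical.

  def cws_at_coverage(confidences, correct, coverage=0.5):
      """CWS over the most-confident fraction of examples (RTE-1 definition assumed)."""
      order = sorted(range(len(confidences)),
                     key=lambda i: confidences[i], reverse=True)
      n = max(1, int(round(coverage * len(order))))
      kept = [correct[i] for i in order[:n]]
      running, score = 0, 0.0
      for rank, ok in enumerate(kept, start=1):
          running += int(ok)
          score += running / rank
      return score / n

  # cws_at_coverage(system_confidences, gold_correctness, coverage=0.5)
  # would give numbers like the 0.781 quoted above for 50% coverage.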

32
Some interesting issues
  • Phrase similarity
  • away from the coast → farther inland
  • won victory in presidential election → became
    President
  • stocks get a lift → stocks rise
  • life threatening → fatal
  • Dictionary definitions
  • believe there is only one God → are monotheistic
  • World knowledge
  • K Club, venue of the Ryder Cup → K Club will
    host the Ryder Cup

33
Future directions
  • Need more NLP components in there
  • Better treatment of frequent nominalizations,
    parenthesized material, etc.
  • Need much more ability to do inference
  • Fine distinctions between meanings, and fine
    similarities.
  • e.g., reach a higher level and rise
  • We need a high-recall, reasonable-precision
    similarity measure!
  • Other resources (e.g., antonyms) are also very
    sparse.
  • More task-specific optimization.

34
  • Thanks!