1
Recognising Textual Entailment and Computational Semantics

Johan Bos
Dipartimento di Informatica
University of Rome "La Sapienza"
2
State of the Art
  • Computational semantics has now reached a
    state where we have at our disposal robust
    systems that are able to compute semantic
    representations for natural language texts,
    achieving wide coverage in open domains

3
Evaluation
  • In order to understand what we are doing, we
    should be able to measure the performance of our
    systems
  • Also important for funding and assessing
    possibilities for commercial development
  • What would be a good method to evaluate systems
    that produce semantic representations?

4
A sem-beauty contest?
Sem World 2007
5
Recognising Textual Entailment
  • Recently, a new shared task has been organised:
    RTE

6
Recognising Textual Entailment
  • Recently, a new shared task has been organised:
    RTE
  • Is RTE a good method for semantic evaluation?

7
Outline of this talk
  • Wide-coverage Semantics
  • Boxer
  • Possible Evaluation methods
  • Recognising Textual Entailment
  • Discussion

8
Wide-coverage semantics
  • Lingo/LKB: Minimal Recursion Semantics
    (Copestake 2002)

9
Wide-coverage semantics
  • Lingo/LKB: Minimal Recursion Semantics
    (Copestake 2002)
  • Shalmaneser: Frame Semantics (Erk & Padó 2006)

10
Wide-coverage semantics
  • Lingo/LKB: Minimal Recursion Semantics
    (Copestake 2002)
  • Shalmaneser: Frame Semantics (Erk & Padó 2006)
  • Boxer: Discourse Representation Structures
    (Bos 2005)

11
Boxer
  • Works on the output of the C&C parser
  • Input: CCG derivations
  • Output: DRT boxes
  • The C&C Parser
  • Statistical, robust, wide-coverage
  • Clark & Curran (ACL 2004)
  • Grammar derived from CCGbank
  • 409 different categories
  • Hockenmaier & Steedman (ACL 2002)

12
Semantic construction in Boxer
  • Work is done in the lexicon
  • Lambda calculus as glue language
  • Function application and beta-conversion (see the sketch below)
  • Semantic formalism
  • Discourse Representation Structures
  • First-order logic formulas
  • Output format
  • Prolog terms
  • XML
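
A minimal Python sketch of this recipe, purely illustrative (Boxer itself
is written in Prolog): lexical entries are lambda terms, CCG combination
is function application, and evaluation stands in for beta-conversion.
The DRS representation is simplified to a pair of referents and
conditions.

    def merge(drs1, drs2):
        """Merge two DRSs: union of discourse referents and of conditions."""
        return (drs1[0] + drs2[0], drs1[1] + drs2[1])

    # Lexical entries (CCG category in the comment):
    a         = lambda p: lambda q: merge(merge((['x'], []), p('x')), q('x'))  # NP/N
    spokesman = lambda z: ([], ['spokesman(%s)' % z])                          # N
    lied      = lambda x: x(lambda y: ([], ['lied(%s)' % y]))                  # S\NP

    # Forward application gives the NP, backward application the S:
    np = a(spokesman)   # "a spokesman"
    s  = lied(np)       # "a spokesman lied"
    print(s)            # (['x'], ['spokesman(x)', 'lied(x)'])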

13
CCG → DRT lexical semantics
  • Lexical entries pair CCG categories with lambda-DRT terms
    (the DRS boxes of the original slides are given here in linear
    [referents | conditions] notation)
  • a           NP/N    λp.λq.([x | ] ; p@x ; q@x)
  • spokesman   N       λz.([ | spokesman(z)])
  • lied        S\NP    λx.(x @ λy.([ | lied(y)]))

14
CCG → DRT derivation
  • a  NP/N                      spokesman  N               lied  S\NP
  • λp.λq.([x | ] ; p@x ; q@x)   λz.([ | spokesman(z)])     λx.(x @ λy.([ | lied(y)]))

15
CCG → DRT derivation
  • a  NP/N    spokesman  N
  • -------------------------------------------------- (FA)
  • a spokesman  NP
  • (λp.λq.([x | ] ; p@x ; q@x)) @ λz.([ | spokesman(z)])

16
CCG → DRT derivation
  • a spokesman  NP, after beta-conversion
  • λq.([x | spokesman(x)] ; q@x)

17
CCG → DRT derivation
  • a spokesman  NP, after beta-conversion
  • λq.([x | spokesman(x)] ; q@x)

18
CCG → DRT derivation
  • a spokesman  NP    lied  S\NP
  • -------------------------------------------------- (BA)
  • a spokesman lied  S
  • λx.(x @ λy.([ | lied(y)])) @ (λq.([x | spokesman(x)] ; q@x))

19
CCG → DRT derivation
  • a spokesman lied  S, after one beta-conversion step
  • (λq.([x | spokesman(x)] ; q@x)) @ λy.([ | lied(y)])

20
CCG → DRT derivation
  • a spokesman lied  S, fully reduced
  • [x | spokesman(x), lied(x)]

21
CCG → DRT derivation
  • a spokesman lied  S, fully reduced
  • [x | spokesman(x), lied(x)]

22
Example Output
  • Example: Pierre Vinken, 61 years old, will join
    the board as a nonexecutive director Nov. 29. Mr.
    Vinken is chairman of Elsevier N.V., the Dutch
    publishing group.
  • Semantic representation: DRT
  • Complete Wall Street Journal

23
Outline of this talk
  • Wide-coverage Semantics
  • Boxer
  • Possible Evaluation methods
  • Recognising Textual Entailment
  • Discussion

24
Evaluation -- possible methods
  • Annotated corpus of semantic representations
  • Task-oriented evaluation
  • Evaluation by inference

25
Annotated Corpus
  • Basic idea: create a corpus of sentences annotated
    with semantic representations
  • Key example: the Penn Treebank
  • Going in this direction: PropBank, FrameNet
  • Problems
  • We don't have such a corpus
  • There are no resources to build one
  • What constitutes a good semantic representation?

26
Annotation Problems 1/3
  • How shallow, how deep?
  • Davidsonian or neo-Davidsonian?
  • Plural noun phrases?
  • Tense and aspect?
    Superlatives and comparatives?

27
Annotation Problems 2/3
  • Underspecified or resolved?
  • Scope of quantifiers and operators
  • Word senses from WordNet?
  • Anaphora
  • Presupposition

28
Annotation Problems 3/3
  • Which semantic formalism?
  • Minimal Recursion Semantics
  • Discourse Representation Theory
  • First-order logic
  • OWL?

29
Task-oriented evaluation
  • Employ semantics in existing NLP applications,
    such as QA and MT
  • Measure performance with and without semantics,
    or comparative analysis
  • Nice idea, but
  • Interface problems
  • Substitution problems

30
Message
  • Evaluating semantic components is not
    kerfuffle-free

31
Outline of this talk
  • Wide-coverage Semantics
  • Boxer
  • Possible Evaluation methods
  • Recognising Textual Entailment
  • Discussion

32
RTE
  • Recognising
  • Textual
  • Entailment

33
RTE
  • Recognising Textual Entailment

34
RTE
  • Recognising Textual Entailment

35
Recognising Textual Entailment
  • A task for NLP systems to recognise entailment
    between two (short) texts
  • Introduced in 2004/2005 as part of the PASCAL
    Network of Excellence
  • Proved to be a difficult, but popular task
  • PASCAL provided a development and test set of
    several hundred examples
  • Organised by Ido Dagan and others

36
RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
-------------------------------------------------
The charges were denied by his family.
✓
37
RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
-------------------------------------------------
Lyon is the capital of France.
X
38
RTE is hard, example 1
Example (TRUE)
The leaning tower is a building in Pisa. Pisa is a town in Italy.
-------------------------------------------------
The leaning tower is a building in Italy.
✓
39
RTE is hard, example 1
Example (FALSE)
The leaning tower is the highest building in Pisa. Pisa is a town in Italy.
-------------------------------------------------
The leaning tower is the highest building in Italy.
X
40
RTE is hard, example 2
Example (TRUE)
Johan is walking around.
-------------------------------------------------
Johan is walking.
✓
41
RTE is hard, example 2
Example (FALSE)
Johan is farting around.
-------------------------------------------------
Johan is farting.
X
42
History of RTE
  • Old or new?
  • Perhaps surprisingly, RTE is not really a new
    task
  • Has been present implicitly in computational and
    formal semantics
  • 1996: FRACAS test suite
  • 2004/5: PASCAL challenges

43
History of RTE
  • Old or new?
  • Perhaps surprisingly, RTE is not really a new
    task
  • Has been present implicitly in computational and
    formal semantics
  • 1996: FRACAS test suite
  • 2004/5: PASCAL challenges
  • The first RTE examples are over two thousand
    years old!

44
Aristotle's Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
-------------------------------
Socrates is mortal.
✓
45
Aristotle's Syllogisms
ARISTOTLE 2 (FALSE)
All men are mortal. Socrates is not a man.
-------------------------------
Socrates is mortal.
X
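
Rendered in first-order logic (this rendering is not on the slides, but
it makes the contrast explicit), the first syllogism is a valid
entailment while the second is not: nothing forces a non-man to be
mortal.

    ∀x (man(x) → mortal(x)),  man(socrates)    ⊨  mortal(socrates)
    ∀x (man(x) → mortal(x)),  ¬man(socrates)   ⊭  mortal(socrates)
    (countermodel: a domain in which socrates is neither a man nor mortal)
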
46
The CURT system
  • Blackburn & Bos (2005) includes a system that
    checks new utterances for consistency and
    informativity
  • Not robust, but arguably one of the first
    systems to implement textual inference
  • First-order logic, theorem proving, and model
    building
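
The two checks can be phrased as theorem-proving calls. A minimal sketch,
assuming a hypothetical prove(phi) call that asks a first-order prover
whether phi is valid, and hypothetical Implies/Not formula constructors;
in CURT a model builder runs alongside the prover to catch the
complementary cases, and everything runs with a timeout, since first-order
validity is undecidable.

    def check_utterance(discourse, sentence):
        """discourse and sentence are first-order formulas; prove, Implies
        and Not are placeholders, not a real API.  Returns one of CURT's
        three verdicts."""
        if prove(Implies(discourse, Not(sentence))):
            return 'inconsistent'    # discourse plus sentence has no model
        if prove(Implies(discourse, sentence)):
            return 'uninformative'   # the sentence already follows
        return 'ok'                  # consistent and informative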

47
CURT
  • Testing a discourse for informativity
    (slides 47-50 stepped through a CURT session; the screenshots are
    not reproduced in this transcript)

51
The FRACAS test suite
  • European project on computational semantics, in
    the mid-1990s
  • Test suite published in deliverable D16, but since
    then forgotten?
  • Cooper et al. (1996): Using the Framework. FRACAS
    deliverable D16, section 3
  • Aim of the test suite: measure the semantic
    competence of NLP systems

52
The FRACAS test suite
  • Grouped by linguistic and semantic phenomena
  • Generalised quantifiers
  • Plurals
  • Nominal Anaphora
  • Ellipsis
  • Adjectives
  • Comparatives
  • Temporal Reference
  • Verbs
  • Attitudes

53
FRACAS example pairs
  • 3.209  Mickey is a small animal. Dumbo is a large animal.
           Is Mickey smaller than Dumbo?   YES
  • 3.205  Dumbo is a large animal.
           Is Dumbo a small animal?        NO
  • 3.206  Fido is not a small animal.
           Is Fido a large animal?         DON'T KNOW

54
PASCAL RTE
  • First organised evaluation campaign on natural
    language entailment
  • RTE-1: UK, 2005
  • RTE-2: Venice, 2006
  • RTE-3: Prague, 2007
  • Coordinated by Ido Dagan and others
  • Now a well-established shared task in
    computational linguistics

55
Approaches to RTE
  • There are several methods
  • We will look at five of them to see how difficult
    RTE actually is
  • And whether computational semantics can play a
    role

56
Recognising Textual Entailment
  • Method 1
  • Flipping a coin

57
Flipping a coin
  • Advantages
  • Easy to implement
  • Disadvantages
  • Just 50% accuracy

58
Recognising Textual Entailment
  • Method 2
  • Calling a friend

59
Calling a friend
  • Advantages
  • High accuracy (95%)
  • Disadvantages
  • Lose friends
  • High phone bill

60
Recognising Textual Entailment
  • Method 3
  • Ask the audience

61
Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta were established at the
mouth of the Ciliwung, perhaps as early as the 5th century AD.
------------------------------------------------------------------
The first settlements on the site of Jakarta were established as
early as the 5th century AD.
62
Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta were established at the
mouth of the Ciliwung, perhaps as early as the 5th century AD.
------------------------------------------------------------------
The first settlements on the site of Jakarta were established as
early as the 5th century AD.
✓
63
Recognising Textual Entailment
  • Method 4
  • Word Overlap

64
Word Overlap Approaches
  • Popular approach
  • Ranging in sophistication from a simple bag of
    words to the use of WordNet
  • Accuracy rates of ca. 55%
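
A minimal bag-of-words baseline of this kind might look as follows. This
is an illustrative sketch, not any particular RTE submission; the
threshold would be tuned on the development data.

    import re

    def tokens(s):
        return set(re.findall(r"[a-z]+", s.lower()))

    def overlap_score(text, hypothesis):
        """Proportion of hypothesis words that also occur in the text."""
        t, h = tokens(text), tokens(hypothesis)
        return len(h & t) / len(h)

    def predict_entailment(text, hypothesis, threshold=0.7):
        return overlap_score(text, hypothesis) >= threshold

    print(predict_entailment("His family has steadfastly denied the charges.",
                             "The charges were denied by his family."))  # True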

65
Word Overlap
  • Advantages
  • Relatively straightforward algorithm
  • Disadvantages
  • Hardly better than flipping a coin

66
RTE State-of-the-Art
  • PASCAL RTE challenge
  • Hard problem
  • Requires semantics

67
Recognising Textual Entailment
  • Method 5
  • Computational Semantics

68
Nutcracker
  • Components of Nutcracker
  • The C&C parser for CCG
  • Boxer
  • Vampire, a FOL theorem prover
  • Paradox and Mace, FOL model builders
  • Background knowledge
  • WordNet: hyponyms, synonyms
  • NomLex: nominalisations

69
How Nutcracker works
  • Given a textual entailment pair T/H with text T
    and hypothesis H
  • Produce DRSs for T and H
  • Translate these DRSs into FOL
  • Give the following input to the theorem prover
    Vampire:
  • T → H
  • If Vampire finds a proof, then we predict that T
    entails H
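
As a pipeline, the procedure can be sketched as follows. The names
parse_ccg, boxer, drs_to_fol, vampire_proves, And and Implies are
placeholders for the C&C parser, Boxer, the DRS-to-FOL translation, the
Vampire prover and formula constructors; they are not real function names
from the system. In this sketch the background knowledge of slide 68 is
conjoined with T, as the WordNet example further on suggests.

    def entails(text, hypothesis, background_knowledge):
        t_drs = boxer(parse_ccg(text))          # DRS for the text T
        h_drs = boxer(parse_ccg(hypothesis))    # DRS for the hypothesis H
        t_fol = drs_to_fol(t_drs)               # translate both DRSs into FOL
        h_fol = drs_to_fol(h_drs)
        # Ask Vampire to prove (BK & T) -> H; a proof means we predict
        # that T entails H.
        conjecture = Implies(And(background_knowledge, t_fol), h_fol)
        return vampire_proves(conjecture)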

70
Example (Vampire proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in
Iskandariyah, 30 miles south of the capital.
------------------------------------------------------------------
A bomb exploded outside a mosque.
✓
71
Example (Vampire proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but
was compelled to accept it in light of the political pressure of the
capitalist politicians who supported its introduction.
------------------------------------------------------------------
The introduction of the euro has been opposed.
✓
72
WordNet at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
------------------------------------------------------------------
Crude oil prices rise.
✓
  • Background knowledge: ∀x (soar(x) → rise(x))
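
Axioms of this kind can be harvested from WordNet's hypernym hierarchy.
A rough sketch using NLTK; sense selection, the hard part, is simply
ignored here (only the first sense is used), and the exact output depends
on the WordNet version and sense ordering.

    from nltk.corpus import wordnet as wn

    def hypernym_axioms(word, pos=wn.VERB):
        """First-order axioms word(x) -> hypernym(x), taken from the first
        sense only; purely illustrative."""
        synsets = wn.synsets(word, pos=pos)
        if not synsets:
            return []
        return ["all x (%s(x) -> %s(x))" % (word, lemma)
                for hyper in synsets[0].hypernyms()
                for lemma in hyper.lemma_names()]

    print(hypernym_axioms("soar"))   # may include "all x (soar(x) -> rise(x))"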

73
Nutcracker results
  • Nutcracker, combined with a shallow overlap
    system, was one of the top systems at RTE-1

74
World Knowledge 1
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course
in Texas were killed when their sport utility vehicle drifted onto
the shoulder of a Highway and struck a parked truck.
------------------------------------------------------------------
Four firefighters were killed in a car accident.
✓
75
World Knowledge 2
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does
Malaysia, which has also sent warships to the area, claiming that
its waters and airspace have been violated.
------------------------------------------------------------------
There is a territorial waters dispute.
✓
76
Outline of this talk
  • Wide-coverage Semantics
  • Boxer
  • Possible Evaluation methods
  • Recognising Textual Entailment
  • Discussion

77
Discussion
  • We now know what RTE is and how it relates to
    computational semantics
  • From the perspective of computational semantics
  • What is good about RTE?
  • What is bad about RTE?

78
Good about RTE
  • Reasonably natural examples
  • Simple evaluation measure
  • Independent of semantic formalism
  • A relatively large set of examples

79
Bad about RTE
  • Unfocussed
  • Examples can contain more than one phenomenon
    that you have to get right
  • Difficult to use in system development
  • No control over whether a system understands
    certain phenomena or not
  • Unclear how much background knowledge is
    permitted
  • Unclear how much pragmatics is assumed

80
An example of a focused test suite
  • Text from real data
  • The Osaka World Trade Center is the highest
    building in Japan.
  • Possible Hypotheses
  • The Osaka World Trade Center is the third highest
    building in Japan. NO
  • The Osaka World Trade Center is a building in
    Japan. YES
  • The Osaka World Trade Center is one of the
    highest buildings in Japan. YES
  • The Osaka World Trade Center is the highest
    building in Western Japan. MAYBE

81
The Future
  • In order to make progress, we need more focused
    test suites
  • The FRACAS collection is just a start
  • These are not meant as a replacement for the
    PASCAL RTE examples
  • Note this is not really a new idea. But the time
    seems ripe to develop such test suites

82
Textual Inference Data Sets
  • PASCAL data sets
  • RTE-1 (1,376 pairs)
  • RTE-2 (1,600 pairs)
  • Other, less well-known data sets
  • FRACAS (346 pairs), Cooper et al. 1996
  • PARC (76 pairs), Zaenen, Karttunen & Crouch 2005
  • Specific phenomena
  • Adjectives (ca. 1,000 pairs), Amoia & Gardent 2006

83
Comparing FRACAS, CURT & PASCAL
  • FRACAS test suite
  • YES / NO / MAYBE
  • CURT system
  • Uninformative, Inconsistent, OK
  • PASCAL RTE dataset
  • TRUE / FALSE

84
Available for research
  • What we need are systems to experiment with
  • Boxer is freely available for research purposes
  • Nutcracker will be made available at some point

85
Conclusion
  • RTE can be seen as the Turing Test for
    computational semantics
  • Therefore, computational semanticists should take
    the RTE task seriously
  • The number of computational semanticists
    participating in the RTE campaigns has been
    surprisingly low
  • Let's hope that changes in the future!

86
No Sem World 2007
Sem World 2007