Title: Recognising Textual Entailment and Computational Semantics
1. Recognising Textual Entailment and Computational Semantics
Johan Bos, Dipartimento di Informatica, University of Rome "La Sapienza"
2. State of the Art
- Computational semantics has now reached a state where we have at our disposal robust systems that are able to compute semantic representations for natural language texts, achieving wide coverage in open domains
3. Evaluation
- In order to understand what we are doing, we should be able to measure the performance of our systems
- Also important for funding and assessing possibilities for commercial development
- What would be a good method to evaluate systems that produce semantic representations?
4. A sem-beauty contest?
Sem World 2007
5-6. Recognising Textual Entailment
- Recently, a new shared task has been organised: RTE
- Is RTE a good method for semantic evaluation?
7. Outline of this talk
- Wide-coverage Semantics
- Boxer
- Possible Evaluation methods
- Recognising Textual Entailment
- Discussion
8-10. Wide-coverage semantics
- LinGO/LKB: Minimal Recursion Semantics (Copestake 2002)
- Shalmaneser: Frame Semantics (Erk & Pado 2006)
- Boxer: Discourse Representation Structures (Bos 2005)
11. Boxer
- Works on output of the C&C parser
- Input: CCG derivations
- Output: DRT boxes
- The C&C Parser
- Statistical, robust, wide-coverage
- Clark & Curran (ACL 2004)
- Grammar derived from CCGbank
- 409 different categories
- Hockenmaier & Steedman (ACL 2002)
12. Semantic construction in Boxer
- Work is done in the lexicon
- Lambda calculus as glue language
- Function application and beta-conversion
- Semantic formalism
- Discourse Representation Structures
- First-order logic formulas
- Output format
- Prolog terms
- XML
13. CCG→DRT lexical semantics
14-21. CCG→DRT derivation (for "A spokesman lied", built up step by step across these slides; DRS boxes shown in linear notation [referents | conditions], with ";" for DRS merge)

   a : NP/N                       spokesman : N           lied : S\NP
   λp.λq.([x | ] ; p@x ; q@x)     λy.[ | spokesman(y)]    λN.(N @ λz.[ | lie(z)])
   ----------------------------------------------- (FA)
   a spokesman : NP
   λq.([x | spokesman(x)] ; q@x)
   ------------------------------------------------------------------- (BA)
   a spokesman lied : S
   [x | spokesman(x), lie(x)]
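The finished derivation can be replayed in a few lines of code: lambda terms become Python closures, so ordinary function application performs the beta-conversion, and DRS merge is concatenation of referents and conditions. This is a simplified sketch (no event variable for "lied", fixed referent names), not Boxer's implementation:

```python
# DRSs as (discourse referents, conditions) pairs; merge is the DRT ";"
# operator. Simplified sketch: no events, referent names fixed by hand.
def drs(refs, conds):
    return (list(refs), list(conds))

def merge(d1, d2):
    # Union of discourse referents and of conditions
    return (d1[0] + d2[0], d1[1] + d2[1])

# Lexical entries as closures: application *is* beta-conversion here.
a = lambda p: lambda q: merge(merge(drs(["x"], []), p("x")), q("x"))
spokesman = lambda y: drs([], [f"spokesman({y})"])
lied = lambda np: np(lambda z: drs([], [f"lie({z})"]))

np = a(spokesman)   # forward application (FA): NP "a spokesman"
s = lied(np)        # backward application (BA): S "a spokesman lied"
print(s)            # (['x'], ['spokesman(x)', 'lie(x)'])
```

The resulting pair corresponds to the final box of the derivation: one referent x with the conditions spokesman(x) and lie(x).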
22. Example Output
- Example: Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.
- Semantic representation: DRT
- Complete Wall Street Journal
23. Outline of this talk
- Wide-coverage Semantics
- Boxer
- Possible Evaluation methods
- Recognising Textual Entailment
- Discussion
24. Evaluation: possible methods
- Annotated corpus of semantic representations
- Task-oriented evaluation
- Evaluation by inference
25. Annotated Corpus
- Basic idea
- Create a corpus of sentences annotated with semantic representations
- Key example: Penn Treebank
- Going in this direction: PropBank, FrameNet
- Problems
- We don't have such a corpus
- There are no resources to build one
- What constitutes a good semantic representation?
26. Annotation Problems 1/3
- How shallow, how deep?
- Davidsonian or neo-Davidsonian?
- Plural noun phrases?
- Tense and aspect?
- Superlatives and comparatives?
27. Annotation Problems 2/3
- Underspecified or resolved?
- Scope of quantifiers and operators
- Word senses from WordNet?
- Anaphora
- Presupposition
28. Annotation Problems 3/3
- Which semantic formalism?
- Minimal Recursion Semantics
- Discourse Representation Theory
- First-order logic
- OWL?
29. Task-oriented evaluation
- Employ semantics in existing NLP applications, such as QA and MT
- Measure performance with and without semantics, or comparative analysis
- Nice idea, but
- Interface problems
- Substitution problems
30. Message
- Evaluating semantic components is not kerfuffle-free
31. Outline of this talk
- Wide-coverage Semantics
- Boxer
- Possible Evaluation methods
- Recognising Textual Entailment
- Discussion
32-34. RTE
- Recognising Textual Entailment
35. Recognising Textual Entailment
- A task for NLP systems: recognise entailment between two (short) texts
- Introduced in 2004/2005 as part of the PASCAL Network of Excellence
- Proved to be a difficult, but popular task
- PASCAL provided a development and test set of several hundred examples
- Organised by Ido Dagan and others
36. RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the charges.
-------------------------------------------------
The charges were denied by his family.
✓
37. RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital of France.
-------------------------------------------------
Lyon is the capital of France.
✗
38. RTE is hard, example 1
Example (TRUE)
The leaning tower is a building in Pisa. Pisa is a town in Italy.
-------------------------------------------------
The leaning tower is a building in Italy.
✓
39. RTE is hard, example 1
Example (FALSE)
The leaning tower is the highest building in Pisa. Pisa is a town in Italy.
-------------------------------------------------
The leaning tower is the highest building in Italy.
✗
40. RTE is hard, example 2
Example (TRUE)
Johan is walking around.
-------------------------------------------------
Johan is walking.
✓
41. RTE is hard, example 2
Example (FALSE)
Johan is farting around.
-------------------------------------------------
Johan is farting.
✗
42-43. History of RTE
- Old or new?
- Perhaps surprisingly, RTE is not really a new task
- Has been present implicitly in computational and formal semantics
- 1996: FraCaS test suite
- 2004/5: PASCAL challenges
- The first RTE examples are over two thousand years old!
44. Aristotle's Syllogisms
ARISTOTLE 1 (TRUE)
All men are mortal. Socrates is a man.
-------------------------------
Socrates is mortal.
✓
45. Aristotle's Syllogisms
ARISTOTLE 2 (FALSE)
All men are mortal. Socrates is not a man.
-------------------------------
Socrates is mortal.
✗
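Both syllogisms can be checked mechanically once they are translated into first-order logic. The sketch below is a toy entailment test, assuming a tiny two-element domain: it enumerates every interpretation of the predicates, and entailment holds iff no model satisfies the premises while falsifying the conclusion. Nothing like this scales, but it makes the TRUE/FALSE labels concrete.

```python
# Brute-force entailment check over a fixed finite domain.
from itertools import product

DOMAIN = ["socrates", "plato"]

def entails(premises, conclusion):
    # Entailment fails iff some model satisfies all premises but not
    # the conclusion (a counter-model).
    for man_vals in product([True, False], repeat=len(DOMAIN)):
        for mortal_vals in product([True, False], repeat=len(DOMAIN)):
            man = dict(zip(DOMAIN, man_vals))
            mortal = dict(zip(DOMAIN, mortal_vals))
            if all(p(man, mortal) for p in premises) and not conclusion(man, mortal):
                return False  # counter-model found
    return True

# All men are mortal: for every d, man(d) implies mortal(d)
all_men_mortal = lambda man, mortal: all(not man[d] or mortal[d] for d in DOMAIN)
socrates_is_a_man = lambda man, mortal: man["socrates"]
socrates_not_a_man = lambda man, mortal: not man["socrates"]
socrates_mortal = lambda man, mortal: mortal["socrates"]

print(entails([all_men_mortal, socrates_is_a_man], socrates_mortal))   # True
print(entails([all_men_mortal, socrates_not_a_man], socrates_mortal))  # False
```

The second call finds a counter-model (Socrates neither a man nor mortal), which is exactly why ARISTOTLE 2 is labelled FALSE.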
46. The CURT system
- Blackburn & Bos 2005 includes a system that checks for consistency and informativity of new utterances
- Not robust, but arguably one of the first systems that implemented textual inference
- First-order logic, theorem proving, and model building
47-50. CURT
- Testing a discourse for informativity
51. The FraCaS test suite
- European project on computational semantics, in the mid 1990s
- Test suite published in D16, but since then forgotten?
- Cooper et al. (1996): Using the Framework. FraCaS deliverable D16, section 3
- Aim of test suite: measure semantic competence of NLP systems
52. The FraCaS test suite
- Grouped on linguistic and semantic phenomena
- Generalised quantifiers
- Plurals
- Nominal Anaphora
- Ellipsis
- Adjectives
- Comparatives
- Temporal Reference
- Verbs
- Attitudes
53. FraCaS example pairs
- 3.209: Mickey is a small animal. Dumbo is a large animal. Is Mickey smaller than Dumbo? YES
- 3.205: Dumbo is a large animal. ... Is Dumbo a small animal? NO
- 3.206: Fido is not a small animal. ... Is Fido a large animal? DON'T KNOW
54. PASCAL RTE
- First organised evaluation campaign on natural language entailment
- RTE-1: UK, 2005
- RTE-2: Venice, 2006
- RTE-3: Prague, 2007
- Coordinated by Ido Dagan and others
- Now already a well-established shared task in computational linguistics
55. Approaches to RTE
- There are several methods
- We will look at five of them to see how difficult RTE actually is
- And whether computational semantics can play a role
56. Recognising Textual Entailment
- Method 1
- Flipping a coin
57. Flipping a coin
- Advantages
- Easy to implement
- Disadvantages
- Just 50% accuracy
58. Recognising Textual Entailment
- Method 2
- Calling a friend
59. Calling a friend
- Advantages
- High accuracy (95%)
- Disadvantages
- Lose friends
- High phone bill
60. Recognising Textual Entailment
- Method 3
- Ask the audience
61. Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
-------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
62. Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
-------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
✓
63. Recognising Textual Entailment
- Method 4
- Word overlap
64. Word Overlap Approaches
- Popular approach
- Ranging in sophistication from simple bag of words to use of WordNet
- Accuracy rates ca. 55%
65. Word Overlap
- Advantages
- Relatively straightforward algorithm
- Disadvantages
- Hardly better than flipping a coin
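A minimal bag-of-words version of this approach can be sketched as follows. The 0.5 threshold is an illustrative assumption, not a tuned value, and the Lyon pair from slide 37 shows exactly why such systems barely beat a coin flip:

```python
# Bag-of-words overlap baseline: predict entailment when enough
# hypothesis words also occur in the text.
import re

def word_overlap(text, hypothesis, threshold=0.5):
    t = set(re.findall(r"\w+", text.lower()))
    h = set(re.findall(r"\w+", hypothesis.lower()))
    # Fraction of hypothesis words covered by the text
    return len(h & t) / len(h) >= threshold

# Correct on RTE 1977 (a genuine entailment) ...
print(word_overlap("His family has steadfastly denied the charges.",
                   "The charges were denied by his family."))   # True
# ... but fooled by RTE 2030: every hypothesis word occurs in the
# text, yet the entailment does not hold (gold label: FALSE).
print(word_overlap("Lyon is actually the gastronomical capital of France.",
                   "Lyon is the capital of France."))           # True
```

The second example is a false positive no overlap threshold can repair, since the hypothesis is a strict subset of the text's words.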
66. RTE State-of-the-Art
- PASCAL RTE challenge
- Hard problem
- Requires semantics
67. Recognising Textual Entailment
- Method 5
- Computational Semantics
68. Nutcracker
- Components of Nutcracker
- The C&C parser for CCG
- Boxer
- Vampire, a FOL theorem prover
- Paradox and Mace, FOL model builders
- Background knowledge
- WordNet: hyponyms, synonyms
- NomLex: nominalisations
69. How Nutcracker works
- Given a textual entailment pair T/H, with text T and hypothesis H
- Produce DRSs for T and H
- Translate these DRSs into FOL
- Give the following input to the theorem prover Vampire:
- (BK ∧ T) → H, where BK is the background knowledge
- If Vampire finds a proof, then we predict that T entails H
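The decision procedure above can be sketched as follows. This is an illustrative reconstruction, not Nutcracker's own code: the `tptp_problem` helper, the file layout, and the call to a local `vampire` binary are all assumptions; only the problem-building step is exercised directly here.

```python
# Sketch of the entailment check: encode BK, T and H as a TPTP problem
# (BK and T as axioms, H as the conjecture) and ask a first-order
# prover for a proof. Helper names and the `vampire` binary are
# assumptions for illustration.
import subprocess
import tempfile

def tptp_problem(bk_axioms, t_formula, h_formula):
    # Proving (BK ∧ T) → H is the same as proving H from BK and T.
    lines = [f"fof(bk{i}, axiom, {ax})." for i, ax in enumerate(bk_axioms)]
    lines.append(f"fof(text, axiom, {t_formula}).")
    lines.append(f"fof(hyp, conjecture, {h_formula}).")
    return "\n".join(lines)

def entails(bk_axioms, t_formula, h_formula, timeout=30):
    # Assumes a `vampire` executable on the PATH; True iff a proof is
    # found within the time limit.
    with tempfile.NamedTemporaryFile("w", suffix=".p", delete=False) as f:
        f.write(tptp_problem(bk_axioms, t_formula, h_formula))
    result = subprocess.run(["vampire", f.name], capture_output=True,
                            text=True, timeout=timeout)
    return "Theorem" in result.stdout

# The WordNet axiom from the soar/rise example, in TPTP syntax:
print(tptp_problem(["![X]: (soar(X) => rise(X))"],
                   "soar(prices)", "rise(prices)"))
```

In the full system a model builder (Paradox or Mace) runs in parallel: a finite model of BK ∧ T ∧ ¬H is positive evidence that the entailment does not hold, rather than merely an absent proof.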
70. Example (Vampire proof)
RTE-2 112 (TRUE)
On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.
-------------------------------------------------
A bomb exploded outside a mosque.
✓
71. Example (Vampire proof)
RTE-2 489 (TRUE)
Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.
-------------------------------------------------
The introduction of the euro has been opposed.
✓
72. WordNet at work
RTE 1952 (TRUE)
Crude oil prices soared to record levels.
-------------------------------------------------
Crude oil prices rise.
✓
- Background knowledge: ∀x(soar(x) → rise(x))
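Hyponym links of this kind can be compiled into background axioms mechanically. In the sketch below the (hyponym, hypernym) pairs are written by hand rather than read from WordNet, and the axiom syntax is illustrative:

```python
# Turn a WordNet-style hyponym/hypernym pair into a background axiom
# of the kind shown above. Pairs are supplied by hand here; a real
# system would extract them from WordNet's hierarchy.
def hyponym_axiom(hypo, hyper):
    # "soar is a kind of rise" becomes the axiom all x (soar(x) -> rise(x))
    return f"all x ({hypo}(x) -> {hyper}(x))"

# Pairs relevant to the RTE examples in this talk (car_bomb/bomb comes
# from the Iskandariyah example on slide 70):
pairs = [("soar", "rise"), ("car_bomb", "bomb")]
axioms = [hyponym_axiom(h, g) for h, g in pairs]
print(axioms[0])  # all x (soar(x) -> rise(x))
```

With the first axiom in the background knowledge, proving "Crude oil prices rise" from "Crude oil prices soared" reduces to a one-step inference for the theorem prover.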
73. Nutcracker results
- Nutcracker, combined with a shallow overlap
system, was one of the top systems at RTE-1
74. World Knowledge 1
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.
-------------------------------------------------
Four firefighters were killed in a car accident.
✓
75. World Knowledge 2
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
-------------------------------------------------
There is a territorial waters dispute.
✓
76. Outline of this talk
- Wide-coverage Semantics
- Boxer
- Possible Evaluation methods
- Recognising Textual Entailment
- Discussion
77. Discussion
- We now know what RTE is and how it relates to computational semantics
- From the perspective of computational semantics:
- What is good about RTE?
- What is bad about RTE?
78. Good about RTE
- Reasonably natural examples
- Simple evaluation measure
- Independent of semantic formalism
- A relatively large set of examples
79. Bad about RTE
- Unfocussed
- Examples can contain more than one phenomenon that you have to get right
- Difficult to use in system development
- No control over whether a system understands certain phenomena or not
- Unclear how much background knowledge is permitted
- Unclear how much pragmatics is assumed
80. An example of a focused test suite
- Text from real data:
- The Osaka World Trade Center is the highest building in Japan.
- Possible hypotheses:
- The Osaka World Trade Center is the third highest building in Japan. NO
- The Osaka World Trade Center is a building in Japan. YES
- The Osaka World Trade Center is one of the highest buildings in Japan. YES
- The Osaka World Trade Center is the highest building in Western Japan. MAYBE
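A focused suite like the one above can be written down directly as data and scored against its three-way gold labels. This is a sketch of the idea, not an existing resource; the accuracy function is the whole evaluation measure:

```python
# The focused test suite from the slide as plain data: one text, a list
# of (hypothesis, gold label) pairs with three-way labels.
text = "The Osaka World Trade Center is the highest building in Japan."
suite = [
    ("The Osaka World Trade Center is the third highest building in Japan.", "NO"),
    ("The Osaka World Trade Center is a building in Japan.", "YES"),
    ("The Osaka World Trade Center is one of the highest buildings in Japan.", "YES"),
    ("The Osaka World Trade Center is the highest building in Western Japan.", "MAYBE"),
]

def accuracy(system, text, suite):
    # system(text, hypothesis) must return "YES", "NO" or "MAYBE"
    correct = sum(system(text, h) == gold for h, gold in suite)
    return correct / len(suite)

# Even a trivial answer-YES-to-everything baseline can be scored:
print(accuracy(lambda t, h: "YES", text, suite))  # 0.5
```

Because every hypothesis varies a single phenomenon (superlatives, here), a system's score on such a suite says something diagnostic that an unfocused RTE accuracy figure cannot.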
81. The Future
- In order to make progress, we need more focused test suites
- The FraCaS collection is just a start
- These are not meant as a replacement for the PASCAL RTE examples
- Note: this is not really a new idea, but the time seems ripe to develop such test suites
82. Textual Inference Data Sets
- PASCAL data sets
- RTE-1 (1,376 pairs)
- RTE-2 (1,600 pairs)
- Other, less known data sets
- FraCaS (346 pairs): Cooper et al. 1996
- PARC (76 pairs): Zaenen, Karttunen & Crouch 2005
- Specific phenomena
- Adjectives (ca. 1,000 pairs): Amoia & Gardent 2006
83. Comparing FraCaS, CURT and PASCAL
- FraCaS test suite
- YES / NO / MAYBE
- CURT system
- Uninformative, Inconsistent, OK
- PASCAL RTE dataset
- TRUE / FALSE
84. Available for research
- What we need is systems to experiment with
- Boxer is freely available for research purposes
- Nutcracker will be made available at some point
85. Conclusion
- RTE can be seen as the Turing Test for computational semantics
- Therefore, computational semanticists should take the RTE task seriously
- The number of computational semanticists participating in the RTE campaigns has been surprisingly low
- Let's hope that changes in the future!
86. No Sem World 2007
Sem World 2007