Title: The PASCAL Recognizing Textual Entailment Challenge
1. The PASCAL Recognizing Textual Entailment Challenge
2. Textual Entailment
- Textual entailment recognition is the task of deciding, given two text fragments, whether the meaning of one text is entailed (can be inferred) from the other.
3. Task Definition
- Given pairs of small text snippets, referred to as Text-Hypothesis (T-H) pairs, build a system that decides for each T-H pair whether T indeed entails H. Results are compared to a manual gold standard produced by annotators.
- Example:
  - T: Kurdistan Regional Government Prime Minister Dr. Barham Salih was unharmed after an assassination attempt.
  - H: Prime minister targeted for assassination.
4. Dataset Collection and Application Settings
- The dataset of Text-Hypothesis pairs was collected by human annotators. It consists of seven subsets:
- Information Retrieval (IR)
- Comparable Documents (CD)
- Reading Comprehension (RC)
- Question Answering (QA)
- Information Extraction (IE)
- Machine Translation (MT)
- Paraphrase Acquisition (PP)
5. Approaching Textual Entailment Recognition
- Solution approaches can be categorized as:
  - Deep analysis or understanding
    - Using various kinds of linguistic knowledge and resources to accurately recognize textual entailment
    - Patterns of entailment (e.g. lexical relations, syntactic alternations)
    - Processing technology (word co-occurrence statistics, thesauri, parsing, etc.)
  - Shallow approach
6. Baseline
- Given that half the pairs are FALSE, the simplest baseline is to label all pairs FALSE. This achieves 50% accuracy.
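As a minimal sketch (the function and variable names are illustrative, not from the challenge), the majority-class baseline looks like:

```python
def baseline_accuracy(gold_labels):
    """Label every T-H pair FALSE and measure accuracy against gold."""
    predictions = [False] * len(gold_labels)
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)

# On a balanced set (half TRUE, half FALSE) this yields exactly 0.5.
gold = [True] * 400 + [False] * 400
print(baseline_accuracy(gold))  # 0.5
```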
7. Application of the BLEU (BiLingual Evaluation Understudy) Algorithm
- A shallow approach, based on the lexical level.
- It calculates the percentage of n-grams shared between a candidate translation and the human reference; typical values of n are 1, 2, 3, 4.
- It limits each n-gram's appearance to a maximum frequency.
- The results for each n are combined, and a penalty is applied to short text.
- Scored 54% on the development set and 50% on the test set.
- Good results on CD, bad results on IE and IR.
- Problem: it does not recognize syntax or semantics, such as synonyms and antonyms.
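The n-gram computation above can be sketched as follows. This is a simplified, smoothed BLEU variant applied to a token pair, not the challenge submission's exact implementation, and the smoothing constant is an assumption:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu_score(candidate, reference, max_n=4):
    """Clipped n-gram precision for n = 1..max_n, combined as a
    geometric mean, with a brevity penalty for short candidates."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(ngrams(candidate, n))
        ref = Counter(ngrams(reference, n))
        # Clip each n-gram to its maximum frequency in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)  # smooth zero counts
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) >= len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

To decide entailment, the score between T and H would be compared against a tuned threshold.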
8. Syntactic Similarities
- Human annotators were asked to divide the dataset into:
  - True by syntax
  - False by syntax
  - Not syntax
  - Cannot decide
- A robust parser was then used to establish the results.
- A partial submission was provided, and humans were used for the test.
9. Tree Edit Distance
- The Text as well as the Hypothesis is transformed into a tree using a sentence splitter and a parser to create the syntactic representation.
- A matching module finds the best sequence of editing operations that obtains H from T.
- Each editing operation (deletion, insertion, and substitution) is given a relative score.
- Finally, the total score is evaluated; if it exceeds a certain limit, the pair is labeled as TRUE.
- High accuracy for CD, but only 55% overall accuracy.
- Should be enriched with resources such as WordNet and other libraries.
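A hedged sketch of the matching step: a standard edit-distance dynamic program over node labels, flattened to sequences here for brevity (the actual system operates on trees), with illustrative per-operation costs and an illustrative decision threshold:

```python
def edit_cost(t_nodes, h_nodes, ins=1.0, dele=0.7, sub=1.0):
    """Minimal total cost of editing operations (insertion, deletion,
    substitution) that turn the T node sequence into H.
    Simplification: a real system computes this over syntactic trees."""
    m, n = len(t_nodes), len(h_nodes)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + dele
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = 0.0 if t_nodes[i - 1] == h_nodes[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,      # delete from T
                          d[i][j - 1] + ins,       # insert into H
                          d[i - 1][j - 1] + same)  # match or substitute
    return d[m][n]

def entails(t_nodes, h_nodes, threshold=2.0):
    # Decision step: a low edit cost (equivalently, a high derived
    # score) labels the pair TRUE; the threshold is tuned on dev data.
    return edit_cost(t_nodes, h_nodes) < threshold
```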
10. Dependency Analysis and WordNet
- A dependency parser is used to normalize the data into an appropriate tree representation.
- A lexical entailment module is then used, where sub-branches of T and H can be entailed from one another using:
  - Synonymy and similarity
  - Hyponymy and WordNet entailment, e.g. "kill" entails "die".
  - Multiwords, e.g. "melanoma" entails "skin-cancer".
  - Negation and antonymy, where negation is propagated through the tree leaves.
- Dependency trees are matched with an algorithm that searches for matching branches between T and H.
- Results show a high score on CD and between 42% and 55% on the other tasks.
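The lexical checks above can be illustrated with a toy stand-in resource; a real implementation would query WordNet, and all the word lists and weights below are hypothetical examples:

```python
# Hypothetical miniature stand-in for WordNet-style relations; a real
# system queries WordNet for synonymy, hyponymy, verb entailment,
# and antonymy instead of these hand-coded sets.
SYNONYMS = {("buy", "purchase"), ("devour", "eat")}
HYPERNYMS = {"melanoma": "skin-cancer", "cat": "animal"}  # hyponym -> hypernym
ANTONYMS = {("alive", "dead")}

def lexically_entails(t_word, h_word, negated=False):
    """Does the T word entail the H word at the lexical level?"""
    if negated:
        # Negation propagated down the tree flips the check to antonymy.
        return (t_word, h_word) in ANTONYMS or (h_word, t_word) in ANTONYMS
    if t_word == h_word:
        return True
    if (t_word, h_word) in SYNONYMS or (h_word, t_word) in SYNONYMS:
        return True
    # A specific term entails its more general hypernym.
    return HYPERNYMS.get(t_word) == h_word
```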
11. Syntactic Graph Distance: a Rule-Based and an SVM-Based Approach
- Graph distance theory is used, where a graph represents the H and T pair.
- Similarity measures are used to determine entailment:
  - T semantically subsumes H, e.g. H: "The cat eats the mouse", T: "The cat devours the mouse" ("eat" generalizes "devour").
  - T syntactically subsumes H, e.g. H: "The cat eats the mouse", T: "The cat eats the mouse in the garden" (T contains a specializing prepositional phrase).
  - T directly implies H, e.g. H: "The cat killed the mouse", T: "The cat devours the mouse".
12. Cont.
- A rule-based system realizes the following:
  - Node similarity
  - Syntactic similarity
  - Semantic similarity
- A machine learning technique is applied to evaluate the parameters and make the final decision.
- Results: high for CD (.76), and .44-.59 for the other tasks.
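One way to picture the combination of node similarity with a final decision, using a hand-tuned score as a stand-in for the paper's trained model (weights, names, and the threshold are all illustrative assumptions):

```python
def node_similarity(t_node, h_node, synonyms=frozenset()):
    """Illustrative node-level similarity: exact match scores 1.0,
    a semantic (synonym) match gets a lower hand-picked weight."""
    if t_node == h_node:
        return 1.0
    if (t_node, h_node) in synonyms or (h_node, t_node) in synonyms:
        return 0.8
    return 0.0

def entailment_score(t_nodes, h_nodes, synonyms=frozenset()):
    """Average best-match similarity of each H node against T."""
    if not h_nodes:
        return 0.0
    return sum(max(node_similarity(t, h, synonyms) for t in t_nodes)
               for h in h_nodes) / len(h_nodes)

def decide(t_nodes, h_nodes, threshold=0.7, synonyms=frozenset()):
    # Stand-in for the learned decision: threshold the combined score.
    return entailment_score(t_nodes, h_nodes, synonyms) >= threshold
```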
13. Hierarchical Knowledge Representation
- A hierarchical logic-based representation of the T-H pairs, using a description-logic-inspired language, Extended Feature Description Logic (EFDL), which is similar to conceptual graphs.
- Nodes in the graph represent words or phrases.
- Manually generated rewriting rules are used for semantic and syntactic representations.
- A sentence in the text can have different alternative representations.
- The evaluation is based on whether any of the sentence representations can infer H.
- Results: 64.8% on the development set and 56.1% on the test set; highest on CD, lowest on QA (50%).
14. Logic-Like Formula Representation
- A parser is used to transform the T-H pair into a graph of logical phrases, where the nodes are the words and the links are the relations.
- A matching score is given for each pair of terms.
- A theorem prover is used to find the proof with the lowest cost.
- The final cost is evaluated; if it is less than a threshold, the entailment is proved.
- Highest results on CD (79%), lowest on MT (47%), average 55%.
15. Atomic Propositions
- Finds the entailment relation by comparing the atomic propositions contained in T and H.
- The comparison of the atomic propositions is done using the OTTER deduction system.
- The atomic propositions are extracted from the text using a parser.
- WordNet is used for word relations.
- A semantic analyzer is used to transform the output of the parser into first-order logic.
- Low accuracy: .5188 overall, especially for QA (47%).
16. Combining a Shallow Overlap Technique with Deep Theorem Proving
- In the shallow stage, a simple frequency test of overlapping words is used.
- In the deep stage, a CCG parser is used to generate a DRS (Discourse Representation Structure, from Discourse Representation Theory), which is transformed into first-order logic.
- The Vampire theorem prover and Paradox were used for the entailment proof.
- A knowledge base was used to validate results against the real world:
  - WordNet
  - Geographical knowledge from the CIA
  - Generic axioms for, for instance, the semantics of possessives, active-passive alternations, and locations.
- The combined system has an accuracy of .562, while the shallow approach alone has an accuracy of .55.
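The shallow stage can be sketched as a simple overlap ratio; the exact statistic is an assumption, as the system may weight or normalize the counts differently:

```python
def word_overlap(t_text, h_text):
    """Shallow stage sketch: fraction of hypothesis words that also
    occur in the text (a simple word-overlap frequency test)."""
    t_words = set(t_text.lower().split())
    h_words = h_text.lower().split()
    return sum(w in t_words for w in h_words) / max(len(h_words), 1)
```

Pairs scoring above a tuned cutoff would be labeled TRUE; the deep stage handles the rest.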
17. Applying the COGEX Logic Prover
- First, a parser is used to convert the text into logic.
- Then COGEX, a modified version of OTTER, is used.
- The prover requires a set of clauses, called the set of support, which is used to initiate the search for inferences.
- The set of support is loaded with the negated form of the hypothesis as well as the predicates that make up the text passage.
- Another list, called the usable list, contains the clauses used by OTTER to generate inferences.
- The usable list consists of all the axioms that have been generated either automatically or by hand:
  - World knowledge axioms (manual)
  - NLP axioms (SS and SM)
  - WordNet lexical chains
- Overall accuracy .551, with many errors in the parsing stage.
18. Comparing Task Accuracy
- CD: Comparable Documents
- IE: Information Extraction
- QA: Question Answering
- IR: Information Retrieval
- MT: Machine Translation
- RC: Reading Comprehension
- PP: Paraphrasing
19. Result Comparison
20. Future Work
- Search for a candidate parser to transform natural language into first-order logic.
- Use the largest possible set of knowledge bases to capture similarity.
- Search for a robust theorem prover.
21. References
- The first PASCAL Recognising Textual Entailment Challenge (RTE I)
- Ido Dagan, Oren Glickman and Bernardo Magnini. The PASCAL Recognising Textual Entailment Challenge. In Proceedings of the PASCAL Challenges Workshop on Recognising Textual Entailment, 2005.
- http://www.cs.biu.ac.il/glikmao/rte05/index.html