Toward Dependency Path based Entailment

1
Toward Dependency Path based Entailment
  • Rodney Nielsen, Wayne Ward, and James Martin

2
Dependency Path-based Entailment
  • DIRT (Lin and Pantel, 2001)
  • Unsupervised method to discover inference rules
  • X is author of Y ≈ X wrote Y
  • X solved Y ≈ X found a solution to Y
  • If two dependency paths tend to link the same
    sets of words, Lin and Pantel hypothesize that
    their meanings are similar
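
The hypothesis on this slide can be illustrated with a minimal sketch. It assumes paths are compared by plain Jaccard overlap of their slot fillers, combined across the X and Y slots with a geometric mean; DIRT itself weights fillers with mutual information, so this is a simplified stand-in and not the published measure. The triples and the function names are made up for illustration.

```python
from collections import defaultdict

def slot_fillers(triples):
    """Map each path to the sets of words seen in its X and Y slots."""
    fillers = defaultdict(lambda: {"X": set(), "Y": set()})
    for x, path, y in triples:
        fillers[path]["X"].add(x)
        fillers[path]["Y"].add(y)
    return fillers

def jaccard(a, b):
    """Set overlap in [0, 1]; an assumed substitute for DIRT's weighting."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def path_similarity(fillers, p1, p2):
    """Geometric mean of the X-slot and Y-slot filler overlaps."""
    sx = jaccard(fillers[p1]["X"], fillers[p2]["X"])
    sy = jaccard(fillers[p1]["Y"], fillers[p2]["Y"])
    return (sx * sy) ** 0.5

# Toy (X, path, Y) triples, invented for this example.
triples = [
    ("Tolstoy", "X wrote Y", "War and Peace"),
    ("Tolstoy", "X is author of Y", "War and Peace"),
    ("Austen", "X wrote Y", "Emma"),
    ("Austen", "X is author of Y", "Emma"),
    ("Euler", "X solved Y", "the Basel problem"),
]
fillers = slot_fillers(triples)
sim_same = path_similarity(fillers, "X wrote Y", "X is author of Y")
sim_diff = path_similarity(fillers, "X wrote Y", "X solved Y")
```

Paths that link the same word pairs score high, and paths with disjoint fillers score zero.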

3
ML Classification Approach
(Diagram labels: Dependency Path Based Entailment, Bag of Words, Graph Matching)
  • Features derived from corpus statistics
  • Unigram co-occurrence
  • Surface form bigram co-occurrence
  • Dependency-derived bigram co-occurrence
  • Mixture of experts
  • About 18 ML classifiers from Weka toolkit
  • Classify by majority vote or average probability
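
The two combination rules named in the last bullet can be sketched as follows. This assumes each classifier outputs a probability that the text entails the hypothesis and that 0.5 is the decision threshold; the slides state neither, and the probabilities below are invented.

```python
def majority_vote(probabilities, threshold=0.5):
    """True when more than half of the classifiers vote for entailment.

    Ties count as False (an arbitrary choice made for this sketch).
    """
    votes = [p >= threshold for p in probabilities]
    return sum(votes) > len(votes) / 2

def average_probability(probabilities, threshold=0.5):
    """True when the mean classifier probability reaches the threshold."""
    mean = sum(probabilities) / len(probabilities)
    return mean >= threshold

# Hypothetical outputs of five classifiers for one text/hypothesis pair.
expert_probs = [0.9, 0.55, 0.6, 0.1, 0.2]
by_vote = majority_vote(expert_probs)
by_average = average_probability(expert_probs)
```

On this input the two rules disagree: three of the five classifiers vote for entailment, but the two confident dissenters pull the average down to 0.47.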

4
Corpora
  • 7.4M articles, 2.5B words, 347 words/doc
  • Gigaword (Graff, 2003): 77% of documents
  • Reuters Corpus (Lewis et al., 2004)
  • TIPSTER
  • Lucene IR engine
  • Two indices
  • Word surface form
  • Porter stem filter
  • Stop words: a, an, the
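
A minimal in-memory sketch of the two-index idea, one index over surface forms and one over stems, both with the three stop words removed. It does not use Lucene. `toy_stem` is a hypothetical stand-in that only strips a plural "s" and is not the Porter stemmer, and the two documents are the text and hypothesis from the bigram alignment example.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "the"}  # the stop list given on the slide

def toy_stem(token):
    """Hypothetical stand-in for a Porter stemmer: strips a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def build_index(docs, normalize=lambda t: t):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[normalize(token)].add(doc_id)
    return index

docs = [
    "Newspapers choke on rising paper costs and falling revenue.",
    "The cost of paper is rising.",
]
surface_index = build_index(docs)                   # word surface forms
stem_index = build_index(docs, normalize=toy_stem)  # stemmed terms
```

In the surface index "cost" and "costs" are separate terms; in the stemmed index both documents are posted under "cost".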

5
Core Features
6
Dependency Features
  • Dependency bigram features

7
Dependency Features
  • Descendent relation statistics

8
Dependency Features
  • Descendent relation statistics

9
Dependency Features
  • Descendent relation statistics

10
Dependency Features
  • Descendent relation statistics

11
Verb Dependency Features
  • Combined verb descendent relation features
  • Worst verb descendent relation features

12
Subject Dependency Features
  • Combined and worst subject descendent relations
  • Combined and worst subject-to-verb paths

13
Other Dependency Features
  • Repeat these same features for
  • Object
  • pcomp-n
  • Other descendent relations

14
Results
15
Feature Analysis
  • All feature sets contribute, according to
    cross-validation on the training set
  • Most significant feature set
  • Unigram stem-based word alignment
  • Most significant core repeated feature
  • Average MLE

16
Questions
(Diagram labels: Dependency Path Based Entailment, Bag of Words, Graph Matching)
  • Mixture of experts classifier using corpus
    co-occurrence statistics
  • Moving in the direction of DIRT
  • Domain of interest: student response analysis in
    intelligent tutoring systems

(Diagram labels: Hypothesis h, Text t)

17
Why Entailment
  • Intelligent Tutoring Systems
  • Student Interaction Analysis
  • Are all aspects of the student's answer entailed
    by the text and the gold standard answer?
  • Are all aspects of the desired answer entailed by
    the student's response?

18
Word Alignment Features
  • Unigram word alignment

19
Word Alignment Features
  • Bigram word alignment
  • Example
  • <t>Newspapers choke on rising paper costs and
    falling revenue.</t> <h>The cost of paper is
    rising.</h>
  • MLE(cost, t) = n_{cost of, costs of} / n_{costs of}
    = 6086/35800 = 0.17
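
The arithmetic in the bigram example can be reproduced with a short sketch. It assumes the two n values are corpus counts: the number of indexed documents containing both bigrams, and the number containing the text-side bigram "costs of". The slide does not define them, and the function name `mle` is chosen here for illustration.

```python
def mle(n_joint, n_text):
    """Maximum likelihood estimate n_joint / n_text.

    n_joint: count for the hypothesis term and text term together
             (assumed to be a document co-occurrence count).
    n_text:  count for the text term alone.
    """
    if n_text == 0:
        return 0.0  # no evidence for the text term in the corpus
    return n_joint / n_text

# Counts from the slide: n_{cost of, costs of} = 6086, n_{costs of} = 35800.
score = mle(6086, 35800)
print(f"{score:.2f}")  # prints 0.17
```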

20
Word Alignment Features
  • Average unigram and bigram
  • Stem-based tokens

21
Corpora
  • 7.4M articles/docs, 2.5B words, 347 words/doc
  • Gigaword (Graff, 2003)
  • 5.7M articles, 2.1B words, 375 words/article
  • 77% of documents and 83% of indexed words
  • Reuters Corpus (Lewis et al., 2004)
  • 0.8M articles, 0.17B words, 213 words/article
  • TIPSTER
  • 0.9M articles, 0.26B words, 291 words/article
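
As a quick consistency check, the Gigaword shares can be recomputed from the rounded per-corpus figures on this slide. This sketch uses only the numbers listed above. Because those figures are rounded, the per-article averages derived from them would not match the slide's exactly, so only the shares are computed.

```python
# (articles, words) per corpus, as rounded on the slide.
corpora = {
    "Gigaword": (5.7e6, 2.1e9),
    "Reuters": (0.8e6, 0.17e9),
    "TIPSTER": (0.9e6, 0.26e9),
}

total_docs = sum(docs for docs, _ in corpora.values())
total_words = sum(words for _, words in corpora.values())

giga_docs, giga_words = corpora["Gigaword"]
doc_share = round(100 * giga_docs / total_docs)     # percent of documents
word_share = round(100 * giga_words / total_words)  # percent of words

print(doc_share, word_share)  # prints 77 83
```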