1
A Practically Unsupervised Learning Method to
Identify Single-Snippet Answers to Definition
Questions on the Web
  • Ion Androutsopoulos and Dimitrios Galanis
  • Department of Informatics
  • Athens University of Economics and Business
  • Greece

2
An add-on for Web search engines
  • Definition questions: What is a tsunami?, Who was Duke Ellington?, What are pathogens?
  • Input: a term to be defined (possibly multi-word) and the most highly ranked Web pages the search engine (AltaVista) returned for that term.
  • Assumptions: a classifier separates definition questions from other questions; another module identifies the term to be defined.
  • Return at most m 250-character snippets (see the extraction sketch after this list).
  • Success: at least one of the m snippets contains an acceptable short definition.
  • Much as in earlier TRECs (for m = 5).
  • Single-snippet definitions, unlike recent TRECs.
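A minimal sketch of the snippet-extraction step, assuming windows are simply 250-character spans centred on occurrences of the target term; the function name and details are illustrative, not the authors' code:

```python
def extract_windows(page_text: str, term: str,
                    max_windows: int = 5, width: int = 250) -> list[str]:
    """Return up to max_windows character windows of the given width,
    each centred on an occurrence of the target term (hypothetical helper)."""
    windows = []
    lowered, needle = page_text.lower(), term.lower()
    start = 0
    while len(windows) < max_windows:
        hit = lowered.find(needle, start)
        if hit == -1:
            break  # no more occurrences of the term
        left = max(0, hit + len(needle) // 2 - width // 2)
        windows.append(page_text[left:left + width])
        start = hit + len(needle)
    return windows
```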

3
Examples
  • For the term generic drug:
  • A snippet containing a definition: "time pharmacist questioning pharmacist consulting pharmacy drug compounding drug price what is generic drug what is a generic drug? a generic drug is one which is identified by its official chemical name rather than an advertised brand name"
  • A snippet containing no definition: "there are new results for return to results glaxo wellcome's perspective on gatt patent debate what is the generic drug industry's response to glaxo wellcome's? pr newswire 12/5/1995 read the full article, get a free trial for"

4
Previous method (MA 2004)
  • Trains an SVM to classify 250-character windows of the target term as definitions or non-definitions.
  • Returns the m windows the SVM is most confident are definitions.
  • Attributes of the SVM:
  • from Joho et al. (e.g., position of the window in the document, manually crafted patterns, a simplistic centroid),
  • the ranking of the search engine,
  • automatically acquired lexical patterns, selected by their precision (best results with 200 patterns; see the sketch after this list).
  • Cross-validation on TREC 2000-2001.
  • Outperformed Joho et al.'s attributes and Prager et al.'s WordNet-based method.
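The slide says only that the lexical patterns are acquired automatically "based on their precision"; the sketch below shows one plausible reading, with candidate patterns taken to be token bigrams of the training windows (an assumption, not the actual attribute set of MA 2004):

```python
from collections import Counter

def select_patterns(windows: list[str], labels: list[bool],
                    n: int = 200, min_freq: int = 3) -> list[tuple]:
    """Rank candidate lexical patterns by precision on labelled training
    windows and keep the n most precise (illustrative sketch)."""
    pos, tot = Counter(), Counter()
    for window, is_definition in zip(windows, labels):
        tokens = window.lower().split()
        for bigram in zip(tokens, tokens[1:]):  # candidate pattern
            tot[bigram] += 1
            if is_definition:
                pos[bigram] += 1
    scored = [(pos[p] / tot[p], p) for p in tot if tot[p] >= min_freq]
    return [p for _, p in sorted(scored, reverse=True)[:n]]
```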

5
Training the SVM
[Diagram: training target terms (e.g., tsunami, nanometer) are paired with 250-character windows containing them, drawn from TREC docs or Web pages.]
6
How do we train for the Web?
  • Train on TREC questions and documents, using the TREC patterns; apply the trained system to Web pages.
  • But Web pages generally differ from TREC news articles.
  • And if we want to add more training questions, we need to create additional marking patterns, which is not easy.
  • Manually mark (thousands of) training windows derived from Web pages returned by the search engine for training questions.
  • Or find a way to mark training windows obtained from Web pages automatically, possibly excluding training windows whose categories we cannot be certain of, and possibly allowing some errors.
  • We use training terms for which there are multiple definitions from different on-line encyclopedias.
  • We mark a training window as a definition when its vocabulary is very similar to that of several corresponding definitions from the encyclopedias.

7
Using on-line encyclopedias
training window: "...discipline comparative genomics functional genomics bioinformatics the emergence of genomics as a discipline in 1920, the term genome was proposed to denote the totality of all genes on all chromosomes in the nucleus of a cell. biology has..."
8
Possible objections and answers
  • Why not just use Google's define:?
  • There are always terms that are not included in on-line encyclopedias, glossaries, etc. (e.g., new technical terms, names of persons, products).
  • Our method can supplement on-line glossaries etc. by finding definitions in ordinary Web pages.
  • Why not train directly on the definitions of on-line encyclopedias, glossaries, etc.?
  • The language of glossaries, encyclopedias, etc. is not representative of all the expressions that may be used to provide definitions on the Web.
  • The definitions of encyclopedias etc. provide only positive examples (definitions); to train the SVM we also need negative examples.

9
How do we measure similarity?
[Diagram: a training window W containing the target term (e.g., tsunami) is compared against C, the set of definitions of the same term collected from on-line encyclopedias; sim(W, C) is computed from each word wi of W.]
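A minimal sketch of a vocabulary-similarity measure in the spirit of the diagram; the transcript preserves only "for each word wi of W", so the aggregation below (the average, over the words of W, of the fraction of encyclopedia definitions containing each word) is an assumption:

```python
def sim(window: str, definitions: list[str]) -> float:
    """Vocabulary similarity between a training window W and the set C of
    encyclopedia definitions of the same term (assumed aggregation)."""
    def_vocabs = [set(d.lower().split()) for d in definitions]
    words = window.lower().split()
    if not words or not def_vocabs:
        return 0.0
    per_word = [sum(w in vocab for vocab in def_vocabs) / len(def_vocabs)
                for w in words]
    return sum(per_word) / len(per_word)
```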
10
Which similarity threshold?
  • We use two thresholds, t+ and t− (see the marking sketch after the diagram note below).
  • If sim(W,C) > t+, mark training window W as positive (definition).
  • If sim(W,C) < t−, mark training window W as negative.
  • t+ and t− were selected by studying recall/precision diagrams of sim(W,C) over 300 randomly selected and manually marked training windows,
  • and by making sure we maintain the original definition-to-non-definition ratio, to avoid introducing bias in the SVM.

[Diagram: distribution of sim(W,C) with t− = 0.32 and t+ = 0.5; windows with sim(W,C) between the two thresholds are not used when training the SVM. At these thresholds: negative precision 0.92, negative recall 0.75; positive precision 0.72, positive recall 0.49.]
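Combining the two thresholds with the sim sketch above, a hypothetical marking helper would label the training windows as follows; windows falling between t− and t+ are discarded, as in the diagram note:

```python
T_POS, T_NEG = 0.5, 0.32  # t+ and t- from the slide

def mark_window(window: str, definitions: list[str]):
    """Label a training window via sim(W, C); uncertain windows
    (between the thresholds) are not used when training the SVM."""
    s = sim(window, definitions)
    if s > T_POS:
        return "definition"
    if s < T_NEG:
        return "non-definition"
    return None  # between t- and t+: excluded from training
```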
11
Systems evaluated
  • DEFQA-T: the system of MA 2004, trained on all questions/documents from TREC 2000-2001, applied to Web pages at run-time.
  • 160 training terms, 3800 training windows.
  • ≤ 10 top-ranked docs/pages, first ≤ 5 windows per doc/page.
  • 200 automatically acquired patterns.
  • We allow only one snippet per question (m = 1).
  • DEFQA-S: same as DEFQA-T, but trained on windows of Web pages marked via sim(W,C).
  • 480 training terms from the index of www.encyclopedia.com.
  • 7200 training windows.
  • We can automatically generate as many training windows as we want; in effect, unsupervised learning.
  • BASE-1: the first window of the top-ranked page.
  • BASE-R: a window randomly selected from the (up to) 10 × 5 candidate windows.

12
Evaluation results (1 snippet per question)
With 81 new target terms from www.encyclopedia.com. [Results table not preserved in the transcript.]
  • Kappa = 0.86 (strong agreement between judges).
  • All differences are statistically significant.
  • One-tailed difference-of-proportions tests (α = 0.001); a worked sketch follows this list.
  • Worse results than those of MA 2004 (85% with cross-validation on TREC 2000-2001), but here we allow only m = 1 snippet per question.
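The one-tailed difference-of-proportions test named above is a standard two-proportion z-test; a worked sketch follows, where the counts in the usage line are illustrative placeholders (the transcript does not preserve the results table):

```python
from math import sqrt
from statistics import NormalDist

def diff_of_proportions_p(hits_a: int, n_a: int,
                          hits_b: int, n_b: int) -> float:
    """One-tailed two-proportion z-test; p-value for H1: p_a > p_b."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 1 - NormalDist().cdf((p_a - p_b) / se)

# hypothetical counts over 81 questions, for illustration only
print(diff_of_proportions_p(45, 81, 24, 81))  # ~0.0004 < 0.001
```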

13
Discussion of the results
  • DEFQA-S answered correctly almost twice as many questions as DEFQA-T,
  • despite the noise (wrong categories) in the training data.
  • More training windows (7200 vs. 3800).
  • We can add as many training windows as we want; the results may improve further with more training windows.
  • Far fewer irrelevant acquired patterns in DEFQA-S than in DEFQA-T,
  • probably because of the larger number of training windows.
  • Results may improve further by using more acquired patterns (we currently use 200 patterns, as in MA 2004).
  • Acquired some Web-specific patterns, where target stands for the term being defined,
  • e.g., "FAQ target", "home page target", "target page", "What is a target", "? A target".

14
Future directions
  • Try alternative similarity measures.
  • Ideas from ROUGE (from automatic summarization), to compare n-grams rather than single words.
  • Compare with the centroid-based method of Cui et al. and/or include it as an attribute of the SVM.
  • More training windows and more automatically acquired patterns.
  • Cluster similar snippets,
  • to avoid ranking them separately.
  • Introduce attributes for the layout of the Web pages (fonts, bullet lists, etc.).
  • Does the snippet look like part of a glossary?
  • Already used in Google's define:?