Title: A Practically Unsupervised Learning Method to Identify Single-Snippet Answers to Definition Questions on the Web
1. A Practically Unsupervised Learning Method to Identify Single-Snippet Answers to Definition Questions on the Web
- Ion Androutsopoulos and Dimitrios Galanis
- Department of Informatics
- Athens University of Economics and Business
- Greece
2. An add-on for Web search engines
- Definition questions: What is a tsunami?, Who was Duke Ellington?, What are pathogens?
- Input: a term to be defined (possibly multi-word) and the most highly ranked Web pages the search engine (AltaVista) returned for that term.
- Assumptions: a classifier separates definition questions from other questions; another module identifies the term to be defined.
- Return at most m 250-character snippets (see the sketch below).
- Success if at least one of the m snippets contains an acceptable short definition.
- Much as in earlier TRECs (for m = 5).
- Single-snippet definitions, unlike recent TRECs.
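A minimal sketch of the add-on's input/output contract, in Python. The function and helper names are hypothetical, and the trained scorer described later in the deck is passed in as a parameter rather than implemented here.

```python
import re
from typing import Callable, List

def occurrences(text: str, term: str) -> List[int]:
    """Start offsets of case-insensitive occurrences of `term` in `text`."""
    return [m.start() for m in re.finditer(re.escape(term), text, re.IGNORECASE)]

def answer_definition_question(term: str, top_pages: List[str],
                               score: Callable[[str], float],
                               m: int = 5) -> List[str]:
    """Return at most m ~250-character snippets centred on `term`,
    ranked by the supplied scorer (the trained SVM in the deck)."""
    candidates = []
    for page in top_pages:                      # pages as ranked by the engine
        for start in occurrences(page, term):
            left = max(0, start - 125)          # ~250-character window
            candidates.append(page[left:start + len(term) + 125])
    return sorted(candidates, key=score, reverse=True)[:m]
```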
3. Examples
- For the term generic drug:
- "time pharmacist questioning pharmacist consulting pharmacy drug compounding drug price what is generic drug what is a generic drug? a generic drug is one which is identified by its official chemical name rather than an advertised brand name"
- "there are new results for return to results glaxo wellcome's perspective on gatt patent debate what is the generic drug industry's response to glaxo wellcome's? pr newswire 12/5/1995 read the full article, get a free trial for"
4. Previous method (MA 2004)
- Trains an SVM to classify 250-character windows of the target term as definitions or non-definitions.
- Returns the m windows the SVM is most confident are definitions.
- Attributes of the SVM (sketched after this list):
- from Joho et al. (e.g., position of window in doc, manually crafted patterns, simplistic centroid),
- ranking of search engine,
- automatically acquired lexical patterns, selected by their precision (best results with 200 patterns).
- Cross-validation on TREC 2000-2001.
- Outperformed Joho et al.'s attributes and Prager et al.'s WordNet-based method.
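A hedged sketch of the kind of per-window feature vector such an SVM could consume. The attribute names are illustrative, not the exact set of MA 2004; `<target>` marks the slot a lexical pattern reserves for the term to be defined.

```python
import re
from typing import List

def window_features(window: str, term: str, window_index: int,
                    doc_rank: int, patterns: List[str]) -> List[float]:
    """One numeric vector per 250-character window of the target term."""
    feats = [
        float(window_index),  # position of the window within its document
        float(doc_rank),      # rank the search engine gave the document
    ]
    for p in patterns:        # one binary attribute per lexical pattern
        regex = re.escape(p).replace(re.escape("<target>"), re.escape(term))
        feats.append(1.0 if re.search(regex, window, re.IGNORECASE) else 0.0)
    return feats
```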
5. Training the SVM
[Figure: training target terms (e.g., tsunami, nanometer) are located in TREC documents or Web pages, and the windows around each occurrence become training instances.]
6. How do we train for the Web?
- Train on TREC questions and documents, using the TREC patterns; apply the trained system to Web pages.
- But Web pages generally differ from TREC news articles.
- And if we want to add more training questions, we need to create additional marking patterns, which is not easy.
- Mark manually (thousands of) training windows deriving from Web pages returned by the search engine for training questions.
- Find a way to mark training windows obtained from Web pages automatically, possibly excluding training windows whose categories we cannot be certain about, and possibly allowing some errors.
- We use training terms for which there are multiple definitions from different on-line encyclopedias.
- We mark a training window as a definition when its vocabulary is very similar to that of several corresponding definitions from the encyclopedias.
7. Using on-line encyclopedias
- Training window: "...discipline comparative genomics functional genomics bioinformatics the emergence of genomics as a discipline in 1920, the term genome was proposed to denote the totality of all genes on all chromosomes in the nucleus of a cell. biology has..."
8. Possible objections and answers
- Why not just use Google's define: feature?
- There are always terms that are not included in on-line encyclopedias, glossaries etc. (e.g., fresh technical terms, names of persons, products).
- Our method can supplement on-line glossaries etc. by finding definitions in ordinary Web pages.
- Why not train directly on the definitions of on-line encyclopedias, glossaries etc.?
- The phrasings of glossaries, encyclopedias etc. are not representative of all the expressions that may be used to provide definitions on the Web.
- The definitions of encyclopedias etc. provide only positive examples (definitions). To train the SVM we also need negative examples.
9. How do we measure similarity?
[Figure: a training window W containing the target term (e.g., tsunami) is compared against C, the set of definitions of the same term drawn from on-line encyclopedias; for each word w_i of W, the measure checks how many definitions in C contain w_i.]
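The figure survives only as fragments, so the following is a hedged reconstruction, assuming (as one plausible reading of the remnants) that sim(W,C) = sum_i d(w_i) / (|W| * |C|), where d(w_i) is the number of definitions in C that contain word w_i of W; the exact measure is defined in the paper.

```python
from typing import List

def sim(window: str, definitions: List[str]) -> float:
    """Vocabulary overlap of training window W with the set C of
    encyclopedia definitions of the same term (a sketch only)."""
    w_words = window.lower().split()
    c_vocabs = [set(d.lower().split()) for d in definitions]
    if not w_words or not c_vocabs:
        return 0.0
    hits = sum(w in vocab for vocab in c_vocabs for w in w_words)
    return hits / (len(w_words) * len(c_vocabs))
```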
10. Which similarity threshold?
- We use two thresholds, t+ and t- (applied in the sketch below).
- If sim(W,C) > t+, mark training window W as positive (definition).
- If sim(W,C) < t-, mark training window W as negative.
- Windows in between are not used when training the SVM.
- t+ and t- selected by studying recall/precision diagrams of sim(W,C) with 300 randomly selected and manually marked training windows.
- And making sure we maintain the original definition to non-definition ratio, to avoid introducing bias in the SVM.
[Figure: recall/precision of sim(W,C) on the 300 manually marked windows; with t- = 0.32 and t+ = 0.5: negative precision 0.92, negative recall 0.75; positive precision 0.72, positive recall 0.49.]
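Applied to the thresholds above, the marking step might look as follows, reusing the sim sketch from the previous slide; t+ = 0.5 and t- = 0.32 are the values read off the recall/precision study.

```python
from typing import List, Optional

def mark_window(window: str, definitions: List[str],
                t_plus: float = 0.5, t_minus: float = 0.32) -> Optional[bool]:
    """True = definition, False = non-definition, None = discard."""
    s = sim(window, definitions)   # sim(W, C) from the previous sketch
    if s > t_plus:
        return True                # confident positive training example
    if s < t_minus:
        return False               # confident negative training example
    return None                    # uncertain band; not used for the SVM
```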
11. Systems evaluated
- DEFQA-T: the system of (MA 2004), trained on all questions/documents from TREC-2000, 2001, applied to Web pages at run-time.
- 160 training terms, 3800 training windows.
- ≤10 top-ranked docs/pages, first ≤5 windows per doc/page.
- 200 automatically acquired patterns.
- We allow only one snippet per question (m = 1).
- DEFQA-S: same as DEFQA-T, but trained on windows of Web pages via sim(W,C).
- 480 training terms from the index of www.encyclopedia.com.
- 7200 training windows.
- We can generate automatically as many training windows as we want; in effect, unsupervised learning.
- BASE-1: first window of top-ranked page (both baselines sketched below).
- BASE-R: randomly selected from 105 windows.
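The two baselines are simple enough to state directly; a sketch, assuming candidate windows are grouped per page in the engine's rank order.

```python
import random
from typing import List

def base_1(windows_per_page: List[List[str]]) -> str:
    """BASE-1: the first window of the top-ranked page."""
    return windows_per_page[0][0]

def base_r(windows_per_page: List[List[str]]) -> str:
    """BASE-R: a window drawn uniformly at random from all candidates."""
    pool = [w for page in windows_per_page for w in page]
    return random.choice(pool)
```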
12. Evaluation results (1 snippet per question)
- With 81 new target terms from www.encyclopedia.com.
- Kappa = 0.86 (strong agreement between judges).
- All differences are statistically significant.
- One-tailed difference-of-proportions tests (alpha = 0.001).
- Worse results than those of MA 2004 (85% with cross-validation on TREC 2000, 2001), but here we allow only m = 1 snippet per question.
13. Discussion of the results
- DEFQA-S answered correctly almost twice as many questions as DEFQA-T.
- Despite the noise (wrong categories) in the training data.
- More training windows (7200 vs. 3800).
- We can add as many training windows as we want; the results may improve further with more training windows.
- Far fewer irrelevant acquired patterns in DEFQA-S than in DEFQA-T.
- Probably because of more training windows.
- Results may improve further by using more acquired patterns (we currently use 200 patterns, as in MA 2004).
- Acquired some Web-specific patterns, e.g., "FAQ <target>", "home page <target>", "<target> page", "What is a <target>", "? A <target>".
14. Future directions
- Try alternative similarity measures.
- Ideas from ROUGE (from automatic summarization), to compare n-grams rather than single words.
- Compare with the centroid-based method of Cui et al. and/or include it as an attribute of the SVM.
- More training windows and more automatically acquired patterns.
- Cluster similar snippets.
- To avoid ranking them separately.
- Introduce attributes for the layout of the Web pages (fonts, bullet lists, etc.).
- Does the snippet look like a part of a glossary?
- Already used in Google's define:?