Title: What
1What's in store for question-answering? Prognostications based on corpus analysis of several hundred million questions
- John B. Lowe
- Vice President for Language Engineering and Chief Linguist
- Ask Jeeves, Inc., Emeryville, CA
- October 7, 2000
- Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora
- Hong Kong
2Overview
- Take-home messages when considering the Q-A task
- Make sure you understand the question
- Know what constitutes an answer
- Robustness, Robustness, Robustness
- Some anecdotes and a few statistics
- Query types (keywords, questions, stories, etc.)
- From both the consumer side of AJ (i.e. ask.com) as well as the corporate side
- Prognostications
- The best systems will be hybrids
- Knowledge cliff is as tall as ever, if not taller, and it will be some time before it is climbed. Set expectations accordingly!
3Another View of an Overview
- This presentation contains
- 12 Actual or nearly actual UQs, i.e. user queries (indeed all queries cited are real, unless cited from the literature or specifically marked)
- 7 Rhetorical questions
- 5 Summary statistics covering a subcorpus of approximately 1B UQs
4Definitions: What is an answer?
- Short, coherent, responsive snippet of text
- Result of a Computation or Deduction
- The Trace of the process of arriving at a result
- Longer snippet of text (a Passage)
- Reference to a document or part of a document
- Summary or extract from a document
- Document
- Set of documents
- Audio, video, etc.
- Some combination of the above
Increasing length, complexity
5Definitions: What is a query?
- One or more Keywords
- Keywords with Boolean Operators or additional user supplied structure
- Numeric or other Parametric Values set via UI
- Phrases (keywords with linguistic coherence)
- (Grammatical) Sentences with interrogative or imperative syntax
- Short Discourses, usually concluded with a question
- Audio, video, etc.
- Some combination of the above
Increasing length, complexity
6Question-Answering vs. IR
- Classically, question-answering systems provided answers in response to questions
- In contrast, IR systems provided documents in response to queries, normally composed of keywords
- Corollaries
- Providing documents in response to questions is not question-answering
- Providing answers in response to queries is not IR
- However, the world is not black and white. More like "black and grey" (Graham Greene)
7Question-Answering vs. IR
[Diagram labels: Classical Question-Answering; Classical Information Retrieval; TREC-8-like; Hybrid]
8Query types arranged on a Difficulty Scale
- Keywords and Keywords Plus
- Short, factual (TREC-8-like) Questions
- Hard Lookup Questions
- Questions that look hard but aren't
- Questions that look hard and are
- Story Problems: Two Flavors
- These will usually be all jumbled together!
Difficulty
9An Anecdotal Analysis of the Question-Answering
Task, Based on User Behavior and Expectations,
as Reflected by What They Ask
- NB:
- Most of these UQs are from ask.com, the open-domain consumer-oriented web site.
- Some UQs from corporate implementations have been modified to protect the anonymity of customers
10Users may not be trainable
- Users often expect the system to derive or otherwise obtain appropriate context
- At minimum, POS, WSD and other basic linguistic and semantic distinctions are expected.
- Users may attempt to provide such context if they feel it is important or unavailable to the system
- In which case, watch out!
- Users may evaluate the system to determine how best to provide input (and context)
- Often this is done by experimental input
- Muddies the user log and challenges adaptive approaches
11Intentions of users are complex
NB: this data is for ask.com!
12Short, factual questions
- Where is Greenwich, CT? (Lehnert 1982)
- 42N, 80W
- About 90 miles north of New York City
- Where is the Taj Mahal? (TREC-8)
- Atlantic City, New Jersey, USA
- Agra, Uttar Pradesh, India
- NB: answer reflects cultural bias of corpus from which answer was obtained
- What is the meaning of life?
- For Ask Jeeves, it is (found in) a URL
13Answers to Hard Lookup Questions
- Can my gynacologist [sic] tell my parents if I'm pregnant?
- Answers can be very, very short!
- In this case, however, though the length of the answer is a single bit, an authoritative document is probably the best response
14Hard Lookup questions (cont'd)
- Can my gynacologist [sic] tell my parents if I'm pregnant?
- Sometimes spelling errors reflect language competence (and therefore indirectly age)
- Sex of asker is (nominally) clear. This bit of personalization is of course based on real world knowledge
- Use of "if" rather than "that" prevents presupposing the asker is pregnant
- Utility of the answer is very different depending on whether the presupposition is true or not!
15Another Hard Lookup Question
- What did Tom Hanks_i say to Private Ryan as he_i was dying?
- Answer is a 34 second snippet which occurs at about 2:36:00 out of 2:48:00 total duration
- Soundtrack is complex at this moment: hard to pick out even for native speakers, but the utterance seems to be
- "earn this... earn it"
16How do we get the answer?
- Assumptions
- We have the film and permission to use it.
- We have time-aligned markup of text and video
- We have the tools to handle such multimedia access
- All of these technical issues are still a challenge
- But the really tough part is still the relationship between the language and the real world (i.e. primarily linguistics)
- Does the markup indicate the states of people -- alive, dead, or in between (i.e. dying)?
- More importantly, interpreting the question seems to require Mental Spaces (Fauconnier 1985, 1988, &c.).
17Mental Spaces Required?
In reality, Tom Hanks never said anything to
Private Ryan.
Conclusion: while IR may bring one within
striking distance of the answer, high-level NLU
is potentially required to determine if you really
got the right one.
18Having said all that
- Purely serendipitously, there is an IR solution to this PARTICULAR question (using the encyclopedic aspect of the web)
- Some search engines retrieve discussions about this apparently important moment (which in some ways is the climax of the movie)
19Some hard questions are easy
- Sometimes, Big Differences are not important: the following two sentences have quite different syntax, but share most of their answers in common.
- But note: both cannot be answered "Oh, not far!"
English:
What is the distance from Tokyo to Yokohama?
How far is it from Tokyo to Yokohama?
Japanese:
toukyou to yokohama no aida no kyori ha, ikura desu ka?
toukyou kara yokohama made ha, dono gurai hanarete imasu ka?
20Some easy questions are hard
- Small Differences may be important
- Books by kids
- Books for kids (these differ only by stopwords)
- Books for under 20
- Books about kids
- (Riloff et al. (1994), Pustejovsky, Lexeme, NPR 6/2000)
- In this case the stop words are critical
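The point about stop words can be illustrated with a minimal sketch. The stopword list and the `content_terms` helper are hypothetical, chosen only for this illustration; they are not taken from any Ask Jeeves system. Once stopwords are dropped, three queries with different meanings collapse to the same key.

```python
# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "by", "for", "about", "of", "to", "under"}

def content_terms(query):
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [tok for tok in query.lower().split() if tok not in STOPWORDS]

queries = ["Books by kids", "Books for kids", "Books about kids"]

# Map each query to the key a stopword-stripping indexer would see.
keys = {q: tuple(content_terms(q)) for q in queries}

for q, key in keys.items():
    print(f"{q!r} -> {key}")
```

All three queries map to the same pair of terms, so a system that discards stopwords cannot tell author, audience, and topic apart.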
21Story Problem 1 (conventional)
- After Bobrow 1967 (and Dreyfus 1972, 1992)
- NB: requires you to "Show your work!" (i.e.
display trace)
Elizabeth, Brian, Dean and Leslie want to cross
a bridge. They all begin on the same side and
have only 17 minutes to get everyone across to
the other side. It is night and there is only one
flashlight. A max of two people can cross at one
time. Any party who crosses, either 1 or 2
people, must have the flashlight with them. The
flashlight must be walked back and forth; it
cannot be thrown. Each student walks at a
different speed: Elizabeth 1 minute, Brian 2
minutes, Dean 5 minutes, and Leslie 10 minutes.
A pair must walk together at the rate of the
slower student's pace. How can they get everyone
across in 17 minutes?
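Once the story has been understood, the remaining search is small. The following is a minimal sketch, assuming the problem has already been reduced to a table of crossing times. The function name `solve_bridge` and the state encoding are choices made for this illustration. It runs a uniform-cost search over (who is on the near side, where the flashlight is) and returns the trace, since the task is to show your work.

```python
import heapq
from itertools import combinations

def solve_bridge(times):
    """Return (total_minutes, trace) for the cheapest full crossing."""
    everyone = frozenset(times)
    start = (everyone, True)   # (people on near side, flashlight on near side?)
    counter = 0                # tie-breaker so the heap never compares states
    heap = [(0, counter, start, [])]
    best = {start: 0}
    while heap:
        cost, _, (near, light_near), trace = heapq.heappop(heap)
        if not near:
            return cost, trace           # everyone is across
        if cost > best.get((near, light_near), float("inf")):
            continue                     # stale heap entry
        side = near if light_near else everyone - near
        direction = "->" if light_near else "<-"
        for size in (1, 2):              # one or two people carry the flashlight
            for party in combinations(sorted(side), size):
                step = max(times[p] for p in party)   # pace of the slower walker
                moved = set(party)
                new_near = near - moved if light_near else near | moved
                state = (frozenset(new_near), not light_near)
                new_cost = cost + step
                if new_cost < best.get(state, float("inf")):
                    best[state] = new_cost
                    counter += 1
                    heapq.heappush(
                        heap,
                        (new_cost, counter, state, trace + [(direction, party, step)]),
                    )
    return None

total, trace = solve_bridge({"Elizabeth": 1, "Brian": 2, "Dean": 5, "Leslie": 10})
print(total)
for direction, party, minutes in trace:
    print(direction, " and ".join(party), f"({minutes} min)")
```

The search is the easy part. Getting from the paragraph of English to the `times` table and the crossing rules is where the language understanding is needed.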
22Story Problem 2, (a user's lament)
- Typically, a customer support problem
- Often, these are not really questions
PLEASE HELP ME! I don't know who to ask. I
want to mail merge a specific category from
address book in email program and all I can
figure out how to do is merge the entire
mailing list. If you can't help me, please tell
me who can. Thank you
23Story Problem 3, (I've almost got it)
- Sentence punctuation is poor
- Identification and tokenization of NEs is a
challenge
I cant install Age of Empires now that I 've
upgraded to win98 from Win 95 computer says I
have 1GB of hard drive space but the installation
failed after taking 30 minutes with the words not
enough hard drive space. Should I update the
drive to FAT 32 and try again?
24Story Problem 4, (share my misery)
- "I have SuperOS 1776 and a Hogwarts Color 999
printer. I had to reformat my computer and now I
haven't been able to find a driver to reinstall
the printer.. I've only found a driver for
SuperOS 1492 and 1789. I tried it anyway, and big
surprise, it didn't work."
- Deep NLP is going to have some trouble here (even people do!)
- Would IR work better?
25Story Problem 5, (a poem)
- i want to customize my mouse and keyboard
- by having mouse in 3-dimensional
- and having mouse trails
- i also want to slow down the cursor blink rate
- am also having trouble ith the left mouse butoon
- as i am left handed
- which steps do i take to change these things
- The e e cummings approach to keyboard entry
- The point: tremendous stylistic variation exists in typography, orthography, conceptualization, and so on.
26Statistical Properties of AJ UQs
Average length of user query (on ask.com, punctuation excluded)   4.8
UQs which contain an unknown token                                30%
Unknown tokens which are errors                                   60%
Queries which begin with wh-words                                 37%
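Statistics of this kind can be computed from a query log in a few lines. The sketch below is an assumption-laden illustration, not the procedure used for the figures above. The tokenizer, the `WH_WORDS` list (which here includes "how") and the `lexicon` of known tokens are all hypothetical.

```python
import re

# Assumed wh-word list; whether "how" counts is a design decision.
WH_WORDS = {"what", "where", "when", "who", "whom", "whose", "which", "why", "how"}

def tokenize(query):
    """Lowercased word tokens; punctuation is excluded."""
    return re.findall(r"[a-z0-9']+", query.lower())

def query_stats(queries, lexicon):
    """Average length, % of queries with an unknown token, % starting with a wh-word."""
    token_lists = [t for t in (tokenize(q) for q in queries) if t]
    n = len(token_lists)
    if n == 0:
        raise ValueError("no non-empty queries")
    return {
        "avg_length": sum(len(t) for t in token_lists) / n,
        "pct_with_unknown": 100 * sum(
            any(tok not in lexicon for tok in t) for t in token_lists
        ) / n,
        "pct_wh_initial": 100 * sum(t[0] in WH_WORDS for t in token_lists) / n,
    }

# Toy log and toy lexicon, for illustration only.
sample_queries = [
    "Where is the Taj Mahal?",
    "cars",
    "how do i make a web page?",
    "britny spears lyrics",
]
sample_lexicon = {
    "where", "is", "the", "taj", "mahal", "cars", "how", "do", "i",
    "make", "a", "web", "page", "spears", "lyrics",
}
stats = query_stats(sample_queries, sample_lexicon)
print(stats)
```

Deciding which unknown tokens are errors, rather than legitimate new names, needs a further judgment that this sketch does not attempt.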
27Building REs from NEs
- for Britney Spears
- Br.n.y Sp.r.s
- How to build these from scratch?
- But see Brill et al. in this workshop for a solution!
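One hand-built reading of the pattern above can be sketched as follows. It assumes that each "." stands for an arbitrary run of word characters between fixed anchor fragments; that reading, the `skeleton_regex` helper and the choice of anchors are assumptions made for this illustration. This is not the method of Brill et al.

```python
import re

def skeleton_regex(words):
    """Build a case-insensitive regex from per-word lists of anchor fragments.

    Fragments within a word may be separated by any run of word characters;
    words are separated by whitespace.
    """
    per_word = [r"\w*".join(re.escape(part) for part in parts) for parts in words]
    return re.compile(r"\s+".join(per_word), re.IGNORECASE)

# Anchors chosen by hand for this one name.
britney = skeleton_regex([["Br", "n", "y"], ["Sp", "r", "s"]])

for candidate in ["Britney Spears", "brittany spears", "Britny Speers", "Bruce Springsteen"]:
    print(candidate, "->", bool(britney.fullmatch(candidate)))
```

Choosing the anchors by hand for every name does not scale, which is the open question the slide raises.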
28Distribution of query lengths
29Zipf's Law applies to user queries
- Rank-frequency distribution of UQs with 3600 > f > 1100 (for a day or so)
- 3524 where can i browse lyrics?
- 2713 where can i find online airfare specials?
- 2532 is jeeves gay?
- 2216 how can i find someone?
- 2190 where can i find a reverse phone directory?
- 2120 where can i find information on captech?
- 1980 (filtered)
- 1934 (filtered)
- 1852 where can i find erotica from white shadows?
- 1787 where can i get driving directions between cities?
- 1585 am i in love?
- 1567 how do i make a web page?
- 1544 cars
- 1520 where can i find a reverse email directory?
- 1485 how do i use the internet to find a job?
- 1420 (filtered)
- 1336 where can i find the lyrics to songs by eminem?
- 1323 where can i listen to music online?
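A rank-frequency table like the one above, and a rough check of Zipf's Law, can be sketched as follows. The helper names are hypothetical. The slope is fitted by ordinary least squares on log rank against log frequency, with a value near -1 indicating Zipf-like behavior. The list above is a slice of the distribution whose true ranks are not given, so the sketch is exercised on synthetic data.

```python
import math
from collections import Counter

def rank_frequency(queries):
    """Return [(rank, frequency, query)], most frequent first, after light normalization."""
    counts = Counter(q.strip().lower() for q in queries)
    return [
        (rank, freq, query)
        for rank, (query, freq) in enumerate(counts.most_common(), start=1)
    ]

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank); needs at least 2 values."""
    freqs = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic, perfectly Zipfian frequencies: f(r) = 1000 / r.
ideal = [1000 / rank for rank in range(1, 51)]
print(round(zipf_slope(ideal), 3))

table = rank_frequency(["cars", "Cars", "am i in love?"])
print(table)
```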
30Conclusions
31Sobering Insights (or Nothing New)?
- The bar has been considerably raised!
- "Communication is not accomplished by the exchange of symbolic expressions. Communication is, rather, the successful interpretation by an addressee of a speaker's intent in performing a linguistic act." (Green 1996)
- Hybrid approaches will be de rigueur for many practical applications: need to work on combining outputs from
- Search engines
- QA systems
- Other inferencing engines (decision tree, CBR, etc.)
- We must make friends with our users (and provide cognitively appealing UIs)
- If you ask the same question, you get the same answer! (a distinctly unhuman behavior, e.g. "Where?")
- Knowing when you don't know: understanding failure modes of the system and communicating this to the user
32Thank You!