Title: Question Answering Techniques and Systems
1Question Answering Techniques and Systems
- Mihai Surdeanu (TALP)
- Marius Pasca (Google - Research)
TALP Research Center, Dep. Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya
surdeanu_at_lsi.upc.es
The work by Marius Pasca (currently mars_at_google.com) was performed as part of his PhD work at Southern Methodist University in Dallas, Texas.
2Overview
- What is Question Answering?
- A traditional system
- Other relevant approaches
- Distributed Question Answering
3Problem of Question Answering
When was the San Francisco fire?
... were driven over it. After the ceremonial tie was removed - it burned in the San Francisco fire of 1906 - historians believe an unknown Chinese worker probably drove the last steel spike into a wooden tie. If so, it was only ...
What is the nationality of Pope John Paul II?
... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ...
Where is the Taj Mahal?
... list of more than 360 cities around the world includes the Great Reef in Australia, the Taj Mahal in India, Chartres Cathedral in France, and Serengeti National Park in Tanzania. The four sites Japan has listed include ...
4Problem of Question Answering
Natural language question, not keyword queries:
What is the nationality of Pope John Paul II?
Short text fragment, not URL list:
... stabilize the country with its help, the Catholic hierarchy stoutly held out for pluralism, in large part at the urging of Polish-born Pope John Paul II. When the Pope emphatically defended the Solidarity trade union during a 1987 tour of the ...
5Compare with
Document collection
- Keyword searches: Searching for Etna; Searching for Naxos; Searching for Taormina
- Questions: Where is Naxos? What continent is Taormina in? What is the highest volcano in Europe?
6Beyond Document Retrieval
- Document Retrieval
  - Users submit queries corresponding to their information needs.
  - System returns (voluminous) list of full-length documents.
  - It is the responsibility of the users to find information of interest within the returned documents.
- Open-Domain Question Answering (QA)
  - Users ask questions in natural language.
    - What is the highest volcano in Europe?
  - System returns list of short answers.
    - ... Under Mount Etna, the highest volcano in Europe, perches the fabulous town ...
  - Often more useful for specific information needs.
7Evaluating QA Systems
- The National Institute of Standards and Technology (NIST) organizes the yearly Text Retrieval Conference (TREC), which has had a QA track for the past 5 years, from TREC-8 in 1999 to TREC-12 in 2003.
- The document set
  - Newswire textual documents from LA Times, San Jose Mercury News, Wall Street Journal, NY Times, etc.: over 1M documents now.
  - Well-formed lexically, syntactically and semantically (the documents were reviewed by professional editors).
- The questions
  - Hundreds of new questions every year; the total is close to 2000 for all TRECs.
- Task
  - Initially: extract at most 5 answers, long (250 bytes) and short (50 bytes).
  - Now: extract only one exact answer.
  - Several other sub-tasks added later: definition, list, context.
- Metrics
  - Mean Reciprocal Rank (MRR): each question is assigned the reciprocal rank of the first correct answer. If the correct answer is at position k, the score is 1/k.
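The MRR metric described above can be sketched in a few lines. This is a minimal illustration, not the official TREC scorer: the function name is hypothetical, and it assumes that a question with no correct answer among the returned ones contributes a score of 0.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_correct_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/k over all questions.

    Each entry is the 1-based rank of the first correct answer for one
    question, or None when no returned answer was correct (assumed score 0).
    """
    if not first_correct_ranks:
        raise ValueError("need at least one question")
    total = 0.0
    for rank in first_correct_ranks:
        if rank is None:
            continue  # no correct answer: contributes 0
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(first_correct_ranks)


# Four questions: correct at rank 1, at rank 3, not found, at rank 2.
print(mean_reciprocal_rank([1, 3, None, 2]))
```

For the four questions in the usage line the result is (1 + 1/3 + 0 + 1/2) / 4, roughly 0.458.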
8Overview
- What is Question Answering?
- A traditional system
  - SMU ranked first at TREC-8 and TREC-9
  - The foundation of LCC's PowerAnswer system (http://www.languagecomputer.com)
- Other relevant approaches
- Distributed Question Answering
9QA Block Architecture
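[Figure: block architecture. The question Q enters Question Processing, which produces the question semantics and the keywords; Passage Retrieval, built on top of Document Retrieval, uses the keywords to return passages; Answer Extraction produces the answer A. Supporting resources: WordNet, parser, NER.]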
10Question Processing Flow
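[Figure: question processing flow. The question Q goes through question parsing, followed by construction of the question representation (output: question semantic representation), answer type detection (output: AT category) and keyword selection (output: keywords).]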
11Lexical Terms Examples
- Questions approximated by sets of unrelated words (lexical terms)
- Similar to bag-of-word IR models
Question (from TREC QA track) | Lexical terms
Q002: What was the monetary value of the Nobel Peace Prize in 1989? | monetary, value, Nobel, Peace, Prize
Q003: What does the Peugeot company manufacture? | Peugeot, company, manufacture
Q004: How much did Mercury spend on advertising in 1993? | Mercury, spend, advertising, 1993
Q005: What is the name of the managing director of Apricot Computer? | name, managing, director, Apricot, Computer
12Question Stems and Answer Type Examples
- Identify the semantic category of expected answers
Question | Question stem | Answer type
Q555: What was the name of Titanic's captain? | What | Person
Q654: What U.S. Government agency registers trademarks? | What | Organization
Q162: What is the capital of Kosovo? | What | City
Q661: How much does one ton of cement cost? | How much | Quantity
- Other question stems: Who, Which, Name, How hot...
- Other answer types: Country, Number, Product...
13Building the Question Representation
Built from the question parse tree, by bottom-up traversal with a set of propagation rules.
Q006: Why did David Koresh ask the FBI for a word processor?
Parse tree: (SBARQ (WHADVP (WRB Why)) (SQ (VBD did) (NP (NNP David) (NNP Koresh)) (VP (VB ask) (NP (DT the) (NNP FBI)) (PP (IN for) (NP (DT a) (NN word) (NN processor))))))
published in COLING 2000
- assign labels to non-skip leaf nodes
- propagate label of head child node to parent node
- link head child node to other children nodes
14Building the Question Representation
Built from the question parse tree, by bottom-up traversal with a set of propagation rules.
Q006: Why did David Koresh ask the FBI for a word processor?
[Figure: the parse tree of Q006 (same tree as on the previous slide) and the resulting question representation, a graph over the nodes REASON, ask, Koresh, David, FBI, processor and word.]
15Detecting the Expected Answer Type
- In some cases, the question stem is sufficient to indicate the answer type (AT)
  - Why → REASON
  - When → DATE
- In many cases, the question stem is ambiguous
  - Examples
    - What was the name of Titanic's captain?
    - What U.S. Government agency registers trademarks?
    - What is the capital of Kosovo?
  - Solution: select additional question concepts (AT words) that help disambiguate the expected answer type
  - Examples
    - captain
    - agency
    - capital
16AT Detection Algorithm
- Select the answer type word from the question representation.
  - Select the word(s) connected to the question. Some content-free words are skipped (e.g. name).
  - From the previous set, select the word with the highest connectivity in the question representation.
- Map the AT word into a previously built AT hierarchy.
  - The AT hierarchy is based on WordNet, with some concepts associated with semantic categories, e.g. writer → PERSON.
- Select the AT(s) from the first hypernym(s) associated with a semantic category.
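The last two steps (mapping the AT word into the hierarchy and climbing to the first hypernym that carries a semantic category) can be sketched as follows. This is a toy illustration under stated assumptions: the `HYPERNYM` links and the `CATEGORY` table are small hand-written stand-ins for the WordNet-based hierarchy, and every name in the block is hypothetical rather than taken from the SMU system.

```python
from typing import Dict, Optional

# Hypothetical hypernym links (concept -> more general concept).
HYPERNYM: Dict[str, str] = {
    "captain": "officer",
    "officer": "person",
    "writer": "communicator",
    "communicator": "person",
    "agency": "organization",
    "capital": "city",
}

# Concepts of the toy hierarchy that carry a semantic category.
CATEGORY: Dict[str, str] = {
    "person": "PERSON",
    "organization": "ORGANIZATION",
    "city": "CITY",
}

# Question stems that are unambiguous on their own.
STEM_TO_AT: Dict[str, str] = {"why": "REASON", "when": "DATE"}


def answer_type(stem: str, at_word: Optional[str] = None) -> Optional[str]:
    """Return the answer type for a question stem and an optional AT word."""
    stem = stem.lower()
    if stem in STEM_TO_AT:
        return STEM_TO_AT[stem]
    concept = at_word
    seen = set()
    # Climb the hypernym chain until a concept with a category is found.
    while concept is not None and concept not in seen:
        if concept in CATEGORY:
            return CATEGORY[concept]
        seen.add(concept)
        concept = HYPERNYM.get(concept)
    return None  # AT word not covered by the hierarchy


print(answer_type("What", "captain"))
```

An AT word that is missing from the hierarchy yields `None`, which mirrors the coverage problem evaluated on the following slide.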
17Answer Type Hierarchy
[Figure: answer type hierarchy, showing the PERSON category.]
18Evaluation of Answer Type Hierarchy
- Controlled variation of the number of WordNet synsets included in answer type hierarchy.
- Test on 800 TREC questions.
Hierarchy coverage | Precision score (50-byte answers)
0 | 0.296
3 | 0.404
10 | 0.437
25 | 0.451
50 | 0.461
- The derivation of the answer type is the main source of unrecoverable errors in the QA system
19Keyword Selection
- AT indicates what the question is looking for, but provides insufficient context to locate the answer in a very large document collection
- Lexical terms (keywords) from the question, possibly expanded with lexical/semantic variations, provide the required context
20Keyword Selection Algorithm
1. Select all non-stop words in quotations
2. Select all NNP words in recognized named entities
3. Select all complex nominals with their adjectival modifiers
4. Select all other complex nominals
5. Select all nouns with adjectival modifiers
6. Select all other nouns
7. Select all verbs
8. Select the AT word (which was skipped in all previous steps)
21Keyword Selection Examples
- What researcher discovered the vaccine against Hepatitis-B?
  - Hepatitis-B, vaccine, discover, researcher
- What is the name of the French oceanographer who owned Calypso?
  - Calypso, French, own, oceanographer
- What U.S. government agency registers trademarks?
  - U.S., government, trademarks, register, agency
- What is the capital of Kosovo?
  - Kosovo, capital
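The ordered heuristics of the keyword selection algorithm can be sketched as a list of predicates applied in priority order. This is a simplified illustration: it assumes the question has already been tagged, and the `Token` fields (quotation, named-entity, complex-nominal and modifier flags) are hypothetical stand-ins for what a real parser and named-entity recognizer would provide.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    lemma: str
    pos: str                      # Penn Treebank style tag
    quoted: bool = False
    in_named_entity: bool = False
    in_complex_nominal: bool = False
    has_adj_modifier: bool = False
    is_stop: bool = False
    is_at_word: bool = False


# One predicate per heuristic, in the order given on the slide.
HEURISTICS: List[Callable[[Token], bool]] = [
    lambda t: t.quoted and not t.is_stop,
    lambda t: t.pos == "NNP" and t.in_named_entity,
    lambda t: t.in_complex_nominal and t.has_adj_modifier,
    lambda t: t.in_complex_nominal,
    lambda t: t.pos.startswith("NN") and t.has_adj_modifier,
    lambda t: t.pos.startswith("NN"),
    lambda t: t.pos.startswith("VB"),
    lambda t: t.is_at_word,
]


def select_keywords(tokens: List[Token], max_heuristic: int = 8) -> List[str]:
    """Return lemmas in priority order, using heuristics 1..max_heuristic."""
    selected: List[str] = []
    for rank, rule in enumerate(HEURISTICS[:max_heuristic], start=1):
        is_at_rule = rank == len(HEURISTICS)
        for t in tokens:
            if t.is_stop or t.lemma in selected:
                continue
            # The AT word is skipped by every heuristic except the last one.
            if t.is_at_word != is_at_rule:
                continue
            if rule(t):
                selected.append(t.lemma)
    return selected


# "What researcher discovered the vaccine against Hepatitis-B?"
EXAMPLE = [
    Token("what", "WP", is_stop=True),
    Token("researcher", "NN", is_at_word=True),
    Token("discover", "VBD"),
    Token("the", "DT", is_stop=True),
    Token("vaccine", "NN"),
    Token("against", "IN", is_stop=True),
    Token("Hepatitis-B", "NNP", in_named_entity=True),
]
print(select_keywords(EXAMPLE))
```

With all eight heuristics the example reproduces the keyword order listed above for the Hepatitis-B question; passing `max_heuristic=6` keeps only the keywords from the first six heuristics.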
22Passage Retrieval
23Passage Retrieval Architecture
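[Figure: passage retrieval architecture. Document Retrieval returns documents; Passage Extraction uses the keywords to extract passages; a Passage Quality check either sends the query to Keyword Adjustment and back to extraction (No) or forwards the passages (Yes) to Passage Scoring and Passage Ordering, which output the ranked passages.]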
24Passage Extraction Loop
- Passage Extraction Component
  - Extracts passages that contain all selected keywords
  - Passage size dynamic
  - Start position dynamic
- Passage quality and keyword adjustment
  - In the first iteration use the first 6 keyword selection heuristics
  - If the number of passages is lower than a threshold → query is too strict → drop a keyword
  - If the number of passages is higher than a threshold → query is too relaxed → add a keyword
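The keyword adjustment loop above can be sketched as follows. It is a minimal illustration under assumptions: `search` is a hypothetical callback returning the passages that contain all the given keywords, keywords are kept in priority order so that "drop" removes the lowest-priority active keyword and "add" restores the next one, and the loop stops as soon as it would revisit a keyword count it has already tried.

```python
from typing import Callable, List, Sequence, Tuple


def retrieve_with_adjustment(
    ranked_keywords: Sequence[str],
    initial_count: int,
    search: Callable[[List[str]], list],
    min_passages: int,
    max_passages: int,
) -> Tuple[List[str], list]:
    """Adjust the number of active keywords until the passage count is acceptable."""
    n = max(1, min(initial_count, len(ranked_keywords)))
    tried = set()
    while True:
        tried.add(n)
        active = list(ranked_keywords[:n])
        passages = search(active)
        if len(passages) < min_passages:
            candidate = n - 1          # query too strict: drop a keyword
        elif len(passages) > max_passages:
            candidate = n + 1          # query too relaxed: add a keyword
        else:
            return active, passages
        out_of_range = candidate < 1 or candidate > len(ranked_keywords)
        if out_of_range or candidate in tried:
            return active, passages    # nothing left to adjust
        n = candidate
```

The `tried` set is a design choice of this sketch, not something stated on the slide: it guarantees termination when dropping a keyword makes the query too relaxed and adding it back makes it too strict again.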
25Passage Scoring (1/2)
- Passages are scored based on keyword windows
- For example, if a question has a set of keywords k1, k2, k3, k4, and in a passage k1 and k2 are matched twice, k3 is matched once, and k4 is not matched, the following windows are built:
[Figure: Window 1 to Window 4, each drawn over the same matched sequence "k1 k2 ... k3 k2 k1" and each selecting a different pair of k1 and k2 matches together with the single k3 match.]
26Passage Scoring (2/2)
- Passage ordering is performed using a radix sort that involves three scores: largest SameWordSequenceScore, largest DistanceScore, smallest MissingKeywordScore.
- SameWordSequenceScore
  - Computes the number of words from the question that are recognized in the same sequence in the window
- DistanceScore
  - The number of words that separate the most distant keywords in the window
- MissingKeywordScore
  - The number of unmatched keywords in the window
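The window construction and the three scores can be sketched as below. Several points are assumptions of this illustration: one window is built per combination of match positions, SameWordSequenceScore is interpreted as the longest run of keywords that appear in question order, and Python's tuple sort stands in for the radix sort (it yields the same lexicographic order over the three keys). The sort directions follow the slide as written.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Window = Dict[str, int]  # keyword -> position of the chosen match


def build_windows(keywords: Sequence[str], tokens: Sequence[str]) -> List[Window]:
    """One window per combination of match positions of the matched keywords."""
    positions = {k: [i for i, tok in enumerate(tokens) if tok == k] for k in keywords}
    matched = [k for k in keywords if positions[k]]
    if not matched:
        return []
    return [dict(zip(matched, combo)) for combo in product(*(positions[k] for k in matched))]


def _longest_increasing(values: Sequence[int]) -> int:
    best: List[int] = []
    for i, v in enumerate(values):
        best.append(1 + max((best[j] for j in range(i) if values[j] < v), default=0))
    return max(best, default=0)


def score_window(keywords: Sequence[str], window: Window) -> Tuple[int, int, int]:
    """Return (SameWordSequenceScore, DistanceScore, MissingKeywordScore)."""
    in_question_order = [window[k] for k in keywords if k in window]
    same_sequence = _longest_increasing(in_question_order)
    span = sorted(window.values())
    distance = span[-1] - span[0] - 1 if len(span) > 1 else 0
    missing = len(keywords) - len(window)
    return same_sequence, distance, missing


def rank_windows(keywords: Sequence[str], windows: List[Window]) -> List[Window]:
    def key(w: Window) -> Tuple[int, int, int]:
        same_sequence, distance, missing = score_window(keywords, w)
        return (-same_sequence, -distance, missing)
    return sorted(windows, key=key)


KEYWORDS = ["k1", "k2", "k3", "k4"]
TOKENS = ["k1", "k2", "x", "k3", "k2", "k1"]
windows = build_windows(KEYWORDS, TOKENS)
print(len(windows), rank_windows(KEYWORDS, windows)[0])
```

On the slide's example (k1 and k2 matched twice, k3 once, k4 unmatched) the sketch builds four windows, matching the four windows of the previous slide.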
27Answer Extraction
28Ranking Candidate Answers
Q066: Name the first private citizen to fly in space.
- Answer type: Person
- Text passage: Among them was Christa McAuliffe, the first private citizen to fly in space. Karen Allen, best known for her starring role in Raiders of the Lost Ark, plays McAuliffe. Brian Kerwin is featured as shuttle pilot Mike Smith...
- Best candidate answer: Christa McAuliffe
29Features for Answer Ranking
- relNMW: number of question terms matched in the answer passage
- relSP: number of question terms matched in the same phrase as the candidate answer
- relSS: number of question terms matched in the same sentence as the candidate answer
- relFP: flag set to 1 if the candidate answer is followed by a punctuation sign
- relOCTW: number of question terms matched, separated from the candidate answer by at most three words and one comma
- relSWS: number of terms occurring in the same order in the answer passage as in the question
- relDTW: average distance from candidate answer to question term matches
Robust heuristics that work on unrestricted text!
30Answer Ranking based on Machine Learning
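- Relative relevance score computed for each pair of candidates (answer windows):
$$\mathrm{rel}_{PAIR} = w_{SWS}\,\Delta\mathrm{rel}_{SWS} + w_{FP}\,\Delta\mathrm{rel}_{FP} + w_{OCTW}\,\Delta\mathrm{rel}_{OCTW} + w_{SP}\,\Delta\mathrm{rel}_{SP} + w_{SS}\,\Delta\mathrm{rel}_{SS} + w_{NMW}\,\Delta\mathrm{rel}_{NMW} + w_{DTW}\,\Delta\mathrm{rel}_{DTW} + \mathit{threshold}$$
  where each $\Delta\mathrm{rel}$ is the difference between the two candidates' values for that feature.
- If relPAIR is positive, then the first candidate from the pair is more relevant
- Perceptron model used to learn the weights
- published in SIGIR 2001
- Scores in the 50 MRR for short answers, in the 60 MRR for long answers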
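The pairwise relevance score and a perceptron that learns its weights can be sketched as follows. The slide only states that a perceptron model was used; the update rule, learning rate and epoch count below are the textbook perceptron and are assumptions of this sketch, as are all function names.

```python
from typing import Dict, List, Tuple

FEATURES = ["SWS", "FP", "OCTW", "SP", "SS", "NMW", "DTW"]
Candidate = Dict[str, float]


def features(**values: float) -> Candidate:
    """Build a candidate feature vector; unspecified features default to 0."""
    return {f: float(values.get(f, 0.0)) for f in FEATURES}


def rel_pair(weights: Dict[str, float], threshold: float,
             first: Candidate, second: Candidate) -> float:
    """Weighted sum of feature differences plus the threshold term."""
    return sum(weights[f] * (first[f] - second[f]) for f in FEATURES) + threshold


def train_perceptron(pairs: List[Tuple[Candidate, Candidate, int]],
                     epochs: int = 20, rate: float = 1.0):
    """pairs holds (first, second, label); label is +1 if first is more relevant, else -1."""
    weights = {f: 0.0 for f in FEATURES}
    threshold = 0.0
    for _ in range(epochs):
        for first, second, label in pairs:
            predicted = 1 if rel_pair(weights, threshold, first, second) > 0 else -1
            if predicted != label:
                # Standard perceptron update on the difference vector.
                for f in FEATURES:
                    weights[f] += rate * label * (first[f] - second[f])
                threshold += rate * label
    return weights, threshold


def best_candidate(weights: Dict[str, float], threshold: float,
                   candidates: List[Candidate]) -> Candidate:
    """Keep the winner of successive pairwise comparisons."""
    best = candidates[0]
    for challenger in candidates[1:]:
        if rel_pair(weights, threshold, challenger, best) > 0:
            best = challenger
    return best
```

A usage example would train on labeled candidate pairs extracted from answered questions and then call `best_candidate` on the answer windows of a new question.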
31Evaluation on the Web
- test on 350 questions from TREC (Q250-Q600)
- extract 250-byte answers
Metric | Google | Answer extraction from Google | AltaVista | Answer extraction from AltaVista
Precision score | 0.29 | 0.44 | 0.15 | 0.37
Questions with a correct answer among top 5 returned answers | 0.44 | 0.57 | 0.27 | 0.45
32System Extension: Answer Justification
- Experiments with Open-Domain Textual Question Answering. Sanda Harabagiu, Marius Pasca and Steve Maiorano.
- Answer justification using unnamed relations extracted from the question representation and the answer representation (constructed through a similar process).
33System Extension: Definition Questions
- Definition questions ask about the definition or description of a concept
  - Who is John Galt?
  - What is anorexia nervosa?
- Many information nuggets are acceptable answers
  - Who is George W. Bush?
    - George W. Bush, the 43rd President of the United States
    - George W. Bush defeated Democratic incumbent Ann Richards to become the 46th Governor of the State of Texas
- Scoring
  - Any information nugget is acceptable
  - Precision score over all information nuggets
34Answer Detection with Pattern Matching
Q386: What is anorexia nervosa? | cause of anorexia nervosa, an eating disorder...
Q358: What is a meerkat? | the meerkat, a type of mongoose, thrives in...
Q340: Who is Zebulon Pike? | in 1806, explorer Zebulon Pike sighted the...
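One of the patterns visible in the examples above is the apposition ("X, a/an/the Y"). A minimal regular-expression sketch of that single pattern is shown below; the function name is hypothetical, and the pattern is an illustration rather than the system's actual pattern set. It deliberately does not cover the pre-modifier form seen in the Zebulon Pike example.

```python
import re
from typing import List


def apposition_definitions(term: str, text: str) -> List[str]:
    """Return phrases that follow `term` as an apposition: "term, a/an/the ..."."""
    pattern = re.compile(
        rf"\b{re.escape(term)}\b,\s+((?:an|a|the)\s+[^,.;]+)",
        flags=re.IGNORECASE,
    )
    return [match.group(1).strip() for match in pattern.finditer(text)]


print(apposition_definitions("anorexia nervosa",
                             "cause of anorexia nervosa, an eating disorder..."))
print(apposition_definitions("meerkat",
                             "the meerkat, a type of mongoose, thrives in..."))
```

The captured phrase stops at the next comma, period or semicolon, which keeps the candidate short enough for a 50-byte answer.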
35Answer Detection with Concept Expansion
- Enhancement for Definition questions
- Identify terms that are semantically related to the phrase to define
- WordNet hypernyms (more general concepts)
Question | WordNet hypernym | Detected answer candidate
What is a shaman? | priest, non-Christian priest | Mathews is the priest or shaman
What is a nematode? | worm | nematodes, tiny worms in soil
What is anise? | herb, herbaceous plant | anise, rhubarb and other herbs
published in AAAI Spring Symposium 2002
36Evaluation on Definition Questions
- Determine the impact of answer type detection with pattern matching and concept expansion
  - test on the Definition questions from TREC-9 and TREC-10 (approx. 200 questions)
  - extract 50-byte answers
- Results
  - precision score: 0.56
  - questions with a correct answer among top 5 returned answers: 0.67
37References
- Marius Pasca. High-Performance, Open-Domain Question Answering from Large Text Collections. Ph.D. Thesis, Computer Science and Engineering Department, Southern Methodist University, Dallas, Texas. Defended September 2001.
- Marius Pasca. Open-Domain Question Answering from Large Text Collections. Center for the Study of Language and Information (CSLI Publications, series Studies in Computational Linguistics), Stanford, California. Distributed by the University of Chicago Press. ISBN (Paperback) 1575864282, ISBN (Cloth) 1575864274. 2003.
38Overview
- What is Question Answering?
- A traditional system
- Other relevant approaches
  - LCC's PowerAnswer COGEX
  - IBM's PIQUANT
  - CMU's Javelin
  - ISI's TextMap
  - BBN's AQUA
- Distributed Question Answering
39PowerAnswer COGEX (1/2)
- Automated reasoning for QA: A → Q, using a logic prover. Facilitates both answer validation and answer extraction.
- Both question and answer(s) are transformed into logic forms. Example:
  - Heavy selling of Standard & Poor's 500-stock index futures in Chicago relentlessly beat stocks downwards.
  - Heavy_JJ(x1) selling_NN(x1) of_IN(x1,x6) Standard_NN(x2) &_CC(x13,x2,x3) Poor(x3) 's_POS(x6,x13) 500-stock_JJ(x6) index_NN(x4) futures(x5) nn_NNC(x6,x4,x5) in_IN(x1,x8) Chicago_NNP(x8) relentlessly_RB(e12) beat_VB(e12,x1,x9) stocks_NN(x9) downward_RB(e12)
40PowerAnswer COGEX (2/2)
- World knowledge from
  - WordNet glosses converted to logic forms in the eXtended WordNet (XWN) project (http://www.utdallas.edu/moldovan)
  - Lexical chains
    - game:n#3 → HYPERNYM → recreation:n#1 → HYPONYM → sport:n#1
    - Argentine:a#1 → GLOSS → Argentina:n#1
  - NLP axioms to handle complex NPs, coordinations, appositions, equivalence classes for prepositions, etc.
  - Named-entity recognizer
    - John Galt → HUMAN
- A relaxation mechanism is used to iteratively uncouple predicates and remove terms from LFs. The proofs are penalized based on the amount of relaxation involved.
41IBM's Piquant
- Question processing conceptually similar to SMU, but a series of different strategies (agents) available for answer extraction. For each question type, multiple agents might run in parallel.
- Reasoning engine and general-purpose ontology from Cyc used as sanity checker.
- Answer resolution: remaining answers are normalized and a voting strategy is used to select the correct (meaning most redundant) answer.
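The voting step of answer resolution (normalize the remaining answers, then pick the most redundant one) can be sketched as below. The normalization used here (lowercasing, dropping periods, collapsing whitespace) is a deliberately crude assumption; the slides do not describe PIQUANT's actual normalization, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def normalize(answer: str) -> str:
    """Crude canonical form: lowercase, no periods, single spaces."""
    return " ".join(answer.lower().replace(".", "").split())


def vote(answers: Iterable[str]) -> Tuple[str, int]:
    """Return the most frequent normalized answer and its number of votes."""
    counts = Counter(normalize(a) for a in answers)
    if not counts:
        raise ValueError("no candidate answers to vote on")
    return counts.most_common(1)[0]


print(vote(["Mount Etna", "mount  etna", "Vesuvius", "Mt. Etna"]))
```

In the usage line "Mt. Etna" is not merged with "Mount Etna", which shows why a real system needs a stronger canonical form (for example a normalized named entity) than this sketch provides.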
42Piquant QA Agents
- Predictive annotation agent
  - Predictive annotation: the technique of indexing named entities and other NL constructs along with lexical terms. Lemur has built-in support for this now.
  - General-purpose agent, used for almost all question types.
- Statistical Query Agent
  - Derivation from a probabilistic IR model, also developed at IBM.
  - Also general-purpose.
- Description Query
  - Generic descriptions: appositions, parenthetical expressions.
  - Applied mostly to definition questions.
- Structured Knowledge Agent
  - Answers from WordNet/Cyc.
  - Applied whenever possible.
- Pattern-Based Agent
  - Looks for specific syntactic patterns based on the question form.
  - Applied when the answer is expected in a well-structured form.
- Dossier Agent
  - For "Who is X?" questions.
  - A dynamic set of factual questions used to learn information nuggets about persons.
43Pattern-based Agent
- Motivation: some questions (with or without AT) indicate that the answer might be in a structured form
  - What does Knight Rider publish? → transitive verb, missing object.
  - Knight Rider publishes X.
- Patterns generated
  - From a static pattern repository, e.g. birth and death dates recognition.
  - Dynamically from the question structure.
- Matching of the expected answer pattern with the actual answer text is not at word level, but at a higher linguistic level based on full parse trees (see IE lecture).
44Dossier Agent
- Addresses "Who is X?" questions.
- Initially generates a series of generic questions
  - When was X born?
  - What was X's profession?
- Future iterations are decided dynamically, based on the previous answers
  - If X's profession is "writer", the next question is "What did X write?"
- A static ontology of biographical questions is used.
45Cyc Sanity Checker
- Post-processing component that
  - Rejects insane answers
    - How much does a grey wolf weigh?
    - 300 tons
    - A grey wolf IS-A wolf. Weight of a wolf known in Cyc.
    - Cyc returns SANE, INSANE, or DON'T KNOW.
  - Boosts answer confidence when the answer is SANE.
- Typically called for numerical answer types
  - What is the population of Maryland?
  - How much does a grey wolf weigh?
  - How high is Mt. Hood?
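The sanity check can be illustrated with a toy knowledge base. Everything in this sketch is an assumption: the IS-A links, the attribute name and especially the numeric range are hypothetical placeholder values chosen for the example, not facts taken from Cyc, and the code does not use the Cyc API.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Verdict(Enum):
    SANE = "SANE"
    INSANE = "INSANE"
    DONT_KNOW = "DON'T KNOW"


# Hypothetical IS-A links (concept -> more general concept).
IS_A: Dict[str, str] = {"grey wolf": "wolf"}

# Hypothetical plausible ranges; the numbers are placeholders for illustration.
KNOWN_RANGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("wolf", "weight_kg"): (20.0, 80.0),
}


def sanity_check(entity: str, attribute: str, value: float) -> Verdict:
    """Climb the IS-A chain until a known range for the attribute is found."""
    concept: Optional[str] = entity
    seen = set()
    while concept is not None and concept not in seen:
        bounds = KNOWN_RANGES.get((concept, attribute))
        if bounds is not None:
            low, high = bounds
            return Verdict.SANE if low <= value <= high else Verdict.INSANE
        seen.add(concept)
        concept = IS_A.get(concept)
    return Verdict.DONT_KNOW


# "300 tons" expressed in kilograms.
print(sanity_check("grey wolf", "weight_kg", 300_000.0))
```

The grey wolf inherits the range of its IS-A parent, so the 300-ton answer is rejected, while an entity with no known range yields DON'T KNOW and is left to the other components.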
46Answer Resolution
- Called when multiple agents are applied for the same question. Distribution of agents: the predictive-annotation and the statistical agent are by far the most common.
- Each agent provides a canonical answer (e.g. normalized named entity) and a confidence score.
- Final confidence for each candidate answer is computed using an ML model (SVM).
47CMU's Javelin
- Architecture combines SMU's and IBM's approaches.
- Question processing close to SMU's approach.
- Passage retrieval loop conceptually similar to SMU's, but an elegant implementation.
- Multiple answer strategies, similar to IBM's system. All of them are based on ML models (K nearest neighbours, decision trees) that use shallow-text features (close to SMU's).
- Answer voting, similar to IBM's, used to exploit answer redundancy.
48Javelin's Retrieval Strategist
- Implements passage retrieval, including the passage retrieval loop.
- Uses the Inquery IR system, probably Lemur by now.
- The retrieval loop initially uses all keywords in close proximity of each other (stricter than SMU).
- Subsequent iterations relax the following query terms:
  - Proximity for all question keywords: 20, 100, 250, AND
  - Phrase proximity for phrase operators: less than 3 words or PHRASE
  - Phrase proximity for named entities: less than 3 words or PHRASE
  - Inclusion/exclusion of AT word
- Accuracy for TREC-11 queries (how many questions had at least one correct document in the top N documents):
  - Top 30 docs: 80
  - Top 60 docs: 85
  - Top 120 docs: 86
49ISI's TextMap: Pattern-Based QA
- Examples
  - Who invented the cotton gin?
    - <who> invented the cotton gin
    - <who>'s invention of the cotton gin
    - <who> received a patent for the cotton gin
  - How did Mahatma Gandhi die?
    - Mahatma Gandhi died <how>
    - Mahatma Gandhi drowned
    - <who> assassinated Mahatma Gandhi
- Patterns generated from the question form (similar to IBM), learned using a pattern discovery mechanism, or added manually to a pattern repository
- The pattern discovery mechanism performs a series of generalizations from annotated examples
  - Babe Ruth was born in Baltimore, on February 6, 1895.
  - PERSON was born g in DATE
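Surface patterns with a slot, such as the `<who>` patterns listed above, can be matched against text with regular expressions. The sketch below is an illustration under assumptions: the slot is filled by a run of capitalized words (a crude stand-in for a person recognizer), and the function names are hypothetical rather than part of TextMap.

```python
import re
from typing import List, Optional

# Crude stand-in for a PERSON recognizer: one or more capitalized words.
CAPITALIZED_NAME = r"(?P<answer>[A-Z][\w.]*(?: [A-Z][\w.]*)*)"


def compile_pattern(pattern: str, slot: str = "who") -> "re.Pattern[str]":
    """Turn a surface pattern with exactly one <slot> into a compiled regex."""
    parts = pattern.split(f"<{slot}>")
    if len(parts) != 2:
        raise ValueError("pattern must contain the slot exactly once")
    before, after = (re.escape(part) for part in parts)
    return re.compile(before + CAPITALIZED_NAME + after)


def extract(patterns: List[str], text: str, slot: str = "who") -> Optional[str]:
    """Return the slot filler of the first pattern that matches the text."""
    for pattern in patterns:
        match = compile_pattern(pattern, slot).search(text)
        if match:
            return match.group("answer")
    return None


WHO_PATTERNS = [
    "<who> invented the cotton gin",
    "<who>'s invention of the cotton gin",
    "<who> received a patent for the cotton gin",
]
print(extract(WHO_PATTERNS, "Eli Whitney invented the cotton gin."))
```

Patterns are tried in list order, so a repository would place its most reliable patterns first.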
50TextMap: QA as Machine Translation
- In machine translation, one collects translation pairs (s, d) and learns a model of how to transform the source s into the destination d.
- QA is redefined in a similar way: collect question-answer pairs (a, q) and learn a model that computes the probability that a question is generated from the given answer, p(q | parsetree(a)). The correct answer maximizes this probability.
- Only the subsets of answer parse trees where the answer lies are used as training (not the whole sentence).
- An off-the-shelf machine translation package (Giza) is used to train the model.
51TextMap: Exploiting the Data Redundancy
- Additional knowledge resources are used whenever applicable
  - WordNet glosses
    - What is a meerkat?
  - www.acronymfinder.com
    - What is ARDA?
  - Etcetera
  - The known answers are then simply searched in the document collection together with question keywords
- Google is used for answer redundancy
  - TREC and Web (through Google) are searched in parallel.
  - Final answer selected using a maximum entropy ML model.
- IBM introduced redundancy for QA agents, ISI uses data redundancy.
52BBN's AQUA
- Factual system
  - Converts both question and answer to a semantic form (close to SMU's)
  - Machine learning used to measure the similarity of the two representations.
- Was ranked best at the TREC definition pilot organized before TREC-12
- Definition system
  - Conceptually close to SMU's
  - Had pronominal and nominal coreference resolution
  - Used a (probably) better parser (Charniak)
  - Post-ranking of candidate answers using a tf-idf model
53Overview
- What is Question Answering?
- A traditional system
- Other relevant approaches
- Distributed Question Answering
54Sequential Q/A Architecture
[Figure: sequential pipeline. Question → Question Processing → Keywords → Paragraph Retrieval → Paragraphs → Paragraph Scoring → Paragraph Ordering → Accepted Paragraphs → Answer Processing → Answers.]
55Sequential Architecture Analysis
- Analysis conclusions
  - Performance bottleneck modules have well-specified resource requirements → fit for dynamic load balancing (DLB)
  - Iterative tasks → fit for partitioning
  - Reduced inter-module communication → effective module migration/partitioning
56Inter-Question Parallelism (1)
[Figure: inter-question parallelism. Questions arrive through Internet/DNS at Node 1 ... Node N; each node runs a Question Dispatcher, a Load Monitor and Q/A Tasks; the nodes are connected by a local interconnection network.]
57Inter-Question Parallelism (2)
- Question dispatcher
  - Improves upon the DNS blind allocation
  - Allocates a new question to the processor p best fit for the average question. Processor p minimizes
  - Recovers from failed questions
- Load monitor
  - Updates and broadcasts local load
  - Receives remote load information
  - Detects system configuration changes
58Intra-Question Parallelism (1)
[Figure: intra-question parallelism for paragraph retrieval. Question → Question Processing → Keywords → Paragraph Retrieval Dispatcher, which uses the Load Monitor to distribute work to k parallel Paragraph Retrieval (1..k) and Paragraph Scoring (1..k) modules; Paragraph Merging collects the resulting paragraphs.]
59Intra-Question Parallelism (2)
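[Figure: intra-question parallelism for answer processing. Paragraphs → Paragraph Ordering → Accepted Paragraphs → Answer Processing Dispatcher, which uses the Load Monitor to distribute work to n parallel Answer Processing (1..n) modules; Answer Merging collects the unranked answers and Answer Sorting outputs the answers.]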
60Meta-Scheduling Algorithm
metaScheduler(task, loadFunction, underloadCondition)
- select all processors p with underloadCondition(p) true
- if none selected then select processor p with the smallest value for loadFunction(p)
- assign to each selected processor p a weight wp based on its current load
- assign to each selected processor p a fraction wp of the global task
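The meta-scheduling steps above can be sketched as follows. The slide does not say how the weight wp is derived from the load, so this illustration assumes weights proportional to 1 / (1 + load), normalized to sum to 1, with non-negative loads; the task is modeled as a list of work items split into contiguous chunks. All names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def meta_schedule(
    items: Sequence,
    processors: Sequence[str],
    load_function: Callable[[str], float],
    underload_condition: Callable[[str], bool],
) -> Tuple[Dict[str, float], Dict[str, list]]:
    """Split `items` among underloaded processors in proportion to their weights."""
    if not processors:
        raise ValueError("need at least one processor")
    # Step 1: all processors that satisfy the underload condition.
    selected = [p for p in processors if underload_condition(p)]
    # Step 2: fall back to the least loaded processor.
    if not selected:
        selected = [min(processors, key=load_function)]
    # Step 3: weights from current load (assumed inverse-load heuristic).
    raw = {p: 1.0 / (1.0 + load_function(p)) for p in selected}
    total = sum(raw.values())
    weights = {p: raw[p] / total for p in selected}
    # Step 4: give each processor a fraction w_p of the task.
    assignment: Dict[str, list] = {}
    start, cumulative = 0, 0.0
    for index, p in enumerate(selected):
        cumulative += weights[p]
        last = index == len(selected) - 1
        end = len(items) if last else round(cumulative * len(items))
        assignment[p] = list(items[start:end])
        start = end
    return weights, assignment


LOADS = {"P1": 0.0, "P2": 1.0, "P3": 9.0}
weights, assignment = meta_schedule(
    list(range(9)), list(LOADS), LOADS.get, lambda p: LOADS[p] < 2.0
)
print(weights, assignment)
```

In the usage example P3 is overloaded and receives nothing, while P1 (idle) receives twice the share of P2.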
61Migration Example
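[Figure: migration example. Axes: time vs. processors P1, P2, ..., Pn. The modules QP, PR, PS, PO and AP of each question run in sequence and can migrate between processors.]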
62Partitioning Example
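[Figure: partitioning example. Axes: time vs. processors P1, P2, ..., Pn. QP runs once; PR and PS are partitioned into PR1..PRn and PS1..PSn across the processors; PO runs once; AP is partitioned into AP1..APn.]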
63Inter-Question Parallelism: System Throughput
64Intra-Question Parallelism
65End
Thank you!