Title: Answering Questions through Understanding
1 Answering Questions through Understanding/Analysis
BBN AQUA
- Ralph Weischedel, Ana Licuanan, Scott Miller, Jinxi Xu
4 December 2003
2 Executive Summary of Accomplishments
- Technical innovation
  - New hybrid approach to finding extended answers across documents
  - Answers questions regarding terms, organizations, and persons (biographies)
  - Performs very well in the NIST TREC QA evaluation
  - Automatic approach to evaluating extended answers
- Collaborative contributions
  - Developed answer taxonomy and distributed trained name tagger to five other teams for their QA systems
  - Co-led pilot study with Dan Moldovan on definitional questions, which became part of the TREC 2003 QA evaluation
  - Question classification training data distributed to UMass
3 Outline
- Approach
- Component technologies
- Factoid QA
- Extended answers from multiple documents
- Accomplishments
4 Approach
- Overview
- Key Components
- Factoid/List Questions
- Questions requiring Extended Answers
5 BBN's Hybrid Approach to QA
- Theme: extract features from questions and answers using various technologies
  - Stems/words (from document retrieval)
  - Names and descriptions (from information extraction)
  - Parse trees
  - Entity recognition (from information extraction)
  - Proposition recognition (from information extraction)
- Analyze the question
  - Reduce the question to propositions and a bag of words
  - Predict the type of the answer
- Finding answers
  - Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
  - Other knowledge sources (e.g., the Web, structured data) are optionally used to rerank answers
  - Re-rank candidates based on all features (propositions, patterns, etc.)
  - Eliminate redundancy (for list questions and extended answers)
  - Estimate confidence for answers
- Presenting the answer
6 Question Classification
- A hybrid approach based on rules and statistical parsing
  - Question templates: match question templates against statistical parses
  - Back off to statistical bag-of-words classification
- Example features used for classification (see the sketch below)
  - The type of WHNP starting the question (e.g., Who, What, When)
  - The headword of the core NP
  - WordNet definition
  - Bag of words
  - Main verb of the question
- Example: Which pianist won the last International Tchaikovsky Competition?
  - Headword of core NP: pianist
  - WordNet definition: person
  - Answer type: Person
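A minimal sketch of this feature extraction, using simple string heuristics in place of the statistical parses and WordNet lookups the real system relies on; the WH-word list and the head-to-type map below are illustrative assumptions, not the actual answer taxonomy.

    # Illustrative question-classification features; the real system matches
    # question templates against statistical parses and backs off to a
    # bag-of-words classifier.  The tiny lexicons here are assumptions.
    WH_WORDS = {"who", "what", "when", "where", "which", "how"}
    HEAD_TO_TYPE = {"pianist": "Person", "company": "Organization", "city": "Location"}

    def classify_question(question: str) -> dict:
        tokens = question.rstrip("?").lower().split()
        wh = tokens[0] if tokens and tokens[0] in WH_WORDS else None
        # Headword of the core NP: crude stand-in -- the noun right after the WH word.
        head = tokens[1] if wh in {"which", "what"} and len(tokens) > 1 else None
        answer_type = HEAD_TO_TYPE.get(head)
        if answer_type is None and wh == "who":
            answer_type = "Person"            # back off on the WH word alone
        return {"wh": wh, "head": head, "bag_of_words": set(tokens), "answer_type": answer_type}

    print(classify_question("Which pianist won the last International Tchaikovsky Competition?"))
    # -> head 'pianist', answer_type 'Person', matching the example above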
7 Question-Answer Types
Thanks to USC/ISI and IBM groups for sharing the
conclusions of their analyses.
8 Question-Answer Types (cont'd)
9 Frequency of Q Types
10 Name Extraction via Hidden Markov Models (HMMs)
The delegation, which included the commander of
the U.N. troops in Bosnia, Lt. Gen. Sir Michael
Rose, went to the Serb stronghold of Pale, near
Sarajevo, for talks with Bosnian Serb leader
Radovan Karadzic.
[Diagram: training sentences with marked answers feed a training program that produces an HMM; the extractor then applies the HMM to new text to produce entities]
- Performance
  - Over 90 F on English newswire
  - 72 F on English broadcasts with 30% word error rate
  - 85 F on Chinese newswire
  - 76 F on Chinese OCR with 15% word error rate
  - 88 F on Arabic news
  - 90 F on Spanish news
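The decoding step behind an HMM name tagger can be illustrated with a toy Viterbi search over a handful of states; the states, transitions, and emission probabilities below are invented for illustration and are not IdentiFinder's model, which conditions on richer word features with back-off.

    import math

    # Toy Viterbi decoder over name-tag states (illustrative numbers only).
    STATES = ["PERSON", "LOCATION", "OTHER"]
    START = {s: 1.0 / len(STATES) for s in STATES}
    TRANS = {s: {t: 1.0 / len(STATES) for t in STATES} for s in STATES}   # uniform toy transitions
    EMIT = {
        "PERSON":   {"michael": 0.3, "rose": 0.3},
        "LOCATION": {"pale": 0.3, "sarajevo": 0.3},
        "OTHER":    {"went": 0.3, "to": 0.3, "the": 0.3},
    }

    def emit(state, word, floor=0.01):
        return EMIT[state].get(word, floor)

    def viterbi(words):
        words = [w.lower() for w in words]
        # column: state -> (log score, best tag sequence ending in that state)
        col = {s: (math.log(START[s]) + math.log(emit(s, words[0])), [s]) for s in STATES}
        for w in words[1:]:
            new_col = {}
            for s in STATES:
                score, path = max(
                    (col[p][0] + math.log(TRANS[p][s]) + math.log(emit(s, w)), col[p][1])
                    for p in STATES)
                new_col[s] = (score, path + [s])
            col = new_col
        return max(col.values())[1]

    print(viterbi("Michael Rose went to Pale".split()))
    # -> ['PERSON', 'PERSON', 'OTHER', 'OTHER', 'LOCATION'] with these toy numbers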
11 Name Extraction to Pinpoint Candidate Short Answers
- IdentiFinder™ extracts names for 24 types
- Current IdentiFinder performance on these types
- IdentiFinder is easily trainable for other languages, e.g., Arabic and Chinese
- Distributed to Carnegie-Mellon University, Columbia Univ., Univ. of Albany, Univ. of Colorado, MIT, and USC/Information Sciences Institute
12 Parsing via Lexicalized Probabilistic CFGs
- Performance
  - 88 F on English newswire with a 900,000-word training set
  - 81 F on English newswire with a 100,000-word training set
  - 80 F on Chinese newswire with a 100,000-word training set
  - 74 F on Arabic newswire with a 100,000-word training set
13 Proposition Indexing
- A shallow semantic representation
- Deeper than bags of words
- But broad enough to cover all the text
- Characterizes documents by
- Entities they contain
- Propositions (relations) involving those entities
- Resolves all references to entities
- Whether named, described, or pronominal
14 Proposition Finding Example
- Question: Which company sold the most PCs in 2001?
- Text: Dell, beating Compaq, sold the most PCs in 2001.
- Propositions
  - (e1 Dell)
  - (e2 Compaq)
  - (e3 the most PCs)
  - (e4 2001)
  - (sold sub:e1, obj:e3, in:e4)
  - (beating sub:e1, obj:e2)
- Answer: Dell (e1)
- Passage retrieval alone would select the wrong answer (a matching sketch follows below)
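The disambiguation this example illustrates can be sketched directly: represent propositions as predicate/role structures and require every role filler in the question to match, which is what rules out Compaq here. The dictionary representation and role names below are a simplification of the actual proposition format.

    # Simplified proposition matching for the Dell/Compaq example.
    # Propositions are (predicate, {role: entity}) pairs; the question leaves
    # one role (the subject) as the unknown to be filled.
    entities = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
    text_props = [
        ("sold",    {"sub": "e1", "obj": "e3", "in": "e4"}),
        ("beating", {"sub": "e1", "obj": "e2"}),
    ]
    # "Which company sold the most PCs in 2001?"  ->  sold(sub=?, obj=e3, in=e4)
    question_prop = ("sold", {"sub": None, "obj": "e3", "in": "e4"})

    def answer(question_prop, text_props):
        q_pred, q_roles = question_prop
        for pred, roles in text_props:
            if pred != q_pred:
                continue
            # every specified role in the question must match the text proposition
            if all(v is None or roles.get(r) == v for r, v in q_roles.items()):
                unknown = next(r for r, v in q_roles.items() if v is None)
                return entities[roles[unknown]]
        return None

    print(answer(question_prop, text_props))   # -> Dell, not Compaq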
15 Proposition Recognition Strategy
- Start with a lexicalized, probabilistic (LPCFG) parsing model
- Distinguish names by replacing NP labels with NPP
- Currently, rules normalize the parse tree to produce propositions (a toy version follows below)
- At a later date, extend the statistical model to
  - Predict argument labels for clauses
  - Resolve references to entities
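A toy version of the normalization step: walk a simplified parse tree and read off a (verb, sub, obj) proposition. The nested-tuple tree format and the single subject/object rule below are stand-ins for the actual normalization rules.

    # Toy parse-tree normalization into propositions.  Trees are (label, children)
    # tuples with strings as leaves; the single rule below (subject NP before the
    # VP, object NP inside it) stands in for the real normalization rules.
    tree = ("S",
            [("NPP", ["Dell"]),
             ("VP", [("VBD", ["sold"]),
                     ("NP", ["the", "most", "PCs"]),
                     ("PP", [("IN", ["in"]), ("NP", ["2001"])])])])

    def leaves(node):
        label, children = node
        out = []
        for c in children:
            out.extend(leaves(c) if isinstance(c, tuple) else [c])
        return out

    def propositions(sentence):
        label, children = sentence
        subj = next((c for c in children if c[0].startswith("NP")), None)
        vp = next((c for c in children if c[0] == "VP"), None)
        if subj is None or vp is None:
            return []
        verb = next(c for c in vp[1] if c[0].startswith("VB"))
        obj = next((c for c in vp[1] if c[0].startswith("NP")), None)
        prop = {"verb": leaves(verb)[0], "sub": " ".join(leaves(subj))}
        if obj is not None:
            prop["obj"] = " ".join(leaves(obj))
        return [prop]

    print(propositions(tree))   # -> [{'verb': 'sold', 'sub': 'Dell', 'obj': 'the most PCs'}]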
16 Basic System for Factoid/List Questions
[System architecture diagram: from input question through the pipeline]
17 Confidence Estimation
- Compute the probability P(correct | Q, A) from the following features (a toy estimate from held-out counts follows below)
  - P(correct | Q, A) ≈ P(correct | type(Q), <m, n>, PropSat)
  - type(Q): question type
  - m: question length
  - n: number of matched question words in the answer context
  - PropSat: whether the answer satisfies the propositions in the question
- Confidence for answers found on the Web
  - P(correct | Q, A) ≈ P(correct | Freq, InTrec)
  - Freq: number of Web hits, using Google
  - InTrec: whether the answer was also a top answer from the AQUAINT corpus
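One simple way to realize such an estimate, assuming a held-out set of scored question/answer pairs, is to bin answers by (question type, matched-word count, proposition satisfaction) and use the smoothed empirical fraction correct in each bin. The judgments below are invented, and this sketch drops the question-length feature m for brevity.

    from collections import defaultdict

    # Empirical P(correct | type(Q), n matched words, PropSat) estimated by
    # binning held-out judgments.  The judgments are invented; the real
    # estimate also conditions on question length m.
    held_out = [
        # (question type, n matched question words, propositions satisfied, correct?)
        ("Person", 3, True,  True),
        ("Person", 3, True,  True),
        ("Person", 1, False, False),
        ("Date",   2, True,  True),
        ("Date",   0, False, False),
    ]

    counts = defaultdict(lambda: [0, 0])          # bin -> [correct, total]
    for qtype, n, prop_sat, correct in held_out:
        bin_key = (qtype, min(n, 3), prop_sat)    # cap n so bins stay populated
        counts[bin_key][1] += 1
        counts[bin_key][0] += int(correct)

    def confidence(qtype, n, prop_sat, prior=0.5, pseudo=1.0):
        correct, total = counts[(qtype, min(n, 3), prop_sat)]
        # smooth toward the prior so sparse bins do not give 0 or 1 exactly
        return (correct + pseudo * prior) / (total + pseudo)

    print(confidence("Person", 3, True))   # high confidence
    print(confidence("Date",   0, False))  # low confidence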
18 Technique for Questions Requiring Extended Answers
- Select nuggets (phrases) by feature
  - Linguistic features
    - Appositives
    - Copula constructions
    - Surface structure patterns
    - Propositions
  - Semantic features from information extraction
    - Co-reference within document
    - Relations
- Rank features via information retrieval
- Remove redundancy
- Cut off at the target length of the answer (a sketch of these last two steps follows below)
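The last two steps can be sketched with a simple word-overlap filter and a length budget; the Jaccard similarity, the 0.5 threshold, and the 200-character target are illustrative assumptions, not the actual redundancy model.

    # Greedy redundancy removal and length cutoff over ranked nuggets.
    def jaccard(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def select_nuggets(ranked_nuggets, target_chars=200, max_overlap=0.5):
        selected = []
        used = 0
        for nug in ranked_nuggets:                 # assumed already ranked best-first
            if any(jaccard(nug, kept) > max_overlap for kept in selected):
                continue                           # too similar to something already kept
            if used + len(nug) > target_chars:
                break                              # stop at the target answer length
            selected.append(nug)
            used += len(nug)
        return selected

    ranked = [
        "Ari Fleischer, a Bush spokesman",
        "Ari Fleischer, a spokesman for Bush",      # near-duplicate, dropped
        "Dole's former spokesman who now works for Bush",
    ]
    print(select_nuggets(ranked))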
19 Providing Extended Answers
20 Linguistic Features
- Method
  - Good features include the target (QTERM) as an argument in
    - Propositions
    - Appositives
    - Copulas extracted from parse trees
    - Surface structure patterns
- Example: Blobel, a biologist at Rockefeller University, won the Nobel Prize in Medicine.
  - Proposition: <sub>QTERM <verb>won <obj>prize
  - Appositive: <appositive>biologist
- Example: The court, formally known as the International Court of Justice, is the judicial arm of the United Nations.
  - Copula: <copula>the judicial arm of the United Nations
- Example: The International Court of Justice is composed of 15 Judges and has its headquarters at The Hague.
  - Surface structure pattern: "QTERM is composed of NP" (a regex sketch of this pattern follows below)
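A regex sketch of that last surface pattern; the pattern inventory and the NP approximation (everything up to the next sentence-final punctuation) are simplifications, since the real patterns operate over parses rather than raw strings.

    import re

    # One illustrative surface pattern: "QTERM is composed of NP".
    def composed_of_pattern(qterm: str, sentence: str):
        pattern = re.escape(qterm) + r"\s+is\s+composed\s+of\s+([^.!?]+)"
        m = re.search(pattern, sentence, flags=re.IGNORECASE)
        return m.group(1).strip() if m else None

    sent = ("The International Court of Justice is composed of 15 Judges and "
            "has its headquarters at The Hague.")
    print(composed_of_pattern("The International Court of Justice", sent))
    # -> "15 Judges and has its headquarters at The Hague"  (over-grabs; parse-based
    #    phrase extraction, described later, trims such spans)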
21 Semantic Features
- Method
  - SERIF, a state-of-the-art information extraction engine, was used
  - Co-reference used for name comparison, e.g.,
    - Depending on context, "he" and "Bush" may be the same person
  - Relations used as additional features for sentence selection. Types of relations include
    - Spouse-of (e.g., Clinton, Hillary)
    - Founder-of (e.g., Gates, Microsoft)
    - Management-of (e.g., Welch, GE)
    - Residence-of (e.g., John Doe, Boston)
    - Citizenship-of (e.g., John Doe, American)
    - Staff-of (e.g., Weischedel, BBN)
22 Relation Types (8/1/2003)
- Person-Organization
  - Affiliation
  - Owner
  - Founder
  - Management
  - Client
  - General Staff
  - Located-at
  - Member
  - Other
- Person-Location
  - Resident-of
  - Citizen-of
  - Located-at
  - Other
- Organization-Location
  - Located-in
- Organization-Organization
  - Subsidiary
  - Affiliate
  - Client
  - Member
  - Other
- Person-Person
  - Parent
  - Sibling
  - Spouse
  - Grandparent
  - Client
  - Associate
  - Contact
  - Manager
  - Other-relative
  - Other-professional
  - Other
23 How to Extract Phrases for Features
- Motivation
  - A good sentence may contain portions irrelevant to the question
  - Goal: extract only the pertinent parts of a sentence
- Method
  - Operations are performed on parse trees
  - Find the smallest phrase that contains all the arguments of an important fact (i.e., proposition, appositive, copula, relation, etc.)
  - Relative clauses not attached to the question term are trimmed from the phrase (a sketch of the covering-phrase step follows below)
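A sketch of the "smallest phrase containing all the arguments" step over a simplified tree: compute token spans bottom-up, then return the lowest constituent whose span covers every argument span. The tree encoding is the same toy format as the proposition sketch earlier, the argument spans are chosen for illustration, and the relative-clause trimming step is omitted.

    # Find the smallest constituent covering all of a fact's argument spans.
    def annotate(node, start, out):
        """Record (label, (start, end), node) for every constituent; return end."""
        label, children = node
        pos = start
        for c in children:
            if isinstance(c, tuple):
                pos = annotate(c, pos, out)
            else:
                pos += 1                                   # leaf token
        out.append((label, (start, pos), node))
        return pos

    def leaves(node):
        label, children = node
        out = []
        for c in children:
            out.extend(leaves(c) if isinstance(c, tuple) else [c])
        return out

    def smallest_covering_phrase(tree, arg_spans):
        constituents = []
        annotate(tree, 0, constituents)
        lo = min(s for s, _ in arg_spans)
        hi = max(e for _, e in arg_spans)
        covering = [(e - s, node) for _, (s, e), node in constituents if s <= lo and e >= hi]
        return min(covering, key=lambda pair: pair[0])[1]

    tree = ("S",
            [("NPP", ["Blobel"]),
             ("VP", [("VBD", ["won"]),
                     ("NP", ["the", "Nobel", "Prize"]),
                     ("PP", [("IN", ["in"]), ("NPP", ["Medicine"])])])])

    # Arguments: the verb "won" (tokens 1-2) and "the Nobel Prize" (tokens 2-5);
    # the smallest constituent covering both is the VP.
    print(" ".join(leaves(smallest_covering_phrase(tree, [(1, 2), (2, 5)]))))
    # -> "won the Nobel Prize in Medicine"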
24 How to Extract Phrases for Features
- Examples
  - In 1971, Blobel and Dr. David D. Sabatini, who now heads cell biology at New York University School of Medicine, proposed a bold idea known as the signal hypothesis.
    - Proposition: <verb>proposed <sub>QTERM <obj>idea
    - Phrase: "In 1971, Blobel and Dr. David D. Sabatini, , proposed a bold idea known as the signal hypothesis." (relative clause trimmed)
  - Though Warner-Lambert has been one of the drug industry's hottest performers -- routinely reporting quarterly earnings gains of more than 30 percent -- analysts were concerned that its pipeline lacked the depth of those of some competitors.
    - Copula: Warner-Lambert has been ...
    - Phrase: "Warner-Lambert has been one of the drug industry's hottest performers -- routinely reporting quarterly earnings gains of more than 30 percent"
25 Ranking of Features
- Each feature is reduced to a bag of words
- All features are ranked according to two factors
  - The type of the feature
    - Appositives/copulas > patterns > special propositions > relations > propositions/sentences
  - Similarity score (tf.idf) between the feature and the question profile (a tf.idf sketch follows below)
- The question profile is a bag of words, which models the importance of different words in defining the question
- The profile is compiled from
  - Existing definitions (e.g., Webster's dictionary, encyclopedias, etc.)
  - A collection of human-created biographies
  - The centroid of all features
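A sketch of the profile similarity under standard tf.idf weighting over a toy reference collection; the "documents", tokenization, and smoothed idf below are placeholders for the dictionary/biography sources described above.

    import math
    from collections import Counter

    # tf.idf similarity between a candidate feature (reduced to a bag of words)
    # and the question profile.  The reference docs stand in for dictionary
    # definitions and human-written biographies.
    reference_docs = [
        "a biologist is a scientist who studies living organisms",
        "nobel laureates are researchers honored for discoveries in science",
        "a pianist is a musician who plays the piano",
    ]
    def tokens(text): return text.lower().split()

    df = Counter()
    for doc in reference_docs:
        df.update(set(tokens(doc)))
    N = len(reference_docs)
    def idf(w): return math.log((N + 1) / (df[w] + 1)) + 1      # smoothed idf

    def tfidf(bag):
        tf = Counter(bag)
        return {w: tf[w] * idf(w) for w in tf}

    def cosine(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    # Question profile for a "Who is Guenter Blobel?"-style question,
    # compiled from biography-like text (invented here).
    profile = tfidf(tokens("biologist scientist nobel prize research university"))
    feature1 = tfidf(tokens("a biologist at Rockefeller University"))
    feature2 = tfidf(tokens("shares in the company rose sharply"))
    print(cosine(feature1, profile) > cosine(feature2, profile))   # True
    # In the full ranking, the feature type (appositives/copulas > patterns > ...)
    # is the other ranking factor alongside this similarity score.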
26 Performance in TREC 2003
[Chart: TREC 2003 Definitional Questions results]
27 TREC 2003 Error Analysis
- Of 50 definitional questions, 9 received a score of zero; of those 9 questions,
  - 4 are due to faulty heuristics for question interpretation that could not deal with the question context
    - What is ETA in Spain? (The exact string "ETA in Spain" is assumed to be the question term.)
    - What is Ph in biology?
    - What is the medical condition shingles?
    - Who is Akbar the Great? ("Great" is assumed to be the last name of a person.)
  - 1 is due to misspelling
    - Who is Charles Lindberg? (The more common variant is "Lindbergh".)
28 TREC 2003 Error Analysis (Continued)
- When a question received a low F score, it was often because of low recall, not low precision
  - The average F score is 0.555 (BBN2003C)
  - Assuming perfect precision (1.0) for all questions, the score would be 0.614
  - Assuming perfect recall (1.0) for all questions, the score would be 0.797
  - The NIST F score is designed to favor recall (see the formula below)
- For some questions, low recall arises from aggressive / errorful redundancy removal
  - Example: Who is Ari Fleisher?
    - Ari Fleischer, Dole's former spokesman who now works for Bush
    - Ari Fleischer, a Bush spokesman
29 Lessons Learned
- The approach yields interesting performance by combining
  - Information retrieval
  - Linguistic analysis
  - Information extraction
  - Redundancy detection
- Trainability
  - From examples of biographies, organizational profiles, and term dictionaries/encyclopedias
  - Improves performance
  - Selects items like those seen in human-generated answers
  - Offers customizability
30 Towards Automatic Evaluation
- Goal
  - A repeatable, automatic scorer to allow frequent experiments
- Test: 26 biographical questions with human-created answers (3 human answers per question)
  - ¼ from the pilot corpus
  - ¾ created by BBN
- For each question, the system produces the top N response items that are less than or equal to the size of the manual answer
- The BLEU metric from machine translation evaluations is used (a scoring sketch follows below)
- Answer brevity, which should be rewarded for bio/definitional QA, is penalized by BLEU
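A sketch of scoring a system response against three human answers with a standard BLEU implementation (NLTK's sentence_bleu here, an assumption about tooling); the reference answers are invented stand-ins, and the short hypothesis shows the brevity penalty the slide mentions.

    # Scoring one system response against multiple human reference answers with
    # BLEU.  Note how the brevity penalty punishes a short but correct nugget.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    references = [
        "Guenter Blobel is a biologist at Rockefeller University who won the Nobel Prize in Medicine".split(),
        "Blobel , a cell biologist , received the 1999 Nobel Prize for the signal hypothesis".split(),
        "Nobel laureate Guenter Blobel studies how proteins are routed within cells".split(),
    ]
    short_answer = "a biologist at Rockefeller University".split()
    long_answer  = ("Guenter Blobel is a biologist at Rockefeller University who won the "
                    "Nobel Prize in Medicine for the signal hypothesis").split()

    smooth = SmoothingFunction().method1
    for name, hyp in [("short", short_answer), ("long", long_answer)]:
        score = sentence_bleu(references, hyp, smoothing_function=smooth)
        print(name, round(score, 3))
    # The short answer scores much lower than the long one even though every word
    # it contains appears in a reference -- the brevity issue noted above.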
31 BLEU vs. Human Judgments
- Too few human judgments and too little data to draw firm conclusions
- BLEU is promising for automatic evaluation of progress in system development
- May not be accurate enough for cross-system evaluation
- Result: used in our development
32 Challenges (1)
- Modeling context to better rank importance to the user
- Distinguishing redundancy
  - Redundancy example
    - Jeff Bezos, the former stock trader who founded the company
    - Bezos was setting up Amazon.com
    - Jeff Bezos opened the Seattle-based company in 1995
    - Jeff Bezos started the company in his garage in 1994
    - Jeff Bezos, Amazon's founder
- Organizing parts of an answer by time, e.g.,
  - Positions in a career as part of a biography
  - Merger history of a company
33 Challenges (2)
- Detecting inconsistency, e.g.,
  - Barbara Jordan, a reporter and former news editor of The Chronicle of Willimantic, died Sunday of cancer. (1999-10-04)
  - Jordan would keep the extent of her health problems concealed until the day she died early in 1996.
  - BARBARA JORDAN (1936-1996) Jordan was born in Houston.
  - Jordan, 33, went on to become an all-conference outfielder at CSUN. (1999-09-26)
  - But Jordan, from Agoura, realized a different goal last week.
- Detecting ambiguous questions, e.g.,
  - Who is Barbara Jordan? (3 different Barbara Jordans in the AQUAINT corpus)
    - A newspaper editor
    - A Texas congressperson
    - A softball coach
- Automatic evaluation
34 Accomplishments
- New hybrid approach to finding extended answers across documents
  - Answers questions regarding terms, organizations, and persons (biographies)
  - Emphasizes high recall, avoiding redundancy, and relative importance
  - Employs techniques from information retrieval, linguistic analysis, and information extraction
  - Performs very well in the NIST evaluation
- Automatic approach to evaluating extended answers
- Developed answer taxonomy and distributed trained name tagger to five other teams for their QA systems (English, Arabic, and Chinese available)
- Co-led pilot study with Dan Moldovan on definitional questions, which became part of the TREC 2003 QA evaluation