Title: Answering Questions through Understanding
1 Answering Questions through Understanding/Analysis
BBN AQUA
- Ralph Weischedel, Ana Licuanan, Scott Miller, Jinxi Xu
4 December 2003
2 Executive Summary of Accomplishments
- Technical innovation
  - New hybrid approach to finding extended answers across documents
  - Answers questions regarding terms, organizations, and persons (biographies)
  - Performs very well in the NIST TREC QA evaluation
  - Automatic approach to evaluating extended answers
- Collaborative contributions
  - Developed answer taxonomy and distributed trained name tagger to five other teams for their QA systems
  - Co-led pilot study with Dan Moldovan on definitional questions, which became part of the TREC 2003 QA evaluation
  - Question classification training data distributed to UMass
3 Outline
- Approach
- Component technologies
- Factoid QA
- Extended answers from multiple documents
- Accomplishments
4 Approach
- Overview
- Key Components
- Factoid/List Questions
- Questions requiring Extended Answers
5 BBN's Hybrid Approach to QA
- Theme: extract features from questions and answers using various technologies
  - Stems/words (from document retrieval)
  - Names and descriptions (from information extraction)
  - Parse trees
  - Entity recognition (from information extraction)
  - Proposition recognition (from information extraction)
- Analyze the question
  - Reduce the question to propositions and a bag of words
  - Predict the type of the answer
- Finding answers
  - Rank candidate answers using passage retrieval from the primary corpus (the AQUAINT corpus)
  - Other knowledge sources (e.g., the Web, structured data) are optionally used to rerank answers
  - Re-rank candidates based on all features (propositions, patterns, etc.)
  - Eliminate redundancy (for list questions and extended answers)
  - Estimate confidence for answers
- Presenting the answer
6 Question Classification
- A hybrid approach based on rules and statistical parsing
  - Question templates: match question templates against statistical parses
  - Back off to statistical bag-of-words classification
- Example features used for classification (see the sketch below)
  - The type of WHNP starting the question (e.g., Who, What, When)
  - The headword of the core NP
  - WordNet definition
  - Bag of words
  - Main verb of the question
- Example: Which pianist won the last International Tchaikovsky Competition?
  - Headword of core NP: pianist
  - WordNet definition: person
  - Answer type: Person
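A minimal sketch of this feature extraction, using simple string heuristics in place of the statistical parses and WordNet lookups the real system relies on; the WH-word list and the head-to-type map below are illustrative assumptions, not the actual answer taxonomy.

    # Illustrative question-classification features; the real system matches
    # question templates against statistical parses and backs off to a
    # bag-of-words classifier.  The tiny lexicons here are assumptions.
    WH_WORDS = {"who", "what", "when", "where", "which", "how"}
    HEAD_TO_TYPE = {"pianist": "Person", "company": "Organization", "city": "Location"}

    def classify_question(question: str) -> dict:
        tokens = question.rstrip("?").lower().split()
        wh = tokens[0] if tokens and tokens[0] in WH_WORDS else None
        # Headword of the core NP: crude stand-in -- the noun right after the WH word.
        head = tokens[1] if wh in {"which", "what"} and len(tokens) > 1 else None
        answer_type = HEAD_TO_TYPE.get(head)
        if answer_type is None and wh == "who":
            answer_type = "Person"            # back off on the WH word alone
        return {"wh": wh, "head": head, "bag_of_words": set(tokens), "answer_type": answer_type}

    print(classify_question("Which pianist won the last International Tchaikovsky Competition?"))
    # -> head 'pianist', answer_type 'Person', matching the example above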
7 Question-Answer Types
Thanks to USC/ISI and IBM groups for sharing the
conclusions of their analyses.
8 Question-Answer Types (cont'd)
9 Frequency of Q Types
10 Name Extraction via Hidden Markov Models (HMMs)
The delegation, which included the commander of
the U.N. troops in Bosnia, Lt. Gen. Sir Michael
Rose, went to the Serb stronghold of Pale, near
Sarajevo, for talks with Bosnian Serb leader
Radovan Karadzic.
[Diagram: training sentences with marked answers feed a training program that produces an HMM; the extractor then applies the HMM to new text to produce entities]
- Performance
  - Over 90 F on English newswire
  - 72 F on English broadcasts with 30% word error rate
  - 85 F on Chinese newswire
  - 76 F on Chinese OCR with 15% word error rate
  - 88 F on Arabic news
  - 90 F on Spanish news
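The decoding step behind an HMM name tagger can be illustrated with a toy Viterbi search over a handful of states; the states, transitions, and emission probabilities below are invented for illustration and are not IdentiFinder's model, which conditions on richer word features with back-off.

    import math

    # Toy Viterbi decoder over name-tag states (illustrative numbers only).
    STATES = ["PERSON", "LOCATION", "OTHER"]
    START = {s: 1.0 / len(STATES) for s in STATES}
    TRANS = {s: {t: 1.0 / len(STATES) for t in STATES} for s in STATES}   # uniform toy transitions
    EMIT = {
        "PERSON":   {"michael": 0.3, "rose": 0.3},
        "LOCATION": {"pale": 0.3, "sarajevo": 0.3},
        "OTHER":    {"went": 0.3, "to": 0.3, "the": 0.3},
    }

    def emit(state, word, floor=0.01):
        return EMIT[state].get(word, floor)

    def viterbi(words):
        words = [w.lower() for w in words]
        # column: state -> (log score, best tag sequence ending in that state)
        col = {s: (math.log(START[s]) + math.log(emit(s, words[0])), [s]) for s in STATES}
        for w in words[1:]:
            new_col = {}
            for s in STATES:
                score, path = max(
                    (col[p][0] + math.log(TRANS[p][s]) + math.log(emit(s, w)), col[p][1])
                    for p in STATES)
                new_col[s] = (score, path + [s])
            col = new_col
        return max(col.values())[1]

    print(viterbi("Michael Rose went to Pale".split()))
    # -> ['PERSON', 'PERSON', 'OTHER', 'OTHER', 'LOCATION'] with these toy numbers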
11 Name Extraction to Pinpoint Candidate Short Answers
- IdentiFinder™ extracts names for 24 types
- Current IdentiFinder performance on these types
- IdentiFinder is easily trainable for other languages, e.g., Arabic and Chinese
- Distributed to Carnegie-Mellon University, Columbia Univ., Univ. of Albany, Univ. of Colorado, MIT, and USC/Information Sciences Institute
12 Parsing via Lexicalized Probabilistic CFGs
- Performance
  - 88 F on English newswire with a 900,000-word training set
  - 81 F on English newswire with a 100,000-word training set
  - 80 F on Chinese newswire with a 100,000-word training set
  - 74 F on Arabic newswire with a 100,000-word training set
13 Proposition Indexing
- A shallow semantic representation
- Deeper than bags of words
- But broad enough to cover all the text
- Characterizes documents by
- Entities they contain
- Propositions (relations) involving those entities
- Resolves all references to entities
- Whether named, described, or pronominal
14 Proposition Finding Example
- Question: Which company sold the most PCs in 2001?
- Text: Dell, beating Compaq, sold the most PCs in 2001.
- Propositions
  - (e1 Dell)
  - (e2 Compaq)
  - (e3 the most PCs)
  - (e4 2001)
  - (sold sub:e1, obj:e3, in:e4)
  - (beating sub:e1, obj:e2)
- Answer: Dell (e1)
- Passage retrieval alone would select the wrong answer (a matching sketch follows below)
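The disambiguation this example illustrates can be sketched directly: represent propositions as predicate/role structures and require every role filler in the question to match, which is what rules out Compaq here. The dictionary representation and role names below are a simplification of the actual proposition format.

    # Simplified proposition matching for the Dell/Compaq example.
    # Propositions are (predicate, {role: entity}) pairs; the question leaves
    # one role (the subject) as the unknown to be filled.
    entities = {"e1": "Dell", "e2": "Compaq", "e3": "the most PCs", "e4": "2001"}
    text_props = [
        ("sold",    {"sub": "e1", "obj": "e3", "in": "e4"}),
        ("beating", {"sub": "e1", "obj": "e2"}),
    ]
    # "Which company sold the most PCs in 2001?"  ->  sold(sub=?, obj=e3, in=e4)
    question_prop = ("sold", {"sub": None, "obj": "e3", "in": "e4"})

    def answer(question_prop, text_props):
        q_pred, q_roles = question_prop
        for pred, roles in text_props:
            if pred != q_pred:
                continue
            # every specified role in the question must match the text proposition
            if all(v is None or roles.get(r) == v for r, v in q_roles.items()):
                unknown = next(r for r, v in q_roles.items() if v is None)
                return entities[roles[unknown]]
        return None

    print(answer(question_prop, text_props))   # -> Dell, not Compaq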
15 Proposition Recognition Strategy
- Start with a lexicalized, probabilistic (LPCFG) parsing model
- Distinguish names by replacing NP labels with NPP
- Currently, rules normalize the parse tree to produce propositions (a toy version follows below)
- At a later date, extend the statistical model to
  - Predict argument labels for clauses
  - Resolve references to entities
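A toy version of the normalization step: walk a simplified parse tree and read off a (verb, sub, obj) proposition. The nested-tuple tree format and the single subject/object rule below are stand-ins for the actual normalization rules.

    # Toy parse-tree normalization into propositions.  Trees are (label, children)
    # tuples with strings as leaves; the single rule below (subject NP before the
    # VP, object NP inside it) stands in for the real normalization rules.
    tree = ("S",
            [("NPP", ["Dell"]),
             ("VP", [("VBD", ["sold"]),
                     ("NP", ["the", "most", "PCs"]),
                     ("PP", [("IN", ["in"]), ("NP", ["2001"])])])])

    def leaves(node):
        label, children = node
        out = []
        for c in children:
            out.extend(leaves(c) if isinstance(c, tuple) else [c])
        return out

    def propositions(sentence):
        label, children = sentence
        subj = next((c for c in children if c[0].startswith("NP")), None)
        vp = next((c for c in children if c[0] == "VP"), None)
        if subj is None or vp is None:
            return []
        verb = next(c for c in vp[1] if c[0].startswith("VB"))
        obj = next((c for c in vp[1] if c[0].startswith("NP")), None)
        prop = {"verb": leaves(verb)[0], "sub": " ".join(leaves(subj))}
        if obj is not None:
            prop["obj"] = " ".join(leaves(obj))
        return [prop]

    print(propositions(tree))   # -> [{'verb': 'sold', 'sub': 'Dell', 'obj': 'the most PCs'}]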
16 Basic System for Factoid/List Questions
[System architecture diagram: from input question through the pipeline]
17 Confidence Estimation
- Compute the probability P(correct | Q, A) from the following features (a toy estimate from held-out counts follows below)
  - P(correct | Q, A) ≈ P(correct | type(Q), <m, n>, PropSat)
  - type(Q): question type
  - m: question length
  - n: number of matched question words in the answer context
  - PropSat: whether the answer satisfies the propositions in the question
- Confidence for answers found on the Web
  - P(correct | Q, A) ≈ P(correct | Freq, InTrec)
  - Freq: number of Web hits, using Google
  - InTrec: whether the answer was also a top answer from the AQUAINT corpus
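One simple way to realize such an estimate, assuming a held-out set of scored question/answer pairs, is to bin answers by (question type, matched-word count, proposition satisfaction) and use the smoothed empirical fraction correct in each bin. The judgments below are invented, and this sketch drops the question-length feature m for brevity.

    from collections import defaultdict

    # Empirical P(correct | type(Q), n matched words, PropSat) estimated by
    # binning held-out judgments.  The judgments are invented; the real
    # estimate also conditions on question length m.
    held_out = [
        # (question type, n matched question words, propositions satisfied, correct?)
        ("Person", 3, True,  True),
        ("Person", 3, True,  True),
        ("Person", 1, False, False),
        ("Date",   2, True,  True),
        ("Date",   0, False, False),
    ]

    counts = defaultdict(lambda: [0, 0])          # bin -> [correct, total]
    for qtype, n, prop_sat, correct in held_out:
        bin_key = (qtype, min(n, 3), prop_sat)    # cap n so bins stay populated
        counts[bin_key][1] += 1
        counts[bin_key][0] += int(correct)

    def confidence(qtype, n, prop_sat, prior=0.5, pseudo=1.0):
        correct, total = counts[(qtype, min(n, 3), prop_sat)]
        # smooth toward the prior so sparse bins do not give 0 or 1 exactly
        return (correct + pseudo * prior) / (total + pseudo)

    print(confidence("Person", 3, True))   # high confidence
    print(confidence("Date",   0, False))  # low confidence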
18 Technique for Questions Requiring Extended Answers
- Select nuggets (phrases) by feature
  - Linguistic features
    - Appositives
    - Copula constructions
    - Surface structure patterns
    - Propositions
  - Semantic features from information extraction
    - Co-reference within document
    - Relations
- Rank features via information retrieval
- Remove redundancy
- Cut off at the target length of the answer (a sketch of these last two steps follows below)
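The last two steps can be sketched with a simple word-overlap filter and a length budget; the Jaccard similarity, the 0.5 threshold, and the 200-character target are illustrative assumptions, not the actual redundancy model.

    # Greedy redundancy removal and length cutoff over ranked nuggets.
    def jaccard(a: str, b: str) -> float:
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def select_nuggets(ranked_nuggets, target_chars=200, max_overlap=0.5):
        selected = []
        used = 0
        for nug in ranked_nuggets:                 # assumed already ranked best-first
            if any(jaccard(nug, kept) > max_overlap for kept in selected):
                continue                           # too similar to something already kept
            if used + len(nug) > target_chars:
                break                              # stop at the target answer length
            selected.append(nug)
            used += len(nug)
        return selected

    ranked = [
        "Ari Fleischer, a Bush spokesman",
        "Ari Fleischer, a spokesman for Bush",      # near-duplicate, dropped
        "Dole's former spokesman who now works for Bush",
    ]
    print(select_nuggets(ranked))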
19 Providing Extended Answers
20 Linguistic Features
- Method
  - Good features include the target (QTERM) as an argument in
    - Propositions
    - Appositives
    - Copulas extracted from parse trees
    - Surface structure patterns
- Example: Blobel, a biologist at Rockefeller University, won the Nobel Prize in Medicine.
  - Proposition: <sub>QTERM <verb>won <obj>prize
  - Appositive: <appositive>biologist
- Example: The court, formally known as the International Court of Justice, is the judicial arm of the United Nations.
  - Copula: <copula>the judicial arm of the United Nations
- Example: The International Court of Justice is composed of 15 Judges and has its headquarters at The Hague.
  - Surface structure pattern: "QTERM is composed of NP" (a regex sketch of this pattern follows below)
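A regex sketch of that last surface pattern; the pattern inventory and the NP approximation (everything up to the next sentence-final punctuation) are simplifications, since the real patterns operate over parses rather than raw strings.

    import re

    # One illustrative surface pattern: "QTERM is composed of NP".
    def composed_of_pattern(qterm: str, sentence: str):
        pattern = re.escape(qterm) + r"\s+is\s+composed\s+of\s+([^.!?]+)"
        m = re.search(pattern, sentence, flags=re.IGNORECASE)
        return m.group(1).strip() if m else None

    sent = ("The International Court of Justice is composed of 15 Judges and "
            "has its headquarters at The Hague.")
    print(composed_of_pattern("The International Court of Justice", sent))
    # -> "15 Judges and has its headquarters at The Hague"  (over-grabs; parse-based
    #    phrase extraction, described later, trims such spans)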
21 Semantic Features
- Method
  - SERIF, a state-of-the-art information extraction engine, was used
  - Co-reference used for name comparison, e.g.,
    - Depending on context, "he" and "Bush" may be the same person
  - Relations used as additional features for sentence selection. Types of relations include
    - Spouse-of (e.g., Clinton, Hillary)
    - Founder-of (e.g., Gates, Microsoft)
    - Management-of (e.g., Welch, GE)
    - Residence-of (e.g., John Doe, Boston)
    - Citizenship-of (e.g., John Doe, American)
    - Staff-of (e.g., Weischedel, BBN)
22 Relation Types (8/1/2003)
- Person-Organization
  - Affiliation
  - Owner
  - Founder
  - Management
  - Client
  - General Staff
  - Located-at
  - Member
  - Other
- Person-Location
  - Resident-of
  - Citizen-of
  - Located-at
  - Other
- Organization-Location
  - Located-in
- Organization-Organization
  - Subsidiary
  - Affiliate
  - Client
  - Member
  - Other
- Person-Person
  - Parent
  - Sibling
  - Spouse
  - Grandparent
  - Client
  - Associate
  - Contact
  - Manager
  - Other-relative
  - Other-professional
  - Other
23 How to Extract Phrases for Features
- Motivation
  - A good sentence may contain portions irrelevant to the question
  - Goal: extract only the pertinent parts of a sentence
- Method
  - Operations are performed on parse trees
  - Find the smallest phrase that contains all the arguments of an important fact (i.e., proposition, appositive, copula, relation, etc.)
  - Relative clauses not attached to the question term are trimmed from the phrase (a sketch of the covering-phrase step follows below)
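A sketch of the "smallest phrase containing all the arguments" step over a simplified tree: compute token spans bottom-up, then return the lowest constituent whose span covers every argument span. The tree encoding is the same toy format as the proposition sketch earlier, the argument spans are chosen for illustration, and the relative-clause trimming step is omitted.

    # Find the smallest constituent covering all of a fact's argument spans.
    def annotate(node, start, out):
        """Record (label, (start, end), node) for every constituent; return end."""
        label, children = node
        pos = start
        for c in children:
            if isinstance(c, tuple):
                pos = annotate(c, pos, out)
            else:
                pos += 1                                   # leaf token
        out.append((label, (start, pos), node))
        return pos

    def leaves(node):
        label, children = node
        out = []
        for c in children:
            out.extend(leaves(c) if isinstance(c, tuple) else [c])
        return out

    def smallest_covering_phrase(tree, arg_spans):
        constituents = []
        annotate(tree, 0, constituents)
        lo = min(s for s, _ in arg_spans)
        hi = max(e for _, e in arg_spans)
        covering = [(e - s, node) for _, (s, e), node in constituents if s <= lo and e >= hi]
        return min(covering, key=lambda pair: pair[0])[1]

    tree = ("S",
            [("NPP", ["Blobel"]),
             ("VP", [("VBD", ["won"]),
                     ("NP", ["the", "Nobel", "Prize"]),
                     ("PP", [("IN", ["in"]), ("NPP", ["Medicine"])])])])

    # Arguments: the verb "won" (tokens 1-2) and "the Nobel Prize" (tokens 2-5);
    # the smallest constituent covering both is the VP.
    print(" ".join(leaves(smallest_covering_phrase(tree, [(1, 2), (2, 5)]))))
    # -> "won the Nobel Prize in Medicine"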
24 How to Extract Phrases for Features
- Examples
  - In 1971, Blobel and Dr. David D. Sabatini, who now heads cell biology at New York University School of Medicine, proposed a bold idea known as the signal hypothesis.
    - Proposition: <verb>proposed <sub>QTERM <obj>idea
    - Phrase: "In 1971, Blobel and Dr. David D. Sabatini, , proposed a bold idea known as the signal hypothesis." (relative clause trimmed)
  - Though Warner-Lambert has been one of the drug industry's hottest performers -- routinely reporting quarterly earnings gains of more than 30 percent -- analysts were concerned that its pipeline lacked the depth of those of some competitors.
    - Copula: Warner-Lambert has been ...
    - Phrase: "Warner-Lambert has been one of the drug industry's hottest performers -- routinely reporting quarterly earnings gains of more than 30 percent"
25 Ranking of Features
- Each feature is reduced to a bag of words
- All features are ranked according to two factors
  - The type of the feature
    - Appositives/copulas > patterns > special propositions > relations > propositions/sentences
  - Similarity score (tf.idf) between the feature and the question profile (a tf.idf sketch follows below)
- The question profile is a bag of words, which models the importance of different words in defining the question
- The profile is compiled from
  - Existing definitions (e.g., Webster's dictionary, encyclopedias, etc.)
  - A collection of human-created biographies
  - The centroid of all features
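A sketch of the profile similarity under standard tf.idf weighting over a toy reference collection; the "documents", tokenization, and smoothed idf below are placeholders for the dictionary/biography sources described above.

    import math
    from collections import Counter

    # tf.idf similarity between a candidate feature (reduced to a bag of words)
    # and the question profile.  The reference docs stand in for dictionary
    # definitions and human-written biographies.
    reference_docs = [
        "a biologist is a scientist who studies living organisms",
        "nobel laureates are researchers honored for discoveries in science",
        "a pianist is a musician who plays the piano",
    ]
    def tokens(text): return text.lower().split()

    df = Counter()
    for doc in reference_docs:
        df.update(set(tokens(doc)))
    N = len(reference_docs)
    def idf(w): return math.log((N + 1) / (df[w] + 1)) + 1      # smoothed idf

    def tfidf(bag):
        tf = Counter(bag)
        return {w: tf[w] * idf(w) for w in tf}

    def cosine(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    # Question profile for a "Who is Guenter Blobel?"-style question,
    # compiled from biography-like text (invented here).
    profile = tfidf(tokens("biologist scientist nobel prize research university"))
    feature1 = tfidf(tokens("a biologist at Rockefeller University"))
    feature2 = tfidf(tokens("shares in the company rose sharply"))
    print(cosine(feature1, profile) > cosine(feature2, profile))   # True
    # In the full ranking, the feature type (appositives/copulas > patterns > ...)
    # is the other ranking factor alongside this similarity score.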
26 Performance in TREC 2003
[Chart: TREC 2003 Definitional Questions results]
27 TREC 2003 Error Analysis
- Of 50 definitional questions, 9 received a score of zero; of those 9 questions,
  - 4 are due to faulty heuristics for question interpretation that could not deal with the question context
    - What is ETA in Spain? (The exact string "ETA in Spain" is assumed to be the question term.)
    - What is Ph in biology?
    - What is the medical condition shingles?
    - Who is Akbar the Great? ("Great" is assumed to be the last name of a person.)
  - 1 is due to misspelling
    - Who is Charles Lindberg? (The more common variant is "Lindbergh".)
28 TREC 2003 Error Analysis (Continued)
- When a question received a low F score, it was often because of low recall, not low precision
  - The average F score is 0.555 (BBN2003C)
  - Assuming perfect precision (1.0) for all questions, the score would be 0.614
  - Assuming perfect recall (1.0) for all questions, the score would be 0.797
  - The NIST F score is designed to favor recall (see the formula below)
- For some questions, low recall arises from aggressive / errorful redundancy removal
  - Example: Who is Ari Fleisher?
    - Ari Fleischer, Dole's former spokesman who now works for Bush
    - Ari Fleischer, a Bush spokesman
29 Lessons Learned
- The approach yields interesting performance by combining
  - Information retrieval
  - Linguistic analysis
  - Information extraction
  - Redundancy detection
- Trainability
  - From examples of biographies, organizational profiles, and term dictionaries/encyclopedias
  - Improves performance
  - Selects items like those seen in human-generated answers
  - Offers customizability
30 Towards Automatic Evaluation
- Goal
  - A repeatable, automatic scorer to allow frequent experiments
- Test: 26 biographical questions with human-created answers (3 human answers per question)
  - ¼ from the pilot corpus
  - ¾ created by BBN
- For each question, the system produces the top N response items that are less than or equal to the size of the manual answer
- The BLEU metric from machine translation evaluations is used (a scoring sketch follows below)
- Answer brevity, which should be rewarded for bio/definitional QA, is penalized by BLEU
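A sketch of scoring a system response against three human answers with a standard BLEU implementation (NLTK's sentence_bleu here, an assumption about tooling); the reference answers are invented stand-ins, and the short hypothesis shows the brevity penalty the slide mentions.

    # Scoring one system response against multiple human reference answers with
    # BLEU.  Note how the brevity penalty punishes a short but correct nugget.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    references = [
        "Guenter Blobel is a biologist at Rockefeller University who won the Nobel Prize in Medicine".split(),
        "Blobel , a cell biologist , received the 1999 Nobel Prize for the signal hypothesis".split(),
        "Nobel laureate Guenter Blobel studies how proteins are routed within cells".split(),
    ]
    short_answer = "a biologist at Rockefeller University".split()
    long_answer  = ("Guenter Blobel is a biologist at Rockefeller University who won the "
                    "Nobel Prize in Medicine for the signal hypothesis").split()

    smooth = SmoothingFunction().method1
    for name, hyp in [("short", short_answer), ("long", long_answer)]:
        score = sentence_bleu(references, hyp, smoothing_function=smooth)
        print(name, round(score, 3))
    # The short answer scores much lower than the long one even though every word
    # it contains appears in a reference -- the brevity issue noted above.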
31 BLEU vs. Human Judgments
- Too few human judgments and too little data to draw firm conclusions
- BLEU is promising for automatic evaluation of progress in system development
- May not be accurate enough for cross-system evaluation
- Result: used in our development
32 Challenges (1)
- Modeling context to better rank importance to the user
- Distinguishing redundancy
  - Redundancy example
    - Jeff Bezos, the former stock trader who founded the company
    - Bezos was setting up Amazon.com
    - Jeff Bezos opened the Seattle-based company in 1995
    - Jeff Bezos started the company in his garage in 1994
    - Jeff Bezos, Amazon's founder
- Organizing parts of an answer by time, e.g.,
  - Positions in a career as part of a biography
  - Merger history of a company
33 Challenges (2)
- Detecting inconsistency, e.g.,
  - Barbara Jordan, a reporter and former news editor of The Chronicle of Willimantic, died Sunday of cancer. (1999-10-04)
  - Jordan would keep the extent of her health problems concealed until the day she died early in 1996.
  - BARBARA JORDAN (1936-1996) Jordan was born in Houston.
  - Jordan, 33, went on to become an all-conference outfielder at CSUN. (1999-09-26)
  - But Jordan, from Agoura, realized a different goal last week.
- Detecting ambiguous questions, e.g.,
  - Who is Barbara Jordan? (3 different Barbara Jordans in the AQUAINT corpus)
    - A newspaper editor
    - A Texas congressperson
    - A softball coach
- Automatic evaluation
34 Accomplishments
- New hybrid approach to finding extended answers across documents
  - Answers questions regarding terms, organizations, and persons (biographies)
  - Emphasizes high recall, avoiding redundancy, and relative importance
  - Employs techniques from information retrieval, linguistic analysis, and information extraction
  - Performs very well in the NIST evaluation
- Automatic approach to evaluating extended answers
- Developed answer taxonomy and distributed trained name tagger to five other teams for their QA systems (English, Arabic, and Chinese available)
- Co-led pilot study with Dan Moldovan on definitional questions, which became part of the TREC 2003 QA evaluation