Title: Lab. for Intelligent Internet Research
2. Is Question Answering an Acquired Skill?
- Soumen Chakrabarti, IIT Bombay
- With Ganesh Ramakrishnan, Deepa Paranjpe, Vijay Krishnan, Arnab Nandi
3. Web search and QA
- Information need: words relating things + thing aliases = telegraphic Web queries
  - Cheapest laptop with wireless → best price laptop 802.11
  - Why is the sky blue? → sky blue reason
  - When was the Space Needle built? → Space Needle history
- Entity and relation extraction technology is better than ever (SemTag, KnowItAll, Biotext)
  - Ontology extension (e.g., is a kind of)
  - List extraction (e.g., is an instance of)
  - Slot-filling (author X wrote book Y)
4. Factoid QA
- Specialize given domain to a token related to ground constants in the query
- What animal is Winnie the Pooh?
  - hyponym(animal) NEAR Winnie the Pooh
- When was television invented?
  - instance-of(time) NEAR television NEAR synonym(invented)
- FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question)
  - Ground constants: Winnie the Pooh, television
  - Atypes: animal, time
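The FIND ... NEAR ... WHERE ... IS-A template can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the toy `IS_A` table stands in for a real hierarchy such as WordNet, NEAR is approximated by a fixed token window, and the names `is_a` and `find_answers` are hypothetical.

```python
# Toy IS-A hierarchy (hypothetical); a real system would consult WordNet or similar.
IS_A = {
    "bear": "mammal",
    "tiger": "mammal",
    "mammal": "animal",
}

def is_a(token, atype):
    """Follow parent links in the toy hierarchy until atype is found or the chain ends."""
    seen = set()
    current = token.lower()
    while current in IS_A and current not in seen:
        seen.add(current)
        current = IS_A[current]
        if current == atype:
            return True
    return False

def find_answers(passage_tokens, ground_constants, atype, window=5):
    """Return tokens x with x IS-A atype lying within `window` positions of a ground constant."""
    lowered = [t.lower() for t in passage_tokens]
    constant_positions = [i for i, t in enumerate(lowered) if t in ground_constants]
    answers = []
    for i, token in enumerate(passage_tokens):
        if not is_a(token, atype):
            continue
        if any(abs(i - j) <= window for j in constant_positions):
            answers.append(token)
    return answers

passage = "Winnie the Pooh is a bear who lives in a forest".split()
print(find_answers(passage, {"winnie", "pooh"}, "animal"))
```

The hard IS-A test and the fixed window are the simplest possible choices; they are placeholders for whatever proximity and type-matching a real system would use.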
5. A relational view of QA
[Figure labels: attribute or column name; locate which column to read; entity class; limit search to certain rows; answer zone]
- Entity class or atype may be expressed by
  - A finite IS-A hierarchy (e.g. WordNet, TAP)
  - A surface pattern matching infinitely many strings (e.g. digit, Xx, preceded by a preposition)
- Match selectors, specialize atype to answer tokens
6. Benefits of the relational view
- Scaling up by dumbing down
  - Next stop after vector-space
  - Far short of real knowledge representation and inference
  - Barely getting practical at (near) Web scale
- Can set up as a learning problem: train with questions (query logs) and answers in context
- Transparent, self-tuning, easy to deploy
  - Feature extractors used in entity taggers
  - Relational/graphical learning on features
7. What TREC QA feels like
- How to assemble chunker, parser, POS and NE tagger, WordNet, WSD, into a QA system?
- Experts get much insight from old QA pairs
  - Matching an upper-cased term adds a 60% bonus for multi-word terms and 30% for single words
  - Matching a WordNet synonym discounts by 10% (lower case) and 50% (upper case)
  - Lower-case term matches after Porter stemming are discounted 30%; upper-case matches 70%
8. Talk outline
- Relational interpretation of QA
- Motivation for a clean-room IE+ML system
- Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
  - Can handle prominent finite set of atypes: person, place, time, measurements, …
- Extending to arbitrary atype specializations
  - Required for what and which questions
- Ongoing work and concluding remarks
9. Feature Soft match
- FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question)
  - No fixed question or answer type system
- Convert x IS-A Atype(question) to a soft match DoesAtypeMatch(x, question)
[Figure labels: answer tokens; IE-style surface feature extractors; WordNet hypernym feature extractors; question feature vector; snippet feature vector; learn joint distribution]
10. Feature extraction: intuition
- Example questions and passages
  - How fast can a cheetah run?
  - A cheetah can chase its prey at up to 90 km/h
  - How fast does light travel?
  - Nothing moves faster than 186,000 miles per hour, the speed of light
- Example feature labels
  - NNP, person
  - paper_money#n#1, currency#n#1
  - writer, composer, artist, musician
11. Feature extractors
- Question features: 1-, 2-, 3-token sequences starting with standard wh-words
- Passage surface features: hasCap, hasXx, isAbbrev, hasDigit, isAllDigit, lpos, rpos, …
- Passage WordNet features: all noun hypernym ancestors of all senses of token
- Get top 300 passages from IR engine
  - For each token invoke feature extractors
  - Label 1 if token is in answer span, 0 otherwise
- Question vector $x_q$, passage vector $x_p$
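A minimal Python sketch of the question and surface feature extractors follows. The slides name the features but do not define them, so each definition below is an assumption (for example, hasXx is taken to mean "an upper-case letter followed by a lower-case letter"). The position-dependent features lpos and rpos and the WordNet hypernym features are left out, and the function names are hypothetical.

```python
import re

WH_WORDS = {"who", "what", "which", "when", "where", "why", "how"}

def question_features(question):
    """1-, 2- and 3-token sequences starting at the first wh-word."""
    tokens = re.findall(r"\w+", question.lower())
    for i, tok in enumerate(tokens):
        if tok in WH_WORDS:
            return [" ".join(tokens[i:i + n]) for n in (1, 2, 3) if i + n <= len(tokens)]
    return []

def surface_features(token):
    """Boolean surface features of one passage token (definitions are assumptions)."""
    return {
        "hasCap": any(c.isupper() for c in token),
        "hasXx": re.search(r"[A-Z][a-z]", token) is not None,
        "isAbbrev": re.fullmatch(r"(?:[A-Za-z]\.)+", token) is not None,
        "hasDigit": any(c.isdigit() for c in token),
        "isAllDigit": token.isdigit(),
    }

def label_tokens(passage_tokens, answer_span):
    """Label 1 for tokens inside the half-open answer span [start, end), 0 otherwise."""
    start, end = answer_span
    return [1 if start <= i < end else 0 for i in range(len(passage_tokens))]

print(question_features("How fast can a cheetah run?"))
print(surface_features("90"))
```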
12. Preliminary likelihood ratio tests
- Surface patterns, WordNet hypernyms
13. A simple, flat conditional model
- Let $x = x_q \otimes x_p$ (pairwise product of elems)
- Model $\Pr(Y=1 \mid x) = \exp(w \cdot x) / (1 + \exp(w \cdot x))$
- For every question-feature, passage-feature pair, $w$ has a parameter
- Expect to perform better than linear model $x = (x_p, x_q)$
- Can discount for redundancy in pair info
- If $x_q$ ($x_p$) is fixed, what $x_p$ ($x_q$) will yield the largest $\Pr(Y=1 \mid x)$? (linear iceberg query)
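The flat conditional model can be written out as a short Python sketch: form the pairwise product of the question and passage feature vectors, then apply the logistic function to its dot product with the weights. The two-feature vectors and the weight values below are invented for illustration and are not estimates from the talk.

```python
import math

def pairwise_product(xq, xp):
    """Flattened outer product: one element per (question-feature, passage-feature) pair."""
    return [q * p for q in xq for p in xp]

def prob_answer(w, xq, xp):
    """Pr(Y=1|x) = exp(w.x) / (1 + exp(w.x)), written in the equivalent form 1 / (1 + exp(-w.x))."""
    x = pairwise_product(xq, xp)
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical features: xq = [starts with "how fast", starts with "who"], xp = [hasDigit, hasCap].
w = [2.0, -1.0, -1.0, 2.0]  # one weight per pair, in the order produced by pairwise_product
print(prob_answer(w, [1, 0], [1, 0]))  # "how fast" question, token with a digit
print(prob_answer(w, [1, 0], [0, 1]))  # "how fast" question, capitalized token
```

With these toy weights, the digit token scores higher for the "how fast" question than the capitalized token does, which is the kind of question-passage interaction the pairwise parameters are meant to capture.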
14. Classification accuracy
- Pairing more accurate than linear model
- Steep learning curve; linear never gets it beyond prior atypes like proper nouns (common in TREC)
- Are the estimated $w$ parameters meaningful?
15. Parameter anecdotes
- Surface and WordNet features complement each other
- General concepts get negative params: use in predictive annotation
- Learning is symmetric (Q↔A)
16. Query-driven information extraction
- Basis of atypes $A$; $a \in A$ could be a synset, a surface pattern, feature of a parse tree
- Question $q$ projected to vector $(w_a : a \in A)$ in atype space via learning conditional model
  - E.g. if $q$ is when or how long, $w_{\texttt{hasDigit}}$ and $w_{\texttt{time\_period\#n\#1}}$ are large, $w_{\texttt{region\#n\#1}}$ is small
- Each corpus token $t$ has associated indicator features $\phi_a(t)$ for every $a$
  - E.g. $\phi_{\texttt{hasDigit}}(\text{3,000}) = \phi_{\texttt{is-a(region\#n\#1)}}(\text{Japan}) = 1$
- Can also learn $[0,1]$ value of is-a proximity
17. Single token scoring
- A token $t$ is a candidate answer if [condition not preserved]
- $H_q(t)$: reward tokens appearing near selectors matched from question
  - 0/1: appears within fixed window with selector/s
  - Activation in linear token sequence model
  - Proximity in chunk sequences, parse trees, …
- Order tokens by decreasing [score formula not preserved; its annotated parts are "projection of question to atype space" and "atype indicator features of the token"]
- Example passage: the armadillo, found in Texas, is covered with strong horny plates
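Single-token scoring might be sketched as follows. The exact candidate condition and scoring formula are not given in this text, so the sketch makes two loud assumptions: a token is a candidate if it lies within a fixed window of a matched selector, and its score is the dot product of the question's atype-space weights with the token's atype indicator features. The weights, the indicator table, and the choice of "armadillo" as the selector are all hypothetical.

```python
def atype_score(w, indicators):
    """Dot product of atype weights w with a token's 0/1 indicator features (assumed scoring rule)."""
    return sum(w.get(a, 0.0) for a in indicators)

def rank_tokens(tokens, indicator_fn, w, selectors, window=5):
    """Score tokens near a matched selector and order them by decreasing score."""
    selector_positions = [i for i, t in enumerate(tokens) if t.lower() in selectors]
    scored = []
    for i, token in enumerate(tokens):
        if i in selector_positions:
            continue  # selectors themselves are not answer candidates
        if any(abs(i - j) <= window for j in selector_positions):
            scored.append((atype_score(w, indicator_fn(token)), token))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored

# Hypothetical atype weights for a "where" question, plus a toy indicator table.
w = {"is-a(region)": 2.0, "hasCap": 0.5, "hasDigit": -1.0}
TOY_INDICATORS = {"Texas": {"is-a(region)", "hasCap"}}
tokens = ["the", "armadillo", ",", "found", "in", "Texas", ",", "is",
          "covered", "with", "strong", "horny", "plates"]
ranked = rank_tokens(tokens, lambda t: TOY_INDICATORS.get(t, set()), w, {"armadillo"})
print(ranked[0])
```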
18. Mean reciprocal rank (MRR)
- $n_q$ = smallest rank among answer passages
- $\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{n_q}$
  - Dropping passage from 1 to 2 as bad as dropping it from 2 to ∞
- TREC requires MRR5: round up $n_q > 5$ to ∞
  - Improving rank from 20 to 6 as useless as improving it from 20 to 15
- Aggregate score influenced by many complex subsystems
  - Complete description rarely available
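The MRR definition, including the MRR5 cutoff, can be checked with a few lines of Python. This is a sketch under the stated definition only; the function name, the use of `None` for "no answer passage retrieved", and the example ranks are assumptions made for illustration.

```python
def mean_reciprocal_rank(first_answer_ranks, cutoff=None):
    """Average of 1/n_q over questions; ranks beyond `cutoff` (or None) contribute 0."""
    if not first_answer_ranks:
        raise ValueError("need at least one question")
    total = 0.0
    for rank in first_answer_ranks:
        if rank is None:
            continue  # no answer passage retrieved: n_q is infinite
        if cutoff is not None and rank > cutoff:
            continue  # MRR5-style rounding of n_q > cutoff up to infinity
        total += 1.0 / rank
    return total / len(first_answer_ranks)

ranks = [1, 2, 6, None]  # hypothetical first-answer ranks for four questions
print(mean_reciprocal_rank(ranks))            # plain MRR
print(mean_reciprocal_rank(ranks, cutoff=5))  # MRR5
```

The cutoff explains the remark about ranks 20, 15 and 6: all three lie beyond 5, so each contributes zero to MRR5.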
19. Effect of eliminating non-answers
- 300 top IR score hits
- If $\Pr(Y=1 \mid \text{token}) <$ threshold, reject token
- If all tokens rejected, then reject passage
- Present survivors in IR order
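The filtering step can be sketched in Python as below. It assumes each passage arrives as a list of per-token probabilities Pr(Y=1|token), already sorted in IR order. The function name, the threshold value, and the probabilities are hypothetical.

```python
def surviving_passages(passages_token_probs, threshold):
    """Return indices of passages with at least one token at or above threshold, in IR order."""
    survivors = []
    for index, token_probs in enumerate(passages_token_probs):
        # A passage is rejected only when every one of its tokens is rejected.
        if any(p >= threshold for p in token_probs):
            survivors.append(index)
    return survivors

# Three hypothetical passages in IR order, with made-up per-token probabilities.
passages = [[0.05, 0.10], [0.02, 0.70, 0.01], [0.30]]
print(surviving_passages(passages, threshold=0.25))
```

Because survivors keep their IR order, removing a non-answer passage can only move a correct passage up the list or leave it in place.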
20. Drill-down and ablation studies
- Scale average MRR improvement to 1
- What, Which < average
- Who ?? average
- Atype of what and which not captured well by 3-grams starting at wh-words
- Atype ranges over essentially infinite set with relatively little training data
22. What, which, name: atype clues
- Assumption: question sentence has a wh-word and a main/auxiliary verb
- Observation: atype clues are embedded in a noun phrase (NP) adjoining the main or auxiliary verb
- Heuristic: atype clue = head of this NP
  - Use a shallow parser and apply rule
- Head can have attributes
  - Which (American (general)) is buried in Salzburg?
  - Name (Saturn's (largest (moon)))
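The NP-head heuristic can be illustrated with a small Python sketch. It assumes the shallow parser has already produced labelled chunks, and that the head of an English base NP is its last token. Both are simplifications, and the chunk format and the function name `atype_clue` are hypothetical.

```python
WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}
DETERMINERS = {"the", "a", "an"}
SKIP = WH_WORDS | DETERMINERS

def atype_clue(chunks):
    """Return head and attributes of the NP adjoining the first verb chunk, or None."""
    verb_index = next((i for i, (label, _) in enumerate(chunks) if label == "VP"), None)
    if verb_index is None:
        return None
    for neighbor in (verb_index - 1, verb_index + 1):
        if 0 <= neighbor < len(chunks) and chunks[neighbor][0] == "NP":
            content = [t for t in chunks[neighbor][1] if t.lower() not in SKIP]
            if content:  # skip NPs that hold only a wh-word or determiner
                return {"head": content[-1], "attributes": content[:-1]}
    return None

# Hand-written chunkings standing in for shallow-parser output.
q1 = [("NP", ["Which", "American", "general"]), ("VP", ["is", "buried"]),
      ("PP", ["in"]), ("NP", ["Salzburg"])]
q2 = [("VP", ["Name"]), ("NP", ["Saturn's", "largest", "moon"])]
print(atype_clue(q1))
print(atype_clue(q2))
```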
23. Atype clue extraction stats
- Simple heuristic quite effective
- If successful, extracted atype is mapped to WordNet synset (moon → celestial body etc.)
- If no atype of this form available, try the self-evident atypes (who, when, where, how_X etc.)
- New boolean feature for candidate token: is token hyponym of atype synset?
24. The last piece: learning selectors
- Which question words are likely to appear (almost) unchanged in an answer passage?
  - Constants in select-clauses of SQL queries
  - Guides backoff policy for keyword query
- Local and global features
  - POS of word, POS of adjacent words, case info, proximity to wh-word
  - Suppose word is associated with synset set $S$
  - NumSense: size of $S$ (how polysemous is the word?)
  - NumLemma: average number of lemmas describing $s \in S$
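The two synset-based global features can be computed as in the Python sketch below. It reads NumLemma as the mean number of lemmas per synset, which is an interpretation of the slide rather than a confirmed definition. The `TOY_SYNSETS` table is invented and does not reproduce real WordNet entries.

```python
def global_features(word, synsets_of):
    """NumSense = number of synsets of the word; NumLemma = mean lemmas per synset (assumed reading)."""
    synsets = synsets_of.get(word.lower(), [])
    num_sense = len(synsets)
    num_lemma = sum(len(s) for s in synsets) / num_sense if num_sense else 0.0
    return {"NumSense": num_sense, "NumLemma": num_lemma}

# Invented synset table: each synset is a list of lemmas.
TOY_SYNSETS = {
    "plant": [["plant", "works", "industrial plant"], ["plant", "flora", "plant life"]],
    "inhabitant": [["inhabitant", "dweller"]],
}
print(global_features("plant", TOY_SYNSETS))
print(global_features("Ushuaia", TOY_SYNSETS))
```

A word with no synsets at all, such as a rare proper noun, gets zeros for both features in this sketch.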
25. Selector results
- Global features (IDF, NumSense, NumLemma) essential for accuracy
  - Best F1 accuracy with local features alone: 71-73%
  - With local and global features: 81%
- Decision trees better than logistic regression
  - F1 = 81% as against LR F1 = 75%
  - Intuitive decision branches
- But logistic regression gives scores for query
26. Putting together a QA system
[Figure labels: learning tools; shallow parser; N-E tagger]
27. Putting together a QA system
[Figure labels: keyword query generator; keyword query; sentence splitter; passage indexer]
28. Learning to re-rank passages
- Remove passage tokens matching selectors
  - User already knows these are in passage
- Find passage token/s specializing atype
- For each candidate token collect
  - Atype of question, original rank of passage
  - Min, avg linear distances to matched selectors
  - POS and entity tag of token if available
- Example question: How many inhabitants live in the town of Ushuaia
- Example passage: Ushuaia, a port of about 30,000 dwellers set between the Beagle Channel and
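The distance attributes collected for each candidate token can be sketched in Python. The sketch assumes distances are counted in token positions of the original passage and that selectors are matched case-insensitively. The function name and the hand-made tokenization of the Ushuaia passage are illustrative only.

```python
def distance_features(tokens, candidate_index, selectors):
    """Min and average linear distance from a candidate token to matched selector tokens."""
    positions = [i for i, t in enumerate(tokens)
                 if t.lower() in selectors and i != candidate_index]
    if not positions:
        return None  # no selector matched in this passage
    distances = [abs(candidate_index - i) for i in positions]
    return {"min_dist": min(distances), "avg_dist": sum(distances) / len(distances)}

# Hand-tokenized prefix of the example passage; "30,000" is the candidate token.
tokens = ["Ushuaia", ",", "a", "port", "of", "about", "30,000", "dwellers"]
candidate = tokens.index("30,000")
print(distance_features(tokens, candidate, {"ushuaia", "town", "inhabitants"}))
```

Only "Ushuaia" matches here, because the passage says "dwellers" where the question says "inhabitants". Exact-match selectors miss paraphrases like this.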
29. Re-ranking results
- Categorical and numeric attributes
- Logistic regression
  - Good precision, poor recall
- Use logit score to re-rank passages
- Rank of first correct passage shifts
30. MRR gains from what, which, name
- Substantial gain in MRR
- What/which now show above-average MRR gains
- TREC 2000 top MRRs: 0.76, 0.71, 0.46, 0.46, 0.31
31. Generalization across corpora
- Across-year numbers close to train/test split on a single year
- Features and model seem to capture corpus-independent linguistic QA artifacts
- Clean-room QA: feature extraction + learning
  - Recover structure info from question
  - Learn correlations between question structure and passage features
- Competitive accuracy with negligible domain expertise or manual intervention
- Ongoing work
  - Model how selector and atype are related
  - Model coefficients to predictive annotation
  - Combine token scores to better passage scores
  - Treat all question types uniformly
  - Use redundancy available from the Web