Lab. for Intelligent Internet Research


Slides: 33

Provided by: soumencha

1
(No Transcript)
2
Is Question Answering an Acquired Skill?

Soumen Chakrabarti, IIT Bombay

With Ganesh Ramakrishnan, Deepa Paranjpe, Vijay Krishnan, Arnab Nandi
3
Web search and QA

Information need: words relating things + thing aliases = telegraphic Web queries
Cheapest laptop with wireless → best price laptop 802.11
Why is the sky blue? → sky blue reason
When was the Space Needle built? → Space Needle history
Entity and relation extraction technology better than ever (SemTag, KnowItAll, Biotext)
Ontology extension (e.g., is a kind of)
List extraction (e.g., is an instance of)
Slot-filling (author X wrote book Y)

4
Factoid QA

Specialize given domain to a token related to ground constants in the query
What animal is Winnie the Pooh?
hyponym('animal') NEAR 'Winnie the Pooh'
When was television invented?
instance-of('time') NEAR 'television' NEAR synonym('invented')
FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question)
Ground constants: Winnie the Pooh, television
Atypes: animal, time
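
The template FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the toy IS_A table stands in for a real hierarchy such as WordNet, the window size is arbitrary, and the example passage is made up.

```python
# Toy IS-A table standing in for a real hierarchy such as WordNet (assumption).
IS_A = {
    "bear": "animal",
    "tiger": "animal",
    "1926": "time",
}

def is_a(token, atype):
    """Follow IS-A links upward from token; True if atype is reached."""
    seen = set()
    current = token.lower()
    while current in IS_A and current not in seen:
        seen.add(current)
        current = IS_A[current]
        if current == atype:
            return True
    return False

def find_near(tokens, ground_constant, atype, window=6):
    """Return tokens x that are IS-A atype and lie within `window` tokens
    of an occurrence of the ground constant (matched as a token sequence)."""
    gc = [w.lower() for w in ground_constant.split()]
    lowered = [t.lower() for t in tokens]
    starts = [i for i in range(len(lowered) - len(gc) + 1)
              if lowered[i:i + len(gc)] == gc]
    hits = []
    for j, tok in enumerate(tokens):
        if not is_a(tok, atype):
            continue
        # Distance from token j to the nearer end of each occurrence.
        if any(min(abs(j - s), abs(j - (s + len(gc) - 1))) <= window
               for s in starts):
            hits.append(tok)
    return hits

# Made-up passage for the question "What animal is Winnie the Pooh?"
passage = "Winnie the Pooh is a fictional bear created by A. A. Milne".split()
answers = find_near(passage, "Winnie the Pooh", "animal")
```

Here the hard IS-A test and the fixed window are the simplest possible choices; the rest of the talk replaces both with learned, soft versions.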

5
A relational view of QA
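[Figure: the question is split into atype clues and selectors. Atype clues act as the attribute or column name (locate which column to read) and are tied to an entity class by IS-A. Selectors are question words that limit the search to certain rows by direct syntactic match with the answer passage, which contains the answer zone.]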

Entity class or atype may be expressed by:
A finite IS-A hierarchy (e.g. WordNet, TAP)
A surface pattern matching infinitely many strings (e.g. digit, Xx, preceded by a preposition)
Match selectors, specialize atype to answer tokens

6
Benefits of the relational view

Scaling up by dumbing down
Next stop after vector-space
Far short of real knowledge representation and
inference
Barely getting practical at (near) Web scale
Can set up as a learning problem: train with questions (query logs) and answers in context
Transparent, self-tuning, easy to deploy
Feature extractors used in entity taggers
Relational/graphical learning on features

7
What TREC QA feels like

How to assemble chunker, parser, POS and NE
tagger, WordNet, WSD, into a QA system?
Experts get much insight from old QA pairs
Matching an upper-cased term adds a 60% bonus for multi-word terms and 30% for single words
Matching a WordNet synonym discounts by 10% (lower case) and 50% (upper case)
Lower-case term matches after Porter stemming are discounted 30%; upper-case matches 70%

8
Talk outline

Relational interpretation of QA
Motivation for a clean-room IE+ML system
Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
Can handle prominent finite set of atypes: person, place, time, measurements, ...
Extending to arbitrary atype specializations
Required for what and which questions
Ongoing work and concluding remarks

9
Feature Soft match

FIND x NEAR GroundConstants(question) WHERE x IS-A Atype(question)
No fixed question or answer type system
Convert "x IS-A Atype(question)" to a soft match DoesAtypeMatch(x, question)

[Figure: IE-style surface feature extractors applied to the question give a question feature vector; IE-style surface feature extractors and WordNet hypernym feature extractors applied to answer tokens in the passage give a snippet feature vector; a joint distribution is learned over the two vectors.]
10
Feature extraction: intuition
[Figure: question words (how, who) and the tokens that follow them (fast, many, far, rich, wrote, first) linked to answer-token features: abstraction#n#6, NNS; NNP, person; rate#n#2; explorer; mile#n#3, linear_unit#n#1; paper_money#n#1, currency#n#1; writer, composer, artist, musician; measure#n#3, definite_quantity#n#1; rate#n#2, magnitude_relation#n#1]
A cheetah can chase its prey at up to 90 km/h
Nothing moves faster than 186,000 miles per hour, the speed of light
How fast can a cheetah run?
How fast does light travel?
11
Feature extractors

Question features: 1-, 2-, 3-token sequences starting with standard wh-words
Passage surface features: hasCap, hasXx, isAbbrev, hasDigit, isAllDigit, lpos, rpos, ...
Passage WordNet features: all noun hypernym ancestors of all senses of token
Get top 300 passages from IR engine
For each token invoke feature extractors
Label 1 if token is in answer span, 0 otherwise
Question vector $x_q$, passage vector $x_p$
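
A rough sketch of the question and surface feature extractors listed above. The feature names come from the slide, but their exact definitions are not given there, so the regular expressions below are guesses; the WordNet hypernym features and lpos/rpos are left out.

```python
import re

# "Standard wh-words" as assumed for this sketch.
WH_WORDS = {"what", "which", "who", "whom", "when", "where", "why", "how"}

def question_features(question):
    """1-, 2- and 3-token sequences starting at the first wh-word."""
    tokens = re.findall(r"\w+", question.lower())
    for i, tok in enumerate(tokens):
        if tok in WH_WORDS:
            return [" ".join(tokens[i:i + n])
                    for n in (1, 2, 3) if i + n <= len(tokens)]
    return []

def surface_features(token):
    """Binary surface features; the definitions are assumptions."""
    return {
        "hasCap": any(c.isupper() for c in token),
        "hasXx": bool(re.fullmatch(r"[A-Z][a-z]+", token)),
        "isAbbrev": bool(re.fullmatch(r"(?:[A-Za-z]\.)+", token)),
        "hasDigit": any(c.isdigit() for c in token),
        "isAllDigit": token.isdigit(),
    }

def label_tokens(passage_tokens, answer_span):
    """Label 1 if the token index is inside the answer span, else 0.
    answer_span is (start, end) with end exclusive."""
    start, end = answer_span
    return [1 if start <= i < end else 0 for i in range(len(passage_tokens))]

q_feats = question_features("How fast can a cheetah run?")
labels = label_tokens(["at", "up", "to", "90", "km/h"], (3, 5))
```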

12
Preliminary likelihood ratio tests

[Figure: two panels, surface patterns and WordNet hypernyms]

13
A simple, flat conditional model

Let $x = x_q \otimes x_p$ (pairwise product of elements)
Model: $\Pr(Y=1 \mid x) = \exp(w \cdot x) / (1 + \exp(w \cdot x))$
For every question-feature, passage-feature pair, w has a parameter
Expect to perform better than linear model $x = (x_p, x_q)$
Can discount for redundancy in pair info
If $x_q$ ($x_p$) is fixed, what $x_p$ ($x_q$) will yield the largest $\Pr(Y=1 \mid x)$? (linear iceberg query)
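
The flat conditional model can be written out in a few lines. This is only a sketch: the weights are made-up numbers rather than trained parameters, and the helper names are hypothetical.

```python
import math

def pairwise_product(xq, xp):
    """Flattened outer product: one entry per (question feature,
    passage feature) pair, in row-major order."""
    return [q * p for q in xq for p in xp]

def prob_answer(w, xq, xp):
    """Pr(Y=1|x) = exp(w.x) / (1 + exp(w.x)), with x the pairwise product."""
    x = pairwise_product(xq, xp)
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to the form above

# Two question features, three passage features: w needs 2 * 3 parameters.
xq = [1, 0]
xp = [1, 0, 1]
w = [2.0, -1.0, 0.5, 0.0, 0.0, 0.0]  # illustrative values only
p = prob_answer(w, xq, xp)
```

A linear model on the concatenation (xp, xq) would need only 2 + 3 weights, but it could not express that a passage feature matters only when a particular question feature is on.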

14
Classification accuracy

Pairing more accurate than linear model
Steep learning curve; linear never "gets it" beyond prior atypes like proper nouns (common in TREC)
Are the estimated w parameters meaningful?

15
Parameter anecdotes

Surface and WordNet features complement each
other
General concepts get negative params: use in predictive annotation
Learning is symmetric (Q↔A)

16
Query-driven information extraction

Basis of atypes A; $a \in A$ could be a synset, a surface pattern, feature of a parse tree
Question q projected to vector $(w_a : a \in A)$ in atype space via learning conditional model
E.g. if q is "when" or "how long", $w_{\text{hasDigit}}$ and $w_{\text{time\_period\#n\#1}}$ are large, $w_{\text{region\#n\#1}}$ is small
Each corpus token t has associated indicator features $\phi_a(t)$ for every a
E.g. $\phi_{\text{hasDigit}}(\text{3,000}) = \phi_{\text{is-a(region\#n\#1)}}(\text{Japan}) = 1$
Can also learn [0,1] value of is-a proximity

17
Single token scoring

A token t is a candidate answer if
$H_q(t)$: reward tokens appearing near selectors matched from question
0/1: appears within fixed window with selector(s)
Activation in linear token sequence model
Proximity in chunk sequences, parse trees, ...
Order tokens by decreasing score [formula missing from transcript], whose parts are labeled:
Projection of question to atype space
Atype indicator features of the token
Example passage: the armadillo, found in Texas, is covered with strong horny plates
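
The scoring formula itself did not survive in the transcript. One plausible reading, assumed here and not confirmed by the slide, multiplies the proximity reward H_q(t) by the dot product of the question's atype projection w_a with the token's indicator features. The question, selectors, weights and indicator function below are all hypothetical.

```python
def proximity(tokens, i, selectors, window=5):
    """0/1 variant of H_q(t): 1 if token i lies within `window` tokens
    of some selector, else 0."""
    sel = {s.lower() for s in selectors}
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    near = any(tokens[j].lower() in sel for j in range(lo, hi) if j != i)
    return 1 if near else 0

def atype_score(w, phi):
    """Dot product of question projection w_a and token indicators phi_a(t)."""
    return sum(w.get(a, 0.0) * v for a, v in phi.items())

def rank_tokens(tokens, selectors, w, indicators, window=5):
    """Order tokens by decreasing (assumed) score H_q(t) * sum_a w_a phi_a(t)."""
    scored = [(proximity(tokens, i, selectors, window)
               * atype_score(w, indicators(tok)), tok)
              for i, tok in enumerate(tokens)]
    return sorted(scored, key=lambda pair: -pair[0])

def toy_indicators(token):
    # Hypothetical indicator features; a real system would consult WordNet.
    return {
        "is-a(region#n#1)": 1 if token == "Texas" else 0,
        "hasDigit": 1 if any(c.isdigit() for c in token) else 0,
    }

# Hypothetical question: "Where is the armadillo found?"
tokens = "the armadillo found in Texas is covered with strong horny plates".split()
w = {"is-a(region#n#1)": 2.0, "hasDigit": -1.0}  # made-up projection
ranked = rank_tokens(tokens, ["armadillo", "found"], w, toy_indicators)
```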
18
Mean reciprocal rank (MRR)

$n_q$ = smallest rank among answer passages
$\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{n_q}$
Dropping passage from 1 to 2 as bad as dropping it from 2 to $\infty$
TREC requires MRR5: round up $n_q > 5$ to $\infty$
Improving rank from 20 to 6 as useless as improving it from 20 to 15
Aggregate score influenced by many complex
subsystems
Complete description rarely available
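
A small sketch of the MRR computation, including the cutoff behaviour described above. The function name and the use of None for "no answer passage retrieved" are choices made for this example.

```python
def mean_reciprocal_rank(ranks, cutoff=None):
    """ranks: for each question, the smallest rank n_q of an answer passage,
    or None if no answer passage was retrieved.
    With a cutoff (5 for MRR5), ranks beyond it are treated as infinity."""
    total = 0.0
    for n in ranks:
        if n is None or (cutoff is not None and n > cutoff):
            continue  # 1/infinity contributes nothing
        total += 1.0 / n
    return total / len(ranks)

mrr5 = mean_reciprocal_rank([1, 2, 6], cutoff=5)  # (1 + 1/2 + 0) / 3
```

With the cutoff at 5, moving an answer from rank 20 to rank 6 leaves the score unchanged, which is the insensitivity the slide points out.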

19
Effect of eliminating non-answers

300 top IR score hits
If $\Pr(Y=1 \mid \text{token}) <$ threshold, reject token
If all tokens rejected, then reject passage
Present survivors in IR order
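
The elimination step can be sketched as a filter over passages that are already in IR order. The probability function here is a toy stand-in for the learned model, and the threshold value is arbitrary.

```python
def filter_passages(passages, token_prob, threshold=0.5):
    """passages: list of token lists, already in IR score order.
    A token is rejected when token_prob(token) < threshold; a passage is
    rejected when all of its tokens are. Survivors keep their IR order."""
    survivors = []
    for tokens in passages:
        kept = [t for t in tokens if token_prob(t) >= threshold]
        if kept:
            survivors.append((tokens, kept))
    return survivors

def toy_prob(token):
    # Stand-in for Pr(Y=1|token): favours tokens containing digits.
    return 0.9 if any(c.isdigit() for c in token) else 0.1

hits = [["the", "sky", "is", "blue"], ["built", "in", "1962"]]
survivors = filter_passages(hits, toy_prob)
```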

20
Drill-down and ablation studies

Scale average MRR improvement to 1
What, Which < average
Who ≫ average
Atype of what and which not captured well by
3-grams starting at wh-words
Atype ranges over essentially infinite set with relatively little training data

21
Talk outline

Relational interpretation of QA
Motivation for a clean-room IE+ML system
Learning to map between questions and answers using is-a hierarchies and IE-style surface patterns
Can handle prominent finite set of atypes: person, place, time, measurements, ...
Extending to arbitrary atype specializations
Required for what and which questions
Ongoing work and concluding remarks

22
What, which, name atype clues

Assumption: question sentence has a wh-word and a main/auxiliary verb
Observation: atype clues are embedded in a noun phrase (NP) adjoining the main or auxiliary verb
Heuristic: atype clue = head of this NP
Use a shallow parser and apply rule
Head can have attributes
Which (American (general)) is buried in Salzburg?
Name (Saturn's (largest (moon)))
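
A simplified sketch of the head-of-NP heuristic. It assumes the question has already been POS-tagged (the tags below are supplied by hand in Penn Treebank style) and approximates the shallow parser with a run of noun-phrase tags next to the first verb; the slides do not specify the actual rule in this detail.

```python
NP_TAGS = {"DT", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS", "POS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def atype_clue(tagged):
    """tagged: list of (word, tag) pairs for the question.
    Returns (head, attributes) of the NP adjoining the first verb, or None."""
    verb = next((i for i, (_, tag) in enumerate(tagged)
                 if tag.startswith("VB") or tag == "MD"), None)
    if verb is None:
        return None
    # Try the NP immediately to the left of the verb.
    left = verb
    while left > 0 and tagged[left - 1][1] in NP_TAGS:
        left -= 1
    np = tagged[left:verb]
    if not np:
        # Otherwise take the NP immediately to the right of the verb.
        right = verb + 1
        while right < len(tagged) and tagged[right][1] in NP_TAGS:
            right += 1
        np = tagged[verb + 1:right]
    nouns = [i for i, (_, tag) in enumerate(np) if tag in NOUN_TAGS]
    if not nouns:
        return None
    head_index = nouns[-1]  # head taken as the last noun of the NP
    attributes = [word for i, (word, tag) in enumerate(np)
                  if i != head_index and tag not in {"DT", "POS"}]
    return np[head_index][0], attributes

q1 = [("Which", "WDT"), ("American", "JJ"), ("general", "NN"), ("is", "VBZ"),
      ("buried", "VBN"), ("in", "IN"), ("Salzburg", "NNP")]
q2 = [("Name", "VB"), ("Saturn", "NNP"), ("'s", "POS"),
      ("largest", "JJS"), ("moon", "NN")]
```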

23
Atype clue extraction stats

Simple heuristic quite effective
If successful, extracted atype is mapped to WordNet synset (moon → celestial body etc.)
If no atype of this form available, try the self-evident atypes (who, when, where, how_X etc.)
New boolean feature for candidate token: is token hyponym of atype synset?

24
The last piece: learning selectors

Which question words are likely to appear
(almost) unchanged in an answer passage?
Constants in select-clauses of SQL queries
Guides backoff policy for keyword query
Local and global features
POS of word, POS of adjacent words, case info,
proximity to wh-word
Suppose word is associated with synset set S
NumSense: size of S (how polysemous is the word?)
NumLemma: average number of lemmas describing $s \in S$

[Figure: local features POS_at_0, POS_at_1, POS_at_-1]
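
A sketch of how the local and global selector features might be computed for one question word. The synset table is invented for the example and is not real WordNet data; the feature names isCapitalized and distToWh are hypothetical labels for the "case info" and "proximity to wh-word" features.

```python
# Invented table: word -> list of synsets, each a list of lemmas (assumption).
TOY_SYNSETS = {
    "plate": [["plate"], ["plate", "scale", "shell"],
              ["plate", "home plate", "home base"]],
    "armadillo": [["armadillo"]],
}

def global_features(word):
    """NumSense = size of S; NumLemma = average lemmas per synset in S."""
    synsets = TOY_SYNSETS.get(word.lower(), [])
    num_sense = len(synsets)
    num_lemma = sum(len(s) for s in synsets) / num_sense if num_sense else 0.0
    return {"NumSense": num_sense, "NumLemma": num_lemma}

def local_features(tagged, i, wh_index):
    """POS of the word and its neighbours, case info, distance to wh-word."""
    def pos_at(offset):
        j = i + offset
        return tagged[j][1] if 0 <= j < len(tagged) else "NONE"
    word = tagged[i][0]
    return {
        "POS_at_0": pos_at(0),
        "POS_at_1": pos_at(1),
        "POS_at_-1": pos_at(-1),
        "isCapitalized": word[:1].isupper(),
        "distToWh": abs(i - wh_index),
    }

question = [("When", "WRB"), ("was", "VBD"),
            ("television", "NN"), ("invented", "VBN")]
feats = local_features(question, 2, 0)
```

The resulting dictionaries could be fed to any classifier; the slides compare decision trees and logistic regression for this step.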
25
Selector results

Global features (IDF, NumSense, NumLemma)
essential for accuracy
Best F1 accuracy with local features alone: 71–73%
With local and global features: 81%
Decision trees better than logistic regression
F1 = 81% as against LR F1 = 75%
Intuitive decision branches
But logistic regression gives scores for query
backoff

26
Putting together a QA system
[Figure: components feeding the QA system: learning tools, training corpus, shallow parser, WordNet, POS tagger, N-E tagger]
27
Putting together a QA system
[Figure: components: question, keyword query generator, keyword query, passage index, candidate passage, sentence splitter, passage indexer, corpus]
28
Learning to re-rank passages