Exploring Redundancy in Question Answering - PowerPoint PPT Presentation

Provided by: Scor7

1
Exploring Redundancy in Question Answering
Charles L. A. Clarke, Gordon V. Cormack, Thomas R. Lynam
Group 4: Kuljit Singh, Lesley Ponneri, Phebu George, Pradeep Nayar
2
Question Answering

Find concise answers for short questions
Exploiting Redundancy in Question Answering

Example
Question: Who is the President of the USA?
Answer: George Bush
3
What is QA?

A typical QA task requires concise answers to short questions
A large target corpus is used as the source for these answers

Answers take the form of values, names, phrases, sentences, or brief text fragments

Unlike IR systems, which retrieve a full-length document, a QA system retrieves a paragraph or a text fragment
4
Architecture
[Diagram: Question -> Parsing -> Query -> Passage Retrieval (over the Corpus) -> Passages -> Answer Selection (guided by Selection Rules from Parsing) -> Answers]
5
Components

Parser
The parser analyzes the question to generate two types of information:
1) A query for submission to the passage retrieval component
2) A set of selection rules for extracting answers of a given type, such as person, monetary value, or date

Passage Retrieval
This component executes the query over the target corpus, retrieving a ranked list of the top k passages for further analysis by the answer selection component
6
Components
Answer Selection
This component identifies possible answers (candidates) in the passages and then ranks them using a variety of heuristics
The heuristics take into account:
1) The number of times a candidate occurs in the retrieved passages
2) Its location
3) Other special-case information provided by the selection rules
7
Passage Retrieval
Each document in the corpus is treated as an ordered sequence of terms $D = d_1, d_2, d_3, \ldots, d_m$
A query is generated from the question and takes the form of a term set $Q = \{q_1, q_2, q_3, \ldots\}$
A term set $T$ is a subset of $Q$, for example $T = \{q_2, q_3\}$
A passage from $D$ is represented as an extent $(u, v)$, an ordered pair of coordinates with $1 \le u \le v \le m$
An extent $(u, v)$ satisfies a term set $T \subseteq Q$ if the subsequence of $D$ defined by the extent contains at least one occurrence of each of the terms in $T$
8
Example
Suppose we have the passage: "Microsoft's new Web services software will allow developers to create secure applications more easily and screen out the kind of unauthorized commands that are commonly used by malicious hackers."
The query is "Web services software", where q1, q2, q3 are Web, services, and software
Let the term set T be {services, software}
The extent marked on the slide (dotted) is a cover for T: the subsequence of D defined by the extent contains at least one occurrence of each of the terms in T, and it contains no shorter subsequence that also satisfies T
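The definitions of "satisfies" and "cover" used in the example above can be checked directly in code. The following is a minimal sketch, assuming whitespace tokenization, case-insensitive matching, and 1-indexed inclusive coordinates; the function names `satisfies` and `is_cover` are illustrative and do not come from the slides.

```python
def satisfies(doc_terms, u, v, term_set):
    """True if the extent (u, v), 1-indexed and inclusive, contains every term in term_set."""
    window = {t.lower() for t in doc_terms[u - 1:v]}
    return all(t.lower() in window for t in term_set)


def is_cover(doc_terms, u, v, term_set):
    """True if (u, v) satisfies term_set and no shorter nested extent does."""
    if not satisfies(doc_terms, u, v, term_set):
        return False
    # Any shorter satisfying extent nested in (u, v) lies inside (u+1, v) or (u, v-1),
    # so it is enough to test those two extents.
    if u < v and (satisfies(doc_terms, u + 1, v, term_set)
                  or satisfies(doc_terms, u, v - 1, term_set)):
        return False
    return True


passage = ("Microsoft's new Web services software will allow developers to create "
           "secure applications more easily and screen out the kind of unauthorized "
           "commands that are commonly used by malicious hackers.")
tokens = passage.split()
T = {"services", "software"}

print(is_cover(tokens, 4, 5, T))   # extent "services software"
print(is_cover(tokens, 3, 5, T))   # extent "Web services software" satisfies T but is not minimal
```

Under these assumptions the extent covering "services software" is a cover for T, while the longer extent "Web services software" only satisfies T.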
9
Passage Retrieval
An extent $(u, v)$ is a cover for $T$ if $(u, v)$ satisfies $T$ and the subsequence corresponding to $(u, v)$ contains no shorter subsequence that also satisfies $T$
Starting from the probability that an extent $(u, v)$ contains all the terms from $T$, we finally arrive at the score
$$\sum_{t \in T} \log(N / f_t) - |T| \log(l) \qquad \text{(eqn 1)}$$
$f_t$ is the total number of times $t$ appears in the target corpus
$N$ is the total length of all the documents
$l$ is the length of the extent, $l = v - u + 1$
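To make eqn 1 concrete, the score can be computed as the sum of log(N / f_t) over the terms in T, minus |T| times log(l). The sketch below assumes natural logarithms and uses invented corpus statistics purely for illustration; `extent_score` and its parameter names are hypothetical.

```python
import math


def extent_score(term_set, corpus_freq, corpus_length, extent_length):
    """Eqn 1: sum over t in T of log(N / f_t), minus |T| * log(l)."""
    rarity = sum(math.log(corpus_length / corpus_freq[t]) for t in term_set)
    return rarity - len(term_set) * math.log(extent_length)


# Hypothetical statistics, not taken from the slides.
N = 1_000_000                                  # total length of all documents
f = {"services": 2_000, "software": 5_000}     # corpus-wide term counts
T = {"services", "software"}

short = extent_score(T, f, N, extent_length=2)
long_ = extent_score(T, f, N, extent_length=40)
print(short > long_)   # a shorter cover for the same terms scores higher
```

Rare terms raise the score and long extents lower it, which matches the reading that a less probable passage receives a higher score.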
10
Passage Retrieval
Eqn 1 above assigns a higher score to a passage whose probability of occurrence is lower
For a given query Q, generate all covers for all subsets of Q and score them using the above equation
All but the highest-ranking passage from each document are discarded, and the top k are used for further analysis
Implementation of this technique depends on a fast algorithm that computes all covers
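The ranking procedure described on this slide can be sketched end to end. The slides do not give the fast cover algorithm, so the single-pass `find_covers` below is only one plausible approach, and `rank_passages` is a hypothetical name. The sketch assumes lowercase pre-tokenized documents, derives N and f_t from the documents themselves, and enumerates every non-empty subset of the query, which is only practical for short queries.

```python
import math
from collections import Counter
from itertools import combinations


def find_covers(terms, term_set):
    """Single pass over one document; returns covers as 1-indexed (u, v) pairs."""
    last_seen = {}    # term -> latest position seen so far
    covers = []
    last_start = 0
    for v, term in enumerate(terms, start=1):
        if term not in term_set:
            continue
        last_seen[term] = v
        if len(last_seen) < len(term_set):
            continue                     # some term has not occurred yet
        u = min(last_seen.values())      # shortest satisfying extent ending at v
        if u > last_start:               # otherwise an earlier cover is nested inside
            covers.append((u, v))
            last_start = u
    return covers


def rank_passages(docs, query_terms, k):
    """docs: {doc_id: list of lowercase terms}. Returns the top-k (score, doc_id, u, v)."""
    corpus_length = sum(len(terms) for terms in docs.values())     # N
    freq = Counter(t for terms in docs.values() for t in terms)    # f_t
    query = sorted({q for q in query_terms if freq[q] > 0})
    best = {}
    for doc_id, terms in docs.items():
        for size in range(1, len(query) + 1):
            for subset in combinations(query, size):
                rarity = sum(math.log(corpus_length / freq[t]) for t in subset)
                for u, v in find_covers(terms, set(subset)):
                    score = rarity - len(subset) * math.log(v - u + 1)
                    # Keep only the highest-scoring passage per document.
                    if doc_id not in best or score > best[doc_id][0]:
                        best[doc_id] = (score, doc_id, u, v)
    return sorted(best.values(), reverse=True)[:k]


# Toy corpus, invented for illustration.
docs = {
    "d1": "web services software".split(),
    "d2": "software for the web".split(),
    "d3": "no match here".split(),
}
for score, doc_id, u, v in rank_passages(docs, ["web", "services", "software"], k=2):
    print(doc_id, (u, v), round(score, 3))
```

A production system would take N and f_t from the full target corpus and would use an index rather than scanning every document.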
11
Answer Selection

The goal of the TREC-9 QA experiments was to select answer fragments of 50 or 250 bytes

Candidates are single terms, chosen according to the category of the question
If the question asks for a proper noun, the candidates are those terms that match a simple syntactic pattern for a proper noun

If the question asks for a length, the candidates are those numeric values that precede appropriate units

If the question is not classified, the candidates are all the non-query, non-stopword terms appearing in the retrieved passages


12
Answer Selection
Each candidate term t is assigned a weight
$$w_t = c_t \log(N / f_t)$$
$c_t$ is the number of passages in which the term appeared
$f_t / N$ is the relative frequency of the term in the database
$c_t$ represents the redundancy
The weights of the candidates are used to select answer fragments
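A minimal sketch of the candidate weighting, assuming each retrieved passage has already been reduced to its list of candidate terms and that a passage counts at most once toward c_t. The function name `candidate_weights` and the sample statistics are invented for illustration.

```python
import math
from collections import Counter


def candidate_weights(passages, corpus_freq, corpus_length):
    """Return {term: c_t * log(N / f_t)} for every candidate in the passages."""
    passage_counts = Counter()
    for candidates in passages:
        passage_counts.update(set(candidates))   # count each passage once per term
    return {
        term: c_t * math.log(corpus_length / corpus_freq[term])
        for term, c_t in passage_counts.items()
    }


# Hypothetical candidates from three retrieved passages.
passages = [["bush", "texas"], ["bush"], ["bush", "bush", "clinton"]]
corpus_freq = {"bush": 100, "texas": 1_000, "clinton": 100}
N = 1_000_000

weights = candidate_weights(passages, corpus_freq, N)
print(max(weights, key=weights.get))   # the candidate repeated across passages wins
```

Here "bush" and "clinton" are equally rare in the hypothetical corpus, so the difference in their weights comes entirely from redundancy across passages.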
13
Answer Selection
The score of each answer fragment is the sum of the weights of the candidates in the fragment

Other heuristics used:
a) Rank of the passage in which the fragment appears
b) Location of the fragment relative to the center point of the passage

Once the highest-scoring fragment is selected, the weights of the candidates in that fragment are reduced to zero
The remaining fragments are re-scored, and the highest-scoring fragment is selected again

This process is repeated until 5 fragments are selected

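The select, zero out, and re-score loop can be sketched as a greedy procedure. This version assumes each fragment is given as its list of candidate terms, counts each distinct candidate once per fragment, and leaves out the passage-rank and location heuristics; `select_fragments` is a hypothetical name.

```python
def select_fragments(fragments, weights, n=5):
    """Greedily pick up to n fragments; returns their indices in selection order."""
    weights = dict(weights)                 # work on a copy, leave the caller's dict intact
    remaining = list(range(len(fragments)))
    chosen = []
    while remaining and len(chosen) < n:
        # Score = sum of the current weights of the distinct candidates in the fragment.
        best = max(
            remaining,
            key=lambda i: sum(weights.get(t, 0.0) for t in set(fragments[i])),
        )
        chosen.append(best)
        remaining.remove(best)
        for term in fragments[best]:
            weights[term] = 0.0             # candidates already used stop contributing
    return chosen


# Hypothetical fragments and weights.
fragments = [["bush", "texas"], ["bush"], ["clinton"], ["texas", "austin"]]
weights = {"bush": 5.0, "texas": 2.0, "clinton": 3.0, "austin": 1.0}
print(select_fragments(fragments, weights, n=3))
```

Zeroing the weights pushes later selections toward fragments that contain different candidates, so the chosen fragments do not all repeat the same answer.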

14
Answer Selection
$$w_t = c_t \log(N / f_t)$$
The weights assigned depend on a redundancy factor, $c_t$, and a term frequency factor, $\log(N / f_t)$
The individual contribution of each factor can be ascertained by setting the other to a constant
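One way to isolate each factor is to replace the other with a constant. The sketch below assumes the constant is 1, which the slide does not specify; the function name and flag names are hypothetical.

```python
import math


def weight(c_t, f_t, corpus_length, use_redundancy=True, use_frequency=True):
    """c_t * log(N / f_t), with either factor optionally fixed at 1."""
    redundancy = c_t if use_redundancy else 1
    rarity = math.log(corpus_length / f_t) if use_frequency else 1.0
    return redundancy * rarity


# Hypothetical values for a single candidate term.
full = weight(3, 100, 10_000)
redundancy_only = weight(3, 100, 10_000, use_frequency=False)
frequency_only = weight(3, 100, 10_000, use_redundancy=False)
print(full, redundancy_only, frequency_only)
```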
15
Exploring Redundancy