Exploring Redundancy in Question Answering

1
Exploring Redundancy in Question Answering
Charles L. A. Clarke, Gordon V. Cormack, Thomas R. Lynam
Group 4: Kuljit Singh Lesley Ponneri Phebu George Pradeep Nayar
2
Question Answering
  • Find concise answers for short questions
  • Exploiting Redundancy in Question Answering

Example question: Who is the President of the USA?
Answer: George Bush
3
What is QA?
  • A typical QA task requires concise answers to short questions
  • A large target corpus is used as the source for these answers
  • Answers take the form of values, names, phrases, sentences, or brief text fragments

Unlike an IR system, which retrieves a full-length document, a QA system retrieves a paragraph or a text fragment
4
Architecture
[Diagram: Question → Parsing → Query → Passage Retrieval (over the Corpus) → Passages → Answer Selection (using Selection Rules) → Answers]
5
Components
  • Parser: analyzes the question to generate two types of information
  • 1) A query to submit to the passage retrieval component
  • 2) A set of selection rules for extracting answers of the expected type, such as person, monetary value, or date

Passage Retrieval: executes the query over the target corpus, retrieving a ranked list of the top k passages for further analysis by the answer selection component
6
Components
Answer Selection: identifies possible answers in the passages and then ranks them using a variety of heuristics
The heuristics take into account:
1) The number of times a candidate occurs in the retrieved passages
2) Its location
3) Other special-case information provided by the selection rules
7
Passage Retrieval
Each document in the corpus is treated as an ordered sequence of terms $D = d_1, d_2, d_3, \ldots, d_m$
A query is generated from the question and takes the form of a set of terms $Q = \{q_1, q_2, q_3, \ldots\}$
A term set T is a subset of Q, for example $T = \{q_2, q_3\}$
A passage from D is represented as an extent (u,v), an ordered pair of coordinates with $1 \le u \le v \le m$
An extent (u,v) satisfies a term set $T \subseteq Q$ if the subsequence of D defined by the extent contains at least one occurrence of each of the terms in T
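The "satisfies" test above can be sketched in a few lines of Python. This is only an illustration: the function name `satisfies`, the token list, and the use of 1-based inclusive coordinates are assumptions made for the example, not something specified on the slides.

```python
def satisfies(doc, u, v, terms):
    """Return True if the extent (u, v) of doc contains every term in `terms`.

    Coordinates are 1-based and inclusive (1 <= u <= v <= m), which is an
    assumption of this sketch.
    """
    window = set(doc[u - 1:v])   # terms inside the extent
    return set(terms) <= window  # every term of T occurs at least once


# Hypothetical tokenized document fragment.
doc = ["web", "services", "software", "will", "allow", "developers"]
print(satisfies(doc, 2, 3, {"services", "software"}))  # True
print(satisfies(doc, 3, 6, {"services", "software"}))  # False
```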
8
Example
Suppose we have the passage:
"Microsoft's new Web services software will allow developers to create secure applications more easily and screen out the kind of unauthorized commands that are commonly used by malicious hackers."
The query is "Web services software", where q1, q2, q3 are Web, services, and software
Let the term set T be {services, software}
The extent "services software" (dotted on the original slide) is a cover for T: the subsequence of D defined by the extent contains at least one occurrence of each of the terms in T, and it contains no shorter subsequence that also satisfies T
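The example can be checked with a simple left-to-right scan that reports every cover of a term set. This is a minimal sketch, not the fast cover algorithm the authors rely on; the names `tokenize` and `covers` and the lowercase word tokenization are assumptions of the example.

```python
import re


def tokenize(text):
    """Lowercase word tokenizer (an assumption; the slides do not define one)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def covers(doc, terms):
    """Return all covers of `terms` in doc as 1-based inclusive (u, v) pairs."""
    terms = set(terms)
    last_seen = {}   # term -> most recent position
    prev_u = 0       # start of the last cover reported
    result = []
    for v, tok in enumerate(doc, start=1):
        if tok not in terms:
            continue
        last_seen[tok] = v
        if len(last_seen) < len(terms):
            continue  # some term has not occurred yet
        # Shortest satisfying extent that ends at v.
        u = min(last_seen.values())
        # It is a cover only if its start moved right; otherwise an
        # earlier, shorter extent with the same start is nested inside it.
        if u > prev_u:
            result.append((u, v))
            prev_u = u
    return result


passage = ("Microsoft's new Web services software will allow developers to "
           "create secure applications more easily and screen out the kind of "
           "unauthorized commands that are commonly used by malicious hackers.")
print(covers(tokenize(passage), {"services", "software"}))  # [(4, 5)]
```

With this tokenization, "services" and "software" are the fourth and fifth terms of the passage, so the only cover is the two-term extent (4, 5).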
9
Passage Retrieval
An extent (u,v) is a cover for T if (u,v) satisfies T and the subsequence corresponding to (u,v) contains no shorter subsequence that also satisfies T
Each extent (u,v) is scored using the negative logarithm of the estimated probability that an extent of its length contains all the terms from T:
$\sum_{t \in T} \log(N / f_t) - |T| \log(l)$ .... (Eqn 1)
$f_t$ is the total number of times t appears in the target corpus
N is the total length of all the documents
l is the length of the extent
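Eqn 1 translates directly into code. The sketch below assumes natural logarithms (the slides do not state a base), and the function name `extent_score` and the corpus statistics in the usage lines are made up for illustration.

```python
import math


def extent_score(term_counts, corpus_length, extent_length):
    """Eqn 1: sum over t in T of log(N / f_t), minus |T| * log(l).

    term_counts maps each term t in T to f_t, its count in the corpus.
    Natural logarithm is an assumption of this sketch.
    """
    idf_sum = sum(math.log(corpus_length / f_t) for f_t in term_counts.values())
    return idf_sum - len(term_counts) * math.log(extent_length)


# Hypothetical statistics: N = 1,000,000 terms in the corpus.
counts = {"services": 1000, "software": 500}
short = extent_score(counts, 1_000_000, extent_length=2)
long_ = extent_score(counts, 1_000_000, extent_length=20)
print(short > long_)  # True: a shorter extent with the same terms scores higher
```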
10
Passage Retrieval
Eqn 1 assigns a higher score to a passage whose probability of occurrence is lower
For a given query Q, generate all covers for all subsets of Q and score them using Eqn 1
All but the highest ranking passage from each document are discarded, and the top k are used for further analysis
Implementation of this technique depends on a fast algorithm that computes all covers
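The "one passage per document, then top k" step might look like the sketch below. The tuple layout `(doc_id, u, v, score)` and the function name `top_k_passages` are assumptions made for this example; the scores are invented.

```python
import heapq


def top_k_passages(scored_covers, k):
    """Keep the best-scoring passage per document, then return the top k.

    scored_covers is an iterable of (doc_id, u, v, score) tuples, a layout
    assumed for this sketch.
    """
    best = {}
    for doc_id, u, v, score in scored_covers:
        # Discard all but the highest ranking passage from each document.
        if doc_id not in best or score > best[doc_id][3]:
            best[doc_id] = (doc_id, u, v, score)
    return heapq.nlargest(k, best.values(), key=lambda p: p[3])


scored = [("d1", 1, 2, 5.0), ("d1", 4, 9, 7.5), ("d2", 3, 3, 6.0), ("d3", 2, 8, 1.0)]
print(top_k_passages(scored, 2))
```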
11
Answer Selection
  • The goal of the TREC-9 QA experiments was to select answer fragments of length 50 to 250 bytes

Candidates are single terms, chosen according to the category of the question
  • If the question asks for a proper noun, the candidates are those terms that match a simple syntactic pattern for a proper noun
  • If the question asks for a length, the candidates are those numeric values that precede appropriate units
  • If the question is not classified, the candidates are all the non-query, non-stopword terms appearing in the retrieved passages
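The three candidate rules can be illustrated as follows. Everything concrete here is hypothetical: the slides only say "a simple syntactic pattern", so the regular expressions, the unit list, the stopword list, and the category names are stand-ins chosen for the example.

```python
import re

# Hypothetical patterns; the actual patterns are not given on the slides.
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+\b")
LENGTH = re.compile(r"\b(\d+(?:\.\d+)?)\s*(?:km|kilometers|miles|meters|feet|inches)\b")
STOPWORDS = {"the", "a", "an", "of", "is", "in", "to", "and"}


def candidates(passage, category, query_terms=()):
    """Return candidate answer terms for one passage, by question category."""
    if category == "proper_noun":
        # Capitalized words (this also catches sentence-initial words).
        return PROPER_NOUN.findall(passage)
    if category == "length":
        # Numeric values that precede a unit of length.
        return LENGTH.findall(passage)
    # Unclassified: all non-query, non-stopword terms.
    skip = STOPWORDS | {q.lower() for q in query_terms}
    return [w for w in re.findall(r"[a-z0-9]+", passage.lower()) if w not in skip]


print(candidates("George Bush visited Ottawa", "proper_noun"))
print(candidates("The bridge is 2.7 km long and 90 feet high", "length"))
print(candidates("The capital of Canada is Ottawa", "other", ["capital", "Canada"]))
```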

12
Answer Selection
Each candidate term t is assigned a weight $W_t = c_t \log(N / f_t)$
$c_t$ is the number of passages in which the term appeared; it represents the redundancy
$\log(N / f_t)$ reflects the relative frequency of the term in the database
The weights of the candidates are used to select answer fragments
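A small sketch of the weight computation, assuming natural logarithms and a per-passage list of candidate terms. The function name `candidate_weights` and all the counts in the usage lines are invented for illustration.

```python
import math


def candidate_weights(passages, corpus_freq, corpus_length):
    """Weight each candidate t as c_t * log(N / f_t).

    passages: one list of candidate terms per retrieved passage.
    corpus_freq: maps t to f_t; corpus_length is N.
    """
    weights = {}
    for t in {t for p in passages for t in p}:
        c_t = sum(1 for p in passages if t in p)  # passages containing t
        weights[t] = c_t * math.log(corpus_length / corpus_freq[t])
    return weights


# Hypothetical candidates from three retrieved passages.
passages = [["bush", "texas"], ["bush"], ["clinton", "bush"]]
freq = {"bush": 100, "texas": 1000, "clinton": 100}
w = candidate_weights(passages, freq, 1_000_000)
print(max(w, key=w.get))  # bush
```

Here "bush" and "clinton" are equally rare in the corpus, so the higher weight of "bush" comes entirely from its appearing in three passages instead of one.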
13
Answer Selection
The score of each answer fragment is the sum of the weights of the candidates in the fragment
  • Other heuristics used:
  • a) Rank of the passage in which the fragment appears
  • b) Location of the fragment relative to the center point of the passage

Once the highest scoring fragment is selected, the weights of the candidates in that fragment are reduced to zero
The remaining fragments are re-scored, and the highest scoring fragment is selected again
  • This process is repeated until 5 fragments are selected
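The select, zero out, and re-score loop can be sketched as below. This is a simplified illustration: it ignores the passage-rank and location heuristics, counts each distinct candidate once per fragment, and breaks ties by fragment order, all of which are assumptions of the example. The name `select_fragments` is hypothetical.

```python
def select_fragments(fragments, weights, n=5):
    """Greedily pick up to n fragments; return their indices in pick order.

    fragments: one list of candidate terms per fragment.
    weights: candidate weights (not modified; a copy is zeroed as we go).
    """
    weights = dict(weights)
    remaining = list(range(len(fragments)))
    chosen = []
    while remaining and len(chosen) < n:
        # Re-score every remaining fragment with the current weights.
        best = max(remaining,
                   key=lambda i: sum(weights.get(t, 0.0) for t in set(fragments[i])))
        chosen.append(best)
        remaining.remove(best)
        # Candidates already covered contribute nothing to later picks.
        for t in fragments[best]:
            weights[t] = 0.0
    return chosen


fragments = [["bush", "texas"], ["bush"], ["clinton"]]
weights = {"bush": 3.0, "texas": 1.0, "clinton": 2.0}
print(select_fragments(fragments, weights, n=2))  # [0, 2]
```

Zeroing the weights is what pushes the second pick toward a different candidate: fragment 1 repeats "bush", so after the first pick it scores zero and fragment 2 is chosen instead.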

14
Answer Selection
$W_t = c_t \log(N / f_t)$
The weights assigned depend on a redundancy factor ($c_t$) and a term frequency factor ($\log(N / f_t)$)
The individual contribution of each factor can be ascertained by setting the other to a constant
15
Exploring Redundancy
  • A single category of questions, those that require the name of a person as the answer, was explored
  • Redundancy was used to isolate the required name from the top ranking passages
  • The TREC 100GB VLC2 corpus was used; it was of lower quality, so answers were not guaranteed to be present for every question
  • A simple syntactic pattern was used to identify candidate answers

16
Exploring Redundancy
  • For each query, the top k passages were retrieved using Equation 1
  • Each passage was expanded symmetrically about its center point to w bytes
  • For the experiment, the parameters k and w (the depth and width, respectively) were varied
  • Candidate answers were identified in the passages and assigned a score equal to the number of distinct passages in which the candidate appeared ($c_t$)
  • Ties were broken by applying a rule that takes into account the distance of each candidate from the center point of the passages
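The redundancy-only ranking can be sketched as a count of distinct passages per candidate. The tie-breaking rule is not spelled out on the slides; this example assumes that the candidate whose closest occurrence lies nearest a passage center wins. The input layout and the name `rank_by_redundancy` are likewise assumptions.

```python
from collections import defaultdict


def rank_by_redundancy(passages):
    """Rank candidates by the number of distinct passages containing them.

    passages: one list per passage of (candidate, offset_from_center) pairs.
    Ties go to the candidate seen closest to a passage center (assumed rule).
    """
    count = defaultdict(int)
    closest = {}
    for passage in passages:
        nearest = {}
        for cand, offset in passage:
            d = abs(offset)
            nearest[cand] = min(d, nearest.get(cand, d))
        for cand, d in nearest.items():
            count[cand] += 1  # each passage counts once per candidate
            closest[cand] = min(d, closest.get(cand, d))
    return sorted(count, key=lambda c: (-count[c], closest[c]))


# Hypothetical candidates with byte offsets from each passage center.
retrieved = [
    [("Bush", 10), ("Clinton", -5)],
    [("Bush", 40), ("Bush", 3)],
    [("Clinton", 20), ("Gore", 1)],
]
print(rank_by_redundancy(retrieved))  # ['Bush', 'Clinton', 'Gore']
```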

17
Results
  • Experiments with TREC-8 questions suggested that a depth of k = 50 and a width of w = 1000 would produce reasonable results
  • Using these parameters, 49 (56%) of the 87 questions are answered correctly, and for 34 (39%) a correct answer is ranked first
  • Question runs for a range of depth and width values are listed in the table

18
Table
19
Opinion
  • We rate the authors' work at 7 on a scale from 1 to 10

Reasons
  • Has a clear explanation of a QA system
  • The experiments were conducted on the TREC-9 corpus
  • The impact of their experiments on lower-quality corpora, where there is no guarantee that answers exist, was justified

20
Conclusion
Redundancy can thus be used as a method for answer validation in Question Answering systems