Evaluating High Accuracy Retrieval Techniques (Presentation Transcript)
1
Evaluating High Accuracy Retrieval Techniques
  • Chirag Shah, W. Bruce Croft
  • Center for Intelligent Information Retrieval
  • Department of Computer Science
  • University of Massachusetts
  • Presented By Yi-Ting Chen

2
Outline
  • Introduction
  • Analysis of the problem
  • The approaches for modifying queries
  • Method 1: Giving more weight to the headwords.
  • Method 2: Using clarity scores as weights.
  • Method 3: Using clarity scores to find terms to
    expand with WordNet.
  • Experiments and analysis
  • Conclusion and future work

3
Introduction
  • Ad-hoc retrieval and question answering (QA) are
    two major research streams in present-day IR.
  • Different goals: ad-hoc retrieval is mainly about
    retrieving a set of documents with good precision
    and recall, whereas QA focuses on getting one
    correct answer or a small set of answers with
    high accuracy.
  • This paper tries to link the two by achieving
    high accuracy with respect to the most relevant
    results.

4
Introduction
  • HARD: TREC introduced a new track called High
    Accuracy Retrieval from Documents in 2003.
  • The task in this paper also deals with the
    problem of getting high accuracy in retrieval,
    but in contrast to HARD, it does not make use of
    any additional information.
  • It studies how QA techniques can help achieve
    high accuracy in ad-hoc retrieval and proposes a
    different evaluation measure (MRR) instead of
    recall and precision.

5
Analysis of the problem
  • To evaluate systems that aim to place correct
    results high in the ranked list, QA systems use
    mean reciprocal rank (MRR) as the measure.
  • MRR is defined as the average, over queries, of
    the inverse of the rank of the first relevant
    retrieved result (see the sketch below).
  • The paper investigates the problem of achieving
    high accuracy from this perspective and analyzes
    the reasons behind low accuracy.
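As a concrete illustration of the measure, here is a minimal sketch of computing MRR from the rank of the first relevant document for each query; the convention that a query with no relevant result contributes 0 is an assumption, not something stated on the slide.

```python
# Minimal sketch: mean reciprocal rank (MRR) over a set of queries.
# Input: for each query, the rank of the first relevant document
# (None if no relevant document was retrieved -- assumed convention).
def mean_reciprocal_rank(first_relevant_ranks):
    reciprocal_ranks = [1.0 / r if r else 0.0 for r in first_relevant_ranks]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Three queries whose first relevant documents appear at ranks 1, 2, and 4:
print(mean_reciprocal_rank([1, 2, 4]))  # (1 + 0.5 + 0.25) / 3 = 0.583...
```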

6
Analysis of the problem
  • Baseline
  • TREC's Tipster vols. I and II as datasets and
    topics 51-200 (150 in total) as queries (both
    title and description).
  • Used the language modeling framework for
    retrieval (a sketch follows below).
  • Used the Lemur toolkit for implementing our
    retrieval system.
  • The results of our baseline runs along with those
    of various QA systems of TREC-8 [26], TREC-9
    [27], TREC-10 [28], and TREC-11 [29] are given in
    Table 1.
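The slide names the language-modeling framework and the Lemur toolkit but gives no details, so the following is only a sketch of one common instantiation, query-likelihood scoring with Dirichlet smoothing; the smoothing method, the mu value, and the toy data are assumptions rather than the paper's actual configuration.

```python
# Sketch of query-likelihood retrieval with a Dirichlet-smoothed document
# language model (one common language-modeling setup; parameters assumed).
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, coll_tf, coll_len, mu=2000):
    doc_tf, doc_len = Counter(doc_terms), len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_coll = coll_tf.get(t, 0.5) / coll_len              # collection model
        p_doc = (doc_tf[t] + mu * p_coll) / (doc_len + mu)   # smoothed doc model
        score += math.log(p_doc)                             # log P(Q | D)
    return score

# Toy usage: rank two tiny "documents" for a two-word query.
docs = {"d1": "airbus receives government subsidy".split(),
        "d2": "trade dispute over steel tariffs".split()}
coll = Counter(w for d in docs.values() for w in d)
query = "airbus subsidy".split()
ranked = sorted(docs,
                key=lambda d: query_likelihood(query, docs[d], coll, sum(coll.values())),
                reverse=True)
print(ranked)  # d1 should rank above d2
```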

7
(No Transcript)
8
Analysis of the problem
  • We can see that our baselines have a correct
    document at rank 1 about 40% of the time.
  • Look at the distribution of queries with respect
    to the rank of the first relevant document.
  • The average MRR could be increased by moving up
    relevant documents from lower ranks in the case
    of poorly performing queries, and by moving some
    of the big group of relevant documents at rank 2
    to rank 1 (raising a query's reciprocal rank from
    0.5 to 1.0).

9
(No Transcript)
10
(No Transcript)
11
Analysis of the problem
  • Analyzing some bad queries
  • We could improve the MRR value in all the cases
    by improving the query.
  • There were three problems that we could identify:
  • ambiguous words in the query,
  • mixtures of words of different importance in the
    query,
  • query-document information mismatch.
  • Not all the words are equally important in a
    query.
  • Not all kinds of expansion can help.

12
(No Transcript)
13
The approaches for modifying queries
  • Method 1: Giving more weight to the headwords.
    Find the headword in a given query and give it
    more weight than the other words of the query
    (see the sketch after this list).
  • Parse the query for part-of-speech (POS) tagging.
  • Find the first noun phrase using the POS
    information.
  • Consider the last noun of this noun phrase as the
    headword.
  • Give this headword more weight.
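Since the slide only outlines the steps, here is a minimal sketch of the headword-selection idea using NLTK's tagger and a simple noun-phrase chunk grammar; the tagger, the chunk pattern, and the boost value are assumptions, not the parser or weights used by the authors.

```python
# Sketch of Method 1: find the last noun of the first noun phrase ("headword")
# and give it a larger weight than the other query words.
# Requires NLTK data: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk

def headword_weights(query, boost=3.0):
    tokens = nltk.word_tokenize(query)
    tagged = nltk.pos_tag(tokens)                          # POS tagging
    grammar = "NP: {<DT>?<JJ>*<NN.*>+}"                    # simple NP chunk pattern (assumed)
    tree = nltk.RegexpParser(grammar).parse(tagged)
    weights = {t: 1.0 for t in tokens}
    for np in tree.subtrees(filter=lambda t: t.label() == "NP"):
        nouns = [w for w, tag in np.leaves() if tag.startswith("NN")]
        if nouns:
            weights[nouns[-1]] = boost                     # headword gets extra weight
            break                                          # only the first noun phrase
    return weights

print(headword_weights("airbus subsidies by european governments"))
```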

14
The approaches for modifying queries
  • Method 2: Using clarity scores as weights
  • The reason for poor retrieval is often the use of
    ambiguous words in the query.
  • To implement this idea, the authors used
    Cronen-Townsend et al.'s technique for finding
    query clarity scores.
  • The clarity score is computed as the relative
    entropy between a query language model and the
    corresponding collection language model (see the
    sketch after this slide).
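To make the relative-entropy definition concrete, here is a minimal sketch of a clarity score computed from a query language model and a collection language model; in Cronen-Townsend et al.'s method the query model is estimated from top-ranked documents, which is not shown here, and the toy probabilities are made up.

```python
# Sketch of a query clarity score: KL divergence (relative entropy, base 2)
# between a query language model P(w|Q) and the collection model P_coll(w).
import math

def clarity_score(query_model, collection_model, eps=1e-10):
    score = 0.0
    for word, p_q in query_model.items():
        p_c = collection_model.get(word, eps)    # smooth unseen words (assumed)
        score += p_q * math.log2(p_q / p_c)
    return score

# Toy models: a focused query scores well above the collection background.
q_model = {"airbus": 0.40, "subsidy": 0.35, "government": 0.25}
c_model = {"airbus": 0.0001, "subsidy": 0.0005, "government": 0.01}
print(clarity_score(q_model, c_model))   # larger value = clearer, less ambiguous query
```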

15
The approaches for modifying queries
  • Method 2: Using clarity scores as weights
  • Find query clarity scores using this technique.
  • Construct weighted queries with the clarity score
    of each word as its weight, since we want to give
    more weight to words that have high clarity
    scores.
  • Method 3: Using clarity scores to find terms to
    expand with WordNet
  • Query-dataset mismatch is another factor that
    affects the accuracy of retrieval.
  • It is not useful to expand every word of the
    given query.

16
The approaches for modifying queries
  • Method 3: Using clarity scores to find terms to
    expand with WordNet
  • Find query clarity scores.
  • Divide all the terms into the following three
    categories:
  • Terms with high clarity scores should not be
    touched.
  • Terms with very low clarity scores are likely to
    be very ambiguous; these words are ignored.
  • Expand the terms whose scores fall between the
    two clarity-score limits using WordNet synonyms
    (see the sketch after this list).
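As a sketch of the three-way split described above, the following expands only mid-clarity terms with WordNet synonyms; the low and high thresholds and the synonym cap are illustrative assumptions, and "ignored" is interpreted here as "not expanded", which is one reading of the slide.

```python
# Sketch of Method 3: leave high-clarity terms alone, skip very ambiguous
# low-clarity terms, and expand the in-between terms with WordNet synonyms.
# Requires NLTK data: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def expand_query(term_clarity, low=0.5, high=2.0, max_syns=3):
    expanded = []
    for term, clarity in term_clarity.items():
        expanded.append(term)                            # always keep the original term
        if clarity <= low or clarity >= high:
            continue                                     # too ambiguous, or already clear
        synonyms = {lemma.name().replace("_", " ")
                    for syn in wn.synsets(term)
                    for lemma in syn.lemmas()} - {term}
        expanded.extend(sorted(synonyms)[:max_syns])     # expand mid-clarity terms only
    return expanded

# Toy clarity scores: only "river" falls between the two limits and gets expanded.
print(expand_query({"bank": 0.3, "river": 1.2, "erosion": 2.5}))
```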

17
Experiments and analysis
  • More than 700,000 documents from the Tipster
    collection, taking more than 2 GB of disk space.
  • Queries were extracted from topics 51-200, making
    150 queries in total.
  • The experiments were conducted on both title
    queries and description queries.

18
Experiments and analysis
  • Title queries

19
Experiments and analysis
20
Experiments and analysis
21
Experiments and analysis
22
Experiments and analysis
  • Description queries

23
Experiments and analysis
24
Experiments and analysis
25
Experiments and analysis
26
Experiments and analysis
  • We can see in the runs for title queries that
    while we pushed more queries to rank 1, we also
    got more queries at ranks worse than 100. This
    means that while trying to improve some queries,
    we also hurt others.
  • Runs for description queries were significantly
    better.

27
Conclusion and future work
  • Our focus in the presented work was to improve
    the MRR of the first relevant document; the
    proposed techniques also helped in improving
    overall precision in many cases.
  • As one of our next steps in this research, we
    carried out experiments with relevance models.
  • We noticed that while bringing some queries up in
    the rank list, the model also drove some others
    down in the list.