Title: RETRIEVAL EVALUATION


1
Chapter 2
  • RETRIEVAL EVALUATION

2
INTRODUCTION
  • Evaluation is necessary
  • Why evaluate?
  • What to evaluate?
  • How to evaluate?

3
WHY EVALUATE
  • We need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.
  • The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.

4
WHAT TO EVALUATE
  • What is measured should reflect the ability of the IRS to satisfy user needs.
  • Coverage of the system - to what extent the IRS includes relevant material.
  • Time lag - the average interval between the time the user query request is made and the time the answer set is obtained.
  • Form of presentation of the output.
  • Effort involved on the part of the user in getting answers to his/her query request.
  • Recall of the IRS - the fraction of relevant material actually retrieved in the answer to a query request.
  • Precision of the IRS - the fraction of retrieved material that is actually relevant.

5
HOW TO EVALUATE?
  • Various methods available.

6
EVALUATION
  • There are 2 main elements in IR:
  • The user query request (also called query request, information query, query retrieval strategy, or search request)
  • The answer set (hits)
  • We need to know whether the documents retrieved in the answer set fulfil the user query request.
  • This evaluation process is known as retrieval performance evaluation.
  • Evaluation is based on 2 main components:
  • A test reference collection
  • An evaluation measure

7
EVALUATION
  • A test reference collection consists of:
  • A collection of documents
  • A set of example information requests
  • A set of relevant documents (provided by specialists) for each information request
  • 2 interrelated measures are used: RECALL and PRECISION (a sketch of such a collection follows below)
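As an illustration, here is a minimal sketch of how such a test reference collection could be represented, assuming Python; the document IDs, request, and relevance judgments are hypothetical and only show the structure described above.

```python
# Hypothetical sketch of a test reference collection: documents, example
# information requests, and specialist relevance judgments per request.
from dataclasses import dataclass

@dataclass
class TestReferenceCollection:
    documents: dict[str, str]      # doc_id -> document text
    requests: dict[str, str]       # request_id -> example information request
    relevant: dict[str, set[str]]  # request_id -> doc_ids judged relevant by specialists

collection = TestReferenceCollection(
    documents={"d1": "text of document d1", "d2": "text of document d2"},
    requests={"q1": "an example information request"},
    relevant={"q1": {"d2"}},
)
print(collection.relevant["q1"])   # {'d2'}
```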

8
RETRIEVAL PERFORMANCE EVALUATION
  • Relevance
  • Recall and Precision
  • Parameters defined:
  • I - an information request
  • R - the set of documents relevant to I
  • |R| - the number of documents in this set
  • A - the document answer set retrieved for the information request
  • |A| - the number of documents in this set
  • |Ra| - the number of documents in the intersection of sets R and A (Ra = R ∩ A)

9
RETRIEVAL PERFORMANCE EVALUATION
  • Recall - the fraction of the relevant documents (set R) which have been retrieved:
  • Recall = |Ra| / |R|
  • Precision - the fraction of the retrieved documents (set A) which are relevant:
  • Precision = |Ra| / |A|
  • (a code sketch of both measures follows below)
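As a concrete illustration of the two formulas, a minimal Python sketch; the sets R and A below are hypothetical.

```python
# Recall = |Ra| / |R| and Precision = |Ra| / |A|, where Ra = R ∩ A.
def recall(relevant: set, answer: set) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return len(relevant & answer) / len(relevant)

def precision(relevant: set, answer: set) -> float:
    """Fraction of the retrieved documents that are relevant."""
    return len(relevant & answer) / len(answer)

R = {"d3", "d5", "d9"}            # relevant documents (judged by specialists)
A = {"d3", "d9", "d20", "d31"}    # answer set returned by the IRS
print(recall(R, A))               # 2/3 ≈ 0.67
print(precision(R, A))            # 2/4 = 0.5
```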

10
(Figure: the document collection, showing the relevant documents R, the answer set A, and the relevant documents in the answer set Ra - precision and recall for a given example information request)
11
RETRIEVAL PERFORMANCE EVALUATION
  • Recall and precision are expressed as percentages.
  • The answer set is sorted by degree of relevance, i.e. ranked.
  • The user will see a ranked list.

12
RETRIEVAL PERFORMANCE EVALUATION
  • a. In an IRS with a collection of 100 documents, 10 documents have been identified by specialists as being relevant to a particular query request - d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
  • b. A query request was submitted and the following documents were retrieved, ranked according to relevance.

13
RETRIEVAL PERFORMANCE EVALUATION
  1. d123
  2. d84
  3. d56
  4. d6
  5. d8
  6. d9
  7. d511
  8. d129
  9. d187
  10. d25
  11. d38
  12. d48
  13. d250
  14. d113
  15. d3

14
RETRIEVAL PERFORMANCE EVALUATION
  • c. Only 5 of the retrieved documents (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).

15
  • d123, ranked 1st: R = 1/10 × 100 = 10%, P = 1/1 × 100 = 100%
  • d56, ranked 3rd: R = 2/10 × 100 = 20%, P = 2/3 × 100 ≈ 66%
  • d9, ranked 6th: R = 3/10 × 100 = 30%, P = 3/6 × 100 = 50%
  • d25, ranked 10th: R = 4/10 × 100 = 40%, P = 4/10 × 100 = 40%
  • d3, ranked 15th: R = 5/10 × 100 = 50%, P = 5/15 × 100 ≈ 33%
  • (the code sketch below reproduces these figures)
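The following Python sketch reproduces the figures above from the relevant set in (a) and the ranked list in (b); note that the slide truncates 66.7% to 66% and 33.3% to 33%.

```python
# Recall and precision at each rank where a relevant document appears.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranking = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
           "d187", "d25", "d38", "d48", "d250", "d113", "d3"]

found = 0
for rank, doc in enumerate(ranking, start=1):
    if doc in relevant:
        found += 1
        r = found / len(relevant) * 100   # recall after this rank
        p = found / rank * 100            # precision after this rank
        print(f"{doc} ranked {rank}: R = {r:.1f}%, P = {p:.1f}%")
```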

16
  • Contingency table, where A = relevant documents, Ā = non-relevant documents, C = retrieved documents, C̄ = not retrieved documents, and N = total number of documents in the system:

                  Relevant   Non-relevant
  Retrieved       A ∩ C      Ā ∩ C
  Not retrieved   A ∩ C̄      Ā ∩ C̄

17
RETRIEVAL PERFORMANCE EVALUATION
  • Contingency table values for the worked example:
  • N = 100
  • |A| = 10, |Ā| = 90
  • |C| = 15, |C̄| = 85

Recall = 5/10 × 100 = 50%, Precision = 5/15 × 100 ≈ 33% (the sketch below computes the full table)
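A minimal sketch of how the four contingency-table counts and the resulting recall and precision follow from the sets in the worked example (slides 12 and 13):

```python
# A = relevant documents, C = retrieved documents, N = collection size.
N = 100
A = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
C = {"d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129",
     "d187", "d25", "d38", "d48", "d250", "d113", "d3"}

a_and_c = len(A & C)                      # relevant and retrieved         -> 5
notA_and_c = len(C) - a_and_c             # non-relevant but retrieved     -> 10
a_and_notC = len(A) - a_and_c             # relevant but not retrieved     -> 5
notA_and_notC = N - len(A) - notA_and_c   # neither relevant nor retrieved -> 80

recall = a_and_c / len(A) * 100           # 5/10 x 100 = 50%
precision = a_and_c / len(C) * 100        # 5/15 x 100 ≈ 33%
print(a_and_c, notA_and_c, a_and_notC, notA_and_notC, recall, precision)
```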
18
OTHER ALTERNATIVE MEASURES
  • Harmonic mean - a single measure which combines R and P
  • E measure - a single measure which combines R and P; the user specifies whether he/she is more interested in R or in P
  • User-oriented measures - based on the user's interpretation of which documents are relevant and which are not relevant
  • Expected search length
  • Satisfaction - focuses only on relevant docs
  • Frustration - focuses on non-relevant docs
  • (a sketch of the harmonic mean and E measure follows below)
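The slide names these combined measures without giving formulas; below is a sketch using the standard textbook formulations of the harmonic mean and the E measure (an assumption, since the presentation does not spell them out), applied to the R = 50%, P = 33% values from the worked example.

```python
# Harmonic mean F and E measure of recall R and precision P (on a 0..1 scale).
def harmonic_mean(r: float, p: float) -> float:
    """F = 2 / (1/R + 1/P): a single value combining recall and precision."""
    return 2 / (1 / r + 1 / p) if r > 0 and p > 0 else 0.0

def e_measure(r: float, p: float, b: float = 1.0) -> float:
    """E = 1 - (1 + b^2) / (b^2/R + 1/P); the weight b expresses whether the
    user is more interested in R or in P (b = 1 gives E = 1 - harmonic mean)."""
    return 1 - (1 + b ** 2) / (b ** 2 / r + 1 / p)

print(harmonic_mean(0.50, 0.33))   # ≈ 0.40
print(e_measure(0.50, 0.33))       # ≈ 0.60
```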

19
REFERENCE COLLECTION
  • Experimentation in IR is done on test collections. Examples of test collections:
  • 1. The yearly conference known as TREC (Text REtrieval Conference)
  • Dedicated to experimentation with a large test collection of over 1 million documents; testing is time consuming.
  • For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRSs.
  • TREC NIST site: http://trec.nist.gov

21
REFERENCE COLLECTION
  • Collection known as TIPSTER
  • TIPSTER/TREC test collection
  • Collection composed of
  • Documents
  • A set of example information requests or topics
  • A set of relevant documents for each example
    information request

22
OTHER TEST COLLECTIONS
  • ADI - documents on information science
  • CACM - computer science
  • INSPEC - abstracts on electronics, computers and physics
  • ISI - library science
  • Medlars - medical articles
  • Developed by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York, in 1983: Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types (http://www.ncstrl.org)