Title: RETRIEVAL EVALUATION
1 Chapter 2
2 INTRODUCTION
- Evaluation is necessary
- Why evaluate?
- What to evaluate?
- How to evaluate?
3 WHY EVALUATE
- We need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.
- The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.
4 WHAT TO EVALUATE
- What can be measured should reflect the ability of the IRS to satisfy user needs:
- Coverage of the system: the extent to which the IRS includes relevant material
- Time lag: the average interval between the time the user query request is made and the time an answer set is obtained
- Form of presentation of the output
- Effort involved on the part of the user in getting answers to his/her query request
- Recall of the IRS: the proportion of relevant material actually retrieved in answer to a query request
- Precision of the IRS: the proportion of retrieved material that is actually relevant
5 HOW TO EVALUATE?
- Various methods available.
6 EVALUATION
- 2 main processes in IR:
- User query request (also called query request, information query, query retrieval strategy, or search request)
- Answer set / hits
- We need to know whether the documents retrieved in the answer set fulfil the user query request.
- This evaluation process is known as retrieval performance evaluation.
- Evaluation is based on 2 main components:
- Test reference collection
- Evaluation measure
7 EVALUATION
- A test reference collection consists of:
- A collection of documents
- A set of example information requests
- A set of relevant documents (provided by specialists) for each information request
- 2 interrelated measures: RECALL and PRECISION
8 RETRIEVAL PERFORMANCE EVALUATION
- Relevance
- Recall and Precision
- Parameters defined:
- I: an information request
- R: the set of relevant documents
- |R|: the number of documents in this set
- A: the document answer set retrieved for the information request
- |A|: the number of documents in this set
- |Ra|: the number of documents in the intersection of sets R and A
9 RETRIEVAL PERFORMANCE EVALUATION
- Recall: the fraction of the relevant documents (set R) which has been retrieved

  Recall = |Ra| / |R|

- Precision: the fraction of the retrieved documents (set A) which is relevant

  Precision = |Ra| / |A|
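As an illustration of these two definitions, here is a minimal Python sketch (the function name and document IDs are illustrative, not from the slides) that computes recall and precision from the relevant set R and the answer set A:

```python
# Minimal sketch: recall and precision for one information request.
# Document IDs below are illustrative placeholders.
def recall_precision(relevant, answer):
    """Return (recall, precision) given the relevant set R and answer set A."""
    R = set(relevant)        # relevant documents
    A = set(answer)          # retrieved documents (answer set)
    Ra = R & A               # relevant documents that were actually retrieved
    recall = len(Ra) / len(R) if R else 0.0
    precision = len(Ra) / len(A) if A else 0.0
    return recall, precision

# Example: 3 of 4 relevant documents appear in an answer set of 5.
r, p = recall_precision({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d3", "d9", "d10"})
print(f"Recall = {r:.0%}, Precision = {p:.0%}")  # Recall = 75%, Precision = 60%
```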
10 [Figure: the document collection, showing the relevant documents R, the answer set A, and the relevant documents in the answer set Ra. Precision and recall for a given example information request.]
11 RETRIEVAL PERFORMANCE EVALUATION
- Recall and precision are expressed as percentages.
- Retrieved documents are sorted by degree of relevance, i.e. by ranking.
- The user will see a ranked list.
12 RETRIEVAL PERFORMANCE EVALUATION
- a. 10 documents in an IRS with a collection of 100 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
- b. A query request was submitted and the following documents were retrieved and ranked according to relevance.
13 RETRIEVAL PERFORMANCE EVALUATION
- d123
- d84
- d56
- d6
- d8
- d9
- d511
- d129
- d187
14 RETRIEVAL PERFORMANCE EVALUATION
- c. Only 5 of the documents retrieved (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).
15
- d123 ranked 1st: R = 1/10 x 100 = 10%, P = 1/1 x 100 = 100%
- d56 ranked 3rd: R = 2/10 x 100 = 20%, P = 2/3 x 100 = 66%
- d9 ranked 6th: R = 3/10 x 100 = 30%, P = 3/6 x 100 = 50%
- d25 ranked 10th: R = 4/10 x 100 = 40%, P = 4/10 x 100 = 40%
- d3 ranked 15th: R = 5/10 x 100 = 50%, P = 5/15 x 100 = 33%
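The calculation on this slide can be reproduced with a short Python sketch. This is only an illustration: the answer-set documents at ranks 11-14 are not listed on the slides, so hypothetical placeholder IDs (dX11-dX14) stand in for those non-relevant documents.

```python
# Sketch: recall and precision at each rank where a relevant document occurs.
# dX11-dX14 are hypothetical placeholders for the unlisted non-relevant
# documents at ranks 11-14.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranked = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
          "d25", "dX11", "dX12", "dX13", "dX14", "d3"]

hits = 0
for rank, doc in enumerate(ranked, start=1):
    if doc in relevant:
        hits += 1
        recall = hits / len(relevant) * 100
        precision = hits / rank * 100
        print(f"{doc} ranked {rank}: R = {recall:.0f}%, P = {precision:.0f}%")

# Output (precision at rank 3 rounds to 67%; the slide truncates it to 66%):
# d123 ranked 1: R = 10%, P = 100%
# d56 ranked 3: R = 20%, P = 67%
# d9 ranked 6: R = 30%, P = 50%
# d25 ranked 10: R = 40%, P = 40%
# d3 ranked 15: R = 50%, P = 33%
```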
16
- A: relevant documents; Ā: non-relevant documents
- C: retrieved documents; C̄: not retrieved documents
- N: total number of documents in the system

                 Relevant   Non-relevant
  Retrieved      A ∩ C      Ā ∩ C
  Not retrieved  A ∩ C̄      Ā ∩ C̄
17 RETRIEVAL PERFORMANCE EVALUATION
- Contingency table for the example:
- N = 100
- |A| = 10, |Ā| = 90
- |C| = 15, |C̄| = 85
- Recall = 5/10 x 100 = 50%, Precision = 5/15 x 100 = 33%
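For the same example, a minimal sketch (variable names are illustrative) that fills in the four cells of the contingency table and recomputes recall and precision:

```python
# Sketch: contingency-table cells for the worked example.
# N = 100 documents in total, |A| = 10 relevant, |C| = 15 retrieved,
# |Ra| = 5 documents that are both relevant and retrieved.
N, A, C, Ra = 100, 10, 15, 5

cells = {
    ("retrieved", "relevant"): Ra,                      # A ∩ C
    ("retrieved", "non-relevant"): C - Ra,              # Ā ∩ C
    ("not retrieved", "relevant"): A - Ra,              # A ∩ C̄
    ("not retrieved", "non-relevant"): N - A - C + Ra,  # Ā ∩ C̄
}
assert sum(cells.values()) == N   # the four cells cover the whole collection

recall = Ra / A * 100             # 50.0
precision = Ra / C * 100          # 33.3...
print(cells)
print(f"Recall = {recall:.0f}%, Precision = {precision:.0f}%")
```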
18 OTHER ALTERNATIVE MEASURES
- Harmonic mean: a single measure which combines R and P (see the sketch after this list).
- E measure: a single measure which combines R and P; the user specifies whether he/she is more interested in R or in P.
- User-oriented measures: based on the user's interpretation of which documents are relevant and which are not.
- Expected search length
- Satisfaction: focuses only on relevant docs
- Frustration: focuses on non-relevant docs
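A minimal sketch of the two combined measures, assuming the usual textbook formulas F = 2 / (1/R + 1/P) for the harmonic mean and E = 1 - (1 + b^2) / (b^2/R + 1/P) for the E measure (with b = 1, E reduces to 1 - F):

```python
# Sketch of the combined measures, assuming the standard textbook formulas:
#   harmonic mean  F = 2 / (1/R + 1/P)
#   E measure      E = 1 - (1 + b**2) / (b**2 / R + 1 / P)
# The parameter b weights recall against precision; b = 1 gives E = 1 - F.
def harmonic_mean(recall, precision):
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)

def e_measure(recall, precision, b=1.0):
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)

# Using the recall/precision from the earlier example: R = 0.50, P = 0.33.
print(round(harmonic_mean(0.50, 0.33), 2))   # 0.4
print(round(e_measure(0.50, 0.33, b=1), 2))  # 0.6
```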
19 REFERENCE COLLECTION
- Experimentation in IR is done on test collections. Examples of test collections:
- 1. The yearly conference known as TREC (Text REtrieval Conference)
- Dedicated to experimentation with a large test collection of over 1 million documents; testing is time consuming.
- For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRS.
- TREC NIST site: http://trec.nist.gov
21 REFERENCE COLLECTION
- Collection known as TIPSTER
- The TIPSTER/TREC test collection
- The collection is composed of:
- Documents
- A set of example information requests or topics
- A set of relevant documents for each example information request
22 OTHER TEST COLLECTIONS
- ADI: documents on information science
- CACM: computer science
- INSPEC: abstracts on electronics, computing and physics
- ISI: library science
- Medlars: medical articles
- Developed by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York in 1983, "Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types", http://www.ncstrl.org