Title: RETRIEVAL EVALUATION
1 Chapter 2
2 INTRODUCTION
- Evaluation is necessary
- Why evaluate?
- What to evaluate?
- How to evaluate?
3 WHY EVALUATE
- We need to know the advantages and disadvantages of using a particular IRS. The user should be able to decide whether he/she wants to use an IRS based on evaluation results.
- The user should also be able to decide whether it is cost-effective to use a particular IRS based on evaluation results.
4 WHAT TO EVALUATE
- What can be measured should reflect the ability of the IRS to satisfy user needs:
- Coverage of the system: the extent to which the IRS includes relevant material
- Time lag: the average interval between the time the user query request is made and the time an answer set is obtained
- Form of presentation of the output
- Effort involved on the part of the user in getting answers to his/her query request
- Recall of the IRS: the proportion of relevant material actually retrieved in answer to a query request
- Precision of the IRS: the proportion of retrieved material that is actually relevant
5 HOW TO EVALUATE?
- Various methods available.
6 EVALUATION
- 2 main processes in IR:
- User query request (also called query request, information query, query retrieval strategy, or search request)
- Answer set / hits
- We need to know whether the documents retrieved in the answer set fulfil the user query request.
- This evaluation process is known as retrieval performance evaluation.
- Evaluation is based on 2 main components:
- Test reference collection
- Evaluation measure
7 EVALUATION
- A test reference collection consists of:
- A collection of documents
- A set of example information requests
- A set of relevant documents (provided by specialists) for each information request
- 2 interrelated measures: RECALL and PRECISION
8 RETRIEVAL PERFORMANCE EVALUATION
- Relevance
- Recall and Precision
- Parameters defined:
- I: an information request
- R: the set of relevant documents
- |R|: the number of documents in this set
- A: the document answer set retrieved for the information request
- |A|: the number of documents in this set
- |Ra|: the number of documents in the intersection of sets R and A
9 RETRIEVAL PERFORMANCE EVALUATION
- Recall: the fraction of the relevant documents (set R) which has been retrieved

  Recall = |Ra| / |R|

- Precision: the fraction of the retrieved documents (set A) which is relevant

  Precision = |Ra| / |A|
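As an illustration of these two definitions, here is a minimal Python sketch (the function name and document IDs are illustrative, not from the slides) that computes recall and precision from the relevant set R and the answer set A:

```python
# Minimal sketch: recall and precision for one information request.
# Document IDs below are illustrative placeholders.
def recall_precision(relevant, answer):
    """Return (recall, precision) given the relevant set R and answer set A."""
    R = set(relevant)        # relevant documents
    A = set(answer)          # retrieved documents (answer set)
    Ra = R & A               # relevant documents that were actually retrieved
    recall = len(Ra) / len(R) if R else 0.0
    precision = len(Ra) / len(A) if A else 0.0
    return recall, precision

# Example: 3 of 4 relevant documents appear in an answer set of 5.
r, p = recall_precision({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d3", "d9", "d10"})
print(f"Recall = {r:.0%}, Precision = {p:.0%}")  # Recall = 75%, Precision = 60%
```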
10 [Figure: the document collection, showing the relevant documents R, the answer set A, and the relevant documents in the answer set Ra. Precision and recall for a given example information request.]
11 RETRIEVAL PERFORMANCE EVALUATION
- Recall and precision are expressed as percentages.
- Retrieved documents are sorted by degree of relevance, i.e. by ranking.
- The user will see a ranked list.
12 RETRIEVAL PERFORMANCE EVALUATION
- a. 10 documents in an IRS with a collection of 100 documents have been identified by specialists as being relevant to a particular query request: d3, d5, d9, d25, d39, d44, d56, d71, d89, d123
- b. A query request was submitted and the following documents were retrieved and ranked according to relevance.
13 RETRIEVAL PERFORMANCE EVALUATION
- d123
- d84
- d56
- d6
- d8
- d9
- d511
- d129
- d187
14 RETRIEVAL PERFORMANCE EVALUATION
- c. Only 5 of the documents retrieved (d123, d56, d9, d25, d3) are relevant to the query and match the ones in (a).
15
- d123 ranked 1st: R = 1/10 x 100 = 10%, P = 1/1 x 100 = 100%
- d56 ranked 3rd: R = 2/10 x 100 = 20%, P = 2/3 x 100 = 66%
- d9 ranked 6th: R = 3/10 x 100 = 30%, P = 3/6 x 100 = 50%
- d25 ranked 10th: R = 4/10 x 100 = 40%, P = 4/10 x 100 = 40%
- d3 ranked 15th: R = 5/10 x 100 = 50%, P = 5/15 x 100 = 33%
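The calculation on this slide can be reproduced with a short Python sketch. This is only an illustration: the answer-set documents at ranks 11-14 are not listed on the slides, so hypothetical placeholder IDs (dX11-dX14) stand in for those non-relevant documents.

```python
# Sketch: recall and precision at each rank where a relevant document occurs.
# dX11-dX14 are hypothetical placeholders for the unlisted non-relevant
# documents at ranks 11-14.
relevant = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
ranked = ["d123", "d84", "d56", "d6", "d8", "d9", "d511", "d129", "d187",
          "d25", "dX11", "dX12", "dX13", "dX14", "d3"]

hits = 0
for rank, doc in enumerate(ranked, start=1):
    if doc in relevant:
        hits += 1
        recall = hits / len(relevant) * 100
        precision = hits / rank * 100
        print(f"{doc} ranked {rank}: R = {recall:.0f}%, P = {precision:.0f}%")

# Output (precision at rank 3 rounds to 67%; the slide truncates it to 66%):
# d123 ranked 1: R = 10%, P = 100%
# d56 ranked 3: R = 20%, P = 67%
# d9 ranked 6: R = 30%, P = 50%
# d25 ranked 10: R = 40%, P = 40%
# d3 ranked 15: R = 50%, P = 33%
```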
16
- A: relevant documents; Ā: non-relevant documents
- C: retrieved documents; C̄: not retrieved documents
- N: total number of documents in the system

                 Relevant   Non-relevant
  Retrieved      A ∩ C      Ā ∩ C
  Not retrieved  A ∩ C̄      Ā ∩ C̄
17 RETRIEVAL PERFORMANCE EVALUATION
- Contingency table for the example:
- N = 100
- |A| = 10, |Ā| = 90
- |C| = 15, |C̄| = 85
- Recall = 5/10 x 100 = 50%, Precision = 5/15 x 100 = 33%
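For the same example, a minimal sketch (variable names are illustrative) that fills in the four cells of the contingency table and recomputes recall and precision:

```python
# Sketch: contingency-table cells for the worked example.
# N = 100 documents in total, |A| = 10 relevant, |C| = 15 retrieved,
# |Ra| = 5 documents that are both relevant and retrieved.
N, A, C, Ra = 100, 10, 15, 5

cells = {
    ("retrieved", "relevant"): Ra,                      # A ∩ C
    ("retrieved", "non-relevant"): C - Ra,              # Ā ∩ C
    ("not retrieved", "relevant"): A - Ra,              # A ∩ C̄
    ("not retrieved", "non-relevant"): N - A - C + Ra,  # Ā ∩ C̄
}
assert sum(cells.values()) == N   # the four cells cover the whole collection

recall = Ra / A * 100             # 50.0
precision = Ra / C * 100          # 33.3...
print(cells)
print(f"Recall = {recall:.0f}%, Precision = {precision:.0f}%")
```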
18 OTHER ALTERNATIVE MEASURES
- Harmonic mean: a single measure which combines R and P (see the sketch after this list).
- E measure: a single measure which combines R and P; the user specifies whether he/she is more interested in R or in P.
- User-oriented measures: based on the user's interpretation of which documents are relevant and which are not.
- Expected search length
- Satisfaction: focuses only on relevant docs
- Frustration: focuses on non-relevant docs
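A minimal sketch of the two combined measures, assuming the usual textbook formulas F = 2 / (1/R + 1/P) for the harmonic mean and E = 1 - (1 + b^2) / (b^2/R + 1/P) for the E measure (with b = 1, E reduces to 1 - F):

```python
# Sketch of the combined measures, assuming the standard textbook formulas:
#   harmonic mean  F = 2 / (1/R + 1/P)
#   E measure      E = 1 - (1 + b**2) / (b**2 / R + 1 / P)
# The parameter b weights recall against precision; b = 1 gives E = 1 - F.
def harmonic_mean(recall, precision):
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)

def e_measure(recall, precision, b=1.0):
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)

# Using the recall/precision from the earlier example: R = 0.50, P = 0.33.
print(round(harmonic_mean(0.50, 0.33), 2))   # 0.4
print(round(e_measure(0.50, 0.33, b=1), 2))  # 0.6
```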
19 REFERENCE COLLECTION
- Experimentation in IR is done on test collections. Examples of test collections:
- 1. The yearly conference known as TREC (Text REtrieval Conference)
- Dedicated to experimentation with a large test collection of over 1 million documents; testing is time consuming.
- For each TREC conference, a set of reference experiments is designed, and research groups use these reference experiments to compare their IRS.
- TREC NIST site: http://trec.nist.gov
21 REFERENCE COLLECTION
- Collection known as TIPSTER
- The TIPSTER/TREC test collection
- The collection is composed of:
- Documents
- A set of example information requests or topics
- A set of relevant documents for each example information request
22 OTHER TEST COLLECTIONS
- ADI: documents on information science
- CACM: computer science
- INSPEC: abstracts on electronics, computing and physics
- ISI: library science
- Medlars: medical articles
- Developed by E.A. Fox for his PhD thesis at Cornell University, Ithaca, New York in 1983, "Extending the Boolean and vector space models of information retrieval with p-norm queries and multiple concept types", http://www.ncstrl.org