Title: 2ID10: Information Retrieval Lecture 2: IR Evaluation
2ID10 Information Retrieval
Lecture 2: IR Evaluation & Queries
Lecture 1 Summary
Compare the information need with the information in the documents and generate a ranking which reflects relevance.
[Diagram: Information Need → IR System → Ranked list of documents, with a feedback loop back to the information need]
Lecture 1 Summary
- IR Classic Models
- Document Representation
- Query representation
- Indexing
- Weighting and Similarity
- TF-IDF
Lecture 2 Overview
- Types of evaluation
- Relevance and test collections
- Effectiveness measures
- Recall and Precision
- Significance tests
- Query languages
Types of IR Evaluation
- Assistance in formulating queries
- Speed of retrieval
- Resources required
- Presentation of documents
- Ability to find relevant documents
- Appealing to users (market evaluation)
- Evaluation is generally comparative
  - System A vs. System B, or two versions of system A
- Cost-benefit analysis is possible
- Most common evaluation: retrieval effectiveness
IR Evaluation
- Functional analysis
  - test each system functionality (includes error analysis)
- Performance analysis
  - response time and space required (balance/trade-offs)
  - shorter response time and smaller space used → better system
- Performance evaluation
  - performance of indexing structures, OS interactions, delays
- Retrieval performance evaluation
  - how precise is the answer set?
  - for a given retrieval strategy S: similarity between retrieved docs and expert-selected docs → goodness of S
IR Evaluation
- Effectiveness
  - the ability of an IR system to retrieve relevant documents and suppress non-relevant documents
  - related to the relevancy of the retrieved items
- Relevancy
  - typically not binary
  - Subjective: depends upon a specific user's judgment
  - Situational: relates to the user's current needs
  - Cognitive: depends on human perception and behavior
  - Dynamic: changes over time
Relevancy
- Relevant (or not relevant) according to the user
- Relevant (or not relevant) according to the system
- Four main situations
  - User: relevant, System: not relevant
  - User: not relevant, System: relevant
  - User: not relevant, System: not relevant
  - User: relevant, System: relevant
Relevancy Aspects
- Logical relevancy
  - Bosch (the trademark) vs. Den Bosch (the city)
- Usability
  - date and origin of the document
  - format of the document
  - other users
Test Collection
- Real collections
  - we never know the full set of relevant documents
- Compare retrieval performance with a test collection
  - a set of documents
  - a set of queries
  - a set of relevance judgments (which docs are relevant to each query)
Test Collections
- To compare the performance of two techniques
  - each technique is used to run the test queries
  - the results (a set or a ranked list) are compared using some performance measure
  - the most common measures are precision and recall
- Usually multiple measures are used, to get different views of performance
- Usually multiple collections are used, since performance is collection dependent
Sample Test Collection
Test Collection Creation
- Manual method
  - every document is judged against every query by experts
- Pooling method
  - the queries are run against several IR systems first
  - the results are pooled and the top proportion is chosen for judging
  - only top documents are judged
Text REtrieval Conference (TREC)
- Established in 1992 to evaluate large-scale IR
  - retrieving documents from a gigabyte collection
- Run by NIST's Information Access Division
- Initially sponsored by DARPA as part of the TIPSTER program
- Now supported by many, including DARPA, ARDA, and NIST
- The most well-known IR evaluation setting
- Proceedings available at http://trec.nist.gov
Text REtrieval Conference (TREC)
- Consists of IR research tracks
  - ad-hoc retrieval, routing, cross-language, scanned documents, speech recognition, query, video, filtering, Spanish, question answering, novelty, Chinese, high precision, interactive, Web, database merging, NLP, ...
- Each track works on roughly the same model
  - NIST carries out the evaluation
  - you learn how well your site did
  - and how others tackled the problem
- Successful approaches are generally adopted in the next cycle
Lecture 2 Overview
- Types of evaluation
- Relevance and test collections
- Effectiveness measures
- Recall and Precision
- Significance tests
- Query languages
Precision & Recall
The purpose of every IR system is to retrieve relevant information.
Query Match
- Match: a retrieved document satisfying (relevant to) the information need
  - character strings in the descriptor and the query keywords match
- Miss: a document satisfying (relevant to) the information need that is not retrieved
  - character strings in the descriptor and the query keywords do not match (although they are semantically similar)
- False match: a retrieved document which satisfies the query but is not relevant to the information need
  - character strings in the descriptor and the query keywords match but are semantically different
Retrieval Evaluation Setting
- Q: a query
- R: the set of relevant documents
- |R|: the number of relevant documents
- S(Q) → A: the answer set
- |A|: the number of documents in the answer set
- Ra: the relevant documents in the answer set
- |Ra|: the number of docs in R ∩ A
[Venn diagram: the relevant docs R and the answer set A overlap in Ra, the relevant documents in the answer set]
Precision
- The fraction of the retrieved documents (A) which are relevant
- High precision
  - when there are relatively few false matches
- Can be determined exactly

  Precision = #(System Yes ∧ User Yes) / [#(System Yes ∧ User No) + #(System Yes ∧ User Yes)]

  Precision = relevant documents retrieved / all documents retrieved = |Ra| / |A|
Recall
- The fraction of the relevant documents (R) which are retrieved (a sketch of both measures follows below)
- High recall
  - when there are relatively few misses
- Cannot be determined exactly: requires knowledge of all relevant documents in the collection

  Recall = #(System Yes ∧ User Yes) / [#(User Yes ∧ System No) + #(System Yes ∧ User Yes)]

  Recall = relevant documents retrieved / all relevant documents = |Ra| / |R|
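A minimal sketch of computing both measures from document-id sets; the sets A and R below are made-up illustrative data, not taken from the slides.

```python
def precision_recall(answer_set, relevant_set):
    """Precision = |Ra| / |A|, Recall = |Ra| / |R|, where Ra = A ∩ R."""
    ra = answer_set & relevant_set
    precision = len(ra) / len(answer_set) if answer_set else 0.0
    recall = len(ra) / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical answer set A and relevant set R for one query
A = {"d1", "d2", "d3", "d4"}
R = {"d1", "d3", "d5", "d7", "d8", "d9"}
print(precision_recall(A, R))  # (0.5, 0.333...): half of A is relevant, a third of R was found
```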
Determining Recall is Difficult
- The total number of relevant items is often not available; two workarounds:
  - sample across the database and perform relevance judgments on these items
  - apply different retrieval algorithms to the same database for the same query; the aggregate of relevant items is taken as the total relevant set
Trade-off between Recall and Precision
We aim to obtain the highest value for both.
- an IR system trying to increase the number of relevant docs retrieved will also retrieve increasing numbers of non-relevant docs
- efforts to increase one measure tend to decrease the other
Computing Recall/Precision Points
- For a given query
  - produce the ranked list of retrievals
- Adjust a threshold on this ranked list
  - this produces different sets of retrieved documents
  - and therefore different recall/precision measures
- Mark each document in the ranked list that is relevant
- Compute a recall/precision pair for each position in the ranked list that contains a relevant document
Computing Example
Let the total number of relevant docs be 6, and check each new recall point (a runnable sketch follows below):
- R = 1/6 = 0.167, P = 1/1 = 1.0
- R = 2/6 = 0.333, P = 2/2 = 1.0
- R = 3/6 = 0.5,   P = 3/4 = 0.75
- R = 4/6 = 0.667, P = 4/6 = 0.667
- R = 5/6 = 0.833, P = 5/13 = 0.38
One relevant document is never retrieved, so 100% recall is never reached.
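A minimal sketch of the procedure from the previous slide, reproducing the numbers of this example; the ranked list is encoded as a boolean relevance flag per rank.

```python
def recall_precision_points(ranked_relevance, total_relevant):
    """Emit a (recall, precision) pair at every rank that holds a relevant document."""
    points, hits = [], 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points

# Relevant documents appear at ranks 1, 2, 4, 6 and 13; 6 documents are relevant in total,
# so one relevant document is never retrieved.
ranking = [rank in (1, 2, 4, 6, 13) for rank in range(1, 14)]
for r, p in recall_precision_points(ranking, total_relevant=6):
    print(f"R = {r:.3f}  P = {p:.2f}")
```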
Example
- http://www.googlewhack.com/
  - find that elusive query (two words, no quote marks) with a single, solitary result!
- http://www.webology.ir/2005/v2n2/a12.html
  - a comparison of precision and recall in search engines
Low Recall Solutions
- Words exist in several forms
  - e.g. limit, limits, limited, limitation
- Stemming to increase recall (see the sketch below)
  - suffix removal allows word variants to match
  - e.g. word roots often precede modifiers
  - Boolean systems often allow manual truncation
  - stemming does automatic truncation
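A deliberately crude suffix-stripping sketch (a real stemmer such as Porter's is far more careful); the suffix list is an assumption for illustration only.

```python
# Crude suffix stripping: conflate word variants to one index term.
SUFFIXES = ("ations", "ation", "ing", "ed", "s")  # longest suffixes first

def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["limit", "limits", "limited", "limitation"]])
# -> ['limit', 'limit', 'limit', 'limit']
```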
Low Recall Solutions
- Synonymy
  - many words with similar meanings
  - Synonym(w1, w2) ≡ ∃m: (w1 Means m) ∧ (w2 Means m)
  - recall increased by
    - thesaurus-based query expansion (see the sketch below)
    - latent semantic indexing
- Polysemy
  - one word has dissimilar meanings
  - PolySem(w) ≡ ∃m1 ≠ m2: (w Means m1) ∧ (w Means m2)
  - recall increased by word sense disambiguation
    - indexing word meanings rather than words
    - context provides clues to word meaning
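A minimal sketch of thesaurus-based query expansion; the tiny thesaurus is a made-up stand-in for a real resource such as WordNet.

```python
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "image": {"picture", "photo"},
}

def expand_query(terms):
    """Add synonyms of each query term so documents using different wording still match."""
    expanded = set(terms)
    for term in terms:
        expanded |= THESAURUS.get(term, set())
    return expanded

print(expand_query(["car", "image"]))
# -> {'car', 'automobile', 'vehicle', 'image', 'picture', 'photo'} (set order may vary)
```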
Query Languages (QL)
- Which queries can be formulated
  - dependent on the underlying IR model
- Use content (semantics) and content structure (text syntax) to find relevant documents
- Query enhancement techniques
  - e.g. synonyms, thesauri, stemming, etc.
- Query
  - a formulation of the user's info need
  - words or combinations of words and operations
Keyword-based Querying
- Keywords
  - contained in documents
- Retrieval unit
  - the retrieved document
  - contains the answer to the query
- Intuitive
- Easy to express
- Allows for fast ranking
- Basic queries (single or multiple words)
Single-word Queries
- Text documents → search for the keywords
- The set of docs is ranked according to the degree of similarity to the query
- Ranking
  - word occurrences inside the text
  - term frequency counts the number of times a word appears inside a document (see the sketch below)
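A minimal term-frequency sketch with naive whitespace tokenization; the sample sentence is invented for illustration.

```python
from collections import Counter

def term_frequency(document_text):
    """Count how many times each word appears inside a document (naive tokenization)."""
    return Counter(document_text.lower().split())

tf = term_frequency("information retrieval ranks documents by how well they match an information need")
print(tf["information"], tf["retrieval"])  # 2 1
```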
Context Queries
- Complement single-word queries with a search in context: words which are near to other words
- Phrase context query
  - a sequence of single-word queries
- Proximity context query
  - a more relaxed version of the phrase query
  - a sequence of single-word queries with a maximum allowed distance between them
  - distance in characters or words
Examples: Context Queries
- Phrase query: "information retrieval"
- Proximity matches of "information ... retrieval" (see the sketch below)
  - "information about retrieval" (distance 1)
  - "information with respect to the retrieval" (distance 4)
- Ranking is similar to single-word queries
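A rough sketch of proximity matching for a two-term query, where the distance is the number of intervening words; the helper name and the tolerance convention are assumptions.

```python
def proximity_match(text, first, second, max_distance):
    """True if `first` and `second` occur in order with at most `max_distance`
    words between them (max_distance=0 behaves like an exact phrase)."""
    words = text.lower().split()
    for i, word in enumerate(words):
        if word == first:
            # look ahead over at most max_distance intervening words
            if second in words[i + 1 : i + 2 + max_distance]:
                return True
    return False

print(proximity_match("information about retrieval", "information", "retrieval", 1))                # True
print(proximity_match("information with respect to the retrieval", "information", "retrieval", 1))  # False
print(proximity_match("information with respect to the retrieval", "information", "retrieval", 4))  # True
```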
Boolean Queries
- Oldest form of keyword query
  - words + operators
  - atoms (basic queries) + Boolean operators
  - A or B, A and B, A not B
- Query syntax tree
  [Figure: syntax tree with AND at the root and an OR subtree over the terms white, paper and chocolate]
Boolean Query Mechanics
- Basic query
  - Find X → return all documents containing term X
  - X: single words or phrases
  - simple text or string matching
- Complex query
  - Boolean connectors: and, or, not
Boolean IR
- Boolean operators approximate natural language
  - e.g. find documents about colour printers that are not made by Hewlett-Packard (see the sketch below)
- AND can denote relationships between concepts
  - e.g. colour AND printer
- OR can denote alternate terminology
  - e.g. colour AND (printer OR laser-printer)
- NOT can exclude alternate meanings
  - e.g. colour AND (printer OR laser-printer) NOT (Hewlett-Packard OR HP)
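A minimal sketch of evaluating the last query over an inverted index of document-id sets; the index contents are made-up illustrative data.

```python
# Hypothetical inverted index: term -> set of document ids containing it
index = {
    "colour":          {1, 2, 3, 5},
    "printer":         {1, 3, 4},
    "laser-printer":   {2, 6},
    "hewlett-packard": {3},
    "hp":              {3, 4},
}

def docs(term):
    return index.get(term, set())

# colour AND (printer OR laser-printer) NOT (Hewlett-Packard OR HP)
result = (docs("colour") & (docs("printer") | docs("laser-printer"))) - (docs("hewlett-packard") | docs("hp"))
print(result)  # -> {1, 2}
```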
Google Search
- Google basic search
  - http://www.google.com/help/basics.html
- Google advanced search
  - http://www.google.com/help/refinesearch.html
Natural Language Queries
- An enumeration of words and context queries
- All docs matching a portion of the query are retrieved
- Higher ranking for docs matching more parts of the query
- Negation: the user marks words to be eliminated → lower ranking
- Threshold to discard docs ranked too low
- Boolean queries are a simplified version of NL queries
- Vector of term weights (for both doc and query)
Pattern Matching
- More specific query formulation
- Based on the concept of a pattern
  - a set of syntactic features that occur in a text segment
  - segments that fulfil the pattern specification are pattern matches
- Retrieve pieces of text that have some property
- Useful for linguistics, text statistics, data extraction
- Pattern types
  - words, prefixes, suffixes, substrings, ranges, errors, regular expressions, extended patterns
Examples: Pattern Matching
- Words: a string (sequence of chars)
- Prefixes: program → programmer
- Suffixes: er → computer, monster, poster
- Substrings: tal → coastal, talk, metallic
  - "any flow" → will match "many flowers"
- Ranges: a pair of strings which matches any word lying between them in lexicographical order, e.g. the range between the words "held" and "hold" will retrieve strings such as hoax, hissing, helm, help, etc. (see the sketch below)
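A small sketch of a range pattern over a made-up word list: every word lying lexicographically between the two bounds matches.

```python
def range_match(words, low, high):
    """Return the words that lie between `low` and `high` in lexicographical order."""
    return [w for w in words if low <= w <= high]

print(range_match(["hoax", "hissing", "helm", "help", "hammer", "house"], "held", "hold"))
# -> ['hoax', 'hissing', 'helm', 'help']
```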
Examples: Pattern Matching
- Allowing errors
  - a word together with an error threshold
  - retrieves all text words similar to the given word
  - errors are caused by typing, spelling, etc.
  - the most accepted model is the Levenshtein distance, or edit distance
Examples: Pattern Matching
- Regular expression: a general pattern built up from simple strings and operators (union, concatenation, repetition)
  - pro(blem | tein)(s | ε)(0 | 1 | 2)*
  - will match words like (see the sketch below)
    - problem02
    - proteins
- Extended patterns
  - a subset of the regular expressions
  - conditional expressions (a part of the pattern may not always appear)
  - wild characters matching any sequence in the text
Example
- Edit distance between
  - COLOR and COLOUR is 1
  - SURVEY and SURGERY is 2
- The query must specify the maximum number of allowed errors for a word to match the pattern (see the sketch below)
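A minimal sketch of the Levenshtein (edit) distance via dynamic programming, reproducing the two distances above.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(edit_distance("COLOR", "COLOUR"))    # 1
print(edit_distance("SURVEY", "SURGERY"))  # 2
```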
Structural Queries
- Based on the structure of the text
- Structure in text is usually very restrictive
- Languages to represent structured documents (e.g. HTML)
- 3 structures
  - fixed (form-like)
  - hypertext
  - hierarchical
- Current query languages integrate both content and structural queries
Fixed Structure
- Docs have a fixed set of fields
- Some fields are not present in all docs
- No nesting or overlap between fields is allowed
- Each model refers to a concrete structure of a collection
Hypertext
- Maximum freedom with respect to structuring power
- A directed graph where the nodes hold some text and the links represent connections between nodes or positions inside the nodes
- The user manually traverses the hypertext nodes, following links, to search
- http://xanadu.com/zigzag/
Hierarchical Structure
- Intermediate model
  - between fixed structure and hypertext
- Recursive decomposition of the text
  - typical for many text collections
- Simplification from hypertext to a hierarchy
  - allows for faster algorithms to solve queries
- The more powerful the model, the less efficiently it can be implemented
- Example
  - retrieve a figure on a page with the structure Title: "car", Introduction: "blue"
  - [Figure: query syntax tree combining the nodes figure, section, introduction and title with the operators "in" and "with"]
Query Languages: Trends
- Query languages for retrieving info from text databases
Query Languages Compared
- Boolean
  - more user control
  - sharper cut
  - typically start with 1 or 2 terms and either
    - refine the search, or
    - broaden the search
- Nearness
  - simpler
  - more diffuse view
  - typically start with many terms
  - works best in a collection of similar documents (topic-based collection)
Problem for IR Query Languages
- Vocabulary mismatch
  - specific genres and fields are associated with certain words and grammars; this requires knowledge
    - which humans have
    - and computers do not have
  - the information need is often described using different words than are found in the relevant documents
- Vocabulary is not meaning
  - computers match words (character strings), not meanings
  - retrieval performance (relevance) is judged according to meaning
http://www.kanoodle.com/results.html?query=information+retrieval&23.x=0&23.y=0
http://www.overture.com/d/search/?type=home&mkt=us&lang=en_US&Keywords=information+retrieval
- WiseGuide automatically generates categories semantically related to the words in your query
  - for very general queries
  - for words with multiple meanings
- http://www.wisenut.com/search/query.dll?q=information+retrieval