Title: 2ID10: Information Retrieval Lecture 2: IR Evaluation
2ID10 Information Retrieval
Lecture 2: IR Evaluation & Queries
Lecture 1 Summary
Compare the information need with the information in the documents and generate a ranking which reflects relevance.
[Diagram: Information Need → IR System → Ranked list of documents, with a feedback loop back to the information need]
Lecture 1 Summary
- IR Classic Models
- Document Representation
- Query representation
- Indexing
- Weighting and Similarity
- TF-IDF
Lecture 2 Overview
- Types of evaluation
- Relevance and test collections
- Effectiveness measures
- Recall and Precision
- Significance tests
- Query languages
Types of IR Evaluation
- Assistance in formulating queries
- Speed of retrieval
- Resources required
- Presentation of documents
- Ability to find relevant documents
- Appealing to users (market evaluation)
- Evaluation is generally comparative
  - System A vs. System B, or two versions of system A
- Cost-benefit analysis is possible
- Most common evaluation: retrieval effectiveness
IR Evaluation
- Functional analysis
  - test each system functionality (includes error analysis)
- Performance analysis
  - response time and space required (balance/trade-offs)
  - shorter response time and smaller space used → better system
- Performance evaluation
  - performance of indexing structures, OS interactions, delays
- Retrieval performance evaluation
  - how precise is the answer set?
  - for a given retrieval strategy S: similarity between retrieved docs and expert-selected docs → goodness of S
IR Evaluation
- Effectiveness
  - the ability of an IR system to retrieve relevant documents and suppress non-relevant documents
  - related to the relevancy of the retrieved items
- Relevancy
  - typically not binary
  - Subjective: depends upon a specific user's judgment
  - Situational: relates to the user's current needs
  - Cognitive: depends on human perception and behavior
  - Dynamic: changes over time
Relevancy
- Relevant (or not relevant) according to the user
- Relevant (or not relevant) according to the system
- Four main situations
  - User: relevant, System: not relevant
  - User: not relevant, System: relevant
  - User: not relevant, System: not relevant
  - User: relevant, System: relevant
Relevancy Aspects
- Logical relevancy
  - Bosch (the trademark) vs. Den Bosch (the city)
- Usability
  - date and origin of the document
  - format of the document
  - other users
Test Collection
- Real collections
  - we never know the full set of relevant documents
- Compare retrieval performance with a test collection
  - a set of documents
  - a set of queries
  - a set of relevance judgments (which docs are relevant to each query)
Test Collections
- To compare the performance of two techniques
  - each technique is used to run the test queries
  - the results (a set or a ranked list) are compared using some performance measure
  - the most common measures are precision and recall
- Usually multiple measures are used, to get different views of performance
- Usually multiple collections are used, since performance is collection dependent
Sample Test Collection
Test Collection Creation
- Manual method
  - every document is judged against every query by experts
- Pooling method
  - the queries are run against several IR systems first
  - the results are pooled and the top proportion is chosen for judging
  - only top documents are judged
Text REtrieval Conference (TREC)
- Established in 1992 to evaluate large-scale IR
  - retrieving documents from a gigabyte collection
- Run by NIST's Information Access Division
- Initially sponsored by DARPA as part of the TIPSTER program
- Now supported by many, including DARPA, ARDA, and NIST
- The most well-known IR evaluation setting
- Proceedings available at http://trec.nist.gov
Text REtrieval Conference (TREC)
- Consists of IR research tracks
  - ad-hoc retrieval, routing, cross-language, scanned documents, speech recognition, query, video, filtering, Spanish, question answering, novelty, Chinese, high precision, interactive, Web, database merging, NLP, ...
- Each track works on roughly the same model
  - NIST carries out the evaluation
  - you learn how well your site did
  - and how others tackled the problem
- Successful approaches are generally adopted in the next cycle
Lecture 2 Overview
- Types of evaluation
- Relevance and test collections
- Effectiveness measures
- Recall and Precision
- Significance tests
- Query languages
Precision & Recall
The purpose of every IR system is to retrieve relevant information.
Query Match
- Match: a retrieved document satisfying (relevant to) the information need
  - character strings in the descriptor and the query keywords match
- Miss: a document satisfying (relevant to) the information need that is not retrieved
  - character strings in the descriptor and the query keywords do not match (although they are semantically similar)
- False match: a retrieved document which satisfies the query but is not relevant to the information need
  - character strings in the descriptor and the query keywords match but are semantically different
Retrieval Evaluation Setting
- Q: a query
- R: the set of relevant documents
- |R|: the number of relevant documents
- S(Q) → A: the answer set
- |A|: the number of documents in the answer set
- Ra: the relevant documents in the answer set
- |Ra|: the number of docs in R ∩ A
[Venn diagram: the relevant docs R and the answer set A overlap in Ra, the relevant documents in the answer set]
Precision
- The fraction of the retrieved documents (A) which are relevant
- High precision
  - when there are relatively few false matches
- Can be determined exactly

  Precision = #(System Yes ∧ User Yes) / [#(System Yes ∧ User No) + #(System Yes ∧ User Yes)]

  Precision = relevant documents retrieved / all documents retrieved = |Ra| / |A|
Recall
- The fraction of the relevant documents (R) which are retrieved (a sketch of both measures follows below)
- High recall
  - when there are relatively few misses
- Cannot be determined exactly: requires knowledge of all relevant documents in the collection

  Recall = #(System Yes ∧ User Yes) / [#(User Yes ∧ System No) + #(System Yes ∧ User Yes)]

  Recall = relevant documents retrieved / all relevant documents = |Ra| / |R|
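A minimal sketch of computing both measures from document-id sets; the sets A and R below are made-up illustrative data, not taken from the slides.

```python
def precision_recall(answer_set, relevant_set):
    """Precision = |Ra| / |A|, Recall = |Ra| / |R|, where Ra = A ∩ R."""
    ra = answer_set & relevant_set
    precision = len(ra) / len(answer_set) if answer_set else 0.0
    recall = len(ra) / len(relevant_set) if relevant_set else 0.0
    return precision, recall

# Hypothetical answer set A and relevant set R for one query
A = {"d1", "d2", "d3", "d4"}
R = {"d1", "d3", "d5", "d7", "d8", "d9"}
print(precision_recall(A, R))  # (0.5, 0.333...): half of A is relevant, a third of R was found
```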
Determining Recall is Difficult
- The total number of relevant items is often not available; two workarounds:
  - sample across the database and perform relevance judgments on these items
  - apply different retrieval algorithms to the same database for the same query; the aggregate of relevant items is taken as the total relevant set
Trade-off between Recall and Precision
We aim to obtain the highest value for both.
- an IR system trying to increase the number of relevant docs retrieved will also retrieve increasing numbers of non-relevant docs
- efforts to increase one measure tend to decrease the other
Computing Recall/Precision Points
- For a given query
  - produce the ranked list of retrievals
- Adjust a threshold on this ranked list
  - this produces different sets of retrieved documents
  - and therefore different recall/precision measures
- Mark each document in the ranked list that is relevant
- Compute a recall/precision pair for each position in the ranked list that contains a relevant document
Computing Example
Let the total number of relevant docs be 6, and check each new recall point (a runnable sketch follows below):
- R = 1/6 = 0.167, P = 1/1 = 1.0
- R = 2/6 = 0.333, P = 2/2 = 1.0
- R = 3/6 = 0.5,   P = 3/4 = 0.75
- R = 4/6 = 0.667, P = 4/6 = 0.667
- R = 5/6 = 0.833, P = 5/13 = 0.38
One relevant document is never retrieved, so 100% recall is never reached.
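A minimal sketch of the procedure from the previous slide, reproducing the numbers of this example; the ranked list is encoded as a boolean relevance flag per rank.

```python
def recall_precision_points(ranked_relevance, total_relevant):
    """Emit a (recall, precision) pair at every rank that holds a relevant document."""
    points, hits = [], 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points

# Relevant documents appear at ranks 1, 2, 4, 6 and 13; 6 documents are relevant in total,
# so one relevant document is never retrieved.
ranking = [rank in (1, 2, 4, 6, 13) for rank in range(1, 14)]
for r, p in recall_precision_points(ranking, total_relevant=6):
    print(f"R = {r:.3f}  P = {p:.2f}")
```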
Example
- http://www.googlewhack.com/
  - find that elusive query (two words, no quote marks) with a single, solitary result!
- http://www.webology.ir/2005/v2n2/a12.html
  - a comparison of precision and recall in search engines
Low Recall Solutions
- Words exist in several forms
  - e.g. limit, limits, limited, limitation
- Stemming to increase recall (see the sketch below)
  - suffix removal allows word variants to match
  - e.g. word roots often precede modifiers
  - Boolean systems often allow manual truncation
  - stemming does automatic truncation
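A deliberately crude suffix-stripping sketch (a real stemmer such as Porter's is far more careful); the suffix list is an assumption for illustration only.

```python
# Crude suffix stripping: conflate word variants to one index term.
SUFFIXES = ("ations", "ation", "ing", "ed", "s")  # longest suffixes first

def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["limit", "limits", "limited", "limitation"]])
# -> ['limit', 'limit', 'limit', 'limit']
```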
Low Recall Solutions
- Synonymy
  - many words with similar meanings
  - Synonym(w1, w2) ≡ ∃m: (w1 Means m) ∧ (w2 Means m)
  - recall increased by
    - thesaurus-based query expansion (see the sketch below)
    - latent semantic indexing
- Polysemy
  - one word has dissimilar meanings
  - PolySem(w) ≡ ∃m1 ≠ m2: (w Means m1) ∧ (w Means m2)
  - recall increased by word sense disambiguation
    - indexing word meanings rather than words
    - context provides clues to word meaning
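A minimal sketch of thesaurus-based query expansion; the tiny thesaurus is a made-up stand-in for a real resource such as WordNet.

```python
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "image": {"picture", "photo"},
}

def expand_query(terms):
    """Add synonyms of each query term so documents using different wording still match."""
    expanded = set(terms)
    for term in terms:
        expanded |= THESAURUS.get(term, set())
    return expanded

print(expand_query(["car", "image"]))
# -> {'car', 'automobile', 'vehicle', 'image', 'picture', 'photo'} (set order may vary)
```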
Query Languages (QL)
- Which queries can be formulated
  - dependent on the underlying IR model
- Use content (semantics) and content structure (text syntax) to find relevant documents
- Query enhancement techniques
  - e.g. synonyms, thesauri, stemming, etc.
- Query
  - a formulation of the user's info need
  - words or combinations of words and operations
Keyword-based Querying
- Keywords
  - contained in documents
- Retrieval unit
  - the retrieved document
  - contains the answer to the query
- Intuitive
- Easy to express
- Allows for fast ranking
- Basic queries (single or multiple words)
Single-word Queries
- Text documents → search for the keywords
- The set of docs is ranked according to the degree of similarity to the query
- Ranking
  - word occurrences inside the text
  - term frequency counts the number of times a word appears inside a document (see the sketch below)
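A minimal term-frequency sketch with naive whitespace tokenization; the sample sentence is invented for illustration.

```python
from collections import Counter

def term_frequency(document_text):
    """Count how many times each word appears inside a document (naive tokenization)."""
    return Counter(document_text.lower().split())

tf = term_frequency("information retrieval ranks documents by how well they match an information need")
print(tf["information"], tf["retrieval"])  # 2 1
```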
Context Queries
- Complement single-word queries with a search in context: words which are near to other words
- Phrase context query
  - a sequence of single-word queries
- Proximity context query
  - a more relaxed version of the phrase query
  - a sequence of single-word queries with a maximum allowed distance between them
  - distance in characters or words
Examples: Context Queries
- Phrase query: "information retrieval"
- Proximity matches of "information ... retrieval" (see the sketch below)
  - "information about retrieval" (distance 1)
  - "information with respect to the retrieval" (distance 4)
- Ranking is similar to single-word queries
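A rough sketch of proximity matching for a two-term query, where the distance is the number of intervening words; the helper name and the tolerance convention are assumptions.

```python
def proximity_match(text, first, second, max_distance):
    """True if `first` and `second` occur in order with at most `max_distance`
    words between them (max_distance=0 behaves like an exact phrase)."""
    words = text.lower().split()
    for i, word in enumerate(words):
        if word == first:
            # look ahead over at most max_distance intervening words
            if second in words[i + 1 : i + 2 + max_distance]:
                return True
    return False

print(proximity_match("information about retrieval", "information", "retrieval", 1))                # True
print(proximity_match("information with respect to the retrieval", "information", "retrieval", 1))  # False
print(proximity_match("information with respect to the retrieval", "information", "retrieval", 4))  # True
```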
Boolean Queries
- Oldest form of keyword query
  - words + operators
  - atoms (basic queries) + Boolean operators
  - A or B, A and B, A not B
- Query syntax tree
  [Figure: syntax tree with AND at the root and an OR subtree over the terms white, paper and chocolate]
Boolean Query Mechanics
- Basic query
  - Find X → return all documents containing term X
  - X: single words or phrases
  - simple text or string matching
- Complex query
  - Boolean connectors: and, or, not
Boolean IR
- Boolean operators approximate natural language
  - e.g. find documents about colour printers that are not made by Hewlett-Packard (see the sketch below)
- AND can denote relationships between concepts
  - e.g. colour AND printer
- OR can denote alternate terminology
  - e.g. colour AND (printer OR laser-printer)
- NOT can exclude alternate meanings
  - e.g. colour AND (printer OR laser-printer) NOT (Hewlett-Packard OR HP)
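A minimal sketch of evaluating the last query over an inverted index of document-id sets; the index contents are made-up illustrative data.

```python
# Hypothetical inverted index: term -> set of document ids containing it
index = {
    "colour":          {1, 2, 3, 5},
    "printer":         {1, 3, 4},
    "laser-printer":   {2, 6},
    "hewlett-packard": {3},
    "hp":              {3, 4},
}

def docs(term):
    return index.get(term, set())

# colour AND (printer OR laser-printer) NOT (Hewlett-Packard OR HP)
result = (docs("colour") & (docs("printer") | docs("laser-printer"))) - (docs("hewlett-packard") | docs("hp"))
print(result)  # -> {1, 2}
```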
Google Search
- Google basic search
  - http://www.google.com/help/basics.html
- Google advanced search
  - http://www.google.com/help/refinesearch.html
Natural Language Queries
- An enumeration of words and context queries
- All docs matching a portion of the query are retrieved
- Higher ranking for docs matching more parts of the query
- Negation: the user marks words to be eliminated → lower ranking
- Threshold to discard docs ranked too low
- Boolean queries are a simplified version of NL queries
- Vector of term weights (for both doc and query)
Pattern Matching
- More specific query formulation
- Based on the concept of a pattern
  - a set of syntactic features that occur in a text segment
  - segments that fulfil the pattern specification are pattern matches
- Retrieve pieces of text that have some property
- Useful for linguistics, text statistics, data extraction
- Pattern types
  - words, prefixes, suffixes, substrings, ranges, errors, regular expressions, extended patterns
Examples: Pattern Matching
- Words: a string (sequence of chars)
- Prefixes: program → programmer
- Suffixes: er → computer, monster, poster
- Substrings: tal → coastal, talk, metallic
  - "any flow" → will match "many flowers"
- Ranges: a pair of strings which matches any word lying between them in lexicographical order, e.g. the range between the words "held" and "hold" will retrieve strings such as hoax, hissing, helm, help, etc. (see the sketch below)
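A small sketch of a range pattern over a made-up word list: every word lying lexicographically between the two bounds matches.

```python
def range_match(words, low, high):
    """Return the words that lie between `low` and `high` in lexicographical order."""
    return [w for w in words if low <= w <= high]

print(range_match(["hoax", "hissing", "helm", "help", "hammer", "house"], "held", "hold"))
# -> ['hoax', 'hissing', 'helm', 'help']
```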
Examples: Pattern Matching
- Allowing errors
  - a word together with an error threshold
  - retrieves all text words similar to the given word
  - errors are caused by typing, spelling, etc.
  - the most accepted model is the Levenshtein distance, or edit distance
Examples: Pattern Matching
- Regular expression: a general pattern built up from simple strings and operators (union, concatenation, repetition)
  - pro(blem | tein)(s | ε)(0 | 1 | 2)*
  - will match words like (see the sketch below)
    - problem02
    - proteins
- Extended patterns
  - a subset of the regular expressions
  - conditional expressions (a part of the pattern may not always appear)
  - wild characters matching any sequence in the text
Example
- Edit distance between
  - COLOR and COLOUR is 1
  - SURVEY and SURGERY is 2
- The query must specify the maximum number of allowed errors for a word to match the pattern (see the sketch below)
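A minimal sketch of the Levenshtein (edit) distance via dynamic programming, reproducing the two distances above.

```python
def edit_distance(a, b):
    """Minimum number of single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

print(edit_distance("COLOR", "COLOUR"))    # 1
print(edit_distance("SURVEY", "SURGERY"))  # 2
```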
Structural Queries
- Based on the structure of the text
- Structure in text is usually very restrictive
- Languages to represent structured documents (e.g. HTML)
- 3 structures
  - fixed (form-like)
  - hypertext
  - hierarchical
- Current query languages integrate both content and structural queries
Fixed Structure
- Docs have a fixed set of fields
- Some fields are not present in all docs
- No nesting or overlap between fields is allowed
- Each model refers to a concrete structure of a collection
Hypertext
- Maximum freedom with respect to structuring power
- A directed graph where the nodes hold some text and the links represent connections between nodes or positions inside the nodes
- The user manually traverses the hypertext nodes, following links, to search
- http://xanadu.com/zigzag/
Hierarchical Structure
- Intermediate model
  - between fixed structure and hypertext
- Recursive decomposition of the text
  - typical for many text collections
- Simplification from hypertext to a hierarchy
  - allows for faster algorithms to solve queries
- The more powerful the model, the less efficiently it can be implemented
- Example
  - retrieve a figure on a page with the structure Title: "car", Introduction: "blue"
  - [Figure: query syntax tree combining the nodes figure, section, introduction and title with the operators "in" and "with"]
Query Languages: Trends
- Query languages for retrieving info from text databases
Query Languages Compared
- Boolean
  - more user control
  - sharper cut
  - typically start with 1 or 2 terms and either
    - refine the search, or
    - broaden the search
- Nearness
  - simpler
  - more diffuse view
  - typically start with many terms
  - works best in a collection of similar documents (topic-based collection)
Problem for IR Query Languages
- Vocabulary mismatch
  - specific genres and fields are associated with certain words and grammars; this requires knowledge
    - which humans have
    - and computers do not have
  - the information need is often described using different words than are found in the relevant documents
- Vocabulary is not meaning
  - computers match words (character strings), not meanings
  - retrieval performance (relevance) is judged according to meaning
http://www.kanoodle.com/results.html?query=information+retrieval&23.x=0&23.y=0
http://www.overture.com/d/search/?type=home&mkt=us&lang=en_US&Keywords=information+retrieval
- WiseGuide automatically generates categories semantically related to the words in your query
  - for very general queries
  - for words with multiple meanings
- http://www.wisenut.com/search/query.dll?q=information+retrieval