Transcript and Presenter's Notes

Title: 2ID10: Information Retrieval, Lecture 2: IR Evaluation


1
2ID10: Information Retrieval
Lecture 2: IR Evaluation & Queries
  • Lora Aroyo
  • 4 April 2006

2
Lecture 1 Summary
Compare the information need with the information and
generate a ranking which reflects relevance.
[Diagram: Information Need → IR System → Ranked list of documents, with a feedback loop]
3
Lecture 1 Summary
  • IR Classic Models
  • Document Representation
  • Query representation
  • Indexing
  • Weighting & Similarity
  • TF-IDF

4
Lecture 2 Overview
  • Types of evaluation
  • Relevance and test collections
  • Effectiveness measures
  • Recall and Precision
  • Significance tests
  • Query languages

5
Types of IR Evaluation
  • Assistance in formulating queries
  • Speed of retrieval
  • Resources required
  • Presentation of documents
  • Ability to find relevant documents
  • Appealing to users (market evaluation)
  • Evaluation is generally comparative
  • System A vs. System B, or two versions of the same system
  • Cost-benefit analysis is possible
  • Most common evaluation: retrieval effectiveness

6
IR Evaluation
  • Functional analysis
  • Test each system functionality (includes error
    analysis)
  • Performance analysis
  • Response time vs. space required (balance/tradeoffs)
  • shorter response time and smaller space used → better
    system
  • Performance evaluation
  • Performance of indexing structures, OS
    interactions, delays
  • Retrieval performance evaluation
  • How precise is the answer set?
  • For a given retrieval strategy S: similarity between
    retrieved docs and expert-judged docs → goodness of S

7
IR Evaluation
  • Effectiveness
  • the ability of an IR system to retrieve relevant
    documents and suppress non-relevant documents
  • related to relevancy of retrieved items
  • Relevancy
  • typically not binary
  • Subjective: depends upon a specific user's judgment
  • Situational: relates to the user's current needs
  • Cognitive: depends on human perception and behavior
  • Dynamic: changes over time

8
Relevancy
  • Relevant (not relevant) according to User
  • Relevant (not relevant) according to System
  • Four main situations
  • User: Relevant, System: Not Relevant
  • User: Not Relevant, System: Relevant
  • User: Not Relevant, System: Not Relevant
  • User: Relevant, System: Relevant

9
Relevancy Aspects
  • Logical relevancy
  • Bosch (the trademark) vs. Den Bosch (the city)
  • Usability
  • Date and origin of the document
  • Format of the document
  • Other users

10
Test collection
  • Real collections
  • we never know the full set of relevant documents
  • Compare retrieval performance with a Test
    collection
  • set of documents
  • set of queries
  • set of relevance judgments (which docs relevant
    to each query)

11
Test Collections
  • To compare the performance of two techniques
  • each technique used to evaluate test queries
  • results (set or ranked list) compared using some
    performance measure
  • most common measures - precision and recall
  • Usually - use multiple measures to get different
    views of performance
  • Usually - test with multiple collections -
    performance is collection dependent

12
Sample Test Collection
13
Test collection creation
  • Manual method
  • Every document judged against every query by
    experts
  • Pooling method
  • Queries run against several IR systems first
  • Results pooled, top proportion chosen for judging
  • Only top documents are judged
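
A minimal Python sketch of the pooling idea described above; the function name, pool depth, and document IDs are illustrative assumptions, not TREC's actual tooling.

def build_pool(ranked_lists, depth=100):
    """Union of the top-`depth` documents from each system's ranked run.

    Only documents in this pool are shown to the human assessors;
    anything outside the pool is left unjudged.
    """
    pool = set()
    for ranking in ranked_lists:
        pool.update(ranking[:depth])
    return pool

# Three hypothetical system runs for one query
system_runs = [
    ["d3", "d7", "d1", "d9"],
    ["d7", "d2", "d3", "d5"],
    ["d8", "d3", "d7", "d4"],
]
print(sorted(build_pool(system_runs, depth=2)))  # ['d2', 'd3', 'd7', 'd8']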

14
Text REtrieval Conference (TREC)
  • Established in 1992 to evaluate large-scale IR
  • Retrieving documents from a gigabyte collection
  • Run by NIST's Information Access Division
  • Initially sponsored by DARPA as part of Tipster
    program
  • Now supported by many, including DARPA, ARDA, and
    NIST
  • Most well known IR evaluation setting
  • Proceedings available at http://trec.nist.gov

15
Text REtrieval Conference (TREC)
  • Consists of IR research tracks
  • Ad-hoc retrieval, routing, cross-language,
    scanned documents, speech recognition, query,
    video, filtering, Spanish, question answering,
    novelty, Chinese, high precision, interactive,
    Web, database merging, NLP,
  • Each track works on roughly the same model
  • NIST carries out evaluation
  • How well your site did
  • How others tackled the problem
  • Successful approaches generally adopted in next
    cycle

16
Lecture 2 Overview
  • Types of evaluation
  • Relevance and test collections
  • Effectiveness measures
  • Recall and Precision
  • Significance tests
  • Query languages

17
Precision & Recall
The purpose of all IR systems is to retrieve relevant
information
18
Query Match
  • Match: a retrieved document satisfying (relevant
    to) the information need
  • character strings in descriptor and query
    keywords match
  • Miss: a document satisfying (relevant to) the
    information need that is not retrieved
  • character strings in descriptor and query
    keywords do not match (though semantically similar)
  • False match: a retrieved document which satisfies
    the query but is not relevant to the information
    need
  • character strings in descriptor and query
    keywords match but are semantically different
19
Retrieval Evaluation Setting
  • Q: the query
  • R: set of relevant documents
  • |R|: number of relevant documents
  • S(Q) → A: the answer set
  • |A|: number of documents in the answer set
  • Ra: relevant documents in the answer set
  • |Ra|: number of docs in R ∩ A

[Venn diagram: Relevant Docs R and Answer Set A overlap in Ra, the relevant documents in the answer set]
20
Precision
  • Fraction of the retrieved documents (A), which
    are relevant
  • high precision
  • when there are relatively few False Matches
  • can be determined exactly

Precision = |Ra| / |A|
          = (relevant documents retrieved) / (all documents retrieved)
          = #(System Yes ∧ User Yes) / [#(System Yes ∧ User No) + #(System Yes ∧ User Yes)]
21
Recall
  • Fraction of the relevant documents (R), which are
    retrieved
  • high recall
  • when there are relatively few Misses
  • cannot be determined exactly - requires
    knowledge of all relevant documents in a
    collection

Recall = |Ra| / |R|
       = (relevant documents retrieved) / (all relevant documents)
       = #(System Yes ∧ User Yes) / [#(System No ∧ User Yes) + #(System Yes ∧ User Yes)]
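
Both measures reduce to set ratios over the answer set A and the relevance judgments R; a small Python sketch, with hypothetical document IDs.

def precision_recall(retrieved, relevant):
    """Precision = |Ra| / |A|, Recall = |Ra| / |R|, with Ra = A ∩ R."""
    retrieved, relevant = set(retrieved), set(relevant)
    ra = retrieved & relevant
    precision = len(ra) / len(retrieved) if retrieved else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical answer set A and relevance judgments R
A = {"d1", "d2", "d3", "d4"}
R = {"d2", "d4", "d7", "d9", "d11"}
print(precision_recall(A, R))  # (0.5, 0.4)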
22
Determining Recall is Difficult
  • Total number of relevant items is sometimes not
    available
  • Sample across the database and perform relevance
    judgment on these items
  • Apply different retrieval algorithms to the same
    database for the same query. The aggregate of
    relevant items is taken as the total relevant set

23
Trade-off between Recall Precision
We aim to obtain the highest value for both
  • an IR system trying to increase the number of relevant
    docs retrieved will also retrieve increasing numbers of
    non-relevant docs
  • efforts to increase one measure tend to decrease
    the other

24
Computing Recall/Precision Points
  • For a given query
  • produce the ranked list of retrievals
  • Adjust a threshold on this ranked list
  • produces different sets of retrieved documents
  • and therefore different recall/precision measures
  • Mark each document in the ranked list that is
    relevant
  • Compute a recall/precision pair for each position
  • in the ranked list that contains a relevant
    document

25
Computing Example
Let the total number of relevant docs = 6. Check each new
recall point:
R = 1/6 = 0.167   P = 1/1 = 1.0
R = 2/6 = 0.333   P = 2/2 = 1.0
R = 3/6 = 0.5     P = 3/4 = 0.75
R = 4/6 = 0.667   P = 4/6 = 0.667
R = 5/6 = 0.833   P = 5/13 = 0.38
One relevant document is never retrieved, so we never reach
100% recall.
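
A short Python sketch reproducing the numbers above; placing the relevant documents at ranks 1, 2, 4, 6 and 13 of the ranked list is an assumption consistent with the slide's figures.

def recall_precision_points(relevance_marks, total_relevant):
    """Yield a (recall, precision) pair at every rank holding a relevant doc.

    `relevance_marks` lists, top rank first, whether each retrieved
    document was judged relevant.
    """
    points, hits = [], 0
    for rank, is_relevant in enumerate(relevance_marks, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points

# Ranked list consistent with the slide: relevant docs at ranks 1, 2, 4, 6, 13
marks = [rank in {1, 2, 4, 6, 13} for rank in range(1, 14)]
for recall, precision in recall_precision_points(marks, total_relevant=6):
    print(f"R={recall:.3f}  P={precision:.3f}")
# R=0.167 P=1.000, R=0.333 P=1.000, R=0.500 P=0.750,
# R=0.667 P=0.667, R=0.833 P=0.385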
26
Example
  • http://www.googlewhack.com/
  • find that elusive query (two words - no quote
    marks) with a single, solitary result!
  • http://www.webology.ir/2005/v2n2/a12.html
  • comparison of precision and recall in Search
    Engines

27
Low Recall Solutions
  • Words exist in several forms
  • e.g. limit, limits, limited, limitation
  • Stemming to increase recall
  • Suffix removal allows word variants to match
  • e.g. word roots often precede modifiers
  • Boolean systems often allow manual truncation
  • Stemming does automatic truncation
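
A deliberately crude suffix-stripping sketch in Python, only to show how conflating word variants onto one root lets more documents match; the suffix list is an illustrative assumption, not the Porter algorithm a real system would use.

SUFFIXES = ("ation", "ing", "ed", "s")

def crude_stem(word):
    # Strip the first matching suffix, keeping at least a 3-character stem.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([crude_stem(w) for w in ["limit", "limits", "limited", "limitation"]])
# ['limit', 'limit', 'limit', 'limit']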

28
Low Recall Solutions
  • Synonymy
  • Many words with similar meanings
  • Synonym(w1, w2) ↔ ∃m: Means(w1, m) ∧ Means(w2, m)
  • Recall increased by
  • Thesaurus-based query expansion
  • Latent semantic indexing
  • Polysemy
  • One word has dissimilar meanings
  • PolySem(w) ↔ ∃m1 ≠ m2: Means(w, m1) ∧ Means(w, m2)
  • Recall increased by word sense disambiguation
  • Indexing word meanings rather than words
  • Context provides clues to word meaning
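
A minimal Python sketch of thesaurus-based query expansion; the synonym table below is a hand-made, illustrative assumption rather than a real thesaurus.

# Hand-made thesaurus mapping a term to its synonyms (illustrative only).
THESAURUS = {
    "car": {"automobile", "vehicle"},
    "film": {"movie"},
}

def expand_query(terms):
    """Add all thesaurus synonyms of each query term to the query."""
    expanded = set(terms)
    for term in terms:
        expanded |= THESAURUS.get(term, set())
    return expanded

print(sorted(expand_query(["car", "rental"])))
# ['automobile', 'car', 'rental', 'vehicle']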

29
Query Languages (QL)
  • Which queries can be formulated
  • Dependent on the underlying IR model
  • Use
  • content (semantics)
  • content & structure (text syntax)
  • to find relevant documents
  • Query enhancement techniques
  • e.g. synonyms, thesauri, stemming, etc.
  • Query
  • Formulation of the users info need
  • Words or combinations of words, plus operations

30
Keyword-based Querying
  • Keywords
  • contained in documents
  • Retrieval Unit
  • retrieved document
  • contains the answer to the query
  • Intuitive
  • Easy to express
  • Allow for fast ranking
  • Basic queries (single and multiple words)

31
Single-word queries
  • Text documents → search for the keywords
  • Set of docs ranked according to the degree of
    similarity to the query
  • Ranking
  • word occurrences inside the text
  • term frequency - counts the number of times a
    word appears inside a document
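
A small Python sketch of single-word ranking by raw term frequency, over a toy corpus whose documents are illustrative assumptions.

from collections import Counter

# Toy corpus (hypothetical documents)
docs = {
    "d1": "retrieval of information and retrieval of documents",
    "d2": "information systems store documents",
    "d3": "retrieval retrieval retrieval",
}

def rank_by_tf(query_word, docs):
    """Score each document by how often the query word occurs in it."""
    scores = {doc_id: Counter(text.split())[query_word]
              for doc_id, text in docs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank_by_tf("retrieval", docs))
# [('d3', 3), ('d1', 2), ('d2', 0)]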

32
Context queries
  • Complement single-word queries with the ability to
    search for words in context, i.e. near other words
  • Phrase context query
  • Sequence of single-word queries
  • Proximity context query
  • More relaxed version of phrase query
  • Sequence of single-word queries with a max
    allowed distance between them
  • Distance in characters or words
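
A minimal Python sketch of a proximity check, assuming the distance is counted in words between token positions.

def within_proximity(tokens, word_a, word_b, max_distance):
    """True if word_a and word_b occur within max_distance words of each other."""
    positions_a = [i for i, tok in enumerate(tokens) if tok == word_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == word_b]
    return any(abs(pa - pb) <= max_distance
               for pa in positions_a for pb in positions_b)

tokens = "information about the retrieval of documents".split()
print(within_proximity(tokens, "information", "retrieval", max_distance=1))  # False
print(within_proximity(tokens, "information", "retrieval", max_distance=3))  # True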

33
Examples Context Queries
  • Phrase query: "information retrieval"
  • Proximity queries
  • "information about retrieval" (distance 1)
  • "information with respect to the retrieval" (distance 4)
  • Ranking similar to single-word queries

34
Boolean Queries
  • Oldest form of keyword query
  • words + operators
  • atoms (basic queries) + Boolean operators
  • A or B, A and B, A not B
  • Query syntax tree

[Query syntax tree: AND and OR operators over the terms white, paper, chocolate]
35
Boolean Query Mechanics
  • Basic query
  • Find X → return all documents containing term X
  • X: single words or phrases
  • Simple text or string matching
  • Complex query
  • Boolean connectors: AND, OR, NOT
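
The Boolean connectives map directly onto set operations over an inverted index; a small Python sketch with hypothetical postings lists.

# Tiny inverted index: term -> set of documents containing it (hypothetical).
index = {
    "colour":  {"d1", "d2", "d5"},
    "printer": {"d2", "d3", "d5"},
    "hp":      {"d5"},
}

# colour AND printer NOT hp
result = (index["colour"] & index["printer"]) - index["hp"]
print(sorted(result))  # ['d2']

# colour AND (printer OR laser), where "laser" has no postings
result_or = index["colour"] & (index["printer"] | index.get("laser", set()))
print(sorted(result_or))  # ['d2', 'd5']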

36
Boolean IR
  • Boolean operators approximate natural language
  • e.g. find documents about colour printers that
    are not made by Hewlett-Packard
  • AND can denote relationships between concepts
  • e.g. colour AND printer
  • OR can denote alternate terminology
  • e.g. colour AND (printer OR laser-printer)
  • NOT can exclude alternate meanings
  • e.g. colour AND (printer OR laser-printer) NOT
    (Hewlett-Packard OR HP)

37
Google Search
  • Google basic search
  • http://www.google.com/help/basics.html
  • Google advanced search
  • http://www.google.com/help/refinesearch.html

38
Natural Language Queries
  • Enumeration of words and context queries
  • All docs matching a portion of the query are
    retrieved
  • Higher ranking to all docs matching more parts of
    query
  • Negation - user determines words to be eliminated
    ? lower ranking
  • Threshold for too low ranked docs
  • Boolean queries are a simplified version of NL
    queries
  • Vector of term weights (doc & query)

39
Natural Language Queries
40
(No Transcript)
41
(No Transcript)
42
Pattern Matching
  • More specific query formulation
  • Based on concept of pattern
  • a set of syntactic features that occur in a text
    segment
  • segments that fulfil the pattern specification are
    pattern matches
  • Retrieve pieces of text that have some property
  • Useful for linguistics, text statistics, data
    extraction
  • Pattern types
  • Words, prefixes, suffixes, substrings, ranges,
    errors, regular expressions, extended patterns

43
Examples Pattern Matching
  • Words: a string (sequence of chars)
  • Prefixes: "program" → programmer
  • Suffixes: "er" → computer, monster, poster
  • Substrings: "tal" → coastal, talk, metallic
  • "any flow" will match "many flowers"
  • Ranges: a pair of strings which matches any word
    lying between them in lexicographical order, e.g. the
    range between the words "held" and "hold" will retrieve
    strings such as hoax, hissing, helm, help, etc.

44
Examples Pattern Matching
  • Allowing errors
  • word together with an error threshold
  • retrieves all text words similar to a given word
  • errors are caused by typing, spelling, etc.
  • most accepted model is the Levenshtein distance
    or edit distance
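
A standard dynamic-programming implementation of the Levenshtein (edit) distance in Python; it reproduces the COLOR/COLOUR and SURVEY/SURGERY distances quoted on the later Example slide.

def levenshtein(a, b):
    """Edit distance allowing insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("COLOR", "COLOUR"))    # 1
print(levenshtein("SURVEY", "SURGERY"))  # 2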

45
Examples Pattern Matching
  • Regular expression: a general pattern built up from
    simple strings and operators (union, concatenation,
    repetition)
  • pro (blem | tein) (s | ε) (0 | 1 | 2)*
  • will match words like
  • problem02
  • proteins
  • Extended patterns
  • subset of the regular expressions
  • conditional expressions (part of the pattern may
    not always appear)
  • wild characters matching any sequence in the text
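
Reading the pattern above as pro (blem | tein) (s | ε) (0 | 1 | 2)* (an assumption about the original slide), it can be written with Python's standard re module as follows.

import re

# "pro", then "blem" or "tein", then an optional "s", then any run of 0/1/2
pattern = re.compile(r"^pro(blem|tein)s?[012]*$")

for word in ["problem02", "proteins", "protein1", "program"]:
    print(word, bool(pattern.match(word)))
# problem02 True, proteins True, protein1 True, program False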

46
Example
  • distance between
  • COLOR and COLOUR is 1
  • SURVEY and SURGERY is 2
  • the query must specify the maximum number of
    allowed errors for a word to match the pattern

47
Structural Queries
  • Based on structure of the text
  • Structure in text usually very restrictive
  • Languages to represent structured documents
    (HTML)
  • 3 structures
  • Fixed (form-like)
  • Hypertext
  • Hierarchical
  • Current query languages integrate both contents
    and structural queries

48
Fixed Structure
  • Docs have fixed set of fields
  • Some fields are not present in all docs
  • No nesting or overlap between fields is allowed
  • Each model refers to a concrete structure of a
    collection

49
Hypertext
  • Max freedom with respect to structuring power
  • Directed graph where the nodes hold some text and
    the links represent connections between nodes or
    positions outside of nodes
  • User manually traverses the hypertext nodes
    following links to search
  • http://xanadu.com/zigzag/

50
Hierarchical Structure
  • Intermediate model
  • between fixed and hypertext
  • Recursive decomposition of text
  • typical for many text collections
  • Simplification from hypertext to a hierarchy
  • allows for faster algorithms to solve queries
  • The more powerful the model, the less efficiently
    it can be implemented
  • Example
  • retrieve a figure on a page with structure
  • Title: car
  • Introduction: blue

[Query tree: a figure in a section, constrained by the given title and introduction]
51
Query Languages Trends
  • Query languages for retrieving information from text
    databases
52
(No Transcript)
53
Querying Languages Compared
  • Boolean
  • More user control
  • Sharper cut
  • Typically start with 1 or 2 terms and either
  • refine the search, or
  • broaden the search
  • Nearness
  • Simpler
  • More diffuse view
  • Typically start with many terms
  • Works best in collection of similar documents
    (topic-based collection)

54
Problem for IR Query Languages
  • Vocabulary mismatch
  • Specific genres and fields are associated with
    certain words and grammars. This requires knowledge
  • that humans have
  • and computers do not
  • Information need is often described using
    different words than are found in relevant
    documents
  • Vocabulary is not meaning
  • Computers match words (character strings) not
    meanings
  • Retrieval performance (relevance) is judged
    according to meaning

55
(No Transcript)
56
(No Transcript)
57
http://www.kanoodle.com/results.html?query=information+retrieval
58
http://www.overture.com/d/search/?type=home&mkt=us&lang=en_US&Keywords=information+retrieval
59
  • WiseGuide automatically generates categories
    semantically related to the words in your query.
  • For very general queries
  • For words with multiple meanings

http://www.wisenut.com/search/query.dll?q=information+retrieval
60
(No Transcript)
61
(No Transcript)
62
(No Transcript)