Relevance, Precision - PowerPoint PPT Presentation

About This Presentation
Title:

Relevance, Precision

Slides: 28
Provided by: shar253

Transcript and Presenter's Notes

Title: Relevance, Precision


1
Relevance, Precision, and Recall
2
Outline
  • Free Text vs. Controlled vocabulary
  • Free text
  • Boolean
  • Relevance
  • Precision and recall

3
Construct for relevance
[Diagram labels: information, analysis, translation, database, standards; queries, analysis, translation, searching; "create relevant connection"]
4
Free text or natural language searching
  • Why free-text searching?
    • there is no thesaurus
    • the term is new
    • there are not many hits, either in the database or when using controlled vocabulary
    • free-text searching retrieves a larger set of documents

5
Free text searching problems
  • reversed concepts: searching LIBRAR? and SCHOOL? retrieves both "library school" and "school library"
  • homographs: CRACK (cocaine or a seismic fault?)
  • excessive truncation: BOOK? also retrieves bookkeeping, bookmark, and so on
  • acronyms: ADD, SAD, and AIDS also match the ordinary words "add", "sad", and "aids"
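The word-order, truncation, and acronym problems above can be reproduced with a small sketch. It assumes that a trailing `?` means open-ended right truncation, that a free-text search is a case-insensitive match of every term anywhere in a title, and it uses a made-up list of titles; the helpers `term_pattern`, `matches_all`, and `search` are hypothetical names for illustration only.

```python
import re

# Made-up free-text titles, for illustration only.
titles = [
    "Careers after library school",
    "The role of the school library",
    "Booking a venue",
    "Bookkeeping basics",
    "A book of poems",
    "Hearing aids for older adults",
    "AIDS research funding",
]


def term_pattern(term: str) -> re.Pattern:
    """Compile a search term into a case-insensitive regex.

    Assumption: a trailing '?' means open-ended right truncation
    (LIBRAR? -> library, libraries, librarian, ...); a term without
    '?' must match a whole word.
    """
    truncated = term.endswith("?")
    stem = term.rstrip("?")
    suffix = r"\w*" if truncated else r"\b"
    return re.compile(r"\b" + re.escape(stem) + suffix, re.IGNORECASE)


def matches_all(text: str, terms: list[str]) -> bool:
    """Free-text match: every term must occur somewhere in the text, in any order."""
    return all(term_pattern(t).search(text) for t in terms)


def search(terms: list[str]) -> list[str]:
    return [title for title in titles if matches_all(title, terms)]


# Reversed concepts: word order is ignored, so both phrases are retrieved.
library_school_hits = search(["LIBRAR?", "SCHOOL?"])
# Excessive truncation: BOOK? also pulls in "Booking" and "Bookkeeping".
book_hits = search(["BOOK?"])
# Acronyms: a case-insensitive match cannot tell AIDS from "aids".
aids_hits = search(["AIDS"])

print(library_school_hits)
print(book_hits)
print(aids_hits)
```

Each list contains the intended records plus records a searcher probably did not want, which is exactly the extra sorting burden free text places on the searcher.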

6
Boolean Searching
  • May use free-text or controlled-vocabulary terms
  • Based on set theory
  • Addresses these problems of free-text searching:
    • variant word forms
    • synonyms
    • homographs
    • acronyms
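A minimal sketch of how Boolean operators address variant forms, synonyms, and homographs, assuming a hypothetical toy inverted index (term to set of document IDs): OR-ing variants and synonyms takes the union of their postings, and AND-ing a homograph with context terms narrows it to one sense. The index contents and the `any_of` helper are illustrative assumptions, not part of the slides.

```python
# Hypothetical toy inverted index: term -> set of document IDs containing it.
index: dict[str, set[int]] = {
    "pig": {1, 4},
    "pigs": {2, 5},
    "hog": {3},
    "hogs": {5, 6},
    "crack": {8, 9},
    "earthquake": {9},
}


def any_of(*terms: str) -> set[int]:
    """OR: union of the postings of every variant form or synonym given."""
    result: set[int] = set()
    for term in terms:
        result |= index.get(term, set())  # unknown terms contribute nothing
    return result


# Variant word forms (pig/pigs) and synonyms (pig/hog) grouped into one concept.
swine = any_of("pig", "pigs", "hog", "hogs")
print(sorted(swine))  # [1, 2, 3, 4, 5, 6]

# Homograph: AND-ing CRACK with context terms keeps only the geological sense.
seismic_crack = index["crack"] & any_of("earthquake", "fault", "seismic")
print(seismic_crack)  # {9}
```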

7
AND
  • pigs AND space

[Venn diagram: circles labeled "pigs" and "space"]
8
OR
  • hogs OR pigs

[Venn diagram: circles labeled "pigs" and "hogs"]
9
AND
  • (hogs OR pigs) AND space

[Venn diagram: circles labeled "hogs", "pigs", and "space"]
10
NOT
  • (pigs AND space) NOT barns

[Venn diagram: circles labeled "pigs", "space", and "barns"]
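Because Boolean searching is based on set theory, the four queries on slides 7 to 10 map directly onto Python's set operators (`&` intersection, `|` union, `-` difference). The document IDs below are made up for illustration.

```python
# Hypothetical postings: which made-up document IDs contain each term.
pigs = {1, 2, 3, 4}
hogs = {4, 5, 6}
space = {2, 3, 5, 7}
barns = {3, 8}

# Slide 7: pigs AND space -> intersection
q_and = pigs & space               # {2, 3}
# Slide 8: hogs OR pigs -> union
q_or = hogs | pigs                 # {1, 2, 3, 4, 5, 6}
# Slide 9: (hogs OR pigs) AND space -> the parenthesized OR is evaluated first
q_nested = (hogs | pigs) & space   # {2, 3, 5}
# Slide 10: (pigs AND space) NOT barns -> set difference
q_not = (pigs & space) - barns     # {2}

for name, result in [("AND", q_and), ("OR", q_or), ("nested", q_nested), ("NOT", q_not)]:
    print(name, sorted(result))
```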
11
FREE TEXT vs CONTROLLED VOCABULARY DEBATE
ADVANTAGES OF FREE TEXT
  • Low cost
  • Simplified searching
  • Full information content searchable
  • Every word has equal retrieval value
  • No human indexing errors
  • No delay in incorporating terms
ADVANTAGES OF CONTROLLED VOCABULARY
  • Solves many semantic problems
  • Permits generic relationships to be identified
  • Maps areas of knowledge
DISADVANTAGES OF FREE TEXT
  • Greater burden on searcher
  • Information implicitly but not overtly included in text may be missed
  • Absence of specific-to-generic linkage
  • Vocabulary of discipline must be known
DISADVANTAGES OF CONTROLLED VOCABULARY
  • High cost
  • Possible inadequacies of coverage
  • Human error
  • Possibly out-of-date vocabulary
  • Difficulty of systematically incorporating all relevant relationships between terms
Dubois, C.P.R. Free text vs. controlled vocabulary: a reassessment. Online Review 11(4) (1987): 248.
12
Exercise
  • Compare the following bibliographic records
  • Think about controlled vocabulary vs. free text

13
Relevance
  • Oxford: "closely connected or appropriate to the matter in hand"
  • Webster:
    • "relation to the matter at hand"
    • "the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user"

14
[Diagram repeated from slide 3 without its title. Labels: information, analysis, translation, database, standards; queries, analysis, translation; "create relevant connection"]
15
Relevance Dimensions
  • Systematic: the item retrieved was in a form/format that meets the information need
  • Topical: the item retrieved was on the topic
  • Pertinence: the item retrieved is informative
  • Utility: the item retrieved is useful for resolving my information need
  • Motivational: the item retrieved may cause me to take other action(s)

(Saracevic, 1996)
16
RELEVANCE (summarized from Schamber, Eisenberg, and Nilan, IPM 1990)
Relevance to a subject (topicality): system-oriented
  • Easier to measure; more easily operationalized; observable
  • Rationalist tradition
  • Shannon-Weaver model of communication
  • Objective reality
User relevance: user-oriented
  • More closely related to reality
  • Subjective
  • Related to usefulness, utility
  • Pertinence, satisfaction
  • Dynamic, situational approach
  • Based in users' perceptions
From the Saracevic reading
  • System or algorithmic relevance: relation between a query and information objects (texts) in the file of a system as retrieved, or as failed to be retrieved, by a given procedure or algorithm
  • Topical or subject relevance: relation between the subject or topic expressed in a query and the topic or subject covered by retrieved texts; aboutness is the criterion
  • Cognitive relevance or pertinence: relation between the state of knowledge and cognitive information need of a user and the texts retrieved
  • Situational relevance or utility: relation between the situation, task, or problem at hand and texts
  • Motivational or affective relevance: relation between the intents, goals, and motivations of a user and the texts retrieved
17
Relevance levels
  • Binary (relevant vs. non-relevant)
  • Multiple levels
    • Not relevant
    • Marginally relevant
    • Fairly relevant
    • Highly relevant
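Graded judgments are often collapsed to the binary scale when computing measures such as precision and recall. Below is a sketch of one way to do that; the `Relevance` enum, the `to_binary` helper, and the choice of "fairly relevant" as the default cut-off are assumptions, and other thresholds are equally defensible.

```python
from enum import IntEnum


class Relevance(IntEnum):
    """The four-level scale from the slide, ordered from least to most relevant."""
    NOT_RELEVANT = 0
    MARGINALLY_RELEVANT = 1
    FAIRLY_RELEVANT = 2
    HIGHLY_RELEVANT = 3


def to_binary(level: Relevance, threshold: Relevance = Relevance.FAIRLY_RELEVANT) -> bool:
    """Collapse a graded judgment to relevant (True) / non-relevant (False).

    Assumption: anything at or above the threshold counts as relevant.
    """
    return level >= threshold


judgments = [
    Relevance.HIGHLY_RELEVANT,
    Relevance.MARGINALLY_RELEVANT,
    Relevance.NOT_RELEVANT,
    Relevance.FAIRLY_RELEVANT,
]
binary = [to_binary(j) for j in judgments]
print(binary)  # [True, False, False, True]
```

Lowering the threshold to `MARGINALLY_RELEVANT` would count more items as relevant, which changes every precision and recall figure computed from the same judgments.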

18
Relevance
  • How do you decide if a document is relevant to
    your needs or interests?

19
Relevance criteria identified by health
information users during Web searches
  • In the document summary
    • Title
    • Description
    • Date
    • URL
    • Format
  • In the document itself
    • Title
    • Section heading
    • Paragraph text
    • Emphasized text
    • Image
    • Hyperlink
    • Navigation
    • List item
    • Citation or reference

20
What other factors affect relevance?
  • Novelty (Xu and Chen, 2006)
  • Understandability (Xu and Chen, 2006)
  • Reliability (Xu and Chen, 2006)
  • Document presentation order (Huang and Wang, 2004)
  • Number of documents judged (Huang and Wang, 2004)

21
Evaluation of Retrieval
  • How well does an information retrieval system
    perform?
  • There are two main measures for evaluating a system:
    • Recall
    • Precision
  • Both are based on the notion of relevance of the records retrieved.

22
Recall and Precision calculation
  • Precision: the ratio of useful items to the total retrieved
    $P = \frac{\text{number of relevant records retrieved}}{\text{number of records retrieved}}$
  • Recall: the extent to which all useful items in the database are found
    $R = \frac{\text{number of relevant records retrieved}}{\text{number of relevant records in the database}}$
  • E.g., 100 relevant records in the database; 200 records retrieved in total, of which 80 are relevant
    • Recall = 80/100 = 80%
    • Precision = 80/200 = 40% (lots of junk)
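The two formulas translate directly into code. This sketch reproduces the slide's worked example; the function names are illustrative, and both functions assume their denominator is nonzero.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of the retrieved records that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved: int, relevant_in_database: int) -> float:
    """Share of all relevant records in the database that were retrieved."""
    return relevant_retrieved / relevant_in_database


# Numbers from the worked example on the slide
r = recall(80, 100)
p = precision(80, 200)
print(f"Recall = {r:.0%}, Precision = {p:.0%}")  # Recall = 80%, Precision = 40%
```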

23
Recall - Precision Relationship
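The relationship can be illustrated by walking down a ranked result list: as more records are examined, recall can only rise, while precision tends to fall as non-relevant records accumulate. The relevance judgments, the total of five relevant records in the database, and the `pr_at_cutoffs` helper below are all hypothetical.

```python
# Hypothetical ranked results: True marks a record judged relevant.
ranked = [True, True, False, True, False, False, True, False, False, False]
# Assumption: 5 relevant records exist in the database, one of which is never retrieved.
TOTAL_RELEVANT = 5


def pr_at_cutoffs(ranked: list[bool], total_relevant: int) -> list[tuple[int, float, float]]:
    """Return (k, precision, recall) after examining the top k records, for k = 1..n."""
    points = []
    hits = 0
    for k, is_relevant in enumerate(ranked, start=1):
        hits += int(is_relevant)
        points.append((k, hits / k, hits / total_relevant))
    return points


points = pr_at_cutoffs(ranked, TOTAL_RELEVANT)
for k, p, r in points:
    print(f"top {k:2d}: precision={p:.2f} recall={r:.2f}")
```

With this data recall climbs from 0.20 to 0.80 while precision drops from 1.00 to 0.40, though not strictly monotonically: it briefly recovers at ranks 4 and 7, where relevant records appear.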
24
Boolean Precision/Recall
  • AND increases precision
  • OR increases recall
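Below is a small sketch of why AND tends to raise precision and OR tends to raise recall, using hypothetical postings and relevance judgments. Since the AND result set is always a subset of the OR result set, its recall can never be higher; its precision is usually, but not necessarily, higher. The `precision_recall` helper is an illustrative name.

```python
# Hypothetical postings for two terms.
postings: dict[str, set[int]] = {
    "pigs": {1, 2, 3, 4, 5, 6},
    "space": {2, 3, 4, 7, 8, 9},
}
# Assumption: documents a user would judge relevant to "pigs in space".
# Document 5 is relevant but (hypothetically) never uses the word "space".
relevant = {2, 3, 4, 5}


def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for a retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)


p_or, r_or = precision_recall(postings["pigs"] | postings["space"], relevant)
p_and, r_and = precision_recall(postings["pigs"] & postings["space"], relevant)
print(f"OR : precision={p_or:.2f} recall={r_or:.2f}")   # OR : precision=0.44 recall=1.00
print(f"AND: precision={p_and:.2f} recall={r_and:.2f}")  # AND: precision=1.00 recall=0.75
```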

25
Free text/controlled vocabulary Precision/Recall
  • free-text and controlled-vocabulary terms each contribute to both precision and recall, but they do so in different ways

26
Factors affecting recall
  • Controlled vocabularies
    • Presence of broad concept terms
    • Linkage of semantically related terms
  • Free text
    • Length of record (number of access points)
    • Redundancy

Factors affecting precision
  • Free text
    • Specificity
    • Diversity of the ways concepts are represented
27
  • When do you prefer a high precision search?
  • When do you prefer a high recall search?