Title: Relevance, Precision
1Relevance, Precision, and Recall
2Outline
- Free Text vs. Controlled vocabulary
- Free text
- Boolean
- Relevance
- Precision and recall
3Construct for relevance
[Diagram labels: information, analysis, translation, create relevant connection, database, standards, searching, queries]
4Free text or natural language searching
- Why free-text searching?
- there is no thesaurus
- the term is new
- there are not many hits, either in the database or using controlled vocabulary
- free text searching retrieves a larger set of documents
5Free text searching problems
- reverse concepts: LIBRAR? and SCHOOL? match both "library school" and "school library"
- homographs: CRACK (cocaine or seismic fault?)
- excessive truncation: BOOK?
- acronyms: ADD, SAD, AIDS also match "add", "sad", and "aids"
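The truncation and acronym problems above can be sketched in a few lines of Python. This is only an illustration, assuming that "?" means open-ended right truncation and that matching ignores case; the helper names and the sample sentence are made up for the example and do not come from any particular retrieval system.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Turn a search term such as 'BOOK?' into a whole-word regex.

    Assumption: a trailing '?' is a right-truncation operator that
    matches any number of additional word characters.
    """
    stem = term.rstrip("?")
    suffix = r"\w*" if term.endswith("?") else ""
    return re.compile(r"\b" + re.escape(stem) + suffix + r"\b", re.IGNORECASE)


def matches(term: str, text: str) -> list[str]:
    """Return every word in text that the free-text term would hit."""
    return [m.group(0) for m in truncation_to_regex(term).finditer(text)]


text = "The bookkeeper was booking a room to read books about AIDS and hearing aids."

# Excessive truncation: BOOK? pulls in unrelated word forms.
print(matches("BOOK?", text))  # ['bookkeeper', 'booking', 'books']

# Acronym problem: a case-insensitive search cannot tell AIDS from "aids".
print(matches("AIDS", text))   # ['AIDS', 'aids']
```

A longer stem (for example BOOKS instead of BOOK?) trades some recall for fewer false hits.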
6Boolean Searching
- May be free text or controlled vocabulary
- Based on set theory
- Addresses four problems of free text searching
- variant word forms
- synonyms
- homographs
- acronyms
7AND
[Venn diagram: space AND pigs]
8OR
[Venn diagram: pigs OR hogs]
9AND
[Venn diagram: pigs, space, hogs]
10NOT
- (pigs AND space) NOT barns
[Venn diagram: space, pigs, barns]
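Because Boolean searching is based on set theory, the three operators map directly onto set intersection, union, and difference. The sketch below uses Python sets; the record IDs in the index are invented for illustration only.

```python
# Hypothetical index: each term maps to the set of record IDs containing it.
index = {
    "pigs": {1, 2, 3, 4},
    "hogs": {3, 4, 5},
    "space": {2, 4, 6},
    "barns": {2, 7},
}

# AND: records that contain both terms (intersection).
pigs_and_space = index["pigs"] & index["space"]

# OR: records that contain either term (union).
pigs_or_hogs = index["pigs"] | index["hogs"]

# NOT: remove records containing the excluded term (difference).
pigs_and_space_not_barns = (index["pigs"] & index["space"]) - index["barns"]

print(sorted(pigs_and_space))            # [2, 4]
print(sorted(pigs_or_hogs))              # [1, 2, 3, 4, 5]
print(sorted(pigs_and_space_not_barns))  # [4]
```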
11FREE TEXT vs CONTROLLED VOCABULARY DEBATE
ADVANTAGES OF FREE TEXT
- Low cost
- Simplified searching
- Full information content searchable
- Every word has equal retrieval value
- No human indexing errors
- No delay in incorporating terms
ADVANTAGES OF CONTROLLED VOCABULARY
- Solves many semantic problems
- Permits generic relationships to be identified
- Maps areas of knowledge
DISADVANTAGES OF FREE TEXT
- Greater burden on searcher
- Information implicitly but not overtly included in text may be missed
- Absence of specific to generic linkage
- Vocabulary of discipline must be known
DISADVANTAGES OF CONTROLLED VOCABULARY
- High cost
- Possible inadequacies of coverage
- Human error
- Possible out of date vocabulary
- Difficulty of systematically incorporating all relevant relationships between terms
Dubois, C.P.R. Free text vs. controlled vocabulary: a reassessment. Online Review 11(4) (1987) 248.
12Exercise
- Compare the following bibliographic records
- Think about controlled vocabulary vs. free text
13Relevance
- Oxford: closely connected or appropriate to the matter in hand
- Webster: (1) relation to the matter at hand; (2) the ability (as of an information retrieval system) to retrieve material that satisfies the needs of the user
14[Diagram labels: information, analysis, translation, create relevant connection, database, standards, queries]
15Relevance Dimensions
- Systematic: the item retrieved was in a form/format that meets the information need
- Topical: the item retrieved was on the topic
- Pertinence: the item retrieved is informative
- Utility: the item retrieved is useful for resolving my information need
- Motivational: the item retrieved may cause me to take other action(s)
Saracevic 1996
16RELEVANCE (summarized from Schamber, Eisenberg and Nilan (IPM 1990))
Relevance to a subject
- Topicality
- System-oriented
- Easier to measure, more easily operationalized, observable
- Rationalist tradition
- Shannon-Weaver model of communication
- Objective reality
User relevance
- User-oriented
- More closely related to reality
- Subjective
- Related to usefulness, utility
- Pertinence, satisfaction
- Dynamic, situational approach
- Based in users' perceptions
From Saracevic reading
- System or algorithmic relevance: relation between a query and information objects (texts) in the file of a system as retrieved, or as failed to be retrieved, by a given procedure or algorithm.
- Topical or subject relevance: relation between the subject or topic expressed in a query and the topic or subject covered by retrieved texts. Aboutness is the criterion.
- Cognitive relevance or pertinence: relation between the state of knowledge and cognitive information need of a user, and texts retrieved.
- Situational relevance or utility: relation between the situation, task, or problem at hand, and texts.
- Motivational or affective relevance: relation between the intents, goals, and motivations of a user, and texts retrieved.
17Relevance levels
- Binary
- (relevant vs. non-relevant)
- Multiple levels
- Not relevant
- Marginally relevant
- Fairly relevant
- Highly relevant
18Relevance
- How do you decide if a document is relevant to
your needs or interests?
19Relevance criteria identified by health
information users during Web searches
- In the document summary
- Title
- Description
- Date
- URL
- Format
- In the document itself
- Title
- Section heading
- Paragraph text
- Emphasized text
- Image
- Hyperlink
- Navigation
- List item
- Citation or reference
20What other factors affect relevance?
- Novelty (Xu and Chen, 2006)
- Understandability (Xu and Chen, 2006)
- Reliability (Xu and Chen, 2006)
- Document presentation order (Huang and Wang, 2004)
- Number of documents judged (Huang and Wang, 2004)
21Evaluation of Retrieval
- How well does an information retrieval system perform?
- There are two main measures for the evaluation of a system
- Recall
- Precision
- Based on the notion of relevance of records retrieved.
22Recall and Precision calculation
- Precision: ratio of useful items to total retrieved
- P = (number of relevant records retrieved) / (number of records retrieved)
- Recall: extent to which all useful items are found, from the total in the database
- R = (number of relevant records retrieved) / (number of relevant records in the database)
- E.g. 100 relevant records in the database
- 80 records retrieved that are relevant
- 200 records retrieved in total
- Recall = 80/100 = 80%
- Precision = 80/200 = 40% (lots of junk)
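The same calculation can be written as two small Python functions. This is a minimal sketch; the function and parameter names are chosen for this example, and the numbers are the ones from the slide.

```python
def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Share of the retrieved records that are relevant."""
    return relevant_retrieved / total_retrieved


def recall(relevant_retrieved: int, relevant_in_database: int) -> float:
    """Share of the relevant records in the database that were retrieved."""
    return relevant_retrieved / relevant_in_database


# Slide example: 100 relevant records exist, 200 were retrieved, 80 of those are relevant.
print(f"Recall: {recall(80, 100):.0%}")        # Recall: 80%
print(f"Precision: {precision(80, 200):.0%}")  # Precision: 40%
```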
23Recall - Precision Relationship
24Boolean Precision/Recall
- AND increases precision
- OR increases recall
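The effect of each operator can be illustrated on a toy collection. The record IDs and relevance judgments below are invented, so the numbers show the typical tendency rather than a guarantee: AND narrows the retrieved set and OR widens it.

```python
# Hypothetical data: record IDs judged relevant, and the records each term retrieves.
relevant = {1, 2, 3, 4}
pigs = {1, 2, 3, 5, 6}
space = {1, 2, 7, 8}
hogs = {3, 4, 9}


def evaluate(retrieved: set, relevant: set) -> tuple:
    """Return (precision, recall) for one retrieved set."""
    hits = retrieved & relevant
    return len(hits) / len(retrieved), len(hits) / len(relevant)


print(evaluate(pigs, relevant))          # single term: (0.6, 0.75)
print(evaluate(pigs & space, relevant))  # AND: (1.0, 0.5), precision up, recall down
print(evaluate(pigs | hogs, relevant))   # OR: recall rises to 1.0, precision drops
```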
25Free text/controlled vocabulary Precision/Recall
- free text and controlled vocabulary terms each
contribute to precision and each to recall, but
they do so in different ways
26Factors affecting recall
- Controlled vocabularies:
- Presence of broad concept terms
- Linkage of semantically related terms
- Free text:
- Length of record (number of access points)
- Redundancy
Factors affecting precision
- Free text: specificity, diversity of the way concepts are represented
27- When do you prefer a high precision search?
- When do you prefer a high recall search?