Prof. Ray Larson

About This Presentation

Description:

Lecture 8: Probabilistic IR and Relevance Feedback SIMS 202: Information Organization and Retrieval Prof. Ray Larson & Prof. Marc Davis UC Berkeley SIMS – PowerPoint PPT presentation

Slides: 67

Learn more at: https://courses.ischool.berkeley.edu

Transcript and Presenter's Notes

1
Lecture 8: Probabilistic IR and Relevance Feedback
SIMS 202: Information Organization and Retrieval

Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/

2
Lecture Overview

Review
Vector Representation
Term Weights
Vector Matching
Clustering
Probabilistic Models of IR
Relevance Feedback

Credit for some of the slides in this lecture
goes to Marti Hearst
4
Document Vectors
5
Vector Space Documents and Queries
Q is a query also represented as a vector
Boolean term combinations
6
Documents in Vector Space
[Figure: documents D1 through D11 plotted in a three-dimensional term space with axes t1, t2, and t3]
7
Binary Weights

Only the presence (1) or absence (0) of a term is
included in the vector

8
Raw Term Weights

The frequency of occurrence for the term in each
document is included in the vector
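The difference between binary and raw term weights can be sketched in a few lines of Python. This is only an illustration: the function name `term_vectors`, the whitespace tokenizer, and the three-term vocabulary are assumptions made for the example and do not come from the lecture.

```python
from collections import Counter

def term_vectors(text, vocabulary):
    """Return (binary, raw) weight vectors for `text` over `vocabulary`.

    Tokenization is a plain lowercase whitespace split (an assumption
    made to keep the sketch short).
    """
    counts = Counter(text.lower().split())
    raw = [counts[term] for term in vocabulary]      # frequency of each term
    binary = [1 if c > 0 else 0 for c in raw]        # presence / absence only
    return binary, raw

vocabulary = ["information", "retrieval", "science"]
doc = "information retrieval is retrieval of information about information"
binary, raw = term_vectors(doc, vocabulary)
print(binary)  # [1, 1, 0]
print(raw)     # [3, 2, 0]
```

The binary vector records only that "information" and "retrieval" occur; the raw vector also keeps how often each occurs.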

9
tfidf weights
10
Inverse Document Frequency

IDF provides high values for rare words and low
values for common words

For a collection of 10000 documents (N = 10000)
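The slide's IDF formula and table did not survive in this transcript. As a rough sketch, assuming the common form IDF = log10(N / n_k), where n_k is the number of documents containing term k, and a tf-idf weight of tf * IDF, the behaviour described above looks like this. The function names are illustrative.

```python
import math

def idf(N, n_k):
    """Inverse document frequency, assuming the form log10(N / n_k)."""
    return math.log10(N / n_k)

def tfidf(tf, N, n_k):
    """Unnormalized tf-idf weight: term frequency times IDF (assumed form)."""
    return tf * idf(N, n_k)

N = 10000
for n_k in (1, 20, 5000, 10000):
    # Rare terms (small n_k) get high IDF; a term in every document gets 0.
    print(n_k, round(idf(N, n_k), 3))
```

Other IDF variants (for example, adding 1 to the logarithm) shift the values but keep the same ordering of rare versus common terms.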
11
tfidf Normalization

Normalize the term weights (so longer vectors are
not unfairly given more weight)
Normalize usually means force all values to fall
within a certain range, usually between 0 and 1,
inclusive

12
Vector Space Similarity

Now, the similarity of two documents is
$sim(D_i, D_j) = \sum_{k=1}^{t} w_{ik} \cdot w_{jk}$
This is also called the cosine, or normalized
inner product
The normalization was done when weighting the
terms
Note that the w_ik weights can be stored in the
vectors/inverted files for the documents

13
Vector Space Matching
Di = (di1, wdi1; di2, wdi2; ...; dit, wdit)
Q = (qi1, wqi1; qi2, wqi2; ...; qit, wqit)
Q = (0.4, 0.8), D1 = (0.8, 0.3), D2 = (0.2, 0.7)
[Figure: Q, D1, and D2 drawn as vectors in a plane with axes Term A and Term B, each scaled from 0 to 1.0]
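The matching step can be checked numerically with the three vectors from this slide, Q = (0.4, 0.8), D1 = (0.8, 0.3), and D2 = (0.2, 0.7). The sketch below computes the cosine directly from unnormalized weights; the function name `cosine` is an illustrative choice.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_a * norm_b)

Q = (0.4, 0.8)
D1 = (0.8, 0.3)
D2 = (0.2, 0.7)
print(round(cosine(Q, D1), 2))  # 0.73
print(round(cosine(Q, D2), 2))  # 0.98
```

D2 points in nearly the same direction as Q, so it ranks above D1 even though D1 has the larger weight on Term A.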
14
Vector Space Visualization
15
Document/Document Matrix
16
Text Clustering

Clustering is
"The art of finding groups in data."
-- Kaufman and Rousseeuw

[Figure: documents plotted against axes Term 1 and Term 2]
18
Problems with Vector Space

There is no real theoretical basis for the
assumption of a term space
It is more for visualization than having any real
basis
Most similarity measures work about the same
regardless of model
Terms are not really orthogonal dimensions
Terms are not independent of all other terms
Retrieval efficiency vs. indexing and update
efficiency for stored pre-calculated weights

19
Lecture Overview

Review
Vector Representation
Term Weights
Vector Matching
Clustering
Probabilistic Models of IR
Relevance Feedback

Credit for some of the slides in this lecture
goes to Marti Hearst
20
Probabilistic Models

Rigorous formal model attempts to predict the
probability that a given document will be
relevant to a given query
Ranks retrieved documents according to this
probability of relevance (Probability Ranking
Principle)
Relies on accurate estimates of probabilities

21
Probability Ranking Principle

If a reference retrieval system's response to
each request is a ranking of the documents in the
collections in the order of decreasing
probability of usefulness to the user who
submitted the request, where the probabilities
are estimated as accurately as possible on the
basis of whatever data has been made available to
the system for this purpose, then the overall
effectiveness of the system to its users will be
the best that is obtainable on the basis of that
data.

Stephen E. Robertson, J. Documentation 1977
22
Model 1: Maron and Kuhns

Concerned with estimating probabilities of
relevance at the point of indexing
If a patron came with a request using term ti,
what is the probability that she/he would be
satisfied with document Dj ?

23
Model 1

A patron submits a query (call it Q) consisting
of some specification of her/his information
need. Different patrons submitting the same
stated query may differ as to whether or not they
judge a specific document to be relevant. The
function of the retrieval system is to compute
for each individual document the probability that
it will be judged relevant by a patron who has
submitted query Q.

Robertson, Maron & Cooper, 1982
24
Model 1 Bayes

A is the class of events of using the library
Di is the class of events of Document i being
judged relevant
Ij is the class of queries consisting of the
single term Ij
P(Di | A, Ij) = probability that if a query is
submitted to the system then a relevant document
is retrieved

25
Model 2

Documents have many different properties; some
documents have all the properties that the patron
asked for, and other documents have only some or
none of the properties. If the inquiring patron
were to examine all of the documents in the
collection she/he might find that some having all
the sought after properties were relevant, but
others (with the same properties) were not
relevant. And conversely, he/she might find that
some of the documents having none (or only a few)
of the sought after properties were relevant,
others not. The function of a document retrieval
system is to compute the probability that a
document is relevant, given that it has one (or a
set) of specified properties.

Robertson, Maron & Cooper, 1982
26
Model 2: Robertson & Sparck Jones
Given a term t and a query q

                          Document Relevance
                          +          -                Total
Document Indexing    +    r          n - r            n
                     -    R - r      N - n - R + r    N - n
                 Total    R          N - R            N
27
Robertson-Sparck Jones Weights

Retrospective formulation

28
Robertson-Sparck Jones Weights

Predictive formulation
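The weight equations on these two slides were images and are missing from the transcript. A sketch of the forms usually given for the Robertson-Sparck Jones weight, using the r, n, R, N counts from the contingency table, follows. The retrospective form uses the counts directly; the predictive form adds 0.5 to each cell so that the weight stays defined when a cell is zero. The function names are illustrative, and the natural logarithm is an assumption (the base only rescales the weights).

```python
import math

def rsj_retrospective(r, n, R, N):
    """Retrospective weight: log odds ratio from the raw table counts.

    r: relevant documents containing the term
    n: documents containing the term
    R: relevant documents
    N: documents in the collection
    Undefined (division by zero or log of zero) if any cell is empty.
    """
    return math.log((r / (R - r)) / ((n - r) / (N - n - R + r)))

def rsj_predictive(r, n, R, N):
    """Predictive weight: the same ratio with 0.5 added to every cell."""
    return math.log(
        ((r + 0.5) / (R - r + 0.5))
        / ((n - r + 0.5) / (N - n - R + r + 0.5))
    )

# A term found in 8 of 10 relevant documents but only 50 of 1000 overall
# gets a large positive weight.
print(rsj_retrospective(8, 50, 10, 1000))
print(rsj_predictive(8, 50, 10, 1000))
```

With no relevance information at all (R = 0 and r = 0), the predictive form reduces to log((N - n + 0.5) / (n + 0.5)), which behaves like an inverse document frequency.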

29
Probabilistic Models: Some Unifying Notation

D = All present and future documents
Q = All present and future queries
(Di, Qj) = A document-query pair
x = class of similar documents
y = class of similar queries
Relevance (R) is a relation

30
Probabilistic Models

Model 1 -- Probabilistic Indexing, P(R | y, Di)
Model 2 -- Probabilistic Querying, P(R | Qj, x)
Model 3 -- Merged Model, P(R | Qj, Di)
Model 0 -- P(R | y, x)
Probabilities are estimated based on prior usage
or relevance estimation

31
Probabilistic Models
[Figure: diagram relating the sets Q and D, the classes y and x, and the individual query Qj and document Di]
32
Logistic Regression

Another approach to estimating probability of
relevance
Based on work by William Cooper, Fred Gey and
Daniel Dabney
Builds a regression model for relevance
prediction based on a set of training data
Uses less restrictive independence assumptions
than Model 2
Linked Dependence

33
So What's Regression?

A method for fitting a curve (not necessarily a
straight line) through a set of points using some
goodness-of-fit criterion
The most common type of regression is linear
regression

34
What's Regression?

Least Squares Fitting is a mathematical procedure
for finding the best fitting curve to a given set
of points by minimizing the sum of the squares of
the offsets ("the residuals") of the points from
the curve
The sum of the squares of the offsets is used
instead of the offset absolute values because
this allows the residuals to be treated as a
continuous differentiable quantity
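For the simplest case, a straight line, least squares fitting has a closed-form solution. The sketch below fits a slope and intercept and reports the sum of squared residuals that the fit minimizes. The function names and sample points are illustrative, not from the lecture.

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept by minimizing squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of squared vertical offsets of the points from the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]          # roughly y = 2x + 1 with noise
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
print(sum_squared_residuals(xs, ys, slope, intercept))
```

Any other line through these points, including the "true" y = 2x + 1, has a sum of squared residuals at least as large as the fitted line's.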

35
Logistic Regression
36
Probabilistic Models: Logistic Regression

Estimates for relevance based on log-linear model
with various statistical measures of document
content as independent variables

Log odds of relevance is a linear function of
attributes
Term contributions summed
Probability of relevance is the inverse of the log odds: P = 1 / (1 + e^(-log O))
37
Logistic Regression Attributes
Average Absolute Query Frequency
Query Length
Average Absolute Document Frequency
Document Length
Average Inverse Document Frequency
Inverse Document Frequency
Number of Terms in common between query and document -- logged
38
Logistic Regression

Probability of relevance is based on Logistic
regression from a sample set of documents to
determine values of the coefficients
At retrieval the probability estimate is obtained
by
For the 6 X attribute measures shown previously
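The retrieval-time equation is missing from this transcript. A minimal sketch of the general technique: the fitted coefficients give the log odds of relevance as a linear function of the attribute values, and the logistic function turns the log odds into a probability. The coefficient and attribute values below are made-up placeholders, not the values estimated by Cooper, Gey and Dabney.

```python
import math

def log_odds(intercept, coefficients, attributes):
    """Linear combination c0 + sum(c_i * X_i)."""
    if len(coefficients) != len(attributes):
        raise ValueError("need one coefficient per attribute")
    return intercept + sum(c * x for c, x in zip(coefficients, attributes))

def probability_of_relevance(intercept, coefficients, attributes):
    """Invert the log odds with the logistic function."""
    z = log_odds(intercept, coefficients, attributes)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values for six attributes X1..X6 and their coefficients.
c0 = -3.5
c = [0.4, -0.2, 0.1, -0.05, 0.3, 1.2]
X = [1.0, 2.0, 1.5, 6.0, 2.5, 1.1]
print(probability_of_relevance(c0, c, X))
```

Because the logistic function is monotonic, ranking documents by log odds gives the same order as ranking by probability.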

39
Probabilistic Models
Advantages

Strong theoretical basis
In principle should supply the best predictions
of relevance given available information
Can be implemented similarly to the vector model

Disadvantages

Relevance information is required -- or is
guesstimated
Important indicators of relevance may not be terms
-- though only terms are usually used
Optimally requires on-going collection of
relevance information

40
Vector and Probabilistic Models

Support natural language queries
Treat documents and queries the same
Support relevance feedback searching
Support ranked retrieval
Differ primarily in theoretical basis and in how
the ranking is calculated
Vector assumes relevance
Probabilistic relies on relevance judgments or
estimates

41
Current Use of Probabilistic Models

Virtually all the major systems in TREC now use
the Okapi BM25 formula which incorporates the
Robertson-Sparck Jones weights

42
Okapi BM25

Where
Q is a query containing terms T
K is k1((1-b) + b.dl/avdl)
k1, b and k3 are parameters, usually 1.2, 0.75
and 7-1000
tf is the frequency of the term in a specific
document
qtf is the frequency of the term in a topic from
which Q was derived
dl and avdl are the document length and the
average document length measured in some
convenient unit
w(1) is the Robertson-Sparck Jones weight
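The BM25 formula itself was an image and is missing from the transcript. The sketch below uses the form commonly given for TREC-era Okapi BM25, a sum over query terms T of w(1) * ((k1 + 1) tf / (K + tf)) * ((k3 + 1) qtf / (k3 + qtf)), with the symbols defined above. The function names and the choice k3 = 7 (the low end of the quoted range) are assumptions for the example.

```python
def bm25_term(w1, tf, qtf, dl, avdl, k1=1.2, b=0.75, k3=7.0):
    """Contribution of one query term to the BM25 score of one document."""
    K = k1 * ((1 - b) + b * dl / avdl)              # length-normalized k1
    doc_part = (k1 + 1) * tf / (K + tf)             # saturating in tf
    query_part = (k3 + 1) * qtf / (k3 + qtf)        # saturating in qtf
    return w1 * doc_part * query_part

def bm25_score(terms, dl, avdl, **params):
    """Sum term contributions; `terms` is a list of (w1, tf, qtf) tuples."""
    return sum(bm25_term(w1, tf, qtf, dl, avdl, **params)
               for w1, tf, qtf in terms)

# Two query terms scored against a document of average length.
terms = [(2.0, 1, 1), (4.5, 3, 1)]
print(bm25_score(terms, dl=100, avdl=100))
```

For a document of average length with tf = 1 and qtf = 1, both fractions equal 1, so the term contributes exactly its w(1) weight; longer documents increase K and lower the contribution.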

43
Language Models

A recent addition to the probabilistic models is
language modeling that estimates the
probability that a query could have been produced
by a given document.
This is a slight variation on the other
probabilistic models that has led to some modest
improvements in performance

44
Logistic Regression and Cheshire II

The Cheshire II system (see readings) uses
Logistic Regression equations estimated from TREC
full-text data
Used for a number of production level systems
here and in the U.K.

45
Lecture Overview

Review
Vector Representation
Term Weights
Vector Matching
Clustering
Probabilistic Models of IR
Relevance Feedback

Credit for some of the slides in this lecture
goes to Marti Hearst
46
Querying in IR System
47
Relevance Feedback in an IR System
48
Query Modification

Problem: How to reformulate the query?
Thesaurus expansion
Suggest terms similar to query terms
Relevance feedback
Suggest terms (and documents) similar to
retrieved documents that have been judged to be
relevant

49
Relevance Feedback

Main Idea
Modify existing query based on relevance
judgements
Extract terms from relevant documents and add
them to the query
And/or re-weight the terms already in the query
Two main approaches
Automatic (pseudo-relevance feedback)
Users select relevant documents
Users/system select terms from an
automatically-generated list

50
Relevance Feedback

Usually do both
Expand query with new terms
Re-weight terms in query
There are many variations
Usually positive weights for terms from relevant
docs
Sometimes negative weights for terms from
non-relevant docs
Remove terms ONLY in non-relevant documents

51
Rocchio Method
52
Rocchio/Vector Illustration
Q0 = "retrieval of information" = (0.7, 0.3)
D1 = "information science" = (0.2, 0.8)
D2 = "retrieval systems" = (0.9, 0.1)
Q' = ½ Q0 + ½ D1 = (0.45, 0.55)
Q'' = ½ Q0 + ½ D2 = (0.80, 0.20)
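The two modified queries in this illustration can be reproduced with a small Rocchio sketch. It assumes the usual form of the method, new query = alpha * Q0 + beta * (centroid of relevant documents) - gamma * (centroid of non-relevant documents), with alpha = beta = 0.5 and gamma = 0 to match the ½ weights on the slide. The function names are illustrative.

```python
def centroid(vectors, dims):
    """Mean vector; all zeros if no vectors are given."""
    if not vectors:
        return [0.0] * dims
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

def rocchio(q0, relevant, non_relevant=(), alpha=0.5, beta=0.5, gamma=0.0):
    """Move the query toward relevant and away from non-relevant documents."""
    dims = len(q0)
    rel = centroid(list(relevant), dims)
    non = centroid(list(non_relevant), dims)
    return [alpha * q + beta * r - gamma * s for q, r, s in zip(q0, rel, non)]

Q0 = (0.7, 0.3)   # "retrieval of information"
D1 = (0.2, 0.8)   # "information science"
D2 = (0.9, 0.1)   # "retrieval systems"
print([round(w, 2) for w in rocchio(Q0, [D1])])  # [0.45, 0.55]
print([round(w, 2) for w in rocchio(Q0, [D2])])  # [0.8, 0.2]
```

With a non-zero gamma some weights can become negative; many implementations clip them to zero, which this sketch does not do.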
53
Example Rocchio Calculation
Relevant docs
Non-rel doc
Original Query
Constants
Rocchio Calculation
Resulting feedback query
54
Rocchio Method

Rocchio automatically
Re-weights terms
Adds in new terms (from relevant docs)
Have to be careful when using negative terms
Rocchio is not a machine learning algorithm
Most methods perform similarly
Results heavily dependent on test collection
Machine learning methods are proving to work
better than standard IR approaches like Rocchio

55
Probabilistic Relevance Feedback
Given a query term t

                          Document Relevance
                          +          -                Total
Document Indexing    +    r          n - r            n
                     -    R - r      N - n - R + r    N - n
                 Total    R          N - R            N

Where N is the number of documents seen
56
Robertson-Sparck Jones Weights

Retrospective formulation

57
Using Relevance Feedback

Known to improve results
In TREC-like conditions (no user involved)
What about with a user in the loop?
How might you measure this?

58
Relevance Feedback Summary

Iterative query modification can improve
precision and recall for a standing query
In at least one study, users were able to make
good choices by seeing which terms were suggested
for R.F. and selecting among them (Koenemann &
Belkin)

59
Alternative Notions of Relevance Feedback

Find people whose taste is similar to yours
Will you like what they like?
Follow a user's actions in the background
Can this be used to predict what the user will
want to see next?
Track what lots of people are doing
Does this implicitly indicate what they think is
good and not good?

60
Alternative Notions of Relevance Feedback

Several different criteria to consider
Implicit vs. Explicit judgements
Individual vs. Group judgements
Standing vs. Dynamic topics
Similarity of the items being judged vs.
similarity of the judges themselves

61
Collaborative Filtering (Social Filtering)

If Pam liked the paper, I'll like the paper
If you liked Star Wars, you'll like Independence
Day
Rating based on ratings of similar people
Ignores the text, so works on text, sound,
pictures, etc.
But: initial users can bias ratings of future
users

62
Ringo Collaborative Filtering

Users rate musical artists from like to dislike
1 = detest, 7 = can't live without, 4 = ambivalent
There is a normal distribution around 4
However, what matters are the extremes
Nearest Neighbors Strategy: find similar users
and predict a (weighted) average of user ratings
Pearson r algorithm: weight by degree of
correlation between user U and user J
1 means very similar, 0 means no correlation, -1
means dissimilar
Works better to compare against the ambivalent
rating (4), rather than the individual's average
score
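The nearest-neighbor strategy can be sketched as follows. The correlation is computed around the ambivalent rating 4 instead of each user's mean, as the last bullet suggests, and the prediction is a correlation-weighted average over neighbors who rated the artist. Using only positively correlated neighbors, and all names and ratings below, are assumptions for the example and are not details taken from Ringo.

```python
import math

MIDPOINT = 4  # the "ambivalent" rating on the 1-7 scale

def constrained_pearson(u, j, midpoint=MIDPOINT):
    """Correlation of two users' ratings, measured around the midpoint.

    `u` and `j` map artist names to ratings; only artists rated by
    both users are compared.
    """
    common = set(u) & set(j)
    num = sum((u[a] - midpoint) * (j[a] - midpoint) for a in common)
    den = math.sqrt(sum((u[a] - midpoint) ** 2 for a in common)
                    * sum((j[a] - midpoint) ** 2 for a in common))
    return num / den if den else 0.0

def predict(user, others, artist, threshold=0.0):
    """Weighted average of neighbors' ratings for `artist`, or None."""
    weighted, total = 0.0, 0.0
    for other in others:
        if artist not in other:
            continue
        w = constrained_pearson(user, other)
        if w > threshold:                 # keep only similar users
            weighted += w * other[artist]
            total += w
    return weighted / total if total else None

user = {"A": 7, "B": 1}
similar = {"A": 6, "B": 2, "C": 7}
opposite = {"A": 1, "B": 7, "C": 1}
print(constrained_pearson(user, similar))    # 1.0
print(constrained_pearson(user, opposite))   # -1.0
print(predict(user, [similar, opposite], "C"))  # 7.0
```

The user with opposite tastes gets a correlation of -1 and is ignored here, so the prediction for artist "C" comes entirely from the similar user.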

63
Social Filtering

Ignores the content, only looks at who judges
things similarly
Works well on data relating to taste
something that people are good at predicting
about each other too
Does it work for topic?
GroupLens results suggest otherwise (preliminary)
Perhaps for quality assessments
What about for assessing if a document is about a
topic?

64
Summary

Relevance feedback is an effective means for
user-directed query modification
Modification can be done with either direct or
indirect user input
Modification can be done based on an individual's
or a group's past input

65
David Hong on Cheshire

Cheshire II provided the paradigm of a fully
standards-based IR system (SGML and Z39.50
Protocol). While there are both benefits and
drawbacks to implementing standards-based
technologies, what can other IR systems gain from
being standards-compliant and how could this
model make other IR systems more flexible?
Cheshire II's interface allows users to specify
conventional Boolean matching and probabilistic
search. How would you infer this level of
granularity in the form of a natural language
query?
What would be some of the potential benefits of
doing feedback searching with multiple records in
a large Internet search engine?
What are the potential barriers in implementing
this feature?

66
Next Time

Information Retrieval Evaluation; more on
collaborative filtering
Readings for next time
An Evaluation of Retrieval Effectiveness (Blair &
Maron)
Rave Reviews: Acquiring Relevance Assessments
from Multiple Users (Belew)
A Case for Interaction: A Study of Interactive
Information Retrieval Behavior and Effectiveness
(Koenemann & Belkin)
Work Tasks and Socio-Cognitive Relevance: A
Specific Example (Hjørland & Christensen)
Social Information Filtering: Algorithms for
Automating "Word of Mouth" (Shardanand & Maes)