Title: Relevance Ranking and Relevance Feedback
1 Relevance Ranking and Relevance Feedback
2 Motivation - Feast or Famine
- Queries return either too few or too many results
- Users are generally looking for the best document with a particular piece of information
- Users don't want to look through hundreds of documents to locate the information
- ⇒ Rank documents according to expected relevance!
3 Model
- Can we get user feedback?
- Document score is influenced by similarity to previously user-rated documents
- Can we utilize external information?
- E.g., how many other documents reference this document?
4 Queries
- Most queries are short
- One to three words
- Many queries are ambiguous
- Saturn
- Saturn the planet?
- Saturn the car?
5 Sample Internal Features
- Term frequency
- Location of term appearance in document
- Capitalization
- Font
- Relative font size
- Bold, italic, ...
- Appearance in <title>, <meta>, <h?> tags
- Co-location with other (relevant) words
6 Sample External Features
- Document citations
- How often is it cited?
- How important are the documents that cited it?
- Relevance of text surrounding hyperlink
- Relevance of documents citing this document
- Location within website (e.g. height in the directory structure or click distance from /)
- Popularity of pages from similar queries
- Search engine links often connect through the search site so they can track click-throughs
- Your idea here
7 Problem
- Given all these features, how do we rank the search results?
- Often use similarity between query and document
- May use other factors to weight the ranking score, such as citation ranking
- May use an iterative search which ranks documents according to similarity/dissimilarity to the query and previously marked relevant/irrelevant documents
8 Relevance Feedback
- Often an information retrieval system does not return useful information on the first try!
- If at first you don't succeed: try, try again
- Find out from the user which results were most relevant, and try to find more documents like them and fewer like the others
- Assumption: relevant documents are somehow similar to each other and different from irrelevant documents
- Question: how?
9 Relevance Feedback Methods
- Two general approaches
- Create new queries with user feedback
- Create new queries automatically
- Re-compute document weights with new information
- Expand or modify the query to more accurately reflect the user's desires
10 Vector Space Re-Weighting
- Given a query Q with its query vector q
- Initial, user-annotated results D
- Dr = relevant, retrieved documents
- Dn = irrelevant, retrieved documents
- di are the document weight vectors
- Update the weights on query vector q
- Re-compute the similarity score with the new q
11 Vector Space Re-Weighting
- Basic idea
- Increase weight for terms appearing in relevant documents
- Decrease weight for terms appearing in irrelevant documents
- There are a few standard equations
12 Vector Space Re-Weighting
- Rocchio
- q' = αq + (β/|Dr|) Σ_{di∈Dr} di − (γ/|Dn|) Σ_{di∈Dn} di
- Ide regular
- q' = αq + β Σ_{di∈Dr} di − γ Σ_{di∈Dn} di
- Ide Dec-Hi
- q' = αq + β Σ_{di∈Dr} di − γ max_{di∈Dn}(di)
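As a concrete illustration, the Rocchio update above can be sketched in a few lines. The α/β/γ defaults are commonly used values, not taken from the slides:

```python
import numpy as np

def rocchio_update(q, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio re-weighting: move the query vector toward the centroid
    of the relevant documents and away from the irrelevant ones.

    q          : query term-weight vector
    relevant   : list of document weight vectors marked relevant (Dr)
    irrelevant : list of document weight vectors marked irrelevant (Dn)
    """
    q_new = alpha * np.asarray(q, dtype=float)
    if len(relevant):
        q_new = q_new + (beta / len(relevant)) * np.sum(relevant, axis=0)
    if len(irrelevant):
        q_new = q_new - (gamma / len(irrelevant)) * np.sum(irrelevant, axis=0)
    # Note: components may go negative (see slide 13); some systems clip to zero
    return q_new
```

The Ide variants differ only in dropping the 1/|Dr|, 1/|Dn| normalization (and, for Dec-Hi, subtracting only the highest-ranked irrelevant document).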
13 Vector Space Re-Weighting
- The initial query vector q0 will have non-zero weights only for terms appearing in the query
- The query vector update process can add weight to terms that don't appear in the original query
- Automatic expansion of the original query terms
- Some terms can end up having negative weight!
- E.g., if you want to find information on the planet Saturn, "car" could have a negative weight
14 Probabilistic Re-Weighting
- After the initial search, get feedback from the user on document relevance
- Use the relevance information to recalculate term weights
- Re-compute similarity and try again
15 Probabilistic Re-Weighting
- Remember from last time
- Simij = Σk wik wjk (ln(P(tk|R)/(1 − P(tk|R))) + ln((1 − P(tk|¬R))/P(tk|¬R)))
- With no relevance information, assume
- P(tk|R) = 0.5
- P(tk|¬R) = ni/N, where ni = # of docs containing tk
- ⇒
- Simij = Σk wik wjk ln((N − ni)/ni)
16 Probabilistic Re-Weighting
- Given document relevance feedback
- Dr = set of relevant retrieved docs
- Drk = subset of Dr containing tk
- ⇒
- P(tk|R) = |Drk| / |Dr|
- P(tk|¬R) = (ni − |Drk|) / (N − |Dr|)
17 Probabilistic Re-Weighting
- Substituting the new probabilities gives
- Simij = Σk wik wjk ln((|Drk| / (|Dr| − |Drk|)) ÷ ((ni − |Drk|) / (N − |Dr| − (ni − |Drk|))))
- However, small values of |Drk| and |Dr| can cause problems, so usually a fudge factor is added
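The per-term weight after feedback can be sketched as follows; adding 0.5 to each count is one common choice of fudge factor, assumed here rather than specified by the slides:

```python
import math

def prob_term_weight(n_docs, n_k, d_r, d_rk, adj=0.5):
    """Per-term component of the probabilistic similarity after feedback.

    n_docs : N, total documents in the collection
    n_k    : ni, documents containing term tk
    d_r    : |Dr|, relevant retrieved documents
    d_rk   : |Drk|, relevant retrieved documents containing tk
    adj    : smoothing constant guarding against zero counts
    """
    p_rel = (d_rk + adj) / (d_r + 2 * adj)                  # P(tk|R)
    p_irr = (n_k - d_rk + adj) / (n_docs - d_r + 2 * adj)   # P(tk|~R)
    # ln of the odds ratio: positive when the term favors relevance
    return math.log((p_rel / (1 - p_rel)) / (p_irr / (1 - p_irr)))
```

A term concentrated in the relevant set gets a positive weight; a term appearing only in irrelevant documents gets a negative one.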
18 Probabilistic Re-Weighting
- Effectively updates query term weights
- No automatic query expansion
- Terms not in the initial query are never considered
- No memory of previous weights
19 Query Expansion Via Local Clustering
- Create new queries automatically
- Cluster the initial search results
- Use the clusters to create new queries
- Compute Sk(n), the set of keywords similar to tk that should be added to the query
- D = set of retrieved documents
- V = vocabulary of D
20 Association Clustering
- Find terms that frequently co-occur within documents
- skl = ckl = Σi fik fil
- Or, use the normalized association matrix s
- skl = ckl / (ckk + cll − ckl)
- Association cluster Sk(n)
- Sk(n) = the n largest skl s.t. l ≠ k
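A minimal sketch of association clustering over the retrieved set, assuming each document is represented as a term-frequency dict:

```python
from collections import defaultdict

def association_cluster(doc_term_freqs, t_k, n, normalized=True):
    """Return Sk(n): the n terms most strongly co-occurring with t_k.

    doc_term_freqs: list of {term: frequency} dicts, one per retrieved doc.
    """
    # c_kl = sum_i f_ik * f_il
    c = defaultdict(float)
    for freqs in doc_term_freqs:
        f_k = freqs.get(t_k, 0)
        if not f_k:
            continue
        for t_l, f_l in freqs.items():
            c[t_l] += f_k * f_l
    c_kk = c.get(t_k, 0.0)
    scores = {}
    for t_l, c_kl in c.items():
        if t_l == t_k:          # Sk(n) excludes t_k itself (l != k)
            continue
        if normalized:
            c_ll = sum(f.get(t_l, 0) ** 2 for f in doc_term_freqs)
            scores[t_l] = c_kl / (c_kk + c_ll - c_kl)
        else:
            scores[t_l] = c_kl
    return sorted(scores, key=scores.get, reverse=True)[:n]
```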
21 Metric Clustering
- Measure the distance between keyword appearances in a document
- r(tk, tl) = # of words between tk and tl
- ckl = Σtk Σtl (1 / r(tk, tl))
- Normalized: skl = ckl / (|tk| × |tl|)
- |tk| = # of words stemmed to tk
- Sk(n) is the same as before
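Metric clustering can be sketched the same way; here r(tk, tl) is taken as the position difference between occurrences, a common simplification of "words between":

```python
from collections import defaultdict

def metric_cluster(doc_tokens, t_k, n):
    """Return Sk(n) under metric clustering.

    doc_tokens: list of token lists (already stemmed), one per document.
    """
    c = defaultdict(float)    # c_kl = sum over occurrence pairs of 1/r
    count = defaultdict(int)  # |t_l| = total occurrences of each term
    for tokens in doc_tokens:
        positions = defaultdict(list)
        for pos, tok in enumerate(tokens):
            positions[tok].append(pos)
            count[tok] += 1
        for p_k in positions.get(t_k, []):
            for t_l, plist in positions.items():
                if t_l == t_k:
                    continue
                for p_l in plist:
                    c[t_l] += 1.0 / abs(p_k - p_l)   # inverse distance
    # normalized s_kl = c_kl / (|t_k| * |t_l|)
    scores = {t_l: v / (count[t_k] * count[t_l]) for t_l, v in c.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n]
```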
22 Scalar Clustering
- Terms with similar association clusters are more likely to be synonyms
- sk is the row vector of skl values from association clustering
- skl = (sk · sl) / (|sk| × |sl|)
- Sk(n) is the same as before
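Scalar clustering is just cosine similarity between rows of the association matrix; a sketch assuming the matrix is a NumPy array indexed by a parallel term list:

```python
import numpy as np

def scalar_cluster(assoc_matrix, terms, t_k, n):
    """Return Sk(n): terms whose association-matrix rows point in a
    similar direction (cosine similarity) to t_k's row."""
    k = terms.index(t_k)
    s_k = np.asarray(assoc_matrix[k], dtype=float)
    scores = {}
    for l, t_l in enumerate(terms):
        if l == k:
            continue
        s_l = np.asarray(assoc_matrix[l], dtype=float)
        denom = np.linalg.norm(s_k) * np.linalg.norm(s_l)
        scores[t_l] = float(np.dot(s_k, s_l) / denom) if denom else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:n]
```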
23 Query Expansion Via Local Context Analysis
- Combine information from the initial search results and the global corpus
- Break retrieved documents into fixed-length passages
- Treat each passage as a document, and rank-order the passages using the initial query
- Compute the weight of each term in the top-ranked passages using a TF-IDF-like similarity with the query
- Take the top m terms and add them to the original query with weight
- w = 1 − 0.9 (rank / m)
24 Local Context Analysis
- Compute the weight of each term in the top-ranked passages
- N = # of passages
- nk = # of passages containing tk
- pfik = frequency of tk in passage pi
- f(tk, tl) = Σi pfik pfil
- idfk = max(1, log10(N/nk)/5)
- Sim(q, tk) = Π_{tl∈query} (δ + ln(f(tk, tl) × idfk)/ln(N))^idfl
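The LCA score can be sketched directly from the definitions above; the value of the δ constant below is an assumption (it only needs to be a small positive constant keeping the product non-zero):

```python
import math

def lca_score(query_terms, t_k, passages, delta=0.1):
    """Local Context Analysis score of candidate term t_k for the query,
    computed over the top-ranked passages ({term: freq} dicts)."""
    N = len(passages)

    def idf(t):
        n_t = sum(1 for p in passages if t in p)
        return max(1.0, math.log10(N / n_t) / 5) if n_t else 1.0

    def cooccur(t_l):  # f(t_k, t_l) = sum_i pf_ik * pf_il
        return sum(p.get(t_k, 0) * p.get(t_l, 0) for p in passages)

    idf_k = idf(t_k)
    score = 1.0
    for t_l in query_terms:
        f = cooccur(t_l)
        inner = delta + (math.log(f * idf_k) / math.log(N) if f else 0.0)
        score *= inner ** idf(t_l)   # product over query terms
    return score
```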
25 Global Analysis
- Examine all documents in the corpus
- Create a corpus-wide data structure that is used
by all queries
26 Similarity Thesaurus
- itfi = ln(|V| / |Vi|)
- wik = ((0.5 + 0.5 fik/maxi(fik)) itfi) / √(Σj (0.5 + 0.5 fjk/maxj(fjk))² itfj²)
- ckl = Σi wik wil
27 Query Expansion With Similarity Thesaurus
- wqk = the weight from above, with the query treated as a document
- sim(q, tl) = Σk wqk ckl
- Take the top m terms according to sim(q, tl)
- Assign query weights to the new terms
- wk = sim(q, tk) / Σl wql
- Re-run with the new (weighted) query
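The expansion step can be sketched as follows, assuming the thesaurus correlations ckl are supplied as a sparse dict keyed by term pairs:

```python
def expand_query(query_weights, concept_sim, m):
    """Expand a weighted query using a similarity-thesaurus matrix.

    query_weights: {term: w_qk} for the current query
    concept_sim  : {(t_k, t_l): c_kl} sparse correlation matrix
    Returns the m best new terms with their assigned weights.
    """
    # sim(q, t_l) = sum_k w_qk * c_kl, over terms not already in the query
    candidates = {}
    for (t_k, t_l), c_kl in concept_sim.items():
        if t_k in query_weights and t_l not in query_weights:
            candidates[t_l] = candidates.get(t_l, 0.0) + query_weights[t_k] * c_kl
    top = sorted(candidates, key=candidates.get, reverse=True)[:m]
    # new-term weight: w_l = sim(q, t_l) / sum_k w_qk
    total_q = sum(query_weights.values())
    return {t_l: candidates[t_l] / total_q for t_l in top}
```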
28 SONIA Feedback
- SONIA is a meta-search engine
- Clusters search results
- Extracts keywords describing each cluster
- Allows the user to expand the search within a cluster
29 ifile Feedback
- ifile is an automatic mail filer for mh
- Email is automatically filed in folders
- User refile actions provide feedback on poor classification decisions
- Does not use positive feedback
- No reinforcement of correct decisions, only correction based on bad decisions
30 Search Engine Feedback
- Search engines rarely allow relevance feedback
- Too CPU-intensive
- Most people aren't willing to wait
- Search engines typically operate near the edge of their capacity
- Google has 2,000 servers(?) and 300TB of data
31 Meta-Search Engine Feedback
- Meta-search engines collect and process results from several search engines
- Most search engines do not allow users to specify query term weights
- Some search engines allow users to specify negative query terms
- User relevance feedback might be used to weight results by search engine
32 Pet Peeves
- What are your pet peeves when trying to locate information from
- Web?
- Intranet?
- Email?
- Local disk?
- Peer-to-peer network?
- Who knows?
33 Information Types
- What types of information do you typically try to find?
- Technical papers?
- Product information? (e.g., books, motherboards, ...)
- Travel? (e.g., where to go for vacation, and what to do there?)
- People? (e.g., Mehran Sahami)
34 Project
- First draft of search is available
- It is incomplete and buggy
- Can look at the architecture to see how you might extend it
- Python w/ extensions
- Should be installed on the cluster
- RPMs/SRPMs available on the web
- http://www.hpl.hp.com/personal/Carl_Staelin/cs236601/software/