Title: Model-based Feedback in the Language Modeling Approach to Information Retrieval
1. Model-based Feedback in the Language Modeling Approach to Information Retrieval
- Chengxiang Zhai and John Lafferty
- School of Computer Science
- Carnegie Mellon University
2. Outline
- The Language Modeling Approach to IR
- Feedback: Expansion-based vs. Model-based
- Two Model-based Feedback Algorithms
- Evaluation
- Conclusions & Future Work
3. Text Retrieval (TR)
- Given a query, find relevant documents in a document collection (⇒ ranking documents)
- Many applications (Web pages, news, email, ...)
- Many models developed (vector space, probabilistic, ...)
- The language modeling approach is a new and promising model
4. Retrieval as Language Model Estimation
- Document ranking based on query likelihood (Ponte & Croft 98, Miller et al. 99, Berger & Lafferty 99, Hiemstra 2000, etc.)
- Retrieval problem ⇒ estimation of p(w|d) (see the sketch below)
- Many advantages: good statistical foundation, reuse of existing LM methods, ...
- But feedback is awkward
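As a concrete illustration (not part of the original slides), here is a minimal Python sketch of query-likelihood scoring; the function name and toy data structures are assumptions, and Dirichlet smoothing with mu = 1,000 is chosen to match the experimental setup later in the deck.

import math
from collections import Counter

def query_likelihood_score(query_terms, doc_terms, collection_probs, mu=1000):
    # Score log p(q|d) with a Dirichlet-smoothed document model:
    #   p(w|d) = (c(w;d) + mu * p(w|C)) / (|d| + mu)
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_w_d = (doc_counts[w] + mu * collection_probs.get(w, 1e-9)) / (doc_len + mu)
        score += math.log(p_w_d)
    return score

# Rank documents by descending log-likelihood of the query, e.g.:
# ranked = sorted(docs, key=lambda d: query_likelihood_score(q, d, p_c), reverse=True)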
5. Feedback in Text Retrieval
- Learning from examples
- In effect, new, related terms are extracted to enhance the original query
- Generally leads to a performance increase (in both average precision and recall)
6. Relevance Feedback
7. Pseudo/Blind/Automatic Feedback
[Diagram: feedback loop. The retrieval engine runs the query against the document collection and returns scored results (d1 3.5, d2 2.4, ..., dk 0.5); judgments on d1, d2, d3, ..., dk drive a feedback step that produces an updated query.]
8. Feedback in the Language Modeling Approach
- Mostly expansion-based: adding new terms to the query (Ponte 1998, Miller et al. 1999, Ng 1999)
- Query term reweighting, no expansion (Hiemstra 2001)
- Implicit feedback (Berger & Lafferty 99)
- Conceptual inconsistency in expansion-based approaches:
- Original query interpreted as text
- Expanded query interpreted as text + terms
9. Question
- How to exploit language modeling to perform natural and effective feedback?
10. A KL-Divergence Unigram Retrieval Model
- A special case of the general risk minimization retrieval framework (Lafferty & Zhai 2001)
- Retrieval formula: score(Q, D) = -D(θ_Q ‖ θ_D) = Σ_w p(w|θ_Q) log p(w|θ_D) + H(θ_Q), where the query entropy H(θ_Q) is the same for all documents (ignored for ranking); see the sketch below
- Retrieval ⇒ estimation of θ_Q and θ_D
- Special case: using the empirical distribution of q as θ_Q recovers query likelihood
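A minimal sketch of the rank-equivalent score (an illustration, not from the slides): since the query-entropy term is constant across documents, ranking by -D(θ_Q ‖ θ_D) reduces to the cross-entropy term. The dict-based models are hypothetical.

import math

def kl_rank_score(query_model, doc_model):
    # Rank-equivalent to -D(theta_Q || theta_D): the query-entropy term is
    # the same for every document, so only the cross entropy matters.
    # Models are word -> probability dicts; a properly smoothed doc_model
    # assigns every word nonzero probability (the floor is just a guard).
    return sum(p_q * math.log(doc_model.get(w, 1e-9))
               for w, p_q in query_model.items())

With the empirical distribution of the query terms as query_model, this score is query likelihood divided by the query length, hence the "special case" bullet above.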
11. Expansion-based vs. Model-based
[Diagram, two pipelines. Expansion-based: query Q, modified with terms from the feedback docs, is scored against each document D's model by query likelihood to produce results. Model-based: query Q yields a query model, updated using the feedback docs, which is scored against each document model by KL divergence to produce results.]
12. Feedback as Model Interpolation
[Diagram: document D → smoothed ML document model; query Q → ML query model θ_Q; feedback docs F = {d1, d2, ..., dn} → feedback model θ_F, estimated by a generative model or by divergence minimization. The two models are interpolated: θ_Q' = (1 − α) θ_Q + α θ_F, with α = 0 giving no feedback and α = 1 full feedback.]
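A minimal sketch of this interpolation step (hypothetical helper name; models as word → probability dicts):

def interpolate_models(query_model, feedback_model, alpha):
    # theta_Q' = (1 - alpha) * theta_Q + alpha * theta_F
    # alpha = 0: no feedback (original query model unchanged)
    # alpha = 1: full feedback (query model replaced by theta_F)
    words = set(query_model) | set(feedback_model)
    return {w: (1 - alpha) * query_model.get(w, 0.0)
               + alpha * feedback_model.get(w, 0.0)
            for w in words}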
13. θ_F Estimation Method I: Generative Mixture Model
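The slide's equations did not survive extraction. As background: the mixture model assumes each word in the feedback documents is drawn from (1 − λ) p(w|θ_F) + λ p(w|C), with the background weight λ fixed, and θ_F is fit by EM. A minimal EM sketch under those assumptions (function and variable names are mine):

from collections import Counter

def estimate_feedback_mixture(feedback_docs, collection_probs, lam=0.7, iters=30):
    # feedback_docs: list of term lists; collection_probs: p(w|C) dict.
    counts = Counter()
    for doc in feedback_docs:
        counts.update(doc)
    total = sum(counts.values())
    # Initialize theta_F with the empirical word distribution of F.
    p_f = {w: c / total for w, c in counts.items()}
    for _ in range(iters):
        # E-step: posterior probability that w was generated by theta_F
        # rather than the background model.
        t = {}
        for w in counts:
            topic = (1 - lam) * p_f[w]
            noise = lam * collection_probs.get(w, 1e-9)
            t[w] = topic / (topic + noise)
        # M-step: re-estimate theta_F from the fractional counts.
        norm = sum(counts[w] * t[w] for w in counts)
        p_f = {w: counts[w] * t[w] / norm for w in counts}
    return p_f

A larger λ attributes more of the feedback text to the background, leaving a more discriminative θ_F (compare the λ = 0.9 vs. λ = 0.7 example on slide 15).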
14. θ_F Estimation Method II: Empirical Divergence Minimization
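Here too the formula was lost in extraction. The idea: choose θ_F to be close (in KL divergence) to the smoothed feedback-document models while staying far from the collection model, which has a closed-form solution. A sketch under those assumptions (dict-based models, small floor for unseen words; the default λ is only a placeholder value):

import math

def estimate_feedback_divmin(doc_models, collection_probs, lam=0.3):
    # p(w|theta_F) proportional to
    #   exp( 1/(1-lam) * avg_i log p(w|theta_di) - lam/(1-lam) * log p(w|C) ),
    # i.e. high average probability in the feedback docs, discounted by the
    # background model; lam controls how much background is subtracted.
    n = len(doc_models)
    words = set().union(*doc_models)
    log_w = {}
    for w in words:
        avg_log = sum(math.log(m.get(w, 1e-9)) for m in doc_models) / n
        log_w[w] = (avg_log - lam * math.log(collection_probs.get(w, 1e-9))) / (1 - lam)
    top = max(log_w.values())  # softmax with max-shift for numerical stability
    exps = {w: math.exp(v - top) for w, v in log_w.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}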
15. Example of Feedback Query Model
[Table: feedback query models for TREC topic 412 ("airport security"), mixture model approach, Web database, top 10 docs; word distributions shown for λ = 0.9 and λ = 0.7.]
16. Model-based Feedback vs. Simple LM
17. Sensitivity of Precision to α
18. Sensitivity of Precision to λ (Mixture Model & Divergence Min., α = 0.5)
Over-discrimination can be harmful
19. The Lemur Toolkit
- Language Modeling and Information Retrieval Toolkit
- Under development at CMU and UMass
- All experiments reported here were run using Lemur
- http://www.cs.cmu.edu/lemur
- Contact us if you are interested in using it
20. Conclusions
- Model-based feedback is natural and effective
- Performance is sensitive to both α and λ
- Mixture model: more sensitive to λ, less to α (α ≈ 0.5)
- Divergence minimization: more sensitive to α, less to λ (λ ≈ 0.3)
- The sensitivity suggests more robust models are needed, e.g., using the query to focus the model:
- Markov chain query model (Lafferty & Zhai, 2001)
- Relevance language model (Lavrenko & Croft, 2001)
21. Future Work
- Evaluating methods for relevance feedback
- Examples in pseudo feedback can be quite noisy
- Relevance feedback better reflects learning ability
- More robust feedback models, e.g.,
- Query-focused feedback (e.g., query translation model)
- Passage-based feedback (e.g., hidden Markov model)
23. Effect of Feedback on 3 Collections
Disk4&5-CR, topics 401-450 (TREC-8 ad hoc)
Web, topics 401-450 (TREC-8 small web, 2GB)
AP88-89, topics 101-150
- Document language model: fixed Dirichlet prior (μ = 1,000)
- Baseline: original ML query model (close to optimal)
- Feedback: mixture model & divergence minimization (best run shown)
- Top 10 docs for feedback
- Model truncated at p(w) > 0.001 (sketched below)
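For concreteness, a tiny sketch of the truncation step just mentioned (hypothetical helper; the threshold matches the setup above):

def truncate_model(model, threshold=0.001):
    # Keep only words with p(w) above the threshold, then renormalize,
    # so the feedback query model stays small enough for fast scoring.
    kept = {w: p for w, p in model.items() if p > threshold}
    z = sum(kept.values())
    return {w: p / z for w, p in kept.items()}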
24. Approaches to Text Retrieval
- TR ⇒ ranking documents w.r.t. a query
- Many approaches/models developed
- Vector-space models: ranking by query-document similarity (Salton et al. 75)
- Probabilistic models: ranking by probability of relevance given query and document (Robertson & Sparck Jones 76, Ponte & Croft 98)
- Big challenge: good performance without heuristic/ad hoc tuning of parameters
- The language modeling approach is promising
25. Model-based Feedback vs. Simple LM and Rocchio
27. Feedback in Text Retrieval
- General idea: learning from examples
- Where do examples (of good/relevant documents) come from?
- User provides relevance judgments ⇒ relevance feedback
- Assume top N (e.g., 10) documents to be relevant ⇒ pseudo/blind feedback
- In effect, new, related terms are extracted to enhance the original query
- Generally leads to a performance increase (in both average precision and recall)