1
Model-based Feedback in the Language Modeling
Approach to Information Retrieval
  • Chengxiang Zhai and John Lafferty
  • School of Computer Science
  • Carnegie Mellon University

2
Outline
  • The Language Modeling Approach to IR
  • Feedback: Expansion-based vs. Model-based
  • Two Model-based feedback algorithms
  • Evaluation
  • Conclusions & Future Work

3
Text Retrieval (TR)
  • Given a query, find relevant documents in a
    document collection (⇒ ranking documents)
  • Many applications (Web pages, news, email, ...)
  • Many models developed (vector space,
    probabilistic, ...)
  • The language modeling approach is a promising
    new model

4
Retrieval as Language Model Estimation
  • Document ranking based on query likelihood (Ponte
    & Croft 98, Miller et al. 99, Berger & Lafferty
    99, Hiemstra 2000, etc.)
  • Retrieval problem ⇒ estimation of p(w|d)
  • Many advantages: good statistical foundation,
    reuse of existing LM methods, ...
  • But feedback is awkward
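
As a concrete illustration of the query-likelihood criterion, here is a
minimal Python sketch of scoring one document, using Dirichlet-prior
smoothing (the smoothing choice, function name, and data layout are
assumptions, not from the slides; slide 23 does report a fixed Dirichlet
prior with μ = 1,000, which is the default below):

```python
import math
from collections import Counter

def query_likelihood_score(query_terms, doc_terms, coll_tf, coll_len, mu=1000.0):
    """Log query likelihood log p(Q|d) under a Dirichlet-smoothed unigram
    document model: p(w|d) = (c(w;d) + mu * p(w|C)) / (|d| + mu).
    coll_tf maps each word to its collection-wide count; coll_len is the
    total number of tokens in the collection."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for w in query_terms:
        p_wc = coll_tf.get(w, 0) / coll_len          # background p(w|C)
        p_wd = (doc_tf[w] + mu * p_wc) / (doc_len + mu)
        if p_wd > 0:                                 # skip words unseen anywhere
            score += math.log(p_wd)
    return score
```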

5
Feedback in Text Retrieval
  • Learning from examples
  • In effect, new, related terms are extracted to
    enhance the original query
  • Generally leads to a performance increase (in both
    average precision and recall)

6
Relevance Feedback
7
Pseudo/Blind/Automatic Feedback
(Diagram: the feedback loop. A query is run through the retrieval engine
over the document collection, producing ranked results (d1 3.5, d2 2.4,
..., dk 0.5); the top-ranked documents are taken as judgments (d1, d2, d3,
..., dk), from which the feedback component derives an updated query.)
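
In code form, one round of this loop might look like the sketch below;
`engine.retrieve` and `engine.update_query` are hypothetical interfaces
standing in for a concrete retrieval system:

```python
def pseudo_feedback_run(engine, query, k=10):
    """One round of pseudo/blind feedback: retrieve, assume the top-k
    documents are relevant, derive an updated query, and retrieve again."""
    results = engine.retrieve(query)            # initial ranked list
    assumed_relevant = [doc for doc, _score in results[:k]]
    updated_query = engine.update_query(query, assumed_relevant)
    return engine.retrieve(updated_query)       # re-run with updated query
```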
8
Feedback in the Language Modeling Approach
  • Mostly expansion-based: adding new terms to the
    query (Ponte 1998, Miller et al. 1999, Ng 1999)
  • Query term reweighting, no expansion (Hiemstra
    2001)
  • Implicit feedback (Berger & Lafferty 99)
  • Conceptual inconsistency in expansion-based
    approaches
  • Original query treated as text
  • Expanded query treated as text + terms

9
Question
  • How to exploit language modeling to perform
    natural and effective feedback?

10
A KL-Divergence Unigram Retrieval Model
  • A special case of the general risk minimization
    retrieval framework (Lafferty & Zhai 2001)
  • Retrieval formula: rank documents by
    -D(θ_Q || θ_D) = Σ_w p(w|θ_Q) log p(w|θ_D) - Σ_w p(w|θ_Q) log p(w|θ_Q)
    where the second sum is the query entropy (ignored for ranking)
  • Retrieval ⇒ estimation of θ_Q and θ_D
  • Special case: the empirical distribution of the query q
    recovers query likelihood
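
Because the query-entropy term is constant for a fixed query, ranking by
-D(θ_Q || θ_D) reduces to ranking by the cross-entropy term. A minimal
sketch (dict-based distributions are an assumption; doc_model must be
smoothed so it covers every word the query model gives mass to):

```python
import math

def kl_rank_score(query_model, doc_model):
    """Ranking-equivalent part of -D(theta_Q || theta_D):
    the sum over w of p(w|theta_Q) * log p(w|theta_D).
    The dropped query-entropy term does not depend on the document."""
    return sum(p_q * math.log(doc_model[w])
               for w, p_q in query_model.items() if p_q > 0)
```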
11
Expansion-based vs. Model-based
(Diagram: two pipelines. Expansion-based: the query Q plus the feedback
docs yield an expanded query, which is scored against each document model
by query likelihood. Model-based: the query Q plus the feedback docs yield
an updated query model, which is scored against each document model by KL
divergence.)
12
Feedback as Model Interpolation
(Diagram: the feedback docs F = {d1, d2, ..., dn} yield a feedback model
θ_F, estimated either by a generative model or by divergence minimization;
the smoothed ML query model θ_Q is then interpolated with it:
    θ_Q' = (1 - α) θ_Q + α θ_F
with α = 0 giving no feedback and α = 1 full feedback. Documents D are
scored against θ_Q' to produce the results.)
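
The interpolation step itself is straightforward; a sketch with plain
dicts standing in for the two distributions:

```python
def interpolate_query_model(theta_q, theta_f, alpha):
    """theta_Q' = (1 - alpha) * theta_Q + alpha * theta_F.
    alpha = 0 keeps the original query model (no feedback);
    alpha = 1 replaces it with the feedback model (full feedback)."""
    vocab = set(theta_q) | set(theta_f)
    return {w: (1 - alpha) * theta_q.get(w, 0.0) + alpha * theta_f.get(w, 0.0)
            for w in vocab}
```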
13
θ_F Estimation Method I: Generative Mixture Model
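
The equations for this slide did not survive the transcript. The idea, per
the underlying paper, is that each word in the feedback documents is
generated by a two-component mixture of the unknown topic model θ_F and
the collection background p(w|C), with a fixed noise weight λ, and θ_F is
fit by EM. A sketch under that reading (variable names and the 1e-12 floor
are my own):

```python
def estimate_theta_f_mixture(feedback_docs, p_wc, lam=0.5, iters=30):
    """EM for the generative mixture model: each feedback word is drawn
    from (1 - lam) * p(w|theta_F) + lam * p(w|C), with 0 <= lam < 1 fixed.
    feedback_docs: list of {word: count} dicts; p_wc: background p(w|C)."""
    counts = {}                                  # pooled feedback counts
    for doc in feedback_docs:
        for w, c in doc.items():
            counts[w] = counts.get(w, 0) + c
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}   # ML initialization
    for _ in range(iters):
        # E-step: probability that an occurrence of w came from theta_F
        t = {w: (1 - lam) * theta[w] /
                ((1 - lam) * theta[w] + lam * p_wc.get(w, 1e-12))
             for w in counts}
        # M-step: re-estimate theta_F from the expected topic counts
        norm = sum(counts[w] * t[w] for w in counts)
        theta = {w: counts[w] * t[w] / norm for w in counts}
    return theta
```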
14
θ_F Estimation Method II: Empirical Divergence
Minimization
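
These equations were also lost in transcription. Following the paper's
closed-form solution, θ_F minimizes the average KL divergence to the
smoothed feedback-document models minus λ times the divergence to the
background model, which gives the sketch below (the 1e-12 floors are
assumptions to keep the logs finite):

```python
import math

def estimate_theta_f_divmin(doc_models, p_wc, lam=0.3):
    """Closed-form divergence minimization:
    p(w|theta_F) ∝ exp( (1/(1-lam)) * avg_i log p(w|theta_di)
                        - (lam/(1-lam)) * log p(w|C) ).
    doc_models: list of smoothed {word: prob} feedback-document models."""
    n = len(doc_models)
    vocab = set().union(*doc_models)
    weights = {}
    for w in vocab:
        avg_log = sum(math.log(dm.get(w, 1e-12)) for dm in doc_models) / n
        weights[w] = math.exp(avg_log / (1 - lam)
                              - (lam / (1 - lam)) * math.log(p_wc.get(w, 1e-12)))
    norm = sum(weights.values())
    return {w: v / norm for w, v in weights.items()}
```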
15
Example of Feedback Query Model
TREC topic 412: "airport security"
Mixture model approach, Web database, top 10 docs
(Table: top terms of the feedback query model, shown for λ = 0.9 and
λ = 0.7)
16
Model-based feedback vs. Simple LM
17
Sensitivity of Precision to α
18
Sensitivity of Precision to λ (Mixture Model &
Divergence Min., α = 0.5)
Over-discrimination can be harmful
19
The Lemur Toolkit
  • Language Modeling and Information Retrieval
    Toolkit
  • Under development at CMU and UMass
  • All experiments reported here were run using
    Lemur
  • http://www.cs.cmu.edu/~lemur
  • Contact us if you are interested in using it

20
Conclusions
  • Model-based feedback is natural and effective
  • Performance is sensitive to both α and λ
  • Mixture model: more sensitive to λ, less to α
    (α ≈ 0.5)
  • Divergence min.: more sensitive to α, less to λ
    (λ ≈ 0.3)
  • The sensitivity suggests more robust models are
    needed, e.g., using the query to focus the model:
  • Markov chain query model (Lafferty & Zhai, 2001)
  • Relevance language model (Lavrenko & Croft, 2001)

21
Future Work
  • Evaluating methods for relevance feedback
  • Examples in pseudo feedback can be quite noisy
  • Relevance feedback better reflects learning
    ability
  • More robust feedback models, e.g.,
  • Query-focused feedback (e.g., Query translation
    model)
  • Passage-based feedback (e.g., Hidden Markov model)

23
Effect of Feedback on 3 Collections
  • Disks 4&5 (minus CR), topics 401-450 (TREC-8 ad hoc)
  • Web, topics 401-450 (TREC-8 small web, 2GB)
  • AP88-89, topics 101-150
  • Document language model: fixed Dirichlet prior
    (μ = 1,000)
  • Baseline: original ML query model (close to
    optimal)
  • Feedback: mixture model & divergence minimization
    (best run shown)
  • Top 10 docs used for feedback
  • Feedback model truncated at p(w) = 0.001

24
Approaches to Text Retrieval
  • TR ⇒ ranking documents w.r.t. a query
  • Many approaches/models developed
  • Vector-space models: ranking by query-document
    similarity (Salton et al. 75)
  • Probabilistic models: ranking by probability of
    relevance given query and document (Robertson &
    Sparck Jones 76, Ponte & Croft 98)
  • Big challenge: good performance without
    heuristic/ad hoc tuning of parameters
  • The language modeling approach is promising

25
Model-based feedback vs. Simple LM and Rocchio
27
Feedback in Text Retrieval
  • General idea: learning from examples
  • Where do examples (of good/relevant documents)
    come from?
  • User provides relevance judgments ⇒
    relevance feedback
  • Assume top N (e.g., 10) documents to be relevant ⇒
    pseudo/blind feedback
  • In effect, new, related terms are extracted to
    enhance the original query
  • Generally leads to a performance increase (in both
    average precision and recall)