A Review of Information Filtering, Part I: Adaptive Filtering

1
A Review of Information Filtering, Part I:
Adaptive Filtering
Chengxiang Zhai
Language Technologies Institute, School of Computer Science,
Carnegie Mellon University
2
Outline
  • The Problem of Adaptive Information Filtering
    (AIF)
  • The TREC Work on AIF
  • Evaluation Setup
  • Main Approaches
  • Sample Results
  • The Importance of Learning
  • Summary & Research Directions

3
Adaptive Information Filtering (AIF)
  • Dynamic information stream
  • (Relatively) stable user interest
  • System blocks non-relevant information
    according to the user's interest
  • User provides feedback on the received items
  • System learns from the user's feedback
  • Performance measured by the utility of the
    filtering decisions

4
A Typical AIF Application: News Filtering
  • Given a news stream and users
  • Each user expresses interest by a text query
  • For each news article, system makes a yes/no
    filtering decision for each user interest
  • User provides feedback on the received news
  • System learns from feedback
  • Utility = 3 × Good − 2 × Bad (e.g., accepting 10
    relevant and 5 non-relevant articles yields
    3×10 − 2×5 = 20)

5
AIF vs. Retrieval, Categorization, Topic tracking
etc.
  • AIF is like retrieval over a dynamic stream of
    information items, but ranking is impossible
    (each item must be accepted or rejected on arrival)
  • AIF is like online binary categorization without
    initial training data and with limited feedback
  • AIF is like tracking user interest over a news
    stream

6
Evaluation of AIF
  • Primary measure: linear utility (→ prob. cutoff)
  • E.g., LF1 = 3R+ − 2N+ (TREC7 & 8) and
    T9U = 2R+ − N+ (TREC9), where R+/N+ count the
    relevant/non-relevant docs delivered
  • Problems with the linear utility
  • Unbounded
  • Not comparable across topics/profiles
  • Average utility may be dominated by one topic

7
Other Measures
  • Nonlinear utility (e.g., early relevant doc is
    worth more)
  • Normalized utility (see the sketch below)
  • More meaningful for averaging
  • But can be inversely correlated with
    precision/recall!
  • Other measures that reflect a trade-off between
    precision and recall
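
As a concrete illustration (ours, not from the original slides), a minimal
sketch of a TREC-style linear utility and one common normalized variant;
the coefficients follow the LF1 definition above, and normalizing by the
maximum achievable utility is an assumption, not the exact TREC-9 scheme:

    def linear_utility(rel_delivered, nonrel_delivered,
                       credit=3.0, penalty=2.0):
        """LF1-style linear utility: U = 3*R+ - 2*N+."""
        return credit * rel_delivered - penalty * nonrel_delivered

    def normalized_utility(utility, total_relevant, credit=3.0):
        """Scale by the maximum achievable utility for the topic
        (deliver all relevant, no non-relevant docs) so that values
        are comparable, and averageable, across topics."""
        max_u = credit * total_relevant
        return utility / max_u if max_u > 0 else 0.0

    u = linear_utility(10, 5)            # 3*10 - 2*5 = 20
    norm = normalized_utility(u, 40)     # 20 / 120 = 0.1666...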

8
A Typical AIF System
[Diagram: the user profile text initializes the user interest profile; a
binary classifier scores each incoming doc from the doc source and makes
an accept/reject decision; accepted docs go to the user, whose feedback,
scored by a utility function, updates the profile and the classifier.]
9
Three Basic Problems in AIF
  • Making filtering decisions (binary classifier)
  • Doc text + profile text → yes/no
  • Initialization
  • Initialize the filter based on only the profile
    text or very few examples
  • Learning from
  • Limited relevance judgments (only on "yes" docs)
  • Accumulated documents
  • All trying to maximize the utility (see the
    sketch below)
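
A minimal sketch (ours) of the generic AIF loop in which these three
problems arise; score_doc, get_feedback, update_profile, and
update_threshold are hypothetical placeholders for whatever scoring and
learning methods a particular system plugs in:

    def run_filter(doc_stream, profile, threshold, score_doc,
                   get_feedback, update_profile, update_threshold):
        """Generic adaptive filtering loop: decide per document,
        learn only from feedback on delivered ('yes') documents."""
        delivered = []
        for doc in doc_stream:
            score = score_doc(doc, profile)
            if score >= threshold:               # binary decision
                judgment = get_feedback(doc)     # 'R'/'N', yes-docs only
                delivered.append((doc, score, judgment))
                profile = update_profile(profile, doc, judgment)
                threshold = update_threshold(threshold, delivered)
            # rejected docs yield no judgment: the data are censored
        return profile, threshold, delivered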

10
The TREC Work on AIF
  • The Filtering Track of TREC
  • Major Approaches to AIF
  • Sample Results

11
The Filtering Track (TREC7, 8, 9) (Hull 99; Hull &
Robertson 00; Robertson & Hull 01)
  • Encourage development and evaluation of
    techniques for text filtering
  • Tasks
  • Adaptive filtering (start with little/none
    training, online filtering with limited feedback)
  • Batch filtering (start with many training
    examples, online filtering with limited feedback)
  • Routing (start with many training examples,
    ranking test documents)

12
AIF Evaluation Setup
  • TREC7: LF1, LF3 utility functions
  • AP88-90, 50 topics
  • No training initially
  • TREC8: LF1, LF2 utility functions
  • Financial Times 92-94, 50 topics
  • No training initially
  • TREC9: T9U, Precision_at_50, etc.
  • OHSUMED, 63 original topics + 4903 MeSH topics
  • 2 initial (positive) training examples available

13
Major Approaches to AIF
  • Extended retrieval systems
  • Reuse retrieval techniques to score documents
  • Use a score threshold for filtering decision
  • Learn to improve scoring with traditional
    feedback
  • New approaches to threshold setting and learning
  • Modified categorization systems
  • Adapt to binary, unbalanced categorization
  • New approaches to initialization
  • Train with censored training examples

14
A General Vector-Space AIF Approach
[Diagram: a doc vector and the profile vector feed a Scoring step; a
Thresholding step compares the score against the threshold and outputs
yes/no; feedback information drives Utility Evaluation, Vector Learning
(which updates the profile vector), and Threshold Learning (which updates
the threshold).]
15
Extended Retrieval Systems
  • City Univ. / Microsoft (Okapi): Prob. IR
  • Univ. of Massachusetts (Inquery): Infer. Net.
  • Queens College, CUNY (PIRCS): Prob. IR
  • Clairvoyance Corp. (CLARIT): Vector Space
  • Univ. of Nijmegen (KUN): Vector Space
  • Univ. of Twente (TNO): Language Model
  • And many others ...

16
Threshold Setting in Extended Retrieval Systems
  • Utility-independent approaches (generally not
    working well, not covered in this talk)
  • Indirect (linear) utility optimization
  • Logistic regression (score → prob. of relevance)
  • Direct utility optimization
  • Empirical utility optimization
  • Expected utility optimization given score
    distributions
  • All try to learn the optimal threshold

17
Difficulties in Threshold Learning
  • Censored data
  • Little/no labeled data
  • Scoring bias due to vector learning

[Illustration: delivered docs have judged scores, e.g., 36.5 (R), 33.4 (N),
32.1 (R); docs scoring below the threshold of 30.0, e.g., 29.9, 27.3, ...,
are never judged (?), so the training sample is censored.]
18
Logistic Regression
  • General idea: convert the score of D to p(R|D)
  • Fit the model using feedback data
  • Linear utility is optimized with a fixed prob.
    cutoff (see the sketch below)
  • But,
  • Possibly incorrect parametric assumptions
  • No positive examples initially
  • Censored data and limited positive feedback
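
A minimal sketch (ours, not the Okapi formulation on the next slides) of
the idea: fit a logistic model p(R|s) = 1/(1 + e^-(a+bs)) to the judged
(score, label) pairs by gradient ascent, then deliver when p(R|s) exceeds
the cutoff implied by the linear utility. For U = 3R+ − 2N+, a doc with
relevance probability p has expected utility 3p − 2(1 − p) > 0 iff p > 0.4:

    import numpy as np

    def fit_logistic(scores, labels, lr=0.1, iters=5000):
        """Fit p(R|s) = sigmoid(a + b*s) by gradient ascent on the
        log-likelihood of the judged (score, label) pairs."""
        s = np.asarray(scores, dtype=float)
        y = np.asarray(labels, dtype=float)   # 1 = relevant, 0 = not
        mu, sd = s.mean(), s.std() + 1e-9
        z = (s - mu) / sd                     # standardize for stability
        a = b = 0.0
        for _ in range(iters):
            p = 1.0 / (1.0 + np.exp(-(a + b * z)))
            a += lr * np.mean(y - p)          # gradient wrt intercept
            b += lr * np.mean((y - p) * z)    # gradient wrt slope
        return a - b * mu / sd, b / sd        # back to the score scale

    def prob_cutoff(credit=3.0, penalty=2.0):
        """Deliver iff credit*p - penalty*(1-p) > 0,
        i.e. p > penalty / (credit + penalty); 0.4 for 3R+ - 2N+."""
        return penalty / (credit + penalty)

    a, b = fit_logistic([36.5, 33.4, 32.1, 30.8], [1, 0, 1, 1])
    p_rel = lambda s: 1.0 / (1.0 + np.exp(-(a + b * s)))
    deliver = p_rel(31.0) > prob_cutoff()     # decision for a new doc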

19
Logistic Regression in Okapi (Robertson & Walker
2000)
  • Motivation: recover the probability of relevance
    from the original prob. IR model
  • Need to estimate α, β, and ast1 (avg. score of
    top 1 docs)
  • All topics share the same β, which is initially
    set and never updated

20
Logistic Regression in Okapi (cont.)
  • Initially, all topics share the same α and β, and
    ast1 is estimated with a linear regression:
    ast1 = a1 + a2 × maxscore
  • After one week, ast1 is estimated from the
    documents seen during that week
  • Threshold learning
  • β is fixed all the time
  • α is updated with gradient descent
  • A heuristic "ladder" is used to allow exploration

21
Logistic Regression in Okapi (cont.)
  • Pros
  • Well-motivated method for the Okapi system
  • Based on a principled approach
  • Cons
  • Limited adaptation
  • Exploration is ad hoc (over-explores initially)
  • Some nonlinear utilities may not correspond to a
    fixed probability cutoff

22
Direct Utility Optimization
  • Given
  • A utility function U(C_R+, C_R−, C_N+, C_N−)
  • Training data D = {⟨s_i, r_i⟩}, r_i ∈ {R, N, ?}
  • Formulate utility as a function of the threshold
    and the training data: U = F(θ, D)
  • Choose the threshold by optimizing F(θ, D), i.e.,
    θ* = argmax_θ F(θ, D)

23
Empirical Utility Optimization
  • Basic idea
  • Compute the utility on the training data for each
    candidate threshold (the score of a training doc)
  • Choose the threshold that gives the maximum
    utility (see the sketch below)
  • Difficulty: biased training sample!
  • We can only get an upper bound for the true
    optimal threshold.
  • Solutions
  • Heuristic adjustment (lowering) of the threshold
  • Leads to beta-gamma threshold learning
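
A minimal sketch (ours) of empirical utility optimization: every observed
training-doc score is a candidate threshold, and we keep the one that
maximizes utility on the judged sample:

    def empirical_optimal_threshold(judged, credit=3.0, penalty=2.0):
        """judged: list of (score, is_relevant) for delivered docs.
        Try each observed score as the threshold and return the one
        with maximum empirical utility. Biased upward, because the
        sample only contains docs that beat the previous threshold."""
        best_theta, best_u = None, float("-inf")
        for theta in sorted({s for s, _ in judged}):
            u = sum(credit if rel else -penalty
                    for s, rel in judged if s >= theta)
            if u > best_u:               # keep the lowest theta on ties
                best_u, best_theta = u, theta
        return best_theta, best_u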

24
The Beta-Gamma Threshold Learning Method in
CLARIT (Zhai et al. 00)
  • Basic idea
  • Extend the empirical utility optimization method
    by putting a lower bound on the threshold
    (see the sketch below)
  • β is to correct the score bias
  • γ is to control exploration
  • β and γ are relatively stable and can be tuned
    based on independent data
  • Can optimize any utility function (with an
    appropriate zero-utility lower bound)
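
A minimal sketch of the interpolation as we recall it from Zhai et al. 00;
treat the exact parameterization as an assumption rather than the paper's
definitive formula:

    import math

    def beta_gamma_threshold(theta_zero, theta_opt, n_examples,
                             beta=0.1, gamma=0.05):
        """Interpolate between the zero-utility lower bound
        (theta_zero) and the empirically optimal threshold
        (theta_opt):
            theta = alpha*theta_zero + (1 - alpha)*theta_opt,
            alpha = beta + (1 - beta) * exp(-n * gamma).
        With few examples alpha is near 1, keeping the threshold
        near the safe lower bound (more exploration); as n grows
        it rises toward the empirical optimum (exploitation)."""
        alpha = beta + (1.0 - beta) * math.exp(-n_examples * gamma)
        return alpha * theta_zero + (1.0 - alpha) * theta_opt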

25
Illustration of Beta-Gamma Threshold Learning
26
Beta-Gamma Threshold Learning (cont.)
  • Pros
  • Explicitly addresses exploration-exploitation
    tradeoff (Safe exploration)
  • Arbitrary utility (with appropriate lower bound)
  • Empirically effective and robust
  • Cons
  • Purely heuristic
  • The zero-utility lower bound is often too
    conservative

27
Score Distribution Approaches (Arampatzis &
van Hameren 01; Zhang & Callan 01)
  • Assume a generative model of scores p(s|R), p(s|N)
  • Estimate the model with training data
  • Find the threshold by optimizing the expected
    utility under the estimated model
  • Specific methods differ in the way of defining
    and estimating the scoring distributions

28
A General Formulation of Score Distribution
Approaches
  • Given p(R), p(s|R), and p(s|N), the expected
    utility for sample size n is a function of θ and
    n, i.e., EU = F(n, θ)
  • The optimal threshold for sample size n is
    θ*(n) = argmax_θ F(n, θ)
29
Solution for Linear Utility, Continuous p(s|R) and
p(s|N)
  • Linear utility
  • The optimal threshold is the solution to the
    following equation (independent of n)
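
The equation itself did not survive extraction; the condition below is our
reconstruction from the standard derivation (at the optimal θ, the marginal
expected utility of delivering documents scoring exactly θ is zero), with
C_R+ the credit per relevant and C_N+ the (negative) credit per
non-relevant delivered doc:

    C_{R+} \, p(R) \, p(\theta \mid R) + C_{N+} \, p(N) \, p(\theta \mid N) = 0,
    \quad \text{e.g.} \quad
    3 \, p(R) \, p(\theta \mid R) = 2 \, p(N) \, p(\theta \mid N)
    \quad \text{for } U = 3R^{+} - 2N^{+}.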

30
Gaussian-Exponential Distributions
  • p(s|R) ~ N(μ, σ²); p(s − s0|N) ~ E(λ)

(From Zhang & Callan 2001)
31
Optimal Threshold for Gaussian-Exp.
Distributions
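
The closed-form threshold on this slide was an image and is lost; a minimal
numerical sketch (ours) that instead solves the optimality condition from
slide 29 under the Gaussian/exponential assumptions by scanning for a sign
change and refining with bisection:

    import math

    def gauss_pdf(x, mu, sigma):
        return (math.exp(-0.5 * ((x - mu) / sigma) ** 2)
                / (sigma * math.sqrt(2 * math.pi)))

    def exp_pdf(x, lam, s0):
        return lam * math.exp(-lam * (x - s0)) if x >= s0 else 0.0

    def optimal_threshold(p_rel, mu, sigma, lam, s0,
                          credit=3.0, penalty=2.0):
        """Solve credit*p(R)*p(t|R) - penalty*p(N)*p(t|N) = 0."""
        def g(t):
            return (credit * p_rel * gauss_pdf(t, mu, sigma)
                    - penalty * (1 - p_rel) * exp_pdf(t, lam, s0))
        t, step, prev = s0, sigma / 10.0, g(s0)
        lo = hi = None
        while t < mu + 6 * sigma:          # scan for a sign change
            t += step
            cur = g(t)
            if (prev < 0) != (cur < 0):
                lo, hi = t - step, t
                break
            prev = cur
        if lo is None:
            return None                    # no crossing in range
        for _ in range(60):                # bisection refinement
            mid = (lo + hi) / 2.0
            if (g(lo) < 0) == (g(mid) < 0):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    theta = optimal_threshold(p_rel=0.1, mu=35.0, sigma=3.0,
                              lam=0.5, s0=20.0)   # ~ between 28 and 31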
32
Parameter Estimation in KUN (Arampatzis &
van Hameren 01)
  • μ, σ² estimated using ML on rel. docs
  • λ estimated using the top 50 non-rel. docs
  • Some recent improvements
  • Compute p(s) based on p(wi)
  • Initial distribution: the query q as the only
    rel. doc
  • Soft probabilistic threshold: sampling with p(R|s)

33
Maximum Conditional Likelihood (Zhang & Callan 01)
  • Explicit modeling of censored data
  • Data: ⟨s_i, r_i, δ_i⟩, r_i ∈ {R, N}, δ_i =
    delivered or not
  • Maximizing the conditional likelihood of the
    observed data (see the note below)
  • Conjugate gradient descent
  • A prior is introduced for smoothing (making it
    Bayesian?)
  • A minimum delivery ratio is used to ensure
    exploration
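
The likelihood expression on this slide was an image; as a hedged
reconstruction of the general idea (judgments exist only for delivered
docs, so their scores and labels are modeled conditioned on having
exceeded the threshold):

    \max_{\Theta} \prod_{i:\, \delta_i = 1}
        p(s_i, r_i \mid s_i \ge \theta, \Theta),
    \qquad
    p(s, r \mid s \ge \theta, \Theta)
        = \frac{p(r)\, p(s \mid r)}{\sum_{r'} p(r')\, P(s \ge \theta \mid r')}.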

34
Score Distribution Approaches (cont.)
  • Pros
  • Principled approach
  • Arbitrary utility
  • Empirically effective
  • Cons
  • May be sensitive to the scoring function
  • Exploration not addressed

35
Modified Categorization Methods
  • Mostly applied to batch filtering or routing, and
    sometimes combined with Rocchio
  • K-Nearest Neighbor (CMU)
  • Naïve Bayes (Seoul)
  • Neural Network (ICDC, DSO, IRIT)
  • Decision Tree (NTT)
  • Only K-Nearest Neighbor was applied to AIF (CMU)
  • With special thresholding strategies

36
The State-of-the-Art Performance
  • For high-precision utilities, systems can hardly
    beat the zero-return baseline! (i.e., they end up
    with negative utility)
  • Direct/indirect utility optimization methods
    generally performed much better than
    utility-independent tuning of threshold
  • Hard to compare different threshold learning
    methods, due to too many other varying factors
    (e.g., scoring)

37
  • TREC7
  • No initial examples
  • No system beats the zero-return baseline for F1
    (p > 0.4)
  • Several systems beat the zero-return baseline for
    F3 (p > 0.2)

(from Hull 99)
38
  • TREC7
  • Learning effect is clear in some systems
  • But, stream is not long enough for systems to
    benefit from learning

(from Hull 99)
39
  • TREC8
  • Again, learning effect is clear
  • But, systems still couldn't beat the zero-return
    baseline!

(from Hull & Robertson 00)
40
  • TREC9
  • 2 initial examples
  • Amplifying learning effect
  • T9U (prob > 0.33)
  • Systems clearly beat the zero-return baseline!

(from Robertson & Hull 01)
41
The Importance of Learning in AIF (Results from
Zhai et al. 00)
  • Learning and initial inaccuracies: learning
    compensates for initial inaccuracies
  • Exploitation vs. exploration: exploration
    (lowering the threshold) pays off in the long run

[Figure: threshold score vs. time for four settings: ideal adaptive,
ideal fixed, actual adaptive, and actual fixed.]
42
Learning Effect 1: Correction of an Inappropriate
Initial Threshold Setting
[Figure: performance with a bad initial threshold, without updating vs.
with updating.]
43
Learning Effect 2: Early Exploration Pays Off
44
Learning Effect 3: Regular Exploration Pays Off
Later
45
Tradeoff between Exploration and Exploitation
[Figure: performance when the system under-explores vs. over-explores.]
46
Summary
  • AIF is a very interesting and challenging online
    learning problem
  • As a learning task, it has extremely sparse
    training data
  • Initially no training data
  • Later, limited and censored training examples
  • Practically, learning must also be efficient

47
Summary (cont.)
  • Evaluation of AIF is challenging
  • Good performance (utility) is achieved by
  • Direct/indirect utility optimization
  • Learning the optimal score threshold from
    feedback
  • Appropriate tradeoff between exploration and
    exploitation
  • Several different threshold-learning methods can
    all be effective

48
Research Directions
  • Threshold learning
  • Non-parametric score density estimation?
  • Controlled comparison of threshold methods
  • Integrated AIF model
  • Bayesian decision theory + EM?
  • Exploration-exploitation tradeoff
  • Reinforcement learning?
  • User models & evaluation measures
  • Users care about more factors than the linear
    utility
  • Users' interests may drift over time
  • Redundancy reduction & novelty detection

49
References
  • General papers on TREC filtering evaluation
  • D. Hull, The TREC-7 Filtering Track: Description
    and Analysis, TREC-7 Proceedings.
  • D. Hull and S. Robertson, The TREC-8 Filtering
    Track Final Report, TREC-8 Proceedings.
  • S. Robertson and D. Hull, The TREC-9 Filtering
    Track Final Report, TREC-9 Proceedings.
  • Papers on specific adaptive filtering methods
  • Stephen Robertson and Stephen Walker, Threshold
    Setting in Adaptive Filtering, Journal of
    Documentation, 56(3):312-331, 2000.
  • Chengxiang Zhai, Peter Jansen, and David A.
    Evans, Exploration of a Heuristic Approach to
    Threshold Learning in Adaptive Filtering, ACM
    SIGIR Conference on Research and Development in
    Information Retrieval (SIGIR'00), 2000. Poster
    presentation.
  • Avi Arampatzis and Andre van Hameren, The
    Score-Distributional Threshold Optimization for
    Adaptive Binary Classification Tasks, SIGIR 2001.
  • Yi Zhang and Jamie Callan, Maximum Likelihood
    Estimation for Filtering Thresholds, SIGIR 2001.

50
The End. Thank you!