Title: A Review of Information Filtering Part I: Adaptive Filtering
1. A Review of Information Filtering, Part I: Adaptive Filtering
Chengxiang Zhai
Language Technologies Institute, School of Computer Science, Carnegie Mellon University
2. Outline
- The Problem of Adaptive Information Filtering (AIF)
- The TREC Work on AIF
  - Evaluation Setup
  - Main Approaches
  - Sample Results
- The Importance of Learning
- Summary and Research Directions
3. Adaptive Information Filtering (AIF)
- Dynamic information stream
- (Relatively) stable user interest
- System blocks non-relevant information according to the user's interest
- User provides feedback on the received items
- System learns from the user's feedback
- Performance measured by the utility of the filtering decisions
4. A Typical AIF Application: News Filtering
- Given a news stream and users
- Each user expresses an interest by a text query
- For each news article, the system makes a yes/no filtering decision for each user interest
- User provides feedback on the received news
- System learns from feedback
- Utility = 3 * (#good) - 2 * (#bad)
5. AIF vs. Retrieval, Categorization, Topic Tracking, etc.
- AIF is like retrieval over a dynamic stream of information items, but ranking is impossible
- AIF is like online binary categorization without initial training data and with limited feedback
- AIF is like tracking a user interest over a news stream
6. Evaluation of AIF
- Primary measure: linear utility (equivalent to a probability-of-relevance cutoff; see the sketch after this list)
  - E.g., LF1 = 3*R+ - 2*N+, used in TREC7 and TREC8; T9U = 2*R+ - N+, used in TREC9
- Problems with the linear utility:
  - Unbounded
  - Not comparable across topics/profiles
  - Average utility may be dominated by one topic
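To make the measure concrete, here is a minimal Python sketch of the linear utility; the coefficients follow the common TREC settings (LF1-style 3/-2 credit/penalty), while the function name and data layout are our own illustration.

```python
def linear_utility(delivered_labels, credit=3.0, penalty=2.0):
    """Compute credit * R+ - penalty * N+ over delivered documents.

    delivered_labels: one boolean per *delivered* document, True if the
    user judged it relevant. LF1 uses credit=3, penalty=2; T9U would use
    credit=2, penalty=1.
    """
    r_plus = sum(1 for rel in delivered_labels if rel)
    n_plus = len(delivered_labels) - r_plus
    return credit * r_plus - penalty * n_plus

# Example: 4 relevant and 3 non-relevant deliveries: 3*4 - 2*3 = 6.
print(linear_utility([True, False, True, True, False, False, True]))
```

Note that the measure is unbounded below: one profile with a long run of bad deliveries can dominate any average, which motivates the normalized measures on the next slide.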
7. Other Measures
- Nonlinear utility (e.g., an early relevant doc is worth more)
- Normalized utility
  - More meaningful for averaging
  - But can be inversely correlated with precision/recall!
- Other measures that reflect a trade-off between precision and recall
8. A Typical AIF System
[System diagram: a Doc Source feeds a Binary Classifier; the user profile text initializes the User Interest Profile; accepted docs go to the User, whose feedback, scored by a utility function, updates the profile.]
9. Three Basic Problems in AIF
- Making filtering decisions (binary classifier)
  - Doc text, profile text → yes/no
- Initialization
  - Initialize the filter based on only the profile text or very few examples
- Learning from:
  - Limited relevance judgments (only on "yes" docs)
  - Accumulated documents
- All trying to maximize the utility
10. The TREC Work on AIF
- The Filtering Track of TREC
- Major Approaches to AIF
- Sample Results
11. The Filtering Track (TREC7, 8, 9) (Hull 99; Hull & Robertson 00; Robertson & Hull 01)
- Encourage development and evaluation of techniques for text filtering
- Tasks:
  - Adaptive filtering (start with little/no training; online filtering with limited feedback)
  - Batch filtering (start with many training examples; online filtering with limited feedback)
  - Routing (start with many training examples; ranking test documents)
12. AIF Evaluation Setup
- TREC7: LF1, LF3 utility functions
  - AP88-90; 50 topics
  - No training initially
- TREC8: LF1, LF2 utility functions
  - Financial Times 92-94; 50 topics
  - No training initially
- TREC9: T9U, Precision@50, etc.
  - OHSUMED; 63 original topics + 4903 MeSH topics
  - 2 initial (positive) training examples available
13. Major Approaches to AIF
- Extended retrieval systems
  - Reuse retrieval techniques to score documents
  - Use a score threshold for the filtering decision
  - Learn to improve scoring with traditional feedback
  - New approaches to threshold setting and learning
- Modified categorization systems
  - Adapt to binary, unbalanced categorization
  - New approaches to initialization
  - Train with censored training examples
14. A General Vector-Space AIF Approach
[Flow diagram: a doc vector is scored against the profile vector; Thresholding compares the score to a threshold for a yes/no decision; feedback information drives Vector Learning (profile updates), Threshold Learning, and Utility Evaluation. A filtering-loop sketch follows.]
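The diagram can be read as a simple control loop. Below is a minimal sketch of that loop in Python, assuming bag-of-words vectors, dot-product scoring, and a Rocchio-style profile update; all names and constants are illustrative, not taken from any particular TREC system.

```python
from collections import Counter

def score(profile, doc):
    # Dot product between the profile vector and the document vector.
    return sum(profile[w] * c for w, c in doc.items())

def run_filter(profile_text, doc_stream, threshold, get_feedback,
               learn_rate=0.1):
    profile = Counter(profile_text.split())   # initialize from profile text
    utility = 0.0
    for doc_text in doc_stream:
        doc = Counter(doc_text.split())
        if score(profile, doc) < threshold:
            continue                           # blocked: no feedback available
        relevant = get_feedback(doc_text)      # judgments only on "yes" docs
        utility += 3.0 if relevant else -2.0   # LF1-style utility
        sign = 1.0 if relevant else -1.0
        for w, c in doc.items():               # Rocchio-style vector learning
            profile[w] += sign * learn_rate * c
        # Threshold learning (the focus of the next slides) would go here.
    return utility, profile
```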
15. Extended Retrieval Systems
- City Univ. / Microsoft (Okapi): Prob. IR
- Univ. of Massachusetts (Inquery): Infer. Net.
- Queens College, CUNY (Pirc): Prob. IR
- Clairvoyance Corp. (Clarit): Vector Space
- Univ. of Nijmegen (KUN): Vector Space
- Univ. of Twente (TNO): Language Model
- And many others ...
16. Threshold Setting in Extended Retrieval Systems
- Utility-independent approaches (generally do not work well; not covered in this talk)
- Indirect (linear) utility optimization
  - Logistic regression (score → prob. of relevance)
- Direct utility optimization
  - Empirical utility optimization
  - Expected utility optimization given score distributions
- All try to learn the optimal threshold
17. Difficulties in Threshold Learning
- Censored data
- Little/no labeled data
- Scoring bias due to vector learning
[Illustration: judged scores above the threshold θ = 30.0 (36.5 R, 33.4 N, 32.1 R); scores below it (29.9, 27.3, ...) are unjudged (?).]
18. Logistic Regression
- General idea: convert the score of D to p(R|D) (see the sketch after this list)
- Fit the model using feedback data
- Linear utility is optimized with a fixed prob. cutoff
- But:
  - Possibly incorrect parametric assumptions
  - No positive examples initially
  - Censored data and limited positive feedback
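As an illustration of this indirect approach, here is a small sketch that fits p(R|s) = 1/(1 + exp(-(a + b*s))) by gradient ascent on the log-likelihood and then derives the score threshold from a linear-utility probability cutoff; the tiny data set and learning-rate settings are made up for the example.

```python
import math

def fit_logistic(scores, labels, a=0.0, b=0.0, lr=0.01, iters=2000):
    """Fit p(R|s) = sigmoid(a + b*s) to (score, relevant) feedback pairs."""
    for _ in range(iters):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * s)))
            ga += (y - p)        # d logL / da
            gb += (y - p) * s    # d logL / db
        a += lr * ga / len(scores)
        b += lr * gb / len(scores)
    return a, b

# For LF1 = 3*R+ - 2*N+, delivering is worthwhile iff 3p - 2(1-p) > 0,
# i.e., p > 0.4, so the threshold is the score where p(R|s) = 0.4.
a, b = fit_logistic([36.5, 33.4, 32.1, 30.5], [1, 0, 1, 1])
theta = (math.log(0.4 / 0.6) - a) / b   # assumes the fitted slope b > 0
```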
19. Logistic Regression in Okapi (Robertson & Walker 2000)
- Motivation: recover the probability of relevance from the original prob. IR model
- Need to estimate α, β, and ast1 (avg. score of top 1 docs)
- All topics share the same β, which is initially set and never updated
20. Logistic Regression in Okapi (cont.)
- Initially, all topics share the same α and β, and ast1 is estimated with a linear regression: ast1 = a1 + a2 * maxscore
- After one week, ast1 is estimated based on the documents available from the week
- Threshold learning:
  - β is fixed all the time
  - α is updated with gradient descent
  - A heuristic ladder is used to allow exploration
21. Logistic Regression in Okapi (cont.)
- Pros:
  - Well-motivated method for the Okapi system
  - Based on a principled approach
- Cons:
  - Limited adaptation
  - Exploration is ad hoc (over-explores initially)
  - Some nonlinear utilities may not correspond to a fixed probability cutoff
22. Direct Utility Optimization
- Given:
  - A utility function U(C_R+, C_R-, C_N+, C_N-)
  - Training data D = {<s_i, r_i>}, r_i ∈ {R, N, ?}
- Formulate the utility as a function of the threshold and the training data: U = F(θ, D)
- Choose the threshold by optimizing F(θ, D), i.e., θ* = argmax_θ F(θ, D)
23. Empirical Utility Optimization
- Basic idea (see the sketch after this list):
  - Compute the utility on the training data for each candidate threshold (score of a training doc)
  - Choose the threshold that gives the maximum utility
- Difficulty: biased training sample!
  - We can only get an upper bound for the true optimal threshold
- Solutions:
  - Heuristic adjustment (lowering) of the threshold
  - Leads to beta-gamma threshold learning
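A brute-force sketch of the idea, assuming an LF1-style utility; candidate thresholds are the scores of the judged documents, and "deliver nothing" (utility 0) is kept as the fallback.

```python
def empirical_optimal_threshold(scores, labels, credit=3.0, penalty=2.0):
    """scores/labels: judged training docs (biased toward high scores)."""
    best_theta, best_u = float("inf"), 0.0   # deliver nothing: utility 0
    for theta in sorted(set(scores), reverse=True):
        u = sum(credit if y else -penalty
                for s, y in zip(scores, labels) if s >= theta)
        if u > best_u:
            best_theta, best_u = theta, u
    return best_theta, best_u

theta, u = empirical_optimal_threshold([36.5, 33.4, 32.1, 30.5], [1, 0, 1, 1])
```

Because judgments exist only above the old threshold, this estimate is an upper bound on the true optimal threshold; the beta-gamma method on the next slide lowers it to compensate.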
24. The Beta-Gamma Threshold Learning Method in CLARIT (Zhai et al. 00)
- Basic idea:
  - Extend the empirical utility optimization method by putting a lower bound on the threshold
  - β corrects the score bias
  - γ controls exploration
- β and γ are relatively stable and can be tuned on independent data
- Can optimize any utility function (with an appropriate zero-utility lower bound); a sketch of the interpolation follows
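A minimal sketch of one way to realize the beta-gamma interpolation; the exact functional form below (a weight alpha decaying exponentially in the number of judged examples) is our reading of the method, and the default beta/gamma values are arbitrary.

```python
import math

def beta_gamma_threshold(theta_zero, theta_opt, n_examples,
                         beta=0.1, gamma=0.05):
    """Interpolate between the zero-utility lower bound (theta_zero) and
    the empirically optimal threshold (theta_opt)."""
    alpha = beta + (1.0 - beta) * math.exp(-n_examples * gamma)
    return alpha * theta_zero + (1.0 - alpha) * theta_opt
```

Early on (small N) alpha is near 1, so the threshold stays near the conservative lower bound theta_zero (more exploration); as judged examples accumulate, it moves toward theta_opt (more exploitation).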
25. Illustration of Beta-Gamma Threshold Learning
26. Beta-Gamma Threshold Learning (cont.)
- Pros:
  - Explicitly addresses the exploration-exploitation tradeoff (safe exploration)
  - Arbitrary utility (with an appropriate lower bound)
  - Empirically effective and robust
- Cons:
  - Purely heuristic
  - The zero-utility lower bound is often too conservative
27. Score Distribution Approaches (Arampatzis & van Hameren 01; Zhang & Callan 01)
- Assume generative models of scores: p(s|R), p(s|N)
- Estimate the models with training data
- Find the threshold by optimizing the expected utility under the estimated models
- Specific methods differ in how they define and estimate the score distributions
28. A General Formulation of Score Distribution Approaches
- Given p(R), p(s|R), and p(s|N), the expected utility for sample size n is a function of θ and n, i.e., EU = F(n, θ)
- The optimal threshold for sample size n is θ*(n) = argmax_θ F(n, θ)
29. Solution for Linear Utility, Continuous p(s|R) and p(s|N)
- Linear utility: U = C_R+ * R+ + C_N+ * N+ (with C_R+ > 0 > C_N+)
- The optimal threshold is the solution θ of the following equation (independent of n): C_R+ * p(R) * p(θ|R) + C_N+ * p(N) * p(θ|N) = 0
30. Gaussian-Exponential Distributions
- p(s|R) ~ N(μ, σ²); p(s - s0|N) ~ Exp(λ)
(from Zhang & Callan 2001)
31. Optimal Threshold for Gaussian-Exp. Distributions (numerical sketch below)
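A numerical sketch under the Gaussian-Exponential model above: for a linear utility c_r*R+ - c_n*N+, the expected gain of delivering a document with score s is g(s) = c_r * p(R) * p(s|R) - c_n * p(N) * p(s|N), and the optimal threshold is where g crosses zero (found here by bisection). All parameter values are made up for illustration.

```python
import math

def gain(s, p_r, mu, sigma, lam, s0, c_r=2.0, c_n=1.0):
    """Expected utility gain of delivering a doc with score s."""
    p_s_r = (math.exp(-0.5 * ((s - mu) / sigma) ** 2)
             / (sigma * math.sqrt(2 * math.pi)))          # Gaussian p(s|R)
    p_s_n = lam * math.exp(-lam * (s - s0)) if s >= s0 else 0.0  # Exp p(s|N)
    return c_r * p_r * p_s_r - c_n * (1.0 - p_r) * p_s_n

def optimal_threshold(p_r, mu, sigma, lam, s0, lo, hi, tol=1e-6):
    """Bisection for gain(theta) = 0; requires gain(lo) < 0 < gain(hi),
    e.g., lo near the non-relevant mode s0 and hi near the relevant mean mu."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gain(mid, p_r, mu, sigma, lam, s0) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Illustrative parameters: 1% relevant, relevant scores ~ N(35, 3^2),
# non-relevant scores ~ 20 + Exp(0.5).
theta = optimal_threshold(p_r=0.01, mu=35.0, sigma=3.0, lam=0.5,
                          s0=20.0, lo=20.0, hi=35.0)
print(round(theta, 2))
```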
32. Parameter Estimation in KUN (Arampatzis & van Hameren 01)
- μ, σ² estimated using ML on relevant docs
- λ estimated using the top 50 non-relevant docs
- Some recent improvements:
  - Compute p(s) based on p(w_i)
  - Initial distributions estimated using the query as the only relevant doc
  - Soft probabilistic threshold: sampling with p(R|s)
33. Maximum Conditional Likelihood (Zhang & Callan 01)
- Explicit modeling of censored data
- Data: <s_i, r_i, δ_i>, r_i ∈ {R, N}, δ_i indicating whether the doc was delivered
- Maximize the conditional likelihood of the observed judgments (see the sketch after this list)
- Optimized with conjugate gradient descent
- A prior is introduced for smoothing (making it Bayesian?)
- A minimum delivery ratio is used to ensure exploration
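A sketch of the censored-data idea, our construction rather than the authors' code: assuming the Gaussian-Exponential score model from the preceding slides, it maximizes the likelihood of the judged (delivered) documents conditioned on delivery (s >= θ), using SciPy's conjugate-gradient optimizer; the smoothing prior and minimum delivery ratio are omitted.

```python
import numpy as np
from scipy import optimize, stats

def neg_cond_log_lik(params, scores, labels, theta, s0):
    """params = (logit_pi, mu, log_sigma, log_lam); labels: 1 = R, 0 = N."""
    logit_pi, mu, log_sigma, log_lam = params
    pi = 1.0 / (1.0 + np.exp(-logit_pi))          # p(R)
    sigma, lam = np.exp(log_sigma), np.exp(log_lam)
    # Probability that a doc is delivered at all: P(s >= theta).
    p_deliver = (pi * stats.norm.sf(theta, mu, sigma)
                 + (1 - pi) * stats.expon.sf(theta, loc=s0, scale=1.0 / lam))
    # Joint density of (score, label) for each delivered doc.
    dens = np.where(labels == 1,
                    pi * stats.norm.pdf(scores, mu, sigma),
                    (1 - pi) * stats.expon.pdf(scores, loc=s0, scale=1.0 / lam))
    return -np.sum(np.log(dens / p_deliver))

scores = np.array([36.5, 33.4, 32.1, 30.5])   # delivered docs only (censored)
labels = np.array([1, 0, 1, 1])
fit = optimize.minimize(neg_cond_log_lik, x0=np.array([-2.0, 35.0, 1.0, -1.0]),
                        args=(scores, labels, 30.0, 20.0), method="CG")
logit_pi, mu, log_sigma, log_lam = fit.x
```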
34. Score Distribution Approaches (cont.)
- Pros:
  - Principled approach
  - Arbitrary utility
  - Empirically effective
- Cons:
  - May be sensitive to the scoring function
  - Exploration not addressed
35. Modified Categorization Methods
- Mostly applied to batch filtering or routing, and sometimes combined with Rocchio:
  - K-Nearest Neighbor (CMU)
  - Naïve Bayes (Seoul)
  - Neural Network (ICDC, DSO, IRIT)
  - Decision Tree (NTT)
- Only K-Nearest Neighbor was applied to AIF (CMU)
  - With special thresholding strategies
36. The State-of-the-Art Performance
- For high-precision utilities, systems could hardly beat the zero-return baseline! (i.e., they got negative utility)
- Direct/indirect utility optimization methods generally performed much better than utility-independent tuning of the threshold
- It is hard to compare different threshold-learning methods, due to too many other factors (e.g., scoring)
37. TREC7
- No initial examples
- No system beat the zero-return baseline for LF1 (prob. of relevance > 0.4)
- Several systems beat the zero-return baseline for LF3 (prob. of relevance > 0.2)
(from Hull 99)
38. TREC7
- The learning effect is clear in some systems
- But the stream is not long enough for systems to benefit from learning
(from Hull 99)
39. TREC8
- Again, the learning effect is clear
- But systems still couldn't beat the zero-return baseline!
(from Hull & Robertson 00)
40. TREC9
- 2 initial examples
- Amplified learning effect
- T9U (prob. of relevance > 0.33)
- Systems clearly beat the zero-return baseline!
(from Robertson & Hull 01)
41. The Importance of Learning in AIF (results from Zhai et al. 00)
- Learning and initial inaccuracies: learning compensates for initial inaccuracies
- Exploitation vs. exploration: exploration (lowering the threshold) pays off in the long run
[Plot: score vs. time for four threshold trajectories: ideal adaptive, ideal fixed, actual adaptive, actual fixed.]
42. Learning Effect 1: Correction of Inappropriate Initial Threshold Setting
[Plot: trajectories comparing a bad initial threshold without updating vs. with updating.]
43. Learning Effect 2: Early Exploration Pays Off
44. Learning Effect 3: Regular Exploration Pays Off Later
45. Tradeoff between Exploration and Exploitation
[Plot: trajectories contrasting an under-exploring and an over-exploring threshold policy.]
46. Summary
- AIF is a very interesting and challenging online learning problem
- As a learning task, it has extremely sparse training data:
  - Initially, no training data
  - Later, limited and censored training examples
- Practically, learning must also be efficient
47. Summary (cont.)
- Evaluation of AIF is challenging
- Good performance (utility) is achieved by:
  - Direct/indirect utility optimization
  - Learning the optimal score threshold from feedback
  - An appropriate tradeoff between exploration and exploitation
- Several different threshold-learning methods can all be effective
48. Research Directions
- Threshold learning
  - Non-parametric score density estimation?
  - Controlled comparison of threshold methods
- Integrated AIF model
  - Bayesian decision theory + EM?
- Exploration-exploitation tradeoff
  - Reinforcement learning?
- User models and evaluation measures
  - Users care about more factors than the linear utility
  - A user's interest may drift over time
  - Redundancy reduction / novelty detection
49. References
- General papers on TREC filtering evaluation:
  - D. Hull, The TREC-7 Filtering Track: Description and Analysis, TREC-7 Proceedings.
  - D. Hull and S. Robertson, The TREC-8 Filtering Track Final Report, TREC-8 Proceedings.
  - S. Robertson and D. Hull, The TREC-9 Filtering Track Final Report, TREC-9 Proceedings.
- Papers on specific adaptive filtering methods:
  - Stephen Robertson and Stephen Walker, Threshold Setting in Adaptive Filtering, Journal of Documentation, 56:312-331, 2000.
  - Chengxiang Zhai, Peter Jansen, and David A. Evans, Exploration of a Heuristic Approach to Threshold Learning in Adaptive Filtering, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'00), 2000. Poster presentation.
  - Avi Arampatzis and Andre van Hameren, The Score-Distributional Threshold Optimization for Adaptive Binary Classification Tasks, SIGIR'01, 2001.
  - Yi Zhang and Jamie Callan, Maximum Likelihood Estimation for Filtering Thresholds, SIGIR'01, 2001.
50. The End. Thank you!