Title: An Interactive Learning Approach to Optimizing Information Retrieval Systems
Slide 1: Title
- An Interactive Learning Approach to Optimizing Information Retrieval Systems
- CMU ML Lunch, September 27th, 2010
- Yisong Yue, Carnegie Mellon University
Slide 2: Information Systems
Slides 3-5: Interactive Learning Setting
- Find the best ranking function (out of ~1000)
- Show results, evaluate using click logs
- Clicks are biased (users only click on what they see)
- Explore / exploit problem
- Technical issues:
  - How to interpret clicks?
  - What is the utility function?
  - What results to show users?
Slide 6: Team-Game Interleaving
(u = thorsten, q = "svm")

f1(u,q) -> r1:
1. Kernel Machines  http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm_light/
3. Support Vector Machine and Kernel ... References  http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies: SVM demo applet  http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine  http://svm.dcs.rhbnc.ac.uk

f2(u,q) -> r2:
1. Kernel Machines  http://svm.first.gmd.de/
2. Support Vector Machine  http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines  http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ...  http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm_light/

Interleaving(r1, r2):
1. Kernel Machines (T2)  http://svm.first.gmd.de/
2. Support Vector Machine (T1)  http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (T2)  http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines (T1)  http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References (T2)  http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1)  http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet (T2)  http://svm.research.bell-labs.com/SVT/SVMsvt.html

Invariant: for all k, in expectation the top k contains the same number of team members from each team.
- Interpretation: (r1 > r2) iff clicks(T1) > clicks(T2)

Radlinski, Kurup, Joachims, CIKM 2008
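The team-draft flavor of this idea can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact procedure; the function names and the simple click-credit scheme are my own.

```python
import random

def team_draft_interleave(r1, r2, k=10, seed=0):
    """Team-draft interleaving of two rankings (lists of doc ids).

    Each round a coin flip decides which team picks first; each team then
    appends its highest-ranked document not already shown. This preserves
    the invariant from the slide: in expectation, any top-k prefix holds
    the same number of results from each team.
    """
    rng = random.Random(seed)
    pool = set(r1) | set(r2)
    interleaved, teams, seen = [], [], set()
    while len(interleaved) < k and len(seen) < len(pool):
        order = [1, 2] if rng.random() < 0.5 else [2, 1]
        for team in order:
            ranking = r1 if team == 1 else r2
            for doc in ranking:
                if doc not in seen:
                    seen.add(doc)
                    interleaved.append(doc)
                    teams.append(team)
                    break
            if len(interleaved) >= k:
                break
    return interleaved, teams

def credit(teams, clicked_positions):
    """Credit clicks to teams; the team with more clicks wins the duel."""
    wins = {1: 0, 2: 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1
    return wins
```

A click at position p is credited to whichever team contributed the result at p, so more clicks on T1 is read as r1 beating r2.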
Slide 7: Setting
- Find the best ranking function (out of ~1000)
- Evaluate using click logs
- Clicks are biased (users only click on what they see)
- Explore / exploit problem
- Technical issues:
  - How to interpret clicks?
  - What is the utility function?
  - What results to show users?
Slide 8: Interleave A vs B
          Left wins | Right wins
  A vs B      1     |     0
  A vs C      0     |     0
  B vs C      0     |     0

Slide 9: Interleave A vs C
          Left wins | Right wins
  A vs B      1     |     0
  A vs C      0     |     1
  B vs C      0     |     0

Slide 10: Interleave B vs C
          Left wins | Right wins
  A vs B      1     |     0
  A vs C      0     |     1
  B vs C      1     |     0

Slide 11: Interleave A vs B
          Left wins | Right wins
  A vs B      1     |     1
  A vs C      0     |     1
  B vs C      1     |     0
Slides 12-13: Dueling Bandits Problem
- Given K bandits b1, ..., bK
- Each iteration, compare ("duel") two bandits
  - E.g., interleaving two retrieval functions
- Cost function (regret):
  - (bt, bt') are the two bandits chosen at iteration t
  - b* is the overall best bandit
  - Regret accumulates the fraction of users who prefer the best bandit over the chosen ones

Yue & Joachims, ICML 2009; Yue, Broder, Kleinberg, Joachims, COLT 2009
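The per-iteration cost above can be written as a small helper. A sketch under my own naming conventions: P is a hypothetical matrix of duel-win probabilities, not anything from the talk's software.

```python
def duel_regret(P, best, chosen_pair):
    """Per-iteration dueling-bandits regret.

    P[i][j] = P(bandit i beats bandit j); `best` indexes the best bandit b*.
    Regret sums the distinguishability eps(b*, b) = P(b* beats b) - 1/2
    over the two bandits actually compared, i.e. the fraction of users who
    would have preferred the best bandit over each chosen one.
    """
    b, b_prime = chosen_pair
    return (P[best][b] - 0.5) + (P[best][b_prime] - 0.5)
```

Note that dueling the best bandit against itself incurs zero regret, which is exactly why the exploit phase later interleaves the winning ranking with itself.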
Slide 14: Examples
- Example 1
  - P(f* > f) = 0.9
  - P(f* > f') = 0.8
  - Incurred regret: 0.7
- Example 2
  - P(f* > f) = 0.7
  - P(f* > f') = 0.6
  - Incurred regret: 0.3
- Example 3
  - P(f* > f) = 0.51
  - P(f* > f') = 0.55
  - Incurred regret: 0.06
Slide 15: Assumptions
- P(bi > bj) = 1/2 + eps_ij (distinguishability)
- Strong stochastic transitivity
  - For three bandits bi > bj > bk: eps_ik >= max(eps_ij, eps_jk)
  - Monotonicity property
- Stochastic triangle inequality
  - For three bandits bi > bj > bk: eps_ik <= eps_ij + eps_jk
  - Diminishing returns property
- Satisfied by many standard models
  - E.g., logistic / Bradley-Terry
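To see why the logistic / Bradley-Terry model satisfies both conditions, here is a small sketch. The utility values `mu` are made up for illustration; only the model form is from the slide.

```python
import math

def bt_prob(mu_i, mu_j):
    """Bradley-Terry / logistic model: P(b_i beats b_j) = sigmoid(mu_i - mu_j)."""
    return 1.0 / (1.0 + math.exp(-(mu_i - mu_j)))

def eps(mu_i, mu_j):
    """Distinguishability eps_ij = P(b_i beats b_j) - 1/2."""
    return bt_prob(mu_i, mu_j) - 0.5
```

For any utilities mu_i >= mu_j >= mu_k, eps_ik >= max(eps_ij, eps_jk) (strong stochastic transitivity) and eps_ik <= eps_ij + eps_jk (triangle inequality, from the sigmoid's diminishing returns away from 0).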
Slide 16: Explore then Exploit
- First explore
  - Try to gather as much information as possible
  - Accumulates regret based on which bandits we decide to compare
- Then exploit
  - We have a (good) guess as to which bandit is best
  - Repeatedly compare that bandit with itself
  - (i.e., interleave that ranking with itself)
Slide 17: Naïve Approach
- In the deterministic case, O(K) comparisons suffice to find the max
- Extend to the noisy case:
  - Repeatedly compare until confident one is better
- Problem: comparing two awful (but similar) bandits
- Example
  - P(A > B) = 0.85
  - P(A > C) = 0.85
  - P(B > C) = 0.51
  - Comparing B and C requires many comparisons!
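A standard Hoeffding-style sample-complexity bound makes the "many comparisons" concrete; the exact constants here are illustrative, not from the talk.

```python
import math

def comparisons_needed(eps, delta=0.05):
    """Hoeffding-style bound on the number of duels needed to decide, with
    confidence 1 - delta, which of two bandits wins more often, when the
    better one's duel-win probability is 1/2 + eps."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
```

With eps = 0.35 (A vs B) a handful of duels suffice, while eps = 0.01 (B vs C) needs tens of thousands, all spent comparing two bad rankers.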
Slides 18-22: Interleaved Filter
- Choose a candidate bandit at random
- Make noisy comparisons (Bernoulli trials) against all other bandits simultaneously
  - Maintain a mean and confidence interval for each pair of bandits being compared
- ...until another bandit is better with confidence 1 - d
- Repeat the process with the new candidate
  - (Remove all empirically worse bandits)
- Continue until one candidate is left

Yue, Broder, Kleinberg, Joachims, COLT 2009
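The steps above can be sketched as a simulation. This is a simplified sketch under my own assumptions, not the paper's exact pseudocode: P is a hypothetical win-probability matrix and each duel is a simulated Bernoulli draw standing in for one interleaving experiment.

```python
import math
import random

def interleaved_filter(P, T, seed=0):
    """Simplified simulation of the Interleaved Filter idea.

    P[i][j] = P(bandit i beats bandit j); T is the time horizon used to set
    the confidence parameter delta = 1/(T K^2), as in the analysis.
    """
    rng = random.Random(seed)
    K = len(P)
    delta = 1.0 / (T * K * K)
    active = set(range(K))
    cand = rng.choice(sorted(active))      # random initial candidate
    active.discard(cand)
    while active:
        wins = {j: 0 for j in active}      # candidate's wins vs each rival
        n = 0
        new_cand = None
        while new_cand is None and active:
            n += 1
            for j in active:               # duel all rivals simultaneously
                if rng.random() < P[cand][j]:
                    wins[j] += 1
            radius = math.sqrt(math.log(1.0 / delta) / n)  # confidence radius
            # prune rivals the candidate beats with confidence 1 - delta
            for j in [j for j in active if wins[j] / n - 0.5 > radius]:
                active.discard(j)
            # a rival that confidently beats the candidate takes over
            better = [j for j in sorted(active) if 0.5 - wins[j] / n > radius]
            if better:
                # remove all empirically worse bandits, then switch candidate
                for j in [j for j in active if wins[j] / n > 0.5]:
                    active.discard(j)
                new_cand = better[0]
                active.discard(new_cand)
                cand = new_cand
    return cand
```

With clearly separated bandits, the candidate either eliminates all rivals or is replaced by a strictly better one, so the loop terminates with the best bandit with high probability.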
Slides 23-24: Intuition
- Simulate comparing the candidate with all remaining bandits simultaneously
- Example
  - P(A > B) = 0.85
  - P(A > C) = 0.85
  - P(B > C) = 0.51
  - If B is the candidate, the number of comparisons between B and C is bounded by the number of comparisons between B and A!

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 25: Regret Analysis
- Model the sequence of candidate bandits as a random walk
  - Which will be the next candidate bandit?
- O(log K) rounds in expectation
- (Figure: random walk with transition probabilities 1/3, 1/3, 1/3)

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 26: Regret Analysis
- After each round, we remove a constant fraction of the remaining bandits
- O(K) total matches
- (Figure: e.g., 4 matches, then 2 matches, then 0 matches)

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 27: Regret Analysis
- T = time horizon
- K = # bandits / retrieval functions
- eps = distinguishability of best vs. 2nd best
- Average regret RT / T -> 0
- Information-theoretically optimal
- (Also need to prove correctness; see paper)

Yue, Broder, Kleinberg, Joachims, COLT 2009
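Written out, the regret from slide 13 and the bound reported in the COLT 2009 paper look as follows (a statement of the result, not a derivation; see the paper for the proof):

```latex
R_T \;=\; \sum_{t=1}^{T}\Bigl(\epsilon(b^*, b_t) + \epsilon(b^*, b'_t)\Bigr),
\qquad
\epsilon(b_i, b_j) \;=\; P(b_i \succ b_j) - \tfrac{1}{2},
\\[4pt]
\mathbb{E}[R_T] \;=\; O\!\Bigl(\tfrac{K}{\epsilon_{1,2}}\,\log T\Bigr),
\qquad
\epsilon_{1,2} \;=\; P(b^* \succ b_{(2)}) - \tfrac{1}{2},
```

where b_(2) is the second-best bandit; a matching lower bound makes this information-theoretically optimal.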
Slides 28-29: Summary
- Provably efficient online algorithm (in a regret sense)
- Also results for the continuous (convex) setting
- Requires a comparison oracle that
  - reflects user preferences
  - is independent / unbiased
  - satisfies strong transitivity
  - satisfies the triangle inequality
Slide 30: Directions to Explore
- Relaxing assumptions
  - E.g., strong transitivity, triangle inequality
- Integrating context
- Dealing with large K
  - Assume additional structure on the retrieval functions?
- Other cost models
  - PAC setting (fixed budget, find the best possible)
- Dynamic or changing user interests / environment
Slides 31-32: Improving the Comparison Oracle
- Dueling Bandits Problem
  - Interactive learning framework
  - Provably minimizes regret
  - Assumes an idealized comparison oracle
- Can we improve the comparison oracle?
  - Can we improve how we interpret results?
Slide 33: Determining Statistical Significance
- For each query q, interleave A(q) and B(q) and log clicks
- t-Test
  - For each q, score the clicks on A(q)
    - E.g., 3/4 = 0.75
  - Take the sample mean score (e.g., 0.6)
  - Compute confidence (p-value)
    - E.g., want p < 0.05 (i.e., 95% confidence)
- More data, more confidence
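The test above is a standard one-sample t-test against the 0.5 tie point; a minimal sketch (the function name and example scores are mine):

```python
import math
import statistics

def interleaving_t_stat(scores):
    """One-sample t-statistic for interleaving: per-query scores are the
    fraction of clicks credited to A, tested against the 0.5 tie point.
    Returns (sample mean, t-statistic); a larger |t| gives a smaller
    p-value, and more queries shrink the standard error."""
    n = len(scores)
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, (mean - 0.5) / se
```

The p-value then follows from the t-distribution with n - 1 degrees of freedom (e.g., via scipy.stats).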
Slides 34-35: Limitation
- Example: a query session with 2 clicks
  - One click at rank 1 (from A)
  - A later click at rank 4 (from B)
- Normally we would count this query session as a tie
- But the second click is probably more informative,
  so B should get more credit for this query
Slide 36: Linear Model
- Feature vector f(q,c) describes a click c in the context of query q
- Weight of a click is w^T f(q,c)

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
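A sketch of the linear model with a made-up two-dimensional feature vector (is_last_click, is_other_click); the feature choice here is illustrative, not the paper's actual feature set.

```python
def click_weight(w, f):
    """Weight of one click under the linear model: w^T f(q, c)."""
    return sum(wi * fi for wi, fi in zip(w, f))

def session_score(w, clicks_a, clicks_b):
    """Weighted credit for one query session: total weighted clicks on A's
    results minus those on B's. Positive favors A, negative favors B."""
    return (sum(click_weight(w, f) for f in clicks_a)
            - sum(click_weight(w, f) for f in clicks_b))
```

On the slide-34 example (an early click from A, a later last click from B), uniform weights score the session as a tie, while weighting only the last click credits B.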
Slides 37-39: Example
- w^T f(q,c) differentiates last clicks from other clicks
- Interleave A vs B
  - 3 clicks per session
  - Last click: 60% on a result from A
  - Other 2 clicks: random
- Conventional scoring, w = (1,1), has significant variance
- Counting only the last click, w = (1,0), minimizes variance

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
Slides 40-41: Scoring Query Sessions
- Feature representation for a query session
- Weighted score for the query
  - A positive score favors A, a negative score favors B

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
Slide 42: Supervised Learning
- We will optimize for the z-test ("inverse z-test")
  - Approximately equal to the t-test for large samples
  - z-score = mean / standard deviation
- (Assumes A > B on the training data)

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
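The objective can be sketched as follows; the toy session features below are made up to show the effect of reweighting (a consistent feature gets upweighted, a noisy one downweighted).

```python
import statistics

def z_score(w, session_features):
    """z-score of the weighted per-session scores for a weight vector w.
    The inverse z-test learns w to maximize this statistic (mean over
    standard deviation) on training pairs where A is known to beat B."""
    scores = [sum(wi * fi for wi, fi in zip(w, phi))
              for phi in session_features]
    return statistics.fmean(scores) / statistics.pstdev(scores)
```

Here feature 0 is noisy (its scores cancel out) while feature 1 consistently favors A, so putting all the weight on feature 1 yields a much larger z-score.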
Slide 43: Experiment Setup
- Data collection
  - Pool of retrieval functions
  - Hash users into partitions
  - Run interleaving of different pairs in parallel
- Collected on arXiv.org
- 2 pools of retrieval functions
  - Training pool (6 pairs): we know A > B
  - New pool (12 pairs)
Slide 44: Training Pool Cross Validation
Slide 45: (No transcript; figure only)
Slide 46: Experimental Results
- The inverse z-test works well
  - Beats the baseline on most of the new interleaving pairs
  - Directions of the tests are all in agreement
  - In 6/12 pairs, for p = 0.1, it reduces the required sample size by 10%
  - In 4/12 pairs, it achieves p = 0.05 where the baseline does not
  - ~400 to 650 queries per interleaving experiment
- Weights are hard to interpret (features are correlated)
  - Largest weight: 1 if a single click at rank > 1
Slides 47-48: Interactive Learning
- Dueling Bandits Problem
  - The system learns on the fly
  - Maximize total user utility over time
  - Exploration / exploitation tradeoff
- Interpreting Implicit Feedback
  - Supervised learning to learn a better interpretation
  - How do we close the loop?
  - Simple yet practical model
  - Efficient and compatible with existing approaches
Slide 49: Thank You!
Slides, papers, and software available at www.yisongyue.com
Slide 50: Extra Slides
Slide 51: Regret Analysis
- Round: all the time steps for a particular candidate bandit
  - Halts when a better bandit is found with 1 - d confidence
  - Choose d = 1/(TK^2)
- Match: all the comparisons between two bandits in a round
  - At most K matches in each round
  - The candidate plays one match against each remaining bandit

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 52: Regret Analysis
- O(log K) total rounds
- O(K) total matches
- Each match incurs regret
  - Depends on d = K^-2 T^-1
- Finds the best bandit w.p. 1 - 1/T
- Expected regret: O((K/eps) log T)

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 53: Removing Inferior Bandits
- At the conclusion of each round
  - Remove any empirically worse bandits
- Intuition
  - High confidence that the winner is better than the incumbent candidate
  - Empirically worse bandits cannot be much better than the incumbent candidate
  - A Hoeffding bound shows the winner is also better than the empirically worse bandits with high confidence
- Preserves 1 - 1/T confidence overall that we'll find the best bandit