Title: An Interactive Learning Approach to Optimizing Information Retrieval Systems
Slide 1: Title
- An Interactive Learning Approach to Optimizing Information Retrieval Systems
- CMU ML Lunch, September 27th, 2010
- Yisong Yue, Carnegie Mellon University
Slide 2: Information Systems
Slides 3-5: Interactive Learning Setting
- Find the best ranking function (out of ~1000)
- Show results, evaluate using click logs
- Clicks are biased (users only click on what they see)
- Explore / exploit problem
- Technical issues:
  - How to interpret clicks?
  - What is the utility function?
  - What results to show users?
Slide 6: Team-Game Interleaving
(u = thorsten, q = "svm")

f1(u,q) -> r1:
1. Kernel Machines  http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm_light/
3. Support Vector Machine and Kernel ... References  http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies: SVM demo applet  http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine  http://svm.dcs.rhbnc.ac.uk

f2(u,q) -> r2:
1. Kernel Machines  http://svm.first.gmd.de/
2. Support Vector Machine  http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines  http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ...  http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm_light/

Interleaving(r1, r2):
1. Kernel Machines (T2)  http://svm.first.gmd.de/
2. Support Vector Machine (T1)  http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (T2)  http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines (T1)  http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References (T2)  http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1)  http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies: SVM demo applet (T2)  http://svm.research.bell-labs.com/SVT/SVMsvt.html

Invariant: for all k, in expectation the top k contains the same number of team members from each team.
- Interpretation: (r1 > r2) iff clicks(T1) > clicks(T2)

Radlinski, Kurup, Joachims, CIKM 2008
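The team-draft flavor of this idea can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's exact procedure; the function names and the simple click-credit scheme are my own.

```python
import random

def team_draft_interleave(r1, r2, k=10, seed=0):
    """Team-draft interleaving of two rankings (lists of doc ids).

    Each round a coin flip decides which team picks first; each team then
    appends its highest-ranked document not already shown. This preserves
    the invariant from the slide: in expectation, any top-k prefix holds
    the same number of results from each team.
    """
    rng = random.Random(seed)
    pool = set(r1) | set(r2)
    interleaved, teams, seen = [], [], set()
    while len(interleaved) < k and len(seen) < len(pool):
        order = [1, 2] if rng.random() < 0.5 else [2, 1]
        for team in order:
            ranking = r1 if team == 1 else r2
            for doc in ranking:
                if doc not in seen:
                    seen.add(doc)
                    interleaved.append(doc)
                    teams.append(team)
                    break
            if len(interleaved) >= k:
                break
    return interleaved, teams

def credit(teams, clicked_positions):
    """Credit clicks to teams; the team with more clicks wins the duel."""
    wins = {1: 0, 2: 0}
    for pos in clicked_positions:
        wins[teams[pos]] += 1
    return wins
```

A click at position p is credited to whichever team contributed the result at p, so more clicks on T1 is read as r1 beating r2.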
Slide 7: Setting
- Find the best ranking function (out of ~1000)
- Evaluate using click logs
- Clicks are biased (users only click on what they see)
- Explore / exploit problem
- Technical issues:
  - How to interpret clicks?
  - What is the utility function?
  - What results to show users?
Slide 8: Interleave A vs B
          Left wins | Right wins
  A vs B      1     |     0
  A vs C      0     |     0
  B vs C      0     |     0

Slide 9: Interleave A vs C
          Left wins | Right wins
  A vs B      1     |     0
  A vs C      0     |     1
  B vs C      0     |     0

Slide 10: Interleave B vs C
          Left wins | Right wins
  A vs B      1     |     0
  A vs C      0     |     1
  B vs C      1     |     0

Slide 11: Interleave A vs B
          Left wins | Right wins
  A vs B      1     |     1
  A vs C      0     |     1
  B vs C      1     |     0
Slides 12-13: Dueling Bandits Problem
- Given K bandits b1, ..., bK
- Each iteration, compare ("duel") two bandits
  - E.g., interleaving two retrieval functions
- Cost function (regret):
  - (bt, bt') are the two bandits chosen at iteration t
  - b* is the overall best bandit
  - Regret accumulates the fraction of users who prefer the best bandit over the chosen ones

Yue & Joachims, ICML 2009; Yue, Broder, Kleinberg, Joachims, COLT 2009
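The per-iteration cost above can be written as a small helper. A sketch under my own naming conventions: P is a hypothetical matrix of duel-win probabilities, not anything from the talk's software.

```python
def duel_regret(P, best, chosen_pair):
    """Per-iteration dueling-bandits regret.

    P[i][j] = P(bandit i beats bandit j); `best` indexes the best bandit b*.
    Regret sums the distinguishability eps(b*, b) = P(b* beats b) - 1/2
    over the two bandits actually compared, i.e. the fraction of users who
    would have preferred the best bandit over each chosen one.
    """
    b, b_prime = chosen_pair
    return (P[best][b] - 0.5) + (P[best][b_prime] - 0.5)
```

Note that dueling the best bandit against itself incurs zero regret, which is exactly why the exploit phase later interleaves the winning ranking with itself.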
Slide 14: Examples
- Example 1
  - P(f* > f) = 0.9
  - P(f* > f') = 0.8
  - Incurred regret: 0.7
- Example 2
  - P(f* > f) = 0.7
  - P(f* > f') = 0.6
  - Incurred regret: 0.3
- Example 3
  - P(f* > f) = 0.51
  - P(f* > f') = 0.55
  - Incurred regret: 0.06
Slide 15: Assumptions
- P(bi > bj) = 1/2 + eps_ij (distinguishability)
- Strong stochastic transitivity
  - For three bandits bi > bj > bk: eps_ik >= max(eps_ij, eps_jk)
  - Monotonicity property
- Stochastic triangle inequality
  - For three bandits bi > bj > bk: eps_ik <= eps_ij + eps_jk
  - Diminishing returns property
- Satisfied by many standard models
  - E.g., logistic / Bradley-Terry
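To see why the logistic / Bradley-Terry model satisfies both conditions, here is a small sketch. The utility values `mu` are made up for illustration; only the model form is from the slide.

```python
import math

def bt_prob(mu_i, mu_j):
    """Bradley-Terry / logistic model: P(b_i beats b_j) = sigmoid(mu_i - mu_j)."""
    return 1.0 / (1.0 + math.exp(-(mu_i - mu_j)))

def eps(mu_i, mu_j):
    """Distinguishability eps_ij = P(b_i beats b_j) - 1/2."""
    return bt_prob(mu_i, mu_j) - 0.5
```

For any utilities mu_i >= mu_j >= mu_k, eps_ik >= max(eps_ij, eps_jk) (strong stochastic transitivity) and eps_ik <= eps_ij + eps_jk (triangle inequality, from the sigmoid's diminishing returns away from 0).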
Slide 16: Explore then Exploit
- First explore
  - Try to gather as much information as possible
  - Accumulates regret based on which bandits we decide to compare
- Then exploit
  - We have a (good) guess as to which bandit is best
  - Repeatedly compare that bandit with itself
  - (i.e., interleave that ranking with itself)
Slide 17: Naïve Approach
- In the deterministic case, O(K) comparisons suffice to find the max
- Extend to the noisy case:
  - Repeatedly compare until confident one is better
- Problem: comparing two awful (but similar) bandits
- Example
  - P(A > B) = 0.85
  - P(A > C) = 0.85
  - P(B > C) = 0.51
  - Comparing B and C requires many comparisons!
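A standard Hoeffding-style sample-complexity bound makes the "many comparisons" concrete; the exact constants here are illustrative, not from the talk.

```python
import math

def comparisons_needed(eps, delta=0.05):
    """Hoeffding-style bound on the number of duels needed to decide, with
    confidence 1 - delta, which of two bandits wins more often, when the
    better one's duel-win probability is 1/2 + eps."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
```

With eps = 0.35 (A vs B) a handful of duels suffice, while eps = 0.01 (B vs C) needs tens of thousands, all spent comparing two bad rankers.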
Slides 18-22: Interleaved Filter
- Choose a candidate bandit at random
- Make noisy comparisons (Bernoulli trials) against all other bandits simultaneously
  - Maintain a mean and confidence interval for each pair of bandits being compared
- ...until another bandit is better with confidence 1 - d
- Repeat the process with the new candidate
  - (Remove all empirically worse bandits)
- Continue until one candidate is left

Yue, Broder, Kleinberg, Joachims, COLT 2009
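The steps above can be sketched as a simulation. This is a simplified sketch under my own assumptions, not the paper's exact pseudocode: P is a hypothetical win-probability matrix and each duel is a simulated Bernoulli draw standing in for one interleaving experiment.

```python
import math
import random

def interleaved_filter(P, T, seed=0):
    """Simplified simulation of the Interleaved Filter idea.

    P[i][j] = P(bandit i beats bandit j); T is the time horizon used to set
    the confidence parameter delta = 1/(T K^2), as in the analysis.
    """
    rng = random.Random(seed)
    K = len(P)
    delta = 1.0 / (T * K * K)
    active = set(range(K))
    cand = rng.choice(sorted(active))      # random initial candidate
    active.discard(cand)
    while active:
        wins = {j: 0 for j in active}      # candidate's wins vs each rival
        n = 0
        new_cand = None
        while new_cand is None and active:
            n += 1
            for j in active:               # duel all rivals simultaneously
                if rng.random() < P[cand][j]:
                    wins[j] += 1
            radius = math.sqrt(math.log(1.0 / delta) / n)  # confidence radius
            # prune rivals the candidate beats with confidence 1 - delta
            for j in [j for j in active if wins[j] / n - 0.5 > radius]:
                active.discard(j)
            # a rival that confidently beats the candidate takes over
            better = [j for j in sorted(active) if 0.5 - wins[j] / n > radius]
            if better:
                # remove all empirically worse bandits, then switch candidate
                for j in [j for j in active if wins[j] / n > 0.5]:
                    active.discard(j)
                new_cand = better[0]
                active.discard(new_cand)
                cand = new_cand
    return cand
```

With clearly separated bandits, the candidate either eliminates all rivals or is replaced by a strictly better one, so the loop terminates with the best bandit with high probability.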
Slides 23-24: Intuition
- Simulate comparing the candidate with all remaining bandits simultaneously
- Example
  - P(A > B) = 0.85
  - P(A > C) = 0.85
  - P(B > C) = 0.51
  - If B is the candidate, the number of comparisons between B and C is bounded by the number of comparisons between B and A!

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 25: Regret Analysis
- Model the sequence of candidate bandits as a random walk
  - Which will be the next candidate bandit?
- O(log K) rounds in expectation
- (Figure: random walk with transition probabilities 1/3, 1/3, 1/3)

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 26: Regret Analysis
- After each round, we remove a constant fraction of the remaining bandits
- O(K) total matches
- (Figure: e.g., 4 matches, then 2 matches, then 0 matches)

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 27: Regret Analysis
- T = time horizon
- K = # bandits / retrieval functions
- eps = distinguishability of best vs. 2nd best
- Average regret RT / T -> 0
- Information-theoretically optimal
- (Also need to prove correctness; see paper)

Yue, Broder, Kleinberg, Joachims, COLT 2009
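Written out, the regret from slide 13 and the bound reported in the COLT 2009 paper look as follows (a statement of the result, not a derivation; see the paper for the proof):

```latex
R_T \;=\; \sum_{t=1}^{T}\Bigl(\epsilon(b^*, b_t) + \epsilon(b^*, b'_t)\Bigr),
\qquad
\epsilon(b_i, b_j) \;=\; P(b_i \succ b_j) - \tfrac{1}{2},
\\[4pt]
\mathbb{E}[R_T] \;=\; O\!\Bigl(\tfrac{K}{\epsilon_{1,2}}\,\log T\Bigr),
\qquad
\epsilon_{1,2} \;=\; P(b^* \succ b_{(2)}) - \tfrac{1}{2},
```

where b_(2) is the second-best bandit; a matching lower bound makes this information-theoretically optimal.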
Slides 28-29: Summary
- Provably efficient online algorithm (in a regret sense)
- Also results for the continuous (convex) setting
- Requires a comparison oracle that
  - reflects user preferences
  - is independent / unbiased
  - satisfies strong transitivity
  - satisfies the triangle inequality
Slide 30: Directions to Explore
- Relaxing assumptions
  - E.g., strong transitivity, triangle inequality
- Integrating context
- Dealing with large K
  - Assume additional structure on the retrieval functions?
- Other cost models
  - PAC setting (fixed budget, find the best possible)
- Dynamic or changing user interests / environment
Slides 31-32: Improving the Comparison Oracle
- Dueling Bandits Problem
  - Interactive learning framework
  - Provably minimizes regret
  - Assumes an idealized comparison oracle
- Can we improve the comparison oracle?
  - Can we improve how we interpret results?
Slide 33: Determining Statistical Significance
- For each query q, interleave A(q) and B(q) and log clicks
- t-Test
  - For each q, score the clicks on A(q)
    - E.g., 3/4 = 0.75
  - Take the sample mean score (e.g., 0.6)
  - Compute confidence (p-value)
    - E.g., want p < 0.05 (i.e., 95% confidence)
- More data, more confidence
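The test above is a standard one-sample t-test against the 0.5 tie point; a minimal sketch (the function name and example scores are mine):

```python
import math
import statistics

def interleaving_t_stat(scores):
    """One-sample t-statistic for interleaving: per-query scores are the
    fraction of clicks credited to A, tested against the 0.5 tie point.
    Returns (sample mean, t-statistic); a larger |t| gives a smaller
    p-value, and more queries shrink the standard error."""
    n = len(scores)
    mean = statistics.fmean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)
    return mean, (mean - 0.5) / se
```

The p-value then follows from the t-distribution with n - 1 degrees of freedom (e.g., via scipy.stats).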
Slides 34-35: Limitation
- Example: a query session with 2 clicks
  - One click at rank 1 (from A)
  - A later click at rank 4 (from B)
- Normally we would count this query session as a tie
- But the second click is probably more informative,
  so B should get more credit for this query
Slide 36: Linear Model
- Feature vector f(q,c) describes a click c in the context of query q
- Weight of a click is w^T f(q,c)

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
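A sketch of the linear model with a made-up two-dimensional feature vector (is_last_click, is_other_click); the feature choice here is illustrative, not the paper's actual feature set.

```python
def click_weight(w, f):
    """Weight of one click under the linear model: w^T f(q, c)."""
    return sum(wi * fi for wi, fi in zip(w, f))

def session_score(w, clicks_a, clicks_b):
    """Weighted credit for one query session: total weighted clicks on A's
    results minus those on B's. Positive favors A, negative favors B."""
    return (sum(click_weight(w, f) for f in clicks_a)
            - sum(click_weight(w, f) for f in clicks_b))
```

On the slide-34 example (an early click from A, a later last click from B), uniform weights score the session as a tie, while weighting only the last click credits B.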
Slides 37-39: Example
- w^T f(q,c) differentiates last clicks from other clicks
- Interleave A vs B
  - 3 clicks per session
  - Last click: 60% on a result from A
  - Other 2 clicks: random
- Conventional scoring, w = (1,1), has significant variance
- Counting only the last click, w = (1,0), minimizes variance

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
Slides 40-41: Scoring Query Sessions
- Feature representation for a query session
- Weighted score for the query
  - A positive score favors A, a negative score favors B

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
Slide 42: Supervised Learning
- We will optimize for the z-test ("inverse z-test")
  - Approximately equal to the t-test for large samples
  - z-score = mean / standard deviation
- (Assumes A > B on the training data)

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
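The objective can be sketched as follows; the toy session features below are made up to show the effect of reweighting (a consistent feature gets upweighted, a noisy one downweighted).

```python
import statistics

def z_score(w, session_features):
    """z-score of the weighted per-session scores for a weight vector w.
    The inverse z-test learns w to maximize this statistic (mean over
    standard deviation) on training pairs where A is known to beat B."""
    scores = [sum(wi * fi for wi, fi in zip(w, phi))
              for phi in session_features]
    return statistics.fmean(scores) / statistics.pstdev(scores)
```

Here feature 0 is noisy (its scores cancel out) while feature 1 consistently favors A, so putting all the weight on feature 1 yields a much larger z-score.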
Slide 43: Experiment Setup
- Data collection
  - Pool of retrieval functions
  - Hash users into partitions
  - Run interleaving of different pairs in parallel
- Collected on arXiv.org
- 2 pools of retrieval functions
  - Training pool (6 pairs): we know A > B
  - New pool (12 pairs)
Slide 44: Training Pool Cross Validation
Slide 45: (No transcript; figure only)
Slide 46: Experimental Results
- The inverse z-test works well
  - Beats the baseline on most of the new interleaving pairs
  - Directions of the tests are all in agreement
  - In 6/12 pairs, for p = 0.1, it reduces the required sample size by 10%
  - In 4/12 pairs, it achieves p = 0.05 where the baseline does not
  - ~400 to 650 queries per interleaving experiment
- Weights are hard to interpret (features are correlated)
  - Largest weight: 1 if a single click at rank > 1
Slides 47-48: Interactive Learning
- Dueling Bandits Problem
  - The system learns on the fly
  - Maximize total user utility over time
  - Exploration / exploitation tradeoff
- Interpreting Implicit Feedback
  - Supervised learning to learn a better interpretation
  - How do we close the loop?
  - Simple yet practical model
  - Efficient and compatible with existing approaches
Slide 49: Thank You!
Slides, papers, and software available at www.yisongyue.com
Slide 50: Extra Slides
Slide 51: Regret Analysis
- Round: all the time steps for a particular candidate bandit
  - Halts when a better bandit is found with 1 - d confidence
  - Choose d = 1/(TK^2)
- Match: all the comparisons between two bandits in a round
  - At most K matches in each round
  - The candidate plays one match against each remaining bandit

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 52: Regret Analysis
- O(log K) total rounds
- O(K) total matches
- Each match incurs regret
  - Depends on d = K^-2 T^-1
- Finds the best bandit w.p. 1 - 1/T
- Expected regret: O((K/eps) log T)

Yue, Broder, Kleinberg, Joachims, COLT 2009
Slide 53: Removing Inferior Bandits
- At the conclusion of each round
  - Remove any empirically worse bandits
- Intuition
  - High confidence that the winner is better than the incumbent candidate
  - Empirically worse bandits cannot be much better than the incumbent candidate
  - A Hoeffding bound shows the winner is also better than the empirically worse bandits with high confidence
- Preserves 1 - 1/T confidence overall that we'll find the best bandit