An Interactive Learning Approach to Optimizing Information Retrieval Systems - PowerPoint PPT Presentation

Transcript and Presenter's Notes
1
An Interactive Learning Approach to Optimizing
Information Retrieval Systems
  • CMU ML Lunch
  • September 27th, 2010
  • Yisong Yue
  • Carnegie Mellon University

2
Information Systems
3
Interactive Learning Setting
  • Find the best ranking function (out of 1000)
  • Show results, evaluate using click logs
  • Clicks biased (users only click on what they see)
  • Explore / exploit problem

4
Interactive Learning Setting
  • Find the best ranking function (out of 1000)
  • Show results, evaluate using click logs
  • Clicks biased (users only click on what they see)
  • Explore / exploit problem
  • Technical issues
  • How to interpret clicks?
  • What is the utility function?
  • What results to show users?

5
Interactive Learning Setting
  • Find the best ranking function (out of 1000)
  • Show results, evaluate using click logs
  • Clicks biased (users only click on what they see)
  • Explore / exploit problem
  • Technical issues
  • How to interpret clicks?
  • What is the utility function?
  • What results to show users?

6
Team-Game Interleaving
(u = thorsten, q = svm)
f1(u,q) → r1
f2(u,q) → r2
r1:
1. Kernel Machines  http://svm.first.gmd.de/
2. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm_light/
3. Support Vector Machine and Kernel ... References  http://svm.research.bell-labs.com/SVMrefs.html
4. Lucent Technologies SVM demo applet  http://svm.research.bell-labs.com/SVT/SVMsvt.html
5. Royal Holloway Support Vector Machine  http://svm.dcs.rhbnc.ac.uk

r2:
1. Kernel Machines  http://svm.first.gmd.de/
2. Support Vector Machine  http://jbolivar.freeservers.com/
3. An Introduction to Support Vector Machines  http://www.support-vector.net/
4. Archives of SUPPORT-VECTOR-MACHINES ...  http://www.jiscmail.ac.uk/lists/SUPPORT...
5. SVM-Light Support Vector Machine  http://ais.gmd.de/~thorsten/svm_light/

Interleaving(r1, r2):
1. Kernel Machines (T2)  http://svm.first.gmd.de/
2. Support Vector Machine (T1)  http://jbolivar.freeservers.com/
3. SVM-Light Support Vector Machine (T2)  http://ais.gmd.de/~thorsten/svm_light/
4. An Introduction to Support Vector Machines (T1)  http://www.support-vector.net/
5. Support Vector Machine and Kernel ... References (T2)  http://svm.research.bell-labs.com/SVMrefs.html
6. Archives of SUPPORT-VECTOR-MACHINES ... (T1)  http://www.jiscmail.ac.uk/lists/SUPPORT...
7. Lucent Technologies SVM demo applet (T2)  http://svm.research.bell-labs.com/SVT/SVMsvt.html
Invariant: for all k, in expectation the top k contains the same number of team members from each team.
  • Interpretation: (r1 ≻ r2) ↔ clicks(T1) > clicks(T2)

Radlinski, Kurup, Joachims CIKM 2008
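The team-game interleaving above can be sketched in a few lines (a minimal sketch, not the paper's pseudocode; the function name and tie-breaking details are my own):

```python
import random

def team_draft_interleave(r1, r2, rng=None):
    """Team-draft interleaving of rankings r1 and r2.

    The team that has contributed fewer results picks next (coin flip
    on ties) and adds its highest-ranked result not yet shown.  This
    yields the invariant: for all k, the top k contains, in
    expectation, the same number of results from each team.
    """
    rng = rng or random.Random()
    shown, interleaved, teams = set(), [], []
    count = {1: 0, 2: 0}
    ranking = {1: r1, 2: r2}
    while len(shown) < len(set(r1) | set(r2)):
        if count[1] != count[2]:
            t = 1 if count[1] < count[2] else 2
        else:
            t = rng.choice((1, 2))
        pool = [d for d in ranking[t] if d not in shown]
        if not pool:                      # this team is exhausted
            t = 3 - t
            pool = [d for d in ranking[t] if d not in shown]
        interleaved.append(pool[0])
        teams.append(t)
        shown.add(pool[0])
        count[t] += 1
    return interleaved, teams
```

A click on a result credited to team t then counts as a win for ranking rt, which is exactly the slide's interpretation: r1 ≻ r2 when T1 collects more clicks than T2.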
7
Setting
  • Find the best ranking function (out of 1000)
  • Evaluate using click logs
  • Clicks biased (users only click on what they see)
  • Explore / exploit problem
  • Technical issues
  • How to interpret clicks?
  • What is the utility function?
  • What results to show users?

8
Interleave A vs B

          Left wins   Right wins
A vs B        1           0
A vs C        0           0
B vs C        0           0
9
Interleave A vs C

          Left wins   Right wins
A vs B        1           0
A vs C        0           1
B vs C        0           0
10
Interleave B vs C

          Left wins   Right wins
A vs B        1           0
A vs C        0           1
B vs C        1           0
11
Interleave A vs B

          Left wins   Right wins
A vs B        1           1
A vs C        0           1
B vs C        1           0
12
Dueling Bandits Problem
  • Given K bandits b1, …, bK
  • Each iteration compare (duel) two bandits
  • E.g., interleaving two retrieval functions

Yue Joachims, ICML 2009 Yue, Broder,
Kleinberg, Joachims, COLT 2009
13
Dueling Bandits Problem
  • Given K bandits b1, …, bK
  • Each iteration compare (duel) two bandits
  • E.g., interleaving two retrieval functions
  • Cost function (regret)
  • (bt, bt′) are the two bandits chosen at iteration t
  • b* is the overall best bandit
  • Per-iteration regret: the fraction of users who prefer the best bandit over the chosen ones, i.e. (P(b* ≻ bt) − ½) + (P(b* ≻ bt′) − ½)

Yue Joachims, ICML 2009 Yue, Broder,
Kleinberg, Joachims, COLT 2009
14
  • Example 1
  • P(f* ≻ f) = 0.9
  • P(f* ≻ f′) = 0.8
  • Incurred regret = 0.4 + 0.3 = 0.7
  • Example 2
  • P(f* ≻ f) = 0.7
  • P(f* ≻ f′) = 0.6
  • Incurred regret = 0.2 + 0.1 = 0.3
  • Example 3
  • P(f* ≻ f) = 0.51
  • P(f* ≻ f′) = 0.55
  • Incurred regret = 0.01 + 0.05 = 0.06
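The regret numbers in these examples follow directly from summing the two chosen bandits' gaps to the best bandit, gap = P(f* ≻ f) − ½. A quick check (the helper name is my own):

```python
def per_iteration_regret(p_best_vs_f, p_best_vs_fprime):
    """Regret for one duel (f, f'): the sum of how strongly users
    prefer the best bandit f* over each of the two bandits shown."""
    return (p_best_vs_f - 0.5) + (p_best_vs_fprime - 0.5)

# Example 1: 0.4 + 0.3 = 0.7
# Example 2: 0.2 + 0.1 = 0.3
# Example 3: 0.01 + 0.05 = 0.06
```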

15
Assumptions
  • P(bi ≻ bj) = ½ + εij (distinguishability)
  • Strong Stochastic Transitivity
  • For three bandits bi ≻ bj ≻ bk: εik ≥ max(εij, εjk)
  • Monotonicity property
  • Stochastic Triangle Inequality
  • For three bandits bi ≻ bj ≻ bk: εik ≤ εij + εjk
  • Diminishing returns property
  • Satisfied by many standard models
  • E.g., Logistic / Bradley-Terry

16
Explore then Exploit
  • First explore
  • Try to gather as much information as possible
  • Accumulates regret based on which bandits we
    decide to compare
  • Then exploit
  • We have a (good) guess as to which bandit is best
  • Repeatedly compare that bandit with itself
  • (i.e., interleave that ranking with itself)

17
Naïve Approach
  • In deterministic case, O(K) comparisons to find
    max
  • Extend to noisy case
  • Repeatedly compare until confident one is better
  • Problem: comparing two awful (but similar) bandits
  • Example
  • P(A ≻ B) = 0.85
  • P(A ≻ C) = 0.85
  • P(B ≻ C) = 0.51
  • Comparing B and C requires many comparisons!
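The "many comparisons" point can be made concrete with a Hoeffding-style sample bound (a sketch; this formula is standard but not from the slides):

```python
import math

def comparisons_needed(eps, delta=0.05):
    """Rough Hoeffding bound on the number of Bernoulli duels needed to
    separate a win rate of 1/2 + eps from 1/2 with confidence 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
```

For P(A ≻ B) = 0.85 (eps = 0.35) a handful of duels suffice, while P(B ≻ C) = 0.51 (eps = 0.01) needs tens of thousands, even though neither B nor C is worth exploring.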

18
Interleaved Filter
  • Choose candidate bandit at random

Yue, Broder, Kleinberg, Joachims, COLT 2009
19
Interleaved Filter
  • Choose candidate bandit at random
  • Make noisy comparisons (Bernoulli trial)
  • against all other bandits simultaneously
  • Maintain mean and confidence interval
  • for each pair of bandits being compared

Yue, Broder, Kleinberg, Joachims, COLT 2009
20
Interleaved Filter
  • Choose candidate bandit at random
  • Make noisy comparisons (Bernoulli trial)
  • against all other bandits simultaneously
  • Maintain mean and confidence interval
  • for each pair of bandits being compared
  • until another bandit is better
  • With confidence 1 − δ

Yue, Broder, Kleinberg, Joachims, COLT 2009
21
Interleaved Filter
  • Choose candidate bandit at random
  • Make noisy comparisons (Bernoulli trial)
  • against all other bandits simultaneously
  • Maintain mean and confidence interval
  • for each pair of bandits being compared
  • until another bandit is better
  • With confidence 1 − δ
  • Repeat process with new candidate
  • (Remove all empirically worse bandits)

Yue, Broder, Kleinberg, Joachims, COLT 2009
22
Interleaved Filter
  • Choose candidate bandit at random
  • Make noisy comparisons (Bernoulli trial)
  • against all other bandits simultaneously
  • Maintain mean and confidence interval
  • for each pair of bandits being compared
  • until another bandit is better
  • With confidence 1 − δ
  • Repeat process with new candidate
  • (Remove all empirically worse bandits)
  • Continue until 1 candidate left

Yue, Broder, Kleinberg, Joachims, COLT 2009
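The loop built up over the last few slides can be sketched end to end (a simulation sketch, not the paper's pseudocode: prob[(i, j)] stands in for the users, and the confidence radius is a generic Hoeffding-style one):

```python
import math
import random

def interleaved_filter(bandits, prob, delta=0.01, rng=None):
    """Sketch of Interleaved Filter.  prob[(i, j)] = true P(i beats j),
    used only to simulate duels.  Returns the estimated best bandit."""
    rng = rng or random.Random(0)
    candidate = bandits[rng.randrange(len(bandits))]
    remaining = set(bandits) - {candidate}
    while remaining:
        wins = {b: 0 for b in remaining}   # candidate's wins against b
        n = 0
        while remaining:
            n += 1
            for b in remaining:            # duel everyone simultaneously
                wins[b] += rng.random() < prob[(candidate, b)]
            radius = math.sqrt(math.log(1.0 / delta) / n)
            mean = {b: wins[b] / n for b in remaining}
            # Candidate confidently beats b: prune b.
            for b in [b for b in remaining if mean[b] - radius > 0.5]:
                remaining.discard(b)
            # Some b confidently beats the candidate: b becomes the new
            # candidate, and empirically worse bandits are removed too.
            better = [b for b in remaining if mean[b] + radius < 0.5]
            if better:
                new_candidate = better[0]
                remaining = {b for b in remaining
                             if b != new_candidate and mean[b] <= 0.5}
                candidate = new_candidate
                break
    return candidate
```

Note how the simultaneous duels implement the intuition from the next slides: the time spent on a weak pair like (B, C) is bounded by the time it takes a strong bandit to defeat the candidate.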
23
Intuition
  • Simulate comparing candidate with
  • all remaining bandits simultaneously

Yue, Broder, Kleinberg, Joachims, COLT 2009
24
Intuition
  • Simulate comparing candidate with
  • all remaining bandits simultaneously
  • Example
  • P(A ≻ B) = 0.85
  • P(A ≻ C) = 0.85
  • P(B ≻ C) = 0.51
  • B is candidate
  • comparisons between B vs C bounded
  • by comparisons between B vs A!

Yue, Broder, Kleinberg, Joachims, COLT 2009
25
Regret Analysis
  • Can model sequence of candidate bandits
  • as a random walk.
  • Which will be the next candidate bandit?
  • O(log K) rounds

(figure: random walk in which each remaining bandit becomes the next candidate with probability 1/3)
Yue, Broder, Kleinberg, Joachims, COLT 2009
26
Regret Analysis
  • After each round, we remove a constant
  • fraction of the remaining bandits.
  • O(K) total matches

(figure: successive rounds with 4, 2, and 0 matches as bandits are eliminated)
Yue, Broder, Kleinberg, Joachims, COLT 2009
27
Regret Analysis
  • T time horizon
  • K bandits / retrieval functions
  • ε: distinguishability gap between the best and 2nd-best bandit
  • Average regret RT / T → 0
  • Information-theoretically optimal
  • Also need to prove correctness (see paper)

Yue, Broder, Kleinberg, Joachims, COLT 2009
28
Summary
  • Provably efficient online algorithm
  • (In a regret sense)
  • Also results for continuous (convex) setting

29
Summary
  • Provably efficient online algorithm
  • (In a regret sense)
  • Also results for continuous (convex) setting
  • Requires comparison oracle
  • Reflects user preferences
  • Independence / Unbiased
  • Strong transitivity
  • Triangle Inequality

30
Directions to Explore
  • Relaxing assumptions
  • E.g., strong transitivity & triangle inequality
  • Integrating context
  • Dealing with large K
  • Assume additional structure on the retrieval
    functions?
  • Other cost models
  • PAC setting (fixed budget, find the best
    possible)
  • Dynamic or changing user interests / environment

31
Improving Comparison Oracle
32
Improving Comparison Oracle
  • Dueling Bandits Problem
  • Interactive learning framework
  • Provably minimizes regret
  • Assumes idealized comparison oracle
  • Can we improve the comparison oracle?
  • Can we improve how we interpret results?

33
Determining Statistical Significance
  • For each query q, interleave A(q) and B(q), and log clicks
  • t-Test
  • For each q, score clicks on A(q)
  • E.g., 3 of 4 clicks ⇒ 3/4 = 0.75
  • Sample mean score (e.g., 0.6)
  • Compute confidence (p-value)
  • E.g., want p < 0.05 (i.e., 95% confidence)
  • More data, more confident
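The per-query scoring and significance test can be sketched as a one-sample t-test against the no-preference value 0.5 (a sketch; the slides do not pin down the exact statistic):

```python
import math
import statistics

def t_statistic(scores, null_mean=0.5):
    """One-sample t statistic for per-query scores (fraction of clicks
    on A's results); large |t| means a small p-value against 'no
    preference between A and B'."""
    n = len(scores)
    mean = statistics.mean(scores)
    stderr = statistics.stdev(scores) / math.sqrt(n)
    return (mean - null_mean) / stderr
```

Doubling the data at the same mean scales t by roughly √2, which is the "more data, more confident" point.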

34
Limitation
  • Example query session with 2 clicks
  • One click at rank 1 (from A)
  • Later click at rank 4 (from B)
  • Normally would count this query session as a tie

35
Limitation
  • Example query session with 2 clicks
  • One click at rank 1 (from A)
  • Later click at rank 4 (from B)
  • Normally would count this query session as a tie
  • But second click is probably more informative
  • so B should get more credit for this query

36
Linear Model
  • Feature vector f(q,c)
  • Weight of click is wᵀf(q,c)

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
37
Example
  • wᵀf(q,c) differentiates last clicks and other
    clicks

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
38
Example
  • wᵀf(q,c) differentiates last clicks and other clicks
  • Interleave A vs B
  • 3 clicks per session
  • Last click: 60% on a result from A
  • Other 2 clicks random

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
39
Example
  • wᵀf(q,c) differentiates last clicks and other clicks
  • Interleave A vs B
  • 3 clicks per session
  • Last click: 60% on a result from A
  • Other 2 clicks random
  • Conventional w = (1,1) has significant variance
  • Only counting the last click, w = (1,0), minimizes variance

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
40
Scoring Query Sessions
  • Feature representation for query session

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
41
Scoring Query Sessions
  • Feature representation for query session
  • Weighted score for query
  • Positive score favors A, negative favors B

Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
42
Supervised Learning
  • Will optimize for the z-Test / Inverse z-Test
  • Approximately equal to the t-Test for large samples
  • z-score = mean / standard deviation

(Assumes A ≻ B)
Yue, Gao, Chapelle, Zhang, Joachims, SIGIR 2010
43
Experiment Setup
  • Data collection
  • Pool of retrieval functions
  • Hash users into partitions
  • Run interleaving of different pairs in parallel
  • Collected on arXiv.org
  • 2 pools of retrieval functions
  • Training Pool (6 pairs): known A ≻ B
  • New Pool (12 pairs)

44
Training Pool Cross Validation
45
(No Transcript)
46
Experimental Results
  • Inverse z-Test works well
  • Beats baseline on most of the new interleaving pairs
  • Direction of tests all in agreement
  • In 6/12 pairs, at p = 0.1, reduces sample size by 10%
  • In 4/12 pairs, achieves p < 0.05 where the baseline does not
  • 400 to 650 queries per interleaving experiment
  • Weights hard to interpret (features correlated)
  • Largest weight: single click at rank > 1

47
Interactive Learning
  • Dueling Bandits Problem
  • System learns on-the-fly.
  • Maximize total user utility over time
  • Exploration / exploitation tradeoff

48
Interactive Learning
  • Dueling Bandits Problem
  • System learns on-the-fly.
  • Maximize total user utility over time
  • Exploration / exploitation tradeoff
  • Interpreting Implicit Feedback
  • Supervised learning to learn better
    interpretation
  • How do we close the loop?
  • Simple yet practical model
  • Efficient / compatible with existing approaches

49
Thank You!
Slides, papers, and software available at
www.yisongyue.com
50
Extra Slides
51
Regret Analysis
  • Round: all the time steps for a particular candidate bandit
  • Halts when a better bandit is found with 1 − δ confidence
  • Choose δ = 1/(TK²)
  • Match: all the comparisons between two bandits in a round
  • At most K matches in each round
  • Candidate plays one match against each remaining bandit

Yue, Broder, Kleinberg, Joachims, COLT 2009
52
Regret Analysis
  • O(log K) total rounds
  • O(K) total matches
  • Each match incurs regret depending on δ = K⁻²T⁻¹
  • Finds best bandit w.p. 1 − 1/T
  • Expected regret

Yue, Broder, Kleinberg, Joachims, COLT 2009
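The expected-regret formula on this slide was an image that did not survive the transcript. Reconstructing it from the ingredients stated here (O(log K) rounds, O(K) matches, δ = 1/(TK²)), the bound from the COLT 2009 paper has the form:

```latex
% K bandits, horizon T, \epsilon_{1,2} = P(b^* \succ b_2) - \tfrac{1}{2}
\mathbb{E}[R_T] \;=\; O\!\left(\frac{K}{\epsilon_{1,2}} \,\log T\right),
\qquad \frac{\mathbb{E}[R_T]}{T} \;\longrightarrow\; 0,
```

which matches the earlier slide's claims that the average regret vanishes and that the rate is information-theoretically optimal in K and T.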
53
Removing Inferior Bandits
  • At the conclusion of each round
  • Remove any empirically worse bandits
  • Intuition
  • High confidence that the winner is better than the incumbent candidate
  • Empirically worse bandits cannot be much better than the incumbent candidate
  • Can show via a Hoeffding bound that the winner is also better than the empirically worse bandits, with high confidence
  • Preserves 1 − 1/T confidence overall that we'll find the best bandit