Bayesian Ranking using Expectation Propagation and Factor Graphs - PowerPoint PPT Presentation

About This Presentation
Description:

EP: Tom Minka. TrueSkill™: Ralf Herbrich & Thore Graepel @ MSR Cambridge (UK) ... Tom Minka's thesis in two lines. Approximate. By. Iterate. Pick a factor ...

Slides: 35
Provided by: Dumi4

Transcript and Presenter's Notes


1
Bayesian Ranking using Expectation Propagation
and Factor Graphs
  • Dumitru Erhan
  • LISA/DIRO @ Université de Montréal

2
Preface
  • Not my work (at all)
  • EP: Tom Minka
  • TrueSkill™: Ralf Herbrich & Thore Graepel @ MSR
    Cambridge (UK)
  • TrueChess: Pierre Dangauthier @ INRIA
    Rhône-Alpes (France)
  • Slides, plots, and results taken with permission

3
Outline
  • Problem setting
  • Xbox Live
  • Factor Graphs
  • Exact inference in Factor Graphs
  • Approximate inference using EP
  • Loopy schedules and chess ratings
  • Results

4
The Ranking Problem
  • Vaguely speaking:
  • Input: ordered subsets of data
  • Output: a ranking function
  • For example:
  • Chess
  • Online games
  • Movie ratings
  • Internet search

5
Modelling Ranking
  • Ordinal regression
  • Order learning

[Figure: left, ordinal regression: rank(x), from Rank 1 to Rank 5, plotted against the score f(x); right, order learning: items placed on the f(x) axis at f(a), f(b), f(c)]
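
The two modelling styles on this slide can be contrasted in a few lines. This is a minimal sketch: the linear score function `f` and the four thresholds are hypothetical values chosen only for illustration, not anything taken from the slides.

```python
from bisect import bisect_right

def f(x: float) -> float:
    """Hypothetical score function: a fixed linear scorer, for illustration only."""
    return 2.0 * x - 1.0

# Ordinal regression: sorted thresholds cut the real line into 5 ordered bins,
# and rank(x) is the bin that f(x) falls into (1 = lowest, 5 = highest).
THRESHOLDS = [-1.5, -0.5, 0.5, 1.5]  # hypothetical values

def ordinal_rank(x: float) -> int:
    return bisect_right(THRESHOLDS, f(x)) + 1

# Order learning: only the relative order of f(a), f(b), f(c) matters.
def order_items(items: dict[str, float]) -> list[str]:
    return sorted(items, key=lambda name: f(items[name]), reverse=True)

if __name__ == "__main__":
    print([ordinal_rank(x) for x in (-1.0, 0.0, 0.5, 1.0, 2.0)])
    print(order_items({"a": 0.9, "b": 0.1, "c": 0.4}))
```

With these assumed values, `ordinal_rank` maps the inputs -1.0, 0.0, 0.5, 1.0, 2.0 to ranks 1 through 5, while `order_items` returns `['a', 'c', 'b']`: order learning never needs thresholds, only the relative order of the scores.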
6
Xbox Live
7
Modelling the Bayesian Way I
  • Track belief distributions
  • Allow performance variations
  • Model game outcome

8
Modelling the Bayesian Way II
  • This leads to a probit-based likelihood
  • Posterior is not Gaussian!
  • Implications for inference, tracking, etc.
  • What if we could:
  • obtain a nice visualization of the model,
  • stay in the Gaussian/exponential family, and
  • perform the approximations efficiently?
  • Factor Graphs + Expectation Propagation!
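
A small numerical check of the two claims above: with Gaussian performance noise, the likelihood of a win is a probit function of the skill difference, and multiplying a Gaussian prior by it gives a skewed, hence non-Gaussian, posterior. The sketch assumes an N(0, 1) prior on player 1's skill, a known opponent skill `S2 = 0` and performance noise `BETA = 1`; all three are illustrative choices, not values from the slides.

```python
import math

BETA = 1.0  # assumed performance noise (std. dev.) for both players
S2 = 0.0    # assumed, known skill of the opponent

def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def win_likelihood(s1: float, s2: float, beta: float = BETA) -> float:
    """P(player 1 beats player 2 | s1, s2) when each performance is N(skill, beta^2).

    The performance difference is N(s1 - s2, 2 beta^2), hence a probit in s1 - s2.
    """
    return norm_cdf((s1 - s2) / (math.sqrt(2.0) * beta))

def posterior_moments(s2: float = S2, lo: float = -8.0, hi: float = 8.0, n: int = 20001):
    """Mean, variance and skewness of p(s1 | player 1 won), with an N(0, 1) prior on s1.

    Uses a uniform grid; the grid spacing cancels in every ratio below.
    """
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    ws = [norm_pdf(x) * win_likelihood(x, s2) for x in xs]  # prior times likelihood
    z = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / z
    skew = sum(w * (x - mean) ** 3 for w, x in zip(ws, xs)) / z / var ** 1.5
    return mean, var, skew

if __name__ == "__main__":
    print(posterior_moments())  # a Gaussian would have skewness exactly 0
```

Under these assumptions the posterior is a skew-normal distribution, so the grid results can be checked against its closed-form mean (about 0.461), variance (about 0.788) and skewness (about 0.060). The skewness is small but nonzero, which is why exact updates leave the Gaussian family.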

9
Factor Graphs mini intro
  • A bipartite graph that represents the
    factorization of a mathematical function
  • Nodes: factors and variables
  • Function = product of all factors
  • Edges: dependencies of factors on variables

[Figure: example factor graph over variables x, y and z]
10
Factor Graphs continued
  • Used for modelling joint PDFs
  • Interested in marginals of the type
    P(hidden | observed)
  • Use the sum-product algorithm/belief propagation
    to compute them

11
Sum-Product Algorithm I
[Figure: factor graph with variables v, w, x, y, z and factors f1(v,w), f2(w,x), f3(x,y), f4(x,z)]
  • Observation: the sum of products becomes a product
    of sums, one message from each neighboring factor
    to the variable!
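
For the tree on this slide the observation can be written out explicitly. Here g denotes the global function and μ_{f→x} the message from factor f to variable x; both names are introduced for this example.

```latex
g(v,w,x,y,z) = f_1(v,w)\, f_2(w,x)\, f_3(x,y)\, f_4(x,z)

g_x(x) = \sum_{v,w,y,z} g(v,w,x,y,z)
       = \underbrace{\sum_{w} f_2(w,x) \underbrace{\sum_{v} f_1(v,w)}_{\mu_{f_1 \to w}(w)}}_{\mu_{f_2 \to x}(x)}
         \cdot \underbrace{\sum_{y} f_3(x,y)}_{\mu_{f_3 \to x}(x)}
         \cdot \underbrace{\sum_{z} f_4(x,z)}_{\mu_{f_4 \to x}(x)}
```

Each sum only involves the variables of one branch, so the cost grows with the size of the local factor tables rather than exponentially with the number of variables.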

12
Sum-Product Algorithm II
[Figure: the same graph after v has been summed out: variables w, x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]
  • Observation: factors only need to sum out all
    their local variables!

13
Sum-Product Algorithm III
[Figure: the same graph reduced to variables x, y, z and factors f2(w,x), f3(x,y), f4(x,z)]
  • Observation: variables pass on the product of all
    incoming messages!
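
The three observations above are enough to implement the sum-product algorithm on the slides' tree. The sketch below assumes every variable is binary and fills the four factor tables with seeded random positive numbers (both assumptions are ours); it computes the unnormalized marginal of `x` by message passing and checks it against brute-force summation over all 32 joint states.

```python
import itertools
import random

VALUES = (0, 1)  # every variable is binary in this toy example

random.seed(0)

def random_table():
    """A positive 2x2 factor table indexed as table[a][b]."""
    return [[random.uniform(0.1, 1.0) for _ in VALUES] for _ in VALUES]

# Tree from the slides: f1(v,w), f2(w,x), f3(x,y), f4(x,z)
f1, f2, f3, f4 = (random_table() for _ in range(4))

def brute_force_marginal_x():
    """Sum the full product over all joint states except x."""
    m = [0.0, 0.0]
    for v, w, x, y, z in itertools.product(VALUES, repeat=5):
        m[x] += f1[v][w] * f2[w][x] * f3[x][y] * f4[x][z]
    return m

def sum_product_marginal_x():
    # Factor to variable: f1 sums out v (a leaf variable sends an all-ones message).
    mu_f1_w = [sum(f1[v][w] for v in VALUES) for w in VALUES]
    # Variable to factor: w forwards the product of its other incoming messages (only f1's).
    mu_w_f2 = mu_f1_w
    # f2 sums out w, weighted by the message arriving from w.
    mu_f2_x = [sum(f2[w][x] * mu_w_f2[w] for w in VALUES) for x in VALUES]
    # Leaves y and z send all-ones messages, so f3 and f4 simply sum them out.
    mu_f3_x = [sum(f3[x][y] for y in VALUES) for x in VALUES]
    mu_f4_x = [sum(f4[x][z] for z in VALUES) for x in VALUES]
    # Belief at x: product of all incoming messages (an unnormalized marginal).
    return [mu_f2_x[x] * mu_f3_x[x] * mu_f4_x[x] for x in VALUES]

if __name__ == "__main__":
    bp = sum_product_marginal_x()
    print([b / sum(bp) for b in bp], brute_force_marginal_x())
```

Normalizing the result (dividing by its sum) gives P(x); on a tree the two computations agree exactly, up to floating-point rounding.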

14
Belief Propagation
  • Concept of a message from node X to node Y: X
    tells Y what state Y should be in
  • First propagate observed data
  • Then nodes exchange messages (starting with the leaves)
  • Messages (priors, conditional probabilities) lead to
    updates of beliefs
  • Belief(x) = product of incoming messages
  • Basically, unnormalized marginals
  • Pass messages until convergence
  • If the graph is a tree: convergence guaranteed
  • If not: no guarantee

15
Approximate message passing
  • Problem: the exact messages from factors to
    variables may not be closed under products
  • TrueSkill™: Gaussian × step function ≠ Gaussian
  • Solution: approximate the marginal as well as
    possible in the sense of minimal KL divergence
  • Expectation Propagation: approximate the marginal
    by so-called moment matching
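
Why moment matching solves the KL problem: among all Gaussians q, the one minimizing KL(p || q) has exactly the mean and variance of p. The sketch below checks this numerically for the TrueSkill™ case of a Gaussian times a step function; the N(0, 1) message, the threshold 0.5 and the grid are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def project_to_gaussian(xs, weights):
    """Moment matching: return the mean and variance of p (given as grid weights)."""
    z = sum(weights)
    mean = sum(w * x for w, x in zip(weights, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / z
    return mean, var

def kl_p_q(xs, weights, mean, var):
    """Numerical KL(p || q) on a uniform grid, with q = N(mean, var)."""
    h = xs[1] - xs[0]
    z = sum(weights) * h  # normalizer of p
    kl = 0.0
    for x, w in zip(xs, weights):
        if w > 0.0:  # p is zero below the step, contributing nothing
            p = w / z
            kl += p * math.log(p / gaussian_pdf(x, mean, var)) * h
    return kl

# Exact "message x step function" result: N(0, 1) truncated to x > 0.5 (assumed values).
n, lo, hi = 20001, -6.0, 6.0
xs = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
weights = [gaussian_pdf(x, 0.0, 1.0) if x > 0.5 else 0.0 for x in xs]
mean, var = project_to_gaussian(xs, weights)

if __name__ == "__main__":
    print(mean, var, kl_p_q(xs, weights, mean, var))
```

Shifting the mean or scaling the variance away from the matched values increases KL(p || q), which is the sense in which the moment-matched Gaussian is the "best" approximation.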

16
Expectation Propagation
[Figure: message, old marginal and new marginal, exact vs. approximate]

17
Tom Minka's thesis in two lines
  • Approximate
  • By
  • Iterate
  • Pick a factor
  • Remove its influence
  • Project and refine
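
The loop on this slide can be sketched on a toy problem of our own choosing (not TrueSkill™ itself): a single player with an N(0, 1) skill prior wins against opponents whose skills `a_i` are assumed known, so each game contributes an exact factor Φ((s − a_i)/β). EP replaces each factor by a Gaussian "site" and cycles through them; `beta = 1`, the sweep count and all function names are assumptions.

```python
import math

def pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tilted_moments(m, v, a, beta):
    """Mean and variance of N(s; m, v) * Phi((s - a) / beta), renormalized."""
    c = math.sqrt(v + beta * beta)
    z = (m - a) / c
    r = pdf(z) / cdf(z)
    return m + v * r / c, v - v * v * r * (z + r) / (c * c)

def ep_skill(opponents, beta=1.0, prior_mean=0.0, prior_var=1.0, sweeps=20):
    """EP posterior (mean, variance) of one skill after wins against known opponents."""
    # Gaussian sites in natural parameters: precision tau_i, precision-times-mean nu_i.
    tau = [0.0] * len(opponents)
    nu = [0.0] * len(opponents)
    tau0, nu0 = 1.0 / prior_var, prior_mean / prior_var
    for _ in range(sweeps):                        # Iterate
        for i, a in enumerate(opponents):          # Pick a factor
            tau_q, nu_q = tau0 + sum(tau), nu0 + sum(nu)
            tau_c, nu_c = tau_q - tau[i], nu_q - nu[i]  # Remove its influence (cavity)
            m_new, v_new = tilted_moments(nu_c / tau_c, 1.0 / tau_c, a, beta)  # Project
            tau[i] = 1.0 / v_new - tau_c           # Refine: site = projection / cavity
            nu[i] = m_new / v_new - nu_c
    tau_q, nu_q = tau0 + sum(tau), nu0 + sum(nu)
    return nu_q / tau_q, 1.0 / tau_q

def exact_moments(opponents, beta=1.0, prior_mean=0.0, prior_var=1.0,
                  lo=-10.0, hi=10.0, n=20001):
    """Brute-force posterior mean and variance on a grid, for comparison."""
    h = (hi - lo) / (n - 1)
    xs = [lo + i * h for i in range(n)]
    sd = math.sqrt(prior_var)
    ws = []
    for x in xs:
        w = pdf((x - prior_mean) / sd)  # unnormalized prior
        for a in opponents:
            w *= cdf((x - a) / beta)    # one exact probit factor per win
        ws.append(w)
    z = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / z
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / z
    return mean, var

if __name__ == "__main__":
    print(ep_skill([-1.0, 0.0, 1.0]), exact_moments([-1.0, 0.0, 1.0]))
```

With a single factor the projection is exact, so EP reproduces the true posterior mean and variance; with several factors it is an approximation, which on this kind of one-dimensional problem typically lands very close to the brute-force answer.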

18
Formal Problem Setting
  • Problem Setting
  • k teams of n1, …, nk players
  • The outcome is a ranking among the teams
    (including draws)
  • Questions
  • Skill si of each player such that the higher the
    skill, the more likely a win
  • Global ranking among all players.
  • High-quality matches among k teams.

19
TrueSkill™ Factor Graph
[Figure: TrueSkill™ factor graph for a game in which Player 1 wins over Players 2 & 3, who draw with Player 4. Layers: individual skills s1 to s4; team performances t1 to t3; performance differences d1, d2]
20
TrueSkill™ Model Details
  • Priors
  • Hidden variables
  • Performance
  • Team performance
  • Likelihood
  • Win
  • Draw
  • Skill evolution
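
The model items listed above can be written as follows. The slide's own formulas did not survive, so this is a reconstruction following the published TrueSkill™ model; the symbol names (β for performance noise, ε for the draw margin, γ for skill dynamics, A_j for the players in team j) are our choice.

```latex
\begin{align*}
\text{Prior:}\quad & p(s_i) = \mathcal{N}(s_i;\, \mu_i, \sigma_i^2) \\
\text{Performance:}\quad & p(p_i \mid s_i) = \mathcal{N}(p_i;\, s_i, \beta^2) \\
\text{Team performance:}\quad & t_j = \textstyle\sum_{i \in A_j} p_i \\
\text{Win (team } j \text{ above team } j+1\text{):}\quad & d_j = t_j - t_{j+1}, \quad \mathbb{I}(d_j > \varepsilon) \\
\text{Draw:}\quad & \mathbb{I}(|d_j| \le \varepsilon) \\
\text{Skill evolution:}\quad & p\big(s_i^{(\tau+1)} \mid s_i^{(\tau)}\big) = \mathcal{N}\big(s_i^{(\tau+1)};\, s_i^{(\tau)}, \gamma^2\big)
\end{align*}
```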

21
More details and assumptions
  • Specifies an order on the real line
  • OK if we agree that 1-D is good enough
  • Draws: transitivity not good
  • Assume and
  • A mini factor graph (FG) is generated for each game!
  • EP updates can be done efficiently:
  • Moments of a truncated Gaussian
  • Information flows forward only:
  • No updates in the light of future data
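
For two players and no draw, the EP update reduces to closed-form corrections built from the moments of a truncated Gaussian. A minimal sketch: the names `v_win`, `w_win`, `Rating`, `win_probability` and `update_win` are hypothetical (not an official API), and the values μ = 25, σ = 25/3, β = 25/6 used in the usage note are commonly quoted defaults, assumed here rather than taken from the slides.

```python
import math
from dataclasses import dataclass

def pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def v_win(t: float, eps: float) -> float:
    """Mean shift of a standard Gaussian truncated to values above eps - t."""
    return pdf(t - eps) / cdf(t - eps)

def w_win(t: float, eps: float) -> float:
    """Relative variance reduction of the same truncated Gaussian (always in (0, 1))."""
    v = v_win(t, eps)
    return v * (v + t - eps)

@dataclass
class Rating:
    mu: float
    sigma: float

def win_probability(a: Rating, b: Rating, beta: float) -> float:
    """P(performance of a > performance of b), ignoring draws."""
    c = math.sqrt(2.0 * beta ** 2 + a.sigma ** 2 + b.sigma ** 2)
    return cdf((a.mu - b.mu) / c)

def update_win(winner: Rating, loser: Rating, beta: float, eps: float = 0.0, tau: float = 0.0):
    """Two-player update after `winner` beats `loser`; eps is the draw margin, tau the dynamics."""
    sw2 = winner.sigma ** 2 + tau ** 2  # skills may drift before the game
    sl2 = loser.sigma ** 2 + tau ** 2
    c2 = 2.0 * beta ** 2 + sw2 + sl2
    c = math.sqrt(c2)
    t = (winner.mu - loser.mu) / c
    v, w = v_win(t, eps / c), w_win(t, eps / c)
    new_winner = Rating(winner.mu + sw2 / c * v, math.sqrt(sw2 * (1.0 - sw2 / c2 * w)))
    new_loser = Rating(loser.mu - sl2 / c * v, math.sqrt(sl2 * (1.0 - sl2 / c2 * w)))
    return new_winner, new_loser

if __name__ == "__main__":
    a, b = Rating(25.0, 25 / 3), Rating(25.0, 25 / 3)
    print(win_probability(a, b, 25 / 6), update_win(a, b, beta=25 / 6))
```

With equal ratings, the winner moves up and the loser down by the same amount (about 4.2 here), and both uncertainties shrink; a nonzero `eps` would encode the draw margin.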

22
The Alternative: ELO
  • Quite similar
  • Performances distributed around fixed skills
  • Win probability
  • Skill updates
  • Linear update
  • Differences
  • No uncertainty tracking
  • Linearized updates
  • No notion of teams, multiple players/teams, etc.
  • Not a generative model
  • TrueSkill™ is a generalization of ELO
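
To make the contrast concrete, here is a sketch of an ELO-style update with Gaussian performances around fixed skills. The slide's formulas are not recoverable; `beta = 200` and `k = 32` are illustrative assumptions, and the function names are hypothetical. Note that there is no per-player uncertainty: the step size `k` is the same for every player and every game.

```python
import math

def cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def elo_expected(r_a: float, r_b: float, beta: float = 200.0) -> float:
    """P(A beats B) if performances are N(rating, beta^2) around fixed skills."""
    return cdf((r_a - r_b) / (math.sqrt(2.0) * beta))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0, beta: float = 200.0):
    """Linear update: both ratings move by k * (actual - expected), in opposite directions.

    score_a is 1 for a win by A, 0.5 for a draw and 0 for a loss.
    """
    delta = k * (score_a - elo_expected(r_a, r_b, beta))
    return r_a + delta, r_b - delta

if __name__ == "__main__":
    print(elo_update(1500.0, 1500.0, 1.0))  # equal ratings, A wins
```

For two equally rated players the expected score is 0.5, so a win moves the ratings by exactly k/2 in each direction. A widely used variant replaces Φ with a logistic curve; the linear structure of the update is the same.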

23
Experimental setup
  • Types of experiments
  • Team ranking
  • Match quality
  • Win probability
  • Convergence properties
  • Ultimate goals
  • Provide reliable rankings
  • Better game experience

24
Data: Halo 2 Multiplayer Beta
  • Publicly available
  • The real dataset is much larger
  • Number of games: 60,022
  • Number of players: 5,943
  • Parameters in all experiments:
  • Performance variation factor: 60
  • Draw probability: 5
  • Dynamics variation factor: 2

25
Convergence properties
[Plot: level (0 to 40) of Player 1 and Player 2 as estimated by TrueSkill™ and by ELO; horizontal axis from 0 to 400]
26
Win probability
27
Other results
  • TrueSkill™ is better at predicting tight matches
  • The additive team performance assumption does
    not hold in some cases (Capture-the-Flag)
  • There are some feedback loop issues

28
TrueSkill™ conclusions
  • Every Xbox 360 Live game uses TrueSkill™
  • Service launched in November 2005.
  • Distinguishing properties: it
  • is a generalization of ELO
  • tracks a belief distribution
  • can deal with multiple teams/players/draws
  • First real-world implementation of EP
  • However:
  • Draws are handled somewhat strangely (a hack)
  • Information flows only forward in time

29
What if
  • we created a schedule that passes messages back
    in time?
  • Effectively, this means that future information
    is used for updating the current beliefs!
  • However, the FG is not a tree now
  • Loopy message passing schedule
  • Too much data in case of Xbox Live
  • Let's do chess instead!
  • Makes sense: the game graph is not very
    connected in time
  • Hard to have a fair comparison between players
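
A deliberately simplified sketch of what passing messages back in time buys: one player's skill drifts as a Gaussian random walk, and each period yields a noisy Gaussian measurement of it (a stand-in for game results, which would need EP). Because this chain is a tree with Gaussian factors, a forward (filtering) pass followed by a backward (smoothing) pass is exact. The function name and all parameter values are assumptions for illustration.

```python
def filter_and_smooth(ys, prior_mean=0.0, prior_var=1.0, drift_var=0.1, obs_var=0.5):
    """Forward and backward Gaussian message passing on a single skill chain."""
    n = len(ys)
    f_mean, f_var = [0.0] * n, [0.0] * n
    m, v = prior_mean, prior_var
    # Forward pass: information flows only forward in time (filtering).
    for t, y in enumerate(ys):
        if t > 0:
            v += drift_var  # skill may drift between periods
        k = v / (v + obs_var)  # combine prediction and measurement (product of Gaussians)
        m = m + k * (y - m)
        v = (1.0 - k) * v
        f_mean[t], f_var[t] = m, v
    # Backward pass (Rauch-Tung-Striebel smoother): future data updates past beliefs.
    s_mean, s_var = f_mean[:], f_var[:]
    for t in range(n - 2, -1, -1):
        pred_var = f_var[t] + drift_var  # predicted variance for period t + 1
        g = f_var[t] / pred_var
        s_mean[t] = f_mean[t] + g * (s_mean[t + 1] - f_mean[t])
        s_var[t] = f_var[t] + g * g * (s_var[t + 1] - pred_var)
    return f_mean, f_var, s_mean, s_var

if __name__ == "__main__":
    fm, fv, sm, sv = filter_and_smooth([0.0, 0.0, 2.0, 2.0])
    print(fm, sm)
```

With measurements `[0.0, 0.0, 2.0, 2.0]`, the filtered estimate for the first period stays at 0, while the smoothed estimate is pulled upward by the later results, which is the effect of using future information to update current beliefs. The last period is unchanged, since no future data exists for it.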

30
Chess Factor Graph
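[Figure: chess factor graph for games between Morphy and Paulsen in 1857: skills S1, S2; performances P1, P2 with performance noise; difference D = P1 − P2; game outcomes such as Morphy > Paulsen observed as D > ε]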
31
Chess dataset
  • Characteristics:
  • 88 players
  • 15,664 games
  • 300 games per player
  • 60,000 hidden variables
  • 150,000 edges
  • Priors set to match ELO:
  • Mean: 2704
  • Std. dev.: 100

32
Results
33
Chess results
  • Inflation over time?
  • Who's the best player of all time?
  • Kasparov?
  • Fischer?
  • Morphy?
  • Data set limitations:
  • No individual game results, only tournaments
  • Runs up to 1991
  • 88 best players only

34
Final words
  • TrueSkill™ for Xbox Live: mature tech
  • TrueChess: quite experimental
  • Inference in loopy graphs is hard
  • Other applications:
  • Ranking Go moves (ICML 06, Snowbird)
  • Social matchmaking (future Best NIPS paper?)
  • Oral presentation @ NIPS this year

35
That's it
  • Thank you!