Collaborative Ordinal Regression



1
Collaborative Ordinal Regression
  • Shipeng Yu
  • Joint work with Kai Yu, Volker Tresp
  • and Hans-Peter Kriegel
  • University of Munich, Germany
  • Siemens Corporate Technology
  • shipeng.yu@gmail.com

2
Motivations
[Figure: a table of movies (Superman, The Pianist, Star Wars, The Matrix, The Godfather, American Beauty) with features (Genre, Actors, Directors, Descriptions) and one user's ratings on a scale from Very Dislike to Very Like; the missing ratings are to be predicted]
Predicting ordered ratings from item features is an Ordinal Regression problem.
3
Motivations (Cont.)
[Figure: the same movies and features, now with ratings from multiple users; each user has rated only some of the movies, and the missing ratings are to be predicted jointly]
Learning the users' ranking functions jointly is Collaborative Ordinal Regression.
4
Outline
  • Motivations
  • Ranking Problem
  • Bayesian Framework for Ordinal Regression
  • Collaborative Ordinal Regression
  • Learning and Inference
  • Experiments
  • Conclusion and Extensions

5
Ranking Problem
  • Goal: Assign ranks to objects
  • Different from classification/regression problems
  • Binary classification: has only 2 labels
  • Multi-class classification: ignores the ordering
    property
  • Regression: only deals with real-valued outputs

Two flavors: Ordinal Regression and Preference Learning
6
Ordinal Regression
  • Goal: Assign ordered labels to objects
  • Applications
  • User preference prediction
  • Web ranking for search engines

7
One-task vs Multi-task
  • Common in real-world problems
  • Collaborative filtering: preference learning for
    multiple users
  • Web ranking: ranking of web pages for different
    queries
  • Question: How to learn related ranking tasks
    jointly?

Different ranking functions are correlated
Each function only ranks part of the data
8
Outline
  • Motivations
  • Ranking Problem
  • Bayesian Framework for Ordinal Regression
  • Collaborative Ordinal Regression
  • Learning and Inference
  • Experiments
  • Conclusion and Extensions

9
Bayesian Ordinal Regression
  • Conditional model on ranking outputs
  • Ranking likelihood: conditional on the latent
    function
  • Prior: Gaussian Process prior on the latent function
  • Marginal ranking likelihood: integrate out the latent
    function values
  • Ordinal regression likelihood
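The equations on this slide were lost in transcription; a plausible reconstruction of the framework described above, with assumed symbols (latent function values f, labels y, inputs X):

```latex
% GP prior on the latent function values (mean mu, kernel matrix K):
\[
  p(\mathbf{f} \mid X) = \mathcal{N}(\mathbf{f} \mid \boldsymbol{\mu}, K)
\]
% Ranking likelihood, conditional on the latent function:
\[
  p(\mathbf{y} \mid \mathbf{f}) = \prod_{i=1}^{n} p(y_i \mid f_i)
\]
% Marginal ranking likelihood, integrating out the latent function values:
\[
  p(\mathbf{y} \mid X) = \int p(\mathbf{y} \mid \mathbf{f})\, p(\mathbf{f} \mid X)\, d\mathbf{f}
\]
```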

10
Bayesian Ordinal Regression (1)
  • Need to define the ranking likelihood
  • Example Model (1): GP Regression (GPR)
  • Assume a Gaussian form
  • Regress on the ranking label directly
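The GPR likelihood described above, reconstructed (the noise variance symbol is assumed): the rank label is treated as a real-valued output with Gaussian noise around the latent function.

```latex
\[
  p(y_i \mid f_i) = \mathcal{N}(y_i \mid f_i, \sigma^2)
\]
```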

11
Bayesian Ordinal Regression (2)
  • Need to define the ranking likelihood
  • Example Model (2): GP Ordinal Regression (GPOR)
    (Chu & Ghahramani, 2005)
  • A probit ranking likelihood
  • Assign labels based on the surrounding area of the
    latent function value
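The probit ranking likelihood of GPOR (Chu & Ghahramani, 2005), reconstructed with assumed threshold symbols: label r is assigned when the noisy latent value falls in the interval between thresholds b_{r-1} and b_r.

```latex
\[
  p(y_i = r \mid f_i)
  = \Phi\!\left(\frac{b_r - f_i}{\sigma}\right)
  - \Phi\!\left(\frac{b_{r-1} - f_i}{\sigma}\right),
  \qquad
  -\infty = b_0 < b_1 < \cdots < b_R = +\infty
\]
```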

12
Outline
  • Motivations
  • Ranking Problem
  • Bayesian Framework for Ordinal Regression
  • Collaborative Ordinal Regression
  • Learning and Inference
  • Experiments
  • Conclusion and Extensions

13
Multi-task Setting?
  • Naïve approach 1: Learn a GP model for each task
  • No sharing of information between tasks
  • Naïve approach 2: Fit one parametric kernel
    jointly
  • The parametric kernel is too restrictive to fit
    all tasks
  • The collaborative effects:
  • Common preferences: functions share similar
    regression labels on some items
  • Similar variabilities: functions tend to have the
    same predictability on similar items

14
Collaborative Ordinal Regression
  • Hierarchical GP model for multi-task ordinal
    regression
  • Mean function: models common preferences
  • Covariance matrix: models similar variabilities
  • Both the mean function and the (non-stationary)
    covariance matrix are learned from data

(Model structure: GP prior + ordinal regression likelihood)
15
COR The Model
  • Hierarchical Bayes model on functions
  • All the latent functions are sampled from the
    same GP prior
  • Allow different parameter settings for different
    tasks
  • We may only observe part of the rank labels for
    each function
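The generative story above can be sketched numerically. This is an illustrative simulation, not the authors' code: all sizes, the RBF base kernel, and the rating thresholds are assumptions. Every task's latent function is drawn from the same GP prior, and each task observes only part of its rank labels.

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, n_tasks = 8, 3                     # items to rank, ranking tasks (users)
X = rng.normal(size=(n_items, 2))           # item features (lower-level features)

# Shared GP prior: zero mean and an RBF kernel as covariance (hypothetical setup;
# in COR both the mean and the non-stationary covariance are learned from data).
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq_dists) + 1e-6 * np.eye(n_items)
m = np.zeros(n_items)

# All latent functions are sampled from the SAME GP prior --
# this is what couples the ordinal regression tasks.
F = rng.multivariate_normal(m, K, size=n_tasks)    # shape (n_tasks, n_items)

# Ordinal labels via fixed thresholds (5-point scale, assumed).
thresholds = np.array([-1.0, -0.3, 0.3, 1.0])
Y = np.digitize(F, thresholds) + 1                 # labels in {1, ..., 5}

# Each task observes only part of its rank labels.
mask = rng.random(Y.shape) < 0.6
print(Y.shape, mask.mean())
```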

16
COR The Key Points
  • The GP prior connects all ordinal regression
    tasks
  • Models the first and second sufficient statistics
    (mean and covariance)
  • Lower-level features are incorporated
    naturally
  • More general than pure collaborative filtering
  • We don't fix a parametric form for the kernel
  • Instead we assign it a conjugate prior
  • We can make predictions for new input data and
    new tasks

17
Toy Problem (GPR Model)
[Figure: toy problem results — mean rank labels, the learned mean function, and new-task predictions with the base (RBF) kernel vs. the learned kernel]
18
Outline
  • Motivations
  • Ranking Problem
  • Bayesian Framework for Ordinal Regression
  • Collaborative Ordinal Regression
  • Learning and Inference
  • Experiments
  • Conclusion and Extensions

19
Learning
  • Variational lower bound
  • EM Learning
  • E-step: Approximate each posterior as a
    Gaussian
  • Estimate the mean vector and covariance matrix
    using EP
  • M-step: Fix the posteriors and maximize w.r.t.
    the GP prior and the likelihood parameters
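A minimal numerical sketch of the EM loop for the collaborative GPR case, where (as noted on the next slide) the E-step is exact because the Gaussian likelihood gives a closed-form Gaussian posterior. The update equations here are illustrative reconstructions, not the paper's exact ones, and the data is random.

```python
import numpy as np

def em_collaborative_gpr(Y, K0, sigma2=0.1, n_iter=20):
    """Illustrative EM for collaborative GPR.
    Y: (n_tasks, n_items) rank labels; K0: initial kernel matrix."""
    n_tasks, n = Y.shape
    m, K = np.zeros(n), K0.copy()
    for _ in range(n_iter):
        # E-step: exact Gaussian posterior for each task's latent function.
        Kinv = np.linalg.inv(K + 1e-8 * np.eye(n))
        S = np.linalg.inv(Kinv + np.eye(n) / sigma2)     # shared posterior cov
        Mu = S @ (Kinv @ m[:, None] + Y.T / sigma2)      # posterior means (n, n_tasks)
        # M-step: re-estimate the shared mean and (non-stationary) covariance
        # from the task posteriors; the conjugate prior of the paper would add
        # a smoothing term toward a base kernel here.
        m = Mu.mean(axis=1)
        diff = Mu - m[:, None]
        K = S + (diff @ diff.T) / n_tasks
    return m, K

# Tiny demo with random data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 2))
K0 = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)) + 1e-6 * np.eye(5)
Y = rng.integers(1, 6, size=(4, 5)).astype(float)
m_hat, K_hat = em_collaborative_gpr(Y, K0)
```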

20
E-step
  • The true posterior distribution factorizes
  • EP procedure:
  • Deletion: Delete one factor from the
    approximated Gaussian
  • Moment matching: Match moments against the true
    likelihood
  • Update: Update the deleted factor
  • Can be done analytically for the example models
  • For the GPR model the EP step is exact

21
M-step
  • Update the GP prior
  • Does not depend on the form of the ranking likelihood
  • The conjugate prior corresponds to a smoothing term
  • Update the likelihood parameters
  • Done separately for each task
  • Same update equation as in the single-task
    case

22
Inference
  • Ordinal Regression
  • The non-stationary kernel on test data is unknown!
  • Solution: work in the dual space (Yu et al., 2005)
  • The posterior constraint yields a dual-space posterior,
    from which predictions for test data follow

23
Outline
  • Motivations
  • Ranking Problem
  • Bayesian Framework for Ordinal Regression
  • Collaborative Ordinal Regression
  • Learning and Inference
  • Experiments
  • Conclusion and Extensions

24
Experiments
  • Predict user ratings on movie data
  • MovieLens: 591 movies, 943 users
  • 19 binary features from the Genre part of each
    movie
  • EachMovie: 1,075 movies, 72,916 users
  • 23,753 features from an online database (TF-IDF)
  • Experimental settings:
  • Pick the 100 users with the most ratings as
    tasks
  • Randomly choose 10, 20, or 50 ratings from each user
    for training
  • Base kernel: cosine similarity

25
Comparison Metrics
  • Ordinal Regression Evaluation
  • Mean absolute error (MAE)
  • Mean zero-one error (MZOE)
  • Use macro & micro averages over multiple tasks
  • Ranking Evaluation
  • Normalized Discounted Cumulative Gain (NDCG)
  • NDCG@10: only count the top 10 ranked items
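The three metrics above can be sketched as follows. The MAE and MZOE definitions are standard; for NDCG the slides do not state the exact gain/discount convention, so the common (2^rel − 1) / log2(pos + 1) form is assumed. Macro averaging would simply mean averaging these per-task scores over users.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted rank labels."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mzoe(y_true, y_pred):
    """Mean zero-one error: fraction of items with a wrongly predicted label."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def ndcg_at_k(y_true, scores, k=10):
    """NDCG@k, assuming the common (2^rel - 1) / log2(pos + 1) convention."""
    y_true = np.asarray(y_true, dtype=float)
    order = np.argsort(-np.asarray(scores))[:k]   # top-k items by predicted score
    ideal = np.sort(y_true)[::-1][:k]             # top-k relevances in ideal order
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg = np.sum((2.0 ** y_true[order] - 1) * discounts[: len(order)])
    idcg = np.sum((2.0 ** ideal - 1) * discounts[: len(ideal)])
    return dcg / idcg if idcg > 0 else 0.0
```

A perfect ranking gives NDCG@k of exactly 1, since the predicted order then matches the ideal order.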

26
Results - MovieLens
  • N: Number of training items for each user
  • MMMF: Maximum Margin Matrix Factorization (Srebro
    et al., 2005)
  • A state-of-the-art collaborative filtering model

27
Results - EachMovie
  • N: Number of training items for each user
  • MMMF: Maximum Margin Matrix Factorization (Srebro
    et al., 2005)
  • A state-of-the-art collaborative filtering model

28
New Ranking Functions
Test on the remaining MovieLens users, using different kernels.
The more users we use for training, the better the
kernel we obtain!
29
Observations
  • Collaborative models are always better than
    individual models
  • We can learn a good non-stationary kernel from
    users
  • GPR & CGPR are fast in training and robust in
    testing
  • Since there is no approximation
  • GPOR & CGPOR are slow and sometimes overfit
  • Due to the numerical M-step
  • We could use other ranking likelihoods
  • Then we may need numerical integration in the
    EP step

30
Outline
  • Motivations
  • Ranking Problem
  • Bayesian Framework for Ordinal Regression
  • Collaborative Ordinal Regression
  • Learning and Inference
  • Experiments
  • Conclusion and Extensions

31
Conclusion
  • A Bayesian framework for multi-task ordinal
    regression
  • An efficient EM-EP learning algorithm
  • COR is better than individual OR algorithms
  • COR is better than pure collaborative filtering
  • Experiments show very encouraging results

32
Extensions
  • The framework is applicable to preference
    learning
  • Collaborative version of GP preference learning
    (Chu & Ghahramani, 2005)
  • A probabilistic version of RankNet (Burges et al.
    2005)
  • GP mixture model for multi-task learning
  • Assign a Gaussian mixture model to each latent
    function
  • Prediction uses a linear combination of learned
    kernels
  • Connection to Dirichlet Processes

33
Thanks!
  • Questions?