Title: Collaborative Ordinal Regression
1. Collaborative Ordinal Regression
- Shipeng Yu
- Joint work with Kai Yu, Volker Tresp
- and Hans-Peter Kriegel
- University of Munich, Germany
- Siemens Corporate Technology
- shipeng.yu@gmail.com
2. Motivations

[Figure: a movie-rating table. Rows: Superman, The Pianist, Star Wars, The Matrix, The Godfather, American Beauty. Each movie has features (Genre, Actors, Directors, Descriptions) and a rating on an ordered scale from Very Dislike to Very Like; one rating is unknown (?). Predicting the ordered rating from the features is Ordinal Regression.]
3. Motivations (Cont.)

[Figure: the same movies and features, now with rating columns for several users and many missing ratings (?). Learning these related rating functions jointly is Collaborative Ordinal Regression.]
4. Outline
- Motivations
- Ranking Problem
- Bayesian Framework for Ordinal Regression
- Collaborative Ordinal Regression
- Learning and Inference
- Experiments
- Conclusion and Extensions
5. Ranking Problem
- Goal: Assign ranks to objects
- Different from classification/regression problems
  - Binary classification: has only 2 labels
  - Multi-class classification: ignores the ordering property
  - Regression: only deals with real-valued outputs
- Two related settings: Ordinal Regression and Preference Learning
6. Ordinal Regression
- Goal: Assign ordered labels to objects
- Applications
- User preference prediction
- Web ranking for search engines
7. One-task vs. Multi-task
- Common in real-world problems
  - Collaborative filtering: preference learning for multiple users
  - Web ranking: ranking of web pages for different queries
- Question: How can we learn related ranking tasks jointly?
  - Different ranking functions are correlated
  - Each function only ranks part of the data
9. Bayesian Ordinal Regression
- A conditional model on ranking outputs
- Ranking likelihood: conditional on the latent function
- Prior: Gaussian process prior for the latent function
- Marginal ranking likelihood: integrate out the latent function values
- Ordinal regression likelihood
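In symbols (notation mine, following standard GP conventions rather than the slide itself), the framework is:

```latex
% GP prior over the latent function values f = (f(x_1), ..., f(x_n))^T
\mathbf{f} \mid X \sim \mathcal{N}(\mathbf{0}, \mathbf{K}),
\qquad K_{ij} = k(x_i, x_j)

% Marginal ranking likelihood: integrate out the latent function
P(\mathbf{y} \mid X) = \int P(\mathbf{y} \mid \mathbf{f}) \,
  p(\mathbf{f} \mid X) \, d\mathbf{f}
```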
10. Bayesian Ordinal Regression (1)
- Need to define the ranking likelihood
- Example Model (1): GP Regression (GPR)
  - Assume a Gaussian form
  - Regression on the ranking labels directly
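A minimal sketch of the GPR idea, assuming an RBF base kernel and 1-D inputs (the function names and toy data are mine, not from the slides):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """RBF base kernel between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gpr_predict(X_train, y_train, X_test, noise=0.1):
    """GPR model: ordinary GP regression applied directly to the rank labels."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)          # K^{-1} y
    return rbf_kernel(X_test, X_train) @ alpha   # posterior mean at X_test

# Toy usage: three items with rank labels 1 < 2 < 3 along a line;
# the real-valued posterior mean is rounded to the nearest rank.
pred = gpr_predict(np.array([0.0, 1.0, 2.0]),
                   np.array([1.0, 2.0, 3.0]),
                   np.array([1.0]))
```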
11. Bayesian Ordinal Regression (2)
- Need to define the ranking likelihood
- Example Model (2): GP Ordinal Regression (GPOR) (Chu & Ghahramani, 2005)
  - A probit ranking likelihood
  - Assign labels based on the surrounding area
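The probit likelihood of GPOR partitions the real line into R ordered bins with thresholds b_0 = -∞ < b_1 < ... < b_R = +∞ (σ is a noise scale; notation follows Chu & Ghahramani, 2005, not the slide):

```latex
P\bigl(y_i = r \mid f(x_i)\bigr)
  = \Phi\!\left(\frac{b_r - f(x_i)}{\sigma}\right)
  - \Phi\!\left(\frac{b_{r-1} - f(x_i)}{\sigma}\right)
```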
13. Multi-task Setting?
- Naïve approach 1: Learn a separate GP model for each task
  - No sharing of information between tasks
- Naïve approach 2: Fit one parametric kernel jointly
  - The parametric kernel is too restrictive to fit all tasks
- The collaborative effects
  - Common preferences: functions share similar regression labels on some items
  - Similar variabilities: functions tend to have the same predictability on similar items
14. Collaborative Ordinal Regression
- Hierarchical GP model for multi-task ordinal regression
  - Mean function: models common preferences
  - Covariance matrix: models similar variabilities
- Both the mean function and the (non-stationary) covariance matrix are learned from data
- Two ingredients: a GP prior and an ordinal regression likelihood
15. COR: The Model
- Hierarchical Bayes model on functions
- All the latent functions are sampled from the same GP prior
- Allow different parameter settings for different tasks
- We may only observe part of the rank labels for each function
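The hierarchical model can be written as (notation assumed; O_j denotes the observed items of task j):

```latex
% Shared GP prior with learned mean and covariance, for all m tasks
f_j \sim \mathcal{GP}(\mu, \mathbf{K}), \qquad j = 1, \dots, m

% Per-task ranking likelihood on the observed subset of rank labels
P(\mathbf{y}_j \mid \mathbf{f}_j)
  = \prod_{i \in O_j} P\bigl(y_{ij} \mid f_j(x_i)\bigr)
```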
16. COR: The Key Points
- The GP prior connects all ordinal regression tasks
  - Models the first and second sufficient statistics
- The lower-level features are incorporated naturally
  - More general than pure collaborative filtering
- We don't fix a parametric form for the kernel
  - Instead we assign the conjugate prior
- We can make predictions for new input data and new tasks
17. Toy Problem (GPR Model)

[Figure: new-task predictions on toy data, comparing the base kernel (RBF) with the learned kernel; panels show the mean rank labels and the learned mean function.]
19. Learning
- Variational lower bound
- EM learning
  - E-step: Approximate each posterior as a Gaussian; estimate the mean vector and covariance matrix using EP
  - M-step: Fix the posteriors and maximize w.r.t. the GP prior and the likelihood parameters
20. E-step
- The true posterior distribution factorizes
- EP procedures
  - Deletion: delete a factor from the approximated Gaussian
  - Moment matching: match moments after adding back the true likelihood
  - Update: update the factor
- Can be done analytically for the example models
  - For the GPR model the EP step is exact
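For the GPR likelihood the E-step needs no EP approximation: conditioning the Gaussian prior N(mu, K) on partial noisy rank labels is closed form. A sketch (function name and interface are mine):

```python
import numpy as np

def gpr_posterior(mu, K, obs_idx, y_obs, noise=0.1):
    """Exact Gaussian posterior of f given partial noisy rank labels.

    Prior: f ~ N(mu, K) over all items.
    Observations: y = f[obs_idx] + Gaussian noise, on a subset of items.
    """
    K_oo = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
    K_ao = K[:, obs_idx]
    gain = K_ao @ np.linalg.inv(K_oo)          # "Kalman gain" for the subset
    post_mean = mu + gain @ (y_obs - mu[obs_idx])
    post_cov = K - gain @ K_ao.T
    return post_mean, post_cov
```

The posterior mean and covariance of each task are exactly the moments the M-step consumes.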
21. M-step
- Update the GP prior
  - Does not depend on the form of the ranking likelihood
  - The conjugate prior corresponds to a smoothing term
- Update the likelihood parameters
  - Done separately for each task
  - Same update equation as in the single-task case
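A plausible form of these updates, following the multi-task GP literature (the exact equations are not in the extracted slides; here μ̃_j, C̃_j are the E-step posterior moments of task j, K_0 is the base kernel, and τ weights the conjugate inverse-Wishart prior):

```latex
\mu \leftarrow \frac{1}{m} \sum_{j=1}^{m} \tilde{\mu}_j

\mathbf{K} \leftarrow \frac{1}{m + \tau} \left[
  \sum_{j=1}^{m} \Bigl( \tilde{C}_j
    + (\tilde{\mu}_j - \mu)(\tilde{\mu}_j - \mu)^{\!\top} \Bigr)
  + \tau \mathbf{K}_0 \right]
```

The τK_0 term is the smoothing effect of the conjugate prior mentioned on the slide: with few tasks the learned kernel stays close to the base kernel.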
22. Inference
- Ordinal regression on new data
- The non-stationary kernel on test data is unknown!
- Solution: work in the dual space (Yu et al., 2005)
  - Express the posterior over the latent function in terms of the training kernel matrix
  - A constraint on the learned kernel lets the posterior extend to test data in closed form
24. Experiments
- Predict user ratings on movie data
  - MovieLens: 591 movies, 943 users; 19 binary features from the Genre part of each movie
  - EachMovie: 1,075 movies, 72,916 users; 23,753 features from an online database (TF-IDF)
- Experimental settings
  - Pick the 100 users with the most ratings as tasks
  - Randomly choose 10, 20, 50 ratings for each user for training
  - Base kernel: cosine similarity
25. Comparison Metrics
- Ordinal regression evaluation
  - Mean absolute error (MAE)
  - Mean zero-one error (MZOE)
  - Use macro & micro averages over multiple tasks
- Ranking evaluation
  - Normalized Discounted Cumulative Gain (NDCG)
  - NDCG@10: only count the top 10 ranked items
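The metrics can be sketched in a few lines (the (2^rel − 1)/log2(i+1) gain is one common NDCG convention; the slides do not specify which variant was used):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted rank labels."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mzoe(y_true, y_pred):
    """Mean zero-one error: fraction of incorrectly predicted labels."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def ndcg_at_k(y_true, scores, k=10):
    """NDCG@k with the common (2^rel - 1) / log2(i + 1) gain."""
    order = np.argsort(scores)[::-1][:k]          # predicted ranking, top k
    gains = 2.0 ** np.asarray(y_true)[order] - 1.0
    discounts = np.log2(np.arange(2, len(order) + 2))
    dcg = np.sum(gains / discounts)
    ideal = np.sort(np.asarray(y_true))[::-1][:k]  # best possible ordering
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, len(ideal) + 2)))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered ranking scores NDCG@k = 1; any misordering of relevant items lowers it.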
26. Results - MovieLens
- N: number of training items for each user
- MMMF: Maximum Margin Matrix Factorization (Srebro et al., 2005), a state-of-the-art collaborative filtering model
27. Results - EachMovie
- N: number of training items for each user
- MMMF: Maximum Margin Matrix Factorization (Srebro et al., 2005), a state-of-the-art collaborative filtering model
28. New Ranking Functions
- Test on the remaining MovieLens users, using different kernels
- The more users we use for training, the better the kernel we obtain!
29. Observations
- Collaborative models are always better than individual models
- We can learn a good non-stationary kernel from the users
- GPR & CGPR are fast in training and robust in testing
  - Since there is no approximation
- GPOR & CGPOR are slow and sometimes overfit
  - Due to the numerical M-step
- Other ranking likelihoods can be used
  - Then we may need numerical integration in the EP step
31. Conclusion
- A Bayesian framework for multi-task ordinal regression
- An efficient EM-EP learning algorithm
- COR is better than individual OR algorithms
- COR is better than pure collaborative filtering
- Experiments show very encouraging results
32. Extensions
- The framework is applicable to preference learning
  - A collaborative version of GP preference learning (Chu & Ghahramani, 2005)
  - A probabilistic version of RankNet (Burges et al., 2005)
- GP mixture model for multi-task learning
  - Assign a Gaussian mixture model to each latent function
  - Prediction uses a linear combination of learned kernels
  - Connection to Dirichlet processes
33. Thanks!