Title: Collaborative Ordinal Regression
1. Collaborative Ordinal Regression
- Shipeng Yu
- Joint work with Kai Yu, Volker Tresp
- and Hans-Peter Kriegel
- University of Munich, Germany
- Siemens Corporate Technology
- shipeng.yu@gmail.com
2. Motivations

[Figure: a movie-rating table. Rows: Superman, The Pianist, Star Wars, The Matrix, The Godfather, American Beauty. Each movie has features (Genre, Actors, Directors, Descriptions) and a rating on an ordered scale from Very Dislike to Very Like; one rating is unknown (?). Predicting the ordered rating from the features is Ordinal Regression.]
3. Motivations (Cont.)

[Figure: the same movies and features, now with rating columns for several users and many missing ratings (?). Learning these related rating functions jointly is Collaborative Ordinal Regression.]
4. Outline
- Motivations
- Ranking Problem
- Bayesian Framework for Ordinal Regression
- Collaborative Ordinal Regression
- Learning and Inference
- Experiments
- Conclusion and Extensions
5. Ranking Problem
- Goal: Assign ranks to objects
- Different from classification/regression problems
  - Binary classification: has only 2 labels
  - Multi-class classification: ignores the ordering property
  - Regression: only deals with real-valued outputs
- Two related settings: Ordinal Regression and Preference Learning
6. Ordinal Regression
- Goal: Assign ordered labels to objects
- Applications
- User preference prediction
- Web ranking for search engines
7. One-task vs. Multi-task
- Common in real-world problems
  - Collaborative filtering: preference learning for multiple users
  - Web ranking: ranking of web pages for different queries
- Question: How can we learn related ranking tasks jointly?
  - Different ranking functions are correlated
  - Each function only ranks part of the data
9. Bayesian Ordinal Regression
- A conditional model on ranking outputs
- Ranking likelihood: conditional on the latent function
- Prior: Gaussian process prior for the latent function
- Marginal ranking likelihood: integrate out the latent function values
- Ordinal regression likelihood
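In symbols (notation mine, following standard GP conventions rather than the slide itself), the framework is:

```latex
% GP prior over the latent function values f = (f(x_1), ..., f(x_n))^T
\mathbf{f} \mid X \sim \mathcal{N}(\mathbf{0}, \mathbf{K}),
\qquad K_{ij} = k(x_i, x_j)

% Marginal ranking likelihood: integrate out the latent function
P(\mathbf{y} \mid X) = \int P(\mathbf{y} \mid \mathbf{f}) \,
  p(\mathbf{f} \mid X) \, d\mathbf{f}
```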
10. Bayesian Ordinal Regression (1)
- Need to define the ranking likelihood
- Example Model (1): GP Regression (GPR)
  - Assume a Gaussian form
  - Regression on the ranking labels directly
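A minimal sketch of the GPR idea, assuming an RBF base kernel and 1-D inputs (the function names and toy data are mine, not from the slides):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    """RBF base kernel between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gpr_predict(X_train, y_train, X_test, noise=0.1):
    """GPR model: ordinary GP regression applied directly to the rank labels."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    alpha = np.linalg.solve(K, y_train)          # K^{-1} y
    return rbf_kernel(X_test, X_train) @ alpha   # posterior mean at X_test

# Toy usage: three items with rank labels 1 < 2 < 3 along a line;
# the real-valued posterior mean is rounded to the nearest rank.
pred = gpr_predict(np.array([0.0, 1.0, 2.0]),
                   np.array([1.0, 2.0, 3.0]),
                   np.array([1.0]))
```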
11. Bayesian Ordinal Regression (2)
- Need to define the ranking likelihood
- Example Model (2): GP Ordinal Regression (GPOR) (Chu & Ghahramani, 2005)
  - A probit ranking likelihood
  - Assign labels based on the surrounding area
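The probit likelihood of GPOR partitions the real line into R ordered bins with thresholds b_0 = -∞ < b_1 < ... < b_R = +∞ (σ is a noise scale; notation follows Chu & Ghahramani, 2005, not the slide):

```latex
P\bigl(y_i = r \mid f(x_i)\bigr)
  = \Phi\!\left(\frac{b_r - f(x_i)}{\sigma}\right)
  - \Phi\!\left(\frac{b_{r-1} - f(x_i)}{\sigma}\right)
```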
13. Multi-task Setting?
- Naïve approach 1: Learn a separate GP model for each task
  - No sharing of information between tasks
- Naïve approach 2: Fit one parametric kernel jointly
  - The parametric kernel is too restrictive to fit all tasks
- The collaborative effects
  - Common preferences: functions share similar regression labels on some items
  - Similar variabilities: functions tend to have the same predictability on similar items
14. Collaborative Ordinal Regression
- Hierarchical GP model for multi-task ordinal regression
  - Mean function: models common preferences
  - Covariance matrix: models similar variabilities
- Both the mean function and the (non-stationary) covariance matrix are learned from data
- Two ingredients: a GP prior and an ordinal regression likelihood
15. COR: The Model
- Hierarchical Bayes model on functions
- All the latent functions are sampled from the same GP prior
- Allow different parameter settings for different tasks
- We may only observe part of the rank labels for each function
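The hierarchical model can be written as (notation assumed; O_j denotes the observed items of task j):

```latex
% Shared GP prior with learned mean and covariance, for all m tasks
f_j \sim \mathcal{GP}(\mu, \mathbf{K}), \qquad j = 1, \dots, m

% Per-task ranking likelihood on the observed subset of rank labels
P(\mathbf{y}_j \mid \mathbf{f}_j)
  = \prod_{i \in O_j} P\bigl(y_{ij} \mid f_j(x_i)\bigr)
```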
16. COR: The Key Points
- The GP prior connects all ordinal regression tasks
  - Models the first and second sufficient statistics
- The lower-level features are incorporated naturally
  - More general than pure collaborative filtering
- We don't fix a parametric form for the kernel
  - Instead we assign the conjugate prior
- We can make predictions for new input data and new tasks
17. Toy Problem (GPR Model)

[Figure: new-task predictions on toy data, comparing the base kernel (RBF) with the learned kernel; panels show the mean rank labels and the learned mean function.]
19. Learning
- Variational lower bound
- EM learning
  - E-step: Approximate each posterior as a Gaussian; estimate the mean vector and covariance matrix using EP
  - M-step: Fix the posteriors and maximize w.r.t. the GP prior and the likelihood parameters
20. E-step
- The true posterior distribution factorizes
- EP procedures
  - Deletion: delete a factor from the approximated Gaussian
  - Moment matching: match moments after adding back the true likelihood
  - Update: update the factor
- Can be done analytically for the example models
  - For the GPR model the EP step is exact
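For the GPR likelihood the E-step needs no EP approximation: conditioning the Gaussian prior N(mu, K) on partial noisy rank labels is closed form. A sketch (function name and interface are mine):

```python
import numpy as np

def gpr_posterior(mu, K, obs_idx, y_obs, noise=0.1):
    """Exact Gaussian posterior of f given partial noisy rank labels.

    Prior: f ~ N(mu, K) over all items.
    Observations: y = f[obs_idx] + Gaussian noise, on a subset of items.
    """
    K_oo = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
    K_ao = K[:, obs_idx]
    gain = K_ao @ np.linalg.inv(K_oo)          # "Kalman gain" for the subset
    post_mean = mu + gain @ (y_obs - mu[obs_idx])
    post_cov = K - gain @ K_ao.T
    return post_mean, post_cov
```

The posterior mean and covariance of each task are exactly the moments the M-step consumes.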
21. M-step
- Update the GP prior
  - Does not depend on the form of the ranking likelihood
  - The conjugate prior corresponds to a smoothing term
- Update the likelihood parameters
  - Done separately for each task
  - Same update equation as in the single-task case
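A plausible form of these updates, following the multi-task GP literature (the exact equations are not in the extracted slides; here μ̃_j, C̃_j are the E-step posterior moments of task j, K_0 is the base kernel, and τ weights the conjugate inverse-Wishart prior):

```latex
\mu \leftarrow \frac{1}{m} \sum_{j=1}^{m} \tilde{\mu}_j

\mathbf{K} \leftarrow \frac{1}{m + \tau} \left[
  \sum_{j=1}^{m} \Bigl( \tilde{C}_j
    + (\tilde{\mu}_j - \mu)(\tilde{\mu}_j - \mu)^{\!\top} \Bigr)
  + \tau \mathbf{K}_0 \right]
```

The τK_0 term is the smoothing effect of the conjugate prior mentioned on the slide: with few tasks the learned kernel stays close to the base kernel.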
22. Inference
- Ordinal regression on new data
- The non-stationary kernel on test data is unknown!
- Solution: work in the dual space (Yu et al., 2005)
  - Express the posterior over the latent function in terms of the training kernel matrix
  - A constraint on the learned kernel lets the posterior extend to test data in closed form
24. Experiments
- Predict user ratings on movie data
  - MovieLens: 591 movies, 943 users; 19 binary features from the Genre part of each movie
  - EachMovie: 1,075 movies, 72,916 users; 23,753 features from an online database (TF-IDF)
- Experimental settings
  - Pick the 100 users with the most ratings as tasks
  - Randomly choose 10, 20, 50 ratings for each user for training
  - Base kernel: cosine similarity
25. Comparison Metrics
- Ordinal regression evaluation
  - Mean absolute error (MAE)
  - Mean zero-one error (MZOE)
  - Use macro & micro averages over multiple tasks
- Ranking evaluation
  - Normalized Discounted Cumulative Gain (NDCG)
  - NDCG@10: only count the top 10 ranked items
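The metrics can be sketched in a few lines (the (2^rel − 1)/log2(i+1) gain is one common NDCG convention; the slides do not specify which variant was used):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted rank labels."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mzoe(y_true, y_pred):
    """Mean zero-one error: fraction of incorrectly predicted labels."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def ndcg_at_k(y_true, scores, k=10):
    """NDCG@k with the common (2^rel - 1) / log2(i + 1) gain."""
    order = np.argsort(scores)[::-1][:k]          # predicted ranking, top k
    gains = 2.0 ** np.asarray(y_true)[order] - 1.0
    discounts = np.log2(np.arange(2, len(order) + 2))
    dcg = np.sum(gains / discounts)
    ideal = np.sort(np.asarray(y_true))[::-1][:k]  # best possible ordering
    idcg = np.sum((2.0 ** ideal - 1.0) / np.log2(np.arange(2, len(ideal) + 2)))
    return dcg / idcg if idcg > 0 else 0.0
```

A perfectly ordered ranking scores NDCG@k = 1; any misordering of relevant items lowers it.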
26. Results - MovieLens
- N: number of training items for each user
- MMMF: Maximum Margin Matrix Factorization (Srebro et al., 2005), a state-of-the-art collaborative filtering model
27. Results - EachMovie
- N: number of training items for each user
- MMMF: Maximum Margin Matrix Factorization (Srebro et al., 2005), a state-of-the-art collaborative filtering model
28. New Ranking Functions
- Test on the remaining MovieLens users, using different kernels
- The more users we use for training, the better the kernel we obtain!
29. Observations
- Collaborative models are always better than individual models
- We can learn a good non-stationary kernel from the users
- GPR & CGPR are fast in training and robust in testing
  - Since there is no approximation
- GPOR & CGPOR are slow and sometimes overfit
  - Due to the numerical M-step
- Other ranking likelihoods can be used
  - Then we may need numerical integration in the EP step
31. Conclusion
- A Bayesian framework for multi-task ordinal regression
- An efficient EM-EP learning algorithm
- COR is better than individual OR algorithms
- COR is better than pure collaborative filtering
- Experiments show very encouraging results
32. Extensions
- The framework is applicable to preference learning
  - A collaborative version of GP preference learning (Chu & Ghahramani, 2005)
  - A probabilistic version of RankNet (Burges et al., 2005)
- GP mixture model for multi-task learning
  - Assign a Gaussian mixture model to each latent function
  - Prediction uses a linear combination of learned kernels
  - Connection to Dirichlet processes
33. Thanks!