Fast Maximum Margin Matrix Factorization for Collaborative Prediction - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Fast Maximum Margin Matrix Factorization
for Collaborative Prediction
2
Collaborative Prediction
  • Based on a partially observed ratings matrix
  • Predict the unobserved entries

Will user i like movie j?
[Figure: partially observed ratings matrix; rows are users, columns are movies]
3
Problems to Address
  • Underlying representation of preferences
  • Norm constrained matrix factorization (MMMF)
  • Discrete, ordered labels
  • Threshold-based ordinal regression
  • Scaling up MMMF to large problems
  • Factorized objective, gradient descent
  • Ratings may not be missing at random

4
Linear Factor Model
[Figure: linear factor model: user preference weights applied to movie feature values]
5
Ordinal Regression
[Figure: ordinal regression: feature vectors projected onto a preference weight vector w]
6
Matrix Factorization
[Figure: feature vectors and preference weights yield preference scores; thresholds map the scores to ratings]
7
Matrix Factorization
[Figure: U (user weights) times V′ (movie features) gives X, whose thresholded entries predict the ratings Y]
8
Ordinal Regression
[Figure: ordinal regression, as before: feature vectors projected onto preference weights]
9
Max-Margin Ordinal Regression
Shashua & Levin, NIPS 2002
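The threshold mechanism behind this slide can be sketched in a few lines: learn a real-valued score plus an ordered set of thresholds, and read the rating off the interval the score falls into. This is a sketch, not the authors' code, and the threshold values below are made up for illustration.

```python
import numpy as np

def predict_ordinal(score, thresholds):
    """Threshold-based ordinal regression prediction: the rating is the
    index of the interval (between sorted thresholds) that the score
    falls into; labels run 1 .. len(thresholds) + 1."""
    return int(np.searchsorted(thresholds, score)) + 1

# Illustrative thresholds for a 1-5 rating scale (made up, not learned).
thresholds = [-1.5, -0.5, 0.5, 1.5]
print(predict_ordinal(-2.0, thresholds))  # 1: below every threshold
print(predict_ordinal(0.3, thresholds))   # 3: between -0.5 and 0.5
print(predict_ordinal(9.0, thresholds))   # 5: above every threshold
```

Max-margin learning then pushes each score at least a margin away from the thresholds bordering its true label.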
10
Absolute Difference
  • Shashua & Levin's loss bounds the
    misclassification error
  • Ordinal regression: we want to minimize the
    absolute difference between labels

11
All-Thresholds Loss
Chu & Keerthi, ICML 2005
Srebro et al., NIPS 2004
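The all-thresholds construction can be sketched as follows: every threshold contributes a hinge penalty, not only the two adjacent to the true label, so the loss grows with the number of thresholds the score violates and thereby tracks the absolute difference between labels. A sketch with made-up threshold values; h is the plain hinge here, though the smooth hinge can be substituted.

```python
import numpy as np

def hinge(z):
    return np.maximum(0.0, 1.0 - z)

def all_threshold_loss(score, y, thresholds):
    """All-thresholds ordinal regression loss: thresholds below the true
    label y should sit at least a margin of 1 below the score, and
    thresholds at or above y at least a margin of 1 above it."""
    loss = 0.0
    for r, theta in enumerate(thresholds, start=1):
        s = -1.0 if r < y else 1.0   # which side of the score theta_r belongs on
        loss += hinge(s * (theta - score))
    return loss

thresholds = [-1.5, -0.5, 0.5, 1.5]
# A score inside the correct interval incurs a small loss; a score far
# from its interval pays for every threshold it crosses.
print(all_threshold_loss(0.0, 3, thresholds))  # 1.0
print(all_threshold_loss(3.0, 1, thresholds))  # 16.0
```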
12
All-Thresholds Loss
  • Experiments compared:
  • Least-squares regression
  • Multi-class classification
  • Shashua & Levin's max-margin OR
  • All-thresholds OR
  • All-thresholds ordinal regression achieved:
  • Lowest misclassification error
  • Lowest absolute difference error

Rennie & Srebro, IJCAI Workshop 2005
13
Learning Weights & Features
[Figure: a partially observed matrix of 1-5 ratings is factorized into per-user weight vectors and per-movie feature vectors; thresholds map the resulting scores back to ratings]
14
Low-Rank Matrix Factorization
X ≈ UV′ with rank(X) ≤ k
  • Sum-squared loss, fully observed Y:
    use the SVD to find the global optimum
  • Classification error loss, partially observed Y:
    non-convex, no explicit solution
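The fully observed, sum-squared case can be checked directly with the SVD: by the Eckart-Young theorem the truncated SVD is the rank-k global optimum, and its squared error equals the sum of the discarded squared singular values. A small numpy sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 15))  # toy fully observed matrix
k = 3

# Truncated SVD: keep the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: no rank-k matrix has smaller sum-squared error,
# and the error equals the energy in the discarded singular values.
err = np.sum((Y - X) ** 2)
print(np.linalg.matrix_rank(X))          # k
print(np.isclose(err, np.sum(s[k:]**2))) # True
```

With a partially observed Y or a non-quadratic loss, this closed form no longer applies, which is exactly the gap the rest of the talk addresses.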
15
Norm-Constrained Factorization
‖X‖_tr = min over factorizations X = UV′ of (‖U‖²_Fro + ‖V‖²_Fro) / 2
‖U‖²_Fro = Σ_{i,j} U²_ij
Fazel et al., 2001
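This variational characterization of the trace (nuclear) norm can be verified numerically: every exact factorization X = UV′ gives (‖U‖²_Fro + ‖V‖²_Fro)/2 ≥ ‖X‖_tr, with equality at the "balanced" SVD factors. A sketch with a random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((8, 6))

# Trace (nuclear) norm = sum of singular values.
trace_norm = np.linalg.svd(X, compute_uv=False).sum()

# Balanced factors from the SVD: U = A sqrt(S), V = B sqrt(S),
# so that X = U V' and the Frobenius mass is split evenly.
A, s, Bt = np.linalg.svd(X, full_matrices=False)
U = A * np.sqrt(s)
V = Bt.T * np.sqrt(s)
balanced = (np.sum(U**2) + np.sum(V**2)) / 2   # equals trace_norm

# Any other exact factorization of the same X can only be larger:
G = rng.standard_normal((6, 6))
U2, V2 = U @ G, V @ np.linalg.inv(G).T         # still U2 @ V2.T == X
other = (np.sum(U2**2) + np.sum(V2**2)) / 2
print(np.isclose(balanced, trace_norm))  # True
print(other >= trace_norm - 1e-9)        # True
```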
16
MMMF Objective
Original objective (with the all-thresholds loss):
min_X ‖X‖_tr + c · loss(X, Y)
Srebro et al., NIPS 2004
17
Smooth Hinge
[Plots: the hinge loss, the smooth hinge, and their gradients]
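A sketch of the smoothed hinge used for gradient-based optimization: the quadratic middle piece makes the gradient continuous at z = 0 and z = 1, unlike the plain hinge, which has a kink at z = 1.

```python
import numpy as np

def smooth_hinge(z):
    """Smooth hinge: 1/2 - z for z <= 0, (1 - z)^2 / 2 for 0 < z < 1,
    0 for z >= 1; the three pieces meet with matching slopes."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= 1, 0.0,
           np.where(z <= 0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def smooth_hinge_grad(z):
    """Derivative: -1 for z <= 0, z - 1 for 0 < z < 1, 0 for z >= 1."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= 1, 0.0, np.where(z <= 0, -1.0, z - 1.0))

print(smooth_hinge(0.0))       # 0.5: linear and quadratic pieces agree
print(smooth_hinge_grad(0.5))  # -0.5: gradient interpolates -1 .. 0
```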
18
Collaborative Prediction Results
Comparison against the URP and Attitude models (Marlin, 2004)
19
Local Minima?
Factorized objective:
min_{U,V} (‖U‖²_Fro + ‖V‖²_Fro) / 2 + c · loss(UV′, Y)
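A gradient step on this factorized objective can be sketched as below. This toy version uses binary ±1 labels with the smooth hinge rather than the paper's all-thresholds loss, and the sizes, sparsity, and step size are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k, c, lr = 30, 20, 5, 10.0, 0.005

# Toy data: binary +/-1 labels, roughly 30% of entries observed.
Y = np.sign(rng.standard_normal((n, m)))
mask = rng.random((n, m)) < 0.3

U = 0.1 * rng.standard_normal((n, k))
V = 0.1 * rng.standard_normal((m, k))

def smooth_hinge(z):
    return np.where(z >= 1, 0.0, np.where(z <= 0, 0.5 - z, 0.5 * (1 - z) ** 2))

def smooth_hinge_grad(z):
    return np.where(z >= 1, 0.0, np.where(z <= 0, -1.0, z - 1.0))

def objective(U, V):
    z = Y * (U @ V.T)
    return (np.sum(U**2) + np.sum(V**2)) / 2 + c * np.sum(smooth_hinge(z)[mask])

def grad_step(U, V):
    z = Y * (U @ V.T)
    D = smooth_hinge_grad(z) * Y      # d loss / d X, entrywise
    D[~mask] = 0.0                    # only observed entries contribute
    return U - lr * (U + c * D @ V), V - lr * (V + c * D.T @ U)

start = objective(U, V)
for _ in range(200):
    U, V = grad_step(U, V)
print(objective(U, V) < start)  # True: plain gradient descent makes progress
```

The objective is non-convex in (U, V), which is why the next slide asks whether gradient descent gets trapped in local minima.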
20
Local Minima?
[Figure: entrywise difference between the matrices X and Y]
Data: 100 × 100 MovieLens submatrix, 65% sparse
21
Summary
  • We scaled MMMF to large problems by optimizing
    the Factorized Objective
  • Empirical tests indicate that local minima issues
    are rare or absent
  • Results on large-scale data show substantial
    improvements over state-of-the-art

d'Aspremont & Srebro: large-scale SDP
optimization methods. Train on 1.5 million
binary labels in 20 hours.