Title: Fast Maximum Margin Matrix Factorization for Collaborative Prediction

Slide 1: Fast Maximum Margin Matrix Factorization for Collaborative Prediction

Slide 2: Collaborative Prediction
- Based on a partially observed rating matrix
- Predict the unobserved entries: will user i like movie j?
[Figure: partially observed users x movies rating matrix]
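As a minimal sketch of the setting (toy data invented for illustration, not from the slides), the observed ratings can be stored in a matrix with a mask marking which entries are known; the task is to fill in the rest:

```python
import numpy as np

# Toy users x movies rating matrix; 0 marks an unobserved entry.
Y = np.array([
    [5, 0, 3, 0],   # user 0 rated movies 0 and 2
    [0, 2, 0, 4],   # user 1 rated movies 1 and 3
    [1, 0, 0, 5],   # user 2 rated movies 0 and 3
])
observed = Y > 0                          # mask of observed entries
print("observed ratings:", Y[observed])
print("entries to predict:", np.argwhere(~observed))
```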
Slide 3: Problems to Address

- Underlying representation of preferences
  - Norm-constrained matrix factorization (MMMF)
- Discrete, ordered labels
  - Threshold-based ordinal regression
- Scaling up MMMF to large problems
  - Factorized objective, gradient descent
- Ratings may not be missing at random
Slide 4: Linear Factor Model

[Figure: user preference weights combined with movie feature values to produce a rating]
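As an illustration of the linear factor model (all values invented), a user's preference score for a movie is the dot product of that user's preference weights with the movie's feature values:

```python
import numpy as np

user_weights   = np.array([1.2, -0.5, 0.8])   # one user's weights over k features
movie_features = np.array([0.9,  0.1, 1.5])   # one movie's feature values

score = user_weights @ movie_features          # real-valued preference score
print(score)
```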
Slide 5: Ordinal Regression

[Figure: feature vectors and preference weights w1 producing real-valued scores]
Slide 6: Matrix Factorization

[Figure: feature vectors and preference weights multiply to give preference scores, which are thresholded into ratings]
Slide 7: Matrix Factorization

[Figure: preference scores X = UV', thresholded to produce the rating matrix Y]
Slide 8: Ordinal Regression

[Figure: feature vectors and preference weights w1, revisited]
Slide 9: Max-Margin Ordinal Regression

Shashua & Levin, NIPS 2002
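A minimal sketch of the threshold idea behind ordinal regression (the threshold values here are invented): a real-valued preference score is mapped to a discrete rating by counting how many thresholds it exceeds.

```python
import numpy as np

thetas = np.array([-1.5, -0.5, 0.5, 1.5])   # 4 thresholds -> 5 rating levels

def predict_rating(x, thetas):
    # rating = 1 + number of thresholds the score x exceeds
    return 1 + int(np.sum(x > thetas))

print(predict_rating(0.7, thetas))   # -> 4
```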
Slide 10: Absolute Difference

- Shashua & Levin's loss bounds the misclassification error
- For ordinal regression we want to minimize the absolute difference between labels
Slide 11: All-Thresholds Loss

Chu & Keerthi, ICML 2005
Srebro et al., NIPS 2004
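A minimal sketch of the all-thresholds construction, using the standard hinge and invented threshold values (the paper itself uses the smooth hinge of slide 17): every threshold contributes a penalty term, pushing the score to the correct side of all thresholds rather than only the two adjacent to the true label.

```python
import numpy as np

def hinge(z):
    """Standard hinge: penalize whenever the margin z falls below 1."""
    return np.maximum(0.0, 1.0 - z)

def all_thresholds_loss(x, y, thetas):
    """All-thresholds loss for score x and true rating y in {1, ..., R},
    with thresholds theta_1 < ... < theta_{R-1}."""
    loss = 0.0
    for r, theta in enumerate(thetas, start=1):
        if r < y:          # score should lie above thresholds below the true label
            loss += hinge(x - theta)
        else:              # score should lie below thresholds at/above the true label
            loss += hinge(theta - x)
    return loss

thetas = np.array([-1.5, -0.5, 0.5, 1.5])   # invented thresholds, 5 rating levels
print(all_thresholds_loss(0.7, 4, thetas))
```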
Slide 12: All-Thresholds Loss

- Experiments comparing:
  - Least squares regression
  - Multi-class classification
  - Shashua & Levin's Max-Margin OR
  - All-Thresholds OR
- All-Thresholds Ordinal Regression achieves:
  - Lowest misclassification error
  - Lowest absolute difference error

Rennie & Srebro, IJCAI Workshop 2005
Slide 13: Learning Weights and Features
[Figure: observed rating matrix (values 1-5) from which both the weights and the features are learned]
Slide 14: Low Rank Matrix Factorization

[Figure: X ≈ UV', with X constrained to rank k]

- Sum-squared loss, fully observed Y: use the SVD to find the global optimum (see the sketch below)
- Classification error loss, or partially observed Y: non-convex, no explicit solution
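A minimal sketch (random data) of the easy case on this slide: with a fully observed Y and sum-squared loss, the truncated SVD gives the globally optimal rank-k approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((20, 15))
k = 3

# Truncated SVD: global optimum of the rank-k, sum-squared-loss problem.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
X = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.linalg.norm(Y - X))   # residual of the best rank-3 approximation
```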
Slide 15: Norm Constrained Factorization

||X||_tr = min_{U,V : UV' = X} (||U||_Fro^2 + ||V||_Fro^2) / 2

where ||U||_Fro^2 = sum_{i,j} U_ij^2

Fazel et al., 2001
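A sketch (random data) checking this identity numerically: the trace norm of X (sum of its singular values) equals (||U||_Fro^2 + ||V||_Fro^2)/2 at the minimizing factorization, which can be read off the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 6))

trace_norm = np.linalg.svd(X, compute_uv=False).sum()

# One minimizing factorization: U = A sqrt(S), V = B sqrt(S), where X = A S B'.
A, s, Bt = np.linalg.svd(X, full_matrices=False)
U = A @ np.diag(np.sqrt(s))
V = Bt.T @ np.diag(np.sqrt(s))

print(trace_norm, (np.linalg.norm(U)**2 + np.linalg.norm(V)**2) / 2)  # equal
```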
Slide 16: MMMF Objective

Original objective (with the All-Thresholds loss):

min_X ||X||_tr + c * loss(X, Y)

Srebro et al., NIPS 2004
Slide 17: Smooth Hinge

[Figure: the hinge and smooth hinge losses and their gradients]
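A sketch of a quadratically smoothed hinge matching the shape on this slide; the exact piecewise form here is my assumption and should be treated as illustrative. It agrees with the hinge outside (0, 1) and replaces the kink at z = 1 with a quadratic piece, so the gradient is continuous everywhere.

```python
import numpy as np

def smooth_hinge(z):
    """Quadratically smoothed hinge loss."""
    return np.where(z >= 1, 0.0,
           np.where(z <= 0, 0.5 - z, 0.5 * (1.0 - z) ** 2))

def smooth_hinge_grad(z):
    """Derivative of the smooth hinge with respect to z (continuous)."""
    return np.where(z >= 1, 0.0,
           np.where(z <= 0, -1.0, z - 1.0))

z = np.linspace(-2, 2, 9)
print(smooth_hinge(z))
print(smooth_hinge_grad(z))
```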
Slide 18: Collaborative Prediction Results

[Figure: comparison against the URP and Attitude results of Marlin, 2004]
Slide 19: Local Minima?

Factorized objective:

min_{U,V} (||U||_Fro^2 + ||V||_Fro^2)/2 + c * loss(UV', Y)
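A sketch of gradient descent on the factorized objective. For brevity it uses a squared loss on observed entries as a stand-in for the all-thresholds smooth-hinge loss; the regularizer and the update pattern are the point, and all sizes and step counts are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k, c, lr = 30, 20, 5, 1.0, 0.01

Y = rng.integers(1, 6, size=(n_users, n_movies)).astype(float)
mask = rng.random((n_users, n_movies)) < 0.3          # observed entries
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_movies, k))

for step in range(200):
    X = U @ V.T
    R = mask * (X - Y)              # stand-in loss gradient d loss / d X
    grad_U = U + c * (R @ V)        # d/dU of ||U||_Fro^2/2 + c*loss(UV', Y)
    grad_V = V + c * (R.T @ U)      # d/dV of ||V||_Fro^2/2 + c*loss(UV', Y)
    U -= lr * grad_U
    V -= lr * grad_V

obj = 0.5 * (np.sum(U**2) + np.sum(V**2)) \
      + c * 0.5 * np.sum((mask * (U @ V.T - Y))**2)
print(obj)
```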
Slide 20: Local Minima?

[Figure: matrix difference between learned X and target Y]

Data: 100 x 100 MovieLens submatrix, 65% sparse
Slide 21: Summary

- We scaled MMMF to large problems by optimizing the factorized objective
- Empirical tests indicate that local minima issues are rare or absent
- Results on large-scale data show substantial improvements over the state of the art

d'Aspremont & Srebro: large-scale SDP optimization methods; train on 1.5 million binary labels in 20 hours.