Title: Regression Based Latent Factor Models
1 Regression Based Latent Factor Models
- Deepak Agarwal
- Bee-Chung Chen
- Yahoo! Research
- KDD 2009, Paris
- 6/29/2009
2 OUTLINE
- Problem Definition
- Predicting dyadic response exploiting covariate information
- Factorization models: Brief Overview
- Incorporating covariate information through regressions
- Cold-start and warm-start through a single model
- Closer look at induced correlations
- Fitting algorithms: Monte Carlo EM and Iterated Conditional Mode (ICM)
- Experiments
- MovieLens
- Yahoo! Front Page
- Summary
3 DYADIC DATA
- $i$ = user, $j$ = movie, $y_{ij}$ = rating (for the rest of the talk)
COVARIATES: $X_{ij} = (w_i, x_{ij}, z_j)$
DYAD: $(i, j)$
RESPONSE: $y_{ij}$ (click rates, ratings)
4 PROBLEM DEFINITION
- Models to predict ratings for new dyads
- Warm-start: (user, movie) both present in the training data
- Cold-start: at least one of (user, movie) is new
- Challenges
- Highly incomplete (user, movie) matrix
- Heavy-tailed degree distributions for users/movies
- Large fraction of ratings from a small fraction of users/movies
- Handling both warm-start and cold-start effectively
5 Possible approaches
- Large-scale regression based on covariates
- Does not provide good estimates for heavy users/movies
- Large number of predictors to estimate interactions
- Collaborative filtering
- Neighborhood based
- Factorization (our approach in this paper)
- Good for warm-start; cold-start dealt with separately
- Single model that handles cold-start and warm-start
- Heavy users/movies → user/movie-specific model
- Light users/movies → fall back on regression model
- Smooth fallback mechanism for good performance
6 Factorization: Brief Overview
- Latent user factors: $(\alpha_i,\ u_i = (u_{i1}, \ldots, u_{ir}))$
- Latent movie factors: $(\beta_j,\ v_j = (v_{j1}, \ldots, v_{jr}))$
- Interaction: $u_i' v_j$
- $(N + M)(r + 1)$ parameters
- Key technical issue: will overfit for moderate values of $r$
- Regularization
- Usual approach: Gaussian ZeroMean prior
7 Existing Zero-Mean Factorization Model
Observation Equation
State Equation
Predict for new dyad
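The equations for this slide did not survive extraction. A sketch of the usual zero-mean formulation, written with the factor notation from the overview slide, is given below. The symbols $b$, $\sigma^2$, $a_\alpha$, $a_\beta$, $A_u$ and $A_v$ are assumed notation, and the prediction line assumes both the user and the movie are new.

```latex
\begin{aligned}
\text{Observation:}\quad & y_{ij} = x_{ij}'b + \alpha_i + \beta_j + u_i'v_j + \epsilon_{ij},
  \qquad \epsilon_{ij} \sim N(0, \sigma^2) \\
\text{State:}\quad & \alpha_i \sim N(0, a_\alpha),\quad \beta_j \sim N(0, a_\beta),\quad
  u_i \sim N_r(0, A_u),\quad v_j \sim N_r(0, A_v) \\
\text{New dyad:}\quad & \hat{y}_{ij} = x_{ij}'b
  \quad \text{(all factors fall back to their prior mean of zero)}
\end{aligned}
```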
8 Regression-based Factorization Model (RLFM)
- Main idea: flexible prior, predict factors through regressions
- Seamlessly handles cold-start and warm-start
- Modified state equation to incorporate covariates
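The modified state equation itself was lost in extraction. One way to write it, assuming linear regressions of the factors on the user covariates $w_i$ and movie covariates $z_j$, is sketched below. $G$ and $A_u$ appear later in the talk; $g_0$, $d_0$, $D$, $a_\alpha$, $a_\beta$ and $A_v$ are assumed names.

```latex
\begin{aligned}
\alpha_i &\sim N(g_0'w_i,\ a_\alpha), & u_i &\sim N_r(G w_i,\ A_u) \\
\beta_j  &\sim N(d_0'z_j,\ a_\beta),  & v_j &\sim N_r(D z_j,\ A_v)
\end{aligned}
```

Setting $g_0$, $d_0$, $G$ and $D$ to zero recovers the zero-mean prior.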
9 Advantages of RLFM
- Better regularization of factors
- Covariates shrink factors towards a better centroid
- Cold-start: fallback regression model (FeatureOnly)
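The fallback behaviour can be illustrated with a minimal Python sketch. It assumes linear regression priors for the factors; the function name `predict_rlfm`, the parameter names (`b`, `g0`, `d0`, `G`, `D`) and all numbers are hypothetical and are not taken from the talk. A user or movie with fitted factors uses them; a new one falls back to the covariate-based prior mean.

```python
import numpy as np

def predict_rlfm(x_ij, w_i, z_j, params, user_post=None, movie_post=None):
    """Predict y_ij; unseen users/movies fall back to the regression prior mean."""
    b, g0, d0, G, D = (params[k] for k in ("b", "g0", "d0", "G", "D"))
    # Prior means predicted from covariates (cold-start values).
    alpha, u = g0 @ w_i, G @ w_i
    beta, v = d0 @ z_j, D @ z_j
    # Warm-start: replace prior means by fitted (posterior) factor estimates.
    if user_post is not None:
        alpha, u = user_post
    if movie_post is not None:
        beta, v = movie_post
    return x_ij @ b + alpha + beta + u @ v

# Hypothetical toy parameters: r = 2 factors, 2 user and 2 movie covariates.
params = {
    "b": np.array([0.5]),
    "g0": np.array([1.0, 0.0]),
    "d0": np.array([0.0, 2.0]),
    "G": np.array([[1.0, 0.0], [0.0, 1.0]]),
    "D": np.array([[1.0, 1.0], [0.0, 1.0]]),
}
x_ij = np.array([2.0])
w_i = np.array([1.0, 2.0])
z_j = np.array([3.0, 1.0])

cold = predict_rlfm(x_ij, w_i, z_j, params)  # both user and movie are new
warm = predict_rlfm(x_ij, w_i, z_j, params,
                    user_post=(0.2, np.array([0.5, -0.5])))  # known user
```

With no fitted factors at all, the prediction depends only on covariates, which corresponds to the FeatureOnly fallback.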
10 Graphical representation of the model
11 Advantages of RLFM illustrated on Yahoo! FP data
Only the first user factor is plotted in the comparisons
12 Induced correlations among observations
- Hierarchical random-effects model
- Marginal distribution obtained by integrating out random effects
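As a worked illustration of what integrating out the random effects gives, the first two marginal moments can be derived under stated assumptions: factor priors with means $g_0'w_i$, $d_0'z_j$, $Gw_i$, $Dz_j$ and variances $a_\alpha$, $a_\beta$, $a_u I$, $a_v I$, plus Gaussian noise with variance $\sigma^2$. The names $g_0$, $d_0$, $D$, $a_v$ and $\sigma^2$ are assumptions; the slide's own expressions were not recoverable.

```latex
\begin{aligned}
E(y_{ij}) &= x_{ij}'b + g_0'w_i + d_0'z_j + (Gw_i)'(Dz_j) \\
\mathrm{Var}(y_{ij}) &= \sigma^2 + a_\alpha + a_\beta
  + a_v \lVert Gw_i \rVert^2 + a_u \lVert Dz_j \rVert^2 + r\,a_u a_v \\
\mathrm{Cov}(y_{ij}, y_{ij'}) &= a_\alpha + a_u (Dz_j)'(Dz_{j'}), \qquad j \neq j' \\
\mathrm{Cov}(y_{ij}, y_{i'j}) &= a_\beta + a_v (Gw_i)'(Gw_{i'}), \qquad i \neq i'
\end{aligned}
```

Under these assumptions, two ratings by the same user are more strongly correlated when the two movies have similar covariate-predicted factors, and likewise for two users rating the same movie.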
13 Closer look at induced marginal correlations
14 Model Fitting
- Challenging, multi-modal posterior
- Monte Carlo EM (MCEM)
- E-step: sample factors through Gibbs sampling
- M-step: estimate regressions through off-the-shelf linear regression routines, using sampled factors as response
- We used t-regression; others like LASSO could be used
- Iterated Conditional Mode (ICM)
- Replace E-step by CG: conditional modes of factors
- M-step: estimate regressions using the modes as response
- Incorporating uncertainty in factor estimates in MCEM helps
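A minimal Python sketch of the M-step regression follows. It swaps in ordinary least squares (`numpy.linalg.lstsq`) as the off-the-shelf routine, not the t-regression used in the talk, and the function name `m_step_regression` and the array layout are assumptions. Averaging the Gibbs draws before regressing gives the same coefficients as regressing on all stacked draws, because the design matrix is identical for every draw.

```python
import numpy as np

def m_step_regression(u_samples, W):
    """Regress sampled user factors on user covariates.

    u_samples: (S, N, r) Gibbs draws of the user factors
    W:         (N, p) user covariate matrix
    Returns G with shape (r, p), so that the prior mean of u_i is G @ w_i.
    """
    u_mean = u_samples.mean(axis=0)                     # Monte Carlo mean, (N, r)
    coef, *_ = np.linalg.lstsq(W, u_mean, rcond=None)   # least squares, (p, r)
    return coef.T

# Noise-free toy check with hypothetical data: the true G should be recovered.
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 3))
G_true = np.array([[1.0, -2.0, 0.5], [0.0, 3.0, 1.0]])
u_samples = np.repeat((W @ G_true.T)[None, :, :], 4, axis=0)
G_hat = m_step_regression(u_samples, W)
```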
15 Monte Carlo E-step
- Through a vanilla Gibbs sampler (conditionals closed form)
- Other conditionals also Gaussian and closed form
- Conditionals of users (movies) sampled simultaneously
- Small number of samples in early iterations, large numbers in later iterations
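The closed-form conditional for one user's factor vector can be sketched in Python as below. It assumes a Gaussian observation model with noise variance `sigma2` and a prior $u_i \sim N(Gw_i, a_u I)$; the function name `sample_user_factor` and its arguments are hypothetical. The conditional is a standard Bayesian linear regression posterior, with the movie factors playing the role of the design matrix.

```python
import numpy as np

def sample_user_factor(o_i, V_i, prior_mean, a_u, sigma2, rng):
    """Draw u_i from its Gaussian full conditional.

    o_i:        (n_i,) residuals y_ij - x_ij'b - alpha_i - beta_j for user i's movies
    V_i:        (n_i, r) current factors v_j of those movies
    prior_mean: (r,) regression prior mean G @ w_i
    Returns (draw, conditional mean, conditional covariance).
    """
    r = V_i.shape[1]
    precision = np.eye(r) / a_u + V_i.T @ V_i / sigma2
    cov = np.linalg.inv(precision)
    mean = cov @ (prior_mean / a_u + V_i.T @ o_i / sigma2)
    return rng.multivariate_normal(mean, cov), mean, cov

rng = np.random.default_rng(0)
# A user with no ratings: the conditional is just the regression prior.
_, mean0, cov0 = sample_user_factor(np.zeros(0), np.zeros((0, 2)),
                                    np.array([1.0, -1.0]), 0.5, 1.0, rng)
# A user with one rating: the mean moves toward the data along v_j.
draw1, mean1, cov1 = sample_user_factor(np.array([2.0]), np.array([[1.0, 0.0]]),
                                        np.zeros(2), 1.0, 1.0, rng)
```

Because each user's conditional depends only on that user's ratings and the current movie factors, all users can be drawn independently of one another within a sweep, and symmetrically for movies.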
16 M-step (Why MCEM is better than ICM)
- Update $G$, optimize
- Update $A_u = a_u I$
- ICM ignores the posterior variance of the factors in this update, so it underestimates factor variability
- Factors over-shrunk, posterior not explored well
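The effect can be illustrated with a small Python sketch comparing two estimates of $a_u$. The exact update on the slide was not recoverable, so the form used here is an assumption: the average squared deviation of the factors from their prior means. The function name `update_a_u` is hypothetical, and the plug-in estimate uses the Monte Carlo mean as a stand-in for ICM's conditional mode.

```python
import numpy as np

def update_a_u(u_samples, prior_means):
    """Compare a Monte Carlo estimate of a_u with a point-estimate plug-in.

    u_samples:   (S, N, r) Gibbs draws of user factors
    prior_means: (N, r) rows G @ w_i
    Returns (mcem, plug_in).
    """
    S, N, r = u_samples.shape
    # Monte Carlo average of the squared deviation: includes posterior spread.
    mcem = np.sum((u_samples - prior_means) ** 2) / (S * N * r)
    # Plug-in: squared deviation of a single point estimate; spread is dropped.
    u_bar = u_samples.mean(axis=0)
    plug_in = np.sum((u_bar - prior_means) ** 2) / (N * r)
    return mcem, plug_in

# Hypothetical draws with unit posterior variance around the prior mean.
rng = np.random.default_rng(0)
prior_means = np.zeros((5, 2))
u_samples = rng.normal(size=(200, 5, 2))
mcem, plug_in = update_a_u(u_samples, prior_means)
```

The difference `mcem - plug_in` equals the average sample variance of the draws, so the plug-in estimate is never larger. A smaller $a_u$ means a tighter prior in the next iteration, which is one way to read "factors over-shrunk".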
17 Experiment 1: Better regularization
- MovieLens-100K, avg RMSE using pre-specified splits
- ZeroMean, RLFM and FeatureOnly (no cold-start issues)
- Covariates
- Users: age, gender, zipcode (1st digit only)
- Movies: genres
18 Experiment 2: Better handling of cold-start
- MovieLens-1M, EachMovie
- Training-test split based on timestamp
- Same covariates as in Experiment 1
19 Experiment 3: Online updates help
- Covariates provide good initialization for new user/movie factors, but updating factor estimates frequently (e.g. every hour) helps
- Dyn-RLFM
- Estimate posterior mean and covariance at the end of MCEM by running a large number of Gibbs iterations
- For online updates, we do not change the posterior covariance but only adapt the posterior means through an exponentially weighted moving average (EWMA)
- This is done by running a small number of Gibbs iterations
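An EWMA update of a posterior mean can be sketched in Python as follows. The talk does not give the weighting scheme, so the function name `ewma_update`, the `weight` parameter and its default are assumptions; `batch_mean` stands for the factor mean estimated from a small number of Gibbs iterations on the latest batch of data.

```python
import numpy as np

def ewma_update(prev_mean, batch_mean, weight=0.2):
    """Blend the previous posterior mean with a fresh estimate.

    weight is the share given to the new batch; the covariance is left untouched.
    """
    prev_mean = np.asarray(prev_mean, dtype=float)
    batch_mean = np.asarray(batch_mean, dtype=float)
    return (1.0 - weight) * prev_mean + weight * batch_mean

# Hypothetical hourly update of a 2-dimensional factor mean.
updated = ewma_update([1.0, 0.0], [0.0, 1.0], weight=0.25)
```

A larger `weight` tracks recent data more closely at the cost of noisier estimates.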
20 Experiment 3: Continued
21 New Application: Today Module on www.yahoo.com
- Today Module is the top-center part
- Four tabs: Featured, Entertainment, Sports, and Video
- Featured displays content from all categories
- Today Module routes traffic to other Y! pages, increases user engagement
- Defaults to the Featured tab
22 Some More Background: Featured Tab in Detail
- Four articles on F1, F2, F3, F4
- F1 article shown as story by default
- Footer click → corresponding article shown as story
- Click rate (CTR): story clicks per display (maximize this)
- F1 → max exposure, large fraction of story clicks
23 Experiment 4: Predicting click-rate on articles
- Goal: predict click-rate on articles for a user on F1 position
- Article lifetimes short, dynamic updates important
- User covariates
- Age, gender, geo, browse behavior
- Article covariates
- Content category, keywords
- 2M ratings, 30K users, 4.5K articles
24 Results on Y! FP data
25 Related Work
- Little work in a model-based framework in the past
- PDLF, KDD 07 (does not predict factors using covariates)
- Recent work at WWW 09 published in parallel
- Matchbox: Bayesian online recommendation algorithm
- Both models same (motivation different)
- Estimation methods different
- Matchbox based on variational Bayes; we conjecture the performance would be similar to the ICM method
- Some papers at ICML this year are also related
- (not done with my reading yet)
26 Summary
- Regularizing factors through covariates is effective
- We presented a regression-based factor model that regularizes better and deals with both cold-start and warm-start seamlessly in a single framework
- Fitting method is scalable: Gibbs sampling for users and movies can be done in parallel. Regressions in the M-step can be done with any off-the-shelf scalable linear regression routine
- Good results on benchmark data and on new Y! FP data
27 Ongoing Work
- Investigating various non-linear regressions in M-step
- Better MCMC sampling schemes for faster convergence
- Addressing model choice issues (through Bayes factors)
- Tensor factorization