Title: Yehuda Koren
 1Collaborative Filtering with Temporal Dynamics
  2Recommender systems
We Know What You Ought To Be Watching This Summer
 3Collaborative filtering
- Recommend items based on past transactions of 
 users
- Specific data characteristics are irrelevant 
- Domain-free 
- Can identify elusive aspects 
- Two popular approaches 
- Matrix factorization 
- Neighborhood 
4Movie rating data
Training data
Test data 
 5Achievable RMSEs on the Netflix data
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
 (Personalization)
Cinematch (baseline): 0.9514
 (Algorithmics)
Static neighborhood: 0.9002
Static factorization: 0.8911
 (Time effects)
Leader: 0.8558 (10.05% improvement)
Inherent noise: ????
Axis label: Find better items
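The values on this scale are root-mean-square errors, and the improvement is measured relative to the Cinematch baseline. A minimal sketch of both calculations follows; the function names are illustrative and not part of the slides.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to a baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

CINEMATCH = 0.9514  # baseline RMSE from the slide
LEADER = 0.8558     # leader RMSE from the slide
leader_gain = relative_improvement(CINEMATCH, LEADER)
print(f"{leader_gain:.2f}%")
```

Because the improvement is relative, a fixed absolute RMSE gain counts for slightly more as the baseline gets lower.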
 6Something Happened in Early 2004
 7Are movies getting better with time? 
 8Multiple sources of temporal dynamics
- Item-side effects 
- Product perception and popularity are constantly 
 changing
- Seasonal patterns influence item popularity 
- User-side effects 
- Customers continually redefine their taste 
- Transient, short-term bias anchoring 
- Drifting rating scale 
- Change of rater within household
9Temporal dynamics - challenges
- Multiple sources: both items and users are 
 changing over time
- Multiple targets: each user/item forms a unique 
 time series → scarce data per target
- Inter-related targets: signal needs to be shared 
 among users (the foundation of collaborative
 filtering) → cannot isolate multiple problems
- → Common concept-drift methodologies won't 
 hold. E.g., underweighting older instances is
 unappealing
10Basic matrix factorization model
[Figure: ratings matrix and its factor matrices, axes labeled users and items. Caption: A rank-3 SVD approximation]
11Estimate unknown ratings as inner-products of factors
[Figure: ratings matrix with an unknown entry marked "?", alongside the users and items factor matrices. Caption: A rank-3 SVD approximation]
12Estimate unknown ratings as inner-products of factors
[Figure: ratings matrix with an unknown entry marked "?", alongside the users and items factor matrices. Caption: A rank-3 SVD approximation]
13Estimate unknown ratings as inner-products of factors
[Figure: ratings matrix with the unknown entry estimated as 2.4, alongside the users and items factor matrices. Caption: A rank-3 SVD approximation]
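As a small illustration of estimating an unknown rating as an inner product, the sketch below multiplies one item's factor vector by one user's factor vector. The rank-3 vectors are hypothetical values chosen so that the estimate matches the 2.4 shown on the slide; the actual factor values in the figure are not recoverable from the text.

```python
def predict_rating(item_factors, user_factors):
    """Estimate a rating as the inner product of two factor vectors."""
    if len(item_factors) != len(user_factors):
        raise ValueError("factor vectors must have the same rank")
    return sum(q * p for q, p in zip(item_factors, user_factors))

# Hypothetical rank-3 factors (not taken from the slide's figure).
q_item = [1.0, -0.5, 2.0]
p_user = [0.4, 0.0, 1.0]
estimate = predict_rating(q_item, p_user)
print(round(estimate, 1))
```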
 14Matrix factorization model
- Properties 
- SVD isn't defined when entries are unknown → use 
 specialized methods
- Can easily overfit, sensitive to regularization 
- Need to separate main effects
15Baseline predictors
- Mean rating: 3.7 stars 
- The Sixth Sense is 0.5 stars above avg 
- Joe rates 0.2 stars below avg 
- → Baseline prediction: Joe will rate The Sixth 
 Sense 4 stars
- No user-item interaction
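The arithmetic behind the baseline prediction can be sketched as follows. The function and variable names are illustrative; the numbers are the ones on the slide.

```python
def baseline_prediction(global_mean, user_bias, item_bias):
    """Baseline estimate: global mean plus user and item offsets."""
    return global_mean + user_bias + item_bias

mu = 3.7        # mean rating
b_item = 0.5    # The Sixth Sense is 0.5 stars above average
b_user = -0.2   # Joe rates 0.2 stars below average
joe_sixth_sense = baseline_prediction(mu, b_user, b_item)
print(round(joe_sixth_sense, 1))
```

Nothing here depends on how this particular user relates to this particular item, which is the gap the factor model fills.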
16Factor model correction
- Both The Sixth Sense and Joe are placed high on 
 the Supernatural Thrillers scale
- → Adjusted estimate: Joe will rate The Sixth Sense 
 4.5 stars
17Matrix factorization with biases
Baseline predictors: µ = global average, b_u = bias of user u, b_i = bias of item i
User-item interaction: p_u = user u's factors, q_i = item i's factors
→ Minimization problem, with regularization
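A minimal sketch of fitting such a model is given below. It assumes the common form of the objective: squared error over the known ratings only, with the prediction µ + b_u + b_i + q_i·p_u, plus an L2 penalty on b_u, b_i, p_u and q_i. The slide's exact formula is not present in the text, so this form, the stochastic gradient descent solver, the hyperparameters and the toy ratings are all assumptions made for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_biased_mf(ratings, n_users, n_items, rank=3, lr=0.02, reg=0.02,
                    epochs=300, seed=0):
    """Fit biases and factors by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global average
    b_u = [0.0] * n_users
    b_i = [0.0] * n_items
    p = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    q = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:  # only known entries are visited
            err = r - (mu + b_u[u] + b_i[i] + dot(q[i], p[u]))
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(rank):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + dot(q[i], p[u])
    return predict

def rmse(predict, ratings):
    total = sum((r - predict(u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))

# Hypothetical toy data: (user, item, rating).
toy_ratings = [(0, 0, 5), (0, 1, 3), (0, 2, 2), (1, 0, 4),
               (1, 1, 2), (1, 2, 1), (2, 1, 2), (2, 2, 1)]
toy_mean = sum(r for _, _, r in toy_ratings) / len(toy_ratings)
baseline_rmse = rmse(lambda u, i: toy_mean, toy_ratings)
model = train_biased_mf(toy_ratings, n_users=3, n_items=3)
fitted_rmse = rmse(model, toy_ratings)
```

Looping only over observed triples is one way to sidestep the undefined-SVD problem for missing entries; the regularization weight `reg` is the knob the overfitting warning refers to.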
 18Addressing temporal dynamics
- Factor model conveniently allows separately 
 treating different aspects
- We observe changes in: 
- Rating scale of individual users (baseline predictors) 
- Popularity of individual items (baseline predictors) 
- User preferences (user factors)
 19Parameterizing the model
- Use functional forms: b_u(t) = f(u,t), b_i(t) = g(i,t), 
 p_u(t) = h(u,t)
- Need to find adequate f(), g(), h() 
- General guidelines 
- Items show slower temporal changes 
- Users exhibit frequent and sudden changes 
- Factors p_u(t) are expensive to model 
- Gain flexibility by heavily parameterizing the 
 functions
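One possible way to follow these guidelines is sketched below. The specific functional forms are assumptions and are not stated on the slide: the item bias gets a coarse per-time-bin offset (slow change), while the user bias gets a gradual drift term plus a per-day offset (sudden change). The bin width, the exponent `beta` and all names are hypothetical.

```python
import math

def time_deviation(t, user_mean_day, beta=0.4):
    """Signed, dampened distance of day t from the user's mean rating day."""
    diff = t - user_mean_day
    if diff == 0:
        return 0.0
    return math.copysign(abs(diff) ** beta, diff)

def item_bias(t, static_bias, bin_offsets, days_per_bin=70):
    """g(i, t): static item bias plus an offset shared by a whole time bin."""
    return static_bias + bin_offsets.get(t // days_per_bin, 0.0)

def user_bias(t, static_bias, drift_slope, user_mean_day, day_offsets):
    """f(u, t): static bias + gradual drift + a single-day offset."""
    drift = drift_slope * time_deviation(t, user_mean_day)
    return static_bias + drift + day_offsets.get(t, 0.0)

# Hypothetical parameters for one item and one user.
movie_bias_day_150 = item_bias(150, 0.5, {2: 0.1})
joe_bias_day_100 = user_bias(100, -0.2, 0.05, 100, {100: 0.3})
```

Each extra offset is another learned parameter, which is what heavy parameterization means in practice: flexibility is bought with more values to fit.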
20Achievable RMSEs on the Netflix data
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
 (Personalization)
Cinematch (baseline): 0.9514
 (Algorithmics)
Static neighborhood: 0.9002
Static factorization: 0.8911
 (Time effects)
Dynamic factorization: 0.8794
Grand Prize: 0.8563 (10% improvement)
Inherent noise: ????
Axis label: Find better items
 21Neighborhood-based CF
- Earliest and most common collaborative filtering 
 method
- Derive unknown ratings from those of similar 
 items (item-item variant)
22Neighborhood modeling
Use item-item weights (w_ij) to relate items
Need to estimate rating of user u for item i
Terms of the prediction rule:
- Baseline predictor (constants)
- Deviation from baseline estimate for item j
- Weight from j to i (learned from the data through optimization)
- Set of items rated by u
 23Optimizing the model
Minimize the squared error function 
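The prediction rule and one optimization step can be sketched as follows. This assumes the rule r̂_ui = b_ui + Σ over j in R(u) of (r_uj − b_uj)·w_ij, assembled from the terms named on the slide. Any normalization factor, the stochastic gradient solver, the learning rate and the regularization weight are assumptions, and the item being predicted is excluded from the sum here to avoid using its own rating.

```python
def predict_rating(i, user_ratings, user_baseline, weights):
    """Baseline for item i plus weighted deviations of the user's other ratings."""
    total = user_baseline[i]
    for j, r_uj in user_ratings.items():
        if j != i:
            total += (r_uj - user_baseline[j]) * weights.get((i, j), 0.0)
    return total

def sgd_step(i, r_ui, user_ratings, user_baseline, weights, lr=0.05, reg=0.002):
    """One gradient step on the squared error of a single known rating."""
    err = r_ui - predict_rating(i, user_ratings, user_baseline, weights)
    for j, r_uj in user_ratings.items():
        if j != i:
            dev = r_uj - user_baseline[j]   # constant, not learned
            w = weights.get((i, j), 0.0)    # learned parameter w_ij
            weights[(i, j)] = w + lr * (err * dev - reg * w)
    return err

# Hypothetical user: two rated items, one target item "c" rated 4.5.
ratings_u = {"a": 5.0, "b": 2.0}
baseline_u = {"a": 4.0, "b": 3.0, "c": 3.5}
w = {}
errors = [sgd_step("c", 4.5, ratings_u, baseline_u, w) for _ in range(100)]
```

The weights start at zero, so the first prediction is just the baseline; repeated steps shrink the error as w_ca and w_cb move apart.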
 24Making the model time-aware 
- A popular scheme is instance weighting: decay the 
 significance of outdated events within the cost
 function (time decay)
Don't do this! 
25Why isn't instance weighting suitable?
- Not enough data per user: need to exploit all 
 signal, including old signal
- The learnt parameters (w_ij) represent time-invariant 
 item-item relations, which can also be
 deduced from older actions
- Two items are related when users rated them 
 similarly within a short timeframe, even if this
 happened long ago
- How to do it right? 
26Time-aware neighborhood model
- Decay item-item relations based on time distance 
- User-specific decay rate controlled by β_u 
- All past user behavior is considered equally 
 in the cost function
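A sketch of the idea: the decay is applied inside the prediction rule, to each item-item relation, rather than to the training instances in the cost function. The exponential form exp(−β_u·|t_ui − t_uj|) is an assumption (the slide only says the decay depends on time distance and on β_u), and the data layout and names are hypothetical.

```python
import math

def time_aware_prediction(i, t_ui, user_history, user_baseline, weights, beta_u):
    """Neighborhood prediction with each relation decayed by time distance.

    user_history maps item j -> (rating r_uj, day t_uj).
    """
    total = user_baseline[i]
    for j, (r_uj, t_uj) in user_history.items():
        if j == i:
            continue
        decay = math.exp(-beta_u * abs(t_ui - t_uj))
        total += decay * (r_uj - user_baseline[j]) * weights.get((i, j), 0.0)
    return total

# Hypothetical user: item "a" rated long ago, item "b" rated recently.
history = {"a": (5.0, 10), "b": (2.0, 200)}
baseline_u = {"a": 4.0, "b": 3.0, "c": 3.5}
w = {("c", "a"): 0.5, ("c", "b"): 0.2}

static_estimate = time_aware_prediction("c", 205, history, baseline_u, w, beta_u=0.0)
decayed_estimate = time_aware_prediction("c", 205, history, baseline_u, w, beta_u=0.1)
```

With β_u = 0 the model reduces to the static neighborhood rule. With a larger β_u the old rating of "a" barely influences the estimate, yet that rating still contributes to learning w_ij for nearby events.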
27Temporal neighborhood model delivers same relative RMSE improvement (0.0117) as temporal factor model (!)
Global average: 1.1296
User average: 1.0651
Movie average: 1.0533
 (Personalization)
Cinematch (baseline): 0.9514
 (Algorithmics)
Static neighborhood: 0.9002
Static factorization: 0.8911
Dynamic neighborhood: 0.8885
 (Time effects)
Dynamic factorization: 0.8794
Grand Prize: 0.8563 (10% improvement)
Inherent noise: ????
Axis label: Find better items
 28Lessons
- Modeling temporal effects significantly improves 
 recommender accuracy
- Allow multiple time-drifting patterns across 
 users and items
- Integrate all users within a single model to 
 allow crucial cross-user collaboration
- Model user behavior along the full history; do not 
 over-emphasize recent actions
- Separate long-term values, while excluding 
 transient fluctuations from the model
- Sudden, single-day effects are significant 
- Modeling past temporal fluctuations helps in 
 predicting future behavior, even though we do not
 extrapolate future temporal dynamics
29Yehuda Koren Yahoo! Research yehuda_at_yahoo-inc.com