Yehuda Koren - PowerPoint PPT Presentation


Transcript and Presenter's Notes


1
Collaborative Filtering with Temporal Dynamics

Yehuda Koren

2
Recommender systems
We Know What You Ought To Be Watching This Summer
3
Collaborative filtering

Recommend items based on past transactions of users
Specific data characteristics are irrelevant
  Domain-free
  Can identify elusive aspects
Two popular approaches
  Matrix factorization
  Neighborhood

4
Movie rating data
Training data
score  date      movie  user
1      5/7/02    21     1
5      8/2/04    213    1
4      3/6/01    345    2
4      5/1/05    123    2
3      7/15/02   768    2
5      1/22/01   76     3
4      8/3/00    45     4
1      9/10/05   568    5
2      3/5/03    342    5
2      12/28/00  234    5
5      8/11/02   76     6
4      6/15/03   56     6

Test data
score  date      movie  user
?      1/6/05    62     1
?      9/13/04   96     1
?      8/18/05   7      2
?      11/22/05  3      2
?      6/13/02   47     3
?      8/12/01   15     3
?      9/1/00    41     4
?      8/27/05   28     4
?      4/4/05    93     5
?      7/16/03   74     5
?      2/14/04   69     6
?      10/3/03   83     6
5
Achievable RMSEs on the Netflix data
Global average          1.1296
  Find better items
User average            1.0651
Movie average           1.0533
  Personalization
Cinematch               0.9514   baseline
  Algorithmics
Static neighborhood     0.9002
Static factorization    0.8911
  Time effects
Leader                  0.8558   10.05% improvement
Inherent noise          ????
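The "improvement" figures are relative RMSE reductions over the Cinematch baseline. A minimal Python check of that arithmetic (the helper name `relative_improvement` is our own, for illustration):

```python
def relative_improvement(baseline_rmse: float, rmse: float) -> float:
    """Percentage reduction in RMSE relative to a baseline RMSE."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse

CINEMATCH = 0.9514  # baseline RMSE from the table above

for name, rmse in [("Static factorization", 0.8911), ("Leader", 0.8558)]:
    print(f"{name}: {relative_improvement(CINEMATCH, rmse):.2f}%")
# Leader: 10.05%
```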
6
Something Happened in Early 2004
[Figure omitted; the year 2004 is marked on the plot]
7
Are movies getting better with time?
8
Multiple sources of temporal dynamics

Item-side effects:
  Product perception and popularity are constantly changing
  Seasonal patterns influence items' popularity
User-side effects:
  Customers continually redefine their taste
  Transient, short-term bias; anchoring
  Drifting rating scale
  Change of rater within household

9
Temporal dynamics - challenges

Multiple sources: both items and users are changing over time
Multiple targets: each user/item forms a unique time series → scarce data per target
Inter-related targets: signal needs to be shared among users (the foundation of collaborative filtering) → cannot isolate multiple problems
→ Common concept drift methodologies won't hold. E.g., underweighting older instances is unappealing

10
Basic matrix factorization model
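[Figure: a sparse items × users rating matrix (6 items, 12 users) approximated by the product of an item-factor matrix and a user-factor matrix]
Item factors (6 items × 3 factors):
   .2   -.4    .1
   .5    .6   -.5
   .5    .3   -.2
   .3   2.1   1.1
   -2   2.1   -.7
   .3    .7    -1
User factors (3 factors × 12 users):
  -.9   2.4   1.4    .3   -.4    .8   -.5    -2    .5    .3   -.2   1.1
  1.3   -.1   1.2   -.7   2.9   1.4    -1    .3   1.4    .5    .7   -.8
   .1   -.6    .7    .8    .4   -.3    .9   2.4   1.7    .6   -.4   2.1
A rank-3 SVD approximation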
A rank-3 SVD approximation
11
Estimate unknown ratings as inner products of factors
[Figure: the rating matrix and rank-3 factors of the previous slide, with one unknown rating marked "?"]
A rank-3 SVD approximation
13
Estimate unknown ratings as inner products of factors
[Figure: the same matrices, with the unknown rating now estimated as 2.4]
A rank-3 SVD approximation
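In this model an unknown rating is estimated as the inner product of the item's factor vector and the user's factor vector. A minimal Python sketch, pairing the fourth row of the item-factor matrix with the first column of the user-factor matrix from these slides; this pairing is our own illustration and is not necessarily the cell highlighted on the slide:

```python
def predict(q_i: list[float], p_u: list[float]) -> float:
    """Estimate a rating as the inner product q_i . p_u of item and user factors."""
    if len(q_i) != len(p_u):
        raise ValueError("factor vectors must have the same rank")
    return sum(q * p for q, p in zip(q_i, p_u))

item_4 = [0.3, 2.1, 1.1]   # fourth row of the item-factor matrix
user_1 = [-0.9, 1.3, 0.1]  # first column of the user-factor matrix
print(round(predict(item_4, user_1), 2))  # 2.57
```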
14
Matrix factorization model
[Figure: the rating matrix approximated by the product of the item-factor and user-factor matrices]

Properties:
  SVD isn't defined when entries are unknown → use specialized methods
  Can easily overfit; sensitive to regularization
  Need to separate main effects
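The slides do not say which specialized method is used. One common choice is stochastic gradient descent over the observed ratings only, with an L2 penalty against overfitting. A minimal Python sketch under that assumption; the rank `k`, learning rate `lr`, penalty `lam`, epoch count and the toy data are all hypothetical:

```python
import math
import random

def factorize(ratings, n_users, n_items, k=3, lr=0.01, lam=0.05, epochs=200, seed=0):
    """SGD matrix factorization over observed (user, item, rating) triples only.

    Unknown cells are never visited, which sidesteps the fact that a plain SVD
    needs a fully observed matrix. lam is the L2 regularization weight.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    order = list(ratings)  # shuffle a copy, not the caller's list
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            err = r - sum(q * p for q, p in zip(Q[i], P[u]))
            for f in range(k):
                p, q = P[u][f], Q[i][f]  # use the old values in both updates
                P[u][f] += lr * (err * q - lam * p)
                Q[i][f] += lr * (err * p - lam * q)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error of the inner-product predictions."""
    se = sum((r - sum(q * p for q, p in zip(Q[i], P[u]))) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))

# Hypothetical toy data: (user, item, rating), with most cells unknown.
observed = [(0, 0, 4), (0, 1, 5), (1, 0, 3), (1, 2, 1),
            (2, 1, 4), (2, 2, 2), (3, 0, 5), (3, 3, 4)]
P0, Q0 = factorize(observed, n_users=4, n_items=4, epochs=0)  # initial factors
P, Q = factorize(observed, n_users=4, n_items=4)
print(rmse(observed, P0, Q0), rmse(observed, P, Q))
```

This plain model has no bias terms, so the factors must also absorb main effects such as a generous user; the slides' point about separating main effects addresses exactly that.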

15
Baseline predictors

Mean rating: 3.7 stars
The Sixth Sense is 0.5 stars above avg
Joe rates 0.2 stars below avg
→ Baseline prediction: Joe will rate The Sixth Sense 4 stars
No user-item interaction
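The baseline prediction is the global mean plus the user and item biases: 3.7 - 0.2 + 0.5 = 4.0. A minimal Python sketch of that sum (the function name is ours):

```python
def baseline(mu: float, b_u: float, b_i: float) -> float:
    """Baseline predictor b_ui = mu + b_u + b_i, with no user-item interaction."""
    return mu + b_u + b_i

# Joe / The Sixth Sense example from the slide
print(round(baseline(mu=3.7, b_u=-0.2, b_i=0.5), 2))  # 4.0
```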

16
Factor model correction

Both The Sixth Sense and Joe are placed high on the "Supernatural Thrillers" scale
→ Adjusted estimate: Joe will rate The Sixth Sense 4.5 stars
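The adjusted estimate adds a user-item interaction term, the inner product of item and user factors, to the baseline. In this Python sketch the single "Supernatural Thrillers" factor values are hypothetical, chosen only so that the interaction contributes the slide's extra 0.5 stars:

```python
def predict(mu, b_u, b_i, q_i, p_u):
    """Baseline plus user-item interaction: mu + b_u + b_i + q_i . p_u."""
    return mu + b_u + b_i + sum(q * p for q, p in zip(q_i, p_u))

# Hypothetical one-dimensional "Supernatural Thrillers" factor values.
sixth_sense = [1.0]  # the movie scores high on the factor
joe = [0.5]          # Joe likes that kind of movie
print(round(predict(3.7, -0.2, 0.5, sixth_sense, joe), 2))  # 4.5
```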

17
Matrix factorization with biases
Baseline predictors: µ = global average, b_u = bias of user u, b_i = bias of item i
User-item interaction: p_u = user u's factors, q_i = item i's factors
→ Minimization problem, with a regularization term
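The minimization problem itself did not survive in the transcript. Assuming the standard regularized squared-error formulation for factorization with biases, with K denoting the set of (u, i) pairs whose rating r_ui is known and λ a regularization constant (both notations are ours), the prediction rule and objective would read:

```latex
\hat{r}_{ui} = \mu + b_u + b_i + q_i^{T} p_u

\min_{b_*,\, q_*,\, p_*} \sum_{(u,i) \in \mathcal{K}}
  \left( r_{ui} - \mu - b_u - b_i - q_i^{T} p_u \right)^2
  + \lambda \left( b_u^2 + b_i^2 + \lVert q_i \rVert^2 + \lVert p_u \rVert^2 \right)
```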
18
Addressing temporal dynamics

Factor model conveniently allows treating different aspects separately
We observe changes in:
  Rating scale of individual users  → baseline predictors
  Popularity of individual items    → baseline predictors
  User preferences                  → user factors
19
Parameterizing the model

Use functional forms b_u(t) = f(u,t), b_i(t) = g(i,t), p_u(t) = h(u,t)
Need to find adequate f(), g(), h()
General guidelines:
  Items show slower temporal changes
  Users exhibit frequent and sudden changes
  Factors p_u(t) are expensive to model
  Gain flexibility by heavily parameterizing the functions
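To make these guidelines concrete, here is one plausible choice of g() and f() as a Python sketch. The item bias is constant within a few wide time bins, because items change slowly. The user bias combines a smooth drift term with optional single-day offsets for sudden changes. The binning scheme, the drift exponent `beta=0.4` and all function names are assumptions, not taken from the slides:

```python
import math

def item_bias(b_i, bin_offsets, t, t_min, t_max):
    """g(i, t): static item bias plus the offset of the time bin containing day t.

    Few, wide bins suit items, whose popularity changes slowly.
    """
    n_bins = len(bin_offsets)
    width = (t_max - t_min + 1) / n_bins
    idx = min(int((t - t_min) / width), n_bins - 1)
    return b_i + bin_offsets[idx]

def drift(t, t_mean, beta=0.4):
    """Signed, sub-linear distance of day t from the user's mean rating day."""
    d = t - t_mean
    return math.copysign(abs(d) ** beta, d)

def user_bias(b_u, alpha_u, t, t_mean, day_offsets=None):
    """f(u, t): static user bias, a gradual drift, and an optional single-day offset."""
    day = (day_offsets or {}).get(t, 0.0)
    return b_u + alpha_u * drift(t, t_mean) + day

# Hypothetical usage: an item whose bias drops in its second time bin,
# and a user with an extra +0.3 on one particular day.
print(item_bias(0.5, [0.1, -0.1], t=9, t_min=0, t_max=9))
print(user_bias(-0.2, 0.05, t=20, t_mean=20, day_offsets={20: 0.3}))
```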

20
Achievable RMSEs on the Netflix data
Global average          1.1296
  Find better items
User average            1.0651
Movie average           1.0533
  Personalization
Cinematch               0.9514   baseline
  Algorithmics
Static neighborhood     0.9002
Static factorization    0.8911
  Time effects
Dynamic factorization   0.8794
Grand Prize             0.8563   10% improvement
Inherent noise          ????
21
Neighborhood-based CF

Earliest and most common collaborative filtering method
Derive unknown ratings from those of similar items (item-item variant)
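The item-item variant can be sketched in Python as follows. This sketch assumes cosine similarity over co-rating users and a similarity-weighted average of the user's ratings on the k most similar items. The similarity measure, `k` and the function names are our choices, not the slides':

```python
import math

def cosine(ratings, i, j):
    """Cosine similarity of items i and j over users who rated both.

    ratings: dict user -> dict item -> rating
    """
    common = [u for u, r in ratings.items() if i in r and j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[u][j] for u in common)
    ni = math.sqrt(sum(ratings[u][i] ** 2 for u in common))
    nj = math.sqrt(sum(ratings[u][j] ** 2 for u in common))
    return dot / (ni * nj)

def predict_knn(ratings, u, i, k=2):
    """Similarity-weighted average of u's ratings on the k items most similar to i."""
    neighbors = sorted(
        ((cosine(ratings, i, j), r) for j, r in ratings[u].items() if j != i),
        reverse=True,
    )[:k]
    num = sum(s * r for s, r in neighbors if s > 0)
    den = sum(s for s, _ in neighbors if s > 0)
    return num / den if den else float("nan")

# Hypothetical toy data
ratings = {"ann": {"m1": 4, "m2": 5}, "bob": {"m1": 4}}
print(predict_knn(ratings, "bob", "m2"))
```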

22
Neighborhood modeling
Use item-item weights w_ij to relate items
Need to estimate the rating of user u for item i
[Equation not reproduced; its annotated terms are:]
  Baseline predictor
  Deviation from the baseline estimate for item j
  Weight from j to i
  Set of items rated by u
  The baselines are constants; the weights w_ij are learned from the data through optimization
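Suppose the slide's prediction rule takes the form r̂_ui = b_ui + |R(u)|^(-1/2) · Σ_{j ∈ R(u)} (r_uj - b_uj) · w_ij; the |R(u)|^(-1/2) normalization is our assumption. A prediction can then be sketched in Python as below, treating the baselines b_uj as precomputed constants and the weights w_ij as already learned. All names are hypothetical:

```python
import math

def predict_neighborhood(b_ui, rated, baselines, weights, i):
    """b_ui + |R(u)|^-1/2 * sum over j in R(u) of (r_uj - b_uj) * w_ij.

    rated:     dict item j -> r_uj for the items R(u) rated by user u
    baselines: dict item j -> b_uj (constants)
    weights:   dict (i, j) -> learned w_ij; missing pairs count as 0
    """
    if not rated:
        return b_ui
    total = sum((r_uj - baselines[j]) * weights.get((i, j), 0.0)
                for j, r_uj in rated.items())
    return b_ui + total / math.sqrt(len(rated))

# Hypothetical usage: four rated items, two of which are related to item "i".
rated = {"a": 5, "b": 3, "c": 4, "d": 2}
baselines = {j: 3.5 for j in rated}
weights = {("i", "a"): 0.4, ("i", "b"): 0.2}
print(predict_neighborhood(3.8, rated, baselines, weights, "i"))
```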
23
Optimizing the model
Minimize the squared error function
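The error function is not reproduced in the transcript. Assume the prediction rule r̂_ui = b_ui + |R(u)|^(-1/2) · Σ_j (r_uj - b_uj) · w_ij, with i itself left out of R(u) during training, and an L2 penalty with constant λ (both assumptions on our part). Writing K for the set of known ratings, the regularized squared error would read:

```latex
\min_{w_*} \sum_{(u,i) \in \mathcal{K}}
  \left( r_{ui} - b_{ui}
    - |R(u) \setminus \{i\}|^{-\frac{1}{2}}
      \sum_{j \in R(u) \setminus \{i\}} (r_{uj} - b_{uj})\, w_{ij} \right)^2
  + \lambda \sum_{(u,i) \in \mathcal{K}} \; \sum_{j \in R(u) \setminus \{i\}} w_{ij}^2
```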
24
Making the model time-aware

A popular scheme, instance weighting: decay the significance of outdated events within the cost function
[Equation: cost function with a time-decay weight on each event]
Don't do this!
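For contrast, the discouraged scheme can be sketched as a weighted squared error in which each rating's contribution decays exponentially with its age. The exponential form, the decay rate `beta` and the function name are assumptions. The sketch only shows how old ratings are pushed toward zero weight:

```python
import math

def decayed_squared_error(errors_and_ages, beta=0.01):
    """Sum of exp(-beta * age) * err**2 over (prediction error, age in days) pairs.

    Old events barely count, which is exactly what the slides argue against.
    """
    return sum(math.exp(-beta * age) * err ** 2 for err, age in errors_and_ages)

# A fresh error of 1 dominates a 1000-day-old error of 2.
print(decayed_squared_error([(1.0, 0), (2.0, 1000)]))
```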
25
Why isn't instance weighting suitable?

Not enough data per user: need to exploit all signal, including old signal
The learned parameters w_ij represent time-invariant item-item relations, which can also be deduced from older actions
Two items are related when users rated them similarly within a short timeframe, even if this happened long ago
How to do it right?

26
Time-aware neighborhood model

Decay item-item relations based on time distance
User-specific decay rate controlled by β_u
All past user behavior is equally considered through the cost function
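One way to realize this, sketched in Python under the assumption of an exponential decay exp(-β_u · |t_ui - t_uj|): each item-item weight is discounted by how far apart in time user u rated the two items, with a per-user rate beta_u. Every past rating still enters the sum, so no event is dropped from the model. The normalization and all names are hypothetical:

```python
import math

def predict_time_aware(b_ui, t_ui, history, weights, i, beta_u):
    """b_ui + |R(u)|^-1/2 * sum_j exp(-beta_u * |t_ui - t_uj|) * (r_uj - b_uj) * w_ij.

    history: dict item j -> (r_uj, b_uj, t_uj) for every item u has rated
    weights: dict (i, j) -> w_ij; missing pairs count as 0
    """
    if not history:
        return b_ui
    total = 0.0
    for j, (r_uj, b_uj, t_uj) in history.items():
        decay = math.exp(-beta_u * abs(t_ui - t_uj))  # closer in time -> stronger
        total += decay * (r_uj - b_uj) * weights.get((i, j), 0.0)
    return b_ui + total / math.sqrt(len(history))

# Hypothetical usage: the same neighbor counts fully when rated on the same day.
history = {"a": (5.0, 3.5, 100)}
weights = {("i", "a"): 0.4}
print(predict_time_aware(3.0, 100, history, weights, "i", beta_u=0.1))
```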

27
Temporal neighborhood model delivers the same RMSE improvement (0.0117) as the temporal factor model (!)
Global average          1.1296
  Find better items
User average            1.0651
Movie average           1.0533
  Personalization
Cinematch               0.9514   baseline
  Algorithmics
Static neighborhood     0.9002
Static factorization    0.8911
  Time effects
Dynamic neighborhood    0.8885
Dynamic factorization   0.8794
Grand Prize             0.8563   10% improvement
Inherent noise          ????
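The 0.0117 figure in the slide title is the absolute RMSE reduction that each model gains from its temporal extension. A quick Python check against the table:

```python
static = {"neighborhood": 0.9002, "factorization": 0.8911}
dynamic = {"neighborhood": 0.8885, "factorization": 0.8794}

# RMSE gained by adding temporal dynamics to each model
gains = {name: round(static[name] - dynamic[name], 4) for name in static}
print(gains)  # {'neighborhood': 0.0117, 'factorization': 0.0117}
```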
28
Lessons

Modeling temporal effects is significant for improving recommenders' accuracy
Allow multiple time-drifting patterns across users and items
Integrate all users within a single model to allow crucial cross-user collaboration
Model user behavior along the full history; do not over-emphasize recent actions
Separate long-term values, while excluding transient fluctuations from the model
Sudden, single-day effects are significant
Modeling past temporal fluctuations helps in predicting future behavior, even though we do not extrapolate future temporal dynamics

29
Yehuda Koren Yahoo! Research yehuda_at_yahoo-inc.com
