Evaluating Similarity Measures: A Large-Scale Study in the orkut Social Network

1
Evaluating Similarity Measures: A Large-Scale Study in the orkut Social Network

Ellen Spertus
spertus_at_google.com

2
Recommender systems

What are they?
Example: Amazon

3
Controversial recommenders

"What to do when your TiVo thinks you're gay,"
Wall Street Journal, Nov. 26, 2002

http://tinyurl.com/2qyepg
6
Controversial recommenders

Wal-Mart DVD recommendations

http://tinyurl.com/2gp2hm
9
Google's mission

To organize the world's information and make it universally accessible and useful.

10
communities
12
Community recommender

Goal: Per-community ranked recommendations
How to determine?
Implicit collaborative filtering
Look for common membership between pairs of communities

13
Terminology

Consider each community to be a set of members
B: base community (e.g., Pizza)
R: related community (e.g., Cheese)
Similarity measure
Based on overlap B ∩ R

14
Example: Pizza
16
Terminology

Consider each community to be a set of members
B: base community (e.g., Wine)
R: related community (e.g., Linux)
Similarity measure
Based on overlap B ∩ R
Also depends on B and R
Possibly asymmetric

17
Example of asymmetry
18
Similarity measures

L1 normalization
L2 normalization
Pointwise mutual information
Positive correlations
Positive and negative correlations
Salton tf-idf
Log-odds

19
L1 normalization

Vector notation
Set notation

20
L2 normalization

Vector notation
Set notation
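
The formulas for the L1 and L2 slides did not survive in this transcript. As a minimal sketch, assuming the usual set-notation forms |B ∩ R| / (|B| · |R|) for L1 and |B ∩ R| / sqrt(|B| · |R|) for L2, with each community held as a Python set of member IDs (the function names and sample members are hypothetical):

```python
from math import sqrt


def l1_similarity(base: set, related: set) -> float:
    """Overlap divided by the product of the community sizes (assumed L1 form)."""
    if not base or not related:
        return 0.0  # an empty community has no overlap to normalize
    return len(base & related) / (len(base) * len(related))


def l2_similarity(base: set, related: set) -> float:
    """Overlap divided by the geometric mean of the sizes (cosine on 0/1 vectors)."""
    if not base or not related:
        return 0.0
    return len(base & related) / sqrt(len(base) * len(related))


# Hypothetical membership data: two members are in both communities.
pizza = {"ann", "bob", "cat", "dan"}
cheese = {"bob", "cat", "eve"}

l1_score = l1_similarity(pizza, cheese)  # 2 / (4 * 3)
l2_score = l2_similarity(pizza, cheese)  # 2 / sqrt(4 * 3)
```

Both forms use only the overlap and the two community sizes; they differ in how strongly large communities are penalized.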

21
Mutual information: positive correlation

Formally,
Informally, how well membership in the base community predicts membership in the related community
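
The formal definition on this slide was not preserved. A minimal sketch, assuming the commonly used positive-correlation term P(r, b) · log2(P(r, b) / (P(r) · P(b))), with probabilities estimated as fractions of all users (mi_positive and total_users are hypothetical names, and the exact form used in the talk may differ):

```python
from math import log2


def mi_positive(base: set, related: set, total_users: int) -> float:
    """Positive-correlation mutual information term (assumed form)."""
    overlap = len(base & related)
    if overlap == 0:
        return 0.0  # no shared members, so the term contributes nothing
    p_joint = overlap / total_users       # P(r, b)
    p_base = len(base) / total_users      # P(b)
    p_related = len(related) / total_users  # P(r)
    return p_joint * log2(p_joint / (p_base * p_related))


# Hypothetical data: 12 users, memberships statistically independent.
independent_base = set(range(6))
independent_related = {0, 1, 6, 7}
independent_score = mi_positive(independent_base, independent_related, 12)

# Hypothetical data: 20 users, memberships overlap more than chance predicts.
correlated_score = mi_positive({0, 1, 2, 3}, {2, 3, 4}, 20)
```

When membership in the two communities is independent, the ratio inside the logarithm is 1 and the score is about zero; a higher-than-chance overlap gives a positive score.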

22
Mutual information: positive and negative correlation
23
Salton tf-idf
25
LogOdds0

Formally,
Informally, how much likelier a member of B is to belong to R than a non-member of B is.
This yielded the same rankings as L1.
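
The formal definition is missing from the transcript. A minimal sketch based on the informal description, assuming LogOdds0 is the logarithm of P(in R | in B) divided by P(in R | not in B), and that the total number of users is known (log_odds0 and total_users are hypothetical names):

```python
from math import log


def log_odds0(base: set, related: set, total_users: int) -> float:
    """log( P(in R | in B) / P(in R | not in B) ), per the informal description."""
    overlap = len(base & related)
    outside_overlap = len(related) - overlap  # members of R who are not in B
    non_members = total_users - len(base)     # users who are not in B
    if not base or non_members <= 0 or overlap == 0 or outside_overlap == 0:
        raise ValueError("log-odds is undefined for these counts")
    p_related_given_base = overlap / len(base)
    p_related_given_not_base = outside_overlap / non_members
    return log(p_related_given_base / p_related_given_not_base)


# Hypothetical data: 20 users, 4 in B, 3 in R, 2 in both.
# P(R | B) = 2/4 and P(R | not B) = 1/16, so the ratio is 8.
score = log_odds0({0, 1, 2, 3}, {2, 3, 4}, 20)
```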

26
LogOdds
27
Predictions?

Were there significant differences among the measures?
Top-ranked recommendations
User preference
Which measure was best?
Was there a partial or total ordering of measures?

28
Recommendations for "I love wine" (2400)
29
Experiment

Precomputed top 12 recommendations for each base community for each similarity measure
When a user views a community page:
Hash the community and user ID to select an ordered pair of measures
Interleave the two measures' recommendations, filtering out duplicates
Track clicks of new users
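
The selection and interleaving steps above can be sketched as follows. This is an illustration under stated assumptions, not the production code: the choice of SHA-256, the key format, and the function names are hypothetical, and the measure labels are taken from the Results slide.

```python
import hashlib
from itertools import permutations, zip_longest

# Measure labels as they appear on the Results slide.
MEASURES = ["L1", "L2", "MI1", "MI2", "IDF", "LogOdds"]
ORDERED_PAIRS = list(permutations(MEASURES, 2))  # every ordered pair of distinct measures


def pick_measure_pair(community_id: int, user_id: int) -> tuple:
    """Deterministically map a (community, user) pair to an ordered pair of measures."""
    key = f"{community_id}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(ORDERED_PAIRS)
    return ORDERED_PAIRS[index]


def interleave(first: list, second: list) -> list:
    """Alternate items from two ranked lists, keeping only the first copy of each."""
    seen = set()
    merged = []
    for a, b in zip_longest(first, second):
        for item in (a, b):
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


# Hypothetical recommendation lists for one base community.
merged = interleave(["a", "b", "c"], ["b", "d", "e"])
```

Hashing rather than choosing at random means the same user sees the same pair of measures each time they view the same community page.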

30
Click interpretation
32
Overall click rate (July 1-18)
Total recommendation pages generated: 4,106,050
35
Analysis

For each pair of similarity measures Ma and Mb and each click C, either:
Ma recommended C more highly than Mb
Ma and Mb recommended C equally
Mb recommended C more highly than Ma
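
The three-way classification above can be tallied per pair of measures. A minimal sketch, assuming each measure's recommendations are a ranked list and that a community missing from a list counts as ranked below everything in it (the function names and sample data are hypothetical):

```python
from collections import Counter


def rank_of(item, recommendations: list) -> float:
    """Zero-based position in the ranked list; infinity if the item is absent."""
    try:
        return recommendations.index(item)
    except ValueError:
        return float("inf")


def compare_click(clicked, recs_a: list, recs_b: list) -> str:
    """Classify one click as favoring measure a, measure b, or neither."""
    rank_a = rank_of(clicked, recs_a)
    rank_b = rank_of(clicked, recs_b)
    if rank_a < rank_b:
        return "a"  # Ma recommended the clicked community more highly
    if rank_b < rank_a:
        return "b"  # Mb recommended it more highly
    return "tie"    # both measures gave it the same rank


# Hypothetical ranked lists and clicks for one pair of measures.
recs_a = ["x", "y", "z"]
recs_b = ["y", "x", "z"]
tally = Counter(compare_click(c, recs_a, recs_b) for c in ["x", "y", "z"])
```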

36
Results

Clicks leading to joins (best to worst):
L2 > MI1 > MI2 > IDF > L1 > LogOdds
All clicks (best to worst):
L2 > L1 > MI1 > MI2 > IDF > LogOdds

37
Positional effects

Original experiment
Ordered recommendations by rank
Second experiment
Generated recommendations using L2
Pseudo-randomly ordered recommendations, tracking clicks by placement
Tracked 1.3M clicks between September 22 and October 21

38
Results: single row (n = 28,108)
Namorado Para o Bulldog
39
Results: single row (n = 28,108)
p = .12 (not significant)
41
Results: two rows (n = 24,459)
p < .001
43
Results: three rows (n = 1,226,659)
p < .001
44
Users' reactions

Hundreds of requests per day to add recommendations
Angry requests from community creators
General
Specific

46
Amusing recommendations
C
What's she trying to say? For every time a woman has confused you
48
Amusing recommendations
Chocolate
PMS
49
Allowing community owners to set recommendations
51
Manual recommendations