Title: It Takes Variety to Make a World
- Diversification in Recommendation Systems
- Cong Yu (1), Laks Lakshmanan (2), Sihem Amer-Yahia (1)
- (1) Yahoo! Research NYC; (2) University of British Columbia
- March 25th, 2009 at EDBT
Recommendation: An Increasingly Important and Ubiquitous Paradigm on the Web
- The Recommendation Paradigm
- Suggest content (in most cases, items) to a user based on her profile and past activities.
- Why Recommendation?
- Search queries can be generic, e.g., >90% of Yahoo! Travel queries are general descriptions like "family trip".
- More so for Social Content Sites ...
Recommendations on Social Content Sites
- Social Content Sites
- Sites where users make friends and share content
- E.g., del.icio.us, Flickr, etc.
- Recommendation is an indispensable information exploration paradigm on social content sites.
- The rich activities and user connections provide many opportunities for generating recommendations.
Challenges in Recommendation
- While relevance is important, other factors are critical too:
- Diversity: avoid returning items that are too similar to each other.
- Novelty: avoid returning items that users are likely to know already.
- Serendipity: aim to return less relevant items that might give users a pleasant surprise.
- Result Diversification
- From the pool of relevant items, identify a list of items that are dissimilar to each other while maintaining a high cumulative relevance, i.e., strike a good balance between relevance and diversity.
Existing Solutions
- Attribute-Based Diversification
- Follow Three Steps
- Obtain the attributes of each relevant item
- Define a pair-wise item-to-item distance function based on those attributes
- Perform diversification
- Optimizing an overall score as a weighted combination of relevance and distance
- Constraining either relevance or distance while maximizing the other
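To make the "weighted combination" option concrete, here is a minimal sketch of a generic greedy selector that scores each remaining item as lam * relevance + (1 - lam) * (its average distance to the items already chosen). It assumes attributes are plain sets (e.g., genres) compared with Jaccard distance; the names `weighted_greedy`, `jaccard_distance`, `lam`, and the movie data are hypothetical, and greedy selection is one common choice, not the specific method of any particular existing system.

```python
from typing import Dict, FrozenSet, List


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """1 - |a & b| / |a | b|; two empty attribute sets count as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def weighted_greedy(relevance: Dict[str, float],
                    attrs: Dict[str, FrozenSet[str]],
                    k: int, lam: float = 0.5) -> List[str]:
    """Greedily pick k items maximizing lam * relevance + (1 - lam) * avg distance."""
    selected: List[str] = []
    remaining = sorted(relevance)  # sorted for deterministic tie-breaking

    def score(item: str) -> float:
        if not selected:
            diversity = 0.0
        else:
            diversity = sum(jaccard_distance(attrs[item], attrs[s])
                            for s in selected) / len(selected)
        return lam * relevance[item] + (1 - lam) * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical movies whose attributes are genre sets.
relevance = {"m1": 0.9, "m2": 0.85, "m3": 0.5}
attrs = {
    "m1": frozenset({"action", "thriller"}),
    "m2": frozenset({"action", "thriller"}),
    "m3": frozenset({"comedy"}),
}
print(weighted_greedy(relevance, attrs, k=2, lam=1.0))  # ['m1', 'm2']
print(weighted_greedy(relevance, attrs, k=2, lam=0.5))  # ['m1', 'm3']
```

With lam = 1.0 the selector degenerates to ranking by relevance alone; the result flips as lam changes, which shows how much such methods depend on choosing the weight well.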
Problems with Existing Attribute-Based Diversification
- Lack of attributes for objects
- E.g., URLs in del.icio.us and photos in Flickr
- Overhead of retrieving attributes for certain recommendation strategies
- Difficulties in estimating the correct parameters/thresholds for diversification algorithms
Our Solutions: Explanation-Based Diversification; Dynamic Diversification Algorithms
Main Contributions
- Formalized the Notion of Explanation-Based Diversification
- Designed and Implemented Algorithms for
- Scalable Similarity Computation
- Explanation Generation
- Diversification
- Experimentally Evaluated
- The characteristics of diversification algorithms
- The practicality of explanation-based diversification
- The performance overhead of explanation-based diversification
Outline
- Motivation
- Problem Definition
- Algorithms
- Similarity Computation
- Recommendation Generation with Explanation
- Diversification
- Experimental Evaluation
- Conclusion
Recommendation Strategies Overview
- Item-Based Strategies
- Estimate the rating of an unrated item (i) by the user (u) based on its similarity to the items u has already rated and on how u rated those items.
- Collaborative Filtering Strategies
- Estimate the rating of i by u based on how u's similarity network (either explicit or implicit) rated i.
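Both strategies can be sketched as similarity-weighted averages. This is a common textbook formulation assumed here, not one taken from the slides: the item-based estimate averages u's own ratings on items similar to i, and the collaborative-filtering estimate averages the ratings that u's similar users gave to i. The functions, the `item_sim`/`user_sim` callbacks, and the sample data are hypothetical.

```python
from typing import Callable, Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_based_estimate(ratings: Ratings, user: str, item: str,
                        item_sim: Callable[[str, str], float]) -> Optional[float]:
    """Similarity-weighted average of the user's own ratings on items similar to `item`."""
    num = den = 0.0
    for other, r in ratings[user].items():
        s = item_sim(item, other)
        if s > 0:
            num += s * r
            den += s
    return num / den if den > 0 else None  # None: no similar rated item


def cf_estimate(ratings: Ratings, user: str, item: str,
                user_sim: Callable[[str, str], float]) -> Optional[float]:
    """Similarity-weighted average of how the user's similar users rated `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        s = user_sim(user, other)
        if s > 0:
            num += s * their[item]
            den += s
    return num / den if den > 0 else None  # None: no similar user rated it


ratings = {"ann": {"a": 5, "b": 1}, "bob": {"a": 4, "c": 5}, "cat": {"c": 2}}
item_sims = {("c", "a"): 0.5}                              # hypothetical
user_sims = {("ann", "bob"): 0.75, ("ann", "cat"): 0.25}   # hypothetical


def isim(i: str, j: str) -> float:
    return item_sims.get((i, j), 0.0)


def usim(u: str, v: str) -> float:
    return user_sims.get((u, v), 0.0)


print(item_based_estimate(ratings, "ann", "c", isim))  # 5.0
print(cf_estimate(ratings, "ann", "c", usim))          # 4.25
```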
Explanation
- Basic Notion
- The set of objects because of which a particular item is recommended to the user
- Explanation for Item-Based Strategies
- Explanation for Collaborative Filtering Strategies
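The slide's definitions of the two explanation types were not preserved. A plausible reading, assumed here: for item-based strategies, the explanation of i is the set of items u has rated that are similar to i; for collaborative filtering, it is the set of u's similar users who rated i. A minimal sketch with hypothetical names and data:

```python
from typing import Callable, Dict, FrozenSet

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def item_based_explanation(ratings: Ratings, user: str, item: str,
                           item_sim: Callable[[str, str], float]) -> FrozenSet[str]:
    """Items the user already rated that are similar to the recommended item."""
    return frozenset(j for j in ratings[user] if item_sim(item, j) > 0)


def cf_explanation(ratings: Ratings, user: str, item: str,
                   user_sim: Callable[[str, str], float]) -> FrozenSet[str]:
    """Users similar to `user` who rated the recommended item."""
    return frozenset(v for v, their in ratings.items()
                     if v != user and item in their and user_sim(user, v) > 0)


ratings = {"ann": {"a": 5, "b": 1}, "bob": {"a": 4, "c": 5}, "cat": {"c": 2}}
item_sims = {("c", "a"): 0.5}        # hypothetical
user_sims = {("ann", "bob"): 0.75}   # hypothetical; ann and cat are unrelated


def isim(i: str, j: str) -> float:
    return item_sims.get((i, j), 0.0)


def usim(u: str, v: str) -> float:
    return user_sims.get((u, v), 0.0)


print(sorted(item_based_explanation(ratings, "ann", "c", isim)))  # ['a']
print(sorted(cf_explanation(ratings, "ann", "c", usim)))          # ['bob']
```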
Explanation-Based Diversity
- Pair-wise diversity distance between two recommended items
- Based on standard similarity measures, such as Jaccard similarity and cosine similarity
- E.g., distance based on Jaccard similarity
- Diversity for the set of recommended items (S)
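The formulas on this slide were lost. Assuming the distance is one minus the Jaccard similarity of the two items' explanation sets, written here as DD(i, j) = 1 - |Expl(i) ∩ Expl(j)| / |Expl(i) ∪ Expl(j)|, and that the diversity of S is the average of DD over all item pairs in S (consistent with the "average distance of all item pairs" definition used later), the computation can be sketched as follows. The notation DD, the function names, and the sample explanations are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def explanation_distance(e1: FrozenSet[str], e2: FrozenSet[str]) -> float:
    """Jaccard distance between two explanation sets."""
    union = e1 | e2
    if not union:
        return 0.0
    return 1.0 - len(e1 & e2) / len(union)


def set_diversity(items: Sequence[str], expl: Dict[str, FrozenSet[str]]) -> float:
    """Average pair-wise explanation distance over all item pairs in S."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(explanation_distance(expl[a], expl[b]) for a, b in pairs) / len(pairs)


expl = {
    "x": frozenset({"a", "b"}),  # x is recommended because of objects a and b
    "y": frozenset({"b", "c"}),
    "z": frozenset({"d"}),
}
print(explanation_distance(expl["x"], expl["y"]))  # 1 - 1/3
print(set_diversity(["x", "y", "z"], expl))        # (2/3 + 1 + 1) / 3
```

Note that nothing here needs item attributes: only the explanation sets, which the recommender already produces.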
Benefits of Explanation-Based Diversification
- Applicable to items without attributes or whose attributes are difficult to analyze
- Common on social content sites
- Explanations are by-products of many recommendation processes
- They can be maintained with little overhead
Top-K Recommendation with Diversification
Given a user u, find a subset S of the set of candidate items such that |S| = k and the overall relevance of the items in S and the diversity of S are balanced.
Outline
- Motivation
- Problem Definition
- Algorithms
- Similarity Computation (briefly)
- Recommendation Generation with Explanation (briefly)
- Diversification
- Experimental Evaluation
- Conclusion
Diversification: Balance between Relevance and Diversity
- Relevance: cumulative relevance of all items in the result set
- Diversity: average distance of all item pairs in the result set, as described earlier
- Ideal Scenario: identify a top-k result set that maximizes both relevance and diversity
- Such a top-k set is often impossible to find
Naïve Solution: Maximize Relevance
Naïve Solution: Maximize Diversity
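The two naïve solutions can be sketched in a few lines, assuming a hypothetical pair-wise distance table over the four candidates c1 to c4 with the relevance scores from the Swap example. Maximizing relevance returns the two most similar items, while maximizing diversity (by brute force over all k-subsets, feasible only for tiny inputs) returns the two least relevant ones, which is why neither extreme is satisfactory. Function names and distances are illustrative assumptions.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Dist = Callable[[str, str], float]


def avg_distance(items: Sequence[str], dist: Dist) -> float:
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def max_relevance(relevance: Dict[str, float], k: int) -> List[str]:
    """Naive solution 1: the k most relevant items, ignoring diversity."""
    return sorted(relevance, key=relevance.get, reverse=True)[:k]


def max_diversity(relevance: Dict[str, float], k: int, dist: Dist) -> List[str]:
    """Naive solution 2: the k-subset with the highest average distance,
    ignoring relevance. Enumerates all C(n, k) subsets."""
    return list(max(combinations(sorted(relevance), k),
                    key=lambda s: avg_distance(s, dist)))


relevance = {"c1": 0.9, "c2": 0.6, "c3": 0.4, "c4": 0.3}
d = {("c1", "c2"): 0.1, ("c1", "c3"): 0.8, ("c1", "c4"): 0.7,
     ("c2", "c3"): 0.5, ("c2", "c4"): 0.6, ("c3", "c4"): 0.9}  # hypothetical


def dist(a: str, b: str) -> float:
    return d[tuple(sorted((a, b)))]


print(max_relevance(relevance, 2))        # ['c1', 'c2'], distance only 0.1
print(max_diversity(relevance, 2, dist))  # ['c3', 'c4'], the two least relevant
```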
Smarter Solutions
- Eliminate items with scores below a threshold and choose the k items among the remaining ones with the maximum diversity
- Eliminate item pairs with distance below a threshold (by removing the item with the lower score) and choose the k items with the highest scores
Algorithm Swap combines both!
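A minimal sketch of the two threshold-based solutions, under hypothetical distances. The second one is interpreted here as a scan in decreasing score order that drops an item when it is too close to a higher-scoring item already kept; that reading, the function names, and the data are assumptions. The last call shows how an overly strict threshold leaves fewer than k items.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Dist = Callable[[str, str], float]


def avg_distance(items: Sequence[str], dist: Dist) -> float:
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def relevance_threshold_then_diversity(relevance: Dict[str, float], k: int,
                                       dist: Dist, min_score: float) -> List[str]:
    """Drop items scoring below min_score, then pick the most diverse k-subset
    of what remains (brute force). Returns fewer than k items if too few survive."""
    survivors = sorted(i for i, s in relevance.items() if s >= min_score)
    if len(survivors) <= k:
        return survivors
    return list(max(combinations(survivors, k), key=lambda s: avg_distance(s, dist)))


def distance_threshold_then_relevance(relevance: Dict[str, float], k: int,
                                      dist: Dist, min_dist: float) -> List[str]:
    """Scan items by decreasing score, dropping any item closer than min_dist
    to an already kept (higher-scoring) item; return the k best survivors."""
    kept: List[str] = []
    for item in sorted(relevance, key=relevance.get, reverse=True):
        if all(dist(item, other) >= min_dist for other in kept):
            kept.append(item)
    return kept[:k]


relevance = {"c1": 0.9, "c2": 0.6, "c3": 0.4, "c4": 0.3}
d = {("c1", "c2"): 0.1, ("c1", "c3"): 0.8, ("c1", "c4"): 0.7,
     ("c2", "c3"): 0.5, ("c2", "c4"): 0.6, ("c3", "c4"): 0.2}  # hypothetical


def dist(a: str, b: str) -> float:
    return d[tuple(sorted((a, b)))]


print(relevance_threshold_then_diversity(relevance, 2, dist, min_score=0.35))  # ['c1', 'c3']
print(relevance_threshold_then_diversity(relevance, 2, dist, min_score=0.5))   # ['c1', 'c2']
print(distance_threshold_then_relevance(relevance, 2, dist, min_dist=0.3))     # ['c1', 'c3']
print(distance_threshold_then_relevance(relevance, 2, dist, min_dist=0.85))    # ['c1'], fewer than k
```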
Algorithm Swap
- Sort candidate items according to their relevance
- Start by adding the K most relevant items to the result set
- Go through the rest of the candidates one by one, and swap an item into the result set if the item
- Increases the set diversity above a certain threshold
- Does not drop the relevance by more than a certain threshold
- A simple top-2 example
[Figure: top-2 example with relevance c1 = 0.9, c2 = 0.6, c3 = 0.4, c4 = 0.3. Initial set: {c1, c2}. Considering c3: c3 replaces c2 because the diversity increase outweighs the relevance drop, giving {c1, c3}. Considering c4: no change.]
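A minimal sketch of Swap as described above. Details the slide leaves open are filled in as assumptions: diversity is the average pair-wise distance, relevance is cumulative (so a swap lowers it by rel(out) - rel(candidate)), and when several members could be swapped out, the one giving the largest diversity gain is chosen. The thresholds `min_div_gain` and `max_rel_drop` and the distances are hypothetical; they were picked so the run reproduces the top-2 example (c3 replaces c2, c4 changes nothing).

```python
from itertools import combinations
from typing import Callable, Dict, List

Dist = Callable[[str, str], float]


def set_diversity(items: List[str], dist: Dist) -> float:
    """Average pair-wise distance of the items in the set."""
    pairs = list(combinations(items, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def swap(relevance: Dict[str, float], k: int, dist: Dist,
         min_div_gain: float, max_rel_drop: float) -> List[str]:
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    result = ranked[:k]                      # start from the k most relevant items
    for cand in ranked[k:]:                  # remaining candidates, best first
        current_div = set_diversity(result, dist)
        best_gain, best_set = None, None
        for out in result:                   # try replacing each current member
            trial = [x for x in result if x != out] + [cand]
            gain = set_diversity(trial, dist) - current_div
            drop = relevance[out] - relevance[cand]  # loss in cumulative relevance
            if gain >= min_div_gain and drop <= max_rel_drop:
                if best_gain is None or gain > best_gain:
                    best_gain, best_set = gain, trial
        if best_set is not None:
            result = best_set
    return result


relevance = {"c1": 0.9, "c2": 0.6, "c3": 0.4, "c4": 0.3}
d = {("c1", "c2"): 0.1, ("c1", "c3"): 0.8, ("c1", "c4"): 0.7,
     ("c2", "c3"): 0.5, ("c2", "c4"): 0.6, ("c3", "c4"): 0.2}  # hypothetical


def dist(a: str, b: str) -> float:
    return d[tuple(sorted((a, b)))]


print(swap(relevance, 2, dist, min_div_gain=0.3, max_rel_drop=0.3))  # ['c1', 'c3']
print(swap(relevance, 2, dist, min_div_gain=0.3, max_rel_drop=0.1))  # ['c1', 'c2']
```

Tightening `max_rel_drop` to 0.1 blocks every swap and leaves the plain top-2, a small illustration of how sensitive the outcome is to the threshold values.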
Challenge: The thresholds needed to produce the right top-K recommendations are often difficult to identify
Algorithm Iterative Greedy
- Dynamically identify the thresholds
- Establish two diversity bounds, the Upper Bound (UB) and the Lower Bound (LB)
- At each iteration, scan the candidates
- Items passing the upper bound go to KeepList
- Items not passing the lower bound go to DiscardList
- At the end of each iteration, the bounds are adjusted
- Stop when exactly K items are generated
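Algorithm Iterative Greedy, contd.
[Figure: one iteration of Iterative Greedy. Candidates are tested against the threshold B = (UB + LB)/2; those that pass go to DivList and those that do not go to SimList. If |DivList| + |KeepList| < K, then UB ← B; otherwise LB ← B. The figure also shows KeepList, DiscardList, and the candidates carried into the next iteration.]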
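The sketch below captures the core idea of identifying the threshold dynamically: bisect on a diversity threshold B = (UB + LB)/2 until a greedy scan yields exactly K items. It is a simplified variant, not the full algorithm: it rescans all candidates every iteration instead of maintaining KeepList and DiscardList, it assumes distances lie in [0, 1], and `greedy_scan`, `max_iter`, and the fallback result are hypothetical additions (the fallback guarantees an answer if the bisection never lands on exactly K).

```python
from typing import Callable, Dict, List

Dist = Callable[[str, str], float]


def greedy_scan(ranked: List[str], dist: Dist, b: float) -> List[str]:
    """In relevance order, keep an item only if it is at least b away from every kept item."""
    picked: List[str] = []
    for item in ranked:
        if all(dist(item, p) >= b for p in picked):
            picked.append(item)
    return picked


def iterative_greedy(relevance: Dict[str, float], k: int, dist: Dist,
                     lb: float = 0.0, ub: float = 1.0, max_iter: int = 30) -> List[str]:
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    fallback = ranked[:k]  # with non-negative distances, threshold 0 keeps the plain top-k
    for _ in range(max_iter):
        b = (ub + lb) / 2
        picked = greedy_scan(ranked, dist, b)
        if len(picked) == k:
            return picked          # exactly K items: stop
        if len(picked) < k:
            ub = b                 # threshold too strict: lower the upper bound
        else:
            lb = b                 # threshold too loose: raise the lower bound
            fallback = picked[:k]  # best-effort answer if bisection never hits K
    return fallback


relevance = {"c1": 0.9, "c2": 0.6, "c3": 0.4, "c4": 0.3}
d = {("c1", "c2"): 0.1, ("c1", "c3"): 0.8, ("c1", "c4"): 0.7,
     ("c2", "c3"): 0.5, ("c2", "c4"): 0.6, ("c3", "c4"): 0.2}  # hypothetical


def dist(a: str, b: str) -> float:
    return d[tuple(sorted((a, b)))]


print(iterative_greedy(relevance, 2, dist))  # ['c1', 'c3']
print(iterative_greedy(relevance, 3, dist))  # ['c1', 'c3', 'c4']
```

No relevance or diversity threshold is supplied by the caller; only the target size K drives the search.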
Brief Overview of Similarity Computation
- An explicit network is not enough
- E.g., only 10% of users have at least one friend in del.icio.us
- Similarities between users can be generated based on their activities
- Costly with pair-wise comparison
- E.g., 1 million users means ~1 trillion comparisons
- Only a small fraction of those comparisons result in a similarity above a given threshold
- Item-Based Similarity Computation
- Organize items based on their number of raters
- Start with the items with the largest number of raters
- Compare two users only if they share enough rated items
- Details in the paper
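A simplified sketch of the "compare two users only if they share enough rated items" idea: an inverted index from items to raters generates candidate pairs, so users with no co-rated item are never compared at all. Jaccard similarity over rated-item sets, the thresholds `min_common` and `min_sim`, and the data are assumptions; the item ordering by number of raters described above is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def similar_user_pairs(user_items: Dict[str, Set[str]], min_common: int,
                       min_sim: float) -> Dict[Tuple[str, str], float]:
    # Inverted index: item -> users who rated it.
    raters: Dict[str, Set[str]] = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            raters[item].add(user)

    # Count co-rated items; users who share no item are never paired.
    common: Dict[Tuple[str, str], int] = defaultdict(int)
    for users in raters.values():
        for u, v in combinations(sorted(users), 2):
            common[(u, v)] += 1

    # Jaccard similarity only for pairs that share enough items.
    result: Dict[Tuple[str, str], float] = {}
    for (u, v), n in common.items():
        if n < min_common:
            continue
        sim = n / len(user_items[u] | user_items[v])
        if sim >= min_sim:
            result[(u, v)] = sim
    return result


user_items = {"ann": {"a", "b", "c"}, "bob": {"a", "b", "d"}, "cat": {"e"}}
print(similar_user_pairs(user_items, min_common=2, min_sim=0.3))  # {('ann', 'bob'): 0.5}
```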
Brief Description of Recommendation Generation with Explanation
- Post-Processing Approach
- Generate the recommendation result set
- Generate candidates
- Compute scores for candidates
- Sort the candidates
- For each item in the result set, fetch its explanations
- Integrated Approach
- Maintain the list of similar items or similar users while the candidates are being generated
- Create the explanations for each item while the scores are being computed
- Details in the paper
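A minimal sketch of the integrated approach for an item-based strategy, assuming each item has a precomputed list of similar items with similarity weights (a hypothetical `item_neighbors` structure). Each candidate's score and its explanation (the rated items that contributed to that score) are built in the same pass, so no separate explanation lookup is needed afterwards. Using the sum of similarities as the score is an assumption.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def recommend_with_explanations(
        user_items: Dict[str, Set[str]],
        item_neighbors: Dict[str, Dict[str, float]],
        user: str, k: int) -> List[Tuple[str, float, FrozenSet[str]]]:
    seen = user_items[user]
    scores: Dict[str, float] = defaultdict(float)
    explanations: Dict[str, Set[str]] = defaultdict(set)
    for rated in seen:                                 # items the user already rated
        for cand, sim in item_neighbors.get(rated, {}).items():
            if cand in seen:
                continue                               # only recommend unrated items
            scores[cand] += sim                        # the score accumulates ...
            explanations[cand].add(rated)              # ... together with its explanation
    top = sorted(scores, key=lambda i: (-scores[i], i))[:k]
    return [(i, scores[i], frozenset(explanations[i])) for i in top]


user_items = {"ann": {"a", "b"}}
item_neighbors = {"a": {"x": 0.5, "y": 0.25},         # hypothetical similarity lists
                  "b": {"x": 0.25, "z": 1.0, "a": 0.5}}
for item, score, why in recommend_with_explanations(user_items, item_neighbors, "ann", k=2):
    print(item, score, sorted(why))
# z 1.0 ['b']
# x 0.75 ['a', 'b']
```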
Outline
- Motivation
- Problem Definition
- Algorithms
- Similarity Computation (briefly)
- Recommendation Generation with Explanation (briefly)
- Diversification
- Experimental Evaluation
- Conclusion
Experimental Data
- Real-world data sets
- del.icio.us: online bookmark sharing site
- Y! Movies: Yahoo!'s online movie sharing site
Result Comparison
Explanations Can Serve as a Good Basis for Diversification
- Using the Yahoo! Movies data set, we compare diversified results obtained based on explanations with those obtained based on attributes
Overhead of Diversification Is Small
Outline
- Motivation
- Problem Definition
- Algorithms
- Similarity Computation (briefly)
- Recommendation Generation with Explanation (briefly)
- Diversification
- Experimental Evaluation
- Conclusion
Conclusion
- Recommendation is becoming an indispensable information exploration paradigm
- Explanation-Based Diversification is a practical alternative to attribute-based diversification
- Algorithms Swap and Iterative Greedy strike a good balance between relevance and diversity
- Questions?