Top-N Recommendation Algorithm Based on Item-Graph - PowerPoint PPT Presentation

1
Top-N Recommendation Algorithm Based on Item-Graph
  • Allen, Zhenjiang LIN
  • CSE, CUHK
  • June 7, 2007

2
Outline
  • 1. Top-N Recommendation Problem
  • 2. Top-N Recommendation Algorithm
  • 3. Item-Graph Model and GCP-based Method
    • Item-Graph Model
    • Generalized Conditional Probability (GCP)-based
      Recommendation Algorithm
  • 4. Preliminary Experimental Results
  • 5. Conclusion and Future Work

3
1. Top-N Recommendation Problem
  • The Top-N Recommendation Problem
    • Given the preference information of users,
      recommend to a given user a set of N items that
      he might be interested in, based on the items he
      has selected.
    • E-commerce system example: Amazon.com,
      customers vs. products.

              Item 1  Item 2  Item 3  ...  Item m
User 1          1       0       1     ...    0
User 2          1       1       0     ...    0
...
User n          0       1       0     ...    1
New User 1      ?       1       ?     ...    ?
User-Item matrix
4
Example: Amazon.com
5
1. Top-N Recommendation Problem
  • Challenges in E-commerce Systems
    • Huge amounts of data: millions of users and/or
      items.
    • Real time: the result set must be returned in
      real time.
    • Limited preference information for new users.
    • Volatile user preference information.
  • Contributions
    • Propose the Item-Graph model:
      • simple and incremental;
      • reflects the relationships among items.
    • Develop the Generalized Conditional
      Probability-based top-N recommendation algorithm:
      • item-centric;
      • based on the Item-Graph model.

6
2. Top-N Recommendation Algorithm
  • Two main paradigms
    • Content-based: recommend items based on the
      content (textual information) of the items.
      • Fab system [Balabanovic97], Syskill & Webert
        system [Pazzani97].
    • Collaborative Filtering (CF): recommend items by
      collecting taste information from other users.
      • Collaboration between users (link information).
      • More popular than content-based recommendation,
        since in many domains (such as music,
        restaurants) it is hard to extract useful
        features from items.
      • Tapestry system [Goldberg92], Video Recommender
        [Hill95], Ringo [Shardanand95], GroupLens
        [Konstan97], Jester system [Goldberg01], Amazon
        [Linden03].

7
2. Top-N Recommendation Algorithm
  • CF algorithms classified by how they use the
    data
    • Memory-based: make recommendations based on the
      entire collection of the users' preferences.
      • No pre-computation is needed, but these methods
        suffer from serious scalability problems.
      • E.g., correlation-based [Resnick94], cosine-based
        [Breese98].
    • Model-based: use the collection of user
      preferences to learn a model, which is then used
      to make recommendations.
      • The model is built offline, so these methods are
        more scalable.
      • E.g., cluster models [Ungar98], Bayesian network
        model [Breese98], association rule mining
        approach [Lin00].

8
2. Top-N Recommendation Algorithm
  • CF algorithms classified by the objects they
    compare
    • User-centric: look for similar (like-minded)
      users first and then make recommendations.
      • Similarity between users is relatively dynamic.
      • Pre-computing user neighborhoods may lead to poor
        predictions.
    • Item-centric: look for similar (or related) items
      first and then make recommendations.
      • Similarity between items is relatively static.
      • This enables pre-computing item-item similarities,
        and is therefore more scalable.
  • The aim of our work
    • A model-based, item-centric CF top-N
      recommendation algorithm.

9
2. Top-N Recommendation Algorithm
  • Notations
    • Item set: $I = \{I_1, I_2, \ldots, I_m\}$.
    • User set: $U = \{U_1, U_2, \ldots, U_n\}$.
    • User-item matrix: $D = (D_{n,m})$, an $n \times m$ matrix.
    • Basket of the active user: $B \subseteq I$.
    • Similarity score of $x$ and $y$: $\mathrm{sim}(x, y)$.
  • Formal definition of the top-N recommendation problem
    • Given a user-item matrix $D$ and a set of items $B$
      that have been purchased by the active user,
      identify an ordered set of items $X$ such that
      $|X| \le N$ and $X \cap B = \emptyset$.

10
2. Top-N Recommendation Algorithm
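  • Two classical item-item similarity measures
    • Cosine-based (symmetric), where $D_{*,i}$ and $D_{*,j}$
      are columns $i$ and $j$ of $D$:
      $\mathrm{sim}(I_i, I_j) = \cos(D_{*,i}, D_{*,j})$   (1)
    • Conditional Probability (CP)-based (asymmetric):
      $\mathrm{sim}(I_i, I_j) = P(I_j \mid I_i) = \mathrm{Freq}(\{I_i, I_j\}) / \mathrm{Freq}(\{I_i\})$   (2)
      • $\mathrm{Freq}(X)$: the number of customers who have
        purchased the item set $X$.
  • The ranking score for item $x$:
    $RS(x) = \sum_{b \in B} \mathrm{sim}(b, x)$   (3)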
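The three formulas can be tried on a toy example. Below is a minimal Python sketch, assuming a small hypothetical binary matrix `D` (rows are users, columns are items indexed from 0); the function names and the tie-breaking rule are illustrative choices, not part of the original method.

```python
from math import sqrt

# Hypothetical toy user-item matrix (rows: users, columns: items 0..3); 1 = purchased.
D = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]

def freq(D, items):
    """Freq(X): number of users whose row contains every item in `items`."""
    return sum(all(row[j] for j in items) for row in D)

def cos_sim(D, i, j):
    """Eq. (1): cosine of item columns i and j (symmetric)."""
    dot = sum(row[i] * row[j] for row in D)
    norm_i = sqrt(sum(row[i] ** 2 for row in D))
    norm_j = sqrt(sum(row[j] ** 2 for row in D))
    return dot / (norm_i * norm_j) if norm_i and norm_j else 0.0

def cp_sim(D, i, j):
    """Eq. (2): P(I_j | I_i) = Freq({I_i, I_j}) / Freq({I_i}) (asymmetric)."""
    f_i = freq(D, [i])
    return freq(D, [i, j]) / f_i if f_i else 0.0

def top_n(D, basket, n, sim):
    """Eq. (3): score each x not in B by RS(x) = sum of sim(b, x) over b in B."""
    candidates = [x for x in range(len(D[0])) if x not in basket]
    scores = {x: sum(sim(D, b, x) for b in basket) for x in candidates}
    # Highest score first; ties broken by lower item index (an arbitrary choice).
    return sorted(candidates, key=lambda x: (-scores[x], x))[:n]

print(top_n(D, {1}, 2, cp_sim))   # [0, 2]
print(top_n(D, {1}, 2, cos_sim))  # [0, 3]
```

Here the active user's basket is B = {1}. CP ranks item 2 second (it ties with item 3 at 1/3 and wins on index), whereas cosine ranks item 3 second; and `cp_sim(D, 1, 3)` is 1/3 while `cp_sim(D, 3, 1)` is 1.0, which is the asymmetry noted for Eq. (2).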

11
3. Item-Graph Model and GCP-based Method
  • Intuitions behind the Item-Graph
    • The similarity between two items is proportional
      to the number of times they are co-purchased.
    • The similarity of item pairs is transmissible.
    • E.g., [figure]
  • Definition of the Item-Graph
    • Given a dataset $D = (D_{n,m})$, the Item-Graph is
      defined as a weighted undirected graph $G = (V, E, W)$,
      where
      • $V$ is the item set $I$;
      • an edge $(x, y) \in E$ if and only if items $x$ and $y$
        have been co-purchased;
      • the weight of edge $(x, y)$ is the number of
        co-purchases of items $x$ and $y$.
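A minimal sketch of the definition, assuming baskets are given as Python sets of item labels and that edge weights are stored in a dict keyed by `frozenset({x, y})` (so the key is the same for (x, y) and (y, x), matching an undirected graph). The function name and the baskets are hypothetical; here V is taken as the items seen in the baskets, a stand-in for the full item set I.

```python
from collections import defaultdict
from itertools import combinations

def build_item_graph(baskets):
    """Build the Item-Graph G = (V, E, W) from a list of user baskets.

    V: items seen in some basket (a stand-in for the full item set I).
    E/W: dict mapping frozenset({x, y}) -> number of users who bought both x and y;
    pairs never co-purchased get no entry, i.e. no edge.
    """
    vertices = set()
    weights = defaultdict(int)
    for basket in baskets:
        items = sorted(set(basket))
        vertices.update(items)
        for x, y in combinations(items, 2):
            weights[frozenset((x, y))] += 1  # frozenset key: the graph is undirected
    return vertices, dict(weights)

# Hypothetical baskets for four users.
baskets = [{"a", "c"}, {"a", "b"}, {"b", "d"}, {"a", "b", "c"}]
V, W = build_item_graph(baskets)
print(W[frozenset(("a", "b"))])    # 2
print(frozenset(("c", "d")) in W)  # False: c and d were never co-purchased
```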

12
3. Item-Graph Model and GCP-based Method
  • Updating the Item-Graph is easy
    • Adding a new user's preference information $T$ to
      the graph needs $O(|T|^2)$ operations, including
      adding edges and/or increasing edge weights.
    • E.g., [figure]
  • Potential direct applications of the Item-Graph
    • Clustering the items.
    • Measuring item-item similarity.
    • Measuring the importance of items.
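The $O(|T|^2)$ update cost follows from touching each of the $|T|(|T|-1)/2$ item pairs in the new basket exactly once. A minimal sketch, assuming the same hypothetical dict-of-`frozenset` representation for edge weights; `add_user` is an illustrative name.

```python
from itertools import combinations

def add_user(weights, basket):
    """Fold one new user's basket T into the Item-Graph in place.

    `weights` maps frozenset({x, y}) -> co-purchase count (hypothetical representation).
    Each of the |T|(|T|-1)/2 pairs in T either creates a new edge with weight 1
    or increments an existing edge, so the update costs O(|T|^2); no other edge changes.
    Returns the number of edge operations performed.
    """
    ops = 0
    for x, y in combinations(sorted(set(basket)), 2):
        key = frozenset((x, y))
        weights[key] = weights.get(key, 0) + 1
        ops += 1
    return ops

weights = {frozenset(("a", "b")): 3}
print(add_user(weights, ["a", "b", "c"]))  # 3 pair operations for |T| = 3
print(weights[frozenset(("a", "b"))])      # 4: existing edge incremented
print(weights[frozenset(("b", "c"))])      # 1: new edge added
```

Nothing is recomputed for users already in the graph, which is what makes the model incremental.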

13
3. Item-Graph Model and GCP-based Method
  • Ideas in the Generalized Conditional
    Probability-based method
    • According to the definition of the top-N
      recommendation problem, for any $x \in I - B$, we
      just need to compute the basket-based
      conditional probability
      $P(x \mid B) = \mathrm{Freq}(\{x\} \cup B) / \mathrm{Freq}(B)$. However,
      • $\mathrm{Freq}(\{x\} \cup B)$ or $\mathrm{Freq}(B)$ may be zero
        (no user has purchased the whole set), or
      • they may be too small to be meaningful.
    • The CP-based method instead sums the
      1-item-based conditional probabilities $P(x \mid y)$,
      where $x \in I - B$, $y \in B$.
    • However, multi-item-based conditional
      probabilities may also contribute to the
      recommendation.
      • E.g., suppose the ranking scores of $x$ and $y$
        computed by the CP-based method are equal, and
        we also know $P(x \mid B) > P(y \mid B)$. Which one
        should be ranked higher, $x$ or $y$?
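To see why the exact basket-based probability is fragile, here is a minimal sketch that computes $P(x \mid B)$ directly from raw baskets. The baskets and function names are hypothetical; even with four users, a three-item basket already has $\mathrm{Freq}(B) = 0$.

```python
def freq(baskets, items):
    """Freq(X): number of users whose basket contains every item of X."""
    items = set(items)
    return sum(items <= set(b) for b in baskets)

def basket_cp(baskets, x, B):
    """Exact P(x | B) = Freq({x} ∪ B) / Freq(B); None if no user bought all of B."""
    f_b = freq(baskets, B)
    if f_b == 0:
        return None  # the ratio is undefined: nobody has purchased the whole basket
    return freq(baskets, set(B) | {x}) / f_b

# Hypothetical baskets for four users.
baskets = [{"a", "c"}, {"a", "b"}, {"b", "d"}, {"a", "b", "c"}]
print(basket_cp(baskets, "c", {"a", "b"}))       # 0.5
print(basket_cp(baskets, "c", {"a", "b", "d"}))  # None: Freq(B) = 0
```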

14
3. Item-Graph Model and GCP-based Method
  • The Generalized Conditional Probability
    (GCP)-based recommendation algorithm
    • The ranking score of item $x$ is defined as the sum
      of all possible multi-item-based conditional
      probabilities, that is,
      $GCP(x \mid B) = \sum_{S \subseteq B} P(x \mid S) = \sum_{S \subseteq B} \mathrm{Freq}(\{x\} \cup S) / \mathrm{Freq}(S)$.   (4)
    • However, the number of subsets of $B$ is $2^{|B|}$,
      which grows exponentially with the basket size.
    • Use $GCP_d(x \mid B)$ instead ($d = 2$ in the following
      experiments):
      $GCP_d(x \mid B) = \sum_{S \subseteq B,\ |S| \le d} P(x \mid S)$.   (5)
    • $\mathrm{Freq}(\{x\} \cup S)$ and $\mathrm{Freq}(S)$ can be extracted
      approximately from the Item-Graph.
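A minimal sketch of Eq. (5), assuming that the empty subset is skipped (so $d = 1$ gives the CP-based score of Eqs. (2) and (3)) and that subsets nobody has purchased contribute nothing. `gcp_d` and `exact_freq` are hypothetical names; `freq` can be any function returning a count for a set of items, whether exact counts (as here) or an Item-Graph estimate.

```python
from itertools import combinations

def gcp_d(x, basket, d, freq):
    """Eq. (5): GCP_d(x | B) = sum of P(x | S) over subsets S of B with 1 <= |S| <= d.

    Assumptions: the empty subset is skipped, and subsets with Freq(S) = 0 are ignored.
    `freq` maps a frozenset of items to a count.
    """
    items = sorted(basket)
    total = 0.0
    for size in range(1, min(d, len(items)) + 1):
        for S in combinations(items, size):
            f_s = freq(frozenset(S))
            if f_s:
                total += freq(frozenset(S) | {x}) / f_s  # P(x | S)
    return total

# Hypothetical baskets; exact Freq counted directly from them.
baskets = [{"a", "c"}, {"a", "b"}, {"b", "d"}, {"a", "b", "c"}]

def exact_freq(items):
    return sum(items <= b for b in baskets)

print(gcp_d("c", {"a", "b"}, 1, exact_freq))  # P(c|a) + P(c|b) = 2/3 + 1/3
print(gcp_d("c", {"a", "b"}, 2, exact_freq))  # adds P(c|{a,b}) = 1/2
```

With $d = 2$ the loop visits $|B| + |B|(|B|-1)/2$ subsets rather than all $2^{|B|}$.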

15
3. Item-Graph Model and GCP-based Method
  • Extracting $\mathrm{Freq}(A)$ approximately from the Item-Graph
    • For an item set $A$, the exact $\mathrm{Freq}(A)$ may not be
      obtainable from the Item-Graph.
    • Instead, extract an approximate $\mathrm{Freq}(A)$ from the
      Item-Graph:
      • find the complete sub-graph of $A$ (denoted by
        $CSG(A)$) in the Item-Graph, in $O(|A|^2)$ time;
      • $\mathrm{Freq}(A) \approx$ the minimal edge weight in $CSG(A)$.
    • E.g.,
      • for $A = \{a, b\}$, $\mathrm{Freq}(A) = 3$;
      • for $B = \{a, b, c\}$, $\mathrm{Freq}(B) = 1$;
      • $P(c \mid \{a, b\}) = \mathrm{Freq}(\{a, b, c\}) / \mathrm{Freq}(\{a, b\}) = 1/3$.
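A minimal sketch of the approximation. The original figure is not available, so the edge weights below are hypothetical values chosen to reproduce the numbers in the example. One assumption goes beyond the slide: a single item has no edges, so its count is read from an optional, hypothetical `item_counts` dict.

```python
from itertools import combinations

def approx_freq(weights, A, item_counts=None):
    """Approximate Freq(A) as the minimal edge weight of CSG(A), the complete sub-graph on A.

    `weights` maps frozenset({x, y}) -> co-purchase count. If some pair in A has no edge,
    A does not induce a complete sub-graph and the estimate is 0. Checking all pairs is
    O(|A|^2). Every user who bought all of A also bought each pair in A, so the
    estimate is never below the true Freq(A).
    Assumption: single-item counts come from the hypothetical `item_counts` dict.
    """
    A = sorted(set(A))
    if len(A) < 2:
        return (item_counts or {}).get(A[0], 0) if A else 0
    return min(weights.get(frozenset(p), 0) for p in combinations(A, 2))

# Hypothetical edge weights consistent with the example above.
W = {frozenset(("a", "b")): 3, frozenset(("a", "c")): 1, frozenset(("b", "c")): 2}
f_ab = approx_freq(W, {"a", "b"})        # 3
f_abc = approx_freq(W, {"a", "b", "c"})  # 1 = min(3, 1, 2)
print(f_abc / f_ab)                      # P(c | {a, b}) ≈ 1/3
```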

16
4. Preliminary Experimental Results
  • Dataset
    • MovieLens (http://www.grouplens.org/data)
      • A web-based movie recommender system.
      • Contains multi-valued ratings that indicate how
        much each user liked a particular movie.
      • Each user has rated at least 20 movies.
    • We treat the ratings as an indication of whether
      a user has seen a movie (nonzero) or not (zero).

Table 1: The characteristics of the MovieLens dataset
# of Users   # of Items   Density¹   Average Basket Size
943          1682         6.31%      106.04
¹Density: the percentage of nonzero entries in the user-item matrix.
17
4. Preliminary Experimental Results-1
  • Evaluation Design
    • Split the dataset into a training set and a test
      set by
      • randomly selecting one rated movie of each user
        to be part of the test set;
      • using the remaining rated movies for training.
    • Compare the cosine (COS)-based, CP-based and
      GCP-based methods, averaging over 10 runs.
  • Evaluation Metrics
    • Hit-Rate (HR):
      $HR = \#\mathrm{hits} / n$   (6)
    • Average Reciprocal Hit-Rate (ARHR):
      $ARHR = \left( \sum_{i=1}^{h} 1/p_i \right) / n$   (7)
    • $\#\mathrm{hits}$: the number of items in the test set
      that were also in the top-N lists; $n$: the number
      of users.
    • $h$ is the number of hits, which occurred at
      positions $p_1, p_2, \ldots, p_h$ within the top-N
      lists (i.e., $1 \le p_i \le N$).
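Both metrics can be computed in one pass over the users. A minimal sketch, assuming each user has exactly one held-out item and one ranked top-N list; the function name and data are hypothetical.

```python
def hr_arhr(top_n_lists, hidden_items):
    """Eqs. (6)-(7): hit rate and average reciprocal hit rate over n users.

    top_n_lists[u]: ranked top-N list for user u; hidden_items[u]: that user's single
    held-out test item. A hit at position p (1-based) adds 1/p to the ARHR numerator.
    """
    n = len(hidden_items)
    hits, reciprocal = 0, 0.0
    for ranked, item in zip(top_n_lists, hidden_items):
        if item in ranked:
            hits += 1
            reciprocal += 1.0 / (ranked.index(item) + 1)
    return hits / n, reciprocal / n

# Hypothetical top-3 lists for four users and their held-out items.
lists = [["x", "y", "z"], ["y", "x", "z"], ["z", "y", "x"], ["a", "b", "c"]]
hidden = ["x", "x", "q", "c"]
hr, arhr = hr_arhr(lists, hidden)
print(hr)    # 0.75 (3 hits out of 4 users)
print(arhr)  # (1 + 1/2 + 1/3) / 4 = 11/24
```

A hit at position 1 counts fully while a hit at position 3 counts one third, so ARHR never exceeds HR and rewards placing the hidden item near the top.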

18
4. Preliminary Experimental Results-1
  • Performance of Top-N Recommendation Algorithms
    • HR (left): x-axis: top-N items; y-axis: hit rate
      of all users.
    • ARHR (right): x-axis: top-N items; y-axis: average
      reciprocal hit rate of all users.
    • (For the GCP-based method, $d = 2$.)

19
4. Preliminary Experimental Results-2
  • Testing the Parameter d in the GCP Method
    • Testing the effect of $d$ ($d = 1, 2, 3$).
  • Evaluation: Online Shopping Simulation
    • Randomly select part of the user records to be
      the training set.
    • Use the remaining user records for testing.
    • STEP 0: Construct the item-graph from the
      training set.
    • STEP 1: For each user in the test set,
      • randomly remove one item from the user's basket
        and make recommendations based on the remaining
        items in the basket;
      • compute the rank of this item in the
        recommendation list;
      • update the item-graph.
    • STEP 2: Compute the HR and ARHR metrics.
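The simulation loop can be sketched end to end in a few dozen lines. This is a minimal illustration under several assumptions that are not stated on the slide: the records not used for training are treated as the test users; single-item counts are tracked alongside the edge weights (a one-node sub-graph has no edges); candidates are ranked by $GCP_d$ with the min-edge-weight approximation of Freq, ties keeping alphabetical order; and users with fewer than two items are skipped. All names and data are hypothetical.

```python
import random
from itertools import combinations

def update(graph, counts, basket):
    """Add one basket to the Item-Graph (pair weights) and to per-item counts."""
    for item in basket:
        counts[item] = counts.get(item, 0) + 1
    for pair in combinations(sorted(basket), 2):
        key = frozenset(pair)
        graph[key] = graph.get(key, 0) + 1

def freq(graph, counts, A):
    """Approximate Freq(A): item count if |A| = 1, else minimal edge weight in CSG(A)."""
    A = sorted(A)
    if len(A) == 1:
        return counts.get(A[0], 0)
    return min(graph.get(frozenset(p), 0) for p in combinations(A, 2))

def gcp_score(graph, counts, x, basket, d):
    """GCP_d(x | B): sum of P(x | S) over non-empty S of B with |S| <= d."""
    score = 0.0
    for size in range(1, d + 1):
        for S in combinations(sorted(basket), size):
            f_s = freq(graph, counts, S)
            if f_s:
                score += freq(graph, counts, S + (x,)) / f_s
    return score

def simulate(baskets, train_fraction, d, top_n, seed=0):
    """Online shopping simulation; returns (HR, ARHR) over the evaluated test users."""
    rng = random.Random(seed)
    users = list(baskets)
    rng.shuffle(users)
    split = int(len(users) * train_fraction)
    train, test = users[:split], users[split:]
    graph, counts = {}, {}
    for basket in train:  # STEP 0: build the Item-Graph from the training set
        update(graph, counts, basket)
    all_items = sorted(set().union(*baskets))
    hits, reciprocal, n = 0, 0.0, 0
    for basket in test:  # STEP 1: replay each test user
        basket = sorted(basket)
        if len(basket) < 2:
            continue  # nothing left to recommend from after hiding one item
        hidden = rng.choice(basket)
        rest = [i for i in basket if i != hidden]
        candidates = [x for x in all_items if x not in rest]
        ranked = sorted(candidates, key=lambda x: -gcp_score(graph, counts, x, rest, d))
        n += 1
        if hidden in ranked[:top_n]:
            hits += 1
            reciprocal += 1.0 / (ranked.index(hidden) + 1)
        update(graph, counts, basket)  # fold the full basket into the graph
    return (hits / n, reciprocal / n) if n else (0.0, 0.0)  # STEP 2

# Hypothetical baskets for six users.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"},
           {"a", "c"}, {"a", "b", "d"}, {"b", "d"}]
print(simulate(baskets, train_fraction=0.5, d=2, top_n=2))
```

Calling `simulate` with `d` set to 1, 2 and 3 gives the kind of comparison the slide describes; this toy run implies nothing about the MovieLens results.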

20
4. Preliminary Experimental Results-2
  • Performance of Top-N Recommendation Algorithms
    • HR (left): x-axis: top-N items; y-axis: hit rate
      of all users.
    • ARHR (right): x-axis: top-N items; y-axis: average
      reciprocal hit rate of all users.

21
5. Conclusion and Future Work
  • Conclusion
    • Top-N recommendation problem and item-centric
      algorithms
      • Cosine-based, conditional probability-based.
    • Item-Graph model
      • Visualizes the relationships among items.
      • Easy to update.
    • Generalized Conditional Probability-based top-N
      recommendation algorithm
      • Item-centric, based on the Item-Graph model.
  • Future Work
    • Clustering items and measuring item-item
      similarities based on the Item-Graph model.
    • Speeding up the GCP method.

22
References
  • [Balabanovic97] M. Balabanovic and Y. Shoham.
    Fab: Content-based, Collaborative Recommendation.
    Commun. ACM, 40(3):66-72, 1997.
  • [Breese98] J. S. Breese, D. Heckerman and
    C. Kadie. Empirical Analysis of Predictive
    Algorithms for Collaborative Filtering. In
    Proceedings of the 14th Conference on Uncertainty
    in Artificial Intelligence (UAI-98), pages 43-52,
    San Francisco, 1998.
  • [Deshpande04] M. Deshpande and G. Karypis.
    Item-based Top-N Recommendation Algorithms. ACM
    Trans. Inf. Syst., 22(1):143-177, 2004.
  • [Lin00] W. Lin. Association Rule Mining for
    Collaborative Recommender Systems. Thesis
    submitted for the Degree of M.S. in Computer
    Science.
  • [Linden03] G. Linden, B. Smith and J. York.
    Amazon.com Recommendations: Item-to-Item
    Collaborative Filtering. IEEE Internet Computing,
    7(1):76-80, 2003.
  • [Resnick94] P. Resnick, N. Iacovou, M. Suchak,
    P. Bergstrom and J. Riedl. GroupLens: An Open
    Architecture for Collaborative Filtering of
    Netnews. Proc. Computer Supported Cooperative
    Work Conf., pages 175-186, 1994.