Sparsity, Scalability and Distribution in Recommender Systems


1
Sparsity, Scalability and Distribution in
Recommender Systems
  • Doctoral Thesis Proposal
  • Badrul M. Sarwar
  • Computer Science and Engineering Dept.
  • University of Minnesota
  • Advisor: Professor John Riedl

2
Talk Outline
  • Introduction to Recommender Systems
  • Research Challenges
  • Previous Work
  • Future Work and Completion Plan
  • Contributions and Conclusions

3
Information Overload
4
Computerized Solution Techniques
  • Information Retrieval
    • Immediate information needs
  • Information Filtering
    • Content-based filtering
    • Information filtering agents
  • Collaborative Filtering (CF)
    • Recommender systems (RS) as the interface
    • We'll use the terms CF and RS interchangeably

5
Collaborative Filtering
  • Why another filtering technique?
    • Problems with content-based filtering
      • Limitations due to computer processing
      • Lack of aesthetic sense
      • Different techniques for different media
  • CF adds the missing piece to the picture
    • Human judgements

6
Collaborative Filtering Process
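The CF process on this slide is usually described in two steps: forming a neighborhood of like-minded users, then generating a prediction from their ratings. The Python sketch assumes a dense NumPy user-item matrix with 0 for unrated items, Pearson correlation over co-rated items as the similarity, and a mean-centered weighted average as the prediction rule. These choices, the neighborhood size `k`, and the function names `pearson`, `mean_rating` and `predict` are illustrative assumptions, not details taken from the slides.

```python
import numpy as np


def pearson(r_a, r_b):
    """Pearson correlation of two users over the items both have rated (0 = unrated)."""
    mask = (r_a > 0) & (r_b > 0)
    if mask.sum() < 2:
        return 0.0  # too few co-rated items to estimate a correlation
    a = r_a[mask] - r_a[mask].mean()
    b = r_b[mask] - r_b[mask].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def mean_rating(row):
    """Average of a user's non-zero ratings (0.0 if the user rated nothing)."""
    rated = row[row > 0]
    return float(rated.mean()) if rated.size else 0.0


def predict(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users who rated `item`."""
    # Neighborhood formation: correlate `user` with every other user who rated `item`.
    candidates = [(pearson(R[user], R[v]), v)
                  for v in range(R.shape[0]) if v != user and R[v, item] > 0]
    neighbors = sorted(candidates, reverse=True)[:k]

    # Prediction generation: the user's mean plus similarity-weighted deviations.
    num = sum(w * (R[v, item] - mean_rating(R[v])) for w, v in neighbors)
    den = sum(abs(w) for w, _ in neighbors)
    base = mean_rating(R[user])
    return base if den == 0 else base + num / den


# Illustrative ratings: 5 users x 4 items, 0 = unrated.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
print(round(predict(R, 0, 2), 3))  # 1.333
```

Here only user 4 rated item 2, and that user's ratings run opposite to user 0's on their co-rated items (correlation -1.0), so the prediction lands below user 0's mean of 3.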





7
CF used successfully in e-commerce
8
Talk Outline
  • Introduction to Recommender Systems
  • Research Challenges
  • Previous Work
  • Future Work and Completion Plan
  • Contributions and conclusions

9
Research Challenges
  • RC1: How can we improve RS quality and
    performance by using dimensionality reduction
    techniques?
  • RC2: How can we design a better interface for RS?
  • RC3: How can we design distributed RSs to make
    them widely available?
  • RC4: How can we utilize clustering algorithms to
    improve scalability in RSs?

10
RC1 Motivation and Importance
  • RS performance challenge: meet two important goals
    • Quality
      • Best CF is 77% accurate
    • Scalability
      • Response time
      • Storage space

11
RC1 Motivation and Importance (contd.)
  • Stumbling blocks
    • High-dimensional data
      • Computational complexity
      • Noise and data over-fitting
    • Sparsity
      • Reduced number of predictions
      • Inferior quality
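Sparsity is commonly quantified as the share of empty cells in the user-item matrix. Its effect on the number of predictions shows up as user pairs with no co-rated items, for which no correlation can be computed. A small NumPy sketch, assuming 0 marks an unrated cell; `sparsity_level` and `co_rated_counts` are hypothetical helper names.

```python
import numpy as np


def sparsity_level(R):
    """Fraction of user-item cells that hold no rating (0 = unrated)."""
    return 1.0 - np.count_nonzero(R) / R.size


def co_rated_counts(R):
    """Entry (a, b) = number of items that users a and b have both rated."""
    rated = (R > 0).astype(int)
    return rated @ rated.T


# Illustrative ratings: 3 users x 4 items.
R = np.array([[5, 0, 0, 0],
              [0, 3, 0, 0],
              [4, 0, 0, 2]])
print(sparsity_level(R))         # about 0.667: only 4 of 12 cells are rated
print(co_rated_counts(R)[0, 1])  # 0: users 0 and 1 share no rated item
```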

12
RC1 Specific Aims
  • Select a dimensionality reduction technique
  • Apply the technique
  • Evaluate quality
  • Study performance implications

13
Research Challenges
  • RC1: How can we improve RS quality and
    performance by using dimensionality reduction
    techniques?
  • RC2: How can we design a better interface for RS?
  • RC3: How can we design distributed RSs to make
    them widely available?
  • RC4: How can we utilize clustering algorithms to
    improve scalability in RSs?

14
RC2 Motivation and Importance
  • Need for an explanation interface
    • End-user point of view
  • Explanation of recommendations
    • Algorithmic explanation
    • Visual explanation
  • Visual explanation
    • Visualization amplifies cognition
  • Benefits
    • Increases usability and confidence

15
RC2 Specific Aims
  • Identify techniques
    • Use of dimension reduction results
  • Implementation
  • Evaluation
    • Usability study
    • Comparison with text-based system

16
Research Challenge 3
  • How can we improve RS quality and performance by
    using dimensionality reduction techniques?
  • How can we design a better interface for RS?
  • How can we design distributed RSs to make them
    widely available?
  • How can we utilize clustering algorithms to improve
    scalability in RSs?

17
RC3 Motivation and Importance
  • Increasing needs for RS services
  • Availability challenge
  • Travelling users
  • Centralized RS problems
  • Problems of scale and robustness
  • Privacy concerns

18
RC3 Specific aims
  • Taxonomy of RS application space
  • Design framework
  • Key design issues
  • Implementation models
  • Evaluation criteria
  • Analysis of different models

19
Research Challenge 4
  • How can we improve RS quality and performance by
    using dimensionality reduction techniques?
  • How can we design a better interface for RS?
  • How can we design distributed RSs to make them
    widely available?
  • How can we utilize clustering algorithms to
    improve scalability in RSs?

20
RC4 Motivation and Importance
  • Scalability
  • Sparsity
  • Benefits of Clustering
  • Usenet (newsgroup)
  • Recent studies
  • Performance implications

21
RC4 Specific aims
  • Identify clustering algorithms
    • Soft clustering
    • Hard clustering
  • Partition the data set
  • Apply Galaxy algorithm
  • Evaluate results
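As one concrete hard-clustering option (K-means is the hard-clustering algorithm named on the later clustering slide), users can be partitioned on their rating vectors before neighbors are searched. A plain NumPy sketch assuming dense rows with 0 for unrated items, Euclidean distance and a fixed iteration count; `kmeans` is a hypothetical name, and the Galaxy algorithm is not reproduced here.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Hard-cluster the rows of X (one user per row) into k groups with plain K-means."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct users picked at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each user joins the nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = X[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return labels, centroids


# Illustrative ratings: two groups of users with disjoint tastes.
R = np.array([[5, 5, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 4, 5]])
labels, _ = kmeans(R, k=2)
print(labels)
```

Restricting the neighbor search to a user's own cluster shrinks the candidate set from all users to one cluster, which is the scalability gain clustering aims for; whether prediction quality holds up is what the evaluation step would measure.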

22
Talk Outline
  • Introduction to Recommender Systems
  • Research Challenges
  • Previous Work
  • Future Work and Completion Plan
  • Contributions and conclusions

23
Research Approach
24
Dimension Reduction Experiments
  • Singular Value Decomposition
    • Matrix factorization
    • Dimension reduction
    • Prediction generation by reconstructing the matrix
  • Result highlights
    • Quality of prediction improved
    • We expect to see improved performance

25
Applying dimension reduction in RS
  • We applied an LSI/SVD-based technique
  • SVD decomposes a matrix $R$ into three factors, $R = U \cdot S \cdot V^T$

The reconstructed matrix $R_k = U_k \cdot S_k \cdot V_k^T$ is the
closest rank-$k$ matrix to the original matrix $R$ (in the Frobenius norm).
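The rank-k reconstruction can be checked numerically. A NumPy sketch on a random 6x5 matrix (illustrative data, not from the experiments); by the Eckart-Young theorem, the Frobenius error of $R_k$ equals the square root of the sum of the squared discarded singular values. `rank_k_approx` is a hypothetical name.

```python
import numpy as np


def rank_k_approx(R, k):
    """Return R_k = U_k S_k V_k^T and the full list of singular values of R."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)  # s is sorted in descending order
    R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return R_k, s


R = np.random.default_rng(1).random((6, 5))
R_2, s = rank_k_approx(R, 2)
# Eckart-Young: the Frobenius error equals the norm of the discarded singular values.
err = np.linalg.norm(R - R_2, "fro")
print(np.isclose(err, np.sqrt((s[2:] ** 2).sum())))  # True
```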
26
SVD as prediction generator
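The slide does not spell out how predictions are read from the reduced matrix. One common recipe, assumed here rather than taken from the slides: fill missing cells with item means, subtract each user's mean, take a rank-k SVD, and score user i on item j as the user mean plus the dot product of row i of $U_k \sqrt{S_k}$ with column j of $\sqrt{S_k} V_k^T$. `svd_predict` is a hypothetical name and the data are illustrative.

```python
import numpy as np


def svd_predict(R, k):
    """Predicted-rating matrix from a rank-k SVD of the filled, user-centered matrix.

    Assumed preprocessing (one common recipe, not confirmed by the slides):
    0 marks an unrated cell, holes are filled with the item's mean rating,
    and each row is centered on that user's mean.
    """
    R = np.asarray(R, dtype=float)
    rated = R > 0
    counts = rated.sum(axis=0)
    item_means = np.divide(R.sum(axis=0), counts,
                           out=np.zeros(R.shape[1]), where=counts > 0)
    filled = np.where(rated, R, item_means)           # fill holes with item means
    user_means = filled.mean(axis=1, keepdims=True)
    centered = filled - user_means                    # remove per-user rating offset

    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    root_s = np.sqrt(s[:k])
    user_factors = U[:, :k] * root_s                  # U_k sqrt(S_k): one row per user
    item_factors = root_s[:, None] * Vt[:k, :]        # sqrt(S_k) V_k^T: one column per item
    return user_means + user_factors @ item_factors   # add the user means back


R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]])
P = svd_predict(R, k=2)
print(P[1, 1])  # predicted rating of user 1 for the unrated item 1
```

With k equal to the full rank the filled matrix is reproduced exactly, so observed ratings come back unchanged; a smaller k trades that fidelity for the noise reduction the dimension-reduction slides aim at.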
27
Results: SVD as prediction generator
28
Visual Interface: Initial Prototype
  • Used SVD results
  • Plotted users and items in a 2-D feature space
  • Prototype tested in Spotfire
  • Problems
    • Distance is non-Euclidean
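A 2-D feature space like the prototype's can be obtained by keeping the two leading SVD dimensions and scaling both sides by the square roots of the singular values, so users and items share one coordinate system. One way to read the "non-Euclidean" problem: in this space a user's rank-2 score for an item is the dot product of their two points, so the item plotted nearest a user is not necessarily the one with the highest score. A NumPy sketch under those assumptions; `embed_2d` is a hypothetical name, the data are illustrative, and plotting is left out.

```python
import numpy as np


def embed_2d(R):
    """2-D coordinates for users (rows) and items (columns) from the top two SVD factors."""
    U, s, Vt = np.linalg.svd(np.asarray(R, dtype=float), full_matrices=False)
    root_s = np.sqrt(s[:2])
    user_xy = U[:, :2] * root_s       # one (x, y) point per user
    item_xy = Vt[:2, :].T * root_s    # one (x, y) point per item
    return user_xy, item_xy


R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]])
user_xy, item_xy = embed_2d(R)
# The rank-2 score of (user, item) is a dot product of their points, not a distance.
scores = user_xy @ item_xy.T
```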

29
Design of Visual Interface
  • Use of LSI/SVD for user-item visualization

30
Distributed RS: Work Done
  • Taxonomy of the application space
    • Based on <neighborhood and prediction>
  • Identification of key design issues
  • Three implementation models proposed
    • Local profile model
    • Central profile model
    • Geographically distributed profile model

31
Talk Outline
  • Introduction to Recommender Systems
  • Research Challenges
  • Previous Work
  • Future Work and Completion Plan
  • Contributions and conclusions

32
Future Work: Dimension Reduction
  • Study performance implications
  • SVD-based prediction
    • Offline (model building)
    • Online
  • Offline part is time-consuming
    • Incremental SVD
    • Fold-in
  • Online part is very promising
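Fold-in lets the online part absorb a new user without recomputing the offline SVD: the user's rating row $r$ is projected into the reduced space as $u_k = r V_k S_k^{-1}$, the standard LSI folding-in step. A NumPy sketch on random illustrative data; it assumes the new row is preprocessed exactly like the matrix the model was built from, and `fold_in_user` and `predict_folded` are hypothetical names.

```python
import numpy as np


def fold_in_user(r, Vt_k, s_k):
    """Project a new user's rating row r (length n_items) into the k-dim user space."""
    return (r @ Vt_k.T) / s_k          # u_k = r V_k S_k^{-1}


def predict_folded(u_k, Vt_k, s_k):
    """Rank-k predicted ratings for a folded-in user: u_k S_k V_k^T."""
    return (u_k * s_k) @ Vt_k


# Offline: build the model from the existing users (illustrative random data).
R = np.random.default_rng(2).random((6, 4))
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
Vt_k, s_k = Vt[:k, :], s[:k]

# Online: fold in a new user without recomputing the SVD.
new_user = np.array([0.9, 0.1, 0.8, 0.3])
u_new = fold_in_user(new_user, Vt_k, s_k)
print(predict_folded(u_new, Vt_k, s_k))
```

Folded-in users do not update the item factors, so accuracy can drift as many new users accumulate; incremental SVD or a periodic offline rebuild addresses that.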

33
Future Work: Distributed RS
  • Evaluation
    • Possible approaches
      • Identify suitable evaluation criteria
      • Select applications from taxonomy
      • Analyze using each model (hypothetical)
      • Analyze each implementation in terms of the
        evaluation criteria

34
Future Work: Visual Interface
  • Implement visual interface
  • Perform usability studies
    • Set up a live user experiment
    • Identify usability questionnaires
    • Conduct the usability survey
    • Analyze results
  • Revise/redesign interface

35
Future Work: Clustering in RS
  • Identify effective clustering algorithms
    • For hard (K-means) and soft (E-M) clustering
  • Partition the dataset
  • Apply Galaxy algorithm
  • Test for quality
    • Accuracy and coverage
  • Test for performance
    • Response time
36
Future Work: Completion Plan
37
Contributions
  • Use of a dimension reduction technique (SVD) as
    a high-quality prediction generator
    • Submitted to ICDE 2000
  • Framework design for distributed RS
    • Submitted to CIKM'99
  • Visual interfaces
  • Clustering to improve scalability

38
That's all folks!
39
Distributed RS: Local Profile Model
[Diagram: User; Profile data]
40
Distributed RS: Central Profile Model
[Diagram: User; RS; Remote RS (two); CPS; Profile storage]
41
Geographically Distributed RS
[Diagram: Users; RS; Remote RS (two); GDPS 1, GDPS 2, GDPS 3; Profile database]
42
Problems of high-dimensional data
If A is highly correlated with B, and B is highly correlated with C,
we can't conclude that C is also highly correlated with A.
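The slide's point can be shown with three small rating vectors: A and B correlate at about 0.71, as do B and C, yet A and C have zero correlation. In general, equal correlations of ρ for A-B and B-C only guarantee corr(A, C) ≥ 2ρ² − 1, which is 0 at ρ ≈ 0.71. Illustrative made-up data, computed with NumPy's `corrcoef`.

```python
import numpy as np

# Three hypothetical users' ratings of the same four items (1-5 scale).
A = np.array([5, 1, 3, 3])
C = np.array([3, 3, 5, 1])
B = np.array([5, 1, 5, 1])   # after centering, B's deviations are A's plus C's

corr = np.corrcoef([A, B, C])  # rows/cols ordered A, B, C
print(np.round(corr, 3))
```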