Intelligent Remote Sensing Using Wireless Networks - PowerPoint PPT Presentation

About This Presentation
Title:

Intelligent Remote Sensing Using Wireless Networks

Description:

What is Netflix? subscription-based movie rental ... What is the Netflix Prize? attempt to increase ... tackling Netflix Prize requires lots of ... – PowerPoint PPT presentation

Slides: 24
Provided by: Emi5

Transcript and Presenter's Notes

Title: Intelligent Remote Sensing Using Wireless Networks


1
The Netflix Challenge: Parallel Collaborative Filtering
James Jolly, Ben Murrell
CS 387: Parallel Programming with MPI
Dr. Fikret Ercal
2
What is Netflix?
  • subscription-based movie rental
  • online frontend
  • over 100,000 movies to pick from
  • 8M subscribers
  • 2007 net income: $67M

3
What is the Netflix Prize?
  • attempt to increase Cinematch accuracy
  • predict how users will rate unseen movies
  • $1M for a 10% improvement

4
The contest dataset
  • contains 100,480,577 ratings
  • from 480,189 users
  • for 17,770 movies
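
As a rough check on how sparse this dataset is, the three counts above can be combined into a density figure. This is a minimal sketch; the function name `rating_density` is illustrative and does not come from the presentation.

```python
def rating_density(num_ratings: int, num_users: int, num_movies: int) -> float:
    """Fraction of all possible (user, movie) pairs that carry a rating."""
    return num_ratings / (num_users * num_movies)

# Counts taken from the contest dataset slide.
density = rating_density(100_480_577, 480_189, 17_770)
print(f"{density:.2%} of the user-movie matrix is filled")
```

Only about 1.2% of the possible ratings are present, so most users have rated a small share of the catalog.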

5
Why is it hard?
  • user tastes difficult to model in general
  • movies tough to classify
  • large volume of data

6
Sounds like a job for collaborative filtering!
  • infer relationships between users
  • leverage them to make predictions

7
Why is it hard?
User      Movie            Rating
Dijkstra  Office Space     5
Knuth     Office Space     5
Turing    Office Space     5
Knuth     Dr. Strangelove  4
Turing    Dr. Strangelove  2
Boole     Titanic          5
Knuth     Titanic          1
Turing    Titanic          2
8
What makes users similar?
[Figure: axes labeled Office Space, Titanic, and Dr. Strangelove]
9
What makes users similar? The Pearson Correlation Coefficient!
[Figure: axes labeled Office Space, Titanic, and Dr. Strangelove]
pc = .813
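
The Pearson correlation coefficient can be sketched in a few lines. The example below is an illustration, not the authors' code: it uses only the ratings from the sample table on the earlier "Why is it hard?" slide, and the name `pearson` and the 0.0 fallback for undefined cases are assumptions of this sketch.

```python
from math import sqrt

# Ratings copied from the sample table (user -> movie -> rating).
RATINGS = {
    "Dijkstra": {"Office Space": 5},
    "Knuth": {"Office Space": 5, "Dr. Strangelove": 4, "Titanic": 1},
    "Turing": {"Office Space": 5, "Dr. Strangelove": 2, "Titanic": 2},
    "Boole": {"Titanic": 5},
}

def pearson(a: dict, b: dict) -> float:
    """Pearson correlation over the movies both users rated.

    Returns 0.0 when the correlation is undefined (fewer than two shared
    movies, or one user rated every shared movie identically). That
    fallback is an assumption of this sketch.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        return 0.0
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

print(round(pearson(RATINGS["Turing"], RATINGS["Knuth"]), 3))
```

On the three movies Turing and Knuth share in the sample table, this sketch gives about .693 rather than the .813 on the slide, which presumably rests on ratings that the transcript does not show.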
10
Building a similarity matrix
Turing Knuth Boole Chomsky
Turing 1.000 0.813 0.750 0.125
Knuth 0.813 1.000 0.325 0.500
Boole 0.750 0.325 1.000 0.500
Chomsky 0.125 0.500 0.500 1.000
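
Because the matrix is symmetric with 1.000 on the diagonal, only the entries above the diagonal need to be computed. The sketch below counts them; `unique_pairs` is an illustrative name, and the user count is the one quoted on the contest dataset slide.

```python
def unique_pairs(num_users: int) -> int:
    """Number of distinct user pairs (i < j) in a symmetric similarity matrix."""
    return num_users * (num_users - 1) // 2

print(unique_pairs(4))        # the four-user example above
print(unique_pairs(480_189))  # every user in the contest dataset
```

The four-user example needs 6 correlations; the full contest dataset would need roughly 1.15 × 10^11.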
11
Predicting user ratings
Would Chomsky like Grammar Rock?
  • approach
    • use matrix to find users like Chomsky
    • drop ratings from those who haven't seen it
    • take weighted average of remaining ratings

12
Predicting user ratings
Turing Knuth Boole Chomsky
Turing 1.000 0.813 0.750 0.125
Knuth 0.813 1.000 0.325 0.500
Boole 0.750 0.325 1.000 0.500
Chomsky 0.125 0.500 0.500 1.000
Suppose Turing, Knuth, and Boole rated it 5, 3, and 1. Since .125 + .5 + .5 = 1.125, we predict

$$r_{\text{Chomsky}} = \frac{.125}{1.125}\cdot 5 + \frac{.5}{1.125}\cdot 3 + \frac{.5}{1.125}\cdot 1 \approx 2.333$$
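
The prediction step can be sketched as a similarity-weighted average, taken here in its usual sense of sum(weight × rating) / sum(weights). This is an illustration under that assumption, not the authors' implementation; `predict_rating` and its arguments are hypothetical names.

```python
def predict_rating(similarities: dict, ratings: dict) -> float:
    """Similarity-weighted average of neighbours' ratings for one movie.

    similarities: neighbour name -> similarity to the target user
    ratings:      neighbour name -> rating, or None if they have not seen it
    """
    # Drop neighbours who have not rated the movie.
    seen = {u: r for u, r in ratings.items() if r is not None and u in similarities}
    total_weight = sum(similarities[u] for u in seen)
    if total_weight == 0:
        raise ValueError("no usable neighbours for this movie")
    return sum(similarities[u] * r for u, r in seen.items()) / total_weight

# Chomsky's row of the similarity matrix and the three ratings from the example.
chomsky_row = {"Turing": 0.125, "Knuth": 0.500, "Boole": 0.500}
grammar_rock = {"Turing": 5, "Knuth": 3, "Boole": 1}
print(round(predict_rating(chomsky_row, grammar_rock), 3))
```

With these inputs the sketch evaluates 2.625 / 1.125, which is about 2.333. It does not handle negative similarities, which a fuller version would need to treat separately.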
13
So how is the data really organized?
movie file 1: user 1, rating 5; user 13, rating 3; user 42, rating 2
movie file 2: user 13, rating 1; user 42, rating 1; user 1337, rating 2
movie file 3: user 13, rating 5; user 311, rating 4; user 666, rating 5
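
Collaborative filtering compares users, so data grouped by movie generally has to be re-indexed by user first. The sketch below shows one way to do that with the sample values from this slide. Pairing the three rating groups with movie files 1 to 3 in the order they appear is an assumption, and `index_by_user` is an illustrative name.

```python
from collections import defaultdict

# Sample data from the slide: movie id -> list of (user id, rating).
MOVIE_FILES = {
    1: [(1, 5), (13, 3), (42, 2)],
    2: [(13, 1), (42, 1), (1337, 2)],
    3: [(13, 5), (311, 4), (666, 5)],
}

def index_by_user(movie_files: dict) -> dict:
    """Turn movie -> [(user, rating)] into user -> {movie: rating}."""
    by_user = defaultdict(dict)
    for movie_id, entries in movie_files.items():
        for user_id, rating in entries:
            by_user[user_id][movie_id] = rating
    return dict(by_user)

by_user = index_by_user(MOVIE_FILES)
print(by_user[13])
```

User 13 appears in all three files, so the lookup returns one rating per movie; users such as 311 end up with a single entry.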
14
Training Data
  • 17,770 text files (one for each movie)
  • > 2 GB

15
Parallelization
  • Two Step Process
    • Learning Step
    • Prediction Step
  • Concerns
    • Data Distribution
    • Task Distribution


16
Parallelizing the learning step
user 1 user 2 user 3 user 4 user 5 user 6 user 7 user 8
user 1 c1,1 c1,2 c1,3 c1,4 c1,5 c1,6 c1,7 c1,8
user 2 c2,1 c2,2 c2,3 c2,4 c2,5 c2,6 c2,7 c2,8
user 3 c3,1 c3,2 c3,3 c3,4 c3,5 c3,6 c3,7 c3,8
user 4 c4,1 c4,2 c4,3 c4,4 c4,5 c4,6 c4,7 c4,8
user 5 c5,1 c5,2 c5,3 c5,4 c5,5 c5,6 c5,7 c5,8
user 6 c6,1 c6,2 c6,3 c6,4 c6,5 c6,6 c6,7 c6,8
user 7 c7,1 c7,2 c7,3 c7,4 c7,5 c7,6 c7,7 c7,8
user 8 c8,1 c8,2 c8,3 c8,4 c8,5 c8,6 c8,7 c8,8
17
Parallelizing the learning step
user 1 user 2 user 3 user 4 user 5 user 6 user 7 user 8
user 1 c1,1 c1,2 c1,3 c1,4 c1,5 c1,6 c1,7 c1,8
user 2 c2,1 c2,2 c2,3 c2,4 c2,5 c2,6 c2,7 c2,8
user 3 c3,1 c3,2 c3,3 c3,4 c3,5 c3,6 c3,7 c3,8
user 4 c4,1 c4,2 c4,3 c4,4 c4,5 c4,6 c4,7 c4,8
user 5 c5,1 c5,2 c5,3 c5,4 c5,5 c5,6 c5,7 c5,8
user 6 c6,1 c6,2 c6,3 c6,4 c6,5 c6,6 c6,7 c6,8
user 7 c7,1 c7,2 c7,3 c7,4 c7,5 c7,6 c7,7 c7,8
user 8 c8,1 c8,2 c8,3 c8,4 c8,5 c8,6 c8,7 c8,8
[Figure: the matrix above divided among processors P1, P2, P3, and P4]
18
Parallelizing the learning step
  • store data as user-movie rating
  • each processor has all rating data for n/p users
  • calculate each ci,j
  • calculation requires message passing (only 1/p
    of correlations can be calculated locally within
    a node)
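
The 1/p figure can be checked with a small serial simulation. This is not MPI code and not the authors' program: it assumes a block distribution in which each process owns n/p consecutive users (numbered from 0) and p divides n evenly. The names `owner` and `local_fraction` are illustrative.

```python
def owner(user: int, num_users: int, num_procs: int) -> int:
    """Process that holds a user's ratings under a block distribution.

    Users are numbered from 0; assumes num_procs divides num_users evenly.
    """
    return user // (num_users // num_procs)

def local_fraction(num_users: int, num_procs: int) -> float:
    """Share of matrix entries c[i][j] whose two users live on the same process."""
    local = sum(
        1
        for i in range(num_users)
        for j in range(num_users)
        if owner(i, num_users, num_procs) == owner(j, num_users, num_procs)
    )
    return local / (num_users * num_users)

print(local_fraction(8, 4))
```

For 8 users on 4 processes, 16 of the 64 entries involve two users on the same process, which is 1/4. Every other entry needs one user's ratings to be sent to another process.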

19
Parallelizing the prediction step
  • Data distribution directly affects task
    distribution
  • Method 1: store all user information on each
    processor and stripe movie information (less
    communication)

P0
predict(user, movie)
rating estimate
P1 P2 P3
All User Information All User Information All User Information
Movie1 Movie2 Movie3
Movie4 Movie5 Movie6
Movie7 Movie8 Movie9
Movie10 Movie11 Movie12
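
The layout above places Movie1 on P1, Movie2 on P2, Movie3 on P3, Movie4 back on P1, and so on. A minimal sketch of how P0 might route a request under that round-robin striping follows; `movie_owner` and `route` are hypothetical helpers, and no real message passing is performed.

```python
def movie_owner(movie_id: int, num_workers: int = 3) -> int:
    """Worker (1-based) holding a movie under round-robin striping.

    Movie ids are 1-based, matching the layout Movie1 -> P1, Movie2 -> P2, ...
    """
    return (movie_id - 1) % num_workers + 1

def route(user_id: int, movie_id: int) -> str:
    """Describe where P0 would send a predict(user, movie) request."""
    return f"P0 -> P{movie_owner(movie_id)}: predict({user_id}, {movie_id})"

print(route(42, 5))
```

Since every worker holds all user information, the worker that owns the movie can answer alone, which is consistent with the "less communication" note.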
20
Parallelizing the prediction step
  • Data distribution directly affects task
    distribution
  • Method 2: store all movie information on each
    processor and stripe user information (more
    communication)

P0
predict(user, movie)
gather partial estimates
P1 P2 P3
All Movie Ratings All Movie Ratings All Movie Ratings
User1 User2 User3
User4 User5 User6
User7 User8 User9
User10 User11 User12
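
One way to read "gather partial estimates" is that each worker returns a partial numerator and denominator of the weighted average for the users it holds, and P0 adds them up. The serial sketch below illustrates that reading. It is an assumption about the scheme, not the authors' code, and the one-user-per-worker split is hypothetical.

```python
def partial_estimate(similarities: dict, ratings: dict) -> tuple:
    """One worker's contribution: (sum of weight * rating, sum of weights)."""
    seen = [u for u in ratings if u in similarities]
    weighted = sum(similarities[u] * ratings[u] for u in seen)
    weight = sum(similarities[u] for u in seen)
    return weighted, weight

def combine(partials: list) -> float:
    """What P0 does after gathering: add the partial sums, then divide once."""
    weighted = sum(p[0] for p in partials)
    weight = sum(p[1] for p in partials)
    return weighted / weight

# Each worker holds the ratings of a different stripe of users (hypothetical split).
similarity_to_chomsky = {"Turing": 0.125, "Knuth": 0.500, "Boole": 0.500}
worker_ratings = [{"Turing": 5}, {"Knuth": 3}, {"Boole": 1}]
partials = [partial_estimate(similarity_to_chomsky, r) for r in worker_ratings]
print(round(combine(partials), 3))
```

Every prediction touches all workers in this scheme, which fits the "more communication" note. In an MPI program the gathering would typically be a reduction or gather onto P0.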
21
Parallelizing the prediction step
  • Data distribution directly affects task
    distribution
  • Method 3: hybrid approach (lots of communication,
    high number of nodes)

P1 P2 P3
Users 1-3 Users 1-3 Users 1-3
Movie1 Movie2 Movie3
Movie4 Movie5 Movie6
Movie7 Movie8 Movie9
Movie10 Movie11 Movie12
P7 P8 P9
Users 4-6 Users 4-6 Users 4-6
Movie13 Movie14 Movie15
Movie16 Movie17 Movie18
Movie19 Movie20 Movie21
Movie22 Movie23 Movie24
P0
predict(user, movie)
P4 P5 P6
Users 1-3 Users 1-3 Users 1-3
Movie13 Movie14 Movie15
Movie16 Movie17 Movie18
Movie19 Movie20 Movie21
Movie22 Movie23 Movie24

Users 4-6 Users 4-6 Users 4-6
Movie25 Movie26 Movie27
Movie28 Movie29 Movie30
Movie31 Movie32 Movie33
Movie34 Movie35 Movie36
22
Our Present Implementation
  • operates on a trimmed-down dataset
  • stripes movie information and stores
    similarity matrix in each processor
  • this won't scale well!
  • storing all movie information on each node would
    be optimal, but nic.mst.edu can't handle it
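
A back-of-the-envelope estimate suggests why keeping the similarity matrix on every processor does not scale. The sketch assumes a dense matrix with one 4-byte float per entry, which is an assumption rather than a detail given in the presentation; `dense_matrix_bytes` is an illustrative name.

```python
def dense_matrix_bytes(num_users: int, bytes_per_entry: int = 4) -> int:
    """Memory for a full num_users x num_users similarity matrix."""
    return num_users * num_users * bytes_per_entry

# 480,189 users is the figure from the contest dataset slide.
full = dense_matrix_bytes(480_189)
print(f"{full / 1e9:.0f} GB per processor")
```

Under these assumptions the full-dataset matrix would take roughly 922 GB on each processor. Storing only the upper triangle would halve that, which is still far beyond a typical node.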

23
In summary
  • tackling the Netflix Prize requires lots of data
    handling
  • we are working toward an implementation that
    can operate on the entire training set
  • simple collaborative filtering should get us
    close to the old Cinematch performance