Collaborative Filtering: Some Comments on the State of the Art

1
Collaborative Filtering: Some Comments on the
State of the Art

Jon Herlocker
Assistant Professor
School of Electrical Engineering and Computer
Science
Oregon State University
Corvallis, OR

2
Yahoo! Employee-to-be?
Audrey Herlocker, age 1, 8/18/2004
3
Take-Aways

We have a problem synthesizing research in CF
CoFE: free, could increase research productivity
and reduce barriers to standardization
More focus on the user experience needed
There is a great potential for CF in information
retrieval (i.e. not just product recommendation)

4
What is the State of the Art?

10 years of collaborative filtering (CF)
research
CF = machine learning?
20 years of machine learning?
Still hasn't transitioned from a science to
engineering
Still no recommender system cookbook

5
What do we know?

Consider the academic literature on CF
Lots of disconnected discoveries
Hard to synthesize
Different data sets
Variance in algorithm implementation
Variance in experimental procedures
Analysis of systems, not features
Private knowledge not shared
High barrier to formal experimentation and
publication
No venue or reward for negative results
Commercial discoveries are intellectual property
So the sum of all knowledge?
Doesn't add up

6
Productivity of CF Research Community

How to increase productivity of CF research?
Each effort should have greater effect on total
knowledge
Each effort should cost less
Increase the quantity of practical experience
with CF
Our contribution
CoFE: Collaborative Filtering Engine

7
Shared Research Infrastructure

Concept
Free, open-source, infrastructure for rapid
development and analysis of algorithms
Also make it fast and stable enough for mid-scale
production
Facilitates
Lower cost methodical research
Sharing of new algorithms
Repositories
Comparability in analysis methods and algorithm
implementations
More practical usage of CF

8
CoFE

CoFE - Collaborative Filtering Engine
Open source framework for Java
Easy to create new algorithms
Includes testing infrastructure (next month)
Reference implementations of many popular CF
algorithms
Can support high-performance deployment
Production-ready (see Furl.net)

9
CoFE
Data Manager Object: in-memory cache with
high-performance data structures
Algorithm Object
Relational DB (MySQL)
Algorithm Interface
Server instance
Analysis Framework
XML Experiment Metadata File and Delimited data
file
Experiment Configuration File (XML)
10
Checkpoint Take-Aways

We have a problem synthesizing research in CF
CoFE: free, could increase research productivity
and reduce barriers to standardization
Coming up
More focus on the user experience needed
There is a great potential for CF in information
retrieval (i.e. not just product recommendation)
CoFE URL
http://eecs.oregonstate.edu/iis/CoFE

11
Does the Algorithm Really Matter?

Where do we get the most impact? (benefit/cost)
A. Improving the algorithm?
B. Changing user interface/user interaction?

12
Does the Algorithm Really Matter?

Where do we get the most impact? (benefit/cost)
A. Improving the algorithm?
B. Changing user interface/user interaction?
Answer
Unless you have already optimized your user
interface extensively, the answer is usually B.

13
Scenario from a Related Field

Document retrieval study by Turpin and Hersh
(SIGIR 2001)
Two groups of medical students
Compared human performance using
A 1970s search model (basic TF/IDF)
A recent OKAPI search model with greatly improved
Mean Average Precision
Identical user interfaces
Task: locating medical information
Result: no statistically significant difference!
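Mean Average Precision, the batch metric on which the newer search model scored much better, can be sketched as follows. This is a minimal illustration and not the evaluation code used in the Turpin and Hersh study; the function names, document ids, and toy rankings are assumptions made for the example.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy data: two queries, each with a ranking and its relevant set.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]
print(mean_average_precision(runs))
```

A metric like this scores the ranked list in isolation, which is one reason a large gain in it may not translate into a measurable gain in what users accomplish.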

14
Turpin & Hersh Findings

Humans quickly compensate for poor algorithm
performance
Possible conclusion: provide user interfaces that
allow users to compensate
Many relevant results weren't selected as
relevant
Possible conclusion: focus on persuading as well
as recommending

15
Analyzing Algorithms for End-user Effects

Algorithms believed reasonable may actually be
terrible!
McLaughlin & Herlocker, SIGIR 2004.
In this case, poor handling of low-confidence
recommendations
In situations with small amounts of data
Changes in algorithm -> big changes in
recommendations
Analyze exact recommendations seen by end-user
Instead of just items with existing ratings
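The low-confidence problem can be illustrated with a toy case: an item backed by a single enthusiastic neighbor can outrank an item backed by many good ratings. The sketch below contrasts a plain mean with a damped mean that shrinks toward a prior when evidence is thin. The damping constant, the prior, and the data are assumptions for illustration only, and this is not the algorithm proposed in the SIGIR 2004 paper.

```python
from typing import Dict, List


def plain_mean(ratings: List[float]) -> float:
    """Unweighted average of neighbor ratings, ignoring how many there are."""
    return sum(ratings) / len(ratings)


def damped_mean(ratings: List[float], prior: float = 3.0, k: float = 5.0) -> float:
    """Mean pulled toward `prior`, as if k extra ratings at the prior value existed."""
    return (sum(ratings) + k * prior) / (len(ratings) + k)


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Item ids ordered by descending score, truncated to n."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical neighbor ratings on a 1-5 scale.
neighbor_ratings = {
    "obscure_item": [5.0],
    "popular_item": [5.0, 4.0, 5.0, 4.0, 5.0, 4.0, 5.0, 5.0],
    "average_item": [3.0, 4.0, 3.0, 3.0],
}

plain = {item: plain_mean(r) for item, r in neighbor_ratings.items()}
damped = {item: damped_mean(r) for item, r in neighbor_ratings.items()}

print(top_n(plain, 1))   # what the end user sees first without damping
print(top_n(damped, 1))  # what the end user sees first with damping
```

Comparing the two top-N lists directly, rather than only the error on items that already have ratings, is the kind of end-user analysis this slide argues for.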

16
Data from SIGIR 2004 Paper
17
Checkpoint Take-Aways

Previously
We have a problem synthesizing research in CF
CoFE: free, could increase research productivity
and reduce barriers to standardization
More focus on the user experience needed
Coming up
There is a great potential for CF in information
retrieval (i.e. not just product recommendation)
CoFE URL
http://eecs.oregonstate.edu/iis/CoFE

18
Exploring Library Search Interfaces
With Janet Webster, Oregon State University
Libraries
19
Features of Web-based Library Search

Diverse content
Web pages, catalogs, journal indexes, electronic
journals, maps, various other digital special
collections
Searchable databases are important sources
Library responsibility
Guiding people to appropriate content
Understanding what the user's real need is

20
SERF: System for Electronic Recommendation
Filtering
23
The Human Element

Capture and leverage the experience of every user
Recommendations are based on human evaluation
Explicit votes
Inferred votes (implicit)
Recommend (question, document) pairs
Not just documents
Human can determine if questions have similarity
System gets smarter with each use
Not just each new document
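The idea of recommending (question, document) pairs can be sketched as a small vote store keyed by question. The class name, the +1/-1 vote encoding, and the similarity threshold are all hypothetical. Token overlap is used here only as a crude stand-in for question similarity, which the slide leaves to human judgment; nothing below is taken from the SERF implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokens(question: str) -> Set[str]:
    return set(question.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class QuestionDocumentVotes:
    """Stores votes per (question, document) pair (hypothetical structure)."""

    def __init__(self) -> None:
        # votes[question][document] -> list of +1 (useful) / -1 (not useful)
        self.votes: Dict[str, Dict[str, List[int]]] = defaultdict(
            lambda: defaultdict(list)
        )

    def record(self, question: str, document: str, useful: bool) -> None:
        self.votes[question][document].append(1 if useful else -1)

    def recommend(
        self, question: str, min_similarity: float = 0.3
    ) -> List[Tuple[str, float]]:
        """Documents voted useful for sufficiently similar past questions."""
        query = tokens(question)
        scores: Dict[str, float] = defaultdict(float)
        for past_question, docs in self.votes.items():
            sim = jaccard(query, tokens(past_question))
            if sim < min_similarity:
                continue
            for document, doc_votes in docs.items():
                scores[document] += sim * sum(doc_votes)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return [(doc, score) for doc, score in ranked if score > 0]


store = QuestionDocumentVotes()
store.record("library hours on weekends", "hours.html", useful=True)
store.record("library hours on weekends", "maps.html", useful=False)
store.record("interlibrary loan request", "ill.html", useful=True)
print(store.recommend("weekend library hours"))
```

Because votes attach to the question as well as the document, each search session adds evidence even when no new documents enter the collection.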

24
Initial Results
25
Three months of SERF usage: 1194 search transactions
28
Three months of SERF usage: 1194 search transactions
Only Google results: 706 (59.13%)
  Average visited documents: 2.196
  Clicked: 172 (24.4%)
  No clicks: 534 (75.6%)
  Average rating: 14.727 (49% voted as useful)
Google results + recommendations: 488 (40.87%)
  Average visited documents: 1.598
  Clicked: 197 (40.4%)
    First click on a recommendation: 141 (71.6%)
    First click on a Google result: 56 (28.4%)
  No clicks: 291 (59.6%)
  Average rating: 20.715 (69% voted as useful)
Vote of yes = 30, vote of no = 0
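The percentages on this slide can be re-derived from the raw counts, which also shows which click counts belong to which group. The sketch below is only an arithmetic check. The helper names are made up, and reading "49" and "69" as percentages of yes votes is an assumption, though it is consistent with the 30/0 vote scale.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100.0 * part / whole, digits)


TOTAL = 1194
GOOGLE_ONLY = 706   # transactions that showed only Google results
WITH_RECS = 488     # transactions that also showed recommendations

shares = (pct(GOOGLE_ONLY, TOTAL, 2), pct(WITH_RECS, TOTAL, 2))

# 172 + 534 = 706 and 197 + 291 = 488, which ties each click count to its group.
click_rate_google_only = pct(172, GOOGLE_ONLY)
click_rate_with_recs = pct(197, WITH_RECS)
first_click_on_rec = pct(141, 197)

YES_VALUE, NO_VALUE = 30, 0


def fraction_yes(average_rating: float) -> float:
    """Share of 'yes' votes implied by an average on the 30/0 scale."""
    return (average_rating - NO_VALUE) / (YES_VALUE - NO_VALUE)


print(shares, click_rate_google_only, click_rate_with_recs, first_click_on_rec)
print(round(fraction_yes(14.727), 2), round(fraction_yes(20.715), 2))
```

On this reading, searches that included recommendations were clicked more often, and their visited documents were voted useful more often.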
29
Conclusion

No large leaps in language understanding expected
Understanding the meaning of language is very
hard
Collaborative filtering (CF) bypasses this
problem
Humans do the analysis
Technology is widely applicable

32
Try it!
33
Final Take-Aways

We have a problem synthesizing research in CF
CoFE: free, could increase research productivity
and reduce barriers to standardization
More focus on the user experience needed
Great potential for CF in information retrieval
(i.e. not just product recommendation)

34
Links & Contacts

Research Group Home Page
http://eecs.oregonstate.edu/iis
CoFE
http://eecs.oregonstate.edu/iis/CoFE
SERF
http://osulibrary.oregonstate.edu/
Jon Herlocker
herlock@cs.orst.edu
+1 (541) 737-8894

35
Simple CF
Users
Items
User-User Links
Item-Item Links
Observed preferences
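The diagram labels map onto a small neighborhood-based sketch: observed preferences are a user-to-item rating table, and user-user links are similarity weights computed from co-rated items. The example below uses Pearson correlation and a mean-offset weighted average, a common textbook formulation. The data and function names are hypothetical, and this is not presented as CoFE's implementation.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]


def pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """User mean plus the similarity-weighted average of neighbor offsets."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = pearson(ratings[user], other_ratings)  # the user-user link
        if weight <= 0:
            continue  # keep only positively correlated neighbors
        other_mean = sum(other_ratings.values()) / len(other_ratings)
        num += weight * (other_ratings[item] - other_mean)
        den += weight
    return user_mean + num / den if den else None


# Hypothetical observed preferences on a 1-5 scale.
ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 5},
    "cat": {"a": 1, "b": 5, "c": 2, "d": 1},
}
print(predict(ratings, "ann", "d"))
```

Returning `None` when no neighbor qualifies keeps "no evidence" distinct from a low prediction, which matters for the low-confidence cases discussed earlier in the talk.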
36
Ending Thoughts

Recommendation vs. persuasion

37
Stereotypical Integrator of RS Has

Large item catalog
With item attributes (e.g. keywords, metadata
such as author, subject, cross-references, ...)
Large user base
With user attributes (age, gender, city, country,
...)
Evidence of customer preferences
Explicit ratings (powerful, but harder to elicit)
Observations of user activity (purchases, page
views, emails, prints, ...)

38
The RS Space
Users
Items
User-User Links
Item-Item Links
Observed preferences
39
Traditional Personalization
Users
Items
User-User Links
Item-Item Links
Observed preferences
40
Classic CF
Users
Items
User-User Links
Item-Item Links
Observed preferences
In the end, most models will be hybrid
41
Classic CF
Users
Items
User-User Links
Item-Item Links
Observed preferences
42
Advantages of Pure CF