Title: Collaborative Filtering: Some Comments on the State of the Art
1. Collaborative Filtering: Some Comments on the State of the Art
- Jon Herlocker
- Assistant Professor
- School of Electrical Engineering and Computer Science
- Oregon State University
- Corvallis, OR
2. Yahoo! Employee-to-be?
Audrey Herlocker, age 1, 8/18/2004
3. Take-Aways
- We have a problem synthesizing research in CF
- CoFE: free, could increase research productivity and reduce barriers to standardization
- More focus on the user experience needed
- There is a great potential for CF in information retrieval (i.e. not just product recommendation)
4. What is the State of the Art?
- 10 years of collaborative filtering (CF) research
- CF = machine learning?
- 20 years of machine learning?
- Still hasn't transitioned from a science to engineering
- Still no recommender system cookbook
5. What do we know?
- Consider the academic literature on CF
- Lots of disconnected discoveries
- Hard to synthesize
- Different data sets
- Variance in algorithm implementation
- Variance in experimental procedures
- Analysis of systems, not features
- Private knowledge not shared
- High barrier to formal experimentation and publication
- No venue or reward for negative results
- Commercial discoveries = intellectual property
- So the sum of all knowledge?
- Doesn't add up
6. Productivity of CF Research Community
- How to increase productivity of CF research?
- Each effort should have greater effect on total knowledge
- Each effort should cost less
- Increase the quantity of practical experience with CF
- Our contribution
- CoFE: Collaborative Filtering Engine
7. Shared Research Infrastructure
- Concept
- Free, open-source infrastructure for rapid development and analysis of algorithms
- Also make it fast and stable enough for mid-scale production
- Facilitates
- Lower cost methodical research
- Sharing of new algorithms
- Repositories
- Comparability in analysis methods and algorithm implementations
- More practical usage of CF
8. CoFE
- CoFE - Collaborative Filtering Engine
- Open source framework for Java
- Easy to create new algorithms
- Includes testing infrastructure (next month)
- Reference implementations of many popular CF algorithms
- Can support high-performance deployment
- Production-ready (see Furl.net)
9. CoFE
(Architecture diagram; labeled components:)
- Data Manager Object: in-memory cache with high-performance data structures
- Algorithm Object
- Relational DB (MySQL)
- Algorithm Interface
- Server instance
- Analysis Framework
- XML Experiment Metadata File and delimited data file
- Experiment Configuration File (XML)
10. Checkpoint Take-Aways
- We have a problem synthesizing research in CF
- CoFE: free, could increase research productivity and reduce barriers to standardization
- Coming up
- More focus on the user experience needed
- There is a great potential for CF in information retrieval (i.e. not just product recommendation)
- CoFE URL
- http://eecs.oregonstate.edu/iis/CoFE
12. Does the Algorithm Really Matter?
- Where do we get the most impact? (benefit/cost)
- A. Improving the algorithm?
- B. Changing user interface/user interaction?
- Answer
- Unless you have already optimized your user interface extensively, the answer is usually B.
13. Scenario from a Related Field
- Document retrieval study by Turpin and Hersh (SIGIR 2001)
- Two groups of medical students
- Compared human performance of
- 1970s search model (basic TF/IDF)
- Recent OKAPI search model with greatly improved Mean Average Precision
- Identical user interfaces
- Task: locating medical information
- Result: no statistical difference!!!!
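Mean Average Precision, the offline metric on which the newer search model scored much better, can be illustrated with a minimal sketch. The function names and the toy document IDs are illustrative assumptions and are not taken from the Turpin and Hersh study; the sketch also assumes each ranking lists a document at most once.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k that hold a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical): two queries with hand-picked relevance judgments.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], {"d6"}),              # AP = (1/2) / 1
]
print(mean_average_precision(runs))
```

The metric rewards placing relevant documents early in the ranking, which is exactly the property the study found users could compensate for.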
14. Turpin & Hersh Findings
- Humans quickly compensate for poor algorithm performance
- Possible conclusion: provide user interfaces that allow users to compensate
- Many relevant results weren't selected as relevant
- Possible conclusion: focus on persuading as well as recommending
15. Analyzing Algorithms for End-user Effects
- Algorithms believed reasonable may actually be terrible!
- McLaughlin & Herlocker, SIGIR 2004.
- In this case, poor handling of low confidence recommendations
- In situations with small amounts of data
- Changes in algorithm -> big changes in recommendations
- Analyze exact recommendations seen by end-user
- Instead of just items with existing ratings
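The contrast between scoring only items with existing ratings and inspecting the list the end-user actually sees can be sketched as follows. This is a minimal illustration under stated assumptions, not the evaluation procedure of the SIGIR 2004 paper: the helper names, the 1-5 scale, and the toy predictions are all hypothetical.

```python
from typing import Dict, List


def mean_absolute_error(predictions: Dict[str, float], held_out: Dict[str, float]) -> float:
    """Conventional accuracy: error measured only on items with existing ratings."""
    common = [item for item in held_out if item in predictions]
    if not common:
        return 0.0
    return sum(abs(predictions[i] - held_out[i]) for i in common) / len(common)


def top_n(predictions: Dict[str, float], n: int) -> List[str]:
    """Items the user would actually see: the n highest predicted scores."""
    return sorted(predictions, key=lambda item: predictions[item], reverse=True)[:n]


def rated_fraction(shown: List[str], held_out: Dict[str, float]) -> float:
    """Share of the shown items for which any held-out rating exists."""
    if not shown:
        return 0.0
    return sum(1 for item in shown if item in held_out) / len(shown)


# Hypothetical predictions. "x" and "y" stand for obscure items whose high
# predictions rest on very little data (low confidence).
predictions = {"a": 4.1, "b": 3.9, "c": 2.0, "x": 5.0, "y": 4.9}
held_out = {"a": 4.0, "b": 4.0, "c": 2.0}

print("MAE on rated items:", mean_absolute_error(predictions, held_out))
shown = top_n(predictions, 2)
print("Top-2 shown to user:", shown)
print("Fraction of shown items with a rating:", rated_fraction(shown, held_out))
```

In this toy case the error on rated items is small, yet none of the items at the top of the list has a rating, so the conventional measure says nothing about what the user sees.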
16. Data from SIGIR 2004 Paper
17. Checkpoint Take-Aways
- Previously
- We have a problem synthesizing research in CF
- CoFE: free, could increase research productivity and reduce barriers to standardization
- More focus on the user experience needed
- Coming up
- There is a great potential for CF in information retrieval (i.e. not just product recommendation)
- CoFE URL
- http://eecs.oregonstate.edu/iis/CoFE
18. Exploring Library Search Interfaces
With Janet Webster, Oregon State University Libraries
19. Features of Web-based Library Search
- Diverse content
- Web pages, catalogs, journal indexes, electronic journals, maps, various other digital special collections
- Searchable databases are important sources
- Library responsibility
- Guiding people to appropriate content
- Understanding what the user's real need is
20. SERF: System for Electronic Recommendation Filtering
23. The Human Element
- Capture and leverage the experience of every user
- Recommendations are based on human evaluation
- Explicit votes
- Inferred votes (implicit)
- Recommend (question, document) pairs
- Not just documents
- A human can determine whether questions are similar
- System gets smarter with each use
- Not just each new document
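The idea of recommending (question, document) pairs can be sketched roughly as below. This is not SERF's implementation: the class name, the word-overlap similarity, and the threshold are hypothetical stand-ins chosen only to make the example runnable. Returning the past question together with the document leaves the final similarity judgment to the human.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokens(question: str) -> Set[str]:
    """Lowercased words of a question (a deliberately naive tokenizer)."""
    return set(question.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Word-overlap similarity, used here only as a placeholder."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class PairRecommender:
    """Stores votes on (question, document) pairs and recommends from them."""

    def __init__(self) -> None:
        # question text -> document id -> net votes (useful minus not useful)
        self.votes: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    def vote(self, question: str, document: str, useful: bool) -> None:
        self.votes[question][document] += 1 if useful else -1

    def recommend(self, question: str, min_similarity: float = 0.25) -> List[Tuple[str, str]]:
        """Return (past question, document) pairs, most similar question first."""
        query = tokens(question)
        scored = []
        for past, docs in self.votes.items():
            sim = jaccard(query, tokens(past))
            if sim < min_similarity:
                continue
            for doc, net in docs.items():
                if net > 0:  # keep only documents voted useful on balance
                    scored.append((sim, net, past, doc))
        scored.sort(key=lambda s: (-s[0], -s[1]))
        return [(past, doc) for _, _, past, doc in scored]


# Hypothetical usage with made-up questions and document names.
rec = PairRecommender()
rec.vote("library hours during finals", "hours.html", useful=True)
rec.vote("library hours during finals", "maps.html", useful=False)
rec.vote("interlibrary loan request", "ill.html", useful=True)
print(rec.recommend("what are the library hours"))
```

Every vote adds knowledge even when no new document enters the collection, which is the sense in which the system gets smarter with each use.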
24. Initial Results
28. Three months of SERF usage: 1194 search transactions
- Only Google results: 706 (59.13%)
  - Clicked: 172 (24.4%)
  - No clicks: 534 (75.6%)
- Google results + recommendations: 488 (40.87%)
  - Clicked: 197 (40.4%)
    - First click on a recommendation: 141 (71.6%)
    - First click on a Google result: 56 (28.4%)
  - No click: 291 (59.6%)
- Average visited documents: 2.196
- Average visited documents: 1.598
- Average rating: 14.727 (49 voted as useful)
- Average rating: 20.715 (69 voted as useful)
- Vote of yes = 30, vote of no = 0
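The percentages reported for the three-month usage data can be recomputed from the raw counts. The sketch below assumes the nesting that the sums suggest (172 + 534 = 706, 197 + 291 = 488, 141 + 56 = 197); the labels and the helper name `pct` are illustrative.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `part` in `whole`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


TOTAL = 1194

# (label, count, base, decimals, percentage reported on the slide)
checks = [
    ("only Google results", 706, TOTAL, 2, 59.13),
    ("Google results + recommendations", 488, TOTAL, 2, 40.87),
    ("Google only: clicked", 172, 706, 1, 24.4),
    ("Google only: no clicks", 534, 706, 1, 75.6),
    ("with recommendations: clicked", 197, 488, 1, 40.4),
    ("with recommendations: no click", 291, 488, 1, 59.6),
    ("first click on a recommendation", 141, 197, 1, 71.6),
    ("first click on a Google result", 56, 197, 1, 28.4),
]

# Collect any reported figure that does not match the recomputed one.
mismatches = [
    label
    for label, count, base, digits, reported in checks
    if abs(pct(count, base, digits) - reported) > 1e-9
]

for label, count, base, digits, reported in checks:
    print(f"{label}: {count}/{base} = {pct(count, base, digits)}% (slide: {reported}%)")
print("mismatches:", mismatches)
```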
29. Conclusion
- No large leaps in language understanding expected
- Understanding the meaning of language is very hard
- Collaborative filtering (CF) bypasses this problem
- Humans do the analysis
- Technology is widely applicable
32. Try it!
33. Final Take-Aways
- We have a problem synthesizing research in CF
- CoFE: free, could increase research productivity and reduce barriers to standardization
- More focus on the user experience needed
- Great potential for CF in information retrieval (i.e. not just product recommendation)
34. Links & Contacts
- Research Group Home Page
- http://eecs.oregonstate.edu/iis
- CoFE
- http://eecs.oregonstate.edu/iis/CoFE
- SERF
- http://osulibrary.oregonstate.edu/
- Jon Herlocker
- herlock_at_cs.orst.edu
- 1 (541) 737-8894
35. Simple CF
(Diagram: users, items, user-user links, item-item links, observed preferences)
36. Ending Thoughts
- Recommendation vs. persuasion
37. Stereotypical Integrator of RS Has
- Large item catalog
- With item attributes (e.g. keywords, metadata such as author, subject, cross-references, ...)
- Large user base
- With user attributes (age, gender, city, country, ...)
- Evidence of customer preferences
- Explicit ratings (powerful, but harder to elicit)
- Observations of user activity (purchases, page views, emails, prints, ...)
38. The RS Space
(Diagram: users, items, user-user links, item-item links, observed preferences)
39. Traditional Personalization
(Diagram: users, items, user-user links, item-item links, observed preferences)
40. Classic CF
(Diagram: users, items, user-user links, item-item links, observed preferences)
In the end, most models will be hybrid
42. Advantages of Pure CF
- No expensive and error-prone user attributes or item attributes
- Incorporates quality and taste
- Works on any rate-able item
- One data model => many content domains
- Serendipity
- Users understand and connect with it!
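Pure CF, which uses only users, items, and observed preferences, can be sketched with one common neighborhood-based formulation: Pearson correlation between users, then the target user's mean rating plus similarity-weighted deviations of the neighbors. This is an illustrative sketch, not CoFE code; the function names and the toy ratings are assumptions.

```python
from math import sqrt
from typing import Dict, Iterable, Optional

Ratings = Dict[str, Dict[str, float]]  # user -> item -> rating


def mean(values: Iterable[float]) -> float:
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings: Ratings, u: str, v: str) -> float:
    """Pearson correlation over the items both users rated (0.0 if undefined)."""
    common = [i for i in ratings[u] if i in ratings[v]]
    if len(common) < 2:
        return 0.0
    mu = mean(ratings[u][i] for i in common)
    mv = mean(ratings[v][i] for i in common)
    num = sum((ratings[u][i] - mu) * (ratings[v][i] - mv) for i in common)
    den = sqrt(sum((ratings[u][i] - mu) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mv) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, item: str) -> Optional[float]:
    """User's mean rating plus similarity-weighted deviations of the neighbors."""
    base = mean(ratings[user].values())
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        w = pearson(ratings, user, other)
        if w == 0.0:
            continue
        num += w * (theirs[item] - mean(theirs.values()))
        den += abs(w)
    # None signals that no correlated neighbor has rated the item.
    return base + num / den if den else None


# Toy ratings (hypothetical). No item or user attributes are needed.
ratings = {
    "ann": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 3, "m4": 5},
    "cat": {"m1": 1, "m2": 5, "m4": 2},
}
print(predict(ratings, "ann", "m4"))
```

Because the code never inspects what an item is, the same data model applies to any rate-able content. A production system would also clamp predictions to the rating scale and discount correlations based on few co-rated items.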