Collaborative Filtering: Some Comments on the State of the Art

1
Collaborative Filtering Some Comments on the
State of the Art
  • Jon Herlocker
  • Assistant Professor
  • School of Electrical Engineering and Computer
    Science
  • Oregon State University
  • Corvallis, OR

2
Yahoo! Employee-to-be?
Audrey Herlocker, age 1, 8/18/2004
3
Take-Aways
  • We have a problem synthesizing research in CF
  • CoFE: free, could increase research productivity
    and reduce barriers to standardization
  • More focus on the user experience needed
  • There is a great potential for CF in information
    retrieval (i.e. not just product recommendation)

4
What is the State of the Art?
  • 10 years of collaborative filtering (CF)
    research
  • Is CF machine learning?
  • 20 years of machine learning research?
  • Still hasn't transitioned from science to
    engineering
  • Still no "recommender system cookbook"

5
What do we know?
  • Consider the academic literature on CF
  • Lots of disconnected discoveries
  • Hard to synthesize
  • Different data sets
  • Variance in algorithm implementation
  • Variance in experimental procedures
  • Analysis of systems, not features
  • Private knowledge not shared
  • High barrier to formal experimentation and
    publication
  • No venue or reward for negative results
  • Commercial discoveries become intellectual
    property
  • So the sum of all knowledge?
  • Doesn't add up

6
Productivity of CF Research Community
  • How to increase productivity of CF research?
  • Each effort should have greater effect on total
    knowledge
  • Each effort should cost less
  • Increase the quantity of practical experience
    with CF
  • Our contribution
  • CoFE: Collaborative Filtering Engine

7
Shared Research Infrastructure
  • Concept
  • Free, open-source, infrastructure for rapid
    development and analysis of algorithms
  • Also make it fast and stable enough for mid-scale
    production
  • Facilitates
  • Lower cost methodical research
  • Sharing of new algorithms
  • Repositories
  • Comparability in analysis methods and algorithm
    implementations
  • More practical usage of CF

8
CoFE
  • CoFE - Collaborative Filtering Engine
  • Open source framework for Java
  • Easy to create new algorithms
  • Includes testing infrastructure (next month)
  • Reference implementations of many popular CF
    algorithms
  • Can support high-performance deployment
  • Production-ready (see Furl.net)
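As a concrete reference point, the classic user-based algorithm that engines like CoFE ship reference implementations of can be sketched in plain Java. This is a minimal illustration, not CoFE's actual API; all class and method names here are invented for the example.

```java
import java.util.*;

// Minimal user-based CF: Pearson correlation between users,
// then a correlation-weighted average of neighbors' ratings.
public class UserBasedCF {
    // ratings.get(user).get(item) = rating
    private final Map<String, Map<String, Double>> ratings;

    public UserBasedCF(Map<String, Map<String, Double>> ratings) {
        this.ratings = ratings;
    }

    // Pearson correlation over the items both users rated.
    double pearson(String u, String v) {
        Map<String, Double> ru = ratings.get(u), rv = ratings.get(v);
        List<String> common = new ArrayList<>();
        for (String item : ru.keySet())
            if (rv.containsKey(item)) common.add(item);
        int n = common.size();
        if (n == 0) return 0.0;
        double mu = 0, mv = 0;
        for (String i : common) { mu += ru.get(i); mv += rv.get(i); }
        mu /= n; mv /= n;
        double num = 0, du = 0, dv = 0;
        for (String i : common) {
            double a = ru.get(i) - mu, b = rv.get(i) - mv;
            num += a * b; du += a * a; dv += b * b;
        }
        return (du == 0 || dv == 0) ? 0.0 : num / Math.sqrt(du * dv);
    }

    // Predict u's rating for item: u's mean plus the
    // correlation-weighted, mean-centered ratings of neighbors.
    double predict(String u, String item) {
        double uMean = mean(ratings.get(u));
        double num = 0, den = 0;
        for (String v : ratings.keySet()) {
            if (v.equals(u) || !ratings.get(v).containsKey(item)) continue;
            double w = pearson(u, v);
            num += w * (ratings.get(v).get(item) - mean(ratings.get(v)));
            den += Math.abs(w);
        }
        return den == 0 ? uMean : uMean + num / den;
    }

    private static double mean(Map<String, Double> r) {
        double s = 0;
        for (double x : r.values()) s += x;
        return s / r.size();
    }
}
```

A testing framework such as the one in CoFE would wrap an implementation like this with shared data loading and evaluation, which is precisely what makes results comparable across papers.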

9
CoFE
[Architecture diagram: a Server instance exposes an
Algorithm Interface to Algorithm Objects; a Data
Manager Object provides an in-memory cache with
high-performance data structures backed by a
relational DB (MySQL); an Analysis Framework is
driven by an Experiment Configuration File (XML),
an XML experiment metadata file, and a delimited
data file]
10
Checkpoint Take-Aways
  • We have a problem synthesizing research in CF
  • CoFE: free, could increase research productivity
    and reduce barriers to standardization
  • Coming up
  • More focus on the user experience needed
  • There is a great potential for CF in information
    retrieval (i.e. not just product recommendation)
  • CoFE URL
  • http://eecs.oregonstate.edu/iis/CoFE

11
Does the Algorithm Really Matter?
  • Where do we get the most impact? (benefit/cost)
  • A. Improving the algorithm?
  • B. Changing user interface/user interaction?

12
Does the Algorithm Really Matter?
  • Where do we get the most impact? (benefit/cost)
  • A. Improving the algorithm?
  • B. Changing user interface/user interaction?
  • Answer
  • Unless you have already optimized your user
    interface extensively, the answer is usually B.

13
Scenario from a Related Field
  • Document retrieval study by Turpin and Hersh
    (SIGIR 2001)
  • Two groups of medical students
  • Compared human performance of
  • 1970s search model (basic TF/IDF)
  • Recent OKAPI search model with greatly improved
    Mean Average Precision
  • Identical user interfaces
  • Task: locating medical information
  • Result: no statistical difference!
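For concreteness, the two ranking functions compared in the study can be sketched as per-term scoring formulas. The constants below are the usual textbook defaults (BM25's k1 = 1.2, b = 0.75), not necessarily the exact configuration the study used.

```java
// Sketch of the two ranking models compared by Turpin and Hersh:
// plain TF-IDF versus Okapi BM25. Each method scores one term's
// contribution to a document's relevance score.
public class TermScoring {
    // Classic TF-IDF: term frequency times inverse document frequency.
    static double tfIdf(int tf, int numDocs, int docFreq) {
        return tf * Math.log((double) numDocs / docFreq);
    }

    // Okapi BM25: saturates term frequency and normalizes by
    // document length relative to the collection average.
    static double bm25(int tf, int numDocs, int docFreq,
                       double docLen, double avgDocLen) {
        double k1 = 1.2, b = 0.75;
        double idf = Math.log((numDocs - docFreq + 0.5) / (docFreq + 0.5) + 1.0);
        double norm = tf * (k1 + 1)
                / (tf + k1 * (1 - b + b * docLen / avgDocLen));
        return idf * norm;
    }
}
```

BM25's term-frequency saturation and length normalization are what drove its large Mean Average Precision gains over plain TF-IDF in batch evaluations; the study's point is that those gains did not translate into better human task performance.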

14
Turpin & Hersh Findings
  • Humans quickly compensate for poor algorithm
    performance
  • Possible conclusion: provide user interfaces that
    allow users to compensate
  • Many relevant results weren't selected as
    relevant
  • Possible conclusion: focus on persuading as well
    as recommending

15
Analyzing Algorithms for End-user Effects
  • Algorithms believed reasonable may actually be
    terrible!
  • McLaughlin & Herlocker, SIGIR 2004.
  • In this case, poor handling of low confidence
    recommendations
  • In situations with small amounts of data
  • Changes in algorithm → big changes in
    recommendations
  • Analyze exact recommendations seen by end-user
  • Instead of just items with existing ratings
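One widely used remedy for low-confidence neighbors in the CF literature is significance weighting: shrink a correlation toward zero when it was computed from only a few co-rated items. A minimal sketch (the cutoff of 50 is a conventional choice from the literature, used here as an assumption):

```java
// Significance weighting: a correlation estimated from few
// co-rated items is unreliable, so scale it down linearly
// until the pair shares at least `cutoff` co-rated items.
public class SignificanceWeighting {
    static double shrink(double correlation, int numCoRatedItems) {
        int cutoff = 50;
        return correlation * Math.min(numCoRatedItems, cutoff) / (double) cutoff;
    }
}
```

With this weighting, a perfect correlation based on only two shared items contributes almost nothing, which directly addresses the "poor handling of low confidence recommendations" failure mode above.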

16
Data from SIGIR 2004 Paper
17
Checkpoint Take-Aways
  • Previously
  • We have a problem synthesizing research in CF
  • CoFE: free, could increase research productivity
    and reduce barriers to standardization
  • More focus on the user experience needed
  • Coming up
  • There is a great potential for CF in information
    retrieval (i.e. not just product recommendation)
  • CoFE URL
  • http://eecs.oregonstate.edu/iis/CoFE

18
Exploring Library Search Interfaces
With Janet Webster, Oregon State University
Libraries
19
Features of Web-based Library Search
  • Diverse content
  • Web pages, catalogs, journal indexes, electronic
    journals, maps, various other digital special
    collections
  • Searchable databases are important sources
  • Library responsibility
  • Guiding people to appropriate content
    Understanding what the user's real need is

20
SERF: System for Electronic Recommendation
Filtering
21
(No Transcript)
22
(No Transcript)
23
The Human Element
  • Capture and leverage the experience of every user
  • Recommendations are based on human evaluation
  • Explicit votes
  • Inferred votes (implicit)
  • Recommend (question, document) pairs
  • Not just documents
  • Human can determine if questions have similarity
  • System gets smarter with each use
  • Not just each new document
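The (question, document) bookkeeping described above might be sketched as follows. The relative weighting of implicit clicks versus explicit votes is an assumption for illustration, not SERF's actual formula, and all names are invented.

```java
import java.util.HashMap;
import java.util.Map;

// Score recommendations keyed by (question, document) pairs,
// not by document alone, accumulating explicit votes and
// weaker implicit click evidence.
public class QuestionDocScores {
    private final Map<String, Double> scores = new HashMap<>();

    private static String key(String question, String doc) {
        return question + "\u0000" + doc;  // unambiguous pair key
    }

    // An explicit "useful / not useful" vote counts fully.
    public void explicitVote(String q, String doc, boolean useful) {
        scores.merge(key(q, doc), useful ? 1.0 : -1.0, Double::sum);
    }

    // A click is weaker evidence than an explicit vote
    // (the 0.25 weight is an assumption).
    public void implicitClick(String q, String doc) {
        scores.merge(key(q, doc), 0.25, Double::sum);
    }

    public double score(String q, String doc) {
        return scores.getOrDefault(key(q, doc), 0.0);
    }
}
```

Because every search transaction feeds back into these pair scores, the system gets smarter with each use rather than only when new documents are added.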

24
Initial Results
25
Three months of SERF usage: 1,194 search transactions

                                  Google results only    Google results + recommendations
Transactions                      706 (59.13%)           488 (40.87%)
Average visited documents         2.196                  1.598
Clicked                           172 (24.4%)            197 (40.4%)
No clicks                         534 (75.6%)            291 (59.6%)
First click on a recommendation   -                      141 (71.6%)
First click on a Google result    -                      56 (28.4%)
Average rating                    14.727 (49% voted useful)   20.715 (69% voted useful)

Votes of "yes": 30; votes of "no": 0
29
Conclusion
  • No large leaps in language understanding expected
  • Understanding the meaning of language is very
    hard
  • Collaborative filtering (CF) bypasses this
    problem
  • Humans do the analysis
  • Technology is widely applicable

30
(No Transcript)
31
(No Transcript)
32
Try it!
33
Final Take-Aways
  • We have a problem synthesizing research in CF
  • CoFE: free, could increase research productivity
    and reduce barriers to standardization
  • More focus on the user experience needed
  • Great potential for CF in information retrieval
    (i.e. not just product recommendation)

34
Links & Contacts
  • Research Group Home Page
  • http://eecs.oregonstate.edu/iis
  • CoFE
  • http://eecs.oregonstate.edu/iis/CoFE
  • SERF
  • http://osulibrary.oregonstate.edu/
  • Jon Herlocker
  • herlock@cs.orst.edu
  • +1 (541) 737-8894

35
Simple CF
[Diagram: users and items, connected by user-user
links, item-item links, and observed preferences]
36
Ending Thoughts
  • Recommendation vs. persuasion

37
Stereotypical Integrator of RS Has
  • Large item catalog
  • With item attributes (e.g. keywords, metadata
    such as author, subject, cross-references, …)
  • Large user base
  • With user attributes (age, gender, city, country,
    …)
  • Evidence of customer preferences
  • Explicit ratings (powerful, but harder to elicit)
  • Observations of user activity (purchases, page
    views, emails, prints, …)
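The data described above can be captured in a handful of plain record types. All field names here are illustrative, not any particular system's schema.

```java
import java.util.Map;

// The stereotypical integrator's data model: an item catalog
// with attributes, a user base with attributes, and preference
// evidence that is either explicit or observed.
record Item(String id, Map<String, String> attributes) {}

record User(String id, Map<String, String> attributes) {}

// Exactly one of the two evidence fields is typically set: an
// explicit rating (powerful, but harder to elicit) or an
// observed action such as "purchase" or "page_view".
record Preference(String userId, String itemId,
                  Double explicitRating, String observedAction) {}
```

Pure CF, discussed on the next slides, needs only the `Preference` triples; the attribute maps matter for content-based and hybrid approaches.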

38
The RS Space
[Diagram: users and items, connected by user-user
links, item-item links, and observed preferences]
39
Traditional Personalization
[Diagram: users and items, connected by user-user
links, item-item links, and observed preferences]
40
Classic CF
[Diagram: users and items, connected by user-user
links, item-item links, and observed preferences]
In the end, most models will be hybrid
42
Advantages of Pure CF
  • No expensive and error-prone user attributes or
    item attributes
  • Incorporates quality and taste
  • Works on any rate-able item
  • One data model → many content domains
  • Serendipity
  • Users understand and connect with it!