Title: Implicit feedback: Good may be better than best
1Implicit feedback: Good may be better than best
2Limitations of the web
- Dead links
- Lack of support for author royalties
- Poor indexing and navigation support
- Better system?
- Enforce link consistency
- Allow authors to collect royalties
- Support for better navigation and indexing
3Web
- Xanadu (1960)
- Improved design, fixes all of these limitations
- Essentially unused
- The web
- Widely used
- Disadvantages of the improved design
- Extra effort imposed on users
- Added complexity in the system
- Extended development time
- E.g., if link consistency is enforced, it is no longer possible for anyone to make information available simply by putting a file in a specific directory
- The web has become very popular in part due to its limitations
- Good may be better than best
4Web vs. Xanadu
- Ted Nelson
- Much credit: hypertext, inspiration for the web, Lotus Notes, HyperCard
- More to Xanadu not covered here (transclusion, bidirectional links, version management)
- According to Nelson
- "On both the desktop and world-wide scale, culturally and commercially, we are poorer for these bad tools [the web]"
- "The World Wide Web is precisely what we were trying to prevent"
5CiteSeer
- CiteSeer
- Metadata not required for submission
- Specific citation formats not required
- More optimal system?
- Require manual submission which specifies title, author, etc. (CoRR)
- Require citations to be submitted in a specific form (Cameron)
- CiteSeer is likely to contain more errors
- Error rate on articles not processed is 100%
- Value of explicit feedback not obtained is 0
- Much lower overhead and complexity for users
6Implicit vs. explicit feedback
- Explicit feedback
- Overhead for the user
- Implicit feedback
- No overhead for the user
- Implicit feedback may be better than explicit feedback because you may not be able to get sufficient explicit feedback
- Other issues: accuracy of feedback
7Good may be better than best
- Not a binary choice
- Often many possible systems
- Also
- Worse is better
- Best is the worst enemy of good
- MIT approach vs. New Jersey approach for design (Gabriel)
- The increased overhead, complexity, and/or cost (for the system and/or the users) and the extended development times of more optimal systems may make them far less successful than alternatives
8Convenience of access
- 119,924 conference articles (bibliographical data
from DBLP)
9Explicit metadata usage
- Only 34% of sites use description or keywords tags on their homepage
- Analyzed 2,500 random servers
- 0.3% of sites contained Dublin Core tags
- "Attention is the scarce resource." Herb Simon (1967)
- Difficult to obtain explicit feedback
10Implicit vs. explicit feedback
- Limitations of implicit feedback
- Hard to determine the meaning of a click. If the best link is not displayed, users will still click on something
- Click duration may be misleading
- People leave machines unattended
- Opening multiple windows quickly, then reading them all slowly
- Multitasking
- Limitations of explicit feedback
- Spam
- Inconsistent ratings
11CiteSeer
12CiteSeer
- Scientific literature digital library
- Over 600,000 documents indexed
- Earth's largest free full-text index of scientific literature
- (Los Alamos arXiv: about 200,000 papers)
- Over 20,000 hosts accessing the site daily
- Accesses from over 150 countries per month
- Over 10 requests per second at peak times
13Improving implicit feedback
- Users have to go to the details page before getting a link to the article
- Users have seen the abstract before downloading
- Users are shown the context of citations before downloading
14No download link
15Document information page
16Citation context
17CiteSeer explicit feedback
- Document ratings and comments
18CiteSeer explicit feedback
- Allow users to correct errors
- Authors may be motivated to correct errors relating to their own work
- How many explicit corrections? (About 600,000 papers)
- How many explicit ratings? (Percentage of document accesses)
19Explicit feedback
- Over 300,000 explicit corrections/updates
- How many bogus updates?
- (We require a validated email address)
- Explicit ratings: 0.17% of document accesses
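To put the explicit-rating rate in perspective, the arithmetic can be sketched as follows. This is a rough illustration only: it reads the slide's 0.17 as a percentage of document accesses, and `ACCESSES` is a hypothetical traffic volume, not a CiteSeer figure.

```python
# Figure from the slide: explicit ratings are 0.17% of document accesses,
# i.e. 17 ratings per 10,000 accesses. Integer arithmetic avoids float rounding.
RATINGS_PER_10K_ACCESSES = 17


def expected_ratings(accesses: int) -> int:
    """Expected number of explicit ratings for a given number of accesses."""
    return accesses * RATINGS_PER_10K_ACCESSES // 10_000


def accesses_per_rating() -> float:
    """Average number of accesses (implicit signals) per explicit rating."""
    return 10_000 / RATINGS_PER_10K_ACCESSES


# ACCESSES is a hypothetical traffic volume chosen only for illustration.
ACCESSES = 100_000
print(expected_ratings(ACCESSES))    # 170
print(round(accesses_per_rating()))  # 588
```

Under that reading, every explicit rating is accompanied by several hundred accesses that could serve as implicit signals.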
20Explicit corrections
- Over 100 bogus correction attempts
21Comparison of feedback types
- How well do document accesses, document downloads, and explicit ratings predict high-citation papers?
- Low-citation papers (< 5 citations)
- High-citation papers (> 5 citations)
- Ratio of downloads/accesses/ratings for high- to low-citation papers
- Accesses: ?
- Downloads: ?
- Ratings: ?
22Comparison of feedback types
- Low-citation papers (< 5 citations)
- High-citation papers (> 5 citations)
- Ratio of downloads/accesses/ratings for high- to low-citation papers
- Accesses: 2.5
- Downloads: 3.1
- Ratings: 0.96 (low 2.3, high 2.2)
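The ratio reported for ratings can be reproduced from the two values quoted in parentheses. The sketch below assumes that "low 2.3" and "high 2.2" are the mean explicit ratings for the low- and high-citation groups; the slide does not state this, and the function name is illustrative.

```python
def high_to_low_ratio(high_mean: float, low_mean: float) -> float:
    """Ratio of the average signal for high-citation papers to low-citation papers."""
    if low_mean <= 0:
        raise ValueError("low_mean must be positive")
    return high_mean / low_mean


# Values quoted on the slide, read here as mean ratings (an assumption).
ratings_ratio = high_to_low_ratio(high_mean=2.2, low_mean=2.3)
print(round(ratings_ratio, 2))  # 0.96

# Ratios as reported on the slide; a ratio near 1 means the signal
# barely separates high-citation from low-citation papers.
reported = {"accesses": 2.5, "downloads": 3.1, "ratings": ratings_ratio}
ranked = sorted(reported, key=lambda name: reported[name], reverse=True)
print(ranked)  # ['downloads', 'accesses', 'ratings']
```

On these numbers the two implicit signals separate the groups, while explicit ratings do not.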
23CiteSeer user profiling
- Profiling system not currently active (scale)
- Profile contains documents, citations, keywords, etc. of interest
- User notified of new related documents or citations by email or via the web interface
- Both implicit and explicit feedback
- Record the actions of a user for recommendations
- View
- Download
- Ignore
26CiteSeer user profiling
- Implicit feedback should be more successful in CiteSeer due to citation context, query-sensitive summaries, document details pages, and the expense of document downloads
- Users can better determine the relevance of documents before they request details or download articles
- Analyze co-viewed/downloaded documents to recommend documents related to a given document
- Similar to one of Amazon's book recommenders
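A co-view recommender of the kind described above can be sketched in a few lines. This is a minimal illustration, not CiteSeer's implementation: the session format, the function names, and the use of raw co-occurrence counts are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coview_counts(sessions):
    """Count how often each pair of documents appears in the same user's history."""
    counts = defaultdict(Counter)
    for docs in sessions:
        # Deduplicate so repeated views by one user count once per pair.
        for a, b in combinations(sorted(set(docs)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, doc_id, k=3):
    """Return up to k documents most often co-viewed with doc_id."""
    return [doc for doc, _ in counts.get(doc_id, Counter()).most_common(k)]


# Hypothetical view/download histories, one list per user.
sessions = [["A", "B", "C"], ["A", "B"], ["B", "D"], ["A", "B", "D"]]
counts = build_coview_counts(sessions)
print(recommend(counts, "A", k=1))  # ['B']
print(recommend(counts, "B", k=2))  # ['A', 'D']
```

Raw counts favor popular documents; a production system would likely normalize them, which is left out here for brevity.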
27Profile creation
- (Pseudo-)documents are added to a user's profile whenever the user performs an action in the profile editor or on a real document when browsing
- Action interestingness a(.)
- Explicitly added to profile: very high positive
- Downloaded: high positive
- Details viewed: moderate positive
- Recommendation ignored: low negative
- Removed from profile: set to zero
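The mapping from actions to interestingness could be represented as below. The slide gives only qualitative levels, so the numeric values are placeholders, and accumulating weights across repeated actions is an assumption rather than something the slide states.

```python
# Placeholder magnitudes for a(.); the slide gives only qualitative levels.
ACTION_INTEREST = {
    "added_to_profile": 1.0,           # very high positive
    "downloaded": 0.5,                 # high positive
    "details_viewed": 0.25,            # moderate positive
    "recommendation_ignored": -0.125,  # low negative
}


def apply_action(profile, doc_id, action):
    """Update the weight of the pseudo-document for doc_id after a user action."""
    if action == "removed_from_profile":
        profile[doc_id] = 0.0  # removal sets the weight to zero
    else:
        # Assumption: weights accumulate over repeated actions.
        profile[doc_id] = profile.get(doc_id, 0.0) + ACTION_INTEREST[action]
    return profile


profile = {}
apply_action(profile, "paper-1", "details_viewed")
apply_action(profile, "paper-1", "downloaded")
print(profile["paper-1"])  # 0.75
apply_action(profile, "paper-1", "removed_from_profile")
print(profile["paper-1"])  # 0.0
```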
28Paper recommendations
- New papers recommended periodically via email or the web interface
- New paper d recommended if it has a sufficiently high interestingness
- Threshold initially set at a small positive value
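The threshold test can be sketched as follows. The slides do not say how the interestingness of a new paper is computed, so this example assumes a weighted sum of relatedness to the profile's pseudo-documents, with Jaccard keyword overlap standing in for the relatedness measure. The function names and the threshold value are illustrative.

```python
def interestingness(paper_keywords, profile):
    """Weighted sum of keyword overlap between a new paper and profile items.

    profile is a list of (keywords, weight) pairs, one per pseudo-document.
    Jaccard overlap is a stand-in for whatever relatedness measures are used.
    """
    score = 0.0
    for keywords, weight in profile:
        union = paper_keywords | keywords
        if union:
            score += weight * len(paper_keywords & keywords) / len(union)
    return score


def should_recommend(paper_keywords, profile, threshold=0.05):
    """Recommend when interestingness exceeds a small positive threshold."""
    return interestingness(paper_keywords, profile) > threshold


# Hypothetical profile: one liked topic, one disliked topic.
profile = [({"implicit", "feedback"}, 1.0), ({"music"}, -0.5)]
print(should_recommend({"implicit", "feedback", "web"}, profile))  # True
print(should_recommend({"music"}, profile))                        # False
```

Starting from a small positive threshold means a paper only needs a weak positive match to be recommended at first.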
29Profile adaption
- Adaption occurs via manual adjustment and machine learning
- User can explicitly modify a profile by adjusting the weight of pseudo-documents
- Browsing actions implicitly modify the weight of corresponding pseudo-documents
- User response to the recommendation of a paper d is used to update the weights that contributed to the recommendation
- (Update equation not transcribed; it includes a learning rate)
30Weight update rule properties
- Weights modified according to their contribution to recommendations
- Overall precision/recall threshold automatically adapted. Ignoring recommendations raises the threshold for recommending a paper. Explicitly adding papers lowers the threshold
- The influence of different relatedness measures is adapted separately
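The properties listed above are consistent with a simple additive update, sketched below. The actual update equation is not in the transcript, so this is one possible rule and not the authors' formula. The parameter names, the step sizes, and the keying of weights by (pseudo-document, relatedness measure) are assumptions.

```python
def update_after_response(weights, contributions, threshold, response,
                          learning_rate=0.5, threshold_step=0.125):
    """Adjust weights and the recommendation threshold after a user response.

    weights:       dict keyed by (pseudo_document, measure) -> weight
    contributions: dict with the same keys -> share of the recommendation
                   score that the weight contributed, in [0, 1]
    response:      +1 if the user added/followed the paper, -1 if ignored
    """
    new_weights = dict(weights)
    for key, share in contributions.items():
        # Each weight moves in proportion to its contribution, so separate
        # relatedness measures are adapted separately.
        new_weights[key] = weights[key] + learning_rate * response * share
    # Ignoring (-1) raises the threshold; adding (+1) lowers it.
    new_threshold = threshold - threshold_step * response
    return new_weights, new_threshold


weights = {("p1", "text"): 0.5, ("p1", "citation"): 0.5}
contributions = {("p1", "text"): 0.75, ("p1", "citation"): 0.25}
new_weights, new_threshold = update_after_response(
    weights, contributions, threshold=0.25, response=-1)
print(new_weights)    # {('p1', 'text'): 0.125, ('p1', 'citation'): 0.375}
print(new_threshold)  # 0.375
```

In this example an ignored recommendation penalizes the text-based weight most, because it contributed most to the score.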
31REFEREE
- Recommender framework where outside groups can test recommendation systems live on CiteSeer
- Implemented a version of Pennock's Personality Diagnosis recommender for initial testing
32REFEREE
- Statistics on recommender performance available quickly
- For evaluation we focus on measuring impact on user behavior
- Implicit feedback more effective because users see a lot of information about documents before they can download them
- Which recommenders are best?
- Users who viewed x also viewed?
- Exact sentence overlap?
- Papers that cite this paper?
- Citation similarity?
33Recommendations followed
Recommendation type Recommendations followed
Sentence overlap 8.2
Cited by 5.1
CCIDF (bibliographic coupling) 3.1
PD-1 2.1
Users who viewed 2.0
PD-2 2.0
Co-citation 1.9
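For readers unfamiliar with the CCIDF row, the idea can be sketched as follows. It assumes that CCIDF scores two documents by the citations they share, with each shared citation weighted by the inverse of how often it is cited in the collection. The data layout and function name are illustrative, not taken from CiteSeer.

```python
from collections import Counter


def ccidf_scores(citations, target):
    """Score documents by citations shared with target, weighted by rarity.

    citations: dict mapping document id -> set of works it cites.
    """
    # How many documents in the collection cite each work.
    freq = Counter(c for cited in citations.values() for c in cited)
    scores = {}
    for doc, cited in citations.items():
        if doc == target:
            continue
        common = citations[target] & cited
        if common:
            # Rarely cited works are stronger evidence of relatedness.
            scores[doc] = sum(1.0 / freq[c] for c in common)
    return scores


# Hypothetical citation lists.
citations = {"A": {"x", "y"}, "B": {"x", "y", "z"}, "C": {"x"}, "D": {"w"}}
scores = ccidf_scores(citations, "A")
print(max(scores, key=scores.get))  # B
print("D" in scores)                # False
```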
34NewsSeer
35NewsSeer
- Primarily a single page with implicit feedback only
- Also supports explicit feedback, but this is optional
39NewsSeer statistics
- About 1 million pageviews
- About 10,000 users (> 5 requests)
- 5,000 users (> 10 requests)
- How many users rated an article?
- What percentage of requests were ratings on the homepage?
- What percentage of requests were for the source ratings page?
40NewsSeer statistics
- 1,000 users rated an article, out of the 10,000 with > 5 requests
- About 10%
- About 20% of the top 2,500 users
- About 30% of the top 1,000 users
- 20 of 56 users that made > 1,000 requests
- 10 of 21 users that made > 2,000 requests
- Homepage: 51% (auto-reloaded)
- View article: 40%
- Keyword query: 4% (was not available initially)
- Ratings on homepage: 5%
- Source rating page views: 0.2%
41MusicSeer
42Music similarity
43Music similarity
- Music similarity survey
- Erdös game
44Music similarity
45Music similarity
46MusicSeer
- Survey
- 713 users, 10,997 judgments
- Game
- 680 users, 11,313 judgments
47Summary
- Implicit feedback may be better because there is much lower overhead
- Much greater participation may more than compensate for the less accurate information received
- Can structure the system to maximize the implicit feedback gained
- Can obtain explicit feedback if there is enough incentive or it is easy enough