Implicit feedback: Good may be better than best

1
Implicit feedback: Good may be better than best
  • Steve Lawrence

2
Limitations of the web
  • Dead links
  • Lack of support for author royalties
  • Poor indexing and navigation support
  • Better system?
    • Enforce link consistency
    • Allow authors to collect royalties
    • Support for better navigation and indexing

3
Web
  • Xanadu (1960)
    • Improved design, fixes all of these limitations
    • Essentially unused
  • The web
    • Widely used
  • Disadvantages of the improved design
    • Extra effort imposed on users
    • Added complexity in the system
    • Extended development time
    • e.g., if link consistency is enforced, people can no
      longer make information available simply by
      putting a file in a specific directory
  • The web has become very popular in part due to
    its limitations
  • Good may be better than best

4
Web vs. Xanadu
  • Ted Nelson
    • Much credit: "hypertext", inspiration for the
      web, Lotus Notes, HyperCard
    • More to Xanadu not covered here (transclusion,
      bidirectional links, version management)
  • According to Nelson:
    • "On both the desktop and world-wide scale,
      culturally and commercially, we are poorer for
      these bad tools" (the web)
    • "The World Wide Web is precisely what we were
      trying to prevent"

5
CiteSeer
  • CiteSeer
    • Metadata not required for submission
    • Specific citation formats not required
  • More optimal system?
    • Require manual submission which specifies title,
      author, etc. (CORR)
    • Require citations to be submitted in a specific
      form (Cameron)
  • CiteSeer is likely to contain more errors
  • Error rate on articles not processed is 100%
  • Value of explicit feedback not obtained is 0
  • Much lower overhead and complexity for users

6
Implicit vs. explicit feedback
  • Explicit feedback
    • Overhead for the user
  • Implicit feedback
    • No overhead for the user
  • Implicit feedback may be better than explicit
    feedback because you may not be able to get
    sufficient explicit feedback
  • Other issues - accuracy of feedback

7
Good may be better than best
  • Not a binary choice
  • Often many possible systems
  • Also:
    • Worse is better
    • Best is the worst enemy of good
    • MIT approach vs. New Jersey approach for
      design (Gabriel)
  • The increased overhead, complexity and/or cost
    (for the system and/or the users), and extended
    development times of more optimal systems may
    make them far less successful than alternatives

8
Convenience of access
  • 119,924 conference articles (bibliographical data
    from DBLP)

9
Explicit metadata usage
  • Only 34% of sites use description or keywords
    tags on their homepage
  • Analyzed 2,500 random servers
  • 0.3% of sites contained Dublin Core tags
  • "Attention is the scarce resource." Herb Simon
    (1967)
  • Difficult to obtain explicit feedback

10
Implicit vs. explicit feedback
  • Limitations of implicit feedback
    • Hard to determine the meaning of a click. If the
      best link is not displayed, users will still
      click on something
    • Click duration may be misleading
      • People leave machines unattended
      • Opening multiple windows quickly, then reading
        them all slowly
      • Multitasking
  • Limitations of explicit feedback
    • Spam
    • Inconsistent ratings

11
CiteSeer
12
CiteSeer
  • Scientific literature digital library
  • Over 600,000 documents indexed
  • Earth's largest free full-text index of
    scientific literature
  • (Los Alamos arXiv about 200,000 papers)
  • Over 20,000 hosts accessing the site daily
  • Accesses from over 150 countries per month
  • Over 10 requests per second at peak times

13
Improving implicit feedback
  • Have to go to details page before getting link to
    article
  • Have seen abstract before downloading
  • Shown context of citations before downloading

14
No download link
15
Document information page
16
Citation context
17
CiteSeer explicit feedback
  • Document ratings and comments

18
CiteSeer explicit feedback
  • Allow users to correct errors
  • Authors may be motivated to correct errors
    relating to their own work
  • How many explicit corrections? (About 600,000
    papers)
  • How many explicit ratings? (percentage of
    document accesses)

19
Explicit feedback
  • Over 300,000 explicit corrections/updates
  • How many bogus updates?
  • (We require a validated email address)
  • Explicit ratings: 0.17% of document accesses

20
Explicit corrections
  • Over 100 bogus correction attempts

21
Comparison of feedback types
  • How well do document access, document downloads,
    and explicit ratings predict high-citation
    papers?
  • Low citation papers (< 5 citations)
  • High citation papers (> 5 citations)
  • Ratio of downloads/accesses/ratings for high to
    low-citation papers
    • Accesses: ?
    • Downloads: ?
    • Ratings: ?

22
Comparison of feedback types
  • Low citation papers (< 5 citations)
  • High citation papers (> 5 citations)
  • Ratio of downloads/accesses/ratings for high to
    low-citation papers
    • Accesses: 2.5
    • Downloads: 3.1
    • Ratings: 0.96 (low: 2.3, high: 2.2)
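
The ratings ratio follows directly from the two values given in parentheses. A minimal sketch of the arithmetic, assuming the "low" and "high" figures are the mean explicit ratings for each group (the function name is illustrative, not from the talk):

```python
def high_to_low_ratio(high_value: float, low_value: float) -> float:
    """Ratio of a per-paper statistic for high-citation vs. low-citation papers."""
    if low_value == 0:
        raise ValueError("low_value must be non-zero")
    return high_value / low_value


# Values from the slide: low-citation papers 2.3, high-citation papers 2.2.
ratings_ratio = high_to_low_ratio(2.2, 2.3)
print(round(ratings_ratio, 2))  # 0.96
```

A ratio near 1 means explicit ratings barely separate the two groups, whereas the access and download ratios (2.5 and 3.1) do.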

23
CiteSeer user profiling
  • Profiling system not currently active (scale)
  • Profile contains documents, citations, keywords,
    etc. of interest
  • User notified of new related documents or
    citations by email or via the web interface
  • Both implicit and explicit feedback
  • Record the actions of a user for recommendations
    • View
    • Download
    • Ignore

24
(No Transcript)
25
(No Transcript)
26
CiteSeer user profiling
  • Implicit feedback should be more successful in
    CiteSeer due to citation context, query-sensitive
    summaries, document details pages, and the
    expense of document downloads
  • Users can better determine the relevance of
    documents before they request details or download
    articles
  • Analyze co-viewed/downloaded documents to
    recommend documents related to a given document
  • Similar to one of Amazon's book recommenders
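
The co-viewing idea can be sketched as a pair count over per-user view logs. This is only an illustration under assumed data: the log format, the document IDs, and both function names are hypothetical, and nothing here is CiteSeer's actual implementation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coview_counts(sessions):
    """Count how often each pair of documents is viewed by the same user."""
    counts = defaultdict(Counter)
    for docs in sessions:
        for a, b in combinations(sorted(set(docs)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def related_documents(counts, doc_id, top_n=3):
    """Return the documents most often co-viewed with doc_id."""
    return [doc for doc, _ in counts[doc_id].most_common(top_n)]


# Hypothetical view logs: one list of document IDs per user.
sessions = [
    ["paper_a", "paper_b", "paper_c"],
    ["paper_a", "paper_b"],
    ["paper_b", "paper_d"],
]
counts = build_coview_counts(sessions)
print(related_documents(counts, "paper_a"))  # ['paper_b', 'paper_c']
```

No rating is ever requested from the user; the recommendation signal comes entirely from ordinary browsing.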

27
Profile creation
  • (Pseudo)-documents added to the user's profile
    whenever a user performs an action in the profile
    editor or on a real document when browsing
  • Action interestingness a(.)
    • Explicitly added to profile: very high positive
    • Downloaded: high positive
    • Details viewed: moderate positive
    • Recommendation ignored: low negative
    • Removed from profile: set to zero
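
A minimal sketch of how such an action-to-interestingness mapping might drive a profile. The slide gives only qualitative levels, so the numeric values below are assumptions, as is the choice to accumulate weights additively; the names are hypothetical.

```python
# Hypothetical numeric values for the qualitative levels on the slide.
ACTION_INTERESTINGNESS = {
    "added_to_profile": 1.0,         # very high positive
    "downloaded": 0.6,               # high positive
    "details_viewed": 0.3,           # moderate positive
    "recommendation_ignored": -0.1,  # low negative
}


def apply_action(profile, doc_id, action):
    """Update the weight of a (pseudo-)document in a user's profile."""
    if action == "removed_from_profile":
        profile[doc_id] = 0.0  # removal sets the weight to zero
    else:
        # Assumption: repeated actions on the same document add up.
        profile[doc_id] = profile.get(doc_id, 0.0) + ACTION_INTERESTINGNESS[action]
    return profile


profile = {}
apply_action(profile, "doc42", "details_viewed")
apply_action(profile, "doc42", "downloaded")
apply_action(profile, "doc7", "recommendation_ignored")
apply_action(profile, "doc42", "removed_from_profile")
print(profile)
```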

28
Paper recommendations
  • New papers recommended periodically via email or
    the web interface
  • New paper d recommended if it has a sufficiently
    high interestingness
  • Threshold initially set at a small positive value
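
The transcript does not preserve how the interestingness of a new paper d is computed. The sketch below assumes one plausible form, a sum of profile weights multiplied by a relatedness score, and uses keyword overlap as a stand-in relatedness measure. The function names, keyword data, and the 0.05 threshold are all hypothetical.

```python
def interestingness(new_doc, profile, relatedness):
    """Weighted sum of relatedness between new_doc and each profile entry."""
    return sum(weight * relatedness(new_doc, doc) for doc, weight in profile.items())


def recommend(new_docs, profile, relatedness, threshold=0.05):
    """Return the new documents whose interestingness exceeds the threshold."""
    return [d for d in new_docs if interestingness(d, profile, relatedness) > threshold]


# Hypothetical keyword sets used only to illustrate a relatedness measure.
keywords = {
    "p1": {"recommender", "feedback"},
    "p2": {"feedback", "citation"},
    "new1": {"feedback", "recommender", "web"},
    "new2": {"music"},
}


def jaccard(a, b):
    """Jaccard overlap of the keyword sets of two documents."""
    union = keywords[a] | keywords[b]
    return len(keywords[a] & keywords[b]) / len(union) if union else 0.0


profile = {"p1": 1.0, "p2": 0.3}
print(recommend(["new1", "new2"], profile, jaccard))  # ['new1']
```

A small positive default threshold means that at first almost anything related to the profile is recommended.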

29
Profile adaption
  • Adaption occurs via manual adjustment and machine
    learning
  • User can explicitly modify a profile by adjusting
    the weight of pseudo-documents
  • Browsing actions implicitly modify the weight of
    corresponding pseudo-documents
  • User response to recommendation of a paper d is
    used to update weights that contributed to the
    recommendation
  • (Weight update equation not preserved in this
    transcript; it includes a learning rate)

30
Weight update rule properties
  • Weights modified according to their contribution
    to recommendations
  • Overall precision/recall threshold automatically
    adapted. Ignoring recommendations raises the
    threshold for recommending a paper. Explicitly
    adding papers lowers the threshold
  • The influence of different relatedness measures
    is adapted separately
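
The update equation itself is missing from the transcript, so the following is not the author's rule. It is a sketch of one rule that has the listed properties: each weight moves in proportion to its share of the recommendation score, and the threshold rises when recommendations are ignored and falls when papers are added. The function names, the normalisation, and the learning rate of 0.1 are assumptions.

```python
def update_weights(weights, contributions, response, learning_rate=0.1):
    """Shift each weight in proportion to its share of the recommendation score.

    response is the interestingness of the user's reaction: positive if the
    recommended paper was added or downloaded, negative if it was ignored.
    """
    total = sum(abs(c) for c in contributions.values())
    if total == 0:
        return dict(weights)
    return {
        key: w + learning_rate * response * contributions.get(key, 0.0) / total
        for key, w in weights.items()
    }


def update_threshold(threshold, response, learning_rate=0.1):
    """Negative responses raise the threshold; positive responses lower it."""
    return threshold - learning_rate * response


weights = {"p1": 1.0, "p2": 0.3}
contributions = {"p1": 0.6, "p2": 0.2}  # hypothetical per-entry score shares
ignored = -0.1                          # hypothetical "low negative" response

new_weights = update_weights(weights, contributions, ignored)
new_threshold = update_threshold(0.05, ignored)
print(new_weights, new_threshold)
```

The sketch does not model separate adaptation per relatedness measure. One way to add it would be to keep one such weight table per measure.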

31
REFEREE
  • Recommender framework where outside groups can
    test recommendation systems live on CiteSeer
  • Implemented a version of Pennock's Personality
    Diagnosis recommender for initial testing

32
REFEREE
  • Statistics on recommender performance available
    quickly
  • For evaluation we focus on measuring impact on
    user behavior
  • Implicit feedback more effective because users
    see a lot of information about documents before
    they can download them
  • Which recommenders are best?
    • Users who viewed x also viewed?
    • Exact sentence overlap?
    • Papers that cite this paper?
    • Citation similarity?

33
Recommendations followed
Recommendation type               Recommendations followed
Sentence overlap                  8.2
Cited by                          5.1
CCIDF (bibliographic coupling)    3.1
PD-1                              2.1
Users who viewed                  2.0
PD-2                              2.0
Co-citation                       1.9
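
Two of the citation-based measures in the table can be sketched as follows. The sketch assumes that CCIDF weights each shared reference by the inverse of how often it is cited in the collection, and that co-citation counts the documents citing both papers. The citation data and function names are hypothetical, and the code is not taken from CiteSeer.

```python
from collections import Counter


def ccidf(doc_a, doc_b, references):
    """Sum of inverse citation frequencies over references shared by two documents."""
    # How many documents in the collection cite each reference.
    freq = Counter(ref for refs in references.values() for ref in set(refs))
    shared = set(references[doc_a]) & set(references[doc_b])
    return sum(1.0 / freq[ref] for ref in shared)


def co_citation(doc_a, doc_b, references):
    """Number of documents that cite both doc_a and doc_b."""
    return sum(1 for refs in references.values() if doc_a in refs and doc_b in refs)


# Hypothetical citation data: document -> list of cited references.
references = {
    "d1": ["r1", "r2", "r3"],
    "d2": ["r1", "r2"],
    "d3": ["r1", "d1", "d2"],
    "d4": ["d1", "d2"],
}
print(ccidf("d1", "d2", references))        # 1/3 + 1/2
print(co_citation("d1", "d2", references))  # 2
```

Bibliographic coupling looks at what two papers cite, so it is available as soon as a paper is indexed. Co-citation needs later papers to cite both.
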
34
NewsSeer
35
NewsSeer
  • Primarily a single page with implicit feedback
    only
  • Also supports explicit feedback but this is
    optional

36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
NewsSeer statistics
  • About 1 million pageviews
  • About 10,000 users (> 5 requests)
  • 5,000 users (> 10 requests)
  • How many users rated an article?
  • What percentage of requests were ratings on the
    homepage?
  • What percentage of requests were for the source
    ratings page?

40
NewsSeer statistics
  • 1,000 users rated an article, out of the 10,000
    with > 5 requests
  • About 10%
  • About 20% of the top 2,500 users
  • About 30% of the top 1,000 users
  • 20 of 56 users who made > 1,000 requests
  • 10 of 21 users who made > 2,000 requests
  • Homepage: 51% (auto-reloaded)
  • View article: 40%
  • Keyword query: 4% (was not available initially)
  • Ratings on homepage: 5%
  • Source rating page views: 0.2%
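
The share of users who rated an article rises with activity, which is easier to see once the raw counts are converted to percentages. A small sketch, assuming "20 of 56" and "10 of 21" are user counts rather than percentages (the helper name is illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# (users who rated an article, users in the group), as reported on the slide.
groups = {
    "> 5 requests": (1_000, 10_000),
    "> 1,000 requests": (20, 56),
    "> 2,000 requests": (10, 21),
}
for label, (raters, users) in groups.items():
    print(f"{label}: {percent(raters, users)}% rated an article")
```

Under that reading, roughly 10% of all users with more than 5 requests rated an article, against about 36% and 48% of the two heaviest-use groups.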

41
MusicSeer
42
Music similarity
43
Music similarity
  • Music similarity survey
  • Erdös game

44
Music similarity
  • Erdös game

45
Music similarity
  • Simple survey

46
MusicSeer
  • Survey
    • 713 users, 10,997 judgments
  • Game
    • 680 users, 11,313 judgments

47
Summary
  • Implicit feedback may be better because there is
    much lower overhead
  • Much greater participation may more than
    compensate for the less accurate information
    received
  • Can structure system to maximize implicit
    feedback gained
  • Can obtain explicit feedback if there is enough
    incentive, or if giving it is easy enough