Title: Accurately Interpreting Clickthrough Data as Implicit Feedback
1. Accurately Interpreting Clickthrough Data as Implicit Feedback
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, Geri Gay
- Cornell University
- SIGIR 2005
- Presented by Rosta Farzan
- PAWS Group Meeting
2. Problem
Adapting retrieval systems requires a large amount of data
- Explicit data: expensive
- Implicit data: noisy and unreliable
3. Goal
- Evaluate which types of implicit feedback can be reliably extracted from observed user behavior
4. Outline
- Introduction
- User Study
- Analysis
- Discussion
5. Introduction
- Designing a study to evaluate the reliability of implicit feedback
- How users interact with the list of ranked results from Google search
- Two types of analysis
- Analysis of user behavior
- Using eye tracking
- Do users scan from top to bottom?
- How many abstracts do they read before clicking?
- How does user behavior change if the results are manipulated artificially?
- Analysis of implicit feedback
- Comparing implicit feedback with explicit feedback collected manually
6. User Study
- Task
- Five navigational questions
- Find related web pages
- Five informational questions
- Find specific information
- Users read each question in turn and answered orally when they found the answer
- Participants
- Phase I
- 34 undergraduates from different majors
- Data from 29 were used because of eye-tracking issues
- Phase II
- 22 participants, of whom 16 were used
- Conditions
- Phase I
- Normal: Google's search results with no manipulation
- Phase II
- Normal: Google's search results with no manipulation
- Swapped: top two results switched in order
- Reversed: the 10 search results in reversed order
- Example questions
- Navigational: Find the homepage of Michael Jordan, the statistician. Find the page displaying the route map for Greyhound buses.
- Informational: Where is the tallest mountain in New York located? Which actor starred as the main character in the original Time Machine movie?
7. User Study
- Data Collection
- Implicit data
- An HTTP proxy server logs all click-stream data
- Eye tracking
- Fixations
- Explicit data
- Five judges, each assessing two questions plus 10 result pages from two other questions
- Judges order the randomized results by how relevant they are
- Relative decision making
- Inter-judge agreement
- Phase I (ordering top 10): 89.5%
- Phase II (ordering all results): 82.5%
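One simple way to quantify agreement between two judges' orderings is the fraction of result pairs that both judges place in the same relative order. The following is a minimal sketch under that assumption; the slides do not state how the agreement figures were computed, and `pairwise_agreement` is a hypothetical helper, not the authors' method.

```python
from itertools import combinations
from typing import Hashable, Sequence


def pairwise_agreement(order_a: Sequence[Hashable],
                       order_b: Sequence[Hashable]) -> float:
    """Fraction of result pairs that two judges rank in the same relative order.

    Each argument is one judge's ordering of the same results, most relevant
    first. Assumption: strict orderings with no ties.
    """
    pos_a = {doc: i for i, doc in enumerate(order_a)}
    pos_b = {doc: i for i, doc in enumerate(order_b)}
    # Only compare results that both judges ordered.
    shared = [doc for doc in order_a if doc in pos_b]
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two results judged by both judges")
    agree = sum(
        (pos_a[x] < pos_a[y]) == (pos_b[x] < pos_b[y]) for x, y in pairs
    )
    return agree / len(pairs)


# Two judges disagree only on the relative order of "b" and "c".
score = pairwise_agreement(["a", "b", "c"], ["a", "c", "b"])
print(round(score, 3))  # 0.667
```

Because the judges make relative decisions, a pairwise measure of this kind compares orderings directly and does not require absolute relevance scores.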
8. Analysis of User Behavior
- Which links do users view and click?
- Do users scan links from top to bottom?
- Which links do users evaluate before clicking?
9. Which Links Do Users View and Click?
Users click substantially more often on the first link than on the second
Scrolling
10. Do Users Scan Links from Top to Bottom?
On average, users tend to read from top to bottom
There is a big gap before viewing the third-ranked result
Users first scan the viewable results quite thoroughly before scrolling
11. Which Links Do Users Evaluate before Clicking?
Users view substantially more abstracts above the click than below it
12. Analysis of Implicit Feedback
- How does the relevance of a document to the query influence the clicking decision?
- What do clicks tell us about the relevance of a document?
13. Does Relevance Influence User Decisions?
- Using the reversed condition
- Lower quality of retrieval
- Users react to the relevance of the presented links
- Users view lower-ranked links more frequently
- Scan significantly more abstracts
- Users clicked less on the first-ranked link
- Users clicked more often on low-ranked links
14. Are Clicks Absolute Relevance Judgments?
- Trust bias
- The link ranked first receives many more clicks
- Quality bias
- Comparing clicking behavior in the normal condition vs. the reversed condition
- With lower-quality results, users click on abstracts that are on average less relevant
15. Are Clicks Relative Relevance Judgments?
- Consider not-clicked links as well as clicks as feedback signals
- Example: ranking l1 l2 l3 l4 l5 l6 l7, with clicks on l1, l3, and l5 (as the constraints listed for each strategy imply)
- Strategy 1: Click > Skip Above
- rel(l3) > rel(l2), rel(l5) > rel(l2), rel(l5) > rel(l4)
- Phase I data supports this strategy but Phase II does not
- Strategy 2: Last Click > Skip Above
- Earlier clicks might be less informed than later clicks
- rel(l5) > rel(l2), rel(l5) > rel(l4)
- Still not supported by Phase II data
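Strategies 1 and 2 can be sketched as small functions that turn a set of clicked ranks into preference pairs. This is an illustrative sketch, not the authors' code: the function names and the `(preferred, less_preferred)` pair representation are assumptions, and the example assumes clicks on ranks 1, 3, and 5, which is what the constraints listed on the slide imply.

```python
from typing import List, Set, Tuple

# A pair (i, j) reads as rel(l_i) > rel(l_j); ranks are 1-based.
Pair = Tuple[int, int]


def click_gt_skip_above(clicked: Set[int]) -> List[Pair]:
    """Strategy 1: a clicked link beats every skipped link ranked above it."""
    pairs: List[Pair] = []
    for c in sorted(clicked):
        for above in range(1, c):
            if above not in clicked:
                pairs.append((c, above))
    return pairs


def last_click_gt_skip_above(click_order: List[int]) -> List[Pair]:
    """Strategy 2: only the link clicked last in time beats the skipped
    links ranked above it."""
    if not click_order:
        return []
    last = click_order[-1]
    clicked = set(click_order)
    return [(last, above) for above in range(1, last) if above not in clicked]


print(click_gt_skip_above({1, 3, 5}))       # [(3, 2), (5, 2), (5, 4)]
print(last_click_gt_skip_above([3, 1, 5]))  # [(5, 2), (5, 4)]
```

Strategy 2 needs the temporal order of clicks, so it takes a list rather than a set; Strategy 1 only needs to know which ranks were clicked.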
16. Strategies
- Strategy 3: Click > Earlier Click
- Clicks later in time are on more relevant abstracts
- Assuming the order of clicks is 3, 1, 5
- rel(l1) > rel(l3), rel(l5) > rel(l3), rel(l5) > rel(l1)
- Not supported by data
- Strategy 4: Last Click > Skip Previous
- Constraint only between a clicked link and a not-clicked link immediately above
- Result is similar to Strategy 1
- Strategy 5: Click > No-Click Next
- Constraint between a clicked link and an immediately following not-clicked link
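Strategies 3 to 5 can be sketched in the same way as functions from click data to preference pairs. The function names and the `n_results` parameter are hypothetical. Strategy 4 is implemented as the slide describes it, for every clicked link with a not-clicked link immediately above; restricting it to the last click, as the strategy name might suggest, would be a different reading.

```python
from typing import List, Set, Tuple

# A pair (i, j) reads as rel(l_i) > rel(l_j); ranks are 1-based.
Pair = Tuple[int, int]


def click_gt_earlier_click(click_order: List[int]) -> List[Pair]:
    """Strategy 3: a link clicked later in time beats every link clicked earlier."""
    pairs: List[Pair] = []
    for t, later in enumerate(click_order):
        for earlier in click_order[:t]:
            pairs.append((later, earlier))
    return pairs


def click_gt_skip_previous(clicked: Set[int]) -> List[Pair]:
    """Strategy 4 (as described on the slide): a clicked link beats the
    not-clicked link immediately above it."""
    return [(c, c - 1) for c in sorted(clicked)
            if c >= 2 and (c - 1) not in clicked]


def click_gt_no_click_next(clicked: Set[int], n_results: int) -> List[Pair]:
    """Strategy 5: a clicked link beats the not-clicked link immediately below it."""
    return [(c, c + 1) for c in sorted(clicked)
            if c + 1 <= n_results and (c + 1) not in clicked]


# Clicks on ranks 1, 3, 5 of a 7-result list, clicked in the order 3, 1, 5.
print(click_gt_earlier_click([3, 1, 5]))      # [(1, 3), (5, 3), (5, 1)]
print(click_gt_skip_previous({1, 3, 5}))      # [(3, 2), (5, 4)]
print(click_gt_no_click_next({1, 3, 5}, 7))   # [(1, 2), (3, 4), (5, 6)]
```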