Title: Learning User Interaction Models for Predicting Web Search Result Preferences
1. Learning User Interaction Models for Predicting Web Search Result Preferences
- Eugene Agichtein
- Eric Brill
- Susan Dumais
- Robert Ragno
Microsoft Research
2. User Interactions
- Goal: Harness rich user interactions with search results to improve quality of search
- Millions of users submit queries daily and interact with the search results
- Clicks, query refinement, dwell time
- User interactions with search engines are plentiful, but require careful interpretation
- We will predict user preferences for results
3. Related Work
- Linking implicit interactions and explicit judgments
- Fox et al., TOIS 2005
- Predict explicit satisfaction rating
- Joachims et al., SIGIR 2005
- Predict preference (gaze studies, interpretation strategies)
- Broader overview of analyzing implicit interactions: Kelly & Teevan, SIGIR Forum 2003
4. Outline
- Distributional model of user interactions
- User Behavior = Relevance + Noise
- Rich set of user interaction features
- Learning framework to predict user preferences
- Large-scale evaluation
5. Interpreting User Interactions
- Clickthrough and subsequent browsing behavior of individual users are influenced by many factors
- Relevance of a result to a query
- Visual appearance and layout
- Result presentation order
- Context, history, etc.
- General idea:
- Aggregate interactions across all users and queries
- Compute expected behavior for any query/page
- Recover relevance signal for a given query
6. Case Study: Clickthrough
- Clickthrough frequency for all queries in sample
Clickthrough(query q, document d, result position p) = expected(p) + relevance(q, d)
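The decomposition above treats observed clickthrough as a position-dependent background term plus a query-specific relevance term. A minimal sketch of estimating the background term expected(p) follows. The log format (tuples of query, position, clicked) and the function name are assumptions made for illustration; the talk does not specify how the frequencies were computed.

```python
from collections import defaultdict

def expected_click_frequency(impressions):
    """Estimate expected(p): the click rate at each result position,
    aggregated over all queries in the sample.

    `impressions` is a hypothetical log format (an assumption of this
    sketch): an iterable of (query, position, clicked) tuples.
    """
    shown = defaultdict(int)    # number of impressions per position
    clicked = defaultdict(int)  # number of clicks per position
    for _query, position, was_clicked in impressions:
        shown[position] += 1
        if was_clicked:
            clicked[position] += 1
    return {p: clicked[p] / shown[p] for p in shown}

# Toy log: position 1 attracts more clicks regardless of the query.
log = [
    ("q1", 1, True), ("q1", 2, False),
    ("q2", 1, True), ("q2", 2, True),
    ("q3", 1, False), ("q3", 2, False),
    ("q4", 1, True), ("q4", 2, False),
]
expected = expected_click_frequency(log)
print(expected)
```

Because the estimate pools every query, it captures the position bias alone; any query-specific signal is left in the residual.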
7. Clickthrough for Queries with Known Position of Top Relevant Result
Relative clickthrough for queries where the top relevant result is known to be at position 1
8. Clickthrough for Queries with Known Position of Top Relevant Result
Higher clickthrough at top non-relevant than at top relevant document
Relative clickthrough for queries with known relevant results in positions 1 and 3, respectively
9. Deviation from Expected
- Relevance component: deviation from expected
Relevance(q, d) = observed - expected(p)
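The deviation can be illustrated with a small sketch. The data structures and all numbers below are made up for illustration and are not results from the talk; they only show how a result with lower raw clickthrough can still have the larger relevance component once the position background is subtracted.

```python
def click_deviation(observed, expected_by_position):
    """Relevance(q, d) estimate: observed click frequency minus expected(p).

    `observed` maps (query, doc) -> (position, click_frequency).
    `expected_by_position` maps position -> background click frequency.
    Both structures are assumptions of this sketch.
    """
    return {
        key: freq - expected_by_position[pos]
        for key, (pos, freq) in observed.items()
    }

# Made-up numbers, for illustration only.
expected_by_position = {1: 0.5, 2: 0.25, 3: 0.125}
observed = {
    ("q", "doc_a"): (1, 0.375),  # shown first, clicked less than expected
    ("q", "doc_b"): (3, 0.25),   # shown third, clicked more than expected
}
deviation = click_deviation(observed, expected_by_position)
print(deviation)
```

Here doc_a receives more raw clicks than doc_b, yet its deviation is negative while doc_b's is positive.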
10. Beyond Clickthrough: Rich User Interaction Space
- Observed and Distributional features
- Observed features: aggregated values over all user interactions for each query and result pair
- Distributional features: deviations from the expected behavior for the query
- Represent user interactions as vectors in Behavior Space
- Presentation: what a user sees before click
- Clickthrough: frequency and timing of clicks
- Browsing: what users do after the click
11. Some User Interaction Features
Presentation
- ResultPosition: Position of the URL in current ranking
- QueryTitleOverlap: Fraction of query terms in result title
Clickthrough
- DeliberationTime: Seconds between query and first click
- ClickFrequency: Fraction of all clicks landing on page
- ClickDeviation: Deviation from expected click frequency
Browsing
- DwellTime: Result page dwell time
- DwellTimeDeviation: Deviation from expected dwell time for query
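A minimal sketch of how a few of these features might be computed for one query-result pair. The tokenization (lowercased whitespace split) and the choice of "expected dwell time" as the mean over all results for the query are assumptions of this sketch, not definitions given in the talk.

```python
from statistics import mean

def query_title_overlap(query, title):
    """Fraction of query terms that appear in the result title."""
    query_terms = query.lower().split()
    title_terms = set(title.lower().split())
    if not query_terms:
        return 0.0
    return sum(t in title_terms for t in query_terms) / len(query_terms)

def dwell_time_deviation(result_dwell_times, query_dwell_times):
    """Mean dwell time on this result minus the mean dwell time over
    all results for the query (assumed definition of 'expected')."""
    return mean(result_dwell_times) - mean(query_dwell_times)

# Hypothetical observations, in seconds, for one query-result pair.
result_dwells = [30, 50]
all_dwells_for_query = [10, 20, 30, 50, 40]

features = {
    "ResultPosition": 2,
    "QueryTitleOverlap": query_title_overlap("web search ranking", "Ranking Web Pages"),
    "DwellTime": mean(result_dwells),
    "DwellTimeDeviation": dwell_time_deviation(result_dwells, all_dwells_for_query),
}
print(features)
```

The resulting dictionary is one point in the behavior space: an observed feature (DwellTime) sits next to its distributional counterpart (DwellTimeDeviation).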
12. Outline
- Distributional model of user interactions
- Rich set of user interaction features
- Models for predicting user preferences
- Experimental results
13. Predicting Result Preferences
- Task: predict pairwise preferences
- A user will prefer Result A > Result B
- Models for preference prediction
- Current search engine ranking
- Clickthrough
- Full user behavior model
14. Clickthrough Model
- SAN: Skip Above and Skip Next
- Adapted from Joachims et al., SIGIR 2005
- Motivated by gaze tracking
- Example
- Click on results 2, 4
- Skip Above: 4 > (1, 3), 2 > 1
- Skip Next: 4 > 5, 2 > 3
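The example can be reproduced with a short sketch of the two strategies as they are described on this slide: a clicked result is preferred over every unclicked result ranked above it (Skip Above) and over the immediately following result if that one was not clicked (Skip Next). The function name and the pair representation are illustrative choices, not taken from the talk.

```python
def san_preferences(clicked, num_results):
    """Return a set of (preferred, less_preferred) position pairs
    derived with the Skip Above and Skip Next strategies."""
    clicked = set(clicked)
    prefs = set()
    for c in sorted(clicked):
        # Skip Above: a clicked result beats every unclicked result above it.
        for above in range(1, c):
            if above not in clicked:
                prefs.add((c, above))
        # Skip Next: a clicked result beats the next result if it was not clicked.
        nxt = c + 1
        if nxt <= num_results and nxt not in clicked:
            prefs.add((c, nxt))
    return prefs

prefs = san_preferences(clicked=[2, 4], num_results=8)
print(sorted(prefs))
```

For clicks on results 2 and 4 this yields 2 > 1, 2 > 3, 4 > 1, 4 > 3 and 4 > 5, matching the example.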
[Figure: ranked result list, positions 1 to 8]
15. Distributional Model
- CD: distributional model, extends SAN
- Clickthrough considered iff frequency is more than ε above expected
- Click on result 2 likely by chance
- 4 > (1, 2, 3, 5), but not 2 > (1, 3)
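A minimal sketch of this filter, assuming per-position click frequencies for one query and a background table of expected frequencies. The threshold value and all frequencies are made-up numbers, and the function name is hypothetical; only the filtering rule and the Skip Above / Skip Next strategies come from the slides.

```python
def cd_preferences(click_freq, expected, epsilon, num_results):
    """Skip Above / Skip Next preferences, using only clicks whose
    frequency exceeds the expected frequency by more than epsilon.

    click_freq: position -> observed click frequency for this query.
    expected:   position -> expected (background) click frequency.
    """
    significant = {
        p for p, freq in click_freq.items()
        if freq - expected.get(p, 0.0) > epsilon
    }
    prefs = set()
    for c in sorted(significant):
        # Skip Above, treating filtered-out clicks as unclicked.
        for above in range(1, c):
            if above not in significant:
                prefs.add((c, above))
        # Skip Next.
        nxt = c + 1
        if nxt <= num_results and nxt not in significant:
            prefs.add((c, nxt))
    return prefs

# Made-up frequencies: the click rate on result 2 is at chance level.
expected = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625, 5: 0.03125}
click_freq = {2: 0.25, 4: 0.5}
cd_prefs = cd_preferences(click_freq, expected, epsilon=0.1, num_results=8)
print(sorted(cd_prefs))
```

With the click on result 2 discarded, result 4 is preferred over 1, 2, 3 and 5, and no preference is derived from result 2.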
[Figure: ranked result list, positions 1 to 8]
16. User Behavior Model
- Full set of interaction features
- Presentation, clickthrough, browsing
- Train the model with explicit judgments
- Input: behavior feature vectors for each query-page pair in rated results
- Use RankNet (Burges et al., ICML 2005) to discover model weights
- Output: a neural net that can assign a relevance score to a behavior feature vector
17. RankNet for User Behavior
- RankNet: general, scalable, robust Neural Net training algorithms and implementation
- Optimized for ranking: predicting an ordering of items, not scores for each
- Trains on pairs (where first point is to be ranked higher or equal to second)
- Extremely efficient
- Uses cross entropy cost (probabilistic model)
- Uses gradient descent to set weights
- Restarts to escape local minima
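The pairwise training idea can be sketched in a few lines. This is not the RankNet implementation used in the talk: it swaps the neural net for a linear scorer to stay short, and the feature vectors, learning rate, and epoch count are made up. It keeps the three ingredients named above: training on pairs, a cross entropy cost on the score difference, and gradient descent with random restarts. (For a linear scorer the cost is convex, so the restarts are included only to mirror the bullet.)

```python
import math
import random

def score(w, x):
    """Linear relevance score for a behavior feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def pair_cost(w, better, worse):
    """Cross entropy cost for a pair whose target probability is 1:
    -log(sigmoid(score(better) - score(worse)))."""
    diff = score(w, better) - score(w, worse)
    return math.log1p(math.exp(-diff))

def train(pairs, dim, lr=0.1, epochs=200, restarts=3, seed=0):
    rng = random.Random(seed)
    best_w, best_cost = None, float("inf")
    for _ in range(restarts):
        w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        for _ in range(epochs):
            for better, worse in pairs:
                diff = score(w, better) - score(w, worse)
                # Derivative of the cost with respect to diff.
                g = -1.0 / (1.0 + math.exp(diff))
                for k in range(dim):
                    w[k] -= lr * g * (better[k] - worse[k])
        cost = sum(pair_cost(w, b, c) for b, c in pairs)
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w

# Hypothetical feature vectors: [ClickDeviation, DwellTimeDeviation].
# In each pair the first vector belongs to the higher-rated result.
pairs = [
    ([0.3, 0.2], [-0.1, 0.0]),
    ([0.1, 0.4], [0.0, -0.2]),
    ([0.2, 0.1], [-0.3, -0.1]),
]
weights = train(pairs, dim=2)
print(weights)
```

After training, the learned scorer assigns a higher score to the first element of every training pair.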
18. Outline
- Distributional model of user interactions
- Rich set of user interaction features
- Models for predicting user preferences
- Experimental evaluation
19. Evaluation Metrics
- Task: predict user preferences
- Pairwise agreement
- For comparison with previous work
- Useful for ranking and other applications
- Precision for a query
- Fraction of pairs predicted that agree with preferences derived from human ratings
- Recall for a query
- Fraction of human-rated preferences predicted correctly
- Average Precision and Recall across all queries
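These definitions can be sketched directly over sets of preference pairs. One assumption of this sketch: a predicted pair that does not appear among the human-derived preferences (for example, because the two results were rated equally) counts against precision. The talk does not say how such pairs were handled.

```python
def pairwise_precision_recall(predicted, human):
    """Precision and recall for one query.

    predicted, human: sets of (preferred, less_preferred) pairs.
    """
    agree = predicted & human
    precision = len(agree) / len(predicted) if predicted else 0.0
    recall = len(agree) / len(human) if human else 0.0
    return precision, recall

def average_precision_recall(queries):
    """Average per-query precision and recall across all queries.

    queries: list of (predicted, human) tuples of sets, one per query.
    """
    scores = [pairwise_precision_recall(p, h) for p, h in queries]
    n = len(scores)
    return (sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n)

# Toy data: two queries with hand-made preference sets.
queries = [
    ({("a", "b"), ("c", "b")}, {("a", "b"), ("a", "c"), ("b", "c")}),
    ({("x", "y")}, {("x", "y")}),
]
avg_precision, avg_recall = average_precision_recall(queries)
print(avg_precision, avg_recall)
```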
20. Datasets
- Explicit judgments
- 3,500 queries, top 10 results, relevance ratings converted to pairwise preferences for each query
- User behavior data
- Opt-in client-side instrumentation
- Anonymized UserID, time, visited page
- Detect queries submitted to MSN Search engine
- Subsequent visited pages
- 120,000 instances of these 3,500 queries submitted at least 2 times over 21 days
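The conversion of relevance ratings to pairwise preferences can be sketched as follows. The numeric rating scale and the decision to emit no preference for equally rated results are assumptions of this sketch.

```python
from itertools import combinations

def ratings_to_preferences(ratings):
    """Convert per-result ratings for one query into preference pairs.

    ratings: dict mapping result URL -> numeric relevance rating
             (higher is better; the scale is an assumption).
    Returns a set of (preferred, less_preferred) URL pairs.
    """
    prefs = set()
    for (url_1, r1), (url_2, r2) in combinations(ratings.items(), 2):
        if r1 > r2:
            prefs.add((url_1, url_2))
        elif r2 > r1:
            prefs.add((url_2, url_1))
        # Equal ratings yield no preference.
    return prefs

rated = {"a": 3, "b": 1, "c": 3}
human_prefs = ratings_to_preferences(rated)
print(sorted(human_prefs))
```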
21. Methods Compared
- Preferences inferred by
- Current search engine ranking: Baseline
- Result i > Result j iff i is ranked above j
- Clickthrough model: SAN
- Clickthrough distributional model: CD
- Full user behavior model: UserBehavior
22. Results: Predicting User Preferences
- Baseline < SAN < CD << UserBehavior
- Rich user behavior features result in dramatic improvement
23. Contribution of Feature Types
- Presentation features not helpful
- Browsing features: higher precision, lower recall
- Clickthrough features > CD due to learning
24. Amount of Interaction Data
- Prediction accuracy for varying amount of user interactions per query
- Slight increase in Recall, substantial increase in Precision
25. Learning Curve
- Minimum precision of 0.7
- Recall increases substantially with more days of user interactions
26. Experiments Summary
- Clickthrough distributional model: more accurate than previously published work
- Rich user behavior features: dramatic accuracy improvement
- Accuracy increases for frequent queries and longer observation period
27. Some Applications
- Web search ranking (next talk)
- Can use preference predictions to re-rank results
- Can integrate features into ranking algorithms
- Identifying and answering navigational queries
- Can tune model to focus on top 1 result
- Supports classification or ranking methods
- Details in Agichtein & Zheng, KDD 2006
- Automatic evaluation: augment explicit relevance judgments
28. Conclusions
- General framework for training rich user interaction models
- Robust techniques for inferring user relevance preferences
- High-accuracy preference prediction in a large scale evaluation
29. Thank you
Text Mining, Search, and Navigation group: http://research.microsoft.com/tmsn/
Adaptive Systems and Interaction group: http://research.microsoft.com/adapt/
Microsoft Research
30. Presentation Features
- Query terms in Title, Summary, URL
- Position of result
- Length of URL
- Depth of URL
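A minimal sketch of computing these presentation features for one result. The feature names in the returned dictionary, the definition of depth as the number of non-empty path segments, and the substring test for query terms in the URL are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

def presentation_features(query, url, position):
    """Compute simple presentation features for one result.

    Feature names and definitions here are hypothetical.
    """
    path_segments = [s for s in urlsplit(url).path.split("/") if s]
    query_terms = query.lower().split()
    url_lower = url.lower()
    terms_in_url = sum(term in url_lower for term in query_terms)
    return {
        "ResultPosition": position,
        "UrlLength": len(url),
        "UrlDepth": len(path_segments),
        "QueryUrlOverlap": terms_in_url / len(query_terms) if query_terms else 0.0,
    }

feats = presentation_features(
    query="search help",
    url="http://www.example.com/search/help.html",
    position=3,
)
print(feats)
```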
31. Clickthrough Features
- Fraction of clicks on URL
- Deviation from expected given result position
- Time to click
- Time to first click in session
- Deviation from average time for query
32. Browsing Features
- Time on URL
- Cumulative time on URL (CuriousBrowser)
- Deviation from average time on URL
- Averaged over the user
- Averaged over all results for the query
- Number of subsequent non-result URLs
33. An Intelligent Baseline