Query Chains: Learning to Rank from Implicit Feedback


1
Query Chains: Learning to Rank from Implicit
Feedback
  • Paper Authors: Filip Radlinski and
    Thorsten Joachims
  • Presented By: Steven Carr

2
The Problem
  • The results returned from web searches are
    often cluttered with results the user
    considers irrelevant
  • Search engines don't learn from your document
    selections or from revisions to your query

3
Page Ranking
  • Non-learning methods
    • Link-based (e.g., Google PageRank)
  • Learning methods
    • Explicit user feedback
      • Ask the user how relevant they found the
        result
      • Very accurate data, but very time-consuming
    • Implicit user feedback
      • Determine relevance by looking at search
        engine logs
      • Virtually unlimited data at low cost, but
        requires interpretation

4
The Solution
  • Automatically detect query chains
  • Use query chains to infer the relevance of
    results both within a single query and across
    all queries in the chain
  • Use a ranking Support Vector Machine (SVM) to
    learn a retrieval function from these inferred
    preferences
  • The Osmot search engine is based on this model

5
Query Chains
  • People often reword their queries to get more
    useful results, for example to
    • Fix a spelling mistake
    • Increase or decrease specificity
    • Issue a new but related query
  • A query chain is a sequence of such
    reformulated queries (hypothetical example
    below)
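
A minimal illustration of what a query chain might look like (the queries here are hypothetical, not taken from the paper):

  # A hypothetical query chain: the user fixes a typo,
  # then makes the query more specific.
  chain = [
      "oxfrod dictionary",          # spelling mistake
      "oxford dictionary",          # corrected spelling
      "oxford english dictionary",  # increased specificity
  ]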

6
Support Vector Machines
  • Learning method used for classification
  • Separates two classes of data points by
    finding a hyperplane that maximizes the margin,
    i.e. the distance between the hyperplane and
    the nearest points of each class
  • Uses the hyperplane to assign new data points
    to one of the two classes (standard formulation
    below)
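
For reference, the standard soft-margin SVM optimization (a textbook formulation, not shown on the slide) is:

  \min_{w,\,b,\,\xi}\; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
  \quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\;\; \xi_i \ge 0

where the slack variables \xi_i allow some training points to fall inside the margin.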

7
Identifying Query Chains
  • Manually labeled query chains from the Cornell
    University library search engine over a period
    of five weeks
  • Used this data to train SVMs with various
    parameters, giving an accuracy of 94.3% and a
    precision of 96.5%
  • A non-learning strategy, assuming all queries
    from the same IP within a 30-minute period
    belong to the same chain, gave an accuracy and
    precision of 91.6% (sketched below)
  • The non-learning strategy was sufficiently
    accurate and less expensive, so they used it
    instead
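
A minimal sketch of that non-learning heuristic, assuming log entries of (ip, timestamp, query) sorted by time; the field names and the gap-based reading of "30-minute period" are assumptions:

  from datetime import timedelta

  def segment_into_chains(log_entries, gap=timedelta(minutes=30)):
      """Group queries into chains: same IP, within 30 minutes."""
      last_seen = {}  # ip -> (last timestamp, current chain)
      chains = []
      for ip, ts, query in log_entries:
          prev = last_seen.get(ip)
          if prev is not None and ts - prev[0] <= gap:
              chain = prev[1]      # continue this IP's current chain
          else:
              chain = []           # start a new chain for this IP
              chains.append(chain)
          chain.append(query)
          last_seen[ip] = (ts, chain)
      return chains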

8
Inferring Relevance
  • Developed six strategies for generating
    feedback from query chains (the first is
    sketched in code below)
  • Click >q Skip Above: a clicked document is
    more relevant than any document ranked above
    it that was skipped
  • Click First >q No-Click Second: given the
    first two results, if only the first was
    clicked, it is more relevant than the second
  • Strategies 3 and 4 are the same as the first
    two, but with respect to the previous query in
    the chain
  • Click >q Skip Earlier Query: a clicked
    document is more relevant than any document
    skipped in an earlier query
  • Click >q Top Two Earlier Query: if nothing was
    clicked in the earlier query, the clicked
    document is more relevant than the top two
    results from that earlier query
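
A minimal sketch of the first strategy, Click >q Skip Above, assuming the displayed results and the clicked positions have already been parsed from the log (function and variable names are illustrative, not from the paper's code):

  def click_skip_above(results, clicked_positions):
      """Return preference pairs (more_relevant, less_relevant).

      results: documents in the order they were displayed.
      clicked_positions: indices of the results the user clicked.
      """
      clicked = set(clicked_positions)
      prefs = []
      for i in clicked:
          for j in range(i):            # every document ranked above i
              if j not in clicked:      # ...that the user skipped
                  prefs.append((results[i], results[j]))
      return prefs

  # Example: results [a, b, c, d] with a click at position 2 ("c")
  # yields the preferences (c > a) and (c > b).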

9
Example
10
Learning Ranking Functions
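
The slide presents this as a figure; the underlying idea is a ranking SVM, whose standard formulation (written here from the general literature, so the notation may differ from the paper's) turns each inferred preference into a constraint:

  \min_{w,\,\xi}\; \tfrac{1}{2}\|w\|^2 + C \sum_{i,j} \xi_{ij}
  \quad \text{s.t.}\quad w \cdot \Phi(q, d_i) \ge w \cdot \Phi(q, d_j) + 1 - \xi_{ij},\;\; \xi_{ij} \ge 0

where \Phi(q, d) is a feature vector describing how document d matches query q, there is one constraint per inferred preference "d_i more relevant than d_j for q", and the learned w scores documents as w \cdot \Phi(q, d).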
11
Experiment
  • The Osmot search engine was created as a
    wrapper implementing logging, analysis, and
    ranking
  • Users were presented with a combination of
    results from two different ranking functions
    (see the sketch below)
  • Which ranking was better was evaluated based
    on which documents users clicked
  • The evaluation ran for two months and
    collected around 2,400 queries
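
One simple way to combine two rankings for such a click-based comparison is an alternating merge; this is a sketch for illustration only, as the paper's exact mixing scheme is not given on the slide:

  def interleave(ranking_a, ranking_b):
      """Alternately merge two rankings, skipping duplicates."""
      combined, seen = [], set()
      # zip stops at the shorter list; fine for equal-length rankings
      for a, b in zip(ranking_a, ranking_b):
          for doc in (a, b):
              if doc not in seen:
                  seen.add(doc)
                  combined.append(doc)
      return combined

Counting clicks on the documents each ranking contributed then indicates which ranking function users prefer.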

12
Experiment Results
  • Users preferred results from the query chain
    ranking function 53% of the time
  • Model trained with query chains outperformed
    model trained without query chains with 99%
    confidence

13
Conclusion
  • Developed an algorithm to determine the
    relevance of a document from log entries
  • Developed another algorithm to use the
    preference judgments to learn an improved
    ranking function
  • The algorithm can learn to include documents
    that weren't in the original search results

14
Critique
  • The learning method trains offline from log
    files rather than continuously updating itself
  • Refers to other papers rather than explaining
    concepts needed to understand the paper
  • Didn't compare the effectiveness of their
    learning algorithm against other learning
    algorithms

15
Questions?