Predicting Users' Site Preference in Web Search

Bin Tan

Motivation
  • We trust some websites more than others.
  • In web search, we'd like to see results from
    these preferred websites.
  • The project is to learn a user's site preferences
    from past web search interactions and to rerank
    the search results to reflect these preferences.
  • Examples:
    • I go to wunderground.com for weather information,
      but in Google weather.com is top-ranked.
    • For me, DAIS means dais.cs.uiuc.edu, and I don't
      care about sites with other meanings of DAIS.
    • When looking for papers, I would prefer results
      from portal.acm.org.
  • Preferences are mostly topic-dependent!

The k-NN Approach
  • Supervised learning problem: given a query q and
    a search result r, predict whether r's site s is
    preferred by the user.
  • k-nearest-neighbor approach: find the k past queries
    most similar to q and the preferred sites for
    these queries. Then determine whether s is preferred
    using, e.g., a weighted majority vote.

More Details
  • For each web search, we need a log record
    containing:
    • Time of search
    • Keyword terms and their weights
    • Preferred sites and their confidence
  • Each query is characterized by a set of terms
    (query keywords plus frequent terms in the search
    results).
  • I only consider terms appearing in the clicked
    results.
  • I use TF-IDF weighting.
  • Similarity between two queries is computed as the
    dot product of the term vectors.
  • It's hard to determine whether a site is
    preferred by a user!
  • clicked result ≠ relevant result ≠ preferred
    site
  • I associate each clicked result with a value in
    [0, 1] representing confidence of site
    preference.
  • The confidence value is determined heuristically
    (time spent on a webpage, results clicked).
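
The k-NN approach and the log record described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the names `SearchRecord`, `similarity`, and `predict_preference`, the default `k`, and the exact form of the weighted vote (similarity times confidence, normalized by total similarity) are assumptions. Term weights are assumed to be TF-IDF weights computed elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchRecord:
    """One logged search (hypothetical layout following the bullets above)."""
    time: float               # time of search
    terms: Dict[str, float]   # keyword term -> weight (assumed TF-IDF)
    sites: Dict[str, float]   # preferred site -> confidence in [0, 1]


def similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Dot product of two sparse term vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b[t] for t, w in a.items() if t in b)


def predict_preference(query_terms: Dict[str, float], site: str,
                       log: List[SearchRecord], k: int = 3) -> float:
    """Similarity-weighted vote of the k most similar past queries.

    Returns a score in [0, 1]; 0.0 when no past query shares a term.
    """
    scored = [(similarity(query_terms, rec.terms), rec) for rec in log]
    neighbors = sorted((p for p in scored if p[0] > 0),
                       key=lambda p: p[0], reverse=True)[:k]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return 0.0
    votes = sum(sim * rec.sites.get(site, 0.0) for sim, rec in neighbors)
    return votes / total


# Example log built from the poster's own examples (weights are made up).
log = [
    SearchRecord(1.0, {"dais": 1.0, "uiuc": 0.5}, {"dais.cs.uiuc.edu": 1.0}),
    SearchRecord(2.0, {"weather": 1.0}, {"wunderground.com": 0.8}),
]
dais_score = predict_preference({"dais": 1.0}, "dais.cs.uiuc.edu", log)
```

A site could then be treated as preferred when its score exceeds some cutoff; the choice of cutoff is left open here.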

Implementation
  • Implemented on top of the UCAIR Toolbar, a Google
    Toolbar-like browser plug-in that enables search
    result personalization.
  • Only pull up one result to the top: the one whose
    site is most likely to be preferred by the user.

Probabilistic Approach
  • Maximize O(L=1 | w, q)
  • (Still exploring)

Efficiency Issues
  • Limit the number of search records, terms, and
    sites. Use FIFO or LRU to evict obsolete items.
  • Build an inverted index from terms to search
    records.
  • Also build an inverted index from sites to search
    records.
  • Given a query q with characterizing terms T and
    its top 50 results' sites S, use the indices to
    find all past searches whose terms intersect with
    T and whose preferred sites intersect with S.
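
The indexing scheme listed under Efficiency Issues could look roughly like the sketch below. The class name `SearchLog`, the `max_records` cap, and the integer record ids are assumptions made for illustration. The sketch uses LRU eviction (a lookup hit counts as a use) and caps only the number of records, letting terms and sites shrink as records are evicted. It also reads "terms intersect with T and preferred sites intersect with S" as requiring both overlaps.

```python
from collections import OrderedDict, defaultdict


class SearchLog:
    """Bounded store of search records with two inverted indices (a sketch)."""

    def __init__(self, max_records=1000):
        self.max_records = max_records
        self.records = OrderedDict()      # record id -> (terms, sites), LRU order
        self.by_term = defaultdict(set)   # term -> ids of records containing it
        self.by_site = defaultdict(set)   # site -> ids of records preferring it
        self._next_id = 0

    def add(self, terms, sites):
        """Store a new record, index it, and evict old records if over the cap."""
        rid = self._next_id
        self._next_id += 1
        self.records[rid] = (set(terms), set(sites))
        for t in terms:
            self.by_term[t].add(rid)
        for s in sites:
            self.by_site[s].add(rid)
        while len(self.records) > self.max_records:
            self._evict_oldest()
        return rid

    def _evict_oldest(self):
        """Drop the least recently used record and clean up both indices."""
        rid, (terms, sites) = self.records.popitem(last=False)
        for index, keys in ((self.by_term, terms), (self.by_site, sites)):
            for key in keys:
                index[key].discard(rid)
                if not index[key]:
                    del index[key]

    def candidates(self, T, S):
        """Ids of past searches sharing a term with T and a site with S."""
        term_hits = set()
        for t in T:
            term_hits |= self.by_term.get(t, set())
        site_hits = set()
        for s in S:
            site_hits |= self.by_site.get(s, set())
        hits = term_hits & site_hits
        for rid in hits:
            self.records.move_to_end(rid)  # mark as recently used
        return hits
```

Only the records returned by `candidates` need to be scored against the new query, which avoids scanning the whole log on every search.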

After the user searched for "dais" and clicked on
the dais.cs.uiuc.edu result, we respond to the
queries "dais" and "database uiuc" by putting
dais.cs.uiuc.edu at the top.
(Figure: original top results)
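
The reranking behavior in this example, where a single result is pulled to the top, can be sketched as below. The function name `promote_preferred`, the `(url, site)` result pairs, the `score` callback, and the `threshold` below which the ranking is left untouched are all illustrative assumptions; the source does not say whether any threshold is used.

```python
def promote_preferred(results, score, threshold=0.5):
    """Move the result whose site scores highest to the top.

    results: list of (url, site) pairs in the original rank order.
    score: callable mapping a site to a preference score.
    All other results keep their original relative order.
    """
    if not results:
        return []
    # On ties, max() keeps the earliest (highest-ranked) result.
    best = max(range(len(results)), key=lambda i: score(results[i][1]))
    if score(results[best][1]) < threshold:
        return list(results)  # no site is preferred strongly enough
    return [results[best]] + results[:best] + results[best + 1:]


# Hypothetical result list; only dais.cs.uiuc.edu comes from the poster.
results = [
    ("http://a.example.com/dais", "a.example.com"),
    ("http://dais.cs.uiuc.edu/", "dais.cs.uiuc.edu"),
    ("http://b.example.com/dais", "b.example.com"),
]
prefs = {"dais.cs.uiuc.edu": 0.9}
reranked = promote_preferred(results, lambda site: prefs.get(site, 0.0))
```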