Learn from Web Search Logs to Organize Search Results

1
Learn from Web Search Logs to Organize Search
Results
  • SIGIR 2007
  • Xuanhui Wang, ChengXiang Zhai
  • UIUC

2
Outline
  • Motivation
  • Problem
  • Research Statement
  • Related Work
  • Solution
  • Experiment
  • Discussion
  • Conclusion

3
Motivation
  • Effective organization of search results is
    critical for improving the utility of any search
    engine.
  • Allow a user to navigate into relevant documents
    quickly.

4
Problem
  • Clustering search results
  • Problem
  • The clusters discovered do not necessarily
    correspond to the interesting aspects of a topic
    from the user's perspective.
  • e.g., users are often interested in finding
    either phone codes or zip codes when entering
    the query "area codes", but the results are
    clustered into local codes and international
    codes.
  • The cluster labels generated are not informative
    enough to let a user identify the right cluster.
  • e.g., for the ambiguous query "jaguar", a cluster
    may be labeled "panthera onca". Although this
    is an accurate label for a cluster with the
    animal sense of jaguar, it does not help a user
    who is unfamiliar with the phrase.

5
Research Statement
  • Purpose
  • Learning interesting aspects of a topic from
    Web search logs and organizing search results
    accordingly.
  • e.g., car (car rental, car pricing)
  • Generating more meaningful cluster labels using
    past query words entered by users.
  • e.g., jaguar (car, animal, software, and sports
    team)

6
Related Work
  • Clustering search results
  • Scatter/Gather algorithm
  • Grouper
  • Suffix Tree Clustering (STC) algorithm
  • Search logs
  • Frequently Asked Questions
  • Suggesting query substitutes
  • Personalized search
  • Learning retrieval ranking function

7
Search Engine Logs
  • Sample
  • Each session is represented by the titles,
    snippets, and URLs of the Web pages clicked for
    its query.
  • All sessions that contain exactly the same query
    are aggregated into a single pseudo-document.
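
The aggregation step above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout `(query, title, snippet, url)`, the sample rows, and the function name `build_pseudo_documents` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical log records: one (query, title, snippet, url) row per click.
clicks = [
    ("area codes", "Area Code Lookup", "Find phone area codes by city",
     "http://example.com/phone"),
    ("area codes", "ZIP Code Finder", "Look up zip codes by address",
     "http://example.com/zip"),
    ("jaguar", "Jaguar Cars", "Official site of the car maker",
     "http://example.com/cars"),
]


def build_pseudo_documents(click_records):
    """Merge the clicked titles, snippets, and URLs of identical queries."""
    parts_by_query = defaultdict(list)
    for query, title, snippet, url in click_records:
        parts_by_query[query].extend([title, snippet, url])
    # One text blob (pseudo-document) per distinct query string.
    return {query: " ".join(parts) for query, parts in parts_by_query.items()}


pseudo_docs = build_pseudo_documents(clicks)
```

Here both clicks for "area codes" end up in one pseudo-document, so the phone and zip senses of the query are visible side by side.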

8
Solution
  • Given an input query
  • Get its related information from search engine
    logs. All the information forms a working set.
  • Learn aspects from the information in the working
    set. Each aspect is labeled with a representative
    query.
  • Categorize and organize the search results of the
    input query according to the aspects learned
    above.

9
Finding Related Past Queries
  • History data set
  • N pseudo-documents
  • Use Okapi to calculate the similarity between
    query q and each pseudo-document Qi
  • Given a query q, we use Hq = {d1, ..., dn} to
    represent the top-ranked pseudo-documents from
    the history collection H.
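
A rough sketch of the retrieval step, assuming the common BM25 form of Okapi. The slides do not give the exact formula or parameters, so `k1`, `b`, `top_n`, the non-negative IDF variant, the whitespace tokenizer, and the toy history collection are all assumptions of this example.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_rank(query, pseudo_docs, k1=1.2, b=0.75, top_n=3):
    """Return ids of the top_n pseudo-documents under a BM25 score."""
    tokens = {doc_id: tokenize(text) for doc_id, text in pseudo_docs.items()}
    n_docs = len(tokens)
    avg_len = sum(len(t) for t in tokens.values()) / n_docs
    # Document frequency: number of pseudo-documents containing each term.
    df = Counter(term for t in tokens.values() for term in set(t))
    scores = {}
    for doc_id, terms in tokens.items():
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption, not from the slides).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy history collection H, keyed by past query.
history = {
    "area codes": "area code lookup phone area codes by city",
    "zip codes": "zip code finder look up zip codes",
    "jaguar": "jaguar cars official site",
}
h_q = bm25_rank("area codes", history)
```

With this toy data, `h_q` holds the "area codes" and "zip codes" pseudo-documents, and the unrelated "jaguar" entry is left out because it shares no query term.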

10
Learning Aspects by Clustering
  • Star clustering algorithm
  • Given Hq, it constructs a pair-wise similarity
    graph on this collection based on the vector
    space model.
  • A similarity graph Gs can be constructed using a
    similarity threshold parameter s.
  • The center provides a label for the cluster.
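
The clustering step can be illustrated with a small version of star clustering: build the thresholded similarity graph, then repeatedly pick the highest-degree unmarked vertex as a star center and attach its neighbors as satellites. This is a sketch under assumptions, not the authors' implementation: it uses raw term counts instead of weighted vectors, and the threshold value and sample pseudo-documents are invented for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def star_clustering(docs, threshold):
    """Return {center_id: set of member ids} for a thresholded graph."""
    vectors = {d: Counter(text.lower().split()) for d, text in docs.items()}
    ids = list(vectors)
    # Edge between two documents when their similarity reaches the threshold.
    neighbors = {d: set() for d in ids}
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            if cosine(vectors[first], vectors[second]) >= threshold:
                neighbors[first].add(second)
                neighbors[second].add(first)
    unmarked = set(ids)
    clusters = {}
    while unmarked:
        # Highest-degree unmarked vertex becomes the next star center;
        # sorting first makes ties deterministic.
        center = max(sorted(unmarked), key=lambda d: len(neighbors[d]))
        clusters[center] = {center} | neighbors[center]
        unmarked -= clusters[center]
    return clusters


# Toy working set, keyed by past query (used as the aspect label).
working_set = {
    "car rental": "cheap car rental deals",
    "rental cars": "car rental airport deals",
    "car pricing": "new car pricing guide",
    "car prices": "used car pricing guide",
}
clusters = star_clustering(working_set, threshold=0.5)
```

Each key of `clusters` is a star center, i.e. a past query that can serve as the aspect label; here the rental and pricing aspects of "car" come out as two separate stars.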

11
Categorizing Search Results
  • A simple centroid-based method
  • For each discovered aspect Ci, it builds a
    centroid prototype pi by averaging the vectors
    of the pseudo-documents in Ci.
  • For each search result sj, it computes the cosine
    similarity between sj and each pi, and then
    assigns sj to the aspect with the highest score.
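
A minimal sketch of centroid-based categorization, assuming raw term-count vectors and a prototype formed as the plain average of an aspect's vectors. The aspect labels, texts, and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def centroid(texts):
    """Average the term-count vectors of all texts in one aspect."""
    total = Counter()
    for text in texts:
        total.update(text.lower().split())
    return {term: count / len(texts) for term, count in total.items()}


def categorize(results, aspects):
    """Assign each search result to the aspect with the closest prototype."""
    prototypes = {label: centroid(texts) for label, texts in aspects.items()}
    assignment = {}
    for result in results:
        vec = Counter(result.lower().split())
        assignment[result] = max(
            prototypes, key=lambda label: cosine(vec, prototypes[label]))
    return assignment


# Toy aspects learned from logs, and two toy result snippets.
aspects = {
    "phone codes": ["phone area codes by city", "telephone area code lookup"],
    "zip codes": ["zip code finder", "postal zip codes by address"],
}
results = [
    "find the area code for any phone number",
    "zip code lookup by street address",
]
assignment = categorize(results, aspects)
```

The first snippet lands under "phone codes" and the second under "zip codes", matching the "area codes" example used elsewhere in the slides.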

12
Data Collection
  • MSN search log data set
  • 8,144,000 queries, 3,441,000 distinct queries,
    and 4,649,000 distinct URLs in the raw data.
    (2/3, 1/3)
  • Keeping only frequent, well-formatted English
    queries yields 169,057 unique queries.
  • Obtain 172 and 177 test cases in the first and
    second test sets, respectively.

13
Experiment
  • Experiment Design
  • Baseline: default ranked list from a search
    engine
  • Cluster-based: simply clustering search results
  • Log-based: their approach
  • Evaluations
  • P@5 (precision at 5)
  • MRR (mean reciprocal rank)
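
For reference, the two measures can be computed as below, using their standard definitions. The ranked lists and relevance sets are made-up inputs; how the slides' experiment derives relevance judgments is not shown here.

```python
def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant document over all queries."""
    total = 0.0
    for ranked, relevant in runs:
        for rank, doc in enumerate(ranked, start=1):
            if doc in relevant:
                total += 1.0 / rank
                break  # only the first relevant document counts
    return total / len(runs)


# Two hypothetical test queries: (ranked list, set of relevant documents).
runs = [
    (["d1", "d2", "d3", "d4", "d5"], {"d2", "d5"}),
    (["d9", "d8", "d7", "d6", "d5"], {"d9"}),
]
p5_first = precision_at_k(*runs[0])
mrr = mean_reciprocal_rank(runs)
```

For these inputs, P@5 of the first query is 2/5 and MRR is (1/2 + 1)/2.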

14
Experimental Results

15
Detailed Analysis
  • Diversity analysis
  • Difficulty analysis

16
Parameter Setting

17
An Illustrative Example
  • Query: "area codes" (phone codes, zip codes)

18
Conclusion
  • Use search engine logs to learn the interesting
    aspects, and then categorize the search results
    into the aspects learned.
  • Their method improves over the ranking baseline
    and generates more meaningful aspect labels.
  • Future work
  • Use personal search logs to improve the
    organization of search results for each
    individual user.