Learn from Web Search Logs to Organize Search Results

Provided by: nlgCsie

Transcript and Presenter's Notes

1
Learn from Web Search Logs to Organize Search
Results

2
Outline

3
Motivation

Effective organization of search results is
critical for improving the utility of any search
engine.
It allows a user to navigate to relevant documents
quickly.

4
Problem

Clustering search results
Problem
The clusters discovered do not necessarily
correspond to the interesting aspects of a topic
from the user's perspective.
e.g., users are often interested in finding
either phone codes or zip codes when entering
the query "area codes", while the results may be
clustered into local codes and international codes.
The cluster labels generated are not informative
enough to allow a user to identify the right cluster.
e.g., for the ambiguous query "jaguar", a cluster
may be labeled "panthera onca". Although this
is an accurate label for a cluster with the
animal sense of "jaguar", it would not be helpful
to a user who is not familiar with the phrase.

5
Research Statement

Purpose
Learning interesting aspects of a topic from
Web search logs and organizing search results
accordingly.
Car (car rental, car pricing)
Generating more meaningful cluster labels using
past query words entered by users.
Jaguar (car, animal, software, and sports team)

6
Related Work

7
Search Engine Logs

All the titles, snippets, and URLs of the clicked
Web pages of that query are used to represent the
session.
Aggregate all the sessions that contain exactly
the same query.
The aggregated sessions for a query form a
pseudo-document.
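
The aggregation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the session layout (a query string plus a list of clicks) and the keys "title", "snippet", and "url" are hypothetical names, not the format of any real log.

```python
from collections import defaultdict


def build_pseudo_documents(sessions):
    """Merge sessions that share exactly the same query into one text.

    `sessions` is an iterable of (query, clicks) pairs. Each click is a
    dict with the hypothetical keys "title", "snippet", and "url".
    """
    parts_by_query = defaultdict(list)
    for query, clicks in sessions:
        for click in clicks:
            # A session is represented by the text of its clicked pages.
            parts_by_query[query].extend(
                [click["title"], click["snippet"], click["url"]]
            )
    # One pseudo-document (a single string) per distinct query.
    return {query: " ".join(parts) for query, parts in parts_by_query.items()}
```

Queries are matched exactly here, as the slide describes; any normalization (case folding, spelling variants) would be an extra design choice.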

8
Solution

Given an input query
Get its related information from search engine
logs. All the information forms a working set.
Learn aspects from the information in the working
set. Each aspect is labeled with a representative
query.
Categorize and organize the search results of the
input query according to the aspects learned
above.

9
Finding Related Past Queries

History data set
N pseudo-documents
Use OKAPI to calculate the similarity between
query q and Qi
Given a query q, we use Hq = {d1, ..., dn} to
represent the top-ranked pseudo-documents from the
history collection H.
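
As a rough sketch of this retrieval step, the function below ranks pseudo-documents against a query with a common Okapi BM25 formula. The slides do not say which Okapi variant or parameters were used, so the smoothed IDF, the values k1 = 1.2 and b = 0.75, whitespace tokenization, and the name `bm25_rank` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_rank(query, pseudo_docs, top_n=10, k1=1.2, b=0.75):
    """Return up to top_n (past_query, score) pairs, best first.

    `pseudo_docs` maps each past query to its pseudo-document text.
    """
    tokenized = {q: text.lower().split() for q, text in pseudo_docs.items()}
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Document frequency: in how many pseudo-documents each term occurs.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for past_query, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Smoothed IDF (assumed variant) keeps every weight positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[past_query] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

The returned list plays the role of Hq: the top-ranked pseudo-documents for the query q.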

10
Learning Aspects by Clustering

Star clustering algorithm
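Given Hq, the algorithm constructs a pairwise
similarity graph on this collection based on the
vector space model.
A similarity graph Gs can be constructed using a
similarity threshold parameter s: only edges whose
similarity is at least s are kept.
Gs is covered by star-shaped subgraphs, each of
which forms a cluster. The center of a star
provides a label for the cluster.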
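
A minimal sketch of the offline star clustering idea follows, assuming raw term-frequency vectors, cosine similarity, and an arbitrary threshold of 0.3; the weighting scheme and threshold used in the original work are not given on the slide. Vertices are visited in order of decreasing degree, and each vertex not yet covered becomes a star center with its neighbors as satellites.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def star_clustering(pseudo_docs, threshold=0.3):
    """Return {center_query: [satellite_queries]} for the thresholded graph."""
    vectors = {q: Counter(text.lower().split()) for q, text in pseudo_docs.items()}
    names = list(vectors)
    neighbors = {q: set() for q in names}
    # Keep an edge only when the similarity reaches the threshold.
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if cosine(vectors[u], vectors[v]) >= threshold:
                neighbors[u].add(v)
                neighbors[v].add(u)
    clusters, marked = {}, set()
    # Highest-degree uncovered vertex becomes the next star center.
    for center in sorted(names, key=lambda q: len(neighbors[q]), reverse=True):
        if center in marked:
            continue
        clusters[center] = sorted(neighbors[center])
        marked.add(center)
        marked.update(neighbors[center])
    return clusters
```

Because each center is itself a past query, the dictionary keys can serve directly as aspect labels. A satellite adjacent to two centers appears in both clusters, so the clusters may overlap.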

11
Categorizing Search Results

A simple centroid-based method
For each discovered aspect Ci, it builds a
centroid prototype pi from the pseudo-documents
in Ci.
For each search result sj, it computes the cosine
similarity between sj and each pi, and then it
assigns sj to the aspect with the highest score.
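
The categorization step can be illustrated as follows. The slide's centroid formula is not preserved in this transcript, so the sketch assumes the usual definition (the mean of the member vectors) and raw term-frequency vectors; `categorize` and its argument layout are hypothetical.

```python
import math
from collections import Counter


def to_vector(text):
    """Raw term-frequency vector (an assumed weighting scheme)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of the member vectors; `vectors` must be non-empty."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: weight / len(vectors) for term, weight in total.items()}


def categorize(results, aspects):
    """Assign each result text to the aspect with the most similar centroid.

    `aspects` maps an aspect label to the pseudo-document texts in it.
    """
    prototypes = {
        label: centroid([to_vector(t) for t in texts])
        for label, texts in aspects.items()
    }
    assigned = {label: [] for label in aspects}
    for result in results:
        vec = to_vector(result)
        # A result with no overlap at all falls into the first aspect here;
        # a separate "other" bucket would be a reasonable alternative.
        best = max(prototypes, key=lambda label: cosine(vec, prototypes[label]))
        assigned[best].append(result)
    return assigned
```

In practice each search result would be represented by its title and snippet text before being passed in.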

12
Data Collection

MSN search log data set
8,144,000 queries, 3,441,000 distinct queries, and
4,649,000 distinct URLs in the raw data. (2/3,
1/3)
Keeping only the frequent, well-formatted English
queries yields 169,057 unique queries.
This gives 172 and 177 test cases in the first and
second test sets, respectively.

13
Experiment

14
Experimental Results

15
Detailed Analysis

16
Parameter Setting

17
An Illustrative Example

18
Conclusion

Use search engine logs to learn the interesting
aspects, and then categorize the search results
into the aspects learned.
Their method can improve the ranking baseline,
and generate more meaningful aspect labels.
Future work
Use personal search logs to improve the
organization of search results for each
individual user.
