Personalized Web Search using Clickthrough History

1
Personalized Web Search using Clickthrough
History
  • U. Rohini
  • 200407019
  • rohini@research.iiit.ac.in
  • Language Technologies Research Center (LTRC)
  • International Institute of Information Technology
    (IIIT)
  • Hyderabad, India

2
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Personalized Search using User Relevance Feedback: Statistical Language Modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Personalized Search using User Relevance Feedback: Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback: Simple Statistical Language Modeling based method
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

3
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

4
Introduction
  • Current web search engines
  • Provide users with documents relevant to their information need
  • Issues
  • Information overload
  • Must cater to hundreds of millions of users
  • Terabytes of data
  • Poor description of the information need
  • Short queries - difficult to understand
  • Word ambiguities
  • Users only see the top few results
  • Relevance
  • Subjective - depends on the user
  • One size fits all?

5
Motivation
  • Search is not a solved problem!
  • Poorly described information need
  • Java (Java island / Java programming language)
  • Jaguar (cat / car)
  • Lemur (animal / Lemur toolkit)
  • SBH (State Bank of Hyderabad / Syracuse Behavioral Healthcare)
  • Given prior information
  • I am into biology - best guess for Jaguar?
  • Past queries: information retrieval, language modeling - best guess for Lemur?

6
Background
  • Prior information: user feedback

7
Problem Description
  • Personalized Search
  • Customize search results according to each
    individual user
  • Personalized Search - Issues
  • What to use to personalize?
  • How to personalize?
  • When not to personalize?
  • How to know whether personalization helped?

8
Problem Statement
  • Problem
  • How to Personalize?
  • Our Direction
  • Use past search history
  • Long-term learning
  • Sub-problems
  • Broken down into 2 sub-problems
  • How to model and represent past search contexts
  • How to use them to improve search results

9
Solution Outline
  • 1. How to model and represent past search contexts
  • Past search history from the user over a period of time: query logs
  • User contexts: triples (user, query, relevant documents)
  • Apply an appropriate method, learn from the user contexts, and build a model: the user profile
  • User profile learning
  • 2. How to use it to improve search results
  • Get the initial search results
  • Take the top few documents, re-score them using the user profile, and sort again
  • Reranking

10
Contributions
  • I-Search: A suite of approaches for Personalized Web Search
  • Proposed Personalized search approaches
  • Baseline
  • Basic Retrieval methods
  • Automatic Evaluation
  • Analysis of Query Log
  • Creating Simulated Feedback

11
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

12
Review of Personalized Search
  • Personalized Search approaches: based on query logs, machine learning, language modeling, community methods, and others

13
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

14
I-Search: A suite of approaches for Personalized Search
  • Suite of Approaches
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel Model based method
  • Machine learning based approach
  • Ranking SVM based method
  • Personalization without relevance feedback
  • Simple N-gram based method

15
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple Language model based method
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

16
Statistical Language Modeling based Approaches: Introduction
  • Statistical language modeling: the task of estimating a probability distribution that captures the statistical regularities of natural language
  • Applied to a number of problems: speech, machine translation, IR, summarization

17
Statistical Language Modeling based Approaches: Background
[Figure: the query formulation model - a user with an information need has an ideal document in mind and formulates a query (e.g., "lemur"); given the query, retrieval asks which document is most likely to be the ideal document.]
In spite of the progress, there has been little work to capture, model and integrate user context!
18
Motivation for our approach
[Figure: for the query "lemur", the ideal document depends on the user. One candidate: "Encyclopedia gives a brief description of the physical traits of this animal." Another: "The Lemur toolkit for language modeling and information retrieval is documented and made available for download." The user's past search contexts (e.g., "Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata...") point to the toolkit sense.]
19
Statistical Language Modeling based Approaches: Overview
  • From user contexts, capture statistical properties of text
  • Use these properties to improve search results
  • Different contexts
  • Unigrams and bigrams: simple N-gram based approaches
  • Relationship between query and document words: Noisy Channel based approach

20
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

21
N-gram based Approaches: Motivation
[Figure: the "lemur" example again. From the user's past search contexts (e.g., "Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata..."), extract unigrams (information, retrieval, documents) and bigrams (information retrieval, searching documents, information documents); these favor the toolkit reading of the query - "The Lemur toolkit for language modeling and information retrieval is documented and made available for download" - over the animal reading, "Lemur - Encyclopedia gives a brief description of the physical traits of this animal."]
22
Sample user profile
23
Learning user profile
  • Given: past search history
  • Hu = (q1, rf1), (q2, rf2), ..., (qn, rfn)
  • rf_all = concatenation of all rf
  • For each unigram wi, estimate its probability in rf_all
  • User profile: the resulting unigram distribution (a sketch follows)
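
A minimal sketch of this profile-learning step, assuming the profile is the maximum-likelihood unigram distribution over the concatenated relevance feedback (names are illustrative, not from the thesis):

    from collections import Counter

    def learn_profile(history):
        # history: list of (query, relevance_feedback_text) pairs
        rf_all = " ".join(rf for _, rf in history)        # concatenate all rf
        tokens = rf_all.lower().split()
        counts = Counter(tokens)
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}  # P(w | profile)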

24
Reranking
  • Recall: in the general language modeling approach to IR, a document D is scored by the query likelihood, P(Q|D) = product over query words w of P(w|D)
  • Our approach: re-score the top documents using the learnt user profile as well (a sketch follows)
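
A minimal reranking sketch, assuming the user profile is interpolated with the document language model (Jelinek-Mercer style); the exact combination used in the thesis may differ:

    import math

    def rerank(query, docs, profile, lam=0.5, eps=1e-9):
        # docs: list of (doc_id, text); returns docs sorted by score
        def score(text):
            tokens = text.lower().split()
            n = len(tokens) or 1
            s = 0.0
            for w in query.lower().split():
                p_doc = tokens.count(w) / n       # P(w | D), MLE estimate
                p_user = profile.get(w, 0.0)      # P(w | user profile)
                s += math.log(lam * p_doc + (1 - lam) * p_user + eps)
            return s
        return sorted(docs, key=lambda d: score(d[1]), reverse=True)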

25
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

26
Noisy Channel based Approach
  • Documents and queries live in different information spaces
  • Queries: short, concise
  • Documents: more descriptive
  • Most retrieval and personalized web search methods do not model this
  • We capture the relationship between query and document words

27
Noisy Channel based approach: Motivation
[Figure: the user's ideal document passes through a query generation process (a noisy channel) to produce the query; retrieval inverts this channel to recover the ideal document.]
28
Similar to Statistical Machine Translation
  • SMT: given an English sentence, translate it into French
  • Ours: given a query, retrieve documents close to the ideal document
[Figure: Noisy channel 1 - English sentence to French sentence, modeled by P(e|f). Noisy channel 2 - ideal document to query, modeled by P(q|w).]
29
Learning user profile
  • User profile = translation model
  • Triples (qw, dw, p(qw|dw))
  • Use statistical machine translation methods
  • Learning the user profile = training a translation model
  • In SMT, a translation model is trained
  • From parallel texts
  • Using the EM algorithm (a sketch follows)
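
A minimal IBM Model 1 sketch of this EM training, assuming query/snippet pairs play the role of parallel text; the thesis used GIZA, so this is only illustrative:

    from collections import defaultdict

    def train_model1(pairs, iterations=10):
        # pairs: list of (query_words, doc_words); returns t[(qw, dw)] ~ p(qw|dw)
        t = defaultdict(lambda: 1e-3)                # near-uniform start
        for _ in range(iterations):
            count = defaultdict(float)
            total = defaultdict(float)
            for q_words, d_words in pairs:
                for qw in q_words:                   # E-step: expected counts
                    z = sum(t[(qw, dw)] for dw in d_words)
                    for dw in d_words:
                        c = t[(qw, dw)] / z
                        count[(qw, dw)] += c
                        total[dw] += c
            for (qw, dw), c in count.items():        # M-step: re-normalize
                t[(qw, dw)] = c / total[dw]
        return t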

30
Learning User profile
  • Extracting Parallel Texts
  • From Queries and corresponding snippets from
    clicked documents
  • Training a Translation Model
  • GIZA - an open source tool kit widely used for
    training translation models in Statistical
    Machine Translation research.

U. Rohini, Vamshi Ambati, and Vasudeva Varma. Statistical machine translation models for personalized search. Technical report, International Institute of Information Technology, 2007.
31
Sample user profile
32
Reranking
  • Recall: in the general language modeling approach to IR, documents are scored by query likelihood
  • Noisy Channel based approach: score each candidate document by the probability that it "translates" into the query (a sketch follows the example)

[Figure: for the query "lemur", translation probabilities such as P(retrieval|lemur) raise the score of D4, "The Lemur toolkit for language modeling and information retrieval is documented and made available for download," above D1, "Lemur - Encyclopedia gives a brief description of the physical traits of this animal."]
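
A minimal noisy-channel scoring sketch using the translation table t from the training sketch above; the smoothing constant and the uniform P(dw|D) are assumptions:

    import math

    def nc_score(query_words, doc_words, t, eps=1e-6):
        # P(Q|D) with P(qw|D) = sum over dw of p(qw|dw) * P(dw|D)
        n = len(doc_words) or 1
        score = 0.0
        for qw in query_words:
            p = sum(t.get((qw, dw), 0.0) for dw in doc_words) / n
            score += math.log(p + eps)
        return score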
33
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

34
Machine Learning based Approaches: Introduction
  • Most machine learning for IR treats it as a binary classification problem: relevant vs. non-relevant
  • Clickthrough data
  • A click is not an absolute relevance judgment but a relative one
  • i.e., assuming clicked = relevant and un-clicked = irrelevant is wrong
  • Clicks are biased
  • Partial relative relevance: clicked documents are more relevant than the un-clicked documents (a sketch follows)
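
A minimal sketch of turning clicks into such partial relative judgments, in the style of Joachims' "clicked beats skipped-above" heuristic; the thesis's exact pair-extraction rule is an assumption here:

    def preference_pairs(ranked_doc_ids, clicked_ids):
        # each clicked document is preferred over every un-clicked
        # document ranked above it
        clicked = set(clicked_ids)
        pairs = []
        for i, doc in enumerate(ranked_doc_ids):
            if doc in clicked:
                for earlier in ranked_doc_ids[:i]:
                    if earlier not in clicked:
                        pairs.append((doc, earlier))
        return pairs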

35
Background
  • Ranking SVM
  • A variation of SVM
  • Learns from Partial Relevance Data
  • Learning similar to classification SVM

36
Ranking SVMs based method
  • Use Ranking SVMs for learning user profile
  • Experimented
  • Different features
  • Unigram, bigram
  • Different Feature weights
  • Boolean, Term Frequency, Normalized Term Frequency

37
Learning user profile
  • User profile a weight vector
  • Learning Training an SVM Model
  • Steps
  • Extracting Features
  • Computing Feature Weights
  • Training SVM

1. Uppuluri R. and Ambati V. Improving web search results using collaborative filtering. In Proceedings of the 3rd International Workshop on Web Personalization (ITWP), held in conjunction with AAAI 2006, 2006.
2. U. Rohini and Vasudeva Varma. A novel approach for re-ranking of search results using collaborative filtering. In Proceedings of the International Conference on Computing Theory and Applications (ICCTA07), pages 491-495, Kolkata, India, March 2007.
38
Extracting Features
  • Features: unigrams, bigrams
  • Given: past search history
  • Hu = (q1, rf1), (q2, rf2), ..., (qn, rfn)
  • rf_all = concatenation of all rf
  • Remove stop words from rf_all
  • Extract all unigrams (or bigrams) from rf_all

39
Computing Feature Weights
  • In each relevant document di, compute the weight of each feature
  • Boolean weighting: 1 or 0
  • Term frequency weighting: tf_w = number of times the feature occurs in di
  • Normalized term frequency weighting: tf_w normalized by the document length (a sketch follows)
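
A minimal sketch of the three weighting schemes (tokenization and the exact normalization are assumptions):

    def feature_weight(doc_tokens, feature, scheme="tf_norm"):
        tf = doc_tokens.count(feature)
        if scheme == "boolean":
            return 1 if tf > 0 else 0
        if scheme == "tf":
            return tf
        if scheme == "tf_norm":
            return tf / (len(doc_tokens) or 1)   # normalize by doc length
        raise ValueError(scheme)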

40
Training SVM
  • Represent each relevant document as a string of features and their corresponding weights
  • We used SVMlight for training (an input sketch follows)
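
A minimal sketch of writing training data in the SVMlight ranking format ("<target> qid:<q> <feature_id>:<weight> ..."); the feature-to-integer-id mapping and the example values are assumptions:

    def write_svmlight(path, examples):
        # examples: list of (target, qid, {feature_id: weight})
        with open(path, "w") as f:
            for target, qid, feats in examples:
                parts = [f"{target} qid:{qid}"]
                parts += [f"{i}:{w}" for i, w in sorted(feats.items())]
                f.write(" ".join(parts) + "\n")

    # e.g. a clicked document ranked above an un-clicked one for query 1:
    write_svmlight("train.dat", [(2, 1, {1: 0.12, 5: 0.30}),
                                 (1, 1, {2: 0.05, 5: 0.10})])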

41

Sample Training
Sample User Profile
42
Reranking
  • Sim(Q,D) = W . Phi(Q,D)
  • W: weight vector (the learnt user profile)
  • Phi(Q,D): vector of terms and their weights - a measure of the similarity between Q and D
  • Each term: a term in the query
  • Term weight: the product of the term's weights in the query and in the document (boolean, term frequency, or normalized term frequency); a sketch follows
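
A minimal sketch of this scoring, with all three inputs represented as term-to-weight maps (names are illustrative):

    def sim(query_weights, doc_weights, profile_w):
        # Sim(Q, D) = W . Phi(Q, D), where Phi pairs each query term
        # with the product of its query and document weights
        score = 0.0
        for term, qw in query_weights.items():
            phi = qw * doc_weights.get(term, 0.0)
            score += profile_w.get(term, 0.0) * phi
        return score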

43
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

44
Personalized Search without Relevance Feedback: Introduction
  • Can personalization be done without relevance feedback about which documents are relevant?
  • How informative are the queries posed by users?
  • Is the information contained in the queries enough to personalize?

45
Approach
  • Past queries of the user available
  • Make effective use of past queries
  • Simple N-gram based approach

46
Learning user profile
  • Given: past search history
  • Hu = q1, q2, ..., qn
  • q_concat = concatenation of all queries
  • For each unigram wi, estimate its probability in q_concat
  • User profile: the resulting unigram distribution

47
Sample user profile
48
Reranking
  • Recall: the general language modeling approach to IR
  • Our approach: re-score the top documents using the query-based user profile

U. Rohini, Vamshi Ambati, and Vasudeva Varma. Personalized search without relevance feedback. Technical report, International Institute of Information Technology, 2007.
49
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

50
Experiments: Introduction, Problems
  • Aim: see how the proposed approaches perform by comparing them with a baseline
  • Problems
  • No standard evaluation framework
  • Data
  • Lack of standardization
  • Comparison with previous work is difficult
  • Difficult to repeat previously conducted experiments
  • Difficult to share results and observations
  • Effort to collect data is repeated over and over
  • Identified as a problem in need of standardization (Allan et al. 2003)
  • Lack of standard personalized search baselines
  • In our work, we used a variation of the Rocchio algorithm
  • Metrics

51
Experiments: Data
  • Clickthrough data from a popular search engine
  • Data collected from about 250k users over 3 months in 2006
  • Consists of (anonymous id, query, timestamp, position of the click, domain name of the clicked URL)

52
Experiments: Sample Data

53
Issues with the query log data
  • Web search engines
  • Search engine indices change over time
  • However, the top 10 results stay mostly the same
  • Implicit feedback: partial relevance feedback
  • 90% of the users click only the top 10 results
  • 95% only the top 5 results
  • The log only contained the domain name of the clicked URLs
54
Extracting Data Set
  • Conditions
  • A query should have at least 1 click
  • The user should exhibit long-term behaviour (pose queries over the 3 months and exhibit similar interests)
  • Assumptions
  • Each anonymous id corresponds to one user
  • Use the domain name of the clicked URL when comparing
  • Final data set
  • How to split the data for training (learning the user profile) and testing?
  • Temporally
  • Training data: learning the user profile; testing data: testing
  • First 2 months for training, third month for testing
  • 17 users
  • 51.88 average queries in the train set and 12.64 average queries in the test set

55
Baseline
  • Variation of the Rocchio algorithm (Rocchio 1971)
  • Learning the profile
  • User profile: a vector of words and weights
  • For each query
  • For each clicked document
  • Collect the corresponding snippet from the search engine
  • Concatenate all such snippets for all queries
  • Compute the frequency distribution of the words
  • Reranking
  • Sim(Q,D) = (tf_Q/|Q| + tf_RUP/|RUP|) . tf_D/|D|, where RUP is the Rocchio user profile

56
Metrics
  • MRR: Mean Reciprocal Rank
  • MRR(Q, D, u) = (1/|Q|) * sum over q in Q of rr(q, R_{Q,D,u})
  • rr(q, R_{Q,D,u}): the reciprocal of the position of the first relevant document, and 0 if there is no relevant result in the top N (= 10); a sketch follows
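
A minimal sketch of this metric (data structures are illustrative):

    def mean_reciprocal_rank(ranked_lists, relevant, n=10):
        # ranked_lists: query -> ranked doc ids; relevant: query -> set of ids
        total = 0.0
        for q, ranking in ranked_lists.items():
            rr = 0.0
            for pos, doc in enumerate(ranking[:n], start=1):
                if doc in relevant.get(q, set()):
                    rr = 1.0 / pos        # first relevant document
                    break
            total += rr
        return total / len(ranked_lists)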

57
Set up
Reranker:
1. Rerank the top M (= 10) results using clickthrough data.
2. First get the results from Google, ignoring the ranks given by Google (similar to Tan, Shen and Zhai 2006).
3. Re-score the results appropriately.
4. Sort in descending order and return.
[Figure: a query from the test data (query + clicked URLs) goes to the search engine; the top m URLs are reranked, and the reranked results are compared against the clicked URLs over the top n URLs using MRR and P@n.]
58
Results: Simple N-gram based Methods
59
Noisy Channel Based Method
  • Experiment 1
  • Comparison with baseline
  • Experiment 2
  • Different methods of extracting parallel texts
  • Experiment 3
  • Different training schemes
  • Different contexts for training
  • Different training models

60
Experiment 1
Comparison with baseline
61
Experiment 2
  • Extracting parallel texts: comparison of methods

62
Results
[Figure: comparison of parallel-text extraction methods.
NS1 - queries paired with snippets of relevant documents.
NS2 - queries paired with snippets of relevant documents, plus synthetic queries paired with snippets.
NS3 - queries paired with snippets of relevant documents, document titles paired with snippets, and synthetic queries paired with snippets.]
63
Experiment 3
  • Different training schemes
  • Different contexts for training: snippet vs. document
  • Different training models
64

Data and Set up
  • Data
  • Explicit feedback data collected from 7 users
  • For each query, each user examined the top 10 documents and identified the relevant ones
  • Collected the top 10 results for all queries: 3469 documents in total
  • Set up
  • Created a Lucene index of the 3469 documents
  • For reranking, first retrieve results using Lucene, then rerank them using the noisy channel approach
  • We perform 10-fold cross validation
65
Results
66
Results
I - Document training and document testing
II - Document training and snippet testing
III - Snippet training and document testing
IV - Snippet training and snippet testing
67
Results: SVM
SVM1 - unigram, boolean
SVM2 - unigram, term frequency
SVM3 - unigram, normalized term frequency
SVM4 - bigram, normalized term frequency
SVM5 - unigrams + bigrams, normalized term frequency
68
Results: Personalization without Relevance Feedback
PRWF - personalization without relevance feedback, using only the profile learnt from queries alone
PRWF+Smoothing - the same, with the user-profile probabilities smoothed using a large query language model built from all queries of all users in collection 01 of the clickthrough data
69
Experiments: Summary
  • Language modeling: best results!
  • An interesting framework for personalized search
  • Simple N-gram based approaches also worked well
  • The Noisy Channel model worked best
  • Extracting synthetic queries helped
  • Different training schemes: IBM Model 1 vs. GIZA, snippet vs. document
  • Machine learning: competitive results
  • Different features and weights
  • Without relevance feedback: very encouraging results
  • The simple approach worked well
  • Sparsity: the query log was useful

70
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

71
Query Log Study: Introduction
  • Large interest in finding patterns and computing statistics from query logs
  • Previous work
  • Patterns and statistics of queries: common queries, avg. no. of words, avg. no. of queries per session, etc.
  • Little work on analyzing the click behaviour of users
  • Granka et al. - eye tracking study

72
Query Log Study: Our Analysis
  • Analyzing the clicking behaviour of users
  • Study whether there is any general pattern in clicking behaviour
  • Aim to answer the following
  • Expt 1: Do all users view results from top to bottom?
  • Expt 2: Do all users view the same number of results?

73
Query Log Study: Observations
  • Expt 1: Do all users view results from top to bottom?
  • YES!! - for 90% of queries
  • Why is this important?
  • Expt 2: How many top results does the user view? -> the deepest click made by users
  • Statistical analysis showed that the deepest clicks made by a sample of users follow a Zipf distribution (power law)
  • Many users view only the top 5 (about 90-95%), few view the top 10, much fewer view the top 20, and so on
  • Why is this important?

74
Outline of the talk
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem Description
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • I-Search: A suite of approaches for Personalized Web Search
  • Statistical Language modeling based approaches
  • Simple N-gram based methods
  • Noisy Channel based method
  • Machine Learning based approach
  • Ranking SVM based method
  • Personalization without Relevance Feedback
  • Experiments
  • Query Log Study
  • Simulated Feedback
  • Conclusions and Future Directions

75
Simulated Feedback: Introduction
  • Relevance feedback: types and problems
  • Explicit
  • Difficult to collect
  • Implicit
  • Clickthrough data from search engines is not available
  • Repeatability of experiments is a problem!
  • The web has dynamic data collections - collected feedback becomes stale
  • Privacy

76
Simulated Feedback: Motivation
  • Simulated feedback: drawing an analogy from explicit and implicit feedback
  • A potential area whose outcome is useful for web search and personalization
  • Easy to create
  • Customizable
  • Large amounts can be created
  • Repeatable
  • Can target specific domains for testing

77
Simulated Feedback: Creation
[Figure: a user creator builds a simulated user from parameters; the web search behaviour simulator then runs - Step 1: formulate a query; Step 2: pose it to a search engine; Step 3: look at the results returned by the search engine; Step 4: possibly click one or more results - and emits simulated feedback.]
78
Outline
  • Introduction
  • Current Search Engines: Problems
  • Motivation
  • Background
  • Problem
  • Solution Outline
  • Contributions
  • Review of Personalized Search
  • Thesis Outline
  • Statistical Language modeling based approaches
  • Simple Language model based approaches
  • Noisy Channel
  • Machine Learning based approach
  • Ranking SVM
  • Personalization without Relevance Feedback
  • Experiments
  • Conclusions and Future Directions

79
Conclusions
  • Statistical Language Modeling based approaches
  • Machine learning based approach
  • Personalized Search without relevance feedback
  • Performed evaluation using query log data
  • Query Log Analysis and Simulated Feedback

80
Future Directions
  • Recommending documents
  • Extend to exploit repetition in queries and clickthroughs
  • Language modeling based approaches
  • Capture richer context
  • N-gram based method: trigrams, etc.
  • Noisy Channel based method: bigrams
  • Machine learning based approaches
  • Can learn non-text patterns or behaviour
  • Personalized summarization
  • Simulating user behaviour

81
  • Thank you

82
Simple N-gram based approaches
  • N-gram: a general term for a sequence of n words
  • 1-gram = unigram, 2-gram = bigram
  • Capture statistical properties of text
  • Single words (unigrams)
  • Two adjacent words (bigrams)

83
Query Log Study: Introduction
  • Query logs
  • Large interest in finding patterns and computing statistics from query logs
  • Previous work
  • Patterns and statistics on queries
  • Common queries, avg. no. of words, avg. no. of queries per session, etc.
  • Little work on analyzing the click behaviour of users
  • Granka et al. - eye tracking study

84
Query Log Study: Our Analysis
  • Focus on analyzing the clicking behaviour of users
  • Study whether there is any general pattern in clicking behaviour
  • Aim to answer the following
  • Do all users view results from top to bottom? (Expt 1)
  • Do all users view the same number of results? (Expt 2)

85
Query Log Data
  • Clickthrough data from a popular search engine
  • Data collected from about 250k users over 3 months in 2006
  • Consists of (anonymous id, query, timestamp, position of the click, domain name of the clicked URL)

86
Sample Data

87
Experiment 1
  • Do all users view results from top to bottom?
  • Position: the position of the search result in the search engine ranking
  • For each query
  • Arrange the clicks by time of click
  • If all the positions are in ascending order, the user views from top to bottom
  • The query is said to be an anomaly otherwise (a check sketched below)
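
A minimal sketch of this check (the input format is illustrative):

    def is_anomaly(clicks):
        # clicks: list of (timestamp, position) for one query
        positions = [pos for _, pos in sorted(clicks)]   # order by time
        return any(a > b for a, b in zip(positions, positions[1:]))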

88
89
Observations
  • For 90% of the queries, users always go from top to bottom!!!
  • For the remaining 10% of queries
  • The user clicks at least one lower result before clicking a higher result
  • User not happy with the search engine ranking?
  • Not the behaviour of particular users - 50% of users exhibit it
  • Certain queries are hard?

90
Experiment 2
  • How many top results does the user view?
  • Intuition
  • Typically, users don't view all the results
  • Only the top few - how many?
  • Does it depend on the user?
  • Goal: to see how deep a user goes into the results

91
  • Patience: how many results a user views
  • For each query, the deepest click; maximum over all queries
  • For each query, the average click; maximum over all queries

92
[Figure: for each query, the deepest click; maximum over all queries]
93
[Figure: for each query, the average click; maximum over all queries]
94
Observations
  • Statistical analysis shows they follow a Zipf distribution (power law)
  • Many users view only the top 5 (about 90-95%), few view the top 10, much fewer view the top 20, and so on
  • We can characterize the patience of a group of users using Zipf's law (a power law)

95
Simulated Feedback
  • Relevance feedback
  • Explicit
  • Difficult to collect
  • Implicit
  • Clickthrough data from search engines is not available
  • Repeatability of experiments is a problem!
  • The web has dynamic data collections - collected feedback becomes stale

96
Simulated Feedback
  • Simulated feedback: drawing an analogy from explicit and implicit feedback
  • A potential area whose outcome is useful for web search and personalization
  • Easy to create
  • Customizable
  • Large amounts can be created
  • Repeatable
  • Can target specific domains for testing

97
  • Creating simulated Feedback
  • Creating Simulated user
  • Simulating user web search behaviour

U. Rohini, Vamshi Ambati, and Vasudeva Varma. Creating simulated feedback. Technical report, International Institute of Information Technology, 2007.
98
Creating Simulated User
  • User-specific parameters (unique id, etc.)
  • Web-search-specific parameters
  • Patience (from the query log analysis)
  • Threshold
  • Others could be interests (user profile/model), browsing history, etc.

We considered patience and threshold in this work.
99
Patience
Pick from a power-law distribution: many users view the top 5, fewer view the top 10, much fewer view the top 20, and so on (a sampling sketch follows).
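
A minimal sketch of sampling a simulated user's patience from a truncated Zipf (power-law) distribution; the exponent and cutoff are assumed parameters:

    import numpy as np

    def sample_patience(max_depth=25, exponent=2.0, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        while True:
            p = int(rng.zipf(exponent))   # heavy tail: small depths dominate
            if p <= max_depth:
                return p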
100
Relevance Threshold
  • Depends on the query and the user
  • For some queries, very high relevance is needed
  • We compute it per user, according to the query

101
Simulating user web search behaviour
  • Formulate a web search process
  • Step 1: Create the query
  • Step 2: Pose it to a search engine
  • Step 3: Look at the results returned by the search engine
  • Step 4: Possibly click one or more results
  • Step 5: Reformulate if unsatisfied
  • Simulate the search process for the created user

We consider only Steps 1 to 4 in our approach.
102
Simulating Step 1: Formulating the query
  • Can be very complex
  • We take a simple and practical approach
  • For now, the queries are assumed to be given to the system

103
Simulating Step 2: Searching the Search Engine
  • Given a search engine
  • Pose the query from Step 1 to the search engine
  • Get the search results

104
Simulating Step 3: Looking at the Search Results
  • Simulation of this step can be done in a number of ways
  • e.g., random, top to bottom, bottom to top, etc.
  • We consider
  • Sequential, from top to bottom, until patience reaches zero
  • For each document, perform clicks as in Step 4
  • (motivated by Radlinski et al., Granka et al.)

105
Simulating Step 4: Clicking the results
  • The crucial step of our simulation
  • The user clicks a result if
  • The snippet shown by the search engine appears relevant to the user
  • The result below it is not more relevant than it (motivated by Radlinski et al., Granka et al.); a sketch follows
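
A minimal sketch of this click rule while scanning top to bottom; the relevance function and threshold come from the simulated user and are assumptions here:

    def simulate_clicks(snippets, relevance, threshold, patience):
        clicks = []
        for i, snip in enumerate(snippets[:patience]):
            rel = relevance(snip)
            nxt = relevance(snippets[i + 1]) if i + 1 < len(snippets) else 0.0
            # click if the snippet looks relevant and the one below it
            # is not more relevant
            if rel >= threshold and nxt <= rel:
                clicks.append(i + 1)      # 1-based click position
        return clicks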

106
Simulated Feedback: Creation
[Figure: the user creator turns parameters into a simulated user; the web search behaviour simulator poses queries to the search engine, scans the returned search results, and emits simulated feedback.]
107
Evaluation: Problems
  • Is simulated feedback relevant?
  • How different is it from randomly created feedback?
  • Evaluation
  • No standard methods to evaluate
  • No metrics to quantify success
  • How and what to compare?

108
Experiments
  • Experiment 1
  • Comparison with Implicit Feedback from Query log
    Data
  • Experiment 2
  • Comparison with Baselines
  • Experiment 3
  • Comparison with Explicit Feedback

109
Experimental Set up
  • Creating a simulated user
  • Randomly assign a unique id
  • Patience
  • Draw randomly from a power-law distribution over 1-25

110
Experimental set up
  • Simulating the web search process
  • Pick a user from the query log and gather all queries posed by them
  • Simulate the web search process for each query in succession
  • Step 1: Formulating a query
  • Pick each query in succession from the gathered queries
  • Step 2: Searching the search engine
  • Pose the query to a search engine and gather the results
  • Step 3: Looking at the results
  • Step 4: Clicking one or more results

111
Sample Data Created
112
Experiment 1
  • Comparison with clickthroughs from the query log
  • For each query, a Relevance Document Pool (RDP): all clicked documents for the query from all the users in the query log
  • Average accuracy: 60.04%

113
Experiment 2
  • Random Navigation
  • Power law Navigation
  • Random click

114
Creating user
115
Creating Web Search Process
116
Results
117
Experiment 3
  • Comparison with explicit feedback
  • 4 judges
  • Selected a small subset of the created data
  • 25 users, 1 query per user: 25 queries in total
  • We consider the query and the simulated feedback created for it

118
  • Each judge was given an evaluation form
  • Evaluation form
  • Details about the judge
  • A table containing the query and the corresponding simulated click URLs
  • For each simulated click, the judge gives feedback
  • Boolean feedback: 1 or 0

119
Results
  • Judge accuracy: 66.02%
  • Correlation between the judges: 0.859

120
Discussion
  • 6% increase in accuracy over the comparison with the query log
  • Matching problems
  • Search engine index changes - relevance feedback becomes stale!
  • Too few relevant documents in the RDP
  • qualcomm.com - only one document in the RDP
  • A focussed query that only one user posed
  • Focussed query vs. general query
  • qualcomm.com - only one query, posed by one user
  • lottery - 58 users, 24 unique clicked URLs

121
Reranking
  • Recall: in the general language modeling approach to IR, documents are scored by query likelihood
  • Noisy Channel based approach

[Figure: the "lemur" example - translation probabilities favor "The Lemur toolkit for language modeling and information retrieval is documented and made available for download" over "Lemur - Encyclopedia gives a brief description of the physical traits of this animal."]