Title: Adaptive Information Retrieval
1. Adaptive Information Retrieval
- Eren Manavoglu (1)
- Advisor: Dr. C. Lee Giles (1,2)
- (1) Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802
- (2) School of Information Sciences and Technology, The Pennsylvania State University, University Park, PA 16802
2. Information Retrieval over the Web Today
- Main tools: search engines
- User input: query terms/questions
- Result presentation: ranked list of matching documents
- Ranking algorithm: a combination of topical relevance, citations, and document popularity (probably)
3. Why Do We Need Adaptive Information Retrieval?
- Different users use the same query to access different information.
  - Jaguar: animal or automobile?
- Even the same user may have different information needs at different points in time about the same topic.
  - Dick Cheney: Scooter scandal last month, hunting accident today.
4. Adaptive Information Retrieval Techniques
- Personalization
  - Server-side personalization: relies mainly on explicit user input (MyYahoo, Personalized Google); the parameters of the ranking algorithm are personalized
  - Client-side personalization: uses both implicit and explicit user feedback; the user model is usually in the form of interest profile(s)
- Recommender systems
  - Will be discussed in more detail later
- Clustering search results
  - Will be discussed in more detail later
A survey can be found in Manavoglu, Giles, Sping, Wang, "The Power of One - Gaining Business Value from Personalization Technologies", Chapter 9.
5. What this Proposal is About
- Learn user behavior models to capture individual differences
- Group the documents at query time to capture the current state of available information sources
- Use the user model to highlight or recommend groups of documents of interest to the user
6. Work Done So Far
- Recommender Systems
- User Behavior Modeling
- Clustering Search Results
7. Recommender Systems
- Goal: to automate word-of-mouth recommendations within a community.
- Approaches
  - Content-based filtering: recommendations are based solely on the content of the items (e.g. topic, author, etc.)
  - Collaborative filtering: similar users are identified based on the items they view, and the items viewed by like-minded users are then recommended to a given user [Resnick 94]
  - Hybrid: both the content of the items and the similarity of the users are used in making recommendations [Shani 2005]
8. Recommender Systems - A Probabilistic Approach
- Represent the data as a collection of ordered sequences of document requests
- Assume that we are given a dataset consisting of ordered sequences over some fixed alphabet; the elements of the alphabet are documents
- For each document, define its history H as the so-far observed subsequence of the current sequence
- Our goal is to predict the next document Dnext given the history H
- Learn a probabilistic model P(Dnext | H, Data)
9. Recommender Systems - Probabilistic Models
- A mixture-model-based approach could be taken for learning the probabilistic model P(Dnext | H, Data); a sketch of the mixture form is given below
- Nc is the number of components, and P(Dnext | H, Data, k) is the component model
- The component distribution P(Dnext | H, Data, k) should make use of the sequence information, and the learning should be scalable
- Examples: mixture of multinomials, mixture of Markov models
10. Proposed Recommendation Model - Maximum Entropy (maxent)
- Maximum entropy provides a framework for combining information from different knowledge sources.
- Each knowledge source imposes a set of constraints.
- The intersection of all constraints contains a set of probability functions.
- The maximum entropy principle chooses the one with the highest entropy, i.e. the flattest function.
- Advantage of the maxent approach
  - Combines 1st-order Markov model features and long-term dependencies among the data.
11. Representation for Long-Term Dependence - Triggers
- The long-term dependence of Dnext on H is represented with triggers.
- The pair of actions (a, b) is a trigger iff P(Dnext = b | a in H) is substantially different from P(Dnext = b).
- Triggers are pairs with high mutual information scores.
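The slide describes trigger mining only in words; the sketch below is one plausible way to do it, assuming sessions are lists of document/action IDs. The function name find_triggers and the pointwise-MI weighting are illustrative choices, not the exact procedure used in the thesis.

```python
from collections import Counter
from math import log

def find_triggers(sessions, top_n=1000):
    """Score pairs (a, b) by a mutual-information-style association between
    'a appeared earlier in the session' and 'b was requested next'.
    Returns the top_n highest-scoring pairs as candidate triggers."""
    pair_counts = Counter()   # (a, b): a occurred somewhere before b in a session
    item_counts = Counter()   # marginal item counts
    total_items = 0
    total_pairs = 0
    for session in sessions:
        for i, b in enumerate(session):
            item_counts[b] += 1
            total_items += 1
            for a in set(session[:i]):          # each distinct earlier item
                pair_counts[(a, b)] += 1
                total_pairs += 1
    scores = {}
    for (a, b), n_ab in pair_counts.items():
        p_ab = n_ab / total_pairs
        p_a = item_counts[a] / total_items
        p_b = item_counts[b] / total_items
        scores[(a, b)] = p_ab * log(p_ab / (p_a * p_b))  # joint-weighted pointwise MI
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```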
12. Maximum Entropy Model Definition
- The maximum entropy objective function leads to the model sketched below
- Fs(D,H) are the feature indicator functions; S is the total number of bigrams/triggers for document D
- The lambda_s are the maxent model parameters for component k
- If s corresponds to the bigram (a, b), then Fs(D,H) = 1 iff D = a and Dprev = b, and Fs(D,H) = 0 otherwise
- Zlambda,k(H) is a normalization constant
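The model equation referred to above was lost in the export; the standard conditional maxent (log-linear) form consistent with the notation on this slide would be:

```latex
P(D \mid H, \text{Data}, k)
  = \frac{1}{Z_{\lambda,k}(H)}
    \exp\!\Bigg( \sum_{s=1}^{S} \lambda_{s,k}\, F_s(D, H) \Bigg),
\qquad
Z_{\lambda,k}(H) = \sum_{D'} \exp\!\Bigg( \sum_{s=1}^{S} \lambda_{s,k}\, F_s(D', H) \Bigg)
```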
13. Maximum Entropy Model - A Complication and its Solution
- Maxent computation is expensive if the number of items is large: on a dataset of 500K items it could take months
- This problem can be solved if the items are clustered before applying the maxent model
- We employed a user-navigation-based clustering algorithm built on a simple idea: documents that are requested consecutively are related
  - Compute the document bigrams
  - Apply greedy clustering using the bigram counts (a sketch follows the reference below)
Details of the algorithm can be found in Pavlov, Manavoglu, Giles, Pennock, 2004.
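The published algorithm is in the cited paper; the sketch below illustrates only the simple idea stated on the slide (consecutive requests are related) via a greedy merge over bigram counts. The cluster-size cap and the function name are illustrative assumptions, not the published procedure.

```python
from collections import Counter, defaultdict

def greedy_bigram_clusters(sessions, max_cluster_size=500):
    """Greedy navigation-based clustering: documents requested consecutively
    (bigrams) are pulled into the same cluster, processing the most frequent
    bigrams first. Illustrative variant, not the exact published algorithm."""
    bigrams = Counter()
    for s in sessions:
        for a, b in zip(s, s[1:]):
            bigrams[(a, b)] += 1

    cluster_of = {}              # document -> cluster id
    members = defaultdict(set)   # cluster id -> set of documents
    next_id = 0
    for (a, b), _count in bigrams.most_common():
        ca, cb = cluster_of.get(a), cluster_of.get(b)
        if ca is None and cb is None:
            cluster_of[a] = cluster_of[b] = next_id
            members[next_id] = {a, b}
            next_id += 1
        elif ca is None and len(members[cb]) < max_cluster_size:
            cluster_of[a] = cb
            members[cb].add(a)
        elif cb is None and ca is not None and len(members[ca]) < max_cluster_size:
            cluster_of[b] = ca
            members[ca].add(b)
        # otherwise both documents are already placed (or the target is full)
    return cluster_of
```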
14. Experimental Evaluation
- Competing models
  - Maxent
  - 1st-order Markov model
  - Mixture of Markov models (30 components)
  - Mixture of multinomials (60 components, with the similarity-based CiteSeer predictors as its feature set)
  - Correlation
  - Individual CiteSeer similarity-based predictors
  - CiteSeer merged-similarity-based predictor
- We call "ResearchIndex Merge" the recommender obtained by pulling all similarity-based recommenders together; essentially, this corresponds to the currently available recommending system in CiteSeer.
15. Current Recommendation Algorithms in CiteSeer
- Similarity recommenders
  - Content-based: text or citation similarity
    - Sentence Similarity, Text Similarity, Active Bibliography, Cited By, Co-citations, Cites
  - Collaborative: user similarity
    - Users Who Viewed
  - Source similarity
    - On Same Site
For detailed information on CiteSeer recommenders please see Lawrence, Giles, Bollacker, 1999.
16. Experimental Setup
- Conducted on 6 months' worth of CiteSeer data in the form of <time, user, request>
- Chronologically partitioned the data into 5 million training and 1.7 million test requests
- Document requests were aggregated by userID and sessionized; an inactivity time of 300 seconds was used to break the requests into sessions (a sessionization sketch follows this list)
- Robots were identified and eliminated based on the number of requests submitted
- Documents that appeared fewer than 3 times were discarded (250,000 documents remained)
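A minimal sessionization sketch, assuming requests are (time, user, document) tuples with UNIX timestamps; the field names and tuple layout are illustrative, not the actual log schema.

```python
SESSION_GAP = 300  # seconds of inactivity that ends a session

def sessionize(requests, gap=SESSION_GAP):
    """Group (time, user, doc) tuples into per-user sessions, splitting
    whenever the gap between consecutive requests exceeds `gap` seconds."""
    sessions = []
    requests = sorted(requests, key=lambda r: (r[1], r[0]))  # by user, then time
    current, last_user, last_time = [], None, None
    for time, user, doc in requests:
        new_session = (user != last_user) or (last_time is not None and time - last_time > gap)
        if new_session and current:
            sessions.append((last_user, current))
            current = []
        current.append(doc)
        last_user, last_time = user, time
    if current:
        sessions.append((last_user, current))
    return sessions
```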
17. Evaluation Metrics
- Hit ratio: the ratio of correct predictions to the total number of predictions made
- Average height of prediction: predictions form a list of actions ranked by their probability values; the average height of prediction is the average position of the hit within this list
- The first N predictions in the list are considered and performance is evaluated based on the success of these N predictions, for N = 1, ..., 5, 10 (called bins hereafter); a small sketch of both metrics follows
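A small sketch of how these two metrics can be computed over ranked prediction lists; the exact bin handling and tie-breaking in the thesis experiments may differ.

```python
def evaluate(ranked_predictions, actual, bins=(1, 2, 3, 4, 5, 10)):
    """Hit ratio and average height for top-N prediction lists.
    `ranked_predictions[i]` is the model's ranked list for test case i,
    `actual[i]` is the item the user actually requested next."""
    results = {}
    for n in bins:
        hits, heights = 0, []
        for preds, truth in zip(ranked_predictions, actual):
            top_n = preds[:n]
            if truth in top_n:
                hits += 1
                heights.append(top_n.index(truth) + 1)  # 1 = top of the list
        results[n] = {
            "hit_ratio": hits / len(actual),
            "avg_height": sum(heights) / len(heights) if heights else None,
        }
    return results
```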
18. Results
- Among the individual CiteSeer recommenders, Active Bibliography was the best predictor. However, the merged predictor performed better than Active Bibliography; hence it represents the CiteSeer recommender in the remainder of the experiments.
19. Results
Although the merged CiteSeer predictor does better for longer recommendation lists, maxent is the best for a small number of guesses. Correlation improves as the recommendation list gets longer. Also presented are the average-height results of the competing models; with respect to average height, maxent outperformed all the other recommenders.
20. Conclusions - Discussion
- Maxent proved to be one of the best models, especially when the number of recommendations to be made was limited
- However, clustering the documents is crucial for its feasibility
- Most experiments in the field of recommender systems have been offline experiments. This could be misleading: the effect of the recommendation on the user should also be taken into account. We plan to set up a medium for online experiments.
21. User Behavior Models
22. Motivation
- In the previous section we focused only on the items users viewed. But users do much more than just view items: they browse, query, look at recommendations, etc.
- Can we improve the user experience by looking at these other actions as well?
- Can we detect what users are trying to do in a given session?
23. User Behavior Modeling
- What data do we have about users?
- Web logs, in clickstream format
- How can we make use of this data?
- By applying web usage mining techniques
24. Previous Work on Web Usage Mining
- Frequent item set identification by association rule extraction methods [1]
- Collaborative filtering for recommender systems [2,3]
- Probabilistic graphical models (e.g. Bayesian nets) to discover and represent dependencies among different variables [4]
- Sequential pattern analysis to discover patterns in time-ordered sessions [5]

[1] Liu et al., Integrating Classification and Association Rule Mining. Fourth International Conference on Knowledge Discovery and Data Mining, 1998.
[2] Resnick et al., GroupLens: An Open Architecture for Collaborative Filtering of Netnews. ACM 1994 Conference on Computer Supported Cooperative Work.
[3] Pavlov, Manavoglu, Pennock, Giles, Collaborative Filtering with Maximum Entropy. IEEE Intelligent Systems, 2004.
[4] D. Heckerman, Bayesian Networks for Data Mining. Data Mining and Knowledge Discovery, 1997.
[5] Cadez et al., Predictive Profiles for Transaction Data Using Finite Mixture Models. TR, UC Irvine, 2001.
25. Problem Definition
- We propose the following sequential framework for modeling user behavior.
- Each user session is represented as a sequence, labeled with a user ID.
- Each individual item in the sequence represents a user action.
- For each action, the history H(U) is defined as the so-far observed ordered sequence of actions.
- P(Anext | H(U), Data) is the behavior model for user U; it predicts the next action Anext given the history H(U).
Example (one session unfolding over time):
- User U, time t:   Action At = Query,    H(U) = empty
- User U, time t+1: Action At+1 = Browse, H(U) = At
- User U, time t+2: Action At+2 = Help,   H(U) = At, At+1
26. Problem Definition
- Problem
  - To infer P(Anext | H(U), Data) for each individual, given the training data.
- Proposed method
  - Mixture models
27. Why Mixture Models?
- Mixture models can handle heterogeneous data and provide a clustering framework.
- Mixture models are weighted combinations of component models.
- Each component represents a dominant pattern in the data.
- A global mixture model captures the general patterns among users.
- A mixture model can be personalized to capture individual behavior patterns.
Example (diagram): Customer A, who frequently visits but rarely buys, is modeled as 0.2 x Buyer Model + 0.8 x Browser Model.
McLachlan and Basford 1988; Cadez 2001
28. Mixture Models
- Global mixture models
  - Parameters are the same for all individuals: an Nc-component mixture model (sketched after the next slide)
29. Personalized Mixture Models
- Learn individual component probabilities alphaU,k for each user; the resulting model is therefore specific to each user (sketched below).
- For the component distribution P(Anext | H(U), Data, k), Markov and maximum entropy distributions are used.
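The mixture equations on these two slides did not survive the export; the standard forms implied by the notation (global weights alpha_k versus per-user weights alpha_{U,k}) would be:

```latex
\text{Global:}\quad
P(A_{\text{next}} \mid H(U), \text{Data})
  = \sum_{k=1}^{N_c} \alpha_k \, P(A_{\text{next}} \mid H(U), \text{Data}, k)

\text{Personalized:}\quad
P(A_{\text{next}} \mid H(U), \text{Data})
  = \sum_{k=1}^{N_c} \alpha_{U,k} \, P(A_{\text{next}} \mid H(U), \text{Data}, k),
\qquad \sum_{k} \alpha_{U,k} = 1
```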
30. Parameter Estimation
- Learn the parameters of the global model by the expectation-maximization (EM) algorithm.
- Fix the component distributions.
- Learn the individual component weights alphaU,k for known users by EM.
- For users who do not appear in the training data, the alphaU,k values default to the global alphak values.
31. Visualization
- Each component of a mixture model can be viewed as a cluster.
- Each session is a weighted combination of these clusters.
- The probability of user session SU belonging to cluster k is P(k | SU, Data), sketched after this list.
[Figure: sample cluster]
- Each session is assigned to the cluster that maximizes P(k | SU, Data).
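The cluster-membership formula itself was lost in the export; under the mixtures above, the standard Bayes form, which also serves as the E-step responsibility when fitting the alpha_{U,k} weights by EM on the previous slide, would be:

```latex
P(k \mid S_U, \text{Data})
  = \frac{\alpha_{U,k}\, P(S_U \mid \text{Data}, k)}
         {\sum_{k'=1}^{N_c} \alpha_{U,k'}\, P(S_U \mid \text{Data}, k')}
```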
32. Visualization - Graphs
- If the number of actions is large, viewing and interpreting sample sessions becomes impossible.
- Build a graph for each cluster (a construction sketch follows this list):
  - Triggers and bigrams are computed and sorted for each cluster
  - Pairs of actions whose scores exceed a threshold value are used to build the graph
  - The sequence of actions defines the direction of the edges
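A minimal sketch of the graph construction for one cluster, using bigram counts only and a raw count threshold; the thesis graphs also use trigger scores, and the threshold value here is an illustrative placeholder.

```python
from collections import Counter

def cluster_graph(cluster_sessions, threshold=5):
    """Build a directed action graph for one behavior cluster:
    nodes are actions, and an edge a -> b is added when the bigram
    (a, b) occurs more than `threshold` times in the cluster's sessions."""
    bigrams = Counter()
    for session in cluster_sessions:
        for a, b in zip(session, session[1:]):
            bigrams[(a, b)] += 1
    edges = {(a, b): c for (a, b), c in bigrams.items() if c > threshold}
    nodes = {n for edge in edges for n in edge}
    return nodes, edges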
33. Experimental Setup
- CiteSeer log files covering approximately 2 months.
- Log files are series of transactions in the form of <time, action, user ID, related info>.
  - <1014151912.455 event authorhomepage 3491213 197248 - - ip 42779 agent Mozilla/2.0 (compatible; MSIE 3.02; Update a; Windows NT)>
- Returning users are identified by cookies.
- Actions are broken into sessions based on a time heuristic (300 seconds of inactivity indicates the end of a session).
  - User1 Session1: 37 31 34 31 32
  - User1 Session2: 37 31 34 31 34 33 34
  - User2 Session1: 31 34 33 20
  - User2 Session2: 33 20
- Robots are identified based on the number of requests per session and are removed.
http://citeseer.ist.psu.edu
34. CiteSeer user actions and descriptions used during the experiments
35. Experiments
- User behavior models are evaluated based on the accuracy of their predictions.
- User behavior clusters are visualized to demonstrate the descriptive ability of the models.
- For maxent models the history was empirically set to the last 5 actions.
36. Hit Ratio Results: Returning Users
[Figure: hit ratio results on known users for the 3-component mixture model]
- Personalized models outperformed global models.
- Personalization improved the maxent model significantly.
- The personalized Markov mixture is better for returning users.
[Figure: hit ratio results on known users for the 10-component mixture model]
37. Hit Ratio Results: All Users
[Figure: hit ratio results on all users for the 3-component mixture model]
- Maxent performs better than Markov for all users.
- Maxent chooses the flattest function and has more parameters; a mixture of Markov models can better fit the training data (for returning users).
[Figure: hit ratio results on all users for the 10-component mixture model]
38. User Behavior Clusters - Maxent Model Graphs
- Cluster 4: users querying CiteSeer through another search engine
- Cluster 6: users who use the CiteSeer query interface
- Cluster 9: users who look at recommendations and citations
39. Conclusions
- Both mixture of maximum entropy and mixture of Markov models are able to generate strong predictive models.
- Markov models are better at predicting the future actions of returning users.
- Maximum entropy models perform better for the global model and for first-time users.
- Predicting future actions could be useful for recommending shortcuts and personalizing the user interface.
- CiteSeer users differ significantly in the way they use the engine.
- Mixture models can be used to discover these different patterns.
- Identifying the cluster a session belongs to can be used to recommend certain functionalities of CiteSeer (e.g. BibTeX, recommendations, etc.).
Manavoglu, Pavlov, Giles, ICDM 2003
40. Navigation Graph
41. Query Behavior Analysis (Work in Progress)
42. Query Behavior Patterns - Motivation
- Goals
  - To investigate how users interact with the query interface
  - To identify dominant patterns of querying behavior, which could then be used in recommendations
- If the user is submitting the same query over and over, should we show the same recommendations?
- If the user is persistently modifying the original query string, should we change the recommendation criteria to give more priority to the diversity of the recommendations (because modifying the query could be an indication of unsatisfactory results/recommendations)?
43. Query Types
- New Query String: none of the terms in the query (excluding stop words and logical operators) were seen in the previous query
- Modified Query String: a partially new query; some terms are eliminated and/or added
- Same Query String: same as the most recent query string
- Document Query: the document database is queried
- Citation Query: the citation database is queried
44. Query Behavior Patterns - Methodology
- Sequential modeling
  - Sessionize the log files
  - Consider the queries in a session as an ordered sequence
  - Identify the query types within each session
  - Investigate the patterns in the ordered sequence of query types
- Algorithm
  - Mixture of 1st-order Markov models
- Experimental setup
  - Same as in the previous section
45. Query Types - Example
- Input
  - <sessionID queryEvent userID queryString>
  - 1 documentquery uid1 online routing
  - 1 documentquery uid1 online routing lipmann
  - 1 documentquery uid1 lipmann
  - 1 documentquery uid1 Maarten Lipmann
  - 1 documentquery uid1 oltsp
  - 1 citationquery uid1 oltsp
- Output
  - uid1: docNew docModified docModified docModified docNew citeOld
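A minimal sketch that reproduces the labeling in this example; stop-word and logical-operator filtering is omitted, and the label vocabulary is illustrative (the slide uses "citeOld" where this sketch would say "citeSame").

```python
def label_queries(queries, stopwords=frozenset()):
    """Label each (event, query_string) in a session as New, Modified, or Same
    relative to the immediately preceding query, prefixed with 'doc' or 'cite'
    according to which database was queried."""
    labels, prev_terms = [], None
    for event, qstring in queries:
        terms = {t for t in qstring.lower().split() if t not in stopwords}
        prefix = "doc" if event == "documentquery" else "cite"
        if prev_terms is None or not (terms & prev_terms):
            kind = "New"          # no overlap with the previous query's terms
        elif terms == prev_terms:
            kind = "Same"
        else:
            kind = "Modified"     # partial overlap: terms added and/or removed
        labels.append(prefix + kind)
        prev_terms = terms
    return labels

# Session of uid1 from the slide:
# label_queries([("documentquery", "online routing"),
#                ("documentquery", "online routing lipmann"),
#                ("documentquery", "lipmann"),
#                ("documentquery", "Maarten Lipmann"),
#                ("documentquery", "oltsp"),
#                ("citationquery", "oltsp")])
# -> ['docNew', 'docModified', 'docModified', 'docModified', 'docNew', 'citeSame']
```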
46. Query Behavior - Results
47. Query Behavior Analysis - Observations
- Users tend to submit the same query more than once in the same session
- Only a small portion of CiteSeer users submit citation queries (15%)
- CiteSeer users try to modify their queries instead of giving up
- Can query modification be viewed as an indication that the document recommendation or ranking strategy should change?
48. Online Clustering of Search Results
49. Online Clustering of Search Results - Motivation
- So far we have looked at user behavior to improve the user experience and retrieval quality.
- Is it also possible to organize the data adaptively to improve the quality of the results?
- Grouping topically related items together on the result page may improve the coverage and diversity of the displayed results
50. Why Online?
- An offline clustering/classification of search results would not be query dependent.
  - Two documents could be related for the query "machine learning" but not as relevant for a rather more specific one, "machine learning for recommender systems".
- The time of the query changes the set of available documents.
  - A news search for the query "hurricane" today and the same query a week ago would return very different sets of documents
51. Online Clustering of Search Results - Test Case
- To evaluate the performance of clustering on a frequently updated index (with deletions as well as additions of documents), we chose to work on news articles.
- Due to the architecture of the underlying news search engine (Yahoo! News), the implemented solution had to satisfy the following conditions:
  - Access to 100 documents per query
  - Work in a stateless/memoryless environment
  - Fetch the original results, cluster them, and return the clustered results in less than 300 milliseconds
This work was done during an internship at Yahoo! The paper is under review and subject to confidentiality requirements; please ask about the details.
52. Online Clustering of Search Results - Algorithm
- Clustering algorithm
  - Hierarchical Agglomerative Clustering (HAC), mixture of multinomials, Cluto, KNN, and K-means were implemented and compared. The HAC algorithm outperformed the others in the news domain.
  - The HAC algorithm works on the pairwise similarities of news articles.
  - The pairwise similarity matrix is computed at query time
  - The similarity of one cluster to another is computed as a factor of the pairwise document similarities.
  - The similarity metric is an approximation of the cosine similarity measure.
  - The stopping criterion is a threshold on the similarity of the clusters. The optimum threshold value is found by an exhaustive search on a hold-out dataset. (A sketch follows the survey reference below.)
For a survey of clustering algorithms see Berkhin, 2002.
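A minimal sketch of threshold-stopped HAC over cosine similarities, assuming documents are NumPy rows of TF-IDF-style vectors and average-link merging; the production system's similarity approximation, linkage, and tuned threshold are not reproduced here.

```python
import numpy as np

def hac_cluster(doc_vectors, sim_threshold=0.3):
    """Merge the most similar pair of clusters (average pairwise cosine
    similarity of their documents) until no pair exceeds sim_threshold."""
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    X = doc_vectors / np.maximum(norms, 1e-12)   # L2-normalize rows
    sims = X @ X.T                               # cosine similarity matrix
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sims[np.ix_(clusters[i], clusters[j])].mean()
                if s > best:
                    best, best_pair = s, (i, j)
        if best < sim_threshold:                 # stopping criterion
            break
        i, j = best_pair
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters
```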
53. Online Clustering of Search Results - Clustering Algorithm Evaluation
- The HAC clustering algorithm was evaluated against the aforementioned algorithms and Google News clusters by an editorial review.
- 300 queries were chosen at random from that day's user logs.
- Reviewers were shown 4 documents per cluster.
- Results of 2 different algorithms were displayed side by side and reviewers were asked to choose between the two.
- HAC was shown to outperform the rest of the algorithms (the difference was statistically significant)
54. Online Clustering of Search Results - Naming the Clusters
- Users may not realize how and why the documents were grouped together. Cluster names that serve as short descriptions would help users process and navigate the result set more easily.
- Common naming approaches
  - Choose the most representative article for each cluster and use its title as the cluster label (Google News search, http://news.google.com)
  - Find the most representative phrases within the documents belonging to each cluster (Clusty, http://news.clusty.com/)
For Keyphrase Extraction see Zha, SIGIR 2002
55. Online Clustering of Search Results - Naming the Clusters
- Observations
  - News articles are usually formulated to answer more than one of the following questions: where, when, who, what, and how.
  - Articles belonging to the same cluster may share answers to only some of these questions.
- Goal
  - We want to partition the titles into substrings corresponding to one (or more) of these questions, and then choose the most common one among these substrings.
56. Online Clustering of Search Results - Naming the Clusters
- Methodology
  - Use POS tags to find the boundaries between compact substrings.
  - Break the sentence into smaller parts at these points and generate a candidate set composed of these shorter substrings and their contiguous concatenations.
- Example
  - Title: "Wilma forces Caribbean tourists to flee"
  - Substrings: "Wilma", "Caribbean tourists", "flee"
  - Candidates: "Wilma", "Wilma forces Caribbean tourists", "Caribbean tourists", "Wilma forces Caribbean tourists to flee", "flee", "Caribbean tourists to flee"
- Score the candidates based on coverage, frequency, or length, or a combination of these metrics (a scoring sketch follows).
- Choose the candidate with the highest score as the cluster label.
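A minimal sketch of candidate generation and one plausible length-normalized frequency score, assuming each title has already been split into compact chunks (unlike the example above, connecting words between chunks are dropped here); the exact normalization used in the thesis work may differ.

```python
from collections import Counter
from itertools import combinations

def candidates_from_chunks(chunks):
    """All contiguous concatenations of a title's compact substrings."""
    return [" ".join(chunks[i:j])
            for i, j in combinations(range(len(chunks) + 1), 2)]

def best_label(cluster_title_chunks):
    """Pick a cluster label by a length-normalized frequency: count in how many
    titles each candidate occurs, then reward longer candidates to offset the
    natural bias toward short, highly frequent substrings."""
    counts = Counter()
    for chunks in cluster_title_chunks:
        for cand in set(candidates_from_chunks(chunks)):
            counts[cand] += 1
    return max(counts, key=lambda c: counts[c] * len(c.split()))
```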
57. Online Clustering of Search Results - Naming the Clusters
- Scoring algorithms
  - Coverage score: a metric measuring how well the candidate covers the information in all titles.
  - Frequency score: the frequency of the full candidate string.
  - Length-normalized frequency score: candidates with fewer words naturally have higher frequencies; to avoid this bias towards shorter candidates, we normalize the frequency scores based on the number of words they include
58. Online Clustering of Search Results - Naming Evaluation
- 160 queries were chosen at random from the daily query logs. The results of these queries were clustered and saved.
- Candidate names for each cluster were generated offline. Candidates included the titles as well.
- 8 users were asked to either select the best label from the list of candidates or type in their own choice if they could not find an appropriate candidate.
- At least 3 judgments were collected per query.
- 55% of the names chosen by reviewers were titles.
- Experimental results showed that the length-normalized frequency score outperforms the others (60% match with user labels).
- In the case of ties, coverage scores were found to be the best tie-breaker.
59. Work In Progress
- Further query analysis
  - What are the users searching for: titles, abstracts, references, publication dates, venues?
  - Are users viewing/downloading linked documents, or not?
- Investigating community-based adaptation strategies
- Designing a framework for online evaluation of recommender systems
  - Such a platform for CiteSeer (called REFEREE) was designed by Cosley et al. in its early days. Our goal is to make it scale to, and be available in, the next-generation CiteSeer.
Recently published paper on discovering e-communities: Zhou, Manavoglu, Li, Giles, Zha, WWW 2006.
60. My Proposal
- MyCiteSeer
  - A unified adaptive information retrieval engine in which document, action, and query behaviors are all factored in
61. How?
- The action model selects the modules and shortcuts to be displayed
  - e.g., user U is interested only in "users who viewed" document recommendations => only this recommendation module is shown on the document page
- The query model and action model select the query interface
  - e.g., user U searches for titles only and looks at more than 20 results => the default query field is title and the results are clustered
- The document model selects the content, with input from the action and query models.
- The action and query models act as implicit feedback
62. Evaluation?
- A user study would give the most reliable and comprehensive assessment of the system. Individual components as well as the overall system can be evaluated.
- Some form of implicit feedback can also be used for evaluation.
  - Clickthrough rate on the results page, number of actions taken to access a given document/page, time spent completing certain tasks.
- If used together with some explicit user feedback (through random questionnaires, ratings, or feedback via email, blogs, or forums), the appropriate metrics could be identified and used.