Discovery of Aggregate Usage Profiles for Web Personalization - PowerPoint PPT Presentation

Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Jim Wiltshire School of Computer Science, Telecommunications, and Information Systems DePaul University – PowerPoint PPT presentation

Slides: 18

Provided by: aiStanfo1

Learn more at: http://ai.stanford.edu

Transcript and Presenter's Notes

1
Discovery of Aggregate Usage Profiles for Web
Personalization

Bamshad Mobasher, Honghua Dai, Tao Luo, Miki
Nakagawa, Jim Wiltshire

School of Computer Science, Telecommunications,
and Information Systems DePaul University
2
Web Personalization

The Problem
dynamically serve customized content (pages,
products, etc.) to users based on their profiles,
preferences, or expected interests
Current Approaches
rule-based filtering
usually relies on static user profiles, obtained
in part through explicit registration
collaborative filtering
usually requires explicit ratings from users on
similar types of objects
content-based filtering
learn/store personal profiles locally or on the
server side
based on content similarity of user profile to
pages or product descriptions
Limitations of Current Technologies
user input may be subjective and prone to bias
explicit (and non-binary) user ratings may not be
available
profiles may be static and can become outdated
quickly
collaborative filtering has scalability problems
due to sparse data
content-based filtering may miss other semantic
relationships among objects

3
Usage-Based Web Personalization

Basic Idea
find aggregate user profiles by automatically
discovering user access patterns through Web
usage mining (offline process)
data sources for mining include server logs,
other click-stream data (e.g., product-oriented
user events), and site structure
match a user's active session against the
discovered profiles to provide dynamic content
(online process)
Advantages / Goals
profiles are based on objective information (how
users actually use the site)
no explicit user ratings or interaction with
users (to enter a profile, etc.)
helps preserve user privacy, by making effective
use of anonymous data
usage data captures relationships missed by
content-based approaches
can help enhance the effectiveness of
collaborative or content-based filtering
techniques

4
Automatic Web Personalization: Offline Process
[Diagram: offline process. Inputs: server logs, other click-stream data, domain knowledge. Data preparation: data cleaning, session identification, pageview identification, transaction identification, support filtering. Usage mining: transaction clustering, pageview clustering, association-rule discovery. Output: usage profiles.]
5
Automatic Web Personalization: Online Process
[Diagram: online process. The recommendation engine takes input from the batch process and the active session, and produces recommendations.]
6
Data Preparation Tasks

Preprocess and filter logs and other usage data
remove redundant references and create pageviews
use domain knowledge to assign types to pageviews
handle references to scripts creating dynamic
pages
map logs against site topology
Identify user sessions and transactions
heuristics based on IP, referrer, agent fields,
and session time-outs used to identify unique
user sessions (may need to infer missing
references)
intra-session transactions can be obtained based
on a model of user behavior (involves classifying
references as content or navigational for
each user)
weights are assigned to each pageview based on
static pageview types as well as some measure of
user interest (e.g., duration of pageview)
Support filtering - remove very low/high support
pageviews
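
The session-identification and support-filtering steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 30-minute time-out, the `(ip, agent, timestamp, url)` tuple layout, and the function names `sessionize` and `support_filter` are hypothetical and do not come from the slides, and referrer-based heuristics and inference of missing references are left out.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed value, not taken from the slides


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group (ip, agent, timestamp, url) hits into sessions.

    A new session starts whenever the gap between consecutive hits from the
    same (ip, agent) pair exceeds `timeout` seconds.
    """
    by_visitor = defaultdict(list)
    for ip, agent, ts, url in hits:
        by_visitor[(ip, agent)].append((ts, url))

    sessions = []
    for visitor in sorted(by_visitor):
        current, last_ts = [], None
        for ts, url in sorted(by_visitor[visitor]):
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(url)
            last_ts = ts
        sessions.append(current)
    return sessions


def support_filter(sessions, min_support, max_support):
    """Drop pageviews whose support (fraction of sessions containing them)
    falls outside [min_support, max_support]."""
    n = len(sessions)
    counts = defaultdict(int)
    for s in sessions:
        for url in set(s):
            counts[url] += 1
    keep = {u for u, c in counts.items() if min_support <= c / n <= max_support}
    return [[u for u in s if u in keep] for s in sessions]


# Toy log: one visitor returns after more than 30 minutes, so their hits split.
hits = [
    ("10.0.0.1", "Mozilla", 0, "/index"),
    ("10.0.0.1", "Mozilla", 60, "/cfp"),
    ("10.0.0.1", "Mozilla", 4000, "/news"),
    ("10.0.0.2", "Opera", 10, "/index"),
]
sessions = sessionize(hits)
filtered = support_filter(sessions, min_support=0.5, max_support=1.0)
```

In this toy log the first visitor's hits split into two sessions because of the long gap, and only `/index` survives the support filter.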

7
Aggregate Usage Profiles

Characteristics of Aggregate Profiles
the goal is to effectively capture common usage
patterns from potentially anonymous click-stream
data
profiles are represented as weighted collections
of pageviews
weights represent the significance of pageviews
within each profile
profiles are overlapping in order to capture
common interests among different groups/types of
users
multiple profiles may contribute to the
recommendation set for a given user
Example Profiles from the ACR (Assoc. for
Consumer Research) Site

1.00 Call for Papers
0.67 ACR News Special Topics
0.67 CFP Journal of Psychology and Marketing I
0.67 CFP Journal of Psychology and Marketing II
0.67 CFP Journal of Consumer Psychology II
0.67 CFP Journal of Consumer Psychology I

1.00 CFP Winter 2000 SCP Conference
1.00 Call for Papers
0.36 CFP ACR 1999 Asia-Pacific Conference
0.30 ACR 1999 Annual Conference
0.25 ACR News Updates
0.24 Conference Update
8
Methodologies for the Discovery of Aggregate
Profiles

Discovery of Profiles Based on Transaction
Clusters
cluster user transactions - features are
significant pageviews identified in the
preprocessing stage
derive usage profiles (set of pageview-weight
pairs) based on characteristics of each
transaction cluster
Cluster Pageviews
directly compute overlapping clusters of
pageviews based on co-occurrence patterns across
transactions
features are user transactions, so dimensionality
poses a problem for traditional clustering
algorithms
we use Association-Rule Hypergraph Partitioning
with an overlap factor

9
Profile Aggregation Based on Clustering
Transactions (PACT)

Input
set of relevant pageviews in preprocessed log
set of user transactions
each transaction is a pageview vector
Transaction Clusters
each cluster contains a set of transaction
vectors
for each cluster compute centroid as cluster
representative
Aggregate Usage Profiles
a set of pageview-weight pairs: for transaction
cluster C, select each pageview pi such that its
weight (in the cluster centroid) is greater than a
pre-specified threshold
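
A minimal sketch of the profile-derivation step, assuming the transaction clusters have already been produced by some clustering algorithm (not shown here). The function names `centroid` and `usage_profile`, the binary toy vectors, and the threshold of 0.4 are illustrative assumptions rather than values from the presentation.

```python
def centroid(cluster):
    """Mean vector of a list of equal-length transaction vectors."""
    n = len(cluster)
    return [sum(col) / n for col in zip(*cluster)]


def usage_profile(cluster, pageviews, threshold):
    """Return pageview-weight pairs whose centroid weight exceeds `threshold`,
    ordered from highest to lowest weight."""
    weights = centroid(cluster)
    profile = {p: w for p, w in zip(pageviews, weights) if w > threshold}
    return dict(sorted(profile.items(), key=lambda kv: -kv[1]))


# Hypothetical pageviews and one already-formed transaction cluster.
pageviews = ["/cfp", "/news", "/conference", "/membership"]
cluster = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
profile = usage_profile(cluster, pageviews, threshold=0.4)
```

Here the centroid is `[1.0, 0.5, 0.25, 0.25]`, so only `/cfp` and `/news` pass the threshold and enter the profile.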

10
Hypergraph-Based Clustering

Construct a hypergraph from sets of related items
Each hyperedge represents a frequent itemset
Weight of each hyperedge can be based on the
characteristics of frequent itemsets or
association rules

Recursively partition hypergraph so that each
partition contains only highly connected data
items
Given a hypergraph G(V,E) we find a k-way
partitioning such that the weight of the
hyperedges that are cut is minimized
The fitness of partitions is measured in terms of
the ratio of weights of cut edges to the weights
of uncut edges within the partitions
The connectivity measures the percentage of edges
within the partition with which the vertex is
associated -- used for filtering partitions
Vertices from partial edges can be added back to
clusters based on a user-specified overlap factor

11
Profiles Based on Hypergraph Clusters of Pageviews

Input
input for clustering is the set of large itemsets
from association rule module
each itemset is a hyperedge (weights are a
function of the interest of the itemset)

Aggregate Profiles (Pageview Clusters)
hMETIS used as the underlying hypergraph
partitioning algorithm
clustering program directly outputs a set of
overlapping pageview clusters
the weight associated with pageview p in a
cluster C is based on the connectivity value of p
in hypergraph partition
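
The connectivity-based weighting can be illustrated with a small sketch. It assumes the partition has already been produced (hMETIS is not invoked here), treats connectivity as the fraction of hyperedges lying wholly inside the partition that contain the vertex, and ignores hyperedge weights. The names `connectivity` and `cluster_profile` and the toy itemsets are hypothetical.

```python
def connectivity(vertex, partition, hyperedges):
    """Fraction of hyperedges lying inside `partition` that contain `vertex`."""
    inside = [e for e in hyperedges if e <= partition]
    if not inside:
        return 0.0
    return sum(1 for e in inside if vertex in e) / len(inside)


def cluster_profile(partition, hyperedges):
    """Weight each pageview in the partition by its connectivity value."""
    return {v: connectivity(v, partition, hyperedges) for v in sorted(partition)}


# Each hyperedge stands for one frequent itemset of pageviews (toy data).
hyperedges = [
    frozenset({"/cfp", "/news"}),
    frozenset({"/cfp", "/conference"}),
    frozenset({"/cfp", "/news", "/conference"}),
    frozenset({"/news", "/membership"}),
]
partition = frozenset({"/cfp", "/news", "/conference"})
weights = cluster_profile(partition, hyperedges)
```

Three of the four hyperedges fall inside the partition; `/cfp` appears in all three and gets weight 1.0, while `/news` and `/conference` each appear in two of them.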

12
Recommendations Based on Usage Profiles

Match the current user's activity against the
discovered usage profiles
a sliding window over the active session
captures the depth of the current user's
short-term history
usage profiles and the active session are treated
as vectors
matching score is computed based on the
similarity between vectors (e.g., normalized
cosine similarity)
Recommendations
each pageview is assigned a recommendation score
based on
matching score to aggregate profiles
information value of the pageview based on
domain knowledge (e.g., link distance of the
candidate recommendation to the active session)
recommendations are contributed by multiple
matching aggregate profiles
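
The matching and scoring steps might look roughly like the sketch below. Several choices here are assumptions made only for illustration: the active session is encoded as a binary vector over the sliding window, the recommendation score combines profile weight and matching score as the square root of their product, the highest score across profiles is kept, and the domain-knowledge factor (link distance) is omitted. The names `cosine` and `recommend` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend(session, profiles, pageviews, window=2, min_score=0.0):
    """Score pageviews not yet in the sliding window against every profile."""
    recent = set(session[-window:])
    active = [1.0 if p in recent else 0.0 for p in pageviews]
    scores = {}
    for profile in profiles:
        vec = [profile.get(p, 0.0) for p in pageviews]
        match = cosine(active, vec)
        for p, w in profile.items():
            if p in recent:
                continue  # do not recommend what the user just visited
            # Assumed combination rule, not confirmed by the slides.
            score = math.sqrt(w * match)
            scores[p] = max(scores.get(p, 0.0), score)
    return {p: s for p, s in scores.items() if s > min_score}


pageviews = ["/cfp", "/news", "/conference"]
profiles = [{"/cfp": 1.0, "/news": 0.5}, {"/conference": 1.0}]
recs = recommend(["/index", "/cfp"], profiles, pageviews, window=2)
```

With this toy input only the first profile matches the active session, so `/news` is the sole recommendation. A recommendation threshold can be applied through `min_score`.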

13
Experimental Set-up

The Data Sets
Log data from the Association for Consumer
Research Web site
18342 transactions, 62 pageview URLs (after
filtering)
Data set divided into training and evaluation
sets
Evaluation Methodology
Portion of each transaction (based on a specified
window size) in evaluation set was used to
generate a recommendation set (based on a given
recommendation threshold)
For each transaction, the overall coverage of the
recommendation set was divided by the number of
recommendations to produce an accuracy measure
The overall score was computed (for each
threshold) by taking the average scores over all
transactions in the evaluation set
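
One possible reading of this methodology is sketched below. It assumes that "coverage" means the number of recommended pageviews that appear in the unseen remainder of the transaction, and that transactions producing no recommendations are skipped; neither detail is stated in the slides. The names `transaction_accuracy`, `overall_accuracy`, and `toy_recommender` are hypothetical.

```python
def transaction_accuracy(transaction, window, recommender):
    """Hits among the recommendations divided by the number of recommendations.

    The first `window` pageviews are shown to the recommender; the rest of the
    transaction is what the recommendations are checked against.
    """
    seen, rest = transaction[:window], set(transaction[window:])
    recs = set(recommender(seen))
    if not recs:
        return None  # assumed: skip transactions with no recommendations
    return len(recs & rest) / len(recs)


def overall_accuracy(transactions, window, recommender):
    """Average the per-transaction scores over the evaluation set."""
    scores = [transaction_accuracy(t, window, recommender) for t in transactions]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores) if scores else 0.0


def toy_recommender(seen):
    # Stand-in for a real recommendation engine; always returns the same set.
    return ["/news", "/conference"]


evaluation_set = [
    ["/index", "/cfp", "/news"],
    ["/index", "/cfp", "/conference", "/news"],
]
score = overall_accuracy(evaluation_set, window=2, recommender=toy_recommender)
```

For the two toy transactions the per-transaction scores are 0.5 and 1.0, giving an overall score of 0.75. Repeating the computation for several recommendation thresholds would give one score per threshold.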

14
Average Visit Percentage
AVP measures the likelihood that a user who
visits any page in a given profile also visits
other pages in that profile
15
Evaluation: Measuring Recommendation Accuracy
Recommendation accuracy results, using an active
session window of size 3.
16
Evaluation: Impact of Filtering
Comparison of PACT and Hypergraph (using window
size 2) for filtered and unfiltered data sets.
Filtering involved the removal of top-level
navigational pages from the data set, leaving
only deeper content-oriented pages.
17
Conclusions

Usage-Based Web Personalization
results suggest that effective personalization
can be achieved even with anonymous and
short-term click-stream data
possibly useful in the early stages of
personalization when more detailed profiles are
not available for individual users
could be used effectively in conjunction with
content-based or collaborative filtering
methods
Which Method is Best?
PACT may be most appropriate when the goal is to
provide a more general personalization solution
involving a variety of objects across the whole
site
Hypergraph may be most appropriate when the goal
is to provide a highly focused set of
recommendations for specific portions of the site
In practice, usage-based methods need to be
combined with other techniques to provide an
integrated solution