Title: Integrating Web Usage and Content Mining for More Effective Personalization
1 Integrating Web Usage and Content Mining for More Effective Personalization
- Bamshad Mobasher, Honghua Dai, Tao Luo, Yuqing Sun and Jiang Zhu
- Presented by Karthikeyan Sankarlingam and Ranganathan Sankaralingam
2 Motivation
- Web personalization has become an indispensable part of e-commerce.
- Collaborative and content-based filtering have problems: scalability, reliance on user ratings, and inability to capture rich semantic relationships.
- The solution: personalization based on Web usage mining.
3 Web usage mining Framework
4 Web usage mining Framework
5 Data preparation: Usage preprocessing
- Data cleaning, user identification, session identification, pageview identification and path completion
- Transaction identification
- Support filtering to remove noise:
- Shallow navigation patterns of non-active users
- Pageviews not useful for personalization
6 Data preparation: Usage preprocessing output
- Transaction file with n pageviews: P = {p1, p2, ..., pn}
- Set of m user transactions: T = {t1, t2, ..., tm}, where each ti is a subset of P
- Each transaction t is represented as an n-dimensional vector: t = <w(p1,t), w(p2,t), ..., w(pn,t)>
- Weights can be binary values denoting the presence or absence of a page, or a function of how long the page was viewed, to capture user interest
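The two weighting options above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `transaction_vector`, the `visits` dictionary of viewing times in seconds, and the choice to scale durations by the longest view in the transaction are all assumptions made for the example.

```python
def transaction_vector(pageviews, visits, binary=True):
    """Build the n-dimensional weight vector for one transaction.

    pageviews: ordered list of all pageview ids (the set P).
    visits: dict mapping each visited pageview id to seconds spent on it.
    """
    if binary:
        # 1.0 if the page occurs in the transaction, 0.0 otherwise.
        return [1.0 if p in visits else 0.0 for p in pageviews]
    longest = max(visits.values(), default=0.0)
    if longest == 0:
        return [0.0] * len(pageviews)
    # Assumed duration function: viewing time scaled by the longest view.
    return [visits.get(p, 0.0) / longest for p in pageviews]


P = ["p1", "p2", "p3", "p4"]
t = {"p1": 30.0, "p3": 60.0}  # hypothetical transaction: two pages viewed

binary_vec = transaction_vector(P, t)
duration_vec = transaction_vector(P, t, binary=False)
print(binary_vec)    # [1.0, 0.0, 1.0, 0.0]
print(duration_vec)  # [0.5, 0.0, 1.0, 0.0]
```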
7 Content preprocessing
- Extract features from text and metadata: XML or HTML meta tags (feature weights are provided in the metadata)
- Total number of features extracted: k
- The pageview of a page p is represented as a k-dimensional feature vector: p = <fw(f1,p), fw(f2,p), ..., fw(fk,p)>
- fw(fi,p) is the weight of the ith feature in page p, 1 ≤ i ≤ k
8 Web usage mining Framework
9 Aggregate Usage Profiles
- A profile captures an aggregate view of the behavior of subsets of users
- Aggregate profiles should capture overlapping interests
- Profiles should distinguish between pageviews in terms of their significance within a profile
10 Building aggregate usage profiles
- Cluster the set of transactions T using a standard similarity-based clustering algorithm such as k-means.
- Generate usage profiles from these clusters.
- Each page p is given a significance weight, weight(p)
- The usage profile pr_c is a vector of pairs <p, weight(p)> for all pages p in cluster c
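A minimal sketch of turning one transaction cluster into a usage profile. The slide does not give the formula for weight(p), so this example assumes it is the mean weight of p over the transactions in the cluster; the cutoff `min_weight` and the function name `usage_profile` are likewise hypothetical.

```python
def usage_profile(pageviews, cluster, min_weight=0.5):
    """Derive a usage profile from one cluster of transaction vectors.

    pageviews: ordered list of pageview ids (the set P).
    cluster: list of transaction vectors, each of length len(pageviews).
    Returns a dict {pageview id: weight} for pages at or above min_weight.
    """
    size = len(cluster)
    profile = {}
    for j, p in enumerate(pageviews):
        # Assumed significance weight: mean weight of p within the cluster.
        weight = sum(t[j] for t in cluster) / size
        if weight >= min_weight:
            profile[p] = weight
    return profile


P = ["p1", "p2", "p3"]
# Hypothetical cluster of four binary transaction vectors.
cluster = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
]
profile = usage_profile(P, cluster)
print(profile)  # {'p1': 1.0, 'p3': 0.5}
```

Here p2 appears in only one of four transactions (weight 0.25), so it falls below the assumed cutoff and is left out of the profile.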
11 Building content profiles
- Discover overlapping interests in the content of the pages.
- Clustering pageviews as k-dimensional feature vectors does not work
- Instead, cluster the features, each represented as an n-dimensional vector over the pageviews
- Obtain content profiles using significance weights on the pageviews in each feature cluster.
12 Building content profiles
- Cluster feature vectors using a multivariate k-means clustering method.
- Each pageview p in a cluster G is given a significance weight, weight(p)
- The content profile C_G is a vector of pairs <p, weight(p)> for all pages in cluster G
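The switch from pageview vectors to feature vectors can be sketched as a matrix transpose followed by per-cluster averaging. This is an illustration under stated assumptions: the clustering step itself is skipped and the feature clusters G1 and G2 are simply given, weight(p) is assumed to be the mean weight of p over the features in the cluster, and `min_weight` is a hypothetical cutoff.

```python
def transpose(matrix):
    """Turn a page-by-feature matrix into a feature-by-page matrix."""
    return [list(column) for column in zip(*matrix)]


def content_profile(pageviews, feature_rows, min_weight=0.5):
    """Derive a content profile from one cluster of feature vectors.

    feature_rows: list of n-dimensional feature vectors (one per feature).
    """
    size = len(feature_rows)
    profile = {}
    for j, p in enumerate(pageviews):
        # Assumed significance weight: mean weight of p over the cluster.
        weight = sum(row[j] for row in feature_rows) / size
        if weight >= min_weight:
            profile[p] = weight
    return profile


P = ["p1", "p2", "p3"]
# Rows are pageviews, columns are features f1..f4 (k = 4), made-up weights.
page_by_feature = [
    [1.0, 1.0, 0.0, 0.0],  # p1
    [1.0, 0.0, 0.0, 1.0],  # p2
    [0.0, 0.0, 1.0, 1.0],  # p3
]
feature_by_page = transpose(page_by_feature)

# Assume a clustering step grouped f1, f2 into G1 and f3, f4 into G2.
G1 = feature_by_page[0:2]
G2 = feature_by_page[2:4]
profile_g1 = content_profile(P, G1)
profile_g2 = content_profile(P, G2)
print(profile_g1)  # {'p1': 1.0, 'p2': 0.5}
print(profile_g2)  # {'p2': 0.5, 'p3': 1.0}
```

Page p2 ends up in both profiles, which is the kind of overlap that clustering the pageviews directly into disjoint groups would not produce.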
13 Web usage mining Framework
14 Integrating both profiles for personalization
- Match the current user's activity against the aggregate profiles and recommend.
- The user session is represented as a pageview vector S, with a significance weight if the page was visited and 0 otherwise.
- Calculate match(S,C) using cosine similarity for all content and usage profiles C.
- Define Rec(S,p) = sqrt(weight(p,C) * match(S,C))
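The matching and recommendation scores above can be sketched as follows. Representing the session and each profile as a dict from pageview id to weight is an assumption of this example, as are the function names `match` and `rec`.

```python
import math


def match(session, profile):
    """Cosine similarity between a session vector S and a profile C.

    Both are dicts {pageview id: weight}; missing pages count as 0.
    """
    dot = sum(w * session.get(p, 0.0) for p, w in profile.items())
    norm_s = math.sqrt(sum(v * v for v in session.values()))
    norm_c = math.sqrt(sum(w * w for w in profile.values()))
    if norm_s == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_s * norm_c)


def rec(session, profile, page):
    """Rec(S,p) = sqrt(weight(p,C) * match(S,C))."""
    return math.sqrt(profile.get(page, 0.0) * match(session, profile))


S = {"p1": 1.0}             # hypothetical session: only p1 visited
C = {"p1": 1.0, "p3": 1.0}  # hypothetical profile

print(round(match(S, C), 4))      # 0.7071
print(round(rec(S, C, "p3"), 4))  # 0.8409
```

Taking the square root of the product keeps the score between 0 and 1 while still rewarding pages that are both significant in the profile and belong to a profile that matches the session well.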
15 Returning results
- Compute Rec(S,p) for all pageviews in all content and usage profiles.
- If Rec(S,p) > threshold in either kind of profile, return page p.
- Different methods can be used to combine usage and content profiles.
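One simple way to combine the two kinds of profile is to score every page against every profile and keep any page that clears the threshold in at least one of them. The sketch below does that; the threshold value, the decision to skip pages already visited in the session, and the ranking by best score are assumptions of this example rather than details given on the slide.

```python
import math


def match(session, profile):
    """Cosine similarity between a session and a profile (dicts of weights)."""
    dot = sum(w * session.get(p, 0.0) for p, w in profile.items())
    norm_s = math.sqrt(sum(v * v for v in session.values()))
    norm_c = math.sqrt(sum(w * w for w in profile.values()))
    if norm_s == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_s * norm_c)


def recommend(session, usage_profiles, content_profiles, threshold=0.5):
    """Return pages whose Rec score exceeds threshold in any profile."""
    best = {}
    for profile in usage_profiles + content_profiles:
        m = match(session, profile)
        for page, weight in profile.items():
            if page in session:
                continue  # assumption: do not recommend visited pages
            score = math.sqrt(weight * m)
            if score > threshold:
                best[page] = max(score, best.get(page, 0.0))
    # Assumption: rank recommendations by their best score, highest first.
    return sorted(best, key=best.get, reverse=True)


S = {"p1": 1.0}                                  # hypothetical session
usage = [{"p1": 1.0, "p3": 1.0}]                 # hypothetical usage profile
content = [{"p1": 1.0, "p2": 0.5}, {"p4": 1.0}]  # hypothetical content profiles

recommended = recommend(S, usage, content)
print(recommended)  # ['p3', 'p2']
```

In this example p3 is recommended by the usage profile and p2 only by a content profile, while p4 is dropped because its profile has no overlap with the session.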
16 Experimental results
- Usage data from the Association for Consumer Research (July 1998 to June 1999)
- 18430 user transactions with 62 unique pageviews.
- 16 transaction clusters generated
- 566 features extracted, 28 feature clusters
17 Results: Data set example
Overlapping content profile example
18 Results using Usage Profiles
19 Results using Content profiles