Integrating Web Usage and Content Mining for More Effective Personalization - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Integrating Web Usage and Content Mining for More
Effective Personalization
  • Bamshad Mobasher, Honghua Dai, Tao Luo, Yuqing Sun
    and Jiang Zhu
  • Presented by Karthikeyan Sankarlingam and
    Ranganathan Sankaralingam

2
Motivation
  • Web personalization has become an indispensable
    part of e-commerce.
  • Collaborative and content-based filtering have
    problems:
  • Scalability, reliance on user ratings, inability
    to capture rich semantic relationships
  • The solution: personalization based on Web usage
    mining.

3
Web usage mining Framework
4
Web usage mining Framework
5
Data preparation: Usage preprocessing
  • Data cleaning, user identification, session
    identification, pageview identification and path
    completion
  • Transaction identification
  • Support filtering to remove noise:
  • Shallow navigation patterns of non-active users
  • Pageviews not useful for personalization
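
The support-filtering step above can be sketched roughly as follows. This is only an illustration: the function name, the three thresholds and the sample transactions are assumptions, since the slides do not say which cut-offs were used.

```python
from collections import Counter

def support_filter(transactions, min_support, max_support, min_length):
    """Drop noisy pageviews, then drop shallow transactions.

    All three thresholds are illustrative assumptions, not values
    taken from the presentation.
    """
    total = len(transactions)
    # Support of a pageview = fraction of transactions that contain it.
    counts = Counter(p for t in transactions for p in set(t))
    keep = {p for p, c in counts.items()
            if min_support <= c / total <= max_support}
    filtered = [[p for p in t if p in keep] for t in transactions]
    # Very short transactions say little about a user's interests.
    return [t for t in filtered if len(t) >= min_length]

# Made-up transactions: "home" appears everywhere, "press" almost nowhere.
transactions = [
    ["home", "papers", "conference"],
    ["home", "papers"],
    ["home", "conference", "press"],
    ["home"],
]
cleaned = support_filter(transactions, min_support=0.3,
                         max_support=0.9, min_length=2)
print(cleaned)
```

With these made-up numbers, "home" is dropped for being in every transaction, "press" for being too rare, and only the first transaction is still long enough to keep.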

6
Data preparation: Usage preprocessing output
  • Transaction file with n pageviews
  • P = {p1, p2, ..., pn}
  • Set of m user transactions
  • T = {t1, t2, ..., tm}, where each ti is a subset of P
  • Each transaction t is represented as an n-dimensional
    vector
  • t = <w(p1,t), w(p2,t), ..., w(pn,t)>
  • Weights can be binary values denoting the
    presence or absence of a page, or a function of
    how long the page was viewed, to capture user
    interest
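
As a rough illustration of this vector representation, the sketch below builds one binary and one duration-based transaction vector. The page names, the sample data and the choice to scale durations by the longest view in the transaction are assumptions made for the example.

```python
# Hypothetical pageview set P (n = 4); the names are illustrative only.
PAGES = ["home", "papers", "conference", "contact"]

def binary_vector(transaction, pages=PAGES):
    """w(p, t) = 1 if page p occurs in transaction t, else 0."""
    visited = set(transaction)
    return [1 if p in visited else 0 for p in pages]

def duration_vector(durations, pages=PAGES):
    """w(p, t) = viewing time of p divided by the longest view in t.

    The scaling is an assumption; the slides only say the weight may be
    a function of the viewing duration.
    """
    longest = max(durations.values(), default=0)
    if longest == 0:
        return [0.0] * len(pages)
    return [durations.get(p, 0) / longest for p in pages]

t1 = ["home", "papers"]               # pages seen in one transaction
t2 = {"home": 10, "conference": 40}   # seconds per page (made-up data)

v1 = binary_vector(t1)
v2 = duration_vector(t2)
print(v1)
print(v2)
```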

7
Content preprocessing
  • Extract features from text and metadata (XML or
    HTML meta tags); feature weights are provided in
    the metadata
  • Total number of features extracted = k
  • The pageview of a page p is represented as a
    k-dimensional feature vector
  • p = <fw(f1,p), fw(f2,p), ..., fw(fk,p)>
  • fw(fi,p) = weight of the ith feature in page p,
    1 ≤ i ≤ k
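
One possible way to obtain such a feature vector from HTML meta tags is sketched below with Python's standard html.parser module. The slides do not describe the metadata format, so the use of the "keywords" meta tag, the uniform weight of 1.0 and the sample page are all assumptions.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "keywords":
            content = attr.get("content") or ""
            self.keywords += [k.strip().lower()
                              for k in content.split(",") if k.strip()]

def feature_vector(html, features):
    """Return <fw(f1,p), ..., fw(fk,p)> for one page.

    Every keyword found gets weight 1.0 (an assumption; real weights
    would come from the metadata itself).
    """
    parser = MetaKeywordParser()
    parser.feed(html)
    present = set(parser.keywords)
    return [1.0 if f in present else 0.0 for f in features]

FEATURES = ["consumer", "research", "conference"]   # hypothetical, k = 3
page = ('<html><head><meta name="keywords" '
        'content="Consumer, Research"></head><body></body></html>')
vector = feature_vector(page, FEATURES)
print(vector)
```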

8
Web usage mining Framework
9
Aggregate Usage Profiles
  • A profile captures an aggregate view of the
    behavior of subsets of users
  • Aggregate profiles should capture overlapping
    interests
  • Distinguish between pageviews in terms of their
    significance within a profile

10
Building aggregate usage profiles
  • Cluster the set of transactions T using a
    standard similarity-based clustering algorithm
    such as k-means.
  • Generate usage profiles from these clusters.
  • Significance weight of a page: weight(p)
  • Usage profile pr_c is a vector of <p, weight(p)>
    pairs for all pages p in cluster c
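
The two steps above might look roughly like this in Python. This is a minimal sketch: the plain k-means with Euclidean distance, the deterministic initialization, the sample transactions and the min_weight cut-off are assumptions. The slide's formula for weight(p) did not survive, so the mean weight of a page across the cluster's transactions is used here as one plausible choice.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain k-means; returns one cluster index per vector.

    Centroids start at the first k vectors, an arbitrary but
    deterministic choice made for this sketch.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

def usage_profile(cluster, pages, min_weight):
    """Assumed weight(p): mean of w(p, t) over the cluster's transactions."""
    size = len(cluster)
    weights = [sum(col) / size for col in zip(*cluster)]
    return {p: w for p, w in zip(pages, weights) if w >= min_weight}

PAGES = ["home", "papers", "conference", "contact"]   # hypothetical
T = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 0], [0, 1, 1, 1]]

labels = kmeans(T, k=2)
profiles = []
for c in range(2):
    cluster = [t for t, lab in zip(T, labels) if lab == c]
    profiles.append(usage_profile(cluster, PAGES, min_weight=0.5))
print(labels)
print(profiles)
```

In this toy data the two profiles share "papers" and "conference", which is the kind of overlapping interest the aggregate profiles are meant to capture.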

11
Building content profiles
  • Discover overlapping interests in content in the
    pages.
  • Clustering pageviews as k-dimensional feature
    vectors does not work
  • Instead, cluster the features, each represented
    as an n-dimensional vector over the pageviews
  • Obtain content profiles by applying significance
    weights to the pageviews in each feature
    cluster.
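
The switch from pageview vectors to feature vectors amounts to transposing the pageview-by-feature matrix, as in the sketch below. The feature names and weights are invented for illustration.

```python
# Rows: pageviews as k-dimensional feature vectors
# (n = 4 pageviews, k = 3 features; all values are made up).
FEATURES = ["consumer", "research", "conference"]
pageview_vectors = [
    [0.9, 0.1, 0.0],   # p1
    [0.8, 0.7, 0.0],   # p2
    [0.0, 0.6, 0.5],   # p3
    [0.0, 0.0, 0.9],   # p4
]

def features_as_pageview_vectors(matrix):
    """Transpose so each feature becomes an n-dimensional vector
    whose i-th entry is that feature's weight in pageview i."""
    return [list(column) for column in zip(*matrix)]

feature_vectors = features_as_pageview_vectors(pageview_vectors)
for name, vec in zip(FEATURES, feature_vectors):
    print(name, vec)
```

Clustering these rows groups features that occur in similar sets of pages, and a single page can carry weight in several feature clusters.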

12
Building content profiles
  • Cluster feature vectors using a multivariate
    k-means clustering method.
  • Significance weight of a pageview p in a
    cluster G: weight(p)
  • Content profile C_G is a vector of <p, weight(p)>
    pairs for all pages in cluster G
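
A content profile could then be derived from a feature cluster along the following lines. The slide's weight formula is missing, so this sketch assumes weight(p) is the mean weight of the cluster's features in pageview p. The page names, the two clusters and the min_weight cut-off are likewise hypothetical.

```python
PAGES = ["p1", "p2", "p3", "p4"]   # hypothetical pageviews, n = 4

def content_profile(feature_cluster, pages, min_weight):
    """feature_cluster: n-dimensional vectors, one per feature in G.

    Assumed weight(p): mean of fw(f, p) over the features f in G.
    """
    size = len(feature_cluster)
    weights = [sum(col) / size for col in zip(*feature_cluster)]
    return {p: w for p, w in zip(pages, weights) if w >= min_weight}

# Two made-up feature clusters, each holding two feature vectors.
G1 = [[0.5, 1.0, 0.0, 0.0], [0.5, 0.5, 1.0, 0.0]]
G2 = [[0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.5, 1.0]]

C1 = content_profile(G1, PAGES, min_weight=0.5)
C2 = content_profile(G2, PAGES, min_weight=0.5)
shared = sorted(set(C1) & set(C2))
print(C1)
print(C2)
print(shared)
```

Pages p2 and p3 end up in both profiles here, showing how feature clustering lets content profiles overlap.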

13
Web usage mining Framework
14
Integrating both profiles for personalization
  • Match the current user's activity against the
    aggregate profiles and recommend pages.
  • A user session is represented as a pageview vector
    S, with a significance weight where the page was
    visited and 0 otherwise.
  • Calculate match(S,C) using cosine similarity
    for every content and usage profile C.
  • Define Rec(S,p) = sqrt(weight(p,C) * match(S,C))
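
The matching and scoring step can be sketched as follows, reading the slide's Rec(S,p) as the square root of weight(p,C) times match(S,C). Storing sessions and profiles as page-to-weight dictionaries, and the sample values, are assumptions of this example.

```python
import math

def match(session, profile):
    """Cosine similarity between session S and profile C.

    Both are dicts mapping page -> weight; missing pages count as 0.
    """
    dot = sum(w * session.get(p, 0.0) for p, w in profile.items())
    norm_s = math.sqrt(sum(v * v for v in session.values()))
    norm_c = math.sqrt(sum(w * w for w in profile.values()))
    if norm_s == 0 or norm_c == 0:
        return 0.0
    return dot / (norm_s * norm_c)

def rec(session, profile, page):
    """Rec(S, p) = sqrt(weight(p, C) * match(S, C))."""
    return math.sqrt(profile.get(page, 0.0) * match(session, profile))

# Made-up active session and aggregate profile.
S = {"home": 1.0, "papers": 1.0}
C = {"home": 1.0, "papers": 1.0, "conference": 0.5}

similarity = match(S, C)
score = rec(S, C, "conference")
print(similarity, score)
```

A page that is absent from the profile has weight 0 and therefore a score of 0, however well the session matches.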

15
Returning results
  • Get Rec(S,p) for all pageviews in all content and
    usage profiles.
  • If Rec(S,p) > threshold in either profile, then
    return page p.
  • Different methods can be used to combine usage
    and content profiles.
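
A minimal sketch of the "either profile" rule is given below. It takes already-computed Rec scores per page. Combining the two sources by their maximum, the ranking order, the threshold value and the scores themselves are assumptions, and only one of several possible combination methods.

```python
def recommend(usage_scores, content_scores, threshold):
    """Return (page, score) pairs whose Rec score exceeds the threshold
    in either the usage or the content profiles, best first.

    Taking the maximum of the two scores is an assumed way of
    combining them.
    """
    pages = set(usage_scores) | set(content_scores)
    combined = {p: max(usage_scores.get(p, 0.0), content_scores.get(p, 0.0))
                for p in pages}
    hits = [(p, s) for p, s in combined.items() if s > threshold]
    # Sort by descending score, then by page name for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))

# Made-up Rec(S, p) values for one active session.
usage_scores = {"papers": 0.84, "contact": 0.2}
content_scores = {"conference": 0.6, "contact": 0.3}

results = recommend(usage_scores, content_scores, threshold=0.5)
print(results)
```

Here "conference" is recommended from content profiles alone, which suggests how content profiles could surface pages that usage data does not yet support.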

16
Experimental results
  • Usage data from Assoc. for Consumer Research
    (July 98-Jun 99)
  • 18430 user transactions with 62 unique pageviews.
  • 16 transaction clusters generated
  • 566 features extracted, 28 feature clusters

17
Results: Data set example
Overlapping Content profile example
18
Results using Usage Profiles
19
Results using Content profiles