Discovery of Aggregate Usage Profiles for Web Personalization - PowerPoint PPT Presentation

About This Presentation
Title:

Discovery of Aggregate Usage Profiles for Web Personalization

Description:

Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Jim Wiltshire School of Computer Science, Telecommunications, and Information Systems DePaul University – PowerPoint PPT presentation

Number of Views:61
Avg rating:3.0/5.0
Slides: 18
Provided by: aiStanfo1
Learn more at: http://ai.stanford.edu
Category:

less

Transcript and Presenter's Notes

Title: Discovery of Aggregate Usage Profiles for Web Personalization


1
Discovery of Aggregate Usage Profiles for Web
Personalization
  • Bamshad Mobasher, Honghua Dai, Tao Luo, Miki
    Nakagawa, Jim Wiltshire

School of Computer Science, Telecommunications,
and Information Systems DePaul University
2
Web Personalization
  • The Problem
  • dynamically serve customized content (pages,
    products, etc.) to users based on their profiles,
    preferences, or expected interests
  • Current Approaches
  • rule-based filtering
  • usually relies on static profile for users in
    part obtained through explicit registration
  • collaborative filtering
  • usually requires explicit ratings from users on
    similar types of objects
  • content-based filtering learn/store personal
    profiles locally or on server-side
  • based on content similarity of user profile to
    pages or product descriptions
  • Limitations of Current Technologies
  • user input may be subjective and prone to bias
  • explicit (and non-binary) user ratings may not be
    available
  • profiles may be static and can become outdated
    quickly
  • collaborative filtering problems with
    scalability due to sparse data
  • content-based filtering may miss other semantic
    relationships among objects

3
Usage-Based Web Personalization
  • Basic Idea
  • find aggregate user profiles by automatically
    discovering user access patterns through Web
    usage mining (offline process)
  • data sources for mining include server logs,
    other click-stream data (e.g., product-oriented
    user events), and site structure
  • match a users active session against the
    discovered profiles to provide dynamic content
    (online process)
  • Advantages / Goals
  • profiles are based on objective information (how
    users actually use the site)
  • no explicit user ratings or interaction with
    users (to enter a profile, etc.)
  • helps preserve user privacy, by making effective
    use of anonymous data
  • usage data captures relationships missed by
    content-based approaches
  • can help enhance the effectiveness of
    collaborative or content-based filtering
    techniques

4
Automatic Web PersonalizationOffline Process
Data Preparation
Usage Mining
Transaction Clustering Pageview Clustering
Usage Profiles
Data Cleaning Session Identification Pageview
Identification Transaction Identification Support
Filtering
Server Logs Other Click-Stream Data
Association-Rule Discovery
Domain Knowledge
5
Automatic Web PersonalizationOnline Process
Recommendation Engine
Input from the batch process
Recommendations
Active Session
6
Data Preparation Tasks
  • Preprocess and filter logs and other usage data
  • remove redundant references and create pageviews
  • domain knowledge to assign types to pageviews
  • handle references to scripts creating dynamic
    pages
  • map logs against site topology
  • Identify user sessions and transactions
  • heuristics based on IP, referrer, agent fields,
    and session time-outs used to identify unique
    user sessions (may need to infer missing
    references)
  • intra-session transactions can be obtained based
    on a model of user behavior (involves classifying
    references as content or navigational for
    each user)
  • weights are assigned to each pageview based on
    static pageview types as well as some measure of
    user interest (e.g., duration of pageview)
  • Support filtering - remove very low/high support
    pageviews

7
Aggregate Usage Profiles
  • Characteristics of Aggregate Profiles
  • the goal is to effectively capture common usage
    patterns from potentially anonymous click-stream
    data
  • profiles are represented as weighted collections
    of pageviews
  • weights represent the significance of pageviews
    within each profile
  • profiles are overlapping in order to capture
    common interests among different groups/types of
    users
  • multiple profiles may contribute to the
    recommendation set for a given user
  • Example Profiles from the ACR (Assoc. for
    Consumer Research) Site

1.00 Call for Papers 0.67 ACR News Special
Topics 0.67 CFP Journal of Psychology and
Marketing I 0.67 CFP Journal of Psychology and
Marketing II 0.67 CFP Journal of Consumer
Psychology II 0.67 CFP Journal of Consumer
Psychology I
1.00 CFP Winter 2000 SCP Conference 1.00 Call
for Papers 0.36 CFP ACR 1999 Asia-Pacific
Conference 0.30 ACR 1999 Annual
Conference 0.25 ACR News Updates 0.24 Conference
Update
8
Methodologies for the Discovery of Aggregate
Profiles
  • Discovery of Profiles Based on Transaction
    Clusters
  • cluster user transactions - features are
    significant pageviews identified in the
    preprocessing stage
  • derive usage profiles (set of pageview-weight
    pairs) based on characteristics of each
    transaction cluster
  • Cluster Pageviews
  • directly compute overlapping clusters of
    pageviews based on co-occurrence patterns across
    transactions
  • features are user transactions, so dimensionality
    poses a problem for traditional clustering
    algorithms
  • we use Association-Rule Hypergraph Partitioning
    with an overlap factor

9
Profile Aggregation Based on Clustering
Transactions (PACT)
  • Input
  • set of relevant pageviews in preprocessed log
  • set of user transactions
  • each transaction is a pageview vector
  • Transaction Clusters
  • each cluster contains a set of transaction
    vectors
  • for each cluster compute centroid as cluster
    representative
  • Aggregate Usage Profiles
  • a set of pageview-weight pairs for transaction
    cluster C, select each pageview pi such that
    (in the cluster centroid) is greater than a
    pre-specified threshold

10
Hypergraph-Based Clustering
  • Construct a hypergraph from sets of related items
  • Each hyperedge represents a frequent itemset
  • Weight of each hyperedge can be based on the
    characteristics of frequent itemsets or
    association rules
  • Recursively partition hypergraph so that each
    partition contains only highly connected data
    items
  • Given a hypergraph G(V,E) we find a k-way
    partitioning such that the weight of the
    hyperedges that are cut is minimized
  • The fitness of partitions measured in terms of
    the ratio of weights of cut edges to the weights
    of uncut edges within the partitions
  • The connectivity measures the percentage of edges
    within the partition with which the vertex is
    associated -- used for filtering partitions
  • Vertices from partial edges can be added back to
    clusters based on a user-specified overlap factor

11
Profiles Based on Hypergraph Clusters of Pageviews
  • Input
  • input for clustering is the set of large itemsets
    from association rule module
  • each itemset is a hyperedge (weights are a
    function of the interest of the itemset)
  • Aggregate Profiles (Pageview Clusters)
  • hMETIS used as the underlying hypergraph
    partitioning algorithm
  • clustering program directly outputs a set of
    overlapping pageview clusters
  • the weight associated with pageview p in a
    cluster C is based on the connectivity value of p
    in hypergraph partition

12
Recommendations Based on Usage Profiles
  • Match current users activity against the
    discovered usage profiles
  • a sliding window over the active session to
    capture the current users short-term history
    depth
  • usage profiles and the active session are treated
    as vectors
  • matching score is computed based on the
    similarity between vectors (e.g, normalized
    cosine similarity)
  • Recommendations
  • each pageview is assigned a recommendation score
    based on
  • matching score to aggregate profiles
  • information value of the pageview based on
    domain knowledge (e.g., link distance of the
    candidate recommendation to the active session)
  • recommendations are contributed by multiple
    matching aggregate profiles

13
Experimental Set-up
  • The Data Sets
  • Log data from the Association for Consumer
    Research Web site
  • 18342 transactions, 62 pageview URLs (after
    filtering)
  • Data set divided into training and evaluation
    sets
  • Evaluation Methodology
  • Portion of each transaction (based on a specified
    window size) in evaluation set was used to
    generate a recommendation set (based on a given
    recommendation threshold)
  • For each transaction, the overall coverage of the
    recommendation set was divided by the number of
    recommendations to produce an accuracy measure
  • The overall score was computed (for each
    threshold) by taking the average scores over all
    transactions in the evaluation set

14
Average Visit Percentage
AVP measures the likelihood that a user who
visits any page in a Given profile, also visits
other pages in that profile
15
Evaluation Measuring Recommendation Accuracy
Recommendation accuracy results, using a active
session window of size 3.
16
Evaluation Impact of Filtering
Comparison of PACT and Hypergraph (using window
size 2) for filtered and unfiltered data sets.
Filtering involved the removal of top-level
navigational pages from the data set, leaving
only deeper content-oriented pages.
17
Conclusions
  • Usage-Based Web Personalization
  • results suggest that effective personalization
    can be achieved even with anonymous and
    short-term click-stream data
  • possibly useful in the early stages of
    personalization when more detailed profiles are
    not available for individual users
  • could be used effectively in conjunction with
    other methods based on content-based or
    collaborative filtering
  • Which Method is Best?
  • PACT may be most appropriate when the goal is to
    provide a more general personalization solution
    involving a variety of objects across the whole
    site
  • Hypergraph may be most appropriate when the goal
    is to provide a highly focused set of
    recommendations for specific portions of the site
  • In practice, usage-based methods need to be
    combined with other techniques to provide an
    integrated solution
Write a Comment
User Comments (0)
About PowerShow.com