1
Model-Based Clustering and Visualization of
Navigation Patterns on a Web Site
  • I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S.
    White

Presented by Motaz El Saban
2
Outline of the talk
  • Introduction and problem definition.
  • Model-based clustering.
  • Model learning.
  • Application to Msnbc.com IIS log data.
  • Data Visualization.
  • Scalability.
  • Why mixtures of first-order Markov models?
  • Conclusions.
  • Future work.

3
Introduction
  • A new methodology for analyzing web navigation
    patterns, a form of human behavior in digital
    environments.
  • A pattern is the sequence of URL categories
    traversed by a user, stored in web-server logs
    over a 24-hour period on msnbc.com.
  • Functionality:
  • Clustering users based on navigation patterns.
  • Visualization (the WebCANVAS tool).

4
Web data analysis approach
  • Clustering
  • Partition users into clusters so that users with
    similar dynamic behavior fall in the same
    cluster.
  • Visualization
  • Display the behavior of the users within each
    cluster.

5
Related Work
  • Most previous work on web navigation patterns and
    visualization uses non-probabilistic methods
    [YAN96, CHE98], mostly finding rules that govern
    navigation patterns.
  • Other work used probabilistic methods for
    predicting the behavior of users on web pages,
    but not for clustering: random walk models
    [HUB97], Markov models for pre-fetching pages
    [PAD96], and kth-order Markov models of the next
    probable link [BOR00].
  • These approaches use a single Markov model for
    all users rather than clustering the users first.

6
Related Work
  • On the clustering side, [FU00] applied BIRCH to
    cluster user web navigation patterns.
  • No previously known work uses probabilistic
    clustering for sequence-based clustering and
    visualization of web navigation.
  • Rather, user history has been visualized using
    visual metaphors of maps, paths, and signposts
    [WEX99].
  • [MIN99] uses planar graphs to visualize crowds of
    users at particular web pages.

7
What do we mean by pattern?
8
Challenges
  • Web navigation patterns are dynamic; static
    techniques such as histograms cannot capture
    them → Markov models.
  • Different users have heterogeneous dynamic
    behavior → mixture of models.
  • Large data size.
  • The proposed algorithm for learning the mixture
    of 1st-order Markov models has runtime
    O(KNL + KM²), where
  • K = number of clusters,
  • N = number of sequences,
  • L = average length of a sequence,
  • M = number of web page categories.
  • For the typically small M, the algorithm scales
    linearly with N and K.
  • Hierarchical clustering methods scale as O(N²).

9
Model-Based Clustering
  • Assume the data is generated as follows:
  • A user arrives at the web site and is assigned to
    one of K clusters with some probability, and
  • given that the user is in this cluster, his
    behavior is generated from a statistical model
    specific to that cluster.
  • Let X be a multivariate random variable taking on
    values corresponding to the behavior of
    individual users.
  • Let C be a discrete-valued variable taking on
    values c_1, ..., c_K, corresponding to the
    unknown cluster assignment for a user.

10
Model-Based Clustering
  • A mixture model for X with K components has the
    form

    p(x | \theta) = \sum_{k=1}^{K} p(c_k) \, p(x | c_k, \theta_k)

where p(c_k) is the marginal probability of the kth
cluster, p(x | c_k, \theta_k) is the statistical
model describing the distribution of the variables
for users in the kth cluster, and \theta_k denotes
the parameters of that model.
11
Model-Based Clustering
  • In our case X = (X_1, ..., X_L) is a sequence of
    variables describing the user's path through the
    website.
  • Each X_i takes on a value x_i from the M
    different page categories.
  • Each component in the model obeys the 1st-order
    Markov model (a likelihood sketch follows)

    p(x | c_k, \theta_k) = p(x_1 | \theta_k^I) \prod_{i=2}^{L} p(x_i | x_{i-1}, \theta_k^T)

  • where \theta_k^I denotes the parameters of the
    probability distribution over the initial
    page-category request among users in cluster k,
  • and \theta_k^T denotes the parameters of the
    probability distributions over transitions from
    one category to the next by a user in cluster k.
  • Both distributions are taken to be multinomial.

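A minimal sketch of this per-cluster sequence likelihood in Python
(NumPy); the function and argument names are illustrative, not from
the paper:

```python
import numpy as np

def markov_sequence_likelihood(seq, init_probs, trans_probs):
    """p(x | c_k): likelihood of one category sequence under a single
    1st-order Markov component, p(x_1) * prod_i p(x_i | x_{i-1})."""
    p = init_probs[seq[0]]              # initial page-category request
    for prev, cur in zip(seq[:-1], seq[1:]):
        p *= trans_probs[prev, cur]     # one-step transition probability
    return p

# Example shapes for M page categories: init_probs is (M,) and
# trans_probs is (M, M) with rows summing to one (both multinomial,
# as on the slide); seq is a list of integer category indices.
```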
12
Model-Based Clustering
  • The EM algorithm is used to learn the model
    parameters.
  • Once learned, the model can assign users to
    clusters by finding the cluster k that maximizes
    the membership probability
    p(c_k | x, \theta) \propto p(c_k) p(x | c_k, \theta_k)
    (a sketch follows).
  • The user class assignment may be soft or hard.

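A minimal sketch of the hard assignment rule, computed in log space;
the parameter shapes match the EM sketch given later, and all names
are illustrative:

```python
import numpy as np

def hard_assign(seq, pi, init, trans):
    """Hard cluster assignment: argmax_k p(c_k | x, theta).
    pi: (K,) cluster weights; init: (K, M) initial-state
    distributions; trans: (K, M, M) transition matrices."""
    log_post = np.log(pi) + np.log(init[:, seq[0]])
    for a, b in zip(seq[:-1], seq[1:]):
        log_post += np.log(trans[:, a, b])  # log p(x_i | x_{i-1}, c_k)
    # The normalizing constant p(x) is shared by all k, so the argmax
    # of the unnormalized log posterior is the MAP cluster.
    return int(np.argmax(log_post))
```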
13
Learning Mixture Models from Data
  • Assume a known number of clusters K.
  • Training data d^train = {x_1, ..., x_N}, with an
    i.i.d. assumption.
  • MAP estimate of \theta:

    \hat{\theta} = \arg\max_\theta p(\theta | d^train) = \arg\max_\theta p(d^train | \theta) \, p(\theta)
14
EM learning algorithm (briefly)
  • An iterative method to find local maxima of the
    MAP objective for \theta.
  • The problem at hand involves two sub-problems:
  • Compute user class assignments (membership
    probabilities).
  • Compute class parameters.
  • A chicken-and-egg problem!

15
EM learning algorithm (briefly)
  • The EM approach (a code sketch follows):
  • E-step: given a current value of the parameters
    \theta, assign a user with behavior x to cluster
    c_k using the membership probabilities.
  • M-step: pretend that these assignments correspond
    to real data, and reassign \theta to be the MAP
    estimate given this fictitious data.
  • Stop iterating when two consecutive iterations
    produce log likelihoods on the training data that
    differ by less than a threshold (0.01 in the
    paper).

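A minimal EM sketch for this mixture in Python (NumPy). All names are
illustrative, and the small additive smoothing below merely stands in
for the paper's MAP prior, whose exact form is not reproduced here:

```python
import numpy as np

def em_markov_mixture(seqs, K, M, n_iter=100, tol=1e-2, seed=0):
    """EM for a mixture of K 1st-order Markov chains over M page
    categories. `seqs` is a list of integer category sequences."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                        # cluster weights p(c_k)
    init = rng.dirichlet(np.ones(M), size=K)        # initial-state distributions
    trans = rng.dirichlet(np.ones(M), size=(K, M))  # per-cluster transition rows
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log membership of each sequence in each cluster.
        log_r = np.zeros((len(seqs), K))
        for n, s in enumerate(seqs):
            for k in range(K):
                lp = np.log(pi[k]) + np.log(init[k, s[0]])
                lp += np.log(trans[k, s[:-1], s[1:]]).sum()
                log_r[n, k] = lp
        ll = np.logaddexp.reduce(log_r, axis=1)     # per-sequence log p(x)
        r = np.exp(log_r - ll[:, None])             # membership probabilities
        # M-step: re-estimate parameters from fractionally assigned data.
        pi = r.mean(axis=0)
        init = np.full((K, M), 1e-3)                # smoothing ~ stand-in prior
        trans = np.full((K, M, M), 1e-3)
        for n, s in enumerate(seqs):
            init[:, s[0]] += r[n]
            for a, b in zip(s[:-1], s[1:]):
                trans[:, a, b] += r[n]
        init /= init.sum(axis=1, keepdims=True)
        trans /= trans.sum(axis=2, keepdims=True)
        if ll.sum() - prev_ll < tol:                # stop on small log-lik gain
            break
        prev_ll = ll.sum()
    return pi, init, trans, r
```

Note how the costs line up with the earlier slide: the E-step touches
every transition of every sequence for every cluster, O(KNL), and the
M-step normalizes K transition matrices, O(KM²).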
16
How to choose K?
  • Letting the site administrator try several values
    of K and choose the most convenient one for
    visualization is too time consuming. Rather,
  • choose K by finding the model that most
    accurately predicts N_t new test cases
    d^test = {x_{N+1}, ..., x_{N+N_t}}. That is,
    choose the model with K clusters that minimizes
    the out-of-sample predictive log score (a sketch
    follows).

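A sketch of this criterion, assuming the score is the average negative
log probability of the held-out sequences (the paper's exact
normalization may differ); `em_markov_mixture` is the fitting sketch
from the previous slide:

```python
import numpy as np

def predictive_log_score(test_seqs, pi, init, trans):
    """Average -log p(x) of held-out sequences under a fitted
    mixture of 1st-order Markov chains; lower is better."""
    total = 0.0
    for s in test_seqs:
        log_p = np.log(pi) + np.log(init[:, s[0]])
        for a, b in zip(s[:-1], s[1:]):
            log_p += np.log(trans[:, a, b])
        total += np.logaddexp.reduce(log_p)   # log sum_k p(c_k) p(x | c_k)
    return -total / len(test_seqs)

# Hypothetical model-selection loop over candidate cluster counts:
# best_K = min(candidate_Ks, key=lambda K: predictive_log_score(
#     test_seqs, *em_markov_mixture(train_seqs, K, M)[:3]))
```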
17
Application to Msnbc.com
  • Each sequence in the data set corresponds to the
    page views of a user during a twenty-four-hour
    period.
  • Each event in the sequence corresponds to a user
    request for a page. The event denotes a page
    category rather than a URL.
  • Example categories are frontpage, news, and tech.
  • The number of URLs per category ranges from 10 to
    5000.
  • Only the order in which the pages are requested
    is modeled (no durations).
  • Page requests served via a caching mechanism were
    not recorded in the server logs and are hence not
    present in the data.

18
Application to Msnbc.com
  • The full data set consists of approximately one
    million sequences (users), with an average of 5.7
    events per sequence.
  • Model learning for various cluster sizes K was
    done with a training set of 100,023 sequences.
  • Model evaluation used the out-of-sample
    predictive log score on a different sample of
    98,687 sequences drawn from the original data.

19
Observation on the model components
  • Some of the individual model components encode
    two or more clusters.
  • Example: consider a cluster of users who
    initially request category a and then choose
    between categories b and c, and a cluster of
    users who initially request category d and then
    choose between categories e and f.
  • These two clusters can be encoded in a single
    component of the mixture model, even though the
    sequences for the separate clusters share no
    common elements.
  • The presence of multi-cluster components does not
    affect the out-of-sample predictive log score of
    a model.
  • However, it is problematic for visualization
    purposes.

20
Observation on the model components
  • Solutions:
  • One method is to run the EM algorithm and then
    post-process the resulting model, separating any
    multi-cluster components found.
  • A second method is to allow only one state
    (category) to have a non-zero probability of
    being the initial state in each of the 1st-order
    Markov models.
  • The second method has the drawback that users who
    have different initial states but similar paths
    after the initial state are divided into separate
    clusters.
  • Nonetheless, this potential problem was fairly
    insignificant for the msnbc.com data.

21
Constrained models
  • Experimentally, the constrained models have
    predictive power almost equal to that of the
    unconstrained models.
  • However, with this constraint, more components
    are needed to represent the data than in the
    unconstrained case.
  • For this particular data, the constrained
    1st-order Markov models reach their limit in
    predictive accuracy around K = 100,
  • compared to the unconstrained models, which reach
    their limit around K = 60.

22
Out-of-sample results
23
Data Visualization: the WebCANVAS tool
  • Display of a twenty-four-hour period using 100
    clusters.
  • Each window corresponds to a cluster.
  • Each row of squares in a cluster corresponds to a
    user sequence.
  • WebCANVAS uses hard clustering, assigning each
    user to a single cluster.
  • Each square in a row encodes a page request, with
    the category encoded by the color of the square.
  • Note that the use of color to encode URL category
    limits the utility of this tool to domains where
    the number of categories can be limited to fifty
    or so.

24
WebCANVAS Display
25
Discovering unexpected facts
  • Large groups of people enter msnbc.com on the
    tech and local pages.
  • A large group of people navigates from on-air to
    local.
  • There is little navigation between the tech and
    business sections,
  • and a large number of hits to the weather pages.

26
WebCANVAS tool (model-directed sampling)
  • The WebCANVAS display performed better,
    subjectively, than two other methods:
  • showing the 0th-order and 1st-order Markov models
    of a cluster, and
  • the "traffic flow" movie of Microsoft Site Server
    3.0.
  • The advantage of model-directed sampling over
    displaying the models themselves is that the
    former approach is not as sensitive to modeling
    errors.
  • That is, by displaying sampled raw data,
    behaviors in the data not consistent with the
    model can still be seen and appreciated.

27
Alternative: displaying the models themselves
28
Scalability
  • The memory requirements of the algorithm are
    O(NL + KM² + KM), which typically reduces to
    O(NL), i.e. the data size, for data sets where M
    is relatively small.
  • The runtime of the algorithm per iteration is
    linear in N and K.

29
Scalability in K
30
Scalability in N
31
Mixtures of 1st-order Markov Models: too simple a
model?
  • Sen and Hansen (2001) and Deshpande and Karypis
    (2001) have shown the 1st-order Markov model to
    be inadequate for empirically observed
    page-request sequences.
  • This is not surprising; for example,
  • if a user visits a particular page, there tends
    to be a greater chance of him returning to that
    same page at a later time.
  • A 1st-order Markov model cannot capture this type
    of long-term memory.
  • However:
  • Though the mixture model is 1st-order Markov
    within a cluster, the overall unconditional model
    is NOT 1st-order Markov.
  • The msnbc.com data differs from typical raw
    page-request sequences: URL categories give a
    relatively small alphabet compared to working
    with uncategorized URLs.

32
Mixtures of 1st-order Markov Models: too simple a
model?
  • The combined effects of clustering and a small
    alphabet tend to produce low-entropy clusters in
    the sense that a few (two or three) categories
    often dominate the sequences within each cluster.
  • Thus, the tendency to return to a specific page
    that was visited earlier in a session can be well
    approximated by the simple mixture of 1st-order
    Markov models.

33
Mixture of 1st-order Markov Models vs. a single
1st-order Markov Model
  • The mixture model:
  • Consider the predictive distribution for the next
    symbol under the mixture model, i.e.

    p(x_{L+1} | x_1, ..., x_L, \theta) = \sum_{k=1}^{K} p(c_k | x_1, ..., x_L, \theta) \, p(x_{L+1} | x_L, \theta_k^T)

  • Thus the probability of the next symbol is a
    weighted combination of the transition
    probabilities from x_L under each of the
    individual 1st-order component models.

34
Mixture of 1st-order Markov Models vs. a single
1st-order Markov Model
  • The weights are determined by the partial
    membership probabilities
    p(c_k | x_1, ..., x_L, \theta) of the prefix
    (history) subsequence x_1, ..., x_L.
  • These weights are in turn a function of the
    history of the sequence (via Bayes' rule), and
    typically depend strongly on the pattern of
    behavior before x_L.
  • This predictive behavior of the mixture contrasts
    with the simple predictive distribution of a
    single 1st-order Markov model,
    p(x_{L+1} | x_L, \theta^T) (a sketch follows).

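A sketch of this mixture predictive distribution in Python (NumPy),
reusing the parameter shapes from the earlier EM sketch; the function
name is illustrative:

```python
import numpy as np

def next_category_distribution(history, pi, init, trans):
    """p(x_{L+1} | x_1..x_L): a membership-weighted average of each
    cluster's transition row out of the current category x_L."""
    log_w = np.log(pi) + np.log(init[:, history[0]])
    for a, b in zip(history[:-1], history[1:]):
        log_w += np.log(trans[:, a, b])             # log p(history | c_k) terms
    w = np.exp(log_w - np.logaddexp.reduce(log_w))  # p(c_k | history) via Bayes
    return w @ trans[:, history[-1], :]             # weighted transition rows
```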
35
Empirical check of the 1st-order Markov model
  • Diagnostic check: empirically calculate the run
    lengths of page categories for several of the
    most likely clusters.
  • If the data are generated by a 1st-order Markov
    model, then the distribution of these run lengths
    will obey a geometric distribution.
  • Results are shown, in each cluster, for the three
    most frequently visited categories that had at
    least one run length of four or greater.
    (Categories with run lengths of three or fewer
    provide relatively uninformative diagnostic
    plots.)

36
Empirical check of the 1st-order Markov model
  • Asterisks mark the empirically observed counts.
  • The center dotted line on each plot is the
    expected count as a function of run length under
    a geometric model, using the empirically
    estimated self-transition probability of the
    Markov chain for the corresponding cluster (a
    sketch of this diagnostic follows).
  • The upper and lower dotted lines represent the
    plus and minus three-sigma sampling deviations
    for each count under the model.

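A sketch of the run-length diagnostic in Python; names are
illustrative. A constant self-transition probability p gives
P(run length = n) = p^(n-1) (1 - p), the geometric form the slide
tests against:

```python
from itertools import groupby

def run_length_counts(seqs, category):
    """Observed counts of consecutive-visit run lengths of one
    page category across a cluster's sequences."""
    counts = {}
    for s in seqs:
        for cat, run in groupby(s):        # maximal runs of equal symbols
            if cat == category:
                n = len(list(run))
                counts[n] = counts.get(n, 0) + 1
    return counts

def expected_geometric_counts(counts, p_self, max_len):
    """Expected counts per run length under the geometric model with
    the empirically estimated self-transition probability p_self."""
    total = sum(counts.values())
    return {n: total * p_self ** (n - 1) * (1 - p_self)
            for n in range(1, max_len + 1)}
```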
37
Conclusions
  • Used a model-based clustering approach to cluster
    users based on their web navigation patterns.
  • Developed a visualization tool that enables web
    administrators to better understand user
    behavior.
  • Used a mixture of 1st-order Markov models for
    clustering, taking into account the order of page
    requests.
  • Experiments suggest that 1st-order Markov mixture
    components are appropriate for the msnbc.com
    data.
  • The algorithm's learning time scales linearly
    with sample size. In contrast, agglomerative
    distance-based methods scale quadratically with
    sample size.

38
Future Work
  • Modeling the duration of each visit.
  • Avoiding the limitation of the proposed method to
    small M by modeling page visits at the URL level.
  • In one such extension, we can use Markov models
    to characterize both the transitions among
    categories and the transitions among pages within
    a given category.
  • Alternatively, we can use a hidden-Markov mixture
    model to learn categories and category
    transitions simultaneously.