1
Model-Based Clustering and Visualization of
Navigation Patterns on a Web Site
  • I. Cadez, D. Heckerman, C. Meek, P. Smyth, and S.
    White

Presented by Motaz El Saban
2
Outline of the talk
  • Introduction and problem definition.
  • Model-based clustering.
  • Model learning.
  • Application to Msnbc.com IIS log data.
  • Data Visualization.
  • Scalability.
  • Why mixtures of first-order Markov models?
  • Conclusions.
  • Future work.

3
Introduction
  • A new methodology for analyzing web navigation
    patterns, a form of human behavior in digital
    environments.
  • A pattern is the sequence of URL categories
    traversed by a user, stored in web-server logs
    over a 24-hour period on msnbc.com.
  • Functionality:
  • Clustering users based on navigation patterns.
  • Visualization (the WebCANVAS tool).

4
Web data analysis approach
  • Clustering
  • Partition users into clusters so that users with
    similar dynamic behavior fall in the same
    cluster.
  • Visualization
  • Display the behavior of the users within each
    cluster.

5
Related Work
  • Most previous work on web navigation patterns and
    visualization uses non-probabilistic methods
    [YAN96, CHE98], mostly finding rules that govern
    navigation patterns.
  • Other work used probabilistic methods for
    predicting the behavior of users on web pages,
    but not for clustering: random walk models
    [HUB97], Markov models for pre-fetching pages
    [PAD96], and kth-order Markov models of the next
    probable link [BOR00].
  • These approaches use a single Markov model for
    all users rather than clustering the users first.

6
Related Work
  • On the clustering side, [FU00] applied BIRCH to
    cluster user web navigation patterns.
  • No previously known work uses probabilistic
    clustering for sequence-based clustering and
    visualization of web navigation.
  • Rather, user history has been visualized using
    visual metaphors of maps, paths, and signposts
    [WEX99].
  • [MIN99] uses planar graphs to visualize crowds of
    users at particular web pages.

7
What do we mean by pattern?
8
Challenges
  • Web navigation patterns are dynamic; static
    techniques such as histograms cannot capture
    them → Markov models.
  • Different users have heterogeneous dynamic
    behavior → mixture of models.
  • Large data size.
  • The proposed algorithm for learning the mixture
    of 1st-order Markov models has runtime
    O(KNL + KM²), where
  • K = number of clusters,
  • N = number of sequences,
  • L = average length of a sequence,
  • M = number of web page categories.
  • For the typically small M, the algorithm scales
    linearly with N and K.
  • Hierarchical clustering methods scale as O(N²).

9
Model-Based Clustering
  • Assume the data is generated as follows:
  • A user arrives at the web site and is assigned to
    one of K clusters with some probability, and
  • given that the user is in this cluster, his
    behavior is generated from a statistical model
    specific to that cluster.
  • Let X be a multivariate random variable taking on
    values corresponding to the behavior of
    individual users.
  • Let C be a discrete-valued variable taking on
    values c_1, ..., c_K, corresponding to the
    unknown cluster assignment for a user.

10
Model-Based Clustering
  • A mixture model for X with K components has the
    form

    p(x | \theta) = \sum_{k=1}^{K} p(c_k) \, p(x | c_k, \theta_k)

where p(c_k) is the marginal probability of the kth
cluster, p(x | c_k, \theta_k) is the statistical
model describing the distribution of the variables
for users in the kth cluster, and \theta_k denotes
the parameters of that model.
11
Model-Based Clustering
  • In our case X = (X_1, ..., X_L) is a sequence of
    variables describing the user's path through the
    website.
  • Each X_i takes on a value x_i from the M
    different page categories.
  • Each component in the model obeys the 1st-order
    Markov model (a likelihood sketch follows)

    p(x | c_k, \theta_k) = p(x_1 | \theta_k^I) \prod_{i=2}^{L} p(x_i | x_{i-1}, \theta_k^T)

  • where \theta_k^I denotes the parameters of the
    probability distribution over the initial
    page-category request among users in cluster k,
  • and \theta_k^T denotes the parameters of the
    probability distributions over transitions from
    one category to the next by a user in cluster k.
  • Both distributions are taken to be multinomial.

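A minimal sketch of this per-cluster sequence likelihood in Python
(NumPy); the function and argument names are illustrative, not from
the paper:

```python
import numpy as np

def markov_sequence_likelihood(seq, init_probs, trans_probs):
    """p(x | c_k): likelihood of one category sequence under a single
    1st-order Markov component, p(x_1) * prod_i p(x_i | x_{i-1})."""
    p = init_probs[seq[0]]              # initial page-category request
    for prev, cur in zip(seq[:-1], seq[1:]):
        p *= trans_probs[prev, cur]     # one-step transition probability
    return p

# Example shapes for M page categories: init_probs is (M,) and
# trans_probs is (M, M) with rows summing to one (both multinomial,
# as on the slide); seq is a list of integer category indices.
```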
12
Model-Based Clustering
  • The EM algorithm is used to learn the model
    parameters.
  • Once learned, the model can assign users to
    clusters by finding the cluster k that maximizes
    the membership probability
    p(c_k | x, \theta) \propto p(c_k) p(x | c_k, \theta_k)
    (a sketch follows).
  • The user class assignment may be soft or hard.

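A minimal sketch of the hard assignment rule, computed in log space;
the parameter shapes match the EM sketch given later, and all names
are illustrative:

```python
import numpy as np

def hard_assign(seq, pi, init, trans):
    """Hard cluster assignment: argmax_k p(c_k | x, theta).
    pi: (K,) cluster weights; init: (K, M) initial-state
    distributions; trans: (K, M, M) transition matrices."""
    log_post = np.log(pi) + np.log(init[:, seq[0]])
    for a, b in zip(seq[:-1], seq[1:]):
        log_post += np.log(trans[:, a, b])  # log p(x_i | x_{i-1}, c_k)
    # The normalizing constant p(x) is shared by all k, so the argmax
    # of the unnormalized log posterior is the MAP cluster.
    return int(np.argmax(log_post))
```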
13
Learning Mixture Models from Data
  • Assume a known number of clusters K.
  • Training data d^train = {x_1, ..., x_N}, with an
    i.i.d. assumption.
  • MAP estimate of \theta:

    \hat{\theta} = \arg\max_\theta p(\theta | d^train) = \arg\max_\theta p(d^train | \theta) \, p(\theta)
14
EM learning algorithm (briefly)
  • An iterative method to find local maxima of the
    MAP objective for \theta.
  • The problem at hand involves two sub-problems:
  • Compute user class assignments (membership
    probabilities).
  • Compute class parameters.
  • A chicken-and-egg problem!

15
EM learning algorithm (briefly)
  • The EM approach (a code sketch follows):
  • E-step: given a current value of the parameters
    \theta, assign a user with behavior x to cluster
    c_k using the membership probabilities.
  • M-step: pretend that these assignments correspond
    to real data, and reassign \theta to be the MAP
    estimate given this fictitious data.
  • Stop iterating when two consecutive iterations
    produce log likelihoods on the training data that
    differ by less than a threshold (0.01 in the
    paper).

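A minimal EM sketch for this mixture in Python (NumPy). All names are
illustrative, and the small additive smoothing below merely stands in
for the paper's MAP prior, whose exact form is not reproduced here:

```python
import numpy as np

def em_markov_mixture(seqs, K, M, n_iter=100, tol=1e-2, seed=0):
    """EM for a mixture of K 1st-order Markov chains over M page
    categories. `seqs` is a list of integer category sequences."""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                        # cluster weights p(c_k)
    init = rng.dirichlet(np.ones(M), size=K)        # initial-state distributions
    trans = rng.dirichlet(np.ones(M), size=(K, M))  # per-cluster transition rows
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log membership of each sequence in each cluster.
        log_r = np.zeros((len(seqs), K))
        for n, s in enumerate(seqs):
            for k in range(K):
                lp = np.log(pi[k]) + np.log(init[k, s[0]])
                lp += np.log(trans[k, s[:-1], s[1:]]).sum()
                log_r[n, k] = lp
        ll = np.logaddexp.reduce(log_r, axis=1)     # per-sequence log p(x)
        r = np.exp(log_r - ll[:, None])             # membership probabilities
        # M-step: re-estimate parameters from fractionally assigned data.
        pi = r.mean(axis=0)
        init = np.full((K, M), 1e-3)                # smoothing ~ stand-in prior
        trans = np.full((K, M, M), 1e-3)
        for n, s in enumerate(seqs):
            init[:, s[0]] += r[n]
            for a, b in zip(s[:-1], s[1:]):
                trans[:, a, b] += r[n]
        init /= init.sum(axis=1, keepdims=True)
        trans /= trans.sum(axis=2, keepdims=True)
        if ll.sum() - prev_ll < tol:                # stop on small log-lik gain
            break
        prev_ll = ll.sum()
    return pi, init, trans, r
```

Note how the costs line up with the earlier slide: the E-step touches
every transition of every sequence for every cluster, O(KNL), and the
M-step normalizes K transition matrices, O(KM²).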
16
How to choose K?
  • Letting the site administrator try several values
    of K and choose the most convenient one for
    visualization is too time consuming. Rather,
  • choose K by finding the model that most
    accurately predicts N_t new test cases
    d^test = {x_{N+1}, ..., x_{N+N_t}}. That is,
    choose the model with K clusters that minimizes
    the out-of-sample predictive log score (a sketch
    follows).

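A sketch of this criterion, assuming the score is the average negative
log probability of the held-out sequences (the paper's exact
normalization may differ); `em_markov_mixture` is the fitting sketch
from the previous slide:

```python
import numpy as np

def predictive_log_score(test_seqs, pi, init, trans):
    """Average -log p(x) of held-out sequences under a fitted
    mixture of 1st-order Markov chains; lower is better."""
    total = 0.0
    for s in test_seqs:
        log_p = np.log(pi) + np.log(init[:, s[0]])
        for a, b in zip(s[:-1], s[1:]):
            log_p += np.log(trans[:, a, b])
        total += np.logaddexp.reduce(log_p)   # log sum_k p(c_k) p(x | c_k)
    return -total / len(test_seqs)

# Hypothetical model-selection loop over candidate cluster counts:
# best_K = min(candidate_Ks, key=lambda K: predictive_log_score(
#     test_seqs, *em_markov_mixture(train_seqs, K, M)[:3]))
```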
17
Application to Msnbc.com
  • Each sequence in the data set corresponds to the
    page views of a user during a twenty-four-hour
    period.
  • Each event in the sequence corresponds to a user
    request for a page. The event denotes a page
    category rather than a URL.
  • Example categories are frontpage, news, and tech.
  • The number of URLs per category ranges from 10 to
    5000.
  • Only the order in which the pages are requested
    is modeled (no durations).
  • Page requests served via a caching mechanism were
    not recorded in the server logs and are hence not
    present in the data.

18
Application to Msnbc.com
  • The full data set consists of approximately one
    million sequences (users), with an average of 5.7
    events per sequence.
  • Model learning for various cluster sizes K was
    done with a training set of 100,023 sequences.
  • Model evaluation used the out-of-sample
    predictive log score on a different sample of
    98,687 sequences drawn from the original data.

19
Observation on the model components
  • Some of the individual model components encode
    two or more clusters.
  • Example: consider a cluster of users who
    initially request category a and then choose
    between categories b and c, and a cluster of
    users who initially request category d and then
    choose between categories e and f.
  • These two clusters can be encoded in a single
    component of the mixture model, even though the
    sequences for the separate clusters share no
    common elements.
  • The presence of multi-cluster components does not
    affect the out-of-sample predictive log score of
    a model.
  • However, it is problematic for visualization
    purposes.

20
Observation on the model components
  • Solutions:
  • One method is to run the EM algorithm and then
    post-process the resulting model, separating any
    multi-cluster components found.
  • A second method is to allow only one state
    (category) to have a non-zero probability of
    being the initial state in each of the 1st-order
    Markov models.
  • The second method has the drawback that users who
    have different initial states but similar paths
    after the initial state are divided into separate
    clusters.
  • Nonetheless, this potential problem was fairly
    insignificant for the msnbc.com data.

21
Constrained models
  • Experimentally, the constrained models have
    predictive power almost equal to that of the
    unconstrained models.
  • However, with this constraint, more components
    are needed to represent the data than in the
    unconstrained case.
  • For this particular data, the constrained
    1st-order Markov models reach their limit in
    predictive accuracy around K = 100,
  • compared to the unconstrained models, which reach
    their limit around K = 60.

22
Out-of-sample results
23
Data Visualization: the WebCANVAS tool
  • Display of a twenty-four-hour period using 100
    clusters.
  • Each window corresponds to a cluster.
  • Each row of squares in a cluster corresponds to a
    user sequence.
  • WebCANVAS uses hard clustering, assigning each
    user to a single cluster.
  • Each square in a row encodes a page request, with
    the category encoded by the color of the square.
  • Note that the use of color to encode URL category
    limits the utility of this tool to domains where
    the number of categories can be limited to fifty
    or so.

24
WebCANVAS Display
25
Discovering unexpected facts
  • Large groups of people enter msnbc.com on the
    tech and local pages.
  • A large group of people navigates from on-air to
    local.
  • There is little navigation between the tech and
    business sections,
  • and a large number of hits to the weather pages.

26
WebCANVAS tool (model-directed sampling)
  • The WebCANVAS display performed better,
    subjectively, than two other methods:
  • showing the 0th-order and 1st-order Markov models
    of a cluster, and
  • the "traffic flow" movie of Microsoft Site Server
    3.0.
  • The advantage of model-directed sampling over
    displaying the models themselves is that the
    former approach is not as sensitive to modeling
    errors.
  • That is, by displaying sampled raw data,
    behaviors in the data not consistent with the
    model can still be seen and appreciated.

27
Alternative: displaying the models themselves
28
Scalability
  • The memory requirements of the algorithm are
    O(NL + KM² + KM), which typically reduces to
    O(NL), i.e. the data size, for data sets where M
    is relatively small.
  • The runtime of the algorithm per iteration is
    linear in N and K.

29
Scalability in K
30
Scalability in N
31
Mixtures of 1st-order Markov Models: too simple a
model?
  • Sen and Hansen (2001) and Deshpande and Karypis
    (2001) have shown the 1st-order Markov model to
    be inadequate for empirically observed
    page-request sequences.
  • This is not surprising; for example,
  • if a user visits a particular page, there tends
    to be a greater chance of him returning to that
    same page at a later time.
  • A 1st-order Markov model cannot capture this type
    of long-term memory.
  • However:
  • Though the mixture model is 1st-order Markov
    within a cluster, the overall unconditional model
    is NOT 1st-order Markov.
  • The msnbc.com data differs from typical raw
    page-request sequences: URL categories give a
    relatively small alphabet compared to working
    with uncategorized URLs.

32
Mixtures of 1st-order Markov Models: too simple a
model?
  • The combined effects of clustering and a small
    alphabet tend to produce low-entropy clusters in
    the sense that a few (two or three) categories
    often dominate the sequences within each cluster.
  • Thus, the tendency to return to a specific page
    that was visited earlier in a session can be well
    approximated by the simple mixture of 1st-order
    Markov models.

33
Mixture of 1st-order Markov Models vs. a single
1st-order Markov Model
  • The mixture model:
  • Consider the predictive distribution for the next
    symbol under the mixture model, i.e.

    p(x_{L+1} | x_1, ..., x_L, \theta) = \sum_{k=1}^{K} p(c_k | x_1, ..., x_L, \theta) \, p(x_{L+1} | x_L, \theta_k^T)

  • Thus the probability of the next symbol is a
    weighted combination of the transition
    probabilities from x_L under each of the
    individual 1st-order component models.

34
Mixture of 1st-order Markov Models vs. a single
1st-order Markov Model
  • The weights are determined by the partial
    membership probabilities
    p(c_k | x_1, ..., x_L, \theta) of the prefix
    (history) subsequence x_1, ..., x_L.
  • These weights are in turn a function of the
    history of the sequence (via Bayes' rule), and
    typically depend strongly on the pattern of
    behavior before x_L.
  • This predictive behavior of the mixture contrasts
    with the simple predictive distribution of a
    single 1st-order Markov model,
    p(x_{L+1} | x_L, \theta^T) (a sketch follows).

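A sketch of this mixture predictive distribution in Python (NumPy),
reusing the parameter shapes from the earlier EM sketch; the function
name is illustrative:

```python
import numpy as np

def next_category_distribution(history, pi, init, trans):
    """p(x_{L+1} | x_1..x_L): a membership-weighted average of each
    cluster's transition row out of the current category x_L."""
    log_w = np.log(pi) + np.log(init[:, history[0]])
    for a, b in zip(history[:-1], history[1:]):
        log_w += np.log(trans[:, a, b])             # log p(history | c_k) terms
    w = np.exp(log_w - np.logaddexp.reduce(log_w))  # p(c_k | history) via Bayes
    return w @ trans[:, history[-1], :]             # weighted transition rows
```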
35
Empirical check of the 1st-order Markov model
  • Diagnostic check: empirically calculate the run
    lengths of page categories for several of the
    most likely clusters.
  • If the data are generated by a 1st-order Markov
    model, then the distribution of these run lengths
    will obey a geometric distribution.
  • Results are shown, in each cluster, for the three
    most frequently visited categories that had at
    least one run length of four or greater.
    (Categories with run lengths of three or fewer
    provide relatively uninformative diagnostic
    plots.)

36
Empirical check of the 1st-order Markov model
  • Asterisks mark the empirically observed counts.
  • The center dotted line on each plot is the
    expected count as a function of run length under
    a geometric model, using the empirically
    estimated self-transition probability of the
    Markov chain for the corresponding cluster (a
    sketch of this diagnostic follows).
  • The upper and lower dotted lines represent the
    plus and minus three-sigma sampling deviations
    for each count under the model.

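A sketch of the run-length diagnostic in Python; names are
illustrative. A constant self-transition probability p gives
P(run length = n) = p^(n-1) (1 - p), the geometric form the slide
tests against:

```python
from itertools import groupby

def run_length_counts(seqs, category):
    """Observed counts of consecutive-visit run lengths of one
    page category across a cluster's sequences."""
    counts = {}
    for s in seqs:
        for cat, run in groupby(s):        # maximal runs of equal symbols
            if cat == category:
                n = len(list(run))
                counts[n] = counts.get(n, 0) + 1
    return counts

def expected_geometric_counts(counts, p_self, max_len):
    """Expected counts per run length under the geometric model with
    the empirically estimated self-transition probability p_self."""
    total = sum(counts.values())
    return {n: total * p_self ** (n - 1) * (1 - p_self)
            for n in range(1, max_len + 1)}
```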
37
Conclusions
  • Used a model-based clustering approach to cluster
    users based on their web navigation patterns.
  • Developed a visualization tool that enables web
    administrators to better understand user
    behavior.
  • Used a mixture of 1st-order Markov models for
    clustering, taking into account the order of page
    requests.
  • Experiments suggest that 1st-order Markov mixture
    components are appropriate for the msnbc.com
    data.
  • The algorithm's learning time scales linearly
    with sample size. In contrast, agglomerative
    distance-based methods scale quadratically with
    sample size.

38
Future Work
  • Modeling the duration of each visit.
  • Avoiding the limitation of the proposed method to
    small M by modeling page visits at the URL level.
  • In one such extension, we can use Markov models
    to characterize both the transitions among
    categories and the transitions among pages within
    a given category.
  • Alternatively, we can use a hidden-Markov mixture
    model to learn categories and category
    transitions simultaneously.