Title: Research Directions in Adaptive Mixtures and Model-Based Clustering
1. Research Directions in Adaptive Mixtures and Model-Based Clustering
- Wendy L. Martinez
- Office of Naval Research
- April 1, 2005
2. Acknowledgements
- This work has been conducted jointly with Jeff Solka, NSWCDD.
- This work has been supported by the ONR ROPO program.
- Jeff Solka's work has been supported by the ONR ILIR program.
3. Disclaimer
- Work in progress
- Describe research ideas
- Obtain feedback and suggestions
4. Outline
- Model-based Clustering (MBC).
- Mixture models and the EM algorithm
- The agglomerative step
- Adaptive Mixtures Density Estimation
- Kernel density estimation
- Their Synthesis
- Initialization for MB agglomerative clustering
- MB Adaptive Mixtures Density Estimation
- Preliminary Results
- Research Directions
5. Model-Based Clustering
[Flow diagram] Data -> Agglomerative Model-Based Clustering -> dendrogram -> Initialization for EM (1. initial number of components, 2. initial values for parameters) -> EM Algorithm -> BIC -> Highest BIC -> Chosen Model.
Final result - estimated model: 1. number of components, 2. best model M1-M4, 3. parameter estimates.
Standard MBC performs hierarchical clustering starting with the full dataset.
6. MODEL-BASED CLUSTERING
- This technique takes a density function approach.
- Uses finite mixture densities as models for cluster analysis.
- Each component density characterizes a cluster.
7. FINITE MIXTURES - REVIEW
- Model the density as a sum of C weighted densities (see the mixture form after this list).
- The expectation-maximization method is used to estimate the parameters.
- Must assume a distribution for the components - usually the normal distribution.
- Each component characterizes a cluster.
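For reference, the finite mixture model in the notation used on later slides (mixing proportions pk, component means and covariances), with phi the assumed component density (normal here):

```latex
f(\mathbf{x}) \;=\; \sum_{k=1}^{C} p_k\, \phi(\mathbf{x};\,\boldsymbol{\mu}_k,\boldsymbol{\Sigma}_k),
\qquad p_k \ge 0, \quad \sum_{k=1}^{C} p_k = 1 .
```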
8. EXPECTATION-MAXIMIZATION (EM) METHOD
- Method for building or estimating the model.
- Solution of the likelihood equations requires an iterative procedure.
- E Step - Expectation
- Find the probability that each observation belongs to the k-th component density - the posteriors (the tik's).
- M Step - Maximization
- Update all parameters based on the posteriors (pk, mk, Sk). (The two steps are written out for normal components after this list.)
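Written out for normal components (the standard EM iteration for finite mixtures; i indexes the n observations, k the C components):

```latex
\text{E step:}\quad
\hat{\tau}_{ik} \;=\;
\frac{\hat{p}_k\,\phi(\mathbf{x}_i;\hat{\boldsymbol{\mu}}_k,\hat{\boldsymbol{\Sigma}}_k)}
     {\sum_{j=1}^{C}\hat{p}_j\,\phi(\mathbf{x}_i;\hat{\boldsymbol{\mu}}_j,\hat{\boldsymbol{\Sigma}}_j)}
\qquad
\text{M step:}\quad
\hat{p}_k = \frac{1}{n}\sum_{i=1}^{n}\hat{\tau}_{ik},\quad
\hat{\boldsymbol{\mu}}_k = \frac{\sum_i \hat{\tau}_{ik}\,\mathbf{x}_i}{\sum_i \hat{\tau}_{ik}},\quad
\hat{\boldsymbol{\Sigma}}_k = \frac{\sum_i \hat{\tau}_{ik}\,
   (\mathbf{x}_i-\hat{\boldsymbol{\mu}}_k)(\mathbf{x}_i-\hat{\boldsymbol{\mu}}_k)^{T}}
   {\sum_i \hat{\tau}_{ik}} .
```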
9. EXPECTATION-MAXIMIZATION (EM) METHOD
- Issues
- Can converge to a local optimum.
- Can diverge.
- Requires an initial guess at the parameters of the component densities.
- Need an estimate of the number of components.
- Requires an assumed distribution for the component densities.
10. EXPECTATION-MAXIMIZATION (EM) METHOD
- Model-based clustering addresses these issues:
- Form of the densities constrains the covariance matrices
- Initialization of EM via model-based agglomerative clustering
- Estimate of the number of components via BIC
- Adaptive mixtures:
- Covariance model is the unconstrained version
- Initialization of EM
- Over-determined estimate of the number of components
11. AGGLOMERATIVE MBC
- Regular agglomerative clustering
- Each point starts out in its own cluster.
- The two closest clusters are merged at each step.
- Closeness is determined by the distance and the linkage.
- Model-based agglomerative clustering
- At each step, the two clusters are merged such that the likelihood for the given model is maximized (a minimal sketch of this merge step follows this list).
- We propose using Adaptive Mixtures to initialize MB agglomerative clustering.
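A minimal sketch of the merge step, shown only for the equal spherical model (M1), where maximizing the classification likelihood at each merge is commonly taken to reduce to Ward's sum-of-squares criterion; the function name and the brute-force pair search are illustrative, not the authors' implementation, and the other models use analogous criteria.

```python
import numpy as np

def mb_agglomerate(X, labels, n_merges):
    """Greedy model-based agglomeration under the equal spherical model (M1):
    at each step merge the pair of clusters whose union least increases the
    within-cluster sum of squares.  `labels` can be any starting partition,
    e.g. one produced by AMDE instead of all singletons."""
    labels = np.asarray(labels).copy()
    for _ in range(n_merges):
        ids = np.unique(labels)
        if len(ids) < 2:
            break
        best, best_cost = None, np.inf
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                a, b = ids[i], ids[j]
                Xa, Xb = X[labels == a], X[labels == b]
                na, nb = len(Xa), len(Xb)
                # Increase in within-cluster SS if clusters a and b are merged.
                delta = (na * nb) / (na + nb) * np.sum(
                    (Xa.mean(axis=0) - Xb.mean(axis=0)) ** 2)
                if delta < best_cost:
                    best_cost, best = delta, (a, b)
        labels[labels == best[1]] = best[0]
    return labels
```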
12. MODEL-BASED CLUSTERING
- The best model is chosen using the Bayesian Information Criterion (mM is the number of parameters, LM is the loglikelihood); the criterion is written out after this list.
- The four models are (more models are possible):
- Spherical/equal (M1)
- Spherical/unequal (M2)
- Ellipsoidal/equal (M3)
- Ellipsoidal/unequal (unconstrained) (M4)
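With LM the maximized loglikelihood of model M, mM its number of independent parameters, and n the sample size, the criterion (stated so that larger is better, matching the "highest BIC" step in the flow diagram) is

```latex
\mathrm{BIC}_M \;=\; 2\,L_M(\mathbf{X};\hat{\boldsymbol{\theta}}) \;-\; m_M \log n .
```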
13. Kernel Density Estimation
- Center a kernel at each data point.
- Evaluate the weighted kernel - usually a normal kernel.
- Add the values of the n curves (see the estimator after this list).
- Computationally intensive; must store all of the data; requires a choice of kernel and smoothing parameter.
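The standard kernel estimator in d dimensions with a single smoothing parameter h (kernel K, usually the normal density) is

```latex
\hat{f}(\mathbf{x}) \;=\; \frac{1}{n\,h^{d}} \sum_{i=1}^{n}
      K\!\left(\frac{\mathbf{x}-\mathbf{x}_i}{h}\right).
```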
14. ADAPTIVE MIXTURES DENSITY ESTIMATION (AMDE)
- Priebe and Marchette, 1990s.
- Priebe, JASA, 1994
- Hybrid of Kernel Estimator and Mixture Model.
- Number of Terms Driven by the Data.
15. AMDE ALGORITHM
- 1 - Given a New Observation.
- 2 - Update Existing Model Using the Recursive EM.
- or
- 3 - Add a New Term to Explain This Data Point.
16. Recursive EM Update Equations - All Have Hats
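The slide's equations are not reproduced here; a common statement of the recursive updates for a new observation x_(n+1) (the stochastic-approximation form of EM, roughly as in Priebe, 1994; every quantity on the right-hand side is the current estimate, hence "all have hats") is

```latex
\hat{\tau}_{k} =
\frac{\hat{p}^{(n)}_k\,\phi(\mathbf{x}_{n+1};\hat{\boldsymbol{\mu}}^{(n)}_k,\hat{\boldsymbol{\Sigma}}^{(n)}_k)}
     {\sum_j \hat{p}^{(n)}_j\,\phi(\mathbf{x}_{n+1};\hat{\boldsymbol{\mu}}^{(n)}_j,\hat{\boldsymbol{\Sigma}}^{(n)}_j)},
\qquad
\hat{p}^{(n+1)}_k = \hat{p}^{(n)}_k + \frac{\hat{\tau}_k - \hat{p}^{(n)}_k}{n+1},
```
```latex
\hat{\boldsymbol{\mu}}^{(n+1)}_k = \hat{\boldsymbol{\mu}}^{(n)}_k +
   \frac{\hat{\tau}_k}{(n+1)\,\hat{p}^{(n+1)}_k}\,(\mathbf{x}_{n+1}-\hat{\boldsymbol{\mu}}^{(n)}_k),
\qquad
\hat{\boldsymbol{\Sigma}}^{(n+1)}_k = \hat{\boldsymbol{\Sigma}}^{(n)}_k +
   \frac{\hat{\tau}_k}{(n+1)\,\hat{p}^{(n+1)}_k}
   \left[(\mathbf{x}_{n+1}-\hat{\boldsymbol{\mu}}^{(n)}_k)(\mathbf{x}_{n+1}-\hat{\boldsymbol{\mu}}^{(n)}_k)^{T}
         - \hat{\boldsymbol{\Sigma}}^{(n)}_k\right].
```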
17. CREATE RULE - AMDE
- Test the Mahalanobis distance from the current data point to each mixture term in the existing model.
- Add in a new term when this distance exceeds a certain create threshold:
- Location given by the current data point.
- Covariance given by a weighted average of the existing covariances.
- Mixing coefficient set to 1/n. (A sketch of one update-or-create step follows this list.)
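A minimal sketch of one AMDE step (the update-or-create loop of slides 15-17), assuming d-dimensional data with the means stored as a (C, d) array and the covariances as a (C, d, d) array; the function name and create_threshold are illustrative, and the new term's weight is taken as 1/(n+1) with the existing weights rescaled, one reading of the "1/n" on the slide.

```python
import numpy as np
from scipy.stats import multivariate_normal

def amde_step(x, n, weights, means, covs, create_threshold):
    """One AMDE step for a new observation x, given n points seen so far."""
    # Squared Mahalanobis distance from x to every existing term.
    d2 = np.array([(x - m) @ np.linalg.inv(S) @ (x - m)
                   for m, S in zip(means, covs)])
    if np.min(d2) > create_threshold ** 2:
        # Create rule: new term located at x, covariance a weighted average
        # of the existing covariances, small mixing coefficient.
        new_cov = sum(w * S for w, S in zip(weights, covs))
        means = np.vstack([means, x])
        covs = np.concatenate([covs, new_cov[None]])
        weights = np.append(weights * (1 - 1.0 / (n + 1)), 1.0 / (n + 1))
    else:
        # Recursive EM update of the existing terms (the hatted equations).
        f = np.array([w * multivariate_normal.pdf(x, m, S)
                      for w, m, S in zip(weights, means, covs)])
        tau = f / f.sum()
        weights = weights + (tau - weights) / (n + 1)
        for k in range(len(weights)):
            step = tau[k] / ((n + 1) * weights[k])
            diff = x - means[k]                       # uses the old mean
            covs[k] = covs[k] + step * (np.outer(diff, diff) - covs[k])
            means[k] = means[k] + step * diff
    return weights, means, covs
```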
18. Adaptive Mixtures
- Creates an over-determined model (too many terms)
- Depends on the order of the data
- Uses a sieve bound parameter to reset singular covariance matrices
- Covariance matrices are not constrained (model M4)
- Limited applicability in high-dimensional spaces
- The EM algorithm is used to refine the estimate.
19. Visualizing the Process - Adaptive Mixtures
20. Synthesis of AMDE and MBC
- First, to use AMDE as a way to initialize the model-based agglomerative clustering
- Second, to devise a model-based version of AMDE
- Third, to combine these two ideas
21. MBC with an AMDE Start
[Flow diagram] Data -> Adaptive mixtures model -> Agglomerative Model-Based Clustering -> dendrogram -> Initialization for EM (1. initial number of components, 2. initial values for parameters) -> EM Algorithm -> BIC -> Highest BIC -> Chosen Model.
Final result - estimated model: 1. number of components, 2. best model M1-M4, 3. parameter estimates.
22. MBC With AMDE Smart Start
- Form an adaptive mixtures model of the dataset.
- Set the create threshold so as to guarantee an over-determined model.
- Partition the data based on the AMDE model using the tik.
- Note that some of the original AMDE mixture terms die due to insufficient support.
- Utilize this partition as the start of the usual MBC procedure.
- Instead of starting with as many terms as points, we start with approximately log(n) terms (a sketch of this smart start follows this list).
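A sketch of the smart start, reusing the amde_step sketch above; the function name, the default threshold, and the choice to seed the model with a single term at the first observation are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal
# assumes amde_step from the earlier sketch is in scope

def amde_smart_start(X, create_threshold=2.0):
    """Fit an over-determined AMDE model, assign each point to the term with
    the largest posterior tik, and drop terms that receive no points.  The
    resulting labels seed model-based agglomerative clustering in place of
    the usual all-singleton start."""
    d = X.shape[1]
    # Seed with one term at the first observation and a broad covariance.
    weights = np.array([1.0])
    means = X[:1].copy()
    covs = np.cov(X, rowvar=False)[None] + 1e-6 * np.eye(d)
    for n, x in enumerate(X[1:], start=1):
        weights, means, covs = amde_step(x, n, weights, means, covs,
                                         create_threshold)
    # Posterior probabilities tik for every point and term.
    post = np.column_stack([
        w * multivariate_normal.pdf(X, m, S)
        for w, m, S in zip(weights, means, covs)])
    labels = post.argmax(axis=1)
    # Terms with no supporting points simply die.
    return labels, means[np.unique(labels)]
```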
23. Other Possibilities
- Other types of initialization
- Posse (JCGS) used initial partitions based on a minimal spanning tree.
- K-means
- Benefits of AMDE initialization
- Do not have to specify the number of clusters as in k-means.
- Methods like k-means impose a certain structure.
- In most cases, initial clusters are not singletons.
24. Why Do This?
- Computational tradeoff of the AMDE procedure vs. the agglomerative procedure on the full dataset.
- Advantages as the size of the dataset grows.
- Possibly non-singleton clusters
- Save on storage
- AMDE is data-order dependent.
- Multiple mixture models/clusterings can be obtained by merely reordering the dataset.
- Could get a distribution of models (numbers of clusters/BICs)
25. 4-Term Test Case
26. 4-Term BIC Curves
27. Experiment - Real Data
- Model-based clustering was applied to the Lansing Woods maples.
- Ran 20 trials with AMDE initialization.
- Re-ordered the data each time.
- The maximum-BIC model is a 6-component non-uniform spherical mixture.
- This is model 2:
- Covariance matrices are diagonal.
- Covariance matrices are not equal across terms.
28. The Raw Data
29. Original Configuration (JASA, 2002)
30. BICs for Best Trial
31. Number of Clusters - 20 Trials
32. Configuration with AMDE
33. Now for Model-Based AMDE
- The Adaptive Mixtures method uses the unconstrained model.
- It often tends to provide models that are overly complex (too many terms).
- Using a model-based version might provide better density estimates.
- A model-based version might provide better starting partitions for the MB agglomerative clustering for other models.
- Extend the applicability of AMDE to higher-dimensional spaces (fewer parameters).
34. Recursive Update Equation - Covariance
- The different models correspond to constraints on the covariance matrices.
- Update equations can be found in Celeux and Govaert, Pattern Recognition, 1995.
- All depend on the scatter matrix for each term.
- Propose updating based on the recursive update for the covariance.
- Then multiply by n to get the scatter matrix.
35. Recursive Update Equation - Covariance
[Equation slide: the scatter matrix for each term; one reading of the relation follows.]
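One plausible reading of the proposed conversion, with Wk the scatter matrix of the k-th term and n*pk its effective sample size:

```latex
W_k \;=\; \sum_{i=1}^{n} \hat{\tau}_{ik}\,
   (\mathbf{x}_i-\hat{\boldsymbol{\mu}}_k)(\mathbf{x}_i-\hat{\boldsymbol{\mu}}_k)^{T}
\;\approx\; n\,\hat{p}_k\,\hat{\boldsymbol{\Sigma}}_k ,
```

so scaling the recursively updated covariance by the (effective) sample size recovers the scatter matrix needed by the constrained M-step formulas of Celeux and Govaert (1995).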
36. Procedure - MB-AMDE
- Using the same rule, either update the current configuration for the new point:
- Weights, means, covariances.
- Convert the covariance to a scatter matrix.
- Update the covariance according to the model.
- Or, create a new term:
- Mean and weights created as in AMDE.
- Covariance is the common one for the model family, OR we could use the weighted average as before.
- And, allow terms to die if their covariances become singular.
- Assign the term's weight proportionally among the remaining terms.
(A sketch of the covariance step for the spherical models follows this list.)
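A hypothetical sketch of the MB-AMDE covariance step under this reading: after the usual recursive update, convert each covariance to a scatter matrix, constrain it according to the chosen model, and convert back. Only the spherical models are shown; the function name and the (C, d, d) array layout are assumptions, and the ellipsoidal models would constrain the scatter matrices analogously (Celeux and Govaert, 1995).

```python
import numpy as np

def constrain_covariances(weights, covs, n, model="M2"):
    """Project the recursively updated covariances onto a constrained model.
    weights: mixing proportions (sum to 1); covs: (C, d, d) array; n: number
    of observations seen so far."""
    d = covs.shape[1]
    # Scatter matrix for each term (covariance scaled by its effective n).
    W = np.array([n * w * S for w, S in zip(weights, covs)])
    if model == "M1":      # spherical, equal: one sigma^2 shared by all terms
        sigma2 = np.trace(W.sum(axis=0)) / (n * d)
        return np.array([sigma2 * np.eye(d) for _ in covs])
    if model == "M2":      # spherical, unequal: sigma_k^2 per term
        return np.array([(np.trace(Wk) / (n * w * d)) * np.eye(d)
                         for w, Wk in zip(weights, W)])
    return covs            # M4: unconstrained, leave as-is
```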
37. Idea I
- Use MB-AMDE as a stand-alone method.
- Recall that EM is used to refine AMDE.
- Not really focused on getting the number of groups right.
- Use MB-AMDE as an initialization for MB clustering.
38. Idea II
- One of the issues with AMDE is that the mixtures are overly complex (too many terms).
- Use model-based agglomerative clustering to prune the terms of the AMDE.
- Then use the BIC to choose the model.
- This is just MB clustering (maybe without the EM step).
39. Idea III
- Do more with the model-based agglomerative clustering as a stand-alone procedure
- Cophenetic coefficient
- Can be used to compare the interpoint distances and the clustering
- Can be used to compare two dendrograms
- Inconsistency coefficient
- The inconsistency coefficient characterizes a link by comparing its length with the average length of other links at the same level of the hierarchy.
- The higher the value, the less similar the objects connected by the link.
- Do we have to convert the link merge values to a distance? (A sketch using both coefficients on a standard linkage follows this list.)
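For illustration, both diagnostics computed on an ordinary (non-model-based) hierarchy with SciPy; the same two summaries could be applied to the MB agglomerative output once its merge values are expressed on a distance-like scale, which is the open question above. The synthetic data and linkage method here are purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, inconsistent
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(5, 1, (25, 2))])

D = pdist(X)                      # interpoint distances
Z = linkage(D, method="average")  # hierarchical clustering (dendrogram)

# Cophenetic correlation: agreement between the interpoint distances and the
# distances implied by the dendrogram.
c, _ = cophenet(Z, D)
print("cophenetic correlation:", round(c, 3))

# Inconsistency coefficient: each link's height compared with the mean and
# standard deviation of link heights at the same depth of the hierarchy.
print(inconsistent(Z, d=2)[-5:])
```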
40. Idea III
41. Other Questions
- Is there an equivalent BIC that does not require the data?
- Can the likelihood (classification or otherwise) be recursively updated?
42. Conclusion
- Discussed an initialization procedure for model-based agglomerative clustering.
- Showed applications to synthetic and real data.
- Possible advantages of AMDE initialization:
- Savings in storage?
- Possibly find other solutions?
- Formulation of Model-Based AMDE.