Research Directions in Adaptive Mixtures and Model-Based Clustering
1
Research Directions in Adaptive Mixtures and
Model-Based Clustering
  • Wendy L. Martinez
  • Office of Naval Research
  • April 1, 2005

2
Acknowledgements
  • This work has been conducted jointly with Jeff
    Solka, NSWCDD.
  • This work has been supported by the ONR ROPO
    program.
  • Jeff Solka's work has been supported by the ONR
    ILIR program.

3
Disclaimer
  • Work in progress
  • Describe research ideas
  • Obtain feedback and suggestions

4
Outline
  • Model-based Clustering (MBC)
    • Mixture models and the EM algorithm
    • The agglomerative step
  • Adaptive Mixtures Density Estimation
    • Kernel density estimation
  • Their Synthesis
    • Initialization for MB agglomerative clustering
    • MB Adaptive Mixtures Density Estimation
  • Preliminary Results
  • Research Directions

5
Model-Based Clustering
[Flowchart: Data → Agglomerative Model-Based Clustering →
dendrogram → Initialization for EM (1. initial number of
components, 2. initial values for parameters) → EM Algorithm →
BIC → highest BIC → Chosen Model. Final result: estimated model
with 1. number of components, 2. best model M1-M4, and
3. parameter estimates.]
Standard MBC performs hierarchical clustering
starting with the full dataset.
6
MODEL-BASED CLUSTERING
  • This technique takes a density function approach.
  • Uses finite mixture densities as models for
    cluster analysis.
  • Each component density characterizes a cluster.

7
FINITE MIXTURES - REVIEW
  • Model the density as a sum of C weighted
    densities.
  • Expectation-maximization method used to estimate
    parameters.
  • Must assume distribution for components - usually
    normal distribution.
  • Each component characterizes a cluster.
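As a concrete illustration (not part of the original slides), a C-component normal mixture density is a weighted sum of component densities; all parameter values below are made up for the example:

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at x."""
    d = len(mean)
    diff = x - mean
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_density(x, weights, means, covs):
    """Finite mixture: sum of C weighted component densities."""
    return sum(w * gaussian_pdf(x, m, S)
               for w, m, S in zip(weights, means, covs))

# Two-component mixture in 2-D (illustrative parameters)
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 2.0 * np.eye(2)]
p = mixture_density(np.array([0.0, 0.0]), weights, means, covs)
```

Each component density plays the role of one cluster; evaluating the mixture at a point weighs every component's contribution.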

8
EXPECTATION-MAXIMIZATION (EM) METHOD
  • Method for building or estimating the model.
  • Solution of likelihood functions requires
    iterative procedure.
  • E Step - Expectation
    • Find the probability that each observation belongs
      to the k-th component density - the posteriors
      (the τ_ik's).
  • M Step - Maximization
    • Update all parameters based on the posteriors
      (π_k, μ_k, Σ_k).
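The two steps can be sketched for the normal-mixture case as follows (a hedged illustration; the variable names `pis`, `mus`, `Sigmas` are mine, not from the slides):

```python
import numpy as np

def em_step(X, pis, mus, Sigmas):
    """One EM iteration for a Gaussian mixture (unconstrained covariances)."""
    n, d = X.shape
    C = len(pis)
    # E step: posterior probability tau[i, k] that x_i belongs to component k
    tau = np.empty((n, C))
    for k in range(C):
        diff = X - mus[k]
        inv = np.linalg.inv(Sigmas[k])
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[k]))
        tau[:, k] = pis[k] * np.exp(-0.5 * quad) / norm
    tau /= tau.sum(axis=1, keepdims=True)
    # M step: update weights, means, covariances from the posteriors
    nk = tau.sum(axis=0)
    pis = nk / n
    mus = [tau[:, k] @ X / nk[k] for k in range(C)]
    Sigmas = [(tau[:, k, None] * (X - mus[k])).T @ (X - mus[k]) / nk[k]
              for k in range(C)]
    return pis, mus, Sigmas, tau
```

In practice the step is iterated until the log-likelihood stops improving.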

9
EXPECTATION-MAXIMIZATION (EM) METHOD
  • Issues
  • Can converge to a local optimum.
  • Can diverge.
  • Requires initial guess at the parameters of the
    component densities.
  • Need an estimate of the number of components.
  • Requires an assumed distribution for the
    component densities.

10
EXPECTATION-MAXIMIZATION (EM) METHOD
  • Model-based clustering addresses these issues:
    • Form of the densities constrains the covariance
      matrices.
    • Initialization of EM via model-based
      agglomerative clustering.
    • Estimate of the number of components via BIC.
  • Adaptive mixtures:
    • Covariance model is the unconstrained version.
    • Initialization of EM.
    • Over-determined estimate of the number of
      components.

11
AGGLOMERATIVE MBC
  • Regular agglomerative clustering
    • Each point starts in its own cluster.
    • The two closest clusters are merged at each step.
    • Closeness is determined by distance and linkage.
  • Model-based agglomerative clustering
    • At each step, the two clusters are merged whose
      merger maximizes the likelihood for the given
      model.
  • We propose using Adaptive Mixtures to initialize
    MB agglomerative clustering.
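A minimal sketch of one merge step, assuming the spherical/equal-covariance model, under which maximizing the classification likelihood reduces to Ward's criterion (merge the pair with the smallest increase in within-cluster sum of squares):

```python
import numpy as np

def mb_agglomerative_step(clusters):
    """One merge step of model-based agglomerative clustering under
    the spherical/equal-covariance model: merge the pair of clusters
    whose union gives the smallest increase in within-cluster sum of
    squares (Ward's criterion)."""
    def wss(pts):
        return ((pts - pts.mean(axis=0)) ** 2).sum()
    best, pair = np.inf, None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            merged = np.vstack([clusters[i], clusters[j]])
            delta = wss(merged) - wss(clusters[i]) - wss(clusters[j])
            if delta < best:
                best, pair = delta, (i, j)
    i, j = pair
    merged = np.vstack([clusters[i], clusters[j]])
    return [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
```

Other covariance models lead to different merge criteria; this is only the simplest case.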

12
MODEL-BASED CLUSTERING
  • The best model is chosen using the Bayesian
    Information Criterion, BIC_M = 2 L_M - m_M log(n),
    where m_M is the number of parameters and L_M is
    the log-likelihood.
  • The four models are (more models are possible)
  • Spherical/equal (M1)
  • Spherical/unequal (M2)
  • Ellipsoidal/equal (M3)
  • Ellipsoidal/unequal (unconstrained) (M4)
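A small sketch of the criterion, using the convention BIC = 2 L_M - m_M log(n) common in the model-based clustering literature; the parameter-count helper is shown for the unconstrained model M4 only:

```python
import numpy as np

def bic(loglik, n_params, n):
    """BIC as used in model-based clustering: 2*loglik - n_params*log(n).
    The model (M1-M4) and number of components with the highest BIC wins."""
    return 2.0 * loglik - n_params * np.log(n)

def n_params_m4(C, d):
    """Parameter count for the unconstrained model M4 in d dimensions:
    C-1 mixing weights, C*d means, and C*d*(d+1)/2 covariance entries."""
    return (C - 1) + C * d + C * d * (d + 1) // 2
```

The constrained models M1-M3 have fewer covariance parameters, which is what lets BIC trade fit against complexity across the four families.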

13
Kernel Density Estimation
  • Center a kernel at each data point.
  • Evaluate the weighted kernel - usually a normal
    kernel.
  • Add the values of the n curves.
  • Drawbacks: computationally intensive, must store
    all of the data, choice of kernel and smoothing
    parameter.
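A one-dimensional sketch of the estimator with a normal kernel (the function name and bandwidth handling are illustrative, not from the slides):

```python
import numpy as np

def kde(x, data, h):
    """Kernel density estimate at x: the average of n normal kernels,
    one centered at each data point, with smoothing parameter h."""
    z = (x - data) / h
    return np.exp(-0.5 * z ** 2).sum() / (len(data) * h * np.sqrt(2 * np.pi))
```

Note the costs the slide mentions: every evaluation touches all n points, and all n points must be stored.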

14
ADAPTIVE MIXTURES DENSITY ESTIMATION (AMDE)
  • Developed by Priebe and Marchette in the 1990s.
  • Priebe, JASA, 1994
  • Hybrid of Kernel Estimator and Mixture Model.
  • Number of Terms Driven by the Data.

15
AMDE ALGORITHM
  • 1 - Given a New Observation.
  • 2 - Update Existing Model Using the Recursive EM.
  • or
  • 3 - Add a New Term to Explain This Data Point.

16
Recursive EM Update Equations - All Have Hats
[Equation slide: the recursive EM update formulas; all quantities
are running estimates, shown with hats.]
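The updates can be sketched in code roughly as follows; this is a stochastic-approximation style illustration in the spirit of Priebe (1994), not a transcription of the slide's exact equations, and the variable names are mine:

```python
import numpy as np

def recursive_em_update(x, n, pis, mus, Sigmas):
    """Recursive EM update for one new observation x arriving after n
    previous points. All quantities are running estimates ("hats");
    the step sizes shrink like 1/(n+1)."""
    C, d = len(pis), len(x)
    # Posterior of each component for the new point
    tau = np.empty(C)
    for k in range(C):
        diff = x - mus[k]
        inv = np.linalg.inv(Sigmas[k])
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigmas[k]))
        tau[k] = pis[k] * np.exp(-0.5 * diff @ inv @ diff) / norm
    tau /= tau.sum()
    for k in range(C):
        pis[k] += (tau[k] - pis[k]) / (n + 1)
        step = tau[k] / ((n + 1) * pis[k])
        diff = x - mus[k]
        mus[k] = mus[k] + step * diff
        Sigmas[k] = Sigmas[k] + step * (np.outer(diff, diff) - Sigmas[k])
    return pis, mus, Sigmas
```

Each new observation nudges the weights, means, and covariances toward itself in proportion to its posterior, so no past data need be stored.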
17
CREATE RULE - AMDE
  • Test the Mahalanobis distance from the current
    data point to each mixture term in the existing
    model.
  • Add in a new term when this distance exceeds a
    certain create threshold:
    • Location given by the current data point.
    • Covariance given by a weighted average of the
      existing covariances.
    • Mixing coefficient set to 1/n.
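The create rule can be sketched as follows (a hedged illustration with names of my own choosing; when no term is created, the recursive EM update would be applied instead):

```python
import numpy as np

def amde_create_rule(x, n, pis, mus, Sigmas, create_threshold):
    """AMDE create rule: if x exceeds the create threshold in
    Mahalanobis distance from every existing term, add a new term with
    location x, covariance equal to the weighted average of the existing
    covariances, and mixing coefficient 1/n (n = points seen so far,
    including x); the old weights are rescaled to keep the sum at 1."""
    dists = [np.sqrt((x - m) @ np.linalg.inv(S) @ (x - m))
             for m, S in zip(mus, Sigmas)]
    if min(dists) > create_threshold:
        new_cov = sum(p * S for p, S in zip(pis, Sigmas))
        w = 1.0 / n
        pis = [p * (1 - w) for p in pis] + [w]
        mus = mus + [x.copy()]
        Sigmas = Sigmas + [new_cov]
        created = True
    else:
        created = False  # the recursive EM update would go here
    return pis, mus, Sigmas, created
```

A low threshold creates terms aggressively, which is exactly how an over-determined model is guaranteed later on.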

18
Adaptive Mixtures
  • Creates an over-determined model (too many terms).
  • Depends on the order of the data.
  • Uses a sieve bound parameter to reset singular
    covariance matrices.
  • Covariance matrices are not constrained (model
    M4).
  • Limited applicability in high-dimensional spaces.
  • EM algorithm is used to refine the estimate.

19
Visualizing the Process: Adaptive Mixtures
20
Synthesis of AMDE and MBC
  • First, to use the AMDE as a way to initialize the
    model-based agglomerative clustering
  • Second, to devise a model-based version of AMDE
  • Third, to combine these two ideas

21
MBC with an AMDE Start
[Flowchart: Data → Adaptive mixtures model → Agglomerative
Model-Based Clustering → dendrogram → Initialization for EM
(1. initial number of components, 2. initial values for
parameters) → EM Algorithm → BIC → highest BIC → Chosen Model.
Final result: estimated model with 1. number of components,
2. best model M1-M4, and 3. parameter estimates.]
22
MBC With AMDE Smart Start
  • Form an adaptive mixtures model of the dataset.
  • Set the create threshold so as to guarantee an
    over-determined model.
  • Partition the data based on the AMDE model using
    the τ_ik's.
  • Note: some of the original AMDE mixture terms
    "die" due to insufficient support.
  • Utilize this partition as a start to the usual
    MBC procedure.
  • Instead of starting with as many terms as points,
    we start with approximately log(n) terms.
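The partitioning step above can be sketched as follows: each point goes to the AMDE term with the highest posterior τ_ik, and terms that win no points drop out (function and variable names are illustrative):

```python
import numpy as np

def partition_from_amde(X, pis, mus, Sigmas):
    """Partition the data by assigning each point to the AMDE term with
    the highest posterior tau_ik. Terms that win no points "die"; the
    surviving groups seed model-based agglomerative clustering."""
    n, d = X.shape
    C = len(pis)
    tau = np.empty((n, C))
    for k in range(C):
        diff = X - mus[k]
        inv = np.linalg.inv(Sigmas[k])
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        tau[:, k] = pis[k] * np.exp(-0.5 * quad) / np.sqrt(
            (2 * np.pi) ** d * np.linalg.det(Sigmas[k]))
    labels = tau.argmax(axis=1)
    groups = {k: X[labels == k] for k in np.unique(labels)}  # dead terms gone
    return labels, groups
```

Because the AMDE model has roughly log(n) terms, agglomeration now starts from a handful of groups rather than n singletons.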

23
Other Possibilities
  • Other types of initialization
    • Posse (JCGS) used initial partitions based on a
      minimal spanning tree.
    • K-means.
  • Benefits of AMDE initialization
    • Do not have to specify the number of clusters as
      in k-means.
    • Methods like k-means impose a certain structure.
    • In most cases, initial clusters are not
      singletons.

24
Why Do This?
  • Computational tradeoff of the AMDE procedure vs.
    the agglomerative procedure on the full dataset:
    • Advantages as the size of the dataset grows.
    • Initial clusters are typically not singletons.
    • Save on storage.
  • AMDE is data-order dependent:
    • Multiple mixture models/clusterings can be
      obtained by merely reordering the dataset.
    • Could get a distribution of models (number of
      clusters/BICs).

25
4-Term Test Case
26
4-Term BIC Curves
27
Experiment Real Data
  • Model-based clustering was applied to the Lansing
    Woods maples data.
  • Ran 20 trials with AMDE initialization,
    re-ordering the data each time.
  • The maximum-BIC model is a 6-component non-uniform
    spherical mixture (model M2):
    • Covariance matrices are diagonal.
    • Covariance matrices are not equal across terms.

28
The Raw Data
29
Original Configuration (JASA, 2002)
30
BICs for Best Trial
31
Number of Clusters - 20 Trials
32
Configuration with AMDE
33
Now for Model-Based AMDE
  • The Adaptive Mixtures method uses the
    unconstrained model.
  • It often tends to provide models that are
    overly complex (too many terms).
  • Using a model-based version might provide
    better density estimates.
  • A model-based version might provide better
    starting partitions for the MB agglomerative
    clustering for other models.
  • Extend the applicability of AMDE to
    higher-dimensional spaces (fewer parameters).

34
Recursive Update Equation - Covariance
  • The different models correspond to constraints
    on the covariance matrices.
  • Update equations can be found in Celeux and
    Govaert, Pattern Recognition, 1995.
  • All depend on the scatter matrix for each term.
  • Propose updating based on the recursive update
    for covariance.
  • Then multiply by n to get the scatter matrix.
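A sketch of the conversion and one possible constrained re-estimate, shown only for a per-term spherical constraint; the equal-across-terms models (M1, M3) would additionally pool scatter matrices across terms (as in Celeux and Govaert, 1995) and are omitted here:

```python
import numpy as np

def constrain_covariance(Sigma, n, model):
    """Convert a recursively updated covariance to the scatter matrix
    (multiply by n), then re-estimate under the model's constraint.
    'spherical' projects onto sigma^2 * I; anything else is left as the
    unconstrained (M4-style) per-term estimate."""
    d = Sigma.shape[0]
    W = n * Sigma  # scatter matrix for the term
    if model == 'spherical':
        return np.trace(W) / (n * d) * np.eye(d)
    return Sigma   # unconstrained per-term covariance
```

The point of routing through the scatter matrix is that the constrained estimators in Celeux and Govaert are all written in terms of it.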

35
Recursive Update Equation - Covariance
[Equation slide: the recursive covariance update and the resulting
scatter matrix (n times the covariance estimate).]
36
Procedure MB-AMDE
  • Using the same rule, either update the current
    configuration for the new point:
    • Weights, means, covariances.
    • Convert the covariance to the scatter matrix.
    • Update the covariance according to the model.
  • Or, create a new term:
    • Mean and weight created as in AMDE.
    • Covariance is the common one for the model
      family, OR we could use the weighted average as
      before.
  • And, allow terms to die if the covariances become
    singular:
    • Assign the term's weight proportionally among
      the remaining terms.

37
Idea I
  • Use MB-AMDE as a stand-alone method.
    • Recall that EM is used to refine AMDE.
    • Not really focused on getting the number of
      groups right.
  • Use MB-AMDE as an initialization for MB
    clustering.

38
Idea II
  • One of the issues with AMDE is that the mixtures
    are overly complex (too many terms).
  • Use model-based agglomerative clustering to prune
    terms of the AMDE.
  • Then use the BIC to choose the model.
  • This is just MB clustering (maybe without the EM
    step).

39
Idea III
  • Do more with model-based agglomerative
    clustering as a stand-alone procedure.
  • Cophenetic coefficient
    • Can be used to compare interpoint distances and
      the clustering.
    • Can be used to compare two dendrograms.
  • Inconsistency coefficient
    • Characterizes each link by comparing its length
      with the average length of other links at the
      same level of the hierarchy.
    • The higher the value, the less similar the
      objects connected by the link.
  • Do we have to convert link merge values to a
    distance?
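Both coefficients are available in SciPy's hierarchical-clustering module; the sketch below uses Ward linkage as a stand-in for the model-based merge criterion, with made-up data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, inconsistent
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

Z = linkage(X, method='ward')  # stand-in for a model-based criterion

# Cophenetic coefficient: correlation between the interpoint distances
# and the heights at which each pair first joins a common cluster
c, coph_dists = cophenet(Z, pdist(X))

# Inconsistency coefficient: each link's height compared with the mean
# and standard deviation of link heights within depth d below it
R = inconsistent(Z, d=2)
```

The open question on the slide remains: a model-based dendrogram is indexed by merge likelihoods, so these tools would need the merge values converted to something distance-like.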

40
Idea III
41
Other Questions
  • Is there an equivalent BIC that does not
    require the data?
  • Can the likelihood (classification or otherwise)
    be recursively updated?

42
Conclusion
  • Discussed an initialization procedure for
    model-based agglomerative clustering.
  • Showed applications to synthetic and real data.
  • Possible advantages of AMDE initialization:
    • Savings in storage?
    • Possibly find other solutions?
  • Formulation of Model-Based AMDE.