X-means: Extending K-means with Efficient Estimation of the Number of Clusters

... Extending K-means with Efficient Estimation of the Number of Clusters. Dan Pelleg, Andrew Moore. Carnegie Mellon University. Published: ICML 2000. Presentation by: ...

Transcript and Presenter's Notes


1
X-means: Extending K-means with Efficient
Estimation of the Number of Clusters
  • Dan Pelleg, Andrew Moore
  • Carnegie Mellon University
  • Published: ICML 2000
  • Presentation by: Payam Refaeilzadeh

2
Problems with K-means
  • Need to know K in advance
  • Searching for K is expensive
  • Even K-means with a fixed K scales poorly
  • Each iteration computes the distance from every point to
    every centroid to find the new cluster assignments
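
The cost in the last bullet can be seen in a minimal sketch of the naive assignment step. The function names (`sq_dist`, `assign_naive`) and the sample data are illustrative assumptions, not taken from the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign_naive(points, centroids):
    """Return the index of the nearest centroid for every point.

    Every point is compared against every centroid, so one pass
    costs n * k distance computations.
    """
    labels = []
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

# Hypothetical toy data: two obvious groups
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_naive(points, centroids)
```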

3
Remedies
  • Forward search for the appropriate value of K in
    a given range
  • Recursively split each cluster and use the BIC score
    to decide whether to keep each split
  • Use kd-trees to accelerate individual rounds of
    K-means

4
Splitting
  • Use the local BIC score (computed on the cluster being
    split) to decide whether to keep a split
  • Use the global BIC score (computed on the whole data set)
    to decide which K to output at the end
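
A rough sketch of the local split test follows. It assumes a textbook spherical-Gaussian log-likelihood with one shared variance, which may differ in detail from the formula in the paper. The helper names (`bic`, `two_means`, `keep_split`) and the way the two children are seeded are assumptions made for illustration only.

```python
import math

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def bic(clusters):
    """BIC of a spherical-Gaussian model with one shared variance.

    clusters: list of non-empty point lists, one list per centroid.
    """
    k = len(clusters)
    r = sum(len(c) for c in clusters)           # total number of points
    m = len(clusters[0][0])                     # dimensionality
    sse = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    var = max(sse / (m * r), 1e-12)             # MLE, guarded against zero
    loglik = (sum(len(c) * math.log(len(c) / r) for c in clusters)
              - 0.5 * m * r * math.log(2 * math.pi * var)
              - sse / (2 * var))
    n_params = (k - 1) + m * k + 1              # priors + centroids + variance
    return loglik - 0.5 * n_params * math.log(r)

def two_means(points, iters=10):
    """Local 2-means on one cluster; returns the two halves."""
    # Assumption: seed the children at the extremes of the widest dimension.
    dims = range(len(points[0]))
    widest = max(dims, key=lambda d: max(p[d] for p in points)
                                     - min(p[d] for p in points))
    c = [min(points, key=lambda p: p[widest]),
         max(points, key=lambda p: p[widest])]
    for _ in range(iters):
        halves = [[], []]
        for p in points:
            nearer = 0 if sq_dist(p, c[0]) <= sq_dist(p, c[1]) else 1
            halves[nearer].append(p)
        if not halves[0] or not halves[1]:
            break
        c = [mean(h) for h in halves]
    return halves

def keep_split(points):
    """Keep the split only if two children score a higher local BIC."""
    halves = two_means(points)
    if not halves[0] or not halves[1]:
        return False
    return bic(halves) > bic([points])

# Hypothetical data: two well-separated groups, and one compact group
separated = [(0, 0), (0, 1), (1, 0), (1, 1),
             (10, 10), (10, 11), (11, 10), (11, 11)]
compact = [(0, 0), (0, 1), (1, 0), (1, 1)]
split_separated = keep_split(separated)
split_compact = keep_split(compact)
```

In this sketch the separated data keeps its split, while the compact square does not, because the extra parameters of the second centroid cost more than the likelihood gains.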

5
BIC (Bayesian Information Criterion)
  • Log-likelihood of the model, adjusted by a penalty for
    the number of free parameters
  • The likelihood measures how well the data is explained by
    the clusters under the spherical-Gaussian
    assumption of K-means
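
For reference, the general form of the score can be written as below. The symbols are assumptions chosen for this note: R is the number of points, M the dimensionality, K the number of centroids, and p_j the number of free parameters of model M_j. The parameter count shown assumes K spherical Gaussians sharing a single variance.

```latex
\mathrm{BIC}(M_j) = \hat{l}_j(D) - \frac{p_j}{2} \log R,
\qquad
p_j = \underbrace{(K - 1)}_{\text{class priors}}
    + \underbrace{M K}_{\text{centroid coordinates}}
    + \underbrace{1}_{\text{shared variance}}
```

Here \hat{l}_j(D) is the log-likelihood of the data D at the maximum-likelihood parameters. A higher score is better, so adding a centroid pays off only when the gain in log-likelihood exceeds the added penalty.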

6
Kd-trees
  • Points to be clustered are put into a binary
    hierarchical structure
  • Each node represents a subset of points and
    stores:
      • The minimal hyper-rectangle enclosing all points
        in the subset
      • The vector sum of all the points in the subset
      • The number of points in the subset
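
One possible layout for such a node is sketched below. The class name `KDNode`, its field names, and the splitting rule (median along the widest dimension) are assumptions for illustration; the slides do not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class KDNode:
    lo: List[float]                  # lower corner of the bounding hyper-rectangle
    hi: List[float]                  # upper corner of the bounding hyper-rectangle
    vec_sum: List[float]             # vector sum of all points in the subset
    count: int                       # number of points in the subset
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None

def build(points, leaf_size=1):
    """Recursively build a kd-tree over a non-empty list of points."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    node = KDNode(lo, hi, [sum(c) for c in cols], len(points))
    widths = [h - l for l, h in zip(lo, hi)]
    # Split only while the subset is large enough and not a single location
    if len(points) > leaf_size and max(widths) > 0:
        axis = widths.index(max(widths))
        ordered = sorted(points, key=lambda p: p[axis])
        mid = len(ordered) // 2
        node.left = build(ordered[:mid], leaf_size)
        node.right = build(ordered[mid:], leaf_size)
    return node

# Hypothetical toy data
root = build([(0, 0), (2, 0), (0, 4), (2, 4)])
```

Because each node caches the vector sum and the count, the centroid contribution of a whole subtree can be read off without visiting its points.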

7
Using kd-trees
  • For each centroid store a counter containing the
    vector sum of all the points belonging to it and
    the number of points
  • Update the above by scanning the kd-tree only
    once
  • Start with the root node and all centroids
  • As you walk down the tree, centroids start to get
    black-listed (a centroid is black-listed when none of
    the points in the node could possibly belong to it)
  • When only one centroid remains, the counter for
    that centroid can be updated using the statistics
    stored in the node
  • At the end of the scan we have enough info to
    recalculate the centroid coordinates
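
The scan described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the names (`Node`, `dominates`, `update`, `kmeans_step`) are hypothetical, leaves hold a single distinct location, and a centroid is black-listed when the closest candidate beats it at the corner of the node's hyper-rectangle that lies furthest in its direction.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

class Node:
    """kd-tree node caching bounding box, vector sum and point count."""
    def __init__(self, points):
        cols = list(zip(*points))
        self.lo = [min(c) for c in cols]
        self.hi = [max(c) for c in cols]
        self.vec_sum = [sum(c) for c in cols]
        self.count = len(points)
        self.children = []
        widths = [h - l for l, h in zip(self.lo, self.hi)]
        if max(widths) > 0:                  # more than one distinct location
            axis = widths.index(max(widths))
            ordered = sorted(points, key=lambda p: p[axis])
            mid = len(ordered) // 2
            self.children = [Node(ordered[:mid]), Node(ordered[mid:])]

def dist_to_box(c, lo, hi):
    """Squared distance from c to the nearest point of the box [lo, hi]."""
    nearest = [min(max(x, l), h) for x, l, h in zip(c, lo, hi)]
    return sq_dist(c, nearest)

def dominates(c1, c2, lo, hi):
    """True if every point of the box [lo, hi] is closer to c1 than to c2."""
    # The corner pushed furthest towards c2 is the hardest case for c1.
    corner = [h if b > a else l for a, b, l, h in zip(c1, c2, lo, hi)]
    return sq_dist(corner, c1) < sq_dist(corner, c2)

def update(node, centroids, candidates, sums, counts):
    """Walk the tree once, black-listing centroids as the boxes shrink."""
    best = min(candidates,
               key=lambda i: dist_to_box(centroids[i], node.lo, node.hi))
    rest = [i for i in candidates if i != best
            and not dominates(centroids[best], centroids[i], node.lo, node.hi)]
    if not rest or not node.children:
        # One owner remains: credit it with the statistics stored in the node
        for d, s in enumerate(node.vec_sum):
            sums[best][d] += s
        counts[best] += node.count
        return
    for child in node.children:
        update(child, centroids, [best] + rest, sums, counts)

def kmeans_step(root, centroids):
    """One K-means round: a single tree scan, then recompute the centroids."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    update(root, centroids, list(range(len(centroids))), sums, counts)
    # A centroid that received no points keeps its old position
    return [[s / n for s in vs] if n else list(c)
            for vs, n, c in zip(sums, counts, centroids)]

# Hypothetical toy data: two groups of four points each
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
new_centroids = kmeans_step(Node(points), [(1, 1), (9, 9)])
```

On this toy data the scan stops at the two children of the root: each child's box is owned by a single centroid, so all eight points are accounted for without visiting any of them individually.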

8
Results