Clustering - PowerPoint PPT Presentation

Provided by: drhua

Transcript and Presenter's Notes

Title: Clustering


1
Clustering
  • Basic concepts with simple examples
  • Categories of clustering methods
  • Challenges

2
What is clustering?
  • The process of grouping a set of physical or
    abstract objects into classes of similar objects.
  • It is also called unsupervised learning.
  • It is a common and important task that finds many
    applications.

3
Major clustering methods
  • Partitioning methods
      • k-Means (and EM), k-Medoids
  • Hierarchical methods
      • agglomerative, divisive, BIRCH
  • Distance measures between clusters
      • minimum, maximum
      • mean
      • average

4
Clustering -- Example 1
  • For simplicity, use 1-dimensional objects and k = 2.
  • Objects: 1, 2, 5, 6, 7
  • K-means
  • Randomly select 5 and 6 as centroids
  • => Two clusters {1,2,5} and {6,7}; meanC1 = 8/3,
    meanC2 = 6.5
  • => {1,2}, {5,6,7}; meanC1 = 1.5, meanC2 = 6
  • => no change.
  • Aggregate dissimilarity = 0.5^2 + 0.5^2 + 1^2
    + 1^2 = 2.5
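
The iterations in this example can be traced with a short Python sketch. It is only an illustration of the assign-then-update loop, not code from the presentation; the function names kmeans_1d and aggregate_dissimilarity are made up here, and ties between centroids are assumed to go to the first centroid.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """k-means on 1-D points, starting from the given initial centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (assumption: ties go to the first centroid).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster keeps its old centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # "no change" => converged
            break
        centroids = new_centroids
    return clusters, centroids


def aggregate_dissimilarity(clusters, centroids):
    """Sum of squared distances from each point to its cluster mean."""
    return sum((p - m) ** 2
               for c, m in zip(clusters, centroids)
               for p in c)


clusters, centroids = kmeans_1d([1, 2, 5, 6, 7], [5, 6])
print(clusters, centroids, aggregate_dissimilarity(clusters, centroids))
```

Starting from centroids 5 and 6, the sketch ends with the same two clusters, means, and aggregate dissimilarity as the slide.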

5
Issues with k-means
  • A heuristic method
  • Outliers
  • How to prove it
  • Determining k
  • Crisp clustering
  • EM
  • Don't confuse k-means with k-NN

6
Clustering -- Example 2
  • For simplicity, we still use 1-dimensional
    objects.
  • Objects: 1, 2, 5, 6, 7
  • agglomerative clustering
  • find the two closest objects and merge
  • => {1,2}, so we now have 1.5, 5, 6, 7
  • => {1,2}, {5,6}, so 1.5, 5.5, 7
  • => {1,2}, {5,6,7}.
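
The merge sequence above can be sketched in Python as follows. This is a hedged illustration: it assumes, as the slide appears to, that a merged cluster is represented by its mean, that ties are broken in favour of the first closest pair, and that merging stops at a chosen number of clusters. The names centroid, agglomerate_1d, and target_clusters are hypothetical.

```python
from itertools import combinations


def centroid(cluster):
    """Mean of a 1-D cluster, used as its representative."""
    return sum(cluster) / len(cluster)


def agglomerate_1d(points, target_clusters=1):
    """Bottom-up merging of 1-D points; returns the clustering after each merge."""
    clusters = [[p] for p in sorted(points)]
    history = [[list(c) for c in clusters]]
    while len(clusters) > target_clusters:
        # Find the two clusters whose representatives are closest
        # (assumption: the first such pair wins a tie).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: abs(centroid(clusters[ij[0]])
                               - centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        clusters.sort(key=centroid)  # keep clusters in left-to-right order
        history.append([list(c) for c in clusters])
    return history


for step in agglomerate_1d([1, 2, 5, 6, 7], target_clusters=2):
    print(step)
```

Each printed step corresponds to one level of the dendrogram; lowering target_clusters to 1 continues merging until a single cluster remains.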

7
Issues with dendrograms
  • How to find proper clusters
  • An alternative: divisive algorithms
  • Top down
  • How to divide
  • A heuristic: MST (minimum spanning tree)
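
The slide names the MST heuristic without spelling it out. One common reading, assumed here, is to build a minimum spanning tree over the objects and cut its longest edge, which splits the data into two groups. The Python sketch below illustrates that reading on 1-D objects; mst_edges and split_on_longest_edge are hypothetical names.

```python
def mst_edges(points):
    """Prim's algorithm on the complete graph over 1-D points.

    Returns edges as (length, index_u, index_v) tuples.
    """
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Pick the shortest edge that leaves the current tree.
        w, u, v = min(
            (abs(points[u] - points[v]), u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        edges.append((w, u, v))
        in_tree.add(v)
    return edges


def split_on_longest_edge(points):
    """One divisive step (assumed rule): cut the longest MST edge."""
    edges = mst_edges(points)
    edges.remove(max(edges))
    adjacency = {i: [] for i in range(len(points))}
    for _, u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Collect the connected components left after the cut.
    seen, components = set(), []
    for start in range(len(points)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(points[node])
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


print(split_on_longest_edge([1, 2, 5, 6, 7]))
```

Applying the same step recursively to each part would give a top-down hierarchy; when to stop dividing remains an open choice.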

8
Distance measures
  • Single link
  • Measured by the shortest edge between the two
    clusters
  • Complete link
  • Measured by the longest edge
  • Average link
  • Measured by the average edge length
  • An example
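
As one possible illustration of the three measures, the Python sketch below computes them for two small 1-D clusters. The clusters {1,2} and {5,6,7} are borrowed from the earlier examples as an assumption; the function names are hypothetical.

```python
from itertools import product


def pairwise_distances(a, b):
    """All distances between an object in cluster a and one in cluster b."""
    return [abs(x - y) for x, y in product(a, b)]


def single_link(a, b):
    """Shortest edge between the two clusters."""
    return min(pairwise_distances(a, b))


def complete_link(a, b):
    """Longest edge between the two clusters."""
    return max(pairwise_distances(a, b))


def average_link(a, b):
    """Average edge length between the two clusters."""
    d = pairwise_distances(a, b)
    return sum(d) / len(d)


a, b = [1, 2], [5, 6, 7]
print(single_link(a, b), complete_link(a, b), average_link(a, b))
```

For these two clusters the shortest edge is 2 to 5, the longest is 1 to 7, and the average is taken over all six cross-cluster pairs.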

9
Other Methods
  • Density-based methods
  • DBSCAN
  • Grid-based methods
  • STING
  • Model-based methods
  • Conceptual clustering: COBWEB
  • Category utility
  • Intraclass similarity
  • Interclass dissimilarity

10
  • Neural networks
  • Self-organizing feature maps (SOMs)
  • Subspace clustering
  • CLIQUE

11
Challenges
  • Scalability
  • Dealing with different types of attributes
  • Clusters with arbitrary shapes
  • Automatically determining input parameters
  • Dealing with noise (outliers)
  • Order insensitivity
  • High dimensionality
  • Interpretability and usability