Clustering - PowerPoint PPT Presentation

About This Presentation
Title:

Clustering

Slides: 16
Provided by: Iwona6

Transcript and Presenter's Notes

Title: Clustering


1
Clustering

2
Overview
  • What is clustering?
  • Clustering algorithms

3
What is clustering?
  • Clustering
  • The act of grouping similar objects into sets
  • Clustering vs. Classification
  • Classification assigns objects to predefined
    groups
  • Clustering infers the groups from the objects
    themselves

4
Clustering algorithms
  • Hierarchical
  • Bottom-up (agglomerative clustering)
  • Top-down (divisive clustering)
  • Non-Hierarchical
  • K-means (can be fuzzy)
  • Single-pass (incremental)

5
Hierarchical Clustering
  • Bottom-up (agglomerative clustering)
  • Start with each object as its own cluster
  • Repeatedly join the two clusters with maximum
    similarity
  • Top-down (divisive clustering)
  • Start with all the objects in one cluster
  • Divide them into groups
  • Repeatedly split the least coherent cluster
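
A minimal sketch of the bottom-up procedure above, assuming 1-D numeric objects, negative absolute distance as the similarity, and single-link cluster similarity. The function names (`similarity`, `cluster_similarity`, `agglomerative`) and the `num_clusters` stopping rule are illustrative choices, not part of the slides.

```python
from itertools import combinations


def similarity(a, b):
    """Similarity of two 1-D points: negative absolute distance (an assumption)."""
    return -abs(a - b)


def cluster_similarity(c1, c2):
    """Single-link: similarity of the two most similar members."""
    return max(similarity(a, b) for a in c1 for b in c2)


def agglomerative(points, num_clusters):
    """Merge clusters bottom-up until only num_clusters remain."""
    # Start with each object in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > num_clusters:
        # Find the pair of clusters with maximum similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: cluster_similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        # Merge the pair; j > i, so removing cluster j keeps index i valid.
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters
```

Recording the order of the merges, rather than stopping at a fixed number of clusters, would give the information needed to draw a dendrogram.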

6
Agglomerative clustering
7
Clustering result dendrogram
8
Hierarchical clustering variants
  • Various ways of calculating cluster similarity

Single-link (minimum distance between members)
Complete-link (maximum distance between members)
Group-average (average distance between members)
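
The three variants can be sketched as functions over two clusters. This is an illustrative sketch that assumes 1-D points and works with distances rather than similarities, so single-link takes the minimum and complete-link the maximum. The group-average version here averages only over cross-cluster pairs, which is a simplifying assumption; some formulations average over all pairs in the merged cluster.

```python
def distance(a, b):
    """Distance between two 1-D points (an assumption for illustration)."""
    return abs(a - b)


def single_link(c1, c2):
    """Distance of the two closest (most similar) members."""
    return min(distance(a, b) for a in c1 for b in c2)


def complete_link(c1, c2):
    """Distance of the two farthest (least similar) members."""
    return max(distance(a, b) for a in c1 for b in c2)


def group_average(c1, c2):
    """Average distance over all cross-cluster pairs."""
    total = sum(distance(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))
```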
9
Single Link
  • Similarity of two most similar members
  • Time complexity
  • O(n^2)
  • Locally coherent
  • Close objects end up in the same cluster
  • Chaining effect (clusters can become elongated)

10
Complete Link
  • Similarity of two least similar members
  • Time complexity
  • O(n^3)
  • Focused on global cluster quality
  • Avoids elongated clusters

11
Group average
  • Averages similarity between members
  • Time complexity
  • O(n^2)
  • Compromise between single-link and complete-link

12
K-means clustering
  • Defines clusters by the center of mass of their
    members
  • Initial cluster centers are selected at random
  • Assign each object to the cluster whose center
    is closest
  • Re-compute the center of each cluster
  • Repeat the assignment and re-computation steps
    until the stopping criterion is satisfied
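
The steps above can be sketched as follows. This is a minimal illustration, assuming points are tuples of numbers, squared Euclidean distance, and a stopping criterion of "centers no longer change" with an iteration cap. The names `kmeans`, `max_iters`, and `seed` are hypothetical choices for this sketch.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Center of mass of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Initial cluster centers are selected at random from the data.
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(max_iters):
        # Assign each object to the cluster whose center is closest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Re-compute each center; an empty cluster keeps its old center.
        new_centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        # Stopping criterion (an assumption): the centers did not move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters
```

The result depends on the random initial centers, so in practice the procedure is often run several times with different seeds.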

13
K-means clustering (k = 3)
14
Single-pass
threshold
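
The slide gives only the label "threshold", so the following is a hedged sketch of one common single-pass (incremental) scheme rather than the presenter's exact method. It assumes 1-D points, compares each new object with the centroid of every existing cluster, and starts a new cluster when the nearest centroid is farther away than `threshold`. The function name `single_pass` is illustrative.

```python
def single_pass(points, threshold):
    """Cluster points in one pass over the data, in input order."""
    clusters = []  # each cluster is a list of its members
    for p in points:
        best, best_dist = None, None
        # Find the existing cluster whose centroid is nearest to p.
        for c in clusters:
            centroid = sum(c) / len(c)
            d = abs(p - centroid)
            if best_dist is None or d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            # Close enough: add p to that cluster.
            best.append(p)
        else:
            # Too far from every cluster: p starts a new one.
            clusters.append([p])
    return clusters
```

Because every object is seen once, the result depends on the order in which the objects arrive.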
15
Properties of hierarchical and flat clustering
  • Hierarchical clustering
  • Preferable for detailed data analysis
  • Provides more information than flat clustering
  • No single best algorithm (depends on the
    application)
  • Less efficient than flat clustering (an N × N
    similarity matrix is required)
  • Non-hierarchical (flat) clustering
  • Preferable if efficiency is a consideration or
    the data sets are very large
  • K-means is conceptually the simplest method
  • K-means assumes a simple Euclidean representation
    space and so can't be used for many data sets