Clustering Algorithms - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Clustering Algorithms


1
KI2 - 7
Clustering Algorithms
Johan Everts
Artificial Intelligence (Kunstmatige Intelligentie) / RuG
2
What is Clustering?
Find K clusters (or a classification that
consists of K clusters) so that the objects of
one cluster are similar to each other whereas
objects of different clusters are dissimilar.
(Bacher 1996)
3
The Goals of Clustering
  • Determine the intrinsic grouping in a set of
    unlabeled data.
  • What constitutes a good clustering?
  • All clustering algorithms will produce clusters,
    regardless of whether the data contains them.
  • There is no gold standard; what counts as good
    depends on the goal:
  • data reduction
  • natural clusters
  • useful clusters
  • outlier detection

4
Stages in clustering
5
Taxonomy of Clustering Approaches
6
Hierarchical Clustering
  • Agglomerative clustering treats each data point
    as a singleton cluster, and then successively
    merges clusters until all points have been merged
    into a single remaining cluster. Divisive
    clustering works the other way around.

7
Agglomerative Clustering
Single link
In single-link hierarchical clustering, we merge
in each step the two clusters whose two closest
members have the smallest distance.
8
Agglomerative Clustering
Complete link
In complete-link hierarchical clustering, we
merge in each step the two clusters whose merger
has the smallest diameter.
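
The two linkage criteria can be sketched as cluster-to-cluster distance functions. This is a minimal illustration, not code from the presentation: the function names and the `dist` callback are assumptions, and complete link is written in its common form, which scores a candidate merger by its two farthest members.

```python
def single_link(a, b, dist):
    """Distance between clusters a and b = distance of their two closest members."""
    return min(dist(p, q) for p in a for q in b)


def complete_link(a, b, dist):
    """Distance between clusters a and b = distance of their two farthest members."""
    return max(dist(p, q) for p in a for q in b)


# Hypothetical 1-D example: points are plain numbers.
def abs_diff(p, q):
    return abs(p - q)


cluster_a = [0, 1]
cluster_b = [3, 7]
print(single_link(cluster_a, cluster_b, abs_diff))    # closest pair is (1, 3)
print(complete_link(cluster_a, cluster_b, abs_diff))  # farthest pair is (0, 7)
```

In both cases the agglomerative step merges the pair of clusters for which the chosen function is smallest.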
9
Example Single Link AC
      BA   FI   MI   NA   RM   TO
BA     0  662  877  255  412  996
FI   662    0  295  468  268  400
MI   877  295    0  754  564  138
NA   255  468  754    0  219  869
RM   412  268  564  219    0  669
TO   996  400  138  869  669    0
10
Example Single Link AC
11
Example Single Link AC
           BA     FI  MI/TO     NA     RM
BA          0    662    877    255    412
FI        662      0    295    468    268
MI/TO     877    295      0    754    564
NA        255    468    754      0    219
RM        412    268    564    219      0
12
Example Single Link AC
13
Example Single Link AC
           BA     FI  MI/TO  NA/RM
BA          0    662    877    255
FI        662      0    295    268
MI/TO     877    295      0    564
NA/RM     255    268    564      0
14
Example Single Link AC
15
Example Single Link AC
           BA/NA/RM        FI     MI/TO
BA/NA/RM          0       268       564
FI              268         0       295
MI/TO           564       295         0
16
Example Single Link AC
17
Example Single Link AC
              BA/FI/NA/RM        MI/TO
BA/FI/NA/RM             0          295
MI/TO                 295            0
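
The merge sequence in this example can be reproduced with a short single-link sketch over the distance matrix from slide 9. The function name and the dictionary-based data layout are assumptions made for illustration; the brute-force search over all cluster pairs is chosen for readability, not efficiency.

```python
def single_link_merges(names, dist):
    """Merge clusters bottom-up; return (cluster_a, cluster_b, distance) per step.

    dist maps frozenset({x, y}) to the distance between items x and y.
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def link(a, b):
        # Single link: distance between the two closest members.
        return min(dist[frozenset((p, q))] for p in a for q in b)

    while len(clusters) > 1:
        pairs = ((a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:])
        a, b = min(pairs, key=lambda ab: link(*ab))
        merges.append((sorted(a), sorted(b), link(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


names = ["BA", "FI", "MI", "NA", "RM", "TO"]
matrix = [
    [0, 662, 877, 255, 412, 996],
    [662, 0, 295, 468, 268, 400],
    [877, 295, 0, 754, 564, 138],
    [255, 468, 754, 0, 219, 869],
    [412, 268, 564, 219, 0, 669],
    [996, 400, 138, 869, 669, 0],
]
dist = {
    frozenset((names[i], names[j])): matrix[i][j]
    for i in range(len(names))
    for j in range(i + 1, len(names))
}

for a, b, d in single_link_merges(names, dist):
    print("/".join(a), "+", "/".join(b), "at distance", d)
```

The merge distances come out as 138, 219, 255, 268 and 295, matching the smallest off-diagonal entry of each successive matrix above.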
18
Example Single Link AC
19
Example Single Link AC
20
Taxonomy of Clustering Approaches
21
Square error
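
The formula on this slide did not survive in the transcript. For orientation only, the squared-error criterion usually associated with partitional methods such as K-Means can be written as follows; the notation ($K$ clusters $C_j$ with centroids $c_j$) is an assumption and may differ from the slide's.

```latex
E = \sum_{j=1}^{K} \sum_{x \in C_j} \lVert x - c_j \rVert^{2},
\qquad
c_j = \frac{1}{\lvert C_j \rvert} \sum_{x \in C_j} x
```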
22
K-Means
  • Step 0: Start with a random partition into K
    clusters.
  • Step 1: Generate a new partition by assigning
    each pattern to its closest cluster center.
  • Step 2: Compute new cluster centers as the
    centroids of the clusters.
  • Step 3: Repeat steps 1 and 2 until there is no
    change in membership (the cluster centers then
    also remain the same).
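
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the function names, the optional `labels` argument for supplying the initial partition, the `max_iter` cap, and the re-seeding of an empty cluster with a random point are all choices made here.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, labels=None, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 0: random partition (unless an initial partition is supplied).
    if labels is None:
        labels = [rng.randrange(k) for _ in points]
    centers = []
    for _ in range(max_iter):
        # Step 2: centers are the centroids of the current partition.
        centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster is re-seeded with a random point.
            centers.append(centroid(members) if members else rng.choice(points))
        # Step 1: assign each pattern to its closest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Step 3: stop when membership no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2, labels=[0, 1, 0, 1, 0, 1])
print(labels)
print(centers)
```

Starting from the deliberately mixed partition `[0, 1, 0, 1, 0, 1]`, one reassignment is enough to separate the two groups, and the second pass confirms that membership is stable.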

23
K-Means
24
K-Means: How many Ks?
25
K-Means: How many Ks?
26
Locating the knee
The knee of a curve is defined as the point of
maximum curvature.
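
One way to locate such a knee on an error-versus-K curve is to estimate the curvature with finite differences. The sketch below is an assumption-laden illustration: the function name, the rescaling of both axes to [0, 1] (curvature otherwise depends on the units), the requirement of equally spaced K values, and the error values in the example are all invented here.

```python
def find_knee(ks, errors):
    """Return the K with the largest estimated curvature of the error curve.

    Assumes equally spaced, increasing ks and at least three points.
    """
    # Rescale both axes to [0, 1] so the result does not depend on units.
    xs = [(k - ks[0]) / (ks[-1] - ks[0]) for k in ks]
    lo, hi = min(errors), max(errors)
    ys = [(e - lo) / (hi - lo) for e in errors]

    best_k, best_curvature = None, -1.0
    for i in range(1, len(ks) - 1):
        h = xs[i + 1] - xs[i]
        d1 = (ys[i + 1] - ys[i - 1]) / (2 * h)              # first derivative
        d2 = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / (h * h)  # second derivative
        curvature = abs(d2) / (1 + d1 * d1) ** 1.5
        if curvature > best_curvature:
            best_k, best_curvature = ks[i], curvature
    return best_k


# Made-up squared-error values for K = 1..6.
print(find_knee([1, 2, 3, 4, 5, 6], [100, 40, 10, 8, 7, 6]))
```

The end points are skipped because a central difference needs a neighbour on each side.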
27
Leader - Follower
  • Online
  • Specify threshold distance
  • Find the closest cluster center
  • Distance above threshold? Create a new cluster
  • Or else, add instance to cluster

28
Leader - Follower
  • Find the closest cluster center
  • Distance above threshold? Create a new cluster
  • Or else, add instance to cluster

29
Leader - Follower
  • Find the closest cluster center
  • Distance above threshold? Create a new cluster
  • Or else, add instance to cluster and update
    cluster center

Distance < Threshold
30
Leader - Follower
  • Find the closest cluster center
  • Distance above threshold? Create a new cluster
  • Or else, add instance to cluster and update
    cluster center

31
Leader - Follower
  • Find the closest cluster center
  • Distance above threshold? Create a new cluster
  • Or else, add instance to cluster and update
    cluster center

Distance > Threshold
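
The Leader-Follower procedure from the preceding slides can be sketched as a single pass over a stream of instances. The slides only say "update cluster center"; moving the center a fraction `learning_rate` of the way toward the new instance is an assumption made here, as are the function and parameter names.

```python
import math


def leader_follower(stream, threshold, learning_rate=0.5):
    """Online clustering: returns (labels, centers) after one pass."""
    centers = []
    labels = []
    for x in stream:
        if centers:
            # Find the closest cluster center.
            j = min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))
            if math.dist(x, centers[j]) <= threshold:
                # Within threshold: add the instance and update the center
                # (assumed rule: move the center toward the instance).
                centers[j] = tuple(
                    c + learning_rate * (xi - c) for c, xi in zip(centers[j], x)
                )
                labels.append(j)
                continue
        # No centers yet, or distance above threshold: create a new cluster.
        centers.append(tuple(x))
        labels.append(len(centers) - 1)
    return labels, centers


stream = [(0, 0), (1, 0), (10, 10), (0, 1)]
labels, centers = leader_follower(stream, threshold=3)
print(labels)
print(centers)
```

Because instances are processed one at a time, the result depends on their order, which is consistent with the method being described later as fast but unstable.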
32
Kohonen SOMs
The Self-Organizing Map (SOM) is an unsupervised
artificial neural network algorithm. It is a
compromise between biological modeling and
statistical data processing.
33
Kohonen SOMs
  • Each weight is representative of a certain
    input.
  • Input patterns are shown to all neurons
    simultaneously.
  • Competitive learning: the neuron with the
    largest response is chosen.

34
Kohonen SOMs
  • Initialize weights
  • Repeat until convergence:
  • Select next input pattern
  • Find Best Matching Unit
  • Update weights of winner and neighbours
  • Decrease learning rate and neighbourhood size

Learning rate and neighbourhood size
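
The training loop can be sketched in plain Python for a small 2D map. Several choices here are assumptions rather than anything stated on the slides: a fixed number of epochs stands in for "until convergence", the neighbourhood is Gaussian, the learning rate and radius decay exponentially, and all function and parameter names are invented.

```python
import math
import random


def find_bmu(weights, x):
    """Return (row, col) of the unit whose weight vector is closest to x."""
    cells = ((r, c) for r in range(len(weights)) for c in range(len(weights[0])))
    return min(cells, key=lambda rc: math.dist(weights[rc[0]][rc[1]], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, radius0=None, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    # Initialize weights randomly in [0, 1).
    weights = [
        [[rng.random() for _ in range(dim)] for _ in range(cols)]
        for _ in range(rows)
    ]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # select next input pattern
            # Decrease learning rate and neighbourhood size over time.
            decay = math.exp(-3 * step / total_steps)
            lr, radius = lr0 * decay, radius0 * decay
            br, bc = find_bmu(weights, x)      # find Best Matching Unit
            # Update the winner and its neighbours, weighted by grid distance.
            for r in range(rows):
                for c in range(cols):
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2 * radius ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


# Hypothetical input: a few RGB colours with components in [0, 1].
colours = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1)]
som = train_som(colours, rows=4, cols=4)
print(find_bmu(som, (1, 0, 0)))
```

Each update moves a weight vector part of the way toward the input, so with inputs in [0, 1] the weights stay in that range.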
35
Kohonen SOMs
Distance related learning
36
Kohonen SOMs
37
Some nice illustrations
38
Kohonen SOMs
  • Kohonen SOM Demo (from ai-junkie.com)
  • mapping a 3D colorspace on a 2D Kohonen map

39
Performance Analysis
  • K-Means:
  • Depends a lot on a priori knowledge (K)
  • Very stable
  • Leader-Follower:
  • Depends a lot on a priori knowledge (threshold)
  • Faster but unstable

40
Performance Analysis
  • Self-Organizing Map:
  • Stability and Convergence Assured
  • Principle of self-ordering
  • Slow and many iterations needed for convergence
  • Computationally intensive

41
Conclusion
  • No Free Lunch theorem:
  • Any elevated performance over one class of
    problems is exactly paid for in performance over
    another class
  • Ensemble clustering?
  • Use SOM and basic Leader-Follower to identify
    clusters, and then use k-means clustering to
    refine them.

42
Any Questions?