Title: Pattern Recognition: Statistical and Neural
Nanjing University of Science and Technology
Pattern Recognition: Statistical and Neural
Lonnie C. Ludeman
Lecture 27, Nov 9, 2005
Lecture 27 Topics
- K-Means Clustering Algorithm Details
- K-Means Step-by-Step Example
- ISODATA Algorithm: Overview
- Agglomerative Hierarchical Clustering: Algorithm Description
K-Means Clustering Algorithm
Basic Procedure:
1. Randomly select K cluster centers from the pattern space.
2. Distribute the set of patterns to the cluster centers using minimum distance.
3. Compute new cluster centers for each cluster.
4. Continue this process until the cluster centers do not change.
Flow Diagram for K-Means Algorithm
Step 1: Initialization
Choose K initial cluster centers M1(1), M2(1), ..., MK(1).
Method 1: First K samples
Method 2: K data samples selected randomly
Method 3: K random vectors
Set m = 1 and go to Step 2.
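The three initialization methods can be sketched in Python as follows (a minimal sketch; the function name, the list-of-tuples data layout, and the `seed` parameter are my own choices, not from the slides):

```python
import random

def init_centers(patterns, K, method=2, seed=None):
    """Choose K initial cluster centers M1(1), ..., MK(1)."""
    rng = random.Random(seed)
    if method == 1:          # Method 1: first K samples
        return [list(x) for x in patterns[:K]]
    if method == 2:          # Method 2: K data samples selected randomly
        return [list(x) for x in rng.sample(patterns, K)]
    # Method 3: K random vectors, drawn here inside the data's bounding box
    dims = list(zip(*patterns))
    return [[rng.uniform(min(d), max(d)) for d in dims] for _ in range(K)]
```

For example, `init_centers(data, K=2, method=1)` returns the first two samples as the initial centers.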
Step 2: Determine New Clusters
Using the cluster centers, distribute the pattern vectors by minimum distance.
Method 1: Use Euclidean distance
Method 2: Use other distance measures
Assign sample xj to cluster Clk if ||xj - Mk(m)|| <= ||xj - Mi(m)|| for all i = 1, 2, ..., K.
Go to Step 3.
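The minimum-distance assignment rule of Step 2 can be sketched as follows (Euclidean distance, Method 1; ties go to the lower-indexed center, one reasonable convention):

```python
import math

def assign_clusters(patterns, centers):
    """Step 2: assign each sample to the nearest cluster center
    (Euclidean distance; the first center wins on a tie)."""
    def dist(x, m):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, m)))
    labels = []
    for x in patterns:
        d = [dist(x, m) for m in centers]
        labels.append(d.index(min(d)))   # index of the closest center
    return labels
```

Swapping `dist` for another measure (Method 2, e.g. city-block distance) changes only the inner function.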
Step 3: Compute New Cluster Centers
Using the new cluster assignment Clk(m), k = 1, 2, ..., K, compute the new cluster centers Mk(m+1), k = 1, 2, ..., K, using
Mk(m+1) = (1/Nk) * sum of xj over all xj in Clk(m)
where Nk, k = 1, 2, ..., K, is the number of pattern vectors in Clk(m).
Go to Step 4.
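The mean update of Step 3 can be sketched as follows (keeping the old center when a cluster receives no samples is an assumed convention, not stated on the slides):

```python
def update_centers(patterns, labels, K, old_centers):
    """Step 3: Mk(m+1) = (1/Nk) * sum of samples assigned to Clk(m).
    An empty cluster keeps its old center (one common convention)."""
    centers = []
    for k in range(K):
        members = [x for x, lbl in zip(patterns, labels) if lbl == k]
        if not members:
            centers.append(list(old_centers[k]))
            continue
        Nk = len(members)  # number of pattern vectors in Clk(m)
        centers.append([sum(coords) / Nk for coords in zip(*members)])
    return centers
```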
Step 4: Check for Convergence
Using the cluster centers from Step 3, check for convergence. Convergence occurs if the means do not change, i.e. Mk(m+1) = Mk(m) for all k.
If convergence occurs, clustering is complete and the results are given.
If there is no convergence, go to Step 5.
Step 5: Check for Maximum Number of Iterations
Define MAXIT as the maximum number of iterations that is acceptable.
If m >= MAXIT, then display "no convergence" and stop.
If m < MAXIT, then set m = m + 1 (increment m) and return to Step 2.
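Steps 1 through 5 combine into a single loop; a minimal self-contained sketch (the function and parameter names are my own, and random initialization, Method 2, is assumed):

```python
import math
import random

def kmeans(patterns, K, max_it=100, seed=0):
    """Steps 1-5 in one loop: initialize, assign, recompute,
    and stop when the centers no longer change or m reaches MAXIT."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(patterns, K)]   # Step 1
    for m in range(1, max_it + 1):                         # Step 5 bound
        # Step 2: minimum-distance (Euclidean) assignment
        labels = [min(range(K), key=lambda k: math.dist(x, centers[k]))
                  for x in patterns]
        # Step 3: new centers as cluster means (empty cluster keeps its center)
        new = []
        for k in range(K):
            members = [x for x, lbl in zip(patterns, labels) if lbl == k]
            new.append([sum(c) / len(members) for c in zip(*members)]
                       if members else centers[k])
        if new == centers:                                 # Step 4: converged
            return labels, centers
        centers = new
    return labels, centers    # reached MAXIT without convergence
```

On well-separated data the loop typically converges in a handful of iterations, matching the behavior traced in the example that follows.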
Example: K-Means Clustering Algorithm
Given the following set of pattern vectors: [the sample vectors appeared in a table on the original slide]
[Figure: plot of the data points in the given set of samples]
Do the following: [the tasks appeared on the original slide; parts (a) and (b) below cluster the data into 2 and 3 classes]
(a) Solution: 2-class case
Initial cluster centers
[Figure: plot of the data points with the initial cluster centers marked]
Initial Cluster Centers
[Table: distances from all samples to the two initial cluster centers; each sample is labeled Cl1 or Cl2 by minimum distance. With a tie, select randomly.]
First Cluster Assignment
[Figure: plot of the data points, each marked as closest to x1 or x2]
Compute the new cluster centers.
New Cluster Centers
[Figure: plot of the data points with the updated cluster centers]
[Table: distances from all samples to the cluster centers M1(2) and M2(2); each sample is labeled Cl1 or Cl2 by minimum distance.]
Second Cluster Assignment
New Clusters
[Figure: plot showing the old cluster centers and the new clusters formed around M1(2) and M2(2)]
Compute New Cluster Centers
Cluster Centers
[Figure: plot of the data points with the new clusters around the updated centers M1(3) and M2(3)]
[Table: distances from all samples to the cluster centers M1(3) and M2(3); each sample is labeled Cl1 or Cl2 by minimum distance.]
Compute the new cluster centers.
(b) Solution: 3-class case
Select initial cluster centers.
First cluster assignment using distances from the pattern vectors to the initial cluster centers.
Compute the new cluster centers.
Second cluster assignment using distances from the pattern vectors to the cluster centers.
At the next step we have convergence, as the cluster centers do not change; thus the final cluster assignment becomes:
Final 3-Class Clusters
[Figure: plot of the data points showing the three final clusters Cl1, Cl2, Cl3 and the final cluster centers]
ISODATA Algorithm (Iterative Self-Organizing Data Analysis Technique A)
Performs clustering of unclassified quantitative data with an unknown number of clusters. Similar to K-Means, but with the ability to merge and split clusters, thus giving flexibility in the number of clusters.
ISODATA Parameters That Need to Be Specified
[List: the original slide itemized the required parameters, including thresholds governing how clusters are split and merged at each step.]
Requires more specified information than the K-Means algorithm.
ISODATA Algorithm
[Figure: flow diagram of the ISODATA algorithm, ending in the final clustering]
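The distinguishing merge/split adjustment of ISODATA can be sketched roughly as follows. This is a simplified illustration, not the exact formulation from the lecture: the thresholds `theta_split` and `theta_merge`, the split-along-one-axis rule, and the midpoint merge are all assumed conventions.

```python
import math

def split_and_merge(centers, clusters, theta_split=2.0, theta_merge=1.0):
    """One ISODATA-style adjustment pass after a K-Means assignment:
    split clusters with large spread, merge centers that are too close.
    `clusters` is a list of lists of sample vectors, one per center."""
    new_centers = []
    for m, pts in zip(centers, clusters):
        # largest per-dimension standard deviation of the cluster
        if len(pts) > 1:
            spread = max(
                math.sqrt(sum((x[d] - m[d]) ** 2 for x in pts) / len(pts))
                for d in range(len(m)))
        else:
            spread = 0.0
        if spread > theta_split:   # split: two centers offset along axis 0
            new_centers.append([m[0] - spread, *m[1:]])
            new_centers.append([m[0] + spread, *m[1:]])
        else:
            new_centers.append(list(m))
    # merge any pair of centers closer than theta_merge (replace by midpoint)
    merged, used = [], set()
    for i, a in enumerate(new_centers):
        if i in used:
            continue
        for j in range(i + 1, len(new_centers)):
            if j not in used and math.dist(a, new_centers[j]) < theta_merge:
                a = [(p + q) / 2 for p, q in zip(a, new_centers[j])]
                used.add(j)
                break
        merged.append(a)
    return merged
```

In a full ISODATA run this pass alternates with the K-Means assign/update steps, so the number of clusters can grow or shrink between iterations.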
Hierarchical Clustering
Approach 1: Agglomerative, which combines groups at each level.
Approach 2: Divisive, which splits groups at each level.
We will present only agglomerative hierarchical clustering, as it is the most used.
Agglomerative Hierarchical Clustering
Consider a set S of patterns to be clustered:
S = { x1, x2, ..., xk, ..., xN }
Define Level N by
S1(N) = { x1 }
S2(N) = { x2 }
...
SN(N) = { xN }
Clusters at Level N are the individual pattern vectors.
Define Level N - 1 to be the N - 1 clusters formed by merging two of the Level N clusters by the following process.
Compute the distances between all the clusters at Level N and merge the two with the smallest distance (resolving ties randomly) to give the Level N - 1 clusters
S1(N-1), S2(N-1), ..., SN-1(N-1)
Clusters at Level N - 1 result from this merging.
The process of merging two clusters at each step is performed sequentially until Level 1 is reached. Level 1 is a single cluster containing all samples:
S1(1) = { x1, x2, ..., xk, ..., xN }
Thus hierarchical clustering provides cluster assignments for every number of clusters from N down to 1.
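The merge loop above can be sketched as follows. Single linkage (minimum pairwise distance between clusters) is one common choice of inter-cluster distance; the slides do not fix a particular one. Ties here go to the first pair found rather than being broken randomly.

```python
import math

def agglomerate(patterns):
    """Merge the two closest clusters (single linkage) one step at a
    time, recording the clusters at every level from N down to 1."""
    clusters = [[x] for x in patterns]      # Level N: one cluster per sample
    levels = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # find the pair of clusters with the smallest inter-cluster distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the two clusters
        del clusters[j]
        levels[len(clusters)] = [c[:] for c in clusters]
    return levels    # levels[k] is the clustering with k clusters
```

Because every level is recorded, `levels[k]` gives the k-cluster assignment directly, which is exactly the "all numbers of clusters from N to 1" property stated above.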
Definition
A dendrogram is a tree-like structure that illustrates the merging of clusters at each step of the hierarchical approach.
A typical dendrogram appears on the next slide.
Typical Dendrogram
[Figure: a typical dendrogram]
Summary: Lecture 27
- Presented the K-Means clustering algorithm in detail
- Showed an example of clustering using the K-Means algorithm (step by step)
- Briefly discussed the ISODATA algorithm
- Introduced the agglomerative hierarchical clustering algorithm
End of Lecture 27