Title: K-means*: Clustering by Gradual Data Transformation
- Mikko Malinen and Pasi Fränti
- Speech and Image Processing Unit
- School of Computing
- University of Eastern Finland
K-means clustering
- Gradual transformation of data
Data
K-means clustering
- Iterate between two steps
- 1. Assignment step
- Assign the points to the nearest centroids
- 2. Update step
- Update the location of centroids
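The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration of plain k-means, not the authors' implementation; the function names, the `max_iter` cap, and the choice to leave an empty cluster's centroid where it was are assumptions of this sketch.

```python
import numpy as np

def assign(points, centroids):
    # Assignment step: index of the nearest centroid for every point.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update(points, labels, centroids):
    # Update step: move each centroid to the mean of its assigned points.
    # Assumption of this sketch: an empty cluster keeps its old centroid.
    new = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members) > 0:
            new[j] = members.mean(axis=0)
    return new

def kmeans(points, centroids, max_iter=100):
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    # Recompute labels so they match the returned centroids.
    return centroids, assign(points, centroids)
```

Calling `kmeans(points, initial_centroids)` returns the final centroids and the cluster index of each point. The result depends on the initial centroids, which is the weakness the rest of the talk addresses.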
Example of clustering (s2 dataset)
[Figures: eleven snapshots of the clustering, labelled 0, 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100 done]
Time Complexity
Initialization
Data set transform
Empty clusters removal
K-means
Algorithm total
Time Complexity
Fixed k-means
Initialization
Data set transform
Empty clusters removal
K-means
Algorithm total
Datasets
Dataset   d    n       k
s1        2    5000    15
s2        2    5000    15
s3        2    5000    15
s4        2    5000    15
bridge    16   4096    256
missa     16   6480    256
house     3    34000   256
thyroid   5    215     2
iris      4    150     2
wine      13   178     3
Mean square error
Dataset k-means proposed GKM optimal
s1 1.85 1.01 0.89 0.89
s2 1.94 1.52 1.33 1.33
s3 1.97 1.71 1.69 1.69
s4 1.69 1.63 1.57 1.57
bridge 168.2 164.7 164.1 160.7
missa 5.33 5.15 5.34 5.12
house 9.88 9.48 5.94 5.86
thyroid 6.97 6.92 1.52 1.52
iris 3.70 3.70 2.02 2.02
wine 1.92 1.90 0.88 0.88
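For reference, a mean square error of a clustering can be computed as sketched below. The slides do not state which normalization was used for the table above, so this is only one common definition: the squared distance from each point to its assigned centroid, averaged over the points, with an optional (assumed) division by the number of dimensions. The function name and the `per_dimension` flag are hypothetical.

```python
import numpy as np

def mean_square_error(points, centroids, labels, per_dimension=False):
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    # Squared Euclidean distance from each point to its own centroid.
    sq = ((points - centroids[labels]) ** 2).sum(axis=1)
    mse = sq.mean()
    if per_dimension:
        # Assumed variant: also normalize by the dimensionality d.
        mse /= points.shape[1]
    return mse
```

With either normalization, a lower value means the centroids sit closer to the points assigned to them.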
Mean square error vs. number of steps
[Figures: seven plots of mean square error against the number of steps]
Number of incorrect clusters
Incorrect clusters   proposed   k-means
0 (all correct)      36         14
1                    64         38
2                    0          34
3                    0          10
Summary
- We have presented a clustering method based on
gradual transformation of data and k-means.
Instead of fitting the model to data, we fit the
data to a model.
- The proposed method gives a lower mean square
error than k-means on nine of the ten datasets
tested, and the same error on iris.
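The idea of fitting the data to a model can be illustrated with the sketch below. It is one plausible reading of the summary, not the authors' algorithm: the slides do not specify how the model is built, how points are mapped to it, or how empty clusters are removed. The sketch assumes that every point starts at an artificial position on one of k model centroids, that the data are moved back toward their original positions by linear interpolation in `steps` equal steps, and that a couple of k-means iterations run after each step. All names and parameters are hypothetical, and empty clusters simply keep their previous centroid.

```python
import numpy as np

def kmeans_iterations(points, centroids, n_iter=2):
    # A few assignment/update rounds starting from the given centroids.
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = centroids.copy()
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members) > 0:  # empty cluster: keep old centroid (assumption)
                centroids[j] = members.mean(axis=0)
    return centroids

def gradual_kmeans(data, k, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Assumed model: k centroids placed on randomly chosen data points.
    model = data[rng.choice(n, size=k, replace=False)]
    # Assumed mapping: points dealt to the k model clusters in near-equal shares.
    owner = rng.permutation(n) % k
    artificial = model[owner]  # every point sits exactly on its model centroid
    centroids = model.copy()
    for s in range(1, steps + 1):
        t = s / steps  # fraction of the way back to the original data
        current = (1.0 - t) * artificial + t * data
        centroids = kmeans_iterations(current, centroids)
    # At t = 1 the data are back in their original positions.
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, d.argmin(axis=1)
```

A call such as `gradual_kmeans(data, k=15, steps=10)` returns centroids and labels for the original data. Because the centroids follow the data as they move, each k-means run starts from the solution of the previous step rather than from a fresh initialization.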