Title: Objective Function Based Fuzzy Clustering
1 Objective Function Based Fuzzy Clustering
- Haimin Zhang
- Advisor: Professor Dong-Guk Shin
- Department of Computer Science and Engineering
- University of Connecticut
- May 2005
 
2 Presentation Outline
- Introduction
- Fuzzy clustering
- Fuzzy C-Means and its problems
- Improved algorithms
- Gustafson-Kessel algorithm
- Fuzzy C-Varieties algorithm
- Extended GK with volume prototypes
- Determining the number of clusters: cluster merging
- Research lines
 
3 Introduction to Fuzzy Clustering
- Motivation
- Why do we need clustering?
- Why do we need fuzzy clustering?
 
4 Introduction to Fuzzy Clustering
- Input: unlabeled dataset X = {x_1, ..., x_n}
  - n is the number of data points
  - x_k in R^p, where p is the feature dimension of each data point
- Output
  - Membership matrix U (c × n), where c is the number of clusters
  - Prototype matrix V (c × p); each row denotes one cluster's prototype (cluster center)
- Goal
  - Maximize the similarity within clusters
  - Minimize the similarity between clusters
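To make the shapes concrete, here is a small sketch with made-up numbers (n = 3 points, p = 2 features, c = 2 clusters). It assumes the usual fuzzy-partition constraint that each point's memberships sum to 1 across clusters, which the slide does not state explicitly; the helper name `is_fuzzy_partition` is hypothetical.

```python
# Toy data: n = 3 points with p = 2 features each (values are made up).
X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]]

# Membership matrix U is c x n: U[i][k] is the degree to which
# point k belongs to cluster i.
U = [[0.9, 0.8, 0.1],
     [0.1, 0.2, 0.9]]

# Prototype matrix V is c x p: V[i] is the center of cluster i.
V = [[0.1, 0.05], [5.0, 5.1]]


def is_fuzzy_partition(U, tol=1e-9):
    """Check that memberships lie in [0, 1] and sum to 1 for every point."""
    n = len(U[0])
    in_range = all(0.0 <= u <= 1.0 for row in U for u in row)
    sums_ok = all(abs(sum(row[k] for row in U) - 1.0) <= tol for k in range(n))
    return in_range and sums_ok
```

Unlike a hard partition, no entry of U has to be exactly 0 or 1.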
 
5 One Example
[Figure: membership matrix for an example with c = 4 clusters]
6 Objective Function Based Fuzzy Clustering Methods
- An objective function measures the overall dissimilarity within clusters.
- By minimizing the objective function we can obtain the optimal partition.
- The Fuzzy C-Means algorithm is the most popular objective function based fuzzy clustering method. It is also the common base for most of the newly developed objective function based fuzzy clustering methods.
7 Fuzzy C-Means Clustering
- Objective function
- Alternating minimization procedure
- Termination criterion
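The three ingredients can be sketched as follows. This is a minimal illustration of the standard Fuzzy C-Means formulation from the literature, not necessarily the exact form shown on the slide: the objective is J = sum_i sum_k u_ik^m ||x_k - v_i||^2, centers and memberships are updated in turn, and iteration stops when the largest change in any membership falls below a tolerance. The function names, the defaults (m = 2, eps = 1e-6), and the initialization from randomly chosen data points are assumptions of this sketch.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def update_centers(X, U, m):
    """v_i = sum_k u_ik^m x_k / sum_k u_ik^m."""
    p = len(X[0])
    V = []
    for row in U:
        w = [u ** m for u in row]
        total = sum(w)
        V.append([sum(wk * x[d] for wk, x in zip(w, X)) / total for d in range(p)])
    return V


def update_memberships(X, V, m):
    """u_ik = 1 / sum_j (d_ik^2 / d_jk^2)^(1 / (m - 1))."""
    c, n = len(V), len(X)
    U = [[0.0] * n for _ in range(c)]
    for k, x in enumerate(X):
        d2 = [sq_dist(x, v) for v in V]
        zeros = [i for i in range(c) if d2[i] == 0.0]
        if zeros:  # point coincides with one or more centers
            for i in zeros:
                U[i][k] = 1.0 / len(zeros)
            continue
        for i in range(c):
            U[i][k] = 1.0 / sum((d2[i] / d2[j]) ** (1.0 / (m - 1)) for j in range(c))
    return U


def objective(X, U, V, m):
    """J = sum_i sum_k u_ik^m * ||x_k - v_i||^2."""
    return sum(U[i][k] ** m * sq_dist(X[k], V[i])
               for i in range(len(V)) for k in range(len(X)))


def fcm(X, c, m=2.0, eps=1e-6, max_iter=300, seed=0):
    """Alternate center and membership updates until memberships settle."""
    rng = random.Random(seed)
    V = [list(x) for x in rng.sample(X, c)]  # start from c data points
    U = update_memberships(X, V, m)
    for _ in range(max_iter):
        V = update_centers(X, U, m)
        U_new = update_memberships(X, V, m)
        change = max(abs(a - b) for ra, rb in zip(U_new, U) for a, b in zip(ra, rb))
        U = U_new
        if change < eps:  # termination criterion
            break
    return U, V
```

Calling `fcm(X, c)` on a list of points returns the c × n membership matrix and the c cluster centers; `objective` can be used to monitor J across runs.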
 
8 Problems in FCM method
9 Problems in FCM method
- Clusters have non-spherical shapes.
[Figure: two example datasets, ellipsoids and linear varieties (lines); panel labels N = 5000, C = 2 and N = 2000, C = 2]
10 Problems in FCM method
- Clusters have different sizes and densities.
[Figure: two example datasets; both panels labeled N = 2000, C = 2]
11 Problems in FCM method
- Determining the number of clusters
 
12 Gustafson-Kessel Algorithm
- Objective function becomes
- A cluster-specific matrix, built from the covariance matrix for cluster i, gives the shape of cluster i. Including it in the distance calculation reduces the distance along the long axis and magnifies the distance along the short axis, which transforms the cluster shape into a sphere with the same volume.
- The Gustafson-Kessel algorithm performs well for clusters with ellipsoidal shapes, but it has the same problem as FCM when clusters have different sizes and densities.
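As a rough illustration of this distance, the sketch below uses the Gustafson-Kessel form as it is commonly written in the literature: d^2 = (x - v)^T A (x - v) with A = (rho * det F)^(1/p) * F^(-1), where F is the fuzzy covariance matrix of the cluster. The symbols F, A, and rho, the default rho = 1, and the restriction to 2-D data (so that no linear-algebra library is needed) are assumptions of this sketch and are not taken from the slide.

```python
def fuzzy_covariance_2d(X, u, v, m=2.0):
    """F = sum_k u_k^m (x_k - v)(x_k - v)^T / sum_k u_k^m for one cluster."""
    w = [uk ** m for uk in u]
    total = sum(w)
    dx = [(x[0] - v[0], x[1] - v[1]) for x in X]
    sxx = sum(wk * a * a for wk, (a, b) in zip(w, dx)) / total
    sxy = sum(wk * a * b for wk, (a, b) in zip(w, dx)) / total
    syy = sum(wk * b * b for wk, (a, b) in zip(w, dx)) / total
    return [[sxx, sxy], [sxy, syy]]


def gk_sq_distance_2d(x, v, F, rho=1.0):
    """d^2 = (x - v)^T A (x - v) with A = (rho * det F)^(1/2) * F^-1."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    scale = (rho * det) ** 0.5  # (rho * det F)^(1/p) with p = 2
    # Inverse of a 2x2 matrix: adjugate divided by the determinant.
    inv = [[F[1][1] / det, -F[0][1] / det],
           [-F[1][0] / det, F[0][0] / det]]
    a, b = x[0] - v[0], x[1] - v[1]
    quad = a * (inv[0][0] * a + inv[0][1] * b) + b * (inv[1][0] * a + inv[1][1] * b)
    return scale * quad
```

For a cluster stretched along the x-axis, for example F = [[4, 0], [0, 1]], a point two units away along x gets a squared distance of 2 instead of the Euclidean 4, while a point two units away along y gets 8.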
13 Gustafson-Kessel Algorithms
14 Fuzzy C-Varieties Algorithm
- Objective function becomes
- A set of direction vectors gives the principal scatter directions of cluster i. Subtracting the distances along the principal scatter directions from the Euclidean distance ignores the distance along those directions, so the center of the cluster becomes a linear variety (a line in the 2-dimensional case).
- The Fuzzy c-Varieties algorithm can efficiently detect linear-variety substructure in the data. However, it tends to grab points from other clusters along its principal directions, because it ignores the distance along those directions.
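The distance described above can be sketched as follows, assuming the principal directions are given as orthonormal vectors: d^2 = ||x - v||^2 - sum_j ((x - v) . s_j)^2. The function name and the symbols v and s_j are assumptions of this sketch, not notation recovered from the slide.

```python
def fcv_sq_distance(x, v, directions):
    """Squared distance from x to the linear variety through v that is
    spanned by the orthonormal vectors in `directions`."""
    diff = [xi - vi for xi, vi in zip(x, v)]
    euclid = sum(d * d for d in diff)
    # Squared length of the component of (x - v) along each direction.
    along = sum(sum(d * s for d, s in zip(diff, sj)) ** 2 for sj in directions)
    return max(0.0, euclid - along)  # clip tiny negatives from rounding
```

With a single direction along the x-axis, the point (100, 0.5) has a squared distance of only 0.25 to a cluster centered at the origin, which illustrates why the algorithm can grab distant points lying along a cluster's principal direction.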
15 Fuzzy C-Varieties Algorithm
[Figure: results of Fuzzy c-Means and Fuzzy c-Varieties; N = 2000, C = 2]
16 Clustering with Volume Prototypes
- Point prototype: no membership degree of exactly 0 or 1 is assigned, no matter how close a point is to the center of a cluster.
- Volume prototype: when a data point is very close to a cluster center, it can be considered to belong fully to that cluster.
- Clustering with volume prototypes can take cluster sizes into account, which improves performance when clusters have different sizes.
- Clustering with volume prototypes can discard the effect on a cluster of data points that are far away from it, which improves performance when clusters have different densities.
17 Extended GK with Volume Prototypes
- Objective function becomes
- Membership degree assignment rules
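One plausible reading of the membership rules is sketched below. It assumes a spherical volume prototype with Euclidean distance (simpler than the GK norm the slide title refers to), measures distance from the surface of the volume rather than from its center, gives full membership to points inside a volume, and falls back to the usual FCM rule elsewhere. The function names, the radius parameter, and the equal sharing of membership between overlapping volumes are assumptions of this sketch.

```python
import math


def volume_distance(x, v, r):
    """Distance from x to a spherical volume prototype (center v, radius r);
    zero for any point inside the volume."""
    d = math.sqrt(sum((xi - vi) ** 2 for xi, vi in zip(x, v)))
    return max(0.0, d - r)


def volume_memberships(x, centers, radii, m=2.0):
    """Membership degrees of a single point x in each cluster."""
    d = [volume_distance(x, v, r) for v, r in zip(centers, radii)]
    inside = [i for i, di in enumerate(d) if di == 0.0]
    if inside:
        # Full membership, shared equally if several volumes contain x.
        return [1.0 / len(inside) if i in inside else 0.0 for i in range(len(d))]
    # Otherwise the usual FCM rule, applied to the reduced distances.
    return [1.0 / sum((di / dj) ** (2.0 / (m - 1)) for dj in d) for di in d]
```

A point inside one volume gets membership 1 in that cluster and 0 elsewhere, which a point prototype can never produce.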
 
18 Extended GK with Volume Prototypes
[Figure: panels titled Fuzzy c-Means and Fuzzy c-Varieties; N = 2000, C = 2]
19 Number of Clusters: Cluster Merging I
- Merging by closeness
- The closeness between two clusters is calculated as the ratio between their radii and the distance between them.
- If [condition], the two clusters are completely separated.
- If [condition], the two clusters are merged to form a new cluster with [new parameters].
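The closeness test can be sketched as below. Because the exact conditions are not given in the text, this sketch assumes the ratio is the sum of the two radii divided by the distance between the centers, and that clusters are merged once the ratio reaches a caller-supplied threshold; with a threshold of 1, a ratio below 1 means the two spheres do not touch. The function names and the default threshold are hypothetical, and the parameters of the merged cluster are deliberately left out.

```python
import math


def closeness(v_i, r_i, v_j, r_j):
    """Ratio of the summed radii to the distance between the two centers."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))
    if d == 0.0:
        return math.inf  # coincident centers: as close as possible
    return (r_i + r_j) / d


def should_merge(v_i, r_i, v_j, r_j, threshold=1.0):
    """Hypothetical rule: merge once the closeness ratio reaches `threshold`."""
    return closeness(v_i, r_i, v_j, r_j) >= threshold
```

For two unit-radius clusters whose centers are 4 apart, the ratio is 0.5 and they stay separate; at a distance of 1.5 the ratio exceeds 1 and they become merge candidates.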
20 Number of Clusters: Cluster Merging I
- This method is compatible with the objective function. However, it is only suitable for spherical clusters.
- Idea for improvement
  - Map the distance and radii onto each direction, calculate the similarity ratio in each direction, and average them (weighted average).
21 Number of Clusters: Cluster Merging II
- Merging by similarity
- This method does not depend on cluster shapes and sizes. However, the similarity value may be affected by data points that are far away from both clusters, especially when the two clusters have lower density than the others.
- Idea for improvement
  - In the calculation, drop the data points whose membership degrees are below a threshold for both clusters.
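The improvement idea can be sketched as follows. The similarity measure itself is not given in the text, so this sketch substitutes a fuzzy Jaccard index over the two clusters' membership rows (sum of element-wise minima divided by sum of element-wise maxima); that choice, the function name, and the `threshold` parameter are assumptions made for illustration.

```python
def cluster_similarity(u_i, u_j, threshold=0.0):
    """Fuzzy Jaccard similarity of two membership rows.

    Points whose membership is below `threshold` in both clusters are
    skipped, so points far from both clusters do not affect the value.
    """
    num = den = 0.0
    for a, b in zip(u_i, u_j):
        if a < threshold and b < threshold:
            continue
        num += min(a, b)
        den += max(a, b)
    return num / den if den > 0.0 else 0.0
```

With u_i = [0.8, 0.1, 0.1] and u_j = [0.1, 0.8, 0.1], the third point belongs mostly to some other cluster yet still raises the similarity; setting threshold = 0.2 removes it from the calculation and lowers the value.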
22 Future Research Lines
- Extend other fuzzy clustering algorithms with volume prototypes (Gath-Geva algorithm, fuzzy c-shells, fuzzy c-rings, etc.).
- Develop a method to decide the optimal fuzzifier m, or find new transform functions that can introduce fuzziness into the model.
- Incorporate the data's distribution into fuzzy clustering methods.
- Parallel algorithms and implementations.
- Apply fuzzy clustering to microarray data analysis.
24 Questions and Comments
Thanks!