Title: Comparison of Spectral Clustering Methods
1Comparison of Spectral Clustering Methods
- Quals Talk
- by
- Deepak Verma
2Project goals
- Many spectral clustering algorithms exist.
- Show that they are competitive and stable to noise.
- Compare their performance to see which of them works best.
- Proved the near equivalence of two of the popular algorithms.
3Outline
- Introduction
- Algorithms
- Theoretical Results
- Experimental Setup
- Results and Discussion
- Conclusion
4Clustering
- Clustering: partitioning into dissimilar groups of similar objects.
- Old problem with many variants.
- Very hard problem, as it is not clear what the definition of a cluster should be.
5Spectral Clustering
- Use the top eigenvectors of a matrix derived
from similarities of the objects
6Intuition
- Map the objects into points in some spectral domain using the similarity matrix.
- Cluster these points in that domain using a classic clustering algorithm.
[Figure: points in the original domain mapped to the K-dim spectral domain]
7Problem Formulation
- Only the similarities between objects are given. The points themselves may or may not actually exist.
- The similarities are positive and symmetric.
- The number of clusters K is given.
8Graphical Interpretation
- Points $i \in I$: vertices of graph $G$
- Edges $ij$: pairs with $S_{ij} > 0$
- $D_{ii} = \sum_{j=1}^{n} S_{ij}$: degree of $i$
- $\mathrm{vol}\, A = \sum_{i \in A} D_{ii}$: volume of $A \subseteq I$
- Clustering: partitioning
9Notation
10Good partitioning
- How do we measure a good partitioning?
- Minimize Normalized Cut (Ncut)
- Minimize Conductance
- Both are NP-hard. But how do we calculate them?
11Outline
- Introduction
- Algorithms
- Theoretical Results
- Experimental Setup
- Results and Discussion
- Conclusion
12Algorithms
- Spectral
- Shi and Malik 97 (PAMI 2000) (SM)
- Kannan, Vempala and Vetta (FOCS 2000) (KVV)
- Ng, Jordan and Weiss (NIPS 02) (NJW)
- Meila and Shi (AISTATS 01) (Mcut)
- Classic
- Anchor (Moore, UAI 2000)
- Linkage algorithms (single, ward)
13Spectral Algorithms
- Spectral Algorithms
- Multiway: Mcut, NJW
- Recursive: SM, KVV
14Meila-Shi Algorithm (Multiway) (mcut)
- Compute the stochastic matrix $P = D^{-1} S$.
- Let $V_{n \times k}$ be the eigenvectors corresponding to the $k$ largest eigenvalues of $P$.
- Consider the rows of $V$ as ($k$-dim) points in the spectral domain, $\gamma_1, \ldots, \gamma_n$.
- Cluster $\gamma_1, \ldots, \gamma_n$ using the classic algorithms.
- Under certain conditions, the clustering so found minimizes the generalized Ncut on the original graph.
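The mapping step of this slide can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `mcut_embedding` and the toy matrix `S` are made up here, and the final step (clustering the rows with a classic algorithm) is left out.

```python
import numpy as np

def mcut_embedding(S, k):
    """Map n objects to k-dim spectral points (rows of the result).

    Illustrative sketch only; assumes every row of S has a positive sum.
    """
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)                        # degrees D_ii
    P = S / d[:, None]                       # stochastic matrix P = D^{-1} S
    eigvals, eigvecs = np.linalg.eig(P)      # P is not symmetric in general
    order = np.argsort(-eigvals.real)[:k]    # indices of the k largest eigenvalues
    return eigvecs[:, order].real

# Hypothetical toy data: a block diagonal similarity matrix with two clusters.
S = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
V = mcut_embedding(S, k=2)
# Points of the same cluster land on the same spectral point.
same = np.linalg.norm(V[0] - V[1])
different = np.linalg.norm(V[0] - V[2])
```

On this toy input the two points of each block coincide in the spectral domain, which is the "perfect" behaviour discussed later in the talk.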
15NJW Algorithm (Multiway) (ang)
- Set the diagonal elements $S_{ii}$ to 0.
- Compute $L = D^{-1/2} S D^{-1/2}$ (related to the Laplacian).
- Let $U_{n \times k}$ be the top $k$ eigenvectors of $L$.
- Form $Y$ by normalizing the rows of $U$ to unit length.
- Group the rows of $Y$ using K-means with orthogonal initialization.
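A minimal NumPy sketch of the first four steps, assuming a symmetric similarity matrix in which every point keeps a positive degree after the diagonal is zeroed. The name `njw_embedding` and the toy matrix are illustrative, and the K-means step is not shown.

```python
import numpy as np

def njw_embedding(S, k):
    """Return Y, the row-normalized top-k eigenvectors of D^{-1/2} S D^{-1/2}."""
    S = np.array(S, dtype=float)             # copy, so the caller's matrix is untouched
    np.fill_diagonal(S, 0.0)                 # set S_ii = 0
    d = S.sum(axis=1)                        # degrees (assumed > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)     # symmetric solver, ascending eigenvalues
    U = eigvecs[:, -k:]                      # top k eigenvectors
    return U / np.linalg.norm(U, axis=1, keepdims=True)

# Hypothetical toy data: two blocks with no similarity between them.
S = np.array([[1.0, 0.9, 0.0, 0.0],
              [0.9, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.8],
              [0.0, 0.0, 0.8, 1.0]])
Y = njw_embedding(S, k=2)
dot_across = float(Y[0] @ Y[2])              # inner product of points from different clusters
```

For this block diagonal input the rows of `Y` from different clusters come out orthogonal, so any reasonable K-means initialization separates them.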
16Recursive Algorithms (SM,KVV)
- Partition the graph into two clusters and then recursively partition the clusters.
- Map all the points onto the second largest eigenvector of $P$ and partition based on that.
- Difference:
- SM is based on Ncut. Guaranteed to minimize Ncut under certain conditions.
- KVV is based on conductance.
17List of Algorithms Used
- Two linkage algorithms: single, ward
- 6 recursive algorithms
- {shi_r, kvv_add, kvv_mult} × {ncut, cond}
- 6 spectral algorithms
- {ang, mcut} mapping × {anchor, kmean, ward}
- 6 doubly spectral algorithms
- {ang_ang, mcut_mcut, ang_mcut} × {kmean, ward}
18Outline
- Introduction
- Algorithms
- Theoretical Results
- Experimental Setup
- Results and Discussion
- Conclusion
19Theoretical Results
- Stable method for ang,mcut
- Conditions for perfect S for ang and mcut
- Broader set of conditions for recursive variants.
20Modification to ang,mcut
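- $L u = \lambda u$ (definition of eigenvector)
- $D^{-1/2} S D^{-1/2} u = \lambda u$
- Pre-multiplying by $D^{-1/2}$:
- $D^{-1} S D^{-1/2} u = \lambda D^{-1/2} u$
- Substituting $v = D^{-1/2} u$:
- $D^{-1} S v = \lambda v$ ($\Leftrightarrow P v = \lambda v$)
- Pre-multiplying by $D$:
- $S v = \lambda D v$ (use this instead of $P$ or $L$)
- We have $v$ for Mcut and, from that, $u = D^{1/2} v$ for Ang.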
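The chain of identities on this slide can be checked numerically. The sketch below uses a made-up random symmetric matrix and plain NumPy; it solves the symmetric problem for $u$, maps to $v = D^{-1/2} u$, and measures how well $S v = \lambda D v$ and $P v = \lambda v$ hold. It is a sanity check of the algebra, not the solver used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5))
S = (A + A.T) / 2                      # hypothetical symmetric, positive similarities
d = S.sum(axis=1)                      # degrees D_ii
D = np.diag(d)

L = S / np.sqrt(np.outer(d, d))        # D^{-1/2} S D^{-1/2}
lam, U = np.linalg.eigh(L)             # L u = lambda u
V = U / np.sqrt(d)[:, None]            # v = D^{-1/2} u, applied column by column
P = S / d[:, None]                     # P = D^{-1} S

resid_generalized = np.abs(S @ V - (D @ V) * lam).max()   # S v = lambda D v
resid_stochastic = np.abs(P @ V - V * lam).max()          # P v = lambda v
U_back = np.sqrt(d)[:, None] * V                          # u = D^{1/2} v
```

Both residuals are at rounding-error level, and `U_back` reproduces `U`, which is the sense in which one eigen-solve serves both Mcut and Ang.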
21Gains of the above proof
- Numerically more stable than Mcut, NJW.
- Is the basis of the near equivalence of Mcut and NJW.
22Definitions
- A matrix $S$ is block diagonal w.r.t. a clustering $\Delta = \{C_1, \ldots, C_K\}$ if $S_{ij} = 0$ whenever $i, j$ belong to different clusters.
23Definitions (contd)
- A stochastic matrix $P$ is block stochastic w.r.t. a clustering $\Delta = \{C_1, \ldots, C_K\}$ iff
- $R_{k'k} = \sum_{j \in C_k} P_{ij}$ has the same value for all points $i \in C_{k'}$, for all $k, k' = 1, \ldots, K$
- $R$ is nonsingular.
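To make the definition concrete, here is a small check written as a sketch. The function name, the tolerance, and the use of a determinant threshold as the nonsingularity test are all assumptions made for illustration; clusters are assumed non-empty and labelled 0..K-1.

```python
import numpy as np

def block_stochastic_R(P, labels, K, tol=1e-9):
    """Return the K x K matrix R if P is block stochastic w.r.t. labels, else None.

    R[a, b] = sum of P[i, j] over j in cluster b, for any point i in cluster a.
    """
    P = np.asarray(P, dtype=float)
    labels = np.asarray(labels)
    # M[i, b] = total transition probability from point i into cluster b
    M = np.stack([P[:, labels == b].sum(axis=1) for b in range(K)], axis=1)
    R = np.zeros((K, K))
    for a in range(K):
        rows = M[labels == a]
        if not np.allclose(rows, rows[0], atol=tol):
            return None                      # points of cluster a disagree
        R[a] = rows[0]
    if abs(np.linalg.det(R)) < tol:          # crude nonsingularity test (assumption)
        return None
    return R

# Hypothetical example: block stochastic but not block diagonal.
P_good = np.array([[0.5, 0.3, 0.2],
                   [0.3, 0.5, 0.2],
                   [0.1, 0.3, 0.6]])
P_bad = np.array([[0.5, 0.3, 0.2],
                  [0.2, 0.5, 0.3],
                  [0.1, 0.3, 0.6]])
R_good = block_stochastic_R(P_good, [0, 0, 1], K=2)
R_bad = block_stochastic_R(P_bad, [0, 0, 1], K=2)
```

Here `P_good` has nonzero entries between clusters, yet every point of a cluster sends the same total probability to each cluster, so it qualifies; `P_bad` does not.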
24Block stochastic
25Definitions (contd)
- A similarity matrix $S$ is perfect for a spectral method w.r.t. a clustering if all the points in the same cluster $C_i$ are mapped to exactly the same point $\gamma_i$ in the spectral domain.
26Known Theoretical Results
- Block diagonal $S$ ($\Leftrightarrow P$) is perfect for all spectral algorithms.
- In the case of NJW it gives orthogonal clusters in the spectral domain.
- Block stochastic $P$ is perfect for Mcut.
27Our contribution.
- Block stochastic $P$ is perfect for NJW and Mcut.
- The clusters in the spectral domain are orthogonal.
- Block stochastic $P$ is good for the recursive algorithms as well (and perfect for one variant of SM).
28Outline
- Introduction
- Algorithms
- Theoretical Results
- Experimental Setup
- Results and Discussion
- Conclusion
29Datasets
- Two artificial data sets
- A similarity matrix with a block stochastic $P$.
- Various 2D points
- Two real data sets
- Gene expression dataset (results from model-based clustering are available)
- A handwritten digit recognition database
- True clustering is always available.
30Evaluating clustering performance
- Measure it as the deviation of the clustering $C$ from the true clustering $C_{true}$ (symmetric).
- Two measures:
- Clustering Error: classification kind of error measure.
- Variation of Information: information-theoretic measure. (We will not show results on it.)
31Block stochastic
32Experiments Stability
- Added noise to the block diagonal matrix to see the robustness of the various algorithms to noise.
- Uniform (random) noise, scaled so as to preserve the signal-to-noise ratio.
- $S_{ij} \leftarrow S_{ij} + \left( U(0,1)\, \eta\, \sqrt{D_{ii} D_{jj}} \right) / n$
- $\eta$ varied from $10^{-0.1}$ to $10^{0.7}$
- 10 runs for each algorithm and noise level
33Experiments Gene Expression
- For the gene expression data we compared the performance of the spectral algorithms to model-based clustering.
- (Yeung et al., Bioinformatics 2001)
- Comparison was done with the best clustering produced by five different kinds of models.
34Outline
- Introduction
- Algorithms
- Theoretical Results
- Experimental Setup
- Results and Discussion
- Conclusion
35Cluster_ward
[Figure labels: Best recursive, Top 3 Multiway]
36Stability of spectral algorithms
- The multiway spectral algorithms are the most stable of all the algorithms.
- The best of the recursive spectral algorithms are not too far behind, and they catch up as the noise is increased.
- The linkage algorithms are very sensitive to noise.
37[Figure labels: Model, Recursive]
38[Figure labels: Recursive, Model]
39Real Dataset Gene Expression.
- The performance of spectral clustering is competitive with model-based clustering.
- The performance depends on the kind of preprocessing applied to the data.
40[Figure labels: Conductance based, Ncut based]
41Recursive spectral algorithms
- Algorithms based on the Ncut measure are almost always better than those based on conductance.
- The conductance-based algorithms are too sensitive to noise.
42Outline
- Introduction
- Algorithms
- Theoretical Results
- Experimental Setup
- Results and Discussion
- Conclusion
43Conclusions
- Proved equivalence between two existing techniques and generalized the results.
- Demonstrated competitive performance of spectral algorithms.
- Empirically compared the performance of various algorithms containing different components of spectral algorithms.
44Acknowledgements
- Marina Meila
- Ka Yee Yeung
- Thomas Richardson
- Jayant,Ashish
45Future Work
- Explore the SM algorithm with the largest-jump measure.
- Automatically determine the number of clusters (gap, runt analysis).
- Learn the similarity matrix (weights for different dimensions).
46All Beware
- Here begins the world of Extra slides.
- Enter at your own risk.
47Cuts in a graph
- (edge) cut: set of edges whose removal makes a graph disconnected
- weight of a cut:
- $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} S_{ij}$
48The Normalized Cut (NCut)
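- $\mathrm{NCut}(A, \bar{A}) = \mathrm{cut}(A, \bar{A}) \left( \frac{1}{\mathrm{vol}\, A} + \frac{1}{\mathrm{vol}\, \bar{A}} \right)$
- $\min \mathrm{NCut}(A, \bar{A})$
- Small cut between subsets of equal size
- NP-hard
- Examples
- Stochastic interpretation: $P(A \to \bar{A}) + P(\bar{A} \to A)$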
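A small sketch of how the cut weight and the normalized cut could be computed for a given subset, using the standard Shi-Malik form cut/vol(A) + cut/vol(complement). Function names and the toy graph are illustrative assumptions.

```python
import numpy as np

def cut(S, A, B):
    """Total similarity between index lists A and B."""
    return S[np.ix_(A, B)].sum()

def vol(S, A):
    """Sum of the degrees D_ii of the points in A."""
    return S[A, :].sum()

def ncut(S, A):
    """Normalized cut of the bipartition (A, complement of A)."""
    A = list(A)
    B = [i for i in range(S.shape[0]) if i not in A]
    c = cut(S, A, B)
    return c / vol(S, A) + c / vol(S, B)

# Hypothetical graph: two tight pairs joined by two weak edges.
S = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
good = ncut(S, [0, 1])    # separates the two pairs
bad = ncut(S, [0, 2])     # splits both pairs
```

Each of the two terms is the probability that a random walk started in one side (in its stationary distribution) steps to the other side, which is the stochastic interpretation on the slide.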
49Conductance
- Similar to Ncut (want to minimize it).
- Only the size of the smaller cluster is taken into account.
- Stochastic interpretation:
- $\max(P(A \to \bar{A}), P(\bar{A} \to A))$
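As a sketch, conductance can be computed as the cut weight divided by the smaller of the two volumes, which equals the larger of the two crossing probabilities. The function name and the toy graph are assumptions for illustration.

```python
import numpy as np

def conductance(S, A):
    """cut(A, B) / min(vol A, vol B), with B the complement of A."""
    A = list(A)
    B = [i for i in range(S.shape[0]) if i not in A]
    c = S[np.ix_(A, B)].sum()
    vol_A, vol_B = S[A, :].sum(), S[B, :].sum()
    return c / min(vol_A, vol_B)

# Hypothetical graph: two tight pairs joined by two weak edges.
S = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
balanced = conductance(S, [0, 1])
singleton = conductance(S, [0])

# Stochastic view for A = {0}: the two crossing probabilities.
p_out = S[np.ix_([0], [1, 2, 3])].sum() / S[[0], :].sum()
p_in = S[np.ix_([1, 2, 3], [0])].sum() / S[[1, 2, 3], :].sum()
```

Cutting off a single point costs the full degree of that point relative to its own volume, so `singleton` is 1 while the balanced split is far smaller.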
50SM Algorithm (Recursive)
- $P = D^{-1} S$
- Compute $v_2$, the eigenvector of the 2nd largest eigenvalue.
- MIN-RATIO-CUT:
- Sort the elements of $v_2$ ($v_{2i}$) in increasing order.
- Compute $N_i = \mathrm{Ncut}(\{1..i\}, \{i+1..n\})$
- Partition $I$ into the two clusters $C_{i_0}, C'_{i_0}$ where $i_0 = \arg\min_i N_i$
- Repeat recursively with the largest $\lambda_2$
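One bipartition step of this procedure might look as follows. This is a sketch with assumed names (`sm_bipartition`, `ncut_value`) and toy data; it obtains the second eigenvector of $P$ through the symmetric matrix $D^{-1/2} S D^{-1/2}$, and it does not show the recursion.

```python
import numpy as np

def ncut_value(S, mask):
    """Ncut of the split given by a boolean mask over the points."""
    c = S[mask][:, ~mask].sum()
    return c / S[mask].sum() + c / S[~mask].sum()

def sm_bipartition(S):
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    d = S.sum(axis=1)
    L = S / np.sqrt(np.outer(d, d))          # D^{-1/2} S D^{-1/2}
    lam, U = np.linalg.eigh(L)               # ascending eigenvalues
    v2 = U[:, -2] / np.sqrt(d)               # eigenvector of P, 2nd largest eigenvalue
    order = np.argsort(v2)                   # sort the elements of v2
    best_score, best_mask = np.inf, None
    for i in range(1, n):                    # candidate split after i sorted points
        mask = np.zeros(n, dtype=bool)
        mask[order[:i]] = True
        score = ncut_value(S, mask)
        if score < best_score:
            best_score, best_mask = score, mask
    return best_mask, best_score

# Hypothetical data: two groups of three, weakly connected to each other.
S = np.full((6, 6), 0.05)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
mask, score = sm_bipartition(S)
```

Replacing `ncut_value` with a conductance function gives the corresponding step of the KVV variant.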
51KVV algorithm (Recursive)
- $P = D^{-1} S$
- Compute $v_2$, the eigenvector of the 2nd largest eigenvalue.
- MIN-RATIO-CUT:
- Sort the elements of $v_2$ ($v_{2i}$) in increasing order.
- Compute $N_i = \mathrm{Conductance}(\{1..i\}, \{i+1..n\})$
- Partition $I$ into the two clusters $C_{i_0}, C'_{i_0}$ where $i_0 = \arg\min_i N_i$
- Repeat recursively with min conductance
52KVV (continued)
- Two variants are possible depending on how the $P$ of a subset is calculated.
- $P_t$, the matrix at step $t$, is $P$ restricted to the points in this cluster.
- Need to make it stochastic:
- All the entries are scaled up (KVVmult)
- The extra sum is added to the diagonal (KVVadd)
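One plausible reading of the two variants, written as a sketch: "mult" rescales each row of the restricted matrix so it sums to 1, and "add" puts the probability mass that left the cluster back on the diagonal. The function names and this interpretation are assumptions, not taken from the KVV paper.

```python
import numpy as np

def restrict_mult(P, idx):
    """Restrict P to idx and rescale each row to sum to 1 (assumed KVVmult)."""
    sub = P[np.ix_(idx, idx)]
    return sub / sub.sum(axis=1, keepdims=True)

def restrict_add(P, idx):
    """Restrict P to idx and add the missing row mass to the diagonal (assumed KVVadd)."""
    sub = P[np.ix_(idx, idx)].copy()
    missing = 1.0 - sub.sum(axis=1)          # probability of leaving the cluster
    diag = np.arange(len(idx))
    sub[diag, diag] += missing
    return sub

# Hypothetical stochastic matrix, restricted to the first two points.
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.5, 0.2],
              [0.1, 0.3, 0.6]])
P_mult = restrict_mult(P, [0, 1])
P_add = restrict_add(P, [0, 1])
```

Both results are stochastic, but they weight self-transitions differently, which is why the two variants can behave differently in the experiments.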
53Anchor Algorithm (Classic)
- Anchor algorithm
- Choose a point (first anchor) at random.
- Iteratively choose the next anchor to be the point farthest from the existing anchors.
- Assign the points to the closest anchor.
- The anchors now represent the clusters.
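The steps above can be sketched as a farthest-first traversal. The sketch assumes Euclidean points and reads "farthest from the existing anchors" as maximizing the distance to the nearest existing anchor; the function name, the seed, and the toy points are illustrative.

```python
import numpy as np

def anchor_clustering(X, k, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = [int(rng.integers(len(X)))]                 # first anchor at random
    # dist[i] = distance from point i to its nearest anchor chosen so far
    dist = np.linalg.norm(X - X[anchors[0]], axis=1)
    while len(anchors) < k:
        nxt = int(np.argmax(dist))                        # farthest from existing anchors
        anchors.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    # assign every point to its closest anchor
    to_anchor = np.linalg.norm(X[:, None, :] - X[anchors][None, :, :], axis=2)
    labels = to_anchor.argmin(axis=1)
    return anchors, labels

# Hypothetical data: three well-separated pairs of 2D points.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]]
anchors, labels = anchor_clustering(X, k=3)
```

With well-separated groups the anchors end up one per group regardless of the random start.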
54K Means (Classic)
- K Means
- Choose an initial set of centers.
- Repeat
- Assign each point to the closest center to form a cluster.
- Compute the new centers to be the mean of all points in the cluster.
- Until convergence.
- We used multiple runs of random and orthogonal initialization and took the minimum distortion.
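A compact sketch of this loop with several random restarts, keeping the run with the smallest distortion (sum of squared distances to the assigned centers). Only random initialization is shown; the orthogonal initialization mentioned on the slide is not reproduced here. Names, defaults, and data are illustrative.

```python
import numpy as np

def kmeans(X, k, n_init=20, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_init):
        # random initialization: k distinct data points as centers
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)                    # assign to closest center
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):         # converged
                break
            centers = new_centers
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        distortion = d2[np.arange(len(X)), labels].sum()
        if distortion < best[0]:
            best = (distortion, labels, centers)
    return best

# Hypothetical data: three well-separated pairs of 2D points.
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]]
distortion, labels, centers = kmeans(X, k=3)
```

A single run can get stuck when two initial centers fall in the same group, which is why the restarts and the minimum-distortion rule matter.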
55Linkage Algorithms (classic)
- Initialize all the points to be 1-point clusters.
- Keep merging the closest clusters until you get k clusters.
- Two variations:
- Single linkage: distance = distance between the closest points
- Ward linkage: distance = inner squared distance
56Ward Linkage (Details)
- Ward linkage uses the incremental sum of squares, that is, the increase in the total within-group sum of squares as a result of joining groups r and s. It is given by
- $d(C_r, C_s) = n_r n_s d_{rs}^2 / (n_r + n_s)$
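The formula can be checked against its verbal definition. The sketch below assumes $d_{rs}$ is the Euclidean distance between the two group centroids (the usual convention for Ward linkage) and compares the formula with the directly computed increase in within-group sum of squares. Function names and points are made up.

```python
import numpy as np

def ward_distance(Cr, Cs):
    """n_r * n_s * d_rs^2 / (n_r + n_s), with d_rs the centroid distance (assumed)."""
    Cr, Cs = np.asarray(Cr, dtype=float), np.asarray(Cs, dtype=float)
    n_r, n_s = len(Cr), len(Cs)
    d_rs2 = ((Cr.mean(axis=0) - Cs.mean(axis=0)) ** 2).sum()
    return n_r * n_s * d_rs2 / (n_r + n_s)

def within_ss(C):
    """Within-group sum of squared distances to the group mean."""
    C = np.asarray(C, dtype=float)
    return ((C - C.mean(axis=0)) ** 2).sum()

# Hypothetical groups.
Cr = [[0.0, 0.0], [2.0, 0.0]]
Cs = [[10.0, 0.0]]
d_ward = ward_distance(Cr, Cs)
increase = within_ss(Cr + Cs) - within_ss(Cr) - within_ss(Cs)
```

Both quantities agree, so merging by smallest Ward distance is the same as merging the pair that increases the total within-group sum of squares the least.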
57Names in Experiment
- shi_r Shi recursive (SM)
- kvv_add,kvv_mult
- mcut,ang
- anchor,ward,single,kmeans
58Clustering error
- Number of misclassifications.
- $CE = \sum_{i \neq j} \mathrm{Conf}_{ij}$
- But the clusters may be reordered (permuted).
- Need to minimize CE over all permutations
- Done efficiently using weighted max bipartite
matching (using LP).
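For small K the minimization over permutations can be done by brute force, which is what the sketch below does. Note that this swaps in exhaustive search for the bipartite matching mentioned on the slide, so it only scales to a handful of clusters. The function name and labels are illustrative; cluster labels are assumed to be integers 0..K-1.

```python
from itertools import permutations

def clustering_error(labels, true_labels, K):
    """Minimum number of misclassified points over all relabellings of the clusters."""
    # conf[a][b] = number of points in cluster a of C and cluster b of C_true
    conf = [[0] * K for _ in range(K)]
    for a, b in zip(labels, true_labels):
        conf[a][b] += 1
    n = len(labels)
    # The best permutation maximizes the matched mass; the rest are errors.
    best_match = max(
        sum(conf[a][perm[a]] for a in range(K))
        for perm in permutations(range(K))
    )
    return n - best_match

# Hypothetical labellings: same grouping up to renaming, except the last point.
found = [1, 1, 0, 0, 2, 2, 2]
truth = [0, 0, 1, 1, 2, 2, 1]
ce = clustering_error(found, truth, K=3)
ce_same = clustering_error([1, 1, 0, 0], [0, 0, 1, 1], K=2)
```

The brute-force loop visits K! permutations, so a matching-based solver is the practical choice beyond very small K.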
59Variation of Information
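- Probability $P(k) = n_k / n$
- Entropy $H(C) = -\sum_{k=1}^{K} P(k) \log P(k)$
- $P(k, k') = \mathrm{Conf}_{kk'} / n$
- Mutual information
- $I(C, C') = \sum_{k=1}^{K} \sum_{k'=1}^{K'} P(k, k') \log \frac{P(k, k')}{P(k) P'(k')}$
- $VI(C, C') = H(C) + H(C') - 2 I(C, C')$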
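These formulas translate directly into a few lines of standard-library Python. The sketch uses natural logarithms (the slide does not fix a base), and the function name and example labellings are illustrative.

```python
from collections import Counter
from math import log

def variation_of_information(c1, c2):
    """VI(C, C') = H(C) + H(C') - 2 I(C, C') for two label sequences."""
    n = len(c1)
    n1 = Counter(c1)                      # cluster sizes n_k in C
    n2 = Counter(c2)                      # cluster sizes n_k' in C'
    joint = Counter(zip(c1, c2))          # confusion counts Conf_kk'
    h1 = -sum((m / n) * log(m / n) for m in n1.values())
    h2 = -sum((m / n) * log(m / n) for m in n2.values())
    mi = sum(
        (m / n) * log((m / n) / ((n1[a] / n) * (n2[b] / n)))
        for (a, b), m in joint.items()
    )
    return h1 + h2 - 2 * mi

# Same grouping with renamed clusters, then two independent groupings.
vi_same = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
vi_indep = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Unlike the clustering error, VI needs no matching between cluster labels: renaming clusters leaves it at zero.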
60Performance Graphs
- Six graphs for each dataset: {multiway, recursive, best five} × {CE, VI}
- Error bars shown only for the artificial dataset in which noise was added. (- multiway).
61NIST database
- Handwritten Digit Recognition
- $32 \times 32$ bitmaps of digits 0..9
- Dimension reduced to $8 \times 8$
- $\Rightarrow$ 64-length vector with values 0..16
- 100 digits each
- 0..9: digit1000
- 0,2,4,6,7: digitFive1000
62Gene Expression Data
- DNA microarray to study the variation of many genes together.
- Yeast cell cycle data.
- 6000 gene expressions over 17 time points
- Restricted to 384 genes whose expression peaked at different points corresponding to the five phases of the cell cycle.
- Summary: 5 clusters, 384 points of 17 dimensions.
- Two kinds of normalization:
- Logarithmic: cellcycle
- Standardization: cellcycle-std (fits the Gaussian model better)
63Digit dataset
- Linkage algorithms perform too badly.
- Multiway > recursive.
- In the case of well-separated digits, the multiway algorithms have near-perfect performance.
- In the case of multiway spectral algorithms it is better to underestimate K than to overestimate it.
64Summary of Results
- Spectral algorithms perform significantly better than the linkage algorithms even when the clusters are not well separated.
- Spectral algorithms are more robust to noise.
- Ncut > conductance
- Multiway is better than recursive except when the noise is high.
- Ward, kmean slightly better than anchor
- More eigenvectors bring more noise, so it is better to underestimate K.
65Experiments Real experiments