Comparison of Spectral Clustering Methods

Slides: 66

Provided by: csWash

1
Comparison of Spectral Clustering Methods

Quals Talk
by
Deepak Verma

2
Project goals

Various Spectral Algorithms present
Prove they are competitive and stable to noise
Compare their performance to see which of them
works the best.
Proved near equivalence of two of the popular
algorithms

3
Outline

Introduction
Algorithms
Theoretical Results
Experimental Setup
Results and Discussion
Conclusion

4
Clustering

Clustering: partitioning into dissimilar groups of
similar objects.
Old problem with many variants
Very hard problem as not clear what the
definition of a cluster should be.

5
Spectral Clustering

Use the top eigenvectors of a matrix derived
from similarities of the objects

6
Intuition

Map the objects into points in some spectral
domain using the similarity matrix.
Cluster these points in that domain using a
classic clustering algorithm.

[Figure labels: original domain, K-dim spectral domain]

7
Problem Formulation

Only the similarities between objects are given. The
underlying points may or may not actually exist.
The similarities are positive and symmetric.
The number of clusters K is given.

8
Graphical Interpretation

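Points $i \in I$ = vertices of graph $G$
Edges $ij$ = pairs with $S_{ij} > 0$
$D_{ii} = \sum_{j=1}^{n} S_{ij}$ = degree of $i$
$\mathrm{vol}\, A = \sum_{i \in A} D_{ii}$ = volume of $A \subseteq I$
Clustering = partitioning
[Figure label: A]
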
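The graph quantities on this slide can be computed directly from a similarity matrix. The sketch below uses a small hypothetical 4-point matrix `S` (not from the talk) to show the degrees $D_{ii}$, the volume of a subset, and the edge list.

```python
import numpy as np

# Hypothetical symmetric, non-negative similarity matrix for 4 points.
S = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.2, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])

# D_ii = sum_j S_ij (degree of vertex i)
degrees = S.sum(axis=1)

def vol(A):
    """Volume of a vertex subset A: sum of the degrees of its members."""
    return degrees[list(A)].sum()

# Edges are the pairs (i, j), i < j, with S_ij > 0.
n = len(S)
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if S[i, j] > 0]
```

Here `vol([0, 1])` adds the degrees of points 0 and 1, and `edges` lists the four pairs with positive similarity.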
9
Notation
10
Good partitioning

How to measure a good partitioning?
Minimize Normalized Cut (Ncut)
Minimize Conductance
Both are NP-hard. But how do we compute them?

11
Outline

Introduction
Algorithms
Theoretical Results
Experimental Setup
Results and Discussion
Conclusion

12
Algorithms

Spectral:
Shi & Malik '97 (PAMI 2000) (SM)
Kannan, Vempala & Vetta (FOCS 2000) (KVV)
Ng, Jordan & Weiss (NIPS 02) (NJW)
Meila & Shi (AISTATS 01) (Mcut)
Classic:
Anchor (Moore, UAI 2000)
Linkage algorithms (single, Ward)

13
Spectral Algorithms

Spectral Algorithms
Multiway: Mcut, NJW
Recursive: SM, KVV

14
Meila-Shi Algorithm (Multiway) (mcut)

Compute the stochastic matrix $P = D^{-1}S$
Let $V_{n \times k}$ be the eigenvectors corresponding to
the k largest eigenvalues of P.
Consider the rows of V as (k-dim) points in the
spectral domain, $\gamma_1, \dots, \gamma_n$
Cluster $\gamma_1, \dots, \gamma_n$ using the classic algorithms.
Under certain conditions, the clustering
so found minimizes the generalized Ncut on the
original graph.
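The mapping step of this algorithm can be sketched in a few lines of NumPy. This is an illustrative sketch, not the talk's implementation: the function name `mcut_embedding` and the 5-point example matrix are assumptions, and the final "classic" clustering step is left out.

```python
import numpy as np

def mcut_embedding(S, k):
    """Map n objects to k-dim spectral points (the rows of V)."""
    d = S.sum(axis=1)                      # degrees D_ii
    P = S / d[:, None]                     # P = D^{-1} S, rows sum to 1
    eigvals, eigvecs = np.linalg.eig(P)    # P is generally not symmetric
    top = np.argsort(-eigvals.real)[:k]    # indices of the k largest eigenvalues
    return eigvecs[:, top].real

# Hypothetical example: groups {0,1,2} and {3,4},
# similarity 1.0 within a group and 0.1 across groups.
labels = np.array([0, 0, 0, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
G = mcut_embedding(S, 2)   # row i of G is the spectral point of object i
```

For this particular matrix the rows of `G` coincide within each group, so any of the classic algorithms separates them trivially.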

15
NJW Algorithm (Multiway) (ang)

Set the diagonal elements $S_{ii}$ to 0.
Compute $L = D^{-1/2} S D^{-1/2}$ (related to the Laplacian).
Let $U_{n \times k}$ be the top k eigenvectors of L
Form Y by normalizing the rows of U to unit length
Group the rows of Y using K-means with orthogonal
initialization
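A minimal sketch of the NJW mapping steps, assuming a dense symmetric `S` with no zero-degree points. The function name `njw_embedding` and the example matrix are hypothetical, and the K-means step is omitted.

```python
import numpy as np

def njw_embedding(S, k):
    """Return Y: the top-k eigenvectors of L with rows scaled to unit length."""
    S = S.copy()
    np.fill_diagonal(S, 0.0)                  # set S_ii to 0
    d = S.sum(axis=1)                         # degrees D_ii
    L = S / np.sqrt(np.outer(d, d))           # L = D^{-1/2} S D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(L)      # L is symmetric; eigenvalues ascending
    U = eigvecs[:, -k:]                       # top k eigenvectors
    return U / np.linalg.norm(U, axis=1, keepdims=True)

# Hypothetical example: groups {0,1,2} and {3,4},
# similarity 1.0 within a group and 0.1 across groups.
labels = np.array([0, 0, 0, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
Y = njw_embedding(S, 2)
```

In this example the rows of `Y` are identical within a group and orthogonal across groups, which is the situation in which an orthogonal initialization of K-means is natural.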

16
Recursive Algorithms (SM,KVV)

Partition graph into two clusters and then
recursively partition the clusters.
Map all the points onto the eigenvector of P with the
second largest eigenvalue and partition based on that.
Difference:
SM is based on Ncut. Guaranteed to minimize Ncut
under certain conditions.
KVV is based on conductance.

17
List of Algorithms Used

Two linkage algorithms: single, Ward
6 recursive algorithms:
{shi_r, kvv_add, kvv_mult} × {ncut, cond}
6 spectral algorithms:
{ang, mcut} mapping × {anchor, kmean, ward}
6 doubly spectral algorithms:
{ang_ang, mcut_mcut, ang_mcut} × {kmean, ward}

18
Outline

Introduction
Algorithms
Theoretical Results
Experimental Setup
Results and Discussion
Conclusion

19
Theoretical Results

Stable method for ang,mcut
Conditions for perfect S for ang and mcut
Broader set of conditions for recursive variants.

20
Modification to ang,mcut

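$Lu = \lambda u$ (definition of eigenvector)
$D^{-1/2} S D^{-1/2} u = \lambda u$
Premultiplying by $D^{-1/2}$:
$D^{-1} S D^{-1/2} u = \lambda D^{-1/2} u$
Substituting $v = D^{-1/2} u$:
$D^{-1} S v = \lambda v$ ($\Leftrightarrow Pv = \lambda v$)
Premultiplying by D:
$Sv = \lambda D v$ (Use this instead of P or L)
We have v for Mcut and from that $u = D^{1/2} v$ for Ang.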
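The chain of identities above can be checked numerically. The sketch below uses a random symmetric positive matrix as a stand-in for S (an assumption for illustration), takes eigenpairs of $L$, and confirms that $v = D^{-1/2}u$ satisfies both $Pv = \lambda v$ and $Sv = \lambda D v$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6))
S = (A + A.T) / 2                     # hypothetical symmetric positive similarities
d = S.sum(axis=1)                     # diagonal of D

L = S / np.sqrt(np.outer(d, d))       # L = D^{-1/2} S D^{-1/2}
lam, U = np.linalg.eigh(L)            # columns of U satisfy L u = lambda u
V = U / np.sqrt(d)[:, None]           # v = D^{-1/2} u, column by column

P = S / d[:, None]                    # P = D^{-1} S
# Column j of (V * lam) is lam[j] * V[:, j].
residual_P = np.abs(P @ V - V * lam).max()                  # P v = lambda v
residual_S = np.abs(S @ V - (d[:, None] * V) * lam).max()   # S v = lambda D v
```

Both residuals are at the level of floating-point round-off, so L and P share eigenvalues and their eigenvectors differ only by the $D^{1/2}$ scaling.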

21
Gains of Above proof

Numerically more stable than Mcut,NJW
Is the basis of near equivalence of Mcut,NJW.

22
Definitions

A matrix S is block diagonal w.r.t. a clustering
$\Delta = \{C_1, \dots, C_K\}$ if $S_{ij} = 0$ whenever i, j belong to
different clusters.

23
Definitions (contd)

A stochastic matrix P is block stochastic w.r.t.
a clustering $\Delta = \{C_1, \dots, C_K\}$ iff
$R_{k'k} = \sum_{j \in C_k} P_{ij}$ has the same value for all points
$i \in C_{k'}$, for all $k, k' = 1, \dots, K$
R is non-singular.
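One way to test this definition on a concrete matrix is sketched below. The function name `block_stochastic_R`, the tolerance, and the example matrix are assumptions; cluster labels are taken to be integers 0..K-1 with no empty cluster.

```python
import numpy as np

def block_stochastic_R(P, labels, tol=1e-10):
    """Return the K x K matrix R if P is block stochastic w.r.t. labels, else None."""
    labels = np.asarray(labels)
    K = labels.max() + 1
    # sums[i, k] = sum of P_ij over j in cluster k
    sums = np.stack([P[:, labels == k].sum(axis=1) for k in range(K)], axis=1)
    R = np.zeros((K, K))
    for kp in range(K):
        rows = sums[labels == kp]              # one row per point i in cluster kp
        if np.abs(rows - rows[0]).max() > tol:
            return None                        # values differ within the cluster
        R[kp] = rows[0]
    if abs(np.linalg.det(R)) < tol:
        return None                            # R must be non-singular
    return R

# Hypothetical example: similarity 1.0 within groups {0,1,2}, {3,4}; 0.1 across.
labels = np.array([0, 0, 0, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
P = S / S.sum(axis=1, keepdims=True)
R = block_stochastic_R(P, labels)
R_wrong = block_stochastic_R(P, [0, 0, 1, 1, 1])
```

With the correct labels `R` is a 2 x 2 stochastic matrix; with the mismatched labels the check fails and `R_wrong` is `None`.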

24
Block stochastic

[Figure: 100 elements, 5 clusters]

25
Definitions (contd)

A similarity matrix S is perfect for a
spectral method w.r.t. a clustering if all the
points in the same cluster $C_i$ are mapped to
exactly the same point $\gamma_i$ in the spectral domain

26
Known Theoretical Results

Block diagonal S ($\Leftrightarrow$ block diagonal P) is perfect for all
spectral algorithms.
In case of NJW it gives orthogonal clusters in
the spectral domain.
Block stochastic P is perfect for Mcut.

27
Our contribution.

Block stochastic P is perfect for NJW and Mcut
The clusters in the spectral domain are
orthogonal
Block stochastic P is good for the recursive
algorithms as well (and perfect for one variant
of SM).

28
Outline

Introduction
Algorithms
Theoretical Results
Experimental Setup
Results and Discussion
Conclusion

29
Datasets

Two artificial data sets
A similarity matrix with a block stochastic P.
Various 2D points
Two real data sets
Gene expression dataset (results from model-based
clustering are available)
A handwritten digit recognition database
True clustering is always available.

30
Evaluating clustering performance

Measured as the deviation of the clustering C from
the true clustering $C_{true}$ (symmetric).
Two measures:
Clustering Error: a classification-style error
measure.
Variation of Information: an information-theoretic
measure. (Results for this measure are not shown.)

31
Block stochastic

[Figure: 100 elements, 5 clusters]

32
Experiments Stability

Added noise to the block diagonal matrix to see the
robustness of the various algorithms to noise.
Uniform (random) noise, scaled so as to preserve the
signal to noise ratio:
$S_{ij} \leftarrow S_{ij} + U(0,1)\,\eta\,\sqrt{D_{ii} D_{jj}}/n$
$\eta$ varied from $10^{-0.1}$ to $10^{0.7}$
10 runs for each algorithm and noise level
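The noise model on this slide can be sketched as follows, reading the garbled formula as additive noise $U(0,1)\,\eta\,\sqrt{D_{ii}D_{jj}}/n$ on each entry. The function name `add_noise`, the mirroring of the upper triangle to keep S symmetric, the example matrix, and the number of noise levels are all assumptions not stated in the talk.

```python
import numpy as np

def add_noise(S, eta, rng):
    """Add degree-scaled uniform noise to a similarity matrix."""
    n = S.shape[0]
    d = S.sum(axis=1)                        # degrees D_ii
    U = rng.random((n, n))                   # U(0,1) draws
    U = np.triu(U) + np.triu(U, 1).T         # assumption: mirror to keep S symmetric
    return S + eta * U * np.sqrt(np.outer(d, d)) / n

# Hypothetical block diagonal S with groups {0,1,2} and {3,4}.
labels = np.array([0, 0, 0, 1, 1])
S = (labels[:, None] == labels[None, :]).astype(float)

rng = np.random.default_rng(0)
etas = 10.0 ** np.linspace(-0.1, 0.7, 5)     # 5 levels is an arbitrary choice
noisy = [add_noise(S, eta, rng) for eta in etas]
```

Because the noise is scaled by the degrees, denser rows receive proportionally more noise, which is one way to read "preserve the signal to noise ratio".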

33
Experiments Gene Expression

For the gene expression data we compared the
performance of the spectral algorithms to model
based clustering.
(Yeung et al., Bioinformatics 2001)
Comparison was done with the best clustering
produced by five different kinds of models.

34
Outline

Introduction
Algorithms
Theoretical Results
Experimental Setup
Results and Discussion
Conclusion

35
[Figure legend: Cluster_ward, Best recursive, Top 3 Multiway]

36
Stability of spectral algorithms

The multiway spectral algorithms are the most
stable of all algorithms.
The best of recursive spectral are not too far
behind and they catch up as the noise is
increased.
The linkage algorithms are very sensitive to
noise.

37
[Figure labels: Model, Recursive]

38
[Figure labels: Recursive, Model]

39
Real Dataset Gene Expression.

The performance of spectral clustering is
competitive with model-based clustering.
The performance depends on the kind of
preprocessing applied to the data.

40
[Figure labels: Conductance based, Ncut based]

41
Recursive spectral algorithms

Algorithms based on the Ncut measure are almost
always better than those based on conductance.
The conductance-based algorithms are too sensitive
to noise.

42
Outline

Introduction
Algorithms
Theoretical Results
Experimental Setup
Results and Discussion
Conclusion

43
Conclusions

Proved Equivalence between two existing
techniques and generalized the results
Demonstrated competitive performance of spectral
algorithms.
Empirically compared the performance of various
algorithms containing different components of
spectral algorithms

44
Acknowledgements

Marina Meila
Ka Yee Yeung
Thomas Richardson
Jayant,Ashish

45
Future Work

Explore the SM algorithm with largest jump
measure
Automatically determine the number of clusters.
(gap, runt analysis).
Learn the similarity matrix (weights to
different dimensions)

46
All Beware

Here begins the world of Extra slides.
Enter at your own risk .

47
Cuts in a graph

(edge) cut = set of edges whose removal makes a
graph disconnected
weight of a cut:
$\mathrm{cut}(A, B) = \sum_{i \in A, j \in B} S_{ij}$

48
The Normalized Cut (NCut)

$\min \mathrm{Ncut}(A, \bar{A})$
Small cut between subsets of equal size
NP-hard
Examples
Stochastic interpretation: $P(A \to \bar{A}) + P(\bar{A} \to A)$

49
Conductance

Similar to Ncut (want to minimize it)
Only the size of the smaller cluster is taken
into account.
Stochastic interpretation:
$\max(P(A \to \bar{A}), P(\bar{A} \to A))$
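The two criteria can be compared on a toy graph. The sketch below assumes the usual definitions, $P(A \to \bar{A}) = \mathrm{cut}(A, \bar{A}) / \mathrm{vol}\,A$, with Ncut as the sum of the two transition probabilities and conductance as their maximum. The function names and the example matrix are hypothetical.

```python
import numpy as np

def cut(S, A, B):
    """Total similarity on edges between vertex lists A and B."""
    return S[np.ix_(A, B)].sum()

def ncut_and_conductance(S, A):
    """Return (Ncut, conductance) of the partition (A, complement of A)."""
    n = S.shape[0]
    A = sorted(A)
    A_bar = [i for i in range(n) if i not in A]
    d = S.sum(axis=1)                          # degrees D_ii
    c = cut(S, A, A_bar)
    p_out = c / d[A].sum()                     # P(A -> A_bar) = cut / vol A
    p_in = c / d[A_bar].sum()                  # P(A_bar -> A) = cut / vol A_bar
    return p_out + p_in, max(p_out, p_in)

# Hypothetical 4-point similarity matrix.
S = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.2, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
ncut_value, conductance_value = ncut_and_conductance(S, [0, 1])
```

Taking the maximum means conductance is governed by the side with the smaller volume, which matches the slide's remark that only the smaller cluster is taken into account.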

50
SM Algorithm (Recursive)

$P = D^{-1}S$
Compute $v_2$, the eigenvector of the 2nd largest eigenvalue.
MIN-RATIO-CUT
Sort the elements of $v_2$ ($v_{2i}$) in increasing order.
Compute $N_i = \mathrm{Ncut}(\{1..i\}, \{i+1..n\})$
Partition $I$ into the two clusters $C_{i_0}, C'_{i_0}$ where
$i_0 = \arg\min_i N_i$
Repeat recursively with largest $\lambda_2$
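A single bipartition step of this procedure might look like the sketch below. It assumes Ncut is cut/vol summed over both sides, uses a hypothetical 5-point matrix, and does not implement the recursion; the function name `sweep_bipartition` is illustrative.

```python
import numpy as np

def sweep_bipartition(S):
    """Split the points in two by sweeping a threshold along v2 and minimizing Ncut."""
    n = S.shape[0]
    d = S.sum(axis=1)
    P = S / d[:, None]                              # P = D^{-1} S
    eigvals, eigvecs = np.linalg.eig(P)
    second = np.argsort(-eigvals.real)[1]           # 2nd largest eigenvalue
    v2 = eigvecs[:, second].real
    order = np.argsort(v2)                          # sort elements of v2
    best_score, best_i = np.inf, None
    for i in range(1, n):                           # candidate split after position i
        A, B = order[:i], order[i:]
        c = S[np.ix_(A, B)].sum()
        score = c / d[A].sum() + c / d[B].sum()     # N_i = Ncut of this split
        if score < best_score:
            best_score, best_i = score, i
    A = sorted(order[:best_i].tolist())
    B = sorted(order[best_i:].tolist())
    return A, B, best_score

# Hypothetical example: similarity 1.0 within groups {0,1,2}, {3,4}; 0.1 across.
labels = np.array([0, 0, 0, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
A, B, score = sweep_bipartition(S)
```

Replacing the `score` line with the conductance, `c / min(d[A].sum(), d[B].sum())`, gives the corresponding sweep for the conductance criterion.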

51
KVV algorithm (Recursive)

$P = D^{-1}S$
Compute $v_2$, the eigenvector of the 2nd largest eigenvalue.
MIN-RATIO-CUT
Sort the elements of $v_2$ ($v_{2i}$) in increasing order.
Compute $N_i = \mathrm{Conductance}(\{1..i\}, \{i+1..n\})$
Partition $I$ into the two clusters $C_{i_0}, C'_{i_0}$ where
$i_0 = \arg\min_i N_i$
Repeat recursively with min conductance

52
KVV (continued)

Two variants are possible depending on how the P of a
subset is calculated.
$P_t$, the matrix at step t, is P restricted to the points
in this cluster.
Need to make it stochastic:
All the entries are scaled up (KVVmult)
The extra sum is added to the diagonal (KVVadd)
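The two ways of making the restricted matrix stochastic can be sketched as below. This is one reading of the slide: "scale up" is taken to mean row normalization, and "extra sum" to mean the probability mass that left the cluster. The function names and the example matrix are hypothetical.

```python
import numpy as np

def restrict_mult(P, idx):
    """KVVmult reading: rescale each row of the sub-matrix so it sums to 1."""
    Pt = P[np.ix_(idx, idx)]
    return Pt / Pt.sum(axis=1, keepdims=True)

def restrict_add(P, idx):
    """KVVadd reading: put the mass that left the cluster on the diagonal."""
    Pt = P[np.ix_(idx, idx)].copy()
    missing = 1.0 - Pt.sum(axis=1)          # probability of leaving the cluster
    Pt[np.diag_indices_from(Pt)] += missing
    return Pt

# Hypothetical example: similarity 1.0 within groups {0,1,2}, {3,4}; 0.1 across.
labels = np.array([0, 0, 0, 1, 1])
S = np.where(labels[:, None] == labels[None, :], 1.0, 0.1)
P = S / S.sum(axis=1, keepdims=True)
P_mult = restrict_mult(P, [0, 1, 2])
P_add = restrict_add(P, [0, 1, 2])
```

Both results are stochastic, but `P_add` keeps the original off-diagonal transition probabilities while `P_mult` inflates all of them.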

53
Anchor Algorithm (Classic)

Anchor algorithm
Choose a point (first anchor) at random.
Iteratively choose the next anchor to be the
point farthest from the existing anchors.
Assign each point to its closest anchor.
The anchors now represent the clusters.
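A minimal sketch of these steps, assuming Euclidean distance and reading "farthest from the existing anchors" as the point whose distance to its nearest anchor is largest. The function name and the 2D example points are hypothetical.

```python
import numpy as np

def anchor_clustering(X, k, rng):
    """Farthest-first anchor selection followed by nearest-anchor assignment."""
    n = X.shape[0]
    anchors = [int(rng.integers(n))]                       # first anchor at random
    dist = np.linalg.norm(X - X[anchors[0]], axis=1)       # distance to nearest anchor
    for _ in range(k - 1):
        nxt = int(dist.argmax())                           # farthest from existing anchors
        anchors.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    # Distance of every point to every anchor, then pick the closest.
    D = np.linalg.norm(X[:, None, :] - X[anchors][None, :, :], axis=2)
    return D.argmin(axis=1), anchors

# Hypothetical 2D points forming two well separated groups.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0]])
assignment, anchors = anchor_clustering(X, 2, np.random.default_rng(0))
```

In the spectral setting the rows of the embedding matrix would take the place of `X`.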

54
K Means (Classic)

K Means
Choose an initial set of centers.
Repeat
Assign each point to the closest center to form
clusters
Compute the new centers to be the mean of all
points in each cluster.
UNTIL convergence.
We used multiple runs of random and orthogonal
initialization and took the minimum distortion.

55
Linkage Algorithms (classic)

Initialize all the points to be 1-point
clusters.
Keep merging the closest clusters until you
get k clusters.
Two variations:
Single linkage: distance = distance between closest
points
Ward linkage: distance = inner squared distance

56
Ward Linkage (Details)

Ward linkage uses the incremental sum of squares,
that is, the increase in the total within-group
sum of squares as a result of joining groups r
and s. It is given by
$d(C_r, C_s) = n_r n_s d_{rs}^2 / (n_r + n_s)$

57
Names in Experiment

shi_r Shi recursive (SM)
kvv_add,kvv_mult
mcut,ang
anchor,ward,single,kmeans

58
Clustering error

Number of misclassifications.
$CE = \sum_{i \neq j} \mathrm{Conf}_{ij}$
But the clusters may be reordered (permuted).
Need to minimize CE over all permutations.
Done efficiently using weighted max bipartite
matching (using LP).
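For a small number of clusters, the minimization over permutations can be done by brute force. The sketch below swaps in an exhaustive search for the bipartite matching mentioned on the slide, so it is only practical for small K; the function name and the example labelings are hypothetical.

```python
import numpy as np
from itertools import permutations

def clustering_error(true_labels, pred_labels, K):
    """Minimum number of misclassified points over all relabelings of the clusters."""
    conf = np.zeros((K, K), dtype=int)          # confusion matrix Conf
    for t, p in zip(true_labels, pred_labels):
        conf[t, p] += 1
    n = len(true_labels)
    # Best total agreement over all one-to-one matchings of true to predicted clusters.
    best_match = max(sum(conf[k, perm[k]] for k in range(K))
                     for perm in permutations(range(K)))
    return n - best_match                       # off-diagonal mass after best matching

truth = [0, 0, 0, 1, 1, 2, 2]
found = [1, 1, 0, 0, 0, 2, 2]                   # same grouping up to names, one mistake
ce = clustering_error(truth, found, 3)
ce_relabelled = clustering_error(truth, [2, 2, 2, 0, 0, 1, 1], 3)
```

Here `ce` counts the single point that ends up in the wrong group once the cluster names are matched, and a pure renaming of the true clusters gives zero error.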

59
Variation of Information

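Probability: $P(k) = n_k / n$
Entropy: $H(C) = -\sum_{k=1}^{K} P(k) \log P(k)$
$P(k, k') = \mathrm{Conf}_{kk'} / n$
Mutual information:
$I(C, C') = \sum_{k=1}^{K} \sum_{k'=1}^{K'} P(k, k') \log \frac{P(k, k')}{P(k) P'(k')}$
$VI(C, C') = H(C) + H(C') - 2 I(C, C')$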
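These formulas translate directly into code. The sketch below assumes integer cluster labels starting at 0 and uses natural logarithms; the function name and the example labelings are hypothetical.

```python
import numpy as np

def variation_of_information(a, b):
    """VI(C, C') = H(C) + H(C') - 2 I(C, C') for two labelings of the same points."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    conf = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(conf, (a, b), 1)                 # confusion matrix Conf
    P_joint = conf / n                         # P(k, k')
    P_a = P_joint.sum(axis=1)                  # P(k)
    P_b = P_joint.sum(axis=0)                  # P'(k')

    def entropy(p):
        p = p[p > 0]                           # treat 0 log 0 as 0
        return -(p * np.log(p)).sum()

    nz = P_joint > 0
    mutual = (P_joint[nz] * np.log(P_joint[nz] / np.outer(P_a, P_b)[nz])).sum()
    return entropy(P_a) + entropy(P_b) - 2 * mutual

vi_same = variation_of_information([0, 0, 1, 1], [1, 1, 0, 0])
vi_independent = variation_of_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Identical clusterings (up to renaming) give VI of 0, while the two independent balanced splits above give $2 \log 2$.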

60
Performance Graphs

Six graphs for each dataset:
{multiway, recursive, bestfive} × {CE, VI}
Error bars shown only for the artificial dataset
in which noise was added. (- multiway).

61
NIST database

Handwritten Digit Recognition
32 × 32 bitmaps of digits 0..9
Dimension reduced to 8 × 8
⇒ 64-length vector with values 0..16
100 digits each
0..9: digit1000
0,2,4,6,7: digitFive1000

62
Gene Expression Data

DNA microarrays are used to study the variation of
many genes together.
Yeast cell cycle data.
6000 gene expressions over 17 time points
Restricted to 384 genes whose expression peaked
at different points corresponding to the five phases
of the cell cycle.
Summary: 5 clusters, 384 points of 17 dimensions.
Two kinds of normalization:
Logarithmic: cellcycle
Standardization: cellcycle-std (fits Gaussian
model better)

63
Digit dataset

Linkage algorithms perform too poorly.
Multiway > recursive.
In case of well separated digits the multiway
algorithms have near perfect performance.
In case of multiway spectral algorithms it is
better to underestimate K than to overestimate it.

64
Summary of Results

Spectral algorithms perform significantly better
than the linkage algorithms even when the clusters
are not well separated.
Spectral algorithms are more robust to noise.
Ncut > conductance
Multiway better than recursive except when noise
is high
Ward, kmean slightly better than anchor
More eigenvectors, more noise. So better to
underestimate K

65
Experiments Real experiments
