1
Comparison of Spectral Clustering Methods
  • Quals Talk
  • by
  • Deepak Verma

2
Project goals
  • Various spectral clustering algorithms have been proposed
  • Show that they are competitive and stable to noise
  • Compare their performance to see which of them
    works the best
  • Prove the near equivalence of two of the popular
    algorithms

3
Outline
  • Introduction
  • Algorithms
  • Theoretical Results
  • Experimental Setup
  • Results and Discussion
  • Conclusion

4
Clustering
  • Clustering: partitioning objects into dissimilar
    groups of similar objects
  • Old problem with many variants
  • Very hard problem, since it is not clear what the
    definition of a cluster should be

5
Spectral Clustering
  • Use the top eigenvectors of a matrix derived
    from similarities of the objects

6
Intuition
  • Map the objects into points in some spectral
    domain using the similarity matrix.
  • Cluster these points in that domain using a
    classic clustering algorithm.

(Figure: mapping from the original domain to the K-dim spectral domain)
7
Problem Formulation
  • Only the pairwise similarities between objects are
    given; underlying points may or may not actually exist
  • The similarities are positive and symmetric
  • The number of clusters K is given

8
Graphical Interpretation
  • Points i ∈ I ≡ vertices of graph G
  • Edges ≡ pairs (i, j) with S_ij > 0
  • D_ii = Σ_{j=1..n} S_ij ≡ degree of i
  • vol(A) = Σ_{i ∈ A} D_ii ≡ volume of A ⊆ I
  • Clustering ≡ Partitioning
9
Notation
10
Good partitioning
  • How do we measure a good partitioning?
  • Minimize the Normalized Cut (NCut)
  • Minimize the Conductance
  • Both are NP-hard to minimize exactly. But how can
    they be minimized in practice?

11
Outline
  • Introduction
  • Algorithms
  • Theoretical Results
  • Experimental Setup
  • Results and Discussion
  • Conclusion

12
Algorithms
  • Spectral
  • Shi and Malik '97 (PAMI 2000) (SM)
  • Kannan, Vempala and Vetta (FOCS 2000) (KVV)
  • Ng, Jordan and Weiss (NIPS '02) (NJW)
  • Meila and Shi (AISTATS '01) (MCut)
  • Classic
  • Anchor (Moore, UAI 2000)
  • Linkage algorithms (single, Ward)

13
Spectral Algorithms
  • Spectral algorithms split into two families:
  • Multiway: MCut, NJW
  • Recursive: SM, KVV

14
Meila-Shi Algorithm (Multiway) (mcut)
  • Compute the stochastic matrix P = D^(-1) S
  • Let V (n × k) hold the eigenvectors corresponding to
    the k largest eigenvalues of P
  • Consider the rows of V as (k-dim) points in the
    spectral domain, g_1, …, g_n
  • Cluster g_1, …, g_n using a classic clustering
    algorithm (a sketch follows below)
  • Under certain conditions, the clustering
    so found minimizes the generalized NCut on the
    original graph
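A minimal Python sketch of the mcut pipeline above (illustrative only, not the authors' code): the helper names mcut_embedding and mcut_cluster are invented here, and scikit-learn's KMeans stands in for the generic "classic" clustering step.

    import numpy as np
    from sklearn.cluster import KMeans

    def mcut_embedding(S, k):
        """Map objects to the k-dim spectral domain via P = D^(-1) S."""
        d = S.sum(axis=1)                    # degrees D_ii
        P = S / d[:, None]                   # stochastic matrix P = D^(-1) S
        vals, vecs = np.linalg.eig(P)        # P is not symmetric in general
        top = np.argsort(-vals.real)[:k]     # k largest eigenvalues of P
        return vecs[:, top].real             # rows are the spectral points g_1..g_n

    def mcut_cluster(S, k):
        """Cluster the spectral points with a 'classic' algorithm (K-means here)."""
        return KMeans(n_clusters=k, n_init=10).fit_predict(mcut_embedding(S, k))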

15
NJW Algorithm (Multiway) (ang)
  • Set the diagonal elements S_ii to 0
  • Compute L = D^(-1/2) S D^(-1/2) (related to the Laplacian)
  • Let U (n × k) hold the top k eigenvectors of L
  • Form Y by normalizing the rows of U to unit length
  • Group the rows of Y using K-means with orthogonal
    initialization (a sketch follows below)
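A corresponding sketch of the NJW steps, again only illustrative: plain scikit-learn K-means (rather than the talk's K-means with orthogonal initialization) is used on the normalized rows.

    import numpy as np
    from sklearn.cluster import KMeans

    def njw_cluster(S, k):
        """Sketch of NJW (ang) for a symmetric similarity matrix S."""
        S = S.copy()
        np.fill_diagonal(S, 0.0)                           # set S_ii = 0
        d = S.sum(axis=1)
        L = S / np.sqrt(np.outer(d, d))                    # L = D^(-1/2) S D^(-1/2)
        vals, vecs = np.linalg.eigh(L)                     # L is symmetric
        U = vecs[:, np.argsort(-vals)[:k]]                 # top k eigenvectors
        Y = U / np.linalg.norm(U, axis=1, keepdims=True)   # unit-length rows
        return KMeans(n_clusters=k, n_init=10).fit_predict(Y)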

16
Recursive Algorithms (SM, KVV)
  • Partition the graph into two clusters, then
    recursively partition the clusters
  • Map all the points onto the second largest
    eigenvector of P and partition based on that
  • Difference:
  • SM is based on NCut; it is guaranteed to minimize
    NCut under certain conditions
  • KVV is based on conductance

17
List of Algorithms Used
  • 2 linkage algorithms: single, Ward
  • 6 recursive algorithms:
    {shi_r, kvv_add, kvv_mult} × {ncut, cond}
  • 6 spectral algorithms:
    {ang, mcut} mapping × {anchor, kmean, ward} clustering
  • 6 doubly spectral algorithms:
    {ang_ang, mcut_mcut, ang_mcut} × {kmean, ward}

18
Outline
  • Introduction
  • Algorithms
  • Theoretical Results
  • Experimental Setup
  • Results and Discussion
  • Conclusion

19
Theoretical Results
  • A numerically stable method for ang and mcut
  • Conditions for a perfect S for ang and mcut
  • A broader set of conditions for the recursive variants

20
Modification to ang,mcut
  • L u = λ u (definition of an eigenvector of L)
  • D^(-1/2) S D^(-1/2) u = λ u
  • Pre-multiplying by D^(-1/2):
  • D^(-1) S D^(-1/2) u = λ D^(-1/2) u
  • Substituting v = D^(-1/2) u:
  • D^(-1) S v = λ v (i.e. P v = λ v)
  • Pre-multiplying by D:
  • S v = λ D v (use this generalized eigenproblem instead of P or L)
  • This gives v for MCut and, via u = D^(1/2) v, the
    eigenvectors for ang (a sketch follows below)
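A sketch of this modification in Python, using SciPy's generalized symmetric eigensolver; the function name stable_spectral_map is hypothetical.

    import numpy as np
    from scipy.linalg import eigh

    def stable_spectral_map(S, k):
        """Solve S v = lambda D v directly instead of forming P or L explicitly."""
        d = S.sum(axis=1)
        vals, V = eigh(S, np.diag(d))        # generalized eigenproblem S v = lambda D v
        top = np.argsort(-vals)[:k]
        V = V[:, top]                        # v: spectral coordinates for mcut
        U = np.sqrt(d)[:, None] * V          # u = D^(1/2) v: eigenvectors for ang (NJW)
        return V, U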

21
Gains of the above derivation
  • Numerically more stable than the original MCut and NJW
    computations
  • Forms the basis of the near equivalence of MCut and NJW

22
Definitions
  • A matrix S is block diagonal w.r.t. a clustering
    Δ = (C_1, …, C_K) if S_ij = 0 whenever i and j belong to
    different clusters

23
Definitions (contd)
  • A stochastic matrix P is block stochastic w.r.t.
    a clustering Δ = (C_1, …, C_K) iff
  • R_kk' = Σ_{j ∈ C_k'} P_ij has the same value for all
    points i ∈ C_k, for all k, k' = 1, …, K
  • R is non-singular

24
Block stochastic
  • 100 elements, 5 clusters

25
Definitions (contd)
  • A similarity matrix S is perfect for a
    spectral method w.r.t. a clustering if all the
    points in the same cluster C_i are mapped to
    exactly the same point g_i in the spectral domain

26
Known Theoretical Results
  • A block diagonal S (and hence P) is perfect for all
    spectral algorithms
  • In the case of NJW it gives orthogonal clusters in
    the spectral domain
  • A block stochastic P is perfect for MCut

27
Our contributions
  • A block stochastic P is perfect for NJW as well as MCut
  • The clusters in the spectral domain are
    orthogonal
  • The block stochastic condition is good for the recursive
    algorithms as well (and perfect for one variant
    of SM)

28
Outline
  • Introduction
  • Algorithms
  • Theoretical Results
  • Experimental Setup
  • Results and Discussion
  • Conclusion

29
Datasets
  • Two artificial data sets:
  • A similarity matrix with a block stochastic P
  • Various 2D point sets
  • Two real data sets:
  • A gene expression dataset (results from model-based
    clustering are available for comparison)
  • A handwritten digit recognition database
  • The true clustering is always available

30
Evaluating clustering performance
  • Measure performance as the deviation of the clustering C
    from the true clustering C_true (a symmetric measure)
  • Two measures:
  • Clustering Error: a classification-style error
    measure
  • Variation of Information: an information-theoretic
    measure (results for it are not shown here)

31
Block stochastic
  • 100 elements, 5 clusters

32
Experiments: Stability
  • Added noise to a block diagonal S to see the
    robustness of the various algorithms to noise
  • Uniform (random) noise, scaled so as to preserve the
    signal-to-noise ratio:
  • S_ij ← S_ij + (U(0,1) · h · sqrt(D_ii D_jj)) / n
    (a sketch follows below)
  • h varied from 10^(-0.1) to 10^(0.7)
  • 10 runs for each algorithm and noise level
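A sketch of this noise model in Python; symmetrizing the perturbation is an assumption made here so that S stays a valid similarity matrix.

    import numpy as np

    def add_noise(S, h, rng=np.random.default_rng()):
        """S_ij <- S_ij + U(0,1) * h * sqrt(D_ii * D_jj) / n."""
        n = S.shape[0]
        d = S.sum(axis=1)
        noise = rng.uniform(size=(n, n)) * h * np.sqrt(np.outer(d, d)) / n
        return S + (noise + noise.T) / 2.0   # keep S symmetric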

33
Experiments: Gene Expression
  • For the gene expression data we compared the
    performance of the spectral algorithms to model-based
    clustering
  • (Yeung et al., Bioinformatics 2001)
  • The comparison was done with the best clustering
    produced by five different kinds of models

34
Outline
  • Introduction
  • Algorithms
  • Theoretical Results
  • Experimental Setup
  • Results and Discussion
  • Conclusion

35
(Plots: cluster_ward, best recursive, top 3 multiway)
36
Stability of spectral algorithms
  • The multiway spectral algorithms are the most
    stable of all algorithms.
  • The best recursive spectral algorithms are not too far
    behind, and they catch up as the noise is
    increased
  • The linkage algorithms are very sensitive to
    noise.

37
(Plots: model-based vs. recursive clustering)
38
(Plots: recursive vs. model-based clustering)
39
Real Dataset: Gene Expression
  • The performance of the spectral algorithms is
    competitive with model-based clustering
  • The performance depends on the kind of
    preprocessing applied to the data

40
(Plots: conductance-based vs. NCut-based recursive algorithms)
41
Recursive spectral algorithms
  • Algorithms based on the NCut measure are almost
    always better than those based on conductance
  • The conductance-based algorithms are too sensitive
    to noise

42
Outline
  • Introduction
  • Algorithms
  • Theoretical Results
  • Experimental Setup
  • Results and Discussion
  • Conclusion

43
Conclusions
  • Proved the near equivalence of two existing
    techniques and generalized the known results
  • Demonstrated the competitive performance of spectral
    algorithms
  • Empirically compared the performance of various
    algorithms built from different components of
    spectral clustering

44
Acknowledgements
  • Marina Meila
  • Ka Yee Yeung
  • Thomas Richardson
  • Jayant, Ashish

45
Future Work
  • Explore the SM algorithm with the largest-jump
    measure
  • Automatically determine the number of clusters
    (gap and runt analysis)
  • Learn the similarity matrix (weights for
    different dimensions)

46
All Beware
  • Here begins the world of Extra slides.
  • Enter at your own risk.

47
Cuts in a graph
  • (Edge) cut: a set of edges whose removal makes a
    graph disconnected
  • Weight of a cut:
  • cut(A, B) = Σ_{i ∈ A, j ∈ B} S_ij

48
The Normalized Cut (NCut)
  • min NCut(A, Ā), where
    NCut(A, Ā) = cut(A, Ā)/vol(A) + cut(A, Ā)/vol(Ā)
  • Small cut between subsets of equal size
  • NP-hard
  • Examples
  • Stochastic interpretation: P(A → Ā) + P(Ā → A)

49
Conductance
  • Similar to NCut (we want to minimize it)
  • Only the smaller of the two clusters (by volume) is taken
    into account:
    Conductance(A, Ā) = cut(A, Ā) / min(vol(A), vol(Ā))
  • Stochastic interpretation:
  • max(P(A → Ā), P(Ā → A))
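For concreteness, here are minimal implementations of the two criteria using the standard definitions above; A and B are index arrays for the two sides of the partition, and the function names are chosen here for illustration.

    import numpy as np

    def cut(S, A, B):
        """cut(A, B) = sum of similarities across the partition."""
        return S[np.ix_(A, B)].sum()

    def ncut(S, A, B):
        """NCut(A, B) = cut/vol(A) + cut/vol(B)."""
        d = S.sum(axis=1)
        c = cut(S, A, B)
        return c / d[A].sum() + c / d[B].sum()

    def conductance(S, A, B):
        """Conductance = cut normalized by the smaller volume only."""
        d = S.sum(axis=1)
        return cut(S, A, B) / min(d[A].sum(), d[B].sum())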

50
SM Algorithm (Recursive)
  • P = D^(-1) S
  • Compute v_2, the eigenvector of the 2nd largest eigenvalue
  • MIN-RATIO-CUT:
  • Sort the elements of v_2 (the v_2i) in increasing order
  • Compute N_i = NCut({1..i}, {i+1..n})
  • Partition I into the two clusters C_i0, C'_i0 where
    i0 = argmin_i N_i (the sweep step is sketched below)
  • Repeat recursively on the cluster with the largest λ_2
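A sketch of the sweep step shared by SM and KVV: sort along v_2 and pick the split minimizing the chosen criterion (NCut for SM, conductance for KVV, e.g. the functions sketched earlier). The name best_sweep_cut is invented here.

    import numpy as np

    def best_sweep_cut(S, criterion):
        """One recursive step: split along the 2nd eigenvector of P = D^(-1) S."""
        d = S.sum(axis=1)
        P = S / d[:, None]                                 # stochastic matrix
        vals, vecs = np.linalg.eig(P)
        v2 = vecs[:, np.argsort(-vals.real)[1]].real       # 2nd largest eigenvector
        idx = np.argsort(v2)                               # sort elements of v2
        splits = [(idx[:i], idx[i:]) for i in range(1, len(idx))]
        scores = [criterion(S, A, B) for A, B in splits]   # N_i for each split
        return splits[int(np.argmin(scores))]              # split at i0 = argmin N_i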

51
KVV algorithm (Recursive)
  • P = D^(-1) S
  • Compute v_2, the eigenvector of the 2nd largest eigenvalue
  • MIN-RATIO-CUT:
  • Sort the elements of v_2 (the v_2i) in increasing order
  • Compute N_i = Conductance({1..i}, {i+1..n})
  • Partition I into the two clusters C_i0, C'_i0 where
    i0 = argmin_i N_i
  • Repeat recursively on the cluster with minimum conductance

52
KVV (continued)
  • Two variants are possible, depending on how the P of a
    subset is calculated
  • P_t, the matrix at step t, is P restricted to the points
    in the current cluster
  • It needs to be made stochastic again; either
  • all the entries are scaled up (KVV_mult), or
  • the missing mass is added to the diagonal (KVV_add)

53
Anchor Algorithm (Classic)
  • Anchor algorithm:
  • Choose a point (the first anchor) at random
  • Iteratively choose the next anchor to be the
    point farthest from the existing anchors
  • Assign the points to the closest anchor
  • The anchors now represent the clusters
    (a sketch follows below)
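A sketch of the farthest-first idea behind these steps, on explicit points X; Moore's full Anchor algorithm builds a more elaborate hierarchy, and the function name anchor_cluster is invented here.

    import numpy as np

    def anchor_cluster(X, k, rng=np.random.default_rng()):
        """Pick k anchors by farthest-first traversal, then assign points to them."""
        anchors = [int(rng.integers(len(X)))]                 # first anchor at random
        dist = np.linalg.norm(X - X[anchors[0]], axis=1)      # distance to nearest anchor
        for _ in range(k - 1):
            nxt = int(np.argmax(dist))                        # farthest from existing anchors
            anchors.append(nxt)
            dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
        d_all = np.linalg.norm(X[:, None, :] - X[anchors][None, :, :], axis=2)
        return d_all.argmin(axis=1)                           # closest anchor = cluster label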

54
K-Means (Classic)
  • K-Means:
  • Choose an initial set of centers
  • Repeat:
  • Assign each point to the closest center to form a
    cluster
  • Compute the new centers as the mean of all
    points in each cluster
  • Until convergence
  • We used multiple runs with random and orthogonal
    initialization and took the minimum-distortion result
    (a bare-bones sketch follows below)
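A bare-bones K-means sketch matching the steps above; it uses a single random initialization, so the talk's multiple random/orthogonal restarts are omitted.

    import numpy as np

    def kmeans(X, k, iters=100, rng=np.random.default_rng()):
        """Assign points to the nearest center, recompute centers, repeat."""
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):                     # convergence
                return labels
            centers = new
        return labels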

55
Linkage Algorithms (classic)
  • Initialize each point as its own 1-point
    cluster
  • Keep merging the closest clusters until only
    k clusters remain
  • Two variants:
  • Single linkage: distance = distance between the closest
    points of the two clusters
  • Ward linkage: distance = incremental within-group sum of
    squares (details on the next slide)

56
Ward Linkage (Details)
  • Ward linkage uses the incremental sum of squares,
    that is, the increase in the total within-group
    sum of squares as a result of joining groups r
    and s. It is given by
  • d(C_r, C_s) = n_r n_s d_rs^2 / (n_r + n_s),
    where d_rs is the distance between the centroids of the
    two groups and n_r, n_s are their sizes
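The two linkage baselines can be reproduced with SciPy's hierarchical clustering, assuming its "single" and "ward" methods match the variants used in the talk; the wrapper name linkage_cluster is invented here.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def linkage_cluster(X, k, method="ward"):
        """Agglomerative clustering of points X; method is 'single' or 'ward'."""
        Z = linkage(X, method=method)                  # merge closest clusters repeatedly
        return fcluster(Z, t=k, criterion="maxclust")  # stop when k clusters remain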

57
Names in Experiment
  • shi_r: Shi recursive (SM)
  • kvv_add, kvv_mult: the two KVV variants
  • mcut, ang: the multiway spectral algorithms
  • anchor, ward, single, kmeans: the classic algorithms

58
Clustering error
  • Number of misclassifications:
  • CE = Σ_{i ≠ j} Conf_ij
  • But the clusters may be reordered (permuted)
  • Need to minimize CE over all permutations of the labels
  • Done efficiently using weighted maximum bipartite
    matching (via an LP); a sketch follows below
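A sketch of the clustering-error computation; SciPy's Hungarian solver (linear_sum_assignment) is used here in place of the LP formulation mentioned above, and labels are assumed to be 0-based integers.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def clustering_error(labels, labels_true):
        """Misclassifications after the best permutation of cluster labels."""
        k = int(max(labels.max(), labels_true.max())) + 1
        conf = np.zeros((k, k), dtype=int)            # confusion matrix Conf
        for a, b in zip(labels, labels_true):
            conf[a, b] += 1
        rows, cols = linear_sum_assignment(-conf)     # maximum-weight matching
        return len(labels) - conf[rows, cols].sum()   # points left unmatched = CE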

59
Variation of Information
  • Probability: P(k) = n_k / n
  • Entropy: H(C) = − Σ_{k=1..K} P(k) log P(k)
  • Joint: P(k, k') = Conf_kk' / n
  • Mutual information:
  • I(C, C') = Σ_{k=1..K} Σ_{k'=1..K'} P(k, k') log( P(k, k') / (P(k) P'(k')) )
  • VI(C, C') = H(C) + H(C') − 2 I(C, C')
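A direct translation of these formulas into Python (natural log; the two label vectors are assumed to be 0-based integer arrays):

    import numpy as np

    def variation_of_information(labels_c, labels_cprime):
        """VI(C, C') = H(C) + H(C') - 2 I(C, C')."""
        n = len(labels_c)
        joint = np.zeros((labels_c.max() + 1, labels_cprime.max() + 1))
        for a, b in zip(labels_c, labels_cprime):
            joint[a, b] += 1.0 / n                    # P(k, k') = Conf_kk' / n
        pk, pkp = joint.sum(axis=1), joint.sum(axis=0)
        H = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))    # entropy
        nz = joint > 0
        mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pk, pkp)[nz]))
        return H(pk) + H(pkp) - 2.0 * mi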

60
Performance Graphs
  • Six graphs for each dataset:
    {multiway, recursive, best five} × {CE, VI}
  • Error bars are shown only for the artificial dataset
    in which noise was added (− multiway)

61
NIST database
  • Handwritten Digit Recognition
  • 32 × 32 bitmaps of digits 0..9
  • Dimension reduced to 8 × 8
  • ⇒ 64-length vectors with values 0..16
  • 100 digits each
  • Digits 0..9: digit1000
  • Digits 0, 2, 4, 6, 7: digitFive1000

62
Gene Expression Data
  • DNA microarrays are used to study the variation of many
    genes together
  • Yeast cell cycle data
  • 6000 gene expression profiles over 17 time points
  • Restricted to 384 genes whose expression peaked
    at different points corresponding to the five phases
    of the cycle
  • Summary: 5 clusters, 384 points of 17 dimensions
  • Two kinds of normalization:
  • Logarithmic: cellcycle
  • Standardization: cellcycle-std (fits a Gaussian
    model better)

63
Digit dataset
  • The linkage algorithms perform very badly
  • Multiway > recursive
  • In the case of well-separated digits the multiway
    algorithms have near-perfect performance
  • For the multiway spectral algorithms it is
    better to underestimate K than to overestimate it

64
Summary of Results
  • Spectral algorithms perform significantly better
    than the linkage algorithms even when the clusters
    are not well separated
  • Spectral algorithms are more robust to noise
  • NCut > conductance
  • Multiway is better than recursive, except when the noise
    is high
  • Ward and kmeans are slightly better than anchor
  • More eigenvectors means more noise, so it is better to
    underestimate K

65
Experiments: Real Datasets