Title: Overview of Clustering
Outline
- K-means for clustering
- The Expectation-Maximization (EM) algorithm for clustering
- Spectral clustering (if time permits)
Clustering
- Find the underlying structure of the given data points
(Figure: example scatter plot of data points; one axis is age)
Application (I): Search Result Clustering
Application (II): Navigation
Application (III): Google News
Application (IV): Visualization
Islands of Music (Pampalk et al., KDD '03)
Application (V): Image Compression
http://www.ece.neu.edu/groups/rpl/kmeans/
How to Find a Good Clustering?
- Minimize the sum of distances within clusters (see the objective below)
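Concretely, with centers μ1, …, μK, the within-cluster objective being minimized (a standard reconstruction; the slide's own formula is not preserved in this text) is:

```latex
J(C_1,\dots,C_K) \;=\; \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j \;=\; \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i .
```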
How to Cluster Data Efficiently?
K-means for Clustering
- K-means:
- Start with a random guess of cluster centers
- Determine the membership of each data point
- Adjust the cluster centers
(The last two steps repeat until the centers stop moving.)
K-means
- Ask the user how many clusters they'd like (e.g., k = 5)
- Randomly guess k cluster center locations
- Each data point finds out which center it's closest to (thus each center owns a set of data points)
- Each center finds the centroid of the points it owns, and moves there
- Repeat until convergence (a minimal code sketch follows below)
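A minimal NumPy sketch of the four steps above (illustrative only; the function and variable names are my own):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain k-means: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Start with a random guess of cluster centers (sample k data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Membership step: each point finds the center it's closest to.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the centroid of the points it owns.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: centers stopped moving
        centers = new_centers
    return centers, labels

# Example usage on synthetic 2-D data:
X = np.vstack([np.random.randn(100, 2) + [5, 5],
               np.random.randn(100, 2) - [5, 5]])
centers, labels = kmeans(X, k=2)
```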
Any Computational Problem?
Improving K-means
- Group points by region, e.g., with a KD-tree or SR-tree
- Key difference:
- Find the closest center for each rectangle
- Assign all the points within a rectangle to one cluster
(A simplified code sketch follows below.)
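The full rectangle-pruning idea above is more involved; as a simplified stand-in, the sketch below only uses a KD-tree over the centers to batch the nearest-center queries of the membership step (SciPy's cKDTree is assumed to be available):

```python
import numpy as np
from scipy.spatial import cKDTree

def assign_with_kdtree(X, centers):
    """Membership step via a KD-tree built over the current centers.

    A simplification of the rectangle-based idea: instead of pruning
    whole cells of points at once, we answer all nearest-center
    queries through one tree, avoiding the full n-by-k distance matrix.
    """
    tree = cKDTree(centers)
    _, labels = tree.query(X, k=1)  # index of the nearest center for every point
    return labels
```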
A Gaussian Mixture Model for Clustering
- Assume that the data are generated from a mixture of Gaussian distributions
- For each Gaussian distribution:
- Center μj
- Variance σj² (ignored here, i.e., treated as known)
- For each data point:
- Determine its membership: which Gaussian generated it?
Learning a Gaussian Mixture (with known covariance)
- E-Step: estimate the expected membership E[zij] of each point in each Gaussian
- M-Step: re-estimate each center from the membership-weighted points
(The updates are spelled out below.)
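The update equations on these slides did not survive the text extraction; for a mixture of K spherical Gaussians with known, shared variance σ² and equal mixing weights (the usual assumptions in this setting), the standard steps are:

```latex
% E-step: expected membership of point x_i in Gaussian j
E[z_{ij}] \;=\;
  \frac{\exp\!\left(-\lVert x_i - \mu_j \rVert^2 / 2\sigma^2\right)}
       {\sum_{k=1}^{K} \exp\!\left(-\lVert x_i - \mu_k \rVert^2 / 2\sigma^2\right)}

% M-step: move each center to the membership-weighted mean
\mu_j \;=\; \frac{\sum_i E[z_{ij}]\, x_i}{\sum_i E[z_{ij}]}
```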
Gaussian Mixture Example
(Figures: the fitted mixture at the start and after iterations 1 through 6 and 20)
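A runnable sketch that produces iteration sequences like the ones pictured, under the same assumptions as above (spherical Gaussians, known shared variance, equal mixing weights):

```python
import numpy as np

def em_gmm_known_var(X, k, sigma=1.0, n_iters=20, seed=0):
    """EM for a mixture of k spherical Gaussians with known variance sigma^2."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
    for _ in range(n_iters):
        # E-step: soft membership of every point in every Gaussian.
        sq = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2) ** 2
        logp = -sq / (2 * sigma**2)
        p = np.exp(logp - logp.max(axis=1, keepdims=True))  # stabilized
        resp = p / p.sum(axis=1, keepdims=True)
        # M-step: each center becomes the responsibility-weighted mean.
        mu = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return mu, resp
```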
Mixture Model for Document Clustering
- Introduce hidden variables zij: zij = 1 if document di is generated by the j-th language model θj (and 0 otherwise)
Learning a Mixture Model
- K: the number of language models
- E-Step: estimate p(zij = 1 | di) for every document and model
- M-Step: re-estimate each language model θj from its soft document assignments
(Standard updates are given below.)
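For a multinomial mixture over words, with c(w, di) the count of word w in document di, the standard EM updates (a reconstruction; the slide's equations were lost in extraction) are:

```latex
% E-step: posterior probability that model theta_j generated document d_i
p(z_{ij}=1 \mid d_i) \;=\;
  \frac{p(\theta_j) \prod_{w} p(w \mid \theta_j)^{c(w, d_i)}}
       {\sum_{k=1}^{K} p(\theta_k) \prod_{w} p(w \mid \theta_k)^{c(w, d_i)}}

% M-step: re-estimate word probabilities and priors from soft counts
p(w \mid \theta_j) \;\propto\; \sum_{i} p(z_{ij}=1 \mid d_i)\, c(w, d_i),
\qquad
p(\theta_j) \;\propto\; \sum_{i} p(z_{ij}=1 \mid d_i)
```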
Examples of Mixture Models
Other Mixture Models
- Probabilistic latent semantic indexing (PLSI)
- Latent Dirichlet Allocation (LDA)
Problems (I)
- Both k-means and mixture models need to compute cluster centers and an explicit distance measure
- Given an unusual distance measure, the cluster centers can be hard to compute
Problems (II)
- Both k-means and mixture models look for compact clustering structures
- In some cases, connected clustering structures are more desirable
Graph Partition
- MinCut: bipartition the graph with a minimal number of cut edges
(Figure: an example bipartition with CutSize = 2)
2-way Spectral Graph Partitioning
- Weight matrix W: wi,j is the weight between vertices i and j
- Membership vector q: qi = +1 if vertex i belongs to the first cluster, qi = -1 otherwise
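With this encoding, the cut size becomes a quadratic form in q (a standard identity; D is the diagonal degree matrix with dii = Σj wi,j):

```latex
\mathrm{CutSize}
\;=\; \frac{1}{4} \sum_{i,j} w_{ij} (q_i - q_j)^2
\;=\; \frac{1}{4}\, q^{\top} (D - W)\, q
```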
Solving the Optimization Problem
- Directly solving the above problem requires combinatorial search → exponential complexity
- How can we reduce the computational complexity?
Relaxation Approach
- Key difficulty: qi has to be either -1 or +1
- Relax qi to be any real number
- Impose the normalization constraint q^T q = n (so the trivial all-zero solution is excluded)
Relaxation Approach
- Solution: the second smallest eigenvector of D − W
Graph Laplacian
- L = D − W is a positive semi-definite matrix
- For any x, x^T L x = (1/2) Σi,j wi,j (xi − xj)^2 ≥ 0
- The minimum eigenvalue is λ1 = 0 (the eigenvector is the all-ones vector)
- The second smallest eigenvalue λ2 gives the best bipartition of the graph
Recovering Partitions
- Due to the relaxation, qi can be any real number (not just -1 and +1)
- How to construct a partition from the eigenvector? A common choice: split by sign (see the sketch below)
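A minimal sketch of the whole pipeline, using the sign split mentioned above (one common heuristic among several):

```python
import numpy as np

def spectral_bipartition(W):
    """2-way spectral partition from a symmetric weight matrix W."""
    D = np.diag(W.sum(axis=1))
    L = D - W                       # graph Laplacian
    vals, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
    q = vecs[:, 1]                  # second smallest eigenvector (Fiedler vector)
    return q >= 0                   # sign split recovers the two parts

# Example: two triangles joined by one weak edge; the weak edge gets cut.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.1
print(spectral_bipartition(W))  # e.g., [ True  True  True False False False ]
```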
Spectral Clustering
- Minimum cut does not balance the sizes of the two parts
Normalized Cut (Shi & Malik, 1997)
- Minimize the similarity between clusters while maximizing the similarity within clusters
Normalized Cut
- Relax q to real values under the constraint (in Shi & Malik's formulation, q^T D 1 = 0); see the formulation below
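The objective from Shi & Malik, with cut(A, B) the total weight of edges between A and B and assoc(A, V) the total weight from A to all vertices; its relaxation reduces to a generalized eigenproblem:

```latex
\mathrm{Ncut}(A, B)
\;=\; \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)}
\;+\; \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)},
\qquad
(D - W)\, q \;=\; \lambda\, D\, q
```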
Image Segmentation
Non-negative Matrix Factorization