Learning Spectral Clustering, With Application to Speech Separation - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Learning Spectral Clustering, With Application to Speech Separation


1
Learning Spectral Clustering, With Application to
Speech Separation
  • F. R. Bach and M. I. Jordan, JMLR 2006

2
Outline
  • Introduction
  • Normalized cuts → cost functions for spectral clustering
  • Learning similarity matrix
  • Approximate scheme
  • Examples
  • Conclusions

3
Introduction 1/2
  • Traditional spectral clustering techniques
  • Assume a metric/similarity structure, then use
    clustering algorithms.
  • Manual feature selection and weighting are time-consuming.
  • Proposed method
  • A general framework for learning the similarity
    matrix for spectral clustering from data.
  • Given data with known partitions, the goal is to build similarity matrices that lead to these partitions under spectral clustering.
  • Motivations
  • Hand-labelled databases are available (image, speech).
  • Robust to irrelevant features.

4
Introduction 2/2
  • What's new?
  • Two cost functions J1(W, E) and J2(W, E), where W is the similarity matrix and E is a partition.
  • Minimizing J1 over E → new clustering algorithms.
  • Minimizing J1 over W → learning the similarity matrix.
  • W is not necessarily positive semidefinite.
  • A numerical approximation scheme is designed for large-scale problems.

5
Spectral Clustering NCuts 1/4
  • R-way Normalized Cuts
  • Each data point is one node in a graph; the weight on the edge connecting two nodes is the similarity between those two points.
  • The graph is partitioned into R disjoint clusters by minimizing the normalized cut cost function C(A, W):
    C(A, W) = sum_{r=1..R} W(A_r, V \ A_r) / W(A_r, V)
  • V = {1, ..., P} is the index set of all data points.
  • A = (A_r), r in {1, ..., R}, with the union of the A_r equal to V.
  • W(A, B) = sum_{i in A, j in B} W_ij is the total weight between A and B.
  • The normalization term W(A_r, V) penalizes unbalanced partitions (a small numerical sketch of this cost follows this slide).
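The following is a minimal sketch, not from the slides, of how the R-way normalized cut cost defined above could be evaluated in Python/NumPy for a given similarity matrix W and partition; the function names and the toy data are illustrative only.

import numpy as np

def cut_weight(W, A, B):
    # W(A, B) = sum_{i in A, j in B} W_ij: total weight between index sets A and B
    return W[np.ix_(A, B)].sum()

def normalized_cut(W, clusters):
    # clusters: list of index arrays A_1, ..., A_R partitioning {0, ..., P-1}
    P = W.shape[0]
    V = np.arange(P)
    cost = 0.0
    for A_r in clusters:
        complement = np.setdiff1d(V, A_r)
        # Each term is normalized by W(A_r, V), which penalizes unbalanced partitions
        cost += cut_weight(W, A_r, complement) / cut_weight(W, A_r, V)
    return cost

# Tiny example: two well-separated blobs with a Gaussian similarity
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(5, 2)), rng.normal(size=(5, 2)) + 5.0])
W = np.exp(-np.square(X[:, None, :] - X[None, :, :]).sum(-1))
print(normalized_cut(W, [np.arange(5), np.arange(5, 10)]))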

6
Spectral Clustering NCuts 2/4
  • Another form of Ncuts
  • E = (e_1, ..., e_R), where e_r is the indicator vector (P by 1) of the r-th cluster. With D = Diag(W 1), the normalized cut can be written
    C(W, E) = sum_{r=1..R} e_r^T (D - W) e_r / (e_r^T D e_r).
  • Spectral relaxation
  • Removing the combinatorial constraint (a) that the columns of E are 0/1 indicator vectors of a partition, the relaxed optimization problem is solved by the R leading eigenvectors of D^{-1/2} W D^{-1/2} (see the sketch after this slide).
  • The relaxed solutions are generally not piecewise constant, so they have to be projected back onto the subset defined by (a).
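As a reference point (not part of the slides), a small NumPy sketch of this relaxation step: form D = Diag(W 1) and take the R leading eigenvectors of D^{-1/2} W D^{-1/2}; all names are illustrative.

import numpy as np

def relaxed_solution(W, R):
    # Degree vector d = W 1, i.e. D = Diag(d)
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Normalized similarity D^{-1/2} W D^{-1/2}
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # Its R leading eigenvectors span the relaxed solution
    eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
    U = eigvecs[:, -R:]                    # P x R matrix of the top-R eigenvectors
    return U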

7
Spectral Clustering NCuts 3/4
  • Rounding
  • Minimize a metric between the relaxed solution and the entire set of discrete allowed solutions.
  • Relaxed solution: Y_eig = U, the P-by-R matrix of the R leading eigenvectors of D^{-1/2} W D^{-1/2}.
  • Desired solution: a discrete partition E.
  • To compare the subspaces spanned by their columns, compare the orthogonal projection operators onto those subspaces, i.e. the Frobenius norm between Pi_0 = Y_eig Y_eig^T = U U^T and Pi_1 = D^{1/2} E (E^T D E)^{-1} E^T D^{1/2}.
  • The cost function is given as
    J(W, E) = (1/2) || U U^T - D^{1/2} E (E^T D E)^{-1} E^T D^{1/2} ||_F^2
    (a small sketch of evaluating it follows this slide).

8
Spectral Clustering NCuts 4/4
  • Spectral clustering algorithms
  • Variational form of the cost function: minimizing J(W, E) over E can be rewritten as a weighted K-means distortion on the spectral embedding.
  • A weighted K-means algorithm can therefore be used to solve the minimization (a sketch follows this slide).
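Below is a generic weighted K-means sketch; the slides do not give the exact embedding or weights, so applying it to the rows of D^{-1/2} U with weights d_i is an assumption on my part rather than the authors' algorithm.

import numpy as np

def weighted_kmeans(X, weights, R, n_iter=50, seed=0):
    # Weighted K-means: point x_i carries weight w_i; centroids are weighted means.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=R, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its closest centroid
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the weighted mean of its points
        for r in range(R):
            mask = labels == r
            if mask.any():
                centers[r] = np.average(X[mask], axis=0, weights=weights[mask])
    return labels

Run on the embedded points (e.g., the rows of the relaxed solution U, suitably rescaled), this rounds the relaxed solution to a discrete partition.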

9
Learning the Similarity Matrix 1/2
  • Objective
  • Assume a known partition E and a parametric form for W; learn parameters that generalize to unseen data sets.
  • Naïve approach
  • Minimize the distance between true E and the
    output of spectral clustering algorithm (function
    of W).
  • Hard to optimize because this cost function is not continuous.
  • Cost functions as upper bounds of the naïve cost function
  • Minimizing the cost functions J1(W, E) and J2(W, E) is equivalent to minimizing an upper bound on the true cost function.

10
Learning the Similarity Matrix 2/2
  • Algorithms
  • Given N data sets D_n, each composed of P_n points.
  • Each data set is already segmented, i.e. its partition E_n is known.
  • The cost function is (schematically) H(α) = sum_n J(W(D_n; α), E_n) + λ ||α||_1, where α parameterizes the similarity matrix.
  • The L1-norm penalty performs feature selection.
  • Use the steepest descent method to minimize H(α) w.r.t. α (a rough sketch follows this slide).
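The outer learning loop might look like the rough sketch below. The Gaussian per-feature parameterization of W, the finite-difference gradient, and all function names are assumptions on my part (the paper derives analytic gradients); cost_fn stands for a spectral clustering cost such as the rounding_cost sketch above.

import numpy as np

def similarity(X, alpha):
    # Assumed parameterization: Gaussian similarity with one nonnegative scale per feature,
    # W_ij = exp(-sum_k alpha_k (x_ik - x_jk)^2); irrelevant features get alpha_k -> 0.
    diff = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-(diff * alpha).sum(axis=-1))

def learn_alpha(datasets, partitions, cost_fn, R, lam=0.1, lr=0.05, n_steps=50, eps=1e-4):
    # datasets: list of arrays D_n (P_n x F); partitions: indicator matrices E_n (P_n x R).
    # cost_fn(W, E, R): spectral clustering cost, e.g. the rounding_cost sketch above.
    F = datasets[0].shape[1]
    alpha = np.ones(F)

    def H(a):
        # Average cost over the labelled data sets plus an L1 penalty (feature selection)
        fit = np.mean([cost_fn(similarity(Xn, a), En, R)
                       for Xn, En in zip(datasets, partitions)])
        return fit + lam * np.abs(a).sum()

    for _ in range(n_steps):
        # Finite-difference gradient, a stand-in for the analytic gradient used in the paper
        base = H(alpha)
        grad = np.array([(H(alpha + eps * np.eye(F)[k]) - base) / eps for k in range(F)])
        alpha = np.maximum(alpha - lr * grad, 0.0)   # steepest descent, scales kept nonnegative
    return alpha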

11
Approximation Scheme
  • Low-rank nonnegative decomposition (see the sketch after this slide)
  • Approximate each column of W by a linear combination of a set of randomly chosen columns (I): w_j ≈ sum_{i in I} H_ij w_i, for j in J.
  • H is chosen so that the approximation error on the selected block, ||W(I, J) - W(I, I) H||, is minimum.
  • Decomposition
  • Randomly select a set of columns (I).
  • Approximate W(I, J) as W(I, I) H.
  • Approximate W(J, J) as W(J, I) H H^T W(I, J).
  • Complexity
  • Storage requirement is O(MP), where M is the number of selected columns.
  • Overall complexity is O(M^2 P).
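A small sketch of this column-sampling idea, under my reading of the slide: H is fit by least squares on the rows indexed by I (the paper's nonnegativity constraint on H is omitted here), and the (J, J) block is rebuilt as W(J, I) H, a simplification of the formula quoted above; all names are illustrative.

import numpy as np

def low_rank_column_approx(W, M, seed=0):
    # Approximate a symmetric P x P similarity matrix from M randomly selected columns (index set I):
    # each remaining column j in J is written as w_j ~= sum_{i in I} H_ij w_i.
    P = W.shape[0]
    rng = np.random.default_rng(seed)
    I = rng.choice(P, size=M, replace=False)
    J = np.setdiff1d(np.arange(P), I)
    # Fit H from the rows indexed by I only: minimize ||W(I, J) - W(I, I) H||_F.
    # By symmetry only the M selected columns of W are ever needed (O(MP) storage).
    H, *_ = np.linalg.lstsq(W[np.ix_(I, I)], W[np.ix_(I, J)], rcond=None)
    # Rebuild the full matrix from the sampled blocks (for checking only; in practice
    # the factored form is kept and never expanded).
    W_hat = np.empty_like(W)
    W_hat[np.ix_(I, I)] = W[np.ix_(I, I)]
    W_hat[np.ix_(I, J)] = W[np.ix_(I, I)] @ H
    W_hat[np.ix_(J, I)] = W_hat[np.ix_(I, J)].T
    W_hat[np.ix_(J, J)] = W[np.ix_(J, I)] @ H
    return W_hat, I, H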

12
Toy Examples
13
Line Drawings
  • Training set 1: favors connectedness.
  • Training set 2: favors direction continuity.
  • [Figure] Examples of test segmentations trained with training set 2.
  • [Figure] Examples of test segmentations trained with training set 1.
14
Conclusions
  • Two sets of algorithms are presented: one for spectral clustering and one for learning the similarity matrix.
  • Both are obtained by minimizing a single cost function with respect to its two arguments.
  • The approximation scheme is efficient for large-scale problems.
  • The new approach is more robust to irrelevant features than current methods.