Title: Diffusion Maps and Spectral Clustering
1 Diffusion Maps and Spectral Clustering
Machine Learning Seminar Series
- Author: Ronald R. Coifman et al. (Yale University)
- Presenter: Nilanjan Dasgupta (SIG Inc.)
2 Motivation
[Figure: data points (datum) lying on a low-dimensional manifold]
- Data lie on a low-dimensional manifold. The shape of the manifold is not known a priori.
- PCA would fail to give a compact representation, since the manifold is not linear.
- Spectral clustering serves as a non-linear dimensionality reduction scheme.
3 Outline
- Non-linear dimensionality reduction and spectral clustering.
- Diffusion-based probabilistic interpretation of spectral methods.
- Eigenvectors of the normalized graph Laplacian are discrete approximations of the eigenfunctions of the continuous Fokker-Planck operator.
- Justification of the success of spectral clustering.
- Conclusions.
4 Spectral clustering
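- Normalized graph Laplacian
- Given $N$ data points $\{x_i\}_{i=1}^{N}$ where each $x_i \in \mathbb{R}^p$, the distance (similarity) between any two points $x_i$ and $x_j$ is given by $L_{i,j} = \exp\left(-\|x_i - x_j\|^2 / 2\varepsilon\right)$, with a Gaussian kernel of width $\varepsilon$,
- and a diagonal normalization matrix $D_{i,i} = \sum_j L_{i,j}$.
- Solve the normalized eigenvalue problem $L\psi = \lambda D\psi$, or equivalently $M\psi = \lambda\psi$ with $M = D^{-1}L$.
- Use the first few eigenvectors of $M$ as a low-dimensional representation of the data, or as good coordinates for clustering.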
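
The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration rather than the presenter's code: the kernel form exp(-||x_i - x_j||^2 / (2 eps)), the helper names `normalized_kernel` and `leading_eigenvectors`, and the six toy points are all assumptions made for the example.

```python
import numpy as np

def normalized_kernel(X, eps):
    """Return (L, d, M): kernel matrix, row sums (diagonal of D), and M = D^{-1} L."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    L = np.exp(-sq_dist / (2.0 * eps))   # assumed Gaussian kernel of width eps
    d = L.sum(axis=1)                    # D_ii = sum_j L_ij
    M = L / d[:, None]                   # each row of M sums to 1
    return L, d, M

def leading_eigenvectors(M, k):
    """Return the k largest eigenvalues of M and the matching right eigenvectors."""
    lam, vecs = np.linalg.eig(M)
    # The spectrum is real because M is similar to a symmetric matrix.
    lam, vecs = np.real(lam), np.real(vecs)
    order = np.argsort(-lam)
    return lam[order][:k], vecs[:, order][:, :k]

# Toy data (hypothetical): two tight groups of three points each.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [3.0, 3.0], [3.2, 3.0], [3.0, 3.2]])
L, d, M = normalized_kernel(X, eps=1.0)
lam, psi = leading_eigenvectors(M, k=2)

# The first eigenvector is constant; the sign of the second splits the two groups.
labels = (psi[:, 1] > 0).astype(int)
print(lam, labels)
```

Thresholding a single eigenvector is enough here only because the toy data has two groups; with more clusters one would typically run k-means on several eigenvector coordinates.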
5 Spectral Clustering: previous work
- Non-linear dimensionality analysis by S. Roweis and L. Saul (published in Science, 2000).
- Belkin & Niyogi (NIPS'02) show that if data are sampled uniformly from the low-dimensional manifold, the first few eigenvectors of $M = D^{-1}L$ are discrete approximations of the eigenfunctions of the Laplace-Beltrami operator on the manifold.
- Meila & Shi (AISTATS'01) interpret $M$ as a stochastic matrix representing a random walk on the graph.
6 Diffusion distance and Diffusion map
- A symmetric matrix $M_s$ can be derived from $M$ as $M_s = D^{1/2} M D^{-1/2}$.
- $M$ and $M_s$ have the same $N$ eigenvalues, $\{\lambda_j\}_{j=0}^{N-1}$.
- Under the random walk representation of the graph $M$:
  - $\phi$: left eigenvector of $M$
  - $\psi$: right eigenvector of $M$
  - $\varepsilon$: time step
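
As a hedged sketch of why the symmetric matrix is useful in practice: assuming the standard symmetrization M_s = D^{1/2} M D^{-1/2} = D^{-1/2} L D^{-1/2}, a symmetric eigensolver applied to M_s yields both the left and the right eigenvectors of M by rescaling. The function name `eigen_via_symmetric`, the 1-D sample, and the kernel width are hypothetical choices for the example.

```python
import numpy as np

def eigen_via_symmetric(L):
    """Eigen-decompose M = D^{-1} L through the symmetric matrix M_s."""
    d = L.sum(axis=1)
    sqrt_d = np.sqrt(d)
    Ms = L / np.outer(sqrt_d, sqrt_d)    # D^{-1/2} L D^{-1/2} = D^{1/2} M D^{-1/2}
    lam, V = np.linalg.eigh(Ms)          # real eigenvalues, ascending order
    lam, V = lam[::-1], V[:, ::-1]       # reorder to descending
    phi = V * sqrt_d[:, None]            # column j: left eigenvector of M
    psi = V / sqrt_d[:, None]            # column j: right eigenvector of M
    return lam, phi, psi

# Hypothetical 1-D sample and kernel width, for illustration only.
x = np.linspace(0.0, 1.0, 5)
L = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 0.1))
M = L / L.sum(axis=1, keepdims=True)
lam, phi, psi = eigen_via_symmetric(L)

print(np.allclose(M @ psi, psi * lam))               # right eigenvectors of M
print(np.allclose(phi.T @ M, lam[:, None] * phi.T))  # left eigenvectors of M
```

With this scaling the two families are biorthonormal, i.e. `phi.T @ psi` is the identity matrix.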
7 Diffusion distance and Diffusion map
- $\varepsilon$ has a dual representation (time step and kernel width).
- If one starts a random walk from location $x_i$, the probability of landing at location $y$ after $r$ time steps is given by $p(r, y \mid x_i) = e_i M^r$, where $e_i$ is a row vector with all zeros except a 1 in the $i$th position.
- For large $\varepsilon$, all points in the graph are connected ($M_{i,j} > 0$) and the eigenvalues of $M$ satisfy $\lambda_0 = 1 > \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{N-1} \ge 0$.
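
The random-walk reading of $M$ can be illustrated directly: multiplying the indicator row vector of the start node by a power of $M$ gives the distribution over nodes after that many steps. The sketch below assumes a Gaussian kernel with the 2*eps scaling and uses a hypothetical helper `walk_distribution` on five equally spaced 1-D points.

```python
import numpy as np

def walk_distribution(M, i, r):
    """Distribution over nodes after r steps of the walk started at node i: e_i M^r."""
    e_i = np.zeros(M.shape[0])
    e_i[i] = 1.0
    return e_i @ np.linalg.matrix_power(M, r)

# Hypothetical sample and kernel width, for illustration only.
x = np.linspace(0.0, 1.0, 5)
L = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 0.1))
M = L / L.sum(axis=1, keepdims=True)

p1 = walk_distribution(M, i=0, r=1)   # one step: equals row 0 of M
p5 = walk_distribution(M, i=0, r=5)   # five steps: still a probability vector
print(p1, p5, p5.sum())
```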
8 Diffusion distance and Diffusion map
- One can show that, regardless of the starting point $x_i$, $\lim_{r \to \infty} p(r, y \mid x_i) = \phi_0(y)$, where $\phi_0$ is the left eigenvector of $M$ with eigenvalue $\lambda_0 = 1$, with $\phi_0(x_i) = D_{i,i} / \sum_j D_{j,j}$.
- Eigenvector $\phi_0(x)$ has a dual representation:
  1. It is the stationary probability distribution on the curve, i.e., the probability of landing at location $x$ after taking infinitely many steps of the random walk (independent of the start location).
  2. It is the density estimate at location $x$.
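
A small numerical check of the stationary distribution, written as a sketch under stated assumptions: for a symmetric kernel matrix $L$, the normalized row sums $D_{i,i} / \sum_j D_{j,j}$ are left-invariant under $M = D^{-1}L$, and every row of a high power of $M$ approaches them. The sample points, kernel width, and the choice of 200 steps are hypothetical.

```python
import numpy as np

# Hypothetical sample and kernel width, for illustration only.
x = np.linspace(0.0, 1.0, 5)
L = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 0.1))
d = L.sum(axis=1)
M = L / d[:, None]

phi0 = d / d.sum()                        # normalized row sums of the kernel
M_inf = np.linalg.matrix_power(M, 200)    # many steps of the random walk

# phi0 is unchanged by one step of the walk (left eigenvector, eigenvalue 1).
left_invariant = np.allclose(phi0 @ M, phi0)
# Every row of M^200 is close to phi0, whatever the starting node.
start_independent = np.allclose(M_inf, np.tile(phi0, (5, 1)))
print(left_invariant, start_independent)
```

Because `phi0` is proportional to the kernel row sums, it is larger where points are densely packed, which is the sense in which it acts as a density estimate.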
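9 Diffusion distance
- $\psi_k$ and $\phi_k$ are the right and left eigenvectors of the graph Laplacian $M$.
- $\lambda_k^r$ is the $k$th eigenvalue of $M^r$ (arranged in descending order).
- Given the definition of the random walk, we denote the diffusion distance as a distance measure at time $t$ (counted in steps of $\varepsilon$) between two pmfs as $D_t^2(x_0, x_1) = \|p(t, y \mid x_0) - p(t, y \mid x_1)\|_w^2 = \sum_y \left(p(t, y \mid x_0) - p(t, y \mid x_1)\right)^2 w(y)$, with the empirical choice $w(y) = 1/\phi_0(y)$.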
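
The definition can be evaluated directly from powers of $M$. The following is a minimal sketch, assuming the weighted squared-difference form of the distance with weight 1/phi_0(y) and a Gaussian kernel with the 2*eps scaling; the function name `diffusion_distance` and the six toy points are hypothetical.

```python
import numpy as np

def diffusion_distance(M, phi0, i, j, t):
    """Weighted L2 distance between the t-step distributions started at nodes i and j."""
    Mt = np.linalg.matrix_power(M, t)
    diff = Mt[i] - Mt[j]
    return np.sqrt(np.sum(diff ** 2 / phi0))   # weight w(y) = 1 / phi0(y)

# Toy data (hypothetical): two tight groups of three points each, eps = 1.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
              [3.0, 3.0], [3.2, 3.0], [3.0, 3.2]])
sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
L = np.exp(-sq_dist / 2.0)
d = L.sum(axis=1)
M = L / d[:, None]
phi0 = d / d.sum()

within = diffusion_distance(M, phi0, 0, 1, t=2)   # two points in the same group
across = diffusion_distance(M, phi0, 0, 3, t=2)   # points in different groups
print(within, across)
```

Points in the same group reach nearly the same set of nodes after a few steps, so their diffusion distance is small, while points in different groups stay far apart.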
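10 Diffusion Map
- Diffusion map: a mapping between the original space and the first $k$ eigenvectors, $\Psi_t(x) = \left(\lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \dots, \lambda_k^t \psi_k(x)\right)$.
- Relationship: $D_t^2(x_0, x_1) = \sum_{j \ge 1} \lambda_j^{2t} \left(\psi_j(x_0) - \psi_j(x_1)\right)^2 = \|\Psi_t(x_0) - \Psi_t(x_1)\|^2$ when all $k = N-1$ nontrivial eigenvectors are retained.
- This relationship justifies using Euclidean distance in diffusion map space for spectral clustering.
- Since $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_{N-1} \ge 0$, it is justified to stop at an appropriate $k$ with a negligible error of order $O\left((\lambda_{k+1}/\lambda_k)^t\right)$.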
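
The link between the diffusion map and the diffusion distance can be checked numerically. This is a sketch under stated assumptions, not the authors' implementation: the right eigenvectors are scaled so that the trivial one is constant with unit magnitude (the normalization under which the identity holds), the distance uses the weight 1/phi_0(y), and the names `diffusion_map` and `diffusion_distance`, the sample, and the kernel width are hypothetical.

```python
import numpy as np

def diffusion_map(L, t, k):
    """Rows are the k-dimensional diffusion-map coordinates of each point at time t."""
    d = L.sum(axis=1)
    sqrt_d = np.sqrt(d)
    Ms = L / np.outer(sqrt_d, sqrt_d)          # symmetric conjugate of M = D^{-1} L
    lam, V = np.linalg.eigh(Ms)
    lam, V = lam[::-1], V[:, ::-1]             # descending eigenvalues
    # Right eigenvectors of M, scaled so that psi_0 is constant with entries +1 or -1.
    psi = V * np.sqrt(d.sum()) / sqrt_d[:, None]
    return (lam[1:k + 1] ** t) * psi[:, 1:k + 1]   # skip the trivial psi_0

def diffusion_distance(L, i, j, t):
    """Diffusion distance from its definition, with weight w(y) = 1 / phi_0(y)."""
    d = L.sum(axis=1)
    M = L / d[:, None]
    phi0 = d / d.sum()
    Mt = np.linalg.matrix_power(M, t)
    return np.sqrt(np.sum((Mt[i] - Mt[j]) ** 2 / phi0))

# Hypothetical 1-D sample and kernel width, for illustration only.
x = np.linspace(0.0, 1.0, 5)
L = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * 0.1))
t = 3
full = diffusion_map(L, t, k=4)     # all N - 1 nontrivial coordinates
short = diffusion_map(L, t, k=2)    # truncated map

exact = diffusion_distance(L, 0, 4, t)
d_full = np.linalg.norm(full[0] - full[4])
d_short = np.linalg.norm(short[0] - short[4])
print(exact, d_full, d_short)
```

The full map reproduces the diffusion distance up to rounding, while the truncated map drops non-negative terms and so can only underestimate it; the gap shrinks as t grows because the discarded eigenvalues are raised to the power t.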
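11 Asymptotics of Diffusion Map
- Suppose $\{x_i\}$ are sampled i.i.d. from a probability density $p(x)$ defined over a manifold $\Omega$.
- Suppose $p(x) = e^{-U(x)}$, where $U(x)$ is the potential (energy) at location $x$.
- As $N \to \infty$, the random walk on the discrete graph converges to a random walk on the continuous manifold $\Omega$.
- The forward and backward operators are given by $T_f[\phi](x) = \int_\Omega M(x \mid y)\,\phi(y)\,p(y)\,dy$ and $T_b[\psi](x) = \int_\Omega M(y \mid x)\,\psi(y)\,p(y)\,dy$, where $M(x \mid y)$ is the transition probability from $y$ to $x$ in one time step $\varepsilon$.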
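12 Asymptotics of Diffusion Map
- $T_f[\phi]$: the probability distribution after one time step $\varepsilon$, where $\phi(x)$ is the probability distribution on the graph at $t = 0$.
- $T_b[\psi](x)$: the mean of the function $\psi$ after one time step $\varepsilon$, for a random walk that started at location $x$ at time $t = 0$.
- Consider the limit $\varepsilon \to 0$, i.e., when each data point has infinitely many nearby neighbors. In that limit, the random walk converges to a diffusion process with probability density evolving continuously in time as $\frac{\partial p(x,t)}{\partial t} = \lim_{\varepsilon \to 0} \frac{p(x, t+\varepsilon) - p(x,t)}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{T_f - I}{\varepsilon}\, p(x,t)$.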
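13 Fokker-Planck operator
- Infinitesimal generators (propagators): $H_f = \lim_{\varepsilon \to 0} \frac{T_f - I}{\varepsilon}$ and $H_b = \lim_{\varepsilon \to 0} \frac{T_b - I}{\varepsilon}$.
- The eigenfunctions of $T_f$ and $T_b$ converge to those of $H_f$ and $H_b$, respectively.
- The backward generator is given by the Fokker-Planck operator $H_b \psi = \Delta\psi - 2\,\nabla\psi \cdot \nabla U$, which corresponds to a diffusion process in a potential field $2U(x)$.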
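
As a quick sanity check of the backward generator, consider a hypothetical one-dimensional example that is not part of the slides: take $U(x) = x^2/2$, so that $p(x) \propto e^{-x^2/2}$ is Gaussian. Assuming the generator has the form $H_b\psi = \Delta\psi - 2\nabla\psi\cdot\nabla U$ and is defined as the limit of $(T_b - I)/\varepsilon$, low-order polynomials turn out to be eigenfunctions:

```latex
\begin{aligned}
H_b \psi &= \psi'' - 2x\,\psi' , \\
\psi_0(x) = 1: \quad & H_b \psi_0 = 0 , \\
\psi_1(x) = x: \quad & H_b \psi_1 = -2x = -2\,\psi_1 , \\
\psi_2(x) = x^2 - \tfrac{1}{2}: \quad & H_b \psi_2 = 2 - 4x^2 = -4\,\psi_2 .
\end{aligned}
```

The constant function has eigenvalue 0, the continuous counterpart of the constant right eigenvector of $M$ with eigenvalue 1, and the remaining eigenvalues are negative, consistent with eigenvalues of $M$ below 1.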
14 Spectral clustering and Fokker-Planck operator
- The term $\nabla\psi \cdot \nabla U$ is interpreted as the drift term towards low potential (higher data density).
- The left and right eigenvectors of $M$ can be viewed as discrete approximations of the eigenfunctions of $T_f$ and $T_b$, respectively.
- $T_f$ and $T_b$ can be viewed as approximations to $H_f$ and $H_b$, which in the asymptotic case ($\varepsilon \to 0$) can be viewed as a diffusion process with potential $2U(x)$, where $p(x) = \exp(-U(x))$.