Advanced Machine Learning

Transcript and Presenter's Notes

1
Advanced Machine Learning & Perception
Instructor: Tony Jebara
2
Topic 12
  • Manifold Learning (Unsupervised)
  • Beyond Principal Components Analysis (PCA)
  • Multidimensional Scaling (MDS)
  • Generative Topographic Map (GTM)
  • Locally Linear Embedding (LLE)
  • Convex Invariance Learning (CoIL)
  • Kernel PCA (KPCA)

3
Manifolds
  • Data is often embedded in a lower dimensional space
  • Consider an image of a face being translated from left to right
  • How do we capture the true coordinates of the data on the
    manifold or embedding space and represent them compactly?
  • An open problem with many possible approaches:
  • PCA: linear manifold
  • MDS: get inter-point distances, find 2D data with the same
    distances
  • LLE: mimic neighborhoods using low dimensional vectors
  • GTM: fit a grid of Gaussians to data via a nonlinear warp
  • CoIL: linear after nonlinear normalization/invariance of data
  • KPCA: linear in Hilbert space (kernels)

4
Principal Components Analysis
  • If we have the eigenvectors, mean, and coefficients, we can
    reconstruct the data
  • Getting eigenvectors (i.e. approximating the covariance)
  • Eigenvectors are orthonormal
  • In the coordinates of V, the Gaussian is diagonal with
    covariance Λ
  • All eigenvalues are non-negative
  • Higher eigenvalues mean higher variance; use those directions
    first
  • To compute the coefficients, project the centered data onto the
    eigenvectors (see the sketch below)
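
A minimal NumPy sketch of these steps (function names like pca and
coefficients are illustrative, not from the slides): eigendecompose
the sample covariance, keep the highest-variance directions first,
and compute coefficients by projecting centered data onto the
eigenvectors.

    import numpy as np

    def pca(X, d):
        """Fit PCA: return the mean, top-d eigenvectors, eigenvalues."""
        mu = X.mean(axis=0)
        Xc = X - mu                          # center the data
        C = Xc.T @ Xc / len(X)               # sample covariance
        evals, evecs = np.linalg.eigh(C)     # symmetric eigendecomposition
        order = np.argsort(evals)[::-1][:d]  # highest variance first
        return mu, evecs[:, order], evals[order]

    def coefficients(X, mu, V):
        """Coefficients c = V^T (x - mu): project centered data onto V."""
        return (X - mu) @ V

    def reconstruct(C, mu, V):
        """Approximate reconstruction x ~ mu + V c."""
        return mu + C @ V.T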

5
Multidimensional Scaling (MDS)
  • Idea: capture only the distances between points X in the
    original space
  • Construct another set of low dimensional or 2D points Y having
    the same distances
  • A dissimilarity d(x,y) is a function of two objects x and y
    such that d(x,y) ≥ 0, d(x,y) = d(y,x), and d(x,x) = 0
  • A metric also has to satisfy the triangle inequality:
    d(x,z) ≤ d(x,y) + d(y,z)
  • Standard example: the Euclidean l2 metric
  • Assume that for N objects we compute an N×N dissimilarity
    matrix D which tells us how far apart they are

6
Multidimensional Scaling
  • Given the dissimilarities D between the original X points under
    the original metric d(), find Y points whose dissimilarities D'
    under another metric d'() are similar to D
  • Want to find Ys that minimize some measure of the difference
    between D' and D
  • E.g. Least Squares Stress
  • E.g. Invariant Stress
  • E.g. Sammon Mapping
  • E.g. Strain

Some of these objectives are global, some are local; they can be
minimized by gradient descent (a sketch follows).
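
A minimal sketch of metric MDS by gradient descent on the least
squares stress Σ (D'_ij - D_ij)², assuming a Euclidean metric in the
embedding (the function name mds is illustrative):

    import numpy as np

    def mds(D, d=2, steps=1000, lr=0.05, seed=0):
        """Embed an (N,N) dissimilarity matrix D into d dimensions by
        gradient descent on the least squares stress."""
        rng = np.random.default_rng(seed)
        N = D.shape[0]
        Y = rng.normal(scale=1e-2, size=(N, d))
        for _ in range(steps):
            diff = Y[:, None, :] - Y[None, :, :]   # y_i - y_j
            Dp = np.linalg.norm(diff, axis=-1)     # current distances D'
            np.fill_diagonal(Dp, 1.0)              # avoid division by zero
            W = (Dp - D) / Dp                      # per-pair error terms
            np.fill_diagonal(W, 0.0)
            grad = 2.0 * (W[:, :, None] * diff).sum(axis=1)
            Y -= lr * grad                         # descend the stress
        return Y
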
7
MDS Example 3D to 2D
  • Have distances from cities to cities; these lie on the surface
    of a sphere (the Earth) in 3D space
  • The reconstructed 2D points on a plane capture the essential
    properties (poles?)

8
MDS Example Multi-D to 2D
  • A more elaborate example: have a correlation matrix between
    crimes, which are of arbitrary dimensionality
  • Hack: convert correlation to dissimilarity and show the
    reconstructed Y (one possible conversion is sketched below)
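
The slide does not say which conversion it uses; one common choice
(an assumption here) maps high correlation to small distance via
d_ij = sqrt(2 (1 - r_ij)), the Euclidean distance between unit-norm
standardized variables:

    import numpy as np

    def corr_to_dissimilarity(R):
        """Convert a correlation matrix R to dissimilarities;
        high correlation -> small distance (assumed conversion)."""
        return np.sqrt(np.clip(2.0 * (1.0 - R), 0.0, None))

    # e.g. feed the result into the mds() sketch above:
    # Y = mds(corr_to_dissimilarity(R), d=2)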

9
Locally Linear Embedding
  • Instead of distances, look at the neighborhood of each point
  • Preserve the reconstruction of each point from its neighbors
    in the low dimensional space
  • Find the K nearest neighbors for each point
  • Describe the neighborhood as the best weights on the neighbors
    to reconstruct the point
  • Find the best low dimensional vectors that still have the
    same weights

Why?
10
Locally Linear Embedding
  • Finding the Ws (a convex combination of weights on the
    neighbors), via the steps below (a code sketch follows):

1) Take the derivative and set it to 0
2) Solve the linear system
3) Find λ
4) Find w
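
A minimal NumPy sketch of the weight step (names are illustrative;
the sum-to-one constraint is enforced by normalizing, which absorbs
the Lagrange multiplier λ):

    import numpy as np

    def lle_weights(X, K=5, reg=1e-3):
        """LLE step 1: reconstruction weights for each point.

        Solves min ||x_i - sum_j w_ij x_j||^2 s.t. sum_j w_ij = 1
        over the K nearest neighbors of each point.
        """
        N = X.shape[0]
        W = np.zeros((N, N))
        for i in range(N):
            dist = np.linalg.norm(X - X[i], axis=1)
            nbrs = np.argsort(dist)[1:K + 1]     # K nearest, excluding self
            Z = X[nbrs] - X[i]                   # shift neighborhood to origin
            C = Z @ Z.T                          # local Gram matrix
            C += reg * np.trace(C) * np.eye(K)   # regularize if K > input dim
            w = np.linalg.solve(C, np.ones(K))   # from setting the derivative to 0
            W[i, nbrs] = w / w.sum()             # enforce sum-to-one constraint
        return W
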
11
Locally Linear Embedding
  • Finding the Ys (new low-D points that agree with the Ws)
  • Solve for Y as the bottom d+1 eigenvectors of
    M = (I - W)ᵀ(I - W), discarding the constant eigenvector
  • Plot the Y values (see the sketch below)
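
Continuing the sketch above, the embedding step: the smallest
eigenvector of M is the constant vector implied by the sum-to-one
weights, so it is dropped.

    import numpy as np

    def lle_embed(W, d=2):
        """LLE step 2: bottom d+1 eigenvectors of M = (I-W)^T (I-W),
        discarding the very bottom (constant) eigenvector."""
        N = W.shape[0]
        I_W = np.eye(N) - W
        M = I_W.T @ I_W
        evals, evecs = np.linalg.eigh(M)   # eigenvalues in ascending order
        return evecs[:, 1:d + 1]           # skip the constant eigenvector

    # Y = lle_embed(lle_weights(X, K=10), d=2); plot the Y values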

12
LLE Examples
  • The original X data are raw images
  • Dots are the reconstructed two-dimensional Y points

13
LLEs
  • Top: PCA
  • Bottom: LLE

14
Generative Topographic Map
  • A principled alternative to the Kohonen map
  • Forms a generative model of the manifold; we can sample from
    it, etc.
  • Find a nonlinear mapping y() from a 2D grid of Gaussians
  • Pick the parameters W of the mapping such that the mapped
    Gaussians in data space maximize the likelihood of the
    observed data
  • Have two spaces: the data space t (in the old notation, the Xs)
    and the hidden latent space x (in the old notation, the Ys)
  • The mapping goes from the latent space to the observed space

15
GTM as a Grid of Gaussians
  • We choose our priors and conditionals for all variables of
    interest
  • Assume Gaussian noise on the y() mapping
  • Assume our prior latent variables are a grid model, equally
    spaced in latent space
  • Can now write out the full likelihood (a standard form is
    given below)
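
In the standard GTM formulation (Bishop, Svensén and Williams),
which these slides follow, the conditional is a spherical Gaussian
around the mapped latent point and the prior is a uniform grid of
delta functions; the exact constants below are from that
formulation, not read off the slide:

    \[
    p(\mathbf{t}\mid \mathbf{x},\mathbf{W},\beta)
      = \Big(\tfrac{\beta}{2\pi}\Big)^{D/2}
        \exp\Big(-\tfrac{\beta}{2}\,
        \big\|\mathbf{t}-\mathbf{y}(\mathbf{x};\mathbf{W})\big\|^2\Big),
    \qquad
    p(\mathbf{x}) = \frac{1}{K}\sum_{k=1}^{K}
        \delta(\mathbf{x}-\mathbf{x}_k)
    \]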

16
GTM Distribution Model
  • Integrating over the delta functions turns the integral into a
    summation
  • Note the log-sum; we need to apply EM to maximize
  • Also, use the following parametric (linear in the basis) form
    of the mapping (see the equations below)
  • Examples of manifolds for randomly chosen W mappings
  • Typically, we are given the data and want to find the maximum
    likelihood mapping W for it
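
In the standard formulation, integrating out the grid prior gives a
mixture likelihood whose log contains a sum (hence EM), and the
mapping is linear in a fixed set of basis functions φ (radial basis
functions are the usual choice; the specific basis is an assumption
here):

    \[
    \mathcal{L}(\mathbf{W},\beta)
      = \sum_{n=1}^{N}\ln\Big[\frac{1}{K}\sum_{k=1}^{K}
        p(\mathbf{t}_n\mid \mathbf{x}_k,\mathbf{W},\beta)\Big],
    \qquad
    \mathbf{y}(\mathbf{x};\mathbf{W})
      = \mathbf{W}\,\boldsymbol{\phi}(\mathbf{x})
    \]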

17
GTM Examples
  • Recover the non-linear manifold by warping the grid with the
    W parameters
  • Synthetic example: left, initialized; right, converged
  • Real example: the oil data (3 classes); left, GTM; right, PCA