Title: Using Manifold Structure for Partially Labeled Classification
Using Manifold Structure for Partially Labeled Classification
by Belkin and Niyogi, NIPS 2002
Presented by Chunping Wang
Machine Learning Group, Duke University
November 16, 2007
Outline
- Motivations
- Algorithm Description
- Theoretical Interpretation
- Experimental Results
- Comments
Motivations (1)
- Why is manifold structure useful?
  - Data lies on a lower-dimensional manifold, so dimension reduction is preferable.
  - An example: a handwritten digit 0.
    - Usually, the dimensionality is the number of pixels, which is typically very high (256).
    - Ideally, 5-dimensional features would suffice.
    - Actually, the intrinsic dimensionality is higher, but perhaps no more than several dozen.
[Figure: a handwritten digit 0 annotated with measurements d1, d2 and features f1, f2]
Motivations (2)
- Why is manifold structure useful?
  - Data representation in the original space can be unsatisfactory: labeled and unlabeled points are hard to separate there, while a 2-d representation with Laplacian Eigenmaps reveals the class structure (see the sketch below).
[Figure: labeled and unlabeled points, shown in the original space and in the 2-d Laplacian Eigenmaps representation]
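As an illustration of what such a 2-d Laplacian Eigenmaps representation involves, here is a minimal Python sketch; the synthetic two-circles data, the parameter choices, and all names are my own assumptions rather than the paper's, and this is the simple unnormalized variant of the embedding.

```python
# Minimal (unnormalized) Laplacian Eigenmaps sketch on synthetic data.
import numpy as np
from scipy.sparse.csgraph import laplacian
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Hypothetical toy data: two noisy concentric circles.
t = rng.uniform(0, 2 * np.pi, 200)
X = np.vstack([np.c_[np.cos(t[:100]), np.sin(t[:100])],
               3 * np.c_[np.cos(t[100:]), np.sin(t[100:])]])
X += 0.05 * rng.standard_normal(X.shape)

# Symmetric n-nearest-neighbor adjacency matrix with 0/1 weights.
n = 5
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
idx = np.argsort(d2, axis=1)[:, :n]
W = np.zeros_like(d2)
W[np.repeat(np.arange(len(X)), n), idx.ravel()] = 1.0
W = np.maximum(W, W.T)              # i ~ j if either is a neighbor of the other

# Graph Laplacian L = D - W; eigenvectors 2 and 3 give the 2-d embedding
# (the very first one, with eigenvalue 0, is the uninformative constant vector).
L = laplacian(W)
vals, vecs = eigh(L)
embedding = vecs[:, 1:3]
print(embedding.shape)              # (200, 2)
```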
Algorithm Description (1)
Semi-supervised classification: $k$ points $x_1, \dots, x_k$, of which the first $s$ are labeled ($s < k$), with labels $c_i \in \{-1, 1\}$ for binary cases.
- Constructing the adjacency graph
  - $W_{ij} = 1$ if $i$ is among the $n$ nearest neighbors of $j$ or $j$ is among the $n$ nearest neighbors of $i$; $W_{ij} = 0$ otherwise.
- Eigenfunctions
  - For the graph Laplacian $L = D - W$, where $D$ is diagonal with $D_{ii} = \sum_j W_{ij}$, compute the eigenvectors $e_1, \dots, e_p$ corresponding to the $p$ smallest eigenvalues (a sketch of both steps follows below).
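A minimal sketch of these two steps in Python, assuming the 0/1 nearest-neighbor weights stated above; the function name and default parameter values are mine:

```python
# Sketch of the graph-building and eigen-decomposition steps.
import numpy as np
from scipy.linalg import eigh

def smoothest_eigenvectors(X, n=8, p=20):
    """Eigenvectors e_1..e_p of L = D - W for the symmetric n-NN graph on rows of X."""
    k = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    # W_ij = 1 if i is among the n nearest neighbors of j, or vice versa.
    W = np.zeros((k, k))
    idx = np.argsort(d2, axis=1)[:, :n]
    W[np.repeat(np.arange(k), n), idx.ravel()] = 1.0
    W = np.maximum(W, W.T)
    L = np.diag(W.sum(1)) - W      # graph Laplacian L = D - W
    vals, vecs = eigh(L)           # eigenvalues returned in ascending order
    return vecs[:, :p]             # columns are the p smoothest eigenvectors
```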
Algorithm Description (2)
Semi-supervised classification: $k$ points, first $s$ labeled ($s < k$), $c_i \in \{-1, 1\}$ for binary cases.
- Building the classifier
  - Minimize the error function $Err(a) = \sum_{i=1}^{s} \big( c_i - \sum_{j=1}^{p} a_j e_{ji} \big)^2$ over the space of coefficients $a = (a_1, \dots, a_p)^T$.
  - The solution is the least-squares estimate $\tilde{a} = (E^T E)^{-1} E^T c$, where $E$ is the $s \times p$ matrix of eigenvector values at the labeled points and $c = (c_1, \dots, c_s)^T$.
- Classifying unlabeled points ($i > s$)
  - $c_i = 1$ if $\sum_{j=1}^{p} \tilde{a}_j e_{ji} \ge 0$, and $c_i = -1$ otherwise (a sketch follows below).
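In code this is one call to an ordinary least-squares solver plus a sign rule; a minimal sketch, reusing the hypothetical eigenvector matrix from the previous snippet:

```python
# Sketch of the fit-and-classify steps: least squares on labeled rows, sign rule on all.
import numpy as np

def fit_and_classify(E, c_labeled):
    """E: k-by-p eigenvector matrix; c_labeled: labels in {-1, +1} for the first s points."""
    s = len(c_labeled)
    # a~ = argmin_a sum_{i<=s} (c_i - sum_j a_j e_ji)^2, computed stably via lstsq
    # rather than forming (E^T E)^{-1} E^T c explicitly.
    a, *_ = np.linalg.lstsq(E[:s], np.asarray(c_labeled, dtype=float), rcond=None)
    scores = E @ a                          # sum_j a_j e_ji for every point i
    return np.where(scores >= 0, 1, -1)     # predicted labels; rows i > s are the unlabeled ones
```

On the labeled rows this simply reproduces the fit; the useful output is the sign on the rows with $i > s$.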
Theoretical Interpretation (1)
For a manifold $M$, the eigenfunctions of its Laplacian $\Delta$ form a basis for the Hilbert space $L^2(M)$, i.e., any function $f \in L^2(M)$ can be written as $f = \sum_i a_i e_i$, with eigenfunctions satisfying $\Delta e_i = \lambda_i e_i$.
The simplest nontrivial example: the manifold is a unit circle $S^1$, where $\Delta f = -\frac{d^2 f}{d\phi^2}$, the eigenfunctions are $\sin(n\phi)$ and $\cos(n\phi)$, and the expansion $f(\phi) = \sum_n a_n \sin(n\phi) + b_n \cos(n\phi)$ is exactly the Fourier series.
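This picture can be checked numerically: the Laplacian of a ring graph, a discretized $S^1$, has sampled sines and cosines as eigenvectors. A small sanity check; the graph size and all names are my own choices:

```python
# The Laplacian eigenvectors of a ring graph (a discrete S^1) are discrete Fourier modes.
import numpy as np
from scipy.linalg import eigh

k = 64
i = np.arange(k)
W = np.zeros((k, k))
W[i, (i + 1) % k] = W[(i + 1) % k, i] = 1.0   # each point linked to its two ring neighbors
L = np.diag(W.sum(1)) - W
vals, vecs = eigh(L)

# Eigenvalues are 2 - 2*cos(2*pi*m/k): a zero for the constant function, then
# degenerate (sin, cos) pairs of increasing frequency, mirroring the Fourier basis.
print(np.round(vals[:5], 4))

# cos(phi) lies exactly in the eigenspace spanned by eigenvectors 1 and 2:
phi = 2 * np.pi * i / k
c = np.cos(phi) / np.linalg.norm(np.cos(phi))
proj = vecs[:, 1:3] @ (vecs[:, 1:3].T @ c)
print(np.allclose(proj, c))                    # True
```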
Theoretical Interpretation (2)
Smoothness measure: $S(f) = \int_M |\nabla f|^2$; a small $S(f)$ means $f$ is smooth.
For the unit circle $S^1$: $S(\sin(n\phi)) = \int_0^{2\pi} n^2 \cos^2(n\phi)\, d\phi = \pi n^2$.
Generally: $S(e_i) = \int_M |\nabla e_i|^2 = \int_M e_i \, \Delta e_i = \lambda_i$ for normalized eigenfunctions.
Smaller eigenvalues therefore correspond to smoother eigenfunctions (lower frequency); the first eigenfunction, with eigenvalue 0, is a constant function.
In terms of the smoothest $p$ eigenfunctions, the approximation of an arbitrary function $f$ is $f_p = \sum_{i=1}^{p} a_i e_i$ (a discrete check follows below).
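On a graph, the discrete analogue of $S(f)$ is the quadratic form $f^T L f = \frac{1}{2}\sum_{ij} W_{ij}(f_i - f_j)^2$, which for a unit-norm eigenvector equals its eigenvalue. A short check, reusing the ring graph above (my construction):

```python
# Discrete smoothness: f^T L f equals lambda_m for a unit-norm eigenvector,
# so small eigenvalues correspond to smooth (slowly varying) functions.
import numpy as np
from scipy.linalg import eigh

k = 64
i = np.arange(k)
W = np.zeros((k, k))
W[i, (i + 1) % k] = W[(i + 1) % k, i] = 1.0   # the ring graph again
L = np.diag(W.sum(1)) - W
vals, vecs = eigh(L)

for m in (0, 1, 5, 20):
    e = vecs[:, m]
    print(m, round(vals[m], 4), round(e @ L @ e, 4))   # eigenvalue == smoothness
# m = 0 gives 0.0: the constant vector is the smoothest function of all.
```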
Theoretical Interpretation (3)
Back to our problem with a finite number of points: the graph Laplacian plays the role of the manifold Laplacian, and the classifier solves a discrete version of the same problem, approximating the label function by the $p$ smoothest eigenvectors of $L$.
For binary classification, the alphabet of the function $f$ contains only two possible values. For M-ary cases, the only difference is that the number of possible values is more than two (one possible implementation is sketched below).
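The slide does not spell out the M-ary construction; one plausible reading, which I am assuming here rather than quoting from the paper, is to run the same least-squares fit once per class against $\pm 1$ indicator targets and pick the class with the largest score:

```python
# Hypothetical M-ary extension: one least-squares fit per class, argmax to decide.
import numpy as np

def fit_and_classify_multiclass(E, y_labeled, num_classes):
    """E: k-by-p eigenvector matrix; y_labeled: integer labels 0..M-1 for the first s points."""
    s = len(y_labeled)
    C = -np.ones((s, num_classes))
    C[np.arange(s), y_labeled] = 1.0              # +/-1 indicator target per class column
    A, *_ = np.linalg.lstsq(E[:s], C, rcond=None) # p-by-M coefficient matrix, all classes at once
    return np.argmax(E @ A, axis=1)               # predicted class for every point
```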
Results (1)
Handwritten Digit Recognition (MNIST data set)
60,000 28-by-28 gray images (the first 100
principal components are used)
[Figure: classification results with p = 20% k]
Results (2)
Text Classification (20 Newsgroups data set)
19,935 vectors with dimensionality of 6000
[Figure: classification results with p = 20% k]
Comments
- This semi-supervised algorithm essentially converts the original problem into a linear regression problem in a new space with lower dimensionality.
- The approach used to solve this linear regression problem is standard least-squares estimation.
- Only the n nearest neighbors are considered for each data point, so the graph Laplacian is sparse and the computation of the eigen-decomposition is reduced.
- Little additional computation is required after the dimensionality reduction (see the end-to-end sketch below).
- More comments
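To make the comments concrete, here is a toy end-to-end run; the synthetic two-blob data, the parameter values, and every name are my own illustrative choices, and the sparse eigensolver is what exploits the n-NN sparsity mentioned above:

```python
# Toy end-to-end run: 400 points in two Gaussian blobs, only 20 labels revealed.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import laplacian
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (200, 2)), rng.normal(3, 0.5, (200, 2))])
y = np.r_[np.full(200, -1), np.full(200, 1)]
perm = rng.permutation(400)            # shuffle so the labeled points hit both classes
X, y = X[perm], y[perm]
s, p, n = 20, 10, 8                    # labeled points, eigenvectors, nearest neighbors

# Sparse symmetric n-NN graph; sparsity is what keeps the eigen-decomposition cheap.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
idx = np.argsort(d2, axis=1)[:, :n]
W = csr_matrix((np.ones(400 * n), (np.repeat(np.arange(400), n), idx.ravel())),
               shape=(400, 400))
W = W.maximum(W.T)
L = laplacian(W)

# p smoothest eigenvectors via shift-invert (the tiny negative shift avoids the singular L).
vals, E = eigsh(L, k=p, sigma=-1e-5, which='LM')

# Least-squares fit on the s labeled rows, then the sign rule on everything else.
a, *_ = np.linalg.lstsq(E[:s], y[:s].astype(float), rcond=None)
pred = np.where(E @ a >= 0, 1, -1)
print('error rate on unlabeled points:', np.mean(pred[s:] != y[s:]))
```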