1
Using Manifold Structure for Partially Labeled Classification
by Belkin and Niyogi, NIPS 2002
Presented by Chunping Wang, Machine Learning Group, Duke University, November 16, 2007
2
Outline
  • Motivations
  • Algorithm Description
  • Theoretical Interpretation
  • Experimental Results
  • Comments

3-5
Motivations (1)
  • Why is manifold structure useful?
  • Data lies on a lower-dimensional manifold, so dimension reduction is preferable
  • An example: a handwritten digit 0

Usually, the dimensionality is the number of pixels, typically very high (256).
Ideally, about 5 features would suffice.
[Figure: a handwritten "0" annotated with features d1, d2, f1, f2]
Actually, a somewhat higher dimensionality is needed, but perhaps no more than several dozen.
6
Motivations (2)
  • Why is manifold structure useful?
  • Data representation in the original space is unsatisfactory

[Figure: labeled and unlabeled points shown in the original space and in a 2-D representation obtained with Laplacian Eigenmaps]
7
Algorithm Description (1)
Semi-supervised classification: k points x_1, ..., x_k, of which the first s are
labeled (s < k) with labels c_i in {-1, +1} for binary cases
  • Constructing the adjacency graph
  • W_ij = 1 if i is among the n nearest neighbors of j or j is among the n
    nearest neighbors of i, and W_ij = 0 otherwise
  • Eigenfunctions
  • compute the eigenvectors e_1, ..., e_p corresponding to the p smallest
    eigenvalues of the graph Laplacian L = D - W, where D is diagonal with
    D_ii = Σ_j W_ij (a code sketch of these two steps follows)
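A minimal sketch of these two steps in Python, assuming the data sits in a NumPy array X of shape (k, d); the function name, the default number of neighbors, and p = 20 are illustrative choices, not values fixed by the paper:

import numpy as np
from scipy.spatial.distance import cdist

def laplacian_eigenvectors(X, n_neighbors=8, p=20):
    """Symmetric n-nearest-neighbor graph and the p eigenvectors of L = D - W
    with the smallest eigenvalues."""
    k = X.shape[0]
    dist = cdist(X, X)                               # pairwise Euclidean distances
    W = np.zeros((k, k))
    for i in range(k):
        nn = np.argsort(dist[i])[1:n_neighbors + 1]  # skip the point itself
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)      # edge if i is a neighbor of j or j is a neighbor of i
    D = np.diag(W.sum(axis=1))
    L = D - W
    _, eigvecs = np.linalg.eigh(L)                   # eigenvalues in ascending order
    return eigvecs[:, :p]                            # columns e_1, ..., e_p

For large k the dense decomposition would be replaced by a sparse representation of W and scipy.sparse.linalg.eigsh; the dense form just keeps the sketch short.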

8
Algorithm Description (2)
Semi-supervised classification: k points, first s labeled (s < k), binary case
  • Building the classifier
  • minimize the error function Err(a) = Σ_{i=1..s} (c_i - Σ_{j=1..p} a_j e_j(i))^2
    over the space of coefficients a = (a_1, ..., a_p)
  • the solution is the least-squares estimate a = (E_s^T E_s)^{-1} E_s^T c, where
    E_s is the s-by-p matrix of eigenvector values e_j(i) at the labeled points and
    c = (c_1, ..., c_s)^T
  • Classifying unlabeled points (i > s): assign c_i = 1 if Σ_{j=1..p} a_j e_j(i) ≥ 0,
    and c_i = -1 otherwise (a code sketch follows)
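A sketch of these two steps under the same assumptions, with E the k-by-p eigenvector matrix from the previous sketch; the function name is illustrative:

import numpy as np

def fit_and_classify(E, labels_s):
    """E: k-by-p matrix of Laplacian eigenvectors; labels_s: the +/-1 labels of the
    first s points. Returns predicted +/-1 labels for the remaining k - s points."""
    s = len(labels_s)
    a, *_ = np.linalg.lstsq(E[:s], labels_s, rcond=None)  # minimize sum_i (c_i - E_i a)^2
    scores = E[s:] @ a                                     # sum_j a_j e_j(i) for i > s
    return np.where(scores >= 0, 1, -1)

np.linalg.lstsq returns the same minimizer as the closed-form (E_s^T E_s)^{-1} E_s^T c on the slide, but computed via an SVD, which is numerically more stable.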


9
Theoretical Interpretation (1)
For a manifold M, the eigenfunctions e_i of its Laplacian Δ form a basis for the
Hilbert space L²(M), i.e., any function f ∈ L²(M) can be written as
f = Σ_i a_i e_i, with the eigenfunctions satisfying Δ e_i = λ_i e_i.
The simplest nontrivial example: the manifold is the unit circle S¹, and the
expansion is the Fourier series (written out below).
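A concrete instance of the circle case, written out as standard Fourier analysis (a reference calculation, not slide content):

% eigenfunctions of the Laplacian \Delta f = -f'' on the unit circle S^1
\Delta \sin(n\phi) = n^2 \sin(n\phi), \qquad \Delta \cos(n\phi) = n^2 \cos(n\phi), \qquad n = 0, 1, 2, \dots
% so any f \in L^2(S^1) expands in this eigenbasis as its Fourier series
f(\phi) = a_0 + \sum_{n \ge 1} \bigl( a_n \cos(n\phi) + b_n \sin(n\phi) \bigr)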
10
Theoretical Interpretation (2)
Smoothness measure S: a small S(f) means f is smooth.
For the unit circle S¹:  S(f) = ∫_{S¹} |f'(φ)|² dφ
Generally:  S(f) = ∫_M |∇f|² = ∫_M f Δf, so for a normalized eigenfunction S(e_i) = λ_i.
Smaller eigenvalues correspond to smoother eigenfunctions (lower frequency);
the eigenfunction with eigenvalue 0 is a constant function.
In terms of the smoothest p eigenfunctions, the approximation of an arbitrary
function f is f ≈ Σ_{i=1..p} a_i e_i (a worked circle example follows).
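A worked instance of the smoothness measure on the circle eigenbasis (my own short calculation, for illustration):

% smoothness of the eigenfunction \sin(n\phi) on S^1
S\bigl(\sin(n\phi)\bigr)
  = \int_{0}^{2\pi} \Bigl(\tfrac{d}{d\phi}\sin(n\phi)\Bigr)^{2} d\phi
  = n^{2} \int_{0}^{2\pi} \cos^{2}(n\phi)\, d\phi
  = n^{2}\pi
% larger eigenvalue n^2  ==>  larger S  ==>  a less smooth, higher-frequency eigenfunction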
11
Theoretical Interpretation (3)
Back to our problem with a finite number of points: the graph Laplacian plays the
role of the manifold Laplacian, and the classifier above is the solution of a
discrete version of this approximation problem.
For binary classification the label function f takes only two possible values;
for M-ary classification the only difference is that it takes more than two
(a toy end-to-end run follows).
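A toy end-to-end run of the discrete version, reusing the two Python sketches above; the synthetic two-moons data and every parameter value here are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 100

# two interleaved half-moons: a classic example of data lying near a low-dimensional manifold
t = rng.uniform(0, np.pi, n)
upper = np.c_[np.cos(t), np.sin(t)]
lower = np.c_[1.0 - np.cos(t), 0.5 - np.sin(t)]
X = np.vstack([upper, lower]) + 0.05 * rng.standard_normal((2 * n, 2))
labels = np.r_[np.ones(n), -np.ones(n)].astype(int)

# the algorithm assumes the s labeled points come first: here s = 10 (5 per class)
order = np.r_[np.arange(5), n + np.arange(5), np.arange(5, n), n + np.arange(5, n)]
X, labels = X[order], labels[order]

E = laplacian_eigenvectors(X, n_neighbors=8, p=10)   # from the earlier sketch
pred = fit_and_classify(E, labels[:10])              # from the earlier sketch
print("accuracy on the unlabeled points:", np.mean(pred == labels[10:]))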
12
Results (1)
Handwritten Digit Recognition (MNIST data set)
60,000 28-by-28 grayscale images (the first 100 principal components are used); p = 20
[Results figure omitted]
13
Results (2)
Text Classification (20 Newsgroups data set)
19,935 vectors with dimensionality 6,000; p = 20
[Results figure omitted]
14
Comments
  • This semi-supervised algorithm essentially converts the original problem
    into a linear regression problem in a new, lower-dimensional space.
  • The linear regression problem is solved by standard least-squares estimation.
  • Only the n nearest neighbors are considered for each data point, which keeps
    the graph sparse and reduces the cost of the eigendecomposition.
  • Little additional computation is needed after the dimensionality reduction.