1
Problems with Isomap, LLE and other Nonlinear
Dimensionality Reduction Techniques
  • Jonathan Huang (jch1@cs.cmu.edu)
  • Advanced Perception 2006, CMU
  • May 1st, 2006

2
ISOMAP Algorithm Review
  • Connect each point to its k nearest neighbors to form a graph
  • Approximate pairwise geodesic distances by running Dijkstra's algorithm on this graph
  • Apply metric MDS to recover a low-dimensional isometric embedding (all three steps are sketched below)
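A minimal sketch of the three steps, assuming numpy, scipy, and scikit-learn are available (the function and parameter names are illustrative, not from the slides):

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def isomap_sketch(X, k=10, d=2):
    # Step 1: k-nearest-neighbor graph with Euclidean edge weights.
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    # Step 2: approximate geodesics as graph shortest paths (Dijkstra).
    D = shortest_path(G, method="D", directed=False)
    # Step 3: classical metric MDS on the squared geodesic distances.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:d]            # largest d eigenpairs
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))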

3
Questionable Assumptions
  • ISOMAP can fail in both of its steps (the geodesic approximation and MDS) if the assumptions that guarantee success are not met
  • Geodesic approximation:
    • Points must be sampled uniformly (and fairly densely) from a manifold, with no noise
    • The intrinsic parameter space must be convex
  • MDS:
    • There might not exist an isometric embedding (or anything close to one)

4
ISOMAP Topological Instabilities
  • ISOMAP is prone to short circuits and topological instabilities
  • It's not always clear how to define neighborhoods
  • We might get a disconnected graph (a quick check is sketched below)
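One practical guard against the disconnected-graph case, sketched under the same assumptions as above:

from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import connected_components

G = kneighbors_graph(X, n_neighbors=10, mode="distance")  # X: data matrix
n_comp, labels = connected_components(G, directed=False)
if n_comp > 1:
    # Geodesic estimates between components are infinite; either increase
    # k or embed each component separately.
    print(f"k-NN graph has {n_comp} components; ISOMAP will fail")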

5
ISOMAP Convex Intrinsic Geometry
  • Image databases (Donoho/Grimes)

[Figure: input data and the corresponding ISOMAP output]
6
ISOMAP Nonconvex Intrinsic Geometry
  • Problems in estimating geodesics can arise if the parameter space is not convex
  • Examples: S¹ topology, images of a rotating teapot (see the sketch below)

[Figure: input data, ISOMAP output, and eigenvalue spectrum. Images by Lawrence Saul and Carrie Grimes]
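The S¹ case is easy to reproduce; a sketch using scikit-learn's Isomap as a stand-in for the original implementation:

import numpy as np
from sklearn.manifold import Isomap

theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])  # points on a circle (S^1)
# The intrinsic parameter (the angle) is one-dimensional, but its space is
# a loop, not a convex interval: no faithful 1-D embedding exists.
Y1 = Isomap(n_neighbors=4, n_components=1).fit_transform(X)
# ISOMAP effectively needs two dimensions to represent the circle.
Y2 = Isomap(n_neighbors=4, n_components=2).fit_transform(X)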
7
ISOMAP Nonconvex Intrinsic Geometry
  • ISOMAP can fail for:
    • Occlusions
    • Periodic gaits

[Figure: examples with occlusion and a periodic walking gait. Images by Lawrence Saul and Carrie Grimes]
8
ISOMAP Complexity
  • For large datasets, ISOMAP can be slow
  • k-nearest-neighbor search scales as O(n²D) (for the naïve implementation, anyway)
  • Dijkstra scales as O(n² log n + n²k)
  • Metric MDS scales as O(n²d)
  • One solution is to use Nyström approximations (Landmark ISOMAP), sketched below
  • But we need lots of points to get an accurate approximation to the true geodesics!
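A sketch of the landmark idea (hypothetical names; the Nyström/landmark-MDS step that would consume the landmark distances is omitted):

import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import kneighbors_graph

def landmark_geodesics(X, m=50, k=10, seed=0):
    # Run Dijkstra only from m << n landmarks: roughly O(m(nk + n log n))
    # instead of O(n(nk + n log n)) for the all-pairs version.
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    rng = np.random.default_rng(seed)
    landmarks = rng.choice(X.shape[0], size=m, replace=False)
    D_land = shortest_path(G, method="D", directed=False, indices=landmarks)
    return landmarks, D_land    # m x n geodesic matrix for landmark MDS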

9
ISOMAP Dimensionality Estimation
  • Preserving distances may hamper dimensionality reduction!
  • Gauss's Theorema Egregium says that some objects just can't be embedded isometrically in a lower dimension
  • (The intrinsic curvature of a surface is invariant under local isometry)
  • It is sometimes possible to figure out the intrinsic dimension of a surface by looking at the spectrum given by MDS (a sketch follows). This will not work for a large class of surfaces, though.
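A sketch of that spectrum heuristic; the threshold is an illustrative choice, and `w` would be the eigenvalues from the MDS step of the earlier ISOMAP sketch:

import numpy as np

def estimate_dimension(w, threshold=0.99):
    # Smallest d whose top-d MDS eigenvalues explain `threshold` of the
    # total. For highly curved surfaces (e.g. the fish bowl below) the
    # spectrum has no clean elbow and the estimate is unreliable.
    w = np.sort(np.clip(w, 0.0, None))[::-1]
    explained = np.cumsum(w) / w.sum()
    return int(np.searchsorted(explained, threshold)) + 1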

[Figure: fish bowl dataset, the ISOMAP embedding, and a better embedding]
10
ISOMAP Weakness Summary
  • Sensitive to noise (short circuits)
  • Fails for nonconvex parameter spaces
  • Fails to recover correct dimension for spaces
    with high intrinsic curvature
  • Slow for large training sets

11
LLE Algorithm Review
  • Compute the k nearest neighbors of each point
  • Solve for the weights that best reconstruct each point as a linear combination of its neighbors
  • Find a low-dimensional embedding that minimizes the reconstruction loss (all three steps are sketched below)
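A minimal dense sketch of these steps, under the same assumptions as the ISOMAP snippet (names are illustrative; a real implementation would keep W sparse):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle_sketch(X, k=10, d=2, reg=1e-3):
    n = X.shape[0]
    # Step 1: k nearest neighbors (excluding each point itself).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X, return_distance=False)[:, 1:]
    # Step 2: weights that best reconstruct each point from its neighbors.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                # neighbors, centered at x_i
        C = Z @ Z.T                         # local covariance
        C += reg * np.trace(C) * np.eye(k)  # regularize when k > dim
        w = np.linalg.solve(C, np.ones(k))
        W[i, idx[i]] = w / w.sum()          # weights sum to one
    # Step 3: embedding = bottom eigenvectors of M = (I - W)^T (I - W),
    # skipping the constant eigenvector.
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]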

12
LLE
  • LLE has some nice properties:
    • The result is globally optimal
    • The hardest part is a sparse eigenvector problem (sketched below)
    • It does not worry about distance preservation (this can be good or bad, I suppose)
  • But:
    • Dimensionality estimation is not as straightforward
    • There are no theoretical guarantees
    • Like ISOMAP, it is sensitive to noise
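The sparse eigenvector problem above can be attacked with an iterative solver instead of the dense eigendecomposition used in the sketch below the algorithm review; a sketch, assuming W is stored sparse:

from scipy.sparse import eye
from scipy.sparse.linalg import eigsh

def lle_embedding_from_weights(W, d):
    # W: sparse n x n matrix of reconstruction weights (rows sum to one)
    n = W.shape[0]
    M = (eye(n) - W).T @ (eye(n) - W)    # sparse embedding cost matrix
    # Shift-invert around a tiny negative value targets the smallest
    # eigenvalues while keeping the factored matrix nonsingular.
    vals, vecs = eigsh(M.tocsc(), k=d + 1, sigma=-1e-9)
    return vecs[:, 1:]                   # drop the constant eigenvector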

13
LLE Estimating Dimension
  • The eigenvalues of the embedding cost matrix M = (I − W)ᵀ(I − W) do not clearly indicate dimensionality!

14
LLE
  • Results depend on the size of the neighborhood set, k (ISOMAP also has this problem; see the sketch below)
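One way to see this sensitivity directly, as a sketch (the swiss roll is just a convenient stand-in dataset, and scikit-learn's LLE a stand-in implementation):

import matplotlib.pyplot as plt
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1000, random_state=0)
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, k in zip(axes, (5, 10, 20, 40)):
    # Embeddings can change qualitatively as k varies.
    Y = LocallyLinearEmbedding(n_neighbors=k, n_components=2).fit_transform(X)
    ax.scatter(Y[:, 0], Y[:, 1], c=color, s=5)
    ax.set_title(f"k = {k}")
plt.show()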

15
LLE Alternative
  • In contrast to ISOMAP, LLE does not really come with many theoretical guarantees
  • There are LLE extensions that do have guarantees, e.g. Hessian LLE (a usage sketch follows)
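Hessian LLE is available off the shelf today; a usage sketch with scikit-learn (which postdates these slides):

from sklearn.manifold import LocallyLinearEmbedding

# Hessian LLE requires n_neighbors > n_components * (n_components + 3) / 2.
hlle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="hessian")
Y = hlle.fit_transform(X)    # X: any (n_samples, n_features) data matrix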

16
LLE vs. PCA on Digit Classification
  • Is Dimensionality Reduction really the right
    thing to do for this supervised learning task?
  • Does the space of handwritten digits have
    manifold geometry?

17
General Remarks
  • ISOMAP and LLE share many virtues, but they are also marred by the same flaws:
    • Sensitivity to noise
    • Sensitivity to non-uniformly sampled data
    • No principled approaches to determining intrinsic topology (or dimensionality, for that matter)
    • No principled way to set k, the size of the neighborhood set
    • Trouble dealing with multiple clusters (connected components)
    • No easy out-of-sample extensions

18
More General Remarks
  • Manifold learning is not always appropriate!
  • Example: PCA is often bad for classification
  • When does natural data actually lie on a manifold?
  • How do we reconcile nonlinear dimensionality reduction with kernel methods?
  • Very little work has been done on determining the intrinsic topology of high-dimensional data (and topology is important if we hope to recover natural parameterizations)

19
References
  • J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, 2000.
  • Sam Roweis and Lawrence Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, 2000.
  • David Donoho and Carrie Grimes. Hessian eigenmaps: new locally linear embedding techniques for high-dimensional data. PNAS, 100:5591-5596, 2003.
  • David Donoho and Carrie Grimes. Image manifolds which are isometric to Euclidean space. TR 2002-27, Dept. of Statistics, Stanford University, Stanford, CA, 2002.
  • Lawrence K. Saul and Sam T. Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. JMLR, 4:119-155, 2003.