Title: Learning the structure of manifolds using random projections
1. Learning the structure of manifolds using random projections
- Freund, Dasgupta, Kabra, Verma
- UC San Diego
- Presentation by Steven Bergner
- Simon Fraser University
2. Structure
- Definitions and problem setting
- Related work
- Random projection trees
- Results
3. Data
- Data matrix X_{i,j} of coordinates
- Row i = 1..N is a data sample
- Column j = 1..D is an attribute or dimension
- Challenges
- Large N: storage, streaming, sampling
- Large D: insufficient training data
- Undefined fields: graphical models
4. Manifolds
- Every point has a neighborhood that locally looks like R^n
- Global structure may differ
[Figures: Swiss roll and the Earth as example manifolds; source: Wikipedia]
5. Dimension
- Extrinsic
- Number of measurements
- (Non-)linear dependencies
- Intrinsic
- Data lies near a d-dimensional manifold, d < D
- Independent, uncorrelated degrees of freedom
- E.g. doubling dimension: the smallest d such that every ball can be covered by 2^d balls of half the radius
6. Distributions with low intrinsic d
- Example: motion capture
- D markers, each with 3 coordinates
- Body posture is determined by a small number of joint angles
7. Related work
8. (Non-)parametric statistics
- Parametric
- E.g. fitting a Gaussian to observations
- Needs a model
- Non-parametric
- E.g. estimating a histogram (density)
- Bayesian statistics
- Manifold learning
- Needs lots of examples
- Framework: approximation theory
9. Manifold learning
- Incrementally grow neighborhoods
- Locally linear embedding (LLE), Roweis & Saul 2000
- W_{i,j}: weights on local neighbors that best reconstruct point i
- Embedding coordinates come from the bottom eigenvectors of (I − W)^T (I − W)
- ISOMAP, Tenenbaum et al. 2000
- Build a k-nearest-neighbor graph
- Collect shortest-path lengths between all point pairs in a matrix A
- Eigenvectors of A provide the embedding coordinates (usage sketch below)
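
A minimal usage sketch of the two embeddings via scikit-learn (not from the slides; the data matrix X and all parameter values are illustrative):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding, Isomap

# Illustrative data: X is an (N, D) matrix as on the "Data" slide.
X = np.random.default_rng(0).normal(size=(500, 10))

# LLE: reconstruct each point from its neighbors, then embed.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X_lle = lle.fit_transform(X)

# ISOMAP: k-NN graph, shortest-path distances, spectral embedding.
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)
```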
10. Random projections
- Johnson & Lindenstrauss 1984
- Classifier capacity with random projections, Garg 2002
- Compressive sensing, Candès 2006
11. Johnson-Lindenstrauss lemma
- Target dimension k does not depend on the original dimension D
- Any n points can be mapped to R^k with k = O(ε⁻² log n) while preserving all pairwise distances up to a factor of (1 ± ε) (numerical illustration below)
- Elementary proof: Dasgupta & Gupta '99
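
A quick numerical check of the lemma, assuming a Gaussian random projection; the constant 8 in the choice of k is an illustrative value, not from the slides:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, D, eps = 100, 10_000, 0.25
k = int(np.ceil(8 * np.log(n) / eps**2))   # k depends on n and eps, not on D

X = rng.normal(size=(n, D))
P = rng.normal(size=(D, k)) / np.sqrt(k)   # Gaussian random projection matrix
Y = X @ P

# All pairwise distances should be preserved up to roughly (1 +/- eps).
print("max distortion:", np.abs(pdist(Y) / pdist(X) - 1).max())
```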
12. Random projection trees
13. Kd-trees
- Binary space partitioning (BSP) along coordinate axes
- Used for nearest-neighbor queries (usage sketch below)
- Associative memory
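
A small usage sketch of kd-tree nearest-neighbor queries, here via scipy (illustrative; the slides do not prescribe a library):

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
points = rng.uniform(size=(1000, 3))

tree = KDTree(points)                       # axis-aligned splits over the points
dist, idx = tree.query(rng.uniform(size=3), k=1)
print(f"nearest neighbor {idx} at distance {dist:.3f}")
```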
14. RP-trees
- Split along random directions
- The split point minimizes the within-cell variance
15. Algorithm: MakeTree
- S is the point set
- Rule(x) divides the set into two children (sketch below)
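
A minimal Python sketch of this recursion, assuming a choose_rule(S) function as on the next slides; MIN_SIZE is a hypothetical leaf-size threshold:

```python
import numpy as np

MIN_SIZE = 10  # hypothetical leaf-size threshold

def make_tree(S, choose_rule):
    """Recursively partition the point set S (a list of numpy vectors)."""
    if len(S) < MIN_SIZE:
        return ("leaf", S)
    rule = choose_rule(np.asarray(S))       # boolean predicate on points
    left = [x for x in S if rule(x)]
    right = [x for x in S if not rule(x)]
    if not left or not right:               # degenerate split: stop here
        return ("leaf", S)
    return (rule, make_tree(left, choose_rule), make_tree(right, choose_rule))
```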
16. Algorithm: PCA choose rule
- Project onto the principal eigenvector and split at the median (sketch below)
- Sorting along a random direction v gives a similar median split
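
A sketch of the PCA rule under these assumptions (S is an (n, D) numpy array; the median split point follows the slide):

```python
import numpy as np

def pca_choose_rule(S):
    """Split at the median of projections onto the top principal direction."""
    S = np.asarray(S)
    centered = S - S.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    v = vt[0]                               # principal eigenvector
    threshold = np.median(S @ v)
    return lambda x: x @ v <= threshold
```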
17. Point set diameters
- Diameter of S: Δ(S) = max ‖x − y‖ over all x, y in S
- Average diameter: Δ_A²(S) = (1/|S|²) Σ_{x,y ∈ S} ‖x − y‖² (code below)
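
A small sketch computing both quantities, using the identity that the mean squared pairwise distance equals twice the mean squared distance to the centroid:

```python
import numpy as np
from scipy.spatial.distance import pdist

def diameter(S):
    """Delta(S): largest pairwise distance."""
    return pdist(S).max()

def avg_diameter_sq(S):
    """Delta_A^2(S): mean squared pairwise distance, computed as twice the
    mean squared distance to the centroid."""
    S = np.asarray(S)
    return 2.0 * ((S - S.mean(axis=0)) ** 2).sum(axis=1).mean()
```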
18. Algorithm: RP-tree choose rule
- The split minimizes the within-cell variance of the resulting cells (sketch below)
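
A hedged sketch of one variant of this rule: project onto a random unit direction and scan for the variance-minimizing split point. The paper also uses a distance-from-mean split for cells dominated by far-away points; that case is omitted here:

```python
import numpy as np

def rp_choose_rule(S, rng=None):
    """Project onto a random unit direction and pick the split point that
    minimizes the summed within-cell variance of the two sides."""
    rng = rng if rng is not None else np.random.default_rng()
    S = np.asarray(S)
    v = rng.standard_normal(S.shape[1])
    v /= np.linalg.norm(v)
    proj = np.sort(S @ v)
    best_cost, best_t = np.inf, proj[len(proj) // 2]
    for i in range(1, len(proj)):           # O(n^2) scan; prefix sums would be O(n)
        left, right = proj[:i], proj[i:]
        cost = len(left) * left.var() + len(right) * right.var()
        if cost < best_cost:
            best_cost, best_t = cost, (proj[i - 1] + proj[i]) / 2
    return lambda x: x @ v <= best_t
```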
19. Building an RP tree
- PCA ellipsoid shown for comparison only
- The split is now chosen via the RP rule
20.-24. Building an RP tree
[Figure sequence: successive RP splits refining the partition]
25. Split diameter theorem
- Assume a cell C has covariance dimension (d, ε)
- Then a split of C reduces the expected average squared diameter: E[Δ_A²(child cells)] ≤ (1 − c/d) · Δ_A²(C) for a constant c
26. Proof for doubling dimension d
- A cell of diameter Δ may be covered by O(d log d) balls of radius < Δ/2
- Those can be split apart with O(d log d) random projections
27. Streaming implementation
- A fixed set of random directions v is chosen at the beginning
- Use the v that minimizes the average diameter
- Both split types operate on the projected points
- Statistics are updated for each node (sketch below)
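
A sketch of the per-node bookkeeping, assuming Welford-style running moments of the projections; the paper's exact statistics may differ:

```python
class NodeStats:
    """Running moments of points projected onto candidate random directions,
    so split quality can be evaluated without storing the stream."""

    def __init__(self, num_directions):
        self.n = 0
        self.mean = [0.0] * num_directions
        self.m2 = [0.0] * num_directions    # sums of squared deviations

    def update(self, projections):
        """projections[i] = <x, v_i> for the incoming point x."""
        self.n += 1
        for i, p in enumerate(projections):
            delta = p - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (p - self.mean[i])

    def variance(self, i):
        return self.m2[i] / self.n if self.n > 1 else 0.0
```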
28. Results
29. Results (synthetic data 1)
Data set 1: 10,000 points in the 1000-dimensional unit cube, randomly perturbed by Gaussian noise with σ = 1
30. Results (synthetic data 2)
Data set 2: 10,000 points drawn equally from two 1000-dimensional Gaussians centered at (1, ..., 1) and (−1, ..., −1) (generation sketch below)
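
A sketch generating both synthetic data sets as described; the unit variance of the mixture components in data set 2 is an assumption the slides leave open:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 10_000, 1000

# Data set 1: points in the unit cube, perturbed by Gaussian noise (sigma = 1).
data1 = rng.uniform(0.0, 1.0, size=(N, D)) + rng.normal(0.0, 1.0, size=(N, D))

# Data set 2: equal mixture of two Gaussians centered at (1,...,1) and (-1,...,-1).
signs = np.where(rng.integers(0, 2, size=(N, 1)) == 0, 1.0, -1.0)
data2 = signs + rng.normal(0.0, 1.0, size=(N, D))
```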
31.-33. MNIST data: handwritten digits
[Figure sequence on MNIST handwritten digit data]
34. Applications
- The manifold description may be used for classification and interpolation
- Compression is also possible when moving to projection coordinates
- My interest
- VC-dimension and discrepancy
35. Thank you for your attention.