Title: Principal Component Analysis
1. Principal Component Analysis
2. The general framework for network training
[Diagram: Input data → Pre-processing → Network Optimization → Post-processing → Output data]
In practice, pre-processing the data is one of the main issues in network training; post-processing often only involves passively re-mapping the network output into its raw form.
3. Pre-processing Data (1)
- Input normalization: normalize the data so that the components are on the same scale and have zero mean.
- Dimensionality reduction: because of the curse of dimensionality, for high-dimensional data it is essential to reduce the dimensionality first.
- Feature selection: we may use prior knowledge to choose the important features as inputs.
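As a sketch of the normalization step (in Python/NumPy here, rather than the Matlab used in the demo later in these slides), each input component is shifted to zero mean and rescaled to unit variance; the data below are made up for illustration:

```python
import numpy as np

# Toy data: 100 examples, 3 input components on very different scales.
rng = np.random.default_rng(0)
x = rng.normal(loc=[5.0, -2.0, 100.0], scale=[0.1, 1.0, 30.0], size=(100, 3))

# Normalize each component to zero mean and unit variance.
x_norm = (x - x.mean(axis=0)) / x.std(axis=0)
```

After this step every component contributes on the same scale, so no input dominates the training merely because of its units.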
4. Pre-processing Data (2)
- Dimensionality reduction, or feature selection, will in general lead to information loss.
- A good dimensionality-reduction or feature-selection strategy should, on the one hand, retain as much relevant information as possible and, on the other hand, neglect the unimportant components.
- In general, dimensionality reduction or feature selection is task-dependent, but there are some unsupervised strategies which tend to work well in many cases.
5. A general idea for unsupervised feature selection
- Intuitively, if an input component x_i has large variability among examples, this component tends to be important, in the sense that it is more likely to determine the output value.
- On the other hand, if x_i takes roughly the same or similar values for all examples, then this component is unlikely to have a significant influence on the outputs, and hence can be neglected.
- In other words, it is sensible to choose the components with large variability as important features if we do not have other prior knowledge.
- Mathematically, we may measure the variability of an input component by its variance, Var(x_i) = (1/N) Σ_n (x_i^n − ⟨x_i⟩)².
- Principal component analysis is one strategy to implement this idea.
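A minimal NumPy sketch of this idea (the data here are synthetic, chosen so that one component varies a lot and the other is nearly constant): the per-component variance singles out the important one.

```python
import numpy as np

rng = np.random.default_rng(1)
# Component 0 varies a lot across examples; component 1 is almost constant.
x = np.column_stack([rng.normal(0.0, 2.0, 500),
                     rng.normal(0.0, 0.05, 500)])

variances = x.var(axis=0)           # variance of each input component
important = int(np.argmax(variances))  # the component with large variability
```

With no other prior knowledge, component 0 would be kept as the important feature and component 1 neglected.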
6. An example
The important feature, or the principal component, in this example is component 1.
7. Another example
- In general, the principal component is unlikely to be just one of the input components, but rather a linear combination of them.
- In this example, the principal component is the direction of u_1, along which the data points vary a lot.
- The important feature of the data is then given by the projection of the data on the principal component, which in this example is the projection on u_1. The projection on the other direction can be ignored.
8. A little bit of linear algebra (1)
- A data point x in an M-dimensional space can always be expressed as a linear combination of M independent M-dimensional vectors: x = Σ_{i=1}^{M} z_i u_i.
- We may choose the basis vectors to be orthogonal, i.e., u_i^T u_j = 0 for i ≠ j, or furthermore to be orthonormal if one more condition is satisfied: u_i^T u_j = δ_ij, i.e., each basis vector also has unit length.
9. A little bit of linear algebra (2)
- The projection of x on an orthonormal basis vector u_i is given by z_i = u_i^T x; its squared value can be written as z_i² = (u_i^T x)² = u_i^T x x^T u_i.
- Eigenvector and eigenvalue: u is an eigenvector of a matrix S, with eigenvalue λ, if S u = λ u.
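The eigenvector relation S u = λ u can be checked numerically; a small NumPy sketch (the symmetric matrix here is an arbitrary example, not from the slides):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                     # an arbitrary symmetric matrix

# eigh is the eigen-solver for symmetric matrices; eigenvalues come out ascending.
eigenvalues, eigenvectors = np.linalg.eigh(S)
u = eigenvectors[:, 1]                         # eigenvector of the largest eigenvalue
lam = eigenvalues[1]
# S @ u equals lam * u: the defining property of an eigenvector.
```

Covariance matrices are symmetric, which is why a symmetric example is the relevant case for PCA.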
10. Principal Component Analysis (1)
- For each M-dimensional input vector, we project it on a set of M orthonormal basis vectors. These M projections constitute the new representation of the data point (a kind of coordinate transformation).
- The variance of the projections on one basis direction u_j can be written as (1/N) Σ_n (u_j^T x^n − u_j^T x̄)² = u_j^T S u_j, where the matrix S = (1/N) Σ_n (x^n − x̄)(x^n − x̄)^T is called the covariance matrix of the data points.
11. Principal Component Analysis (2)
- If the projection direction u_j is an eigenvector of the covariance matrix, S u_j = λ_j u_j, then the variance of the projections is u_j^T S u_j = λ_j.
- Thus, the j-th eigenvalue of the covariance matrix of the data quantifies the variability of the projections of the data on the j-th eigenvector. To reduce the input dimensionality to a number d, we can first list the eigenvectors in order of the magnitude of the eigenvalues and choose the first d eigenvectors as principal components. The projections of the data on these principal components form the new representation of the input.
- PCA works well for Gaussian-distributed data, since in this case the eigenvalues of the covariance matrix indeed quantify the variability of the data along the corresponding directions.
12. The algorithm of PCA
1. Compute the covariance matrix of the input data.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvectors in order of the magnitude of their eigenvalues. Take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.
4. Project the input data onto the principal components; these projections form the new representation of the input data.
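The four steps above can be sketched in Python/NumPy as follows (a bare-bones illustration with made-up random data, separate from the Matlab demo in the later slides):

```python
import numpy as np

def pca(x, d):
    """Reduce the N x M data matrix x to d dimensions."""
    x_mean = x.mean(axis=0)
    xc = x - x_mean                                # center the data
    S = xc.T @ xc / x.shape[0]                     # step 1: covariance matrix (M x M)
    eigenvalues, eigenvectors = np.linalg.eigh(S)  # step 2: eigen-decomposition
    order = np.argsort(eigenvalues)[::-1]          # step 3: sort by eigenvalue magnitude
    components = eigenvectors[:, order[:d]]        # step 3: first d principal components
    return xc @ components                         # step 4: project the data

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 5))
z = pca(x, 2)
```

By construction the first column of z has at least as much variance as the second, matching the eigenvalue ordering.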
13. Cases where PCA fails (1)
- PCA projects the data onto a set of orthogonal vectors (principal components). This restricts the new input components to be linear combinations of the old ones.
- In cases, however, where the intrinsic freedom of the data cannot be expressed as a linear combination of the input components, PCA will overestimate the input dimensionality (Matlab demo).
- PCA cannot find the non-linear intrinsic dimension of the data, such as the angle θ in this example; instead it will find two components of equal importance.
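This failure mode can be reproduced with data lying on a circle, whose single intrinsic degree of freedom is the angle θ; a NumPy sketch (synthetic data, analogous to the Matlab demo):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)          # one intrinsic degree of freedom
x = np.column_stack([np.cos(theta), np.sin(theta)])  # 2-D data on the unit circle

S = np.cov(x.T)                                      # covariance matrix of the data
eigenvalues = np.linalg.eigvalsh(S)
# Both eigenvalues come out close to 0.5: PCA reports two components of
# roughly equal importance and cannot recover the single angle theta.
```

Neither direction can be discarded, even though one number (the angle) fully describes each data point.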
14. Cases where PCA fails (2)
- In cases where components with small variability really matter, PCA will make mistakes, due to its unsupervised nature.
- In this example, if we only consider the projections of the two classes of data on the principal component as input, the two classes become indistinguishable.
15. Example: Face Recognition
- Fast and accurate face recognition is a prerequisite ability for animals to survive in a natural environment, e.g., to identify enemies or prey.
- Over the long history of evolution, the human brain has developed a highly efficient strategy for face recognition.
- Obviously, understanding the mechanism of face recognition has important applications.
16. Characteristics of Face Images
- The dimensionality of a face image is extremely high. Suppose we describe a face image by an M×M two-dimensional grid; the dimensionality of the input vector is then M². If, for example, M = 256 is needed to achieve a reasonable precision of the image, the dimensionality of the input vector is 65,536!
- Face images have structure. If we put all face images in an M²-dimensional space, they will not fill the whole space, but will instead cover only a very limited volume. This implies that many input components are correlated, and the dimensionality can be significantly reduced.
17. What are the salient features of face images?
- If the task is to distinguish faces from non-faces, the salient features are apparently the eyes, nose, mouth, and so on.
- If the task is to identify one particular face from a set of face images, some global features may be more efficient.
- A hypothesis on object recognition in neural systems:
  -- Analysis by parts: non-face objects, words
  -- Holistic analysis: face images
18. Eigenfaces: PCA for face images
- Use PCA to identify non-trivial, global features of face images.
- Each face image is represented as an M×1 column vector F (M = 6400 in our example).
- Calculate the average face.
- Calculate the covariance matrix of the data (6400×6400).
- Choose the d eigenvectors with the d largest eigenvalues as the principal components, which we call the eigenfaces.
- Project the face images on the eigenfaces, obtaining new representations of the data with reduced dimensionality.
19. A little trick in mathematics
- The dimensionality of the covariance matrix S is M×M. If M is large, as in this example, it is extremely difficult to calculate the eigenvalues and eigenvectors.
- Writing the mean-subtracted data as an M×N matrix A (one column per image), the covariance matrix is S = (1/N) A Aᵀ. Thus, if N is smaller than M, as is the case in this example (N = 100), we can first calculate the eigenvectors and eigenvalues of the N×N matrix AᵀA, and later transform the obtained eigenvectors back into the eigenfaces: if AᵀA v = λ v, then A v is an eigenvector of A Aᵀ with the same eigenvalue.
20. Examples of face images
21. The average face
22. The spectrum of eigenvalues
23. The first nine eigenfaces
24. The code for Matlab demo (1)
N = 50;                        % the number of data points
x = zeros(N,2);                % the input vector
x(:,1) = randn(N,1);           % one input component with large variability
x(:,2) = randn(N,1)*0.2;       % the other input component with small variability
plot(x(:,1),x(:,2),'o','MarkerSize',10)
xlim([-2 2]); ylim([-2 2]);
pause
% Rotate the data patch
theta = rand(1)*2*pi;          % an angle to rotate the data
rotation_matrix = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation matrix
x = x*rotation_matrix;
plot(x(:,1),x(:,2),'o','MarkerSize',10)
xlim([-2 2]); ylim([-2 2]);
hold on
25. The code for Matlab demo (2)
% Compute the covariance matrix
x_mean = mean(x);
cov_matrix = zeros(2,2);
for n = 1:N
    cov_matrix = cov_matrix + (x(n,:)-x_mean)'*(x(n,:)-x_mean);
end
cov_matrix = cov_matrix/N;
[V,D] = eig(cov_matrix);       % compute the eigenvalues and eigenvectors
% Decide the principal component
principle_component = zeros(2,1);
if D(1,1) > D(2,2)
    principle_component = V(:,1);
    ignore_component = V(:,2);
else
    principle_component = V(:,2);
    ignore_component = V(:,1);
end
26. The code for Matlab demo (3)
% Display the results (this part of the code is not efficient)
principle_direction = zeros(2,2);
principle_direction(1,:) = [0 0];
principle_direction(2,:) = [2*principle_component(1,1) 2*principle_component(2,1)];
plot(principle_direction(:,1),principle_direction(:,2),'r-')
hold on
ignore_direction = zeros(2,2);
ignore_direction(1,:) = [0 0];
ignore_direction(2,:) = [0.5*ignore_component(1,1) 0.5*ignore_component(2,1)];
plot(ignore_direction(:,1),ignore_direction(:,2),'r-')