Principal Component Analysis (PowerPoint presentation transcript)
1
Principal Component Analysis
2
The general framework for network training
(Processing pipeline: Input data -> Pre-processing -> Network Optimization -> Post-processing -> Output data)

In practice, pre-processing the data is one of the main issues in network training; post-processing often only involves passively re-mapping the network output into its raw form.
3
Pre-processing Data (1)
  • Input Normalization: normalize the data so that all components are on the same scale and have zero mean (a minimal sketch follows this list).
  • Dimensionality Reduction: because of the curse of dimensionality, it is essential to first reduce the dimensionality of high-dimensional data.
  • Feature Selection: we may use prior knowledge to choose the important features as inputs.

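A minimal MATLAB sketch of the normalization step (the names X and X_norm are illustrative, not from the slides); each column of X is one input component, shifted to zero mean and rescaled to unit standard deviation:

% X is an N-by-M data matrix: N examples, M input components (assumed given)
X_mean = mean(X, 1);              % per-component mean
X_std  = std(X, 0, 1);            % per-component standard deviation (assumed non-zero)
X_norm = (X - X_mean) ./ X_std;   % every component now has zero mean and the same scale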
4
Pre-processing Data (2)
  • Dimensionality Reduction, or Feature Selection, will in general lead to information loss.
  • A good Dimensionality Reduction or Feature Selection strategy should, on one hand, retain as much relevant information as possible and, on the other hand, neglect the unimportant components.
  • In general, Dimensionality Reduction or Feature Selection is task-dependent, but there are unsupervised strategies that tend to work well in many cases.

5
A general idea for unsupervised feature selection
  • Intuitively, if an input component x_i has large variability among the examples, this component tends to be important, in the sense that it is more likely to determine the output value.
  • On the other hand, if x_i takes roughly the same or similar values for all examples, then this component is unlikely to have a significant influence on the outputs, and hence can be neglected.
  • In other words, if we have no other prior knowledge, it is sensible to choose the components with large variability as the important features.

Mathematically, we may measure the variability of an input component by its variance,

    Var(x_i) = (1/N) * sum_n ( x_i^(n) - mean(x_i) )^2 .

  • Principal component analysis is one strategy to implement this idea (see the sketch below).

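A minimal MATLAB sketch of this variance criterion (illustrative names; X is an N-by-M data matrix as above, and d is the desired number of features):

component_var = var(X, 0, 1);               % variance of each of the M input components
[~, order]    = sort(component_var, 'descend');
X_reduced     = X(:, order(1:d));           % keep only the d most variable components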
6
An example
The important feature, or the principal component, in this example is component 1.
7
Another example
  • In general, the principal component is unlikely to be just one of the input components; rather, it is a linear combination of them.
  • In this example, the principal component is the direction of u_1, along which the data points vary a lot.
  • The important feature of the data is then given by the projection of the data on the principal component, which in this example is u_1^T x. The projection on the other direction can be ignored.

8
A little bit of linear algebra (1)
  • A data point x in an M-dimensional space can always be expressed as a linear combination of M independent M-dimensional vectors, x = sum_i c_i u_i.
  • We may choose the basis vectors to be orthogonal, i.e.,

    u_i^T u_j = 0  for i ≠ j,

or furthermore to be orthonormal if one more condition is satisfied,

    u_i^T u_i = 1,  i.e.  u_i^T u_j = delta_ij.
9
A little bit of linear algebra (2)
  • The projection of x on an orthonormal basis vector u_i is given by

    z_i = u_i^T x ;

its squared value can be written as

    z_i^2 = u_i^T x x^T u_i .

  • Eigenvector and Eigenvalue: a vector u is an eigenvector of a matrix S, with eigenvalue lambda, if S u = lambda u (see the sketch below).

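A minimal MATLAB sketch of these two ingredients (illustrative names; x is an M-by-1 data point, u an M-by-1 basis vector, S an M-by-M matrix, all assumed given):

u = u / norm(u);                 % make the basis vector unit length (orthonormal)
z = u' * x;                      % projection of x on u
z_squared = u' * (x * x') * u;   % its squared value, in the quadratic form used on the next slide

[V, D] = eig(S);                 % columns of V are eigenvectors of S,
                                 % the diagonal of D holds the corresponding eigenvalues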
10
Principal Component Analysis (1)
  • For each M-dimensional input vector, we project it on a set of M orthonormal basis vectors. These M projections constitute the new representation of the data point (a kind of coordinate transformation).
  • The variance of the projections on one basis direction u can be written as

    Var(u^T x) = u^T S u ,

where the matrix S is called the covariance matrix of the data points,

    S = (1/N) * sum_n ( x^(n) - x_mean ) ( x^(n) - x_mean )^T .
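A minimal MATLAB sketch of these two quantities (illustrative names; X is an N-by-M data matrix with one example per row, u a unit-length M-by-1 direction, both assumed given):

X_mean = mean(X, 1);                  % 1-by-M mean vector
Xc     = X - X_mean;                  % centered data
S      = (Xc' * Xc) / size(X, 1);     % M-by-M covariance matrix of the data points
proj_var = u' * S * u;                % variance of the projections on the direction u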
11
Principal Component Analysis (2)
  • If the projection direction u_j is the jth eigenvector of the covariance matrix, then

    u_j^T S u_j = lambda_j .

  • Thus, the jth eigenvalue of the covariance matrix of the data quantifies the variability of the projections of the data on the jth eigenvector. To reduce the input dimensionality to a number d, we can first list the eigenvectors in order of the magnitude of their eigenvalues and choose the first d eigenvectors as principal components. The projections of the data on these principal components form the new representation of the input.
  • PCA works well for Gaussian-distributed data, since in this case the eigenvalues of the covariance matrix indeed quantify the variability of the data along the corresponding directions.

12
The algorithm of PCA
1. Compute the covariance matrix of the input data.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvectors in order of the magnitude of their eigenvalues. Take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.
4. Project the input data onto the principal components; these projections form the new representation of the input data (a sketch of these steps follows below).
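A minimal MATLAB sketch of these four steps (illustrative names, not code from the slides; X is an N-by-M data matrix with one example per row, and d is the target dimensionality):

% 1. Covariance matrix of the input data
X_mean = mean(X, 1);
Xc     = X - X_mean;
S      = (Xc' * Xc) / size(X, 1);

% 2. Eigenvalues and eigenvectors of the covariance matrix
[V, D] = eig(S);

% 3. Order the eigenvectors by eigenvalue magnitude and keep the first d
[~, order] = sort(diag(D), 'descend');
U_d = V(:, order(1:d));               % M-by-d matrix of principal components

% 4. Project the (centered) input data onto the principal components
Z = Xc * U_d;                         % N-by-d new representation of the input data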
13
Cases when PCA fails (1)
  • PCA projects data onto a set of orthogonal vectors (principal components). This restricts the new input components to be linear combinations of the old ones.
  • In cases, however, when the intrinsic degrees of freedom of the data cannot be expressed as a linear combination of the input components, PCA will overestimate the input dimensionality (Matlab demo).

PCA cannot find the non-linear intrinsic dimension of the data, such as the angle theta in this example; instead it will find two components with equal importance (see the sketch below).
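A minimal MATLAB sketch of this failure mode (illustrative names): the data have a single intrinsic degree of freedom, the angle theta, yet both eigenvalues of the covariance matrix come out close to 0.5, so PCA cannot discard either direction.

N     = 200;
theta = rand(N, 1) * 2 * pi;          % the single (non-linear) degree of freedom
X     = [cos(theta), sin(theta)];     % 2-D data points lying on a circle

Xc = X - mean(X, 1);
S  = (Xc' * Xc) / N;
eig(S)                                % two nearly equal eigenvalues: PCA reports
                                      % two equally important components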
14
Cases when PCA fails (2)
  • In cases when components with small variability really matter, PCA will make mistakes, due to its unsupervised nature.

In this example, if we keep only the projections of the two classes of data on the principal component as input, the two classes become indistinguishable (see the sketch below).
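A minimal MATLAB sketch of this situation (illustrative names): two classes that differ only along a low-variance direction; keeping only the first principal component makes them overlap.

N  = 200;
x1 = [2*randn(N,1), 0.1*randn(N,1) + 0.5];   % class 1: large spread along dim 1, centred at +0.5 on dim 2
x2 = [2*randn(N,1), 0.1*randn(N,1) - 0.5];   % class 2: same spread, centred at -0.5 on dim 2
X  = [x1; x2];

Xc = X - mean(X, 1);
S  = (Xc' * Xc) / size(X, 1);
[V, D]     = eig(S);
[~, order] = sort(diag(D), 'descend');
u1 = V(:, order(1));      % first principal component: essentially dimension 1
z  = Xc * u1;             % 1-D projections: the separating offset along dim 2 is
                          % discarded, so the two classes become indistinguishable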
15
Example: Face Recognition
  • Fast and accurate face recognition is a prerequisite ability for animals to survive in a natural environment, e.g., to identify enemies or prey.
  • During the long history of evolution, the human brain has developed a highly efficient strategy for face recognition.
  • Obviously, understanding the mechanism of face recognition has important applications.

16
Characteristics of Face Images
  • The dimensionality of a face image is extremely high.

If we describe a face image by an MxM two-dimensional grid of pixels, the dimensionality of the input vector is M^2. If, for example, M = 256 is needed to achieve a reasonable precision of the image, the dimensionality of the input vector is 65,536!
  • Face images have structure.

If we put all face images in an M^2-dimensional space, they will not fill the whole space, but will instead cover only a very limited volume. This implies that many input components are correlated, and the dimensionality can be significantly reduced.
17
What are the salient features of face images?
  • If the task is to distinguish faces from non-faces, the salient features are apparently the eyes, nose, mouth and so on.
  • If the task is to identify one particular face from a set of face images, some global features may be more efficient.
  • A hypothesis on object recognition in neural systems:
    -- Analysis by parts: non-face objects, words
    -- Holistic analysis: face images
18
Eigenface: PCA for face images
  • Use PCA to identify non-trivial, global features of face images.
  • Each face image is represented as an Mx1 column vector F (M = 6400 in our example).
  • Calculate the average face.
  • Calculate the covariance matrix of the data (6400x6400).
  • Choose the d eigenvectors with the d largest eigenvalues as the principal components, which we call the eigenfaces.
  • Project the face images on the eigenfaces, obtaining new representations of the data with reduced dimensionality (see the sketch below).

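A minimal MATLAB sketch of this eigenface pipeline (illustrative names; faces is an M-by-N matrix whose N columns are the vectorized face images, here M = 6400, and d is the number of eigenfaces to keep):

[M, N]   = size(faces);
avg_face = mean(faces, 2);                 % M-by-1 average face
A        = faces - avg_face;               % centered face images, one per column

S          = (A * A') / N;                 % M-by-M covariance matrix (6400 x 6400)
[V, D]     = eig(S);
[~, order] = sort(diag(D), 'descend');
eigenfaces = V(:, order(1:d));             % M-by-d eigenfaces

coeffs   = eigenfaces' * A;                % d-by-N reduced representation of the faces

The 6400-by-6400 eigendecomposition above is expensive; the trick on the next slide avoids it.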
19
A little trick in mathematics
  • The dimensionality of the covariance matrix S is MxM. If M is large, as in this example, it is extremely difficult to calculate its eigenvalues and eigenvectors directly.
  • A trick: write S = (1/N) A A^T, where A is the M-by-N matrix whose columns are the centered face images. If v is an eigenvector of the N-by-N matrix A^T A, then A v is an eigenvector of A A^T with the same eigenvalue.
  • Thus, if N is smaller than M, as is the case in this example (N = 100), we can first calculate the eigenvectors and eigenvalues of A^T A, and then re-transform the obtained eigenvectors into the eigenfaces (see the sketch below).

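A minimal MATLAB sketch of this trick (illustrative names, consistent with the sketch above: A is the M-by-N matrix of centered face images with N < M, and d is the number of eigenfaces to keep):

small_S    = (A' * A) / N;                       % N-by-N matrix: cheap to diagonalize
[V, D]     = eig(small_S);
[~, order] = sort(diag(D), 'descend');

eigenfaces = A * V(:, order(1:d));               % map the small eigenvectors back to M-by-1 eigenfaces
eigenfaces = eigenfaces ./ vecnorm(eigenfaces);  % re-normalize each eigenface to unit length

This works because, if v is an eigenvector of A^T A, then A v is an eigenvector of A A^T (and hence of S) with the same eigenvalue.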
20
Examples of face images
21
The average face
22
The spectrum of eigenvalues
23
The first nine eigenfaces
24
The code for Matlab demo (1)
  • preparing data

N = 50;                              % the number of data points
x = zeros(N,2);                      % the input vector
x(:,1) = randn(N,1);                 % one input component with large variability
x(:,2) = randn(N,1)*0.2;             % the other input component with small variability
plot(x(:,1), x(:,2), 'o', 'MarkerSize', 10)
xlim([-2 2])
ylim([-2 2])
pause

% Rotate the data patch
theta = rand(1)*2*pi;                % an angle to rotate the data
rotation_matrix = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotation matrix
x = x*rotation_matrix;
plot(x(:,1), x(:,2), 'o', 'MarkerSize', 10)
xlim([-2 2])
ylim([-2 2])
hold on
25
The code for Matlab demo (2)
  • PCA

% Compute the covariance matrix
x_mean = mean(x);
cov_matrix = zeros(2,2);
for n = 1:N
    cov_matrix = cov_matrix + (x(n,:) - x_mean)'*(x(n,:) - x_mean);
end
cov_matrix = cov_matrix/N;
[V, D] = eig(cov_matrix);            % Compute the eigenvalues and eigenvectors

% Decide the principle component
principle_component = zeros(2,1);
if D(1,1) > D(2,2)
    principle_component = V(:,1);
    ignore_component = V(:,2);
else
    principle_component = V(:,2);
    ignore_component = V(:,1);
end
26
The code for Matlab demo (3)
  • Display the results

% Display the results (this part of the code is not efficient)
principle_direction = zeros(2,2);
principle_direction(1,:) = [0 0];
principle_direction(2,:) = [2*principle_component(1,1) 2*principle_component(2,1)];
plot(principle_direction(:,1), principle_direction(:,2), 'r-')
hold on
ignore_direction = zeros(2,2);
ignore_direction(1,:) = [0 0];
ignore_direction(2,:) = [0.5*ignore_component(1,1) 0.5*ignore_component(2,1)];
plot(ignore_direction(:,1), ignore_direction(:,2), 'r-')