Title: Principal Component Analysis
1. Principal Component Analysis
2. The general framework for network training
[Diagram: Input data → Pre-processing → Network Optimization → Post-processing → Output data]
In practice, pre-processing the data is one of the main issues in network training; post-processing often only involves passively re-mapping the network output into its raw form.
3. Pre-processing Data (1)
- Input normalization: normalize the data so that the components are on the same scale and have zero mean.
- Dimensionality reduction: because of the curse of dimensionality, for high-dimensional data it is essential to reduce the dimensionality first.
- Feature selection: we may use prior knowledge to choose the important features as inputs.
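As a sketch of the normalization step (in Python/NumPy here, rather than the Matlab used in the demo later in these slides), each input component is shifted to zero mean and rescaled to unit variance; the data below are made up for illustration:

```python
import numpy as np

# Toy data: 100 examples, 3 input components on very different scales.
rng = np.random.default_rng(0)
x = rng.normal(loc=[5.0, -2.0, 100.0], scale=[0.1, 1.0, 30.0], size=(100, 3))

# Normalize each component to zero mean and unit variance.
x_norm = (x - x.mean(axis=0)) / x.std(axis=0)
```

After this step every component contributes on the same scale, so no input dominates the training merely because of its units.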
4. Pre-processing Data (2)
- Dimensionality reduction, or feature selection, will in general lead to information loss.
- A good dimensionality-reduction or feature-selection strategy should, on the one hand, retain as much relevant information as possible and, on the other hand, neglect the unimportant components.
- In general, dimensionality reduction or feature selection is task-dependent, but there are some unsupervised strategies which tend to work well in many cases.
5. A general idea for unsupervised feature selection
- Intuitively, if an input component x_i has large variability among examples, this component tends to be important, in the sense that it is more likely to determine the output value.
- On the other hand, if x_i takes roughly the same or similar values for all examples, then this component is unlikely to have a significant influence on the outputs, and hence can be neglected.
- In other words, it is sensible to choose the components with large variability as important features if we do not have other prior knowledge.
- Mathematically, we may measure the variability of an input component by its variance, Var(x_i) = (1/N) Σ_n (x_i^n − ⟨x_i⟩)².
- Principal component analysis is one strategy to implement this idea.
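A minimal NumPy sketch of this idea (the data here are synthetic, chosen so that one component varies a lot and the other is nearly constant): the per-component variance singles out the important one.

```python
import numpy as np

rng = np.random.default_rng(1)
# Component 0 varies a lot across examples; component 1 is almost constant.
x = np.column_stack([rng.normal(0.0, 2.0, 500),
                     rng.normal(0.0, 0.05, 500)])

variances = x.var(axis=0)           # variance of each input component
important = int(np.argmax(variances))  # the component with large variability
```

With no other prior knowledge, component 0 would be kept as the important feature and component 1 neglected.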
6. An example
The important feature, or the principal component, in this example is component 1.
7. Another example
- In general, the principal component is unlikely to be just one of the input components, but rather a linear combination of them.
- In this example, the principal component is the direction of u_1, along which the data points vary a lot.
- The important feature of the data is then given by the projection of the data on the principal component, which in this example is the projection on u_1. The projection on the other direction can be ignored.
8. A little bit of linear algebra (1)
- A data point x in an M-dimensional space can always be expressed as a linear combination of M independent M-dimensional vectors: x = Σ_{i=1}^{M} z_i u_i.
- We may choose the basis vectors to be orthogonal, i.e., u_i^T u_j = 0 for i ≠ j, or furthermore to be orthonormal if one more condition is satisfied: u_i^T u_j = δ_ij, i.e., each basis vector also has unit length.
9. A little bit of linear algebra (2)
- The projection of x on an orthonormal basis vector u_i is given by z_i = u_i^T x; its squared value can be written as z_i² = (u_i^T x)² = u_i^T x x^T u_i.
- Eigenvector and eigenvalue: u is an eigenvector of a matrix S, with eigenvalue λ, if S u = λ u.
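The eigenvector relation S u = λ u can be checked numerically; a small NumPy sketch (the symmetric matrix here is an arbitrary example, not from the slides):

```python
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                     # an arbitrary symmetric matrix

# eigh is the eigen-solver for symmetric matrices; eigenvalues come out ascending.
eigenvalues, eigenvectors = np.linalg.eigh(S)
u = eigenvectors[:, 1]                         # eigenvector of the largest eigenvalue
lam = eigenvalues[1]
# S @ u equals lam * u: the defining property of an eigenvector.
```

Covariance matrices are symmetric, which is why a symmetric example is the relevant case for PCA.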
10. Principal Component Analysis (1)
- For each M-dimensional input vector, we project it on a set of M orthonormal basis vectors. These M projections constitute the new representation of the data point (a kind of coordinate transformation).
- The variance of the projections on one basis direction u_j can be written as (1/N) Σ_n (u_j^T x^n − u_j^T x̄)² = u_j^T S u_j, where the matrix S = (1/N) Σ_n (x^n − x̄)(x^n − x̄)^T is called the covariance matrix of the data points.
11. Principal Component Analysis (2)
- If the projection direction u_j is an eigenvector of the covariance matrix, S u_j = λ_j u_j, then the variance of the projections is u_j^T S u_j = λ_j.
- Thus, the j-th eigenvalue of the covariance matrix of the data quantifies the variability of the projections of the data on the j-th eigenvector. To reduce the input dimensionality to a number d, we can first list the eigenvectors in order of the magnitude of the eigenvalues and choose the first d eigenvectors as principal components. The projections of the data on these principal components form the new representation of the input.
- PCA works well for Gaussian-distributed data, since in this case the eigenvalues of the covariance matrix indeed quantify the variability of the data along the corresponding directions.
12. The algorithm of PCA
1. Compute the covariance matrix of the input data.
2. Compute the eigenvalues and eigenvectors of the covariance matrix.
3. Arrange the eigenvectors in order of the magnitude of their eigenvalues. Take the first d eigenvectors as principal components if the input dimensionality is to be reduced to d.
4. Project the input data onto the principal components; these projections form the new representation of the input data.
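The four steps above can be sketched in Python/NumPy as follows (a bare-bones illustration with made-up random data, separate from the Matlab demo in the later slides):

```python
import numpy as np

def pca(x, d):
    """Reduce the N x M data matrix x to d dimensions."""
    x_mean = x.mean(axis=0)
    xc = x - x_mean                                # center the data
    S = xc.T @ xc / x.shape[0]                     # step 1: covariance matrix (M x M)
    eigenvalues, eigenvectors = np.linalg.eigh(S)  # step 2: eigen-decomposition
    order = np.argsort(eigenvalues)[::-1]          # step 3: sort by eigenvalue magnitude
    components = eigenvectors[:, order[:d]]        # step 3: first d principal components
    return xc @ components                         # step 4: project the data

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 5))
z = pca(x, 2)
```

By construction the first column of z has at least as much variance as the second, matching the eigenvalue ordering.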
13. Cases where PCA fails (1)
- PCA projects the data onto a set of orthogonal vectors (principal components). This restricts the new input components to be linear combinations of the old ones.
- In cases, however, where the intrinsic freedom of the data cannot be expressed as a linear combination of the input components, PCA will overestimate the input dimensionality (Matlab demo).
- PCA cannot find the non-linear intrinsic dimension of the data, such as the angle θ in this example; instead it will find two components of equal importance.
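This failure mode can be reproduced with data lying on a circle, whose single intrinsic degree of freedom is the angle θ; a NumPy sketch (synthetic data, analogous to the Matlab demo):

```python
import numpy as np

rng = np.random.default_rng(3)
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)          # one intrinsic degree of freedom
x = np.column_stack([np.cos(theta), np.sin(theta)])  # 2-D data on the unit circle

S = np.cov(x.T)                                      # covariance matrix of the data
eigenvalues = np.linalg.eigvalsh(S)
# Both eigenvalues come out close to 0.5: PCA reports two components of
# roughly equal importance and cannot recover the single angle theta.
```

Neither direction can be discarded, even though one number (the angle) fully describes each data point.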
14. Cases where PCA fails (2)
- In cases where components with small variability really matter, PCA will make mistakes, due to its unsupervised nature.
- In this example, if we only consider the projections of the two classes of data on the principal component as input, the two classes become indistinguishable.
15. Example: Face Recognition
- Fast and accurate face recognition is a prerequisite ability for animals to survive in a natural environment, e.g., to identify enemies or prey.
- Over the long history of evolution, the human brain has developed a highly efficient strategy for face recognition.
- Obviously, understanding the mechanism of face recognition has important applications.
16. Characteristics of Face Images
- The dimensionality of a face image is extremely high. Suppose we describe a face image by an M×M two-dimensional grid; the dimensionality of the input vector is then M². If, for example, M = 256 is needed to achieve a reasonable precision of the image, the dimensionality of the input vector is 65,536!
- Face images have structure. If we put all face images in an M²-dimensional space, they will not fill the whole space, but will instead cover only a very limited volume. This implies that many input components are correlated, and the dimensionality can be significantly reduced.
17. What are the salient features of face images?
- If the task is to distinguish faces from non-faces, the salient features are apparently the eyes, nose, mouth, and so on.
- If the task is to identify one particular face from a set of face images, some global features may be more efficient.
- A hypothesis on object recognition in neural systems:
  -- Analysis by parts: non-face objects, words
  -- Holistic analysis: face images
18. Eigenfaces: PCA for face images
- Use PCA to identify non-trivial, global features of face images.
- Each face image is represented as an M×1 column vector F (M = 6400 in our example).
- Calculate the average face.
- Calculate the covariance matrix of the data (6400×6400).
- Choose the d eigenvectors with the d largest eigenvalues as the principal components, which we call the eigenfaces.
- Project the face images on the eigenfaces, obtaining new representations of the data with reduced dimensionality.
19. A little trick in mathematics
- The dimensionality of the covariance matrix S is M×M. If M is large, as in this example, it is extremely difficult to calculate the eigenvalues and eigenvectors.
- Writing the mean-subtracted data as an M×N matrix A (one column per image), the covariance matrix is S = (1/N) A Aᵀ. Thus, if N is smaller than M, as is the case in this example (N = 100), we can first calculate the eigenvectors and eigenvalues of the N×N matrix AᵀA, and later transform the obtained eigenvectors back into the eigenfaces: if AᵀA v = λ v, then A v is an eigenvector of A Aᵀ with the same eigenvalue.
20. Examples of face images
21. The average face
22. The spectrum of eigenvalues
23. The first nine eigenfaces
24. The code for Matlab demo (1)
N = 50;                        % the number of data points
x = zeros(N,2);                % the input vector
x(:,1) = randn(N,1);           % one input component with large variability
x(:,2) = randn(N,1)*0.2;       % the other input component with small variability
plot(x(:,1),x(:,2),'o','MarkerSize',10)
xlim([-2 2]); ylim([-2 2]);
pause
% Rotate the data patch
theta = rand(1)*2*pi;          % an angle to rotate the data
rotation_matrix = [cos(theta) -sin(theta); sin(theta) cos(theta)];  % rotation matrix
x = x*rotation_matrix;
plot(x(:,1),x(:,2),'o','MarkerSize',10)
xlim([-2 2]); ylim([-2 2]);
hold on
25. The code for Matlab demo (2)
% Compute the covariance matrix
x_mean = mean(x);
cov_matrix = zeros(2,2);
for n = 1:N
    cov_matrix = cov_matrix + (x(n,:)-x_mean)'*(x(n,:)-x_mean);
end
cov_matrix = cov_matrix/N;
[V,D] = eig(cov_matrix);       % compute the eigenvalues and eigenvectors
% Decide the principal component
principle_component = zeros(2,1);
if D(1,1) > D(2,2)
    principle_component = V(:,1);
    ignore_component = V(:,2);
else
    principle_component = V(:,2);
    ignore_component = V(:,1);
end
26. The code for Matlab demo (3)
% Display the results (this part of the code is not efficient)
principle_direction = zeros(2,2);
principle_direction(1,:) = [0 0];
principle_direction(2,:) = [2*principle_component(1,1) 2*principle_component(2,1)];
plot(principle_direction(:,1),principle_direction(:,2),'r-')
hold on
ignore_direction = zeros(2,2);
ignore_direction(1,:) = [0 0];
ignore_direction(2,:) = [0.5*ignore_component(1,1) 0.5*ignore_component(2,1)];
plot(ignore_direction(:,1),ignore_direction(:,2),'r-')