Title: Eigenvalue Analysis in Pattern Recognition
1Eigenvalue Analysis in Pattern Recognition
- By
- Dr. M. Asmat Ullah Khan
- COMSATS Institute of Information Technology,
- Abbottabad
10-14 MULTI SPECTRAL IMAGE COMPRESSION
15Principal Component Analysis (PCA)
- Pattern recognition in high-dimensional spaces
- Problems arise when performing recognition in a high-dimensional space (curse of dimensionality).
- Significant improvements can be achieved by first mapping the data into a lower-dimensional sub-space.
- The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset.
16Principal Component Analysis (PCA)
- PCA allows us to compute a linear transformation
that maps data from a high dimensional space to a
lower dimensional sub-space.
17Principal Component Analysis (PCA)
- Lower dimensionality basis
- Approximate vectors by finding a basis in an
appropriate lower dimensional space.
(1) Higher-dimensional space representation
(2) Lower-dimensional space representation
18Principal Component Analysis (PCA)
19Principal Component Analysis (PCA)
- Dimensionality reduction implies information loss!
- We want to preserve as much information as possible.
- How do we determine the best lower-dimensional sub-space?
20Principal Component Analysis (PCA)
- Suppose x1, x2, ..., xM are N × 1 vectors
21Principal Component Analysis (PCA)
22Principal Component Analysis (PCA)
- Linear transformation implied by PCA
- The linear transformation R^N → R^K that performs the dimensionality reduction is
23Principal Component Analysis (PCA)
- PCA projects the data along the directions where the data varies the most.
- These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.
- The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.
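As a concrete illustration of the point above, here is a minimal NumPy sketch (not from the original slides) that finds the principal directions as eigenvectors of the sample covariance matrix; the names X, W, and Y are illustrative.

```python
import numpy as np

def pca(X, K):
    """Top-K principal directions of X, an (M, N) array of M samples."""
    mean = X.mean(axis=0)
    Xc = X - mean                             # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # N x N sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance is symmetric
    order = np.argsort(eigvals)[::-1]         # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :K]                        # N x K basis of the sub-space
    Y = Xc @ W                                # M x K low-dimensional coordinates
    return W, Y, eigvals
```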
24Principal Component Analysis (PCA)
- How to choose the principal components?
- To choose K, use the following criterion
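The criterion referred to above (whose formula appears on the slide image) is typically the ratio of retained to total variance, e.g. keep enough eigenvalues to explain 90% of the variance, the figure used later in these slides. A hedged sketch:

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest K whose top eigenvalues explain `threshold` of the total variance."""
    eigvals = np.sort(eigvals)[::-1]
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(explained, threshold) + 1)
```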
25Principal Component Analysis (PCA)
- What is the error due to dimensionality reduction?
- We saw above that an original vector x can be
reconstructed using its principal components
- It can be shown that the low-dimensional basis
based on principal components minimizes the
reconstruction error
- It can be shown that the error is equal to
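To make the reconstruction concrete, here is an illustrative sketch reusing the pca names above; the standard result is that the average squared reconstruction error equals the sum of the eigenvalues of the discarded components.

```python
import numpy as np

def reconstruct(x, mean, W):
    """Approximate x from its K principal components."""
    y = W.T @ (x - mean)       # K coefficients of x
    return mean + W @ y        # back-projection into the original N-dim space

# For a sample x the error is ||x - reconstruct(x, mean, W)||^2; averaged over
# the data it equals the sum of the discarded eigenvalues.
```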
26Principal Component Analysis (PCA)
- The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
- We should always standardize the data prior to using PCA.
- A common standardization method is to transform all the data to have zero mean and unit standard deviation.
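A one-line illustration of that standardization (a sketch, not from the slides):

```python
import numpy as np

def standardize(X):
    """Give every variable (column of X) zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```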
27Principal Component Analysis (PCA)
- PCA is not always an optimal dimensionality-reduction procedure for classification purposes.
28Principal Component Analysis (PCA)
- Case Study: Eigenfaces for Face Detection/Recognition
- M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
- The simplest approach is to think of it as a template matching problem.
- Problems arise when performing recognition in a high-dimensional space.
- Significant improvements can be achieved by first mapping the data into a lower-dimensional space.
- How to find this lower-dimensional space?
29Principal Component Analysis (PCA)
- Main idea behind eigenfaces
30Principal Component Analysis (PCA)
- Computation of the eigenfaces
31Principal Component Analysis (PCA)
- Computation of the eigenfaces cont.
32Principal Component Analysis (PCA)
- Computation of the eigenfaces cont.
33Principal Component Analysis (PCA)
- Computation of the eigenfaces cont.
34Principal Component Analysis (PCA)
- Representing faces in this basis
35Principal Component Analysis (PCA)
- Representing faces in this basis cont.
36Principal Component Analysis (PCA)
- Face Recognition Using Eigenfaces
37Principal Component Analysis (PCA)
- Face Recognition Using Eigenfaces cont.
- The distance e_r is called the distance within face space (difs).
- Comment: we can use the common Euclidean distance to compute e_r; however, it has been reported that the Mahalanobis distance performs better.
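An illustrative sketch of the two distance choices in eigenface coordinates (y1 and y2 are projections onto the K eigenfaces, eigvals their eigenvalues; the Mahalanobis-style weighting by the inverse eigenvalue is one common variant, stated here as an assumption rather than the authors' exact formula):

```python
import numpy as np

def difs_euclidean(y1, y2):
    """Euclidean distance between two faces in eigenface coordinates."""
    return np.linalg.norm(y1 - y2)

def difs_mahalanobis(y1, y2, eigvals):
    """Weight each eigenface axis by the inverse of its eigenvalue (variance)."""
    d = y1 - y2
    return np.sqrt(np.sum(d * d / eigvals))
```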
38Principal Component Analysis (PCA)
- Face Detection Using Eigenfaces
39Principal Component Analysis (PCA)
- Face Detection Using Eigenfaces cont.
40Principal Component Analysis (PCA)
- Background (de-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face)
- Lighting conditions (performance degrades with light changes)
- Scale (performance decreases quickly with changes to head size)
  - multi-scale eigenspaces
  - scale the input image to multiple sizes
- Orientation (performance decreases, but not as fast as with scale changes)
  - in-plane rotations can be handled
  - out-of-plane rotations are more difficult to handle
41Linear Discriminant Analysis (LDA)
- Suppose there are C classes in the training data.
- PCA is based on the sample covariance, which characterizes the scatter of the entire data set, irrespective of class membership.
- The projection axes chosen by PCA might not provide good discrimination power.
- LDA performs dimensionality reduction while preserving as much of the class discriminatory information as possible.
- It seeks to find directions along which the classes are best separated.
- It takes into consideration not only the scatter within classes but also the scatter between classes.
- It is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
42Linear Discriminant Analysis (LDA)
43Linear Discriminant Analysis (LDA)
- LDA computes a transformation that maximizes the
between-class scatter while minimizing the
within-class scatter
- Such a transformation should retain class
separability while reducing the variation due to
sources other than identity (e.g., illumination).
44Linear Discriminant Analysis (LDA)
- Linear transformation implied by LDA
- The linear transformation is given by a matrix U whose columns are the eigenvectors of S_W^{-1} S_B (called Fisherfaces).
- The eigenvectors are solutions of the generalized eigenvector problem S_B u_k = λ_k S_W u_k.
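A minimal SciPy sketch (illustrative, not the paper's code) of solving this generalized eigenproblem; Sw and Sb are assumed to be precomputed within-class and between-class scatter matrices and C the number of classes:

```python
import numpy as np
from scipy.linalg import eigh

def fisherfaces(Sw, Sb, C):
    """Eigenvectors of Sw^{-1} Sb via the generalized problem Sb u = lambda Sw u."""
    eigvals, eigvecs = eigh(Sb, Sw)          # requires Sw to be non-singular
    order = np.argsort(eigvals)[::-1]        # largest eigenvalues first
    U = eigvecs[:, order[:C - 1]]            # LDA dimensionality is bounded by C-1
    return U, eigvals[order[:C - 1]]
```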
45Linear Discriminant Analysis (LDA)
- If S_W is non-singular, we can obtain a conventional eigenvalue problem by writing S_W^{-1} S_B u_k = λ_k u_k.
- In practice, S_W is often singular since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N).
46Linear Discriminant Analysis (LDA)
- Does S_W^{-1} always exist? cont.
- To alleviate this problem, we can perform two projections (sketched in the code below):
- PCA is first applied to the data set to reduce
its dimensionality.
- LDA is then applied to further reduce the
dimensionality.
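A hedged sketch of this two-stage projection, reusing the pca and fisherfaces sketches above (not the authors' implementation):

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Within-class (Sw) and between-class (Sb) scatter of the rows of Y."""
    labels = np.asarray(labels)
    mean = Y.mean(axis=0)
    d = Y.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sw += (Yc - mc).T @ (Yc - mc)
        diff = (mc - mean).reshape(-1, 1)
        Sb += len(Yc) * (diff @ diff.T)
    return Sw, Sb

def pca_then_lda(X, labels, K):
    """PCA down to K dimensions, then LDA down to at most C-1 dimensions."""
    W_pca, Y, _ = pca(X, K)                  # step 1: PCA (sketched earlier)
    Sw, Sb = scatter_matrices(Y, labels)
    C = len(np.unique(labels))
    U, _ = fisherfaces(Sw, Sb, C)            # step 2: LDA (sketched earlier)
    return W_pca @ U                         # combined projection matrix
```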
47Linear Discriminant Analysis (LDA)
- Case Study: Using Discriminant Eigenfeatures for Image Retrieval
- D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.
- Content-based image retrieval
- The application being studied here is query-by-example image retrieval.
- The paper deals with the problem of selecting a good set of image features for content-based image retrieval.
48Linear Discriminant Analysis (LDA)
- "Well-framed" images are required as input for
training and query-by-example test probes. - Only a small variation in the size, position, and
orientation of the objects in the images is
allowed.
49Linear Discriminant Analysis (LDA)
- Most Expressive Features (MEF): the features (projections) obtained using PCA.
- Most Discriminating Features (MDF): the features (projections) obtained using LDA.
- Computational considerations
- When computing the eigenvalues/eigenvectors of S_W^{-1} S_B u_k = λ_k u_k numerically, the computations can be unstable since S_W^{-1} S_B is not always symmetric.
- See the paper for a way to find the eigenvalues/eigenvectors in a stable way.
- Important: the dimensionality of LDA is bounded by C-1, which is the rank of S_W^{-1} S_B.
50Linear Discriminant Analysis (LDA)
- Case Study: PCA versus LDA
- A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
- Is LDA always better than PCA?
- There has been a tendency in the computer vision community to prefer LDA over PCA.
- This is mainly because LDA deals directly with discrimination between classes, while PCA does not pay attention to the underlying class structure.
- This paper shows that when the training set is small, PCA can outperform LDA.
- When the number of samples is large and representative of each class, LDA outperforms PCA.
51Linear Discriminant Analysis (LDA)
- Is LDA always better than PCA? cont.
52Linear Discriminant Analysis (LDA)
- Is LDA always better than PCA? cont.
53Linear Discriminant Analysis (LDA)
- Is LDA always better than PCA? cont.
54Linear Discriminant Analysis (LDA)
- Only linearly separable classes will remain separable after applying LDA.
- It does not seem to be superior to PCA when the training data set is small.
55Appearance-based Recognition
- Directly represent appearance (image brightness), not geometry.
- Why?
  - Avoids modeling geometry and the complex interactions between geometry, lighting, and reflectance.
- Why not?
  - Too many possible appearances!
  - m visual degrees of freedom (e.g., pose, lighting, etc.)
  - R discrete samples for each DOF
- How to discretely sample the DOFs?
- How to PREDICT/SYNTHESIZE/MATCH with novel views?
56Appearance-based Recognition
- Example
  - Visual DOFs: object type P, lighting direction L, pose R
  - Set of R × P × L possible images
- Image as a point in a high-dimensional space
  - An image of N pixels is a point in N-dimensional space
(Figure: images plotted as points; axes are "Pixel 1 gray value" and "Pixel 2 gray value")
57The Space of Faces
- An image is a point in a high-dimensional space
- An N × M image is a point in R^{NM}
- We can define vectors in this space as we did in
the 2D case
Thanks to Chuck Dyer, Steve Seitz, Nishino
58Key Idea
- Images in the possible set are highly correlated.
- So, compress them to a low-dimensional subspace that captures key appearance characteristics of the visual DOFs.
- EIGENFACES (Turk and Pentland): USE PCA!
59Eigenfaces
Eigenfaces look somewhat like generic faces.
60Linear Subspaces
- Classification can be expensive
- Must either search (e.g., nearest neighbors) or
store large probability density functions.
- Suppose the data points are arranged as in the figure above
- Idea: fit a line; the classifier measures distance to the line
61Dimensionality Reduction
- Dimensionality reduction
- We can represent the orange points with only their v1 coordinates, since the v2 coordinates are all essentially 0
- This makes it much cheaper to store and compare points
- A bigger deal for higher-dimensional problems
62Linear Subspaces
Consider the variation along direction v among all of the orange points:
- What unit vector v minimizes the variance?
- What unit vector v maximizes the variance?
Solution: v1 is the eigenvector of A with the largest eigenvalue; v2 is the eigenvector of A with the smallest eigenvalue.
63Higher Dimensions
- Suppose each data point is N-dimensional
- Same procedure applies
- The eigenvectors of A define a new coordinate system
  - the eigenvector with the largest eigenvalue captures the most variation among training vectors x
  - the eigenvector with the smallest eigenvalue has the least variation
- We can compress the data by only using the top few eigenvectors
  - corresponds to choosing a linear subspace
  - represent points on a line, plane, or hyper-plane
  - these eigenvectors are known as the principal components
64Problem: Size of Covariance Matrix A
- Suppose each data point is N-dimensional (N pixels)
- The size of the covariance matrix A is N × N
- The number of eigenfaces is N
- Example: for N = 256 × 256 pixels,
  - the size of A will be 65536 × 65536!
  - the number of eigenvectors will be 65536!
- Typically, only 20-30 eigenvectors suffice, so this method is very inefficient!
65Efficient Computation of Eigenvectors
- If B is M×N and M << N, then A = B^T B is N×N >> M×M
- M = number of images, N = number of pixels
- Use B B^T instead; an eigenvector of B B^T is easily converted to an eigenvector of B^T B:
- (B B^T) y = e y
- => B^T (B B^T) y = e (B^T y)
- => (B^T B)(B^T y) = e (B^T y)
- => B^T y is an eigenvector of B^T B
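A NumPy sketch of this trick (illustrative names; B holds the mean-subtracted training images as rows):

```python
import numpy as np

def eigenfaces_small_trick(B):
    """Eigenvectors of B^T B (N x N) computed via the much smaller B B^T (M x M)."""
    small = B @ B.T                          # M x M instead of N x N
    evals, Y = np.linalg.eigh(small)         # columns of Y are the vectors y
    order = np.argsort(evals)[::-1]          # largest eigenvalues first
    evals, Y = evals[order], Y[:, order]
    keep = evals > 1e-12                     # drop numerically-zero directions
    evals, Y = evals[keep], Y[:, keep]
    U = B.T @ Y                              # each column B^T y is an eigenvector of B^T B
    U /= np.linalg.norm(U, axis=0)           # normalize the eigenfaces
    return U, evals
```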
66Eigenfaces summary in words
- Eigenfaces are
- the eigenvectors of
- the covariance matrix of
- the probability distribution of
- the vector space of
- human faces
- Eigenfaces are the standardized face ingredients derived from the statistical analysis of many pictures of human faces
- A human face may be considered to be a combination of these standardized faces
67Generating Eigenfaces in words
- A large set of images of human faces is taken.
- The images are normalized to line up the eyes, mouths, and other features.
- The eigenvectors of the covariance matrix of the face image vectors are then extracted.
- These eigenvectors are called eigenfaces.
68Eigenfaces for Face Recognition
- When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face.
- Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
- Hence eigenfaces provide a means of applying data compression to faces for identification purposes.
69Dimensionality Reduction
- The set of faces is a subspace of the set of images
  - Suppose it is K-dimensional
  - We can find the best subspace using PCA
- This is like fitting a hyper-plane to the set of faces, spanned by vectors v1, v2, ..., vK
- Any face can then be approximated as the mean face plus a weighted combination of these vectors
70Eigenfaces
- PCA extracts the eigenvectors of A
- Gives a set of vectors v1, v2, v3, ...
- Each one of these vectors is a direction in face space
- What do these look like?
71Projecting onto the Eigenfaces
- The eigenfaces v1, ..., vK span the space of faces
- A face x is converted to eigenface coordinates by projecting it onto each eigenface: w_i = v_i · (x - mean face)
72Is this a face or not?
73Recognition with Eigenfaces
- Algorithm
- Process the image database (set of images with labels):
  - Run PCA to compute the eigenfaces
  - Calculate the K coefficients for each image
- Given a new image (to be recognized) x, calculate its K coefficients
- Detect if x is a face
- If it is a face, who is it?
  - Find the closest labeled face in the database (nearest-neighbor in K-dimensional space), as sketched below
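A compact sketch of this recognition loop, reusing the illustrative pca function above; X holds the database images as rows, labels their identities, and face_threshold is an assumed tuning parameter:

```python
import numpy as np

def recognize(x, X, labels, K, face_threshold):
    """Nearest-neighbor recognition in K-dimensional eigenface space."""
    W, Y, _ = pca(X, K)                    # eigenfaces and database coefficients
    mean = X.mean(axis=0)
    y = W.T @ (x - mean)                   # coefficients of the new image

    # Distance from face space: how well do the K eigenfaces reconstruct x?
    x_hat = mean + W @ y
    if np.linalg.norm(x - x_hat) > face_threshold:
        return None                        # probably not a face

    # Distance within face space (difs): nearest labeled neighbor.
    dists = np.linalg.norm(Y - y, axis=1)
    return labels[int(np.argmin(dists))]
```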
74Key Property of Eigenspace Representation
- Given: two images that are used to construct the eigenspace, and their eigenspace projections.
- Then: the distance between the projections in eigenspace is approximately equal to the correlation between the two images.
75Choosing the Dimension K
(Plot: eigenvalue magnitude versus component index)
- How many eigenfaces to use?
- Look at the decay of the eigenvalues
  - the eigenvalue tells you the amount of variance in the direction of that eigenface
  - ignore eigenfaces with low variance
76Papers
78More Problems: Outliers
Sample Outliers
Intra-sample outliers
Need to explicitly reject outliers before or
during computing PCA.
De la Torre and Black
79Robustness to Intra-sample outliers
RPCA: Robust PCA (De la Torre and Black)
80Robustness to Sample Outliers
(Figure panels: Original, PCA, RPCA, Outliers)
Finding outliers: tracking moving objects
81Research Questions
- Does PCA encode information related to gender, ethnicity, age, and identity efficiently?
- What information does PCA encode?
- Are there components (features) of PCA that encode multiple properties?
82PCA
- The aim of PCA is a linear reduction of D-dimensional data to d-dimensional data (d < D), while preserving as much information in the data as possible.
- Linear functions:
  y1 = w1 · X
  y2 = w2 · X
  ...
  yd = wd · X
  Y = W X
- X: inputs; Y: outputs (components); W: eigenvectors (eigenfaces, basis vectors)
83How many components?
- Usual choice: consider the first d PCs which account for some percentage, usually above 90%, of the cumulative variance of the data.
- This is disadvantageous if the last components are interesting.
84Dataset
Property    No. Categories   Category                              No. Faces
Gender      2                Male                                  1603
Gender      2                Female                                1067
Ethnicity   3                Caucasian                             1758
Ethnicity   3                African                               320
Ethnicity   3                East Asian                            363
Age         5                20-29                                 665
Age         5                30-39                                 1264
Age         5                40-49                                 429
Age         5                50-59                                 206
Age         5                60+                                   106
Identity    358              Individuals with 3 or more examples   1161
- A subset of the FERET dataset
- 2670 grey-scale frontal face images
- Rich in variety: face images vary in pose, background lighting, presence or absence of glasses, and slight changes in expression
85Dataset
- Each image is pre-processed to a 65 × 75 resolution.
- Aligned based on eye locations
- Cropped such that little or no hair information is available
- Histogram equalisation is applied to reduce lighting effects
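A hedged sketch of the histogram-equalisation step for an 8-bit grey-scale image (the alignment, cropping, and 65 × 75 resizing steps are omitted and would normally use an image library):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalisation of a uint8 grey-scale image (H x W array)."""
    hist = np.bincount(img.ravel(), minlength=256)       # grey-level histogram
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())    # normalised CDF in [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)           # grey-level lookup table
    return lut[img]
```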
86Does PCA efficiently represent information in face images?
- Images of 65 × 75 resolution lead to a dimensionality of 4875.
- The first 350 components accounted for 90% of the variance of the data.
- Each face is thus represented using 350 components instead of 4875 dimensions.
- Classification employs 5-fold cross-validation, with 80% of faces in each category for training and 20% of faces in each category for testing.
- For identity recognition the leave-one-out method is used.
- LDA is performed on the PCA data.
- The Euclidean measure is used for classification.

Property    Classification (%)
Gender      86.4
Ethnicity   81.6
Age         91.5
Identity    90
87What information does PCA encode? Gender
- Gender encoding power is estimated using LDA
- The 3rd component carries the highest gender encoding power, followed by the 4th component
- All important components are among the first 50 components
88What information does PCA encode? Gender
- Reconstructed images from the altered components: (a) third and (b) fourth components. The components are progressively added in quantities from -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D.
- The third component encodes information related to the complexion, length of the nose, presence or absence of hair on the forehead, and texture around the mouth region.
- The fourth component encodes information related to eyebrow thickness and the presence or absence of a smiling expression.
89Gender
- (a) Face examples, with the first two being female and the next two being male faces. (b) Reconstructed faces of (a) using the top 20 gender-important components. (c) Reconstructed faces of (a) using all components except the top 20 gender-important components.
90What information does PCA encode? Ethnicity
- The 6th component carries the highest ethnicity encoding power, followed by the 15th component
- All ethnicity-important components are among the first 50 components
91Ethnicity
- Reconstructed images from the altered components: (a) 6th and (b) 4th components. The components are progressively added in quantities from -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D.
- The 6th component encodes information related to complexion and the broadness and length of the nose
- The 15th component encodes information related to the length of the nose, complexion, and the presence or absence of a smiling expression
92What information does PCA encode? Age
- The 20-39 and 50-60 age groups are termed young and old.
- The 10th component is found to be the most important for age.
- Reconstructed images from the altered tenth component. The component is progressively added in quantities from -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D.
93What information does PCA encode? Identity
- Many components are found to be important for identity. However, the magnitude of their importance is small.
- These components are widely distributed and not restricted to the first 50 components.
94Can a single component encode multiple properties?
- A grey beard indicates that the person is male and also, most probably, old.
- As all important components of gender, ethnicity, and age are among the first 50 components, there are overlapping components.
- One example is the 3rd component, which is found to be the most important for gender and the second most important for age.
95Can a single component encode multiple properties?
- Normal distribution plots of the (a) third and (b) fourth components for male and female classes of young and old age groups.
96Summary
- PCA encodes face image properties such as gender, ethnicity, age, and identity efficiently.
- Very few components are required to encode properties such as gender, ethnicity, and age, and these components are amongst the first few components, which capture a large part of the variance of the data. A large number of components is required to encode identity, and these components are widely distributed.
- There may be components which encode multiple properties.
97Principal Component Analysis (PCA)
- PCA is not always an optimal dimensionality-reduction procedure for classification purposes.
- Suppose there are C classes in the training data.
- PCA is based on the sample covariance, which characterizes the scatter of the entire data set, irrespective of class membership.
- The projection axes chosen by PCA might not provide good discrimination power.
98Linear Discriminant Analysis (LDA)
- LDA performs dimensionality reduction while preserving as much of the class discriminatory information as possible.
- It seeks to find directions along which the classes are best separated.
- It takes into consideration not only the scatter within classes but also the scatter between classes.
- It is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
99Linear Discriminant Analysis (LDA)
100Angiograph Image Enhancement
104Webcamera Calibration
108QUESTIONS
THANKS