Title: Eigenvalue Analysis in Pattern Recognition
1Eigenvalue Analysis in Pattern Recognition
- By
- Dr. M. Asmat Ullah Khan
- COMSATS Institute of Information Technology,
- Abbottabad
10-14 MULTI SPECTRAL IMAGE COMPRESSION
15Principal Component Analysis (PCA)
- Pattern recognition in high-dimensional spaces
- Problems arise when performing recognition in a high-dimensional space (curse of dimensionality).
- Significant improvements can be achieved by first mapping the data into a lower-dimensional sub-space.
- The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset.
16Principal Component Analysis (PCA)
- PCA allows us to compute a linear transformation
that maps data from a high dimensional space to a
lower dimensional sub-space.
17Principal Component Analysis (PCA)
- Lower dimensionality basis
- Approximate vectors by finding a basis in an
appropriate lower dimensional space.
(1) Higher-dimensional space representation
(2) Lower-dimensional space representation
18Principal Component Analysis (PCA)
19Principal Component Analysis (PCA)
- Dimensionality reduction implies information loss!
- We want to preserve as much information as possible.
- How do we determine the best lower-dimensional sub-space?
20Principal Component Analysis (PCA)
- Suppose x1, x2, ..., xM are N × 1 vectors
21Principal Component Analysis (PCA)
22Principal Component Analysis (PCA)
- Linear transformation implied by PCA
- The linear transformation R^N → R^K that performs the dimensionality reduction is
23Principal Component Analysis (PCA)
- PCA projects the data along the directions where the data varies the most.
- These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.
- The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.
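As a concrete illustration of the point above, here is a minimal NumPy sketch (not from the original slides) that finds the principal directions as eigenvectors of the sample covariance matrix; the names X, W, and Y are illustrative.

```python
import numpy as np

def pca(X, K):
    """Top-K principal directions of X, an (M, N) array of M samples."""
    mean = X.mean(axis=0)
    Xc = X - mean                             # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # N x N sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: the covariance is symmetric
    order = np.argsort(eigvals)[::-1]         # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :K]                        # N x K basis of the sub-space
    Y = Xc @ W                                # M x K low-dimensional coordinates
    return W, Y, eigvals
```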
24Principal Component Analysis (PCA)
- How to choose the principal components?
- To choose K, use the following criterion
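The criterion referred to above (whose formula appears on the slide image) is typically the ratio of retained to total variance, e.g. keep enough eigenvalues to explain 90% of the variance, the figure used later in these slides. A hedged sketch:

```python
import numpy as np

def choose_k(eigvals, threshold=0.9):
    """Smallest K whose top eigenvalues explain `threshold` of the total variance."""
    eigvals = np.sort(eigvals)[::-1]
    explained = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(explained, threshold) + 1)
```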
25Principal Component Analysis (PCA)
- What is the error due to dimensionality reduction?
- We saw above that an original vector x can be
reconstructed using its principal components
- It can be shown that the low-dimensional basis
based on principal components minimizes the
reconstruction error
- It can be shown that the error is equal to
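To make the reconstruction concrete, here is an illustrative sketch reusing the pca names above; the standard result is that the average squared reconstruction error equals the sum of the eigenvalues of the discarded components.

```python
import numpy as np

def reconstruct(x, mean, W):
    """Approximate x from its K principal components."""
    y = W.T @ (x - mean)       # K coefficients of x
    return mean + W @ y        # back-projection into the original N-dim space

# For a sample x the error is ||x - reconstruct(x, mean, W)||^2; averaged over
# the data it equals the sum of the discarded eigenvalues.
```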
26Principal Component Analysis (PCA)
- The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
- We should always standardize the data prior to using PCA.
- A common standardization method is to transform all the data to have zero mean and unit standard deviation.
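A one-line illustration of that standardization (a sketch, not from the slides):

```python
import numpy as np

def standardize(X):
    """Give every variable (column of X) zero mean and unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```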
27Principal Component Analysis (PCA)
- PCA is not always an optimal dimensionality-reduction procedure for classification purposes.
28Principal Component Analysis (PCA)
- Case Study: Eigenfaces for Face Detection/Recognition
- M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
- The simplest approach is to think of it as a template matching problem.
- Problems arise when performing recognition in a high-dimensional space.
- Significant improvements can be achieved by first mapping the data into a lower-dimensional space.
- How to find this lower-dimensional space?
29Principal Component Analysis (PCA)
- Main idea behind eigenfaces
30Principal Component Analysis (PCA)
- Computation of the eigenfaces
31Principal Component Analysis (PCA)
- Computation of the eigenfaces cont.
32Principal Component Analysis (PCA)
- Computation of the eigenfaces cont.
33Principal Component Analysis (PCA)
- Computation of the eigenfaces cont.
34Principal Component Analysis (PCA)
- Representing faces in this basis
35Principal Component Analysis (PCA)
- Representing faces in this basis cont.
36Principal Component Analysis (PCA)
- Face Recognition Using Eigenfaces
37Principal Component Analysis (PCA)
- Face Recognition Using Eigenfaces cont.
- The distance e_r is called the distance within face space (difs).
- Comment: we can use the common Euclidean distance to compute e_r; however, it has been reported that the Mahalanobis distance performs better.
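An illustrative sketch of the two distance choices in eigenface coordinates (y1 and y2 are projections onto the K eigenfaces, eigvals their eigenvalues; the Mahalanobis-style weighting by the inverse eigenvalue is one common variant, stated here as an assumption rather than the authors' exact formula):

```python
import numpy as np

def difs_euclidean(y1, y2):
    """Euclidean distance between two faces in eigenface coordinates."""
    return np.linalg.norm(y1 - y2)

def difs_mahalanobis(y1, y2, eigvals):
    """Weight each eigenface axis by the inverse of its eigenvalue (variance)."""
    d = y1 - y2
    return np.sqrt(np.sum(d * d / eigvals))
```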
38Principal Component Analysis (PCA)
- Face Detection Using Eigenfaces
39Principal Component Analysis (PCA)
- Face Detection Using Eigenfaces cont.
40Principal Component Analysis (PCA)
- Background (de-emphasize the outside of the face, e.g., by multiplying the input image by a 2D Gaussian window centered on the face)
- Lighting conditions (performance degrades with light changes)
- Scale (performance decreases quickly with changes to head size)
  - multi-scale eigenspaces
  - scale the input image to multiple sizes
- Orientation (performance decreases, but not as fast as with scale changes)
  - in-plane rotations can be handled
  - out-of-plane rotations are more difficult to handle
41Linear Discriminant Analysis (LDA)
- Suppose there are C classes in the training data.
- PCA is based on the sample covariance, which characterizes the scatter of the entire data set, irrespective of class membership.
- The projection axes chosen by PCA might not provide good discrimination power.
- LDA performs dimensionality reduction while preserving as much of the class discriminatory information as possible.
- It seeks to find directions along which the classes are best separated.
- It takes into consideration not only the scatter within classes but also the scatter between classes.
- It is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
42Linear Discriminant Analysis (LDA)
43Linear Discriminant Analysis (LDA)
- LDA computes a transformation that maximizes the
between-class scatter while minimizing the
within-class scatter
- Such a transformation should retain class
separability while reducing the variation due to
sources other than identity (e.g., illumination).
44Linear Discriminant Analysis (LDA)
- Linear transformation implied by LDA
- The linear transformation is given by a matrix U whose columns are the eigenvectors of S_W^{-1} S_B (called Fisherfaces).
- The eigenvectors are solutions of the generalized eigenvector problem S_B u_k = λ_k S_W u_k.
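A minimal SciPy sketch (illustrative, not the paper's code) of solving this generalized eigenproblem; Sw and Sb are assumed to be precomputed within-class and between-class scatter matrices and C the number of classes:

```python
import numpy as np
from scipy.linalg import eigh

def fisherfaces(Sw, Sb, C):
    """Eigenvectors of Sw^{-1} Sb via the generalized problem Sb u = lambda Sw u."""
    eigvals, eigvecs = eigh(Sb, Sw)          # requires Sw to be non-singular
    order = np.argsort(eigvals)[::-1]        # largest eigenvalues first
    U = eigvecs[:, order[:C - 1]]            # LDA dimensionality is bounded by C-1
    return U, eigvals[order[:C - 1]]
```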
45Linear Discriminant Analysis (LDA)
- If S_W is non-singular, we can obtain a conventional eigenvalue problem by writing S_W^{-1} S_B u_k = λ_k u_k.
- In practice, S_W is often singular since the data are image vectors with large dimensionality while the size of the data set is much smaller (M << N).
46Linear Discriminant Analysis (LDA)
- Does S_W^{-1} always exist? cont.
- To alleviate this problem, we can perform two projections (sketched in the code below):
- PCA is first applied to the data set to reduce
its dimensionality.
- LDA is then applied to further reduce the
dimensionality.
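A hedged sketch of this two-stage projection, reusing the pca and fisherfaces sketches above (not the authors' implementation):

```python
import numpy as np

def scatter_matrices(Y, labels):
    """Within-class (Sw) and between-class (Sb) scatter of the rows of Y."""
    labels = np.asarray(labels)
    mean = Y.mean(axis=0)
    d = Y.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        Sw += (Yc - mc).T @ (Yc - mc)
        diff = (mc - mean).reshape(-1, 1)
        Sb += len(Yc) * (diff @ diff.T)
    return Sw, Sb

def pca_then_lda(X, labels, K):
    """PCA down to K dimensions, then LDA down to at most C-1 dimensions."""
    W_pca, Y, _ = pca(X, K)                  # step 1: PCA (sketched earlier)
    Sw, Sb = scatter_matrices(Y, labels)
    C = len(np.unique(labels))
    U, _ = fisherfaces(Sw, Sb, C)            # step 2: LDA (sketched earlier)
    return W_pca @ U                         # combined projection matrix
```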
47Linear Discriminant Analysis (LDA)
- Case Study: Using Discriminant Eigenfeatures for Image Retrieval
- D. Swets, J. Weng, "Using Discriminant Eigenfeatures for Image Retrieval", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 831-836, 1996.
- Content-based image retrieval
- The application being studied here is query-by-example image retrieval.
- The paper deals with the problem of selecting a good set of image features for content-based image retrieval.
48Linear Discriminant Analysis (LDA)
- "Well-framed" images are required as input for
training and query-by-example test probes. - Only a small variation in the size, position, and
orientation of the objects in the images is
allowed.
49Linear Discriminant Analysis (LDA)
- Most Expressive Features (MEF): the features (projections) obtained using PCA.
- Most Discriminating Features (MDF): the features (projections) obtained using LDA.
- Computational considerations
- When computing the eigenvalues/eigenvectors of S_W^{-1} S_B u_k = λ_k u_k numerically, the computations can be unstable since S_W^{-1} S_B is not always symmetric.
- See the paper for a way to find the eigenvalues/eigenvectors in a stable way.
- Important: the dimensionality of LDA is bounded by C-1, which is the rank of S_W^{-1} S_B.
50Linear Discriminant Analysis (LDA)
- Case Study: PCA versus LDA
- A. Martinez, A. Kak, "PCA versus LDA", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
- Is LDA always better than PCA?
- There has been a tendency in the computer vision community to prefer LDA over PCA.
- This is mainly because LDA deals directly with discrimination between classes, while PCA does not pay attention to the underlying class structure.
- This paper shows that when the training set is small, PCA can outperform LDA.
- When the number of samples is large and representative of each class, LDA outperforms PCA.
51Linear Discriminant Analysis (LDA)
- Is LDA always better than PCA? cont.
52Linear Discriminant Analysis (LDA)
- Is LDA always better than PCA? cont.
53Linear Discriminant Analysis (LDA)
- Is LDA always better than PCA? cont.
54Linear Discriminant Analysis (LDA)
- Only linearly separable classes will remain separable after applying LDA.
- It does not seem to be superior to PCA when the training data set is small.
55Appearance-based Recognition
- Directly represent appearance (image brightness), not geometry.
- Why?
  - Avoids modeling geometry and the complex interactions between geometry, lighting, and reflectance.
- Why not?
  - Too many possible appearances!
  - m visual degrees of freedom (e.g., pose, lighting, etc.)
  - R discrete samples for each DOF
- How to discretely sample the DOFs?
- How to PREDICT/SYNTHESIZE/MATCH with novel views?
56Appearance-based Recognition
- Example
  - Visual DOFs: object type P, lighting direction L, pose R
  - Set of R × P × L possible images
- Image as a point in a high-dimensional space
  - An image of N pixels is a point in N-dimensional space
(Figure: images plotted as points; axes are "Pixel 1 gray value" and "Pixel 2 gray value")
57The Space of Faces
- An image is a point in a high-dimensional space
- An N × M image is a point in R^{NM}
- We can define vectors in this space as we did in
the 2D case
Thanks to Chuck Dyer, Steve Seitz, Nishino
58Key Idea
- Images in the possible set are highly correlated.
- So, compress them to a low-dimensional subspace that captures key appearance characteristics of the visual DOFs.
- EIGENFACES (Turk and Pentland): USE PCA!
59Eigenfaces
Eigenfaces look somewhat like generic faces.
60Linear Subspaces
- Classification can be expensive
- Must either search (e.g., nearest neighbors) or
store large probability density functions.
- Suppose the data points are arranged as in the figure above
- Idea: fit a line; the classifier measures distance to the line
61Dimensionality Reduction
- Dimensionality reduction
- We can represent the orange points with only their v1 coordinates, since the v2 coordinates are all essentially 0
- This makes it much cheaper to store and compare points
- A bigger deal for higher-dimensional problems
62Linear Subspaces
Consider the variation along direction v among all of the orange points:
- What unit vector v minimizes the variance?
- What unit vector v maximizes the variance?
Solution: v1 is the eigenvector of A with the largest eigenvalue; v2 is the eigenvector of A with the smallest eigenvalue.
63Higher Dimensions
- Suppose each data point is N-dimensional
- Same procedure applies
- The eigenvectors of A define a new coordinate system
  - the eigenvector with the largest eigenvalue captures the most variation among training vectors x
  - the eigenvector with the smallest eigenvalue has the least variation
- We can compress the data by only using the top few eigenvectors
  - corresponds to choosing a linear subspace
  - represent points on a line, plane, or hyper-plane
  - these eigenvectors are known as the principal components
64Problem: Size of Covariance Matrix A
- Suppose each data point is N-dimensional (N pixels)
- The size of the covariance matrix A is N × N
- The number of eigenfaces is N
- Example: for N = 256 × 256 pixels,
  - the size of A will be 65536 × 65536!
  - the number of eigenvectors will be 65536!
- Typically, only 20-30 eigenvectors suffice, so this method is very inefficient!
65Efficient Computation of Eigenvectors
- If B is M×N and M << N, then A = B^T B is N×N >> M×M
- M = number of images, N = number of pixels
- Use B B^T instead; an eigenvector of B B^T is easily converted to an eigenvector of B^T B:
- (B B^T) y = e y
- => B^T (B B^T) y = e (B^T y)
- => (B^T B)(B^T y) = e (B^T y)
- => B^T y is an eigenvector of B^T B
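A NumPy sketch of this trick (illustrative names; B holds the mean-subtracted training images as rows):

```python
import numpy as np

def eigenfaces_small_trick(B):
    """Eigenvectors of B^T B (N x N) computed via the much smaller B B^T (M x M)."""
    small = B @ B.T                          # M x M instead of N x N
    evals, Y = np.linalg.eigh(small)         # columns of Y are the vectors y
    order = np.argsort(evals)[::-1]          # largest eigenvalues first
    evals, Y = evals[order], Y[:, order]
    keep = evals > 1e-12                     # drop numerically-zero directions
    evals, Y = evals[keep], Y[:, keep]
    U = B.T @ Y                              # each column B^T y is an eigenvector of B^T B
    U /= np.linalg.norm(U, axis=0)           # normalize the eigenfaces
    return U, evals
```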
66Eigenfaces summary in words
- Eigenfaces are
- the eigenvectors of
- the covariance matrix of
- the probability distribution of
- the vector space of
- human faces
- Eigenfaces are the standardized face ingredients derived from the statistical analysis of many pictures of human faces
- A human face may be considered to be a combination of these standardized faces
67Generating Eigenfaces in words
- A large set of images of human faces is taken.
- The images are normalized to line up the eyes, mouths, and other features.
- The eigenvectors of the covariance matrix of the face image vectors are then extracted.
- These eigenvectors are called eigenfaces.
68Eigenfaces for Face Recognition
- When properly weighted, eigenfaces can be summed together to create an approximate gray-scale rendering of a human face.
- Remarkably few eigenvector terms are needed to give a fair likeness of most people's faces.
- Hence eigenfaces provide a means of applying data compression to faces for identification purposes.
69Dimensionality Reduction
- The set of faces is a subspace of the set of images
  - Suppose it is K-dimensional
  - We can find the best subspace using PCA
- This is like fitting a hyper-plane to the set of faces, spanned by vectors v1, v2, ..., vK
- Any face can then be approximated as the mean face plus a weighted combination of these vectors
70Eigenfaces
- PCA extracts the eigenvectors of A
- Gives a set of vectors v1, v2, v3, ...
- Each one of these vectors is a direction in face space
- What do these look like?
71Projecting onto the Eigenfaces
- The eigenfaces v1, ..., vK span the space of faces
- A face x is converted to eigenface coordinates by projecting it onto each eigenface: w_i = v_i · (x - mean face)
72Is this a face or not?
73Recognition with Eigenfaces
- Algorithm
- Process the image database (set of images with labels):
  - Run PCA to compute the eigenfaces
  - Calculate the K coefficients for each image
- Given a new image (to be recognized) x, calculate its K coefficients
- Detect if x is a face
- If it is a face, who is it?
  - Find the closest labeled face in the database (nearest-neighbor in K-dimensional space), as sketched below
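A compact sketch of this recognition loop, reusing the illustrative pca function above; X holds the database images as rows, labels their identities, and face_threshold is an assumed tuning parameter:

```python
import numpy as np

def recognize(x, X, labels, K, face_threshold):
    """Nearest-neighbor recognition in K-dimensional eigenface space."""
    W, Y, _ = pca(X, K)                    # eigenfaces and database coefficients
    mean = X.mean(axis=0)
    y = W.T @ (x - mean)                   # coefficients of the new image

    # Distance from face space: how well do the K eigenfaces reconstruct x?
    x_hat = mean + W @ y
    if np.linalg.norm(x - x_hat) > face_threshold:
        return None                        # probably not a face

    # Distance within face space (difs): nearest labeled neighbor.
    dists = np.linalg.norm(Y - y, axis=1)
    return labels[int(np.argmin(dists))]
```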
74Key Property of Eigenspace Representation
- Given: two images that are used to construct the eigenspace, and their eigenspace projections.
- Then: the distance between the projections in eigenspace is approximately equal to the correlation between the two images.
75Choosing the Dimension K
(Plot: eigenvalue magnitude versus component index)
- How many eigenfaces to use?
- Look at the decay of the eigenvalues
  - the eigenvalue tells you the amount of variance in the direction of that eigenface
  - ignore eigenfaces with low variance
76Papers
78More Problems: Outliers
Sample Outliers
Intra-sample outliers
Need to explicitly reject outliers before or
during computing PCA.
De la Torre and Black
79Robustness to Intra-sample outliers
RPCA: Robust PCA (De la Torre and Black)
80Robustness to Sample Outliers
(Figure panels: Original, PCA, RPCA, Outliers)
Finding outliers: tracking moving objects
81Research Questions
- Does PCA encode information related to gender, ethnicity, age, and identity efficiently?
- What information does PCA encode?
- Are there components (features) of PCA that encode multiple properties?
82PCA
- The aim of PCA is a linear reduction of D-dimensional data to d-dimensional data (d < D), while preserving as much information in the data as possible.
- Linear functions:
  y1 = w1 · X
  y2 = w2 · X
  ...
  yd = wd · X
  Y = W X
- X: inputs; Y: outputs (components); W: eigenvectors (eigenfaces, basis vectors)
83How many components?
- Usual choice: consider the first d PCs which account for some percentage, usually above 90%, of the cumulative variance of the data.
- This is disadvantageous if the last components are interesting.
84Dataset
Property    No. Categories   Category                              No. Faces
Gender      2                Male                                  1603
Gender      2                Female                                1067
Ethnicity   3                Caucasian                             1758
Ethnicity   3                African                               320
Ethnicity   3                East Asian                            363
Age         5                20-29                                 665
Age         5                30-39                                 1264
Age         5                40-49                                 429
Age         5                50-59                                 206
Age         5                60+                                   106
Identity    358              Individuals with 3 or more examples   1161
- A subset of the FERET dataset
- 2670 grey-scale frontal face images
- Rich in variety: face images vary in pose, background lighting, presence or absence of glasses, and slight changes in expression
85Dataset
- Each image is pre-processed to a 65 × 75 resolution.
- Aligned based on eye locations
- Cropped such that little or no hair information is available
- Histogram equalisation is applied to reduce lighting effects
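A hedged sketch of the histogram-equalisation step for an 8-bit grey-scale image (the alignment, cropping, and 65 × 75 resizing steps are omitted and would normally use an image library):

```python
import numpy as np

def equalize_histogram(img):
    """Histogram equalisation of a uint8 grey-scale image (H x W array)."""
    hist = np.bincount(img.ravel(), minlength=256)       # grey-level histogram
    cdf = np.cumsum(hist).astype(np.float64)
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())    # normalised CDF in [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)           # grey-level lookup table
    return lut[img]
```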
86Does PCA efficiently represent information in face images?
- Images of 65 × 75 resolution lead to a dimensionality of 4875.
- The first 350 components accounted for 90% of the variance of the data.
- Each face is thus represented using 350 components instead of 4875 dimensions.
- Classification employs 5-fold cross-validation, with 80% of faces in each category for training and 20% of faces in each category for testing.
- For identity recognition the leave-one-out method is used.
- LDA is performed on the PCA data.
- The Euclidean measure is used for classification.

Property    Classification (%)
Gender      86.4
Ethnicity   81.6
Age         91.5
Identity    90
87What information does PCA encode? Gender
- Gender encoding power is estimated using LDA
- The 3rd component carries the highest gender encoding power, followed by the 4th component
- All important components are among the first 50 components
88What information does PCA encode? Gender
- Reconstructed images from the altered components: (a) third and (b) fourth components. The components are progressively added in quantities from -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D.
- The third component encodes information related to the complexion, length of the nose, presence or absence of hair on the forehead, and texture around the mouth region.
- The fourth component encodes information related to eyebrow thickness and the presence or absence of a smiling expression.
89Gender
- (a) Face examples, with the first two being female and the next two being male faces. (b) Reconstructed faces of (a) using the top 20 gender-important components. (c) Reconstructed faces of (a) using all components except the top 20 gender-important components.
90What information does PCA encode? Ethnicity
- The 6th component carries the highest ethnicity encoding power, followed by the 15th component
- All ethnicity-important components are among the first 50 components
91Ethnicity
- Reconstructed images from the altered components: (a) 6th and (b) 4th components. The components are progressively added in quantities from -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D.
- The 6th component encodes information related to complexion and the broadness and length of the nose
- The 15th component encodes information related to the length of the nose, complexion, and the presence or absence of a smiling expression
92What information does PCA encode? Age
- The 20-39 and 50-60 age groups are termed young and old.
- The 10th component is found to be the most important for age.
- Reconstructed images from the altered tenth component. The component is progressively added in quantities from -6 S.D. (extreme left) to +6 S.D. (extreme right) in steps of 2 S.D.
93What information does PCA encode? Identity
- Many components are found to be important for identity. However, the magnitude of their importance is small.
- These components are widely distributed and not restricted to the first 50 components.
94Can a single component encode multiple properties?
- A grey beard indicates that the person is male and also, most probably, old.
- As all important components of gender, ethnicity, and age are among the first 50 components, there are overlapping components.
- One example is the 3rd component, which is found to be the most important for gender and the second most important for age.
95Can a single component encode multiple properties?
- Normal distribution plots of the (a) third and (b) fourth components for male and female classes of young and old age groups.
96Summary
- PCA encodes face image properties such as gender, ethnicity, age, and identity efficiently.
- Very few components are required to encode properties such as gender, ethnicity, and age, and these components are amongst the first few components, which capture a large part of the variance of the data. A large number of components is required to encode identity, and these components are widely distributed.
- There may be components which encode multiple properties.
97Principal Component Analysis (PCA)
- PCA is not always an optimal dimensionality-reduction procedure for classification purposes.
- Suppose there are C classes in the training data.
- PCA is based on the sample covariance, which characterizes the scatter of the entire data set, irrespective of class membership.
- The projection axes chosen by PCA might not provide good discrimination power.
98Linear Discriminant Analysis (LDA)
- LDA performs dimensionality reduction while preserving as much of the class discriminatory information as possible.
- It seeks to find directions along which the classes are best separated.
- It takes into consideration not only the scatter within classes but also the scatter between classes.
- It is more capable of distinguishing image variation due to identity from variation due to other sources such as illumination and expression.
99Linear Discriminant Analysis (LDA)
100Angiograph Image Enhancement
104Webcamera Calibration
108QUESTIONS
THANKS