Eigen Value Analysis in Pattern Recognition

Transcript and Presenter's Notes

1
Eigen Value Analysis in Pattern Recognition
  • By
  • Dr. M. Asmat Ullah Khan
  • COMSATS Institute of Information Technology,
  • Abbottabad

2
(No Transcript)
3
(No Transcript)
4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
MULTI SPECTRAL IMAGE COMPRESSION
11
MULTI SPECTRAL IMAGE COMPRESSION
12
MULTI SPECTRAL IMAGE COMPRESSION
13
MULTI SPECTRAL IMAGE COMPRESSION
14
MULTI SPECTRAL IMAGE COMPRESSION
15
Principal Component Analysis (PCA)
  • Pattern recognition in high-dimensional spaces
  • Problems arise when performing recognition in a
    high-dimensional space (curse of dimensionality).
  • Significant improvements can be achieved by first
    mapping the data into a lower-dimensional
    sub-space.
  • The goal of PCA is to reduce the dimensionality
    of the data while retaining as much as possible
    of the variation present in the original dataset.

16
Principal Component Analysis (PCA)
  • Dimensionality reduction
  • PCA allows us to compute a linear transformation
    that maps data from a high dimensional space to a
    lower dimensional sub-space.

17
Principal Component Analysis (PCA)
  • Lower dimensionality basis
  • Approximate vectors by finding a basis in an
    appropriate lower dimensional space.

(1) Higher-dimensional space representation
(2) Lower-dimensional space representation
18
Principal Component Analysis (PCA)
  • Example

19
Principal Component Analysis (PCA)
  • Information loss
  • Dimensionality reduction implies information loss!
  • We want to preserve as much information as
    possible; that is:
  • How do we determine the best lower-dimensional
    sub-space?

20
Principal Component Analysis (PCA)
  • Methodology
  • Suppose x1, x2, ..., xM are N x 1 vectors

21
Principal Component Analysis (PCA)
  • Methodology cont.

22
Principal Component Analysis (PCA)
  • Linear transformation implied by PCA
  • The linear transformation R^N → R^K that performs
    the dimensionality reduction is given on the slide
    (a minimal code sketch follows below).
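The standard PCA mapping projects a mean-subtracted vector onto the
top-K eigenvectors of the sample covariance matrix. Below is a minimal
NumPy sketch of this transformation; the names (pca_transform, X, K)
are illustrative, not taken from the slides.

    import numpy as np

    def pca_transform(X, K):
        """Project the N-dimensional rows of X onto the top-K principal
        components. X: (M, N) data matrix, one sample per row."""
        mean = X.mean(axis=0)
        Xc = X - mean                           # center the data
        cov = np.cov(Xc, rowvar=False)          # (N, N) sample covariance
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1]       # sort descending
        U = eigvecs[:, order[:K]]               # top-K eigenvectors as columns
        Y = Xc @ U                              # the R^N -> R^K projection
        return Y, mean, U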

23
Principal Component Analysis (PCA)
  • Geometric interpretation
  • PCA projects the data along the directions where
    the data varies the most.
  • These directions are determined by the
    eigenvectors of the covariance matrix
    corresponding to the largest eigenvalues.
  • The magnitude of the eigenvalues corresponds to
    the variance of the data along the eigenvector
    directions.

24
Principal Component Analysis (PCA)
  • How to choose the principal components?
  • To choose K, use the following criterion (a sketch
    of a typical form of this criterion follows below):
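The criterion is commonly the fraction of the total variance retained
by the top K eigenvalues, required to exceed a threshold such as 0.9
or 0.95. A minimal sketch under that assumption:

    import numpy as np

    def choose_k(eigvals, threshold=0.95):
        """Smallest K whose top-K eigenvalues retain `threshold` of the
        total variance. `eigvals` must be sorted in descending order."""
        retained = np.cumsum(eigvals) / np.sum(eigvals)
        return int(np.searchsorted(retained, threshold) + 1)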

25
Principal Component Analysis (PCA)
  • What is the error due to dimensionality reduction?
  • We saw above that an original vector x can be
    reconstructed using its principal components
  • It can be shown that the low-dimensional basis
    based on principal components minimizes the
    reconstruction error
  • It can be shown that the error is equal to

26
Principal Component Analysis (PCA)
  • Standardization
  • The principal components are dependent on the
    units used to measure the original variables as
    well as on the range of values they assume.
  • We should always standardize the data prior to
    using PCA.
  • A common standardization method is to transform
    all the data to have zero mean and unit standard
    deviation (a minimal sketch follows below).
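A minimal sketch of the zero-mean, unit-standard-deviation
standardization described above (per-variable z-scoring; the names are
illustrative):

    import numpy as np

    def standardize(X):
        """Transform each column (variable) of X to zero mean, unit std."""
        mean = X.mean(axis=0)
        std = X.std(axis=0)
        std[std == 0] = 1.0          # guard against constant variables
        return (X - mean) / std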

27
Principal Component Analysis (PCA)
  • PCA and classification
  • PCA is not always an optimal
    dimensionality-reduction procedure for
    classification purposes.

28
Principal Component Analysis (PCA)
  • Case Study: Eigenfaces for Face
    Detection/Recognition
  • M. Turk, A. Pentland, "Eigenfaces for
    Recognition", Journal of Cognitive Neuroscience,
    vol. 3, no. 1, pp. 71-86, 1991.
  • Face Recognition
  • The simplest approach is to think of it as a
    template matching problem
  • Problems arise when performing recognition in a
    high-dimensional space.
  • Significant improvements can be achieved by first
    mapping the data into a lower dimensionality
    space.
  • How to find this lower-dimensional space?

29
Principal Component Analysis (PCA)
  • Main idea behind eigenfaces

30
Principal Component Analysis (PCA)
  • Computation of the eigenfaces

31
Principal Component Analysis (PCA)
  • Computation of the eigenfaces cont.

32
Principal Component Analysis (PCA)
  • Computation of the eigenfaces cont.

33
Principal Component Analysis (PCA)
  • Computation of the eigenfaces cont.

34
Principal Component Analysis (PCA)
  • Representing faces onto this basis

35
Principal Component Analysis (PCA)
  • Representing faces onto this basis cont.

36
Principal Component Analysis (PCA)
  • Face Recognition Using Eigenfaces

37
Principal Component Analysis (PCA)
  • Face Recognition Using Eigenfaces cont.
  • The distance e_r is called the distance within the
    face space (difs).
  • Comment: we can use the common Euclidean distance
    to compute e_r, but it has been reported that the
    Mahalanobis distance performs better (a sketch of
    both follows below).
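A minimal sketch of the two distance choices mentioned above, computed
between eigenface coefficient vectors. The Mahalanobis variant here
weights each component by the inverse of its eigenvalue, which is a
common formulation assumed for illustration rather than the authors'
exact definition:

    import numpy as np

    def euclidean_difs(w1, w2):
        """Euclidean distance between two K-dim coefficient vectors."""
        return np.linalg.norm(w1 - w2)

    def mahalanobis_difs(w1, w2, eigvals):
        """Distance with each component scaled by 1/eigenvalue."""
        d = w1 - w2
        return np.sqrt(np.sum(d * d / eigvals))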

38
Principal Component Analysis (PCA)
  • Face Detection Using Eigenfaces

39
Principal Component Analysis (PCA)
  • Face Detection Using Eigenfaces cont.

40
Principal Component Analysis (PCA)
  • Problems
  • Background (de-emphasize the outside of the face
    e.g., by multiplying the input image by a 2D
    Gaussian window centered on the face)
  • Lighting conditions (performance degrades with
    light changes)
  • Scale (performance decreases quickly with changes
    to head size)
  • multi-scale eigenspaces
  • scale input image to multiple sizes
  • Orientation (performance decreases but not as
    fast as with scale changes)
  • plane rotations can be handled
  • out-of-plane rotations are more difficult to
    handle

41
Linear Discriminant Analysis (LDA)
  • Multiple classes and PCA
  • Suppose there are C classes in the training data.
  • PCA is based on the sample covariance which
    characterizes the scatter of the entire data set,
    irrespective of class-membership.
  • The projection axes chosen by PCA might not
    provide good discrimination power.
  • What is the goal of LDA?
  • Perform dimensionality reduction while preserving
    as much of the class discriminatory information
    as possible.
  • Seeks to find directions along which the classes
    are best separated.
  • Takes into consideration not only the within-class
    scatter but also the between-class scatter.
  • More capable of distinguishing image variation
    due to identity from variation due to other
    sources such as illumination and expression.

42
Linear Discriminant Analysis (LDA)
  • Methodology

43
Linear Discriminant Analysis (LDA)
  • Methodology cont.
  • LDA computes a transformation that maximizes the
    between-class scatter while minimizing the
    within-class scatter
  • Such a transformation should retain class
    separability while reducing the variation due to
    sources other than identity (e.g., illumination).

44
Linear Discriminant Analysis (LDA)
  • Linear transformation implied by LDA
  • The linear transformation is given by a matrix U
    whose columns are the eigenvectors of S_w^-1 S_B
    (called Fisherfaces).
  • The eigenvectors are solutions of the generalized
    eigenvector problem S_B u_k = λ_k S_w u_k (a minimal
    sketch follows below).
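A minimal sketch of solving the generalized eigenvector problem
S_B u = λ S_w u with SciPy; Sw and Sb are assumed to be the within- and
between-class scatter matrices computed elsewhere:

    import numpy as np
    from scipy.linalg import eigh

    def lda_directions(Sw, Sb, num_components):
        """Columns of the result solve Sb u = lambda * Sw u,
        sorted by decreasing eigenvalue."""
        eigvals, eigvecs = eigh(Sb, Sw)   # generalized symmetric problem
        order = np.argsort(eigvals)[::-1]
        return eigvecs[:, order[:num_components]]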

45
Linear Discriminant Analysis (LDA)
  • Does S_w^-1 always exist?
  • If S_w is non-singular, we can obtain a
    conventional eigenvalue problem by writing
    S_w^-1 S_B u_k = λ_k u_k.
  • In practice, S_w is often singular since the data
    are image vectors with large dimensionality while
    the size of the data set is much smaller (M << N).

46
Linear Discriminant Analysis (LDA)
  • Does S_w^-1 always exist? cont.
  • To alleviate this problem, we can perform two
    projections (a sketch follows below):
  1. PCA is first applied to the data set to reduce
     its dimensionality.
  2. LDA is then applied to further reduce the
     dimensionality.
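A minimal sketch of the two-stage projection (PCA followed by LDA on
the PCA-projected data); the scatter-matrix computation is included so
the sketch is self-contained, and all names are illustrative:

    import numpy as np
    from scipy.linalg import eigh

    def pca_then_lda(X, labels, pca_dims, lda_dims):
        """Two-stage projection: PCA to pca_dims, then LDA to lda_dims.
        X: (M, N) data, one sample per row; labels: (M,) class labels."""
        labels = np.asarray(labels)
        # Stage 1: PCA
        mean = X.mean(axis=0)
        Xc = X - mean
        vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
        U_pca = vecs[:, np.argsort(vals)[::-1][:pca_dims]]
        Z = Xc @ U_pca                        # data in the PCA subspace

        # Stage 2: LDA on the PCA-projected data
        global_mean = Z.mean(axis=0)
        Sw = np.zeros((pca_dims, pca_dims))
        Sb = np.zeros((pca_dims, pca_dims))
        for c in np.unique(labels):
            Zc = Z[labels == c]
            mc = Zc.mean(axis=0)
            Sw += (Zc - mc).T @ (Zc - mc)     # within-class scatter
            d = (mc - global_mean)[:, None]
            Sb += Zc.shape[0] * (d @ d.T)     # between-class scatter
        lvals, lvecs = eigh(Sb, Sw)           # generalized eigenproblem
        U_lda = lvecs[:, np.argsort(lvals)[::-1][:lda_dims]]
        return Z @ U_lda, mean, U_pca, U_lda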

47
Linear Discriminant Analysis (LDA)
  • Case Study: Using Discriminant Eigenfeatures for
    Image Retrieval
  • D. Swets, J. Weng, "Using Discriminant
    Eigenfeatures for Image Retrieval", IEEE
    Transactions on Pattern Analysis and Machine
    Intelligence, vol. 18, no. 8, pp. 831-836, 1996.
  • Content-based image retrieval
  • The application being studied here is
    query-by-example image retrieval.
  • The paper deals with the problem of selecting a
    good set of image features for content-based
    image retrieval.

48
Linear Discriminant Analysis (LDA)
  • Assumptions
  • "Well-framed" images are required as input for
    training and query-by-example test probes.
  • Only a small variation in the size, position, and
    orientation of the objects in the images is
    allowed.

49
Linear Discriminant Analysis (LDA)
  • Some terminology
  • Most Expressive Features (MEF): the features
    (projections) obtained using PCA.
  • Most Discriminating Features (MDF): the features
    (projections) obtained using LDA.
  • Computational considerations
  • When computing the eigenvalues/eigenvectors of
    S_w^-1 S_B u_k = λ_k u_k numerically, the
    computations can be unstable since S_w^-1 S_B is
    not always symmetric.
  • See the paper for a way to find the
    eigenvalues/eigenvectors in a stable way.
  • Important: the dimensionality of LDA is bounded by
    C-1, which is the rank of S_w^-1 S_B.

50
Linear Discriminant Analysis (LDA)
  • Case Study: PCA versus LDA
  • A. Martinez, A. Kak, "PCA versus LDA", IEEE
    Transactions on Pattern Analysis and Machine
    Intelligence, vol. 23, no. 2, pp. 228-233, 2001.
  • Is LDA always better than PCA?
  • There has been a tendency in the computer vision
    community to prefer LDA over PCA.
  • This is mainly because LDA deals directly with
    discrimination between classes while PCA does not
    pay attention to the underlying class structure.
  • This paper shows that when the training set is
    small, PCA can outperform LDA.
  • When the number of samples is large and
    representative for each class, LDA outperforms
    PCA.

51
Linear Discriminant Analysis (LDA)
  • Is LDA always better than PCA? cont.

52
Linear Discriminant Analysis (LDA)
  • Is LDA always better than PCA? cont.

53
Linear Discriminant Analysis (LDA)
  • Is LDA always better than PCA? cont.

54
Linear Discriminant Analysis (LDA)
  • Critique of LDA
  • Only linearly separable classes will remain
    separable after applying LDA.
  • It does not seem to be superior to PCA when the
    training data set is small.

55
Appearance-based Recognition
  • Directly represent appearance (image
    brightness), not geometry.
  • Why?
  • Avoids modeling geometry and complex interactions
    between geometry, lighting, and reflectance.
  • Why not?
  • Too many possible appearances!
  • m visual degrees of freedom (e.g., pose,
    lighting, etc.)
  • R discrete samples for each DOF
  • How to discretely sample the DOFs?
  • How to PREDICT/SYNTHESIZE/MATCH with novel views?

56
Appearance-based Recognition
  • Example
  • Visual DOFs: Object type P, Lighting Direction L,
    Pose R
  • Set of R × P × L possible images
  • Image as a point in a high-dimensional space: an
    image of N pixels is a point in N-dimensional
    space.

(Figure: images plotted as points in a space whose axes are pixel
gray values, e.g., pixel 1 gray value vs. pixel 2 gray value)
57
The Space of Faces
  • An image is a point in a high dimensional space
  • An N x M image is a point in R^(NM)
  • We can define vectors in this space as we did in
    the 2D case

Thanks to Chuck Dyer, Steve Seitz, Nishino
58
Key Idea
  • Images in the possible set are
    highly correlated.
  • So, compress them to a low-dimensional subspace
    that
  • captures key appearance characteristics of the
    visual DOFs.
  • EIGENFACES [Turk and Pentland]

USE PCA!
59
Eigenfaces
Eigenfaces look somewhat like generic faces.
60
Linear Subspaces
  • Classification can be expensive
  • Must either search (e.g., nearest neighbors) or
    store large probability density functions.
  • Suppose the data points are arranged as above
  • Idea: fit a line; the classifier measures distance
    to the line.

61
Dimensionality Reduction
  • Dimensionality reduction
  • We can represent the orange points with only
    their v1 coordinates
  • since v2 coordinates are all essentially 0
  • This makes it much cheaper to store and compare
    points
  • A bigger deal for higher dimensional problems

62
Linear Subspaces
Consider the variation along a direction v among all
of the orange points:
  • What unit vector v minimizes the variance?
  • What unit vector v maximizes the variance?
Solution: v1 is the eigenvector of A with the largest
eigenvalue; v2 is the eigenvector of A with the
smallest eigenvalue.
63
Higher Dimensions
  • Suppose each data point is N-dimensional
  • Same procedure applies
  • The eigenvectors of A define a new coordinate
    system
  • eigenvector with largest eigenvalue captures the
    most variation among training vectors x
  • eigenvector with smallest eigenvalue has least
    variation
  • We can compress the data by only using the top
    few eigenvectors
  • corresponds to choosing a linear subspace
  • represent points on a line, plane, or
    hyper-plane
  • these eigenvectors are known as the principal
    components

64
Problem: Size of the Covariance Matrix A
  • Suppose each data point is N-dimensional (N
    pixels)
  • The size of the covariance matrix A is N x N
  • The number of eigenfaces is N
  • Example: for N = 256 x 256 pixels,
  • the size of A will be 65536 x 65536 !
  • the number of eigenvectors will be 65536 !
  • Typically, only 20-30 eigenvectors suffice, so
    this method is very inefficient!

65
Efficient Computation of Eigenvectors
  • If B is M x N and M << N, then A = B^T B is N x N,
    which is much larger than the M x M matrix B B^T
  • M = number of images, N = number of pixels
  • Use B B^T instead; an eigenvector of B B^T is
    easily converted to an eigenvector of B^T B:
  • (B B^T) y = e y
  • => B^T (B B^T) y = e (B^T y)
  • => (B^T B)(B^T y) = e (B^T y)
  • => B^T y is an eigenvector of B^T B
  (A code sketch of this trick follows below.)
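A minimal NumPy sketch of this trick (often called the snapshot
method); it assumes the images are stored one per row in an (M, N)
array, which is a layout choice of this sketch:

    import numpy as np

    def eigenfaces_snapshot(images):
        """images: (M, N) array of flattened face images (one per row).
        Returns the mean face and up to M unit-norm eigenfaces (rows)."""
        mean_face = images.mean(axis=0)
        B = images - mean_face                  # (M, N) mean-subtracted data
        S = B @ B.T                             # small (M, M) matrix, not (N, N)
        eigvals, Y = np.linalg.eigh(S)          # columns of Y: eigenvectors of B B^T
        order = np.argsort(eigvals)[::-1]
        Y = Y[:, order]
        eigenfaces = B.T @ Y                    # B^T y is an eigenvector of B^T B
        norms = np.linalg.norm(eigenfaces, axis=0)
        eigenfaces /= np.maximum(norms, 1e-12)  # normalize each eigenface
        return mean_face, eigenfaces.T          # eigenfaces as rows of (M, N)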

66
Eigenfaces summary in words
  • Eigenfaces are
  • the eigenvectors of
  • the covariance matrix of
  • the probability distribution of
  • the vector space of
  • human faces
  • Eigenfaces are the standardized face
    ingredients derived from the statistical
    analysis of many pictures of human faces
  • A human face may be considered to be a
    combination of these standardized faces

67
Generating Eigenfaces in words
  • Large set of images of human faces is taken.
  • The images are normalized to line up the eyes,
    mouths and other features.
  • The eigenvectors of the covariance matrix of the
    face image vectors are then extracted.
  • These eigenvectors are called eigenfaces.

68
Eigenfaces for Face Recognition
  • When properly weighted, eigenfaces can be summed
    together to create an approximate gray-scale
    rendering of a human face.
  • Remarkably few eigenvector terms are needed to
    give a fair likeness of most people's faces.
  • Hence eigenfaces provide a means of applying data
    compression to faces for identification purposes.

69
Dimensionality Reduction
  • The set of faces is a subspace of the set
  • of images
  • Suppose it is K dimensional
  • We can find the best subspace using PCA
  • This is like fitting a hyper-plane to the set
    of faces
  • spanned by vectors v1, v2, ..., vK

Any face
70
Eigenfaces
  • PCA extracts the eigenvectors of A
  • Gives a set of vectors v1, v2, v3, ...
  • Each one of these vectors is a direction in face
    space
  • what do these look like?

71
Projecting onto the Eigenfaces
  • The eigenfaces v1, ..., vK span the space of
    faces
  • A face is converted to eigenface coordinates by
    projecting it onto each eigenface (a minimal sketch
    follows below)
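A minimal sketch of this projection and the corresponding
reconstruction, following the eigenfaces_snapshot sketch above (the
function and variable names are illustrative):

    import numpy as np

    def project_to_face_space(face, mean_face, eigenfaces):
        """Return the K eigenface coefficients of a flattened face.
        eigenfaces: (K, N) array with one unit-norm eigenface per row."""
        return eigenfaces @ (face - mean_face)

    def reconstruct(coeffs, mean_face, eigenfaces):
        """Approximate the face from its K eigenface coefficients."""
        return mean_face + coeffs @ eigenfaces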

72
Is this a face or not?
73
Recognition with Eigenfaces
  • Algorithm
  • Process the image database (set of images with
    labels)
  • Run PCA to compute the eigenfaces
  • Calculate the K coefficients for each image
  • Given a new image (to be recognized) x, calculate
    its K coefficients
  • Detect whether x is a face
  • If it is a face, who is it?
  • Find the closest labeled face in the database
    (nearest neighbor in K-dimensional space; a sketch
    follows below)
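A minimal sketch of the recognition step described above: project the
query into face space and return the label of the nearest stored face,
with a simple distance threshold standing in for the face/identity
test (the threshold and names are illustrative assumptions):

    import numpy as np

    def recognize(query, mean_face, eigenfaces, db_coeffs, db_labels, threshold):
        """db_coeffs: (num_db, K) eigenface coefficients of labeled faces.
        Returns the best-matching label, or None if no match is close enough."""
        w = eigenfaces @ (query - mean_face)           # project into face space
        dists = np.linalg.norm(db_coeffs - w, axis=1)  # distance to each stored face
        best = int(np.argmin(dists))
        if dists[best] > threshold:
            return None                                # reject: too far from database
        return db_labels[best]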

74
Key Property of Eigenspace Representation
  • Given two images that were used to construct the
    eigenspace, consider their eigenspace projections.
  • Then the distance between the two projections is
    approximately equal to the correlation between the
    two images; that is, distance in eigenspace
    approximates image correlation.

75
Choosing the Dimension K
(Figure: plot of the eigenvalue spectrum, showing how the eigenvalues decay)
  • How many eigenfaces to use?
  • Look at the decay of the eigenvalues
  • the eigenvalue tells you the amount of variance
    in the direction of that eigenface
  • ignore eigenfaces with low variance

76
Papers
77
(No Transcript)
78
More Problems: Outliers
  • Sample outliers
  • Intra-sample outliers
  • Outliers need to be explicitly rejected before or
    while computing PCA.
  [De la Torre and Black]
79
Robustness to Intra-sample outliers
RPCA: Robust PCA [De la Torre and Black]
80
Robustness to Sample Outliers
(Figure panels: Original, PCA, RPCA, Outliers)
Finding outliers: tracking moving objects
81
Research Questions
  • Does PCA encode information related to gender,
    ethnicity, age, and identity efficiently?
  • What information does PCA encode?
  • Are there components (features) of PCA that
    encode multiple properties?

82
PCA
  • The aim of PCA is a linear reduction of
    D-dimensional data to d-dimensional data (d < D),
    while preserving as much information in the data
    as possible.
  • Linear functions:
  • y_1 = w_1 X
  • y_2 = w_2 X
  • ...
  • y_d = w_d X
  • Y = W X
  • X: inputs; Y: outputs (components); W:
    eigenvectors (eigenfaces, basis vectors)

83
How many components?
  • Usual choice: consider the first d PCs which
    account for some percentage, usually above 90%,
    of the cumulative variance of the data.
  • This is disadvantageous if the last components
    are interesting.

84
Dataset
Property    No. Categories   Category                               No. Faces
Gender      2                Male                                   1603
Gender      2                Female                                 1067
Ethnicity   3                Caucasian                              1758
Ethnicity   3                African                                320
Ethnicity   3                East Asian                             363
Age         5                20-29                                  665
Age         5                30-39                                  1264
Age         5                40-49                                  429
Age         5                50-59                                  206
Age         5                60+                                    106
Identity    358              Individuals with 3 or more examples    1161
  • A subset of FERET dataset
  • 2670 grey scale frontal face images
  • Rich in variety: face images vary in pose,
    background lighting, presence or absence of
    glasses, and slight changes in expression

85
Dataset
  • Each image is pre-processed to a 65 X 75
    resolution.
  • Aligned based on eye locations
  • Cropped such that little or no hair information
    is available
  • Histogram equalisation is applied to reduce
    lighting effects

86
Does PCA efficiently represent information in
face images?
  • Images of 65 x 75 resolution lead to a
    dimensionality of 4875.
  • The first 350 components accounted for 90% of the
    variance of the data.
  • Each face is thus represented using 350
    components instead of 4875 dimensions.
  • Classification employs 5-fold cross-validation,
    with 80% of the faces in each category used for
    training and 20% for testing.
  • For identity recognition, the leave-one-out
    method is used.
  • LDA is performed on the PCA data.
  • The Euclidean measure is used for classification
    (a sketch of this pipeline follows the results
    table below).

Property    Classification accuracy (%)
Gender      86.4
Ethnicity   81.6
Age         91.5
Identity    90
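A hedged sketch of the evaluation pipeline described above (PCA to 350
components, LDA on the PCA projections, nearest-neighbour
classification with Euclidean distance, 5-fold cross-validation). It
uses scikit-learn purely for brevity; that library choice, and all
names, are assumptions of this sketch rather than details from the
slides:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score, StratifiedKFold

    def evaluate_property(X, y, n_pca=350):
        """X: (num_faces, 4875) flattened 65x75 images; y: labels for one
        property (e.g., gender). Returns mean 5-fold accuracy."""
        model = make_pipeline(
            PCA(n_components=n_pca),                      # 350 PCA components
            LinearDiscriminantAnalysis(),                 # LDA on the PCA data
            KNeighborsClassifier(n_neighbors=1, metric="euclidean"),
        )
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
        return cross_val_score(model, X, y, cv=cv).mean()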
87
What information does PCA encode? Gender
  • Gender-encoding power is estimated using LDA
  • The 3rd component carries the highest
    gender-encoding power, followed by the 4th
    component
  • All important components are among the first 50
    components

88
What information does PCA encode? Gender
  • Reconstructed images from the altered components:
    (a) third and (b) fourth components. The
    components are progressively varied from -6 S.D.
    (extreme left) to +6 S.D. (extreme right) in steps
    of 2 S.D.
  • The third component encodes information related to
    the complexion, the length of the nose, the
    presence or absence of hair on the forehead, and
    the texture around the mouth region.
  • The fourth component encodes information related
    to eyebrow thickness and the presence or absence
    of a smiling expression.

89
Gender
  • (a) Face examples with the first two being female
    and the next two being male faces. (b)
    Reconstructed faces of (a) using the top 20
    gender important components. (c) Reconstructed
    faces of (a) using all components, except the top
    20 gender important components.

90
What information does PCA encode? Ethnicity
  • The 6th component carries the highest
    ethnicity-encoding power, followed by the 15th
    component
  • All ethnicity-important components are among the
    first 50 components

91
Ethnicity
  • Reconstructed images from the altered components:
    (a) 6th and (b) 15th components. The components
    are progressively varied from -6 S.D. (extreme
    left) to +6 S.D. (extreme right) in steps of
    2 S.D.
  • The 6th component encodes information related to
    complexion and the broadness and length of the
    nose.
  • The 15th component encodes information related to
    the length of the nose, complexion, and the
    presence or absence of a smiling expression.

92
What information does PCA encode? Age
  • Age: the 20-39 and 50-60 age groups are termed
    young and old, respectively.
  • The 10th component is found to be the most
    important for age.

Reconstructed images from the altered tenth component. The component
is progressively varied from -6 S.D. (extreme left) to +6 S.D.
(extreme right) in steps of 2 S.D.
93
What information does PCA encode? Identity
  • Many components are found to be important for
    identity. However, their importance magnitude is
    small.
  • These components are widely distributed and not
    restricted to the first 50 components

94
Can a single component encode multiple properties?
  • A grey beard indicates that the person is male
    and also, most probably, old.
  • Since all the important components for gender,
    ethnicity, and age are among the first 50
    components, there are overlapping components.
  • One example is the 3rd component, which is found
    to be the most important for gender and the second
    most important for age.

95
Can a single component encode multiple properties?
  • Normal distribution plots of the (a) third and
    (b) fourth components for the male and female
    classes of the young and old age groups.

96
Summary
  • PCA encodes face image properties such as gender,
    ethnicity, age, and identity efficiently.
  • Very few components are required to encode
    properties such as gender, ethnicity, and age, and
    these components are amongst the first few
    components, which capture a large part of the
    variance of the data. A large number of components
    are required to encode identity, and these
    components are widely distributed.
  • There may be components which encode multiple
    properties.

97
Principal Component Analysis (PCA)
  • PCA and classification
  • PCA is not always an optimal
    dimensionality-reduction procedure for
    classification purposes.
  • Multiple classes and PCA
  • Suppose there are C classes in the training data.
  • PCA is based on the sample covariance which
    characterizes the scatter of the entire data set,
    irrespective of class-membership.
  • The projection axes chosen by PCA might not
    provide good discrimination power.

98
Linear Discriminant Analysis (LDA)
  • What is the goal of LDA?
  • Perform dimensionality reduction while
    preserving as much of the class discriminatory
    information as possible.
  • Seeks to find directions along which the classes
    are best separated.
  • Takes into consideration not only the within-class
    scatter but also the between-class scatter.
  • More capable of distinguishing image variation
    due to identity from variation due to other
    sources such as illumination and expression.

99
Linear Discriminant Analysis (LDA)
100
Angiograph Image Enhancement
101
(No Transcript)
102
(No Transcript)
103
(No Transcript)
104
Webcamera Calibration
105
(No Transcript)
106
(No Transcript)
107
(No Transcript)
108
QUESTIONS
THANKS