1
Feature Extraction
2
Content
  • Principal Component Analysis (PCA)
  • PCA Calculation for Fewer-Sample Case
  • Factor Analysis
  • Fisher's Linear Discriminant Analysis
  • Multiple Discriminant Analysis

3
Feature Extraction
Principal Component Analysis (PCA)
4
Principal Component Analysis
  • It is a linear procedure to find the direction in
    input space where most of the energy of the input
    lies.
  • Feature Extraction
  • Dimension Reduction
  • It is also called the (discrete) Karhunen-Loève
    transform, or the Hotelling transform.

5
The Basic Concept
Assume data x (random vector) has zero mean.
PCA finds a unit vector w to reflect the largest
amount of variance of the data.
That is, w maximizes Var(wᵀx) subject to ‖w‖ = 1.
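Written out with the covariance matrix introduced on the next slide (a standard identity that uses the zero-mean assumption):

    \operatorname{Var}(w^{\mathsf T}x)
      = E\big[(w^{\mathsf T}x)^2\big]
      = w^{\mathsf T} E\big[x x^{\mathsf T}\big]\, w
      = w^{\mathsf T} C\, w,
    \qquad \text{subject to } \|w\| = 1.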
6
The Method
Covariance matrix: C = E[x xᵀ] (x is zero-mean).
Remark: C is symmetric and positive semidefinite.
7
The Method
Maximize wᵀ C w subject to wᵀ w = 1.
The method of Lagrange multipliers:
define L(w, λ) = wᵀ C w − λ(wᵀ w − 1).
The extreme point w satisfies ∂L/∂w = 0.
8
The Method
Maximize wᵀ C w subject to wᵀ w = 1.
Setting ∂L/∂w = 2 C w − 2 λ w = 0 gives C w = λ w.
9
Discussion
At extreme points, C w = λ w; that is, w is an
eigenvector of C and λ is its corresponding eigenvalue.
  • Let w1, w2, …, wd be the eigenvectors of C whose
    corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd.
  • They are called the principal components of C.
  • Their significance can be ordered according to
    their eigenvalues.
10
Discussion
At extreme points, C w = λ w.
  • Let w1, w2, …, wd be the eigenvectors of C whose
    corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd.
  • They are called the principal components of C.
  • Their significance can be ordered according to
    their eigenvalues.
  • Since C is symmetric and positive semidefinite, its
    eigenvectors are mutually orthogonal.
  • They, hence, form a basis of the feature space.
  • For dimensionality reduction, choose only a few of
    them.
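As a concrete sketch of the procedure above, a minimal NumPy implementation (the function name and the row-wise sample layout are illustrative assumptions):

    import numpy as np

    def pca(X, k):
        """Top-k principal components of zero-mean samples X (one sample per row)."""
        C = X.T @ X / X.shape[0]              # covariance matrix C = E[x x^T]
        eigvals, eigvecs = np.linalg.eigh(C)  # eigh suits the symmetric matrix C
        order = np.argsort(eigvals)[::-1]     # sort eigenvalues as λ1 ≥ λ2 ≥ ...
        W = eigvecs[:, order[:k]]             # principal components w1, ..., wk
        return W, X @ W                       # basis and the reduced-dimension projection

Keeping only the first k columns is exactly the dimensionality reduction described above.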

11
Applications
  • Image Processing
  • Signal Processing
  • Compression
  • Feature Extraction
  • Pattern Recognition

12
Example
Projecting the data onto the most significant
axis will facilitate classification.
This also achieves dimensionality reduction.
13
Issues
  • PCA is effective for identifying the multivariate
    signal distribution.
  • Hence, it is good for signal reconstruction.
  • But it may be inappropriate for pattern
    classification.

(Figure: the most significant component obtained using
PCA vs. the most significant component for
classification.)
14
Whitening
  • Whitening is a process that transforms the random
    vector x = (x1, x2, …, xn)ᵀ (assumed to be zero-mean)
    into z = (z1, z2, …, zn)ᵀ with zero mean and unit
    (identity) covariance.
  • z is said to be white or sphered.
  • This implies that all of its elements are
    uncorrelated.
  • However, this doesn't imply that its elements are
    independent.

15
Whitening Transform
Decompose Cx as Cx = E D Eᵀ (eigendecomposition).
Clearly, D is a diagonal matrix and E is an
orthonormal matrix.
Set V = D^(−1/2) Eᵀ.
Then V is a whitening transform, since
Cz = V Cx Vᵀ = D^(−1/2) Eᵀ (E D Eᵀ) E D^(−1/2) = I.
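A minimal NumPy sketch of this transform (assuming zero-mean samples stored one per row and a nonsingular covariance; names are illustrative):

    import numpy as np

    def whiten(X):
        """Whiten zero-mean samples X so the result has identity covariance."""
        C = X.T @ X / X.shape[0]             # covariance Cx
        d, E = np.linalg.eigh(C)             # Cx = E diag(d) E^T, d > 0 assumed
        V = np.diag(d ** -0.5) @ E.T         # whitening transform V = D^(-1/2) E^T
        return X @ V.T                       # rows are z = V x

As the next slide shows, UV for any orthonormal U whitens the data equally well.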
16
Whitening Transform
If V is a whitening transform and U is any
orthonormal matrix, show that UV, i.e., a rotation of
V, is also a whitening transform.
Proof) (UV) Cx (UV)ᵀ = U (V Cx Vᵀ) Uᵀ = U I Uᵀ = I.
17
Why Whitening?
  • With PCA, we usually choose several major
    eigenvectors as the basis for representation.
  • This basis is efficient for reconstruction, but
    may be inappropriate for other applications,
    e.g., classification.
  • By whitening, we can rotate the basis to get more
    interesting features.

18
Feature Extraction
PCA Calculation for Fewer-Sample Case
19
Complexity for PCA Calculation
  • Let C be of size n × n.
  • Time complexity by direct computation: O(n³).
  • Is there any efficient method when the number of
    samples N is much smaller than n?

20
PCA for Covariance Matrix from Fewer Samples
  • Consider N samples x1, x2, …, xN of the zero-mean
    random vector x ∈ ℝⁿ, with N ≪ n.
  • Define A = [x1 x2 ⋯ xN], an n × N matrix, so that
    C = (1/N) A Aᵀ.

21
PCA for Covariance Matrix from Fewer Samples
  • Define the N × N matrix T = (1/N) Aᵀ A.
  • Let v1, v2, …, vN be the orthonormal eigenvectors
    of T with corresponding eigenvalues λi, i.e.,
    T vi = λi vi.

22
PCA for Covariance Matrix from Fewer Samples
Then C (A vi) = (1/N) A Aᵀ A vi = A (T vi) = λi (A vi),
so A vi is an eigenvector of C with the same
eigenvalue λi.
23
PCA for Covariance Matrix from Fewer Samples
Define pi = A vi / ‖A vi‖, where
‖A vi‖² = viᵀ Aᵀ A vi = N λi.
24
PCA for Covariance Matrix from Fewer Samples
Define pi = A vi / √(N λi).
The pi are orthonormal eigenvectors of C with
eigenvalues λi.
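A NumPy sketch of this trick (the 1/N scaling follows the reconstruction above; the function name is illustrative, and k is assumed not to exceed the rank of A):

    import numpy as np

    def pca_few_samples(A, k):
        """Top-k eigenvectors of C = (1/N) A A^T when the sample count N << n.

        A : (n, N) matrix whose columns are the zero-mean samples.
        """
        N = A.shape[1]
        T = A.T @ A / N                      # small N x N matrix instead of n x n
        lam, V = np.linalg.eigh(T)           # T v_i = λ_i v_i
        order = np.argsort(lam)[::-1][:k]
        lam, V = lam[order], V[:, order]
        P = A @ V / np.sqrt(N * lam)         # p_i = A v_i / ||A v_i||, and ||A v_i||^2 = N λ_i
        return lam, P                        # columns of P: orthonormal eigenvectors of C

The cost is dominated by the N × N eigendecomposition rather than an n × n one.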
25
Feature Extraction
Factor Analysis
26
What is a Factor?
  • If several variables correlate highly, they might
    measure aspects of a common underlying dimension.
  • These dimensions are called factors.
  • Factors are classification axes along which the
    measures can be plotted.
  • The greater the loading of variables on a factor,
    the more that factor can explain
    intercorrelations between those variables.

27
Graph Representation
28
What is Factor Analysis?
  • A method for investigating whether a number of
    variables of interest, Y1, Y2, …, Yn, are linearly
    related to a smaller number of unobservable
    factors F1, F2, …, Fm.
  • Used for data reduction and summarization.
  • A statistical approach to analyzing
    interrelationships among a large number of
    variables and to explaining these variables in
    terms of their common underlying dimensions
    (factors).

29
Example
What factors influence students' grades?
(Diagram: the observable grade data are explained by
unobservable factors such as quantitative skill and
verbal skill.)
30
The Model
The model: y = B f + ε
y: observation vector
B: factor-loading matrix
f: factor vector
ε: Gaussian noise
31
The Model
y: observation vector; B: factor-loading matrix;
f: factor vector; ε: Gaussian noise.
Assume E[f] = 0, E[f fᵀ] = I, E[ε] = 0,
E[ε εᵀ] = Ψ (a diagonal matrix), and that f and ε are
uncorrelated.
32
The Model
Cy = E[y yᵀ] = B Bᵀ + Ψ can be obtained from the model.
Cy can also be estimated from the data (sample
covariance).
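A short check of where this comes from, assuming the standard factor-analysis conditions (zero means, E[f fᵀ] = I, E[ε εᵀ] = Ψ diagonal, f and ε uncorrelated):

    C_y = E\big[y y^{\mathsf T}\big]
        = E\big[(Bf+\varepsilon)(Bf+\varepsilon)^{\mathsf T}\big]
        = B\,E\big[f f^{\mathsf T}\big]\,B^{\mathsf T} + E\big[\varepsilon\varepsilon^{\mathsf T}\big]
        = B B^{\mathsf T} + \Psi.

The cross terms vanish because f and ε are uncorrelated and zero-mean.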
33
The Model
Var(yi) = Σk bik² + ψi
Communality Σk bik²: the part explained by the common
factors.
Specific variance ψi: the unexplained part.
34
Example
35
Goal
Our goal is to minimize the difference between the
model covariance B Bᵀ + Ψ and the covariance estimated
from the data.
Hence, we seek B and Ψ such that Cy ≈ B Bᵀ + Ψ.
36
Uniqueness
Is the solution unique?
No. There are an infinite number of solutions: if B is
a solution and T is an orthonormal transformation
(rotation), then BT is also a solution, since
(BT)(BT)ᵀ = B T Tᵀ Bᵀ = B Bᵀ.
37
Example
Different factor-loading matrices B can reproduce the
same Cy.
Which one is better?
38
Example
Left: each factor has nonzero loadings for all
variables.
Right: each factor controls different variables.
39
The Method
  • Determine the first set of loadings using the
    principal component method.

40
Example
41
Factor Rotation
Factor rotation: the rotated loading matrix is B T,
where B is the factor-loading matrix and T is an
orthonormal rotation matrix.
42
Factor Rotation
Criteria
  • Varimax
  • Quartimax
  • Equimax
  • Orthomax
  • Oblimin

43
Varimax
Criterion: maximize the varimax objective V over the
rotated loadings B T, subject to the rotation T being
orthonormal.
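For reference, the commonly used (raw) varimax objective over the entries of the rotated loading matrix is given below; this is the usual textbook form, not necessarily the exact normalization used on the original slide:

    V \;=\; \sum_{k=1}^{m}\left[\frac{1}{n}\sum_{j=1}^{n}\tilde b_{jk}^{\,4}
        \;-\;\Big(\frac{1}{n}\sum_{j=1}^{n}\tilde b_{jk}^{\,2}\Big)^{2}\right],
    \qquad \tilde B = B\,T,\quad T^{\mathsf T}T = I.

That is, V sums the variance of the squared loadings within each column, so maximizing it drives each factor toward a few large loadings and many near-zero ones, the pattern preferred on slide 38.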
44
Varimax
Criterion: maximize V subject to Tᵀ T = I.
Construct the Lagrangian with multipliers for the
orthonormality constraints.
45
Varimax
Notation used in the derivation: bjk, cjk, dk.
46
Varimax
Define bk, the kth column of the rotated loading
matrix B T.
47
Varimax
bk is the kth column of B T.
48
Varimax
Goal: V reaches its maximum once the iterative update
of the rotation T converges.
49
Varimax
Goal
  • Initially, obtain B0 by whatever method, e.g.,
    PCA, and set T0 as the initial approximation of
    the rotation matrix, e.g., T0 = I.

50
Varimax
Goal
Pre-multiplying each side by its transpose.
51
Varimax
Criterion
Maximize
52
Varimax
Maximize
Let
53
Feature Extraction
Fisher's Linear Discriminant Analysis
54
Main Concept
  • PCA seeks directions that are efficient for
    representation.
  • Discriminant analysis seeks directions that are
    efficient for discrimination.

55
Classification Efficiencies on Projections
56
Criterion – Two-Category Case
Project the samples onto a direction w with ‖w‖ = 1.
57
Scatter
Between-Class Scatter Matrix:
S_B = (m1 − m2)(m1 − m2)ᵀ, where m1 and m2 are the
class means.
Between-class scatter of the projections onto w
(with ‖w‖ = 1): wᵀ S_B w.
The larger, the better.
58
Scatter
Between-Class Scatter Matrix:
S_B = (m1 − m2)(m1 − m2)ᵀ
Within-Class Scatter Matrix:
S_W = Σ_{x∈D1} (x − m1)(x − m1)ᵀ + Σ_{x∈D2} (x − m2)(x − m2)ᵀ
Within-class scatter of the projections onto w
(with ‖w‖ = 1): wᵀ S_W w.
The smaller, the better.
59
Goal
Between-Class Scatter Matrix: S_B
Within-Class Scatter Matrix: S_W
Define the generalized Rayleigh quotient
J(w) = (wᵀ S_B w) / (wᵀ S_W w).
The length of w is immaterial.
60
Generalized Eigenvector
To maximize J(w), w is the generalized eigenvector
associated with the largest generalized eigenvalue.
With the generalized Rayleigh quotient
J(w) = (wᵀ S_B w) / (wᵀ S_W w), that is,
S_B w = λ S_W w, or, if S_W is nonsingular,
S_W⁻¹ S_B w = λ w.
The length of w is immaterial.
61
Proof
To maximize J(w), w is the generalized eigenvector
associated with the largest generalized eigenvalue.
Set ∇J(w) = 0:
∇J(w) = [ 2 S_B w (wᵀ S_W w) − 2 S_W w (wᵀ S_B w) ] / (wᵀ S_W w)² = 0.
That is, S_B w = J(w) S_W w, i.e., S_B w = λ S_W w with
λ = J(w), so the maximum of J is attained at the
largest generalized eigenvalue. ∎
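A minimal two-class NumPy sketch of this result (array names and the row-wise layout are illustrative):

    import numpy as np

    def fisher_direction(X1, X2):
        """Fisher discriminant direction for two classes (returned with unit length)."""
        m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
        S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter
        w = np.linalg.solve(S_W, m1 - m2)    # S_B w ∝ (m1 − m2), so w ∝ S_W^(-1)(m1 − m2)
        return w / np.linalg.norm(w)         # the length of w is immaterial

Because S_B = (m1 − m2)(m1 − m2)ᵀ has rank one, the generalized eigenproblem reduces to this closed form and no explicit eigendecomposition is needed in the two-class case.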
62
Example
63
Feature Extraction
Multiple Discriminant Analysis
64
Generalization of Fisher's Linear Discriminant
For the c-class problem, we seek a (c−1)-dimensional
projection for efficient discrimination.
65
Scatter Matrices – Feature Space
Total Scatter Matrix: S_T = Σx (x − m)(x − m)ᵀ, where
m is the mean of all samples.
Within-Class Scatter Matrix:
S_W = Σi Σ_{x∈Di} (x − mi)(x − mi)ᵀ, summing over the
c classes.
Between-Class Scatter Matrix:
S_B = Σi ni (mi − m)(mi − m)ᵀ, so that S_T = S_W + S_B.
66
The (c−1)-Dim Projection
The projection space will be described using a
d × (c−1) matrix W; the projected features are
y = Wᵀ x.
67
Scatter Matrices – Projection Space
After the projection y = Wᵀ x:
Total Scatter Matrix: Wᵀ S_T W
Within-Class Scatter Matrix: Wᵀ S_W W
Between-Class Scatter Matrix: Wᵀ S_B W
68
Criterion
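A standard form of this criterion, consistent with the scatter matrices above (given here as the usual textbook choice, not necessarily the slide's exact expression):

    J(W) \;=\; \frac{\big|\,W^{\mathsf T} S_B\, W\,\big|}{\big|\,W^{\mathsf T} S_W\, W\,\big|},

maximized by taking the columns of W to be the generalized eigenvectors of S_B w = λ S_W w associated with the c − 1 largest generalized eigenvalues.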