1
Feature Extraction
2
Content
  • Principal Component Analysis (PCA)
  • PCA Calculation for Fewer-Sample Case
  • Factor Analysis
  • Fisher's Linear Discriminant Analysis
  • Multiple Discriminant Analysis

3
Feature Extraction
Principal Component Analysis (PCA)
4
Principal Component Analysis
  • It is a linear procedure to find the direction in
    input space where most of the energy of the input
    lies.
  • Feature Extraction
  • Dimension Reduction
  • It is also called the (discrete) Karhunen-Loève
    transform, or the Hotelling transform.

5
The Basis Concept
Assume the data x (a random vector) has zero mean.
PCA finds a unit vector w that captures the largest
amount of variance of the data.
That is, w maximizes E[(w^T x)^2] subject to ||w|| = 1.
6
The Method
Covariance matrix: C = E[x x^T] (x has zero mean).
Remark: C is symmetric and positive semidefinite.
7
The Method
Maximize w^T C w subject to w^T w = 1.
The method of Lagrange multipliers:
Define L(w, λ) = w^T C w - λ(w^T w - 1).
The extreme point, say w, satisfies ∂L/∂w = 0.
8
The Method
Maximize w^T C w subject to w^T w = 1.
Setting ∂L/∂w = 2Cw - 2λw = 0 gives Cw = λw.
9
Discussion
At extreme points, Cw = λw; that is, w is an
eigenvector of C, and λ is its corresponding
eigenvalue.
  • Let w1, w2, …, wd be the eigenvectors of C whose
    corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd.
  • They are called the principal components of C.
  • Their significance can be ordered according to
    their eigenvalues.
10
Discussion
At extreme points, Cw = λw.
  • Let w1, w2, …, wd be the eigenvectors of C whose
    corresponding eigenvalues are λ1 ≥ λ2 ≥ … ≥ λd.
  • They are called the principal components of C.
  • Their significance can be ordered according to
    their eigenvalues.
  • Since C is symmetric and positive semidefinite,
    its eigenvectors are mutually orthogonal.
  • They, hence, form a basis of the feature space.
  • For dimensionality reduction, choose only a few of
    them (see the sketch below).

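The whole procedure above (form the covariance matrix, take its eigenvectors, keep the leading ones, project) can be summarized in a few lines of NumPy. This is a minimal sketch and not code from the slides; the function name pca and the choice of k are illustrative.

```python
import numpy as np

def pca(X, k):
    """X: (N, n) data matrix, one sample per row.
    Returns (W, Z): W has the k leading principal components as columns,
    Z contains the data projected onto them (dimensionality reduction)."""
    Xc = X - X.mean(axis=0)               # center the data (PCA assumes zero mean)
    C = Xc.T @ Xc / Xc.shape[0]           # covariance matrix C = E[x x^T]
    lams, E = np.linalg.eigh(C)           # C is symmetric, so use eigh
    order = np.argsort(lams)[::-1]        # sort eigenvalues in decreasing order
    W = E[:, order[:k]]                   # k most significant eigenvectors
    Z = Xc @ W                            # reduced-dimension features
    return W, Z

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
W, Z = pca(X, k=2)
print(W.shape, Z.shape)                   # (5, 2) (200, 2)
```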
11
Applications
  • Image Processing
  • Signal Processing
  • Compression
  • Feature Extraction
  • Pattern Recognition

12
Example
Projecting the data onto the most significant
axis will facilitate classification.
This also achieves dimensionality reduction.
13
Issues
(Figure: the most significant component obtained using
PCA versus the most significant component for
classification.)
  • PCA is effective for identifying the multivariate
    signal distribution.
  • Hence, it is good for signal reconstruction.
  • But it may be inappropriate for pattern
    classification.

14
Whitening
  • Whitening is a process that transforms a random
    vector, say x = (x1, x2, …, xn)^T (assumed to have
    zero mean), into z = (z1, z2, …, zn)^T with zero
    mean and identity covariance.
  • z is said to be white or sphered.
  • This implies that all of its elements are
    uncorrelated and have unit variance.
  • However, this does not imply that its elements are
    independent.

15
Whitening Transform
Decompose Cx as Cx = E D E^T.
Clearly, D is a diagonal matrix and E is an
orthonormal matrix.
Set V = D^(-1/2) E^T.
Then V is a whitening transform: for z = Vx,
Cz = V Cx V^T = D^(-1/2) E^T (E D E^T) E D^(-1/2) = I.
16
Whitening Transform
If V is a whitening transform and U is any
orthonormal matrix, show that UV, i.e., V followed
by a rotation, is also a whitening transform.
Proof) (UV) Cx (UV)^T = U (V Cx V^T) U^T = U I U^T = I.
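A short NumPy sketch of this whitening transform: decompose the sample covariance as Cx = E D E^T, set V = D^(-1/2) E^T, and check that z = Vx is white and that any further rotation UV keeps it white. The mixing matrix, seeds, and names are illustrative, not from the slides.

```python
import numpy as np

def whitening_transform(X):
    """X: (N, n) zero-mean data, one sample per row.
    Returns the whitening matrix V = D^(-1/2) E^T, where Cx = E D E^T."""
    Cx = X.T @ X / X.shape[0]              # sample covariance of x
    d, E = np.linalg.eigh(Cx)              # Cx = E D E^T
    return np.diag(1.0 / np.sqrt(d)) @ E.T # V = D^(-1/2) E^T

A = np.array([[2.0, 0.5, 0.0],             # arbitrary mixing matrix for the demo
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.5]])
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3)) @ A
X -= X.mean(axis=0)
V = whitening_transform(X)
Z = X @ V.T                                      # z = V x for each sample
print(np.round(Z.T @ Z / Z.shape[0], 2))         # ~ identity: z is white

U, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # any orthonormal (rotation) matrix
Z2 = X @ (U @ V).T
print(np.round(Z2.T @ Z2 / Z2.shape[0], 2))      # still ~ identity: UV also whitens
```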
17
Why Whitening?
  • With PCA, we usually choose several major
    eigenvectors as the basis for representation.
  • This basis is efficient for reconstruction, but
    may be inappropriate for other applications,
    e.g., classification.
  • By whitening, we can rotate the basis to get more
    interesting features.

18
Feature Extraction
PCA Calculation for Fewer-Sample Case
19
Complexity for PCA Calculation
  • Let C be of size n × n.
  • Time complexity by direct computation: O(n³).
  • Is there an efficient method when the number of
    samples N is much smaller than n (N ≪ n)?

20
PCA for Covariance Matrix from Fewer Samples
  • Consider N samples x1, …, xN ∈ R^n (zero mean),
    with N ≪ n.
  • Define X = [x1 x2 … xN] (an n × N matrix) and
    C = (1/N) X X^T.

21
PCA for Covariance Matrix from Fewer Samples
  • Define the N × N matrix T = (1/N) X^T X.
  • Let e1, …, eN be the orthonormal eigenvectors of
    T with corresponding eigenvalues λi, i.e.,
    T ei = λi ei.

22
PCA for Covariance Matrix from Fewer Samples
Then C(X ei) = (1/N) X X^T X ei = X(T ei) = λi (X ei),
so X ei is an eigenvector of C with eigenvalue λi.
23
PCA for Covariance Matrix from Fewer Samples
Define pi = X ei / ||X ei||, where
||X ei||^2 = ei^T X^T X ei = N λi.
24
PCA for Covariance Matrix from Fewer Samples
Define pi = X ei / √(N λi), i = 1, …, N.
The pi are orthonormal eigenvectors of C with
eigenvalues λi.
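The point of the trick above is that the n × n covariance matrix is never formed: we eigendecompose the small N × N matrix T = (1/N) X^T X and map its eigenvectors back through X. A minimal NumPy sketch under the 1/N scaling used in the reconstruction above; the function name and toy data are illustrative.

```python
import numpy as np

def pca_few_samples(X):
    """X: (n, N) matrix of N zero-mean samples as columns, with N << n.
    Returns (P, lams): orthonormal eigenvectors of C = (1/N) X X^T as the
    columns of P, and their eigenvalues, from the small N x N problem."""
    n, N = X.shape
    T = X.T @ X / N                          # N x N matrix, cheap when N << n
    lams, E = np.linalg.eigh(T)
    order = np.argsort(lams)[::-1]
    lams, E = lams[order], E[:, order]
    keep = lams > 1e-10 * lams.max()         # drop numerically zero eigenvalues
    lams, E = lams[keep], E[:, keep]
    P = X @ E / np.sqrt(N * lams)            # p_i = X e_i / sqrt(N * lam_i)
    return P, lams

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 20))              # n = 2000 dimensions, N = 20 samples
X -= X.mean(axis=1, keepdims=True)           # make each dimension zero mean
P, lams = pca_few_samples(X)
print(np.allclose(P.T @ P, np.eye(P.shape[1])))   # orthonormal: True
Cp0 = X @ (X.T @ P[:, 0]) / X.shape[1]            # C p_0 without forming C
print(np.allclose(Cp0, lams[0] * P[:, 0]))        # eigenvector of C: True
```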
25
Feature Extraction
Factor Analysis
26
What is a Factor?
  • If several variables correlate highly, they might
    measure aspects of a common underlying dimension.
  • These dimensions are called factors.
  • Factors are classification axes along which the
    measures can be plotted.
  • The greater the loading of variables on a factor,
    the more that factor can explain
    intercorrelations between those variables.

27
Graph Representation
28
What is Factor Analysis?
  • A method for investigating whether a number of
    variables of interest Y1, Y2, …, Yn are linearly
    related to a smaller number of unobservable
    factors F1, F2, …, Fm.
  • Used for data reduction and summarization.
  • A statistical approach to analyzing the
    interrelationships among a large number of
    variables in order to explain these variables in
    terms of their common underlying dimensions
    (factors).

29
Example
What factors influence students' grades?
Unobservable factors: quantitative skill, verbal skill.
Observable data: the grades.
30
The Model
y = B f + ε
y: observation vector
B: factor-loading matrix
f: factor vector
ε: Gaussian-noise vector
31
The Model
y = B f + ε, with Cov(f) = I and Cov(ε) = Ψ (diagonal)
y: observation vector
B: factor-loading matrix
f: factor vector
ε: Gaussian-noise vector
32
The Model
Cy = E[y y^T] = B B^T + Ψ
B B^T + Ψ: can be obtained from the model.
Cy: can be estimated from the data.
33
The Model
Var(yi) = Σj bij^2 + ψi
Σj bij^2: communality (explained by the factors)
ψi: specific variance (unexplained)
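A small numeric illustration of this decomposition (the loadings and specific variances below are made up for the example): the variance of each observed variable splits into its communality, the sum of its squared loadings, plus its specific variance.

```python
import numpy as np

# Hypothetical loadings B (4 variables, 2 factors) and specific variances psi.
B = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.8]])
psi = np.array([0.15, 0.20, 0.40, 0.30])

Cy = B @ B.T + np.diag(psi)            # model covariance: Cy = B B^T + Psi
communality = np.sum(B**2, axis=1)     # explained part of each Var(y_i)
print(np.allclose(np.diag(Cy), communality + psi))   # Var(y_i) = communality + psi_i: True
```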
34
Example
35
Goal
Our goal is to minimize the difference between the
sample covariance Cy and the model covariance B B^T + Ψ.
Hence, we choose B and Ψ so that Cy ≈ B B^T + Ψ.
36
Uniqueness
Is the solution unique?
No; there are an infinite number of solutions,
since if B is a solution and T is an orthonormal
transformation (rotation), then BT is also a
solution: (BT)(BT)^T = B T T^T B^T = B B^T.
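A quick numeric check of this argument (the loading matrix below is arbitrary): rotating B by any orthonormal T leaves B B^T, and hence the implied Cy, unchanged.

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.normal(size=(6, 2))                      # arbitrary loading matrix
T, _ = np.linalg.qr(rng.normal(size=(2, 2)))     # random orthonormal 2 x 2 matrix
print(np.allclose(B @ B.T, (B @ T) @ (B @ T).T)) # True: BT explains the same Cy
```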
37
Example
Two factor-loading matrices that reproduce the same Cy.
Which one is better?
38
Example
Left: each factor has nonzero loadings for all
variables.
Right: each factor controls different variables.
39
The Method
  • Determine the first set of loadings using the
    principal component method, as sketched below.

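A minimal NumPy sketch of that principal component method (the function name factor_loadings_pc and the toy covariance are illustrative, not from the slides): take the m largest eigenpairs of Cy, scale the eigenvectors by the square roots of their eigenvalues to get B, and read the specific variances off the diagonal of Cy - B B^T.

```python
import numpy as np

def factor_loadings_pc(Cy, m):
    """Cy: (n, n) sample covariance (or correlation) matrix.
    Returns (B, psi): the n x m loading matrix and the specific variances."""
    lams, E = np.linalg.eigh(Cy)
    top = np.argsort(lams)[::-1][:m]          # indices of the m largest eigenvalues
    B = E[:, top] * np.sqrt(lams[top])        # B = E_m * Lambda_m^(1/2)
    psi = np.diag(Cy - B @ B.T)               # specific (unexplained) variances
    return B, psi

# Toy covariance generated from a true 2-factor model, for illustration only.
rng = np.random.default_rng(3)
B_true = rng.normal(size=(6, 2))
Cy = B_true @ B_true.T + np.diag(rng.uniform(0.1, 0.5, size=6))
B, psi = factor_loadings_pc(Cy, m=2)
print(np.round(B, 2), np.round(psi, 2))
```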
40
Example
41
Factor Rotation
B* = B T
B: factor-loading matrix
T: rotation matrix
B*: rotated factor-loading matrix
42
Factor Rotation
Criteria
  • Varimax
  • Quartimax
  • Equimax
  • Orthomax
  • Oblimin

43
Varimax
Criterion
Maximize the varimax criterion: the sum over factors
of the variance of the squared loadings in each
column of B* = B T.
Subject to T being orthonormal (T^T T = I).
44
Varimax
Criterion
Maximize the varimax criterion subject to T being
orthonormal (T^T T = I).
Construct the Lagrangian.
45
Varimax
(Derivation in terms of the quantities bjk, cjk, and dk.)
46
Varimax
Define b*k = B tk, where b*k is the kth column of
B* = B T and tk is the kth column of T.
47
Varimax
(b*k = B tk is the kth column of B* = B T.)
48
Varimax
Goal
The criterion reaches its maximum once the
stationarity condition derived above is satisfied.
49
Varimax
Goal
  • Initially,
  • obtain B0 by whatever method, e.g., PCA;
  • set T0 as an initial approximation of the rotation
    matrix, e.g., T0 = I.

50
Varimax
Goal
Pre-multiplying each side by its transpose.
51
Varimax
Criterion
Maximize
52
Varimax
Maximize
Let
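The slides derive the rotation analytically via a Lagrangian; in practice the varimax criterion is usually maximized with the standard iterative, SVD-based (Kaiser-style) algorithm sketched below. This is a common implementation of varimax, not necessarily the exact update derived on these slides; names and the toy loading matrix are illustrative.

```python
import numpy as np

def varimax(B, max_iter=100, tol=1e-6):
    """Rotate the loading matrix B (p x m) to maximize the varimax criterion.
    Returns (B @ T, T) with T orthonormal."""
    p, m = B.shape
    T = np.eye(m)
    d_old = 0.0
    for _ in range(max_iter):
        L = B @ T                                        # rotated loadings B* = B T
        # Kaiser-style target: favor large variance of squared loadings per column.
        G = B.T @ (L**3 - L @ np.diag((L**2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt                                       # nearest orthonormal matrix
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:           # criterion stopped improving
            break
        d_old = d
    return B @ T, T

B0 = np.array([[0.83, 0.32], [0.78, 0.41], [0.65, 0.40],
               [0.31, 0.79], [0.25, 0.72], [0.42, 0.61]])
B_rot, T = varimax(B0)
print(np.round(B_rot, 2))                 # loadings pushed toward "simple structure"
print(np.allclose(T.T @ T, np.eye(2)))    # T is orthonormal: True
```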
53
Feature Extraction
Fisher's Linear Discriminant Analysis
54
Main Concept
  • PCA seeks directions that are efficient for
    representation.
  • Discriminant analysis seeks directions that are
    efficient for discrimination.

55
Classification Efficiencies on Projections
56
Criterion: Two-Category Case
(Figure: samples from two classes projected onto a
unit vector w, ||w|| = 1.)
57
Scatter
Between-Class Scatter Matrix: SB = (m1 - m2)(m1 - m2)^T
Between-class scatter of the projection onto w
(||w|| = 1): w^T SB w = (w^T m1 - w^T m2)^2.
The larger the better.
58
Scatter
Between-Class Scatter Matrix: SB = (m1 - m2)(m1 - m2)^T
Within-Class Scatter Matrix: SW = S1 + S2, where
Si = Σ_{x in class i} (x - mi)(x - mi)^T.
Within-class scatter of the projection: w^T SW w.
The smaller the better.
59
Goal
Define the generalized Rayleigh quotient
J(w) = (w^T SB w) / (w^T SW w).
The goal is to maximize J(w); the length of w is
immaterial.
60
Generalized Eigenvector
To maximize J(w) = (w^T SB w) / (w^T SW w), w is the
generalized eigenvector associated with the largest
generalized eigenvalue of the pair (SB, SW).
That is, SB w = λ SW w, or SW^(-1) SB w = λ w.
The length of w is immaterial.
61
Proof
To maximize J(w) = (w^T SB w) / (w^T SW w),
set the gradient of J(w) to zero:
(w^T SW w) SB w - (w^T SB w) SW w = 0.
That is, SB w = J(w) SW w,
or SW^(-1) SB w = λ w with λ = J(w).
Hence, the maximizer is the generalized eigenvector
with the largest generalized eigenvalue.
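Since SB w is always parallel to (m1 - m2), the generalized eigenvector above has the closed form w ∝ SW^(-1)(m1 - m2). A minimal NumPy sketch of the two-class Fisher discriminant; the function name and toy data are illustrative.

```python
import numpy as np

def fisher_lda(X1, X2):
    """X1, X2: (N1, n) and (N2, n) samples of the two classes.
    Returns the unit-length discriminant direction w."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)          # scatter of class 1
    S2 = (X2 - m2).T @ (X2 - m2)          # scatter of class 2
    Sw = S1 + S2                          # within-class scatter matrix
    w = np.linalg.solve(Sw, m1 - m2)      # w proportional to Sw^(-1)(m1 - m2)
    return w / np.linalg.norm(w)          # the length of w is immaterial

rng = np.random.default_rng(4)
X1 = rng.normal(loc=[0.0, 0.0], scale=[3.0, 0.5], size=(100, 2))
X2 = rng.normal(loc=[1.0, 1.5], scale=[3.0, 0.5], size=(100, 2))
w = fisher_lda(X1, X2)
print(np.round(w, 3))                     # projections y = w^T x separate the classes
```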
62
Example
63
Feature Extraction
Multiple Discriminant Analysis
64
Generalization of Fisher's Linear Discriminant
For the c-class problem, we seek a
(c-1)-dimensional projection for efficient
discrimination.
65
Scatter Matrices: Feature Space
Total Scatter Matrix: ST = Σ_x (x - m)(x - m)^T
Within-Class Scatter Matrix:
SW = Σ_i Σ_{x in class i} (x - mi)(x - mi)^T
Between-Class Scatter Matrix:
SB = Σ_i ni (mi - m)(mi - m)^T
(Here m is the total mean, mi and ni the mean and
size of class i; ST = SW + SB.)
66
The (c-1)-Dim Projection
The projection space will be described using a
d × (c-1) matrix W; the projected features are
y = W^T x.
67
Scatter Matrices: Projection Space
For the projected features y = W^T x:
Total Scatter Matrix: W^T ST W
Within-Class Scatter Matrix: W^T SW W
Between-Class Scatter Matrix: W^T SB W
68
Criterion
J(W) = |W^T SB W| / |W^T SW W|
The optimal W consists of the generalized eigenvectors
of SB w = λ SW w associated with the (c-1) largest
generalized eigenvalues.
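A minimal sketch of multiple discriminant analysis under this criterion: build SW and SB, solve the generalized symmetric eigenproblem SB w = λ SW w (here via scipy.linalg.eigh, assuming SW is nonsingular), and keep the eigenvectors with the c-1 largest eigenvalues. The function name and toy data are illustrative.

```python
import numpy as np
from scipy.linalg import eigh   # generalized symmetric eigensolver

def mda(X, labels):
    """X: (N, n) samples; labels: (N,) class labels with c distinct values.
    Returns W: an n x (c-1) projection matrix."""
    classes = np.unique(labels)
    n = X.shape[1]
    m = X.mean(axis=0)                                  # total mean
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for lab in classes:
        Xi = X[labels == lab]
        mi = Xi.mean(axis=0)
        Sw += (Xi - mi).T @ (Xi - mi)                   # within-class scatter
        Sb += len(Xi) * np.outer(mi - m, mi - m)        # between-class scatter
    lams, V = eigh(Sb, Sw)            # solves Sb w = lam * Sw w (Sw nonsingular)
    top = np.argsort(lams)[::-1][:len(classes) - 1]     # c-1 largest eigenvalues
    return V[:, top]

rng = np.random.default_rng(5)
means = ([0, 0, 0, 0], [3, 0, 0, 0], [0, 3, 0, 0])
X = np.vstack([rng.normal(loc=mu, size=(50, 4)) for mu in means])
labels = np.repeat([0, 1, 2], 50)
W = mda(X, labels)                    # 4-dim features -> 2-dim projection (c - 1 = 2)
Y = X @ W
print(W.shape, Y.shape)               # (4, 2) (150, 2)
```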