Title: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

1. Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis
- Haesun Park
- Georgia Institute of Technology, Atlanta, GA, USA
- (joint work with C. Park)
- KAIST, Korea, June 2007
2. Clustering

3. Clustering
- Grouping of data based on similarity measures

4. Classification
- Assign a class label to new, unseen data
5. Data Mining
- Mining or discovery of new information (patterns or rules) from large databases
- Pipeline: data preparation, data reduction, preprocessing
  - Dimension reduction, feature selection, feature extraction
- Tasks: association analysis, regression, probabilistic modeling, classification, clustering
6. Feature Extraction
- Optimal feature extraction
  - Reduce the dimensionality of the data space
  - Minimize the effects of redundant features and noise
- The curse of dimensionality grows with the number of features
- Apply feature extraction to new data, then a classifier to predict its class label
7. Linear Dimension Reduction
- Maximize class separability in the reduced-dimensional space

8. Linear Dimension Reduction
- Maximize class separability in the reduced-dimensional space

9. What if the data is not linearly separable?
- Nonlinear Dimension Reduction
10. Contents
- Linear Discriminant Analysis
- Nonlinear Dimension Reduction based on Kernel Methods
  - Nonlinear Discriminant Analysis
- Application to Fingerprint Classification
11. Linear Discriminant Analysis (LDA)
- For a given data set a1, ..., an with class centroids c1, ..., ck
- Within-class scatter matrix: Sw = Σ_i Σ_{aj in class i} (aj − ci)(aj − ci)^T
- trace(Sw) measures how tightly each class clusters around its centroid
12. Between-Class Scatter Matrix
- Sb = Σ_i ni (ci − c)(ci − c)^T, where c is the global centroid
- trace(Sb) measures how far the class centroids are spread apart
- A transformation G^T maps a1, ..., an to G^T a1, ..., G^T an
- Goal: choose G to maximize trace(G^T Sb G) while minimizing trace(G^T Sw G)
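The two scatter matrices can be sketched in a few lines of numpy (a minimal illustration; `scatter_matrices` and its variable names are my own, with data points stored as columns of A):

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within- and between-class scatter of the columns of A (d x n)."""
    d, n = A.shape
    c = A.mean(axis=1, keepdims=True)            # global centroid
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for lbl in np.unique(labels):
        Ai = A[:, labels == lbl]                 # columns of class lbl
        ci = Ai.mean(axis=1, keepdims=True)      # class centroid
        D = Ai - ci
        Sw += D @ D.T                            # within-class scatter
        Sb += Ai.shape[1] * (ci - c) @ (ci - c).T  # between-class scatter
    return Sw, Sb
```

For any labeling, Sw + Sb equals the total scatter matrix, so trace(Sw) and trace(Sb) split the total variance between within-class spread and centroid separation.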
13. Eigenvalue Problem
- G is given by the leading eigenvectors of Sw^{-1} Sb: Sw^{-1} Sb X = X Λ
- rank(Sb) ≤ number of classes − 1, so there are at most k − 1 discriminant directions
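Assuming Sw is nonsingular, G can be sketched as the top eigenvectors of Sw^{-1} Sb (a hypothetical `lda_transform` helper; Sw^{-1} Sb is not symmetric, so the eigendecomposition may return complex values whose imaginary parts vanish):

```python
import numpy as np

def lda_transform(Sw, Sb, r):
    """G = the r leading eigenvectors of Sw^{-1} Sb."""
    # solve(Sw, Sb) computes Sw^{-1} Sb without forming the inverse
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]   # largest eigenvalues first
    return evecs[:, order[:r]].real
```

Since rank(Sb) ≤ k − 1, at most k − 1 eigenvalues are nonzero, which is why r is normally chosen as k − 1.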
14. Face Recognition
- Dimension reduction to maximize the distances among classes
- A 92 × 112 face image becomes a 10304-dimensional vector, reduced by G^T
15. Text Classification
- Bag of words: each document is represented by the frequencies of the words it contains
- Education: Faculty, Student, Syllabus, Grade, Tuition, ...
- Recreation: Movie, Music, Sport, Hollywood, Theater, ...
- Dimension reduction by G^T
16. Generalized LDA Algorithms
- Undersampled problems: high dimensionality, small number of data points
- Sw is singular, so Sw^{-1} Sb cannot be computed
17. Nonlinear Dimension Reduction Based on Kernel Methods

18. Nonlinear Dimension Reduction
- A nonlinear mapping into feature space, followed by linear dimension reduction G^T
19. Kernel Method
- If a kernel function k(x, y) satisfies Mercer's condition, then there exists a mapping Φ for which ⟨Φ(x), Φ(y)⟩ = k(x, y) holds
- For a finite data set A = {a1, ..., an}, Mercer's condition can be rephrased as: the kernel matrix [k(ai, aj)]_{i,j=1,...,n} is positive semi-definite
20. Nonlinear Dimension Reduction by Kernel Methods
- Given a kernel function k(x, y), apply linear dimension reduction G^T in the feature space induced by k

21. Positive Definite Kernel Functions
- Gaussian kernel: k(x, y) = exp(−‖x − y‖² / σ²)
- Polynomial kernel: k(x, y) = (⟨x, y⟩ + c)^d
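Both kernels, together with the finite-sample Mercer check from slide 19, fit in a short numpy sketch (function names are my own; note the slides' Gaussian convention divides by σ², while some texts use 2σ²):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / sigma^2), for rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def polynomial_kernel(X, Y, c=1.0, degree=2):
    """k(x, y) = (<x, y> + c)^degree."""
    return (X @ Y.T + c) ** degree

def is_psd(K, tol=1e-10):
    """Finite-sample Mercer check: the kernel matrix is PSD."""
    return bool(np.min(np.linalg.eigvalsh((K + K.T) / 2)) > -tol)
```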
22. Nonlinear Discriminant Analysis using Kernel Methods
- Map the data to Φ(a1), ..., Φ(an) and apply LDA there
- Only inner products are needed: ⟨Φ(x), Φ(y)⟩ = k(x, y)
- This leads to the generalized eigenvalue problem Sb x = λ Sw x in feature space
23. Nonlinear Discriminant Analysis using Kernel Methods
- In the feature space Φ(a1), ..., Φ(an), all inner products are given by the kernel matrix

      K = [ k(a1, a1) ... k(a1, an) ]
          [     .              .    ]
          [ k(an, a1) ... k(an, an) ]

- The feature-space problem Sb x = λ Sw x becomes a problem Sb u = λ Sw u over coefficient vectors u expressed through K
- Apply generalized LDA algorithms to this problem
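As a rough illustration of this reduction to the kernel matrix, one can use the empirical kernel map: treat column i of K as the representation of ai and run a regularized LDA step on it. This is my own sketch of the idea (`kda_fit`, with a ridge term standing in for the generalized algorithms discussed next), not the paper's GSVD formulation:

```python
import numpy as np

def kda_fit(K, labels, r, reg=1e-3):
    """Sketch of kernel discriminant analysis via the empirical kernel map.

    K: n x n kernel matrix, K[i, j] = k(a_i, a_j).  Column i of K is used
    as the representation of a_i; the scatter matrices are built there and
    the regularized problem (Sw + reg*I)^{-1} Sb u = lambda u is solved.
    Returns U (n x r).
    """
    n = K.shape[0]
    c = K.mean(axis=1, keepdims=True)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for lbl in np.unique(labels):
        Ki = K[:, labels == lbl]
        ci = Ki.mean(axis=1, keepdims=True)
        D = Ki - ci
        Sw += D @ D.T
        Sb += Ki.shape[1] * (ci - c) @ (ci - c).T
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(n), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:r]].real
```

A new point x is then mapped through its kernel row: z = (k(x, a1), ..., k(x, an)) @ U.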
24. Generalized LDA Algorithms
- Minimize trace(x^T Sw x): it is 0 exactly when x ∈ null(Sw)
- Maximize trace(x^T Sb x): it is nonzero for nonzero x ∈ range(Sb)
25. Generalized LDA Algorithms
- RLDA: add a positive diagonal matrix λI to Sw so that Sw + λI is nonsingular
- LDA/GSVD: apply the generalized singular value decomposition (GSVD) to Hb and Hw, where Sb = Hb Hb^T and Sw = Hw Hw^T
- To-N(Sw): project onto the null space of Sw and maximize the between-class scatter in the projected space
26. Generalized LDA Algorithms
- To-R(Sb): transform to the range space of Sb and diagonalize the within-class scatter matrix in the transformed space
- To-NR(Sw): reduce the data dimension by PCA, then maximize the between-class scatter in range(Sw) and null(Sw)
27. Data Sets
From the Machine Learning Repository:

  Data      dim   no. of data   no. of classes
  Musk      166   6599          2
  Isolet    617   7797          26
  Car       6     1728          4
  Mfeature  649   2000          10
  Bcancer   9     699           2
  Bscale    4     625           3
28. Experimental Settings
- Split the original data into training data and test data
- Learn a kernel function k and a linear transformation G^T from the training data
- Reduce the dimension with G^T, then predict the class labels of the test data using the training data
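The protocol above can be sketched as follows (hypothetical helpers; a 1-NN rule stands in for whatever classifier is applied in the reduced space):

```python
import numpy as np

def split_data(X, labels, test_fraction=0.25, seed=0):
    """Shuffle the rows of X and split into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], labels[train], X[test], labels[test]

def nn_predict(Ztrain, ytrain, Ztest):
    """1-NN prediction: label of the nearest training point."""
    d2 = ((Ztest[:, None, :] - Ztrain[None, :, :]) ** 2).sum(axis=2)
    return ytrain[d2.argmin(axis=1)]
```

In the experiments, Ztrain and Ztest would be the dimension-reduced training and test data.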
29. Prediction Accuracies
- Accuracies across methods; each color represents a different data set
30. Linear and Nonlinear Discriminant Analysis
- Results across the data sets

31. Face Recognition

32. Application of Nonlinear Discriminant Analysis to Fingerprint Classification

33. Fingerprint Classification
- Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch
- From the NIST Fingerprint Database 4
34. Previous Work in Fingerprint Classification
- Feature representation: minutiae, Gabor filtering, directional partitioning
- Classifiers applied: neural networks, support vector machines, probabilistic neural networks
- Our approach: construct core directional images by the DFT, then apply dimension reduction by nonlinear discriminant analysis

35. Construction of Core Directional Images
- Examples: Left Loop, Right Loop, Whorl
36. Construction of Core Directional Images
- Core point

37. Discrete Fourier Transform (DFT)

38. Discrete Fourier Transform (DFT)
39. Construction of Directional Images
- Computation of local dominant directions by DFT and directional filtering
- Core point detection
- Reconstruction of core directional images
- Fast computation of the DFT by the FFT
- Reliable for low-quality images
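One plausible realization of the first step is to estimate a block's dominant direction from the peak of its 2-D DFT magnitude. This is my own sketch and omits the directional filtering the slides pair with the DFT:

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge orientation of a small image block
    from the peak of its 2-D DFT magnitude.  Returns an angle in [0, pi)."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(block - block.mean())))
    h, w = F.shape
    cy, cx = h // 2, w // 2
    F[cy, cx] = 0.0                        # ignore any residual DC term
    py, px = np.unravel_index(np.argmax(F), F.shape)
    # the peak frequency vector is normal to the ridges: rotate 90 degrees
    return (np.arctan2(py - cy, px - cx) + np.pi / 2) % np.pi
```

The FFT makes this cheap per block, matching the slides' point about fast computation.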
40. Computation of local dominant directions by DFT and directional filtering
41. Construction of Directional Images
- Input fingerprint images: 512 × 512
42. Nonlinear Discriminant Analysis
- 105 × 105 core directional images, i.e. an 11025-dimensional space, reduced by G^T to a 4-dimensional space
- Maximizing class separability in the reduced-dimensional space
- Classes: Whorl, Right Loop, Left Loop, Tented Arch, Arch
43. Comparison of Experimental Results
- NIST Database 4, prediction accuracies (%) at increasing rejection rates:

  Rejection rate (%)        0     1.8   8.5   20.0
  Nonlinear LDA/GSVD        90.7  91.3  92.8  95.3
  PCASYS                    89.7  90.5  92.8  95.6
  Jain et al. 1999, TPAMI   -     90.0  91.2  93.5
  Yao et al. 2003, PR       -     90.0  92.2  95.6
44. Summary
- Nonlinear feature extraction based on kernel methods
  - Nonlinear Discriminant Analysis
  - Kernel Orthogonal Centroid method (KOC)
- A comparison of generalized linear and nonlinear discriminant analysis algorithms
- Application to fingerprint classification
45. Dimension Reduction and Feature Selection
- Dimension reduction (feature transformation): linear combinations of the original features
- Feature selection: select a subset of the original features
  - e.g. gene selection in gene expression microarray data analysis
- Visualization of high-dimensional data; visual data mining
46. Core Point Detection
- θ_{i,j}: dominant direction in the neighborhood centered at (i, j)
- Measure the consistency of the local dominant directions: S = Σ_{i,j ∈ {−1,0,1}} (cos 2θ_{i,j}, sin 2θ_{i,j})
- The distance from the starting point to the finishing point of this vector sum; the lowest value indicates the core point
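The consistency measure on a 3 × 3 neighborhood of directions can be sketched directly from the formula (the doubled angles make θ and θ + π equivalent, since fingerprint directions are orientations, not vectors):

```python
import numpy as np

def direction_consistency(theta):
    """Length of the resultant of the doubled-angle unit vectors
    (cos 2t, sin 2t) over a neighborhood of directions theta (radians).
    Consistent directions give a large value; near a core point the
    directions disagree and the value is low."""
    c = np.cos(2 * theta).sum()
    s = np.sin(2 * theta).sum()
    return float(np.hypot(c, s))
```

Scanning this measure over the image and taking the location with the lowest value gives the core point candidate.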
47. References
- L. Chen et al., "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognition, 33:1713-1726, 2000
- P. Howland et al., "Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition," SIMAX, 25(1):165-179, 2003
- H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognition, 34:2067-2070, 2001
- J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed space?," Pattern Recognition, 36:563-566, 2003
- H. Park et al., "Lower dimensional representation of text data based on centroids and least squares," BIT Numerical Mathematics, 43(2):1-22, 2003
- S. Mika et al., "Fisher discriminant analysis with kernels," Neural Networks for Signal Processing IX, J. Larsen and S. Douglas (eds.), pp. 41-48, IEEE, 1999
- B. Schölkopf et al., "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, 10:1299-1319, 1998
- G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Computation, 12:2385-2404, 2000
- V. Roth and V. Steinhage, "Nonlinear discriminant analysis using kernel functions," Advances in Neural Information Processing Systems, 12:568-574, 2000
48. References (continued)
- S.A. Billings and K.L. Lee, "Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm," Neural Networks, 15(2):263-270, 2002
- C.H. Park and H. Park, "Nonlinear discriminant analysis based on generalized singular value decomposition," SIMAX, 27(1):98-102, 2005
- A.K. Jain et al., "A multichannel approach to fingerprint classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999
- Y. Yao et al., "Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines," Pattern Recognition, 36(2):397-406, 2003
- C.H. Park and H. Park, "Nonlinear feature extraction based on centroids and kernel functions," Pattern Recognition, 37(4):801-810, 2004
- C.H. Park and H. Park, "A comparison of generalized LDA algorithms for undersampled problems," Pattern Recognition, to appear
- C.H. Park and H. Park, "Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis," Pattern Recognition, 2006