Title: Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

1. Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis
- Haesun Park
- Georgia Institute of Technology, Atlanta, GA, USA
- (joint work with C. Park)
- KAIST, Korea, June 2007
2. Clustering

3. Clustering
- Grouping of data based on similarity measures

4. Classification
- Assign a class label to new, unseen data
5. Data Mining
- Mining or discovery of new information (patterns or rules) from large databases
- Pipeline: data preparation, data reduction, preprocessing
  - Dimension reduction, feature selection, feature extraction
- Tasks: association analysis, regression, probabilistic modeling, classification, clustering
6. Feature Extraction
- Optimal feature extraction
  - Reduce the dimensionality of the data space
  - Minimize the effects of redundant features and noise
- The curse of dimensionality grows with the number of features
- Apply feature extraction to new data, then a classifier to predict its class label
7. Linear Dimension Reduction
- Maximize class separability in the reduced-dimensional space

8. Linear Dimension Reduction
- Maximize class separability in the reduced-dimensional space

9. What if the data is not linearly separable?
- Nonlinear Dimension Reduction
10. Contents
- Linear Discriminant Analysis
- Nonlinear Dimension Reduction based on Kernel Methods
  - Nonlinear Discriminant Analysis
- Application to Fingerprint Classification
11. Linear Discriminant Analysis (LDA)
- For a given data set a1, ..., an with class centroids c1, ..., ck
- Within-class scatter matrix: Sw = Σ_i Σ_{aj in class i} (aj − ci)(aj − ci)^T
- trace(Sw) measures how tightly each class clusters around its centroid
12. Between-Class Scatter Matrix
- Sb = Σ_i ni (ci − c)(ci − c)^T, where c is the global centroid
- trace(Sb) measures how far the class centroids are spread apart
- A transformation G^T maps a1, ..., an to G^T a1, ..., G^T an
- Goal: choose G to maximize trace(G^T Sb G) while minimizing trace(G^T Sw G)
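The two scatter matrices can be sketched in a few lines of numpy (a minimal illustration; `scatter_matrices` and its variable names are my own, with data points stored as columns of A):

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within- and between-class scatter of the columns of A (d x n)."""
    d, n = A.shape
    c = A.mean(axis=1, keepdims=True)            # global centroid
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for lbl in np.unique(labels):
        Ai = A[:, labels == lbl]                 # columns of class lbl
        ci = Ai.mean(axis=1, keepdims=True)      # class centroid
        D = Ai - ci
        Sw += D @ D.T                            # within-class scatter
        Sb += Ai.shape[1] * (ci - c) @ (ci - c).T  # between-class scatter
    return Sw, Sb
```

For any labeling, Sw + Sb equals the total scatter matrix, so trace(Sw) and trace(Sb) split the total variance between within-class spread and centroid separation.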
13. Eigenvalue Problem
- G is given by the leading eigenvectors of Sw^{-1} Sb: Sw^{-1} Sb X = X Λ
- rank(Sb) ≤ number of classes − 1, so there are at most k − 1 discriminant directions
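Assuming Sw is nonsingular, G can be sketched as the top eigenvectors of Sw^{-1} Sb (a hypothetical `lda_transform` helper; Sw^{-1} Sb is not symmetric, so the eigendecomposition may return complex values whose imaginary parts vanish):

```python
import numpy as np

def lda_transform(Sw, Sb, r):
    """G = the r leading eigenvectors of Sw^{-1} Sb."""
    # solve(Sw, Sb) computes Sw^{-1} Sb without forming the inverse
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]   # largest eigenvalues first
    return evecs[:, order[:r]].real
```

Since rank(Sb) ≤ k − 1, at most k − 1 eigenvalues are nonzero, which is why r is normally chosen as k − 1.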
14. Face Recognition
- Dimension reduction to maximize the distances among classes
- A 92 × 112 face image becomes a 10304-dimensional vector, reduced by G^T
15. Text Classification
- Bag of words: each document is represented by the frequencies of the words it contains
- Education: Faculty, Student, Syllabus, Grade, Tuition, ...
- Recreation: Movie, Music, Sport, Hollywood, Theater, ...
- Dimension reduction by G^T
16. Generalized LDA Algorithms
- Undersampled problems: high dimensionality, small number of data points
- Sw is singular, so Sw^{-1} Sb cannot be computed
17. Nonlinear Dimension Reduction Based on Kernel Methods

18. Nonlinear Dimension Reduction
- A nonlinear mapping into feature space, followed by linear dimension reduction G^T
19. Kernel Method
- If a kernel function k(x, y) satisfies Mercer's condition, then there exists a mapping Φ for which ⟨Φ(x), Φ(y)⟩ = k(x, y) holds
- For a finite data set A = {a1, ..., an}, Mercer's condition can be rephrased as: the kernel matrix [k(ai, aj)]_{i,j=1,...,n} is positive semi-definite
20. Nonlinear Dimension Reduction by Kernel Methods
- Given a kernel function k(x, y), apply linear dimension reduction G^T in the feature space induced by k

21. Positive Definite Kernel Functions
- Gaussian kernel: k(x, y) = exp(−‖x − y‖² / σ²)
- Polynomial kernel: k(x, y) = (⟨x, y⟩ + c)^d
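Both kernels, together with the finite-sample Mercer check from slide 19, fit in a short numpy sketch (function names are my own; note the slides' Gaussian convention divides by σ², while some texts use 2σ²):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, y) = exp(-||x - y||^2 / sigma^2), for rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma ** 2)

def polynomial_kernel(X, Y, c=1.0, degree=2):
    """k(x, y) = (<x, y> + c)^degree."""
    return (X @ Y.T + c) ** degree

def is_psd(K, tol=1e-10):
    """Finite-sample Mercer check: the kernel matrix is PSD."""
    return bool(np.min(np.linalg.eigvalsh((K + K.T) / 2)) > -tol)
```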
22. Nonlinear Discriminant Analysis using Kernel Methods
- Map the data to Φ(a1), ..., Φ(an) and apply LDA there
- Only inner products are needed: ⟨Φ(x), Φ(y)⟩ = k(x, y)
- This leads to the generalized eigenvalue problem Sb x = λ Sw x in feature space
23. Nonlinear Discriminant Analysis using Kernel Methods
- In the feature space Φ(a1), ..., Φ(an), all inner products are given by the kernel matrix

      K = [ k(a1, a1) ... k(a1, an) ]
          [     .              .    ]
          [ k(an, a1) ... k(an, an) ]

- The feature-space problem Sb x = λ Sw x becomes a problem Sb u = λ Sw u over coefficient vectors u expressed through K
- Apply generalized LDA algorithms to this problem
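As a rough illustration of this reduction to the kernel matrix, one can use the empirical kernel map: treat column i of K as the representation of ai and run a regularized LDA step on it. This is my own sketch of the idea (`kda_fit`, with a ridge term standing in for the generalized algorithms discussed next), not the paper's GSVD formulation:

```python
import numpy as np

def kda_fit(K, labels, r, reg=1e-3):
    """Sketch of kernel discriminant analysis via the empirical kernel map.

    K: n x n kernel matrix, K[i, j] = k(a_i, a_j).  Column i of K is used
    as the representation of a_i; the scatter matrices are built there and
    the regularized problem (Sw + reg*I)^{-1} Sb u = lambda u is solved.
    Returns U (n x r).
    """
    n = K.shape[0]
    c = K.mean(axis=1, keepdims=True)
    Sw = np.zeros((n, n))
    Sb = np.zeros((n, n))
    for lbl in np.unique(labels):
        Ki = K[:, labels == lbl]
        ci = Ki.mean(axis=1, keepdims=True)
        D = Ki - ci
        Sw += D @ D.T
        Sb += Ki.shape[1] * (ci - c) @ (ci - c).T
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(n), Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:r]].real
```

A new point x is then mapped through its kernel row: z = (k(x, a1), ..., k(x, an)) @ U.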
24. Generalized LDA Algorithms
- Minimize trace(x^T Sw x): it is 0 exactly when x ∈ null(Sw)
- Maximize trace(x^T Sb x): it is nonzero for nonzero x ∈ range(Sb)
25. Generalized LDA Algorithms
- RLDA: add a positive diagonal matrix λI to Sw so that Sw + λI is nonsingular
- LDA/GSVD: apply the generalized singular value decomposition (GSVD) to Hb and Hw, where Sb = Hb Hb^T and Sw = Hw Hw^T
- To-N(Sw): project onto the null space of Sw and maximize the between-class scatter in the projected space
26. Generalized LDA Algorithms
- To-R(Sb): transform to the range space of Sb and diagonalize the within-class scatter matrix in the transformed space
- To-NR(Sw): reduce the data dimension by PCA, then maximize the between-class scatter in range(Sw) and null(Sw)
27. Data Sets
From the Machine Learning Repository:

  Data      dim   no. of data   no. of classes
  Musk      166   6599          2
  Isolet    617   7797          26
  Car       6     1728          4
  Mfeature  649   2000          10
  Bcancer   9     699           2
  Bscale    4     625           3
28. Experimental Settings
- Split the original data into training data and test data
- Learn a kernel function k and a linear transformation G^T from the training data
- Reduce the dimension with G^T, then predict the class labels of the test data using the training data
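The protocol above can be sketched as follows (hypothetical helpers; a 1-NN rule stands in for whatever classifier is applied in the reduced space):

```python
import numpy as np

def split_data(X, labels, test_fraction=0.25, seed=0):
    """Shuffle the rows of X and split into training and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return X[train], labels[train], X[test], labels[test]

def nn_predict(Ztrain, ytrain, Ztest):
    """1-NN prediction: label of the nearest training point."""
    d2 = ((Ztest[:, None, :] - Ztrain[None, :, :]) ** 2).sum(axis=2)
    return ytrain[d2.argmin(axis=1)]
```

In the experiments, Ztrain and Ztest would be the dimension-reduced training and test data.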
29. Prediction Accuracies
- Accuracies across methods; each color represents a different data set
30. Linear and Nonlinear Discriminant Analysis
- Results across the data sets

31. Face Recognition

32. Application of Nonlinear Discriminant Analysis to Fingerprint Classification

33. Fingerprint Classification
- Five classes: Left Loop, Right Loop, Whorl, Arch, Tented Arch
- From the NIST Fingerprint Database 4
34. Previous Work in Fingerprint Classification
- Feature representation: minutiae, Gabor filtering, directional partitioning
- Classifiers applied: neural networks, support vector machines, probabilistic neural networks
- Our approach: construct core directional images by the DFT, then apply dimension reduction by nonlinear discriminant analysis

35. Construction of Core Directional Images
- Examples: Left Loop, Right Loop, Whorl
36. Construction of Core Directional Images
- Core point

37. Discrete Fourier Transform (DFT)

38. Discrete Fourier Transform (DFT)
39. Construction of Directional Images
- Computation of local dominant directions by DFT and directional filtering
- Core point detection
- Reconstruction of core directional images
- Fast computation of the DFT by the FFT
- Reliable for low-quality images
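One plausible realization of the first step is to estimate a block's dominant direction from the peak of its 2-D DFT magnitude. This is my own sketch and omits the directional filtering the slides pair with the DFT:

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge orientation of a small image block
    from the peak of its 2-D DFT magnitude.  Returns an angle in [0, pi)."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(block - block.mean())))
    h, w = F.shape
    cy, cx = h // 2, w // 2
    F[cy, cx] = 0.0                        # ignore any residual DC term
    py, px = np.unravel_index(np.argmax(F), F.shape)
    # the peak frequency vector is normal to the ridges: rotate 90 degrees
    return (np.arctan2(py - cy, px - cx) + np.pi / 2) % np.pi
```

The FFT makes this cheap per block, matching the slides' point about fast computation.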
40. Computation of local dominant directions by DFT and directional filtering
41. Construction of Directional Images
- Input fingerprint images: 512 × 512
42. Nonlinear Discriminant Analysis
- 105 × 105 core directional images, i.e. an 11025-dimensional space, reduced by G^T to a 4-dimensional space
- Maximizing class separability in the reduced-dimensional space
- Classes: Whorl, Right Loop, Left Loop, Tented Arch, Arch
43. Comparison of Experimental Results
- NIST Database 4, prediction accuracies (%) at increasing rejection rates:

  Rejection rate (%)        0     1.8   8.5   20.0
  Nonlinear LDA/GSVD        90.7  91.3  92.8  95.3
  PCASYS                    89.7  90.5  92.8  95.6
  Jain et al. 1999, TPAMI   -     90.0  91.2  93.5
  Yao et al. 2003, PR       -     90.0  92.2  95.6
44. Summary
- Nonlinear feature extraction based on kernel methods
  - Nonlinear Discriminant Analysis
  - Kernel Orthogonal Centroid method (KOC)
- A comparison of generalized linear and nonlinear discriminant analysis algorithms
- Application to fingerprint classification
45. Dimension Reduction and Feature Selection
- Dimension reduction (feature transformation): linear combinations of the original features
- Feature selection: select a subset of the original features
  - e.g. gene selection in gene expression microarray data analysis
- Visualization of high-dimensional data; visual data mining
46. Core Point Detection
- θ_{i,j}: dominant direction in the neighborhood centered at (i, j)
- Measure the consistency of the local dominant directions: S = Σ_{i,j ∈ {−1,0,1}} (cos 2θ_{i,j}, sin 2θ_{i,j})
- The distance from the starting point to the finishing point of this vector sum; the lowest value indicates the core point
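The consistency measure on a 3 × 3 neighborhood of directions can be sketched directly from the formula (the doubled angles make θ and θ + π equivalent, since fingerprint directions are orientations, not vectors):

```python
import numpy as np

def direction_consistency(theta):
    """Length of the resultant of the doubled-angle unit vectors
    (cos 2t, sin 2t) over a neighborhood of directions theta (radians).
    Consistent directions give a large value; near a core point the
    directions disagree and the value is low."""
    c = np.cos(2 * theta).sum()
    s = np.sin(2 * theta).sum()
    return float(np.hypot(c, s))
```

Scanning this measure over the image and taking the location with the lowest value gives the core point candidate.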
47. References
- L. Chen et al., "A new LDA-based face recognition system which can solve the small sample size problem," Pattern Recognition, 33:1713-1726, 2000
- P. Howland et al., "Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition," SIMAX, 25(1):165-179, 2003
- H. Yu and J. Yang, "A direct LDA algorithm for high-dimensional data with application to face recognition," Pattern Recognition, 34:2067-2070, 2001
- J. Yang and J.-Y. Yang, "Why can LDA be performed in PCA transformed space?," Pattern Recognition, 36:563-566, 2003
- H. Park et al., "Lower dimensional representation of text data based on centroids and least squares," BIT Numerical Mathematics, 43(2):1-22, 2003
- S. Mika et al., "Fisher discriminant analysis with kernels," Neural Networks for Signal Processing IX, J. Larsen and S. Douglas (eds.), pp. 41-48, IEEE, 1999
- B. Schölkopf et al., "Nonlinear component analysis as a kernel eigenvalue problem," Neural Computation, 10:1299-1319, 1998
- G. Baudat and F. Anouar, "Generalized discriminant analysis using a kernel approach," Neural Computation, 12:2385-2404, 2000
- V. Roth and V. Steinhage, "Nonlinear discriminant analysis using kernel functions," Advances in Neural Information Processing Systems, 12:568-574, 2000
48. References (continued)
- S.A. Billings and K.L. Lee, "Nonlinear Fisher discriminant analysis using a minimum squared error cost function and the orthogonal least squares algorithm," Neural Networks, 15(2):263-270, 2002
- C.H. Park and H. Park, "Nonlinear discriminant analysis based on generalized singular value decomposition," SIMAX, 27(1):98-102, 2005
- A.K. Jain et al., "A multichannel approach to fingerprint classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(4):348-359, 1999
- Y. Yao et al., "Combining flat and structural representations for fingerprint classification with recursive neural networks and support vector machines," Pattern Recognition, 36(2):397-406, 2003
- C.H. Park and H. Park, "Nonlinear feature extraction based on centroids and kernel functions," Pattern Recognition, 37(4):801-810, 2004
- C.H. Park and H. Park, "A comparison of generalized LDA algorithms for undersampled problems," Pattern Recognition, to appear
- C.H. Park and H. Park, "Fingerprint classification using fast Fourier transform and nonlinear discriminant analysis," Pattern Recognition, 2006