Kernelized Discriminant Analysis and Adaptive Methods for Discriminant Analysis

1
Kernelized Discriminant Analysis and Adaptive
Methods for Discriminant Analysis
  • Haesun Park
  • Georgia Institute of Technology,
  • Atlanta, GA, USA
  • (joint work with C. Park)
  • KAIST, Korea, June 2007

2
Clustering
3
  • Clustering
  • grouping of data based on similarity
    measures

4
Classification
  • Classification
  • assign a class label to new unseen data

5
Data Mining
  • Mining or discovery of new information -
    patterns or rules - from large databases

Data Preparation
  • Preprocessing
  • Data Reduction - dimension reduction, feature selection

Mining tasks
  • Feature Extraction
  • Association Analysis
  • Regression
  • Probabilistic modeling
  • Classification
  • Clustering
6
Feature Extraction
  • Optimal feature extraction
  • - Reduce the dimensionality of data space
  • - Minimize effects of redundant features and
    noise

The curse of dimensionality grows with the number of features.
Feature extraction reduces the data; then a classifier predicts the class label of new data.
7
Linear dimension reduction
Maximize class separability in the reduced
dimensional space
8
Linear dimension reduction
Maximize class separability in the reduced
dimensional space
9
What if the data is not linearly separable?
Nonlinear Dimension Reduction
10
Contents
  • Linear Discriminant Analysis
  • Nonlinear Dimension Reduction based on Kernel
    Methods
  • - Nonlinear Discriminant Analysis
  • Application to Fingerprint Classification

11
Linear Discriminant Analysis (LDA)
For a given data set {a1, …, an} with class centroids ck and global centroid c

Centroids
  • Within-class scatter matrix Sw = Σk Σ(ai ∈ class k) (ai − ck)(ai − ck)ᵀ
  • trace(Sw) measures the within-class spread

12
  • Between-class scatter matrix Sb = Σk nk (ck − c)(ck − c)ᵀ
  • trace(Sb) measures the between-class spread

The dimension reduction Gᵀ maps a1, …, an to Gᵀa1, …, Gᵀan.
  • maximize trace(GᵀSbG)
  • minimize trace(GᵀSwG)
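As a concrete illustration of these definitions, here is a minimal NumPy sketch (the data, shapes, and names are my own illustrative choices) that builds Sw and Sb from data columns and class labels:

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within-class (Sw) and between-class (Sb) scatter matrices for
    data points a_1, ..., a_n stored as the columns of A (d x n)."""
    d, n = A.shape
    c = A.mean(axis=1, keepdims=True)              # global centroid
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for k in np.unique(labels):
        Ak = A[:, labels == k]
        ck = Ak.mean(axis=1, keepdims=True)        # class centroid
        Sw += (Ak - ck) @ (Ak - ck).T              # spread within class k
        Sb += Ak.shape[1] * (ck - c) @ (ck - c).T  # spread between centroids
    return Sw, Sb

# two illustrative Gaussian classes in 3 dimensions
rng = np.random.default_rng(0)
A = np.hstack([rng.normal(0, 1, (3, 10)), rng.normal(5, 1, (3, 12))])
labels = np.array([0] * 10 + [1] * 12)
Sw, Sb = scatter_matrices(A, labels)
```

A useful sanity check is the identity St = Sw + Sb, where St is the total scatter about the global centroid.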
13
Eigenvalue problem
G is formed from the leading eigenvectors of Sw⁻¹Sb:

Sw⁻¹Sb X = XΛ
rank(Sb) ≤ number of classes − 1
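The eigenvalue formulation above can be sketched as follows; the two-class data and all variable names are illustrative, not from the talk:

```python
import numpy as np

# LDA reduces to the eigenvalue problem  Sw^{-1} Sb x = lambda x.
# Since rank(Sb) <= (#classes - 1), only that many eigenvectors are kept.
def lda_directions(Sw, Sb, num_classes):
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:num_classes - 1]].real

# two well-separated Gaussian classes in 3-D (illustrative data)
rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, (3, 20))
X1 = rng.normal(4.0, 1.0, (3, 20))
c0, c1 = X0.mean(1, keepdims=True), X1.mean(1, keepdims=True)
Sw = (X0 - c0) @ (X0 - c0).T + (X1 - c1) @ (X1 - c1).T
c = np.hstack([X0, X1]).mean(1, keepdims=True)
Sb = 20 * (c0 - c) @ (c0 - c).T + 20 * (c1 - c) @ (c1 - c).T
G = lda_directions(Sw, Sb, num_classes=2)   # one direction for 2 classes
```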
14
Face Recognition
dimension reduction to maximize the distances
among classes.

A 92 × 112 face image → a 10304-dimensional vector, reduced by Gᵀ



15
Text Classification
  • Bag of words: each document is represented by
    the frequencies of the words it contains

[Example: Education documents (Faculty, Student, Syllabus, Grade, Tuition, …) vs. Recreation documents (Movie, Music, Sport, Hollywood, Theater, …), reduced by Gᵀ]
16
Generalized LDA Algorithms
  • Undersampled problems
  • high dimensionality, small number of data points
  • → cannot compute Sw⁻¹Sb, since Sw is singular
17
Nonlinear Dimension Reduction based on Kernel Methods
18
Nonlinear Dimension Reduction
a nonlinear mapping Φ followed by a linear dimension reduction Gᵀ
19
Kernel Method
  • If a kernel function k(x, y) satisfies Mercer's
    condition, then there exists a mapping Φ
  • for which ⟨Φ(x), Φ(y)⟩ = k(x, y) holds

A → Φ(A), with ⟨Φ(x), Φ(y)⟩ = k(x, y)
  • For a finite data set A = {a1, …, an}, Mercer's
    condition can be rephrased as: the kernel
    matrix K = [k(ai, aj)] is positive
    semi-definite.
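This rephrased Mercer condition is easy to check numerically. A hedged sketch, assuming a Gaussian kernel and random data of my own choosing:

```python
import numpy as np

def gaussian_kernel_matrix(A, sigma=1.0):
    """Kernel matrix K[i, j] = k(a_i, a_j) for the Gaussian kernel,
    with data points stored as the columns of A."""
    sq = np.sum(A**2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A.T @ A)   # squared distances
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 8))                 # 8 illustrative points in 5-D
K = gaussian_kernel_matrix(A)
eigs = np.linalg.eigvalsh(K)                # Mercer: all eigenvalues >= 0
```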

20
Nonlinear Dimension Reduction by Kernel Methods
Given a kernel function k(x, y), the mapping Φ stays implicit;
only the linear dimension reduction Gᵀ is computed explicitly.
21
Positive Definite Kernel Functions
  • Gaussian kernel k(x, y) = exp(−‖x − y‖² / (2σ²))
  • Polynomial kernel k(x, y) = (⟨x, y⟩ + c)ᵈ
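These two kernels can be written directly; σ, c, and the degree below are illustrative hyperparameter choices, not values from the slides:

```python
import numpy as np

# The two kernels named on this slide, in scalar form.
def gaussian(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma**2))

def polynomial(x, y, degree=2, c=1.0):
    return (np.dot(x, y) + c) ** degree

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])
```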

22
Nonlinear Discriminant Analysis using Kernel Methods
Φ(a1), …, Φ(an)

Want to apply LDA in the feature space,
using ⟨Φ(x), Φ(y)⟩ = k(x, y)
  • a1, a2, …, an

Sb x = λ Sw x
23
Nonlinear Discriminant Analysis using Kernel Methods
Φ(a1), …, Φ(an)

Kernel matrix
K = [ k(a1, a1) … k(a1, an)
      ⋮              ⋮
      k(an, a1) … k(an, an) ]
  • a1, a2, …, an

Solve Sb u = λ Sw u in the kernel-induced space
instead of Sb x = λ Sw x in the original space
Apply Generalized LDA
Algorithms
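The scheme above — build the kernel matrix, then run a (regularized) LDA on its columns — can be sketched as follows. The Gaussian kernel, σ, and the regularization constant are my own illustrative choices:

```python
import numpy as np

def kernel_lda(A, labels, sigma=1.0, reg=1e-3):
    """Sketch: represent each point a_i by the i-th column of the
    kernel matrix K, then run LDA on those columns.  reg regularizes
    Sw (RLDA-style), since Sw in K-space is singular."""
    n = A.shape[1]
    sq = np.sum(A**2, axis=0)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * A.T @ A) / (2.0 * sigma**2))
    c = K.mean(axis=1, keepdims=True)
    Sw, Sb = np.zeros((n, n)), np.zeros((n, n))
    for k in np.unique(labels):
        Kk = K[:, labels == k]
        ck = Kk.mean(axis=1, keepdims=True)
        Sw += (Kk - ck) @ (Kk - ck).T
        Sb += Kk.shape[1] * (ck - c) @ (ck - c).T
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw + reg * np.eye(n), Sb))
    r = len(np.unique(labels)) - 1
    G = evecs[:, np.argsort(-evals.real)[:r]].real
    return G.T @ K          # reduced representation of the training points

# two illustrative Gaussian classes
rng = np.random.default_rng(3)
A = np.hstack([rng.normal(0, 0.5, (2, 15)), rng.normal(3, 0.5, (2, 15))])
labels = np.array([0] * 15 + [1] * 15)
Z = kernel_lda(A, labels, sigma=2.0)
```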

24
Generalized LDA Algorithms
Minimize trace(xᵀSw x): xᵀSw x = 0 for x ∈ null(Sw)
Maximize trace(xᵀSb x): xᵀSb x ≠ 0 for x ∈ range(Sb)
25
Generalized LDA algorithms
  • Add a positive diagonal matrix λI
  • to Sw so that Sw + λI is nonsingular

RLDA
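A small sketch of the undersampled situation and the RLDA fix; the dimensions, data, and λ here are illustrative:

```python
import numpy as np

# Undersampled case: dimension d = 50 but only n = 10 data points,
# so Sw is singular and Sw^{-1}Sb does not exist.  RLDA makes the
# problem solvable by replacing Sw with Sw + lam*I.
rng = np.random.default_rng(4)
A = rng.normal(size=(50, 10))
labels = np.array([0] * 5 + [1] * 5)
c = A.mean(axis=1, keepdims=True)
Sw, Sb = np.zeros((50, 50)), np.zeros((50, 50))
for k in (0, 1):
    Ak = A[:, labels == k]
    ck = Ak.mean(axis=1, keepdims=True)
    Sw += (Ak - ck) @ (Ak - ck).T
    Sb += Ak.shape[1] * (ck - c) @ (ck - c).T

lam = 1e-2                                   # illustrative regularization
evals, evecs = np.linalg.eig(np.linalg.solve(Sw + lam * np.eye(50), Sb))
g = evecs[:, np.argmax(evals.real)].real     # leading RLDA direction
```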
  • Apply the generalized singular value
  • decomposition (GSVD) to Hw , Hb
  • in Sb Hb HbT and SwHw HwT

LDA/GSVD
To-N(Sw)
  • Projection to null space of Sw
  • Maximize between-class scatter
  • in the projected space

26
Generalized LDA Algorithms
  • Transformation to range space of Sb
  • Diagonalize within-class scatter matrix
  • in the transformed space

To-R(Sb)
  • Reduce data dimension by PCA
  • Maximize between-class scatter
  • in range(Sw) and null(Sw)

To-NR(Sw)
27
Data sets
From the Machine Learning Repository Database

  Data       dim   no. of data   no. of classes
  Musk       166   6599           2
  Isolet     617   7797          26
  Car          6   1728           4
  Mfeature   649   2000          10
  Bcancer      9    699           2
  Bscale       4    625           3
28
Experimental Settings
Original data → split into training data and test data
Dimension reduction: a kernel function k and a linear transformation Gᵀ
Predict class labels of test data using the training data
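The protocol above (split, reduce dimension with Gᵀ, predict) can be sketched with a nearest-centroid classifier; the classifier choice and the placeholder transformation G are my assumptions, not stated on the slide:

```python
import numpy as np

def nearest_centroid_accuracy(train, ytr, test, yte, G):
    """Reduce both sets with G^T, then label each test point by the
    nearest training-class centroid in the reduced space."""
    Ztr, Zte = G.T @ train, G.T @ test
    classes = np.unique(ytr)
    cents = np.stack([Ztr[:, ytr == k].mean(axis=1) for k in classes])
    d = np.linalg.norm(Zte.T[:, None, :] - cents[None, :, :], axis=2)
    pred = classes[np.argmin(d, axis=1)]
    return np.mean(pred == yte)

# illustrative two-class data, well separated
rng = np.random.default_rng(5)
train = np.hstack([rng.normal(0, .5, (4, 30)), rng.normal(3, .5, (4, 30))])
ytr = np.array([0] * 30 + [1] * 30)
test = np.hstack([rng.normal(0, .5, (4, 10)), rng.normal(3, .5, (4, 10))])
yte = np.array([0] * 10 + [1] * 10)
G = np.eye(4)[:, :2]      # placeholder transform; an LDA's G would go here
acc = nearest_centroid_accuracy(train, ytr, test, yte, G)
```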
29
Prediction accuracies
[Chart: prediction accuracies of the generalized LDA methods; each color represents a different data set]
30
Linear and Nonlinear Discriminant Analysis
[Chart: linear vs. nonlinear discriminant analysis accuracies across data sets]
31
Face Recognition
32
Application of Nonlinear Discriminant Analysis to
Fingerprint Classification
33
Fingerprint Classification
[Images: the five classes — Left Loop, Right Loop, Whorl, Arch, Tented Arch]
From NIST Fingerprint Database 4
34
Previous Works in Fingerprint Classification
  • Feature representation
  • Minutiae
  • Gabor filtering
  • Directional partitioning

Apply classifiers: Neural Networks,
Support Vector Machines,
Probabilistic NN
Our approach: construct core directional images
by DFT, then dimension reduction by nonlinear
discriminant analysis
35
Construction of Core Directional Images
[Images: core directional images for Left Loop, Right Loop, Whorl]
36
Construction of Core Directional Images
Core Point
37
Discrete Fourier transform (DFT)
38
Discrete Fourier transform (DFT)
39
Construction of Directional Images
  • Computation of local dominant directions by DFT
    and directional filtering
  • Core point detection
  • Reconstruction of core directional images
  • Fast computation of DFT by FFT
  • Reliable for low quality images

40
  • Computation of local dominant directions by DFT
    and directional filtering
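One plausible reading of this step, sketched with NumPy's FFT: take a small block, find the peak of its DFT magnitude, and read the dominant ridge direction off the peak's angle. This illustrates the idea, not the paper's exact filter bank:

```python
import numpy as np

def dominant_direction(block):
    """Estimate the dominant ridge direction of an image block from the
    peak of its 2-D DFT magnitude (computed via FFT).  Array row index
    is treated as the y axis; the result is an angle in [0, pi)."""
    F = np.abs(np.fft.fftshift(np.fft.fft2(block - block.mean())))
    h, w = F.shape
    cy, cx = h // 2, w // 2
    F[cy, cx] = 0.0                      # ignore the DC component
    py, px = np.unravel_index(np.argmax(F), F.shape)
    # ridges run perpendicular to the peak spatial frequency
    return (np.arctan2(py - cy, px - cx) + np.pi / 2) % np.pi

# synthetic block whose ridges (lines x + y = const) run at 135 degrees
yy, xx = np.mgrid[0:32, 0:32]
block = np.sin(2 * np.pi * (xx + yy) / 8.0)
theta = dominant_direction(block)
```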

41
Construction of Directional Images
  • 512 × 512 input image → 105 × 105 core directional image

42
Nonlinear Discriminant Analysis
105 × 105 core directional image → 11025-dimensional space
Maximizing class separability in the reduced dimensional space

Gᵀ maps the 11025-dim. space to a 4-dim. space
(classes: Whorl, Right Loop, Left Loop, Tented Arch, Arch)
43
Comparison of Experimental Results
  • NIST Database 4, prediction accuracies (%)

  Rejection rate (%)         0     1.8   8.5   20.0
  Nonlinear LDA/GSVD        90.7  91.3  92.8  95.3
  PCASYS                    89.7  90.5  92.8  95.6
  Jain et al., 1999, TPAMI   -    90.0  91.2  93.5
  Yao et al., 2003, PR       -    90.0  92.2  95.6

44
Summary
  • Nonlinear Feature Extraction based on Kernel
    Methods
  • - Nonlinear Discriminant Analysis
  • - Kernel Orthogonal Centroid Method (KOC)
  • A comparison of Generalized Linear and Nonlinear
    Discriminant Analysis Algorithms
  • Application to Fingerprint Classification

45
  • Dimension reduction - feature transformation
  • linear combination of original features
  • Feature selection
  • select a subset of the original features
  • gene expression microarray data analysis
  • -- gene selection
  • Visualization of high dimensional data
  • Visual data mining

46
  • Core point detection
  • θi,j : dominant direction on the neighborhood
    centered at (i, j)
  • Measure consistency of local dominant directions:
    S = ‖ Σ s,t ∈ {−1,0,1} (cos 2θi+s,j+t , sin 2θi+s,j+t) ‖,
    the distance from the starting point to the finishing
    point of the chained direction vectors
  • the lowest value → core point
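The consistency measure described above can be sketched directly; interpreting S as the length of the summed direction vectors over the 3 × 3 neighborhood is my reading of the slide:

```python
import numpy as np

def consistency(theta, i, j):
    """Consistency of local dominant directions around (i, j): the length
    of the sum of unit vectors (cos 2*theta, sin 2*theta) over the 3x3
    neighborhood.  A low value means the directions disagree, flagging a
    candidate core point."""
    nb = theta[i - 1:i + 2, j - 1:j + 2]
    v = np.array([np.cos(2 * nb).sum(), np.sin(2 * nb).sum()])
    return np.linalg.norm(v)

theta = np.zeros((5, 5))                           # perfectly aligned field
aligned = consistency(theta, 2, 2)                 # maximum: 9 unit vectors
theta2 = np.arange(25).reshape(5, 5) * (np.pi / 9) # wildly varying field
mixed = consistency(theta2, 2, 2)                  # much smaller value
```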

47
References
  • L. Chen et al., A new LDA-based face recognition
    system which can solve the small sample size
    problem, Pattern Recognition, 33:1713-1726, 2000
  • P. Howland et al., Structure preserving dimension
    reduction for clustered text data based on the
    generalized singular value decomposition, SIMAX,
    25(1):165-179, 2003
  • H. Yu and J. Yang, A direct LDA algorithm for
    high-dimensional data - with application to face
    recognition, Pattern Recognition, 34:2067-2070,
    2001
  • J. Yang and J.-Y. Yang, Why can LDA be performed in
    PCA transformed space?, Pattern Recognition,
    36:563-566, 2003
  • H. Park et al., Lower dimensional representation
    of text data based on centroids and least
    squares, BIT Numerical Mathematics, 43(2):1-22,
    2003
  • S. Mika et al., Fisher discriminant analysis
    with kernels, Neural Networks for Signal
    Processing IX, J. Larsen and S. Douglas (eds.),
    pp. 41-48, IEEE, 1999
  • B. Scholkopf et al., Nonlinear component
    analysis as a kernel eigenvalue problem, Neural
    Computation, 10:1299-1319, 1998
  • G. Baudat and F. Anouar, Generalized
    discriminant analysis using a kernel approach,
    Neural Computation, 12:2385-2404, 2000
  • V. Roth and V. Steinhage, Nonlinear discriminant
    analysis using kernel functions, Advances in
    Neural Information Processing Systems,
    12:568-574, 2000

48
  • S.A. Billings and K.L. Lee, Nonlinear Fisher
    discriminant analysis using a minimum squared
    error cost function and the orthogonal least
    squares algorithm, Neural Networks,
    15(2):263-270, 2002
  • C.H. Park and H. Park, Nonlinear discriminant
    analysis based on generalized singular value
    decomposition, SIMAX, 27(1):98-102, 2005
  • A.K. Jain et al., A multichannel approach to
    fingerprint classification, IEEE Transactions on
    Pattern Analysis and Machine Intelligence,
    21(4):348-359, 1999
  • Y. Yao et al., Combining flat and structural
    representations for fingerprint classification
    with recursive neural networks and support vector
    machines, Pattern Recognition, 36(2):397-406, 2003
  • C.H. Park and H. Park, Nonlinear feature extraction
    based on centroids and kernel functions, Pattern
    Recognition, 37(4):801-810
  • C.H. Park and H. Park, A comparison of generalized
    LDA algorithms for undersampled problems, Pattern
    Recognition, to appear
  • C.H. Park and H. Park, Fingerprint classification
    using fast Fourier transform and nonlinear
    discriminant analysis, Pattern Recognition, 2006