Kernel PCA for Novelty Detection


1
Kernel PCA for Novelty Detection
  • Heiko Hoffmann
  • To Appear in Pattern Recognition
  • Summarized by Hyoung-joo Lee

2
Introduction
  • Novelty Detection (One-class Classification)
  • A machine learns from ordinary (normal) data
  • The goal is to detect novel data that differ from the
    normal ones
  • Useful when
  • Novel data are rare, e.g., healthy tissue vs.
    malignant cancer
  • The structure of the novel data is unclear
  • Applications
  • Medical diagnosis
  • Fault detection
  • Fraud detection

3
Introduction
  • Existing Novelty Detectors
  • Kernel methods
  • Support vector machines (SVMs)
  • For novelty detection: 1-SVM and SVDD
  • Distribution modeling
  • Linear approach: PCA
  • Non-linear approaches: Gaussian mixture, AAMLP
    (auto-associative MLP), principal curve/surface
  • Kernel PCA
  • Kernel method + PCA
  • Reconstruction error in feature space as a novelty
    measure
  • Simple, yet not reported before

4
Kernel PCA
  • Outline
  • A non-linear extension of standard PCA
  • A data point x is mapped into a high-dimensional
    feature space: x → Φ(x)
  • PCA is performed in the feature space
  • Kernel trick: an inner product in the feature space
    is computed by a kernel function,
    k(x, y) = Φ(x)·Φ(y)
  • RBF kernel: k(x, y) = exp(−‖x − y‖² / (2σ²))
    (a minimal sketch follows)
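A minimal NumPy sketch of the RBF kernel above; the helper name
rbf_kernel is illustrative (not from the paper) and is reused by the
later sketches:

    import numpy as np

    def rbf_kernel(X, Y, sigma):
        """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
        # Pairwise squared Euclidean distances via the expansion
        # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
        sq_dists = (np.sum(X**2, axis=1)[:, None]
                    + np.sum(Y**2, axis=1)[None, :]
                    - 2.0 * X @ Y.T)
        return np.exp(-sq_dists / (2.0 * sigma**2))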

5
Kernel PCA
  • Formulation
  • Centering: Φ~(x_i) = Φ(x_i) − (1/n) Σ_j Φ(x_j)
  • Kernel matrix: K_ij = k(x_i, x_j), double-centered to
    K~ so that it corresponds to the centered points
  • Covariance matrix: C = (1/n) Σ_i Φ~(x_i) Φ~(x_i)ᵀ
  • An eigenvector of C lies in the span of the mapped
    points: V = Σ_i α_i Φ~(x_i)
  • Eigen-problem: C V = λ V

New eigen-problem, in terms of the kernel matrix only:
K~ α = n λ α (sketched in code below)
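A NumPy sketch of this formulation, assuming the rbf_kernel helper
from the previous sketch; the function name kernel_pca_fit and its
return values are illustrative, not from the paper:

    import numpy as np

    def kernel_pca_fit(X, sigma, q):
        """Solve the kernel PCA eigen-problem K~ alpha = n*lambda*alpha."""
        n = X.shape[0]
        K = rbf_kernel(X, X, sigma)
        # Double-centering: the kernel matrix of the centered mapped points
        one_n = np.ones((n, n)) / n
        K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
        # eigh returns eigenvalues in ascending order; flip to descending
        eigvals, eigvecs = np.linalg.eigh(K_c)
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
        # Scale each alpha so the corresponding eigenvector V of C has
        # unit length (q must stay below the numerical rank of K_c)
        alphas = eigvecs[:, :q] / np.sqrt(eigvals[:q])
        return K, alphas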
6
Measure for Novelty
  • Motivation
  • Decision boundary of kernel PCA
  • The decision boundary is based on the reconstruction
    error
  • Reconstruction error: the distance between the
    original and the reconstructed point
  • Kernel PCA vs. 1-SVM/SVDD
  • The two SV methods only enclose the data with a
    hypersphere/hyperplane
  • Kernel PCA also considers the variance of the data
    distribution
  • The two SV methods fail if the distribution doesn't
    fit the model
  • Kernel PCA is more flexible
  • The decision boundary of kernel PCA is in general
    tighter

7
Measure for Novelty
  • Motivation (cont'd)

8
Measure for Novelty
  • Spherical Potential
  • With no principal components (q = 0)
  • The reconstruction error reduces to a spherical
    potential in feature space
  • Spherical potential: the squared distance of a point
    from the origin of the centered feature space,
    p(z) = ‖Φ~(z)‖²
         = k(z, z) − (2/n) Σ_i k(z, x_i)
           + (1/n²) Σ_{i,j} k(x_i, x_j)
  • For an RBF kernel, k(z, z) = 1 and the last term is
    constant, so the potential is equivalent to the
    Parzen window density estimator (see the sketch below)
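A sketch of the spherical potential, assuming the rbf_kernel helper
above; for an RBF kernel the k(z, z) term equals 1 and the last term
is a constant, so the potential is, up to constants, the negative of
a Parzen window density estimate:

    import numpy as np

    def spherical_potential(Z, X, sigma):
        """p(z) = ||Phi~(z)||^2 for each row z of Z, given training data X."""
        k_zx = rbf_kernel(Z, X, sigma)   # k(z, x_i), shape (m, n)
        K = rbf_kernel(X, X, sigma)      # k(x_i, x_j), shape (n, n)
        # k(z, z) = 1 for an RBF kernel; the last term is a constant
        return 1.0 - 2.0 * k_zx.mean(axis=1) + K.mean()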
9
Measure for Novelty
  • Reconstruction Error
  • Original (centered) point: Φ~(z)
  • Reconstructed point: the projection onto the leading
    eigenvectors, Φ^(z) = W Wᵀ Φ~(z),
    where W is the matrix of the first q eigenvectors
  • Reconstruction error: the squared distance between the
    original and the reconstructed point,
    e(z) = ‖(I − W Wᵀ) Φ~(z)‖²
         = ‖Φ~(z)‖² − Σ_{l=1}^q f_l(z)²,
    i.e., the spherical potential minus the projections
    onto the first q components
10
Measure for Novelty
  • Reconstruction Error (cont'd)
  • Reconstruction error in its final form, using kernel
    evaluations only:
    e(z) = k(z, z) − (2/n) Σ_i k(z, x_i)
           + (1/n²) Σ_{i,j} k(x_i, x_j) − Σ_{l=1}^q f_l(z)²,
    where f_l(z) = Σ_i α_i^l k~(z, x_i)
    (a code sketch follows)
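The final form translates directly into code. A sketch assuming the
rbf_kernel and kernel_pca_fit helpers above (names and signatures are
illustrative):

    import numpy as np

    def reconstruction_error(Z, X, sigma, alphas, K):
        """e(z) for each row z of Z; alphas and K come from kernel_pca_fit."""
        k_zx = rbf_kernel(Z, X, sigma)                    # k(z, x_i)
        # Centered test kernel k~(z, x_i)
        k_zx_c = (k_zx
                  - k_zx.mean(axis=1, keepdims=True)
                  - K.mean(axis=0)[None, :]
                  + K.mean())
        f = k_zx_c @ alphas                               # f_l(z), shape (m, q)
        # Spherical potential minus the part captured by the first
        # q principal components
        potential = 1.0 - 2.0 * k_zx.mean(axis=1) + K.mean()
        return potential - np.sum(f**2, axis=1)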

11
Experiments
  • Datasets
  • Five synthetic datasets
  • Square, Square-noise, Ring-line-square, Spiral,
    Sine-noise
  • Digit 0 from the MNIST digit database
  • 784-pixel (28×28) images of the digit 0
  • Subsampled to 64 (8×8) pixels
  • Training data: the first 2,000 0s from the
    training set
  • Test data: 980 0s and 109 samples from each of the
    other digits from the test set
  • Cancer from the UCI machine learning repository
  • Two classes (benign and malignant) are classified
    based on 9 input variables
  • Patterns with missing values were removed
  • Training data: the first 200 benign samples
  • Test data: 244 benign and 239 malignant samples

12
Experiments
  • Implementation and Evaluation
  • Novelty detectors
  • Kernel PCA with an RBF kernel (a polynomial
    kernel in a few cases)
  • 1-SVM with an RBF kernel
  • Parzen density estimator with an RBF kernel
    (equivalent to the spherical potential)
  • Linear PCA (equivalent to kernel PCA with a
    linear kernel)
  • Evaluation
  • Synthetic datasets: qualitative evaluation
  • Real-world datasets: ROC curve and AUROC
    (an end-to-end sketch follows)
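A hypothetical end-to-end run on synthetic data, assuming the helpers
sketched above; the data, the parameter values, and the use of
sklearn's roc_auc_score are illustrative, not the paper's setup:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 9))                       # normal class only
    X_test = np.vstack([rng.normal(size=(100, 9)),            # normal
                        rng.normal(loc=3.0, size=(100, 9))])  # novel
    y_test = np.r_[np.zeros(100), np.ones(100)]               # 1 = novel

    sigma, q = 2.0, 20                                        # illustrative values
    K, alphas = kernel_pca_fit(X_train, sigma, q)
    scores = reconstruction_error(X_test, X_train, sigma, alphas, K)
    print("AUROC:", roc_auc_score(y_test, scores))            # error = novelty score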

13
Experiments
  • Square Dataset
  • 400 training points
  • Linear PCA
  • Cannot describe the data
  • Parzen density estimator
  • Follows the irregularities (overfitting)
  • 1-SVM
  • Omitted (similar to kernel PCA)
  • Kernel PCA, polynomial kernel
  • Cannot describe the data
  • Kernel PCA, RBF kernel
  • Follows the shape of the distribution

14
Experiments
  • Ring-Line-Square and Spiral Datasets

[Figure: Ring-line-square (850 training points; σ = 0.4,
q = 40) and Spiral (700 training points; σ = 0.25, q = 40)]
15
Experiments
  • Square-Noise Dataset
  • ν: the fraction of noise points
  • Kernel PCA
  • σ = 0.3, q = 20
  • A fraction ν of the training data is rejected as
    outliers
  • Encloses the main part of the data
  • Undisturbed by the noise
  • 1-SVM
  • σ = 0.362, ν ≈ 1/9
  • Deformed decision boundary
  • Disturbed by the noise

16
Experiments
  • Sine-Noise Dataset

[Figure: kernel PCA (σ = 0.4, q = 40) vs. 1-SVM
(σ = 0.489, ν = 2/7)]
17
Experiments
  • Effects of σ and q
  • Kernel PCA depends on σ and q
  • For small σ, q has little effect
  • Increasing both leads to good performance
  • When σ is too small
  • k(x_i, x_j) ≈ 0 for all i ≠ j
  • All mapped points are nearly orthogonal to each other
  • PCA becomes meaningless
  • When σ is too large
  • Kernel PCA approaches linear PCA
    (a numeric check follows)
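A small numeric check of these two limits, assuming the rbf_kernel
helper above (illustrative only): for tiny σ the off-diagonal kernel
values vanish, so the mapped points are mutually orthogonal, while
for huge σ they approach 1 and kernel PCA degenerates toward the
linear case:

    import numpy as np

    X = np.random.default_rng(1).normal(size=(50, 2))
    mask = ~np.eye(50, dtype=bool)                  # off-diagonal entries
    for sigma in (1e-3, 1.0, 1e3):
        K = rbf_kernel(X, X, sigma)
        print(f"sigma={sigma:g}: mean off-diagonal k = {K[mask].mean():.4f}")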

18
Experiments
  • Results on Real-world Datasets

[Figure: ROC curves on the digit dataset (σ = 4, q = 100)
and the cancer dataset (σ = 2, q = 190)]
19
Experiments
  • Results on Real-world Datasets
  • For small σ, kernel PCA and the Parzen estimator are
    equivalent

20
Experiments
  • Results on Real-world Datasets
  • For large σ, kernel PCA and linear PCA are
    equivalent

21
Experiments
  • Results on Real-world Datasets
  • The most unusual 0s, based on their reconstruction
    errors
  • Most of these indeed look unusual

[Figure: the most unusual 0s; σ = 4, q = 100]
22
Discussion
  • Noisy Data
  • Kernel PCA is not robust against noise
  • Robust variants of PCA can also be applied to
    kernel PCA
  • In the experiments, however, kernel PCA was more
    robust than 1-SVM
  • Computational Complexity
  • Computationally expensive: O(n³) training
  • Memory intensive: an n×n kernel matrix
  • Testing is also expensive
  • Time elapsed on the digit dataset (sec)
  • 1-SVM: 1.3 (training), 0.5 (test)
  • Kernel PCA: 31.6 (training), 34.4 (test)
  • But the 1-SVM needs to be retrained for different
    ν values

23
Discussion
  • Related Methods
  • Denoising
  • Kernel whitening
  • To make the variance in each direction equal
  • Whitening the data in feature space using kernel
    PCA
  • Training SVDD with the whitened data

24
Conclusions
  • Summary
  • Kernel PCA for novelty detection
  • Reconstruction error as a measure of novelty
  • Good performance on synthetic and real-world
    datasets
  • Future Work
  • Parameter selection
  • What data distributions can kernel PCA learn?