Kernel PCA for Novelty Detection


1
Kernel PCA for Novelty Detection
  • Heiko Hoffmann
  • To Appear in Pattern Recognition
  • Summarized by Hyoung-joo Lee

2
Introduction
  • Novelty Detection (One-class Classification)
  • A machine learns from ordinary (normal) data
  • The goal is to detect novel data that differ from the
    normal ones
  • Useful when
  • Novel data are rare, e.g., healthy tissue vs.
    malignant cancer
  • The structure of the novel data is unclear
  • Applications
  • Medical diagnosis
  • Fault detection
  • Fraud detection

3
Introduction
  • Existing Novelty Detectors
  • Kernel methods
  • Support vector machines (SVMs)
  • For novelty detection: 1-SVM and SVDD
  • Distribution modeling
  • Linear approach: PCA
  • Non-linear approaches: Gaussian mixture, AAMLP
    (auto-associative MLP), principal curve/surface
  • Kernel PCA
  • Kernel method + PCA
  • Reconstruction error in feature space as a novelty
    measure
  • Simple, yet not reported before

4
Kernel PCA
  • Outline
  • A non-linear extension of standard PCA
  • A data point x is mapped into a high-dimensional
    feature space: x → Φ(x)
  • PCA is performed in the feature space
  • Kernel trick: an inner product in the feature space
    is computed by a kernel function,
    k(x, y) = Φ(x)·Φ(y)
  • RBF kernel: k(x, y) = exp(−‖x − y‖² / (2σ²))
    (a minimal sketch follows)
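A minimal NumPy sketch of the RBF kernel above; the helper name
rbf_kernel is illustrative (not from the paper) and is reused by the
later sketches:

    import numpy as np

    def rbf_kernel(X, Y, sigma):
        """k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for all row pairs."""
        # Pairwise squared Euclidean distances via the expansion
        # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
        sq_dists = (np.sum(X**2, axis=1)[:, None]
                    + np.sum(Y**2, axis=1)[None, :]
                    - 2.0 * X @ Y.T)
        return np.exp(-sq_dists / (2.0 * sigma**2))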

5
Kernel PCA
  • Formulation
  • Centering: Φ~(x_i) = Φ(x_i) − (1/n) Σ_j Φ(x_j)
  • Kernel matrix: K_ij = k(x_i, x_j), double-centered to
    K~ so that it corresponds to the centered points
  • Covariance matrix: C = (1/n) Σ_i Φ~(x_i) Φ~(x_i)ᵀ
  • An eigenvector of C lies in the span of the mapped
    points: V = Σ_i α_i Φ~(x_i)
  • Eigen-problem: C V = λ V

New eigen-problem, in terms of the kernel matrix only:
K~ α = n λ α (sketched in code below)
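A NumPy sketch of this formulation, assuming the rbf_kernel helper
from the previous sketch; the function name kernel_pca_fit and its
return values are illustrative, not from the paper:

    import numpy as np

    def kernel_pca_fit(X, sigma, q):
        """Solve the kernel PCA eigen-problem K~ alpha = n*lambda*alpha."""
        n = X.shape[0]
        K = rbf_kernel(X, X, sigma)
        # Double-centering: the kernel matrix of the centered mapped points
        one_n = np.ones((n, n)) / n
        K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
        # eigh returns eigenvalues in ascending order; flip to descending
        eigvals, eigvecs = np.linalg.eigh(K_c)
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
        # Scale each alpha so the corresponding eigenvector V of C has
        # unit length (q must stay below the numerical rank of K_c)
        alphas = eigvecs[:, :q] / np.sqrt(eigvals[:q])
        return K, alphas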
6
Measure for Novelty
  • Motivation
  • Decision boundary of kernel PCA
  • The decision boundary is based on the reconstruction
    error
  • Reconstruction error: the distance between the
    original and the reconstructed point
  • Kernel PCA vs. 1-SVM/SVDD
  • The two SV methods only enclose the data with a
    hypersphere/hyperplane
  • Kernel PCA also considers the variance of the data
    distribution
  • The two SV methods fail if the distribution doesn't
    fit the model
  • Kernel PCA is more flexible
  • The decision boundary of kernel PCA is in general
    tighter

7
Measure for Novelty
  • Motivation (cont'd)

8
Measure for Novelty
  • Spherical Potential
  • With no principal components (q = 0)
  • The reconstruction error reduces to a spherical
    potential in feature space
  • Spherical potential: the squared distance of a point
    from the origin of the centered feature space,
    p(z) = ‖Φ~(z)‖²
         = k(z, z) − (2/n) Σ_i k(z, x_i)
           + (1/n²) Σ_{i,j} k(x_i, x_j)
  • For an RBF kernel, k(z, z) = 1 and the last term is
    constant, so the potential is equivalent to the
    Parzen window density estimator (see the sketch below)
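A sketch of the spherical potential, assuming the rbf_kernel helper
above; for an RBF kernel the k(z, z) term equals 1 and the last term
is a constant, so the potential is, up to constants, the negative of
a Parzen window density estimate:

    import numpy as np

    def spherical_potential(Z, X, sigma):
        """p(z) = ||Phi~(z)||^2 for each row z of Z, given training data X."""
        k_zx = rbf_kernel(Z, X, sigma)   # k(z, x_i), shape (m, n)
        K = rbf_kernel(X, X, sigma)      # k(x_i, x_j), shape (n, n)
        # k(z, z) = 1 for an RBF kernel; the last term is a constant
        return 1.0 - 2.0 * k_zx.mean(axis=1) + K.mean()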
9
Measure for Novelty
  • Reconstruction Error
  • Original (centered) point: Φ~(z)
  • Reconstructed point: the projection onto the leading
    eigenvectors, Φ^(z) = W Wᵀ Φ~(z),
    where W is the matrix of the first q eigenvectors
  • Reconstruction error: the squared distance between the
    original and the reconstructed point,
    e(z) = ‖(I − W Wᵀ) Φ~(z)‖²
         = ‖Φ~(z)‖² − Σ_{l=1}^q f_l(z)²,
    i.e., the spherical potential minus the projections
    onto the first q components
10
Measure for Novelty
  • Reconstruction Error (cont'd)
  • Reconstruction error in its final form, using kernel
    evaluations only:
    e(z) = k(z, z) − (2/n) Σ_i k(z, x_i)
           + (1/n²) Σ_{i,j} k(x_i, x_j) − Σ_{l=1}^q f_l(z)²,
    where f_l(z) = Σ_i α_i^l k~(z, x_i)
    (a code sketch follows)
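The final form translates directly into code. A sketch assuming the
rbf_kernel and kernel_pca_fit helpers above (names and signatures are
illustrative):

    import numpy as np

    def reconstruction_error(Z, X, sigma, alphas, K):
        """e(z) for each row z of Z; alphas and K come from kernel_pca_fit."""
        k_zx = rbf_kernel(Z, X, sigma)                    # k(z, x_i)
        # Centered test kernel k~(z, x_i)
        k_zx_c = (k_zx
                  - k_zx.mean(axis=1, keepdims=True)
                  - K.mean(axis=0)[None, :]
                  + K.mean())
        f = k_zx_c @ alphas                               # f_l(z), shape (m, q)
        # Spherical potential minus the part captured by the first
        # q principal components
        potential = 1.0 - 2.0 * k_zx.mean(axis=1) + K.mean()
        return potential - np.sum(f**2, axis=1)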

11
Experiments
  • Datasets
  • Five synthetic datasets
  • Square, Square-noise, Ring-line-square, Spiral,
    Sine-noise
  • Digit 0 from the MNIST digit database
  • 784-pixel (28×28) images of the digit 0
  • Subsampled to 64 (8×8) pixels
  • Training data: the first 2,000 0s from the
    training set
  • Test data: 980 0s and 109 samples from each of the
    other digits from the test set
  • Cancer from the UCI machine learning repository
  • Two classes (benign and malignant) are classified
    based on 9 input variables
  • Patterns with missing values were removed
  • Training data: the first 200 benign samples
  • Test data: 244 benign and 239 malignant samples

12
Experiments
  • Implementation and Evaluation
  • Novelty detectors
  • Kernel PCA with an RBF kernel (a polynomial
    kernel in a few cases)
  • 1-SVM with an RBF kernel
  • Parzen density estimator with an RBF kernel
    (equivalent to the spherical potential)
  • Linear PCA (equivalent to kernel PCA with a
    linear kernel)
  • Evaluation
  • Synthetic datasets: qualitative evaluation
  • Real-world datasets: ROC curve and AUROC
    (an end-to-end sketch follows)
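A hypothetical end-to-end run on synthetic data, assuming the helpers
sketched above; the data, the parameter values, and the use of
sklearn's roc_auc_score are illustrative, not the paper's setup:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 9))                       # normal class only
    X_test = np.vstack([rng.normal(size=(100, 9)),            # normal
                        rng.normal(loc=3.0, size=(100, 9))])  # novel
    y_test = np.r_[np.zeros(100), np.ones(100)]               # 1 = novel

    sigma, q = 2.0, 20                                        # illustrative values
    K, alphas = kernel_pca_fit(X_train, sigma, q)
    scores = reconstruction_error(X_test, X_train, sigma, alphas, K)
    print("AUROC:", roc_auc_score(y_test, scores))            # error = novelty score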

13
Experiments
  • Square Dataset
  • 400 training points
  • Linear PCA
  • Cannot describe the data
  • Parzen density estimator
  • Follows the irregularities (overfitting)
  • 1-SVM
  • Omitted (similar to kernel PCA)
  • Kernel PCA, polynomial kernel
  • Cannot describe the data
  • Kernel PCA, RBF kernel
  • Follows the shape of the distribution

14
Experiments
  • Ring-Line-Square and Spiral Datasets

[Figure: Ring-line-square (850 training points; σ = 0.4,
q = 40) and Spiral (700 training points; σ = 0.25, q = 40)]
15
Experiments
  • Square-Noise Dataset
  • ν: the fraction of noise points
  • Kernel PCA
  • σ = 0.3, q = 20
  • A fraction ν of the training data is rejected as
    outliers
  • Encloses the main part of the data
  • Undisturbed by the noise
  • 1-SVM
  • σ = 0.362, ν ≈ 1/9
  • Deformed decision boundary
  • Disturbed by the noise

16
Experiments
  • Sine-Noise Dataset

[Figure: kernel PCA (σ = 0.4, q = 40) vs. 1-SVM
(σ = 0.489, ν = 2/7)]
17
Experiments
  • Effects of σ and q
  • Kernel PCA depends on σ and q
  • For small σ, q has little effect
  • Increasing both leads to good performance
  • When σ is too small
  • k(x_i, x_j) ≈ 0 for all i ≠ j
  • All mapped points are nearly orthogonal to each other
  • PCA becomes meaningless
  • When σ is too large
  • Kernel PCA approaches linear PCA
    (a numeric check follows)
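A small numeric check of these two limits, assuming the rbf_kernel
helper above (illustrative only): for tiny σ the off-diagonal kernel
values vanish, so the mapped points are mutually orthogonal, while
for huge σ they approach 1 and kernel PCA degenerates toward the
linear case:

    import numpy as np

    X = np.random.default_rng(1).normal(size=(50, 2))
    mask = ~np.eye(50, dtype=bool)                  # off-diagonal entries
    for sigma in (1e-3, 1.0, 1e3):
        K = rbf_kernel(X, X, sigma)
        print(f"sigma={sigma:g}: mean off-diagonal k = {K[mask].mean():.4f}")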

18
Experiments
  • Results on Real-world Datasets

[Figure: ROC curves on the digit dataset (σ = 4, q = 100)
and the cancer dataset (σ = 2, q = 190)]
19
Experiments
  • Results on Real-world Datasets
  • For small σ, kernel PCA and the Parzen estimator are
    equivalent

20
Experiments
  • Results on Real-world Datasets
  • For large σ, kernel PCA and linear PCA are
    equivalent

21
Experiments
  • Results on Real-world Datasets
  • The most unusual 0s, based on their reconstruction
    errors
  • Most of these indeed look unusual

[Figure: the most unusual 0s; σ = 4, q = 100]
22
Discussion
  • Noisy Data
  • Kernel PCA is not robust against noise
  • Robust variants of PCA can also be applied to
    kernel PCA
  • In the experiments, however, kernel PCA was more
    robust than 1-SVM
  • Computational Complexity
  • Computationally expensive: O(n³) training
  • Memory intensive: an n×n kernel matrix
  • Testing is also expensive
  • Time elapsed on the digit dataset (sec)
  • 1-SVM: 1.3 (training), 0.5 (test)
  • Kernel PCA: 31.6 (training), 34.4 (test)
  • But the 1-SVM needs to be retrained for different
    ν values

23
Discussion
  • Related Methods
  • Denoising
  • Kernel whitening
  • To make the variance in each direction equal
  • Whitening the data in feature space using kernel
    PCA
  • Training SVDD with the whitened data

24
Conclusions
  • Summary
  • Kernel PCA for novelty detection
  • Reconstruction error as a measure of novelty
  • Good performance on synthetic and real-world
    datasets
  • Future Work
  • Parameter selection
  • What data distributions can kernel PCA learn?