Multivariate classification. SIMCA

1
Multivariate classification. SIMCA
  • Sergey Kucheryavskiy
  • svk_at_aaue.dk

2
Lecture outline
  • Basic theory
  • What is classification?
  • Types and stages
  • Geometrical view
  • How to estimate classification results?
  • Multivariate classification with SIMCA
  • Introduction
  • Examples
  • Conclusions

3
Part I. Basic theory
4
Classification questions and answers
  • How to recognize fake pills by their spectra?
  • How to distinguish different types of glass
    from their metal oxide content?
  • How to recognize a human personality
    (introvert/extravert) using questionnaire answers?

Classification: discrimination (arranging) of
objects into several groups (classes) by finding
analogies in their feature values
5
What do we have?
  • Object: anything (person, object, phenomenon,
    process...)
  • Features: a set of variables and their values
    describing the object
  • Group or class: a set of objects having similar
    (analogous) features
  • Example (People dataset)

Object: a person. Features: height, weight, hair
length, swimming ability, ... Groups: sex,
location
6
Geometrical view
  • Features: variables, axes of the coordinate space
  • Objects: points in this space
  • Classes: subspaces of the variable space
    (hypercube, hypersphere, etc.)

[Figure: objects shown as points in the feature space, with regions labelled Class 1 and Class 2]
7
Classification methods, by level of knowledge
Unsupervised classification
  • Conditions: no information about whether there are
    any classes, or how many
    Methods: looking for analogies in the objects'
    locations in feature space
    Aim: looking for groupings of objects
  • Conditions: we know how many classes there are, but
    not which class an object belongs to
    Methods: looking for analogies in the objects'
    locations in feature space
    Aim: find the reason for the grouping, cluster the data
Supervised classification
  • Conditions: we have a calibration dataset with known
    classes and objects
    Methods: calibrate a classification model
    Aim: predict the class of unknown samples
8
Example: linear discriminant analysis
Step 1. Initial data: 26 samples, 2 features; we
know nothing about any groups
[Figure: objects plotted in the space of the two features]
9
Example: linear discriminant analysis
Step 2. Visual analysis of the geometrical
representation of the data gives us
information about the grouping
Step 3. The reason for the grouping is found: a
certain combination of variable values
10
Example: linear discriminant analysis
  • Step 4. Building a classification model
  • Find the equation of the separation line
  • Find the equation of the perpendicular line
  • Find the equation for projecting the samples

[Figure: samples projected onto the y axis, with the regions y < 0 and y > 0 marked]
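Step 4 can be sketched in a few lines of Python. This is a simplified illustration, not the procedure from the lecture: it assumes the separation line is the perpendicular bisector of the segment joining the two class centroids, whereas full linear discriminant analysis would also use the within-class covariance. The data and the function names (`centroid`, `fit_projection`, `project`) are made up for the example.

```python
# Simplified sketch (assumption): the separation line is the perpendicular
# bisector of the segment joining the two class centroids.

def centroid(points):
    """Mean of a list of (x1, x2) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def fit_projection(class_a, class_b):
    """Return (w, midpoint): w is the direction perpendicular to the
    separation line, midpoint is a point lying on that line."""
    ca, cb = centroid(class_a), centroid(class_b)
    w = (cb[0] - ca[0], cb[1] - ca[1])
    midpoint = ((ca[0] + cb[0]) / 2, (ca[1] + cb[1]) / 2)
    return w, midpoint

def project(sample, w, midpoint):
    """Signed projection y: y < 0 on class_a's side, y > 0 on class_b's side."""
    return w[0] * (sample[0] - midpoint[0]) + w[1] * (sample[1] - midpoint[1])

# Made-up illustrative data, not the 26 samples from the slides.
class_a = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)]
class_b = [(5.0, 5.0), (5.5, 6.0), (6.0, 5.5)]

w, mid = fit_projection(class_a, class_b)
y_new = project((1.2, 1.8), w, mid)
label = "class A" if y_new < 0 else "class B"
print(y_new, label)
```

The sign of the projection y is the whole classification rule here: a new sample is assigned by checking on which side of zero it falls.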
11
Classification methods
  • One-class classification
  • A classification model is calibrated for each
    class
  • The model gives a binary prediction
  • 0: the sample belongs to the class
  • 1: the sample does not belong to the class
  • Multi-class classification
  • A model describing several classes is calibrated
  • The model gives a class number as its prediction
  • Multi-class classification can be done using
    several one-class classification models!
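The last point can be illustrated with a small sketch. The one-class "models" below are hypothetical stand-ins (simple rules on a single number), and the code uses Python booleans (True means the sample is accepted) rather than the 0/1 coding above. The only idea it is meant to show is that each model is asked independently, so a sample can end up in one class, several classes, or none.

```python
def classify(sample, one_class_models):
    """Return the names of all classes whose one-class model accepts the sample."""
    return [name for name, accepts in one_class_models.items() if accepts(sample)]

# Hypothetical one-class models: each maps a sample to True (accepted) or False.
one_class_models = {
    "small": lambda x: x < 5,
    "medium": lambda x: 3 <= x <= 8,
    "large": lambda x: 9 < x < 20,
}

print(classify(4, one_class_models))    # accepted by two models
print(classify(6, one_class_models))    # accepted by one model
print(classify(8.5, one_class_models))  # accepted by no model
```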

12
The world is not so green and easy
In real data, samples are often not clearly
separated, and there may be outliers. How do we
estimate the quality of classification?
13
Classification errors
  • Classification errors
  • Type I errors: false negatives; a sample
    belongs to the class, but the model said no
  • Type II errors: false positives; a sample does
    not belong to the class, but the model said yes
  • Decreasing type I errors increases type II
    errors, and vice versa

The choice depends on the problem!
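The trade-off can be seen by counting both error types while moving an acceptance threshold. The sketch below assumes a one-class model that accepts a sample when its distance to the class is at most `threshold`; the distances, the membership flags and the function name `count_errors` are hypothetical.

```python
def count_errors(distances, is_member, threshold):
    """Accept a sample into the class when its distance <= threshold.

    Returns (type_1, type_2) in the sense used above:
    type I  = class member rejected (false negative),
    type II = non-member accepted (false positive).
    """
    type_1 = sum(1 for d, m in zip(distances, is_member) if m and d > threshold)
    type_2 = sum(1 for d, m in zip(distances, is_member) if not m and d <= threshold)
    return type_1, type_2

# Hypothetical distances of six samples to a class model.
distances = [0.5, 0.9, 1.4, 1.1, 1.8, 2.5]
is_member = [True, True, True, False, False, False]

for threshold in (1.0, 1.5, 2.0):
    print(threshold, count_errors(distances, is_member, threshold))
```

With this made-up data, raising the threshold removes the type I error but lets in more and more non-members.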
14
Classification errors
  • Decrease type I errors when it is very
    important not to miss a class sample
  • Examples: recognition of hazardous substances,
    medical diagnosis
  • Decrease type II errors when it is more
    important not to accept a sample that does not
    belong to the class
  • Example: legal procedure (presumption of
    innocence)

15
How to compare classification models?
16
Back to our example
  • Find the centroids (centers) of the classes
  • Calculate the distances from the centroids to the samples
  • Analyze the distance plot
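A minimal sketch of the first two steps, assuming plain Euclidean distance (the slides do not say which distance is used) and made-up two-feature data:

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[k] for p in points) / n for k in range(dim)]

def distance(a, b):
    """Euclidean distance between two points (an assumption of this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Made-up illustrative data.
class_1 = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
class_2 = [(6.0, 6.0), (7.0, 6.0), (6.0, 7.0), (7.0, 7.0)]

c1, c2 = centroid(class_1), centroid(class_2)

# One (distance to class 1, distance to class 2) pair per sample;
# these pairs are what a distance plot displays.
distance_pairs = [(distance(s, c1), distance(s, c2)) for s in class_1 + class_2]
for pair in distance_pairs:
    print(pair)
```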

17
Coomans (distance) plot
  • Axes: distance to the center of class 1 and
    distance to the center of class 2
  • Samples of class 1: far from class 2, close to
    class 1
  • Samples of class 2: far from class 1, close to
    class 2
  • Samples belonging to both classes: close to
    class 1, close to class 2
  • Outliers: far from class 2, far from class 1
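The four regions of the plot amount to two threshold checks. In the sketch below the limits `limit_1` and `limit_2` are hypothetical values chosen by the analyst; how they are derived (for example from a significance level) is outside this example.

```python
def coomans_region(d1, d2, limit_1, limit_2):
    """Assign a sample to a region of the Coomans plot.

    d1, d2: distances to class 1 and class 2.
    limit_1, limit_2: hypothetical acceptance limits for each class.
    """
    close_1 = d1 <= limit_1
    close_2 = d2 <= limit_2
    if close_1 and close_2:
        return "both classes"
    if close_1:
        return "class 1"
    if close_2:
        return "class 2"
    return "outlier"

print(coomans_region(0.5, 4.0, 1.0, 1.0))
print(coomans_region(4.0, 0.5, 1.0, 1.0))
print(coomans_region(0.8, 0.9, 1.0, 1.0))
print(coomans_region(3.0, 3.5, 1.0, 1.0))
```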
18
Coomans plot
[Figure: Coomans plot with axes "Distance to the center of class 1" and "Distance to the center of class 2"; labels S1 and S2]
19
Conclusions
  • Classification: the process of arranging objects
    into two or more groups (classes)
  • Supervised classification involves:
  • Building a classification model using a calibration
    set
  • Estimating the discrimination power of the model
  • Using the model to predict the class of new, unknown
    samples

20
Classification methods
  • Simple
  • Linear and quadratic discriminant analysis
  • K-nearest neighbors
  • Cluster analysis
  • More complex
  • Bayesian classification
  • Support vector machines
  • Neural networks

21
Known problems
  • Number of features: 10, 100, 1000
  • Visual analysis is not possible
  • It is quite difficult to find which features are
    relevant for the problem
  • Data contains noise and outliers
  • Variables are correlated

How to tackle these problems? Using projection
methods!
22
Part II. Multivariate classification. SIMCA
23
Multivariate classification
  • Main idea: use projections onto latent variables
    instead of the original samples/variables
  • Unsupervised classification
  • PCA + classical algorithms
  • Supervised classification
  • SIMCA
  • PLS-DA

24
SIMCA
  • SIMCA (Soft Independent Modeling of Class
    Analogy)
  • An object may belong to several classes, which is
    quite typical for real data
  • Basic idea: build a separate model for each class
  • Proposed by Svante Wold in the 1970s

25
SIMCA main steps
  • Step one: build a separate PCA model for each class
  • Different models may have different numbers of
    latent variables
  • When calibrating a model, watch out for outliers.
    If data preprocessing is needed, it should be the
    same for each model

26
SIMCA main steps
  • Step two: apply the PCA models to each sample
    and analyze distances and plots
  • Distance between models
  • Distance from model to sample
  • Leverage of sample for each model
  • Coomans plot
  • And
  • Modeling power of variables
  • Discrimination power of variables
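The two steps can be sketched with NumPy. This is a bare-bones illustration under stated assumptions, not a full SIMCA implementation: each class model is a PCA fitted by SVD on mean-centered data, the "distance from model to sample" is taken as the plain Euclidean norm of the sample's residual after projection, and no statistical limits, leverage or scaling are included. The synthetic data and the function names are made up.

```python
import numpy as np

def fit_class_model(X, n_components):
    """Step one: calibrate a PCA model on the samples of a single class."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # center on the class mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T            # loadings, shape (n_variables, n_components)
    return mean, P

def distance_to_model(model, x):
    """Step two: residual distance from a sample to a class model
    (plain Euclidean norm of the residual; an assumption of this sketch)."""
    mean, P = model
    xc = x - mean
    residual = xc - (xc @ P) @ P.T     # part of x the model cannot describe
    return float(np.linalg.norm(residual))

# Synthetic data: each class lies close to its own line in 3-D space.
rng = np.random.default_rng(0)
X1 = (2 * rng.normal(size=(20, 1)) * np.array([1.0, 1.0, 0.0])
      + 0.05 * rng.normal(size=(20, 3)))
X2 = (np.array([5.0, 5.0, 5.0])
      + 2 * rng.normal(size=(20, 1)) * np.array([0.0, 1.0, 1.0])
      + 0.05 * rng.normal(size=(20, 3)))

models = {"class 1": fit_class_model(X1, 1), "class 2": fit_class_model(X2, 1)}

new_sample = np.array([1.0, 1.0, 0.0])
distances = {name: distance_to_model(m, new_sample) for name, m in models.items()}
print(distances)
```

The pair of distances computed for each sample is what a Coomans plot displays; each model could also use a different number of components, as noted in step one.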

27
SIMCA models
28
Glass dataset
  • Pieces of two types of glass
  • Vehicle headlights
  • Street lamps

39 samples × 5 variables
29
PCA analysis
How to build a classification model?
30
SIMCA: calibrate separate PCA models
31
SIMCA results
  • Discrimination power: shows the ability of each
    variable to discriminate between two models
  • Modelling power: shows the influence of each
    variable on the model

32
SIMCA results
  • Model distance: shows how different the models
    are from each other
  • Coomans plot: distance from samples to two models

33
SIMCA results
  • Distance to the model and leverage

34
SIMCA results
  • Distance to the model and leverage

35
SIMCA results
  • Classification table

36
SIMCA conclusions
  • A simple and efficient method for supervised
    classification
  • Allows outliers to be found and excluded at the
    calibration stage
  • Allows the models for each class to be compared
  • Allows a sample to be assigned to several classes
  • One-class classification needs only a
    calibration set with samples from that class!

37
Example 1
  • Wine dataset
  • 178 samples: three types of wine grown in the
    same region in Italy but derived from three
    different cultivars
  • Calibration set: 166 samples
  • Test set: 12 samples
  • 13 variables: Alcohol, Malic acid, Ash,
    Alcalinity of ash, Magnesium, Total phenols,
    Flavanoids, Nonflavanoid phenols, Proanthocyanins,
    Color intensity, Hue, OD280/OD315, Proline

38
Example 2
  • SOT dataset (pills)
  • 30 samples: NIR spectra of three types of pills
  • genuine: originals
  • analogue: the same medicine, but produced by
    another company
  • fake: counterfeit pills
  • 1500 X-variables: NIR spectra of the pills

39
SIMCA step by step
  • Data preprocessing
  • Projection methods are very sensitive to data
    preprocessing. If there is no a priori
    information, try at least centering and
    autoscaling
  • Preliminary analysis
  • Build and analyze a PCA model for the whole dataset:
    are there any groups, outliers or other
    irregularities?
  • Separate class modeling
  • Calibrate separate PCA models for the samples of
    each class. Analyze scores and loadings plots for
    outliers and other anomalies. Save the models.
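The preprocessing step can be sketched as follows, assuming NumPy and a hypothetical helper name `autoscale`. Centering subtracts each variable's mean and autoscaling divides by its standard deviation; the point worth showing is that both statistics are computed on the calibration set and then reused unchanged for new samples.

```python
import numpy as np

def autoscale(X_cal, X_new):
    """Center and scale to unit variance, using calibration statistics only.

    Assumes no variable is constant in X_cal (its standard deviation would
    be zero); such variables should be removed beforehand.
    """
    mean = X_cal.mean(axis=0)
    std = X_cal.std(axis=0, ddof=1)
    return (X_cal - mean) / std, (X_new - mean) / std

# Made-up data: two variables on very different scales.
X_cal = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 20.0]])

cal_scaled, new_scaled = autoscale(X_cal, X_new)
print(cal_scaled)
print(new_scaled)
```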

40
SIMCA step by step
  • Analyze class models
  • Use a separate test set to find out how good your
    models are. Classify samples from the test set
    and analyze all plots: Coomans, leverage vs.
    model distance, distance between models and so
    on. Set a proper value for the significance level.
  • Classification of unknown samples
  • Simply use your model to classify new, unknown
    samples, keeping in mind the value of possible
    classification errors for your model.