Dual data driven SIMCA as a one-class classifier

Alexey Pomerantsev, ICP RAS. WSC-9, 20.02.14.


Transcript and Presenter's Notes

1
Dual data driven SIMCA as a one-class classifier
Alexey Pomerantsev, ICP RAS
2
One-class classifier, e.g. SIMCA
3
Standard bi-variate normal distribution
4
Extremes and Outliers
Significance levels: 0.01 and 0.05
α is the extreme significance
γ is the outlier significance
5
Extreme plot
6
Principal Component Analysis
Karl Pearson, 1901
7
Score and Orthogonal Distances
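A minimal sketch of how the two distances could be computed once a PCA decomposition is available. It assumes the usual definitions, which are not spelled out on the slide: the score distance h_i sums the squared scores of object i, each divided by the column sum of squares of that component, and the orthogonal distance v_i is the sum of squared residuals of object i, without further normalisation. The function name and the toy matrices are hypothetical.

```python
def score_and_orthogonal_distances(T, E):
    """Return (h, v) for score rows T (I x A) and residual rows E (I x J)."""
    n_components = len(T[0])
    # Column sums of squares of the scores (the diagonal of T^t T for PCA scores).
    col_ss = [sum(row[a] ** 2 for row in T) for a in range(n_components)]
    # Score distance: squared scores, each scaled by its column sum of squares.
    h = [sum(row[a] ** 2 / col_ss[a] for a in range(n_components)) for row in T]
    # Orthogonal distance: squared length of each residual row.
    v = [sum(e ** 2 for e in row) for row in E]
    return h, v


# Toy example (hypothetical numbers): I = 3 objects, A = 2 components, J = 3 variables.
T = [[1.0, 0.5], [-1.0, 0.5], [0.0, -1.0]]
E = [[0.1, 0.0, -0.1], [0.0, 0.2, 0.0], [0.3, 0.0, 0.0]]
h, v = score_and_orthogonal_distances(T, E)
print(h, v)
```

With this scaling the score distances of the training objects add up to the number of components A, which is a quick sanity check on the computation.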
8
SD and OD distributions
[Panels: OD, SD]
9
Data Driven SIMCA
[Axes: SD, OD]
10
Total Distance
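A minimal sketch, assuming the total distance takes the form usually given in the DD-SIMCA literature: c = N_h h / h0 + N_v v / v0, where h0 and v0 are scaling factors and N_h and N_v are the degrees of freedom of the score and orthogonal distances. Under that assumption c is treated as approximately chi-squared with N_h + N_v degrees of freedom. The function name and the numbers are hypothetical.

```python
def total_distance(h, v, h0, v0, n_h, n_v):
    """Combine a score distance h and an orthogonal distance v into one statistic."""
    return n_h * h / h0 + n_v * v / v0


# Hypothetical values for a single object.
c = total_distance(h=0.04, v=0.5, h0=0.02, v0=1.0, n_h=2, n_v=4)
print(c)
```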
11
Tolerance Areas
α is the extreme significance
γ is the outlier significance
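The two tolerance areas can be sketched as two cutoffs on the total distance c. This is an illustration under stated assumptions, not the slide's own calculation: c is taken as chi-squared with N_h + N_v degrees of freedom, the extreme cutoff is the (1 - α) quantile, and the outlier cutoff is the (1 - γ)^(1/I) quantile for I objects. The quantile uses the Wilson-Hilferty approximation so that the code needs only the standard library; all names and numbers are hypothetical.

```python
from statistics import NormalDist


def chi2_quantile_wh(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty); not exact."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def tolerance_cutoffs(alpha, gamma, n_objects, dof):
    """Return (extreme cutoff, outlier cutoff) for the total distance."""
    c_extreme = chi2_quantile_wh(1.0 - alpha, dof)
    # The outlier level is shared over all objects, so the per-object level is stricter.
    c_outlier = chi2_quantile_wh((1.0 - gamma) ** (1.0 / n_objects), dof)
    return c_extreme, c_outlier


def label(c, c_extreme, c_outlier):
    """Classify one object by its total distance."""
    if c > c_outlier:
        return "outlier"
    if c > c_extreme:
        return "extreme"
    return "regular"


# Illustrative values only: alpha = 0.05, gamma = 0.01, I = 100, N_h + N_v = 6.
c_extreme, c_outlier = tolerance_cutoffs(0.05, 0.01, 100, 6)
print(c_extreme, c_outlier)
print([label(c, c_extreme, c_outlier) for c in (5.0, 15.0, 40.0)])
```

An exact chi-squared quantile from a statistics library would normally replace the approximation; the structure of the two cutoffs stays the same.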
12
Classical Data Driven (CDD) SIMCA
Classical Method of Moments
Given
Then
Where
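A minimal sketch of a classical method-of-moments fit, assuming each distance u (either h or v) follows a scaled chi-squared model in which N u / u0 is chi-squared with N degrees of freedom. Under that assumption the mean of u is u0 and its variance is 2 u0^2 / N, so both parameters follow from the sample mean and variance. The function name and the simulated data are hypothetical.

```python
import random
from statistics import mean, variance


def classical_moments(u):
    """Estimate (u0, N) of a scaled chi-squared model from the mean and variance."""
    u0 = mean(u)
    s2 = variance(u)
    # Var(u) = 2 * u0^2 / N, solved for N and rounded to a whole number of at least 1.
    n_dof = max(1, round(2.0 * u0 ** 2 / s2))
    return u0, n_dof


# Simulated check: u = u0 * chi2(N) / N with u0 = 3 and N = 4 (hypothetical values).
rng = random.Random(1)
u = [3.0 * rng.gammavariate(4 / 2.0, 2.0) / 4 for _ in range(20000)]
u0_hat, n_hat = classical_moments(u)
print(u0_hat, n_hat)
```

Because the mean and variance are not robust, a few outlying objects can shift both estimates, which is the motivation for a robust variant.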
13
Robust Data Driven (RDD) SIMCA
Robust Method of Moments
Given
Then
Where
M = median(u), R = interquartile(u)
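One way to build a robust fit from the median M and the interquartile range R is sketched below; the slide's own formulas may differ. It assumes the same scaled chi-squared model (N u / u0 chi-squared with N degrees of freedom), picks the N whose theoretical ratio of interquartile range to median is closest to R / M, and then sets u0 from the median. Chi-squared quantiles use the Wilson-Hilferty approximation to stay within the standard library. All names and simulated values are hypothetical.

```python
import random
from statistics import NormalDist, quantiles


def chi2_quantile_wh(p, dof):
    """Approximate chi-squared quantile (Wilson-Hilferty); not exact."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * a ** 0.5) ** 3


def robust_moments(u, max_dof=100):
    """Estimate (u0, N) from the sample median and interquartile range."""
    q1, m, q3 = quantiles(u, n=4)  # quartiles; the middle one is the median
    r = q3 - q1

    def iqr_to_median(n):
        q25, q50, q75 = (chi2_quantile_wh(p, n) for p in (0.25, 0.5, 0.75))
        return (q75 - q25) / q50

    # The ratio R / M does not depend on u0, so it identifies N on its own.
    n_dof = min(range(1, max_dof + 1), key=lambda n: abs(iqr_to_median(n) - r / m))
    # Median of u is u0 * q50(N) / N, solved for u0.
    u0 = m * n_dof / chi2_quantile_wh(0.5, n_dof)
    return u0, n_dof


# Simulated check: u0 = 2 and N = 5 (hypothetical values).
rng = random.Random(2)
u = [2.0 * rng.gammavariate(5 / 2.0, 2.0) / 5 for _ in range(5000)]
u0_hat, n_hat = robust_moments(u)
print(u0_hat, n_hat)
```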
14
Dual Data Driven SIMCA
Given
X = TP^t + E, h = (h1, ..., hI), v = (v1, ..., vI)
Then
CDD SIMCA, RDD SIMCA
Yes: CDD SIMCA. No: RDD SIMCA
15
Case study I. Simulated data with outliers
Number of variables: J = 3
Number of objects: I = 100
Number of principal components: A = 2
The ? properties are: E(?) = 0, v11 = v22 = v33 = 0.28, rank(V) = 2.
The ? component properties are: E(?) = 0, ? = 0.05 (first 97 objects); E(?) = 0, ? = 0.2 (last 3 objects).
16
SIMCA plots
17
REFERENCE RDD-SIMCA
18
Totals across 10 data sets with outliers
Expected
19
Case study II. Real world data with 2 groups
Substance in closed PE bags; 82 drums measured by NIR.
246 spectra in total.
Group G1: 200 objects
Group G2: 46 objects
ACA 642 (2009) 222-227
20
Probe position effect
21
Extreme plots
Expected number of extremes: N = αI
Clean subset: G1
Contaminated dataset: G1 + G2
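The comparison behind an extreme plot can be sketched as follows: for a significance level α and I objects, about αI objects are expected beyond the extreme cutoff, and the observed count is compared with that number. The spread used here is the binomial standard deviation, which is an assumption of this sketch (objects treated as independent), not something stated on the slide. Function names and the value of α are hypothetical; I = 200 is the size of group G1.

```python
def expected_extremes(alpha, n_objects):
    """Expected number of extremes and its binomial standard deviation."""
    expected = alpha * n_objects
    spread = (n_objects * alpha * (1.0 - alpha)) ** 0.5
    return expected, spread


def count_extremes(c_values, c_crit):
    """Observed number of objects whose total distance exceeds the cutoff."""
    return sum(1 for c in c_values if c > c_crit)


expected, spread = expected_extremes(alpha=0.05, n_objects=200)
print(expected, spread)

# Hypothetical total distances and cutoff, only to show the counting step.
observed = count_extremes([1.2, 3.4, 9.9, 13.0, 2.2], c_crit=9.5)
print(observed)
```

An observed count far above the expected number, relative to the spread, suggests that the data set is contaminated rather than merely containing its normal share of extremes.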
22
Results of separation
Subset G1 revealed
Subset G2 revealed
23
Reference
24
One-class classification
Alternatives
Type II error 1- Type I error
25
How to find β when the alternative class (AC) is known
Target
Alternative
26
Two-class discrimination: plums vs. apples
27
Errors of Type I and Type II
28
Type II error β
Target
PCA
29
Non-central chi-squared distribution
chi-squared distribution
non-central chi-squared distribution
the noncentrality parameter
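A small simulation can make the role of the noncentrality parameter concrete. The sketch below uses the standard construction: a sum of k squared unit-variance normal variables whose squared means add up to the noncentrality parameter. With zero noncentrality it reduces to the ordinary chi-squared distribution. The function name and the chosen values are hypothetical, and only whole-number degrees of freedom are handled.

```python
import random


def noncentral_chi2_sample(dof, noncentrality, rng):
    """Draw one value from a non-central chi-squared distribution (integer dof)."""
    # Only the sum of squared shifts matters, so put the whole shift on one term.
    shift = noncentrality ** 0.5
    total = (rng.gauss(0.0, 1.0) + shift) ** 2
    for _ in range(dof - 1):
        total += rng.gauss(0.0, 1.0) ** 2
    return total


rng = random.Random(0)
samples = [noncentral_chi2_sample(4, 3.0, rng) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
# Theory: the mean equals dof + noncentrality, here 4 + 3 = 7.
print(sample_mean)
```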
30
Calculation of β
Total distance of Target class (TC)
h0? ,v0?, Nh?, Nv?
31
Case study II. Real world data with 2 groups
Substance in closed PE bags; 82 drums measured by NIR.
246 spectra in total.
Group G1: 200 objects
Group G2: 46 objects
Type II error estimation
32
G2 AC1 AC2 AC3 AC4
33
Total distance c distributions
34
Type II validation
35
Risk management
given α
calculated c_crit
found β
given β
found α
calculated c_crit
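Both directions of this trade-off can be sketched with simulated total distances. This is an illustration under stated assumptions rather than the analytical calculation of the slides: target-class distances are drawn from a chi-squared distribution, alternative-class distances from a non-central chi-squared distribution, and cutoffs are taken as empirical quantiles. The degrees of freedom, the noncentrality, and all function names are hypothetical.

```python
import random


def simulate(dof, noncentrality, size, rng):
    """Simulated total distances; noncentrality 0 gives the target class."""
    shift = noncentrality ** 0.5
    return [
        (rng.gauss(0.0, 1.0) + shift) ** 2
        + sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof - 1))
        for _ in range(size)
    ]


def empirical_quantile(values, p):
    """Empirical quantile by index into the sorted sample."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, int(p * len(ordered))))
    return ordered[index]


def beta_given_alpha(target_c, alternative_c, alpha):
    """Fix the type I error, derive the cutoff, then read off the type II error."""
    c_crit = empirical_quantile(target_c, 1.0 - alpha)
    beta = sum(1 for c in alternative_c if c <= c_crit) / len(alternative_c)
    return c_crit, beta


def alpha_given_beta(target_c, alternative_c, beta):
    """Fix the type II error, derive the cutoff, then read off the type I error."""
    c_crit = empirical_quantile(alternative_c, beta)
    alpha = sum(1 for c in target_c if c > c_crit) / len(target_c)
    return c_crit, alpha


rng = random.Random(3)
target = simulate(4, 0.0, 20000, rng)
alternative = simulate(4, 10.0, 20000, rng)

c1, beta = beta_given_alpha(target, alternative, alpha=0.05)
c2, alpha_back = alpha_given_beta(target, alternative, beta)
print(c1, beta, c2, alpha_back)
```

Lowering α moves the cutoff outward and raises β, and the reverse holds when β is lowered, so the two functions describe the same curve read from either end.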
36
Conclusion 1
Extreme objects play an important role in data
analysis. These objects should not be confused
with outliers. The number of extremes should be
compared to the expected number, which is set by the
significance level α.
Clean dataset
Contaminated dataset
37
Conclusion 2
Errors in decision making are inevitable.
Reducing one error increases the other. The
researcher's task is to find the balance of
risks. Our approach provides such an
opportunity. Examples will be presented in
Oxana's lecture.
38
Conclusion 3
The proposed Dual Data Driven PCA/SIMCA approach
is a competitive alternative to the purely
classical and to the strictly robust methods.
The technique has performed well in the
analysis of both regular and
contaminated data sets.
Clean dataset
Contaminated dataset
39
Thank you for your attention
A Lawyers Mistake