1. A Novel Approach to Novelty Detection: Voronoi Tessellation
Jeffrey.D.Scargle_at_nasa.gov, Space Science Division, NASA Ames Research Center
Collaborator: Nikunj Oza, NASA-Ames Research Center, IC
PureSense, Inc. Machine Learning Seminar
2. The Basic Ideas
- Nonparametric Density Estimation
- Voronoi Tessellation
- Voronoi Cells as Point Surrogates
- 1/(area of cell) estimates the local point density
- Cell geometry indicates the local density gradient
- Tessellate the training points plus 1 test point
- If the Voronoi cell assigned to the test point is an edge cell (a cell with a vertex at infinity), the test point is an outlier; otherwise it is normal
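The edge-cell rule can be tried on a toy 2-D example. This is only a sketch under assumptions: the jittered grid of training points and the two test points are made up for illustration, and an "edge cell" is taken to mean a cell with a vertex at infinity, detected here with isinf on the vertex list returned by voronoin.

```matlab
% Toy 2-D illustration of the edge-cell test (assumed data, not from the talk).
% Training points: a 5-by-5 grid with small deterministic jitter, so that the
% points are not in a degenerate (cocircular) configuration.
[gx, gy] = meshgrid(0:4, 0:4);
train = [gx(:), gy(:)] + 0.05 * [sin(1:25)', cos(2 * (1:25))'];

tests = [2.1, 1.9;    % inside the training cloud
         9.0, 9.0];   % far outside the training cloud

is_outlier = false(size(tests, 1), 1);
for k = 1:size(tests, 1)
    pts = [train; tests(k, :)];          % training points plus 1 test point
    [V, C] = voronoin(pts);              % V: vertices, C: cell vertex indices
    cell_k = C{end};                     % cell of the appended test point
    % A vertex at infinity means the cell is unbounded, i.e. an edge cell.
    is_outlier(k) = any(isinf(V(cell_k, 1)));
end
disp(is_outlier')
```

A Voronoi cell is unbounded exactly when its point lies on the convex hull of all the points, so the first test point is reported as normal and the second as an outlier.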
3. Modes of Operation (1)
- Static training data and test data
4. Modes of Operation (2)
- Training data: all past data
- Test data: one new data point
6. Modes of Operation (3)
- Training data: past data of fixed size
- Test data: one new data point
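The fixed-size mode can be sketched as a sliding window over a stream. Everything specific here is an assumption for illustration: the window length W, the synthetic 2-D stream (a deterministic low-discrepancy sequence in the unit square), and the anomaly injected at step 60.

```matlab
% Sliding-window novelty detection on a toy 2-D stream (assumed data).
W = 30;                                   % window length (assumed value)
n = 100;                                  % stream length
k = (1:n)';
stream = [mod(k * 0.6180339887, 1), mod(k * 0.7548776662, 1)];
stream(60, :) = [5, 5];                   % injected anomaly, far from the rest

flagged = false(n, 1);
for t = W + 1 : n
    window = stream(t - W : t - 1, :);    % training data: past data of fixed size
    [V, C] = voronoin([window; stream(t, :)]);
    cell_t = C{end};                      % cell of the one new data point
    flagged(t) = any(isinf(V(cell_t, 1))); % edge cell => flag as novel
end
```

Replacing the window by stream(1 : t - 1, :) gives the all-past-data mode. Ordinary points near the boundary of the window's cloud are also flagged by this rule, so flagged contains more than the injected anomaly.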
7. Voronoi Tessellation of data in any dimension
8. Construct Voronoi cells to represent local photon density: density estimated as 1 / cell area
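A minimal 2-D sketch of the 1 / cell area estimate follows. The point set is synthetic and assumed for illustration; unbounded edge cells are given infinite area (density 0), which is a choice made here rather than something stated in the slides.

```matlab
% Local density as 1 / (Voronoi cell area) for a toy 2-D point set.
k = (1:200)';
pts = [mod(k * 0.6180339887, 1), mod(k * 0.7548776662, 1)];
n_pts = size(pts, 1);

[V, C] = voronoin(pts);
cell_area = inf(n_pts, 1);                % Inf marks unbounded (edge) cells
for i = 1:n_pts
    verts = V(C{i}, :);
    if all(isfinite(verts(:)))
        % Voronoi cells are convex, so convhull puts the vertices in
        % boundary order, which polyarea requires.
        hull = convhull(verts(:, 1), verts(:, 2));
        cell_area(i) = polyarea(verts(hull, 1), verts(hull, 2));
    end
end
density = 1 ./ cell_area;                 % edge cells get density 0

% The smallest cells mark the regions of highest density.
[~, smallest_first] = sort(cell_area);
densest = pts(smallest_first(1:10), :);
```

Sorting by area rather than by density keeps the edge cells at the end of the list.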
12. Voronoi cells also represent local photon density gradients
18. The Voronoi cells are a local representation of the data
30. Selecting the smallest Voronoi cells yields the regions of highest photon density
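50. MATLAB code
    % Do abnormal data: each abnormal test point should land in an edge cell.
    % Assumes rows are points, train_data holds num_use training points, and
    % count_correct / count_error were initialized to 0 earlier.
    for id = 1:num_test
        data = [train_data; test_data(id, :)];
        [vertices, v_cells] = voronoin(data);
        vertices_last = v_cells{num_use + 1};   % cell of the test point
        if any(vertices_last == 1)              % infinite vertex = 1
            count_correct = count_correct + 1;
        else
            count_error = count_error + 1;
        end
    end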
51. Biomed dataset: Cox, Johnson and Kafadar (1982), "Exposition of statistical graphics technology," ASA Proceedings of the Statistical Computing Section, p. 55-6
52. Biomed dataset
                          67 Abnormal Inputs    27 Normal Inputs
                          Correct   Wrong       Correct   Wrong
----------------------------------------------------------------
Kernel Classifier         57        10          25        2
Grow When Required net    56        11          25        2
Voronoi - mean            57.2      9.8         17.6      9.3
Voronoi - best            60        7           25        2
53. Curse of Dimensionality?
Computation time for Voronoi tessellation is roughly linear in the number of data points, but a much steeper function of the dimensionality. In the ball-bearing data set (following example) the dimensionality of the raw data is 32. I used singular value decomposition to reduce the dimensionality.
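One way to carry out such a reduction is to project the centered data onto its leading singular directions. The sketch below is an assumption about the procedure, not the code used in the talk: the 32-dimensional data are synthetic, and the reduced dimension r = 3 is an arbitrary choice.

```matlab
% SVD-based dimensionality reduction before tessellation (assumed procedure).
n = 200; d = 32; r = 3;                   % r = reduced dimension (assumed)
k = (1:n)';
latent = [sin(0.37 * k), cos(0.11 * k), sin(0.05 * k .^ 2)];
mixing = cos((1:r)' * (1:d));             % fixed r-by-d mixing matrix
X = latent * mixing + 1e-3 * sin(k * (1:d));   % n-by-d toy data, rows = points

Xc = X - mean(X, 1);                      % center columns (implicit expansion)
[U, S, ~] = svd(Xc, 'econ');
X_reduced = U(:, 1:r) * S(1:r, 1:r);      % coordinates on the top r directions

[V, C] = voronoin(X_reduced);             % tessellate in r dimensions, not d
```

The rows of X_reduced are the same points expressed in r coordinates, so the edge-cell test applies to them unchanged.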
55. Ball-bearing data: EPSRC Structural Integrity Damage Assessment
www.brunel.ac.uk/research/cnca/sida/html/data.htm
                             Normal  Broken  Damaged  Basket
                             ½ runs  (New)  Ring  Basket  destroyed  loosely
---------------------------------------------------------------------------
Linear programming kernel    1.3    0      46.7   71.7   74.5
Grow When Required net       37.8   40.3   43.8   4.6    4.9
LPDD                         0?     0      8.3
Voronoi  3                   1.6    0      30.7   30.7   35.7
         4                   6.4    0      12.1   16.2   19.9
         5                   13.5   0.7    25.5   28.9   34.2
56. Novelty Detection in Time Series
- Multivariate Time Series
- For single time series, use embedding
- captures the dynamical behavior of the process
- increases the dimensionality
- X(t_n) -> [X(t_n), X(t_{n+1}), X(t_{n+2}), ..., X(t_{n+k-1})]
- Online Novelty Detection on Temporal Sequences, Junshui Ma and Simon Perkins, SIGKDD 2003
- Better: X(t_n) -> [X(t_n), X(t_{n+m}), X(t_{n+2m}), ..., X(t_{n+(k-1)m})]
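The lagged embedding can be built with a single index matrix. This is a sketch under assumptions: the sine series, the embedding dimension kdim (k in the slide) and the lag m are illustrative values only.

```matlab
% Time-delay embedding of a scalar series with dimension k and lag m.
x = sin(0.3 * (1:50))';                   % toy scalar series (assumed)
kdim = 3;                                 % embedding dimension k
m = 2;                                    % lag m (m = 1 gives consecutive samples)

n_vec = numel(x) - (kdim - 1) * m;        % number of embedded vectors
% Row n of idx holds the sample indices n, n+m, ..., n+(k-1)m.
idx = (1:n_vec)' + (0:kdim - 1) * m;
E = x(idx);                               % n_vec-by-kdim embedded data
```

Each row of E is one point in kdim dimensions, so E can be passed to voronoin like any other multivariate data set.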