Title: Applying Statistical Machine Learning to Retinal Electrophysiology
1. Applying Statistical Machine Learning to Retinal Electrophysiology
- Matt Boardman
- January, 2006
- matt.boardman_at_dal.ca
2. Discussions
- Axotomy ERG Data Sets
- Classification using Support Vector Machines (SVM)
- Assessing Waveform Significance
- Probability Density Estimation
- Confidence Measures
3. Axotomy ERG Data Sets (from F. Tremblay, Retinal Electrophysiology)
- Data Set A
- 19 axotomy subjects, 19 control subjects (total 38)
- time between control and axotomy recordings?
- Multifocal ERG: 145 data points (mean of all locations)
- 1000 Hz (?) sample rate
- Data Set B
- 6 axotomy subjects, 8 control subjects (total 14)
- measurements approximately six weeks after axotomy
- Multifocal ERG: 14,935 data points (103 locations x 145 ms)
- Corneal and optic nerve readings (control subjects only)
4. Classification using Support Vector Machines
- SVMs use statistical machine learning
- Constrained optimization problem
- Objective: find a hyperplane which maximizes the margin
- Higher-dimensional mappings provide flexibility
- Non-separable data: a cost parameter controls the tradeoff between outlier detection and generalization performance
- Non-linear SVM (polynomial, sigmoid, Gaussian kernels) (see the sketch below)
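As a rough illustration of this slide, the sketch below fits a non-linear SVM with a Gaussian (RBF) kernel. It assumes scikit-learn rather than the MATLAB/LIBSVM setup used in the talk, and the arrays X and y are random placeholders standing in for the ERG waveforms and axotomy/control labels.

```python
# Minimal sketch: non-linear SVM classification with a Gaussian (RBF) kernel.
# Assumes scikit-learn; X and y are placeholders for data set A (38 x 145).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((38, 145))   # stand-in for the 38 ERG waveforms
y = np.array([1] * 19 + [-1] * 19)   # 19 axotomy vs. 19 control labels

# C is the cost parameter (margin width vs. training errors);
# gamma sets the width of the Gaussian kernel.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0 / X.shape[1])
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```

The C and gamma values here are arbitrary; slide 6 describes how the talk actually selects them.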
5. Data Normalization
- Balanced training data
- Number of positive samples = number of negative samples
- Data set A is already balanced
- Keep data set B balanced through combination, i.e. 8C6 = 28 combinations
- Independently and identically distributed (iid) data
- Independence: not true
- e.g. the value of point x17 most likely depends on x16
- Not identically distributed
- e.g. x26 is always positive (P1 wave), but x40 is always negative (N2 wave)
- Approximate iid data by subtracting the mean from each dimension, then dividing each dimension by its maximum magnitude (sketch below)
- results in zero mean for all dimensions, with all values between -1 and 1
- No zero-setting necessary!
- e.g. subtracting the mean tail value does not affect classification accuracy!
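A minimal sketch of that per-dimension normalization, assuming NumPy; the input matrix is a random stand-in for a real ERG data set.

```python
# Per-dimension normalization as described above: subtract each dimension's
# mean, then divide by its maximum magnitude, giving zero-mean features
# with all values in [-1, 1].
import numpy as np

def normalize(X):
    X = X - X.mean(axis=0)            # zero mean per dimension
    max_mag = np.abs(X).max(axis=0)
    max_mag[max_mag == 0] = 1.0       # guard against constant dimensions
    return X / max_mag

X = np.random.default_rng(1).uniform(-5, 5, size=(14, 145))
Xn = normalize(X)
print(Xn.mean(axis=0).round(6).max(), Xn.min(), Xn.max())
```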
6. Parameter Selection for Classification
- Selection of the best gamma (γ) and cost (C) values obtained by exhaustive search of logₑ-space (sketch below)
- try all possible parameter values, choose the best points (red circles)
- accuracy-weighted centre of mass gives the optimal point (green circle)
- Training / testing
- 75 / 25 split
- Leave-one-out
- Better searches
- 3 strikes
- Simulated annealing (?)
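A sketch of the exhaustive log-space search over gamma and cost, assuming scikit-learn's GridSearchCV with leave-one-out evaluation; the data are again random placeholders and the grid bounds are illustrative, not the ones used in the talk.

```python
# Exhaustive grid search over (C, gamma) in log space, scored by
# leave-one-out accuracy. The "accuracy-weighted centre of mass" step
# from the slide is not reproduced here.
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.standard_normal((38, 145))   # placeholder waveforms
y = np.array([1] * 19 + [-1] * 19)   # placeholder labels

param_grid = {
    "C": np.logspace(-2, 4, 7),      # cost values spaced evenly in log space
    "gamma": np.logspace(-5, 1, 7),  # kernel widths spaced evenly in log space
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=LeaveOneOut())
search.fit(X, y)
print(search.best_params_, search.best_score_)
```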
7. Classification Results
- Data set A (38 samples x 145 data points)
- 94.7%
- Data set B (14 samples x 145 data points)
- 99.4%
- Data set B (14 samples x 14,935 data points)
- 90.8%
8. Classification Benchmarks
- How does this method perform on industry-standard classification benchmark data sets? (sketch below)
- Wisconsin Breast Cancer Database
- O.L. Mangasarian, W.H. Wolberg, "Cancer diagnosis via linear programming," SIAM News, 23(5):1-18, 1990.
- Iris Plants Database
- R.A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7(2):179-188, 1936.
9. Classification Benchmarks
- Wisconsin: 96.9% (σ = 0.18)
- Iris (Class 1 or not): 100.0%
- Iris (Class 2 or not): 96.9% (σ = 0.55)
- Iris (Class 3 or not): 97.1% (σ = 0.77)
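One way to rerun the benchmark idea is sketched below, using scikit-learn's bundled copies of the Wisconsin breast cancer and Iris data and plain 5-fold cross-validation; the exact preprocessing, splits, and parameter tuning in the talk may differ, so the numbers will not match the table above exactly.

```python
# Benchmark sketch: RBF SVM on the Wisconsin and Iris data sets, with each
# Iris class treated as a one-vs-rest binary problem as on the slide.
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris_X, iris_y = load_iris(return_X_y=True)
tasks = {"Wisconsin": load_breast_cancer(return_X_y=True)}
for k in range(3):
    tasks[f"Iris (Class {k + 1} or not)"] = (iris_X, (iris_y == k).astype(int))

for name, (X, y) in tasks.items():
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {100 * scores.mean():.1f}% (std {100 * scores.std():.1f})")
```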
10. Assessing Waveform Significance
- Which are the most important parts of the waveform, with respect to classification accuracy? (sketch below)
- Fisher Ratio
- distance between means over sum of variances (linear)
- Pearson Correlation Coefficients
- strength of association between variables (linear)
- Kolmogorov-Smirnov
- distance between cumulative distributions (non-linear)
- Linear SVM
- classification on one dimension only (linear)
- Cross-Entropy
- mutual information measure (non-linear)
- SVM Sensitivity
- Monte Carlo simulation using SVM (non-linear)
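The sketch below computes two of these per-dimension measures, the Fisher ratio and the Kolmogorov-Smirnov statistic, for each of the 145 time points. It assumes NumPy/SciPy and random placeholder data; the exact Fisher-ratio variant used in the talk may differ (here: squared distance between class means over the summed class variances).

```python
# Per-dimension significance scores: Fisher ratio and two-sample KS statistic.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(3)
X = rng.standard_normal((38, 145))   # placeholder waveforms
y = np.array([1] * 19 + [0] * 19)    # placeholder labels

pos, neg = X[y == 1], X[y == 0]

# Fisher ratio: squared distance between class means over summed variances
fisher = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2 / (
    pos.var(axis=0) + neg.var(axis=0)
)

# Kolmogorov-Smirnov: max distance between the two cumulative distributions
ks = np.array([ks_2samp(pos[:, j], neg[:, j]).statistic for j in range(X.shape[1])])

print("top time points (Fisher):", np.argsort(fisher)[::-1][:5])
print("top time points (KS):    ", np.argsort(ks)[::-1][:5])
```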
11. Comparison of All Measures (Data Set B)
12. Probability Density Estimation
- Goal: define a measure to show how sure the classifier is of its result
- Density estimation is known to be a hard problem
- Generally needs a large number of samples for accuracy
- Small deviations in sample points have a magnified effect
- How do we estimate a probability distribution? (sketch below)
- Best-Fit Gaussian
- Assume a Gaussian distribution, find the sigmoid that fits best
- Kernel Smoothing
- Part of MATLAB's Statistics Toolbox
- SVM Density Estimation (RSDE method)
- Special case of SVM regression
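The sketch below contrasts two of these estimators on a one-dimensional sample: a best-fit Gaussian and kernel smoothing, with SciPy's gaussian_kde standing in for MATLAB's ksdensity. The RSDE method is not shown, and the bimodal test data are invented for illustration.

```python
# Two simple density estimators on a 1-D sample: single best-fit Gaussian
# versus a kernel-smoothed estimate.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(4)
samples = np.concatenate([rng.normal(-2, 0.5, 40), rng.normal(1, 1.0, 60)])

mu, sigma = norm.fit(samples)    # best-fit single Gaussian
kde = gaussian_kde(samples)      # kernel-smoothed estimate

grid = np.linspace(-5, 5, 201)
p_gauss = norm.pdf(grid, mu, sigma)
p_kde = kde(grid)

dx = grid[1] - grid[0]
print("both integrate to ~1:",
      round(float((p_gauss * dx).sum()), 3),
      round(float((p_kde * dx).sum()), 3))
```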
13. Comparison of Estimation Techniques
14. Confidence Measures
- Support is the overall distribution of the sample
- Denoted p(x)
- Density integrates to one: ∫ p(x) dx = 1
- Confidence is defined as the posterior probability
- Probability that sample x is of class C
- Denoted p(C|x)
- Can we combine these measures somehow? (sketch below)
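As one possible reading of this slide, the sketch below computes both quantities on a 2-D toy problem: the posterior p(C|x) from a Platt-scaled SVM and the support p(x) from a kernel density estimate. The use of Platt scaling and KDE here is an assumption for illustration; the talk leaves the exact combination of the two measures open.

```python
# Posterior confidence p(C|x) versus support p(x) on a 2-D toy problem.
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="rbf", probability=True).fit(X, y)   # Platt-scaled posteriors
support = gaussian_kde(X.T)                           # estimate of p(x)

x_new = np.array([[0.5, 0.5], [6.0, 6.0]])            # in-distribution vs. far away
posterior = clf.predict_proba(x_new)[:, 1]            # p(C = 1 | x)
density = support(x_new.T)                            # p(x)

for p, d in zip(posterior, density):
    print(f"p(C|x) = {p:.2f}, p(x) = {d:.4f}")
```

A point far from the training data can still receive a confident posterior, which is why the support term is worth reporting alongside it.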
15. Confidence Measures
16. Confidence Measures
17. Confidence Measures
18. References
- SVM tutorial (mathematical but practical)
- C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
- SVM density estimation (RSDE algorithm)
- M. Girolami, C. He, "Probability Density Estimation from Optimally Condensed Data Samples," IEEE Trans. Pattern Analysis and Machine Intelligence, 25(10):1253-1264, 2003.
- MATLAB versions
- LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
- SVMlight: http://svmlight.joachims.org/
- An excellent online SVM demo (Java applet)
- http://www.csie.ntu.edu.tw/~cjlin/libsvm/GUI
19. Data Representation
- We can represent the input data in many ways (sketch below)
- Unprocessed vector (145 dimensions, as is)
- Second-order information (first time derivative)
- Third-order information (second time derivative)
- Frequency information (power spectral density)
- Wavelet transforms (Daubechies, Symlet)
- Result: only small differences in accuracy!
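The sketch below builds each of these alternative representations for a single trace, assuming NumPy, SciPy, and PyWavelets rather than the MATLAB tooling from the talk; the 1000 Hz sample rate follows the (uncertain) figure on slide 3, and the waveform itself is a random placeholder.

```python
# Alternative input representations for one 145-point ERG trace.
import numpy as np
import pywt
from scipy.signal import welch

rng = np.random.default_rng(6)
trace = rng.standard_normal(145)                 # placeholder for one waveform

raw = trace                                      # unprocessed vector
d1 = np.diff(trace, n=1)                         # first time derivative
d2 = np.diff(trace, n=2)                         # second time derivative
freqs, psd = welch(trace, fs=1000, nperseg=64)   # power spectral density
coeffs = pywt.wavedec(trace, "db4", level=3)     # Daubechies wavelet transform

print(len(raw), len(d1), len(d2), len(psd), [len(c) for c in coeffs])
```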
20. Data Representation
- Example: wavelet representations
- i.e. some indications, but nothing statistically significant (5%)
21. Cross Entropy
22. SVM Sensitivity Analysis
23. SVM Sensitivity Analysis (Windowed)
24. Comparison of Estimation Techniques
25. Comparison of Estimation Techniques
26. Comparison of Estimation Techniques