Applying Statistical Machine Learning to Retinal Electrophysiology (presentation transcript)

1
Applying Statistical Machine Learning to Retinal
Electrophysiology
  • Matt Boardman
  • January 2006
  • matt.boardman@dal.ca

2
Discussions
  • Axotomy ERG Data Sets
  • Classification using Support Vector Machines
    (SVM)
  • Assessing Waveform Significance
  • Probability Density Estimation
  • Confidence Measures

3
Axotomy ERG Data Sets (from F. Tremblay,
Retinal Electrophysiology)
  • Data Set A
  • 19 axotomy subjects, 19 control subjects (38 total)
  • time between control and axotomy recordings?
  • Multifocal ERG: 145 data points (mean of all locations)
  • 1000 Hz (?) sample rate
  • Data Set B
  • 6 axotomy subjects, 8 control subjects (14 total)
  • measurements approximately six weeks after axotomy
  • Multifocal ERG: 14,935 data points (103 locations x 145 ms)
  • Corneal and optic nerve readings (control subjects only)

4
Classification using Support Vector Machines
  • SVMs use statistical machine learning
  • Constrained optimization problem
  • Objective: find a hyperplane which maximizes the margin
  • Higher-dimensional mappings provide flexibility
  • Non-separable data: a cost parameter controls the tradeoff between outlier detection and generalization performance
  • Non-linear SVMs (Polynomial, Sigmoid, Gaussian kernels); see the sketch below
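
A minimal sketch of the classification step described on this slide, assuming scikit-learn rather than the LIBSVM/SVMlight tools referenced later in the deck; the data shapes, labels and parameter values are placeholders, not the presentation's actual settings.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((38, 145))   # placeholder: 38 subjects x 145 waveform samples
y = np.array([1] * 19 + [0] * 19)    # placeholder labels: 1 = axotomy, 0 = control

# C controls the tradeoff between outlier tolerance and generalization;
# gamma is the width parameter of the Gaussian (RBF) kernel.
clf = SVC(kernel="rbf", C=1.0, gamma=0.01)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```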

5
Data Normalization
  • Balanced training data
  • Number of positive samples = number of negative samples
  • Data set A is already balanced
  • Keep data set B balanced through combination, i.e. 8C6 = 28 balanced subsets
  • Independently and identically distributed (iid) data
  • Independence: not true
  • e.g. the value of point x17 most likely depends on x16
  • Not identically distributed either
  • e.g. x26 is always positive (P1 wave), but x40 is always negative (N2 wave)
  • Approximate iid data by subtracting the mean from each dimension, then dividing each dimension by its maximum magnitude (see the sketch below)
  • results in zero mean for all dimensions, with all values between -1 and 1
  • No zero-setting necessary!
  • e.g. subtracting the mean tail value does not affect classification accuracy!
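
A minimal sketch of this normalization, assuming NumPy; the function name and the guard against constant dimensions are my additions.

```python
import numpy as np

def normalize(X):
    """Approximate iid data: subtract each dimension's mean, then divide by that
    dimension's maximum magnitude, giving zero mean and values in [-1, 1]."""
    X = X - X.mean(axis=0)
    max_mag = np.abs(X).max(axis=0)
    max_mag[max_mag == 0] = 1.0   # guard against constant (all-equal) dimensions
    return X / max_mag
```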

6
Parameter Selection for Classification
  • Selection of the best gamma (γ) and cost (c) values obtained by exhaustive search of natural-log space (see the sketch below)
  • try all parameter values on the grid, choose the best points (red circles)
  • accuracy-weighted centre of mass gives the optimal point (green circle)
  • Training / testing splits
  • 75% / 25%
  • Leave-one-out
  • Better searches
  • 3 strikes
  • Simulated annealing (?)
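
A rough sketch of the exhaustive log-space search and accuracy-weighted centre of mass, assuming scikit-learn for cross-validated accuracy; the grid range, number of folds and the tolerance used to select the "best" points are assumptions, and the "better searches" (3 strikes, simulated annealing) are not shown.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def select_parameters(X, y, exponents=np.arange(-8.0, 9.0, 2.0), tol=0.01):
    """Exhaustive search of (gamma, C) over a natural-log grid; the near-optimal
    grid points are combined by an accuracy-weighted centre of mass."""
    scores = []
    for lg in exponents:
        for lc in exponents:
            clf = SVC(kernel="rbf", gamma=np.exp(lg), C=np.exp(lc))
            acc = cross_val_score(clf, X, y, cv=5).mean()
            scores.append((acc, lg, lc))
    scores = np.array(scores)
    best = scores[scores[:, 0] >= scores[:, 0].max() - tol]   # "red circle" points
    w = best[:, 0] / best[:, 0].sum()
    lg_opt, lc_opt = (w[:, None] * best[:, 1:]).sum(axis=0)   # "green circle" point
    return np.exp(lg_opt), np.exp(lc_opt)
```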

7
Classification Results
  • Data set A (38 samples x 145 data points)
  • 94.7% accuracy
  • Data set B (14 samples x 145 data points)
  • 99.4% accuracy
  • Data set B (14 samples x 14,935 data points)
  • 90.8% accuracy

8
Classification Benchmarks
  • How does this method perform on industry-standard
    classification benchmark data sets?
  • Wisconsin Breast Cancer Database
  • O.L. Mangasarian, W.H. Wolberg, Cancer diagnosis via linear programming, SIAM News, 23(5):1-18, 1990.
  • Iris Plants Database
  • R.A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7(2):179-188, 1936.

9
Classification Benchmarks
  • Wisconsin: 96.9% (s = 0.18)
  • Iris (Class 1 or not): 100.0%
  • Iris (Class 2 or not): 96.9% (s = 0.55)
  • Iris (Class 3 or not): 97.1% (s = 0.77)
10
Assessing Waveform Significance
  • Which are the most important parts of the waveform, with respect to classification accuracy? (See the sketch after this list.)
  • Fisher Ratio
  • distance between means over the sum of variances (linear)
  • Pearson Correlation Coefficients
  • strength of association between variables (linear)
  • Kolmogorov-Smirnov
  • distance between cumulative distributions (non-linear)
  • Linear SVM
  • classification on one dimension only (linear)
  • Cross-Entropy
  • mutual information measure (non-linear)
  • SVM Sensitivity
  • Monte Carlo simulation using SVM (non-linear)
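
A sketch of three of the per-dimension measures listed above (Fisher ratio, Pearson correlation, Kolmogorov-Smirnov), assuming NumPy/SciPy; the Fisher ratio here uses the common squared-difference form, and the cross-entropy and SVM sensitivity measures are not shown.

```python
import numpy as np
from scipy.stats import ks_2samp, pearsonr

def waveform_significance(X, y):
    """Score each waveform sample (column of X) for class separability."""
    pos, neg = X[y == 1], X[y == 0]
    fisher, pearson, ks = [], [], []
    for j in range(X.shape[1]):
        a, b = pos[:, j], neg[:, j]
        # Fisher ratio: squared distance between class means over summed variances.
        fisher.append((a.mean() - b.mean()) ** 2 / (a.var() + b.var()))
        # Pearson correlation between this dimension and the class label.
        r, _ = pearsonr(X[:, j], y)
        pearson.append(abs(r))
        # Kolmogorov-Smirnov distance between the two class distributions.
        ks.append(ks_2samp(a, b).statistic)
    return np.array(fisher), np.array(pearson), np.array(ks)
```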

11
Comparison of All Measures (Dataset B)
12
Probability Density Estimation
  • Goal: define a measure of how sure the classifier is about its result
  • Density estimation is known to be a hard problem
  • Generally needs a large number of samples for accuracy
  • Small deviations in sample points have a magnified effect
  • How do we estimate a probability distribution? (A sketch follows below.)
  • Best-Fit Gaussian
  • Assume a Gaussian distribution, find the sigmoid that fits best
  • Kernel Smoothing
  • Part of MATLAB's Statistics Toolbox
  • SVM Density Estimation (RSDE method)
  • Special case of SVM Regression
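
A sketch of the first two estimators, assuming SciPy as a stand-in for the MATLAB tools named above; the RSDE method has no standard library routine and is not shown, and the sample data are placeholders.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

samples = np.random.default_rng(1).standard_normal(50)   # placeholder 1-D sample set

# Best-fit Gaussian: assume a normal distribution and fit its mean and width.
mu, sigma = norm.fit(samples)

# Kernel smoothing: a small Gaussian kernel centred on every sample point
# (rough Python analogue of ksdensity in MATLAB's Statistics Toolbox).
kde = gaussian_kde(samples)

grid = np.linspace(samples.min() - 1.0, samples.max() + 1.0, 200)
p_gaussian = norm.pdf(grid, mu, sigma)
p_smoothed = kde(grid)
```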

13
Comparison of Estimation Techniques
14
Confidence Measures
  • Support is the overall distribution of the sample
  • Denote p(x)
  • Density integrates to one: ∫ p(x) dx = 1
  • Confidence is defined as the posterior probability
  • Probability that sample x is of class C
  • Denote p(C|x)
  • Can we combine these measures somehow? (Bayes' rule, recalled below, relates them.)
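
As background (the slide does not spell this out), the support p(x) and the posterior p(C|x) are linked by Bayes' rule, which is one natural starting point for combining the two measures:

```latex
p(C \mid x) = \frac{p(x \mid C)\, p(C)}{p(x)},
\qquad
p(x) = \sum_{C} p(x \mid C)\, p(C),
\qquad
\int p(x)\, dx = 1 .
```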

15
Confidence Measures
16
Confidence Measures
17
Confidence Measures
18
References
  • SVM Tutorial (mathematical but practical)
  • C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2):121-167, 1998.
  • SVM Density Estimation (RSDE algorithm)
  • M. Girolami, C. He, Probability Density Estimation from Optimally Condensed Data Samples, IEEE Trans. Pattern Analysis and Machine Intelligence, 25(10):1253-1264, 2003.
  • MATLAB versions
  • LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
  • SVMlight: http://svmlight.joachims.org/
  • An excellent online SVM demo (Java applet)
  • http://www.csie.ntu.edu.tw/~cjlin/libsvm/GUI

19
Data Representation
  • We can represent the input data in many ways (a sketch of these transforms follows below)
  • Unprocessed vector (145 dimensions, as is)
  • Second-order information (first time derivative)
  • Third-order information (second time derivative)
  • Frequency information (Power Spectral Density)
  • Wavelet transforms (Daubechies, Symlet)
  • Result: only small differences in accuracy!
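
A sketch of these representations, assuming SciPy and PyWavelets; the wavelet orders (db4, sym4), the decomposition level, and the PSD segment length are illustrative assumptions, and the waveform is a placeholder.

```python
import numpy as np
from scipy.signal import welch
import pywt   # PyWavelets

waveform = np.random.default_rng(2).standard_normal(145)   # placeholder 145-point ERG trace

raw = waveform                                    # unprocessed vector, as is
d1 = np.diff(waveform, n=1)                       # first time derivative
d2 = np.diff(waveform, n=2)                       # second time derivative
_, psd = welch(waveform, fs=1000.0, nperseg=64)   # power spectral density
daub = np.concatenate(pywt.wavedec(waveform, "db4", level=3))   # Daubechies coefficients
symm = np.concatenate(pywt.wavedec(waveform, "sym4", level=3))  # Symlet coefficients
```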

20
Data Representation
  • Example: wavelet representations
  • i.e. some indications, but nothing statistically significant (5% level)

21
Cross Entropy
22
SVM Sensitivity Analysis
23
SVM Sensitivity Analysis (Windowed)
24
Comparison of Estimation Techniques
25
Comparison of Estimation Techniques
26
Comparison of Estimation Techniques