1
Optimization-Based Data Mining Approaches in
Neuroscience Research
  • Panos M. Pardalos
  • University of Florida

2
Introduction
  • Data Mining: the practice of searching through
    large amounts of computerized data to find useful
    patterns or trends.
  • Optimization: an act, process, or methodology of
    making something (as a design, system, or
    decision) as fully perfect, functional, or
    effective as possible; specifically, the
    mathematical procedures (as finding the maximum
    of a function) involved in this.
  • Source: Merriam-Webster Dictionary

3
Introduction
  • The combination of data mining and optimization
  • Find the best way to extract meaningful
    patterns from data.
  • Not always an easy task.

4
How difficult can optimization be?
  • Given integers N1, N2, ..., Nk and M, find a subset of
    N1, N2, ..., Nk whose sum is equal to M.
  • Can you find an algorithm better than O(2^k)?
  • Exponential complexity?
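
The subset-sum question above can be made concrete with a minimal brute-force sketch. It is only an illustration of the O(2^k) baseline the slide asks you to beat; the function name and the sample numbers are assumptions, not part of the talk.

```python
from itertools import combinations


def subset_sum_bruteforce(numbers, target):
    """Return a subset (as a tuple) whose sum equals target, or None.

    All 2^k subsets may be enumerated in the worst case, so the running
    time grows exponentially with k = len(numbers).
    """
    for size in range(len(numbers) + 1):
        for subset in combinations(numbers, size):
            if sum(subset) == target:
                return subset
    return None


# Illustrative data only.
example = subset_sum_bruteforce([3, 34, 4, 12, 5, 2], 9)
```

Each extra integer doubles the number of candidate subsets, which is why the naive search stops being practical quickly.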

5
Hard drive Cost
  • Cost drops to approximately 1/10 every 5 years

6
Hard Drive Capacity
  • Approximately 10 times more every 5 years

7
Processing power
  • The number of transistors in a computer processor
    doubles every two years

8
References
  • Handbook of Massive Data Sets, co-editors J.
    Abello, P.M. Pardalos, and M. Resende, Kluwer
    Academic Publishers, (2002).

9
Main problems in data mining
  • Data preprocessing
  • Dimensionality reduction
  • Feature selection
  • Regression
  • Clustering (Unsupervised learning)
  • Classification (Supervised Learning)
  • Semi-Supervised learning (between supervised
    and unsupervised)
  • Biclustering
  • Result Validation
  • Data Visualization/Representation
  • Biomedical Informatics is a challenging area with
    lots of these problems.

10
Agenda
  • Research Background
  • Epilepsy
  • Seizure Prediction
  • Sources of Data
  • Electroencephalogram (EEG) Time Series
  • Dimensionality Reduction
  • Chaos Theory
  • Feature Selection for Brain Monitoring
  • Time Series Classification of Neuro-Physiological
    States
  • Brain Clustering
  • Brain Network Models
  • Concluding Remarks

11
Facts About Epilepsy
  • At least 2 million Americans and another 40-50
    million people worldwide (about 1% of the population)
    suffer from epilepsy.
  • Epilepsy is the second most common brain disorder
    (after stroke) that causes recurrent seizures.
  • Epileptic seizures occur when a massive group of
    neurons in the cerebral cortex suddenly begin to
    discharge in a highly organized rhythmic pattern.

12
Epileptic Seizures
  • Seizures usually occur spontaneously, in the
    absence of external triggers.
  • Seizures cause temporary disturbances of brain
    functions such as motor control, responsiveness
    and recall which typically last from seconds to a
    few minutes.
  • Seizures may be followed by a post-ictal period
    of confusion or impaired sensorium that can
    persist for several hours.

13
10-second EEGs: Seizure Evolution
14
Why do we care?
  • Based on 1995 estimates, epilepsy imposes an
    annual economic burden of $12.5 billion in the
    U.S. in associated health care costs and losses
    in employment, wages, and productivity.
  • Cost per patient ranged from $4,272 for persons
    with remission after initial diagnosis and
    treatment to $138,602 for persons with
    intractable and frequent seizures.

15
Current Epilepsy Treatment
  • Pharmacological Therapy
  • Anti-Epileptic Drugs (AEDs)
  • Mainstay of epilepsy treatment
  • Approximately 25 to 30% of patients remain unresponsive
  • Epilepsy Resective Surgery
  • Requires long-term invasive EEG monitoring to
    locate a specific, localized part of the brain
    where the seizures are thought to originate
  • 50% of pre-surgical candidates do not undergo
    resective surgery
  • Multiple epileptogenic zones
  • Epileptogenic zone located in functional brain
    tissue
  • Only 50-60% of surgery cases result in seizure
    freedom

16
Current Epilepsy Treatment
  • Electrical Stimulation (Vagus nerve stimulator)
  • Parameters (amplitude and duration of
    stimulation) arbitrarily adjusted
  • As effective as one additional AED dose
  • Side Effects
  • Seizure Prediction?
  • Monitoring Unit?
  • Forecasting Impending Seizures?
  • Seizure Control?
  • Deep Brain Stimulator?

17
Electroencephalogram (EEG)
  • is a traditional tool for evaluating the
    physiological state of the brain.
  • offers excellent spatial and temporal resolution
    to characterize rapidly changing electrical
    activity of brain activation
  • captures voltage potentials produced by brain
    cells while communicating.
  • In an EEG, electrodes are implanted in deep brain
    or placed on the scalp over multiple areas of the
    brain to detect and record patterns of electrical
    activity and check for abnormalities.

18
From Microscopic to Macroscopic Level
(Electroencephalogram - EEG)
19
Electrode Montage and EEGs
20
Scalp EEG Data Acquisition
21
Open Problems
  • Is the seizure occurrence random?
  • If not, can seizures be predicted?
  • If yes, are there seizure pre-cursors (in EEGs)
    preceding seizures?
  • If yes, what data mining techniques can be used
    to indicate these pre-cursors?
  • Does normal brain activity differ from
    abnormal brain activity?

22
Goals of Research
  • Test the hypothesis that seizures are not a
    random process.
  • Demonstrate that seizures could be predicted
  • Feature Selection to identify seizure pre-cursors
    (Statistical Process Control)
  • Demonstrate that normal and abnormal EEGs can be
    differentiated
  • Time Series Classification
  • Better understand the epileptogenic process: how
    seizures are initiated and propagated.
  • Brain Clustering
  • Develop a closed-loop seizure control device
    (Brain Pacemaker)

23
Dimensionality Reduction
  • Chaos Theory

24
EEGs with the Curse of Dimensionality
  • The brain is a non-stationary system.
  • EEG time series is non-stationary.
  • With 200 Hz sampling, 1 hour of EEGs comprises
  • 200 × 60 × 60 × 30 = 21,600,000 data points ≈ 43.2 MB
  • (assume 16-bit ASCII format)
  • 1 day ≈ 1.04 GB
  • 1 week ≈ 7.28 GB
  • 20 patients ≈ 0.15 TB
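
The storage estimates above can be re-derived with a few lines of arithmetic. This is a sketch under stated assumptions: 30 channels (inferred from the slide's product), 2 bytes per 16-bit sample, and decimal units (1 MB = 10^6 bytes). The variable names are illustrative.

```python
SAMPLING_HZ = 200
CHANNELS = 30          # assumption: the factor 30 in the slide's product
BYTES_PER_SAMPLE = 2   # assumption: one 16-bit value per data point

# Data points recorded in one hour across all channels.
points_per_hour = SAMPLING_HZ * 60 * 60 * CHANNELS

# Storage in decimal units.
mb_per_hour = points_per_hour * BYTES_PER_SAMPLE / 1e6
gb_per_day = mb_per_hour * 24 / 1e3
gb_per_week = gb_per_day * 7
tb_20_patients = gb_per_week * 20 / 1e3
```

Computed without intermediate rounding, one week comes to roughly 7.26 GB; the slide's 7.28 GB appears to come from multiplying the rounded daily figure of 1.04 GB by 7.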

Kilobytes → Megabytes → Gigabytes → Terabytes
25
Data Transformation Using Chaos Theory
  • Measure the brain dynamics from time series
  • Stock Market
  • Currency Exchanges (e.g., Swedish Kroner)
  • Apply dynamical measures (based on chaos theory)
    to non-overlapping EEG epochs of 10.24 seconds
    (2048 points).
  • Maximum Short-Term Lyapunov Exponent
  • measure the average uncertainty along the local
    eigenvectors and phase differences of an
    attractor in the phase space
  • measure the stability/chaoticity of EEG signals

26
Measure of Chaos
27
STLmax Profiles
28
Hidden Synchronization Patterns
29
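How similar are they? Statistics to quantify the
convergence of STLmax
  • By paired-T statistic
  • For EEG signals from electrodes i and j,
    suppose their STLmax values in a window (of
    length 60 points, 10 minutes) are
    $L_i = (STLmax_i(1), \dots, STLmax_i(N))$ and
    $L_j = (STLmax_j(1), \dots, STLmax_j(N))$, with $N = 60$.

The T-index between EEG signals i and j is
defined as
$$T_{ij} = \sqrt{N}\,\frac{|\bar{D}_{ij}|}{\hat{\sigma}_{ij}},$$
where $\bar{D}_{ij}$ and $\hat{\sigma}_{ij}$ are the sample mean and the sample
standard deviation of the differences $STLmax_i(t) - STLmax_j(t)$ over the window.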
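
A minimal sketch of the paired-T computation, assuming the T-index is the standard paired t-statistic (absolute mean of the pairwise STLmax differences divided by its standard error). The function name and the toy values are illustrative; real windows would hold 60 STLmax values per electrode.

```python
import math
from statistics import mean, stdev


def t_index(stl_i, stl_j):
    """Paired t-statistic between two equally long STLmax windows."""
    diffs = [a - b for a, b in zip(stl_i, stl_j)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        # Identical differences everywhere: no spread to normalise by.
        return 0.0 if mean(diffs) == 0 else math.inf
    return abs(mean(diffs)) * math.sqrt(n) / sd


# Toy windows of length 4, for illustration only.
toy_t = t_index([5, 6, 7, 8], [4, 6, 5, 8])
```

A small T-index means the two STLmax profiles are statistically hard to tell apart over the window, which is what "convergence" refers to here.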
30
Statistically Quantifying the Convergence
31
Convergence of STLmax
32
Why Feature Selection?
  • Not every electrode site shows the convergence.
  • Feature Selection Select the electrodes that are
    most likely to show the convergence preceding the
    next seizure.

33
Feature Selection
  • Quadratic Integer Programming with Quadratic
    Constraints

34
Optimization Problem
  • Optimization
  • We apply optimization techniques to find a group
    of electrode sites such that
  • They are the most converged (in STLmax) electrode
    sites during 10-min window before the seizure
  • They show the dynamical resetting (diverged in
    STLmax) during 10-min window after the seizure.
  • Such electrode sites are defined as critical
    electrode sites.
  • Hypothesis
  • The critical electrode sites should be most
    likely to show the convergence in STLmax again
    before the next seizure.

35
Notation and Modeling
  • x is an n-dimensional column vector (decision
    variables), where each xi represents the
    electrode site i.
  • xi = 1 if electrode i is selected to be one of
    the critical electrode sites.
  • xi = 0 otherwise.
  • Q is an (n × n) matrix, each element qij of which
    represents the T-index between electrodes i and j
    during the 10-minute window before a seizure.
  • b is an integer constant (the number of critical
    electrode sites).
  • D is an (n × n) matrix, each element dij of which
    represents the T-index between electrodes i and j
    during the 10-minute window after a seizure.
  • a = 2.662 b(b-1), a constant. 2.662 is
    the critical value of the T-index, as previously
    defined, to reject H0: two brain sites acquire
    identical STLmax values within a 10-minute window.

36
Multi-Quadratic Integer Programming
  • To select critical electrode sites, we formulated
    this problem as a multi-quadratic integer (0-1)
    programming (MQIP) problem with
  • objective function to minimize the average
    T-index among electrode sites
  • a linear constraint to identify the number of
    critical electrode sites
  • a quadratic constraint to ensure that the
    selected electrode sites show the dynamical
    resetting
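
One reading of this formulation is: minimize x'Qx subject to sum(xi) = b and x'Dx >= a, with x binary. The sketch below solves tiny instances by enumeration purely for illustration; it is not the MILP reformulation discussed in the talk, and the direction of the quadratic constraint (post-seizure T-index at or above the threshold) is an assumption drawn from the phrase "diverged in STLmax". The matrices are made up.

```python
from itertools import combinations


def solve_mqip_bruteforce(Q, D, b, a):
    """Pick b sites minimising x'Qx subject to x'Dx >= a, by enumeration."""
    n = len(Q)
    best_sites, best_value = None, float("inf")
    for sites in combinations(range(n), b):
        # x'Qx and x'Dx reduce to sums over the selected rows and columns.
        pre = sum(Q[i][j] for i in sites for j in sites)
        post = sum(D[i][j] for i in sites for j in sites)
        if post >= a and pre < best_value:
            best_sites, best_value = sites, pre
    return best_sites, best_value


# Hypothetical 4-electrode example (symmetric T-index matrices, zero diagonal).
Q = [[0, 1, 5, 6], [1, 0, 4, 7], [5, 4, 0, 2], [6, 7, 2, 0]]
D = [[0, 1, 4, 3], [1, 0, 5, 2], [4, 5, 0, 3], [3, 2, 3, 0]]
b = 2
a = 2.662 * b * (b - 1)
sites, value = solve_mqip_bruteforce(Q, D, b, a)
```

In this toy instance the pair (0, 1) is the most converged before the seizure but fails the resetting constraint, so the pair (2, 3) is selected instead. Enumeration scales as C(n, b), which motivates the linearization results on the following slides.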

37
Conventional Linearization Approach for
Multi-Quadratic 0-1 Problem
38
Theoretical Results: MILP formulation for MQIP
problem
  • Consider the MQIP problem
  • We proved that the MQIP program is EQUIVALENT to
    a MILP problem with the SAME number of integer
    variables.

Equivalent
39
Empirical Results: Performance on Larger Problems
40
Hypothesis Testing - Simulation
  • Hypothesis
  • The critical electrode sites should be most
    likely to show the convergence in STLmax (drop in
    T-index below the critical value) again before
    the next seizure.
  • The critical electrode sites are electrode sites
    that
  • are the most converged (in STLmax ) electrode
    sites during 10-min window before the seizure
  • show the dynamical resetting (diverged in STLmax
    ) during 10-min window after the seizure
  • Simulation
  • Based on 3 patients with 20 seizures, we compare
    the probability of showing the convergence in
    STLmax (drop in T-index below the critical value)
    before the next seizure between the electrode
    sites, which are
  • Critical electrode sites
  • Randomly selected (5,000 times)

41
Optimal VS Non-Optimal
42
Simulation - Results
43
Statistical Process Control: How to automate the
system?
44
Automated Seizure Warning System
Select critical electrode sites after every
subsequent seizure
Continuously calculate STLmax from multi-channel
EEG.
EEG Signals
Give a warning when T-index value drops below a
critical value
45
Data Characteristics
46
Performance Evaluation for ASWS
  • To test this algorithm, a warning was considered
    to be true if a seizure occurred within 3 hours
    after the warning.
  • Sensitivity
  • False Prediction Rate: average number of false
    warnings per hour
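
The evaluation rule above can be sketched as follows. The 3-hour horizon and the false prediction rate follow the slide; the definition of sensitivity used here (fraction of seizures preceded by at least one warning within the horizon) is an assumption, since the slide's formula did not survive. Times are in hours and the sample data are invented.

```python
def evaluate_warnings(warning_times, seizure_times, total_hours, horizon=3.0):
    """Return (sensitivity, false_prediction_rate) for a warning sequence."""
    # A warning is true if some seizure occurs within `horizon` hours after it.
    true_warnings = [
        w for w in warning_times
        if any(0 < s - w <= horizon for s in seizure_times)
    ]
    false_warnings = len(warning_times) - len(true_warnings)

    # A seizure counts as predicted if some warning precedes it within the horizon.
    predicted = [
        s for s in seizure_times
        if any(0 < s - w <= horizon for w in warning_times)
    ]
    sensitivity = len(predicted) / len(seizure_times) if seizure_times else 0.0
    false_prediction_rate = false_warnings / total_hours
    return sensitivity, false_prediction_rate


# Invented example: 3 warnings, 2 seizures, 40 hours of recording.
sens, fpr = evaluate_warnings([1.0, 10.0, 20.0], [3.5, 30.0], total_hours=40.0)
```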

47
Training Results
Performance characteristics of automated seizure
warning algorithm with the best
parameter-settings of training data set.
48
RECEIVER OPERATING CHARACTERISTICS (ROC)
  • ROC curve (receiver operating characteristic) is
    used to indicate an appropriate trade-off that
    one can achieve between
  • the false positive rate (1-Specificity, plotted
    on X-axis) that needs to be minimized
  • the detection rate (Sensitivity, plotted on
    Y-axis) that needs to be maximized.

49
Test Results
Performance characteristics of automated seizure
warning algorithm with the best parameter
settings on testing data set.
50
Validation of the ASWS algorithm
  • Temporal Properties
  • Surrogate Seizure Time Data Set
  • 100 Surrogate Data Sets
  • Spatial Properties
  • Non-Optimized ASWS: selecting non-optimal
    electrode sites
  • 100 Randomly Selected Electrodes

51
Prediction Scores: Surrogate Data and
Non-Optimized ASWS
52
Remarks
  • Optimization as feature selection for brain
    monitoring
  • Developed an online real-time seizure prediction
    system
  • Tested on the dataset of
  • 10 patients suffering from temporal lobe seizures
  • 90 days (2100 hours) of EEG data
  • 58 seizures
  • Seizure Prediction
  • Predicting 70% of temporal lobe seizures on
    average
  • Giving a false alarm rate of 0.16 per hour on
    average
  • What's next? Fundamental questions on brain
    physiology

53
Time Series Classification I
  • Support Vector Machines with Dynamic Time Warping

54
Other Dynamical Measures: Phase Profiles
55
Other Dynamical Measures: Entropy H of Attractor
56
Classification of Physiological States
57
Support Vector Machines
From 1 electrode
58
Input
  • Standard SVM Input
  • 30 electrodes × 30 data points × 3 dynamical
    features = 2,700 features
  • Time Series SVM Input
  • 30 × 29 data pairs × 3 dynamical features = 2,700
    − 90 = 2,610 features

59
Dynamic Time Warping
60
Preliminary Data Set
  • 132 5-minute epochs of pre-seizure EEGs
  • 300 5-minute epochs of normal EEGs
  • Pre-seizure 0-30 minutes before seizure
  • Normal 10 hours away from seizure

61
Metrics for Performance Evaluation
                        PREDICTED CLASS
                        Class=Yes    Class=No
ACTUAL CLASS Class=Yes  a            b
ACTUAL CLASS Class=No   c            d

a = TP (true positive),  b = FN (false negative)
c = FP (false positive), d = TN (true negative)
62
Sensitivity and Specificity
  • Sensitivity measures the fraction of positive
    cases that are classified as positive.
  • Specificity measures the fraction of negative
    cases classified as negative.
  • Sensitivity = TP/(TP + FN)
  • Specificity = TN/(TN + FP)
  • Sensitivity can be considered as a detection
    (prediction or classification) rate that one
    wants to maximize.
  • Maximize the probability of correctly classifying
    patient states.
  • False positive rate can be considered as
    1-Specificity which one wants to minimize.
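
The two formulas above translate directly into code. A minimal sketch follows; the counts are made-up numbers for illustration and are not results from the talk.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of positives classified as positive
    specificity = tn / (tn + fp)   # fraction of negatives classified as negative
    return sensitivity, specificity


# Hypothetical counts: a=TP, b=FN, c=FP, d=TN.
sens_demo, spec_demo = sensitivity_specificity(tp=40, fn=10, fp=5, tn=45)
false_positive_rate = 1 - spec_demo
```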

63
Leave-one-out Cross Validation
  • Cross-validation can be seen as a way of applying
    partial information about the applicability of
    alternative classification strategies.
  • K-fold cross validation
  • Divide all the data into k subsets of equal size.
  • Train a classifier using k-1 groups of training
    data.
  • Test a classifier on the omitted subset.
  • Iterate k times.
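
The k-fold steps above can be sketched in a few lines; leave-one-out is the special case where k equals the number of samples. The nearest-neighbour "classifier" and the toy data are assumptions used only to make the example runnable, not the classifiers used in the talk.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_validate(samples, labels, k, train_fn, predict_fn):
    """Return the fraction of samples classified correctly across all folds."""
    correct = 0
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in test)
    return correct / len(samples)


# Toy 1-nearest-neighbour classifier on scalar features (illustrative only).
def train_1nn(xs, ys):
    return list(zip(xs, ys))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


toy_x = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
toy_y = [0, 0, 0, 1, 1, 1]
loo_accuracy = cross_validate(toy_x, toy_y, k=len(toy_x),
                              train_fn=train_1nn, predict_fn=predict_1nn)
```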

64
Empirical Results
65
Automated Seizure Prediction Paradigm
[Diagram labels: Multichannel Data Acquisition; Feature Extraction / Cluster
Analysis; Pattern Recognition; Interface Technology; VNS; Drug; User.
Initiate a variety of therapies (e.g., electrical stimulation, drug
injection).]
66
Related Patents
  • Multi-dimensional multi-parameter time series
    processing for seizure warning and prediction.
    Patent 7,263,467 (Issued on August 28, 2007).
  • Optimization of Multi-dimensional Time Series
    Processing for seizure warning and prediction.
    Patent 7,373,199 (Issued on May 13, 2008).
  • Optimization of spatio-temporal pattern
    processing for seizure warning and prediction.
    Patent 7,461,045 (Issued on December 2, 2008).
  • Multi-dimensional dynamical analysis. U.S.
    Utility Patent application filed on December 21,
    2006, Serial No. 11/339,606.
  • Closed-Loop State-Dependent Seizure Prevention
    Systems. U.S. Utility Patent application filed on
    December 19, 2006, Serial No. 11/641,292.

67
Brain Network Models
  • Brain Connectivity Networks Based on fMRI Data

68
The Problem
  • Certain neurological diseases are very difficult
    to diagnose at early stages
  • The Functional Magnetic Resonance Imaging (fMRI)
    technique provides a vast amount of information
    about the structure and function of the human brain,
    but there is a lack of methods to analyse these data
  • Computational methods and algorithms based on
    mathematical models should be applied in order to
    find and recognize key patterns in this "ocean
    of data"

69
Network Models
  • Network models of human brain
  • Partition of the brain into regions of interest
  • Functional interconnections between regions in
    brain

70
Connectivity Networks
71
MRI Data
  • Blood flow level as an indicator of neuronal
    activity
  • Representation of values of signal in spatial
    voxels as 2D and 3D images

72
fMRI Data
  • Measurements are performed every 2 seconds
    over 6 minutes for each brain voxel of
    size 2 mm x 2 mm x 2 mm
  • The fMRI data is therefore a set of time series,
    corresponding to particular elementary volumes of
    the brain. In our data set each series contains
    180 elements.

73
fMRI Data, Vector Representation
[Figure: a voxel at position (x, y, z) in a coordinate system with origin 0
and axes X, Y, Z]
74
Small World Networks
  • Small world phenomenon first described by Stanley
    Milgram in the 1960s.
  • Six degrees of separation
  • Erdős number

75
From Random Graph to Regular Lattice
  • Random graphs generally have a low mean
    shortest path length and a low clustering
    coefficient
  • Regular lattices have a high mean shortest path
    length and a high clustering coefficient
  • Small world networks have a low mean shortest path
    length yet still a high clustering coefficient

76
Random Graph vs Regular Lattice
77
Small World Network
78
Quantitative Measures of Small World Property
  • Characteristic path length
  • Clustering coefficient
  • Global efficiency
  • Nodal efficiency
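
Two of the measures listed above, nodal and global efficiency, can be sketched for an unweighted graph using the usual inverse-shortest-path definitions. This is an illustrative assumption: the talk works with weighted connectivity networks, whose efficiency values are scaled differently. Function names and the toy graph are hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances (in hops) from source to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def nodal_efficiency(adj, node):
    """Mean inverse distance from node to every other node (unreachable -> 0)."""
    dist = shortest_path_lengths(adj, node)
    return sum(1.0 / d for v, d in dist.items() if v != node) / (len(adj) - 1)


def global_efficiency(adj):
    """Average nodal efficiency over all nodes."""
    return sum(nodal_efficiency(adj, u) for u in adj) / len(adj)


# Toy path graph 0 - 1 - 2 as an adjacency list.
path_graph = {0: [1], 1: [0, 2], 2: [1]}
eff_end = nodal_efficiency(path_graph, 0)
eff_global = global_efficiency(path_graph)
```

The middle node of the path reaches both neighbours in one hop and therefore has the highest nodal efficiency.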

79
Brain Connectivity Networks
  • Brain connectivity networks possess small world
    properties
  • We predicted that network characteristics, such
    as global and local efficiency values, would be
    decreased for people with Parkinson's disease.

80
Nodes in Connectivity Network
  • How to define brain regions (nodes in the
    network)?
  • Clustering problem
  • Standard MNI template

81
Signal Time Series Form Clusters
82
Clustering Problem
  • Each data set contains roughly 100,000 time
    series, each consisting of 180 elements
  • Efficient algorithms should be developed in order
    to solve this problem

83
Standard MNI Brain Atlas
  • Partition of the brain into 116 brain regions

84
Edges in Connectivity Network
  • Weighted graph with nodes corresponding to MNI
    brain regions
  • Weights of edges defined based on correlation
    between averaged neural activity over the regions

85
Signal Processing
  • Neural activity
  • Head movements during the MR session
  • Respiratory and heart rhythms
  • Noise

86
Maximal Overlap Discrete Wavelet Transform
  • Wavelet is a small wave
  • Wavelet transform is a decomposition of initial
    signal into linear combination of wavelets

87
Time Series Decomposition by Wavelets
88
Wavelet Coefficients Correlation
  • Inter-regional correlations in resting state fMRI
    data are particularly salient at frequencies
    below 0.1 Hz
  • Second scale wavelet coefficients correspond to the
    0.06-0.12 Hz frequency range

89
Connectivity Strength
  • Signal vectors are averaged over each region
  • Define the level 2 wavelet coefficients of the
    averaged signals of regions A and B.
  • The connectivity between regions A and B is the
    correlation between these wavelet coefficients.
90
Definition of Distance Between Nodes
  • For each time series S = (s1, s2, ..., sn) of size
    n there is a corresponding point in n-dimensional
    space
  • For normalized vectors x and y (zero mean, unit
    length) the distance between end points is equal to
    $\|x - y\| = \sqrt{2\,(1 - \mathrm{corr}(x, y))}$
  • Therefore, (1 - corr(x,y)) may serve as a
    measure of distance between time series
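
The relation between distance and correlation can be checked numerically. For series that are centred and scaled to unit length, the squared distance between end points equals 2(1 - corr(x, y)), because the dot product of the normalized vectors is the Pearson correlation. The helper names and the sample series below are illustrative.

```python
import math


def normalize(series):
    """Subtract the mean and scale to unit Euclidean length."""
    m = sum(series) / len(series)
    centered = [s - m for s in series]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def corr(x, y):
    """Pearson correlation as the dot product of normalized vectors."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


def endpoint_distance(x, y):
    """Euclidean distance between the end points of the normalized vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(normalize(x), normalize(y))))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
lhs = endpoint_distance(x, y) ** 2
rhs = 2 * (1 - corr(x, y))
```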

91
Geometrical Representation
[Figure: vectors x and y drawn from the origin 0, with their difference
x - y; S = (s1, s2, ..., sn)]
92
Data Set
  • 15 healthy controls, 14 Parkinson patients
  • The network for each subject consists of 116
    nodes

93
Averaged Connectivity Networks
Control
Parkinson
94
Global Network Efficiency Values
Control (1.85 ± 0.57), Parkinson (1.12 ±
0.55), independent t-test p-value = 0.0017
95
Top 30 Nodal Efficiency Values
96
Nodal Efficiency Plot
Red line: control set; blue line: PD set
97
Discussion
  • Brain networks of Parkinson's patients show
    measurable alterations in comparison with healthy
    ones
  • Further research, in particular with different
    network models, may reveal patterns in brain
    networks which could be used as diagnostic
    criteria

98
Concluding Remarks
  • Overview of Epilepsy Research
  • Applications of Data Mining and Optimization
    Techniques
  • Interplay between theory and application
  • Feature Selection
  • Time Series Classification
  • Brain Clustering
  • Brain Network Models

99
Related Patents
  • Sensor registration by global optimization
    procedures. Patent 7,653,513 (Issued January 26,
    2010).
  • Atomic Magnetometer Sensor Array
    Magnetoencephalogram Systems and Method. United
    States Patent Application 20100219820 (Filed
    April 14, 2008)

100
References
  • Handbook of Massive Data Sets, co-editors J.
    Abello, P.M. Pardalos, and M. Resende, Kluwer
    Academic Publishers, (2002).

101
References
  • Clustering Challenges on Biological Networks, S.
    Butenko, W. A. Chaovalitwongse and P. M.
    Pardalos, World Scientific (2009).
  • Feature Selection for Consistent Biclustering
    via Fractional 0-1 Programming (with Stanislav
    Busygin and Oleg A. Prokopyev), Journal of
    Combinatorial Optimization, Volume 10, Number 1
    (2005), pp. 7-21.
  • Biclustering in Data Mining (with S. Busygin
    and O. Prokopyev), Computers & Operations
    Research, Volume 35, Issue 9 (2008), pp.
    2964-2987.
  • On Biclustering with Features Selection for
    Microarray Data Set (with S. Busygin and O.
    Prokopyev), in (BIOMAT 2005) Proceedings of the
    International Symposium on Mathematical and
    Computational Biology (edited by R. Mondaini and R.
    Dilao), World Scientific (2006), pp. 367-377.
  • Biclustering algorithms and applications in
    data mining and forecasting (with P.
    Xanthopoulos, N. Boyko and N. Fan), in
    Encyclopedia of Operations Research and
    Management Science (accepted, to appear),
    Wiley (2010).

102
References
  • Quantitative Neuroscience, co-editors P.M.
    Pardalos, C. Sackellares, P. Carney, and L.
    Iasemidis, Kluwer Academic Publishers, (2004).
  • Biocomputing, co-editors P.M. Pardalos and J.
    Principe, Kluwer Academic Publishers, (2002).

103
References
  • New in 2010: Computational Neuroscience,
    W.A. Chaovalitwongse, P.M. Pardalos, P.
    Xanthopoulos (Eds.), Series: Springer Optimization
    and Its Applications, Vol. 38.

104
References
  • Optimization in Medicine, Carlos Alves, Panos M.
    Pardalos, Luis Vicente (Eds.), 2008

105
References
  • Handbook of Optimization in Medicine, Panos M.
    Pardalos, Edwin H. Romeijn (Eds.), 2009

106
Reference
  • W. Chaovalitwongse, L.D. Iasemidis, P.M.
    Pardalos, P.R. Carney, D.-S. Shiau, and J.C.
    Sackellares. A Robust Method for Studying the
    Dynamics of the Intracranial EEG: Application to
    Epilepsy. Epilepsy Research, 64, 93-133, 2005.
  • W. Chaovalitwongse, P.M. Pardalos, and O.A.
    Prokopyev. Electroencephalogram (EEG) time series
    classification: Applications in epilepsy. Annals
    of Operations Research, 148, 1 (2006), pp. 227-250.
  • Jicong Zhang, Petros Xanthopoulos, Chang-Chia
    Liu, Panos M. Pardalos. Real-time differentiation
    of nonconvulsive status epilepticus from other
    encephalopathies using quantitative EEG analysis:
    A pilot study. Epilepsia, 51, 2 (2010), pp.
    243-250.
  • W. Chaovalitwongse , P.M. Pardalos, L.D.
    Iasemidis, D.-S. Shiau, and J.C. Sackellares.
    Dynamical Approaches and Multi-Quadratic Integer
    Programming for Seizure Prediction. Optimization
    Methods and Software, 20 (2-3) 383-394, 2005 .
  • L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W.
    Chaovalitwongse, K. Narayanan, A. Prasad, K.
    Tsakalis, P.R. Carney, and J.C. Sackellares. Long
    Term Prospective On-Line Real-Time Seizure
    Prediction. Journal of Clinical Neurophysiology,
    116 (3) 532-544, 2005.
  • P.M. Pardalos, W. Chaovalitwongse, L.D.
    Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R.
    Carney, O.A. Prokopyev, and V.A. Yatsenko.
    Seizure Warning Algorithm Based on Spatiotemporal
    Dynamics of Intracranial EEG. Mathematical
    Programming, 101(2) 365-385, 2004. (INFORMS
    Pierskalla Best Paper Award 2004)
  • W. Chaovalitwongse , P.M. Pardalos, and O.A.
    Prokopyev. A New Linearization Technique for
    Multi-Quadratic 0-1 Programming Problems.
    Operations Research Letters, 32(6) 517-522,
    2004. (Rank 5th in Top 25 Articles in Operations
    Research Letters)

107
Thank you for your attention!
  • Questions?

108
Conference in 2011