Title: Optimization-Based Data Mining Approaches in Neuroscience Research
1. Optimization-Based Data Mining Approaches in Neuroscience Research
- Panos M. Pardalos
- University of Florida
2. Introduction
- Data Mining: the practice of searching through large amounts of computerized data to find useful patterns or trends.
- Optimization: an act, process, or methodology of making something (as a design, system, or decision) as fully perfect, functional, or effective as possible; specifically, the mathematical procedures (as finding the maximum of a function) involved in this.
- Merriam-Webster Dictionary
3. Introduction
- The combination of data mining and optimization:
- Find the best way to extract meaningful patterns from data.
- Not always an easy task.
4. How Difficult Can Optimization Be?
- Given integers N1, N2, ..., Nk and M, find a subset of {N1, N2, ..., Nk} whose sum is equal to M.
- Can you find a better algorithm than O(2^k)?
- Exponential complexity?
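To make the O(2^k) bound concrete, the sketch below contrasts brute-force enumeration of all 2^k subsets with the textbook dynamic program, which runs in O(k·M) time assuming non-negative integers. That bound is pseudo-polynomial (M can be exponential in its number of bits), so it does not contradict the hardness of the problem. Function names are illustrative, not taken from the slides.

```python
from itertools import combinations


def subset_sum_brute_force(nums, target):
    """Try all 2^k subsets; return one whose sum equals target, or None."""
    for r in range(len(nums) + 1):
        for combo in combinations(nums, r):
            if sum(combo) == target:
                return list(combo)
    return None


def subset_sum_dp(nums, target):
    """O(k * target) dynamic program; assumes non-negative integers."""
    # reachable[s] holds one subset (as a list) summing to s, or None.
    reachable = [None] * (target + 1)
    reachable[0] = []
    for n in nums:
        # Iterate downwards so that each number is used at most once.
        for s in range(target, n - 1, -1):
            if reachable[s] is None and reachable[s - n] is not None:
                reachable[s] = reachable[s - n] + [n]
    return reachable[target]


nums = [3, 34, 4, 12, 5, 2]
print(subset_sum_brute_force(nums, 9))  # [4, 5]
print(subset_sum_dp(nums, 9))           # [4, 5]
```

The brute-force loop doubles its work with every added integer, while the dynamic program grows with the size of M instead.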
5. Hard Drive Cost
- Drops by a factor of approximately 10 every 5 years
6. Hard Drive Capacity
- Approximately 10 times more every 5 years
7. Processing Power
- The number of transistors on a computer processor doubles every two years
8. References
- Handbook of Massive Data Sets, co-editors J. Abello, P.M. Pardalos, and M. Resende, Kluwer Academic Publishers (2002).
9. Main Problems in Data Mining
- Data preprocessing
- Dimensionality reduction
- Feature selection
- Regression
- Clustering (unsupervised learning)
- Classification (supervised learning)
- Semi-supervised learning (between supervised and unsupervised)
- Biclustering
- Result validation
- Data visualization/representation
- Biomedical informatics is a challenging area with many of these problems.
10. Agenda
- Research Background
- Epilepsy
- Seizure Prediction
- Sources of Data
- Electroencephalogram (EEG) Time Series
- Dimensionality Reduction
- Chaos Theory
- Feature Selection for Brain Monitoring
- Time Series Classification of Neuro-Physiological States
- Brain Clustering
- Brain Network Models
- Concluding Remarks
11. Facts About Epilepsy
- At least 2 million Americans and another 40-50 million people worldwide (about 1% of the population) suffer from epilepsy.
- Epilepsy, which causes recurrent seizures, is the second most common brain disorder (after stroke).
- Epileptic seizures occur when a massive group of neurons in the cerebral cortex suddenly begins to discharge in a highly organized rhythmic pattern.
12. Epileptic Seizures
- Seizures usually occur spontaneously, in the absence of external triggers.
- Seizures cause temporary disturbances of brain functions such as motor control, responsiveness, and recall, which typically last from seconds to a few minutes.
- Seizures may be followed by a post-ictal period of confusion or impaired sensorium that can persist for several hours.
13. 10-Second EEGs: Seizure Evolution
14. Why Do We Care?
- Based on 1995 estimates, epilepsy imposes an annual economic burden of $12.5 billion in the U.S. in associated health care costs and losses in employment, wages, and productivity.
- Cost per patient ranged from $4,272 for persons with remission after initial diagnosis and treatment to $138,602 for persons with intractable and frequent seizures.
15. Current Epilepsy Treatment
- Pharmacological Therapy
- Anti-Epileptic Drugs (AEDs)
- Mainstay of epilepsy treatment
- Approximately 25 to 30% of patients remain unresponsive
- Epilepsy Resective Surgery
- Requires long-term invasive EEG monitoring to locate a specific, localized part of the brain where the seizures are thought to originate
- 50% of pre-surgical candidates do not undergo resective surgery, due to:
- Multiple epileptogenic zones
- Epileptogenic zone located in functional brain tissue
- Only 50-60% of surgical cases result in seizure freedom
16. Current Epilepsy Treatment
- Electrical Stimulation (Vagus Nerve Stimulator)
- Parameters (amplitude and duration of stimulation) adjusted arbitrarily
- As effective as one additional AED dose
- Side effects
- Seizure Prediction?
- Monitoring Unit?
- Forecasting Impending Seizures?
- Seizure Control?
- Deep Brain Stimulator?
17. Electroencephalogram (EEG)
- is a traditional tool for evaluating the physiological state of the brain;
- offers excellent spatial and temporal resolution to characterize the rapidly changing electrical activity of brain activation;
- captures voltage potentials produced by brain cells while they communicate.
- In an EEG, electrodes are implanted in deep brain structures or placed on the scalp over multiple areas of the brain to detect and record patterns of electrical activity and check for abnormalities.
18. From Microscopic to Macroscopic Level (Electroencephalogram - EEG)
19. Electrode Montage and EEGs
20. Scalp EEG Data Acquisition
21. Open Problems
- Is seizure occurrence random?
- If not, can seizures be predicted?
- If yes, are there seizure precursors (in EEGs) preceding seizures?
- If yes, what data mining techniques can be used to identify these precursors?
- Does normal brain activity differ from abnormal brain activity?
22. Goals of Research
- Test the hypothesis that seizures are not a random process.
- Demonstrate that seizures can be predicted.
- Feature selection to identify seizure precursors (Statistical Process Control)
- Demonstrate that normal and abnormal EEGs can be differentiated.
- Time Series Classification
- Better understand the epileptogenic process: how seizures are initiated and propagated.
- Brain Clustering
- Develop a closed-loop seizure control device (Brain Pacemaker).
23. Dimensionality Reduction
24. EEGs with the Curse of Dimensionality
- The brain is a non-stationary system.
- EEG time series are non-stationary.
- With 200 Hz sampling, 1 hour of EEGs comprises 200 × 60 × 60 × 30 = 21,600,000 data points ≈ 43.2 MB
- (assuming 30 channels and 16-bit samples, i.e. 2 bytes per data point)
- 1 day ≈ 1.04 GB
- 1 week ≈ 7.28 GB
- 20 patients ≈ 0.15 TB
[Figure: data volume scale from kilobytes through megabytes and gigabytes to terabytes]
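The storage figures above follow from simple arithmetic. This sketch reproduces them, assuming 30 channels (the ×30 factor on the slide), 2 bytes per sample, and decimal units (1 MB = 10^6 bytes); the constant names are illustrative.

```python
SAMPLING_HZ = 200      # samples per second per channel
CHANNELS = 30          # assumed from the x30 factor on the slide
BYTES_PER_SAMPLE = 2   # 16-bit samples

points_per_hour = SAMPLING_HZ * 60 * 60 * CHANNELS
mb_per_hour = points_per_hour * BYTES_PER_SAMPLE / 1e6
gb_per_day = mb_per_hour * 24 / 1e3
gb_per_week = gb_per_day * 7
tb_20_patients = gb_per_week * 20 / 1e3

print(points_per_hour)           # 21600000
print(round(mb_per_hour, 1))     # 43.2
print(round(gb_per_day, 2))      # 1.04
print(round(gb_per_week, 2))     # 7.26
print(round(tb_20_patients, 2))  # 0.15
```

The slide's 7.28 GB per week comes from multiplying the already rounded 1.04 GB by 7; carrying full precision gives about 7.26 GB, which still rounds to 0.15 TB for 20 patient-weeks.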
25. Data Transformation Using Chaos Theory
- Measure the brain dynamics from time series, as is also done for:
- Stock markets
- Currency exchanges (e.g., the Swedish krona)
- Apply dynamical measures (based on chaos theory) to non-overlapping EEG epochs of 10.24 seconds (2,048 points at 200 Hz).
- Maximum Short-Term Lyapunov Exponent (STLmax):
- measures the average uncertainty along the local eigenvectors and phase differences of an attractor in the phase space
- measures the stability/chaoticity of EEG signals
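Estimating STLmax starts from two preprocessing steps: cutting the signal into non-overlapping 2,048-point epochs and reconstructing an attractor in phase space. The sketch below shows both, using the standard method of delays for the reconstruction. The embedding dimension (7) and delay (4 samples) are illustrative assumptions, and the Lyapunov estimator itself (tracking the divergence of nearby trajectories in the reconstructed space) is not shown.

```python
import numpy as np

FS = 200          # sampling rate in Hz
EPOCH_LEN = 2048  # 10.24 s at 200 Hz


def split_epochs(signal, epoch_len=EPOCH_LEN):
    """Cut a 1-D signal into non-overlapping epochs, dropping the remainder."""
    n_epochs = len(signal) // epoch_len
    return np.reshape(signal[: n_epochs * epoch_len], (n_epochs, epoch_len))


def delay_embed(x, dim, tau):
    """Method of delays: row t is (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    n_vectors = len(x) - (dim - 1) * tau
    if n_vectors <= 0:
        raise ValueError("signal too short for this embedding")
    return np.stack([x[i * tau : i * tau + n_vectors] for i in range(dim)], axis=1)


rng = np.random.default_rng(0)
eeg = rng.standard_normal(FS * 60)              # one minute of synthetic single-channel data
epochs = split_epochs(eeg)                      # shape (5, 2048)
points = delay_embed(epochs[0], dim=7, tau=4)   # shape (2024, 7)
print(epochs.shape, points.shape)
```

Each row of `points` is one state of the reconstructed attractor; a Lyapunov estimator would then follow how quickly initially close rows drift apart.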
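26. Measure of Chaos
27. STLmax Profiles
28. Hidden Synchronization Patterns
29. How Similar Are They? Statistics to Quantify the Convergence of STLmax
- By paired-T statistic
- For electrode sites i and j, suppose their STLmax values within a window of $N = 60$ points (about 10 minutes) are $L_i^1, \dots, L_i^N$ and $L_j^1, \dots, L_j^N$, and let $D_{ij}^t = L_i^t - L_j^t$.
The T-index between electrode sites i and j is defined as
$$T_{ij} = \frac{\sqrt{N}\,\lvert \bar{D}_{ij} \rvert}{\hat{\sigma}_{ij}},$$
where $\bar{D}_{ij}$ and $\hat{\sigma}_{ij}$ are the sample mean and sample standard deviation of $D_{ij}^1, \dots, D_{ij}^N$.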
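A direct computation of the paired-T index, as a minimal sketch: it assumes the sample standard deviation (ddof = 1) and two equal-length STLmax profiles. The function name `t_index` is illustrative. With N = 60 points, 2.662 is the two-sided 1% critical value of the t distribution with 59 degrees of freedom, so a pair with $T_{ij} < 2.662$ is treated as converged.

```python
import numpy as np

T_CRITICAL = 2.662  # critical value quoted on the slides (N = 60 points)


def t_index(stl_i, stl_j):
    """Paired-T index between two STLmax profiles of equal length N."""
    d = np.asarray(stl_i, dtype=float) - np.asarray(stl_j, dtype=float)
    n = d.size
    sd = d.std(ddof=1)  # sample standard deviation of the differences
    if sd == 0:
        # Constant differences: identical profiles give 0, a constant offset gives inf.
        return 0.0 if d.mean() == 0 else np.inf
    return np.sqrt(n) * abs(d.mean()) / sd


value = t_index([2, 4, 6, 8], [1, 2, 3, 4])  # differences 1, 2, 3, 4
print(round(value, 3))                       # 3.873
print(value < T_CRITICAL)                    # False: not converged
```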
30. Statistically Quantifying the Convergence
31. Convergence of STLmax
32. Why Feature Selection?
- Not every electrode site shows the convergence.
- Feature selection: select the electrodes that are most likely to show the convergence preceding the next seizure.
33. Feature Selection
- Quadratic Integer Programming with Quadratic Constraints
34. Optimization Problem
- Optimization
- We apply optimization techniques to find a group of electrode sites such that:
- they are the most converged (in STLmax) electrode sites during the 10-minute window before the seizure;
- they show dynamical resetting (diverge in STLmax) during the 10-minute window after the seizure.
- Such electrode sites are defined as critical electrode sites.
- Hypothesis
- The critical electrode sites should be the most likely to show the convergence in STLmax again before the next seizure.
35. Notation and Modeling
- x is an n-dimensional column vector of decision variables, where each $x_i$ represents electrode site i:
- $x_i = 1$ if electrode i is selected to be one of the critical electrode sites;
- $x_i = 0$ otherwise.
- Q is an $n \times n$ matrix whose element $q_{ij}$ is the T-index between electrodes i and j during the 10-minute window before a seizure.
- b is an integer constant (the number of critical electrode sites).
- D is an $n \times n$ matrix whose element $d_{ij}$ is the T-index between electrodes i and j during the 10-minute window after a seizure.
- $\alpha = 2.662\,b(b-1)$ is a constant; 2.662 is the critical value of the T-index, as previously defined, for rejecting $H_0$: "two brain sites acquire identical STLmax values within a 10-minute window".
36. Multi-Quadratic Integer Programming
- To select critical electrode sites, we formulated this problem as a multi-quadratic integer (0-1) programming (MQIP) problem with:
- an objective function to minimize the average T-index among electrode sites;
- a linear constraint to specify the number of critical electrode sites;
- a quadratic constraint to ensure that the selected electrode sites show dynamical resetting.
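Using the notation of slide 35, the three components read: minimize $x^T Q x$ subject to $\sum_i x_i = b$, $x^T D x \ge \alpha$, $x \in \{0,1\}^n$. Assuming zero diagonals in Q and D, $x^T D x$ sums the post-seizure T-index over the $b(b-1)$ ordered pairs of selected sites, so $\alpha = 2.662\,b(b-1)$ requires their average to be at least 2.662. The exhaustive solver below is only a sketch for checking tiny instances (it visits all C(n, b) subsets); the name `select_critical_sites` and the 4-electrode matrices are hypothetical.

```python
from itertools import combinations

import numpy as np

T_CRITICAL = 2.662


def select_critical_sites(Q, D, b):
    """Exhaustively solve min x'Qx s.t. sum(x) = b, x'Dx >= alpha, x binary."""
    n = Q.shape[0]
    alpha = T_CRITICAL * b * (b - 1)
    best_sites, best_value = None, np.inf
    for sites in combinations(range(n), b):
        x = np.zeros(n)
        x[list(sites)] = 1.0
        if x @ D @ x < alpha:
            continue  # violates the dynamical-resetting constraint
        value = x @ Q @ x
        if value < best_value:
            best_sites, best_value = sites, value
    return best_sites, best_value


# Hypothetical T-index matrices for 4 electrodes.
Q = np.array([[0, 1, 5, 5],
              [1, 0, 5, 5],
              [5, 5, 0, 0.5],
              [5, 5, 0.5, 0]], dtype=float)  # before the seizure
D = np.full((4, 4), 5.0)                       # after the seizure
np.fill_diagonal(D, 0.0)
D[2, 3] = D[3, 2] = 1.0  # sites 2 and 3 do not reset

print(select_critical_sites(Q, D, b=2))
```

Sites 2 and 3 are the most converged pair before the seizure, but they fail the resetting constraint, so the solver returns sites (0, 1) with objective value 2.0.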
37. Conventional Linearization Approach for the Multi-Quadratic 0-1 Problem
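The slide's formulation is not recoverable, but the conventional way to linearize a 0-1 quadratic term is to replace each product $x_i x_j$ with a new variable $w_{ij} \ge 0$ constrained by $w_{ij} \le x_i$, $w_{ij} \le x_j$, and $w_{ij} \ge x_i + x_j - 1$. The small check below (a sketch; `feasible_w_interval` is a hypothetical helper) confirms that for binary $x_i, x_j$ these constraints force $w_{ij} = x_i x_j$. Because this adds on the order of $n^2$ variables and constraints, it motivates the more compact formulation on the next slide.

```python
from itertools import product


def feasible_w_interval(xi, xj):
    """Interval of w allowed by the standard product linearization constraints."""
    lower = max(0, xi + xj - 1)  # w >= xi + xj - 1 and w >= 0
    upper = min(xi, xj)          # w <= xi and w <= xj
    return lower, upper


for xi, xj in product((0, 1), repeat=2):
    lo, hi = feasible_w_interval(xi, xj)
    # For binary xi, xj the interval collapses to the single point xi * xj.
    assert lo == hi == xi * xj
    print(xi, xj, "->", lo)
```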
38. Theoretical Results: MILP Formulation for the MQIP Problem
- Consider the MQIP problem.
- We proved that the MQIP problem is EQUIVALENT to a MILP (mixed-integer linear programming) problem with the SAME number of integer variables.
39. Empirical Results: Performance on Larger Problems
40. Hypothesis Testing: Simulation
- Hypothesis
- The critical electrode sites should be the most likely to show the convergence in STLmax (drop in T-index below the critical value) again before the next seizure.
- The critical electrode sites are electrode sites that:
- are the most converged (in STLmax) electrode sites during the 10-minute window before the seizure;
- show dynamical resetting (diverge in STLmax) during the 10-minute window after the seizure.
- Simulation
- Based on 3 patients with 20 seizures, we compare the probability of showing the convergence in STLmax (drop in T-index below the critical value) before the next seizure between electrode sites that are:
- critical electrode sites;
- randomly selected (5,000 times).
41. Optimal vs. Non-Optimal
42. Simulation: Results
43. Statistical Process Control: How to Automate the System?
44. Automated Seizure Warning System
- Select critical electrode sites after every subsequent seizure.
- Continuously calculate STLmax from multichannel EEG signals.
- Give a warning when the T-index value drops below a critical value.
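The warning rule in the last step can be sketched as a threshold crossing: a warning fires when the T-index (for example, averaged over the critical electrode pairs) moves from at or above 2.662 to below it. This is a simplified illustration under that assumption; the published system has additional parameters that are not modeled here, and `warning_times` is a hypothetical name.

```python
import numpy as np

T_CRITICAL = 2.662


def warning_times(t_index_series, threshold=T_CRITICAL):
    """Indices where the T-index crosses from at/above threshold to below it.

    A series that already starts below the threshold does not fire at index 0.
    """
    t = np.asarray(t_index_series, dtype=float)
    below = t < threshold
    # Fire on the first sample of each run of below-threshold values.
    starts = below[1:] & ~below[:-1]
    return np.flatnonzero(starts) + 1


print(warning_times([5, 4, 2, 1, 3, 5, 2]))  # [2 6]
```

Firing only on the downward crossing, rather than on every sample below threshold, keeps one sustained convergence episode from producing a burst of repeated warnings.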
45. Data Characteristics
46. Performance Evaluation for the Automated Seizure Warning System (ASWS)
- To test this algorithm, a warning was considered true if a seizure occurred within 3 hours after the warning.
- Sensitivity: fraction of seizures correctly predicted
- False Prediction Rate: average number of false warnings per hour
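Both metrics can be computed from warning and seizure times, as in this sketch. It assumes times are given in hours and counts a seizure as predicted if any warning precedes it within the 3-hour horizon; the actual evaluation may handle details such as overlapping warnings differently. The function name and sample times are hypothetical.

```python
def evaluate_warnings(warnings_h, seizures_h, total_hours, horizon_h=3.0):
    """Return (sensitivity, false warnings per hour) for a 3-hour horizon.

    A warning is true if a seizure occurs within `horizon_h` hours after it;
    a seizure counts as predicted if at least one warning precedes it within
    the horizon.
    """
    def is_true(w):
        return any(0 <= s - w <= horizon_h for s in seizures_h)

    false_warnings = sum(1 for w in warnings_h if not is_true(w))
    predicted = sum(
        1 for s in seizures_h if any(0 <= s - w <= horizon_h for w in warnings_h)
    )
    sensitivity = predicted / len(seizures_h) if seizures_h else float("nan")
    return sensitivity, false_warnings / total_hours


# Hypothetical 40-hour recording: the warning at 1.0 h precedes the seizure at 2.5 h;
# the warnings at 10.0 h and 20.0 h are false, and the seizure at 30.0 h is missed.
print(evaluate_warnings([1.0, 10.0, 20.0], [2.5, 30.0], total_hours=40.0))  # (0.5, 0.05)
```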
47. Training Results
Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the training data set.
48. Receiver Operating Characteristics (ROC)
- An ROC (receiver operating characteristic) curve shows the trade-off one can achieve between:
- the false positive rate (1 - specificity, plotted on the x-axis), which should be minimized;
- the detection rate (sensitivity, plotted on the y-axis), which should be maximized.
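One common way to trace an ROC curve is to sweep a decision threshold over a detector score and record (false positive rate, sensitivity) at each threshold. The sketch below assumes a generic score where higher means "more likely positive". For the ASWS, the points would instead come from different parameter settings, so treat this as an illustration of the axes, not of that procedure.

```python
import numpy as np


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping a threshold over the scores.

    labels: 1 for positive (e.g. pre-seizure), 0 for negative.
    A sample is called positive when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    points = [(0.0, 0.0)]
    for thr in np.unique(scores)[::-1]:  # from strictest to most lenient
        predicted = scores >= thr
        tpr = (predicted & (labels == 1)).sum() / n_pos  # sensitivity
        fpr = (predicted & (labels == 0)).sum() / n_neg  # 1 - specificity
        points.append((float(fpr), float(tpr)))
    return points


print(roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```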
49. Test Results
Performance characteristics of the automated seizure warning algorithm with the best parameter settings on the testing data set.
50. Validation of the ASWS Algorithm
- Temporal Properties
- Surrogate Seizure Time Data Set
- 100 Surrogate Data Sets
- Spatial Properties
- Non-Optimized ASWS: selecting non-optimal electrode sites
- 100 Randomly Selected Electrodes
51. Prediction Scores: Surrogate Data and Non-Optimized ASWS
52. Remarks
- Optimization as feature selection for brain monitoring
- Developed an online real-time seizure prediction system
- Tested on a dataset of:
- 10 patients suffering from temporal lobe seizures
- 90 days (2,100 hours) of EEG data
- 58 seizures
- Seizure Prediction
- Predicting 70% of temporal lobe seizures on average
- Giving a false alarm rate of 0.16 per hour on average
- What's next? Fundamental questions on brain physiology
53. Time Series Classification I
- Support Vector Machines with Dynamic Time Warping
54. Other Dynamical Measures: Phase Profiles
55. Other Dynamical Measures: Entropy (H) of the Attractor
56. Classification of Physiological States
57. Support Vector Machines
From 1 electrode
58. Input
- Standard SVM input
- 30 electrodes × 30 data points × 3 dynamical features = 2,700 features
- Time series SVM input
- 3029 data pairs, 3 dynamical features: 2,700 → 90 features
59. Dynamic Time Warping
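Dynamic time warping compares two time series by finding the cheapest monotone alignment between them, so profiles that are similar but shifted or stretched in time still score as close. The sketch below is the classic O(nm) dynamic program with an absolute-difference cost. How the slides combine DTW with the SVM is not specified; one common option is a kernel such as exp(-γ·DTW), though such kernels are not guaranteed to be positive semidefinite.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with absolute-difference cost."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated alignment cost
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match, step in a only, step in b only.
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    return acc[n, m]


print(dtw_distance([0, 1, 2], [0, 1, 1, 2]))  # 0.0: same shape, stretched in time
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```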
60. Preliminary Data Set
- 132 5-minute epochs of pre-seizure EEGs
- 300 5-minute epochs of normal EEGs
- Pre-seizure: 0-30 minutes before a seizure
- Normal: 10 hours away from a seizure
61. Metrics for Performance Evaluation
                     Predicted: Class Yes   Predicted: Class No
Actual: Class Yes    a (TP)                 b (FN)
Actual: Class No     c (FP)                 d (TN)
a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
62. Sensitivity and Specificity
- Sensitivity measures the fraction of positive cases that are classified as positive.
- Specificity measures the fraction of negative cases that are classified as negative.
- Sensitivity = TP/(TP + FN)
- Specificity = TN/(TN + FP)
- Sensitivity can be considered a detection (prediction or classification) rate that one wants to maximize.
- Maximize the probability of correctly classifying patient states.
- The false positive rate can be considered as 1 - specificity, which one wants to minimize.
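The formulas above translate directly into code. This sketch counts the confusion-matrix cells from binary labels (1 = "Class Yes", e.g. pre-seizure) and derives sensitivity, specificity, and the false positive rate; the function names and the small label vectors are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count (TP, FN, FP, TN) for binary labels where 1 = 'Class Yes'."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fn, fp, tn


def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity, false positive rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, 1.0 - specificity


counts = confusion_counts([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
print(counts)                             # (1, 1, 1, 2)
print(sensitivity_specificity(*counts))   # sensitivity 0.5, specificity 2/3
```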
63. Leave-One-Out Cross-Validation
- Cross-validation can be seen as a way of applying partial information about the applicability of alternative classification strategies.
- K-fold cross-validation:
- Divide all the data into k subsets of equal size.
- Train a classifier using k - 1 subsets as training data.
- Test the classifier on the omitted subset.
- Iterate k times.
- Leave-one-out cross-validation is the special case in which k equals the number of samples.
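The k-fold procedure can be sketched as an index generator: shuffle once, split into k nearly equal folds (sizes differ by at most one when k does not divide the sample count), and hold out each fold in turn. Setting k to the number of samples gives leave-one-out. The function name and the fixed random seed are illustrative choices.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for k folds of (nearly) equal size.

    Setting k = n_samples gives leave-one-out cross-validation.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


for train_idx, test_idx in k_fold_indices(10, 5):
    # A classifier would be trained on train_idx and evaluated on test_idx here.
    print(len(train_idx), len(test_idx))  # 8 2 on every fold
```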
64. Empirical Results
65. Automated Seizure Prediction Paradigm
[Diagram: multichannel data acquisition; feature extraction/cluster analysis; pattern recognition; interface technology; user; initiate a variety of therapies (e.g., electrical stimulation such as VNS, drug injection)]
66. Related Patents
- Multi-dimensional multi-parameter time series processing for seizure warning and prediction. Patent 7,263,467 (issued August 28, 2007).
- Optimization of Multi-dimensional Time Series Processing for seizure warning and prediction. Patent 7,373,199 (issued May 13, 2008).
- Optimization of spatio-temporal pattern processing for seizure warning and prediction. Patent 7,461,045 (issued December 2, 2008).
- Multi-dimensional dynamical analysis. U.S. Utility Patent application filed December 21, 2006, Serial No. 11/339,606.
- Closed-Loop State-Dependent Seizure Prevention Systems. U.S. Utility Patent application filed December 19, 2006, Serial No. 11/641,292.
67. Brain Network Models
- Brain Connectivity Networks Based on fMRI Data
68. The Problem
- Certain neurological diseases are very difficult to diagnose at early stages.
- Functional magnetic resonance imaging (fMRI) provides a vast amount of information about the structure and function of the human brain, but there is a lack of methods to analyse these data.
- Computational methods and algorithms based on mathematical models should be applied in order to find and recognize key patterns in this "ocean" of data.
69. Network Models
- Network models of the human brain
- Partition of the brain into regions of interest
- Functional interconnections between regions in the brain
70. Connectivity Networks
71. MRI Data
- Blood flow level as an indicator of neuronal activity
- Representation of signal values in spatial voxels as 2D and 3D images
72. fMRI Data
- Measurements are taken every 2 seconds over 6 minutes for each 2 mm × 2 mm × 2 mm voxel of the brain.
- The fMRI data is therefore a set of time series, each corresponding to a particular elementary volume of the brain. In our data set, each series contains 180 elements (6 minutes / 2 seconds).
73. fMRI Data, Vector Representation
[Figure: each voxel at coordinates (x, y, z) on the X, Y, Z axes is represented by its signal time series]
74. Small World Networks
- The small-world phenomenon was studied experimentally by Stanley Milgram in the 1960s.
- Six degrees of separation
- Erdős number
75. From Random Graph to Regular Lattice
- Random graphs generally have a low mean shortest path length and a low clustering coefficient.
- A regular lattice has a high mean shortest path length and a high clustering coefficient.
- Small-world networks have a low mean shortest path length while still having a high clustering coefficient.
76. Random Graph vs. Regular Lattice
77. Small World Network
78. Quantitative Measures of Small World Property
- Characteristic path length
- Clustering coefficient
- Global efficiency
- Nodal efficiency
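The four measures can be computed from breadth-first-search distances, as in this sketch for an unweighted, undirected graph given as `{node: set(neighbors)}`. It uses the common definitions: efficiency is the mean inverse shortest-path distance (after Latora and Marchiori), with unreachable pairs contributing 0, and the clustering coefficient is the fraction of a node's neighbor pairs that are linked. The brain networks on these slides are weighted, and their exact definitions are not given, so this is an illustration only. It assumes at least two nodes and at least one edge.

```python
from collections import deque
from itertools import combinations


def shortest_path_lengths(adj, source):
    """BFS hop distances from source in an unweighted graph {node: set(neighbors)}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    k = len(adj[u])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def small_world_measures(adj):
    """Return (characteristic path length, clustering, global eff., nodal eff.)."""
    nodes = list(adj)
    n = len(nodes)
    dists = {u: shortest_path_lengths(adj, u) for u in nodes}
    # Nodal efficiency: mean inverse distance to all other nodes (unreachable -> 0).
    nodal_eff = {
        u: sum(1.0 / d for v, d in dists[u].items() if v != u) / (n - 1)
        for u in nodes
    }
    global_eff = sum(nodal_eff.values()) / n
    # Characteristic path length: mean distance over reachable ordered pairs.
    finite = [d for u in nodes for v, d in dists[u].items() if v != u]
    char_path = sum(finite) / len(finite)
    clustering = sum(local_clustering(adj, u) for u in nodes) / n
    return char_path, clustering, global_eff, nodal_eff


path_graph = {0: {1}, 1: {0, 2}, 2: {1}}
print(small_world_measures(path_graph))  # path length 4/3, clustering 0, global eff. 5/6
```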
79. Brain Connectivity Networks
- Brain connectivity networks possess small-world properties.
- We predicted that network characteristics, such as global and local efficiency values, would be decreased for people with Parkinson's disease.
80. Nodes in Connectivity Network
- How should the brain regions (the nodes in the network) be defined?
- Clustering problem
- Standard MNI template
81. Signal Time Series Form Clusters
82. Clustering Problem
- Each data set contains roughly 100,000 time series, each consisting of 180 elements.
- Efficient algorithms should be developed in order to solve this problem.
83. Standard MNI Brain Atlas
- Partition of the brain into 116 brain regions
84. Edges in Connectivity Network
- Weighted graph with nodes corresponding to MNI brain regions
- Edge weights defined based on the correlation between averaged neural activity over the regions
85. Signal Processing
- Neural activity
- Head movements during the MR session
- Respiratory and heart rhythms
- Noise
86. Maximal Overlap Discrete Wavelet Transform
- A wavelet is a "small wave".
- A wavelet transform is a decomposition of the initial signal into a linear combination of wavelets.
87. Time Series Decomposition by Wavelets
88. Wavelet Coefficients Correlation
- Inter-regional correlations in resting-state fMRI data are particularly salient at frequencies below 0.1 Hz.
- Second-scale (level 2) wavelet coefficients correspond to the 0.06 to 0.12 Hz frequency range.
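The 0.06 to 0.12 Hz range follows from the acquisition rate. With one volume every 2 seconds the sampling frequency is $f_s = 0.5$ Hz, and detail coefficients at level $j$ nominally cover $[f_s/2^{j+1}, f_s/2^j]$. The short check below assumes that standard dyadic band convention.

```python
TR_SECONDS = 2.0          # one fMRI volume every 2 seconds
fs = 1.0 / TR_SECONDS     # sampling frequency: 0.5 Hz


def wavelet_band(level, fs):
    """Nominal frequency band (Hz) of detail coefficients at a given level."""
    return fs / 2 ** (level + 1), fs / 2 ** level


print(wavelet_band(1, fs))  # (0.125, 0.25)
print(wavelet_band(2, fs))  # (0.0625, 0.125)
```

Level 2 therefore spans 0.0625 to 0.125 Hz, the range quoted on the slide, and lies mostly below the 0.1 Hz region where resting-state correlations are most salient.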
89. Connectivity Strength
- Average the signal vectors over each region.
- Define $W_A$ and $W_B$ as the level-2 wavelet coefficients of the averaged signals of regions A and B.
- The connectivity between regions A and B is $c_{AB} = \operatorname{corr}(W_A, W_B)$.
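A minimal sketch of the connectivity computation: average the voxel time series of each region, take level-2 MODWT wavelet coefficients, and correlate them. The slides do not name the wavelet, so the Haar filter used here ($h_2 = (1/4, 1/4, -1/4, -1/4)$ with circular boundary handling) is an assumption, as are the function names. The overall sign convention of the filter does not affect the correlation.

```python
import numpy as np

# Haar MODWT level-2 wavelet filter (an assumption; the slides do not name the wavelet).
H2_HAAR = np.array([0.25, 0.25, -0.25, -0.25])


def modwt_level2_haar(x):
    """Level-2 Haar MODWT wavelet coefficients with circular boundary handling."""
    x = np.asarray(x, dtype=float)
    # W[t] = sum_l h[l] * x[(t - l) mod N]; np.roll(x, l)[t] == x[(t - l) mod N]
    return sum(h * np.roll(x, l) for l, h in enumerate(H2_HAAR))


def connectivity(region_a, region_b):
    """Correlation of level-2 coefficients of two regions' averaged signals.

    region_a, region_b: arrays of shape (n_voxels, n_timepoints).
    """
    w_a = modwt_level2_haar(np.mean(region_a, axis=0))
    w_b = modwt_level2_haar(np.mean(region_b, axis=0))
    return float(np.corrcoef(w_a, w_b)[0, 1])


t = np.arange(180) * 2.0                 # 180 volumes, 2 s apart
base = np.sin(2 * np.pi * 0.09 * t)      # synthetic 0.09 Hz oscillation
region = np.vstack([base, base])         # two identical voxels
print(connectivity(region, region))      # approximately 1.0
print(connectivity(region, -region))     # approximately -1.0
```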
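90. Definition of Distance Between Nodes
- For each time series $S = (s_1, s_2, \dots, s_n)$ of size n, there is a corresponding point in n-dimensional space.
- For normalized vectors x and y (centered to zero mean and scaled to unit length), the distance between their end points is $\lVert x - y \rVert = \sqrt{2\,(1 - \operatorname{corr}(x, y))}$.
- Therefore, $1 - \operatorname{corr}(x, y)$ may serve as a measure of distance between time series.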
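The identity behind this distance can be checked numerically. For centered, unit-length vectors, $\lVert x - y \rVert^2 = 2 - 2\,x \cdot y$ and $x \cdot y = \operatorname{corr}(x, y)$, so the squared Euclidean distance equals $2(1 - \operatorname{corr}(x, y))$. The sketch below verifies this on random series of length 180 (the length used in the fMRI data); the helper names are illustrative.

```python
import numpy as np


def normalize(v):
    """Center a series and scale it to unit Euclidean length."""
    v = np.asarray(v, dtype=float) - np.mean(v)
    return v / np.linalg.norm(v)


def correlation_distance(x, y):
    """1 - Pearson correlation, used as a dissimilarity between time series."""
    return 1.0 - float(np.corrcoef(x, y)[0, 1])


rng = np.random.default_rng(1)
x, y = rng.standard_normal(180), rng.standard_normal(180)
lhs = np.linalg.norm(normalize(x) - normalize(y)) ** 2
rhs = 2.0 * correlation_distance(x, y)
print(np.isclose(lhs, rhs))  # True
```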
91. Geometrical Representation
[Figure: time series S = (s1, s2, ..., sn) represented as vectors x and y from the origin 0, with the difference x - y between their end points]
92. Data Set
- 15 healthy controls, 14 Parkinson's patients
- Each patient's network consists of 116 nodes
93. Averaged Connectivity Networks
[Figure panels: Control, Parkinson]
94. Global Network Efficiency Values
Control (1.85 ± 0.57), Parkinson (1.12 ± 0.55), independent t-test p-value = 0.0017
95. Top 30 Nodal Efficiency Values
96. Nodal Efficiency Plot
Red line: control set; blue line: PD set
97. Discussion
- Brain networks of Parkinson's patients show measurable alterations in comparison with those of healthy controls.
- Further research, in particular with different network models, may reveal patterns in brain networks that could be used as diagnostic criteria.
98. Concluding Remarks
- Overview of Epilepsy Research
- Applications of Data Mining and Optimization Techniques
- Interplay between theory and application
- Feature Selection
- Time Series Classification
- Brain Clustering
- Brain Network Models
99. Related Patents
- Sensor registration by global optimization procedures. Patent 7,653,513 (issued January 26, 2010).
- Atomic Magnetometer Sensor Array Magnetoencephalogram Systems and Method. United States Patent Application 20100219820 (filed April 14, 2008).
101. References
- Clustering Challenges on Biological Networks, S. Butenko, W. A. Chaovalitwongse, and P. M. Pardalos, World Scientific (2009).
- Feature Selection for Consistent Biclustering via Fractional 0-1 Programming (with Stanislav Busygin and Oleg A. Prokopyev), Journal of Combinatorial Optimization, Volume 10, Number 1 (2005), pp. 7-21.
- Biclustering in Data Mining (with S. Busygin and O. Prokopyev), Computers & Operations Research, Volume 35, Issue 9 (2008), pp. 2964-2987.
- On Biclustering with Features Selection for Microarray Data Set (with S. Busygin and O. Prokopyev), in BIOMAT 2005: Proceedings of the International Symposium on Mathematical and Computational Biology (edited by R. Mondaini and R. Dilao), World Scientific (2006), pp. 367-377.
- Biclustering algorithms and applications in data mining and forecasting (with P. Xanthopoulos, N. Boyko, and N. Fan), in Encyclopedia of Operations Research and Management Science, Wiley (2010), accepted to appear.
102. References
- Quantitative Neuroscience, co-editors P.M. Pardalos, C. Sackellares, P. Carney, and L. Iasemidis, Kluwer Academic Publishers (2004).
- Biocomputing, co-editors P.M. Pardalos and J. Principe, Kluwer Academic Publishers (2002).
103. References
- New in 2010: Computational Neuroscience, W.A. Chaovalitwongse, P.M. Pardalos, and P. Xanthopoulos (Eds.), Springer Optimization and Its Applications, Vol. 38.
104. References
- Optimization in Medicine, Carlos Alves, Panos M. Pardalos, and Luis Vicente (Eds.), 2008.
105. References
- Handbook of Optimization in Medicine, Panos M. Pardalos and Edwin H. Romeijn (Eds.), 2009.
106. References
- W. Chaovalitwongse, L.D. Iasemidis, P.M. Pardalos, P.R. Carney, D.-S. Shiau, and J.C. Sackellares. A Robust Method for Studying the Dynamics of the Intracranial EEG: Application to Epilepsy. Epilepsy Research, 64, 93-133, 2005.
- W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. Electroencephalogram (EEG) Time Series Classification: Applications in Epilepsy. Annals of Operations Research, 148(1), 227-250, 2006.
- J. Zhang, P. Xanthopoulos, C.-C. Liu, and P.M. Pardalos. Real-Time Differentiation of Nonconvulsive Status Epilepticus from Other Encephalopathies Using Quantitative EEG Analysis: A Pilot Study. Epilepsia, 51(2), 243-250, 2010.
- W. Chaovalitwongse, P.M. Pardalos, L.D. Iasemidis, D.-S. Shiau, and J.C. Sackellares. Dynamical Approaches and Multi-Quadratic Integer Programming for Seizure Prediction. Optimization Methods and Software, 20(2-3), 383-394, 2005.
- L.D. Iasemidis, P.M. Pardalos, D.-S. Shiau, W. Chaovalitwongse, K. Narayanan, A. Prasad, K. Tsakalis, P.R. Carney, and J.C. Sackellares. Long Term Prospective On-Line Real-Time Seizure Prediction. Clinical Neurophysiology, 116(3), 532-544, 2005.
- P.M. Pardalos, W. Chaovalitwongse, L.D. Iasemidis, J.C. Sackellares, D.-S. Shiau, P.R. Carney, O.A. Prokopyev, and V.A. Yatsenko. Seizure Warning Algorithm Based on Spatiotemporal Dynamics of Intracranial EEG. Mathematical Programming, 101(2), 365-385, 2004. (INFORMS Pierskalla Best Paper Award 2004)
- W. Chaovalitwongse, P.M. Pardalos, and O.A. Prokopyev. A New Linearization Technique for Multi-Quadratic 0-1 Programming Problems. Operations Research Letters, 32(6), 517-522, 2004. (Ranked 5th in the Top 25 Articles in Operations Research Letters)
107. Thank you for your attention!
108. Conference in 2011