Title: Application of Nonlinear Data Analysis Methods to Locating Disease Clusters
- Linda Moniz and William Peter
- International Society for Disease Surveillance
- Track 1: Surveillance Innovations, Spatial Method Innovations
- Raleigh, NC
- December 4, 2008
2. The Problem
- Daily reports of syndrome counts by clinic or home zip code.
- In this study we focus on the respiratory syndrome.
- We need to detect any unusual clusters of disease, quickly.
- These detections are for health departments and need to be done in real time, updated daily, and automated.
- What's unusual, and what's a cluster?
- Can dynamical methods assist currently used operational algorithms?
3. Operational Algorithms
- Many operationally used clustering methods are based on the Kulldorff Scan Statistic (1997).
- This algorithm is an individually most powerful test under ideal circumstances.
- This algorithm finds the most unlikely cluster in a space given the expected distribution. It calculates the significance using a Monte Carlo process.
- The shapes of the clusters are generally predetermined by the algorithm.
- Much current research into clustering revolves around improving the Scan Statistic's speed and/or accuracy.
- Scan Statistics are prone to false alerts when the algorithm is tuned for sensitivity.
- Can we use dynamical methods to filter the false alerts?
- We use a recent C implementation of the scan statistic, a collaboration of APL/CDC/SAIC, that works well and is fast (it uses a Gumbel distribution).
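The search described above can be sketched in Python. This is a minimal, hypothetical illustration of Kulldorff's Poisson log-likelihood ratio evaluated over circular zones, with a Monte Carlo p-value and a Gumbel approximation fitted by the method of moments to the simulated maxima. It is not the APL/CDC/SAIC C implementation: the function names, the zones built from each location's nearest neighbors, `max_size`, `replicas`, and the moment-based Gumbel fit are all assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def poisson_llr(c: float, e: float, total: float) -> float:
    """Kulldorff's Poisson log-likelihood ratio for one zone (excess risk only)."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    # The outside term vanishes when the zone holds every case.
    outside = 0.0 if c >= total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside


def circular_zones(points: Sequence[Point], max_size: int) -> List[List[int]]:
    """Hypothetical zone set: each location plus up to max_size - 1 nearest neighbours."""
    zones: List[List[int]] = []
    for cx, cy in points:
        order = sorted(range(len(points)),
                       key=lambda j: (points[j][0] - cx) ** 2 + (points[j][1] - cy) ** 2)
        zones.extend(order[:size] for size in range(1, max_size + 1))
    return zones


def max_llr(counts, expected, zones) -> Tuple[float, List[int]]:
    """Most unlikely zone: the one with the largest log-likelihood ratio."""
    total = sum(counts)
    best, best_zone = 0.0, []
    for zone in zones:
        llr = poisson_llr(sum(counts[j] for j in zone),
                          sum(expected[j] for j in zone), total)
        if llr > best:
            best, best_zone = llr, zone
    return best, best_zone


def scan(points, counts, baseline, max_size=3, replicas=199, seed=0):
    """Return (zone, LLR, Monte Carlo p, Gumbel p); all baselines must be > 0."""
    total = sum(counts)
    expected = [b * total / sum(baseline) for b in baseline]  # condition on the total
    zones = circular_zones(points, max_size)
    observed, zone = max_llr(counts, expected, zones)

    rng = random.Random(seed)
    null_max = []
    for _ in range(replicas):
        # Redistribute the same total count under the null hypothesis of no cluster.
        sim = [0] * len(points)
        for j in rng.choices(range(len(points)), weights=expected, k=total):
            sim[j] += 1
        null_max.append(max_llr(sim, expected, zones)[0])

    p_mc = (1 + sum(m >= observed for m in null_max)) / (replicas + 1)
    # Method-of-moments Gumbel fit to the simulated maxima (assumed approach).
    beta = statistics.stdev(null_max) * math.sqrt(6) / math.pi
    if beta == 0:
        return zone, observed, p_mc, p_mc
    mu = statistics.mean(null_max) - 0.5772156649 * beta
    p_gumbel = 1 - math.exp(-math.exp(-(observed - mu) / beta))
    return zone, observed, p_mc, p_gumbel
```

For example, on a 3 × 3 grid of locations with equal baselines, 30 cases at locations 0, 1 and 3, and 10 cases elsewhere, `scan` returns the zone [0, 1, 3] with a log-likelihood ratio of about 22.25. The Gumbel fit lets very small p-values be reported without running an impractically large number of replicas.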
4. Data: Particulars
- About 3 years of data from a network of clinics in the S.W. USA, all with the same reporting and care protocol: GOOD DATA.
5. Re-examination of the Problem
- This is a dynamical problem as well as a statistical problem. A cluster is a set of locations with shared dynamics.
- We proposed decoupling the detection of unusual behavior from the clustering of unusual behavior.
- First, see if there are detections.
- Second, see if the detections cluster in space.
- Last, see if the clustering is coincidental or if there is evidence of shared dynamics at the cluster locations.
- Detections in the same place: do they have evidence of shared dynamics?
6. Spatial Clustering Detection
- We use the ACE algorithm to see if detections cluster spatially.
- The ACE algorithm is a particle-mesh heuristic that can find clusters of any shape, based on adaptive criteria.
- The results shown here did not require spatial clustering: we used clinic-level data at 16 clinics throughout the U.S. Southwest.
- The method is needed for home zip-code-level data, which count disease cases for hundreds of zip codes.
- To VERIFY that the spatial clusters are related, we use Transfer Entropy (Schreiber, 2000) to test for information transfer between proposed locations in the cluster.
7. Detection
- Viboud et al. (2003) used Lorenz's Method of Analogues (1969) to retrospectively analyze and predict flu seasons across France over 10 years. Their results were much more accurate than those of currently used statistical methods.
- We use the Method of Analogues prospectively, with an adaptation: we use the data itself to determine a threshold level for unusual behavior.
- We use the data to determine the parameters for the method: the number of neighbors (7) and the number of days used in the forecast (1).
- These parameters gave optimal prediction in the testing period (60 days) for nearly all zip-code time series.
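A minimal Python sketch of this detection step, under stated assumptions: each forecast finds the k = 7 past days whose preceding m-day window is closest to the most recent m days (reading "number of days used in the forecast" as m = 1), and averages the values that followed them. The inverse-distance weights, the rule that sets the threshold at the mean plus z = 3 standard deviations of the 60-day baseline forecast errors, and all function names are hypothetical; the slides do not specify the weighting or the threshold rule.

```python
import statistics
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Combine analogue successors into one prediction (weights are normalised)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def analogue_forecast(history: Sequence[float], k: int = 7, m: int = 1) -> float:
    """One-step forecast by the method of analogues (hypothetical weighting)."""
    n = len(history)
    if n < m + k:
        raise ValueError("history too short for k analogues")
    current = history[n - m:]
    candidates = []  # (distance to current window, value that followed the window)
    for t in range(m, n):
        window = history[t - m:t]
        dist = sum((a - b) ** 2 for a, b in zip(window, current)) ** 0.5
        candidates.append((dist, history[t]))
    nearest = sorted(candidates, key=lambda c: c[0])[:k]
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # inverse distance (assumed)
    return weighted_average([v for _, v in nearest], weights)


def is_unusual(series: Sequence[float], baseline_days: int = 60,
               k: int = 7, m: int = 1, z: float = 3.0) -> bool:
    """Flag the last value if its forecast error exceeds a data-driven threshold."""
    today = len(series) - 1
    if today - baseline_days < m + k:
        raise ValueError("series too short for the baseline window")
    # Forecast errors over the baseline window set the threshold for "unusual".
    errors = [series[t] - analogue_forecast(series[:t], k, m)
              for t in range(today - baseline_days, today)]
    threshold = statistics.mean(errors) + z * statistics.stdev(errors)
    return series[today] - analogue_forecast(series[:today], k, m) > threshold
```

The weighting step can be checked against the worked example on slide 8: `weighted_average([0.9995, 0.9565, 0.9175], [0.3705, 0.2743, 0.3551])` returns about 0.9586, which agrees with the .9585 shown there up to rounding of the weights.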
8. Prediction: The Method of Analogues
Weight these observations to produce a prediction for the circled point.
(Figure values: .9995, .9565, .9175, .98.)
Prediction: $0.9995(0.3705) + 0.9565(0.2743) + 0.9175(0.3551) = 0.9585$
9. Transfer Entropy
- Consider the transition probabilities of a system at spatial site X: $p(x_{i+1} \mid x_i)$.
- What happens if we also consider the information at site Y? $p(x_{i+1} \mid x_i, y_i)$.
- If Y gives no information about X, $p(x_{i+1} \mid x_i, y_i) = p(x_{i+1} \mid x_i)$.
- Otherwise, we can define the Transfer Entropy from Y to X:
  $$T_{Y \to X} = \sum_{x_{i+1},\, x_i,\, y_i} p(x_{i+1}, x_i, y_i) \log \frac{p(x_{i+1} \mid x_i, y_i)}{p(x_{i+1} \mid x_i)}$$
- Transfer Entropy from Y to X tells us how much we can learn about site X's dynamics by monitoring Y.
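The transfer entropy definition can be estimated directly from two daily count series. The sketch below is a hypothetical plug-in (histogram) estimator in Python with a history of one day at both sites; treating each daily count as a discrete symbol is an assumption, and larger counts would need binning. As the conclusions note, sparse series make these probability estimates unreliable. `transfer_entropy` and `lagged_pair` are illustrative names, not part of any published code.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def transfer_entropy(source: Sequence[int], target: Sequence[int]) -> float:
    """Plug-in estimate, in bits, of T_{source -> target} with history length 1."""
    if len(source) != len(target) or len(target) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    x_next, x_now, y_now = target[1:], target[:-1], source[:-1]
    n = len(x_now)
    triples = Counter(zip(x_next, x_now, y_now))  # counts of (x_{i+1}, x_i, y_i)
    pairs_now = Counter(zip(x_now, y_now))        # counts of (x_i, y_i)
    pairs_self = Counter(zip(x_next, x_now))      # counts of (x_{i+1}, x_i)
    singles = Counter(x_now)                      # counts of x_i
    te = 0.0
    for (a, b, c), count in triples.items():
        p_with_y = count / pairs_now[(b, c)]           # p(x_{i+1} | x_i, y_i)
        p_without_y = pairs_self[(a, b)] / singles[b]  # p(x_{i+1} | x_i)
        te += (count / n) * math.log2(p_with_y / p_without_y)
    return te


def lagged_pair(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical test data: site X repeats site Y's value one day later."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    x = [0] + y[:-1]
    return y, x
```

With `y, x = lagged_pair(2000)`, the estimate from Y to X comes out close to 1 bit, because Y's value today fixes X's value tomorrow, while the estimate from X to Y stays close to 0. This asymmetry is what makes TE a test of directed information transfer rather than of simple correlation.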
10. Testing Method
- We compare our method with a fast, efficient method: the APL/CDC/SAIC scan statistic.
- Problem: we have 3 years' worth of historical data, and we do not know whether there were any disease outbreaks in it.
- To test or compare any detection algorithm, we inject artificial disease clusters on top of the real data and see whether they are detected.
- We also check whether other, spurious clusters are detected (possible false alerts).
- We choose a geometry/location for which the APL/CDC/SAIC algorithm has poor sensitivity and is prone to false alarms.
11. Injected Data
- We inject outbreaks of both 50 and 100 cases on top of the real clinic data.
- These cases are distributed in time according to a lognormal epidemic curve and in space over 3 clinics.
- We choose 5 different times of the year (labeled A-E) to inject the data and consider detections for each one separately.
- Affected zip codes are CZ3, CZ5 (center of the epidemic), and CZ10.
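A hedged Python sketch of how such an inject could be generated: each case independently draws an onset day from a lognormal distribution and a clinic from fixed spatial weights, so the total is exactly the requested number of cases. The lognormal parameters (`mu`, `sigma`), the 14-day window, a 50/25/25 split favoring CZ5 as the center, and the function names are assumptions for illustration, not the parameters used in the study.

```python
import random
from typing import Dict, List


def make_inject(n_cases: int, clinic_weights: Dict[str, float], duration: int = 14,
                mu: float = 1.0, sigma: float = 0.5, seed: int = 0) -> Dict[str, List[int]]:
    """Spread n_cases over clinics and days following a lognormal epidemic curve."""
    rng = random.Random(seed)
    names = list(clinic_weights)
    weights = [clinic_weights[name] for name in names]
    inject = {name: [0] * duration for name in names}
    for _ in range(n_cases):
        # Onset day from a lognormal curve; late tails are folded into the last day.
        day = min(int(rng.lognormvariate(mu, sigma)), duration - 1)
        clinic = rng.choices(names, weights=weights)[0]
        inject[clinic][day] += 1
    return inject


def add_inject(real: Dict[str, List[int]], inject: Dict[str, List[int]],
               start_day: int) -> Dict[str, List[int]]:
    """Return a copy of the real daily counts with the injected cases added on top."""
    out = {name: list(series) for name, series in real.items()}
    for name, extra in inject.items():
        if start_day + len(extra) > len(out[name]):
            raise ValueError(f"inject runs past the end of the {name} series")
        for offset, cases in enumerate(extra):
            out[name][start_day + offset] += cases
    return out
```

For example, `make_inject(50, {"CZ3": 0.25, "CZ5": 0.5, "CZ10": 0.25})` followed by `add_inject(real_counts, inject, start_day)` produces one 50-case scenario; repeating it with 100 cases and five start days would give a set of scenarios shaped like injects A-E. Keeping the real counts untouched and returning a copy makes it easy to run each detector on the same background with and without the inject.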
12. Detections: Analogue Method and Scan Statistics
There were also detections at other zip codes. At the time of Inject E, there were detections at zip codes CZ7 and CZ9 for the Analogue method, and clusters detected at several zip codes for the Scan Statistics.
13. Detection of Shared Dynamics via Transfer Entropy
- For each zip code with a detection for Inject E, we looked at (normalized) Transfer Entropy before and after the injection period.
- This included TE between zip codes that did not receive injected cases.
- We compared these with the same (normalized) differences in Transfer Entropy computed without injects.
- Transfer Entropy increased between zip codes that had injects and stayed the same between zip codes that did not have injects.
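The slides do not state how TE was normalized. One plausible choice, assumed here only for illustration, divides the transfer entropy from site Y to site X by the conditional entropy of X's next state given its current state. Because $T_{Y \to X} = H(X_{i+1} \mid X_i) - H(X_{i+1} \mid X_i, Y_i)$ and both entropies are non-negative for discrete counts, the ratio lies between 0 and 1 whenever $H(X_{i+1} \mid X_i) > 0$. The before/after comparison then becomes a difference of ratios:

$$\mathrm{NTE}_{Y \to X} = \frac{T_{Y \to X}}{H(X_{i+1} \mid X_i)}, \qquad \Delta_{Y \to X} = \mathrm{NTE}^{\text{after}}_{Y \to X} - \mathrm{NTE}^{\text{before}}_{Y \to X}$$

Under this reading, an inject shared by Y and X should give $\Delta_{Y \to X} > 0$, while pairs without injects should give $\Delta_{Y \to X} \approx 0$.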
14. Results
- Whether or not there was a detection at site CZ3, CZ10 or CZ5, the Transfer Entropy still showed an increase during the inject period.
- We computed TE for injections D and E and observed the same behavior in both inject periods.
- The detections together with the TE results showed that CZ3, CZ5 and CZ10 were in a cluster (definition: shared dynamics plus a detection at one or more of the zip codes) during the inject period.
- The Scan Statistics missed the center of the injected cluster and, in some cases, the cluster itself.
- Our method used a 60-day baseline for detection, the same as that used for the Scan Statistics, but used the entire time series up to injections D and E to calculate Transfer Entropy for those inject periods.
15. Conclusions/Further Work
- With minimal tuning, the detection capability of the Method of Analogues was as good as or better than that of the Scan Statistics in most of these cases.
- Detection sensitivity could be increased with further investigation into optimal parameters and weighting of observations.
- The Transfer Entropy results show there is potential for identifying clustering behavior rather than just spatial coincidence.
- This method has potential, but there are adaptability questions: it is difficult to get a reliable density estimate for TE with sparse time series.
- This problem is not as serious as it seems: clinics with low background case levels generally have few false alarms. False alarms for locations with high background case levels are the larger problem.
- We hope to use these results to encourage research in dynamical analysis of these data.
- Fundamental question: how does an outbreak or bioterrorism event differ dynamically from the normal background dynamics?
16. Acknowledgements
- Thanks to the organizers and sponsors!
- Thanks to the APL IRAD program for partially funding this research.
- The APL/CDC/SAIC scan statistic was developed under an R01 grant from CDC.
- References
- Kulldorff, M. A spatial scan statistic. Communications in Statistics: Theory and Methods 26(6) (1997).
- Lorenz, E. N. Atmospheric predictability as revealed by naturally occurring analogues. Journal of the Atmospheric Sciences 26 (1969).
- Schreiber, T. Measuring information transfer. Physical Review Letters 85(2) (2000).
- Viboud, C., et al. Prediction of the spread of influenza epidemics by the method of analogues. American Journal of Epidemiology 158(10) (2003).
17. ACE Algorithm
(Figure: coarse grid with Dx = 2.2, Dy = 2.2, Ng = 100.)
1. Create a simple, coarse grid over the data points.
2. Weight each point to its nearest grid point by interpolation: the GRID points get the mass!
3. Now add agents with rules to move along the mesh to find points with high densities.
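18. ACE Algorithm
(Figure: an agent at grid point (i, j) compares the masses m(i+1, j), m(i-1, j), m(i, j+1) and m(i, j-1) at the neighboring grid points; example masses range from 1.1 to 12.7.)
ACE finds all clusters, with relative weights.
Rule-based agents search through the masses at grid points.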
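The particle-mesh idea on these two slides can be sketched in Python. This is not the ACE code: it assumes cloud-in-cell (bilinear) weighting of each point onto the four surrounding grid nodes, and a simple agent rule in which an agent started at every node with at least `min_mass` steps to the heaviest of its four neighbors m(i+1, j), m(i-1, j), m(i, j+1), m(i, j-1) until no neighbor is heavier. The peak an agent reaches labels a cluster, and the mass collected there gives the cluster's relative weight. The function names and `min_mass` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

Node = Tuple[int, int]


def assign_mass(points: Sequence[Tuple[float, float]], dx: float, dy: float) -> Dict[Node, float]:
    """Cloud-in-cell weighting: each unit-mass point is split among 4 grid nodes."""
    mass: Dict[Node, float] = defaultdict(float)
    for x, y in points:
        i, j = math.floor(x / dx), math.floor(y / dy)
        fx, fy = x / dx - i, y / dy - j  # fractional position inside the cell
        mass[(i, j)] += (1 - fx) * (1 - fy)
        mass[(i + 1, j)] += fx * (1 - fy)
        mass[(i, j + 1)] += (1 - fx) * fy
        mass[(i + 1, j + 1)] += fx * fy
    return dict(mass)


def climb(mass: Dict[Node, float], start: Node) -> Node:
    """Assumed agent rule: move to the heaviest 4-neighbour until none is heavier."""
    node = start
    while True:
        i, j = node
        best = max([(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)],
                   key=lambda n: mass.get(n, 0.0))
        if mass.get(best, 0.0) <= mass[node]:
            return node
        node = best


def find_clusters(points, dx: float, dy: float, min_mass: float = 0.5) -> Dict[Node, float]:
    """Map each peak node to the fraction of (thresholded) mass whose agents reach it."""
    mass = assign_mass(points, dx, dy)
    totals: Dict[Node, float] = defaultdict(float)
    for node, m in mass.items():
        if m >= min_mass:
            totals[climb(mass, node)] += m
    grand = sum(totals.values())
    return {peak: w / grand for peak, w in totals.items()}
```

With the slide's spacing of 2.2 in both directions, 20 points at (1.5, 1.5) and 10 points at (10.5, 10.5) yield two peaks, at grid nodes (1, 1) and (5, 5), with relative weights of about 2/3 and 1/3. Because agents follow the mass rather than a fixed window, the clusters found this way need not be circular.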
19. Dynamics of a Process
- A statistical analysis of a time series treats each state of the system as the outcome of a random process governed only by the distribution of the states.
- A dynamical analysis of a time series assumes that each state is determined by the immediately previous state.
- The expected value (mean) of both series is the same.
- The next state of the blue line is 11; the next state of the red line is 8.
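A small, hypothetical Python illustration of this distinction: a chaotic logistic-map series (a stand-in for deterministic dynamics, not clinic data) and a shuffled copy contain exactly the same values, so a purely statistical summary such as the mean is identical. A one-step nearest-analogue forecast, which uses the previous state, is accurate only for the unshuffled series. `logistic_series`, `analogue_error`, and the parameter values are assumptions chosen for the example.

```python
import random
import statistics
from typing import List


def logistic_series(n: int, x0: float = 0.2, r: float = 4.0) -> List[float]:
    """Deterministic dynamics: each state is a fixed function of the previous one."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def analogue_error(series: List[float], train: int = 500) -> float:
    """Mean absolute error of a one-nearest-analogue, one-step-ahead forecast."""
    errors = []
    for t in range(train, len(series) - 1):
        # Closest earlier state whose successor is already known.
        j = min(range(t), key=lambda s: abs(series[s] - series[t]))
        errors.append(abs(series[j + 1] - series[t + 1]))
    return statistics.mean(errors)


dynamic = logistic_series(700)
shuffled = dynamic[:]
random.Random(0).shuffle(shuffled)  # same values, so same mean; dynamics destroyed
```

Both series share one mean, so a statistical forecast that always predicts the mean performs the same on each. The analogue error, however, is small for `dynamic` and large for `shuffled`, which is the sense in which knowing the previous state adds information.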
20. Differences in Transfer Entropy for Inject D