Title: Application of Nonlinear Data Analysis Methods to Locating Disease Clusters


1
Application of Nonlinear Data Analysis Methods to
Locating Disease Clusters
  • Linda Moniz and William Peter
  • International Society for Disease Surveillance
  • Track 1 Surveillance Innovations - Spatial
    Method Innovations
  • Raleigh, NC
  • December 4, 2008

2
The Problem
  • Daily reports of syndrome counts by clinic or
    home zip code.
  • In this study we focus on the respiratory
    syndrome.
  • We need to detect any unusual clusters of
    disease, quickly.
  • These detections are for health departments and
    need to be done in real time, updated daily, and
    automated.
  • What's unusual, and what's a cluster?
  • Can dynamical methods assist currently used
    operational algorithms?

3
Operational Algorithms
  • Many operationally utilized clustering methods
    are based on the Kulldorff Scan Statistic (1997).
  • This algorithm is an individually most powerful
    test under ideal circumstances.
  • This algorithm finds the most unlikely cluster in
    a space given the expected distribution. It
    calculates the significance using a Monte-Carlo
    process.
  • The shapes of the clusters are generally
    pre-determined by the algorithm.
  • Much current research into clustering revolves
    around improving the Scan Statistic's speed
    and/or accuracy.
  • Scan Statistics are prone to false alerts when
    the algorithm is tuned for sensitivity.
  • Can we use dynamical methods to filter the
    false alerts?
  • We use a recent C implementation of the scan
    statistic, a collaboration of APL/CDC/SAIC, that
    works well and is fast (it uses a Gumbel
    distribution).
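
As a rough illustration of the idea on this slide, the sketch below scores candidate zones with a Kulldorff-style Poisson log-likelihood ratio and estimates significance by Monte Carlo simulation. It is not the APL/CDC/SAIC implementation (which the slide says uses a Gumbel distribution); the function names, the `zones` input (a list of lists of location indices), and the default of 999 simulations are all assumptions made for this example.

```python
import numpy as np

def poisson_llr(c, e, total):
    """Poisson log-likelihood ratio for one zone: c observed, e expected."""
    if e <= 0 or c <= e:
        return 0.0  # only zones with excess cases are scored
    inside = c * np.log(c / e)
    # The outside term vanishes when every case falls inside the zone.
    outside = 0.0 if c == total else (total - c) * np.log((total - c) / (total - e))
    return inside + outside

def scan_statistic(counts, expected, zones, n_sim=999, seed=0):
    """Return (most unlikely zone, its LLR, Monte Carlo p-value)."""
    counts = np.asarray(counts, dtype=float)
    expected = np.asarray(expected, dtype=float)
    total = counts.sum()
    expected = expected * total / expected.sum()  # match observed total

    def max_llr(c):
        scores = [poisson_llr(c[z].sum(), expected[z].sum(), total) for z in zones]
        best = int(np.argmax(scores))
        return best, scores[best]

    best, observed = max_llr(counts)
    # Null hypothesis: cases fall on locations in proportion to expectation.
    rng = np.random.default_rng(seed)
    probs = expected / total
    sims = [max_llr(rng.multinomial(int(total), probs).astype(float))[1]
            for _ in range(n_sim)]
    p_value = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
    return zones[best], observed, p_value
```

Because the candidate zones are supplied by the caller, the cluster shapes are pre-determined in this sketch too, which mirrors the limitation noted above.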

4
Data - Particulars
  • About 3 years of data from a network of
    clinics in the S.W. USA, all with the same
    reporting and care protocol: GOOD DATA

5
Re-examination of the Problem
  • This is a dynamical problem as well as a
    statistical problem. A cluster is a set of
    locations with shared dynamics.
  • We proposed de-coupling the detection of unusual
    behavior and the clustering of unusual behavior.
  • First, see if there are detections.
  • Secondly, see if the detections cluster in space.
  • Lastly, see if the clustering is coincidental or
    if there is evidence of shared dynamics at the
    cluster locations.
  • Detections in the same place
  • Do they have evidence of shared dynamics?

6
Spatial Clustering Detection
  • We use the ACE algorithm to see if detections
    cluster spatially.
  • The ACE algorithm is a particle-mesh heuristic
    that can find any shaped cluster, based on
    adaptive criteria.
  • The results we see here did not require spatial
    clustering: we used clinic-level data at 16
    clinics throughout the U.S. Southwest.
  • The method must be used for home zip-code level
    data, which counts disease cases for hundreds of
    zip codes.
  • To VERIFY that the spatial clusters are related,
    we use Transfer Entropy (Schreiber, 2000) to test
    for information transfer between proposed
    locations in the cluster.

7
Detection
  • Viboud et al. (2003) used Lorenz's Method of
    Analogues (1969) to retrospectively analyze and
    predict flu seasons across France over 10 years.
    Their results were much more accurate than
    currently used statistical methods.
  • We will use the Method of Analogues prospectively
    and with an adaptation: we use the data itself to
    determine a threshold level for unusual behavior.
  • We use the data to determine the parameters for
    the method: the number of neighbors (7) and the
    number of days used in the forecast (1).
  • These parameters gave optimal prediction in the
    testing period (60 days) for nearly all zip-code
    time series.
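
A minimal sketch of prospective detection with the Method of Analogues, using the parameters quoted above (7 neighbors, 1 day of history, 60-day baseline). The slides do not give the weighting scheme or the threshold rule, so the inverse-distance weights `1 / (1 + distance)` and the `n_sigma` residual threshold are assumptions, and the function names are hypothetical.

```python
import numpy as np

def analogue_forecast(history, k=7, m=1):
    """Forecast the next value from the k nearest analogues of the last m days."""
    x = np.asarray(history, dtype=float)
    if len(x) < m + k:
        raise ValueError("history too short for the requested k and m")
    # Every length-m window with a known successor is a candidate analogue.
    windows = np.lib.stride_tricks.sliding_window_view(x[:-1], m)
    successors = x[m:]
    dist = np.linalg.norm(windows - x[-m:], axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (1.0 + dist[nearest])  # assumed weighting, not from the slides
    weights /= weights.sum()
    return float(weights @ successors[nearest])

def detect(history, baseline=60, k=7, m=1, n_sigma=3.0):
    """Flag the latest day if it exceeds its forecast by n_sigma residual SDs."""
    x = np.asarray(history, dtype=float)
    n = len(x)
    # One-step forecast errors over the baseline days preceding the latest day.
    residuals = [x[t] - analogue_forecast(x[:t], k, m)
                 for t in range(n - baseline - 1, n - 1)]
    threshold = n_sigma * np.std(residuals)  # assumed data-driven threshold
    excess = x[-1] - analogue_forecast(x[:-1], k, m)
    return excess > threshold, excess, threshold
```

Calling `detect(daily_counts)` once per day on a clinic's or zip code's series gives a daily, automated alert flag; the threshold adapts because it is recomputed from the most recent baseline residuals.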

8
Prediction: The Method of Analogues
Weight these observations to produce a
prediction for the circled point.
[Figure: time series with a circled point to predict; labeled values .9995, .9565, .9175, .98]
Prediction = .9995 (.3705) + .9565 (.2743)
+ .9175 (.3551) = .9585
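
The arithmetic on this slide can be checked directly: each of the three analogue observations is multiplied by its weight (the number in parentheses) and the products are summed. The variable names below are only for illustration.

```python
# Observations and weights as printed on the slide.
observations = [0.9995, 0.9565, 0.9175]
weights = [0.3705, 0.2743, 0.3551]

# Weighted sum of the analogue observations.
prediction = sum(o * w for o, w in zip(observations, weights))
print(round(prediction, 4))  # 0.9585
```

The weights sum to 0.9999, so the prediction is effectively a weighted average of the analogues.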
9
Transfer Entropy
  • Consider transition probabilities of a system at
    spatial site X:
  • $p(x_{i+1} \mid x_i)$.
  • What happens if we also consider the information
    at site Y?
  • $p(x_{i+1} \mid x_i, y_i)$.
  • If Y gives no information about X, then
    $p(x_{i+1} \mid x_i, y_i) = p(x_{i+1} \mid x_i)$.
  • Otherwise, we can define the Transfer Entropy
    from Y to X (Schreiber, 2000):
  • $T_{Y \to X} = \sum p(x_{i+1}, x_i, y_i) \log \frac{p(x_{i+1} \mid x_i, y_i)}{p(x_{i+1} \mid x_i)}$.
  • Transfer Entropy from Y to X tells us how much we
    can learn about site X's dynamics by monitoring Y.
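
A minimal sketch of how Transfer Entropy from Y to X might be estimated from two time series, using a plug-in (counting) estimate with one step of history. The equal-frequency discretization into `bins` symbols and the function name are assumptions for this example; the slides do not say which density estimator was used.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y, bins=3):
    """Plug-in estimate of transfer entropy from y to x, in bits."""
    def symbolize(s):
        # Equal-frequency bins (an assumption; other estimators are possible).
        s = np.asarray(s, dtype=float)
        edges = np.quantile(s, np.linspace(0, 1, bins + 1)[1:-1])
        return np.digitize(s, edges).tolist()

    xs, ys = symbolize(x), symbolize(y)
    n = len(xs) - 1
    triples = Counter(zip(xs[1:], xs[:-1], ys[:-1]))  # (x_{i+1}, x_i, y_i)
    pairs_xy = Counter(zip(xs[:-1], ys[:-1]))         # (x_i, y_i)
    pairs_xx = Counter(zip(xs[1:], xs[:-1]))          # (x_{i+1}, x_i)
    singles = Counter(xs[:-1])                        # x_i

    te = 0.0
    for (x1, x0, y0), count in triples.items():
        p_joint = count / n
        p_given_xy = count / pairs_xy[(x0, y0)]
        p_given_x = pairs_xx[(x1, x0)] / singles[x0]
        te += p_joint * np.log2(p_given_xy / p_given_x)
    return te
```

The estimate is asymmetric: `transfer_entropy(x, y)` and `transfer_entropy(y, x)` generally differ, which is what lets it indicate the direction of information flow between two sites.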

10
Testing Method
  • We compare our method with a fast, efficient
    method: the APL/CDC/SAIC scan statistic.
  • Problem: we have 3 years' worth of historical
    data. We do not know if there were any outbreaks
    of disease in the historical data.
  • In order to test or compare any detection
    algorithm, we inject artificial disease clusters
    on top of the real data and see if they can be
    detected.
  • We also see if other, spurious clusters are
    detected (false alerts?).
  • We choose a geometry/location for which the
    APL/CDC/SAIC algorithm has poor sensitivity and
    is prone to false alarms.

11
Injected Data
  • We inject both 50 cases and 100 cases on top of
    the real clinic data.
  • These cases are distributed according to a
    lognormal epidemic curve, spatially over 3
    clinics.
  • We choose 5 different times of the year (labeled
    A-E) to inject the data and consider detections
    for each one separately.
  • Affected zip codes are CZ3, CZ5 (center of the
    epidemic), CZ10.
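
The injection step could be sketched as follows. The lognormal parameters (`mu`, `sigma`), the 14-day outbreak length, the per-clinic shares, and the mapping of CZ3, CZ5 and CZ10 to column indices are all hypothetical; the slides only state that 50 or 100 cases follow a lognormal epidemic curve over 3 clinics.

```python
import numpy as np
from math import erf, log, sqrt

def lognormal_cdf(t, mu, sigma):
    """CDF of a lognormal distribution at time t (days since onset)."""
    return 0.0 if t <= 0 else 0.5 * (1 + erf((log(t) - mu) / (sigma * sqrt(2))))

def epidemic_curve(n_cases, n_days=14, mu=1.5, sigma=0.5):
    """Split n_cases over n_days in proportion to a lognormal curve."""
    cdf = np.array([lognormal_cdf(t, mu, sigma) for t in range(n_days + 1)])
    daily = np.diff(cdf)
    daily /= daily.sum()  # renormalize because the tail is truncated
    raw = daily * n_cases
    counts = np.floor(raw).astype(int)
    # Largest-remainder rounding so the total is exactly n_cases.
    short = int(n_cases - counts.sum())
    counts[np.argsort(raw - counts)[::-1][:short]] += 1
    return counts

def inject(background, start, curve, shares, seed=0):
    """Add the curve's daily cases to a days-by-clinics array of real counts."""
    rng = np.random.default_rng(seed)
    out = np.array(background, dtype=int, copy=True)
    clinics = list(shares)
    probs = np.array([shares[c] for c in clinics], dtype=float)
    probs /= probs.sum()
    for day, cases in enumerate(curve):
        # Allocate each day's injected cases among the affected clinics.
        out[start + day, clinics] += rng.multinomial(int(cases), probs)
    return out
```

For example, `inject(counts, start, epidemic_curve(50), {2: 0.25, 4: 0.5, 9: 0.25})` would place half of the cases at the assumed center of the epidemic and a quarter at each neighbor, leaving the original array untouched.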

12
Detections - Analogue Method and Scan Statistics
There were also detections at other zip codes.
At the time of Inject E, there were detections at
zips CZ7 and CZ9 for the Analogue method, and
clusters were detected at several zip codes for
the Scan Statistics.
13
Detection of Shared Dynamics Via Transfer Entropy
  • For each zip code with a detection for Inject E
    we looked at (normalized) Transfer Entropy before
    the injection period and after the injection
    period.
  • This included TE between zip codes that did not
    include the injects.
  • We compared this with the same (normalized)
    differences in Transfer Entropy without injects.
  • Transfer Entropy increased between zip codes that
    had injects and stayed the same between zip codes
    that did not have injects.

14
Results
  • Whether or not there was a detection at site CZ3,
    CZ10 or CZ5, the Transfer Entropy still showed an
    increase during the inject period.
  • We computed TE for injections D and E and
    observed the same behavior in both inject periods.
  • The detections together with the TE results
    showed that CZ3, CZ5 and CZ10 were in a cluster
    (definition: shared dynamics plus a detection at
    one or more of the zips) during the inject
    period.
  • The Scan Statistics missed the center of the
    inject cluster and, in some cases, the cluster
    itself.
  • Our method used a 60-day baseline for detection,
    the same as used for the Scan Statistics, but
    used the entire time series up to injections D
    and E to calculate Transfer Entropy for those
    inject periods.

15
Conclusions/Further Work
  • With minimal tuning, the detection capability of
    the Method of Analogues was as good as or better
    than the Scan Statistics in most of these cases.
  • Detection sensitivity could be increased with
    some more investigation into optimal parameters
    and weighting of observations.
  • The Transfer Entropy results show there is
    potential for identifying clustering behavior
    rather than just spatial coincidence.
  • This method has potential but there are
    adaptability questions. It is difficult to get a
    reliable density estimation for TE with sparse
    time series.
  • This problem is not as serious as it seems:
    clinics with low background case levels generally
    have few false alarms. False alarms for
    locations with high background case levels are
    the larger problem.
  • We hope to use these results to encourage
    research in dynamical analysis of these data.
  • Fundamental question: how does an outbreak or
    bioterrorism event differ dynamically from the
    normal background dynamics?

16
Acknowledgements
  • Thanks to the organizers and sponsors!
  • Thanks to the APL IRAD program for partially
    funding this research.
  • The APL/CDC/SAIC scan statistic was developed
    under an RO1 grant from CDC.
  • References
  • Kulldorff, M. A spatial scan statistic.
    Communications in Statistics: Theory and Methods
    26(6) (1997).
  • Lorenz, E.N. Atmospheric predictability as
    revealed by naturally occurring analogues. J.
    Atmos. Sci. 26 (1969).
  • Schreiber, T. Measuring information transfer.
    PRL 85(2) (2000).
  • Viboud et al. Prediction of the spread of
    influenza epidemics by the method of analogues.
    American Journal of Epidemiology 158(10) (2003).

17
ACE Algorithm
Dx = 2.2, Dy = 2.2, Ng = 100
1. Create a simple, coarse grid over the data
points.
2. Weight each point to its nearest grid
point by interpolation: GRID points get the
mass!
3. Now add agents with rules to move along the
mesh to find points with high densities.
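
The three steps on this slide could be sketched roughly as below. This is not the ACE algorithm itself, whose adaptive criteria and agent rules are not given in the slides. The sketch assumes bilinear (cloud-in-cell) interpolation for the mass assignment, a single greedy agent that steps to the heaviest of its four neighbors, coordinates measured from the grid origin, and `ng` as the number of grid nodes per side; all names are hypothetical.

```python
import numpy as np

def assign_mass(points, dx, dy, ng):
    """Spread each point's unit mass over the 4 surrounding grid nodes."""
    mass = np.zeros((ng, ng))
    for px, py in points:
        gx, gy = px / dx, py / dy
        # Lower-left node of the cell holding the point, kept inside the grid.
        i = min(max(int(np.floor(gx)), 0), ng - 2)
        j = min(max(int(np.floor(gy)), 0), ng - 2)
        fx = min(max(gx - i, 0.0), 1.0)
        fy = min(max(gy - j, 0.0), 1.0)
        mass[i, j] += (1 - fx) * (1 - fy)
        mass[i + 1, j] += fx * (1 - fy)
        mass[i, j + 1] += (1 - fx) * fy
        mass[i + 1, j + 1] += fx * fy
    return mass

def climb(mass, start):
    """Greedy agent: move to the heaviest neighbor until none is heavier."""
    i, j = start
    while True:
        neighbors = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
        neighbors = [(a, b) for a, b in neighbors
                     if 0 <= a < mass.shape[0] and 0 <= b < mass.shape[1]]
        best = max(neighbors, key=lambda node: mass[node])
        if mass[best] <= mass[i, j]:
            return (i, j)  # local density peak
        i, j = best
```

Because the agent only compares masses at grid nodes, the cost of the search depends on the grid size rather than on the number of cases, which is the usual appeal of particle-mesh methods. Releasing agents from several starting nodes would return several peaks, each with its own mass.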
18
ACE Algorithm
[Figure: grid-point masses labeled m(i,j+1), m(i+1,j), m(i-1,j), m(i,j-1), with example values m = 10.5, 10.8, 12.7, 2.6, 1.1]
ACE finds all clusters, with relative weights.
Rule-based agent searching through masses at grid
points.
19
Dynamics of a Process
  • A statistical analysis of a time series treats
    each state of the system as a random process that
    is governed by a distribution of the states only.
  • A dynamical analysis of a time series assumes
    that each state is determined by the immediately
    previous state.
  • The expected value (mean) of both is the same.
  • The next state of the blue line is 11; the next
    state of the red line is 8.
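
The contrast on this slide can be illustrated with two made-up series that share the same mean but have different next states. The series below are hypothetical, chosen only so that the outcome matches the numbers quoted above; they are not the series plotted on the slide, and the function names are assumptions.

```python
import numpy as np

def statistical_forecast(series):
    """Statistical view: the best guess is the mean of the observed states."""
    return float(np.mean(series))

def dynamical_forecast(series):
    """Dynamical view: average the states that followed the current state."""
    s = np.asarray(series, dtype=float)
    followers = s[1:][s[:-1] == s[-1]]
    return float(followers.mean()) if followers.size else float("nan")

# Hypothetical series with identical means (9.5).
blue = [11, 8, 9, 10] * 3  # currently at 10; 10 has always been followed by 11
red = [8, 11, 10, 9] * 3   # currently at 9; 9 has always been followed by 8

print(statistical_forecast(blue), statistical_forecast(red))  # 9.5 9.5
print(dynamical_forecast(blue), dynamical_forecast(red))      # 11.0 8.0
```

The statistical forecast cannot tell the two series apart, while the forecast conditioned on the previous state can.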

20
Differences in Transfer Entropy for Injects D.