Title: Space-Time Modeling and Application to Emerging Infectious Diseases
1 Space-Time Modeling and Application to Emerging
Infectious Diseases
National Health Research Institutes
July 26th, 2005
Division of Biostatistics and Bioinformatics
2 Outline
- Introduction
- STARMA Models
- Methods for STARMA Modeling and Software IEAST
- Modeling Emerging Infectious Diseases using
STARMA and IEAST
- Conclusion
3 Introduction
4 Introduction
- Tobler's First Law of Geography
- "Everything is related to everything else, but
near things are more related than distant
things."
5 Introduction
- Biological and ecological processes are often
organized and correlated in both space and time.
- Why use space-time data and space-time analyses?
- Various space-time models
- STKF, KKF, VARMA, STARMA, etc.
- Why STARMA models?
- Are emerging infectious diseases the only
application?
6 Scope of the Work
- An efficient and robust STARMA modeling method
- Space-time extensions of optimization algorithm
and model fitness measures
- Refinement of the space-time modeling procedure
- Software development -- IEAST
- The first general-purpose STARMA modeling and
analysis software
- Integrated Environment for Analyzing STARMA
models
- Application to the spread of WNV in an epidemic
in Detroit
- Modeling and analysis of Dead Crow Data
- Modeling and analysis of Human Case Data
- Cross analysis of Human Case Data and Dead Crow
Data
- Statistical inferences from these space-time
analyses
7 STARMA Models
8 Space-Time Variables Evolving over Time
- $z_{t,x}$: some ecological variable at spatial
coordinate vector $x$ at time $t$. $z_x$ forms a time
series for location $x$.
- These time series are not independent, but
influence each other via spatial proximity.
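[Figure: time series $z_{t,(0,0)}$, $z_{t,(2,1)}$, $z_{t,(1,2)}$, and $z_{t,(2,2)}$ at locations on an X-Y lattice, evolving along the time axis, with random noise inputs]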
9 General STARMA Models
- The general STARMA model has the stochastic
equation
[Equation not preserved in this transcript; its two groups of terms are labeled "AR terms" and "MA terms".]
- Model types
- STAR model (when $\theta_{k,b} = 0$)
- STMA model (when $\phi_{k,b} = 0$)
- Mixed model (when $\phi_{k,b} \neq 0$ and $\theta_{k,b} \neq 0$).
The strength of the autoregressive components is
measured by $\phi_{k,b}$, and the strength of the
shared moving average stochastic inputs by
$\theta_{k,b}$.
10 A Useful Form for STARMA Modeling
- By introducing the spatial weight matrices $W^{(l)}$,
we can express the general STARMA model in the
following form
[Equation not preserved in this transcript.]
- This is the equation actually used for the
implementation of IEAST and applications.
where $l$ is the spatial lag and $k$ the temporal lag;
$z_t$ is the observation vector at time $t$; $W^{(l)}$ is
the weight matrix for the $l$-th order; $\phi_{kl}$ are
the parameters of the autoregressive terms;
$\theta_{kl}$ are the parameters of the moving average
terms; and $e_t$ is the random noise vector at time $t$.
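The equation on this slide did not survive as text. For reference, the standard STARMA form given by Pfeifer and Deutsch (1980), written with the symbols defined on this slide, is sketched below. The order symbols $p$, $q$, $\lambda_k$, and $m_k$ are notation assumed here, and the sign convention on the moving average terms may differ from the one used in IEAST.

```latex
z_t = \sum_{k=1}^{p} \sum_{l=0}^{\lambda_k} \phi_{kl} \, W^{(l)} z_{t-k}
      - \sum_{k=1}^{q} \sum_{l=0}^{m_k} \theta_{kl} \, W^{(l)} e_{t-k}
      + e_t
```

Here $p$ and $q$ are the maximum temporal lags of the AR and MA parts, and $\lambda_k$ and $m_k$ are the maximum spatial lags at temporal lag $k$. Setting all $\theta_{kl} = 0$ gives a STAR model, and setting all $\phi_{kl} = 0$ gives an STMA model.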
11 Spatial Correlation Structure and Weight Matrices
- Spatial weight matrices are used to construct the
spatial correlation structure among locations.
- The following ordering is an example of the
definition of spatial correlation structure (up
to 4th-order neighbors) in a 2D system.
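The ordering figure was not preserved, so the following is only a minimal Python sketch of how weight matrices for a 2D lattice might be built. It assumes a common distance-based ordering (first order: the four edge-sharing cells; second order: the four diagonal cells) and uniform, row-normalized weights. The function name and the offset lists are illustrative, not part of IEAST.

```python
def lattice_weight_matrix(n_rows, n_cols, offsets):
    """Row-normalized weight matrix for an n_rows x n_cols lattice.

    offsets lists the (d_row, d_col) steps that define the neighbors
    of a cell at one spatial order. Cells are indexed row by row.
    """
    n = n_rows * n_cols
    W = [[0.0] * n for _ in range(n)]
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            neighbors = []
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                # Neighbors falling outside the lattice are skipped.
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    neighbors.append(rr * n_cols + cc)
            for j in neighbors:
                # Uniform weighting: each neighbor gets an equal share.
                W[i][j] = 1.0 / len(neighbors)
    return W


# Assumed neighbor definitions for the first two spatial orders.
FIRST_ORDER = [(-1, 0), (1, 0), (0, -1), (0, 1)]
SECOND_ORDER = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

W1 = lattice_weight_matrix(3, 3, FIRST_ORDER)
W2 = lattice_weight_matrix(3, 3, SECOND_ORDER)
```

Row normalization makes each row of `W1` and `W2` sum to one, so multiplying by the matrix yields the average value over a cell's neighbors of that order.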
12 Some Limitations of STARMA Modeling
- Raster based
- Requires a massive amount of space-time data
- Models generally may not be fully mechanistic
- Assumptions
- Stationarity
- Spatial Regularity
- Effects are constant
- Effects are linearly correlated
13 Methods for STARMA Modeling and Software IEAST
14 Box-Jenkins Modeling Method
[Flowchart: Data, Model Identification, Parameter Estimation, Diagnostic Check, Good? If yes, End; if no, Modify Model and repeat.]
15 Model Identification
- To determine the model type and orders.
- Conventionally, space-time autocorrelations (i.e.
STACF/STPACF) are used (Pfeifer and Deutsch,
1980).
- In this research, space-time extensions of model
fitness measures (i.e. AIC, BIC) are used to
assist identification when the method above does
not work. These measures are more objective and
computationally efficient.
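The exact space-time extension of AIC and BIC used in this work is not given on the slide. As a rough Python sketch, one plausible version pools the residuals over all sites and time steps and applies the usual Gaussian forms; the function name and the pooling choice are assumptions.

```python
import math


def space_time_aic_bic(residuals, n_params):
    """AIC and BIC from residuals pooled over space and time.

    residuals is a list of time steps, each a list with one residual
    per site. n_params is the number of fitted model parameters.
    Assumes Gaussian residuals with a single common variance.
    """
    flat = [e for step in residuals for e in step]
    n_obs = len(flat)
    sigma2 = sum(e * e for e in flat) / n_obs  # residual variance
    aic = n_obs * math.log(sigma2) + 2 * n_params
    bic = n_obs * math.log(sigma2) + n_params * math.log(n_obs)
    return aic, bic
```

Candidate model types and orders can then be ranked by these values, with lower values preferred; BIC penalizes extra parameters more heavily once the number of observations exceeds about 8.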
16 Model Identification using Space-Time
Autocorrelation Functions
- Example 1: STAR (MaxT = 2, MaxS = 1)
- STACF tails off
- STPACF cuts off at T-lag = 2, S-lag = 1
- Example 2: STMA (MaxT = 1, MaxS = 1)
- STACF cuts off at T-lag = 1, S-lag = 1
- STPACF tails off
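The rules above rely on the sample STACF. A minimal Python sketch of a common form of the estimator (following Pfeifer and Deutsch) is given below for a single spatial lag, assuming the data are already centered and the weight matrix for that lag is supplied. The function names and data layout are assumptions, and the STPACF is not shown.

```python
import math


def mat_vec(W, z):
    """Multiply weight matrix W (list of rows) by vector z."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def stacf(z, W, s):
    """Sample space-time autocorrelation at temporal lag s.

    z is a list of T centered observation vectors (one per time step).
    W is the weight matrix for the spatial lag of interest; use the
    identity matrix for spatial lag 0.
    """
    T = len(z)
    lagged = [mat_vec(W, zt) for zt in z]  # spatially lagged series
    num = sum(a * b
              for t in range(T - s)
              for a, b in zip(lagged[t], z[t + s]))
    den_l = sum(a * a for v in lagged for a in v)
    den_0 = sum(a * a for v in z for a in v)
    return num / math.sqrt(den_l * den_0)
```

Evaluating `stacf` over a grid of temporal lags and spatial orders gives the table that is inspected for tailing off or cutting off.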
17 Model Identification using Space-Time
Autocorrelation Functions
18 Model Identification using Model Fitness
Measures
Accuracies (numbers in red) of model type
selection using (1) variance of residuals, (2) AIC,
(3) BIC, and (4) AIC+BIC, based on 150 Monte Carlo
simulated datasets
19 Parameter Estimation
- To calculate the coefficients of a candidate model
for a given model type and orders.
- Two methods are needed for two kinds of models
- Linear models (i.e. STAR): linear ML estimator.
- Non-linear models (i.e. STMA and mixed):
multivariate nonlinear optimization.
- The multivariate and nonlinear nature raises
problems during optimization
- May converge to local optima
- Very time-consuming
- A good starting point is crucial for optimization
- Extra step: pre-estimation
- A space-time extension of the Hannan-Rissanen
algorithm is used.
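For the linear case, here is a minimal Python sketch of least-squares estimation for a small STAR model with one temporal lag and spatial lags 0 and 1. Under Gaussian noise, least squares coincides with conditional maximum likelihood for the coefficients. The model size, function names, and the noise-free demo data are assumptions for illustration, not the IEAST implementation.

```python
def mat_vec(W, z):
    """Multiply weight matrix W (list of rows) by vector z."""
    return [sum(w * x for w, x in zip(row, z)) for row in W]


def fit_star_1_1(z, W):
    """Least-squares fit of z_t = phi10*z_{t-1} + phi11*W z_{t-1} + e_t.

    z is a list of observation vectors; W is the first-order weight
    matrix. Solves the 2x2 normal equations by Cramer's rule.
    """
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(1, len(z)):
        prev = z[t - 1]
        nbr = mat_vec(W, prev)  # neighbor average at t-1
        for y, x1, x2 in zip(z[t], prev, nbr):
            s11 += x1 * x1
            s12 += x1 * x2
            s22 += x2 * x2
            b1 += x1 * y
            b2 += x2 * y
    det = s11 * s22 - s12 * s12
    phi10 = (b1 * s22 - b2 * s12) / det
    phi11 = (b2 * s11 - b1 * s12) / det
    return phi10, phi11


# Demo on made-up, noise-free data from three sites on a line.
W_line = [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
series = [[1.0, 0.0, 0.0]]
for _ in range(5):
    last = series[-1]
    avg = mat_vec(W_line, last)
    series.append([0.5 * a + 0.3 * b for a, b in zip(last, avg)])
print(fit_star_1_1(series, W_line))
```

Because the demo data contain no noise, the fit should recover the generating coefficients up to rounding. Models with MA terms have no such closed form, which is why a starting point from pre-estimation matters there.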
20 Diagnostic Check
- To decide the adequacy of a candidate model for
representing the given data.
- Methods
- Variance of residuals
- Space-time autocorrelations of residuals
- Significance testing of parameters
- Space-time extension of AIC/BIC
21 Modeling Procedures
[Flowchart (Box-Jenkins method): Data, Model Identification, Parameter Estimation, Diagnostic Check, Good? If yes, End; if no, Modify Model and repeat.]
22 Software for STARMA Modeling -- IEAST
- Developed using GNU Octave v2.1.40; runs on
various popular operating systems, e.g. MS Windows,
Mac OS, Unix.
- Two interfaces: menu-driven mode and programming
mode.
- Features
- True spatio-temporal analysis software
- Analyzing 2D lattice space-time datasets
- Full configurability
- Programming environment
- Improved estimation algorithms
- Improved diagnostic measures
- Estimation of spatial correlation structure
- Cross correlation analysis
- 2D/3D plotting abilities
23 IEAST: Menu-Driven Mode vs. Programming Mode
In menu-driven mode, users can conduct the
modeling procedure by selecting a series of
commands/options from the menu hierarchy.
24 IEAST: Menu-Driven Mode vs. Programming Mode
In programming mode, a set of sophisticated
instructions can be used to compose programs to
control the modeling flow and to conduct
statistical analyses.
25 Modeling Emerging Infectious Diseases using
STARMA and IEAST
26 State of the Art for Statistical Analyses of
Emerging Infectious Diseases
- As far as we know, no true spatial-temporal
statistical models and methods have been used.
- Space-time cluster analysis available
(Theophilides et al., 2003; Mostashari et al.,
2003; Hoebe et al., 2004)
- Spatial models available (Watson et al., 2004).
- Temporal models available.
27 Limitations of Simply Observing How a Spatial
Distribution Changes over Time
- For example, expansion of the leading edge of a
disease range.
- Is the disease spreading directly over long
distances but infrequently, or over short
distances frequently?
- This is important for projecting the future
spread.
28 STARMA Has Potential for the Early
Characterization of Infectious Diseases.
- STARMA acts as a prism: it can filter the
spatial-temporal correlations into direct effects
with known magnitude and spatial and temporal
lags.
- Not generally a complete, mechanistic model, but
puts critical constraints on models.
29 West Nile Virus
- The West Nile Virus (WNV) was first detected
in a woman with a mild fever in the West Nile
District of Uganda in 1937. Since then WNV has
spread to North Africa, Europe, West and
Central Asia, and the Middle East.
30 West Nile Virus in the United States
- Outbreak in NYC in Sep 1999. The vector is Culex
mosquitoes.
- Wild birds (89% are American crows) are the
principal hosts. Humans, horses, etc. are
incidental hosts.
- The incidence rate among crows is high. Infected
crows almost always die (68).
- Surveillance of dead crows has been used as an
indicator of WNV epidemics.
31 Dead Crow Data (DCD) and Human Case Data (HCD)
in 2002
- Time: Summer 2002 (April to October)
- Place: Detroit metro area (Oakland, Macomb, and
Wayne counties)
- DCD were collected systematically before and
during an outbreak among humans. Data mainly
consisted of locations and dates of reported
public sightings.
- HCD were obtained from clinicians in Michigan.
Data on address of residence and date of onset of
disease were obtained from the case-patient or
attending physician through telephone interviews.
32 Two Datasets Collected in 2002
[Diagram labels: Human Cases, Interview, Dead Crows, Toll-free, WWW pages, Data Cleaning, Geocoding, Longitude/Latitude, GIS (ArcMap)]
From www.rci.rutgers.edu/ insects/crowid.htm
33 Space-Time Analysis for Dead Crow Data
34 The Dead Crow Data
- In total, 1817 dead crow sightings scattered
within the three counties (red lines), spanning
28 weeks.
- Covered area (after truncation): a rectangular
area of 31.6 x 25.8 mi
- Divide the covered area into 10x10 cells. Cell
size: 3.16 x 2.58 mi
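The gridding step can be sketched in Python as follows. The sketch assumes each sighting has already been converted to planar coordinates (here in miles from the lower-left corner of the truncated rectangle) and a week index starting at 0; the function name and the input layout are hypothetical.

```python
def bin_sightings(sightings, x_min, y_min, width, height,
                  n_cols=10, n_rows=10, n_weeks=28):
    """Count sightings per (week, row, col) lattice cell.

    sightings is an iterable of (x, y, week) tuples. Sightings that
    fall outside the rectangle or the week range are dropped, which
    mirrors truncating the covered area.
    """
    cell_w = width / n_cols   # 31.6 / 10 = 3.16 for the slide's area
    cell_h = height / n_rows  # 25.8 / 10 = 2.58
    counts = [[[0] * n_cols for _ in range(n_rows)]
              for _ in range(n_weeks)]
    for x, y, week in sightings:
        col = int((x - x_min) // cell_w)
        row = int((y - y_min) // cell_h)
        if 0 <= col < n_cols and 0 <= row < n_rows and 0 <= week < n_weeks:
            counts[week][row][col] += 1
    return counts
```

Flattening each week's 10x10 grid row by row gives the 100-element observation vector $z_t$ used in the model.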
35 Spatial Correlation Structure and Trends
- Spatial correlation structure (uniform weighting)
- Preprocessing
- Remove spatio-temporal trend
- Spatial trend: 4th-order polynomial regression
trend surface
- Temporal trend: averaging over space.
- Remove mean
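As a small Python illustration of the temporal part of this preprocessing only, the trend at each time step can be estimated as the average over all cells and then subtracted. The polynomial trend surface and the final mean removal are not shown, and the function name is an assumption.

```python
def remove_temporal_trend(z):
    """Subtract the spatial average from every time step.

    z is a list of T lists, each holding one value per cell.
    Returns the detrended data and the estimated trend (one value
    per time step).
    """
    trend = [sum(step) / len(step) for step in z]
    detrended = [[v - m for v in step] for step, m in zip(z, trend)]
    return detrended, trend
```

The returned trend is the epidemic curve averaged over space, so it can also be inspected on its own.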
36 Model Identification: STACF
[Figure: STACF plot, annotated "STACF tails off"]
37 Model Identification: STPACF
38 Parameter Estimation
The parameters ($\phi_{ts}$) of this STAR model can be
estimated in IEAST by the linear maximum likelihood
estimator.
- Values in dark blue are nominally significant at
the 0.001 level.
- Values in light blue are nominally significant
at the 0.01 level.
39 Diagnostic Check
- Statistical significance of parameters
- The probabilities P that the $\phi_{ts}$ are not
significant are [table not preserved]
- Residual autocorrelations
[Figures: residual STACF and STPACF]
40 Interpretations for the DCD Analysis
- The STAR(3,4) model is the best-fitting one.
- The maximum spatial and temporal lags that are
important are smaller still: S = 2 (or 6.4 km) and
T = 2 weeks.
- Compare S = 1 to S = 2. The value for S = 1 is much
larger: cell boundary length effects.
- The virus is not spreading very far very fast.
Crows are not spreading the virus much spatially,
though they are probably amplifying it locally.
- Negative autoregressive effect at S = 1 and T = 2, 3.
- Appears to be a real effect.
- May be due to crow population depletion.
- Suggests there is a mixture of two STAR
processes, the dominant one reflecting
probability of infection, the other an echo
effect from depletion.
41 Additional Analyses and Results
- Additional Analyses
- Using 20x20 and other cell configurations
- Using different lag structures: Pfeifer's vs.
ring structure
- Using various polynomials for spatial de-trending
- Using a sub-sample of the data
- Results
- Consistent over various methods of spatial
de-trending, except that high-order polynomials
resulted in smaller AR.
- Consistent AR values using different lag
structures and cell sizes.
- Consistent implied spatial and temporal scales
over which there are significant or substantial
AR effects
42 Distances for Which There Is Significant Spatial
Correlation
- Based on different cell configurations: 10x10,
16x16, and 20x20
- The effective correlated area in the modeling
result is consistently about 10.75 km regardless
of cell size.
43 Alternative Spatial Correlation Structures
[Figure panels: ring structure; Pfeifer's structure]
44 Space-Time Analysis for Human Case Data
45 Human Case Data
- Over 500 human cases spanning 13 weeks
- Date of onset, converted to week
- Home addresses (names stripped), converted to
cell, same as for DCD.
- Used same arrays of cell sizes and spatial
correlation structures as for DCD.
- Same spatial and temporal de-trending method
46 Model Identification: STACF
47 Model Identification: STPACF
48 Parameter Estimation
[Table of estimates by spatial lag and temporal lag (weeks); values not preserved]
- Values in dark blue are nominally significant at
the 0.001 level.
- Values in light blue are nominally significant
at the 0.01 level.
49 Diagnostic Check
- Residuals: STACF and STPACF
[Figures: residual STACF and STPACF]
50 Interpretations for the HCD Analysis
- Most people are getting infected at or near their
homes.
- The incidences are highly autocorrelated in space
and time.
- The distribution or probability of infection is
highly localized.
- The WNV load and probability of human infection
are spreading slowly, in the sense of not
spreading very far very fast.
- Suggests localized spraying could reduce cases.
- With no depletion effect, the human case data
show effects that are positive and significantly
above zero for T-lag = 2 and S-lag >= 1,
especially at S-lag = 1.
51 Space-Time Cross Analysis for HCD and DCD
52 Space-Time Data: HCD and DCD
- The areas for cross analysis are the same for both
datasets.
- The configuration is again 10x10, spanning 28
weeks.
- Cell size is 6.31 x 6.31 km.
53 Both Temporal Epidemic Curves
- Dead crow reports lead human cases in time.
54 Space-Time Cross Correlations
[Figure: space-time cross correlations; the only surviving label is "-3"]
55 Interpretations for Space-Time Cross Correlations
- Drop smoothly to zero spatially and temporally.
- Very large (as high as 0.7).
- Across all spatial lags, the maximum cross
correlations are aligned at 3 weeks.
- The cross correlation at spatial lag 1 is
slightly greater than at spatial lag 0.
- When the temporal lag decreases to 8 or below, the
correlations between these two datasets are
negligible (< 0.1).
- When the spatial lag increases up to 10, the cross
correlations are reduced to as low as 0.2.
56 Are the Cross Correlations Spurious?
The autocorrelation of the DCD can spuriously
contribute to cross correlations. To eliminate
this effect, both datasets were pre-whitened
before calculating cross correlations.
- The result shows that the real cross
correlations are much larger than the spurious
components.
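The idea of pre-whitening can be illustrated with a purely temporal simplification in Python. Each series is filtered by its own fitted AR(1) model and the residuals are then cross-correlated. The analysis on this slide is space-time and the filter it used is not stated, so the AR(1) choice, the function names, and the weekly counts below are all made up for illustration.

```python
import math


def ar1_residuals(x):
    """Center x, fit AR(1) by least squares, return the residuals."""
    mean = sum(x) / len(x)
    c = [v - mean for v in x]
    phi = (sum(a * b for a, b in zip(c[:-1], c[1:]))
           / sum(a * a for a in c[:-1]))
    return [b - phi * a for a, b in zip(c[:-1], c[1:])]


def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag], for lag >= 0."""
    n = min(len(x), len(y)) - lag
    xs, ys = x[:n], y[lag:lag + n]
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = math.sqrt(sum((a - mx) ** 2 for a in xs)
                    * sum((b - my) ** 2 for b in ys))
    return num / den


# Made-up weekly counts, for illustration only.
dead_crows = [0, 1, 3, 7, 12, 9, 5, 2, 1, 0, 0, 0]
human_cases = [0, 0, 0, 0, 1, 2, 6, 11, 8, 4, 2, 1]

raw = cross_correlation(dead_crows, human_cases, 3)
whitened = cross_correlation(ar1_residuals(dead_crows),
                             ar1_residuals(human_cases), 3)
print(raw, whitened)
```

Comparing the raw and pre-whitened values at the same lag indicates how much of the raw cross correlation is carried by autocorrelation within each series.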
57 Summary for Modeling the Spread of WNV
- Crows are not spreading the disease spatially
very far very fast.
- Spread is very localized; perhaps other animals
or the mosquitoes themselves are spreading it
spatially.
- Humans are being infected largely at or near
their homes.
- Both crows and humans appear to be responding to
local viral loads.
- Dead crow findings precede human cases by two to
three weeks. Dead crows can be a good indicator
of human epidemics.
58 Conclusion
- It appears that STARMA modeling could be an
important tool for the early characterization of
many emerging and re-emerging infectious disease
epidemics.
- During the course of an epidemic, it could be
used (in principle) for forecasting, under
existing conditions or under potential courses of
action.
- While not generally a mechanistic model, STARMA
does inform spatial and temporal scales of
spread, and hence places constraints on mechanistic
models (which otherwise may have too many
parameters).
59 Funding Acknowledgements
- Michigan Agricultural Experiment Station,
Michigan State University.
- Center for Emerging Infectious Diseases, Michigan
State University.
- Centers for Disease Control and Prevention, USA.
60
- Thanks for your attention!
- Questions?
61 References
- C.J.P.A. Hoebe, H. de Melker, L. Spanjaard, J.
Dankert, and N. Nagelkerke. Space-time cluster
analysis of invasive meningococcal disease.
Emerging Infectious Diseases, 10(9):1621-1626,
2004.
- C.N. Theophilides, S.C. Ahearn, S. Grady, and M.
Merlino. Identifying West Nile virus risk areas:
The dynamic continuous-area space-time system.
American Journal of Epidemiology, 157:843-854,
2003.
- J. Watson, R. Jones, K. Gibbs, and W. Paul. Dead
crow reports and location of human West Nile
virus cases, Chicago, 2002. Emerging Infectious
Diseases, 10(5):938-940, 2004.
- F. Mostashari, M. Kulldorff, J.J. Hartman, J.R.
Miller, and V. Kulasekera. Dead bird clustering: A
potential early warning system for West Nile
virus activity. Emerging Infectious Diseases,
9:641-646, 2003.