Title: Statistical issues and some solutions around spatial surveillance for public health
1Statistical issues (and some solutions) around
spatial surveillance for public health
- Ken Kleinman
- Applied Statistics Workshop, 3/21/2007
2Outline
- Motivation
- Data setting
- Analytic approaches
- Evaluation
- Discussion
3Anthrax
In the initial phase of inhalational anthrax,
the symptoms resemble those of a bad cold.
Diagnosis of anthrax typically would require an
X-ray and a culture, but these are not common
responses to cold symptoms.
4Background: Course of anthrax
5Background: Victims' behavior
6Background: Hospital surveillance
7Background: Outpatient surveillance
8Background
- Anthrax cases by day after release
9Background
- In addition to anthrax, many other possible
bioterrorism agents have a non-specific initial
presentation: botulism, plague, smallpox, and
tularemia
- Also pandemic influenza
- Together with anthrax, this list includes all the
CDC class A bioterrorism agents except for viral
hemorrhagic fevers
10Data setting
11Data setting
- Our goal is to detect anthrax early, using doctor
visits
- The setting: an HMO and provider group with
240,000 members in eastern Massachusetts
12Data: syndromes
- Care providers enter diagnoses during each visit,
but no one will actually diagnose anthrax at the
first visit
- More common: cough
- Define a syndrome (symptom set) made up of
flu-like symptoms: Lower Respiratory Illness (LRI)
13Data: Electronic records
- Automated ambulatory medical records
- Live, continuously updated records of each
patient contact, including ICD-9 diagnosis
- In use at many practices
- Part of standard care procedures
- No additional burden on busy care providers: low
cost, practical
- Includes at least zip code spatial data
14Data: basic details
- The overall rate of LRI is small: 75,000 cases
over 1400 days for 240,000 individuals. The marginal
probability that a person is seen with symptoms that
land them in this syndrome on a given day is
0.00022
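As a quick check of that figure, the daily probability is just cases divided by person-days. A minimal sketch using only the three numbers quoted on the slide (the variable names are illustrative):

```python
cases = 75_000      # LRI visits over the study period (from the slide)
days = 1_400        # days of data (from the slide)
members = 240_000   # people under surveillance (from the slide)

person_days = days * members
daily_prob = cases / person_days        # chance one person shows up with LRI on one day
visits_per_day = cases / days           # average LRI visits per day across the whole system

print(f"{daily_prob:.5f} per person-day, about {visits_per_day:.0f} visits per day")
```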
15Data
More than one per 1000 were sick and went to the
doctor!
16Data
17Data
18The question
- Is there any evidence from this data to suggest
something unusual is going on?
- Prospectively, on any given day?
- Meaning: collect data today, decide today;
collect tomorrow, decide tomorrow, and so on
19Analytic approaches
20Analyses: naive
- The plots suggest approaches from the quality
control (QC) literature, e.g.
- CUSUM
- Other control charts
- But the data are noisier and there are
predictable patterns that do not signify
- Some have proposed CUSUM of residuals from a
time-series (TS) regression
21Approaches: better
- But we have spatial data: zip codes for each
patient. Surely we'll do better if we use it?
22Spatial approaches
- Spatial cluster detection / identification
- A long history with many approaches
- Knox (50s), M-statistic (Pagano et al.), Scan
Statistic (Kulldorff et al.), etc.
- Little methods work targeted at repeated, ongoing
data collection: surveillance
- Repeated small-area analysis
- E.g., GLMM Approach (Kleinman et al., AJE 2004),
SMART scores
23GLMM Approach
- Treat zip codes as independent subjects (not
true: closer in space means more highly correlated,
probably)
- Treat each day as a repeated observation on the
zip code: count of syndrome visits is our outcome
- These are longitudinally repeated binomial
observations: denominator is the number of
insurees living in the zip code
24Modeling Approach
- We use the Generalized Linear Mixed Models
approach to logistic regression
- Takes into account the correlation between
repeated days observed on a given zip code
- We could also use a GLMM Poisson regression, or
GLM (both discussed in Kleinman, in Lawson and
Kleinman, Wiley 2005)
25Modeling Approach
- The model looks just like a logistic regression,
with some additional subscripts and one more
parameter
- where i is the zip code with repeated days t,
y_it is the number of visits, n_it is the number of
insured, and b_i is a random effect, b_i ~ N(0, sigma_b^2)
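The displayed equation is not part of this transcript. A random-intercept logistic model consistent with the definitions above would look like the following; this is a reconstruction under that assumption, with x_it standing for the fixed-effect covariates and beta (a symbol introduced here) for their coefficients:

```latex
y_{it} \mid b_i \sim \mathrm{Binomial}(n_{it},\, p_{it}), \qquad
\operatorname{logit}(p_{it}) = \log\frac{p_{it}}{1 - p_{it}} = x_{it}^{\top}\beta + b_i, \qquad
b_i \sim N(0,\, \sigma_b^2)
```

The "one more parameter" relative to ordinary logistic regression is then the variance sigma_b^2 of the zip-code intercepts.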
26Modeling Approach
- The random effect, b_i, allows a unique intercept
for each region: is there a little community of
hypochondriacs somewhere? Are there more elderly
or children?
- In testing H0: sigma_b^2 = 0, we rejected H0: there
really are differences between areas.
- The estimated b_i are effectively a weighted
average of the crude rate in each area i and the
average of the b_i.
27Estimated random effects
- The weight of the crude rate increases with the
population in area i, so that areas with small
populations tend to have estimated b_i weighted
towards the mean
- This is helpful, since otherwise the larger
variability of those crude rates would interfere
in later steps
- Note: the b_i are AKA shrinkage estimators and
empirical Bayes estimators
28Modeling Approach
- Fixed effects covariates (x_it): 11 months, 6 days
of week, holiday indicators
- Face validity:
- Odds by month: highest in winter months, lowest in
summer
- Odds by day: highest Mondays, lowest on weekends
- OR for holidays: less than 1
29Modeling Approach
- To use the model, we invert the estimated logit
for each tract/day, using the estimated fixed
effects and the shrinkage estimators to get an
estimated binomial p_it for each census tract i on
some day t.
- (t is greater than any date used to fit the
model.)
- Then we calculate the probability of seeing as
many cases as we saw, or more, assuming that p_it
is correct.
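The two steps above (invert the logit, then take a binomial upper tail) can be sketched in a few lines. This is an illustration only: the linear predictor, tract size, and visit count below are made-up values, not estimates from the talk.

```python
import math

def inv_logit(eta: float) -> float:
    """Invert the logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def binom_upper_tail(y: int, n: int, p: float) -> float:
    """P(Y >= y) for Y ~ Binomial(n, p) with 0 < p < 1, as 1 - P(Y <= y - 1)."""
    def log_pmf(k: int) -> float:
        # Work on the log scale so large n does not overflow.
        return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                + k * math.log(p) + (n - k) * math.log1p(-p))
    lower = sum(math.exp(log_pmf(k)) for k in range(y))
    return min(1.0, max(0.0, 1.0 - lower))

# Hypothetical values for one tract on one day (assumptions, not from the talk).
eta_hat = -8.4   # estimated fixed effects plus shrinkage estimate of b_i
n_it = 1_000     # insured members living in the tract
y_it = 3         # syndrome visits observed that day

p_it = inv_logit(eta_hat)
p_value = binom_upper_tail(y_it, n_it, p_it)
print(p_it, p_value)
```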
30Use
- This is basically a p-value, for H0: the data
come from a binomial distribution with the p_it
estimated from the model
- There are 250 census tracts in our area
- We estimate a p-value for each tract each day
- A small multiple comparisons problem: more than
90,000 tests/year.
31Modeling Approach
- We report the Recurrence interval (RI)
- RI = (nominal p * ntests)^-1
- This is the number of times we'd have to do
ntests so that we'd expect one p-value this small
or smaller
- ntests could be the number of census tracts
tested each day
- One advantage to expressing it this way is that
big is bad.
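As a small worked example of the recurrence interval, here is a sketch with ntests set to the 250 tracts mentioned earlier; the p-value itself is a made-up illustration:

```python
def recurrence_interval(p_value: float, n_tests: int) -> float:
    """RI = 1 / (p * ntests): batches of n_tests needed to expect one p-value this small."""
    return 1.0 / (p_value * n_tests)

# With 250 tracts tested each day, one batch of tests is one day of surveillance.
ri_days = recurrence_interval(p_value=0.0001, n_tests=250)
print(round(ri_days))  # 40: a p-value this small is expected about once every 40 days
```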
32Approaches: Brute force
- Space and Time Scan statistic, aka SaTScan (free
software: www.satscan.org)
- The basic idea here is to
- Enumerate all possible (circular) clusters
- Calculate a statistic for each one
- Select the most unusual cluster
- (Kulldorff, JRSSA 2001 etc.)
33Brute Force Approach
- To get a p-value, use Monte Carlo testing (Dwass,
1957)
- Reassign all cases; enumerate, calculate, select
- Repeat n-1 times
- p-value = r/n, where r is the rank of the real
cluster among the set of n including the real
statistic and all of the n-1 Monte Carlo statistics
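The rank-based p-value can be sketched as follows. The function `simulate_null_stat` is a hypothetical stand-in for "reassign all cases, enumerate, calculate, select"; the toy version here just drops cases uniformly into zip codes and returns the largest count, which is not what SaTScan does.

```python
import random

def monte_carlo_p_value(observed_stat, simulate_null_stat, n=1000, seed=0):
    """p = r / n, with r the rank of the observed statistic among n values
    (the observed one plus n - 1 Monte Carlo replicates); rank 1 = most extreme."""
    rng = random.Random(seed)
    null_stats = [simulate_null_stat(rng) for _ in range(n - 1)]
    rank = 1 + sum(stat >= observed_stat for stat in null_stats)  # ties count against us
    return rank / n

def toy_null_stat(rng, n_cases=50, n_zips=10):
    """Toy null: scatter cases uniformly over zip codes, return the largest count."""
    counts = [0] * n_zips
    for _ in range(n_cases):
        counts[rng.randrange(n_zips)] += 1
    return max(counts)

p = monte_carlo_p_value(observed_stat=14, simulate_null_stat=toy_null_stat, n=1000)
print(p)
```

Note that the smallest attainable p-value is 1/n, so n bounds how extreme a result can be reported.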
34SaTScan heuristic
- Suppose these are the observed cases on a day
35SaTScan heuristic
- Consider the possible circular clusters with a
center on one of the cases
36SaTScan heuristic
One of those possible clusters
37SaTScan heuristic
Likelihood proportional to a function of:
- n: cases in circle
- N: cases total
- expected cases in circle
38SaTScan heuristic
- Repeat for all possible clusters (those with
different observed and expected cases have different
likelihood values)
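The likelihood formula itself is not in this transcript. Assuming the Poisson form of Kulldorff's scan statistic, the per-cluster statistic and the "select the most unusual cluster" step might look like this; the candidate list is made up for illustration:

```python
import math

def scan_log_lr(n: float, mu: float, N: float) -> float:
    """Log likelihood ratio for one candidate cluster under a Poisson model.
    n = observed cases inside, mu = expected cases inside, N = total cases.
    Only excesses count: returns 0 unless n > mu."""
    if n <= mu:
        return 0.0
    inside = n * math.log(n / mu)
    outside = (N - n) * math.log((N - n) / (N - mu)) if n < N else 0.0
    return inside + outside

# Hypothetical candidate circles as (observed, expected) pairs, with N = 100 cases total.
candidates = [(5, 5.0), (10, 5.0), (30, 25.0)]
most_unusual = max(candidates, key=lambda c: scan_log_lr(c[0], c[1], N=100))
print(most_unusual, scan_log_lr(*most_unusual, N=100))
```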
39SaTScan heuristic
- Of course, those can be centered on any observed
case (or any other point, for that matter)
40SaTScan heuristic
- To add time, imagine stacking maps, and
cylindrical potential clusters.
41SaTScan adaptation
Recall: likelihood proportional to a function of
n (cases in circle), N (cases total), and the
expected cases.
SaTScan assumes that the
expected number of cases is proportional to the
population living in the cluster. But I just
argued that this is not viable!
42Adjusted SaTScan
Recall: likelihood proportional to a function of
n (cases in circle), N (cases total), and the
expected cases.
Instead, we replace the expected cases
with some multiple of n_it p_it, the expected number
of cases under the GLMM model. This adjusts for
spatial variation and all the fixed covariates in
the model. (Kleinman et al., Epi and Inf 2005)
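One way to read "some multiple of n_it p_it" is to rescale the model-based expectations so they add up to the observed total. That choice of multiple is an assumption made for this sketch, and the numbers are invented:

```python
def adjusted_expected(n_it, p_it, total_cases):
    """Expected cases per area: n_it * p_it, rescaled so the areas sum to total_cases."""
    raw = [n * p for n, p in zip(n_it, p_it)]
    multiple = total_cases / sum(raw)   # assumed choice of the "multiple"
    return [multiple * r for r in raw]

members = [1_000, 4_000, 2_500]      # hypothetical insured members per area
model_p = [0.0002, 0.0001, 0.0004]   # hypothetical GLMM-based p_it per area
expected = adjusted_expected(members, model_p, total_cases=4)
print(expected)
```

Areas with the same population can now have different expected counts, which is the point of the adjustment.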
43Evaluation: Is surveillance worth doing? How
should we do it?
- Nomenclature
- A signal is generated by a statistical analysis
of the data
- An outbreak is a set of cases of disease in the
world
- A hit is a signal that is plausibly caused by an
outbreak
44Three approaches to evaluation
- Type I: Real events
- Take a data set from a proposed system that
covers a period with known real outbreaks.
- Apply proposed signal generation method.
- Evaluate performance of system/method in
detecting those events
45Three approaches to evaluation
- Type I: Real events
- Advantages: Real data is very convincing
- Disadvantages:
- Not informative about all kinds of outbreaks
(e.g., bioterrorism).
- Hard to get (exhaustive) data on real outbreaks.
- Few outbreaks.
- Many unknowns.
46Evaluation: real-world
- Can automated surveillance help traditional
surveillance?
- Compare gold standard detection of outbreaks (by
health department) to signals generated by using
an adjusted SaTScan analysis on HMO/outpatient
data
47Evaluation real-world
- Example: food-borne illness in MN
- We mimicked live surveillance: repeated
analyses 365 times, adding a day to the data set
each time; got 22 signals
- How do these compare with the 71 outbreaks of GI
disease?
- Sometimes having a good visual presentation of
the data is a key part of the data analysis
54Evaluation
- Fun picture, but how did we do? Did we get more
real hits than we'd expect by chance?
55Evaluation
- Did we get more real hits than we'd expect by
chance?
- Idea: randomization test
- Lay mock statistical clusters at random points of
random sizes: did we beat that?
- BUT: Population density varies over space, so
there is an anti-conservative bias to fewer hits
in mock clusters
56Evaluation
- Did we get more real hits than we'd expect by
chance?
- Idea: better randomization test
- Draw locations (and radii) for mock clusters from
observed distribution: did we beat that?
- BUT: Rate of clusters varies by season, so
there is an anti-conservative bias to fewer hits
with mock clusters
57Evaluation
- Permutation test (Kleinman et al., Statistics in
Medicine 2006)
- Break statistical signal data into
location/radius and date pairs
- Permute the pairs (link a location/radius to a
probably different date to create a new set of
pseudo-signals)
- Record number of hits
- Repeat. Result is the null distribution for the
number of hits assuming marginals of
location/radius and date; compare to actual
number of hits
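The permutation steps above can be sketched as below. The data layout and the hit rule (`toy_is_hit`: same zone and within a 7-day window) are hypothetical simplifications, not the definitions used in the cited paper.

```python
import random

def count_hits(signals, outbreaks, is_hit):
    """Number of signals that match at least one outbreak."""
    return sum(any(is_hit(sig, ob) for ob in outbreaks) for sig in signals)

def permutation_p_value(signals, outbreaks, is_hit, n_perm=999, seed=0):
    """signals: list of (place, date) pairs, where place = (location, radius).
    Shuffles dates across places, keeping both marginal distributions fixed."""
    rng = random.Random(seed)
    places = [place for place, _ in signals]
    dates = [date for _, date in signals]
    observed = count_hits(signals, outbreaks, is_hit)
    null_counts = []
    for _ in range(n_perm):
        shuffled = dates[:]
        rng.shuffle(shuffled)
        pseudo_signals = list(zip(places, shuffled))
        null_counts.append(count_hits(pseudo_signals, outbreaks, is_hit))
    as_extreme = sum(c >= observed for c in null_counts)
    return observed, (1 + as_extreme) / (n_perm + 1)

def toy_is_hit(signal, outbreak, window=7):
    """Hypothetical hit rule: same zone label, dates within `window` days."""
    (zone, _radius), day = signal
    ob_zone, ob_day = outbreak
    return zone == ob_zone and abs(day - ob_day) <= window

signals = [(("A", 5.0), 10), (("B", 3.0), 100), (("C", 4.0), 200)]
outbreaks = [("A", 12), ("B", 103), ("C", 199)]
observed, p = permutation_p_value(signals, outbreaks, toy_is_hit)
print(observed, p)
```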
58Results
59Three approaches to evaluation
- Type III: Complete simulation
- Simulate background data that resembles the
outbreak-free observed data from the system in
some period of time.
- Then simulate outbreaks with characteristics of
interest, as well as how they would appear in the
system.
- Evaluate performance in detecting them.
60Three approaches
- Type III: Complete simulation
- Advantages
- Known noise (allows you to assess effects of
wrong assumptions about behavior when no
outbreaks)
- Known event characteristics, any of which can be
modified
61Three approaches
- Type III: Complete simulation
- Disadvantages
- Many data sets, each large if spatial data
- Lots of speculation about outbreak and
non-outbreak behavior
- Hard to convince public health practitioners of
the value of simulations
62Three approaches
- Type II: Partial simulation
- Simulate attacks and add to real background data
- Injected events
63Partial simulation
- Advantages
- Known events (vs. real data)
- Many events (vs. real data)
- Smaller data sets (vs. full simulation)
- Fewer assumptions (vs. full simulation)
- Allows evaluation of full system as opposed to
just the statistical methods, at least for false
positives (vs. full simulation)
64Partial simulation
- Disadvantages
- Speculation (vs. real data)
- Effects of real events in the real data
complicate things
- Only one data set about true negatives (with
respect to events being simulated; vs. full
simulation)
- Can't evaluate violations of model assumptions
(vs. full simulation)
65Evaluation: Type II
- Simulating anthrax attacks
- Not a whole lot of data, but enough to be
plausible
- Sverdlovsk 1979
- US attacks of October 2001
- Monkey experiments in the 50s
66Schematic of our simulation
(Kleinman et al., Emerging Infectious Diseases
2005)
67Steps of our simulation
1. Choose a random point for anthrax drop
2. Choose a shape of sporefall
3. Calculate the number of spores each person is
exposed to (depends only on 1 and 2)
4. Determine who gets sick
5. Determine when they get sick
6. Determine who among sick is under surveillance
and who enters system
68Parameters
- Timing of onset is a fixed distribution
- Number in system by zip code is known
- Probability of seeing doctor is fixed at 0.2
- Areas of release: urban and suburban (2)
- Probability of illness/spore: 5 levels
- Shape of plume: 3 shapes
- We repeat three times each day of the year: 1095
simulations for each of 30 sets of conditions
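The counts on this slide follow from crossing the three varied factors. A small sketch of that design grid, with placeholder labels for the illness levels and plume shapes since their actual values are not given here:

```python
from itertools import product

areas = ["urban", "suburban"]
illness_levels = range(5)   # placeholders for the 5 per-spore illness probabilities
plume_shapes = range(3)     # placeholders for the 3 plume shapes

conditions = list(product(areas, illness_levels, plume_shapes))
runs_per_condition = 3 * 365          # three simulated releases per day of the year
total_runs = len(conditions) * runs_per_condition

print(len(conditions), runs_per_condition, total_runs)  # 30 1095 32850
```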
69Evaluation: Type II
- Plausible distributions for time to onset, given
illness: lognormal
[Figure panels: Sverdlovsk; Simulated Lognormal]
70Simulation
- Urban and suburban areas around Boston
71Parameters
- Agriculture Gaussian plume (Pasquill, 1974)
- Concentration depends on constants including wind
speed, plus x, which is the distance from the
release point in the wind direction, and y, the
distance perpendicular to the wind direction.
x appears only in sigma_y and sigma_z.
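For reference, the standard textbook Gaussian plume formula has this structure. The sketch below assumes that standard form (with ground reflection); the power-law dispersion function and all numeric inputs are hypothetical, not the values used in the simulation.

```python
import math

def dispersion(x: float, a: float, b: float) -> float:
    """Hypothetical power-law spread, sigma = a * x**b; real coefficients depend
    on atmospheric stability and are not taken from the talk."""
    return a * x ** b

def plume_concentration(q, u, y, z, h, sigma_y, sigma_z):
    """Standard Gaussian plume with ground reflection.
    q = release rate, u = wind speed, y = crosswind distance, z = height,
    h = release height; sigma_y and sigma_z carry all dependence on downwind x."""
    crosswind = math.exp(-y ** 2 / (2 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2 * sigma_z ** 2)))
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical

x = 500.0                                  # metres downwind (made up)
sy = dispersion(x, a=0.08, b=0.9)          # made-up coefficients
sz = dispersion(x, a=0.06, b=0.9)
on_axis = plume_concentration(q=1.0, u=3.0, y=0.0, z=0.0, h=0.0, sigma_y=sy, sigma_z=sz)
off_axis = plume_concentration(q=1.0, u=3.0, y=50.0, z=0.0, h=0.0, sigma_y=sy, sigma_z=sz)
print(on_axis, off_axis)
```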
75How to assess?
- False negative rate?
- Mean days to detection?
- Plot these vs. detection threshold?
- For every set of parameters?
- For every detection algorithm?
- Yuck!
78Evaluation
- Some more practical ideas
- From Kleinman and Abrams, Statistical Methods in
Medical Research 2006.
79Evaluation
- The ROC curve is generated by changing the
decision threshold, the point at which you
decide the result of the stat analysis is too
unusual to ignore.
- For a decision threshold, calculate sensitivity
(probability of a signal, given an outbreak) and
false positive rate (probability of a signal,
given no outbreak).
- The area under the ROC curve is a useful stat.
80Evaluation: wrinkles
- With injected events, sensitivity can easily be
estimated as the proportion of simulated attacks
that are detected
- Sensitivity = Pr(an event is signaled), estimated
per simulated attack
- Specificity is more problematic: false alarms can
only be estimated from the real data
- 1 - Specificity = Pr(signal in real data),
estimated per day
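A sketch of how such a curve could be assembled from the two separate sources: one score per simulated attack for sensitivity, and one score per day of real data for the false positive rate. The score scale (bigger means more unusual, e.g. a recurrence interval) and the numbers are assumptions for illustration.

```python
def roc_points(attack_scores, background_scores, thresholds):
    """One (false positive rate, sensitivity) point per decision threshold.
    Sensitivity is per simulated attack; false positive rate is per real-data day."""
    points = []
    for thr in thresholds:
        sensitivity = sum(s >= thr for s in attack_scores) / len(attack_scores)
        false_pos_rate = sum(s >= thr for s in background_scores) / len(background_scores)
        points.append((false_pos_rate, sensitivity))
    return sorted(points)

def area_under_curve(points):
    """Trapezoid rule, after anchoring the curve at (0, 0) and (1, 1)."""
    pts = [(0.0, 0.0)] + sorted(points) + [(1.0, 1.0)]
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

attack_scores = [400, 90, 35, 12]            # made-up best score per simulated attack
background_scores = [3, 1, 8, 2, 40, 1, 5]   # made-up daily scores from real data
auc = area_under_curve(roc_points(attack_scores, background_scores, [1, 10, 30, 100, 500]))
print(auc)
```

The two lists have different lengths and different units (attacks versus days), which is exactly the wrinkle noted above.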
81Evaluation wrinkles
- Thus 1) specificity and sensitivity are estimated
with unequal sample sizes, 2) linkage of
specificity to sensitivity is lost, 3) different
denominators
- In addition, assuming any signals in the real
data to be false positives means there can be no
events in the real data: not unreasonable for
anthrax attacks, but important to bear in mind
for less catastrophic events.
82Example ROC
83ROC Generalization
- The mean proportion of time saved also changes
with the threshold, and this is a key
consideration in practice.
- One simple idea to incorporate the timeliness is
to simply weight the sensitivity by the mean
proportion of time saved at each threshold,
meaning to plot the product of the sensitivity
times the mean time saved, and find the area
under that.
84Proportion time saved
- Assume a reference signal would find the outbreak
on day tr
- Assume our signal detection tool finds it on day
td
- Proportion time saved is (tr - td)/tr
- Values closer to 1 mean we did a better job.
85ROC Generalization
- The mean proportion of time saved also changes
with the threshold, and this is a key
consideration in practice.
- One simple idea to incorporate the timeliness is
to simply weight the sensitivity by the mean
proportion of time saved at each threshold,
meaning to plot the product of the sensitivity
times the mean time saved, and find the area
under that.
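At a single threshold, the weighted quantity could be computed as below. Averaging the time saved over detected attacks only is an assumption of this sketch, and the detection days are invented.

```python
def proportion_time_saved(t_ref: float, t_det: float) -> float:
    """(tr - td) / tr: 1 means detected immediately, 0 means no earlier than the reference."""
    return (t_ref - t_det) / t_ref

def weighted_sensitivity(detection_days, t_ref):
    """Sensitivity times mean proportion of time saved, at one decision threshold.
    detection_days has one entry per simulated attack; None means never signaled."""
    detected = [d for d in detection_days if d is not None]
    if not detected:
        return 0.0
    sensitivity = len(detected) / len(detection_days)
    mean_saved = sum(proportion_time_saved(t_ref, d) for d in detected) / len(detected)
    return sensitivity * mean_saved

# Four simulated attacks, reference detection on day 8 (all values made up).
print(weighted_sensitivity([2, 4, None, None], t_ref=8))  # 0.5 * 0.625 = 0.3125
```

Repeating this at each threshold and plotting it against the false positive rate gives the weighted curve whose area is described above.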
86Weighted ROC
87ROC Generalization 2
- A more complicated notion is to recalculate the
usual ROC curve for a given proportion time
saved. That is, redefine a hit as a signal that
hits and saves at least X percent of the time.
This can be done across the range of proportion
time saved, just as with the ROC curve itself.
- Then the sensitivity could be the third dimension
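Redefining a hit this way only changes the sensitivity calculation. A sketch at one decision threshold, sweeping the required proportion of time saved; the detection days and reference day are invented for illustration:

```python
def sensitivity_with_min_saved(detection_days, t_ref, min_saved):
    """Share of simulated attacks signaled early enough to save at least
    `min_saved` of the time relative to reference day t_ref. None = never signaled."""
    hits = [d for d in detection_days
            if d is not None and (t_ref - d) / t_ref >= min_saved]
    return len(hits) / len(detection_days)

detection_days = [2, 4, None, None]   # made-up detection days for four attacks
by_requirement = {x: sensitivity_with_min_saved(detection_days, 8, x)
                  for x in (0.0, 0.25, 0.5, 0.75, 1.0)}
print(by_requirement)
```

Doing this for every threshold gives one ROC curve per required proportion of time saved.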
88Repeated ROC
89Results (same data as matrix)
Urban area, type A pattern, Pr(ill per spore) =
10^-8; same as red-number histograms
90Future work
- Analysis methods
- For regression: Negative Binomial, polychotomous
regression
- Incorporating information about variability of
estimated p_it
- Adjusting scan methods to allow for uninteresting
clustering
- Methods to incorporate multiple correlated
streams of surveillance data
91Future work
- Evaluation
- Methods to compare signal generation techniques
for real data
- Software development to enable quicker and
broader comparisons of signal detection
techniques
- Simulations based on more realistic outbreaks
than anthrax
92Discussion
- Lots of cool stuff going on with this kind of
data, plus a feeling of directly helping security
and preparedness
- All areas I discussed today are wide open
- Cluster/outbreak detection
- Evaluation of real data
- Evaluation metrics