Practical Aspects of Alerting Algorithms in Biosurveillance - PowerPoint PPT Presentation

Slides: 32

Provided by: BURK8

Learn more at: http://archive.dimacs.rutgers.edu

Transcript and Presenter's Notes

Title: Practical Aspects of Alerting Algorithms in Biosurveillance

1
Practical Aspects of Alerting Algorithms in
Biosurveillance

Howard S. Burkom
The Johns Hopkins University Applied Physics
Laboratory
National Security Technology Department
Biosurveillance Information Exchange Working
Group
DIMACS Program/Rutgers University
Piscataway, NJ February 22, 2006

2
Outline

What information do temporal alerting algorithms
give the health monitor?
How can typical data issues introduce bias or
other misinformation?
How do spatial scan statistics and other
spatiotemporal methods give the monitor a
different look at the data?
What data issues are important for the quality of
this information?

3
Conceptual approaches to Aberration Detection

What does aberration mean? Different approaches for a single data source:
  Process control-based: The underlying data distribution has changed (many measures)
  Model-based: The data do not fit an analytical model based on a historical baseline (many models)
  Can combine these approaches
Spatiotemporal approach: The relationship of local data to neighboring data differs from expectations based on model or recent history

4
Comparing Alerting Algorithms: Criteria

Sensitivity
  Probability of detecting an outbreak signal
  Depends on effect of outbreak in data
Specificity (1 - false alert rate)
  Probability(no alert | no outbreak)
  May be difficult to prove no outbreak exists
Timeliness
  Once the effects of an outbreak appear in the data, how soon is an alert expected?
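
As a rough illustration of how these three criteria might be computed from a labeled evaluation run, here is a minimal Python sketch. The function name `evaluate_alerts`, the per-day boolean alert list, and the convention that an outbreak counts as detected if any alert falls inside its interval are all assumptions made for this example, not definitions from the presentation.

```python
from typing import List, Tuple

def evaluate_alerts(alerts: List[bool], outbreaks: List[Tuple[int, int]]):
    """Return (sensitivity, specificity, mean_delay_days).

    alerts[i] is True if the algorithm alerted on day i.
    outbreaks holds (start_day, end_day) inclusive intervals during which
    outbreak effects are assumed to be present in the data.
    """
    n_days = len(alerts)
    outbreak_day = [False] * n_days
    delays = []
    detected = 0
    for start, end in outbreaks:
        for d in range(start, end + 1):
            outbreak_day[d] = True
        # Timeliness: days from first outbreak effect to first alert.
        first = next((d for d in range(start, end + 1) if alerts[d]), None)
        if first is not None:
            detected += 1
            delays.append(first - start)

    sensitivity = detected / len(outbreaks) if outbreaks else float("nan")

    # Specificity = P(no alert | no outbreak) = 1 - false alert rate,
    # computed here per non-outbreak day.
    quiet_days = [d for d in range(n_days) if not outbreak_day[d]]
    false_alerts = sum(alerts[d] for d in quiet_days)
    specificity = (1 - false_alerts / len(quiet_days)
                   if quiet_days else float("nan"))

    mean_delay = sum(delays) / len(delays) if delays else None
    return sensitivity, specificity, mean_delay
```

In practice the set of "no outbreak" days is itself uncertain, as the slide notes, so a specificity computed this way is only as good as the labels.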

5
Aggregating Data in Time
6
Elements of an Alerting Algorithm

Values to be tested: raw data, or residuals from a model?
Baseline period
  Historical data used to determine expected data behavior
  Fixed or a sliding window?
  Outlier removal: to avoid training on unrepresentative data
  What does the algorithm do when there is all-zero/no baseline data?
  Is a warmup period of data history required?
Buffer period (or guardband)
  Separation between the baseline period and interval to be tested
Test period
  Interval of current data to be tested
Reset criterion
  To prevent flooding by persistent alerts caused by extreme values
Test statistic: value computed to make alerting decisions
Threshold: alert issued if test statistic exceeds this value
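
The elements above can be seen together in a minimal Python sketch of a one-day test against a sliding baseline. Everything specific here is an assumption for illustration: the function name `zscore_alert`, the 28-day baseline, the 2-day guardband, the threshold of 3, and the floor `min_sd` used when the baseline is all zero or constant. Outlier removal and a reset criterion are deliberately left out.

```python
import statistics
from typing import List, Optional

def zscore_alert(counts: List[float], day: int, baseline_len: int = 28,
                 guardband: int = 2, threshold: float = 3.0,
                 min_sd: float = 0.5) -> Optional[bool]:
    """Test a single day against a sliding baseline.

    Returns None while there is not enough history (warmup period),
    otherwise True if the test statistic exceeds the threshold.
    """
    base_end = day - guardband            # exclusive end of baseline window
    base_start = base_end - baseline_len
    if base_start < 0:
        return None                       # warmup: insufficient history
    baseline = counts[base_start:base_end]
    mean = statistics.fmean(baseline)
    # Floor the standard deviation so an all-zero or constant baseline
    # does not cause division by zero.
    sd = max(statistics.stdev(baseline), min_sd)
    test_statistic = (counts[day] - mean) / sd
    return test_statistic > threshold
```

The guardband keeps the two days just before the test day out of the baseline, so the early days of a slowly growing event are less likely to inflate the expected value.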

7
Rash Syndrome Grouping of Diagnosis Codes
www.bt.cdc.gov/surveillance/syndromedef/word/syndromedefinitions.doc
8
Example: Daily Counts with Injected Cases
Injected Cases Presumed Attributable to Outbreak Event
9
Example: Algorithm Alerts Indicated
10
EWMA Monitoring

Exponential Weighted Moving Average
Average with most weight on recent $X_k$:
$S_k = w S_{k-1} + (1 - w) X_k$, where $0 < w < 1$
Test statistic: $S_k$ compared to expectation from sliding baseline
Basic idea: monitor $(S_k - \mu_k) / \sigma_k$

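Added sensitivity for gradual events
Because w multiplies the previous average S_{k-1} in the recursion, smaller w means less smoothing (more weight on the current X_k)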
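
A minimal Python sketch of this monitoring scheme follows. It uses the recursion with w as the weight on the previous average, and it makes several assumptions that are not in the slides: the name `ewma_statistics`, starting the average at the first observation, a 28-day sliding baseline with a 2-day guardband, and a floor `min_sd` on the standard deviation.

```python
import statistics
from typing import List

def ewma_statistics(x: List[float], w: float = 0.6, baseline_len: int = 28,
                    guardband: int = 2, min_sd: float = 0.5) -> List[float]:
    """Return (S_k - mu_k) / sigma_k for each day k.

    S_k = w * S_{k-1} + (1 - w) * X_k, so w is the weight on the past.
    Days without a full baseline get float('nan').
    """
    if not 0 < w < 1:
        raise ValueError("w must satisfy 0 < w < 1")
    out = []
    s = x[0] if x else 0.0        # assumption: start the average at X_0
    for k, xk in enumerate(x):
        if k > 0:
            s = w * s + (1 - w) * xk
        start = k - guardband - baseline_len
        if start < 0:
            out.append(float("nan"))
            continue
        baseline = x[start:k - guardband]
        mu = statistics.fmean(baseline)
        sigma = max(statistics.stdev(baseline), min_sd)
        out.append((s - mu) / sigma)
    return out
```

This sketch divides by the standard deviation of the raw baseline counts. EWMA control charts often rescale that value to reflect the smaller variance of the smoothed series, so the thresholds suited to this version may differ from those of a textbook chart.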

11
Example with Detection Statistic Plot
12
Example: EWMA Applied to Rash Data
13
Effects of Data Problems
14
Importance of spatial data for biosurveillance

Purely temporal methods can find anomalies, IF you know which case counts to monitor
  Location of outbreak?
  Extent?
Advantages of spatial clustering
  Tracking progression of outbreak
  Identifying population at risk

15
Evaluating Candidate Clusters
Surveillance Region
Candidate cluster: The scan statistic gives a measure of how unlikely the number of cases inside is relative to the number outside, given the expected spatial distribution of cases. (Thus, a populous region won't necessarily flag.)
16
Selecting Candidate Clusters
17
Searching for Spatial Clustering

Form cylinders: bases are circles about each centroid in region A, height is time
Calculate statistic for event count in each cylinder relative to entire region, within space-time limits
Most significant clusters: regions whose centroids form base of cylinder with maximum statistic
But how unusual is it? Repeat procedure with Monte Carlo runs, compare max statistic to maxima of each of these
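
The search procedure can be sketched in Python as follows. To stay short, this is a purely spatial circular scan: the time dimension of the cylinders is omitted. The Poisson log likelihood ratio used as the statistic, the function names, the 50% cap on zone size, and the requirement that `expected` be scaled to sum to the total case count are assumptions of this sketch, not details taken from the slides.

```python
import math
import random

def log_lr(c, e, total):
    """Poisson log likelihood ratio for c observed vs e expected in a zone."""
    if c <= e or e <= 0 or e >= total:
        return 0.0                      # score only excesses inside the zone
    inside = c * math.log(c / e)
    outside = ((total - c) * math.log((total - c) / (total - e))
               if c < total else 0.0)
    return inside + outside

def best_cluster(cases, expected, coords, max_frac=0.5):
    """Return (max statistic, region indices) over circles about each centroid."""
    total = sum(cases)
    best = (0.0, [])
    for center in coords:
        # Grow a circle about this centroid, adding regions by distance.
        order = sorted(range(len(coords)),
                       key=lambda j: math.dist(center, coords[j]))
        c = e = 0.0
        zone = []
        for j in order:
            c += cases[j]
            e += expected[j]
            zone.append(j)
            if e > max_frac * total:
                break
            stat = log_lr(c, e, total)
            if stat > best[0]:
                best = (stat, list(zone))
    return best

def scan_p_value(cases, expected, coords, n_sim=199, seed=0):
    """Monte Carlo p-value: rank of observed maximum among simulated maxima.

    cases must be integer counts; expected should sum to sum(cases).
    """
    rng = random.Random(seed)
    total = sum(cases)
    observed, zone = best_cluster(cases, expected, coords)
    regions = range(len(cases))
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(cases)
        # Scatter the same number of cases by the expected distribution.
        for j in rng.choices(regions, weights=expected, k=total):
            sim[j] += 1
        if best_cluster(sim, expected, coords)[0] >= observed:
            exceed += 1
    return observed, zone, (exceed + 1) / (n_sim + 1)
```

Comparing the observed maximum with the maximum from each simulated data set, rather than with every simulated zone, is what gives the test its control for multiple testing over many candidate circles.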

18
Scan Statistic Demo
19
Scan Statistics Advantages

Gives monitor guidance for cluster size, location, significance
Avoids preselection bias regarding cluster size or location
Significance testing has control for multiple testing
Can tailor problem design by data, objective:
  Location (zipcode, hospital/provider site, patient/customer residence, school/store address)
  Time windows used (cases, history, guardband)
  Background estimation method: model, history, population, eligible customers

20
Surveillance Application: OTC Anti-flu Sales, Dates 15-24 Apr 2002
Total sales as of 25 Apr: 1804
Potential cluster center at 22311: 63 sales, 39 expected from recent data; relative risk = 1.6, p = 0.041
21
Distribution of Nonsyndromic Visits: 4 San Diego Hospitals
22
Effect of Data Discontinuities on OTC Cough/Cold Clusters
Figure axes: Days; Zip (S to N)

Before removing problem zips, cluster groups are dominated by zips that turn on after sustained periods of zero or abnormally low counts.
After editing, more interesting cluster groups emerge.
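
One simple way to screen for such problem zips before clustering is sketched below in Python. It is a heuristic written for this note, not the editing procedure used in the presentation: the function name `zips_turning_on`, the dictionary input, and the 14-day quiet period are assumptions, and it checks only for runs of zeros, not for abnormally low counts.

```python
from typing import Dict, List

def zips_turning_on(counts_by_zip: Dict[str, List[int]],
                    min_quiet_days: int = 14) -> List[str]:
    """Return zips whose daily series has a run of at least min_quiet_days
    zeros that is followed by a nonzero count."""
    flagged = []
    for zip_code, series in counts_by_zip.items():
        run = 0                      # length of the current run of zeros
        for count in series:
            if count == 0:
                run += 1
            else:
                if run >= min_quiet_days:
                    flagged.append(zip_code)
                    break
                run = 0
    return flagged
```

Flagged zips could then be reviewed or excluded before the scan is run, so that a data feed switching on is not reported as a cluster.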

23
School Nurse Data: All Visits
unreported
24
Cluster Investigation by Record Inspection
Records Corresponding to a Respiratory Cluster
25
Backups
26
Cumulative Summation Approach (CUSUM)

Widely adapted to disease surveillance
Devised for prompt detection of small shifts
Look for changes of $2k$ standard deviations from the mean $\mu$ (often $k = 0.5$)
Take normalized deviation, often $Z_t = (x_t - \mu) / \sigma$
Compare lower, upper sums to threshold $h$:
$S_{H,t} = \max(0, (Z_t - k) + S_{H,t-1})$
$S_{L,t} = \max(0, (-Z_t - k) + S_{L,t-1})$
Phase I sets $\mu$, $\sigma$, $h$, $k$

Upper sum: Keep adding differences between today's count and k standard deviations above the mean. Alert when the sum exceeds threshold h.
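
The two-sided recursion can be written as a short Python sketch. The function name `cusum`, the default threshold h = 4, and the choice to reset both sums to zero after an alert are assumptions for illustration; the mean and standard deviation are taken as already fixed in Phase I.

```python
from typing import List, Tuple

def cusum(x: List[float], mu: float, sigma: float,
          k: float = 0.5, h: float = 4.0) -> List[Tuple[float, float, bool]]:
    """Return (upper sum, lower sum, alert) for each observation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s_hi = s_lo = 0.0
    results = []
    for xt in x:
        z = (xt - mu) / sigma                 # normalized deviation Z_t
        s_hi = max(0.0, (z - k) + s_hi)       # accumulates increases
        s_lo = max(0.0, (-z - k) + s_lo)      # accumulates decreases
        alert = s_hi > h or s_lo > h
        results.append((s_hi, s_lo, alert))
        if alert:
            s_hi = s_lo = 0.0                 # assumed reset rule
    return results
```

With mu = 10, sigma = 2, and counts 10, 10, 12, 13, 14, 15, the upper sum grows 0, 0, 0.5, 1.5, 3.0, 5.0, so the alert comes on the sixth day even though no single count is more than 2.5 standard deviations above the mean.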
27
CuSum Example: CDC EARS Methods C1-C3

Three adaptive methods chosen by National Center
for Infectious Diseases after 9/1/2001 as most
consistent
Look for aberrations representing increases, not
decreases
Fixed mean, variance replaced by values from
sliding baseline (usually 7 days)

Baseline for C1-MILD (-1 to -7 days)
Baseline for C2-MEDIUM (-3 to -9 days)
Baseline for C3-ULTRA (-3 to -9 days)
28
Calculation for C1-C3

Individual day statistic for day $j$ with lag $n$:
$S_{j,n} = \max\{0, (\mathrm{Count}_j - [\mu_n + \sigma_n]) / \sigma_n\}$,
where $\mu_n$ is the 7-day average with $n$-day lag (so $\mu_3$ is the mean of counts on days $j-9$ through $j-3$),
and $\sigma_n$ is the standard deviation of the same 7-day window
C1 statistic for day $k$ is $S_{k,1}$ (no lag)
C2 statistic for day $k$ is $S_{k,3}$ (2-day lag)
C3 statistic for day $k$ is $S_{k,3} + S_{k-1,3} + S_{k-2,3}$, where $S_{k-1,3}$, $S_{k-2,3}$ are added if they do not exceed the threshold
Upper bound threshold of 2: equivalent to 3 standard deviations above mean
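
The calculation can be sketched in Python as below. This follows the slide's description (7-day baseline, lag 1 for C1, lag 3 for C2 and C3, threshold 2) and is not the EARS source code. The function names and the floor `min_sd`, which guards against a zero standard deviation, are assumptions added here.

```python
import statistics
from typing import List, Optional

def day_statistic(counts: List[float], j: int, lag: int,
                  min_sd: float = 0.5) -> Optional[float]:
    """S_{j,n} = max(0, (Count_j - (mu_n + sigma_n)) / sigma_n).

    The baseline is the 7 days from j-lag-6 through j-lag.
    Returns None if the history is too short.
    """
    start = j - lag - 6
    if start < 0:
        return None
    window = counts[start:j - lag + 1]
    mu = statistics.fmean(window)
    sigma = max(statistics.stdev(window), min_sd)   # assumed floor
    return max(0.0, (counts[j] - (mu + sigma)) / sigma)

def c1_c2_c3(counts: List[float], k: int, threshold: float = 2.0):
    """Return (C1, C2, C3) for day k, or None if history is too short."""
    if k - 2 - 3 - 6 < 0:            # day k-2 with lag 3 needs day k-11
        return None
    c1 = day_statistic(counts, k, lag=1)
    c2 = day_statistic(counts, k, lag=3)
    c3 = c2
    for prev in (k - 1, k - 2):
        s = day_statistic(counts, prev, lag=3)
        if s <= threshold:           # add earlier days only if below threshold
            c3 += s
    return c1, c2, c3
```

Each statistic is compared with the threshold of 2. Because one standard deviation is already subtracted in the numerator, a value of 2 corresponds to a count 3 standard deviations above the baseline mean.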

29
Detailed Example, I
Fewer alerts AND more sensitive: why?
30
Detailed Example, II
Signal Detected only with 28-day baseline
31
Detailed Example, III: the rest of the story
