Practical Aspects of Alerting Algorithms in Biosurveillance - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Practical Aspects of Alerting Algorithms in Biosurveillance


1
Practical Aspects of Alerting Algorithms in
Biosurveillance
  • Howard S. Burkom
  • The Johns Hopkins University Applied Physics
    Laboratory
  • National Security Technology Department
  • Biosurveillance Information Exchange Working
    Group
  • DIMACS Program/Rutgers University
  • Piscataway, NJ February 22, 2006

2
Outline
  • What information do temporal alerting algorithms
    give the health monitor?
  • How can typical data issues introduce bias or
    other misinformation?
  • How do spatial scan statistics and other
    spatiotemporal methods give the monitor a
    different look at the data?
  • What data issues are important for the quality of
    this information?

3
Conceptual approaches to Aberration Detection
  • What does aberration mean? Different approaches
    for a single data source
  • Process control-based: the underlying data
    distribution has changed (many measures)
  • Model-based: the data do not fit an analytical
    model based on a historical baseline (many
    models)
  • Can combine these approaches
  • Spatiotemporal approach: the relationship of
    local data to neighboring data differs from
    expectations based on model or recent history

4
Comparing Alerting Algorithms: Criteria
  • Sensitivity
  • Probability of detecting an outbreak signal
  • Depends on effect of outbreak in data
  • Specificity (1 - false alert rate)
  • Probability(no alert | no outbreak)
  • May be difficult to prove no outbreak exists
  • Timeliness
  • Once the effects of an outbreak appear in the
    data, how soon is an alert expected?
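
The three criteria above can be estimated from an evaluation run in which the outbreak intervals are known (for example, injected cases). The following is a minimal sketch under stated assumptions: the function name `evaluate_alerts`, the per-day definition of specificity, and the per-outbreak definition of sensitivity are illustrative choices, not taken from the presentation.

```python
def evaluate_alerts(alerts, outbreaks):
    """Summarize alert performance against known outbreak intervals.

    alerts    -- list of booleans, one per day
    outbreaks -- list of (start, end) day-index pairs, end inclusive
    """
    outbreak_days = set()
    for start, end in outbreaks:
        outbreak_days.update(range(start, end + 1))

    # Sensitivity: fraction of outbreaks with at least one alert inside them.
    delays = []
    for start, end in outbreaks:
        hits = [d for d in range(start, end + 1) if alerts[d]]
        if hits:
            delays.append(hits[0] - start)
    sensitivity = len(delays) / len(outbreaks) if outbreaks else None

    # Specificity: fraction of non-outbreak days without an alert,
    # i.e. 1 - (false alert rate per day).
    quiet_days = [d for d in range(len(alerts)) if d not in outbreak_days]
    false_alerts = sum(1 for d in quiet_days if alerts[d])
    specificity = 1 - false_alerts / len(quiet_days) if quiet_days else None

    # Timeliness: mean days from outbreak start to first alert,
    # averaged over detected outbreaks only.
    timeliness = sum(delays) / len(delays) if delays else None
    return sensitivity, specificity, timeliness


alerts = [False] * 20
alerts[4] = True    # alert on a day with no outbreak
alerts[12] = True   # alert two days into the first outbreak
print(evaluate_alerts(alerts, [(10, 14), (17, 19)]))
```

Averaging timeliness only over detected outbreaks rewards an algorithm that misses slow events entirely, so it is usually read together with sensitivity rather than on its own.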

5
Aggregating Data in Time
6
Elements of an Alerting Algorithm
  • Values to be tested: raw data, or residuals from
    a model?
  • Baseline period
  • Historical data used to determine expected data
    behavior
  • Fixed or a sliding window?
  • Outlier removal to avoid training on
    unrepresentative data
  • What does algorithm do when there is all zero/no
    baseline data?
  • Is a warmup period of data history required?
  • Buffer period (or guardband)
  • Separation between the baseline period and
    interval to be tested
  • Test period
  • Interval of current data to be tested
  • Reset criterion
  • to prevent flooding by persistent alerts caused
    by extreme values
  • Test statistic: value computed to make alerting
    decisions
  • Threshold: alert issued if test statistic exceeds
    this value
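
The elements listed above can be put together in a few lines. This is a minimal sketch, not an algorithm from the presentation: it assumes a z-score test statistic, and the function name `sliding_zscore_alerts`, the 28-day baseline, the 2-day guardband, the threshold of 3, and the standard-deviation floor `min_sd` are all illustrative choices. Outlier removal and a reset criterion are left out for brevity.

```python
from statistics import mean, stdev


def sliding_zscore_alerts(counts, baseline=28, guardband=2, threshold=3.0,
                          min_sd=0.5):
    """Test each day against a sliding baseline that ends `guardband`
    days before it. Returns a list of (day, statistic, alert)."""
    results = []
    warmup = baseline + guardband   # history required before testing starts
    for t in range(warmup, len(counts)):
        # Baseline period: `baseline` days, separated from the test day
        # by the buffer period.
        window = counts[t - warmup : t - guardband]
        mu = mean(window)
        # Floor the standard deviation so that an all-zero or constant
        # baseline does not cause division by zero.
        sd = max(stdev(window), min_sd)
        stat = (counts[t] - mu) / sd          # test statistic
        results.append((t, stat, stat > threshold))
    return results


counts = [10, 12, 9, 11] * 8 + [11, 10, 25]
for day, stat, alert in sliding_zscore_alerts(counts):
    print(day, round(stat, 2), alert)
```

The floor on the standard deviation is one possible answer to the all-zero baseline question; another is to suppress testing until enough nonzero history exists.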

7
Rash Syndrome Grouping of Diagnosis Codes
www.bt.cdc.gov/surveillance/syndromedef/word/syndromedefinitions.doc
8
Example: Daily Counts with Injected Cases
Injected Cases Presumed Attributable to Outbreak
Event
9
Example: Algorithm Alerts Indicated
10
EWMA Monitoring
  • Exponentially Weighted Moving Average
  • Average with most weight on recent $X_k$
  • $S_k = w S_{k-1} + (1 - w) X_k$,
  • where $0 < w < 1$
  • Test statistic
  • $S_k$ compared to expectation from sliding
    baseline
  • Basic idea: monitor
  • $(S_k - \mu_k) / \sigma_k$
  • Added sensitivity for gradual events
  • With w weighting $S_{k-1}$ as written, larger w
    means more smoothing; smaller w tracks recent
    data more closely
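
A minimal sketch of the EWMA monitor follows. It uses the recursion with the weight w on the previous smoothed value, S_k = w S_{k-1} + (1 - w) X_k, and compares S_k to the mean of a sliding baseline. Several details are assumptions rather than content from the slides: the function name `ewma_alerts`, the default w = 0.6, the 28-day baseline with a 2-day guardband, the threshold of 3, and the use of the raw-data standard deviation (floored at `min_sd`) as the scaling term.

```python
from statistics import mean, stdev


def ewma_alerts(x, w=0.6, baseline=28, guardband=2, threshold=3.0,
                min_sd=0.5):
    """EWMA detection statistic with S_k = w*S_{k-1} + (1-w)*X_k.

    Returns a list of (day, statistic, alert) for days past the warmup."""
    results = []
    s = x[0]                          # initialize the smoothed value
    for k in range(1, len(x)):
        s = w * s + (1 - w) * x[k]    # update the smoothed value every day
        start = k - guardband - baseline
        if start < 0:
            continue                  # still in the warmup period
        window = x[start : k - guardband]
        mu = mean(window)
        sd = max(stdev(window), min_sd)   # assumed scaling, floored
        stat = (s - mu) / sd
        results.append((k, stat, stat > threshold))
    return results


# A gradual ramp: no single day is extreme on its first step,
# but the smoothed value drifts away from the baseline.
series = [10] * 30 + [12, 14, 16, 18]
for day, stat, alert in ewma_alerts(series):
    print(day, round(stat, 2), alert)
```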

11
Example with Detection Statistic Plot
12
Example: EWMA Applied to Rash Data
13
Effects of Data Problems
14
Importance of spatial data for biosurveillance
  • Purely temporal methods can find anomalies, IF
    you know which case counts to monitor
  • Location of outbreak?
  • Extent?
  • Advantages of spatial clustering
  • Tracking progression of outbreak
  • Identifying population at risk

15
Evaluating Candidate Clusters
Surveillance Region
Candidate cluster: the scan statistic measures
how unlikely the number of cases inside is
relative to the number outside, given the
expected spatial distribution of cases. (Thus, a
populous region won't necessarily flag.)
16
Selecting Candidate Clusters
17
Searching for Spatial Clustering
  • Form cylinders: bases are circles about each
    centroid in region A; height is time
  • Calculate the statistic for the event count in
    each cylinder relative to the entire region,
    within space and time limits
  • Most significant clusters: regions whose
    centroids form the base of the cylinder with the
    maximum statistic
  • But how unusual is it? Repeat the procedure with
    Monte Carlo runs and compare the max statistic to
    the maximum from each run
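
The search steps above can be sketched in simplified form. This sketch makes several assumptions that are not from the slides: it is purely spatial (counts are taken as already summed over the test interval, so the cylinder height is collapsed), it uses a Kulldorff-style Poisson log-likelihood ratio as the statistic, circles are approximated by the k nearest centroids, and the function names, `max_size`, and the 199 Monte Carlo runs are illustrative. The `baseline` values must be positive.

```python
import math
import random


def llr(c, e, total):
    """Poisson log-likelihood ratio for a candidate cluster with c observed
    and e expected cases, out of `total` cases in the whole region."""
    if c <= e or e <= 0:
        return 0.0          # only excesses inside the cluster are of interest
    inside = c * math.log(c / e)
    outside = ((total - c) * math.log((total - c) / (total - e))
               if c < total else 0.0)
    return inside + outside


def candidate_clusters(centroids, max_size):
    """For each centroid, the k nearest regions for k = 1..max_size."""
    clusters = []
    for cx, cy in centroids:
        order = sorted(range(len(centroids)),
                       key=lambda i: math.hypot(centroids[i][0] - cx,
                                                centroids[i][1] - cy))
        for k in range(1, max_size + 1):
            clusters.append(tuple(order[:k]))
    return clusters


def max_statistic(counts, expected, clusters):
    """Return (maximum statistic, member regions of that cluster)."""
    total = sum(counts)
    best = (0.0, None)
    for members in clusters:
        c = sum(counts[i] for i in members)
        e = sum(expected[i] for i in members)
        stat = llr(c, e, total)
        if stat > best[0]:
            best = (stat, members)
    return best


def scan(centroids, counts, baseline, max_size=3, n_sim=199, seed=0):
    total = sum(counts)
    scale = total / sum(baseline)
    expected = [b * scale for b in baseline]   # expected counts sum to total
    clusters = candidate_clusters(centroids, max_size)
    observed_max, members = max_statistic(counts, expected, clusters)

    # Monte Carlo: redistribute the same number of cases according to the
    # expected distribution and record how often the simulated maximum
    # reaches the observed maximum.
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * len(counts)
        for i in rng.choices(range(len(counts)), weights=expected, k=total):
            sim[i] += 1
        if max_statistic(sim, expected, clusters)[0] >= observed_max:
            exceed += 1
    p_value = (exceed + 1) / (n_sim + 1)
    return members, observed_max, p_value


grid = [(x, y) for y in range(3) for x in range(3)]   # 9 region centroids
counts = [10, 10, 10, 10, 40, 10, 10, 10, 10]         # excess in region 4
print(scan(grid, counts, baseline=[100] * 9))
```

Because each simulated maximum is taken over the same set of candidate clusters as the observed one, the resulting p-value accounts for the many overlapping candidates tested.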

18
Scan Statistic Demo
19
Scan Statistics Advantages
  • Gives monitor guidance for cluster size,
    location, significance
  • Avoids preselection bias regarding cluster size
    or location
  • Significance testing has control for multiple
    testing
  • Can tailor problem design by data, objective
  • Location (zipcode, hospital/provider site,
    patient/customer residence, school/store address)
  • Time windows used (cases, history, guardband)
  • Background estimation method: model, history,
    population, eligible customers

20
Surveillance Application: OTC Anti-flu Sales,
Dates 15-24 Apr 2002
Total sales as of 25 Apr: 1804
Potential cluster center at 22311: 63 sales,
39 expected from recent data; relative risk = 1.6,
p = 0.041
21
Distribution of Nonsyndromic Visits: 4 San Diego
Hospitals
22
Effect of Data Discontinuities on OTC Cough/Cold
Clusters
[Figure axes: Days vs. Zip (S to N)]
  • Before removing problem zips, cluster groups
    are dominated by zips that turn on after
    sustained periods of zero or abnormally low
    counts.
  • After editing, more interesting cluster groups
    emerge.

23
School Nurse Data: All Visits
unreported
24
Cluster Investigation by Record Inspection
Records Corresponding to a Respiratory Cluster
25
Backups
26
Cumulative Summation Approach (CUSUM)
  • Widely adapted to disease surveillance
  • Devised for prompt detection of small shifts
  • Look for changes of 2k standard deviations from
    the mean $\mu$ (often $k = 0.5$)
  • Take normalized deviation: often
    $Z_j = (x_j - \mu) / \sigma$
  • Compare lower, upper sums to threshold h
  • $S_{H,j} = \max(0, (Z_j - k) + S_{H,j-1})$
  • $S_{L,j} = \max(0, (-Z_j - k) + S_{L,j-1})$
  • Phase I sets $\mu$, $\sigma$, h, k

Upper sum: keep adding differences between
today's count and k standard deviations above the
mean. Alert when the sum exceeds threshold h.
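
The upper and lower sums can be sketched as follows. This is a minimal illustration, assuming the mean and standard deviation have already been fixed in Phase I; the function name `cusum_alerts`, the default h = 4, and the reset-to-zero rule after an alert are assumptions, not values given in the presentation.

```python
def cusum_alerts(x, mu, sigma, k=0.5, h=4.0):
    """Two-sided CUSUM on normalized deviations Z = (x - mu) / sigma.

    Returns a list of (index, upper_sum, lower_sum, alert)."""
    s_hi = s_lo = 0.0
    results = []
    for j, value in enumerate(x):
        z = (value - mu) / sigma
        s_hi = max(0.0, (z - k) + s_hi)     # accumulates excess above mu + k*sigma
        s_lo = max(0.0, (-z - k) + s_lo)    # accumulates deficit below mu - k*sigma
        alert = s_hi > h or s_lo > h
        results.append((j, s_hi, s_lo, alert))
        if alert:
            s_hi = s_lo = 0.0   # hypothetical reset rule, not from the slides
    return results


# A small sustained shift: no single day is far from the mean at first,
# but the upper sum keeps growing.
for row in cusum_alerts([10, 10, 11, 12, 12, 13, 13], mu=10, sigma=1):
    print(row)
```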
27
CuSum Example: CDC EARS Methods C1-C3
  • Three adaptive methods chosen by National Center
    for Infectious Diseases after 9/1/2001 as most
    consistent
  • Look for aberrations representing increases, not
    decreases
  • Fixed mean, variance replaced by values from
    sliding baseline (usually 7 days)

Baseline for C1-MILD (-1 to -7 days)
Baseline for C2-MEDIUM (-3 to -9 days)
Baseline for C3-ULTRA (-3 to -9 days)
28
Calculation for C1-C3
  • Individual day statistic for day j with lag n:
  • $S_{j,n} = \max\{0, (\mathrm{Count}_j - [\mu_n + \sigma_n]) / \sigma_n\}$,
    where
  • $\mu_n$ is the 7-day average with n-day lag
  • (so $\mu_3$ is the mean of counts in days j-9
    through j-3), and
  • $\sigma_n$ is the standard deviation of the same
    7-day window
  • C1 statistic for day k is $S_{k,1}$ (no lag)
  • C2 statistic for day k is $S_{k,3}$ (2-day lag)
  • C3 statistic for day k is
    $S_{k,3} + S_{k-1,3} + S_{k-2,3}$,
  • where $S_{k-1,3}$, $S_{k-2,3}$ are added if they
    do not exceed the threshold
  • Upper bound threshold of 2
  • equivalent to 3 standard deviations above mean
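
The C1-C3 calculation can be sketched directly from the definitions above: the day statistic is max(0, (Count_j - (mu_n + sigma_n)) / sigma_n), with the 7-day baseline covering days j-n-6 through j-n. This is an illustrative sketch, not the CDC EARS implementation; the function names and the standard-deviation floor `min_sd` are assumptions.

```python
from statistics import mean, stdev


def s_stat(counts, j, n, min_sd=0.5):
    """Day statistic S_{j,n} with a 7-day baseline ending n days before j."""
    if j - n - 6 < 0:
        raise ValueError("not enough history for the 7-day baseline")
    window = counts[j - n - 6 : j - n + 1]   # days j-n-6 .. j-n
    mu = mean(window)
    sd = max(stdev(window), min_sd)          # floor is an assumption, avoids /0
    return max(0.0, (counts[j] - (mu + sd)) / sd)


def c1_c2_c3(counts, k, threshold=2.0):
    """Return the C1, C2, and C3 statistics for day k."""
    c1 = s_stat(counts, k, 1)     # baseline: days k-7 .. k-1
    c2 = s_stat(counts, k, 3)     # baseline: days k-9 .. k-3
    c3 = c2
    for prev in (k - 1, k - 2):
        s_prev = s_stat(counts, prev, 3)
        if s_prev <= threshold:   # earlier days count only if below threshold
            c3 += s_prev
    return c1, c2, c3


counts = [10, 12, 9, 11, 10, 12, 9, 11, 10, 12, 9, 11, 14, 15]
c1, c2, c3 = c1_c2_c3(counts, k=13)
print(round(c1, 2), round(c2, 2), round(c3, 2))
```

In this example the elevated count on day 12 falls inside the C1 baseline but is excluded from the C2 baseline by the lag, which is one reason the lagged methods can be more sensitive to an event that builds over several days.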

29
Detailed Example, I
Fewer alerts AND more sensitive: why?
30
Detailed Example, II
Signal Detected only with 28-day baseline
31
Detailed Example, III: the rest of the story