Title: Conceptual approaches to Aberration Detection
1. The Tradeoffs Driving Policy and Research Decisions in Biosurveillance
Howard Burkom, James Edgerton, Wayne Loschen, Zara Mnatsakanyan, Sheri Lewis, Joe Lombardo
The Johns Hopkins University Applied Physics Laboratory
International Society for Disease Surveillance 2007 Annual Conference
Track 4: Approaches and Tools for Evaluation
Indianapolis, Indiana, October 11, 2007
2. Objective
- The purpose of this effort is to show how the goals and capabilities of health monitoring institutions need to shape the selection, design, and usage of tools for automated disease surveillance systems
- Design decisions faced by a public health monitoring operation
- What are we trying to detect, and on what scale?
- information sources to be used
- aggregation of data in space and time
- the filtering of data records for required sensitivity, and
- the design of content delivery for users
3. CDC Evaluation Framework of Buehler, Sosin et al.
- Data Quality
- Timeliness
- Validity
- Sensitivity, Specificity, etc.
- System Usefulness
4. Three characteristic tradeoffs
- How much to monitor for exceptional vs customary health threats
- Scenario-based vs general monitoring
- e.g. seasonal influenza: signal or background?
- Level of aggregation of the monitoring
- Spatial: state, local, facility level?
- Temporal: by month, day, week, hour?
- Syndromic: how broadly syndromic vs how restrictive the filtering of data records
- Degree and role of information technology to be used
5. (1) Scenario-Based vs General Anomaly Detection
- General Public Health Anomaly Detection
- Simpler control charts
- Applied in current operational systems
- Seasonal epidemics as well as unusual PH threats
- More syndromic
- Leaves to human monitor
- formation of outbreak hypotheses
- fusion with laboratory, radiological results,
other evidence
- Scenario-Based
- Biological attack, unusual natural threats such as pandemic influenza
- Focus on unusual
- Diagnoses
- Age distributions
- Seasonal, other patterns
- Filter out usual seasonal events
- Emphasis on individual-based behavior modeling
(agent-based approaches)
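The "simpler control charts" used for general anomaly detection can be illustrated with a minimal sketch. The slides do not name a specific chart, so the following assumes a moving-baseline z-score detector. The function name, the 28-day baseline, the 2-day guard band, and the threshold of 3 are illustrative assumptions, not values from the presentation.

```python
from statistics import mean, stdev


def control_chart_alerts(counts, baseline=28, guard=2, threshold=3.0):
    """Return indices of days whose count exceeds mean + threshold * sd
    of a sliding baseline window (all parameters are assumed values)."""
    alerts = []
    for t in range(baseline + guard, len(counts)):
        # The baseline ends `guard` days before the test day so that the
        # first days of an outbreak do not inflate their own baseline.
        window = counts[t - guard - baseline : t - guard]
        mu = mean(window)
        sigma = max(stdev(window), 0.5)  # floor guards against near-zero sd
        z = (counts[t] - mu) / sigma
        if z > threshold:
            alerts.append(t)
    return alerts


# Usage: synthetic background near 10 visits/day with a spike on the last day.
series = [10, 11, 9, 10, 12, 8, 10] * 5 + [10, 25]
print(control_chart_alerts(series))
```

A chart of this kind makes no assumption about the cause of an excess, which is why the formation of outbreak hypotheses is left to the human monitor.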
6. (2) Tradeoffs in Aggregation: Syndromic Classification of Data Records
More Diagnostic
- Narrowly defined syndromes, subsyndromes
- Reportable diseases
- Sparse time series
- Higher specificity
More Syndromic
- Broad sensitivity to large-scale events
- Richer time series, more amenable to modeling
- High background alert rates
Approaches
- Parallel systems for general and reportable threats?
- Use flexible filtering, analysis, visualization to enable both?
7. (3) Tradeoffs in Use of Information Technology
- Roles of automation
- Collecting, cleaning, organizing data
- Widely accepted as necessary for disease surveillance
- Visualization: preselected vs customizable views
- Analysis decision support
- In practice, the use of statistical alerts is mixed
- Investigation/response decision support
- Focus of academic research, but little use published in public health surveillance
- Epidemiological goals and constraints affect tradeoffs in
- thin client vs thick client applications
- Alert lists vs summary views
- Sharing data vs sharing derived information
- Layers of privilege
8. Bayes Net: Combining Epi Knowledge and Data Analysis for Increased Specificity, Automated Decision Support
- Challenges
- Validation
- Epi Acceptance
9. EXAMPLE: Appropriate Application of Spatial Cluster Detection Methods for Surveillance Objectives
10. Outpatient Clinic Data: Respiratory Syndrome Visits
Figure: visit counts over 3 years (1/1/2004 to 1/22/2006), with the 2005-2006 flu season marked
Panels: statewide, large facility, small facility
11. Histograms of Median Daily Counts: Respiratory Syndrome
Data dominated by about 6 treatment facilities
Almost 90% of 1233 zones (1099) with very sparse visit counts
12. Signal Injection Methodology
- Run alerting algorithm on $M$ days of background data
- Estimate of the false alarm rate per day at threshold $\alpha$ is then $P_{FA}(\alpha) = M_F(\alpha)/M$, where $M_F(\alpha)$ = number of days with p-value $< \alpha$
- Add signals representing the effects of an outbreak into the background data (representing attributable cases)
- Conduct $N$ trials, with exact signal start and shape different for each trial
- Estimate of sensitivity at threshold $\alpha$ is then $P_D(\alpha) = N_D(\alpha)/N$, where $N_D(\alpha)$ = number of trials where p-value $< \alpha$ on inject days
- Shape of data epicurve for entire region: a random draw from lognormal distribution for each attributable case
- Spatial distribution of injected cases: exponential decay from center region (site of exposure)
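The temporal part of the injection procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The detector in `p_values` is a hypothetical stand-in (a normal upper-tail p-value against a 28-day baseline). The lognormal parameters, the outbreak size, and the synthetic background are invented for the example. The spatial exponential-decay step is omitted.

```python
import math
import random
from statistics import mean, stdev


def p_values(counts, baseline=28):
    """Hypothetical detector: upper-tail normal p-value of each day's count
    against the mean and sd of the preceding `baseline` days."""
    pvals = [1.0] * len(counts)
    for t in range(baseline, len(counts)):
        window = counts[t - baseline : t]
        sigma = max(stdev(window), 0.5)
        z = (counts[t] - mean(window)) / sigma
        pvals[t] = 0.5 * math.erfc(z / math.sqrt(2))
    return pvals


def false_alarm_rate(background, alpha, baseline=28):
    """False alarm rate per day: days with p < alpha divided by M,
    where M is the number of background days actually tested."""
    tested = p_values(background, baseline)[baseline:]
    return sum(p < alpha for p in tested) / len(tested)


def inject(background, start, n_cases, rng, mu=1.0, sigma=0.5, max_len=10):
    """Add n_cases attributable cases; each case's day offset from `start`
    is a lognormal draw (mu, sigma, max_len are assumed values)."""
    counts = list(background)
    inject_days = set()
    for _ in range(n_cases):
        offset = int(rng.lognormvariate(mu, sigma))
        while offset >= max_len:  # redraw rare draws beyond the outbreak window
            offset = int(rng.lognormvariate(mu, sigma))
        day = start + offset
        if day < len(counts):
            counts[day] += 1
            inject_days.add(day)
    return counts, inject_days


def detection_probability(background, starts, n_cases, alpha, rng):
    """Sensitivity: fraction of the N trials with p < alpha on an inject day."""
    detected = 0
    for start in starts:
        counts, days = inject(background, start, n_cases, rng)
        pvals = p_values(counts)
        if any(pvals[d] < alpha for d in days):
            detected += 1
    return detected / len(starts)


# Usage with synthetic data standing in for real visit counts.
rng = random.Random(0)
background = [rng.randint(8, 12) for _ in range(400)]
starts = list(range(40, 360, 8))
p_fa = false_alarm_rate(background, alpha=0.01)
p_d = detection_probability(background, starts, n_cases=30, alpha=0.01, rng=rng)
print(f"P_FA = {p_fa:.3f}, P_D = {p_d:.2f}")
```

Sweeping `alpha` over a range of thresholds and plotting the resulting pairs gives the detection probability vs alert rate curves examined in the study.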
13. Study Procedure
- Applied 100 repeated trials to visit count data for the respiratory syndrome
- Used 3 years of data, with each trial adding simulated attributable counts
- for 7-10-day outbreak at a different start date
- used trial start date = previous start date + 8 days, to vary day of week effect
- Examine detection probabilities vs alert rates for
- Detection using time series only: state, facility, residence zip code levels
- Spatial cluster detection: facility level (32 zip codes), residence zip code level (1233 zip codes)
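The 8-day spacing of trial start dates can be illustrated with a short sketch. The function name and the first start date below are placeholders chosen for the example, not values from the study.

```python
from datetime import date, timedelta


def trial_start_dates(first_start, n_trials=100, step_days=8):
    """Start date of each trial: the previous start date plus step_days."""
    return [first_start + timedelta(days=step_days * i) for i in range(n_trials)]


# Placeholder first start date (an assumption for illustration only).
starts = trial_start_dates(date(2004, 2, 1))
weekdays = [d.weekday() for d in starts]  # Monday = 0 ... Sunday = 6
print(weekdays[:8])
```

Because 8 days is one week plus one day, each successive trial begins one weekday later than the last, so every 7 consecutive trials cover all days of the week and the day-of-week effect is averaged over the trials.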
14. Results: Spatial Signal Detection at Home vs Facility Aggregation Level
Processing at the home zip code level (1233 codes) does give greater sensitivity than at the facility level (32 locations), but the empirical background alert rate is higher. Is this exploitable, acceptable for public health practice?
15. Results: Temporal Algorithm Detection Performance
Multiple testing problems may be severe at facility or local levels. Does distributed investigation capability exist?
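The scale of the multiple testing problem can be illustrated with a small calculation. The sketch below assumes that the streams are tested independently at the same per-stream threshold, which is a simplification, and the threshold of 0.01 is a hypothetical value. The Bonferroni adjustment shown is one conservative option, not a method named in the slides.

```python
def prob_any_false_alert(n_streams, alpha):
    """Probability of at least one false alert per day across n_streams
    independent tests, each run at per-test threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_streams


def expected_false_alerts(n_streams, alpha):
    """Expected number of false alerts per day across all streams."""
    return n_streams * alpha


def bonferroni_alpha(n_streams, family_alpha):
    """Per-stream threshold that caps the family-wise rate at family_alpha."""
    return family_alpha / n_streams


# 1 statewide series, 32 facilities, 1233 residence zip codes.
for n in (1, 32, 1233):
    print(n, round(prob_any_false_alert(n, 0.01), 3),
          round(expected_false_alerts(n, 0.01), 2))
```

Tightening the per-stream threshold controls the alert burden at the cost of sensitivity in each stream, which is the tradeoff the investigation capacity question addresses.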
16. Interpretation of Results for Surveillance Utility
- How small an outbreak is to be detected?
- Constraints
- How much sensitivity is required, or how high an alert rate is acceptable?
- Are investigation and response capabilities distributed to respond to many potential localized problems?
- Is information available to check for linkage of cases causing alerts?
- Requirements must be applied to each data source.
17. Conclusions
- The objectives of a surveillance capability should determine the basic tradeoff decisions of aggregation, target public health threat types, and the role of information technology.
- These tradeoff decisions should then be used to decide
- Data source selection and filtering
- Choice of analytic methods
- Visualization techniques
- Resultant investigation and response protocols
- Well-defined public health surveillance objectives, given at the level of scenarios and required detection performance, should drive research initiatives, as these tradeoff positions dictate appropriate system technologies to enable the stated goals
18. References
- Buehler J.W., Hopkins R.S., Overhage J.M., Sosin D.M., Tong V., Framework for Evaluating Public Health Surveillance Systems for Early Detection of Outbreaks, http://www.cdc.gov/mmwR/preview/mmwrhtml/rr5305a1.htm
- Burkom HS, Murphy SP, Coberly JS, Hurt-Mullen KJ, Public Health Monitoring Tools for Multiple Data Streams (2005), MMWR 54(Suppl)
- Reis BY, Mandl KD. Integrating syndromic surveillance data across multiple locations: effects on outbreak detection performance. Proc AMIA Symp. 2003: 549-53.
- Grigoryan VV, Wagner MM, Waller K, Wallstrom GL, Hogan WR., The Effect of Spatial Granularity of Data on Reference Dates for Influenza Outbreaks, RODS Technical Report, http://rods.health.pitt.edu/LIBRARY/2005%20AMIA-Grigoryan-Reference%20dates%20for%20flu-submitted.pdf
- Marshall C, Best N, Bottle A, and Aylin P, Statistical Issues in Prospective Monitoring of Health Outcomes Across Multiple Units, J. Royal Statist. Soc. A (2004), 167 Pt. 3, pp. 541-559.
19. BACKUPS
20.
- Summary Alert List
- Aggregated geography
- Summary detectors use many different streams to determine concern level.
- Utilizes both statistical detection results and user concern level.
- Can visually see pattern of alerts across geographies, syndromes, days.
- Region/Syndrome Alert List
- Health district / county based geography.
- Temporal algorithms look at a single data stream.
- Page only shows mathematical detection results.
- Can sort, filter, and copy alert table in order to monitor specific geographies of interest.
21. Maximum Likelihood Lognormal Epidemic Curves Used for Random Signal Draws
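A maximum likelihood lognormal fit has a closed form: the mean and the (1/n) standard deviation of the log-transformed times. The sketch below illustrates that fit and a random epicurve draw from it. The function names and the onset times are hypothetical, since the data behind the fitted curves are not given in the slides.

```python
import math
import random


def fit_lognormal_mle(times):
    """Maximum likelihood estimates (mu, sigma) of a lognormal distribution
    from positive times, e.g. days from exposure to visit."""
    logs = [math.log(t) for t in times]
    mu = sum(logs) / len(logs)
    # The MLE of the variance divides by n, not n - 1.
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def draw_epicurve(mu, sigma, n_cases, n_days, rng):
    """Daily attributable counts: one lognormal draw per case, binned by day.
    Cases falling beyond n_days are dropped."""
    curve = [0] * n_days
    for _ in range(n_cases):
        day = int(rng.lognormvariate(mu, sigma))
        if day < n_days:
            curve[day] += 1
    return curve


# Hypothetical onset times in days (invented for illustration).
onset_times = [1.5, 2.0, 2.5, 3.0, 3.0, 4.0, 5.5, 7.0]
mu, sigma = fit_lognormal_mle(onset_times)
curve = draw_epicurve(mu, sigma, n_cases=30, n_days=10, rng=random.Random(0))
print(round(mu, 3), round(sigma, 3), curve)
```

Because each trial draws its own curve, the signal shape differs from trial to trial while remaining consistent with the fitted distribution.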
22. Outpatient Clinic Data: Rash Syndrome Visits
Figure: visit counts over 3 years (1/1/2004 to 1/22/2006), with the 2005-2006 flu season marked
Panels: statewide, large facility, small facility
23. Visualizing the Spatial Resolution Tradeoff
- If monitoring at the residence zip code level, can better detect clusters with small case counts
- Increased burden on statistics as well as information technology
- Decision should be driven by public health objective (other tradeoffs!)
24. Importance of Distance Relationships in Utility of Spatial Data
Data-dense regions with numerous zip codes possible in a compact cluster
Figure: distances in kilometers