Title: TUTORIAL T6
1TUTORIAL T6 Theory and Practice of Outbreak
Detection
II. DATA
Michael M. Wagner, MD, PhD
AMIA Annual Meeting
Saturday, November 8, 2003
8:00 - 4:30 pm
2The Basic Question: Which Data are Useful and
What Does it Take to Get Them?
[Figure: 1999 Influenza. Series plotted: influenza cultures, sentinel physicians, WebMD queries about cough etc., school absenteeism, sales of cough and cold meds, sales of cough syrup, ER respiratory complaints, ER viral complaints, influenza-related deaths. X-axis: Week (1999-2000).]
3Drawing from Two Reports and Several Papers
- To AHRQ about data availability (Wagner, Aryel, Dato. 2001; 188 pages), available at www.health.pitt.edu/rods
- To DARPA about data value (Wagner, Pavlin, Brillman, Stetson); expected completion date December 2003.
- Self-treatment/health seeking literature
- Published studies
See bibliography that will be on Web at www.health.pitt.edu/rods
4Biosurveillance Data Space
[Figure: biosurveillance data space. Region labels: EARLY DETECTION, LATER DETECTION, GOLD STANDARDS. Group labels: MEDICAL, NON TRADITIONAL, INTELLIGENCE, BIOSENSORS. Legend: Limited Utility, Some Potential, Promising. Data types shown: OTC Pharm, Test Results, Complaints, Diagnosis, Sentinel MD, Agribusiness, Poison Centers, Influenza isolates, Environmental, Investigations, Web Queries, Medical Examiner, Survey, Nurse Calls, Public Transport (bus), Wind Speed/direction, Cloud cover, ER Visits, Radiograph Reports.]
5Outline (and main points)
- Which data are required (and how do we know?)
- Discussion of primary surveillance data
- Availability
- Experimental results about value
- What supplementary surveillance data do we need
to collect? (e.g., spatial, temporal, census,
weather)
6Which data are required (and how do we know?)
7Overly Simple Answer
- It is already known: exactly the data public health departments collect at present.
- Answer: We just need to speed up the collection and processing of the data
Yes, but they sometimes miss outbreaks or detect them late
Yes, but maybe the data are inherently late. Plus there is still the problem of undetected outbreaks
Also, there may just be a better way. There may be highly useful data that they just do not have the infrastructure to collect
8More Complicated Answer
9More Complicated Answer
- Analysis of data routinely collected by public health
- First principles analysis of early detection
- Using CDC case definitions
- Analysis of data used in recognition and characterization of 57 recent outbreaks
- Review of the literature on health psychology, especially the sub-literature relevant to behaviors of ill individuals between the onset of symptoms and presentation (if ever) for medical care, and first principles analysis of various detection strategies.
10Identifying Needed Data from First Principles
Analysis
11Grouping Data by Time of Availability
- Pre-outbreak data
- data obtained during the period prior to the release of a biologic agent.
- E.g., intelligence or host factors such as vaccinations that determine susceptibility.
- Attack, release, or exposure data
- obtained at or very near the time of release.
- E.g., biosensor arrays, police reports of observed explosions, unauthorized airplane flights
- Pre-symptomatic data (incubation period data)
- between the time of release of an agent and the recognition of first symptoms in people.
- E.g., serology or cultures from pre-symptomatic individuals from enhanced screening
- Early symptom data
- period between the onset of symptoms and when the illness becomes more fully developed
- E.g., diarrheal or upper respiratory symptoms, sales of over-the-counter cold medicines
- Specific syndrome data
- data that either singly or in combination strongly suggest a specific agent.
- E.g., specific symptoms, vital signs, physical findings, laboratory results, radiology results
- Definitive data
- data that are sufficient on their own to conclude that a patient has a disease.
- E.g., microbiology culture or autopsy reports.
12Literature on Care-seeking Behavior and Health
Psychology
Zeng, Wagner et al. JAMIA 2002.
13Data Currently Collected by Public Health
Surveillance Systems (conventional surveillance
data)
- Reportable diseases
- Sentinel physicians
- Reports from astute clinicians
- Results of enhanced surveillance or contact
testing
14Data Used During Outbreak Investigations
- Data mentioned in CDC Case Definitions
- Data items from case investigation forms
- Data items mentioned in MMWR and other published
reports as being pivotal in the initial detection
15Universe of Data Elements
- Table A.1 in Appendix A lists all data elements
identified by our five methods. This list is
comprehensive and contains references to sources
for the data elements, where relevant. Table A.2
contains the data elements used in actual
outbreak detection and in the CDC Case
Definitions classified by type of outbreak (water
borne, food borne etc.).
16Table 2.4. Data and Data Systems for Early Detection
- PREOUTBREAK
- Environmental conditions favorable for outbreaks: vegetation, climate, sea surface temperature, cloudiness, rainfall
- Information about susceptibility of population (host): immunization information
- Information about outbreaks in other regions: outbreak report from WHO; emergence of new infections in other regions; keywords in the Internet and electronic reports related to or indicative of outbreaks in other areas
- Information about pathogens and their occurrence in the environment: antimicrobial resistance patterns; routine testing of food and water supplies; monitoring temporal and geographic patterns of specific viruses in order to track conditions likely to cause future outbreaks (e.g., looking for serotypes of influenza likely to spread during the next year)
- Information about animals: avian morbidity and mortality; captive or free-ranging sentinel animals; monitoring diseases that animals can transmit to humans, including animals which will eventually be distributed to consumers in the form of pets or food
- Data systems: satellite systems, meteorological data systems, immunization registries, public health systems, veterinary systems, food and water monitoring, information retrieval systems
17Outline (and main points)
- ✓ Which data are required (and how do we know?)
- Discussion of primary surveillance data
- Availability
- Experimental results about value
- What supplementary surveillance data do we need
to collect? (e.g., spatial, temporal, census,
weather)
18Availability
- Easier to know
- Methods: phone interviews with industries (hospitals, 911 services, pharmacies, schools)
19Value and Relative Importance
- Much, much harder to know
- Methods
- Observational studies of real outbreaks
- Studies of what individuals do when sick with different diseases (what they buy, who they call)
20Overview of Primary Surveillance Data
21Clinical Data (availability, what is known about
value)
22- Clinical data are highly relevant to public
health surveillance. Clinicians and health
systems are a primary point of data collection
about the sick, including data about
demographics, risk factors, symptoms, signs,
special testing, and diagnoses.
23Where are the Clinical Data? Types of Clinical
Data Systems
- Paper charts
- HL7 Message Routers
- Registration, Scheduling, and Billing Systems
- Clinical Laboratory Systems
- Radiology Systems
- Pathology
- Dictation
- Pharmacy
- Orders
- Data Warehouses
- Clinical Event Monitors
- Point-of-Care Systems
- Patient Web Portals and Call Centers
24Laboratory Results and Electronic Lab Reporting
- There is no need to prove the value of laboratory results for public health surveillance
- The main issue is getting them
- Studies of ELR of culture proven notifiable diseases
- Hawaii (1999)
- Pittsburgh (Panackal et al EID 2002)
- Findings (and methods) similar
- Quicker
- More complete reporting
25Other-than-Culture-Proven Diseases
- New approach (e.g., PA NEDSS) is form on line
- No published comparisons of the completeness of traditional paper-based approaches versus form-on-line.
- Word of mouth is that more cases are being reported, but whether that is true, whether it is persistent, and whether it is due to fear of disease or to the system needs to be teased out.
- Timeliness also needs to be studied.
26Chief Complaints
- Chief complaints entered by a triage nurse upon admission to an emergency facility are available electronically from hospitals in the United States. (Paper in AMIA 2003 Proceedings)
- ICD-9 coded versus free text
- How to group into syndromic categories is a major, major question
- There exist several categorizations (CDC consensus, RODS, WRAIR)
27Detection Performance from ICD-9 coded Chief
Complaints
- Respiratory Case Detection (Espino 2001)
- Sensitivity 0.43, Specificity 95%
- Respiratory Outbreak Detection (Tsui 2001)
- Small sample: 1/1 detected, 1 false alarm
- Diarrhea Case Detection (Ivanov 2002)
- Similar results to Espino
- Main points
- These studies provide methods
- More studies needed of more syndromes and more
outbreaks
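As a reminder of how case-detection figures like the ones on this slide are computed, here is a minimal sketch. The counts are hypothetical, chosen only so that the arithmetic lands on a sensitivity of 0.43 and a specificity of 0.95; they are not the counts from the cited studies.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: true cases flagged by the chief-complaint classifier
    fn: true cases the classifier missed
    tn: non-cases correctly left unflagged
    fp: non-cases flagged in error
    """
    sensitivity = tp / (tp + fn)  # fraction of true cases detected
    specificity = tn / (tn + fp)  # fraction of non-cases correctly ignored
    return sensitivity, specificity


# Hypothetical counts for illustration only (not from Espino 2001).
sens, spec = sensitivity_specificity(tp=43, fn=57, tn=950, fp=50)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```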
28Using Free text Chief Complaints (and Natural
Language Processing)
- Also needed for Web or call center queries, and
radiographs
[Diagram: "cough" -> NLP -> respiratory prodrome]
29CoCo Naive Bayesian Parser
- Maps free-text chief complaint to one of seven prodrome categories (or an eighth category, none)
[Diagram: chief complaint "N/V/D" -> CoCo Naive Bayes Classifier -> posterior probabilities]
P(Respiratory | N/V/D) = .05
P(Botulinic | N/V/D) = .001
P(Constitutional | N/V/D) = .01
P(GI | N/V/D) = .9
P(Hemorrhagic | N/V/D) = .001
P(Neurologic | N/V/D) = .001
P(Rash | N/V/D) = .001
P(None | N/V/D) = .036
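To make the idea concrete, the following is a minimal sketch of a word-level naive Bayes classifier that returns a posterior probability for each category given a free-text chief complaint. It is not the CoCo implementation. The training examples, the whitespace tokenization, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled chief complaints (not real training data).
TOY_EXAMPLES = [
    ("cough", "Respiratory"),
    ("shortness of breath", "Respiratory"),
    ("vomiting", "GI"),
    ("nausea vomiting diarrhea", "GI"),
    ("rash", "Rash"),
    ("ankle injury", "None"),
]


def train(examples):
    """Count classes and per-class words from (text, category) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, category in examples:
        class_counts[category] += 1
        for word in text.lower().split():
            word_counts[category][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def posterior(text, model):
    """Return {category: P(category | text)} under the naive Bayes assumption."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for category, n in class_counts.items():
        # Add-one (Laplace) smoothing so unseen words do not zero out a class.
        denom = sum(word_counts[category].values()) + len(vocab)
        score = math.log(n / total)
        for word in text.lower().split():
            score += math.log((word_counts[category][word] + 1) / denom)
        log_scores[category] = score
    # Normalize in a numerically stable way.
    top = max(log_scores.values())
    unnorm = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


model = train(TOY_EXAMPLES)
post = posterior("vomiting diarrhea", model)
print(max(post, key=post.get))
```

With a realistic training set, the posterior for a complaint such as "N/V/D" would concentrate on one category in the same way as the probabilities listed on the slide.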
30Validation CoCo Naïve Bayes vs. UDOH Manual ED
log review
Courtesy Per Gesteland, MD
31Detecting Respiratory Outbreaks by Monitoring
Chief Complaints
[Figure: time series of hospital P&I diagnoses and respiratory chief complaints. Y-axis: SDs from mean. X-axis span: 7 years.]
Ivanov and Gesteland
32Detecting Respiratory Outbreaks in Children by
monitoring Chief Complaints
Detection from CCs precedes that from admissions
by 9 days (95% CI -5 to 23)
kids respiratory (lower respiratory infections): pneumonia, influenza, bronchiolitis, bronchitis
33Detecting GI Outbreaks in Children by monitoring
Chief Complaints
Detection from CCs precedes that from admissions
by 23 days (95% CI 12 to 33)
gastroenteritis, rotavirus
34WHICH IS BETTER, ICD-9 or FREE TEXT? At detecting
Cases of Acute Infectious GI
(Ivanov, Wagner, Chapman)
35 WHICH IS BETTER, ICD-9 or FREE TEXT? At
detecting Acute Lower Respiratory Illness from
Chief Complaints
(Espino, Wagner, Dowling, Chapman)
36Chest Radiograph Reports
- Radiologists dictate a report for most chest radiographs performed in the United States.
- Reports are transcribed after dictation and available electronically with a twelve- to twenty-four-hour latency.
- The reports describe specific findings important for detection of infectious diseases of the lower respiratory tract such as SARS, plague, tularemia, and inhalational anthrax.
- The granularity of the information is quite specific and allows for detection of different patterns of pneumonia, pleural effusions, and mediastinal widening.
- The data are identified at the level of the individual patient and can therefore be pinpointed to home location and correlated with other patients to detect clusters of cases.
37Detecting Pneumonia on Radiographs
38Detecting Febrile Illness
- Coded temperature (possibly best, but rarely recorded electronically and may be normal)
- From NLP of chief complaints
- By NLP of Emergency Department (ED) dictation
- Sensitivity 0.98
- Specificity 0.89
- 1 day delay
39Individual SARS Case Detection
1. Cough or other respiratory symptom
2. Temperature 38 C
3. Chest x-ray showing pneumonia or ARDS
4. High risk of exposure
40Summary of an Automatic SARS Syndromic Strategy
(A stretch!)
41Lab Test Ordering
- Motivation: What if you saw a large number of blood culture orders for people with home addresses in one zip code?
- Availability: from national laboratory companies (maybe 10-20% coverage of all tests done, perhaps less for infectious disease testing, which is done in hospitals)
- Barriers: need standards
- Demonstrated value: no published studies!
42Table 4.1. Clinical systems, data, and market
penetration (estimated)
Legend: ED, emergency department; LTCF, long-term care facility; -, not applicable; ?, unknown
43Sales of OTC Healthcare Products
44Take Home Message: OTC
- Availability is better and more fully proven than for any other data type because of the National Retail Data Monitor Project
- Value is also better understood than for any other unconventional type of data because of research, although there is still a lot to do
45National Retail Data Monitor: How it Works
- OTC products are UPC bar coded
- Retail stores scan purchases
- Seven chains (18,000 stores) agreed to send daily sales data
- NRDM groups the UPC-level sales data into categories like cough syrup, pediatric liquid
- NRDM makes data available to health departments via
- Web interface: 200 accounts/33 States
- Raw data feeds
- New York State, New York City, National Capital Area (MD, VA, DC), CDC, New Jersey, Georgia
- Indiana and Norfolk under development
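The grouping step can be pictured as a lookup from UPC to analytic category followed by a daily sum. The sketch below illustrates that idea only and is not the NRDM's actual code. The UPC values, store identifiers, and record layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical UPC-to-category lookup; real UPCs and the real mapping differ.
UPC_TO_CATEGORY = {
    "000000000001": "Cough Syrup Pediatric Liquid",
    "000000000002": "Cough Syrup Pediatric Liquid",
    "000000000003": "Diarrhea Remedies",
}


def aggregate_daily_sales(scans):
    """Sum units sold per (day, category).

    scans: iterable of (day, store_id, upc, units) tuples, one per
    scanned product line, as a retailer might report them.
    """
    totals = defaultdict(int)
    for day, _store_id, upc, units in scans:
        category = UPC_TO_CATEGORY.get(upc)
        if category is None:
            continue  # product is not in any monitored category
        totals[(day, category)] += units
    return dict(totals)


# Hypothetical scan records for one day across two stores.
sample_scans = [
    ("2003-11-08", "store-17", "000000000001", 2),
    ("2003-11-08", "store-42", "000000000002", 1),
    ("2003-11-08", "store-42", "000000000003", 4),
    ("2003-11-08", "store-42", "999999999999", 7),  # unmonitored product
]
daily = aggregate_daily_sales(sample_scans)
print(daily)
```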
46OTC Product Categories
- There are approximately 7500 products (UPC codes) used for self-treatment of infectious diseases
- We group them into 18 analytic classes at present (categories)
Antifever Pediatric (274)
Antifever Adult (1340)
Bronchial Remedies (43)
Chest Rubs (78)
Diarrhea Remedies (165)
Electrolytes Pediatric (75)
Hydrocortisones (185)
Thermometer Pediatric (125)
Thermometer Adult (313)
Cold Relief Adult Liquid (709 products)
Cold Relief Adult Tablet (2467)
Cold Relief Pediatric Liquid (323)
Cold Relief Pediatric Tablet (74)
Cough Syrup Adult Liquid (592)
Cough Syrup Adult Tablet (32)
Cough Syrup Pediatric Liquid (24)
Nasal Product Internal (371)
Throat Lozenges (364)
Numbers in parentheses are the number of UPC codes in the category
47Detecting Cryptosporidium from Sales of OTC
Diarrhea Remedies
- Diarrhea remedies: Kaopectate, Imodium, Pepto
- Stirling et al
- Large, waterborne outbreak of Cryptosporidium in late March/April 2001
- Convenience sample of three pharmacies in North Battleford, Saskatchewan
- Approximately 5-fold increase in all three pharmacies (relative to baseline established from Jan 2001 to early March 2001)
- Two pharmacies provided March/April 2000 data and those data showed no similar increase
- Sales peaked weeks before precautionary drinking water advisory and days to weeks before peak onset of diarrhea
Stirling R, Aramini J, Ellis A, et al. Waterborne cryptosporidiosis outbreak, North Battleford, Saskatchewan, Spring 2001. Can Commun Dis Rep. Nov 15 2001;27(22):185-192.
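A fold increase of this kind is current sales divided by average sales over a baseline period. Here is a minimal sketch. The weekly unit counts are hypothetical and were picked so that the result is 5-fold; they are not the North Battleford pharmacy data.

```python
from statistics import mean


def fold_increase(baseline_weekly_sales, current_weekly_sales):
    """Ratio of current weekly sales to the mean of the baseline weeks."""
    baseline = mean(baseline_weekly_sales)
    return current_weekly_sales / baseline


# Hypothetical weekly unit sales of antidiarrheals at one pharmacy.
baseline_units = [20, 18, 22, 19, 21, 20, 20, 20]  # pre-outbreak weeks
outbreak_week_units = 100

fold = fold_increase(baseline_units, outbreak_week_units)
print(f"{fold:.1f}-fold increase over baseline")
```

How the baseline is chosen matters. The study above used the preceding weeks of the same year and, for two pharmacies, checked the same months of the previous year to rule out a seasonal effect.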
48 2001 Crypto in North Battleford
Precautionary water advisory issued on 4/26
Detectable peak on 4/2 in sales of
over-the-counter antidiarrheals
49Detecting Crypto from Sales of OTC Antidiarrheal
(cont)
- Rodman et al
- Cryptosporidium outbreak in Collingwood, Ontario, Feb/March 1996
- 3/12 pharmacies that were asked gave data
- Pharmacy 1: 26-fold increase in sales in Feb 1996 as compared to February 1995
- Pharmacy 2: 1Q 1996 sales were 3-fold those of 1Q 1995
- Pharmacy 3: Reported no change in sales
- Outbreak detected 3/5
- Yet another Cryptosporidium outbreak in Canada (Kelowna and Cranbrook, British Columbia)
- All pharmacists (10-12 of them in each city) interviewed acknowledged increased sales (but there were no data available for study)
Rodman JS et al. Pharmaceutical sales: A method of disease surveillance. Journal of Environmental Health, Nov 1997:8-14.
Proctor et al. Surveillance data for waterborne illness detection: an assessment following a massive waterborne outbreak of Cryptosporidium infection. Epidemiol Infect. 1998;120(1):43-54.
50Detecting Crypto from Sales of OTC Antidiarrheal
(cont)
- Proctor et al
- Studied the famous 1993 Milwaukee Cryptosporidium outbreak
- One pharmacy in outbreak area provided monthly counts of unit sales
- Sales for month of March showed three-fold increase over baseline (March 1994/March 1995)
- Public health awareness of outbreak April 5
- Identified need for knowledge of geographic distribution of water supply to improve outbreak detection (North Milwaukee vs. South Milwaukee)
51Cryptosporidium Outbreak Collingwood, Ontario
26-fold increase in sales in Feb
Outbreak detected March 5
Rodman JS et al. Pharmaceutical sales: A method of disease surveillance. Journal of Environmental Health, Nov 1997:8-14.
52Cryptosporidium Outbreak Milwaukee
3X increase in sales in March 1993
Public health awareness April 5, 1993
Proctor et al. Surveillance data for waterborne illness detection: an assessment following a massive waterborne outbreak of Cryptosporidium infection. Epidemiol Infect. 1998;120(1):43-54.
53More Evidence that Crypto May Drive OTC Sales
- Corso et al
- Reviewed 2000 medical records of patients admitted to Milwaukee EDs
- Identified 378 persons who had a moderate or severe case of Cryptosporidium during the 1993 outbreak
- Self treatment with OTCs prior to ED visit was documented in the medical record in 30
54How Small? (Limitations of Observational Studies
of Real Outbreaks)
- North Battleford outbreak affected half the population
- The Milwaukee outbreak was similarly large (estimated 400,000 infected)
- Other outbreaks: no estimates of size available
- Only one to three drug stores were studied
- Bottom line: How small a Cryptosporidium outbreak can be detected is very hard to know from observational studies
55Detecting Pediatric Diarrheal and Respiratory
Outbreak from Sales of Pediatric Electrolytes
- Pediatric Electrolytes: Pedialyte, competitors
- Hogan et al
- 18 wintertime outbreaks (1998-2001, six cities)
- Strong correlation (0.9) between electrolyte sales and hospital diagnoses of respiratory and diarrheal illness in children
- Usually the uptick in sales preceded the uptick in hospital diagnoses (average 2 weeks)
- Variation in time lag from year to year and city to city suggests need for additional studies
Hogan et al. Detection of Pediatric Respiratory and Diarrheal Outbreaks from Sales of Over-the-counter Electrolyte Products. J Am Med Inform Assoc, 10(6), November 2003
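One simple way to estimate how far sales lead diagnoses is to shift one weekly series against the other and keep the lag with the highest correlation. The sketch below shows that general technique and is not necessarily the method used by Hogan et al. Both series are synthetic: the diagnosis series is constructed as the sales series delayed by two weeks and halved.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def best_lead(sales, diagnoses, max_lag):
    """Return (lag, r) maximizing corr(sales[t], diagnoses[t + lag])."""
    results = []
    for lag in range(max_lag + 1):
        x = sales[: len(sales) - lag]  # drop the last `lag` weeks of sales
        y = diagnoses[lag:]            # drop the first `lag` weeks of diagnoses
        results.append((lag, pearson(x, y)))
    return max(results, key=lambda pair: pair[1])


# Synthetic weekly series (not real data).
weekly_sales = [10, 12, 11, 15, 30, 55, 70, 50, 28, 14, 11, 10]
weekly_diagnoses = [5, 5] + [s / 2 for s in weekly_sales[:-2]]

lag, r = best_lead(weekly_sales, weekly_diagnoses, max_lag=4)
print(f"sales lead diagnoses by {lag} weeks (r={r:.2f})")
```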
56Detecting Pediatric Diarrheal and Respiratory
Outbreak from Sales of Pediatric Electrolytes
Data courtesy IRI, Utah DOH, Indianapolis Network
for Patient Care, and PA HC4 Council
58Detectability of Anthrax? Detecting Influenza
from OTC Cold Remedies
- Welliver et al
- Studied 1976-1977 Influenza B outbreak in Los Angeles
- Data from one distribution center of Ralphs Grocery Company in Los Angeles
- OTC cold remedy sales peaked 3 weeks prior to peak in positive Influenza cultures
- No association between aspirin (antipyretic) sales and influenza
Welliver RC, Cherry JD, Boyer KM, et al. Sales of nonprescription cold remedies: a unique method of influenza surveillance. Pediatr Res. Sep 1979;13(9):1015-1017.
59Detecting Influenza from OTC Cold Remedies (cont)
- Correlation of Cough/Cold/Flu OTC Categories With
Hospital Diagnoses of Pneumonia, Influenza,
Bronchitis, and Bronchiolitis
60DATA Co-Variant Contribution
- OTC
- Day of week
- Seasonal effect
- Promotions (smaller effect than anticipated)
- Chief complaints
- Confounders are other diseases such as asthma,
influenza
Co-variants of leading data streams (especially
OTCs) and their quantitative effects on detection
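Day of week is the easiest of these covariates to illustrate. A common adjustment, sketched here under the assumption of a stable multiplicative weekly pattern, divides each day's count by the ratio of that weekday's mean to the overall mean. The daily counts are synthetic, and the function names are illustrative rather than taken from any surveillance system.

```python
from collections import defaultdict
from datetime import date, timedelta


def day_of_week_factors(daily_counts):
    """Map weekday (0=Monday) to its mean count divided by the overall mean."""
    by_weekday = defaultdict(list)
    for day, count in daily_counts.items():
        by_weekday[day.weekday()].append(count)
    overall = sum(daily_counts.values()) / len(daily_counts)
    return {wd: (sum(v) / len(v)) / overall for wd, v in by_weekday.items()}


def adjust(daily_counts, factors):
    """Remove the weekly pattern by dividing each count by its weekday factor."""
    return {day: count / factors[day.weekday()]
            for day, count in daily_counts.items()}


# Synthetic four weeks of daily unit sales: higher on weekends.
start = date(2003, 9, 1)
counts = {}
for offset in range(28):
    day = start + timedelta(days=offset)
    counts[day] = {5: 150, 6: 130}.get(day.weekday(), 100)

factors = day_of_week_factors(counts)
adjusted = adjust(counts, factors)
print(min(adjusted.values()), max(adjusted.values()))
```

After adjustment the synthetic series is flat, so any remaining excursion in real data would have to come from something other than the weekly cycle.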
61From Permissive Environments
- Survey data
- Telephone calls to medical offices
- Web queries to medical sites
62Absenteeism
- Absenteeism reporting systems
- Indirectly through other measures of a person's physical presence at a location.
- Affected by weekends, holidays, and vacations or recess periods (especially school absenteeism!)
63Other Data Sources
- EMS (Buckeridge)
- Poison Centers (Wagner)
- Waste water (Brinkman)
- Animals (Steve Babin)
- Prescription Drug (Foster)
- Web queries (Heather Johnson)
64Other Data Sources
- Transportation (Shah)
- Traffic monitoring (IBM)
- Cafeteria sales (IBM)
- Parking lot (Cheng)
- Orthodontic (Shah)
65Outline (and main points)
- ✓ Which data are required (and how do we know?)
- ✓ Discussion of primary surveillance data
- Availability
- Experimental results about value
- What supplementary surveillance data do we need
to collect? (e.g., spatial, temporal, census,
weather)
66Spatial Info
- Census tract vs. zip code
- ZIP: HIPAA and availability
- Street address: 60 automatic recognition problem
- Substreet address (floor of building): Office, especially in vertical cities like HK, NY
- Longitude, height, and latitude: maximum flexibility
67Time Stamping
- It is sort of obvious but worth discussing
- Time zones can cause confusion
- The meaning of time stamps can cause confusion
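As an example of the time zone problem, two visits recorded in local time can appear in the wrong order until both are converted to a common reference such as UTC. The sketch below uses fixed standard-time offsets, which is a simplifying assumption; a production system would need a time zone database to handle daylight saving time. The two visits are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for illustration only; they ignore daylight saving time.
EASTERN_STANDARD = timezone(timedelta(hours=-5), "EST")
MOUNTAIN_STANDARD = timezone(timedelta(hours=-7), "MST")


def to_utc(local_naive, tz):
    """Attach a time zone to a naive local time stamp and convert to UTC."""
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)


# Hypothetical ED registration times as recorded locally.
visit_pittsburgh = to_utc(datetime(2003, 11, 8, 9, 30), EASTERN_STANDARD)
visit_salt_lake = to_utc(datetime(2003, 11, 8, 8, 0), MOUNTAIN_STANDARD)

# The local clock reads later in Pittsburgh, but that visit happened first.
print(visit_pittsburgh.isoformat(), visit_salt_lake.isoformat())
```

The second source of confusion, what a time stamp means (symptom onset, registration, or record transmission, for example), cannot be fixed by conversion and has to be documented for each data feed.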
68Water Supply
- Food and water are key potential transmission routes for a bioterrorist's pathogens.
- There are two strategies for monitoring for such contaminations. First, we can monitor food and water directly and prospectively to detect contamination and prevent or mitigate a potential outbreak very early. Second, once an outbreak begins, we can trace back from affected individuals to the source as quickly as possible to prevent additional exposures, and to direct prophylactic treatment to individuals likely to have been exposed to the pathogen. This chapter examines data needed to trace back from affected individuals to the original source of contamination and also discusses, for water supply, routine monitoring for contamination. [1]
- [1] Governmental prospective monitoring of food is limited. Spot checks of food and produce are made when they enter the country, with high-risk items getting more attention; however, only a small fraction of food gets inspected prospectively. State authority does not extend outside its boundaries, and FDA does not have authority to require that comprehensive records be kept.
69Food Distribution
- Trace back investigations [41] involve tracing a product by using shipping and purchase records at each place in the distribution chain back to common points that can explain the occurrence of illness among all or most affected individuals. When consumers fall ill after eating food purchased from a retailer or restaurant, investigators will ask the affected individuals to identify the restaurants or retailers they patronized, who, in turn, identify the wholesalers. The wholesalers identify their suppliers, who identify the farm or farms that were the ultimate food source. The supply chain's information systems are not vertically integrated. Each entity knows only to whom they shipped, or from whom they received food. Public health police powers can provide the legal basis for trace backs within state lines; however, in many cases the FDA may become involved. Trace back by the FDA involves many challenges, including the absence of records, a lack of authority to require records be kept or provided, multiple sources of product, complex distribution systems, and the resource-intensive nature of the process, which may or may not confirm a contamination. [41]
- At each point in the trace back, the procedures for storing or processing the food are observed for abuses that would result in contamination of the food. Specimens may be taken to determine if other similarly handled food or the environment that the food was stored or processed in is contaminated. If there is evidence to suggest that contaminated food is in the distribution chain, then a recall of that food is initiated.
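The search for common points can be pictured as a graph problem: follow supplier links upstream from each implicated outlet and intersect the results. The sketch below assumes the shipping records have already been collected into a single lookup, which, as noted above, is the hard part in practice. All entity names are hypothetical.

```python
def upstream(entity, suppliers_of):
    """All entities reachable by following supplier links back from entity."""
    seen = set()
    stack = [entity]
    while stack:
        current = stack.pop()
        for supplier in suppliers_of.get(current, ()):
            if supplier not in seen:
                seen.add(supplier)
                stack.append(supplier)
    return seen


def common_sources(outlets, suppliers_of):
    """Entities upstream of every outlet where affected people bought food."""
    upstream_sets = [upstream(outlet, suppliers_of) for outlet in outlets]
    return set.intersection(*upstream_sets) if upstream_sets else set()


# Hypothetical records: each entity knows only who it received food from.
SUPPLIERS_OF = {
    "Restaurant A": ["Wholesaler 1"],
    "Retailer B": ["Wholesaler 1", "Wholesaler 2"],
    "Retailer C": ["Wholesaler 2"],
    "Wholesaler 1": ["Farm X", "Farm Y"],
    "Wholesaler 2": ["Farm Y", "Farm Z"],
}

common = common_sources(["Restaurant A", "Retailer C"], SUPPLIERS_OF)
print(common)
```

In this toy network the two outlets share no wholesaler, so the only common point is a farm that supplies both wholesalers.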
70Weather
- Weather and climate data currently used in epidemiological analysis include temperature, wind direction and speed (for bioaerosol-related analyses), and precipitation. Of possible value, but not used according to our experts, might be barometric pressure and ultraviolet exposure.
- In the United States, weather data are already highly available.
- Temperature
- Wind speed
- Wind direction
- Precipitation
- Up-to-the-minute information for the entire nation is available because data are collected in real time, in standard formats, and integrated in a central location that is publicly available without any technical or administrative barriers (http://weather.noaa.gov/).
- The weather system, in addition to being a mature source of data for public health surveillance and early warning of bioterrorism, represents a good case study in how to integrate data from many sources and independent monitoring stations using communication networks and standards to achieve a national surveillance capability.