Title: AMIA02-Tsui
1Bayesian Biosurveillance Using Multiple Data Streams
Greg Cooper, Weng-Keen Wong, Denver Dash, John Levander, John Dowling, Bill Hogan, Mike Wagner
RODS Laboratory, University of Pittsburgh
Intel Research, Santa Clara
2Outline
- Introduction
- Model
- Inference
- Conclusions
3Over-the-Counter (OTC) Data Being Collected by
the National Retail Data Monitor (NRDM)
- 19,000 stores
- 50% market share nationally
- >70% market share in large cities
4ED Chief Complaint Data Being Collected by RODS
Chief Complaint ED Records for Allegheny County
Date / Time Admitted | Age | Gender | Home Zip | Work Zip | Chief Complaint
Nov 1, 2004 3:02 | 20-30 | Male | 15213 | | Shortness of breath
Nov 1, 2004 3:09 | 70-80 | Female | 15132 | 15213 | Fever
5Objective
Using the ED and OTC data streams, detect a
disease outbreak in a given region as quickly and
accurately as possible
6Our Approach
Population-wide ANomaly Detection and Assessment
(PANDA)
- A detection algorithm that models each individual
in the population
- Combines ED and OTC data streams
- The current prototype focuses on detecting an
outdoor aerosolized release of an anthrax-like
agent in Allegheny County
7PANDA
Uses a causal Bayesian network
Home Location of Person
Visit of Person to ED
Anthrax Infection of Person
Location of Anthrax Release
Bayesian Network: A graphical model representing
the joint probability distribution of a set of
random variables
8PANDA
Uses a causal Bayesian network
Home Location of Person
Visit of Person to ED
Anthrax Infection of Person
Location of Anthrax Release
The arrows convey conditional independence
relationships among the variables. They also
represent causal relationships.
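As a rough illustration of how such a network encodes a joint distribution, the sketch below factorizes the four variables named on the slide into local conditional probabilities. The arrow directions (release location and home location into infection, infection into ED visit), the two-zone world, and every probability are assumptions made for this example; the slide text lists only the node names.

```python
from itertools import product

# Hypothetical two-zone world; all numbers below are made up for illustration.
ZONES = ["zone_a", "zone_b"]

p_release = {"zone_a": 0.5, "zone_b": 0.5}  # P(Location of Anthrax Release)
p_home = {"zone_a": 0.7, "zone_b": 0.3}     # P(Home Location of Person)


def p_infected(release: str, home: str) -> float:
    """P(Anthrax Infection = True | release location, home location)."""
    return 0.30 if release == home else 0.01


def p_ed_visit(infected: bool) -> float:
    """P(Visit of Person to ED = True | infection status)."""
    return 0.60 if infected else 0.02


def joint(release: str, home: str, infected: bool, visit: bool) -> float:
    """Joint probability as the product of each node's local distribution."""
    p_inf = p_infected(release, home)
    p_vis = p_ed_visit(infected)
    return (
        p_release[release]
        * p_home[home]
        * (p_inf if infected else 1.0 - p_inf)
        * (p_vis if visit else 1.0 - p_vis)
    )


def posterior_release_location(home: str, visit: bool) -> dict:
    """P(release location | home, visit), summing out the unobserved infection."""
    scores = {
        r: sum(joint(r, home, inf, visit) for inf in (True, False)) for r in ZONES
    }
    norm = sum(scores.values())
    return {r: s / norm for r, s in scores.items()}


# The factorized joint sums to 1 over all assignments.
total = sum(
    joint(r, h, i, v)
    for r, h, i, v in product(ZONES, ZONES, (True, False), (True, False))
)
print(total)
print(posterior_release_location("zone_a", visit=True))
```

With these made-up numbers, an ED visit by a resident of zone_a shifts belief toward a release in zone_a, which is the kind of evidence flow the arrows are meant to support.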
9Outline
- Introduction
- Model
- Inference
- Conclusions
10A Schematic of the Generic PANDA Model
for Non-Contagious Diseases
Population Risk Factors
Population Disease Exposure (PDE)
Person Model
Person Model
Person Model
Person Model
Population-Wide Evidence
11A Special Case of the Generic Model
Anthrax Release
Time of Release
Location of Release
Person Model
Person Model
Person Model
Person Model
OTC Sales for Region
Each person in the population is represented as a
subnetwork in the overall model
12The Person Model
Location of Release
Age Decile
Home Zip
Time Of Release
Gender
Anthrax Infection
Other ED Disease
Non-ED Acute Respiratory Infection
Respiratory from Anthrax
Respiratory CC From Other
ED Acute Respiratory Infection
Acute Respiratory Infection
Respiratory CC
ED Admit from Anthrax
ED Admit from Other
Daily OTC Purchase
Respiratory CC When Admitted
Last 3 Days OTC Purchase
ED Admission
OTC Sales for Region
13Why Use a Population-Based Approach?
- Representational power
  - Spatial, temporal, demographic, and symptom
  knowledge of potential diseases can be coherently
  represented in a single model
  - Spatial, temporal, demographic, and symptom
  evidence can be combined to derive a posterior
  probability of a disease outbreak
- Representational flexibility
  - New types of knowledge and evidence can be
  readily incorporated into the model
Hypothesis: A population-based approach will
achieve better detection performance than
non-population-based approaches.
14The Person Model
[Person Model network diagram, repeated from slide 12]
15The Person Model
[Person Model network diagram, repeated from slide 12]
Equivalence Class Example
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
20-30 | Male | 15213 | Yes | Today
16Outline
- Introduction
- Model
- Inference
- Conclusions
17Inference
Anthrax Release
Time of Release
Location of Release
Person Model
Person Model
Person Model
Person Model
OTC Sales for Region
Derive P(Anthrax Release = true | OTC Sales Data, ED Data)
18Inference
AR = Anthrax Release; ED = ED Data
PDE = Population Disease Exposure; OTC = OTC Counts
Key term in deriving P(AR | OTC, ED):
P(OTC, ED | PDE) = P(OTC | ED, PDE) × P(ED | PDE)
P(ED | PDE): contribution of ED data
P(OTC | ED, PDE): contribution of OTC counts
Details in: Cooper GF, Dash DH, Levander J, Wong
W-K, Hogan W, Wagner M. Bayesian Biosurveillance
of Disease Outbreaks. In: Proceedings of the
Conference on Uncertainty in Artificial
Intelligence, 2004.
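A minimal sketch of how the key term could feed a posterior for AR, assuming a small discrete set of PDE hypotheses (one of them "no release"), each with a prior and with log P(ED | PDE) and log P(OTC | ED, PDE) already computed. The `Hypothesis` class, the hypothesis names, and all numbers are hypothetical; the cited paper describes the actual derivation.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str          # hypothetical label for one PDE state
    prior: float       # prior probability of this PDE state
    log_p_ed: float    # log P(ED | PDE)
    log_p_otc: float   # log P(OTC | ED, PDE)
    is_release: bool   # True if this PDE state implies an anthrax release


def posterior_anthrax_release(hypotheses: list) -> float:
    """P(AR = true | OTC, ED) by Bayes' rule over the PDE hypotheses.

    Each hypothesis is scored with
    log prior + log P(OTC | ED, PDE) + log P(ED | PDE),
    and the scores are normalized with the log-sum-exp trick to avoid underflow.
    """
    log_scores = [
        math.log(h.prior) + h.log_p_otc + h.log_p_ed for h in hypotheses
    ]
    max_score = max(log_scores)
    weights = [math.exp(s - max_score) for s in log_scores]
    release_weight = sum(w for w, h in zip(weights, hypotheses) if h.is_release)
    return release_weight / sum(weights)


# Made-up example: the data fit "release near zip A" slightly better.
hypotheses = [
    Hypothesis("no release", 0.990, -1200.0, -6.0, False),
    Hypothesis("release near zip A", 0.005, -1195.0, -4.5, True),
    Hypothesis("release near zip B", 0.005, -1201.0, -6.5, True),
]
print(posterior_anthrax_release(hypotheses))
```

Working in log space matters here because the ED term is a product over many people and would underflow as a plain probability.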
19Inference
AR = Anthrax Release; ED = ED Data
PDE = Population Disease Exposure; OTC = OTC Counts
Key term in deriving P(AR | OTC, ED):
P(OTC, ED | PDE) = P(OTC | ED, PDE) × P(ED | PDE)
P(OTC | ED, PDE): the focus of the remainder of this talk
20The Person Model
[Person Model network diagram, repeated from slide 12]
21Incorporating the Counts of OTC Purchases
Person1 Zip1 OTC count
Person2 Zip1 OTC count
Person3 Zip1 OTC count
Person4 Zip1 OTC count
Eq Class1 Zip1 OTC count
Eq Class2 Zip1 OTC count
Approximate binomial distribution with a normal
distribution
Zip1 OTC count
22The PANDA OTC Model
P(OTC sales = X | ED, PDE)
Recall that P(OTC, ED | PDE) = P(OTC | ED, PDE) × P(ED | PDE)
23Example
Equivalence Class 1 ~ Normal(100, 100)
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Male | 15213 | Yes | Today
24Example
Equivalence Class 1 ~ Normal(100, 100)
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Male | 15213 | Yes | Today
Equivalence Class 2 ~ Normal(150, 225)
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Female | 15213 | Yes | Today
25Example
Equivalence Class 1 ~ Normal(100, 100)
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Male | 15213 | Yes | Today
Equivalence Class 2 ~ Normal(150, 225)
Age Decile | Gender | Home Zip | Respiratory Chief Comp. | Date Admitted
50-60 | Female | 15213 | Yes | Today
If these were the only 2 Equivalence Classes in
the County, then County Cough & Cold OTC ~
Normal(100 + 150, 100 + 225)
26Example
Now suppose 260 units are sold in the county
P(OTC Sales = 260 | ED Data, PDE)
= Normal(260; 250, 325) = 0.001231
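The arithmetic of this example can be sketched as follows, assuming the equivalence classes are independent (so means and variances add) and reading the second Normal parameter as a variance, as in the Normal(μ, σ²) notation used for the PANDA OTC model. The function names are hypothetical. The density printed here depends on that reading; if the slide's value was computed under another convention (for example, treating 325 as a standard deviation), the two numbers will differ.

```python
import math


def sum_independent_normals(params):
    """Combine independent Normal(mean, variance) terms into one Normal.

    For independent variables, the mean of the sum is the sum of the means
    and the variance of the sum is the sum of the variances.
    """
    mean = sum(m for m, _ in params)
    variance = sum(v for _, v in params)
    return mean, variance


def normal_pdf(x, mean, variance):
    """Density of Normal(mean, variance) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# (mean, variance) for the two equivalence classes in the example.
classes = [(100.0, 100.0), (150.0, 225.0)]
county_mean, county_var = sum_independent_normals(classes)
density = normal_pdf(260.0, county_mean, county_var)

print(county_mean, county_var)  # 250.0 325.0
print(density)
```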
27Inference Timing
- Machine: P4 3 GHz, 2 GB RAM
Model | Initialization Time (seconds) | Each hour of data (seconds)
ED model | 55 | 5
ED and OTC model | 229 | 5
28A Current Limitation
- Problem: Currently we assume, unrealistically, that
a person only makes OTC purchases in his or her
home zip code
- Approach 1: Aggregate OTC counts (e.g., at the
county level)
- Approach 2: For each home zip code, model the
distribution of zip codes where OTC purchases are
made
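Approach 2 could look roughly like the sketch below: expected purchases by residents of each home zip code are spread over store zip codes according to a per-home-zip distribution. The function name, the distribution table, and all numbers are hypothetical; the slides do not say how such a distribution would be estimated.

```python
def expected_sales_by_store_zip(expected_by_home_zip, purchase_zip_dist):
    """Redistribute expected purchases from home zip codes to store zip codes.

    expected_by_home_zip: {home_zip: expected number of purchases by residents}
    purchase_zip_dist:    {home_zip: {store_zip: P(purchase made in store_zip)}}
    """
    sales = {}
    for home_zip, expected in expected_by_home_zip.items():
        for store_zip, prob in purchase_zip_dist[home_zip].items():
            sales[store_zip] = sales.get(store_zip, 0.0) + expected * prob
    return sales


# Made-up numbers: most residents buy near home, some buy in the other zip.
expected_by_home_zip = {"15213": 100.0, "15132": 50.0}
purchase_zip_dist = {
    "15213": {"15213": 0.8, "15132": 0.2},
    "15132": {"15132": 0.6, "15213": 0.4},
}

sales = expected_sales_by_store_zip(expected_by_home_zip, purchase_zip_dist)
print(sales)
```

Because each row of the distribution sums to 1, the total expected sales are preserved; only their location changes.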
29Outline
- Introduction
- Model
- Inference
- Conclusions
30Challenges in Population-Wide Modeling Include
- Obtaining good parameter estimates to use in
modeling (e.g., the probability of an OTC cough
medication purchase given an acute respiratory
illness)
- Modeling time and space in a way that is both
useful and computationally tractable
- Modeling contagious diseases
31Conclusions
- PANDA is a multivariate algorithm that can
combine multiple data streams
- Modeling each individual in the population is
computationally feasible (so far)
- An evaluation of the PANDA approach to modeling
multiple data streams is in progress using
semi-synthetic test data
32- Thank you
- Current funding
  - National Science Foundation
  - Department of Homeland Security
- Earlier funding
  - DARPA
http://www.cbmi.pitt.edu/panda/
gfc_at_cbmi.pitt.edu
34The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class Ei as a binomial distribution.
Ei ~ Binomial(N_Ei, P_Ei)
35The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class Ei as a binomial distribution.
Ei ~ Binomial(N_Ei, P_Ei)
N_Ei: Number of people in Equivalence Class Ei
P_Ei: Probability of an OTC cough medication purchase
during the previous 3 days by each person in
Equivalence Class Ei
36The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class Ei as a binomial distribution.
Approximate the binomial distribution as a normal
distribution.
Ei ~ Binomial(N_Ei, P_Ei) ≈ Normal(μ_Ei, σ²_Ei)
37The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class Ei as a binomial distribution.
Approximate the binomial distribution as a normal
distribution.
Ei ~ Binomial(N_Ei, P_Ei) ≈ Normal(μ_Ei, σ²_Ei)
μ_Ei = N_Ei × P_Ei
σ²_Ei = N_Ei × P_Ei × (1 − P_Ei)
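To see how close the approximation is for a class of moderate size, the sketch below compares the exact binomial probability with the normal density at the same point. The class size and purchase probability are hypothetical values chosen for illustration, not parameters from PANDA.

```python
import math


def normal_approximation(n_people, p_purchase):
    """Mean and variance of the normal approximation to Binomial(n, p)."""
    mean = n_people * p_purchase
    variance = n_people * p_purchase * (1.0 - p_purchase)
    return mean, variance


def binomial_pmf(k, n, p):
    """Exact Binomial(n, p) probability of k, computed in log space."""
    log_pmf = (
        math.lgamma(n + 1)
        - math.lgamma(k + 1)
        - math.lgamma(n - k + 1)
        + k * math.log(p)
        + (n - k) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def normal_pdf(x, mean, variance):
    """Density of Normal(mean, variance) evaluated at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# Hypothetical equivalence class: 5,000 people, each with a 2% chance of
# an OTC cough medication purchase in the previous 3 days.
n_people, p_purchase = 5000, 0.02
mu, var = normal_approximation(n_people, p_purchase)

exact = binomial_pmf(100, n_people, p_purchase)
approx = normal_pdf(100.0, mu, var)
print(mu, var)
print(exact, approx)
```

The approximation is what lets the per-class counts be summed cheaply, since a sum of independent normals is again normal.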
38Computational Cost of a Population-Wide Approach?
1.4 million people in Allegheny County,
Pennsylvania
39Equivalence Classes
The 1.4M people in the modeled population can be
partitioned into approximately 24,240
equivalence classes
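One way to picture the partition: people who share the same observed attributes are grouped together, so the work scales with the number of distinct groups instead of the number of people. The sketch below groups on the five columns shown in the equivalence class example table; the `PersonState` type, the field values, and the "Never" marker for people not admitted are assumptions for illustration.

```python
from collections import Counter
from typing import NamedTuple


class PersonState(NamedTuple):
    """Attributes that define an equivalence class (assumed from the example table)."""
    age_decile: str
    gender: str
    home_zip: str
    respiratory_cc: str   # "Yes", "No", or "Unknown"
    date_admitted: str    # e.g. "Today", "Never" (hypothetical values)


def equivalence_classes(people):
    """Map each distinct PersonState to the number of people sharing it."""
    return Counter(people)


# Three hypothetical people; the first two are indistinguishable to the model.
people = [
    PersonState("20-30", "Male", "15213", "Unknown", "Never"),
    PersonState("20-30", "Male", "15213", "Unknown", "Never"),
    PersonState("20-30", "Male", "15213", "Yes", "Today"),
]

classes = equivalence_classes(people)
for state, count in classes.items():
    print(count, state)
```

Each class count would then play the role of N_Ei in the binomial model of OTC purchases.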