AMIA02-Tsui - PowerPoint PPT Presentation

Slides: 40
Provided by: Rich132
Tags: amia02 | anthrax | tsui


Transcript and Presenter's Notes


1
Bayesian Biosurveillance Using Multiple Data Streams
Greg Cooper, Weng-Keen Wong, Denver Dash, John Levander, John Dowling, Bill Hogan, Mike Wagner
RODS Laboratory, University of Pittsburgh
Intel Research, Santa Clara
2
Outline
  1. Introduction
  2. Model
  3. Inference
  4. Conclusions

3
Over-the-Counter (OTC) Data Being Collected by
the National Retail Data Monitor (NRDM)
  • 19,000 stores
  • 50% market share nationally
  • >70% market share in large cities

4
ED Chief Complaint Data Being Collected by RODS
Chief Complaint ED Records for Allegheny County

Date / Time Admitted   Age     Gender   Home Zip   Work Zip   Chief Complaint
Nov 1, 2004 3:02       20-30   Male     15213      -          Shortness of breath
Nov 1, 2004 3:09       70-80   Female   15132      15213      Fever

5
Objective
Using the ED and OTC data streams, detect a
disease outbreak in a given region as quickly and
accurately as possible
6
Our Approach
Population-wide ANomaly Detection and Assessment
(PANDA)
  • A detection algorithm that models each individual
    in the population
  • Combines ED and OTC data streams
  • The current prototype focuses on detecting an
    outdoor aerosolized release of an anthrax-like
    agent in Allegheny County

7
PANDA
Uses a causal Bayesian network
[Figure: Bayesian network with the nodes Home Location of Person, Visit of Person to ED, Anthrax Infection of Person, and Location of Anthrax Release]
Bayesian network: a graphical model representing
the joint probability distribution of a set of
random variables
8
PANDA
Uses a causal Bayesian network
[Figure: Bayesian network with the nodes Home Location of Person, Visit of Person to ED, Anthrax Infection of Person, and Location of Anthrax Release]
The arrows convey conditional independence
relationships among the variables. They also
represent causal relationships.
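As a brief illustration of what "representing the joint probability distribution" means, the standard Bayesian network factorization can be written as below. The symbols $X_i$ (a node) and $\mathrm{Pa}(X_i)$ (its parents in the graph) are notation introduced here, not taken from the slides, and the slide transcript does not state which arrows connect which nodes.

```latex
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

Each node contributes one conditional distribution given its parents, which is why the arrows encode conditional independence.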
9
Outline
  1. Introduction
  2. Model
  3. Inference
  4. Conclusions

10
A Schematic of the Generic PANDA Model
for Non-Contagious Diseases
[Figure: schematic with the nodes Population Risk Factors and Population Disease Exposure (PDE), one Person Model per person (four shown), and Population-Wide Evidence]
11
A Special Case of the Generic Model
[Figure: schematic with the nodes Anthrax Release, Time of Release, and Location of Release, one Person Model per person (four shown), and OTC Sales for Region]
Each person in the population is represented as a
subnetwork in the overall model
12
The Person Model
[Figure: the Person Model network. Nodes: Location of Release, Age Decile, Home Zip, Time Of Release, Gender, Anthrax Infection, Other ED Disease, Non-ED Acute Respiratory Infection, Respiratory from Anthrax, Respiratory CC From Other, ED Acute Respiratory Infection, Acute Respiratory Infection, Respiratory CC, ED Admit from Anthrax, ED Admit from Other, Daily OTC Purchase, Respiratory CC When Admitted, Last 3 Days OTC Purchase, ED Admission, OTC Sales for Region]
13
Why Use a Population-Based Approach?
  • Representational power
      • Spatial, temporal, demographic, and symptom
        knowledge of potential diseases can be coherently
        represented in a single model
      • Spatial, temporal, demographic, and symptom
        evidence can be combined to derive a posterior
        probability of a disease outbreak
  • Representational flexibility
      • New types of knowledge and evidence can be
        readily incorporated into the model

Hypothesis: A population-based approach will
achieve better detection performance than
non-population-based approaches.
14
The Person Model
[Figure: the Person Model network, repeated from slide 12]
15
The Person Model
[Figure: the Person Model network, as on slide 12, shown here without the OTC Sales for Region node]

Equivalence Class Example

Age Decile   Gender   Home Zip   Respiratory Chief Comp.   Date Admitted
20-30        Male     15213      Yes                       Today
16
Outline
  1. Introduction
  2. Model
  3. Inference
  4. Conclusions

17
Inference
[Figure: schematic with the nodes Anthrax Release, Time of Release, and Location of Release, one Person Model per person (four shown), and OTC Sales for Region]
Derive $P(\text{Anthrax Release} = \text{true} \mid \text{OTC Sales Data}, \text{ED Data})$
18
Inference
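AR = Anthrax Release, ED = ED Data
PDE = Population Disease Exposure, OTC = OTC Counts
Key term in deriving $P(\mathrm{AR} \mid \mathrm{OTC}, \mathrm{ED})$:
$P(\mathrm{OTC}, \mathrm{ED} \mid \mathrm{PDE}) = P(\mathrm{OTC} \mid \mathrm{ED}, \mathrm{PDE}) \, P(\mathrm{ED} \mid \mathrm{PDE})$
Contribution of ED data: $P(\mathrm{ED} \mid \mathrm{PDE})$
Contribution of OTC counts: $P(\mathrm{OTC} \mid \mathrm{ED}, \mathrm{PDE})$
Details in: Cooper GF, Dash DH, Levander J, Wong W-K, Hogan W, Wagner M.
Bayesian Biosurveillance of Disease Outbreaks. In: Proceedings of the
Conference on Uncertainty in Artificial Intelligence, 2004.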
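To make the role of this key term concrete, the following sketch shows the product rule behind the factorization and how the term would enter Bayes' rule as a likelihood. It assumes PDE ranges over a discrete set of exposure hypotheses and that a prior $P(\mathrm{PDE})$ is available; both are assumptions made here for illustration. How the posterior over AR is then obtained is covered in the cited paper, not in this sketch.

```latex
\begin{align}
P(\mathrm{OTC}, \mathrm{ED} \mid \mathrm{PDE})
  &= P(\mathrm{OTC} \mid \mathrm{ED}, \mathrm{PDE}) \, P(\mathrm{ED} \mid \mathrm{PDE}) \\
P(\mathrm{PDE} \mid \mathrm{OTC}, \mathrm{ED})
  &= \frac{P(\mathrm{OTC}, \mathrm{ED} \mid \mathrm{PDE}) \, P(\mathrm{PDE})}
          {\sum_{\mathrm{PDE}'} P(\mathrm{OTC}, \mathrm{ED} \mid \mathrm{PDE}') \, P(\mathrm{PDE}')}
\end{align}
```

The first line is the product rule for conditional probabilities; the second is Bayes' rule with the key term as the likelihood.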
19
Inference
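AR = Anthrax Release, ED = ED Data
PDE = Population Disease Exposure, OTC = OTC Counts
Key term in deriving $P(\mathrm{AR} \mid \mathrm{OTC}, \mathrm{ED})$:
$P(\mathrm{OTC}, \mathrm{ED} \mid \mathrm{PDE}) = P(\mathrm{OTC} \mid \mathrm{ED}, \mathrm{PDE}) \, P(\mathrm{ED} \mid \mathrm{PDE})$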
The focus of the remainder of this talk
20
The Person Model
[Figure: the Person Model network, repeated from slide 12]
21
Incorporating the Counts of OTC Purchases
[Figure: OTC counts for Person1, Person2, Person3, and Person4 in Zip1; OTC counts for Eq Class1 and Eq Class2 in Zip1; the Zip1 OTC count]
Approximate the binomial distribution with a normal
distribution
22
The PANDA OTC Model
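$P(\text{OTC sales} = X \mid \mathrm{ED}, \mathrm{PDE})$
Recall that $P(\mathrm{OTC}, \mathrm{ED} \mid \mathrm{PDE}) = P(\mathrm{OTC} \mid \mathrm{ED}, \mathrm{PDE}) \, P(\mathrm{ED} \mid \mathrm{PDE})$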
23
Example
Equivalence Class 1 $\sim \mathrm{Normal}(100, 100)$

Age Decile   Gender   Home Zip   Respiratory Chief Comp.   Date Admitted
50-60        Male     15213      Yes                       Today
24
Example
Equivalence Class 1 $\sim \mathrm{Normal}(100, 100)$

Age Decile   Gender   Home Zip   Respiratory Chief Comp.   Date Admitted
50-60        Male     15213      Yes                       Today

Equivalence Class 2 $\sim \mathrm{Normal}(150, 225)$

Age Decile   Gender   Home Zip   Respiratory Chief Comp.   Date Admitted
50-60        Female   15213      Yes                       Today
25
Example
Equivalence Class 1 $\sim \mathrm{Normal}(100, 100)$

Age Decile   Gender   Home Zip   Respiratory Chief Comp.   Date Admitted
50-60        Male     15213      Yes                       Today

Equivalence Class 2 $\sim \mathrm{Normal}(150, 225)$

Age Decile   Gender   Home Zip   Respiratory Chief Comp.   Date Admitted
50-60        Female   15213      Yes                       Today

If these were the only 2 equivalence classes in
the county, then County Cough & Cold OTC
$\sim \mathrm{Normal}(100 + 150,\ 100 + 225)$
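The county-level distribution on this slide follows from a standard fact: a sum of independent normal variables is normal, with the means added and the variances added. A minimal Python sketch is given below. It assumes the two numbers in each Normal(...) are a mean and a variance (consistent with the Normal(mean, variance) notation of the OTC model slides) and that the equivalence classes are independent. The `Normal` class and `sum_independent` helper are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Normal:
    """A normal distribution described by its mean and variance."""
    mean: float
    variance: float


def sum_independent(components):
    """Distribution of a sum of independent normals: means and variances add."""
    return Normal(
        mean=sum(c.mean for c in components),
        variance=sum(c.variance for c in components),
    )


# Values taken from the slide's two equivalence classes.
class_1 = Normal(mean=100.0, variance=100.0)
class_2 = Normal(mean=150.0, variance=225.0)

county = sum_independent([class_1, class_2])
print(county)
```

With more equivalence classes, the same helper would simply receive a longer list.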
26
Example
Now suppose 260 units are sold in the county
$P(\text{OTC Sales} = 260 \mid \text{ED Data}, \mathrm{PDE}) = \mathrm{Normal}(260;\ 250, 325) = 0.001231$
[Figure: plot marking the value 260]
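The likelihood on this slide is a normal density evaluated at the observed count. The sketch below evaluates it under two readings of the second parameter, since the slide does not say which one produced the printed number. The function name `normal_pdf` is chosen here for illustration.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


observed = 260.0
mean = 250.0

# Reading 1: 325 is a variance, as in the Normal(mean, variance) notation
# used for the equivalence classes.
density_variance_reading = normal_pdf(observed, mean, 325.0)

# Reading 2 (an assumption): 325 is a standard deviation, so the variance is 325**2.
density_sd_reading = normal_pdf(observed, mean, 325.0 ** 2)

print(f"variance reading: {density_variance_reading:.6f}")
print(f"std-dev reading:  {density_sd_reading:.6f}")
```

Under the variance reading the density is about 0.019. Treating 325 as a standard deviation gives about 0.00123, which is closer to the 0.001231 printed on the slide.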
27
Inference Timing
  • Machine: P4, 3 GHz, 2 GB RAM

                     Initialization time (seconds)   Each hour of data (seconds)
ED model             55                              5
ED and OTC model     229                             5
28
A Current Limitation
  • Problem: Currently we assume, unrealistically, that
    a person only makes OTC purchases in his or her
    home zip code
  • Approach 1: Aggregate OTC counts (e.g., at the
    county level)
  • Approach 2: For each home zip code, model the
    distribution of zip codes where OTC purchases are
    made
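
One way to picture Approach 2 is to spread each home zip code's expected purchases across store zip codes using a purchase-location distribution. The sketch below is only an illustration of that idea, not the PANDA implementation. The function name, the dictionaries, and all numbers are hypothetical.

```python
def expected_sales_by_store_zip(expected_purchases_by_home_zip, purchase_zip_given_home_zip):
    """Redistribute expected purchases from home zip codes to store zip codes.

    purchase_zip_given_home_zip[h][s] is the (hypothetical) probability that a
    purchase by a resident of home zip h is made in store zip s.
    """
    totals = {}
    for home_zip, expected in expected_purchases_by_home_zip.items():
        for store_zip, prob in purchase_zip_given_home_zip[home_zip].items():
            totals[store_zip] = totals.get(store_zip, 0.0) + expected * prob
    return totals


# Hypothetical inputs: expected OTC purchases by residents of each home zip.
expected_purchases = {"15213": 100.0, "15132": 50.0}

# Hypothetical purchase-location distributions; each row sums to 1.
purchase_location = {
    "15213": {"15213": 0.8, "15132": 0.2},
    "15132": {"15213": 0.4, "15132": 0.6},
}

sales = expected_sales_by_store_zip(expected_purchases, purchase_location)
print(sales)
```

Because each row of the location distribution sums to 1, the total expected sales are preserved; only their assignment to zip codes changes.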

29
Outline
  1. Introduction
  2. Model
  3. Inference
  4. Conclusions

30
Challenges in Population-Wide Modeling Include
  • Obtaining good parameter estimates to use in
    modeling (e.g., the probability of an OTC cough
    medication purchase given an acute respiratory
    illness)
  • Modeling time and space in a way that is both
    useful and computationally tractable
  • Modeling contagious diseases

31
Conclusions
  • PANDA is a multivariate algorithm that can
    combine multiple data streams
  • Modeling each individual in the population is
    computationally feasible (so far)
  • An evaluation of the PANDA approach to modeling
    multiple data streams is in progress using
    semi-synthetic test data

32
  • Thank you
  • Current funding
      • National Science Foundation
      • Department of Homeland Security
  • Earlier funding
      • DARPA

http://www.cbmi.pitt.edu/panda/
gfc@cbmi.pitt.edu
34
The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class $E_i$ as a binomial distribution.
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$
35
The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class $E_i$ as a binomial distribution.
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i})$
$N_{E_i}$: number of people in Equivalence Class $E_i$
$P_{E_i}$: probability of an OTC cough medication purchase
during the previous 3 days by each person in
Equivalence Class $E_i$
36
The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class $E_i$ as a binomial distribution.
Approximate the binomial distribution as a normal
distribution.
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$
37
The PANDA OTC Model
Model the OTC purchases for each Equivalence
Class $E_i$ as a binomial distribution.
Approximate the binomial distribution as a normal
distribution.
$E_i \sim \mathrm{Binomial}(N_{E_i}, P_{E_i}) \approx \mathrm{Normal}(\mu_{E_i}, \sigma^2_{E_i})$
$\mu_{E_i} = N_{E_i} P_{E_i}$
$\sigma^2_{E_i} = N_{E_i} P_{E_i} (1 - P_{E_i})$
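The normal approximation on this slide can be checked numerically. The sketch below computes the mean and variance from the slide's formulas and compares the exact binomial probability with the approximating normal density at one point. The class size of 1000 and purchase probability of 0.1 are hypothetical values chosen for illustration, and the function names are made up here.

```python
import math


def binomial_to_normal(n_people, p_purchase):
    """Mean and variance of the normal approximation to Binomial(n, p)."""
    mean = n_people * p_purchase
    variance = n_people * p_purchase * (1.0 - p_purchase)
    return mean, variance


def binomial_pmf(k, n, p):
    """Exact probability of k purchases among n people."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def normal_pdf(x, mean, variance):
    """Normal density at x for the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


# Hypothetical equivalence class: 1000 people, each buying with probability 0.1.
n_people, p_purchase = 1000, 0.1
mean, variance = binomial_to_normal(n_people, p_purchase)

exact = binomial_pmf(100, n_people, p_purchase)
approx = normal_pdf(100, mean, variance)
print(f"mean={mean:.1f} variance={variance:.1f} exact={exact:.5f} approx={approx:.5f}")
```

The approximation is generally better for large classes with purchase probabilities not too close to 0 or 1.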
38
Computational Cost of a Population-Wide Approach?
1.4 million people in Allegheny County,
Pennsylvania
39
Equivalence Classes
The 1.4M people in the modeled population can be
partitioned into approximately 24,240
equivalence classes
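Partitioning a population into equivalence classes can be sketched as grouping people who share the same attribute values. The attributes below are assumed from the columns of the Equivalence Class Example table (age decile, gender, home zip, respiratory chief complaint, date admitted). The `Person` type and the four sample people are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Person(NamedTuple):
    """Attributes assumed to define an equivalence class (hypothetical schema)."""
    age_decile: str
    gender: str
    home_zip: str
    respiratory_cc: bool
    admitted_today: bool


# Hypothetical population of four people.
people = [
    Person("20-30", "Male", "15213", True, True),
    Person("20-30", "Male", "15213", True, True),
    Person("50-60", "Female", "15213", True, True),
    Person("50-60", "Male", "15213", False, False),
]

# People with identical attribute values fall into the same class,
# so inference can treat each class once, weighted by its size.
class_sizes = Counter(people)

for equivalence_class, size in class_sizes.items():
    print(size, equivalence_class)
```

Working per class rather than per person is what brings 1.4 million individuals down to the roughly 24,240 classes reported on the slide.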