Title: Bayesian Biosurveillance Using Causal Networks
1Bayesian Biosurveillance Using Causal Networks
Greg Cooper
RODS Laboratory and the Laboratory for Causal Modeling and Discovery
Center for Biomedical Informatics
University of Pittsburgh
2Outline
- Biosurveillance goals
- Biosurveillance as diagnosis of a population
- Introduction to causal networks
- Examples of using causal networks for biosurveillance
- Summary and challenges
3Biosurveillance Detection Goals
- Detect an unanticipated biological disease outbreak in the population as rapidly and as accurately as possible
- Determine the people who already have the disease
- Predict the people who are likely to get the disease
4Biosurveillance as Diagnosis of a Population
5The Similarity of Patient Diagnosis and
Population Diagnosis
Patient diagnosis: Patient risk factors -> Patient disease -> Patient symptom 1, Patient symptom 2
Population diagnosis: Population risk factors -> Population disease -> Symptoms of patient 1, Symptoms of patient 2
6Simple Examples of Patient Diagnosis and
Population Diagnosis
Patient diagnosis: smoking -> lung cancer -> weight loss, fatigue
Population diagnosis: threats of bioterrorism -> aerosolized release of anthrax -> Patient 1 has respiratory symptoms, Patient 2 has respiratory symptoms
7Population Diagnosis with a More Detailed Patient
Model
threats of bioterrorism
aerosolized release of anthrax
?
?
?
patient 1 disease status
patient 2 disease status
respiratory symptoms
respiratory symptoms
wide mediastinum on X-ray
wide mediastinum on X-ray
8Population-Level Symptoms
threats of bioterrorism
aerosolized release of anthrax
local sales of over-the-counter (OTC) cough
medications
patient 1 disease status
patient 2 disease status
respiratory symptoms
respiratory symptoms
wide mediastinum on X-ray
wide mediastinum on X-ray
9An Alternative Way of Modeling OTC Sales
threats of bioterrorism
aerosolized release of anthrax
patient 1 disease status
patient 2 disease status
wide mediastinum on X-ray
respiratory symptoms
wide mediastinum on X-ray
respiratory symptoms
local sales of over-the-counter (OTC) cough
medications
10threats of bioterrorism
aerosolized release of anthrax
sales of over-the-counter (OTC) cough medications
patient 1 disease status
patient 2 disease status
respiratory symptoms
respiratory symptoms
wide mediastinum on X-ray
wide mediastinum on X-ray
11An Introduction to Causal Networks
- A causal network has two components:
- Structure: A diagram in which nodes represent variables and arcs between nodes represent causal influence
- Parameters: A probability distribution for each effect given its direct causes
The diagram (graph) is not allowed to contain directed cycles, which conveys that an effect cannot cause itself.
12An Example of a Causal Network
Causal network structure
aerosolized release of anthrax (ARA)
patient disease status (PDS)
respiratory symptoms (RS)
Causal network parameters
P(ARA = true) = 0.000001
P(PDS = respiratory anthrax | ARA = true) = 0.001
P(PDS = respiratory anthrax | ARA = false) = 0.00000001
P(RS = present | PDS = respiratory anthrax) = 0.8
P(RS = present | PDS = other) = 0.1
These parameters are for illustration only.
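As a minimal sketch of how these five illustrative numbers define a usable model, the following Python code multiplies the local distributions to obtain the joint distribution over ARA, PDS, and RS, and then computes P(ARA = true | RS = present) by brute-force enumeration. The function and variable names are assumptions made for this example; they are not part of the original talk or of any particular software package.

```python
# Illustration-only parameters copied from the slide; not real epidemiological estimates.
P_ARA = 1e-6                                      # P(ARA = true)
P_PDS_GIVEN_ARA = {True: 1e-3, False: 1e-8}       # P(PDS = respiratory anthrax | ARA)
P_RS_GIVEN_PDS = {"anthrax": 0.8, "other": 0.1}   # P(RS = present | PDS)

PDS_STATES = ("anthrax", "other")


def joint(ara: bool, pds: str, rs: bool) -> float:
    """P(ARA, PDS, RS) as the product of the three local distributions."""
    p_ara = P_ARA if ara else 1.0 - P_ARA
    p_anthrax = P_PDS_GIVEN_ARA[ara]
    p_pds = p_anthrax if pds == "anthrax" else 1.0 - p_anthrax
    p_rs_present = P_RS_GIVEN_PDS[pds]
    p_rs = p_rs_present if rs else 1.0 - p_rs_present
    return p_ara * p_pds * p_rs


def posterior_release_given_symptoms() -> float:
    """P(ARA = true | RS = present), summing out the patient's disease status."""
    numerator = sum(joint(True, pds, True) for pds in PDS_STATES)
    evidence = sum(
        joint(ara, pds, True) for ara in (True, False) for pds in PDS_STATES
    )
    return numerator / evidence


if __name__ == "__main__":
    print("P(ARA = true | RS = present) =", posterior_release_given_symptoms())
```

With these numbers the posterior stays very close to the prior of 0.000001, because respiratory symptoms are common among patients who do not have anthrax.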
13A Previous Example of a Causal Network
threats of bioterrorism
aerosolized release of anthrax
sales of over-the-counter (OTC) cough medications
patient 1 disease status
patient 2 disease status
respiratory symptoms
respiratory symptoms
wide mediastinum on X-ray
wide mediastinum on X-ray
14The Causal Markov Condition
- The Causal Markov Condition
- Let D be the direct causes of a variable X in a causal network.
- Let Y be a variable that is not causally influenced by X (either directly or indirectly).
- Then X and Y are independent given D.
Example: aerosolized release of anthrax (Y) -> patient disease status (D) -> respiratory symptoms (X)
15A Key Intuition Behind the Causal Markov
Condition
- An effect is independent of its distant causes,
given its immediate causes
Example: aerosolized release of anthrax (Y) -> patient disease status (D) -> respiratory symptoms (X)
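The independence claim can be checked numerically. The sketch below rebuilds the ARA, PDS, RS chain using the illustration-only parameters given earlier in the talk and compares P(RS = present | PDS, ARA) for both values of ARA. The helper names (`prob`, `conditional`) are hypothetical and exist only for this example.

```python
from itertools import product

# Illustration-only parameters for the chain ARA -> PDS -> RS.
P_ARA_TRUE = 1e-6
P_ANTHRAX_GIVEN_ARA = {True: 1e-3, False: 1e-8}
P_RS_GIVEN_PDS = {"anthrax": 0.8, "other": 0.1}

STATES = {"ara": (True, False), "pds": ("anthrax", "other"), "rs": (True, False)}


def joint(ara, pds, rs):
    """Joint probability of one complete assignment."""
    p_ara = P_ARA_TRUE if ara else 1.0 - P_ARA_TRUE
    p_anthrax = P_ANTHRAX_GIVEN_ARA[ara]
    p_pds = p_anthrax if pds == "anthrax" else 1.0 - p_anthrax
    p_rs = P_RS_GIVEN_PDS[pds] if rs else 1.0 - P_RS_GIVEN_PDS[pds]
    return p_ara * p_pds * p_rs


def prob(**fixed):
    """Marginal probability of the partial assignment in `fixed`."""
    total = 0.0
    for ara, pds, rs in product(STATES["ara"], STATES["pds"], STATES["rs"]):
        world = {"ara": ara, "pds": pds, "rs": rs}
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(ara, pds, rs)
    return total


def conditional(target, given):
    """P(target | given) for two disjoint partial assignments."""
    return prob(**target, **given) / prob(**given)


if __name__ == "__main__":
    # Given the immediate cause (PDS), the distant cause (ARA) adds nothing.
    for ara in (True, False):
        print(ara, conditional({"rs": True}, {"pds": "anthrax", "ara": ara}))
    # Without conditioning on PDS, ARA does change the probability of RS.
    for ara in (True, False):
        print(ara, conditional({"rs": True}, {"ara": ara}))
```

The first pair of values agrees (up to floating-point rounding), while the second pair differs, which is the behavior the condition describes.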
16Joint Probability Distributions
- For a model with binary variables X and Y, the joint probability distribution is
- P(X = t, Y = t), P(X = t, Y = f), P(X = f, Y = t), P(X = f, Y = f)
- We can use the joint probability distribution to derive any conditional probability of interest on the model variables.
- Example: P(X = t | Y = t)
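For the two-variable example, the conditional probability follows from the definition of conditional probability, with the denominator obtained by summing the joint distribution over the values of X. This is the standard derivation, written out here for reference:

```latex
P(X = t \mid Y = t)
  = \frac{P(X = t,\, Y = t)}{P(Y = t)}
  = \frac{P(X = t,\, Y = t)}{P(X = t,\, Y = t) + P(X = f,\, Y = t)}
```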
17A Causal Network Specifies a Joint Probability
Distribution
- The causal Markov condition permits the joint probability distribution to be factored as follows.
- Example:
- P(RS, PDS, ARA) = P(RS | PDS) P(PDS | ARA) P(ARA)
ARA
PDS
RS
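The three-variable example is an instance of the general factorization for a causal (Bayesian) network, in which each variable is conditioned only on its direct causes. The notation $\mathrm{Pa}(X_i)$ for the set of direct causes (parents) of $X_i$ is an assumption of this note and does not appear in the slides:

```latex
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

% For the chain ARA -> PDS -> RS:
%   Pa(ARA) = \emptyset, \quad Pa(PDS) = \{ARA\}, \quad Pa(RS) = \{PDS\}
P(RS, PDS, ARA) = P(RS \mid PDS)\, P(PDS \mid ARA)\, P(ARA)
```

If all three variables are treated as binary, the factored form needs 1 + 2 + 2 = 5 numbers, which matches the five illustrative parameters listed earlier, whereas an unstructured joint distribution over three binary variables needs 7.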
18Causal Network Inference
- Inference algorithms exist for deriving a conditional probability of interest from the joint probability distribution defined by a causal network.
- Example: P(ARA | TOB, Pt1_RS, Pt2_WM, OTC)
threats of bioterrorism (TOB)
aerosolized release of anthrax (ARA)
sales of over-the-counter (OTC) cough medications
?
?
patient 1 (Pt1) disease status
?
patient 2 (Pt2) disease status
respiratory symptoms
respiratory symptoms (RS)
wide mediastinum on X-ray (WM)
wide mediastinum on X-ray
19Examples of Using Bayesian Inference on Causal
Networks for Biosurveillance
- The following models are highly simplified and serve as simple examples that suggest a set of research issues.
- They are intended only to illustrate basic principles.
- These models were implemented using Hugin (version 6.1): www.hugin.com
20Basic Population Model
21Prior Risk of Release of Agent X
22Basic Patient Model
23A Model with One Patient Case
24A Model with One Abstracted Patient Case
25Where do the probabilities come from?
- Databases of prior cases
- Case studies in the literature
- Animal studies
- Computer models (e.g., particle dispersion models)
- Expert assessments
26A Model with One Abstracted Patient Case
27An Example in Which a Single Patient Case Is
Inadequate to Detect a Release
Data: A patient who presents with respiratory symptoms today
28How Might We Distinguish Anticipated Diseases
(e.g., Influenza) from Unanticipated Diseases
(e.g., Respiratory Anthrax)?
- Differences in their expected spatio-temporal
patterns over the population may be very helpful.
29A Model with Two Patient Cases
30A Model with Three Patient Cases
31A Model with Ten Patient Cases
32A Hypothetical Population of Ten People (not all
of whom are patients)
Person  Home Location  Day of ED Visit  ED Symptoms
1       area 1         yesterday        respiratory
2       area 1         yesterday        non-respiratory
3       area 2         yesterday        non-respiratory
4       area 2         no visit to ED   NA
5       area 1         no visit to ED   NA
6       area 1         today            respiratory
7       area 2         today            non-respiratory
8       area 1         today            respiratory
9       area 1         no visit to ED   NA
10      area 2         no visit to ED   NA
33Posterior Probability of a Release of X Among the
Population of Ten People Being Modeled
34Adding Population-Based Data
Data: Increased OTC sales of cough medications today
35For Each Person in the Population a Probability
of Current Infection with Disease X Can be
Estimated
Person  Home Location  Day of ED Visit  ED Symptoms      Risk for Disease X (%)
1       area 1         yesterday        respiratory      26
2       area 1         yesterday        non-respiratory  9
3       area 2         yesterday        non-respiratory  6
4       area 2         no visit to ED   NA               < 1
5       area 1         no visit to ED   NA               < 1
6       area 1         today            respiratory      27
7       area 2         today            non-respiratory  11
8       area 1         today            respiratory      27
9       area 1         no visit to ED   NA               < 1
10      area 2         no visit to ED   NA               < 1
36Modeling the Frequency Distribution Over
the Number of Infected People
37The Frequency Distribution Over
the Number of Infected People in the
Example
38A More Detailed Patient Model
39Incorporating Heterogeneous Patient Models
Data: Same as before, except that patient 1 is now known to have a chest X-ray result that is consistent with Disease X
40We Can Use the Derived Posterior Probabilities in
a Computer-Based Ongoing Decision Analysis
sound an alarm
  P(dx X | evidence): U(alarm, dx X)
  P(no dx X | evidence): U(alarm, no dx X)
keep silent
  P(dx X | evidence): U(silent, dx X)
  P(no dx X | evidence): U(silent, no dx X)
The probabilities P(dx X | evidence) and P(no dx X | evidence), shown in blue on the slide, can be derived using a causal network.
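The decision tree amounts to an expected-utility comparison between the two actions. The sketch below shows that calculation under stated assumptions: the utility values are entirely hypothetical placeholders (the talk does not give any), and the function names are invented for this example.

```python
from typing import Dict, Tuple

# Hypothetical utilities U(action, state); larger is better.
# These numbers are placeholders for illustration only.
UTILITIES: Dict[str, Dict[str, float]] = {
    "sound an alarm": {"dx X": -10.0, "no dx X": -1.0},
    "keep silent": {"dx X": -1000.0, "no dx X": 0.0},
}


def expected_utility(p_dx: float, u_dx: float, u_no_dx: float) -> float:
    """Expected utility of one action given P(dx X | evidence) = p_dx."""
    return p_dx * u_dx + (1.0 - p_dx) * u_no_dx


def choose_action(
    p_dx: float, utilities: Dict[str, Dict[str, float]] = UTILITIES
) -> Tuple[str, Dict[str, float]]:
    """Return the action with the highest expected utility, plus all values."""
    values = {
        action: expected_utility(p_dx, u["dx X"], u["no dx X"])
        for action, u in utilities.items()
    }
    best = max(values, key=values.get)
    return best, values


if __name__ == "__main__":
    for p in (0.0001, 0.01, 0.5):
        print(p, choose_action(p))
```

With these placeholder utilities, the preferred action switches from keeping silent to sounding an alarm once the posterior probability of disease X exceeds roughly 0.001; a different utility table would move that threshold.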
41Summary of Bayesian Biosurveillance Using Causal
Networks
- Biosurveillance can be viewed as ongoing diagnosis of an entire population.
- Causal networks provide a flexible and expressive means of coherently modeling a population at different levels of detail.
- Inference on causal networks can derive the type of posterior probabilities needed for biosurveillance.
- These probabilities can be used in a decision analytic system that determines whether to raise an alarm (and that can recommend which additional data to collect).
42Challenges Include ...
43One Challenge: Modeling Contagious Diseases
- One approach: Include arcs among the disease-status nodes of individuals who were in close proximity to each other during the period of concern being modeled.
44Another Challenge: Achieving Tractable Inference on Very Large Causal Networks
- Possible approaches include
- Aggregating individuals into equivalence classes to reduce the size of the causal network
- Using sampling methods to reduce the time of inference (at the expense of deriving only approximate posterior probabilities)
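As one concrete instance of a sampling method, the following sketch uses likelihood weighting on a small release, disease status, symptoms chain and compares the estimate with the exact posterior. Likelihood weighting is chosen here only as a familiar example; the talk does not say which sampling method is intended. The parameter values are hypothetical and deliberately larger than realistic ones so that a modest number of samples gives a stable estimate.

```python
import random

# Hypothetical parameters, chosen so that sampling converges quickly.
P_RELEASE = 0.05
P_INFECTED = {True: 0.3, False: 0.001}    # P(infected | release)
P_SYMPTOMS = {True: 0.8, False: 0.1}      # P(resp. symptoms | infected)


def exact_posterior() -> float:
    """P(release | symptoms present), computed by summing out infection."""
    def p_sym(release: bool) -> float:
        p_inf = P_INFECTED[release]
        return p_inf * P_SYMPTOMS[True] + (1.0 - p_inf) * P_SYMPTOMS[False]

    numerator = P_RELEASE * p_sym(True)
    return numerator / (numerator + (1.0 - P_RELEASE) * p_sym(False))


def likelihood_weighting(n_samples: int, seed: int = 0) -> float:
    """Approximate P(release | symptoms present) by likelihood weighting.

    Non-evidence nodes are sampled from their conditional distributions;
    each sample is weighted by the probability of the observed evidence.
    """
    rng = random.Random(seed)
    weighted_release = 0.0
    total_weight = 0.0
    for _ in range(n_samples):
        release = rng.random() < P_RELEASE
        infected = rng.random() < P_INFECTED[release]
        weight = P_SYMPTOMS[infected]     # evidence: symptoms are present
        total_weight += weight
        if release:
            weighted_release += weight
    return weighted_release / total_weight


if __name__ == "__main__":
    print("exact      :", exact_posterior())
    print("approximate:", likelihood_weighting(100_000))
```

The approximation error shrinks as the number of samples grows. With realistic priors as small as the ones quoted earlier in the talk, far more samples (or a more targeted sampling scheme) would be needed, which is part of why tractable inference is listed as a challenge.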
45Some Additional Challenges
- Constructing realistic outbreak models
- Constructing realistic decision models about when to raise an alert
- Developing explanations of alerts
- Evaluating the detection system
46Suggested Reading
- R.E. Neapolitan, Learning Bayesian Networks
(Prentice Hall, 2003).
47A Sample of Causal Network Commercial Software
- Hugin www.hugin.com
- Netica www.norsys.com
- Bayesware www.bayesware.com