Title: Bayesian networks
1. Bayesian networks
2. Motivation
- We saw that the full joint probability distribution can be used to answer any question about the domain, but it can become intractable as the number of variables grows.
- Furthermore, specifying the probabilities of atomic events is rather unnatural and can be very difficult unless a large amount of data is available from which to gather statistics.
- Human performance, by contrast, exhibits a different complexity ordering: probabilistic judgments on conditional statements involving a small number of propositions are issued swiftly and reliably, while judging the likelihood of a conjunction of many propositions is done with a great degree of difficulty and hesitancy.
- This suggests that the elementary building blocks which make up human knowledge aren't the entries of a joint probability table but, rather, the low-order conditional probabilities defined over small clusters of propositions.
3. Bayesian networks
- A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
- Syntax:
  - a set of nodes, one per variable
  - a directed, acyclic graph (a link means "directly influences")
  - a conditional distribution for each node given its parents: P(Xi | Parents(Xi))
- The conditional distribution is represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values.
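As a concrete illustration of the syntax, here is a minimal Python sketch (not from the slides) of one way to store a single node: its parent tuple plus a CPT mapping each combination of parent values to P(X = true). It uses the Alarm node of the burglary example introduced on a later slide; the numeric values are the standard textbook ones and should be treated as an assumption here.

```python
# One possible representation of a Bayesian-network node: parents plus a CPT.
# The Alarm node from the burglary example; CPT values are the usual textbook
# numbers, assumed here since the slides do not list them.
alarm_node = {
    "parents": ("Burglary", "Earthquake"),
    "cpt": {  # (burglary, earthquake) -> P(Alarm = true)
        (True, True): 0.95,
        (True, False): 0.94,
        (False, True): 0.29,
        (False, False): 0.001,
    },
}

def p_alarm(value, burglary, earthquake):
    """P(Alarm = value | parent values): one number per CPT row suffices,
    since P(Alarm = false) is just 1 - P(Alarm = true)."""
    p_true = alarm_node["cpt"][(burglary, earthquake)]
    return p_true if value else 1.0 - p_true
```

Storing only P(X = true) in each row is what makes the counting argument on the compactness slide below work.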
4. Example
- Topology of network encodes conditional independence assertions:
  - Weather is independent of the other variables.
  - Toothache and Catch are conditionally independent given Cavity, which is indicated by the absence of a link between them.
5. Another Example
- I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
- Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
- Network topology reflects "causal" knowledge:
  - A burglar can set the alarm off
  - An earthquake can set the alarm off
  - The alarm can cause Mary to call
  - The alarm can cause John to call
6. Example cont'd
The topology shows that burglary and earthquakes directly affect the probability of the alarm, but whether Mary or John calls depends only on the alarm. Thus our assumptions are that they don't perceive any burglaries directly, and they don't confer before calling.
7. Compactness of Conditional Probability Tables (CPTs)
- A CPT for Boolean Xi with k Boolean parents has 2^k rows, one for each combination of parent values.
- Each row requires one number p for Xi = true (the number for Xi = false is just 1 − p).
- If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers.
- I.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
- For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31).
8. Semantics
- The full joint distribution is defined as the product of the local conditional distributions:
  - P(x1, …, xn) = ∏i=1..n P(xi | parents(Xi))
- e.g.,
  - P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
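A minimal sketch of this computation, assuming the standard textbook CPT values for the burglary network (the slides do not state them):

```python
# CPTs for the burglary network (standard textbook values; an assumption here).
P_b = 0.001                                          # P(Burglary = true)
P_e = 0.002                                          # P(Earthquake = true)
P_a = {(True, True): 0.95, (True, False): 0.94,      # P(Alarm = true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_j = {True: 0.90, False: 0.05}                      # P(JohnCalls = true | A)
P_m = {True: 0.70, False: 0.01}                      # P(MaryCalls = true | A)

# P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(p)  # about 0.000628: one CPT entry per node, multiplied together
```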
9. Constructing Bayesian networks
- 1. Choose an ordering of variables X1, …, Xn
- 2. For i = 1 to n:
  - add Xi to the network
  - select parents from X1, …, Xi−1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi−1)
- This choice of parents guarantees:
  - P(X1, …, Xn) = ∏i=1..n P(Xi | X1, …, Xi−1)   (chain rule)
  - = ∏i=1..n P(Xi | Parents(Xi))   (by construction)
10. Example
- The ordering of variables is very important.
- E.g., suppose we choose the ordering M, J, A, B, E.
- Adding MaryCalls: no parents.
- Adding JohnCalls: is P(J | M) = P(J)?
  - That is, is John's calling independent of Mary's calling?
  - Clearly not: on any given day, if Mary called, then the probability that John called is much higher than the background probability that he called.
  - So, we add a link from MaryCalls to JohnCalls.
11. Example
- Suppose we choose the ordering M, J, A, B, E.
- Adding the A (Alarm) node: is
  - P(A | J, M) = P(A | J)?
  - P(A | J, M) = P(A)?
- No.
- Clearly, if both call, it's more likely that the alarm has gone off than if just one or neither calls, so we need both MaryCalls and JohnCalls as parents.
12. Example
- Suppose we choose the ordering M, J, A, B, E.
- Adding the B (Burglary) node: is
  - P(B | A, J, M) = P(B | A)?
  - P(B | A, J, M) = P(B)?
- Yes for the first. No for the second.
- If we know the alarm state, then a call from John or Mary might give us information about the phone ringing or Mary's music, but not about a burglary.
- So, we need just Alarm as parent.
13. Example
- Suppose we choose the ordering M, J, A, B, E.
- Adding the E (Earthquake) node: is
  - P(E | B, A, J, M) = P(E | A)?
  - P(E | B, A, J, M) = P(E | A, B)?
- No for the first. Yes for the second.
- If the alarm is on, it is more likely that there has been an earthquake.
- But if we know there has been a burglary, then that explains the alarm, and the probability of an earthquake would be only slightly above normal.
- Hence we need both Alarm and Burglary as parents.
14. Example cont'd
- So, the network is less compact if we go non-causal: 1 + 2 + 4 + 2 + 4 = 13 numbers needed, instead of 10 if we go in the causal direction.
- Deciding conditional independence is hard in non-causal directions.
- Causal models and conditional independence seem hardwired for humans!
15. So
- Bayesian networks provide a natural representation for (causally induced) conditional independence.
- Topology + CPTs = compact representation of the joint distribution.
- Generally easy for domain experts to construct.
16. Noisy-OR
- Even if the maximum number of parents k is small, filling in the CPT for a node is tedious.
- Uncertain relationships can often be characterized by a so-called noisy-OR relation, which is a generalization of logical OR.
- In propositional logic, we might say that Fever is true if Cold, Flu, or Malaria is true.
- The noisy-OR allows for uncertainty about the ability of each parent cause to make the child true.
- The causal relationship may be inhibited, and so a patient could have a cold but not exhibit a fever.
- So, suppose we managed to find out the probabilities of inhibition:
  - P(¬fever | cold, ¬flu, ¬malaria) = 0.6
  - P(¬fever | ¬cold, flu, ¬malaria) = 0.2
  - P(¬fever | ¬cold, ¬flu, malaria) = 0.1
- Now, we can easily construct the truth table.
17. Observe that by using the noisy-OR we needed to specify only 3 entries instead of 8. In general, for k parents, we need to specify only k entries instead of 2^k.
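A minimal sketch of that construction, using only the three inhibition probabilities given above: each present cause independently fails to produce fever with its inhibition probability, and with no cause present, fever is assumed false (no leak term).

```python
from itertools import product

# Inhibition probabilities from the slide: P(¬fever | only that cause present).
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

# Build the full 8-row CPT for Fever from just these 3 numbers.
for cold, flu, malaria in product([False, True], repeat=3):
    states = {"cold": cold, "flu": flu, "malaria": malaria}
    p_not_fever = 1.0
    for name, present in states.items():
        if present:
            p_not_fever *= q[name]  # each present cause is independently inhibited
    # With no cause present, p_not_fever stays 1.0, so P(fever) = 0.
    print(states, "P(fever) =", round(1.0 - p_not_fever, 3))
```

For example, with cold and flu both present the table row is P(fever) = 1 − 0.6 × 0.2 = 0.88.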
18. Exact Inference in Bayesian Networks
- The basic task for any probabilistic inference system is to compute the posterior probability for a set of query variables, given some observed event, that is, some assignment of values to a set of evidence variables.
- Notation:
  - X denotes the query variable.
  - E denotes the set of evidence variables E1, …, Em, and e is a particular event, i.e. an assignment to the variables in E.
  - Y denotes the set of the remaining variables (hidden variables).
- A typical query asks for the posterior probability P(X = x | e), i.e. P(x | e1, …, em).
- E.g., we could ask: What's the probability of a burglary if both Mary and John call, P(burglary | johncalls, marycalls)?
19. Inference by enumeration
- A slightly intelligent way to sum out variables from the joint without actually constructing its explicit representation.
20. Numerically
Complete it as an exercise:
- P(b | j, m) = α P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a) ≈ α × 0.00059
- P(¬b | j, m) = α P(¬b) Σe P(e) Σa P(a | ¬b, e) P(j | a) P(m | a) ≈ α × 0.0015
- P(B | j, m) = α ⟨0.00059, 0.0015⟩ ≈ ⟨0.284, 0.716⟩
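A minimal sketch of the enumeration above, again assuming the standard textbook CPTs for the burglary network (the slides give only the final numbers):

```python
from itertools import product

# Standard textbook CPTs for the burglary network (an assumption here).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm = true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls = true | A)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls = true | A)

def unnormalized(b):
    """P(b) Σe P(e) Σa P(a | b, e) P(j | a) P(m | a), with j, m observed true."""
    total = 0.0
    for e, a in product([True, False], repeat=2):
        p_a = P_A[(b, e)] if a else 1.0 - P_A[(b, e)]
        total += P_E[e] * p_a * P_J[a] * P_M[a]
    return P_B[b] * total

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())                # normalization constant α
print({b: round(alpha * s, 3) for b, s in scores.items()})
# -> {True: 0.284, False: 0.716}, matching the numbers above
```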