Title: Constructing Bayesian Networks
1. Constructing Bayesian Networks
- Part 2 of Heckerman Presentation
2. Today
- Building a Bayesian network
- In getting there, we'll review concepts from last time
- Slide help from Andrew Moore of CMU
3. Conditional Probability
- P(A|B) is the fraction of worlds in which B is true that also have A true
- H = Have a headache; F = Coming down with flu
- P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
- Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache.
[Venn diagram: overlapping regions H and F]
4. Conditional Probability

P(H|F) = fraction of flu-inflicted cases in which you have a headache
       = (# cases with flu and headache) / (# cases with flu)
       = (area of H-and-F region) / (area of F region)
       = P(H and F) / P(F)

- H = Have a headache; F = Coming down with flu
- Let P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2
5. Conditional Probability
- Does this mean that if you have a headache, you have a 50/50 chance of getting the flu?
- No. We said that P(H|F) = P(F and H) / P(F), so P(F and H) = P(H|F) P(F) = 1/2 × 1/40 = 1/80. Using that in P(F|H) = P(F and H) / P(H) gives P(F|H) = (1/80) / (1/10) = 1/8. Not 1/2.
6. This is Bayes' Rule

P(B|A) = P(A and B) / P(A) = P(A|B) P(B) / P(A)

- Corollary, the Chain Rule: P(A and B) = P(A|B) P(B)
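As a quick sanity check, here is the headache/flu arithmetic from the last two slides in Python (a minimal sketch; every number comes from the slides):

```python
# Numbers from the slides.
p_h = 1 / 10         # P(H): probability of a headache
p_f = 1 / 40         # P(F): probability of flu
p_h_given_f = 1 / 2  # P(H|F): headache given flu

# Chain rule: P(F and H) = P(H|F) * P(F)
p_f_and_h = p_h_given_f * p_f     # = 1/80

# Bayes' rule: P(F|H) = P(F and H) / P(H)
p_f_given_h = p_f_and_h / p_h     # = 1/8, not 1/2

print(p_f_and_h, p_f_given_h)     # 0.0125 0.125
```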
7. Toward a Bayesian Network
- A BN is a graphical model that efficiently encodes the joint probability distribution for a large set of variables
- Let's discuss an extended example that will lead us to creating a Bayesian network
8. Independent Variables
- Say we have two events:
  - M = Mike gives the seminar, else Mike doesn't (not multi-valued for this example)
  - S = the sun is shining
- The joint PD has four entries. We can certainly say P(S) and P(M), but how do we get the joint probabilities?
- Assume sunshine doesn't depend on and doesn't influence who is giving the seminar. Formally, P(S|M) = P(S)
9. Example Probability Distribution
- Let P(M) = 0.3
- Let P(S) = 0.9
- Say P(S|M) = P(S)
- We can fill the table because P(M and S) = P(M|S) P(S) = P(M) P(S)
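To make the independence assumption concrete, here is a minimal sketch that fills all four joint entries from just P(M) and P(S):

```python
# P(M and S) = P(M) * P(S) under the assumption P(S|M) = P(S);
# the same factoring fills the other three entries.
p_m, p_s = 0.3, 0.9

joint = {
    (m, s): (p_m if m else 1 - p_m) * (p_s if s else 1 - p_s)
    for m in (True, False)
    for s in (True, False)
}

for (m, s), p in joint.items():
    print(f"P(M={m}, S={s}) = {p:.2f}")

# The four entries form a valid joint distribution.
assert abs(sum(joint.values()) - 1.0) < 1e-9
```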
10. Implications of Independence
- L = Speaker arrives late
- Assume all speakers are sometimes affected by the weather
- Assume Mike is the speaker most likely to arrive late
- Thus, lateness is not independent of weather and is not independent of speaker
- Since we know the PD of S and M, we need P(L | S=s and M=m) for all true/false combinations of s and m
11. Three Variables

P(L | M and S) = 0.1
P(L | M and ¬S) = 0.2
P(L | ¬M and S) = 0.05
P(L | ¬M and ¬S) = 0.1
P(S | M) = P(S), with P(S) = 0.9 and P(M) = 0.3

- With six numbers instead of seven, we've defined a full joint PDF (verified in the sketch below).
- This savings increases with the number of variables under consideration.
- Now we can draw our first network.
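A short sketch verifying the six-numbers claim: the factorization P(L, M, S) = P(L | M, S) P(M) P(S) yields all 2^3 = 8 joint entries, and they sum to 1. (It uses the negation placement reconstructed above.)

```python
p_m, p_s = 0.3, 0.9
p_l_given = {           # P(L=true | M=m, S=s), from the slide's table
    (True, True): 0.1,
    (True, False): 0.2,
    (False, True): 0.05,
    (False, False): 0.1,
}

def joint(l, m, s):
    """P(L=l, M=m, S=s) = P(L=l | M=m, S=s) * P(M=m) * P(S=s)."""
    p_ms = (p_m if m else 1 - p_m) * (p_s if s else 1 - p_s)
    p_l = p_l_given[(m, s)]
    return p_ms * (p_l if l else 1 - p_l)

total = sum(joint(l, m, s)
            for l in (True, False)
            for m in (True, False)
            for s in (True, False))
assert abs(total - 1.0) < 1e-9   # six numbers define a full joint PDF
```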
12. First Bayesian Network

P(M) = 0.3    P(S) = 0.9

[Network diagram: nodes M and S, each with an arrow into L]

P(L | M and S) = 0.1
P(L | M and ¬S) = 0.2
P(L | ¬M and S) = 0.05
P(L | ¬M and ¬S) = 0.1

- Lack of an M→S arrow means knowing S won't help me predict M
- The M→L and S→L arrows mean knowing M or S may help me learn L
13. Conditional Independence
- Consider three events:
  - B = seminar concerns Bayesian networks
  - M = Mike gives the talk
  - L = speaker arrives late
- Can we find any independence? If we know the speaker, lateness does not affect whether the talk concerns BNs
- B and L are conditionally independent given M:
  - P(B | M, L) = P(B | M) and P(B | ¬M, L) = P(B | ¬M)
14. Conditional Independence
- Since we know that L is only influenced by M, we can write down values of P(L|M) and P(L|¬M)
- The same is true for B

P(M) = 0.3
P(L | M) = 0.2
P(L | ¬M) = 0.1
P(B | M) = 0.8
P(B | ¬M) = 0.1

[Network diagram: node M with arrows to B and L]

Given knowledge of M, knowing anything else in the diagram won't help us with L, etc.
15. Putting it all together
- T = Seminar starts by 3:35 pm
- S, M, B, L all same as before
- T only influenced by L (it is CI of B, M, S given L)
- L only influenced by M, S (it is CI of B given M, S)
- B only influenced by M (it is CI of L, S given M)
- M and S are independent
16. Constructing the Network
- Steps
  - Add variables
  - Add links between variables
  - Add a probability table for each node
- Rules
  - Acyclic
  - Table for node X must list P(X | parents(X)) for each combination of parent values
  - Each node must be conditionally independent of all non-descendants given its parents (see the sketch below)
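As an illustration of the three steps, here is one way to write the finished five-node network down as plain Python data: each node records its parents and a table mapping parent values to P(node=true | parents). All numbers come from the earlier slides except T's table, which the slides never give; the 0.3/0.9 pair below is made up for illustration.

```python
network = {
    "M": {"parents": (), "cpt": {(): 0.3}},    # P(M) = 0.3
    "S": {"parents": (), "cpt": {(): 0.9}},    # P(S) = 0.9
    "B": {"parents": ("M",),
          "cpt": {(True,): 0.8, (False,): 0.1}},   # P(B|M), P(B|¬M)
    "L": {"parents": ("M", "S"),
          "cpt": {(True, True): 0.1, (True, False): 0.2,
                  (False, True): 0.05, (False, False): 0.1}},
    # P(T|L) is not given on the slides; these two values are hypothetical.
    "T": {"parents": ("L",),
          "cpt": {(True,): 0.3, (False,): 0.9}},
}
```

Note the rules are respected: the graph is acyclic, and each table lists P(X | parents(X)) for every combination of parent values.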
17. The Network

[Full network diagram: S→L, M→L, M→B, L→T]

This allows us to compute a joint entry, i.e., what is P(T and B and L and M and S)?
18. Computing a Joint Entry

P(T and B and L and M and S)
= P(T | B, L, M, S) P(B, L, M, S)               [Expand]
= P(T | L) P(B, L, M, S)                        [CI! Reduce]
= P(T | L) P(B | L, M, S) P(L, M, S)            [Expand]
= P(T | L) P(B | M) P(L, M, S)                  [CI! Reduce]
= P(T | L) P(B | M) P(L | M, S) P(M, S)         [Expand]
= P(T | L) P(B | M) P(L | M, S) P(M | S) P(S)   [Expand]
= P(T | L) P(B | M) P(L | M, S) P(M) P(S)       [Use independence to reduce]

We have all these in the BN.
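Plugging the BN's numbers into the final product (with the hypothetical P(T|L) = 0.3 from the sketch under slide 16, since the slides never give T's table):

```python
#    P(T|L) P(B|M) P(L|M,S) P(M)  P(S)
p = 0.3 *  0.8 *  0.1 *    0.3 * 0.9
print(p)   # 0.00648
```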
19. In General

P(X1 = x1 and X2 = x2 and … and Xn-1 = xn-1 and Xn = xn) = ∏ i=1..n P(Xi = xi | parents(Xi))
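A generic sketch of this formula, reusing the hypothetical `network` dict from slide 16: one CPT lookup per node, conditioned on that node's parents, multiplied together.

```python
def joint_probability(network, assignment):
    """P(X1=x1, ..., Xn=xn) = product over i of P(Xi=xi | parents(Xi))."""
    p = 1.0
    for name, node in network.items():
        parent_vals = tuple(assignment[par] for par in node["parents"])
        p_true = node["cpt"][parent_vals]     # P(name=true | parent values)
        p *= p_true if assignment[name] else 1 - p_true
    return p

# The joint entry computed on the previous slide:
print(joint_probability(
    network, {"T": True, "B": True, "L": True, "M": True, "S": True}
))  # 0.00648
```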
20. That's All For Today