Title: Probabilistic Reasoning System 2: Bayesian network
Slide 1: Probabilistic Reasoning System 2: Bayesian Networks
- CS 570
- Team 12
Slide 2: Today's topics
- 1. Inference in multiply connected networks
  - Clustering algorithms
  - Cutset conditioning methods
- 2. Approximate inference (Monte Carlo algorithms)
  - Direct sampling methods
  - Rejection sampling in Bayesian networks
  - Likelihood weighting
  - Markov chain Monte Carlo algorithm
- 3. Implementation tools and applications of Bayesian networks
Slide 3: Singly connected networks (polytrees)
- A singly connected network is a network in which there is at most one undirected path between any two nodes.
- Size is defined as the number of CPT entries.
- Example: the burglary network. Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.

  P(B) = .001    P(E) = .002

  B  E  P(A)        A  P(J)        A  P(M)
  t  t  .95         t  .90         t  .70
  t  f  .94         f  .05         f  .01
  f  t  .29
  f  f  .001
Slide 4: Multiply connected networks
- A network in which two nodes are connected by more than one path. That means:
  - two or more possible causes for some variable, or
  - one variable can influence another variable through more than one causal mechanism.
- Example: the sprinkler network. Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass.

  P(C) = .5

  C  P(S)        C  P(R)        S  R  P(W)
  t  .10         t  .80         t  t  .99
  f  .50         f  .20         t  f  .90
                                f  t  .90
                                f  f  .00
Slide 5: Clustering algorithms (1)
- Combine individual nodes of the network into cluster nodes (meganodes) in such a way that the resulting network is a polytree (singly connected network).
- Each meganode has only one parent.
- Inference time can then be reduced to O(n).
Slide 6: Clustering algorithms (2)
- Before clustering: the multiply connected sprinkler network (Cloudy -> Sprinkler, Cloudy -> Rain, Sprinkler -> WetGrass, Rain -> WetGrass).
- After clustering: Sprinkler and Rain are merged into the meganode Spr+Rain, giving the polytree Cloudy -> Spr+Rain -> WetGrass.
- The meganode CPT is the product of the original Sprinkler and Rain CPTs (a short derivation in code follows the tables).

  P(C) = .5

  C  P(S)        C  P(R)
  t  .10         t  .80
  f  .50         f  .20

  C  P(S,R|C):  S,R = t,t   t,f   f,t   f,f
  t                   .08   .02   .72   .18
  f                   .10   .40   .10   .40

  S  R  P(W)
  t  t  .99
  t  f  .90
  f  t  .90
  f  f  .00
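To make the arithmetic behind the Spr+Rain table concrete, here is a minimal Python sketch (the encoding and names are mine, not from the slides) that derives the meganode CPT from the original CPTs, using the fact that the network makes Sprinkler and Rain conditionally independent given Cloudy:

  # Derive the CPT of the Spr+Rain meganode from the original CPTs.
  # Sprinkler and Rain are conditionally independent given Cloudy,
  # so each joint entry is a simple product.
  p_sprinkler = {True: 0.10, False: 0.50}  # P(Sprinkler=true | Cloudy)
  p_rain      = {True: 0.80, False: 0.20}  # P(Rain=true | Cloudy)

  for cloudy in (True, False):
      for s in (True, False):
          for r in (True, False):
              ps = p_sprinkler[cloudy] if s else 1 - p_sprinkler[cloudy]
              pr = p_rain[cloudy] if r else 1 - p_rain[cloudy]
              print(f"C={cloudy} S={s} R={r}  P(S,R|C) = {ps * pr:.2f}")

Running this reproduces the table above: <.08, .02, .72, .18> for Cloudy = true and <.10, .40, .10, .40> for Cloudy = false.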
Slide 7: Cutset conditioning
- A cutset is a set of variables that can be instantiated so as to yield polytrees (see the identity below).
- In each resulting polytree, the cutset variables are instantiated to definite values.
- The number of resulting polytrees is exponential in the size of the cutset, so the smallest cutset is best.
- Evaluating the most likely polytrees first is called bounded cutset conditioning.
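The recombination step can be written explicitly (a standard identity, not spelled out on the slide): with cutset C,

  P(X|e) = Σ_c P(X | e, c) P(c | e)

where the sum ranges over the instantiations c of the cutset, and each term P(X | e, c) is computed by linear-time polytree inference.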
Slide 8: Cutset conditioning example [figure]
Slide 9
- 1. Inference in multiply connected networks
  - Clustering algorithms
  - Cutset conditioning methods
- 2. Approximate inference (Monte Carlo algorithms)
  - Direct sampling methods
  - Rejection sampling in Bayesian networks
  - Likelihood weighting
  - Markov chain Monte Carlo algorithm
- 3. Implementation tools and applications of Bayesian networks
Slide 10: Monte Carlo algorithms
- Provide approximate answers whose accuracy depends on the number of samples generated.
- Two families:
  - Direct sampling
  - Markov chain sampling
Slide 11: Direct sampling methods (1)
- Assumes no evidence is associated with the network.
- The algorithm samples each variable in turn, in topological order: for nodes without parents, sample from their prior distribution; for nodes with parents, sample from the conditional distribution given the already-sampled parent values.
- Example:
  - Sample from P(Cloudy) = <0.5, 0.5>; suppose this returns true.
  - Sample from P(Sprinkler | Cloudy = true) = <0.1, 0.9>; suppose false.
  - Sample from P(Rain | Cloudy = true) = <0.8, 0.2>; suppose true.
  - Sample from P(WetGrass | Sprinkler = false, Rain = true) = <0.9, 0.1>; suppose true.
Slide 12: Direct sampling example: the dog network
- Network: FamilyOut (F) and BowelProblems (B) are the parents of DogOut (D); FamilyOut is the parent of LightsOn (L); DogOut is the parent of HearDogBark (H).

  P(F) = .2     P(B) = .05

  F  P(L|F)       D  P(H|D)       F  B  P(D|F,B)
  t  .99          t  .60          t  t  .994
  f  .10          f  .25          t  f  .88
                                  f  t  .96
                                  f  f  .2

- Query: what is the probability that the family is at home, the dog has no bowel problems and is not out, the light is off, and the dog's barking can be heard? That is, P(¬f, ¬b, ¬l, ¬d, h).
- We generate 100,000 samples from the network and obtain N_PS(¬f, ¬b, ¬l, ¬d, h) = 13740, giving the estimate 13740/100000 = 0.1374.
- The exact value is P(¬f, ¬b, ¬l, ¬d, h) = 0.8 × 0.95 × 0.9 × 0.8 × 0.25 = 0.1368 (a quick check in code follows).
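A sanity check of the chain-rule product above, in plain Python with the values read straight from the CPTs:

  # P(¬f) * P(¬b) * P(¬l|¬f) * P(¬d|¬f,¬b) * P(h|¬d)
  p = (1 - 0.2) * (1 - 0.05) * (1 - 0.1) * (1 - 0.2) * 0.25
  print(p)  # 0.1368 (up to floating-point rounding)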
Slide 13: Direct sampling methods (2): the PRIOR-SAMPLE algorithm

  function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
    inputs: bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)
    x <- an event with n elements
    for i = 1 to n do
      x[i] <- a random sample from P(Xi | parents(Xi))
    return x
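A runnable Python sketch of PRIOR-SAMPLE for the sprinkler network (the dict-based CPT encoding and names are my own, not from the slides):

  import random

  # CPTs encoded as P(X=true | parent values); keys are tuples of
  # parent values, in the order given by each node's parent list.
  network = {
      "Cloudy":    ((), {(): 0.5}),
      "Sprinkler": (("Cloudy",), {(True,): 0.10, (False,): 0.50}),
      "Rain":      (("Cloudy",), {(True,): 0.80, (False,): 0.20}),
      "WetGrass":  (("Sprinkler", "Rain"),
                    {(True, True): 0.99, (True, False): 0.90,
                     (False, True): 0.90, (False, False): 0.00}),
  }
  ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]  # topological order

  def prior_sample(bn):
      """Sample one complete event from the prior defined by bn."""
      x = {}
      for var in ORDER:
          parents, cpt = bn[var]
          p_true = cpt[tuple(x[p] for p in parents)]
          x[var] = random.random() < p_true
      return x

  print(prior_sample(network))  # e.g. {'Cloudy': True, 'Sprinkler': False, ...}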
Slide 14: Direct sampling methods (3)
- Looking at the sampling process, the probability that PRIOR-SAMPLE generates a particular event is
    S_PS(x1, ..., xn) = Π_i P(xi | parents(Xi)) = P(x1, ..., xn),
  i.e., exactly the joint probability the network assigns to that event.
- Hence the frequency estimate
    P^(x1, ..., xn) = N_PS(x1, ..., xn) / N
  becomes exact in the large-sample limit; we call such an estimate "consistent".
Slide 15: Rejection sampling (1)
- A method to approximate conditional probabilities P(X|e) for given evidence e:
    P(X|e) ≈ N_PS(X, e) / N_PS(e)
- 1. First, generate samples from the prior distribution specified by the network.
- 2. Then, reject all those that do not match the evidence.
- 3. Finally, the estimate of P(X = x | e) is obtained by counting how often X = x occurs in the remaining samples.
Slide 16: Rejection sampling (2): algorithm

  function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X|e)
    inputs: X, the query variable
            e, evidence specified as an event
            bn, a Bayesian network
            N, the total number of samples to be generated
    local variables: N[X], a vector of counts over X, initially zero
    for j = 1 to N do
      x <- PRIOR-SAMPLE(bn)
      if x is consistent with e then
        N[x] <- N[x] + 1, where x is the value of X in x
    return NORMALIZE(N[X])
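The same idea as runnable Python, reusing prior_sample and network from the slide 13 sketch:

  from collections import Counter

  def rejection_sampling(X, e, bn, N):
      """Estimate P(X|e): sample from the prior, discard samples
      inconsistent with the evidence e, and count the survivors."""
      counts = Counter()
      for _ in range(N):
          x = prior_sample(bn)
          if all(x[var] == val for var, val in e.items()):
              counts[x[X]] += 1
      total = sum(counts.values()) or 1   # guard against zero survivors
      return {val: n / total for val, n in counts.items()}

  # The query from the next slide: P(Rain | Sprinkler = true)
  print(rejection_sampling("Rain", {"Sprinkler": True}, network, 100_000))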
Slide 17: Rejection sampling (3): example
- Suppose we want to estimate P(Rain | Sprinkler = true) from 100 samples:
  - 73 samples have Sprinkler = false and are rejected.
  - 27 samples have Sprinkler = true; of these, 8 have Rain = true and 19 have Rain = false.
- P(Rain | Sprinkler = true) ≈ NORMALIZE(<8, 19>) = <0.296, 0.704>.
- The true answer is <0.3, 0.7>.
- As more samples are collected, the estimate converges to the true answer.
Slide 18: Rejection sampling (4): problems
- Rejection sampling rejects too many samples:
  - The fraction of samples consistent with the evidence e drops exponentially as the number of evidence variables grows.
  - This makes the procedure unusable for complex problems.
Slide 19: Likelihood weighting (1)
- Avoids the inefficiency of rejection sampling by generating only events that are consistent with the evidence e.
- That is, it fixes the values of the evidence variables E and samples only the remaining variables X and Y, so every generated event is consistent with the evidence.
- It then weights each sample by the product of the conditional probabilities of the evidence variables, each given its parents.
Slide 20: Likelihood weighting (2): example for one sample
- Query: P(Rain | Sprinkler = true, WetGrass = true)
- The weight w is initialized to 1.0.
- 1. Sample from P(Cloudy) = <0.5, 0.5>; suppose true.
- 2. Sprinkler is an evidence variable with value true:
     w <- w × P(Sprinkler = true | Cloudy = true) = 1.0 × 0.1 = 0.1
- 3. Sample from P(Rain | Cloudy = true) = <0.8, 0.2>; suppose true.
- 4. WetGrass is an evidence variable with value true:
     w <- w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.1 × 0.99 = 0.099
Slide 21: Likelihood weighting (3): algorithm

  function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X|e)
    inputs: X, the query variable
            e, evidence specified as an event
            bn, a Bayesian network
            N, the total number of samples to be generated
    local variables: W[X], a vector of weighted counts over X, initially zero
    for j = 1 to N do
      x, w <- WEIGHTED-SAMPLE(bn, e)
      W[x] <- W[x] + w, where x is the value of X in x
    return NORMALIZE(W[X])
  ______________________________________________________________
  function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
    x <- an event with n elements; w <- 1
    for i = 1 to n do
      if Xi has a value xi in e
        then w <- w × P(Xi = xi | parents(Xi))
        else x[i] <- a random sample from P(Xi | parents(Xi))
    return x, w
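The same pseudocode as runnable Python, again reusing the network encoding and ORDER from the slide 13 sketch:

  import random
  from collections import Counter

  def weighted_sample(bn, e):
      """Return one event consistent with e plus its likelihood weight."""
      x, w = {}, 1.0
      for var in ORDER:
          parents, cpt = bn[var]
          p_true = cpt[tuple(x[p] for p in parents)]
          if var in e:                      # evidence: fix value, update weight
              x[var] = e[var]
              w *= p_true if e[var] else 1 - p_true
          else:                             # nonevidence: sample as usual
              x[var] = random.random() < p_true
      return x, w

  def likelihood_weighting(X, e, bn, N):
      """Estimate P(X|e) from N weighted samples."""
      W = Counter()
      for _ in range(N):
          x, w = weighted_sample(bn, e)
          W[x[X]] += w
      total = sum(W.values())
      return {val: wt / total for val, wt in W.items()}

  # The query from slide 20: P(Rain | Sprinkler = true, WetGrass = true)
  print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True},
                             network, 100_000))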
Slide 22: Likelihood weighting (4): example for one query
- The dog network from slide 12 again: P(F) = .2, P(B) = .05, with the CPTs P(L|F), P(D|F,B), and P(H|D) as given there.
Slide 23: Likelihood weighting (4): example for one query (cont.)
- For N = 100000 samples, the obtained estimate closely matches the correct answer. [Figure with the numeric results.]
Slide 24: Markov chain Monte Carlo algorithm (1)
- Some vocabulary:
  - A node is conditionally independent of all other nodes in the network given its parents, children, and children's parents, that is, given its Markov blanket. For example, Burglary is independent of JohnCalls and MaryCalls given Alarm and Earthquake.
  - A sequence of discrete random variables X0, X1, ... is called a Markov chain with state space S iff
      P(Xn = xn | Xn-1 = xn-1, ..., X0 = x0) = P(Xn = xn | Xn-1 = xn-1).
    Thus, Xn is conditionally independent of all earlier variables given Xn-1.
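The sampling distribution used by the algorithm on the next slide follows directly from the Markov blanket property (a standard result, not printed on this slide): for each nonevidence variable Xi,

  P(xi | mb(Xi)) ∝ P(xi | parents(Xi)) × Π_{Yj ∈ Children(Xi)} P(yj | parents(Yj))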
Slide 25: Markov chain Monte Carlo algorithm (2)

  function MCMC-ASK(X, e, bn, N) returns an estimate of P(X|e)
    local variables: N[X], a vector of counts over X, initially zero
                     Z, the nonevidence variables in bn
                     x, the current state of the network, initially copied from e
    initialize x with random values for the variables in Z
    for j = 1 to N do
      N[x] <- N[x] + 1, where x is the value of X in x
      for each Zi in Z do
        sample the value of Zi in x from P(Zi | mb(Zi)), given the values of MB(Zi) in x
    return NORMALIZE(N[X])
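A runnable sketch of MCMC-ASK (Gibbs sampling) in the same encoding as the earlier sketches; markov_blanket_prob implements the proportionality noted after slide 24:

  import random
  from collections import Counter

  def markov_blanket_prob(var, value, x, bn):
      """Unnormalized P(var=value | mb(var)): the node's own CPT entry
      times the CPT entries of its children."""
      x = dict(x, **{var: value})
      parents, cpt = bn[var]
      p = cpt[tuple(x[q] for q in parents)]
      prob = p if value else 1 - p
      for child, (cparents, ccpt) in bn.items():
          if var in cparents:
              pc = ccpt[tuple(x[q] for q in cparents)]
              prob *= pc if x[child] else 1 - pc
      return prob

  def mcmc_ask(X, e, bn, N):
      """Gibbs sampling: repeatedly resample each nonevidence variable
      from P(. | mb), counting the value of X in every state visited."""
      Z = [v for v in bn if v not in e]
      x = dict(e, **{z: random.random() < 0.5 for z in Z})
      counts = Counter()
      for _ in range(N):
          counts[x[X]] += 1
          for z in Z:
              pt = markov_blanket_prob(z, True, x, bn)
              pf = markov_blanket_prob(z, False, x, bn)
              x[z] = random.random() < pt / (pt + pf)
      total = sum(counts.values())
      return {val: n / total for val, n in counts.items()}

  # Example 1's query (slide 27): P(Rain | Sprinkler = true, WetGrass = true)
  print(mcmc_ask("Rain", {"Sprinkler": True, "WetGrass": True}, network, 100_000))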
Slide 26: Markov chain Monte Carlo algorithm (3)
- Think of the network as being in a particular current state that specifies a value for every variable.
- MCMC generates each event by making a random change to the preceding event.
- The next state is generated by randomly sampling a value for one of the nonevidence variables Xi, conditioned on the current values of the variables in the Markov blanket of Xi.
Slide 27: Markov chain Monte Carlo: example 1
- Query: P(Rain | Sprinkler = true, WetGrass = true)
- The initial state (Cloudy, Sprinkler, Rain, WetGrass) is [true, true, false, true].
- The following steps are executed repeatedly:
  - Cloudy is sampled, given the current values of its Markov blanket variables: we sample from P(Cloudy | Sprinkler = true, Rain = false). Suppose the result is Cloudy = false; the current state becomes [false, true, false, true].
  - Rain is sampled, given the current values of its Markov blanket variables: we sample from P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true). Suppose the result is Rain = true; the current state becomes [false, true, true, true].
- After all the iterations, say the process visited 20 states where Rain is true and 60 states where Rain is false; then the answer to the query is NORMALIZE(<20, 60>) = <0.25, 0.75>.
Slide 28: Markov chain Monte Carlo: example 2
- Query: P(F | l, d) on the dog network
- Start with arbitrary settings of the nonevidence variables F, B, H.
- Pick F and sample it from P(F | l, d, b), where b is the current value of B; obtain a new value for F.
- Pick B and sample it from P(B | f, d), where f is the current value of F; obtain a new value for B.
- Pick H and sample it from P(H | d); obtain a new value for H.
- Iterate the last three steps 50000 times, keeping the last 10000 states.
- We obtain P(f | l, d) ≈ 0.9016 (correct value: 0.90206); a runnable recreation follows.
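To reproduce this query with the mcmc_ask sketch above, the dog network from slide 12 can be encoded the same way (my encoding; the CPT values are from the slide):

  dog_network = {
      "FamilyOut":     ((), {(): 0.2}),
      "BowelProblems": ((), {(): 0.05}),
      "LightsOn":      (("FamilyOut",), {(True,): 0.99, (False,): 0.10}),
      "DogOut":        (("FamilyOut", "BowelProblems"),
                        {(True, True): 0.994, (True, False): 0.88,
                         (False, True): 0.96, (False, False): 0.2}),
      "HearDogBark":   (("DogOut",), {(True,): 0.60, (False,): 0.25}),
  }
  # P(FamilyOut | LightsOn = true, DogOut = true); the slide reports 0.9016.
  # (This sketch counts all states rather than keeping only the last 10000.)
  print(mcmc_ask("FamilyOut", {"LightsOn": True, "DogOut": True},
                 dog_network, 50_000))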
Slide 29
- 1. Inference in multiply connected networks
  - Clustering algorithms
  - Cutset conditioning methods
- 2. Approximate inference (Monte Carlo algorithms)
  - Direct sampling methods
  - Rejection sampling in Bayesian networks
  - Likelihood weighting
  - Markov chain Monte Carlo algorithm
- 3. Implementation tools and applications of Bayesian networks
Slide 30: Implementation tool: Microsoft Belief Networks
- MSBNx is a component-based Windows application for creating, assessing, and evaluating Bayesian networks, created at Microsoft Research.
- MSBNx: http://www.research.microsoft.com/adapt/MSBNx/
- Why MSBNx?
  - It's free and available on the net.
  - It's easy to learn how to use.
  - We can specify full and causally independent probability distributions.
  - If we have sufficient data and use machine learning tools to create Bayesian networks, we can use MSBNx to edit and evaluate the results.
Slide 31: Implementation tool: Microsoft Belief Networks
- How to use it?
  - First, create the nodes in the window.
  - Then, specify all the probabilities.
  - Finally, query the network as desired, with or without evidence.
Slide 32: Microsoft Belief Networks (example): making the network [screenshot]
Slide 33: Microsoft Belief Networks (example): assessing probabilities (1) [screenshot]
Slide 34: Microsoft Belief Networks (example): assessing probabilities (2) [screenshot of the CPT editor]
Slide 35: Microsoft Belief Networks (example): obtaining results without evidence (1) [screenshot: how to obtain results]
Slide 36: Microsoft Belief Networks (example): obtaining results without evidence (2) [screenshot]
Slide 37: Microsoft Belief Networks (example): obtaining results with evidence (1) [screenshot]
Slide 38: Microsoft Belief Networks (example): obtaining results with evidence (2) [screenshot with evidence set on two nodes]
Slide 39: Microsoft Belief Networks (example): obtaining results with evidence (3) [screenshot]