1
Probabilistic Reasoning System 2: Bayesian network
  • CS 570
  • Team 12

2
Today's topics
  • 1. Inference in Multiply Connected Network
  • Clustering algorithms
  • Cutset conditioning methods
  • 2. Approximate inference (Monte Carlo Algorithms)
  • Direct sampling methods
  • Rejection sampling in Bayesian networks
  • Likelihood weighting
  • Markov Chain Monte Carlo algorithm
  • 3. Implementation tools and applications of Bayesian networks

3
Singly connected Network / Polytree
  • "A singly connected network is a network in which there is at most one
    undirected path between any two nodes in it."
  • Size is defined as the number of CPT entries.
  • Example: the burglary network

    Burglary     P(B) = .001
    Earthquake   P(E) = .002

    Alarm        B E | P(A)
                 t t | .95
                 t f | .94
                 f t | .29
                 f f | .001

    JohnCalls    A | P(J)
                 t | .90
                 f | .05

    MaryCalls    A | P(M)
                 t | .70
                 f | .01
4
Multiply connected Network
  • "A network in which two nodes are connected by more than one path."
  • That means
  • Two or more possible causes for some variable
  • One variable can influence another variable
    through more than one causal mechanism

    Cloudy       P(C) = .5

    Sprinkler    C | P(S)
                 t | .10
                 f | .50

    Rain         C | P(R)
                 t | .80
                 f | .20

    WetGrass     S R | P(W)
                 t t | .99
                 t f | .90
                 f t | .90
                 f f | .00
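For concreteness, here is a minimal sketch of how this sprinkler network could be encoded in Python. The dict-of-CPTs representation and the names (sprinkler_net, TOPO_ORDER, prob) are my own choices for illustration, not from the slides; the sampling sketches later in the deck reuse them.

# Each variable maps to (list of parents, CPT). The CPT maps a tuple of parent
# values to P(variable = True | those parent values).
sprinkler_net = {
    "Cloudy":    ([], {(): 0.5}),
    "Sprinkler": (["Cloudy"], {(True,): 0.10, (False,): 0.50}),
    "Rain":      (["Cloudy"], {(True,): 0.80, (False,): 0.20}),
    "WetGrass":  (["Sprinkler", "Rain"],
                  {(True, True): 0.99, (True, False): 0.90,
                   (False, True): 0.90, (False, False): 0.00}),
}

# A topological order (parents before children), needed by the sampling algorithms.
TOPO_ORDER = ["Cloudy", "Sprinkler", "Rain", "WetGrass"]

def prob(bn, var, value, event):
    """P(var = value | values of var's parents looked up in event)."""
    parents, cpt = bn[var]
    p_true = cpt[tuple(event[p] for p in parents)]
    return p_true if value else 1.0 - p_true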
5
Clustering Algorithms (1)
  • Combine individual nodes of the network to form cluster nodes (meganodes)
    in such a way that the resulting network is a polytree (singly connected
    network).
  • In the example on the next slide, the meganode (SprRain) has only one parent.
  • Inference time can then be reduced to O(n).

6
Clustering Algorithms (2)
    Original network:
      Cloudy     P(C) = .5
      Sprinkler  C | P(S):  t .10   f .50
      Rain       C | P(R):  t .80   f .20
      WetGrass   S R | P(W):  t t .99   t f .90   f t .90   f f .00

    Clustered network (Sprinkler and Rain combined into the meganode SprRain):
      Cloudy     P(C) = .5
      SprRain    C | P(S,R = x) for x = (t,t) (t,f) (f,t) (f,f)
                 t |             .08   .02   .72   .18
                 f |             .10   .40   .10   .40
      WetGrass   S R | P(W):  t t .99   t f .90   f t .90   f f .00
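The meganode's CPT is just the product of the two original CPTs, since Sprinkler and Rain are conditionally independent given Cloudy. A small sketch (reusing the sprinkler_net dict from the earlier sketch; this is my own illustration, not part of the original slides) that reproduces the table above:

# P(Sprinkler = s, Rain = r | Cloudy = c) = P(s | c) * P(r | c)
for c in (True, False):
    p_s_given_c = sprinkler_net["Sprinkler"][1][(c,)]
    p_r_given_c = sprinkler_net["Rain"][1][(c,)]
    row = [(p_s_given_c if s else 1 - p_s_given_c) *
           (p_r_given_c if r else 1 - p_r_given_c)
           for s in (True, False) for r in (True, False)]
    print(c, [round(p, 2) for p in row])
# True  -> [0.08, 0.02, 0.72, 0.18]
# False -> [0.1, 0.4, 0.1, 0.4]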
7
Cutset conditioning
  • A cutset is a set of variables that can be instantiated so that the
    remaining network forms a polytree.
  • In each resulting polytree, the cutset variables are instantiated to
    definite values.
  • The number of resulting polytrees is exponential in the size of the cutset,
    so the smallest cutset is best.
  • Evaluating the most likely polytree instantiations first is called bounded
    cutset conditioning (see the combination formula below).
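The combination rule behind cutset conditioning, stated in its standard form (this formula is a standard presentation of the method and is not spelled out on the original slide): for a cutset C and evidence e,

    P(X | e) = Σ_c P(c | e) · P(X | c, e),

where the sum runs over all instantiations c of the cutset. Each term P(X | c, e) is computed by polytree inference in the network with C fixed to c, and bounded cutset conditioning evaluates the terms with the largest weights P(c | e) first.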

8
Cutset conditioning example
9
  • 1. Inference in Multiply Connected Network
  • Clustering algorithms
  • Cutset conditioning methods
  • 2. Approximate inference (Monte Carlo Algorithms)
  • Direct sampling methods
  • Rejection sampling in Bayesian networks
  • Likelihood weighting
  • Markov Chain Monte Carlo algorithm
  • 3. Implementation tools and applications of Bayesian networks

10
Monte Carlo Algorithms
  • Provide approximate answers whose accuracy
    depends on the number of samples generated
  • Two families
  • Direct sampling
  • Markov Chain sampling

11
Direct sampling methods (1)
  • No evidence is associated with the network.
  • The algorithm samples each variable in turn, in topological order.
  • For nodes without parents, sample from their prior distribution; for nodes
    with parents, sample from the conditional distribution given the
    already-sampled parent values.
  • Example
  • Sample from P(Cloudy) = ⟨0.5, 0.5⟩ → suppose we get true
  • Sample from P(Sprinkler | Cloudy = true) = ⟨0.1, 0.9⟩ → false
  • Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩ → true
  • Sample from P(WetGrass | Sprinkler = false, Rain = true) = ⟨0.9, 0.1⟩ → true

12
    FamilyOut      P(F) = .2
    BowelProblems  P(B) = .05
    LightsOn       F | P(L):  t .99   f .1
    DogOut         F B | P(D):  t t .994   t f .88   f t .96   f f .2
    HearDogBark    D | P(H):  t .6   f .25

Query: what is the probability that the family is at home, the dog has no bowel
problems and is not out, the light is off, and the dog's barking can be heard,
i.e. P(¬f, ¬b, ¬l, ¬d, h)?
We generate 100,000 samples from the network and obtain
N_S(¬f, ¬b, ¬l, ¬d, h) = 13740, giving the estimate 13740 / 100000 = 0.1374.
The exact value is P(¬f, ¬b, ¬l, ¬d, h) = 0.8 × 0.95 × 0.9 × 0.8 × 0.25 = 0.1368.
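As a quick check of the arithmetic above, a minimal Python sketch (CPT values taken from the slide; the variable names are mine) that evaluates the joint probability as the product of per-node conditional probabilities:

# P(¬f, ¬b, ¬l, ¬d, h) = P(¬f) P(¬b) P(¬l|¬f) P(¬d|¬f,¬b) P(h|¬d)
p_not_f = 1 - 0.2    # FamilyOut = false
p_not_b = 1 - 0.05   # BowelProblems = false
p_not_l = 1 - 0.1    # LightsOn = false, given FamilyOut = false
p_not_d = 1 - 0.2    # DogOut = false, given FamilyOut = false, BowelProblems = false
p_h     = 0.25       # HearDogBark = true, given DogOut = false
print(p_not_f * p_not_b * p_not_l * p_not_d * p_h)   # ≈ 0.1368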
13
Direct sampling methods (2): Prior-Sample algorithm
  • function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
  •   inputs: bn, a Bayesian network specifying joint distribution P(X1, ..., Xn)
  •   x ← an event with n elements
  •   for i = 1 to n do
  •     xi ← a random sample from P(Xi | parents(Xi))
  •   return x
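A runnable Python version of PRIOR-SAMPLE, as a sketch (it reuses the sprinkler_net and TOPO_ORDER definitions from the sketch on the multiply connected network slide; the function name is mine):

import random

def prior_sample(bn, order):
    """Return one complete event sampled from the prior distribution defined by bn."""
    event = {}
    for var in order:                              # parents are sampled before their children
        parents, cpt = bn[var]
        p_true = cpt[tuple(event[p] for p in parents)]
        event[var] = (random.random() < p_true)
    return event

print(prior_sample(sprinkler_net, TOPO_ORDER))
# e.g. {'Cloudy': True, 'Sprinkler': False, 'Rain': True, 'WetGrass': True}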

14
Direct sampling methods (3)
  • Looking at the sampling process, the probability that PRIOR-SAMPLE generates
    a particular event (x1, ..., xn) is
    S_PS(x1, ..., xn) = Π_i P(xi | parents(Xi)) = P(x1, ..., xn),
    i.e. exactly the prior probability of that event.
  • This reminds us that the fraction of the N generated samples equal to
    (x1, ..., xn), namely N_PS(x1, ..., xn) / N, is an estimate of P(x1, ..., xn).
  • This estimated probability becomes exact in the large-sample limit; such an
    estimate is called "consistent".
15
Rejection sampling (1)
  • A method to approximate conditional probabilities P(X | e) given evidence e:
  • P(X | e) ≈ N_S(X, e) / N_S(e)
  • 1. First, generate samples from the prior distribution specified by the network.
  • 2. Then, reject all those that do not match the evidence.
  • 3. Finally, the estimate of P(X = x | e) is obtained by counting how often
    X = x occurs in the remaining samples.

16
Rejection sampling (2): algorithm
  • function REJECTION-SAMPLING(X, e, bn, N) returns an estimate of P(X | e)
  •   inputs: X, the query variable
  •           e, evidence specified as an event
  •           bn, a Bayesian network
  •           N, the total number of samples to be generated
  •   local variables: N, a vector of counts over X, initially zero
  •   for j = 1 to N do
  •     x ← PRIOR-SAMPLE(bn)
  •     if x is consistent with e then
  •       N[x] ← N[x] + 1   where x is the value of X in x
  •   return NORMALIZE(N[X])
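A Python sketch of the same procedure, continuing the running example (it assumes the sprinkler_net, TOPO_ORDER and prior_sample definitions from the earlier sketches; the names are mine):

from collections import Counter

def rejection_sampling(X, evidence, bn, order, n_samples):
    """Estimate P(X | evidence) by keeping only prior samples consistent with evidence."""
    counts = Counter()
    for _ in range(n_samples):
        sample = prior_sample(bn, order)
        if all(sample[var] == val for var, val in evidence.items()):
            counts[sample[X]] += 1          # count the query value in the kept samples
    total = sum(counts.values())            # assumes at least one sample was kept
    return {value: n / total for value, n in counts.items()}

# Example query from the next slide:
# rejection_sampling("Rain", {"Sprinkler": True}, sprinkler_net, TOPO_ORDER, 100)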

17
Rejection sampling (3): Example
  • Assume we want to estimate P(Rain | Sprinkler = true) from 100 samples.
  • 73 samples have Sprinkler = false and are rejected.
  • 27 samples have Sprinkler = true; of these,
  •   8 samples have Rain = true
  •   19 samples have Rain = false
  • P(Rain | Sprinkler = true) ≈ NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
  • The true answer is ⟨0.3, 0.7⟩.
  • As more samples are collected, the estimate converges to the true answer.

18
Rejection sampling (4): Problems
  • Rejection sampling rejects too many samples.
  • The fraction of samples consistent with the evidence e drops exponentially
    as the number of evidence variables grows.
  • This makes the procedure unusable for complex problems.

19
Likelihood weighting (1)
  • Avoids the inefficiency of rejection sampling by generating only events that
    are consistent with the evidence e.
  • That is, it fixes the values of the evidence variables E and samples only
    the remaining variables (the query variable X and the hidden variables Y).
  • Every event generated is therefore consistent with the evidence.
  • In addition, each sample is weighted by the product of the conditional
    probabilities of the evidence variables, given their parents.

20
Likelihood weighting (2): Example for one sample
  • Query: P(Rain | Sprinkler = true, WetGrass = true)
  • The weight is initially set to w = 1.0
  • 1. Sample from P(Cloudy) = ⟨0.5, 0.5⟩ → true
  • 2. Sprinkler is an evidence variable with value true:
  •    w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
  • 3. Sample from P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩ → true
  • 4. WetGrass is an evidence variable with value true:
  •    w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.1 × 0.99 = 0.099

21
Likelihood weighting (3): algorithm
  • function LIKELIHOOD-WEIGHTING(X, e, bn, N) returns an estimate of P(X | e)
  •   inputs: X, the query variable
  •           e, evidence specified as an event
  •           bn, a Bayesian network
  •           N, the total number of samples to be generated
  •   local variables: W, a vector of weighted counts over X, initially zero
  •   for j = 1 to N do
  •     x, w ← WEIGHTED-SAMPLE(bn, e)
  •     W[x] ← W[x] + w   where x is the value of X in x
  •   return NORMALIZE(W[X])
  • ________________________________________________________________
  • function WEIGHTED-SAMPLE(bn, e) returns an event and a weight
  •   x ← an event with n elements; w ← 1
  •   for i = 1 to n do
  •     if Xi has a value xi in e
  •       then w ← w × P(Xi = xi | parents(Xi))
  •       else xi ← a random sample from P(Xi | parents(Xi))
  •   return x, w
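A Python sketch of WEIGHTED-SAMPLE and LIKELIHOOD-WEIGHTING, continuing the running example (it assumes the sprinkler_net and TOPO_ORDER definitions from the earlier sketch; the names are mine):

import random
from collections import defaultdict

def weighted_sample(bn, order, evidence):
    """Return (event, weight): evidence variables are fixed, the rest are sampled."""
    event, w = dict(evidence), 1.0
    for var in order:
        parents, cpt = bn[var]
        p_true = cpt[tuple(event[p] for p in parents)]
        if var in evidence:
            # multiply the weight by the likelihood of the observed value
            w *= p_true if evidence[var] else 1.0 - p_true
        else:
            event[var] = (random.random() < p_true)
    return event, w

def likelihood_weighting(X, evidence, bn, order, n_samples):
    """Estimate P(X | evidence) from weighted counts of the query variable."""
    W = defaultdict(float)
    for _ in range(n_samples):
        event, w = weighted_sample(bn, order, evidence)
        W[event[X]] += w
    total = sum(W.values())
    return {value: weight / total for value, weight in W.items()}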

22
Likelihood weighting (4): Example for one query
    FamilyOut      P(F) = .2
    BowelProblems  P(B) = .05
    LightsOn       F | P(L):  t .99   f .1
    DogOut         F B | P(D):  t t .994   t f .88   f t .96   f f .2
    HearDogBark    D | P(H):  t .6   f .25
23
Likelihood weighting (4): Example for one query (continued)
  • For N = 100,000 samples, the estimate obtained closely matches the correct value.

24
Markov Chain Monte Carlo algorithm (1)
  • Some vocabulary
  • A node is conditionally independent of all other nodes in the network given
    its parents, children, and children's parents, that is, given its Markov
    blanket. For example, Burglary is independent of JohnCalls and MaryCalls
    given Alarm and Earthquake. (A small helper that computes the Markov blanket
    in the running example follows below.)
  • A sequence of discrete random variables X0, X1, ... is called a Markov chain
    with state space S iff
    P(Xn+1 = x | X0, ..., Xn) = P(Xn+1 = x | Xn) for all n and all x in S.
  • Thus, Xn is conditionally independent of X0, ..., Xn-2 given Xn-1.
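In the dict representation used by the earlier sketches, a node's Markov blanket can be read off from the parent lists. A small helper (my own illustration, not from the slides):

def markov_blanket(bn, var):
    """Parents, children, and children's other parents of var."""
    parents = set(bn[var][0])
    children = {v for v, (ps, _) in bn.items() if var in ps}
    co_parents = {p for child in children for p in bn[child][0]} - {var}
    return parents | children | co_parents

# markov_blanket(sprinkler_net, "Rain") == {"Cloudy", "Sprinkler", "WetGrass"}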

25
Markov Chain Monte Carlo algorithm (2)
  • function MCMC-ASK(X, e, bn, N) returns an estimate of P(X | e)
  •   local variables: N[X], a vector of counts over X, initially zero
  •                    Z, the nonevidence variables in bn
  •                    x, the current state of the network, initially copied from e
  •   initialize x with random values for the variables in Z
  •   for j = 1 to N do
  •     N[x] ← N[x] + 1   where x is the value of X in x
  •     for each Zi in Z do
  •       sample the value of Zi in x from P(Zi | mb(Zi)), given the values of MB(Zi) in x
  •   return NORMALIZE(N[X])
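A Python sketch of MCMC-ASK as Gibbs sampling, continuing the running example (it reuses sprinkler_net and the prob helper from the earlier sketches; the conditional P(Zi | mb(Zi)) is computed, up to normalization, as P(Zi | parents(Zi)) times the product of each child's CPT entry):

import random
from collections import Counter

def mcmc_ask(X, evidence, bn, n_samples):
    """Estimate P(X | evidence) by Gibbs sampling over the nonevidence variables."""
    nonevidence = [v for v in bn if v not in evidence]
    state = dict(evidence)
    for var in nonevidence:
        state[var] = random.choice([True, False])    # random initial values
    counts = Counter()
    for _ in range(n_samples):
        counts[state[X]] += 1
        for var in nonevidence:
            # P(var = v | mb(var)) is proportional to
            # P(var = v | parents(var)) * product over children c of P(c | parents(c))
            weights = {}
            for v in (True, False):
                state[var] = v
                w = prob(bn, var, v, state)
                for child, (parents, _) in bn.items():
                    if var in parents:
                        w *= prob(bn, child, state[child], state)
                weights[v] = w
            p_true = weights[True] / (weights[True] + weights[False])
            state[var] = (random.random() < p_true)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}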

26
Markov Chain Monte Carlo algorithm (3)
  • Think of the network as being in a particular current state, specifying a
    value for every variable.
  • MCMC generates each event by making a random change to the preceding event.
  • The next state is generated by randomly sampling a value for one of the
    nonevidence variables Xi, conditioned on the current values of the variables
    in the Markov blanket of Xi.

27
Markov Chain Monte Carlo: Example 1
  • Query: P(Rain | Sprinkler = true, WetGrass = true)
  • Initial state is [Cloudy = true, Sprinkler = true, Rain = false, WetGrass = true]
  • The following steps are executed repeatedly:
  • Cloudy is sampled, given the current values of its Markov blanket variables,
    so we sample from P(Cloudy | Sprinkler = true, Rain = false).
  • Suppose the result is Cloudy = false; the current state is then
    [false, true, false, true].
  • Rain is sampled, given the current values of its Markov blanket variables,
    so we sample from P(Rain | Cloudy = false, Sprinkler = true, WetGrass = true).
  • Suppose the result is Rain = true; the current state is then
    [false, true, true, true].
  • After all the iterations, suppose the process visited 20 states where Rain is
    true and 60 states where Rain is false; then the answer to the query is
    NORMALIZE(⟨20, 60⟩) = ⟨0.25, 0.75⟩.
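Running the sketches above on this query gives a quick sanity check (this usage example is mine; the exact posterior, computed by enumeration from the CPTs on the earlier slides, is about ⟨0.32, 0.68⟩ for Rain given Sprinkler = true and WetGrass = true):

print(likelihood_weighting("Rain", {"Sprinkler": True, "WetGrass": True},
                           sprinkler_net, TOPO_ORDER, 100000))
print(mcmc_ask("Rain", {"Sprinkler": True, "WetGrass": True},
               sprinkler_net, 100000))
# Both should report Rain = True with probability close to 0.32.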

28
Markov Chain Monte Carlo: Example 2
  • Query: P(F | l, d)
  • Start with arbitrary settings of the non-evidence variables (f, b, h).
  • Pick F, sample from P(F | l, d, b), obtain f.
  • Pick B, sample from P(B | f, d), obtain b.
  • Pick H, sample from P(H | d), obtain h.
  • Iterate the last three steps 50,000 times, keeping the last 10,000 states.
  • Obtain P(f | l, d) ≈ 0.9016 (correct value: 0.90206).

29
  • 1. Inference in Multiply Connected Network
  • Clustering algorithms
  • Cutset conditioning methods
  • 2. Approximate inference (Monte Carlo Algorithms)
  • Direct sampling methods
  • Rejection sampling in Bayesian networks
  • Likelihood weighting
  • Markov Chain Monte Carlo algorithm
  • 3. Implementation tools and applications of Bayesian networks

30
Implementation Tool: Microsoft Belief Networks
  • MSBNx is a component-based Windows application for creating, assessing, and
    evaluating Bayesian networks, created at Microsoft Research.
  • MSBNx: http://www.research.microsoft.com/adapt/MSBNx/
  • Why MSBNx?
  • It's free and available on the net.
  • It's easy to learn how to use.
  • We can specify full and causally independent probability distributions.
  • If we have sufficient data and use machine learning tools to create Bayesian
    networks, we can use MSBNx to edit and evaluate the results.

31
Implementation Tool: Microsoft Belief Networks
  • How do we use it?
  • First, we create the nodes in the window.
  • Then, we specify all the probabilities.
  • Finally, we run whatever queries we want on the network.

32
Implementation Tool: Microsoft Belief Networks (example): making the network
33
Implementation Tool: Microsoft Belief Networks (example): assessing probabilities (1)
34
Implementation Tool: Microsoft Belief Networks (example): assessing probabilities (2)
CPT
35
Implementation Tool: Microsoft Belief Networks (example): obtaining results without evidence (1)
To obtain results
36
Implementation Tool: Microsoft Belief Networks (example): obtaining results without evidence (2)
37
Implementation Tool: Microsoft Belief Networks (example): obtaining results with evidence (1)
38
Implementation Tool: Microsoft Belief Networks (example): obtaining results with evidence (2)
Evidence
Evidence
39
Implementation Tool: Microsoft Belief Networks (example): obtaining results with evidence (3)
40