Title: Intelligent Systems 2II40 C7
1 Intelligent Systems (2II40), C7
October 2003
2 VI. Uncertainty
- VI.1. Decision theory basics
- Uncertainty
- Probability
- Syntax
- Semantics
- Inference Rules
- VI.2. Probabilistic reasoning
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Approximate inference
3 VI.2.B. Belief networks (Bayesian networks)
4 Return to belief network example
- Neighbors John and Mary promised to call if the alarm goes off; sometimes it goes off because of earthquakes. Is there a burglar?
- Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls (n = 5 variables)
5 Belief network example, cont.
6 Constructing belief networks
- Choose an ordering of variables X1,…,Xn
- For i = 1 to n:
  - add Xi to the network
  - select parents from X1,…,Xi-1 such that
    P(Xi | Parents(Xi)) = P(Xi | X1,…,Xi-1)
7-11 Constructing belief networks: example (figure slides)
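Since the example figures themselves are not shown here, the following is a minimal Python sketch of how the burglary network from slide 4 could be encoded as a plain data structure. The CPT numbers are the usual textbook values and are only assumed for illustration; the later sketches below reuse this representation and the helper prob().

    # Burglary network: variables in topological order, their parents,
    # and CPTs giving P(X = true | parent values).
    # The numeric entries are assumed textbook values, for illustration only.
    burglary_net = {
        "order": ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"],
        "parents": {
            "Burglary": [], "Earthquake": [],
            "Alarm": ["Burglary", "Earthquake"],
            "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
        },
        # CPT: maps a tuple of parent values to P(X = true | parents)
        "cpt": {
            "Burglary":   {(): 0.001},
            "Earthquake": {(): 0.002},
            "Alarm": {(True, True): 0.95, (True, False): 0.94,
                      (False, True): 0.29, (False, False): 0.001},
            "JohnCalls": {(True,): 0.90, (False,): 0.05},
            "MaryCalls": {(True,): 0.70, (False,): 0.01},
        },
    }

    def prob(net, var, value, assignment):
        """P(var = value | parents(var)), parent values read from assignment."""
        pvals = tuple(assignment[p] for p in net["parents"][var])
        p_true = net["cpt"][var][pvals]
        return p_true if value else 1.0 - p_true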
12 Example: car diagnosis
- Initial evidence: engine won't start
- Testable variables (thin ovals), diagnosis variables (thick ovals), hidden variables (shaded) ensure sparse structure, reduce parameters
13 Example: car insurance
- Predict claims (medical, liability, property)
- given data on the application form (other unshaded nodes)
14 Efficient conditional distributions
- CPT grows exponentially w. no. of parents
- CPT becomes infinite w. continuous variables
- Other, more compact methods are needed
15 Compact conditional distributions, cont.
- Noisy-OR distributions model multiple noninteracting causes
- Parents U1,…,Uk include all causes (can add a leak node)
- Independent failure probability qi for each cause alone
  ⇒ P(X | U1,…,Uj, ¬Uj+1,…, ¬Uk) = 1 - ∏_{i=1..j} qi
- Number of parameters linear in number of parents
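A small sketch of the noisy-OR rule above: X fails to be true only if every present cause independently fails, so P(X = true) is one minus the product of the failure probabilities of the present causes. The function name and the example numbers are illustrative assumptions, not taken from the slides.

    def noisy_or(q, present):
        """P(X = true | causes) under the noisy-OR model.

        q       -- dict mapping each cause (parent) to its failure probability q_i
        present -- set of causes that are true in the current assignment
        """
        p_all_fail = 1.0
        for cause, qi in q.items():
            if cause in present:
                p_all_fail *= qi      # cause is present but fails with probability q_i
        return 1.0 - p_all_fail

    # Example with assumed textbook-style numbers: Fever with causes Cold, Flu, Malaria
    q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}
    print(noisy_or(q, {"Cold", "Flu"}))   # 1 - 0.6 * 0.2 = 0.88

Only k parameters (one q_i per parent) are needed, instead of a CPT of size 2^k.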
16 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
17 Probability density functions
- Instead of probability distributions
- For continuous variables
- Ex.: let X denote tomorrow's maximum temperature in the summer in Eindhoven
- Belief that X is distributed uniformly between 18 and 26 degrees Celsius:
  P(X = x) = U[18,26](x)
- P(X = 20.5) = U[18,26](20.5) = 0.125/°C
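The arithmetic behind the example: a uniform density on [18, 26] has constant height 1/(26 - 18) = 0.125 per degree Celsius inside the interval, and 0 outside. A one-line sketch (note this returns a density, not a probability):

    def uniform_pdf(x, a=18.0, b=26.0):
        """Density U[a,b](x), in units of 1/°C for this example."""
        return 1.0 / (b - a) if a <= x <= b else 0.0

    print(uniform_pdf(20.5))   # 0.125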
22 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
- Option 1: discretization
  - possibly large errors, large CPTs
- Option 2: finitely parameterized canonical families
  - Continuous variable, discrete + continuous parents (e.g., Cost)
  - Discrete variable, continuous parents (e.g., Buys?)
23 a) Continuous child variables
- Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents
- Most common is the linear Gaussian model, e.g. Cost given Harvest
- Mean of Cost varies linearly w. Harvest, variance is fixed
- Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow
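A sketch of a linear Gaussian conditional for Cost given continuous Harvest and the discrete parent Subsidy?: for each value of the discrete parent there is a linear mean a*h + b and a fixed standard deviation. The (a, b, sigma) values below are placeholders, since the slide's actual parameters are not reproduced here.

    import math

    def gaussian_pdf(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # One (a, b, sigma) triple per value of the discrete parent Subsidy?
    # (placeholder parameters, for illustration only)
    LG_PARAMS = {True:  (-0.5, 10.0, 1.0),
                 False: (-0.5, 12.0, 1.0)}

    def cost_density(c, h, subsidy):
        """P(Cost = c | Harvest = h, Subsidy? = subsidy): mean linear in h, fixed variance."""
        a, b, sigma = LG_PARAMS[subsidy]
        return gaussian_pdf(c, a * h + b, sigma)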
24 Continuous child variables, ex.
- All-continuous network w. LG distributions ⇒ full joint is a multivariate Gaussian
- Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
25 b) Discrete child, continuous parent
- P(buys | Cost = c) = Φ((-c + μ) / σ)
  - with μ the threshold for buying
- Probit distribution
  - Φ is the integral (CDF) of the standard normal distribution
- Logit distribution
  - uses the sigmoid function
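Both models fit in a few lines. Φ is computed from the error function; the threshold μ and spread σ below are placeholder values, and the scaling convention used in the logit variant is one common choice rather than the slide's exact formula.

    import math

    def probit_buys(c, mu=10.0, sigma=2.0):
        """P(buys | Cost = c) under the probit model: Phi((-c + mu) / sigma)."""
        z = (-c + mu) / sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def logit_buys(c, mu=10.0, sigma=2.0):
        """Logit alternative: sigmoid of the same standardized quantity."""
        z = (-c + mu) / sigma
        return 1.0 / (1.0 + math.exp(-z))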
26 VI.2. Probabilistic reasoning
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Exact inference by enumeration
- Exact inference by variable elimination
- Approximate inference
- Approximate inference by stochastic simulation
- Approximate inference by Markov chain Monte Carlo
27 Exact inference w. enumeration
- Naive evaluation of the sum: O(n d^n) time (O(n 2^n) for Boolean variables)
- Reusing repeated products: O(d^n) time (O(2^n) for Boolean variables)
28 Enumeration algorithm
- Exhaustive depth-first enumeration: O(n) space, O(d^n) time
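A minimal sketch of the enumeration algorithm over the dict-based network shown after slide 11 (Boolean variables assumed): hidden variables are summed out depth-first, which keeps space at O(n) while time remains O(d^n).

    def enumeration_ask(X, evidence, net):
        """Return the distribution P(X | evidence) as {value: probability}."""
        dist = {}
        for x in (True, False):
            dist[x] = enumerate_all(net["order"], dict(evidence, **{X: x}), net)
        total = sum(dist.values())
        return {v: p / total for v, p in dist.items()}   # normalize

    def enumerate_all(variables, evidence, net):
        if not variables:
            return 1.0
        Y, rest = variables[0], variables[1:]
        if Y in evidence:
            return prob(net, Y, evidence[Y], evidence) * enumerate_all(rest, evidence, net)
        # Y is hidden: sum over both of its values
        return sum(prob(net, Y, y, evidence) *
                   enumerate_all(rest, dict(evidence, **{Y: y}), net)
                   for y in (True, False))

    # e.g. P(Burglary | JohnCalls = true, MaryCalls = true)
    print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}, burglary_net))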
29 Inference by variable elimination
- Enumeration is inefficient: repeated computation
  - e.g., computes P(J = true | a) P(M = true | a) for each value of e
- Variable elimination: summation from right to left, storing intermediate results (factors) to avoid recomputation
30 Variable elimination: basic operations
- Pointwise product of factors f1 and f2:
  f1(x1,…,xj, y1,…,yk) × f2(y1,…,yk, z1,…,zl) = f(x1,…,xj, y1,…,yk, z1,…,zl)
  - e.g., f1(a,b) × f2(b,c) = f(a,b,c)
- Summing out a variable from a product of factors:
  - move any constant factors outside the summation:
    Σx f1 × … × fk = f1 × … × fi Σx fi+1 × … × fk = f1 × … × fi × fX̄
  - assuming f1,…,fi do not depend on X
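A sketch of the two basic operations, with a factor represented as a pair (list of variable names, table mapping value tuples to numbers). Boolean variables and the helper names are assumptions for illustration.

    from itertools import product

    def pointwise_product(f1, f2):
        """Combine factors f1 = (vars1, table1) and f2 = (vars2, table2)."""
        vars1, t1 = f1
        vars2, t2 = f2
        vars_out = vars1 + [v for v in vars2 if v not in vars1]
        table = {}
        for values in product((True, False), repeat=len(vars_out)):
            assign = dict(zip(vars_out, values))
            v1 = tuple(assign[v] for v in vars1)
            v2 = tuple(assign[v] for v in vars2)
            table[values] = t1[v1] * t2[v2]      # multiply matching entries
        return (vars_out, table)

    def sum_out(var, f):
        """Sum the variable `var` out of factor f."""
        vars_in, t = f
        i = vars_in.index(var)
        vars_out = vars_in[:i] + vars_in[i+1:]
        table = {}
        for values, p in t.items():
            key = values[:i] + values[i+1:]
            table[key] = table.get(key, 0.0) + p
        return (vars_out, table)

    # e.g. f1(A,B) x f2(B,C) gives a factor over (A, B, C)
    f1 = (["A", "B"], {(a, b): 0.5 for a in (True, False) for b in (True, False)})
    f2 = (["B", "C"], {(b, c): 0.25 for b in (True, False) for c in (True, False)})
    f3 = pointwise_product(f1, f2)   # variables ["A", "B", "C"]
    f4 = sum_out("B", f3)            # variables ["A", "C"]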
31-32 Example: pointwise product (figure slides)
33 Variable elimination algorithm
34 Complexity of exact inference
- Polytrees (singly connected networks): networks in which there is at most one undirected path between any two nodes
- Time, space complexity of exact inference on polytrees: linear in the size of the network
- Multiply connected networks (non-polytrees):
  - Variable elimination can have exponential time and space complexity
  - inference in Bayesian networks is NP-hard
  - includes inference in propositional logic as a special case
35 VI.2.D. Approximate inference
36 Inference by stochastic simulation
- Basic idea:
  - Draw N samples from a sampling distribution S
  - Compute an approximate posterior probability P̂
  - Show it converges to the true probability P
37 VI.2.D. Approximate inference
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing w. evidence
- Likelihood weighting: use evidence to weight samples
- MCMC: sample from a stochastic process whose stationary distribution is the true posterior
38 i. Sampling from an empty network
- function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
  - x ← an event w. n elements
  - for i = 1 to n do
    - xi ← a random sample from P(Xi | parents(Xi))
  - return x
- P(Cloudy) = ⟨0.5, 0.5⟩
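A direct Python rendering of PRIOR-SAMPLE for the dict-based network representation used above (Boolean variables assumed):

    import random

    def prior_sample(net):
        """Sample one complete event from the prior encoded by the network."""
        x = {}
        for Xi in net["order"]:                    # topological order
            p_true = prob(net, Xi, True, x)        # P(Xi = true | parents(Xi))
            x[Xi] = random.random() < p_true
        return x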
39 i. Sampling from an empty network, cont.
- Probability that PRIOR-SAMPLE generates a particular event:
  S_PS(x1,…,xn) = ∏_{i=1..n} P(xi | Parents(Xi)) = P(x1,…,xn)
- N_PS(Y = y): no. of samples generated for which Y = y, for any set of variables Y
- Then, P̂(Y = y) = N_PS(Y = y) / N and
  lim_{N→∞} P̂(Y = y) = Σ_h S_PS(Y = y, H = h) = Σ_h P(Y = y, H = h) = P(Y = y)
- ⇒ estimates derived from PRIOR-SAMPLE are consistent
40 ii. Rejection sampling example
- Estimate P(Rain | Sprinkler = true) using 100 samples
  - 27 samples have Sprinkler = true; out of these,
  - 8 have Rain = true and
  - 19 have Rain = false
- P̂(Rain | Sprinkler = true) = NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
- Similar to a basic real-world empirical estimation procedure
41 ii. Rejection sampling
- P̂(X | e) is estimated from samples agreeing with evidence e
- PROBLEM: a lot of collected samples are thrown away!
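A rejection-sampling sketch built on prior_sample above; it discards every sample that disagrees with the evidence, which is exactly the inefficiency noted on this slide. Any network in the dict format shown earlier can be passed in.

    def rejection_sampling(X, evidence, net, N=1000):
        """Estimate P(X | evidence) by discarding samples inconsistent with the evidence."""
        counts = {True: 0, False: 0}
        for _ in range(N):
            sample = prior_sample(net)
            if all(sample[var] == val for var, val in evidence.items()):
                counts[sample[X]] += 1         # sample agrees with e
        total = sum(counts.values())
        if total == 0:
            return None                        # every sample was rejected
        return {v: c / total for v, c in counts.items()}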
42 iii. Likelihood weighting
- Idea:
  - fix evidence variables E
  - sample only nonevidence variables X, Y
  - weight each sample by the likelihood it accords to the evidence E
43 iii. Likelihood weighting example
- Estimate P(Rain | Sprinkler = true, WetGrass = true)
44 iii. Likelihood weighting example
- Sample generation process:
  - w ← 1.0
  - Sample P(Cloudy) = ⟨0.5, 0.5⟩; say true
  - Sprinkler has value true, so
    w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
  - Sample P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; say true
  - WetGrass has value true, so
    w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099
45 iii. Likelihood weighting function
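A sketch consistent with the worked example on slide 44, using the dict-based network from earlier: evidence variables are fixed and multiply their likelihood into the weight, nonevidence variables are sampled from their conditionals.

    import random

    def weighted_sample(net, evidence):
        """Return (event, weight): evidence fixed, nonevidence variables sampled."""
        w, x = 1.0, dict(evidence)
        for Xi in net["order"]:
            if Xi in evidence:
                w *= prob(net, Xi, evidence[Xi], x)          # weight by likelihood of e_i
            else:
                x[Xi] = random.random() < prob(net, Xi, True, x)
        return x, w

    def likelihood_weighting(X, evidence, net, N=1000):
        """Estimate P(X | evidence) from weighted samples."""
        W = {True: 0.0, False: 0.0}
        for _ in range(N):
            x, w = weighted_sample(net, evidence)
            W[x[X]] += w
        total = W[True] + W[False]
        return {v: w / total for v, w in W.items()}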
46 iii. Likelihood weighting analysis
- Sampling probability for WEIGHTED-SAMPLE is
  S_WS(y, e) = ∏_{i=1..l} P(yi | Parents(Yi))
  - Note: pays attention to evidence in ancestors only ⇒ somewhere in between prior and posterior distribution
- Weight for a given sample y, e is
  w(y, e) = ∏_{i=1..m} P(ei | Parents(Ei))
- Weighted sampling probability is
  S_WS(y, e) w(y, e) = ∏_{i=1..l} P(yi | Parents(Yi)) × ∏_{i=1..m} P(ei | Parents(Ei)) = P(y, e)
  by the standard global semantics of the network
- Hence, likelihood weighting is consistent
- But performance still degrades w. many evidence variables
47 iv. MCMC inference
- State of network = current assignment to all variables
- Generate next state by sampling one variable given its Markov blanket
- Sample each variable in turn, keeping evidence fixed
- Approaches a stationary distribution: long-run fraction of time spent in each state is exactly proportional to its posterior probability
48 Markov blanket, reminder
- Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
49 MCMC algorithm
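A Gibbs-sampling sketch of the idea in the bullets of slide 47, again over the dict-based network from earlier: each nonevidence variable is resampled from its distribution given its Markov blanket (its own CPT entry times its children's CPT entries), with evidence kept fixed. Function names and the initialization scheme are assumptions for illustration.

    import random

    def markov_blanket_prob(net, Xi, value, state):
        """Unnormalized P(Xi = value | Markov blanket of Xi) from the current state."""
        s = dict(state, **{Xi: value})
        p = prob(net, Xi, value, s)                    # P(xi | parents(Xi))
        for Y in net["order"]:
            if Xi in net["parents"][Y]:                # Y is a child of Xi
                p *= prob(net, Y, s[Y], s)             # P(y | parents(Y))
        return p

    def gibbs_ask(X, evidence, net, N=5000):
        """Estimate P(X | evidence) by Gibbs sampling over the nonevidence variables."""
        nonevidence = [v for v in net["order"] if v not in evidence]
        state = dict(evidence)
        for v in nonevidence:                          # arbitrary initial state
            state[v] = random.random() < 0.5
        counts = {True: 0, False: 0}
        for _ in range(N):
            for Zi in nonevidence:                     # resample one variable at a time
                pt = markov_blanket_prob(net, Zi, True, state)
                pf = markov_blanket_prob(net, Zi, False, state)
                state[Zi] = random.random() < pt / (pt + pf)
                counts[state[X]] += 1                  # count after every transition
        total = counts[True] + counts[False]
        return {v: c / total for v, c in counts.items()}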
50 Homework 7
- Continue till step 8 with your project.