Title: Intelligent Systems 2II40 C7
1 Intelligent Systems (2II40), C7
October 2003
2 VI. Uncertainty
- VI.1. Decision theory basics
- Uncertainty
- Probability
- Syntax
- Semantics
- Inference Rules
- VI.2. Probabilistic reasoning
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Approximate inference
3 VI.2.B. Belief networks (Bayesian networks)
4 Return to belief network example
- Neighbors John and Mary promised to call if the alarm goes off; sometimes it goes off because of earthquakes. Is there a burglar?
- Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls (n = 5 variables)
5 Belief network example, cont.
6 Constructing belief networks
- Choose an ordering of variables X1,…,Xn
- For i = 1 to n:
  - add Xi to the network
  - select parents from X1,…,Xi-1 such that
    P(Xi | Parents(Xi)) = P(Xi | X1,…,Xi-1)
7-11 Constructing belief networks: example (figure slides)
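Since the example figures themselves are not shown here, the following is a minimal Python sketch of how the burglary network from slide 4 could be encoded as a plain data structure. The CPT numbers are the usual textbook values and are only assumed for illustration; the later sketches below reuse this representation and the helper prob().

    # Burglary network: variables in topological order, their parents,
    # and CPTs giving P(X = true | parent values).
    # The numeric entries are assumed textbook values, for illustration only.
    burglary_net = {
        "order": ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"],
        "parents": {
            "Burglary": [], "Earthquake": [],
            "Alarm": ["Burglary", "Earthquake"],
            "JohnCalls": ["Alarm"], "MaryCalls": ["Alarm"],
        },
        # CPT: maps a tuple of parent values to P(X = true | parents)
        "cpt": {
            "Burglary":   {(): 0.001},
            "Earthquake": {(): 0.002},
            "Alarm": {(True, True): 0.95, (True, False): 0.94,
                      (False, True): 0.29, (False, False): 0.001},
            "JohnCalls": {(True,): 0.90, (False,): 0.05},
            "MaryCalls": {(True,): 0.70, (False,): 0.01},
        },
    }

    def prob(net, var, value, assignment):
        """P(var = value | parents(var)), parent values read from assignment."""
        pvals = tuple(assignment[p] for p in net["parents"][var])
        p_true = net["cpt"][var][pvals]
        return p_true if value else 1.0 - p_true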
12 Example: car diagnosis
- Initial evidence: engine won't start
- Testable variables (thin ovals), diagnosis variables (thick ovals), hidden variables (shaded) ensure sparse structure, reduce parameters
13 Example: car insurance
- Predict claims (medical, liability, property)
- given data on the application form (other unshaded nodes)
14 Efficient conditional distributions
- CPT grows exponentially w. no. of parents
- CPT becomes infinite w. continuous variables
- Other, more compact methods are needed
15 Compact conditional distributions, cont.
- Noisy-OR distributions model multiple noninteracting causes
- Parents U1,…,Uk include all causes (can add a leak node)
- Independent failure probability qi for each cause alone
  ⇒ P(X | U1,…,Uj, ¬Uj+1,…, ¬Uk) = 1 - ∏_{i=1..j} qi
- Number of parameters linear in number of parents
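A small sketch of the noisy-OR rule above: X fails to be true only if every present cause independently fails, so P(X = true) is one minus the product of the failure probabilities of the present causes. The function name and the example numbers are illustrative assumptions, not taken from the slides.

    def noisy_or(q, present):
        """P(X = true | causes) under the noisy-OR model.

        q       -- dict mapping each cause (parent) to its failure probability q_i
        present -- set of causes that are true in the current assignment
        """
        p_all_fail = 1.0
        for cause, qi in q.items():
            if cause in present:
                p_all_fail *= qi      # cause is present but fails with probability q_i
        return 1.0 - p_all_fail

    # Example with assumed textbook-style numbers: Fever with causes Cold, Flu, Malaria
    q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}
    print(noisy_or(q, {"Cold", "Flu"}))   # 1 - 0.6 * 0.2 = 0.88

Only k parameters (one q_i per parent) are needed, instead of a CPT of size 2^k.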
16 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
17 Probability density functions
- Instead of probability distributions
- For continuous variables
- Ex.: let X denote tomorrow's maximum temperature in the summer in Eindhoven
- Belief that X is distributed uniformly between 18 and 26 degrees Celsius:
  P(X = x) = U[18,26](x)
- P(X = 20.5) = U[18,26](20.5) = 0.125/°C
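The arithmetic behind the example: a uniform density on [18, 26] has constant height 1/(26 - 18) = 0.125 per degree Celsius inside the interval, and 0 outside. A one-line sketch (note this returns a density, not a probability):

    def uniform_pdf(x, a=18.0, b=26.0):
        """Density U[a,b](x), in units of 1/°C for this example."""
        return 1.0 / (b - a) if a <= x <= b else 0.0

    print(uniform_pdf(20.5))   # 0.125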
22 Hybrid (discrete + continuous) networks
- Discrete (Subsidy? and Buys?)
- Continuous (Harvest and Cost)
- Option 1: discretization
  - possibly large errors, large CPTs
- Option 2: finitely parameterized canonical families
  - Continuous variable, discrete + continuous parents (e.g., Cost)
  - Discrete variable, continuous parents (e.g., Buys?)
23 a) Continuous child variables
- Need one conditional density function for the child variable given continuous parents, for each possible assignment to the discrete parents
- Most common is the linear Gaussian model, e.g. Cost given Harvest
- Mean of Cost varies linearly w. Harvest, variance is fixed
- Linear variation is unreasonable over the full range, but works OK if the likely range of Harvest is narrow
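A sketch of a linear Gaussian conditional for Cost given continuous Harvest and the discrete parent Subsidy?: for each value of the discrete parent there is a linear mean a*h + b and a fixed standard deviation. The (a, b, sigma) values below are placeholders, since the slide's actual parameters are not reproduced here.

    import math

    def gaussian_pdf(x, mean, sigma):
        return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

    # One (a, b, sigma) triple per value of the discrete parent Subsidy?
    # (placeholder parameters, for illustration only)
    LG_PARAMS = {True:  (-0.5, 10.0, 1.0),
                 False: (-0.5, 12.0, 1.0)}

    def cost_density(c, h, subsidy):
        """P(Cost = c | Harvest = h, Subsidy? = subsidy): mean linear in h, fixed variance."""
        a, b, sigma = LG_PARAMS[subsidy]
        return gaussian_pdf(c, a * h + b, sigma)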
24 Continuous child variables, ex.
- All-continuous network w. LG distributions ⇒ full joint is a multivariate Gaussian
- Discrete + continuous LG network is a conditional Gaussian network, i.e., a multivariate Gaussian over all continuous variables for each combination of discrete variable values
25 b) Discrete child, continuous parent
- P(buys | Cost = c) = Φ((-c + μ) / σ)
  - with μ the threshold for buying
- Probit distribution
  - Φ is the integral (CDF) of the standard normal distribution
- Logit distribution
  - uses the sigmoid function
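Both models fit in a few lines. Φ is computed from the error function; the threshold μ and spread σ below are placeholder values, and the scaling convention used in the logit variant is one common choice rather than the slide's exact formula.

    import math

    def probit_buys(c, mu=10.0, sigma=2.0):
        """P(buys | Cost = c) under the probit model: Phi((-c + mu) / sigma)."""
        z = (-c + mu) / sigma
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def logit_buys(c, mu=10.0, sigma=2.0):
        """Logit alternative: sigmoid of the same standardized quantity."""
        z = (-c + mu) / sigma
        return 1.0 / (1.0 + math.exp(-z))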
26 VI.2. Probabilistic reasoning
- Conditional independence
- Bayesian networks: syntax and semantics
- Exact inference
- Exact inference by enumeration
- Exact inference by variable elimination
- Approximate inference
- Approximate inference by stochastic simulation
- Approximate inference by Markov chain Monte Carlo
27 Exact inference w. enumeration
- Naive evaluation of the sum: O(n d^n) time (O(n 2^n) for Boolean variables)
- Reusing repeated products: O(d^n) time (O(2^n) for Boolean variables)
28 Enumeration algorithm
- Exhaustive depth-first enumeration: O(n) space, O(d^n) time
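A minimal sketch of the enumeration algorithm over the dict-based network shown after slide 11 (Boolean variables assumed): hidden variables are summed out depth-first, which keeps space at O(n) while time remains O(d^n).

    def enumeration_ask(X, evidence, net):
        """Return the distribution P(X | evidence) as {value: probability}."""
        dist = {}
        for x in (True, False):
            dist[x] = enumerate_all(net["order"], dict(evidence, **{X: x}), net)
        total = sum(dist.values())
        return {v: p / total for v, p in dist.items()}   # normalize

    def enumerate_all(variables, evidence, net):
        if not variables:
            return 1.0
        Y, rest = variables[0], variables[1:]
        if Y in evidence:
            return prob(net, Y, evidence[Y], evidence) * enumerate_all(rest, evidence, net)
        # Y is hidden: sum over both of its values
        return sum(prob(net, Y, y, evidence) *
                   enumerate_all(rest, dict(evidence, **{Y: y}), net)
                   for y in (True, False))

    # e.g. P(Burglary | JohnCalls = true, MaryCalls = true)
    print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}, burglary_net))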
29 Inference by variable elimination
- Enumeration is inefficient: repeated computation
  - e.g., computes P(J = true | a) P(M = true | a) for each value of e
- Variable elimination: summation from right to left, storing intermediate results (factors) to avoid recomputation
30 Variable elimination: basic operations
- Pointwise product of factors f1 and f2:
  f1(x1,…,xj, y1,…,yk) × f2(y1,…,yk, z1,…,zl) = f(x1,…,xj, y1,…,yk, z1,…,zl)
  - e.g., f1(a,b) × f2(b,c) = f(a,b,c)
- Summing out a variable from a product of factors:
  - move any constant factors outside the summation:
    Σx f1 × … × fk = f1 × … × fi Σx fi+1 × … × fk = f1 × … × fi × fX̄
  - assuming f1,…,fi do not depend on X
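A sketch of the two basic operations, with a factor represented as a pair (list of variable names, table mapping value tuples to numbers). Boolean variables and the helper names are assumptions for illustration.

    from itertools import product

    def pointwise_product(f1, f2):
        """Combine factors f1 = (vars1, table1) and f2 = (vars2, table2)."""
        vars1, t1 = f1
        vars2, t2 = f2
        vars_out = vars1 + [v for v in vars2 if v not in vars1]
        table = {}
        for values in product((True, False), repeat=len(vars_out)):
            assign = dict(zip(vars_out, values))
            v1 = tuple(assign[v] for v in vars1)
            v2 = tuple(assign[v] for v in vars2)
            table[values] = t1[v1] * t2[v2]      # multiply matching entries
        return (vars_out, table)

    def sum_out(var, f):
        """Sum the variable `var` out of factor f."""
        vars_in, t = f
        i = vars_in.index(var)
        vars_out = vars_in[:i] + vars_in[i+1:]
        table = {}
        for values, p in t.items():
            key = values[:i] + values[i+1:]
            table[key] = table.get(key, 0.0) + p
        return (vars_out, table)

    # e.g. f1(A,B) x f2(B,C) gives a factor over (A, B, C)
    f1 = (["A", "B"], {(a, b): 0.5 for a in (True, False) for b in (True, False)})
    f2 = (["B", "C"], {(b, c): 0.25 for b in (True, False) for c in (True, False)})
    f3 = pointwise_product(f1, f2)   # variables ["A", "B", "C"]
    f4 = sum_out("B", f3)            # variables ["A", "C"]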
31-32 Example: pointwise product (figure slides)
33 Variable elimination algorithm
34 Complexity of exact inference
- Polytrees (singly connected networks): networks in which there is at most one undirected path between any two nodes
- Time, space complexity of exact inference on polytrees: linear in the size of the network
- Multiply connected networks (non-polytrees):
  - Variable elimination can have exponential time and space complexity
  - inference in Bayesian networks is NP-hard
  - includes inference in propositional logic as a special case
35 VI.2.D. Approximate inference
36 Inference by stochastic simulation
- Basic idea:
  - Draw N samples from a sampling distribution S
  - Compute an approximate posterior probability P̂
  - Show it converges to the true probability P
37 VI.2.D. Approximate inference
- Sampling from an empty network
- Rejection sampling: reject samples disagreeing w. evidence
- Likelihood weighting: use evidence to weight samples
- MCMC: sample from a stochastic process whose stationary distribution is the true posterior
38 i. Sampling from an empty network
- function PRIOR-SAMPLE(bn) returns an event sampled from the prior specified by bn
  - x ← an event w. n elements
  - for i = 1 to n do
    - xi ← a random sample from P(Xi | parents(Xi))
  - return x
- P(Cloudy) = ⟨0.5, 0.5⟩
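A direct Python rendering of PRIOR-SAMPLE for the dict-based network representation used above (Boolean variables assumed):

    import random

    def prior_sample(net):
        """Sample one complete event from the prior encoded by the network."""
        x = {}
        for Xi in net["order"]:                    # topological order
            p_true = prob(net, Xi, True, x)        # P(Xi = true | parents(Xi))
            x[Xi] = random.random() < p_true
        return x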
39 i. Sampling from an empty network, cont.
- Probability that PRIOR-SAMPLE generates a particular event:
  S_PS(x1,…,xn) = ∏_{i=1..n} P(xi | Parents(Xi)) = P(x1,…,xn)
- N_PS(Y = y): no. of samples generated for which Y = y, for any set of variables Y
- Then, P̂(Y = y) = N_PS(Y = y) / N and
  lim_{N→∞} P̂(Y = y) = Σ_h S_PS(Y = y, H = h) = Σ_h P(Y = y, H = h) = P(Y = y)
- ⇒ estimates derived from PRIOR-SAMPLE are consistent
40 ii. Rejection sampling example
- Estimate P(Rain | Sprinkler = true) using 100 samples
  - 27 samples have Sprinkler = true; out of these,
  - 8 have Rain = true and
  - 19 have Rain = false
- P̂(Rain | Sprinkler = true) = NORMALIZE(⟨8, 19⟩) = ⟨0.296, 0.704⟩
- Similar to a basic real-world empirical estimation procedure
41 ii. Rejection sampling
- P̂(X | e) is estimated from samples agreeing with evidence e
- PROBLEM: a lot of collected samples are thrown away!
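A rejection-sampling sketch built on prior_sample above; it discards every sample that disagrees with the evidence, which is exactly the inefficiency noted on this slide. Any network in the dict format shown earlier can be passed in.

    def rejection_sampling(X, evidence, net, N=1000):
        """Estimate P(X | evidence) by discarding samples inconsistent with the evidence."""
        counts = {True: 0, False: 0}
        for _ in range(N):
            sample = prior_sample(net)
            if all(sample[var] == val for var, val in evidence.items()):
                counts[sample[X]] += 1         # sample agrees with e
        total = sum(counts.values())
        if total == 0:
            return None                        # every sample was rejected
        return {v: c / total for v, c in counts.items()}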
42 iii. Likelihood weighting
- Idea:
  - fix evidence variables E
  - sample only nonevidence variables X, Y
  - weight each sample by the likelihood it accords to the evidence E
43 iii. Likelihood weighting example
- Estimate P(Rain | Sprinkler = true, WetGrass = true)
44 iii. Likelihood weighting example
- Sample generation process:
  - w ← 1.0
  - Sample P(Cloudy) = ⟨0.5, 0.5⟩; say true
  - Sprinkler has value true, so
    w ← w × P(Sprinkler = true | Cloudy = true) = 0.1
  - Sample P(Rain | Cloudy = true) = ⟨0.8, 0.2⟩; say true
  - WetGrass has value true, so
    w ← w × P(WetGrass = true | Sprinkler = true, Rain = true) = 0.099
45 iii. Likelihood weighting function
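A sketch consistent with the worked example on slide 44, using the dict-based network from earlier: evidence variables are fixed and multiply their likelihood into the weight, nonevidence variables are sampled from their conditionals.

    import random

    def weighted_sample(net, evidence):
        """Return (event, weight): evidence fixed, nonevidence variables sampled."""
        w, x = 1.0, dict(evidence)
        for Xi in net["order"]:
            if Xi in evidence:
                w *= prob(net, Xi, evidence[Xi], x)          # weight by likelihood of e_i
            else:
                x[Xi] = random.random() < prob(net, Xi, True, x)
        return x, w

    def likelihood_weighting(X, evidence, net, N=1000):
        """Estimate P(X | evidence) from weighted samples."""
        W = {True: 0.0, False: 0.0}
        for _ in range(N):
            x, w = weighted_sample(net, evidence)
            W[x[X]] += w
        total = W[True] + W[False]
        return {v: w / total for v, w in W.items()}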
46 iii. Likelihood weighting analysis
- Sampling probability for WEIGHTED-SAMPLE is
  S_WS(y, e) = ∏_{i=1..l} P(yi | Parents(Yi))
  - Note: pays attention to evidence in ancestors only ⇒ somewhere in between prior and posterior distribution
- Weight for a given sample y, e is
  w(y, e) = ∏_{i=1..m} P(ei | Parents(Ei))
- Weighted sampling probability is
  S_WS(y, e) w(y, e) = ∏_{i=1..l} P(yi | Parents(Yi)) × ∏_{i=1..m} P(ei | Parents(Ei)) = P(y, e)
  by the standard global semantics of the network
- Hence, likelihood weighting is consistent
- But performance still degrades w. many evidence variables
47 iv. MCMC inference
- State of network = current assignment to all variables
- Generate next state by sampling one variable given its Markov blanket
- Sample each variable in turn, keeping evidence fixed
- Approaches a stationary distribution: long-run fraction of time spent in each state is exactly proportional to its posterior probability
48 Markov blanket, reminder
- Each node is conditionally independent of all others given its Markov blanket: parents + children + children's parents
49 MCMC algorithm
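A Gibbs-sampling sketch of the idea in the bullets of slide 47, again over the dict-based network from earlier: each nonevidence variable is resampled from its distribution given its Markov blanket (its own CPT entry times its children's CPT entries), with evidence kept fixed. Function names and the initialization scheme are assumptions for illustration.

    import random

    def markov_blanket_prob(net, Xi, value, state):
        """Unnormalized P(Xi = value | Markov blanket of Xi) from the current state."""
        s = dict(state, **{Xi: value})
        p = prob(net, Xi, value, s)                    # P(xi | parents(Xi))
        for Y in net["order"]:
            if Xi in net["parents"][Y]:                # Y is a child of Xi
                p *= prob(net, Y, s[Y], s)             # P(y | parents(Y))
        return p

    def gibbs_ask(X, evidence, net, N=5000):
        """Estimate P(X | evidence) by Gibbs sampling over the nonevidence variables."""
        nonevidence = [v for v in net["order"] if v not in evidence]
        state = dict(evidence)
        for v in nonevidence:                          # arbitrary initial state
            state[v] = random.random() < 0.5
        counts = {True: 0, False: 0}
        for _ in range(N):
            for Zi in nonevidence:                     # resample one variable at a time
                pt = markov_blanket_prob(net, Zi, True, state)
                pf = markov_blanket_prob(net, Zi, False, state)
                state[Zi] = random.random() < pt / (pt + pf)
                counts[state[X]] += 1                  # count after every transition
        total = counts[True] + counts[False]
        return {v: c / total for v, c in counts.items()}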
50 Homework 7
- Continue till step 8 with your project.