Title: CIS 730 Introduction to Artificial Intelligence Lecture 10 of 32
KDD Group Research Seminar, Fall 2001
Presentation 2b of 11
Adaptive Importance Sampling on Bayesian
Networks (AIS-BN)
Friday, 05 October 2001
Julie A. Stilson
http://www.cis.ksu.edu/jas3466

Reference:
Cheng, J. and Druzdzel, M. J. (2000). AIS-BN: An Adaptive Importance
Sampling Algorithm for Evidential Reasoning in Large Bayesian Networks.
Journal of Artificial Intelligence Research, 13, 155-188.
Outline
- Basic Algorithm
- Definitions
- Updating importance function
- Example using Sprinkler-Rain
- Why Adaptive Importance Sampling?
- Heuristic initialization
- Sampling with unlikely evidence
- Different Importance Sampling Algorithms
- Forward Sampling (FS)
- Logic Sampling (LS)
- Self-Importance Sampling (SIS)
- Differences between SIS, AIS-BN
- Gathering results
- How RMSE values are collected
- Sample results for FS, AIS-BN
Definitions

- Importance Conditional Probability Tables (ICPTs)
  - Probability tables that represent the learned importance function
  - Initially equal to the CPTs
  - Updated after each updating interval (see below)
- Learning Rate
  - The rate at which the true importance function is being learned
  - Learning rate: LRate(k) = a * (b / a)^(k / kmax)
  - a = initial learning rate, b = learning rate in the last step,
    k = number of updates that have been made, kmax = total number of
    updates that will be made
- Frequency Table
  - Stores the frequency with which each instantiation of each query node occurs
  - Used to update the importance function
- Updating Interval
  - AIS-BN updates the importance function after this many samples
  - If 1000 total samples are to be taken and the updating interval is 100,
    then 10 total updates will be made
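The learning-rate schedule can be sketched in a few lines of Python. The default values a = 0.4 and b = 0.14 are the ones reported in the Cheng and Druzdzel paper, but treat them (and the function name) as illustrative, tunable assumptions:

```python
def learning_rate(k, kmax, a=0.4, b=0.14):
    """AIS-BN learning-rate schedule: LRate(k) = a * (b / a)^(k / kmax).

    a -- initial learning rate (at k = 0)
    b -- learning rate at the last update (k = kmax)
    k -- number of updates made so far, kmax -- total updates planned
    """
    # Geometric decay from a down to b as k goes from 0 to kmax.
    return a * (b / a) ** (k / kmax)
```

Because b < a, the rate decays monotonically, so early updates move the importance function aggressively and later updates only fine-tune it.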
Basic Algorithm

k = 0 (number of updates so far), m = desired number of samples,
l = updating interval

// Learning phase
for (int i = 1; i <= m; i++)
    if (i mod l == 0)
        k++
        update importance function Pr^k(X\E) based on total samples
    generate a sample s according to Pr^k(X\E); add to total samples
    totalweight += Pr(s, e) / Pr^k(s)

// Estimation phase
totalweight = 0; T = null
for (int i = 1; i <= m; i++)
    generate a sample s according to Pr^kmax(X\E); add to total samples
    totalweight += Pr(s, e) / Pr^kmax(s)
    compute RMSE value of s using totalweight
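The two-phase structure of the algorithm can be sketched as a Python skeleton. The callables `sample_from`, `weight_of`, and `update_importance` are hypothetical stand-ins for the network-specific operations; only the control flow is shown:

```python
def ais_bn(sample_from, weight_of, update_importance, m, l):
    r"""Skeleton of the two-phase AIS-BN loop (callable names are
    illustrative stand-ins, not part of the original pseudocode).

    sample_from(k)                -- draw one sample from Pr^k(X\E)
    weight_of(s, k)               -- importance weight Pr(s, e) / Pr^k(s)
    update_importance(k, samples) -- learn Pr^k from the samples so far
    m -- desired number of samples, l -- updating interval
    """
    # Learning phase: refine the importance function every l samples.
    k, samples = 0, []
    for i in range(1, m + 1):
        if i % l == 0:
            k += 1
            update_importance(k, samples)
        samples.append(sample_from(k))

    # Estimation phase: sample only from the final function Pr^kmax.
    total_weight, weighted = 0.0, []
    for _ in range(m):
        s = sample_from(k)
        w = weight_of(s, k)
        total_weight += w
        weighted.append((s, w))
    return weighted, total_weight
```

Separating the phases means the final estimate is built entirely from samples drawn under the best importance function learned, rather than mixing in early, poorly adapted samples.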
Updating Importance Function

- Theorem: Xi in X, Xi not in Anc(E) => Pr(Xi | Pa(Xi), E) = Pr(Xi | Pa(Xi))
  - Proved using d-connectivity
  - Only ancestors of evidence nodes need to have their importance function learned
  - The ICPT tables of all other nodes do not change throughout sampling
- Algorithm for Updating Importance Function
  - Sample l points independently according to the current importance
    function, Pr^k(X\E)
  - For every query node Xi that is an ancestor of evidence, estimate
    Pr'(xi | pa(Xi), e) based on the samples
  - Update Pr^k(X\E) according to the following formula:
    Pr^(k+1)(xi | pa(Xi), e) = Pr^k(xi | pa(Xi), e)
      + LRate(k) * (Pr'(xi | pa(Xi), e) - Pr^k(xi | pa(Xi), e))
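The update formula is a step from the current ICPT entry toward the sample-based estimate, weighted by the learning rate. A minimal sketch for one conditional distribution (list-based, illustrative names):

```python
def update_icpt(icpt, estimate, lrate):
    """One AIS-BN ICPT update for a single conditional distribution.

    icpt     -- current Pr^k(xi | pa(Xi), e), a list of state probabilities
    estimate -- Pr'(xi | pa(Xi), e), estimated from the last l samples
    lrate    -- learning rate LRate(k) for this update
    Returns Pr^(k+1) = Pr^k + lrate * (Pr' - Pr^k), state by state.
    """
    return [p + lrate * (q - p) for p, q in zip(icpt, estimate)]
```

With lrate = 1 the new table simply equals the sample estimate; with lrate = 0 it is unchanged, so the schedule from the Definitions slide controls how much each batch of samples is trusted.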
Example Using Sprinkler-Rain

- Imagine Ground is evidence, instantiated to Wet
- It becomes more probable that Sprinkler is on and that it is raining
- The ICPT tables update the probabilities of the ancestors of the
  evidence nodes to reflect this
Why Adaptive Importance Sampling?

- Heuristic Initialization: Parents of Evidence Nodes
  - Changes the probabilities of the parents of evidence to a uniform
    distribution when the probability of that evidence is sufficiently small
  - Parents of evidence nodes are most affected by the instantiation of evidence
  - A uniform distribution helps the importance function be learned faster
- Heuristic Initialization: Extremely Small Probabilities
  - Extremely low probabilities would usually not be sampled much
  - Slow to learn the true importance function
  - AIS-BN raises extremely low probabilities to a set threshold and lowers
    extremely high probabilities accordingly
- Sampling with Unlikely Evidence
  - The importance function is very different from the CPTs when evidence is unlikely
  - Difficult to sample accurately without changing the probability distributions
  - AIS-BN performs better than other sampling algorithms with unlikely evidence
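The small-probability heuristic can be sketched as a thresholding step followed by renormalization; the threshold value below is an illustrative assumption, not the paper's exact setting:

```python
def apply_threshold(probs, theta=0.04):
    """Heuristic initialization sketch: lift every probability below the
    threshold theta up to theta (theta's value is an assumption), then
    renormalize so the higher probabilities are lowered accordingly."""
    lifted = [max(p, theta) for p in probs]
    total = sum(lifted)
    return [p / total for p in lifted]
```

After this step, even rare states receive enough samples for their ICPT entries to be estimated, which speeds up learning of the true importance function.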
Different Importance Sampling Algorithms

- Forward Sampling / Likelihood Weighting (FS)
  - Similar to AIS-BN, but the importance function is not learned
  - Performs well under most circumstances
  - Doesn't do well when evidence is unlikely
- Logic Sampling (LS)
  - The network is sampled randomly without regard to evidence
  - Samples that don't match the evidence are then discarded
  - The simplest importance sampling algorithm
  - Also performs poorly with unlikely evidence
  - Inefficient when many nodes are evidence
- Self-Importance Sampling (SIS)
  - Also updates an importance function
  - Does not obtain samples from the learned importance function
  - Updates to the importance function do not use sampling information
  - For large numbers of samples, performs worse than FS
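For contrast with AIS-BN, likelihood weighting can be sketched on a toy two-node Rain -> Wet network (all probabilities here are illustrative, not from the slides): evidence nodes are clamped rather than sampled, non-evidence nodes are sampled forward from their CPTs, and each sample is weighted by the likelihood of the evidence given its parents.

```python
import random

def likelihood_weighting(n, p_rain=0.2, p_wet_given=(0.1, 0.9),
                         evidence_wet=True, seed=0):
    """Likelihood-weighting sketch on a toy Rain -> Wet network.

    p_rain          -- prior P(Rain = true)
    p_wet_given     -- (P(Wet | not Rain), P(Wet | Rain))
    Returns the weighted estimate of P(Rain | Wet = evidence_wet).
    """
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        # Sample the non-evidence node from its CPT.
        rain = rng.random() < p_rain
        # Weight the sample by the likelihood of the clamped evidence.
        w = p_wet_given[rain] if evidence_wet else 1 - p_wet_given[rain]
        num += w * rain
        den += w
    return num / den
```

Because the Rain node is still sampled from its unadapted prior, most samples land in low-weight regions when the evidence is unlikely, which is exactly the weakness AIS-BN's learned importance function addresses.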
Gathering Results

- Relative Root Mean Square Error
  - P(θi) is the exact probability of sample state θi
  - P̂(θi) is the estimated probability of the sample state,
    taken from the frequency table
  - M = arity, T = number of samples
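The relative RMSE can be sketched directly from these ingredients. The formula itself did not survive extraction, so the per-state normalization by the exact probability used below is an assumption, chosen to match the word "relative":

```python
from math import sqrt

def relative_rmse(exact, estimated):
    """Relative root mean square error between exact probabilities
    P(theta_i) and frequency-table estimates P_hat(theta_i), averaged
    over the states (the arity M of the query node)."""
    terms = [((q - p) / p) ** 2 for p, q in zip(exact, estimated)]
    return sqrt(sum(terms) / len(terms))
```

Dividing each error by the exact probability keeps errors on rare states from being drowned out by errors on common ones.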
- RMSE Collection
  - The relative RMSE is computed for each sample
  - Each RMSE value is stored in an output file, printings.txt
- Graphing Results
  - Open the output file in Excel
  - Graph the results using the Chart tool
- Example Chart
  - ALARM network, 10000 samples
  - Compares FS, AIS-BN