1
Being Bayesian about Network Structure
  • Nir Friedman, Hebrew Univ.
  • Daphne Koller, Stanford Univ.

2
Structure Discovery
  • Current practice: model selection
  • Pick a single (high-scoring) model
  • Use that model to represent the domain structure
  • Enough data ⇒ the right model is overwhelmingly
    likely
  • But what about the rest of the time?
  • Many high-scoring models
  • An answer based on one model is often useless
  • Bayesian model averaging is the Bayesian ideal

P(f | D) = Σ_G f(G) P(G | D), where f is a feature of G, e.g., the edge X → Y
3
Model Averaging
  • Unfortunately, it is intractable
  • The number of possible structures is
    superexponential
  • That's why no one really does it
  • Our contribution
  • A closed-form solution for a fixed ordering over
    the nodes
  • MCMC over orderings for the general case
  • Faster convergence, robust results

Exceptions: Madigan & Raftery, Madigan & York (see below)
4
Fixed Ordering
  • Suppose that
  • we know the ordering ≺ of the variables
  • say, X1 > X2 > X3 > X4 > … > Xn
  • so the parents of Xi must be in {X1, …, Xi-1}
  • Limit the number of parents per node to k
  • Intuition
  • The order decouples the choices of parents
  • The choice of parents for X7 does not restrict the
    choice of parents for X12
  • We can exploit this to simplify the form of P(D | ≺)

≈ 2^(kn·log n) networks
5
Ordering ⇒ Computing P(D | ≺)
  • Ui,≺ = the set of possible parent sets for Xi
  • consistent with ≺
  • each of size at most k
  • Then P(D | ≺) = ∏i Σ_{U ∈ Ui,≺} score(Xi, U ; D)

Independence of families
Small number of potential families per node
Efficient closed-form summation over an exponential
number of structures (sketched below)
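This product-of-sums has a direct implementation. Below is a minimal Python sketch, not the authors' code: `family_log_score(node, parents)` is a hypothetical function assumed to return the log family score of a node with a given parent set (e.g., a BDe marginal-likelihood term).

```python
from itertools import combinations
from math import exp, log

def log_marginal_given_order(order, k, family_log_score):
    """Closed-form log P(D | ordering): a product over nodes of the sum
    of family scores over all parent sets consistent with the ordering."""
    total = 0.0
    for pos, node in enumerate(order):
        candidates = order[:pos]            # parents must precede the node
        log_terms = [
            family_log_score(node, parents)
            for size in range(min(k, len(candidates)) + 1)
            for parents in combinations(candidates, size)
        ]
        # log-sum-exp over the node's consistent families
        m = max(log_terms)
        total += m + log(sum(exp(t - m) for t in log_terms))
    return total
```

The per-node sums are independent, which is exactly the decoupling the ordering buys: the double loop touches at most O(n^k) families per node instead of a superexponential number of graphs.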
6
MCMC over Models
  • Cannot enumerate structures, so sample structures
  • MCMC sampling
  • Define a Markov chain over BN models (a minimal
    sketch below)
  • Run the chain to get samples from the posterior
    P(G | D)
  • Possible pitfalls
  • huge number of models
  • mixing rate (and the required burn-in) unknown
  • islands of high posterior, connected by low
    bridges
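For concreteness, a bare-bones sketch of such a chain in Python, with single-edge add/delete proposals. `log_score` is a hypothetical function assumed to return log P(G)P(D | G) for an edge set (and -inf for cyclic graphs); this illustrates structure-MCMC in general, not the exact chain used in the experiments.

```python
import math
import random

def mcmc_over_structures(n, log_score, n_iters, seed=0):
    """Metropolis-Hastings over DAG structures: propose adding or
    deleting a single edge, accept with the usual MH ratio."""
    rng = random.Random(seed)
    edges = frozenset()                     # start from the empty graph
    cur = log_score(edges)
    samples = []
    for _ in range(n_iters):
        i, j = rng.sample(range(n), 2)      # a random ordered pair
        e = (i, j)
        proposal = edges - {e} if e in edges else edges | {e}
        new = log_score(proposal)           # -inf if the proposal is cyclic
        # the proposal is symmetric, so only the score ratio matters
        if math.log(rng.random()) < new - cur:
            edges, cur = proposal, new
        samples.append(edges)
    return samples
```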

7
ICU-Alarm BN: No Mixing
  • However, with 500 instances
  • the runs clearly do not mix.

[Plot: score of current sample vs. MCMC iteration]
8
Effects of Non-Mixing
  • Two MCMC runs over the same 500 instances
  • Probability estimates for Markov features
  • based on 50 networks sampled from the MCMC process
  • Probability estimates are highly variable and
    non-robust

[Plots: estimates from run 1 vs. run 2; initialization: true BN vs. random, and true BN vs. true BN]
9
Our Approach: Sample Orderings
  • We can write P(f | D) = Σ_≺ P(f | ≺, D) P(≺ | D)
  • Comment: the structure prior P(G) changes
  • from a uniform prior over structures to a uniform
    prior over orderings and, given an ordering, over
    the structures consistent with it
  • Sample orderings ≺1, …, ≺K and approximate
    P(f | D) ≈ (1/K) Σk P(f | ≺k, D)  (sketched below)
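A one-line Monte Carlo estimator makes the approximation concrete. `feature_given_order` is a hypothetical callable assumed to compute the closed-form P(f | ≺, D) for one sampled ordering (as on the slides that follow).

```python
def feature_posterior(sampled_orderings, feature_given_order):
    """P(f | D) ≈ (1/K) * sum_k P(f | ordering_k, D): average the
    closed-form per-ordering posteriors over the MCMC samples."""
    K = len(sampled_orderings)
    return sum(feature_given_order(o) for o in sampled_orderings) / K
```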

10
MCMC Over Orderings
  • Use the Metropolis-Hastings algorithm
  • Specify a proposal distribution q(≺′ | ≺)
  • flip: (i1 … ij … ik … in) → (i1 … ik … ij … in)
  • cut: (i1 … ij | ij+1 … in) → (ij+1 … in | i1 … ij)
  • Each iteration
  • Sample ≺′ from q(≺′ | ≺)
  • move ≺ → ≺′ with probability
    min[1, (P(D | ≺′) P(≺′) q(≺ | ≺′)) / (P(D | ≺) P(≺) q(≺′ | ≺))]
  • Since the priors are uniform and the proposals are
    symmetric, this reduces to min[1, P(D | ≺′) / P(D | ≺)]
  • Efficient computation! (see the sketch below)
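A minimal sketch of this sampler, assuming a `log_marginal(ordering)` that evaluates log P(D | ≺) in closed form (as sketched after slide 5). Both flip and cut are symmetric moves, so with uniform priors only the marginal-likelihood ratio is needed.

```python
import math
import random

def mcmc_over_orderings(order, log_marginal, n_iters, seed=0):
    """Metropolis-Hastings over orderings with 'flip' and 'cut' moves.
    Symmetric proposals + uniform priors mean the acceptance probability
    reduces to min(1, P(D | order') / P(D | order))."""
    rng = random.Random(seed)
    order = list(order)
    cur = log_marginal(order)
    samples = []
    for _ in range(n_iters):
        prop = order[:]
        if rng.random() < 0.5:
            # flip: swap the variables at two random positions
            i, j = rng.sample(range(len(prop)), 2)
            prop[i], prop[j] = prop[j], prop[i]
        else:
            # cut: split the ordering at j and swap the two halves
            j = rng.randrange(1, len(prop))
            prop = prop[j:] + prop[:j]
        new = log_marginal(prop)
        if math.log(rng.random()) < new - cur:
            order, cur = prop, new
        samples.append(order[:])
    return samples
```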

11
Why Ordering Helps
  • Smaller space
  • Significant reduction in the size of the sample
    space
  • Better-structured space
  • We can get from one ordering to another in a
    (relatively) small number of steps
  • Smoother posterior landscape
  • The score of an ordering is a sum over many networks
  • No ordering is horrendous ⇒ no islands of
    high posterior separated by a deep blue sea

12
Mixing with MCMC-Orderings
  • 4 runs on ICU-Alarm with 500 instances
  • fewer iterations than MCMC-Nets
  • approximately the same amount of computation
  • The process is clearly mixing!

[Plot: score of current sample vs. MCMC iteration]
13
Mixing of MCMC runs
  • Two MCMC runs over the same 500 instances
  • Probability estimates for Markov features
  • based on 50 networks sampled from the MCMC process
  • Probability estimates are very robust

[Plots: run 1 vs. run 2 estimates, for 100 and 1000 instances]
14
Computing Feature Posteriors P(f | ≺, D)
  • Edges: P(Xj → Xi | ≺, D) =
    Σ_{U ∈ Ui,≺ : Xj ∈ U} score(Xi, U ; D) / Σ_{U ∈ Ui,≺} score(Xi, U ; D)
  • Markov blanket: Y and Z are Markov neighbors if
    they are adjacent (Y → Z or Z → Y) or both are
    parents of some X
  • The posteriors of these features are independent
    given the ordering
  • Other features (e.g., the existence of a causal path)
  • Sample networks from the ordering
  • Estimate the features from the sampled networks
    (see the sketch below)
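For the edge case, the per-ordering posterior is the scored fraction of the child's consistent families that contain the candidate parent. A sketch, reusing the hypothetical family score from earlier (here in probability rather than log space, for brevity):

```python
from itertools import combinations

def edge_posterior(parent, child, order, k, family_score):
    """Closed-form P(parent -> child | ordering, D): the scored weight
    of the child's consistent parent sets that contain `parent`,
    normalized by the total weight of all its consistent parent sets."""
    pos = order.index(child)
    candidates = order[:pos]                # parents must precede child
    total = with_edge = 0.0
    for size in range(min(k, len(candidates)) + 1):
        for parents in combinations(candidates, size):
            s = family_score(child, parents)
            total += s
            if parent in parents:
                with_edge += s
    return with_edge / total
```

Note that if `parent` does not precede `child` in the ordering, the numerator is zero, as it should be: no consistent structure contains that edge.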

15
Feature Reconstruction (ICU-Alarm): Markov Features
Reconstruct the true features of the generating network

[Plots: false negatives and false positives]
16
Feature Reconstruction (ICU-Alarm): Path Features
[Plots: path feature reconstruction, counts on a 0–200 scale]
17
Conclusion
  • Full Bayesian model averaging is tractable for a
    known ordering
  • MCMC over orderings allows a robust approximation
    to full Bayesian averaging over Bayesian networks
  • rapid and reliable mixing
  • robust, reliable estimates of the probabilities of
    structural features
  • Crucial for structure discovery in domains with
    limited data
  • e.g., biological discovery