1
Being Bayesian about Network Structure
  • Nir Friedman, Hebrew Univ.
  • Daphne Koller, Stanford Univ.

2
Structure Discovery
  • Current practice: model selection
  • Pick a single (high-scoring) model
  • Use that model to represent the domain structure
  • Enough data ⇒ the right model is overwhelmingly
    likely
  • But what about the rest of the time?
  • Many high-scoring models
  • An answer based on one model is often useless
  • Bayesian model averaging is the Bayesian ideal

P(f | D) = Σ_G f(G) P(G | D), where f is a feature of G, e.g., the edge X → Y
3
Model Averaging
  • Unfortunately, it is intractable
  • The number of possible structures is
    superexponential
  • That's why no one really does it
  • Our contribution
  • A closed-form solution for a fixed ordering over
    the nodes
  • MCMC over orderings for the general case
  • Faster convergence, robust results

Exceptions: Madigan & Raftery, Madigan & York (see below)
4
Fixed Ordering
  • Suppose that
  • we know the ordering ≺ of the variables
  • say, X1 > X2 > X3 > X4 > … > Xn
  • so the parents of Xi must be in {X1, …, Xi-1}
  • Limit the number of parents per node to k
  • Intuition
  • The order decouples the choices of parents
  • The choice of parents for X7 does not restrict the
    choice of parents for X12
  • We can exploit this to simplify the form of P(D | ≺)

≈ 2^(kn·log n) networks
5
Ordering ⇒ Computing P(D | ≺)
  • Ui,≺ = the set of possible parent sets for Xi
  • consistent with ≺
  • each of size at most k
  • Then P(D | ≺) = ∏i Σ_{U ∈ Ui,≺} score(Xi, U ; D)

Independence of families
Small number of potential families per node
Efficient closed-form summation over an exponential
number of structures (sketched below)
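This product-of-sums has a direct implementation. Below is a minimal Python sketch, not the authors' code: `family_log_score(node, parents)` is a hypothetical function assumed to return the log family score of a node with a given parent set (e.g., a BDe marginal-likelihood term).

```python
from itertools import combinations
from math import exp, log

def log_marginal_given_order(order, k, family_log_score):
    """Closed-form log P(D | ordering): a product over nodes of the sum
    of family scores over all parent sets consistent with the ordering."""
    total = 0.0
    for pos, node in enumerate(order):
        candidates = order[:pos]            # parents must precede the node
        log_terms = [
            family_log_score(node, parents)
            for size in range(min(k, len(candidates)) + 1)
            for parents in combinations(candidates, size)
        ]
        # log-sum-exp over the node's consistent families
        m = max(log_terms)
        total += m + log(sum(exp(t - m) for t in log_terms))
    return total
```

The per-node sums are independent, which is exactly the decoupling the ordering buys: the double loop touches at most O(n^k) families per node instead of a superexponential number of graphs.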
6
MCMC over Models
  • Cannot enumerate structures, so sample structures
  • MCMC sampling
  • Define a Markov chain over BN models (a minimal
    sketch below)
  • Run the chain to get samples from the posterior
    P(G | D)
  • Possible pitfalls
  • huge number of models
  • mixing rate (and the required burn-in) unknown
  • islands of high posterior, connected by low
    bridges
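For concreteness, a bare-bones sketch of such a chain in Python, with single-edge add/delete proposals. `log_score` is a hypothetical function assumed to return log P(G)P(D | G) for an edge set (and -inf for cyclic graphs); this illustrates structure-MCMC in general, not the exact chain used in the experiments.

```python
import math
import random

def mcmc_over_structures(n, log_score, n_iters, seed=0):
    """Metropolis-Hastings over DAG structures: propose adding or
    deleting a single edge, accept with the usual MH ratio."""
    rng = random.Random(seed)
    edges = frozenset()                     # start from the empty graph
    cur = log_score(edges)
    samples = []
    for _ in range(n_iters):
        i, j = rng.sample(range(n), 2)      # a random ordered pair
        e = (i, j)
        proposal = edges - {e} if e in edges else edges | {e}
        new = log_score(proposal)           # -inf if the proposal is cyclic
        # the proposal is symmetric, so only the score ratio matters
        if math.log(rng.random()) < new - cur:
            edges, cur = proposal, new
        samples.append(edges)
    return samples
```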

7
ICU-Alarm BN: No Mixing
  • However, with 500 instances
  • the runs clearly do not mix.

[Plot: score of current sample vs. MCMC iteration]
8
Effects of Non-Mixing
  • Two MCMC runs over the same 500 instances
  • Probability estimates for Markov features
  • based on 50 networks sampled from the MCMC process
  • Probability estimates are highly variable and
    non-robust

[Plots: estimates from run 1 vs. run 2; initialization: true BN vs. random, and true BN vs. true BN]
9
Our Approach: Sample Orderings
  • We can write P(f | D) = Σ_≺ P(f | ≺, D) P(≺ | D)
  • Comment: the structure prior P(G) changes
  • from a uniform prior over structures to a uniform
    prior over orderings and, given an ordering, over
    the structures consistent with it
  • Sample orderings ≺1, …, ≺K and approximate
    P(f | D) ≈ (1/K) Σk P(f | ≺k, D)  (sketched below)
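A one-line Monte Carlo estimator makes the approximation concrete. `feature_given_order` is a hypothetical callable assumed to compute the closed-form P(f | ≺, D) for one sampled ordering (as on the slides that follow).

```python
def feature_posterior(sampled_orderings, feature_given_order):
    """P(f | D) ≈ (1/K) * sum_k P(f | ordering_k, D): average the
    closed-form per-ordering posteriors over the MCMC samples."""
    K = len(sampled_orderings)
    return sum(feature_given_order(o) for o in sampled_orderings) / K
```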

10
MCMC Over Orderings
  • Use the Metropolis-Hastings algorithm
  • Specify a proposal distribution q(≺′ | ≺)
  • flip: (i1 … ij … ik … in) → (i1 … ik … ij … in)
  • cut: (i1 … ij | ij+1 … in) → (ij+1 … in | i1 … ij)
  • Each iteration
  • Sample ≺′ from q(≺′ | ≺)
  • move ≺ → ≺′ with probability
    min[1, (P(D | ≺′) P(≺′) q(≺ | ≺′)) / (P(D | ≺) P(≺) q(≺′ | ≺))]
  • Since the priors are uniform and the proposals are
    symmetric, this reduces to min[1, P(D | ≺′) / P(D | ≺)]
  • Efficient computation! (see the sketch below)
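A minimal sketch of this sampler, assuming a `log_marginal(ordering)` that evaluates log P(D | ≺) in closed form (as sketched after slide 5). Both flip and cut are symmetric moves, so with uniform priors only the marginal-likelihood ratio is needed.

```python
import math
import random

def mcmc_over_orderings(order, log_marginal, n_iters, seed=0):
    """Metropolis-Hastings over orderings with 'flip' and 'cut' moves.
    Symmetric proposals + uniform priors mean the acceptance probability
    reduces to min(1, P(D | order') / P(D | order))."""
    rng = random.Random(seed)
    order = list(order)
    cur = log_marginal(order)
    samples = []
    for _ in range(n_iters):
        prop = order[:]
        if rng.random() < 0.5:
            # flip: swap the variables at two random positions
            i, j = rng.sample(range(len(prop)), 2)
            prop[i], prop[j] = prop[j], prop[i]
        else:
            # cut: split the ordering at j and swap the two halves
            j = rng.randrange(1, len(prop))
            prop = prop[j:] + prop[:j]
        new = log_marginal(prop)
        if math.log(rng.random()) < new - cur:
            order, cur = prop, new
        samples.append(order[:])
    return samples
```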

11
Why Ordering Helps
  • Smaller space
  • Significant reduction in the size of the sample
    space
  • Better-structured space
  • We can get from one ordering to another in a
    (relatively) small number of steps
  • Smoother posterior landscape
  • The score of an ordering is a sum over many networks
  • No ordering is horrendous ⇒ no islands of
    high posterior separated by a deep blue sea

12
Mixing with MCMC-Orderings
  • 4 runs on ICU-Alarm with 500 instances
  • fewer iterations than MCMC-Nets
  • approximately the same amount of computation
  • The process is clearly mixing!

[Plot: score of current sample vs. MCMC iteration]
13
Mixing of MCMC runs
  • Two MCMC runs over the same 500 instances
  • Probability estimates for Markov features
  • based on 50 networks sampled from the MCMC process
  • Probability estimates are very robust

[Plots: run 1 vs. run 2 estimates, for 100 and 1000 instances]
14
Computing Feature Posteriors P(f | ≺, D)
  • Edges: P(Xj → Xi | ≺, D) =
    Σ_{U ∈ Ui,≺ : Xj ∈ U} score(Xi, U ; D) / Σ_{U ∈ Ui,≺} score(Xi, U ; D)
  • Markov blanket: Y and Z are Markov neighbors if
    they are adjacent (Y → Z or Z → Y) or both are
    parents of some X
  • The posteriors of these features are independent
    given the ordering
  • Other features (e.g., the existence of a causal path)
  • Sample networks from the ordering
  • Estimate the features from the sampled networks
    (see the sketch below)
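For the edge case, the per-ordering posterior is the scored fraction of the child's consistent families that contain the candidate parent. A sketch, reusing the hypothetical family score from earlier (here in probability rather than log space, for brevity):

```python
from itertools import combinations

def edge_posterior(parent, child, order, k, family_score):
    """Closed-form P(parent -> child | ordering, D): the scored weight
    of the child's consistent parent sets that contain `parent`,
    normalized by the total weight of all its consistent parent sets."""
    pos = order.index(child)
    candidates = order[:pos]                # parents must precede child
    total = with_edge = 0.0
    for size in range(min(k, len(candidates)) + 1):
        for parents in combinations(candidates, size):
            s = family_score(child, parents)
            total += s
            if parent in parents:
                with_edge += s
    return with_edge / total
```

Note that if `parent` does not precede `child` in the ordering, the numerator is zero, as it should be: no consistent structure contains that edge.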

15
Feature Reconstruction (ICU-Alarm): Markov Features
Reconstruct the true features of the generating network

[Plots: false negatives and false positives]
16
Feature Reconstruction (ICU-Alarm): Path Features
[Plots: path feature reconstruction, counts on a 0–200 scale]
17
Conclusion
  • Full Bayesian model averaging is tractable for a
    known ordering
  • MCMC over orderings allows a robust approximation
    to full Bayesian averaging over Bayesian networks
  • rapid and reliable mixing
  • robust, reliable estimates of the probabilities of
    structural features
  • Crucial for structure discovery in domains with
    limited data
  • e.g., biological discovery