Transcript and Presenter's Notes

Title: Fast Exact Bayes Net Structure Learning


1
Fast Exact Bayes Net Structure Learning
(relatively speaking)
  • Daniel Eaton
  • Tuesday Oct 31, 2006

2
What structure learning (SL) is

[Figure: Data over variables A and B is mapped onto a space of models, either by optimizing for a single best DAG or by integrating to get a posterior over DAGs, e.g. p(G) = 0.1, 0.2, 0.7 for three candidate two-node structures; e.g. Dirichlet-multinomial DAGs]
3
What SL is for this talk
  • Have complete exchangeable data
  • M cases, N variables
  • Continuous or discrete
  • Model space: DAGs with a CPD for each node (e.g.
    multinomial or linear Gaussian)
  • plus a prior on parameters p(θ | G) and on DAGs p(G)

4
What SL is for this talk
  • Goal: determine the posterior over DAGs given the data
  • For each DAG G
  • compute the marginal likelihood p(data | G) by
    integrating out the parameters θ
  • easy for conjugate-prior models (see the sketch below)
  • e.g. Dirichlet-multinomial
  • p(G | data) ∝ p(data | G) p(G)
  • Unfortunately ...
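For concreteness, a minimal sketch (not the talk's code) of the closed-form Dirichlet-multinomial marginal likelihood for a single node given a candidate parent set; p(data | G) is the product of such terms over the nodes. The symmetric hyperparameter alpha and the counts layout are illustrative assumptions.

    import numpy as np
    from scipy.special import gammaln

    def node_marginal_loglik(counts, alpha=1.0):
        """Log p(x_i | parents) for one node under a Dirichlet-multinomial CPD.

        counts[j, k] = number of cases with parent configuration j and x_i = k.
        alpha        = symmetric Dirichlet pseudo-count per table cell (assumed).
        """
        counts = np.asarray(counts, dtype=float)
        r = counts.shape[1]                      # number of states of node i
        n_j = counts.sum(axis=1)                 # cases seen per parent configuration
        ll = np.sum(gammaln(r * alpha) - gammaln(r * alpha + n_j))
        ll += np.sum(gammaln(alpha + counts) - gammaln(alpha))
        return ll

    # Example: binary node with one binary parent, 100 cases;
    # rows = parent off/on, columns = node off/on.
    print(node_marginal_loglik(np.array([[30, 10], [5, 55]])))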

5
Why SL is hard
  • Number of DAGs is super-exponential in the number of nodes
  • 7-node DAGs already number over a billion (see the counting
    sketch below) ...
  • Even with a tight graph representation (1 bit/edge) they
    take 6.3 GB to store
  • Storing a posterior probability for each takes 8.2 GB
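To make "super-exponential" concrete, a small counting sketch (my addition, not from the slides) using Robinson's recurrence for the number of labelled DAGs:

    from math import comb

    def num_dags(n):
        """Number of labelled DAGs on n nodes, via Robinson's recurrence:
        a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a(n-k), a(0) = 1."""
        a = [1]
        for m in range(1, n + 1):
            a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                         for k in range(1, m + 1)))
        return a[n]

    print(num_dags(7))   # 1138779265, already over a billion structures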

6
Possible resolutions
  • Random-walk MCMC on DAG space
  • "Posterior landscape" too big and bumpy: bad
    mixing, doesn't work for N > 10
  • Never represent DAGs: condition on node orderings
    (Buntine 1991)
  • Compute marginal probability of "graph features"
  • Friedman & Koller 2003 → MCMC
  • Koivisto & Sood 2004 → Exact!

7
Graph features
  • Just an indicator function f(G) on DAGs which is 1 iff a
    particular structure (e.g. an edge) exists in the graph,
    0 otherwise
  • Assume N = 3; we want the marginal probability that the
    edge A → B exists
  • Of course, naively, this sum over all DAGs is difficult for
    N > 5 (see the brute-force sketch after the figure) ...

[Figure: three 3-node DAGs over A, B, C with the edge feature evaluated on each, f_AB(G) = 1, 0, 1 respectively]
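A hypothetical brute-force sketch of the sum the slide refers to: enumerate every DAG, weight each by an unnormalized posterior score, and average the edge indicator. The log_score argument stands in for log p(data | G) + log p(G); the dummy score in the usage line is purely illustrative.

    import itertools
    import numpy as np

    def is_dag(A):
        """True iff the directed graph with adjacency matrix A has no cycles."""
        n = len(A)
        M = np.eye(n, dtype=int)
        for _ in range(n):
            M = M @ A
            if np.trace(M) != 0:          # trace(A^k) > 0 signals a length-k cycle
                return False
        return True

    def all_dags(n):
        """Yield adjacency matrices of every labelled DAG on n nodes (tiny n only)."""
        pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
        for bits in itertools.product((0, 1), repeat=len(pairs)):
            A = np.zeros((n, n), dtype=int)
            for (i, j), b in zip(pairs, bits):
                A[i, j] = b
            if is_dag(A):
                yield A

    def edge_posterior(n, log_score, a, b):
        """p(edge a -> b | data): average the indicator f_ab over the DAG posterior."""
        logs, feats = [], []
        for A in all_dags(n):
            logs.append(log_score(A))
            feats.append(A[a, b])
        logs = np.asarray(logs, dtype=float)
        w = np.exp(logs - logs.max())     # unnormalized posterior weights
        return float(np.dot(w, feats) / w.sum())

    # Dummy log-score favouring sparse graphs; it stands in for
    # log p(data | G) + log p(G), which the talk evaluates in closed form.
    print(edge_posterior(3, lambda A: -float(A.sum()), a=0, b=1))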
8
Conditioning on orders (order-trick)
  • An order on the variables is just a permutation of the
    index set
  • e.g. for a 3-node graph: (1,2,3), (2,1,3), (3,2,1), ...
  • Only N! of these!
  • For a fixed order
  • Intuition: each node can be considered independently of
    the others (acyclicity is ensured by the ordering)

9
Order-trick feature probability
  • Ultimately we want to determine p(f | data)
  • For now, assume the order ≺ is known, so we go for
    p(f | data, ≺)
  • we know how to do the denominator
  • the numerator depends on the feature f
  • for f a single edge (A → B), it's easy (see the note below)
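The formula images on this slide did not survive extraction; here is a sketch of why the single-edge case is easy, in the per-node notation used later in the talk (my reconstruction, not the original slide):

    \[
      f_i(G_i) = 1 \quad (i \neq B), \qquad
      f_B(G_B) = \mathbf{1}\,[\, A \in G_B \,]
    \]

The feature factorizes over nodes, so the numerator is evaluated exactly like the denominator, with this indicator simply folded into node B's sum over candidate parent sets.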

10
How to use order-trick in practice
  • Except in very special (perhaps bio) cases we
    don't know the node order a priori
  • Must sum over N! orders
  • Ouch, still super-exponential!
  • 1. Sample orders with MCMC (Koller/Friedman)
  • Sampler mixes much better than in DAG-space
  • Order-space smaller and less bumpy
  • 2. Do it exactly with dynamic programming!
  • Koivisto & Sood (2004)

11
Koivisto
  • Recognize that although there are N! orders to
    sum over, there is much redundant computation
  • Will allow us to compute the marginal probability of a
    particular feature in O(N·2^N + N·2^(N-1)·C(M)) time, where
    C(M) is the cost of one local marginal-likelihood
    computation from the M cases
  • All edge marginal probabilities in O(N^3·2^N) time (naive)
  • Or in O(N·2^N) time with a recent (2006) extension
    that I won't cover today

12
Koivisto
  • Consider p(f | data) = p(f, data) / p(data)
  • just need a way to evaluate the numerator p(f, data)
  • since the denominator p(data) can be computed by setting
    f ≡ 1

13
  • To simplify the derivation, assume a uniform prior
  • Please accept that p(f, data) can be written as a sum over
    orders of per-node terms (proof later, if requested)
  • Introduce the per-node sums α_i(·)
  • so that each order contributes a product of α terms (see
    the sketch below)

key: each term is modular
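The equation images from this slide are missing from the transcript; the following is a sketch of what they correspond to in the notation of Koivisto & Sood (2004), which this talk follows (my reconstruction: U_i(≺) is the set of predecessors of node i under the order ≺, G_i is node i's parent set, f_i and p(G_i) are the per-node factors of the feature and of the modular structure prior):

    \[
      p(f, \text{data}) \;\propto\; \sum_{\prec}\; \prod_{i=1}^{N}\;
        \sum_{G_i \subseteq U_i(\prec)}
          p(x_i \mid x_{G_i})\, f_i(G_i)\, p(G_i)
    \]
    \[
      \alpha_i(S) \;=\; \sum_{G_i \subseteq S}
          p(x_i \mid x_{G_i})\, f_i(G_i)\, p(G_i)
      \qquad\Longrightarrow\qquad
      p(f, \text{data}) \;\propto\; \sum_{\prec}\; \prod_{i=1}^{N}
          \alpha_i\big(U_i(\prec)\big)
    \]

Each factor depends only on node i and its predecessor set; this is what "modular" means here, and it is what makes the dynamic program on the next slides possible.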
14
  • Brute force

[Figure: brute-force enumeration tree of all ordered sets over {1,2,3}, from () through (1), (2), (3), (1,2), ..., down to the 3! complete orders (1,2,3), ..., (3,2,1); cost O(N·N!). Each complete order contributes a product such as α_1(∅)·α_2({1})·α_3({1,2}) or α_2(∅)·α_1({2})·α_3({1,2}), and shared factors can be grouped, e.g. α_3({1,2}) × ( α_1(∅)·α_2({1}) + α_2(∅)·α_1({2}) ), which suggests a DP recurrence ...]
15
  • DP


O(N·2^N)
[Figure: the same sum arranged over unordered sets, i.e. the subset lattice ∅; {1}, {2}, {3}; {1,2}, {1,3}, {2,3}; {1,2,3}, each visited once, giving O(N·2^N) work instead of O(N·N!)]
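A minimal sketch of the subset dynamic program the lattice above illustrates; the alpha argument (node index, then frozenset of predecessors, mapping to the α_i value) is an assumed representation, not the talk's actual data structure.

    from itertools import combinations

    def sum_over_orders(alpha, n):
        """g(V) = sum over all N! orders of prod_i alpha_i(predecessors of i),
        computed over unordered subsets in O(N * 2^N) instead of O(N * N!)."""
        g = {frozenset(): 1.0}
        for size in range(1, n + 1):
            for S in map(frozenset, combinations(range(n), size)):
                # group the orders of S by which node comes last; its
                # predecessors are then exactly S \ {i}
                g[S] = sum(alpha[i][S - {i}] * g[S - {i}] for i in S)
        return g[frozenset(range(n))]

    # Tiny 2-node check against the brute-force sum over the two orders:
    alpha = {0: {frozenset(): 1.0, frozenset({1}): 2.0},
             1: {frozenset(): 3.0, frozenset({0}): 4.0}}
    print(sum_over_orders(alpha, 2))   # 1*4 + 3*2 = 10.0

For N = 3 this reproduces the grouping shown on the previous slide, collecting all orders that end in the same node.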
16
alpha
naive: O(N·3^N)
  • We assumed alpha was easy to compute
  • for each i, there are 2^N values α_i(·)
  • "Möbius transform"
  • There is a clever algorithm to compute each α_i in
    O(2^N) time (so O(N·2^N) overall)
  • Intuition (2-node case)

[Figure: butterfly diagram of the transform, the 2-node case over bit strings 00, 01, 10, 11 and the 3-node case over 000 ... 111, showing how partial sums are precomputed and reused one bit position at a time]
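A minimal sketch of the O(N·2^N) subset-sum sweep that the bit-string diagram depicts (often called the fast Möbius or zeta transform); encoding parent sets as bitmasks is an assumption of this sketch.

    def fast_subset_sums(beta, n):
        """Given beta[G] for every parent set G (encoded as an n-bit mask),
        return alpha with alpha[S] = sum of beta[G] over all G subset of S.
        One pass per bit position: O(n * 2^n) additions instead of O(3^n)."""
        alpha = list(beta)
        for j in range(n):
            bit = 1 << j
            for S in range(1 << n):
                if S & bit:
                    alpha[S] += alpha[S ^ bit]   # fold in sums that exclude bit j
        return alpha

    # 2-node intuition: alpha[0b11] ends up as beta[00]+beta[01]+beta[10]+beta[11]
    print(fast_subset_sums([1.0, 2.0, 3.0, 4.0], n=2))   # -> [1.0, 3.0, 4.0, 10.0]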
17
Koivisto summary
  • For each node, for each possible parent set, compute
    ML(i, G_i), f_i(G_i), p(G_i) (each an N × 2^N array)
  • Compute Betas
  • β_f = ML · f · p(G) (elementwise)
  • β_1 = ML · p(G) (trivial feature f ≡ 1, used for
    normalization)
  • Compute Alphas
  • For each node, compute α_{f,i}(·) and α_{1,i}(·) (1 × 2^N
    vectors) via the subset-sum transform
  • Compute g_f(V) using α_f and g_1(V) using α_1; their ratio
    gives p(f | data)

O(N·2^(N-1)·C(M))   (marginal likelihoods)
O(N·2^N)            (betas)
O(N·2^N)            (alphas)
O(N·2^N)            (g)
Total: O(N·2^N + N·2^(N-1)·C(M))
18
Timing
[Figure: timing results, with markers at 30 s, 30 min, and 1 day]
19
Order-trick limitations
  • Graph structure prior must be modular
  • Breaks Markov equivalence
  • Cannot have arbitrary priors (e.g. a uniform prior over
    DAGs is impossible)
  • Cannot query arbitrary features (f must be
    modular)
  • e.g. a directed path between nodes A and B
  • Resolution: use MCMC with a proposal based on
    sampling from the marginal edge probabilities
  • An independence sampler
  • Works well for N5

20
Improvement by MCMC
"cancer" network
21
Improvement by MCMC
22
Follow-up
  • To read: "Exact Bayesian structure learning from
    uncertain interventions"
  • Kevin & Daniel
  • Submitted to AIStats06
  • To run:
  • Code in CVS: Aline/koivisto
  • Probably best to wait till Dec to use

Happy Halloween