Transcript and Presenter's Notes

Title: Fast Exact Bayes Net Structure Learning


1
Fast Exact Bayes Net Structure Learning
(relatively speaking)
  • Daniel Eaton
  • Tuesday Oct 31, 2006

2
What structure learning (SL) is

[Figure: Data over variables A and B is mapped onto a space of models, either by optimizing for a single best DAG or by integrating to get a posterior over DAGs, e.g. p(G) = 0.1, 0.2, 0.7 for three candidate two-node structures; e.g. Dirichlet-multinomial DAGs]
3
What SL is for this talk
  • Have complete exchangeable data
  • M cases, N variables
  • Continuous or discrete
  • Model space: DAGs with a CPD for each node (e.g.
    multinomial or linear Gaussian)
  • plus a prior on parameters p(θ | G) and on DAGs p(G)

4
What SL is for this talk
  • Goal: determine the posterior over DAGs given the data
  • For each DAG G
  • compute the marginal likelihood p(data | G) by
    integrating out the parameters θ
  • easy for conjugate-prior models (see the sketch below)
  • e.g. Dirichlet-multinomial
  • p(G | data) ∝ p(data | G) p(G)
  • Unfortunately ...
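For concreteness, a minimal sketch (not the talk's code) of the closed-form Dirichlet-multinomial marginal likelihood for a single node given a candidate parent set; p(data | G) is the product of such terms over the nodes. The symmetric hyperparameter alpha and the counts layout are illustrative assumptions.

    import numpy as np
    from scipy.special import gammaln

    def node_marginal_loglik(counts, alpha=1.0):
        """Log p(x_i | parents) for one node under a Dirichlet-multinomial CPD.

        counts[j, k] = number of cases with parent configuration j and x_i = k.
        alpha        = symmetric Dirichlet pseudo-count per table cell (assumed).
        """
        counts = np.asarray(counts, dtype=float)
        r = counts.shape[1]                      # number of states of node i
        n_j = counts.sum(axis=1)                 # cases seen per parent configuration
        ll = np.sum(gammaln(r * alpha) - gammaln(r * alpha + n_j))
        ll += np.sum(gammaln(alpha + counts) - gammaln(alpha))
        return ll

    # Example: binary node with one binary parent, 100 cases;
    # rows = parent off/on, columns = node off/on.
    print(node_marginal_loglik(np.array([[30, 10], [5, 55]])))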

5
Why SL is hard
  • Number of DAGs is super-exponential in the number of nodes
  • 7-node DAGs already number over a billion (see the counting
    sketch below) ...
  • Even with a tight graph representation (1 bit/edge) they
    take 6.3 GB to store
  • Storing a posterior probability for each takes 8.2 GB
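To make "super-exponential" concrete, a small counting sketch (my addition, not from the slides) using Robinson's recurrence for the number of labelled DAGs:

    from math import comb

    def num_dags(n):
        """Number of labelled DAGs on n nodes, via Robinson's recurrence:
        a(n) = sum_{k=1..n} (-1)^(k+1) * C(n,k) * 2^(k*(n-k)) * a(n-k), a(0) = 1."""
        a = [1]
        for m in range(1, n + 1):
            a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                         for k in range(1, m + 1)))
        return a[n]

    print(num_dags(7))   # 1138779265, already over a billion structures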

6
Possible resolutions
  • Random-walk MCMC on DAG space
  • "Posterior landscape" too big and bumpy: bad
    mixing, doesn't work for N > 10
  • Never represent DAGs: condition on node orderings
    (Buntine 1991)
  • Compute marginal probability of "graph features"
  • Friedman & Koller 2003 → MCMC
  • Koivisto & Sood 2004 → Exact!

7
Graph features
  • Just an indicator function f(G) on DAGs which is 1 iff a
    particular structure (e.g. an edge) exists in the graph,
    0 otherwise
  • Assume N = 3; we want the marginal probability that the
    edge A → B exists
  • Of course, naively, this sum over all DAGs is difficult for
    N > 5 (see the brute-force sketch after the figure) ...

[Figure: three 3-node DAGs over A, B, C with the edge feature evaluated on each, f_AB(G) = 1, 0, 1 respectively]
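A hypothetical brute-force sketch of the sum the slide refers to: enumerate every DAG, weight each by an unnormalized posterior score, and average the edge indicator. The log_score argument stands in for log p(data | G) + log p(G); the dummy score in the usage line is purely illustrative.

    import itertools
    import numpy as np

    def is_dag(A):
        """True iff the directed graph with adjacency matrix A has no cycles."""
        n = len(A)
        M = np.eye(n, dtype=int)
        for _ in range(n):
            M = M @ A
            if np.trace(M) != 0:          # trace(A^k) > 0 signals a length-k cycle
                return False
        return True

    def all_dags(n):
        """Yield adjacency matrices of every labelled DAG on n nodes (tiny n only)."""
        pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
        for bits in itertools.product((0, 1), repeat=len(pairs)):
            A = np.zeros((n, n), dtype=int)
            for (i, j), b in zip(pairs, bits):
                A[i, j] = b
            if is_dag(A):
                yield A

    def edge_posterior(n, log_score, a, b):
        """p(edge a -> b | data): average the indicator f_ab over the DAG posterior."""
        logs, feats = [], []
        for A in all_dags(n):
            logs.append(log_score(A))
            feats.append(A[a, b])
        logs = np.asarray(logs, dtype=float)
        w = np.exp(logs - logs.max())     # unnormalized posterior weights
        return float(np.dot(w, feats) / w.sum())

    # Dummy log-score favouring sparse graphs; it stands in for
    # log p(data | G) + log p(G), which the talk evaluates in closed form.
    print(edge_posterior(3, lambda A: -float(A.sum()), a=0, b=1))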
8
Conditioning on orders (order-trick)
  • An order on the variables is just a permutation of the
    index set
  • e.g. for a 3-node graph: (1,2,3), (2,1,3), (3,2,1), ...
  • Only N! of these!
  • For a fixed order
  • Intuition: each node can be considered independently of
    the others (acyclicity is ensured by the ordering)

9
Order-trick feature probability
  • Ultimately we want to determine p(f | data)
  • For now, assume the order ≺ is known, so we go for
    p(f | data, ≺)
  • we know how to do the denominator
  • the numerator depends on the feature f
  • for f a single edge (A → B), it's easy (see the note below)
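The formula images on this slide did not survive extraction; here is a sketch of why the single-edge case is easy, in the per-node notation used later in the talk (my reconstruction, not the original slide):

    \[
      f_i(G_i) = 1 \quad (i \neq B), \qquad
      f_B(G_B) = \mathbf{1}\,[\, A \in G_B \,]
    \]

The feature factorizes over nodes, so the numerator is evaluated exactly like the denominator, with this indicator simply folded into node B's sum over candidate parent sets.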

10
How to use order-trick in practice
  • Except in very special (perhaps bio) cases we
    don't know the node order a priori
  • Must sum over N! orders
  • Ouch, still super-exponential!
  • 1. Sample orders with MCMC (Koller/Friedman)
  • Sampler mixes much better than in DAG-space
  • Order-space smaller and less bumpy
  • 2. Do it exactly with dynamic programming!
  • Koivisto & Sood (2004)

11
Koivisto
  • Recognize that although there are N! orders to
    sum over, there is much redundant computation
  • Will allow us to compute the marginal probability of a
    particular feature in O(N·2^N + N·2^(N-1)·C(M)) time, where
    C(M) is the cost of one local marginal-likelihood
    computation from the M cases
  • All edge marginal probabilities in O(N^3·2^N) time (naive)
  • Or in O(N·2^N) time with a recent (2006) extension
    that I won't cover today

12
Koivisto
  • Consider p(f | data) = p(f, data) / p(data)
  • just need a way to evaluate the numerator p(f, data)
  • since the denominator p(data) can be computed by setting
    f ≡ 1

13
  • To simplify the derivation, assume a uniform prior
  • Please accept that p(f, data) can be written as a sum over
    orders of per-node terms (proof later, if requested)
  • Introduce the per-node sums α_i(·)
  • so that each order contributes a product of α terms (see
    the sketch below)

key: each term is modular
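The equation images from this slide are missing from the transcript; the following is a sketch of what they correspond to in the notation of Koivisto & Sood (2004), which this talk follows (my reconstruction: U_i(≺) is the set of predecessors of node i under the order ≺, G_i is node i's parent set, f_i and p(G_i) are the per-node factors of the feature and of the modular structure prior):

    \[
      p(f, \text{data}) \;\propto\; \sum_{\prec}\; \prod_{i=1}^{N}\;
        \sum_{G_i \subseteq U_i(\prec)}
          p(x_i \mid x_{G_i})\, f_i(G_i)\, p(G_i)
    \]
    \[
      \alpha_i(S) \;=\; \sum_{G_i \subseteq S}
          p(x_i \mid x_{G_i})\, f_i(G_i)\, p(G_i)
      \qquad\Longrightarrow\qquad
      p(f, \text{data}) \;\propto\; \sum_{\prec}\; \prod_{i=1}^{N}
          \alpha_i\big(U_i(\prec)\big)
    \]

Each factor depends only on node i and its predecessor set; this is what "modular" means here, and it is what makes the dynamic program on the next slides possible.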
14
  • Brute force

[Figure: brute-force enumeration tree of all ordered sets over {1,2,3}, from () through (1), (2), (3), (1,2), ..., down to the 3! complete orders (1,2,3), ..., (3,2,1); cost O(N·N!). Each complete order contributes a product such as α_1(∅)·α_2({1})·α_3({1,2}) or α_2(∅)·α_1({2})·α_3({1,2}), and shared factors can be grouped, e.g. α_3({1,2}) × ( α_1(∅)·α_2({1}) + α_2(∅)·α_1({2}) ), which suggests a DP recurrence ...]
15
  • DP


O(N·2^N)
[Figure: the same sum arranged over unordered sets, i.e. the subset lattice ∅; {1}, {2}, {3}; {1,2}, {1,3}, {2,3}; {1,2,3}, each visited once, giving O(N·2^N) work instead of O(N·N!)]
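A minimal sketch of the subset dynamic program the lattice above illustrates; the alpha argument (node index, then frozenset of predecessors, mapping to the α_i value) is an assumed representation, not the talk's actual data structure.

    from itertools import combinations

    def sum_over_orders(alpha, n):
        """g(V) = sum over all N! orders of prod_i alpha_i(predecessors of i),
        computed over unordered subsets in O(N * 2^N) instead of O(N * N!)."""
        g = {frozenset(): 1.0}
        for size in range(1, n + 1):
            for S in map(frozenset, combinations(range(n), size)):
                # group the orders of S by which node comes last; its
                # predecessors are then exactly S \ {i}
                g[S] = sum(alpha[i][S - {i}] * g[S - {i}] for i in S)
        return g[frozenset(range(n))]

    # Tiny 2-node check against the brute-force sum over the two orders:
    alpha = {0: {frozenset(): 1.0, frozenset({1}): 2.0},
             1: {frozenset(): 3.0, frozenset({0}): 4.0}}
    print(sum_over_orders(alpha, 2))   # 1*4 + 3*2 = 10.0

For N = 3 this reproduces the grouping shown on the previous slide, collecting all orders that end in the same node.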
16
alpha
naive: O(N·3^N)
  • We assumed alpha was easy to compute
  • for each i, there are 2^N values α_i(·)
  • "Möbius transform"
  • There is a clever algorithm to compute each α_i in
    O(2^N) time (so O(N·2^N) overall)
  • Intuition (2-node case)

[Figure: butterfly diagram of the transform, the 2-node case over bit strings 00, 01, 10, 11 and the 3-node case over 000 ... 111, showing how partial sums are precomputed and reused one bit position at a time]
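A minimal sketch of the O(N·2^N) subset-sum sweep that the bit-string diagram depicts (often called the fast Möbius or zeta transform); encoding parent sets as bitmasks is an assumption of this sketch.

    def fast_subset_sums(beta, n):
        """Given beta[G] for every parent set G (encoded as an n-bit mask),
        return alpha with alpha[S] = sum of beta[G] over all G subset of S.
        One pass per bit position: O(n * 2^n) additions instead of O(3^n)."""
        alpha = list(beta)
        for j in range(n):
            bit = 1 << j
            for S in range(1 << n):
                if S & bit:
                    alpha[S] += alpha[S ^ bit]   # fold in sums that exclude bit j
        return alpha

    # 2-node intuition: alpha[0b11] ends up as beta[00]+beta[01]+beta[10]+beta[11]
    print(fast_subset_sums([1.0, 2.0, 3.0, 4.0], n=2))   # -> [1.0, 3.0, 4.0, 10.0]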
17
Koivisto summary
  • For each node, for each possible parent set, compute
    ML(i, G_i), f_i(G_i), p(G_i) (each an N × 2^N array)
  • Compute Betas
  • β_f = ML · f · p(G) (elementwise)
  • β_1 = ML · p(G) (trivial feature f ≡ 1, used for
    normalization)
  • Compute Alphas
  • For each node, compute α_{f,i}(·) and α_{1,i}(·) (1 × 2^N
    vectors) via the subset-sum transform
  • Compute g_f(V) using α_f and g_1(V) using α_1; their ratio
    gives p(f | data)

O(N·2^(N-1)·C(M))   (marginal likelihoods)
O(N·2^N)            (betas)
O(N·2^N)            (alphas)
O(N·2^N)            (g)
Total: O(N·2^N + N·2^(N-1)·C(M))
18
Timing
[Figure: timing results, with markers at 30 s, 30 min, and 1 day]
19
Order-trick limitations
  • Graph structure prior must be modular
  • Breaks Markov equivalence
  • Cannot have arbitrary priors (e.g. a uniform prior over
    DAGs is impossible)
  • Cannot query arbitrary features (f must be
    modular)
  • e.g. a directed path between nodes A and B
  • Resolution: use MCMC with a proposal based on
    sampling from the marginal edge probabilities
  • An independence sampler
  • Works well for N5

20
Improvement by MCMC
"cancer" network
21
Improvement by MCMC
22
Follow-up
  • To read: "Exact Bayesian structure learning from
    uncertain interventions"
  • Kevin & Daniel
  • Submitted to AIStats06
  • To run:
  • Code in CVS: Aline/koivisto
  • Probably best to wait till Dec to use

Happy Halloween