Efficient Principled Learning of Junction Trees
Anton Chechetka and Carlos Guestrin
Carnegie Mellon University
Motivation
Constructing a junction tree: using Alg. 1 for every S⊆V, obtain a list L of pairs (S,Q) s.t. I(Q, V\(S∪Q) | S) < |V|(ε+δ)
Theoretical guarantees
Intuition: if the intra-clique dependencies are strong enough, we are guaranteed to find a well-approximating JT in polynomial time.
Experimental results
- Finding almost independent subsets
- Question: if S is a separator of an ε-JT, which variables are on the same side of S?
- More than one correct answer is possible
- We will settle for finding one
- This drops the complexity from exponential to polynomial
- Junction trees
- Trees where each node is a set of variables (a clique)
- Running intersection property: every clique between Ci and Cj contains Ci ∩ Cj
- Ci and Cj are neighbors ⇒ Sij ≡ Ci ∩ Cj is called a separator
- Example:
- Notation: Vi→j is the set of all variables on the same side of edge i-j as clique Cj
- V3→4 = {G,F}, V3→1 = {A}, V4→3 = {A,D}
- Encoded independencies: (Vi→j ⊥ Vj→i | Sij)
- Constraint-based learning
- Naively:
  - for every candidate separator S of size k
    - for every X⊆V\S
      - if I(X, V\(S∪X) | S) < ε
        - add (S,X) to the list of useful components L
  - find a JT consistent with L
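The naive enumeration above can be sketched directly in Python. This is a structural sketch only: `cmi` is a hypothetical estimator of I(X, Y | S) (e.g. computed from data), not something given on the poster, and it is passed in as a parameter.

```python
from itertools import combinations

def naive_components(variables, k, eps, cmi):
    """Naive constraint-based enumeration (exponential in |V|).

    cmi(X, Y, S) is an assumed estimator of I(X; Y | S).
    Returns the list L of pairs (S, X) with I(X, V\\(S u X) | S) < eps.
    """
    V = set(variables)
    L = []
    for S in combinations(sorted(V), k):        # every candidate separator of size k
        rest = sorted(V - set(S))
        # every nonempty X subseteq V\S -- the source of the exponential blow-up
        for r in range(1, len(rest) + 1):
            for X in combinations(rest, r):
                Y = set(rest) - set(X)
                if cmi(set(X), Y, set(S)) < eps:
                    L.append((set(S), set(X)))
    return L
```

The doubly exponential loop over (S, X) is exactly what the rest of the poster replaces with a polynomial procedure.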
- Probabilistic graphical models are everywhere
- Medical diagnosis, datacenter performance monitoring, sensor nets, ...
Model quality (log-likelihood on test set)
- Compare this work with:
- ordering-based search (OBS) [Teyssier & Koller, UAI'05]
- Chow-Liu algorithm [Chow & Liu, IEEE'68]
- Karger-Srebro algorithm [Karger & Srebro, SODA'01]
- local search
- this work + local search combination (using our algorithm to initialize local search)
Complexity
Theorem: Suppose a maximal ε-JT of treewidth k exists for P(V) s.t. for every clique C and separator S of the tree it holds that min_{X⊂(C\S)} I(X, C\(S∪X) | S) > (k+3)(ε+δ); then our algorithm will find a k|V|(ε+δ)-JT for P(V) with probability at least (1-γ), using a polynomial number of samples and polynomial time.
- Main advantages
- Compact representation of probability distributions
- Exploit structure to speed up inference
[Figure: example junction trees with cliques such as ABCD and ABEF and their separators; the set Q = {B,C,D} with separator S = {A} splits as B | C,D or as B,C | D; the set V3→4 is marked]
- Problem: from L, reconstruct a junction tree.
- This is non-trivial. Complications:
- L may encode more independencies than a single JT encodes
- Several different JTs may be consistent with the independencies in L
- But there are also problems
- Compact representation ≠ tractable inference
- Exact inference is #P-complete in general
- Often still need exponential time even for compact models
- Example:
- Often we do not even have the structure, only data
- The best structure is NP-complete to find
- Most structure learning algorithms return complex models, where inference is intractable
- Very few structure learning algorithms have global quality guarantees
- We address both of these issues! We provide
- an efficient structure learning algorithm
- guaranteed to learn tractable models
- with global guarantees on the result quality
Intuition: consider the set of variables Q = {B,C,D}. Suppose an ε-JT (e.g. above) with separator S = {A} exists s.t. some of the variables in Q ({B}) are on the left of S and the remaining ones ({C,D}) are on the right. Then a partitioning of Q into X and Y exists s.t. I(X, Y | S) < ε
Key theoretical result: an efficient upper bound for I(·, · | ·)
Key insight [Arnborg et al., SIAM J. Algebraic Discrete Methods 1987; Narasimhan & Bilmes, UAI'05]: in a junction tree, components (S,Q) have a recursive decomposition
Data [Beinlich et al., ECAIM 1988]: 37 variables, treewidth 4, learned treewidth 3
Intuition: suppose a distribution P(V) can be well approximated by a junction tree with clique size k. Then for every set S⊆V of size k and A,B⊆V of arbitrary size, to check that I(A, B | S) is small, it is enough to check for all subsets X⊆A, Y⊆B of size at most k that I(X, Y | S) is small.
Corollary: maximal JTs of fixed treewidth s.t. for every clique C and separator S it holds that min_{X⊂(C\S)} I(X, C\(S∪X) | S) > ε for fixed ε > 0 are efficiently PAC learnable
4 neighbors per variable (a constant!), but
inference still hard
- Tractability guarantees
- Inference is exponential in the clique size k
- Small cliques ⇒ tractable inference
If no such splits exist, all variables of Q must be on the same side of S
Is I(A, B | S) ≤ ε?
Only need to compute I(X, Y | S) for small X and Y!
Related work
- Alg. 1 (input: candidate separator S, threshold δ)
  - each variable of V\S starts out as a separate partition
  - for every Q⊆V\S of size at most k+2
    - if min_{X⊂Q} I(X, Q\X | S) > δ
      - merge all partitions that have variables in Q
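The partition-merging loop of Alg. 1 can be sketched with a union-find structure. This is a sketch under assumptions: `cmi(X, Y, S)` is a hypothetical estimator of I(X; Y | S) supplied by the caller, and the function names are mine, not the poster's.

```python
from itertools import combinations

def find_partitions(V, S, k, delta, cmi):
    """Alg. 1 sketch: partition V\\S into groups Alg. 1 would keep apart.

    cmi(X, Y, S) is an assumed estimator of I(X; Y | S).
    """
    rest = sorted(set(V) - set(S))
    parent = {x: x for x in rest}          # union-find over variables

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    # for every Q subseteq V\S of size at most k+2:
    for size in range(2, min(k + 2, len(rest)) + 1):
        for Q in combinations(rest, size):
            # min over nontrivial partitionings (X, Q\X) of Q
            weakest = min(
                cmi(set(X), set(Q) - set(X), set(S))
                for r in range(1, len(Q))
                for X in combinations(Q, r)
            )
            if weakest > delta:            # no weak split: merge everything in Q
                for x in Q[1:]:
                    union(Q[0], x)

    groups = {}
    for x in rest:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

Because each candidate Q has size at most k+2, the loop touches O(n^(k+2)) sets rather than all 2^n subsets.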
Data [Deshpande et al., VLDB 2004]: 54 variables, treewidth 2
Look for such recursive decompositions in L!
- JTs as approximations
- Often exact conditional independence is too strict a requirement
- Generalization: conditional mutual information I(A, B | C) ≡ H(A | C) - H(A | B,C)
- H(· | ·) is conditional entropy
- I(A, B | C) ≥ 0 always
- I(A, B | C) = 0 ⇔ (A ⊥ B | C)
- Intuitively: if C is already known, how much new information about A is contained in B?
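As a concrete illustration of the definition, a minimal sketch computing I(A, B | C) from an explicit joint distribution table. The table representation (a dict from assignment tuples to probabilities) and the variable indexing are my assumptions, not part of the poster.

```python
import math

def cond_entropy(joint, a_vars, c_vars):
    """H(A | C) = H(A, C) - H(C), for a joint given as {assignment: prob}.

    Variables are referenced by their position in the assignment tuple.
    """
    def marginal(vars_):
        m = {}
        for assign, p in joint.items():
            key = tuple(assign[v] for v in vars_)
            m[key] = m.get(key, 0.0) + p
        return m

    def entropy(m):
        return -sum(p * math.log2(p) for p in m.values() if p > 0)

    return entropy(marginal(a_vars + c_vars)) - entropy(marginal(c_vars))

def cond_mutual_info(joint, a_vars, b_vars, c_vars):
    """I(A, B | C) = H(A | C) - H(A | B, C)."""
    return (cond_entropy(joint, a_vars, c_vars)
            - cond_entropy(joint, a_vars, b_vars + c_vars))
```

For two perfectly correlated bits, I(A, B) is 1 bit; for two independent fair coins it is 0, matching the I = 0 ⇔ independence property above.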
Fixed size regardless of Q
- DP algorithm (input: list L of pairs (S,Q))
  - sort L in the order of increasing |Q|
  - mark (S,Q)∈L with |Q| = 1 as positive
  - for (S,Q)∈L with |Q| ≥ 2, in the sorted order
    - if ∃ x∈Q and (S1,Q1), ..., (Sm,Qm) ∈ L s.t.
      - Si ⊆ S∪{x} and (Si,Qi) is positive
      - Qi∩Qj = ∅ for i ≠ j
      - ∪_{i=1..m} Qi = Q\{x}
    - then mark (S,Q) positive
      - decomposition(S,Q) = {(S1,Q1), ..., (Sm,Qm)}
  - if ∃ S s.t. all (S,Qi)∈L are positive
    - return the corresponding junction tree
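The positive-marking pass of this DP can be sketched as follows. Representing S and Q as frozensets is my choice, and the inner greedy cover stands in for the exact decomposition search (which, as noted elsewhere on the poster, is NP-complete to decide).

```python
def mark_positives(L):
    """DP sketch over a list L of (S, Q) pairs, each given as frozensets.

    A pair is marked positive iff Q decomposes recursively into smaller
    positive components from L (singleton Qs are positive by definition).
    """
    L = sorted(L, key=lambda sq: len(sq[1]))   # increasing |Q|
    positive = set()
    for S, Q in L:
        if len(Q) == 1:
            positive.add((S, Q))
            continue
        for x in Q:
            # candidate sub-components: positive (Si, Qi) with
            # Si subseteq S u {x} and Qi subseteq Q \ {x}
            cands = [(Si, Qi) for (Si, Qi) in positive
                     if Si <= S | {x} and Qi <= Q - {x}]
            # greedy disjoint cover of Q \ {x} by the Qi (heuristic)
            cover, used = set(), []
            for Si, Qi in sorted(cands, key=lambda sq: -len(sq[1])):
                if not (Qi & cover):
                    used.append((Si, Qi))
                    cover |= Qi
            if cover == Q - {x}:
                positive.add((S, Q))
                break
    return positive
```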
Computation time is reduced from exponential in |V| to polynomial!
Example: threshold 0.25
Set S does not have to relate to the separators
of the true JT in any way!
NP-complete to decide ⇒ we use a greedy heuristic
Data [Krause & Guestrin, UAI'05]: 32 variables, treewidth 3
[Figure: Alg. 1 example: pairwise I(·, · | S) values; test an edge, merge variables; I too low, do not merge; end result of merging]
Approximation quality guarantee
Theorem 1: Suppose an ε-JT of treewidth k exists for P(V). Suppose the sets S⊆V of size k and A⊆V\S of arbitrary size are s.t. for every X⊆V\S of size k+1 it holds that I(X∩A, X∩(V\(S∪A)) | S) < δ; then I(A, V\(S∪A) | S) < |V|(ε+δ)
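The check that Theorem 1 licenses can be sketched as follows: instead of estimating I(A, B | S) for arbitrarily large A and B, only subsets X of size k+1 are examined. As before, `cmi` is a hypothetical estimator passed in by the caller, not something the poster specifies.

```python
from itertools import combinations

def small_subset_checks_pass(V, S, A, k, eps, cmi):
    """Sketch of the Theorem 1 condition, with B = V \\ (S u A).

    Verifies I(X n A, X n B | S) < eps for every X subseteq V\\S of
    size k+1. If all checks pass, Theorem 1 bounds I(A, B | S).
    cmi(X, Y, S) is an assumed estimator of I(X; Y | S).
    """
    V, S, A = set(V), set(S), set(A)
    B = V - S - A
    for X in combinations(sorted(V - S), k + 1):
        XA, XB = set(X) & A, set(X) & B
        if XA and XB and cmi(XA, XB, S) >= eps:
            return False        # found a small witness of dependence
    return True
```

Only O(n^(k+1)) subsets X are checked, and each cmi call involves at most k+1 variables plus the separator, which is what makes the bound efficiently computable.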
- This work: contributions
- The first polynomial-time algorithm with PAC guarantees for learning low-treewidth graphical models, with guaranteed tractable inference!
- Key theoretical insight: a polynomial-time upper bound on conditional mutual information for arbitrarily large sets of variables
- Empirical demonstration of viability
Theorem [Narasimhan & Bilmes, UAI'05]: If for every separator Sij in the junction tree it holds that the conditional mutual information I(Vi→j, Vj→i | Sij) < ε (call such a tree an ε-junction tree), then KL(P || Ptree) < |V|ε
[1] Bach & Jordan, NIPS 2002.
[2] Choi et al., UAI 2005.
[3] Chow & Liu, IEEE 1968.
[4] Meila & Jordan, JMLR 2001.
[5] Teyssier & Koller, UAI 2005.
[6] Singh & Moore, CMU-CALD 2005.
[7] Karger & Srebro, SODA 2001.
[8] Abbeel et al., JMLR 2006.
[9] Narasimhan & Bilmes, UAI 2004.
- Theorem (result quality): If after invoking Alg. 1(S, δ) a set U is a connected component, then
- for every Z s.t. I(Z, V\(S∪Z) | S) < δ it holds that U ⊆ Z
- I(U, V\(S∪U) | S) < nkδ
- Greedy heuristic for the decomposition search
- initialize the decomposition to empty
- iteratively add pairs (Si,Qi) that do not conflict with those already in the decomposition
- if all variables of Q are covered: success
- May fail even if a decomposition exists
- But we prove that for certain distributions it is guaranteed to work
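The greedy heuristic above can be sketched in a few lines (the function name and data layout are mine, not the poster's):

```python
def greedy_decomposition(Q, candidates):
    """Greedy heuristic sketch: scan candidate pairs (Si, Qi) and keep
    those whose Qi does not overlap anything already chosen; succeed if
    the chosen Qi cover Q exactly. May fail even when a valid
    decomposition exists, since no backtracking is done.
    """
    chosen, covered = [], set()
    for Si, Qi in candidates:
        if not (set(Qi) & covered) and set(Qi) <= set(Q):
            chosen.append((Si, Qi))
            covered |= set(Qi)
    return chosen if covered == set(Q) else None
```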
never mistakenly put variables together
- Future work
- Extend to non-maximal junction trees
- Heuristics to speed up performance
- Use information about edge likelihoods (e.g. from L1-regularized logistic regression) to cut down on computation
Incorrect splits are not too bad
Goal: find an ε-junction tree with fixed clique size k in time polynomial in |V|
Complexity O(n^(k+1)): polynomial in n, instead of O(exp(n)) for the straightforward computation.
Complexity O(n^(k+3)): polynomial in n.