Junction trees - PowerPoint PPT Presentation

About This Presentation
Title:

Junction trees

Description:

Compact representation does not imply tractable inference. Exact inference is #P-complete in general. Often still need exponential time even for compact models. Example: ...


Transcript and Presenter's Notes



Efficient Principled Learning of Junction Trees
Anton Chechetka and Carlos Guestrin
Carnegie Mellon University

Motivation
Constructing a junction tree: using Alg. 1 for every $S \subset V$, obtain a list $L$ of pairs $(S,Q)$ such that $I(Q, V \setminus (S \cup Q) \mid S) < |V|(\varepsilon + \delta)$.
Example
Theoretical guarantees
Intuition: if the intra-clique dependencies are strong enough, we are guaranteed to find a well-approximating JT in polynomial time.
Experimental results
  • Finding almost independent subsets
  • Question: if S is a separator of an $\varepsilon$-JT, which
    variables are on the same side of S?
  • More than one correct answer possible
  • We will settle for finding one
  • Drop the complexity to polynomial from exponential
  • Junction trees
  • Trees where each node is a set of variables
  • Running intersection property: every clique on the
    path between $C_i$ and $C_j$ contains $C_i \cap C_j$
  • If $C_i$ and $C_j$ are neighbors, $S_{ij} = C_i \cap C_j$
    is called a separator
  • Example
  • Notation: $V_{i \to j}$ is the set of all variables on the
    same side of edge $i$-$j$ as clique $C_j$
  • $V_{3 \to 4} = \{G,F\}$, $V_{3 \to 1} = \{A\}$, $V_{4 \to 3} = \{A,D\}$
  • Encoded independencies: $(V_{i \to j} \perp V_{j \to i} \mid S_{ij})$
  • Constraint-based learning
  • Naively:
  • for every candidate separator S of size k
  • for every $X \subset V \setminus S$
  • if $I(X, V \setminus (S \cup X) \mid S) < \varepsilon$
  • add (S,X) to the list of useful components L
  • find a JT consistent with L
  • Probabilistic graphical models are everywhere
  • Medical diagnosis, datacenter performance
    monitoring, sensor nets,
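The junction tree definitions above (running intersection property, separators $S_{ij} = C_i \cap C_j$) can be checked mechanically. A minimal sketch, assuming cliques are Python frozensets keyed by hypothetical integer ids and the tree is given as an edge list; the example cliques are illustrative only.

```python
from itertools import combinations

def tree_path(edges, start, goal):
    """Nodes on the unique path from start to goal in a tree given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {start: None}
    stack = [start]
    while stack:                      # depth-first search from start
        node = stack.pop()
        for nb in adj.get(node, []):
            if nb not in parent:
                parent[nb] = node
                stack.append(nb)
    path = [goal]
    while path[-1] != start:          # walk back up to start
        path.append(parent[path[-1]])
    return path[::-1]

def has_running_intersection(cliques, edges):
    """True iff every clique on the path between C_i and C_j contains C_i & C_j."""
    for i, j in combinations(cliques, 2):
        shared = cliques[i] & cliques[j]
        if not all(shared <= cliques[c] for c in tree_path(edges, i, j)):
            return False
    return True

def separators(cliques, edges):
    """S_ij = C_i & C_j for every pair of neighboring cliques."""
    return {(i, j): cliques[i] & cliques[j] for i, j in edges}

# Hypothetical example: a chain of cliques 1 - 2 - 3.
edges = [(1, 2), (2, 3)]
good = {1: frozenset("ABC"), 2: frozenset("BCD"), 3: frozenset("CDE")}
bad = {1: frozenset("AB"), 2: frozenset("BC"), 3: frozenset("AC")}
print(has_running_intersection(good, edges))  # True
print(has_running_intersection(bad, edges))   # False: A is missing from clique 2
print(separators(good, edges))
```

In the `bad` tree, cliques 1 and 3 share A, but the clique between them does not contain A, so it is not a junction tree.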
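The naive constraint-based loop above can be sketched as follows. This assumes a hypothetical oracle `chain_cmi` standing in for conditional mutual information estimated from data: it returns 0 when S separates X from Y on an assumed chain A - B - C - D and 1 otherwise. Function names and the `eps` parameter are illustrative, not from the poster.

```python
from itertools import combinations

ORDER = "ABCD"  # hypothetical ground truth: a chain A - B - C - D

def chain_cmi(X, Y, S):
    """Toy stand-in for I(X, Y | S): 0 if S separates X from Y on the chain, else 1."""
    pos = ORDER.index
    def separated(x, y):
        lo, hi = sorted((pos(x), pos(y)))
        return any(lo < pos(s) < hi for s in S)
    return 0.0 if all(separated(x, y) for x in X for y in Y) else 1.0

def naive_components(V, k, eps, cmi):
    """For every candidate separator S with |S| = k and every nonempty proper
    X subset of V - S, keep (S, X) if I(X, V - S - X | S) < eps.
    The inner loops visit all 2^|V - S| subsets: exponential in |V|."""
    V = frozenset(V)
    L = []
    for S in map(frozenset, combinations(sorted(V), k)):
        rest = sorted(V - S)
        for r in range(1, len(rest)):            # nonempty, proper subsets only
            for X in map(frozenset, combinations(rest, r)):
                if cmi(X, V - S - X, S) < eps:
                    L.append((S, X))
    return L

for S, X in naive_components("ABCD", k=1, eps=0.1, cmi=chain_cmi):
    print(sorted(S), sorted(X))
```

The exponential enumeration of $X$ is exactly what the polynomial-time approach avoids.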

Model quality (log-likelihood on test set)
  • We compare this work with
  • ordering-based search (OBS) [Teyssier & Koller, UAI 2005]
  • Chow-Liu alg. [Chow & Liu, IEEE 1968]
  • Karger-Srebro alg. [Karger & Srebro, SODA 2001]
  • local search
  • this work + local search combination (using our
    algorithm to initialize local search)

Complexity
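Theorem: Suppose a maximal $\varepsilon$-JT of treewidth $k$ exists for $P(V)$ such that for every clique $C$ and separator $S$ of the tree it holds that $\min_{X \subset (C \setminus S)} I(X, C \setminus (S \cup X) \mid S) > (k+3)(\varepsilon + \delta)$. Then our algorithm will find a $k|V|(\varepsilon + \delta)$-JT for $P(V)$ with probability at least $(1-\gamma)$ using polynomially many samples and polynomial time.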
  • Main advantages
  • Compact representation of probability
    distributions
  • Exploit structure to speed up inference

Figure: example with $S = \{A\}$ and $Q = \{B,C,D\}$; possible splits $\{B\}, \{C,D\}$ or $\{B,C\}, \{D\}$
Figure: example junction tree (cliques and separators)
  • Problem: from L, reconstruct a junction tree.
  • This is non-trivial. Complications:
  • L may encode more independencies than a single JT
    encodes
  • Several different JTs may be consistent with
    independencies in L

  • But there are also problems:
  • Compact representation does not imply tractable inference
  • Exact inference is #P-complete in general
  • Often still need exponential time even for
    compact models
  • Example
  • Often we do not even have the structure, only data
  • Finding the best structure is NP-complete
  • Most structure learning algorithms return complex
    models, where inference is intractable
  • Very few structure learning algorithms have
    global quality guarantees
  • We address both of these issues! We provide an
  • efficient structure learning algorithm
  • guaranteed to learn tractable models
  • with global guarantees on the quality of the result

Intuition: consider the set of variables $Q = \{B,C,D\}$.
Suppose an $\varepsilon$-JT (e.g., the one above) with separator $S = \{A\}$
exists such that some of the variables in $Q$ ($B$) are
on the left of $S$ and the remaining ones ($C,D$) are on
the right. Then a partitioning of $Q$ into $X$ and $Y$
exists such that $I(X,Y \mid S) < \varepsilon$.
Key theoretical result: an efficient upper bound for
$I(\cdot,\cdot \mid \cdot)$
Key insight [Arnborg et al., SIAM J. Algebraic Discrete Methods 1987;
Narasimhan & Bilmes, UAI 2005]: in a junction tree,
components (S,Q) have a recursive decomposition
Data [Beinlich et al., ECAIM 1988]: 37 variables,
treewidth 4, learned treewidth 3
Intuition: suppose a distribution $P(V)$ can be
well approximated by a junction tree with clique
size $k$. Then for every set $S \subset V$ of size $k$ and $A, B \subset V$
of arbitrary size, to check that $I(A,B \mid S)$ is
small, it is enough to check that $I(X,Y \mid S)$ is small for all subsets $X \subseteq A$,
$Y \subseteq B$ of size at most $k$.
Corollary: maximal JTs of fixed treewidth such that
for every clique $C$ and separator $S$ it holds
that $\min_{X \subset (C \setminus S)} I(X, C \setminus (S \cup X) \mid S) > \alpha$ for a fixed $\alpha > 0$ are
efficiently PAC learnable
Example: 4 neighbors per variable (a constant!), but
inference is still hard
  • Tractability guarantees
  • Inference is exponential in the clique size k
  • Small cliques $\Rightarrow$ tractable inference

If no such splits exist, all variables of Q must
be on the same side of S.
To bound $I(A,B \mid S)$, we only need to compute $I(X,Y \mid S)$ for small $X$ and $Y$!
Related work
  • Alg. 1 (given a candidate separator S and threshold $\delta$)
  • each variable of $V \setminus S$ starts out as a separate
    partition
  • for every $Q \subseteq V \setminus S$ of size at most $k+2$
  • if $\min_{X \subset Q} I(X, Q \setminus X \mid S) > \delta$
  • merge all partitions that have variables in Q
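The steps of Alg. 1 map naturally onto a union-find structure. A minimal sketch, assuming a hypothetical CMI oracle `chain_cmi` (0 when S separates the sets on an assumed chain A - B - C - D - E, else 1) in place of estimates from data; `alg1_partition` and its parameters are illustrative names, and union-find is a choice made here, not something the poster specifies.

```python
from itertools import combinations

ORDER = "ABCDE"  # hypothetical ground truth: a chain A - B - C - D - E

def chain_cmi(X, Y, S):
    """Toy stand-in for I(X, Y | S): 0 if S separates X from Y on the chain, else 1."""
    pos = ORDER.index
    def separated(x, y):
        lo, hi = sorted((pos(x), pos(y)))
        return any(lo < pos(s) < hi for s in S)
    return 0.0 if all(separated(x, y) for x in X for y in Y) else 1.0

def alg1_partition(V, S, k, delta, cmi):
    """Partition V - S: merge the parts touching Q whenever every split of Q
    (|Q| <= k + 2) has conditional mutual information above delta."""
    S = frozenset(S)
    rest = sorted(frozenset(V) - S)
    parent = {v: v for v in rest}          # union-find: each variable starts alone

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for size in range(2, min(k + 2, len(rest)) + 1):
        for Q in map(frozenset, combinations(rest, size)):
            # weakest split of Q: min over nonempty proper X of I(X, Q - X | S)
            weakest = min(cmi(frozenset(X), Q - frozenset(X), S)
                          for n in range(1, size)
                          for X in combinations(sorted(Q), n))
            if weakest > delta:            # Q is strongly dependent: merge its parts
                roots = {find(v) for v in Q}
                head = roots.pop()
                for r in roots:
                    parent[r] = head
    groups = {}
    for v in rest:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=sorted)

# With S = {C}, the two components are {A, B} and {D, E}.
print(alg1_partition("ABCDE", "C", k=1, delta=0.5, cmi=chain_cmi))
```

Only subsets of size at most $k+2$ are examined, so for fixed $k$ the number of oracle calls is polynomial in $|V|$.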

Data [Deshpande et al., VLDB 2004]: 54 variables, treewidth 2
Look for such recursive decompositions in L!
  • JTs as approximations
  • Often exact conditional independence is too
    strict a requirement
  • Generalization: conditional mutual information
    $I(A, B \mid C) \equiv H(A \mid C) - H(A \mid B, C)$
  • $H(\cdot \mid \cdot)$ is conditional entropy
  • $I(A, B \mid C) \ge 0$ always
  • $I(A, B \mid C) = 0 \Leftrightarrow (A \perp B \mid C)$
  • Intuitively: if C is already known, how much new
    information about A is contained in B?
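The definition can be evaluated directly for small discrete distributions. A minimal sketch, assuming the joint pmf is a Python dict from value tuples to probabilities (a representation chosen here for illustration) and variable sets are tuples of positions; the Markov chain and its flip probabilities are hypothetical.

```python
import math
from collections import defaultdict

def marginal(p, idx):
    """Marginal pmf over the variable positions idx; p maps value tuples to probabilities."""
    m = defaultdict(float)
    for x, px in p.items():
        m[tuple(x[i] for i in idx)] += px
    return m

def cmi(p, A, B, C):
    """I(A, B | C) = H(A | C) - H(A | B, C) in nats, computed as
    sum over (a, b, c) of p(a,b,c) * log[p(a,b,c) p(c) / (p(a,c) p(b,c))]."""
    pabc = marginal(p, A + B + C)
    pac, pbc, pc = marginal(p, A + C), marginal(p, B + C), marginal(p, C)
    na, nb = len(A), len(B)
    total = 0.0
    for key, pj in pabc.items():
        if pj > 0:
            a, b, c = key[:na], key[na:na + nb], key[na + nb:]
            total += pj * math.log(pj * pc[c] / (pac[a + c] * pbc[b + c]))
    return total

def flip(u, v, q):
    """P(next = v | current = u) for a binary channel that flips with probability q."""
    return q if u != v else 1 - q

# Hypothetical Markov chain X0 -> X1 -> X2 over binary variables.
p = {(a, b, c): 0.5 * flip(a, b, 0.1) * flip(b, c, 0.2)
     for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(cmi(p, (0,), (2,), (1,)))  # ~0: X0 and X2 are independent given X1
print(cmi(p, (0,), (2,), ()))    # > 0: without X1, X0 carries information about X2
```

On a Markov chain the conditional term vanishes while the unconditional one does not, matching $I(A,B \mid C) = 0 \Leftrightarrow (A \perp B \mid C)$.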

Figure: $I(X,Y \mid S)$ for sets $X$, $Y$ of fixed size regardless of $Q$
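  • DP algorithm (input: list L of pairs (S,Q))
  • sort L in the order of increasing $|Q|$
  • mark $(S,Q) \in L$ with $|Q| = 1$ as positive
  • for $(S,Q) \in L$ with $|Q| \ge 2$, in the sorted order:
  • if $\exists x \in Q$ and $(S_1,Q_1), \dots, (S_m,Q_m) \in L$ such that
  • $S_i \subset S \cup \{x\}$ and $(S_i,Q_i)$ is positive
  • $Q_i \cap Q_j = \emptyset$
  • $\bigcup_{i=1}^{m} Q_i = Q \setminus \{x\}$
  • then mark $(S,Q)$ positive
  • $\mathrm{decomposition}(S,Q) = (S_1,Q_1), \dots, (S_m,Q_m)$
  • if $\exists S$ such that all $(S,Q_i) \in L$ are positive
  • return the corresponding junction tree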
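The DP above can be sketched with the greedy decomposition search the poster uses (finding an exact decomposition is NP-complete). This is a minimal sketch under stated assumptions: pairs are `(S, Q)` tuples of frozensets; the greedy step tries candidates largest-$Q_i$ first, an ordering chosen here for illustration; the final step additionally checks that the components of $S$ cover $V \setminus S$ (a safety check added here); and it returns the separator with its decompositions rather than assembling the tree itself. `dp_junction_tree` is a hypothetical name.

```python
def _key(sq):
    """Deterministic ordering for (S, Q) pairs: increasing |Q|, then lexicographic."""
    return (len(sq[1]), sorted(sq[1]), sorted(sq[0]))

def dp_junction_tree(L, V):
    V = frozenset(V)
    L = sorted(set(L), key=_key)                      # increasing |Q|
    positive = {sq for sq in L if len(sq[1]) == 1}    # |Q| = 1 is positive
    decomposition = {}
    for S, Q in L:
        if len(Q) < 2:
            continue
        for x in sorted(Q):
            target = Q - {x}
            cands = sorted((sq for sq in positive
                            if sq[0] <= S | {x} and sq[1] <= target),
                           key=lambda sq: (-len(sq[1]), sorted(sq[1]), sorted(sq[0])))
            chosen, covered = [], frozenset()
            for Si, Qi in cands:                      # greedy: add non-conflicting pairs
                if not (Qi & covered):
                    chosen.append((Si, Qi))
                    covered |= Qi
            if covered == target:                     # Q - {x} covered exactly
                positive.add((S, Q))
                decomposition[(S, Q)] = chosen
                break
    for S in sorted({sq[0] for sq in L}, key=sorted):
        comps = [sq for sq in L if sq[0] == S]
        covers = frozenset().union(*(Q for _, Q in comps)) == V - S
        if covers and all(sq in positive for sq in comps):
            return S, {sq: decomposition.get(sq, []) for sq in comps}
    return None                                       # greedy search failed

fs = frozenset
# Components of a hypothetical chain A - B - C - D (cliques AB, BC, CD).
L = [(fs("B"), fs("A")), (fs("B"), fs("CD")), (fs("C"), fs("AB")), (fs("C"), fs("D"))]
S, parts = dp_junction_tree(L, "ABCD")
print(sorted(S))  # ['B']
for (Sc, Qc), dec in parts.items():
    print(sorted(Qc), [(sorted(a), sorted(b)) for a, b in dec])
```

Here $(\{B\},\{C,D\})$ becomes positive by removing $x = C$ and reusing the positive pair $(\{C\},\{D\})$. Like the heuristic on the poster, the greedy step may fail even when a decomposition exists.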

Computation time is reduced from exponential in
$|V|$ to polynomial!
Example: $\delta = 0.25$
Set S does not have to relate to the separators
of the true JT in any way!
NP-complete to decide $\Rightarrow$ we use a greedy heuristic
Data [Krause & Guestrin, UAI 2005]: 32 variables,
treewidth 3
Figure (Alg. 1 example): pairwise $I(\cdot,\cdot \mid S)$; test an edge and merge variables; if $I(\cdot)$ is too low, do not merge; end result of merging
Approximation quality guarantee
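Theorem 1: Suppose an $\varepsilon$-JT of treewidth $k$ exists
for $P(V)$. Suppose the sets $S \subset V$ of size $k$ and $A \subset V \setminus S$
of arbitrary size are such that for every $X \subseteq V \setminus S$ of
size $k+1$ it holds that $I(X \cap A, X \cap (V \setminus (S \cup A)) \mid S) < \delta$.
Then $I(A, V \setminus (S \cup A) \mid S) < |V|(\varepsilon + \delta)$.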
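The premise of Theorem 1 only involves sets of size $k+1$, so it can be checked with $\binom{|V|-k}{k+1} = O(n^{k+1})$ oracle calls instead of evaluating $I(A, V \setminus (S \cup A) \mid S)$ over arbitrarily large sets. A minimal sketch, assuming the same hypothetical chain oracle `chain_cmi` as a stand-in for estimated CMI; `small_subset_bound` is an illustrative name.

```python
from itertools import combinations

ORDER = "ABCDE"  # hypothetical ground truth: a chain A - B - C - D - E

def chain_cmi(X, Y, S):
    """Toy stand-in for I(X, Y | S): 0 if S separates X from Y on the chain, else 1."""
    pos = ORDER.index
    def separated(x, y):
        lo, hi = sorted((pos(x), pos(y)))
        return any(lo < pos(s) < hi for s in S)
    return 0.0 if all(separated(x, y) for x in X for y in Y) else 1.0

def small_subset_bound(V, S, A, k, cmi):
    """Max over X in V - S with |X| = k + 1 of I(X & A, X & B | S), where B = V - S - A.
    If this is below delta (and an eps-JT of treewidth k exists), Theorem 1 gives
    I(A, B | S) < |V| (eps + delta)."""
    V, S, A = frozenset(V), frozenset(S), frozenset(A)
    rest = V - S
    B = rest - A
    worst = 0.0
    for X in map(frozenset, combinations(sorted(rest), k + 1)):
        if X & A and X & B:        # if one side is empty, the CMI is 0
            worst = max(worst, cmi(X & A, X & B, S))
    return worst

print(small_subset_bound(ORDER, "C", "AB", 1, chain_cmi))  # 0.0: premise holds
print(small_subset_bound(ORDER, "C", "AD", 1, chain_cmi))  # 1.0: premise fails
```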
  • Contributions of this work
  • The first polynomial-time algorithm with PAC
    guarantees for learning low-treewidth graphical
    models with guaranteed tractable inference!
  • Key theoretical insight: a polynomial-time upper
    bound on conditional mutual information for
    arbitrarily large sets of variables
  • Demonstration of empirical viability

Theorem [Narasimhan and Bilmes, UAI 2005]: If for
every separator $S_{ij}$ in the junction tree it holds
that the conditional mutual information $I(V_{i \to j}, V_{j \to i} \mid S_{ij}) < \varepsilon$
(call it an $\varepsilon$-junction tree), then
$\mathrm{KL}(P \,\|\, P_{\text{tree}}) < |V|\varepsilon$.
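For a tiny distribution the theorem can be checked numerically. A minimal sketch, assuming a random joint over three binary variables and the two-clique tree $\{X_0,X_1\}$, $\{X_1,X_2\}$ with separator $\{X_1\}$, where $P_{\text{tree}} = P(X_0,X_1)P(X_1,X_2)/P(X_1)$; all function names are illustrative. For this tree, $\mathrm{KL}(P \,\|\, P_{\text{tree}})$ equals the separator's conditional mutual information, which is well within the $|V|\varepsilon$ bound.

```python
import itertools
import math
import random
from collections import defaultdict

def marginal(p, idx):
    """Marginal pmf over the variable positions idx."""
    m = defaultdict(float)
    for x, px in p.items():
        m[tuple(x[i] for i in idx)] += px
    return m

def entropy(p, idx):
    return -sum(v * math.log(v) for v in marginal(p, idx).values() if v > 0)

def cmi(p, A, B, C):
    """I(A, B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C), in nats."""
    return entropy(p, A + C) + entropy(p, B + C) - entropy(p, A + B + C) - entropy(p, C)

def kl_to_tree(p, cliques, seps):
    """KL(P || P_tree) with P_tree(x) = prod_C P(x_C) / prod_S P(x_S)."""
    clique_m = [(idx, marginal(p, idx)) for idx in cliques]
    sep_m = [(idx, marginal(p, idx)) for idx in seps]
    total = 0.0
    for x, px in p.items():
        if px > 0:
            q = 1.0
            for idx, m in clique_m:
                q *= m[tuple(x[i] for i in idx)]
            for idx, m in sep_m:
                q /= m[tuple(x[i] for i in idx)]
            total += px * math.log(px / q)
    return total

rng = random.Random(0)
w = {x: rng.random() for x in itertools.product((0, 1), repeat=3)}
z = sum(w.values())
p = {x: v / z for x, v in w.items()}

eps = cmi(p, (0,), (2,), (1,))    # CMI across the only separator {X1}
kl = kl_to_tree(p, cliques=[(0, 1), (1, 2)], seps=[(1,)])
print(kl, eps, kl <= 3 * eps)     # |V| = 3
```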
[1] Bach & Jordan, NIPS 2002
[2] Choi et al., UAI 2005
[3] Chow & Liu, IEEE 1968
[4] Meila & Jordan, JMLR 2001
[5] Teyssier & Koller, UAI 2005
[6] Singh & Moore, CMU-CALD 2005
[7] Karger & Srebro, SODA 2001
[8] Abbeel et al., JMLR 2006
[9] Narasimhan & Bilmes, UAI 2004
  • Theorem (result quality): If after invoking
    Alg. 1$(S, \delta)$ a set U is a connected component,
    then
  • for every Z such that $I(Z, V \setminus (Z \cup S) \mid S) < \delta$ it holds that
    $U \subseteq Z$
  • $I(U, V \setminus (U \cup S) \mid S) < nk\delta$
  • Greedy heuristic for decomposition search
  • initialize the decomposition to empty
  • iteratively add pairs $(S_i,Q_i)$ that do not
    conflict with those already in the decomposition
  • if all variables of Q are covered, success
  • May fail even if a decomposition exists
  • But we prove that it is guaranteed to work for
    certain distributions

never mistakenly put variables together
  • Future work
  • Extend to non-maximal junction trees
  • Heuristics to speed up performance
  • Using information about edge likelihoods (e.g.,
    from L1-regularized logistic regression) to cut
    down on computation.

Incorrect splits are not too bad
Goal: find an $\varepsilon$-junction tree with fixed clique
size $k$ in polynomial (in $|V|$) time
Complexity $O(n^{k+1})$: polynomial in $n$, instead of
$O(\exp(n))$ for straightforward computation
Complexity $O(n^{k+3})$: polynomial in $n$.