1
Inference in Gaussian and Hybrid Bayesian Networks
  • ICS 275B

2
Gaussian Distribution
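The figures for this slide were not captured in the transcript; for reference, the univariate Gaussian density they presumably depict is

  p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)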
5
Multivariate Gaussian
  • Definition
  • Let X1, …, Xn be a set of random variables. A
    multivariate Gaussian distribution over X1, …, Xn
    is parameterized by an n-dimensional mean
    vector μ and an n x n positive definite
    covariance matrix Σ. It defines a joint density
    via
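The density formula itself was an image on the slide; the standard form that the definition above parameterizes is

  p(x) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)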

6
Multivariate Gaussian
7
Linear Gaussian Distribution
  • Definition
  • Let Y be a continuous node with continuous
    parents X1, …, Xk. We say that Y has a linear
    Gaussian model if it can be described using
    parameters β0, …, βk and σ² such that
  • P(y | x1, …, xk) = N(β0 + β1x1 + … + βkxk ; σ²)
  • written compactly as N(β0, β1, …, βk ; σ²)
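As a concrete illustration (mine, not from the slides), a minimal Python sketch of sampling from a linear Gaussian CPD; the parameter values are made up:

  import numpy as np

  def sample_linear_gaussian(x, beta0, betas, sigma2, rng):
      # Draw y ~ N(beta0 + sum_i betas[i] * x[i], sigma2)
      mean = beta0 + np.dot(betas, x)
      return rng.normal(mean, np.sqrt(sigma2))

  rng = np.random.default_rng(0)
  y = sample_linear_gaussian(np.array([1.0, 2.0]), 0.5, np.array([0.2, -0.3]), 1.0, rng)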

10
Linear Gaussian Network
  • Definition
  • A linear Gaussian Bayesian network is a Bayesian
    network all of whose variables are continuous and
    where all of the CPTs are linear Gaussians.
  • Linear Gaussian BN ⇔ Multivariate Gaussian
  • ⇒ a linear Gaussian BN is a compact representation
    of a multivariate Gaussian (see the sketch below)
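A minimal sketch of this equivalence for a two-node network A -> B (the numbers are made-up assumptions, not from the slides):

  import numpy as np

  # Linear Gaussian BN: A ~ N(mu_a, s_a), B | A ~ N(b0 + w*A, s_b)
  mu_a, s_a = 1.0, 2.0         # mean and variance of A
  b0, w, s_b = 0.5, -1.0, 1.0  # intercept, weight, conditional variance of B

  # The equivalent multivariate Gaussian N(mu, Sigma) over (A, B):
  mu = np.array([mu_a, b0 + w * mu_a])
  Sigma = np.array([[s_a,     w * s_a],
                    [w * s_a, w**2 * s_a + s_b]])

The BN needs only O(1) parameters per node with few parents, while the explicit covariance matrix has O(n²) entries; hence the compactness claim.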

11
Inference in Continuous Networks
[Figure: a two-node network over A and B]
12
Marginalization
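The formula on this slide was an image; the standard Gaussian marginalization rule it presumably shows: if (X, Y) is jointly Gaussian,

  \begin{pmatrix} X \\ Y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix} \right) \;\Rightarrow\; X \sim N(\mu_X, \Sigma_{XX})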
13
Problems when we multiply two arbitrary
Gaussians!
The inverses of K and M individually are always well
defined. However, the inverse needed for the product is not!
14
Theoretical explanation: why is this the case?
  • The inverse of an n x n matrix exists only when the
    matrix has rank n.
  • If all σs and ws are assumed to be 1, then
    (K⁻¹ + M⁻¹) has rank 2 and so is not invertible.

15
Density vs conditional
  • However,
  • Theorem: If the product of the Gaussians
    represents a multivariate Gaussian density, then
    the inverse always exists.
  • For example, for P(A|B)P(B) = P(A,B) = N(c,C), the
    inverse of C always exists: P(A,B) is a
    multivariate Gaussian (density).
  • But for P(A|B)P(B|X) = P(A,B|X) = N(c,C), the
    inverse of C may not exist: P(A,B|X) is a
    conditional Gaussian.
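A small numeric check (my example, not from the slides) that a conditional Gaussian has a singular canonical precision matrix: for P(a | b) = N(b, 1) the exponent is -(a-b)²/2, so over (a, b)

  import numpy as np

  K = np.array([[1.0, -1.0],
                [-1.0, 1.0]])    # canonical K of P(a | b) = N(b, 1)
  print(np.linalg.matrix_rank(K))  # 1, so K is singular: no covariance exists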

16
Inference, a general algorithm: computing the marginal
of a given variable, say Z.
Step 1: Convert all conditional Gaussians to
canonical form.
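The canonical form itself appeared only as an image; the standard definition used in Lauritzen-style propagation is

  C(x; g, h, K) = \exp\left(g + h^\top x - \frac{1}{2} x^\top K x\right)

A density N(μ, Σ) corresponds to K = Σ⁻¹ and h = Σ⁻¹μ, with g absorbing the normalization constant; conditional Gaussians also fit this form, but with a singular K.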
17
Inference, a general algorithm: computing the marginal
of a given variable, say Z.
  • Step 2:
  • Extend all g's, h's and K's to the same domain by
    padding with 0s.

18
Inference A general algorithm Computing marginal
of a given variable, say Z.
  • Step 3 Add all gs, all hs and all ks.
  • Step 4 Let the variables involved in the
    computation be P(X1,X2,,Xk,Z) N(µ,?)

19
Inference A general algorithm Computing marginal
of a given variable, say Z.
Step 5 Extract the marginal
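A minimal end-to-end sketch of Steps 1-5 in Python (the two-variable model P(z) = N(0,1), P(x|z) = N(z,1) is my own example, not from the slides):

  import numpy as np

  # Steps 1-2: canonical forms over the joint domain (z, x), already padded.
  g1, h1, K1 = -0.5 * np.log(2 * np.pi), np.zeros(2), np.array([[1.0, 0.0], [0.0, 0.0]])
  g2, h2, K2 = -0.5 * np.log(2 * np.pi), np.zeros(2), np.array([[1.0, -1.0], [-1.0, 1.0]])

  # Step 3: multiply by adding all g's, h's and K's.
  g, h, K = g1 + g2, h1 + h2, K1 + K2

  # Step 4: the product is a density, so K is invertible; recover N(mu, Sigma).
  Sigma = np.linalg.inv(K)
  mu = Sigma @ h

  # Step 5: extract the marginal of z (index 0).
  print(mu[0], Sigma[0, 0])  # 0.0 and 1.0, i.e. P(z) = N(0, 1)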
20
Inference: computing the marginal of a given variable
  • For a continuous Gaussian Bayesian network,
    inference is polynomial, O(N³):
  • the complexity of matrix inversion.
  • So algorithms like belief propagation are not
    generally used when all variables are Gaussian.
  • Can we do better than N³?
  • Use bucket elimination.

21
Bucket elimination: algorithm elim-bel (Dechter, 1996)
[Figure: the elim-bel algorithm and its marginalization operator]
22
Multiplication Operator
  • Convert all functions to canonical form if
    necessary.
  • Extend all functions to the same variables.
  • (g1, h1, K1) x (g2, h2, K2) = (g1 + g2, h1 + h2, K1 + K2)
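A sketch of the two helper operations named above (my own code, written to match the slide's description):

  import numpy as np

  def extend(h, K, idx, n):
      # Pad (h, K), defined on the variable positions idx, to dimension n with 0s.
      h_full, K_full = np.zeros(n), np.zeros((n, n))
      h_full[idx] = h
      K_full[np.ix_(idx, idx)] = K
      return h_full, K_full

  def multiply(g1, h1, K1, g2, h2, K2):
      # Product of two canonical forms over the same (extended) domain.
      return g1 + g2, h1 + h2, K1 + K2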

23
Again our problem!
h(a,d,c,e) does not represent a density and so
cannot be computed in our usual moment form N(μ, Σ).
[Figure: the marginalization operator]
24
Solution: marginalize in canonical form
  • Although the intermediate functions computed in
    bucket elimination are conditional, we can
    marginalize in canonical form, and so eliminate
    the problem of the non-existent inverse
    completely (the rule is given below).
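The marginalization rule itself was an image; the standard canonical-form integral (as in Lauritzen's papers) for integrating a block y out of C(x, y; g, h, K), with h = (h_x, h_y) and K partitioned into blocks K_xx, K_xy, K_yx, K_yy, is

  K' = K_{xx} - K_{xy} K_{yy}^{-1} K_{yx}
  h' = h_x - K_{xy} K_{yy}^{-1} h_y
  g' = g + \frac{1}{2}\left( |y| \log(2\pi) - \log|K_{yy}| + h_y^\top K_{yy}^{-1} h_y \right)

Only the block K_yy must be invertible, not the full K, which is why the non-existent inverse is no longer a problem.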

25
Algorithm
  • In each bucket, convert all functions to
    canonical form if necessary, multiply them, and
    marginalize out the bucket's variable as
    shown on the previous slide.
  • Theorem: P(A) is a density and is correct.
  • Complexity: time and space O((w+1)³), where w is
    the width of the ordering used.

26
Continuous Node, Discrete Parents
  • Definition
  • Let X be a continuous node, and let
    U = {U1, U2, …, Un} be its discrete parents and
    Y = {Y1, Y2, …, Yk} be its continuous parents. We say
    that X has a conditional linear Gaussian (CLG)
    CPT if, for every value u ∈ D(U), we have a set
    of (k+1) coefficients a_u,0, a_u,1, …, a_u,k and a
    variance σ_u² such that (see the formula below)
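The formula was an image on the slide; the standard CLG form matching this definition is

  P(x \mid u, y_1, \ldots, y_k) = N\left(a_{u,0} + \sum_{i=1}^{k} a_{u,i}\, y_i \;;\; \sigma_u^2\right)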

27
CLG Network
  • Definition
  • A Bayesian network is called a CLG network if
    every discrete node has only discrete parents,
    and every continuous node has a CLG CPT.

28
Inference in CLGs
  • Can we use the same algorithm?
  • Yes, but the algorithm is unbounded if we are not
    careful.
  • Reason:
  • Marginalizing out discrete variables from an
    arbitrary function in a CLG is not bounded.
  • If we marginalize out y and k from f(x,y,i,k),
    the result is a mixture of 4 Gaussians instead of
    2.
  • x and y are continuous variables;
  • i and k are discrete binary variables.

29
Solution: approximate the mixture of Gaussians by
a single Gaussian
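The approximation shown on the slide is presumably the standard moment-matching (weak marginal) rule: a mixture \sum_i w_i N(\mu_i, \Sigma_i) is replaced by the single Gaussian N(\mu, \Sigma) with

  \mu = \sum_i w_i \mu_i
  \Sigma = \sum_i w_i \left( \Sigma_i + (\mu_i - \mu)(\mu_i - \mu)^\top \right)

which preserves the mixture's first and second moments.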
30
Multiplication and Marginalization
Strong marginal: used when marginalizing continuous
variables.
Multiplication:
  • Convert all functions to canonical form if
    necessary.
  • Extend all functions to the same variables.
  • (g1, h1, K1) x (g2, h2, K2) = (g1 + g2, h1 + h2, K1 + K2)

Weak marginal: used when marginalizing discrete
variables.
31
Problem while using this marginalization in
bucket elimination
  • It requires computing Σ and μ, which is not possible
    due to the non-existence of the inverse.
  • Solution: use an ordering such that you never
    have to marginalize out discrete variables from a
    function that has both discrete and continuous
    Gaussian variables.
  • Special case: compute the marginal at a discrete node.
  • Homework: derive a bucket elimination algorithm
    for computing the marginal of a continuous variable.

32
Special case: a marginal on a discrete variable
in a CLG is to be computed.
B, C and D are continuous variables; A and E are
discrete.
[Figure: the network and the marginalization operator]
33
Complexity of the special case
  • Discrete width (wd): the maximum number of discrete
    variables in a clique.
  • Continuous width (wc): the maximum number of
    continuous variables in a clique.
  • Time: O(exp(wd) · wc³)
  • Space: O(exp(wd) · wc³)

34
Algorithm for the general case: computing belief
at a continuous node of a CLG
  • Convert all functions to canonical form.
  • Create a special tree decomposition.
  • Assign functions to appropriate cliques (same as
    assigning functions to buckets).
  • Select a strong root.
  • Perform message passing.

35
Creating a special tree decomposition
  • Moralize the Bayesian network.
  • Select an ordering such that all continuous
    variables are ordered before discrete variables
    (this increases the induced width).

36
Elimination order
  • Strong elimination order:
  • First eliminate continuous variables.
  • Eliminate a discrete variable only when no
    continuous variables remain.
W and X are discrete variables and Y and Z are
continuous.
[Figure: the moralized graph over w, x, y, z; moralization adds an edge]
37
Elimination order (1)
[Figure: step 1, z is eliminated first]
38
Elimination order (2)
[Figure: step 2, y is eliminated next]
39
Elimination order (3)
[Figure: step 3, a discrete variable is eliminated]
40
Elimination order (4)
[Figure: step 4, the resulting cliques: Clique 2 = {w,x,y} and Clique 1 = {w,y,z}, with separator {w,y}]
41
Bucket tree or Junction tree (1)
[Figure: the junction tree: Clique 2 (root) = {w,x,y}, Clique 1 = {w,y,z}, separator = {w,y}]
42
Algorithm for the general case: computing belief
at a continuous node of a CLG
  • Convert all functions to canonical form.
  • Create a special tree decomposition.
  • Assign functions to appropriate cliques (same as
    assigning functions to buckets).
  • Select a strong root.
  • Perform message passing.

43
Assigning functions to cliques
  • Select a function and place it in an arbitrary
    clique that mentions all variables in the
    function.

44
Algorithm for the general case: computing belief
at a continuous node of a CLG
  • Convert all functions to canonical form.
  • Create a special tree decomposition.
  • Assign functions to appropriate cliques (same as
    assigning functions to buckets).
  • Select a strong root.
  • Perform message passing.

45
Strong Root
  • We define a strong root as any node R in the
    bucket tree which satisfies the following
    property: for any pair (V, W) of neighbors
    on the tree with W closer to R than V, we have
    the condition given below.
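The property itself was an image; the standard strong-root condition (from Lauritzen and Jensen's work on CG propagation) that the slide presumably shows is

  (V \setminus W) \subseteq \Gamma \quad \text{or} \quad (V \cap W) \subseteq \Delta

where Γ is the set of continuous variables and Δ the set of discrete ones: whatever V must marginalize away toward the root is purely continuous, or the separator is purely discrete.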

46
Example: strong root
[Figure: a junction tree with the strong root highlighted]
47
Algorithm for the general case: computing belief
at a continuous node of a CLG
  • Create a special tree decomposition.
  • Assign functions to appropriate cliques (same as
    assigning functions to buckets).
  • Select a strong root.
  • Perform message passing.

48
Message passing at a typical node
  • Node a contains the functions assigned to it
    according to the tree-decomposition scheme,
    denoted p_j(a).
[Figure: messages into and out of node a]

49
Message Passing
Two-pass algorithm: bucket-tree propagation.
[Figure from P. Green]
50
Let's look at the messages: Collect Evidence
[Figure: messages flowing toward the strong root during the collect-evidence pass]
51
Distribute Evidence
[Figure: messages flowing away from the strong root during the distribute-evidence pass]
52
Lauritzen's theorem
  • When you perform message passing such that
    collect-evidence computes only strong marginals
    and distribute-evidence may compute weak
    marginals, the junction-tree algorithm is exact
    in the sense that:
  • the first moment (mean) and second moment
    (variance) computed are the true moments.

53
Complexity
  • Polynomial in the number of continuous variables
    in a clique (n³).
  • Exponential in the number of discrete variables
    in a clique.
  • Possible options for approximation:
  • Ignore the strong-root assumption and use an
    approximation like MBTE, IJGP, or sampling.
  • Respect the strong-root assumption and use an
    approximation like MBTE, IJGP, or sampling.
  • Inaccuracies are then due only to the discrete
    variables if done in one pass of MBTE.

54
Initialization (1)
[Figure: the CLG network: w and x are discrete with
P(x0) = 0.4, P(x1) = 0.6 and P(w0) = 0.5, P(w1) = 0.5;
y and z are continuous, each of dimension 2. The CLG
parameter tables for y (indexed by X0, X1) and for z
(indexed by W0, W1) did not survive extraction.]
55
Initialization (2)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
Discrete priors in canonical form (h and K empty):
  x0: g = log(0.4)    x1: g = log(0.6)
  w0: g = log(0.5)    w1: g = log(0.5)
Continuous CPT indexed by x, in canonical form:
  X0: g = -4.1245, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
  X1: g = -3.0310, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
Continuous CPT indexed by w, in canonical form:
  W0: g = -4.0629, h = [0.0889, -0.0111, -0.0556, 0.0556], K not extracted
  W1: g = -2.7854, h = [0.0867, -0.0633, -0.1000, -0.1667], K not extracted
56
Initialization (3)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy (empty)
Clique 2 potential after multiplying in its assigned functions:
  wx00, wx10: g = -5.1308, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
  wx01, wx11: g = -3.5418, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
Clique 1 potential:
  W0: g = -4.7560    W1: g = -3.4786    (h, K not extracted)
57
Message Passing
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy (empty)
  • Collect evidence: messages flow toward the root.
  • Distribute evidence: messages flow away from the root.
58
Collect evidence (1)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy (empty)
[Figure: the separator message is obtained by marginalizing
the clique function, e.g. φ(y1,y2) → φ(y2)]
59
Collect evidence (2)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy (empty)
Clique 1 potential:
  W0: g = -4.7560    W1: g = -3.4786    (h, K not extracted)
Marginalizing z out of Clique 1 gives the separator message:
  W0: g = -0.6931, h = [0.1388, 0, 1.0e-16],
      K = [0.2776, -0.0694, 0.0347, 0, 1.0e-16] (matrix layout not extracted)
  W1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
60
Collect evidence (3)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
The separator message:
  W0: g = -0.6931, h = [0.1388, 0, 1.0e-16],
      K = [0.2776, -0.0694, 0.0347, 0, 1.0e-16]
  W1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
is multiplied into the root potential:
  wx00, wx10: g = -5.1308, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
  wx01, wx11: g = -3.5418, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
giving:
  wx00, wx10: g = -5.8329, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
  wx01, wx11: g = -4.2350, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
61
Distribute evidence (1)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
The stored separator is divided out of the Clique 1 potential:
  Clique 1: W0: g = -4.7560    W1: g = -3.4786    (h, K not extracted)
  Separator: W0: g = -0.6931, h = [0.1388, 0, 1.0e-16],
             K = [0.2776, -0.0694, 0.0347, 0, 1.0e-16]
             W1: g = -0.6931, h = [0, 0], K = [0 0; 0 0]
62
Distribute evidence (2)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
Result of the division (the g's subtract):
  W0: g = -4.0629    W1: g = -2.7854    (h, K not extracted)
63
Distribute evidence (3)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
Root potential:
  wx00, wx10: g = -5.8329, h = [-0.02, 0.12], K = [0.1 0; 0 0.1]
  wx01, wx11: g = -4.2350, h = [0.5, -0.5], K = [0.5 0.5; 0.5 0.5]
Marginalizing over x (a weak marginal, in moment form):
  w0: log p = -0.6931, μ = [0.52, -0.12], Σ not extracted
  w1: log p = -0.6931, μ = [0.52, -0.12], Σ not extracted
64
Distribute evidence (4)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
The weak marginal, converted back to canonical form, is
multiplied into Clique 1:
  Clique 1 after division: W0: g = -4.0629    W1: g = -2.7854
  Weak marginal (moment form):
    w0, w1: log p = -0.6931, μ = [0.52, -0.12], Σ not extracted
  In canonical form:
    w0: g = -4.3316, h = [0.0927, -0.0096], K not extracted
    w1: g = -0.6931, h = [0.0927, -0.0096], K not extracted
65
Distribute evidence (5)
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
Final Clique 1 potential:
  W0: g = -8.3935    W1: g = -7.1170    (h, K not extracted)
66
After Message Passing
Clique 1: wyz    Clique 2 (root): wxy    Separator: wy
The clique and separator potentials now hold the local
marginal distributions p(w,y,z), p(w,x,y) and p(w,y).