Transcript and Presenter's Notes

Title: Bayesian Networks: Variable Elimination


1
Bayesian Networks: Variable Elimination
Based on Nir Friedman's course (Hebrew University)
2
  • In previous lessons we introduced compact
    representations of probability distributions
  • Bayesian Networks
  • A network describes a unique probability
    distribution P
  • How do we answer queries about P?
  • The process of computing answers to these queries
    is called probabilistic inference

3
Queries: Likelihood
  • There are many types of queries we might ask.
  • Most of these involve evidence
  • Evidence e is an assignment of values to a set E of variables in the domain
  • Without loss of generality, E = { Xk+1, ..., Xn }
  • Simplest query: compute the probability of the evidence, P(e)
  • This is often referred to as computing the likelihood of the evidence

4
Queries: A posteriori belief
  • Often we are interested in the conditional probability of a variable given the evidence, P(X | e)
  • This is the a posteriori belief in X, given evidence e
  • A related task is computing the term P(X, e)
  • i.e., the likelihood of e and X = x for each value x of X
  • We can recover the a posteriori belief by normalizing: P(X | e) = P(X, e) / P(e)

5
A posteriori belief
  • This query is useful in many cases:
  • Prediction: what is the probability of an outcome given the starting condition?
  • The target is a descendant of the evidence
  • Diagnosis: what is the probability of a disease/fault given the symptoms?
  • The target is an ancestor of the evidence
  • As we shall see, the direction of the edges between variables does not restrict the direction of the queries
  • Probabilistic inference can combine evidence from all parts of the network

6
Queries: A posteriori joint
  • In this query, we are interested in the conditional probability of several variables given the evidence, P(X, Y, ... | e)
  • Note that the size of the answer to this query is exponential in the number of variables in the joint

7
Queries: MAP
  • In this query we want to find the maximum a posteriori assignment for some variables of interest (say X1, ..., Xl)
  • That is, find x1, ..., xl that maximize the probability P(x1, ..., xl | e)
  • Note that this is equivalent to maximizing P(x1, ..., xl, e) (spelled out below)
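In symbols, since P(e) is a positive constant that does not depend on the choice of x1, ..., xl:

\arg\max_{x_1,\dots,x_l} P(x_1,\dots,x_l \mid e)
  \;=\; \arg\max_{x_1,\dots,x_l} \frac{P(x_1,\dots,x_l,\, e)}{P(e)}
  \;=\; \arg\max_{x_1,\dots,x_l} P(x_1,\dots,x_l,\, e)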

8
Queries: MAP
  • We can use MAP for:
  • Classification
  • find the most likely label, given the evidence
  • Explanation
  • What is the most likely scenario, given the evidence?

9
Queries: MAP
  • Cautionary note:
  • The MAP assignment depends on the set of variables we maximize over
  • Example:
  • MAP of X is 1,
  • MAP of (X, Y) is (0, 0)

10
Complexity of Inference
  • Theorem:
  • Computing P(X = x) in a Bayesian network is NP-hard
  • Not surprising, since we can simulate Boolean gates.

11
Proof
  • We reduce 3-SAT to Bayesian network computation
  • Assume we are given a 3-SAT problem:
  • let q1, ..., qn be propositions,
  • and Φ1, ..., Φk be clauses, such that Φi = li1 ∨ li2 ∨ li3, where each lij is a literal over q1, ..., qn
  • Φ = Φ1 ∧ ... ∧ Φk
  • We will construct a network s.t. P(X = t) > 0 iff Φ is satisfiable

12
[Figure: proposition nodes Q1, ..., Qn; each clause node Φ1, ..., Φk has the propositions of its literals as parents; AND-gate nodes A1, A2, ... combine the clause nodes and feed the output node X]
  • P(Qi = true) = 0.5
  • P(Φi = true | Qi, Qj, Ql) = 1 iff the values of Qi, Qj, Ql satisfy the clause Φi
  • A1, A2, ... are simple binary AND gates

13
  • It is easy to check:
  • Polynomial number of variables
  • Each CPD can be described by a small table (at most 8 parameters)
  • P(X = true) > 0 if and only if there exists a satisfying assignment to Q1, ..., Qn
  • Conclusion: this is a polynomial reduction from 3-SAT

14
  • Note: this construction also shows that computing P(X = t) is harder than NP
  • 2^n · P(X = t) is the number of satisfying assignments to Φ
  • Thus, it is #P-hard (in fact it is #P-complete)

15
Hardness - Notes
  • We used deterministic relations in our construction
  • The same construction works if we use (1-ε, ε) instead of (1, 0) in each gate, for any ε < 0.5
  • Hardness does not mean we cannot solve inference
  • It implies that we cannot find a general procedure that works efficiently for all networks
  • For particular families of networks, we can have provably efficient procedures

16
Inference in Simple Chains
[Figure: chain X1 → X2]
  • How do we compute P(X2)?
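Spelled out, this is just marginalization over X1:

P(x_2) \;=\; \sum_{x_1} P(x_1, x_2) \;=\; \sum_{x_1} P(x_1)\, P(x_2 \mid x_1)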

17
Inference in Simple Chains (cont.)
[Figure: chain X1 → X2 → X3]
  • How do we compute P(X3)?
  • we already know how to compute P(X2)...

18
Inference in Simple Chains (cont.)
[Figure: chain X1 → X2 → ... → Xn]
  • How do we compute P(Xn)?
  • Compute P(X1), P(X2), P(X3), ... in turn
  • We compute each term by using the previous one (a short code sketch follows below)
  • Complexity:
  • Each step costs O(|Val(Xi)| · |Val(Xi+1)|) operations
  • Compare to naive evaluation, which requires summing over the joint values of n-1 variables
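A minimal sketch of this forward pass, assuming each CPD is stored as a nested dictionary; the function and variable names here are illustrative, not from the slides:

from typing import Dict, List

def chain_forward(prior_x1: Dict, cpds: List[Dict]) -> Dict:
    """Compute the marginal P(Xn) for a chain X1 -> X2 -> ... -> Xn.

    prior_x1: {value: P(X1 = value)}
    cpds[i]:  {parent_value: {child_value: P(X_{i+2} = child | X_{i+1} = parent)}}
    """
    belief = dict(prior_x1)                       # current marginal P(Xi)
    for cpd in cpds:                              # one O(|Val|^2) step per edge
        child_vals = next(iter(cpd.values())).keys()
        belief = {c: sum(belief[p] * cpd[p][c] for p in belief)
                  for c in child_vals}            # P(X_{i+1}) from P(X_i)
    return belief

# Example: binary chain X1 -> X2 -> X3
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_x3_given_x2 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}
print(chain_forward(p_x1, [p_x2_given_x1, p_x3_given_x2]))   # marginal P(X3)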

19
Inference in Simple Chains (cont.)
[Figure: chain X1 → X2]
  • Suppose that we observe the value X2 = x2
  • How do we compute P(X1 | x2)?
  • Recall that it suffices to compute P(X1, x2)

20
Inference in Simple Chains (cont.)
[Figure: chain X1 → X2 → X3]
  • Suppose that we observe the value X3 = x3
  • How do we compute P(X1, x3)?
  • How do we compute P(x3 | x1)?

21
Inference in Simple Chains (cont.)
[Figure: chain X1 → X2 → X3 → ... → Xn]
  • Suppose that we observe the value Xn = xn
  • How do we compute P(X1, xn)?

22
Inference in Simple Chains (cont.)
[Figure: chain X1 → X2 → X3 → ... → Xn]
  • We compute P(xn | xn-1), P(xn | xn-2), ... iteratively (the backward pass)

23
Inference in Simple Chains (cont.)
[Figure: chain X1 → ... → Xk → ... → Xn]
  • Suppose that we observe the value Xn = xn
  • We want to find P(Xk | xn)
  • How do we compute P(Xk, xn)?
  • We compute P(Xk) by forward iterations
  • We compute P(xn | Xk) by backward iterations (the two are put together below)
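Putting the two passes together (using the chain rule and the fact that, in a chain, xn depends on X1, ..., Xk only through Xk):

P(X_k, x_n) \;=\; P(X_k)\, P(x_n \mid X_k), \qquad
P(X_k \mid x_n) \;=\; \frac{P(X_k, x_n)}{\sum_{x_k} P(x_k, x_n)}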

24
Elimination in Chains
  • We now revisit the simple chain example from first principles
  • Using the definition of marginal probability, we have P(e) = Σa Σb Σc Σd P(a, b, c, d, e)

25
Elimination in Chains
  • By the chain decomposition of the network, we get P(a, b, c, d, e) = P(a) P(b|a) P(c|b) P(d|c) P(e|d)
[Figure: chain A → B → C → D → E]
26
Elimination in Chains
  • Rearranging terms so that each sum applies only to the factors that mention its variable, we get the nested form written out below:
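A reconstruction of the missing equation, assuming the chain A → B → C → D → E shown in the figures:

P(e) \;=\; \sum_{a,b,c,d} P(a)\,P(b \mid a)\,P(c \mid b)\,P(d \mid c)\,P(e \mid d)
     \;=\; \sum_d P(e \mid d) \sum_c P(d \mid c) \sum_b P(c \mid b) \sum_a P(a)\,P(b \mid a)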

27
Elimination in Chains
[Figure: chain A → B → C → D → E with the innermost variable A summed out]
  • Now we can perform the innermost summation (over a)
  • This summation is exactly the first step in the forward iteration we described before

28
Elimination in Chains
  • Rearranging and then summing again, we get

[Figure: chain A → B → C → D → E with A and B summed out]
29
Elimination in Chains with Evidence
  • Similarly, we can understand the backward pass
  • We write the query in explicit form (reconstructed below):
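A reconstruction of the explicit form of the query P(A, e) on the same chain, consistent with the elimination steps on the following slides (d, then c, then b are summed out innermost-first):

P(A, e) \;=\; \sum_{b,c,d} P(A)\,P(b \mid A)\,P(c \mid b)\,P(d \mid c)\,P(e \mid d)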

30
Elimination in Chains with Evidence
  • Eliminating d, we get

[Figure: chain A → B → C → D → E with D summed out]
31
Elimination in Chains with Evidence
  • Eliminating c, we get

[Figure: chain A → B → C → D → E with C and D summed out]
32
Elimination in Chains with Evidence
  • Finally, we eliminate b

[Figure: chain A → B → C → D → E with B, C, and D summed out]
33
Variable Elimination
  • General idea:
  • Write the query as nested sums over a product of factors (the CPDs)
  • Iteratively:
  • Move all irrelevant terms outside of the innermost sum
  • Perform the innermost sum, getting a new term
  • Insert the new term into the product
  • (A code sketch of this loop follows below)
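A minimal sketch of this loop, with each factor stored as a pair (variables, table); the representation and the names used here are illustrative assumptions, not the slides' notation. It assumes the elimination order covers every non-query variable:

from itertools import product

def variable_elimination(factors, query_vars, elim_order, domains):
    """factors: list of (vars_tuple, {assignment_tuple: value}) pairs."""
    for z in elim_order:                            # eliminate one variable per step
        relevant = [f for f in factors if z in f[0]]
        others   = [f for f in factors if z not in f[0]]
        new_vars = tuple(dict.fromkeys(v for f in relevant
                                         for v in f[0] if v != z))
        table = {}
        for assign in product(*(domains[v] for v in new_vars)):
            ctx, total = dict(zip(new_vars, assign)), 0.0
            for zval in domains[z]:                 # innermost sum over z
                ctx[z] = zval
                term = 1.0
                for fvars, ftab in relevant:        # product of factors mentioning z
                    term *= ftab[tuple(ctx[v] for v in fvars)]
                total += term
            table[assign] = total
        factors = others + [(new_vars, table)]      # insert the new term
    result = {}                                     # multiply remaining factors
    for assign in product(*(domains[v] for v in query_vars)):
        ctx, val = dict(zip(query_vars, assign)), 1.0
        for fvars, ftab in factors:
            val *= ftab[tuple(ctx[v] for v in fvars)]
        result[assign] = val
    return result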

34
A More Complex Example
  • Asia network

35
[Figure: the Asia network — V → T, S → L, S → B, T and L → A, A → X, A and B → D]
  • We want to compute P(d)
  • Need to eliminate v, s, x, t, l, a, b
  • Initial factors: P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
36
  • We want to compute P(d)
  • Need to eliminate v,s,x,t,l,a,b
  • Initial factors

Eliminate v: fv(t) = Σv P(v) P(t|v)
Note: fv(t) = P(t). In general, the result of elimination is not necessarily a probability term.
37
  • We want to compute P(d)
  • Need to eliminate s,x,t,l,a,b
  • Initial factors

Eliminate s: fs(b,l) = Σs P(s) P(b|s) P(l|s)
Summing over s results in a factor with two arguments, fs(b,l). In general, the result of elimination may be a function of several variables.
38
  • We want to compute P(d)
  • Need to eliminate x,t,l,a,b
  • Initial factors

Eliminate x: fx(a) = Σx P(x|a)
Note: fx(a) = 1 for all values of a, since we are summing a CPD over all values of its child!
39
  • We want to compute P(d)
  • Need to eliminate t,l,a,b
  • Initial factors

Eliminate t: ft(a,l) = Σt fv(t) P(a|t,l)
40
  • We want to compute P(d)
  • Need to eliminate l,a,b
  • Initial factors

Eliminate l: fl(a,b) = Σl fs(b,l) ft(a,l)
41
  • We want to compute P(d)
  • Need to eliminate a, b
  • Initial factors
Eliminate a, b: compute the remaining sums, over a and then over b, to obtain P(d)
42
Variable Elimination
  • We now understand variable elimination as a sequence of rewriting operations
  • The actual computation is done in the elimination steps
  • The computation depends on the order of elimination
  • We will return to this issue in detail

43
Dealing with evidence
  • How do we deal with evidence?
  • Suppose we get evidence V = t, S = f, D = t
  • We want to compute P(L, V = t, S = f, D = t)

44
Dealing with Evidence
  • We start by writing the factors
  • Since we know that V = t, we don't need to eliminate V
  • Instead, we can replace the factors P(V) and P(T|V) with their restrictions to V = t
  • These select the appropriate parts of the original factors given the evidence (a code sketch follows below)
  • Note that the restriction of P(V) is a constant, and thus does not appear in the elimination of other variables
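A sketch of this restriction step, using the same illustrative (variables, table) factor representation as above; restrict_factor and its arguments are assumptions for illustration:

def restrict_factor(fvars, ftable, evidence):
    """Drop observed variables from a factor, keeping only consistent rows."""
    keep = tuple(v for v in fvars if v not in evidence)
    new_table = {}
    for assign, val in ftable.items():
        ctx = dict(zip(fvars, assign))
        if all(ctx[v] == e for v, e in evidence.items() if v in ctx):
            new_table[tuple(ctx[v] for v in keep)] = val
    return keep, new_table

# e.g. restricting P(T|V) to the evidence V = t yields a factor over T alone,
# and restricting P(V) yields a constant (a factor with no arguments).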

45
Dealing with Evidence
  • Given evidence V = t, S = f, D = t
  • Compute P(L, V = t, S = f, D = t)
  • Initial factors, after setting evidence

46
Dealing with Evidence
  • Given evidence V = t, S = f, D = t
  • Compute P(L, V = t, S = f, D = t)
  • Initial factors, after setting evidence
  • Eliminating x, we get

47
Dealing with Evidence
  • Given evidence V = t, S = f, D = t
  • Compute P(L, V = t, S = f, D = t)
  • Initial factors, after setting evidence
  • Eliminating x, we get
  • Eliminating t, we get

48
Dealing with Evidence
  • Given evidence V = t, S = f, D = t
  • Compute P(L, V = t, S = f, D = t)
  • Initial factors, after setting evidence
  • Eliminating x, we get
  • Eliminating t, we get
  • Eliminating a, we get

49
Dealing with Evidence
  • Given evidence V = t, S = f, D = t
  • Compute P(L, V = t, S = f, D = t)
  • Initial factors, after setting evidence
  • Eliminating x, we get
  • Eliminating t, we get
  • Eliminating a, we get
  • Eliminating b, we get

50
Complexity of variable elimination
  • Suppose in one elimination step we compute a new factor fX(y1, ..., yk) by summing x out of a product of m factors (reconstructed below)
  • This requires:
  • multiplications:
  • for each value of x, y1, ..., yk, we do m multiplications
  • additions:
  • for each value of y1, ..., yk, we do |Val(X)| additions
  • The complexity is exponential in the number of variables in the intermediate factor.
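A reconstruction of the step being counted, assuming the standard form of an elimination step (each fi is a factor currently mentioning X):

f_X(y_1,\dots,y_k) \;=\; \sum_{x} \prod_{i=1}^{m} f_i(x,\, \mathbf{Z}_i),
\qquad \mathbf{Z}_i \subseteq \{y_1,\dots,y_k\}

so the step takes about m \cdot |Val(X)| \cdot \prod_j |Val(Y_j)| multiplications and |Val(X)| \cdot \prod_j |Val(Y_j)| additions.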

51
Understanding Variable Elimination
  • We want to select "good" elimination orderings that reduce complexity
  • We start by attempting to understand variable elimination via the graph we are working with
  • This will reduce the problem of finding a good ordering to a well-understood graph-theoretic operation

52
Undirected graph representation
  • At each stage of the procedure, we have an algebraic term that we need to evaluate
  • In general this term is of the form Σy1 ··· Σyj Πi fi(Zi), where the Zi are sets of variables
  • We now build an undirected graph with an edge X--Y whenever X and Y are arguments of some common factor
  • that is, whenever X and Y appear together in some Zi (a code sketch follows below)
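A small sketch of building this undirected interaction graph from the factor scopes; names are illustrative assumptions:

from itertools import combinations

def interaction_graph(scopes):
    """scopes: iterable of variable collections Zi (one per factor).
    Returns adjacency dict: variable -> set of neighbours."""
    adj = {}
    for scope in scopes:
        for v in scope:
            adj.setdefault(v, set())
        for x, y in combinations(scope, 2):    # connect every pair in a factor
            adj[x].add(y)
            adj[y].add(x)
    return adj

# For the initial factors of a Bayesian network this is exactly the moral graph:
# each child is connected to its parents, and parents of a common child are "married".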

53
Undirected Graph Representation
  • Consider the Asia example
  • The initial factors are the CPDs P(v), P(s), P(t|v), P(l|s), P(b|s), P(a|t,l), P(x|a), P(d|a,b)
  • Thus, the undirected graph is the one shown below
  • In the first step this graph is just the moralized graph

[Figures: the Asia network and its undirected (moralized) graph over V, S, T, L, A, B, X, D]
54
Undirected Graph Representation
  • Now we eliminate t, getting
  • The corresponding change in the graph is

[Figures: the undirected graph before and after eliminating t; the neighbours of t become connected]
55
Example
  • Want to compute P(L, V = t, S = f, D = t)
  • Moralizing

[Figure: moralized Asia graph over V, S, T, L, A, B, X, D]
56
Example
  • Want to compute P(L, V = t, S = f, D = t)
  • Moralizing
  • Setting evidence

57
Example
  • Want to compute P(L, V = t, S = f, D = t)
  • Moralizing
  • Setting evidence
  • Eliminating x
  • New factor fx(A)

58
Example
  • Want to compute P(L, V = t, S = f, D = t)
  • Moralizing
  • Setting evidence
  • Eliminating x
  • Eliminating a
  • New factor fa(b,t,l)

59
Example
  • Want to compute P(L, V = t, S = f, D = t)
  • Moralizing
  • Setting evidence
  • Eliminating x
  • Eliminating a
  • Eliminating b
  • New factor fb(t,l)

60
Example
  • Want to compute P(L, V = t, S = f, D = t)
  • Moralizing
  • Setting evidence
  • Eliminating x
  • Eliminating a
  • Eliminating b
  • Eliminating t
  • New factor ft(l)

61
Elimination in Undirected Graphs
  • Generalizing, we see that we can eliminate a variable X by:
  • 1. For all Y, Z such that Y--X and Z--X,
  • add an edge Y--Z
  • 2. Remove X and all edges adjacent to it
  • This procedure creates a clique that contains all the neighbours of X
  • After step 1 we have a clique that corresponds to the intermediate factor (before marginalization)
  • The cost of the step is exponential in the size of this clique (a code sketch follows below)
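The same two steps in code, operating on the adjacency dict sketched earlier (an illustrative sketch, not the slides' implementation):

from itertools import combinations

def eliminate_node(adj, x):
    """Connect all neighbours of x pairwise (step 1), then remove x (step 2)."""
    neighbours = list(adj[x])
    for y, z in combinations(neighbours, 2):   # step 1: make the neighbours a clique
        adj[y].add(z)
        adj[z].add(y)
    for y in neighbours:                       # step 2: delete x and its incident edges
        adj[y].discard(x)
    del adj[x]
    return len(neighbours) + 1                 # size of the clique just created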

62
Undirected Graphs
  • The process of eliminating nodes from an
    undirected graph gives us a clue to the
    complexity of inference
  • To see this, we will examine the graph that
    contains all of the edges we added during the
    elimination. The resulting graph is always
    chordal.

63
Example
  • Want to compute P(D)
  • Moralizing

[Figure: moralized Asia graph over V, S, T, L, A, B, X, D]
64
Example
  • Want to compute P(D)
  • Moralizing
  • Eliminating v
  • Multiply to get fv(v,t)
  • Result fv(t)

65
Example
  • Want to compute P(D)
  • Moralizing
  • Eliminating v
  • Eliminating x
  • Multiply to get fx(a,x)
  • Result fx(a)

66
Example
  • Want to compute P(D)
  • Moralizing
  • Eliminating v
  • Eliminating x
  • Eliminating s
  • Multiply to get fs(l,b,s)
  • Result fs(l,b)

67
Example
  • Want to compute P(D)
  • Moralizing
  • Eliminating v
  • Eliminating x
  • Eliminating s
  • Eliminating t
  • Multiply to get ft(a,l,t)
  • Result ft(a,l)

68
Example
  • Want to compute P(D)
  • Moralizing
  • Eliminating v
  • Eliminating x
  • Eliminating s
  • Eliminating t
  • Eliminating l
  • Multiply to get fl(a,b,l)
  • Result fl(a,b)

69
Example
  • Want to compute P(D)
  • Moralizing
  • Eliminating v
  • Eliminating x
  • Eliminating s
  • Eliminating t
  • Eliminating l
  • Eliminating a, b
  • Multiply to get fa(a,b,d)
  • Result f(d)

70
Expanded Graphs
[Figure: the induced graph over V, S, T, L, A, B, X, D]
  • The resulting graph is the induced graph (for this particular ordering)
  • Main property:
  • Every maximal clique in the induced graph corresponds to an intermediate factor in the computation
  • Every factor stored during the process is a subset of some maximal clique in the graph
  • These facts are true for any variable elimination ordering on any network

71
Induced Width (Treewidth)
  • The size of the largest clique in the induced graph is thus an indicator of the complexity of variable elimination
  • This quantity (minus one) is called the induced width of the graph for the specified ordering; the minimum over all orderings is the treewidth
  • Finding a good ordering for a graph means finding an ordering whose induced width is as close as possible to the treewidth (a sketch for measuring induced width follows below)
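A sketch that measures the induced width of a given ordering by replaying the elimination on a copy of the graph; it reuses the eliminate_node helper sketched above:

import copy

def induced_width(adj, ordering):
    """Largest number of neighbours seen at elimination time (= largest clique - 1)."""
    g = copy.deepcopy(adj)
    width = 0
    for x in ordering:
        width = max(width, len(g[x]))          # neighbours of x when it is eliminated
        eliminate_node(g, x)
    return width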

72
Consequence: Elimination on Trees
  • Suppose we have a tree
  • A network where each variable has at most one
    parent
  • All the factors involve at most two variables
  • Thus, the moralized graph is also a tree

[Figures: a tree network and its moralized graph]
73
Elimination on Trees
  • We can maintain the tree structure by repeatedly eliminating leaf (extreme) variables of the tree

[Figures: the tree shrinking as leaf variables are eliminated]
74
Elimination on Trees
  • Formally, for any tree, there is an elimination
    ordering with treewidth 1
  • Theorem
  • Inference on trees is linear in the number of variables

75
PolyTrees
  • A polytree is a network where there is at most
    one path from one variable to another
  • Theorem
  • Inference in a polytree is linear in the
    representation size of the network
  • This assumes tabular CPT representation
  • Can you see how the argument would work?

[Figure: a polytree]
76
General Networks
  • What do we do when the network is not a polytree?
  • If the network has a cycle, the induced width of any ordering is greater than 1

77
Example
  • Eliminating A, B, C, D, E, ...
  • The resulting graph is chordal with treewidth 2

[Figures: the induced graphs produced by eliminating A, B, C, D, E, ... from the network over A-H]
78
Example
  • Eliminating H, G, E, C, F, D, B, A

[Figures: the induced graphs produced by this ordering]
79
General Networks
  • From graph theory:
  • Theorem:
  • Finding an ordering that minimizes the treewidth is NP-hard
  • However,
  • there are reasonable heuristics for finding relatively good orderings (one is sketched below)
  • there are provable approximations to the best treewidth
  • if the graph has a small treewidth, there are algorithms that find it in polynomial time
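One standard greedy heuristic (min-fill) as a sketch, reusing the adjacency-dict helpers above; this is an illustration of such a heuristic, not necessarily the one the slides have in mind:

import copy

def min_fill_ordering(adj):
    """Greedy ordering: repeatedly eliminate the node that adds the fewest fill edges."""
    g = copy.deepcopy(adj)
    order = []
    while g:
        def fill_cost(v):
            nbrs = list(g[v])
            return sum(1 for i, y in enumerate(nbrs)
                         for z in nbrs[i + 1:] if z not in g[y])
        best = min(g, key=fill_cost)           # node whose elimination adds least fill-in
        order.append(best)
        eliminate_node(g, best)
    return order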