Title: Inference I: Introduction, Hardness, and Variable Elimination
1PGM 2002/03 Tirgul 4: Exact Inference
2Inference in Simple Chains
[Figure: chain X1 -> X2]
3Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2 -> X3]
- How do we compute P(X3)?
- we already know how to compute P(X2)...
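Written out for the chain X1 -> X2 -> X3, the two steps look as follows. This is a sketch based on the usual chain factorization; the lowercase summation variables are notation introduced here.

```latex
P(X_2) = \sum_{x_1} P(x_1)\, P(X_2 \mid x_1), \qquad
P(X_3) = \sum_{x_2} P(x_2)\, P(X_3 \mid x_2)
```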
4Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2 -> ... -> Xn]
- How do we compute P(Xn)?
- Compute P(X1), P(X2), P(X3), ... in order
- We compute each term by using the previous one
- Complexity
- Each step costs O(|Val(Xi)| × |Val(Xi+1)|) operations
- Compare to naïve evaluation, which requires summing over the joint values of n-1 variables
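The forward iteration can be sketched in a few lines of Python. The function name, the list-of-lists CPD layout, and the numbers are assumptions made for illustration only.

```python
def forward_marginals(prior, transitions):
    """Return [P(X1), P(X2), ..., P(Xn)] for a chain.

    prior[a] = P(X1 = a)
    transitions[i][a][b] = P(X_{i+2} = b | X_{i+1} = a)
    """
    marginals = [list(prior)]
    for cpd in transitions:
        prev = marginals[-1]
        n_next = len(cpd[0])
        # One step costs |Val(X_i)| * |Val(X_{i+1})| multiplications.
        nxt = [sum(prev[a] * cpd[a][b] for a in range(len(prev)))
               for b in range(n_next)]
        marginals.append(nxt)
    return marginals


# Hypothetical numbers, chosen only for illustration.
prior = [0.6, 0.4]
step = [[0.7, 0.3], [0.2, 0.8]]
marginals = forward_marginals(prior, [step, step])
```

Each marginal is computed from the previous one only, so the total cost grows linearly with the chain length.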
5Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2]
- Suppose that we observe the value of X2 = x2
- How do we compute P(X1 | x2)?
- Recall that it suffices to compute P(X1, x2), since P(X1 | x2) = P(X1, x2) / P(x2)
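A minimal sketch of this two-node case: compute the joint P(X1, x2) and normalize. The function name and the numbers are hypothetical.

```python
def posterior_x1_given_x2(prior, cpd, x2):
    """P(X1 | X2 = x2), with prior[a] = P(X1 = a) and cpd[a][b] = P(X2 = b | X1 = a)."""
    joint = [prior[a] * cpd[a][x2] for a in range(len(prior))]  # P(X1, x2)
    evidence_prob = sum(joint)                                   # P(x2)
    return [p / evidence_prob for p in joint]


# Hypothetical numbers, chosen only for illustration.
posterior = posterior_x1_given_x2([0.6, 0.4], [[0.7, 0.3], [0.2, 0.8]], x2=1)
```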
6Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2 -> X3]
- Suppose that we observe the value of X3 = x3
- How do we compute P(X1, x3)?
- How do we compute P(x3 | x1)?
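One way to answer both questions, assuming the chain factorization P(X1) P(X2 | X1) P(X3 | X2):

```latex
P(x_3 \mid X_1) = \sum_{x_2} P(x_2 \mid X_1)\, P(x_3 \mid x_2), \qquad
P(X_1, x_3) = P(X_1)\, P(x_3 \mid X_1)
```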
7Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2 -> X3 -> ... -> Xn]
- Suppose that we observe the value of Xn = xn
- How do we compute P(X1, xn)?
- We compute P(xn | xn-1), P(xn | xn-2), ... iteratively
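The backward iteration can be sketched as follows. The function name, the CPD layout, and the numbers are illustrative assumptions.

```python
def backward_messages(transitions, xn):
    """Return [P(xn | X1), ..., P(xn | X_{n-1})] for a chain.

    transitions[i][a][b] = P(X_{i+2} = b | X_{i+1} = a)
    xn is the index of the observed value of Xn.
    """
    # P(xn | X_{n-1}) is a column of the last CPD.
    msg = [row[xn] for row in transitions[-1]]
    messages = [msg]
    for cpd in reversed(transitions[:-1]):
        # P(xn | X_i = a) = sum_b P(X_{i+1} = b | X_i = a) * P(xn | X_{i+1} = b)
        msg = [sum(cpd[a][b] * msg[b] for b in range(len(msg)))
               for a in range(len(cpd))]
        messages.append(msg)
    messages.reverse()
    return messages


# Hypothetical numbers, chosen only for illustration (n = 3).
step = [[0.7, 0.3], [0.2, 0.8]]
msgs = backward_messages([step, step], xn=1)
```

Multiplying the first message by P(X1) entrywise gives P(X1, xn).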
8Inference in Simple Chains (cont.)
[Figure: chain X1 -> X2 -> ... -> Xk -> ... -> Xn]
- Suppose that we observe the value of Xn = xn
- We want to find P(Xk | xn)
- How do we compute P(Xk, xn)?
- We compute P(Xk) by forward iterations
- We compute P(xn | Xk) by backward iterations
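Combining the two passes gives P(Xk, xn) = P(Xk) P(xn | Xk), which is then normalized. The sketch below assumes a list-of-lists CPD layout and uses hypothetical numbers; it is one possible arrangement, not a prescribed implementation.

```python
def posterior_xk_given_xn(prior, transitions, k, xn):
    """P(Xk | Xn = xn) in a chain X1 -> ... -> Xn (k is 1-based, k < n).

    prior[a] = P(X1 = a)
    transitions[i][a][b] = P(X_{i+2} = b | X_{i+1} = a)
    """
    # Forward iterations: P(Xk).
    p_k = list(prior)
    for cpd in transitions[:k - 1]:
        p_k = [sum(p_k[a] * cpd[a][b] for a in range(len(p_k)))
               for b in range(len(cpd[0]))]
    # Backward iterations: P(xn | Xk).
    back = [row[xn] for row in transitions[-1]]
    for cpd in reversed(transitions[k - 1:-1]):
        back = [sum(cpd[a][b] * back[b] for b in range(len(back)))
                for a in range(len(cpd))]
    joint = [f * b for f, b in zip(p_k, back)]  # P(Xk, xn)
    z = sum(joint)                              # P(xn)
    return [p / z for p in joint]


# Hypothetical numbers, chosen only for illustration (n = 3).
step = [[0.7, 0.3], [0.2, 0.8]]
post_x2 = posterior_xk_given_xn([0.6, 0.4], [step, step], k=2, xn=1)
post_x1 = posterior_xk_given_xn([0.6, 0.4], [step, step], k=1, xn=1)
```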
9Elimination in Chains
- We now try to understand the simple chain example using first principles
- Using the definition of probability, we have
10Elimination in Chains
- By chain decomposition, we get
11Elimination in Chains
12Elimination in Chains
- Now we can perform the innermost summation
- This summation is exactly the first step in the forward iteration we described before
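The equations on these slides are missing from this transcript. As a sketch, for a four-variable chain X1 -> X2 -> X3 -> X4 (the chain length is an assumption made for illustration), the steps described above read:

```latex
\begin{align*}
P(x_4) &= \sum_{x_3}\sum_{x_2}\sum_{x_1} P(x_1, x_2, x_3, x_4) \\
       &= \sum_{x_3}\sum_{x_2}\sum_{x_1} P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_2)\,P(x_4 \mid x_3) \\
       &= \sum_{x_3} P(x_4 \mid x_3) \sum_{x_2} P(x_3 \mid x_2) \sum_{x_1} P(x_1)\,P(x_2 \mid x_1) \\
       &= \sum_{x_3} P(x_4 \mid x_3) \sum_{x_2} P(x_3 \mid x_2)\,P(x_2)
\end{align*}
```

The last line uses the innermost sum, which equals P(x2), the first forward step.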
13Elimination in Chains
- Rearranging and then summing again, we get
14Elimination in Chains with Evidence
- Similarly, we understand the backward pass
- We write the query in explicit form
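A sketch of the explicit form and its rearrangement, again assuming a four-variable chain with X4 = x4 observed (the chain length is chosen here for illustration):

```latex
\begin{align*}
P(X_1, x_4) &= \sum_{x_2}\sum_{x_3} P(X_1)\,P(x_2 \mid X_1)\,P(x_3 \mid x_2)\,P(x_4 \mid x_3) \\
            &= P(X_1) \sum_{x_2} P(x_2 \mid X_1) \sum_{x_3} P(x_3 \mid x_2)\,P(x_4 \mid x_3) \\
            &= P(X_1) \sum_{x_2} P(x_2 \mid X_1)\,P(x_4 \mid x_2) \\
            &= P(X_1)\,P(x_4 \mid X_1)
\end{align*}
```

Here the sums are performed from the evidence end of the chain inward, which is the backward pass.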
15Elimination in Chains with Evidence
16Elimination in Chains with Evidence
17Elimination in Chains with Evidence
18Variable Elimination
- General idea
- Write the query as a nested sum over a product of factors
- Iteratively
- Move all irrelevant terms outside of the innermost sum
- Perform the innermost sum, getting a new term
- Insert the new term into the product
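The loop above can be sketched in Python. The factor representation (a tuple of variable names plus a dict keyed by value tuples), the function names, and the numbers are assumptions made for this illustration, not part of the slides.

```python
from itertools import product


def multiply(f1, f2, domains):
    """Pointwise product of two factors; a factor is (variables, table)."""
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = tuple(vars1) + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for values in product(*(domains[v] for v in out_vars)):
        a = dict(zip(out_vars, values))
        table[values] = (t1[tuple(a[v] for v in vars1)] *
                         t2[tuple(a[v] for v in vars2)])
    return out_vars, table


def sum_out(factor, var):
    """Sum a variable out of a factor."""
    vars_, t = factor
    idx = vars_.index(var)
    out_vars = vars_[:idx] + vars_[idx + 1:]
    table = {}
    for values, p in t.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return out_vars, table


def eliminate(factors, order, domains):
    """Eliminate the variables in `order`; return one factor over the rest."""
    factors = list(factors)
    for var in order:
        relevant = [f for f in factors if var in f[0]]
        # Factors that do not mention var stay outside the innermost sum.
        factors = [f for f in factors if var not in f[0]]
        if not relevant:
            continue
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f, domains)
        # Perform the innermost sum and insert the new term into the product.
        factors.append(sum_out(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f, domains)
    return result


# Hypothetical three-node chain X1 -> X2 -> X3, for illustration only.
domains = {"X1": (0, 1), "X2": (0, 1), "X3": (0, 1)}
cpd = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}
p1 = (("X1",), {(0,): 0.6, (1,): 0.4})
p2 = (("X1", "X2"), dict(cpd))
p3 = (("X2", "X3"), dict(cpd))
result_vars, result_table = eliminate([p1, p2, p3], ["X1", "X2"], domains)
```

On the chain, eliminating X1 and then X2 reproduces the forward iterations; the same functions apply to any set of factors.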
19A More Complex Example
20
- We want to compute P(d)
- Need to eliminate v,s,x,t,l,a,b
- Initial factors
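The factor list itself is missing from this transcript. The factors named on the following slides, fv(t), fs(b,l) and fx(a), together with P(V) and P(T | V) on the evidence slides, are consistent with the standard "Asia" network over V, S, T, L, A, B, X, D. Assuming that structure (an assumption, not something the transcript states), the query would read:

```latex
P(d) = \sum_{v,s,x,t,l,a,b}
  P(v)\,P(s)\,P(t \mid v)\,P(l \mid s)\,P(b \mid s)\,
  P(a \mid t, l)\,P(x \mid a)\,P(d \mid a, b)
```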
21
- We want to compute P(d)
- Need to eliminate v, s, x, t, l, a, b
- Initial factors
Eliminate v
Note: fv(t) = P(t). In general, the result of elimination is not necessarily a probability term.
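Spelled out, assuming v appears only in the factors P(v) and P(t | v):

```latex
f_v(t) = \sum_{v} P(v)\,P(t \mid v) = \sum_{v} P(v, t) = P(t)
```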
22
- We want to compute P(d)
- Need to eliminate s, x, t, l, a, b
- Initial factors
Eliminate s
Summing on s results in a factor with two arguments, fs(b,l). In general, the result of elimination may be a function of several variables.
23
- We want to compute P(d)
- Need to eliminate x, t, l, a, b
- Initial factors
Eliminate x
Note: fx(a) = 1 for all values of a!
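A one-line justification, assuming x appears only in a conditional distribution P(x | a), which is what a resulting factor over a alone suggests:

```latex
f_x(a) = \sum_{x} P(x \mid a) = 1 \quad \text{for every value of } a
```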
24
- We want to compute P(d)
- Need to eliminate t,l,a,b
- Initial factors
Eliminate t
25
- We want to compute P(d)
- Need to eliminate l,a,b
- Initial factors
Eliminate l
26
- We want to compute P(d)
- Need to eliminate a, b
- Initial factors
Eliminate a, b
27Variable Elimination
- We now understand variable elimination as a sequence of rewriting operations
- Actual computation is done in the elimination step
- Exactly the same computation procedure applies to Markov networks
- Computation depends on the order of elimination
- We will return to this issue in detail
28Dealing with evidence
- How do we deal with evidence?
- Suppose we get evidence V = t, S = f, D = t
- We want to compute P(L, V = t, S = f, D = t)
29Dealing with Evidence
- We start by writing the factors
- Since we know that V = t, we don't need to eliminate V
- Instead, we can replace the factors P(V) and P(T | V) with fP(V) = P(V = t) and fP(T|V)(T) = P(T | V = t)
- These select the appropriate parts of the original factors given the evidence
- Note that fP(V) is a constant, and thus does not appear in the elimination of other variables
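Selecting the part of a factor that agrees with the evidence can be sketched as below. The factor layout (variable tuple plus dict), the function name `restrict`, and the CPD values are hypothetical and serve only as an illustration.

```python
def restrict(factor, var, value):
    """Keep the entries of a factor consistent with var = value, then drop var."""
    vars_, table = factor
    if var not in vars_:
        return factor
    idx = vars_.index(var)
    out_vars = vars_[:idx] + vars_[idx + 1:]
    out_table = {k[:idx] + k[idx + 1:]: p
                 for k, p in table.items() if k[idx] == value}
    return out_vars, out_table


# Hypothetical values for P(V) and P(T | V), for illustration only.
p_v = (("V",), {("t",): 0.01, ("f",): 0.99})
p_t_given_v = (("T", "V"), {("t", "t"): 0.05, ("f", "t"): 0.95,
                            ("t", "f"): 0.01, ("f", "f"): 0.99})
f_pv = restrict(p_v, "V", "t")            # a constant: no variables left
f_ptv = restrict(p_t_given_v, "V", "t")   # a factor over T only
```

After restriction, the first factor has an empty scope, which is why it never takes part in the elimination of another variable.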
30Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence
31Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence
- Eliminating x, we get
32Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence
- Eliminating x, we get
- Eliminating t, we get
33Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence
- Eliminating x, we get
- Eliminating t, we get
- Eliminating a, we get
34Dealing with Evidence
- Given evidence V = t, S = f, D = t
- Compute P(L, V = t, S = f, D = t)
- Initial factors, after setting evidence
- Eliminating x, we get
- Eliminating t, we get
- Eliminating a, we get
- Eliminating b, we get
35Complexity of variable elimination
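- Suppose in one elimination step we compute fx(y1, ..., yk) = Σ_x f'x(x, y1, ..., yk), where f'x is the product of the m factors that mention x
- This requires
- m × |Val(X)| × Π_i |Val(Yi)| multiplications
- For each value of x, y1, ..., yk, we do m multiplications
- |Val(X)| × Π_i |Val(Yi)| additions
- For each value of y1, ..., yk, we do |Val(X)| additions
- Complexity is exponential in the number of variables in the intermediate factor!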
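The accounting above can be turned into a small helper. The function name and argument names are hypothetical; the counts follow the slide's per-entry bookkeeping (m multiplications per entry, |Val(X)| additions per assignment to y1, ..., yk), which is an upper bound rather than an exact count.

```python
from math import prod


def elimination_step_cost(num_factors, card_x, cards_y):
    """Operation counts for one elimination step, following the slide's accounting.

    num_factors: m, the number of factors multiplied together
    card_x: |Val(X)| for the eliminated variable
    cards_y: [|Val(Y1)|, ..., |Val(Yk)|] for the other variables involved
    """
    entries = card_x * prod(cards_y)   # size of the intermediate factor
    multiplications = num_factors * entries
    additions = entries                # |Val(X)| per assignment to y1..yk
    return multiplications, additions


# Two factors over binary variables x, y1, y2.
small = elimination_step_cost(2, 2, [2, 2])
# Same step with ten binary neighbours: the table size dominates.
large = elimination_step_cost(2, 2, [2] * 10)
```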
36Numeric example: Green Network
[Figure: network over the nodes Rain, Hiker, Gazlan, Car, Waste, Pollution]
37Example
- Suppose we wish to calculate P(c0 | p1, w0), using the algorithm shown in class.
- Answer
- First, we'll calculate P(C, p1, w0).
- We have the initial factors
- P(R) = fR(R)
- r1: 0.2
- r0: 0.8
38Initial Factors
- P(G | R) = fG(G,R)
- r0 g0: 0.1
- r0 g1: 0.9
- r1 g0: 0.3
- r1 g1: 0.7
- P(H | R) = fH(H,R)
- h0 r0: 0.2
- h0 r1: 0.7
- h1 r0: 0.8
- h1 r1: 0.3
39Initial Factors (cont.)
- P(w0 | G,H) = fW(w0,G,H)
- w0 h0 g0: 0.9
- w0 h0 g1: 0.7
- w0 h1 g0: 0.8
- w0 h1 g1: 0.15
- P(C | H) = fC(C,H)
- h0 c0: 0.95
- h0 c1: 0.05
- h1 c0: 0.4
- h1 c1: 0.6
40Initial Factors (cont.)
- P(p1 | C) = fP(p1,C)
- p1 c0: 0.2
- p1 c1: 0.9
- We will now choose an elimination order (G, then R, then H) and compute
- P(C, p1, w0)
41Calculation
- The steps are
- gwghr = fW × fG
- w0 g0 h0 r0: 0.9 × 0.1 = 0.09
- w0 g0 h0 r1: 0.9 × 0.3 = 0.27
- w0 g0 h1 r0: 0.8 × 0.1 = 0.08
- w0 g0 h1 r1: 0.8 × 0.3 = 0.24
- w0 g1 h0 r0: 0.7 × 0.9 = 0.63
- w0 g1 h0 r1: 0.7 × 0.7 = 0.49
- w0 g1 h1 r0: 0.15 × 0.9 = 0.135
- w0 g1 h1 r1: 0.15 × 0.7 = 0.105
42Calculation (cont.)
- mhwr (summing gwghr over g)
- w0 h0 r0: 0.09 + 0.63 = 0.72
- w0 h0 r1: 0.27 + 0.49 = 0.76
- w0 h1 r0: 0.08 + 0.135 = 0.215
- w0 h1 r1: 0.24 + 0.105 = 0.345
- gwhr = fH × fR × mhwr
- w0 h0 r0: 0.8 × 0.2 × 0.72 = 0.1152
- w0 h0 r1: 0.2 × 0.7 × 0.76 = 0.1064
- w0 h1 r0: 0.8 × 0.8 × 0.215 = 0.1376
- w0 h1 r1: 0.2 × 0.3 × 0.345 = 0.0207
43Calculation (cont.)
- mhw (summing gwhr over r)
- w0 h0: 0.1152 + 0.1064 = 0.2216
- w0 h1: 0.1376 + 0.0207 = 0.1583
- gwhc = fC × mhw
- w0 h0 c0: 0.2216 × 0.95 = 0.21052
- w0 h0 c1: 0.2216 × 0.05 = 0.01108
- w0 h1 c0: 0.1583 × 0.4 = 0.06332
- w0 h1 c1: 0.1583 × 0.6 = 0.09498
- mwc (summing gwhc over h)
- w0 c0: 0.21052 + 0.06332 = 0.27384
- w0 c1: 0.01108 + 0.09498 = 0.10606
44Calculation (cont.)
- gwcp = fP × mwc
- w0 c0 p1: 0.2 × 0.27384 = 0.054768
- w0 c1 p1: 0.9 × 0.10606 = 0.095454
- Now, we have P(c0 | p1, w0) = 0.054768 / (0.054768 + 0.095454) ≈ 0.3646
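The arithmetic of this example can be re-run with a short script. The dictionary layout and variable names are choices made here for illustration; the numbers are the initial factors listed on the slides, with index 0 and 1 standing for values such as g0 and g1.

```python
# Initial factors from the slides, with evidence W = w0 and P = p1 already fixed.
f_R = {0: 0.8, 1: 0.2}                                          # key r
f_G = {(0, 0): 0.1, (1, 0): 0.9, (0, 1): 0.3, (1, 1): 0.7}      # key (g, r)
f_H = {(0, 0): 0.2, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.3}      # key (h, r)
f_W = {(0, 0): 0.9, (1, 0): 0.7, (0, 1): 0.8, (1, 1): 0.15}     # key (g, h)
f_C = {(0, 0): 0.95, (1, 0): 0.05, (0, 1): 0.4, (1, 1): 0.6}    # key (c, h)
f_P = {0: 0.2, 1: 0.9}                                          # key c

B = (0, 1)  # all variables are binary

# Eliminate G, then R, then H.
m_hr = {(h, r): sum(f_W[g, h] * f_G[g, r] for g in B) for h in B for r in B}
m_h = {h: sum(f_H[h, r] * f_R[r] * m_hr[h, r] for r in B) for h in B}
m_c = {c: sum(f_C[c, h] * m_h[h] for h in B) for c in B}

joint = {c: f_P[c] * m_c[c] for c in B}          # P(C = c, p1, w0)
posterior_c0 = joint[0] / (joint[0] + joint[1])  # P(c0 | p1, w0)
```

The intermediate dictionaries correspond to mhwr, mhw and mwc on the slides, so each can be compared against the hand calculation.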
45The Computational Cost
- Computing gwghr = fW × fG requires 8 multiplications.
- Computing mhwr requires 4 additions.
- Computing gwhr = fH × fR × mhwr requires 8 multiplications.
- Computing mhw requires 2 additions.
- Computing gwhc = fC × mhw requires 4 multiplications.
- Computing mwc requires 2 additions.
- Computing gwcp = fP × mwc requires 2 multiplications.
- For a total of 8 + 8 + 4 + 2 = 22 multiplications
- and 4 + 2 + 2 = 8 additions.
46Elimination order does matter
- We can choose another elimination order, say R, G, H
- For a total of 16 + 4 + 4 + 2 = 26 multiplications,
- and 4 + 2 + 2 = 8 additions.
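Both totals can be checked by counting operations from the factor scopes alone. The sketch below is a hypothetical helper, not part of the slides. It counts k-1 multiplications per table entry when k factors are multiplied and one addition per summed-out pair of entries, which is the convention that reproduces the totals above for binary variables.

```python
def count_operations(scopes, order, card=2):
    """Count (multiplications, additions) for an elimination order.

    scopes: one set of unobserved variables per factor
    order: variables to eliminate, in order
    card: number of values per variable (all variables assumed equal-sized)
    """
    scopes = [set(s) for s in scopes]
    mults = adds = 0
    for var in order:
        relevant = [s for s in scopes if var in s]
        if not relevant:
            continue
        scopes = [s for s in scopes if var not in s]
        union = set().union(*relevant)
        entries = card ** len(union)
        mults += (len(relevant) - 1) * entries
        adds += (card - 1) * entries // card
        scopes.append(union - {var})
    # Multiply whatever factors remain over the query variables.
    union = set().union(*scopes)
    mults += (len(scopes) - 1) * card ** len(union)
    return mults, adds


# Factor scopes of the example after fixing the evidence P = p1, W = w0.
scopes = [{"R"}, {"G", "R"}, {"H", "R"}, {"G", "H"}, {"C", "H"}, {"C"}]
cost_grh = count_operations(scopes, ["G", "R", "H"])
cost_rgh = count_operations(scopes, ["R", "G", "H"])
```

Eliminating R first multiplies three factors over an eight-entry table, which accounts for the extra multiplications in the second order.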