Title: Multiple Parents
Lecture 4
Multiple Parents
- Up until now our networks have been trees, but in general they need not be. In particular, we need to cope with the possibility of multiple parents.
- Multiple parents can be thought of as representing different possible causes of an outcome.
- For example, the eyes in a picture could be caused by other image features.
Other causes for the E node
- A high probability for a state of the E (eyes) node could be caused by:
- Pictures showing other animals (owls, dogs, etc.)
- Pictures that have features sharing the same geometric model (bicycles)
How to tell an owl from a cat
Owls or Cats!
- Two causes for the E variable are represented by multiple parents.
Conditional Probabilities
- Unfortunately, with multiple parents our conditional probabilities become more complex. For the eye node we have to consider
- P(E|C,W)
- This must now be associated with the node, not the arc, since it is distributed over the possible combinations of states of the parents.
Link Matrix for multiple parents
If we write the states of W as w1 and w2, with w1 meaning owl present (and similarly for C), the link matrix becomes P(E|C,W), with one column for each combination of parent states (c1w1, c1w2, c2w1, c2w2) and one row for each state of E.
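As a rough sketch (the probability values here are invented purely for illustration; the real link matrix would come from the image model), such a link matrix can be stored as an array with one column per combination of parent states:

```python
import numpy as np

# Hypothetical link matrix P(E|C,W): rows are the states of E (e1, e2, e3),
# columns are the parent-state combinations (c1w1, c1w2, c2w1, c2w2).
link_matrix = np.array([
    [0.9, 0.8, 0.7, 0.1],   # P(e1 | c, w) for each parent combination
    [0.1, 0.1, 0.2, 0.3],   # P(e2 | c, w)
    [0.0, 0.1, 0.1, 0.6],   # P(e3 | c, w)
])

# Each column must be a probability distribution over the states of E.
assert np.allclose(link_matrix.sum(axis=0), 1.0)
```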
π and λ messages with multiple parents
- If we wish to calculate the probability of eyes given all the evidence, we first calculate the evidence for C, taking into account its π evidence (prior probability) and λ evidence from F.
π and λ messages with multiple parents
- Similarly we calculate the evidence over the states of W. At present we will assume only prior evidence.
Finding the joint distribution over the parents
- Next we calculate a joint distribution over C and W.
- P'(C,W) = P'(C) P'(W)
- so for individual states we have that
- P'(ci,wj) = P'(ci) P'(wj)
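A minimal sketch of this step, with illustrative evidence vectors (the names p_C and p_W are assumptions, not from the lecture): the joint distribution is just the outer product of the two vectors.

```python
import numpy as np

# Illustrative evidence vectors over the two parents.
p_C = np.array([0.5, 0.5])
p_W = np.array([0.25, 0.75])

# P'(ci, wj) = P'(ci) * P'(wj): the outer product of the two vectors.
joint_CW = np.outer(p_C, p_W)
print(joint_CW)   # rows indexed by the states of C, columns by the states of W
```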
Independence of C and W
- We have treated C and W as independent events and this may look peculiar, since given that there is a cat in the picture (causing the eyes) we know it cannot be an owl.
- However, if we think of choosing images at random from a database there is clearly no dependency between cats and owls. We just expect statistically that P(c1,w1) ≈ 0.
Evidence from where
- In calculating the π message to E the evidence that we use is accumulated from everywhere else in the network. We do not use the λ evidence from E.
- If we did so we would include the λ evidence more than once. To do this would bias any inference on E in favour (in this case) of the nodes S and D.
It would also set up a loop in the calculation.
Calculating the π evidence
- Finally we calculate the π evidence by taking the product of the link matrix with the posterior joint distribution.
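A hedged sketch of that product, reusing the illustrative link matrix from above and an assumed joint distribution over the parents:

```python
import numpy as np

# Illustrative link matrix P(E|C,W) and joint distribution over the parents.
link_matrix = np.array([
    [0.9, 0.8, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.3],
    [0.0, 0.1, 0.1, 0.6],
])
joint_CW = np.outer([0.5, 0.5], [0.25, 0.75])

# Flatten the joint so its order matches the columns (c1w1, c1w2, c2w1, c2w2),
# then multiply by the link matrix to get the π evidence over the states of E.
pi_E = link_matrix @ joint_CW.flatten()
print(pi_E)
```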
Distinction between P' and πE
- To be quite clear about where the evidence comes from we will in future write
- P'(C): the probability of C given all the evidence
- πE(C): the evidence for C used to compute the π evidence for E (the π message from C to E), which we can write
- πE(C) = P'(C)/λE(C)
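A small sketch of this definition (the function name and numbers are assumptions for illustration; normalising the message is a convenience, not required by the formula):

```python
import numpy as np

def pi_message(posterior, lambda_from_child):
    """πE(C) = P'(C) / λE(C): divide the child's own λ contribution
    out of the full posterior before sending evidence back to that child."""
    msg = posterior / lambda_from_child
    return msg / msg.sum()   # optional normalisation; it only changes the scale

# Illustrative numbers only (not taken from the lecture).
print(pi_message(np.array([0.4, 0.6]), np.array([0.8, 0.5])))
```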
Being precise about the π evidence
- Using this notation we have that
- π(ek) = Σi Σj P(ek|ci,wj) πE(ci) πE(wj)
Posterior Probability of E
- As before, we find the posterior probability of E given all the evidence by multiplying the π and λ evidence together and normalising.
- P'(ei) = α π(ei) λ(ei)
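A minimal sketch of this combination step, with assumed π and λ vectors:

```python
import numpy as np

# Combining π and λ evidence at E (illustrative numbers only).
pi_E = np.array([0.30, 0.15, 0.10])      # π evidence from the parents
lambda_E = np.array([0.9, 0.5, 0.2])     # λ evidence from E's children

unnormalised = pi_E * lambda_E
posterior_E = unnormalised / unnormalised.sum()   # α is the normalising constant
print(posterior_E)
```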
Problem Break
- Given prior probabilities P(C) = (0.5, 0.5) and P(W) = (0.25, 0.75), and the λ evidence for C from F is λF(C) = (0.33, 0.5)
- and the link matrix for E is
- Calculate the π evidence sent to E from its parents.
Solution (tricky)
- Evidence for W from everywhere but E is
- π(W) = (0.25, 0.75)
- Evidence for C from everywhere but E is
- π(C) = (0.5×0.33, 0.5×0.5) = (1/6, 1/4)
- using our previous notation we write
- πE(C) = (1/6, 1/4)
- Joint evidence
- πE(C,W) = πE(C) πE(W) = (1/24, 1/8, 1/16, 3/16)
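A quick numerical check of this solution (treating 0.33 as 1/3, as the fractions above do):

```python
import numpy as np

P_C = np.array([0.5, 0.5])
P_W = np.array([0.25, 0.75])
lambda_F_C = np.array([1/3, 1/2])        # λ evidence for C from F

pi_E_C = P_C * lambda_F_C                # evidence for C from everywhere except E
pi_E_W = P_W                             # only prior evidence for W
joint = np.outer(pi_E_C, pi_E_W)         # πE(C,W)

print(joint.flatten())  # [1/24, 1/8, 1/16, 3/16] ≈ [0.0417, 0.125, 0.0625, 0.1875]
```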
Solution (the easy bit)
- The π evidence for E is then found by multiplying the link matrix by the joint evidence over the parents.
Calculating a Distribution over W or C
- What if we want to send λ evidence to W or C?
- Before we introduced the W node this was done as follows
- λE(c1) = P(e1|c1) λ(e1) + P(e2|c1) λ(e2) + P(e3|c1) λ(e3)
- or more generally
- λE(c1) = Σj P(ej|c1) λ(ej)
- Note the subscript E in λE distinguishing the λ message sent from E to C from the total λ evidence at E.
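A sketch of this single-parent λ message as a matrix computation (the link matrix P(E|C) and the λ vector are assumed values for illustration):

```python
import numpy as np

# λ message from E to a single parent C, before W was introduced.
link_EC = np.array([      # P(E|C): rows are e1..e3, columns are c1, c2
    [0.80, 0.20],
    [0.15, 0.30],
    [0.05, 0.50],
])
lambda_E = np.array([0.9, 0.5, 0.2])   # total λ evidence at E

# λE(ck) = Σj P(ej|ck) λ(ej): a transpose-matrix product.
lambda_E_to_C = link_EC.T @ lambda_E
print(lambda_E_to_C)
```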
Reducing the matrix
- To send a λ message from E to C we need to reduce the joint probability matrix to a single conditional probability matrix.
- We can think of this as follows
- P(e1|c1) = P(e1|c1,w1) πE(w1) + P(e1|c1,w2) πE(w2)
- In other words we use a weighted average of the joint probabilities.
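A sketch of this reduction, assuming the illustrative link matrix from earlier, reshaped so it is indexed by (E state, C state, W state):

```python
import numpy as np

# Reducing P(E|C,W) to P(E|C) by averaging over W, weighted by πE(W).
link_matrix = np.array([
    [0.9, 0.8, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.3],
    [0.0, 0.1, 0.1, 0.6],
]).reshape(3, 2, 2)                      # indices: (E state, C state, W state)

pi_E_W = np.array([0.25, 0.75])          # evidence for W that does not come from E

# P(ej|ci) = Σk P(ej|ci,wk) πE(wk)
reduced = (link_matrix * pi_E_W).sum(axis=2)
print(reduced)                           # shape (3, 2): a link matrix between E and C only
```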
Reducing the Matrix
- Two points are important to note here
- 1. If we wish to estimate P(E|C) from P(E|C,W) we use only the evidence for W that does not come from E. Clearly, if we used the λ evidence from E it would appear twice in the computation of P'(C).
- 2. If we want a proper probability distribution we need to normalise the λE evidence.
Practical Calculation
- In practice we don't bother going to the full length of estimating P(E|C); we calculate the λ message as follows
- λE(c1) = Σj [ P(ej|c1,w1) πE(w1) + P(ej|c1,w2) πE(w2) ] λ(ej)
- or more neatly
- λE(c1) = Σi πE(wi) Σj P(ej|c1,wi) λ(ej)
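A sketch of this practical calculation, again with assumed values; the einsum expresses exactly the double sum above:

```python
import numpy as np

# Practical λ-message calculation from E to C with two parents.
link_matrix = np.array([
    [0.9, 0.8, 0.7, 0.1],
    [0.1, 0.1, 0.2, 0.3],
    [0.0, 0.1, 0.1, 0.6],
]).reshape(3, 2, 2)                    # indices: (E state, C state, W state)

pi_E_W = np.array([0.25, 0.75])        # π message from W (evidence for W excluding E)
lambda_E = np.array([0.9, 0.5, 0.2])   # total λ evidence at E

# λE(ci) = Σk πE(wk) Σj P(ej|ci,wk) λ(ej)
lambda_E_to_C = np.einsum('jik,k,j->i', link_matrix, pi_E_W, lambda_E)
print(lambda_E_to_C)

# Normalising gives a proper distribution, as noted in point 2 above.
print(lambda_E_to_C / lambda_E_to_C.sum())
```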