Title: Evidence and Message Passing
1. Lecture 3
- Evidence and Message Passing
2. Evidence
- It is useful to introduce the notion of evidence.
- P(E|S,D) ∝ P(E) P(S|E) P(D|E)
- The product P(E) P(S|E) P(D|E) is the evidence for E and has a value for each state of E
3. Evidence
- Evidence can further be divided into two parts
- Prior evidence, from the parents of a node
- π(E) = P(E)
- Likelihood evidence, from the children
- λ(E) = P(S|E) P(D|E)
4. Conventions for writing evidence
- We write it as a vector, e.g. λ(E) = (λ(e1), λ(e2), λ(e3)), or as scalar values for the individual states λ(e1), λ(e2), λ(e3)
- and at any node we combine the evidence by multiplication: evidence(ej) = π(ej) λ(ej)
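A minimal numerical sketch of this combination rule (the three states and all of the values below are illustrative, not taken from the lecture):

```python
# Combine prior (pi) and likelihood (lambda) evidence for a three-state node E,
# then normalise to a posterior distribution.  All numbers are hypothetical.

pi_E  = [0.5, 0.3, 0.2]   # prior evidence pi(e1), pi(e2), pi(e3)
lam_E = [0.1, 0.4, 0.4]   # likelihood evidence lambda(e1), lambda(e2), lambda(e3)

# Evidence is combined at the node by state-wise multiplication: pi(ej) * lambda(ej)
ev_E = [p * l for p, l in zip(pi_E, lam_E)]

# Evidence is un-normalised probability; dividing by its sum gives the posterior
alpha = 1.0 / sum(ev_E)
posterior_E = [alpha * e for e in ev_E]

print(ev_E)         # approx [0.05, 0.12, 0.08]
print(posterior_E)  # approx [0.2, 0.48, 0.32]
```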
5. Why use evidence
- Evidence is simply un-normalised probability.
- Once we have amassed all the evidence for a variable, we can convert it into a posterior probability.
- Using evidence gives us a mathematical simplification in developing equations for complex networks.
6. Calculating λ Evidence
- Given that S and D have been instantiated, say S = s4 and D = d2, we can look up the λ evidence for E in the link matrix
- P(E|S,D) ∝ P(E) P(S|E) P(D|E)
- λ(e1) = P(s4|e1) P(d2|e1)
- λ(e2) = P(s4|e2) P(d2|e2)
- λ(e3) = P(s4|e3) P(d2|e3)
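A small sketch of this lookup with hypothetical link matrices (the real matrices are not given here, and S is shown with four states rather than its seven purely for brevity):

```python
# Hypothetical link matrices: P_S_given_E[s][e] = P(s | e), similarly for D.
# E has three states; the sizes of S and D here are for illustration only.
P_S_given_E = [
    [0.1, 0.2, 0.3],
    [0.2, 0.3, 0.3],
    [0.3, 0.3, 0.2],
    [0.4, 0.2, 0.2],
]
P_D_given_E = [
    [0.6, 0.5, 0.3],
    [0.4, 0.5, 0.7],
]

s_index = 3   # S instantiated to s4 (0-based index)
d_index = 1   # D instantiated to d2

# lambda(ei) = P(s4 | ei) * P(d2 | ei): one row from each link matrix, multiplied state-wise
lam_E = [P_S_given_E[s_index][e] * P_D_given_E[d_index][e] for e in range(3)]
print(lam_E)   # approx [0.16, 0.1, 0.14]
```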
7. Calculating Evidence
- Given evidence for E we can now calculate the evidence for C by using a weighted average of the probabilities
- P(C|E,F) ∝ P(C) P(E|C) P(F|C)
- λ(c1) = (λ(e1) P(e1|c1) + λ(e2) P(e2|c1) + λ(e3) P(e3|c1)) P(f2|c1)
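Continuing the sketch with hypothetical numbers for C's link matrices (C is assumed here to have two states, and F is instantiated to f2):

```python
# Hypothetical link matrices for C's children: P_E_given_C[e][c] = P(e | c), etc.
P_E_given_C = [
    [0.7, 0.2],
    [0.2, 0.3],
    [0.1, 0.5],
]
P_F_given_C = [
    [0.8, 0.4],
    [0.2, 0.6],
]

lam_E = [0.16, 0.10, 0.14]   # lambda evidence for E from the previous sketch
f_index = 1                  # F instantiated to f2

lam_C = []
for c in range(2):
    # weighted average of lambda(E) over the states of E, conditional on C = c
    from_E = sum(lam_E[e] * P_E_given_C[e][c] for e in range(3))
    # F is instantiated, so its contribution is simply P(f2 | c)
    from_F = P_F_given_C[f_index][c]
    lam_C.append(from_E * from_F)

print(lam_C)   # approx [0.0292, 0.0792]
```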
8. The Conditioning Equation
- Generalising, we calculate the evidence using the conditioning equation
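A sketch of the conditioning equation in the form suggested by the example above, for a node C whose children are E and F (in general, one such sum per child, multiplied together):

```latex
\lambda(c_i) \;=\; \Big(\sum_j \lambda(e_j)\,P(e_j \mid c_i)\Big)\Big(\sum_k \lambda(f_k)\,P(f_k \mid c_i)\Big)
```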
9. Conditioning at the leaf nodes
- For node E, with the leaf nodes instantiated, the conditioning equation becomes
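With S = s4 and D = d2 (as in the earlier example) each sum in the conditioning equation collapses to a single term, giving for instance:

```latex
\lambda(e_i) \;=\; P(s_4 \mid e_i)\,P(d_2 \mid e_i)
```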
10. Instantiation and Evidence
- In the simple case, for leaf nodes we have a known state for that node. Recall that we defined the eye separation measure as having seven states:
11. Instantiation and Evidence 2
- So if we take a measurement, which for example is 0.61, we instantiate the corresponding state (s4). This is equivalent to setting the evidence as follows.
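In other words, all of the λ evidence is placed on the instantiated state; with the seven states of S this can be written as:

```latex
\lambda(S) \;=\; \big(\lambda(s_1),\ldots,\lambda(s_7)\big) \;=\; (0,\,0,\,0,\,1,\,0,\,0,\,0)
```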
12. Virtual evidence
- Sometimes, when we make a measurement it is possible to express uncertainty about it by distributing the evidence values. For example, instead of setting λ(s4) = 1 we could use
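For instance (purely illustrative values, not those from the original slide), the evidence could be spread over the neighbouring states:

```latex
\lambda(S) \;=\; (0,\,0,\,0.2,\,1,\,0.2,\,0,\,0)
```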
13. Virtual Evidence requires conditioning
- If we use virtual evidence then we must use the
conditioning equation
14. Problem Break
- Given the following virtual evidence, write down an expression for the λ evidence for state e1 of E (which has two children S and D)
15. Solution
- Putting in the virtual evidence gives
- λ(e1) = (P(s1|e1) + 0.2 P(s2|e1)) (0.5 P(d3|e1) + P(d4|e1))
- (I can't be bothered to multiply this out)
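A small sketch of evaluating this expression numerically, using hypothetical conditional probabilities for state e1 (none are given in the problem) and the virtual evidence implied by the solution:

```python
# Hypothetical conditional probabilities for state e1 of E
P_s_given_e1 = [0.1, 0.2, 0.3, 0.2, 0.1, 0.05, 0.05]   # P(s1|e1) ... P(s7|e1)
P_d_given_e1 = [0.25, 0.25, 0.3, 0.2]                   # P(d1|e1) ... P(d4|e1)

# Virtual evidence implied by the solution:
# lambda(s1) = 1, lambda(s2) = 0.2, lambda(d3) = 0.5, lambda(d4) = 1, all others 0
lam_S = [1, 0.2, 0, 0, 0, 0, 0]
lam_D = [0, 0, 0.5, 1]

# Conditioning equation: lambda(e1) = (sum_j lambda(sj) P(sj|e1)) * (sum_k lambda(dk) P(dk|e1))
lam_e1 = (sum(l * p for l, p in zip(lam_S, P_s_given_e1)) *
          sum(l * p for l, p in zip(lam_D, P_d_given_e1)))
print(lam_e1)   # (0.1 + 0.2*0.2) * (0.5*0.3 + 0.2) = 0.14 * 0.35 = 0.049
```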
16. No evidence
- Sometimes, we may not have data for a node. Propagation can still be carried out, but for all states the evidence for the node is the same, i.e. λ(si) = 1 for every state si.
17. No evidence and the conditioning equation
18. Upward Propagation
- In the last lecture we discussed a very simple network, namely the Bayesian Classifier
- In Bayesian classifiers the top of the tree is usually a hypothesis and the evidence is all propagated upwards.
- Tree structured networks can be used in other ways
19. Consider again the cat example
- Previously we found the evidence of there being a cat in the picture. Suppose now we want to ask whether there is a pair of eyes.
20. Case 1
- Suppose we know there is a cat in the picture. Node C is in state c1 (cat = true) and thus its other children (just F in this case) cannot affect its value.
21. Looking at the link matrix
- The link matrix reduces to a vector, since the state of C is known. This is, in effect, a prior probability of E
- In vector notation P(C) = (1, 0)
- In effect P(E) = P(E|C) P(C) is one column of the link matrix
22. Simplified Network
- Since we know the state of C, P(E|C) effectively gives us a prior probability of E: P(E)
23. Case 2
- More interestingly, we might not know for certain that there was a cat in the picture.
- Clearly the geometric evidence from below still stands, but instead of having a prior probability of a pair of eyes we need to determine the evidence from the cat node that there is a pair of eyes.
- This is the most general case for inference in trees
24. A π message from C to E
- Suppose for a given picture we calculate
- the λ evidence for C from F as
- λF(C) = (λF(c1), λF(c2)) = (0.3, 0.2)
- and the prior probability of C
- P(C) = (0.6, 0.4)
- the evidence for C (excluding that from E) is
- πE(C) = (0.18, 0.08)
- then the π evidence at E is calculated using
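A sketch of the calculation implied here: the evidence for C excluding that from E is formed first, and the π evidence at E then follows by conditioning on C (the optional normalisation step is discussed on the next slide):

```latex
\pi_E(c_j) \;=\; P(c_j)\,\lambda_F(c_j), \qquad
\pi(e_i) \;=\; \sum_j P(e_i \mid c_j)\,\pi_E(c_j)
```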
25. Normalisation of evidence in a tree
- Although not necessary, the evidence can always be normalised to a posterior probability distribution, indicated P'(C). We could also calculate π(E) as follows
- P'(cj) = α P(cj) λF(cj)
- where α is a normalising constant making Σj P'(cj) = 1
- The π message sent to E is simply
- π(ei) = Σj P(ei|cj) P'(cj)
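A numerical sketch putting these pieces together, with the numbers from the previous slide and a hypothetical link matrix P(E|C):

```python
# Evidence at C excluding that from E (numbers from the previous slide)
P_C     = [0.6, 0.4]     # prior probability of C
lam_F_C = [0.3, 0.2]     # lambda evidence for C from F
pi_E_C  = [p * l for p, l in zip(P_C, lam_F_C)]            # [0.18, 0.08]

# Optional normalisation: alpha makes the values sum to 1, giving P'(C)
alpha = 1.0 / sum(pi_E_C)
P_dash_C = [alpha * v for v in pi_E_C]                     # approx [0.692, 0.308]

# Hypothetical link matrix P_E_given_C[i][j] = P(ei | cj)
P_E_given_C = [
    [0.7, 0.2],
    [0.2, 0.3],
    [0.1, 0.5],
]

# pi message to E, with and without normalising the evidence at C
pi_E      = [sum(P_E_given_C[i][j] * pi_E_C[j] for j in range(2)) for i in range(3)]
pi_E_norm = [sum(P_E_given_C[i][j] * P_dash_C[j] for j in range(2)) for i in range(3)]

print(pi_E)        # approx [0.142, 0.060, 0.058]
print(pi_E_norm)   # approx [0.546, 0.231, 0.223] - larger, but same relative magnitudes
```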
26. Magnitude and evidence
- If we don't normalise the evidence at node C then we will send a different π message to E than if we do normalise.
- In the example it is larger if we normalise.
- However, the magnitude of the evidence is not relevant; it is the relative magnitudes of the evidence for the states of a node that carry the information.
27. General form of the π evidence
28. General form of the π evidence
- However, π evidence can also be computed with a matrix multiplication
- π(E) = P(E|C) πE(C)
- where
- π(E) is a vector expressing the π evidence for E
- πE(C) is a vector expressing all the evidence for C except that from E. (In lecture 2 we used P-E(C).)
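The same calculation written as a matrix-vector product, sketched with numpy and the same hypothetical link matrix as before:

```python
import numpy as np

# Hypothetical link matrix P(E|C): rows are states of E, columns are states of C
P_E_given_C = np.array([[0.7, 0.2],
                        [0.2, 0.3],
                        [0.1, 0.5]])

pi_E_C = np.array([0.18, 0.08])   # all the evidence for C except that from E

pi_E = P_E_given_C @ pi_E_C       # pi(E) = P(E|C) pi_E(C)
print(pi_E)                       # approx [0.142 0.06  0.058]
```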
29. Generality of Propagation in trees
- Probability propagation is completely flexible.
- We can instantiate any subset of the nodes and
calculate the probability distribution over the
states of the other nodes.
30. Priors and Likelihood in Networks
- Note that now we can associate the notion of prior and likelihood with the evidence being propagated
- λ(Bi) is the likelihood evidence for Bi
- π(Bi) is the prior evidence for Bi
31. The network structure is prior knowledge
- This notion of prior and likelihood is slightly different from our previous usage and reflects the fact that the network represents our prior knowledge for inference.
- The π evidence at a root node is the same as the prior probability of that root node.
32. Incorporating more nodes
- One of the best features of Bayesian Networks is that we can incorporate new nodes as the data becomes available.
- Recall that we had information from the computer vision process as to how likely the extracted circles were.
- This could simply be treated as another node
33. Adding a node doesn't change a network