Evidence and Message Passing
1
Lecture 3
  • Evidence and Message Passing

2
Evidence
  • It is useful to introduce the notion of evidence.
  • P(E|S,D) ∝ P(E) P(S|E) P(D|E)
  • The product P(E) P(S|E) P(D|E) is the evidence
    for E and has a value for each state of E

3
Evidence
  • Evidence can further be divided into two parts
  • Prior evidence, from the parents of a node
  • π(E) = P(E)
  • Likelihood evidence from the children
  • λ(E) = P(S|E) P(D|E)

4
Conventions for writing evidence
  • We write it as a vector λ(E) = (λ(e1), λ(e2),
    λ(e3))
  • or as scalar values for the individual states
  • λ(e1), λ(e2), λ(e3)
  • and at any node we combine the evidence by
    multiplication π(ej) λ(ej)

5
Why use evidence
  • Evidence is simply un-normalised probability.
  • Once we have amassed all the evidence for a
    variable, we can convert it into a posterior
    probability.
  • Using evidence gives us a mathematical
    simplification in developing equations for
    complex networks.
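A minimal Python sketch of this point (not part of the original slides; the evidence values are made up for illustration): converting amassed evidence into a posterior is just a normalisation.

```python
# Hypothetical combined evidence for the three states of a variable E
# (illustrative numbers only, not from the lecture).
evidence = [0.06, 0.03, 0.01]

# Normalising so the values sum to 1 gives the posterior distribution over E.
total = sum(evidence)
posterior = [ev / total for ev in evidence]

print(posterior)  # approximately [0.6, 0.3, 0.1]
```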

6
Calculating λ Evidence
  • Given that S and D have been instantiated, say
    S = s4 and D = d2, we can look up the λ evidence
    for E in the link matrices
  • P(E|S,D) ∝ P(E) P(S|E) P(D|E)
  • λ(e1) = P(s4|e1) P(d2|e1)
  • λ(e2) = P(s4|e2) P(d2|e2)
  • λ(e3) = P(s4|e3) P(d2|e3)

7
Calculating Evidence
  • Given evidence for E we can now calculate the
    evidence for C by using a weighted average of the
    probabilities
  • P(C|E,F) ∝ P(C) P(E|C) P(F|C)
  • λ(c1) = (λ(e1) P(e1|c1) + λ(e2) P(e2|c1)
    + λ(e3) P(e3|c1)) P(f2|c1)
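A minimal Python sketch of this calculation (not from the original slides; the link matrices and the λ vector for E are hypothetical illustrative values, and F is taken to be instantiated to f2):

```python
# Hypothetical link matrices, indexed [state of C][state of child].
P_E_given_C = [[0.7, 0.2, 0.1],   # P(e1|c1), P(e2|c1), P(e3|c1)
               [0.2, 0.3, 0.5]]   # P(e1|c2), P(e2|c2), P(e3|c2)
P_F_given_C = [[0.6, 0.4],        # P(f1|c1), P(f2|c1)
               [0.1, 0.9]]        # P(f1|c2), P(f2|c2)

lambda_E = [0.3, 0.5, 0.2]        # lambda evidence for E from below (illustrative)
lambda_F = [0.0, 1.0]             # F instantiated to f2

# For each state of C, multiply together the weighted sums over each child's states.
lambda_C = []
for ci in range(2):
    from_E = sum(lambda_E[j] * P_E_given_C[ci][j] for j in range(3))
    from_F = sum(lambda_F[k] * P_F_given_C[ci][k] for k in range(2))
    lambda_C.append(from_E * from_F)

print(lambda_C)   # lambda evidence for the states c1 and c2
```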

8
The Conditioning Equation
  • Generalising we calculate the evidence using the
    conditioning equation
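(The conditioning equation itself appears on the slide as an image and is missing from this transcript; based on the worked example on the previous slide, for a node C with children X1, ..., Xn it presumably has the form

    \lambda(c_i) = \prod_{k=1}^{n} \sum_{j} \lambda(x^{(k)}_j)\, P(x^{(k)}_j \mid c_i)

where the sum runs over the states of each child.)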

9
Conditioning at the leaf nodes
  • For node E, with the leaf nodes instantiated the
    conditioning equation becomes
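(The reduced form is missing from the transcript; with S = s4 and D = d2 instantiated, as on slide 6, it is presumably λ(ei) = P(s4|ei) P(d2|ei).)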

10
Instantiation and Evidence
  • In the simple case, for leaf nodes we have a
    known state for that node. Recall that we defined
    the eye separation measure as having seven
    states

11
Instantiation and Evidence 2
  • So if we take a measurement which is, for
    example, 0.61, we instantiate the corresponding
    state (s4). This is equivalent to setting the
    evidence as follows.
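(The evidence setting shown on the slide is missing from the transcript; with seven states and s4 instantiated it is presumably λ(S) = (0, 0, 0, 1, 0, 0, 0).)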

12
Virtual evidence
  • Sometimes, when we make a measurement it is
    possible to express uncertainty about it by
    distributing the evidence values. For example,
    instead of setting λ(s4) = 1 we could use

13
Virtual Evidence requires conditioning
  • If we use virtual evidence then we must use the
    conditioning equation
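(The equation on this slide is missing from the transcript; with virtual evidence on the two children S and D it presumably takes the form

    \lambda(e_i) = \Big(\sum_j \lambda(s_j)\, P(s_j \mid e_i)\Big)\Big(\sum_k \lambda(d_k)\, P(d_k \mid e_i)\Big)

with each link-matrix entry weighted by the corresponding λ value.)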

14
Problem Break
  • Given the following virtual evidence, write down
    an expression for the ? evidence for state e1 of
    E (which has two children S and D)
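(The virtual evidence itself is missing from the transcript; judging from the solution on the next slide it appears to be λ(s1) = 1, λ(s2) = 0.2, λ(d3) = 0.5, λ(d4) = 1, with all other states set to zero.)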

15
Solution
  • Putting in the virtual evidence gives
  • λ(e1) = (P(s1|e1) + 0.2 P(s2|e1)) (0.5 P(d3|e1)
    + P(d4|e1))
  • (I can't be bothered to multiply this out)

16
No evidence
  • Sometimes, we may not have data for a node.
    Propagation can still be carried out, but for all
    states the evidence for the node is the same, i.e.
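(The vector itself is missing from the transcript; presumably it is λ(E) = (1, 1, ..., 1), with the same value, conventionally 1, for every state.)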

17
No evidence and the conditioning equation
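(The content of this slide is missing from the transcript; presumably it observes that when λ(sj) = 1 for every state sj, the child's term in the conditioning equation becomes Σj P(sj|ei) = 1, so a child with no evidence simply drops out of the product.)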
18
Upward Propagation
  • In the last lecture we discussed a very simple
    network - namely the Bayesian Classifier
  • In Bayesian classifiers the top of the tree is
    usually a hypothesis and the evidence is all
    propagated upwards.
  • Tree-structured networks can be used in other ways

19
Consider again the cat example
Previously we found the evidence of there being a
cat in the picture. Suppose now we want to ask
whether there is a pair of eyes.
20
Case 1
  • Suppose we know there is a cat in the picture.
    Node C is in state c1 (cat = true) and thus its
    other children (just F in this case) cannot
    affect its value.

21
Looking at the link matrix
  • The link matrix reduces to a vector, since the
    state of C is known. This is, in effect, a prior
    probability of E
  • In vector notation P(C) = (1, 0)
  • In effect P(E) = P(E|C) P(C) is one column of the
    link matrix

22
Simplified Network
  • Since we know the state of C, P(E|C) effectively
    gives us a prior probability of E, P(E)

23
Case 2
  • More interestingly we might not know for certain
    that there was a cat in the picture.
  • Clearly the geometric evidence from below still
    stands, but instead of having a prior probability
    of a pair of eyes we need to determine the
    evidence from the cat node that there is a pair
    of eyes.
  • This is the most general case for inference in
    trees

24
A π message from C to E
  • Suppose for a given picture we calculate
  • the λ evidence for C from F as
  • λ(C) = (λ(c1), λ(c2)) = (0.3, 0.2),
  • and the prior probability of C
  • P(C) = (0.6, 0.4)
  • the evidence for C (excluding that from E) is
  • πE(C) = (0.18, 0.08)
  • then the π evidence at E is calculated using

25
Normalisation of evidence in a tree
  • Although not necessary, the evidence can always
    be normalised to a posterior probability
    distribution, indicated P'(C). We could also
    calculate π(E) as follows
  • P'(cj) = α P(cj) λF(cj)
  • where α is a normalising constant making
    Σj P'(cj) = 1
  • The π message sent to E is simply
  • π(ei) = Σj P(ei|cj) P'(cj)

26
Magnitude and evidence
  • If we don't normalise the evidence at node C then
    we will send a different π message to E than if
    we do normalise.
  • In the example it is larger if we normalise.
  • However, the magnitude of the evidence is not
    relevant; it is the relative magnitudes of the
    evidence for the states of a node that carry
    the information

27
General form of the π evidence
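(The equation for this slide is missing from the transcript; following the example on the previous slides it is presumably the summation form

    \pi(e_i) = \sum_j P(e_i \mid c_j)\, \pi_E(c_j)

where πE(C) is the evidence for C excluding that from E.)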
28
General form of the π evidence
  • However, π evidence can also be computed with a
    matrix multiplication
  • π(E) = P(E|C) πE(C)
  • where
  • π(E) is a vector expressing the π evidence for E
  • πE(C) is a vector expressing all the evidence for
    C except that from E. (In lecture 2 we used
    P-E(C))
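A minimal numpy sketch of this matrix form (not from the original slides; the link matrix below is a hypothetical 3-state-by-2-state example, while πE(C) reuses the numbers from slide 24):

```python
import numpy as np

# Hypothetical link matrix P(E|C): rows are states of E, columns are states of C.
P_E_given_C = np.array([[0.80, 0.10],   # P(e1|c1), P(e1|c2)
                        [0.15, 0.30],   # P(e2|c1), P(e2|c2)
                        [0.05, 0.60]])  # P(e3|c1), P(e3|c2)

# All the evidence for C except that from E (the values used on slide 24).
pi_E_of_C = np.array([0.18, 0.08])

# pi(E) = P(E|C) pi_E(C): the pi evidence sent from C to E.
pi_E = P_E_given_C @ pi_E_of_C
print(pi_E)
```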

29
Generality of Propagation in trees
  • Probability propagation is completely flexible.
  • We can instantiate any subset of the nodes and
    calculate the probability distribution over the
    states of the other nodes.

30
Priors and Likelihood in Networks
  • Note that now we can associate the notion of
    prior and likelihood with the evidence being
    propagated
  • λ(Bi) is the likelihood evidence for Bi
  • π(Bi) is the prior evidence for Bi

31
The network structure is prior knowledge
  • This notion of prior and likelihood is slightly
    different from our previous usage and reflects
    the fact that the network represents our prior
    knowledge for inference.
  • The π message to a root node is the same as the
    prior probability of that root node.

32
Incorporating more nodes
  • One of the best features of Bayesian Networks is
    that we can incorporate new nodes as the data
    becomes available.
  • Recall that we had information from the computer
    vision process as to how likely the extracted
    circles were.
  • This could simply be treated as another node

33
Adding a node doesn't change a network