Bayesian networks - PowerPoint PPT Presentation

Slides: 21
Provided by: alext8

1
Bayesian networks
2
Motivation
  • We saw that the full joint probability can be
    used to answer any question about the domain, but
    can become intractable as the number of variables
    grows.
  • Furthermore, specifying probabilities of atomic
    events is rather unnatural and can be very
    difficult unless a large amount of data is
    available from which to gather statistics.
  • Human performance, by contrast, exhibits a
    different complexity ordering: probabilistic
    judgments on conditional statements involving a
    small number of propositions are issued swiftly
    and reliably, while judging the likelihood of a
    conjunction of many propositions is done with a
    great degree of difficulty and hesitancy.
  • This suggests that the elementary building blocks
    which make up human knowledge aren't the entries
    of a joint probability table but, rather, the
    low-order conditional probabilities defined over
    small clusters of propositions.

3
Bayesian networks
  • A simple, graphical notation for conditional
    independence assertions, and hence for compact
    specification of full joint distributions.
  • Syntax:
  • a set of nodes, one per variable
  • a directed, acyclic graph (a link means
    "directly influences")
  • a conditional distribution for each node given
    its parents:
  • $P(X_i \mid \mathrm{Parents}(X_i))$
  • The conditional distribution is represented as a
    conditional probability table (CPT) giving the
    distribution over $X_i$ for each combination of
    parent values.

4
Example
  • The topology of the network encodes conditional
    independence assertions:
  • Weather is independent of the other variables.
  • Toothache and Catch are conditionally independent
    given Cavity, which is indicated by the absence
    of a link between them.

5
Another Example
  • I'm at work, neighbor John calls to say my alarm
    is ringing, but neighbor Mary doesn't call.
    Sometimes it's set off by minor earthquakes. Is
    there a burglar?
  • Variables: Burglary, Earthquake, Alarm,
    JohnCalls, MaryCalls
  • Network topology reflects "causal" knowledge:
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call

6
Example contd
The topology shows that burglaries and earthquakes
directly affect the probability of the alarm, but
whether Mary or John calls depends only on the
alarm. Thus our assumptions are that they don't
perceive any burglaries directly, and they don't
confer before calling.
7
Compactness of Conditional Probability Tables
(CPTs)
  • A CPT for Boolean $X_i$ with $k$ Boolean parents has
    $2^k$ rows for the combinations of parent values.
  • Each row requires one number $p$ for $X_i = \mathrm{true}$ (the
    number for $X_i = \mathrm{false}$ is just $1 - p$).
  • If each variable has no more than $k$ parents, the
    complete network requires $O(n \cdot 2^k)$ numbers.
  • I.e., it grows linearly with $n$, vs. $O(2^n)$ for the
    full joint distribution.
  • For the burglary net, $1 + 1 + 4 + 2 + 2 = 10$ numbers
    (vs. $2^5 - 1 = 31$).
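
The count for the burglary net can be checked with a short Python sketch. The parent sets are the ones described on the earlier slides; the function names are illustrative only.

```python
# Parents of each Boolean variable in the burglary network (from the slides).
parents = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "JohnCalls": ["Alarm"],
    "MaryCalls": ["Alarm"],
}

def cpt_sizes(parents):
    """Independent numbers per node: 2**k rows for k Boolean parents."""
    return {node: 2 ** len(ps) for node, ps in parents.items()}

def network_size(parents):
    """Total independent numbers needed to specify the whole network."""
    return sum(cpt_sizes(parents).values())

def full_joint_size(n):
    """Independent entries in a full joint table over n Boolean variables."""
    return 2 ** n - 1

print(cpt_sizes(parents))
print(network_size(parents), "vs.", full_joint_size(len(parents)))
```

The two totals printed on the last line are the 10 and 31 quoted in the bullet above.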

8
Semantics
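  • The full joint distribution is defined as the
    product of the local conditional distributions:
  • $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(x_i))$
  • e.g.,
  • $P(j \land m \land a \land \lnot b \land \lnot e)$
  • $= P(j \mid a)\, P(m \mid a)\, P(a \mid \lnot b, \lnot e)\, P(\lnot b)\, P(\lnot e)$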
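
As a rough illustration, the product can be evaluated in Python. The transcript does not include the CPT figure, so the numbers below are an assumption: they are the values commonly used for this textbook example, and they reproduce the results quoted on the later "Numerically" slide.

```python
# ASSUMPTION: CPT values are the usual textbook numbers for the burglary
# example; they do not appear in this transcript.
P_B = 0.001                      # P(burglary)
P_E = 0.002                      # P(earthquake)
P_A = {                          # P(alarm | Burglary, Earthquake)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J = {True: 0.90, False: 0.05}  # P(johncalls | Alarm)
P_M = {True: 0.70, False: 0.01}  # P(marycalls | Alarm)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Full joint entry as the product of the local conditional probabilities."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# P(j, m, a, not b, not e)
print(round(joint(b=False, e=False, a=True, j=True, m=True), 6))
```

Under these assumed values the entry comes out at about 0.000628.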

9
Constructing Bayesian networks
  • 1. Choose an ordering of variables $X_1, \ldots, X_n$
  • 2. For $i = 1$ to $n$:
  • add $X_i$ to the network
  • select parents from $X_1, \ldots, X_{i-1}$ such that
  • $P(X_i \mid \mathrm{Parents}(X_i)) = P(X_i \mid X_1, \ldots, X_{i-1})$
  • This choice of parents guarantees
  • $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1})$
    (chain rule)
  • $= \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))$ (by construction)

10
Example
  • The ordering of variables is very important.
  • E.g., suppose we choose the ordering M, J, A, B, E.
  • Adding MaryCalls: no parents.
  • Adding JohnCalls: is $P(J \mid M) = P(J)$?
  • That is, is John calling independent of Mary
    calling?
  • Clearly not, since, on any given day, if Mary
    called, then the probability that John called is
    much higher than the background probability that
    he called.
  • So, we add a link from MaryCalls to JohnCalls.

11
Example
  • Suppose we choose the ordering M, J, A, B, E.
  • Adding the A (Alarm) node: is
  • $P(A \mid J, M) = P(A \mid J)$?
  • $P(A \mid J, M) = P(A)$?
  • No to both.
  • Clearly, if both call, it's more likely that the
    alarm has gone off than if just one or neither
    calls, so we need both MaryCalls and JohnCalls as
    parents.

12
Example
  • Suppose we choose the ordering M, J, A, B, E.
  • Adding the B (Burglary) node: is
  • $P(B \mid A, J, M) = P(B \mid A)$?
  • $P(B \mid A, J, M) = P(B)$?
  • Yes for the first. No for the second.
  • If we know the alarm state, then the call from
    John or Mary might give us information about the
    phone ringing or Mary's music, but not about
    burglary.
  • So, we need just Alarm as parent.

13
Example
  • Suppose we choose the ordering M, J, A, B, E.
  • Adding the E (Earthquake) node: is
  • $P(E \mid B, A, J, M) = P(E \mid A)$?
  • $P(E \mid B, A, J, M) = P(E \mid A, B)$?
  • No for the first. Yes for the second.
  • If the alarm is on, it is more likely that there
    has been an earthquake.
  • But if we know there has been a burglary, then
    that explains the alarm, and the probability of
    an earthquake would be only slightly above
    normal.
  • Hence we need both Alarm and Burglary as parents.
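
The "explaining away" argument can be checked numerically. This is a minimal sketch that sums the causal model over Burglary, Earthquake and Alarm only; the CPT values are an assumption (the usual textbook numbers for this example, which are not in the transcript), and `prob_e_given` is an illustrative helper name.

```python
# ASSUMPTION: textbook CPT values for the burglary example (not in the slides).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true

def p_bea(b, e, a):
    """Joint probability P(B=b, E=e, A=a) from the causal factorization."""
    return bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)

def prob_e_given(a, b=None):
    """P(E=true | A=a), or P(E=true | A=a, B=b) when b is supplied."""
    b_values = (True, False) if b is None else (b,)
    num = sum(p_bea(bv, True, a) for bv in b_values)
    den = sum(p_bea(bv, ev, a) for bv in b_values for ev in (True, False))
    return num / den

print("P(e)        =", P_E)
print("P(e | a)    =", round(prob_e_given(True), 4))
print("P(e | a, b) =", round(prob_e_given(True, b=True), 5))
```

With these assumed numbers, the alarm alone raises the earthquake probability from 0.002 to roughly 0.23, while also knowing about a burglary brings it back to roughly 0.00202, only slightly above the prior. Since the two conditional values differ, Earthquake cannot have Alarm as its only parent in this ordering.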

14
Example contd
  • So, the network is less compact if we go
    non-causal: $1 + 2 + 4 + 2 + 4 = 13$ numbers are
    needed, instead of 10 if we go in the causal direction.
  • Deciding conditional independence is hard in
    non-causal directions.
  • Causal models and conditional independence seem
    hardwired for humans!

15
So
  • Bayesian networks provide a natural
    representation for (causally induced) conditional
    independence
  • Topology + CPTs = compact representation of the
    joint distribution
  • Generally easy for domain experts to construct

16
Noisy-OR
  • Even if the maximum number of parents $k$ is small,
    filling in the CPT for a node is tedious.
  • Uncertain relationships can often be
    characterized by the so-called noisy-OR relation,
    which is a generalization of the logical OR.
  • In propositional logic, we might say that Fever
    is true if Cold, Flu, or Malaria is true.
  • The noisy-OR allows for uncertainty about the
    ability of each parent cause to make the child
    true.
  • The causal relationship may be inhibited, and so
    a patient could have a cold, but not exhibit a fever.
  • So, suppose we managed to find out the
    inhibition probabilities:
  • $P(\lnot \mathit{fever} \mid \mathit{cold}, \lnot \mathit{flu}, \lnot \mathit{malaria}) = 0.6$
  • $P(\lnot \mathit{fever} \mid \lnot \mathit{cold}, \mathit{flu}, \lnot \mathit{malaria}) = 0.2$
  • $P(\lnot \mathit{fever} \mid \lnot \mathit{cold}, \lnot \mathit{flu}, \mathit{malaria}) = 0.1$
  • Now, we can easily construct the truth table.
17
Observe that by using the noisy-OR we needed to
specify only 3 entries instead of 8. In general,
for $k$ parents, we need to specify $k$ entries
instead of $2^k$.
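
A minimal Python sketch of how the full table follows from the three inhibition probabilities. It assumes the standard noisy-OR rule (inhibitions are independent, and the effect is absent when no cause is present, i.e. no "leak" term); `noisy_or_cpt` is an illustrative name.

```python
from itertools import product

# Inhibition probabilities from the slide:
# P(not fever | only this cause is present).
q = {"cold": 0.6, "flu": 0.2, "malaria": 0.1}

def noisy_or_cpt(q):
    """Build P(effect = true | causes) for every combination of causes.

    ASSUMPTION: standard noisy-OR without a leak term, so
    P(not effect | causes) is the product of q[c] over the causes present.
    """
    causes = list(q)
    table = {}
    for values in product([False, True], repeat=len(causes)):
        p_no_effect = 1.0
        for cause, present in zip(causes, values):
            if present:
                p_no_effect *= q[cause]
        table[values] = 1.0 - p_no_effect
    return table

cpt = noisy_or_cpt(q)
print("(cold, flu, malaria) -> P(fever)")
for values, p_fever in cpt.items():
    print(values, round(p_fever, 3))
```

For example, with cold and flu present but no malaria, P(¬fever) = 0.6 × 0.2 = 0.12, so P(fever) = 0.88.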
18
Exact Inference in Bayesian Networks
  • The basic task for any probabilistic inference
    system is to compute the posterior probability
    for a set of query variables, given some observed
    event, that is, some assignment of values to a
    set of evidence variables.
  • Notation:
  • $X$ denotes the query variable.
  • $E$ denotes the set of evidence variables $E_1, \ldots, E_m$,
    and $e$ is a particular event, i.e. an assignment
    to the variables in $E$.
  • $Y$ denotes the set of the remaining variables
    (hidden variables).
  • A typical query asks for the posterior
    probability $P(X = x \mid e)$, i.e. $P(x \mid e_1, \ldots, e_m)$.
  • E.g., we could ask: what's the probability of a
    burglary if both Mary and John call,
    $P(\mathit{burglary} \mid \mathit{johncalls}, \mathit{marycalls})$?

19
Inference by enumeration
  • Slightly intelligent way to sum out variables
    from the joint without actually constructing its
    explicit representation

20
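Numerically
Complete it as an exercise
  • $P(b \mid j, m) = \alpha\, P(b) \sum_e P(e) \sum_a P(a \mid b, e)\, P(j \mid a)\, P(m \mid a) = \alpha \cdot 0.00059$
  • $P(\lnot b \mid j, m) = \alpha\, P(\lnot b) \sum_e P(e) \sum_a P(a \mid \lnot b, e)\, P(j \mid a)\, P(m \mid a) = \alpha \cdot 0.0015$
  • $P(B \mid j, m) = \alpha \langle 0.00059, 0.0015 \rangle \approx \langle 0.284, 0.716 \rangle$, where $\alpha$ is the normalization constant.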
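
The nested sums can be evaluated with a few loops. The sketch below assumes the usual textbook CPT values for the burglary network (they are not included in this transcript); with them, the unnormalized terms and the normalized posterior match the figures quoted in the bullets. The function name `posterior_burglary` is illustrative.

```python
# ASSUMPTION: textbook CPT values for the burglary example (not in the slides).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def bern(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1.0 - p_true

def posterior_burglary(j, m):
    """Inference by enumeration for P(B | J=j, M=m).

    Sums out the hidden variables E and A, keeping P(b) outside the sums
    and P(e) outside the inner sum, as in the expression on the slide.
    """
    unnormalized = {}
    for b in (True, False):
        sum_e = 0.0
        for e in (True, False):
            sum_a = 0.0
            for a in (True, False):
                sum_a += (bern(P_A[(b, e)], a)
                          * bern(P_J[a], j) * bern(P_M[a], m))
            sum_e += bern(P_E, e) * sum_a
        unnormalized[b] = bern(P_B, b) * sum_e
    alpha = 1.0 / sum(unnormalized.values())  # normalization constant
    return unnormalized, {b: alpha * p for b, p in unnormalized.items()}

unnormalized, posterior = posterior_burglary(j=True, m=True)
print({b: round(p, 5) for b, p in unnormalized.items()})
print({b: round(p, 3) for b, p in posterior.items()})
```

Pulling each factor out of the sums it does not depend on is what makes this "slightly intelligent": the full joint table is never built.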