1
Bayesian Networks
2
Introduction
  • A problem domain is modeled by a list of
    variables X1, …, Xn
  • Knowledge about the problem domain is represented
    by a joint probability P(X1, …, Xn)

3
Introduction
  • Example: Alarm
  • The story: in LA, burglaries and earthquakes are not
    uncommon. Either one can set off the alarm. In case of
    an alarm, two neighbors, John and Mary, may call
  • Problem: estimate the probability of a burglary
    based on who has or has not called
  • Variables: Burglary (B), Earthquake (E), Alarm
    (A), JohnCalls (J), MaryCalls (M)
  • Knowledge required to solve the problem:
    P(B, E, A, J, M)

4
(No Transcript)
5
Introduction
  • What is the probability of a burglary given that
    Mary called, P(B = y | M = y)?
  • Compute the marginal probability
    P(B, M) = Σ_{E, A, J} P(B, E, A, J, M)
  • Use the definition of conditional probability
  • Answer: see the reconstruction below
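The answer formula on this slide did not survive the transcript; a reconstruction of the standard derivation, written in LaTeX notation, is:

    P(B{=}y \mid M{=}y)
      = \frac{P(B{=}y,\, M{=}y)}{P(M{=}y)}
      = \frac{\sum_{E,A,J} P(B{=}y, E, A, J, M{=}y)}
             {\sum_{B,E,A,J} P(B, E, A, J, M{=}y)}

The numerical value depends on the conditional probability tables, which are not shown in the transcript.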

6
Introduction
  • Difficulty: complexity in model construction and
    inference
  • In the Alarm example:
  • 31 numbers are needed
  • Computing P(B = y | M = y) takes 29 additions
  • In general:
  • P(X1, …, Xn) needs at least 2^n − 1 numbers to
    specify the joint probability
  • Exponential storage and inference

7
Conditional Independence
  • Overcome the problem of exponential size by
    exploiting conditional independence
  • The chain rule of probabilities:
    P(X1, …, Xn) = P(X1) P(X2 | X1) ··· P(Xn | X1, …, Xn−1)

8
Conditional Independence
  • Conditional independence in the problem domain:
    the domain usually allows us to identify a
    subset pa(Xi) ⊆ {X1, …, Xi−1} such that, given
    pa(Xi), Xi is independent of all other variables in
    {X1, …, Xi−1} \ pa(Xi), i.e.
    P(Xi | X1, …, Xi−1) = P(Xi | pa(Xi))
  • Then P(X1, …, Xn) = Πi P(Xi | pa(Xi))

9
Conditional Independence
  • As a result, the joint probability P(X1, …, Xn)
    can be represented by the conditional
    probabilities P(Xi | pa(Xi))
  • Example continued:
    P(B, E, A, J, M)
    = P(B) P(E | B) P(A | B, E) P(J | A, B, E) P(M | B, E, A, J)
    = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
  • pa(B) = ∅, pa(E) = ∅, pa(A) = {B, E}, pa(J) = {A},
    pa(M) = {A}
  • The conditional probability tables specify P(B),
    P(E), P(A | B, E), P(M | A), P(J | A)
    (sketched in code below)
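A minimal sketch of this factorization in Python. The CPT numbers below are made-up placeholders (the actual tables are not in the transcript); only the structure P(B) P(E) P(A | B, E) P(J | A) P(M | A) comes from the slides.

    # Alarm network CPTs (placeholder numbers; True = "y", False = "n")
    p_b = 0.01                                          # P(B = y)
    p_e = 0.02                                          # P(E = y)
    p_a = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}  # P(A = y | B, E)
    p_j = {True: 0.90, False: 0.05}                     # P(J = y | A)
    p_m = {True: 0.70, False: 0.01}                     # P(M = y | A)

    def pr(p_true, value):
        # probability that a binary variable takes `value`, given P(value = True)
        return p_true if value else 1.0 - p_true

    def joint(b, e, a, j, m):
        # P(B, E, A, J, M) = P(B) P(E) P(A | B, E) P(J | A) P(M | A)
        return (pr(p_b, b) * pr(p_e, e) * pr(p_a[(b, e)], a)
                * pr(p_j[a], j) * pr(p_m[a], m))

Note that only 1 + 1 + 4 + 2 + 2 = 10 numbers are stored here instead of the 2^5 − 1 = 31 needed for an unstructured joint, which is the point of the next slide.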

10
Conditional Independence
  • As a result
  • Model size reduced
  • Model construction easier
  • Inference easier

11
Graphical Representation
  • To graphically represent the conditional
    independence relationships, construct a directed
    graph by drawing an arc from Xj to Xi iff
    Xj ∈ pa(Xi)
  • pa(B) = ∅, pa(E) = ∅, pa(A) = {B, E}, pa(J) = {A},
    pa(M) = {A}

12
Graphical Representation
  • We also attach the conditional probability table
    P(Xi | pa(Xi)) to node Xi
  • The result is a Bayesian network

[Figure: the Alarm network with the tables P(B), P(E), P(A | B, E), P(J | A), P(M | A) attached to the corresponding nodes]
13
Formal Definition
  • A Bayesian network is
  • a directed acyclic graph (DAG), where
  • each node represents a random variable
  • and is associated with the conditional
    probability of the node given its parents

14
Intuition
  • A BN can be understood as a DAG whose arcs
    represent direct probabilistic dependence
  • Absence of an arc indicates probabilistic
    independence: a variable is conditionally
    independent of all its nondescendants given its
    parents
  • From the graph: B ⊥ E, J ⊥ B | A, J ⊥ E | A

15
Construction
  • Procedure for constructing a BN:
  • Choose a set of variables describing the
    application domain
  • Choose an ordering of the variables
  • Start with an empty network and add variables to the
    network one by one according to the ordering

16
Construction
  • To add the i-th variable Xi:
  • Determine pa(Xi) from the variables already in the
    network (X1, …, Xi−1) such that
    P(Xi | X1, …, Xi−1) = P(Xi | pa(Xi))
    (domain knowledge is needed here)
  • Draw an arc from each variable in pa(Xi) to Xi

17
Example
  • Order B, E, A, J, M:
  • pa(B) = pa(E) = ∅, pa(A) = {B, E}, pa(J) = {A}, pa(M) = {A}
  • Order M, J, A, B, E:
  • pa(M) = ∅, pa(J) = {M}, pa(A) = {M, J}, pa(B) = {A},
    pa(E) = {A, B}
  • Order M, J, E, B, A:
  • Fully connected graph

18
Construction
  • Which variable order?
  • Naturalness of probability assessment: M, J, E, B, A
    is bad because P(B | J, M, E) is not natural
  • Minimize the number of arcs: M, J, E, B, A is bad (too
    many arcs); the first ordering is good
  • Use causal relationships: causes come before their
    effects. M, J, E, B, A is bad because M and J are
    effects of A but come before A

[Figure: comparison of the networks obtained from the different orderings]
19
Causal Bayesian Networks
  • A causal Bayesian network, or simply a causal
    network, is a Bayesian network whose arcs are
    interpreted as indicating cause-effect
    relationships
  • To build a causal network:
  • Choose a set of variables that describes the
    domain
  • Draw an arc to each variable from each of its direct
    causes (domain knowledge required)

20
Example
[Figure: causal network with nodes Visit Africa, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Lung Cancer, X-Ray, Dyspnea]
21
Causal BN
  • Causality is not a well-understood concept.
  • There is no widely accepted definition.
  • There is no consensus on whether it is a property of the
    world or a concept in our minds
  • Sometimes causal relations are obvious:
  • An alarm causes people to leave the building.
  • Lung cancer causes a mass on a chest X-ray.
  • At other times, they are not that clear:
  • Doctors believe smoking causes lung cancer, but
    the tobacco industry has a different story

[Figure: Surgeon General (1964): S → C, versus the Tobacco Industry's alternative structure relating S (smoking) and C (cancer)]
22
Inference
  • Posterior queries to a BN:
  • We have observed the values of some variables
  • What are the posterior probability distributions
    of the other variables?
  • Example: both John and Mary reported the alarm
  • What is the probability of a burglary, P(B | J = y, M = y)?

23
Inference
  • General form of a query: P(Q | E = e) = ?
  • Q is a list of query variables
  • E is a list of evidence variables
  • e denotes the observed values

24
Inference Types
  • Diagnostic inference: P(B | M = y)
  • Predictive/causal inference: P(M | B = y)
  • Intercausal inference (between causes of a common
    effect): P(B | A = y, E = y)
  • Mixed inference (combining two or more of the above):
    P(A | J = y, E = y) (diagnostic and causal)
  • All of these types are handled in the same way

25
Naïve Inference
  • Naïve algorithm for solving P(Q | E = e) in a BN
    (sketched in code below):
  • Get the probability distribution P(X) over all
    variables X by multiplying the conditional
    probabilities, then sum out the non-query,
    non-evidence variables
  • The BN structure is not used; for many variables the
    algorithm is not practical
  • In general, exact inference is NP-hard
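A minimal sketch of this naïve enumeration in Python. It assumes a `joint` function over binary variables (for instance the Alarm `joint` sketched after slide 9); everything else is generic.

    from itertools import product

    def naive_query(joint, var_names, query, evidence):
        # P(query | evidence) by summing the full joint over all assignments;
        # `query` and `evidence` are dicts of {variable name: value}.
        # Cost is exponential in len(var_names), which is the point of the slide.
        num = den = 0.0
        for values in product([False, True], repeat=len(var_names)):
            a = dict(zip(var_names, values))
            if any(a[v] != val for v, val in evidence.items()):
                continue
            p = joint(**a)
            den += p                               # accumulates P(E = e)
            if all(a[v] == val for v, val in query.items()):
                num += p                           # accumulates P(Q = q, E = e)
        return num / den

    # Example with the Alarm sketch: P(B = y | J = y, M = y)
    # naive_query(joint, ["b", "e", "a", "j", "m"], {"b": True}, {"j": True, "m": True})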

26
Basic Example
  • Conditional probabilities: P(A), P(B | A), P(C | B), P(D | C)
  • Query: P(D) = ?
  • P(D) = Σ_{A, B, C} P(A, B, C, D)
         = Σ_{A, B, C} P(A) P(B | A) P(C | B) P(D | C)      (1)
         = Σ_C P(D | C) Σ_B P(C | B) Σ_A P(A) P(B | A)      (2)
  • Complexity (see the sketch below):
  • Using (1), the sum runs over all 2^3 joint configurations of A, B, C
  • Using (2), each of the three sums runs over only 2 values
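A small Python sketch of this example on the chain A → B → C → D. The CPT numbers are made-up placeholders; only the structure and the two evaluation orders come from the slide.

    # Chain network A -> B -> C -> D with placeholder CPTs
    p_a = 0.3                              # P(A = True)
    p_b = {True: 0.8, False: 0.1}          # P(B = True | A)
    p_c = {True: 0.7, False: 0.2}          # P(C = True | B)
    p_d = {True: 0.9, False: 0.4}          # P(D = True | C)

    def pr(p_true, v):
        return p_true if v else 1.0 - p_true

    def p_d_bruteforce(d):
        # (1): sum the full joint over the 2^3 configurations of A, B, C
        return sum(pr(p_a, a) * pr(p_b[a], b) * pr(p_c[b], c) * pr(p_d[c], d)
                   for a in (False, True)
                   for b in (False, True)
                   for c in (False, True))

    def p_d_eliminate(d):
        # (2): push the sums inward, eliminating A, then B, then C
        f_b = {b: sum(pr(p_a, a) * pr(p_b[a], b) for a in (False, True))
               for b in (False, True)}                 # f_b[b] = P(B = b)
        f_c = {c: sum(f_b[b] * pr(p_c[b], c) for b in (False, True))
               for c in (False, True)}                 # f_c[c] = P(C = c)
        return sum(f_c[c] * pr(p_d[c], d) for c in (False, True))

    # Both orders give the same P(D = True), up to floating-point error:
    # p_d_bruteforce(True), p_d_eliminate(True)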

27
Inference
  • Although exact inference is NP-hard in general, in
    some cases the problem is tractable; e.g., if the BN
    has a (poly)tree structure, an efficient algorithm
    exists (a polytree is a directed acyclic graph
    in which no two nodes have more than one path
    between them)
  • Another practical approach: stochastic simulation

28
A general sampling algorithm
  • For i = 1 to n:
  • Find the parents of Xi: (Xp(i,1), …, Xp(i,n))
  • Recall the values that those parents were
    randomly given
  • Look up the table for P(Xi | Xp(i,1) = xp(i,1),
    …, Xp(i,n) = xp(i,n))
  • Randomly set xi according to this probability
    (sketched in code below)
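A minimal sketch of this forward-sampling loop for the Alarm network, written out variable by variable so that parents are always sampled before their children. The CPT numbers are the same placeholders used after slide 9, not values from the original slides.

    import random

    p_b = 0.01
    p_e = 0.02
    p_a = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
    p_j = {True: 0.90, False: 0.05}
    p_m = {True: 0.70, False: 0.01}

    def sample_once():
        # Visit variables in an order where parents come first, and sample each
        # one from its CPT row selected by the already-sampled parent values.
        b = random.random() < p_b           # B has no parents
        e = random.random() < p_e           # E has no parents
        a = random.random() < p_a[(b, e)]   # look up P(A | B = b, E = e)
        j = random.random() < p_j[a]        # look up P(J | A = a)
        m = random.random() < p_m[a]        # look up P(M | A = a)
        return {"B": b, "E": e, "A": a, "J": j, "M": m}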

29
Stochastic Simulation
  • We want to know P(Q = q | E = e)
  • Do a lot of random samplings and count:
  • Nc = number of samples in which E = e
  • Ns = number of samples in which Q = q and E = e
  • N = total number of random samples
  • If N is big enough:
  • Nc / N is a good estimate of P(E = e)
  • Ns / N is a good estimate of P(Q = q, E = e)
  • Ns / Nc is then a good estimate of P(Q = q | E = e)
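A sketch of this counting estimate, reusing the `sample_once` function from the sketch after slide 28 (so the same placeholder CPTs apply).

    def estimate(query_var, query_val, evidence, n=100_000):
        # Estimate P(query_var = query_val | evidence) as Ns / Nc, where
        # Nc counts samples matching the evidence and Ns counts samples
        # matching both the evidence and the query.
        nc = ns = 0
        for _ in range(n):
            s = sample_once()
            if all(s[v] == val for v, val in evidence.items()):
                nc += 1
                if s[query_var] == query_val:
                    ns += 1
        return ns / nc if nc else float("nan")

    # Example: P(B = y | J = y, M = y)
    # estimate("B", True, {"J": True, "M": True})

Samples that do not match the evidence are simply discarded, so the estimate degrades when the evidence is rare; that is the usual caveat for this kind of rejection counting.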

30
Parameter Learning
[Figure: a BN structure over X1, …, X5]
  • Example:
  • Given: a BN structure
  • and a dataset:

    X1 X2 X3 X4 X5
     0  0  1  1  0
     1  0  0  1  0
     0  ?  0  0  ?

    ("?" means a missing value)
  • Estimate the conditional probabilities P(Xi | pa(Xi))
31
Parameter Learning
  • We consider the case of complete data
  • Use the maximum likelihood (ML) algorithm and
    Bayesian estimation
  • Modes of learning:
  • Sequential learning
  • Batch learning
  • Bayesian estimation is suitable for both
    sequential and batch learning
  • ML is suitable only for batch learning

32
ML in BN with Complete Data
  • n variables X1, …, Xn
  • Number of states of Xi: ri = |ΩXi|
  • Number of configurations of the parents of Xi:
    qi = |Ωpa(Xi)|
  • Parameters to be estimated:
    θijk = P(Xi = j | pa(Xi) = k),
    for i = 1, …, n; j = 1, …, ri; k = 1, …, qi

33
ML in BN with Complete Data
  • Example: consider a BN. Assume all variables are
    binary, taking values 1, 2.
  • θijk = P(Xi = j | pa(Xi) = k)
  • k ranges over the qi parent configurations
34
ML in BN with Complete Data
  • A complete case Dl is a vector of values, one
    for each variable (all values are known). Example:
    Dl = (X1 = 1, X2 = 2, X3 = 2)
  • Given: a set of complete cases D = {D1, …, Dm}
  • Find: the ML estimate of the parameters θ

35
ML in BN with Complete Data
  • Log-likelihood:
    l(θ | D) = log L(θ | D) = log P(D | θ)
             = log Πl P(Dl | θ) = Σl log P(Dl | θ)
  • The term log P(Dl | θ):
  • D4 = (1, 2, 2)
  • log P(D4 | θ) = log P(X1 = 1, X2 = 2, X3 = 2 | θ)
    = log [P(X1 = 1 | θ) P(X2 = 2 | θ) P(X3 = 2 | X1 = 1, X2 = 2, θ)]
    = log θ111 + log θ221 + log θ322
  • Recall θ = (θ111, θ121, θ211, θ221, θ311, θ312, θ313,
    θ314, θ321, θ322, θ323, θ324)

36
ML in BN with Complete Data
  • Define the characteristic function of Dl:
    χ(i, j, k : Dl) = 1 if Xi = j and pa(Xi) = k in Dl,
    and 0 otherwise
  • When l = 4, D4 = (1, 2, 2):
    χ(1, 1, 1 : D4) = χ(2, 2, 1 : D4) = χ(3, 2, 2 : D4) = 1,
    χ(i, j, k : D4) = 0 for all other i, j, k
  • So log P(D4 | θ) = Σijk χ(i, j, k : D4) log θijk
  • In general, log P(Dl | θ) = Σijk χ(i, j, k : Dl) log θijk

37
ML in BN with Complete Data
  • Define mijk = Σl χ(i, j, k : Dl),
    the number of data cases where Xi = j and pa(Xi) = k
  • Then
    l(θ | D) = Σl log P(Dl | θ)
             = Σl Σi,j,k χ(i, j, k : Dl) log θijk
             = Σi,j,k [Σl χ(i, j, k : Dl)] log θijk
             = Σi,j,k mijk log θijk = Σi,k Σj mijk log θijk

38
ML in BN with Complete Data
  • We want to find
    argmax_θ l(θ | D) = argmax_θ Σi,k Σj mijk log θijk
  • Assume that θijk = P(Xi = j | pa(Xi) = k) is not
    related to θi'j'k' provided that i ≠ i' or k ≠ k'
  • Consequently we can maximize each term of the
    summation Σi,k separately:
    argmax_{θijk} Σj mijk log θijk

39
ML in BN with Complete Data
  • As a result we have the ML estimate
    θijk = mijk / Σj' mij'k
  • In words, the ML estimate for θijk = P(Xi = j | pa(Xi) = k) is
    (number of cases where Xi = j and pa(Xi) = k) /
    (number of cases where pa(Xi) = k)
    (a counting sketch in code follows)
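A minimal Python sketch of this counting estimate for a single node. The function and the tiny example dataset are illustrative, not from the slides; it simply divides the count of (Xi = j, pa(Xi) = k) by the count of pa(Xi) = k, as stated above.

    from collections import Counter, defaultdict

    def ml_cpt(data, child, parents):
        # ML estimate of P(child | parents) from complete cases;
        # `data` is a list of dicts mapping variable names to values.
        counts = defaultdict(Counter)          # counts[parent_config][child_value] = m_ijk
        for case in data:
            k = tuple(case[p] for p in parents)
            counts[k][case[child]] += 1
        cpt = {}
        for k, counter in counts.items():
            total = sum(counter.values())      # sum over j' of m_ij'k
            for j, m in counter.items():
                cpt[(j, k)] = m / total        # theta_ijk = m_ijk / sum_j' m_ij'k
        return cpt

    # Example with three complete cases over X1, X2, X3 (illustrative values):
    # data = [{"X1": 1, "X2": 2, "X3": 2}, {"X1": 1, "X2": 1, "X3": 2}, {"X1": 2, "X2": 2, "X3": 1}]
    # ml_cpt(data, "X3", ["X1", "X2"])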

40
More to do with BN
  • Learning parameters with some values missing
  • Learning the structure of BN from training data
  • Many more

41
References
  • Pearl, Judea, Probabilistic Reasoning in
    Intelligent Systems: Networks of Plausible
    Inference, Morgan Kaufmann, San Mateo, CA, 1988.
  • Heckerman, David, "A Tutorial on Learning with
    Bayesian Networks," Technical Report
    MSR-TR-95-06, Microsoft Research, 1995.
  • www.ai.mit.edu/~murphyk/Software
  • http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
  • R. G. Cowell, A. P. Dawid, S. L. Lauritzen and D.
    J. Spiegelhalter, Probabilistic Networks and
    Expert Systems, Springer-Verlag, 1999.
  • http://www.ets.org/research/conferences/almond2004.html#software