Transcript and Presenter's Notes

Title: Belief Networks


1
Belief Networks
  • Qian Liu
  • CSE 391, Spring 2005
  • University of Pennsylvania

2
Outline
  • Motivation
  • BN = DAG + CPTs
  • Inference
  • Learning
  • Application of BNs

3
Outline
  • Motivation
  • BN = DAG + CPTs
  • Inference
  • Learning
  • Application of BNs

4
From application
  • What kind of application problems can be solved
    by Belief Networks?
  • Classifier: classify email, web pages
  • Medical diagnosis
  • Troubleshooting system in MS Windows
  • Bouncy paperclip guy in MS Word
  • Speech recognition
  • Gene finding
  • a.k.a. Bayesian Network, Causal Network, Directed
    Graphical Model

5
From representation
  • Problem: How to describe the joint distribution of N
    random variables, P(X1, X2, ..., XN)?
  • If we are to write the joint distribution in a
    table, how many entries are there in the table
    altogether?

6
From representation
  • Problem: How to describe the joint distribution of N
    random variables, P(X1, X2, ..., XN)?
  • If we are to write the joint distribution in a
    table, how many entries are there in the table
    altogether?
  • # entries = 2^N (for binary variables)

7
From representation
  • Problem: How to describe the joint distribution of N
    random variables, P(X1, X2, ..., XN)?
  • If we are to write the joint distribution in a
    table, how many entries are there in the table
    altogether?
  • # entries = 2^N (for binary variables)
  • Too many!

8
From representation
  • Problem: How to describe the joint distribution of N
    random variables, P(X1, X2, ..., XN)?
  • If we are to write the joint distribution in a
    table, how many entries are there in the table
    altogether?
  • # entries = 2^N (for binary variables)
  • Too many! Is there a more clever way of
    representing the joint distribution?

9
From representation
  • Problem: How to describe the joint distribution of N
    random variables, P(X1, X2, ..., XN)?
  • If we are to write the joint distribution in a
    table, how many entries are there in the table
    altogether?
  • # entries = 2^N (for binary variables)
  • Too many! Is there a more clever way of
    representing the joint distribution?
  • YES! ---- Belief Network

10
Outline
  • Motivation
  • BN = DAG + CPTs
  • Inference
  • Learning
  • Application of BNs

11
What is a BN? BN = DAG + CPTs
  • A Belief Network consists of a Directed Acyclic
    Graph and Conditional Probability Tables.
  • DAG
  • Nodes: random variables
  • Directed edges represent causal relations
  • CPTs
  • Each random variable has a CPT
  • The CPT of a variable specifies P(variable | its parents)
  • A BN specifies a joint distribution on the
    variables

12
Alarm example
  • Random variables: Burglary, Earthquake, Alarm,
    JohnCalls, MaryCalls
  • Causal relationships among the variables
  • A burglary can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call
  • Causal relations reflect domain knowledge.

13
Alarm example (cont.)
(Figure: the alarm DAG with its CPTs ---- P(B), P(E), P(A|B,E), P(J|A), P(M|A).)
14
Joint distribution
  • A BN specifies a joint distribution on the
    variables
  • Alarm example: P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)
  • Shorthand notation: P(X1, ..., XN) = Π_i P(Xi | Parents(Xi))
    (illustrated in the code sketch below)
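To make the factorization concrete, here is a minimal Python sketch of the alarm network, with the DAG implicit in which variables each CPT conditions on. The CPT numbers are illustrative assumptions (the commonly used textbook values), not values given in these slides.

    # Illustrative CPTs for the alarm network (assumed numbers).
    p_B = 0.001                                   # P(Burglary = true)
    p_E = 0.002                                   # P(Earthquake = true)
    p_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E)
    p_J = {True: 0.90, False: 0.05}               # P(JohnCalls = true | Alarm)
    p_M = {True: 0.70, False: 0.01}               # P(MaryCalls = true | Alarm)

    def pr(p_true, value):
        """Probability of a binary value, given P(value = true)."""
        return p_true if value else 1.0 - p_true

    def joint(b, e, a, j, m):
        """P(B,E,A,J,M) = P(B) P(E) P(A|B,E) P(J|A) P(M|A)."""
        return (pr(p_B, b) * pr(p_E, e) * pr(p_A[(b, e)], a)
                * pr(p_J[a], j) * pr(p_M[a], m))

    # One entry of the joint: no burglary, no earthquake, alarm on,
    # John calls, Mary does not call.
    print(joint(False, False, True, True, False))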

15
Another example
  • CPTs
  • Joint probability
  • Convention for writing the joint probability
  • write each variable in a cause → effects order
  • Belief Networks are generative models, which can
    generate data in a cause → effects order.

16
Compactness
  • Belief Networks offer a simple and compact way of
    representing the joint distribution of many
    random variables.
  • The number of entries in a full joint
    distribution table is 2^N (for N binary variables).
  • But for a BN, suppose each variable has at most k
    parents; then the total number of entries in all
    the CPTs is at most N · 2^k.
  • In real practice, k << N, so we save a lot of space!
    For example, with N = 20 and k = 3, that is 2^20 ≈ 10^6
    entries versus at most 20 · 2^3 = 160.

17
Outline
  • Motivation
  • BN = DAG + CPTs
  • Inference
  • Learning
  • Application of BNs

18
Inference
  • The task of inference is to compute the posterior
    probability of a set of query variables given a
    set of evidence variables, denoted P(Query | Evidence),
    given that the Belief Network is known.
  • Since the joint distribution is known,
    P(Query | Evidence) can be computed naively and
    inefficiently by the product rule and marginalization,
    as written out below.
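Spelled out in generic notation (Q: the query variables, E: the evidence variables, H: the remaining hidden variables):

    P(Q \mid E) \;=\; \frac{P(Q, E)}{P(E)}
                \;=\; \frac{\sum_{H} P(Q, E, H)}{\sum_{Q, H} P(Q, E, H)}
    % product rule for the first equality; each term is then obtained by
    % marginalizing the joint distribution over the variables that are not fixed.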
19
Conditional independence
  • A,B are (unconditionally) independent
  • P(A,B) = P(A)P(B) (1) I.
  • P(A|B) = P(A) (2) I.
  • P(B|A) = P(B) (3) I.
  • (1),(2),(3) are equivalent
  • A,B are conditionally independent, given evidence
    E
  • P(A,B|E) = P(A|E)P(B|E) (1) C.I.
  • P(A|B,E) = P(A|E) (2) C.I.
  • P(B|A,E) = P(B|E) (3) C.I.
  • (1),(2),(3) are equivalent

20
Conditional independence
  • A,B are (unconditionally) independent
  • A,B are conditionally independent, given evidence
    E
  • P(A,B) = P(A)P(B) (1) I.
  • P(A,B|E) = P(A|E)P(B|E) (1) C.I.
  • P(A|B) = P(A) (2) I.
  • P(A|B,E) = P(A|E) (2) C.I.
  • P(B|A) = P(B) (3) I.
  • P(B|A,E) = P(B|E) (3) C.I.
  • (1),(2),(3) are equivalent
  • (1),(2),(3) are equivalent

21
Conditional independence
  • Chain rule
  • P(B,E,A,J,M) =
  • P(B)
  • P(E|B)
  • P(A|B,E)
  • P(J|B,E,A)
  • P(M|B,E,A,J)
  • BN
  • P(B,E,A,J,M) =
  • P(B)
  • P(E) ---- B is I. of E.
  • P(A|B,E)
  • P(J|A) ---- J is C.I. of B,E, given A
  • P(M|A) ---- M is C.I. of B,E,J, given A
  • Belief Networks exploit conditional independence
    among variables so as to represent the joint
    distribution compactly (see the one-line form below).
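In equation form:

    P(B,E,A,J,M) \;=\; P(B)\, P(E \mid B)\, P(A \mid B,E)\, P(J \mid B,E,A)\, P(M \mid B,E,A,J) \quad\text{(chain rule)}
                 \;=\; P(B)\, P(E)\, P(A \mid B,E)\, P(J \mid A)\, P(M \mid A) \quad\text{(BN)}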

22
Conditional independence
  • Belief Networks encode conditional independence
    in the graph structure.
  • (1) A variable is C.I. of its non-descendants,
    given all its parents.
  • (2) A variable is C.I. of all the other
    variables, given its parents, children, and
    children's parents ---- that is, given its
    Markov blanket (see the sketch below).
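To make the Markov blanket concrete, here is a small Python sketch (not from the slides) that computes it from the parent lists defining a DAG; the example graph is the alarm network.

    def markov_blanket(node, parents):
        """parents: dict mapping each node to the list of its parents.
        The Markov blanket is the node's parents, its children, and its
        children's other parents."""
        children = [c for c, ps in parents.items() if node in ps]
        blanket = set(parents[node]) | set(children)
        for c in children:
            blanket |= set(parents[c])
        blanket.discard(node)
        return blanket

    # Alarm network: B and E are parents of A; A is the parent of J and M.
    alarm_dag = {"B": [], "E": [], "A": ["B", "E"], "J": ["A"], "M": ["A"]}
    print(markov_blanket("A", alarm_dag))   # {'B', 'E', 'J', 'M'}
    print(markov_blanket("B", alarm_dag))   # {'A', 'E'}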

23
Conditional independence
  • Alarm example
  • B is I. of E,
  • or, B is C.I. of E, given nothing. (1)
  • J is C.I. of B,E,M, given A. (1)
  • M is C.I. of B,E,J, given A. (1)
  • Another example (see figure)
  • U is C.I. of X, given Y,V,Z (2)

24
Examples of inference
  • Alarm example ---- We know
  • CPTs: P(B), P(E), P(A|B,E), P(J|A), P(M|A)
  • Conditional independence
  • P(A) = ?

P(A) = Σ_B Σ_E P(A,B,E)             (marginalization)
     = Σ_B Σ_E P(A|B,E) P(B,E)      (chain rule)
     = Σ_B Σ_E P(A|B,E) P(B) P(E)   (B, E are Ind.)
25
Examples of inference
  • Alarm example ---- We know
  • CPTs: P(B), P(E), P(A|B,E), P(J|A), P(M|A)
  • Conditional independence
  • P(J,M) = ?

P(J,M) = Σ_A P(J,M,A)               (marginalization)
       = Σ_A P(J|A) P(M|A) P(A)     (chain rule; J is C.I. of M, given A)

(A quick numeric check of both results follows below.)
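This check by direct summation uses the same illustrative CPT numbers assumed in the earlier sketch, not values given in the slides:

    # Illustrative CPT values (assumed).
    p_B, p_E = 0.001, 0.002
    p_A = {(True, True): 0.95, (True, False): 0.94,
           (False, True): 0.29, (False, False): 0.001}
    p_J = {True: 0.90, False: 0.05}
    p_M = {True: 0.70, False: 0.01}

    def pr(p_true, value):
        return p_true if value else 1.0 - p_true

    # P(A=true) = sum over b, e of P(A=true | b, e) P(b) P(e)
    p_a = sum(p_A[(b, e)] * pr(p_B, b) * pr(p_E, e)
              for b in (True, False) for e in (True, False))

    # P(J=true, M=true) = sum over a of P(J=true | a) P(M=true | a) P(a)
    p_jm = sum(p_J[a] * p_M[a] * pr(p_a, a) for a in (True, False))

    print(p_a, p_jm)   # about 0.0025 and 0.0021 with these numbers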
26
Outline
  • Motivation
  • BN = DAG + CPTs
  • Inference
  • Learning
  • Applications of BNs

27
Learning
  • The task of learning is to learn the Belief
    Network that best describes the data we
    observe.
  • Let's assume the DAG is known; then the learning
    problem is simplified to learning the best CPTs
    from data, according to some goodness
    criterion.
  • Note: There are many kinds of learning (learning
    different things, using different goodness criteria).
    We are only going to discuss the easiest kind of
    learning.

28
Training data
  • All binary variables
  • Observe T examples
  • Assume the examples are independently and identically
    distributed (i.i.d.) from some distribution
  • Task: how to learn the best CPTs from the T
    examples?

29
Think of learning as
  • Someone used a DAG and a set of CPTs to
    generate the training data. You are given the DAG
    and the training data, and you are asked to guess
    which CPTs this person most likely used to
    generate the training data.

30
Given CPTs
  • Probability of the t-th example:
    P(x^(t)) = Π_i P(x_i^(t) | parents of x_i^(t))
  • --- e.g. for the alarm example
  • If the t-th example takes the values (b, e, a, j, m),
  • Then P(x^(t)) = P(b) P(e) P(a|b,e) P(j|a) P(m|a)

31
Given CPTs
  • Probability of the t-th example P(x^(t))
  • Probability of all the data (i.i.d.):
    P(data) = Π_{t=1..T} P(x^(t))

32
Given CPTs
  • Probability of the t-th example
  • Probability of all the data (i.i.d.)
  • Log-likelihood of data:
    L(CPTs) = log P(data) = Σ_{t=1..T} log P(x^(t))

33
Given CPTs
  • Probability of the t-th example
  • Probability of all the data (i.i.d.)
  • Log-likelihood of data
  • Log-likelihood of data is a function of CPTs.
  • Which CPTs are the best?

34
Maximum-likelihood learning
  • Log-likelihood of data is a function of the CPTs
  • So, the goodness criterion for the CPTs is the
    log-likelihood of the data.
  • The best CPTs are the CPTs which maximize the
    log-likelihood.

35
Maximum-likelihood learning
  • Mathematical formulation: maximize the log-likelihood
    L(CPTs),
  • subject to constraints: probabilities sum up to
    1
  • Constrained optimization with equality
    constraints
  • Lagrange multipliers (which you've probably seen
    in your Calculus class)
  • You can solve it yourself. It's not hard at all.
  • Very common technique in machine learning
    (a sketch of the derivation follows below)
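A minimal sketch of that derivation, in generic notation: write theta_{x|u} for the CPT entry P(Xi = x | parents = u) and N(x, u) for the number of training examples in which Xi = x and its parents take the value u.

    % Log-likelihood decomposes into one term per CPT entry:
    \ell \;=\; \sum_{t=1}^{T}\sum_{i=1}^{N} \log P\!\left(x_i^{(t)} \mid \mathrm{pa}_i^{(t)}\right)
         \;=\; \sum_{i}\sum_{u}\sum_{x} N(x,u)\,\log\theta_{x\mid u}

    % For each parent configuration u, maximize subject to \sum_x \theta_{x|u} = 1:
    \mathcal{L} \;=\; \sum_{x} N(x,u)\,\log\theta_{x\mid u} \;+\; \lambda\Bigl(1-\sum_{x}\theta_{x\mid u}\Bigr)

    % Setting the derivative with respect to \theta_{x|u} to zero gives
    % \theta_{x|u} = N(x,u)/\lambda, and the sum-to-one constraint fixes
    % \lambda = \sum_x N(x,u) = N(u), so
    \theta_{x\mid u} \;=\; \frac{N(x,u)}{N(u)}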

36
ML solution
  • Nicely, we can have a closed-form solution to the
    constrained optimization problem.

37
ML solution
  • Nicely, we can have a closed-form solution to the
    constrained optimization problem.

38
ML solution
  • Nicely, we can have a closed-form solution to the
    constrained optimization problem:
  • P(Xi = x | parents of Xi = u)
    = #(examples with Xi = x and parents = u) / #(examples with parents = u)

And, the solution is very intuitive!
39
ML learning example
  • Three binary variables X, Y, Z
  • T = 1000 examples in the training data

(Figure: a table of counts over X, Y, Z for the 1000 examples; the question is what the ML CPTs are.)
40
ML learning example
  • Three binary variables X, Y, Z
  • T = 1000 examples in the training data

(Figure: the same count table, with the resulting ML CPT estimates as count ratios; a counting sketch follows below.)
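A small Python sketch of this counting recipe, using a made-up dataset and an assumed DAG (X → Y → Z), neither of which is taken from the slides:

    from collections import Counter

    # Hypothetical training examples and DAG; both are illustrative only.
    data = [
        {"X": 1, "Y": 1, "Z": 0},
        {"X": 1, "Y": 0, "Z": 0},
        {"X": 0, "Y": 1, "Z": 1},
        {"X": 0, "Y": 0, "Z": 0},
        {"X": 1, "Y": 1, "Z": 1},
        {"X": 0, "Y": 1, "Z": 1},
    ]
    dag = {"X": [], "Y": ["X"], "Z": ["Y"]}   # node -> list of its parents

    def ml_cpt(node, parents, data):
        """ML estimate: P(node=v | parents=u) = N(node=v, parents=u) / N(parents=u)."""
        joint = Counter()    # counts of (parent values, node value)
        marg = Counter()     # counts of parent values
        for ex in data:
            u = tuple(ex[p] for p in parents)
            joint[(u, ex[node])] += 1
            marg[u] += 1
        return {key: n / marg[key[0]] for key, n in joint.items()}

    for node, parents in dag.items():
        print(node, "given", parents, ml_cpt(node, parents, data))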
41
Outline
  • Motivation
  • BN = DAG + CPTs
  • Inference
  • Learning
  • Application of BNs

42
Naïve Bayes Classifier
  • Represent an object with attributes X1, X2, ..., Xn
  • Class label C
  • Joint probability: P(C, X1, ..., Xn) = P(C) Π_i P(Xi | C)
  • Learn CPTs from training data:
  • P(C), P(Xi | C)
  • Classify a new object: pick the class with the highest
    posterior P(C | x1, ..., xn)

43
Naïve Bayes Classifier
  • Inference
  • P(C | x1, ..., xn)
    = P(C) Π_i P(xi | C) / Σ_c P(c) Π_i P(xi | c)
    (by Bayes rule, marginalization, and the C.I.
    assumptions above; a small sketch follows below)
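A minimal Python sketch of this classification rule. The attribute names and CPT numbers below are made-up assumptions, purely for illustration:

    # Hypothetical naive Bayes CPTs for classifying email.
    P_C = {"spam": 0.4, "ham": 0.6}                       # P(C)
    P_X_given_C = {                                       # P(Xi = 1 | C)
        "spam": {"has_link": 0.8, "all_caps": 0.5},
        "ham":  {"has_link": 0.2, "all_caps": 0.05},
    }

    def posterior(x):
        """P(C | x1..xn), proportional to P(C) * prod_i P(xi | C) by Bayes rule."""
        scores = {}
        for c, prior in P_C.items():
            score = prior
            for attr, value in x.items():
                p1 = P_X_given_C[c][attr]
                score *= p1 if value else 1.0 - p1
            scores[c] = score
        z = sum(scores.values())          # = P(x1..xn), by marginalization
        return {c: s / z for c, s in scores.items()}

    print(posterior({"has_link": True, "all_caps": False}))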
44
Medical Diagnosis
  • The QMR-DT model (Shwe et al. 1991)
  • Learning
  • Prior probability of each disease
  • Conditional probability of each finding given its
    parents
  • Inference
  • Given the findings of some patient, which is/are
    the most probable disease(s) causing these
    findings?

45
Hidden Markov Model
  • Sequence / time-series model
  • Speech recognition
  • Observations: utterance/waveform
  • States: words
  • Gene finding
  • Observations: genomic sequence
  • States: gene/no-gene, different components of a gene

(Figure: an HMM drawn as a Belief Network ---- Q: states, Y: observations; its factorization is written out below.)
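For reference, the HMM written as a BN factorization (standard form, with Q_t the hidden state and Y_t the observation at time t):

    P(Q_1,\ldots,Q_T,\, Y_1,\ldots,Y_T)
      \;=\; P(Q_1)\,\prod_{t=2}^{T} P(Q_t \mid Q_{t-1})\;\prod_{t=1}^{T} P(Y_t \mid Q_t)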
46
Applying BN to real-world problem
  • Involves the following steps:
  • Domain experts (or computer scientists, if the
    problem is not very hard) specify causal
    relations among the random variables; then we can
    draw the DAG
  • Collect training data from the real world
  • Learn the maximum-likelihood CPTs from the
    training data
  • Infer the queries we are interested in

47
Summary
  • BN = DAG + CPTs
  • Compact representation of the joint probability
  • Inference
  • Conditional independence
  • Probability rules
  • Learning
  • Maximum-likelihood solution