Learning Bayesian Belief Networks
Transcript and Presenter's Notes

Title: Learning Bayesian Belief Networks


1
Learning Bayesian Belief Networks
  • Department of Computer Science
  • North Carolina State University
  • Tatdow Pansombut
  • May 1, 2006

2
Outline
  • Notation
  • Introduction
  • Inference Models: Classical vs. Bayesian
  • Definition of Bayesian Belief Networks
  • Representation of Bayes Nets
  • Building A Model
  • Types of Learning Bayes Nets
  • Issues in Learning Bayes Nets
  • Batch Learning
  • Limitations
  • Conclusions
  • Questions and Comments

3
Notation
  • BBN (BBNs): Bayesian Belief Network (Bayesian
    Belief Networks)
  • H: Hypothesis Space, h ∈ H
  • G = (V, E) is a directed acyclic graph (DAG); V(G)
    is the vertex set, and E(G) is the edge set.
  • Parent set of u: π(u) = { v ∈ V(G) : there exists
    an edge from v to u in G }

4
Introduction
  • Problem: Learning under uncertainty
  • Goal: Build a system that achieves the best
    expected outcome
  • Model: Probabilistic model
  • Tool: Bayes nets

5
Classical vs. Bayesian Inference Model
  • Bayesian Inference Model: allows the use of prior
    knowledge.
  • Let P(h | ξ) be the degree of belief in h given
    the current state of information ξ.
  • New evidence is presented.
  • Update using Bayes's Theorem:
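The equation on this slide did not survive extraction; the update it names is the standard conditioning step of Bayes's Theorem. Given new evidence E:

    P(h | E, ξ) = P(E | h, ξ) · P(h | ξ) / P(E | ξ)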

6
Classical Probability Calculus (1)
  • A random variable u ∈ U
  • Domain of u: d(u)
  • Sample space: S_U
  • Probability of variable u: degree of belief in u,
    denoted by P(u)
  • Conditional Independence: a and b are
    conditionally independent given c if
    P(a | c) = P(a | b, c).
  • A joint probability distribution over a set U of
    random variables

7
Classical Probability Calculus (2)
  • A world consists of:
  • Domain
  • Sample Space
  • A Joint probability distribution over U

8
Definition of Bayesian Belief Networks (1)
  • Let U = {u1, …, un}.
  • A Bayesian Belief Network B over U: B = (Bs, Θ)
  • Bs has global structure G = (V, E).
  • Θ is a set of parameters that encodes the local
    conditional probabilities.

9
Definition of Bayesian Belief Networks (2)
  • A global structure G = (V, E) is a DAG such
    that:
  • V(G) = U
  • Each ui ∈ U is labeled by an event or random
    variable.
  • Each ui ∈ U is annotated with the conditional
    probability P(ui | π(ui)).

10
Definition of Bayesian Belief Networks (3)
  • Let Φi be the set of all unique instantiations of
    ui's parents.
  • Example (u3 has two binary parents, u1 and u2):
  • Φ3 = { (u1 = 0, u2 = 0),
  •         (u1 = 0, u2 = 1),
  •         (u1 = 1, u2 = 0),
  •         (u1 = 1, u2 = 1) }
  • Φ1 = Φ2 = ∅ (u1 and u2 have no parents)

11
Definition of Bayesian Belief Networks (4)
  • The set of parameters Θ:
  • Θ = { θ(i,j,k) = P(ui = k | φi = j, ξ, G) :
    ui ∈ U, j ∈ Φi, k ∈ d(ui) }

12
Representation of Bayes Nets
  • Let U = {u1, …, un}.
  • A Bayes net B over U represents the joint
    probability distribution
    P(u1, …, un) = ∏_{i=1..n} P(ui | π(ui))
    (illustrated below).
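As a concrete illustration (mine, not the deck's), here is a minimal Python sketch of this factorization for a two-node network u1 → u2, with made-up numbers:

    # Hypothetical CPTs for a two-node net u1 -> u2 (binary values).
    p_u1 = {0: 0.3, 1: 0.7}                  # P(u1)
    p_u2_given_u1 = {0: {0: 0.9, 1: 0.1},    # P(u2 | u1 = 0)
                     1: {0: 0.2, 1: 0.8}}    # P(u2 | u1 = 1)

    def joint(v1, v2):
        # BBN factorization: P(u1, u2) = P(u1) * P(u2 | u1).
        return p_u1[v1] * p_u2_given_u1[v1][v2]

    print(joint(1, 1))  # 0.7 * 0.8 = 0.56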

13
Building A Model (1)
  • Assume that the true relationships among
    variables is governed by MT.
  • MT (VT, PT)
  • Given a training set D
  • Goal Build a model ML to approximate MT in the
    form of a BBN B with global structure G.

14
Building A Model (2)
  • Input: P, the joint probability distribution over
    VT induced by the training set D
  • Output: a DAG G
  • Procedure (sketched in code below):
  • Assign an order d to all variables in VT.
  • For each u ∈ VT, identify a set of predecessors
    π(u) that renders u independent of all other
    predecessors.
  • Add a directed edge from each element of π(u) to
    u.
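A schematic Python rendering of this procedure, under two assumptions of mine: a conditional-independence oracle indep(u, v, given) for P is available, and dropping one predecessor at a time is an acceptable stand-in for finding a truly minimal π(u) (which in general requires testing subsets):

    def build_dag(order, indep):
        # order: the variables of VT sorted by the ordering d.
        # indep(u, v, given): True if u is independent of v given `given`.
        parents = {}
        for i, u in enumerate(order):
            preds = order[:i]
            # Keep the predecessors that u still depends on,
            # given all the remaining predecessors.
            parents[u] = [v for v in preds
                          if not indep(u, v, [w for w in preds if w != v])]
        # Interpretation: a directed edge v -> u for each v in parents[u].
        return parents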

15
Types of Learning Bayes Nets
  • Learner's input:
  • Batch Learning
  • input: a training set
  • output: a BBN
  • Adaptive Learning
  • input: a BBN and new example(s)
  • output: a new BBN
  • Property of a network:
  • Qualitative property: structure of the network
  • Quantitative property: conditional probabilities

16
Issues In Learning Bayes Nets
  • Hidden variables
  • Feature Selection
  • Causal relationships, established via:
  • 1. Expert knowledge
  • 2. Temporal precedence

17
Batch Learning (1)
  • Input: a set of discrete variables U and a
    database (training set) D
  • Output: a BBN

18
Batch Learning (2)
  • Important assumption: MT is a BBN with unknown
    structure and parameters.
  • Let h_Bs be the hypothesis that D is generated by
    a network structure Bs.
  • Learning a BBN: search for the hypothesis
    corresponding to the best network structure in
    the hypothesis space H.

19
Batch Learning (3)
  • Model Selection: given the search space of
    models, output the best model. The search is
    specified by:
  • Neighborhood
  • Scoring criterion
  • Search Procedure/Strategy

20
Bayesian Scoring Metric (1)
  • Goal: output the most accurate network structure
    given a database.
  • In theory: find the h_Bs with the highest
    posterior P(h_Bs | D).
  • In practice: find the h_Bs with the highest joint
    P(h_Bs, D), since P(D) is the same for every
    structure.

21
Bayesian Scoring Metric (2)
  • Notation:
  • qi is the number of instantiations in Φi.
  • Nijk is the number of cases in D where ui = k and
    φi = j.
  • ri is the number of elements in d(ui).
  • Γ(·) is the Gamma function.

22
Bayesian Scoring Metric (3)
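The slide body (an equation image) did not survive extraction. Under the assumptions listed on the next slide, a standard form of the Bayesian scoring metric, due to Cooper and Herskovits, is:

    P(Bs, D) = P(Bs) · ∏_{i=1..n} ∏_{j=1..qi} [ Γ(ri) / Γ(Nij + ri) · ∏_{k=1..ri} Γ(Nijk + 1) ]

where Nij = Σ_k Nijk. With integer counts this reduces to the familiar factorial form, (ri − 1)! / (Nij + ri − 1)! · ∏_k Nijk!.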
23
Bayesian Scoring Metric (4)
  • Algorithm: K2
  • Input: a set of variables U, an order d, an upper
    bound on the number of parents per node, and a
    database D.
  • Output: for each ui ∈ U, a parent set π(ui)
  • Estimation function: the Bayesian scoring metric
    (a code sketch follows).
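A minimal, self-contained Python sketch of K2 in this spirit; the data layout (a list of dicts mapping variable name to value) and all helper names are my assumptions, not the deck's:

    import math
    from itertools import product

    def log_g(data, domains, child, parents):
        # Log of the K2 local score g(i, pi(i)) for one node.
        r = len(domains[child])
        score = 0.0
        # One term per instantiation of the parent set.
        for config in product(*(domains[p] for p in parents)):
            rows = [row for row in data
                    if all(row[p] == v for p, v in zip(parents, config))]
            n_ij = len(rows)
            # log[(r-1)! / (N_ij + r - 1)!] via the Gamma function.
            score += math.lgamma(r) - math.lgamma(n_ij + r)
            for k in domains[child]:
                n_ijk = sum(1 for row in rows if row[child] == k)
                score += math.lgamma(n_ijk + 1)  # log(N_ijk!)
        return score

    def k2(data, domains, order, max_parents):
        # Greedily grow each node's parent set from its predecessors in d.
        parents = {u: [] for u in order}
        for i, u in enumerate(order):
            best = log_g(data, domains, u, parents[u])
            improved = True
            while improved and len(parents[u]) < max_parents:
                improved = False
                for p in order[:i]:
                    if p in parents[u]:
                        continue
                    s = log_g(data, domains, u, parents[u] + [p])
                    if s > best:
                        best, best_p, improved = s, p, True
                if improved:
                    parents[u].append(best_p)
        return parents

    # e.g. k2(rows, {'a': [0, 1], 'b': [0, 1]}, order=['a', 'b'], max_parents=1)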

24
Bayesian Scoring Metric (5)
  • Assumptions for K2
  • Prior knowledge is uninformative.
  • The database is a Monte Carlo sample from a
    belief network containing only the variables in
    U.
  • All cases in the database are independent.
  • Cases are complete (no missing values).
  • If p and q are components of different
    conditional probability distributions, then the
    value assigned to p is independent of the value
    assigned to q.
  • We are initially indifferent about the values
    assigned to the conditional probabilities.
  • An order d is specified.
  • The prior probabilities of all network structures
    are equal.

25
MDL Metric (1)
  • Goal: output the network structure with minimum
    description length (MDL).
  • The description length is the sum of the encoding
    length of the model (network) and the encoding
    length of the data given the model:
    L(B, D) = L(B) + L(D | B).
  • This trades off accuracy against usefulness.

26
MDL Metric (2)
  • Problems with the most accurate but complex
    model:
  • 1. It is hard for humans to understand.
  • 2. It can at best represent the distribution of
    the training set (it overfits).
  • 3. It is computationally difficult to learn and
    use.

27
MDL Metric (3)
  • Let ki be the number of ui's parents.
  • Let d be the number of bits needed to store a
    numerical value.
  • Encoding length of the network:
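The formula itself was lost in extraction; a standard form of this encoding length (as in Lam and Bacchus), using ki and d as defined above together with qi and ri from slide 21, is:

    L(B) = Σ_{i=1..n} [ ki · log2(n) + d · qi · (ri − 1) ]

that is, ki·log2(n) bits to name each node's parents, plus d bits for each of the qi·(ri − 1) free parameters in ui's conditional probability table.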

28
MDL Metric (4)
  • Encoding length of the data using the network:
    measured via the Kullback-Leibler cross entropy.
  • P and Q are distributions over the same space
    S_U.
  • S_U is exponential in the number of variables.
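The cross-entropy expression referenced here is the standard Kullback-Leibler divergence:

    D_KL(P ‖ Q) = Σ_{x ∈ S_U} P(x) · log( P(x) / Q(x) )

which is why evaluating it directly is expensive: the sum ranges over the exponentially large S_U.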

29
Scoring Criterion Property
  • Asymptotically Consistent
  • Locally Consistent
  • Score Equivalence
  • Decomposable

30
Learning Methods focusing on Search Procedures
  • A search procedure works in combination with a
    family of scoring metrics.
  • Greedy Equivalence Search (GES)
  • k-greedy Equivalence Search (KES)

31
Greedy Equivalence Search
  • Procedure:
  • Divide the search space into equivalence classes
    of DAGs.
  • Start with the equivalence class of DAGs with no
    edges.
  • Perform a forward equivalence search.
  • Perform a backward equivalence search.
  • With an asymptotically consistent scoring
    criterion: a parameter-optimal DAG model
  • With a locally consistent scoring criterion: an
    inclusion-optimal DAG model
  • Problem: local maxima.

32
K-greedy Equivalence Search
  • Procedure (see the sketch below):
  • Let G be a DAG with no edges.
  • Let Sh be the set of DAGs, obtained by adding or
    removing one edge of G, whose scores are higher
    than G's.
  • Choose a random subset Sr ⊆ Sh of size k·|Sh|.
  • Let G be the DAG in Sr with the highest score,
    and repeat.
  • With a locally consistent scoring criterion: an
    inclusion-optimal DAG model
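A schematic Python rendering of this loop; the helpers score(G) and edge_neighbors(G) (all one-edge additions and removals) are assumptions of mine, not prescribed by the slides:

    import random

    def kes(g0, k, score, edge_neighbors):
        # k in (0, 1]: fraction of the improving neighbors sampled per step.
        g = g0                                  # start from the empty DAG
        while True:
            s_h = [h for h in edge_neighbors(g) if score(h) > score(g)]
            if not s_h:                         # no improving neighbor: done
                return g
            s_r = random.sample(s_h, max(1, int(k * len(s_h))))
            g = max(s_r, key=score)             # best-scoring DAG in S_r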

33
Limitations (1)
  • Given a hypothesis space H and a probability
    distribution P over H:
  • Learning: update P using Bayes's Theorem.

34
Limitations (2)
  • Implicit condition
  • Explicit condition

35
Limitations (3)
36
Conclusions
  • A BBN is a powerful model for reasoning under
    uncertainty.
  • We need to make assumptions in order to measure
    the goodness of a network.
  • Without some assumptions, finding a network
    structure with the highest score is NP-hard [1].
  • [1] Chickering, D. M., Heckerman, D., and Meek,
    C. (2004). Large-Sample Learning of Bayesian
    Networks is NP-Hard. Journal of Machine Learning
    Research, 5, 1287-1330.

37
Questions and Comments