Learning Bayesian Belief Networks
Transcript and Presenter's Notes

Title: Learning Bayesian Belief Networks


1
Learning Bayesian Belief Networks
  • Department of Computer Science
  • North Carolina State University
  • Tatdow Pansombut
  • May 1, 2006

2
Outline
  • Notation
  • Introduction
  • Inference Models: Classical vs. Bayesian
  • Definition of Bayesian Belief Networks
  • Representation of Bayes Nets
  • Building A Model
  • Types of Learning Bayes Nets
  • Issues in Learning Bayes Nets
  • Batch Learning
  • Limitations
  • Conclusions
  • Questions and Comments

3
Notation
  • BBN (BBNs): Bayesian Belief Network (Bayesian
    Belief Networks)
  • H: Hypothesis Space, h ∈ H
  • G = (V, E) is a directed acyclic graph (DAG); V(G)
    is the vertex set, and E(G) is the edge set.
  • Parent set of u: π(u) = { v ∈ V(G) : there exists
    an edge from v to u in G }

4
Introduction
  • Problem: Learning under uncertainty
  • Goal: Build a system that achieves the best
    expected outcome
  • Model: Probabilistic model
  • Tool: Bayes nets

5
Classical vs. Bayesian Inference Model
  • Bayesian Inference Model: allows the use of prior
    knowledge.
  • Let P(h | ξ) be the degree of belief in h given
    the current state of information ξ.
  • New evidence is presented.
  • Update using Bayes's Theorem:
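The equation on this slide did not survive extraction; the update it names is the standard conditioning step of Bayes's Theorem. Given new evidence E:

    P(h | E, ξ) = P(E | h, ξ) · P(h | ξ) / P(E | ξ)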

6
Classical Probability Calculus (1)
  • A random variable u ∈ U
  • Domain of u: d(u)
  • Sample space: S_U
  • Probability of variable u: degree of belief in u,
    denoted by P(u)
  • Conditional Independence: a and b are
    conditionally independent given c if
    P(a | c) = P(a | b, c).
  • A joint probability distribution over a set U of
    random variables

7
Classical Probability Calculus (2)
  • A world consists of:
  • Domain
  • Sample Space
  • A Joint probability distribution over U

8
Definition of Bayesian Belief Networks (1)
  • Let U = {u1, …, un}.
  • A Bayesian Belief Network B over U: B = (Bs, Θ)
  • Bs has global structure G = (V, E).
  • Θ is a set of parameters that encodes the local
    conditional probabilities.

9
Definition of Bayesian Belief Networks (2)
  • A global structure G = (V, E) is a DAG such
    that:
  • V(G) = U
  • Each ui ∈ U is labeled by an event or random
    variable.
  • Each ui ∈ U is annotated with the conditional
    probability P(ui | π(ui)).

10
Definition of Bayesian Belief Networks (3)
  • Let Φi be the set of all unique instantiations of
    ui's parents.
  • Example (u3 has two binary parents, u1 and u2):
  • Φ3 = { (u1 = 0, u2 = 0),
  •         (u1 = 0, u2 = 1),
  •         (u1 = 1, u2 = 0),
  •         (u1 = 1, u2 = 1) }
  • Φ1 = Φ2 = ∅ (u1 and u2 have no parents)

11
Definition of Bayesian Belief Networks (4)
  • The set of parameters Θ:
  • Θ = { θ(i,j,k) = P(ui = k | φi = j, ξ, G) :
    ui ∈ U, j ∈ Φi, k ∈ d(ui) }

12
Representation of Bayes Nets
  • Let U = {u1, …, un}.
  • A Bayes net B over U represents the joint
    probability distribution
    P(u1, …, un) = ∏_{i=1..n} P(ui | π(ui))
    (illustrated below).
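As a concrete illustration (mine, not the deck's), here is a minimal Python sketch of this factorization for a two-node network u1 → u2, with made-up numbers:

    # Hypothetical CPTs for a two-node net u1 -> u2 (binary values).
    p_u1 = {0: 0.3, 1: 0.7}                  # P(u1)
    p_u2_given_u1 = {0: {0: 0.9, 1: 0.1},    # P(u2 | u1 = 0)
                     1: {0: 0.2, 1: 0.8}}    # P(u2 | u1 = 1)

    def joint(v1, v2):
        # BBN factorization: P(u1, u2) = P(u1) * P(u2 | u1).
        return p_u1[v1] * p_u2_given_u1[v1][v2]

    print(joint(1, 1))  # 0.7 * 0.8 = 0.56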

13
Building A Model (1)
  • Assume that the true relationships among
    variables is governed by MT.
  • MT (VT, PT)
  • Given a training set D
  • Goal Build a model ML to approximate MT in the
    form of a BBN B with global structure G.

14
Building A Model (2)
  • Input: P, the joint probability distribution over
    VT induced by the training set D
  • Output: a DAG G
  • Procedure (sketched in code below):
  • Assign an order d to all variables in VT.
  • For each u ∈ VT, identify a set of predecessors
    π(u) that renders u independent of all other
    predecessors.
  • Add a directed edge from each element of π(u) to
    u.
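A schematic Python rendering of this procedure, under two assumptions of mine: a conditional-independence oracle indep(u, v, given) for P is available, and dropping one predecessor at a time is an acceptable stand-in for finding a truly minimal π(u) (which in general requires testing subsets):

    def build_dag(order, indep):
        # order: the variables of VT sorted by the ordering d.
        # indep(u, v, given): True if u is independent of v given `given`.
        parents = {}
        for i, u in enumerate(order):
            preds = order[:i]
            # Keep the predecessors that u still depends on,
            # given all the remaining predecessors.
            parents[u] = [v for v in preds
                          if not indep(u, v, [w for w in preds if w != v])]
        # Interpretation: a directed edge v -> u for each v in parents[u].
        return parents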

15
Types of Learning Bayes Nets
  • Learner's input:
  • Batch Learning
  • input: a training set
  • output: a BBN
  • Adaptive Learning
  • input: a BBN and new example(s)
  • output: a new BBN
  • Property of a network:
  • Qualitative property: structure of the network
  • Quantitative property: conditional probabilities

16
Issues In Learning Bayes Nets
  • Hidden variables
  • Feature Selection
  • Causal relationships, established via:
  • 1. Expert knowledge
  • 2. Temporal precedence

17
Batch Learning (1)
  • Input: a set of discrete variables U and a
    database (training set) D
  • Output: a BBN

18
Batch Learning (2)
  • Important assumption: MT is a BBN with unknown
    structure and parameters.
  • Let h_Bs be the hypothesis that D is generated by
    a network structure Bs.
  • Learning a BBN: search for the hypothesis
    corresponding to the best network structure in
    the hypothesis space H.

19
Batch Learning (3)
  • Model Selection: given the search space of
    models, output the best model. The search is
    specified by:
  • Neighborhood
  • Scoring criterion
  • Search Procedure/Strategy

20
Bayesian Scoring Metric (1)
  • Goal: output the most accurate network structure
    given a database.
  • In theory: find the h_Bs with the highest
    posterior P(h_Bs | D).
  • In practice: find the h_Bs with the highest joint
    P(h_Bs, D), since P(D) is the same for every
    structure.

21
Bayesian Scoring Metric (2)
  • Notation:
  • qi is the number of instantiations in Φi.
  • Nijk is the number of cases in D where ui = k and
    φi = j.
  • ri is the number of elements in d(ui).
  • Γ(·) is the Gamma function.

22
Bayesian Scoring Metric (3)
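The slide body (an equation image) did not survive extraction. Under the assumptions listed on the next slide, a standard form of the Bayesian scoring metric, due to Cooper and Herskovits, is:

    P(Bs, D) = P(Bs) · ∏_{i=1..n} ∏_{j=1..qi} [ Γ(ri) / Γ(Nij + ri) · ∏_{k=1..ri} Γ(Nijk + 1) ]

where Nij = Σ_k Nijk. With integer counts this reduces to the familiar factorial form, (ri − 1)! / (Nij + ri − 1)! · ∏_k Nijk!.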
23
Bayesian Scoring Metric (4)
  • Algorithm: K2
  • Input: a set of variables U, an order d, an upper
    bound on the number of parents per node, and a
    database D.
  • Output: for each ui ∈ U, a parent set π(ui)
  • Estimation function: the Bayesian scoring metric
    (a code sketch follows).
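A minimal, self-contained Python sketch of K2 in this spirit; the data layout (a list of dicts mapping variable name to value) and all helper names are my assumptions, not the deck's:

    import math
    from itertools import product

    def log_g(data, domains, child, parents):
        # Log of the K2 local score g(i, pi(i)) for one node.
        r = len(domains[child])
        score = 0.0
        # One term per instantiation of the parent set.
        for config in product(*(domains[p] for p in parents)):
            rows = [row for row in data
                    if all(row[p] == v for p, v in zip(parents, config))]
            n_ij = len(rows)
            # log[(r-1)! / (N_ij + r - 1)!] via the Gamma function.
            score += math.lgamma(r) - math.lgamma(n_ij + r)
            for k in domains[child]:
                n_ijk = sum(1 for row in rows if row[child] == k)
                score += math.lgamma(n_ijk + 1)  # log(N_ijk!)
        return score

    def k2(data, domains, order, max_parents):
        # Greedily grow each node's parent set from its predecessors in d.
        parents = {u: [] for u in order}
        for i, u in enumerate(order):
            best = log_g(data, domains, u, parents[u])
            improved = True
            while improved and len(parents[u]) < max_parents:
                improved = False
                for p in order[:i]:
                    if p in parents[u]:
                        continue
                    s = log_g(data, domains, u, parents[u] + [p])
                    if s > best:
                        best, best_p, improved = s, p, True
                if improved:
                    parents[u].append(best_p)
        return parents

    # e.g. k2(rows, {'a': [0, 1], 'b': [0, 1]}, order=['a', 'b'], max_parents=1)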

24
Bayesian Scoring Metric (5)
  • Assumptions for K2
  • Prior knowledge is uninformative.
  • The database is a Monte Carlo sample from a
    belief network containing only the variables in
    U.
  • All cases in the database are independent.
  • Cases are complete (no missing values).
  • If p and q are components of different
    conditional probability distributions, then the
    value assigned to p is independent of the value
    assigned to q.
  • We are initially indifferent about the values
    assigned to the conditional probabilities.
  • An order d is specified.
  • The prior probabilities of all network structures
    are equal.

25
MDL Metric (1)
  • Goal: output the network structure with minimum
    description length (MDL).
  • The description length is the sum of the encoding
    length of the model (network) and the encoding
    length of the data given the model:
    L(B, D) = L(B) + L(D | B).
  • This trades off accuracy against usefulness.

26
MDL Metric (2)
  • Problems with the most accurate but complex
    model:
  • 1. It is hard for humans to understand.
  • 2. It can at best represent the distribution of
    the training set (it overfits).
  • 3. It is computationally difficult to learn and
    use.

27
MDL Metric (3)
  • Let ki be the number of ui's parents.
  • Let d be the number of bits needed to store a
    numerical value.
  • Encoding length of the network:
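The formula itself was lost in extraction; a standard form of this encoding length (as in Lam and Bacchus), using ki and d as defined above together with qi and ri from slide 21, is:

    L(B) = Σ_{i=1..n} [ ki · log2(n) + d · qi · (ri − 1) ]

that is, ki·log2(n) bits to name each node's parents, plus d bits for each of the qi·(ri − 1) free parameters in ui's conditional probability table.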

28
MDL Metric (4)
  • Encoding length of the data using the network:
    measured via the Kullback-Leibler cross entropy.
  • P and Q are distributions over the same space
    S_U.
  • S_U is exponential in the number of variables.
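The cross-entropy expression referenced here is the standard Kullback-Leibler divergence:

    D_KL(P ‖ Q) = Σ_{x ∈ S_U} P(x) · log( P(x) / Q(x) )

which is why evaluating it directly is expensive: the sum ranges over the exponentially large S_U.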

29
Scoring Criterion Property
  • Asymptotically Consistent
  • Locally Consistent
  • Score Equivalence
  • Decomposable

30
Learning Methods focusing on Search Procedures
  • A search procedure works in combination with a
    family of scoring metrics.
  • Greedy Equivalence Search (GES)
  • k-greedy Equivalence Search (KES)

31
Greedy Equivalence Search
  • Procedure:
  • Divide the search space into equivalence classes
    of DAGs.
  • Start with the equivalence class of DAGs with no
    edges.
  • Perform a forward equivalence search.
  • Perform a backward equivalence search.
  • With an asymptotically consistent scoring
    criterion: a parameter-optimal DAG model
  • With a locally consistent scoring criterion: an
    inclusion-optimal DAG model
  • Problem: local maxima.

32
K-greedy Equivalence Search
  • Procedure (see the sketch below):
  • Let G be a DAG with no edges.
  • Let Sh be the set of DAGs, obtained by adding or
    removing one edge of G, whose scores are higher
    than G's.
  • Choose a random subset Sr ⊆ Sh of size k·|Sh|.
  • Let G be the DAG in Sr with the highest score,
    and repeat.
  • With a locally consistent scoring criterion: an
    inclusion-optimal DAG model
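A schematic Python rendering of this loop; the helpers score(G) and edge_neighbors(G) (all one-edge additions and removals) are assumptions of mine, not prescribed by the slides:

    import random

    def kes(g0, k, score, edge_neighbors):
        # k in (0, 1]: fraction of the improving neighbors sampled per step.
        g = g0                                  # start from the empty DAG
        while True:
            s_h = [h for h in edge_neighbors(g) if score(h) > score(g)]
            if not s_h:                         # no improving neighbor: done
                return g
            s_r = random.sample(s_h, max(1, int(k * len(s_h))))
            g = max(s_r, key=score)             # best-scoring DAG in S_r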

33
Limitations (1)
  • Given a hypothesis space H and a probability
    distribution P over H:
  • Learning: update P using Bayes's Theorem.

34
Limitations (2)
  • Implicit condition
  • Explicit condition

35
Limitations (3)
36
Conclusions
  • A BBN is a powerful model for reasoning under
    uncertainty.
  • We need to make assumptions in order to measure
    the goodness of a network.
  • Without some assumptions, finding a network
    structure with the highest score is NP-hard [1].
  • [1] Chickering, D. M., Heckerman, D., and Meek,
    C. (2004). Large-Sample Learning of Bayesian
    Networks is NP-Hard. Journal of Machine Learning
    Research, 5, 1287-1330.

37
Questions and Comments