Title: Learning Bayesian Belief Networks
1. Learning Bayesian Belief Networks
- Department of Computer Science
- North Carolina State University
- Tatdow Pansombut
- May 1, 2006
2. Outline
- Notation
- Introduction
- Inference Models: Classical vs. Bayesian
- Definition of Bayesian Belief Networks
- Representation of Bayes Nets
- Building A Model
- Types of Learning Bayes Nets
- Issues in Learning Bayes Nets
- Batch Learning
- Limitations
- Conclusions
- Questions and Comments
3. Notation
- BBN (BBNs): Bayesian Belief Network (Bayesian Belief Networks)
- H: Hypothesis Space, h ∈ H
- G = (V, E) is a directed acyclic graph (DAG), V(G) is the vertex set, and E(G) is the edge set.
- Parent set of u: π(u) = {v ∈ V(G) : there exists an edge from v to u in G}
4. Introduction
- Problem: Learning under uncertainty
- Goal: Build a system that achieves the best expected outcome
- Model: Probabilistic model
- Tool: Bayes Nets
5. Classical vs. Bayesian Inference Model
- Bayesian Inference Model: allows the use of prior knowledge.
- Let P(h | ξ) be the degree of belief in h given the current state of information ξ.
- New evidence e is presented.
- Update using Bayes's Theorem, as restated below.
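A one-line statement of the update in the slide's notation (e denotes the new evidence):

```latex
P(h \mid e, \xi) = \frac{P(e \mid h, \xi)\, P(h \mid \xi)}{P(e \mid \xi)}
```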
6. Classical Probability Calculus (1)
- A random variable u ∈ U
- Domain of u: d(u)
- Sample space: SU
- Probability of variable u: degree of belief in u, denoted by P(u)
- Conditional Independence: a and b are conditionally independent given c if P(a | c) = P(a | b, c) (checked numerically in the sketch below).
- A joint probability distribution of a set U of random variables
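A minimal sketch (not from the slides) that checks the conditional-independence definition against an explicit joint distribution; all numbers are made up so that a and b are conditionally independent given c:

```python
from itertools import product

# Hypothetical joint distribution P(a, b, c) over three binary variables,
# constructed so that a and b are conditionally independent given c.
p_c = {0: 0.4, 1: 0.6}
p_b_given_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}
joint = {(a, b, c): p_c[c] * p_b_given_c[c][b] * p_a_given_c[c][a]
         for a, b, c in product((0, 1), repeat=3)}

def prob(pred):
    """Probability of the event described by the predicate pred(a, b, c)."""
    return sum(p for event, p in joint.items() if pred(*event))

# P(a=1 | c=1) equals P(a=1 | b=0, c=1), matching the slide's definition.
lhs = prob(lambda a, b, c: a == 1 and c == 1) / prob(lambda a, b, c: c == 1)
rhs = (prob(lambda a, b, c: a == 1 and b == 0 and c == 1)
       / prob(lambda a, b, c: b == 0 and c == 1))
print(lhs, rhs)  # both 0.5
```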
7. Classical Probability Calculus (2)
- A world consists of
- A domain
- A sample space
- A joint probability distribution over U
8. Definition of Bayesian Belief Networks (1)
- Let U = {u1, ..., un}.
- A Bayesian Belief Network B over U: B = (Bs, Θ)
- Bs has global structure G = (V, E).
- Θ is a set of parameters that encodes the local conditional probabilities.
9. Definition of Bayesian Belief Networks (2)
- The global structure G = (V, E) is a DAG such that
- V(G) = U
- Each ui ∈ U is labeled by an event or random variable.
- Each ui ∈ U is labeled by the conditional probability P(ui | π(ui)).
10. Definition of Bayesian Belief Networks (3)
- Let Φi be the set of all unique instantiations of ui's parents.
- Example (binary u1 and u2 are the parents of u3):
- Φ3 = {(u1 = 0, u2 = 0), (u1 = 0, u2 = 1), (u1 = 1, u2 = 0), (u1 = 1, u2 = 1)}
- Φ1 = Φ2 = ∅
11. Definition of Bayesian Belief Networks (4)
- A set of parameters Θ:
- Θ = {θ(i,j,k) : ui ∈ U, j ∈ Φi, k ∈ d(ui)}, where θ(i,j,k) = P(ui = k | Φi = j, ξ, G)
12. Representation of Bayes Nets
- Let U = {u1, ..., un}.
- A Bayes Net B over U represents the joint probability distribution
- P(u1, ..., un) = ∏_{i=1}^{n} P(ui | π(ui))
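A minimal sketch of this factorization on an assumed three-node network (u1 → u3 ← u2, binary variables, made-up CPT entries):

```python
# Structure: each node maps to its tuple of parents.
parents = {"u1": (), "u2": (), "u3": ("u1", "u2")}

# CPTs: cpt[node][parent values] -> P(node = 1 | parents). Numbers made up.
cpt = {
    "u1": {(): 0.3},
    "u2": {(): 0.6},
    "u3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint(assignment):
    """P(assignment) as the product of the local conditionals."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[v] for v in pa)
        p1 = cpt[node][pa_vals]
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

print(joint({"u1": 1, "u2": 0, "u3": 1}))  # 0.3 * 0.4 * 0.4 = 0.048
```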
13. Building A Model (1)
- Assume that the true relationships among the variables are governed by MT.
- MT = (VT, PT)
- Given a training set D
- Goal: Build a model ML that approximates MT in the form of a BBN B with global structure G.
14. Building A Model (2)
- Input: P (the joint probability distribution over VT of the training set D)
- Output: a DAG G
- Procedure (sketched in code below)
- Assign an order d to all variables in VT.
- For each u ∈ VT, identify the predecessors π(u) that render u independent of all other predecessors.
- Add a directed edge from each element of π(u) to u.
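A sketch of this procedure, assuming a hypothetical oracle independent(u, v, given) that answers conditional-independence queries against P (in practice it would be backed by statistical tests on D); it keeps each predecessor on which u still depends given the rest, a greedy approximation of the minimal parent set:

```python
def build_dag(order, independent):
    """Return parent sets pi[u] following the slide's procedure."""
    pi = {}
    for i, u in enumerate(order):
        preds = order[:i]  # all variables before u in the order d
        # Keep predecessor v whenever u is NOT independent of v
        # given the remaining predecessors.
        pi[u] = [v for v in preds
                 if not independent(u, v, [w for w in preds if w != v])]
    return pi  # add a directed edge v -> u for each v in pi[u]
```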
15. Types of Learning Bayes Nets
- Learner's input
- Batch Learning
  - Input: training set
  - Output: a BBN
- Adaptive Learning
  - Input: a BBN and new example(s)
  - Output: a new BBN
- Property of a network
- Qualitative Property: structure of the network
- Quantitative Property: conditional probabilities
16. Issues in Learning Bayes Nets
- Hidden variables
- Feature Selection
- Causal relationships
- 1. Expert knowledge
- 2. Temporal Precedence
17. Batch Learning (1)
- Input: a set of discrete variables U and a database (training set) D
- Output: a BBN
18. Batch Learning (2)
- Important assumption: MT is a BBN with unknown structure and parameters.
- Let Bs^h be the hypothesis that D is generated by a network structure Bs.
- Learning a BBN: search for the hypothesis corresponding to the best network structure in the hypothesis space H.
19. Batch Learning (3)
- Model Selection: Given the search space of models, output the best model.
- Neighborhood
- Scoring criterion
- Search procedure/strategy
20. Bayesian Scoring Metric (1)
- Goal: Output the most accurate network structure given a database.
- In theory: find the Bs^h with the highest P(Bs^h | D).
- In practice: find the Bs^h with the highest P(Bs^h, D).
- (P(D) is the same for every structure, so maximizing P(Bs^h, D) also maximizes P(Bs^h | D) = P(Bs^h, D) / P(D).)
21. Bayesian Scoring Metric (2)
- Notation
- qi is the number of instantiations in Φi.
- Nijk is the number of cases in D where ui = k and Φi = j.
- ri is the number of elements in d(ui).
- Γ(·) is the Gamma function.
22. Bayesian Scoring Metric (3)
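The formula on this slide did not survive extraction; under the K2 assumptions listed on slide 24, the standard Cooper–Herskovits form of the metric, in the notation just defined, is:

```latex
P(B_s, D) = P(B_s) \prod_{i=1}^{n} \prod_{j=1}^{q_i}
    \frac{\Gamma(r_i)}{\Gamma(N_{ij} + r_i)}
    \prod_{k=1}^{r_i} \Gamma(N_{ijk} + 1),
\qquad N_{ij} = \sum_{k=1}^{r_i} N_{ijk}
```

Since Γ(m + 1) = m!, this is the familiar factorial form ∏i ∏j (ri − 1)! / (Nij + ri − 1)! ∏k Nijk!.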
23. Bayesian Scoring Metric (4)
- Algorithm K2
- Input: a set of variables U, an order d, an upper bound on the number of parents per node, and a database D.
- Output: for each ui ∈ U, a parent set π(ui)
- Uses the Bayesian Scoring Metric as its estimation function (a sketch follows below).
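A minimal sketch of K2 following the slide's description; the function names, the data layout (a list of dicts mapping variable names to values), and the use of log-Gamma arithmetic are my own choices, not the slides':

```python
import math

def local_score(i, parent_set, data, domains):
    """Log of the Cooper-Herskovits local term for node i with parent_set."""
    counts = {}  # (parent instantiation j, value k) -> N_ijk
    for case in data:
        j = tuple(case[p] for p in parent_set)
        counts[(j, case[i])] = counts.get((j, case[i]), 0) + 1
    r = len(domains[i])
    score = 0.0
    for j in {key[0] for key in counts}:
        n_ij = sum(c for (jj, _), c in counts.items() if jj == j)
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(c + 1)
                     for (jj, _), c in counts.items() if jj == j)
    return score

def k2(order, data, domains, max_parents):
    """Greedily grow each node's parent set while the metric improves."""
    parents = {}
    for idx, u in enumerate(order):
        pa = []
        best = local_score(u, pa, data, domains)
        candidates = list(order[:idx])  # only predecessors in the order d
        while len(pa) < max_parents and candidates:
            scored = [(local_score(u, pa + [c], data, domains), c)
                      for c in candidates]
            top, c = max(scored)
            if top <= best:
                break  # no single addition improves the score
            best, pa = top, pa + [c]
            candidates.remove(c)
        parents[u] = pa
    return parents
```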
24. Bayesian Scoring Metric (5)
- Assumptions for K2
- Prior knowledge is uninformative.
- The database is a Monte Carlo sample from a belief network containing only the variables in U.
- All cases in the database are independent.
- Cases are complete.
- If p and q are components of different conditional probability distributions, then the value assigned to p is independent of the value assigned to q.
- Initial indifference about the assignment of conditional probability values.
- The order d is specified.
- The prior probabilities of all network structures are equal.
25. MDL Metric (1)
- Goal: Output the network structure with minimum description length (MDL).
- The MDL is the sum of the encoding length of the model (network) and the encoding length of the data given the model.
- Trade-off between accuracy and usefulness
26. MDL Metric (2)
- Problems with the most accurate but complex model:
- 1. Hard for humans to understand.
- 2. Can at best represent the distribution of the training set.
- 3. Computationally difficult to learn and use.
27. MDL Metric (3)
- Let ki be the number of ui's parents.
- Let d be the number of bits needed to store a numerical value.
- Encoding length of the network (reconstructed below)
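The formula itself is missing from the text; in the Lam–Bacchus MDL encoding, which uses exactly the quantities just defined (with n the number of variables), the network's description length is:

```latex
L(B_s) = \sum_{i=1}^{n} \left[ k_i \log_2 n + d\,(r_i - 1)\,q_i \right]
```

The first term lists ui's parents; the second stores its table of conditional probabilities.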
28. MDL Metric (4)
- Encoding length of the data using the network
- Based on the Kullback-Leibler cross entropy (restated below)
- P and Q are distributions over the same space SU.
- SU is exponential in the number of variables.
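The cross-entropy term the slide refers to, in its usual form:

```latex
C(P, Q) = \sum_{x \in S_U} P(x) \log \frac{P(x)}{Q(x)}
```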
29. Scoring Criterion Properties
- Asymptotically Consistent
- Locally Consistent
- Score Equivalence
- Decomposable
30. Learning Methods Focusing on Search Procedures
- A search procedure works in combination with a family of scoring metrics.
- Greedy Equivalence Search (GES)
- k-greedy Equivalence Search (KES)
31. Greedy Equivalence Search
- Procedure
- Divide the search space into equivalence classes of DAGs.
- Start with the equivalence class of DAGs with no edges.
- Perform a forward equivalence search.
- Perform a backward equivalence search.
- With an asymptotically consistent scoring criterion: a parameter-optimal DAG model
- With a locally consistent scoring criterion: an inclusion-optimal DAG model
- Problem: local maxima.
32. k-greedy Equivalence Search
- Procedure (sketched in code below)
- Let G be a DAG with no edges.
- Let Sh be the set of DAGs, obtained by adding or removing one edge of G, whose score is higher than G's.
- Choose a random subset Sr ⊆ Sh of size k · |Sh|.
- Let G be the DAG in Sr with the highest score, and repeat.
- With a locally consistent scoring criterion: an inclusion-optimal DAG model
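A minimal sketch of this loop at the DAG level (the published KES operates on equivalence classes; score and edge_neighbors are assumed helpers, and k is treated as an absolute sample size here):

```python
import random

def kes(empty_dag, score, edge_neighbors, k):
    """Repeat: move to the best of k randomly chosen improving neighbors."""
    g = empty_dag
    while True:
        # S_h: all one-edge neighbors that score higher than the current DAG.
        s_h = [h for h in edge_neighbors(g) if score(h) > score(g)]
        if not s_h:
            return g  # no improving neighbor: an inclusion-optimal model
        s_r = random.sample(s_h, min(k, len(s_h)))  # random subset S_r
        g = max(s_r, key=score)  # move to the best DAG in S_r
```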
33. Limitations (1)
- Given a hypothesis space H and a probability distribution P over H
- Learning: update P using Bayes's Theorem
34. Limitations (2)
- Implicit condition
- Explicit condition
35. Limitations (3)
36. Conclusions
- A BBN is a powerful model for reasoning under uncertainty.
- Assumptions must be made to measure the goodness of a network.
- Without some assumptions, finding a network structure with the highest score is NP-hard [1].
- [1] Chickering, D. M., Heckerman, D., and Meek, C. (2004). Large-Sample Learning of Bayesian Networks is NP-Hard. Journal of Machine Learning Research 5, 1287-1330.
37. Questions and Comments