Learning Bayesian Networks
(PowerPoint presentation transcript)
1
Learning Bayesian Networks
(From David Heckerman's tutorial)
2
Learning Bayes Nets From Data
[Figure: data and prior/expert information feed into a Bayes-net learner, which outputs Bayes net(s) over variables X1 through X9]
3
Overview
  • Introduction to Bayesian statistics: learning a probability
  • Learning probabilities in a Bayes net
  • Learning Bayes-net structure

4
Learning Probabilities: The Classical Approach
Simple case: flipping a thumbtack
The true probability θ is unknown.
Given iid data, estimate θ using an estimator with good properties: low bias, low variance, consistency (e.g., the ML estimate).
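As a concrete illustration (a minimal sketch, not from the slides; names are illustrative), the ML estimate for the thumbtack is simply the observed fraction of heads:

```python
# ML estimate for the thumbtack: the fraction of heads in iid data.
def ml_estimate(outcomes):
    """outcomes: iterable of 'heads'/'tails' strings."""
    outcomes = list(outcomes)
    h = sum(1 for o in outcomes if o == "heads")
    return h / len(outcomes)

print(ml_estimate(["heads", "tails", "heads", "heads"]))  # 0.75
```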
5
Learning Probabilities: The Bayesian Approach
The true probability θ is unknown; instead of a point estimate, maintain a Bayesian probability density p(θ) for it.
6
Bayesian Approach: use Bayes' rule to compute a new density for θ given data d:

$$p(\theta \mid d) = \frac{p(\theta)\, p(d \mid \theta)}{p(d)}$$

(posterior ∝ prior × likelihood)
7
The Likelihood: binomial distribution

$$p(d \mid \theta) = \theta^{h}(1-\theta)^{t}$$

for a dataset d containing h heads and t tails.
8
Example: application of Bayes' rule to the observation of a single "heads"

[Figure: three plots over θ from 0 to 1: the prior p(θ), the likelihood p(heads|θ) = θ, and the posterior p(θ|heads) ∝ θ · p(θ)]
9
A Bayes net for learning probabilities

[Figure: a parameter node θ with arcs to the observed tosses X1, ..., XN]
10
Sufficient statistics
The counts (h, t) of heads and tails are sufficient statistics for computing the posterior.
11
The probability of heads on the next toss:

$$p(X_{N+1} = \text{heads} \mid d) = \int \theta\, p(\theta \mid d)\, d\theta$$

i.e., the expectation of θ under the posterior.
12
Prior Distributions for θ
  • Direct assessment
  • Parametric distributions
  • Conjugate distributions (for convenience)
  • Mixtures of conjugate distributions

13
Conjugate Family of Distributions
Beta distribution:

$$p(\theta) = \mathrm{Beta}(\theta \mid \alpha_h, \alpha_t) = \frac{\Gamma(\alpha_h + \alpha_t)}{\Gamma(\alpha_h)\Gamma(\alpha_t)}\, \theta^{\alpha_h - 1}(1-\theta)^{\alpha_t - 1}$$

Properties:
  • Conjugacy: the posterior after h heads and t tails is Beta(θ | α_h + h, α_t + t)
  • Predictive: p(heads | d) = (α_h + h) / (α_h + α_t + N), where N = h + t
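A minimal sketch of this conjugate update (function names are illustrative, not from the slides):

```python
# Beta-binomial conjugate update: the posterior after h heads and
# t tails is Beta(alpha_h + h, alpha_t + t); the predictive
# probability of heads is the posterior mean.
def beta_update(alpha_h, alpha_t, h, t):
    return alpha_h + h, alpha_t + t

def predictive_heads(alpha_h, alpha_t):
    return alpha_h / (alpha_h + alpha_t)

post_h, post_t = beta_update(1.0, 1.0, h=3, t=1)  # uniform Beta(1,1) prior
print(predictive_heads(post_h, post_t))           # (1+3)/(2+4) ≈ 0.667
```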
14
Intuition
  • The hyperparameters α_h and α_t can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"
  • Equivalent sample size: α_h + α_t
  • The larger the equivalent sample size, the more confident we are about the true probability

15
Beta Distributions
[Figure: plots of the Beta(3, 2), Beta(1, 1), Beta(19, 39), and Beta(0.5, 0.5) densities]
16
Assessment of a Beta Distribution
Method 1: Equivalent samples
  - assess α_h and α_t directly, or
  - assess the equivalent sample size α_h + α_t and the mean α_h / (α_h + α_t)
Method 2: Imagined future samples
17
Generalization to m discrete outcomes ("multinomial distribution")

Dirichlet distribution:

$$p(\theta_1, \ldots, \theta_m) = \mathrm{Dir}(\theta \mid \alpha_1, \ldots, \alpha_m) \propto \prod_{k=1}^{m} \theta_k^{\alpha_k - 1}$$

Properties:
  • Conjugacy: the posterior after observing counts N_1, ..., N_m is Dir(θ | α_1 + N_1, ..., α_m + N_m)
  • Predictive: p(outcome k | d) = (α_k + N_k) / (α + N), where α = Σ_k α_k and N = Σ_k N_k
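A similar sketch for the Dirichlet-multinomial case (again, names are illustrative):

```python
# Dirichlet-multinomial update: posterior hyperparameters are the
# prior counts plus the observed counts; the predictive distribution
# is the normalized posterior counts.
def dirichlet_update(alphas, counts):
    return [a + n for a, n in zip(alphas, counts)]

def predictive(alphas):
    total = sum(alphas)
    return [a / total for a in alphas]

post = dirichlet_update([1.0, 1.0, 1.0], [5, 2, 3])  # uniform prior, 10 observations
print(predictive(post))  # [6/13, 3/13, 4/13]
```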
18
More generalizations (see, e.g., Bernardo & Smith, 1994)
  • Likelihoods from the exponential family
  • Binomial
  • Multinomial
  • Poisson
  • Gamma
  • Normal

19
Overview
  • Introduction to Bayesian statistics: learning a probability
  • Learning probabilities in a Bayes net
  • Learning Bayes-net structure

20
From thumbtacks to Bayes nets
The thumbtack problem can be viewed as learning the probability for a very simple BN:

[Figure: a single-node Bayes net X taking values heads/tails]
21
The next simplest Bayes net

[Figure: two unconnected binary nodes, X (heads/tails) and Y (heads/tails)]
22
The next simplest Bayes net
[Figure: the model unrolled over cases 1, ..., N: a parameter node Θ_Y points to Y1, Y2, ..., YN (and similarly Θ_X to X1, ..., XN); a "?" marks the unknown relationship between the parameter nodes]
23
The next simplest Bayes net
"parameter independence"
QY
case 1
Y1
case 2
Y2
YN
case N
24
The next simplest Bayes net
"parameter independence"
QY
case 1
Y1
ß
case 2
Y2
two separate thumbtack-like learning problems
YN
case N
25
A bit more difficult...
  • Three probabilities to learn:
  • θ_X=heads
  • θ_Y=heads|X=heads
  • θ_Y=heads|X=tails

26
A bit more difficult...
[Figure: the X → Y model unrolled over cases, with parameter nodes Θ_X, Θ_Y|X=heads, and Θ_Y|X=tails; case 1 has X1 = heads, case 2 has X2 = tails]
27
A bit more difficult...
[Figure: the same unrolled model: parameters Θ_X, Θ_Y|X=heads, Θ_Y|X=tails with cases (X1, Y1) and (X2, Y2)]
28
A bit more difficult...
[Figure: "?" marks ask how the parameters Θ_X, Θ_Y|X=heads, and Θ_Y|X=tails relate to one another]
29
A bit more difficult...
[Figure: under parameter independence, the unrolled model splits into separate plates for Θ_X, Θ_Y|X=heads, and Θ_Y|X=tails]

3 separate thumbtack-like problems (a sketch of the decomposition follows below)
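A minimal sketch of this decomposition on hypothetical complete data for the X → Y network (data and names are illustrative, not from the slides):

```python
# With complete data, counts for the X -> Y network split into one
# heads/tails table for X and one for Y under each value of X; each
# table is then a thumbtack problem with its own Beta prior.
cases = [("heads", "tails"), ("tails", "heads"),
         ("heads", "heads"), ("tails", "tails")]  # (x, y) pairs

counts_x = {"heads": 0, "tails": 0}
counts_y = {"heads": {"heads": 0, "tails": 0},   # keyed by the parent value X
            "tails": {"heads": 0, "tails": 0}}

for x, y in cases:
    counts_x[x] += 1
    counts_y[x][y] += 1

print(counts_x)           # {'heads': 2, 'tails': 2}
print(counts_y["heads"])  # Y-counts among cases where X = heads
```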
30
In general
  • Learning probabilities in a BN is straightforward if:
  • Local distributions are from the exponential family (binomial, Poisson, gamma, ...)
  • Parameter independence
  • Conjugate priors
  • Complete data

31
Incomplete data makes parameters dependent
[Figure: the unrolled X → Y model; when some values are missing, the parameters Θ_X, Θ_Y|X=heads, and Θ_Y|X=tails become dependent]
32
Overview
  • Introduction to Bayesian statistics: learning a probability
  • Learning probabilities in a Bayes net
  • Learning Bayes-net structure

33
Learning Bayes-net structure
Given data, which model is correct?
[Figure: two candidate models over X and Y, model 1 and model 2, differing in the arc between X and Y]
34
Bayesian approach
Given data, which model is correct... or rather, more likely?

[Figure: data d together with the two candidate models over X and Y]
35
Bayesian approach: Model Averaging
Given data, which model is correct... or rather, more likely?

[Figure: data d together with the two candidate models over X and Y]

Average the models' predictions, weighted by their posteriors:

$$p(x \mid d) = \sum_{m} p(x \mid d, m)\, p(m \mid d)$$
36
Bayesian approach: Model Selection
Given data, which model is correct... or rather, more likely?

[Figure: data d together with the two candidate models over X and Y]

Keep the best model, for:
  - Explanation
  - Understanding
  - Tractability
37
To score a model, use Bayes' rule. Given data d:

$$p(m \mid d) \propto p(m)\, p(d \mid m)$$

Here p(m|d) is the model score, p(m) the structure prior, and p(d|m) the marginal likelihood, which averages the likelihood over the parameters:

$$p(d \mid m) = \int p(d \mid \theta, m)\, p(\theta \mid m)\, d\theta$$
38
Thumbtack example (single node X, heads/tails; conjugate Beta prior):

$$p(d \mid m) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + N)} \cdot \frac{\Gamma(\alpha_h + h)}{\Gamma(\alpha_h)} \cdot \frac{\Gamma(\alpha_t + t)}{\Gamma(\alpha_t)}$$

where α = α_h + α_t and N = h + t.
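A minimal sketch computing this closed form in log space (using Python's standard-library lgamma; names are illustrative):

```python
# Closed-form log marginal likelihood for the thumbtack under a
# Beta(alpha_h, alpha_t) prior, computed in log space for stability.
from math import lgamma

def log_marginal_likelihood(alpha_h, alpha_t, h, t):
    alpha, n = alpha_h + alpha_t, h + t
    return (lgamma(alpha) - lgamma(alpha + n)
            + lgamma(alpha_h + h) - lgamma(alpha_h)
            + lgamma(alpha_t + t) - lgamma(alpha_t))

print(log_marginal_likelihood(1.0, 1.0, h=7, t=3))  # log p(d | m)
```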
39
More complicated graphs
For the network X → Y, the marginal likelihood factors into 3 separate thumbtack-like learning problems: one for X, one for Y given X = heads, and one for Y given X = tails.
40
Model score for a discrete BN:

$$p(d \mid m) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}$$

where i ranges over the n variables, j over the q_i configurations of X_i's parents, and k over the r_i values of X_i; N_ijk is the number of cases with X_i = k and parents in configuration j, N_ij = Σ_k N_ijk, and α_ij = Σ_k α_ijk.
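A minimal sketch of the per-node (family) contribution to this score (illustrative names; the full log model score sums this quantity over all nodes):

```python
# BD score contribution of a single node X_i: counts[j][k] holds
# N_ijk (parent configuration j, child value k) and alphas[j][k] the
# matching Dirichlet hyperparameters.
from math import lgamma

def log_family_score(counts, alphas):
    score = 0.0
    for n_jk, a_jk in zip(counts, alphas):
        score += lgamma(sum(a_jk)) - lgamma(sum(a_jk) + sum(n_jk))
        for n, a in zip(n_jk, a_jk):
            score += lgamma(a + n) - lgamma(a)
    return score

# A binary X_i with one binary parent: two parent configurations.
print(log_family_score(counts=[[7, 3], [2, 8]], alphas=[[1, 1], [1, 1]]))
```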
41
Computation of Marginal Likelihood
  • Efficient closed form if:
  • Local distributions are from the exponential family (binomial, Poisson, gamma, ...)
  • Parameter independence
  • Conjugate priors
  • No missing data (including no hidden variables)

42
Practical considerations
  • The number of possible BN structures for n variables is super-exponential in n (see the sketch below)
  • How do we find the best graph(s)?
  • How do we assign structure and parameter priors to all possible graphs?
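To see how quickly the space grows, Robinson's recurrence counts the DAGs on n labeled nodes; a minimal sketch:

```python
# Robinson's recurrence for the number of DAGs on n labeled nodes,
# illustrating the super-exponential growth of the search space.
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 6):
    print(n, num_dags(n))  # 1, 3, 25, 543, 29281
```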

43
Model search
  • Finding the BN structure with the highest score among those structures with at most k parents is NP-hard for k > 1 (Chickering, 1995)
  • Heuristic methods (a greedy sketch follows below):
  • Greedy
  • Greedy with restarts
  • MCMC methods
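A minimal sketch of greedy hill-climbing over structures (illustrative, not the code of any specific paper); the caller supplies a score function over arc sets, such as the log marginal likelihood sketched earlier:

```python
# Greedy hill-climbing: from the empty graph, repeatedly apply the
# single arc addition, deletion, or reversal that most improves the
# score, stopping at a local maximum.

def is_acyclic(arcs, nodes):
    # Kahn's algorithm: a graph is a DAG iff every node can be removed.
    indeg = {v: 0 for v in nodes}
    for _, v in arcs:
        indeg[v] += 1
    frontier = [v for v in nodes if indeg[v] == 0]
    removed = 0
    while frontier:
        u = frontier.pop()
        removed += 1
        for a, b in arcs:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return removed == len(nodes)

def neighbors(arcs, nodes):
    """Yield arc sets one change (add / delete / reverse) away."""
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in arcs:
                yield arcs - {(u, v)}                   # deletion
                yield (arcs - {(u, v)}) | {(v, u)}      # reversal
            elif (v, u) not in arcs:
                yield arcs | {(u, v)}                   # addition

def greedy_search(nodes, score):
    graph, best = frozenset(), score(frozenset())
    while True:
        candidates = [g for g in map(frozenset, neighbors(graph, nodes))
                      if is_acyclic(g, nodes)]
        if not candidates:
            return graph, best
        g = max(candidates, key=score)
        if score(g) <= best:
            return graph, best   # local maximum reached
        graph, best = g, score(g)
```

Greedy with restarts repeats this from random starting graphs and keeps the best local maximum found.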

44
Structure priors
  • 1. All possible structures equally likely
  • 2. Partial ordering, required / prohibited arcs
  • 3. p(m) ∝ similarity(m, prior BN)

45
Parameter priors
  • All uniform: Beta(1,1)
  • Use a prior BN

46
Parameter priors
  • Recall the intuition behind the Beta prior for the thumbtack:
  • The hyperparameters α_h and α_t can be thought of as imaginary counts from our prior experience, starting from "pure ignorance"
  • Equivalent sample size: α_h + α_t
  • The larger the equivalent sample size, the more confident we are about the long-run fraction

47
Parameter priors
The equivalent sample size α and the prior network give an imaginary count for any variable configuration:

$$\alpha_{ijk} = \alpha \cdot p(X_i = k, \mathrm{Pa}_i = j \mid \text{prior network})$$

Together with parameter modularity, this yields parameter priors for any BN structure over X1, ..., Xn.
48
Combine user knowledge and data
[Figure: a prior network plus an equivalent sample size, combined with data, yield improved network(s)]