Variational Bayes 101 - PowerPoint PPT Presentation

About This Presentation

Title: Variational Bayes 101

Description: Hansen & Rasmussen, Neural Comp (1994); Tipping, 'Relevance vector machine' (1999); approximations needed for posteriors – PowerPoint PPT presentation

Slides: 49
Provided by: tor121
Transcript and Presenter's Notes

Title: Variational Bayes 101


1
Variational Bayes 101
2
The Bayes scene
  • Exact averaging in discrete/small models (Bayes
    networks)
  • Approximate averaging
  • - Monte Carlo methods
  • - Ensemble/mean field
  • - Variational Bayes methods

variational-bayes.org, MLpedia, Wikipedia
  • ISP Bayes
  • ICA mean field, Kalman, dynamical systems
  • NeuroImaging Optimal signal detector
  • Approximate inference
  • Machine learning methods

3
Bayes methodology
The minimal error rate is obtained when the
detector is based on the posterior probability
(Bayes decision theory).
The likelihood may contain unknown parameters.
4
Bayes methodology
The conventional approach is to use the most
probable parameters.
However, the averaged model is generalization
optimal (Hansen, 1999), i.e.
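A minimal sketch of the difference, for a univariate normal with unknown mean and known unit variance. The prior variance `tau2` and the data are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=10)    # observations, known noise variance 1

tau2 = 10.0                          # assumed prior: mu ~ N(0, tau2)
n = len(x)

# Conjugate normal-normal update for the posterior over the mean
post_var = 1.0 / (n + 1.0 / tau2)
post_mean = post_var * n * x.mean()

# "Most probable parameters": plug in the posterior mode, predictive N(mode, 1)
plugin_var = 1.0
# Averaged (Bayesian) predictive: N(post_mean, 1 + post_var);
# integrating over the posterior widens the predictive distribution
averaged_var = 1.0 + post_var

print(plugin_var, averaged_var)      # the averaged predictive is wider
```

The extra `post_var` term is what the plug-in model discards; it is exactly the parameter uncertainty that the averaged model keeps.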
5
The hidden agenda of learning
  • Typically learning proceeds by generalization
    from a limited set of samples, but
  • we would like to identify the model that
    generated the data
  • Choose the least complex model compatible with
    the data

That I figured out in 1386
6
Generalization!
  • Generalizability is defined as the expected
    performance on a random new sample ... the mean
    performance of a model on a fresh data set is
    an unbiased estimate of generalization
  • Typical loss functions
  • ⟨−log p(x)⟩, ⟨prediction errors⟩
  • ⟨(g(x) − ĝ(x))²⟩,
  • ⟨log p(x,g)/(p(x)p(g))⟩, etc.
  • Results can be presented as bias-variance
    trade-off curves or learning curves
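The ⟨−log p(x)⟩ loss can be estimated exactly as the slide says: mean performance on a fresh sample. A small sketch (the Gaussian model and sample sizes are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 200)
fresh = rng.normal(0.0, 1.0, 200)    # independent sample from the same source

# Fit a Gaussian model on the training set
mu, sigma = train.mean(), train.std()

def neg_log_p(x, mu, sigma):
    # -log of the fitted Gaussian density at x
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

# Mean -log p on the fresh set: an unbiased estimate of generalization error
gen_estimate = neg_log_p(fresh, mu, sigma).mean()
print(gen_estimate)
```

For data truly from N(0,1) the estimate hovers near the entropy 0.5·log(2πe) ≈ 1.42; repeating over many fresh sets traces out the learning curve mentioned above.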

7
Generalization optimal predictive distribution
  • The game of guessing a pdf
  • Assume a random teacher drawn from P(θ) and a
    random data set, D, drawn from P(x|θ)
  • The prediction / generalization error is the
    average of −log p(x|D, A) over test samples,
    data sets, and teachers

p(x|D, A) is the predictive distribution of model A;
P(x|θ) is the test sample distribution.
8
Generalization optimal predictive distribution
  • We define the generalization functional
    (Hansen, NIPS 1999)
  • Minimized by the Bayesian averaging predictive
    distribution
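The slide's equations did not survive extraction; in standard notation the generalization functional and its minimizer read (a reconstruction consistent with the quantities named on slide 7; the symbols Γ and θ are assumptions):

```latex
\Gamma(A) \;=\; -\int d\theta\, P(\theta) \int dD\, P(D \mid \theta)
               \int dx\, P(x \mid \theta)\, \log p(x \mid D, A),
\qquad
p^{*}(x \mid D) \;=\; \int p(x \mid \theta)\, p(\theta \mid D)\, d\theta .
```

The minimizer is the Bayesian averaging predictive distribution: the likelihood averaged over the posterior.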

9
Bias-variance trade-off and averaging
  • Now, averaging is good; can we average too much?
  • Define the family of tempered posterior
    distributions
  • Case: univariate normal distribution with
    unknown mean parameter
  • High temperature: widened posterior average
  • Low temperature: narrow average
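A standard form of the tempered family (a reconstruction; the temperature symbol T is an assumption):

```latex
p_T(\theta \mid D) \;\propto\; \bigl[\, p(D \mid \theta)\, p(\theta) \,\bigr]^{1/T}
```

For the univariate normal mean with a flat prior this gives p_T(μ|D) = N(x̄, Tσ²/N): raising T widens the posterior and hence the average; lowering T narrows it toward the maximum-likelihood point.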

10
Bayes model selection, example
  • Let three models A, B, C be given:
  • A) x is normal N(0,1)
  • B) x is normal N(0,σ²), σ² is uniform U(0,∞)
  • C) x is normal N(µ,σ²), µ and σ² are uniform U(0,∞)

11
Model A
The likelihood of N samples is given by
12
Model B
The likelihood of N samples is given by
13
Model C
The likelihood of N samples is given by
14
Model A maximum likelihood
The likelihood of N samples is given by
15
Model B
The likelihood of N samples is given by
16
Model C
The likelihood of N samples is given by
17
  • Bayesian model selection
  • C (green) is the correct model;
  • what if only A (red) and B (blue) are known?

18
  • Bayesian model selection
  • A (red) is the correct model
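A numerical sketch of this slide's situation: data actually drawn from model A, evidence compared for models A and B. Truncating B's improper U(0, ∞) prior to U(0, `s2_max`) and using grid quadrature are my assumptions to make the integral computable:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)       # data generated by model A

def log_evidence_A(x):
    # Model A: x ~ N(0, 1), no free parameters -> evidence = likelihood
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * x**2)

def log_evidence_B(x, s2_max=10.0, n_grid=4000):
    # Model B: x ~ N(0, s2) with s2 ~ U(0, s2_max); evidence by quadrature
    s2 = np.linspace(1e-3, s2_max, n_grid)
    loglik = (-0.5 * len(x) * np.log(2 * np.pi * s2)
              - 0.5 * np.sum(x**2) / s2)
    m = loglik.max()                      # log-sum-exp for numerical stability
    ds = s2[1] - s2[0]
    return m + np.log(np.sum(np.exp(loglik - m)) * ds / s2_max)

print(log_evidence_A(x), log_evidence_B(x))
```

B can fit the sample variance slightly better, but it pays an Occam factor for spreading its prior over a wide range of σ², so the (true) fixed model A typically wins the evidence comparison.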

19
Bayesian inference
  • Bayesian averaging
  • Caveats:
  • Bayes can rarely be implemented exactly
  • Not optimal if the model family is incorrect
  • Bayes cannot detect bias
  • However, it is still asymptotically optimal if the
    observation model is correct and the prior is
    weak (Hansen, 1999).

20
Hierarchical Bayes models
  • Multi-level models in Bayesian averaging
  • C.P. Robert, The Bayesian Choice: A
    Decision-Theoretic Motivation. Springer Texts
    in Statistics, Springer Verlag, New York (1994).
  • G. Golub, M. Heath and G. Wahba, Generalized
    cross-validation as a method for choosing a
    good ridge parameter. Technometrics 21,
    pp. 215-223 (1979).
  • K. Friston, A theory of cortical responses.
    Phil. Trans. R. Soc. B 360, 815-836 (2005).

21
Hierarchical Bayes models
Posterior = Likelihood × Prior / Evidence.
Learning hyper-parameters by adjusting prior
expectations: empirical Bayes (MacKay, 1992).
Target: maximal evidence.
Hansen et al. (Eusipco, 2006); cf. Boltzmann
learning (Hinton et al., 1983)
22
Hyperparameter dynamics
Gaussian prior with adaptive hyperparameter.
σ²α is a signal-to-noise measure; α_ML is the
maximum-likelihood optimum.
Discontinuity: the parameter is pruned at low
signal-to-noise. Hansen & Rasmussen, Neural Comp
(1994); Tipping, 'Relevance vector machine' (1999)
23
Hyperparameter dynamics
  • Dynamically updated hyperparameters imply
    pruning
  • Pruning decisions are based on SNR
  • A mechanism for cognitive selection, attention?
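The pruning discontinuity can be seen in a one-parameter sketch: a single observation x = w + noise with noise variance σ², prior w ~ N(0, 1/α), and α set by maximizing the evidence p(x|α) = N(x; 0, 1/α + σ²). This setup is my illustrative reduction of the slide's mechanism, not the cited papers' exact model:

```python
import numpy as np

def evidence_optimal_alpha(x, sigma2):
    # Evidence p(x | alpha) = N(x; 0, 1/alpha + sigma2) is maximized at
    # 1/alpha = x**2 - sigma2 when x**2 > sigma2 (SNR above 1);
    # below that threshold the optimum is alpha -> infinity: pruning.
    if x**2 > sigma2:
        return 1.0 / (x**2 - sigma2)
    return np.inf                      # prior collapses onto w = 0

print(evidence_optimal_alpha(3.0, 1.0))   # high SNR: finite alpha (1/8)
print(evidence_optimal_alpha(0.5, 1.0))   # low SNR: inf, weight pruned
```

The jump from a finite α to α = ∞ as the SNR crosses 1 is the discontinuity referred to on slide 22.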

24
Hansen & Rasmussen, Neural Comp (1994)
25-34
(No Transcript)
35
Approximations needed for posteriors
  • Approximations using asymptotic expansions
    (Laplace, etc.) - JL
  • Approximation of posteriors using tractable
    (factorized) pdfs by KL-fitting
  • Approximation of products using EP - AH, Wednesday
  • Approximation by MCMC - OWI, Thursday

36
Illustration of approximation by a Gaussian pdf
P. Højen-Sørensen, Thesis (2001)
37
(No Transcript)
38
Variational Bayes
  • Notation: observables and hidden variables
  • We analyse the log likelihood of a mixture model
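The decomposition behind these slides, in the standard form (a reconstruction; the symbols x for observables, h for hidden variables, and q for the variational distribution are assumptions):

```latex
\log p(x) \;=\;
\underbrace{\int q(h)\,\log\frac{p(x,h)}{q(h)}\,dh}_{\mathcal{F}(q)\ \text{(lower bound)}}
\;+\;
\underbrace{\int q(h)\,\log\frac{q(h)}{p(h \mid x)}\,dh}_{\mathrm{KL}\left(q \,\|\, p(h \mid x)\right)\ \ge\ 0}
```

VB maximizes the bound F(q) over a tractable (e.g. factorized) family, which is equivalent to minimizing the KL term.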


39
Variational Bayes

40
Variational Bayes
41
Conjugate exponential families
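The defining property, in the standard form used in the VB literature (a reconstruction; the symbol names φ, u, f, g are assumptions):

```latex
p(x \mid \theta) \;=\; f(x)\, g(\theta)\, \exp\!\left( \phi(\theta)^{\top} u(x) \right),
\qquad
p(\theta) \;\propto\; g(\theta)^{\eta}\, \exp\!\left( \phi(\theta)^{\top} \nu \right)
```

When the likelihood is exponential-family and the prior is conjugate, the VB updates stay within the same family and reduce to updating the parameters (η, ν).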

42
Mini exercise
  • What are the natural parameters for a Gaussian?
  • What are the natural parameters for a MoG?

43
(No Transcript)
44
  • Observation model and Bayes factor

45
  • Normal-inverse-gamma prior: the conjugate
    prior for the GLM observation model

46
  • Normal-inverse-gamma prior: the conjugate
    prior for the GLM observation model

47
  • The Bayes factor is the ratio between the
    normalization constants of the NIGs
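For the GLM y = Xw + ε with an NIG prior, the marginal likelihood has a closed form (the standard conjugate-regression result; the NIG parameter names V, a, b are my notation, since the slide's symbols did not survive extraction):

```latex
p(y \mid X) \;=\; \frac{1}{(2\pi)^{N/2}}\,
\sqrt{\frac{|V_N|}{|V_0|}}\;
\frac{b_0^{\,a_0}}{b_N^{\,a_N}}\;
\frac{\Gamma(a_N)}{\Gamma(a_0)},
\qquad a_N = a_0 + \tfrac{N}{2},
```

where (V_N, b_N) are the usual NIG posterior updates. A Bayes factor between two GLMs is then a ratio of two such expressions, i.e. a ratio of NIG normalization constants, as the slide states.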

48-50
(No Transcript)
51
Exercises
  • Matthew Beal's Mixture of Factor Analyzers code
  • Code available (variational-bayes.org)
  • Code a VB version of the BGML for signal
    detection
  • Code available for exact posterior