Bayesian regularization of learning - PowerPoint PPT Presentation

1
Bayesian regularization of learning
  • Sergey Shumsky
  • NeurOK Software LLC

2
Scientific methods
  • Induction (F. Bacon): machine learning, Data → Models
  • Deduction (R. Descartes): mathematical modeling, Models → Data
3
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

4
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

5
Problem statement
  • Learning is an inverse, ill-posed problem
  • Model ← Data
  • Learning paradoxes
  • Infinite predictions from finite data?
  • How to optimize future predictions?
  • How to separate the regular from the random in data?
  • Regularization of learning
  • Optimal model complexity

6
Well-posed problem
  • Solution is unique
  • Solution is stable
  • Hadamard (1900s)
  • Tikhonov (1960s)

7
Learning from examples
  • Problem
  • Find the hypothesis h that generates the observed data D within model H
  • Well posed if the solution is not sensitive to
  • noise in the data (Hadamard)
  • the learning procedure (Tikhonov)

8
Learning is ill-posed problem
  • Example: function approximation
  • Sensitive to noise in data
  • Sensitive to the learning procedure

9
Learning is ill-posed problem
  • Solution is non-unique

10
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

11
Problem regularization
  • Main idea: restrict the set of solutions, sacrificing precision for stability

How to choose?
12
Statistical Learning: practice
  • Data → learning set + validation set
  • Cross-validation
  • Systematic approach to ensembles → Bayes
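The split-and-validate recipe above can be sketched in a few lines; the polynomial toy problem, the noise level, and the candidate degrees are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth function
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(x.size)

# Split the data into a learning set and a validation set
idx = rng.permutation(x.size)
train, valid = idx[:30], idx[30:]

# Choose model complexity (polynomial degree) by validation error
errors = {}
for degree in range(1, 10):
    coeffs = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coeffs, x[valid])
    errors[degree] = float(np.mean((pred - y[valid]) ** 2))

best = min(errors, key=errors.get)
print("selected degree:", best)
```

Degrees that underfit or overfit the training set show up as a larger error on the held-out points, which is exactly the instability the slides warn about.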

13
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

14
Statistical Learning: theory
  • Learning as inverse probability
  • Probability theory: hypothesis h ∈ H → data D
  • Learning theory: data D → hypothesis h ∈ H

Bernoulli (1713), Bayes (1750)
15
Bayesian learning
  • Bayes' rule: Posterior P(h|D) = Likelihood P(D|h) × Prior P(h) / Evidence P(D)
16
Coin tossing game
H
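The coin-tossing game is the classic illustration of prior, posterior, and evidence; a minimal sketch, assuming a uniform Beta(1, 1) prior and a coin bias of 0.7 chosen purely for the example:

```python
import numpy as np

rng = np.random.default_rng(7)
tosses = rng.random(100) < 0.7        # True = heads, bias 0.7 (assumed)
heads = int(tosses.sum())

# Beta prior over the heads probability, updated by Bayes' rule:
# Beta(a, b) + data -> Beta(a + heads, b + tails)
a, b = 1, 1                           # uniform prior
a_post = a + heads
b_post = b + (100 - heads)

posterior_mean = a_post / (a_post + b_post)
print("heads observed:", heads, "posterior mean:", round(posterior_mean, 3))
```

The posterior concentrates around the true bias as tosses accumulate, which is the Bayesian learning picture of the previous slide in miniature.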
17
Monte Carlo simulations
18
Bayesian regularization
  • Most probable hypothesis: minimize −log P(h|D) = −log P(D|h) − log P(h) + const
  • −log P(D|h): learning error; −log P(h): regularization
  • Example: function approximation
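The "learning error + regularization" decomposition can be made concrete with linear regression: Gaussian noise gives a squared-error term and a Gaussian prior on the weights gives a quadratic penalty, so the MAP estimate is ridge regression. A minimal sketch; the synthetic data and the value of alpha are assumptions:

```python
import numpy as np

# MAP weights for linear regression: minimizing
#     ||y - X w||^2 + alpha * ||w||^2
# has the closed form (X^T X + alpha I)^{-1} X^T y (ridge regression)
def map_weights(X, y, alpha):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(50)

w_ml = map_weights(X, y, alpha=0.0)    # pure maximum likelihood
w_map = map_weights(X, y, alpha=10.0)  # Gaussian prior shrinks the weights
print("||w_map|| < ||w_ml|| :", np.linalg.norm(w_map) < np.linalg.norm(w_ml))
```

The prior term stabilizes the inverse problem: the regularized solution has a smaller norm and is less sensitive to noise in y.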
19
Minimal Description Length
Rissanen (1978)
  • Most probable hypothesis ↔ shortest code length for data and hypothesis
  • Example: optimal prefix code (code length ℓ(h) = −log₂ P(h))
20
Data Complexity
  • Complexity K(D|H) = min_h L(h, D|H)

Kolmogorov (1965)
  • Code length L(h, D) = coded data L(D|h) + decoding program L(h)
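A hedged numerical illustration of the two-part code: score polynomial models by an approximate L(h, D) = L(D|h) + L(h), using the standard BIC-style estimates (n/2)·log(MSE) for the data part and (k/2)·log n for k parameters. These approximations are my addition for the sketch, not the slides' exact formulas:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.standard_normal(x.size)
n = x.size

# Two-part description length L(h, D) = L(D|h) + L(h):
# for Gaussian residuals L(D|h) ~ (n/2) log(MSE); coding each of the
# k = degree + 1 parameters costs about (1/2) log n bits each.
def description_length(degree):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return 0.5 * n * np.log(mse) + 0.5 * (degree + 1) * np.log(n)

lengths = {d: description_length(d) for d in range(1, 8)}
best = min(lengths, key=lengths.get)
print("degree with shortest description:", best)
```

The quadratic that actually generated the data wins: lower degrees pay in residual coding cost, higher degrees pay for extra parameters.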
21
Complex = unpredictable
Solomonoff (1978)
  • Prediction error ∝ L(h, D)/L(D)
  • Random data is incompressible
  • Compression ↔ predictability

Example: block coding; a program h of length L(h, D) decodes into the data D
22
Universal Prior
  • All 2^L programs of length L are equiprobable: P(h) = 2^(−L(h))
  • Data complexity: P(D) = Σ_h 2^(−L(h, D))

Solomonoff (1960), Bayes (1750)
23
Statistical ensemble
  • Shorter description length than any single hypothesis
  • Proof: P(D) = Σ_h P(D|h) P(h) ≥ max_h P(D|h) P(h)
  • Corollary: ensemble predictions are superior to the most probable prediction

24
Ensemble prediction
  • P(y|D) = Σ_h P(y|h) P(h|D): average the hypotheses' predictions, weighted by the posterior
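A sketch of ensemble prediction: each hypothesis contributes its prediction with weight proportional to exp(−L(h, D)), i.e. its posterior weight under a shortest-description prior. The polynomial family and the BIC-style code-length estimate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)
x_new = 0.5          # point at which to predict
n = x.size

# Weight each polynomial hypothesis by exp(-L(h, D)): shorter
# description = higher posterior weight in the ensemble average.
preds, logws = [], []
for degree in range(1, 8):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    logws.append(-0.5 * n * np.log(mse) - 0.5 * (degree + 1) * np.log(n))
    preds.append(np.polyval(coeffs, x_new))

logws = np.array(logws) - max(logws)     # stabilize before exponentiating
weights = np.exp(logws) / np.exp(logws).sum()
ensemble_pred = float(np.dot(weights, preds))
print("ensemble prediction at x = 0.5:", round(ensemble_pred, 3))
```

Instead of committing to the single most probable polynomial, the ensemble hedges across degrees, which is the corollary of the previous slide in executable form.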
25
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

26
Model comparison
  • Posterior over models: P(H|D) ∝ P(D|H) P(H)
  • Evidence: P(D|H) = Σ_h P(D|h, H) P(h|H)
27
Statistics: Bayes vs. Fisher
  • Fisher: maximum Likelihood, max_h P(D|h)
  • Bayes: maximum Evidence, max_H P(D|H)

28
Historical outlook
  • 1920s to 1960s
  • Parametric statistics
  • Asymptotic: N → ∞
  • 1960s to 1980s
  • Non-parametric statistics
  • Regularization of ill-posed problems
  • Non-asymptotic learning
  • Algorithmic complexity
  • Statistical physics of disordered systems

Fisher (1912)
Chentsov (1962)
Tikhonov (1963)
Vapnik (1968)
Kolmogorov (1965)
Gardner (1988)
29
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

30
Statistical physics
  • Probability of a hypothesis ↔ microstate
  • Optimal model ↔ macrostate

31
Free energy
  • F = −log Z: a log of a sum (hard to compute directly)
  • F = E − TS: a sum of logs (tractable)

32
EM algorithm. Main idea
  • Introduce an independent trial distribution P
  • Iterate
  • E-step
  • M-step

33
EM algorithm
  • E-step
  • Estimate the Posterior for the given Model
  • M-step
  • Update the Model for the given Posterior
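The two steps can be sketched for the simplest case, a two-component one-dimensional Gaussian mixture; unit variances and equal mixing weights are simplifying assumptions for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)
# Two well-separated 1-D clusters
data = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])

mu = np.array([-1.0, 1.0])   # initial guesses for the means
for _ in range(30):
    # E-step: posterior responsibility of each component for each point
    logp = -0.5 * (data[:, None] - mu[None, :]) ** 2
    resp = np.exp(logp - logp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update the means given the responsibilities
    mu = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)

print("estimated means:", np.round(np.sort(mu), 2))
```

Each iteration alternates a posterior estimate (E) with a model update (M), monotonically decreasing the free energy of the previous slide.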

34
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

35
Bayesian regularization Examples
  • Hypothesis testing
  • Function approximation
  • Data clustering

36
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

37
Hypothesis testing
  • Problem
  • Noisy observations y = h + ε
  • Is the theoretical value h0 true?
  • Model H

Gaussian noise
Gaussian prior
38
Optimal model: phase transition
  • Confidence: finite vs. infinite (two phases)

39
Threshold effect
  • Student's t coefficient
  • Below threshold: hypothesis h0 is true
  • Above threshold: corrections to h0
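A minimal numerical sketch of the test statistic; the sample size, noise level, and seed are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
h0 = 0.0                                  # theoretical value under test
y = h0 + 0.5 * rng.standard_normal(25)    # noisy observations of h0

# Student's t statistic: how many standard errors the sample mean
# lies from the theoretical value h0
t = float((y.mean() - h0) / (y.std(ddof=1) / np.sqrt(y.size)))
print("t statistic:", round(t, 2))
# small |t|: keep h0; large |t|: corrections to h0 are warranted
```

The threshold effect is visible here: the decision changes discontinuously as |t| crosses the critical value, rather than degrading smoothly.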

40
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

41
Function approximation
  • Problem
  • Noisy data y = f(x) + ε
  • Find approximation h(x)
  • Model

Noise
Prior
42
Optimal model
  • Free energy minimization

43
Saddle point approximation
  • Function of best hypothesis

44
EM learning
  • E-step. Optimal hypothesis
  • M-step. Optimal regularization

45
Laplace Prior
  • Pruned weights
  • Equisensitive weights

46
Laplace regularization
  • E-step. Weight estimation
  • M-step. Regularization
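The weight-pruning effect of the Laplace prior can be sketched with L1-regularized regression solved by proximal gradient descent (ISTA); the data, the penalty strength, and the step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((80, 10))
w_true = np.zeros(10)
w_true[[0, 3]] = [2.0, -1.5]              # only two relevant weights
y = X @ w_true + 0.1 * rng.standard_normal(80)

# A Laplace prior adds lam * ||w||_1 to the learning error; the
# proximal (soft-threshold) step prunes irrelevant weights to zero.
lam = 5.0
lr = 1.0 / np.linalg.norm(X, 2) ** 2      # step size from the Lipschitz constant
w = np.zeros(10)
for _ in range(500):
    g = X.T @ (X @ w - y)                 # gradient of the squared error
    w = w - lr * g
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft threshold

print("pruned weights:", np.round(w, 2))
```

The eight irrelevant weights are driven exactly to zero, matching the "pruned weights" behavior the Laplace prior is introduced for.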

47
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

48
Clustering
  • Problem
  • Noisy data x
  • Find prototypes (mixture density approximation)
  • How many clusters?

Noise
49
Optimal model
  • Free energy minimization
  • Iterations
  • E-step
  • M-step

50
EM algorithm
  • E-step
  • M-step

51
How many clusters?
  • Number of clusters M
  • Optimal number of clusters
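One way to make the question concrete: fit mixtures with different numbers of components M and compare a penalized negative log-likelihood, a BIC-style stand-in for the free-energy criterion. The synthetic data and the unit-variance mixture model are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(8)
# Three true clusters
data = np.concatenate([rng.normal(c, 0.5, 100) for c in (-4.0, 0.0, 4.0)])
n = data.size

def fit_and_score(M, iters=50):
    """EM fit of M unit-variance components, scored by -loglik + penalty."""
    mu = np.quantile(data, (np.arange(M) + 0.5) / M)   # spread-out init
    for _ in range(iters):
        # E-step: responsibilities under unit-variance components
        logp = -0.5 * (data[:, None] - mu[None, :]) ** 2
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update the means
        mu = (resp * data[:, None]).sum(axis=0) / resp.sum(axis=0)
    logp = -0.5 * (data[:, None] - mu[None, :]) ** 2
    loglik = np.log(np.exp(logp).mean(axis=1)).sum()
    return -loglik + 0.5 * M * np.log(n)              # BIC-style penalty

scores = {M: fit_and_score(M) for M in range(1, 7)}
best_M = min(scores, key=scores.get)
print("optimal number of clusters:", best_M)
```

Too few components pay in likelihood, too many pay in the complexity penalty, so the score is minimized at the true cluster count.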

52
Simulations: uniform data
  • Optimal model

[Plot: free energy vs. number of clusters M]
53
Simulations: Gaussian data
  • Optimal model

[Plot: free energy, about −12.5 to −9.5, vs. number of clusters M, 0 to 50]
54
Simulations: Gaussian mixture
  • Optimal model

[Plot: free energy vs. number of clusters M]
55
Outline
  • Learning as ill-posed problem
  • General problem: data generalization
  • General remedy: model regularization
  • Bayesian regularization. Theory
  • Hypothesis comparison
  • Model comparison
  • Free energy. EM algorithm
  • Bayesian regularization. Practice
  • Hypothesis testing
  • Function approximation
  • Data clustering

56
Summary
  • Learning
  • An ill-posed problem
  • Remedy: regularization
  • Bayesian learning
  • Built-in regularization (model assumptions)
  • Optimal model: minimal description length = minimal free energy
  • Practical issues
  • Learning algorithms with built-in optimal regularization, derived from first principles (as opposed to cross-validation)