Find the hypothesis h that generated the observed data D, within model H
The problem is well-posed if the solution is not sensitive to:
noise in the data (Hadamard)
the learning procedure (Tikhonov)

Learning is an ill-posed problem
Example: function approximation
Sensitive to noise in the data
Sensitive to the learning procedure
Solution is non-unique
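The instability can be seen in a small numerical sketch (my own example, not from the slides): interpolating a few noisy samples of a smooth function with a high-degree polynomial amplifies tiny perturbations of the data into huge changes of the solution.

```python
import numpy as np

# Sketch: interpolate 8 noisy samples of sin(2*pi*x) with a degree-7
# polynomial. Two tiny noise realizations yield wildly different
# coefficient vectors -- the inverse problem is unstable in
# Hadamard's sense.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x)

c1 = np.polyfit(x, y + 1e-3 * rng.standard_normal(8), 7)
c2 = np.polyfit(x, y + 1e-3 * rng.standard_normal(8), 7)

# The change in coefficients dwarfs the 1e-3 noise that caused it.
print(np.max(np.abs(c1 - c2)))
```

The Vandermonde system behind the fit is badly conditioned, so the noise, not the signal, dominates the recovered coefficients.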
Outline
Learning as an ill-posed problem
  General problem: data generalization
  General remedy: model regularization
Bayesian regularization: theory
  Hypothesis comparison
  Model comparison
  Free energy and the EM algorithm
Bayesian regularization: practice
  Hypothesis testing
  Function approximation
  Data clustering
Problem regularization
Main idea: restrict the set of admissible solutions, sacrificing precision for stability
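Restriction via an L2 (ridge / Tikhonov) penalty can be sketched on the same noisy interpolation problem (my own illustration; the penalty strength `lam` is a hypothetical choice):

```python
import numpy as np

# Sketch: the same degree-7 fit, but with an L2 penalty on the
# coefficients. The restricted solution trades a little precision
# for a lot of stability under data noise.
def ridge_polyfit(x, y, degree, lam):
    V = np.vander(x, degree + 1)             # monomial design matrix
    A = V.T @ V + lam * np.eye(degree + 1)   # regularized normal equations
    return np.linalg.solve(A, V.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x)
c1 = ridge_polyfit(x, y + 1e-3 * rng.standard_normal(8), 7, lam=1e-2)
c2 = ridge_polyfit(x, y + 1e-3 * rng.standard_normal(8), 7, lam=1e-2)
print(np.max(np.abs(c1 - c2)))  # small: the noise no longer blows up
```

The penalty bounds the amplification of noise by roughly 1/(2*sqrt(lam)), so the coefficients now move on the same scale as the noise itself.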
How to choose?

Statistical learning in practice
Data → learning (training) set + validation set
Cross-validation
Systematic approach to ensembles → Bayes
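The train/validation split and cross-validation can be sketched as follows (hypothetical setup of my own; the candidate penalties in `lams` are illustrative):

```python
import numpy as np

# Sketch: choose the regularization strength by K-fold cross-validation:
# train a ridge polynomial fit on K-1 folds, score it on the held-out
# fold, and keep the lambda with the lowest average validation error.
def cv_error(x, y, lam, K=4):
    idx = np.arange(len(x))
    err = 0.0
    for fold in np.array_split(idx, K):
        train = np.setdiff1d(idx, fold)
        V = np.vander(x[train], 6)           # degree-5 fit
        c = np.linalg.solve(V.T @ V + lam * np.eye(6), V.T @ y[train])
        err += np.mean((np.vander(x[fold], 6) @ c - y[fold]) ** 2)
    return err / K

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 24))
y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(24)
lams = [1e-6, 1e-3, 1e0]
best = min(lams, key=lambda lam: cv_error(x, y, lam))
print(best)
```

Each data point serves once as validation data, so the selected penalty is judged on points the fit never saw.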
Statistical learning theory
Learning as inverse probability
Probability theory: (H, h) → D
Learning theory: (H, h) ← D
Bernoulli (1713), Bayes (1750)

Bayesian learning
Posterior = Prior × Likelihood / Evidence:
P(h | D, H) = P(D | h, H) P(h | H) / P(D | H)

Coin tossing game

Monte Carlo simulations

Bayesian regularization
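The coin-tossing game above can be sketched in a few lines (a conjugate-prior version of my own choosing): a Bernoulli likelihood for the tosses and a uniform Beta(1,1) prior on the bias h, so Bayes' rule gives the posterior in closed form.

```python
# Sketch of the coin-tossing game: Bernoulli likelihood, uniform
# Beta(1,1) prior on the bias h. The posterior after k heads in
# n tosses is Beta(a + k, b + n - k).
def posterior_params(tosses, a=1.0, b=1.0):
    heads = sum(tosses)
    return a + heads, b + len(tosses) - heads

tosses = [1, 1, 0, 1, 0, 1, 1, 1]   # 6 heads in 8 tosses
a, b = posterior_params(tosses)
print(a, b, a / (a + b))            # posterior mean = 0.7
```

The evidence never needs to be computed explicitly here: conjugacy updates the prior's parameters directly, which is what makes this game a clean illustration of Bayesian learning.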
Most probable hypothesis: maximize P(h | D, H), i.e. minimize
−log P(D | h, H) − log P(h | H) = learning error + regularization
Example: function approximation

Minimal Description Length, Rissanen (1978)
The most probable hypothesis gives the shortest code length for the data and the hypothesis
Example: optimal prefix code

Data complexity
Complexity K(D | H) = min over h of L(h, D | H), Kolmogorov (1965)
Code length L(h, D): coded data L(D | h) + decoding program L(h)
Decoding: program h reproduces data D

Complex = unpredictable, Solomonoff (1978)
Prediction error: L(h, D) / L(D)
Random data is incompressible
Compression implies predictability
Example: block coding. A program h of length L(h, D) decodes to the data D.

Universal prior
All 2^L programs of length L are equiprobable: P(h | H) ∝ 2^−L(h)
Data complexity: P(D | H) ∝ 2^−K(D | H)
Solomonoff (1960), Bayes (1750)

Statistical ensemble
The ensemble achieves a shorter description length than any single hypothesis
Proof
Corollary: ensemble predictions are superior to the most probable prediction
Ensemble prediction
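The ensemble-vs-MAP contrast can be sketched for the coin example (my own discretized setup: a grid of biases with a uniform prior):

```python
import numpy as np

# Sketch: posterior-weighted ensemble prediction vs. the single most
# probable hypothesis, for 6 heads in 8 tosses, over a grid of biases.
h = np.linspace(0.01, 0.99, 99)
heads, n = 6, 8
log_like = heads * np.log(h) + (n - heads) * np.log(1 - h)
post = np.exp(log_like - log_like.max())
post /= post.sum()                 # normalized posterior on the grid

map_pred = h[np.argmax(post)]      # single best hypothesis: 0.75
ensemble_pred = np.sum(post * h)   # posterior-mean prediction, ~0.7
print(map_pred, ensemble_pred)
```

The ensemble prediction pulls the raw frequency 6/8 toward the prior, which is exactly the smoothing that makes averaged predictions more robust than the single most probable hypothesis.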
Model comparison
Posterior: P(H | D) ∝ P(D | H) P(H)
Evidence: P(D | H) = Σ over h of P(D | h, H) P(h | H)

Statistics: Bayes vs. Fisher
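Model comparison via the evidence can be sketched on a standard coin example (my numbers, not from the slides): a fair-coin model versus an unknown-bias model with a uniform prior.

```python
from math import comb

# Sketch: Bayesian model comparison by evidence P(D | H).
# For a specific sequence with k heads in n tosses:
#   H0, fair coin:       P(D | H0) = 2 ** -n
#   H1, uniform prior:   P(D | H1) = integral h^k (1-h)^(n-k) dh
#                                  = 1 / ((n + 1) * C(n, k))
n, k = 100, 75
ev_fair = 0.5 ** n
ev_biased = 1.0 / ((n + 1) * comb(n, k))
print(ev_biased / ev_fair)   # Bayes factor: favours the biased model
```

The evidence integrates over all biases, so the comparison automatically penalizes the extra flexibility of H1; here the data are skewed enough that H1 wins anyway.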