A(n) (extremely) brief/crude introduction to minimum description length principle

Provided by: jian60

Learn more at: http://www.gersteinlab.org

1
A(n) (extremely) brief/crude introduction to
minimum description length principle

jdu
2006-04

2
Outline

Conceptual/non-technical introduction
Probabilities and Codelengths
Crude MDL
Refined MDL
Other topics

4
Introduction

Example: data compression
Description methods

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

5
Introduction

Example: regression
Model selection and overfitting
Complexity of the model vs. goodness of fit

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

6
Introduction

Models vs. Hypotheses

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

7
Introduction

Crude 2-part version of MDL

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

8
Outline

Conceptual/non-technical introduction
Probabilities and Codelengths
Crude MDL
Refined MDL
Other topics

9
Probabilities and Codelengths

Let $X$ be a finite or countable set
A code $C(x)$ for $X$:
a 1-to-1 mapping from $X$ to $\bigcup_{n>0} \{0,1\}^n$
$L_C(x)$: number of bits needed to encode $x$ using $C$
$P$: a probability distribution defined on $X$
$P(x)$: the probability of $x$
A sequence of (usually i.i.d.) observations $x_1, x_2, \ldots, x_n$,
abbreviated as $x^n$

10
Probabilities and Codelengths

Prefix codes as examples of uniquely decodable
codes
no code word is a prefix of any other

a 0
b 111
c 1011
d 1010
r 110
! 100
Source: http://www.cs.princeton.edu/courses/archive/spring04/cos126/
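
The prefix property is what lets a decoder read a bit stream left to right without separators. A minimal Python sketch of that idea, using the code table from this slide; the function names `is_prefix_free`, `encode` and `decode` are illustrative and not part of the original presentation:

```python
# Code table copied from the slide above.
CODE = {"a": "0", "b": "111", "c": "1011", "d": "1010", "r": "110", "!": "100"}


def is_prefix_free(code):
    """Return True if no code word is a prefix of a different code word."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)


def encode(text, code=CODE):
    """Concatenate the code words of the symbols in text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code=CODE):
    """Decode left to right; a prefix code needs no separators."""
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:  # a complete code word has been read
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)
```

With this table, `encode("abracadabra!")` uses 28 bits, and `decode` recovers the original string from them.
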
11
Probabilities and Codelengths

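Expected codelength of a code $C$: $E_P[L_C(X)] = \sum_{x} P(x) L_C(x)$
Lower bound: the entropy $H(P) = -\sum_{x} P(x) \log_2 P(x)$
Optimal code:
a code is optimal if it has minimum expected codelength over all
uniquely decodable codes
How to design one given $P$?
Huffman coding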

12
Probabilities and Codelengths

Huffman coding

Source: http://star.itc.it/caprile/teaching/algebra-superiore-2001/
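
The slide's figure did not survive in this transcript. As a stand-in, here is a small Python sketch of the standard Huffman construction (repeatedly merge the two least probable subtrees), together with helpers for the expected codelength and the entropy. The function names and the dict-based representation are choices made for this example, not taken from the slides.

```python
import heapq
import math


def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    if len(probs) == 1:  # degenerate case: a single symbol still gets one bit
        return {sym: "0" for sym in probs}
    # Heap entries: (probability, tie-breaker, {symbol: partial code word}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # the two least probable subtrees
        p1, _, right = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in left.items()}
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]


def expected_length(probs, code):
    """Expected number of bits per symbol under probs."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs):
    """Entropy of probs in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)
```

For the dyadic distribution `{"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}` the resulting code words have lengths 1, 2, 3 and 3, and the expected codelength equals the entropy, 1.75 bits.
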
13
Probabilities and Codelengths

How to design a code for $\{1, 2, \ldots, M\}$?
Assuming a uniform distribution, $1/M$ for each
number:
$\log M$ bits per number (rounded up to $\lceil \log_2 M \rceil$ in practice)

14
Probabilities and Codelengths

How to design a code for all the positive integers?
For each $k$:
Describe it with $\lceil \log k \rceil$ 0s
Followed by a 1
Then encode $k$ using the uniform code for $\{1, \ldots, 2^{\lceil \log k \rceil}\}$
In total, $2\lceil \log k \rceil + 1$ bits
Can be refined
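
The construction on this slide can be sketched in Python as follows. One detail is an assumption of this example: the uniform code for $\{1, \ldots, 2^m\}$ is taken to be $k - 1$ written in binary with exactly $m$ bits. The names `encode_int` and `decode_int` are illustrative.

```python
def encode_int(k):
    """Encode a positive integer k with 2 * ceil(log2 k) + 1 bits."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    m = (k - 1).bit_length()  # equals ceil(log2 k) for every k >= 1
    # Assumed uniform code for {1, ..., 2**m}: k - 1 written with exactly m bits.
    body = format(k - 1, "b").zfill(m) if m > 0 else ""
    return "0" * m + "1" + body


def decode_int(bits):
    """Decode one integer from the front of bits; return (k, remaining bits)."""
    m = bits.index("1")  # number of leading zeros announces the body length
    body = bits[m + 1 : 2 * m + 1]
    if len(body) < m:
        raise ValueError("truncated code word")
    k = (int(body, 2) if m > 0 else 0) + 1
    return k, bits[2 * m + 1 :]
```

Because the run of zeros announces how many bits follow, code words can be concatenated and still decoded one after another: `decode_int(encode_int(3) + encode_int(7))` returns 3 together with the untouched code word for 7.
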

15
Probabilities and Codelengths

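Let $P$ be a probability distribution over $X$; then
there exists a code $C$ for $X$ such that
$L_C(x) = \lceil -\log P(x) \rceil$ for all $x \in X$
Let $C$ be a uniquely decodable code over $X$; then
there exists a (possibly defective) probability distribution $P$ such
that $-\log P(x) = L_C(x)$ for all $x \in X$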
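
Both directions of this correspondence can be checked numerically. The sketch below assumes base-2 logarithms and uses the Kraft inequality (the sum of $2^{-L(x)}$ over all $x$ is at most 1 for any uniquely decodable code), which the slides do not name explicitly; the helper names are hypothetical.

```python
import math


def shannon_lengths(probs):
    """Codelengths L(x) = ceil(-log2 P(x)) for a dict {x: P(x)}."""
    return {x: math.ceil(-math.log2(p)) for x, p in probs.items() if p > 0}


def kraft_sum(lengths):
    """Sum of 2**(-L(x)); at most 1 for any uniquely decodable code."""
    return sum(2.0 ** -length for length in lengths.values())


def implied_distribution(lengths):
    """P(x) = 2**(-L(x)); sums to less than 1 (defective) if the code wastes bits."""
    return {x: 2.0 ** -length for x, length in lengths.items()}
```

For example, the codelengths of the prefix code shown earlier for a, b, c, d, r and ! (1, 3, 4, 4, 3 and 3 bits) give a Kraft sum of exactly 1, so the implied distribution is a proper one.
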

16
Probabilities and Codelengths

Codelength revisited

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

17
Outline

Conceptual/non-technical introduction
Probabilities and Codelengths
Crude MDL
Refined MDL
Other topics

18
Crude MDL

Preliminary: $k$-th order Markov chain on $X = \{0,1\}$
A sequence $X_1, X_2, \ldots, X_N$
Special case: 0-th order, the Bernoulli model (biased
coin)
Maximum likelihood estimator: the observed fraction of 1s

19
Crude MDL

Preliminary: $k$-th order Markov chain on $X = \{0,1\}$
Special case: first-order Markov chain $B^{(1)}$
MLE

20
Crude MDL

Preliminary: $k$-th order Markov chain on $X = \{0,1\}$
$2^k$ parameters
$\theta_{1|000000} = n_{1000000} / n_{000000}$
$\theta_{1|000001}$
...
$\theta_{1|111110}$
$\theta_{1|111111}$
Log-likelihood function
MLE
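
The counting estimator and the log-likelihood can be sketched in Python as below. Two simplifications are assumptions of this example rather than statements from the slides: the data are a string of "0" and "1" characters, and the first $k$ symbols serve only as context, so the likelihood is conditional on them. Function names are illustrative.

```python
import math
from collections import Counter


def markov_mle(x, k):
    """MLE of P(next = 1 | previous k symbols) for a 0/1 string x.

    Returns (theta, counts): theta maps each observed context to the
    estimated probability of a 1; counts maps (context, symbol) pairs to
    their number of occurrences.
    """
    counts = Counter((x[i - k:i], x[i]) for i in range(k, len(x)))
    theta = {}
    for ctx in {c for c, _ in counts}:
        n1, n0 = counts[(ctx, "1")], counts[(ctx, "0")]
        theta[ctx] = n1 / (n0 + n1)
    return theta, counts


def log_likelihood(theta, counts):
    """log2 P(x[k:] | x[:k], theta), summed over the observed pairs only."""
    total = 0.0
    for (ctx, sym), n in counts.items():
        p = theta[ctx] if sym == "1" else 1.0 - theta[ctx]
        total += n * math.log2(p)
    return total
```

For the alternating string `"0101010101"`, a first-order chain predicts every symbol with certainty (log-likelihood 0), while the 0-th order model has `theta[""] == 0.5` and a log-likelihood of -10 bits.
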

21
Crude MDL

Question: Given data $D = x^n$, find the Markov chain
that best explains $D$.
We do not want to restrict ourselves to chains of
fixed order
How to avoid overfitting?
Obviously, an $(n-1)$-th order Markov model would
always fit the data best

22
Crude MDL

Two-part MDL revisited

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

23
Crude MDL

Description length of data given hypothesis: $L(D \mid H) = -\log P(D \mid H)$

24
Crude MDL

Description length of hypothesis
The code should not change with the sample size
n.
Different codes will lead to preferences of
different hypotheses
How to design a code that
Leads to good inferences with small, practically
relevant sample sizes?

25
Crude MDL

An "intuitive" and "reasonable" code for the $k$-th
order Markov chain
First describe $k$ using $2\lceil \log k \rceil + 1$ bits
Then describe the $d = 2^k$ parameters
Assume $n$ is given in advance
For each $\theta$ in the MLE $(\theta_{1|000000}, \ldots,
\theta_{1|111111})$, the best precision we can
achieve by counting is $1/(n+1)$
Describe each $\theta$ with $\log(n+1)$ bits
$L(H) = 2\lceil \log k \rceil + 1 + d \log(n+1)$
$L(H) + L(D \mid H) = 2\lceil \log k \rceil + 1 + d \log(n+1) - \log P(D \mid k, \theta)$
For a given $k$, only the MLE $\theta$ needs to be
considered
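
The total codelength above can be turned into a small order-selection routine. This is a sketch under stated assumptions, not the presenter's implementation: the data are a 0/1 string, the likelihood is conditional on the first $k$ symbols, and $k + 1$ is encoded in place of $k$ so that $k = 0$ gets a finite codelength. All function names are hypothetical.

```python
import math
from collections import Counter


def neg_log_likelihood(x, k):
    """-log2 of the maximized likelihood of x[k:] under a k-th order chain."""
    counts = Counter((x[i - k:i], x[i]) for i in range(k, len(x)))
    ctx_totals = Counter()
    for (ctx, _), n in counts.items():
        ctx_totals[ctx] += n
    # At the MLE, P(sym | ctx) = count(ctx, sym) / count(ctx).
    return -sum(n * math.log2(n / ctx_totals[ctx])
                for (ctx, _), n in counts.items())


def two_part_length(x, k):
    """L(H) + L(D|H) in bits for the k-th order chain fitted to x."""
    n = len(x)
    d = 2 ** k
    # Assumption: encode k + 1 rather than k, so that k = 0 is allowed.
    # k.bit_length() equals ceil(log2(k + 1)) for every k >= 0.
    l_order = 2 * k.bit_length() + 1
    l_hypothesis = l_order + d * math.log2(n + 1)
    return l_hypothesis + neg_log_likelihood(x, k)


def select_order(x, max_k):
    """Order k in 0..max_k with the smallest two-part codelength."""
    return min(range(max_k + 1), key=lambda k: two_part_length(x, k))
```

On the alternating string `"01" * 50` this selects order 1, and on `"0011" * 50` it selects order 2. In both cases a higher order fits the data equally well but pays more bits for its parameters.
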

26
Crude MDL

Good news
We have found a principled manner to encode data
D using H
Bad news
We have not found clear guidelines to design
codes for H

27
Outline

Conceptual/non-technical introduction
Probabilities and Codelengths
Crude MDL
Refined MDL
Other topics

28
Refined MDL

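Universal codes and universal distributions
The maximum likelihood code depends on the data
How to describe the data in an unambiguous
manner?
Design a code such that, for every possible
observation, its codelength corresponds to its
maximized likelihood? Impossible: summed over all
possible observations, the maximized likelihoods
generally exceed 1, so they do not define a code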

29
Refined MDL

Worst-case regret
Optimal universal model

30
Refined MDL

Normalized maximum likelihood (NML)
Minimizing $-\log \mathrm{NML}$
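
For the Bernoulli model the NML distribution can be computed exactly. The sketch below assumes the usual definition: each sequence's maximized likelihood is divided by the sum of maximized likelihoods over all sequences of the same length, and the base-2 logarithm of that sum is the model complexity. The function names are illustrative.

```python
import math


def bernoulli_max_log_lik(n1, n):
    """log2 of the maximized Bernoulli likelihood of a sequence with n1 ones."""
    ll = 0.0
    for c in (n1, n - n1):
        if c > 0:  # a count of zero contributes nothing (0 * log 0 = 0)
            ll += c * math.log2(c / n)
    return ll


def bernoulli_complexity(n):
    """log2 of the sum of maximized likelihoods over all 2**n sequences."""
    # Sequences with the same number of ones share one maximized likelihood,
    # so the sum runs over counts instead of over sequences.
    total = sum(math.comb(n, j) * 2.0 ** bernoulli_max_log_lik(j, n)
                for j in range(n + 1))
    return math.log2(total)


def nml_codelength(x):
    """-log2 of the NML probability of a 0/1 string x under the Bernoulli model."""
    n, n1 = len(x), x.count("1")
    return -bernoulli_max_log_lik(n1, n) + bernoulli_complexity(n)
```

The codelength splits into a fit term and a complexity term that depends only on the sample size, so every sequence pays the same number of extra bits relative to its own maximum likelihood codelength. For $n = 2$ the complexity is $\log_2 2.5$, about 1.32 bits.
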

31
Refined MDL

Complexity of a model
The more sequences that can be fit well by an
element of $M$, the larger $M$'s complexity
Would it lead to the right balance between
complexity and fit?
Hopefully

32
Refined MDL

General refined MDL

Source: Grünwald et al. (2005), Advances in Minimum
Description Length Theory and Applications.

33
Outline

Conceptual/non-technical introduction
Probabilities and Codelengths
Crude MDL
Refined MDL
Other topics

34
Other topics

Mixture code
Resolvability

35
References

Barron, A., Rissanen, J. & Yu, B. (1998), 'The
minimum description length principle in coding
and modeling', IEEE Transactions on Information
Theory 44(6), 2743-2760.
Grünwald, P.D., Myung, I.J. & Pitt, M.A. (2005),
Advances in Minimum Description Length Theory
and Applications (Neural Information Processing),
The MIT Press.
Hall, P. & Hannan, E.J. (1988), 'On stochastic
complexity and nonparametric density estimation',
Biometrika 75(4), 705-714.