A(n) (extremely) brief/crude introduction to minimum description length principle

Slides: 36
Provided by: jian60

1
A(n) (extremely) brief/crude introduction to
minimum description length principle
  • jdu
  • 2006-04

2
Outline
  • Conceptual/non-technical introduction
  • Probabilities and Codelengths
  • Crude MDL
  • Refined MDL
  • Other topics

3
Outline
  • Conceptual/non-technical introduction
  • Probabilities and Codelengths
  • Crude MDL
  • Refined MDL
  • Other topics

4
Introduction
  • Example: data compression
  • Description methods

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
5
Introduction
  • Example: regression
  • Model selection and overfitting
  • Complexity of the model vs. goodness of fit

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
6
Introduction
  • Models vs. Hypotheses

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
7
Introduction
  • Crude 2-part version of MDL

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
8
Outline
  • Conceptual/non-technical introduction
  • Probabilities and Codelengths
  • Crude MDL
  • Refined MDL
  • Other topics

9
Probabilities and Codelengths
  • Let X be a finite or countable set
  • A code C(x) for X
  • A 1-to-1 mapping from X to $\bigcup_{n>0} \{0,1\}^n$
  • $L_C(x)$: number of bits needed to encode x using C
  • P: a probability distribution defined on X
  • P(x): the probability of x
  • A sequence of (usually i.i.d.) observations $x_1, x_2, \ldots, x_n$, written $x^n$

10
Probabilities and Codelengths
  • Prefix codes as examples of uniquely decodable
    codes
  • No codeword is a prefix of any other

a 0
b 111
c 1011
d 1010
r 110
! 100
Source: http://www.cs.princeton.edu/courses/archive/spring04/cos126/
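The prefix property of the table above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the function names are illustrative, and the test string "abracadabra!" is only an assumption about what the source example encodes.

```python
# Prefix code from the slide (symbol -> codeword).
CODE = {"a": "0", "b": "111", "c": "1011", "d": "1010", "r": "110", "!": "100"}


def is_prefix_free(code):
    """Return True if no codeword is a prefix of a different codeword."""
    words = list(code.values())
    return not any(u != v and v.startswith(u) for u in words for v in words)


def encode(text, code):
    """Concatenate the codewords of the symbols in text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decode left to right; prefix-freeness makes each match unambiguous."""
    inverse = {word: sym for sym, word in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)
```

Because no codeword is a prefix of another, the decoder never needs separators or lookahead: the first codeword that matches is the only one that can match.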
11
Probabilities and Codelengths
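  • Expected codelength of a code C: $E_P[L_C(X)] = \sum_{x \in X} P(x) L_C(x)$
  • Lower bound: the entropy $H(P) = -\sum_{x \in X} P(x) \log P(x)$
  • Optimal code: a code is optimal if it has minimum expected codelength over all uniquely decodable codes
  • How to design one given P?
  • Huffman coding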

12
Probabilities and Codelengths
  • Huffman coding

Source: http://star.itc.it/caprile/teaching/algebra-superiore-2001/
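The figure for this slide did not survive extraction. As a stand-in, here is a minimal sketch of Huffman's construction: repeatedly merge the two least probable subtrees. The function names and the example distributions in the usage note are assumptions for illustration, not taken from the slides.

```python
import heapq
import math


def huffman_code(probs):
    """Build a binary Huffman code for a dict symbol -> probability."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)  # least probable subtree
        p1, _, c1 = heapq.heappop(heap)  # second least probable subtree
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]


def expected_length(probs, code):
    """Expected codelength in bits under probs."""
    return sum(p * len(code[s]) for s, p in probs.items())


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)
```

For a dyadic distribution such as {a: 1/2, b: 1/4, c: 1/8, d: 1/8} the expected codelength should equal the entropy exactly; for other distributions it should stay within one bit of it.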
13
Probabilities and Codelengths
  • How to design a code for {1, 2, ..., M}?
  • Assume a uniform distribution, 1/M for each number
  • $\log M$ bits per number, rounded up to a whole number of bits

14
Probabilities and Codelengths
  • How to design a code for all the positive integers?
  • For each k:
  • Describe it with $\lceil \log k \rceil$ 0s
  • Followed by a 1
  • Then encode k using the uniform code for $\{1, \ldots, 2^{\lceil \log k \rceil}\}$
  • In total, $2\lceil \log k \rceil + 1$ bits
  • Can be refined
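A minimal sketch of the two codes from the last two slides, assuming the integer code is the usual construction: a unary length prefix, a 1 as delimiter, then a fixed-width uniform code. All function names are illustrative.

```python
def ceil_log2(k):
    """Return ceil(log2(k)) for an integer k >= 1, without floating point."""
    return (k - 1).bit_length()


def uniform_encode(i, m):
    """Encode i in {1, ..., m} with a fixed ceil(log2(m)) bits."""
    if not 1 <= i <= m:
        raise ValueError("i must lie in 1..m")
    width = ceil_log2(m)
    return format(i - 1, "b").zfill(width) if width else ""


def integer_encode(k):
    """Encode a positive integer k with 2*ceil(log2(k)) + 1 bits."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    width = ceil_log2(k)
    # width zeros, a 1, then k in the uniform code for {1, ..., 2^width}.
    return "0" * width + "1" + uniform_encode(k, 2 ** width)


def integer_decode(bits):
    """Decode one integer from the front of bits; return (k, remaining bits)."""
    width = bits.index("1")  # number of leading 0s
    start = width + 1
    payload = bits[start:start + width]
    if len(payload) < width:
        raise ValueError("truncated codeword")
    k = int(payload, 2) + 1 if width else 1
    return k, bits[start + width:]
```

The leading zeros tell the decoder how many payload bits follow, so codewords can be concatenated and still split apart unambiguously.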

15
Probabilities and Codelengths
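  • Let P be a probability distribution over X; then there exists a code C for X such that $L_C(x) = \lceil -\log P(x) \rceil$ for all x in X
  • Let C be a uniquely decodable code over X; then there exists a probability distribution P (possibly summing to less than 1) such that $-\log P(x) = L_C(x)$ for all x in X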
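The correspondence between distributions and codelengths can be illustrated numerically. This sketch assumes the standard Kraft inequality (a prefix code with lengths L(x) exists whenever the sum of 2^(-L(x)) is at most 1); the function names and the example distribution in the usage note are assumptions, not from the slides.

```python
import math


def shannon_lengths(probs):
    """Codelengths L(x) = ceil(-log2 P(x)) for a dict symbol -> probability."""
    return {s: math.ceil(-math.log2(p)) for s, p in probs.items() if p > 0}


def kraft_sum(lengths):
    """Sum of 2^(-L(x)); a prefix code with these lengths exists if it is <= 1."""
    return sum(2.0 ** -l for l in lengths.values())


def code_to_distribution(lengths):
    """Distribution induced by codelengths: P(x) = 2^(-L(x)).

    The result may sum to less than 1 when the code wastes codewords.
    """
    return {s: 2.0 ** -l for s, l in lengths.items()}
```

Going from P to lengths, {x: 0.4, y: 0.3, z: 0.2, w: 0.1} gives lengths 2, 2, 3, 4 with Kraft sum below 1. Going the other way, the lengths 1, 3, 4, 4, 3, 3 of the prefix code for a, b, c, d, r, ! shown earlier induce a distribution that sums to exactly 1.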

16
Probabilities and Codelengths
  • Codelength revisited

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
17
Outline
  • Conceptual/non-technical introduction
  • Probabilities and Codelengths
  • Crude MDL
  • Refined MDL
  • Other topics

18
Crude MDL
  • Preliminary: k-th order Markov chain on $X = \{0,1\}$
  • A sequence $X_1, X_2, \ldots, X_N$
  • Special case: 0-th order, the Bernoulli model (biased coin)
  • Maximum likelihood estimator: the observed fraction of 1s in the sequence

19
Crude MDL
  • Preliminary: k-th order Markov chain on $X = \{0,1\}$
  • Special case: first-order Markov chain $B^{(1)}$
  • MLE

20
Crude MDL
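  • Preliminary: k-th order Markov chain on $X = \{0,1\}$
  • $2^k$ parameters (shown here for k = 6)
  • $\theta_{1|000000} = n_{1|000000} / n_{000000}$, where $n_{1|000000}$ counts the 1s that follow the context 000000 and $n_{000000}$ counts the occurrences of that context
  • $\theta_{1|000001}$
  • $\ldots$
  • $\theta_{1|111110}$
  • $\theta_{1|111111}$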
  • Log likelihood function
  • MLE
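A minimal sketch of the counting-based MLE and the log-likelihood for a k-th order binary Markov chain. One assumption not stated on the slides: the sequence is treated as if preceded by k zeros, so that every symbol has a full-length context. Function names are illustrative.

```python
import math
from collections import Counter


def transition_counts(bits, k):
    """Count (context, next symbol) pairs for a k-th order chain.

    Assumption: the sequence is preceded by k zeros, so every symbol
    has a context of length k.
    """
    padded = "0" * k + bits
    return Counter((padded[i:i + k], padded[i + k]) for i in range(len(bits)))


def mle(bits, k):
    """ML estimates theta[context] = (1s after context) / (context count)."""
    counts = transition_counts(bits, k)
    contexts = {ctx for ctx, _ in counts}
    return {
        ctx: counts[(ctx, "1")] / (counts[(ctx, "0")] + counts[(ctx, "1")])
        for ctx in contexts
    }


def log_likelihood(bits, k, theta):
    """log2 P(bits | k, theta) under the padding assumption above."""
    total = 0.0
    for (ctx, sym), n in transition_counts(bits, k).items():
        p = theta[ctx] if sym == "1" else 1.0 - theta[ctx]
        if p == 0.0:
            return -math.inf
        total += n * math.log2(p)
    return total
```

With k = 0 this reduces to the Bernoulli model, where the single parameter is the fraction of 1s. Because the models are nested, the maximized log-likelihood can only stay the same or improve as k grows.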

21
Crude MDL
  • Question: given data $D = x^n$, find the Markov chain that best explains D.
  • We do not want to restrict ourselves to chains of a fixed order
  • How to avoid overfitting?
  • Obviously, an (n-1)-th order Markov model would always fit the data best

22
Crude MDL
  • Two-part MDL revisited

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
23
Crude MDL
  • Description length of data given hypothesis: $L(D \mid H) = -\log P(D \mid H)$

24
Crude MDL
  • Description length of hypothesis
  • The code should not change with the sample size n.
  • Different codes lead to preferences for different hypotheses
  • How to design a code that leads to good inferences with small, practically relevant sample sizes?

25
Crude MDL
  • An intuitive and reasonable code for a k-th order Markov chain
  • First describe k using $2\lceil \log k \rceil + 1$ bits
  • Then describe the $d = 2^k$ parameters
  • Assume n is given in advance
  • For each $\theta$ in the MLE $(\theta_{1|000000}, \ldots, \theta_{1|111111})$, the best precision we can achieve by counting is $1/(n+1)$
  • Describe each $\theta$ with $\log(n+1)$ bits
  • $L(H) = 2\lceil \log k \rceil + 1 + d\log(n+1)$
  • $L(H) + L(D \mid H) = 2\lceil \log k \rceil + 1 + d\log(n+1) - \log P(D \mid k, \theta)$
  • For a given k, only the MLE $\theta$ needs to be considered
26
Crude MDL
  • Good news
  • We have found a principled manner to encode data
    D using H
  • Bad news
  • We have not found clear guidelines to design
    codes for H

27
Outline
  • Conceptual/non-technical introduction
  • Probabilities and Codelengths
  • Crude MDL
  • Refined MDL
  • Other topics

28
Refined MDL
  • Universal codes and universal distributions
  • The maximum likelihood code depends on the data
  • How to describe the data in an unambiguous manner?
  • Design a code such that, for every possible observation, its codelength corresponds to its ML? Impossible

29
Refined MDL
  • Worst-case regret
  • Optimal universal model

30
Refined MDL
  • Normalized maximum likelihood (NML)
  • Minimizing -log NML
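The formulas for this slide were lost in extraction. As an illustration only, here is a sketch of NML for the Bernoulli model, assuming the usual definition: the maximized likelihood of the observed sequence divided by the sum of maximized likelihoods over all binary sequences of the same length. Function names are illustrative.

```python
import math


def bernoulli_max_likelihood(ones, n):
    """max over theta of P(x^n | theta) for a sequence with the given number of 1s."""
    zeros = n - ones
    # Python evaluates 0.0 ** 0 as 1.0, which is the convention needed here.
    return (ones / n) ** ones * (zeros / n) ** zeros


def nml_normalizer(n):
    """Sum of the maximized likelihood over all 2^n binary sequences of length n."""
    # Sequences with the same number of 1s share the same maximized likelihood.
    return sum(
        math.comb(n, m) * bernoulli_max_likelihood(m, n) for m in range(n + 1)
    )


def nml_codelength(bits):
    """-log2 NML(bits) = -log2 P(bits | ML theta) + log2(normalizer)."""
    n = len(bits)
    ones = bits.count("1")
    fit = -math.log2(bernoulli_max_likelihood(ones, n))
    return fit + math.log2(nml_normalizer(n))
```

The codelength splits into a goodness-of-fit term and the logarithm of the normalizer. The second term does not depend on the data, only on how many sequences the model can fit well.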

31
Refined MDL
  • Complexity of a model
  • The more sequences that can be fit well by an element of M, the larger M's complexity
  • Would it lead to the right balance between complexity and fit?
  • Hopefully

32
Refined MDL
  • General refined MDL

Source: Grünwald et al. (2005), Advances in Minimum Description Length Theory and Applications.
33
Outline
  • Conceptual/non-technical introduction
  • Probabilities and Codelengths
  • Crude MDL
  • Refined MDL
  • Other topics

34
Other topics
  • Mixture code
  • Resolvability

35
References
  • Barron, A., Rissanen, J. & Yu, B. (1998), 'The minimum description length principle in coding and modeling', IEEE Transactions on Information Theory 44(6), 2743-2760.
  • Grünwald, P.D., Myung, I.J. & Pitt, M.A. (2005), Advances in Minimum Description Length Theory and Applications (Neural Information Processing), The MIT Press.
  • Hall, P. & Hannan, E.J. (1988), 'On stochastic complexity and nonparametric density estimation', Biometrika 75(4), 705-714.