Transcript and Presenter's Notes

Title: Entropy


1
Entropy
  • Vasileios Hatzivassiloglou
  • University of Texas at Dallas

2
Relationship between Zipf's and Pareto's laws
  • Two equivalent statements
  • The rth type has frequency f
  • r types have frequency f or more
  • Starting from Zipf's law, we obtain
  • nP(X > f) types have frequency f or more
  • nP(X > f) · f is constant
  • (n c1 f^(-k)) · f is constant, which holds for
    k = 1
  • Zipf's law is a special case of Pareto's law
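
Written out compactly (assuming, as the slide's c1 and k suggest, that
Pareto's law is used in the form P(X > f) = c1 f^(-k)), the step above is:

    r \;=\; n\,P(X > f) \;=\; n\,c_1 f^{-k}
    \quad\Longrightarrow\quad
    r f \;=\; n\,c_1 f^{\,1-k} \;=\; \text{const}
    \iff k = 1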

3
Zipf-Mandelbrot law
  • Zipf's law: f = P r^(-1)
  • Overestimates for high frequencies,
    underestimates for middle frequencies
  • Mandelbrot proposed additional parameters:
  • f = P (r + ρ)^(-B)
  • Extra parameters allow for a closer fit
  • This is known as the parabolic fractal
    distribution
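
To see how the extra parameters change the fit, a minimal Python sketch
(the values of P, rho, and B below are arbitrary illustrations, not fitted
parameters from the slides):

    # Predicted frequency vs. rank under Zipf and Zipf-Mandelbrot.
    def zipf(r, P=1000.0):
        return P / r                        # f = P * r^(-1)

    def zipf_mandelbrot(r, P=1000.0, rho=2.7, B=1.1):
        return P * (r + rho) ** (-B)        # f = P * (r + rho)^(-B)

    for rank in (1, 2, 10, 100, 1000):
        print(rank, round(zipf(rank), 2), round(zipf_mandelbrot(rank), 2))

The shift rho mainly lowers the predictions for the top-ranked (highest
frequency) types, while B different from 1 changes the slope in the tail.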

4
Entropy
  • The entropy (or information) of a probability
    distribution P over outcomes s is
  • H(P) = -Σ_s P(s) log_b P(s), with b > 1
    (usually b = 2)
  • -log_b P(s) is known as the surprisal of outcome
    s, so entropy is the expected value of the
    surprisal
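
A minimal sketch of this definition in Python (the function name and the
example distribution are mine, not from the slides):

    import math

    def entropy(probs, b=2):
        """H(P) = -sum_s P(s) * log_b P(s); terms with P(s) = 0 contribute 0."""
        return sum(-p * math.log(p, b) for p in probs if p > 0)

    # Example: a biased coin with P(heads) = 0.9.
    print(entropy([0.9, 0.1]))    # about 0.469 bits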

5
Entropy examples
  • A source that generates the same symbol over and
    over has entropy H = 0
  • We define 0 log 0 = 0 since x log_b x → 0 as x → 0+
  • A uniform distribution with n outcomes has
    entropy H = log_b n
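
A quick numerical check of both examples (self-contained, using an entropy
helper like the one sketched above):

    import math

    def entropy(probs, b=2):
        return sum(-p * math.log(p, b) for p in probs if p > 0)

    print(entropy([1.0, 0.0, 0.0]))              # constant source: 0.0
    n = 8
    print(entropy([1.0 / n] * n), math.log2(n))  # uniform: both about 3.0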

6
Interpretation of entropy
  • The information contained in the distribution P
    (the more unpredictable the outcomes, the higher
    the entropy)
  • The average message length if messages are
    generated according to P and coded optimally
  • Relationship with thermodynamics

7
Entropy for multiple variables
  • So far we have dealt with a single random
    variable
  • The joint entropy of a pair of RVs X and Y is
  • H(X,Y) = -Σ_x Σ_y P(x,y) log_b P(x,y)
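
A minimal sketch of the joint-entropy computation (the 2x2 joint table
below is an arbitrary example of mine):

    import math

    # Joint distribution P(x, y) as a dict; the values sum to 1.
    joint = {("a", 0): 0.4, ("a", 1): 0.1,
             ("b", 0): 0.2, ("b", 1): 0.3}

    def joint_entropy(pxy, b=2):
        return sum(-p * math.log(p, b) for p in pxy.values() if p > 0)

    print(joint_entropy(joint))   # H(X,Y) ≈ 1.85 bits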

8
Conditional Entropy
  • How much information (out of H(X,Y)) do we still
    need if we already have knowledge of one variable?
  • The conditional entropy is
  • H(Y|X) = -Σ_x Σ_y P(x,y) log_b P(y|x)
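
A sketch of H(Y|X) computed directly from this definition (same
illustrative joint table as in the previous sketch; all names are mine):

    import math

    joint = {("a", 0): 0.4, ("a", 1): 0.1,
             ("b", 0): 0.2, ("b", 1): 0.3}

    def conditional_entropy(pxy, b=2):
        # Marginal P(x), then H(Y|X) = -sum_{x,y} P(x,y) log_b P(y|x).
        px = {}
        for (x, _), p in pxy.items():
            px[x] = px.get(x, 0.0) + p
        return sum(-p * math.log(p / px[x], b)
                   for (x, _), p in pxy.items() if p > 0)

    print(conditional_entropy(joint))   # H(Y|X) ≈ 0.85 bits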

9
Chain rule for entropy
  • H(X,Y) = H(X) + H(Y|X)
  • H(X1,X2,...,Xn) = H(X1) + H(X2|X1) + ... +
    H(Xn|Xn-1,Xn-2,...,X1)
  • Follows from the chain rule for conditional
    probabilities; the product becomes a sum because
    of the logarithm
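
The two-variable case written out (a standard derivation using the
definitions from the previous slides, with P(x,y) = P(x) P(y|x)):

    \begin{aligned}
    H(X,Y) &= -\sum_{x,y} P(x,y)\,\log_b P(x,y)
            = -\sum_{x,y} P(x,y)\,\log_b\!\bigl[P(x)\,P(y\mid x)\bigr] \\
           &= -\sum_{x,y} P(x,y)\,\log_b P(x)
              \;-\; \sum_{x,y} P(x,y)\,\log_b P(y\mid x)
            = H(X) + H(Y\mid X)
    \end{aligned}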

10
Entropy of a process
  • The entropy rate of a stochastic process for
    messages of length n is (1/n) H(X1,X2,...,Xn)
  • The entropy rate of the process Xn is
  • H_rate = lim_{n→∞} (1/n) H(X1,X2,...,Xn)

11
The Shannon-McMillan-Breiman Theorem
  • Is it ever possible to calculate H_rate?
  • If the process Xn is both stationary and
    ergodic, the theorem states that for any one
    sample X1,X2,...,Xn
  • H_rate = lim_{n→∞} -(1/n) log_b P(X1,X2,...,Xn)
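
A minimal sketch of the idea for an i.i.d. source with a known
distribution (the distribution, sample size, and names below are
illustrative assumptions, not from the slides): the per-symbol
log-probability of one long sample approaches the true entropy.

    import math
    import random

    random.seed(0)
    symbols, probs = ["a", "b", "c"], [0.5, 0.3, 0.2]
    true_H = sum(-p * math.log2(p) for p in probs)   # about 1.49 bits

    n = 100_000
    sample = random.choices(symbols, weights=probs, k=n)

    # -(1/n) log2 P(X1,...,Xn); for an i.i.d. source the joint log-probability
    # is just the sum of the per-symbol log-probabilities.
    log_p = sum(math.log2(probs[symbols.index(s)]) for s in sample)
    print(true_H, -log_p / n)   # the two numbers should be close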

12
Estimating P(X1,X2,...,Xn)
  • We use the chain rule of conditional
    probabilities to decompose this into a product of
    conditional probabilities P(Xi | Xi-1,
    ..., X1)
  • Then, we can directly estimate the conditional
    probability using human subjects (Shannon's
    original approach)
  • Or we can approximate with Markov chains

13
Markov chains
  • A subtype of random walks where the entire memory
    of the system is contained in the current state
  • Described by a transition matrix P, where pij is
    the probability of going from state i to state j
  • Very useful for describing stochastic discrete
    systems
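
A minimal sketch of a two-state Markov chain driven by such a transition
matrix (states and probabilities are arbitrary illustrations):

    import random

    random.seed(1)
    states = ["rain", "sun"]
    # P[i][j] = probability of going from state i to state j; rows sum to 1.
    P = [[0.6, 0.4],    # from "rain"
         [0.2, 0.8]]    # from "sun"

    def simulate(steps, start=0):
        i, path = start, []
        for _ in range(steps):
            i = random.choices(range(len(states)), weights=P[i])[0]
            path.append(states[i])
        return path

    print(simulate(10))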

14
Markov chain states
  • Can correspond directly to the individual
    decisions/states for the problem at hand
  • Can combine multiple successive problem states in
    one Markov Chain state
  • This allows the MC to remember more than one
    (but a finite number of) problem states
  • Number of parameters increases exponentially with
    the order (number of combined states) of the chain
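
A sketch of the state-combination idea: a dependence on the two previous
symbols of a sequence over {A, B} becomes an ordinary first-order chain
whose states are pairs of successive symbols (the toy sequence and all
names are illustrative):

    from collections import Counter, defaultdict

    seq = "ABAABABBABAABBAABAB"   # illustrative symbol sequence

    # Combined states = pairs of successive symbols; transitions between
    # pairs give a first-order chain that remembers two original symbols.
    trans_counts = defaultdict(Counter)
    for i in range(len(seq) - 2):
        cur_pair, nxt_pair = seq[i:i + 2], seq[i + 1:i + 3]
        trans_counts[cur_pair][nxt_pair] += 1

    for pair, nxts in trans_counts.items():
        total = sum(nxts.values())
        print(pair, {p: round(c / total, 2) for p, c in nxts.items()})

With an alphabet of size V, an order-k chain built this way has on the
order of V^k combined states, which is where the exponential growth in the
number of parameters comes from.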

15
The Markov approximation
  • For a (stationary) Markov chain of order k,
  • P(Xi | Xi-1,...,X1) = P(Xi | Xi-1,...,Xi-k)
  • Also known as n-gram models (with n = k + 1)
  • The 0-th order Markov chain is
  • The prior probability of each state, P(X)
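
A minimal sketch of an order-1 (bigram, n = 2) model estimated from data by
counting (the toy word sequence is mine):

    from collections import Counter, defaultdict

    words = "the cat sat on the mat the cat ran".split()

    # Bigram model: P(w_i | w_{i-1}) ~ count(w_{i-1}, w_i) / count(w_{i-1}).
    unigram = Counter(words[:-1])
    bigram = defaultdict(Counter)
    for prev, cur in zip(words, words[1:]):
        bigram[prev][cur] += 1

    def p_next(prev, cur):
        return bigram[prev][cur] / unigram[prev]

    print(p_next("the", "cat"))   # 2/3 in this toy sequence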

16
Reading
  • Sections 2.2.1-2.2.2 on entropy and conditional
    and joint entropy