Title: Information and entropy
1. Information and entropy
- Entropy H: a formal measure of disorder in information
- Related to information content
- Discrete information source
2. Decomposition
- Example: a 4-sided die rearranged as 2 successive coin tosses
- Pr(s1) = p1 = p5 × p7, etc.
3. Relating probability and entropy
- Coin toss
  - 50/50 outcome
  - Complete uncertainty
  - Minimum information (known in advance about the outcome)
  - Maximum entropy
- Coin with two heads
  - 100/0 outcome
  - Complete certainty
  - Maximum information (known in advance about the outcome)
  - Minimum entropy
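As a quick numerical illustration (my addition, not from the slides), here is a minimal Python sketch of the entropy of a biased coin, H(p) = -p log2(p) - (1-p) log2(1-p); it peaks at 1 bit for the 50/50 coin and falls to 0 bits for the two-headed coin.

import math

def coin_entropy(p):
    """Entropy (in bits) of a coin that lands heads with probability p."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0:                      # convention: 0 * log2(0) = 0
            h -= q * math.log2(q)
    return h

for p in (0.5, 0.9, 1.0):
    print(f"Pr(heads) = {p:.1f}  ->  H = {coin_entropy(p):.3f} bits")
# 0.5 -> 1.000 bits (maximum entropy), 1.0 -> 0.000 bits (minimum entropy)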
4. Relating probability and entropy
- 6-sided fair die
  - average number of yes/no questions ≈ 2.67
  - = (2 × 2 × 1/6) + (3 × 4 × 1/6) = 16/6
  - NB this is NOT the entropy
- 6-sided unfair die
  - Pr(6) = 0.5, Pr(5) = 0.1, ..., Pr(1) = 0.1
  - average number of yes/no questions = 2.2
  - = (1 × 0.5) + (3 × 3 × 0.1) + (4 × 2 × 0.1)
  - less uncertainty, more information
  - lower entropy
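A small Python sketch (mine, not from the slides) reproducing these averages and comparing them with the entropy H = -Σ pi log2(pi) of each die; the entropy is a lower bound on the average number of yes/no questions.

import math

def entropy(probs):
    """Shannon entropy in bits, with the convention 0 * log2(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# fair die: an optimal question tree uses 2 questions for 2 faces, 3 for the other 4
fair = [1/6] * 6
avg_q_fair = 2 * 2 * (1/6) + 3 * 4 * (1/6)

# unfair die: Pr(6) = 0.5, the other five faces 0.1 each
unfair = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
avg_q_unfair = 1 * 0.5 + 3 * 3 * 0.1 + 4 * 2 * 0.1

print(f"fair die:   avg questions = {avg_q_fair:.3f}, entropy = {entropy(fair):.3f} bits")
print(f"unfair die: avg questions = {avg_q_unfair:.3f}, entropy = {entropy(unfair):.3f} bits")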
5. Relating probability and entropy
- Entropy is some function of the probability distribution over outcomes: H(A) = H(p1, ..., pm)
- Entropy axioms
  - Entropy is a continuous function of its probabilities
  - Maximum entropy increases with the number of outcomes
  - Total entropy is unchanged if a random process is rearranged as a combination of two or more processes
6. Unit of entropy: the bit
- S = the alphabet {a, b, ..., z}
- Pick one letter at random, x ∈ S
- Remainder, S' = S − {x}
- How many yes/no questions do we need to ask to determine which letter is x?
  - Asking letter by letter: maximum 25, minimum 1
- Rearrange as a binary search: a-m or n-z, then a-f, a-c, a-b, ..., halving the candidates each time
  - a yes/no question = a binary choice = 1 bit
  - Maximum 5 questions, minimum 4 questions
[diagram: binary question tree over the alphabet]
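A minimal Python sketch (my illustration) of the yes/no questioning: a binary search over the 26 letters, counting how many questions each target letter needs; the counts come out as 4 or 5, matching the slide.

import string

def questions_needed(target, letters=string.ascii_lowercase):
    """Count yes/no questions of the form 'is it in the first half?'."""
    candidates = list(letters)
    count = 0
    while len(candidates) > 1:
        half = candidates[:len(candidates) // 2]
        count += 1                       # one yes/no question = 1 bit
        candidates = half if target in half else candidates[len(half):]
    return count

counts = [questions_needed(c) for c in string.ascii_lowercase]
print(min(counts), max(counts))          # -> 4 5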
7. Maximum entropy
- Maximum entropy occurs when all outcomes are equally likely
8. Maximum entropy
- Adopt the bit as our unit of entropy
- From the previous examples: how many yes/no questions are needed to cover all outcomes (assumed to be equally probable)?
  - Maximum entropy of n equally likely outcomes = log2(n) bits
- Note on converting logs between bases: log2(x) = ln(x) / ln(2) = log10(x) / log10(2)
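A quick check (my addition) that the uniform distribution over n outcomes has entropy log2(n), and that an entropy computed with natural logs can be converted to bits by dividing by ln(2).

import math

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

for n in (2, 4, 6, 26):
    uniform = [1.0 / n] * n
    print(n, round(entropy_bits(uniform), 4), round(math.log2(n), 4))

# converting a natural-log entropy (nats) into bits
h_nats = math.log(6)                 # entropy of a fair die in nats
print(h_nats / math.log(2))          # = log2(6) ≈ 2.585 bits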
9. Information and entropy
- Shannon and Weaver (1948)
- Information gained by a single event i with probability pi: log2(1/pi) = -log2(pi)
- Entropy in that event: its contribution pi log2(1/pi)
- Entropy of source (ensemble of symbols) = average entropy
  - 1 symbol contributes p1 log2(1/p1)
  - 2 symbols contribute p1 log2(1/p1) + p2 log2(1/p2)
  - n symbols: H = Σi pi log2(1/pi) = -Σi pi log2(pi)
- NB if pi = 0, then pi log2(pi) = 0 (taken as the limiting value)
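The "NB" line rests on the limit p·log2(p) → 0 as p → 0; a tiny numerical check (my addition):

import math

for p in (0.1, 0.01, 0.001, 1e-6, 1e-12):
    print(f"p = {p:g}   p*log2(p) = {p * math.log2(p):.6g}")
# the product shrinks towards 0, which is why a zero-probability
# symbol contributes nothing to the entropy sum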
10. Properties of entropy
- For a random process A with a set of outcomes S:
  - 0 ≤ H(A) ≤ log2 |S|, where |S| = number of outcomes
  - when do we get equality?
    - H(A) = 0 when pi = 1 for some i
    - H(A) = log2 |S| when pi = pj for all i, j (all outcomes equally likely)
- Joint entropy
  - H(A, B) = -Σi,j pij log2(pij), where pij is the (joint) probability of outcomes i, j
  - for independent A, B: H(A, B) = H(A) + H(B)
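A short sketch (my own) forming the joint distribution of two independent sources and checking H(A, B) = H(A) + H(B); the distributions are just illustrative numbers.

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

A = [0.5, 0.25, 0.25]            # example distribution for source A
B = [0.7, 0.3]                   # example distribution for source B

# joint probabilities p_ij = p_i * q_j (independence)
joint = [a * b for a in A for b in B]

print(entropy(joint))                    # H(A, B)
print(entropy(A) + entropy(B))           # H(A) + H(B) -- the same value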
11. Decomposability
- case 1: a random variable with distribution
  - Pr(0) = 0.5, Pr(1) = 0.25, Pr(2) = 0.25
  - entropy = 1.5 bits
- case 2: toss a coin once or twice
  - heads → answer is 0; tails → toss again: heads → answer is 1, tails → answer is 2
  - entropy = entropy of the first toss + half the entropy of the second toss (half because the second toss only happens 50% of the time)
  - H = H(0.5, 0.5) + 0.5 × H(0.5, 0.5) = 1 + 0.5 = 1.5 bits
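A minimal check (my addition) that the direct calculation and the two-stage decomposition agree:

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# case 1: direct entropy of Pr(0)=0.5, Pr(1)=0.25, Pr(2)=0.25
direct = entropy([0.5, 0.25, 0.25])

# case 2: first coin toss, plus the second toss which happens half the time
decomposed = entropy([0.5, 0.5]) + 0.5 * entropy([0.5, 0.5])

print(direct, decomposed)        # both 1.5 bits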
12. Source decomposition
- Example: a 4-sided die vs 2 coin tosses
- In general, if a source A is rearranged as a first choice B followed by sub-choices, total entropy is unchanged: H(A) = H(B) + Σk Pr(branch k) H(sub-choice k)
- In the case above:
  - H(A) = H(p1, ..., p4) = H(p5 p7, p5 p8, p6 p9, p6 p10)
  - = H(B) + p5 H(C) + p6 H(D)
  - = H(B) + p5 H(p7, p8) + p6 H(p9, p10)
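A sketch (mine) with illustrative values for p5...p10: the first "coin" B has probabilities (p5, p6), and the two second-stage choices C and D have probabilities (p7, p8) and (p9, p10); the direct entropy of the 4 die faces equals the decomposed form.

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# illustrative numbers (not from the slides): first choice B, then C or D
p5, p6 = 0.6, 0.4                # Pr of the two branches of B
p7, p8 = 0.3, 0.7                # C: second choice if the p5 branch is taken
p9, p10 = 0.5, 0.5               # D: second choice if the p6 branch is taken

# the four die faces: p1..p4 = p5*p7, p5*p8, p6*p9, p6*p10
die = [p5 * p7, p5 * p8, p6 * p9, p6 * p10]

lhs = entropy(die)
rhs = entropy([p5, p6]) + p5 * entropy([p7, p8]) + p6 * entropy([p9, p10])
print(lhs, rhs)                  # equal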
13. Conditional Entropy
- the conditional entropy of A given B = bk is the entropy of the probability distribution Pr(A | B = bk)
- the conditional entropy of A given B is the average of this quantity over all bk:
  H(A | B) = Σk Pr(B = bk) H(A | B = bk)
- i.e. the average uncertainty about A when B is known
14. Dice example
- two dice, with different coloured faces, numbered 1-6
- C = colour, N = number, P = parity
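A minimal sketch (my own) implementing the definition H(A|B) = Σk Pr(bk) H(A|B=bk); it uses only the number N and parity P of a single fair roll, leaving the colour variable C aside, and the function names are mine.

import math
from collections import defaultdict

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(A|B) from a dict {(a, b): probability}."""
    pr_b = defaultdict(float)
    for (a, b), p in joint.items():
        pr_b[b] += p
    h = 0.0
    for b, pb in pr_b.items():
        cond = [p / pb for (a, bb), p in joint.items() if bb == b]
        h += pb * entropy(cond)          # average over the outcomes of B
    return h

# joint distribution of (N, P) for one fair die: P is determined by N
joint_NP = {(n, n % 2): 1/6 for n in range(1, 7)}

print(conditional_entropy(joint_NP))     # H(N|P) = log2(3) ≈ 1.585 bits
print(conditional_entropy({(b, a): p for (a, b), p in joint_NP.items()}))   # H(P|N) = 0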
15. Conditional Entropy Example
16. Example - viewed from A
H(A, B) = H(A) + P(0 sent)·H(B | 0 sent) + P(1 sent)·H(B | 1 sent)
        = H(A) + H(B | A)
17. Example - viewed from B
H(B, A) = H(B) + H(A | B)
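A sketch (with made-up channel numbers, since the slide's figures are not in the text) of a binary channel: A is the sent bit, B the received bit; both ways of splitting the joint entropy give the same H(A, B).

import math
from collections import defaultdict

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# assumed example: Pr(0 sent) = 0.6, and each bit is flipped with probability 0.1
p_sent = {0: 0.6, 1: 0.4}
flip = 0.1
joint = {(a, b): p_sent[a] * (flip if a != b else 1 - flip)
         for a in (0, 1) for b in (0, 1)}          # Pr(A = a, B = b)

def marginal(joint, index):
    m = defaultdict(float)
    for ab, p in joint.items():
        m[ab[index]] += p
    return m

def cond_entropy(joint, given):
    """Entropy of the other variable, conditioned on index `given` (0 = A, 1 = B)."""
    h = 0.0
    for g, pg in marginal(joint, given).items():
        cond = [p / pg for ab, p in joint.items() if ab[given] == g]
        h += pg * entropy(cond)
    return h

H_A = entropy(marginal(joint, 0).values())
H_B = entropy(marginal(joint, 1).values())
print(entropy(joint.values()))             # H(A, B)
print(H_A + cond_entropy(joint, 0))        # H(A) + H(B|A) -- same value
print(H_B + cond_entropy(joint, 1))        # H(B) + H(A|B) -- same value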
18. Mutual information
[diagram: H(A, B) drawn as two overlapping regions H(A) and H(B); the parts outside the overlap are H(A|B) and H(B|A), the overlap is the mutual information I(A;B)]
H(A, B) = H(B, A)
H(A, B) = H(A) + H(B|A) = H(B) + H(A|B)
Rearrange:
I(A;B) = H(A) - H(A|B) = H(B) - H(B|A)
I(A;B) = I(B;A)
I(A;B) = the information about A contained in B
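Continuing the same assumed binary-channel numbers, a short sketch (mine) computing I(A;B) in both forms and confirming the symmetry:

import math
from collections import defaultdict

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# assumed joint distribution Pr(A = a, B = b) for a noisy binary channel
joint = {(0, 0): 0.54, (0, 1): 0.06, (1, 0): 0.04, (1, 1): 0.36}

def marginal(index):
    m = defaultdict(float)
    for ab, p in joint.items():
        m[ab[index]] += p
    return list(m.values())

H_A, H_B = entropy(marginal(0)), entropy(marginal(1))
H_AB = entropy(joint.values())

# I(A;B) = H(A) + H(B) - H(A,B) = H(A) - H(A|B) = H(B) - H(B|A)
print(H_A + H_B - H_AB)
print(H_A - (H_AB - H_B))        # H(A) - H(A|B), using H(A|B) = H(A,B) - H(B)
print(H_B - (H_AB - H_A))        # H(B) - H(B|A), the same value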
19. A Brief History of Entropy
- 1865 Clausius
  - thermodynamic entropy
  - ΔS = ΔQ/T
  - the change in entropy of a thermodynamic system, during a reversible process in which an amount of heat ΔQ is applied at constant absolute temperature T
- 1877 Boltzmann
  - S = k ln N
  - S, the entropy of a system, is related to the number of possible microscopic states (N) consistent with macroscopic observations
  - e.g. an ideal gas, or 10 coins in a box (10 heads vs 5 heads, 5 tails)
- 1940s Turing
  - weight of evidence - see "Alan Turing: the Enigma"
- 1948 Shannon
  - information entropy, related in form to Boltzmann's
20. A problem
- 32768 computer users
- each is given a different random 5-character ID, where each character appears with the same probability as in English text
  - e.g. 2048 begin with 'a', of which 128 start 'aa' and 2 start 'az'
  - 32 begin with 'z'
- how much information is conveyed by an ID?
- how much information is conveyed by knowing that the first character of an ID is
  - (i) 'a'
  - (ii) 'z'
- what is the average information content of the remaining 4 characters?
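A quick sketch (my addition, covering only the parts that the stated counts determine) of the arithmetic: information = -log2(probability), with probabilities taken as counts out of 32768.

import math

total = 32768

# every ID is distinct, so each has probability 1/total
print(math.log2(total))                      # bits conveyed by a whole ID = 15

# knowing the first character
print(-math.log2(2048 / total))              # (i)  first char 'a': 4 bits
print(-math.log2(32 / total))                # (ii) first char 'z': 10 bits

# given that the first char is 'a', the remaining 4 chars pick one of 2048 IDs
print(math.log2(2048))                       # 11 bits = 15 - 4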
21. A Problem
- 12 identical balls, except that one is heavier or lighter than the rest
- a balance is available
- find the odd ball, and whether it is heavier or lighter, minimising use of the balance
- how much information will you gain in finding the answer?
- how much information do you gain by comparing
  - (i) 6 balls to the other 6
  - (ii) 4 balls to another 4
- what is the best strategy to find the odd ball?
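A sketch (mine) of the information bookkeeping: there are 12 × 2 = 24 equally likely answers, and a weighing's information gain is the entropy of its outcome distribution (left heavier / right heavier / balance).

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 12 candidate balls x (heavier or lighter) = 24 equally likely answers
print(math.log2(24))                      # information needed ≈ 4.585 bits

# (i) 6 v 6: the pans can never balance, so only 2 equally likely outcomes
print(entropy([0.5, 0.5]))                # 1 bit

# (ii) 4 v 4: left heavier / right heavier / balance, 8 answers each out of 24
print(entropy([8/24, 8/24, 8/24]))        # log2(3) ≈ 1.585 bits per weighing

# three such weighings can supply up to 3 * log2(3) ≈ 4.755 bits > 4.585,
# which is why a 3-weighing strategy starting 4 v 4 can work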
22. Monty Hall Paradox
- enumerate the cases by which door the prize is in and which door is first picked
- not swapping gets the prize 6/18 times (= 1/3); swapping gets the prize 6/9 times (= 2/3)
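A small simulation (my addition) of the game: the prize door and the first pick are random, the host opens a non-prize, non-picked door, and the two policies are compared; the win rates come out near 1/3 and 2/3.

import random

def play(swap, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)
        pick = random.randrange(3)
        # host opens a door that is neither the pick nor the prize
        opened = next(d for d in range(3) if d != pick and d != prize)
        if swap:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print("stay:", play(swap=False))   # ≈ 1/3
print("swap:", play(swap=True))    # ≈ 2/3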