1
Information Theory
  • Information theory is a branch of mathematics
    founded by Claude Shannon in the 1940s. It
    formalizes the concepts of entropy (or
    uncertainty) and information so that we can talk
    about them from a rigorously mathematical
    standpoint. It helps us answer questions such
    as
  • Is it possible to communicate reliably from one
    point to another if we only have a noisy
    communication channel?
  • How can the information content of a seemingly
    random variable be measured?
  • What is the maximum we can compress a file?

2
Flipping Coins
After we flip a coin, how many bits (on average) do we need to store in order to record its outcome? Obviously, only one. What if we flip two coins, and we only care whether we got both heads, both tails, or a head and a tail? What is the probability of each of these outcomes?
both heads - 25%
both tails - 25%
mixed - 50%
3
Outcomes for two coins
Normally, we would need to store 2 bits to record the outcome of two coin flips, but if we don't care about the ordering, shouldn't we need to store less? Well, half the time we have a mixed state. When that happens we can just record a 0. When we don't have a mixed state, we'll record a 1 and then we'll have to look at the next bit to know whether they were both heads or both tails:
mixed - 0
both heads - 10
both tails - 11
4
How many bits do we use?
We can calculate the average number of bits we use by performing a weighted average:
Both heads: 0.25 × 2 bits = 0.5 bits
Both tails: 0.25 × 2 bits = 0.5 bits
Mixed: 0.50 × 1 bit = 0.5 bits
Total: 1.5 bits
Half the time we use 1 bit, half the time we use 2, so on average we use 1.5 bits.
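The same weighted average is easy to check directly; here is a minimal Python sketch using the probabilities and code lengths from the slides above:

```python
# Code from the previous slide: mixed -> "0" (1 bit),
# both heads -> "10", both tails -> "11" (2 bits each).
outcomes = {
    "both heads": (0.25, 2),   # (probability, code length in bits)
    "both tails": (0.25, 2),
    "mixed":      (0.50, 1),
}

expected_bits = sum(p * length for p, length in outcomes.values())
print(expected_bits)  # 1.5
```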
5
Entropy
The entropy of a system is our uncertainty about
the state of that system. It is the expected
number of bits required to fully describe the
state of the system. In the two-coin-flip
example, we had a 1.5 bit uncertainty about the
outcome. If there were a horse race with 8
horses in it, and we knew nothing about the
individual horses, we would have a 3 bit
uncertainty about the outcome of the race.
6
Information
Information is, quite simply, the amount our
uncertainty is reduced given new knowledge. In
the horse race, if I found out that four of the
horses had a zero chance of winning, this would
knock my uncertainty down to two bits. Hence,
this would be one bit of information. If I were
told that the two coins I flipped were the same,
how much information would this be?
7
Information must be about something
Information must be accessible and about the situation whose entropy we are measuring for it to be non-zero. We could be told that "there are seven trillion gallons of water on Europa," and this would be information in some scenarios, but not when we're trying to predict the outcome of a horse race. Likewise, a history textbook might help us decrease our uncertainty about the seventeenth king of England, but would do very little if it weren't in a language we knew.
8
Information is Physical
All information must have some kind of physical
instantiation, be it ink on paper, the state of
memory in a computer, or even the state of the
neurons in your brain. Information is a
correlation between two physical objects. For
example, the streets in a city might correlate to
the lines on a map of that city, so the map can
give us information about the city.
9
Information is Mutual
Since information is a correlation between two
physical things, it is also bi-directional. If A
gives us 3 bits of information about B, B would
also give us 3 bits of information about A. From
our previous example, if a map gives us
information about a city, that city itself would
give us the same amount of information about what
the map would look like.
10
Calculating Entropy
If all n possible outcomes of situation X are equally probable, then our uncertainty about which one will occur can be calculated by
H(X) = log2(n) bits
This is clear for the 8 horses: log2(8) = 3. What do we do if all the probabilities are not equal?
11
More complex equation...
If not all of the probabilities are identical, we can still calculate entropy using the formula
H(X) = -Σx∈X p(x) log2 p(x)
This incorporates a component for each outcome. H(X) represents the expected number of bits needed in order to record the outcome given an ideal encoding. Unfortunately, an ideal encoding is not always possible.
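A minimal Python version of this formula, checked against the examples above (the two-coin distribution and the 8 equally likely horses):

```python
import math

def entropy(probs):
    """H(X) = -sum of p * log2(p) over outcomes with nonzero probability."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.5]))   # 1.5 bits (two-coin example)
print(entropy([1/8] * 8))           # 3.0 bits (8 equally likely horses)
```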
12
Huffman Encoding
Huffman encoding provides the optimal encoding to
minimize the expected number of bits in the
codeword used. It is one of the more complex
greedy algorithms. Sort all of the possible outcomes by their probability of occurrence; take the two least likely ones and add a new bit to the beginning of their codewords, set to 0 and 1 to differentiate them. Then group them as a single outcome with the sum of their individual probabilities and repeat.
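Below is a compact Python sketch of this greedy procedure. The use of a heap, the tie-breaking counter, and the helper names (huffman_code, expected_length) are implementation choices not taken from the slides:

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code from a dict {outcome: probability}.

    Repeatedly merge the two least likely entries, prefixing their
    codewords with 0 and 1, until only one entry remains.
    """
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # least likely group
        p1, _, code1 = heapq.heappop(heap)   # second least likely group
        merged = {sym: "0" + cw for sym, cw in code0.items()}
        merged.update({sym: "1" + cw for sym, cw in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def expected_length(probs, code):
    """Expected number of bits per outcome under the given code."""
    return sum(p * len(code[sym]) for sym, p in probs.items())
```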
13
Example
In Dr. L. Battle's Greek Philosophy class, he always gives out grades in the following distribution:
A - 10%
B - 15%
C - 20%
D - 25%
F - 30%
Given the best possible encoding, what is the expected number of bits it would take to record a single student's grade?
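Running the sketches above on this distribution answers the question (the dictionary below just restates the percentages from the slide):

```python
grades = {"A": 0.10, "B": 0.15, "C": 0.20, "D": 0.25, "F": 0.30}
code = huffman_code(grades)
print(expected_length(grades, code))    # 2.25 bits per grade
print(entropy(grades.values()))         # about 2.23 bits, the theoretical lower bound
```

The Huffman code spends 2 bits each on C, D, and F and 3 bits each on A and B, for an expected 2.25 bits per grade.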
14
Genetic Example
Assume that we are trying to encode a strand of DNA whose nucleotides occur with the following frequencies: P(A) = 1/2, P(C) = 1/4, P(G) = 1/8, and P(T) = 1/8. For example, part of a sequence might be ACATGAAC. Using Huffman encoding, we set A = 1, C = 01, G = 001, and T = 000. We can use this to encode the entire sequence as 10110000011101, requiring only 14 bits, not the 16 that we'd expect. Sure enough, the entropy of this distribution is 1.75 bits per nucleotide, and 14/8 = 1.75.
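The arithmetic is easy to verify with the slide's codewords and the entropy function defined earlier:

```python
code = {"A": "1", "C": "01", "G": "001", "T": "000"}   # codewords from the slide
sequence = "ACATGAAC"

encoded = "".join(code[nt] for nt in sequence)
print(encoded, len(encoded))             # 10110000011101 14  (vs. 16 bits uncompressed)
print(entropy([1/2, 1/4, 1/8, 1/8]))     # 1.75 bits per nucleotide, and 14 / 8 = 1.75
```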
15
Poor Optimal Codes...
What if we were recording the outcome of a
one-on-one basketball game between Bill Clinton
and Michael Jordan? Being generous to Clinton,
we might expect the probabilities of winning to look something like:
Jordan - 99.999%
Clinton - 0.001%
The entropy is 0.000184 bits (i.e., we are rather certain of the outcome), yet we still need to use one whole bit in our code! That's much worse than the entropy.
16
Combining Outcomes
What if Clinton and Jordan played multiple games
together? We might want to record all of the
outcomes! For two games, we would have the probabilities:
Clinton/Clinton - 0.00000001%
Clinton/Jordan - 0.00099999%
Jordan/Clinton - 0.00099999%
Jordan/Jordan - 99.99800001%
Now, we expect to use only about 1/2 bit per outcome. Combining enough outcomes together, we can get arbitrarily close to the entropy!
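The earlier sketches can be used to check both claims: the single-game entropy is tiny, and the two-game Huffman code averages about one bit for the pair, i.e. roughly half a bit per game (the probabilities below restate the slide's percentages as fractions):

```python
p_jordan, p_clinton = 0.99999, 0.00001

# A single game: almost no uncertainty, yet the best code still needs 1 whole bit.
print(entropy([p_jordan, p_clinton]))          # about 0.00018 bits

# Two games treated as one combined outcome.
pairs = {(a, b): pa * pb
         for a, pa in [("J", p_jordan), ("C", p_clinton)]
         for b, pb in [("J", p_jordan), ("C", p_clinton)]}
two_game_code = huffman_code(pairs)
print(expected_length(pairs, two_game_code))   # about 1.00003 bits for two games
```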
17
Questions...
  • Can gaining information ever cause our
    uncertainty about a situation to increase?
  • When we have an optimal code, what can we say
    about our uncertainty about what bit will come
    next?

18
Information Theory Equations
H(X) = -Σx∈X p(x) log2 p(x)
H(X|Y=y) = -Σx∈X p(x|y) log2 p(x|y)
H(X|Y) = Σy∈Y p(y) H(X|Y=y) = -Σy∈Y p(y) Σx∈X p(x|y) log2 p(x|y) = -Σx∈X Σy∈Y p(x,y) log2 p(x|y)
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = I(Y;X)
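These definitions translate directly into a short Python sketch; the function name and the example joint distribution (a noiseless one-bit channel, where Y simply copies X) are illustrative choices, not from the slides:

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), computed from a joint distribution {(x, y): p}.
    By symmetry, the result also equals H(Y) - H(Y|X)."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    h_x = -sum(p * math.log2(p) for p in px.values() if p > 0)
    # H(X|Y) = -sum over (x, y) of p(x, y) * log2( p(x|y) ), with p(x|y) = p(x, y) / p(y)
    h_x_given_y = -sum(p * math.log2(p / py[y])
                       for (x, y), p in joint.items() if p > 0)
    return h_x - h_x_given_y

# Y is a perfect copy of X, so observing Y removes the full 1 bit of uncertainty.
print(mutual_information({(0, 0): 0.5, (1, 1): 0.5}))   # 1.0
```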
19
Transmission Channels
[Diagram: the message "A RED BALL" is encoded into 0s and 1s, sent through a noisy channel, and decoded as "A FED BAQL".]
We tend to be most interested in Binary Symmetric
Channels, where bits are transmitted individually
and there is a fixed probability p for a bit to
be flipped.
20
Properties of Channels
Each channel has a transmission rate that
determines how many symbols it can send per unit
time. Channels also have error rates, which
determine, for any particular symbol, the
probability that a different symbol will come out
of the channel. The error rate of the channel
determines its capacity - the bits of information
that are transmitted per symbol sent. The
transmission rate and the capacity of a channel
can be multiplied together to get its data rate -
the rate at which information can be sent across
the channel.
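As a worked example of this relationship (the symbol rate here is made up for illustration; the 0.9192 bits/symbol figure is the capacity computed a few slides later for an error rate of 0.01):

```python
transmission_rate = 1_000_000      # symbols per second (illustrative value)
capacity = 0.9192                  # bits of information per symbol
data_rate = transmission_rate * capacity
print(data_rate)                   # 919200.0 bits of information per second
```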
21
Dealing with Errors
Assuming we know that there are going to be some errors, how can we be sure to get our information across? If we're really unlucky, we can't. But we can make sure to be able to tolerate any reasonable amount of error. What's one way to be sure we can detect any single error in our message? How can we make sure we can correct any error in the message? Can we do better than this?
22
How good is Error Correction?
We can do better. We can get as close to the
channel capacity as we want, though we may need
long messages. The channel capacity is defined
as the information that passes through the
channel. If we are correct in our definition of
information, it should give us a perfect measure
of how many bits we can send through the
channel. Intuitively, channel capacity makes sense: we start with maximal uncertainty about the symbol that entered the channel, and that uncertainty is lowered when we see a symbol come out.
23
Transmission Channels
[Same encode/channel/decode diagram as before.]
Let's assume we have a Binary Symmetric Channel with an error rate of 0.01. How much information can get through it?
24
Calculating Capacity
Inputs are 0 or 1.
Initial uncertainty: H(Input) = 1 bit
Uncertainty knowing the output:
H(Input | Output) = -(0.99 log2 0.99 + 0.01 log2 0.01) = 0.0144 + 0.0664 = 0.0808 bits
I(Input; Output) = H(Input) - H(Input | Output) = 1 bit - 0.0808 bits = 0.9192 bits of capacity
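The same calculation works for any error rate p, assuming 0 and 1 are equally likely inputs (a small sketch; the function name is ours):

```python
import math

def bsc_capacity(p):
    """Bits of information per symbol through a Binary Symmetric Channel
    with error rate p, assuming equally likely inputs."""
    if p <= 0 or p >= 1:
        return 1.0
    h_error = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1.0 - h_error    # H(Input) - H(Input | Output)

print(bsc_capacity(0.01))   # about 0.9192 bits per symbol
```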
25
How do we get channel capacity?
Let's assume that we use four bits at a time instead of one. Can we correct any single bit of error now? Yes! We just need to add on three extra bits:
A B C D E F G
0 0 0 0 0 0 0
0 1 0 0 0 1 1
0 1 0 1 1 0 0
1 1 0 0 1 0 1
E = A⊕C⊕D   F = A⊕B⊕D   G = B⊕C⊕D
Anything one bit away codes for the same input.
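A small Python sketch of this scheme, using the parity equations from the table; the syndrome-based correction step is an implementation detail not spelled out on the slide:

```python
def encode(a, b, c, d):
    """Append the three parity bits: E = A^C^D, F = A^B^D, G = B^C^D."""
    return [a, b, c, d, a ^ c ^ d, a ^ b ^ d, b ^ c ^ d]

def correct(word):
    """Repair any single flipped bit by recomputing the parity checks."""
    a, b, c, d, e, f, g = word
    s1, s2, s3 = e ^ a ^ c ^ d, f ^ a ^ b ^ d, g ^ b ^ c ^ d
    # Each possible single-bit error produces a distinct pattern of failed checks.
    syndrome_to_position = {
        (1, 1, 0): 0, (0, 1, 1): 1, (1, 0, 1): 2, (1, 1, 1): 3,
        (1, 0, 0): 4, (0, 1, 0): 5, (0, 0, 1): 6,
    }
    fixed = list(word)
    pos = syndrome_to_position.get((s1, s2, s3))
    if pos is not None:
        fixed[pos] ^= 1
    return fixed

codeword = encode(0, 1, 0, 0)         # [0, 1, 0, 0, 0, 1, 1], a row from the table
received = codeword[:]
received[2] ^= 1                      # one bit flipped by the channel
print(correct(received) == codeword)  # True
```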