1
EEL 5930 sec. 5 / 4930 sec. 7, Spring '05
Physical Limits of Computing
http://www.eng.fsu.edu/~mpf
  • Slides for a course taught by Michael P. Frank in
    the Department of Electrical & Computer
    Engineering

2
Module 2: Review of Basic Theory of Information &
Computation
  • Probability
  • Information Theory
  • Computation Theory

3
Outline of this Module
  • Topics covered in this module
  • Probability and statistics: some basic concepts
  • Some basic elements of information theory
  • Various usages of the word "information"
  • Measuring information
  • Entropy and physical information
  • Some basic elements of the theory of computation
  • Universality
  • Computational complexity
  • Models of computation

4
Review of Basic Probability and Statistics
Background
  • Events, Probabilities, Product Rule, Conditional
    & Mutual Probabilities, Expectation, Variance,
    Standard Deviation

5
Probability
  • In statistics, an event E is any possible
    situation (occurrence, state of affairs) that
    might or might not be the actual situation.
  • The proposition P = "the event E occurred (or
    will occur)" could turn out to be either true or
    false.
  • The probability of an event E is a real number p
    in the range [0,1] which gives our degree of
    belief in the proposition P, i.e., the
    proposition that E will/did occur, where
  • The value p = 0 means that P is false with
    complete certainty,
  • The value p = 1 means that P is true with
    complete certainty, and
  • The value p = ½ means that the truth value of P
    is completely unknown,
  • That is, as far as we know, it is equally likely
    to be either true or false.
  • The probability p(E) is also the fraction of
    times that we would expect the event E to occur
    in a repeated experiment.
  • That is, on average, if the experiment could be
    repeated infinitely often, and if each repetition
    was independent of the others.
  • If the probability of E is p, then we would
    expect E to occur once for every 1/p independent
    repetitions of the experiment, on average.
  • We'll call i = 1/p the improbability of E.

6
Joint Probability
  • Let X and Y be events, and let XY denote the
    event that events X and Y both occur together
    (that is, jointly).
  • Then p(XY) is called the joint probability of X
    and Y.
  • Product rule: If X and Y are independent events,
    then p(XY) = p(X)·p(Y).
  • This follows from basic combinatorics.
  • It can also be considered a definition of what it
    means for X and Y to be independent.

7
Event Complements, Mutual Exclusivity,
Exhaustiveness
  • For any event E, its complement ¬E is the event
    that E does not occur.
  • Complement rule: p(E) + p(¬E) = 1.
  • Two events E and F are called mutually exclusive
    if it is impossible for E and F to occur
    together.
  • That is, p(EF) = 0.
  • Note that E and ¬E are always mutually exclusive.
  • A set S = {E1, E2, …} of events is exhaustive if
    the event that some event in S occurs has
    probability 1.
  • Note that S = {E, ¬E} is an exhaustive set of
    events.
  • Theorem: The sum of the probabilities of any
    exhaustive set S of mutually exclusive events is
    1.

8
Conditional Probability
  • Let XY be the event that X and Y occur jointly.
  • Then the conditional probability of X given Y is
    defined by p(X|Y) = p(XY) / p(Y).
  • It is the probability that X occurs, given that
    we know Y occurs.
  • Bayes' rule: p(X|Y) = p(X) p(Y|X) / p(Y).
    (A small numerical sketch follows below.)

(Figure: the space of possible outcomes, showing event X, event Y, and
their overlap, the joint event XY.)
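A minimal Python sketch of the definitions above; the joint probabilities
here are hypothetical, chosen only for illustration:

    # Toy joint distribution p(X=x, Y=y) over two binary events.
    p_joint = {
        (1, 1): 0.30, (1, 0): 0.20,
        (0, 1): 0.10, (0, 0): 0.40,
    }

    def p_X(x):                 # marginal p(X=x) = sum over y of p(x, y)
        return sum(p for (xx, _), p in p_joint.items() if xx == x)

    def p_Y(y):                 # marginal p(Y=y) = sum over x of p(x, y)
        return sum(p for (_, yy), p in p_joint.items() if yy == y)

    def p_X_given_Y(x, y):      # conditional: p(X=x | Y=y) = p(x, y) / p(y)
        return p_joint[(x, y)] / p_Y(y)

    def p_Y_given_X(y, x):      # conditional: p(Y=y | X=x) = p(x, y) / p(x)
        return p_joint[(x, y)] / p_X(x)

    # Bayes' rule: p(X|Y) = p(X) * p(Y|X) / p(Y); both sides give 0.75 here.
    lhs = p_X_given_Y(1, 1)
    rhs = p_X(1) * p_Y_given_X(1, 1) / p_Y(1)
    print(lhs, rhs)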
9
Mutual Probability Ratio
  • The mutual probability ratio of X and Y is
    defined as r(X;Y) = p(XY)/(p(X)p(Y)).
  • Note that r(X;Y) = p(X|Y)/p(X) = p(Y|X)/p(Y).
  • I.e., r is the factor by which the probability of
    either X or Y gets boosted upon learning that the
    other event occurs.
  • WARNING: Some authors define the term "mutual
    probability" to be the reciprocal of our quantity
    r.
  • Don't get confused! I call that the mutual
    improbability ratio.
  • Note that for independent events, r = 1.
  • Whereas for dependent, positively correlated
    events, r > 1.
  • And for dependent, anti-correlated events, r < 1.

10
Expectation Values
  • Let S be an exhaustive set of mutually exclusive
    events Ei.
  • This is sometimes known as a sample space.
  • Let f(Ei) be any function of the events in S.
  • This is sometimes called a random variable.
  • The expectation value (or expected value, or norm)
    of f, written Ex[f] or ⟨f⟩, is just the mean or
    average value of f(Ei), as weighted by the
    probabilities of the events Ei.
  • WARNING: The expected value may actually be
    quite unexpected, or even impossible to occur!
  • It's not the ordinary English meaning of the word
    "expected."
  • Expected values combine linearly:
    Ex[af + g] = a·Ex[f] + Ex[g].

11
Variance Standard Deviation
  • The variance of a random variable f is σ²(f) =
    Ex[(f − Ex[f])²],
  • The expected value of the squared deviation of f
    from the norm. (The squaring makes it positive.)
  • The standard deviation or root-mean-square (RMS)
    difference of f from its mean is σ(f) =
    [σ²(f)]^(1/2).
  • This is usually comparable, in absolute
    magnitude, to a typical value of |f − Ex[f]|.
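A minimal Python sketch of expectation value, variance, and standard
deviation; the probabilities and values of f are hypothetical, chosen only
for illustration:

    # Sample space of 4 mutually exclusive, exhaustive events Ei.
    probs  = [0.5, 0.25, 0.125, 0.125]     # p(Ei), summing to 1
    f_vals = [1.0, 2.0, 3.0, 3.0]          # a random variable f(Ei)

    Ex_f  = sum(p * f for p, f in zip(probs, f_vals))                # Ex[f]
    var_f = sum(p * (f - Ex_f) ** 2 for p, f in zip(probs, f_vals))  # sigma^2(f)
    std_f = var_f ** 0.5                                             # sigma(f)

    print(Ex_f, var_f, std_f)   # 1.75, 0.6875, ~0.829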

12
The Theory of Information: Some Basic Concepts
  • Basic Information Concepts
  • Quantifying Information
  • Information and Entropy

13
Etymology of Information
  • Earliest historical usage in English (from Oxford
    English Dictionary)
  • The act of informing,
  • As in education, instruction, training.
  • "Five books come down from Heaven for information
    of mankind." (1387)
  • Or a particular item of training, i.e., a
    particular instruction.
  • "Melibee had heard the great skills and reasons
    of Dame Prudence, and her wise informations and
    techniques." (1386)
  • Derived by adding the action-noun ending "-ation"
    (descended from Latin's "-tio") to the
    pre-existing verb "to inform,"
  • Meaning to give form (shape) to the mind;
  • to discipline, instruct, teach.
  • "Men so wise should go and inform their kings."
    (1330)
  • And "inform" comes from Latin informare, derived
    from the noun forma (form);
  • Informare means "to give form to," or "to form an
    idea of."
  • Latin even already contained the derived word
    informatio,
  • meaning concept or idea.
  • Note: The Greek words εἶδος (eídos) and μορφή
    (morphé),
  • Meaning "form" or "shape,"
  • were famously used by Plato (& later Aristotle)
    in a technical philosophical sense, to denote the
    true identity or ideal essence of something.
  • We'll see that our modern concept of "physical
    information" is not too dissimilar!

14
Information Our Definition
  • Information is that which distinguishes one thing
    (entity) from another.
  • It is all or part of an identification or
    description of the thing.
  • A specification of some or all of its properties
    or characteristics.
  • We can say that every thing carries or embodies a
    complete description of itself.
  • Simply in virtue of its own being; this is called
    the entity's form or constitutive essence.
  • But, let us also take care to distinguish between
    the following
  • A nugget of information (for lack of a better
    phrase)
  • A specific instantiation (i.e., as found in a
    specific entity) of some general form.
  • A cloud or stream of information
  • A physical state or set of states, dynamically
    changing over time.
  • A form or pattern of information
  • An abstract pattern of information, as opposed to
    a specific instantiation.
  • Many separate nuggets of information contained in
    separate objects may have identical patterns, or
    content.
  • We may say that those nuggets are copies of each
    other.
  • An amount or quantity of information
  • A quantification of how large a given nugget,
    cloud, or pattern of information is.
  • Measured in logarithmic units, applied to the
    number of possible patterns.

15
Information-related concepts
  • It will also be convenient to discuss the
    following
  • An embodiment of information
  • The physical system that contains some particular
    nugget or cloud of information.
  • A symbol or message
  • A nugget of information or its embodiment
    produced with the intent that it should convey
    some specific meaning, or semantic content.
  • A message is typically a compound object
    containing a number of symbols.
  • An interpretation of information
  • A particular semantic interpretation of a form
    (pattern of information), tying it to potentially
    useful facts of interest.
  • May or may not be the intended meaning!
  • A representation of information
  • An encoding of one pattern of information within
    some other (frequently larger) pattern.
  • According to some particular language or code.
  • A subject of information
  • An entity that is identified or described by a
    given pattern of information.
  • May be abstract or concrete, mathematical or
    physical

16
Information Concept Map
(Concept map relating the notions: Meaning (interpretation of information),
Thing (subject or embodiment), Form (pattern of information), Nugget
(instance of a form), Cloud (dynamic body of information), Physical entity,
and Quantity of information, connected by relations such as "describes,
identifies," "interpreted to get," "represented by," "instantiated by/in,"
"instantiates, has," "forms, composes," "contains, carries, embodies,"
"has a changing," "may be a," and "measures (size of).")
17
Quantifying Information
  • One way to quantify forms is to try to count how
    many distinct ones there are.
  • The number of all conceivable forms is not
    finite.
  • However
  • Consider a situation defined in such a way that a
    given nugget (in the context of that situation)
    can only take on some definite number N of
    possible distinct forms.
  • One way to try to characterize the size of the
    nugget is then to specify the value of N.
  • This describes the amount of variability of its
    form.
  • However, N by itself does not seem to have the
    right mathematical properties to be used to
    describe the informational size of the nugget.

18
Compound Nuggets
  • Consider a nugget of information C formed by
    taking two separate and independent nuggets of
    information A, B, and considering them together
    as constituting a single compound nugget of
    information.
  • Suppose now also that A has N possible forms, and
    that B has M possible forms.
  • Clearly then, due to the product rule of
    combinatorics, C has N·M possible distinct
    forms.
  • Each is obtained by assigning a form to A and a
    form to B independently.
  • Would the size of the nugget C then be the
    product of the sizes of A and B?
  • It would seem more natural to say "sum," so that
    the whole is the sum of the parts.

(Figure: Nugget C, with N·M possible forms, is composed of Nugget A, with N
possible forms, and Nugget B, with M possible forms.)
19
Information Logarithmic Units
  • We can convert the product to a sum by using
    logarithmic units.
  • Let us then define the informational size I of
    (or amount of information contained in) a nugget
    of information that has N possible forms as being
    the indefinite logarithm of N, that is, as I =
    log N,
  • With an unspecified base for the logarithm.
  • We can interpret indefinite-logarithm values as
    being inherently dimensional (not dimensionless
    pure-number) quantities.
  • Any numeric result is always (explicitly or
    implicitly) paired with a unit "log b" which is
    associated with the base b of the logarithm that
    is used.
  • The unit log 2 is called the bit, the unit log
    10 is the decade or bel, log 16 is sometimes
    called a nybble, and log 256 is the byte.
  • Whereas the unit log e (most widely used in
    physics) is called the nat.
  • The nat is also expressed as Boltzmann's constant
    kB (e.g., in Joules/K),
  • A.k.a. the ideal gas constant R (frequently
    expressed in kcal/mol/K).

(Conversion between log units: a quantity is a number times a log unit,
and log a = (log_b a)·(log b) = (log_c a)·(log c). A short numerical
sketch follows below.)
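A minimal Python sketch of these unit conversions, assuming a hypothetical
nugget with N = 1024 possible forms:

    import math

    N = 1024                      # hypothetical number of possible forms

    I_bits  = math.log2(N)        # measured in the unit log 2 (bits)
    I_nats  = math.log(N)         # measured in the unit log e (nats)
    I_bytes = math.log(N, 256)    # measured in the unit log 256 (bytes)

    print(I_bits, I_nats, I_bytes)        # 10.0, ~6.93, 1.25
    # The same quantity: converting bits to nats via log a = (log_2 a)(log 2).
    print(I_bits * math.log(2), I_nats)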
20
The Size of a Form
  • Suppose that in some situation, a given nugget
    has N possible forms.
  • Then the size of the nugget is I = log N.
  • Can we also say that this is the size of each of
    the nugget's possible forms?
  • In a way, but we have to be a little bit careful.
  • We distinguish between two concepts:
  • The actual size I = log N of each form.
  • That is, given how the situation is described.
  • The entropy or compressed size S of each form.
  • Which we are about to define.

21
The Entropy of a Form
  • How can we measure the compressed size of an
    abstract form?
  • For this, we need a language that we can use to
    represent forms using concrete nuggets of
    linguistic information whose size we can measure.
  • We then say that the compressed size or entropy S
    of a form is the size of the smallest nugget of
    information representing it in our language.
    (Its most compressed description.)
  • At first, this seems pretty ambiguous, but
  • In their algorithmic information theory,
    Kolmogorov and Chaitin showed that this quantity
    is in fact almost language-independent.
  • It is invariant up to a language-dependent
    additive constant.
  • That is, among computationally universal
    (Turing-complete) languages.
  • Also, whenever we have a probability distribution
    over forms, Shannon shows us how to choose an
    encoding that minimizes the expected size of the
    codeword nugget that is needed.
  • If a probability distribution is available, we
    assume a language chosen to minimize the expected
    size of the nugget representing the form.
  • We define the compressed size or entropy of the
    form to be the size of its description in this
    optimal language.

22
The Optimal Encoding
  • Suppose a specific form F has probability p.
  • Thus, its improbability is i = 1/p.
  • Note that this is the same probability that F
    would have if it were one of i equally likely
    forms.
  • We saw earlier that a nugget of information
    having i possible forms is characterized as
    containing a quantity of information I = log i.
  • And the actual size of each form in that
    situation is the same, I.
  • If all forms are equally likely, their average
    compressed size can't be any less.
  • So, it seems reasonable to declare that the
    compressed size S of a form F with probability p
    is the same as its actual size in this situation,
    that is, S(F) = log i = log(1/p) = −log p.
  • This suggests that in the optimal encoding
    language, the description of the form F would be
    represented in a nugget of that size.
  • In his Mathematical Theory of Communication
    (1949) Claude Shannon showed that in fact this is
    exactly correct,
  • So long as we permit ourselves to consider
    encodings in which many similar systems (whose
    forms are chosen from the same distribution) are
    described together.
  • Modern block-coding schemes in fact closely
    approach Shannon's ideal encoding efficiency.

23
Optimal Encoding Example
  • Suppose a system has four forms A, B, C, D with
    the following probabilities
  • p(A) = ½, p(B) = ¼, p(C) = p(D) = 1/8.
  • Note that the probabilities sum to 1, as they
    must.
  • Then the corresponding improbabilities are
  • i(A) = 2, i(B) = 4, i(C) = i(D) = 8.
  • And the form sizes (log-improbabilities) are
  • S(A) = log 2 = 1 bit, S(B) = log 4 = 2 log 2 =
    2 bits, S(C) = S(D) = log 8 = 3 log 2 = 3 bits.
  • Indeed, in this example, we can encode the forms
    using bit-strings of exactly these lengths, as
    follows:
  • A = 0, B = 10, C = 110, D = 111.
  • Note that this code is self-delimiting:
  • the codewords can be concatenated together
    without ambiguity (see the sketch below).

(Figure: binary code tree; taking branch 0 or 1 at each node yields the
codewords A = 0, B = 10, C = 110, D = 111.)
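A minimal Python sketch reproducing this example: the ideal sizes −log2 p
match the codeword lengths, the expected code length equals the entropy
(1.75 bits), and the prefix property lets concatenated codewords decode
unambiguously:

    import math

    p    = {'A': 0.5, 'B': 0.25, 'C': 0.125, 'D': 0.125}
    code = {'A': '0', 'B': '10', 'C': '110', 'D': '111'}

    for form in p:
        ideal = -math.log2(p[form])          # S(form) = log(1/p) in bits
        print(form, ideal, len(code[form]))  # codeword length matches exactly

    avg_len = sum(p[f] * len(code[f]) for f in p)
    entropy = sum(p[f] * -math.log2(p[f]) for f in p)
    print(avg_len, entropy)                  # both 1.75 bits

    inv = {v: k for k, v in code.items()}    # codeword -> form

    def decode(bits):
        # The code is prefix-free, so a left-to-right scan is unambiguous.
        out, cur = [], ''
        for b in bits:
            cur += b
            if cur in inv:
                out.append(inv[cur])
                cur = ''
        return out

    print(decode('0101100'))                 # ['A', 'B', 'C', 'A']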
24
Entropy Content of a Nugget
  • Naturally, if we have a probability distribution
    over the possible forms F of a nugget,
  • We can easily calculate the expected entropy ⟨S⟩
    (expected compressed size) of the nugget's form.
  • This is possible since S itself is a random
    variable,
  • a function of the event that the system has a
    specific form F.
  • The expected entropy ⟨S⟩ of the nugget's form is
    then ⟨S⟩ = ∑F p(F)·S(F) = −∑F p(F) log p(F).
  • We usually drop the "expected," and just call
    this the amount of entropy S contained in the
    nugget.
  • It is really the expected compressed size of the
    nugget.

(Note the minus sign!)
25
Visualizing Boltzmann-Gibbs-Shannon Statistical
Entropy
26
Known vs. Unknown Information
  • We can consider the informational size I = log N
    of a nugget that has N forms as telling us the
    total amount of information that the nugget
    contains.
  • Meanwhile, we can consider its entropy S = ⟨log
    i(f)⟩ as telling us how much of the total
    information that it contains is unknown to us,
  • In the perspective specified by the distribution
    p(·).
  • Since S ≤ I, we can also define the amount of
    known information (or extropy) in the nugget as X
    = I − S.
  • Note that our probability distribution p(·) over
    the nugget's form could change (if we gain or
    lose knowledge about it),
  • Thus, the nugget's entropy S and extropy X may
    also change.
  • However, note that the total informational size
    of a given nugget, I = log N = X + S, always
    still remains a constant.
  • Entropy and extropy can be viewed as two forms of
    information, which can be converted to each
    other, but whose total amount is conserved.

27
Information/Entropy Example
  • Consider a tetrahedral die which may lie on any
    of its 4 faces, labeled 1, 2, 3, 4.
  • We say that the answer to the question "Which
    side is up?" is a nugget of information having 4
    possible forms.
  • Thus, the total amount of information contained
    in this nugget, and in the orientation of the
    physical die itself, is log 4 = 2 bits.
  • Now, suppose the die is weighted so that p(1) = ½,
    p(2) = ¼, and p(3) = p(4) = 1/8 for its post-throw
    state.
  • Then S(1) = 1 b, S(2) = 2 b, and S(3) = S(4) = 3 b.
  • The expected entropy is then S = 1.75 bits.
  • This much information remains unknown before the
    die is thrown.
  • The extropy (known information) is then X = 0.25
    bits.
  • Exactly one-fourth of a bit's worth of knowledge
    about the outcome is already expressed by this
    specific probability distribution p(·).
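A minimal Python sketch reproducing the die example's numbers (total
information, expected entropy, and extropy):

    import math

    p = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}   # post-throw probabilities

    I = math.log2(len(p))                                  # total info: 2 bits
    S = sum(pi * math.log2(1 / pi) for pi in p.values())   # expected entropy
    X = I - S                                              # extropy (known info)

    print(I, S, X)   # 2.0, 1.75, 0.25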

28
Nugget = Variable, Form = Value, and Types of Events
  • A nugget basically means a variable V,
  • Also associated with a set of possible values
    {v1, v2, …}.
  • Meanwhile, a form is basically a value v.
  • A primitive event is a proposition that assigns a
    specific form v to a specific nugget, V = v,
  • I.e., a specific value to a specific variable.
  • A compound event is a conjunctive proposition
    that assigns forms to multiple nuggets,
  • E.g., V = v, U = u, W = w.
  • A general event is a disjunctive set of primitive
    and/or compound events.
  • Essentially equivalent to a Boolean combination
    of assignment propositions.

29
Entropy of a Binary Variable
Below, little "s" of an individual form or probability denotes the
contribution to the total entropy of a form with that probability.
Maximum: s(p) = (1/e) nat = (lg e)/e bits ≈ 0.531
bits, at p = 1/e ≈ 0.368.
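A minimal Python sketch (a crude grid search, purely for illustration)
confirming that the per-form contribution s(p) = p·log2(1/p) peaks near
p = 1/e at about 0.531 bits:

    import math

    def s(p):                       # contribution of a form with probability p
        return p * math.log2(1 / p)

    p_max = max((i / 10000 for i in range(1, 10000)), key=s)
    print(p_max, s(p_max))                          # ~0.3679, ~0.5307
    print(1 / math.e, math.log2(math.e) / math.e)   # analytic values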
30
Joint Distributions over Two Nuggets
  • Let X, Y be two nuggets, each with many forms
    {x1, x2, …} and {y1, y2, …}.
  • Let xy represent the compound event X = x, Y = y.
  • Note that all the xy's are mutually exclusive and
    exhaustive.
  • Suppose we have available a joint probability
    distribution p(xy) over the nuggets X and Y.
  • This then implies the reduced or marginal
    distributions p(x) = ∑y p(xy) and p(y) = ∑x p(xy).
  • We also thus have conditional probabilities
    p(x|y) and p(y|x), according to the usual
    definitions.
  • And we have mutual probability ratios r(x;y).

31
Joint, Marginal, and Conditional Entropy, and Mutual
Information
  • The joint entropy S(XY) = ⟨log i(xy)⟩.
  • The (prior, marginal, or reduced) entropy S(X) =
    S(p(x)) = ⟨log i(x)⟩. Likewise for S(Y).
  • The entropy of each nugget, taken by itself.
  • Entropy is subadditive: S(XY) ≤ S(X) + S(Y).
  • The conditional entropy S(X|Y) = Ex_y[S(p(x|y))],
  • The expected entropy after Y is observed.
  • Theorem: S(X|Y) = S(XY) − S(Y). Joint entropy
    minus that of Y.
  • The mutual information I(X;Y) = Ex[log r(x;y)].
  • We will prove: Theorem: I(X;Y) = S(X) − S(X|Y).
  • Thus the mutual information is the expected
    reduction of entropy in either variable as a
    result of observing the other.
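A minimal Python sketch of these definitions for a small hypothetical
joint distribution (the numbers are made up for illustration); it computes
S(X), S(Y), and S(XY) directly, then S(X|Y) and I(X;Y) via the theorems
above:

    import math
    from collections import defaultdict

    p_xy = {('x1', 'y1'): 0.25, ('x1', 'y2'): 0.25,
            ('x2', 'y1'): 0.40, ('x2', 'y2'): 0.10}

    def H(dist):                      # Shannon entropy in bits
        return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

    p_x, p_y = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():    # marginal (reduced) distributions
        p_x[x] += p
        p_y[y] += p

    S_X, S_Y, S_XY = H(p_x), H(p_y), H(p_xy)
    S_X_given_Y = S_XY - S_Y          # conditional entropy theorem
    I_XY = S_X - S_X_given_Y          # mutual information

    print(S_X, S_Y, S_XY, S_X_given_Y, I_XY)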

32
Conditional Entropy Theorem
The conditional entropy of X given Y is the joint
entropy of XY minus the entropy of Y.
33
Mutual Information is Mutual Reduction in Entropy
And likewise, we also have I(X;Y) = S(Y) −
S(Y|X), since the definition is symmetric.
Also, I(X;Y) = S(X) + S(Y) − S(XY).
34
Visualization of Mutual Information
  • Let the total length of the bar below represent
    the total amount of entropy in the system XY.

(Bar diagram: the joint entropy S(XY) spans the whole bar; it decomposes as
S(XY) = S(X) + S(Y|X) = S(Y) + S(X|Y), where S(X|Y) and S(Y|X) are the
conditional entropies and S(X), S(Y) are the individual entropies.)
35
Example 1
  • Suppose the sample space of primitive events
    consists of 5-bit strings B = b1b2b3b4b5,
  • Chosen at random with equal probability (1/32).
  • Let variable X = b1b2b3b4, and Y = b3b4b5.
  • Then S(X) = 4 bits, and S(Y) = 3 b.
  • Meanwhile S(XY) = 5 b.
  • Thus S(X|Y) = 2 b, and S(Y|X) = 1 b.
  • And so I(X;Y) = 2 b.
36
Example 2
  • Let the sample space A consist of the 8 letters
    {a, b, c, d, e, f, g, h}. (All equally likely.)
  • Let X partition A into x1 = {a,b,c,d} and
    x2 = {e,f,g,h}.
  • Y partitions A into y1 = {a,b,e}, y2 = {c,f},
    y3 = {d,g,h}.
  • Then we have
  • S(X) = 1 bit.
  • S(Y) = 2(3/8 log 8/3) + (1/4 log 4) = 1.561278
    bits.
  • S(Y|X) = (1/2 log 2) + 2(1/4 log 4) = 1.5 bits.
  • I(X;Y) = 1.561278 b − 1.5 b = 0.061278 b.
  • S(XY) = 1 b + 1.5 b = 2.5 b.
  • S(X|Y) = 1 b − 0.061278 b = 0.938722 b.
    (These values are verified in the sketch below.)

(Figure: the 8 letters arranged in a 2×4 grid; the rows give the X
partition {a,b,c,d} and {e,f,g,h}, and the Y partition groups {a,b,e},
{c,f}, and {d,g,h}.)
(Meanwhile, the total information content of the
sample space = log 8 = 3 bits.)
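A minimal Python sketch verifying Example 2's numbers directly from the
two partitions:

    import math

    X = [set('abcd'), set('efgh')]            # the X partition of the 8 letters
    Y = [set('abe'), set('cf'), set('dgh')]   # the Y partition

    def H(blocks):
        # Entropy (in bits) of a partition of 8 equally likely letters.
        return sum((len(b) / 8) * math.log2(8 / len(b)) for b in blocks if b)

    S_X  = H(X)                                # 1 bit
    S_Y  = H(Y)                                # ~1.561278 bits
    S_XY = H([x & y for x in X for y in Y])    # joint (product) partition: 2.5 bits

    I_XY        = S_X + S_Y - S_XY             # ~0.061278 bits
    S_Y_given_X = S_XY - S_X                   # 1.5 bits
    S_X_given_Y = S_XY - S_Y                   # ~0.938722 bits
    print(S_X, S_Y, S_XY, I_XY, S_Y_given_X, S_X_given_Y)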
37
Physical Information
  • Now, physical information is simply information
    that is contained in the state of a physical
    system or subsystem.
  • We may speak of a holder, pattern, amount,
    subject, embodiment, meaning, cloud or
    representation of physical information, as with
    information in general.
  • Note that all information that we can manipulate
    ultimately must be (or be represented by)
    physical information!
  • So long as we are stuck in the physical universe!
  • In our quantum-mechanical universe, there are two
    very different categories of physical
    information
  • Quantum information is all the information that
    is embodied in the quantum state of a physical
    system.
  • Unfortunately, it can't all be measured or
    copied!
  • Classical information is just a piece of
    information that picks out a particular measured
    state, once a basis for measurement is already
    given.
  • It's the kind of information that we're used to
    thinking about.

38
Objective Entropy?
  • In all of this, we have defined entropy as a
    somewhat subjective or relative quantity
  • Entropy of a subsystem depends on an observer's
    state of knowledge about that subsystem, such as
    a probability distribution.
  • Wait a minute: Doesn't physics have a more
    objective, observer-independent definition of
    entropy?
  • Only insofar as there are preferred states of
    knowledge that are most readily achieved in the
    lab.
  • E.g., knowing of a gas only its chemical
    composition, temperature, pressure, volume, and
    number of molecules.
  • Since such knowledge is practically difficult to
    improve upon using present-day macroscale tools,
    it serves as a uniform standard.
  • However, in nanoscale systems, a significant
    fraction of the physical information that is
    present in one subsystem is subject to being
    known, or not, by another subsystem (depending on
    design).
  • ⇒ How a nanosystem is designed, and how we deal
    with information recorded at the nanoscale, may
    vastly affect how much of the nanosystem's
    internal physical information effectively is or
    is not entropy (for practical purposes).

39
Entropy in Compound Systems
  • When modeling a compound system C having at least
    two subsystems A and B, we can adopt either of
    (at least) two different perspectives
  • The external perspective where we treat AB as a
    single system, and we (as modelers) have some
    probability distribution over its states.
  • This allows us to derive an entropy for the whole
    system.
  • The internal perspective in which we imagine
    putting ourselves in the shoes of one of the
    subsystems (say A), and considering its state of
    knowledge about B.
  • A may have more knowledge about B than we do.
  • We'll see how to make the total expected entropy
    come out the same in both perspectives!

40
Beyond Statistical Entropy
41
Entropy as Information
  • A bit of history
  • Most of the credit for originating this concept
    really should go to Ludwig Boltzmann.
  • He (not Shannon) first characterized the entropy
    of a system as the expected log-improbability of
    its state, H = −∑(pi log pi).
  • He also discussed combinatorial reasons for its
    increase in his famous H-theorem.
  • Shannon brought Boltzmann's entropy to the
    attention of communication engineers,
  • And he taught us how to interpret Boltzmann's
    entropy as unknown information, in a
    communication-theory context.
  • von Neumann generalized Boltzmann entropy to
    quantum mixed states,
  • That is, the S = −Tr ρ ln ρ expression that we
    all know and love.
  • Jaynes clarified how the von Neumann entropy of a
    system can increase over time
  • Either when the Hamiltonian itself is unknown, or
    when we trace out entangled subsystems
  • Zurek suggested adding algorithmically
    incompressible information to the part of
    physical information that we consider to be
    entropy
  • I will discuss a variation on this theme.

42
Why go beyond the statistical definition of
entropy?
  • We may argue that the statistical concept of
    entropy is incomplete,
  • because it doesn't even begin to break down the
    ontology-epistemology barrier.
  • In the statistical view, a knower (such as
    ourselves) must always be invoked to supply a
    state of knowledge (probability distribution),
  • But we typically treat the knower as being
    fundamentally separate from the physical system
    itself.
  • However, in reality, we ourselves are part of the
    physical system that is our universe.
  • Thus, a complete understanding of entropy must
    also address what "knowledge" means, physically.

43
Small Physical Knowers
  • Of course, humans are extremely large complex
    physical systems, and to physically characterize
    our states of knowledge is a very long way off
  • However, we can hope to characterize the
    knowledge of simpler systems.
  • Computer engineers find that in practice, it can
    be very meaningful and useful to ascribe
    epistemological states even to extremely simple
    systems.
  • E.g., digital systems and their component
    subsystems.
  • When analyzing complex digital systems,
  • we constantly say things like, "At such-and-such
    time, component A knows such-and-such information
    about the state of component B."
  • Means, essentially, that there is a specific
    correlation between the states of A and B.
  • For nano-scale digital devices, we can strive to
    exactly characterize their logical states in
    mathematical physics terms
  • Thus we ought to be able to say exactly what it
    means, physically, for one component to know some
    information about another.

44
What we'd like to say
  • We want to formalize arguments such as the
    following
  • "Component A doesn't know the state of component
    B, so the physical information in B is entropy to
    component A. Component A can't destroy the
    entropy in B, due to the 2nd law of
    thermodynamics, and therefore A can't reset B to
    a standard state without expelling B's entropy to
    the environment."
  • We want all of these to be mathematically
    well-defined and physically meaningful
    statements, and we want the argument itself to be
    formally provable!
  • One motivation: A lot of head-in-the-sand
    technologists are still in a state of denial
    about Landauer's principle!
  • Oblivious erasure of non-entropy information
    turns it into entropy.
  • We need to be able to prove it to them with
    simple, undeniable, clear and correct arguments!
  • To get reversible/quantum computing more traction
    in industry.

45
Insufficiency of Statistical Entropy for Physical
Knowers
  • Unfortunately for this kind of program
  • If the ordinary statistical definition of entropy
    is used,
  • together with a knower that is fully defined as
    an actual physical system, then
  • The 2nd law of thermodynamics no longer holds!
  • Note: the unknown information in a system can be
    reduced.
  • Simply let the "knower" system perform a
    (coherent, reversible) measurement of the target
    system, to gain knowledge about the state of the
    target system!
  • The entropy of the target system (from the
    knower's perspective) is then reduced.
  • The 2nd law says there must be a corresponding
    increase in entropy somewhere, but where?
  • This is the essence of the Maxwell's Demon paradox.

46
Entropy in knowledge?
  • Resolution suggested by Bennett:
  • The demon's knowledge of the result of his
    measurement can itself be considered to
    constitute one form of entropy!
  • It must be expelled into the environment in order
    to reset his state.
  • But, what if we imagine ourselves in the demon's
    shoes?
  • Clearly, the demon's knowledge of the measurement
    result itself constitutes known information,
    from his own perspective!
  • I.e., the demon's own subjective posterior
    probability distribution that he would (or
    should) assess over the possible values of his
    knowledge of the result, after he has already
    obtained this knowledge, will be entirely
    concentrated on the actual outcome.
  • The statistical entropy of this distribution is
    zero!
  • So, here we have a type of entropy that is
    present in someone's (the demon's) own knowledge
    itself, and is not unknown information!
  • Needed: A way to make sense of this, and to
    mathematically quantify this "entropy of
    knowledge."

47
Quantifying the Entropy of Knowledge, Approach 1
  • The traditional position says: "In order to
    properly define the entropy in the demon's state
    of knowledge, we must always pop up to the
    meta-perspective from which we are describing the
    whole physical situation."
  • We ourselves always implicitly possess some
    probability distribution over the states of the
    joint demon-target system.
  • We should just take the statistical entropy of
    that distribution.
  • Problem: This approach doesn't face up to the
    fact that we are physical systems too!
  • It doesn't offer any self-consistent way that
    physical systems themselves can ever play the
    role of a knower!
  • I.e., describe other systems, assess subjective
    probability distributions over their state,
    modify those distributions via measurements, etc.
  • This contradicts our own personal physical
    experience,
  • as well as what we expect that quantum computers
    performing coherent measurements of other systems
    ought to be able to do.

48
Approach 2
  • The entropy inherent in some known information is
    the smallest size to which this information can
    be compressed.
  • But of course, this depends on the coding system.
  • Zurek suggests: use Kolmogorov complexity (the
    size of the shortest generating program).
  • But there are two problems with doing that:
  • It's only well-defined up to an additive
    constant,
  • That is, modulo a choice of universal programming
    language.
  • It's uncomputable!
  • What else might we try?

49
Approach 3 (We Suggest)
  • We propose: The entropy content of some known
    piece of information is its compressed size
    according to whatever encoding would have yielded
    the smallest expected compressed size, a priori.
  • That is, taking the expectation value over all
    the possible patterns of information before the
    actual one was obtained.
  • This is nice, because the expected value of
    posterior entropy then closely matches the
    ordinary statistical entropy of the prior
    distribution.
  • Even exactly, in special cases, or in the limit
    of many repetitions
  • Due to a simple application of Shannon's
    channel-capacity theorem.
  • We can then show that the 2nd law gets obeyed on
    average.
  • But, from whose a priori probability distribution
    is this expectation value of compressed size to
    be obtained?

(Equation: the expected length of the codeword ci
encoding information pattern i.)
50
Who picks the compressor?
  • Two possible answers to this:
  • Use our probability distribution when we
    originally describe and analyze the hypothetical
    situation from outside.
  • Although this is a bit distasteful, since here we
    are resorting to the meta-perspective again,
    which we were trying to avoid.
  • However, at least we do manage to sidestep the
    paradox.
  • Or, we can use the demon's own a priori
    assessment of the probabilities,
  • That is, essentially, let him pick his own
    compression system, however he wants!
  • The entropy of knowledge is then defined in a
    relative way, as the smallest size that a given
    entity with that knowledge would or could
    compress that knowledge to,
  • given a specification of its capabilities,
    together with any of its previous decisions and
    commitments as to the compression strategy it
    would use.

51
A Simple Example
  • Suppose we have a separable two-qubit system ab,
  • Where qubit a initially contains 1 bit of
    entropy,
  • I.e., it is described by the density operator
    ρa = ½|0⟩⟨0| + ½|1⟩⟨1|,
  • while qubit b is in a pure state (say |0⟩).
  • Its density operator (if we care) is ρb =
    |0⟩⟨0|.
  • Now, suppose we do a CNOT(a,b).
  • Can view this process as a measurement of qubit
    a by qubit b.
  • Qubit b could be considered a subsystem of some
    quantum "knower."
  • Assuming the observer knows that this process has
    occurred,
  • We can say that he now "knows" the state of a!
  • Since the state of a is now correlated with a
    part of b's own state.
  • I.e., from b's personal subjective point of
    view, bit a is no longer an unknown bit.
  • But it is still entropy, because the expected
    compressed size of an encoding of this data is
    still 1 bit!
  • This becomes clearer in a larger example.

(Figure: before the CNOT, ρa = ½|0⟩⟨0| + ½|1⟩⟨1| and b is in state |0⟩;
afterwards the joint state is ρab = ½|00⟩⟨00| + ½|11⟩⟨11|.)
52
Slightly Larger Example
  • Suppose system A initially contains 8 random
    qubits a0…a7, with a uniform distribution over
    their values,
  • A thus contains 8 bits of entropy.
  • And system B initially contains a large number
    of empty qubits b0, b1, ….
  • B contains 0 entropy initially.
  • Now, say we do CNOT(ai, bi) for i = 0 to 3.
  • B now "knows" the values of a0,…,a3.
  • The information in A that is unknown by B is now
    only the 4 other bits a4…a7.
  • But, the AB system also contains an additional 4
    bits of information about A (shared between A and
    B) which (though known by B) is (we expect) still
    incompressible by B.
  • I.e., the encoding that offers the minimum
    expected length (prior to learning a0…a3) still
    has an expected length of 4 bits!
  • A second CNOT(bi, ai) can allow B to reversibly
    clear the entropy from system A.
  • Note this is a Maxwell's Demon type of scenario.
  • Entropy isn't lost, because the incompressible
    information in B is still entropy!
  • From an outside observer's perspective, the
    amount of unknown information remains the same in
    all these situations,
  • But from an "inside" perspective, entropy can flow
    (reversibly) from known to unknown and back.
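A minimal Python sketch (a purely classical bit-level simulation, not a
quantum one) of the two CNOT stages described above; the variable names
are mine, not from the slides:

    import random

    A = [random.randint(0, 1) for _ in range(8)]   # 8 random bits in target A
    B = [0] * 8                                    # blank bits in demon B

    # Stage 1: CNOT(a_i -> b_i) for i = 0..3: B "measures" (copies) a0..a3.
    for i in range(4):
        B[i] ^= A[i]
    assert B[:4] == A[:4]          # B is now perfectly correlated with A

    # Stage 2: CNOT(b_i -> a_i): B reversibly clears those bits of A.
    for i in range(4):
        A[i] ^= B[i]
    assert A[:4] == [0, 0, 0, 0]   # the record now resides only in B (and a4..a7)
    print(A, B)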

53
Entropy Conversion
(Figure, left half: the target system A holds qubits a0…a7 with values
x0…x7; the demon system B holds qubits b0…b7, initially all 0, so all 8
bits of A are unknown to B. After CNOT(a0-3 → b0-3), B has reversibly
measured A: b0…b3 now hold copies of x0…x3, so 4 bits of A are known to B
(correlation) while 4 bits remain unknown to B. With these 4 bits of
knowledge, the 8 bits altogether are compressible to 4 bits.)
  • In all stages, there remain 8 total bits of
    entropy.
  • All 8 are unknown to us in our
    meta-perspective.
  • But some may be known to subsystem B!
  • Still call them entropy for B if we don't
    expect B can compress them away.

(Figure, right half: after CNOT(b0-3 → a0-3), B reversibly "controls" A:
qubits a0…a3 are cleared to 0, while b0…b3 still hold x0…x3 as 4
incompressible bits in B's internal state of knowledge. The remaining 4
bits a4…a7 of A stay unknown to B.)
54
Are we done?
  • I.e., have we arrived at a satisfactory
    generalization of the entropy concept?
  • Perhaps not quite, because
  • We've been vague about how to define the
    compression system that the knower would use,
  • Or in other words, the knower's prior
    distribution.
  • We haven't yet provided an operational definition
    (that can be replicably verified by a third
    party) of the meaning of
  • "The entropy of a physical system A, as assessed
    by another physical system (the knower) B."
  • However, there might be no way to do better.

55
One Possible Conclusion
  • Perhaps the entropy of a particular piece of
    known information can only be defined relative to
    a given description system.
  • Where by "description system" I mean a bijection
    between compressed and decompressed
    informational objects, ci ↔ di.
  • Most usefully, the map should be computable.
  • This is not really any worse than the situation
    with standard statistical entropy, where it is
    only defined relative to a given state of
    knowledge, in the form of a probability
    distribution over states of the system.
  • The existence of optimal compression systems for
    given probability distributions strengthens the
    connection.
  • In fact, we can also infer a probability
    distribution from the description system, in
    cases of optimal description systems
  • We could consider a description system, rather
    than a probability distribution, to be the
    fundamental starting point for any discussion of
    entropy.
  • But, can we do better?

56
The Entropy Game
  • A game (or adversarial protocol) between two
    players (A and B) that can be used to
    operationally define the entropy content of a
    given target physical system X.
  • X should have a well-defined state space,
    with N states and total information content Itot =
    log N.
  • Basic idea B must use A (reversibly) as a
    storage medium for data provided by C.
  • The entropy of C is defined as its total
    info. content, minus the expected logarithm of
    the number of messages that A can reliably store
    and retrieve from it.
  • Rules of the game:
  • A and B start out unentangled with each other
    (and with C).
  • A publishes his own exact initial classical
    state A0 in a public record.
  • B can probe A to make sure he is telling the
    truth.
  • Meanwhile, B prepares in secret any string W = W0
    of any number n of bits.
  • B passes his string W to A. A may observe its
    length n.
  • A may then carry out any fixed quantum algorithm
    Q1 operating on the closed joint system (A,X,W),
    under the condition:
  • The final state must leave (A,X,W) unentangled,
    A = A0, and W = 0^n.
  • B is allowed to probe A and W to verify that
    A = A0 and W = 0^n.
  • Finally, A carries out another fixed quantum
    algorithm Q2, returning again to his initial
    state A0, and supposedly restoring W to its
    initial state.
  • A returns W to B; B is allowed to check W and A
    again to verify that these conditions are
    satisfied.

Iterate till convergence.
Definition: The entropy of system X is C minus
the maximum over A's strategies (starting states
A0, and algorithms Q1, Q2) of the expectation
value (over states of X) of the minimum over B's
strategies (sequences of strings) of the average
length of those strings that are exactly
returned by A (in step 8) with zero probability
of error.
57
Intuitions behind the Game
  • A wants to show that X has a low entropy (high
    available storage capacity or extropy).
  • He will choose an encoding of strings W in X's
    state that is as efficient as possible.
  • A chooses his strategy without knowledge of what
    strings B will provide
  • The coding scheme must thus be very general.
  • Meanwhile, B wants to show that X has a high
    entropy (low capacity).
  • B will …

58
Explaining Entropy Increase
  • When the Hamiltonian of a closed system is
    exactly known,
  • The statistical (von Neumann) entropy of the
    system's density operator is exactly conserved,
  • In the traditional statistical view of entropy,
  • Entropy can only increase in one of the following
    situations:
  • (a) The Hamiltonian is not precisely known, or
  • (b) The system is not closed
  • Entropy can leak into the system from an unknown
    outside environment
  • (c) We estimate entropy by tracing over entangled
    subsystems
  • Take reduced density operators of individual
    subsystems
  • And pretend the entropy is additive
  • However, in the …

59
Extra Slides
  • Omitted from talk for lack of time

60
Information Content of a Physical System
  • The (total amount of) information content I(A) of
    an abstract physical system A is the unknown
    information content of the mathematical object D
    used to define A.
  • If D is (or implies) only a set S of (assumed
    equiprobable) states, then we have I(A) =
    U(S) = log |S|.
  • If D implies a probability distribution PS over
    a set S (of distinguishable states), then
    I(A) = U(PS) = −∑ Pi log Pi.
  • We would expect to gain I(A) information if we
    measured A (using basis set S) to find its exact
    actual state s ∈ S.
  • ⇒ we say that amount I(A) of information is
    contained in A.
  • Note that the information content depends on how
    broad (how abstract) the system's description D
    is!

61
Information Capacity Entropy
  • The information capacity of a system is also the
    amount of information about the actual state of
    the system that we do not know, given only the
    system's definition.
  • It is the amount of physical information that we
    can say is in the state of the system.
  • It is the amount of uncertainty we have about the
    state of the system, if we know only the system's
    definition.
  • It is also the quantity that is traditionally
    known as the (maximum) entropy S of the system.
  • Entropy was originally defined as the ratio of
    heat to temperature.
  • The importance of this quantity in thermodynamics
    (the observed fact that it never decreases) was
    first noticed by Rudolf Clausius in 1850.
  • Today we know that entropy is, physically, really
    nothing other than (unknown, incompressible)
    information!

62
Known vs. Unknown Information
  • We, as modelers, define what we mean by the
    system in question using some abstract
    description D.
  • This implies some information content I(A) for
    the abstract system A described by D.
  • But, we will often wish to model a scenario in
    which some entity E (perhaps ourselves) has more
    knowledge about the system A than is implied by
    its definition.
  • E.g., scenarios in which E has prepared A more
    specifically, or has measured some of its
    properties.
  • Such E will generally have a more specific
    description of A and thus would quote a lower
    resulting I(A) or entropy.
  • We can capture this by distinguishing the
    information in A that is known by E from that
    which is unknown.
  • Let us now see how to do this a little more
    formally.

63
Subsystems (More Generally)
  • For a system A defined by a state set S,
  • any partition P of S into subsets can be
    considered a subsystem B of A.
  • The subsets in the partition P can be considered
    the states of the subsystem B.

(Figure: one subsystem of A and another subsystem of A, shown as two
partitions of the state set S. In this example, the product of the two
partitions forms a partition of S into singleton sets; we say that this is
a complete set of subsystems of A. In this example, the two subsystems are
also independent.)
64
Pieces of Information
  • For an abstract system A defined by a state set
    S, any subset T ⊆ S is a possible piece of
    information about A.
  • Namely, it is the information "The actual state of
    A is some member of this set T."
  • For an abstract system A defined by a probability
    distribution PS, any probability distribution
    P′S such that P = 0 ⇒ P′ = 0 and U(P′) < U(P) is
    another possible piece of information about A.
  • That is, any distribution that is consistent with
    and more informative than A's very definition.

65
Known Physical Information
  • Within any universe (closed physical system) W
    described by distribution P, we say entity E (a
    subsystem of W) knows a piece P of the physical
    information contained in system A (another
    subsystem of W) iff P implies a correlation
    between the state of E and the state of A, and
    this correlation is meaningfully accessible to E.
  • Let us now see how to make this definition more
    precise.

(Figure: the universe W contains the entity (knower) E and the physical
system A, with a correlation between them.)
66
What is a correlation, anyway?
  • A concept from statistics
  • Two abstract systems A and B are correlated or
    interdependent when the entropy of the combined
    system S(AB) is less than that of S(A)S(B).
  • I.e., something is known about the combined state
    of AB that cannot be represented as knowledge
    about the state of either A or B by itself.
  • E.g. A,B each have 2 possible states 0,1
  • They each have 1 bit of entropy.
  • But, we might also know that AB, so the entropy
    of AB is 1 bit, not 2. (States 00 and 11.)
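A minimal Python sketch of this example, comparing the joint entropy with
and without the knowledge that A = B:

    import math

    def H(probs):
        # Shannon entropy in bits of a probability list.
        return sum(p * math.log2(1 / p) for p in probs if p > 0)

    S_A = H([0.5, 0.5])                    # 1 bit
    S_B = H([0.5, 0.5])                    # 1 bit
    S_AB_independent = H([0.25] * 4)       # 2 bits: states 00, 01, 10, 11
    S_AB_correlated  = H([0.5, 0.5])       # 1 bit: only states 00 and 11
    print(S_A + S_B, S_AB_independent, S_AB_correlated)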

67
Known Information, More Formally
  • For a system defined by a probability distribution
    P that includes two subsystems A, B with
    respective state variables X, Y having mutual
    information IP(X;Y):
  • The total information content of B is I(B) =
    U(PY).
  • The amount of information in B that is known by A
    is KA(B) = IP(X;Y).
  • The amount of information in B that is unknown by
    A is UA(B) = U(PY) − KA(B) = S(Y) − I(X;Y) =
    S(Y|X).
  • The amount of entropy in B, from A's perspective,
    is SA(B) = UA(B) = S(Y|X).
  • These definitions are based on all the
    correlations that are present between A and B,
    according to our global knowledge P.
  • However, a real entity A may not know,
    understand, or be able to utilize all the
    correlations that are actually present between
    him and B.
  • Therefore, generally more of B's physical
    information will be effectively entropy, from A's
    perspective, than is implied by this definition.
  • We will explore some corrections to this
    definition later.
  • Later, we will also see how to sensibly extend
    this definition to the quantum context.

68
Maximum Entropy vs. Entropy
Total information content I = maximum entropy
Smax = logarithm of # of states consistent with the
system's definition.
Unknown information UA = entropy SA (as seen by
observer A).
Known information KA = I − UA = Smax − SA (as
seen by observer A).
Unknown information UB = entropy SB (as seen by
observer B).
69
A Simple Example
  • A spin is a type of simple quantum system having
    only 2 distinguishable states.
  • In the z basis, the basis states are called up
    (↑) and down (↓).
  • In the example to the right, we have a compound
    system composed of 3 spins,
  • ⇒ it has 8 distinguishable states.
  • Suppose we know that the 4 crossed-out states
    have 0 amplitude (0 probability),
  • Due to prior preparation or measurement of the
    system.
  • Then the system contains
  • One bit of known information
  • in spin 2,
  • and two bits of entropy
  • in spins 1 & 3.

70
Entropy, as seen from the Inside
  • One problem with our previous definition of
    knowledge-dependent entropy based on mutual
    information is that it is only well-defined for
    an ensemble or probability distribution of
    observer states, not for a single observer state.
  • However, as observers, we always find ourselves
    in a particular state, not in an ensemble!
  • Can we obtain an alternative definition of
    entropy that works for (and can be used by)
    observers who are in individual states also?
  • While still obeying the 2nd law of
    thermodynamics?
  • Zurek proposed that entropy S should be defined
    to include not only unknown information U, but
    also incompressible information N.
  • By definition, incompressible information (even
    if it is known) cannot be reduced, therefore the
    validity of the 2nd law can be maintained.
  • Zurek proposed using a quantity called Kolmogorov
    complexity to measure the amount of
    incompressible information.
    Size of the shortest program that computes the
    given data.