Title: EEL 5930 sec. 5 / 4930 sec. 7, Spring '05: Physical Limits of Computing
1 EEL 5930 sec. 5 / 4930 sec. 7, Spring '05
Physical Limits of Computing
http://www.eng.fsu.edu/mpf
- Slides for a course taught by Michael P. Frank in
the Department of Electrical & Computer Engineering
2 Module 2: Review of Basic Theory of Information & Computation
- Probability
- Information Theory
- Computation Theory
3Outline of this Module
- Topics covered in this module
- Probability and statistics: some basic concepts
- Some basic elements of information theory
- Various usages of the word information
- Measuring information
- Entropy and physical information
- Some basic elements of the theory of computation
- Universality
- Computational complexity
- Models of computation
4 Review of Basic Probability and Statistics
Background
- Events, Probabilities, Product Rule, Conditional &
Mutual Probabilities, Expectation, Variance,
Standard Deviation
5 Probability
- In statistics, an event E is any possible
situation (occurrence, state of affairs) that
might or might not be the actual situation.
- The proposition P = "the event E occurred (or
will occur)" could turn out to be either true or
false.
- The probability of an event E is a real number p
in the range [0,1] which gives our degree of
belief in the proposition P, i.e., the
proposition that E will/did occur, where
- The value p = 0 means that P is false with
complete certainty,
- The value p = 1 means that P is true with
complete certainty, and
- The value p = ½ means that the truth value of P
is completely unknown
- That is, as far as we know, it is equally likely
to be either true or false.
- The probability p(E) is also the fraction of
times that we would expect the event E to occur
in a repeated experiment.
- That is, on average, if the experiment could be
repeated infinitely often, and if each repetition
was independent of the others.
- If the probability of E is p, then we would
expect E to occur once for every 1/p independent
repetitions of the experiment, on average.
- We'll call 1/p the improbability i of E.
6 Joint Probability
- Let X and Y be events, and let XY denote the
event that events X and Y both occur together
(that is, jointly).
- Then p(XY) is called the joint probability of X
and Y.
- Product rule: If X and Y are independent events,
then p(XY) = p(X) p(Y).  (See the sketch below.)
- This follows from basic combinatorics.
- It can also be considered a definition of what it
means for X and Y to be independent.
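The product rule is easy to check numerically. The following is a minimal sketch (not part of the original slides), assuming two events with illustrative probabilities 0.3 and 0.6 that are sampled independently; the estimated joint frequency should approach the product p(X) p(Y).

```python
# Minimal sketch: empirically checking the product rule p(XY) = p(X) p(Y)
# for two independently sampled events.  Probabilities are illustrative.
import random

random.seed(1)
N = 200_000
p_x, p_y = 0.3, 0.6          # assumed probabilities of events X and Y

count_x = count_y = count_xy = 0
for _ in range(N):
    x = random.random() < p_x    # X occurs
    y = random.random() < p_y    # Y occurs (drawn independently of X)
    count_x  += x
    count_y  += y
    count_xy += (x and y)

print("p(X)     ~", count_x  / N)   # ~0.30
print("p(Y)     ~", count_y  / N)   # ~0.60
print("p(XY)    ~", count_xy / N)   # ~0.18
print("p(X)p(Y) =", (count_x / N) * (count_y / N))
```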
7 Event Complements, Mutual Exclusivity,
Exhaustiveness
- For any event E, its complement ¬E is the event
that event E does not occur.
- Complement rule: p(E) + p(¬E) = 1.
- Two events E and F are called mutually exclusive
if it is impossible for E and F to occur
together.
- That is, p(EF) = 0.
- Note that E and ¬E are always mutually exclusive.
- A set S = {E1, E2, …} of events is exhaustive if
the event that some event in S occurs has
probability 1.
- Note that S = {E, ¬E} is an exhaustive set of
events.
- Theorem: The sum of the probabilities of any
exhaustive set S of mutually exclusive events is
1.
8 Conditional Probability
- Let XY be the event that X and Y occur jointly.
- Then the conditional probability of X given Y is
defined by p(X|Y) = p(XY) / p(Y).
- It is the probability that, if we are given that Y
occurs, X would also occur.
- Bayes' rule: p(X|Y) = p(X) p(Y|X) / p(Y).
(See the sketch after the figure below.)
[Figure: Venn diagram of the space of possible outcomes, showing event X,
event Y, and their intersection XY.]
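As an illustration of these definitions, here is a small sketch (not from the original slides) that applies the conditional-probability definition and Bayes' rule to an assumed prior p(X) and assumed conditionals p(Y|X), p(Y|¬X); the specific numbers are arbitrary.

```python
# Minimal sketch: conditional probability and Bayes' rule on a tiny example.
p_X          = 0.01                  # p(X): assumed prior probability of X
p_Y_given_X  = 0.95                  # p(Y|X)
p_Y_given_nX = 0.10                  # p(Y|¬X)

# Total probability of Y.
p_Y = p_Y_given_X * p_X + p_Y_given_nX * (1 - p_X)

# Bayes' rule: p(X|Y) = p(X) p(Y|X) / p(Y).
p_X_given_Y = p_X * p_Y_given_X / p_Y

# Equivalently, from the joint probability: p(X|Y) = p(XY) / p(Y).
p_XY = p_X * p_Y_given_X
assert abs(p_X_given_Y - p_XY / p_Y) < 1e-12

print(f"p(Y)   = {p_Y:.4f}")         # ~0.1085
print(f"p(X|Y) = {p_X_given_Y:.4f}") # ~0.0876
```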
9 Mutual Probability Ratio
- The mutual probability ratio of X and Y is
defined as r(XY) = p(XY) / (p(X) p(Y)).
- Note that r(XY) = p(X|Y)/p(X) = p(Y|X)/p(Y).
- I.e., r is the factor by which the probability of
either X or Y gets boosted upon learning that the
other event occurs.
- WARNING: Some authors define the term "mutual
probability" to be the reciprocal of our quantity
r.
- Don't get confused!  I call that the mutual
improbability ratio.
- Note that for independent events, r = 1.
- Whereas for dependent, positively correlated
events, r > 1.
- And for dependent, anti-correlated events, r < 1.
10 Expectation Values
- Let S be an exhaustive set of mutually exclusive
events Ei.
- This is sometimes known as a sample space.
- Let f(Ei) be any function of the events in S.
- This is sometimes called a random variable.
- The expectation value or expected value or norm
of f, written Ex[f] or ⟨f⟩, is just the mean or
average value of f(Ei), as weighted by the
probabilities of the events Ei.
- WARNING: The expected value may actually be
quite unexpected, or even impossible to occur!
- It's not the ordinary English meaning of the word
"expected."
- Expected values combine linearly:
Ex[af + g] = a Ex[f] + Ex[g].
11 Variance & Standard Deviation
- The variance of a random variable f is σ²(f) =
Ex[(f − Ex[f])²]
- The expected value of the squared deviation of f
from the norm.  (The squaring makes it positive.)
- The standard deviation or root-mean-square (RMS)
difference of f from its mean is σ(f) =
[σ²(f)]^(1/2).
- This is usually comparable, in absolute
magnitude, to a typical value of |f − Ex[f]|.
(See the sketch below.)
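A minimal sketch (not from the original slides) of these definitions, computing ⟨f⟩, σ²(f), and σ(f) over a small assumed sample space:

```python
# Minimal sketch: expectation value, variance, and standard deviation of a
# random variable f over a small sample space.  Values are illustrative.
events = ["E1", "E2", "E3", "E4"]
p      = {"E1": 0.5, "E2": 0.25, "E3": 0.125, "E4": 0.125}
f      = {"E1": 1.0, "E2": 2.0,  "E3": 3.0,   "E4": 3.0}

Ex_f  = sum(p[e] * f[e] for e in events)                 # ⟨f⟩
var_f = sum(p[e] * (f[e] - Ex_f) ** 2 for e in events)   # σ²(f)
std_f = var_f ** 0.5                                     # σ(f)

print(f"Ex[f] = {Ex_f:.3f}")    # 1.75
print(f"σ²(f) = {var_f:.4f}")   # 0.6875
print(f"σ(f)  = {std_f:.4f}")   # ~0.8292
```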
12 The Theory of Information: Some Basic Concepts
- Basic Information Concepts
- Quantifying Information
- Information and Entropy
13Etymology of Information
- Earliest historical usage in English (from Oxford
English Dictionary) - The act of informing,
- As in education, instruction, training.
- Five books come down from Heaven for information
of mankind. (1387) - Or a particular item of training, i.e., a
particular instruction. - Melibee had heard the great skills and reasons
of Dame Prudence, and her wise informations and
techniques. (1386)
- Derived by adding the action-noun ending "-ation"
(descended from Latin's "-tio") to the
pre-existing verb "to inform,"
- Meaning to give form (shape) to the mind;
- to discipline, instruct, teach.
- "Men so wise should go and inform their kings."
(1330)
- And "inform" comes from Latin informare, derived
from the noun forma (form),
- Informare means "to give form to," or "to form an
idea of."
- Latin also even already contained the derived
word informatio,
- meaning concept or idea.
- Note: The Greek words εἶδος (eîdos) and μορφή
(morphé),
- Meaning "form" or "shape,"
- were famously used by Plato (& later Aristotle)
in a technical philosophical sense, to denote the
true identity or ideal essence of something.
- We'll see that our modern concept of physical
information is not too dissimilar!
14 Information: Our Definition
- Information is that which distinguishes one thing
(entity) from another. - It is all or part of an identification or
description of the thing. - A specification of some or all of its properties
or characteristics. - We can say that every thing carries or embodies a
complete description of itself.
- Simply in virtue of its own being; this is called
the entity's form or constitutive essence.
- But, let us also take care to distinguish between
the following - A nugget of information (for lack of a better
phrase) - A specific instantiation (i.e., as found in a
specific entity) of some general form. - A cloud or stream of information
- A physical state or set of states, dynamically
changing over time. - A form or pattern of information
- An abstract pattern of information, as opposed to
a specific instantiation. - Many separate nuggets of information contained in
separate objects may have identical patterns, or
content. - We may say that those nuggets are copies of each
other. - An amount or quantity of information
- A quantification of how large a given nugget,
cloud, or pattern of information is. - Measured in logarithmic units, applied to the
number of possible patterns.
15Information-related concepts
- It will also be convenient to discuss the
following - An embodiment of information
- The physical system that contains some particular
nugget or cloud of information. - A symbol or message
- A nugget of information or its embodiment
produced with the intent that it should convey
some specific meaning, or semantic content. - A message is typically a compound object
containing a number of symbols. - An interpretation of information
- A particular semantic interpretation of a form
(pattern of information), tying it to potentially
useful facts of interest. - May or may not be the intended meaning!
- A representation of information
- An encoding of one pattern of information within
some other (frequently larger) pattern. - According to some particular language or code.
- A subject of information
- An entity that is identified or described by a
given pattern of information. - May be abstract or concrete, mathematical or
physical
16Information Concept Map
[Figure: concept map relating the information concepts defined above.  Nodes:
Meaning (interpretation of information), Quantity of information, Thing
(subject or embodiment), Form (pattern of information), Nugget (instance of a
form), Cloud (dynamic body of information), Physical entity.  Edges include:
a form describes/identifies a thing and is interpreted to get a meaning; a
thing is represented by, and instantiates, a form; a quantity of information
measures the size of a form, nugget, or cloud; a physical entity may be a
thing, contains/carries/embodies nuggets, and has a changing cloud.]
17Quantifying Information
- One way to quantify forms is to try to count how
many distinct ones there are. - The number of all conceivable forms is not
finite. - However
- Consider a situation defined in such a way that a
given nugget (in the context of that situation)
can only take on some definite number N of
possible distinct forms. - One way to try to characterize the size of the
nugget is then to specify the value of N. - This describes the amount of variability of its
form. - However, N by itself does not seem to have the
right mathematical properties to be used to
describe the informational size of the nugget
18Compound Nuggets
- Consider a nugget of information C formed by
taking two separate and independent nuggets of
information A, B, and considering them together
as constituting a single compound nugget of
information. - Suppose now also that A has N possible forms, and
that B has M possible forms. - Clearly then, due to the product rule of
combinatorics, C has NM possible distinct
forms. - Each is obtained by assigning a form to A and a
form to B independently. - Would the size of the nugget C then be the
product of the sizes of A and B? - It would seem more natural to say sum,so that
the whole is the sum of the parts.
Nugget C Has NM forms
Nugget A
Nugget B
N possibleforms
M possibleforms
19 Information & Logarithmic Units
- We can convert the product to a sum by using
logarithmic units.
- Let us then define the informational size I of
(or amount of information contained in) a nugget
of information that has N possible forms as being
the indefinite logarithm of N, that is, as I =
log N.
- With an unspecified base for the logarithm.
- We can interpret indefinite-logarithm values as
being inherently dimensional (not dimensionless
pure-number) quantities.
- Any numeric result is always (explicitly or
implicitly) paired with a unit "log b" which is
associated with the base b of the logarithm that
is used.
- The unit log 2 is called the bit, the unit log
10 is the decade or bel, log 16 is sometimes
called a nybble, and log 256 is the byte.
- Whereas, the unit log e (most widely used in
physics) is called the nat.
- The nat is also expressed as Boltzmann's constant
kB (e.g. in Joules/K),
- A.k.a. the ideal gas constant R (frequently
expressed in kcal/mol/K).
- Unit conversion:  Number × Log Unit is base-independent,
log a = (log_b a)(log b) = (log_c a)(log c).
(See the sketch below.)
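Treating information as a number paired with a log unit can be made concrete in code. The sketch below (not from the original slides) stores each unit as its value in natural-log units and converts between bits, nats, bytes, and bels using the identity above; the function name `convert` is just an illustrative choice.

```python
# Minimal sketch: information quantities as "number × log-unit" pairs,
# converted between units via  log a = (log_b a)·(log b).
import math

LOG_UNITS = {            # each unit is "log b" for some base b
    "bit":  math.log(2),
    "nat":  math.log(math.e),   # = 1; the natural-log unit
    "byte": math.log(256),
    "bel":  math.log(10),
}

def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an information quantity between logarithmic units."""
    return amount * LOG_UNITS[from_unit] / LOG_UNITS[to_unit]

I = math.log2(1024)                 # a nugget with N = 1024 forms: 10 bits
print(convert(I, "bit", "nat"))     # ~6.931 nats
print(convert(I, "bit", "byte"))    # 1.25 bytes
print(convert(1.0, "nat", "bit"))   # ~1.4427 bits per nat
```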
20 The Size of a Form
- Suppose that in some situation, a given nugget
has N possible forms.
- Then the size of the nugget is I = log N.
- Can we also say that this is the size of each of
the nugget's possible forms?
- In a way, but we have to be a little bit careful.
- We distinguish between two concepts:
- The actual size I = log N of each form.
- That is, given how the situation is described.
- The entropy or compressed size S of each form.
- Which we are about to define.
21 The Entropy of a Form
- How can we measure the compressed size of an
abstract form?
- For this, we need a language that we can use to
represent forms using concrete nuggets of
linguistic information whose size we can measure.
- We then say that the compressed size or entropy S
of a form is the size of the smallest nugget of
information representing it in our language.
(Its most compressed description.)
- At first, this seems pretty ambiguous, but…
- In their algorithmic information theory,
Kolmogorov and Chaitin showed that this quantity
is even almost language-independent.
- It is invariant up to a language-dependent additive
constant.
- That is, among computationally universal
(Turing-complete) languages.
- Also, whenever we have a probability distribution
over forms, Shannon shows us how to choose an
encoding that minimizes the expected size of the
codeword nugget that is needed.
- If a probability distribution is available, we
assume a language chosen to minimize the expected
size of the nugget representing the form.
- We define the compressed size or entropy of the
form to be the size of its description in this
optimal language.
22 The Optimal Encoding
- Suppose a specific form F has probability p.
- Thus, improbability i = 1/p.
- Note that this is the same probability that F
would have if it were one of i equally-likely
forms.
- We saw earlier that a nugget of information
having i possible forms is characterized as
containing a quantity of information I = log i.
- And the actual size of each form in that
situation is the same, I.
- If all forms are equally likely, their average
compressed size can't be any less.
- So, it seems reasonable to declare that the
compressed size S of a form F with probability p
is the same as its actual size in this situation,
that is, S(F) = log i = log(1/p) = −log p.
- This suggests that in the optimal encoding
language, the description of the form F would be
represented in a nugget of that size.
- In his Mathematical Theory of Communication
(1949), Claude Shannon showed that in fact this is
exactly correct,
- So long as we permit ourselves to consider
encodings in which many similar systems (whose
forms are chosen from the same distribution) are
described together.
- Modern block-coding schemes in fact closely
approach Shannon's ideal encoding efficiency.
23 Optimal Encoding Example
- Suppose a system has four forms A, B, C, D with
the following probabilities:
- p(A)=½, p(B)=¼, p(C)=p(D)=1/8.
- Note that the probabilities sum to 1, as they
must.
- Then the corresponding improbabilities are
- i(A)=2, i(B)=4, i(C)=i(D)=8.
- And the form sizes (log-improbabilities) are
- S(A) = log 2 = 1 bit, S(B) = log 4 = 2 log 2 =
2 bits, S(C) = S(D) = log 8 = 3 log 2 = 3 bits.
- Indeed, in this example, we can encode the forms
using bit-strings of exactly these lengths, as
follows:
- A=0, B=10, C=110, D=111.
- Note that this code is self-delimiting:
- the codewords can be concatenated together
without ambiguity.  (See the sketch after the
figure below.)
[Figure: binary code tree for this code.  Branching 0/1 from the root:
0 → A; 1, 0 → B; 1, 1, 0 → C; 1, 1, 1 → D.]
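A quick check of this example (not part of the original slides): the code below verifies that the codeword lengths equal −log2 p, that the code is prefix-free (self-delimiting), and that the expected code length equals the expected entropy of the distribution (1.75 bits).

```python
# Minimal sketch: verifying the optimal-encoding example.
import math

p    = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

# Codeword lengths match the form sizes S(F) = −log2 p(F).
for f in p:
    assert len(code[f]) == round(-math.log2(p[f]))

# Prefix-free (self-delimiting): no codeword is a prefix of another.
words = list(code.values())
assert not any(w1 != w2 and w2.startswith(w1) for w1 in words for w2 in words)

expected_length = sum(p[f] * len(code[f]) for f in p)
entropy         = -sum(p[f] * math.log2(p[f]) for f in p)
print(expected_length, entropy)      # both 1.75 bits
```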
24 Entropy Content of a Nugget
- Naturally, if we have a probability distribution
over the possible forms F of a nugget,
- We can easily calculate the expected entropy ⟨S⟩
(expected compressed size) of the nugget's form.
- This is possible since S itself is a random
variable,
- a function of the event that the system has a
specific form F.
- The expected entropy ⟨S⟩ of the nugget's form is
then  S = ⟨S(F)⟩ = Σ_F p(F) log(1/p(F)) = −Σ_F p(F) log p(F).
(Note the − sign!)
- We usually drop the "expected," and just call
this the amount of entropy S contained in the
nugget.
- It is really the expected compressed size of the
nugget.
25Visualizing Boltzmann-Gibbs-Shannon Statistical
Entropy
26Known vs. Unknown Information
- We can consider the informational size I = log N
of a nugget that has N forms as telling us the
total amount of information that the nugget
contains.
- Meanwhile, we can consider its entropy S = ⟨log
i(f)⟩ as telling us how much of the total
information that it contains is unknown to us.
- In the perspective specified by the distribution
p().
- Since S ≤ I, we can also define the amount of
known information (or extropy) in the nugget as
X = I − S.
- Note that our probability distribution p() over
the nugget's form could change (if we gain or
lose knowledge about it),
- Thus, the nugget's entropy S and extropy X may
also change.
- However, note that the total informational size
of a given nugget, I = log N = X + S, always
still remains a constant.
- Entropy and extropy can be viewed as two forms of
information, which can be converted to each
other, but whose total amount is conserved.
27 Information/Entropy Example
- Consider a tetrahedral die which may lie on any
of its 4 faces, labeled 1, 2, 3, 4.
- We say that the answer to the question "Which
side is up?" is a nugget of information having 4
possible forms.
- Thus, the total amount of information contained
in this nugget, and in the orientation of the
physical die itself, is log 4 = 2 bits.
- Now, suppose the die is weighted so that p(1)=½,
p(2)=¼, and p(3)=p(4)=1/8 for its post-throw
state.
- Then S(1)=1 b, S(2)=2 b, and S(3)=S(4)=3 b.
- The expected entropy is then S = 1.75 bits.
- This much information remains unknown before the
die is thrown.
- The extropy (known information) is then X = 0.25
bits.
- Exactly one-fourth of a bit's worth of knowledge
about the outcome is already expressed by this
specific probability distribution p().
(See the sketch below.)
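A minimal sketch (not from the original slides) reproducing these numbers for the weighted die:

```python
# Minimal sketch: total information, entropy, and extropy of the weighted
# tetrahedral die described above.
import math

p = {1: 1/2, 2: 1/4, 3: 1/8, 4: 1/8}   # post-throw probabilities

I = math.log2(len(p))                                   # total info, log N
S = sum(pi * math.log2(1 / pi) for pi in p.values())    # expected entropy ⟨S⟩
X = I - S                                               # extropy (known info)

print(f"I = {I} bits, S = {S} bits, X = {X} bits")      # 2, 1.75, 0.25
```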
28 Nugget = Variable, Form = Value, and Types of Events
- A nugget basically means a variable V.
- Also associated with a set of possible values
{v1, v2, …}.
- Meanwhile, a form is basically a value v.
- A primitive event is a proposition that assigns a
specific form v to a specific nugget, V=v.
- I.e., a specific value to a specific variable.
- A compound event is a conjunctive proposition
that assigns forms to multiple nuggets,
- E.g., V=v, U=u, W=w.
- A general event is a disjunctive set of primitive
and/or compound events.
- Essentially equivalent to a Boolean combination
of assignment propositions.
29 Entropy of a Binary Variable
Below, "little s" of an individual form or
probability denotes the contribution to the total
entropy of a form with that probability,
i.e., s(p) = p log(1/p).
Maximum: s(p) = (1/e) nat = (lg e)/e bits ≈ 0.531
bits, at p = 1/e ≈ 0.368.  (See the sketch below.)
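A small numerical check (not from the original slides) of the stated maximum, assuming s(p) = p·lg(1/p) in bits:

```python
# Minimal sketch: the per-form entropy contribution s(p) = p·lg(1/p), and a
# numerical check that it peaks at p = 1/e with value ≈ 0.531 bits.
import math

def s(p: float) -> float:
    """Entropy contribution (in bits) of a single form with probability p."""
    return 0.0 if p == 0 else p * math.log2(1 / p)

# Scan for the maximum over a fine grid of probabilities.
grid  = [i / 100000 for i in range(1, 100000)]
p_max = max(grid, key=s)

print(f"argmax ≈ {p_max:.4f}   (1/e ≈ {1/math.e:.4f})")
print(f"s(1/e) = {s(1/math.e):.4f} bits  (≈ 0.5307)")
print(f"binary entropy at p = 1/e: {s(1/math.e) + s(1 - 1/math.e):.4f} bits")
```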
30 Joint Distributions over Two Nuggets
- Let X, Y be two nuggets, each with many forms
{x1, x2, …} and {y1, y2, …}.
- Let xy represent the compound event X=x, Y=y.
- Note: all xy's are mutually exclusive and
exhaustive.
- Suppose we have available a joint probability
distribution p(xy) over the nuggets X and Y.
- This then implies the reduced or marginal
distributions p(x) = Σ_y p(xy) and p(y) = Σ_x p(xy).
- We also thus have conditional probabilities
p(x|y) and p(y|x), according to the usual
definitions.
- And we have mutual probability ratios r(xy).
31 Joint, Marginal, & Conditional Entropy, and Mutual Information
- The joint entropy S(XY) = ⟨log i(xy)⟩.
- The (prior, marginal, or reduced) entropy S(X) =
S(p(x)) = ⟨log i(x)⟩.  Likewise for S(Y).
- The entropy of each nugget, taken by itself.
- Entropy is subadditive: S(XY) ≤ S(X) + S(Y).
- The conditional entropy S(X|Y) = Ex_y[S(p(x|y))]
- The expected entropy after Y is observed.
- Theorem: S(X|Y) = S(XY) − S(Y).  Joint entropy
minus that of Y.
- The mutual information I(X;Y) = Ex[log r(xy)].
- We will prove: Theorem: I(X;Y) = S(X) − S(X|Y).
- Thus the mutual information is the expected
reduction of entropy in either variable as a
result of observing the other.
(See the sketch below.)
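The following sketch (not from the original slides) computes all of these quantities for a small assumed joint distribution p(xy) and checks subadditivity and the identity I(X;Y) = S(X) − S(X|Y); the conditional entropy is obtained via the theorem S(X|Y) = S(XY) − S(Y).

```python
# Minimal sketch: joint, marginal, and conditional entropies and mutual
# information from an assumed joint distribution p(xy).
import math

# p[(x, y)] — an arbitrary (correlated) joint distribution over 2×2 forms.
p = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
     ("x2", "y1"): 0.1, ("x2", "y2"): 0.4}

def H(probs):
    """Shannon entropy (bits) of an iterable of probabilities."""
    return -sum(q * math.log2(q) for q in probs if q > 0)

xs = sorted({x for x, _ in p})
ys = sorted({y for _, y in p})
p_x = {x: sum(p[(x, y)] for y in ys) for x in xs}   # marginal p(x)
p_y = {y: sum(p[(x, y)] for x in xs) for y in ys}   # marginal p(y)

S_XY = H(p.values())          # joint entropy S(XY)
S_X  = H(p_x.values())        # marginal entropy S(X)
S_Y  = H(p_y.values())        # marginal entropy S(Y)
S_X_given_Y = S_XY - S_Y      # conditional entropy S(X|Y)
I_XY = S_X + S_Y - S_XY       # mutual information I(X;Y)

print(f"S(X)={S_X:.3f}  S(Y)={S_Y:.3f}  S(XY)={S_XY:.3f}")
print(f"S(X|Y)={S_X_given_Y:.3f}  I(X;Y)={I_XY:.3f}")
assert S_XY <= S_X + S_Y + 1e-12                    # subadditivity
assert abs(I_XY - (S_X - S_X_given_Y)) < 1e-12      # I = S(X) − S(X|Y)
```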
32 Conditional Entropy Theorem
The conditional entropy of X given Y is the joint
entropy of XY minus the entropy of Y:
S(X|Y) = S(XY) − S(Y).
(A sketch of the derivation follows.)
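The proof itself appeared on the original slide as an image and is not reproduced in this transcript; the following is a brief reconstruction of the standard argument from the definitions above (all logarithms in the same indefinite unit):

```latex
\begin{align*}
S(X|Y) &= \sum_y p(y) \sum_x p(x|y)\,\log\frac{1}{p(x|y)}
        = \sum_{x,y} p(xy)\,\log\frac{p(y)}{p(xy)} \\
       &= \sum_{x,y} p(xy)\,\log\frac{1}{p(xy)}
          \;-\; \sum_y p(y)\,\log\frac{1}{p(y)}
        \;=\; S(XY) - S(Y).
\end{align*}
```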
33 Mutual Information is Mutual Reduction in Entropy
And likewise, we also have I(X;Y) = S(Y) −
S(Y|X), since the definition is symmetric.
Also, I(X;Y) = S(X) + S(Y) − S(XY).
34 Visualization of Mutual Information
- Let the total length of the bar below represent
the total amount of entropy in the system XY.
[Figure: a bar of length S(XY), the joint entropy of X and Y.  One segment is
S(X), the entropy of X, and an overlapping segment is S(Y), the entropy of Y;
their overlap is I(X;Y).  The part of S(X) outside the overlap is S(X|Y), the
conditional entropy of X given Y, and the part of S(Y) outside the overlap is
S(Y|X), the conditional entropy of Y given X.]
35 Example 1
- Suppose the sample space of primitive events
consists of 5-bit strings B = b1b2b3b4b5,
- Chosen at random with equal probability (1/32).
- Let variable X = b1b2b3b4, and Y = b3b4b5.
- Then S(X) = 4 bits, and S(Y) = 3 b.
- Meanwhile S(XY) = 5 b.
- Thus S(X|Y) = 2 b, and S(Y|X) = 1 b.
- And so I(X;Y) = 2 b.
36 Example 2
- Let the sample space A consist of the 8 letters
{a, b, c, d, e, f, g, h}.  (All equally likely.)
- Let X partition A into x1={a,b,c,d} and
x2={e,f,g,h}.
- Y partitions A into y1={a,b,e}, y2={c,f},
y3={d,g,h}.
- Then we have:
- S(X) = 1 bit.
- S(Y) = 2(3/8 log 8/3) + (1/4 log 4) ≈ 1.561278
bits.
- S(Y|X) = (1/2 log 2) + 2(1/4 log 4) = 1.5 bits.
- I(X;Y) = 1.561278 b − 1.5 b ≈ 0.061278 b.
- S(XY) = 1 b + 1.5 b = 2.5 b.
- S(X|Y) = 1 b − 0.061278 b ≈ 0.938722 b.
(See the sketch after the figure below.)
[Figure: the 8 letters arranged in a grid; X splits them into the rows
{a,b,c,d} and {e,f,g,h}, while Y groups them into y1={a,b,e}, y2={c,f},
y3={d,g,h}.]
(Meanwhile, the total information content of the
sample space is log 8 = 3 bits.)
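A brute-force check of these numbers (not part of the original slides), treating X and Y as labelings of the 8 equally likely letters:

```python
# Minimal sketch: verifying the Example 2 numbers by brute force.
import math
from collections import Counter

letters = "abcdefgh"                       # each with probability 1/8
X = {**{c: "x1" for c in "abcd"}, **{c: "x2" for c in "efgh"}}
Y = {**{c: "y1" for c in "abe"}, **{c: "y2" for c in "cf"},
     **{c: "y3" for c in "dgh"}}

def H(labels):
    """Entropy (bits) of the partition induced by a labeling of the letters."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

S_X  = H(X[c] for c in letters)                     # 1 bit
S_Y  = H(Y[c] for c in letters)                     # ~1.561278 bits
S_XY = H((X[c], Y[c]) for c in letters)             # 2.5 bits
print(f"S(X)={S_X}, S(Y)={S_Y:.6f}, S(XY)={S_XY}")
print(f"S(Y|X)={S_XY - S_X}")                       # 1.5 bits
print(f"I(X;Y)={S_X + S_Y - S_XY:.6f}")             # ~0.061278 bits
print(f"S(X|Y)={S_XY - S_Y:.6f}")                   # ~0.938722 bits
```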
37 Physical Information
- Now, physical information is simply information
that is contained in the state of a physical
system or subsystem.
- We may speak of a holder, pattern, amount,
subject, embodiment, meaning, cloud, or
representation of physical information, as with
information in general.
- Note that all information that we can manipulate
ultimately must be (or be represented by)
physical information!
- So long as we are stuck in the physical universe!
- In our quantum-mechanical universe, there are two
very different categories of physical
information:
- Quantum information is all the information that
is embodied in the quantum state of a physical
system.
- Unfortunately, it can't all be measured or
copied!
- Classical information is just a piece of
information that picks out a particular measured
state, once a basis for measurement is already
given.
- It's the kind of information that we're used to
thinking about.
38 Objective Entropy?
- In all of this, we have defined entropy as a
somewhat subjective or relative quantity:
- Entropy of a subsystem depends on an observer's
state of knowledge about that subsystem, such as
a probability distribution.
- Wait a minute!  Doesn't physics have a more
objective, observer-independent definition of
entropy?
- Only insofar as there are preferred states of
knowledge that are most readily achieved in the
lab.
- E.g., knowing of a gas only its chemical
composition, temperature, pressure, volume, and
number of molecules.
- Since such knowledge is practically difficult to
improve upon using present-day macroscale tools,
it serves as a uniform standard.
- However, in nanoscale systems, a significant
fraction of the physical information that is
present in one subsystem is subject to being
known, or not, by another subsystem (depending on
design).
- ⇒ How a nanosystem is designed & how we deal with
information recorded at the nanoscale may vastly
affect how much of the nanosystem's internal
physical information effectively is or is not
entropy (for practical purposes).
39Entropy in Compound Systems
- When modeling a compound system C having at least
two subsystems A and B, we can adopt either of
(at least) two different perspectives - The external perspective where we treat AB as a
single system, and we (as modelers) have some
probability distribution over its states. - This allows us to derive an entropy for the whole
system. - The internal perspective in which we imagine
putting ourselves in the shoes of one of the
subsystems (say A), and considering its state of
knowledge about B. - A may have more knowledge about B than we do.
- We'll see how to make the total expected entropy
come out the same in both perspectives!
40Beyond Statistical Entropy
41 Entropy as Information
- A bit of history:
- Most of the credit for originating this concept
really should go to Ludwig Boltzmann.
- He (not Shannon) first characterized the entropy
of a system as the expected log-improbability of
its state, H = −Σ(pi log pi).
- He also discussed combinatorial reasons for its
increase in his famous H-theorem.
- Shannon brought Boltzmann's entropy to the
attention of communication engineers,
- And he taught us how to interpret Boltzmann's
entropy as unknown information, in a
communication-theory context.
- von Neumann generalized Boltzmann entropy to
quantum mixed states,
- That is, the S = −Tr ρ ln ρ expression that we
all know and love.
- Jaynes clarified how the von Neumann entropy of a
system can increase over time,
- Either when the Hamiltonian itself is unknown, or
when we trace out entangled subsystems.
- Zurek suggested adding algorithmically
incompressible information to the part of
physical information that we consider to be
entropy.
- I will discuss a variation on this theme.
42Why go beyond the statistical definition of
entropy?
- We may argue the statistical concept of entropy
is incomplete,
- because it doesn't even begin to break down the
ontology-epistemology barrier.
- In the statistical view, a knower (such as
ourselves) must always be invoked to supply a
state of knowledge (probability distribution) - But we typically treat the knower as being
fundamentally separate from the physical system
itself. - However, in reality, we ourselves are part of the
physical system that is our universe - Thus, a complete understanding of entropy must
also address what knowledge means, physically
43Small Physical Knowers
- Of course, humans are extremely large complex
physical systems, and to physically characterize
our states of knowledge is a very long way off - However, we can hope to characterize the
knowledge of simpler systems. - Computer engineers find that in practice, it can
be very meaningful and useful to ascribe
epistemological states even to extremely simple
systems. - E.g., digital systems and their component
subsystems. - When analyzing complex digital systems,
- we constantly say things like, At such-and-such
time, component A knows such-and-such information
about the state of component B - Means, essentially, that there is a specific
correlation between the states of A and B. - For nano-scale digital devices, we can strive to
exactly characterize their logical states in
mathematical physics terms - Thus we ought to be able to say exactly what it
means, physically, for one component to know some
information about another.
44 What we'd like to say
- We want to formalize arguments such as the
following:
- "Component A doesn't know the state of component
B, so the physical information in B is entropy to
component A.  Component A can't destroy the
entropy in B, due to the 2nd law of
thermodynamics, and therefore A can't reset B to
a standard state without expelling B's entropy to
the environment."
- We want all of these to be mathematically
well-defined and physically meaningful
statements, and we want the argument itself to be
formally provable!
- One motivation: A lot of head-in-the-sand
technologists are still in a state of denial
about Landauer's principle!
- Oblivious erasure of non-entropy information
turns it into entropy.
- We need to be able to prove it to them with
simple, undeniable, clear and correct arguments!
- To get reversible/quantum computing more traction
in industry.
45 Insufficiency of Statistical Entropy for Physical
Knowers
- Unfortunately for this kind of program:
- If the ordinary statistical definition of entropy
is used,
- together with a knower that is fully defined as
an actual physical system, then
- The 2nd law of thermodynamics no longer holds!
- Note: the unknown information in a system can be
reduced.
- Simply let the knower system perform a
(coherent, reversible) measurement of the target
system, to gain knowledge about the state of the
target system!
- The entropy of the target system (from the knower's
perspective) is then reduced.
- The 2nd law says there must be a corresponding
increase in entropy somewhere, but where?
- This is the essence of the Maxwell's Demon paradox.
46 Entropy in knowledge?
- Resolution suggested by Bennett:
- The demon's knowledge of the result of his
measurement can itself be considered to
constitute one form of entropy!
- It must be expelled into the environment in order to
reset his state.
- But, what if we imagine ourselves in the demon's
shoes?
- Clearly, the demon's knowledge of the measurement
result itself constitutes known information,
from his own perspective!
- I.e., the demon's own subjective posterior
probability distribution that he would (or
should) assess over the possible values of his
knowledge of the result, after he has already
obtained this knowledge, will be entirely
concentrated on the actual outcome.
- The statistical entropy of this distribution is
zero!
- So, here we have a type of entropy that is
present in someone's (the demon's) own knowledge
itself, and is not unknown information!
- Needed: A way to make sense of this, and to
mathematically quantify this entropy of
knowledge.
47 Quantifying the Entropy of Knowledge, Approach 1
- The traditional position says: "In order to
properly define the entropy in the demon's state
of knowledge, we must always pop up to the
meta-perspective from which we are describing the
whole physical situation.
- We ourselves always implicitly possess some
probability distribution over the states of the
joint demon-target system.
- We should just take the statistical entropy of
that distribution."
- Problem: This approach doesn't face up to the
fact that we are physical systems too!
- It doesn't offer any self-consistent way that
physical systems themselves can ever play the
role of a knower!
- I.e., describe other systems, assess subjective
probability distributions over their state,
modify those distributions via measurements, etc.
- This contradicts our own personal physical
experience,
- as well as what we expect that quantum computers
performing coherent measurements of other systems
ought to be able to do.
48 Approach 2
- The entropy inherent in some known information is
the smallest size to which this information can
be compressed.
- But of course, this depends on the coding system.
- Zurek suggests: use Kolmogorov complexity.  (Size
of the shortest generating program.)
- But there are two problems with doing that:
- It's only well-defined up to an additive
constant.
- That is, modulo a choice of universal programming
language.
- It's uncomputable!
- What else might we try?
49 Approach 3 (We Suggest)
- We propose: The entropy content of some known
piece of information is its compressed size
according to whatever encoding would have yielded
the smallest expected compressed size, a priori.
- That is, taking the expectation value over all
the possible patterns of information before the
actual one was obtained.
- This is nice, because the expected value of
posterior entropy then closely matches the
ordinary statistical entropy of the prior
distribution.
- Even exactly, in special cases, or in the limit
of many repetitions,
- Due to a simple application of Shannon's
channel-capacity theorem.
- We can then show that the 2nd law gets obeyed on
average.
- But, from whose a priori probability distribution
is this expectation value of compressed size to
be obtained?
(Formula caption: expected length of the codeword ci
encoding information pattern i.)
(See the sketch below.)
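One concrete way to realize "the encoding with the smallest expected compressed size, a priori" is a Huffman code built from the prior distribution. The sketch below (not from the original slides) is only an illustration of that idea: for the assumed dyadic prior the expected codeword length exactly equals the prior statistical entropy, and in general it is within one bit of it.

```python
# Minimal sketch: choose the encoding a priori (here, a Huffman code built
# from the prior), then compare its expected length with the prior entropy.
import heapq
import math

def huffman_lengths(p: dict) -> dict:
    """Return codeword lengths of a Huffman code for distribution p."""
    heap = [(q, i, {sym: 0}) for i, (sym, q) in enumerate(p.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        q1, _, d1 = heapq.heappop(heap)
        q2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (q1 + q2, counter, merged))
        counter += 1
    return heap[0][2]

prior   = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}   # assumed prior
lengths = huffman_lengths(prior)
expected_size = sum(prior[s] * lengths[s] for s in prior)
entropy       = -sum(q * math.log2(q) for q in prior.values())
print(lengths)                     # e.g. {'A': 1, 'B': 2, 'C': 3, 'D': 3}
print(expected_size, entropy)      # 1.75, 1.75
```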
50 Who picks the compressor?
- Two possible answers to this:
- Use our probability distribution when we
originally describe and analyze the hypothetical
situation from outside.
- Although this is a bit distasteful, since here we
are resorting to the meta-perspective again,
which we were trying to avoid.
- However, at least we do manage to sidestep the
paradox.
- Or, we can use the demon's own a priori
assessment of the probabilities.
- That is, essentially, let him pick his own
compression system, however he wants!
- The entropy of knowledge is then defined in a
relative way, as the smallest size that a given
entity with that knowledge would or could
compress that knowledge to,
- given a specification of its capabilities,
together with any of its previous decisions &
commitments as to the compression strategy it
would use.
51 A Simple Example
- Suppose we have a separable two-qubit system ab,
- Where qubit a initially contains 1 bit of
entropy,
- I.e., it is described by the density operator
ρa = ½|0⟩⟨0| + ½|1⟩⟨1|,
- while qubit b is in a pure state (say |0⟩).
- Its density operator (if we care) is ρb =
|0⟩⟨0|.
- Now, suppose we do a CNOT(a,b).
- Can view this process as a measurement of qubit
a by qubit b.
- Qubit b could be considered a subsystem of some
quantum "knower."
- Assuming the observer knows that this process has
occurred,
- We can say that he now "knows" the state of a!
- Since the state of a is now correlated with a
part of b's own state.
- I.e., from b's personal subjective point of
view, bit a is no longer an unknown bit.
- But it is still entropy, because the expected
compressed size of an encoding of this data is
still 1 bit!
- This becomes clearer in a larger example.
[Figure: initially ρa = ½|0⟩⟨0| + ½|1⟩⟨1| and ψb = |0⟩; after the CNOT,
ρab = ½|00⟩⟨00| + ½|11⟩⟨11|.]
(See the sketch below.)
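The same example can be worked out numerically in the density-operator picture. The following sketch (not from the original slides) builds ρab = ρa ⊗ |0⟩⟨0|, applies CNOT(a,b), and evaluates S = −Tr ρ log2 ρ from the eigenvalues; the helper names are illustrative.

```python
# Minimal sketch: the two-qubit example in density-matrix form.
import numpy as np

ket0 = np.array([[1.0], [0.0]])
ket1 = np.array([[0.0], [1.0]])
proj = lambda k: k @ k.T                       # |k><k| for real kets

rho_a  = 0.5 * proj(ket0) + 0.5 * proj(ket1)   # qubit a: 1 bit of entropy
rho_b  = proj(ket0)                            # qubit b: pure state |0>
rho_ab = np.kron(rho_a, rho_b)                 # initial joint state

# CNOT with a as control, b as target (basis order |ab> = 00, 01, 10, 11).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
rho_ab_after = CNOT @ rho_ab @ CNOT.T

def von_neumann_entropy_bits(rho):
    """S(rho) = -Tr rho log2 rho, computed from the eigenvalues."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

print(np.round(rho_ab_after, 3))                 # ½|00><00| + ½|11><11|
print(von_neumann_entropy_bits(rho_ab_after))    # 1.0 bit overall
# Reduced state of b after the CNOT: also maximally mixed (1 bit).
rho_b_after = rho_ab_after.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
print(von_neumann_entropy_bits(rho_b_after))     # 1.0 bit
```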
52 Slightly Larger Example
- Suppose system A initially contains 8 random
qubits a0…a7, with a uniform distribution over
their values.
- A thus contains 8 bits of entropy.
- And system B initially contains a large number
b0, … of empty qubits.
- B contains 0 entropy initially.
- Now, say we do CNOT(ai, bi) for i = 0 to 3.
- B now "knows" the values of a0,…,a3.
- The information in A that is unknown by B is now
only the 4 other bits a4…a7.
- But, the AB system also contains an additional 4
bits of information about A (shared between A and
B) which (though known by B) is (we expect) still
incompressible by B.
- I.e., the encoding that offers the minimum
expected length (prior to learning a0…a3) still
has an expected length of 4 bits!
- A second CNOT(bi, ai) can allow B to reversibly
clear the entropy from system A.
- Note: this is a Maxwell's Demon type of scenario.
- Entropy isn't lost, because the incompressible
information in B is still entropy!
- From an outside observer's perspective, the
amount of unknown information remains the same in
all these situations.
- But from an inside perspective, entropy can flow
(reversibly) from known to unknown and back.
(A small classical simulation of the two CNOT stages follows.)
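Here is a tiny classical simulation (not from the original slides) of the two CNOT stages described above, on one random sample of A's bits; it only illustrates the copy-then-uncompute bookkeeping, not the full quantum treatment.

```python
# Minimal sketch: CNOT(a_i -> b_i) copies a0..a3 into B ("B measures A"),
# then CNOT(b_i -> a_i) uses B's copy to reversibly clear those bits of A.
import random

random.seed(0)
A = [random.randint(0, 1) for _ in range(8)]   # 8 random bits of entropy
B = [0] * 8                                    # B starts empty

print("before:        A =", A, " B =", B)

for i in range(4):                 # CNOT(a_i -> b_i): B measures/copies A
    B[i] ^= A[i]
print("after measure: A =", A, " B =", B)      # B now holds a0..a3

for i in range(4):                 # CNOT(b_i -> a_i): B reversibly clears A
    A[i] ^= B[i]
print("after clear:   A =", A, " B =", B)      # a0..a3 are now 0
```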
53Entropy Conversion
[Figure, first two panels: Target system A holds qubits a0…a7 with values
x0…x7 (initially, 8 bits all unknown to B); demon system B holds qubits
b0…b7, initially all 0.  After CNOT(a0-3 → b0-3), i.e. after B (reversibly)
measures A, b0…b3 hold x0…x3: 4 bits of A are known to B (correlation) and
4 bits of A remain unknown to B.  Labels: "4 bits of knowledge"; "8 bits,
all together compressible to 4 bits."]
- In all stages, there remain 8 total bits of
entropy.
- All 8 are unknown to us in our
meta-perspective.
- But some may be known to subsystem B!
- Still call them entropy for B if we don't
expect B can compress them away.
[Figure, last panel: after CNOT(b0-3 → a0-3), i.e. after B (reversibly)
controls A, a0…a3 are reset to 0 (x4…x7 remain in A, still unknown to B),
while b0…b3 still hold x0…x3: 4 incompressible bits in B's internal state of
knowledge.]
54 Are we done?
- I.e., have we arrived at a satisfactory
generalization of the entropy concept?
- Perhaps not quite, because:
- We've been vague about how to define the
compression system that the knower would use.
- Or in other words, the knower's prior
distribution.
- We haven't yet provided an operational definition
(one that can be replicably verified by a third
party) of the meaning of
- "The entropy of a physical system A, as assessed
by another physical system (the knower) B."
- However, there might be no way to do better.
55 One Possible Conclusion
- Perhaps the entropy of a particular piece of
known information can only be defined relative to
a given description system.
- Where by "description system" I mean a bijection
between compressed & decompressed
informational objects, ci ↔ di.
- Most usefully, the map should be computable.
- This is not really any worse than the situation
with standard statistical entropy, where it is
only defined relative to a given state of
knowledge, in the form of a probability
distribution over states of the system.
- The existence of optimal compression systems for
given probability distributions strengthens the
connection.
- In fact, we can also infer a probability
distribution from the description system, in
cases of optimal description systems.
- We could consider a description system, rather
than a probability distribution, to be the
fundamental starting point for any discussion of
entropy.
- But, can we do better?
56 The Entropy Game
- A game (or adversarial protocol) between two
players (A and B) that can be used to
operationally define the entropy content of a
given target physical system X.
- X should have a well-defined state space,
with N states & total information content Itot =
log N.
- Basic idea: B must use A (reversibly) as a
storage medium for data provided by C.
- The entropy of C is defined as its total
info. content, minus the expected logarithm of
the number of messages that A can reliably store
in and retrieve from it.
- Rules of the game:
- A and B start out unentangled with each other
(and with C).
- A publishes his own exact initial classical
state A0 in a public record.
- B can probe A to make sure he is telling the
truth.
- Meanwhile, B prepares in secret any string W = W0
of any number n of bits.
- B passes his string W to A.  A may observe its
length n.
- A may then carry out any fixed quantum algorithm
Q1 operating on the closed joint system (A, X, W),
under the condition:
- The final state must leave (A, X, W) unentangled,
A = A0, and W = 0^n.
- B is allowed to probe A and W to verify that
A = A0 and W = 0^n.
- Finally, A carries out another fixed quantum
algorithm Q2, returning again to his initial
state A0, and supposedly restoring W to its
initial state.
- A returns W to B; B is allowed to check W and A
again to verify that these conditions are
satisfied.
Iterate till convergence.
Definition: The entropy of system X is C minus
the maximum over A's strategies (starting states
A0, and algorithms Q1, Q2) of the expectation
value (over states of X) of the minimum over B's
strategies (sequences of strings) of the average
length of those strings that are exactly
returned by A (in step 8) with zero probability
of error.
57 Intuitions behind the Game
- A wants to show that X has a low entropy (high
available storage capacity, or extropy).
- He will choose an encoding of strings W in X's
state that is as efficient as possible.
- A chooses his strategy without knowledge of what
strings B will provide,
- The coding scheme must thus be very general.
- Meanwhile, B wants to show that X has a high
entropy (low capacity).
- B will…
58 Explaining Entropy Increase
- When the Hamiltonian of a closed system is
exactly known,
- The statistical (von Neumann) entropy of the
system's density operator is exactly conserved.
- I.e., there is no entropy increase.
- In the traditional statistical view of entropy,
- Entropy can only increase in one of the following
situations:
- (a) The Hamiltonian is not precisely known, or
- (b) The system is not closed,
- Entropy can leak into the system from an unknown
outside environment; or
- (c) We estimate entropy by tracing over entangled
subsystems,
- Take reduced density operators of individual
subsystems,
- And pretend the entropy is additive.
- However, in the…
59Extra Slides
- Omitted from talk for lack of time
60 Information Content of a Physical System
- The (total amount of) information content I(A) of
an abstract physical system A is the unknown
information content of the mathematical object D
used to define A.
- If D is (or implies) only a set S of (assumed
equiprobable) states, then we have I(A) =
U(S) = log |S|.
- If D implies a probability distribution PS over
a set S (of distinguishable states), then
I(A) = U(PS) = −Σi Pi log Pi.
- We would expect to gain I(A) information if we
measured A (using basis set S) to find its exact
actual state s ∈ S.
- ⇒ we say that amount I(A) of information is
contained in A.
- Note that the information content depends on how
broad (how abstract) the system's description D
is!
61 Information Capacity & Entropy
- The information capacity of a system is also the
amount of information about the actual state of
the system that we do not know, given only the
system's definition.
- It is the amount of physical information that we
can say is in the state of the system.
- It is the amount of uncertainty we have about the
state of the system, if we know only the system's
definition.
- It is also the quantity that is traditionally
known as the (maximum) entropy S of the system.
- Entropy was originally defined as the ratio of
heat to temperature.
- The importance of this quantity in thermodynamics
(the observed fact that it never decreases) was
first noticed by Rudolph Clausius in 1850.
- Today we know that entropy is, physically, really
nothing other than (unknown, incompressible)
information!
62Known vs. Unknown Information
- We, as modelers, define what we mean by the
system in question using some abstract
description D. - This implies some information content I(A) for
the abstract system A described by D. - But, we will often wish to model a scenario in
which some entity E (perhaps ourselves) has more
knowledge about the system A than is implied by
its definition. - E.g., scenarios in which E has prepared A more
specifically, or has measured some of its
properties. - Such E will generally have a more specific
description of A and thus would quote a lower
resulting I(A) or entropy. - We can capture this by distinguishing the
information in A that is known by E from that
which is unknown. - Let us now see how to do this a little more
formally.
63Subsystems (More Generally)
- For a system A defined by a state set S,
- any partition P of S into subsets can be
considered a subsystem B of A. - The subsets in the partition P can be considered
the states of the subsystem B.
[Figure: a state set S drawn as a grid, with one subsystem of A given by one
partition of S and another subsystem of A given by a second partition.  In
this example, the product of the two partitions forms a partition of S into
singleton sets; we say that this is a complete set of subsystems of A.  In
this example, the two subsystems are also independent.]
64 Pieces of Information
- For an abstract system A defined by a state set
S, any subset T ⊆ S is a possible piece of
information about A.
- Namely, it is the information "The actual state of
A is some member of this set T."
- For an abstract system A defined by a probability
distribution PS, any probability distribution
P'S such that (PS = 0 ⇒ P'S = 0) and U(P') < U(P) is
another possible piece of information about A.
- That is, any distribution that is consistent with,
and more informative than, A's very definition.
65Known Physical Information
- Within any universe (closed physical system) W
described by distribution P, we say entity E (a
subsystem of W) knows a piece P of the physical
information contained in system A (another
subsystem of W) iff P implies a correlation
between the state of E and the state of A, and
this correlation is meaningfully accessible to E. - Let us now see how to make this definition more
precise.
[Figure: the Universe W, containing the Entity (Knower) E and the Physical
System A, with a correlation between them.]
66 What is a correlation, anyway?
- A concept from statistics:
- Two abstract systems A and B are correlated or
interdependent when the entropy of the combined
system S(AB) is less than S(A) + S(B).
- I.e., something is known about the combined state
of AB that cannot be represented as knowledge
about the state of either A or B by itself.
- E.g., A and B each have 2 possible states, {0,1}.
- They each have 1 bit of entropy.
- But, we might also know that A = B, so the entropy
of AB is 1 bit, not 2.  (States 00 and 11.)
(See the sketch below.)
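A minimal sketch (not from the original slides) of this example, showing that the joint entropy is 1 bit while the marginal entropies are 1 bit each:

```python
# Minimal sketch: correlated bits.  A and B are each uniform over {0,1}, but
# we also know A = B, so the joint state has only the possibilities 00 and 11.
import math

def H(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_joint = {(0, 0): 0.5, (1, 1): 0.5}          # knowledge that A = B
p_A = {a: sum(p for (x, _), p in p_joint.items() if x == a) for a in (0, 1)}
p_B = {b: sum(p for (_, y), p in p_joint.items() if y == b) for b in (0, 1)}

print(H(p_A.values()), H(p_B.values()))   # 1.0 bit each
print(H(p_joint.values()))                # 1.0 bit jointly, not 2:
                                          # S(AB) < S(A) + S(B)  =>  correlated
```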
67 Known Information, More Formally
- For a system defined by a probability distribution
P that includes two subsystems A, B with
respective state variables X, Y having mutual
information IP(X;Y):
- The total information content of B is I(B) =
U(PY).
- The amount of information in B that is known by A
is KA(B) = IP(X;Y).
- The amount of information in B that is unknown by
A is UA(B) = U(PY) − KA(B) = S(Y) − I(X;Y) =
S(Y|X).
- The amount of entropy in B, from A's perspective,
is SA(B) = UA(B) = S(Y|X).
- These definitions are based on all the
correlations that are present between A and B
according to our global knowledge P.
- However, a real entity A may not know,
understand, or be able to utilize all the
correlations that are actually present between
him and B.
- Therefore, generally more of B's physical
information will effectively be entropy, from A's
perspective, than is implied by this definition.
- We will explore some corrections to this
definition later.
- Later, we will also see how to sensibly extend
this definition to the quantum context.
68 Maximum Entropy vs. Entropy
[Figure: a bar representing the total information content I = maximum entropy
Smax = logarithm of the number of states consistent with the system's
definition.  For observer A, the bar splits into unknown information UA =
entropy SA (as seen by A) and known information KA = I − UA = Smax − SA.  For
observer B, the same bar splits differently, into unknown information UB =
entropy SB (as seen by B) and the corresponding known part.]
69 A Simple Example
- A spin is a type of simple quantum system having
only 2 distinguishable states.
- In the z basis, the basis states are called "up"
(↑) and "down" (↓).
- In the example to the right, we have a compound
system composed of 3 spins.
- ⇒ it has 8 distinguishable states.
- Suppose we know that the 4 crossed-out states
have 0 amplitude (0 probability).
- Due to prior preparation or measurement of the
system.
- Then the system contains:
- One bit of known information,
- in spin 2,
- and two bits of entropy,
- in spins 1 & 3.
70 Entropy, as seen from the Inside
- One problem with our previous definition of
knowledge-dependent entropy based on mutual
information is that it is only well-defined for
an ensemble or probability distribution of
observer states, not for a single observer state.
- However, as observers, we always find ourselves
in a particular state, not in an ensemble!
- Can we obtain an alternative definition of
entropy that works for (and can be used by)
observers who are in individual states also?
- While still obeying the 2nd law of
thermodynamics?
- Zurek proposed that entropy S should be defined
to include not only unknown information U, but
also incompressible information N.
- By definition, incompressible information (even
if it is known) cannot be reduced; therefore the
validity of the 2nd law can be maintained.
- Zurek proposed using a quantity called Kolmogorov
complexity to measure the amount of
incompressible information.
- Size of the shortest program that computes the
given data.