Title: Molecular Information Theory
1Molecular Information Theory
- Niru Chennagiri
- Probability and Statistics
- Fall 2004
- Dr. Michael Partensky
2Overview
- Why do we study Molecular Info. Theory?
- What are molecular machines?
- Power of Logarithm
- Components of a Communication System
- Discrete Noiseless System
- Channel Capacity
- Molecular Machine Capacity
3Motivation
- Needle in a haystack situation.
- How will you go about looking for the needle?
- How much energy do you need to spend?
- How fast can you find the needle?
- Haystack -> DNA, Needle -> Binding site, You -> Ribosome
4What is a Molecular Machine?
- One or more molecules or a molecular complex, not a macroscopic reaction.
- Performs a specific function.
- Energized before the reaction.
- Dissipates energy during the reaction.
- Gains information.
- An isothermal engine.
5Where is the candy?
- Is it in the left four boxes?
- Is it in the bottom four boxes?
- Is it in the front four boxes?
You need answers to three questions to find the candy.
Box labels: 000, 001, 010, 011, 100, 101, 110, 111.
Need log2(8) = 3 bits of information.
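A minimal Python sketch of the three questions as reading the three bits of the box label (the hidden box is my example choice):

    from math import log2

    # Box labels 000 .. 111: each bit answers one yes/no question
    # (left/right, top/bottom, front/back).
    candy = 0b101          # the hidden box (example choice)

    questions = [2, 1, 0]  # bit positions, one per question
    answer_bits = [(candy >> b) & 1 for b in questions]
    found = (answer_bits[0] << 2) | (answer_bits[1] << 1) | answer_bits[2]

    assert found == candy
    print(log2(8))         # 3.0: three questions = 3 bits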
6More candies
- Box labels 00, 01, 10, 11, 00, 01, 10, 11
- Candy in both boxes labeled 01.
- Need only log2(8) - log2(2) = 2 bits of information.
In general, m boxes with n candies need log2(m) - log2(n) bits of information.
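A small Python helper for this formula (the function name is mine):

    from math import log2

    def bits_needed(boxes, candies):
        """Information needed to find one of `candies` among `boxes`."""
        return log2(boxes) - log2(candies)   # = log2(boxes / candies)

    print(bits_needed(8, 1))   # 3.0 bits: one candy in eight boxes
    print(bits_needed(8, 2))   # 2.0 bits: two candies in eight boxes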
7Ribosomes
- 2600 binding sites from 4.7 million base pairs.
- Need log2(4.7 million) - log2(2600), about 10.8 bits of information.
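Checking the slide's arithmetic in Python:

    from math import log2

    genome = 4.7e6   # base pairs
    sites = 2600     # ribosome binding sites

    print(log2(genome) - log2(sites))   # ~10.8 bits per site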
8Communication System
9Information Source
- Represented by a stochastic process
- Mathematically a Markov chain
- We are interested in ergodic sources: every sufficiently long sequence is statistically the same as every other sequence.
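A minimal sketch of such a source, assuming a two-state Markov chain over two symbols (the states and transition probabilities are illustrative, not from the slides):

    import random

    # Transition probabilities: P(next symbol | current symbol)
    transitions = {"A": {"A": 0.7, "B": 0.3},
                   "B": {"A": 0.4, "B": 0.6}}

    def emit(n, state="A"):
        """Generate n symbols from the Markov source."""
        out = []
        for _ in range(n):
            r, acc = random.random(), 0.0
            for nxt, p in transitions[state].items():
                acc += p
                if r < acc:
                    state = nxt
                    break
            out.append(state)
        return "".join(out)

    print(emit(20))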
10How much information is produced?
- A measure of uncertainty H should be
- Continuous in the probabilities.
- A monotonically increasing function of the number of events.
- Such that when a choice is broken down into two successive choices, the total H is the weighted sum of the individual H values.
11Enter Entropy
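For events with probabilities p1, ..., pn, the entropy is
H = -(p1 log2 p1 + p2 log2 p2 + ... + pn log2 pn) bits.
This is the only form (up to a constant factor) that satisfies the three requirements above.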
12Properties of Entropy
- H is zero iff all but one p are zero.
- H is never negative.
- H is maximum when all the events are equally probable.
- If x and y are two events: H(x,y) <= H(x) + H(y), with equality iff x and y are independent.
- Conditional entropy: Hx(y) <= H(y).
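A short Python check of these properties (the example distributions are mine):

    from math import log2

    def H(ps):
        """Shannon entropy in bits of a probability distribution."""
        return -sum(p * log2(p) for p in ps if p > 0)

    print(H([1.0, 0.0, 0.0]))        # 0.0: no uncertainty
    print(H([0.25] * 4))             # 2.0: maximum for 4 events
    print(H([0.7, 0.1, 0.1, 0.1]))   # ~1.36, less than the maximum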
13Why is entropy important?
- Entropy is a measure of uncertainty.
- Entropy relation from thermodynamics: S = kB ln W (Boltzmann).
- Also from thermodynamics: the second law sets a lower bound on the energy dissipated per bit gained.
- For every bit of information gained, the machine dissipates at least kB T ln 2 joules.
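A quick Python check of the dissipation bound (the temperature is my example value):

    from math import log

    kB = 1.380649e-23   # Boltzmann constant, J/K
    T = 300.0           # roughly room temperature, K (example value)

    print(kB * T * log(2))   # ~2.87e-21 J dissipated per bit gained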
14Ribosome binding sites
15Information in sequence
16Information curve
Information gain at position l of the binding site is R(l) = 2 - H(l), where H(l) is the entropy of the four bases observed at position l (2 bits is the maximum for four equally likely bases); the published method also subtracts a small-sample correction.
Plotting this across the positions gives the information curve.
For E. coli, the total information is about 11 bits, the same as what the ribosome needs.
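A minimal Python sketch of the calculation, assuming a list of aligned binding-site sequences; the toy alignment is invented, and the small-sample correction is omitted:

    from math import log2
    from collections import Counter

    # Toy alignment of binding sites (invented for illustration)
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]

    def info_curve(seqs):
        """R(l) = 2 - H(l) for each aligned position l."""
        curve = []
        for col in zip(*seqs):
            counts = Counter(col)
            n = len(col)
            H = -sum(c / n * log2(c / n) for c in counts.values())
            curve.append(2.0 - H)
        return curve

    print(info_curve(sites))   # bits conserved at each position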
17Sequence Logo
18Channel capacity
A source transmits 0s and 1s at 1000 symbols/sec; 1 in 100 symbols has an error. What is the rate of transmission?
Need to apply a correction: the uncertainty in x for a given value of y, which is the conditional entropy Hy(x).
Correction = 1000 x H(0.01), about 81 bits/sec, so the rate is 1000 - 81 = 919 bits/sec.
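The same calculation in Python:

    from math import log2

    def Hbin(p):
        """Binary entropy in bits."""
        return -(p * log2(p) + (1 - p) * log2(1 - p))

    symbols_per_sec = 1000
    p_error = 0.01

    correction = symbols_per_sec * Hbin(p_error)
    print(correction)                      # ~81 bits/sec of equivocation
    print(symbols_per_sec - correction)    # ~919 bits/sec actual rate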
19Channel capacity contd.
For a continuous channel with white noise, the capacity is C = W log2(1 + P/N), where
P/N is the signal-to-noise ratio and
W is the bandwidth.
Shannon's theorem: as long as the rate of transmission is below C, the number of errors can be made as small as needed.
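A small Python helper for this formula (the example bandwidth and signal-to-noise ratio are mine):

    from math import log2

    def capacity(bandwidth_hz, snr):
        """Shannon capacity C = W log2(1 + P/N) in bits/sec."""
        return bandwidth_hz * log2(1 + snr)

    # Example: 3 kHz bandwidth, SNR of 1000
    print(capacity(3000, 1000))   # ~29,900 bits/sec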
20Molecular Machine Capacity
- Lock and key mechanism.
- Each pin on the ribosome is a simple harmonic oscillator in a thermal bath.
- The velocity of each pin is represented by a point in a 2-D velocity space.
- More pins -> more dimensions.
- The distribution of points is spherical.
21Machine capacity
In high dimensions, all points lie in a thin spherical shell.
The radius of the shell is the velocity, and hence proportional to the square root of the energy.
Before binding: radius ~ sqrt(P + N) (signal plus thermal noise).
After binding: radius ~ sqrt(N) (thermal noise alone).
22Number of choices
Number of choices = number of "after" spheres that can sit in the "before" sphere
= (volume of before sphere) / (volume of after sphere).
Machine capacity = logarithm of the number of choices.
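A minimal sketch of this count in Python, using the fact that the volume of a d-dimensional sphere scales as r^d (the example numbers are mine):

    from math import log2

    def machine_capacity(d, P, N):
        """Capacity in bits: log2 of (before volume / after volume).

        Sphere volume scales as r^d, with r_before = sqrt(P + N)
        and r_after = sqrt(N), so the ratio is ((P + N) / N)^(d/2).
        """
        return (d / 2) * log2((P + N) / N)

    # Example: 8 pins, each a 2-D velocity space -> d = 16; P/N = 3
    print(machine_capacity(16, 3.0, 1.0))   # 16.0 bits per operation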