GEK1530 - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: GEK1530


1
Nature's Monte Carlo Bakery: The Story of Life as a Complex System
  • GEK1530

Frederick H. Willeboordse (frederik_at_chaos.nus.edu.sg)
2
DNA Information
In life, information is transmitted from generation to generation. A key role in this process is played by DNA. In this lecture, we'll have a look at what DNA and information are.
  • Lecture 4

3
The Bakery
[Diagram: the bakery metaphor. Get some units (water, yeast, flour), ergo building blocks; add the ingredients; process them (knead, wait, bake), i.e. mix 'n bake; get something wonderful; eat and live.]
4
Today's Lecture
The Story
Building blocks are everywhere, and it is hence not all too surprising that we find them in DNA too. One can argue that the sequence of the building blocks embodies information, but then we need to ask: what is information?
Topics: DNA, Nucleic Acids and Polymerization, Information, Coding
5
Nucleic Acids
Nucleotides
Just as amino acids are the building blocks of proteins, nucleotides are the building blocks of the genetic (and some related) materials called nucleic acids.
In other words, nucleic acids are (long) polymers
of nucleotides.
6
Nucleotides
Subunits
Nucleotides are built up of smaller units:
  • a five-carbon sugar
  • a phosphate group
  • an amine base

Schematic representation of a nucleotide: the nitrogen-containing base and the phosphate group are both covalently bonded to the five-carbon sugar.
7
Nucleotide Building Blocks
Five Carbon Sugar
[Structure diagrams: the five-carbon sugar ribose, drawn as the open-chain form with its aldehyde group at C1 and hydroxyl groups along the chain, and as the ring form it takes in aqueous solution.]
8
Nucleotide Building Blocks
Five Carbon Sugar - Isomers
All these pentoses have the same chemical formula, C5H10O5. However, the spatial arrangement is different. Though chemically very similar, proteins can tell them apart, and hence there can be important biological effects.
[Structure diagrams: the ring forms of ribose, arabinose and lyxose.]
9
Nucleotide Building Blocks
Five Carbon Sugar
Deoxyribose (in this form called β-D-2-deoxyribose) is used in deoxyribonucleic acid (DNA).
Ribose (in this form called β-D-ribose) is used in ribonucleic acid (RNA).
"Deoxy" because the oxygen (the hydroxyl at the 2' carbon) is missing.
[Structure diagrams: the ring forms of the two sugars.]
10
Nucleotide Building Blocks
Phosphate Group
If this OH is attached, we have phosphoric acid.
[Structure diagrams: the phosphate group as a mono-, di- and triphosphate.]
11
Nucleotide Building Blocks
Amine Base
The bases are either pyrimidines (single-ringed nitrogenous bases) or purines (double-ringed nitrogenous bases).
The pyrimidines are cytosine (C), uracil (U, only in RNA) and thymine (T, only in DNA); the purines are adenine (A) and guanine (G).
[Structure diagrams of the five bases.]
12
Nucleotides
Structure
Combining the parts, we thus obtain:
[Diagram: the phosphate group, the five-carbon sugar and the base join by condensation, releasing two molecules of water (H2O), to give a nucleotide.]
13
Nucleotides
This is how a nucleotide looks:
[Structure diagram: 2'-deoxycytidine 5'-phosphate.]
14
Nucleic Acids
Polynucleotide
Nucleotides can be joined by condensation, releasing water (H2O), to form a polynucleotide.
[Structure diagram: two nucleotides linked through a phosphate group.]
15
DNA
And thus we obtain DNA - well, one strand.
The end carrying the free phosphate group is called the 5' end; the end carrying the free hydroxyl group is called the 3' end.
[Structure diagram: a single DNA strand, 5' end at the top, 3' end at the bottom.]
16
Double Strands
Pairing
[Diagram: base pairing between the two sugar-phosphate backbones. Adenine (A) pairs with thymine (T) and guanine (G) pairs with cytosine (C), held together by hydrogen bonds.]
17
DNA
Double Strand
[Diagram: a short double strand in which each base on one strand is paired with its complement on the other (A with T, G with C).]
Complementary strands are formed.
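The pairing rule is simple enough to express in a few lines of code. A minimal Python sketch (not from the slides; the input sequence is just an illustration):

```python
# Watson-Crick pairing: A <-> T, G <-> C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement_strand(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    # Pair each base, then reverse so the two strands run antiparallel.
    return "".join(PAIR[base] for base in reversed(strand))

print(complement_strand("TGACGAGA"))  # -> TCTCGTCA
```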
18
DNA
19
Viral Self-assembly
Tobacco Mosaic Virus
This virus consists of a long RNA molecule enclosed in a cylindrical protein coat. The protein coat is built up from 2130 small subunits, each 158 amino acids long.
If you take its RNA and mix it with the protein subunits, they can reassemble into a fully infectious virus.
20
RNA
Like DNA, RNA is built up of four nucleotides; however, the base thymine is replaced by the base uracil.
Generally, RNA is single-stranded (but in some viruses it is double-stranded), and it can fold into so-called hairpin loops.
The major functions of RNA in modern cells are:
Messenger RNA (mRNA) is transcribed directly from a gene's DNA and is used to encode proteins; it specifies the order of amino acids during protein synthesis.
Transfer RNA (tRNA) brings amino acids to the ribosome for protein synthesis.
Ribosomal RNA (rRNA) forms the ribosome, which translates mRNA into proteins.
But RNA has many other roles as well.
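As a small illustration of the thymine-to-uracil replacement, a minimal Python sketch of transcription (not from the slides; the template sequence is just an example):

```python
# The mRNA is complementary to the DNA template strand,
# with uracil (U) taking the place of thymine (T).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template: str) -> str:
    """Return the mRNA transcribed from a DNA template strand."""
    return "".join(DNA_TO_RNA[base] for base in template)

print(transcribe("TACGGT"))  # -> AUGCCA
```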
21
Protein Synthesis
22
Organic Compounds
Carbohydrates: consist of hydrogen, carbon and oxygen, with H and O present in the same proportions as in water and with an equal or larger number of C than O. E.g. C6H12O6 is glucose or fructose (depending on the spatial arrangement of the molecule).
Lipids: fat or fat-like compounds. In their simplest form they are hydrocarbons with a carboxyl group at one end. Fats consist of glycerol and fatty acids.
Proteins: consist of amino acids, which have a carboxyl group and an amino group plus a side chain.
Nucleic Acids: consist of nucleotides, which have a five-carbon sugar, a phosphate group and a nitrogenous base.
23
Information
The Story
Information
Probably the most important aspect of DNA is that it contains information. But what is information? And then, if every cell has the same information stored, why can cells have different functions? How do cells differentiate?
What is information, actually? Is there a formal notion of information?
24
Information
Claude Shannon
Claude Shannon is often called the father of Information Theory. In 1948, he published a ground-breaking paper in which he introduced a probabilistic approach to analyzing information. This also led to the first mathematically rigorous theory of entropy.
Claude Shannon
25
Information
Why do we need to talk about information?
  • There's information in life!

If a cell needs to carry out a task, information about the task at hand will be required (e.g. increase the production of a protein).
If a cell wants to replicate or even reproduce, information somehow needs to be passed on.
But what's actually meant by that?
26
Information
Usually, when we talk about information we can
mean things like
  • How useful an article is
  • The length of an e-mail
  • Something that is worth knowing

But clearly, that can't be everything that is meant by information.
27
Information
In information theory, information is measured in
bits (which can be zero or one).
The question then is: how many bits are actually necessary to communicate a certain message (a message in this case being a string of bits)?
Practically, you can think of the question as: how many bits are necessary to send a given file over a modem?
28
Information
If you create a message (a string of bits) by flipping a coin 1 million times,
1001010111010101001001010101111...
then it will take 1 million bits (i.e. the string itself) to communicate the sequence.
This is what one would initially think to be natural. If you have, e.g., a floppy full of assignments and you want to send that information to someone by e-mail (remember, a floppy stores everything in ones and zeros), then you can just have your modem program take those ones and zeros one by one and send them over the modem line.
29
Information
But is this always clever? (It is in the case of the fully random string from the previous slide.)
NO!
Think compression. Think ZIP!
No, I haven't become an advertising consultant.
If we compress the data on the floppy, we may need far fewer bits than are on the disk.
30
Information
Let us look at an example.
Again, take a sequence of 1 million bits. But now assume that only about 1000 of these are ones and that the rest are zeros.
That is what we would expect to obtain when we toss a very special coin for which the probability of getting a one is only 1/1000.
How can we communicate this sequence more efficiently than by sending all 1 million bits?
31
Information
We could just send the positions of the ones and then tell the receiver that the rest is zero.
[Diagram: a bit string that is mostly zeros with an occasional one.]
This means that we would only need to transmit about 1000 positions.
How many bits are then necessary to describe one position? If we have a sequence of 1 million bits, we need enough bits to count from 1 to 1 million. This can be done with 20 bits.
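A quick Python check of this claim:

```python
import math

# Number of bits needed to label any one position among 1 million.
positions = 1_000_000
bits_per_position = math.ceil(math.log2(positions))
print(bits_per_position)  # -> 20, since 2**20 = 1,048,576 >= 1,000,000
```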
32
Binary counting
Let us recall how to count in binary.
If we have boxes, going from right to left, we can say that:
a) each box has a value that is a power of 2;
b) if there is a one in the box, count its value; if a zero, do not.
[Diagram: five boxes with the values 16, 8, 4, 2, 1.]
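A minimal Python sketch of this counting rule (the example bit pattern is arbitrary):

```python
# Each box, from right to left, is worth a power of 2: 1, 2, 4, 8, 16, ...
# A box containing a 1 contributes its value; a box containing a 0 does not.
def binary_to_int(boxes: str) -> int:
    value = 0
    for i, bit in enumerate(reversed(boxes)):
        if bit == "1":
            value += 2 ** i
    return value

print(binary_to_int("10110"))  # 16 + 4 + 2 = 22
```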

33
Information
Hence for each position we need to send 20 bits, and since there are about 1000 positions, we will only need to transmit about 20,000 bits!
Pretty good! From 1 million down to 20,000.
But we can do even better!
34
Information
Instead of sending the entire position (which
would allow us to send the positions in any
order) we can agree with the receiver that we
send the positions in order.
Then we only need to transmit the distance
between subsequent ones.
Why would this be better?
35
Information
Well, if there are about 1000 ones in the sequence, then on average the distance between them will be about 1000 steps. Only very rarely will the distance be larger than 4000.
Hence we only need to be able to count from 1 to 4000, and that can be done with 12 bits.
Ergo we can send the string with only about
12,000 bits!
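A small Python simulation of this scheme (assuming, as on the slide, a fixed 12 bits per distance; a real coder would still need a way to handle the occasional distance above 4095):

```python
import random

# Simulate the biased coin (P(one) = 1/1000) over 1 million tosses and
# encode only the distances between successive ones.
random.seed(1)
n = 1_000_000
ones = [i for i in range(n) if random.random() < 1 / 1000]
gaps = [b - a for a, b in zip([-1] + ones, ones)]

print(len(ones))                                 # about 1000 ones
print(sum(g <= 4095 for g in gaps) / len(gaps))  # nearly all distances fit in 12 bits
print(12 * len(gaps))                            # about 12,000 bits in total
```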
36
Optimal Coding
Optimal coding
The limit of how efficient a coding can be is called optimal coding.
In the example before, this turns out to be about 11,400 bits, so we were pretty close.
37
Optimal Coding
How do we get this value of 11,400?
In order to understand that, we'll first have to look at something called a variable-length code.
Say we have a die with 8 sides and the following probabilities of throwing each side:
1 -> 1/2, 2 -> 1/4, 3 -> 1/8, 4 -> 1/16, 5 -> 1/32, 6 -> 1/64, 7 -> 1/128, 8 -> 1/128
Note that these 8 probabilities add up to 1.
38
Optimal Coding
How could one code this optimally?
1 -> 0
2 -> 10
3 -> 110
4 -> 1110
5 -> 11110
6 -> 111110
7 -> 1111110
8 -> 1111111
Is this really efficient? The codes for 7 and 8 are very long!
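This kind of code can be written down directly. A Python sketch of an encoder and decoder (the throw sequence is just an example):

```python
# Variable-length code from the slide: side k is (k-1) ones followed by a zero,
# except side 8, which is seven ones.
CODE = {1: "0", 2: "10", 3: "110", 4: "1110", 5: "11110",
        6: "111110", 7: "1111110", 8: "1111111"}

def encode(throws):
    return "".join(CODE[t] for t in throws)

def decode(bits):
    throws, run = [], 0
    for b in bits:
        if b == "1":
            run += 1
            if run == 7:             # seven ones in a row can only mean side 8
                throws.append(8)
                run = 0
        else:                        # a zero ends the current codeword
            throws.append(run + 1)
            run = 0
    return throws

print(encode([1, 3, 8, 2]))          # -> 0110111111110
print(decode(encode([1, 3, 8, 2])))  # -> [1, 3, 8, 2]
```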
39
Optimal Coding
If one throws the die many times, on average one would get a 1 every second throw, since the probability of getting a 1 is 0.5.
This means that the probability that one needs to transmit a single bit (the code for throwing a 1 is a single 0) is 0.5.
Indeed:
Average number of transmitted bits = Σ (probability of getting the number) × (number of bits required to encode this number)
40
Optimal Coding
Ergo:

Side of Die | Probability | Number of Bits | Probability × Bits
1 | 1/2   | 1 | 1/2
2 | 1/4   | 2 | 1/2
3 | 1/8   | 3 | 3/8
4 | 1/16  | 4 | 1/4
5 | 1/32  | 5 | 5/32
6 | 1/64  | 6 | 6/64
7 | 1/128 | 7 | 7/128
8 | 1/128 | 7 | 7/128

Average number of bits per throw ≈ 1.984
If there are 1000 throws, on average one would
need 1984 bits to transmit the entire sequence.
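Reproducing this bookkeeping in Python (using exact fractions):

```python
from fractions import Fraction

# (probability of the side, number of bits its codeword uses)
table = [
    (Fraction(1, 2), 1), (Fraction(1, 4), 2), (Fraction(1, 8), 3),
    (Fraction(1, 16), 4), (Fraction(1, 32), 5), (Fraction(1, 64), 6),
    (Fraction(1, 128), 7), (Fraction(1, 128), 7),
]
average = sum(p * n for p, n in table)   # average bits per throw
print(float(average))                    # -> 1.984375
print(round(1000 * float(average)))      # -> 1984 bits for 1000 throws
```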
41
Optimal Coding
Now if you inspect this more closely, you see that the number of bits used to encode a side of the die is exactly the number of bits needed to write the inverse of its probability in binary!
(Table repeated from the previous slide.)
E.g. 2^6 = 64
log2(2^6) = log2(64)
6 = log2(64)
6 = -log2(1/64)
And hence we have n = -log2(P), where P is the probability and n the number of bits.
Note: all logarithms are base 2 in this context.
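A quick Python check of the relation n = -log2(P) for this code:

```python
import math

# Code lengths versus -log2 of the probabilities; they agree for every side
# (sides 7 and 8 both use 7 bits, and -log2(1/128) = 7).
probs   = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]
lengths = [1, 2, 3, 4, 5, 6, 7, 7]
for p, n in zip(probs, lengths):
    print(n, -math.log2(p))
```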
42
Optimal Coding
So then we obtain, for an optimal encoding:
The number of bits necessary per symbol is
Σ_i Pi × (-log2 Pi)
where the Pi are the various possible probabilities.
In the previous example this means 1/2 × (-log2(1/2)) + 1/4 × (-log2(1/4)) + 1/8 × (-log2(1/8)) + ...
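The formula is one line of Python:

```python
import math

def optimal_bits_per_symbol(probabilities):
    """Shannon's bound: sum over i of P_i * (-log2 P_i)."""
    return sum(p * -math.log2(p) for p in probabilities if p > 0)

# The eight-sided die from the previous slides:
die = [1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128, 1/128]
print(optimal_bits_per_symbol(die))   # -> 1.984375 bits per throw
```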
43
Optimal Coding
Back to our previous example, this means:
P(one) = 1/1000 -> takes -log2(1/1000) ≈ 9.9658 bits to encode
P(zero) = 999/1000 -> takes -log2(999/1000) ≈ 0.0014 bits to encode
Ergo, on average:
(1/1000) × 9.9658 + (999/1000) × 0.0014 ≈ 0.0114 bits per toss, or, for 1 million throws, about 11,400 bits.
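And the same calculation for the biased coin, in Python:

```python
import math

# P(one) = 1/1000, P(zero) = 999/1000.
p_one, p_zero = 1 / 1000, 999 / 1000
bits_per_toss = p_one * -math.log2(p_one) + p_zero * -math.log2(p_zero)
print(bits_per_toss)              # -> about 0.0114 bits per toss
print(1_000_000 * bits_per_toss)  # -> about 11,408 bits for 1 million tosses
```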
44
Information
Information Content
The information content of a message is defined
as the minimal number of bits needed to transmit
that message using optimal coding.
Wow!
45
Wrapping up
Key Points of the Day
Various levels of self-organization cascade into a very complex system.
Atom -> Nucleotide -> Nucleic Acid
DNA - Information
Give it some thought
Is life a computational problem?
References
http://www.bio.cmu.edu/Courses/BiochemMols/BCMolecules.html