THE RNA DETECTIVE GAME: - PowerPoint PPT Presentation


Slides: 66
Provided by: dimacsR

Transcript and Presenter's Notes



1
THE RNA DETECTIVE GAME: FINDING RNA CHAINS FROM FRAGMENTS
Fred Roberts, Rutgers University
2
DNA and RNA
Deoxyribonucleic acid, DNA, is the basic building
block of inheritance. DNA can be thought of as a
chain consisting of bases. Each base is one of
four possible chemicals: Thymine (T), Cytosine
(C), Adenine (A), and Guanine (G).
3
DNA and RNA
Some DNA chains: GGATCCTGG, TTCGCAAAAAGAATC.
Real DNA chains are long:
Algae (P. salina): $6.6 \times 10^5$ bases long
Slime mold (D. discoideum): $5.4 \times 10^7$ bases long
4
DNA and RNA
Insect (D. melanogaster, fruit fly): $1.4 \times 10^8$ bases long
Bird (G. domesticus): $1.2 \times 10^9$ bases long
5
DNA and RNA
Human (H. sapiens): $3.3 \times 10^9$ bases long
The sequence of bases in DNA encodes certain
genetic information. In particular, it determines
long chains of amino acids known as proteins.
6
DNA and RNA
How many possible DNA chains are there in
humans?
7
Aside: Counting
Fundamental methods of combinatorics are
important in mathematical biology.
8
The Product Rule
How many sequences of 0s and 1s are there of
length 2? There are 2 ways to choose the first
digit and, no matter how we choose the first
digit, there are 2 ways to choose the second
digit. Thus, there are $2 \times 2 = 2^2 = 4$ ways to
choose the sequence: 00, 01, 10, 11.
How many sequences are there of length 3? By
similar reasoning, $2 \times 2 \times 2 = 2^3 = 8$.
9
The Product Rule
Is this interesting?
10
The Product Rule
Boring!
11
The Product Rule
Really boring!
12
The Product Rule
Counting may be boring at times, but we will see
that it can be really powerful.
13
The Product Rule
Product Rule: If something can happen in $n_1$ ways
and, no matter how the first thing happens, a
second thing can happen in $n_2$ ways, then the two
things together can happen in $n_1 \times n_2$ ways.
More generally, if something can happen in $n_1$ ways
and, no matter how the first thing happens, a second
thing can happen in $n_2$ ways, and, no matter how
the first two things happen, a third thing can
happen in $n_3$ ways, then all the things together
can happen in $n_1 \times n_2 \times n_3$ ways.
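As a quick sanity check of the Product Rule, the Python sketch below lists every combination of choices with `itertools.product` and counts them. The helper name `count_by_enumeration` is ours, not from the slides, and it covers only the simplest case, where each stage offers a fixed set of options.

```python
from itertools import product

def count_by_enumeration(*choice_sets):
    """Count the ways to pick one option from each choice set in turn."""
    return sum(1 for _ in product(*choice_sets))

# Binary sequences of length 2: 2 x 2 = 4 of them.
print(["".join(s) for s in product("01", repeat=2)])  # ['00', '01', '10', '11']

# Length 3: 2 x 2 x 2 = 8.
print(count_by_enumeration("01", "01", "01"))  # 8

# Three stages with n1 = 2, n2 = 3, n3 = 4 options: 2 x 3 x 4 = 24.
print(count_by_enumeration(range(2), range(3), range(4)))  # 24
```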
14
DNA and RNA
How many possible DNA chains are there in
humans? How many DNA chains are there with two
bases? Answer (Product Rule): $4 \times 4 = 4^2 = 16$.
There are 4 choices for the first base and,
for each such choice, 4 choices for the second
base. How many with 3 bases? How many with n
bases?
15
DNA and RNA
How many with 3 bases? $4^3 = 64$.
How many with n bases? $4^n$.
How many human DNA chains are possible? $4^{3.3 \times 10^9}$.
This is greater than $10^{1.98 \times 10^9}$ (1 followed by
1.98 billion zeroes!)
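The counts above can be checked in a few lines of Python: brute-force enumeration confirms $4^n$ for short chains, and for the human genome we work with the base-10 exponent, since $4^N = 10^{N \log_{10} 4}$. The names `BASES` and `count_chains` are our own.

```python
import math
from itertools import product

BASES = "ACGT"

def count_chains(n):
    """Product rule: 4 choices at each of n positions gives 4**n chains."""
    return len(BASES) ** n

# Brute-force check for short chains.
for n in range(1, 6):
    assert sum(1 for _ in product(BASES, repeat=n)) == count_chains(n)

print(count_chains(2), count_chains(3))  # 16 64

# 4**N with N = 3.3e9 is far too large to write out, but
# 4**N = 10**(N * log10(4)), so the base-10 exponent is easy to get.
N = 3.3e9
exponent = N * math.log10(4)
print(f"4^N is about 10^{exponent:.3e}")
print(exponent > 1.98e9)  # True
```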
16
DNA and RNA
RNA is a messenger molecule whose links are
defined from DNA. An RNA chain has at each link
one of four bases. The possible bases are the
same as those in DNA except that the base Uracil
(U) replaces the base Thymine (T).
17
The RNA Detective Game
Sample RNA chains: GGCAUUGGA, UAUAUGCGGCUUC.
RNA chains are very long. Can we discover what they
look like without actually observing them?
Trick: use enzymes.
18
The RNA Detective Game
Some enzymes break up an RNA chain into fragments
after each G link. Some enzymes break up the
chain after each C or U link. Consider the
chain CCGGUCCGAAAG. Applying the G enzyme breaks
the chain into the following fragments:
G fragments: CCG, G, UCCG, AAAG
We know that these are the fragments, but we do
not know the order in which they appear. How many
possible chains have these four fragments?
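An idealized complete digest is easy to model in code. Below is a minimal Python sketch, assuming every cut happens and nothing else changes the chain. The function name `digest` is hypothetical. It returns the fragments in chain order, whereas the experiment reveals only the unordered collection.

```python
def digest(chain, cut_after):
    """Break `chain` after every base listed in `cut_after` (complete digest)."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in a cut base
        fragments.append(current)
    return fragments

chain = "CCGGUCCGAAAG"
print(digest(chain, "G"))   # ['CCG', 'G', 'UCCG', 'AAAG']
print(digest(chain, "UC"))  # ['C', 'C', 'GGU', 'C', 'C', 'GAAAG']
```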
19
The RNA Detective Game
Chain: CCGGUCCGAAAG
G fragments: CCG, G, UCCG, AAAG
Product rule again: 4 choices for the first fragment,
for each such choice 3 choices for the second
fragment, and so on. There are
$4 \times 3 \times 2 \times 1 = 4! = 24$ possible chains,
one corresponding to each permutation of these four
fragments. One such chain, different from the
original: UCCGGCCGAAAG
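To see the 24 directly, the sketch below (Python; `digest` is our own helper, repeated so the block runs on its own) concatenates every permutation of the G fragments. Because every fragment ends in G, each ordering yields a distinct chain that really has these G fragments.

```python
from itertools import permutations

def digest(chain, cut_after):
    """Break `chain` after every base listed in `cut_after`."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

g_fragments = ["CCG", "G", "UCCG", "AAAG"]
candidates = {"".join(p) for p in permutations(g_fragments)}
print(len(candidates))  # 24

# Each candidate digests back to exactly the same G fragments.
assert all(sorted(digest(c, "G")) == sorted(g_fragments) for c in candidates)
print("UCCGGCCGAAAG" in candidates)  # True
```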
20
The RNA Detective Game
Chain: CCGGUCCGAAAG
Suppose we instead apply the U,C enzyme. We get the
following fragments:
U,C fragments: C, C, GGU, C, C, GAAAG
How many chains are there with these fragments?
Is $6! = 720$ the correct answer? Consider two of
the permutations: the one that takes the fragments
in the order given, and the one that takes the
second fragment first, the first fragment second,
and all others in the same order. They give rise
to the same chain.
21
The RNA Detective Game
So 6! is wrong. What is the answer?
What if the fragments were C, C, C, C, C? There
are 5! permutations of these fragments, but only
one RNA chain with these fragments: CCCCC.
22
Aside: More Counting
23
Multinomial Coefficients
Putting n distinguishable balls into k
distinguishable boxes
The number of ways to put $n_1$ balls into the
first box, $n_2$ balls into the second box, ..., $n_k$
balls into the kth box is denoted by
$C(n; n_1, n_2, \ldots, n_k)$, where $n = n_1 + n_2 + \cdots + n_k$.
24
Multinomial Coefficients
Theorem: $C(n; n_1, n_2, \ldots, n_k) = \dfrac{n!}{n_1!\, n_2! \cdots n_k!}$
Example: How many RNA chains of length 6 have 3
Cs and 3 As? Think of 2 boxes, a C box and an
A box. How many ways are there to put 3 positions
(balls) into the C box and 3 into the A box?
Answer: $C(6; 3, 3) = \dfrac{6!}{3!\, 3!} = 20$. Some of
these are CACACA, ACACAC, AAACCC.
25
Multinomial Coefficients
If a 6-link RNA chain is chosen at random, what
is the probability of obtaining one with 3 Cs
and 3 As? Answer: There are $4^6 = 4096$ possible RNA
chains of length 6. The probability is
therefore $C(6; 3, 3)/4^6 = 20/4096 \approx 0.005$.
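The multinomial formula and the probability can be cross-checked by brute force. In this Python sketch, `multinomial` is a helper of our own that divides out one factorial at a time, which keeps every intermediate value an integer.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial(*counts):
    """C(n; n1, ..., nk) = n! / (n1! n2! ... nk!) with n = n1 + ... + nk."""
    result = factorial(sum(counts))
    for k in counts:
        result //= factorial(k)
    return result

# Brute force: 6-link chains over A, C, G, U with exactly 3 Cs and 3 As.
matches = sum(1 for c in product("ACGU", repeat=6)
              if c.count("C") == 3 and c.count("A") == 3)
print(multinomial(3, 3), matches)  # 20 20

# Exact probability of drawing such a chain uniformly at random.
p = Fraction(multinomial(3, 3), 4 ** 6)
print(p, float(p))  # 5/1024 0.0048828125
```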
26
Multinomial Coefficients
The number of 10-link RNA chains consisting of 3
As, 2 Cs, 2 Us, and 3 Gs is $C(10; 3, 2, 2, 3) = 25{,}200$.
What if we know they end in AAG? Then
only the first 7 positions need to be filled, and
2 As and one G are already used up. Hence, the
answer is $C(7; 1, 2, 2, 2) = 630$. Notice how knowing
the end of a chain can dramatically reduce the
number of possible chains.
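Both numbers can be reproduced in Python with the same hypothetical `multinomial` helper. For the AAG case we also list the distinct orderings of the 7 bases left over (1 A, 2 Cs, 2 Us, 2 Gs) as an independent check.

```python
from itertools import permutations
from math import factorial

def multinomial(*counts):
    """C(n; n1, ..., nk) = n! / (n1! n2! ... nk!) with n = n1 + ... + nk."""
    result = factorial(sum(counts))
    for k in counts:
        result //= factorial(k)
    return result

print(multinomial(3, 2, 2, 3))  # 25200

# Chains ending in AAG: fill 7 positions with 1 A, 2 Cs, 2 Us, 2 Gs.
print(multinomial(1, 2, 2, 2))  # 630
prefixes = set(permutations("ACCUUGG"))  # distinct orderings of the remaining bases
print(len(prefixes))  # 630
```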
27
Returning to the RNA Detective Game
28
The RNA Detective Game
Recall that we have the following U,C
fragments: C, C, GGU, C, C, GAAAG. The number of
RNA chains with these fragments is not $6! = 720$.
Think of having 6 positions (there are 6
fragments) and assigning 4 positions to the C
box, 1 to the GGU box, and 1 to the GAAAG
box. Then the number of ways of doing this
is $C(6; 4, 1, 1) = \dfrac{6!}{4!\, 1!\, 1!} = 30$.
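A brute-force check of this count in Python (variable names are ours): generate all 720 orderings of the U,C fragments, drop duplicates, and compare with $C(6; 4, 1, 1)$. The concatenations are also all distinct, because no fragment in C, GGU, GAAAG is a prefix of another.

```python
from itertools import permutations
from math import factorial

uc_fragments = ["C", "C", "GGU", "C", "C", "GAAAG"]

orderings = set(permutations(uc_fragments))   # distinct orderings of the multiset
chains = {"".join(p) for p in orderings}       # distinct concatenated chains
print(factorial(6), len(orderings), len(chains))  # 720 30 30

# Multinomial coefficient C(6; 4, 1, 1) = 6! / (4! 1! 1!)
print(factorial(6) // (factorial(4) * factorial(1) * factorial(1)))  # 30
```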
29
The RNA Detective Game
U,C fragments: C, C, GGU, C, C, GAAAG. Actually,
this computation is still a bit off, though not
because the combinatorial argument is wrong.
Notice that the fragment GAAAG does not end in U
or C. Thus, we know it comes last. There are 5
remaining U,C fragments. The number of chains
beginning with these 5 fragments is given by
$C(5; 4, 1) = 5$. Beginnings of the chains:
CCCCGGU, CCCGGUC, CCGGUCC, CGGUCCC, GGUCCCC
30
The RNA Detective Game
We get all chains with the given U,C fragments
by adding GAAAG to the end of each of these:
CCCCGGUGAAAG, CCCGGUCGAAAG, CCGGUCCGAAAG,
CGGUCCCGAAAG, GGUCCCCGAAAG
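The same reasoning in Python: set aside the one fragment that does not end in U or C, permute the rest, and append the abnormal fragment. This is a sketch that assumes exactly one such fragment, as in this example.

```python
from itertools import permutations

uc_fragments = ["C", "C", "GGU", "C", "C", "GAAAG"]

# A U,C fragment that does not end in U or C must be the last one.
abnormal = [f for f in uc_fragments if f[-1] not in "UC"]
normal = [f for f in uc_fragments if f[-1] in "UC"]
assert len(abnormal) == 1  # assumption for this sketch

candidates = sorted({"".join(p) + abnormal[0] for p in permutations(normal)})
print(len(candidates))  # 5
for c in candidates:
    print(c)
```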
31
The RNA Detective Game
Thus, there are 24 possible chains with the given
G fragments and 5 with the given U,C fragments.
But we have not yet combined our knowledge of both
G and U,C fragments.
G fragments: CCG, G, UCCG, AAAG
U,C fragments: C, C, GGU, C, C, GAAAG
Which of the 5 chains with these U,C fragments has
the right G fragments?
32
The RNA Detective Game
G fragments: CCG, G, UCCG, AAAG
U,C fragments: C, C, GGU, C, C, GAAAG
Which of the 5 chains with these U,C fragments has
the right G fragments?
CCCCGGUGAAAG, CCCGGUCGAAAG, CCGGUCCGAAAG,
CGGUCCCGAAAG, GGUCCCCGAAAG
CCCCGGUGAAAG does not: it has CCCCG as a G
fragment. What about the others?
33
The RNA Detective Game
Checking the remaining 4 possible RNA chains with
the given U,C fragments shows that only the third
one, CCGGUCCGAAAG, has the given G fragments.
Hence, we have recovered the initial chain. This is
an example of recovery of an RNA chain given a
complete digest by enzymes. How remarkable is it
that we could recover the initial RNA chain this
way?
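Combining the two digests is a simple filter. The Python sketch below rebuilds the 5 U,C candidates and keeps those whose G digest matches. Both `digest` and the variable names are our own.

```python
from itertools import permutations

def digest(chain, cut_after):
    """Break `chain` after every base listed in `cut_after`."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

g_fragments = ["CCG", "G", "UCCG", "AAAG"]
uc_fragments = ["C", "C", "GGU", "C", "C", "GAAAG"]

# The 5 chains allowed by the U,C fragments (abnormal fragment GAAAG placed last).
normal = [f for f in uc_fragments if f[-1] in "UC"]
last = [f for f in uc_fragments if f[-1] not in "UC"][0]
uc_chains = sorted({"".join(p) + last for p in permutations(normal)})

# Keep only those whose G digest matches the observed G fragments.
consistent = [c for c in uc_chains if sorted(digest(c, "G")) == sorted(g_fragments)]
print(consistent)  # ['CCGGUCCGAAAG']
print(digest("CCCCGGUGAAAG", "G"))  # ['CCCCG', 'G', 'UG', 'AAAG']
```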
34
The RNA Detective Game
CCGGUCCGAAAG: How many RNA chains are there with
the same bases as this chain? There are 12
bases: 4 Cs, 4 Gs, 3 As, and 1 U. The number
of chains with these bases is given by
$C(12; 4, 4, 3, 1) = 138{,}600$. Thus, knowing how many
of each base the chain contains is not nearly as
useful as knowing the fragments.
35
The RNA Detective Game
Another example.
G fragments: UG, ACG, AC
U,C fragments: U, GAC, GAC
Step 1: Does any fragment have to come last?
36
The RNA Detective Game
G fragments: UG, ACG, AC
U,C fragments: U, GAC, GAC
Step 1: Does any fragment have to come last?
None of the U,C fragments has to come last.
However, the G fragment AC has to come last, since
it does not end in G. Thus, the other two G
fragments come first, in some order, and there are
only two possible RNA chains with these G
fragments: UGACGAC and ACGUGAC.
37
The RNA Detective Game
G fragments: UG, ACG, AC
U,C fragments: U, GAC, GAC
There are only two possible RNA chains with these
G fragments: UGACGAC and ACGUGAC. The latter has
AC as a U,C fragment, which is not on the list. So
the former is the correct chain.
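Both examples can be solved by one brute-force routine, sketched here in Python under our own names. Any chain with the given G fragments is some ordering of them, so it suffices to try every ordering and keep those whose G and U,C digests both match. This is exhaustive but factorial in the number of G fragments, so it only suits small cases.

```python
from itertools import permutations

def digest(chain, cut_after):
    """Break `chain` after every base listed in `cut_after`."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

def chains_from_fragments(g_fragments, uc_fragments):
    """All chains whose complete G and U,C digests match the given fragments."""
    g_sorted, uc_sorted = sorted(g_fragments), sorted(uc_fragments)
    found = set()
    for order in permutations(g_fragments):
        chain = "".join(order)
        if (sorted(digest(chain, "G")) == g_sorted
                and sorted(digest(chain, "UC")) == uc_sorted):
            found.add(chain)
    return sorted(found)

print(chains_from_fragments(["UG", "ACG", "AC"], ["U", "GAC", "GAC"]))  # ['UGACGAC']
print(chains_from_fragments(["CCG", "G", "UCCG", "AAAG"],
                            ["C", "C", "GGU", "C", "C", "GAAAG"]))  # ['CCGGUCCGAAAG']
```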
38
The RNA Detective Game
Is it always possible to completely recover the
original RNA chain given its G fragments and U,C
fragments?
39
The RNA Detective Game
Is it always possible to completely recover the
original RNA chain given its G fragments and U,C
fragments? No: sometimes the solution is
ambiguous. Exercise: Find two RNA chains with
the same G and U,C fragments.
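For experimenting with the exercise, one brute-force approach (ours, not from the slides) is to digest every chain of a given length with both enzymes and group together chains that share the same pair of fragment collections. Any group with more than one chain is an ambiguous case. Since there are $4^n$ chains of length n, this is practical only for short chains.

```python
from collections import defaultdict
from itertools import product

def digest(chain, cut_after):
    """Break `chain` after every base listed in `cut_after`."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

def ambiguous_groups(n):
    """Groups of length-n chains that share both their G and U,C fragments."""
    groups = defaultdict(list)
    for letters in product("ACGU", repeat=n):
        chain = "".join(letters)
        key = (tuple(sorted(digest(chain, "G"))), tuple(sorted(digest(chain, "UC"))))
        groups[key].append(chain)
    return [g for g in groups.values() if len(g) > 1]

groups = ambiguous_groups(6)
print(len(groups), "ambiguous groups of length-6 chains")
```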
40
Eulerian Paths
Surprisingly, eulerian paths in multidigraphs can
be used to help with the RNA detective game. When
a digraph is allowed to have more than one arc
from vertex x to vertex y, we call it a
multidigraph. A path in a multidigraph is called
eulerian if it uses every arc once and only once.
(Recall the Konigsberg Bridge Problem.) A closed
path (one that ends where it starts) is eulerian
if it is eulerian as a path.
41
Eulerian Paths
[Figure: multidigraph on vertices a, b, c, d, e]
Eulerian closed path: a, b, c, d, b, e, a
42
Eulerian Paths
[Figure: multidigraph on vertices a, b, c, d, e]
Eulerian path: a, b, c, d, b, e
43
Eulerian Paths
When does a multidigraph have an eulerian path or
closed path?
Theorem (I.J. Good, 1946): A connected multidigraph
has an eulerian closed path iff, for every vertex,
the indegree (number of incoming arcs) equals the
outdegree (number of outgoing arcs).
Theorem (I.J. Good, 1946): A connected multidigraph
has an eulerian path iff, for all vertices with the
possible exception of two, indegree equals
outdegree, and for at most two vertices, indegree
and outdegree differ by one.
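Good's conditions translate directly into code. The Python sketch below uses our own function names and reads "connected" as weak connectivity (arc directions ignored) of the vertices that touch an arc. For an open eulerian path it requires one vertex with outdegree one more than indegree and one with the reverse, which is what "differ by one" must mean because the total indegree equals the total outdegree. The example arcs are read off the two paths listed above, since the figures themselves did not survive.

```python
from collections import Counter, defaultdict

def imbalance(arcs):
    """Map each vertex to outdegree minus indegree; a loop adds 1 to each, so it cancels."""
    diff = Counter()
    for tail, head in arcs:
        diff[tail] += 1
        diff[head] -= 1
    return diff

def weakly_connected(arcs):
    """True if the vertices touched by arcs form one piece when directions are ignored."""
    neighbors = defaultdict(set)
    for tail, head in arcs:
        neighbors[tail].add(head)
        neighbors[head].add(tail)
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen, stack = {start}, [start]
    while stack:
        for w in neighbors[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == len(neighbors)

def has_eulerian_closed_path(arcs):
    return weakly_connected(arcs) and all(d == 0 for d in imbalance(arcs).values())

def has_eulerian_path(arcs):
    # Either all vertices are balanced, or one has out - in = 1 and another in - out = 1.
    unbalanced = sorted(d for d in imbalance(arcs).values() if d != 0)
    return weakly_connected(arcs) and unbalanced in ([], [-1, 1])

closed = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "b"), ("b", "e"), ("e", "a")]
open_path = closed[:-1]  # drop the arc e -> a
print(has_eulerian_closed_path(closed), has_eulerian_path(closed))        # True True
print(has_eulerian_closed_path(open_path), has_eulerian_path(open_path))  # False True
```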
44
Eulerian Paths
[Figure: example multidigraphs on vertices labeled a, b, c, d]
45
Eulerian Paths
Note that these theorems hold if there are loops
from a vertex to itself. A loop adds 1 to
indegree and 1 to outdegree. Thus, loops do not
affect the existence of eulerian paths or closed
paths.
46
Eulerian Paths and the RNA Detective Game
Assume that there are at least two G fragments
and at least two U,C fragments. Otherwise, a single
fragment is the whole chain, and we recover the
original chain at once. Example:
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
47
Eulerian Paths and the RNA Detective Game
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
Step 1: Break down each fragment after each G, U,
or C. For example:
GAAAGAA becomes G · AAAG · AA
GGU becomes G · G · U
UCACG becomes U · C · AC · G
Each piece is called an extended base. All
extended bases in a fragment except the first and
last are called interior extended bases.
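Step 1 is the same kind of split as an enzyme digest, only cutting after G, U, or C. A minimal Python sketch follows. The names `extended_bases` and `interior` are ours.

```python
def extended_bases(fragment):
    """Split a fragment after each G, U, or C; a trailing run of A's stays as one piece."""
    pieces, current = [], ""
    for base in fragment:
        current += base
        if base in "GUC":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces

def interior(fragment):
    """Extended bases other than the first and the last."""
    return extended_bases(fragment)[1:-1]

print(extended_bases("GAAAGAA"))  # ['G', 'AAAG', 'AA']
print(extended_bases("GGU"))      # ['G', 'G', 'U']
print(extended_bases("UCACG"))    # ['U', 'C', 'AC', 'G']
print(interior("UCACG"))          # ['C', 'AC']
```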
48
Eulerian Paths and the RNA Detective Game
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
Step 2: Use the extended base breakup of fragments
to find the beginning and end of the RNA chain.
Start by making two lists:
All interior extended bases of all fragments: C, C, AC, G, AAAG
Fragments with one extended base: G, AAAG, AA, C, C, C, AC
49
Eulerian Paths and the RNA Detective Game
All interior extended bases of all fragments: C, C, AC, G, AAAG
Fragments with one extended base: G, AAAG, AA, C, C, C, AC
Theorem: Every entry on the first list is on the
second list (counting repeated entries). There are
always exactly two entries on the second list not
on the first. One of these is the first extended
base of the entire RNA chain and the other is the
last. Thus the chain begins in AA or C and ends in
AA or C. How do you tell how it ends?
50
Eulerian Paths and the RNA Detective Game
Thus the chain begins in AA or C and ends in AA or
C. How do you tell how it ends? One of these
must come from an abnormal fragment: a G fragment
that doesn't end in G or a U,C fragment that
doesn't end in U or C.
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
AA is such an abnormal fragment. An abnormal
fragment marks the end of the chain. So the chain
ends in AA and begins in C.
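Step 2 can be sketched in Python with `collections.Counter` acting as a multiset, so that repeated entries such as the three Cs are counted correctly. The variable names are ours. We take the chain's last extended base from an abnormal fragment, since any abnormal fragment sits at the end of the chain.

```python
from collections import Counter

def extended_bases(fragment):
    """Split a fragment after each G, U, or C."""
    pieces, current = [], ""
    for base in fragment:
        current += base
        if base in "GUC":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces

g_fragments = ["CCG", "G", "UCACG", "AAAG", "AA"]
uc_fragments = ["C", "C", "GGU", "C", "AC", "GAAAGAA"]
all_fragments = g_fragments + uc_fragments

# The two lists from Step 2, kept as multisets.
interior = Counter(e for f in all_fragments for e in extended_bases(f)[1:-1])
single = Counter(f for f in all_fragments if len(extended_bases(f)) == 1)

assert not (interior - single)  # every interior entry also appears on the second list
ends = single - interior        # the two entries left over
print(sorted(ends.elements()))  # ['AA', 'C']

# Abnormal: a G fragment not ending in G, or a U,C fragment not ending in U or C.
abnormal = [f for f in g_fragments if f[-1] != "G"] + [f for f in uc_fragments if f[-1] not in "UC"]
last = extended_bases(abnormal[0])[-1]
rest = ends.copy()
rest[last] -= 1
(first,) = rest.elements()
print(first, last)  # C AA
```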
51
Eulerian Paths and the RNA Detective Game
Step 3: Build a multidigraph. First, identify
all normal fragments with more than one extended
base. For each such fragment, use the first and
last extended bases as vertices and draw an arc
from the first to the last. Label the arc with
the corresponding fragment.
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
Fragment UCACG gives rise to vertices U and G, and
we include an arc from U to G labeled UCACG.
52
Eulerian Paths and the RNA Detective Game
53
Eulerian Paths and the RNA Detective Game
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
Fragment CCG means that we include an arc from C to G
labeled CCG. Fragment GGU means that we include
an arc from G to U labeled GGU.
54
Eulerian Paths and the RNA Detective Game
[Figure: multidigraph with vertices C, G, U and arcs CCG (C to G), GGU (G to U), UCACG (U to G)]
55
Eulerian Paths and the RNA Detective Game
There might be several arcs from a given extended
base to another if there are several normal
fragments from the first to the second. That is
why we get a multidigraph.
Step 4: We add one additional arc. Identify the
longest abnormal fragment. Include an arc from the
first (and perhaps only) extended base in this
fragment to the first extended base in the chain.
Label this arc X*Y, where X is the longest abnormal
fragment in the chain and Y is the first extended
base in the chain.
56
Eulerian Paths and the RNA Detective Game
G fragments: CCG, G, UCACG, AAAG, AA
U,C fragments: C, C, GGU, C, AC, GAAAGAA
GAAAGAA is the longest abnormal fragment. Put in
an arc from G (the first extended base in this
fragment) to C (the first extended base in the
chain). Label the arc GAAAGAA*C.
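Steps 3 and 4 can be sketched in Python as a function returning a list of (tail, head, label) arcs. A plain list naturally allows parallel arcs, which is all a multidigraph needs here. The function names are ours, and so is the choice to write the special label as X + "*" + Y. The first extended base of the chain ("C" here) is passed in from Step 2.

```python
def extended_bases(fragment):
    """Split a fragment after each G, U, or C."""
    pieces, current = [], ""
    for base in fragment:
        current += base
        if base in "GUC":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces

def is_normal(fragment, cuts):
    """A fragment is normal if it ends in a base its enzyme cuts after."""
    return fragment[-1] in cuts

def build_multidigraph(g_fragments, uc_fragments, first_base):
    """Steps 3 and 4: return arcs as (tail, head, label) triples."""
    tagged = [(f, "G") for f in g_fragments] + [(f, "UC") for f in uc_fragments]
    arcs = []
    for fragment, cuts in tagged:
        eb = extended_bases(fragment)
        if is_normal(fragment, cuts) and len(eb) > 1:
            arcs.append((eb[0], eb[-1], fragment))
    abnormal = [f for f, cuts in tagged if not is_normal(f, cuts)]
    x = max(abnormal, key=len)  # the longest abnormal fragment
    arcs.append((extended_bases(x)[0], first_base, x + "*" + first_base))
    return arcs

g_fragments = ["CCG", "G", "UCACG", "AAAG", "AA"]
uc_fragments = ["C", "C", "GGU", "C", "AC", "GAAAGAA"]
for arc in build_multidigraph(g_fragments, uc_fragments, first_base="C"):
    print(arc)
# ('C', 'G', 'CCG')
# ('U', 'G', 'UCACG')
# ('G', 'U', 'GGU')
# ('G', 'C', 'GAAAGAA*C')
```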
57
Eulerian Paths and the RNA Detective Game
[Figure: the multidigraph with arcs CCG (C to G), GGU (G to U), UCACG (U to G), and the special arc GAAAGAA*C (G to C)]
58
Eulerian Paths and the RNA Detective Game
Theorem: This multidigraph has an eulerian closed
path. The RNA chains with the given G and U,C
fragments correspond to eulerian closed paths
that end with the special arc X*Y. In our
example, it is easy to check that the multidigraph
has an eulerian closed path. (Use I.J. Good's Theorem.)
59
Eulerian Paths and the RNA Detective Game
[Figure: the multidigraph with arcs CCG (C to G), GGU (G to U), UCACG (U to G), and GAAAGAA*C (G to C)]
The only eulerian closed path that ends in
GAAAGAA*C goes from C to G to U to G to C.
60
Eulerian Paths and the RNA Detective Game
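[Figure: the same multidigraph, with the eulerian closed path C, G, U, G, C]
Step 5: Use the corresponding labeling of arcs
to obtain the chain. Write down the labels in path
order, dropping from each label after the first its
first extended base (already written) and dropping
*C from the special arc: CCG + GU + CACG + AAAGAA
gives CCGGUCACGAAAGAA. It is easy to check that this
chain has the right G and U,C fragments.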
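Steps 4 and 5 together can be sketched in Python: a backtracking search lists every eulerian closed path that ends with the special arc, and each path is turned back into a chain by overlapping consecutive labels at their shared extended base. The function names, the "*" label convention, and the assumption that there is at least one normal arc are ours. The final lines re-digest the answer as a check.

```python
def split_after(s, cut):
    """Break s after every character in cut (used for digests)."""
    pieces, current = [], ""
    for ch in s:
        current += ch
        if ch in cut:
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces

# Arcs (tail, head, label) from Steps 3 and 4; the special arc X*Y is listed last.
arcs = [("C", "G", "CCG"), ("U", "G", "UCACG"), ("G", "U", "GGU"), ("G", "C", "GAAAGAA*C")]
SPECIAL = len(arcs) - 1

def eulerian_closed_paths_ending_special(arcs, special):
    """Backtracking over arc indices; each path starts at the head of the special arc."""
    results = []

    def extend(vertex, used, path):
        if len(path) == len(arcs) - 1:  # every normal arc used; close with the special arc
            if arcs[special][0] == vertex:
                results.append(path + [special])
            return
        for i, (tail, head, _) in enumerate(arcs):
            if i != special and i not in used and tail == vertex:
                used.add(i)
                extend(head, used, path + [i])
                used.remove(i)

    extend(arcs[special][1], set(), [])
    return results

def assemble(arcs, path):
    """Concatenate labels, skipping each later label's first extended base (its tail)."""
    chain = ""
    for i in path:
        tail, _, label = arcs[i]
        piece = label.split("*")[0]  # the special arc contributes only X
        chain = piece if not chain else chain + piece[len(tail):]
    return chain

paths = eulerian_closed_paths_ending_special(arcs, SPECIAL)
chains = sorted({assemble(arcs, p) for p in paths})
print(chains)  # ['CCGGUCACGAAAGAA']

# Check the answer against the original digests.
g_fragments = ["CCG", "G", "UCACG", "AAAG", "AA"]
uc_fragments = ["C", "C", "GGU", "C", "AC", "GAAAGAA"]
for chain in chains:
    assert sorted(split_after(chain, "G")) == sorted(g_fragments)
    assert sorted(split_after(chain, "UC")) == sorted(uc_fragments)
```

Using a list of arc indices, rather than vertex pairs, keeps parallel arcs distinct, which matters once the multidigraph has several arcs between the same two extended bases.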
61
The RNA Detective Game Concluding Comments
The fragmentation stratagem we have described
was used by R.W. Holley and his colleagues at
Cornell in 1965 to determine the first nucleic
acid sequence. The method is not used anymore
and was only used for a short time before other,
more efficient methods were adopted. However, it
has great historical significance and illustrates
an important role for mathematical methods in
biology.
62
The RNA Detective Game Concluding Comments
Nowadays, by use of radioactive marking and
high-speed computer analysis, it is possible to
sequence long RNA and DNA chains rather quickly.
63
The RNA Detective Game Concluding Comments
The mathematical power of the fragmentation
stratagem, nevertheless, is a good illustration
of the use of methods of discrete mathematics in
modern molecular biology.
64
The RNA Detective Game Concluding Comments
And of the power of counting!
65
The RNA Detective Game Enjoy it with Your
Students