Title: Evolutionary Change in Nucleotide Sequences
So far, we have described the evolutionary process as a series of gene substitutions in which new alleles, each arising as a mutation in a single individual, progressively increase in frequency and ultimately become fixed in the population.
We may look at the process from a different point of view. An allele that becomes fixed differs in sequence from the allele that it replaces. That is, the substitution of a new allele for an old one is the substitution of a new sequence for a previous sequence.
If we use a time scale in which one time unit is larger than the time of fixation, then the DNA sequence at any given locus will appear to change with time.
actgggggtaaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggtgaactatcggtatagatcataa
actgggggtgaactatcggtacagatcataa
To study the dynamics of nucleotide substitution, we must make several assumptions regarding the probability of substitution of one nucleotide by another.
Jukes and Cantor's one-parameter model
Assumption
- Substitutions occur with equal probabilities among the four nucleotide types: each nucleotide changes into each of the other three with probability $\alpha$ per unit time.
If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_A(t)$, that this site will be occupied by A at time $t$?
Since we start with A, $P_A(0) = 1$. At time 1, the probability of still having A at this site is
$$P_A(1) = 1 - 3\alpha$$
where $3\alpha$ is the probability of A changing to T, C, or G, and $1 - 3\alpha$ is the probability that A has remained unchanged.
To derive the probability of having A at time 2, we consider two possible scenarios:
1. The nucleotide has remained unchanged from time 0 to time 2.
2. The nucleotide has changed to T, C, or G at time 1, but has subsequently reverted to A at time 2.
Combining the two scenarios gives
$$P_A(2) = (1 - 3\alpha)P_A(1) + \alpha[1 - P_A(1)]$$
The following recurrence equation applies to any $t$ and $t + 1$:
$$P_A(t+1) = (1 - 3\alpha)P_A(t) + \alpha[1 - P_A(t)]$$
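We can rewrite the equation in terms of the amount of change in $P_A(t)$ per unit time as
$$\Delta P_A(t) = P_A(t+1) - P_A(t) = -3\alpha P_A(t) + \alpha[1 - P_A(t)] = -4\alpha P_A(t) + \alpha$$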
We approximate the discrete-time process by a continuous-time model, by regarding $\Delta P_A(t)$ as the rate of change at time $t$:
$$\frac{dP_A(t)}{dt} = -4\alpha P_A(t) + \alpha$$
The solution is
$$P_A(t) = \frac{1}{4} + \left(P_A(0) - \frac{1}{4}\right)e^{-4\alpha t}$$
If we start with A, the probability that the site has A at time 0 is 1. Thus, $P_A(0) = 1$, and consequently,
$$P_A(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$$
If we start with a nucleotide other than A, the probability that the site has A at time 0 is 0. Thus, $P_A(0) = 0$, and consequently,
$$P_A(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$$
In the Jukes and Cantor model, the probability of each of the four nucleotides at equilibrium ($t = \infty$) is 1/4.
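The approach to equilibrium can be checked numerically. The sketch below iterates the discrete one-step rule (stay A with probability 1 - 3*alpha, return to A from another nucleotide with probability alpha) and compares it with the standard closed form for this model, 1/4 + (P_A(0) - 1/4) exp(-4 alpha t). The function names and the value of `alpha` are hypothetical, chosen only for illustration.

```python
import math


def p_a_discrete(alpha, steps, start_is_a=True):
    """Iterate P_A(t+1) = (1 - 3*alpha)*P_A(t) + alpha*(1 - P_A(t))."""
    p = 1.0 if start_is_a else 0.0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def p_a_continuous(alpha, t, start_is_a=True):
    """Closed form P_A(t) = 1/4 + (P_A(0) - 1/4) * exp(-4*alpha*t)."""
    p0 = 1.0 if start_is_a else 0.0
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)


alpha = 1e-3  # hypothetical substitution probability per time unit
for t in (0, 10, 100, 1000, 10000):
    # Columns: time, discrete iteration, continuous approximation
    print(t, round(p_a_discrete(alpha, t), 4), round(p_a_continuous(alpha, t), 4))
```

For small `alpha` the two columns should agree closely, and both should drift toward 1/4 whether the site starts as A or not.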
So far, we have treated $P_A(t)$ as a probability. However, $P_A(t)$ can also be interpreted as the frequency of A in a DNA sequence at time $t$. For example, if we start with a sequence made of adenines only, then $P_A(0) = 1$, and $P_A(t)$ is the expected frequency of A in the sequence at time $t$. The expected frequency of A in the sequence at equilibrium will be 1/4, and so will the expected frequencies of T, C, and G.
After reaching equilibrium, no further change in the nucleotide frequencies is expected to occur. However, the actual frequencies of the nucleotides will remain unchanged only in DNA sequences of infinite length. In practice, fluctuations in nucleotide frequencies are likely to occur.
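To give a rough sense of how large such fluctuations might be, the sketch below computes the standard deviation of the observed frequency of one nucleotide in a sequence of finite length. It assumes, as a simplification not stated in the text, that sites are independent and each carries the nucleotide with probability 1/4, so the count follows a binomial distribution. The function name is hypothetical.

```python
import math


def equilibrium_freq_sd(length, freq=0.25):
    """Binomial standard deviation of an observed nucleotide frequency.

    Assumes independent sites, each carrying the nucleotide with
    probability `freq` (1/4 at equilibrium in the one-parameter model).
    """
    return math.sqrt(freq * (1 - freq) / length)


for n_sites in (100, 1_000, 10_000, 1_000_000):
    print(n_sites, round(equilibrium_freq_sd(n_sites), 5))
```

The standard deviation shrinks with the square root of the sequence length, which is why only an infinitely long sequence would show no fluctuation at all.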
Too long even for Methuselah, who is said to have lived 187 years (Genesis 5:25).
NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO DNA SEQUENCES
After two nucleotide sequences diverge from each other, each of them will start accumulating nucleotide substitutions. If two sequences of length $N$ differ from each other at $n$ sites, then the proportion of differences, $n/N$, is referred to as the degree of divergence or Hamming distance. Degrees of divergence are usually expressed as percentages ($n/N \times 100$).
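The degree of divergence is simple to compute for two aligned sequences. The following is a minimal sketch, assuming the sequences have equal length and contain no gaps. The function name is hypothetical, and the two example strings are the first and last sequences in the series listed earlier in this transcript.

```python
def degree_of_divergence(seq1, seq2):
    """Return (n, n/N): the number and proportion of differing sites."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("sequences must not be empty")
    # Compare site by site, ignoring letter case
    n = sum(1 for a, b in zip(seq1.lower(), seq2.lower()) if a != b)
    return n, n / len(seq1)


first = "actgggggtaaactatcggtatagatcataa"
last = "actgggggtgaactatcggtacagatcataa"
n, p = degree_of_divergence(first, last)
print(n, len(first), f"{100 * p:.1f}%")  # prints: 2 31 6.5%
```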
The observed number of differences is likely to be smaller than the actual number of substitutions due to multiple hits at the same site.
13 mutations, 3 differences
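The size of this shortfall can be illustrated under the one-parameter model. The sketch below uses the standard Jukes-Cantor relationship between the actual number of substitutions per site, K, and the expected proportion of observed differences, p = 3/4 (1 - exp(-4K/3)). The function name and the chosen K values are hypothetical.

```python
import math


def expected_observed_difference(k):
    """Expected proportion of differing sites for K substitutions per site,
    assuming the Jukes-Cantor one-parameter model."""
    return 0.75 * (1 - math.exp(-4 * k / 3))


for k in (0.01, 0.1, 0.5, 1.0, 2.0):
    p = expected_observed_difference(k)
    print(f"K = {k:4.2f}  expected p = {p:.4f}  hidden = {k - p:.4f}")
```

For small K the observed proportion is close to K, but as K grows the observed proportion saturates below 3/4 and most substitutions become invisible.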
Number of substitutions between two noncoding (NOT protein-coding) sequences
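The one-parameter model
In this model, it is sufficient to consider only $I(t)$, which is the probability that the nucleotide at a given site at time $t$ is the same in both sequences:
$$I(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$$
where $I(t)$ is also the expected proportion of identical nucleotides between two sequences that diverged $t$ time units ago. The exponent is $8\alpha t$ rather than $4\alpha t$ because both sequences have been accumulating substitutions for $t$ time units.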
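The probability that the two sequences are different at a site at time $t$ is $p = 1 - I(t)$, that is,
$$p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$$
$t$ is usually not known and, thus, we cannot estimate $\alpha$. Instead, we compute $K$, which is the number of substitutions per site since the time of divergence between the two sequences. In the one-parameter model, $K = 2(3\alpha t)$, so that
$$K = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$$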
$L$ = number of sites compared between the two sequences.
Jukes and Cantor's one-parameter model
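As a minimal sketch of how the one-parameter estimate might be computed, the function below applies the standard Jukes-Cantor correction K = -3/4 ln(1 - 4p/3) to an observed proportion of differences p. It also returns a commonly quoted large-sample approximation of the sampling variance, p(1 - p) / [L (1 - 4p/3)^2], which is an addition not spelled out in this transcript. The function name and the input values are hypothetical.

```python
import math


def jc_distance(p, n_sites):
    """Jukes-Cantor estimate of substitutions per site and its variance.

    p       -- observed proportion of differing sites
    n_sites -- number of sites compared (L in the text)
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the estimate to exist")
    k = -0.75 * math.log(1 - 4 * p / 3)
    # Large-sample approximation of the sampling variance (assumption)
    variance = p * (1 - p) / (n_sites * (1 - 4 * p / 3) ** 2)
    return k, variance


k, variance = jc_distance(p=0.10, n_sites=500)  # hypothetical data
print(f"K = {k:.4f}, standard error = {math.sqrt(variance):.4f}")
```

The estimate is always at least as large as p, and it is undefined once p reaches 3/4, the value expected for two unrelated sequences.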
Kimura's two-parameter model
Assumptions
- The rate of transitional substitution at each nucleotide site is $\alpha$ per unit time.
- The rate of each type of transversional substitution is $\beta$ per unit time.
$\alpha / \beta \approx 5$ to $10$
If the nucleotide residing at a certain site in a DNA sequence is A at time 0, what is the probability, $P_{AA}(t)$, that this site will be occupied by A at time $t$?
After one time unit, the probability of A changing into G is $\alpha$, the probability of A changing into C is $\beta$, and the probability of A changing into T is $\beta$. Thus, the probability of A remaining unchanged after one time unit is
$$P_{AA}(1) = 1 - \alpha - 2\beta$$
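To derive the probability of having A at time 2, we consider four possible scenarios:
1. A remained unchanged at $t = 1$ and $t = 2$.
2. A changed into G at $t = 1$ and reverted by a transition to A at $t = 2$.
3. A changed into C at $t = 1$ and reverted by a transversion to A at $t = 2$.
4. A changed into T at $t = 1$ and reverted by a transversion to A at $t = 2$.
Combining the four scenarios gives
$$P_{AA}(2) = (1 - \alpha - 2\beta)P_{AA}(1) + \alpha P_{GA}(1) + \beta P_{CA}(1) + \beta P_{TA}(1)$$
where $P_{GA}(t)$, $P_{CA}(t)$, and $P_{TA}(t)$ are the probabilities that a site occupied by G, C, or T at time 0 is occupied by A at time $t$.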
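By extension, we obtain the following recurrence equation for the general case:
$$P_{AA}(t+1) = (1 - \alpha - 2\beta)P_{AA}(t) + \alpha P_{GA}(t) + \beta P_{CA}(t) + \beta P_{TA}(t)$$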
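The recurrence can be iterated directly. The sketch below tracks the probabilities that a site holds A, G, C, or T at time t, given A at time 0, and advances them one time unit at a time: a nucleotide is kept with probability 1 - alpha - 2*beta, reached from its transition partner with probability alpha, and reached from each of the two other nucleotides with probability beta. The function name and the rate values are hypothetical.

```python
# Transition partners: purine <-> purine, pyrimidine <-> pyrimidine
PARTNER = {"A": "G", "G": "A", "C": "T", "T": "C"}


def k2p_step(probs, alpha, beta):
    """Advance the four nucleotide probabilities by one time unit."""
    stay = 1 - alpha - 2 * beta
    new_probs = {}
    for nuc in "AGCT":
        # The two nucleotides that reach `nuc` by a transversion
        others = [m for m in "AGCT" if m != nuc and m != PARTNER[nuc]]
        new_probs[nuc] = (
            stay * probs[nuc]
            + alpha * probs[PARTNER[nuc]]
            + beta * sum(probs[m] for m in others)
        )
    return new_probs


alpha, beta = 0.005, 0.001  # hypothetical rates per time unit
probs = {"A": 1.0, "G": 0.0, "C": 0.0, "T": 0.0}  # start with A
for _ in range(2):
    probs = k2p_step(probs, alpha, beta)
print(probs["A"])  # probability of A at time 2
```

After two steps the probability of A equals (1 - alpha - 2*beta)^2 + alpha^2 + 2*beta^2, which is the sum over the four scenarios listed above.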
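After rewriting this equation as the amount of change in $P_{AA}(t)$ per unit time, and after approximating the discrete-time model by the continuous-time model, we obtain the following differential equation:
$$\frac{dP_{AA}(t)}{dt} = -(\alpha + 2\beta)P_{AA}(t) + \alpha P_{GA}(t) + \beta P_{CA}(t) + \beta P_{TA}(t)$$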
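Similarly, we can obtain equations for $P_{TA}(t)$, $P_{CA}(t)$, and $P_{GA}(t)$, and from this set of four equations, we arrive at the following solution:
$$P_{AA}(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha + \beta)t}$$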
In the Jukes-Cantor model, $P_{AA}(t) = P_{GG}(t) = P_{CC}(t) = P_{TT}(t)$. Because of the symmetry of the substitution scheme, this equality also holds for Kimura's two-parameter model.
3 probabilities
$X(t)$: the probability that a nucleotide at a site at time $t$ is identical to that at time 0:
$$X(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha + \beta)t}$$
At equilibrium, the equation reduces to $X(\infty) = 1/4$. Thus, as in the case of Jukes and Cantor's model, the equilibrium frequencies of the four nucleotides are 1/4.
$Y(t)$: the probability that the initial nucleotide and the nucleotide at time $t$ differ from each other by a transition. Because of the symmetry of the substitution scheme, $Y(t) = P_{AG}(t) = P_{GA}(t) = P_{TC}(t) = P_{CT}(t)$:
$$Y(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} - \frac{1}{2}e^{-2(\alpha + \beta)t}$$
$Z(t)$: the probability that the nucleotide at time $t$ and the initial nucleotide differ by a specific type of transversion, which is given by
$$Z(t) = \frac{1}{4} - \frac{1}{4}e^{-4\beta t}$$
Each nucleotide is subject to two types of transversion, but only one type of transition. Therefore, the probability that the initial nucleotide and the nucleotide at time $t$ differ by a transversion is $2Z(t)$, twice the probability that they differ by a specific type of transversion. The three probabilities therefore satisfy
$$X(t) + Y(t) + 2Z(t) = 1$$
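The three probabilities can be evaluated with the standard closed-form expressions for Kimura's two-parameter model, which makes it easy to confirm that they sum to 1 and that all of them approach 1/4 at equilibrium. This is a minimal sketch: the function name and the rate values are hypothetical, and the formulas are the usual textbook solutions rather than something recovered from the slides themselves.

```python
import math


def k2p_probabilities(alpha, beta, t):
    """Return (X, Y, Z) at time t for transition rate alpha and
    per-type transversion rate beta."""
    e1 = math.exp(-4 * beta * t)
    e2 = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e1 + 0.5 * e2  # identical to the initial nucleotide
    y = 0.25 + 0.25 * e1 - 0.5 * e2  # differs by a transition
    z = 0.25 - 0.25 * e1             # differs by one specific transversion
    return x, y, z


alpha, beta = 0.005, 0.001  # hypothetical rates, ratio 5
for t in (0, 10, 100, 1000, 10000):
    x, y, z = k2p_probabilities(alpha, beta, t)
    print(t, round(x, 4), round(y, 4), round(z, 4), round(x + y + 2 * z, 4))
```

When `alpha` equals `beta`, X(t) reduces to the one-parameter result 1/4 + 3/4 exp(-4 alpha t), which is a useful consistency check.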
Number of substitutions between two noncoding (NOT protein-coding) sequences
The differences between two sequences are classified into transitions and transversions.
$P$ = proportion of transitional differences
$Q$ = proportion of transversional differences
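Counting the two kinds of differences is straightforward once the nucleotides are grouped into purines (A, G) and pyrimidines (C, T): a difference within a group is a transition, and a difference between groups is a transversion. The sketch below assumes two aligned, gap-free sequences containing only a, c, g, and t. The function name is hypothetical, and the example strings are the first and last sequences in the series listed earlier in this transcript.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def transition_transversion_proportions(seq1, seq2):
    """Return (P, Q): proportions of transitional and transversional
    differences between two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq1.lower(), seq2.lower()):
        if a == b:
            continue
        # Assumes only a, c, g, t occur; anything else counts as a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n_sites = len(seq1)
    return transitions / n_sites, transversions / n_sites


first = "actgggggtaaactatcggtatagatcataa"
last = "actgggggtgaactatcggtacagatcataa"
P, Q = transition_transversion_proportions(first, last)
print(P, Q)
```

In this pair both differences (a versus g, and t versus c) are transitions, so Q is 0.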
Numerical example (two-parameter model)
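The figures of the original example are not available in this transcript, so the following sketch uses made-up values of P and Q purely for illustration. It applies the standard two-parameter estimate, K = 1/2 ln[1 / (1 - 2P - Q)] + 1/4 ln[1 / (1 - 2Q)], and compares it with the one-parameter estimate computed from the total proportion of differences, P + Q. The function names are hypothetical.

```python
import math


def k2p_distance(p, q):
    """Two-parameter estimate of substitutions per site.

    p -- proportion of transitional differences
    q -- proportion of transversional differences
    """
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("P and Q are too large for the estimate to exist")
    return 0.5 * math.log(1 / a) + 0.25 * math.log(1 / b)


def jc_distance(p_total):
    """One-parameter estimate from the total proportion of differences."""
    return -0.75 * math.log(1 - 4 * p_total / 3)


P, Q = 0.10, 0.05  # hypothetical proportions, not taken from the slides
print(f"two-parameter K = {k2p_distance(P, Q):.4f}")
print(f"one-parameter K = {jc_distance(P + Q):.4f}")
```

With an excess of transitions, as in these made-up values, the two-parameter estimate comes out slightly larger than the one-parameter estimate. The two coincide when P equals Q/2, that is, when each of the three possible changes at a site is equally frequent.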
There are substitution schemes with more than two parameters!