Evolutionary Change in Nucleotide Sequences - PowerPoint PPT Presentation


About This Presentation

Slides: 69

Provided by: gra90

Transcript and Presenter's Notes


1
Evolutionary Change in Nucleotide Sequences

Dan Graur

2
So far, we described the evolutionary process as
a series of gene substitutions in which new
alleles, each arising as a mutation in a single
individual, progressively increase their
frequency and ultimately become fixed in the
population.
3
We may look at the process from a different point
of view. An allele that becomes fixed is
different in its sequence from the allele that it
replaces. That is, the substitution of a new
allele for an old one is the substitution of a
new sequence for a previous sequence.
4
If we use a time scale in which one time unit is
larger than the time of fixation, then the DNA
sequence at any given locus will appear to change
with time.
actgggggtaaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggttaactatcggtatagatcataa
actgggggtgaactatcggtatagatcataa
actgggggtgaactatcggtacagatcataa
5
To study the dynamics of nucleotide substitution,
we must make several assumptions regarding the
probability of substitution of a nucleotide by
another.
6
Jukes and Cantor's one-parameter model
7
Assumption

Substitutions occur with equal probabilities
among the four nucleotide types: the rate of
substitution from any nucleotide to each of the
other three is α per unit time.

8
If the nucleotide residing at a certain site in a
DNA sequence is A at time 0, what is the
probability, PA(t), that this site will be
occupied by A at time t?
9
Since we start with A, PA(0) = 1. At time 1, the
probability of still having A at this site is

$$P_A(1) = 1 - 3\alpha$$

where 3α is the probability of A changing to T,
C, or G, and 1 − 3α is the probability that A has
remained unchanged.
10
To derive the probability of having A at time 2,
we consider two possible scenarios:
11
1. The nucleotide has remained unchanged from
time 0 to time 2.
12
2. The nucleotide has changed to T, C, or G at
time 1, but has subsequently reverted to A at
time 2.
13
(No Transcript)
14
The following equation applies to any t and t + 1:

$$P_A(t+1) = (1 - 3\alpha)\,P_A(t) + \alpha\,[1 - P_A(t)]$$
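The recurrence PA(t + 1) = (1 − 3α)PA(t) + α[1 − PA(t)] can be iterated directly to watch PA(t) move toward 1/4. This is a minimal sketch, not code from the presentation; the function name `jc_discrete` and the rate α = 0.01 per time step are hypothetical choices for illustration.

```python
def jc_discrete(p0: float, alpha: float, steps: int) -> list[float]:
    """Iterate PA(t+1) = (1 - 3*alpha) * PA(t) + alpha * (1 - PA(t))."""
    if not 0 <= 3 * alpha <= 1:
        raise ValueError("need 0 <= 3*alpha <= 1 for valid probabilities")
    history = [p0]
    p = p0
    for _ in range(steps):
        # A stays A with prob 1 - 3*alpha; T, C, or G becomes A with prob alpha
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
        history.append(p)
    return history


alpha = 0.01  # hypothetical per-step rate for each specific substitution
traj = jc_discrete(1.0, alpha, 1000)
print(traj[1])   # equals 1 - 3*alpha, about 0.97
print(traj[-1])  # very close to the equilibrium value 1/4
```

Each step multiplies the gap PA(t) − 1/4 by 1 − 4α, so the approach to 1/4 is geometric.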
15
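We can rewrite the equation in terms of the
amount of change in PA(t) per unit time as

$$\Delta P_A(t) = P_A(t+1) - P_A(t) = -3\alpha P_A(t) + \alpha\,[1 - P_A(t)] = \alpha - 4\alpha P_A(t)$$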
16
We approximate the discrete-time process by a
continuous-time model, by regarding ΔPA(t) as the
rate of change at time t:

$$\frac{dP_A(t)}{dt} = -4\alpha P_A(t) + \alpha$$
17
The solution is

$$P_A(t) = \frac{1}{4} + \left(P_A(0) - \frac{1}{4}\right)e^{-4\alpha t}$$
18
If we start with A, the probability that the site
has A at time 0 is 1. Thus, PA(0) = 1, and
consequently,

$$P_A(t) = \frac{1}{4} + \frac{3}{4}e^{-4\alpha t}$$
19
If we start with non-A, the probability that the
site has A at time 0 is 0. Thus, PA(0) = 0, and
consequently,

$$P_A(t) = \frac{1}{4} - \frac{1}{4}e^{-4\alpha t}$$
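The closed-form Jukes and Cantor solution, PA(t) = 1/4 + (PA(0) − 1/4)e^(−4αt), can be evaluated for both starting conditions. The sketch below is illustrative only; `jc_prob_a` is a hypothetical helper and α = 10⁻³ per unit time is an assumed value. It also checks a consequence of the model's symmetry: PA(t) for a site that started as non-A equals the probability that a site starting as A has become one specific other nucleotide, so "stayed A" plus three times "became a specific other base" must sum to 1.

```python
import math


def jc_prob_a(t: float, alpha: float, start_is_a: bool) -> float:
    """Continuous-time Jukes-Cantor: PA(t) = 1/4 + (PA(0) - 1/4) * exp(-4*alpha*t)."""
    p0 = 1.0 if start_is_a else 0.0
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)


alpha = 1e-3  # hypothetical rate per unit time
for t in (0, 100, 1000, 10000):
    same = jc_prob_a(t, alpha, True)    # 1/4 + 3/4 e^(-4at)
    other = jc_prob_a(t, alpha, False)  # 1/4 - 1/4 e^(-4at)
    # the four outcomes for a site that started as A must sum to 1
    print(t, round(same, 4), round(other, 4), round(same + 3 * other, 12))
```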
20
In the Jukes and Cantor model, the probability of
each of the four nucleotides at equilibrium
(t → ∞) is 1/4.
21
So far, we treated PA(t) as a probability.
However, PA(t) can also be interpreted as the
frequency of A in a DNA sequence at time t. For
example, if we start with a sequence made of
adenines only, then PA(0) = 1, and PA(t) is the
expected frequency of A in the sequence at time
t. The expected frequency of A in the sequence
at equilibrium will be 1/4, and so will the
expected frequencies of T, C, and G.
22
After equilibrium is reached, no further change in
the nucleotide frequencies is expected to occur.
However, the actual frequencies of the
nucleotides will remain unchanged only in DNA
sequences of infinite length. In practice,
fluctuations in nucleotide frequencies are likely
to occur.
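The fluctuations in a finite sequence can be seen with a small Monte Carlo run. This is a sketch under stated assumptions, not the presenter's code: it uses the discrete-time version of the model (each site changes with probability 3α per step, to one of the other three nucleotides at random), and the sequence length of 5,000, the rate α = 0.01, the 200 steps, and the helper `evolve_jc` are all hypothetical choices.

```python
import random
from collections import Counter

NUCLEOTIDES = "ATCG"


def evolve_jc(seq: str, alpha: float, steps: int, rng: random.Random) -> str:
    """Discrete-time Jukes-Cantor simulation: each step, every site changes
    with probability 3*alpha, to one of the three other nucleotides uniformly."""
    sites = list(seq)
    for _ in range(steps):
        for i, base in enumerate(sites):
            if rng.random() < 3 * alpha:
                sites[i] = rng.choice([b for b in NUCLEOTIDES if b != base])
    return "".join(sites)


rng = random.Random(42)  # fixed seed so the run is reproducible
start = "A" * 5_000      # a sequence made of adenines only, so PA(0) = 1
end = evolve_jc(start, alpha=0.01, steps=200, rng=rng)
counts = Counter(end)
freqs = {b: counts[b] / len(end) for b in NUCLEOTIDES}
print(freqs)  # each near 0.25, but not exactly 0.25 in a finite sequence
```

Rerunning with different seeds shows the observed frequencies scattering around 1/4; the scatter shrinks as the sequence gets longer.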
23
Too long even for Methuselah, who is said to have
lived 969 years (Genesis 5:27)
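How long the approach to equilibrium takes can be estimated by solving the Jukes and Cantor solution for the time at which PA(t), starting from PA(0) = 1, comes within ε of 1/4: (3/4)e^(−4αt) = ε gives t = ln(3/(4ε)) / (4α). The rate α = 10⁻⁹ per site per year and the tolerance ε = 0.01 below are hypothetical values chosen for illustration, not figures from the slides.

```python
import math


def time_to_near_equilibrium(alpha: float, eps: float) -> float:
    """Time t at which PA(t) - 1/4 = eps, starting from PA(0) = 1,
    i.e. the solution of (3/4) * exp(-4*alpha*t) = eps."""
    if not 0 < eps < 0.75:
        raise ValueError("eps must lie in (0, 3/4)")
    return math.log(0.75 / eps) / (4 * alpha)


# hypothetical rate for each specific substitution: 1e-9 per site per year
t = time_to_near_equilibrium(alpha=1e-9, eps=0.01)
print(f"{t:.3g} years")  # on the order of a billion years
```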
24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
NUMBER OF NUCLEOTIDE SUBSTITUTIONS BETWEEN TWO
DNA SEQUENCES
28
After two nucleotide sequences diverge from each
other, each of them will start accumulating
nucleotide substitutions. If two sequences of
length N differ from each other at n sites, then
the proportion of differences, n/N, is referred
to as the degree of divergence or (normalized)
Hamming distance. Degrees of divergence are
usually expressed as percentages (n/N × 100).
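Computing n/N for two aligned sequences is a single pass over the sites. This is a minimal sketch that assumes the sequences are already aligned and of equal length; the function name `degree_of_divergence` and the example sequences are hypothetical.

```python
def degree_of_divergence(seq1: str, seq2: str) -> float:
    """Return n/N: the proportion of sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    n = sum(a != b for a, b in zip(seq1, seq2))  # number of differing sites
    return n / len(seq1)


# hypothetical aligned sequences: N = 10 sites, n = 2 differences
d = degree_of_divergence("ACTGGGGGTA", "ACTGAGGGTG")
print(f"n/N = {d:.2f}, or {d * 100:.0f}%")  # n/N = 0.20, or 20%
```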
29
(No Transcript)
30
The observed number of differences is likely to
be smaller than the actual number of
substitutions due to multiple hits at the same
site.
31
13 mutations, 3 differences
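The sequence snapshots shown on slide 4 already contain a multiple hit. This sketch copies those six snapshots and compares the substitutions accumulated along the lineage (summing changes between consecutive snapshots) with the differences visible when only the first and last sequences are compared. The helper `hamming` is a hypothetical name, and counting consecutive differences assumes no site changes more than once between two adjacent snapshots.

```python
def hamming(a: str, b: str) -> int:
    """Number of sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# successive snapshots of one locus, copied from slide 4
snapshots = [
    "actgggggtaaactatcggtatagatcataa",
    "actgggggttaactatcggtatagatcataa",
    "actgggggttaactatcggtatagatcataa",
    "actgggggttaactatcggtatagatcataa",
    "actgggggtgaactatcggtatagatcataa",
    "actgggggtgaactatcggtacagatcataa",
]

# substitutions along the lineage: changes between consecutive snapshots
substitutions = sum(hamming(a, b) for a, b in zip(snapshots, snapshots[1:]))
# differences seen when only the two endpoints are compared
differences = hamming(snapshots[0], snapshots[-1])
print(substitutions, differences)  # 3 2: the tenth site was hit twice (a -> t -> g)
```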
32
(No Transcript)
33
Number of substitutions between two noncoding
(NOT protein coding) sequences
34
The one-parameter model
In this model, it is sufficient to consider only
I(t), which is the probability that the
nucleotide at a given site at time t is the same
in both sequences.
35
$$I(t) = \frac{1}{4} + \frac{3}{4}e^{-8\alpha t}$$

where I(t) is the proportion of identical
nucleotides between two sequences that diverged t
time units ago.
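The expression for I(t) follows from the single-lineage solution: if two descendants evolve independently for t time units from the same ancestral nucleotide, the chance that they agree is the sum, over the four nucleotides, of the product of each lineage's probability of carrying it. This sketch checks that numerically against 1/4 + (3/4)e^(−8αt). The ancestral nucleotide A, the rate α = 10⁻³, the time values, and the helper names are assumptions for illustration; by the model's symmetry the starting nucleotide does not matter.

```python
import math


def jc_single_lineage(t: float, alpha: float) -> list[float]:
    """Probabilities of A, T, C, G at time t for one lineage that started as A."""
    e = math.exp(-4 * alpha * t)
    same = 0.25 + 0.75 * e   # still (or again) A
    other = 0.25 - 0.25 * e  # one specific other nucleotide
    return [same, other, other, other]


def identity_two_lineages(t: float, alpha: float) -> float:
    """I(t): sum over nucleotides of P(lineage 1 has it) * P(lineage 2 has it)."""
    probs = jc_single_lineage(t, alpha)
    return sum(p * p for p in probs)


alpha, t = 1e-3, 250.0  # hypothetical values
closed_form = 0.25 + 0.75 * math.exp(-8 * alpha * t)
print(identity_two_lineages(t, alpha), closed_form)  # the two agree
```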
36
The probability that the two sequences are
different at a site at time t is p = 1 − I(t).
Since t is usually not known, we cannot estimate
α. Instead, we compute K, the number of
substitutions per site since the time of
divergence between the two sequences. Because
each of the two lineages accumulates 3α
substitutions per site per unit time, K = 2(3αt)
= 6αt.
37
(No Transcript)
38
L = number of sites compared between the two
sequences.
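The slides that follow are not transcribed, so we assume they show the standard Jukes and Cantor estimator, K = −(3/4) ln(1 − 4p/3), together with its usual sampling variance, V(K) = p(1 − p) / [L(1 − 4p/3)²], where L is the number of sites compared. The sketch below implements those two formulas; the function name `jukes_cantor_k` and the example values (p = 0.20, L = 100) are hypothetical.

```python
import math


def jukes_cantor_k(p: float, sites: int) -> tuple[float, float]:
    """Return (K, V(K)) for a proportion p of differing sites among L = sites compared.

    K    = -(3/4) ln(1 - 4p/3)
    V(K) = p(1 - p) / [L (1 - 4p/3)^2]
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 3/4); beyond that K is undefined")
    w = 1 - 4 * p / 3
    k = -0.75 * math.log(w)
    variance = p * (1 - p) / (sites * w * w)
    return k, variance


# hypothetical comparison: 20 differences among L = 100 aligned sites
k, v = jukes_cantor_k(p=0.20, sites=100)
print(f"K = {k:.4f} substitutions per site, SE = {math.sqrt(v):.4f}")
```

K exceeds p because the logarithm corrects for multiple hits; substituting p = (3/4)(1 − e^(−8αt)) into the estimator recovers K = 6αt exactly.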
39
Jukes and Cantor's one-parameter model
40
(No Transcript)
41
Kimura's two-parameter model
42
Assumptions

The rate of transitional substitution at each
nucleotide site is α per unit time.
The rate of each type of transversional
substitution is β per unit time.

43
α/β ≈ 5–10
44
If the nucleotide residing at a certain site in a
DNA sequence is A at time 0, what is the
probability, PA(t), that this site will be
occupied by A at time t?
45
After one time unit, the probability of A changing
into G is α, the probability of A changing into C
is β, and the probability of A changing into T is
β. Thus, the probability of A remaining unchanged
after one time unit is

$$P_{AA}(1) = 1 - \alpha - 2\beta$$
46
To derive the probability of having A at time 2,
we consider four possible scenarios:
47
1. A remained unchanged at t = 1 and t = 2
48
2. A changed into G at t = 1 and reverted by a
transition to A at t = 2
49
3. A changed into C at t = 1 and reverted by a
transversion to A at t = 2
50
4. A changed into T at t = 1 and reverted by a
transversion to A at t = 2
51
(No Transcript)
52
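By extension, we obtain the following recurrence
equation for the general case:

$$P_{AA}(t+1) = (1 - \alpha - 2\beta)\,P_{AA}(t) + \beta\,P_{TA}(t) + \beta\,P_{CA}(t) + \alpha\,P_{GA}(t)$$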
53
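After rewriting this equation as the amount of
change in PAA(t) per unit time, and after
approximating the discrete-time model by the
continuous-time model, we obtain the following
differential equation:

$$\frac{dP_{AA}(t)}{dt} = -(\alpha + 2\beta)\,P_{AA}(t) + \beta\,P_{TA}(t) + \beta\,P_{CA}(t) + \alpha\,P_{GA}(t)$$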
54
Similarly, we can obtain equations for PTA(t),
PCA(t), and PGA(t), and from this set of four
equations, we arrive at the following solution:

$$P_{AA}(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$$
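One way to check the continuous-time solution PAA(t) = 1/4 + (1/4)e^(−4βt) + (1/2)e^(−2(α+β)t) is to run the discrete recurrence for all four states at once with a 4 × 4 one-step matrix (α for the transition, β for each transversion, 1 − α − 2β for no change) and compare. This is a sketch, not the presenter's method; the helper names and the rates α = 10⁻⁴, β = 2 × 10⁻⁵ (so α/β = 5) are hypothetical. For small rates the two agree closely, because the discrete model decays by factors 1 − 4β and 1 − 2(α + β) per step instead of the exponentials.

```python
import math

PURINES = {"A", "G"}
ORDER = "ATCG"


def k2p_step_matrix(alpha: float, beta: float) -> list[list[float]]:
    """One-step probabilities M[i][j] of changing from ORDER[i] to ORDER[j]."""
    m = []
    for x in ORDER:
        row = []
        for y in ORDER:
            if x == y:
                row.append(1 - alpha - 2 * beta)  # no change
            elif (x in PURINES) == (y in PURINES):
                row.append(alpha)                 # transition: A<->G or T<->C
            else:
                row.append(beta)                  # transversion
        m.append(row)
    return m


def k2p_discrete(alpha: float, beta: float, steps: int) -> list[float]:
    """Probabilities of A, T, C, G after `steps` steps for a site that started as A."""
    m = k2p_step_matrix(alpha, beta)
    p = [1.0, 0.0, 0.0, 0.0]
    for _ in range(steps):
        p = [sum(p[i] * m[i][j] for i in range(4)) for j in range(4)]
    return p


def k2p_paa(t: float, alpha: float, beta: float) -> float:
    """Continuous-time PAA(t) = 1/4 + 1/4 e^(-4bt) + 1/2 e^(-2(a+b)t)."""
    return 0.25 + 0.25 * math.exp(-4 * beta * t) + 0.5 * math.exp(-2 * (alpha + beta) * t)


alpha, beta = 1e-4, 2e-5  # hypothetical rates
for t in (0, 1000, 5000, 20000):
    print(t, round(k2p_discrete(alpha, beta, t)[0], 5), round(k2p_paa(t, alpha, beta), 5))
```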
55
In the Jukes and Cantor model, PAA(t) = PGG(t) =
PCC(t) = PTT(t). Because of the symmetry of the
substitution scheme, this equality also holds for
Kimura's two-parameter model.
56
3 probabilities
X(t) = the probability that a nucleotide at a
site at time t is identical to that at time 0:

$$X(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} + \frac{1}{2}e^{-2(\alpha+\beta)t}$$

At equilibrium (t → ∞), the equation reduces to
X(∞) = 1/4. Thus, as in the case of Jukes and
Cantor's model, the equilibrium frequencies of the
four nucleotides are 1/4.
57
3 probabilities
Y(t) = the probability that the initial
nucleotide and the nucleotide at time t differ
from each other by a transition. Because of the
symmetry of the substitution scheme, Y(t) =
PAG(t) = PGA(t) = PTC(t) = PCT(t):

$$Y(t) = \frac{1}{4} + \frac{1}{4}e^{-4\beta t} - \frac{1}{2}e^{-2(\alpha+\beta)t}$$
58
3 probabilities
Z(t) = the probability that the nucleotide at
time t and the initial nucleotide differ by a
specific type of transversion, which is given by

$$Z(t) = \frac{1}{4} - \frac{1}{4}e^{-4\beta t}$$
59
Each nucleotide is subject to two types of
transversion, but only one type of transition.
Therefore, the probability that the initial
nucleotide and the nucleotide at time t differ by
a transversion is twice the probability that they
differ by a specific type of transversion, i.e.,
2Z(t). Note that X(t) + Y(t) + 2Z(t) = 1.
60
Number of substitutions between two noncoding
(NOT protein coding) sequences
61
The differences between two sequences are
classified into transitions and transversions.
P = proportion of transitional differences
Q = proportion of transversional differences
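The next slides are not transcribed; we assume they present Kimura's standard two-parameter estimator, K = (1/2) ln[1/(1 − 2P − Q)] + (1/4) ln[1/(1 − 2Q)]. The sketch below classifies each differing site as a transition (A↔G, C↔T) or a transversion, computes P and Q, and applies that formula. The function name `kimura_2p_distance` and the example sequences are hypothetical, and the sketch deliberately rejects gaps and ambiguity codes rather than guessing how to treat them.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def kimura_2p_distance(seq1: str, seq2: str) -> tuple[float, float, float]:
    """Return (P, Q, K) for two aligned DNA sequences of equal length.

    P: proportion of transitional differences (A<->G, C<->T)
    Q: proportion of transversional differences (purine <-> pyrimidine)
    K: 1/2 ln[1/(1 - 2P - Q)] + 1/4 ln[1/(1 - 2Q)]
    """
    s1, s2 = seq1.upper(), seq2.upper()
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned, non-empty, and of equal length")
    if set(s1 + s2) - (PURINES | PYRIMIDINES):
        raise ValueError("this sketch handles only A, C, G, T (no gaps or ambiguity codes)")
    transitions = transversions = 0
    for a, b in zip(s1, s2):
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    n = len(s1)
    p, q = transitions / n, transversions / n
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent: K is undefined")
    # ln(1/x) written as -ln(x)
    k = -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
    return p, q, k


# hypothetical sequences: 2 transitions (A/G, G/A) and 1 transversion (A/T) over 10 sites
P, Q, K = kimura_2p_distance("ACGTACGTAC", "GCGTACATTC")
print(f"P = {P:.2f}, Q = {Q:.2f}, K = {K:.4f}")
```

When transitions and transversions occur at the same rate per specific change (α = β), the expected P and Q are p/3 and 2p/3, and the formula reduces to the Jukes and Cantor estimator.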
62
(No Transcript)
63
(No Transcript)
64
(No Transcript)
65
(No Transcript)
66
(No Transcript)
67
Numerical example (2P-model)
68
There are substitution schemes with more than two
parameters!
