Title: Number of substitutions between two protein-coding genes
Slide 1: Number of substitutions between two
protein-coding genes
Dan Graur
Slide 2: Computing the number of substitutions between two
protein-coding sequences is more complicated,
because a distinction should be made between
synonymous and nonsynonymous substitutions.
Slide 5: Aims
1. Compute two numerators: the numbers of synonymous and
nonsynonymous substitutions.
2. Compute two denominators: the numbers of synonymous and
nonsynonymous sites.
Slide 6: Difficulties with the denominator
1. The classification of a site changes with time. For example,
the third position of CGG (Arg) is synonymous. However, if the
first position changes to T, then the third position of the
resulting codon, TGG (Trp), becomes nonsynonymous.
Slide 7: Difficulties with the denominator
2. Many sites are neither completely synonymous nor completely
nonsynonymous. For example, a transition in the third position
of GAT (Asp) gives GAC (Asp) and is synonymous, while a
transversion to GAG (Glu) or GAA (Glu) alters the amino acid.
Slide 8: Difficulties with the numerator
1. The classification of a change depends on the order in which
the substitutions occurred.
Slide 9: Difficulties with the numerator
1. When two homologous codons differ from each other by two or
more substitutions, the order of the substitutions must be known
in order to classify them as synonymous or nonsynonymous.
Example: CCC in sequence 1 and CAA in sequence 2.
Pathway I: CCC (Pro) → CCA (Pro) → CAA (Gln): 1 synonymous and 1 nonsynonymous
Pathway II: CCC (Pro) → CAC (His) → CAA (Gln): 2 nonsynonymous
Slide 10: Difficulties with the numerator
2. Transitions occur at different frequencies than transversions.
3. The type of substitution depends on the type of mutation:
transitions result in synonymous substitutions more frequently
than transversions do.
Slide 11: The Miyata & Yasunaga (1980) and Nei & Gojobori (1986) method
Slide 12: 1. Classification of sites. Consider a particular
position in a codon. Let $i$ be the number of
possible synonymous changes at this site. Then
this site is counted as $i/3$ synonymous and
$(3 - i)/3$ nonsynonymous.
Slide 13: In TTT (Phe), the first two positions are
nonsynonymous, because no synonymous change can
occur in them, and the third position is 1/3
synonymous and 2/3 nonsynonymous because one of
the three possible changes is synonymous.
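The site classification above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the standard genetic code is assumed, and a change to a stop codon is simply counted as nonsynonymous (published implementations may treat stop codons differently).

```python
from fractions import Fraction

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_fraction(codon, pos):
    """Return i/3, where i is the number of synonymous changes at pos (0-based)."""
    i = 0
    for base in BASES:
        if base == codon[pos]:
            continue  # not a change
        mutant = codon[:pos] + base + codon[pos + 1:]
        # Assumption: a change to a stop codon counts as nonsynonymous.
        if CODON_TABLE[mutant] == CODON_TABLE[codon]:
            i += 1
    return Fraction(i, 3)


def count_sites(codon):
    """Return (synonymous sites, nonsynonymous sites) for one codon."""
    syn = sum(synonymous_fraction(codon, pos) for pos in range(3))
    return syn, 3 - syn


print(count_sites("TTT"))  # third position is 1/3 synonymous, 2/3 nonsynonymous
```

Summing `count_sites` over all codons of a sequence gives that sequence's total numbers of synonymous and nonsynonymous sites.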
Slide 14: 2. Count the number of synonymous and
nonsynonymous sites in each sequence and compute
the averages between the two sequences. The
average number of synonymous sites is $N_S$ and that
of nonsynonymous sites is $N_A$.
Slide 15: 3. Classify nucleotide differences into
synonymous and nonsynonymous differences.
Slide 16: - For two codons that differ by only one
nucleotide, the difference is easily inferred.
For example, the difference between the two
codons GTC (Val) and GTT (Val) is synonymous,
while the difference between the two codons GTC
(Val) and GCC (Ala) is nonsynonymous.
Slide 18: - For two codons that differ by two or more
nucleotides, the estimation problem is more
complicated, because we need to determine the
order in which the substitutions occurred.
Slide 19: Pathway (1) requires one synonymous and one
nonsynonymous change, whereas pathway (2)
requires two nonsynonymous changes.
Slide 20: There are two approaches to dealing with multiple
substitutions at a codon:
Slide 21: The unweighted method. Average the numbers of the
different types of substitutions over all the
possible scenarios. For example, if we assume
that the two pathways are equally likely, then
the number of nonsynonymous differences is
$(1 + 2)/2 = 1.5$, and the number of synonymous
differences is $(1 + 0)/2 = 0.5$.
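The unweighted averaging can be checked with a short Python sketch that enumerates every order in which the differing positions could have changed. The function name is hypothetical, the standard genetic code is assumed, and, as a simplification, pathways that pass through a stop codon are not excluded.

```python
from itertools import permutations

# Standard genetic code in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def unweighted_differences(codon1, codon2):
    """Average (synonymous, nonsynonymous) differences over all pathways."""
    diff_positions = [i for i in range(3) if codon1[i] != codon2[i]]
    if not diff_positions:
        return 0.0, 0.0
    syn_total = nonsyn_total = n_paths = 0
    # Each ordering of the differing positions is one pathway.
    for order in permutations(diff_positions):
        current = codon1
        for pos in order:
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn_total += 1
            else:
                nonsyn_total += 1
            current = nxt
        n_paths += 1
    # Simplification: every pathway gets equal weight, stop codons included.
    return syn_total / n_paths, nonsyn_total / n_paths


print(unweighted_differences("CCC", "CAA"))  # (0.5, 1.5)
```

For CCC and CAA the two pathways are CCC → CCA → CAA and CCC → CAC → CAA, which reproduces the averages of 0.5 synonymous and 1.5 nonsynonymous differences.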
Slide 22: The weighted method. Employ a priori criteria to
assign a probability to each pathway. For
instance, if the weight of pathway 1 is 0.9 and
the weight of pathway 2 is 0.1, then the number
of nonsynonymous differences between the two
codons is $(0.9 \times 1) + (0.1 \times 2) = 1.1$, and the
number of synonymous differences is $(0.9 \times 1) + (0.1 \times 0) = 0.9$.
Slide 24: 4. The numbers of synonymous and nonsynonymous
differences between the two protein-coding
sequences are $M_S$ and $M_A$, respectively.
Slide 25: The number of synonymous differences per
synonymous site is $p_S = M_S/N_S$.
The number of nonsynonymous differences per nonsynonymous site
is $p_A = M_A/N_A$.
Slide 26: If we take into account the effect of multiple
hits at the same site, we can make corrections by
using Jukes and Cantor's formula.
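The formula itself did not survive in this transcript. The standard Jukes and Cantor correction converts an observed proportion of differences p into an estimated number of substitutions per site, d = -(3/4) ln(1 - (4/3) p). The sketch below applies it to a proportion such as the synonymous or nonsynonymous proportion of differences; the function name is hypothetical, and d is a label chosen here rather than a symbol taken from the slides.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p.

    Implements d = -(3/4) * ln(1 - (4/3) * p), which is defined
    only for 0 <= p < 3/4.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical proportion, chosen only to illustrate the call.
print(jukes_cantor(0.1))
```

The corrected value is always at least as large as p, and the gap widens as p grows, because more of the substitutions are hidden by multiple hits.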
Slide 31: Number of Amino-Acid Replacements between Two
Proteins
- The observed proportion of different amino acids
between the two sequences is $p = n/L$, where
- $n$ = number of amino acid differences between the
two sequences
- $L$ = length of the aligned sequences.
Slide 33: Number of Amino-Acid Replacements between Two
Proteins
The Poisson model is used to convert $p$ into the
number of amino-acid replacements per site between the two
sequences ($d$): $d = -\ln(1 - p)$.
The variance of $d$ is estimated as $V(d) = p / [L(1 - p)]$.
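As a small worked sketch of the Poisson correction, the function below computes p, d and V(d) from the number of amino-acid differences n and the alignment length L. The function name and the example counts are hypothetical.

```python
import math


def poisson_distance(n, L):
    """Return (p, d, V(d)) under the Poisson model.

    p = n / L, d = -ln(1 - p), V(d) = p / (L * (1 - p)).
    """
    if L <= 0 or not 0 <= n < L:
        raise ValueError("need L > 0 and 0 <= n < L")
    p = n / L
    d = -math.log(1 - p)
    var_d = p / (L * (1 - p))
    return p, d, var_d


# Hypothetical counts: 10 differences in an alignment of 100 residues.
p, d, var_d = poisson_distance(10, 100)
print(p, d, var_d)
```

The case n = L is rejected because the logarithm is undefined when p = 1.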