Title: Mutation, drift, and selection
1Mutation, drift, and selection
3Kimura 2-parameter model, table form
transitions
transversions
4Discrete time model for change
5Continuous time analytical solution
Differential equation (plus three others, one for
each nt)
Integrating
6Neutral evolution toward equilibrium (single
nucleotide)
all one nt
none nt
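The equations on the preceding slides did not survive in this transcript. As a rough illustration only, the sketch below iterates a discrete-time Kimura 2-parameter model and checks it against a closed form. The function names, the parameter values (alpha for the transition, beta for each transversion) and the nucleotide ordering are assumptions made for this example, not taken from the slides. The continuous-time analogue of the closed form is the standard result 1/4 + (1/4)exp(-4*beta*t) + (1/2)exp(-2*(alpha+beta)*t).

```python
# Nucleotide order: A, G, C, T. A<->G and C<->T are transitions.
NUCS = "AGCT"
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_step_matrix(alpha, beta):
    """One-generation change probabilities under the Kimura 2-parameter model.

    alpha: per-generation probability of the transition
    beta:  per-generation probability of each of the two transversions
    """
    matrix = []
    for x in NUCS:
        row = []
        for y in NUCS:
            if x == y:
                row.append(1.0 - alpha - 2.0 * beta)
            elif (x, y) in TRANSITIONS:
                row.append(alpha)
            else:
                row.append(beta)
        matrix.append(row)
    return matrix


def evolve(freqs, matrix, generations):
    """Apply the discrete-time model repeatedly to a vector of nt frequencies."""
    for _ in range(generations):
        freqs = [sum(freqs[i] * matrix[i][j] for i in range(4)) for j in range(4)]
    return freqs


def prob_same_closed_form(alpha, beta, generations):
    """Closed-form probability that a site holds its starting nucleotide."""
    return (0.25
            + 0.25 * (1.0 - 4.0 * beta) ** generations
            + 0.5 * (1.0 - 2.0 * (alpha + beta)) ** generations)


step = k2p_step_matrix(alpha=0.002, beta=0.001)
all_a = evolve([1.0, 0.0, 0.0, 0.0], step, 5000)        # start: all one nt (A)
no_a = evolve([0.0, 1 / 3, 1 / 3, 1 / 3], step, 5000)   # start: none of that nt
print(all_a[0], no_a[0])  # both approach the equilibrium frequency of 1/4
```

Whether the site starts as all A or with no A at all, the frequency of A drifts toward 1/4, which is the behavior the "all one nt" and "none nt" labels refer to.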
7Kimura model
HKY model (Hasegawa-Kishino-Yano)
The two basic rates (transition and transversion) are each multiplied by the
frequency of the nucleotide to which the change leads, which
permits any set of equilibrium nucleotide frequencies.
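A minimal sketch of how such a rate matrix could be assembled, assuming illustrative equilibrium frequencies and rates (the values of pi, transition_rate and transversion_rate below are made up for the example). It also checks that the chosen frequencies are indeed an equilibrium, i.e. that the net flow into each nucleotide is zero.

```python
NUCS = "AGCT"
PURINES = {"A", "G"}


def is_transition(x, y):
    """True for A<->G or C<->T (purine<->purine or pyrimidine<->pyrimidine)."""
    return x != y and ((x in PURINES) == (y in PURINES))


def hky_rate_matrix(pi, transition_rate, transversion_rate):
    """Rate from x to y = basic rate * equilibrium frequency of y."""
    q = {}
    for x in NUCS:
        q[x] = {}
        for y in NUCS:
            if x != y:
                rate = transition_rate if is_transition(x, y) else transversion_rate
                q[x][y] = rate * pi[y]
        # The diagonal makes each row of the rate matrix sum to zero.
        q[x][x] = -sum(q[x].values())
    return q


# Hypothetical unequal equilibrium frequencies (must sum to 1).
pi = {"A": 0.4, "G": 0.1, "C": 0.2, "T": 0.3}
q = hky_rate_matrix(pi, transition_rate=2.0, transversion_rate=1.0)

# Net flow into each nucleotide when the population sits at pi.
net_flow = {y: sum(pi[x] * q[x][y] for x in NUCS) for y in NUCS}
print(net_flow)
```

Setting all four frequencies to 0.25 recovers the Kimura model up to a constant scaling of the two rates.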
8Sequence divergence
Start with two identical nucleotide sequences
(after gene duplication or speciation). Allow
the sequences to diverge over time according to
the nucleotide substitution model (no
selection). At early time points, nearly every mutation
makes the sequences more
different. At later time points, some mutations
cause convergence, so on average divergence
continues, but more slowly.
9Neutral sequence divergence
(with equal equilibrium nucleotide frequencies)
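The curve for this slide is not in the transcript. As a stand-in, the sketch below uses the simplest substitution model with equal equilibrium frequencies, Jukes-Cantor (one rate for every change), which may not be the model the original plot used. Here d is the expected number of substitutions per site separating the two sequences; the function name is an assumption for this example.

```python
import math


def expected_difference(d):
    """Expected fraction of differing sites after d substitutions per site
    (Jukes-Cantor: equal rates, equal equilibrium frequencies)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


for d in (0.01, 0.1, 0.5, 1.0, 3.0):
    print(f"d = {d:4.2f}  expected difference = {expected_difference(d):.3f}")
```

For small d the observed difference is almost equal to d, because nearly every substitution creates a new mismatch. As d grows, the difference saturates at 3/4, the value expected for two unrelated random sequences.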
10Deletion lengths among structurally aligned
proteins
11Sequence divergence within a population
Each individual in an interbreeding population
has different copies of each gene. Mutations
ensure that these copies continually diverge
from each other; this is mutation pressure. In a finite
population, some copies are lost
stochastically at each generation. Without
selection, this results in stochastic changes in
the number of copies of each type of gene;
this is genetic drift. Mutation, drift, and selection
interact to shape genetic diversity.
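To make the stochastic loss of copies concrete, here is a small Wright-Fisher style simulation with two gene types and no selection. It is only a sketch: the function name, the population size, the symmetric mutation scheme and the fixed random seed are all assumptions chosen for illustration.

```python
import random


def wright_fisher(copies, start_count, generations, mutation_rate=0.0, seed=0):
    """Track the count of one gene type among `copies` gene copies."""
    rng = random.Random(seed)
    count = start_count
    history = [count]
    for _ in range(generations):
        p = count / copies
        # Mutation pressure: symmetric mutation between the two types.
        p = p * (1.0 - mutation_rate) + (1.0 - p) * mutation_rate
        # Drift: each copy in the next generation is drawn at random.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        history.append(count)
    return history


# 40 gene copies, half of each type, no mutation: drift alone.
history = wright_fisher(copies=40, start_count=20, generations=2000)
print(history[:10], "...", history[-1])
```

With no mutation, drift eventually removes one of the two types entirely. Passing a nonzero mutation_rate keeps reintroducing the lost type, so the count continues to fluctuate.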
12In large genomes most mutations are effectively
neutral
Motoo Kimura was the founder of neutralism. For
completely neutral changes, genetic drift
determines population patterns. Other changes are
nearly neutral, meaning that the strength of
natural selection is so small that it is
dominated by drift.
13Negative (purifying) selection
Among nucleotide changes that are NOT neutral,
most cause reduced fitness. These changes
tend to be removed from a population over time by
selection. The long-term result of this process
is that some nucleotides (and amino acids, if in a
coding region) remain unchanged.
14Neutrality and negative selection in coding
sequence
Redundancy in the genetic code means that some
nucleotide changes do not affect the protein
sequence. These changes are often assumed to be
neutral (in fact, they are subject to weak
selection, but this can be ignored for many
purposes). Selection against some amino acid
changes is weak; for others, selection is strong.
15Synonymous and non-synonymous changes
codon wheel
16A dubious inference about the genetic code.
17Two types of changes in codons
Synonymous = silent change (amino acid stays the same)
Nonsynonymous = replacement change (changes amino acid)
Val GTC -> Val GTG   Synonymous Change
Val GTC -> Ala GCC   Nonsynonymous Change
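The distinction can be checked mechanically with a codon table. The sketch below builds the standard genetic code from its usual compact string form and classifies the two example changes; the function name classify_change is an assumption for this example.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_change(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid changes."""
    if codon_before == codon_after:
        return "no change"
    if CODON_TABLE[codon_before] == CODON_TABLE[codon_after]:
        return "synonymous"
    return "nonsynonymous"


print(classify_change("GTC", "GTG"))  # Val -> Val
print(classify_change("GTC", "GCC"))  # Val -> Ala
```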
18Example codons from five human KRAB-Znf genes
DNA
protein
19dN/dS analysis in a nutshell
- Synonymous sites are considered selectively neutral.
- Use synonymous sites as a ruler for nonsynonymous substitutions.
dN = number of nonsynonymous changes (normalized to the number of nonsynonymous sites)
dS = number of synonymous changes (normalized to the number of synonymous sites)
dN/dS = 1: neutral evolution (anything goes)
dN/dS < 1: negative (purifying) selection
dN/dS > 1: positive selection
Note: dN/dS is also called KA/KS analysis in the literature.
20Multiple methods for calculating dN/dS
- Counting methods
  - Nei and Gojobori
  - Li et al.
- Maximum-likelihood methods (with model of codon evolution)
  - Muse and Gaut
  - Nielsen and Yang
21Seven codons from a real alignment
possible positive selection (four amino acids)
- Align codons from related set of genes.
- Determine dN/dS for each aligned codon.
negative selection
22Codon degeneracy and counting N and S sites
- For each nucleotide, count what changes are possible
- Non-degenerate (labeled 1)
  - No synonymous changes
- Two-fold degenerate (labeled 2)
  - 1/3 synonymous and 2/3 nonsynonymous
- Four-fold degenerate (labeled 4)
  - 1 synonymous
Note: the single three-fold degenerate site is treated as
two-fold; why, I'm not sure.
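The labeling rule can be written as a short function. This is a sketch under the conventions stated on the slide (three-fold treated as two-fold); the function name degeneracy and the compact encoding of the standard genetic code are assumptions for this example.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of synonymous alternatives at a site -> degeneracy label.
# Two synonymous alternatives (three-fold) is treated as two-fold.
LABELS = {0: "1", 1: "2", 2: "2", 3: "4"}


def degeneracy(codon):
    """Return the degeneracy label of each of the three codon positions."""
    labels = ""
    for pos in range(3):
        synonymous = sum(
            1
            for base in BASES
            if base != codon[pos]
            and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == CODON_TABLE[codon]
        )
        labels += LABELS[synonymous]
    return labels


for codon in ("GAC", "ACA", "GCG", "GTT"):
    print(codon, CODON_TABLE[codon], degeneracy(codon))
```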
23Counting sites
Example
Degeneracy 1
               Asp  Thr  Ala  Val
Sequence 1     GAC  ACA  GCG  GTT
How many synonymous sites in this sequence?
24Counting sites
Example
Degeneracy 1   112  114  114  114
               Asp  Thr  Ala  Val
Sequence 1     GAC  ACA  GCG  GTT
3 four-fold degenerate sites and 1 two-fold
degenerate site: 3 + 1/3 = 3.33 synonymous sites. All other sites are
nonsynonymous.
25Counting sites
Example
Degeneracy 1   112  114  114  114
               Asp  Thr  Ala  Val
Sequence 1     GAC  ACA  GCG  GTT
How many nonsynonymous sites in this sequence?
26Counting sites
Example
Degeneracy 1   112  114  114  114
               Asp  Thr  Ala  Val
Sequence 1     GAC  ACA  GCG  GTT
8 non-degenerate sites and 1 two-fold degenerate
site: 8 + 2/3 ≈ 8.66 nonsynonymous sites. Note that N + S =
the number of nucleotides (12).
27Counting sites
Example
Degeneracy 1   112  114  114  114
               Asp  Thr  Ala  Val
Sequence 1     GAC  ACA  GCG  GTT
Sequence 2     GCC  ACT  TCG  GTT
               Ala  Thr  Ser  Val
Degeneracy 2   114  114  114  114
Sequence 2 has 8 nonsynonymous sites and 4
synonymous sites. For a pairwise comparison, we
average the numbers from both sequences.
Nonsynonymous sites: (8.66 + 8)/2 = 8.33
Synonymous sites: (3.33 + 4)/2 = 3.67
28Counting changes
Degeneracy 1   112  114  114  114
               Asp  Thr  Ala  Val
Sequence 1     GAC  ACA  GCG  GTT
Sequence 2     GCC  ACT  TCG  GTT
               Ala  Thr  Ser  Val
Degeneracy 2   114  114  114  114
There are 2 nonsynonymous changes: dN = 2/8.33 = 0.24
There is 1 silent change: dS = 1/3.67 = 0.27
dN/dS = 0.24/0.27 = 0.88 < 1, despite
having more nonsynonymous changes.
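The whole worked example (counting sites in both sequences, averaging, counting changes, and taking the ratio) can be reproduced in a few lines. This is a sketch of the simple counting procedure shown on these slides, not of any published method in full: it applies no correction for multiple hits and assumes aligned codons differ at no more than one position. The function names are assumptions for this example.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Synonymous alternatives at a site -> fraction of the site counted as synonymous.
# Non-degenerate: 0, two-fold (and three-fold treated as two-fold): 1/3, four-fold: 1.
SYN_FRACTION = {0: 0.0, 1: 1 / 3, 2: 1 / 3, 3: 1.0}


def synonymous_sites(codons):
    """Sum the synonymous fraction of every nucleotide site."""
    total = 0.0
    for codon in codons:
        for pos in range(3):
            alternatives = sum(
                1
                for base in BASES
                if base != codon[pos]
                and CODON_TABLE[codon[:pos] + base + codon[pos + 1:]] == CODON_TABLE[codon]
            )
            total += SYN_FRACTION[alternatives]
    return total


def count_changes(codons1, codons2):
    """Count (synonymous, nonsynonymous) changes between aligned codons."""
    syn = nonsyn = 0
    for c1, c2 in zip(codons1, codons2):
        differences = sum(a != b for a, b in zip(c1, c2))
        if differences == 0:
            continue
        if differences > 1:
            raise ValueError("codons with several changes need a pathway model")
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


seq1 = ["GAC", "ACA", "GCG", "GTT"]
seq2 = ["GCC", "ACT", "TCG", "GTT"]

s_sites = (synonymous_sites(seq1) + synonymous_sites(seq2)) / 2
n_sites = 3 * len(seq1) - s_sites          # N + S = number of nucleotides
syn, nonsyn = count_changes(seq1, seq2)
dn = nonsyn / n_sites
ds = syn / s_sites
print(f"S = {s_sites:.2f}, N = {n_sites:.2f}, dN = {dn:.2f}, dS = {ds:.2f}, dN/dS = {dn / ds:.2f}")
```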
29Other factors affect calculation of dN/dS
- Transition/transversion ratio
- Transitions typically more frequent
- Pathway of substitution
- Codon bias
30Example of complications that arise (multiple
changes)
Nearly all counting methods assume all pathways
are equally likely.
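When two codons differ at more than one position, the changes could have happened in different orders, and the order affects how many steps are synonymous. The sketch below enumerates every shortest pathway and weights them equally, as the slide describes. Skipping pathways that pass through a stop codon is an assumption borrowed from common practice, and the function name and the example codon pair are likewise chosen only for illustration.

```python
from itertools import permutations

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def pathway_counts(codon1, codon2):
    """Average (synonymous, nonsynonymous) steps over equally weighted pathways."""
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    per_path = []
    for order in permutations(positions):
        current, syn, nonsyn, valid = codon1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODON_TABLE[nxt] == "*":
                valid = False  # assumption: ignore pathways through stop codons
                break
            if CODON_TABLE[nxt] == CODON_TABLE[current]:
                syn += 1
            else:
                nonsyn += 1
            current = nxt
        if valid:
            per_path.append((syn, nonsyn))
    if not per_path:
        raise ValueError("every pathway passes through a stop codon")
    n = len(per_path)
    return sum(s for s, _ in per_path) / n, sum(ns for _, ns in per_path) / n


# TTT (Phe) -> GTA (Val) differs at positions 1 and 3, giving two pathways:
#   TTT -> GTT -> GTA : Phe -> Val -> Val (1 nonsynonymous, 1 synonymous)
#   TTT -> TTA -> GTA : Phe -> Leu -> Val (2 nonsynonymous)
print(pathway_counts("TTT", "GTA"))
```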
31Codon Bias
- Prominent in some genomes, probably those with
large populations or strong selection (bacteria,
fungi, nematodes).
- Also seen in mammals, although there probably due
more to mutation bias than to selection.
- Unequal codon usage results in a reduced number of
effective codon sites.
- Ignoring codon bias leads to an underestimate of dS.
32(Figure labels: transition; all others are transversions; size: codon frequency)
33Maximum-likelihood estimates of dN and dS
35Take-home message and lead in
For strongly conserved genes (strong negative
selection), dN/dS averaged over all codons is
typically 0.05 to 0.2. Note that this is
independent of the degree of divergence: the
values of dN and dS increase, but the ratio
remains constant. Recall from protein
alignments that the degree of conservation varies
greatly from site to site (mostly from variation
in the strength of negative selection). Negative
selection results in evolutionary
stasis. Genetic drift results in evolutionary...
well, drift.