Title: Dan Graur
1Rates of Nucleotide Substitution
2 r = Rate of substitution per site per year
K = Number of substitutions per site per year
3Mean Rate of Nucleotide Substitution in Mammalian
Nuclear Genomes
Less than 10^-9 substitutions/site/year
Evolution is a very slow process at the molecular
level. Not much happens in evolution.
4Substitution Rates in Protein-Coding Regions
The rate of synonymous substitution is much
larger than the nonsynonymous rate.
6A lot
A little
7Synonymous substitutions are more frequent than
nonsynonymous ones.
8Mean nonsynonymous rate: 0.75 × 10^-9
substitutions per site per year
Mean synonymous rate: 3.65 × 10^-9 substitutions
per site per year
The synonymous substitution rate is 5 times
higher than the nonsynonymous substitution rate.
Coefficient of variation of nonsynonymous rate:
95%
Coefficient of variation of synonymous rate: 31%
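The "5 times higher" figure can be checked with a short calculation. This is a minimal sketch that reads the two mean rates on the slide as 0.75 × 10^-9 and 3.65 × 10^-9 substitutions per site per year; the variable names are illustrative only.

```python
# Mean rates as read from the slide (substitutions per site per year).
nonsynonymous_rate = 0.75e-9
synonymous_rate = 3.65e-9

# How many times faster synonymous substitutions accumulate on average.
ratio = synonymous_rate / nonsynonymous_rate
print(f"synonymous / nonsynonymous = {ratio:.2f}")
```

The ratio is slightly below 5, so the slide's "5 times" is a rounded value.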
9The distribution of KA to KS ratios in >13,000
orthologous protein-coding genes from human and
chimpanzee
10 58 nucleotide differences; 3 amino acid differences
In a comparison of human and yeast ubiquitin
genes, the inferred number of synonymous
substitutions per synonymous site is 6 (almost
certainly indicative of saturation). The inferred
number of nonsynonymous substitutions per
nonsynonymous site is 0.03. Thus, synonymous
substitutions have accumulated at least 200 times
faster than nonsynonymous substitutions.
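The "at least 200 times" statement follows from dividing the two inferred values. The sketch below uses only the numbers given on the slide; the result is a lower bound because a saturated synonymous estimate undercounts the true number of substitutions. Variable names are illustrative.

```python
# Inferred substitutions per site, human vs. yeast ubiquitin (from the slide).
ks_inferred = 6.0    # synonymous; saturated, so the true value is likely higher
ka_inferred = 0.03   # nonsynonymous

# Minimum fold difference between synonymous and nonsynonymous accumulation.
min_fold = ks_inferred / ka_inferred
print(f"synonymous substitutions accumulated at least {min_fold:.0f} times faster")
```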
11Ratio 1.5 4.4 1.1
12Substitution Rates in Noncoding Regions
14Divergence between cow and goat β- and γ-globin
genes and between cow and goat β-globin
pseudogenes
______________________________________________
Region                        K
______________________________________________
5' flanking region            5.3 ± 1.2
5' untranslated region        4.0 ± 2.0
4-fold degenerate sites       8.6 ± 2.5
Introns                       8.1 ± 0.7
3' untranslated region        8.8 ± 2.2
3' flanking region            8.0 ± 1.5
Pseudogenes                   9.1 ± 0.9
______________________________________________
16Coding regions evolve slower than noncoding
regions.
17Evolutionary Rate Profiles
18Alignment: preproinsulin
Xenopus  MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos      MALWTRLRPLLALLALWPPPPARAFVNQHL

Xenopus  CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos      CGSHLVEALYLVCGERGFFYTPKARREVEG

Xenopus  AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos      PQVG---ALELAGGPGAGGLEGPPQKRGIV

Xenopus  EQCCHSTCSLFQLENYCN
Bos      EQCCASVCSLYQLENYCN
20Functional regions evolve slower than
nonfunctional regions.
23Rates of amino acid replacement in different
proteins
24Fibrinogen to Fibrin
- Fibrinogen consists of 6 chains: 2α, 2β, 2γ
- Fibrinopeptides are very negatively charged
- Fibrinopeptides A are cleaved first (to allow
polymerization of fibrins)
- Fibrinopeptides B are cleaved second (to enhance
crosslinking)
26Important proteins evolve slower than unimportant
ones.
29- Can we explain the different rates of
substitution by the selectionist model?
- Mutations can be either deleterious or
advantageous.
- If the fraction of advantageous mutations is
large, the rate of evolution will be high. If the
fraction of advantageous mutations is small, the
rate of evolution will be low.
- A mutation occurring at a functional site has a
higher probability of being advantageous than a
mutation occurring at a nonfunctional site.
- Expectation: Important entities should evolve
faster than less important ones.
31- Can we explain the different rates of
substitution by the neutralist model?
- Mutations can be either deleterious or neutral.
- If the fraction of deleterious mutations is
large, the rate of evolution will be low. If the
fraction of deleterious mutations is small, the
rate of evolution will be high.
- A mutation occurring at a functional site has a
higher probability of being deleterious than a
mutation occurring at a nonfunctional site.
- Expectation: Important entities should evolve
slower than less important ones.
33Kimura's First Law of Molecular Evolution
34Functional entities evolve slower than entities
devoid of function.
35Functional constraint: Degree of intolerance
towards mutations at a genomic location.
The functional constraint defines the range of
alternative residues that are acceptable at a
site without negatively affecting the fitness of
the organism.
36For neutral mutations:
K = v
where K is the rate of substitution and v is the
mutation rate.
37Kimura's model of functional constraint: Suppose
that a fraction, f0, of all mutations are
selectively neutral and the rest (1 - f0) are
deleterious. Advantageous mutations are assumed
to occur only very rarely, such that their
relative frequency is effectively zero. If we
denote by vT the total mutation rate per unit
time, then the rate of neutral mutation, v0, is
v0 = vT × f0.
38According to the neutral theory, the rate of
substitution is K = v0. Hence, K = vT × f0. The
highest substitution rate is expected in sequences
that do not have any function, such that all
mutations are neutral (f0 = 1).
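The relationship between functional constraint and substitution rate can be illustrated numerically. The sketch below assumes the neutral-theory relation K = vT × f0 described on these slides; the function name and the total mutation rate used here are hypothetical, chosen only for illustration.

```python
def substitution_rate(total_mutation_rate, neutral_fraction):
    """Rate of substitution K = vT * f0 when only neutral mutations are fixed."""
    if not 0.0 <= neutral_fraction <= 1.0:
        raise ValueError("neutral_fraction (f0) must lie between 0 and 1")
    return total_mutation_rate * neutral_fraction

# Hypothetical total mutation rate per site per year (an assumption, not
# a value taken from the slides).
v_T = 1e-8

# f0 = 1.0 corresponds to a sequence with no function at all.
for f0 in (0.0, 0.1, 0.5, 1.0):
    print(f"f0 = {f0:.1f} -> K = {substitution_rate(v_T, f0):.2e}")
```

As f0 approaches 1, K approaches the total mutation rate, which is the upper limit on the substitution rate under this model.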
40An evolutionary experiment
Spalax ehrenbergi
41αA-crystallin
42In Spalax, αA-crystallin lost its functional role
more than 25 million years ago, when the mole rat
became subterranean and presumably lost use of
its eyes.
The αA-crystallin of Spalax evolves 20 times
faster than the αA-crystallins in other rodents,
such as rats, mice, hamsters, gerbils, and
squirrels.
43Additional Facts: (1) The αA-crystallin of
Spalax possesses all the prerequisites for normal
function and expression, including the proper
signals for alternative splicing. (2) The
αA-crystallin of Spalax evolves slower than
pseudogenes.
44Explanation 1: The αA-crystallin gene may not
have lost all of its vision-related functions,
such as photoperiod perception and adaptation to
seasonal changes.
Contradicting evidence: The atrophied eye of
Spalax does not respond to light.
45Explanation 2: The blind mole rat lost its
vision more recently than 25 million years ago.
The rate of nonsynonymous substitution after
nonfunctionalization has been underestimated.
Contradicting evidence: The αA-crystallin gene
is still an intact gene as far as the essential
molecular structures for its expression are
concerned.
46- Explanation 3:
- The αA-crystallin gene product serves another
function (unrelated to that of the eye).
αA-crystallin is a multifunctional protein.
- Supporting evidence:
- αA-crystallin has been found in other tissues.
- αA-crystallin also functions as a chaperone that
binds denaturing proteins and prevents their
aggregation.
- The regions within αA-crystallin responsible for
chaperone activity are conserved in the mole
rat.
- The protein has viable secondary and quaternary
structures as well as normal thermostability.
47Genetic nonfunctionalization or partial
nonfunctionalization accelerates evolution.
Most evolutionary action occurs after death.
48The Concept of Functional Constraint
The intensity of purifying selection is
determined by the degree of intolerance
characteristic of a site or a genomic region
towards mutations. The functional or selective
constraint defines the range of alternative
nucleotides that is acceptable at a site without
negatively affecting the function or structure of
the gene or the gene product. DNA regions in
which a mutation is likely to affect function
have a more stringent functional constraint than
regions devoid of function.
49The stronger the functional constraints on a
macromolecule are, the slower its rate of
substitution will be.
50Functional density (Zuckerkandl 1976): The
functional density, F, of a gene is defined as
ns/N, where ns is the number of sites committed
to specific functions and N is the total number
of sites. F, therefore, is the proportion of
amino acids that are subject to stringent
functional constraints.
51Functional density (Zuckerkandl 1976): The
higher the functional density, the lower the rate
of substitution is expected to be. Thus, a
protein in which the active sites constitute only
1% of its sequence will be less constrained, and
therefore will evolve more quickly, than a protein
that devotes 50% of its sequence to performing
specific biochemical or physiological tasks.
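The definition F = ns/N can be made concrete with a small calculation. This sketch uses two hypothetical proteins of 300 sites each, mirroring the 1% and 50% cases mentioned on the slide; the function name and site counts are assumptions for illustration.

```python
def functional_density(committed_sites, total_sites):
    """Functional density F = ns / N: fraction of sites committed to function."""
    if total_sites <= 0 or not 0 <= committed_sites <= total_sites:
        raise ValueError("need 0 <= committed_sites <= total_sites and total_sites > 0")
    return committed_sites / total_sites

# Hypothetical proteins of equal length with different numbers of committed sites.
low_density = functional_density(3, 300)     # active sites make up 1% of the sequence
high_density = functional_density(150, 300)  # 50% of the sequence is committed

print(f"F (low)  = {low_density:.2f}")
print(f"F (high) = {high_density:.2f}")
```

Under the slide's argument, the protein with the lower F is expected to have the higher rate of substitution.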
52According to the neutral theory of evolution, the
rate of substitution (as inferred from
between-species comparisons) should positively
correlate with the degree of genetic polymorphism
(as inferred from comparisons among individuals
within one species). An interesting corollary
of this hypothesis is that we should observe very
little or no variation at the population level at
evolutionarily conserved positions. The variation
observed at conserved positions should be mostly
deleterious (i.e., associated with disease).
53Substitution rates and disease: The case of
Gaucher disease
Gaucher disease is an autosomal recessive
lysosomal storage disorder due to deficient
activity of an enzyme called acid β-glucosidase.
There are many subtypes of Gaucher disease, with
fitness effects ranging from a slight reduction in
fitness to perinatal lethality, in which death
occurs during the period between 154 days of
gestation and seven days after birth.
54β-glucosidase
We aligned the amino acid sequences of acid
β-glucosidase from nine placental mammals (human,
chimpanzee, Sumatran orangutan, bovine, pig, dog,
horse, rat, and mouse). The length of the
alignment (excluding one gap due to a codon
deletion in the ancestor of mouse and rat) was
496 amino acids, of which 387 (78%) were
identical in all nine species and 109 (22%) were
variable.
55Thirty-six single amino-acid replacements (at 34
amino-acid positions) resulting in Gaucher
disease are described in the literature.
Perinatal lethal mutations are shown in red.
56All 36 deleterious mutations occur at completely
conserved sites (below asterisks). The
expectation under a random model is that only
36 × 0.78 ≈ 28 mutations should occur at completely
conserved sites. This statistically significant
non-random association between disease and
evolutionary conservation (p = 0.0002) indicates
that invariable sites are conserved because they
evolve under extremely stringent functional
constraints and cannot tolerate change.
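The random-model expectation can be reproduced from the alignment counts given earlier (387 conserved sites out of 496). The sketch below also computes the probability that all 36 mutations land on conserved sites if each were placed independently and uniformly. That independence model is an assumption made here for illustration; the slides do not say which statistical test produced the reported p-value.

```python
total_sites = 496        # alignment length (from the slides)
conserved_sites = 387    # sites identical in all nine species (from the slides)
disease_mutations = 36   # single amino-acid replacements causing Gaucher disease

# Probability that a randomly placed mutation hits a conserved site.
p_conserved = conserved_sites / total_sites

# Expected number of the 36 mutations at conserved sites under random placement.
expected_at_conserved = disease_mutations * p_conserved

# Probability that all 36 fall on conserved sites, assuming independent,
# uniform placement (a simplification: two positions carry two mutations each).
prob_all_conserved = p_conserved ** disease_mutations

print(f"expected at conserved sites: {expected_at_conserved:.1f}")
print(f"P(all 36 at conserved sites): {prob_all_conserved:.1e}")
```

The probability under this simple model is of the same order of magnitude as the p-value quoted on the slide.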
57Q: What determines functional constraint?
A: Many factors.
Q: Example?
A: Interactions.
58A network (or graph) is an abstract
representation of a set of objects, where some
objects are connected to one another. The objects
are represented by vertices (or nodes), and the
links that connect the vertices are called edges
(or branches).
59Edges can be polarized to indicate directionality
and type of interaction (e.g., activation,
inhibition). Edges can also be quantified to
denote the extent of the effect.
60Protein-protein interaction networks
(a) A simple example of a protein-protein
interaction network consisting of five proteins
(A-E), represented by the nodes, each of which
interacts with at least one other protein. There
are five interactions, denoted by the links.
In biological networks, three variables are
usually studied:
(b) Degree centrality or connectedness: the number
of interactions for a protein.
(c) Betweenness centrality: the number of times
that a node appears on the shortest path between
all pairs of nodes.
(d) Closeness centrality: the mean number of links
connecting a protein to all other proteins in the
network.
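The three measures can be computed directly from an edge list. The sketch below uses a hypothetical five-protein, five-interaction network, since the slide's figure is not reproduced in the text and its actual links are unknown. It follows the slide's definitions literally: betweenness counts how often a node lies on a shortest path between two other nodes (without the fractional weighting used when several shortest paths tie), and closeness is the mean number of links to all other proteins.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: five proteins, five interactions (an assumption).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

# Build an undirected adjacency map.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def shortest_path_lengths(source):
    """Breadth-first search: number of links from source to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

dist = {node: shortest_path_lengths(node) for node in neighbors}

# (b) Degree centrality: number of interactions per protein.
degree = {node: len(neighbors[node]) for node in neighbors}

# (c) Betweenness: v lies on a shortest s-t path when d(s,v) + d(v,t) = d(s,t).
betweenness = {node: 0 for node in neighbors}
for s, t in combinations(sorted(neighbors), 2):
    for v in neighbors:
        if v not in (s, t) and dist[s][v] + dist[v][t] == dist[s][t]:
            betweenness[v] += 1

# (d) Closeness as defined on the slide: mean number of links to all others.
closeness = {
    node: sum(d for other, d in dist[node].items() if other != node)
    / (len(neighbors) - 1)
    for node in neighbors
}

for node in sorted(neighbors):
    print(node, degree[node], betweenness[node], closeness[node])
```

In this toy network, protein C has the most interactions and lies on the most shortest paths, while protein E sits at the periphery with the largest mean distance to the others.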
61 Proteins with high connectedness evolve
slowly. Proteins with low connectedness evolve
fast. Proteins with high betweenness evolve
slowly. Proteins with low betweenness evolve
fast. Proteins with high closeness evolve
slowly. Proteins with low closeness evolve fast.
62Why do the rates of synonymous substitution vary
from gene to gene?
(1) The variation represents stochastic
fluctuations.
(2) The variation is due to deterministic factors
on top of stochastic fluctuations.
(2.1) Variation in the rate of mutation among
different regions of the genome.
(2.2) Selection operating on synonymous mutations.
63- Fact: There is a positive correlation between
synonymous and nonsynonymous substitution rates
in a gene.
- Explanations:
(1) The rate of mutation varies along the genome
and among genes (and hence some genes will have
both high synonymous and nonsynonymous rates of
substitution).
(2) The extent of selection at synonymous sites is
affected by the nucleotide composition at
adjacent nonsynonymous positions.
(3) Both (1) and (2).
64In the absence of positive Darwinian selection,
the universal observation is that important
sequences tend to evolve slower than less
important ones. The opposite, however, is not
always true. That is, conserved regions in the
genome may not always be important. Defining
importance is not a trivial undertaking.
65Hurst and Smith (1999) tested the relationship
between rate of substitution and dispensability
(a proxy for importance). Approximately two
thirds of all knockouts of individual mouse genes
give rise to viable fertile mice. These genes
have been termed non-essential, in contrast to
essential genes, the knockouts of which result
in death or infertility. It is predicted that
non-essential genes will be subject to lesser
intensities of purifying selection, and should
therefore evolve faster than essential genes.
66In a comparison of 74 non-essential genes with 64
essential ones, the rate of substitution was
found not to correlate with the severity of the
knockout phenotype. To account for differences
in function, Hurst and Smith (1999) restricted
their analysis exclusively to neuron-specific
genes, which have significantly lower rates of
substitution than other genes. They could find
no difference in the rate of substitution between
16 essential neuron-specific genes and 18
non-essential ones.
67The functional role (if any) of 98% of mammalian
genomes remains undetermined. Nóbrega et al.
(2004) deleted two megabase-long sequences from the
mouse genome: a 1,817,000-bp region mapping to mouse
chromosome 3 and a 983,000-bp region mapping to
chromosome 19. (Orthologous regions of about the
same size are present on human chromosomes 1 and
10, respectively.) Viable mice homozygous for
the deletions were generated and were
indistinguishable from wild-type littermates with
regard to morphology, reproductive fitness,
growth, longevity, and general homeostasis.
Further analysis of the expression of multiple
genes bracketing the deletions revealed only
minor expression differences between
homozygous-deletion mice and wild-type mice.
68The two deleted segments harbor 1,243 non-coding
sequences conserved between humans and rodents
(more than 100 base pairs, 70% identity). Yet,
the deletion of so many sequences that have been
conserved for such a long period of time
(mouse-human divergence: 100 million years)
resulted in no reduction in fitness.
Conclusion I: There is potentially
disposable DNA in the genomes of mammals.
Conclusion II: Sequence conservation may not
necessarily indicate constraint.
69Ahituv et al. (2007) removed from the mouse
genome four ultraconserved elements, that is,
sequences of 200 base pairs or longer that are
100% identical among human, mouse, and rat.
70 Remarkably, lines of mice homozygous for the
four deletions were viable and fertile, and
failed to reveal any developmental or phenotypic
abnormalities.
71These results indicate that extreme sequence
conservation may not necessarily reflect extreme
evolutionary constraint.
There must be forces other than selection that
promote sequence conservation.