Title: Models of Molecular Evolution II
1. Models of Molecular Evolution II
- Level 3 Molecular Evolution and Bioinformatics
- Jim Provan
- Page and Holmes, Sections 7.3 and 7.4
2. Isochore structure of vertebrate genomes
- Why do patterns of base composition (the frequencies of the four bases) and of the codons used to specify amino acids differ between genomes?
- Mean G + C content in bacteria ranges from 25% to 75%, but there is little intragenome variation
- Genomes of vertebrates have a much greater range of G + C values
- Caused by continuous sections (> 300 kb), each of which has a uniform G + C content (isochores)
- G + C content of isochores also varies between species
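A minimal sketch of how G + C content could be measured along a sequence, assuming plain DNA strings and non-overlapping windows. The function names and the toy sequence are hypothetical, and real isochore analyses use windows of hundreds of kilobases rather than the eight bases used here.

```python
def gc_content(seq: str) -> float:
    """Return the G + C fraction of a DNA sequence (0.0 to 1.0)."""
    # Count only unambiguous bases; N and other codes are skipped.
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T")
    return sum(b in "GC" for b in counted) / len(counted)


def windowed_gc(seq: str, window: int) -> list[float]:
    """G + C fraction of consecutive, non-overlapping windows."""
    return [gc_content(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


# Hypothetical toy sequence: an AT-rich stretch followed by a GC-rich stretch.
toy = "ATTATAAT" * 3 + "GCGGCCGC" * 3
profile = windowed_gc(toy, 8)
print(profile)
```

A genome with little intragenome variation would give a flat profile, whereas an isochore boundary would show up as a step between long runs of similar values.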
3. Properties of vertebrate isochores
4. Theories on the existence of isochores
- Selectionist hypothesis of Bernardi et al. suggests that GC-rich isochores, predominantly found in warm-blooded vertebrates, are an adaptation to higher body temperature
- Extra hydrogen bond in G-C pair may lessen possibility of thermal damage to DNA
- Desert plants also have higher GC contents
- Evidence for independent occurrence of isochores, since birds and mammals do not share an immediate ancestor
- However, some thermophilic bacteria are AT-rich
5. Theories on the existence of isochores
- Neutralist explanation for the existence of isochores is that they simply reflect variation in the process of mutation across the genome
- Studies on argininosuccinate synthetase processed pseudogenes from anthropoid primates
- Pseudogenes were derived from the same functional ancestral gene but then inserted into different parts of the genome
- Despite their common ancestry, they now differ in base composition
- Because pseudogenes are not subject to selection, differences in base composition must have been due to regional variation in mutation patterns
6. Why should mutation patterns vary across genomes?
- Replication hypothesis suggests that genes which replicate earlier in the cell cycle are more GC-rich than those which replicate later
- Believed to be because G and C precursor pools of dNTPs are larger at this time, so errors are more likely to incorporate G or C
- Repair hypothesis is based on the assumption that efficiency of DNA repair varies across the genome
- May be an outcome of transcriptionally active areas being repaired more efficiently
- CpG islands are maintained by a special repair system; efficiency of DNA replication may be dependent on location
7. Why should mutation patterns vary across genomes?
- Recombination hypothesis claims that the isochore structure of vertebrate genomes is the outcome of differences in the pattern and frequency of recombination
- Low GC localities will be associated with regions of reduced recombination
- Genes with low rates of recombination have low GC values
- The large, non-recombining region of the Y-chromosome has a low GC composition
- Fact that recombination plays such a large part in the structuring of eukaryote genomes makes this an attractive hypothesis
- Although the relative contributions of these hypotheses are still unclear, the neutralist interpretation seems more likely
8. Codon usage
9. What determines codon usage?
- Degeneracy of genetic code
- Null hypothesis is that all codons for a particular amino acid are used with equal frequency
- Refuted when nucleotide sequences became available for a wide range of organisms
- Selectionist argument
- Highly expressed genes show most codon bias because they require more translational efficiency: coevolution of tRNAs and codons
- Also supports the neutralist prediction of a relationship between functional constraint and substitution rate
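The null hypothesis of equal codon use can be checked with a simple count. The sketch below uses relative synonymous codon usage (RSCU), a measure not named in the slides: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and an in-frame coding sequence; the function name and toy sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """RSCU per codon; 1.0 means the codon is used as often as expected
    under equal usage of all synonymous codons."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODE)
    result = {}
    for aa in set(AMINO):
        family = [c for c, a in CODE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)
        for c in family:
            result[c] = counts[c] / expected
    return result


# Hypothetical toy gene: leucine is heavily biased towards CTG,
# lysine uses AAA and AAG equally.
toy = "CTG" * 6 + "CTA" + "TTA" + "AAA" * 2 + "AAG" * 2
values = rscu(toy)
print(round(values["CTG"], 2), values["AAA"], values["CTT"])
```

Values well above 1.0 mark preferred codons; comparing such values between highly and lowly expressed genes is one way to look for the bias the selectionist argument predicts.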
10. Gene expression and codon bias
[Figure panels: Highly expressed genes; Lowly expressed genes]
11. The molecular clock
- Idea of a molecular clock is central to the neutralist theory, since it demonstrates the constancy of the underlying neutral mutation rate
- Previous example of α-globin
- Does not imply that all genes and proteins evolve at the same rate
- Great variation between proteins (fibrinopeptides vs. histones)
- Variation in rate among genes and proteins is compatible with the neutral theory if the underlying cause is changes in selective constraint
- Key question concerning the validity of a molecular clock is whether rates of substitution are constant within genes across evolutionary time
12. Neutral theory and the molecular clock
- Rate of nucleotide substitution (fixation) at any site per year, k, in a diploid population of size 2N is equal to the number of new mutations (neutral, deleterious or advantageous) arising per year, 2Nm (where m is the mutation rate per gene copy per year), multiplied by their probability of fixation, u
- $k = 2N m u$
- For a neutral mutation, probability of fixation is the reciprocal of population size
- $u = 1/(2N)$
- So substitution rate for a neutral mutation is
- $k = (2N)(1/(2N))\,m$
13. Neutral theory and the molecular clock (continued)
- Parameters for population size (2N) cancel out, leaving
- $k = m$
- One of the most important formulae in molecular evolution: it means that rate of substitution of neutral mutations is dependent only on the underlying mutation rate and is independent of other factors such as population size
- Also holds for mutants with a very weak selective advantage, e.g. $s < 1/(2N_e)$
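The cancellation of population size can be checked numerically. This is a minimal sketch that takes m as the mutation rate per gene copy per year, so that 2N × m new mutations arise each year (an interpretation of the slide's notation); the function names and the value of m are hypothetical.

```python
import math


def substitution_rate(n: int, m: float, u: float) -> float:
    """k = 2N * m * u: new mutations per year times their fixation probability."""
    return 2 * n * m * u


def neutral_substitution_rate(n: int, m: float) -> float:
    """Neutral case: fixation probability is u = 1 / (2N)."""
    return substitution_rate(n, m, 1 / (2 * n))


m = 1e-9  # hypothetical mutation rate, chosen only for illustration
for n in (100, 10_000, 1_000_000):
    k = neutral_substitution_rate(n, m)
    # k stays equal to m (up to floating-point rounding) whatever N is.
    print(n, k, math.isclose(k, m))
```

A larger population produces more new mutations per year, but each one is proportionally less likely to drift to fixation, so the two effects offset exactly.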
14. Substitution of selectively advantageous mutations
- Probability of fixation is roughly twice the selection coefficient
- $u = 2sN_e/N$, where $N_e$ is the effective population size
- Substituting this into the original equation, we get
- $k = 4N_e s m$
- In this case, substitution rate for an advantageous mutation also depends on population size and magnitude of selective advantage
- For natural selection to produce a molecular clock, it is necessary for $N_e$, s and m (combination of ecological, mutational and selective events) to be the same across evolutionary time: highly unlikely!
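The substitution step can be written out term by term. This only restates the formulas on the slides, with $N_e$ read as the effective population size and m as the mutation rate.

```latex
\begin{align}
k &= 2N\,m\,u
  && \text{(general substitution rate)} \\
u &\approx \frac{2sN_e}{N}
  && \text{(fixation probability, advantageous mutation)} \\
k &\approx 2N\,m \cdot \frac{2sN_e}{N} = 4N_e\,s\,m
  && \text{(advantageous case)} \\
k &= 2N\,m \cdot \frac{1}{2N} = m
  && \text{(neutral case, for comparison)}
\end{align}
```

In the neutral case N cancels completely, whereas in the advantageous case only the census size N cancels, leaving $N_e$ and s in the result.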
15. Constancy of the molecular clock
- Neutral theory predicted a molecular clock and first protein sequence data appeared to confirm this, which led Kimura to cite this as the best evidence for neutrality
- As more comparative sequence data became available, particularly from mammals, examples of rate variation began to appear
- Debate arose concerning the constancy of the molecular clock
16. Testing the molecular clock
- Dispersion index R(t): tests whether there is more rate variation between lineages than expected under a Poisson process
- If the data fit a Poisson process, variance in number of substitutions between lineages should be no greater than the mean number
- If the data fit a Poisson process then R(t) = 1.0; if not then R(t) > 1.0 and the clock is said to be overdispersed
- A star phylogeny should be used, since any phylogenetic structure will complicate the calculations (e.g. placental mammals)
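A minimal sketch of the variance-to-mean comparison behind the dispersion index, assuming a star phylogeny so that each lineage contributes one independent substitution count. It uses the plain sample variance divided by the mean; published versions of R(t) may apply further corrections that are not shown here. The counts are hypothetical.

```python
from statistics import mean, variance


def dispersion_index(substitutions: list[int]) -> float:
    """Sample variance of per-lineage substitution counts divided by their mean."""
    return variance(substitutions) / mean(substitutions)


# Hypothetical substitution counts for five lineages of a star phylogeny.
clock_like = [10, 12, 8, 11, 9]
overdispersed = [2, 25, 7, 30, 11]

r_clock = dispersion_index(clock_like)
r_over = dispersion_index(overdispersed)
print(r_clock, r_over > 1.0)
```

With so few lineages the estimate is very noisy, so a value above 1.0 on its own would not be convincing evidence of overdispersion.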
17. Testing the molecular clock
- Mammalian protein data presented a serious problem for neutralists
- Problems most likely due to inaccuracies in phylogenies
- Outlier in data was guinea pig
- Guinea pig is much more divergent than previously thought
18. The relative rate test
- The relative rate test compares the numbers of substitutions separating each of two closely related taxa (A and B) from a third, more distantly related outgroup (C)
- If A and B have evolved according to a molecular clock, both should be equidistant from C
- $d_{AC} = d_{BC}$
- A and B must be closest relatives and C must not be too far removed
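The comparison can be sketched with aligned sequences. The example assumes an uncorrected proportion of differing sites (p-distance) as the distance measure, which is a simplification: real analyses normally correct for multiple substitutions and attach a standard error to the difference. The function names and sequences are hypothetical.

```python
import math


def p_distance(x: str, y: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)


def relative_rate_difference(a: str, b: str, outgroup: str) -> float:
    """d_AC - d_BC; expected to be close to zero under a molecular clock."""
    return p_distance(a, outgroup) - p_distance(b, outgroup)


# Hypothetical aligned sequences: B has accumulated more changes than A.
seq_c = "ACGTACGTAC"  # outgroup C
seq_a = "ACGTACGTAT"  # differs from C at 1 of 10 sites
seq_b = "TCGAACGGAC"  # differs from C at 3 of 10 sites

diff = relative_rate_difference(seq_a, seq_b, seq_c)
print(round(diff, 2))
```

A negative difference here means lineage B lies further from the outgroup than lineage A, which is what a faster rate in B would produce.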
19. The relative rate test
- Synonymous sites in nine nuclear genes (3520 bp)
- $d_{12} = 6.7$
- $d_{13} - d_{23} = 2.3 \pm 0.6$
- ψη-globin pseudogene (1827 bp)
- $d_{12} = 7.9$
- $d_{13} - d_{23} = 1.5 \pm 0.4$
- Three introns (3376 bp)
- $d_{12} = 6.9$
- $d_{13} - d_{23} = 1.0 \pm 0.5$
- Two flanking regions (936 bp)
- $d_{12} = 7.9$
- $d_{13} - d_{23} = 3.1 \pm 1.1$
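One way to read these figures, offered as an assumption about the notation rather than something the slide states: if each pair of numbers is the difference d13 − d23 followed by its standard error, then dividing one by the other shows how many standard errors the difference lies from zero. The sketch below does only that arithmetic; the dictionary labels and the function name are hypothetical.

```python
def standard_errors_from_zero(difference: float, standard_error: float) -> float:
    """Ratio of a distance difference to its (assumed) standard error."""
    return difference / standard_error


# Values copied from the slide; treating the second number of each pair
# as a standard error is an assumption.
data = {
    "synonymous sites": (2.3, 0.6),
    "pseudogene": (1.5, 0.4),
    "introns": (1.0, 0.5),
    "flanking regions": (3.1, 1.1),
}

ratios = {label: standard_errors_from_zero(d, se) for label, (d, se) in data.items()}
for label, ratio in ratios.items():
    print(label, round(ratio, 2))
```

Under that reading, a ratio of about 2 or more would suggest that lineages 1 and 2 are not equidistant from lineage 3, that is, that their substitution rates differ.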
20. Lineage effects and the molecular clock
- Substitution rate varies with underlying neutral mutation rate: $k = m$
- Three ways for rates to vary between species
- Differences in generation time
- Differences in metabolic rate
- Differences in efficiency of DNA repair
- These are known as lineage effects: neutralists believe that lineage effects alone can account for all variation in molecular clock
- Selectionists believe that genes also show rate variation due to other, selection-driven factors (residue effects)
21. Generation time and the molecular clock
22. Generation time and the molecular clock
- At the molecular level, generation time (g) can be defined as the time it takes for germ-line DNA to replicate, i.e. from one gamete to the next
- Since most mutations occur at this point, rate of substitution under neutral theory is a function of both mutation rate and generation time
- $k = m/g$
- General conclusion from molecular data is that the clock is generation time dependent at silent sites and in non-coding DNA
- Silent rates in orang-utan, gorilla and chimp are 1.3-, 2.2- and 1.2-fold faster than in humans, which matches differences in generation times
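The generation-time relation can be illustrated with a short calculation. The sketch assumes that m is here a mutation rate per generation and that g is measured in years (the slide does not state units), so that k = m/g is a rate per year. The numerical values are hypothetical and are not estimates for any real species.

```python
import math


def rate_per_year(m_per_generation: float, generation_time_years: float) -> float:
    """k = m / g: per-generation mutation rate spread over the generation time."""
    return m_per_generation / generation_time_years


m = 1e-8  # hypothetical mutation rate per site per generation
k_short = rate_per_year(m, 10.0)  # hypothetical species with a 10-year generation
k_long = rate_per_year(m, 20.0)   # hypothetical species with a 20-year generation

# With the same m, the ratio of yearly rates is the inverse ratio of generation times.
print(math.isclose(k_short / k_long, 2.0))
```

If two species share the same per-generation mutation rate, the one with half the generation time is therefore expected to evolve twice as fast per year at neutral sites.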
23. The metabolic rate hypothesis
- In sharks, rate of silent change is five- to sevenfold lower than in primates and ungulates, which have similar generation times
- Led to the hypothesis that differences in metabolic rate are a better explanation for differences in mutation rates than differences in generation time (metabolic rate hypothesis)
- States that organisms with high metabolic rates have higher levels of DNA synthesis
- Two pieces of mitochondrial DNA evidence support this
- Small bodied animals, which have higher metabolic rates, tend to have higher mutation rates
- Warm-blooded animals also have higher mutation rates than cold-blooded animals
24. Relationship between body mass and sequence evolution
25. DNA repair and mutation
26. DNA repair and mutation
- Repair mechanisms are extremely complex and there are many repair pathways
- There is some evidence supporting the hypothesis that DNA repair influences mutation rate
- Evidence that highly transcribed genes are more efficiently repaired
- Base composition and substitution rates at silent sites in mammalian genes tend to be gene- rather than species-specific, which suggests that homologous genes are transcribed and repaired in a similar manner
- Conversely, closely related species such as hominoid primates, which share very similar repair mechanisms, can exhibit greatly differing substitution rates