Title: Mathematical models of molecular evolution
1. Mathematical models of molecular evolution
- Agenda
- Motivation: What implicit assumptions are often made about the evolutionary process?
- Historical remarks: the neutral theory of molecular evolution.
- The simplest mathematical models of evolution:
  - random genetic drift.
  - drift-mutation balance.
  - drift-selection balance.
- Genotype space and fitness landscapes.
- The Eigen quasispecies model:
  - the error threshold.
- More realistic fitness landscapes: neutral networks.
Erik van Nimwegen
2. Parsimony tree of D-loop mtDNA of several fish species
- We are implicitly encouraged to believe that:
  - Each sequence is representative of its species.
  - The relationships of the sequences in the tree reflect the evolutionary history of the species.
  - The lengths of the branches correspond to the evolutionary distances in time.
3. But why not?
- The variation in D-loop mtDNA within a species is as large as the variation between species.
  - The tree just reflects the relationships between the single individuals from which the DNA for each of the species was extracted.
- The differences between the sequences don't reflect evolutionary history but rather selective pressures. Each sequence is optimized for the life-style and environment of its species.
  - The tree reflects similarity in life-style and environment, not evolutionary history.
What do very simple models of evolution suggest?
4. Historical remarks: Kimura's neutral theory of molecular evolution (1968)
- Before Kimura, every locus in the genome was (implicitly) assumed to be selected.
  - To maintain a population with this genome, each individual has to produce at least 1 offspring whose genome does not have any deleterious mutations.
- In the 1960s numbers started coming out:
  - the amount of DNA in mammalian genomes ($10^9$ nucleotides).
  - the number of amino acid substitutions in different proteins between different mammals (haemoglobin, cytochrome c): one amino acid change per $10^7$ years.
- Those numbers did not make sense:
  - Human genome: $10^9$ nucleotides, mutation rate $10^{-8}$ per nucleotide.
  - The probability of producing an intact offspring is 0.000045. Thus we should have 22,000 offspring to produce one intact offspring.
  - 30% of Drosophila loci are polymorphic. All selected for?
- Kimura's suggestion: the vast majority of single-nucleotide changes are selectively neutral.
  - This is now well established, and neutral evolution is often used as the null model of molecular evolution.
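The arithmetic behind the "numbers that did not make sense" can be checked in a few lines. This is only a sketch: it assumes that mutations hit each nucleotide independently, so that the chance of a mutation-free genome is $(1-\mu)^G \approx e^{-\mu G}$. The variable names are illustrative and not taken from the slides.

```python
import math

genome_size = 1e9      # nucleotides, as quoted on the slide
mutation_rate = 1e-8   # per nucleotide, as quoted on the slide

# Expected number of new mutations in one offspring genome.
mutations_per_genome = genome_size * mutation_rate

# Assumption: sites mutate independently, so the probability that an
# offspring carries no mutation at all is (1 - mu)^G, close to exp(-mu * G).
p_intact = math.exp(-mutations_per_genome)

# Average number of offspring needed to obtain one intact copy.
offspring_needed = 1.0 / p_intact

print(f"mutations per genome: {mutations_per_genome:.0f}")
print(f"P(intact offspring):  {p_intact:.6f}")
print(f"offspring needed:     {offspring_needed:.0f}")
```

Under these assumptions the result is about 10 mutations per genome, an intact-offspring probability near 0.000045, and roughly 22,000 offspring per intact copy.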
6. J.B.S. Haldane (1892-1964)
- One of the founders of mathematical population genetics.
- Great popularizer of science.
- Influenced Aldous Huxley's Brave New World.
8. Motoo Kimura (1924-1994)
- Introduced the neutral theory.
- Developed very important new mathematical tools in population genetics (the application of stochastic differential equations and diffusion models).
9. Genetic drift: Evolution without selection or mutation
A population of fixed size. Each individual has its own type (color). Individuals reproduce clonally.
[Figure: parent generation and offspring generation.]
Each individual has the same reproductive success on average: each offspring individual in the new generation has a parent chosen at random from the parent generation.
Some have no offspring!
14. Genetic drift: Eventually one color will take over
[Figure: the population followed over 2N generations.]
- For a clonally reproducing population of size N, after on average 2N generations all but one lineage will have gone extinct.
- It is more complex for sexually reproducing entities, but the idea is qualitatively the same: almost all genetic material stems from a very small fraction of the ancestral population more than 2N generations ago.
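The drift process described on the preceding slides is simple enough to simulate directly. The sketch below is a minimal version under the stated rules (fixed population size, clonal reproduction, every offspring picks a random parent). The function name, the seed and the choice N = 20 are illustrative assumptions.

```python
import random

def generations_to_fixation(n, rng):
    """Run clonal drift from n distinct colors until a single color is left."""
    population = list(range(n))      # every individual starts with its own color
    generations = 0
    while len(set(population)) > 1:
        # Each offspring picks its parent uniformly at random, with replacement,
        # so some parents leave several offspring and some leave none.
        population = [rng.choice(population) for _ in range(n)]
        generations += 1
    return generations

rng = random.Random(1)               # fixed seed, only for reproducibility
n = 20
runs = 500
mean_time = sum(generations_to_fixation(n, rng) for _ in range(runs)) / runs
print(f"N = {n}: mean time until one color is left = {mean_time:.1f} generations")
print(f"compare with 2N = {2 * n}")
```

Individual runs scatter widely around the mean, so the comparison with 2N only makes sense as an average over many runs.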
15. Genetic drift with mutation
[Figure: example population evolving with mutation over 2N generations; offspring keep the parent's color with probability (1-µ) and get a new color with probability µ.]
- Each time an individual reproduces there is a probability µ that it mutates and introduces a new color.
- In the limit of large time the number of different colors will on average equal $C = 1 + 2N\mu$.
- In the example above $\mu = 1/8$.
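Adding mutation to the drift simulation takes one extra step per offspring. The sketch below assumes that every mutation creates a color never seen before, and it uses N = 8 together with µ = 1/8 (the population size is an assumption, only µ is given on the slide). It counts every color present, including single-copy ones, so only rough agreement with $1 + 2N\mu$ should be expected.

```python
import random

def mean_number_of_colors(n, mu, generations, burn_in, rng):
    """Time-averaged number of distinct colors under drift plus mutation."""
    population = [0] * n             # start from a single color
    next_color = 1                   # label for the next brand-new color
    total = 0
    for t in range(generations):
        offspring = []
        for _ in range(n):
            color = rng.choice(population)   # random parent, as in pure drift
            if rng.random() < mu:            # with probability mu: a new color
                color = next_color
                next_color += 1
            offspring.append(color)
        population = offspring
        if t >= burn_in:                     # skip the initial transient
            total += len(set(population))
    return total / (generations - burn_in)

n, mu = 8, 1 / 8
observed = mean_number_of_colors(n, mu, generations=20000, burn_in=1000,
                                 rng=random.Random(1))
print(f"observed mean number of colors: {observed:.2f}")
print(f"1 + 2*N*mu = {1 + 2 * n * mu:.2f}")
```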
19. Genetic drift with mutation
The product Nµ determines the amount of genetic diversity in a population. When $N\mu \gg 1$ almost each individual is unique. When $N\mu \ll 1$ a single type will dominate the population.
Example: HIV. $\mu \approx 10^{-5}$ per nucleotide and $N \approx 10^7$-$10^8$ infected cells in a host. This means almost every nucleotide is variable in the population.
Example: Human.
- $\mu \approx 10^{-8}$ per nucleotide and $N \approx 10^3$-$10^5$ (?). A typical nucleotide shows almost no variation in the population.
- $\mu \approx 10^{-5}$ per gene. A typical gene will have few variants in a population.
- $\mu \approx 1$ per genome. Every genome is essentially unique.
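The examples can be tabulated by computing Nµ for each case. In this sketch the factor of 10 used to decide what counts as "much larger" or "much smaller" than 1 is an arbitrary assumption, and the function name is illustrative.

```python
def diversity_regime(population_size, mutation_rate, margin=10.0):
    """Return N*mu and a rough verbal regime; `margin` is an arbitrary cutoff."""
    n_mu = population_size * mutation_rate
    if n_mu >= margin:
        return n_mu, "almost every individual unique"
    if n_mu <= 1.0 / margin:
        return n_mu, "a single type dominates"
    return n_mu, "intermediate"

examples = [
    ("HIV, per nucleotide",   1e7, 1e-5),
    ("HIV, per nucleotide",   1e8, 1e-5),
    ("human, per nucleotide", 1e3, 1e-8),
    ("human, per nucleotide", 1e5, 1e-8),
    ("human, per gene",       1e5, 1e-5),
    ("human, per genome",     1e3, 1.0),
]
for label, n, mu in examples:
    n_mu, regime = diversity_regime(n, mu)
    print(f"{label:22s} N = {n:.0e}  N*mu = {n_mu:.0e}  -> {regime}")
```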
20. Drift and selection: Does a beneficial mutant always take over?
- Relative fitness: fitness of the mutant relative to the rest of the population, i.e. 2 means reproducing twice as fast.
- p: probability that the mutant will take over the population.
Thus, even with a 25% increase in fitness, the probability of the mutant spreading is only about 40%. Mutants with small fitness effects likely need to be discovered several times before they spread.
21. Ronald Aylmer Fisher (1890-1962)
- The theory of natural selection.
- Major contributions to the theory of statistics.
22. Can a deleterious mutant take over?
Yes.
Threshold: $s \approx 1/N$.
When $s = 0$, the mutant is neutral and the take-over probability is $1/N$. The larger the decrease in fitness s, the smaller the chance of take-over. Selection cannot see fitness differences smaller than $1/N$!
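The slides do not state the formula behind the take-over probabilities. One common choice that reproduces both the roughly 40% chance for a 25% advantage and the neutral limit 1/N is Kimura's diffusion approximation for a single mutant in a haploid population, $p = (1 - e^{-2s})/(1 - e^{-2Ns})$. Using it here is an assumption, as is the sign convention (s > 0 beneficial, s < 0 deleterious).

```python
import math

def takeover_probability(s, n):
    """Diffusion approximation (assumed, not from the slides) for the
    probability that one mutant with fitness difference s takes over
    a haploid population of size n."""
    if s == 0:
        return 1.0 / n                    # neutral limit
    exponent = -2.0 * n * s
    if exponent > 700:                    # strongly deleterious: avoid overflow
        return 0.0
    # expm1(x) = exp(x) - 1, accurate also for very small |x|.
    return math.expm1(-2.0 * s) / math.expm1(exponent)

n = 1000
for s in (0.25, 0.01, 0.0, -1.0 / n, -10.0 / n):
    p = takeover_probability(s, n)
    print(f"s = {s:+.4f}: p = {p:.3e}   (p * N = {p * n:.3g})")
```

With N = 1000, a disadvantage of 1/N still leaves about a third of the neutral take-over probability, while a disadvantage of 10/N makes take-over practically impossible.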
23. Summary: Drift-mutation and drift-selection balance
[Figure: diagram with axes Nµ and Ns; the value 1 is marked on each axis.]
24. Genotype space and fitness landscapes
- Genetic information is carried by the DNA. A genotype is thus a long string over a 4-letter alphabet {A,C,G,T}.
- Genetic variations:
  - single point mutations (replacing a single nucleotide).
  - small insertions and deletions.
  - recombination (gene conversion).
  - excision and integration of mobile genetic elements.
Focus on point mutations only.
- Genotypes: DNA sequences of length L.
- Genotype space has $4^L$ points.
- Each point has $3L$ single-point mutant neighbors.
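The two counts can be verified by brute force for small L. The helper names below are illustrative. The alphabet is a parameter so that the same sketch also covers a two-letter alphabet {A,T}, where the counts become $2^L$ points and L neighbors.

```python
from itertools import product

def genotype_space(length, alphabet="ACGT"):
    """All strings of the given length over the alphabet."""
    return ["".join(letters) for letters in product(alphabet, repeat=length)]

def point_mutant_neighbors(genotype, alphabet="ACGT"):
    """All genotypes that differ from `genotype` at exactly one position."""
    neighbors = []
    for i, old in enumerate(genotype):
        for new in alphabet:
            if new != old:
                neighbors.append(genotype[:i] + new + genotype[i + 1:])
    return neighbors

L = 3
space = genotype_space(L)
print(len(space), "points; expected", 4 ** L)
print(len(point_mutant_neighbors("ACG")), "neighbors of ACG; expected", 3 * L)
print(point_mutant_neighbors("AT", alphabet="AT"))
```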
25. Example genotype spaces: two-nucleotide alphabet {A,T}, sequences of length L = 1, 2, 3, 4,
and so on.
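26. Fitness landscapes
[Figure: intuitive picture of a fitness landscape, with the fitness values $f_{AA}$, $f_{AT}$, $f_{TA}$, $f_{TT}$ plotted over the genotypes; axes: genotypes, fitness.]
In evolution populations move uphill.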
27. Sewall Wright (1889-1988)
- Inventor of the adaptive landscape metaphor (1932).
- One of the founders of mathematical population genetics.
- Here together with Kimura in 1968 (the year of the neutral theory!).
28. The Eigen quasispecies model
- Genotypes g are strings of length L over the alphabet {A,C,G,T} or {A,T}.
- Each genotype g has an associated fitness $f_g$, which denotes the reproduction rate of g.
- The population size is kept constant by randomly killing individuals at the same rate as the overall reproduction rate.
- At each replication, each letter mutates with probability µ.
- $P_g$ is the fraction of the population with genotype g.
After some time $P_g$ will take on a limit distribution, that is, the $P_g$ will not change anymore. This distribution is called the quasispecies.
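For short strings the limit distribution can be found by simply iterating the model. The sketch below makes several assumptions that are not on the slide: it uses discrete generations instead of continuous rates, a mutating letter changes to one of the other letters with equal probability, and the single-peak fitness function and L = 6 are illustrative choices.

```python
from itertools import product

def quasispecies(fitness, length, mu, alphabet="AT", steps=300):
    """Iterate reproduction, mutation and normalization until P_g settles."""
    genotypes = ["".join(letters) for letters in product(alphabet, repeat=length)]
    k = len(alphabet)
    f = [fitness(g) for g in genotypes]

    def copy_probability(g, h):
        # Probability that replicating h yields g: each letter changes with
        # probability mu, to one of the k - 1 other letters (assumed uniform).
        d = sum(a != b for a, b in zip(g, h))
        return (mu / (k - 1)) ** d * (1.0 - mu) ** (length - d)

    q = [[copy_probability(g, h) for h in genotypes] for g in genotypes]
    p = [1.0 / len(genotypes)] * len(genotypes)    # uniform starting population
    for _ in range(steps):
        grown = [sum(q_gh * f_h * p_h for q_gh, f_h, p_h in zip(row, f, p))
                 for row in q]
        total = sum(grown)          # equals the mean fitness; keeps sum(P) = 1
        p = [x / total for x in grown]
    return dict(zip(genotypes, p))

length = 6

def single_peak(g):
    """Illustrative landscape: the all-A string has fitness 3, all others 1."""
    return 3.0 if g == "A" * length else 1.0

limit = quasispecies(single_peak, length, mu=0.02)
for d in range(length + 1):
    share = sum(x for g, x in limit.items() if g.count("T") == d)
    print(f"distance {d} from all-A: fraction {share:.3f}")
```

The normalization step plays the role of the random killing: it removes exactly as much as reproduction adds, so the fractions keep summing to 1.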
29. Two quasispecies examples
[Figure: two panels showing fitness and quasispecies as a function of the distance from the all-A string and from the all-T string.]
- Length 40 strings over {A,T}. Mutation rate $\mu = 0.02$.
- The right peak is higher and steeper than the left peak.
- Most of the population is concentrated below the summit.
  - Mutation-selection balance determines this distance from the summit.
- The average fitness in both populations is the same, i.e. the population at a broad peak may outcompete a population at a higher, steeper peak.
30. The error threshold
[Figure: fraction of the population at distance d from the summit (curves for d = 0 (summit), 1, 2, 3, 22, 25) as a function of the mutation rate µ, with the error threshold marked.]
- Length L = 50 strings over {A,T}.
- Fitness of the all-A string (d = 0) is 3, fitness of all other strings is 1.
- The error threshold occurs (roughly) when $f_0 (1-\mu)^L = 1$. Here at $\mu \approx 0.0217$.
- This threshold generally determines the balance between selection, mutation, and the amount of genetic information necessary to maintain the fitness.
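Solving the threshold condition $f_0 (1-\mu)^L = 1$ for µ gives $\mu = 1 - f_0^{-1/L}$, which can be checked against the value quoted for this example. The function name is illustrative.

```python
def error_threshold(f0, length):
    """Mutation rate at which f0 * (1 - mu)**length equals 1."""
    return 1.0 - f0 ** (-1.0 / length)

mu_c = error_threshold(f0=3.0, length=50)
print(f"error threshold for f0 = 3, L = 50: mu = {mu_c:.4f}")
print(f"check: f0 * (1 - mu)^L = {3.0 * (1.0 - mu_c) ** 50:.6f}")
```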
31. Error threshold: numerical examples
- The minimal increase in relative fitness necessary to sustain dL nucleotides of genetic information.
- Bands show the selection/drift balance.
- For humans, mutational meltdown only becomes an issue for dL > 10000. Below that, drift is the main issue.
- For HIV, drift is never the issue. Adding a whole gene to the HIV genome needs a fitness increase of 1-10% or more.
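These statements can be reproduced approximately by applying the error-threshold condition to the extra dL nucleotides: they can be sustained only if $(1+s)(1-\mu)^{dL} \ge 1$, and selection can act only if s also exceeds the drift limit 1/N. Treating the threshold this way is an assumption of this sketch, as is the gene length of 1000 nucleotides. The values of µ and N are the ones quoted earlier in the slides.

```python
import math

def minimal_fitness_increase(mu, extra_nucleotides):
    """Smallest relative fitness gain s with (1 + s) * (1 - mu)**dL >= 1."""
    # (1 - mu)**(-dL) - 1, written with log1p/expm1 for numerical accuracy.
    return math.expm1(-extra_nucleotides * math.log1p(-mu))

cases = [
    # label, mu per nucleotide, dL, population size N
    ("human, dL = 10000, N = 1e3", 1e-8, 10_000, 1e3),
    ("human, dL = 10000, N = 1e5", 1e-8, 10_000, 1e5),
    ("HIV, one gene (~1000 nt, assumed), N = 1e7", 1e-5, 1_000, 1e7),
]
for label, mu, d_l, n in cases:
    s_mutation = minimal_fitness_increase(mu, d_l)
    s_drift = 1.0 / n
    limiting = "mutation" if s_mutation > s_drift else "drift"
    print(f"{label}: mutation needs s > {s_mutation:.1e}, "
          f"drift needs s > {s_drift:.0e} -> {limiting} limits")
```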
32. More realistic fitness landscapes: RNA secondary structure
Each RNA molecule will (in solution) fold into some secondary structure.
- Assume that the fitness/reproduction rate of an RNA molecule depends only on its secondary structure.
- Example: fitness decreases with the distance from its structure to a particular target structure, e.g. tRNA.
33. More realistic fitness landscapes: RNA secondary structure
- Population evolving from random starting RNAs.
- Long periods in which the same best structure dominates the population. The genetic make-up of the population keeps changing during these periods.
- Every transition corresponds to a single point mutation.
34. More realistic fitness landscapes: Neutral networks
- Every color corresponds to a phenotype.
- Sets of genotypes with the same phenotype form neutral networks that are intertwined with one another.
- During evolution populations drift over these neutral networks without observable changes in phenotypes.
- Populations are constrained to go where neutral evolution can take them.
35. So what about this case?
- Each sequence is representative of its species.
  - Only when Nµ [...] L, where L is the average number of neutral mutations D-loop mtDNA can undergo, N is the species population size and µ is the per-nucleotide mutation rate.
- The relationships of the sequences in the tree reflect the evolutionary history of the species.
  - Generally yes, if the selective pressures on the D-loop mtDNA in these species are comparable, that is, if these pieces of DNA evolved on a common (or very similar) neutral network.
- The lengths of the branches correspond to the evolutionary distances in time.
  - Assumes (in addition to the previous assumptions) that the parts of the neutral network that the species evolved on have similar structure.
36. Neutral evolution of mutational robustness
- Some genotypes have more neutral neighbors than others.
- Evolution will automatically concentrate the population on those genotypes that have the most neutral neighbors, i.e. those that are robust against mutations.
- The magnitude of this effect is a structural feature of the neutral network.
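The concentration effect can be illustrated on a toy network. Everything specific in this sketch is an assumption: the six-node network is hypothetical, the population is treated as infinitely large, all genotypes on the network reproduce equally fast, and mutants that leave the network are simply lost.

```python
def population_on_network(neighbors, stay=0.9, move=0.01, steps=2000):
    """Limit distribution of a large population on a neutral network.

    neighbors: dict mapping each genotype to its neutral neighbors.
    stay:      weight of offspring copied without mutation (assumed value).
    move:      weight of offspring mutating to one given neighbor (assumed value).
    """
    nodes = list(neighbors)
    p = {g: 1.0 / len(nodes) for g in nodes}
    for _ in range(steps):
        # Each genotype keeps its unmutated offspring and receives neutral
        # mutants from its neighbors; off-network mutants are discarded.
        new_p = {g: stay * p[g] + move * sum(p[h] for h in neighbors[g])
                 for g in nodes}
        total = sum(new_p.values())
        p = {g: x / total for g, x in new_p.items()}
    return p

# Hypothetical network: a densely connected core (0-3) with a thin tail (4, 5).
network = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3],
           3: [0, 1, 2, 4], 4: [3, 5], 5: [4]}

p = population_on_network(network)
network_average = sum(len(v) for v in network.values()) / len(network)
population_average = sum(p[g] * len(network[g]) for g in network)
print(f"neutral neighbors, averaged over the network:    {network_average:.2f}")
print(f"neutral neighbors, averaged over the population: {population_average:.2f}")
```

In this toy case the population average exceeds the network average because the well-connected core holds most of the population. The limit distribution does not depend on the particular `stay` and `move` weights, which fits the statement that the effect is a structural feature of the network.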
37. References and further reading
- General population genetics theory
  - Hartl and Clark, Principles of population genetics, Sinauer Associates.
- The neutral theory of molecular evolution
  - M. Kimura, The neutral theory of molecular evolution, Cambridge University Press.
  - M. Kimura, Population genetics, molecular evolution, and the neutral theory (selected papers), edited by Naoyuki Takahata, The University of Chicago Press.
- The Eigen quasispecies model
  - M. Volkenstein, Physical approaches to biological evolution, Springer-Verlag.
  - M. Eigen and P. Schuster, The Hypercycle: a principle of natural self-organization, Springer 1979.
  - M. Eigen, J. McCaskill, P. Schuster, The molecular quasi-species, Adv. in Chem. Phys. 75 (1989) 149-263.
- Neutral networks (only specialized literature, unfortunately)
  - M. Huynen, P. Stadler, and W. Fontana, Smoothness within ruggedness: the role of neutrality in adaptation, PNAS 93 (1996) 397-401.
  - W. Fontana and P. Schuster, Continuity in evolution: on the nature of transitions, Science 280 (1998) 1451-1455.
  - E. van Nimwegen, M. Huynen and J. Crutchfield, Neutral evolution of mutational robustness, PNAS 96 (1999) 9716-9720.