Molecular Evolution - PowerPoint PPT Presentation

Provided by: isrecI
Title: Molecular Evolution


1
Molecular Evolution
  • The neutral theory of molecular evolution
  • The molecular clock hypothesis
  • Positive selection
  • Phylogenetic trees
  • Evolutionary distance measures
  • Distance-based tree construction methods (UPGMA
    and neighbour joining)

EPFL Bioinformatics I 16 Jan 2006
2
Neutral theory of molecular evolution
  • Historical background (Haldane's paradox):
  • Early population geneticists believed that most
    polymorphisms are maintained by balancing
    selection.
  • Balancing selection implies a genetic load
    because homozygotes are less fit than
    heterozygotes.
  • When protein electrophoresis became available, it
    was found that a very large number of genes were
    actually polymorphic. This appeared to imply an
    unacceptably high genetic load for the human and
    other populations.
  • Kimura's neutral theory of molecular evolution
    provides an explanation for Haldane's paradox.
  • Claim: The large majority of observed molecular
    polymorphisms reflect neutral changes. Likewise,
    most substitutions observed between homologous
    genes are selectively neutral.
  • Implications: Gene (protein) families evolve
    through neutral mutations and purifying
    selection. Most genes (proteins) have not been
    improved during the period of metazoan evolution.

3
The molecular clock hypothesis
In 1965, Zuckerkandl and Pauling proposed that
for any given lineage, the rate of molecular
evolution (amino acid substitutions per year) is
constant over time. In other words, there exists
a universal molecular clock.

Implications: Mutation and substitution rates are
the same in all lineages. This supports the
neutral theory of molecular evolution (no
dependence on population size and generation
time) but is difficult to reconcile with claims
that mutation rates differ along chromosomes.

Exceptions and limitations: The primate lineage
appears to have a somewhat lower evolutionary
rate than other lineages. The theory is not
readily testable for lower eukaryotes and
bacteria, for which a fossil record is lacking.

Note further: The rate appears to be proportional
to time, and not to the number of generations or
cell divisions. The independence from generation
time speaks against positive selection as a
driving force of evolution. The independence from
the number of cell divisions suggests that most
mutations do not happen during replication.
4
Proteins evolve at different rates
  • Making the following assumptions:
  • All amino acid replacements are selectively
    neutral (neutral theory)
  • There is a constant molecular clock
  • A given protein (e.g. an enzyme) has the same
    function and thus evolves under the same
    purifying selection conditions in all species
  • it follows that:
  • a given protein evolves at a constant rate in all
    lineages
  • However:
  • Different proteins may evolve at different rates
    due to varying levels of functional constraints
  • At the nucleotide sequence level, the different
    rates are primarily reflected by non-silent base
    substitutions (assuming that silent substitutions
    are selectively neutral).
  • These predictions are matched by many protein
    families.

5
Positive Selection
  • Positive selection may occur:
  • When the function of a protein is improved, e.g.
    the efficiency or substrate specificity of an
    enzyme
  • When a protein is undergoing adaptation to
    changes in the environment, e.g. viral surface
    proteins evolving to escape the immune system.
    This case is documented by many examples and may
    generally be more frequent than functional
    improvement.
  • Potential evidence for positive selection:
  • Ratio of silent (synonymous) versus non-silent
    (amino acid changing) nucleotide substitutions
  • Accelerated or population-size dependent rate of
    amino acid substitutions in a particular lineage
  • A higher proportion of silent changes among
    within-species polymorphisms than among
    between-species differences (evidence for
    positive selection in the past).
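
The last criterion compares silent and non-silent changes within and between species. One common way to express such a comparison is a McDonald-Kreitman-style 2×2 table summarised by a neutrality index; the slide does not name this test, so the mapping is an assumption, and the function name and the counts below are invented purely for illustration.

```python
def neutrality_index(pn, ps, dn, ds):
    """Neutrality index NI = (Pn / Ps) / (Dn / Ds).

    pn, ps: non-silent and silent polymorphisms within a species
    dn, ds: non-silent and silent fixed differences between species
    NI < 1 means non-silent changes are over-represented among the
    between-species differences, which is the pattern the slide lists
    as evidence for past positive selection.
    """
    if min(ps, dn, ds) <= 0:
        raise ValueError("ps, dn and ds must be positive")
    return (pn / ps) / (dn / ds)

# Hypothetical counts, not taken from any real data set.
ni = neutrality_index(pn=5, ps=40, dn=20, ds=20)
print(ni)
```

A real analysis would also test the 2×2 table for significance (for example with Fisher's exact test) before drawing conclusions from the index.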

6
Phylogenetic trees
[Figure: three example trees over leaves 1-5 with
inferred nodes 6-8, drawn with a time axis: a
rooted tree; a rooted tree satisfying the
molecular clock hypothesis (all leaves at the
same distance from the root); and an unrooted
tree.]
Note: Nodes 1-5 are called leaves, or leaf nodes.
Nodes 6-8 are inferred nodes corresponding to
ancestral species or molecules. Branches are also
called edges. The edge lengths reflect
evolutionary distances.
7
Phylogenetic trees
A phylogenetic tree is a graph reflecting the
approximate distances between a set of objects in
a hierarchical fashion. A tree is also called a
dendrogram. There are different types of trees:

Unrooted versus rooted trees: A rooted tree has
an additional node representing the origin, in
molecular phylogeny the last common ancestor of
the sequences analyzed. In general, the root
cannot be directly inferred from the data. It may
be inferred from the paleontological record, from
a trusted outlier (outgroup), or on the basis of
the molecular clock hypothesis.

Scaled and unscaled trees: In an unscaled tree,
the lengths of the branches are not important.
Only the topology counts. In phylogeny, trees are
usually scaled.

Binary trees: Each internal node branches into
two daughter nodes. Other trees are usually not
considered in phylogeny as they can easily be
approximated by binary trees with very short
edges between nodes.

Note: A rooted (unrooted) tree connecting n
objects (leaves) has 2n−1 (2n−2) nodes altogether
and 2n−2 (2n−3) edges.
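
The node and edge counts in the note can be written as a tiny helper. This is only a sketch of the stated formulas for binary trees; the function name `tree_size` is an illustrative choice.

```python
def tree_size(n_leaves, rooted):
    """Return (nodes, edges) of a binary tree with n_leaves leaves.

    Rooted:   2n - 1 nodes, 2n - 2 edges.
    Unrooted: 2n - 2 nodes, 2n - 3 edges.
    """
    if n_leaves < 2:
        raise ValueError("need at least two leaves")
    if rooted:
        return 2 * n_leaves - 1, 2 * n_leaves - 2
    return 2 * n_leaves - 2, 2 * n_leaves - 3

print(tree_size(5, rooted=True))
print(tree_size(5, rooted=False))
```

For five leaves the unrooted case gives 8 nodes, i.e. the five leaves plus three inferred nodes; rooting the tree adds one node and one edge.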
8
Phylogenetic tree reconstruction, overview
  • Computational challenge: There is an enormous
    number of different (unrooted, binary)
    topologies even for a relatively small number
    of sequences:
  • 3 sequences: 1
  • 4 sequences: 3
  • 5 sequences: 15
  • 10 sequences: 2,027,025
  • 20 sequences: 221,643,095,476,699,771,875
  • Consequence: Most tree construction algorithms
    are heuristic methods not guaranteed to find the
    optimal topology.
  • Input data for two major classes of algorithms:
  • 1. Input data: distance matrix. Examples: UPGMA,
    neighbor-joining
  • 2. Input data: multiple alignment. Examples:
    parsimony, maximum likelihood
  • Distance matrix methods use distances computed
    from pairwise or multiple alignments as input.
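
The topology counts listed above match the number of unrooted binary trees, (2n−5)!!. A short sketch, assuming that interpretation: a tree with k leaves has 2k−3 edges, and the next leaf can be attached to any one of them. The function name is illustrative.

```python
def unrooted_topologies(n):
    """Number of distinct unrooted binary tree topologies on n >= 3 leaves."""
    if n < 3:
        raise ValueError("need at least three leaves")
    count = 1                  # three leaves admit a single topology
    for k in range(3, n):      # add leaf number k + 1 to a k-leaf tree
        count *= 2 * k - 3     # one choice per existing edge
    return count

for n in (3, 4, 5, 10, 20):
    print(n, unrooted_topologies(n))
```

Python integers have arbitrary precision, so the 21-digit count for 20 sequences is computed exactly.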

9
Distance measures for phylogenetic tree
construction
Distance measures respect the following
constraints: d = 0 if the sequences are
identical, d > 0 if the sequences are different.

Distances between molecular sequences are
computed from pairwise alignment scores.

For closely related DNA sequences, one could
simply use f, the fraction of non-identical
residues (readily computed from the identity
value returned by an alignment program).

For more distantly related sequences, the
Jukes-Cantor distance,

    d_ij = −(3/4) ln(1 − 4f/3),

is preferred. This measure is supposed to be
proportional to evolutionary time. It takes into
account that the percent identity value saturates
at 25% over time.

For protein sequences aligned with the aid of a
substitution matrix, an approximate distance is
often computed from the following quantities:
  S_obs: observed pairwise alignment score
  S_max: maximum score (average of the scores of
    the two sequences aligned against themselves)
  S_rand: expected score for random sequences of
    the same length and composition
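
Both distance measures can be sketched in a few lines. `jukes_cantor` follows the formula given above. `score_distance` is an assumption: the transcript does not preserve the slide's formula for protein sequences, so the sketch uses the widely used Feng-Doolittle-style normalisation, −ln((S_obs − S_rand) / (S_max − S_rand)), which is built from the same three quantities. Function names are illustrative.

```python
import math

def jukes_cantor(f):
    """Jukes-Cantor distance from the fraction f of non-identical sites."""
    if not 0.0 <= f < 0.75:
        # At f = 0.75 (25% identity) the sequences look random and the
        # distance is undefined.
        raise ValueError("f must satisfy 0 <= f < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * f / 3.0)

def score_distance(s_obs, s_max, s_rand):
    """ASSUMED formula (Feng-Doolittle style), not taken from the slide."""
    s_eff = (s_obs - s_rand) / (s_max - s_rand)
    if s_eff <= 0.0:
        raise ValueError("observed score is not above the random expectation")
    return -math.log(s_eff)

for f in (0.05, 0.3, 0.6):
    print(f, round(jukes_cantor(f), 4))
```

For small f the Jukes-Cantor distance is close to f itself; it grows without bound as f approaches 0.75, which is how the correction accounts for saturation.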
10
Distance matrix-based methods. Example: UPGMA
  • Unweighted pair-group method with arithmetic
    means (UPGMA)
  • Initialization:
  • assign each sequence to its own cluster
  • define one leaf node for each single-sequence
    cluster
  • put all leaf nodes at height zero
  • Iteration:
  • determine the two clusters for which the distance
    is minimal and combine them in a new cluster.
  • compute the distance between the new cluster and
    all other clusters by averaging over all
    pairwise distances between cluster elements
  • define a new node for the new cluster and place
    it at a height of half the average distance
    between the elements of the two merged clusters
  • Termination:
  • When only two clusters remain, place the root at
    half the average distance between the elements
    of the two clusters
  • Limitation of UPGMA: The algorithm implicitly
    assumes a constant evolutionary rate in all
    branches. It is therefore unfit to test the
    molecular clock hypothesis.
  • An alternative method called neighbour-joining
    provides more realistic branch lengths.
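
The UPGMA steps above can be sketched as follows. This is a minimal illustration, not an optimised implementation: cluster distances are recomputed from the original pairwise distances on every pass, node heights are taken as half the average inter-cluster distance, and the tree is returned as nested tuples. The input format (a dict keyed by `frozenset` pairs) and all names are assumptions made for the example.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster leaves with UPGMA.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) to the distance between a and b
    Returns (tree, root_height); tree is a nested tuple of leaf labels.
    """
    def avg(a, b):
        # Average over all pairwise distances between the two clusters.
        return sum(dist[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

    # cluster id -> (subtree, list of member leaves)
    clusters = {i: (name, [name]) for i, name in enumerate(names)}
    next_id = len(names)
    height = 0.0
    while len(clusters) > 1:
        # Pick the two clusters with minimal average distance.
        i, j = min(combinations(clusters, 2),
                   key=lambda p: avg(clusters[p[0]][1], clusters[p[1]][1]))
        tree_i, members_i = clusters.pop(i)
        tree_j, members_j = clusters.pop(j)
        height = avg(members_i, members_j) / 2   # height of the new node
        clusters[next_id] = ((tree_i, tree_j), members_i + members_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, height

d = {frozenset(("A", "B")): 2.0,
     frozenset(("A", "C")): 6.0,
     frozenset(("B", "C")): 6.0}
print(upgma(["A", "B", "C"], d))
```

In this toy input A and B are merged first at height 1, and the root joining C is placed at height 3, so all leaves end up at the same distance from the root, as the clock assumption requires.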

11
Distance matrix-based methods: Neighbor joining
  • Underlying assumptions:
  • Additivity: The distance between two leaves is
    the sum of the lengths of the edges on the path
    connecting them (may at best be approximately
    true)
  • Motivation:
  • Additivity is a less stringent assumption than
    the molecular clock assumption
  • If different branches of a tree evolve at
    different rates, the closest pair of leaves may
    not be neighboring leaves, see example below.
  • Output: an unrooted tree
  • Principle:
  • A modified distance measure D_ij is used to
    detect neighbors, which is obtained by
    subtracting the average distances to all other
    leaves from the distance d_ij

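Example: The closest pair of leaves, 1 and 2, are
not neighbors: d_12 = 0.3, d_13 = 0.5; however,
D_12 = −1.1, D_13 = −1.2.
[Figure: unrooted tree with four leaves. Leaves 1
and 3 are neighbors, as are leaves 2 and 4. The
edges to leaves 1 and 2 and the internal edge
have length 0.1; the edges to leaves 3 and 4 have
length 0.4.]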
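
The corrected distances in this example can be checked numerically. The sketch assumes the four-leaf layout implied by the quoted values (leaves 1 and 3 on one side of an internal edge of length 0.1, leaves 2 and 4 on the other, leaf edges 0.1, 0.1, 0.4 and 0.4) and takes r_i to be the sum of the distances from i to all other leaves divided by |L| − 2, which reproduces the quoted numbers. The function name `corrected_distances` is illustrative.

```python
from itertools import combinations

def corrected_distances(d, nodes):
    """Return D_ij = d_ij - (r_i + r_j) for every pair of nodes.

    d maps frozenset({i, j}) to d_ij; r_i is the sum of the distances
    from i to all other nodes divided by (len(nodes) - 2).
    """
    r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) / (len(nodes) - 2)
         for i in nodes}
    return {(i, j): d[frozenset((i, j))] - r[i] - r[j]
            for i, j in combinations(nodes, 2)}

# Path lengths in the assumed tree. Only d_12 and d_13 are quoted in the
# text; the other four values follow from the assumed edge lengths.
d = {frozenset(p): v for p, v in {
    (1, 2): 0.3, (1, 3): 0.5, (1, 4): 0.6,
    (2, 3): 0.6, (2, 4): 0.5, (3, 4): 0.9}.items()}

D = corrected_distances(d, [1, 2, 3, 4])
for pair in sorted(D):
    print(pair, round(D[pair], 3))
```

Although d_12 is the smallest raw distance, the smallest corrected distances belong to the true neighbor pairs (1, 3) and (2, 4).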
12
Distance matrix-based methods: Neighbor joining
  • Initialization:
  • Define T to be the set of leaf nodes,
  • Initialize the current set of nodes as L = T.
  • Iteration:
  • Pick a pair i, j from L for which D_ij is minimal
  • Define a new node k and set
    d_km = ½(d_im + d_jm − d_ij)
    for all m in L, also compute D_km for all m.
  • Add k to T with edges joining it to i and j, of
    length
  • d_ik = ½(d_ij + r_i − r_j)
  • d_jk = d_ij − d_ik
  • Remove i and j from L and add k.
  • Termination:
  • When L consists of two nodes i and j, add the
    remaining edge between i and j, with length d_ij
  • Formulae:
  • D_ij = d_ij − (r_i + r_j), with
    r_i = (1/(|L| − 2)) Σ_{k in L} d_ik,
    where |L| is the size of the current set of
    nodes
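
The neighbor-joining procedure can be sketched as below. It is an unoptimised illustration under stated assumptions: r_i and D_ij are recomputed from scratch in every iteration, ties are broken by taking the first minimal pair, internal nodes get made-up labels `node0`, `node1`, ..., and distances are passed as a dict keyed by `frozenset` pairs. None of these choices come from the slides.

```python
from itertools import combinations

def neighbor_joining(leaves, d):
    """Build an unrooted tree; returns a list of (node, node, length) edges.

    leaves: list of leaf labels
    d:      dict mapping frozenset({a, b}) to the distance d_ab
    """
    d = dict(d)              # local copy, extended with new nodes
    active = list(leaves)    # the current node set L
    edges = []
    new_id = 0
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) / (n - 2)
             for i in active}
        # Pair with minimal D_ij = d_ij - (r_i + r_j).
        i, j = min(combinations(active, 2),
                   key=lambda p: d[frozenset(p)] - r[p[0]] - r[p[1]])
        k = f"node{new_id}"  # hypothetical label for the new internal node
        new_id += 1
        d_ij = d[frozenset((i, j))]
        d_ik = 0.5 * (d_ij + r[i] - r[j])
        edges.append((i, k, d_ik))
        edges.append((j, k, d_ij - d_ik))
        active = [m for m in active if m not in (i, j)]
        for m in active:
            d[frozenset((k, m))] = 0.5 * (
                d[frozenset((i, m))] + d[frozenset((j, m))] - d_ij)
        active.append(k)
    i, j = active
    edges.append((i, j, d[frozenset((i, j))]))   # remaining edge
    return edges

dist = {frozenset(p): v for p, v in {
    (1, 2): 0.3, (1, 3): 0.5, (1, 4): 0.6,
    (2, 3): 0.6, (2, 4): 0.5, (3, 4): 0.9}.items()}
for edge in neighbor_joining([1, 2, 3, 4], dist):
    print(edge)
```

The input here is an additive four-leaf distance matrix in which leaves 1 and 2 are closest but not neighbors. Neighbor joining recovers leaf edges of 0.1 for leaves 1 and 2, 0.4 for leaves 3 and 4, and an internal edge of 0.1.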
