Title: Phylogenetics
1 Phylogenetics
Marek Kimmel (Statistics, Rice)
kimmel_at_rice.edu
713 348 5255
2 Molecular phylogenetics
- Study of evolutionary relationships among organisms, genes, or proteins, using a combination of molecular biology and statistical techniques.
- Molecular data are more powerful: DNA and protein sequences generally evolve in a more regular manner than morphological and physiological characters. They are also more amenable to quantitative treatment.
3 Trees
- Phylogenetic relationships are usually represented in the form of trees.
- Observations are available only at the bottom of the tree (the tips). The task is to find the tree structure (topology) and the branch lengths.
4 Vocabulary of trees
- Phylogenetic tree: a graph composed of nodes and branches, in which only one branch connects any two adjacent nodes.
- Nodes: taxonomic units (TUs), usually represented by sequences (DNA or amino acid).
- Branches: relationships among units in terms of descent/ancestry.
- Topology: the branching pattern of a tree. The number of possible topologies is generally enormous; for rooted, bifurcating trees with n OTUs it is $N(n) = \frac{(2n-3)!}{2^{n-2}(n-2)!}$.
- Branch length: the number of changes along the branch.
- Terminal (external) nodes: extant TUs, i.e., OTUs (operational taxonomic units).
- Internal nodes: ancestral TUs.
- Root: a node from which a unique path leads to every other node, in the direction of evolutionary time. The root is the common ancestor of all OTUs under study.
- Rooted/unrooted tree: specifies (does not specify) the evolutionary path.
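To see how quickly the topology count grows, the Python sketch below evaluates the rooted formula for a few values of n. It also uses the standard fact that the number of unrooted topologies for n OTUs equals the rooted count for n - 1 OTUs; the function names are illustrative.

```python
from math import factorial


def n_rooted_topologies(n: int) -> int:
    """Rooted, bifurcating topologies for n labelled OTUs: (2n-3)! / (2^(n-2) (n-2)!)."""
    if n < 2:
        raise ValueError("need at least two OTUs")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def n_unrooted_topologies(n: int) -> int:
    """Unrooted count for n OTUs equals the rooted count for n - 1 OTUs."""
    if n < 3:
        raise ValueError("need at least three OTUs")
    return n_rooted_topologies(n - 1)


for n in (3, 4, 5, 10, 20):
    print(n, n_rooted_topologies(n), n_unrooted_topologies(n))
```

Already at n = 10 there are 34,459,425 rooted topologies, which is why exhaustive search over trees is feasible only for very small data sets.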
5 Phylogenetic Tree
6 Tree building methodologies
- Distance methods: construct the tree by joining sequences with small distances between them.
- Maximum parsimony: finds the tree that explains the observed data using the smallest number of substitutions.
- Maximum likelihood: chooses, among all possible trees, the one with the maximum likelihood. A probabilistic model of evolution is assumed.
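As one hedged illustration of a distance method, the sketch below implements UPGMA (average-linkage clustering) on p-distances: it repeatedly joins the two closest clusters and averages distances to the merged cluster. UPGMA is not named in the slides, it assumes a molecular clock, and the four sequences are made-up toy data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pair(a: str, b: str) -> tuple[str, str]:
    """Order-independent key for a pair of cluster ids."""
    return (a, b) if a < b else (b, a)


def upgma(seqs: dict[str, str]) -> str:
    """Join the closest pair of clusters until one remains; return a Newick string."""
    clusters = {name: (name, 1, 0.0) for name in seqs}  # id -> (label, size, height)
    dist = {pair(a, b): p_distance(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}
    for step in range(len(seqs) - 1):
        a, b = min(dist, key=dist.get)      # closest pair of clusters
        height = dist[(a, b)] / 2           # new node sits halfway (ultrametric tree)
        la, na, ha = clusters.pop(a)
        lb, nb, hb = clusters.pop(b)
        new = f"#{step}"                    # internal id; assumes no OTU is named '#k'
        # Size-weighted average distance from the merged cluster to each remaining one.
        merged = {c: (na * dist[pair(a, c)] + nb * dist[pair(b, c)]) / (na + nb)
                  for c in clusters}
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update({pair(new, c): d for c, d in merged.items()})
        label = f"({la}:{height - ha:.3f},{lb}:{height - hb:.3f})"
        clusters[new] = (label, na + nb, height)
    return next(iter(clusters.values()))[0] + ";"


seqs = {"A": "AKRNKTTHHNV", "B": "AKRNKTNHHNV", "C": "AKRDKTNHHQV", "D": "GKRDESNHHQL"}
print(upgma(seqs))  # (((A:0.045,B:0.045):0.068,C:0.114):0.144,D:0.258);
```

The closest pair (A, B) is joined first, then C, then D, so sequences with small distances end up adjacent in the tree.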
7 Overview
- Principles of maximum likelihood trees
- Variation in substitution rates: the Felsenstein-Churchill algorithm
- Finding functionally important sites in proteins: comparison with Evolutionary Trace
8 Phylogenetic tree
9 Maximum likelihood (Felsenstein's) trees
- Want to find the tree with the highest probability (i.e., likelihood) of reproducing the data on the OTUs.
- Assume independence of evolution at different sites.
- Compute the probability of a given tree site by site.
- Take the product over all sites at the end.
- If all sequences are known, the likelihood is the product of the probabilities of change in each segment times the prior probability of the initial state.
10 Model of evolution
- Start with a rate matrix (Q), which states the rates of change from one amino acid to another.
- Obtain a probability matrix via $P(t) = e^{Qt}$, or by iterating PAM1 or BLOSUM1 substitution matrices.
[Figure: over time, the sequence A K R N K T T H H N V P P A D becomes A K R N K T N H H N V P P A D through a single substitution (T to N at position 7).]
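The two routes to P(t) can be compared numerically. This sketch uses a 4-state DNA Jukes-Cantor rate matrix as a stand-in for a 20 × 20 amino acid Q (an assumption made to keep the example small), computes $e^{Qt}$ by eigendecomposition, and checks that iterating the one-time-unit matrix five times gives the same P(5).

```python
import numpy as np


def jukes_cantor_q(mu: float = 1.0) -> np.ndarray:
    """Jukes-Cantor rate matrix: each of the 3 possible changes has rate mu/3; rows sum to 0."""
    q = np.full((4, 4), mu / 3.0)
    np.fill_diagonal(q, -mu)
    return q


def transition_matrix(q: np.ndarray, t: float) -> np.ndarray:
    """P(t) = exp(Qt) via eigendecomposition; valid here because this Q is symmetric.
    A general (non-symmetric) Q would call for scipy.linalg.expm instead."""
    eigvals, vecs = np.linalg.eigh(q)
    return vecs @ np.diag(np.exp(eigvals * t)) @ vecs.T


Q = jukes_cantor_q(mu=0.1)
P1 = transition_matrix(Q, 1.0)   # one time unit, playing the role of a "PAM1-like" step
P5 = transition_matrix(Q, 5.0)

# Iterating the one-step matrix agrees with exponentiating Q*t directly.
print(np.allclose(np.linalg.matrix_power(P1, 5), P5))  # True
print(P5.sum(axis=1))                                   # each row sums to 1 (up to rounding)
```

For this model the diagonal entries have the closed form $P_{ii}(t) = \tfrac14 + \tfrac34 e^{-4\mu t/3}$, which is a convenient sanity check.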
11 Likelihood
- If all ancestors are known, the likelihood is simple to compute: L is the product of the probabilities of change along each branch of the tree times the prior probability of the state of the root ancestor, calculated site by site along the sequence.
- $P_{ij}(t)$: probability that a lineage initially having amino acid i will have amino acid j after t time units.
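With every node's sequence known, the computation on this slide is a plain product. The sketch below assumes a Jukes-Cantor model on DNA (so the root prior is 1/4), a hypothetical four-branch tree, and toy sequences; because sites are assumed independent, it sums log-likelihoods over sites rather than multiplying raw probabilities.

```python
import math


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor P_ij(t) with substitution rate 1 (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


# Hypothetical tree as (parent, child, branch length): root -> (u, leaf3), u -> (leaf1, leaf2)
EDGES = [("root", "u", 0.1), ("u", "leaf1", 0.2), ("u", "leaf2", 0.2), ("root", "leaf3", 0.3)]


def log_likelihood_known(states: dict[str, str]) -> float:
    """log L when all sequences, including ancestors, are known."""
    total = 0.0
    for s in range(len(states["root"])):
        site_l = 0.25  # prior probability of the root state under Jukes-Cantor
        for parent, child, t in EDGES:
            site_l *= jc_prob(states[parent][s], states[child][s], t)
        total += math.log(site_l)  # independent sites: log L is a sum over sites
    return total


states = {"root": "ACGT", "u": "ACGT", "leaf1": "ACGA", "leaf2": "ACGT", "leaf3": "CCGT"}
print(log_likelihood_known(states))
```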
12 Phylogenetic tree and its likelihood
13 Likelihood calculations with internal nodes unknown
- Problem: the amino acids at interior nodes are generally not known.
- Solution: generalize the likelihood by summing over all possible assignments of amino acids to the internal nodes (splits) of the tree.
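Written out for a hypothetical rooted tree with three OTUs (leaves 1, 2, 3), an internal node 4 joining leaves 1 and 2, and the root 0 joining node 4 and leaf 3, the summed likelihood of site h with observed amino acids $s_1, s_2, s_3$ is shown below. Here x is the unknown root state, y the unknown state at node 4, $\pi_x$ the prior, and $t_i$ the length of the branch above node i.

```latex
\[
L^{(h)} = \sum_{x}\sum_{y} \pi_x\, P_{xy}(t_4)\, P_{y s_1}(t_1)\, P_{y s_2}(t_2)\, P_{x s_3}(t_3),
\qquad
L = \prod_{h} L^{(h)}
\]
```

With 20 amino acids this is already 400 terms per site, and the count grows as $20^m$ for m internal nodes, which motivates restating the calculation in terms of conditional likelihoods.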
14 Phylogenetic tree and its likelihood
15 Likelihood calculation
- Restate in terms of conditional likelihoods.
- $L_s(k)$: likelihood based on the data at or above point k on the tree, given that point k is known to have amino acid s, for the specific site under consideration.
- Work up the tree from the tips to the root.
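The conditional likelihoods obey the standard pruning recursion $L_s(k) = \prod_{c} \sum_x P_{sx}(t_c)\, L_x(c)$ over the children c of k, with $L_s = 1$ at a tip that shows s and 0 otherwise; the site likelihood is then $\sum_s \pi_s L_s(\text{root})$. The Python sketch assumes a Jukes-Cantor model on DNA and a hypothetical three-leaf tree, and checks the recursion against the brute-force sum over internal states.

```python
import math
from itertools import product

STATES = "ACGT"  # DNA for brevity; the same code works for 20 amino acids


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor P_ij(t) with substitution rate 1 (assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


# Hypothetical rooted tree: internal node -> [(child, branch length), ...]
TREE = {"root": [("n4", 0.1), ("s3", 0.3)], "n4": [("s1", 0.2), ("s2", 0.2)]}


def conditional(node: str, site: dict[str, str]) -> dict[str, float]:
    """L_s(node) for every state s, using only the data at or above `node` (toward the tips)."""
    if node not in TREE:  # tip: 1 for the observed state, 0 otherwise
        return {s: 1.0 if s == site[node] else 0.0 for s in STATES}
    out = {s: 1.0 for s in STATES}
    for child, t in TREE[node]:
        lc = conditional(child, site)
        for s in STATES:
            out[s] *= sum(jc_prob(s, x, t) * lc[x] for x in STATES)
    return out


def site_likelihood(site: dict[str, str]) -> float:
    """Weight the root's conditional likelihoods by the prior pi_s = 1/4."""
    root = conditional("root", site)
    return sum(0.25 * root[s] for s in STATES)


def site_likelihood_bruteforce(site: dict[str, str]) -> float:
    """Direct sum over all states x (root) and y (node n4)."""
    return sum(0.25 * jc_prob(x, y, 0.1) * jc_prob(y, site["s1"], 0.2)
               * jc_prob(y, site["s2"], 0.2) * jc_prob(x, site["s3"], 0.3)
               for x, y in product(STATES, repeat=2))


site = {"s1": "A", "s2": "A", "s3": "G"}
print(site_likelihood(site), site_likelihood_bruteforce(site))
```

Pruning touches each branch once per pair of states, so its cost grows linearly with the number of nodes instead of exponentially.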
16 Pulley principle
- Any segment of an unrooted tree can contain the root.
- This is related to the reversibility of the evolutionary model.
- It allows us to alter the lengths of branches in an optimal fashion.
- An algorithm can therefore be constructed that alters the branches one at a time to find the tree with the highest likelihood.
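A small numerical check of the pulley principle, under a reversible Jukes-Cantor model (assumption): for two sequences, sliding the root anywhere along the path that connects them leaves the likelihood unchanged, so only the total branch length matters. The sketch then optimizes that single branch by golden-section search (one illustrative choice of 1-D optimizer) and compares the result with the closed-form Jukes-Cantor distance $-\tfrac34 \ln(1 - \tfrac43 p)$, where p is the proportion of differing sites. Sequences and function names are hypothetical.

```python
import math


def jc_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor P_ij(t); the model is reversible with uniform prior 1/4."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def pair_loglik(a: str, b: str, t1: float, t2: float) -> float:
    """Log-likelihood of two sequences whose common ancestor sits t1 from a and t2 from b."""
    return sum(math.log(sum(0.25 * jc_prob(r, x, t1) * jc_prob(r, y, t2) for r in "ACGT"))
               for x, y in zip(a, b))


def golden_max(f, lo: float, hi: float, iters: int = 100) -> float:
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if f(c) > f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2


a, b = "ACGTACGTAC", "ACGTTCGAAC"
# Pulley principle: with t1 + t2 fixed, the root position does not change the likelihood.
print([pair_loglik(a, b, f * 0.4, (1 - f) * 0.4) for f in (0.0, 0.25, 0.5, 1.0)])

# Optimize the single branch numerically and compare with the closed form.
t_hat = golden_max(lambda t: pair_loglik(a, b, t, 0.0), 1e-6, 5.0)
p = sum(x != y for x, y in zip(a, b)) / len(a)
print(t_hat, -0.75 * math.log(1 - 4 * p / 3))
```

On a larger tree the same 1-D optimization would be applied to one branch at a time, each time placing the root on the branch being adjusted.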
17 Are likelihood trees significant? Non-parametric (Felsenstein's) bootstrap
MSA:
Species 1: A T K
Species 2: A R K
Species 3: K T F
Sampling columns with replacement gives a bootstrapped MSA (here columns 2, 3, 2 of the original):
Species 1: T K T
Species 2: R K R
Species 3: T F T
Bootstrapping allows investigating the influence of misalignments.
18 Non-parametric (Felsenstein's) bootstrap
- For each resampled MSA, a new bootstrap tree is computed.
- Bootstrapping allows investigating the influence of misalignments on tree structure and branch lengths.
- Various statistics can be collected, e.g., the fraction of trees that include a given branch (bootstrap support for the branch).
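The resampling step and the support statistic can be sketched in a few lines, using the three-species MSA from the previous slide. Building a real tree per replicate is out of scope here, so a hypothetical stand-in "tree builder" simply returns the clade of the two most similar species; any real method (distance, parsimony, ML) would slot in at that point.

```python
import random
from itertools import combinations

MSA = {"Species 1": "ATK", "Species 2": "ARK", "Species 3": "KTF"}


def resample_columns(msa: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(msa.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in msa.items()}


def closest_pair(msa: dict[str, str]) -> frozenset:
    """Hypothetical stand-in for a tree builder: the clade of the two most similar
    species (ties go to the pair that sorts first by name)."""
    def n_diff(names):
        x, y = names
        return sum(u != v for u, v in zip(msa[x], msa[y]))
    return frozenset(min(combinations(sorted(msa), 2), key=n_diff))


def bootstrap_support(msa, clade, n_rep=1000, seed=1):
    """Fraction of bootstrap replicates whose tree contains `clade`."""
    rng = random.Random(seed)
    hits = sum(closest_pair(resample_columns(msa, rng)) == clade for _ in range(n_rep))
    return hits / n_rep


print(bootstrap_support(MSA, frozenset({"Species 1", "Species 2"})))
```

A fixed seed makes the support value reproducible; in practice a few hundred to a thousand replicates are typical.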
19 Hidden Markov Model
Simultaneous estimation of phylogeny and site-specific substitution rates
- Felsenstein, J. and Churchill, G. (1996) A Hidden Markov Model Approach to Variation Among Sites in Rate of Evolution. Molecular Biology and Evolution, 13(1): 93-104.
- Allows for unequal, unknown evolutionary rates at different sites.
- Allows for correlations between rates at neighboring sites.
- Uses a Markov process to assign rates to sites.
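A minimal sketch of the HMM idea: rate categories are hidden states along the sequence, and the forward-backward algorithm sums over all rate assignments while also giving per-site posterior rates. The transition rule used here (keep the same category with probability `lam`, otherwise redraw it from the prior) is an assumption for illustration, and the per-site likelihoods are toy numbers that would in practice come from running Felsenstein's algorithm once per rate category.

```python
def forward_backward(site_lik, prior, lam):
    """site_lik[h][j]: likelihood of site h if it evolves at rate category j.
    prior[j]: stationary probability of category j.
    lam: autocorrelation; the next site keeps the same category with probability
         lam, otherwise it redraws a category from `prior` (assumed transition rule).
    Returns the total likelihood and per-site posterior category probabilities."""
    k, n = len(prior), len(site_lik)

    def trans(i, j):
        return lam * (i == j) + (1 - lam) * prior[j]

    # Forward pass (unscaled for brevity; long alignments need scaling or logs).
    alpha = [[prior[j] * site_lik[0][j] for j in range(k)]]
    for h in range(1, n):
        alpha.append([site_lik[h][j] * sum(alpha[-1][i] * trans(i, j) for i in range(k))
                      for j in range(k)])
    total = sum(alpha[-1])

    # Backward pass: beta[0] always holds the values for the earliest site computed so far.
    beta = [[1.0] * k]
    for h in range(n - 1, 0, -1):
        beta.insert(0, [sum(trans(i, j) * site_lik[h][j] * beta[0][j] for j in range(k))
                        for i in range(k)])

    post = [[alpha[h][j] * beta[h][j] / total for j in range(k)] for h in range(n)]
    return total, post


rates = [0.2, 1.0, 3.0]                     # hypothetical rate categories
prior = [1 / 3, 1 / 3, 1 / 3]
site_lik = [[0.20, 0.10, 0.02], [0.18, 0.09, 0.03], [0.01, 0.04, 0.06]]  # toy values
total, post = forward_backward(site_lik, prior, lam=0.7)
# Posterior mean rate per site.
print([round(sum(p * r for p, r in zip(row, rates)), 3) for row in post])
```

With `lam = 0` the sites become independent and the total reduces to the product over sites of $\sum_j f_j L_h(r_j)$, the usual discrete mixture of rates.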
20 HMM graphical depiction
21 PAML method
- Yang, Z. (2000) Phylogenetic Analysis by Maximum Likelihood (PAML), version 3.0. University College London, London, England.
- Uses Felsenstein's algorithm to reconstruct the phylogeny.
- Allows each site to evolve at a different rate.
- Returns the rate of evolution for each site.
22 Evolutionary Trace
- Lichtarge, O., Bourne, H.R., and Cohen, F.E. (1996) An Evolutionary Trace Method Defines Binding Surfaces Common to Protein Families. Journal of Molecular Biology, 257: 342-358.
- Identifies active sites.
- Looks for residues conserved within branches of the tree.
- Maps functionally important residues (FIR) onto the protein surface.
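The "conserved within branches" idea can be sketched as a rank per alignment column: cut the tree into 1, 2, 3, ... groups and record the first level at which the column is invariant inside every group, so that low ranks mark candidate functionally important residues. The partition list below is hypothetical input (in practice it comes from cutting the family's tree), and this is a simplified reading of the method, not the published implementation.

```python
def et_ranks(msa: dict[str, str], partitions: list[list[set[str]]]) -> list[int]:
    """partitions[n - 1] holds the n groups obtained by cutting the tree into n
    branches (hypothetical input). A column's rank is the smallest n at which it is
    invariant within every group; rank 1 means conserved across the whole family."""
    length = len(next(iter(msa.values())))
    ranks = []
    for col in range(length):
        for n, groups in enumerate(partitions, start=1):
            if all(len({msa[s][col] for s in g}) == 1 for g in groups):
                ranks.append(n)
                break
        else:  # never invariant at any listed level
            ranks.append(len(partitions) + 1)
    return ranks


msa = {"s1": "CAKRN", "s2": "CAKRD", "s3": "CAQKD", "s4": "CGQKD"}  # toy alignment
partitions = [
    [{"s1", "s2", "s3", "s4"}],
    [{"s1", "s2"}, {"s3", "s4"}],
    [{"s1", "s2"}, {"s3"}, {"s4"}],
    [{"s1"}, {"s2"}, {"s3"}, {"s4"}],
]
print(et_ranks(msa, partitions))  # [1, 3, 2, 2, 4]
```

Columns 3 and 4 (rank 2) differ between the two main branches but are fixed within each, which is the kind of class-specific residue the trace highlights.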
27 Trace Integral (ET reduced to a parameter)
The integral is the area under the graph.
29 Substitution rates and ET ranks along the amino acid sequence
30 Comparison of functional sites found by ET vs. HMM
Epitope (true active region)
31 Remarks
- ML is a powerful technique for investigating phylogeny.
- The computational difficulties are serious and require much machine power.
- Insights can be gained into various aspects of evolution.
32 Maximum Parsimony
- No model of protein evolution.
- Selects the tree that minimizes the number of transformations from one state to another, summed over all sites.
[Figure: the same example as on slide 10, in which A K R N K T T H H N V P P A D becomes A K R N K T N H H N V P P A D through a single substitution.]
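The slide states the criterion but not how to count transformations. One standard way to get the minimum number of substitutions on a fixed tree is Fitch's algorithm (not named in the source): intersect the children's candidate state sets, and count one change whenever the intersection is empty. The sketch scores all three unrooted topologies of four toy sequences (each written rooted, which does not change the count) and keeps the smallest.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one site on a rooted binary tree.
    tree: nested 2-tuples of leaf names; states: leaf name -> character.
    Returns (candidate state set, minimum number of substitutions)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    (left, c1), (right, c2) = (fitch(child, states) for child in tree)
    common = left & right
    if common:
        return common, c1 + c2
    return left | right, c1 + c2 + 1


def parsimony_score(tree, msa):
    """Sum the minimum substitution counts over all alignment columns."""
    length = len(next(iter(msa.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in msa.items()})[1]
               for i in range(length))


msa = {"A": "AKRNV", "B": "AKRDV", "C": "GQKDV", "D": "GQKNV"}  # toy alignment
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
scores = [parsimony_score(t, msa) for t in candidates]
print(scores, "best:", candidates[scores.index(min(scores))])
# [5, 8, 7] best: (('A', 'B'), ('C', 'D'))
```

Exhaustive scoring like this only works for a handful of OTUs, given how fast the number of topologies grows; larger problems rely on heuristic tree searches.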