Phylogenetics. Marek Kimmel (Statistics, Rice), kimmel_at_rice.edu, 713 348 5255 - PowerPoint PPT Presentation

Slides: 33
Provided by: spr124

1
Phylogenetics
Marek Kimmel (Statistics, Rice)
kimmel_at_rice.edu
713 348 5255
2
Molecular phylogenetics
  • Study of evolutionary relationships among
    organisms, genes or proteins, by a combination of
    molecular biology and statistical techniques
  • Molecular data are more powerful: DNA and protein
    sequences generally evolve in a more regular
    manner than morphological and physiological
    characters. They are also more amenable to
    quantitative treatment.

3
Trees
  • Phylogenetic relationships are usually
    represented in the form of trees.
  • Observations are available only at the bottom (tips) of the
    tree. The task is to find the tree structure (topology) and
    the branch lengths.

4
Vocabulary of trees
  • Phylogenetic tree: A graph composed of nodes and
    branches, in which only one branch connects any
    two adjacent nodes.
  • Nodes: Taxonomic units (TUs). Usually represented by
    sequences (DNA or amino acid).
  • Branches: Relationships among units in terms of
    descent/ancestry.
  • Topology: Branching pattern of a tree. The number of
    topologies is generally enormous; for rooted
    bifurcating trees with n OTUs it is
    $N(n) = \frac{(2n-3)!}{2^{n-2}(n-2)!}$.
  • Branch length: Number of changes in the branch.
  • Terminal (external) nodes: Extant TUs, called OTUs
    (operational taxonomic units).
  • Internal nodes: Ancestral TUs.
  • Root: A node from which a unique path leads to any
    other node, in the direction of evolutionary
    time. The root is the common ancestor of all
    OTUs under study.
  • Rooted/unrooted tree: Specifies (does not
    specify) the evolutionary path.
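
To give a feel for how fast the number of topologies grows, here is a minimal sketch that evaluates the count of rooted bifurcating trees, (2n-3)! / (2^(n-2) (n-2)!), for a few values of n. The function name is made up for this illustration.

```python
from math import factorial

def num_rooted_topologies(n: int) -> int:
    """Number of rooted bifurcating trees for n >= 2 OTUs."""
    if n < 2:
        raise ValueError("need at least two OTUs")
    # (2n-3)! / (2^(n-2) * (n-2)!) is always an integer, so // is exact.
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_rooted_topologies(n))
```

Already at n = 10 the count is above 34 million, which is why exhaustive search over topologies is rarely feasible.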

5
Phylogenetic Tree
6
Tree building methodologies
  • Distance methods: construct the tree by joining
    sequences with a small distance between them.
  • Maximum parsimony: finds the tree which explains the
    observed data using the smallest number of substitutions.
  • Maximum likelihood: finds the tree with the highest
    likelihood among all possible trees. A probabilistic
    model of evolution is assumed.

7
Overview
  • Principles of maximum likelihood trees
  • Variation in substitution rates:
    Felsenstein-Churchill algorithm
  • Finding functionally important sites in proteins:
    comparison with Evolutionary Trace

8
Phylogenetic tree
9
Maximum likelihood (Felsenstein's) trees
  • Want to find a tree with the highest probability
    (i.e., likelihood) of reproducing data on OTUs
  • Assume independence of evolution at different
    sites.
  • Compute the probability of the data for a given tree site by site.
  • Take the product over all sites at the end.
  • If all sequences are known (including ancestral ones),
    the likelihood is the product of the probabilities of
    change in each segment, times the prior probability of
    the initial state.

10
Model of evolution
  • Start with a rate matrix Q, which gives
    the rates of change from one amino acid to
    another
  • Obtain a probability matrix via $P(t) = e^{Qt}$, or by
    iterating PAM1 or BLOSUM1 substitution matrices

A K R N K T T H H N V P P A D
substitution
time
A K R N K T N H H N V P P A D
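
As an illustration of turning a rate matrix into a probability matrix, the sketch below computes exp(Qt) with a plain Taylor series plus repeated squaring, and also by iterating a one-step matrix. Everything here is an assumption made for brevity: the 4-state equal-rates Q, the value of ALPHA, and the helper names. A real amino-acid model would use a 20×20 matrix, and a numerical library would normally handle the matrix exponential.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(Q t): Taylor series on Q t / 2^squarings, then repeated squaring."""
    n = len(q)
    scale = t / 2 ** squarings
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term now holds A^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical 4-state rate matrix: every change has rate ALPHA, rows sum to zero.
ALPHA = 0.1
N = 4
Q = [[-(N - 1) * ALPHA if i == j else ALPHA for j in range(N)] for i in range(N)]

P_T = mat_exp(Q, t=1.0)

# Alternative route: iterate a one-step matrix (here P(0.01)) 100 times.
P_STEP = mat_exp(Q, t=0.01)
P_ITER = [[float(i == j) for j in range(N)] for i in range(N)]
for _ in range(100):
    P_ITER = mat_mul(P_ITER, P_STEP)

# Closed form for this particular equal-rates Q, used only as a check.
P_SAME = 1 / N + (1 - 1 / N) * math.exp(-N * ALPHA * 1.0)
print(P_T[0][0], P_ITER[0][0], P_SAME)
```

The three printed numbers should agree closely, showing that exponentiating Q and iterating a short-time matrix are two routes to the same P(t).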
11
Likelihood
  • If all ancestors are known, the likelihood is simple
    to compute
  • L = product of the probabilities of change along each
    branch of the tree, times the prior probability of the
    given ancestor, calculated site by site along the sequence
  • $P_{ij}(t)$ = probability that a lineage initially
    having amino acid i will have amino acid j after
    t time units
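
A minimal sketch of this calculation follows, assuming the ancestral sequences are known. The tree, branch lengths, ancestral sequences, and the equal-rates form of P_ij(t) are all hypothetical choices for illustration, not part of the original slides.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO)  # 20 amino acids

def p_change(i, j, t, alpha=0.05):
    """P_ij(t) under an equal-rates model (assumption): every change has rate alpha."""
    decay = math.exp(-K * alpha * t)
    return 1 / K + (1 - 1 / K) * decay if i == j else (1 - decay) / K

# Hypothetical tree: (parent, child, branch length).
BRANCHES = [("root", "anc", 0.1), ("anc", "sp1", 0.2),
            ("anc", "sp2", 0.2), ("root", "sp3", 0.3)]

# Sequences at every node; the two ancestral sequences are made up.
SEQS = {"root": "ATK", "anc": "ATK", "sp1": "ATK", "sp2": "ARK", "sp3": "KTF"}

def site_likelihood(site):
    """Prior of the root state times P_ij(t) along every branch, for one site."""
    lik = 1 / K  # uniform prior on the root amino acid (assumption)
    for parent, child, t in BRANCHES:
        lik *= p_change(SEQS[parent][site], SEQS[child][site], t)
    return lik

def log_likelihood():
    """Sites are independent, so the log-likelihood is a sum over sites."""
    n_sites = len(SEQS["root"])
    return sum(math.log(site_likelihood(s)) for s in range(n_sites))

print(log_likelihood())
```

Working with the sum of logs rather than the raw product avoids numerical underflow on long alignments.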

12
Phylogenetic tree and its likelihood
13
Likelihood calculations with internal nodes unknown
  • Problem: amino acids at interior nodes are not
    generally known
  • Solution: generalize the likelihood by
    summing over all possible assignments of amino acids
    to the interior nodes (splits) of the tree
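
The sum over assignments can be written out directly for a small tree. The sketch below is a brute-force version under stated assumptions: a hypothetical three-species tree with two interior nodes, a uniform prior at the root, and an equal-rates P_ij(t). With 20 amino acids and two interior nodes it visits 400 assignments per site, which is fine here but grows exponentially with the number of interior nodes.

```python
import itertools
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO)

def p_change(i, j, t, alpha=0.05):
    """P_ij(t) under an equal-rates model (assumption)."""
    decay = math.exp(-K * alpha * t)
    return 1 / K + (1 - 1 / K) * decay if i == j else (1 - decay) / K

# Hypothetical tree: two interior nodes, three observed species.
INTERNAL = ["root", "anc"]
BRANCHES = [("root", "anc", 0.1), ("anc", "sp1", 0.2),
            ("anc", "sp2", 0.2), ("root", "sp3", 0.3)]

def site_likelihood(leaf_states, branches=BRANCHES):
    """Sum the known-ancestor likelihood over every assignment to interior nodes."""
    total = 0.0
    for assignment in itertools.product(AMINO, repeat=len(INTERNAL)):
        states = dict(leaf_states)
        states.update(zip(INTERNAL, assignment))
        term = 1 / K  # uniform prior on the root state (assumption)
        for parent, child, t in branches:
            term *= p_change(states[parent], states[child], t)
        total += term
    return total

print(site_likelihood({"sp1": "A", "sp2": "A", "sp3": "K"}))
```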

14
Phylogenetic tree and its likelihood
15
Likelihood calculation
  • Restate in terms of conditional likelihoods
  • $L_s(k)$ = likelihood based on data at or above
    point k on the tree, given that point k is known to have
    amino acid s at the specific site under
    consideration
  • Work up the tree from tips to root
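
The recursion on conditional likelihoods can be sketched as below. This is an illustrative implementation, not the presenter's code: the nested-tuple tree encoding, the branch lengths, the uniform root prior, and the equal-rates P_ij(t) are all assumptions.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO)

def p_change(i, j, t, alpha=0.05):
    """P_ij(t) under an equal-rates model (assumption)."""
    decay = math.exp(-K * alpha * t)
    return 1 / K + (1 - 1 / K) * decay if i == j else (1 - decay) / K

# A node is either a species name (tip) or a tuple of (child, branch_length) pairs.
TREE = (((("sp1", 0.2), ("sp2", 0.2)), 0.1), ("sp3", 0.3))

def conditional(node, leaf_states):
    """Return {s: L_s(node)}, computed from the tips towards the root."""
    if isinstance(node, str):
        # At a tip the observed amino acid has likelihood 1, all others 0.
        return {s: 1.0 if s == leaf_states[node] else 0.0 for s in AMINO}
    result = {s: 1.0 for s in AMINO}
    for child, t in node:
        child_l = conditional(child, leaf_states)
        for s in AMINO:
            result[s] *= sum(p_change(s, x, t) * child_l[x] for x in AMINO)
    return result

def site_likelihood(tree, leaf_states):
    """Weight the root's conditional likelihoods by a uniform prior (assumption)."""
    root_l = conditional(tree, leaf_states)
    return sum(root_l[s] / K for s in AMINO)

print(site_likelihood(TREE, {"sp1": "A", "sp2": "A", "sp3": "K"}))
```

Each interior node is visited once, so the cost grows linearly with the number of nodes instead of exponentially as in a direct sum over all assignments.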

16
Pulley principle
  • Any segment of an unrooted tree can contain the
    root
  • Related to reversibility of the evolutionary
    model
  • Allows us to alter the length of branches in an
    optimal fashion
  • Can construct an algorithm to alter branches one
    at a time to find tree with highest likelihood
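
The first point can be checked numerically in the simplest case. In the sketch below, two tips are joined through a root placed at different positions along the single branch between them; under a reversible model the likelihood depends only on the total branch length. The equal-rates P_ij(t), the uniform prior, and the function names are assumptions for illustration.

```python
import math

AMINO = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO)

def p_change(i, j, t, alpha=0.05):
    """P_ij(t) under an equal-rates (hence reversible) model (assumption)."""
    decay = math.exp(-K * alpha * t)
    return 1 / K + (1 - 1 / K) * decay if i == j else (1 - decay) / K

def two_tip_likelihood(x, y, t1, t2):
    """Likelihood of tips x and y with the root at distance t1 from x and t2 from y."""
    return sum((1 / K) * p_change(s, x, t1) * p_change(s, y, t2) for s in AMINO)

TOTAL = 0.8  # total length of the branch joining the two tips
for frac in (0.0, 0.25, 0.5, 1.0):
    # Slide the root along the branch; the printed likelihood should not change.
    print(frac, two_tip_likelihood("A", "K", frac * TOTAL, (1 - frac) * TOTAL))
```

Because the root position does not matter, each branch length can be treated as a single parameter and adjusted on its own.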

17
Are likelihood trees significant? Non-parametric
(Felsenstein's) bootstrap
MSA
  Species 1   A T K
  Species 2   A R K
  Species 3   K T F
Sampling columns with replacement
Bootstrapped MSA
  Species 1   T K T
  Species 2   R K R
  Species 3   T F T
Bootstrapping allows investigating the influence
of misalignments
18
Non-parametric (Felsenstein's) bootstrap
  • For each resampled MSA a new bootstrap tree is
    computed.
  • Bootstrapping allows investigating the influence
    of misalignments on tree structure and branch
    length.
  • Various statistics can be collected, e.g.,
    fraction of trees including given branch
    (bootstrap support for the branch).
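
The resampling step and the support statistic can be sketched as follows. The tree-building step is replaced here by a deliberately crude stand-in, closest_pair, which just reports the two species with the fewest differences; with three species this determines which pair is grouped. In practice a full tree (for example a maximum likelihood tree) would be computed for each replicate. The alignment and all function names are hypothetical.

```python
import itertools
import random

def bootstrap_msa(msa, rng):
    """Resample alignment columns with replacement, keeping each column intact."""
    n_cols = len(next(iter(msa.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in msa.items()}

def closest_pair(msa):
    """Stand-in for tree building: the pair of species with the fewest differences."""
    def distance(pair):
        a, b = pair
        return sum(x != y for x, y in zip(msa[a], msa[b]))
    return frozenset(min(itertools.combinations(sorted(msa), 2), key=distance))

def bootstrap_support(msa, pair, replicates=200, seed=0):
    """Fraction of bootstrap replicates in which the given pair is grouped."""
    rng = random.Random(seed)
    hits = sum(closest_pair(bootstrap_msa(msa, rng)) == frozenset(pair)
               for _ in range(replicates))
    return hits / replicates

# Hypothetical alignment, longer than the slide example so resampling is informative.
MSA = {"sp1": "ATKLVGAT", "sp2": "ARKLVGAS", "sp3": "KTFMIGCS"}
print(bootstrap_support(MSA, ("sp1", "sp2")))
```

Sampling whole columns, rather than individual residues, preserves the relationship among species at each site.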

19
Hidden Markov Model
Simultaneous estimation of phylogeny and site
specific substitution rates
  • Felsenstein, J. and Churchill, G. (1996) A Hidden
    Markov Model Approach to Variation Among Sites in
    Rate of Evolution. Molecular Biology and
    Evolution, 13(1): 93-104.
  • Allows for unequal, unknown evolutionary rates
    at different sites
  • Allows for correlations between rates at
    neighboring sites
  • Uses Markov process to assign rates to sites

20
HMM graphical depiction
21
PAML method
  • Yang, Z. (2000) Phylogenetic Analysis by Maximum
    Likelihood (PAML) 3.0, University College
    London, London, England.
  • Uses Felsenstein's algorithm to reconstruct
    phylogeny.
  • Allows each site to evolve at a different
    rate.
  • Returns rates of evolution for each site.

22
Evolutionary Trace
  • Lichtarge, O., Bourne, H.R., and Cohen, F.E.
    (1996) An Evolutionary Trace Method Defines
    Binding Surfaces Common to Protein Families.
    Journal of Molecular Biology, 257: 342-358.
  • Identifies active sites
  • Looks for conserved residues in branches
  • Maps functionally important residues (FIR) onto
    surface

27
Trace Integral (ET reduced to a parameter)
The integral is the area under the graph.
29
Substitution rates and ET ranks along the
amino acid sequence
30
Comparison of functional sites by ET vs. HMM
Epitope (true active region)
31
Remarks
  • ML is a powerful technique to investigate
    phylogeny.
  • Computational difficulties are serious and
    require much machine power.
  • Insights can be gained into various aspects of
    evolution.

32
Maximum Parsimony
  • No model of protein evolution
  • Selects the tree which minimizes the number of transformations
    from one state to another at all sites

A K R N K T T H H N V P P A D
substitution
time
A K R N K T N H H N V P P A D
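
For a fixed tree, the minimum number of changes can be counted with Fitch's algorithm, a standard technique that the slides do not name; it is used here only as an illustration. The nested-tuple tree encoding and function names are assumptions, and the three short sequences are the ones from the bootstrap slide.

```python
def fitch(node, leaf_states):
    """Return (possible ancestral states, minimum changes) for a binary subtree."""
    if isinstance(node, str):
        return {leaf_states[node]}, 0
    left, right = node
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children can share a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # Otherwise one change is required on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, msa):
    """Total minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(msa.values())))
    return sum(fitch(tree, {name: seq[i] for name, seq in msa.items()})[1]
               for i in range(n_sites))

MSA = {"sp1": "ATK", "sp2": "ARK", "sp3": "KTF"}
TREE = (("sp1", "sp2"), "sp3")
print(parsimony_score(TREE, MSA))
```

Maximum parsimony would evaluate this score for every candidate topology and keep the smallest. With only three species all rooted trees share one unrooted shape, so the score is the same for each of them.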