Models of Protein Evolution - PowerPoint PPT Presentation

Provided by: guille3

1
Models of Protein Evolution
  • Amino acid sequences (20 amino acids)
  • Protein-coding DNA sequences

2
Models for Amino Acid Sequences
  • DNA (4 x 4 rate matrix) vs amino acid (20 x 20
    rate matrix): many more parameters and thus
    more computation time
  • Consequently, amino acid models have
    concentrated on empirical approaches
  • EMPIRICAL (model fixed; several implemented in
    MrBayes)
  • NON-EMPIRICAL (model variable, i.e. parameters
    estimated, in MrBayes)

3
Models for Amino Acid Sequences
  • EMPIRICAL (model fixed; several implemented in
    MrBayes): 20 x 20 matrices
  • Dayhoff et al. (1978): matrix based on the
    observation of 1572 accepted mutations among 34
    superfamilies of closely related sequences
  • JTT matrix (Jones et al. 1992; Gonnet et al.
    1992): same methodology as Dayhoff, but applied
    to more modern databases (later modified for
    transmembrane proteins: Jones et al. 1994)
  • mtREV (Adachi and Hasegawa 1995, 1996): matrix
    derived from maximum-likelihood-inferred
    replacements in the mitochondrial proteins of 20
    vertebrate species
  • WAG (Whelan and Goldman 2001): matrix estimated
    with a maximum likelihood approach, an
    improvement over JTT
  • Poisson: assumes equal stationary state
    frequencies and equal substitution rates
    (equivalent to the JC model for DNA). Not really
    empirical, but it is fixed
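Because the Poisson model gives every amino acid the same frequency (1/20) and every pair of amino acids the same rate, its transition probabilities have a closed form, just as JC does for four nucleotides. A sketch, assuming time t is measured in expected substitutions per site (a standard normalization, not stated on the slide):

```latex
P_{ii}(t) = \frac{1}{20} + \frac{19}{20}\,e^{-20t/19},
\qquad
P_{ij}(t) = \frac{1}{20} - \frac{1}{20}\,e^{-20t/19} \quad (i \neq j)
```

Each row sums to 1, and inverting the expected proportion of differing sites, p = 1 - P_ii(t), gives the corresponding distance t = -(19/20) ln(1 - 20p/19).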

4
Dayhoff Evolutionary Mutation Matrix
5
Models for Amino Acid Sequences
  • 2. NON-EMPIRICAL (model variable in MrBayes)
  • Equalin (in MrBayes): substitution rates are
    equal, but adjusted for the amino acid
    equilibrium frequencies in the dataset
    (equivalent to F81 for DNA)
  • Cao et al. (2004) and Goldman and Whelan (2002):
    empirical matrices adjusted for the equilibrium
    frequencies of amino acids in the dataset
    (similar to the base frequencies in a GTR matrix;
    19 additional free frequency parameters)
  • GTR: allows all stationary state frequencies and
    substitution rates to vary (MANY parameters: 19
    free stationary state frequency parameters and
    189 free substitution rate parameters)
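The parameter counts above follow from a simple rule: with n states, the stationary frequencies must sum to 1 (n - 1 free parameters), and a time-reversible GTR matrix has n(n - 1)/2 exchangeabilities, one of which is absorbed when the overall rate is fixed. A minimal sketch (the function name is hypothetical) that reproduces the DNA and amino acid numbers and applies the same rule to the 61 sense codons of the universal code:

```python
def gtr_free_parameters(n_states: int) -> tuple[int, int]:
    """Return (free frequency parameters, free rate parameters) for GTR on n states."""
    # n frequencies constrained to sum to 1 -> n - 1 free
    freq_params = n_states - 1
    # n(n - 1)/2 symmetric exchangeabilities; one is fixed to set the time scale
    rate_params = n_states * (n_states - 1) // 2 - 1
    return freq_params, rate_params


for label, n in [("DNA", 4), ("amino acids", 20), ("sense codons", 61)]:
    f, r = gtr_free_parameters(n)
    print(f"{label:13s} n={n:2d}  frequencies={f:3d}  rates={r:5d}")
```

The jump from 5 rate parameters (DNA) to 189 (amino acids) is what pushes protein analyses toward fixed empirical matrices.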

6
Equalin instantaneous rate matrix
7
GTR instantaneous rate matrix
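Rate matrices like the Equalin and GTR matrices named on these slides are usually assembled for time-reversible models as follows (a standard construction assumed here, not MrBayes' own code): each off-diagonal entry is Q_ij = r_ij * pi_j, the diagonal makes every row sum to zero, and the whole matrix is scaled to a mean rate of 1. With all r_ij equal this gives Poisson (equal frequencies) or Equalin (observed frequencies); free r_ij give GTR. The function `rate_matrix` is hypothetical.

```python
from itertools import combinations


def rate_matrix(freqs, exch=None):
    """Return a normalized instantaneous rate matrix Q as a list of lists.

    freqs: stationary frequencies pi (must sum to 1).
    exch:  dict {(i, j): r_ij} with i < j (symmetric exchangeabilities);
           None means all r_ij = 1 (Poisson if freqs are equal, Equalin otherwise).
    """
    n = len(freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    if exch is None:
        exch = {pair: 1.0 for pair in combinations(range(n), 2)}
    q = [[0.0] * n for _ in range(n)]
    for (i, j), r in exch.items():
        # time-reversible model: Q[i][j] = r_ij * pi_j and Q[j][i] = r_ij * pi_i
        q[i][j] = r * freqs[j]
        q[j][i] = r * freqs[i]
    for i in range(n):
        # diagonal entries make every row sum to zero
        q[i][i] = -sum(q[i][k] for k in range(n) if k != i)
    # rescale so the mean rate, -sum_i pi_i * Q[i][i], equals 1
    mu = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mu for x in row] for row in q]


poisson = rate_matrix([1 / 20] * 20)
print(round(poisson[0][1], 6))  # 0.052632, i.e. 1/19 for every i != j
```

For a full amino acid GTR one would pass all 190 exchangeabilities; the final rescaling makes one of them redundant, which matches the 189 free rate parameters.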
8
Phylogeny inference software for Amino Acid Sequences
  • Parsimony: PAUP and others
  • Maximum Likelihood
  • Molphy (UNIX only)
  • TreePuzzle: uses quartet puzzling, a different
    search strategy
  • PhyML (web-based or downloadable binary): uses a
    different search algorithm
  • RAxML
  • ProtML (very old; searches are probably slow)
  • Bayesian: MrBayes
  • Distance: MEGA
  • www.megasoftware.net
  • implements empirical models (Dayhoff and JTT) as
    well as the Poisson model for inferring pairwise
    distances
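For pairwise protein distances, a minimal sketch of two corrections (helper names are hypothetical, and this is not MEGA's code): the Poisson-correction distance, commonly given as d = -ln(1 - p), and the JC-style correction implied by an equal-rates, equal-frequency 20-state model, d = -(19/20) ln(1 - 20p/19). Both take p, the proportion of differing sites. Skipping columns with gaps or 'X' is an assumption about how missing data are handled.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap ('-') or 'X'."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a not in "-X" and b not in "-X"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def poisson_correction(p: float) -> float:
    """Poisson-correction distance, d = -ln(1 - p); undefined for p >= 1."""
    return -math.log(1.0 - p)


def equal_rates_correction(p: float, n_states: int = 20) -> float:
    """JC-style distance for n equally frequent, equally exchangeable states:
    d = -((n - 1) / n) * ln(1 - n * p / (n - 1)); undefined for p >= (n - 1) / n."""
    b = (n_states - 1) / n_states
    return -b * math.log(1.0 - p / b)


p = p_distance("MKVLAAGIVG", "MKILAAGLVG")
print(p, round(poisson_correction(p), 4), round(equal_rates_correction(p), 4))
# 0.2 0.2231 0.2246
```

With n_states=4 the second function reduces to the familiar JC distance for DNA.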

9
Models of Protein Evolution
  • Amino acid sequences (20 amino acids)
  • Protein-coding DNA sequences
  • Codon-position models (4 nucleotides)
  • Codon-based models (64 codons)

10
1. Codon-position models (4 nucleotides)
  • 1st, 2nd, and 3rd codon positions are treated
    differently
  • Equivalent to defining data partitions, just as
    one defines different partitions for different
    genes (covered in the Data Partition lecture)
  • Uses NO information from the genetic code
  • Each codon position can have
  • a different rate (partition by codon position in
    PAUP and in MrBayes), but with the same base
    frequencies and substitution matrix for all
    codon positions
  • a different substitution matrix (MrBayes but not
    PAUP)
  • different base frequencies (MrBayes but not
    PAUP)
  • a different gamma distribution (MrBayes but not
    PAUP)
  • or a combination of the above (MrBayes but not
    PAUP)
  • See Shapiro et al. (2006)
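Partitioning by codon position amounts to splitting the alignment columns into three sets. A minimal sketch, assuming an in-frame alignment that starts at codon position 1 (the function name is hypothetical); each resulting partition can then be given its own rate, or in MrBayes its own matrix, frequencies, or gamma shape:

```python
def codon_position_partitions(alignment: dict[str, str]) -> dict[int, dict[str, str]]:
    """Split an in-frame coding alignment into partitions for codon positions 1, 2, 3."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (equal length)")
    (length,) = lengths
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of 3 (not in frame?)")
    # seq[pos::3] takes every third column, starting at codon position pos + 1
    return {pos + 1: {name: seq[pos::3] for name, seq in alignment.items()} for pos in range(3)}


parts = codon_position_partitions({"taxonA": "ATGGCCTTA", "taxonB": "ATGGCTTTG"})
print(parts[3])  # {'taxonA': 'GCA', 'taxonB': 'GTG'}
```

Note that nothing here consults the genetic code; the columns are simply grouped by position, which is exactly why these models use no genetic-code information.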

11
2. Codon-based models (64 codons)
12
2. Codon-based models
  • Consider a codon triplet as the unit of evolution
  • A codon can change to another only through steps
    of one nucleotide change at a time
  • Distinguish between synonymous (silent) and
    nonsynonymous (replacement) substitutions
  • Use a 64 x 64 matrix (or 61 x 61 = 3721 cells,
    excluding stop codons) of probabilities of change
    among codons
  • The two most commonly used models employ an
    extension of the HKY DNA model (ts/tv ratio and
    base or codon frequencies) and an additional
    parameter:
  • the nonsynonymous/synonymous rate ratio (ω)
  • Models
  • Goldman and Yang 1994; Yang et al. 1998
  • Muse and Gaut 1994
  • Widely used for testing hypotheses about natural
    selection (see PAML manual), but not for
    phylogenetic inference because of the
    computational expense
  • However, they may still be used to select among a
    small number of candidate trees
  • See Ren et al. (2005)
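The codon-model bookkeeping above can be checked directly. A sketch, assuming the standard (universal) genetic code: it enumerates the 61 sense codons (61 x 61 = 3721 matrix cells) and classifies each pair of distinct codons as synonymous, nonsynonymous, or separated by more than one nucleotide (rate 0, since only single-nucleotide steps are allowed). Names such as `classify` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard (universal) genetic code; codons enumerated in TCAG order, '*' = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]


def classify(c1: str, c2: str) -> str:
    """Classify a change between two distinct sense codons."""
    diffs = sum(a != b for a, b in zip(c1, c2))
    if diffs != 1:
        return "multiple"  # not allowed in one step: instantaneous rate is 0
    return "synonymous" if CODE[c1] == CODE[c2] else "nonsynonymous"


counts = {"synonymous": 0, "nonsynonymous": 0, "multiple": 0}
for c1 in SENSE:
    for c2 in SENSE:
        if c1 != c2:
            counts[classify(c1, c2)] += 1

print(len(SENSE), len(SENSE) ** 2)  # 61 3721
print(counts)
```

Most off-diagonal cells fall in the "multiple" class, so the 61 x 61 matrix is sparse even though it is large.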

13
2. Codon-based models
  • Goldman and Yang 1994; Yang et al. 1998
  • Equilibrium frequencies of each codon are
    estimated from the observed codon frequencies (πj)
  • Parameters
  • qij: instantaneous rate of change from codon i to
    codon j
  • πj: frequency of codon j
  • κ: transition/transversion ratio
  • ω: nonsynonymous/synonymous rate ratio
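In the form commonly written for this model (e.g., as used by Yang et al. 1998 and in PAML), the instantaneous rate from codon i to codon j is as follows, where π_j is the equilibrium frequency of codon j, κ the transition/transversion ratio, and ω the nonsynonymous/synonymous rate ratio:

```latex
q_{ij} =
\begin{cases}
0, & \text{if codons } i \text{ and } j \text{ differ at more than one position},\\
\pi_j, & \text{for a synonymous transversion},\\
\kappa\,\pi_j, & \text{for a synonymous transition},\\
\omega\,\pi_j, & \text{for a nonsynonymous transversion},\\
\omega\,\kappa\,\pi_j, & \text{for a nonsynonymous transition},
\end{cases}
\qquad
q_{ii} = -\sum_{j \neq i} q_{ij}
```

The matrix is usually rescaled so that the mean rate is one; ω < 1, ω = 1, and ω > 1 then correspond to purifying selection, neutrality, and positive selection on replacement changes.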

14
2. Codon-based models
  • Muse and Gaut 1994
  • The rate of change from one codon to another is
    proportional to the equilibrium frequency of the
    target nucleotide rather than of the target codon
  • Equilibrium frequencies of target nucleotides can
    be treated the same for all three codon positions
    together or separately for each codon position
    (parameter k below)
  • Parameters
  • qij: instantaneous rate of change from codon i to
    codon j
  • πj(k): frequency of nucleotide j at codon position
    k (k = 1, 2, 3)
  • κ: transition/transversion ratio
  • ω: nonsynonymous/synonymous rate ratio
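The difference from Goldman and Yang lies only in which frequency multiplies the rate. A sketch under stated assumptions: it follows the parameter list on these slides (κ for transitions, ω for nonsynonymous changes) with the standard genetic code; published Muse and Gaut variants differ in how the nucleotide part is parameterized, and `codon_rate` is a hypothetical helper. Passing codon frequencies gives the GY-style rate; passing per-position nucleotide frequencies gives the MG-style rate.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop codon)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def codon_rate(c1, c2, kappa, omega, codon_freq=None, nuc_freq=None):
    """Unnormalized off-diagonal rate q_ij from sense codon c1 to sense codon c2.

    Give exactly one of:
      codon_freq: {codon: pi_j}          -> GY-style, scales with target codon frequency
      nuc_freq:   [{base: pi}] * 3 posns -> MG-style, scales with the frequency of the
                                            target nucleotide at the position that changes
    """
    if (codon_freq is None) == (nuc_freq is None):
        raise ValueError("pass exactly one of codon_freq or nuc_freq")
    if "*" in (CODE[c1], CODE[c2]):
        raise ValueError("stop codons are excluded from the model")
    diffs = [k for k in range(3) if c1[k] != c2[k]]
    if len(diffs) != 1:
        return 0.0  # identical codons or several differences: no single-step change
    k = diffs[0]
    a, b = c1[k], c2[k]
    rate = kappa if (a in PURINES) == (b in PURINES) else 1.0  # transition vs transversion
    if CODE[c1] != CODE[c2]:
        rate *= omega  # nonsynonymous change
    if codon_freq is not None:
        return rate * codon_freq[c2]
    return rate * nuc_freq[k][b]


flat = {"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}
gc3 = [dict(flat), dict(flat), {"T": 0.1, "C": 0.4, "A": 0.1, "G": 0.4}]
uniform = {c: 1 / 61 for c, aa in CODE.items() if aa != "*"}
print(codon_rate("CTT", "CTC", kappa=2.0, omega=0.2, nuc_freq=gc3))  # 0.8
```

In the usage lines, a GC-rich third position raises the MG-style rate of the synonymous CTT to CTC change, whereas the GY-style rate with uniform codon frequencies would be 2/61.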

15
2. Codon-based models
  • Inagaki and Roger (2006) describe a problem with
    codon-based models when codon usage varies among
    lineages
  • The effect resembles long-branch attraction: two
    distantly related lineages with similar codon
    biases can be artifactually grouped together
  • None of the models implemented to date
    incorporates codon usage heterogeneity among
    lineages

16
2. Codon-based models: Software
  • PAML
  • implements the two models described above (i.e.,
    extensions of HKY)
  • Not suited for tree searches
  • In practice, used to compare hypotheses (models
    or parameter values) on one or a few trees,
    especially tests of positive selection
  • Parameters
  • πj: frequency of codon j, or frequency of
    nucleotide j at codon position k (k = 1, 2, 3)
  • κ: transition/transversion ratio
  • ω: nonsynonymous/synonymous rate ratio
  • fixed or variable among lineages (branch models)
  • fixed or variable among sites (site models)
  • fixed or variable among sites and lineages
    (branch-site models)

17
2. Codon-based models: Software
  • MrBayes
  • does allow you to search for the tree topology
    (i.e., the topology does not have to be fixed)
  • the substitution matrix has roughly 3600 cells
    (61 x 61 = 3721 under the universal code) instead
    of 16
  • runs roughly 200 times slower
  • requires roughly 16 times more memory than
    nucleotide models
  • Parameters
  • GTR (F81 or JC) rather than HKY
  • substitution probabilities and equilibrium
    frequencies (of nucleotides? or of codons?)
  • ω: nonsynonymous/synonymous rate ratio
  • fixed among lineages
  • may vary among sites
  • values 0 < ω1 < 1, ω2 = 1, and ω3 > 1 (Ny98)
  • ω1 < ω2 < ω3 (M3)
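The slowdown and memory figures above are consistent with a rough scaling argument, assuming memory for conditional likelihoods grows with the number of states and the per-site, per-branch calculation grows with its square (an approximation; actual costs depend on the implementation). A minimal sketch with a hypothetical helper:

```python
def relative_cost(n_states: int, base_states: int = 4) -> tuple[float, float]:
    """Rough (memory, time) cost of an n-state model relative to a base model.

    Assumes memory scales with the number of states and the per-site, per-branch
    pruning step (a matrix-vector product) scales with the number of states squared.
    """
    ratio = n_states / base_states
    return ratio, ratio ** 2


mem, runtime = relative_cost(61)
print(f"memory x{mem:.2f}, time x{runtime:.1f}")  # memory x15.25, time x232.6
```

Going from 4 nucleotide states to 61 codon states gives roughly 15-fold memory and roughly 230-fold time under these assumptions, in line with the approximate figures quoted for MrBayes.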

18
References
  • Anisimova M, Kosiol C (2009) Investigating
    protein-coding sequence evolution with
    probabilistic codon substitution models. Mol.
    Biol. Evol. 26, 255-271. (Deals more with
    hypothesis testing than phylogenetic inference.)
  • Huelsenbeck JP, Joyce P, Lakner C, Ronquist F
    (2008) Bayesian analysis of amino acid
    substitution models. Philos. Trans. R. Soc. B
    Biol. Sci. 363, 3941-3953.
  • Le SQ, Lartillot N, Gascuel O (2008) Phylogenetic
    mixture models for proteins. Philos. Trans. R.
    Soc. B Biol. Sci. 363, 3965-3976. (See
    http://www.atgc-montpellier.fr/models/index.php?model=mixture)
  • Inagaki Y, Roger AJ (2006) Phylogenetic
    estimation under codon models can be biased by
    codon usage heterogeneity. Mol. Phylogenet. Evol.
    40, 428-434.
  • Kosiol C, Holmes I, Goldman N (2007) An empirical
    codon model for protein sequence evolution.
    Mol. Biol. Evol. 24, 1464-1479.
  • Shapiro B, Rambaut A, Drummond AJ (2006) Choosing
    appropriate substitution models for the
    phylogenetic analysis of protein-coding
    sequences. Mol. Biol. Evol. 23, 7-9.
  • Ren FR, Tanaka H, Yang ZH (2005) An empirical
    examination of the utility of codon-substitution
    models in phylogeny reconstruction. Syst. Biol.
    54, 808-818.

19
References
  • Chapter 14 of Felsenstein's textbook: Models of
    protein evolution.
  • MrBayes manual, sections 4.1.3-4.2.3.
  • PAML manual.
  • Goldman N, Yang Z (1994) A codon-based model of
    nucleotide substitution for protein-coding DNA
    sequences. Mol. Biol. Evol. 11, 725-736.
  • Muse SV, Gaut BS (1994) A likelihood approach for
    comparing synonymous and nonsynonymous
    substitution rates, with application to the
    chloroplast genome. Mol. Biol. Evol. 11, 715-724.
  • Yang Z, Nielsen R, Hasegawa M (1998) Models of
    amino acid substitution and applications to
    mitochondrial protein evolution. Mol. Biol. Evol.
    15, 1600-1611.