Phylogenetic Inference - PowerPoint PPT Presentation

Provided by: csta2

Title: Phylogenetic Inference


1
Phylogenetic Inference
  • Data
  • Optimality Criteria
  • Algorithms
  • Results
  • Practicalities

Reading: Ch8
BIO520 Bioinformatics, Jim Lund
2
Our Goals
  • Infer Phylogeny
  • Optimality criteria
  • Algorithm
  • Phylogenetic inference
  • (interesting ones)

3
Phylogenetic Model Assumptions
  • No transfer of genetic information by
    hybridization
  • All sequences are homologous
  • Each position in alignment homologous
  • Observed variation is a valid sample of the
    included group
  • Positions evolve independently

4
Steps in Analysis
  • Data Model (Alignment)
  • alignment method
  • trimming to a phylogenetic set
  • DNA base substitution model
  • Build Trees
  • Algorithm based vs Criterion based
  • Distance based vs Character-based
  • Assess tree quality.

5
Choice of Input Data
  • Data Type
  • Aligned sequences, RFLP, morphological data
  • Molecule of interest
  • rRNA (general purpose)
  • Mitochondrial DNA
  • Selected genes
  • Number/type of taxa
  • ingroup and outgroup

6
rRNA Genes
  • Conserved across kingdoms
  • Varies within species
  • Widely sequenced, easy
  • Long, lots of characters

Duplication?
7
Multiple Alignment Method
  • Phylogenetic Assumptions
  • Alignment parameters
  • (substitution matrix, gap cost)
  • Aligned features
  • primary sequence, structure
  • Optimization
  • statistical, non-statistical

8
Typical Alignment Method
  • CLUSTAL, then manual editing
  • Manual editing for phylogeny
  • phylogenetic assumption in guide tree
  • parameters a priori and dynamic
  • Optimization
  • Non-statistical
  • Remove poorly aligned regions
  • Test several gap penalties

9
Substitution Models
  • Transitions (G to A, C to T) versus any
    substitution (N to N)
  • Amino acid substitution
  • Forwards and backwards weights identical?
  • Site-to-site variation

Simpler model is better.
Estimate parameters from "quick" tree building and
the observed variation.
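
As an illustration of how a substitution model turns observed variation into a distance, here is a minimal sketch of two standard corrections: Jukes-Cantor, where all changes are weighted equally, and Kimura 2-parameter, where transitions and transversions are weighted separately. The function names are hypothetical and not taken from any package named in these slides.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def count_differences(seq1, seq2):
    """Return (sites, transitions, transversions) for two aligned sequences."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions

def jukes_cantor(seq1, seq2):
    """Jukes-Cantor distance: one rate for every kind of change."""
    n, ts, tv = count_differences(seq1, seq2)
    p = (ts + tv) / n
    return -0.75 * math.log(1 - 4 * p / 3)

def kimura_2p(seq1, seq2):
    """Kimura 2-parameter distance: separate transition/transversion rates."""
    n, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / n, tv / n
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

print(jukes_cantor("CCCAGG", "CCCAAG"))
print(kimura_2p("CCCAGG", "CCCAAG"))
```

Both corrections break down (log of a non-positive number) when sequences are saturated, which is one reason the simpler model is not always usable.
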
10
Tree-Building Methods
  • Distance-based methods
  • NJ, FM, ME, UPGMA
  • Character-based methods
  • Maximum Parsimony (PAUP)
  • Maximum Likelihood (PHYLIP)

Algorithm choice is a contested, active research
field.
11
Molecular phylogenetic tree building methods are
mathematical and/or statistical methods for
inferring the divergence order of taxa, as well as
the lengths of the branches that connect them.
There are many phylogenetic methods available
today, each having strengths and weaknesses. Most
can be classified as follows:

                       COMPUTATIONAL METHOD
DATA TYPE              Optimality criterion     Clustering algorithm
Characters (bp, aa)    PARSIMONY,               (none)
                       MAXIMUM LIKELIHOOD
Distances              MINIMUM EVOLUTION,       UPGMA,
                       LEAST SQUARES            NEIGHBOR-JOINING
12
Distance Methods
  • Measure distance (dissimilarity)
  • Accurate if distances are all additive
    (UPGMA additionally assumes they are ultrametric)
  • NEVER true over large distances
  • Methods
  • NJ (Neighbor joining)
  • FM (Fitch-Margoliash)
  • ME (Minimum Evolution)
  • UPGMA (Unweighted pair group method with
    Arithmetic Mean)

13
Which Distance Method?
  • UPGMA (Unweighted pair group method with
    Arithmetic Mean)
  • Least accurate, still commonly used
  • NJ (Neighbor joining)
  • EXTREMELY RAPID
  • GIVES ONLY 1 TREE
  • ME (Minimum Evolution) and FM (Fitch-Margoliash)
    seem best
  • Minimize tree path lengths
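
To make the clustering idea concrete, here is a minimal UPGMA sketch. It is an illustration only: it returns the topology as nested tuples, ignores branch lengths, and the function name and input layout (a dict keyed by frozenset pairs) are assumptions made for this example.

```python
from itertools import combinations

def upgma(names, dist):
    """Return a nested-tuple tree built by UPGMA.

    dist maps frozenset({a, b}) -> distance for every pair of names.
    """
    size = {name: 1 for name in names}   # cluster -> number of taxa in it
    d = dict(dist)                        # working copy of the distances
    while len(size) > 1:
        # Join the two clusters that are currently closest.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c == a or c == b:
                continue
            # Size-weighted average, so every original taxon counts equally.
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))

pairs = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 8,
         ("A", "D"): 8, ("B", "C"): 8, ("B", "D"): 8}
dist = {frozenset(k): v for k, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))
```

The example distances are ultrametric, the case where UPGMA is expected to behave well; with unequal rates among lineages it can join the wrong taxa.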

14
Inferring Trees and Ancestors
CCCAGG CCCAAG -> CCCAAG CCCAAA ->
CCCAAA CCCAAA -> CCCAAC
15
Different Criteria
1 CCCAGG
2 CCCAAG
3 CCCAAA
4 CCCAAC
1,2 can be sister taxa AND 3,4 can be sister taxa.
Infer ancestor of 1,2 and 3,4.
Distance from 1/2, 3/4 equal.
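
The pairwise distances for these four sequences can be checked with a short sketch. It uses the raw count of differing sites (Hamming distance) with no substitution-model correction, which is an assumption made here to keep the example small.

```python
from itertools import combinations

taxa = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}

def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))

distances = {(i, j): hamming(taxa[i], taxa[j]) for i, j in combinations(taxa, 2)}
for pair, d in distances.items():
    print(pair, d)
```

Taxa 1 and 2 differ at one site, as do taxa 3 and 4, so the two candidate sister pairs are equally distant.
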
16
Character Methods
  • Maximum Parsimony
  • minimal changes to produce data
  • can use different substitution models
  • Maximum Likelihood
  • turns problem inside out, single most likely
    tree that explains data
  • coin flip analogy
  • increasingly popular
  • Bayesian
  • Searches for Best Set of trees that explains data
    AND fits evolutionary model
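
The coin flip analogy can be sketched in a few lines: fix the data, then ask how probable those data are under each competing hypothesis. The observed counts and the three candidate coins below are made-up illustration values, not data from the lecture.

```python
from math import comb

def binomial_likelihood(p_heads, heads, flips):
    """Probability of seeing `heads` heads in `flips` flips if P(heads) = p_heads."""
    return comb(flips, heads) * p_heads**heads * (1 - p_heads) ** (flips - heads)

heads, flips = 7, 10   # hypothetical observed data
hypotheses = {"fair coin": 0.5, "biased to heads": 0.7, "biased to tails": 0.3}

likelihoods = {name: binomial_likelihood(p, heads, flips)
               for name, p in hypotheses.items()}
best = max(likelihoods, key=likelihoods.get)
print(likelihoods)
print("most likely hypothesis:", best)
```

In tree building the hypotheses are trees (with branch lengths and a substitution model) and the data are the alignment columns, but the comparison works the same way.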

17
Parsimony
CCCAGG CCCAAG -> CCCAAG CCCAAA ->
CCCAAA CCCAAA -> CCCAAC
4 TAXA, 3 changes minimum
Search for the shortest tree, the one with the fewest
changes.
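
The change count can be reproduced with a small sketch of Fitch's algorithm, a standard way to score a given tree under parsimony with equal weights. Trees are written as nested tuples of taxon names; that representation and the function names are assumptions made for this example.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, changes) for one alignment column.

    tree is a taxon name (a leaf) or a (left, right) tuple.
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: one change is needed on this pair of branches.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_length(tree, alignment):
    """Sum the Fitch change counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total

alignment = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}
for tree in [(("1", "2"), ("3", "4")),
             (("1", "3"), ("2", "4")),
             (("1", "4"), ("2", "3"))]:
    print(tree, parsimony_length(tree, alignment))
```

For this tiny data set all three possible groupings need 3 changes, because neither variable site is parsimony-informative; longer alignments are needed before parsimony can prefer one tree.
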
18
Likelihood Models
Hypothesis 1: All 3 teams are equally good.
Hypothesis 2: The Yankees are the best team.
Hypothesis 3: The Tigers are the worst team.
19
Searching for Trees
20
Tree Search Algorithms
  • Exhaustive
  • VERY INTENSIVE
  • Branch and Bound
  • Compromise
  • Heuristic
  • FAST (usually start with NJ)
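
Why exhaustive search is so intensive can be seen by counting trees. The sketch below uses the standard formulas for strictly bifurcating trees, (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa; the function names are hypothetical.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating trees: 1 * 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

def rooted_tree_count(n_taxa):
    """A root can sit on any of the 2n - 3 branches of an unrooted tree."""
    return unrooted_tree_count(n_taxa) * (2 * n_taxa - 3)

for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n), rooted_tree_count(n))
```

Ten taxa already give about two million unrooted trees, which is why heuristic searches are the usual choice for larger data sets.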

21
Evaluating Trees
  • Consensus Tree
  • Randomized Trees
  • Skewness tests
  • Randomized Character Data
  • Permutation tests (permuted by column)
  • Bootstrap, Jackknife
  • resampling techniques
  • Counts how often each clade appears in test data.
  • >70% probably correct, <50% overestimates accuracy
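
The resampling step can be sketched as follows: alignment columns are drawn with replacement to build each pseudo-replicate, a tree is built from it, and the clade of interest is counted. The tree-building step is left as a caller-supplied function, `infer_clades`, which is a hypothetical placeholder assumed to return a set of frozensets of taxon names.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {taxon: "".join(seq[i] for i in columns)
            for taxon, seq in alignment.items()}

def bootstrap_support(alignment, clade, infer_clades, replicates=100, seed=1):
    """Percentage of replicates whose inferred tree contains `clade`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        replicate = bootstrap_alignment(alignment, rng)
        if frozenset(clade) in infer_clades(replicate):
            hits += 1
    return 100.0 * hits / replicates

alignment = {"1": "CCCAGG", "2": "CCCAAG", "3": "CCCAAA", "4": "CCCAAC"}
print(bootstrap_alignment(alignment, random.Random(0)))
```

A jackknife differs only in the sampling step: columns are deleted without replacement instead of being redrawn.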

22
Tree Congruence
  • Tree-to-Tree Comparison
  • 2 different characters/same groups
  • Important for evaluating biological hypotheses
  • Example
  • Did lentiviruses diverge within their current
    hosts only?
  • Or has plant pathogenicity arisen many times
    in fungi?

23
Inferring evolutionary relationships between the
taxa requires rooting the tree
To root a tree mentally, imagine that the tree is
made of string. Grab the string at the root
and tug on it until the ends of the string (the
taxa) fall opposite the root
[Figure: unrooted tree]
24
Now, try it again with the root at another
position
[Figure: unrooted tree of taxa A, B, C and D with
the root position marked, and the resulting rooted
tree]
Note that in this rooted tree, taxon A is most
closely related to taxon B, and together they are
equally distantly related to taxa C and D.
25
Rooting Trees
  • Molecular Clock
  • Root = midpoint of longest span
  • Unreliable, often wrong.
  • Evidence
  • e.g., select a fungus as root for plants
  • long branch attraction can be an extrinsic problem
  • Paralog rooting
  • long branch problems
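
Midpoint rooting under a molecular clock can be sketched as: find the two tips that are farthest apart, then place the root halfway along the path between them. The tree layout (a dict of neighbour-to-branch-length dicts) and the example branch lengths are assumptions made for this illustration.

```python
def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbour, length in tree[node].items():
            if neighbour != parent:
                stack.append((neighbour, node, dist + length, path + [neighbour]))
    return best

def midpoint_root(tree):
    """Return (u, v, offset): the root lies on branch u-v, `offset` away from u."""
    tip_a, _, _ = farthest(tree, next(iter(tree)))
    _, span, path = farthest(tree, tip_a)    # longest tip-to-tip span
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= span / 2:
            return u, v, span / 2 - walked
        walked += length

# Hypothetical unrooted tree: tips A-D, internal nodes X and Y.
tree = {
    "A": {"X": 1}, "B": {"X": 3}, "C": {"Y": 2}, "D": {"Y": 6},
    "X": {"A": 1, "B": 3, "Y": 1},
    "Y": {"X": 1, "C": 2, "D": 6},
}
print(midpoint_root(tree))
```

Here the long branch to D pulls the root onto that branch, which shows why the method misleads when one lineage has simply evolved faster.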

26
Phylogenetic Software
  • PHYLIP
  • http://evolution.genetics.washington.edu/phylip.html
  • http://saf.bio.caltech.edu/www/saf_manuals/phylip/phylip.html
  • PAUP
  • Pileup, Lineup, Paupsearch, Paupdisplay
  • http://paup.csit.fsu.edu/versions.html
  • MrBayes
  • Bayesian trees
  • http://mrbayes.csit.fsu.edu/
  • Treeview
  • Draw/format phylogenic trees
  • http://darwin.zoology.gla.ac.uk/rpage/treeviewx/

27
Phylogenetic Stories
  • HIV
  • complete genome accessible
  • evolution rapid
  • selection, neutralism?
  • Primate evolution
  • Which primate is the closest relative to modern
    humans?

28
HIV Genome Diversity
  • Error prone (RT) replication
  • High rate of replication
  • 10^10 virions/day
  • In vivo selection pressure

And In vivo recombination!
29
HIV tree
ENV
GAG
AIDS 1996, 10:S13
Recombinants?
30
Subtype E
ENVA
Bootscanning
AIDS 1996, 10:S13
31
Which species are the closest living relatives of
modern humans?
[Figure: two alternative trees relating humans,
chimpanzees, bonobos, gorillas and orangutans, with
time scales in MYA (14 to 0 and 15-30 to 0)]
  • Mitochondrial DNA, most nuclear DNA-encoded
    genes, and DNA/DNA hybridization all show that
    bonobos and chimpanzees are related more closely
    to humans than either is to gorillas.

The pre-molecular view was that the great apes
(chimpanzees, gorillas and orangutans) formed a
clade separate from humans, and that humans
diverged from the apes at least 15-30 MYA.
32
Phylogenetic Resources
  • NCBI Taxonomy Browser
  • http://www.ncbi.nlm.nih.gov/Taxonomy/
  • RDP database
  • (Ribosomal Database Project)
  • http://rdp.cme.msu.edu/index.jsp
  • Tree of Life
  • http://tolweb.org/tree/phylogeny.html

33
Practicalities
  • Quality of input alignment critical
  • Examine data from all possible angles
  • distance, parsimony, likelihood, Bayes
  • Outgroup taxon critical
  • problem if outgroup shares a selective property
    with a subset of ingroup
  • Order of input can be problematic
  • Jumble them!
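
Jumbling can be sketched as generating several random input orders and running the same analysis on each; if the resulting trees differ, the method is order-sensitive for that data set. The function name and the seed handling are assumptions made for this example.

```python
import random

def jumbled_orders(taxa, n_orders, seed=0):
    """Return n_orders random permutations of the taxon list (reproducible)."""
    rng = random.Random(seed)
    orders = []
    for _ in range(n_orders):
        order = list(taxa)   # copy, so the caller's list is left untouched
        rng.shuffle(order)
        orders.append(order)
    return orders

for order in jumbled_orders(["human", "chimp", "bonobo", "gorilla", "orangutan"], 3):
    print(order)
```

Fixing the seed keeps the jumbled runs repeatable, which helps when comparing results later.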