Jacques.van.Helden@ulb.ac.be - PowerPoint PPT Presentation

Slides: 44
Provided by: jacquesv8

1
Phylogeny
  • Bioinformatics

2
Species trees versus molecule trees
  • A species tree aims at representing the
    evolutionary relationships between species.
  • A molecule tree represents the evolutionary
    history of a family of related molecules (genes,
    proteins).
  • Species trees and gene trees are generally
    related ...
  • A species tree can be inferred from various
    criteria, including the history of carefully
    chosen molecules.
  • ... but not identical.
  • A molecular family can contain several copies in
    the same species (in-paralogs), due to gene
    duplications.
  • Some molecules can be transferred horizontally
    between species.
  • Due to combinations of duplications and
    divergences, the tree of a given gene may be
    inconsistent with the species tree.
  • Illustration: Figure 7.3 from Zvelebil and Baum.

Source: Zvelebil, M.J. and Baum, J.O. (2008)
Understanding Bioinformatics. Garland Science,
New York and London.
3
Tree reconciliation
Source: Zvelebil, M.J. and Baum, J.O. (2008)
Understanding Bioinformatics. Garland Science,
New York and London.
4
Concept definitions from Fitch (2000)
  • Discussion of the definitions given in the paper
  • Fitch, W. M. (2000). Homology: a personal view on
    some of the problems. Trends Genet 16, 227-31.
  • Homology
  • Owen (1843): "the same organ under every variety
    of form and function".
  • Fitch (2000): homology is the relationship of any
    two characters that have descended, usually with
    divergence, from a common ancestral character.
  • Note: a character can be a phenotypic trait, or a
    site at a given position of a protein, or a whole
    gene, ...
  • Molecular application: two genes are homologous
    if they diverged from a common ancestral gene.
  • Analogy: relationship of two characters that have
    developed convergently from unrelated ancestors.
  • Cenancestor: the most recent common ancestor of
    the taxa under consideration.
  • Orthology: relationship of any two homologous
    characters whose common ancestor lies in the
    cenancestor of the taxa from which the two
    sequences were obtained.
  • Paralogy: relationship of two characters arising
    from a duplication of the gene for that
    character.
  • Xenology: relationship of any two characters
    whose history, since their common ancestor,
    involves interspecies (horizontal) transfer of
    the genetic material for at least one of those
    characters.

Diagram labels: Analogy; Homology; Paralogy
(xenology or not; xenologs from paralogs);
Orthology (xenology or not)
5
Exercise
  • On the basis of Fitch's definitions (previous
    slide), qualify the relationships between each
    pair of genes in the illustrative schema.
  • P: paralog
  • O: ortholog
  • X: xenolog
  • A: analog
  • Orthologs can formally be defined as a pair of
    genes whose last common ancestor occurred
    immediately before a speciation event (e.g. a1
    and a2).
  • Paralogs can formally be defined as a pair of
    genes whose last common ancestor occurred
    immediately before a gene duplication event (e.g.
    b2 and b2').
  • Source: Zvelebil and Baum (2008)

6
Exercise
  • Example: B1 versus C1
  • The two sequences (B1 and C1) were obtained from
    taxa B and C, respectively.
  • The cenancestor (blue arrow) is the taxon that
    preceded the second speciation event (Sp2).
  • The common ancestor gene (green dot) coincides
    with the cenancestor.
  • -> B1 and C1 are orthologs
  • Orthologs can formally be defined as a pair of
    genes whose last common ancestor occurred
    immediately before a speciation event.
  • Paralogs can formally be defined as a pair of
    genes whose last common ancestor occurred
    immediately before a gene duplication event.
  • Source: Zvelebil and Baum (2008)

7
Exercise
  • Example: B1 versus C2
  • The two sequences (B1 and C2) were obtained from
    taxa B and C, respectively.
  • The common ancestor gene (green dot) is the gene
    that just preceded the duplication Dp1.
  • This common ancestor is much older than the
    cenancestor (blue arrow).
  • -> B1 and C2 are paralogs
  • Orthologs can formally be defined as a pair of
    genes whose last common ancestor occurred
    immediately before a speciation event.
  • Paralogs can formally be defined as a pair of
    genes whose last common ancestor occurred
    immediately before a gene duplication event.
  • Source: Zvelebil and Baum (2008)
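
The reasoning of the two examples (find the last common ancestor of the two genes, then look at the event that follows it) can be sketched in Python. This is only an illustration under assumptions: the gene tree below is hypothetical and is not the schema of the slides. It assumes a duplication Dp1 that precedes the speciation separating taxa B and C; the gene B2 and the node name Sp2' are invented for the example. Xenology and analogy are not handled.

```python
# Hypothetical gene tree (an assumption, not the schema of the slides).
# Internal nodes are (event, name, children); leaves are gene names.
TREE = ("duplication", "Dp1", [
    ("speciation", "Sp2", ["B1", "C1"]),
    ("speciation", "Sp2'", ["B2", "C2"]),
])


def leaves(node):
    """Return the set of gene names found below a node."""
    if isinstance(node, str):
        return {node}
    _, _, children = node
    found = set()
    for child in children:
        found |= leaves(child)
    return found


def last_common_ancestor(node, gene_a, gene_b):
    """Return the deepest internal node whose leaves include both genes."""
    _, _, children = node
    for child in children:
        if not isinstance(child, str) and {gene_a, gene_b} <= leaves(child):
            return last_common_ancestor(child, gene_a, gene_b)
    return node


def relationship(tree, gene_a, gene_b):
    """Classify two distinct genes of the tree as orthologs or paralogs."""
    event, name, _ = last_common_ancestor(tree, gene_a, gene_b)
    kind = "orthologs" if event == "speciation" else "paralogs"
    return kind, name


print(relationship(TREE, "B1", "C1"))  # ('orthologs', 'Sp2')
print(relationship(TREE, "B1", "C2"))  # ('paralogs', 'Dp1')
```

Labelling each internal node with its event type is a design choice of this sketch; in practice these labels have to be inferred, for instance by reconciling the gene tree with the species tree.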

8
Solution to the exercise
  • On the basis of Fitch's definitions (previous
    slide), qualify the relationships between each
    pair of genes in the illustrative schema.
  • P: paralog
  • O: ortholog
  • X: xenolog
  • A: analog

9
Cladistics, cladograms and clades
  • Cladistics
  • (Greek klados: branch) is a branch of biology
    that determines the evolutionary relationships
    between organisms based on derived similarities
    (source: Wikipedia).
  • Cladogram
  • Tree-like drawing, usually with binary
    bifurcations, representing one evolutionary
    scenario about divergences between species or
    sequences.
  • Clade
  • Any sub-tree of a cladogram.
  • Note: branch lengths do not reflect evolutionary
    time.

10
Phylogram
  • Phylogram: tree-like structure representing an
    evolutionary scenario, and including
  • the events of divergence between species or
    sequences;
  • the evolutionary time between each species and
    the divergence events.

11
Molecular clock
  • The "molecular clock" hypothesis (left tree)
    assumes that rates of evolution do not vary
    between branches. All leaf nodes are thus aligned
    vertically.
  • This hypothesis is not always valid:
  • in some cases, two genes diverge from a common
    ancestor, but one of them diverges faster than
    the other. This is a rather classical mechanism
    of evolution: a duplication creates some
    redundancy, and one copy of the gene evolves
    whereas the other one retains the initial
    function.

Ultrametric tree (with clock) (e.g. UPGMA)
Without clock (e.g. neighbour-joining)
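
A tree that satisfies the molecular clock is ultrametric: for any three leaves, the two largest of the three pairwise distances are equal. The sketch below tests this three-point condition on two made-up distance tables (the values are assumptions chosen for illustration, not data from the slides).

```python
from itertools import combinations


def is_ultrametric(dist, tol=1e-9):
    """Check the three-point condition on a table of pairwise distances.

    dist maps a pair (name_a, name_b) to a distance; each pair is listed once.
    """
    names = sorted({name for pair in dist for name in pair})

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    for a, b, c in combinations(names, 3):
        smallest, middle, largest = sorted([d(a, b), d(a, c), d(b, c)])
        if largest - middle > tol:
            return False
    return True


# Made-up distances compatible with a clock (all leaves equidistant from the root).
with_clock = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
              ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}

# Same leaves, but B is assumed to have evolved faster than the others.
without_clock = {("A", "B"): 5, ("A", "C"): 6, ("B", "C"): 9,
                 ("A", "D"): 10, ("B", "D"): 13, ("C", "D"): 10}

print(is_ultrametric(with_clock))     # True
print(is_ultrametric(without_clock))  # False
```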
12
Phylogenetic inference from sequence comparison
  • Alternative approaches
  • Maximum parsimony
  • Distance
  • Maximum likelihood

Flowchart: unaligned sequences -> sequence
alignment -> aligned sequences. Decision nodes
(yes/no): strong similarity? many (> 20)
sequences? Method shown: maximum parsimony.
Source: Mount (2000)
13
Maximum parsimony
  • For each column of the alignment, all possible
    trees are evaluated and the tree with the
    smallest number of mutations is retained
  • The trees which fit with the highest number of
    columns are retained
  • The program can return several trees

Adapted from Mount (2000)
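
To make the notion of "number of mutations" concrete, here is a minimal sketch that scores candidate trees column by column with Fitch's counting rule and keeps the tree with the smallest total. The toy alignment, the sequence names and the three candidate topologies are assumptions made up for the example, and this is not claimed to be what protpars does internally.

```python
def fitch_column(tree, column):
    """Return (possible ancestral states, minimal number of changes) for one column.

    tree is a leaf name or a pair (left, right); column maps leaf name -> residue.
    """
    if isinstance(tree, str):
        return {column[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_column(left, column)
    right_states, right_changes = fitch_column(right, column)
    shared = left_states & right_states
    if shared:
        return shared, left_changes + right_changes
    # No shared state: one extra change is needed at this node.
    return left_states | right_states, left_changes + right_changes + 1


def parsimony_length(tree, alignment):
    """Sum the minimal number of changes over all alignment columns."""
    n_columns = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_columns):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_column(tree, column)[1]
    return total


# Toy alignment (made up) and the three possible topologies for four sequences.
alignment = {"s1": "AAGT", "s2": "AAGC", "s3": "ACTC", "s4": "ACTT"}
candidates = {
    "((s1,s2),(s3,s4))": (("s1", "s2"), ("s3", "s4")),
    "((s1,s3),(s2,s4))": (("s1", "s3"), ("s2", "s4")),
    "((s1,s4),(s2,s3))": (("s1", "s4"), ("s2", "s3")),
}

lengths = {label: parsimony_length(tree, alignment)
           for label, tree in candidates.items()}
best = min(lengths, key=lengths.get)
print(lengths)  # lengths 4, 6 and 5, in the order listed above
print(best)     # ((s1,s2),(s3,s4))
```

With four sequences only three topologies exist, so exhaustive evaluation is trivial; the next slides explain why this does not scale.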
14
Maximum parsimony - example
  • Parsimony tree calculated from a multiple
    alignment of the E. coli proteins containing a
    lacI-type HTH domain
  • Left: text representation (protpars output)
  • Bottom right: visualized with njplot (in the
    ClustalX distribution)


[Text tree (protpars output): internal nodes
numbered 1 to 15; leaves CYTR_ECOLI, EBGR_ECOLI,
CSCR_ECOLI, IDNR_ECOLI, GNTR_ECOLI, MALI_ECOLI,
TRER_ECOLI, YCJW_ECOLI, LACI_ECOLI, FRUR_ECOLI,
RAFR_ECOLI, ASCG_ECOLI, GALS_ECOLI, GALR_ECOLI,
RBSR_ECOLI, PURR_ECOLI]
remember: this is an unrooted tree!
requires a total of 4095.000
15
Maximum parsimony - drawbacks
  • Number of trees to evaluate increases
    exponentially with the number of sequences.
  • Assumes that all sequences evolved at the same
    rate (molecular clock hypothesis).
  • Only works for well conserved sequence families.
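
The first drawback can be quantified. For n sequences, the number of distinct unrooted binary trees is the double factorial (2n-5)!! = 1 x 3 x 5 x ... x (2n-5), and the number of rooted binary trees is (2n-3)!!. The short sketch below (function names are chosen for the example) computes these counts.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_unrooted_trees(n_leaves):
    """Number of unrooted binary trees with n_leaves labelled leaves (n >= 3)."""
    if n_leaves < 3:
        raise ValueError("an unrooted binary tree needs at least 3 leaves")
    return double_factorial(2 * n_leaves - 5)


def n_rooted_trees(n_leaves):
    """Number of rooted binary trees with n_leaves labelled leaves (n >= 2)."""
    if n_leaves < 2:
        raise ValueError("a rooted binary tree needs at least 2 leaves")
    return double_factorial(2 * n_leaves - 3)


for n in (4, 5, 10, 16):
    print(n, n_unrooted_trees(n), n_rooted_trees(n))
```

For 10 sequences there are already 2027025 unrooted trees, which is why exhaustive evaluation is restricted to small sequence sets.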

16
Phylogenetic inference from sequence comparison
  • Alternative approaches
  • Maximum parsimony
  • Distance
  • Maximum likelihood

Flowchart: unaligned sequences -> sequence
alignment -> aligned sequences. Decision nodes
(yes/no): strong similarity? many (> 20)
sequences? clear similarity? Methods shown:
maximum parsimony, distance.
Source: Mount (2000)
17
Distance method
  • Starting from a multiple alignment, calculate the
    distance between each pair of sequences.
  • Calculate a tree which fits as well as possible
    with the distance matrix:
  • branch lengths should correspond to distances;
  • rooted or unrooted.
  • Several methods can be used for calculating a
    tree from the distance matrix:
  • Fitch-Margoliash
  • Neighbour-Joining
  • UPGMA

Aligned sequences -> distance calculation ->
distance matrix -> tree calculation -> tree
18
Distance matrix
  • The distance matrix indicates the distance
    between each pair of sequences.
  • The matrix is symmetrical, and the diagonal only
    contains 0s.
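
As an illustration, the sketch below builds such a matrix from a toy alignment using the simplest possible measure, the p-distance (proportion of differing sites). The sequences are made up, and the choice of the p-distance is an assumption of this example; distance programs typically apply a substitution model to correct for multiple changes at the same site.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped column shared by the two sequences")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Return (names, matrix) where matrix[i][j] is the p-distance between i and j."""
    names = list(alignment)
    matrix = [[p_distance(alignment[a], alignment[b]) for b in names]
              for a in names]
    return names, matrix


# Toy alignment (made up for the example).
alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACCTTCGA",
}

names, matrix = distance_matrix(alignment)
for name, row in zip(names, matrix):
    print(name, row)
```

The output is symmetrical with zeros on the diagonal, as stated above.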

19
Trees
Rooted tree
Unrooted tree
Unrooted tree
  • The distance between two nodes is the sum of
    lengths of the branches between them
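
A minimal sketch of this path-length computation, on a hypothetical unrooted tree (the node names X and Y and all branch lengths are made up for the example):

```python
# Hypothetical unrooted tree: leaves A-D, internal nodes X and Y.
BRANCHES = [
    ("A", "X", 1.0),
    ("B", "X", 2.0),
    ("X", "Y", 0.5),
    ("C", "Y", 1.5),
    ("D", "Y", 3.0),
]


def build_adjacency(branches):
    """Turn a list of (node, node, length) branches into a neighbour table."""
    adjacency = {}
    for u, v, length in branches:
        adjacency.setdefault(u, {})[v] = length
        adjacency.setdefault(v, {})[u] = length
    return adjacency


def tree_distance(adjacency, start, goal, previous=None):
    """Sum of branch lengths on the unique path from start to goal (None if absent)."""
    if start == goal:
        return 0.0
    for neighbour, length in adjacency[start].items():
        if neighbour == previous:
            continue  # do not walk back along the branch we came from
        rest = tree_distance(adjacency, neighbour, goal, start)
        if rest is not None:
            return length + rest
    return None


adjacency = build_adjacency(BRANCHES)
print(tree_distance(adjacency, "A", "B"))  # 3.0
print(tree_distance(adjacency, "A", "C"))  # 3.0
print(tree_distance(adjacency, "B", "D"))  # 5.5
```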

20
Methods for calculating trees from a distance
matrix
  • It is usually not possible to find a tree whose
    branch lengths fit all the values of the
    distance matrix.
  • Several approaches exist to calculate a tree
    which approximates the distances.
  • The Fitch-Margoliash method minimizes the sum of
    squared differences between distances in the
    matrix and distances in the tree.
  • The Neighbour-Joining (NJ) method seeks to
    minimize the sum of branch lengths of the
    resulting tree. This method does not assume a
    molecular clock; it is thus appropriate when some
    protein sequences have evolved faster than
    others. It returns an unrooted tree.
  • The Unweighted Pair Group Method with Arithmetic
    Mean (UPGMA) clusters the sequences in order of
    increasing distance in the distance matrix. This
    method relies on the molecular clock assumption,
    and it produces a rooted tree.
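
Of the three methods, UPGMA is the simplest to write down. The following is a minimal sketch, not the implementation of any particular package: it repeatedly merges the two closest clusters and averages distances weighted by cluster size. The distance values are made up and chosen to be clock-like, and ties between equally close pairs are not handled specially.

```python
def upgma(dist):
    """Cluster sequences by UPGMA.

    dist maps frozenset({name_a, name_b}) to a distance.
    Returns (tree, root_height), where tree is a nested pair of leaf names.
    """
    size, height = {}, {}
    for pair in dist:
        for name in pair:
            size[name] = 1
            height[name] = 0.0
    d = dict(dist)
    while len(size) > 1:
        closest = min(d, key=d.get)
        a, b = sorted(closest, key=str)
        merged = (a, b)
        height[merged] = d[closest] / 2
        # Distance from the new cluster to every other cluster:
        # average of the two old distances, weighted by cluster size.
        for other in size:
            if other == a or other == b:
                continue
            d[frozenset((merged, other))] = (
                size[a] * d[frozenset((a, other))]
                + size[b] * d[frozenset((b, other))]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        del size[a], size[b]
    root = next(iter(size))
    return root, height[root]


# Made-up, clock-like distances between four sequences.
raw = {("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
       ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10}
distances = {frozenset(pair): value for pair, value in raw.items()}

tree, root_height = upgma(distances)
print(tree)         # ((('A', 'B'), 'C'), 'D')
print(root_height)  # 5.0
```

Because every merge places the new node at half the distance between the merged clusters, all leaves end up at the same distance from the root; this is where the clock assumption enters.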

21
Example of phylogenetic tree
  • This tree was obtained with the Neighbour-Joining
    method (implemented in ClustalX).
  • The drawing was obtained with njplot (part of the
    ClustalX package)
  • Each branch of the tree is labelled with the
    distance.

22
Distance-based methods for calculating trees in
the package PHYLIP
  • Summary of the methods for calculating a tree
    from a distance matrix.

23
Bootstrapping
  • In some cases, the data do not allow the
    phylogeny to be inferred reliably.
  • To assess the reliability of the inference, one
    can apply the bootstrap method:
  • Given an alignment of n sequences and p columns,
    randomly select p columns with replacement. Some
    columns can thus be selected multiple times,
    whilst others are not selected at all.
  • Calculate a tree with the sampled columns.
  • Repeat many (e.g. 1000) times, and check whether
    the same branches occur frequently (e.g. in
    > 70% of the trees).
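
The resampling step can be sketched as follows. To keep the example self-contained, tree building is replaced by a deliberately crude stand-in (reporting the closest pair of sequences), which is an assumption of this sketch; a real analysis would build a full tree for each replicate and count how often each branch recurs. The toy alignment and the function names are made up.

```python
import random
from collections import Counter
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_columns = len(next(iter(alignment.values())))
    picked = [rng.randrange(n_columns) for _ in range(n_columns)]
    return {name: "".join(seq[i] for i in picked)
            for name, seq in alignment.items()}


def closest_pair(alignment):
    """Toy stand-in for tree building: the pair of sequences with fewest mismatches."""
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))
    return min(combinations(sorted(alignment), 2), key=mismatches)


def bootstrap_support(alignment, n_replicates=1000, seed=0):
    """Fraction of bootstrap replicates in which each pair comes out as closest."""
    rng = random.Random(seed)
    counts = Counter(closest_pair(resample_columns(alignment, rng))
                     for _ in range(n_replicates))
    return {pair: n / n_replicates for pair, n in counts.items()}


# Toy alignment (made up for the example).
alignment = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACCTTCGA",
    "s4": "TCCTTCGA",
}

for pair, support in sorted(bootstrap_support(alignment).items()):
    print(pair, support)
```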

24
Phylogenetic inference from sequence comparison
  • Alternative approaches
  • Maximum parsimony
  • Distance
  • Maximum likelihood

Flowchart: unaligned sequences -> sequence
alignment -> aligned sequences. Decision nodes
(yes/no): strong similarity? many (> 20)
sequences? clear similarity? Methods shown:
maximum parsimony, distance, maximum likelihood.
Source: Mount (2000)
25
Practicals with phylogeny.fr
26
Phylogeny.fr
  • http://www.phylogeny.fr
  • Offers a user-friendly interface to run all the
    steps for inferring phylogeny from a set of
    unaligned sequences.
  • Completely automated workflow or user-specified
    parameters.
  • Alternative methods for each step of the
    workflow.
  • Results are exported in multiple formats
    (convenient for using them with other programs).
  • Results can be displayed immediately (for fast
    programs) or sent by email (slow programs).

27
Phylogeny.fr - sequence input
  • The "one click" option only requires you to
    enter a set of sequences and click on the
    "submit" button.

28
Phylogeny.fr - workflow
  • At each step of the workflow, you can:
  • Check the parameters used for the analysis
  • Choose alternative parameters (advanced use)
  • Export the intermediate and final results in a
    variety of formats, which can then be opened in
    other programs.

29
Phylogeny.fr - alignment result
30
Phylogeny.fr - phylogenetic tree in text format
31
Phylogeny.fr - Phylogram (various output formats
are supported)
32
Phylogeny.fr - display options
33
Phylogram with an outgroup added (Bacillus) but
not correctly rooted (midpoint grouping)
34
Cladogram incorrectly rooted (midpoint)
35
Phylogram rooted with an outgroup
36
Further reading
37
Further reading
  • Textbooks
  • Zvelebil, M.J. and Baum, J.O. (2008)
    Understanding Bioinformatics. Garland Science,
    New York and London.
  • Mount, D.W. (2001) Bioinformatics: Sequence and
    Genome Analysis. Cold Spring Harbor Laboratory
    Press, New York.
  • Pevsner, J. (2003) Bioinformatics and Functional
    Genomics. Wiley.
  • All his teaching material is on
    http://pevsnerlab.kennedykrieger.org/bioinfo_course.htm

38
Supplementary material
39
PHYLIP flowchart
  • Input: aligned sequences
  • Bootstrapping: seqboot
  • Distance calculation (aligned sequences ->
    distance matrix): protdist, dnadist
  • Trees from a distance matrix: UPGMA: neighbor
    (rooted); Fitch-Margoliash: fitch (unrooted),
    kitsch (rooted); Neighbor-joining: neighbor
  • Parsimony: protpars, dnapars
  • Branch-and-bound: dnapenny
  • Maximum likelihood: dnaml, protml
  • Tree: consense, retree
  • Tree drawing: drawgram (drawing of rooted tree),
    drawtree (drawing of unrooted tree)
40
Taxonomy of bacteria having a gene metA (August
2004)
Bacteria
  Firmicutes
    Bacillales
      Bacillaceae
        Bacillus
    Clostridia
      Clostridiales
        Clostridium
    Lactococcus
    Streptococcus
  Proteobacteria
    Alpha subdivision
      Brucella
      Rhizobiaceae group
        Rhizobium
        Sinorhizobium
    Epsilon subdivision
      Campylobacter group
        Campylobacter
    Gamma subdivision
      Enterobacteriaceae
        Escherichia
        Salmonella
        Yersinia
      Vibrionaceae
        Vibrio
  Thermotogae
    Thermotogae (class)
      Thermotogales
        Thermotoga
41
Tree nomenclature
  • Node
  • Leaf
  • Internal branch
  • External branch

42
Alignment methods
Source: Zvelebil, M.J. and Baum, J.O. (2008)
Understanding Bioinformatics. Garland Science,
New York and London.
43
Evolutionary model
Source: Zvelebil, M.J. and Baum, J.O. (2008)
Understanding Bioinformatics. Garland Science,
New York and London.