Intro to Phylogenetic Tree Reconstruction

Slides: 41
Provided by: leahHa2

Transcript and Presenter's Notes

1
Intro to Phylogenetic Tree Reconstruction
  • Basics
  • Phylogeny: the (evolutionary) relationships
    between any set of species
  • Hypothesis: all organisms on Earth are
    evolutionarily related via a common ancestor
  • Evidence: similarity of many molecular
    mechanisms and genetic materials

2
Intro to Phylogenetic Tree Reconstruction
  • Basics
  • Phylogeny can be represented as a tree.
  • 2 types of phylogenetics:
  • classic phylogenetics: based on morphological
    characters
  • modern phylogenetics: based on information
    extracted from sequence data (DNA, RNA and
    proteins)
  • based on characters = sites on sequences
  • Assumption:
  • Sequences have descended from common ancestral
    genes/species, but it is difficult to distinguish
    orthologues from paralogues
  • The phylogenetic tree of a group of sequences
    does not necessarily represent the true
    phylogenetic tree of the host species

3
Intro to Phylogenetic Tree Reconstruction
  • Phylogenetic Trees
  • leaves: species
  • internal nodes: (hypothetical) ancestors
  • nodes: species or character values (states)
  • edges: evolutionary relationships between nodes
  • edge lengths: evolutionary distance between
    nodes (evolutionary time)
  • restrict ourselves to binary trees only
  • this is ok, as a node with more than two
    children can be modelled with edges of length 0
  • rooted vs. unrooted trees
  • the root represents the ultimate ancestor of the
    group of sequences (includes hierarchy)

4
Intro to Phylogenetic Tree Reconstruction
  • Phylogenetic Tree Reconstruction (Inference)
    Problem
  • Given:
  • n species
  • m characters
  • for each species, values for all characters
  • Want: a fully labelled phylogenetic tree that
    'best' explains the given data (i.e. maximizes a
    target function (score))
  • Assumptions:
  • characters are mutually independent
  • after two species have diverged, their further
    evolution is independent of each other
  • Simple solution: check them all out and pick the
    best one
  • problem: too many possibilities to check
  • n species -> $(2n-3)!!$ different rooted trees
  • n = 20 -> on the order of $10^{21}$ trees
    ($37!! \approx 8.2 \times 10^{21}$)
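
To make the growth of the search space concrete, here is a minimal Python sketch of the (2n-3)!! count. The function name `count_rooted_trees` is chosen for illustration only and does not come from the slides.

```python
def count_rooted_trees(n: int) -> int:
    """Number of rooted, leaf-labelled binary trees on n >= 2 species: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least two species")
    count = 1
    # Double factorial: multiply the odd numbers 3, 5, ..., 2n-3.
    for odd in range(3, 2 * n - 2, 2):
        count *= odd
    return count


for n in (3, 4, 5, 10, 20):
    print(n, count_rooted_trees(n))
```

Already at n = 10 there are tens of millions of candidate trees, which is why the exhaustive "check them all" approach is only feasible for a handful of species.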

5
Intro to Phylogenetic Tree Reconstruction
  • Distance-Based Algorithms
  • Idea
  • begin with a set of distances $d_{ij}$ between
    each pair i,j of sequences
  • find the tree that predicts the observed
    distances as accurately as possible
  • How to find the tree?
  • 1. general idea: given pairwise distances
    $d_{ij}$ and a tree T predicting pairwise
    distances $d'_{ij}$, find the T that minimizes
    the sum of squared differences SSQ(T)
    -> Least Squares Method, but NP-complete

6
Intro to Phylogenetic Tree Reconstruction
  • Distance-Based Algorithms
  • 2. Clustering: UPGMA (Unweighted Pair Group
    Method Using Arithmetic Averages)
  • Idea: cluster sequences; at each stage, merge two
    groups and create a new node in the tree
  • build the tree bottom up from the leaves
  • result: a rooted tree with the molecular clock
    property (MCP)
  • 1:1 correspondence between distance and
    evolutionary time
  • not always true in reality: some sequences evolve
    faster
  • If the 'true' tree doesn't have the MCP, UPGMA
    can give incorrect results
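
One way to test whether a distance matrix is compatible with the molecular clock property is the standard three-point (ultrametric) condition: in every triple of sequences, the two largest pairwise distances must be equal. The sketch below assumes a symmetric matrix given as a list of lists; the name `is_ultrametric` and the tolerance parameter are illustrative choices, not part of the slides.

```python
from itertools import combinations


def is_ultrametric(D, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for i, j, k in combinations(range(len(D)), 3):
        _, middle, largest = sorted((D[i][j], D[i][k], D[j][k]))
        if abs(largest - middle) > tol:
            return False
    return True


# Clock-like data: A and B are 2 apart, both are 6 away from C.
clock_like = [[0, 2, 6],
              [2, 0, 6],
              [6, 6, 0]]
# B has evolved faster than A, so the distances to C differ.
unequal_rates = [[0, 3, 7],
                 [3, 0, 8],
                 [7, 8, 0]]
print(is_ultrametric(clock_like), is_ultrametric(unequal_rates))
```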

7
Intro to Phylogenetic Tree Reconstruction
  • Distance-Based Algorithms
  • 3. Clustering: Neighbor Joining
  • guaranteed to generate the correct tree in
    polynomial time if the distances are additive
  • (additivity is weaker than the MCP, so more
    reasonable; still, not always true)
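
Additivity can be checked with the standard four-point condition: for every four sequences i, j, k, l, the two largest of the sums D_ij + D_kl, D_ik + D_jl and D_il + D_jk must be equal. The following is a minimal sketch under that assumption; `is_additive` is a name picked for illustration, and the example matrix is made up from a tree with leaf branches A:1, B:2, C:4, D:5 and an internal branch of length 3.

```python
from itertools import combinations


def is_additive(D, tol=1e-9):
    """Four-point condition on a symmetric distance matrix (list of lists)."""
    for i, j, k, l in combinations(range(len(D)), 4):
        sums = sorted((D[i][j] + D[k][l],
                       D[i][k] + D[j][l],
                       D[i][l] + D[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Distances read off the tree ((A:1, B:2):3, (C:4, D:5)): additive,
# although the unequal branch lengths violate the molecular clock.
tree_distances = [[0, 3, 8, 9],
                  [3, 0, 9, 10],
                  [8, 9, 0, 9],
                  [9, 10, 9, 0]]
print(is_additive(tree_distances))
```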

8
  • Phylogenetics and Phylogenetic Trees

9
Why do a phylogenetic analysis?
  • important for deciphering relationships in gene
    function and protein structure and function in
    different organisms
  • helps to utilize genetic information of a model
    organism to analyze a second organism
  • helps to sort out gene family relationships
  • valuable tool for tracing the evolutionary
    history of genes

10
Performing a phylogenetic analysis
  • Start with reasonable multiple sequence
    alignment
  • Examine either sequence variation in each column
    or no. of differences between each pair of
    sequences
  • Produce a tree representation of the sequences
    based on similarity/differences

11
Methods of evaluating sequence relationships
  • sequence A ERKSIQDLFQSFTLFERRLLIEF
  • sequence B ERLSISELIGSLRLYERRLIIEY
  • sequence C DRKSISDLIGSLRLA---LLIEF
  • sequence D DRK---DLISSLRKA---LLIEW
  • 1. Account for all column variations:
  • A,B and C,D form similar groups based on col. 1
  • A,C,D form a group based on col. 3
  • 2. Count differences between sequences:
  • A,B: 17/23 similar, 6/23 different
  • C,D: 21/23 similar, 2/23 different
  • C and D are very closely related by either method
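
The column-based grouping of method 1 can be sketched in a few lines of Python. The sequences are the four from this slide; the helper name `groups_by_column` is an illustrative assumption.

```python
seqs = {
    "A": "ERKSIQDLFQSFTLFERRLLIEF",
    "B": "ERLSISELIGSLRLYERRLIIEY",
    "C": "DRKSISDLIGSLRLA---LLIEF",
    "D": "DRK---DLISSLRKA---LLIEW",
}


def groups_by_column(seqs, col):
    """Group sequence names by the residue found in a 1-based alignment column."""
    groups = {}
    for name, seq in seqs.items():
        groups.setdefault(seq[col - 1], []).append(name)
    return groups


print(groups_by_column(seqs, 1))  # residues E and D split the set into A,B and C,D
print(groups_by_column(seqs, 3))  # residue K groups A, C and D together
```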

12
What is a tree?
  • a graphical representation of the sequence
    similarities among a group of nucleic acid or
    protein sequences

13
?????
  • ?????????? ????????? ( Molecular Phylogenetics
    )- ??? ??? ????? ?????? ???????????? ???
    ??????????? ?? ???? ??? ????? ????? ?? ????????
    ????????? ?????? ?????????.
  • ????? phylogeny ?????? ?????? ??? ????? ???
    ????? ???' ?? ???????? ( Phylogenetic Tree - PT
    ).

14
??????
  • ????? ??? ?? taxonomic units , ?''? ??????
    ???? ?????? ?????? ???????? ?? ??????????.
  • ????? ???????? ?? operational taxonomic units
    (OTUs)???????? ?? ??????????? ?? ?? ????? (???? ,
    ?? ????? ???).
  • ????? ??????? ??????? ?? ??????????? ?? ?? ?????
    ?? ???????? ????? ????? ?????? ??????????? (??
    ????) ???????? ???' ??????? ????????.
  • ????? ??? ???????? ?? ?????? ???? ???' ????? ???
    ?- scaled ?- unscaled edges . ?''? ???? ???? ????
    ????? ???? ?????? ( scaled ) ?? ??.

16
Example 1
  • Two alternative representations of a phylogenetic
    tree for five OTUs.
  • (a) Unscaled branches: extant OTUs are lined up
    and nodes are positioned proportionally to times
    of divergence.
  • (b) Scaled branches: lengths of branches are
    proportional to the numbers of molecular changes.

17
?????? - ????
  • additive tree ??? ?? ?? ?????? ????? ??
    ?"????" ??? ????/?????????? ???????? ???, ???? ,
    ?- (b) ?? ???? ?????, ????? ????? ??? A ?- B ???
    213.
  • ??? ???? ????? ??? ???? ????? ( rooted tree ) ??
    ??? ???? ( unrooted tree ) .

18
Example 2
  • Rooted and unrooted phylogenetic trees.
  • Arrows indicate the unique path leading from the
    root (R) to OTU D.

19
?????? - ????
  • additive tree species tree ??? ?? ?????? ????.
  • orthologous genes ???? ????? ?????
    ??????????? ?????.
  • paralogous genes ???? ????? ????? ????? ????????
  • homologous genes ???? ????? ???? ????.
  • clade ( monophyletic group ) ???? ?????
    ???????? ??? ?????? ?? ???? ???? ??.

20
Example 3
Phylogenetic tree of birds, reptiles, and
mammals. The reptiles do not constitute a natural
clade, since they share ancestors with the birds,
which are not included in the Reptilia. Birds
and crocodiles, on the other hand, constitute a
clade (Archosauria), since they
share a common ancestor (black box) not shared
by any other organism.
21
Phylogenetic Prediction
  • A phylogenetic analysis of a family of related
    nucleic acid or protein sequences is a
    determination of how the family might have been
    derived during evolution.
  • The evolutionary relationships among the
    sequences are depicted by placing the sequences
    as outer branches on a tree.
  • The branching relationships on the inner part of
    the tree then reflect the degree to which
    different sequences are related.

22
Phylogenetic Prediction
  • Two sequences that are very much alike will be
    located as neighboring outside branches and will
    be joined by a common branch beneath them.
  • The object of phylogenetic analysis is to
    discover all of the branching relationships in
    the tree and the branch lengths.
  • Chapter 6 of David Mount's Bioinformatics
    presents procedures for phylogenetic analysis,
    with an emphasis on the complexity of the problem
    and advice for solving difficult analyses.

23
Relationship of Phylogenetic Analysis to
Sequence Alignment
  • The most common method of multiple sequence
    alignment, the progressive alignment method, is
    the one used by CLUSTALW.
  • The progression is supposed to represent a
    reliable history of the evolutionary changes that
    have occurred.
  • A sequence alignment reveals which positions in
    the sequences were conserved and which diverged
    from a common ancestor sequence, as illustrated
    in the next slide.

24
Sequence similarity
  • Origin of similar sequences.
  • Sequences 1 and 2 are each assumed to be derived
    from a common ancestor sequence. Some of the
    ancestor sequence can be inferred from conserved
    positions in the two sequences.
  • For positions that vary, there are two possible
    choices at these sites in the ancestor.
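
As a small illustration of this idea, the sketch below writes out what two aligned sequences say about their ancestor: conserved positions are copied, and varying positions list both candidates in brackets. The function name and the two short example sequences are hypothetical.

```python
def ancestor_pattern(seq1, seq2):
    """Conserved positions are kept; varying positions list both candidates."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    parts = []
    for a, b in zip(seq1, seq2):
        parts.append(a if a == b else "[" + "".join(sorted((a, b))) + "]")
    return "".join(parts)


print(ancestor_pattern("ACGTA", "ACTTA"))  # AC[GT]TA
```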

25
  • Clusterization

26
Methods of tree reconstruction
  • ???? ??? ????? ?????? ??
  • Distance Matrix Method (DMM)
  • Maximum Parsimony Methods
  • Maximum Likelihood Methods
  • Method of Invariants
  • Mount, pp. 248-254

27
Maximum Parsimony Method
  • This method predicts the evolutionary tree that
    minimizes the number of steps required to
    generate the observed variation in the sequences.
  • A multiple sequence alignment is required to
    predict variation.
  • For each aligned position, the phylogenetic trees
    (PTs) that require the smallest number of changes
    are identified.
  • This method is used for sequences that are quite
    similar and for a small number of sequences.
  • One or more unrooted trees are predicted.
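
Counting the changes a given tree requires can be sketched with Fitch's algorithm, a standard technique that the slides do not name; it is used here only as an illustration. Trees are written as nested tuples of leaf names, and the four short sequences are made-up example data. The score does not depend on where the tree is rooted, so each unrooted topology is represented by one arbitrary rooting.

```python
def fitch(tree, states):
    """Return (candidate ancestral states, number of changes) for one column."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[col] for name, seq in alignment.items()})[1]
        for col in range(length)
    )


alignment = {"A": "AAG", "B": "AAA", "C": "GGA", "D": "AGA"}  # hypothetical data
topologies = [(("A", "B"), ("C", "D")),
              (("A", "C"), ("B", "D")),
              (("A", "D"), ("B", "C"))]
for tree in topologies:
    print(tree, parsimony_score(tree, alignment))
```

For these data the topology that groups A with B needs the fewest changes, so it would be the tree reported by a parsimony search over the three candidates.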

28
Maximum Parsimony Method
  • The main programs are in the Phylip package:
  • 1. DNAPARS: treats gaps as a fifth nucleotide
    state.
  • 2. DNAPENNY: branch and bound search
  • 3. DNACOMP
  • and so forth

29
Methods of tree reconstruction
  • ???? ??? ????? ?????? ??
  • Distance Matrix Method (DMM)
  • Maximum Parsimony Methods
  • Maximum Likelihood Methods
  • Method of Invariants
  • ??????? ?????? ???? ????? ?"????" ??? ??????
    (???????).
  • ?????? ????? ??? ?????? ???? ???????? ??????.
  • ?????? ??? ????? ?? ?????? ????? ??? ????
    (??????) ?? ??????? ??? ????? / ?????????? ??????.

30
Least Squares Method
  • ????? , ???? ????? NP-complete, ????????? ???? ??
    ??? ?????????? UPGMA
  • ( Unweighted Pair Group Method with Arithmetic
    mean ) .
  • Input: $D_{ij}$ (distance matrix) and $w_{ij}$
    (weights)
  • Find the tree T that minimizes SSQ(T), where
    $d_{ij}$ is the distance between i and j in T:
  • $SSQ(T) = \sum_{i=1}^{n} \sum_{j \neq i} w_{ij} (D_{ij} - d_{ij})^2$
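
Evaluating SSQ(T) for one candidate tree is straightforward once the tree's pairwise distances are known; the hard part is the search over trees. A minimal sketch, assuming both the observed distances `D` and the tree distances `d` are given as square lists of lists and that missing weights default to 1:

```python
def ssq(D, d, w=None):
    """Weighted sum of squared differences between observed (D) and tree (d) distances."""
    n = len(D)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if j != i:
                weight = 1.0 if w is None else w[i][j]
                total += weight * (D[i][j] - d[i][j]) ** 2
    return total


observed = [[0, 3], [3, 0]]
predicted = [[0, 2], [2, 0]]
print(ssq(observed, predicted))  # each ordered pair contributes (3 - 2)^2
```

As in the formula, every unordered pair is counted twice (once as i,j and once as j,i).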

31
UPGMA Unweighted Pair Group Method with
Arithmetic mean
  • Initialization
  • 1. Initialize n clusters with the given species,
    one species per cluster.
  • 2. Set the size of each cluster to 1.
  • 3. In the output tree T, assign a leaf for each
    species.

32
UPGMA Unweighted Pair Group Method with
Arithmetic mean
  • Iteration
  • 1. Find the i and j that have the smallest
    distance Dij.
  • 2. Create a new cluster (ij), which has
    $n_{(ij)} = n_i + n_j$ members.
  • 3. Connect i and j on the tree to a new node,
    which corresponds to the new cluster (ij), and
    give the two branches connecting i and j to (ij)
    length $D_{ij}/2$ each.

33
UPGMA Unweighted Pair Group Method with
Arithmetic mean
  • Iteration
  • 4. Compute the distance from the new cluster to
    all other clusters (except for i and j, which are
    no longer relevant) as a weighted average of the
    distances from its components:
    $D_{(ij),k} = \frac{n_i}{n_i + n_j} D_{ik} + \frac{n_j}{n_i + n_j} D_{jk}$
  • 5. Delete the columns and rows in D that
    correspond to clusters i and j, and add a column
    and row for cluster (ij), with D(ij),k computed
    as above.
  • 6. Return to 1 until there is only one cluster
    left.
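
The initialization and iteration steps can be sketched in Python as follows. This is a plain illustration rather than an optimized implementation: it rescans all remaining distances in every iteration, the tree is returned as nested tuples of (subtree, branch length), and the function name and example matrix are assumptions. The new node is placed at height D_ij/2, so the branch to a child that is itself a cluster is D_ij/2 minus that child's height.

```python
def upgma(D, names):
    """Return (tree, root height) for a symmetric distance matrix D."""
    # cluster id -> (subtree, cluster size, height of the cluster's node)
    clusters = {i: (names[i], 1, 0.0) for i in range(len(names))}
    dist = {(i, j): D[i][j]
            for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Step 1: find the closest pair of clusters.
        (i, j), dij = min(dist.items(), key=lambda item: item[1])
        tree_i, n_i, h_i = clusters.pop(i)
        tree_j, n_j, h_j = clusters.pop(j)
        # Steps 2-3: new node at height D_ij / 2 joining i and j.
        height = dij / 2
        new_tree = ((tree_i, height - h_i), (tree_j, height - h_j))
        # Steps 4-5: size-weighted average distance to every other cluster.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[(i, j)]
        clusters[next_id] = (new_tree, n_i + n_j, height)
        next_id += 1
    (tree, _, height), = clusters.values()
    return tree, height


D = [[0, 2, 6, 6],
     [2, 0, 6, 6],
     [6, 6, 0, 4],
     [6, 6, 4, 0]]
tree, height = upgma(D, ["A", "B", "C", "D"])
print(tree, height)
```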

34
UPGMA Unweighted Pair Group Method with
Arithmetic mean
  • Complexity
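  • The time and space complexity of UPGMA is O(n²),
    since there are n-1 iterations, with O(n) work in
    each one (this assumes the closest pair can be
    found in O(n) time per iteration; rescanning all
    of D in every iteration costs O(n³) time).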

35
Example 4
(a) The true phylogenetic tree. (b) The
erroneous phylogenetic tree reconstructed by
using the UPGMA method, which does not take
into account the possibility of unequal
substitution rates along different branches.
36
Neighbor Joining (NJ)
  • ????????? NJ ???? ???? ????? ?????? ??? ??????
    ?? Least Square Method.
  • ???? ??? ???? ?? ???? ???? ????????. ?????? ???
    ???? ?? ?- clusters ??? ???? ?????? ??? ???? , ??
    ?????? ???? ???????? .
  • ??? ??????? ????????? ???? ????? ?? ?- ancestor
    ????? ?? ??? species ???.
  • ??? ????? i????? ui ???? ???? ?????? ??? ?????
    ???' ?????


  • $u_i = \sum_{k \neq i} \frac{D_{ik}}{n-2}$
  • ?? ??? ?????? ?? ???????? ?? ???? ????? ??????
    (minimum-evolution criterion), ???????? i ?- j
    ?????? ?- cluster ???? ??? ?? ????????? ?? ???
    ?????? ???? ??
  • Dij ui uj ??? ???? ?????. ??????? dk,(ij) ??
    ????? ????? ??????? ???' ????? ?? ???? ?????
    ???????? .

37
Neighbor Joining (NJ)
  • Initialization (same as in UPGMA):
  • 1. Initialize n clusters with the given species,
    one species per cluster.
  • 2. Set the size of each cluster to 1.
  • 3. In the output tree T, assign a leaf for each
    species.

38
Neighbor Joining (NJ)
  • Iteration
  • 1. For each species i, compute
    $u_i = \sum_{k \neq i} \frac{D_{ik}}{n-2}$
  • 2. Choose the i and j for which
    $D_{ij} - u_i - u_j$ is smallest.
  • 3. Join clusters i and j to a new cluster
    (ij), with a corresponding node in T.
  • Calculate the branch lengths from i and j to
    the new node as
  • $d_{i,(ij)} = \frac{1}{2} D_{ij} + \frac{1}{2}(u_i - u_j)$,
    $d_{j,(ij)} = \frac{1}{2} D_{ij} + \frac{1}{2}(u_j - u_i)$.
  • 4. Compute the distance between the new cluster
    and each other cluster k:
  • $D_{(ij),k} = \frac{D_{ik} + D_{jk} - D_{ij}}{2}$

39
Neighbor Joining (NJ)
  • Iteration
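  • 5. Delete clusters i and j from the tables, and
    replace them by (ij).
  • 6. If more than two nodes (clusters) remain,
    go back to 1. Otherwise, connect the two
    remaining nodes by a branch of length $D_{ij}$.
  • The running time of NJ is O(n²), like UPGMA
    (this presumably counts the work per iteration;
    an implementation that scans all pairs in each
    of its n-2 iterations takes O(n³) time overall).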
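
The NJ iteration can be sketched in Python as below. This is an illustrative, unoptimized version that scans all pairs in every iteration; the function name, the edge-list output format and the example matrix are assumptions. The example distances were read off a tree with leaf branches A:1, B:2, C:4, D:5 and an internal branch of length 3, so they are additive but not clock-like.

```python
def neighbor_joining(D, names):
    """Return the unrooted NJ tree as a list of (node, node, branch length) edges."""
    nodes = list(names)  # names must be unique strings
    dist = {a: {b: D[i][j] for j, b in enumerate(names) if j != i}
            for i, a in enumerate(names)}
    edges = []
    while len(nodes) > 2:
        n = len(nodes)
        # Step 1: u_i = sum over k != i of D_ik / (n - 2).
        u = {a: sum(dist[a][k] for k in nodes if k != a) / (n - 2) for a in nodes}
        # Step 2: pick the pair minimizing D_ij - u_i - u_j.
        pairs = ((a, b) for x, a in enumerate(nodes) for b in nodes[x + 1:])
        i, j = min(pairs, key=lambda p: dist[p[0]][p[1]] - u[p[0]] - u[p[1]])
        # Step 3: join i and j and compute the two branch lengths.
        new = f"({i},{j})"
        d_ij = dist[i][j]
        edges.append((i, new, 0.5 * d_ij + 0.5 * (u[i] - u[j])))
        edges.append((j, new, 0.5 * d_ij + 0.5 * (u[j] - u[i])))
        # Step 4: distances from the new node to every remaining node.
        dist[new] = {}
        for k in nodes:
            if k not in (i, j):
                d_new_k = (dist[i][k] + dist[j][k] - d_ij) / 2
                dist[new][k] = d_new_k
                dist[k][new] = d_new_k
        # Step 5: replace i and j by the new node.
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    # Step 6: connect the last two nodes.
    a, b = nodes
    edges.append((a, b, dist[a][b]))
    return edges


D = [[0, 3, 8, 9],
     [3, 0, 9, 10],
     [8, 9, 0, 9],
     [9, 10, 9, 0]]
for edge in neighbor_joining(D, ["A", "B", "C", "D"]):
    print(edge)
```

On this additive input the recovered branch lengths match the tree the distances were taken from, which is the behaviour the additivity guarantee describes.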

40
Feng-Doolittle algorithm
  • Sequence-sequence alignments: the usual pairwise
    alignment.
  • Sequence-group: the highest-scoring pairwise
    alignment determines the sequence-group
    alignment.
  • Group-group: again, the best pairwise sequence
    alignment among all pairs.
  • After an alignment is completed, gap symbols are
    replaced with a neutral X character.
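
The last step is simple to sketch: once an alignment is fixed, every gap symbol is rewritten as X so that later alignment rounds treat it like an ordinary, neutral character. The function name, the default gap symbol "-" and the example rows are illustrative assumptions.

```python
def freeze_gaps(aligned_seqs, gap="-", neutral="X"):
    """Replace gap symbols in a finished alignment with a neutral character."""
    return [seq.replace(gap, neutral) for seq in aligned_seqs]


print(freeze_gaps(["AC-GT", "A--GT"]))
```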