Introduction to Bioinformatics

Title: Introduction to Bioinformatics


1
Introduction to Bioinformatics
  • Phylogenetics
  • Part II
  • Distance-Based Methods

2
Distance Matrix
  • (Evolutionary) Distance
  • Many possible measures
  • Fraction of sites that differ between two
    sequences
  • Number of changes needed to convert one sequence
    to another (count mismatches, substitution models,
    ...)
  • Distance Matrix
  • Matrix of pairwise distances between all
    sequences
  • Used to generate tree
  • Varies with construction method, distance metric
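
A minimal sketch of the simplest measure listed above, the fraction of sites that differ between two aligned sequences, and of the pairwise matrix built from it. The function names and the toy alignment are hypothetical and not part of the original slides; substitution-model corrections are not shown.

```python
def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Nested dict m[i][j] of pairwise distances (no diagonal entries)."""
    return {
        i: {j: p_distance(si, sj) for j, sj in seqs.items() if j != i}
        for i, si in seqs.items()
    }


# Hypothetical toy alignment, for illustration only.
seqs = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
m = distance_matrix(seqs)
print(m["A"]["B"], m["A"]["C"], m["B"]["C"])  # 0.125 0.375 0.25
```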

3
Distance in Phylogenetic Tree
  • Distances are ultrametric if
  • Same rate of change on all branches in tree (rare
    in practice)
  • All leaves equidistant from root
  • Also known as a molecular clock
  • Distance matrix must satisfy the following
    3-point condition
  • For any three leaves i, j, k, with distances d_{ij},
    d_{ik}, d_{jk}
  • two of the three distances are equal and ≥ the third
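
The 3-point condition can be checked directly on a distance matrix. The sketch below assumes a nested-dict layout d[i][j] without diagonal entries; the function name, the tolerance parameter, and the small clock-like matrix are illustrative assumptions. The second matrix uses three of the distances from the transformed distance example in this presentation.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """True if every triple of leaves satisfies the 3-point condition."""
    for i, j, k in combinations(d, 3):
        smallest, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        # The two largest distances must be equal; they are then
        # automatically >= the remaining one.
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical clock-like distances: A and B are close, C is equally far from both.
clock = {
    "A": {"B": 2, "C": 6},
    "B": {"A": 2, "C": 6},
    "C": {"A": 6, "B": 6},
}
# d_AB = 9, d_AC = 8, d_BC = 11 (from the transformed distance example).
not_clock = {
    "A": {"B": 9, "C": 8},
    "B": {"A": 9, "C": 11},
    "C": {"A": 8, "B": 11},
}
print(is_ultrametric(clock), is_ultrametric(not_clock))  # True False
```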

4
Distance in Phylogenetic Tree
  • Distances are additive if
  • Distance between any two leaves i & j on tree =
    sum of lengths of edges connecting i & j
  • Distance matrix must satisfy the following
    4-point condition
  • For any four leaves i, j, k, m, two of the sums
    d_{ij} + d_{km}, d_{ik} + d_{jm}, d_{im} + d_{jk} are equal
    and greater than or equal to the third
  • In fact, the difference is 2 × the length of the
    bridge edge(s)
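
A sketch of the 4-point check, again assuming a nested-dict matrix d[i][j]; the function names are hypothetical. The six distances are the ones used in the transformed distance example in this presentation (d_AB = 9, d_AC = 8, d_BC = 11, d_AD = 12, d_BD = 15, d_CD = 10). They turn out to be additive, and half the gap between the equal sums and the smallest sum gives the bridge edge length.

```python
from itertools import combinations


def four_point_sums(d, i, j, k, m):
    """The three pairwise sums for a quartet, sorted ascending."""
    return sorted((d[i][j] + d[k][m], d[i][k] + d[j][m], d[i][m] + d[j][k]))


def is_additive(d, tol=1e-9):
    """True if every quartet of leaves satisfies the 4-point condition."""
    for quartet in combinations(d, 4):
        _, middle, largest = four_point_sums(d, *quartet)
        if abs(largest - middle) > tol:
            return False
    return True


d = {
    "A": {"B": 9, "C": 8, "D": 12},
    "B": {"A": 9, "C": 11, "D": 15},
    "C": {"A": 8, "B": 11, "D": 10},
    "D": {"A": 12, "B": 15, "C": 10},
}
smallest, middle, largest = four_point_sums(d, "A", "B", "C", "D")
bridge = (largest - smallest) / 2
print(is_additive(d), (smallest, middle, largest), bridge)  # True (19, 23, 23) 2.0
```

The smallest sum is d_AB + d_CD, which indicates that A pairs with B and C pairs with D across the bridge edge.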

5
UPGMA
  • UPGMA (Unweighted Pair Group Method using
    Arithmetic Averages), Sokal & Michener 1958
  • Algorithm
  • 1. Find pair of sequences (or clusters) A, B with
    smallest distance d_{AB}
  • 2. Insert join for A, B at tree height ½ d_{AB}.
    A and B thus form a new cluster.
  • 3. Update distance of any other sequence/cluster
    X to new cluster as ½ (d_{AX} + d_{BX})
  • 4. Repeat until all sequences / clusters joined
  • 5. Produces rooted tree
  • Assumptions
  • Distances for tree are ultrametric
  • Branch lengths for 2 leaves same after join
  • Distances for tree are additive
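
The steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name, the nested-dict matrix layout, and the use of nested tuples as cluster labels are assumptions. The update in step 3 follows the slide's simple average ½ (d_AX + d_BX); many references instead weight the average by cluster size and call the simple-average variant WPGMA. The input distances are the ones from the transformed distance example in this presentation.

```python
def upgma_simple_average(d):
    """Cluster with the slide's rules; returns (root, height of each join)."""
    d = {x: dict(row) for x, row in d.items()}  # work on a copy
    height = {x: 0.0 for x in d}                # leaves sit at height 0
    while len(d) > 1:
        # Step 1: pair of clusters with the smallest distance.
        a, b = min(((x, y) for x in d for y in d[x]),
                   key=lambda p: d[p[0]][p[1]])
        new = (a, b)
        # Step 2: the join is placed at half the distance between A and B.
        height[new] = d[a][b] / 2
        # Step 3: distance from every other cluster X to the new cluster.
        row = {x: (d[a][x] + d[b][x]) / 2 for x in d if x != a and x != b}
        del d[a], d[b]
        for x in d:
            del d[x][a], d[x][b]
            d[x][new] = row[x]
        d[new] = row
    (root,) = d
    return root, height


d = {
    "A": {"B": 9, "C": 8, "D": 12},
    "B": {"A": 9, "C": 11, "D": 15},
    "C": {"A": 8, "B": 11, "D": 10},
    "D": {"A": 12, "B": 15, "C": 10},
}
root, height = upgma_simple_average(d)
print(root)                 # ('D', ('B', ('A', 'C')))
print(height[("A", "C")])   # 4.0
```

With these distances A and C are joined first, because d_AC = 8 is the smallest entry.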

  • Note: similar algorithms vary at step 3 (the distance update)
6
UPGMA Example
  • Given sequences
  • Build distance matrix

7
UPGMA Example
  • Form clusters
  • Next step?

8
Transformed Distance Method
  • Weakness of UPGMA
  • Assumes a constant rate of evolution across
    lineages
  • Example: consider sequences A, B, C, and D in
    Figure 4.5. UPGMA clusters A and C first.
  • Transformed Distance Method (J. Farris, 1977)
  • Take advantage of the power of an outgroup
  • Similar to UPGMA except for the distance matrix
  • Algorithm
  • Select an outgroup D
  • Transformed distance between i and j
  • d'_{ij} = (d_{ij} - d_{iD} - d_{jD})/2 + (Σ_k d_{kD})/n
  • where n is the number of ingroups and the sum
    runs over all ingroups k
  • Run UPGMA with matrix of d'_{ij}

9
Transformed Distance Method
  • Example
  • Select D as the outgroup
  • Calculate transformed distance
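  • (Σ_k d_{kD})/n = (d_{AD} + d_{BD} + d_{CD})/3
  • = (12 + 15 + 10)/3 = 37/3
  • d'_{AB} = (d_{AB} - d_{AD} - d_{BD})/2 + 37/3
  • = (9 - 12 - 15)/2 + 37/3 = 10/3
  • d'_{AC} = (d_{AC} - d_{AD} - d_{CD})/2 + 37/3
  • = (8 - 12 - 10)/2 + 37/3 = 16/3
  • d'_{BC} = (d_{BC} - d_{BD} - d_{CD})/2 + 37/3
  • = (11 - 15 - 10)/2 + 37/3 = 16/3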
  • Construct new distance matrix
  • Run UPGMA
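
The arithmetic in this example can be reproduced with a short sketch. The function name and the nested-dict matrix layout are assumptions; exact fractions are used so the results match the slide's values.

```python
from fractions import Fraction


def transformed_distances(d, outgroup):
    """Transformed distances between ingroup taxa, given an outgroup label."""
    ingroup = [x for x in d if x != outgroup]
    # Mean distance from the ingroup taxa to the outgroup: (sum of d_kD) / n.
    mean_out = sum(Fraction(d[k][outgroup]) for k in ingroup) / len(ingroup)
    return {
        i: {
            j: (Fraction(d[i][j]) - d[i][outgroup] - d[j][outgroup]) / 2 + mean_out
            for j in ingroup if j != i
        }
        for i in ingroup
    }


d = {
    "A": {"B": 9, "C": 8, "D": 12},
    "B": {"A": 9, "C": 11, "D": 15},
    "C": {"A": 8, "B": 11, "D": 10},
    "D": {"A": 12, "B": 15, "C": 10},
}
t = transformed_distances(d, outgroup="D")
print(t["A"]["B"], t["A"]["C"], t["B"]["C"])  # 10/3 16/3 16/3
```

After the transformation A and B are the closest pair (10/3), whereas the untransformed matrix has A and C closest (8).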

10
Transformed Distance Method
  • Example (contd)
  • How do you compute the length of a lineage?

11
Neighbor-Joining Method
  • Goal
  • Join closest neighbors (nodes with the same parent)
    in tree
  • Avoids problem with UPGMA when rates of change
    differ
  • Example
  • Closest leaves not neighbors in correct tree, but
    joined first by UPGMA (see previous example)
  • Assumptions
  • Rate of change can differ
  • Branch lengths may differ after join
  • Branch lengths for tree are additive

12
Neighbor-Joining Method
  • Approach
  • To find closest pair of neighbors
  • Reduce branch length for a node by
    (approximately) the average distance of the node
    from all other nodes
  • Find smallest distance between nodes (after
    reduction)
  • Definitions
  • For all pairs of nodes A & B in set of all nodes
    L, let
  • d_{A,B} = distance between A, B
  • R_X = Σ d_{X,N} where N ∈ L (total distance from X
    to all N)
  • r_X = R_X / (n - 2), where n = number of nodes
  • (normalized divergence from X to all other nodes)
  • Q_{A,B} = (n - 2) d_{A,B} - (R_A + R_B) (rate-corrected
    distance)
  • Key property: for additive distances, the 2 nodes
    w/ minimum Q are always neighbors!
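
A sketch of the definitions above, assuming a nested-dict matrix d[x][y] without diagonal entries; the function name is hypothetical. The input distances are the ones from the transformed distance example in this presentation.

```python
def q_values(d):
    """Return (Q, R): rate-corrected distances and total distances per node."""
    n = len(d)
    # R_X: total distance from X to all other nodes.
    R = {x: sum(d[x].values()) for x in d}
    # Q_{A,B} = (n - 2) d_{A,B} - (R_A + R_B)
    Q = {(a, b): (n - 2) * d[a][b] - (R[a] + R[b]) for a in d for b in d[a]}
    return Q, R


d = {
    "A": {"B": 9, "C": 8, "D": 12},
    "B": {"A": 9, "C": 11, "D": 15},
    "C": {"A": 8, "B": 11, "D": 10},
    "D": {"A": 12, "B": 15, "C": 10},
}
Q, R = q_values(d)
print(R)                               # {'A': 29, 'B': 35, 'C': 29, 'D': 37}
print(Q["A", "B"], Q["A", "C"], Q["C", "D"])  # -46 -42 -46
```

The minimum Q is shared by (A, B) and (C, D), even though the smallest raw distance is d_AC = 8.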

13
Neighbor-Joining Method
  • Algorithm (Saitou & Nei 1987; Studier & Keppler
    1988)
  • 1. Begin with star tree: all sequences as nodes
    in L
  • 2. Find pair of nodes A, B ∈ L with minimum Q_{A,B}
  • 3. Create & insert new join (node K) w/ branch
    lengths
  • d_{A,K} = ½ (d_{A,B} + r_A - r_B)
  • d_{B,K} = ½ (d_{A,B} + r_B - r_A)
  • 4. For remaining nodes C ∈ L, update distance to
    K as
  • d_{K,C} = ½ (d_{A,C} + d_{B,C} - d_{A,B})
  • 5. Insert K and remove A, B from L
  • 6. Repeat steps 2-5 until only two nodes left
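
The six steps can be sketched as below. This is an illustrative sketch under stated assumptions, not a reference implementation: the function name, the nested-dict matrix layout, the tuple labels for internal nodes, and the edge-list return value are all hypothetical choices. Ties in Q are broken by iteration order. The input distances are the ones from the transformed distance example in this presentation.

```python
def neighbor_joining(d):
    """Return a list of (node, node, branch length) edges of the unrooted tree."""
    d = {x: dict(row) for x, row in d.items()}  # work on a copy
    edges = []
    while len(d) > 2:
        n = len(d)
        R = {x: sum(d[x].values()) for x in d}
        r = {x: R[x] / (n - 2) for x in d}
        # Step 2: pair with minimum Q = (n - 2) d_{A,B} - (R_A + R_B).
        a, b = min(((x, y) for x in d for y in d[x]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - (R[p[0]] + R[p[1]]))
        k = (a, b)  # new internal node K
        # Step 3: branch lengths from A and B to K.
        edges.append((a, k, 0.5 * (d[a][b] + r[a] - r[b])))
        edges.append((b, k, 0.5 * (d[a][b] + r[b] - r[a])))
        # Step 4: distance from K to every remaining node C.
        row = {c: 0.5 * (d[a][c] + d[b][c] - d[a][b])
               for c in d if c != a and c != b}
        # Step 5: insert K, remove A and B.
        del d[a], d[b]
        for c in d:
            del d[c][a], d[c][b]
            d[c][k] = row[c]
        d[k] = row
    # Two nodes remain; the edge between them completes the tree.
    x, y = d
    edges.append((x, y, d[x][y]))
    return edges


d = {
    "A": {"B": 9, "C": 8, "D": 12},
    "B": {"A": 9, "C": 11, "D": 15},
    "C": {"A": 8, "B": 11, "D": 10},
    "D": {"A": 12, "B": 15, "C": 10},
}
for edge in neighbor_joining(d):
    print(edge)
```

On this input the sketch joins A with B and C with D, with leaf branches of 3, 6, 3, and 7 and an internal edge of length 2, which reproduces every pairwise distance in the matrix.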

14
Neighbor-Joining Method
  • Example