Bioinformatics 1 lecture 12 - PowerPoint PPT Presentation

Provided by: chrisby
Transcript and Presenter's Notes

1
Bioinformatics 1 -- lecture 12
More distance-based methods
Maximum parsimony
An exercise in tree building
2
Fitch-Margoliash algorithm for calculating the branch lengths
1. Find the most closely related pair of sequences, A and B.
2. Calculate the average distance from A to all other sequences, then from B to all other sequences.
3. Adjust the position of the common ancestor node for A and B so that the difference between the averages is equal to the difference between the A and B branch lengths, while the sum of the branch lengths is D(A,B).
[Figure: diagrams relating sequences A, B, C, and D]
NOTE: The difference between the averages may be greater than D(A,B), making step 3 impossible.
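Steps 2 and 3 amount to solving two equations for the two branch lengths: their sum is D(A,B) and their difference is the difference between the averages. The sketch below is one way to write that down, not the lecture's own code. The function name `split_branch`, the dictionary layout, and the example distances are all hypothetical, and it assumes the averages are taken over the sequences other than A and B.

```python
def split_branch(dist, a, b, taxa):
    """Return (branch_a, branch_b) for the closest pair (a, b).

    dist maps frozenset({x, y}) to a distance; taxa lists all sequence names.
    Assumption: averages run over every sequence except a and b themselves.
    """
    others = [t for t in taxa if t not in (a, b)]
    avg_a = sum(dist[frozenset((a, t))] for t in others) / len(others)
    avg_b = sum(dist[frozenset((b, t))] for t in others) / len(others)
    d_ab = dist[frozenset((a, b))]
    diff = avg_a - avg_b
    # The NOTE on the slide: if the averages differ by more than D(A,B),
    # one branch length would have to be negative, so step 3 cannot be done.
    if abs(diff) > d_ab:
        raise ValueError("difference between averages exceeds D(A,B)")
    branch_a = (d_ab + diff) / 2   # from a + b = D(A,B) and a - b = diff
    branch_b = d_ab - branch_a
    return branch_a, branch_b


# Made-up distances, for illustration only.
taxa = ["A", "B", "C", "D"]
example = {
    frozenset(("A", "B")): 4,
    frozenset(("A", "C")): 10,
    frozenset(("A", "D")): 10,
    frozenset(("B", "C")): 8,
    frozenset(("B", "D")): 8,
    frozenset(("C", "D")): 6,
}
print(split_branch(example, "A", "B", taxa))  # (3.0, 1.0)
```

Here A is on average 2 units farther from the other sequences than B is, so the ancestor node sits 3 units from A and 1 unit from B.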
3
Distance metrics
METRIC DISTANCES between any two or three taxa (a, b, and c) have the following properties:
Property 1: d(a, b) ≥ 0 (Non-negativity)
Property 2: d(a, b) = d(b, a) (Symmetry)
Property 3: d(a, b) = 0 if and only if a = b (Distinctness)
Property 4: d(a, c) ≤ d(a, b) + d(b, c) (Triangle inequality)
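These properties can be checked mechanically on a table of pairwise distances. The following is a minimal sketch, assuming distances are stored in a dictionary keyed by unordered pairs (which makes symmetry hold by construction). The function names are hypothetical, and the sample values are the ones from the "What's wrong with these distances?" slide later in this lecture.

```python
from itertools import combinations


def nonpositive_pairs(dist, taxa):
    """Pairs of distinct taxa whose distance is not strictly positive
    (violating Property 1 or Property 3)."""
    return [(a, b) for a, b in combinations(taxa, 2)
            if dist[frozenset((a, b))] <= 0]


def triangle_violations(dist, taxa):
    """Triples (a, c, b) where d(a, c) > d(a, b) + d(b, c) (Property 4)."""
    def d(x, y):
        return dist[frozenset((x, y))]

    bad = []
    for a, c in combinations(taxa, 2):
        for b in taxa:
            if b not in (a, c) and d(a, c) > d(a, b) + d(b, c):
                bad.append((a, c, b))
    return bad


taxa = ["A", "B", "C", "D"]
table = {
    frozenset(("A", "B")): 3,
    frozenset(("A", "C")): 5,
    frozenset(("A", "D")): 7,
    frozenset(("B", "C")): 4,
    frozenset(("B", "D")): 1,
    frozenset(("C", "D")): 9,
}
print(nonpositive_pairs(table, taxa))
print(triangle_violations(table, taxa))
```

Each reported triple names a pair of taxa and an intermediate taxon through which the detour is shorter than the direct distance.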
4
ULTRAMETRIC DISTANCES
must satisfy the previous four conditions, plus:
Property 5: The distances from any branch point to the taxa in the clade defined by that branch point are equal.
If distances are ultrametric, then the sequences are evolving in a perfectly clock-like manner. So any two sequences always have the same distance to their common ancestor.
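Property 5 is stated in terms of the tree, but it can also be tested directly on the distances. The sketch below uses the standard three-point condition, which is not on the slide and is swapped in here as an equivalent test: for every three taxa, the two largest of the three pairwise distances must be equal. The function name `is_ultrametric`, the tolerance, and both sample tables are illustrative assumptions.

```python
from itertools import combinations


def is_ultrametric(dist, taxa, tol=1e-9):
    """Three-point condition: in every triple, the two largest
    pairwise distances are equal (within tol)."""
    for trio in combinations(taxa, 3):
        d3 = sorted(dist[frozenset(pair)] for pair in combinations(trio, 2))
        if abs(d3[2] - d3[1]) > tol:
            return False
    return True


taxa = ["A", "B", "C", "D"]

# Made-up clock-like distances: (A,B) and (C,D) are sister pairs.
clock_like = {
    frozenset(("A", "B")): 2,
    frozenset(("C", "D")): 4,
    frozenset(("A", "C")): 8,
    frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 8,
    frozenset(("B", "D")): 8,
}

# Made-up distances where A, B, C are pairwise 3, 5, 4 apart.
not_clock_like = dict(clock_like)
not_clock_like[frozenset(("A", "B"))] = 3
not_clock_like[frozenset(("A", "C"))] = 5
not_clock_like[frozenset(("B", "C"))] = 4

print(is_ultrametric(clock_like, taxa))      # True
print(is_ultrametric(not_clock_like, taxa))  # False
```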
5
Additivity
ADDITIVE DISTANCES
Property 6 (example): if (a, b) are nearest neighbors, then d(a, b) + d(c, d) ≤ maximum[d(a, c) + d(b, d), d(a, d) + d(b, c)]
For distances to fit into an evolutionary tree, they must be additive. Estimated distances often fall short of these criteria, and thus can fail to produce correct evolutionary trees.
A lineage that goes backwards in time violates additivity.
[Figure: tree of taxa A, B, C, and D]
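The additivity property can be tested on any four taxa at a time. The sketch below uses the usual "four-point condition" form of Property 6, which is an assumption about what the slide intends: of the three ways to pair up four taxa, the two largest sums of distances must be equal. The function name `is_additive` and the sample data are hypothetical. The additive table was built from a made-up tree with branch lengths A=1, B=2, C=1, D=4 and an internal branch of 3.

```python
from itertools import combinations


def is_additive(dist, taxa, tol=1e-9):
    """Four-point condition: for every quartet, the two largest of the
    three pairwise-sum combinations are equal (within tol)."""
    def d(x, y):
        return dist[frozenset((x, y))]

    for a, b, c, e in combinations(taxa, 4):
        sums = sorted([d(a, b) + d(c, e),
                       d(a, c) + d(b, e),
                       d(a, e) + d(b, c)])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


taxa = ["A", "B", "C", "D"]

# Path lengths on the made-up tree ((A:1, B:2):3, (C:1, D:4)).
tree_like = {
    frozenset(("A", "B")): 3,
    frozenset(("C", "D")): 5,
    frozenset(("A", "C")): 5,
    frozenset(("A", "D")): 8,
    frozenset(("B", "C")): 6,
    frozenset(("B", "D")): 9,
}

# Same table with one distance changed so that no tree fits it.
not_tree_like = dict(tree_like)
not_tree_like[frozenset(("B", "D"))] = 1

print(is_additive(tree_like, taxa))      # True
print(is_additive(not_tree_like, taxa))  # False
```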
6
What's wrong with this tree?
[Figure: tree of taxa A, B, C, and D with branches labeled 1, 2, 1, 3, 1, and 6]
7
What's wrong with these distances?
     A    B    C    D
A         3    5    7
B              4    1
C                   9
D
8
Maximum parsimony -- it's character-building
Optimality criterion: The most-parsimonious tree is the one that requires the fewest evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
A ATGGCTATTCTTATAGTACG
B ATCGCTAGTCTTATATTACA
C TTCACTAGACCTGTGGTCCA
D TTGACCAGACCTGTGGTCCG
E TTGACCAGTTCTCTAGTTCG
For this column, and this tree, one mutation event is required.
[Figure: tree for the highlighted column, with nodes labeled T, T, A, A, and T]
9
character-based tree-building
For this other column, the same tree requires two mutation events. A different tree would require only one.
A ATGGCTATTCTTATAGTACG
B ATCGCTAGTCTTATATTACA
C TTCACTAGACCTGTGGTCCA
D TTGACCAGACCTGTGGTCCG
E TTGACCAGTTCTCTAGTTCG
[Figure: tree for the highlighted column, with nodes labeled T, T, T, C, and C]
10
Minimum number of mutations
Given a tree and a set of taxa, each with a one-letter character, choose optional characters for each ancestor, starting from the most recent. Choose the most popular character at the root, then choose not to mutate if possible.
[Figure: tree with ancestor nodes labeled with their optional characters, e.g. T/C and T/C/C]
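The bottom-up half of this procedure resembles what is usually called Fitch's small-parsimony algorithm. Treating the slide as describing it is an assumption. The sketch below computes, for one alignment column, the set of optional characters at each ancestor and the minimum number of mutations. It does not perform the second, top-down pass that picks one character per ancestor. The function name `fitch`, the nested-tuple tree encoding, and the two example trees are illustrative choices.

```python
def fitch(tree, chars):
    """Return (optional_characters, mutations) for one alignment column.

    tree is a taxon name (leaf) or a pair (left, right) of subtrees;
    chars maps each taxon name to its one-letter character.
    """
    if isinstance(tree, str):
        return {chars[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, chars)
    right_set, right_cost = fitch(right, chars)
    common = left_set & right_set
    if common:
        # The children can agree, so no mutation is needed at this node.
        return common, left_cost + right_cost
    # The children must differ: keep both options and count one mutation.
    return left_set | right_set, left_cost + right_cost + 1


# Sixth column of the five-sequence alignment used in this lecture.
column = {"A": "T", "B": "T", "C": "T", "D": "C", "E": "C"}

tree_1 = (("A", "B"), ("C", ("D", "E")))
tree_2 = ((("A", "D"), "B"), ("C", "E"))

print(fitch(tree_1, column))  # ({'T'}, 1)
print(fitch(tree_2, column))  # ({'T'}, 2)
```

The same column costs one mutation on a tree that groups D with E and two on a tree that separates them, which is the kind of difference parsimony uses to rank trees.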
11
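Trying all trees
A ATGGCTATTCTTATAGTACG
B ATCGCTAGTCTTATATTACA
C TTCACTAGACCTGTGGTCCA
D TTGACCAGACCTGTGGTCCG
E TTGACCAGTTCTCTAGTTCG
[Figure: candidate trees scored by mutations per column, with a TOTALS column; numbers surviving in the transcript: 1, 1, 9, 0, 31, etc... 0, 1, 25, 0, 2, 0, 2, 28]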
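With only five sequences, "trying all trees" is feasible by brute force. The following is a rough sketch under stated assumptions: trees are rooted and binary, they are enumerated by inserting one taxon at a time on every branch, and each tree is scored with a Fitch-style count summed over all columns. The helper names are hypothetical, and the totals on the slide are not reproduced here.

```python
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def insertions(tree, leaf):
    """Yield every tree made by attaching leaf to one branch of tree."""
    yield (tree, leaf)  # attach above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_trees(taxa):
    """All rooted binary trees on the given taxa, as nested pairs."""
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [t for tree in trees for t in insertions(tree, leaf)]
    return trees


def column_cost(tree, chars):
    """Fitch-style pass: (optional characters, mutations) for one column."""
    if isinstance(tree, str):
        return {chars[tree]}, 0
    (lset, lcost), (rset, rcost) = (column_cost(sub, chars) for sub in tree)
    if lset & rset:
        return lset & rset, lcost + rcost
    return lset | rset, lcost + rcost + 1


def total_cost(tree, seqs):
    """Sum the per-column mutation counts over the whole alignment."""
    length = len(next(iter(seqs.values())))
    return sum(
        column_cost(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


trees = all_trees(sorted(SEQS))
best = min(trees, key=lambda t: total_cost(t, SEQS))
print(len(trees), "trees tried")
print("best:", best, "with", total_cost(best, SEQS), "mutations")
```

Because the mutation count does not depend on where the root is placed, several rooted trees tie for the minimum; `min` simply reports the first one it meets. The number of trees grows very quickly with the number of taxa, so this exhaustive approach only suits small exercises.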
12
  • Optimality criterion: The most-parsimonious tree is the one that requires the fewest number of evolutionary events (e.g., nucleotide substitutions, amino acid replacements) to explain the sequences.
  • Advantages:
      • Are simple, intuitive, and logical (many possible by pencil-and-paper).
      • Can be used on molecular and non-molecular (e.g., morphological) data.
      • Can tease apart types of similarity (shared-derived, shared-ancestral, homoplasy).
      • Can be used for character (can infer the exact substitutions) and rate analysis.
      • Can be used to infer the sequences of the extinct (hypothetical) ancestors.
  • Disadvantages:
      • Are simple, intuitive, and logical (derived from Medieval logic, not statistics!)
      • Can be fooled by high levels of homoplasy (i.e., back mutations, parallel mutations).
  • See Stewart (1993) for a simple explanation of parsimony analysis, and Swofford et al. (1996) for a detailed explanation of various parsimony methods.

13
Outbreak!
You get a phone call. An outbreak has occurred of a pulmonary disease of unknown origin. 250 cases have occurred in Rensselaer County and more are coming in every day. Doctors have identified a Gram-negative bacterium as the cause, but the disease does not respond to any of the common antibiotics!! Now they need your help. Doctors have sent 16S ribosomal DNA sequences from 8 patients, some of whom have already died.
Answer these questions:
What is the organism that caused the disease?
What drugs should be used?
Who caught it from whom? (most likely)
Warning!! There may be sequencing errors!
14
Outbreak
Copy the file "outbreak.rsf", which contains the 16S sequence data sent by the doctors, including brief labels. Read this file into SeqLab and do the exercises on the following slide.
15
Outbreak
Use Blast to search the databases to find out what organism produced the 16S sequences.
Use NCBI/PubMed to find out more about the disease. What drugs can be used against it?
Use ClustalW to align the sequences and GrowTree to build a phylogenetic tree.
Group the patients according to sequence distance. Which patients form clades?
Identify possible sequencing errors and edit them out.
Repeat the GrowTree calculation. Are the branches different now?
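Grouping patients by sequence distance starts from a table of pairwise distances between aligned sequences. The sketch below is a minimal illustration of that one step, not a substitute for ClustalW or GrowTree. It uses the fraction of differing positions as the distance, assumes the sequences are already aligned to equal length without gaps, and uses the five-sequence alignment from the parsimony slides as stand-in data because the contents of outbreak.rsf are not shown. The function names are hypothetical.

```python
from itertools import combinations

# Stand-in data: the aligned sequences A-E from the parsimony slides.
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def mismatches(s, t):
    """Number of aligned positions at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(s, t))


def distance_table(seqs):
    """Fraction of differing positions for every pair of sequences."""
    return {
        (a, b): mismatches(seqs[a], seqs[b]) / len(seqs[a])
        for a, b in combinations(sorted(seqs), 2)
    }


# Print pairs from closest to most distant.
for pair, dist in sorted(distance_table(SEQS).items(), key=lambda kv: kv[1]):
    print(pair, dist)
```

Pairs at the top of the printed list are candidates for belonging to the same clade. A single isolated mismatch between otherwise identical sequences is the kind of difference worth checking as a possible sequencing error.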