G53BIO - Bioinformatics Phylogenetic Trees presentation

About This Presentation

Transcript and Presenter's Notes

Title: G53BIO - Bioinformatics Phylogenetic Trees

1
G53BIO - BioinformaticsPhylogenetic Trees

Dr. Jaume Bacardit
http//www.cs.nott.ac.uk/jqb/G53BIO

Examples from D.A.Krane M.L. Raymers
Fundamental Concepts of Bioinformatics and from
D.W. Mounts Bioinformatics Sequence and Genome
Analysis
2
Outline

Introduction and motivation
Types of trees
Algorithms to construct trees
UPGMA
Fitch-Margoliash
Neighbour-Joining
Sources of information

3
Aims

Phylogeny has the goals of working out the
relationships among species, populations,
individuals or genes (taxa in a general sense)
The results of phylogenetic analysis are usually
presented as a collection of nodes and branches.
That is, a tree
In such tree, taxa that are closely related in an
evolutionary sense appear close to each other,
and taxa that are distantly related are in
different (far) branches of the trees
Phylogenetic trees are also important for
multiple sequence alignment
Various
Types of tree exists
Sources of information to generate the trees
Ways to generate the trees

Trees are usually bifurcating but it is also
possible to have multifurcating trees
Interpretation
At some point in the past an ancestral population
gave rise to more than 2 lineages or
Insufficient/erroneous data impedes the
discrimination of the true nature of the tree
thus coalescing various branches into one
multifurcating one.
Not only the topology of the trees convey
information, also the relative sizes of the
branches
Scaled trees branch length are proportional to
the differences between pairs of neighbouring
nodes.
Additive trees these are scaled trees in which
the physical length of the branches connecting
two nodes is an accurate representation of their
accumulated differences
Unscaled trees only convey kinship information

Phylogenetic Trees can be
Rooted A single node is designated as root and
it represents a common ancestor with a unique
path leading from it through evolutionary time to
any other node
Unrooted tree specifies only the nodes
interrelations but says nothing about the
direction in which evolution occurred.
Roots can be artificially assigned to unrooted
trees by means of an outgroup.
An outgroup is a species that have unambiguously
separated early from the other species being
considered
Example comparing Humas and Gorilas, Baboons
could be used as outgroups and the root would be
placed somewhere along the branch conecting
Baboons to the common ancestors for Humans and
Gorilas.

6
Rooted trees
Unrooted trees
7
(No Transcript)
8
Number of Rooted VS Unrooted Trees

NR (2n -3)!/ 2(n-2) (n 2)!
NU (2n 5)!/ 2(n-3) (n 3)!
But only one of these represents the true turn of
events!
Most phylogenetic trees generated with molecular
data are thus referred to as inferred trees.

9
Unweighted pair group method with arithmetic
meant (UPGMA)

The oldest tree reconstructions method (1960)
Requires a distance matrix, e.g.

E.G. dAB represents the distance between species
A B, while dAC is the distance between taxa A
C, etc
UPGMA
Cluster the two species with the smallest
distance putting then into a single group. Assume
that in the example dAB is the smallest, hence a
new group (AB) is created.
Recalculate the distance matrix with the new
group (AB) against C and D
d(AB)C 0.5 (dACdBC)
d(AB)D 0.5 (dADdBD)
With the new distance matrix repeat 1 until all
species have been grouped.

11
EXAMPLE
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
Fitch-Margoliash Algorithm

Main idea
Sequences are first combined into groups of three
and used to calculated branches length.
Sequences are added progresively
Branch lengths are assumed to be additive
Then join all sequences in pair, assess their
inferred distances and calculate a percentage
squared error
Repeat with different initialisation until
finding a good (small error) tree

16
(No Transcript)
17
(No Transcript)
18
Fitch-Margoliash Algorithm

From the distance matrix find the closest pair,
e.g., A B
Treat the rest of the sequences as a single
composite sequence. Calculate the average
distance from A to all of the other sequences and
B to all of the other sequences
Use these values to calculate the distances a and
b between A and the joining common node to B and
the same for B.
Take A and B as a single composite sequence AB,
calculate the average distances between AB and
each of the other sequences, and make a new
distance table from these values.
Indentify the next pair of most closely related
sequences and proceed as in step 1 to calculate
the next set of branch length.
When necessary substract extended branch lengths
to calculate lengths of intermediate branches.
Repeat the entire procedure starting with all
possible pairs of sequences A and B, A and C, A
and D, etc
Calculate the predicted distances between each
pair of sequences for each tree to find the tree
that best fits the original data

19
(No Transcript)
20
D
a
D and E are the closest sequences
c
A-C
b
E
a 4 b 6 c 29
Now lets recompute the complate distance matrix
21
C
a
b is not just for that segment, it represents the
complete distance from the connecting node to the
leaves
C and DE are the closet sequences
c
A-B
b
DE
a 9 b 10 c 31
Now lets recompute the complate distance matrix
C
9
31
A-B
5
4
D
6
E
22
B
Now we are in thee trivial case of 3 sequences
b
b is not just for that segment, it represents the
complete distance from the connecting node to the
leaves
c
A
a
CDE
a 29.5 b 10 c 12
A
C
9
10
This time we got the perfect tree. However, this
is not always the case. The algorithm should be
repeated with different initial pairings (who are
A and B) and then compare the difference between
the actual and predicted distnaces (from summing
the length of the branches)
20
5
4
12
D
B
6
E
23
Neighbour Joining Algorithm

Similar to Fitch-Margoliash except that sequences
are paired based on the effect of the pairing on
the sum of the branch lengths of the tree.
The general Neighbour Joining algorithm can be
downloaded from ftp.virginia.edu/pub/fasta/GNJ

24
The Algorithm

1. The distances between pair of objects are used
to calculate the sum of the branch length for a
tree that has no preferred pairing of sequences.

Decompose the star-like tree by combining pairs
of sequences. Using the same example as before
this gives

Each possible sequence pair is chosen and the sum
of the branch lengths of the corresponding tree
is calculated. For the example S_AB67.7,
S_BC81, S_CD76, S_DE70 plus six other
possibilities.
Choose the one with the lowest sum, in this case
S_AB.
Once the choice is made calculate the brachn
lengths a,b and the average distance from AB to
CDE using FM method
a d_AB (d_ACd_ADd_AE)/3
(d_BCd_BDd_DE)/3/2
(22 39.7 -41.7)/2
10
b d_AB (d_BCd_BDd_BE)/3
(d_ACd_ADd_AE)/3/2
(22 41.7 39.7)/2
12

6. Like in Fitch-Margoliash method A new
distance table with A and B forming a single
composite sequence is produced and the algorithm
is iterated from the beginning to find the next
sequence pair and the next branch lengths.

28
Sources of information

So far, all methods shown computed the distance
matrix between species from a set of aligned
sequences (DNA or Protein)
There are many more sources of information
Complete genomes
Restriction sites
Non-coding DNA regions

29
Tree of life constructed from all species for
which their complete genome has been sequenced
30
Summary

There are several methods to compute phylogenetic
trees, and sources of information
Need to be familiar with several of them to
appreciate their differences
There are various guiding mechanisms to choose
how to build the trees based on likelihood
functions and information theory
Get familiar with Phylip package as it is a
standard one
Other programs exist

Write a Comment

User Comments (0)

About PowerShow.com

G53BIO - Bioinformatics Phylogenetic Trees PowerPoint PPT Presentation