Title: Introduction to phylogeny
1. Modified Mincut Supertrees
Roderic Page, University of Glasgow
2. Tree of Life
- About 1.7 million species described.
- What we have so far:
- TreeBASE database (15,000 taxa)
- Ribosomal Database Project (RDP II) (20,000 sequences)
- The Tree of Life Project (11,000 taxa)
3. Recent interest in the Tree of Life
NSF-sponsored Tree of Life workshops (2000-2001)
US$10 million to construct a phylogeny for the 1.7 million described species of Life, announced February 15th 2002
Assembling the Tree of Life: Science, Relevance, and Challenges (AMNH, New York, May 2002)
European initiative (ATOL) under FP6
4. Problem: how to build the tree of life
- Solutions
- Find one or more magic markers that will allow us to recover the whole tree in one go (problems: combinability and complexity)
- Assemble a big tree from many smaller trees derived from many kinds of data (supertrees)
5. Tree terminology
[Figure: rooted tree on leaves a, b, c, d, labelled with the terms leaf, edge, internal node, cluster, and root. Clusters shown: {a,b}, {a,b,c}, {a,b,c,d}]
6. Nestings and triplets
[Figure: rooted tree T on leaves a, b, c, d]
Nestings
{a,b} <_T {a,b,c,d}
{b,c} <_T {a,b,c,d}
Triplets
(bc)d, also written bc|d
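A minimal sketch of how the triplets of a rooted tree can be listed. The nested-tuple tree encoding and the function names are assumptions made for illustration; the example tree is the one implied by the clusters {a,b}, {a,b,c}, {a,b,c,d}.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def clusters(tree):
    """Return the cluster (leaf set) of every internal node, root included."""
    if isinstance(tree, str):
        return []
    found = [frozenset(leaves(tree))]
    for child in tree:
        found.extend(clusters(child))
    return found

def triplets(tree):
    """Return rooted triplets as (frozenset({x, y}), z), read as (xy)z."""
    all_clusters = clusters(tree)
    result = set()
    for trio in combinations(sorted(leaves(tree)), 3):
        for z in trio:
            x, y = [t for t in trio if t != z]
            # (xy)z is displayed when some cluster holds x and y but not z
            if any(x in c and y in c and z not in c for c in all_clusters):
                result.add((frozenset((x, y)), z))
    return result

tree = ((("a", "b"), "c"), "d")  # assumed encoding of (((a,b),c),d)
for pair, z in sorted(triplets(tree), key=lambda t: (sorted(t[0]), t[1])):
    print("(%s)%s" % ("".join(sorted(pair)), z))
```

For this tree the listing includes (bc)d, the triplet shown on the slide.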
7. Supertree
[Figure: input trees T1 (leaves a, b, c) and T2 (leaves b, c, d) combined into a supertree on leaves a, b, c, d]
8. Some desirable properties of a supertree method (Steel et al., 2000)
- The supertree can be computed in polynomial time
- A grouping in one or more trees that is not
contradicted by any other tree occurs in the
supertree
9. MRP (Matrix Representation Parsimony)
                  1  2  3
Homo sapiens      1  1  1
Pan paniscus      1  1  1
Gorilla gorilla   1  1  0
Pongo pygmaeus    1  0  0
Hylobates         0  0  0
[Figure: tree of the five taxa with internal nodes labelled 1, 2, 3, matching the matrix columns]
- NP-hard
- Can generate many solutions
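A rough sketch of the matrix-coding step of MRP only (the parsimony search itself is not shown). The nested-tuple encoding, the function names, and the use of "?" for taxa missing from a tree are assumptions for illustration.

```python
def leaves(tree):
    """Return the set of leaf labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def proper_clusters(tree, is_root=True):
    """Return clusters of internal nodes other than the root, in preorder."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [leaves(tree)]
    for child in tree:
        found.extend(proper_clusters(child, is_root=False))
    return found

def mrp_matrix(trees):
    """One column per non-root cluster: 1 = in cluster, 0 = in tree but
    outside the cluster, ? = taxon absent from that tree."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        for cluster in proper_clusters(tree):
            for taxon in all_taxa:
                if taxon not in present:
                    rows[taxon].append("?")
                else:
                    rows[taxon].append("1" if taxon in cluster else "0")
    return rows

# Assumed encoding of the ape tree whose nodes 1, 2, 3 give the slide's columns
apes = (((("Homo sapiens", "Pan paniscus"), "Gorilla gorilla"),
         "Pongo pygmaeus"), "Hylobates")
for taxon, codes in mrp_matrix([apes]).items():
    print(taxon.ljust(16), " ".join(codes))
```

With several input trees the columns are simply concatenated, and the combined matrix is then analysed by parsimony.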
10. Aho et al.'s algorithm (OneTree)
- Aho, A. V., Sagiv, Y., Szymanski, T. G., and Ullman, J. D. 1981. Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10: 405-421.
- Input: set of rooted trees
- 1. If the set is compatible (i.e., the trees agree on a tree), output that tree.
- 2. If the set is not compatible, stop!
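The two steps can be sketched as a recursive procedure. This is an illustrative version that takes rooted triplets rather than whole trees as input (an assumption made to keep it short); the function name and the ((x, y), z) encoding of (xy)z are hypothetical.

```python
def one_tree(leaf_set, triplets):
    """Return a nested-tuple tree displaying every triplet, or None if the
    triplets are incompatible. A triplet ((x, y), z) means (xy)z."""
    leaf_set = sorted(leaf_set)
    if len(leaf_set) == 1:
        return leaf_set[0]
    if len(leaf_set) == 2:
        return tuple(leaf_set)

    # Keep only triplets whose three leaves are all in the current set
    present = set(leaf_set)
    relevant = [t for t in triplets if {t[0][0], t[0][1], t[1]} <= present]

    # Join x and y for every relevant (xy)z, using a small union-find
    parent = {x: x for x in leaf_set}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (x, y), _ in relevant:
        parent[find(x)] = find(y)

    components = {}
    for x in leaf_set:
        components.setdefault(find(x), []).append(x)

    # A connected graph means no split is possible: the input is incompatible
    if len(components) == 1:
        return None

    subtrees = []
    for members in sorted(components.values()):
        subtree = one_tree(members, relevant)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)

# Hypothetical inputs: (ab)c from one tree and (bc)d from another
print(one_tree("abcd", [(("a", "b"), "c"), (("b", "c"), "d")]))
# Conflicting inputs (ab)c and (bc)a give None
print(one_tree("abc", [(("a", "b"), "c"), (("b", "c"), "a")]))
```

The second call shows the "stop" case: as soon as the graph on the current leaf set is connected, no tree can be returned.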
11. Aho et al.'s OneTree algorithm
[Figure: input trees T1 (leaves a, b, c) and T2 (leaves b, c, d) and the resulting supertree]
12. Mincut supertrees
- Semple, C., and Steel, M. 2000. A supertree method for rooted trees. Discrete Appl. Math. 105: 147-158.
- Modifies OneTree by cutting the graph
- Requires rooted trees (no analogue of OneTree for unrooted trees)
- Recursive
- Polynomial time
13. Semple and Steel (2000)
[Figure: input trees T1 (leaves a, b, c, d, e) and T2 (leaves a, b, c, d) and the graph S_{T1,T2} built from them]
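14. Collapsing the graph (Semple and Steel mincut algorithm)
[Figure: the graph S_{T1,T2} with edge weights 1 and 2. The edge joining a and b has maximum weight; collapsing the maximum-weight edges Emax_{T1,T2} merges a and b into a single node a,b, giving S_{T1,T2}/Emax_{T1,T2}]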
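A sketch of the collapsing step, under stated assumptions: two leaves are joined when they share a cluster below the root of an input tree, the edge weight is the number of trees in which that happens, and parallel edges created by collapsing have their weights summed. The two trees are hypothetical stand-ins for T1 and T2, and the function names are illustrative.

```python
from itertools import combinations

def leaves(tree):
    """Return the set of leaf labels below a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def proper_clusters(tree, is_root=True):
    """Return clusters of internal nodes other than the root."""
    if isinstance(tree, str):
        return []
    found = [] if is_root else [leaves(tree)]
    for child in tree:
        found.extend(proper_clusters(child, is_root=False))
    return found

def build_graph(trees):
    """Edge {a, b} weight = number of trees nesting a and b below the root."""
    weights = {}
    for tree in trees:
        pairs = set()
        for cluster in proper_clusters(tree):
            pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
        for pair in pairs:
            weights[pair] = weights.get(pair, 0) + 1
    return weights

def collapse_max_edges(nodes, weights, k):
    """Merge the endpoints of every edge of weight k (present in all k trees)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for pair, w in weights.items():
        if w == k:
            a, b = pair
            parent[find(a)] = find(b)
    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    label = {v: frozenset(groups[find(v)]) for v in nodes}
    collapsed = {}
    for pair, w in weights.items():
        a, b = pair
        if label[a] != label[b]:  # drop loops, sum parallel edges
            key = frozenset((label[a], label[b]))
            collapsed[key] = collapsed.get(key, 0) + w
    return set(label.values()), collapsed

t1 = ((("a", "b"), "c"), ("d", "e"))  # hypothetical T1
t2 = (("a", "b"), ("c", "d"))         # hypothetical T2
graph = build_graph([t1, t2])
nodes, collapsed = collapse_max_edges(sorted(leaves(t1) | leaves(t2)), graph, k=2)
print(sorted(sorted(n) for n in nodes))
```

Here only the edge a-b has weight 2, so a and b end up in one merged node while c, d, and e stay separate.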
15. Cut the graph to get supertree
[Figure: the collapsed graph S_{T1,T2}/Emax_{T1,T2} (nodes a,b; c; d; e) is cut, and the resulting supertree on leaves a, b, c, d, e is shown]
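To illustrate the cutting step, the sketch below finds every minimum-weight cut of a small weighted graph by brute force, removes all edges that lie in some minimum cut, and returns the remaining connected components. Brute-force enumeration is a stand-in chosen for brevity, not the method used by Semple and Steel or by any real implementation; the example graph and its weights are hypothetical.

```python
from itertools import combinations

def min_cut_components(nodes, weights):
    """weights maps frozenset({u, v}) to a positive edge weight.
    Returns (min cut weight, edges in any min cut, components after removal)."""
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best, cut_edges = None, set()
    # Enumerate bipartitions; 'first' is always on one side, and taking
    # proper subsets of 'rest' keeps the other side non-empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            side = {first, *extra}
            crossing = {e for e in weights if len(e & side) == 1}
            total = sum(weights[e] for e in crossing)
            if best is None or total < best:
                best, cut_edges = total, set(crossing)
            elif total == best:
                cut_edges |= crossing
    # Remove the cut edges and gather connected components
    component = {v: {v} for v in nodes}
    for edge in weights:
        if edge in cut_edges:
            continue
        u, v = edge
        if component[u] is not component[v]:
            merged = component[u] | component[v]
            for x in merged:
                component[x] = merged
    return best, cut_edges, {frozenset(c) for c in component.values()}

# Hypothetical collapsed graph: merged node "ab" plus c, d, e
example = {
    frozenset(("ab", "c")): 2,
    frozenset(("c", "d")): 1,
    frozenset(("d", "e")): 1,
}
weight, removed, parts = min_cut_components(["ab", "c", "d", "e"], example)
print(weight, sorted(sorted(p) for p in parts))
```

Each component becomes one subtree below the current node, and the procedure is then repeated on each component with the input trees restricted to its leaves.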
16. My mincut supertree implementation
darwin.zoology.gla.ac.uk/rpage/supertree
- Written in C++
- Uses GTL (Graph Template Library) to handle graphs (formerly a free alternative to LEDA)
- Finds all mincuts of a graph faster than Semple and Steel's algorithm
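17. A counterexample: two input trees...
[Figure: two input trees sharing leaves a, b, and c, which are arranged differently in each; the remaining leaves are x1, x2, x3 and y1, y2, y3, y4]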
18. Mincut gives this (strange) result
- Disputed relationships among a, b, and c are resolved
- x1, x2, and x3 collapsed into a polytomy
[Figure: mincut supertree on leaves a, b, c, x1, x2, x3, y1, y2, y3, y4]
19. Problem: cuts depend on connectivity (in this example it is a function of tree size)
[Figure: graph on nodes a, b, c, x1, x2, x3, y1, y2, y3, y4]
20. So, mincut doesn't work
- But Semple and Steel said it did
- My program seems to work
- Argh!!! What is happening?
21. What mincut does and does not do
- Mincut supertree is guaranteed to include any nesting which occurs in all input trees
- Makes no claims about nestings which occur in only some of the trees
- Does exactly what it says on the tin
22. Modifying mincut supertree
- Can we incorporate more of the information in the input trees?
- Three categories of information
- Unanimous (all trees have that grouping)
- Contradicted (trees explicitly disagree)
- Uncontradicted (some trees have information that
no other tree disagrees with)
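23. Uncontradicted information
Assume we have k input trees
c = number of trees in which a and b co-occur
n = number of trees in which a and b are nested
[Figure: a tree in which a and b co-occur, and a tree in which a and b are nested]
c - n = 0 => uncontradicted (if c = k then unanimous)
c - n > 0 => contradicted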
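24. Uncontradicted information
Assume we have k input trees
c = number of trees in which a and b co-occur
n = number of trees in which a and b are nested
f = number of trees in which a and b are in a fan
[Figure: a tree in which a and b are in a fan, a tree in which a and b co-occur, and a tree in which a and b are nested]
c - n - f = 0 => uncontradicted (if c = k then unanimous)
c - n - f > 0 => contradicted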
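The rule above can be written as a small function. This is only a sketch of the counting rule as stated on the slide; the function name and the argument checks are assumptions, and the counts c, n, and f would have to be computed from the input trees separately.

```python
def classify_pair(c, n, f, k):
    """Classify the pair (a, b) from its counts over k input trees.

    c: trees in which a and b co-occur
    n: trees in which a and b are nested
    f: trees in which a and b are in a fan
    """
    if not 0 <= c <= k:
        raise ValueError("c must lie between 0 and k")
    if n < 0 or f < 0 or n + f > c:
        raise ValueError("n and f must be non-negative with n + f <= c")
    if c - n - f > 0:
        return "contradicted"
    return "unanimous" if c == k else "uncontradicted"

print(classify_pair(c=2, n=2, f=0, k=2))  # nested in both trees
print(classify_pair(c=1, n=1, f=0, k=2))  # nested in the only tree with both
print(classify_pair(c=2, n=1, f=0, k=2))  # one tree separates a and b
```

Setting f = 0 gives the simpler rule c - n = 0 that ignores fans.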
25. Classifying edges
[Figure: the graph S_{T1,T2} for the counterexample (nodes a, b, c, x1, x2, x3, y1, y2, y3, y4), with each edge classified as: uncontradicted; uncontradicted but adjacent to contradicted; contradicted]
26. Modified mincut
- Species a, b, and c form a polytomy
- x1, x2, and x3 resolved as per the input tree
[Figure: modified mincut supertree on leaves a, b, c, x1, x2, x3, y1, y2, y3, y4]
27. If no tree contradicts an item of information, is that information always in the supertree?
[Figure: four input trees, each a single triplet: (12)5, (23)5, (34)1, (45)1]
28. No! (Steel, Dress, and Böcker 2000)
- The four trees display (12)5, (23)5, (34)1, and (45)1
- No tree displays (IK)J or (JK)I for any (IJ)K above
- Triplets are uncontradicted, but cannot form a tree
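Both claims can be checked mechanically. The sketch below is illustrative (the function names and the ((i, j), k) encoding of (IJ)K are assumptions): it first confirms that no triplet has a rival in the set, then joins i and j for every triplet and counts connected components. A single component over all five leaves means no first split exists, so no tree can display all four triplets.

```python
triplets = [((1, 2), 5), ((2, 3), 5), ((3, 4), 1), ((4, 5), 1)]

def is_contradicted(triplet, all_triplets):
    """True if the set contains (IK)J or (JK)I for the given (IJ)K."""
    (i, j), k = triplet
    rivals = {(frozenset((i, k)), j), (frozenset((j, k)), i)}
    return any((frozenset(pair), out) in rivals for pair, out in all_triplets)

def count_components(leaf_set, all_triplets):
    """Join i and j for every (ij)k and count the connected components."""
    parent = {x: x for x in leaf_set}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), _ in all_triplets:
        parent[find(i)] = find(j)
    return len({find(x) for x in leaf_set})

print(any(is_contradicted(t, triplets) for t in triplets))  # False
print(count_components(range(1, 6), triplets))              # 1
```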
29. Future directions
- Improve handling of uncontradicted information
- Add support for constraints
- Visualising very big trees
- Better integration into phylogeny databases (www.treebase.org)
- darwin.zoology.gla.ac.uk/rpage/supertree