Title: SplitsTree
1SplitsTree
Vincent Moulton
The Linnaeus Centre for Bioinformatics
2Tree of Life.
(M. Sogin, D. Patterson, Tree of Life Web
Project)
3...or network?
4Why use networks to analyse evolution?
- Tree can be an inappropriate evolutionary model
- Visualization of complex evolutionary patterns
5Distance methods
6Quartets
Can represent a distance matrix on four taxa
A,B,C,D by weighting the following graph
7Example
8Tree-likeness
9δ-plots (Holland, Huber, Dress, Moulton, MBE 2002)
10Example
- 42 isolates of the yeast C. albicans
- Distances computed from AFLP bands
- Clonal (tree-like) vs sexual (non-tree-like)
reproduction?
11Quartet-mappings (Strimmer, von Haeseler, PNAS,
1997, Nieselt-Struwe, von Haeseler, MBE 2001)
12Example
13Detecting Recombination
14Highway Plots (Strimmer, Forslund, Holland,
Moulton, Gen. Bio, 2003)
15HIV genome scan
Data from Piyasirisilp et al., Journal of Virology,
2000
Generated with VisRD software package, Forslund,
Huson, Moulton, submitted
16How do we get networks from distance matrices?
?
17Splits
[Figure: diagram on taxa a, b, c, d, e]
A = {a, b}, B = {c, d, e}. Denote the split by A|B.
18Compatible splits
[Figure: diagram on taxa a, b, c, d, e]
{a, b}|{c, d, e} and {a, b, c}|{d, e} are
compatible
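As a minimal sketch of what compatibility means in practice: two splits A|B and C|D of the same taxon set are compatible when at least one of the four intersections A∩C, A∩D, B∩C, B∩D is empty. The function names below are illustrative only and are not taken from SplitsTree; the third split in the example is a hypothetical one added to show an incompatible pair.

```python
def is_compatible(split1, split2):
    """Return True if two splits of the same taxon set are compatible.

    Each split is a pair (A, B) of disjoint sets. A|B and C|D are
    compatible when at least one of A&C, A&D, B&C, B&D is empty.
    """
    (a, b), (c, d) = split1, split2
    return any(not (x & y) for x in (a, b) for y in (c, d))


def pairwise_compatible(splits):
    """Return True if every pair of splits in the list is compatible."""
    return all(
        is_compatible(s, t)
        for i, s in enumerate(splits)
        for t in splits[i + 1:]
    )


# The two splits from the slide.
s1 = ({"a", "b"}, {"c", "d", "e"})
s2 = ({"a", "b", "c"}, {"d", "e"})
# Hypothetical extra split that conflicts with s1.
s3 = ({"a", "c"}, {"b", "d", "e"})

print(is_compatible(s1, s2))
print(is_compatible(s1, s3))
```

The first call reports the slide's pair as compatible; the second reports the hypothetical pair as incompatible, because all four intersections are non-empty.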
19Fact (Buneman 1971 and others)
A collection of splits of a set of taxa in which
each pair of splits is compatible corresponds to
a unique tree labeled by the taxa.
20Problem
Given n taxa there are $2^{n-1} - 1$ possible
splits, so how should one find a relevant
collection of (compatible) splits?
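To see how quickly the count $2^{n-1} - 1$ grows, here is a small brute-force sketch that lists every split of a taxon set. The helper name `all_splits` is an assumption made for illustration; enumeration like this is only feasible for a handful of taxa, which is the point of the problem stated above.

```python
from itertools import combinations


def all_splits(taxa):
    """List every split A|B of `taxa` into two non-empty parts."""
    taxa = sorted(taxa)
    first, rest = taxa[0], taxa[1:]
    splits = []
    # Keep the first taxon on side A so each split is listed exactly once.
    # k counts the other taxa placed on side A; k < len(rest) keeps B non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            side_a = frozenset((first,) + extra)
            side_b = frozenset(taxa) - side_a
            splits.append((side_a, side_b))
    return splits


for n in range(2, 11):
    print(n, len(all_splits(range(n))), 2 ** (n - 1) - 1)
```

Each printed row shows n, the enumerated count, and the closed-form count side by side.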
21Splits from distances
[Figure: weighted network on taxa a, b, c, d, with edge weights between 1 and 4]
$\tfrac{1}{2}\,(ac + bd - (ab + cd)) = \tfrac{1}{2}\,(11 + 7 - (7 + 5)) = 3$
22Split decomposition (Bandelt and Dress, 1992)
[Figure: network on taxa A, B, C, D, E, F, G]
Isolation index of the split ABCD|EFG equals the
minimum box side length taken over all boxes
crossing the split
23Isolation index
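- Define, writing xy for the distance between taxa x and y,
- $\beta(ab|cd) = \max\{ac + bd,\ ad + bc\} - (ab + cd)$
- Given a split A|B define
- $\alpha(A|B) = \tfrac{1}{2} \min\{\beta(ab|cd) : a, b \in A,\ c, d \in B\}$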
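The isolation index can be sketched directly from its definition: take the quartet quantity max{ac + bd, ad + bc} − (ab + cd) for every choice of a, b on one side and c, d on the other (repeats allowed), and halve the minimum. The function names and the distance table below are assumptions for illustration; the distances come from a hypothetical four-taxon tree with pendant edges 1, 2, 3, 4 and an internal edge of length 3 separating {a, b} from {c, d}, not from the slides.

```python
from itertools import product


def make_dist(pairs):
    """Build a symmetric distance table with zero diagonal from {(x, y): value}."""
    dist = {}
    for (x, y), value in pairs.items():
        dist.setdefault(x, {x: 0.0})[y] = value
        dist.setdefault(y, {y: 0.0})[x] = value
    return dist


def quartet_beta(dist, a, b, c, d):
    """max{ac + bd, ad + bc} - (ab + cd) for the quartet ab|cd."""
    crossing = max(dist[a][c] + dist[b][d], dist[a][d] + dist[b][c])
    return crossing - (dist[a][b] + dist[c][d])


def isolation_index(dist, side_a, side_b):
    """Half the minimum quartet value over a, b in side_a and c, d in side_b."""
    return 0.5 * min(
        quartet_beta(dist, a, b, c, d)
        for a, b in product(side_a, repeat=2)
        for c, d in product(side_b, repeat=2)
    )


# Hypothetical tree metric (see the lead-in for the assumed edge lengths).
dist = make_dist({
    ("a", "b"): 3, ("c", "d"): 7,
    ("a", "c"): 7, ("b", "d"): 9,
    ("a", "d"): 8, ("b", "c"): 8,
})

print(isolation_index(dist, ("a", "b"), ("c", "d")))
print(isolation_index(dist, ("a", "c"), ("b", "d")))
```

On this tree-like example the split {a, b}|{c, d} recovers the internal edge length, while the conflicting split {a, c}|{b, d} gets index zero and would be discarded.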
24Fact (Bandelt/Dress 1991)
Given any distance on a set of taxa, the
collection of splits A|B with isolation index $\alpha(A|B) > 0$ is
weakly compatible.
25Consequence
The collection of splits that have positive isolation
index is not too large (at most n(n-1)/2 splits on n taxa)
and can be efficiently computed.
26Representing weighted splits by graphs
AB|CDEFG, weight 3
AF|BCDEG, weight 2
CF|ABDEG, weight 1
A,B,C,D,E,F,G
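One way to read a set of weighted splits is as a distance: the distance between two taxa is the sum of the weights of the splits that separate them, which is also the length of a shortest path between them in the corresponding split graph. The sketch below applies this to the three weighted splits listed above; the function name and the representation (one side of each split plus its weight) are assumptions made for illustration.

```python
# Each split is stored as (one side of the split, weight); the other
# side is implied by the full taxon set A..G.
weighted_splits = [
    (frozenset("AB"), 3.0),  # AB|CDEFG
    (frozenset("AF"), 2.0),  # AF|BCDEG
    (frozenset("CF"), 1.0),  # CF|ABDEG
]


def split_distance(x, y, splits):
    """Sum the weights of the splits that put x and y on opposite sides."""
    return sum(w for side, w in splits if (x in side) != (y in side))


print(split_distance("A", "B", weighted_splits))
print(split_distance("A", "C", weighted_splits))
print(split_distance("A", "F", weighted_splits))
```

A and C are separated by all three splits, so their distance is the total weight; A and B are separated only by AF|BCDEG.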
27Split graphs
[Figure: split graph with edge weights between 0.05 and 0.5]
d(B,D) = 1.35
28Summary
29Middle Earth
30Color circle
31Virus data DEN 1
32Systematic bias
[Figure: network on taxa A, B, C, D, E, F, G]
33Result: loss of resolution
(mitochondrial gene order data from early
branching Eukaryotes using normalized breakpoint
distance)
34Neighbor-Joining (NJ) (Saitou/Nei 1987)
- Happy to give a tree, whatever the data.
- The most widely used distance-based phylogenetic
method.
35What do you get if you cross SplitsTree with
Neighbor-Joining?
?
36Neighbor-Netting (Bryant and Moulton, MBE 2003)
37Example
38SplitsTree 3.2
Program for Windows written by Daniel Huson.
http://www-ab.informatik.uni-tuebingen.de/software/splits/welcome_en.html
39SplitsTree 4
All-new implementation of SplitsTree in Java,
developed by Daniel Huson and David Bryant.
http://www-ab.informatik.uni-tuebingen.de/software/splits/welcome_en.html
40Summary
- SplitsTree provides an easy-to-use tool for
exploratory data analysis.
- It provides a means to help visualize the
complexity in phylogenetic data.
- The extent and localization of incompatibilities
within split graphs may be alerting you to something
interesting about underlying biological processes.
- SplitsTree can help inform you of the
suitability of your data for building optimal bifurcating trees.