Introduction to phylogeny - PowerPoint PPT Presentation



1
Supertrees: Algorithms and Databases
Roderic Page, University of Glasgow (r.page_at_bio.gla.ac.uk)
DIMACS Working Group Meeting on Mathematical and Computational Aspects Related to the Study of The Tree of Life
2
What do we mean by the Tree of Life?
Our perception of what the tree is may affect what we view as being the interesting problems:
Supertrees, datatypes, databases, taxonomy
or
Tree algorithms, models, genomics, lateral gene transfer
3
Topics
  • Supertrees (MinCut)
  • Phylogenetic databases

4
Tree terminology
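[Figure: a rooted tree on leaves a, b, c, d, with labels pointing out a leaf, an edge, an internal node, a cluster, and the root. The clusters shown are {a,b}, {a,b,c}, and {a,b,c,d}.]
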

5
Nestings and triplets
[Figure: a rooted tree on leaves a, b, c, d.]
Nestings
{a,b} <_T {a,b,c,d}
{b,c} <_T {a,b,c,d}
Triplets
(bc)d
bc|d
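The clusters and triplets of a small tree can be enumerated mechanically. The sketch below assumes the tree from the terminology slide is (((a,b),c),d), which matches the clusters listed there, and represents it as nested Python tuples. The function names are illustrative and do not come from the talk.

```python
from itertools import combinations


def leaves(tree):
    """Leaf set of a tree written as nested tuples with string leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Every cluster (the leaf set below a node), including single leaves and the root."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {leaves(tree)}
    for child in tree:
        found |= clusters(child)
    return found


def triplets(tree):
    """Rooted triplets ab|c displayed by the tree, stored as (frozenset({a, b}), c)."""
    found = set()
    if isinstance(tree, str):
        return found
    for child in tree:
        inside = leaves(child)
        outside = leaves(tree) - inside
        # a and b sit together below this child, c hangs off another child,
        # so LCA(a, b) lies strictly below LCA(a, b, c).
        for a, b in combinations(sorted(inside), 2):
            for c in outside:
                found.add((frozenset([a, b]), c))
        found |= triplets(child)
    return found


# Assumed topology for the example tree: (((a,b),c),d)
tree = ((("a", "b"), "c"), "d")
tree_clusters = clusters(tree)
tree_triplets = triplets(tree)
```

Under this assumed topology the tree displays four triplets, one of which is bc|d, the triplet shown on the slide.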
6
Supertree
[Figure: input trees T1 (leaves a, b, c) and T2 (leaves b, c, d), and a supertree on leaves a, b, c, d.]
7
Some desirable properties of a supertree method (Steel et al., 2000)
  • The supertree can be computed in polynomial time
  • A grouping in one or more trees that is not
    contradicted by any other tree occurs in the
    supertree

8
Aho et al.'s algorithm (OneTree)
  • Aho, A. V., Sagiv, Y., Szymanski, T. G., and
    Ullman, J. D. 1981. Inferring a tree from lowest
    common ancestors with an application to the
    optimization of relational expressions. SIAM J.
    Comput. 10: 405-421.
  • Input: set of rooted trees
  • 1. If set is compatible (i.e., will agree on a
    tree), output that tree.
  • 2. If set is not compatible, stop!
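A minimal sketch of the idea behind this algorithm, working on rooted triplets rather than whole trees. It assumes each input tree has already been broken into triplets ab|c, written here as `((a, b), c)`. The two triplets in the example, ab|c and bc|d, are illustrative guesses at what T1 and T2 contribute, and the function name `build` is hypothetical.

```python
def build(leaf_set, triplets):
    """Return a nested-tuple tree displaying every triplet, or None if no such tree exists.

    triplets must be a list (it is re-read at every recursion level).
    """
    leaf_set = set(leaf_set)
    if len(leaf_set) == 1:
        return next(iter(leaf_set))
    if len(leaf_set) == 2:
        return tuple(sorted(leaf_set))

    # Graph on the leaves: join a and b for each triplet ab|c that lies inside leaf_set.
    neighbours = {x: set() for x in leaf_set}
    for (a, b), c in triplets:
        if a in leaf_set and b in leaf_set and c in leaf_set:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Connected components of that graph become the subtrees below the root.
    components, unseen = [], set(leaf_set)
    while unseen:
        stack = [unseen.pop()]
        component = set(stack)
        while stack:
            for y in neighbours[stack.pop()]:
                if y not in component:
                    component.add(y)
                    stack.append(y)
        unseen -= component
        components.append(component)

    if len(components) == 1:
        return None  # the triplets are not compatible: stop

    subtrees = []
    for component in sorted(components, key=min):
        subtree = build(component, triplets)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)


compatible = build("abcd", [(("a", "b"), "c"), (("b", "c"), "d")])
incompatible = build("abc", [(("a", "b"), "c"), (("b", "c"), "a")])
```

Here `compatible` is the tree (((a,b),c),d), while `incompatible` is `None` because ab|c and bc|a cannot both hold, which is the "stop!" case.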

9
Aho et al.'s OneTree algorithm
[Figure: input trees T1 (leaves a, b, c) and T2 (leaves b, c, d) and the resulting supertree.]
10
Mincut supertrees
  • Semple, C., and Steel, M. 2000. A supertree
    method for rooted trees. Discrete Appl. Math.
    105: 147-158.
  • Modifies OneTree by cutting graph
  • Requires rooted trees (no analogue of OneTree for
    unrooted trees)
  • Recursive
  • Polynomial time

11
[Figure: two input trees T1 and T2 with leaves drawn from a, b, c, d, e, and the graph S{T1, T2}. Semple and Steel (2000)]
12
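Collapsing the graph (Semple and Steel mincut algorithm)
[Figure: the graph S{T1, T2} on nodes a, b, c, d, e with weighted edges, one of them marked "This edge has maximum weight", and the collapsed graph S{T1, T2}/E^max in which a and b are merged into the single node a,b.]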
13
Cut the graph to get supertree
[Figure: the collapsed graph S{T1, T2}/E^max on nodes a,b, c, d, e is cut, giving the supertree on leaves a, b, c, d, e.]
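One collapse-and-cut step can be sketched in a few lines. This is not the implementation described in the talk. It assumes two leaves are joined in S{T1, T2} whenever they share a proper cluster in some tree, with the edge weight counting those trees. It finds minimum cuts by brute force over bipartitions, which is exponential and only suitable for toy inputs. The two input trees are hypothetical, since the slide's figures did not survive.

```python
from itertools import combinations


def leaves(tree):
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def build_graph(trees):
    """Edge weights of S_T: number of trees in which two leaves share a proper cluster."""
    weights = {}
    for tree in trees:
        for child in tree:  # subtrees hanging directly off the root
            for a, b in combinations(sorted(leaves(child)), 2):
                weights[(a, b)] = weights.get((a, b), 0) + 1
    return weights


def collapse(nodes, weights, k):
    """S_T / E^max: merge the endpoints of every edge of weight k, summing parallel edges."""
    group = {v: frozenset([v]) for v in nodes}
    for (a, b), w in weights.items():
        if w == k and group[a] != group[b]:
            merged = group[a] | group[b]
            for v in merged:
                group[v] = merged
    collapsed = {}
    for (a, b), w in weights.items():
        if group[a] != group[b]:
            key = frozenset([group[a], group[b]])
            collapsed[key] = collapsed.get(key, 0) + w
    return set(group.values()), collapsed


def min_cut_edges(nodes, collapsed):
    """Edges lying in at least one minimum-weight cut (brute force, assumes a connected graph)."""
    first, *rest = list(nodes)
    best, in_min_cut = None, set()
    for r in range(len(rest)):  # never take all of rest, so both sides stay non-empty
        for extra in combinations(rest, r):
            side = {first, *extra}
            cut = {e for e in collapsed if len(e & side) == 1}
            weight = sum(collapsed[e] for e in cut)
            if best is None or weight < best:
                best, in_min_cut = weight, set(cut)
            elif weight == best:
                in_min_cut |= cut
    return in_min_cut


def components(nodes, edges):
    """Leaf sets of the connected components left after the cut."""
    remaining, parts = set(nodes), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            v = stack.pop()
            for e in edges:
                if v in e:
                    (other,) = e - {v}
                    if other not in component:
                        component.add(other)
                        stack.append(other)
        remaining -= component
        parts.append(frozenset().union(*component))
    return parts


# Hypothetical input trees, chosen only to exercise the code.
trees = [((("a", "b"), "c"), ("d", "e")), (("a", "b"), ("c", "d"))]
all_leaves = set().union(*(leaves(t) for t in trees))
groups, collapsed = collapse(all_leaves, build_graph(trees), k=len(trees))
cut = min_cut_edges(groups, collapsed)
parts = components(groups, [e for e in collapsed if e not in cut])
```

For these inputs `parts` holds the leaf sets {a, b, c}, {d}, and {e}. The full method would recurse on each part with the input trees restricted to it.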
14
My mincut supertree implementation (darwin.zoology.gla.ac.uk/rpage/supertree)
  • Written in C
  • Uses GTL (Graph Template Library) to handle
    graphs (formerly a free alternative to LEDA)
  • Finds all mincuts of a graph faster than Semple
    and Steel's algorithm

15
A counterexample: two input trees...
[Figure: two input trees over the leaves a, b, c, x1, x2, x3, y1, y2, y3, y4, with conflicting placements of a, b, and c.]
16
Mincut gives this (strange) result
  • Disputed relationships among a, b, and c are
    resolved
  • x1, x2, and x3 collapsed into polytomy

[Figure: the mincut supertree on leaves a, b, c, x1, x2, x3, y1, y2, y3, y4.]
17
Problem: cuts depend on connectivity (in this example it is a function of tree size)
[Figure: graph on nodes a, b, c, x1, x2, x3, y1, y2, y3, y4.]
18
So, mincut doesn't work
  • But, Semple and Steel said it did
  • My program seems to work
  • Argh!!! What is happening?

19
What mincut does and does not do
  • Mincut supertree is guaranteed to include any
    nesting which occurs in all input trees
  • Makes no claims about nestings which occur in
    only some of the trees
  • Does exactly what it says on the tin

20
Modifying mincut supertree
  • Can we incorporate more of the information in the
    input trees?
  • Three categories of information
  • Unanimous (all trees have that grouping)
  • Contradicted (trees explicitly disagree)
  • Uncontradicted (some trees have information that
    no other tree disagrees with)

21
Uncontradicted information (assume we have k input trees)
c = number of trees in which a and b co-occur
n = number of trees in which a and b are nested
c - n = 0 ⇒ uncontradicted (if c = k then unanimous)
c - n > 0 ⇒ contradicted
22
Uncontradicted information (assume we have k input trees)
c = number of trees in which a and b co-occur
n = number of trees in which a and b are nested
f = number of trees in which a and b are in a fan
c - n - f = 0 ⇒ uncontradicted (if c = k then unanimous)
c - n - f > 0 ⇒ contradicted
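The counting rule translates directly into a small classifier. In this sketch the function name and the returned strings are illustrative, and the `c == 0` branch is an assumption added for completeness, since the slides do not cover pairs that never co-occur.

```python
def classify_pair(c, n, f, k):
    """Classify the pair (a, b) from counts over k input trees.

    c: trees in which a and b co-occur
    n: trees in which a and b are nested
    f: trees in which a and b are in a fan
    """
    if not (0 <= n and 0 <= f and n + f <= c <= k):
        raise ValueError("counts must satisfy 0 <= n + f <= c <= k")
    if c == 0:
        return "no information"  # assumption: case not covered on the slides
    if c - n - f > 0:
        return "contradicted"
    return "unanimous" if c == k else "uncontradicted"


examples = {
    "nested in both of two trees": classify_pair(c=2, n=2, f=0, k=2),
    "nested in the only tree containing both": classify_pair(c=1, n=1, f=0, k=2),
    "nested in one tree, not in the other": classify_pair(c=2, n=1, f=0, k=2),
}
```

Setting `f=0` throughout gives the simpler rule from the previous slide, c - n = 0.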
23
Classifying edges
[Figure: the graph S{T1, T2} on nodes a, b, c, x1, x2, x3, y1, y2, y3, y4, with each edge classified as one of:]
Uncontradicted
Uncontradicted but adjacent to contradicted
Contradicted
24
Modified mincut
  • Species a, b, and c form a polytomy
  • x1, x2, and x3 resolved as per the input tree

[Figure: the modified mincut supertree on leaves a, b, c, x1, x2, x3, y1, y2, y3, y4.]
25
If no tree contradicts an item of information, is
that information always in the supertree?
[Figure: four input trees displaying the triplets (12)5, (23)5, (34)1, and (45)1.]
26
No! (Steel, Dress, and Böcker 2000)
  • The four trees display (12)5, (23)5, (34)1, and
    (45)1
  • No tree displays (IK)J or (JK)I for any (IJ)K
    above
  • Triplets are uncontradicted, but cannot form a
    tree
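Both halves of this claim can be checked directly. The sketch below writes each triplet (IJ)K as `((I, J), K)`. It first confirms that no two of the four triplets resolve the same three leaves differently. It then builds the graph used by Aho et al.'s algorithm, with one edge I-J per triplet, and tests whether it is connected. A connected graph on three or more leaves means no tree displays all the triplets. The helper names are illustrative.

```python
from itertools import combinations

# The four triplets from the slide: (12)5, (23)5, (34)1, (45)1
triplets = [((1, 2), 5), ((2, 3), 5), ((3, 4), 1), ((4, 5), 1)]


def contradicts(t, u):
    """True when t and u resolve the same three leaves in different ways."""
    (a, b), c = t
    (d, e), g = u
    return {a, b, c} == {d, e, g} and {a, b} != {d, e}


def aho_graph_connected(leaf_set, triplets):
    """True when the graph with an edge a-b for each triplet ab|c is connected."""
    leaf_set = set(leaf_set)
    start = min(leaf_set)
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for (a, b), _ in triplets:
            for u, v in ((a, b), (b, a)):
                if u == x and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == leaf_set


pairwise_conflict = any(contradicts(t, u) for t, u in combinations(triplets, 2))
no_tree_exists = aho_graph_connected(range(1, 6), triplets)
```

The edges 1-2, 2-3, 3-4, and 4-5 form a path through all five leaves, so the graph is connected even though no pair of triplets conflicts.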

27
Future directions for supertrees
  • Improve handling of uncontradicted information
  • Add support for constraints
  • Visualising very big trees
  • Better integration into phylogeny databases
    (www.treebase.org)
  • darwin.zoology.gla.ac.uk/rpage/supertree

28
Supertree Challenge (proposed by Mike Sanderson
mjsanderson_at_ucdavis.edu)
The TreeBASE database currently contains over
1000 phylogenies with over 11,000 taxa among
them. Many of these trees share taxa with each
other and are therefore candidates for the
construction of composite phylogenies, or
"supertrees", by various algorithms. A
challenging problem is the construction of the
largest and "best" supertree possible from this
database. "Largest" and "best" may represent
conflicting goals, however, because resolution of
a supertree can be easily diminished by addition
of "inappropriate" trees or taxa.
29
It's a scandal
  • We cannot answer even the most basic question:
    what is the phylogeny for group x?
  • GenBank is currently the best phylogenetic
    database (!)
  • Can't even say how many species are in a given
    group
  • Little idea of who is doing what

31
Tree of Life (tolweb.org)
  • Provides text and images
  • Relies on extensive manual effort (e.g., writing
    text)
  • Can't do any computations with it
  • Limited research value

32
TreeBASE (www.treebase.org)
  • Relational database
  • Query by author, taxon, study number
  • Compute supertrees
  • Submit NEXUS data files

33
TreeBASE
34
TreeBASE and mincut supertrees
  • User selects two or more trees
  • Clicks on button
  • and script on darwin.zoology.gla.ac.uk is
    run to create supertree
  • Can view as PS, PDF, treefile, or in Java applet
    (ATV)

35
What's wrong with TreeBASE?
  • No consistency of taxon names (e.g., Human,
    Homo sapiens, Homo sapiens X54666-1)
  • No consistency of data names (e.g., gene names,
    morphological characters, etc.)

36
The same organism may have multiple names
37
www.all-species.org
The ALL Species Foundation is a non-profit
organization dedicated to the complete inventory
of all species of life on Earth within the next
25 years - a human generation.
Press Release November 13, 2002
Starting December 1, the ALL Species Foundation
will close its San Francisco office because of a
lack of funding for the Foundation.
38
The first challenge
  • We need a taxonomic name server that can resolve
    the name of any organism
  • This server needs to reconcile multiple
    classifications (e.g., GenBank, ITIS, etc.)
  • Must handle at least 1 million names, perhaps 100
    million

39
Second Challenge
  • How do we query trees?
  • Trees can be classifications or phylogenies

40
SQL Queries on Trees
  • Oracle SQL Transitive Closure Query (recursion)
  • Nested queries
  • Node path queries

41
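1. All ancestors of node A
[Figure: tree with node A marked.]
42
2. Least Common Ancestor (LCA) of A and B
[Figure: tree with nodes A and B marked.]
43
3. Spanning Clade of A and B
[Figure: tree with nodes A and B marked.]
44
4. Path Length from A to B
[Figure: tree with nodes A and B marked; the path length shown is 5.]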
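The four queries are easy to state on an in-memory tree, which helps pin down what the database versions must return. This sketch stores the tree as a child-to-parent dictionary. The tree itself and the function names are made up for illustration and do not correspond to the tree drawn on the slides.

```python
def ancestors(parent, node):
    """Ancestors of node, nearest first, ending at the root."""
    chain = []
    node = parent[node]
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


def lca(parent, a, b):
    """Least common ancestor: the first node on b's path to the root that is also on a's."""
    on_a_path = {a, *ancestors(parent, a)}
    for node in [b, *ancestors(parent, b)]:
        if node in on_a_path:
            return node
    return None


def spanning_clade(parent, a, b):
    """All nodes in the subtree rooted at LCA(a, b)."""
    top = lca(parent, a, b)
    return {n for n in parent if n == top or top in ancestors(parent, n)}


def path_length(parent, a, b):
    """Number of edges on the path from a to b."""
    def depth(n):
        return len(ancestors(parent, n))
    return depth(a) + depth(b) - 2 * depth(lca(parent, a, b))


# Hypothetical tree: root -> (x, C), x -> (y, B), y -> (A, D)
parent = {"root": None, "x": "root", "C": "root",
          "y": "x", "B": "x", "A": "y", "D": "y"}

a_ancestors = ancestors(parent, "A")
ab_lca = lca(parent, "A", "B")
ab_clade = spanning_clade(parent, "A", "B")
ab_distance = path_length(parent, "A", "B")
```

Walking parent pointers one node at a time is what a recursive (transitive closure) SQL query does, which is the motivation for considering alternative encodings such as node paths.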
46
Node paths
[Figure: tree in which each node is labelled with its path from the root: /1, /2, /1/1, /1/2, /1/1/1, /1/1/2, /1/2/1, /1/2/2, /1/1/1/1, /1/1/1/2.]
47
Node paths - selecting subtree
[Figure: the tree labelled with node paths, as on the "Node paths" slide.]
SELECT node WHERE (path LIKE '/1/1/%') AND
(path < '/1/10/')
48
Node paths - selecting subtree
[Figure: the tree labelled with node paths, as on the "Node paths" slide.]
SELECT node WHERE (path LIKE '/1/1/%') AND
(path < '/1/10/') AND (num_children = 0)
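The two subtree queries can be tried against an in-memory SQLite database. This is a sketch under several assumptions: the table name `nodes`, its column layout, and the `%` wildcard in the `LIKE` pattern are not stated on the slides, and the original talk mentions Oracle rather than SQLite. The paths are the ones shown on the node-path slides.

```python
import sqlite3

paths = ["/1", "/2", "/1/1", "/1/2", "/1/1/1", "/1/1/2",
         "/1/2/1", "/1/2/2", "/1/1/1/1", "/1/1/1/2"]

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per node, keyed by its path from the root.
conn.execute(
    "CREATE TABLE nodes (node INTEGER PRIMARY KEY, path TEXT UNIQUE, num_children INTEGER)"
)
for p in paths:
    # A child of p is a path that extends p by exactly one more component.
    children = sum(
        1 for q in paths
        if q.startswith(p + "/") and q.count("/") == p.count("/") + 1
    )
    conn.execute("INSERT INTO nodes (path, num_children) VALUES (?, ?)", (p, children))

# Everything below node /1/1
subtree = [row[0] for row in conn.execute(
    "SELECT path FROM nodes "
    "WHERE path LIKE '/1/1/%' AND path < '/1/10/' ORDER BY path"
)]

# Only the leaves below node /1/1
subtree_leaves = [row[0] for row in conn.execute(
    "SELECT path FROM nodes "
    "WHERE path LIKE '/1/1/%' AND path < '/1/10/' AND num_children = 0 ORDER BY path"
)]
conn.close()
```

Neither query needs recursion: selecting a subtree becomes a prefix match on a single indexed column.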
49
Node paths - LCA
[Figure: the tree labelled with node paths, as on the "Node paths" slide.]
LCA: common substring of the two node paths, starting from the left
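A minimal sketch of the prefix rule, assuming paths are written as on the slides and that the root itself is written `/`. It compares whole path components rather than raw characters, so that `/1/10` and `/1/1` are not mistaken for sharing the prefix `/1/1`. The function name is illustrative.

```python
def lca_path(p, q):
    """Path of the least common ancestor of the nodes with paths p and q."""
    common = []
    for x, y in zip(p.strip("/").split("/"), q.strip("/").split("/")):
        if x != y:
            break
        common.append(x)
    return "/" + "/".join(common)


sibling_subtrees = lca_path("/1/1/1/2", "/1/1/2")   # both lie below /1/1
across_the_root = lca_path("/1/2/1", "/2")          # only the root is shared
similar_digits = lca_path("/1/10", "/1/1")          # components differ: 10 vs 1
```

Comparing components is a design choice made here. A plain character-by-character prefix would only be safe if every component had the same width.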
50
What do we do now?
  • Set up a taxonomic name server (TNS)
  • Develop a phylogenetic genetic database linked to
    TNS, PubMed, GenBank, etc.
  • Develop easy ways to populate database (e.g.,
    from TreeBASE, GenBank, journal databases)
  • Develop standard set of tree queries
  • Deploy