Gene tree discordance and multi-species coalescent models

Transcript and Presenter's Notes


1
Gene tree discordance and multi-species
coalescent models
Noah Rosenberg December 21, 2007
James Degnan
David Bryant
2
Gene trees and species trees
Different genes may produce different inferences
about species relationships
3
[Figure: species tree with labels T2 and T3]
Coalescent model for evolution within species,
conditional on the species tree
Hudson (1983, Evolution); Tajima (1983,
Genetics); Nei (1987, Molecular Evolutionary
Genetics book); Pamilo and Nei (1988, Molecular
Biology and Evolution); Takahata (1989,
Genetics); Wu (1991, Genetics); Hudson (1992,
Genetics); Maddison (1997, Systematic Biology)
4
[Figure: species tree with labels T2 and T3]
Assumptions of the multispecies coalescent model
conditional on a species tree
1. Coalescences occur within species, with the
same rate for each lineage pair.
2. The rate of coalescence is proportional to the
number of pairs of lineages.
3. When species splits are encountered, lineages
from all groups descended from the split are
allowed to coalesce.
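The assumptions above can be sketched as a small simulation for the three-species tree ((AB)C). This is an illustrative sketch only: time is assumed to be measured in coalescent units (so a single pair of lineages coalesces at rate 1), one lineage is sampled per species, and the function names are hypothetical rather than taken from the presentation.

```python
import math
import random


def simulate_three_taxon_topology(T, rng):
    """Draw one gene tree topology on the species tree ((AB)C).

    T is the length of the internal branch (ancestor of A and B),
    assumed to be in coalescent units.
    """
    # On the internal branch only the A and B lineages are present:
    # a single pair, so the waiting time is exponential with rate 1.
    if rng.expovariate(1.0) < T:
        return "((AB)C)"
    # Otherwise three lineages reach the ancestral population, and every
    # pair has the same rate, so each pair is equally likely to join first.
    return rng.choice(["((AB)C)", "((AC)B)", "((BC)A)"])


def estimate_concordance(T, replicates, seed=0):
    """Monte Carlo estimate of the probability of the matching gene tree."""
    rng = random.Random(seed)
    hits = sum(
        simulate_three_taxon_topology(T, rng) == "((AB)C)"
        for _ in range(replicates)
    )
    return hits / replicates


if __name__ == "__main__":
    T = 1.0
    print("simulated:", estimate_concordance(T, 100_000))
    print("closed form 1 - (2/3)e^(-T):", 1 - (2 / 3) * math.exp(-T))
```

The simulated frequency should approach the known three-taxon value 1 - (2/3)e^(-T) as the number of replicates grows.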
5
Takahata and Nei (1985, Genetics); Tavaré (1984,
Theoretical Population Biology)
6
Probability of a concordant gene tree topology
Concordant gene tree
Discordant gene tree
The probability of a concordant gene tree is the sum of:
1. The probability that the gene tree is determined in the
2-species phase, $1 - e^{-T}$
2. One third of the probability that the gene tree is
determined in the ancestral phase, $(1/3)e^{-T}$
Hudson (1983, Evolution); Nei (1987, Molecular
Evolutionary Genetics); Tajima (1983, Genetics)
7
Probability of the matching gene tree ((AB)C)
Probability of a particular discordant gene tree
((BC)A)
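The equations for this slide did not survive in the transcript. As a worked restatement (not the slide's original typesetting), the two contributions listed on the previous slide combine as follows for the species tree ((AB)C), where $T$ is assumed to be the internal branch length in coalescent units:

```latex
\begin{align*}
P[((AB)C)] &= \left(1 - e^{-T}\right) + \tfrac{1}{3}\,e^{-T}
            = 1 - \tfrac{2}{3}\,e^{-T}, \\
P[((BC)A)] &= P[((AC)B)] = \tfrac{1}{3}\,e^{-T}.
\end{align*}
```

The three probabilities sum to 1, and the matching topology is never less probable than a discordant one; all three approach 1/3 as $T \to 0$.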
8
It would be desirable to have a general
computation of the probability that a particular
species tree topology with branch lengths gives
rise to a particular gene tree topology
9
Gene tree probabilities under the multispecies
coalescent model
A coalescent history gives the list of species
tree branches on which gene tree coalescences
occur.
[Figure: example trees on taxa A, B, C]
Consider a species tree S (topology and branch
lengths)
Consider a gene tree G (topology only)
J. H. Degnan and L. A. Salter, Evolution 59: 24-37 (2005)
10
The list of coalescent histories for an example
with five taxa
[Figure: a gene tree and a species tree on taxa A, B, C, D, E; labels 1-4 appear in the figure]
11
(No Transcript)
12
The number of coalescent histories
13
The number of coalescent histories for the
matching gene tree
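The counts for this slide are not in the transcript. As a hedged sketch, the matching case can be computed with a recursion recalled from the published literature on counting coalescent histories: writing A(T, m) for the number of "m-extended" histories of a tree T with subtrees L and R, A(T, m) is the sum over k = 1..m of A(L, k+1) · A(R, k+1), with A = 1 for a single leaf, and the number of coalescent histories is A(T, 1). The tuple encoding of trees and the function names below are assumptions made for illustration.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def extended_histories(tree, m):
    """A(tree, m) for the case where gene tree and species tree match.

    `tree` is a leaf name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return 1  # a single lineage has exactly one (empty) history
    left, right = tree
    return sum(
        extended_histories(left, k + 1) * extended_histories(right, k + 1)
        for k in range(1, m + 1)
    )


def matching_histories(tree):
    """Number of coalescent histories when the gene tree matches the species tree."""
    return extended_histories(tree, 1)


def caterpillar(n):
    """Build the n-taxon caterpillar ((((t1,t2),t3),...),tn) as nested tuples."""
    tree = "t1"
    for i in range(2, n + 1):
        tree = (tree, f"t{i}")
    return tree


def catalan(k):
    """The k-th Catalan number."""
    return comb(2 * k, k) // (k + 1)


if __name__ == "__main__":
    for n in range(2, 10):
        print(n, matching_histories(caterpillar(n)), catalan(n - 1))
    print("symmetric 4-taxon tree:", matching_histories((("A", "B"), ("C", "D"))))
```

For caterpillar trees the values follow the Catalan numbers, so the 9-taxon caterpillar gives 1430, which coincides with the figure quoted later for a caterpillar species tree.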
14
The number of coalescent histories for trees with
at most 5 taxa
15
Number of coalescent histories for special shapes
with n taxa
16
The number of coalescent histories for up to 11
taxa
17
Ratio of the largest and smallest number of
coalescent histories for n taxa
18
Which types of shapes have the most coalescent
histories?
Most
The number of coalescent histories for trees with
8 taxa
Least
19
Caterpillar-like shapes with n taxa, based on 4-
and 5-taxon subtrees
$C_{n-1}$
20
Largest values for caterpillar-like shapes based
on 7 and 8-taxon subtrees
21
Can a non-matching gene tree have more coalescent
histories?
Caterpillar species tree
1430 coalescent histories
1441 coalescent histories
22
Computing the probabilities of gene trees
What are the properties of the number of
coalescent histories?
23
For n > 3 taxa, can species trees be discordant
with the gene trees they are most likely to
produce?
24
The labeled history for a gene tree is its
sequence of coalescence events.
The two labeled histories below produce the same
labeled topology ((AB)(CD))
Randomly joining pairs of lineages leads to a
uniform distribution over the set of possible
labeled histories.
The number of labeled histories possible for four
taxa is $4!\,3!/2^{3} = 18$.
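The count of labeled histories can be checked by brute force. The sketch below enumerates every order in which pairs of lineages can be joined and tallies how many labeled histories lead to each labeled topology; the string encoding of topologies and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from math import factorial


def labeled_history_counts(lineages):
    """Map each labeled topology to its number of labeled histories.

    `lineages` is a tuple of strings; joining two lineages produces the
    string "(xy)" with x and y in sorted order, so equal topologies get
    equal strings regardless of the order of the joins.
    """
    if len(lineages) == 1:
        return Counter({lineages[0]: 1})
    counts = Counter()
    for i, j in combinations(range(len(lineages)), 2):
        first, second = sorted((lineages[i], lineages[j]))
        rest = tuple(x for k, x in enumerate(lineages) if k not in (i, j))
        counts += labeled_history_counts(rest + (f"({first}{second})",))
    return counts


def number_of_labeled_histories(n):
    """Closed form n!(n-1)!/2^(n-1) for n taxa."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


if __name__ == "__main__":
    counts = labeled_history_counts(("A", "B", "C", "D"))
    print(sum(counts.values()), number_of_labeled_histories(4))
    for topology, k in sorted(counts.items()):
        print(topology, k)
```

With four taxa, each symmetric topology such as ((AB)(CD)) is reached by 2 labeled histories and each asymmetric topology such as (((AB)C)D) by 1, so under a uniform distribution over labeled histories a symmetric topology is twice as probable as an asymmetric one.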
25
If the branch lengths of the species tree are
sufficiently short, coalescences will occur more
anciently than the species tree root.
26
Gene tree frequency distribution
Gene tree      Probability
((AB)(CD))     0.132
((AC)(BD))     0.094
((AD)(BC))     0.094
(((AB)C)D)     0.125
(((AB)D)C)     0.100
(((AC)B)D)     0.070
(((AC)D)B)     0.062
(((AD)B)C)     0.032
(((AD)C)B)     0.032
(((BC)A)D)     0.070
(((BC)D)A)     0.062
(((BD)A)C)     0.032
(((BD)C)A)     0.032
(((CD)A)B)     0.032
(((CD)B)A)     0.032
Species tree
Matching gene tree
27
Species tree is (((AB)C)D) but most likely gene
tree is ((AB)(CD))
[Figure: species tree with labels T2 and T3]
Species tree is (((AB)C)D)
Most likely gene tree is not (((AB)C)D)
A species tree topology produces anomalous gene
trees if branch lengths can be chosen so that the
most likely gene tree topology differs from the
species tree topology.
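The definition can be applied directly to the gene tree frequency distribution listed earlier in the talk. The sketch below assumes that distribution belongs to the species tree (((AB)C)D), as the surrounding slides suggest; the function name is hypothetical.

```python
# Gene tree probabilities as listed in the presentation (rounded values).
GENE_TREE_FREQUENCIES = {
    "((AB)(CD))": 0.132, "((AC)(BD))": 0.094, "((AD)(BC))": 0.094,
    "(((AB)C)D)": 0.125, "(((AB)D)C)": 0.100, "(((AC)B)D)": 0.070,
    "(((AC)D)B)": 0.062, "(((AD)B)C)": 0.032, "(((AD)C)B)": 0.032,
    "(((BC)A)D)": 0.070, "(((BC)D)A)": 0.062, "(((BD)A)C)": 0.032,
    "(((BD)C)A)": 0.032, "(((CD)A)B)": 0.032, "(((CD)B)A)": 0.032,
}
SPECIES_TREE = "(((AB)C)D)"  # assumed from the slides


def anomalous_gene_trees(frequencies, species_tree):
    """Topologies strictly more probable than the one matching the species tree."""
    threshold = frequencies[species_tree]
    return sorted(g for g, p in frequencies.items() if p > threshold)


if __name__ == "__main__":
    print(anomalous_gene_trees(GENE_TREE_FREQUENCIES, SPECIES_TREE))
```

Only ((AB)(CD)), at 0.132, exceeds the matching topology's 0.125, so it is the single anomalous gene tree for these branch lengths.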
28
Does the 4-taxon symmetric species tree topology
produce anomalous gene trees?
29
  • 3 species: no anomalous gene trees.
  • 4 species: asymmetric, but not symmetric, species
    trees have AGTs.
  • 5 or more species?

Probability of the concordant gene tree
Probability of a particular discordant gene tree
30
With 5 or more species, any species tree topology
produces at least one anomalous gene tree.
For n > 4, suppose a species tree topology is not
n-maximally probable. If its branches are short
enough, it produces AGTs that are n-maximally
probable.
31
With 5 or more species, any species tree topology
produces at least one anomalous gene tree.
Proof (continued)
Suppose a species tree topology is n-maximally
probable.
For n > 8, an inductive argument reduces the
problem to the case of n = 5, 6, 7, or 8.
For n = 5, 6, 7, or 8 taxa, it remains to show that
the n-maximally probable species tree topologies
produce AGTs.
32
With 5 or more species, any species tree topology
produces at least one anomalous gene tree.
Proof (continued)
For n = 5, the n-maximally probable species tree
topology produces AGTs.
33
With 5 or more species, any species tree topology
produces at least one anomalous gene tree.
Proof (continued)
For n = 5, 6, 7, or 8, the n-maximally probable
species tree topologies produce AGTs.
34
With 5 or more species, any species tree topology
produces at least one anomalous gene tree.
Proof (continued)
An inductive argument for n > 8 reduces the
problem to the case of n = 5, 6, 7, or 8.
For n > 8, one of the two most basal subtrees has
between 5 and n-1 taxa inclusive.
Choose branch lengths to produce an AGT for that
subtree, and make them long for the other subtree.
35
With 5 or more species, any species tree topology
produces at least one anomalous gene tree.
Proof (summary)
If the species tree topology is not n-maximally
probable, it has maximally probable AGTs.
By example, n-maximally probable species tree
topologies produce AGTs for n = 5, 6, 7, or 8.
For n > 8, induction reduces the problem to the
case of n = 5, 6, 7, or 8.
This completes the proof
36
Some properties of anomalous gene trees
37
Species tree
Gene tree
Anomalous gene trees can have the same unlabeled
shape as the species tree
38
There exist mutually anomalous sets of tree
topologies (wicked forests).
39
[Figure: species tree with labels T2, T3, and T4]
AGTs can occur if some but not all species tree
branches are short
40
Does the severity of AGTs increase with more taxa?
Maximal value for shared branch length that
still produces AGTs: 0.1568
41
Does the severity of AGTs increase with more taxa?
42
Number of AGTs for the 4-taxon asymmetric species
tree
43
Number of AGTs for 5-taxon species trees
44
Does the number of AGTs increase with more taxa?
45
What implications do gene tree probabilities have
for phylogenetic inference algorithms?
46
  • Most commonly observed gene tree topology

Statistically inconsistent in estimating the
species tree
Species tree
Estimated species tree
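The inconsistency can be illustrated by sampling loci from the gene tree frequency distribution given earlier in the talk and taking the most common topology as the estimate. This is a sketch under the assumption that loci are independent draws from that distribution and that the species tree is (((AB)C)D); the function name is hypothetical.

```python
import random
from collections import Counter

# Gene tree probabilities as listed in the presentation (rounded values).
GENE_TREE_FREQUENCIES = {
    "((AB)(CD))": 0.132, "((AC)(BD))": 0.094, "((AD)(BC))": 0.094,
    "(((AB)C)D)": 0.125, "(((AB)D)C)": 0.100, "(((AC)B)D)": 0.070,
    "(((AC)D)B)": 0.062, "(((AD)B)C)": 0.032, "(((AD)C)B)": 0.032,
    "(((BC)A)D)": 0.070, "(((BC)D)A)": 0.062, "(((BD)A)C)": 0.032,
    "(((BD)C)A)": 0.032, "(((CD)A)B)": 0.032, "(((CD)B)A)": 0.032,
}


def most_common_topology(frequencies, n_loci, seed=0):
    """Sample n_loci gene tree topologies and return the most frequent one."""
    rng = random.Random(seed)
    topologies = list(frequencies)
    weights = list(frequencies.values())
    draws = rng.choices(topologies, weights=weights, k=n_loci)
    return Counter(draws).most_common(1)[0][0]


if __name__ == "__main__":
    for n_loci in (100, 10_000, 500_000):
        print(n_loci, most_common_topology(GENE_TREE_FREQUENCIES, n_loci))
```

With few loci the winner varies from run to run, but as the number of loci grows the estimate settles on the topology with the largest probability, ((AB)(CD)), rather than on the assumed species tree (((AB)C)D). More data therefore reinforces the wrong answer.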
47
  • Estimated gene tree of concatenated sequence

Statistically inconsistent in estimating the
species tree
48
  • Maximum likelihood based on the frequency
    distribution of gene tree topologies

Statistically consistent even when anomalous gene
trees exist
Gene tree frequency distribution
Anomalous gene tree
Gene tree      Probability
((AB)(CD))     0.132
((AC)(BD))     0.094
((AD)(BC))     0.094
(((AB)C)D)     0.125
(((AB)D)C)     0.100
(((AC)B)D)     0.070
(((AC)D)B)     0.062
(((AD)B)C)     0.032
(((AD)C)B)     0.032
(((BC)A)D)     0.070
(((BC)D)A)     0.062
(((BD)A)C)     0.032
(((BD)C)A)     0.032
(((CD)A)B)     0.032
(((CD)B)A)     0.032
Species tree
Matching gene tree
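The mechanics of this approach can be sketched in the three-taxon case, where the gene tree probabilities have the closed form 1 - (2/3)e^(-T) for the matching topology and (1/3)e^(-T) for each discordant one. Three taxa have no anomalous gene trees, so this only illustrates how the likelihood is evaluated, not the anomalous case itself. The grid search, the function names, and the example counts are assumptions made for illustration.

```python
import math

TOPOLOGIES = ["((AB)C)", "((AC)B)", "((BC)A)"]


def gene_tree_probs(species_topology, T):
    """Three-taxon gene tree probabilities for internal branch length T."""
    discordant = math.exp(-T) / 3.0
    return {
        g: (1.0 - 2.0 * discordant) if g == species_topology else discordant
        for g in TOPOLOGIES
    }


def log_likelihood(counts, species_topology, T):
    """Multinomial log-likelihood of observed topology counts.

    The multinomial coefficient is omitted because it does not depend
    on the species tree.
    """
    probs = gene_tree_probs(species_topology, T)
    return sum(n * math.log(probs[g]) for g, n in counts.items())


def ml_species_tree(counts, grid_step=0.01, max_T=5.0):
    """Grid search over topology and branch length; returns (topology, T)."""
    steps = int(round(max_T / grid_step))
    candidates = (
        (log_likelihood(counts, s, k * grid_step), s, k * grid_step)
        for s in TOPOLOGIES
        for k in range(1, steps + 1)
    )
    _, topology, T = max(candidates)
    return topology, T


if __name__ == "__main__":
    observed = {"((AB)C)": 700, "((AC)B)": 150, "((BC)A)": 150}  # made-up counts
    print(ml_species_tree(observed))
```

The estimate uses the whole frequency distribution rather than only its mode, which is what lets the approach remain consistent when the most probable gene tree differs from the species tree.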
49
  • Consensus among gene tree topologies

    - Majority rule consensus
    - Greedy consensus
    - Rooted triple consensus (R*)
50
  • Tree obtained by agglomeration using minimum
    pairwise coalescence times across a large number
    of loci (GLASS tree)
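The agglomeration described above can be sketched as follows. The slide does not specify the linkage rule, so single linkage (joining the two clusters whose closest pair of species has the smallest minimum coalescence time) is an assumption here, as are the input format and the function names.

```python
from itertools import combinations


def min_coalescence_times(loci):
    """For each species pair, the smallest coalescence time over all loci.

    `loci` is a list of dicts mapping a pair of species names to the
    coalescence time of their lineages at that locus.
    """
    best = {}
    for locus in loci:
        for pair, time in locus.items():
            key = frozenset(pair)
            best[key] = min(time, best.get(key, float("inf")))
    return best


def glass_like_tree(species, loci):
    """Agglomerate species by minimum pairwise coalescence time (single linkage)."""
    dist = min_coalescence_times(loci)
    # Each cluster is (nested-tuple tree, set of member species).
    clusters = [(name, frozenset([name])) for name in species]

    def gap(pair):
        (_, members1), (_, members2) = pair
        return min(dist[frozenset((a, b))] for a in members1 for b in members2)

    while len(clusters) > 1:
        first, second = min(combinations(clusters, 2), key=gap)
        clusters = [c for c in clusters if c is not first and c is not second]
        clusters.append(((first[0], second[0]), first[1] | second[1]))
    return clusters[0][0]


if __name__ == "__main__":
    # Two made-up loci: the first supports (AB), the second supports (BC).
    loci = [
        {("A", "B"): 0.9, ("A", "C"): 1.5, ("B", "C"): 1.5},
        {("A", "B"): 2.0, ("A", "C"): 2.0, ("B", "C"): 1.2},
    ]
    print(glass_like_tree(["A", "B", "C"], loci))
```

Because lineages from two species cannot coalesce more recently than the species split, the minimum across loci is bounded below by the divergence time, which is the intuition behind using it as a distance.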

51
Summary
There exist algorithms for computing gene tree
probabilities on species trees
The number of coalescent histories increases
quickly - algorithmic improvements in gene tree
probability computations are likely possible
A species tree can disagree with the gene tree
that it is most likely to produce
This severe discordance only gets worse with more
taxa
HOWEVER, some algorithms can infer the correct
species tree even when gene tree discordance is
extreme
52
Acknowledgments: David Bryant, Mike DeGiorgio,
James Degnan, Randa Tao
National Science Foundation DEB-0716904