Inferring species histories - PowerPoint PPT Presentation

Description: Inferring species histories despite incomplete lineage sorting. L. Lacey Knowles ... (Maddison & Knowles, Syst. Biol. in press) ...

Slides: 50
Provided by: laceyk

Transcript and Presenter's Notes



1
Inferring species histories despite incomplete lineage sorting
L. Lacey Knowles
Dept. of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI
2
Inferring species histories despite incomplete lineage sorting
and related challenges to statistical phylogeographic inference
Figure: population tree
L. Lacey Knowles
Dept. of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI
3
The history of a species as a whole, or the evolutionary roles of individual genes?
sampled genealogies
monophyly of A & B
paraphyly of B with respect to A
paraphyly of A with respect to B
polyphyly of A & B
Rosenberg 2003
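The four categories above can be checked mechanically on a rooted gene tree with copies from two species. The sketch below is a minimal illustration under assumed conventions that are not from the presentation: the gene tree is written as nested tuples, each gene copy is labeled so that its first character names its species (A or B), and the function names are hypothetical.

```python
from typing import FrozenSet, List, Tuple, Union

# Hypothetical encoding: a rooted gene tree as nested tuples whose leaves are
# gene-copy labels such as "A1" or "B3"; the first character names the species.
Tree = Union[str, Tuple["Tree", ...]]


def collect_clades(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set of every node (tips included) to `out`; return this node's set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves


def is_monophyletic(species: str, clades: List[FrozenSet[str]],
                    all_leaves: FrozenSet[str]) -> bool:
    """True if some clade contains exactly the sampled copies of `species`."""
    copies = frozenset(leaf for leaf in all_leaves if leaf[0] == species)
    return copies in clades


def classify(tree: Tree) -> str:
    """Assign a two-species gene tree to one of the four categories on the slide."""
    clades: List[FrozenSet[str]] = []
    all_leaves = collect_clades(tree, clades)
    mono_a = is_monophyletic("A", clades, all_leaves)
    mono_b = is_monophyletic("B", clades, all_leaves)
    if mono_a and mono_b:
        return "monophyly of A & B"
    if mono_a:
        return "paraphyly of B with respect to A"
    if mono_b:
        return "paraphyly of A with respect to B"
    return "polyphyly of A & B"


print(classify((("A1", "A2"), ("B1", "B2"))))   # monophyly of A & B
print(classify(((("A1", "A2"), "B1"), "B2")))   # paraphyly of B with respect to A
print(classify((("A1", "B1"), ("A2", "B2"))))   # polyphyly of A & B
```

With two species in a rooted tree, if one species forms a clade and the other does not, the other is necessarily paraphyletic with respect to it, which is why two monophyly checks suffice.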
4
sampled genealogies
Modes of species divergence, or identification of genes involved in reproductive isolation?
divergence with NO gene flow
divergence with gene flow
Wu 2001
5
Modes of species divergence, or identification of genes involved in reproductive isolation?
sampled genealogies
speciation genes
Random loci
Ting et al. 2000
6
The history of species as a whole, or the evolutionary roles of individual genes?
sampled genealogies
7
sampled genealogies
History of speciation events (i.e., the species
phylogenetic history)
8
(No Transcript)
9
Challenges of inferring evolutionary
relationships for recently diverged species
10
Quantitative methods to yield species trees?
11
incorporate explicit models of lineage sorting
to infer species trees
Incorporation of explicit models of evolutionary
character change (e.g., process of nucleotide
substitution)
12
Even with considerable incomplete lineage sorting
13
Population relationships
(except for a higher risk that shared haplotypes might reflect gene flow rather than relationship or incomplete lineage sorting)
14
t = 100,000 generations, Ne = 100,000
  • Gene tree may not accurately reflect species tree
  • Gene tree shows some relation to the species tree

(2) Is there sufficient information for reconstructing the species tree despite widespread incomplete lineage sorting?
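For the simplest case (three species, one gene copy sampled per species, no gene flow), standard coalescent theory gives the probability that the gene tree topology matches the species tree as 1 - (2/3)e^(-T), where T is the internal branch length in coalescent units. The sketch below assumes that t is the internal branch between the two speciation events and that the ancestral population is diploid with constant size Ne, so T = t / (2 Ne); these assumptions and the function name are illustrative, not taken from the presentation.

```python
import math


def p_gene_tree_matches(t_generations: float, ne: float, ploidy: int = 2) -> float:
    """P(single-copy gene tree topology = species tree topology) for three species.

    Assumes no gene flow, one gene copy per species, and a constant ancestral
    effective size `ne` on the internal branch of length `t_generations`.
    Branch length is converted to coalescent units of ploidy * ne generations.
    """
    internal = t_generations / (ploidy * ne)
    # The two lineages fail to coalesce on the internal branch with prob exp(-internal);
    # the three lineages then join in one of three equally likely orders,
    # only one of which matches the species tree.
    return 1.0 - (2.0 / 3.0) * math.exp(-internal)


print(round(p_gene_tree_matches(100_000, 100_000), 3))    # 0.596
print(round(p_gene_tree_matches(1_000_000, 100_000), 3))  # 0.996
```

Under these assumptions, t = Ne leaves roughly a 40% chance that a single-copy gene tree disagrees with the species tree, while t = 10 Ne makes disagreement rare.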
15
Two simple approaches
species tree
gene tree
16
Two simple approaches
(1) minimize the number of deep coalescences
(2) shallowest divergence between species (after Takahata 1989): a cluster algorithm that groups species by their most similar contained sequences (the shallowest divergence). Under the expectation that the number of nucleotide differences between sequences corresponds to the order of interspecific coalescence, the most similar sequences between species will represent the shallowest coalescence (Takahata & Nei 1985).
Figure: species tree and gene tree
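The shallowest-divergence criterion can be sketched as a clustering loop. This is an illustration under stated assumptions, not the exact procedure used in the study: distances are uncorrected p-distances, the distance between two clusters is the minimum over their contained sequences (single linkage), ties go to the first pair found, and all names and example sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple, Union

# A (sub)tree label is either a species name or a tuple of two labels.
Label = Union[str, Tuple["Label", "Label"]]


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def shallowest_divergence_tree(seqs: Dict[str, List[str]]) -> Label:
    """Cluster species by their most similar contained sequences.

    Distance between two clusters = smallest distance between any sequence in one
    and any sequence in the other (single linkage on the shallowest divergence).
    """
    clusters: Dict[Label, List[str]] = {name: list(s) for name, s in seqs.items()}
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            d = min(p_distance(a, b) for a in clusters[x] for b in clusters[y])
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        merged = clusters.pop(x) + clusters.pop(y)
        clusters[(x, y)] = merged
    return next(iter(clusters))


example = {
    "A": ["AAAA", "AAAT"],
    "B": ["AATT", "TTTT"],
    "C": ["GGGG", "GGGA"],
}
print(shallowest_divergence_tree(example))  # ('C', ('A', 'B'))
```

A and B are joined first because their closest pair of sequences (AAAT and AATT) differs at only one of four sites, even though other A and B sequences are quite divergent.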
17
What sampling strategy will increase the probability of species tree - gene tree concordance?
How the accuracy of a species phylogeny is affected by:
(1) the number of loci used to estimate the phylogeny
(2) increasing the number of individuals sampled per locus versus increasing the number of loci
(3) the total sampling effort
18
Effects of sampling strategy on the consistency probability between species and gene trees differ depending on the shape of the species tree.
intraspecific coalescence
Takahata 1989
19
If t1 and t2 are small, increasing the number of gene copies (i.e., individuals) sampled increases the probability of concordance between gene and population trees.
Takahata 1989
20
Little effect of increasing the number of gene copies sampled if the time between the 1st and 2nd splits is long
sampled genes
Takahata 1989
21
Little effect of increasing the number of gene copies sampled if t1 is large and t2 small (sampling multiple independent loci doesn't solve the problem either)
sampled genes
Takahata 1989
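A rough illustration of the intuition behind the last point (an assumption about how the long interval acts, not Takahata's derivation): when a species' sampled copies have a long time to coalesce before reaching the ancestral population, under the standard (Kingman) coalescent they usually merge into a single lineage first, so extra copies add little. The sketch simulates that sorting on one branch; time is in coalescent units (assumed generations / 2Ne for a diploid species), and the function names are hypothetical.

```python
import random


def lineages_remaining(n: int, t: float, rng: random.Random) -> int:
    """Kingman coalescent on one branch: how many of n sampled lineages survive time t.

    Time is in coalescent units (assumed generations / (2 * Ne) for a diploid species).
    """
    k = n
    while k > 1:
        wait = rng.expovariate(k * (k - 1) / 2.0)  # time to the next coalescence
        if wait > t:
            break
        t -= wait
        k -= 1
    return k


def p_sorted_to_one(n: int, t: float, reps: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(all n copies coalesce to one lineage within t)."""
    rng = random.Random(seed)
    return sum(lineages_remaining(n, t, rng) == 1 for _ in range(reps)) / reps


for t in (0.1, 1.0, 5.0):
    print(t, [round(p_sorted_to_one(n, t), 2) for n in (2, 3, 9, 27)])
```

For n = 2 the exact value is 1 - e^(-t), which the estimate should approximate; on long branches the estimate approaches 1 for all sample sizes, meaning the copies typically enter the ancestral population as one lineage.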
22
What information can be extracted by simple methods?
(1) minimize the number of deep coalescences
(2) shallowest divergence between species
Figure: species tree and gene tree
These methods consider the process of lineage sorting, but the actual probabilities of incomplete lineage sorting are not quantified using a stochastic model.
Likelihood or Bayesian methods that incorporate stochastic models of both nucleotide substitution and lineage sorting processes
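Criterion (1) scores a candidate species tree by how many extra gene lineages the gene trees force it to carry. The sketch below counts them for one gene tree using a common reconciliation approach: map each gene-tree node to the most recent common ancestor of its species in the species tree, then count the lineages that pass the top of each species-tree branch. The tuple encoding, the rule that a copy's first character names its species, and the function names are hypothetical; summing over loci and searching over species trees for the minimum are left out.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical encoding: trees are nested tuples. Species-tree leaves are species
# names; gene-tree leaves are copy labels whose first character is the species.
Tree = Union[str, Tuple["Tree", ...]]


def collect_clusters(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the species set below every species-tree node to `out`."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clusters(c, out) for c in tree))
    out.append(leaves)
    return leaves


def deep_coalescences(species_tree: Tree, gene_tree: Tree) -> int:
    """Extra lineages: sum over species-tree branches of (lineages at branch top - 1)."""
    clusters: List[FrozenSet[str]] = []
    root = collect_clusters(species_tree, clusters)
    at_top: Dict[FrozenSet[str], int] = {c: 0 for c in clusters if c != root}

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        # Smallest species-tree cluster containing all the given species.
        return min((c for c in clusters if species <= c), key=len)

    def count_crossings(below: FrozenSet[str], above: FrozenSet[str]) -> None:
        # A gene lineage mapped to `below` whose parent maps to `above` passes the
        # top of every species branch s with below <= s and s strictly inside above.
        for s in at_top:
            if below <= s and not above <= s:
                at_top[s] += 1

    def walk(node: Tree) -> FrozenSet[str]:
        # Returns the species-tree cluster this gene node maps to (LCA mapping).
        if isinstance(node, str):
            return lca(frozenset([node[0]]))
        child_maps = [walk(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        for m in child_maps:
            count_crossings(m, here)
        return here

    top = walk(gene_tree)
    count_crossings(top, root)  # lineage above the gene-tree root, if it maps below the species root
    return sum(max(n - 1, 0) for n in at_top.values())


species = (("A", "B"), "C")
print(deep_coalescences(species, (("A1", "B1"), "C1")))               # 0
print(deep_coalescences(species, (("A1", "C1"), "B1")))               # 1
print(deep_coalescences(("A", "B"), (("A1", "B1"), ("A2", "B2"))))    # 2
```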
23
shallowest divergence approach
minimize the number of deep coalescences
simulated sequences
simulated gene trees
simulated species trees
reconstructed species trees
reconstructed gene trees
24
General approach
Simulate the processes of lineage sorting and nucleotide substitution.
Use the sequence data to attempt to infer the species tree.
Consider how underlying species relationships might be masked by:
(1) the shapes of genealogies (i.e., sorting of gene lineages by drift)
(2) the stochastic differences in the number of mutations that accumulate along different lineages
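Point (2) can be made concrete with a Poisson model of mutation: if mutations arise at rate μ per site per generation, a branch of t generations spanning L sites carries a Poisson(μLt) number of them, so equal-length branches can record quite different numbers of changes, and short branches may record none. The sketch assumes μ = 3 × 10^-8 and L = 1,000, borrowing the scaling factor and locus length from the simulation settings and assuming the scaling factor is a per-site, per-generation rate; the function names are hypothetical.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Poisson random variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def mutation_counts(mu: float, sites: int, generations: float, branches: int, seed: int = 0):
    """Simulated number of mutations on each of several equal-length branches."""
    rng = random.Random(seed)
    lam = mu * sites * generations  # expected mutations per branch
    return [poisson_draw(lam, rng) for _ in range(branches)]


MU, SITES = 3e-8, 1_000  # assumed per-site, per-generation rate; 1 kb locus
print(mutation_counts(MU, SITES, 100_000, branches=10))  # expected 3 per branch, but variable
print(round(math.exp(-MU * SITES * 10_000), 2))  # P(no mutation) on a 10,000-generation branch: 0.74
```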
25
length of sequence, total number of
individuals and loci sequenced, model of
sequence evolution
All simulations and inferences were done using Mesquite (Maddison & Maddison 2002) and PAUP* (Swofford 2002).
26
simulated species trees
Goal: a reasonable spectrum of topologies and branch lengths (rather than choosing a single species tree and assessing how well it can be reconstructed with many simulation replicates)
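One standard way to generate such a spectrum is a pure-birth (Yule) process, sketched below. That the simulations used this process, the tree depth, the stopping rule (one waiting time after the last speciation), and all names are assumptions for illustration.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.children: List["Node"] = []
        self.length = 0.0  # length of the branch above this node


def yule_tree(n_tips: int, depth: float, rng: random.Random) -> Node:
    """Pure-birth (Yule) species tree with n_tips >= 2, rescaled to root-to-tip `depth`."""
    if n_tips < 2:
        raise ValueError("need at least two tips")
    root = Node()
    live = [Node(), Node()]
    root.children = list(live)
    elapsed = 0.0
    while True:
        wait = rng.expovariate(len(live))  # speciation rate 1 per lineage
        elapsed += wait
        for node in live:
            node.length += wait
        if len(live) == n_tips:
            break  # stop one waiting time after the n-th lineage appears
        parent = live.pop(rng.randrange(len(live)))
        parent.children = [Node(), Node()]
        live.extend(parent.children)
    for i, tip in enumerate(live):
        tip.name = f"S{i + 1}"
    stack = [root]
    while stack:  # rescale every branch so the tree is `depth` deep
        node = stack.pop()
        node.length *= depth / elapsed
        stack.extend(node.children)
    return root


def tip_depths(node: Node, above: float = 0.0) -> List[float]:
    """Root-to-tip distances (all equal for an ultrametric tree)."""
    here = above + node.length
    if not node.children:
        return [here]
    return [d for child in node.children for d in tip_depths(child, here)]


def newick(node: Node) -> str:
    if not node.children:
        return f"{node.name}:{node.length:.3f}"
    return "(" + ",".join(newick(c) for c in node.children) + f"):{node.length:.3f}"


rng = random.Random(42)
tree = yule_tree(8, depth=10.0, rng=rng)  # e.g., depth in units of Ne generations (assumed)
print(newick(tree) + ";")
```

Each replicate yields a different topology and set of branch lengths, which is the point of simulating many species trees rather than one.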
27
simulated species trees
simulated gene trees
Within each species tree, 1, 3, 9 or 27 gene trees representing unlinked loci were simulated independently, with 1, 3, 9 or 27 gene sequences simulated for each locus per species.
Accuracy affected by:
(1) increasing the total sampling effort per species (i.e., 1, 3, 9 or 27 individuals sequenced, or 1, 3, 9 or 27 loci sequenced)
(2) increasing the number of individuals per locus versus the number of loci per species, for a given sampling effort
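Read as a design grid, the two axes combine so that designs with equal total effort can be compared directly. The sketch assumes total effort is the product of loci and gene copies per locus, capped at 27 (the largest level listed); the cap and the function name are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

LEVELS = (1, 3, 9, 27)  # loci per species and gene copies per locus, from the slide


def designs_by_effort(levels=LEVELS, max_effort: int = 27) -> Dict[int, List[Tuple[int, int]]]:
    """Group (loci, copies per locus) designs by total sampling effort = loci * copies."""
    grouped: Dict[int, List[Tuple[int, int]]] = {}
    for loci, copies in product(levels, repeat=2):
        effort = loci * copies
        if effort <= max_effort:
            grouped.setdefault(effort, []).append((loci, copies))
    return grouped


for effort, designs in sorted(designs_by_effort().items()):
    print(effort, designs)
# 1 [(1, 1)]
# 3 [(1, 3), (3, 1)]
# 9 [(1, 9), (3, 3), (9, 1)]
# 27 [(1, 27), (3, 9), (9, 3), (27, 1)]
```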

28
simulated sequences
simulated species trees
simulated gene trees
scaling factor of 3 × 10^-8
Intraspecific divergence: average of 0.1-1.8%
Interspecific divergence: average of 0.9% for t = 1 Ne; average of 3.9% for t = 10 Ne
1 kb per locus
HKY85, ts/tv = 3, gamma (α = 0.8)
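The ts/tv entry is ambiguous on its own: HKY85 is parameterized by a transition/transversion rate ratio κ, whereas the expected transition/transversion ratio also depends on base frequencies. If the value 3 is the expected ratio, the corresponding κ can be computed as below; that interpretation, the equal base frequencies, and the function names are assumptions.

```python
from typing import Dict

EQUAL = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def tstv_from_kappa(kappa: float, pi: Dict[str, float]) -> float:
    """Expected transition/transversion ratio under HKY85 with rate ratio kappa."""
    purines, pyrimidines = pi["A"] + pi["G"], pi["C"] + pi["T"]
    transitions = 2 * kappa * (pi["A"] * pi["G"] + pi["C"] * pi["T"])
    transversions = 2 * purines * pyrimidines
    return transitions / transversions


def kappa_from_tstv(ratio: float, pi: Dict[str, float]) -> float:
    """Inverse of tstv_from_kappa for fixed base frequencies."""
    purines, pyrimidines = pi["A"] + pi["G"], pi["C"] + pi["T"]
    return ratio * purines * pyrimidines / (pi["A"] * pi["G"] + pi["C"] * pi["T"])


print(kappa_from_tstv(3.0, EQUAL))  # 6.0 with equal base frequencies
```

With equal base frequencies the expected ratio is κ/2, because there are twice as many possible transversions as transitions.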
29
shallowest divergence criteria
infer species tree
minimize the number of deep coalescences
simulated sequences
simulated gene trees
simulated species trees
parsimony
30
accuracy assessment
31
Number of deep coalescences
Gene copies per locus | 1 Ne | 10 Ne
1 | 7.6 | 1.8
3 | 28.7 | 6.9
9 | 63.2 | 14.7
27 | 114.4 | 25.7
Lots of discord (i.e., our simulated data should reflect well the challenges faced in reconstructing phylogeny near the species level)
32
Accuracy averaged over the 500 replicate simulated species trees
(DC = minimize deep coalescences; SD = shallowest divergence; each cell gives DC / SD; "-" = combination not sampled)

First panel
Gene copies | 1 locus | 3 loci | 9 loci | 27 loci
1 | 0.26 / 0.27 | 0.34 / 0.33 | 0.42 / 0.43 | 0.63 / 0.61
3 | 0.47 / 0.53 | 0.58 / 0.64 | 0.65 / 0.73 | -
9 | 0.59 / 0.60 | 0.65 / 0.74 | - | -
27 | 0.64 / 0.56 | - | - | -

Second panel
Gene copies | 1 locus | 3 loci | 9 loci | 27 loci
1 | 0.76 / 0.73 | 0.79 / 0.79 | 0.86 / 0.85 | 0.89 / 0.89
3 | 0.79 / 0.78 | 0.82 / 0.84 | 0.87 / 0.88 | -
9 | 0.80 / 0.79 | 0.85 / 0.86 | - | -
27 | 0.82 / 0.84 | - | - | -

Average accuracy greater, as expected.
An accuracy of 0.60 is reasonably successful, given that the shared partition measure is sensitive to minor changes in tree structure (approximately equivalent to a single terminal taxon being out of place).
33
How does total sampling effort affect accuracy? (t = 1 Ne)
  • The curve marked "random" shows the expected distribution of the accuracy measure when comparing two randomly simulated trees
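The accuracy measure and its random baseline can be sketched together. The sketch assumes the shared partition measure is the fraction of the true tree's nontrivial clades (rooted, excluding single tips and the root) that also appear in the estimated tree, and that "randomly simulated trees" can be approximated by repeatedly joining randomly chosen pairs of subtrees, which yields Yule-distributed topologies; both are assumptions, and the names are hypothetical.

```python
import random
from typing import FrozenSet, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def nontrivial_clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of internal nodes other than the root."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        leaves = walk(node[0]) | walk(node[1])
        found.append(leaves)
        return leaves

    root = walk(tree)
    return {c for c in found if c != root}


def shared_fraction(true_tree: Tree, estimated_tree: Tree) -> float:
    """Fraction of the true tree's nontrivial clades recovered (needs >= 3 tips)."""
    true_clades = nontrivial_clades(true_tree)
    return len(true_clades & nontrivial_clades(estimated_tree)) / len(true_clades)


def random_topology(taxa: List[str], rng: random.Random) -> Tree:
    """Repeatedly join two randomly chosen subtrees (gives Yule-distributed topologies)."""
    pool: List[Tree] = list(taxa)
    while len(pool) > 1:
        i, j = sorted(rng.sample(range(len(pool)), 2))
        right = pool.pop(j)  # pop the larger index first so i stays valid
        left = pool.pop(i)
        pool.append((left, right))
    return pool[0]


def random_baseline(n_taxa: int = 8, reps: int = 2000, seed: int = 0) -> List[float]:
    """Accuracy scores obtained when both trees are random."""
    rng = random.Random(seed)
    taxa = [f"S{i}" for i in range(1, n_taxa + 1)]
    return [shared_fraction(random_topology(taxa, rng), random_topology(taxa, rng))
            for _ in range(reps)]


true_tree = ((("A", "B"), "C"), "D")
print(shared_fraction(true_tree, ((("A", "C"), "B"), "D")))  # 0.5
scores = random_baseline()
print(round(sum(scores) / len(scores), 2))  # mean accuracy expected by chance
```

Moving a single tip (C swapped with B) already halves the score for this four-taxon tree, which illustrates why the measure is sensitive to minor changes in tree structure.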

34
Tradeoff between sequencing more loci versus more individuals
Figure (10 Ne): accuracy of species tree inference plotted against the ratio of loci to individuals in each species
35
In summary
Actual information content relevant to inferring phylogenetic history shifts from inter- to intra-locus sequence data as the amount of incomplete lineage sorting increases:
the many gene lineages that independently reach as deep as the species divergence (because of failure to sort) can each provide independent clues to species relationships (i.e., each coalescence with a sister species' genes provides extra evidence that the species are sisters)
36
In summary
Gene trees retain some signal of phylogenetic history despite widespread incomplete lineage sorting, but recovering it requires:
(1) specific methodology that considers the genetic process that results in incomplete lineage sorting
(2) careful consideration of the tradeoffs in sampling design that affect whether species history will be accurately estimated
37
Future directions
  • incorporate stochastic models of both nucleotide substitution and lineage sorting (e.g., likelihood or Bayesian methods)
  • thoroughly explore the parameter space
  • mutation rates? (How does insufficient variation degrade accuracy?)
38
Gene trees and species trees
gene trees contained within a species tree
Speciation
39
Statistical phylogeography
difficulties with inferring population
relationships
difficulties with inferring population history
model selection (many potential processes can
be considered)
assessing the error in the inferences about
historical process and choice of alternative
hypotheses to be considered
knowlesl_at_umich.edu
40
Methods for parameter estimation are well
developed statistically, but typically pay little
attention to geographical history
Methods that seek to reconstruct phylogeographic
history are able to consider many alternative
geographical scenarios, but are primarily
nonstatistical (i.e., they make inferences about
particular biological processes without explicit
reference to stochastically derived expectations)
Species history may not be easily inferred from a gene genealogy.
Nonetheless, many papers have been published with interpretations of past events based on gene trees or gene networks, without any assessment of the error in the inferences about historical process.
41
Importance of considering the stochasticity of population genetic processes and assessing the confidence of phylogeographic conclusions
Highlight some of the unsolved challenges in phylogeography
Illustrate by example with nested-cladistic analysis (NCA):
(1) Test for an association between patterns of genetic variation and geography
(2) Make inferences about the processes underlying the patterns (considers many biological processes: recurrent gene flow, and historical events such as allopatric divergence, past fragmentation, expansion, long-distance dispersal, colonization with isolation by distance)
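Step (1) amounts to asking whether clades are distributed non-randomly across sampling locations. A generic way to test that is a permutation contingency test, sketched below with Pearson's chi-square statistic; it is not an implementation of NCA's nested clade distance statistics or its inference key, and the example data and function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def chi_square(pairs: Sequence[Tuple[str, str]]) -> float:
    """Pearson chi-square statistic for a clade-by-location contingency table."""
    n = len(pairs)
    row_totals = Counter(clade for clade, _ in pairs)
    col_totals = Counter(loc for _, loc in pairs)
    cells = Counter(pairs)
    stat = 0.0
    for clade, r in row_totals.items():
        for loc, c in col_totals.items():
            expected = r * c / n
            stat += (cells[(clade, loc)] - expected) ** 2 / expected
    return stat


def permutation_test(clades: List[str], locations: List[str], reps: int = 2000, seed: int = 0):
    """Shuffle locations among individuals to build the null distribution of chi-square."""
    rng = random.Random(seed)
    observed = chi_square(list(zip(clades, locations)))
    shuffled = list(locations)
    at_least_as_large = 0
    for _ in range(reps):
        rng.shuffle(shuffled)
        if chi_square(list(zip(clades, shuffled))) >= observed - 1e-12:
            at_least_as_large += 1
    p_value = (at_least_as_large + 1) / (reps + 1)
    return observed, p_value


clades = ["x"] * 10 + ["y"] * 10
locations = ["north"] * 10 + ["south"] * 10
print(permutation_test(clades, locations))  # chi-square 20.0 and a small p-value
```

A significant association only establishes step (1); as the following slides argue, the inference of which process produced it is where the stochasticity of the genealogy matters most.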
42
Knowles & Maddison 2002
simulate under a known history of allopatric
divergence with no gene flow
analyze with NCA and compare accuracy of results
43
Problem of messy gene trees
  • one mutational step as the criterion for constructing nested clades
44
Problem of messy gene trees
Knowles & Maddison 2002
Nested clades don't reflect populations (or relationships of populations)!
45
How important is it to consider the stochasticity of population genetic processes and assess the confidence of phylogeographic conclusions?
Knowles & Maddison 2002
Was allopatric divergence inferred by NCA?
46
Knowles & Maddison 2002
Statistical phylogeographic approach
Can we reject alternative models of population history?
Using a summary statistic approach and coalescent simulations, the history of allopatric divergence was accepted over a fragmentation model and a colonization model only 40% and 30% of the time, respectively.
Fundamental difference between the failures of NCA and of coalescent simulations: with the statistical phylogeographic approach, the conclusion was that we were unable to distinguish among alternative phylogeographic hypotheses.
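A summary statistic approach needs a statistic that can be computed on each gene tree and compared with its distribution under each candidate history. One widely used example in statistical phylogeography is Slatkin & Maddison's s, the minimum number of changes in population label over the gene tree, computed by Fitch parsimony; whether it is the statistic used in the study above is an assumption here. A minimal sketch for rooted binary gene trees written as nested tuples (the encoding and names are hypothetical):

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a rooted binary gene tree as nested 2-tuples of copy labels.
Tree = Union[str, Tuple["Tree", "Tree"]]


def parsimony_s(tree: Tree, population_of: Dict[str, str]) -> int:
    """Minimum number of changes in population label over the tree (Fitch parsimony)."""

    def visit(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {population_of[node]}, 0
        left_states, left_cost = visit(node[0])
        right_states, right_cost = visit(node[1])
        common = left_states & right_states
        if common:
            return common, left_cost + right_cost
        return left_states | right_states, left_cost + right_cost + 1

    return visit(tree)[1]


pops = {"a1": "P1", "a2": "P1", "b1": "P2", "b2": "P2"}
print(parsimony_s((("a1", "a2"), ("b1", "b2")), pops))  # 1: populations sorted
print(parsimony_s((("a1", "b1"), ("a2", "b2")), pops))  # 2: populations intermixed
```

In a simulation test, s from the observed gene tree would be compared with its distribution across gene trees simulated under each candidate history; overlapping distributions are what make alternative histories hard to distinguish.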
47
Statistical phylogeography
difficulties with inferring population
relationships
difficulties with inferring population history
model selection (many potential processes can
be considered)
assessing the error in the inferences about
historical process and choice of alternative
hypotheses to be considered
BUT... NCA is still widely used among empiricists!
48
How to respond to empiricists' complaints about how to select a specific historical model? (Much of the appeal of NCA is the notion that it does not make a priori assumptions about the species history, i.e., it is model-free.)
Is there really any place for an approach like Templeton's NCA?
Could something like NCA be used to identify hypotheses to be tested?
49
Challenges and future developments in statistical
phylogeography
difficulties with inferring population
relationships
Model selection is no trivial matter
costs associated with increased versatility of
methods that can accommodate a diverse array of
processes may offset any potential gains
With increased model complexity, not only do more parameters have to be estimated, but the utility of complex models is also limited by the extent to which the models' expectations differ
An enormous number of alternative histories could be considered in statistical phylogeographic tests