Title: Combinatorics
1. Combinatorics & the Coalescent (26.2.02)
Tree Counting & Tree Properties. Basic
Combinatorics. Allele distribution. Polya
Urns & Stirling Numbers. Number of ancestral
lineages after time t. Inclusion-Exclusion
Principle.
2. A set of realisations (from Felsenstein)
3. Binomial Numbers
[Figure: items 1, 2, 3, 4, 5, ..., n]
Binomial Expansion
Special Cases
4. [Figure labels: n-1, n-r-1, r, n-r]
Binomial numbers (rows n, columns k) with row sums:
n \ k   0   1   2   3   4   5   6   7    Sum
0       1                                1
1       1   1                            2
2       1   2   1                        4
3       1   3   3   1                    8
4       1   4   6   4   1                16
5       1   5   10  10  5   1            32
6       1   6   15  20  15  6   1        64
7       1   7   21  35  35  21  7   1    128
5. The Exponential Distribution
The Exponential Distribution on the positive reals: Exp(a).
Density: $f(t) = a e^{-at}$, $P(X > t) = e^{-at}$
Properties: X ~ Exp(a), Y ~ Exp(b), independent.
i. $P(X > t_2 \mid X > t_1) = P(X > t_2 - t_1)$ ($t_2 > t_1$)
ii. $E(X) = 1/a$
iii. $P(X < Y) = a/(a + b)$
iv. $\min(X, Y)$ ~ Exp(a + b)
v. The sum of k iid $X_i$ is $\Gamma(k, a)$ distributed
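Properties iii and iv can be checked by simulation. The following is a minimal sketch using the standard library's `random.expovariate`; the function name `simulate_exponential_properties`, the rates a = 1 and b = 3, the seed and the number of draws are illustrative assumptions, not part of the slides.

```python
import random


def simulate_exponential_properties(a, b, n_draws=200_000, seed=1):
    """Estimate P(X < Y) and E[min(X, Y)] for X ~ Exp(a), Y ~ Exp(b)."""
    rng = random.Random(seed)
    x_first = 0
    min_total = 0.0
    for _ in range(n_draws):
        x = rng.expovariate(a)  # rate parameter a, mean 1/a
        y = rng.expovariate(b)  # rate parameter b, mean 1/b
        if x < y:
            x_first += 1
        min_total += min(x, y)
    return x_first / n_draws, min_total / n_draws


a, b = 1.0, 3.0
p_x_first, mean_min = simulate_exponential_properties(a, b)
print("P(X < Y) estimate:", p_x_first, "theory:", a / (a + b))
print("E[min(X, Y)] estimate:", mean_min, "theory:", 1 / (a + b))
```

The estimates should agree with a/(a + b) and 1/(a + b) up to sampling noise.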
6. The Standard Coalescent
Two independent processes. Continuous: exponential waiting times. Discrete:
choosing pairs to coalesce.
Waiting                Coalescing
{1,2,3,4,5}
                       (1,2)--(3,(4,5))
{1,2}{3,4,5}
                       1--2
{1}{2}{3,4,5}
                       3--(4,5)
{1}{2}{3}{4,5}
                       4--5
{1}{2}{3}{4}{5}
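The two processes can be combined in a short simulation. This is a sketch under the assumption that time is scaled so that each pair of lineages coalesces at rate 1, so k lineages wait an Exp(k(k-1)/2) time; the function name `simulate_coalescent` and the event format are hypothetical.

```python
import random


def simulate_coalescent(n, seed=None):
    """Return a list of (time, block_a, block_b) coalescence events.

    Assumes each pair of lineages coalesces at rate 1, so with k lineages
    the waiting time to the next event is exponential with rate k(k-1)/2.
    """
    rng = random.Random(seed)
    lineages = [(leaf,) for leaf in range(1, n + 1)]
    time = 0.0
    events = []
    while len(lineages) > 1:
        k = len(lineages)
        # Continuous process: exponential waiting time.
        time += rng.expovariate(k * (k - 1) / 2)
        # Discrete process: a uniformly chosen pair coalesces.
        i, j = sorted(rng.sample(range(k), 2))
        events.append((time, lineages[i], lineages[j]))
        merged = tuple(sorted(lineages[i] + lineages[j]))
        del lineages[j]  # j > i, so index i is still valid
        lineages[i] = merged
    return events


for time, left, right in simulate_coalescent(5, seed=3):
    print(round(time, 3), left, "--", right)
```

Events are listed from the present backwards, so the last line printed is the coalescence at the root.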
7. Tree Counting
Tree: a connected undirected graph without cycles.
k nodes (vertices) and k-1 edges. Nodes with one
edge are leaves (tips); the rest are internal.
Labels of internal nodes are permutable without
change of biological interpretation. If the labels at
the leaves are ignored we have the shape of a tree.
Ignoring the root and branch lengths gives the unrooted tree
topology.
If the age ordering of internal nodes is retained,
this gives the coalescent topology.
Most biological trees are bifurcating: valency 3
(number of edges touching each internal node) if made
unrooted. Such unrooted trees with n leaves have n-2 internal
nodes and 2n-3 edges.
8. Counting by Bijection
Bijection to a decision series:
$N = k_1 k_2 \cdots k_L$
[Figure labels: 1, 2, 3, ..., N]
9. Trees: Rooted, bifurcating, nodes time-ranked.
Recursion: $T_k = \binom{k}{2} T_{k-1}$
Initialisation: $T_1 = T_2 = 1$
k     3   4    5     6      7          8          9          10         15          20
T_k   3   18   180   2700   5.7·10^4   1.5·10^6   5.7·10^7   2.5·10^9   6.9·10^18   5.6·10^29
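The recursion multiplies the count by the number of pairs that can coalesce at each step. A minimal sketch (the function name `ranked_tree_count` is an illustrative choice):

```python
from math import comb


def ranked_tree_count(k):
    """Number of rooted, bifurcating, time-ranked trees on k labelled leaves."""
    count = 1  # T_1 = T_2 = 1
    for j in range(2, k + 1):
        count *= comb(j, 2)  # choose which pair of the j lineages coalesces
    return count


for k in (3, 4, 5, 6, 7, 10):
    print(k, ranked_tree_count(k))
```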
10. Trees: Unrooted, valency 3
Recursion: $T_n = (2n-5) T_{n-1}$
Initialisation: $T_1 = T_2 = T_3 = 1$
n     4   5    6     7     8       9          10         15          20
T_n   3   15   105   945   10395   1.4·10^5   2.0·10^6   7.9·10^12   2.2·10^20
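Each new leaf can be attached to any of the 2n-5 edges of a tree on n-1 leaves, which gives the recursion directly. A minimal sketch (the function name `unrooted_tree_count` is an illustrative choice):

```python
def unrooted_tree_count(n):
    """Number of unrooted, valency-3 tree topologies on n labelled leaves."""
    count = 1  # T_1 = T_2 = T_3 = 1
    for m in range(4, n + 1):
        count *= 2 * m - 5  # leaf m attaches to one of the 2m-5 existing edges
    return count


for n in (4, 5, 6, 7, 8, 10):
    print(n, unrooted_tree_count(n))
```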
11. Coalescent versus unrooted tree topologies
4 leaves: 3 unrooted trees, 18 coalescent
topologies.
Each unrooted tree topology contains 6 coalescent
topologies.
[Figure: the three unrooted trees on leaves 1, 2, 3, 4 and the coalescent topologies contained in one of them]
12. Inner & outer branches: Fu & Li (1993)
External (e) versus Internal (i) Branches.
E(e) = 2; E(i) = E(total tree length) - 2.
Red: external. Others: internal. Except for the
green branch, internal/external corresponds to
singlet/non-singlet segregating sites if only one
mutation can happen per position.
s   n         (s: singlet site, n: non-singlet site)
ACTTGTACGA
ACTTGTACGA
ACTTGTACGA
TCTTATACGA
ACTTATACGA
Let $l_{n,i}$ be the length of the ith external branch in an
n-tree. Obviously $E(e) = n E(l_{n,i})$ (any i).
$l_{n,i} = l_{n-1,j} + t_n$ with Pr $1 - 2/n$
$l_{n,i} = t_n$ with Pr $2/n$
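Taking expectations in the recursion gives E(l_n) = E(t_n) + (1 - 2/n) E(l_{n-1}). The sketch below solves this exactly, assuming the time scale in which each pair coalesces at rate 1, so that E(t_n) = 1/C(n,2); the function name `expected_external_branch` is hypothetical.

```python
from fractions import Fraction
from math import comb


def expected_external_branch(n):
    """E(l_n): expected length of one external branch in an n-tree.

    Assumes E(t_m) = 1 / C(m, 2), i.e. each pair coalesces at rate 1.
    """
    expected = Fraction(1)  # n = 2: both branches are external with length t_2
    for m in range(3, n + 1):
        # t_m is always part of the branch; with probability 1 - 2/m the branch
        # is not in the first coalescence and continues as an external branch
        # of the (m-1)-tree.
        expected = Fraction(1, comb(m, 2)) + (1 - Fraction(2, m)) * expected
    return expected


for n in (2, 3, 5, 10, 50):
    print(n, expected_external_branch(n), n * expected_external_branch(n))
```

Under this assumption the product n E(l_n) is the same for every n, which is the statement E(e) = 2.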
13. Probability of hanging Sub-trees. Kingman (1982b)
For a coalescent with n leaves at time 0 and k
ancestors at time t1, consider the groups of
leaves of the k subtrees hanging from time t1.
Let l1, l2, .., lk be the number of leaves of
these sub-trees.
Example: n = 8, k = 3. Classes observed: 4, 3, 1.
The basal division splits the leaves into sets of sizes (k, n-k)
with probability 1/(n-1).
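The 1/(n-1) statement for the basal division can be checked with a small exact computation. The sketch below rests on two assumptions that are not spelled out on the slide: the tree is read from the root towards the leaves, with every existing lineage equally likely to be the next to split, and the two basal subtrees are treated as distinguishable, so (k, n-k) and (n-k, k) are separate outcomes. The function name `basal_split_distribution` is hypothetical.

```python
from fractions import Fraction


def basal_split_distribution(n):
    """Distribution of the number of leaves below the first basal subtree."""
    dist = {1: Fraction(1)}  # two lineages just below the root: sizes (1, 1)
    for total in range(2, n):  # grow from `total` to `total + 1` lineages
        grown = {}
        for a, p in dist.items():
            b = total - a
            # The next split falls in the first subtree with probability a/total.
            grown[a + 1] = grown.get(a + 1, Fraction(0)) + p * Fraction(a, total)
            grown[a] = grown.get(a, Fraction(0)) + p * Fraction(b, total)
        dist = grown
    return dist


for size, prob in sorted(basal_split_distribution(8).items()):
    print(size, 8 - size, prob)
```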
14. Nested subsamples (Saunders et al. (1986)
Adv.Appl.Prob. 16.471-91.)
Transitions: (i,j) -> (i-1,j) or (i-1,j-1)
States (i,j):
1,1
2,1 2,2
3,1 3,2 3,3
4,1 4,2 4,3 4,4
5,1 5,2 5,3 5,4 5,5
6,1 6,2 6,3 6,4 6,5 6,6
7,1 7,2 7,3 7,4 7,5 7,6 7,7
8,1 8,2 8,3 8,4 8,5 8,6 8,7 8,8
9,1 9,2 9,3 9,4 9,5 9,6 9,7 9,8 9,9
[Figure: population (2N), sample (i) and sub-sample (j) lineages at t = 0 and t = t1]
15. Nested subsamples (Saunders et al. (1986)
Adv.Appl.Prob. 16.471-91.)
Pr{MRCA(sub-sample) = MRCA(sample)}
Pr{MRCA(sub-sample) = MRCA(population)}
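The first probability can be computed from the (i,j) chain. The sketch below assumes the standard coalescent transition probabilities, which are not legible on the slides: from (i,j) the coalescing pair lies inside the sub-sample with probability C(j,2)/C(i,2), giving (i-1,j-1), and otherwise the chain moves to (i-1,j). The function name `prob_same_mrca` is hypothetical. The result is compared with the closed form (j-1)(i+1)/((j+1)(i-1)), whose limit as i grows, (j-1)/(j+1), corresponds to the population case.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def prob_same_mrca(i, j):
    """Pr{MRCA(sub-sample) = MRCA(sample)} for sample size i, sub-sample size j."""
    if j == 1:
        # The sub-sample has found its MRCA; it is also the sample's MRCA
        # only if the sample is down to a single lineage as well.
        return Fraction(1) if i == 1 else Fraction(0)
    # Assumed transition: the coalescing pair is within the sub-sample lineages.
    p_within = Fraction(comb(j, 2), comb(i, 2))
    result = p_within * prob_same_mrca(i - 1, j - 1)
    if p_within < 1:
        result += (1 - p_within) * prob_same_mrca(i - 1, j)
    return result


for i, j in [(3, 2), (5, 3), (10, 2), (10, 5)]:
    closed_form = Fraction((j - 1) * (i + 1), (j + 1) * (i - 1))
    print(i, j, prob_same_mrca(i, j), closed_form)
```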
16. Age of a Mutation: Wiuf & Donnelly (1999), Wiuf
(2000), Matthews (2000)
[Figure: two time intervals, labelled Exp(?) and Exp(1)]
17. Polya Urns & Infinite Allele Model (Donnelly, 1986;
Hoppe, 1984 & 87)
The only observation made in the infinite allele
model is identity/non-identity among all pairs
of alleles, i.e. the central observation is a
series of classes and their sizes.
The expected number of mutations in a unit interval
(2N) is θ.
This model will give rise to distributions on
partitions of {1,2,..,n} like {1,4,7}{2,3}{5}{6}.
Since the labelling is arbitrary, only the
information about the sizes of these groups is
essential, for instance represented as $1^2 2^1 3^1$.
What is the next event: a duplication of an
existing type or the introduction of a new allele?
18. Classical Polya Urns: Feller I.
Let X0 be the initial configuration of the
urn. A step: take a random ball from the urn
and put it back together with an extra ball of the
same colour. Let Xk be the content after the kth
step. Let Yk be the colour of the kth picked
ball.
i. P{Yk = j} = P{Y1 = j}.
ii. Sequences Y1 ... Yk
resulting in the same Xk have the same
probability.
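Both properties can be verified exactly for a small urn by enumerating colour sequences. This is a minimal sketch; the function names `sequence_probability` and `draw_distribution` and the example urn (2 red, 1 blue) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(initial, colours):
    """Probability of drawing the given colour sequence from a Polya urn."""
    urn = dict(initial)
    prob = Fraction(1)
    for colour in colours:
        prob *= Fraction(urn[colour], sum(urn.values()))
        urn[colour] += 1  # the ball is returned with one extra of its colour
    return prob


def draw_distribution(initial, k):
    """Marginal distribution of Y_k, the colour of the k-th picked ball."""
    dist = {colour: Fraction(0) for colour in initial}
    for seq in product(initial, repeat=k):
        dist[seq[-1]] += sequence_probability(initial, seq)
    return dist


urn = {"red": 2, "blue": 1}
for k in range(1, 5):
    print(k, draw_distribution(urn, k))  # property i: same for every k
print(sequence_probability(urn, ("red", "blue", "red")),
      sequence_probability(urn, ("blue", "red", "red")))  # property ii
```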
19. Labelling, Polya Urns & Age of Alleles (Donnelly, 1986;
Hoppe, 1984 & 87)
Labelling: as they come, by size, by age.
A ball is picked proportionally to its weight.
Ordinary balls have weight 1. If the initial
θ-size ball is picked, it is replaced together
with a completely new type. If an ordinary ball
is picked, it is replaced together with a copy of
itself.
[Figure: an urn; ball labels θ, 1, 2 and θ, 1, 1]
There is a simple relationship between the
distribution of the alleles labelled by age
ranking and that of the alleles labelled by
size ranking.
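The urn scheme described above translates directly into a simulation. A minimal sketch, in which the function name `hoppe_urn`, the seed handling and the choice to return type sizes in order of appearance (oldest type first) are assumptions made for illustration:

```python
import random


def hoppe_urn(n, theta, seed=None):
    """Draw n ordinary balls from the urn; return type sizes, oldest type first."""
    rng = random.Random(seed)
    sizes = []
    for ordinary in range(n):
        # Total weight in the urn: theta (special ball) + one per ordinary ball.
        if rng.random() * (theta + ordinary) < theta:
            sizes.append(1)  # special ball picked: a completely new type
        else:
            # Ordinary ball picked: each type is chosen proportionally to its size.
            picked = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[picked] += 1
    return sizes


print(hoppe_urn(10, theta=0.5, seed=1))
```

Sorting the returned list gives the labelling by size, while the list as returned is the labelling by age.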
20. Ewens' formula. (1972 TPB 3.87-112)
$P_5(2,0,1,0,0)$ is the probability of seeing 2
singles and one allele in 3 copies in a sample of
5. Obviously, $a_1 + 2a_2 + \dots + i a_i + \dots + n a_n = n$.
$P_n(a_1, a_2, \dots, a_n)$
$E_n(k \text{ types})$
$P_n(a_1, a_2, \dots, a_n \mid k)$
k is a minimal sufficient statistic for θ:
the probability of the data conditioned on k is
θ-less and there is no simpler such statistic.
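As a worked illustration, the sketch below evaluates the sampling formula in its usual textbook form, $P_n(a_1,\dots,a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_j \frac{\theta^{a_j}}{j^{a_j} a_j!}$. This form is quoted from the general literature rather than read off the slide, and the function name `ewens_probability` and the value θ = 1/2 are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(counts, theta):
    """Ewens sampling probability; counts[j-1] = a_j = number of types seen j times."""
    theta = Fraction(theta)
    n = sum(j * a for j, a in enumerate(counts, start=1))
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # theta (theta + 1) ... (theta + n - 1)
    prob = Fraction(factorial(n)) / rising
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (Fraction(j) ** a * factorial(a))
    return prob


# Two singletons and one allele in three copies, in a sample of 5.
print(ewens_probability((2, 0, 1, 0, 0), Fraction(1, 2)))
```

Summing the function over all allele configurations of a given n should give 1, which is a convenient sanity check.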
21. Stirling Numbers
Partitioning into k sets: Stirling Numbers (of the
second kind), $S_{n,k}$: k unlabelled bins, all non-empty.
n \ k   1    2    3    4    5    6    7    B_n
1       1                                  1
2       1    1                             2
3       1    3    1                        5
4       1    7    6    1                   15
5       1    15   25   10   1              52
6       1    31   90   65   15   1         203
7       1    63   301  350  140  21   1    877
Bell Numbers, $B_n$: partitioning into any number of
sets.
Obviously $B_n = \sum_{k=1}^{n} S_{n,k}$.
22. Stirling Numbers
[Figure: n-1 items in k classes, state (n-1,k), or in k-1 classes, state (n-1,k-1); adding item n gives n items in k classes, state (n,k)]
Basic Recursion: $S_{n,k} = k S_{n-1,k} + S_{n-1,k-1}$
Initialisation: $S_{n,1} = S_{n,n} = 1$.
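The recursion and initialisation are enough to generate the whole table. A minimal sketch (the function names `stirling2` and `bell` are illustrative choices):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind: partitions of n items into k classes."""
    if k < 1 or k > n:
        return 0
    if k == 1 or k == n:
        return 1
    # Item n joins one of the k existing classes, or forms a new class by itself.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)


def bell(n):
    """Bell number: partitions of n items into any number of classes."""
    return sum(stirling2(n, k) for k in range(1, n + 1))


for n in range(1, 8):
    print(n, [stirling2(n, k) for k in range(1, n + 1)], bell(n))
```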
23. Ewens' formula - example. (1972 TPB 3.87-112)
Assume [allele configuration shown on the slide]
has been observed and that 0.5 mutations are
expected per unit (2N) time.
24. Ancestors to Ancestors: Griffiths (1980),
Tavaré (1984)
$h_{i,j}(t)$: probability that i individuals have j
ancestors after time t.
$i_{[k]} = i(i-1)\cdots(i-k+1)$, $i_{(k)} = i(i+1)\cdots(i+k-1)$
Example: Disappearance of 7 lineages.
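For illustration, the sketch below evaluates these probabilities with the closed form usually attributed to Tavaré (1984), $h_{i,j}(t) = \sum_{k=j}^{i} e^{-k(k-1)t/2} \frac{(2k-1)(-1)^{k-j} j_{(k-1)} i_{[k]}}{j!\,(k-j)!\, i_{(k)}}$, assuming time is scaled so that each pair coalesces at rate 1. The formula is quoted from the general literature since it is not legible on the slide, and the function names are hypothetical.

```python
from math import exp, factorial


def falling(x, k):
    """x_[k] = x (x-1) ... (x-k+1)."""
    result = 1
    for m in range(k):
        result *= x - m
    return result


def rising(x, k):
    """x_(k) = x (x+1) ... (x+k-1)."""
    result = 1
    for m in range(k):
        result *= x + m
    return result


def ancestors_probability(i, j, t):
    """h_{i,j}(t) for 1 <= j <= i, with pairwise coalescence rate 1."""
    total = 0.0
    for k in range(j, i + 1):
        sign = -1 if (k - j) % 2 else 1
        coeff = ((2 * k - 1) * sign * rising(j, k - 1) * falling(i, k)
                 / (factorial(j) * factorial(k - j) * rising(i, k)))
        total += exp(-k * (k - 1) * t / 2) * coeff
    return total


t = 0.3
print([round(ancestors_probability(7, j, t), 4) for j in range(1, 8)])
```

The printed row is the distribution of the number of ancestors of 7 lineages at time t, so its entries should sum to 1.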
25. Number of Ancestors to time t.
3 methods of solution:
i. Sum of different independent exponential
distributions.
ii. Distribution in Markov chain.
[Figure: chain of states i, i-1, ..., j+1, j, j-1, ..., 1]
iii. Combination of known probabilities:
a. Probability that i alleles have i/fewer ancestors.
b. This probability is the same for all i-sets.
c. No coalescence within a set implies no coalescence
within all subsets.
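26. 3 Ancestors to 2 Ancestors: $(3/2)(e^{-t} - e^{-3t})$
[Figure: the set {1,2,3}; the pairs {1,2}, {1,3}, {2,3}, each labelled $e^{-t}$; the single coalescences (1,2), (1,3), (2,3), each labelled ?; no coalescence, labelled $e^{-3t}$]
? = $(e^{-t} - e^{-3t})/2$
Exactly one coalescence: $3(e^{-t} - (e^{-t} - e^{-3t})/2 - e^{-3t}) = (3/2)(e^{-t} - e^{-3t})$
Jordan's Sieve: $A_1 = 3e^{-t}$; each pairwise intersection has
probability $(e^{-t} + e^{-3t})/2$ (entering through $2A_2$);
$3A_3 = 3e^{-3t}$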
27. The exclusion-inclusion principle.
Venn Diagrams
[Venn diagram with regions labelled I, II, III and 0]
$|I \cup II \cup III| = |I| + |II| + |III| - (|I,II| + |I,III| + |II,III|) + |I,II,III|$
28. Exclusion-inclusion & Jordan's Sieve
$S_j$, j = 1,..,r: the given sets. $A_k$: sum of the sizes of the
intersections of k sets.
Total number (exclusion-inclusion): $\sum_{k=1}^{r} (-1)^{k-1} A_k$
In exactly m sets (Jordan's Sieve): $\sum_{k=m}^{r} (-1)^{k-m} \binom{k}{m} A_k$
Example, for r = 4:
in 1 set: $A_1 - 2A_2 + 3A_3 - 4A_4$
in 2 sets: $A_2 - 3A_3 + 6A_4$
in 3 sets: $A_3 - 4A_4$
in 4 sets: $A_4$
in some set: $A_1 - A_2 + A_3 - A_4$
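The sieve coefficients in the example can be checked by brute force on small sets. A minimal sketch; the function names `intersection_sums` and `in_exactly` and the four example sets are made up for illustration.

```python
from itertools import combinations
from math import comb


def intersection_sums(sets):
    """Return [0, A_1, ..., A_r]: A_k sums the sizes of all k-fold intersections."""
    r = len(sets)
    sums = [0] * (r + 1)
    for k in range(1, r + 1):
        for chosen in combinations(sets, k):
            sums[k] += len(set.intersection(*chosen))
    return sums


def in_exactly(m, sums):
    """Jordan's sieve: number of elements lying in exactly m of the sets."""
    r = len(sums) - 1
    return sum((-1) ** (k - m) * comb(k, m) * sums[k] for k in range(m, r + 1))


example = [{1, 2, 3, 4}, {2, 3, 5}, {3, 4, 5, 6}, {3, 7}]
A = intersection_sums(example)
print("A_1..A_4:", A[1:])
print("in exactly m sets:", [in_exactly(m, A) for m in range(1, 5)])
print("in some set:", sum((-1) ** (k - 1) * A[k] for k in range(1, 5)))
```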
29. Surviving Lineages
Which probability statements can be made? Let s
be a subset of {1,2,..,i} and S(s) be the event
that no coalescence has happened within s.
Additionally, if s' is a subset of s, then S(s)
implies S(s').
Sets                                     Size   Number
{1,2,..,i}                               i      1
{1,2,..,i-1}, {1,3,..,i}, {2,..,i}, ..   i-1    i
...                                      j      $\binom{i}{j}$
{1,2}, .., {i-1,i}                       2      $\binom{i}{2}$
For each set of size 2, S(s) has probability $e^{-t}$.
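30. Surviving Lineages
There are $\binom{i}{j}$ sets of size j.
We want the events that are members of only one of them
(Jordan's sieve with m = 1), where $A_k$ is the sum of the
probabilities of the k-fold intersections.
Summation is over all k-subsets of {1,..,r} and
intersection is between the k sets chosen.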
31. $P_k(t_1) = h_{i,k}(t_1)\, h_{k,j}(t - t_1) / h_{i,j}(t)$
Example: 7 --> 4 lineages.
32. Summary
Tree Counting & Tree Properties. Basic
Combinatorics. Allele distribution. Polya
Urns & Stirling Numbers. Number of ancestral
lineages after time t. Inclusion-Exclusion
Principle.
33. Recommended Literature
Bender (1974) Asymptotic Methods in Enumeration. SIAM Review vol. 16.4.485-
Donnelly (1986) Theor.Pop.Biol.
Ewens (1972) Theor.Pop.Biol.
Ewens (1989) Population Genetics Theory - The Past and the Future
Feller (1968 & 71) Probability Theory and its Applications I & II. Wiley
Fu & Li (1993) Statistical Tests of Neutrality of Mutations. Genetics 133.693-709.
Griffiths (1980)
Griffiths & Tavaré (1998) The Age of a mutation on a general coalescent tree.
Griffiths & Tavaré (1999) The ages of mutations in gene trees
Griffiths & Tavaré (2001) The genealogy of a neutral mutation
Hoppe (1984) Polya-like urns and the Ewens sampling formula. J.Math.Biol. 20.91-94
Kingman (1982) On the Genealogy of Large Populations 27-43.
Kingman (1982) The Coalescent. Stochastic Processes and their Applications 13.235-248.
Kingman (1982)
Matthews, S. (1999) Times on Trees, and the Age of an Allele. Theor.Pop.Biol. 58.61-75.
Möhle
Pitman
Schweinsberg
Simonsen & Churchill (1997)
Saunders et al. (1986) On the genealogy of nested subsamples from a haploid population. Adv.Appl.Prob. 16.471-91.
Tajima (1983) Evolutionary Relationships of DNA Sequences in Finite Populations. Genetics 105.437-60.
Tavaré (1984) Line-of-Descent and Genealogical Processes, and Their Application in Population Genetics Models. Theor.Pop.Biol. 26.119-164.
Thompson, R. (1998) Ages of mutations on a coalescent tree. Math.Bios. 153.41-61.
van Lint & Wilson (1991) A Course in Combinatorics. Cambridge
Wiuf (2000) On the Genealogy of a Sample of Neutral Rare Alleles. Theor.Pop.Biol. 58.61-75.
Wiuf & Donnelly (1999) Conditional Genealogies and the Age of a Mutant. Theor.Pop.Biol. 56.183-201.