Title: break
1. Population genetics
2. coalesce
- To grow together; fuse.
- To come together so as to form one whole; unite:
  "The rebel units coalesced into one army to fight
  the invaders."
3. The coalescent
The parameter $\theta = 4Nu$ determines the level of
variation under the neutral model.
4. The coalescent
Every pair of alleles has a common ancestor, so the
genealogy of a sample can be represented by a tree.
5. The coalescent
[Figure: coalescent tree with time levels T = 0, T = t1, T = t2, T = t3 marked.]
The genealogy of the sample. The alleles may or may
not be identical by state.
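6. The coalescent
[Figure: coalescent tree for 4 alleles with time levels T = 0, T = t1, T = t2, T = t3 marked.]
The total time in the coalescent (the sum of all
branch lengths) is
$T_c = 4t_1 + 3(t_2 - t_1) + 2(t_3 - t_2)$
Define $T_i$ to be the time needed to reduce a
coalescent with $i$ alleles to one with $i-1$
alleles. Thus, $T_4 = t_1$, $T_3 = t_2 - t_1$, and
$T_2 = t_3 - t_2$.
Joining these equations we obtain
$T_c = 4T_4 + 3T_3 + 2T_2$
Or, in general, for $n$ alleles:
$T_c = \sum_{i=2}^{n} i\,T_i$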
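The total time adds up, for every interval between successive coalescent events, the length of the interval times the number of lineages alive during it. A minimal sketch of that bookkeeping follows; the function name `total_tree_length` and the example times are illustrative assumptions, not taken from the slides.

```python
def total_tree_length(event_times):
    """Total branch length Tc of a coalescent tree.

    event_times: increasing times [t1, t2, ..., t_(n-1)] of the successive
    coalescent events, measured back from the present (T = 0), for a
    sample of n = len(event_times) + 1 alleles.
    """
    lineages = len(event_times) + 1  # n lineages at T = 0
    previous = 0.0
    total = 0.0
    for t in event_times:
        # Between `previous` and `t` there are `lineages` branches.
        total += lineages * (t - previous)
        previous = t
        lineages -= 1  # each coalescent event merges two lineages into one
    return total


# Hypothetical times for 4 alleles: t1 = 1, t2 = 3, t3 = 7,
# so T4 = 1, T3 = 2, T2 = 4 and Tc = 4*1 + 3*2 + 2*4 = 18.
print(total_tree_length([1.0, 3.0, 7.0]))  # 18.0
```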
7. The coalescent
[Figure with labels "n alleles" and "n-1 alleles".]
$T_c$ is a function of $N$ (population size) and
$n$ (number of alleles in the sample). We can
compute $T_c$ assuming the infinite allele model.
Focusing on the last generation: for 2 alleles,
what is the probability that they have different
ancestors in the previous generation? With $2N$
gene copies in the population, it is
$1 - \frac{1}{2N}$
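8. The coalescent
We have $n$ alleles. What is the probability that
they all have different ancestors in the previous
generation?
$\left(1 - \frac{1}{2N}\right)\left(1 - \frac{2}{2N}\right)\cdots\left(1 - \frac{n-1}{2N}\right)$
Assuming $N$ is very large, and thus ignoring terms
in which $N^2$ (or a higher power of $N$) appears in
the denominator, we obtain
$1 - \frac{1 + 2 + \cdots + (n-1)}{2N} = 1 - \frac{n(n-1)}{4N}$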
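How good the large-N approximation is can be checked numerically. The sketch below assumes 2N gene copies per generation (a diploid population of size N), which is what the factor 4N in the later formulas implies; the function names are illustrative.

```python
def prob_all_distinct_exact(n, N):
    """Exact probability that n alleles have n distinct parents
    among the 2N gene copies of the previous generation."""
    p = 1.0
    for k in range(1, n):
        p *= 1.0 - k / (2.0 * N)
    return p


def prob_all_distinct_approx(n, N):
    """First-order approximation: terms of order 1/N^2 and smaller dropped."""
    return 1.0 - n * (n - 1) / (4.0 * N)


# The gap between the two shrinks roughly like 1/N^2.
for N in (100, 1000, 10000):
    exact = prob_all_distinct_exact(5, N)
    approx = prob_all_distinct_approx(5, N)
    print(N, exact, approx, exact - approx)
```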
9. The coalescent
The probability that $n$ alleles all have different
ancestors in the previous generation:
$1 - \frac{n(n-1)}{4N}$
The probability that at least 2 alleles out of $n$
have a common ancestor in the previous generation:
$\frac{n(n-1)}{4N}$
This is the probability of a coalescent event in
each generation.
10. The coalescent
The probability of a coalescent in a single
generation is
$p = \frac{n(n-1)}{4N}$
The number of generations until a coalescent is
geometrically distributed with parameter $p$. Thus,
the expected time until a coalescent event is
$1/p = 4N/(n(n-1))$. In other words, for a stage
with $i$ alleles,
$E(T_i) = \frac{4N}{i(i-1)}$
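The geometric waiting time can be illustrated with a small simulation. This is only a sketch under assumed values (n = 4, N = 50, a fixed random seed); the function names are hypothetical.

```python
import random


def coalescence_probability(n, N):
    """Per-generation probability of a coalescent event among n alleles."""
    return n * (n - 1) / (4.0 * N)


def simulate_waiting_time(n, N, rng):
    """Count generations until the first coalescent event (geometric draw)."""
    p = coalescence_probability(n, N)
    generations = 1
    while rng.random() >= p:  # no coalescent in this generation
        generations += 1
    return generations


rng = random.Random(1)  # fixed seed so the run is reproducible
n, N = 4, 50            # assumed example values
draws = [simulate_waiting_time(n, N, rng) for _ in range(20000)]
simulated_mean = sum(draws) / len(draws)
expected = 4.0 * N / (n * (n - 1))  # 1/p, about 16.67 generations
print(simulated_mean, expected)
```

The simulated mean should land close to 1/p; the agreement tightens as the number of draws grows.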
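11. The coalescent
From the following two equations,
$T_c = \sum_{i=2}^{n} i\,T_i$ and $E(T_i) = \frac{4N}{i(i-1)}$,
we can obtain $E(T_c)$:
$E(T_c) = \sum_{i=2}^{n} i \cdot \frac{4N}{i(i-1)} = 4N \sum_{i=2}^{n} \frac{1}{i-1} = 4N \sum_{i=1}^{n-1} \frac{1}{i}$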
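As a sanity check, the expected total time can be computed two ways: by summing i times the expected interval 4N/(i(i-1)) over i = 2..n, and with the closed form 4N(1 + 1/2 + ... + 1/(n-1)). The function names and the values n = 11, N = 1000 are assumptions chosen for illustration.

```python
def expected_interval(i, N):
    """Expected number of generations during which there are i lineages."""
    return 4.0 * N / (i * (i - 1))


def expected_total_time(n, N):
    """E(Tc) as the sum over i = 2..n of i * E(T_i)."""
    return sum(i * expected_interval(i, N) for i in range(2, n + 1))


def expected_total_time_closed_form(n, N):
    """E(Tc) as 4N times the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    return 4.0 * N * sum(1.0 / i for i in range(1, n))


# Both routes give the same value (up to floating-point rounding).
print(expected_total_time(11, 1000))
print(expected_total_time_closed_form(11, 1000))
```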
12. The coalescent: adding mutation
[Figure: coalescent tree with time levels T = 0, T = t1, T = t2, T = t3 marked.]
The $n$ alleles may or may not be identical by
state. Each mutation in the history of these
alleles resulted in a segregating site. If there
was one mutation, there is one segregating site.
If there were 2 mutations, there are 2
segregating sites (the infinite sites model, in
which every mutation hits a new site). In general,
$k$ mutations -> $k$ segregating sites.
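13. The coalescent
Let $u$ be the mutation rate per generation. Thus,
the total number of mutations in a coalescent is,
on average, $u\,E(T_c)$, which is
$u\,E(T_c) = 4Nu \sum_{i=1}^{n-1} \frac{1}{i} = \theta \sum_{i=1}^{n-1} \frac{1}{i}$, where $\theta = 4Nu$.
But this is exactly the expectation of the
number of segregating sites, $S$:
$E(S) = \theta \sum_{i=1}^{n-1} \frac{1}{i}$
Since $E(S)$ can be estimated from the sample (i.e.,
by the number of segregating sites observed), we can
get an estimate of $\theta$:
$\hat{\theta} = S \Big/ \sum_{i=1}^{n-1} \frac{1}{i}$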
14. The coalescent
Example: Assume 11 sequences, each 768
nucleotides long, were sampled and 14 segregating
sites were found. Estimate $\theta$ for each allele
(sequence) and for each nucleotide site.
Here, $n = 11$ and the sum $\sum_{i=1}^{10} 1/i$
equals 2.929. $E(S)$ is estimated to be 14, and hence
the estimate of $\theta$ is 14/2.929 = 4.78. Hence
$4Nu$ is estimated to be 4.78, where $u$ is the
mutation rate per allele (sequence). $4Nu$ in which
$u$ denotes the mutation rate per nucleotide is
4.78/768 = 0.0062.
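The arithmetic of this example can be reproduced in a few lines. The sketch below divides the observed number of segregating sites by the harmonic sum 1 + 1/2 + ... + 1/(n-1); the function names are illustrative, and the inputs are the numbers quoted in the example.

```python
def harmonic_sum(n):
    """Sum of 1/i for i = 1 .. n-1, for a sample of n alleles."""
    return sum(1.0 / i for i in range(1, n))


def theta_from_segregating_sites(S, n):
    """Estimate 4Nu (per sequence) from S segregating sites in n alleles."""
    return S / harmonic_sum(n)


n, S, length = 11, 14, 768  # values from the example
theta_per_sequence = theta_from_segregating_sites(S, n)
theta_per_site = theta_per_sequence / length

print(round(harmonic_sum(n), 3))      # 2.929
print(round(theta_per_sequence, 2))   # 4.78
print(round(theta_per_site, 4))       # 0.0062
```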
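15. The coalescent
- A few words about the harmonic series $\sum_{i=1}^{\infty} 1/i$:
- The sum is infinite. Proof (one standard argument,
  by grouping terms):
  $1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots \ge 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots$
- The partial sums converge in the sense that
  $\sum_{i=1}^{n} \frac{1}{i} - \ln(n) \to \gamma \approx 0.577$ as $n \to \infty$
  ($\gamma$ is the Euler-Mascheroni constant).
- So the rate of growth of the series is the same
  as that of ln(n). For the sum to reach about 3,
  one needs about 10 samples. For the sum to reach
  about 4, one already needs about 30 samples.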
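The slow, logarithmic growth of the harmonic sum is easy to see numerically. The sketch below compares the partial sum of the first m terms with ln(m) + γ, where γ ≈ 0.5772 is the Euler-Mascheroni constant; the function name and the chosen values of m are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant, to 10 decimals


def harmonic(m):
    """Partial sum 1 + 1/2 + ... + 1/m of the harmonic series."""
    return sum(1.0 / i for i in range(1, m + 1))


# Columns: m, partial sum, ln(m) + gamma, difference.
for m in (10, 30, 100, 1000):
    approx = math.log(m) + EULER_GAMMA
    print(m, round(harmonic(m), 3), round(approx, 3), round(harmonic(m) - approx, 4))
```

Ten terms give a sum just under 3 and thirty terms a sum just under 4, so each additional unit costs roughly a factor of e more samples.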
16. The coalescent
- We thus have 2 methods for estimating $\theta$:
- Based on the general heterozygosity
- Based on the number of segregating sites
17. The coalescent
The estimate based on general heterozygosity
does not use the information from each
site. The contrast between the two formulas
can be used to test the neutral theory (Tajima's
D test).