break


Slides: 18

Provided by: TalP2


Transcript and Presenter's Notes


1
Population genetics
break
2

coalesce
To grow together; fuse.
To come together so as to form one whole; unite: "The rebel units coalesced into one army to fight the invaders."

3
The coalescent
θ = 4Nu determines the level of variation under the neutral model.
4
The coalescent
Every two alleles have a common ancestor, so the sample can be represented by a tree.
5
The coalescent
[Figure: genealogy of a sample of four alleles, with time axis marked T = 0, t1, t2, t3]
The genealogy of the sample. The alleles may or may not be identical by state.
6
The coalescent
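[Figure: the genealogy, with time axis marked T = 0, t1, t2, t3]
The total time in the coalescent (the total length of all its branches) is
$$T_c = 4t_1 + 3(t_2 - t_1) + 2(t_3 - t_2).$$
Define T_i to be the time needed to reduce a coalescent with i alleles to one with i−1 alleles. Thus, T_4 = t_1, T_3 = t_2 − t_1, and T_2 = t_3 − t_2. Joining these equations, we obtain
$$T_c = 4T_4 + 3T_3 + 2T_2,$$
or, in general, for n alleles,
$$T_c = \sum_{i=2}^{n} i\,T_i.$$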
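The bookkeeping behind Tc can be sketched in a few lines of Python. The function name `total_tree_length` is our own; it takes the coalescence times t1 < t2 < ... and adds, interval by interval, the number of lineages present times the length of the interval. For four alleles it reproduces 4t1 + 3(t2 − t1) + 2(t3 − t2).

```python
def total_tree_length(event_times):
    """Total branch length T_c of a genealogy.

    event_times: increasing times t_1 < t_2 < ... < t_{n-1} (generations back
    from T = 0) at which successive coalescences occur; n = len(event_times) + 1.
    Returns T_c = sum_{i=2}^{n} i * T_i, where T_i is the length of the
    interval during which i lineages are present.
    """
    lineages = len(event_times) + 1  # n alleles at T = 0
    total, previous = 0.0, 0.0
    for t in event_times:
        total += lineages * (t - previous)  # i branches, each of length T_i
        previous = t
        lineages -= 1  # one coalescence: i -> i-1 lineages
    return total


# Four alleles with t1 = 1, t2 = 3, t3 = 6: 4*1 + 3*2 + 2*3 = 16
print(total_tree_length([1, 3, 6]))  # 16.0
```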
7
The coalescent
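[Figure: n alleles in one generation descend from n−1 ancestral alleles in the previous generation]
Tc is a function of N = population size and n = number of alleles in the sample. We can compute the expected value of Tc assuming the infinite allele model.
Focusing on the last generation: for 2 alleles, what is the probability that they have different ancestors in the previous generation? With 2N gene copies in the parental generation, it is 1 − 1/(2N).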
8
The coalescent
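We have n alleles. What is the probability that they all have different ancestors in the previous generation?
$$P = \left(1 - \frac{1}{2N}\right)\left(1 - \frac{2}{2N}\right)\cdots\left(1 - \frac{n-1}{2N}\right)$$
Assuming N is very large, and thus ignoring terms in which N² appears in the denominator, we obtain
$$P \approx 1 - \sum_{i=1}^{n-1}\frac{i}{2N} = 1 - \frac{n(n-1)}{4N}.$$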
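To see how good the large-N approximation is, the exact product can be compared with 1 − n(n−1)/4N. This sketch assumes the Wright–Fisher setting implied by the 4N in these slides: 2N gene copies in the parental generation, each allele picking its parent uniformly at random. The function names are illustrative.

```python
def prob_all_distinct_exact(n, N):
    """Probability that n alleles pick n different parents among 2N gene copies."""
    p = 1.0
    for k in range(1, n):
        # the (k+1)-th allele must avoid the k parents already taken
        p *= 1 - k / (2 * N)
    return p


def prob_all_distinct_approx(n, N):
    """Large-N approximation that drops terms with N**2 in the denominator."""
    return 1 - n * (n - 1) / (4 * N)


for n, N in [(2, 100), (5, 100), (5, 10_000)]:
    exact = prob_all_distinct_exact(n, N)
    approx = prob_all_distinct_approx(n, N)
    print(n, N, round(exact, 6), round(approx, 6))
```

For n = 2 the two expressions coincide exactly; for larger n the gap shrinks roughly like 1/N².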
9
The coalescent
The probability that n alleles have different ancestors in the previous generation:
$$P(\text{no coalescent}) \approx 1 - \frac{n(n-1)}{4N}$$
The probability that at least 2 of the n alleles have a common ancestor in the previous generation:
$$P(\text{coalescent}) \approx \frac{n(n-1)}{4N}$$
This is the probability of a coalescent in each generation.
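A quick Monte Carlo check of this probability, under the same assumed Wright–Fisher generation (each allele independently chooses one of 2N parental gene copies). The function name and parameter values are our own choices for illustration.

```python
import random


def prob_coalescent_mc(n, N, reps, seed=1):
    """Monte Carlo estimate of P(at least two of n alleles share a parent).

    Assumes a Wright-Fisher generation: each allele independently picks its
    parent uniformly among the 2N gene copies of the previous generation.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        parents = [rng.randrange(2 * N) for _ in range(n)]
        if len(set(parents)) < n:  # some parent was picked twice: a coalescent
            hits += 1
    return hits / reps


n, N = 5, 100
print(prob_coalescent_mc(n, N, reps=100_000), n * (n - 1) / (4 * N))
```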
10
The coalescent
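The probability of a coalescent in a single generation is
$$p = \frac{n(n-1)}{4N}.$$
The number of generations until a coalescent is geometrically distributed with p = n(n−1)/4N. Thus, the expected time until a coalescent event is 1/p = 4N/(n(n−1)). In other words,
$$E(T_n) = \frac{4N}{n(n-1)}, \qquad \text{and in general} \qquad E(T_i) = \frac{4N}{i(i-1)}.$$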
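The geometric waiting time can be simulated directly: each generation is an independent trial that produces a coalescent with probability p, and the average number of generations until the first success should approach 1/p. The parameter values below are arbitrary illustrations.

```python
import random


def generations_until_coalescent(p, rng):
    """Draw a geometric waiting time: count generations until the first success."""
    t = 1
    while rng.random() >= p:  # this generation had no coalescent
        t += 1
    return t


n, N = 4, 50
p = n * (n - 1) / (4 * N)  # 0.06
rng = random.Random(2024)
draws = [generations_until_coalescent(p, rng) for _ in range(20_000)]
print(sum(draws) / len(draws), 1 / p)  # sample mean vs. 4N / (n(n-1))
```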
11
The coalescent
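From the following two equations, we can obtain E(Tc):
$$T_c = \sum_{i=2}^{n} i\,T_i, \qquad E(T_i) = \frac{4N}{i(i-1)}$$
$$E(T_c) = \sum_{i=2}^{n} i\,\frac{4N}{i(i-1)} = 4N\sum_{i=2}^{n}\frac{1}{i-1} = 4N\sum_{i=1}^{n-1}\frac{1}{i}$$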
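The cancellation of i against i(i−1) can be checked numerically: summing i·E(Ti) term by term gives the same number as 4N times the harmonic sum. Both function names are hypothetical.

```python
import math


def expected_tc(n, N):
    """E(T_c) = sum_{i=2}^{n} i * E(T_i), with E(T_i) = 4N / (i(i-1))."""
    return sum(i * 4 * N / (i * (i - 1)) for i in range(2, n + 1))


def expected_tc_harmonic(n, N):
    """The same quantity after cancelling i: 4N * sum_{i=1}^{n-1} 1/i."""
    return 4 * N * sum(1 / i for i in range(1, n))


for n in (2, 4, 11):
    assert math.isclose(expected_tc(n, 1000), expected_tc_harmonic(n, 1000))
    print(n, expected_tc(n, 1000))
```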
12
The coalescent: adding mutation
[Figure: the genealogy, with time axis marked T = 0, t1, t2, t3]
The n alleles may or may not be identical by state. Each mutation in the history of these alleles resulted in a segregating site. If there was one mutation, there is one segregating site; if there were 2 mutations, there are 2 segregating sites (the infinite sites model, in which every mutation hits a new site). In general, k mutations → k segregating sites.
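Counting S in practice only requires an alignment: under the infinite sites assumption, each column that shows more than one base records exactly one mutation. A minimal sketch; the toy sequences are made up.

```python
def count_segregating_sites(sequences):
    """Number of aligned columns in which more than one base is present."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*sequences) yields one tuple per column of the alignment
    return sum(1 for column in zip(*sequences) if len(set(column)) > 1)


sample = ["ACGT", "ACGA", "TCGT"]
print(count_segregating_sites(sample))  # 2
```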
13
The coalescent
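Let u be the mutation rate per generation. Thus, the total number of mutations in a coalescent is, on average, u·E(Tc), which is
$$u\,E(T_c) = 4Nu\sum_{i=1}^{n-1}\frac{1}{i} = \theta\sum_{i=1}^{n-1}\frac{1}{i}.$$
But this is exactly the expectation of the number of segregating sites, S:
$$E(S) = \theta\sum_{i=1}^{n-1}\frac{1}{i}.$$
Since S is observed directly in the sample (the number of segregating sites found), we can use it in place of E(S) and get an estimate of θ:
$$\hat{\theta} = \frac{S}{\sum_{i=1}^{n-1} 1/i}.$$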
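The relation E(S) = θ(1 + 1/2 + ... + 1/(n−1)) can be checked by simulation. This sketch makes two assumptions beyond the slides: the geometric waiting times are replaced by exponential times with the same means 4N/(i(i−1)), and mutations fall on the tree as a Poisson process with rate u per generation of branch length. Under infinite sites, S is then the number of mutations. The parameter values are illustrative only.

```python
import random


def simulate_s(n, N, u, rng):
    """One coalescent replicate under infinite sites; returns S."""
    tree_length = 0.0
    for i in range(n, 1, -1):
        # T_i: exponential stand-in for the geometric time with mean 4N/(i(i-1))
        t_i = rng.expovariate(i * (i - 1) / (4 * N))
        tree_length += i * t_i  # i branches during this interval
    # Mutations: Poisson process with rate u along the total branch length.
    # Under infinite sites each mutation adds one segregating site.
    s = 0
    position = rng.expovariate(u)
    while position < tree_length:
        s += 1
        position += rng.expovariate(u)
    return s


n, N, u = 10, 500, 0.001  # theta = 4Nu = 2
a_n = sum(1 / i for i in range(1, n))
rng = random.Random(7)
reps = [simulate_s(n, N, u, rng) for _ in range(4000)]
mean_s = sum(reps) / len(reps)
print(mean_s, 4 * N * u * a_n)  # simulated vs. predicted E(S)
print(mean_s / a_n)  # average of the estimates S / a_n, close to theta
```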
14
The coalescent
Example: assume 11 sequences, each 768 nucleotides long, were sampled and 14 segregating sites were found. Estimate θ per allele (sequence) and per nucleotide site.
Here n = 11, and the sum 1 + 1/2 + ... + 1/10 equals 2.929. E(S) is estimated to be 14, and hence the estimate of θ is 14/2.929 = 4.78. That is, 4Nu is estimated to be 4.78, where u is the mutation rate per allele. The value of 4Nu in which u denotes the per-nucleotide mutation rate is 4.78/768 = 0.0062.
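The arithmetic of this example, written as a small function. The estimator S / (1 + 1/2 + ... + 1/(n−1)) is commonly called Watterson's estimator; the function name is our own.

```python
def watterson_theta(s, n):
    """Estimate theta from S segregating sites in a sample of n sequences."""
    a_n = sum(1 / i for i in range(1, n))  # 1 + 1/2 + ... + 1/(n-1)
    return s / a_n


theta_per_sequence = watterson_theta(14, 11)
theta_per_site = theta_per_sequence / 768  # sequence length in nucleotides
print(round(theta_per_sequence, 2), round(theta_per_site, 4))  # 4.78 0.0062
```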
15
The coalescent

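A few words about the harmonic series
The sum 1 + 1/2 + 1/3 + ... is infinite. Proof: group the terms as
$$1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots \ge 1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \cdots,$$
since the k-th bracket contains 2^(k−1) terms, each at least 1/2^k, and therefore sums to at least 1/2.
The partial sums nevertheless behave like a logarithm, in the sense that
$$\lim_{n \to \infty}\left(\sum_{i=1}^{n}\frac{1}{i} - \ln n\right) = \gamma \approx 0.5772.$$
So the series grows at the same rate as ln(n). For the sum to reach about 3, one needs roughly 10 samples; for it to reach about 4, one already needs roughly 30 samples.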
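The slow, logarithmic growth of the sum can be made concrete by searching for the smallest sample size n whose sum 1 + 1/2 + ... + 1/(n−1) reaches a given value. The helper names are hypothetical, and the constant is the Euler–Mascheroni γ.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic_a(n):
    """a_n = 1 + 1/2 + ... + 1/(n-1), the sum used in E(S) for n samples."""
    return sum(1 / i for i in range(1, n))


def samples_needed(target):
    """Smallest sample size n with a_n >= target."""
    n, a_n = 1, 0.0  # a_1 is the empty sum
    while a_n < target:
        a_n += 1 / n  # a_{n+1} = a_n + 1/n
        n += 1
    return n


for target in (3, 4, 5):
    print(target, samples_needed(target))

# a_n - ln(n - 1) approaches gamma as n grows
print(harmonic_a(100_001) - math.log(100_000), EULER_GAMMA)
```

With this strict definition the sum first reaches 3 at n = 12 and 4 at n = 32. At n = 11 and n = 31 it is about 2.93 and 3.99, so "roughly 10" and "roughly 30" samples are round figures.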

16
The coalescent

We thus have 2 methods for estimating θ:
Based on the general heterozygosity
Based on the number of segregating sites

17
The coalescent
The estimation based on general heterozygosity does not use the information from each site separately. The contrast between the two estimates can be used to test the neutral theory (Tajima's D test).
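A sketch of how such a contrast can be computed, with one assumption beyond the slides: the heterozygosity-based estimate is taken to be π, the mean number of pairwise differences between sequences, as in Tajima's statistic. The difference π − S/a1 is scaled by the usual variance constants of Tajima's D. The function name and toy sequences are illustrative.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for aligned sequences: (pi - S/a1) / sqrt(var)."""
    n = len(sequences)
    # S: number of segregating sites (columns with more than one base)
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        return math.nan  # D is undefined without variation
    # pi: mean number of pairwise differences (heterozygosity-type estimate)
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(sequences, 2)]
    pi = sum(diffs) / len(diffs)
    # Constants used to normalise the difference between the two estimates
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(["AA", "AT", "TA", "TT"]))
```

In this toy sample π = 4/3 exceeds S/a1 = 12/11, so D is positive; under strict neutrality D is expected to be near zero.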
