A Bayesian approach to inferring recent selective sweeps in West African Anopheles gambiae populations
- John Marshall¹, Professor Robert Weiss²
¹Department of Biomathematics, UCLA School of Medicine, Los Angeles CA 90095-1766 USA
²Department of Biostatistics, UCLA School of Public Health, Los Angeles CA 90095-1772 USA
Using microsatellite alleles to detect recent selective sweeps
- Microsatellites
  - Tandem repeats of short DNA segments, typically 1-5 bp in length
  - Alleles defined by the number of repeats at a particular locus
  - Multiallelic, hence highly informative markers
- Factors affecting variance in microsatellite allele size
  - Locus specific
    - Microsatellite mutation rate (mainly due to slippage during DNA replication)
  - Population specific
    - Effective population size
    - Population-level events (migration, bottlenecks)
  - Population and locus specific
    - Hitchhiking of a microsatellite allele to a selected gene
The lnRV statistic
- From population genetics, the variance in microsatellite allele size at a given locus (j) in a given population (i) is a function of the effective population size (N_e,i) and the microsatellite mutation rate (μ_j)
- Taking the ratio of expected variances in microsatellite allele size for a pair of populations (i1 and i2) therefore removes the locus dependence
- For a pair of populations (i1 and i2), the log ratio of variances (lnRV) can then be calculated for a set of loci (j = 1, 2, ..., T)
- Coalescent simulations have shown empirically that lnRV values follow a normal distribution
- A microsatellite near a selected locus is expected to have reduced variance, and hence an lnRV value that is an outlier from the otherwise normal distribution of lnRV values
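A minimal Python sketch of the lnRV calculation described above, assuming allele sizes are recorded as repeat numbers and that lnRV_j = ln(V_i1,j / V_i2,j). Under the stepwise mutation model the expected variance is roughly proportional to N_e,i × μ_j, so in the ratio the locus-specific μ_j cancels and only a population term remains. The function names, the sample-variance estimator and the 1.96 cut-off (a two-sided 95% normal interval) are illustrative assumptions, not the authors' code.

```python
import numpy as np


def ln_rv(sizes_pop1, sizes_pop2):
    """lnRV per locus: ln(variance in pop 1 / variance in pop 2).

    sizes_pop1, sizes_pop2: sequences with one entry per locus, each entry
    being the allele sizes (repeat numbers) sampled in that population.
    A locus with zero variance in either population produces inf, -inf or
    nan (with a RuntimeWarning), so such loci need special handling.
    """
    return np.array([
        np.log(np.var(a, ddof=1) / np.var(b, ddof=1))
        for a, b in zip(sizes_pop1, sizes_pop2)
    ])


def flag_outliers(lnrv, z_crit=1.96):
    """Standardise lnRV values across loci and flag those outside +/- z_crit.

    Strongly negative values indicate reduced variance in population 1,
    the pattern expected near a locus hit by a recent sweep.
    """
    lnrv = np.asarray(lnrv, dtype=float)
    z = (lnrv - lnrv.mean()) / lnrv.std(ddof=1)
    return z, np.abs(z) > z_crit


# Hypothetical allele sizes at two loci (not real data).
pop1 = [[10, 11, 12], [8, 9, 10, 11]]
pop2 = [[9, 11, 13], [7, 9, 11, 13]]
print(ln_rv(pop1, pop2))
```

In a real analysis the standardisation would be applied to lnRV values from many loci, so that the locus-independent population term sets the centre of the distribution and only locus-specific departures stand out.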
Pros and cons of the lnRV statistic
- CONS
  - Much information is lost when the allele size data at a particular locus for all individuals in a population are reduced to a single value
  - Only makes pairwise comparisons
    - Difficult to extend the methodology to more than 2 populations
    - Inferences from pairs of populations are not carried over to other populations
  - Masking can occur when multiple outliers widen the confidence interval, so that none, or only a subset, of the outliers are detected
- PROS
  - Easy and fast to calculate
  - Intuitive to understand
  - Can cope with a very large number of loci
  - Not sensitive to genetic drift, migration or inbreeding, since these processes affect all loci to the same extent and so shift all lnRV values equally rather than creating outliers
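The masking problem listed under CONS can be reproduced with a few made-up lnRV values. In this sketch (hypothetical data, mean/standard-deviation standardisation and an assumed 1.96 cut-off), a single strongly negative locus is flagged, but when four loci share that value they inflate the standard deviation and none of them is flagged.

```python
import numpy as np


def flagged(lnrv, z_crit=1.96):
    """Flag values whose z-score (mean/SD standardisation) exceeds z_crit."""
    lnrv = np.asarray(lnrv, dtype=float)
    z = (lnrv - lnrv.mean()) / lnrv.std(ddof=1)
    return np.abs(z) > z_crit


# Hypothetical lnRV values for 12 neutral loci (not real data).
neutral = [0.1, -0.1, 0.2, -0.2, 0.0, 0.0, 0.1, -0.1, 0.2, -0.2, 0.0, 0.0]

one_outlier = neutral + [-2.0]
four_outliers = neutral + [-2.0] * 4

print(flagged(one_outlier).sum())    # 1: the single sweep-like locus is detected
print(flagged(four_outliers).sum())  # 0: the four outliers inflate the SD and mask each other
```

With four outliers their z-scores drop to about -1.66, inside the ±1.96 interval, even though each has the same lnRV value that was clearly flagged on its own.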
The Bayesian model
Distribution of microsatellite allele sizes
Mean components
Variance components
(i indexes population, j indexes locus, k indexes
individual)
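The equations on this slide did not survive extraction. Purely as an illustration of the structure the slide describes (a distribution for allele sizes, mean components and variance components indexed by population i, locus j and individual k), one hypothetical specification is given below. The symbols m, μ, α, β, τ², φ and λ are assumptions, not the authors' notation, and priors and identifiability constraints are omitted.

```latex
\begin{aligned}
y_{ijk} &\sim \mathcal{N}\!\left(m_{ij},\, \sigma^2_{ij}\right)
  && \text{allele size of individual } k \\
m_{ij} &= \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}
  && \text{mean components} \\
\sigma^2_{ij} &= \tau^2_j \, \phi_i \, \lambda_{ij}
  && \text{variance components}
\end{aligned}
```

In this sketch τ²_j would absorb locus-specific effects such as mutation rate, φ_i population-wide effects such as effective population size, and a small λ_ij a variance reduction specific to one population at one locus, the signature expected from a recent sweep.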
Consistency between the lnRV statistic and the Bayesian ANOVA
Bayesian ANOVA
lnRV statistic
Relative selection
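The slide's own equations are missing, but the link between the two approaches can be sketched under an assumed multiplicative variance decomposition σ²_ij = τ²_j φ_i λ_ij (locus term τ²_j, population term φ_i, population-by-locus term λ_ij; all hypothetical notation). The log ratio of variances for two populations at locus j then splits as:

```latex
\ln\frac{\sigma^2_{i_1 j}}{\sigma^2_{i_2 j}}
  = \underbrace{\ln\frac{\phi_{i_1}}{\phi_{i_2}}}_{\text{same for every locus}}
  + \underbrace{\ln\frac{\lambda_{i_1 j}}{\lambda_{i_2 j}}}_{\text{relative selection at locus } j}
```

The locus term cancels just as the mutation rate does in lnRV; the first term sets the centre of the lnRV distribution, and a locus whose second term is strongly negative would appear as an lnRV outlier.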
Bayesian statistics for detecting selective sweeps
For a given locus j, the population with the
smallest fractional reduction in allele size
variance is denoted imax and has this
corresponding variance component.
Relative selection at locus j can be measured
relative to population imax, e.g.
- Here BnM has the largest value of this variance component, so it is the least selected
- BnB and SeB have the smallest values, so they are the most selected
- The extent of selection can be measured by
- And
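Once posterior draws are available, summaries of relative selection can be computed directly from them. The sketch below assumes a hypothetical array of MCMC draws of a population-by-locus variance factor λ_ij for one locus (larger values meaning less variance reduction), picks i_max as the population with the largest posterior mean, and reports for each population the posterior mean of 1 - λ_ij/λ_imax,j and the posterior probability that λ_ij < λ_imax,j. These measures are illustrative choices, not necessarily the statistics used in the original analysis.

```python
import numpy as np


def relative_selection(lam_draws):
    """Summarise relative selection at one locus from posterior draws.

    lam_draws: array of shape (n_draws, n_pops) holding hypothetical MCMC
    draws of a variance factor lambda_ij for each population i at locus j.
    Returns (i_max, extent, prob_selected), where for each population
      extent        = posterior mean of 1 - lambda_ij / lambda_imax,j
      prob_selected = posterior probability that lambda_ij < lambda_imax,j
    """
    lam_draws = np.asarray(lam_draws, dtype=float)
    # i_max: population with the smallest fractional variance reduction,
    # taken here as the one with the largest posterior-mean lambda.
    i_max = int(np.argmax(lam_draws.mean(axis=0)))
    # Ratio to the reference population, computed draw by draw.
    rho = lam_draws / lam_draws[:, [i_max]]
    extent = 1.0 - rho.mean(axis=0)
    prob_selected = (rho < 1.0).mean(axis=0)
    return i_max, extent, prob_selected


# Hypothetical usage: 4000 fake posterior draws for three populations.
rng = np.random.default_rng(0)
draws = rng.lognormal(mean=[0.0, -1.0, -0.1], sigma=0.2, size=(4000, 3))
i_max, extent, prob = relative_selection(draws)
```

By construction the reference population i_max gets an extent of 0 and a probability of 0; the probabilities for the other populations are the kind of quantitative measure of selection that a summary statistic such as lnRV cannot provide.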
Pros and cons of the Bayesian approach
- CONS
  - Can take a long time to converge
  - Sometimes requires a lot of computing power
  - Bayesian methods are more difficult to implement
    - Require well-specified prior distributions
    - Require programming and the use of complicated software
  - Inferences are slightly influenced by the subjective choice of prior distributions
- PROS
  - Doesn't reduce the data to summary statistics before analysis
  - Can be used to compare more than 2 populations at once
  - Inferences from one population are carried over to all others
  - Can cope with any number of selected loci without masking occurring
  - Supplies quantitative measures of the probability that selection has occurred
  - Can cope well with tiny sample sizes
Microsatellite data for West African Anopheles gambiae populations
- 1998 data set
  - Allele size data collected at 21 microsatellite loci dispersed throughout the Anopheles gambiae genome
  - 5 subpopulations
    - Bamako chromosomal form in the villages of Banambani and Selinkenyi
    - Mopti chromosomal form in the villages of Banambani and Selinkenyi
    - Savannah chromosomal form in the village of Banambani
- 2003 data set
  - Microsatellite allele size data collected at 12 microsatellite loci dispersed throughout Anopheles gambiae chromosome 3
  - Data taken for 12 subpopulations
    - Mopti chromosomal form in the villages of Oure, Dire, Kondi, Nampala, Torkya and Banikane
    - Savannah chromosomal form in the villages of Oure, Gono, Kokouna, Pimperena, Soulouba and Madina Diasra
Loci likely targeted by recent selective sweeps (1998 data set)
Applying the Bayesian ANOVA model to the 1998 data set, there is evidence of selection at the following loci (ranked by magnitude):
Rank  Locus  Chromosome  Chromosomal form  Location(s)
1     637    2L          Bamako            Banambani, Selinkenyi
2     038    X           Savannah          Banambani
3     135    2           Mopti             Banambani, Selinkenyi
4     079    2           Mopti             Banambani, Selinkenyi
5     175    2           Mopti             Banambani, Selinkenyi
6     095    2R          Mopti             Banambani, Selinkenyi
7     025    X           Savannah          Banambani
[Figure: Locus 637]
Loci likely targeted by recent selective sweeps (2003 data set)
Applying the Bayesian ANOVA model to the 2003 data set, there is evidence of selection at the following loci (ranked by magnitude):
Rank  Locus  Chromosome  Chromosomal form  Location(s)
1     119    3R          Mopti             Oure
2     127    3R          Savannah          Oure
3     093    3L          Mopti             Kondi, Banikane
3     093    3L          Savannah          Gono, Kokouna
4     812    3           Mopti             Nampala, Dire
5     817    3L          Savannah          Soulouba
6     555    3           Savannah          Madina Daisra
[Figure: Locus 119]
Implications for recent selection in the Anopheles gambiae genome
- 1998 data set
  - Strongest evidence for selection is for
    - locus 637 (chromosome 2) in the Bamako form
    - locus 038 (X chromosome) in the Savannah form
  - Most selected loci are on chromosome 2
  - For a given chromosomal form collected at Banambani and Selinkenyi, selection seems to be evident in both locations
  - The same does not apply for a given location where multiple chromosomal forms are collected
  - This suggests there is more gene flow between these two villages than between chromosomal forms
- 2003 data set
  - Strongest evidence for selection is for
    - locus 119 (chromosome 3R) in the Mopti form in Oure
    - locus 127 (chromosome 3R) in the Savannah form in Oure
  - Selected loci are dispersed throughout chromosome 3 (only chromosome 3 loci were analyzed in this data set)
  - This time there is very little agreement between samples of the same chromosomal form collected at neighbouring locations
  - Possibly selection on chromosome 3 is weaker (the 1998 data set showed no selection on chromosome 3)