Title: Recombination based population genomics
1Recombination based population genomics
- Jaume Bertranpetit
- Marta Melé
- Francesc Calafell
- Asif Javed
- Laxmi Parida
2Recall IRiS
- Identification of Recombinations in Sequences (IRiS)
- IRiS
  - is a computational method developed with biological insight
  - detects evidence of historical recombinations
  - minimizes the number of recombinations in the Ancestral Recombination Graph (ARG)
3Recotypes
- Two chromosomes share a recombination if the
junction is co-inherited.
Figure legend: mutation edge, recombination edge, extant sequence
6Recotypes
- Two chromosomes share a recombination if the
junction is co-inherited.
Figure: ARG with recombinations r1 and r2 and extant sequences a, b, c
     r1  r2
a     1   0
b     1   0
c     0   1
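The recotype table above can be sketched in a few lines of Python. This is only an illustration: the `carriers` mapping (which extant chromosomes inherited each junction) is a hypothetical input standing in for what IRiS would infer, and the helper `shared` is not part of IRiS.

```python
from itertools import combinations

# Hypothetical input: for each inferred recombination, the set of extant
# chromosomes that co-inherited its junction (taken from the slide's example).
carriers = {
    "r1": {"a", "b"},
    "r2": {"c"},
}
chromosomes = ["a", "b", "c"]
recombinations = sorted(carriers)

# Recotype of a chromosome: one 0/1 entry per recombination.
recotypes = {
    chrom: [int(chrom in carriers[r]) for r in recombinations]
    for chrom in chromosomes
}

def shared(x, y):
    """Recombinations whose junction is carried by both chromosomes."""
    return [
        r
        for r, i, j in zip(recombinations, recotypes[x], recotypes[y])
        if i and j
    ]

for x, y in combinations(chromosomes, 2):
    print(x, y, shared(x, y))
```

Chromosomes a and b share r1, while c shares nothing with either, which matches the table.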
7Validity of inferred recombinations
- Comparison with sperm typing
- Computer simulated recombinations
8in vitro
Chr 1 near the MS32 minisatellite
- Jeffreys et al. 2005: 80 UK semen donors of North European origin
  - Sperm typing
  - LDhat and Phase (200 SNPs)
- IRiS: HapMap 2 CEU population, similar SNP density
Figure panels: sperm typing; LDhat and Phase; IRiS
9in silico
Chromosomes
- HapMap 3 X chromosome data
- Select 2 chromosomes at random.
- Pick a random breakpoint.
- Create a new chromosome.
- If it is unique, add it to the dataset.
- Run IRiS on the dataset to see whether the breakpoint is detected.
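The simulation steps above can be sketched as follows. This is a minimal illustration under stated assumptions: haplotypes are represented as 0/1 strings, the toy `dataset` stands in for the HapMap 3 X chromosome data, and `make_recombinant` is a hypothetical helper. The final IRiS run is not shown because the slides do not describe its interface.

```python
import random

def make_recombinant(dataset, rng):
    """Build one recombinant from two randomly chosen chromosomes.

    Returns (child, breakpoint, parent_indices); child is None when the
    new sequence already exists in the dataset.
    """
    i, j = rng.sample(range(len(dataset)), 2)   # select 2 chromosomes at random
    bp = rng.randint(1, len(dataset[i]) - 1)    # random breakpoint inside the sequence
    child = dataset[i][:bp] + dataset[j][bp:]   # left part of i, right part of j
    if child in dataset:                        # keep only unique sequences
        return None, bp, (i, j)
    return child, bp, (i, j)

rng = random.Random(0)
# Toy stand-in for phased haplotypes (one 0/1 allele per SNP).
dataset = ["0000000000", "1111111111", "0101010101", "0011001100"]

planted = []
for _ in range(50):                             # bounded number of attempts
    if len(planted) == 3:
        break
    child, bp, parents = make_recombinant(dataset, rng)
    if child is not None:
        dataset.append(child)
        planted.append((child, bp, parents))

for child, bp, parents in planted:
    print(child, "breakpoint", bp, "parents", parents)
```

Each entry of `planted` records the known breakpoint and parents, which is what a detection method would then be scored against.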
15in silico
Result of running IRiS on the simulated dataset:
- 69 recombinations detected
- All detected recombinations identify the correct sequence
- No false positives
16Recombinomics
- Strong population structure
- Agreement with traditional methods
- FST vs. recombinational distance
- More informative than SNPs
- STRUCTURE
17Regions
- 18 regions selected from HapMap 3
- X chromosome in males (to avoid phasing errors)
- 50 kb away from known CNVs and SDs (to avoid genotyping errors)
- 50 kb away from genes (to avoid selection)
- at least 80 SNPs
Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42), CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)
18Analysis
- For each region, IRiS inferred recotypes for each chromosome
- 5166 recombinations were inferred
- 3459 co-occurred in at least two chromosomes
Rows: chromosomes. Columns: recombinations.
        r1  r2  r3  r4  r5  r6  r3459
LK1      0   1   1   0   0   0      0
LK2      1   0   1   1   0   0      0
LK43     1   0   1   0   0   0
MK1      0   1   0   0   1   1      1
TI40     0   0   0   0   0   1      0
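The filtering step (keeping recombinations that co-occur in at least two chromosomes) can be illustrated with a toy matrix. The values and names below are made up for the example and are not taken from the study.

```python
# Toy recotype matrix: rows are chromosomes, columns are recombinations.
recotypes = {
    "chr1": [0, 1, 1, 0],
    "chr2": [1, 0, 1, 0],
    "chr3": [0, 0, 1, 0],
}
names = ["r1", "r2", "r3", "r4"]

# Number of chromosomes carrying each recombination (column sums).
carriers = [sum(col) for col in zip(*recotypes.values())]

# Keep only recombinations that co-occur in at least two chromosomes.
keep = [k for k, n in enumerate(carriers) if n >= 2]
shared_names = [names[k] for k in keep]
filtered = {c: [row[k] for k in keep] for c, row in recotypes.items()}

print(carriers, shared_names, filtered)
```

Here only r3 is carried by two or more chromosomes, so the filtered recotypes have a single column.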
19Analysis
- Each row of the chromosome-by-recombination matrix is the recotype of that chromosome.
20Agreement with LDhat
- Each point represents a short haplotype segment in the HapMap CEU population
- Spearman correlation 0.711, p-value < 10^-30
Figure axes: recombination rate inferred by LDhat; number of recombinations inferred by IRiS
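For readers unfamiliar with the statistic, a Spearman correlation is the Pearson correlation computed on ranks. The sketch below uses only the standard library; the two input lists are made-up values for illustration and are not the LDhat or IRiS results.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up per-segment values, for illustration only.
ldhat_rate = [0.1, 0.4, 0.2, 1.5, 0.9]
iris_count = [0, 3, 1, 9, 4]
rho = spearman(ldhat_rate, iris_count)
print(rho)
```

Because the statistic depends only on ranks, it is insensitive to the different scales of a rate and a count.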
21Agreement with LDhat
- Correlation in hotspots: χ² = 38.39, p-value < 6 x 10^-10
22Recombinational distance between populations
- Two genetically closer populations will share a higher number of recombinations
- Recombinational distance: $D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$
- Correlation between FST distance and recombinational distance for the 18 regions: 0.35 to 0.75, with p-values < 0.025
- MDS: all regions combined, stress 6.1
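A small sketch of the distance, reading the slide's formula as D_AB = 1 - R_AB / (R_A + R_B - R_AB). The slide does not define the symbols, so the code assumes R_A and R_B are the numbers of recombinations observed in populations A and B and R_AB is the number they share. The function name and the toy sets are hypothetical.

```python
def recombinational_distance(recs_a, recs_b):
    """D_AB = 1 - R_AB / (R_A + R_B - R_AB) on two sets of recombination ids."""
    r_ab = len(recs_a & recs_b)                  # shared recombinations
    union = len(recs_a) + len(recs_b) - r_ab     # R_A + R_B - R_AB
    if union == 0:
        return 0.0                               # convention chosen here for two empty sets
    return 1 - r_ab / union

# Toy populations, each described by the recombinations seen in it.
pop_a = {"r1", "r2", "r3", "r4"}
pop_b = {"r3", "r4", "r5"}
pop_c = {"r6"}

print(recombinational_distance(pop_a, pop_b))
print(recombinational_distance(pop_a, pop_c))
```

Under this reading the distance is 0 for identical sets and 1 for populations that share no recombination.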
23PCA of population data
Recall recotypes:
        r1  r2  r3  r4  r5  r6  r3459
LK1      0   1   1   0   0   0      0
LK2      1   0   1   1   0   0      0
LK43     1   0   1   0   0   0
MK1      0   1   0   0   1   1      1
TI40     0   0   0   0   0   1      0
24PCA of population data
Recombination counts per population:
        r1  r2  r3  r4  r5  r6  r3459
LK      14   7   4   9   0   1      0
MK       1   4   7   0   5   7     24
TI       0   1   7   1   0   0      1
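One plausible way to get from per-chromosome recotypes to a per-population table is to sum the recotypes of each population column by column. The sketch below assumes that reading; the recotypes and their keys are toy values, not the study data.

```python
from collections import defaultdict

# Toy recotypes keyed by (population, chromosome id).
recotypes = {
    ("LWK", 1): [0, 1, 1, 0],
    ("LWK", 2): [1, 0, 1, 1],
    ("MKK", 1): [0, 1, 0, 0],
    ("TSI", 1): [0, 0, 0, 1],
}
n_recombinations = 4

# For each population, count the chromosomes carrying each recombination.
counts = defaultdict(lambda: [0] * n_recombinations)
for (population, _), row in recotypes.items():
    for k, present in enumerate(row):
        counts[population][k] += present
counts = dict(counts)

print(counts)
```

The resulting population-by-recombination matrix is the kind of input a PCA would then be run on.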
25PCA of population data
- The first two PCs capture 66.4% of the variance
26PCA of recotypes
27Recotypes vs. SNPs
- Due to ascertainment bias, gene diversity does not reflect population structure
- Results similar to Conrad 07
- Normalized comparison, linearly scaled to [0,1], using 21 samples per population
Percentage of variance (in agreement with Lewontin 72):
                     SNPs  Recotypes
Across groups           9          6
Within groups           4          1
Within populations     87         93
28from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
K = 2
Figure panels: SNPs, haplotypes, recotypes
29from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
K = 3
Figure panels: SNPs, haplotypes, recotypes
30from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
K = 4
Figure panels: SNPs, haplotypes, recotypes
31from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
K = 5
Figure panels: SNPs, haplotypes, recotypes
32Africa within global genetic variation
- STRUCTURE, K = 4: minority African-specific component
- Figure: avg. number of recombinations in 21 random chromosomes
- Out of Africa hypothesis: founder effect
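The subsampling statistic can be sketched as below. The slides do not say exactly how the count is defined, so this assumes it is the number of distinct recombinations carried by at least one chromosome in a random subsample, averaged over repeated draws. The function name, the toy data, and the subsample size of 2 (the slides use 21) are illustrative only.

```python
import random

def mean_recombinations(recotypes, sample_size, draws, rng):
    """Average number of distinct recombinations in a random subsample."""
    total = 0
    for _ in range(draws):
        sample = rng.sample(recotypes, sample_size)   # sample without replacement
        # A recombination counts once if any sampled chromosome carries it.
        total += sum(1 for col in zip(*sample) if any(col))
    return total / draws

rng = random.Random(1)
# Toy recotypes for one population: rows are chromosomes.
population = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [0, 1, 0],
]
print(mean_recombinations(population, sample_size=2, draws=1000, rng=rng))
```

Using the same subsample size for every population keeps the comparison from being driven by differences in sample size.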
33Genetic variation within Africa
- STRUCTURE, K = 5: Maasai-specific minor component
- Sub-Saharan Maasai are distinct among Africans.
- African Americans exhibit stronger recombinational affinity with African populations than with European populations (Parra 98).
34Genetic variation outside Africa
- STRUCTURE, K = 5
- Figure: avg. number of recombinations in 21 random chromosomes
- Outside Africa, Gujarati and Japanese exhibit the highest and lowest number of recombinations, respectively.
- Gujarati Indians occupy an intermediate position between Europeans and East Asians.
35Venturing outside the X-chromosome
- Benefits
  - The bigger picture
  - More regions and hence more information
- Challenges
  - Higher number of recombinations makes the picture murkier
  - Phasing errors
36Regions
- 81 regions selected from HapMap 3
- 50 kb away from known CNVs and SDs (to avoid genotyping errors)
- 50 kb away from genes (to avoid selection)
- at least 200 SNPs
- 25 samples per population (each sample has two chromosomes)
37Analysis
- For each region, IRiS inferred recotypes for each chromosome
- 34140 recombinations were inferred
- For each sample, the two recotypes were merged.
Figure: PCA plots for SNPs and recotypes
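The slides do not state how the two recotypes of a sample were merged. One simple possibility, shown here purely as an assumption, is an element-wise sum, which gives 0, 1 or 2 copies of each recombination per sample. An element-wise OR would be an equally plausible alternative. `merge_recotypes` is a hypothetical helper.

```python
def merge_recotypes(first, second):
    """Element-wise sum: 0, 1 or 2 copies of each recombination per sample."""
    if len(first) != len(second):
        raise ValueError("recotypes must cover the same recombinations")
    return [a + b for a, b in zip(first, second)]

# The two chromosomes of one diploid sample (toy values).
sample = merge_recotypes([0, 1, 1, 0], [1, 1, 0, 0])
print(sample)
```

Merging at the sample level means the result does not depend on which of the two chromosomes carries a given recombination.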
38Quantifying population structure
- PCA followed by k-nearest neighbors is used to predict the population of every sample
Figure: population hierarchy used for classification
- Perfectly classified: Africans, Non-Africans
- Classified with errors: MKK, GIH, E. Asian, MEX, European
- Populations shown: LWK, YRI, ASW, CHB+CHD, JPT, CEU, TSI
- Misclassification by (recotypes, SNPs): (0,7), (3,13), (8,13), (4,3)
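A minimal leave-one-out k-nearest-neighbor classifier illustrates how misclassifications could be counted. It assumes each sample has already been projected onto its leading principal components. The coordinates, labels, and the choice k = 3 are made up, since the slides do not give them.

```python
from collections import Counter
import math

def knn_predict(points, labels, query_index, k):
    """Predict the label of one sample from its k nearest other samples."""
    query = points[query_index]
    neighbours = sorted(
        (math.dist(query, p), labels[i])
        for i, p in enumerate(points)
        if i != query_index                      # leave the query sample out
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def count_misclassified(points, labels, k=3):
    """Number of samples whose predicted label differs from the true one."""
    return sum(
        knn_predict(points, labels, i, k) != labels[i]
        for i in range(len(points))
    )

# Toy PC coordinates for two well-separated populations.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = ["A", "A", "A", "B", "B", "B"]
print(count_misclassified(points, labels, k=3))
```

Running the same procedure once on recotype-based coordinates and once on SNP-based coordinates would give a pair of error counts for each group.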
39East Asian population
- Recotypes are more informative of the underlying population structure.
Figure: PCA plots for SNPs and recotypes
40in conclusion
- Recotypes
  - show strong agreement with in silico and in vitro recombination rate estimates
  - are highly informative of the underlying population structure
  - provide a novel approach to studying recombinational dynamics