Recombination based population genomics

Transcript and Presenter's Notes


1
Recombination based population genomics
  • Jaume Bertranpetit
  • Marta Melé
  • Francesc Calafell

Asif Javed, Laxmi Parida
2
Recall IRiS
  • IRiS: Identification of Recombinations in Sequences
  • IRiS is a computational method developed with biological insight
  • It detects evidence of historical recombinations
  • It minimizes the number of recombinations in the Ancestral
    Recombination Graph (ARG)

3
Recotypes
  • Two chromosomes share a recombination if the
    junction is co-inherited.

[Figure legend: mutation edge, recombination edge, extant sequence]
    r1  r2
a    1   0
b    1   0
c    0   1
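As a minimal sketch of the r1/r2 example on this slide, a recotype can be held as a binary vector with one entry per recombination. The `carriers` input format and the name `build_recotypes` are illustrative assumptions and are not part of IRiS.

```python
def build_recotypes(chromosomes, carriers):
    """Return {chromosome: [0/1 per recombination]}.

    carriers maps each recombination name to the set of chromosomes that
    co-inherited its junction (hypothetical input format).
    """
    recombinations = sorted(carriers)
    return {
        chrom: [1 if chrom in carriers[r] else 0 for r in recombinations]
        for chrom in chromosomes
    }


# Example from the slide: a and b share r1, c carries r2.
carriers = {"r1": {"a", "b"}, "r2": {"c"}}
recotypes = build_recotypes(["a", "b", "c"], carriers)
for chrom, vector in recotypes.items():
    print(chrom, *vector)
```

Two chromosomes with identical vectors, such as a and b here, share the same recotype.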
7
Validity of inferred recombinations
  • Comparison with sperm typing
  • Computer simulated recombinations

8
in vitro
Chr 1 near the MS32 minisatellite
Jeffreys et al. 2005: 80 UK semen donors of North European origin
  • Sperm typing
  • LDhat and Phase (200 SNPs)
HapMap 2 CEU population, similar SNP density
  • IRiS
[Figure: comparison of sperm typing, LDhat, Phase, and IRiS]
9
in silico
Chromosomes
  • HapMap 3 X chromosome data
  • Select 2 chromosomes at random.
  • Pick a random breakpoint.
  • Create a new chromosome.
  • Check if it is unique, add to the dataset.
  • Run IRiS on the dataset to see if the breakpoint
    is detected.

IRiS: recombination detected?
  • 69 recombinations detected
  • All detected recombinations detect the correct sequence
  • No false positives
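The simulation procedure on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: chromosomes are toy equal-length allele strings rather than HapMap 3 data, the function name `add_simulated_recombinant` is hypothetical, and the final IRiS run is not shown.

```python
import random


def add_simulated_recombinant(dataset, rng):
    """Try to add one simulated recombinant chromosome to the dataset.

    Returns (recombinant, cut) if a new unique chromosome was added,
    otherwise None. Chromosomes are equal-length strings of alleles.
    """
    parent_a, parent_b = rng.sample(dataset, 2)  # select 2 chromosomes at random
    cut = rng.randrange(1, len(parent_a))        # pick a random breakpoint
    recombinant = parent_a[:cut] + parent_b[cut:]  # create a new chromosome
    if recombinant in dataset:                   # keep it only if it is unique
        return None
    dataset.append(recombinant)
    return recombinant, cut


rng = random.Random(0)
originals = ["0000000000", "1111111111", "0101010101"]  # toy data, not HapMap
dataset = list(originals)
result = None
while result is None:
    result = add_simulated_recombinant(dataset, rng)
recombinant, cut = result
print(recombinant, cut)
# The enlarged dataset would then be given to IRiS to check whether a
# recombination is reported at position `cut` (not shown here).
```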
16
Recombinomics
  • Strong population structure
  • Agreement with traditional methods
  • FST vs. recombinational distance
  • More informative than SNPs
  • STRUCTURE
  • PCA

17
Regions
  • 18 regions selected from HapMap 3
  • X chromosome in males (to avoid phasing errors)
  • 50 kb away from known CNV and SD (to avoid
    genotyping errors)
  • 50 kb away from genes (to avoid selection)
  • at least 80 SNPs

Chromosomes: LWK (43), MKK (88), YRI (88), ASW (42), GIH (42),
CHB (40), CHD (21), JPT (25), MEX (21), CEU (74), TSI (40)
18
Analysis
  • For each region IRiS inferred recotypes for each
    chromosome
  • 5166 recombinations were inferred
  • 3459 co-occurred in at least two chromosomes

Rows: chromosomes. Columns: recombinations.
       r1  r2  r3  r4  r5  r6  ...  r3459
LK1     0   1   1   0   0   0  ...      0
LK2     1   0   1   1   0   0  ...      0
...
LK43    1   0   1   0   0   0  ...
MK1     0   1   0   0   1   1  ...      1
...
TI40    0   0   0   0   0   1  ...      0
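The step from all inferred recombinations to those that co-occur in at least two chromosomes could be sketched as a column filter on the recotype matrix. The data below are toy values, and `shared_recombinations` is a hypothetical helper name, not an IRiS function.

```python
def shared_recombinations(recotypes, min_carriers=2):
    """Return column indices of recombinations carried by at least
    min_carriers chromosomes."""
    n_recombinations = len(next(iter(recotypes.values())))
    counts = [
        sum(vector[j] for vector in recotypes.values())
        for j in range(n_recombinations)
    ]
    return [j for j, count in enumerate(counts) if count >= min_carriers]


# Toy recotypes (rows: chromosomes, columns: recombinations).
recotypes = {
    "LK1": [0, 1, 1, 0],
    "LK2": [1, 0, 1, 1],
    "MK1": [0, 1, 0, 0],
}
kept = shared_recombinations(recotypes)
filtered = {chrom: [vec[j] for j in kept] for chrom, vec in recotypes.items()}
print(kept)      # column indices that survive the filter
print(filtered)  # recotypes restricted to shared recombinations
```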
20
Agreement with LDhat
Each point represents a short haplotype segment in the HapMap CEU
population
Spearman correlation: 0.711, p-value < 10^-30
[Figure: scatter plot; axes: recombination rate inferred by LDhat,
number of recombinations inferred by IRiS]
Correlation in hotspots: χ² = 38.39, p-value < 6 × 10^-10
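For readers unfamiliar with the statistic, the Spearman correlation reported here is the Pearson correlation of the ranks of the two variables. The sketch below uses made-up values for the per-segment LDhat rates and IRiS counts; it illustrates the statistic only and does not reproduce the 0.711 result.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


ldhat_rate = [0.1, 0.4, 0.2, 2.5, 0.9]  # toy values, one per segment
iris_count = [0, 3, 1, 9, 2]            # toy values, one per segment
rho = spearman(ldhat_rate, iris_count)
print(round(rho, 3))
```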
22
Recombinational distance between populations
  • Two populations that are genetically closer will share a
    higher number of recombinations

Recombinational distance: $D_{AB} = 1 - \frac{R_{AB}}{R_A + R_B - R_{AB}}$
Correlation between FST distance and recombinational distance for
the 18 regions: 0.35 to 0.75, with p-values < 0.025
MDS, all regions combined: stress = 6.1
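As a small worked example, reading the slide's formula as D_AB = 1 - R_AB / (R_A + R_B - R_AB), the distance can be computed from sets of recombination identifiers. It is an assumption here that R_A and R_B count the recombinations observed in populations A and B and that R_AB counts those observed in both; the slide does not define them further.

```python
def recombinational_distance(recombinations_a, recombinations_b):
    """D_AB = 1 - R_AB / (R_A + R_B - R_AB) for two sets of recombination IDs.

    ASSUMPTION: R_A and R_B are the numbers of recombinations seen in each
    population and R_AB is the number seen in both.
    """
    r_a = len(recombinations_a)
    r_b = len(recombinations_b)
    r_ab = len(recombinations_a & recombinations_b)
    union = r_a + r_b - r_ab
    if union == 0:
        raise ValueError("both populations have no recombinations")
    return 1 - r_ab / union


pop_a = {"r1", "r2", "r3"}
pop_b = {"r2", "r3", "r4", "r5"}
d = recombinational_distance(pop_a, pop_b)  # 1 - 2 / (3 + 4 - 2)
print(d)
```

Under this reading the distance is 0 for populations with identical recombinations and 1 for populations that share none.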
23
PCA of population data
Recall recotypes
       r1  r2  r3  r4  r5  r6  ...  r3459
LK1     0   1   1   0   0   0  ...      0
LK2     1   0   1   1   0   0  ...      0
...
LK43    1   0   1   0   0   0  ...
MK1     0   1   0   0   1   1  ...      1
...
TI40    0   0   0   0   0   1  ...      0
     r1  r2  r3  r4  r5  r6  ...  r3459
LK   14   7   4   9   0   1  ...      0
MK    1   4   7   0   5   7  ...     24
...
TI    0   1   7   1   0   0  ...      1
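If the per-population rows (LK, MK, ..., TI) are read as the number of chromosomes in each population carrying each recombination, they could be derived from the per-chromosome recotypes by summing within populations. That reading is an assumption, as are the toy data and the helper name `population_counts`.

```python
def population_counts(recotypes, population_of):
    """Sum recotype vectors over the chromosomes of each population."""
    totals = {}
    for chrom, vector in recotypes.items():
        pop = population_of[chrom]
        if pop not in totals:
            totals[pop] = [0] * len(vector)
        totals[pop] = [t + v for t, v in zip(totals[pop], vector)]
    return totals


# Toy recotypes and population labels (not the HapMap data).
recotypes = {
    "LK1": [0, 1, 1],
    "LK2": [1, 0, 1],
    "MK1": [0, 1, 0],
}
population_of = {"LK1": "LK", "LK2": "LK", "MK1": "MK"}
counts = population_counts(recotypes, population_of)
print(counts)
```

A matrix of this shape, one row per population, is what a PCA of population data would take as input.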
25
PCA of population data
The first two PCs capture 66.4% of the variance
26
PCA of recotypes
  • more on this later

27
Recotypes vs. SNPs
  • Due to ascertainment bias gene diversity does not
    reflect population structure

results similar to Conrad 07

Percentage of variance
                     SNPs  Recotypes
Across groups           9          6
Within groups           4          1
Within populations     87         93

Normalized comparison, linearly scaled to [0, 1], using 21 samples
per population
in agreement with Lewontin 72
28
from SNPs to haplotypes to recotypes (a STRUCTURE comparison)
[Figures, slides 28-31: STRUCTURE results for K = 2, 3, 4, and 5, each
with panels for SNPs, haplotypes, and recotypes]
32
Africa within global genetic variation
STRUCTURE, K = 4: minority African-specific component
Avg. number of recombinations in 21 random chromosomes
Out of Africa hypothesis; founder effect
33
Genetic variation within Africa
STRUCTURE, K = 5: Maasai-specific minor component
  • Sub-Saharan Maasai are distinct among Africans.
  • African-Americans exhibit stronger recombinational affinity
    with African populations than with European populations
    (Parra 98).

34
Genetic variation outside Africa
STRUCTURE, K = 5
Avg. number of recombinations in 21 random chromosomes
  • Outside Africa, Gujarati and Japanese exhibit the highest and
    lowest numbers of recombinations, respectively.
  • Gujarati Indians occupy an intermediate position between
    Europeans and East Asians.

35
Venturing outside the X-chromosome
  • Benefits: the bigger picture; more regions and hence more
    information
  • Challenges: a higher number of recombinations makes the picture
    murkier; phasing errors

36
Regions
  • 81 regions selected from HapMap 3
  • 50 kb away from known CNV and SD (to avoid
    genotyping errors)
  • 50 kb away from genes (to avoid selection)
  • at least 200 SNPs
  • 25 samples per population (each sample has
    two chromosomes)

37
Analysis
  • For each region IRiS inferred recotypes for each
    chromosome
  • 34140 recombinations were inferred
  • For each sample the two recotypes were merged.

[Figure: PCA plots for SNPs and recotypes]
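The slide does not say how the two recotypes of a sample were merged. One plausible reading, shown here purely as an assumption, is an element-wise sum, so that each entry counts how many of the sample's two chromosomes carry the recombination; a logical OR would be another candidate.

```python
def merge_recotypes(first, second):
    """Merge the recotypes of a sample's two chromosomes by element-wise sum.

    ASSUMPTION: the merge rule is not given in the slides; this sum is only
    one plausible choice (a logical OR is another).
    """
    if len(first) != len(second):
        raise ValueError("recotypes must cover the same recombinations")
    return [a + b for a, b in zip(first, second)]


# Toy recotypes for the two chromosomes of one sample.
merged = merge_recotypes([0, 1, 1, 0], [1, 1, 0, 0])
print(merged)
```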
38
Quantifying population structure
  • PCA followed by k-nearest neighbors is used to predict the
    population of every sample

[Figure: population diagram split into Africans and non-Africans, with
nodes marked as perfectly classified or classified with errors. Node
labels: MKK, GIH, E. Asian, MEX, European, LWK, CHB CHD, JPT, CEU, TSI,
YRI, ASW. Misclassification counts as (recotypes, SNPs): (0,7), (3,13),
(8,13), (4,3)]
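The classification step might look roughly like the sketch below, which assumes each sample has already been projected onto its leading principal components. The value of k, the leave-one-out evaluation, and the toy coordinates are all assumptions; the slides do not give these details.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k):
    """Predict a label by majority vote among the k nearest training points."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: math.dist(train_points[i], query),
    )[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_errors(points, labels, k=3):
    """Count samples whose population is mispredicted when held out."""
    errors = 0
    for i, (point, label) in enumerate(zip(points, labels)):
        rest_points = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest_points, rest_labels, point, k) != label:
            errors += 1
    return errors


# Toy PC coordinates for two well-separated populations.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(leave_one_out_errors(points, labels))
```

Running the same count once on SNP-based coordinates and once on recotype-based coordinates would give a pair of misclassification numbers comparable to the (recotypes, SNPs) pairs in the figure.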
39
East Asian population
  • Recotypes are more informative of underlying
    population structure.

[Figure: PCA plots for SNPs and recotypes]
40
in conclusion
  • Recotypes
  • show strong agreement with in silico and in
    vitro recombination rate estimates
  • are highly informative of the underlying
    population structure
  • provide a novel approach to study
    recombinational dynamics