Handling stratification in GWAS

1
Handling stratification in GWAS
  • Simon Heath
  • CNG, Institut de Génomique, CEA

2
Cancer GWAS
  • For common cancers, very large cohorts are
    required
  • Typically involve multiple cohorts from different
    countries
  • Pool collections
  • Acquire international funding

3
Identification of genetic predisposition to lung
cancer
  • 1989 lung cancer cases and 2625 hospital-matched
    controls from multiple European countries (IARC)
  • 310,023 SNPs
  • Genome-wide significance: p < 5 x 10^-7
  • P 10^-10
  • Hung et al. (2008) Nature vol. 452 (7187) pp.
    633-7

4
Cancer association region

5
Samples
  • Cases and controls collected from Hungary, Czech
    Republic, Slovakia, Poland, Romania, Russia and
    Norway
  • Initial analysis carried out within each country
  • No differences in effect of chr. 15 locus
    detected between countries - samples pooled for
    final analysis

6
Ongoing cancer studies at the CNG
  • Breast, kidney, lung, head and neck, melanoma
  • All involve multi-national cohorts
  • Several of the studies share control cohorts

7
Shared Controls
  • Increasingly common to use control samples in
    multiple studies
  • Make use of previously genotyped samples - cost
    and time savings
  • Combination of multi-national cohorts and shared
    control samples makes stratification an important
    issue

8
Stratify by country
  • Requires accurate information on origins
  • Cannot account for differences within a country
    or for admixture
  • Loss of power
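
A stratified test can be sketched as follows. This is a minimal illustration, not the analysis used in the studies above: it assumes one 2x2 allele-by-status table per country and combines them with a Cochran-Mantel-Haenszel statistic. The function name and table layout are hypothetical.

```python
def cmh_statistic(tables):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction).

    tables: one (a, b, c, d) tuple per stratum (here: per country), where
      a = case alleles of the test type,    b = other case alleles,
      c = control alleles of the test type, d = other control alleles.
    """
    num = 0.0
    var = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        expected = (a + b) * (a + c) / n
        num += a - expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    return num * num / var


# Two countries with different allele frequencies and case fractions,
# but no association inside either country.
by_country = [(40, 60, 20, 30), (10, 40, 20, 80)]
pooled = [(50, 100, 40, 110)]

stratified = cmh_statistic(by_country)  # no within-country signal
naive = cmh_statistic(pooled)           # inflated by pooling the countries
```

In this toy data the pooled table shows a spurious signal that disappears once the test is stratified. The price is that every stratum needs both cases and controls, which is one source of the loss of power noted above.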

9
Predicting origins
  • Genome wide genotype data provides a lot of
    information on the relationships between
    individuals
  • Given a set of samples of different origins,
    should be able to predict the most likely origin
    of an unknown sample

10
Bayesian clustering methods
  • Define a set of clusters, where each cluster has
    a particular allele frequency profile
  • Estimate cluster allele frequencies and
    probabilities of cluster membership for samples
  • Known samples can be used to seed the clusters
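
The idea can be sketched with a much simpler stand-in: a maximum-likelihood mixture model fitted by EM rather than a full Bayesian sampler. The sketch assumes no admixture (each sample belongs to exactly one cluster), independent markers, equal prior weight for each cluster, and genotypes coded as 0/1/2 allele counts. All names are hypothetical.

```python
import math


def em_cluster(genotypes, seeds, n_clusters, n_iter=20):
    """Estimate cluster allele frequencies and membership probabilities.

    genotypes: list of per-sample lists of allele counts (0, 1 or 2).
    seeds: dict mapping sample index -> known cluster; these stay fixed.
    Returns (freqs, member): freqs[k][j] is the allele frequency of
    marker j in cluster k, member[i][k] the probability that sample i
    belongs to cluster k.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Start: seeded samples are certain, all others are uniform.
    member = [[1.0 / n_clusters] * n_clusters for _ in range(n)]
    for i, k in seeds.items():
        member[i] = [1.0 if c == k else 0.0 for c in range(n_clusters)]

    for _ in range(n_iter):
        # M-step: membership-weighted allele frequencies. The pseudo-count
        # keeps every frequency strictly between 0 and 1.
        freqs = []
        for k in range(n_clusters):
            weight = sum(member[i][k] for i in range(n))
            freqs.append([
                (sum(member[i][k] * genotypes[i][j] for i in range(n)) + 0.5)
                / (2.0 * weight + 1.0)
                for j in range(m)
            ])
        # E-step: update membership of the unseeded samples only.
        for i in range(n):
            if i in seeds:
                continue
            logl = [
                sum(g * math.log(p) + (2 - g) * math.log(1.0 - p)
                    for g, p in zip(genotypes[i], freqs[k]))
                for k in range(n_clusters)
            ]
            top = max(logl)
            lik = [math.exp(v - top) for v in logl]
            total = sum(lik)
            member[i] = [v / total for v in lik]
    return freqs, member
```

The seeded samples break the symmetry between clusters in the first iteration, which is the role the known samples play in the slide above.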

11
PCA based methods
  • Perform PCA on individual allele frequencies for
    markers across the genome
  • Select PCs that separate samples of known origin
    from each other
  • Use samples of known origin to predict origins of
    unknown samples
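
A small sketch of these three steps, under stated assumptions: genotypes are coded 0/1/2, each marker is centred and scaled by sqrt(p(1-p)), the leading PCs are found by power iteration on the sample-by-sample matrix, and origins are predicted by the nearest centroid of the known samples. The nearest-centroid rule and all function names are illustrative choices, not taken from the slides.

```python
import math


def standardize(genotypes):
    """Centre each marker and scale it by sqrt(p(1-p))."""
    n, m = len(genotypes), len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        mean = sum(row[j] for row in genotypes) / n
        p = mean / 2.0
        sd = math.sqrt(p * (1.0 - p)) or 1.0  # monomorphic marker: avoid /0
        for i in range(n):
            out[i][j] = (genotypes[i][j] - mean) / sd
    return out


def top_pcs(x, k, n_iter=200):
    """Sample coordinates on the k leading PCs (power iteration with
    deflation). Assumes k does not exceed the rank of the data."""
    n = len(x)
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
            for i in range(n)]
    pcs = []
    for c in range(k):
        v = [math.sin(i + 1 + c) for i in range(n)]  # deterministic start
        for _ in range(n_iter):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in pcs:  # remove the part along PCs already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        pcs.append(v)
    return [[pc[i] for pc in pcs] for i in range(n)]


def predict_origin(coords, known):
    """known: dict sample index -> origin label. Returns one predicted
    label per sample, using the nearest centroid in PC space."""
    groups = {}
    for i, label in known.items():
        groups.setdefault(label, []).append(coords[i])
    centroids = {
        label: [sum(col) / len(pts) for col in zip(*pts)]
        for label, pts in groups.items()
    }
    return [min(centroids, key=lambda lab: math.dist(c, centroids[lab]))
            for c in coords]
```

For example, `predict_origin(top_pcs(standardize(genotypes), 2), {0: "Poland", 7: "Norway"})` would label every sample with the closer of the two known origins; the indices and country labels here are made up.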

12-19
(No Transcript)

20
Correction for PCs
  • Rather than stratifying on country of origin
    (either given or predicted), can correct for the
    larger PCs in the analysis
  • More flexible: allows for overlaps between
    countries and differences within a country
  • EIGENSTRAT
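
The correction can be sketched in the spirit of EIGENSTRAT: remove from both the genotype and the phenotype whatever the ancestry axes explain, then test the residuals. The sketch below is a simplified illustration, not the EIGENSTRAT program itself. It assumes a single marker, a 0/1 phenotype and PC coordinates supplied as one list per axis, and uses (N - K - 1) times the squared correlation of the residuals as an approximate 1-df chi-square. Function names are hypothetical.

```python
import math


def residualize(values, axes):
    """Centre values, then remove the projection on each orthonormal axis."""
    mean = sum(values) / len(values)
    r = [v - mean for v in values]
    for axis in axes:
        dot = sum(x * y for x, y in zip(r, axis))
        r = [x - dot * y for x, y in zip(r, axis)]
    return r


def orthonormalize(covariates):
    """Centre the ancestry axes and make them orthonormal (Gram-Schmidt)."""
    axes = []
    for cov in covariates:
        r = residualize(cov, axes)
        norm = math.sqrt(sum(x * x for x in r))
        if norm > 1e-12:  # skip axes that add nothing new
            axes.append([x / norm for x in r])
    return axes


def adjusted_trend_stat(genotype, phenotype, pcs):
    """(N - K - 1) * r^2 between PC-adjusted genotype and phenotype."""
    axes = orthonormalize(pcs)
    g = residualize(genotype, axes)
    y = residualize(phenotype, axes)
    sgg = sum(a * a for a in g)
    syy = sum(a * a for a in y)
    sgy = sum(a * b for a, b in zip(g, y))
    if sgg == 0.0 or syy == 0.0:
        return 0.0
    return (len(genotype) - len(axes) - 1) * sgy * sgy / (sgg * syy)


# Two populations that differ in allele frequency and in case fraction,
# with no association inside either population.
pop = [1.0] * 8 + [-1.0] * 8
geno = [2, 2, 2, 1, 1, 1, 2, 1, 0, 0, 0, 1, 1, 1, 0, 1]
pheno = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

uncorrected = adjusted_trend_stat(geno, pheno, [])
corrected = adjusted_trend_stat(geno, pheno, [pop])
```

In this toy data the uncorrected statistic is driven entirely by the population difference and drops to zero once the ancestry axis is removed.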

21
Matching strategies
  • Match new cases to pre-existing controls
  • Use all controls and correct for PCs
  • Select controls that cluster together with cases
    and correct for PCs
  • Select matching cases and controls based on the
    PCA, then perform a paired analysis
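
One way to picture the last two strategies is a greedy nearest-neighbour match in PC space. This is only an illustrative sketch: the slides do not say which matching algorithm was used, and the Euclidean distance, the greedy order and the optional `max_dist` cut-off are all assumptions.

```python
import math


def match_controls(case_pcs, control_pcs, max_dist=None):
    """Greedy 1:1 matching of cases to controls by distance in PC space.

    case_pcs, control_pcs: lists of PC coordinate tuples.
    max_dist: optional cut-off; pairs further apart are never matched.
    Returns a sorted list of (case_index, control_index) pairs.
    """
    candidates = sorted(
        (math.dist(c, k), i, j)
        for i, c in enumerate(case_pcs)
        for j, k in enumerate(control_pcs)
    )
    used_cases, used_controls, pairs = set(), set(), []
    for dist, i, j in candidates:
        if max_dist is not None and dist > max_dist:
            break  # all remaining candidates are even further apart
        if i in used_cases or j in used_controls:
            continue
        used_cases.add(i)
        used_controls.add(j)
        pairs.append((i, j))
    return sorted(pairs)


cases = [(0.0, 0.0), (1.0, 1.0)]
controls = [(5.0, 5.0), (0.1, 0.0), (0.9, 1.1), (-3.0, 2.0)]
pairs = match_controls(cases, controls)
```

Here each case is paired with its closest control, and the two distant controls are left unused. Cases that have no control within `max_dist` stay unmatched and would drop out of a paired analysis.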

22
Simulation study (E Genin, M-C Babron)
  • 6000 control samples from 13 European countries,
    all typed on Infinium 300k
  • Simulated case-control status under 3 scenarios
    with different genetic models and disease
    gradients within Europe
  • Disease incidences in each country modelled on
    Kidney, Melanoma and Breast cancers

23
Disease scenarios
  • Single disease susceptibility (DS) locus in each
    scenario
  • Model 1 - SNP close to HERC2
  • Model 2 - SNP close to LCT
  • Model 3 - SNP close to CBFA2T3

24
Random selection

25
Random selection

26
PCA based selection

27
PCA based selection

28
Conclusions
  • Stratification, even within Caucasian
    populations, is an important factor that must be
    accounted for
  • Multi-national cohorts can be combined with
    appropriate corrections for ancestry
  • Efficiency can be improved by matching of cases
    to available controls

29
Further work
  • Investigation of different matching strategies,
    particularly individual pairing of cases with
    suitable controls
  • Look at performance with admixture
  • Work on reducing the type I error rate