Genome-wide association studies (GWAS) - PowerPoint PPT Presentation

Provided by: Witte
1
Genome-wide association studies (GWAS)
Thomas Hoffmann
2
Outline for GWAS
  • Review / Overview
  • Design
  • Analysis
  • QC
  • Prostate cancer example
  • Imputation
  • Replication & meta-analysis
  • Advanced analysis intro (more next lecture)
  • Limitations & missing heritability
  • Gene/pathway tests
  • Polygenic models

4
Manolio et al., J Clin Invest 2008
5
(No Transcript)
6
Recap: Association studies
(guilt by association)
Hirschhorn & Daly, Nat Rev Genet 2005
7
GWAS microarray
Assays 0.7-5M SNPs (number keeps increasing)
Affymetrix, http://www.affymetrix.com
8
Genotype calls
Bad calls!
Good calls!
10
Genome-wide association studies (GWAS)
11
One- and two-stage GWA designs
[Figure: one-stage vs. two-stage designs, drawn as SNPs by samples; Stage 1 and Stage 2 shown, with n_samples and n_markers labeled]
12
[Figure: one-stage design vs. two-stage design, with the two-stage design analyzed either by replication-based analysis or by joint analysis; panels drawn as SNPs by samples, Stage 1 and Stage 2]
13
Multistage Designs
  • Joint analysis has more power than replication-based analysis
  • The p-value threshold in Stage 1 must be liberal
  • Lower cost, but do not gain power
  • CaTS power calculator: http://www.sph.umich.edu/csg/abecasis/CaTS/index.html
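To make "joint analysis has more power than replication" concrete, here is a minimal sketch of the joint statistic in the form given by Skol et al. (Nat Genet 2006), the group behind CaTS; the slide itself does not show the formula. It assumes signed z-statistics from each stage and a hypothetical parameter `pi_samples`, the fraction of all samples genotyped in Stage 1. A replication-based analysis would instead judge the SNP on the Stage 2 p-value alone.

```python
import math
from statistics import NormalDist

def joint_z(z1: float, z2: float, pi_samples: float) -> float:
    """Joint two-stage statistic (Skol et al. 2006 form, assumed here).

    z1, z2: signed z-statistics from Stage 1 and Stage 2.
    pi_samples: fraction of all samples genotyped in Stage 1 (hypothetical name).
    """
    if not 0.0 < pi_samples < 1.0:
        raise ValueError("pi_samples must be strictly between 0 and 1")
    return math.sqrt(pi_samples) * z1 + math.sqrt(1.0 - pi_samples) * z2

def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, computed from the lower tail for accuracy."""
    return 2.0 * NormalDist().cdf(-abs(z))

# Hypothetical SNP with moderate evidence in both stages, half the samples in Stage 1.
z1, z2 = 4.0, 3.0
zj = joint_z(z1, z2, 0.5)
print(f"replication-based: Stage 2 p = {two_sided_p(z2):.2e}")
print(f"joint analysis:    z = {zj:.2f}, p = {two_sided_p(zj):.2e}")
```

Only SNPs passing the (liberal) Stage 1 threshold are genotyped in Stage 2, so the joint statistic is computed for that subset and compared against a genome-wide threshold.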

14
Genome-wide Sequence Studies
  • Trade-off between number of samples, depth, and
    genomic coverage.

Sample size | Depth | MAF 0.5-1% | MAF 2-5%
1,000       | 20x   | perfect    | perfect
2,000       | 10x   | r² = 0.98  | r² = 0.995
4,000       | 5x    | r² = 0.90  | r² = 0.98

More later in Next generation sequencing (NGS)
lecture
Gonçalo Abecasis
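Each row of the table above spends the same total sequencing effort: sample size times depth is 20,000 in every design, so doubling the number of samples halves the depth. A small sketch of that arithmetic; `fraction_of_genome` is a hypothetical parameter standing in for the third axis, genomic coverage (e.g. whole genome vs. a targeted subset).

```python
def total_effort(n_samples: int, depth: float, fraction_of_genome: float = 1.0) -> float:
    """Sequencing effort in 'sample x depth' units over the targeted fraction of the genome."""
    return n_samples * depth * fraction_of_genome

def depth_for_budget(budget: float, n_samples: int, fraction_of_genome: float = 1.0) -> float:
    """Mean depth per sample achievable when total effort is fixed."""
    return budget / (n_samples * fraction_of_genome)

# (samples, depth) pairs from the table
designs = [(1000, 20), (2000, 10), (4000, 5)]
for n, d in designs:
    print(f"{n:>5} samples at {d:>2}x: {total_effort(n, d):,.0f} sample-x")
```

Holding the budget fixed, narrowing the target (smaller `fraction_of_genome`) buys depth or samples back; the table shows what lower depth costs in genotype accuracy for rarer variants.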
15
Near-term sequencing design choices
  • For example, choosing between:
  • Sequencing a few subjects with extreme phenotypes
  • e.g., 200 cases, 200 controls, 4x coverage, then
    follow-up in a larger population.
  • A 10M SNP chip based on the 1000 Genomes Project
  • 5K cases, 5K controls.
  • Which design will work best?
  • More later in Next generation sequencing (NGS)
    lecture

16
Design choices
  • GWAS Microarray
  • Only assay SNPs designed into array (0.7-5
    million)
  • Much cheaper (so many more subjects)
  • Genotypes currently more reliable
  • GWAS Sequencing
  • De novo discovery (particularly good for rare
    variants)
  • More expensive (but costs are falling), so far fewer
    subjects
  • Need much more extensive IT support
  • Lots of interesting interpretation problems
    (field rapidly evolving)

17
Design choices
  • Exome Microarray
  • Only assay SNPs designed into array
    (300K + custom), in exons only, that could
    affect protein-coding function
  • Cheapest (so many more subjects)
  • Genotypes currently more reliable (some question
    about rarest, but preliminary results good)
  • Exome Sequencing
  • De novo discovery (particularly good for rare
    variants); coverage of exons only
  • More expensive than microarrays, less expensive
    than GWAS sequencing
  • Need more extensive IT support
  • Lots of interesting interpretation problems

18
Size of study
Visscher, AJHG 2012
19
Size of study
Visscher, AJHG 2012
20
Biggest studies...
  • GWAS microarray: 100,000 people in the Kaiser
    RPGEH, still to be analyzed (Hoffmann et al.,
    Genomics, 2011a,b)
  • Sequencing: 1000 Genomes Project (though not
    disease-focused; low-coverage issues)
  • Exome sequencing: GO ESP (12,031 subjects; used for
    exome microarray design)

22
QC Steps
  • Filter SNPs and individuals
  • MAF, low call rates
  • Test for HWE among controls within ethnic
    groups. Use a conservative alpha level.
  • Check for relatedness: identity-by-state (IBS)
    calculations.
  • Check genotype-inferred gender against reported gender
  • Filter Mendelian inheritance errors (family-based, or
    potentially cryptic relatives, if the sample is large enough)
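A minimal sketch of the SNP filters listed above, assuming genotypes coded as 0/1/2 allele counts with `None` for a missing call. The thresholds are illustrative defaults, not values from the slides, and the HWE test here is the 1-df chi-square test; the exact test (Wigginton et al., 2005) is often preferred for rare SNPs. Run it separately within each ethnic group, as the slide recommends.

```python
import math

def hwe_chisq_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square test of Hardy-Weinberg equilibrium from genotype counts
    (one homozygote class, heterozygotes, the other homozygote class)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)            # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))      # upper tail of chi-square with 1 df

def snp_passes_qc(genotypes, is_control, min_maf=0.01, min_call_rate=0.95, min_hwe_p=1e-6):
    """Call-rate, MAF and HWE filters for one SNP.

    genotypes: 0/1/2 allele counts, None = missing call.
    is_control: parallel booleans; HWE is tested in controls only.
    Threshold defaults are hypothetical, chosen only for illustration.
    """
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) < min_call_rate:
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < min_maf:
        return False
    controls = [g for g, c in zip(genotypes, is_control) if c and g is not None]
    counts = (controls.count(0), controls.count(1), controls.count(2))
    return hwe_chisq_p(*counts) >= min_hwe_p
```

The very small HWE threshold reflects the "conservative alpha" advice: strong deviation in controls usually signals genotyping error, while mild deviation can be genuine.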

23
Check for relatedness, e.g., HapMap
  • Pemberton et al., AJHG 2010
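The identity-by-state calculation behind the relatedness check can be sketched as follows, assuming 0/1/2 genotype coding with `None` for missing calls. At each SNP two people share 2 - |g1 - g2| alleles IBS; pairs with unusually high average sharing flag duplicates or relatives. Formal relatedness estimates convert IBS into identity-by-descent estimates that account for allele frequencies (for example PLINK's `--genome`, which reports PI_HAT); this sketch does not.

```python
def ibs_proportion(g1, g2):
    """Mean identity-by-state between two people over SNPs called in both.

    Genotypes are 0/1/2 allele counts (None = missing). Each SNP contributes
    2 - |g1 - g2| shared alleles (0, 1 or 2); the result divides the total by
    2 per SNP, so 1.0 means identical genotypes at every shared SNP.
    """
    shared = [2 - abs(a - b) for a, b in zip(g1, g2) if a is not None and b is not None]
    if not shared:
        raise ValueError("no SNPs called in both individuals")
    return sum(shared) / (2 * len(shared))

print(ibs_proportion([0, 1, 2, None], [0, 1, 1, 2]))
```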

24
GWAS analysis
  • Most common approach: look at each SNP
    one at a time.
  • Possibly add in multi-marker information.
  • Further investigate / report top SNPs only.
  • Or backwards replication

P-values
25
GWAS analysis
  • Additive coding of the SNP (0, 1, 2 copies of an allele) is most common; the SNP is just a
    covariate in a regression framework
  • Dichotomous phenotype: logistic regression
  • Continuous phenotype: linear regression
  • Correct for multiple comparisons
  • e.g., Bonferroni: 1 million tests gives α = 0.05/10^6 = 5x10^-8
  • more next time
  • Adjust for potential population stratification
  • principal components (PCs), on best-performing
    SNPs
  • software usually does LD filter (e.g., Eigensoft)
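A sketch of the one-SNP-at-a-time analysis for a case/control trait. Instead of fitting a full logistic regression it uses the 1-df score test for the additive (0/1/2) SNP term, which without covariates equals the Cochran-Armitage trend test, chi2 = N r², with r the correlation between genotype and case status. Adjusting for PCs or other covariates needs the full regression; this swapped-in shortcut does not do that.

```python
import math

def additive_score_test(genotypes, status):
    """1-df score test of an additive SNP effect on a binary trait (no covariates).

    genotypes: allele counts 0/1/2; status: 1 = case, 0 = control.
    Returns (chi2, two-sided p) with chi2 = N * r^2.
    """
    n = len(genotypes)
    gbar = sum(genotypes) / n
    ybar = sum(status) / n
    sxy = sum((g - gbar) * (y - ybar) for g, y in zip(genotypes, status))
    sxx = sum((g - gbar) ** 2 for g in genotypes)
    syy = sum((y - ybar) ** 2 for y in status)
    if sxx == 0 or syy == 0:                   # monomorphic SNP or a single phenotype class
        return 0.0, 1.0
    chi2 = n * sxy * sxy / (sxx * syy)
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def bonferroni_threshold(alpha=0.05, n_tests=1_000_000):
    """Per-SNP significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests

chi2, p = additive_score_test([0, 0, 1, 2, 2, 1], [0, 0, 0, 1, 1, 1])
print(chi2, p, p < bonferroni_threshold())
```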

26
Adjusting for PCs (recap)
Balding, Nature Reviews Genetics 2010
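A toy sketch of where the PCs come from, assuming a small genotype matrix (rows = individuals, columns = SNPs, 0/1/2). Each SNP is centred by 2p and scaled by sqrt(2p(1-p)), and the leading eigenvector of the individuals-by-individuals covariance matrix is found by power iteration. Tools such as Eigensoft's smartpca add LD filtering, outlier removal and many PCs; none of that is done here. The resulting PC scores would enter the regression as covariates.

```python
import math
import random

def standardize(genotypes):
    """Centre each SNP by 2p and scale by sqrt(2p(1-p)); drop monomorphic SNPs."""
    n = len(genotypes)
    columns = []
    for col in zip(*genotypes):
        p = sum(col) / (2 * n)
        if 0.0 < p < 1.0:
            sd = math.sqrt(2 * p * (1 - p))
            columns.append([(g - 2 * p) / sd for g in col])
    if not columns:
        raise ValueError("no polymorphic SNPs")
    return [list(row) for row in zip(*columns)]   # back to individuals x SNPs

def top_pc(genotypes, iterations=100, seed=0):
    """Leading eigenvector (unit length) of the genetic covariance matrix."""
    x = standardize(genotypes)
    n, m = len(x), len(x[0])
    cov = [[sum(a * b for a, b in zip(x[i], x[k])) / m for k in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iterations):                   # power iteration
        w = [sum(cov[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return v

# Two hypothetical subpopulations with opposite allele frequencies
print(top_pc([[0, 0, 2, 2], [0, 0, 2, 2], [2, 2, 0, 0], [2, 2, 0, 0]]))
```

The sign of a PC is arbitrary; only the separation between groups matters.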
27
Adjusting for PCs
  • Li et al., Science 2008

28
Adjusting for PCs
  • Razib, Current Biology 2008

29
Adjusting for PCs
  • Wang, BMC Proc 2009

30
QQ-plots and PC adjustment
  • Wang, BMC Proc 2009

31
Quantile-quantile (QQ) plot
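A sketch of the numbers behind a QQ plot, assuming a list of per-SNP two-sided p-values. `qq_points` pairs expected and observed -log10 p values; `genomic_inflation` computes the commonly used λ_GC (median observed 1-df chi-square divided by its null median, about 0.455). Under the null λ_GC is near 1; values well above 1 across the genome suggest stratification, which PC adjustment should reduce.

```python
import math
from statistics import NormalDist, median

_ND = NormalDist()

def chi2_from_p(p):
    """1-df chi-square statistic matching a two-sided p-value."""
    return _ND.inv_cdf(p / 2) ** 2

def genomic_inflation(pvalues):
    """lambda_GC = median observed chi-square / median of chi-square(1 df)."""
    expected_median = _ND.inv_cdf(0.75) ** 2      # about 0.455
    return median(chi2_from_p(p) for p in pvalues) / expected_median

def qq_points(pvalues):
    """(expected, observed) -log10 p pairs, both sorted ascending."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i + 0.5) / n) for i in range(n))
    return list(zip(expected, observed))

null_p = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_p), 3))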
32
Example GWAS of Prostate Cancer
[Figure: association results by chromosome]
http://cgems.cancer.gov
Multiple prostate cancer loci on 8q24
Witte, Nat Genet 2007
33
Prostate Cancer Replications
Chr. region | SNP | Alleles | Risk allele freq. (controls) | Risk allele freq. (cases) | OR | p-value | Nearby genes / function
2p15 | rs721048 | G/A | 0.19 | 0.21 | 1.15 | 7.7x10^-9 | EHBP1: endocytic trafficking
3p12 | rs2660753 | C/T | 0.10 | 0.12 | 1.30 | 2.7x10^-8 | Intergenic
6q25 | rs9364554 | C/T | 0.29 | 0.33 | 1.21 | 5.5x10^-10 | SLC22A3: drugs and toxins
7q21 | rs6465657 | T/C | 0.46 | 0.50 | 1.19 | 1.1x10^-9 | LMTK2: endosomal trafficking
8q24 (2) | rs16901979 | C/A | 0.04 | 0.06 | 1.52 | 1.1x10^-12 | Intergenic
8q24 (3) | rs6983267 | T/G | 0.50 | 0.56 | 1.25 | 9.4x10^-13 | Intergenic
8q24 (1) | rs1447295 | C/A | 0.10 | 0.14 | 1.42 | 6.4x10^-18 | Intergenic
10q11 | rs10993994 | C/T | 0.38 | 0.46 | 1.38 | 8.7x10^-29 | MSMB: suppressor properties
10q26 | rs4962416 | T/C | 0.27 | 0.32 | 1.18 | 2.7x10^-8 | CTBP2: antiapoptotic activity
11q13 | rs7931342 | T/G | 0.51 | 0.56 | 1.21 | 1.7x10^-12 | Intergenic
17q12 | rs4430796 | G/A | 0.49 | 0.55 | 1.22 | 1.4x10^-11 | HNF1B: suppressor properties
17q24 | rs1859962 | T/G | 0.46 | 0.51 | 1.20 | 2.5x10^-10 | Intergenic
19q13 | rs2735839 | A/G | 0.83 | 0.87 | 1.37 | 1.5x10^-18 | KLK2/KLK3: PSA
Xp11 | rs5945619 | T/C | 0.36 | 0.41 | 1.29 | 1.5x10^-9 | NUDT10, NUDT11: apoptosis
Witte, Nat Rev Genet 2009
Modest ORs
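How modest are these ORs relative to the frequency differences? A crude allelic odds ratio can be computed directly from the case and control frequencies in the replication table. It roughly reproduces the reported ORs; exact agreement is not expected, because the frequencies are rounded to two decimals and the reported ORs come from the studies' own analyses.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Crude allelic odds ratio from risk-allele frequencies in cases and controls."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))

# (SNP, control frequency, case frequency, reported OR) taken from the table
rows = [("rs10993994", 0.38, 0.46, 1.38),
        ("rs2735839", 0.83, 0.87, 1.37),
        ("rs721048", 0.19, 0.21, 1.15)]
for snp, f_ctrl, f_case, reported in rows:
    print(f"{snp}: crude OR {allelic_odds_ratio(f_case, f_ctrl):.2f} (reported {reported})")
```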
35
SNPs Missed in Replication?
[Same replication table as slide 33]
24,223 smallest P-value!
Witte, Nat Rev Genet, 2009
36
Population Attributable Risks for GWAS
[Figure: population attributable risks for GWAS loci; labeled comparisons: smoking & lung cancer, BRCA1 & breast cancer]
Jorgenson & Witte, 2009
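The slide does not state how Jorgenson & Witte computed the population attributable risks, so the following is only an illustrative calculation under stated assumptions: Hardy-Weinberg proportions, a multiplicative per-allele effect, and a rare disease so that the OR approximates the relative risk. The mean relative risk relative to non-carriers is then (1 - p + p·OR)² and PAR = 1 - 1/(mean relative risk). It shows why a common allele with a modest OR can still carry a non-trivial PAR.

```python
def par_multiplicative(p: float, odds_ratio: float) -> float:
    """Population attributable risk for a risk allele with frequency p.

    Assumes HWE, a multiplicative per-allele effect, and OR ~ relative risk.
    Genotype relative risks 1, OR, OR^2 with frequencies (1-p)^2, 2p(1-p), p^2
    average to (1 - p + p*OR)^2.
    """
    mean_rr = (1 - p + p * odds_ratio) ** 2
    return 1 - 1 / mean_rr

# rs10993994 control frequency and OR from the replication table
print(f"{par_multiplicative(0.38, 1.38):.1%}")
```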
37
Imputation of SNP Genotypes
  • Combine data from different platforms (e.g., Affymetrix &
    Illumina) (for replication / meta-analysis).
  • Estimate unmeasured or missing genotypes.
  • Based on measured SNPs and external info (e.g.,
    haplotype structure of HapMap).
  • Increase GWAS power (impute and analyze all),
    e.g., sick sinus syndrome: the most significant SNP was a
    1000 Genomes imputed SNP (Holm et al., Nature
    Genetics, 2011)
  • HapMap as reference, now 1000 Genomes Project?
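A deliberately simplified sketch of the idea behind imputation: find the reference haplotype that best matches a study haplotype at the typed SNPs and copy its alleles into the untyped positions. This nearest-haplotype rule is a swapped-in toy; tools such as IMPUTE2, BEAGLE and MaCH use hidden Markov models that treat each haplotype as a mosaic of reference haplotypes and output genotype probabilities. The sketch assumes already-phased 0/1 haplotypes.

```python
def impute_haplotype(study, reference_panel):
    """Toy nearest-haplotype imputation.

    study: one phased haplotype as a list of 0/1 alleles, None where untyped.
    reference_panel: complete 0/1 haplotypes over the same SNPs.
    The reference haplotype with the fewest mismatches at typed SNPs donates
    its alleles at the untyped SNPs; ties go to the first such haplotype.
    """
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref):
        return sum(ref[i] != study[i] for i in typed)

    donor = min(reference_panel, key=mismatches)
    return [donor[i] if allele is None else allele for i, allele in enumerate(study)]

panel = [[0, 0, 0, 0, 0], [1, 1, 0, 1, 1], [0, 1, 1, 0, 1]]
print(impute_haplotype([1, None, 0, None, 1], panel))
```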

38
Imputation Example
[Figure panels: study sample; HapMap / 1000 Genomes reference]
Gonçalo Abecasis
  • http://www.shapeit.fr/, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html
  • http://faculty.washington.edu/browning/beagle/beagle.html
  • http://www.sph.umich.edu/csg/abecasis/MACH/download/

39
Identify Match with Reference
Gonçalo Abecasis

40
Phase chromosomes, impute missing genotypes
Gonçalo Abecasis

41
Imputation Application
TCF7L2 gene region, type 2 diabetes (T2D), from the WTCCC data
[Figure: association results by chromosomal position; observed genotypes in black, imputed genotypes in red]
Marchini et al., Nature Genetics 2007, http://www.stats.ox.ac.uk/marchini/software
42
Replication
  • To replicate:
  • Association test in the replication sample
    significant at the 0.05 alpha level
  • Same mode of inheritance
  • Same direction
  • Sufficient sample size for replication
  • Non-replication is not necessarily a false positive
  • LD structures, different populations (e.g.,
    flip-flop)
  • covariates, phenotype definition, underpowered
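Two of the replication criteria (significance at 0.05 and same direction) can be checked mechanically once both studies report ORs for the same effect allele. A minimal sketch, assuming a biallelic SNP reported on the same strand; if the replication study reports the other allele, its OR is inverted first. Mode of inheritance and adequate sample size are not checked. A genuine flip-flop (opposite direction due to different LD in another population) would correctly show up here as a non-replication.

```python
def replicates(disc_or, disc_effect_allele, rep_or, rep_effect_allele, rep_p, alpha=0.05):
    """True if the replication is significant at alpha and in the same direction.

    ORs refer to the stated effect alleles; a replication OR reported for the
    other allele of a biallelic SNP is inverted before comparing directions.
    """
    if rep_effect_allele != disc_effect_allele:
        rep_or = 1.0 / rep_or
    same_direction = (disc_or > 1) == (rep_or > 1)
    return rep_p < alpha and same_direction

print(replicates(1.2, "A", 0.83, "G", 0.01))
```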

43
Meta-analysis
  • Combine multiple studies to increase power
  • Either combine p-values (Fisher's test),
  • or z-scores (better: they can account for the
    direction of effect and sample size)
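A sketch contrasting the two ways of combining studies. Fisher's method uses X = -2 Σ ln p, chi-square with 2k df under the null (closed-form tail for even df). The weighted Z approach (Stouffer) converts each two-sided p-value to a signed z using the direction of effect for a common allele and weights by sqrt(N), one common choice of weights.

```python
import math
from statistics import NormalDist

_ND = NormalDist()

def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square with 2k df under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)     # X / 2
    # Chi-square survival function for even df 2k: exp(-X/2) * sum_{i<k} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def weighted_z_meta(pvalues, directions, sample_sizes):
    """Sample-size-weighted Z meta-analysis of two-sided p-values.

    directions: +1 / -1 sign of each study's effect for the same allele.
    Returns (combined z, two-sided p).
    """
    zs = [d * -_ND.inv_cdf(p / 2) for p, d in zip(pvalues, directions)]
    ws = [math.sqrt(n) for n in sample_sizes]
    z = sum(w * zi for w, zi in zip(ws, zs)) / math.sqrt(sum(w * w for w in ws))
    return z, 2 * _ND.cdf(-abs(z))

print(fisher_combined_p([0.05, 0.05]))
print(weighted_z_meta([0.05, 0.05], [1, -1], [1000, 1000]))
```

Two studies with p = 0.05 but opposite directions give a Fisher p-value of about 0.017, whereas the weighted Z is 0 (p = 1). That is the sense in which combining z-scores is better.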

44
(Meta-analysis) Example GWAS of Prostate Cancer
[Figure: association results by chromosome]
http://cgems.cancer.gov
Multiple prostate cancer loci on 8q24
Witte, Nat Genet 2007
45
Replication & meta-analysis
46
Meta-analysis
48
Limitations of GWAS
Example: AUC (%) for breast cancer risk (Wacholder et
al., NEJM 2010)
  • Gail model (first-degree relatives with breast cancer, age at
    menarche, age at first live birth, number of
    previous biopsies), plus age, study, entry
    year: 58
  • SNPs: 58.9
  • Combined: 61.8
  • Not very predictive

Witte, Nat Rev Genet 2009
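To read the AUC values above: the AUC equals the probability that a randomly chosen case has a higher risk score than a randomly chosen control (the Mann-Whitney statistic), so 0.5 is chance and 0.58-0.62 means a case outranks a control only about 60% of the time. A minimal sketch with hypothetical score lists:

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Probability that a random case scores higher than a random control,
    with ties counted as one half.
    """
    wins = 0.0
    for a in case_scores:
        for b in control_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

print(auc([0.3, 0.5, 0.7], [0.2, 0.4, 0.6]))
```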
49
Limitations of GWAS
  • Not very predictive
  • Explain little heritability
  • Focus on common variation
  • Many associated variants are not causal

50
Where's the heritability?
Visscher, AJHG 2011
51
Where's the heritability?
Common disease, rare variant (CDRV) hypothesis:
diseases due to multiple rare variants with
intermediate penetrances (allelic heterogeneity).
Many more of these?
See NEJM, April 30, 2009
McCarthy et al., 2008
52
Will GWAS results explain more heritability?
  • Possibly, if
  • Causal SNPs not yet detected due to power /
    practical issues (e.g., not yet included in
    replication studies).
  • Causal SNPs have stronger effects than the associated markers
  • An associated SNP may only serve as a marker for
    multiple different causal SNPs.

53
Gene/pathway-based tests
  • Various ways of collapsing the genotype
    information in multiple genes
  • Fewer tests, so less multiple-comparison adjustment
  • $\operatorname{logit} \Pr(y = 1 \mid x, c) = \alpha + \beta^\top x + \gamma^\top c$
  • e.g., $\alpha + \beta_1 x_1 + \dots + \beta_{12} x_{12}$ + PCs + other covariates
  • y: disease status
  • x: a vector of genotypes (e.g., a gene, or a
    pathway)
  • c: a vector of covariates
  • $H_0: \beta = 0$

54
Gene/pathway-based tests
  • $\operatorname{logit} \Pr(y = 1 \mid x, c) = \alpha + \beta^\top x + \gamma^\top c$
  • e.g., $\alpha + \beta_1 x_1 + \dots + \beta_{12} x_{12} + \gamma_1 \mathrm{PC}_1 + \dots + \gamma_4 \mathrm{PC}_4 + \dots$
  • One example: kernel machine (question from last
    time)
  • Simplest case of linear kernel reduces to
    linear/logistic regression (model above)
  • More complicated functions of genotypes can be
    tested, e.g., interactions, etc.
  • Gory details: variance-component score test for
    h(x) in the paper (Wu et al., AJHG 2010)
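A sketch of the kernel-machine score statistic with a linear kernel, in the commonly cited form Q = (y - μ̂₀)ᵀ K (y - μ̂₀) with K = G Gᵀ, where G holds the genotypes for the SNPs in a gene or pathway. Two simplifications relative to Wu et al.: no covariates (μ̂₀ is just the case fraction), and a permutation p-value swapped in for the paper's analytic approximation. With K = G Gᵀ, Q is the sum of squared per-SNP score contributions, which is how the gene-level test "collapses" many SNPs into one.

```python
import random

def linear_kernel_q(genotypes, status):
    """Score statistic Q = r^T K r with linear kernel K = G G^T.

    genotypes: one list of 0/1/2 allele counts per person over the SNP set;
    status: 1 = case, 0 = control; r = y - ybar (intercept-only null model).
    Since K = G G^T, Q equals the squared length of G^T r.
    """
    ybar = sum(status) / len(status)
    resid = [y - ybar for y in status]
    n_snps = len(genotypes[0])
    u = [sum(r * g[j] for r, g in zip(resid, genotypes)) for j in range(n_snps)]
    return sum(v * v for v in u)

def permutation_p(genotypes, status, n_perm=1000, seed=1):
    """Permutation p-value for Q (stands in for the paper's analytic p-value)."""
    rng = random.Random(seed)
    q_obs = linear_kernel_q(genotypes, status)
    shuffled = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if linear_kernel_q(genotypes, shuffled) >= q_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```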

55
  • Pathways: how to define them?
  • Many websites / companies provide dynamic
    graphic models of molecular and biochemical
    pathways.
  • Example: BioCarta, http://www.biocarta.com/
  • May be interested in potential joint and/or
    interaction effects of multiple genes in one
    pathway.

56
Polygenic Models
  • Do many weak associations combine to affect risk?
  • Score model (use all GWAS SNPs):
    $x_j = \sum_{i=1}^{m} \ln(\mathrm{OR}_i)\,\mathrm{SNP}_{ij}$
  • where
  • ln(OR_i) = score for SNP i from the discovery
    sample
  • SNP_ij = number of alleles (0, 1, 2) for SNP i, person j,
    in the validation sample.
  • Large number of SNPs (m)
  • Is x_j associated with disease?

ISC / Purcell et al. Nature 2009
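The score model can be sketched directly: weight each validation person's allele counts by ln(OR_i) from the discovery sample and sum over SNPs, then test whether the score is associated with disease. Assumptions: `p_threshold` is a hypothetical optional parameter (the default keeps every SNP, as the slide's "use all GWAS SNPs" suggests), and the association test is a 1-df score test (N r²) rather than the logistic regression a full analysis would use with covariates.

```python
import math

def polygenic_scores(log_or, discovery_p, validation_genotypes, p_threshold=1.0):
    """x_j = sum_i ln(OR_i) * SNP_ij over SNPs with discovery p <= p_threshold.

    log_or, discovery_p: per-SNP values from the discovery sample.
    validation_genotypes: one list of 0/1/2 allele counts per validation person.
    """
    keep = [i for i, p in enumerate(discovery_p) if p <= p_threshold]
    return [sum(log_or[i] * g[i] for i in keep) for g in validation_genotypes]

def score_association(scores, status):
    """1-df score test (N * r^2) of the score against case status (1/0)."""
    n = len(scores)
    sbar, ybar = sum(scores) / n, sum(status) / n
    sxy = sum((s - sbar) * (y - ybar) for s, y in zip(scores, status))
    sxx = sum((s - sbar) ** 2 for s in scores)
    syy = sum((y - ybar) ** 2 for y in status)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0
    chi2 = n * sxy * sxy / (sxx * syy)
    return chi2, math.erfc(math.sqrt(chi2 / 2))

scores = polygenic_scores([math.log(2), math.log(0.5), 0.1], [0.01, 0.02, 0.9],
                          [[2, 0, 1], [0, 2, 1], [1, 1, 0]], p_threshold=0.5)
print(scores)
```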
57
Application of Model
Purcell / ISC et al. Nature 2009
58
Application to CGEMs PCa GWAS
  • 1,172 cases, 1,157 controls from the PLCO Trial
  • Oversampled more aggressive cases.
  • Illumina 550K array.
  • PCa stratified by disease aggressiveness.
  • Split into halves, with resampling:
  • one half as discovery sample
  • the other as validation.
  • LD filter at r² = 0.5.

Witte & Hoffmann, OMICS 2010
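A sketch of an LD filter like the one applied before building the score, assuming SNP genotype vectors (0/1/2 across the same people) listed in priority order, e.g. by discovery p-value. LD is approximated here by the squared genotype correlation, and a SNP is dropped if its r² with any already-kept SNP exceeds the threshold; the direction of the r² = 0.5 cutoff is an assumption, since the slide only gives the value.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two SNPs' genotype vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0
    return sab * sab / (saa * sbb)

def ld_prune(snps, r2_max=0.5):
    """Greedy pruning: keep a SNP only if r^2 <= r2_max with every SNP kept so far.

    snps: genotype vectors in priority order. Returns indices of kept SNPs.
    """
    kept = []
    for idx, s in enumerate(snps):
        if all(genotype_r2(s, snps[k]) <= r2_max for k in kept):
            kept.append(idx)
    return kept

print(ld_prune([[0, 1, 2, 0, 1], [0, 1, 2, 0, 1], [1, 2, 1, 0, 0]]))
```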
59
Results for Prostate Cancer
60
Common Polygenic Model for Prostate and Breast Cancer?
  • CGEMs GWAS data on prostate and breast cancer.
  • Use one cancer as discovery sample, the other
    as validation.

Nat Rev Cancer 2010;10:205-212
61
Results for PCa BrCa
62
Complex diseases
[Figure: causal diagram linking physical activity, genetic susceptibility, obesity, hyperlipidemia, diet, diabetes, vulnerable plaques, hypertension, atherosclerosis, and MI]
Complex diseases: many causes, many causal
pathways!
63
Moving Beyond the Genome
  • Transcriptome: all messenger RNA molecules (transcripts)
  • Proteome: all proteins in a cell or organism
  • Metabolome: all metabolites in a biological organism
    (end products of its gene expression).