Title: HapMap:
1HapMap application in the design and
interpretation of association studies
Mark J. Daly, PhD, on behalf of The International HapMap
Consortium
2Goals of this segment
- Briefly summarize HapMap design and current
status
- Discuss the application of HapMap to all aspects
of association study design, analysis and
interpretation
3HapMap Project
A freely-available public resource to increase
the power and efficiency of genetic association
studies of medical traits
- High-density SNP genotyping across the genome
provides information about
- SNP validation, frequency, assay conditions
- correlation structure of alleles in the genome
All data is freely available on the web for
application in study design and analyses as
researchers see fit
4HapMap Samples
- 90 Yoruba individuals (30 parent-parent-offspring
trios) from Ibadan, Nigeria (YRI)
- 90 individuals (30 trios) of European descent
from Utah (CEU)
- 45 Han Chinese individuals from Beijing (CHB)
- 45 Japanese individuals from Tokyo (JPT)
5HapMap progress
PHASE I completed, described in Nature paper
1,000,000 SNPs successfully typed in all 270 HapMap samples
ENCODE variation reference resource available
PHASE II data generation complete, data released this past Monday
>3,500,000 SNPs typed in total !!!
6ENCODE-HAPMAP variation project
- Ten typical 500kb regions
- 48 samples sequenced
- All discovered SNPs (and any others in dbSNP)
typed in all 270 HapMap samples
- Current data set: 1 SNP every 279 bp
A much more complete variation resource by
which the genome-wide map can be evaluated
7Completeness of dbSNP
Vast majority of common SNPs are contained in or
highly correlated with a SNP in dbSNP
8Recombination hotspots are widespread and account
for LD structure
7q21
9Utility of LD in association study
- If I'm a causal variant, what is relevant to my
detection in association studies is how well
correlated I am with one of the SNPs or
haplotypes examined in the study.
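As a rough illustration of what "how well correlated" means here, the usual measure is r2 between the causal variant and a typed SNP. The sketch below computes it from phased haplotypes coded 0/1. The function name `r_squared` and the toy haplotypes are assumptions for illustration, not HapMap data.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation (r2) between two biallelic SNPs.

    hap_a, hap_b: equal-length lists of 0/1 alleles, one entry per
    phased chromosome, in the same chromosome order.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("need two non-empty haplotype lists of equal length")
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # a monomorphic SNP carries no LD information
    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    return d * d / denom


# Toy example (invented data): one causal variant and two typed SNPs.
causal = [1, 1, 0, 0, 1, 0, 0, 0]
typed_good = [1, 1, 0, 0, 1, 0, 0, 0]   # always travels with the causal allele
typed_poor = [1, 0, 1, 0, 1, 0, 1, 0]   # only loosely related to it

print("r2 with well-correlated SNP:", r_squared(causal, typed_good))
print("r2 with poorly correlated SNP:", r_squared(causal, typed_poor))
```

A causal variant with high r2 to a typed SNP is detected almost as if it had been typed itself; one with low r2 needs a much larger sample for the same power.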
10Coverage of Phase II HapMap (estimated from
ENCODE data)
Panel     r2 > 0.8 (%)   max r2
YRI       81             0.90
CEU       94             0.97
CHB+JPT   94             0.97
From Table 6, A Haplotype Map of the Human
Genome, Nature
11Coverage of Phase II HapMap (estimated from
ENCODE data)
Panel     r2 > 0.8 (%)   max r2
YRI       81             0.90
CEU       94             0.97
CHB+JPT   94             0.97
r2 > 0.8: percentage of deeply ascertained common variants
highly correlated with a HapMap SNP
From Table 6, A Haplotype Map of the Human
Genome, Nature
12Coverage of Phase II HapMap (estimated from
ENCODE data)
Panel     r2 > 0.8 (%)   max r2
YRI       81             0.90
CEU       94             0.97
CHB+JPT   94             0.97
max r2: average maximum correlation between a
deeply ascertained variant and a neighboring
HapMap SNP
From Table 6, A Haplotype Map of the Human
Genome, Nature
13Coverage of Phase II HapMap (estimated from
ENCODE data)
Panel     r2 > 0.8 (%)   max r2
YRI       81             0.90
CEU       94             0.97
CHB+JPT   94             0.97
Vast majority of common variation (MAF > 0.05)
captured by Phase II HapMap
14Applying the HapMap
- Study design - tagging
- Study coverage evaluation
- Study analysis - improving association testing
- Study interpretation
- Comparison of multiple studies
- Connection to genes/genomic features
- Integration with expression and other functional
data
- Other uses of HapMap data
- Admixture, LOH, selection
15Tagging from HapMap
- Since HapMap describes the majority of common
variation in the genome, choosing non-redundant
sets of SNPs from HapMap offers considerable
efficiency without power loss in association
studies
17Pairwise tagging
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
After Carlson et al. (2004) AJHG 74:106
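Pairwise tagging can be sketched as a greedy set cover: repeatedly pick the SNP that captures the most not-yet-captured SNPs at a chosen r2 threshold. The code below is a minimal sketch under that assumption. It is not the exact procedure used by Tagger or Haploview, and the function names and toy haplotypes are invented for illustration.

```python
def r_squared(hap_a, hap_b):
    """r2 between two biallelic SNPs given phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    return (p_ab - p_a * p_b) ** 2 / denom


def greedy_pairwise_tags(haplotypes, threshold=0.8):
    """Return a list of tag SNP names such that every SNP has
    r2 >= threshold with at least one tag (or is itself a tag).

    haplotypes: dict mapping SNP name -> list of 0/1 alleles.
    """
    names = list(haplotypes)
    # For each candidate tag, the set of SNPs it would capture (itself included).
    captures = {
        s: {t for t in names
            if s == t or r_squared(haplotypes[s], haplotypes[t]) >= threshold}
        for s in names
    }
    uncaptured = set(names)
    tags = []
    while uncaptured:
        # Pick the SNP that captures the most still-uncaptured SNPs;
        # ties go to the SNP listed first.
        best = max(names, key=lambda s: len(captures[s] & uncaptured))
        tags.append(best)
        uncaptured -= captures[best]
    return tags


# Toy region (invented data): three pairs of perfectly correlated SNPs.
region = {
    "snp1": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp2": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp3": [1, 1, 0, 0, 1, 1, 0, 0],
    "snp4": [1, 0, 1, 0, 1, 0, 1, 0],
    "snp5": [1, 1, 0, 0, 1, 1, 0, 0],
    "snp6": [1, 0, 1, 0, 1, 0, 1, 0],
}
tags = greedy_pairwise_tags(region, threshold=0.8)
print(len(tags), "tags:", tags)
```

In this toy region three tags stand in for six SNPs, one tag per group of mutually correlated SNPs. Which member of a group is chosen as the tag is arbitrary when the correlation is perfect.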
18Pairwise Tagging Efficiency
Table 7: Number of selected tag SNPs to capture all observed common SNPs in the Phase I HapMap for the three analysis panels using pairwise tagging at different r2 thresholds
                     YRI       CEU       CHB+JPT
Pairwise r2 ≥ 0.5    324,865   178,501   159,029
Pairwise r2 ≥ 0.8    474,409   293,835   259,779
Pairwise r2 = 1      604,886   447,579   434,476
Tag SNPs were picked to capture common SNPs in
release 16c.1 for every 7,000 SNP bin using
Haploview.
Tagging Phase I HapMap offers 2-5x gains in
efficiency
19Use of haplotypes can improve genotyping
efficiency
Multi-marker tagging
Tags: SNP 1, SNP 3 (2 in total)
Test for association: SNP 1 (captures 1+2), SNP 3 (captures 3+5), AG haplotype (captures SNP 4+6)
Pairwise tagging, for comparison
Tags: SNP 1, SNP 3, SNP 6 (3 in total)
Test for association: SNP 1, SNP 3, SNP 6
Tags in multi-marker tests should be conditional
on significance of LD in order to avoid
overfitting
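To illustrate how a haplotype of two typed SNPs can stand in for an untyped one, the sketch below scores a two-SNP haplotype as a single 0/1 "allele" and measures its r2 with the untyped SNP. The allele letters, the function names and the eight toy chromosomes are assumptions made for illustration only.

```python
def r_squared(x, y):
    """r2 between two 0/1 indicator vectors over the same chromosomes."""
    n = len(x)
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    if denom == 0:
        return 0.0
    return (p_xy - p_x * p_y) ** 2 / denom


def allele_indicator(column, allele):
    """1 for chromosomes carrying `allele` at this SNP, else 0."""
    return [1 if a == allele else 0 for a in column]


def haplotype_indicator(columns, alleles):
    """1 for chromosomes carrying every listed allele at the listed SNPs."""
    n = len(columns[0])
    return [
        1 if all(col[i] == al for col, al in zip(columns, alleles)) else 0
        for i in range(n)
    ]


# Toy phased chromosomes (invented): SNP 1 and SNP 3 are typed, SNP 6 is not.
snp1 = list("AAAACCCC")
snp3 = list("GGTTGGTT")
snp6 = list("TTCCCCCC")   # the T allele occurs only on A-G chromosomes

target = allele_indicator(snp6, "T")
r2_snp1 = r_squared(allele_indicator(snp1, "A"), target)
r2_snp3 = r_squared(allele_indicator(snp3, "G"), target)
r2_hap = r_squared(haplotype_indicator([snp1, snp3], ["A", "G"]), target)

print("SNP 1 alone:", r2_snp1)
print("SNP 3 alone:", r2_snp3)
print("AG haplotype:", r2_hap)
```

With so few chromosomes a perfect r2 can arise by chance, which is why a multi-marker predictor should only be accepted when the underlying LD is statistically significant.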
20Efficiency and power
[Figure: relative power (%) versus average marker density (per kb), comparing tag SNPs with random SNPs]
300,000 tag SNPs needed to cover
common variation in whole genome in CEU
P.I.W. de Bakker et al. (2005) Nat Genet Advance
Online Publication 23 Oct 2005
21How to pick tag SNPs?
- What is the genetic hypothesis? Which variants do
you want to test for a role in disease?
- functional annotation (coding SNPs)
- allele frequency (HapMap ascertainment)
- previously implicated associations
- Go to http://www.hapmap.org: DCC supported
interactive tagging
- Export HapMap data into tools such as Tagger,
Haploview (www.broad.mit.edu/mpg)
22Will tag SNPs picked from HapMap apply to other
population samples?
[Figure: three panels, each labeled CEU, for Whites from Los Angeles, CA; Botnia, Finland; and Utah residents with European ancestry (CEPH)]
Population differences add very little
inefficiency
Platform presentation: Paul de Bakker (223, Sat 9.30)
23Applying the HapMap
- Study design - tagging
- Study coverage evaluation
- Study analysis - improving association testing
- Study interpretation
- Comparison of multiple studies
- Connection to genes/genomic features
- Integration with expression and other functional
data
- Other uses of HapMap data
- Admixture, LOH, selection
24Genome-wide association coverage
- If genome-wide products are typed on the HapMap
sample panel, the SNPs on HapMap not included in
the product provide an evaluation of the coverage
of the product
- ENCODE (deep ascertainment)
- Phase II (dense, genome-wide)
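The coverage evaluation described above can be sketched as follows: for every reference SNP that is not on the product, find its best r2 with any product SNP, then report the fraction above a threshold and the mean of those maxima. This is a simplified sketch. Real evaluations restrict the search to nearby SNPs, and the names and toy data here are assumptions.

```python
def r_squared(hap_a, hap_b):
    """r2 between two biallelic SNPs given phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0
    return (p_ab - p_a * p_b) ** 2 / denom


def product_coverage(haplotypes, product_snps, threshold=0.8):
    """Evaluate a fixed marker set against reference haplotypes.

    haplotypes: dict SNP name -> list of 0/1 alleles (reference panel).
    product_snps: names of the SNPs present on the genotyping product.
    Returns (fraction of untyped SNPs with max r2 >= threshold,
             mean max r2 over untyped SNPs).
    """
    on_product = set(product_snps)
    untyped = [s for s in haplotypes if s not in on_product]
    if not untyped or not on_product:
        raise ValueError("need at least one product SNP and one untyped SNP")
    max_r2 = {
        s: max(r_squared(haplotypes[s], haplotypes[p]) for p in product_snps)
        for s in untyped
    }
    fraction = sum(1 for v in max_r2.values() if v >= threshold) / len(untyped)
    mean_max = sum(max_r2.values()) / len(untyped)
    return fraction, mean_max


# Toy reference panel (invented data); the product types snp1 and snp3 only.
reference = {
    "snp1": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp2": [1, 1, 1, 1, 0, 0, 0, 0],
    "snp3": [1, 1, 0, 0, 1, 1, 0, 0],
    "snp4": [1, 1, 0, 0, 0, 0, 0, 0],
    "snp5": [1, 1, 0, 0, 1, 1, 0, 0],
    "snp6": [1, 1, 0, 0, 0, 0, 0, 0],
}
fraction, mean_max = product_coverage(reference, ["snp1", "snp3"])
print("fraction captured at r2 >= 0.8:", fraction)
print("mean max r2:", mean_max)
```

The two returned numbers correspond to the two kinds of summary used for coverage: the percentage of variants above an r2 cutoff, and the average maximum r2.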
25Association tests with fixed markers
Tests of association: SNP 1, SNP 3
SNP on whole-genome product (1-5% of common
variation directly assayed)
26Association tests with fixed markers
Tests of association: SNP 1, SNP 3
27Association tests with fixed markers
Tests of association: SNP 1, SNP 3
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5
28Genome-wide products can capture most common
variation
Example: 500K data generated by Affymetrix and
recently submitted to HapMap DCC
29More on this topic
- Platform presentations tomorrow morning, 8 AM
sharp
- Pe'er
- Jorgenson
- Lazarus
- As well as several detailed posters!
30Applying the HapMap
- Study design - tagging
- Study coverage evaluation
- Study analysis - improving association testing
- Study interpretation
- Comparison of multiple studies
- Connection to genes/genomic features
- Integration with expression and other functional
data
- Other uses of HapMap data
- Admixture, LOH, selection
31Can incorporating tests of haplotypes of SNPs on
the genome-wide product improve this coverage?
32Improving association power using data from
HapMap
Tests of association: SNP 1, SNP 3
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5
33Improving association power using data from
HapMap
Tests of association: SNP 1, SNP 3
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5
34Improving association power using data from
HapMap
Tests of association: SNP 1, SNP 3, AG haplotype
SNPs actually tested: SNP 1, SNP 3, SNP 2, SNP 5, SNP 4, SNP 6
35Haplotypes increase coverage
36Applying the HapMap
- Study design - tagging
- Study coverage evaluation
- Study analysis - improving association testing
- Study interpretation
- Connection to genes/genomic features
- Comparison of multiple association studies
- Integration with expression and other functional
data
- Other uses of HapMap data
- Admixture, LOH, selection
37Integration with genomic features
- Positive association to a SNP on HapMap enables
detailed interpretation
- How many other SNPs are in LD with this SNP?
- What genes are in LD with this SNP?
- What coding variants and putative functional
variants are in LD with this SNP?
- Potential to improve power by modifying Bayesian
priors of each association test based on this
information
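One way to picture "modifying Bayesian priors" is through posterior odds = prior odds x Bayes factor: the same strength of evidence yields a higher posterior probability when the tested SNP is in LD with, say, a coding variant and is therefore given a higher prior. The sketch below assumes that framing. The prior values and the Bayes factor are invented for illustration and are not taken from the talk.

```python
def posterior_probability(prior, bayes_factor):
    """Posterior probability of true association.

    prior: prior probability of association, strictly between 0 and 1.
    bayes_factor: evidence for association relative to no association (> 0).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if bayes_factor <= 0:
        raise ValueError("bayes_factor must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical priors: a generic SNP versus one in LD with a coding variant.
generic_prior = 1e-5
annotated_prior = 1e-4
evidence = 1e4            # the same (invented) Bayes factor for both tests

print("generic SNP:", posterior_probability(generic_prior, evidence))
print("SNP in LD with coding variant:", posterior_probability(annotated_prior, evidence))
```

How much weight to give each kind of annotation is itself a modeling decision; the point of the sketch is only that LD-based annotation from HapMap can feed into it.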
38Example: Complement Factor H - AMD
- Original SNP hit in Affy 100K experiment:
rs380390
- Extent and structure of LD from HapMap aids in
the fine mapping phase of project
Klein et al., Science 2005
39Example: Complement Factor H - AMD
rs380390
40Example: Complement Factor H - AMD
rs380390
41Meta-analysis of association studies
- When different marker sets are used to study
association (candidate gene or genome-wide),
results can be readily integrated when all
markers are typed on HapMap samples
43Example: DTNBP1 and schizophrenia
- Multiple studies have described modest
association to schizophrenia
- Most studies have examined small numbers of
non-overlapping sets of SNPs
- HapMap data can be used to determine whether
these association finding
Derek Morris, Mousumi Mutsuddi (WCPG meeting)
44Extensive LD across DTNBP1
Phase II HapMap - 186 SNPs, 180 kb
45Phylogeny of DTNBP1 tag SNPs
[Figure: haplotype phylogeny; labels: Ancestral haplotype, 6, 33, 42, 8, 11]
46Associated alleles reported
Straub 2002, Van den Oord 2003
47Associated alleles reported
Straub 2002, Van den Oord 2003
Schwab 2003
48Associated alleles reported
Straub 2002, Van den Oord 2003
Van den Bogaert 2003, Funke 2004
Schwab 2003
49Associated alleles reported
Straub 2002, Van den Oord 2003
Williams 2004, Bray 2005
Van den Bogaert 2003, Funke 2004
Schwab 2003
50Associated alleles reported
Kirov 2004
Straub 2002, Van den Oord 2003
Williams 2004, Bray 2005
Van den Bogaert 2003, Funke 2004
Schwab 2003
51Inconsistent findings
- No consistently associated SNP/haplotype pattern
across studies
- All studies (European-derived populations) had
allele/haplotype frequencies compatible with
HapMap-CEU sample
- HapMap can successfully relate associations from
diverse marker sets
52Other Applications: Structural Variation
- 3 papers coming out in the next month describe
use of HapMap data to identify large, common
deletion polymorphisms
- LD around these polymorphisms permits their
assessment with tag SNPs/haplotypes in
genome-wide association studies
53Other Applications: Admixture Scanning
- HapMap data provides a rich source of highly
differentiated SNPs for design of admixture
panels
- Fine mapping of admixture signals can be focused
on the full set of highly differentiated alleles
in any region of the genome
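A simple way to find "highly differentiated" SNPs is to rank them by the absolute difference in allele frequency between two populations. The sketch below assumes that measure; other statistics such as FST are also used in practice. The SNP names and frequencies are invented placeholders, not HapMap values.

```python
def rank_by_differentiation(freq_pop1, freq_pop2, top_n):
    """Rank SNPs by absolute allele-frequency difference between two populations.

    freq_pop1, freq_pop2: dicts mapping SNP name -> frequency of the same
    reference allele in each population.
    Returns the top_n (name, difference) pairs, most differentiated first.
    """
    shared = [s for s in freq_pop1 if s in freq_pop2]
    deltas = [(s, abs(freq_pop1[s] - freq_pop2[s])) for s in shared]
    deltas.sort(key=lambda pair: pair[1], reverse=True)
    return deltas[:top_n]


# Hypothetical allele frequencies in two populations.
pop1 = {"snp_a": 0.10, "snp_b": 0.50, "snp_c": 0.95, "snp_d": 0.30}
pop2 = {"snp_a": 0.90, "snp_b": 0.55, "snp_c": 0.20, "snp_d": 0.35}

panel = rank_by_differentiation(pop1, pop2, top_n=2)
for name, delta in panel:
    print(name, round(delta, 2))
```

An admixture panel would additionally space the chosen SNPs across the genome; ranking within a region, as here, matches the fine-mapping use described above.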
54Other Applications: LOH
- HapMap identifies
- Regions of extended LD that may manifest
themselves as unusually long stretches of
homozygosity in individual samples
- The catalog of large deletion variants on the
HapMap will differentiate between LOH that is
potentially de novo and causal, and that which is
simply commonly segregating in the population
LOH analysis cognizant of HapMap patterns under
development
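The "long stretches of homozygosity" mentioned above can be located with a simple scan over one sample's ordered genotypes. The sketch below only finds the runs. It does not implement the HapMap-aware LOH analysis the slide says is under development, and the function name, threshold and genotypes are assumptions for illustration.

```python
def homozygous_runs(genotypes, min_length):
    """Find runs of consecutive homozygous genotypes.

    genotypes: list of two-letter genotype strings (e.g. "AA", "AG"),
    ordered by chromosomal position for a single sample.
    Returns a list of (start_index, end_index) pairs, inclusive, for runs
    containing at least min_length SNPs.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        is_hom = len(g) == 2 and g[0] == g[1]
        if is_hom:
            if start is None:
                start = i                  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None                   # heterozygous call ends any run
    # Close a run that extends to the end of the list.
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes) - 1))
    return runs


# Toy sample (invented genotypes along one chromosome segment).
sample = ["AA", "AG", "CC", "TT", "GG", "AA", "CT", "GG"]
runs = homozygous_runs(sample, min_length=3)
print(runs)
```

Interpreting a run then requires the population context: a run that coincides with a region of extended LD or a known common deletion is less likely to be a de novo event.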
55Early results encouraging
- At this meeting
- Arking and colleagues describe identification of
variant altering QT-interval
- Herbert and colleagues describe a novel gene for
obesity
- Wijmenga and colleagues describe a novel gene for
celiac disease