Title: A Genome-Wide Assessment of Linkage Disequilibrium
1. High-density admixture mapping to find genes for complex disease
David Reich, Harvard Medical School Department of Genetics, Broad Institute
July 13, 2004
(work with Nick Patterson)
2. Why do we want to find disease-causing variants?
Identify new targets for rational drug design and treatment
Identify new biological pathways
Clinical genetic testing
3. Linkage mapping doesn't work well for common diseases
Turn to association methods instead
4. Association Mapping
Direct association between mutations and disease
ACTGAACATTTAGACA
ACTGATCATTTAGACA
ACTGATCATTTAGACA
ACTGAACATTTAGACA
ACTGATCATTTAGACA
ACTGATCATTTAGACA
ACTGAACATTTAGACA
ACTGATCATTTAGACA
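The aligned sequences above differ at a single column. A variant like this (a SNP) is what association mapping tests directly against disease status. Below is a minimal Python sketch for locating such columns. It assumes the sequences are already aligned to equal length, and the function name `polymorphic_sites` is hypothetical.

```python
def polymorphic_sites(seqs):
    """Return (0-based position, sorted alleles) for every column where aligned sequences differ."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for pos in range(length):
        alleles = sorted({s[pos] for s in seqs})  # distinct bases seen in this column
        if len(alleles) > 1:
            sites.append((pos, alleles))
    return sites

haplotypes = [
    "ACTGAACATTTAGACA", "ACTGATCATTTAGACA", "ACTGATCATTTAGACA", "ACTGAACATTTAGACA",
    "ACTGATCATTTAGACA", "ACTGATCATTTAGACA", "ACTGAACATTTAGACA", "ACTGATCATTTAGACA",
]
print(polymorphic_sites(haplotypes))  # [(5, ['A', 'T'])]
```

A real association test would then compare allele counts at such a site between groups. The slide does not say which of these sequences come from patients.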
5. Admixture mapping
In favorable circumstances, the most economical
method for a whole-genome scan
1) The idea of admixture mapping
2) Methods
3) A practical whole-genome map
4) Two real studies
6. Admixture Mapping (a type of association mapping)
Can be as powerful as haplotype association mapping but requires 100 to 500 times fewer SNPs
Populations like African and Hispanic Americans
Most promising for diseases whose risk differs between the ancestral populations (e.g., multiple sclerosis, prostate cancer)
7. Admixture creates a mosaic
8. How does admixture mapping work?
Patients with a disease more prevalent in Europeans will be enriched in European ancestry at the disease locus
9. The Signal of Admixture Association
Controls are not necessary!
The perfect control is the rest of each person's genome
2,000 SNPs for genome-wide mapping
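Cases can serve as their own controls. At a disease locus, their local European ancestry should exceed what their genome-wide ancestry predicts. The sketch below is a crude illustration of that comparison, not the locus-genome statistic used in this work. It assumes each case comes with two inputs:
- an estimated number of European alleles at the locus (0 to 2, for example from an HMM);
- a genome-wide European ancestry proportion $M_i$.

The function name is hypothetical.

```python
import math
from statistics import mean, stdev

def excess_ancestry_z(local_eur_alleles, genome_eur_fraction):
    """One-sample t-style statistic for excess European ancestry at one locus.

    local_eur_alleles: estimated European alleles (0-2) at the locus, one per case
    genome_eur_fraction: genome-wide European ancestry proportion M_i, one per case
    With no disease gene at the locus, each case is expected to carry 2 * M_i.
    """
    diffs = [loc - 2.0 * m for loc, m in zip(local_eur_alleles, genome_eur_fraction)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / se

z = excess_ancestry_z([1.2, 1.3, 1.1, 1.4], [0.5, 0.5, 0.5, 0.5])
print(round(z, 2))  # 3.87
```

A large positive value points to excess European ancestry at the locus. This summary ignores the uncertainty in the per-case ancestry estimates themselves.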
10. EXPERIMENTALLY, how do you distinguish African and European ancestry?
11. How does one identify European or African segments despite similar gene frequencies?
12. New Methods
The hidden Markov model (HMM), for combining information from closely linked, partially informative markers to make inferences about ancestry
Markov chain Monte Carlo (MCMC), to deal with uncertainties in the HMM parameters that can produce false positives in the analysis
13. How to track regions of European and African ancestry along the genome?
$M_i$: proportion of European ancestry among individual $i$'s ancestors > 40 generations ago
$\lambda_i$: number of generations since mixture
Key parameters for the HMM
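Here is a sketch of how $M_i$ and $\lambda_i$ can enter a two-state (European/African) HMM for one chromosome. It rests on two modelling assumptions that are ours and are not stated on the slide:
- Ancestry switch points occur as a Poisson process with rate $\lambda_i$ per Morgan. This is motivated by each generation adding on average one crossover per Morgan.
- At each switch point, the new segment is European with probability $M_i$.

With $d_j$ the genetic distance in Morgans between markers $j$ and $j+1$:

```latex
\begin{aligned}
r_j &= 1 - e^{-\lambda_i d_j} \\
\Pr(\mathrm{E}\to\mathrm{E}) &= (1 - r_j) + r_j M_i, &\qquad \Pr(\mathrm{E}\to\mathrm{A}) &= r_j (1 - M_i), \\
\Pr(\mathrm{A}\to\mathrm{E}) &= r_j M_i, &\qquad \Pr(\mathrm{A}\to\mathrm{A}) &= (1 - r_j) + r_j (1 - M_i).
\end{aligned}
```

Each row sums to 1, and $(M_i, 1 - M_i)$ is the stationary distribution, so average European ancestry along the genome stays $M_i$. A larger $\lambda_i$ (an older admixture) produces more switches and therefore shorter ancestry segments.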
14. Hidden Markov Model (HMM) to combine information from neighboring markers
The genome of an African American is a mosaic of European and African ancestry
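Below is a minimal Python sketch of how an HMM pools evidence from neighboring, partially informative markers: a forward-backward pass over one haploid chromosome. It makes three assumptions:
- Ancestry switch points occur at rate `lam` per Morgan.
- A new segment is European with probability `M`.
- The parental allele frequencies are known exactly. This is precisely the assumption questioned later in the talk.

All names are hypothetical; this is not the authors' software.

```python
import math

def transition(M, lam, d):
    """2x2 ancestry transition matrix over d Morgans; state 0 = European, 1 = African.
    Assumes switch points at rate lam per Morgan; a new segment is European w.p. M."""
    r = 1.0 - math.exp(-lam * d)  # probability of at least one switch point
    return [[(1.0 - r) + r * M, r * (1.0 - M)],
            [r * M, (1.0 - r) + r * (1.0 - M)]]

def emission(allele, p_eur, p_afr):
    """P(observed allele | state) for a biallelic marker; p_* = frequency of allele 1."""
    if allele == 1:
        return [p_eur, p_afr]
    return [1.0 - p_eur, 1.0 - p_afr]

def posterior_european(alleles, p_eur, p_afr, dists, M, lam):
    """Posterior P(European) at each marker; dists[j] = Morgans between markers j and j+1."""
    n = len(alleles)
    # Forward pass, normalized at each step to avoid numerical underflow.
    e = emission(alleles[0], p_eur[0], p_afr[0])
    f = [M * e[0], (1.0 - M) * e[1]]
    fwd = [[x / sum(f) for x in f]]
    for j in range(1, n):
        T = transition(M, lam, dists[j - 1])
        e = emission(alleles[j], p_eur[j], p_afr[j])
        f = [e[t] * sum(fwd[-1][s] * T[s][t] for s in range(2)) for t in range(2)]
        fwd.append([x / sum(f) for x in f])
    # Backward pass, also normalized (the constants cancel in the posterior).
    bwd = [[1.0, 1.0] for _ in range(n)]
    for j in range(n - 2, -1, -1):
        T = transition(M, lam, dists[j])
        e = emission(alleles[j + 1], p_eur[j + 1], p_afr[j + 1])
        b = [sum(T[s][t] * e[t] * bwd[j + 1][t] for t in range(2)) for s in range(2)]
        bwd[j] = [x / sum(b) for x in b]
    # Combine both passes into per-marker posteriors.
    post = []
    for j in range(n):
        g0, g1 = fwd[j][0] * bwd[j][0], fwd[j][1] * bwd[j][1]
        post.append(g0 / (g0 + g1))
    return post

alleles = [1, 1, 1, 0, 0, 0]
post = posterior_european(alleles, [0.9] * 6, [0.1] * 6, [0.01] * 5, M=0.5, lam=6.0)
print([round(p, 2) for p in post])
```

No single marker here is decisive, since each allele occurs in both populations. Neighboring markers agreeing with each other is what pushes the posteriors toward 0 or 1.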
16. Scoring for disease genes
The locus-genome statistic
$h_{i,0} = M_i^2, \quad h_{i,1} = 2M_i(1-M_i), \quad h_{i,2} = (1-M_i)^2$
$y_{i1}, y_{i2}$: increased risks due to 1 or 2 European alleles
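A per-case likelihood ratio for comparing a locus with the genome can be sketched with Bayes' rule. This is our sketch under stated assumptions, not necessarily the exact form of the statistic above. For each case, assume:
- Posterior probabilities `post[k]` of carrying k = 0, 1, 2 European alleles at the locus, computed by the HMM with no disease effect.
- A genome-wide prior in which each allele is independently European with probability $M_i$.
- Relative risks 1, $y_{i1}$, $y_{i2}$ for k = 0, 1, 2.

Then the likelihood ratio for "disease gene here" versus "no effect" is $\sum_k \mathrm{post}_k r_k / \sum_k \mathrm{prior}_k r_k$. Here k counts European alleles, and all names are hypothetical.

```python
import math

def genome_prior(M):
    """P(k European alleles), k = 0, 1, 2, if each allele is European with probability M."""
    return [(1.0 - M) ** 2, 2.0 * M * (1.0 - M), M ** 2]

def case_log_lr(post, M, y1, y2):
    """log[ sum_k post_k * r_k / sum_k prior_k * r_k ] with relative risks r = (1, y1, y2)."""
    r = [1.0, y1, y2]
    num = sum(p * rk for p, rk in zip(post, r))
    den = sum(p * rk for p, rk in zip(genome_prior(M), r))
    return math.log(num / den)

def locus_score(cases, y1, y2):
    """Sum of per-case log likelihood ratios; cases is a list of (post, M) pairs."""
    return sum(case_log_lr(post, M, y1, y2) for post, M in cases)

cases = [([0.3, 0.5, 0.2], 0.2), ([0.1, 0.5, 0.4], 0.25)]
print(locus_score(cases, y1=1.5, y2=2.25) > 0)  # True
```

The example uses a multiplicative risk model ($y_{i2} = y_{i1}^2$) purely for illustration. If the locus posterior equals the genome prior, or if all risks are 1, each case contributes 0.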
17. Can detect regions of increased European ancestry in a data set of 756 SNPs and 442 samples
Section 3
18. Problem with the HMM: $p_j^{\text{European}}$ and $p_j^{\text{African}}$ (the allele frequencies at marker $j$ in the parental populations) are assumed known
In fact, they are unknown due to
sampling error when genotyping the parental populations
modern populations aren't the true parental populations
This can cause false positives!
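Here is a rough sense of the first source of error. If a parental allele frequency $p$ is estimated from $n$ randomly sampled chromosomes, its binomial standard error is $\sqrt{p(1-p)/n}$. Take a SNP genotyped in 20 individuals (40 chromosomes, the minimum used for the map). A frequency near 0.5 is then uncertain by about ±0.08. The function name is hypothetical.

```python
import math

def freq_standard_error(p, n_chromosomes):
    """Binomial standard error of an allele-frequency estimate (random sampling assumed)."""
    return math.sqrt(p * (1.0 - p) / n_chromosomes)

print(round(freq_standard_error(0.5, 40), 3))   # 0.079
print(round(freq_standard_error(0.5, 400), 3))  # 0.025
```

This covers only sampling error. It says nothing about how far modern populations have drifted from the true parental ones.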
19. Markov chain Monte Carlo (MCMC) to account for this uncertainty
The frequency estimates $p_j^{\text{European}}$ and $p_j^{\text{African}}$ affect the inferences across ALL samples, so we no longer treat individuals independently when estimating $M_i$ and $\lambda_i$
In a study of 2,500 markers and 2,500 samples, there would be about 10,000 unknown parameters (two frequencies per marker plus $M_i$ and $\lambda_i$ per sample), so we use an MCMC to average over them
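The example below illustrates what "averaging over" an uncertain parameter with MCMC means. It is a minimal random-walk Metropolis sampler for a single parental allele frequency, given `a` copies of the allele among `n` sampled chromosomes and a uniform prior. It is a toy: the analysis described here averages over all ~10,000 parameters together. The sampler design (Gaussian proposals, step size, starting value) is our assumption. The defaults `burn_in=100` and `follow_on=200` only mirror the iteration counts recommended for the full model.

```python
import math
import random

def log_posterior(p, a, n):
    """Log posterior of an allele frequency under a uniform prior (up to a constant)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return a * math.log(p) + (n - a) * math.log(1.0 - p)

def metropolis_allele_freq(a, n, burn_in=100, follow_on=200, step=0.1, seed=0):
    """Random-walk Metropolis; returns the samples drawn after burn-in."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, a, n)
    samples = []
    for it in range(burn_in + follow_on):
        proposal = p + rng.gauss(0.0, step)
        lq = log_posterior(proposal, a, n)
        # Accept with probability min(1, posterior ratio); proposals outside (0, 1) are rejected.
        if lq >= lp or rng.random() < math.exp(lq - lp):
            p, lp = proposal, lq
        if it >= burn_in:
            samples.append(p)
    return samples

samples = metropolis_allele_freq(a=12, n=40, burn_in=1000, follow_on=20000)
print(sum(samples) / len(samples))  # close to the exact posterior mean (12 + 1) / (40 + 2), about 0.31
```

With a uniform prior the exact posterior is Beta(a + 1, n - a + 1), which makes this toy easy to check. In the full model no such closed form exists, which is why sampling is needed.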
20. How many burn-in and follow-on iterations for the MCMC?
100 burn-in iterations are OK
200 follow-on iterations are recommended, as the whole-genome score is 97% correlated with the score obtained from 2,000 follow-on iterations
21. >2,000 simulations to assess power to detect disease genes show that the method is robust with current maps
22. Genotypes required for whole-genome scans with admixture, linkage and haplotype mapping (at 80% power)
23. Making admixture mapping work
1) >2,000 samples required for a powerful study (far more than the 300 previously recommended); no controls are strictly necessary, since cases from one study serve as controls for another
2) Diseases to study: hypertension, end-stage renal disease, prostate cancer, multiple sclerosis, ovarian cancer, Alzheimer's disease, type II diabetes (Hispanic Americans)
Note: 10-30% more samples are needed to study diseases prevalent in Africans
24. The first practical admixture map
No. of SNPs   Source / filter
450,000       Non-redundant SNPs in our database
3,583         Experimentally revalidated
3,378         Genotyped in at least 20 Europeans and Africans
3,250         Hardy-Weinberg p > 0.005
3,095         Information content SIC > 0.035
3,045         No significant population differentiation (P > 0.002)
2,504         SNP spacing of > 50 kb
2,138         No LD in West Africans or Europeans
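The filtering cascade in the table can be sketched as successive filters. This is a hypothetical Python version with the following assumptions:
- Each SNP record already carries its Hardy-Weinberg p-value, SIC value and differentiation p-value. The record layout and field names are ours.
- The >50 kb spacing rule is applied greedily along each chromosome, which is one simple choice; the map's actual selection procedure may differ.
- The final LD filter is omitted because it needs genotype data.

```python
from dataclasses import dataclass

@dataclass
class Snp:
    name: str
    chrom: str
    pos: int        # physical position in base pairs
    hwe_p: float    # Hardy-Weinberg test p-value
    sic: float      # information content (SIC)
    diff_p: float   # p-value for population differentiation

def filter_snps(snps, min_gap_bp=50_000):
    """Apply the threshold filters, then keep SNPs spaced > min_gap_bp apart (greedy)."""
    kept = []
    last_pos = {}  # chromosome -> position of the last SNP kept
    for s in sorted(snps, key=lambda x: (x.chrom, x.pos)):
        if s.hwe_p <= 0.005 or s.sic <= 0.035 or s.diff_p <= 0.002:
            continue  # fails one of the threshold filters
        prev = last_pos.get(s.chrom)
        if prev is not None and s.pos - prev <= min_gap_bp:
            continue  # too close to the previous SNP kept on this chromosome
        kept.append(s)
        last_pos[s.chrom] = s.pos
    return kept

candidates = [
    Snp("a", "1", 100_000, 0.50, 0.10, 0.40),
    Snp("b", "1", 120_000, 0.50, 0.10, 0.40),   # within 50 kb of "a"
    Snp("c", "1", 200_000, 0.001, 0.10, 0.40),  # fails Hardy-Weinberg
    Snp("d", "1", 300_000, 0.50, 0.10, 0.40),
    Snp("e", "2", 110_000, 0.50, 0.02, 0.40),   # low information content
    Snp("f", "2", 150_000, 0.50, 0.10, 0.40),
]
print([s.name for s in filter_snps(candidates)])  # ['a', 'd', 'f']
```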
26. Power of the map for discerning ancestry
27. Our first two large scans
Prostate cancer: 2-3-fold more prevalent in African Americans; 650 cases and 698 controls already in the lab
Multiple sclerosis: 1.5-2-fold more prevalent in European Americans; 502 cases and 175 controls already in the lab
28. Initial screen of 39% of the genome, focusing on linkage peaks, in 442 MS patients
Nothing compelling yet
29. Currently planning to increase power
30. Conclusions
The imperative now is to find something with this new method
Must do SEVERAL large-scale studies to assess whether admixture mapping works
31. Acknowledgements
Methods: Nick Patterson, Neil Hattangadi
New map: Mike Smith, Steve O'Brien, Dennis Gilbert, Francisco de la Vega, Trevor Woodage, Charles Scafe, Nick Patterson, Gavin McDonald, Alicja Waliszewska, David Altshuler
Multiple sclerosis: David Hafler, Nick Patterson, Gavin McDonald, Alicja Waliszewska, Phil de Jager, Jorge Oksenberg, Stephen Hauser, Amy Swerdlin, Bruce Cree, Robin Lincoln, Cari de Loa
Prostate cancer: Matt Freedman, David Altshuler, Chris Haiman, Brian Henderson