Title: Whole Genome Association study in the International Multisite ADHD Genetics IMAGE Project
1. Whole Genome Association study in the International Multisite ADHD Genetics (IMAGE) Project
Benjamin M. Neale (1,2,3), Stephen V. Faraone, Ph.D. (4,5)
(1) SGDP Centre, Institute of Psychiatry, King's College London
(2) Broad Institute of Harvard and MIT
(3) Center for Human Genetic Research, MGH
(4) Medical Genetics Research Center and (5) Departments of Psychiatry and Neuroscience and Physiology, SUNY Upstate Medical University
2. Data Cleaning Quality Thresholds for Individuals
- Low Call Rate
  - 10 individuals
- Gender Discrepancies (no discrepancies based on X chromosome)
  - 1 individual
- Sample Heterozygosity > 0.32
  - 16 individuals
- Mendelian Inconsistencies > 2%
  - 6 individuals
Excluded 10 individuals
Quality Control analyses were processed using
the GAIN QA/QC Software Package (version 0.7.4)
3. Data Cleaning Quality Thresholds for Individuals
- Low Call Rate
  - 10 individuals
- Gender Discrepancies (no discrepancies based on X chromosome)
  - 1 individual
- Sample Heterozygosity > 0.32
  - 16 individuals
- Mendelian Inconsistencies > 2%
  - 6 individuals
Excluded 16 individuals
4. Key Quality Consideration
- Missingness by Minor Allele Frequency (MAF)
- We utilize the distribution of association test statistics to identify problem categories
- Variance inflation factor (lambda) from genomic control (Devlin and Roeder 1999) provides a benchmark
- Lambda is calculated by taking the median χ² of the observed distribution and dividing by 0.455 (i.e. the theoretical median of the χ² distribution)
- Distribution of test statistics provides our guide, as we want mainly a null distribution
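The lambda calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: the function name and the example statistics are hypothetical, and the test statistics are assumed to be 1-df chi-squares. The constant is derived from the standard normal rather than hard-coded, which reproduces the 0.455 quoted on the slide.

```python
from statistics import NormalDist, median

# Theoretical median of a 1-df chi-square: the square of the 75th
# percentile of the standard normal (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(chi2_stats):
    """Return median(observed chi-squares) / theoretical 1-df median.

    Hypothetical helper; assumes every statistic is a 1-df chi-square.
    """
    stats = list(chi2_stats)
    return median(stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Made-up statistics, for illustration only.
    example = [0.1, 0.3, 0.5, 0.9, 2.4]
    print(round(genomic_control_lambda(example), 3))
```

A lambda close to 1 indicates that the bulk of the statistics follow the null distribution; values well above 1 indicate inflation.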
5. Distribution of association conditional on minor allele frequency and call rate
[Figure axes: Call Rate, Minor Allele Frequency]
6. Distribution of association conditional on minor allele frequency and call rate
- 0.01 ≤ MAF < 0.05 and call rate ≥ 99%
- 0.05 ≤ MAF < 0.10 and call rate ≥ 97%
- MAF ≥ 0.10 and call rate ≥ 95%
[Figure axes: Call Rate, Minor Allele Frequency]
7. Distribution of association conditional on minor allele frequency and call rate
- 0.01 ≤ MAF < 0.05 and call rate ≥ 99%
- 0.05 ≤ MAF < 0.10 and call rate ≥ 97%
- MAF ≥ 0.10 and call rate ≥ 95%
144,511 SNPs lost
[Figure axes: Call Rate, Minor Allele Frequency]
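The three MAF bins and their call-rate thresholds amount to a stepwise filter, sketched below. The function name is hypothetical, call rates are expressed as fractions, and two details are assumptions rather than statements from the slides: thresholds are treated as inclusive (>=), and SNPs with MAF below 0.01 are treated as failing because they fall below the lowest listed bin.

```python
def passes_call_rate_by_maf(maf, call_rate):
    """Return True if a SNP meets the MAF-dependent call-rate threshold.

    Hypothetical helper. Assumes inclusive call-rate thresholds and
    half-open MAF bins; rarer SNPs must meet a stricter call rate.
    """
    if maf < 0.01:
        return False  # assumption: below the lowest bin on the slide
    if maf < 0.05:
        return call_rate >= 0.99
    if maf < 0.10:
        return call_rate >= 0.97
    return call_rate >= 0.95


if __name__ == "__main__":
    # Made-up SNPs: (MAF, call rate)
    for maf, call_rate in [(0.03, 0.995), (0.03, 0.98), (0.20, 0.96)]:
        print(maf, call_rate, passes_call_rate_by_maf(maf, call_rate))
```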
8. Data Cleaning Quality Thresholds for SNPs
- Mendel Errors > 4
  - 15,387 SNPs excluded
- Duplicate Sample Discordance > 1
  - 185 SNPs
- Hardy-Weinberg Equilibrium test p < 0.000001
  - 278 SNPs
- Quality score < 10
  - SNP coded missing
Low quality scores
9. Data Cleaning Quality Thresholds for Inclusion
- Quality Control analyses were processed using the GAIN QA/QC Software Package (version 0.7.4)
- Mendelian Errors
  - Mendelian errors for < 2% of markers
  - Markers with < 4 Mendelian errors
- Gender Discrepancies (no discrepancies based on X chromosome)
- Sample Heterozygosity
  - Heterozygosity ≤ 0.32
- Sample Completeness
  - Genotype call rates > 95%
- Genotype Call Quality Score cutoff
  - Perlegen genotype quality score ≥ 10
- Hardy-Weinberg Equilibrium test
  - p-value > 0.000001 using Fisher's exact test
- Duplicate Sample Discordance
  - No more than one discordant genotype among 15 duplicate samples
- SNP Call Rate and Minor Allele Frequency (MAF)
  - Binary approach for call rate and MAF, as informed by the association statistic
438,784 SNPs survived QC
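As a rough illustration of how the marker-level inclusion thresholds combine, the sketch below checks three of them together: Mendelian errors, duplicate-sample discordance, and the Hardy-Weinberg p-value. The class and field names are hypothetical and do not reflect the GAIN QA/QC package, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Hypothetical per-SNP QC summary; field names are illustrative."""
    mendel_errors: int           # Mendelian errors observed for the marker
    duplicate_discordances: int  # discordant genotypes among duplicate samples
    hwe_p: float                 # Hardy-Weinberg exact-test p-value


def snp_passes_qc(snp):
    """Apply the marker-level inclusion thresholds listed on the slide."""
    return (
        snp.mendel_errors < 4
        and snp.duplicate_discordances <= 1
        and snp.hwe_p > 1e-6
    )


if __name__ == "__main__":
    # Made-up marker, for illustration only.
    print(snp_passes_qc(SnpQc(mendel_errors=1, duplicate_discordances=0, hwe_p=0.2)))
```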
10. Final Sample
11. Approaches to statistical analysis
- Classical TDT analysis (qualitative phenotype) in PLINK
- Analysis of quantitative phenotypes in PBAT
- Re-ranking of a SNP subset according to biological information (bioinformatics approach)
- Aim to identify genome-wide significant regions and select variation for replication efforts
13. Observed bias in TDT
14. Observed bias in TDT
McNemar χ² = 407, p-value ≈ 10^-90
15. Further indications of trouble
- If we bin the sorted results from low p-value to high into sets of 5,000
- We can calculate the McNemar χ² for each bin by comparing the common vs. rare allele overtransmission
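The binning procedure can be sketched as follows. This is one possible reading, stated as an assumption: each bin is summarised by the number of SNPs whose common allele is overtransmitted versus the number whose rare allele is overtransmitted, and the McNemar statistic is (common - rare)² / (common + rare). The function names and the input layout are hypothetical.

```python
def mcnemar_chi2(common_over, rare_over):
    """McNemar chi-square (1 df) for two discordant counts."""
    total = common_over + rare_over
    if total == 0:
        return 0.0
    return (common_over - rare_over) ** 2 / total


def mcnemar_by_bin(snps, bin_size=5000):
    """Return one McNemar chi-square per bin of p-value-ranked SNPs.

    snps: iterable of (p_value, direction) pairs, where direction is
    "common" or "rare" depending on which allele is overtransmitted
    (hypothetical layout).
    """
    ranked = sorted(snps, key=lambda s: s[0])  # low p-value first
    results = []
    for start in range(0, len(ranked), bin_size):
        chunk = ranked[start:start + bin_size]
        common = sum(1 for _, direction in chunk if direction == "common")
        rare = len(chunk) - common
        results.append(mcnemar_chi2(common, rare))
    return results
```

As a check on the formula, `mcnemar_chi2(107099, 104583)` evaluates to about 29.9, using the counts of 107099 common and 104583 rare reported on the later graph slide.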
16. Graph of McNemar Chi Squares
Bins of 5,000 SNPs ranked by P-value, from low to high
17. Graph of McNemar Chi Squares
11490 Common, 10425 Rare, χ² = 525.9
107099 Common, 104583 Rare, χ² = 29.9
Bins of 5,000 SNPs ranked by P-value, from low to high
18. To fix the problem
Apply the lambda correction by dividing χ² by lambda
19. Apply a lambda correction
Bins of 5,000 SNPs organized by rank, from low to high
Assign directionality based on whether the common or rare allele is overtransmitted
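A minimal sketch of the correction with directionality is given below. Dividing χ² by lambda follows the slides; the sign convention (positive when the common allele is overtransmitted, negative when the rare allele is) and the function name are assumptions made for illustration.

```python
def corrected_signed_chi2(chi2, lam, common_overtransmitted):
    """Divide a chi-square by lambda and attach a direction.

    Assumed convention: positive if the common allele is overtransmitted,
    negative if the rare allele is.
    """
    if lam <= 0:
        raise ValueError("lambda must be positive")
    corrected = chi2 / lam
    return corrected if common_overtransmitted else -corrected


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(corrected_signed_chi2(4.0, 2.0, common_overtransmitted=False))
```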
20. Uncorrected Chi Square Q-Q Plot
21. Corrected Chi Square Q-Q Plot
22. Rare QQ
23. Common QQ
24. Planned strategies for solving the problem
- Imputation
- Breaking TDT to case-control sample using MDD sample
- Two-tiered data cleaning (condition on over-transmission of common vs. rare alleles)
- Attempt to identify source of bias (missingness and MAF do not quite account for it)
25. Planned Replication Strategy
- Stage I: children with ADHD
  - European Consortium (1000 parent-child trios)
  - American group: 1000 trios from Wash U, UCLA, and MGH with Pfizer funding (WaPUM)
- Stage II: adults with ADHD
  - 1000 cases, 1000 controls (NL, Norway, Spain)
26. IMAGE Project Investigators
- NIMH Grant Principal Investigator
  - S. Faraone, SUNY Upstate Medical University, Syracuse, NY
- Site Principal Investigators
  - P. Asherson, Data Collection Coordination Center, London, UK
  - J. Sergeant, Director of Eunethydis Network, The Netherlands
  - R. Ebstein, Jerusalem, Israel
  - M. Gill, Dublin, Ireland
  - A. Miranda, F. Mulas, Valencia, Spain
  - R. Oades, Essen, Germany
  - H. Roeyers, Ghent, Belgium
  - A. Rothenberger (Göttingen), T. Banaschewski (Mannheim), Germany
  - J. Buitelaar, The Netherlands
  - E. Sonuga-Barke, Southampton, UK
  - H.C. Steinhausen, Zurich, Switzerland
Members of Genetics Subcommittee (also B. Franke, Netherlands, R. Anney, Ireland)
Supported by NIH grant R01MH62873 to S. Faraone
27. IMAGE Project Investigators
- Statistical Analysis Team
  - C. Lange, Harvard School of Public Health, Boston
  - N. Laird, Harvard School of Public Health, Boston
  - M. Daly, Center for Human Genetic Research, MGH
  - P. Sham, Institute of Psychiatry, London, UK
  - J. Su, SUNY Upstate Medical University, Syracuse
  - B. Neale, Institute of Psychiatry, London, UK
  - Note: Original QTL design based on advice from Prof Sham and from Shaun Purcell
- Bioinformatics Team
  - R. Anney, University of Dublin Trinity College, Ireland
  - E. Kenny, University of Dublin Trinity College, Ireland
  - C. Ó'Dúshláine, University of Dublin Trinity College, Ireland
  - D. Morris, University of Dublin Trinity College, Ireland
- Copy number analysis Team
  - B. Franke, Radboud University Nijmegen Medical Centre, Netherlands
  - J. Hehir-Kwa, Radboud University Nijmegen Medical Centre, Netherlands
  - J. Veltman, Radboud University Nijmegen Medical Centre, Netherlands