Title: Genetic Association Studies: Analysis Strategy Considerations
1 Genetic Association Studies: Analysis Strategy Considerations
Eleftheria Zeggini elez_at_well.ox.ac.uk
2 Overview
- Quality control
- -single nucleotide polymorphism genotyping quality
- -Hardy-Weinberg equilibrium
- -frequency comparisons
- -linkage disequilibrium
- Analysis
- -single-point
- -haplotype-based
- Interpretation
- -adjusting for multiple testing
- -replication
- -meta-analysis
3 Quality Control: Genotyping
- Duplicate genotype consistency
- -check duplicates across plates
- -flag / drop SNPs with inconsistencies
- Genotype call quality scores
- -set threshold for acceptable quality
- -plot quality scores v. physical distance
- Genotyping completeness
- -proportion of missing data
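The completeness and duplicate checks above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed pipeline: the function names, the toy genotype lists, and the use of `None` for a failed call are all assumptions made for the example.

```python
def missing_proportion(calls):
    """Proportion of missing genotype calls for one SNP (None marks a failed call)."""
    return sum(c is None for c in calls) / len(calls)


def duplicate_concordance(first_run, second_run):
    """Fraction of duplicate sample pairs with identical genotypes.

    Pairs in which either call is missing are ignored.
    Returns None if no pair has both calls present.
    """
    pairs = [(a, b) for a, b in zip(first_run, second_run)
             if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical toy data: one SNP typed in 8 samples, plus 4 duplicated samples.
snp_calls = ["AA", "AB", None, "BB", "AB", None, "AA", "AA"]
dup_run1 = ["AA", "AB", "BB", None]
dup_run2 = ["AA", "AA", "BB", "AB"]

print(missing_proportion(snp_calls))
print(duplicate_concordance(dup_run1, dup_run2))
```

Any threshold for flagging or dropping a SNP on these quantities is a study-specific choice and is not fixed by the sketch.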
4 QC: Hardy-Weinberg equilibrium
- Check HWE in cases and controls separately
- Control genotypes should be in HWE
- Deviation from HWE may indicate
- -problematic genotyping
- -population stratification
- -biological mechanism
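One common way to check HWE for a biallelic SNP is a 1-degree-of-freedom chi-square goodness-of-fit test on the genotype counts. The sketch below assumes that approach (the slides do not name a specific test); the function name is illustrative, and an exact test is usually preferred when expected counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Run separately on controls and on cases (toy counts, not real data).
print(hwe_chi_square(25, 50, 25))   # genotype counts exactly at HWE proportions
print(hwe_chi_square(40, 20, 40))   # strong heterozygote deficit
```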
5 QC: HWE in cases
- Deviation from HWE in cases may indicate
- -problematic genotyping
- -population stratification
- -biological mechanism
- -underlying disease association
- Will affect downstream choice of analysis strategy
- -Allele- and haplotype-specific tests not applicable due to non-independent parental allele transmission
6 QC: Evaluating deviation from HWE
- Check genotype distribution
- -Excess heterozygotes
- -Excess homozygotes: implies recessive model; can approximate expected effect size, e.g. λs, GRR
- Plot HWE p values v. physical distance
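To see which way a genotype distribution deviates, one option is to compare observed and expected heterozygosity. The sketch below uses the fixation-index style quantity F = 1 - H_obs / H_exp, which is an assumption of this example rather than a method named in the slides: F > 0 indicates excess homozygotes, F < 0 excess heterozygotes.

```python
def hwe_deviation_direction(n_aa, n_ab, n_bb):
    """Return (F, label) describing the direction of deviation from HWE.

    F = 1 - observed heterozygosity / expected heterozygosity.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    expected_het = 2 * p * (1 - p)
    if expected_het == 0:
        raise ValueError("SNP is monomorphic")
    observed_het = n_ab / n
    f = 1 - observed_het / expected_het
    if f > 0:
        label = "excess homozygotes"
    elif f < 0:
        label = "excess heterozygotes"
    else:
        label = "no deviation"
    return f, label


# Toy genotype counts (AA, AB, BB), not real data.
print(hwe_deviation_direction(40, 20, 40))
print(hwe_deviation_direction(10, 80, 10))
```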
8 QC: Genotype frequency comparison
- Review literature for studies including SNPs tested
- Compare SNP frequencies with data from same population / populations of similar ancestry
- -sample size
- -genotyping method
- Flag SNPs displaying gross inconsistencies
9 QC: Linkage disequilibrium patterns
- Characterise pairwise LD patterns in cases and controls separately
- Pairs of SNPs separated by long physical distance are not likely to be in strong LD
- Correlate strong apparent long-range LD with
- -minor allele frequency
- -genotype quality scores
- -proportion of missing data
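Pairwise LD is usually summarised by D' and r². The sketch below computes both from haplotype counts for two biallelic SNPs. It assumes phased haplotype counts are already available (in unrelated individuals they would have to be estimated first); the function name and the toy counts are illustrative.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D_prime, r_squared) from the four two-SNP haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total          # allele A at SNP 1
    p_B = (n_AB + n_aB) / total          # allele B at SNP 2
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic")
    r_squared = d * d / denom
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r_squared


# Toy haplotype counts, not real data.
print(pairwise_ld(50, 0, 0, 50))     # complete LD
print(pairwise_ld(40, 10, 10, 40))   # intermediate LD
print(pairwise_ld(25, 25, 25, 25))   # no LD
```

Running this for every SNP pair, separately in cases and controls, and tabulating the result against physical distance is one way to spot the unexpected long-range LD mentioned above.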
10 QC: Wish list
- High quality genotyping
- Low proportion of missing data
- Controls and cases in HWE, or deviation from HWE attributable to plausible association
- Similar genotype frequencies with public domain data
- No unexplainable strong apparent long-range LD
11 Analysis
- Single-point
- Haplotype-based
12 Analysis: Single-point
- Compare individual SNP frequencies between cases and controls
- 5 different models can be tested
- -General, dominant, recessive, multiplicative, additive
13 Analysis: Which model?
- General: compare genotype distributions AA, AB, BB between cases and controls
- Dominant: (AA + AB) v. BB
- Recessive: AA v. (AB + BB)
- Multiplicative: A v. B
- Additive: trend across ordered genotypes AA, AB, BB
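The five models correspond to different ways of collapsing the same 2 x 3 genotype table. The sketch below is one possible implementation under stated assumptions: Pearson chi-square tests for the general, dominant, recessive and multiplicative (allelic) comparisons, and a Cochran-Armitage trend test with scores 0, 1, 2 for the additive model. The function names and toy counts are illustrative, and the code assumes no empty table column.

```python
import math


def chi_square_2xk(cases, controls):
    """Pearson chi-square for a 2 x k table; returns (statistic, df)."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for r, s in zip(cases, controls):
        col = r + s
        for obs, row in ((r, n_cases), (s, n_controls)):
            exp = row * col / total
            stat += (obs - exp) ** 2 / exp
    return stat, len(cases) - 1


def trend_chi_square(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend statistic (1 df) for ordered genotypes."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    cols = [r + s for r, s in zip(cases, controls)]
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(scores, cases, controls))
    var = sum(w * w * n * (total - n) for w, n in zip(scores, cols))
    var -= 2 * sum(scores[i] * scores[j] * cols[i] * cols[j]
                   for i in range(len(cols)) for j in range(i + 1, len(cols)))
    var *= n_cases * n_controls / total
    return t * t / var


def p_value(stat, df):
    """Chi-square upper-tail probability for df = 1 or 2 (closed forms)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only df = 1 or 2 are supported in this sketch")


def model_tests(cases, controls):
    """cases/controls are genotype counts ordered (AA, AB, BB)."""
    (a1, b1, c1), (a0, b0, c0) = cases, controls
    tables = {
        "general": (cases, controls),
        "dominant": ((a1 + b1, c1), (a0 + b0, c0)),
        "recessive": ((a1, b1 + c1), (a0, b0 + c0)),
        "multiplicative": ((2 * a1 + b1, b1 + 2 * c1), (2 * a0 + b0, b0 + 2 * c0)),
    }
    results = {name: chi_square_2xk(*table) for name, table in tables.items()}
    results["additive"] = (trend_chi_square(cases, controls), 1)
    return results


# Toy counts (AA, AB, BB), not real data.
for model, (stat, df) in model_tests((30, 50, 20), (20, 50, 30)).items():
    print(model, round(stat, 3), df, round(p_value(stat, df), 4))
```

Because all five tests are run on the same data, the resulting p-values are correlated and should not be read as five independent pieces of evidence.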
14 Analysis: Fish and tell
- No prior reason to expect specific model
- Good practice to
- -report number of models tested
- -report sub-analysis results
- -adjust for multiple testing
- -replicate positive findings
15 Association testing
- Direct: functional variant affecting disease risk is tested
- Indirect: SNPs in LD with the true functional variant are tested
16 Indirect association testing
- Relies on allelic association between tested SNPs and functional variant
- Influenced by effect size, disease variant frequency, marker allele frequency, extent of LD
- Increase power to capture disease association by matching marker and disease variant allele frequencies
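The role of allele frequency matching can be made concrete with two standard approximations, both assumptions of this sketch rather than statements from the slides. First, the largest r² two biallelic loci can reach is limited by their allele frequencies and equals 1 only when the frequencies match. Second, the sample size needed to detect an association through a marker is roughly the size needed at the causal variant divided by r². Function names are illustrative.

```python
def max_r_squared(p_marker, p_disease):
    """Upper bound on r^2 between two biallelic loci given allele frequencies.

    Assumes the two alleles are labelled so that they are positively
    associated (D > 0) and that both frequencies lie strictly between 0 and 1.
    """
    lo, hi = sorted((p_marker, p_disease))
    return lo * (1 - hi) / (hi * (1 - lo))


def indirect_sample_size(n_direct, r_squared):
    """Approximate sample size needed when testing a marker instead of the
    causal variant, using the n / r^2 rule of thumb."""
    if not 0 < r_squared <= 1:
        raise ValueError("r_squared must be in (0, 1]")
    return n_direct / r_squared


print(max_r_squared(0.2, 0.2))          # matched frequencies
print(max_r_squared(0.1, 0.5))          # mismatched frequencies
print(indirect_sample_size(1000, 0.5))  # hypothetical study size
```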
17 Analysis: Our friend the haplotype
- Haplotypes can contain more information than single markers
- If the disease variant is not directly tested, haplotypes can help increase power by matching variant allele frequency
- Haplotypes can indicate synergistic effects
18 Analysis: Estimating haplotypes
- Drawbacks of haplotype inference in unrelated individuals: methodological assumptions, missing data, uncertainty
- Loss of information compared with genotype-based analysis approaches when uncertainty is not incorporated
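To illustrate where the uncertainty comes from, the sketch below estimates two-SNP haplotype frequencies in unrelated individuals with a simple expectation-maximisation (EM) scheme. EM is one common approach and is used here as an assumption; the slides do not name a method. Genotypes are coded as the number of copies of allele 1 at each SNP, the function name is illustrative, and individuals with missing data are assumed to have been removed. Only double heterozygotes (1, 1) have ambiguous phase, and EM splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
def em_haplotype_frequencies(genotypes, n_iter=100):
    """Estimate frequencies of haplotypes (0,0), (0,1), (1,0), (1,1).

    genotypes: list of (g1, g2), each g in {0, 1, 2}.
    """
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1            # phase unknown
        else:
            # Split each genotype into its two alleles; phase is unambiguous.
            a = (g1 // 2, (g1 + 1) // 2)
            b = (g2 // 2, (g2 + 1) // 2)
            base[(a[0], b[0])] += 1
            base[(a[1], b[1])] += 1
    total = 2 * len(genotypes)
    freqs = {h: 0.25 for h in base}      # uninformative starting values
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 rather than 01/10.
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(base)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        freqs = {h: c / total for h, c in counts.items()}
    return freqs


# Toy genotypes, not real data.
sample = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
print(em_haplotype_frequencies(sample))
```

The output is a set of point estimates. Analysing the most likely haplotypes as if they were observed discards the phase uncertainty, which is the loss of information noted above.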
19 Analysis: Haplotype blocks v. moving haplotype window
- Selection of haplotype-based analysis strategy
- -Characterise blocks and compare frequencies of within-block haplotypes
- -Compare frequencies of moving haplotype window across region of interest
- Consequences for downstream interpretation of results
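A moving window simply slides a fixed number of adjacent SNPs across the region, and each window is then tested as a haplotype. The sketch below only generates the windows; the window size, the step, and the SNP labels are hypothetical choices for illustration.

```python
def moving_windows(snps, size, step=1):
    """Return consecutive windows of `size` adjacent SNPs, advancing by `step`."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    return [snps[i:i + size] for i in range(0, len(snps) - size + 1, step)]


# Hypothetical SNP labels ordered by physical position.
region = ["snp1", "snp2", "snp3", "snp4", "snp5"]
for window in moving_windows(region, size=3):
    print(window)
```

Overlapping windows share SNPs, so the tests they produce are not independent, which matters when the results are later adjusted for multiple testing.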
20 Analysis: Single-point v. haplotype-based
- Relative efficiency of approaches depends on tested SNP density
- Haplotype tagging SNPs
- -Majority of haplotype information encapsulated by htSNPs
- Complementary approaches
21 Analysis: Wish list
- Single-point analyses point to one specific variant / gene / subregion
- Haplotype analyses confirm finding and help narrow down the disease gene-containing interval
- Unfortunately, the wish list scenario rarely occurs: multiple associations, hard to discern true positives
22 Will the real disease gene please stand up?
- Adjusting for multiple testing
- Understanding LD and non-independence of markers
- Replication
- Meta-analysis
23 Interpretation: Adjusting for multiple testing
- Guarding against false positives
- Family-wise error rate (FWER) v. false discovery rate (FDR)
- false positive rate
- -number of false positives / number of true null tests
- false discovery rate
- -number of false positives / number of significant tests
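The two error-rate ideas lead to different correction procedures. As a sketch, the code below contrasts a Bonferroni correction (controls the FWER) with the Benjamini-Hochberg step-up procedure (controls the FDR under independence or positive dependence). These two procedures are chosen here for illustration; the slides do not prescribe specific methods, and the p-values are toy numbers.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject tests with p <= alpha / m (family-wise error rate control)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            max_rank = rank              # largest rank meeting its threshold
    rejected = [False] * m
    for i in order[:max_rank]:
        rejected[i] = True
    return rejected


# Toy p-values, not real results.
p_values = [0.01, 0.02, 0.03, 0.5]
print(bonferroni(p_values))
print(benjamini_hochberg(p_values))
```

With correlated markers both procedures tend to be conservative, since the effective number of independent tests is smaller than the number of SNPs tested.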
24 Interpretation: Non-independence of markers
- Important to characterise the underlying structure of LD
- Markers in variable levels of LD with the true disease variant will pick up the same association signal
- Conditional tests can help dissect out true positives
25 Interpretation: Replication and meta-analysis
- Important to achieve independent replication of positive findings
- Inconsistency of published results: variable effect sizes, power, extent of LD
- Meta-analysis, combining many different studies, can provide better estimates of effect sizes
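One simple way to combine studies is a fixed-effect, inverse-variance weighted average of the per-study log odds ratios. The sketch below assumes that model (a random-effects model may be more appropriate when studies are heterogeneous); the function name and the input values are hypothetical.

```python
import math


def fixed_effect_meta(log_odds_ratios, std_errors):
    """Inverse-variance weighted pooled log odds ratio and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-study estimates, not real data.
log_ors = [0.0, 0.4]
ses = [0.1, 0.1]
pooled, pooled_se = fixed_effect_meta(log_ors, ses)
print(math.exp(pooled))                     # pooled odds ratio
print(math.exp(pooled - 1.96 * pooled_se),  # approximate 95% confidence interval
      math.exp(pooled + 1.96 * pooled_se))
```

The pooled standard error is smaller than that of any single study, which is the sense in which meta-analysis gives better effect size estimates.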
26 Conclusion: What makes a good association study?
- Meticulous study design
- Accurate genotyping
- Stringent QC
- Carefully selected analysis plan
- Review of complementary evidence
- Replication, replication, replication