Genetic Association Studies: Analysis Strategy Considerations - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Genetic Association Studies: Analysis Strategy Considerations


1
Genetic Association Studies: Analysis Strategy Considerations
Eleftheria Zeggini, elez_at_well.ox.ac.uk
2
Overview
  • Quality control
    - single nucleotide polymorphism (SNP) genotyping quality
    - Hardy-Weinberg equilibrium (HWE)
    - frequency comparisons
    - linkage disequilibrium (LD)
  • Analysis
    - single-point
    - haplotype-based
  • Interpretation
    - adjusting for multiple testing
    - replication
    - meta-analysis

3
Quality Control: Genotyping
  • Duplicate genotype consistency
    - check duplicates across plates
    - flag / drop SNPs with inconsistencies
  • Genotype call quality scores
    - set threshold for acceptable quality
    - plot quality scores v. physical distance
  • Genotyping completeness
    - proportion of missing data
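
The completeness and duplicate checks on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's pipeline: the genotype coding ("AA", "AB", "BB", with None for a failed call), the function names, and the 0.95 call-rate threshold are all assumptions chosen for the example.

```python
from typing import Dict, List, Optional

# Assumed coding: "AA", "AB" or "BB" for a successful call, None for a failed call.
Genotype = Optional[str]


def call_rate(calls: List[Genotype]) -> float:
    """Proportion of samples with a non-missing genotype call for one SNP."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def duplicate_concordance(first: List[Genotype], second: List[Genotype]) -> float:
    """Proportion of duplicate pairs with identical calls.

    Pairs in which either call is missing are ignored.
    """
    pairs = [(a, b) for a, b in zip(first, second) if a is not None and b is not None]
    if not pairs:
        return float("nan")
    return sum(a == b for a, b in pairs) / len(pairs)


def flag_low_call_rate(calls_by_snp: Dict[str, List[Genotype]],
                       min_call_rate: float = 0.95) -> List[str]:
    """Names of SNPs whose call rate is below the threshold (0.95 is illustrative only)."""
    return [snp for snp, calls in calls_by_snp.items() if call_rate(calls) < min_call_rate]
```

The threshold is a study-specific choice; the point is only that missingness and duplicate concordance are computed per SNP before any association test is run.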

4
QC: Hardy-Weinberg equilibrium
  • Check HWE in cases and controls separately
  • Control genotypes should be in HWE
  • Deviation from HWE may indicate
    - problematic genotyping
    - population stratification
    - biological mechanism
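
A minimal sketch of the HWE check, assuming a biallelic SNP and genotype counts supplied as three integers. It uses the Pearson chi-square test with 1 degree of freedom; the slides do not say which test was used, and for small counts an exact test is usually preferred. The function name is hypothetical.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) of genotype counts against HWE expectations.

    Returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Following the first bullet, the function would be called once on the control counts and once on the case counts rather than on the pooled sample.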

5
QC: HWE in cases
  • Deviation from HWE in cases may indicate
    - problematic genotyping
    - population stratification
    - biological mechanism
    - underlying disease association
  • Will affect downstream choice of analysis
    strategy
    - allele- and haplotype-specific tests are not
      applicable, because parental alleles are not
      transmitted independently

6
QC: Evaluating deviation from HWE
  • Check genotype distribution
  • Excess heterozygotes
  • Excess homozygotes
    - implies recessive model
    - can approximate expected effect size,
      e.g. λs, GRR
  • Plot HWE p-values v. physical distance
7
QC: Evaluating deviation from HWE
8
QC: Genotype frequency comparison
  • Review literature for studies including SNPs
    tested
  • Compare SNP frequencies with data from same
    population / populations of similar ancestry
    - sample size
    - genotyping method
  • Flag SNPs displaying gross inconsistencies
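
One way to make "gross inconsistencies" concrete is a two-proportion z-test on allele counts from the study sample and a reference sample, which also takes both sample sizes into account. This is a sketch only: the function name is hypothetical, and counting alleles as independent observations assumes both samples are close to HWE.

```python
import math
from typing import Tuple


def allele_frequency_z_test(count_1: int, total_1: int,
                            count_2: int, total_2: int) -> Tuple[float, float]:
    """Two-sided two-proportion z-test on allele counts from two samples.

    count_*: copies of the allele of interest;
    total_*: number of chromosomes typed (2 x number of individuals).
    Returns (z, p-value).
    """
    p_1 = count_1 / total_1
    p_2 = count_2 / total_2
    pooled = (count_1 + count_2) / (total_1 + total_2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_1 + 1 / total_2))
    if se == 0:
        return 0.0, 1.0  # both samples are monomorphic for the same allele
    z = (p_1 - p_2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value here is a prompt to recheck genotyping method and ancestry matching, not evidence of association.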

9
QC: Linkage disequilibrium patterns
  • Characterise pairwise LD patterns in cases and
    controls separately
  • Pairs of SNPs separated by long physical distance
    are not likely to be in strong LD
  • Correlate strong apparent long-range LD with
    - minor allele frequency
    - genotype quality scores
    - proportion of missing data
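
The pairwise LD measures behind this slide can be computed from haplotype and allele frequencies as below. The sketch assumes two biallelic SNPs with known (or estimated) haplotype frequencies; the function names and the distance and r² cut-offs used for flagging are illustrative assumptions, not values from the talk.

```python
from typing import List, Tuple


def ld_measures(f_ab: float, p_a: float, p_b: float) -> Tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic SNPs.

    f_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2;
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = f_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic; LD is undefined")
    r2 = d * d / denom
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    return d, d_prime, r2


def flag_long_range(pairs: List[Tuple[str, str, int, float]],
                    min_distance_bp: int = 500_000,
                    min_r2: float = 0.8) -> List[Tuple[str, str]]:
    """SNP pairs that are far apart yet in strong LD (cut-offs are illustrative).

    Each entry of pairs is (snp_1, snp_2, distance in bp, r^2).
    """
    return [(s1, s2) for s1, s2, dist, r2 in pairs
            if dist >= min_distance_bp and r2 >= min_r2]
```

Flagged pairs are then the ones to cross-check against minor allele frequency, quality scores and missingness.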

10
QC: Wish list
  • High quality genotyping
  • Low proportion of missing data
  • Controls and cases in HWE, or deviation from HWE
    attributable to plausible association
  • Genotype frequencies similar to public domain
    data
  • No unexplained strong apparent long-range LD

11
Analysis
  • Single-point
  • Haplotype-based

12
Analysis: Single-point
  • Compare individual SNP frequencies between cases
    and controls
  • Five different models can be tested: general,
    dominant, recessive, multiplicative, additive

13
Analysis: Which model?
  • General: compare genotype distributions
    (AA, AB, BB) between cases and controls
  • Dominant: (AA + AB) v. BB
  • Recessive: AA v. (AB + BB)
  • Multiplicative: A v. B
  • Additive: trend across AA, AB, BB
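
The five comparisons can be written out from case and control genotype counts as follows. This is a sketch under assumptions: Pearson chi-square tests for the general, dominant, recessive and multiplicative (allelic) models, and a Cochran-Armitage trend test with scores 0, 1, 2 for the additive model. The slides do not name the test statistics, and all function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Counts = Tuple[int, int, int]  # genotype counts in the order (AA, AB, BB)


def pearson_chi2(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a contingency table (rows: cases, controls)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def trend_chi2(cases: Counts, controls: Counts,
               scores: Sequence[float] = (0, 1, 2)) -> float:
    """Cochran-Armitage trend statistic (1 df) with the given genotype scores."""
    totals = [r + s for r, s in zip(cases, controls)]
    n, n_cases = sum(totals), sum(cases)
    n_controls = n - n_cases
    sum_w_totals = sum(w * m for w, m in zip(scores, totals))
    t = n * sum(w * r for w, r in zip(scores, cases)) - n_cases * sum_w_totals
    var_term = n * sum(w * w * m for w, m in zip(scores, totals)) - sum_w_totals ** 2
    return n * t * t / (n_cases * n_controls * var_term)


def single_point_tests(cases: Counts, controls: Counts) -> Dict[str, Tuple[float, float]]:
    """Return {model: (chi-square, p-value)} for the five models on the slide."""
    (a, b, c), (d, e, f) = cases, controls
    one_df = {
        "dominant": pearson_chi2([(a + b, c), (d + e, f)]),
        "recessive": pearson_chi2([(a, b + c), (d, e + f)]),
        # Allele counts: each heterozygote contributes one A and one B.
        "multiplicative": pearson_chi2([(2 * a + b, b + 2 * c), (2 * d + e, e + 2 * f)]),
        "additive": trend_chi2(cases, controls),
    }
    # Chi-square survival functions: erfc(sqrt(x/2)) for 1 df, exp(-x/2) for 2 df.
    results = {m: (x, math.erfc(math.sqrt(x / 2))) for m, x in one_df.items()}
    general = pearson_chi2([cases, controls])
    results["general"] = (general, math.exp(-general / 2))
    return results
```

The allelic test counts the two alleles of each person as independent observations, which is why it is only appropriate when genotypes are in HWE.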

14
Analysis: Fish and tell
  • No prior reason to expect specific model
  • Good practice to
    - report number of models tested
    - report sub-analysis results
    - adjust for multiple testing
    - replicate positive findings

15
Association testing
  • Direct: the functional variant affecting disease
    risk is tested
  • Indirect: SNPs in LD with the true functional
    variant are tested

16
Indirect association testing
  • Relies on allelic association between tested SNPs
    and functional variant
  • Influenced by effect size, disease variant
    frequency, marker allele frequency, extent of LD
  • Increase power to capture disease association by
    matching marker and disease variant allele
    frequencies
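
Why matching allele frequencies helps can be shown with two standard results: the r² between two biallelic loci is bounded by their allele frequencies, and a commonly used approximation says that testing a marker in place of the causal variant behaves roughly like testing the causal variant in a sample of size N x r². The sketch below assumes alleles are labelled so that the two alleles of interest are positively associated (D >= 0); function names are hypothetical.

```python
def max_r2(p_marker: float, p_disease: float) -> float:
    """Upper bound on r^2 between two biallelic loci given their allele frequencies.

    Assumes the alleles are labelled so that D >= 0.
    """
    lo, hi = sorted((p_marker, p_disease))
    if not 0 < lo <= hi < 1:
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    return lo * (1 - hi) / (hi * (1 - lo))


def effective_sample_size(n: int, r2: float) -> float:
    """Approximate sample size 'seen' by a marker in LD (r^2) with the causal variant."""
    return n * r2
```

For example, a marker with allele frequency 0.1 can reach an r² of at most about 0.11 with a variant of frequency 0.5, whereas matched frequencies allow r² up to 1.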

17
Analysis: Our friend the haplotype
  • Haplotypes can contain more information than
    single markers
  • If the disease variant is not directly tested,
    haplotypes can help increase power by matching
    variant allele frequency
  • Haplotypes can indicate synergistic effects

18
Analysis: Estimating haplotypes
  • Drawbacks of haplotype inference in unrelated
    individuals: methodological assumptions, missing
    data, uncertainty
  • Loss of information compared with genotype-based
    analysis approaches when uncertainty is not
    incorporated
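
The source of the uncertainty is easiest to see with two SNPs: every genotype determines its pair of haplotypes except the double heterozygote, which could be 00/11 or 01/10. The sketch below estimates haplotype frequencies with a basic EM algorithm, assuming HWE, no missing data, and genotypes coded as allele counts 0..2. It is an illustration of the idea, not the software used in the talk, and the function name is hypothetical.

```python
from typing import Dict, List


def em_two_snp_haplotypes(counts: List[List[int]], n_iter: int = 100) -> Dict[str, float]:
    """Estimate two-SNP haplotype frequencies from unphased genotype counts by EM.

    counts[g1][g2] = number of individuals with g1 copies of allele 1 at SNP 1
    and g2 copies of allele 1 at SNP 2 (g1, g2 in 0..2).
    Haplotypes are labelled by the allele (0 or 1) at SNP 1, then at SNP 2.
    """
    n = sum(sum(row) for row in counts)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Haplotype counts that are fixed by the genotype (all but double heterozygotes).
    base = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    for g1 in range(3):
        for g2 in range(3):
            if g1 == 1 and g2 == 1:
                continue
            a = [1] * g1 + [0] * (2 - g1)  # alleles at SNP 1 on the two chromosomes
            b = [1] * g2 + [0] * (2 - g2)  # alleles at SNP 2 on the two chromosomes
            for x, y in zip(a, b):
                base[f"{x}{y}"] += counts[g1][g2]
    n_double_het = counts[1][1]
    freqs = {h: 0.25 for h in base}
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11 rather than 01/10.
        cis = freqs["00"] * freqs["11"]
        trans = freqs["01"] * freqs["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = dict(base)
        expected["00"] += n_double_het * w
        expected["11"] += n_double_het * w
        expected["01"] += n_double_het * (1 - w)
        expected["10"] += n_double_het * (1 - w)
        # M-step: re-estimate frequencies from the expected haplotype counts.
        freqs = {h: expected[h] / (2 * n) for h in expected}
    return freqs
```

The weight w is exactly the per-individual uncertainty the second bullet refers to: an analysis that keeps only the most likely phase discards it.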

19
Analysis: Haplotype blocks v. moving haplotype window
  • Selection of haplotype-based analysis strategy
  • Characterise blocks and compare frequencies of
    within-block haplotypes
  • Compare frequencies of moving haplotype window
    across region of interest
  • Consequences for downstream interpretation of
    results
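
The moving-window strategy amounts to enumerating overlapping runs of adjacent SNPs and testing the haplotypes within each run. A minimal sketch of the enumeration step, assuming SNPs are already ordered by physical position; the function name, window size and step are hypothetical choices.

```python
from typing import Iterator, List, Sequence


def sliding_windows(snps: Sequence[str], size: int, step: int = 1) -> Iterator[List[str]]:
    """Yield windows of `size` adjacent SNPs, advancing `step` SNPs each time."""
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    for start in range(0, len(snps) - size + 1, step):
        yield list(snps[start:start + size])
```

Because neighbouring windows share SNPs, the resulting tests are correlated, which is one of the interpretation consequences the slide alludes to.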

20
Analysis: Single-point v. haplotype-based
  • Relative efficiency of approaches depends on
    tested SNP density
  • Haplotype tagging SNPs (htSNPs)
  • Majority of haplotype information encapsulated by
    htSNPs
  • Complementary approaches

21
Analysis: Wish list
  • Single-point analyses point to one specific
    variant / gene / subregion
  • Haplotype analyses confirm finding and help
    narrow down the disease gene-containing interval
  • Unfortunately, the wish list scenario rarely
    occurs: multiple associations, hard to discern
    true positives

22
Will the real disease gene please stand up?
  • Adjusting for multiple testing
  • Understanding LD and non-independence of markers
  • Replication
  • Meta-analysis

23
Interpretation: Adjusting for multiple testing
  • Guarding against false positives
  • Family-wise error rate (FWER) v. false discovery
    rate (FDR)
    - false positive rate:
      number of false positives / number of true null tests
    - false discovery rate:
      number of false positives / number of significant tests
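
The two kinds of control can be contrasted with the Bonferroni correction (controls the FWER) and the Benjamini-Hochberg step-up procedure (controls the FDR for independent or positively dependent tests). The slides do not name specific procedures, so these are chosen here as standard examples; the function names are hypothetical.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject a test if its p-value is at most alpha / (number of tests)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Finds the largest rank k with p_(k) <= k * q / m and rejects the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

With the eight p-values 0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074 and 0.205, Bonferroni at 0.05 rejects only the first test, while Benjamini-Hochberg at 0.05 rejects the first two.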

24
Interpretation: Non-independence of markers
  • Important to characterise the underlying
    structure of LD
  • Markers in variable levels of LD with the true
    disease variant will pick up the same association
    signal
  • Conditional tests can help dissect out true
    positives

25
Interpretation: Replication and meta-analysis
  • Important to achieve independent replication of
    positive findings
  • Inconsistency of published results: variable
    effect sizes, power, extent of LD
  • Meta-analysis, combining many different studies,
    can provide better estimates of effect sizes
26
Conclusion: What makes a good association study?
  • Meticulous study design
  • Accurate genotyping
  • Stringent QC
  • Carefully selected analysis plan
  • Review of complementary evidence
  • Replication, replication, replication