Efficiency and power in genetic association studies

1
Efficiency and power in genetic association
studies
  • Yixuan Chen
  • 1/11/2008

2
  • Paul I. W. de Bakker, et al. Efficiency and power
    in genetic association studies. Nature Genetics
    37(11) (2005).
  • Nikolas Maniatis, et al. Effects of Single SNPs,
    Haplotypes, and Whole-Genome LD Maps on Accuracy
    of Association Mapping. Genetic Epidemiology 31,
    179-188 (2007).

3
Outline
  • Motivation
  • Comparison
  • Conclusions
  • Discussion

4
Motivation
  • HapMiner project

5
HapMiner
  • HapMiner project extension
  • Utilize family data by applying phenotype scores
    for founder haplotypes
  • Seek experimental supports
  • Situations where haplotype-based association
    tests have advantages over single-marker tests

6
Association Study
  • Genotyping a higher density of SNPs increases the
    fraction of sites captured through LD (linkage
    disequilibrium)
  • Correlations among nearby variants (LD) can
    improve cost-effectiveness by guiding the
    selection of informative tag SNPs and providing
    information about nearby variants not genotyped
  • The use of multimarker haplotypes may result in
    greater efficiency

7
Efficiency and Power
  • Explicitly modeled disease association studies
  • Empirically assessed significance thresholds
  • Carried out evaluations using empirical (rather
    than simulated) human genotype data from the
    International HapMap ENCODE Project

8
Simulation
  • To simulate a case-control panel
  • Designated one SNP to be causal
  • Calculated an effect size such that if this SNP
    were directly tested in 1,000 cases and 1,000
    controls, power would be 95% to achieve a nominal
    P value of 0.01
  • Fixed the absolute power for each putative causal
    SNP
  • Rare alleles are assigned a stronger effect than
    common alleles
  • Examined only common alleles (with frequency >
    5%)
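
The effect-size step above can be sketched as follows. This is only an illustration under stated assumptions, not the calculation used in the paper: it assumes a multiplicative (allelic) odds ratio, treats the control allele frequency as the population frequency, and uses a normal approximation to the two-sample allelic test. The function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def case_freq(p0, odds_ratio):
    """Risk-allele frequency in cases, given control frequency p0 and an
    allelic odds ratio (assumption: multiplicative model)."""
    return odds_ratio * p0 / (1 - p0 + odds_ratio * p0)

def allelic_power(p0, odds_ratio, n_cases=1000, n_controls=1000, alpha=0.01):
    """Approximate power of a two-sided allelic test (normal approximation).
    Each individual contributes two alleles."""
    p1 = case_freq(p0, odds_ratio)
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(abs(p1 - p0) / se - z_crit)

def odds_ratio_for_power(p0, target=0.95, lo=1.0, hi=50.0, tol=1e-6):
    """Bisection search for the odds ratio that gives the target power."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if allelic_power(p0, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

or_rare = odds_ratio_for_power(0.05)    # allele frequency 5%
or_common = odds_ratio_for_power(0.50)  # allele frequency 50%
```

Under these assumptions, fixing power at 95% forces the rarer allele to carry the larger odds ratio, which is the pattern the slide describes.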

9
Significance Threshold
  • The significance threshold for declaring
    association was based on the empirical null
    distribution
  • The statistical tests were examined in a set of
    null panels (in which no SNP is causal); the
    threshold is the maximum chi-square value
    exceeded in 1% of null panels
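
A minimal sketch of an empirical threshold of this kind is shown below. It is a toy version: the null panels here are drawn with independent SNPs at a single allele frequency, whereas real panels carry LD between SNPs, which is the reason an empirical null is used at all. All function names and panel sizes are assumptions made for illustration.

```python
import random

def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """1-d.f. Pearson chi-square for a 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def null_panel_max(n_snps, n_alleles, freq, rng):
    """Largest chi-square in one null panel (toy assumption: no LD)."""
    best = 0.0
    for _ in range(n_snps):
        case_alt = sum(rng.random() < freq for _ in range(n_alleles))
        ctrl_alt = sum(rng.random() < freq for _ in range(n_alleles))
        best = max(best, allelic_chi2(case_alt, n_alleles, ctrl_alt, n_alleles))
    return best

def empirical_threshold(null_max_stats, exceed_fraction=0.01):
    """Value exceeded by at most `exceed_fraction` of the null maxima."""
    ordered = sorted(null_max_stats)
    n_exceed = int(len(ordered) * exceed_fraction)
    return ordered[max(len(ordered) - n_exceed - 1, 0)]

rng = random.Random(0)
null_maxima = [null_panel_max(5, 200, 0.3, rng) for _ in range(200)]
threshold = empirical_threshold(null_maxima)
```

Because the threshold is taken from the distribution of the per-panel maximum, it already accounts for all tests carried out in a panel, so no separate multiple-testing correction is applied on top of it.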

10
Tag SNP Selection
  • Select a subset of non-redundant SNPs from the
    reference panel such that every common allele
    either is directly genotyped or has a perfect
    proxy (r² = 1.0) among the tags.
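
One way to picture pairwise tagging is a greedy set cover over phased haplotypes, sketched below. This is an assumed, simplified procedure and not necessarily the algorithm used in the paper; haplotypes are represented as lists of 0/1 alleles, and the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared correlation between two biallelic sites coded 0/1."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0  # monomorphic site: r² is undefined, treat as 0
    d = pxy - px * py
    return d * d / denom

def greedy_tags(haplotypes, threshold=1.0):
    """Pick tags until every SNP is typed or has a proxy with r² >= threshold."""
    n_snps = len(haplotypes[0])
    cols = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # captures[i]: SNPs covered if SNP i is genotyped (always includes i itself)
    captures = [
        {j for j in range(n_snps)
         if j == i or r_squared(cols[i], cols[j]) >= threshold - 1e-12}
        for i in range(n_snps)
    ]
    uncovered, tags = set(range(n_snps)), []
    while uncovered:
        best = max(range(n_snps), key=lambda i: len(captures[i] & uncovered))
        tags.append(best)
        uncovered -= captures[best]
    return tags

# Four haplotypes, four SNPs: SNPs 0, 1 and 2 are perfectly correlated.
haps = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
tags = greedy_tags(haps)
```

In this toy panel two tags suffice for four SNPs, because one tag stands in for the three perfectly correlated sites.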

11
Use Haplotype
  • An identical set of tests of association with one
    degree of freedom (d.f.) are done but we allow a
    haplotype of tags to serve as surrogate for an
    untyped SNP
  • In other words, if a specific multimarker
    combination (i.e., haplotype of tag SNPs) can
    serve as an effective proxy for some putative
    causal alleles, then these alleles need not be
    typed as tag SNPs (or tested as single markers).
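
The idea of a haplotype acting as a surrogate can be sketched as a search over allele combinations at the tags. The code below is an illustrative assumption, not the paper's procedure: it builds a 0/1 indicator for each combination of up to two tag alleles and reports the one with the highest r² to the untyped SNP. The function names are hypothetical.

```python
from itertools import combinations, product

def r_squared(x, y):
    """Squared correlation between two 0/1 vectors."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a * b for a, b in zip(x, y)) / n
    denom = px * (1 - px) * py * (1 - py)
    if denom == 0:
        return 0.0
    d = pxy - px * py
    return d * d / denom

def best_haplotype_proxy(haplotypes, tag_idx, target_idx, max_markers=2):
    """Return (r², markers, alleles) for the best tag-allele combination."""
    target = [h[target_idx] for h in haplotypes]
    best = (0.0, None, None)
    for k in range(1, max_markers + 1):
        for markers in combinations(tag_idx, k):
            for alleles in product((0, 1), repeat=k):
                # 1 when the haplotype carries exactly these alleles at the tags
                indicator = [
                    int(all(h[m] == a for m, a in zip(markers, alleles)))
                    for h in haplotypes
                ]
                r2 = r_squared(indicator, target)
                if r2 > best[0]:
                    best = (r2, markers, alleles)
    return best

# Columns 0 and 1 are tags; column 2 is untyped and carried only on the 1-1 haplotype.
haps = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
r2, markers, alleles = best_haplotype_proxy(haps, tag_idx=[0, 1], target_idx=2)
```

In this toy panel neither tag alone is a good proxy for the untyped SNP, but the two-marker haplotype is a perfect one, and it is still tested with one degree of freedom because it is a single presence/absence indicator.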

12
Increased Efficiency

13
Relax Thresholds for Tagging
  • r² > 0.8
  • Best N: rank potential tags according to the
    number of other SNPs for which they can act as a
    proxy, then type the SNPs in this priority
    order
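
A best-N style prioritization might look like the sketch below. It assumes a precomputed matrix of pairwise r² values and re-ranks after each pick so that already-captured SNPs are not counted twice; whether the original method re-ranks in this way is an assumption. The function names are hypothetical.

```python
def best_n_order(r2, threshold=0.8):
    """Order SNPs by how many not-yet-captured SNPs each one would capture."""
    n = len(r2)
    captures = [
        {j for j in range(n) if j == i or r2[i][j] >= threshold}
        for i in range(n)
    ]
    remaining, order = set(range(n)), []
    while remaining:
        best = max(range(n), key=lambda i: len(captures[i] & remaining))
        order.append(best)
        remaining -= captures[best]
    return order

def fraction_captured(r2, tags, threshold=0.8):
    """Fraction of SNPs that are typed or have a proxy among `tags`."""
    n = len(r2)
    captured = {
        j for j in range(n) for t in tags if j == t or r2[t][j] >= threshold
    }
    return len(captured) / n

# Toy r² matrix: SNPs 0-2 form one LD bin, SNPs 3-4 another.
r2 = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.85],
    [0.1, 0.1, 0.1, 0.85, 1.0],
]
order = best_n_order(r2)
coverage_with_one_tag = fraction_captured(r2, order[:1])
```

Typing only the first N entries of `order` gives the budget-limited design: here one tag already captures three of the five SNPs, and the second completes the coverage.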

14
In Summary
  • If a complete reference panel is available,
    multimarker haplotype tests are more efficient
    than pairwise tests, and prioritizing SNPs on the
    basis of their LD properties allows impressive
    reductions in the genotyping burden while
    maintaining excellent power.

15
Incomplete Panel
  • Created a pseudo 5-kb HapMap by thinning the
    ENCODE data to achieve the spacing and frequency
    distribution of phase I HapMap
  • Selected tags and designed tests using this
    incomplete resource, evaluating performance in
    simulated case-control panels where all alleles
    (not just those from the incomplete HapMap) were
    allowed to be causal

16
Undiminished Power
  • Two key changes
  • A much smaller set of tags was selected
  • A subset of common variants had no good proxies
  • Power was largely undiminished
  • 95% relative power in CEU, using a set of best N
    tags

17
The Unexpected
  • Tests that capture many putative causal alleles
    add the same amount to the multiple testing
    burden as do independent tests that capture only
    a single site.

18
The Best N
  • The best N approach underperforms at sparser
    densities
  • The best N method suffers further when applied to
    incomplete reference panels
  • Therefore, where complete data is available, and
    as denser versions of HapMap become available
    (such as the pending phase II), the utility of
    the best N method should increase, particularly
    for choosing marker densities of more than one
    SNP per 10 kb

19
Conclusions
  • Specified multimarker tests substantially
    increase tagging efficiency relative to
    single-marker approaches, without loss of power.
  • When selecting SNPs from very dense reference
    panels, a method such as the best N strategy,
    which ranks SNPs according to the number of
    proxies they have, allows marked reductions in
    genotyping with limited loss of power,
    substantially outperforming a method based on
    relaxing r² thresholds.
  • Sparser sets of tags selected from a pseudo phase
    I HapMap are almost as powerful as equally sized
    sets chosen from complete reference panels.

20
Linkage Disequilibrium Maps in LD Units (LDU)
  • Using LDU locations for the 27 SNPs flanking the
    CYP2D6 gene on chromosome 22, the most common
    functional polymorphism within the gene was
    located 15 kb from its true location.
  • Expressing the locations of the 27 SNPs in LDU
    from the HapMap LDU map, analysis yielded an
    estimated location that is only 0.3 kb away from
    the CYP2D6 gene.
  • The haplotype data provided much poorer
    localization compared to single SNP analysis.

21
Drawbacks of Haplotype
  • If haplotype testing increases the degrees of
    freedom or number of tests in statistical
    analysis, it may decrease, rather than increase,
    overall power.
  • Autocorrelation generated by the duplicated SNPs
    among haplosets deflates the error variance for
    association mapping
  • As a result, power is exaggerated and localization
    is poor compared to single SNPs
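
The first drawback (more tests can mean less power) can be illustrated with a rough calculation. The sketch assumes a Bonferroni correction and a normal test statistic with a fixed expected value; it ignores correlation between tests and the separate effect of extra degrees of freedom, so it shows the direction of the effect rather than its size. The function name is hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def power_with_bonferroni(expected_z, n_tests, alpha=0.05):
    """Approximate power of one true signal when `n_tests` tests share alpha.
    Assumptions: two-sided test, Bonferroni correction, normal statistic."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / (2 * n_tests))
    return _STD_NORMAL.cdf(expected_z - z_crit)

power_single = power_with_bonferroni(4.0, n_tests=1)
power_hundred = power_with_bonferroni(4.0, n_tests=100)
```

With the same underlying signal, spreading the significance level over a hundred tests lowers power noticeably, so added haplotype tests have to capture enough extra signal to pay for themselves.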

22
Discussion
  • Haplotype-based methods may perform better
  • If allelic heterogeneity exists (more than one
    causal allele at a disease locus) (not
    applicable)
  • Two-locus disease model
  • Disease allele is not common (frequency < 5%)?
  • Use tag SNPs?
  • More?

23
THANKS!