Statistical methods for genetic association studies - PowerPoint PPT Presentation

Description:

genetic association studies. http://www.stats.gla.ac.uk/~paulj/assoc_study_stats.ppt. A tutorial on statistical methods for population association studies. David ... – PowerPoint PPT presentation

1
Statistical methods for genetic association studies
http://www.stats.gla.ac.uk/~paulj/assoc_study_stats.ppt
2
  • A tutorial on statistical methods for population
    association studies
  • David Balding
  • Nature Reviews Genetics (2006) 7:781-791

4
Recombination
  • X/x: unobserved causative mutation
  • A/a: distant marker
  • B/b: linked marker
5
Approaches to finding disease genes
  • Population-based association study
      • unrelated subjects
  • Family-based association study
      • nuclear families
  • Admixture mapping
      • recently admixed population
  • Linkage mapping
      • large pedigrees

6
Types of population association study
  • Candidate causative polymorphism
      • SNP (single nucleotide polymorphism), deletion,
        duplication
  • Candidate causative gene (5-50 marker SNPs)
      • evidence from linkage study or function
  • Candidate causative region (100s of marker SNPs)
      • evidence from linkage study
  • Genome-wide (>300,000 marker SNPs)
      • no prior evidence required

7
Common disease common variant (CDCV) hypothesis
8
Preliminary analysis: data quality
  • Assuming mating is random and the population is
    large, Hardy-Weinberg equilibrium (HWE) genotype
    frequencies will apply
  • Allele frequencies
      • P(X) = p
      • P(x) = q
  • HWE genotype frequencies
      • P(XX) = p^2
      • P(Xx) = 2pq
      • P(xx) = q^2
  • Testing for deviation from HWE is a useful data quality check
      • chi-squared or exact test
      • log QQ plot
  • But discarding SNPs that deviate from HWE can also discard
    causative mutations

         p      q
  p     p^2    pq
  q     pq     q^2
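
The HWE check on this slide can be sketched as a chi-squared test on the three genotype counts. This is a minimal illustration, not the deck's own code: the function name `hwe_chi_squared` and the counts in the usage lines are hypothetical, and the p-value uses the identity that the upper tail of a chi-squared distribution with 1 degree of freedom equals erfc(sqrt(x/2)). It assumes both alleles are present in the sample, so no expected count is zero.

```python
import math

def hwe_chi_squared(n_XX, n_Xx, n_xx):
    """Chi-squared test (1 df) for deviation from HWE at one SNP.

    Hypothetical helper: takes observed genotype counts and returns
    (chi-squared statistic, p-value). Assumes both alleles are observed.
    """
    n = n_XX + n_Xx + n_xx
    p = (2 * n_XX + n_Xx) / (2 * n)   # estimated frequency of allele X
    q = 1 - p                         # estimated frequency of allele x
    observed = (n_XX, n_Xx, n_xx)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-squared with 1 df: 3 classes - 1 - 1 estimated parameter
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: the first sample sits exactly at HWE proportions,
# the second has no heterozygotes at all.
chi2_ok, p_ok = hwe_chi_squared(36, 48, 16)
chi2_bad, p_bad = hwe_chi_squared(50, 0, 50)
```

For small samples or rare alleles an exact test, as the slide notes, is the usual alternative to this approximation.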
9
Log QQ plot
10
Preliminary analysis: dealing with missing data
  • Imputation
      • various methods: maximum likelihood, probabilistic
        hot-deck, regression modelling
  • Test for independence of missingness and
    case-control status

11
Choice of inheritance model
12
Choice of inheritance model
13
Choice of inheritance model
14
Tests of association: single SNP
  • Case-control
      • Treat genotype as a factor with 3 levels and perform a
        2x3 goodness-of-fit test. Loses power if the effect
        is additive
      • Count alleles rather than individuals and perform a
        2x2 goodness-of-fit test. Out of favour because it is
          • sensitive to deviation from HWE
          • risk estimates are not interpretable

            Major allele      Heterozygote (1)   Minor allele
            homozygote (0)                       homozygote (2)
  Case
  Control
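
The two tests on this slide can be sketched on the 2x3 table above. This is an illustration under stated assumptions: the function names and the case/control counts are hypothetical, and the statistic is the standard Pearson chi-squared computed from row and column totals. The p-values use closed forms for the chi-squared upper tail: exp(-x/2) for 2 degrees of freedom and erfc(sqrt(x/2)) for 1 degree of freedom.

```python
import math

def pearson_chi_squared(table):
    """Pearson chi-squared statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

def genotype_test(cases, controls):
    """2x3 test: genotype as a 3-level factor (2 df)."""
    stat = pearson_chi_squared([cases, controls])
    return stat, math.exp(-stat / 2)

def allele_test(cases, controls):
    """2x2 test: count alleles rather than individuals (1 df)."""
    def allele_counts(g):
        # g = counts of genotypes with 0, 1, 2 minor alleles
        return [2 * g[0] + g[1], g[1] + 2 * g[2]]
    stat = pearson_chi_squared([allele_counts(cases), allele_counts(controls)])
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical genotype counts, ordered as in the table columns (0, 1, 2)
cases = [40, 40, 20]
controls = [60, 30, 10]
geno_stat, geno_p = genotype_test(cases, controls)
allele_stat, allele_p = allele_test(cases, controls)
```

Note that the allele test doubles the apparent sample size by treating the two alleles of each person as independent, which is why it is sensitive to deviation from HWE.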
15
Tests of association: single SNP
  • Case-control
      • Cochran-Armitage test
          • loses power if the additivity assumption is wrong
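
As a rough sketch of the Cochran-Armitage trend test, the statistic below uses genotype scores 0, 1, 2, which correspond to an additive model. The function name and the counts are hypothetical; the formula is the standard one-degree-of-freedom trend statistic written in terms of the case counts r_i, the genotype totals n_i, the number of cases R, the number of controls S, and the total N.

```python
import math

def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x k table (1 df).

    cases, controls: counts per genotype class; scores: weights per class.
    Returns (chi-squared statistic, p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    R, S = sum(cases), sum(controls)
    N = R + S
    sum_wr = sum(w * r for w, r in zip(scores, cases))
    sum_wn = sum(w * n for w, n in zip(scores, totals))
    sum_w2n = sum(w * w * n for w, n in zip(scores, totals))
    numerator = N * (N * sum_wr - R * sum_wn) ** 2
    denominator = R * S * (N * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts for genotypes with 0, 1, 2 minor alleles
trend_stat, trend_p = cochran_armitage([40, 40, 20], [60, 30, 10])
```

Changing `scores` to (0, 1, 1) or (0, 0, 1) would test a dominant or recessive trend instead; those alternatives are an assumption of this sketch, not something the slide specifies.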
16
Tests of association: single SNP
  • Case-control
      • Armitage or goodness-of-fit? Depends on:
          • Prior knowledge of inheritance (additive,
            dominant, etc.)
          • Genotype frequencies, e.g. use the Armitage test when
            the minor allele is rare, the goodness-of-fit test
            otherwise

17
Tests of association: single SNP
  • Case-control
      • Logistic regression
          • Easily incorporates inheritance model (additive,
            dominant, etc.)
          • But assumes the phenotype, not the genotype, is the
            outcome variable, so it is easier to justify for
            prospective studies
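
One way to see how logistic regression incorporates the inheritance model is through the coding of the genotype predictor. The sketch below is illustrative only: the `CODINGS` table (including the recessive coding, which the slide does not name), the function `fit_logistic`, and the counts are all hypothetical, and the fit is a plain Newton-Raphson iteration on grouped genotype counts with one predictor.

```python
import math

# Genotype (number of minor alleles) -> predictor value under each model.
# The recessive row is an assumption added for illustration.
CODINGS = {
    "additive": {0: 0, 1: 1, 2: 2},
    "dominant": {0: 0, 1: 1, 2: 1},
    "recessive": {0: 0, 1: 0, 2: 1},
}

def fit_logistic(cases, controls, model="additive", iterations=25):
    """Fit logit P(case) = b0 + b1 * x by Newton-Raphson.

    cases, controls: counts for genotypes with 0, 1, 2 minor alleles.
    Returns (b0, b1); b1 is the log odds ratio per unit of the coding.
    """
    x = [CODINGS[model][g] for g in (0, 1, 2)]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, r, s in zip(x, cases, controls):
            n = r + s
            prob = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = r - n * prob          # score contribution
            w = n * prob * (1.0 - prob)   # information weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts for genotypes with 0, 1, 2 minor alleles
cases, controls = [40, 40, 20], [60, 30, 10]
_, slope_dominant = fit_logistic(cases, controls, model="dominant")
_, slope_additive = fit_logistic(cases, controls, model="additive")
```

Under the dominant coding the predictor is binary, so the fitted slope equals the log odds ratio of carriers versus non-carriers in the collapsed 2x2 table, which gives a simple sanity check on the fit.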

18
Tests of association: single SNP
  • Continuous outcome
      • Linear regression
  • Ordered categorical outcomes
      • Multinomial regression
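
For the continuous-outcome case, a minimal sketch is an ordinary least-squares regression of the trait on the genotype coded as the number of minor alleles (an additive coding, which is an assumption here). The function name and the data are hypothetical.

```python
def regress_trait_on_genotype(genotypes, trait):
    """Least-squares fit of trait = intercept + slope * genotype.

    genotypes: number of minor alleles (0, 1, 2) per individual.
    trait: continuous outcome per individual.
    """
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, trait))
    slope = sxy / sxx                      # trait change per minor allele
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: the trait rises by one unit per minor allele
intercept, slope = regress_trait_on_genotype(
    [0, 1, 2, 0, 1, 2], [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
)
```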

19
Problems: population stratification
20
Correcting for population stratification
  • Genomic control
      • Genotype null SNPs and use them to calculate the
        background inflation in the test statistic due to
        population stratification
      • Limited to simple single-SNP analyses
      • Can over- or under-correct
  • Other approaches using null SNPs
      • Regression, principal components analysis, modelling
        the underlying demography
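
The genomic control idea can be sketched as follows: estimate an inflation factor lambda as the median of the 1-df chi-squared statistics at the null SNPs divided by the median of the chi-squared distribution with 1 degree of freedom (about 0.4549), then divide each test statistic by lambda. The function name and the numbers are hypothetical, and flooring lambda at 1 is a common convention assumed here rather than something stated on the slide.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549  # approximate median of chi-squared with 1 df

def genomic_control(null_stats, test_stats):
    """Return (lambda, corrected statistics).

    null_stats: 1-df association statistics at null SNPs.
    test_stats: 1-df association statistics at the SNPs of interest.
    """
    lam = statistics.median(null_stats) / CHI2_1DF_MEDIAN
    lam = max(1.0, lam)  # assumed convention: never inflate the statistics
    return lam, [stat / lam for stat in test_stats]

# Hypothetical values: the null SNPs show a two-fold inflation
lam, corrected = genomic_control([0.2, 0.9098, 3.0], [10.0])
```

Because a single lambda rescales every statistic equally, the correction can be too strong for some SNPs and too weak for others, which is the over- or under-correction the slide mentions.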

21
Problems: multiple testing
  • Bonferroni correction
      • conservative when SNPs are linked
  • Permutation
      • computationally demanding
  • False discovery rate
  • Bayesian approaches
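
Two of the corrections listed above can be compared on a small set of p-values. This is a sketch under assumptions: the function names and the p-values are hypothetical, and the false discovery rate procedure shown is the Benjamini-Hochberg step-up rule, which the slide does not name explicitly.

```python
def bonferroni_rejections(p_values, alpha=0.05):
    """Indices of tests with p <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg_rejections(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, i in enumerate(order, start=1):
        # Find the largest rank k with p_(k) <= (k / m) * alpha
        if p_values[i] <= rank / m * alpha:
            largest_k = rank
    return sorted(order[:largest_k])

# Hypothetical p-values from five single-SNP tests
p_values = [0.001, 0.012, 0.025, 0.039, 0.9]
bonf = bonferroni_rejections(p_values)
bh = benjamini_hochberg_rejections(p_values)
```

With these numbers the Bonferroni rule keeps only the smallest p-value, while the step-up rule keeps the first four, which illustrates why Bonferroni is described as conservative.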

22
Tests of association: multiple SNPs
  • Advantages
      • Many SNPs may be linked to a gene but
        individually may not have a significant effect
      • Interactions between SNPs can be modelled
      • Tag SNPs can reduce testing of redundant linked
        SNPs
  • Methods
      • Linear regression, logistic regression
      • Armitage test
      • Haplotype-based methods
          • Natural interpretation
          • But power is reduced due to multiple alleles

23
Haplotypes
Nature Genetics  37, 915 - 916 (2005)
25
Inferring haplotype phase
26
Inferring haplotype phase
27
Inferring haplotype phase
28
Inferring haplotype phase
29
Inferring haplotype phase
  • Methods and software
  • PHASE, FASTPHASE
  • EH
  • FBAT
  • HAPLOTYPER
  • EM-DECODER
  • PLEM
  • HAP
  • HAPLORE
  • Haplo.stat
  • SNPEM
  • PEDPHASE
  • SNPHAP
  • TDTHAP
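
To illustrate what phase inference involves, here is a minimal expectation-maximisation (EM) sketch for two biallelic SNPs. It is not the algorithm of any specific package listed above; the function name, the genotype encoding, and the counts are hypothetical. With two SNPs, only double heterozygotes are ambiguous: they carry either the pair of haplotypes (0,0)/(1,1) or the pair (0,1)/(1,0), and EM splits them according to the current haplotype frequency estimates.

```python
def em_haplotype_frequencies(genotype_counts, iterations=100):
    """Estimate two-SNP haplotype frequencies by EM.

    genotype_counts: dict mapping (a, b) to a count, where a and b are the
    numbers of minor alleles (0, 1, 2) at SNP 1 and SNP 2.
    Returns a dict mapping haplotype (allele1, allele2), alleles coded 0/1,
    to its estimated frequency.
    """
    base = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    n_double_het = 0
    total = 0
    for (a, b), n in genotype_counts.items():
        total += n
        if a == 1 and b == 1:
            n_double_het += n          # phase unknown, handled in the E-step
            continue
        # At least one site is homozygous, so both haplotypes are determined
        base[(a // 2, b // 2)] += n
        base[((a + 1) // 2, (b + 1) // 2)] += n

    freqs = {h: 0.25 for h in base}
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is (0,0)/(1,1)
        cis = freqs[(0, 0)] * freqs[(1, 1)]
        trans = freqs[(0, 1)] * freqs[(1, 0)]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: add expected haplotype counts and renormalise
        counts = dict(base)
        counts[(0, 0)] += n_double_het * w
        counts[(1, 1)] += n_double_het * w
        counts[(0, 1)] += n_double_het * (1 - w)
        counts[(1, 0)] += n_double_het * (1 - w)
        freqs = {h: c / (2 * total) for h, c in counts.items()}
    return freqs

# Hypothetical sample: the unambiguous genotypes only ever show haplotypes
# (0,0) and (1,1), so the double heterozygotes resolve to that pair.
freqs = em_haplotype_frequencies({(0, 0): 40, (2, 2): 40, (1, 1): 20})
```

Real data sets involve many SNPs, missing genotypes, and far more candidate haplotypes, which is where dedicated software becomes necessary.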

30
Inferring haplotype phase
  • Phase cases and controls separately or pooled?
  • Separating can give inflated type I error
  • Pooling can reduce power