Advances in Population-based Studies of Complex Genetic Disorders - PowerPoint PPT Presentation

About This Presentation
Slides: 39
Provided by: gareth49

Transcript and Presenter's Notes



1
Advances in Population-based Studies of Complex
Genetic Disorders
  • NIHES Programme
  • March 31 to April 4, 2003
  • Erasmus MC
  • Rotterdam

Faculty: Yurii Aulchenko, David Clayton, Cornelia
van Duijn, Susan Service
2
Programme Overview
  • Day 1: Basic Principles of Association
  • Day 2: Population-based Studies
  • Day 3: Family-based Studies
  • Day 4: Linkage Disequilibrium and
    Haplotyping
  • Day 5: Isolated Populations

3
Day 1: Basic Principles of Association
  • Hardy-Weinberg Equilibrium
  • Measures of Association
  • Study Designs: Case-Control

4
Hardy-Weinberg Equilibrium
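  • In a large, randomly mating population with no
    selection, mutation or migration, allele and
    genotype frequencies remain constant over generations
  • For a biallelic locus with allele frequencies p
    and q, the genotype frequencies are $p^2$, $2pq$, $q^2$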
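A goodness-of-fit test checks whether observed genotype counts match the HWE proportions. Below is a minimal Python sketch. It assumes a biallelic locus and raw counts of AA, Aa and aa. The function names are illustrative and do not come from the course.

```python
def hwe_expected(n_AA, n_Aa, n_aa):
    """Expected genotype counts under HWE, using p = frequency of allele A."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    return n * p * p, n * 2 * p * q, n * q * q


def hwe_chi2(n_AA, n_Aa, n_aa):
    """Pearson goodness-of-fit statistic for HWE (1 df for a biallelic locus).
    Assumes both alleles are present, otherwise an expected count is zero."""
    observed = (n_AA, n_Aa, n_aa)
    expected = hwe_expected(n_AA, n_Aa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(hwe_chi2(25, 50, 25))  # 0.0: counts exactly match p^2, 2pq, q^2 with p = 0.5
print(hwe_chi2(50, 0, 50))   # 100.0: no heterozygotes at all
```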

5
HWE
  • Allele frequencies can be compared between cases
    and controls only under HWE
  • If not in HWE, the two alleles a subject receives
    from the parents are not independent (they are
    correlated)
  • Therefore, classical allele-based statistics, which
    treat the 2N alleles as independent, can't be applied
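HWE justifies an allele-wise comparison, which can be sketched as a 2 x 2 chi-squared test on allele counts. Each subject contributes two alleles. The function name and the input layout (n_AA, n_Aa, n_aa) are assumptions made for illustration.

```python
def allelic_chi2(cases, controls):
    """Allele-wise chi-squared test (1 df) from genotype counts (n_AA, n_Aa, n_aa).
    Counting 2 alleles per subject as independent observations is only valid under HWE."""
    # rows: cases, controls; columns: copies of A, copies of a
    (a, b), (c, d) = [(2 * AA + Aa, Aa + 2 * aa) for AA, Aa, aa in (cases, controls)]
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


print(allelic_chi2((30, 50, 20), (20, 50, 30)))  # 4.0
```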

6
Measures of Association
  • In epidemiology, associations between disease and
    aetiological factors are usually expressed in
    terms of relative risk measures
  • In the simplest case:
  • relative risk = measure of disease risk in exposed
    subjects / same measure of risk in unexposed subjects
  • Relative risks may be defined for genotypes,
    alleles or haplotypes

7
Genotype Relative Risks
  • For a biallelic locus with alleles A and a, there
    are three genotypes: AA, Aa, aa
  • We usually take one of these, e.g. aa, as
    reference: $GRR_{AA}$ = Risk for AA / Risk for aa
  • $GRR_{Aa}$ = Risk for Aa / Risk for aa
  • There is no standard for which genotype should be
    taken as reference, but it is usually the commonest
  • Confidence intervals are easier to interpret when
    the reference genotype is common
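A case-control study does not observe the risks themselves. For a rare disease, however, the genotype odds ratios approximate the GRRs. The sketch below uses aa as reference. The counts are made up and the function is hypothetical.

```python
def genotype_odds_ratios(cases, controls, reference="aa"):
    """Odds ratio of each genotype versus the reference genotype.
    cases, controls: dicts mapping genotype -> count.
    For a rare disease these approximate GRR_AA and GRR_Aa."""
    return {
        g: (cases[g] * controls[reference]) / (cases[reference] * controls[g])
        for g in cases
        if g != reference
    }


cases = {"AA": 40, "Aa": 100, "aa": 60}
controls = {"AA": 20, "Aa": 90, "aa": 90}
print(genotype_odds_ratios(cases, controls))
```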

8
Allelic Relative Risks
  • Allelic relative risks $f_A$ and $f_a$ are defined by
    the multiplicative model
  • One allele, e.g. a, is taken as reference, so that
    $f_a = 1$
  • $GRR_{AA} = f_A^2$ and $GRR_{Aa} = f_A$
  • Assume HWE: each subject's 2 chromosomes are
    sampled independently from the population
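Under HWE and the multiplicative model, the genotype distribution expected among cases comes from weighting $p^2$, $2pq$, $q^2$ by $f_A^2$, $f_A$, 1. A sketch follows. The function is hypothetical, and p and f_A are chosen arbitrarily.

```python
def case_genotype_freqs(p, f_A):
    """Genotype frequencies among cases under HWE and the multiplicative model.
    p: population frequency of A; f_A: allelic relative risk (f_a = 1)."""
    q = 1 - p
    weights = {"AA": p * p * f_A ** 2, "Aa": 2 * p * q * f_A, "aa": q * q}
    total = sum(weights.values())           # equals (p * f_A + q) ** 2
    return {g: w / total for g, w in weights.items()}


freqs = case_genotype_freqs(p=0.2, f_A=2.0)
case_p = freqs["AA"] + freqs["Aa"] / 2      # frequency of allele A among cases
print(case_p, 0.2 * 2.0 / (0.2 * 2.0 + 0.8))
```

The two printed values agree. Case genotypes are themselves in HWE with allele frequency $p f_A / (p f_A + q)$, which is why an allele-wise comparison suits the multiplicative model.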

9
Study Designs: Case-Control
  • Healthy controls vs. population controls
  • Healthy controls matched for age, gender, etc.
  • more power
  • Population controls randomly drawn
  • cost-effective
  • The benefit of adding controls levels off beyond
    about 4 controls per case
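A commonly quoted approximation explains the levelling-off. With M controls per case, efficiency relative to an unlimited number of controls is about M/(M+1). The sketch below assumes that approximation.

```python
def relative_efficiency(m):
    """Approximate efficiency of m controls per case relative to unlimited controls."""
    return m / (m + 1)


for m in range(1, 7):
    print(f"{m} controls per case: {relative_efficiency(m):.2f}")
```

Going from 4 to 5 controls per case raises the efficiency only from 0.80 to about 0.83.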

10
Testing H0 against alternative models
  • Multiplicative model: allele-wise comparison
  • Dominant model: carriers vs. non-carriers
  • Recessive model: homozygotes vs. the rest
  • Must have reason to think the effect is dominant
    / recessive
  • If we don't know, the best compromise is the
    multiplicative model
  • If all 3 tests are carried out, we must correct for
    multiple, non-independent testing: randomise
    case-control status and repeat all 3 tests on each
    permutation
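The permutation correction in the last bullet can be sketched in three steps:

1. Compute the allelic, dominant and recessive chi-squared statistics.
2. Take the largest of the three.
3. Compare it with the largest statistic obtained after shuffling the case-control labels.

Genotypes are assumed to be coded as copies of allele A (0, 1, 2). All names are illustrative.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]]; 0.0 if a margin is empty."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def three_tests(genos, status):
    """Allelic, dominant and recessive statistics.
    genos: copies of allele A per subject; status: 1 = case, 0 = control."""
    cases = [g for g, s in zip(genos, status) if s == 1]
    ctrls = [g for g, s in zip(genos, status) if s == 0]

    def split(group, indicator):
        k = sum(1 for g in group if indicator(g))
        return k, len(group) - k

    allelic = chi2_2x2(sum(cases), 2 * len(cases) - sum(cases),
                       sum(ctrls), 2 * len(ctrls) - sum(ctrls))
    dominant = chi2_2x2(*split(cases, lambda g: g >= 1), *split(ctrls, lambda g: g >= 1))
    recessive = chi2_2x2(*split(cases, lambda g: g == 2), *split(ctrls, lambda g: g == 2))
    return allelic, dominant, recessive


def max_test_p_value(genos, status, n_perm=1000, seed=1):
    """Permutation p-value for the largest of the three statistics."""
    rng = random.Random(seed)
    observed = max(three_tests(genos, status))
    labels = list(status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                    # randomise case-control status
        if max(three_tests(genos, labels)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Taking the maximum inside every permutation builds the correlation between the three tests into the null distribution, so no independence assumption is needed.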

11
Day 2: Population-based Studies
  • Multiple Comparison Problems
  • Confounding and Stratification
  • Study Design: Matching

12
Possible Outcomes of a Statistical Test
$\alpha$ = probability of a false positive, $\beta$ = probability of a
false negative, $1 - \beta$ = power of the test
13
Correcting for Multiple Testing
  • Traditional solution: the Bonferroni method
  • for k tests, reject each H0 at level $\alpha / k$
  • Controls the probability of falsely rejecting at
    least 1 H0 (the family-wise error rate)
  • Overly conservative for large k and / or
    dependent tests: diminished power
  • Appropriate when expecting 0 or 1 H0 to be false
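A minimal sketch of the Bonferroni rule, with an illustrative function name and made-up p values:

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for test i if p_i <= alpha / k, where k is the number of tests."""
    k = len(p_values)
    return [p <= alpha / k for p in p_values]


print(bonferroni([0.001, 0.01, 0.02, 0.04]))  # [True, True, False, False]
```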

14
New Paradigm: the False Discovery Rate
  • Controls the expected proportion of false positives
    among the rejected H0
  • More appropriate when expecting several H0 to be
    false / when tests are correlated
  • More liberal than traditional methods
  • Greatly increases power
  • Order p values from n tests smallest to largest
  • The threshold is most stringent for the 1st
    (smallest) p value and becomes less stringent for
    each subsequent one (e.g. $i \cdot q / n$ for the
    i-th p value in the Benjamini-Hochberg procedure)
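The ordered-threshold idea can be sketched with the Benjamini-Hochberg step-up procedure, the standard FDR method. The slide does not name a specific procedure, so choosing this one is an assumption.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at level q.
    The i-th smallest p value is compared with i * q / n."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / n:
            cutoff = rank                  # largest rank that meets its threshold
    rejected = set(order[:cutoff])
    return [i in rejected for i in range(n)]


print(benjamini_hochberg([0.001, 0.01, 0.02, 0.04]))  # [True, True, True, True]
```

Bonferroni at the same level with n = 4 would reject only p values at or below 0.0125.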

15
Confounding and Stratification
  • Spurious associations can be due to confounding
    by population stratification
  • Can avoid difficulties by analysing within strata
  • Stratified analysis
  • - loss of power due to little data in each
    stratum
  • - useful when different strata are associated
    with different alleles / opposite effects
    of the same allele

16
Different approach
  • Assume that the same effect exists across strata
  • Sum contributions from each stratum
  • Can use logistic regression
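The Mantel-Haenszel summary odds ratio illustrates how contributions are summed across strata under a common effect. It is a classical alternative swapped in here for brevity, not the logistic regression the slide mentions. The stratum layout is an assumption.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio, assuming a common OR across strata.
    Each stratum is (a, b, c, d) = (exposed cases, unexposed cases,
    exposed controls, unexposed controls)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# two hypothetical strata with different exposure frequencies but the same OR of 2
print(mantel_haenszel_or([(20, 10, 10, 10), (10, 20, 5, 20)]))
```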

17
Study Design: Matching
  • Matching: maintain the same ratio of controls to
    cases in every stratum
  • Also better sampling of controls
  • Individually matched studies: each case has
    his/her own set of controls, defining a stratum;
    conditional logistic regression must be used
  • Overmatching: matching for a variable which,
    while not a confounder, is related to the factor
    of interest, reduces the effective sample size
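Take the simplest individually matched design: 1:1 pairs and one binary exposure, such as carrying allele A. Here the conditional logistic likelihood has a closed-form maximum, the ratio of the two kinds of discordant pairs. A sketch with hypothetical data:

```python
def matched_pair_or(pairs):
    """Conditional ML odds ratio for 1:1 matched pairs and a binary exposure.
    pairs: (case_exposed, control_exposed) booleans; concordant pairs carry no information."""
    n10 = sum(1 for case, ctrl in pairs if case and not ctrl)
    n01 = sum(1 for case, ctrl in pairs if ctrl and not case)
    return n10 / n01


pairs = [(True, False)] * 6 + [(False, True)] * 3 + [(True, True)] * 10 + [(False, False)] * 5
print(matched_pair_or(pairs))  # 2.0
```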

18
Unobserved Stratification
  • Random differences in allele frequencies between
    strata
  • Two ways of tackling this have been proposed:
  • Estimate the effect of unobserved stratification
    empirically: Devlin and Roeder's genomic control
  • Unobserved stratification generates deviation
    from HWE and apparent LD between distant markers;
    thus latent stratification can be modelled
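Genomic control can be sketched as follows. The input is assumed to be 1-df test statistics from many markers, most of them unassociated with disease. The inflation factor is the median statistic divided by the median of the $\chi^2_1$ distribution (about 0.455), and each statistic is then divided by that factor. Flooring the factor at 1 is a common convention, assumed here.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of the chi-squared distribution with 1 df


def genomic_control(chi2_stats):
    """Estimate the inflation factor lambda from many (mostly null) 1-df statistics
    and divide every statistic by it."""
    lam = max(1.0, statistics.median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]
```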

19
Day 3: Family-based Studies
  • The Transmission Disequilibrium Test
  • Parental Origin
  • Quantitative Traits

20
Transmission Disequilibrium Test
  • The use of family-based controls overcomes the
    effects of unmeasured population stratification
  • Case-parent trios: conditioning on parental
    genotypes gives the TDT

[Figure: the four possible offspring genotypes i/j, 1/3, 1/4, 2/3 and 2/4, each with probability 0.25]
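The TDT itself can be sketched from trio genotypes coded as copies of allele A. Only heterozygous parents are informative. The statistic compares transmissions (b) with non-transmissions (c) as $(b - c)^2 / (b + c)$ on 1 df. The names and the coding are assumptions.

```python
import math


def tdt(trios):
    """trios: (father, mother, child) genotypes as copies of allele A (0, 1, 2).
    Returns (b, c, chi2, p): A transmitted / not transmitted by heterozygous parents."""
    b = c = 0
    for father, mother, child in trios:
        n_het = (father == 1) + (mother == 1)
        from_homs = (father == 2) + (mother == 2)  # AA parents always transmit A
        t = child - from_homs                      # copies of A from heterozygous parents
        if not 0 <= t <= n_het:
            raise ValueError("Mendelian inconsistency")
        b += t
        c += n_het - t
    chi2 = (b - c) ** 2 / (b + c)                  # assumes at least one informative parent
    p = math.erfc(math.sqrt(chi2 / 2))             # upper tail of chi-squared, 1 df
    return b, c, chi2, p
```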
21
Reconstructing Missing Parental Genotypes
[Figure: example trios in which one parent's genotype is missing (?/?); other genotypes shown: 1/2, 1/3, 1/1]
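One possible first step in such a reconstruction: given one typed parent and the child, list the alleles the untyped parent could have transmitted. This constrains the missing genotype but does not determine it. The function is hypothetical.

```python
def alleles_from_missing_parent(known_parent, child):
    """Alleles the untyped parent could have transmitted.
    Genotypes are pairs of allele labels, e.g. (1, 2); an empty set means
    the trio is inconsistent with Mendelian inheritance."""
    candidates = set()
    for transmitted in set(known_parent):
        rest = list(child)
        if transmitted in rest:
            rest.remove(transmitted)   # the other child allele came from the missing parent
            candidates.add(rest[0])
    return candidates


print(alleles_from_missing_parent((1, 2), (1, 2)))  # {1, 2}
print(alleles_from_missing_parent((1, 3), (1, 2)))  # {2}
print(alleles_from_missing_parent((1, 2), (1, 1)))  # {1}
```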
22
Two Unlinked Loci
  • For 2 unlinked loci, there are 16 transmission
    patterns, which are equiprobable under the null
    hypothesis, given the parental genotypes
  • Can compare each case with 15 pseudocontrols
  • Can look for GxG interactions using conditional
    logistic regression
  • Less efficient than the case-only method, but
    resistant to population stratification
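The pseudocontrols can be generated by enumerating both parents' possible transmissions at both loci (2 x 2 x 2 x 2 = 16 patterns) and removing the one that matches the case. The genotype format is an assumption.

```python
from itertools import product


def two_locus_pseudocontrols(father, mother, case):
    """father, mother, case: ((a1, a2), (b1, b2)) genotypes at two unlinked loci.
    Returns the 15 pseudocontrols: the 16 transmission patterns minus the case's."""
    patterns = []
    for f1, m1, f2, m2 in product(range(2), repeat=4):
        child = (tuple(sorted((father[0][f1], mother[0][m1]))),
                 tuple(sorted((father[1][f2], mother[1][m2]))))
        patterns.append(child)
    normalised = tuple(tuple(sorted(g)) for g in case)
    patterns.remove(normalised)   # raises ValueError if the case is not Mendelian-compatible
    return patterns
```

When a parent is homozygous, some pseudocontrols share the case's genotype and carry no information in the conditional analysis.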

23
Parental Origin
  • An important aspect of the TDT is the ability to
    differentiate allelic effects based on parental
    origin
  • The following intercross triads are excluded from
    the analysis
  • Several methods: TAT, PAT, CPG, CEPG

[Figure: intercross triad with parents 1/2 and 1/2 and child 1/2]
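The sketch below counts paternal and maternal transmissions separately and skips the ambiguous intercross triads. Genotypes are coded as copies of allele 1. The names are illustrative; this is not an implementation of TAT, PAT, CPG or CEPG.

```python
def parent_of_origin_counts(trios):
    """Transmissions of allele 1 from heterozygous fathers and mothers, counted separately.
    trios: (father, mother, child) as copies of allele 1 (0, 1, 2).
    Returns {"paternal": [transmitted, not_transmitted], "maternal": [...]}."""
    counts = {"paternal": [0, 0], "maternal": [0, 0]}
    for father, mother, child in trios:
        if father == mother == child == 1:
            continue  # intercross triad 1/2 x 1/2 -> 1/2: parental origin is ambiguous
        for parent, other, key in ((father, mother, "paternal"), (mother, father, "maternal")):
            if parent != 1:
                continue  # only heterozygous parents are informative
            if other == 1:
                transmitted = child // 2          # homozygous child: both sent the same allele
            else:
                transmitted = child - other // 2  # homozygous parent sends other // 2 copies
            if transmitted not in (0, 1):
                raise ValueError("Mendelian inconsistency")
            counts[key][1 - transmitted] += 1
    return counts
```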
24
Quantitative Traits
  • The weighted TDT (implemented in FBAT): a
    conditional logistic model
  • Genotype effects appear as interactions with
    $(y - m)$, where $y$ = trait value of offspring and
    $m$ = population mean for the trait
  • QTDT: robust to stratification / admixture
  • Regression of trait value y on genotype score g
  • Parent-of-origin effect also implemented in
    software
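A family-based score statistic in the spirit of FBAT can be sketched for trios. The sketch assumes one offspring per family, additive coding (X = the child's copies of allele A) and the trait weight $T = y - m$ from the slide. It illustrates the idea and is not FBAT's code.

```python
import math


def fbat_additive(trios, m):
    """trios: (father, mother, child, y) with genotypes as copies of allele A.
    U = sum T * (X - E[X | parents]), V = sum T^2 * Var[X | parents], Z = U / sqrt(V)."""
    U = V = 0.0
    for father, mother, child, y in trios:
        t = y - m
        expected = (father + mother) / 2      # each parent transmits A with prob. count / 2
        n_het = (father == 1) + (mother == 1)
        U += t * (child - expected)
        V += t * t * 0.25 * n_het             # variance 1/4 per heterozygous parent
    return U, V, U / math.sqrt(V)
```

Under H0 (no linkage), Z is approximately standard normal. Conditioning on parental genotypes keeps the test robust to stratification.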

25
Day 4: LD and Haplotyping
  • Measures of LD
  • LD Problems: the Bias
  • Estimating Haplotype Frequencies
  • Haplotype Blocks

26
Measures of LD
  • $p_{11}$ = frequency of the 11 haplotype, etc.
  • Most LD measures are based on the covariance
    $D = p_{11} p_{22} - p_{21} p_{12}$

27
Lewontin's D'
  • $D' = D / D_{max}$, where
  • $D_{max} = \min(p_{1.} p_{.1}, p_{2.} p_{.2})$ if $D < 0$
  • $D_{max} = \min(p_{1.} p_{.2}, p_{2.} p_{.1})$ if $D > 0$
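D and D' can be computed directly from the four haplotype frequencies. A sketch with an illustrative function name:

```python
def ld_d_dprime(p11, p12, p21, p22):
    """D and Lewontin's D' from haplotype frequencies p_ij (allele i at locus 1,
    allele j at locus 2); the four frequencies must sum to 1."""
    p1_, p2_ = p11 + p12, p21 + p22        # p1., p2.: allele frequencies at locus 1
    p_1, p_2 = p11 + p21, p12 + p22        # p.1, p.2: allele frequencies at locus 2
    d = p11 * p22 - p21 * p12
    if d < 0:
        d_max = min(p1_ * p_1, p2_ * p_2)
    else:
        d_max = min(p1_ * p_2, p2_ * p_1)
    return d, (d / d_max if d_max > 0 else 0.0)


print(ld_d_dprime(0.2, 0.3, 0.0, 0.5))  # (0.1, 1.0)
```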

28
LD Correlation Coefficient $r^2$
  • $D / \sqrt{p_{1.} p_{2.} p_{.1} p_{.2}}$ is a measure of correlation
  • $r^2 = D^2 / (p_{1.} p_{2.} p_{.1} p_{.2})$
  • the square of the correlation between marker
    alleles
  • $\chi^2$ for the 2x2 table is $\chi^2 = N D^2 / (p_{1.} p_{2.} p_{.1} p_{.2}) = N r^2$
  • with 1 df, where N = sample size (number of
    haplotypes)
  • gives the p value for the significance of LD
    between the markers
  • $r^2$ is therefore directly related to the p value from $\chi^2$
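A sketch of $r^2$ and its chi-squared test, using the same frequency notation. The function name and inputs are illustrative.

```python
import math


def ld_r2_chi2(p11, p12, p21, p22, n_haplotypes):
    """r^2 and the 1-df chi-squared test of LD, chi2 = N * r^2."""
    p1_, p2_ = p11 + p12, p21 + p22
    p_1, p_2 = p11 + p21, p12 + p22
    d = p11 * p22 - p21 * p12
    r2 = d * d / (p1_ * p2_ * p_1 * p_2)
    chi2 = n_haplotypes * r2
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-squared, 1 df
    return r2, chi2, p_value


r2, chi2, p = ld_r2_chi2(0.2, 0.3, 0.0, 0.5, n_haplotypes=100)
print(round(r2, 3), round(chi2, 1), p)
```

For these frequencies $r^2 = 0.25$ although D' = 1, because haplotype 21 is absent. This shows how the two measures can disagree when allele frequencies differ between loci.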

29
LD Problems: the Bias
  • $D'$ is biased upwards in small samples
  • Correct the problem by:
  • - using different measures of LD, e.g. $r^2$
  • - using the bootstrap ($D'_{boot}$)
  • - using permutation ($D'_{adj}$)
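A bootstrap correction can be sketched generically:

1. Resample the N haplotypes with replacement.
2. Recompute |D'| on each resample.
3. Subtract the estimated bias from the observed value.

This is a standard bootstrap bias correction, assumed for illustration. The exact $D'_{boot}$ definition used in the course may differ.

```python
import random


def abs_dprime(counts):
    """|D'| from haplotype counts (n11, n12, n21, n22)."""
    n = sum(counts)
    p11, p12, p21, p22 = (c / n for c in counts)
    p1_, p2_, p_1, p_2 = p11 + p12, p21 + p22, p11 + p21, p12 + p22
    d = p11 * p22 - p21 * p12
    d_max = min(p1_ * p_1, p2_ * p_2) if d < 0 else min(p1_ * p_2, p2_ * p_1)
    return abs(d) / d_max if d_max > 0 else 0.0


def bootstrap_corrected_dprime(counts, n_boot=1000, seed=1):
    """Observed |D'| and its bootstrap bias-corrected value, observed - (mean_boot - observed)."""
    rng = random.Random(seed)
    haplotypes = [h for h, c in enumerate(counts) for _ in range(c)]
    observed = abs_dprime(counts)
    boot = []
    for _ in range(n_boot):
        sample = rng.choices(haplotypes, k=len(haplotypes))
        boot.append(abs_dprime([sample.count(h) for h in range(4)]))
    bias = sum(boot) / n_boot - observed
    return observed, observed - bias
```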

30
Estimating Haplotype Frequencies
  • In the absence of family data
  • Expectation-Maximization algorithm
  • Markov chain Monte Carlo algorithm

31
EM algorithm
  • Maximum likelihood technique
  • Goal: find the haplotype frequencies that maximise
    the probability of the observed genotypes
  • Assumption: HWE at all loci
  • Limitation: the EM estimate may not be the global
    optimum
  • - use different starting conditions to avoid
    convergence to a local maximum
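A compact EM sketch for a few SNPs. It assumes genotypes coded as copies of allele 1 per SNP and enumerates every phase resolution, which is feasible only when there are few heterozygous sites. It starts from uniform frequencies only; rerunning from other starting points, as the slide advises, is left out for brevity.

```python
from collections import defaultdict
from itertools import product


def resolutions(genotype):
    """Haplotype pairs consistent with an unphased genotype.
    genotype: copies of allele 1 at each SNP (0, 1, 2); haplotypes are tuples of 0/1."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    for bits in product((0, 1), repeat=len(het) - 1):  # first het site fixed: no duplicates
        h1, h2 = list(base), list(base)
        h1[het[0]] = 1
        for site, b in zip(het[1:], bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotypes(genotypes, n_iter=500, tol=1e-10):
    """EM estimate of haplotype frequencies, assuming HWE (random pairing of haplotypes)."""
    res = [resolutions(g) for g in genotypes]
    haps = sorted({h for r in res for pair in r for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}          # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for r in res:
            # E-step: probability of each phase resolution under current frequencies
            w = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in r]
            total = sum(w)
            for (a, b), wi in zip(r, w):
                counts[a] += wi / total
                counts[b] += wi / total
        # M-step: expected haplotype counts divided by 2N
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq
```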

32
Markov chain Monte Carlo algorithm
  • Another approximation method
  • Uses sampling to estimate expectations
  • Operates on one person's haplotype resolution at
    a time
  • MCMC can handle larger problems than EM
  • MCMC provides estimates of uncertainty on
    phase-unknown calls
  • Monitoring convergence of MCMC can be hard
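Below is a minimal Gibbs sampler, one form of MCMC, that updates one person's phase resolution at a time. The symmetric Dirichlet prior on haplotype frequencies is an assumption chosen for simplicity. It is much cruder than the models in dedicated phasing software.

```python
import random
from collections import Counter
from itertools import product


def resolutions(genotype):
    """Haplotype pairs consistent with an unphased genotype (copies of allele 1 per SNP)."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [1 if g == 2 else 0 for g in genotype]
    if not het:
        return [(tuple(base), tuple(base))]
    pairs = []
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = list(base), list(base)
        h1[het[0]] = 1
        for site, b in zip(het[1:], bits):
            h1[site], h2[site] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def gibbs_phase(genotypes, n_iter=2000, burn_in=500, alpha=1.0, seed=1):
    """Returns, per person, the posterior frequency of each phase resolution."""
    rng = random.Random(seed)
    res = [resolutions(g) for g in genotypes]
    state = [rng.randrange(len(r)) for r in res]      # random initial phase for everyone
    counts = Counter()
    for r, s in zip(res, state):
        counts.update(r[s])
    tally = [Counter() for _ in res]
    for it in range(n_iter):
        for i, r in enumerate(res):
            counts.subtract(r[state[i]])              # remove person i's current haplotypes
            # Dirichlet predictive weight of each resolution given everyone else
            weights = [(counts[a] + alpha) * (counts[b] + alpha + (a == b)) for a, b in r]
            state[i] = rng.choices(range(len(r)), weights=weights)[0]
            counts.update(r[state[i]])
            if it >= burn_in:
                tally[i][state[i]] += 1
    kept = n_iter - burn_in
    return [{r[k]: v / kept for k, v in t.items()} for r, t in zip(res, tally)]
```

The returned frequencies express the uncertainty of each phase-unknown call. Running several chains from different seeds is one simple way to check convergence.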

33
Haplotype Blocks
  • n SNPs: $2^n$ possible haplotypes
  • Regions of extended haplotype conservation
  • Definition by minimising haplotype diversity
  • by identifying regions with low recombination
    rate ($D'$)
  • using common SNPs?
  • excluding low-frequency haplotypes?

Different methods/thresholds result in different
block structures
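A deliberately simple, hypothetical block rule shows how the threshold changes the result. It extends a block while every pair of SNPs in it has |D'| at or above a threshold. The D' matrix is made up.

```python
def blocks_by_dprime(dprime, threshold):
    """Greedy partition of consecutive SNPs into blocks in which every pair
    has |D'| >= threshold. dprime: symmetric matrix (list of lists) of |D'|."""
    blocks, start = [], 0
    n = len(dprime)
    for j in range(1, n + 1):
        if j == n or any(dprime[i][j] < threshold for i in range(start, j)):
            blocks.append(list(range(start, j)))   # close the current block
            start = j
    return blocks


dp = [[1.0, 0.95, 0.8, 0.3],
      [0.95, 1.0, 0.85, 0.4],
      [0.8, 0.85, 1.0, 0.9],
      [0.3, 0.4, 0.9, 1.0]]
print(blocks_by_dprime(dp, 0.8))  # [[0, 1, 2], [3]]
print(blocks_by_dprime(dp, 0.9))  # [[0, 1], [2, 3]]
```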
34
HB Use in Association Mapping: Possible Strategy
  • Genotype a subset of samples for all SNPs
  • Define HBs and htSNPs
  • Genotype entire sample for htSNPs
  • Investigate association
  • Will reduce cost
  • Will facilitate haplotype approaches
  • May not be common to different populations
  • Is information lost? Simulation studies
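Choosing htSNPs can be framed as finding the smallest set of SNPs that still tells all common haplotypes apart. Brute force over subsets is used here only for illustration. The frequency cut-off and the haplotypes are assumptions.

```python
from itertools import combinations


def select_htsnps(haplotypes, min_freq=0.05):
    """Smallest set of SNP positions that distinguishes all common haplotypes.
    haplotypes: dict mapping haplotype string (e.g. '0110') to frequency."""
    common = [h for h, f in haplotypes.items() if f >= min_freq]
    n_snps = len(common[0])
    for size in range(1, n_snps + 1):
        for subset in combinations(range(n_snps), size):
            patterns = {tuple(h[i] for i in subset) for h in common}
            if len(patterns) == len(common):   # every common haplotype is still unique
                return list(subset)
    return list(range(n_snps))


haps = {"0000": 0.45, "0011": 0.30, "1111": 0.20, "1010": 0.05}
print(select_htsnps(haps, min_freq=0.1))  # [0, 2]
```

Rare haplotypes below the cut-off may not be distinguished, which is one way information can be lost.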

35
Day 5: Isolated Populations
  • Advantages of Population Isolates
  • Disadvantages of Population Isolates
  • Ancestral Haplotype Reconstruction

36
Advantages of Population Isolates
  • Higher prevalence of some diseases
  • More inbreeding
  • More uniform genetic, environmental and cultural
    background
  • Good genealogical records
  • Easier to standardise phenotype definitions
  • Wider intervals of LD
  • Closer to HWE

37
Disadvantages of Population Isolates
  • Possibly fewer affected individuals
  • Difficult to replicate studies
  • Some markers not polymorphic
  • Genes mapped may be less important to the rest
    of humanity

38
Ancestral Haplotype Reconstruction
  • An LD mapping method for samples from population
    isolates
  • Quantifies chromosome sharing among individuals
    affected with a common phenotype
  • Equipped to deal with aetiologic heterogeneity