BIOSTAT 830 winter 2006 - PowerPoint PPT Presentation

1
BIOSTAT 830 winter 2006
  • Association studies

2
Outline
  • Background.
  • Comparison to linkage studies.
  • Requirements and key components of WGA.
  • Issues and cautions of using WGA.
  • Case study of AMD.

3
Different study designs
  • The goal is to map the genes responsible for
    common diseases. There are two categories of
    study design.
  • Candidate gene studies: association,
    resequencing.
  • Genome-wide studies: linkage mapping, whole
    genome association study.

4
Linkage studies
  • Genome-wide: 1,000 microsatellite markers across
    the genome.
  • Very successful in mapping monogenic Mendelian
    diseases.
  • E.g., hemochromatosis, cystic fibrosis, Fanconi
    anemia.
  • Uses a technique called positional cloning.
  • Works well in locating highly penetrant variants
    that cause rare diseases.

5
Linkage study
  • Linkage studies use individual families where
    members are affected and attempt to demonstrate
    linkage between the occurrence of the disease and
    genetic markers (creates associations within
    families, but not among unrelated people)

6
Background
  • Linkage or pedigree analysis maps disease genes
    to within 1 cM at best.
  • Further refinement in location using family
    studies is difficult because recombinations are
    rarely observed, even within large pedigrees
    (Boehnke, 1994).
  • That leaves about 1 Mb of DNA to search, which is
    daunting. A method is needed that can narrow the
    region further.

7
Complex traits
  • Phenotype is determined by the sum total of,
    and/or interactions between, multiple genetic and
    environmental factors.
  • Studying inheritance of traits that show no clear
    Mendelian inheritance but cluster in families.
  • Usually a mix of genetic and environmental
    factors.
  • Several models:
  • Single gene modified by environment.
  • Several genes, each with a significant
    contribution (oligogenes).
  • Many genes, each making a small contribution
    (polygenes).
  • The last two are multifactorial inheritance.

8
Problems of complex diseases
  • Low heritability of complex traits.
  • Imprecise definition of phenotype.
  • Genetic heterogeneity:
  • Alleles at more than one locus can trigger a
    specific disease.
  • Microsatellite markers 10 cM apart are too sparse.
  • Reduced penetrance:
  • Some individuals with the predisposing genotype
    are unaffected.

9
Problems of complex diseases
  • Phenocopy
  • Disease is triggered by environmental factors in
    the absence of a predisposing genotype
  • Gene-gene interaction.
  • Gene-environment interaction.
  • Inadequately powered study designs.
  • Even if a linkage study is successful, candidate
    gene follow-up studies are often needed to locate
    the causal locus. The linkage region is typically
    10 cM (about 10 Mb).

10
Association Studies
[Figure: cases versus controls.]

11
Association studies
  • Based on populations and attempt to show an
    association between a particular allele and
    susceptibility to disease (a statistical
    statement about the co-occurrence of alleles or
    phenotypes).
  • In simplest form, compare the frequency of
    alleles or genotypes of a particular variant
    between disease cases and controls.

12
Association studies
  • Allele A is associated with disease D if people
    who have D also have A more (or less) often than
    would be predicted from individual frequencies of
    A and D in the population
  • E.g., HLA-DR4 is found in 36% of the UK population,
    but in 78% of people with rheumatoid arthritis.
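
The strength of the HLA-DR4 association can be summarized as an odds ratio. The sketch below assumes the two figures on the slide are percentages (36% and 78%) and, as a simplification, treats the general-population frequency as a stand-in for the frequency among controls.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

freq_population = 0.36  # HLA-DR4 frequency in the general UK population
freq_cases = 0.78       # HLA-DR4 frequency among rheumatoid arthritis patients

# Assumption: the population frequency approximates the control frequency.
odds_ratio = odds(freq_cases) / odds(freq_population)
print(f"odds ratio = {odds_ratio:.2f}")
```

An odds ratio well above 1 means carriers of the allele are over-represented among patients relative to the population.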

13
Association tests
  • Observe a 3 by 2 table of genotype counts:

          cases   controls
    AA    nAA     mAA
    Aa    nAa     mAa
    aa    naa     maa

  • Likelihood ratio test.
  • Comparing different models: no constraint,
    dominant, recessive, multiplicative.
  • Score test.
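
A minimal sketch of the likelihood ratio test for the unconstrained model, applied to the 3 by 2 genotype table. The function name and the counts are hypothetical and for illustration only. The statistic is compared with a chi-square distribution with 2 degrees of freedom, whose tail probability has the closed form exp(-x/2).

```python
import math

def g_test_3x2(cases, controls):
    """Likelihood ratio (G) test of independence for a 3x2 genotype table.

    cases, controls: counts for genotypes (AA, Aa, aa).
    Returns (G, p_value); G is referred to a chi-square with 2 df.
    """
    col_totals = [sum(cases), sum(controls)]
    n = sum(col_totals)
    g = 0.0
    for row in zip(cases, controls):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            if observed > 0:  # a zero cell contributes nothing to G
                expected = row_total * col_total / n
                g += 2.0 * observed * math.log(observed / expected)
    # For 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_value = math.exp(-g / 2.0)
    return g, p_value

# Hypothetical counts, for illustration only.
g, p = g_test_3x2(cases=(30, 50, 20), controls=(20, 50, 30))
print(f"G = {g:.3f}, p = {p:.4f}")
```

Constrained models (dominant, recessive, multiplicative) would collapse or score the genotype rows before testing, which reduces the degrees of freedom to 1.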

14
Other reasons for associations
  • Direct causation
  • Having allele A makes you susceptible to disease
    D (increases the likelihood)
  • Expect to see same allele A associated with
    disease in any population (bypasses common
    ancestor)
  • Natural selection
  • People with the disease may have a competitive
    advantage if they also carry allele A.
  • These are unlikely if the associated DNA is a
    variant in non-coding DNA.

15
Candidate gene association study
  • Relies on knowledge of the correct gene(s), based
    on biological hypotheses or on location near a
    linkage region.
  • Successes have been reported.
  • Review papers: Cardon et al. 2001; Tabor et al.
    2002; Hirschhorn et al. 2002.
  • The main problem is that this type of study is
    not comprehensive.

16
Genome-wide association study
  • Survey most of the genome for causal genetic
    variants, so there is no need to guess the
    identity of the causal gene. Unbiased and fairly
    comprehensive.
  • Future of genetic studies?
  • http://www.ncbi.nlm.nih.gov/WGA/

17
Requirements for WGA
  • Knowledge about common genetic variation and the
    ability to genotype a sufficiently comprehensive
    set of variants in a large patient sample.
  • dbSNP contains >10 million SNPs.
  • Genotyping technology has advanced and costs have
    fallen.
  • Now $0.01 per genotype; around $0.001 looks
    feasible, i.e., $500 for 500K genotypes.
  • Knowledge of LD patterns on a genome-wide scale.
  • Provided by HapMap.

18
Choices of markers
  • LD-based markers.
  • Indirect approach, one marker can serve as a
    proxy for many others.
  • Algorithms for this purpose:
  • LDselect: greedy (Carlson et al. 2004).
  • FESTA: exhaustive, almost (Qin et al. 2006).
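
To illustrate the greedy idea, here is a simplified sketch of LD-based tag selection. It is not the LDselect implementation: the function name, the r² threshold of 0.8, and the r² matrix are assumptions chosen for illustration. At each step it picks the marker that serves as a proxy for the most still-untagged markers.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedy LD binning: repeatedly pick the SNP that tags the most
    still-untagged SNPs at r^2 >= threshold.

    r2: symmetric matrix (list of lists) of pairwise r^2 values.
    Returns a list of (tag_index, sorted_bin_members).
    """
    untagged = set(range(len(r2)))
    bins = []
    while untagged:
        best_tag, best_bin = None, set()
        for i in sorted(untagged):
            covered = {j for j in untagged if j == i or r2[i][j] >= threshold}
            if len(covered) > len(best_bin):
                best_tag, best_bin = i, covered
        bins.append((best_tag, sorted(best_bin)))
        untagged -= best_bin
    return bins

# Hypothetical r^2 matrix for five SNPs: 0-1-2 in strong LD, 3-4 in strong LD.
r2 = [
    [1.0, 0.9, 0.85, 0.1, 0.0],
    [0.9, 1.0, 0.9, 0.2, 0.1],
    [0.85, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.95],
    [0.0, 0.1, 0.1, 0.95, 1.0],
]
print(greedy_tag_snps(r2))
```

With this toy matrix, two tags cover all five markers. A greedy choice is fast but not guaranteed to give the smallest possible tag set, which is the motivation for more exhaustive searches.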

19
Missense approach
  • Focus on missense SNPs only.
  • Also known as nonsynonymous mutations: point
    mutations in which a nucleotide change results in
    a different amino acid. This in turn can render
    the resulting protein nonfunctional.
  • For example, in sickle-cell disease, the 17th
    nucleotide of the gene for the beta chain of
    hemoglobin, found on chr 11, is changed so that
    the codon GAG (for glutamic acid) becomes GTG
    (which codes for valine), and the sixth amino
    acid is incorrectly substituted.
  • http://en.wikipedia.org/wiki/Missense_mutation
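
The arithmetic linking the 17th nucleotide to the sixth amino acid can be checked with a short sketch. It assumes positions are counted from the first codon of the mature protein, and the codon table holds only the two codons named on the slide.

```python
# Minimal codon table: only the two codons named on the slide.
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GTG": "Val"}

def locate(nucleotide_position):
    """Map a 1-based coding position to (codon number, 0-based offset in codon)."""
    codon_number = (nucleotide_position - 1) // 3 + 1
    offset = (nucleotide_position - 1) % 3  # 0, 1 or 2
    return codon_number, offset

codon_number, offset = locate(17)
normal = "GAG"
# Substitute T at the affected position within the codon.
mutant = normal[:offset] + "T" + normal[offset + 1:]
print(codon_number, offset, mutant,
      CODON_TO_AMINO_ACID[normal], "->", CODON_TO_AMINO_ACID[mutant])
```

Position 17 falls in the middle of codon 6, so a single A to T change turns GAG into GTG.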

20
Arguments-for
  • More likely to have functional consequences.
  • High proportion of missense mutations underlie
    Mendelian disorders.
  • However, this observation is subject to
    ascertainment bias, and the approach is
    implicitly biased in declaring association.
  • Only need to survey 30,000 to 60,000 SNPs
    genome-wide.
  • Botstein and Risch 2003.

21
Arguments-against
  • Alleles underlying complex traits typically have
    modest impact and are more likely to be
    non-coding regulatory variants.
  • E.g., the Thr17Ala coding polymorphism in CTLA4 is
    in strong LD with a non-coding regulatory variant,
    which is more strongly associated with autoimmune
    disease.
  • Adding SNPs in evolutionarily conserved
    non-coding regions brings the approach close to
    WGA.

22
A convenience-based approach
  • Select markers based not on LD or functional
    considerations but on convenience or logistics,
    e.g., equally spaced markers.
  • Genome coverage varies and is mostly inadequate.
    The least comprehensive option is the 400-1000
    markers used in linkage studies; one such marker
    covers only 20 kb.
  • Pre-selected set of markers, e.g., Illumina 300K
    and Affymetrix 500K.

23
Study design for WGA
  • Staged design for cost-efficiency and fewer false
    positives.
  • Andrew will address these issues in his guest
    lecture.

24
Control for multiple testing
  • Multiple testing is a serious issue in WGA.
  • Due to the large number of markers and the LD
    between them, the independence assumption is
    strongly violated. Bonferroni correction is
    punitively conservative.
  • Karen will tell us about other strategies.
  • A good alternative for defining significance
    thresholds is permutation testing, which
    empirically assesses the probability of observing
    an equal or more extreme result by chance.
  • Computationally intensive.
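
A minimal sketch of a permutation test for a multi-marker scan. The test statistic (difference in mean allele count between cases and controls), the function names, and the data are assumptions made for illustration. Case/control labels are shuffled, and the maximum statistic over all markers is recorded each time, so LD between markers is preserved.

```python
import random

def mean_allele_count_diff(genotypes, labels):
    """Absolute difference in mean allele count between cases (1) and controls (0)."""
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    controls = [g for g, y in zip(genotypes, labels) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def max_stat(markers, labels):
    """Largest statistic over all markers (the scan-wide maximum)."""
    return max(mean_allele_count_diff(m, labels) for m in markers)

def permutation_p_value(markers, labels, n_perm=1000, seed=0):
    """Empirical p-value for the observed maximum statistic."""
    rng = random.Random(seed)
    observed = max_stat(markers, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        if max_stat(markers, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 3 markers, 8 individuals, genotypes coded 0/1/2.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
markers = [
    [2, 2, 1, 2, 0, 0, 1, 0],
    [1, 0, 2, 1, 1, 2, 0, 1],
    [0, 1, 1, 0, 1, 0, 1, 1],
]
p = permutation_p_value(markers, labels)
print(f"empirical p = {p:.3f}")
```

The cost grows with the number of permutations times the number of markers, which is why the approach is computationally intensive at genome scale.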

25
Permutation test
26
Cost-efficient association designs: use founder
populations.
  • Founder populations are those that were derived
    recently (100 or fewer generations ago) from a
    limited pool of individuals, such as Finnish
    people, French-Canadians and Ashkenazi Jews.
  • Because of more extended LD, fewer markers are
    needed to survey the entire genome.
  • Rare variants on long shared haplotypes.
  • More isolated populations will be more
    homogeneous and therefore might have the
    advantage of a more consistent environment.

27
Cost-efficient association designs: use pooled
samples
  • Equal amounts of DNA from multiple individuals
    are mixed into a single well before genotyping.
  • Review paper: Sham et al., Nat Rev Genet 2002.
  • Reduces genotyping cost. Measures allele
    frequencies, not genotypes.
  • Requires high-throughput, accurate determination
    of allele frequencies.
  • Variants with purely recessive effects will be
    difficult to identify, since for them genotype
    frequencies are much more informative.
  • To study multiple traits, a different pool must
    be made for each trait, which reduces
    efficiency.
  • Gene-gene and gene-environment interactions
    cannot be studied without genotype data.

28
Population stratification
  • Most noticeable source of systematic bias.
  • The presence of multiple subgroups within a
    population that differ in disease prevalence,
    which leads to the over-representation of one or
    more subgroups among cases. When allele
    frequencies also differ between subgroups, false
    positive associations can ensue.
  • Simpson's paradox.

29
Examples
              first half      second half     whole season
  Player A    5/10 (0.500)    25/100 (0.250)  30/110 (0.273)
  Player B    36/100 (0.360)  2/10 (0.200)    38/110 (0.345)

              subpop A   subpop B   total
  Case
  Control

An association is observed at the population level
but not at the subpopulation level.
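
The reversal in the batting example can be verified directly from the hits/at-bats fractions on the slide. This is a small sketch; the variable and function names are illustrative.

```python
from fractions import Fraction

# (hits, at-bats) per half season, as given on the slide
player_a = {"first": (5, 10), "second": (25, 100)}
player_b = {"first": (36, 100), "second": (2, 10)}

def average(hits, at_bats):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)

def season(player):
    """Whole-season average: pool hits and at-bats across both halves."""
    hits = sum(h for h, _ in player.values())
    at_bats = sum(n for _, n in player.values())
    return Fraction(hits, at_bats)

for half in ("first", "second"):
    a, b = average(*player_a[half]), average(*player_b[half])
    print(f"{half} half: A = {float(a):.3f}, B = {float(b):.3f}")
print(f"whole season: A = {float(season(player_a)):.3f}, "
      f"B = {float(season(player_b)):.3f}")
```

Player A is ahead in each half yet behind over the whole season, because the two players' at-bats are distributed very differently across the halves. Subpopulations of unequal size in cases and controls play the same role in a stratified association study.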
30
Correct for population stratification
  • Detecting population substructure
  • STRUCTURE (Pritchard et al. 2000).
  • Correcting for population stratification:
  • Genomic control (Devlin and Roeder 1999).
  • Mild stratification might exist even in less
    admixed populations, which can be a problem when
    testing alleles with modest effects on disease.
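
A minimal sketch of the genomic control idea: estimate an inflation factor from the median of the 1-df chi-square statistics across markers, then divide every statistic by it. The function name and the input statistics are hypothetical, and flooring the factor at 1 is a common convention assumed here.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom (approx. 0.4549).
CHI2_1DF_MEDIAN = 0.4549

def genomic_control(chi2_stats):
    """Return (lambda, corrected statistics) for 1-df association statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # assumed convention: never inflate statistics
    return lam, [x / lam for x in chi2_stats]

# Hypothetical 1-df chi-square statistics from markers across the genome.
stats = [0.2, 0.5, 0.9, 1.3, 0.7, 3.1, 0.6, 12.4, 0.4, 1.0]
lam, corrected = genomic_control(stats)
print(f"lambda = {lam:.2f}")
```

The median is used because it is robust to the few markers that are truly associated, so the factor mostly reflects genome-wide inflation from stratification.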

31
Solutions
  • Precise assessment of population stratification
    is possible given the large number of markers
    typed in WGA.
  • Match cases and controls based on their
    genotypes. This loses some power, since the
    matching is unlikely to be one-to-one.
  • An alternative is to use family-based samples.

32
Family based design
  • Can control for the effects of shared
    environment.
  • Allow combined analysis of linkage and
    association, family based association tests.
  • Family-based samples are difficult to collect.
  • Sibship-based association studies are
    underpowered relative to case-control studies.
  • Adding living parents may introduce an
    age-of-onset bias towards younger patients when
    studying late-onset diseases.

33
Gene-gene interaction
  • Epistasis. Unfortunately, due to the massive
    number of hypotheses, fully powered,
    unconstrained scans for epistasis that account
    for multiple hypothesis testing might not be
    possible in the near future.
  • However, it is possible to detect the main
    effects. Hence an effective scan for epistasis
    involves searching for modest individual effects
    and then querying for interactions among them.
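
The size of the hypothesis space can be made concrete with a short calculation. The panel size of 500,000 SNPs, the 0.05 error rate, and the 1,000 SNPs kept after a main-effect screen are all assumptions for illustration.

```python
import math

n_snps = 500_000  # assumed panel size, e.g. a 500K chip
alpha = 0.05      # assumed family-wise error rate

# Unconstrained scan: every pair of SNPs is a hypothesis.
n_pairs = math.comb(n_snps, 2)
print(f"pairwise tests: {n_pairs:.3e}")
print(f"Bonferroni threshold per pair: {alpha / n_pairs:.1e}")

# Two-stage alternative: keep SNPs with modest main effects, then test pairs.
n_kept = 1_000    # hypothetical number of SNPs passing stage one
n_pairs_stage2 = math.comb(n_kept, 2)
print(f"pairwise tests after filtering: {n_pairs_stage2}")
```

Filtering on main effects first cuts the number of pairwise tests by several orders of magnitude, at the cost of missing interactions between loci that show no marginal effect.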

34
Real cases: AMD study
  • Regarded as the first real WGA.
  • Klein et al. 2005; Zareparsi et al. 2005.
  • Age-related macular degeneration.
  • 96 cases and 50 controls; controls were older
    than cases.
  • Genotyped with the Affymetrix 100K chip.

35
Klein et al. study
36
Admixture mapping
  • Admixture mapping is a method for localizing
    disease-causing genetic variants that differ in
    frequency across populations.
  • It is most advantageous to apply this approach
    to populations that have descended from a recent
    mix of two ancestral groups that had been
    geographically isolated for many tens of
    thousands of years, for example, African
    Americans.
  • The approach assumes that near a disease-causing
    gene there will be enhanced ancestry from the
    population that has the greater risk of getting
    the disease. Thus, if one can calculate the
    ancestry along the genome for an admixed sample
    set, one can use that to identify disease-causing
    gene variants.

Zhu et al. 2005, Reich et al. 2005
37
Homozygosity mapping
  • Homozygosity mapping is a rapid means of mapping
    autosomal recessive genes in consanguineous
    families by identifying chromosomal regions that
    show homozygous IBD segments in pooled samples.
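
As a simplified illustration, the sketch below scans ordered markers for the longest stretch where every affected individual is homozygous for the same allele. It works on individual genotypes rather than pooled samples, ignores marker spacing and allele frequencies, and does not establish identity by descent. The function name and data are hypothetical.

```python
def longest_shared_homozygous_run(genotypes):
    """genotypes: one list per affected individual; each entry is a pair of
    alleles at an ordered marker, e.g. ("A", "A").
    Returns (start, end) marker indices (inclusive) of the longest run where
    every individual is homozygous for the same allele, or None."""
    n_markers = len(genotypes[0])
    best, start = None, None
    for m in range(n_markers):
        first = genotypes[0][m]
        shared = first[0] == first[1] and all(ind[m] == first for ind in genotypes)
        if shared:
            if start is None:
                start = m
            if best is None or m - start > best[1] - best[0]:
                best = (start, m)
        else:
            start = None  # run is broken; restart at the next shared marker
    return best

# Hypothetical genotypes for three affected relatives at five ordered markers.
affected = [
    [("A", "G"), ("C", "C"), ("T", "T"), ("G", "G"), ("A", "C")],
    [("A", "A"), ("C", "C"), ("T", "T"), ("G", "G"), ("C", "C")],
    [("G", "G"), ("C", "C"), ("T", "T"), ("G", "G"), ("A", "A")],
]
print(longest_shared_homozygous_run(affected))
```

In this toy data the shared homozygous stretch spans the second to fourth markers, which would be the candidate region for a recessive gene.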

38
References
  • Hirschhorn and Daly, Nat Rev Genet 2005.
  • Wang et al., Nat Rev Genet 2005.

39
Acknowledgement
  • Terry Speed