Title: BIOSTAT 830 winter 2006
1BIOSTAT 830 winter 2006
2Outline
- Background.
- Comparison to linkage studies.
- Requirements and key components of WGA.
- Issues and cautions of using WGA.
- Case study of AMD.
3Different study designs
- The goal is to map the genes responsible for
common diseases. Two categories:
- Candidate gene studies.
- Association,
- Resequencing.
- Genome-wide studies.
- Linkage mapping,
- Whole genome association study.
4Linkage studies
- Genome-wide, 1,000 microsatellite markers across
the genome.
- Very successful in mapping monogenic Mendelian
diseases.
- Examples: hemochromatosis, cystic fibrosis, Fanconi
anemia.
- Uses a technique called positional cloning.
- Works well in locating highly penetrant variants
that cause rare diseases.
5Linkage study
- Linkage studies use individual families where
members are affected and attempt to demonstrate
linkage between the occurrence of the disease and
genetic markers (creates associations within
families, but not among unrelated people)
6Background
- Linkage or pedigree analysis maps disease genes
to within 1 cM at best.
- Further refinement in location using family
studies is difficult because recombinations are
rarely observed even within large pedigrees
(Boehnke, 1994).
- That leaves about 1 Mb of DNA to search, which is
daunting. A method that can narrow the region
further is needed.
7Complex traits
- Phenotype is determined by the sum total of,
and/or interactions between, multiple genetic and
environmental factors.
- Studying inheritance of traits that show no clear
Mendelian inheritance but cluster in families.
- Usually a mix of genetic and environmental
factors.
- Several models:
- Single gene modified by environment.
- Several genes each with significant contributions
(oligogenes).
- Many genes each making a small contribution
(polygenes).
- The last two are multifactorial inheritance.
8Problems of complex diseases
- Low heritability of complex traits.
- Imprecise definition of phenotype.
- Genetic heterogeneity
- Alleles at more than one locus can trigger a
specific disease.
- Microsatellite markers 10 cM apart are too sparse.
- Reduced penetrance
- Individuals with a predisposing genotype are
unaffected.
9Problems of complex diseases
- Phenocopy
- Disease is triggered by environmental factors in
the absence of a predisposing genotype.
- Gene-gene interaction.
- Gene-environment interaction.
- Inadequately powered study designs.
- Even if a linkage study is successful, candidate
gene follow-up studies are often needed to locate
the causal locus. The linkage region is typically
10 cM (about 10 Mb).
10Association Studies
Cases
Controls
11Association studies
- Based on populations and attempt to show an
association between a particular allele and
susceptibility to disease (a statistical
statement about the co-occurrence of alleles or
phenotypes).
- In the simplest form, compare the frequency of
alleles or genotypes of a particular variant
between disease cases and controls.
12Association studies
- Allele A is associated with disease D if people
who have D also have A more (or less) often than
would be predicted from the individual frequencies
of A and D in the population.
- E.g., HLA-DR4 is found in 36% of the UK population,
but in 78% of people with rheumatoid arthritis.
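The strength of such an association is often summarized as an odds ratio. The sketch below computes one from the two HLA-DR4 carrier frequencies quoted above. Treating the general-population frequency as if it were the control frequency is an assumption made here for illustration, and the function name is hypothetical.

```python
def odds_ratio_from_frequencies(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the allele in cases divided by the odds in controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


# Carrier frequencies from the HLA-DR4 example. Using the population
# frequency (0.36) as the control frequency is an assumption of this sketch.
or_dr4 = odds_ratio_from_frequencies(0.78, 0.36)
print(f"Approximate odds ratio: {or_dr4:.2f}")
```

An odds ratio well above 1 says carriers are over-represented among patients; it does not by itself say why.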
13Association tests
- Observe a 3 by 2 table of genotype counts:

        cases   controls
  AA    n_AA    m_AA
  Aa    n_Aa    m_Aa
  aa    n_aa    m_aa

- Likelihood ratio test.
- Comparing different models: no constraint,
dominant, recessive, multiplicative.
- Score test.
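As a rough illustration, the likelihood ratio test under the unconstrained (2 degrees of freedom) genotype model can be sketched as follows. The counts are hypothetical, the function name is an assumption, and the dominant, recessive and multiplicative models are not covered here.

```python
import math


def genotype_lrt(cases, controls):
    """Likelihood ratio (G) statistic for a 3x2 genotype table.

    cases and controls hold counts for AA, Aa, aa. The null hypothesis is
    that genotype frequencies are the same in cases and controls.
    """
    total_cases, total_controls = sum(cases), sum(controls)
    total = total_cases + total_controls
    g = 0.0
    for n, m in zip(cases, controls):
        row_total = n + m
        for observed, col_total in ((n, total_cases), (m, total_controls)):
            expected = row_total * col_total / total
            if observed > 0:  # empty cells contribute nothing to G
                g += 2.0 * observed * math.log(observed / expected)
    # With 2 degrees of freedom the chi-square tail probability is exp(-x/2).
    p_value = math.exp(-g / 2.0)
    return g, p_value


# Hypothetical genotype counts (AA, Aa, aa).
g_stat, p_val = genotype_lrt(cases=[30, 20, 10], controls=[10, 20, 30])
print(f"G = {g_stat:.2f}, p = {p_val:.2e}")
```

The closed-form tail probability only holds for 2 degrees of freedom; a constrained model with 1 degree of freedom would need a general chi-square routine.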
14Other reasons for associations
- Direct causation
- Having allele A makes you susceptible to disease
D (increases the likelihood).
- Expect to see the same allele A associated with
the disease in any population (bypasses common
ancestor).
- Natural selection
- People with the disease may have a competitive
advantage if they also carry allele A.
- These are unlikely if the associated DNA is a
variant in non-coding DNA.
15Candidate gene association study
- Rely on knowledge of the correct gene(s), based
on biological hypotheses, or located near the
linkage region.
- Successes have been reported.
- Review papers: Cardon et al. 2001; Tabor et al.
2002; Hirschhorn et al. 2002.
- The main problem is that this type of study is
not comprehensive.
16Genome-wide association study
- Survey most of the genome for causal genetic
variants, so there is no need to guess the identity
of the causal gene. Unbiased and fairly
comprehensive.
- Future of genetic studies?
- http://www.ncbi.nlm.nih.gov/WGA/
17Requirements for WGA
- Knowledge about common genetic variation and the
ability to genotype a sufficiently comprehensive
set of variants in a large patient sample.
- dbSNP contains >10 million SNPs.
- Genotyping technology has advanced and costs have
fallen.
- Now about $0.01 per genotype; around $0.001 looks
feasible, i.e. $500 for 500K genotypes.
- Knowledge of LD patterns on a genome-wide scale.
- Provided by HapMap.
18Choices of markers
- LD-based markers.
- Indirect approach: one marker can serve as a
proxy for many others.
- Algorithms for this purpose:
- LDselect: greedy (Carlson et al. 2004).
- FESTA: exhaustive (almost) (Qin et al. 2006).
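To make the "one marker as a proxy for many others" idea concrete, here is a simplified greedy tag-SNP selection in the spirit of the greedy approach. It is a sketch, not the LDselect implementation: the pairwise r^2 matrix, the 0.8 threshold and the function name are all assumptions for illustration.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs greedily from a symmetric pairwise r^2 matrix.

    At each step, choose the untagged SNP that covers the most untagged SNPs
    (r^2 >= threshold, every SNP covers itself), then mark those as tagged.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        def coverage(i):
            return sum(1 for j in untagged if i == j or r2[i][j] >= threshold)

        # sorted() makes ties resolve to the lowest index.
        best = max(sorted(untagged), key=coverage)
        tags.append(best)
        untagged -= {j for j in untagged if j == best or r2[best][j] >= threshold}
    return tags


# Hypothetical r^2 values for four SNPs: SNP 0 is in strong LD with 1 and 2.
example_r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.50, 0.10],
    [0.85, 0.50, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]
print(greedy_tag_snps(example_r2))
```

A greedy pass is fast but does not guarantee the smallest possible tag set, which is the motivation for (almost) exhaustive alternatives.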
19Missense approach
- Focus on missense SNPs only.
- Missense (nonsynonymous) mutations are point
mutations in which a nucleotide change results in
a different amino acid. This in turn can render
the resulting protein nonfunctional.
- For example, in sickle-cell disease, the 17th
nucleotide of the gene for the beta chain of
hemoglobin, found on chr 11, is erroneously
changed, turning the codon GAG (for glutamic acid)
into GTG (which codes for valine), so the sixth
amino acid is incorrectly substituted.
- http://en.wikipedia.org/wiki/Missense_mutation
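The codon change in the sickle-cell example can be checked with a few lines of code. The table below holds only the three codons needed here, not the full genetic code, and the function name is hypothetical.

```python
# Partial codon table: just enough for the sickle-cell example plus one
# synonymous comparison. A real analysis would use the full genetic code.
CODON_TO_AMINO_ACID = {"GAG": "Glu", "GAA": "Glu", "GTG": "Val"}


def classify_substitution(ref_codon, alt_codon):
    """Return (reference amino acid, new amino acid, kind of change)."""
    ref_aa = CODON_TO_AMINO_ACID[ref_codon]
    alt_aa = CODON_TO_AMINO_ACID[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "missense"
    return ref_aa, alt_aa, kind


print(classify_substitution("GAG", "GTG"))  # the sickle-cell change
print(classify_substitution("GAG", "GAA"))  # a silent change, for contrast
```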
20Arguments-for
- More likely to have functional consequences.
- A high proportion of missense mutations underlie
Mendelian disorders.
- However, this is subject to ascertainment bias,
and implicitly biased in declaring association.
- Only need to survey 30,000-60,000 SNPs
genome-wide.
- Botstein and Risch 2003.
21Arguments-against
- Alleles that underlie complex traits typically
have modest impact and are more likely to be
non-coding regulatory variants.
- E.g., the Thr17Ala coding polymorphism in CTLA4 is
in strong LD with a non-coding regulatory variant,
which is more strongly associated with autoimmune
disease.
- Adding SNPs in evolutionarily conserved
non-coding regions brings this approach close to
WGA.
22A convenience-based approach
- Select markers based not on LD or functional
considerations but on convenience or logistic
considerations, e.g., equally spaced markers.
- Genome coverage varies and is mostly inadequate.
The least comprehensive option is the 400-1000
markers used in linkage studies. One such marker
only covers 20 kb.
- Pre-selected sets of markers, e.g., Illumina 300K
and Affymetrix 500K.
23Study design for WGA
- Staged design for cost-efficiency and fewer false
positives.
- Andrew will address these issues in his guest
lecture.
24Control for multiple testing
- Multiple testing is a serious issue in WGA.
- Due to the large number of markers and LD between
them, the independence assumption is strongly
violated. Bonferroni correction is punitively
conservative.
- Karen will tell us about other strategies.
- A good alternative for defining significance
thresholds is permutation testing, which
empirically assesses the probability of observing
an equally or more extreme result by chance.
- Computationally intensive.
25Permutation test
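A minimal single-marker sketch of the idea: shuffle the case/control labels many times and count how often the shuffled statistic is at least as extreme as the observed one. The test statistic (difference in mean allele count), the number of permutations, the seed and the function names are assumptions for illustration. A genome-wide version would typically record the most extreme statistic across all markers in each permutation, which is where the computational cost comes from.

```python
import random


def allele_count_difference(genotypes, is_case):
    """Absolute difference in mean allele count (0, 1 or 2) between groups."""
    case_vals = [g for g, c in zip(genotypes, is_case) if c]
    ctrl_vals = [g for g, c in zip(genotypes, is_case) if not c]
    return abs(sum(case_vals) / len(case_vals) - sum(ctrl_vals) / len(ctrl_vals))


def permutation_p_value(genotypes, is_case, n_perm=1000, seed=0):
    """Empirical p-value from shuffling case/control labels."""
    rng = random.Random(seed)
    observed = allele_count_difference(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance so floating-point ties count as "as extreme".
        if allele_count_difference(genotypes, labels) >= observed - 1e-12:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 cases all homozygous for the allele, 20 controls without it.
genotypes = [2] * 20 + [0] * 20
is_case = [True] * 20 + [False] * 20
print(permutation_p_value(genotypes, is_case))
```

The smallest p-value this can report is 1 / (n_perm + 1), so genome-wide thresholds need correspondingly many permutations.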
26Cost-efficient association designs: use founder
populations
- Founder populations are those that have been
recently derived (100 or fewer generations ago)
from a limited pool of individuals, such as
Finnish people, French-Canadians and Ashkenazi
Jews.
- Because of more extended LD, fewer markers are
needed to survey the entire genome.
- Rare variants on long shared haplotypes.
- More isolated populations will be more
homogeneous and therefore might have the
advantage of a more consistent environment.
27Cost-efficient association designs: use pooled
samples
- Equal amounts of DNA from multiple individuals are
mixed into a single well before genotyping.
- Review paper: Sham et al. Nat Rev Gen 2002.
- Reduces genotyping cost. Measures allele
frequencies, not genotypes.
- Requires high-throughput, accurate determination
of allele frequencies.
- Variants with pure recessive effects will be
difficult to identify, since genotype frequencies
would be much more informative for them.
- To study multiple traits, a different pool must
be made for each trait, which reduces
efficiency.
- Gene-gene and gene-environment interactions cannot
be studied without genotype data.
28Population stratification
- Most noticeable source of systematic bias.
- The presence of multiple subgroups within a
population that differ in disease prevalence,
which leads to the over-representation of one or
more subgroups in cases. When allele frequencies
in different subgroups differ, false positive
associations can ensue.
- Simpson's paradox.
29Examples
            first half       second half      whole season
Player A    4/10 (0.400)     25/100 (0.250)   29/110 (0.264)
Player B    35/100 (0.350)   2/10 (0.200)     37/110 (0.336)

            subpop A   subpop B   total
Case
Control

Observe the association at the population level,
but not at the subpopulation level.
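The same effect can be reproduced with a small case-control example. All counts below are hypothetical and chosen so that the allele is unrelated to disease within each subpopulation, while the two subpopulations differ in both allele frequency and disease prevalence. The function name is an assumption.

```python
def odds_ratio_from_counts(case_carriers, case_noncarriers,
                           ctrl_carriers, ctrl_noncarriers):
    """Odds ratio for carrying the allele, cases versus controls."""
    return (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)


# Hypothetical counts, ordered as:
# (case carriers, case non-carriers, control carriers, control non-carriers)
subpop_a = (64, 16, 16, 4)   # allele common (80%), disease common
subpop_b = (4, 16, 16, 64)   # allele rare (20%), disease rare
pooled = tuple(a + b for a, b in zip(subpop_a, subpop_b))

print("subpop A:", odds_ratio_from_counts(*subpop_a))
print("subpop B:", odds_ratio_from_counts(*subpop_b))
print("pooled:  ", odds_ratio_from_counts(*pooled))
```

Within each subpopulation the odds ratio is 1, yet pooling the samples produces a clear apparent association, purely because cases come disproportionately from the subpopulation where the allele is common.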
30Correct for population stratification
- Detecting population substructure
- STRUCTURE (Pritchard et al. 2000).
- Correcting for population stratification
- Genomic control (Devlin and Roeder 1999).
- Mild stratification might exist in less admixed
populations, which can be a problem when testing
alleles with modest effects on disease.
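The core of genomic control can be sketched in a few lines: estimate an inflation factor from the median of the 1-df chi-square statistics across many markers, then divide every statistic by it. This is a simplified illustration rather than the published procedure in full; flooring the factor at 1 is a common convention adopted here as an assumption, and the input statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def genomic_control(chi2_stats):
    """Return (inflation factor, adjusted statistics) for 1-df chi-square tests."""
    inflation = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    inflation = max(1.0, inflation)  # assumption: never inflate the statistics
    return inflation, [x / inflation for x in chi2_stats]


# Hypothetical association statistics from markers across the genome.
lam, adjusted = genomic_control([0.2, 0.9098, 3.0])
print(f"inflation factor = {lam:.3f}")
print([round(x, 3) for x in adjusted])
```

In practice the median is taken over thousands of markers, most of which are assumed to be unassociated with the disease.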
31Solutions
- Precise assessment of population stratification
is possible given the large number of markers
typed in WGA.
- Match cases and controls based on their
genotypes. This will lose some power since the
matching is not likely to be one-to-one.
- An alternative is to use family-based samples.
32Family based design
- Can control for the effects of shared
environment.
- Allows combined analysis of linkage and
association, using family-based association tests.
- Family-based samples are difficult to collect.
- Sibship-based association studies are
underpowered relative to case-control studies.
- Adding living parents may introduce an
age-of-onset bias towards younger patients when
studying late-onset diseases.
33Gene-gene interaction
- Epistasis. Unfortunately, due to the massive
number of hypotheses, fully powered, unconstrained
scans for epistasis that account for multiple
hypothesis testing might not be possible in the
near future.
- However, it is possible to detect the main
effects. Hence an effective scan for epistasis
involves searching for modest individual effects
and then querying for interactions among them.
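The scale of the problem, and the two-stage shortcut, can be illustrated as follows. The marker names, p-values and the 0.01 screening threshold are hypothetical, as are the function names; `math.comb` requires Python 3.8 or later.

```python
import math
from itertools import combinations


def n_pairwise_tests(n_markers):
    """Number of distinct marker pairs in an unconstrained two-locus scan."""
    return math.comb(n_markers, 2)


def two_stage_pairs(marginal_p, threshold):
    """Stage 1: keep markers with a modest main effect.
    Stage 2: list only the pairs among the retained markers."""
    kept = sorted(name for name, p in marginal_p.items() if p <= threshold)
    return list(combinations(kept, 2))


print(n_pairwise_tests(500_000))  # pairs in a full scan of a 500K panel

# Hypothetical single-marker p-values.
marginal = {"rs1": 0.001, "rs2": 0.2, "rs3": 0.004, "rs4": 0.0005}
print(two_stage_pairs(marginal, threshold=0.01))
```

The trade-off is that pairs in which neither marker shows any main effect are never tested.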
34Real cases: AMD study
- Regarded as the first real WGA.
- Klein et al. 2005; Zareparsi et al. 2005.
- Age-related macular degeneration.
- 96 cases and 50 controls; controls older than
cases.
- Genotyped with the Affymetrix 100K chip.
35Klein et al. study
36Admixture mapping
- Admixture mapping is a method for localizing
disease-causing genetic variants that differ in
frequency across populations.
- It is most advantageous to apply this approach to
populations that have descended from a recent mix
of two ancestral groups that had been
geographically isolated for many tens of
thousands of years, for example, African
Americans.
- The approach assumes that near a disease-causing
gene there will be enhanced ancestry from the
population that has greater risk of getting the
disease. Thus if one can calculate the ancestry
along the genome for an admixed sample set, one
could use that to identify disease-causing gene
variants.
Zhu et al. 2005, Reich et al. 2005
37Homozygosity mapping
- Homozygosity mapping is a rapid means of mapping
autosomal recessive genes in consanguineous
families by identifying chromosomal regions that
show homozygous IBD segments in pooled samples.
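A toy version of the search step: scan ordered markers for stretches where every affected individual is homozygous for the same allele. This sketch uses identity by state as a stand-in for IBD and works on individual genotype calls rather than pooled samples; the genotype coding ("AA", "AB", "BB"), the minimum run length and the function name are assumptions for illustration.

```python
def shared_homozygous_runs(genotypes_by_person, min_length=3):
    """Return (start, end) marker index ranges where all individuals are
    homozygous for the same allele at every marker in the range."""
    n_markers = len(genotypes_by_person[0])
    shared = []
    for k in range(n_markers):
        calls = {person[k] for person in genotypes_by_person}
        shared.append(len(calls) == 1 and next(iter(calls)) in ("AA", "BB"))

    runs, start = [], None
    # The trailing False closes a run that extends to the last marker.
    for k, flag in enumerate(shared + [False]):
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            if k - start >= min_length:
                runs.append((start, k - 1))
            start = None
    return runs


# Hypothetical genotypes for two affected relatives at six ordered markers.
affected = [
    ["AB", "AA", "BB", "AA", "AB", "AA"],
    ["AA", "AA", "BB", "AA", "AA", "AB"],
]
print(shared_homozygous_runs(affected))
```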
38References
- Hirschhorn and Daly Nat Rev Gen 2005.
- Wang et al. Nat Rev Gen 2005.
39Acknowledgement