Molecular - PowerPoint PPT Presentation

Provided by: johnw6

Transcript and Presenter's Notes

Title: Molecular


1
Molecular Genetic Epi 217: Association Studies
Direct
  • John Witte

2
Muddy Points?
  • Linkage results are reported as a maximum lod score
    for a given value of theta.
  • The first indicates the strength of linkage; the second, how
    far away the marker is from the potentially causal locus.
  • Reading: Cordell and Clayton.

3
Linkage vs. Association Studies
Rare variants: larger effects
Common variants: modest effects (CD-CV hypothesis)
Source: ROCHE Genetic Education (www)
4
Association Studies
  • Use of association studies is rapidly expanding,
    reflecting a number of laudable properties,
    including their
  • Ease, since one need not collect large pedigrees
    and
  • Potential for being more powerful than
    conventional linkage-based approaches.

5
Linkage vs. Association
Risch & Merikangas, Science 1996
6
Association Study Approaches
  • Candidate genes
      • Functional
      • All common variants
  • All common variants in genome (GWAS)
  • All SNPs in genome
      • Expensive

7
Direct and Indirect Association
Direct Association
Indirect Association
Ability to undertake indirect association depends
on the LD / correlation among measured and
unmeasured variants (i.e., tagging and
coverage).
8
Control Selection
  • A critical aspect of association studies is that
    controls should be selected from the cases'
    source population.
  • That is, controls should be those individuals
    who, if they were diseased, would become cases.

9
Population Stratification
  • Confounding bias that may occur if one's sample
    comprises sub-populations with different
  • allele frequencies (?) and
  • disease rates (RpR)
  • Cases are more likely than controls to arise from
    the sub-population with the higher baseline
    disease rate.
  • Cases and controls will have different allele
    frequencies regardless of whether the locus is
    causal.
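The mechanism in the bullets above can be checked with a little arithmetic. The following is a minimal sketch, assuming two hypothetical sub-populations whose shares, disease rates, and allele frequencies are invented for illustration; the allele is given no causal effect at all.

```python
def allele_freq_by_status(subpops):
    """Expected allele frequency among cases and among controls.

    subpops: list of (population share, disease rate, allele frequency).
    The allele has no effect on disease within any sub-population.
    """
    case_weight = sum(share * rate for share, rate, _ in subpops)
    control_weight = sum(share * (1 - rate) for share, rate, _ in subpops)
    freq_cases = sum(s * rate * f for s, rate, f in subpops) / case_weight
    freq_controls = sum(s * (1 - rate) * f for s, rate, f in subpops) / control_weight
    return freq_cases, freq_controls


# Hypothetical values: equal-sized sub-populations, the first with both a
# higher baseline disease rate and a higher allele frequency.
subpops = [(0.5, 0.10, 0.60), (0.5, 0.02, 0.20)]
cases, controls = allele_freq_by_status(subpops)
print(f"cases: {cases:.3f}, controls: {controls:.3f}")
```

Cases are drawn disproportionately from the high-rate sub-population, so their allele frequency is higher even though the locus is not causal.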

10
Example of Population Stratification
Cardon & Palmer, 2003
11
Bias Due to Popln. Stratification
Witte et al. Am J Epidemiol 1999
12
Family-Based Association Studies
(Figure: pedigrees showing family-based designs that use siblings, parents, or cousins; genotypes are labeled G)
13
Transmission Disequilibrium Test (TDT)
  • Transmitted alleles vs. non-transmitted alleles

(Figure: parent-child trio with genotypes M1 M2, M2 M2, and M1 M2)
14
TDT
  • Transmitted alleles vs. non-transmitted alleles

$\mathrm{TDT} = (n_{12} - n_{21})^2 / (n_{12} + n_{21})$
Asymptotically $\chi^2$ with 1 degree of freedom
15
TDT
  • For this one Trio

$\mathrm{TDT} = (1 - 0)^2 / (1 + 0) = 1$
p-value = 0.32
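The one-trio calculation can be reproduced with a short sketch. The function name and arguments are illustrative; n12 and n21 are taken to be the two discordant transmission counts from heterozygous parents, and the p-value uses the asymptotic chi-square distribution with 1 degree of freedom.

```python
import math


def tdt(n12, n21):
    """Return the TDT statistic and its asymptotic p-value (chi-square, 1 d.f.)."""
    if n12 + n21 == 0:
        raise ValueError("no informative transmissions")
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # For 1 d.f., P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p = tdt(1, 0)  # the single trio from the slide
print(stat, round(p, 2))
```

With a single informative transmission the asymptotic approximation is of course crude; the sketch only shows how the statistic and p-value are formed.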
16
Comparison of Designs
  • Family-based designs can be less efficient than
    population-based designs.

Design            Rare Recessive   Common       Rare Dominant
                  (High Risk)      (Low Risk)   (High Risk)
Population-based  100              100          100
Case-sibling      69               51           50
Case-cousin       97               88           88
TDT               231              102          101
Witte et al. Am J Epidemiol 1999
  • Further, family-based designs can require
    more recruitment efforts.
  • How about extending the designs to include
    unrelateds?

17
Genomic Control
  • Use population-based design, but incorporate into
    analysis genomic information to adjust for
    population stratification.
  • Genomic control adjusts test statistics for
    outliers due to population stratification.
  • Use unlinked genetic markers.

18
Genomic Control
  • For the gene(s) of interest, alter the test
    statistic(s) from the case-control comparison:
  • $\chi^2_{\text{new}} = \chi^2 / \lambda$
  • where
  • $\lambda = \text{mean}(\chi^2_1, \ldots, \chi^2_k)$
  • or
  • $\lambda = \text{median}(\chi^2_1, \ldots, \chi^2_k) / 0.456$
  • $1, \ldots, k$ index the $\chi^2$ tests for the unlinked
    markers.
  • (Devlin & Roeder, 1999; Reich & Goldstein, 2000)
  • That is, one decreases the test statistic by a
    factor ($\lambda$) that reflects stratification in the
    population.
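A minimal sketch of this adjustment follows, dividing the candidate test statistic by a factor estimated from unlinked markers. The function name and the list of unlinked-marker statistics are hypothetical; the divisor 0.456 is the value given on the slide.

```python
import statistics


def genomic_control(chi2_candidate, chi2_unlinked, use_median=True):
    """Adjust a candidate chi-square statistic using unlinked markers.

    Returns (adjusted statistic, inflation factor).
    """
    if use_median:
        factor = statistics.median(chi2_unlinked) / 0.456
    else:
        factor = statistics.mean(chi2_unlinked)
    return chi2_candidate / factor, factor


# Hypothetical chi-square statistics from five unlinked markers.
unlinked = [0.2, 0.5, 0.684, 1.1, 3.0]
adjusted, factor = genomic_control(6.0, unlinked)
print(round(factor, 3), round(adjusted, 3))
```

This sketch applies the factor as computed. Some implementations do not let the factor fall below 1, so that test statistics are never inflated; that choice is left out here.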

19
Continuum of Assoc Study Designs
Population-based
Ethnicity Matched
Structured Assoc
Family-based
Population Stratification
Overmatching
(Bias versus ... efficiency)
  • ? Sharing of genes & envt.
  • Efficiency
  • Also, recruitment issues

20
Candidate Gene Studies
  • Selection of candidates: Linkage regions?
    Biological support?
  • I am interested in a candidate gene and have
    samples ready to study. What SNPs do I genotype?

21
Candidate Gene: Where do I Start?
  • Location
  • What chromosome? What position on the chr?
  • Exons/UTR
  • How many exons? UTR regions?
  • Size
  • How large is the gene?

22
Candidate Gene Example: MTHFR (thanks to I. Cheng)
  • UCSC Genome Browser
  • http://genome.ucsc.edu/cgi-bin/hgGateway

23
Candidate Gene Example: MTHFR
(Figure: MTHFR gene with the 3' and 5' ends labeled)
24
SNP Picking: Things to Consider
  • Validation: What is the quality of the SNPs?
  • Informativeness: Are these SNPs informative in my
    population? How common are they? Location?
  • Potentially Functional: Do these SNPs have a
    potential biological impact? Missense variants?
  • Previously Associated: Have previous studies
    found SNPs in the candidate gene associated with
    the outcome?

25
SNP Picking: Database Resources
  • Validation: dbSNP
  • http://www.ncbi.nlm.nih.gov/projects/SNP/
  • Informative: dbSNP
  • http://www.ncbi.nlm.nih.gov/projects/SNP/
  • Potentially Functional: dbSNP
  • http://www.ncbi.nlm.nih.gov/projects/SNP/
  • Previously Associated: PubMed/OMIM
  • http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed
  • http://0-www.ncbi.nlm.nih.gov.library.vu.edu.au/entrez/query.fcgi?db=OMIM

26
SNP Picking: Other Resources
  • UCSC Genome Browser
  • http://genome.ucsc.edu/
  • SNPper
  • http://snpper.chip.org/
  • Seattle SNPs
  • http://pga.gs.washington.edu/
  • HapMap
  • http://www.hapmap.org/

27
SNP Picking: Validation
28
SNP Picking: Validation
29
SNP Picking: Validation
30
SNP Picking: Informative
31
SNP Picking: Potentially Functional
C677T
32
SNP Picking: Previously Associated
MTHFR Summary
  • Chromosome 1: 11,780,053-11,800,381
  • Size: 20,329 bp
  • Exons: 12
  • Potentially Functional
  • 5 missense, of which 3 have MAF > 5%
  • Previously Associated
  • 3 (C677T, A1298C, A2756G)

34
MTHFR SNPs
http://genome.ucsc.edu/cgi-bin/hgGateway
35
Analysis
Simple chi-square test comparing genotype
frequencies (2 d.f.). Called a model-free or
co-dominant analysis.
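As a minimal sketch of this test, the Pearson chi-square statistic for a 2 x 3 table of genotype counts can be computed directly. The counts below are hypothetical, and the function assumes every genotype is observed at least once (otherwise an expected count of zero would cause a division error).

```python
import math


def genotype_chi2(cases, controls):
    """Pearson chi-square (2 d.f.) comparing genotype counts.

    cases, controls: counts for the three genotypes, in the same order.
    """
    total = sum(cases) + sum(controls)
    stat = 0.0
    for row in (cases, controls):
        for j, observed in enumerate(row):
            expected = sum(row) * (cases[j] + controls[j]) / total
            stat += (observed - expected) ** 2 / expected
    # For 2 d.f., P(chi-square > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2)
    return stat, p_value


# Hypothetical genotype counts (GG, GT, TT).
stat, p = genotype_chi2(cases=(30, 50, 20), controls=(50, 40, 10))
print(round(stat, 3), round(p, 4))
```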
36
Genetic Model
ORs depend on the genetic model (assuming positive association):
$R = r = 1$: not a risk allele
$R > r = 1$: recessive
$R = r > 1$: dominant
$R = r^2 > 1$: log additive
Genotype ORs: GG = 1, GT = r, TT = R
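The mapping from genetic model to genotype odds ratios can be written out as a small lookup. This is an illustrative sketch: the function name, the model labels, and the single `effect` parameter are assumptions, with GG as the reference genotype and T as the risk allele.

```python
def genotype_ors(model, effect):
    """Return the odds ratios (GG, GT, TT) under a genetic model.

    effect: the odds ratio greater than 1 that drives the model.
    """
    if model == "none":          # not a risk allele
        return (1.0, 1.0, 1.0)
    if model == "recessive":     # only TT carries increased risk
        return (1.0, 1.0, effect)
    if model == "dominant":      # GT and TT carry the same risk
        return (1.0, effect, effect)
    if model == "log_additive":  # each T allele multiplies the odds
        return (1.0, effect, effect ** 2)
    raise ValueError(f"unknown model: {model}")


for model in ("none", "recessive", "dominant", "log_additive"):
    print(model, genotype_ors(model, 1.5))
```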
37
Tests of association
  • If genetic model known
  • Collapse genotypes into 2x2 table, 1 d.f. test
  • Trend test for log additive
  • (Use logistic regression)
  • Rarely know genetic model
  • Use all three models (dom, rec, log additive)
  • Compare fit with the co-dominant (2d.f.) model
    (LR test)
  • Cannot use LR test to compare models with each
    other as not nested
  • Model with best fit and smallest P is best?
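The 1 d.f. tests mentioned above can be sketched as follows. Genotype counts are collapsed into a 2 x 2 table for the dominant or recessive model, and a Cochran-Armitage trend test with scores 0, 1, 2 stands in for the log-additive model. The helper names and the counts are hypothetical, and the slide's suggestion of logistic regression is not implemented here.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (1 d.f.) and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def collapse(counts, model):
    """Collapse (GG, GT, TT) counts into (reference, risk) groups."""
    gg, gt, tt = counts
    if model == "dominant":
        return gg, gt + tt
    if model == "recessive":
        return gg + gt, tt
    raise ValueError(f"unknown model: {model}")


def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 d.f.), computed as N times the squared
    correlation between case status and genotype score."""
    totals = [a + b for a, b in zip(cases, controls)]
    n, n_cases = sum(totals), sum(cases)
    mean_score = sum(t * x for t, x in zip(totals, scores)) / n
    sxx = sum(t * (x - mean_score) ** 2 for t, x in zip(totals, scores))
    sxy = sum(c * (x - mean_score) for c, x in zip(cases, scores))
    syy = n_cases * (n - n_cases) / n
    stat = n * sxy ** 2 / (sxx * syy)
    return stat, math.erfc(math.sqrt(stat / 2))


cases, controls = (30, 50, 20), (50, 40, 10)  # hypothetical (GG, GT, TT) counts
results = {}
for model in ("dominant", "recessive"):
    case_ref, case_risk = collapse(cases, model)
    ctrl_ref, ctrl_risk = collapse(controls, model)
    results[model] = chi2_2x2(case_risk, case_ref, ctrl_risk, ctrl_ref)
results["log_additive"] = trend_test(cases, controls)
for model, (stat, p) in results.items():
    print(model, round(stat, 3), round(p, 4))
```

Running all three models on the same data involves multiple testing, which is why picking the smallest p-value is posed above as an open question.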

38
(No Transcript)
39
Molecular Genetic Epi 217: Association Studies
Indirect
  • John Witte

40
Linkage Disequilibrium: Like Shuffling a Deck of
Cards
41
Too many MTHFR SNPs. Solution: Tag SNP Selection
  • SNPs are correlated (aka Linkage Disequilibrium)

Pairwise tagging: SNP 1, SNP 3, SNP 6 (3 tags in total)
Test for association: SNP 1, SNP 3, SNP 6
Carlson et al. (2004) AJHG 74:106
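Pairwise tagging can be illustrated with a simple greedy selection: repeatedly pick the SNP that captures the most still-untagged SNPs above an r2 threshold. This is a sketch in the spirit of greedy pairwise tagging, not necessarily the exact published algorithm; the LD bins, the r2 values, and the 0.8 threshold are all hypothetical.

```python
def captured_by(snp, untagged, r2, threshold):
    """Untagged SNPs captured by `snp` (itself, or pairwise r2 above threshold)."""
    return {t for t in untagged if t == snp or r2[snp][t] > threshold}


def greedy_tags(r2, threshold=0.8):
    """Greedy pairwise tag selection from a symmetric dict of pairwise r2."""
    snps = sorted(r2)
    untagged = set(snps)
    tags = []
    while untagged:
        # max() keeps the first SNP (in sorted order) among ties.
        best = max(snps, key=lambda s: len(captured_by(s, untagged, r2, threshold)))
        tags.append(best)
        untagged -= captured_by(best, untagged, r2, threshold)
    return tags


# Hypothetical LD structure: r2 = 0.9 within a bin, 0.1 between bins.
bins = [["SNP1", "SNP2"], ["SNP3", "SNP4", "SNP5"], ["SNP6"]]
bin_of = {snp: i for i, members in enumerate(bins) for snp in members}
r2 = {
    a: {b: 1.0 if a == b else (0.9 if bin_of[a] == bin_of[b] else 0.1) for b in bin_of}
    for a in bin_of
}
tags = greedy_tags(r2)
print(sorted(tags))
```

With these invented bins, six SNPs reduce to three tags, and only the tags would need to be genotyped and tested.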
42
Tagging SNPs
Existence of haplotype blocks across genome
High LD → some SNPs redundant → tagging SNPs
recover the majority of information
43
Coverage: Measurement Error in TagSNPs
44
Common Measures of Coverage
  • Threshold Measures
  • e.g., 73% of SNPs in the complete set are in LD
    with at least one SNP in the genotyping set at
    $r^2 > 0.8$
  • Average Measures
  • e.g., Average maximum $r^2 = 0.84$
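Both kinds of coverage measure can be computed from a table of pairwise r2 values. The sketch below assumes the complete set includes the genotyped SNPs themselves (each counted as perfectly captured); the SNP names and r2 values are hypothetical.

```python
def coverage(r2, genotyped, threshold=0.8):
    """Return (threshold coverage, average maximum r2) for a genotyping set.

    r2: symmetric nested dict of pairwise r2 over the complete SNP set.
    """
    max_r2 = {
        snp: max(1.0 if snp == g else r2[snp][g] for g in genotyped)
        for snp in r2
    }
    fraction = sum(value > threshold for value in max_r2.values()) / len(max_r2)
    average = sum(max_r2.values()) / len(max_r2)
    return fraction, average


# Hypothetical pairwise r2 among four SNPs; only SNP "A" is genotyped.
pairs = {("A", "B"): 0.9, ("A", "C"): 0.85, ("A", "D"): 0.3,
         ("B", "C"): 0.8, ("B", "D"): 0.2, ("C", "D"): 0.25}
r2 = {snp: {} for snp in "ABCD"}
for (a, b), value in pairs.items():
    r2[a][b] = r2[b][a] = value

fraction, average = coverage(r2, genotyped=["A"])
print(fraction, round(average, 4))
```

Here three of the four SNPs exceed the threshold, while the average maximum r2 is pulled down by the one poorly captured SNP, which shows how the two measures can differ.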

45
Coverage and Sample Size
  • Sample size required for Direct Association: $n$
  • Sample size for Indirect Association:
  • $n^* = n / r^2$
  • For $r^2 = 0.8$, increase is 25%
  • For $r^2 = 0.5$, increase is 100%
  • But $n$ (and power) not a simple function of
  • 1 / (threshold $r^2$) or
  • 1 / (average maximum $r^2$).

Jorgenson & Witte, AJHG 2006
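The single-pair relationship between r2 and sample size is easy to check numerically. This sketch assumes one tag SNP with a known pairwise r2 to the causal variant; the function names and the direct sample size of 1000 are illustrative. As noted above, it should not be applied to a threshold or average-maximum r2 for a whole SNP set.

```python
def indirect_sample_size(n_direct, r2):
    """Sample size needed when testing a tag SNP with pairwise r2 to the causal SNP."""
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


def percent_increase(r2):
    """Percentage increase in sample size relative to direct association."""
    return (1 / r2 - 1) * 100


for r2 in (0.8, 0.5):
    print(r2, indirect_sample_size(1000, r2), round(percent_increase(r2)))
```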
46
Tag SNPs: Database Resources
http://www.hapmap.org
http://gvs.gs.washington.edu/GVS/index.jsp
47
The HapMap Project
  • Initial Goal
  • 600,000 SNPs for indirect association studies
  • LD information between SNPs
  • Phase 1: 1 million SNPs
  • Phase 2: additional 2.9 million SNPs

48
HapMap
  • SNPs from dbSNP were genotyped
  • Looked for 1 every 5kb
  • SNP Validation
  • Polymorphic
  • Frequency
  • Linkage Disequilibrium Estimation
  • LD tagging SNPs

49
HapMap
  • 270 subjects
  • 45 Chinese
  • 45 Japanese
  • 90 Yoruban and 90 European-American
  • 30 Trios
  • 2 parents, 1 child

50
Tag SNPs: HapMap
51
Tag SNPs: HapMap
52
Tag SNPs: HapMap, Haploview
http://www.broad.mit.edu/mpg/haploview/
53
Tag SNPs: HapMap, Haploview
54
Tag SNPs: HapMap, Haploview
55
Tag SNPs: HapMap, Haploview
56
Tag SNPs: HapMap, Haploview
57
Tag SNPs: HapMap Summary
  • Identified 33 common MTHFR SNPs (MAF > 5%) among
    Caucasians
  • Forced in 3 potentially functional/previously
    associated SNPs
  • Identified tags based on pairwise tagging
  • 15 tag SNPs could capture all 33 MTHFR SNPs
    (mean $r^2$ = 97%)
  • Note: the number of SNPs required varies from gene
    to gene and from population to population

58
Genome-wide Association Studies (GWAS)
59
One- and Two-Stage GWA Designs
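(Figure: SNPs-by-samples grids. One-stage design: all SNPs 1, 2, 3, ..., M genotyped in all samples 1, 2, 3, ..., N. Two-stage design: all M SNPs genotyped in a proportion of the samples in Stage 1; a proportion of the markers genotyped in the remaining samples in Stage 2.)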
60
(Figure: SNPs-by-samples grids comparing the one-stage design with two-stage designs under replication-based analysis and joint analysis of Stages 1 and 2)
61
Multistage Designs
  • Joint analysis has more power than replication
  • p-value in Stage 1 must be liberal
  • Lower cost; do not gain power
  • http://www.sph.umich.edu/csg/abecasis/CaTS/index.html

62
Complex diseases
(Figure: diagram relating genetic susceptibility, physical activity, diet, obesity, hyperlipidemia, diabetes, hypertension, atherosclerosis, vulnerable plaques, and MI)
Complex diseases: many causes, many causal
pathways!
63
  • Pathways
  • Many websites / companies provide dynamic
    graphic models of molecular and biochemical
    pathways.
  • Example: BioCarta http://www.biocarta.com/
  • May be interested in potential joint and/or
    interaction effects of multiple genes in one
    pathway.

64
Interactions
  • The interdependent operation of two or more
    causes to produce or prevent an effect
  • Differences in the effects of one or more
    factors according to the level of the remaining
    factor(s)
  • Last, 2001

65
Why look for interactions?
  • Improve detection of genetic (& environmental)
    risks.
  • Understand etiology/biology
  • New hypotheses?
  • Diagnostics
  • Prevention and interventions

66
Dilution of effects
Gene A: OR = 1.5
67
Statistical vs. Biological Interactions
  • Not identical.
  • One hypothesizes biological interaction
  • But tests for statistical interaction
  • Does statistical evidence support our biological
    hypothesis?

68
Multiplicative vs. Additive Interactions
RER = relative excess risk
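The two scales can be contrasted with a short calculation. The transcript does not preserve the slide's formula, so this sketch assumes the standard definitions: on the additive scale, the relative excess risk due to interaction is RR_AB - RR_A - RR_B + 1, and on the multiplicative scale the interaction is the ratio RR_AB / (RR_A x RR_B). The relative risks are hypothetical.

```python
def interaction_measures(rr_a, rr_b, rr_ab):
    """Return (additive excess, multiplicative ratio) for two binary factors.

    rr_a, rr_b: relative risks for each factor alone (reference group = 1).
    rr_ab: relative risk for joint exposure.
    """
    additive_expected = rr_a + rr_b - 1      # no interaction on the additive scale
    multiplicative_expected = rr_a * rr_b    # no interaction on the multiplicative scale
    excess = rr_ab - additive_expected
    ratio = rr_ab / multiplicative_expected
    return excess, ratio


# Hypothetical relative risks: exactly multiplicative joint effect.
excess, ratio = interaction_measures(rr_a=2.0, rr_b=3.0, rr_ab=6.0)
print(excess, ratio)
```

In this example there is no interaction on the multiplicative scale (ratio of 1) but a positive departure from additivity, so whether an interaction is present depends on the scale chosen.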
69
Two possible causal pathways: additive and
multiplicative interaction for colorectal cancer
If factors are not known to act independently,
use multiplicative.
Brennan, P. Carcinogenesis 2002; 23:381-387
70
Analysis of Multiple Genes
  • Joint / Additive
  • Multiplicative
  • Increasing complexity

71
More Complex Modeling
  • Multifactor-dimensionality reduction
  • (Moore & Williams, Ann Med 2002)
  • Logic regression
  • (Kooperberg & Ruczinski, Genetic Epi 2005)
  • Multi-loci analysis
  • (Marchini, Donnelly, & Cardon, Nat Genet 2005)
  • Bayesian epistasis association mapping
  • (Zhang & Liu, Nat Genet 2007)

72
Pathway Analysis
  • Wang et al. (AJHG 2007 in press)
  • Calculate SNP associations.
  • Assign each gene the min association p-value
    for typed genic SNPs.
  • Test if genes within particular pathways have
    disproportionate number of SNPs with high max
    p-values.
  • Such candidate genes high priority.
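The gene-level step of this procedure, assigning each gene the minimum p-value among its typed genic SNPs, can be sketched as below. The SNP and gene identifiers and the p-values are hypothetical, and the pathway-level enrichment test itself is not implemented here.

```python
def gene_min_p(snp_pvalues, snp_to_gene):
    """Assign each gene the minimum association p-value of its typed SNPs."""
    gene_p = {}
    for snp, p in snp_pvalues.items():
        gene = snp_to_gene.get(snp)
        if gene is None:
            continue  # SNP not mapped to any gene (non-genic)
        gene_p[gene] = min(p, gene_p.get(gene, 1.0))
    return gene_p


# Hypothetical SNP-level results and SNP-to-gene mapping.
snp_pvalues = {"snp1": 0.04, "snp2": 0.20, "snp3": 0.001, "snp4": 0.5}
snp_to_gene = {"snp1": "GENE_A", "snp2": "GENE_A", "snp3": "GENE_B"}
gene_p = gene_min_p(snp_pvalues, snp_to_gene)

# Rank genes from strongest to weakest evidence before any pathway test.
ranked = sorted(gene_p, key=gene_p.get)
print(gene_p, ranked)
```

Taking the minimum favors genes with many typed SNPs, which is one reason a pathway test built on these values usually needs a permutation-based or otherwise calibrated reference distribution.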

73
Incorporate Additional Information into Analysis?
  • Part of a known pathway?
  • Within linkage / association regions?
  • Potentially functional?
  • Degree of conservation?
  • Tagging other SNPs?
  • Copy number polymorphism?
  • One can incorporate this information with a
    hierarchical model.