Association analysis
- Shaun Purcell
- Boulder Twin Workshop 2004
2. Overview
- Candidate gene association
- Haplotypes and linkage disequilibrium
- Linkage and association
- Family-based association
3. What is association?
- Categorical traits
  - disease susceptibility genes
- Continuous traits
  - quantitative trait loci (QTL)
4. Disease traits
Is there a difference in allele/genotype
frequency between cases and controls?
    Genotype   Case   Control
    AA         n1     n2
    Aa         n3     n4
    aa         n5     n6
5. Disease traits
Is there a difference in allele/genotype
frequency between cases and controls?
    Genotype   Case   Control   Expected frequency (Hardy-Weinberg)
    AA         30     25        $p^2$
    Aa         50     50        $2p(1-p)$
    aa         20     25        $(1-p)^2$
$\chi^2$ and p-value: test for independence
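As a sketch of the test for independence on the counts above, here is a plain-Python Pearson chi-square for the 2 x 3 genotype table (2 df) and for the 2 x 2 allele table obtained by counting alleles (1 df). The p-values use the closed-form chi-square tails for 1 and 2 df so no statistics library is needed; the allele-counting version treats the 2N alleles as independent, which assumes Hardy-Weinberg proportions.

```python
import math

def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(table):
        for j, obs in enumerate(r):
            exp = rows[i] * cols[j] / n  # expected count under independence
            stat += (obs - exp) ** 2 / exp
    df = (len(rows) - 1) * (len(cols) - 1)
    return stat, df

# Genotype counts from the slide: rows = cases, controls; columns = AA, Aa, aa
genotypes = [[30, 50, 20],
             [25, 50, 25]]
g_stat, g_df = chi2_independence(genotypes)
g_p = math.exp(-g_stat / 2)  # chi-square upper tail for df = 2

# Allele counts: each AA contributes two A alleles, each Aa one A and one a
alleles = [[2 * r[0] + r[1], r[1] + 2 * r[2]] for r in genotypes]
a_stat, a_df = chi2_independence(alleles)
a_p = math.erfc(math.sqrt(a_stat / 2))  # chi-square upper tail for df = 1

print(f"genotypic: chi2={g_stat:.3f}, df={g_df}, p={g_p:.3f}")
print(f"allelic:   chi2={a_stat:.3f}, df={a_df}, p={a_p:.3f}")
```

With these counts neither test suggests an association.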
6. Disease traits
- General model (2 df)
- Additive model (1 df)
- Dominant model for A (1 df)
Effect sizes calculated as odds ratios
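A minimal sketch of how each model groups the genotypes before an odds ratio is taken, using the case/control counts from the previous slide. The general model gives two odds ratios against aa; the dominant model pools AA and Aa. For the additive model this sketch uses the allele-counting odds ratio (A vs a) as a simple stand-in. A full additive analysis would instead fit a logistic regression on the allele count, which is not shown here.

```python
# Genotype counts from the slide (cases, controls)
cases = {"AA": 30, "Aa": 50, "aa": 20}
controls = {"AA": 25, "Aa": 50, "aa": 25}

def odds_ratio(a, b, c, d):
    """(a/c) / (b/d) with a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a / c) / (b / d)

# General model (2 df): each genotype against the aa reference
or_AA = odds_ratio(cases["AA"], controls["AA"], cases["aa"], controls["aa"])
or_Aa = odds_ratio(cases["Aa"], controls["Aa"], cases["aa"], controls["aa"])

# Dominant model for A (1 df): carriers (AA + Aa) against aa
or_dom = odds_ratio(cases["AA"] + cases["Aa"], controls["AA"] + controls["Aa"],
                    cases["aa"], controls["aa"])

# Allele-counting odds ratio (A vs a), used here as a simple per-allele summary
case_A, case_a = 2 * cases["AA"] + cases["Aa"], cases["Aa"] + 2 * cases["aa"]
ctrl_A, ctrl_a = 2 * controls["AA"] + controls["Aa"], controls["Aa"] + 2 * controls["aa"]
or_allele = odds_ratio(case_A, ctrl_A, case_a, ctrl_a)

print(f"general:  OR(AA)={or_AA:.3f}, OR(Aa)={or_Aa:.3f}")
print(f"dominant: OR(AA or Aa)={or_dom:.3f}")
print(f"allelic:  OR(A)={or_allele:.3f}")
```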
7. Relative risk
          D+   D-
    E+    a    b
    E-    c    d
- Risk in E+: $a/(a+b)$
- Risk in E-: $c/(c+d)$
- Relative risk of exposure: $\dfrac{a/(a+b)}{c/(c+d)}$
8. Odds ratio
          D+   D-
    E+    a    b
    E-    c    d
- Odds in D+: $a/c$
- Odds in D-: $b/d$
- Odds ratio: $\dfrac{a/c}{b/d} = \dfrac{ad}{bc}$
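The sketch below contrasts the two effect sizes using hypothetical counts. In a full population with a rare disease, the odds ratio is close to the relative risk. Under case/control sampling, the non-case column is scaled down: the odds ratio is unchanged, but risks computed from the sample no longer estimate population risks. This is why odds ratios are the usual effect size in case/control association studies.

```python
def odds_ratio(a, b, c, d):
    """(a/c) / (b/d), which equals ad/bc; cell labels as in the 2x2 table above."""
    return (a / c) / (b / d)

def relative_risk(a, b, c, d):
    """Risk in E+ over risk in E-: (a/(a+b)) / (c/(c+d))."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical full population with a rare disease (30 affected out of 2000)
a, b, c, d = 20, 980, 10, 990
pop_or, pop_rr = odds_ratio(a, b, c, d), relative_risk(a, b, c, d)

# Case/control sampling: keep every case (D+) but only 5% of non-cases (D-).
# Scaling the D- column leaves the OR unchanged, but the row "risks" computed
# from the sample no longer estimate population risks.
b_s, d_s = b * 0.05, d * 0.05
cc_or, cc_rr = odds_ratio(a, b_s, c, d_s), relative_risk(a, b_s, c, d_s)

print(f"population:          OR={pop_or:.3f}  RR={pop_rr:.3f}")
print(f"case/control sample: OR={cc_or:.3f}  naive RR={cc_rr:.3f}")
```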
9. Quantitative traits
    ID    Y      G    A    D
    001   0.34   aa   -1   0
    002   1.23   Aa    0   1
    003   1.66   Aa    0   1
    004   2.74   AA    1   0
    005   1.33   AA    1   0
$Y = aA + dD + e$
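As a sketch, the additive (a) and dominance (d) effects can be estimated by ordinary least squares on the five rows above. This version adds an intercept (mu), which the slide's equation does not show. With three genotype classes and three parameters, the fitted values reproduce the genotype means exactly. The small Gauss-Jordan solver avoids any numeric library.

```python
# Toy data from the slide: (ID, Y, genotype)
data = [
    ("001", 0.34, "aa"),
    ("002", 1.23, "Aa"),
    ("003", 1.66, "Aa"),
    ("004", 2.74, "AA"),
    ("005", 1.33, "AA"),
]

# Additive (A) and dominance (D) codings from the table
CODING = {"aa": (-1, 0), "Aa": (0, 1), "AA": (1, 0)}

def solve(m, v):
    """Solve the square system m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_additive_dominance(rows):
    """OLS fit of Y = mu + a*A + d*D + e via the normal equations (X'X) beta = X'y.

    The intercept mu is an assumption added for this sketch."""
    X = [[1.0, *CODING[g]] for _, _, g in rows]
    y = [yv for _, yv, _ in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

mu, a, d = fit_additive_dominance(data)
print(f"mu={mu:.4f} a={a:.4f} d={d:.4f}")
```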
10. Some web resources
- BGIM
  - http://statgen.iop.kcl.ac.uk/bgim/
  - Introductory tutorials on twin analysis, a primer on maximum likelihood, the Mx language
- GxE moderator models
  - http://statgen.iop.kcl.ac.uk/gxe/
- Power calculation
  - http://statgen.iop.kcl.ac.uk/gpc/
- Case/control association tools
  - http://statgen.iop.kcl.ac.uk/gpc/model/
12. Relative risk
- $P(D \mid AA) / P(D \mid aa)$, labelled RR(AA)
- $P(D \mid Aa) / P(D \mid aa)$, labelled RR(Aa)
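A small sketch links these genotype relative risks to penetrances, P(D | genotype). It also shows the genotype distribution among cases that they imply. The penetrance values and allele frequency are hypothetical, and Hardy-Weinberg proportions in the population are assumed.

```python
def genotype_relative_risks(pen):
    """RR(AA) and RR(Aa) relative to aa, from penetrances P(D | genotype)."""
    return {g: pen[g] / pen["aa"] for g in ("AA", "Aa")}

def case_genotype_frequencies(p, pen):
    """Genotype frequencies among cases, assuming Hardy-Weinberg proportions in the
    population (p = frequency of A). Returns (frequencies, prevalence)."""
    pop = {"AA": p * p, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}
    prevalence = sum(pop[g] * pen[g] for g in pop)  # P(D)
    # Bayes: P(genotype | D) = P(genotype) * P(D | genotype) / P(D)
    return {g: pop[g] * pen[g] / prevalence for g in pop}, prevalence

pen = {"AA": 0.04, "Aa": 0.02, "aa": 0.01}  # hypothetical penetrances
rr = genotype_relative_risks(pen)          # RR(AA) about 4, RR(Aa) about 2
case_freqs, prevalence = case_genotype_frequencies(0.3, pen)
print(rr)
print(case_freqs, prevalence)
```

Cases end up enriched for A relative to the population (0.09, 0.42, 0.49 for AA, Aa, aa). Case/control association tests exploit this shift.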
13. Genetic models
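The fitted relative risks in the homogeneous-effects table later in these slides are consistent with the usual constraint each model places on (RR(Aa), RR(AA)), taking aa as the reference:

- multiplicative: RR(AA) = RR(Aa)²
- dominant: RR(AA) = RR(Aa)
- recessive: RR(Aa) = 1
- none: both equal 1
- general: unconstrained

A small check of those values follows. The tolerance is an assumption, chosen because the table is rounded to three decimals.

```python
def model_constraint_holds(model: str, rr_het: float, rr_hom: float, tol: float = 5e-3) -> bool:
    """Check whether (RR(Aa), RR(AA)) satisfy the constraint of a genetic model.

    rr_het = RR(Aa), rr_hom = RR(AA), both relative to aa."""
    if model == "Gen":   # general: no constraint (2 free parameters)
        return True
    if model == "Mult":  # multiplicative: each A allele multiplies risk by the same factor
        return abs(rr_hom - rr_het ** 2) <= tol
    if model == "Dom":   # A dominant: one copy has the full effect
        return abs(rr_hom - rr_het) <= tol
    if model == "Rec":   # A recessive: one copy has no effect
        return abs(rr_het - 1.0) <= tol
    if model == "None":  # no effect of genotype on risk
        return abs(rr_het - 1.0) <= tol and abs(rr_hom - 1.0) <= tol
    raise ValueError(f"unknown model: {model}")

# (RR(Aa), RR(AA)) from the homogeneous-effects table
fitted = {"Gen": (1.979, 3.663), "Mult": (1.911, 3.651), "Dom": (1.990, 1.990),
          "Rec": (1.000, 1.921), "None": (1.000, 1.000)}
for name, (het, hom) in fitted.items():
    print(name, model_constraint_holds(name, het, hom))
```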
14. Tests
15. Multiple samples
- Constrain frequencies across samples
- Constrain effects across samples
- Can test genetic models with effects and/or frequencies constrained to be equal
- Can perform tests of homogeneity of effects and/or frequencies across samples
16. An example: 2 case/control samples
18. Homogeneous effects and homogeneous allele frequencies across samples
    Model   Sample   p       RR(Aa)   RR(AA)   -2LL
    Gen     1        0.367   1.979    3.663
            2        0.367   1.979    3.663    793.143
    Mult    1        0.367   1.911    3.651
            2        0.367   1.911    3.651    793.199
    Dom     1        0.401   1.990    1.990
            2        0.401   1.990    1.990    802.927
    Rec     1        0.405   1.000    1.921
            2        0.405   1.000    1.921    805.064
    None    1        0.442   1.000    1.000
            2        0.442   1.000    1.000    815.628
19. Heterogeneous effects, homogeneous allele frequencies across samples
    Model   Sample   p       RR(Aa)   RR(AA)   -2LL
    Gen     1        0.367   1.235    2.136
            2        0.367   2.890    5.547    786.498
    Mult    1        0.367   1.440    2.073
            2        0.367   2.282    5.208    788.262
    Dom     1        0.401   1.216    1.216
            2        0.401   2.936    2.936    796.422
    Rec     1        0.405   1.000    1.519
            2        0.405   1.000    2.195    803.849
    None    1        0.443   1.000    1.000
            2        0.443   1.000    1.000    815.628
20. Tests of genetic models
Assuming equal effects, equal frequencies:
    Test           df   Chi-sq   p
    Gen vs None    2    22.485   0.000
    Mult vs None   1    22.429   0.000
    Dom vs None    1    12.701   0.000
    Rec vs None    1    10.564   0.001
    Gen vs Mult    1     0.056   0.813
    Gen vs Dom     1     9.784   0.002
    Gen vs Rec     1    11.921   0.001
Assuming unequal effects, equal frequencies:
    Test           df   Chi-sq   p
    Gen vs None    4    29.130   0.000
    Mult vs None   2    27.366   0.000
    Dom vs None    2    19.205   0.000
    Rec vs None    2    11.779   0.003
    Gen vs Mult    2     1.764   0.414
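Each statistic in these tables is a likelihood-ratio test: the difference in -2LL between two nested models. For example, 815.628 - 793.143 = 22.485 for Gen vs None. It is referred to a chi-square distribution whose df equals the difference in the number of free parameters. As a sketch, the p-values can be recomputed in plain Python from the closed-form chi-square upper tail for integer df.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2)."""
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{k=0}^{m-1} y^k / k!
        return math.exp(-y) * sum(y ** k / math.factorial(k) for k in range(df // 2))
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{k=1}^{m} y^(k - 1/2) / Gamma(k + 1/2)
    m = df // 2
    tail = sum(y ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, m + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail

# -2LL values from the homogeneous-effects table (equal effects, equal frequencies)
m2ll = {"Gen": 793.143, "Mult": 793.199, "Dom": 802.927, "Rec": 805.064, "None": 815.628}
tests = [("Gen", "None", 2), ("Mult", "None", 1), ("Dom", "None", 1), ("Rec", "None", 1),
         ("Gen", "Mult", 1), ("Gen", "Dom", 1), ("Gen", "Rec", 1)]
for alt, null, df in tests:
    stat = m2ll[null] - m2ll[alt]  # likelihood-ratio chi-square
    print(f"{alt} vs {null} ({df} df) {stat:.3f} p {chi2_sf(stat, df):.3f}")
```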
21. Indirect association
[Figure: genotyped markers, ungenotyped markers and the QTL]
22. Recombination
[Figure: homologous chromosomes (paternal and maternal) in one parent]
Recombination event during meiosis: a recombinant gamete is transmitted, harboring the mutation.
23. Recombination
[Figure: homologous chromosomes (paternal and maternal) in one parent]
No recombination event during meiosis: a nonrecombinant gamete is transmitted, not harboring the mutation.
24. Linkage: affected sib pairs
[Figure: paternal and maternal chromosomes of one parent; the first affected offspring receives a nonrecombinant gamete, the second affected offspring a recombinant gamete]
IBD sharing from this one parent (0 or 1)
25. Association analysis
- Mutation occurs on a red chromosome
27. Association analysis
- Association due to linkage disequilibrium
28. Haplotypes
         A    a
    M    AM   aM
    m    Am   am
- This individual has aa and Mm genotypes, and am and aM haplotypes
29. Haplotypes
         A    a
    M    AM   aM
    m    Am   am
- This individual has Aa and Mm genotypes, and AM and am haplotypes
- but, given only genotype data, this is consistent with Am/aM as well as AM/am
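The phase ambiguity on this slide can be made concrete by enumerating every pair of haplotypes consistent with two unphased biallelic genotypes. This is a minimal sketch; genotypes are written as two-character strings, following the slide's notation. Only the double heterozygote has more than one resolution.

```python
from itertools import product

def phase_resolutions(g1: str, g2: str):
    """Haplotype pairs consistent with unphased genotypes at two biallelic loci.

    g1, g2 are two-character genotypes such as "Aa" and "Mm". Each pair is a
    sorted tuple of two haplotypes, so equivalent orderings are merged."""
    pairs = set()
    # Choose which allele at each locus goes on the first chromosome;
    # the remaining allele at that locus goes on the second chromosome.
    for i, j in product(range(2), range(2)):
        h1 = g1[i] + g2[j]
        h2 = g1[1 - i] + g2[1 - j]
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)

print(phase_resolutions("aa", "Mm"))  # [('aM', 'am')]
print(phase_resolutions("Aa", "Mm"))  # [('AM', 'am'), ('Am', 'aM')]
print(phase_resolutions("AA", "Mm"))  # [('AM', 'Am')]
```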
30. Haplotypes
         A    a
    M    AM   aM
    m    Am   am
- This individual has AA and Mm genotypes, and AM and Am haplotypes
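31. Equilibrium haplotype frequencies
         A     a
    M    pr    ps    p
    m    qr    qs    q
         r     s
Allele frequencies: M = p, m = q, A = r, a = s. At equilibrium each haplotype frequency is the product of its two allele frequencies.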
32. Linkage disequilibrium
         A         a
    M    pr + D    ps - D    p
    m    qr - D    qs + D    q
         r         s
- $D_{max} = \min(ps, qr)$ if $D > 0$; $D_{max} = \min(pr, qs)$ if $D < 0$
- $D' = D / D_{max}$
- $r^2 = D^2 / (pqrs)$
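A minimal sketch computing D, D' and r² from the four haplotype frequencies, in the notation of the table above. Which bound applies to Dmax depends on the sign of D: it is the largest |D| that keeps all four haplotype frequencies non-negative. The example frequencies are hypothetical (p = 0.6, r = 0.7, D = 0.1).

```python
def ld_measures(hap):
    """D, D' and r^2 from two-locus haplotype frequencies.

    hap maps "MA", "Ma", "mA", "ma" to frequencies summing to 1; as on the slide,
    p = freq(M), q = freq(m), r = freq(A), s = freq(a). Assumes both loci are
    polymorphic (otherwise r^2 is undefined)."""
    p = hap["MA"] + hap["Ma"]
    q = 1.0 - p
    r = hap["MA"] + hap["mA"]
    s = 1.0 - r
    d = hap["MA"] - p * r  # observed minus equilibrium frequency (pr)
    if d > 0:
        d_max = min(p * s, q * r)  # keeps Ma = ps - D and mA = qr - D non-negative
    else:
        d_max = min(p * r, q * s)  # keeps MA = pr + D and ma = qs + D non-negative
    d_prime = d / d_max
    r2 = d * d / (p * q * r * s)
    return d, d_prime, r2

hap = {"MA": 0.52, "Ma": 0.08, "mA": 0.18, "ma": 0.22}
d, d_prime, r2 = ld_measures(hap)
print(f"D={d:.3f}  D'={d_prime:.3f}  r2={r2:.3f}")  # D=0.100  D'=0.556  r2=0.198
```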
33. Haplotype analysis
- Estimate haplotypes from genotypes
- Associate haplotypes with trait
    Haplotype   Freq. (%)   Odds ratio
    AAGG        40          1.00 (baseline, fixed to 1.00)
    AAGT        30          2.21
    CGCG        25          1.07
    AGCT         5          0.92
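As a rough sketch of the second step, each haplotype's odds ratio can be taken relative to the baseline haplotype, whose odds ratio is fixed to 1. The counts below are hypothetical and do not reproduce the slide's table. The sketch also ignores the phase uncertainty that comes from estimating haplotypes, which dedicated haplotype-association methods account for.

```python
def haplotype_odds_ratios(case_counts, control_counts, baseline):
    """Odds ratio of each haplotype relative to a baseline haplotype (OR = 1)."""
    ref = case_counts[baseline] / control_counts[baseline]
    return {h: (case_counts[h] / control_counts[h]) / ref for h in case_counts}

# Hypothetical haplotype counts in cases and controls
cases = {"AAGG": 80, "AAGT": 100, "CGCG": 50, "AGCT": 10}
controls = {"AAGG": 100, "AAGT": 50, "CGCG": 60, "AGCT": 12}

ors = haplotype_odds_ratios(cases, controls, baseline="AAGG")
for hap, value in ors.items():
    print(f"{hap}  {value:.2f}")
```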
35. Linkage and association
[Figure: trait value plotted against QTL genotype (aa, Aa, AA)]
36. Variance Components
- Means (ASSOCIATION): $\begin{bmatrix} M_1 & M_2 \end{bmatrix}$
- Variance-covariance matrix (LINKAGE): $\begin{bmatrix} V_1 & C_{21} \\ C_{12} & V_2 \end{bmatrix}$
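37. Variance Components
- Means (ASSOCIATION): $\begin{bmatrix} M_1 + bG_1 & M_2 + bG_2 \end{bmatrix}$
- Variance-covariance matrix (LINKAGE): $\begin{bmatrix} V_1 & C_{21} + q(\pi - \tfrac{1}{2}) \\ C_{12} + q(\pi - \tfrac{1}{2}) & V_2 \end{bmatrix}$
- ASSOCIATION: $b$ = regression coefficient; $G$ = individual's genotype
- LINKAGE: $q$ = regression coefficient; $\pi$ = IBD sharing (0, ½, 1)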
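A minimal sketch of the expected means and covariance matrix for one sib pair under this model. Association shifts each sib's mean by b times its genotype score. Linkage shifts the sib covariance by q(π − ½). For brevity the sketch assumes equal means and variances for the two sibs and a symmetric matrix (C12 = C21); all parameter values are hypothetical.

```python
def sib_pair_expectations(g1, g2, pi, mean, b, var, cov, q):
    """Expected means and 2x2 covariance matrix for one sib pair.

    g1, g2: genotype scores of the two sibs (association enters the means).
    pi: proportion of alleles the pair shares IBD at the locus, 0, 1/2 or 1
        (linkage enters the covariance)."""
    means = [mean + b * g1, mean + b * g2]
    c = cov + q * (pi - 0.5)
    return means, [[var, c], [c, var]]

means, sigma = sib_pair_expectations(g1=1, g2=0, pi=1.0, mean=0.0, b=0.4,
                                     var=1.0, cov=0.3, q=0.2)
print(means, sigma)
```

Sharing both alleles IBD (π = 1) raises the sib covariance above its average value, and sharing none (π = 0) lowers it. This dependence of covariance on IBD is the linkage signal.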
38. Components of a Genetic Theory
- POPULATION MODEL
  - Allele and genotype frequencies
  - Demographics and population history
  - Linkage disequilibrium, haplotype structure
- TRANSMISSION MODEL
  - Mendelian segregation
  - Identity by descent and genetic relatedness
- PHENOTYPE MODEL
  - Biometrical model of quantitative traits
  - Additive and dominance components
39. Linkage without association
[Figure: two pedigrees, each with parents 3/5 and 2/6; offspring 3/2 and 3/6 in one family, 5/2 and 5/6 in the other]
Both families are linked with the marker, but a different allele is involved.
40. Linkage and association
[Figure: pedigrees with marker genotypes for parents and offspring]
All families are linked with the marker, and allele 6 is associated with disease.
Linkage is just association within families.
41. Association without linkage
[Figure: marker genotypes of controls and cases drawn from two populations]
Allele 6 is more common in the GREEN population. The disease is more common in the GREEN population. The result is a spurious association.
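A small deterministic sketch of this effect uses hypothetical allele frequencies and prevalences. Within each population, disease is independent of the marker (odds ratio 1). Because allele 6 and the disease are both more common in GREEN, pooling the two populations produces an odds ratio well above 1.

```python
def allele_table(n_people, prevalence, freq6):
    """Expected (allele 6, other allele) counts in cases and controls for one population,
    with disease independent of genotype inside that population."""
    case_alleles = 2 * n_people * prevalence
    ctrl_alleles = 2 * n_people * (1 - prevalence)
    return {"case": (case_alleles * freq6, case_alleles * (1 - freq6)),
            "ctrl": (ctrl_alleles * freq6, ctrl_alleles * (1 - freq6))}

def allelic_or(t):
    """Odds of allele 6 in cases divided by odds of allele 6 in controls."""
    (c6, co), (k6, ko) = t["case"], t["ctrl"]
    return (c6 / co) / (k6 / ko)

# Hypothetical populations: allele 6 and the disease are both more common in GREEN
green = allele_table(1000, prevalence=0.20, freq6=0.6)
other = allele_table(1000, prevalence=0.05, freq6=0.2)
pooled = {k: tuple(g + o for g, o in zip(green[k], other[k])) for k in ("case", "ctrl")}

print(f"within GREEN OR={allelic_or(green):.2f}, within other OR={allelic_or(other):.2f}")
print(f"pooled OR={allelic_or(pooled):.2f}")
```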
42. TDT
- Transmission disequilibrium test
- Test for linkage and association
[Figure: families with genotypes at the A locus (AA, Aa, aa)]
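The TDT uses only heterozygous (Aa) parents, since a homozygous parent transmits the same allele either way. Under no linkage or no association, such a parent transmits A and a to an affected child with probability 1/2 each. The statistic is the McNemar chi-square on the two transmission counts, with 1 df. The counts in this sketch are hypothetical.

```python
import math

def tdt(b: int, c: int):
    """Transmission disequilibrium test.

    b = transmissions of allele A from heterozygous parents to affected offspring,
    c = transmissions of allele a from those same parents.
    Returns the 1-df chi-square statistic (b - c)^2 / (b + c) and its p-value."""
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, 1 df

stat, p = tdt(60, 40)  # hypothetical transmission counts
print(f"TDT chi2={stat:.2f}, p={p:.4f}")
```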
43. TDT (A = disease allele)
[Table: offspring genotypes (AA, Aa, aa) from AA x Aa and aa x Aa matings, each with probability 0.5, compared under additive, dominant and recessive models]
44. Between and within components
[Figure: between and within components for Sib1 and Sib2]
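45. Between and within components
Note: $W = S_1 - B$ (for sib 1, the within component is the sib's genotype score minus the between component)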
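A minimal sketch of the decomposition. It assumes the additive genotype coding -1/0/1 used in the g1 and g2 columns of assoc.dat. B is the mean genotype score of the sibship, and each sib's W is its own score minus B. The pairs below are taken from assoc.dat, and the results match its b, w1 and w2 columns.

```python
def between_within(genotypes):
    """Between (B) and within (W) components for one sibship.

    genotypes are additive scores (-1 = aa, 0 = Aa, 1 = AA, an assumed coding).
    B is the sibship mean score; each sib's W is its own score minus B."""
    b = sum(genotypes) / len(genotypes)
    return b, [g - b for g in genotypes]

# Some (g1, g2) pairs taken from assoc.dat
for g1, g2 in [(-1, 0), (1, 1), (0, 1), (1, 0)]:
    print((g1, g2), between_within([g1, g2]))
```

Because W sums to zero within a sibship, it carries only within-family information, which population stratification cannot confound.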
46. Parental genotypes
- Use parental genotypes to generate B
- Examples:
  - AA from AA x AA: W = 0
  - Aa from AA x Aa: W = -0.5
  - Aa from Aa x Aa: W = 0
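As a sketch, B can be taken as the expected offspring genotype score under Mendelian segregation given the parental mating, with W as the offspring's score minus B. The -1/0/1 coding (aa, Aa, AA) is an assumption, matching the coding used in assoc.dat. The three examples above are reproduced.

```python
# Additive genotype scores (assumed coding, as in assoc.dat)
SCORE = {"aa": -1, "Aa": 0, "AA": 1}

def expected_offspring_score(mother: str, father: str) -> float:
    """Mean offspring score when each parent transmits each of its two alleles
    with probability 1/2 (four equally likely allele combinations)."""
    def allele_value(allele: str) -> float:  # one allele's contribution to the score
        return 0.5 if allele == "A" else -0.5
    return sum(allele_value(m) + allele_value(f) for m in mother for f in father) / 4

def within_component(child: str, mother: str, father: str) -> float:
    """W = offspring score minus B, with B generated from the parental genotypes."""
    return SCORE[child] - expected_offspring_score(mother, father)

print(within_component("AA", "AA", "AA"))  # 0.0
print(within_component("Aa", "AA", "Aa"))  # -0.5
print(within_component("Aa", "Aa", "Aa"))  # 0.0
```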
47. assoc.mx
- Sibling pair sample
- B and W components precalculated in input file
- Single SNP genotype
- Quantitative trait
48. assoc.dat
    s1       s2       g1   g2   b      w1     w2
    -0.007   -0.972   -1    0   -0.5   -0.5    0.5
    -0.829   -0.196    1    1    1      0      0
     0.369    0.645    1    1    1      0      0
     0.318    1.55     0    1    0.5   -0.5    0.5
     1.52     0.910    0    0    0      0      0
    -0.948   -1.55     1    1    1      0      0
     0.596   -0.394    1    0    0.5    0.5   -0.5
    -1.91    -0.905    0    1    0.5   -0.5    0.5
     0.499    0.940    1    0    0.5    0.5   -0.5
    -1.17    -1.29     1    0    0.5    0.5   -0.5
    -0.16    -1.81     1    1    1      0      0
49. Mx script
! Mx script for QTL association, sib pairs, univariate
Group 1
Calc NG=2
Begin Matrices;
 ! Parameters
 B Full 1 1 free   ! association, between component
 W Full 1 1 free   ! association, within component
 M Full 1 1 free   ! mean
 S Full 1 1 free   ! shared residual variance
 N Full 1 1 free   ! nonshared residual variance
 ! Definition variables
 C Full 1 1        ! association, between
 X Full 1 1        ! association, within, sib 1
 Y Full 1 1        ! association, within, sib 2
End Matrices;
End
50. Mx script (continued)
Group2 Data Group
Data NI=7 NO=0
RE File=assoc.dat
Labels Sib1 Sib2 g1 g2 b w1 w2
Select Sib1 Sib2 b w1 w2 /
Definition b w1 w2 /
Matrices = Group 1
Means M + B*C + W*X | M + B*C + W*Y /
Covariance
 S + N | S _
 S | S + N /
Specify C b /
Specify X w1 /
Specify Y w2 /
End
51. Models
- B ≠ W
    B Full 1 1 free
    W Full 1 1 free
    !Equate W 1 1 1 B 1 1 1
- B = W
    B Full 1 1 free
    W Full 1 1 free
    Equate W 1 1 1 B 1 1 1
- B (W fixed to 0)
    B Full 1 1 free
    W Full 1 1
    !Equate W 1 1 1 B 1 1 1
- B = W = 0
    B Full 1 1
    W Full 1 1
    !Equate W 1 1 1 B 1 1 1
52. Tests
    Test                          HA       H0
    Standard association test     B = W    B = W = 0
    Test of stratification        B ≠ W    B = W
    Robust association test       B ≠ W    B
53. assoc.mx
    Model       B         W        -2LL      df
    B ≠ W       -0.478    -0.365   2103.96   795
    B = W       -0.420    -0.420   2105.05   796
    B           -0.4778            2127.01   796
    B = W = 0                      2163.34   797
Test of total association: HA B = W (2105.05), H0 B = W = 0 (2163.34); Δ-2LL = 58.29, df = 1, p ≈ 2e-14
54. assoc.mx
Test of stratification: HA B ≠ W (2103.96), H0 B = W (2105.05); Δ-2LL = 1.09, df = 1, p = 0.29
55. assoc.mx
Test of within association: HA B ≠ W (2103.96), H0 B (2127.01); Δ-2LL = 23.06, df = 1, p ≈ 1.6e-6
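The three tests can be recomputed from the -2LL and df columns of the assoc.mx results. Each compares two nested models, and the difference in -2LL is a 1-df chi-square. This is a sketch in plain Python. Differences computed from the rounded -2LL values can differ from the slides in the last digit (23.05 here versus 23.06).

```python
import math

# -2LL and df for the four fitted models, as reported on the assoc.mx slides
fits = {"B!=W": (2103.96, 795), "B=W": (2105.05, 796),
        "B": (2127.01, 796), "B=W=0": (2163.34, 797)}

def lrt(alt: str, null: str):
    """Likelihood-ratio test of nested models: the difference in -2LL is chi-square
    with df equal to the difference in model df (p from the 1-df closed form)."""
    (m2ll_a, df_a), (m2ll_0, df_0) = fits[alt], fits[null]
    stat, df = m2ll_0 - m2ll_a, df_0 - df_a
    if df != 1:
        raise ValueError("the closed-form p-value below only covers 1 df")
    return stat, df, math.erfc(math.sqrt(stat / 2))

results = {}
for label, alt, null in [("total association", "B=W", "B=W=0"),
                         ("stratification", "B!=W", "B=W"),
                         ("within association", "B!=W", "B")]:
    results[label] = lrt(alt, null)
    stat, df, p = results[label]
    print(f"{label}: chi2={stat:.2f}, df={df}, p={p:.2g}")
```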
56. Implementation
- QTDT
  - Abecasis et al. (2001) AJHG
  - extends the between/within model to general pedigrees
  - multiple alleles
  - covariates
  - combined test of linkage and association
  - discrete as well as quantitative traits
57. Linkage and association
- Linkage
  - families
  - detectable over large distances (> 10 cM)
  - large effects (OR > 3, variance explained > 10%)
- Association
  - unrelateds or families
  - detectable over small distances (< 1 cM)
  - small effects (OR < 2, variance explained < 1%)