Association Analysis of Rare Genetic Variants - PowerPoint PPT Presentation


1
Association Analysis of Rare Genetic Variants
  • Qunyuan Zhang
  • Division of Statistical Genomics
  • Course M21-621
  • Computational Statistical Genetics

2
Rare Variants
  • Low allele frequency, usually less than 1%
  • Low power for most analyses, because the genotype
    data carry little variation
  • High false-positive rate for some model-based
    analyses, because sparse data lead to unstable or
    biased parameter estimates and inflated significance
    (p-values that are too small)

3
An Example of Low Power
Jonathan C. Cohen, et al.
Science 305, 869 (2004)
4
An Example of High False Positive Rate (Q-Q plots
from GWAS data, unpublished)
Panels: N = 2500, MAF > 0.03; N = 2500, MAF < 0.03;
N = 50000, MAF < 0.03, bootstrapped; N = 2500, MAF < 0.03, permuted
5
Three Levels of Rare Variant Data
  • Level 1: individual-level
  • Level 2: summarized over subjects
  • Level 3: summarized over both subjects and
    variants

6
Level 1: Individual-level ("." = missing genotype)

Subject V1 V2 V3 V4 Trait-1 Trait-2
1 1 0 0 0 90.1 1
2 0 1 0 . 99.2 1
3 0 0 0 0 105.9 0
4 0 0 0 0 89.5 0
5 0 . 0 0 97.6 0
6 0 0 0 0 110.5 0
7 0 0 1 0 88.8 0
8 0 0 0 1 95.4 1
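To make the three levels concrete, here is a minimal Python sketch (the function names are hypothetical) that reduces the Level 1 table above to Level 2 and Level 3 counts. It assumes Trait-2 is case/control status (1 = case), that genotypes count variant alleles in a diploid subject, and that "." is a missing call, which is dropped for that variant only, so each variant keeps its own sample size.

```python
# Level 1 toy data from the slide above; None stands for a missing call (".").
GENOTYPES = [
    [1, 0, 0, 0],
    [0, 1, 0, None],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, None, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
STATUS = [1, 1, 0, 0, 0, 0, 0, 1]  # Trait-2, assumed 1 = case, 0 = control


def level2(genotypes, status):
    """Per variant and group: (variant allele count, reference allele count)."""
    summary = []
    for v in range(len(genotypes[0])):
        counts = {}
        for group in (1, 0):
            # keep only subjects in this group with a non-missing call
            calls = [row[v] for row, s in zip(genotypes, status)
                     if s == group and row[v] is not None]
            variant = sum(calls)
            counts[group] = (variant, 2 * len(calls) - variant)
        summary.append(counts)
    return summary


def level3(summary):
    """Collapse the Level 2 summary over variants (e.g. over a gene)."""
    return {group: (sum(c[group][0] for c in summary),
                    sum(c[group][1] for c in summary))
            for group in (1, 0)}


lvl2 = level2(GENOTYPES, STATUS)
print(lvl2[0])        # V1: {1: (1, 5), 0: (0, 10)}
print(level3(lvl2))   # {1: (3, 19), 0: (1, 37)}
```

Level 3 loses the per-variant detail that Level 2 keeps, which is why the later Level 2 methods can weight variants differently.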
7
Level 2: Summarized over subjects (by group)
Jonathan C. Cohen, et al.
Science 305, 869 (2004)
8
Level 3: Summarized over subjects (by group) and
variants (usually by gene)
Group            Variant alleles   Reference alleles   Total
Low-HDL group    20                236                 256
High-HDL group   2                 254                 256
Total            22                490                 512
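The slides do not say which test is applied to a Level 3 table; one standard choice for a 2 × 2 allele-count table like this is Fisher's exact test, which treats alleles as independent draws (an assumption that leans on Hardy-Weinberg equilibrium). A minimal pure-Python sketch follows; the function names are ours, and `scipy.stats.fisher_exact` is the usual library alternative.

```python
from math import comb


def hypergeom_pmf(k, row1, col1, total):
    """P(k variant alleles in group 1 | all table margins fixed)."""
    return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Rows are groups; columns are (variant alleles, reference alleles).
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (total - col1)), min(row1, col1)
    p_obs = hypergeom_pmf(a, row1, col1, total)
    probs = [hypergeom_pmf(k, row1, col1, total) for k in range(lo, hi + 1)]
    # small relative tolerance so tables tied with the observed one are counted
    p = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Low-HDL vs. high-HDL allele counts from the Level 3 table above.
p = fisher_exact_two_sided(20, 236, 2, 254)
print(f"two-sided p = {p:.3g}")
```

Pooling all variants of the gene into one table is what gives this test usable counts; each variant on its own would contribute only a handful of alleles.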
9
Methods For Level 3 Data
10
Single-Variant Test vs. Total Frequency Test (TFT)
Jonathan C. Cohen, et al.
Science 305, 869 (2004)
11
What we have learned
  • Single-variant tests of rare variants have very low
    power for detecting association, due to the extremely
    low frequency (usually < 0.01)
  • Testing the collective effect of a set of rare
    variants may increase power (sum test,
    collective test, group test, collapsing test,
    burden test)

12
Methods For Level 2 Data
  • Allowing different sample sizes for different
    variants
  • Different variants can be weighted differently

13
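CAST: A Cohort Allelic Sums Test
Morgenthaler and Thilly, Mutation Research 615
(2007) 28-56
S: number of variant alleles; N: sample size
Under H0:
$$\frac{S_{\text{cases}}}{2N_{\text{cases}}} - \frac{S_{\text{controls}}}{2N_{\text{controls}}} = 0$$
Test statistic (reduces to $S_{\text{cases}} - S_{\text{controls}}$ when $N_{\text{cases}} = N_{\text{controls}}$):
$$T = S_{\text{cases}} - S_{\text{controls}}\,\frac{N_{\text{cases}}}{N_{\text{controls}}}$$
S can be calculated variant by variant and each variant weighted differently; the final statistic is then $T = \sum_i W_i T_i$, with $T_i$ the statistic above for variant $i$.
$$Z = \frac{T}{\sqrt{\operatorname{Var}(T)}} \sim N(0,1)$$
$$\operatorname{Var}(T) = \operatorname{Var}(S_{\text{cases}}) + \operatorname{Var}(S_{\text{controls}})\left(\frac{N_{\text{cases}}}{N_{\text{controls}}}\right)^2$$
which reduces to $\operatorname{Var}(S_{\text{cases}}) + \operatorname{Var}(S_{\text{controls}})$ when $N_{\text{cases}} = N_{\text{controls}}$.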
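A minimal sketch of the Z statistic above. The slide does not say how Var(S) is obtained; here we assume, for illustration only, a binomial variance 2N·p(1 − p) with p the pooled variant-allele frequency under H0. The worked example reuses the Level 3 counts and assumes 256 alleles per group correspond to 128 diploid subjects.

```python
import math


def cast_z(s_cases, n_cases, s_controls, n_controls):
    """Z and two-sided normal p-value for T = S_cases - S_controls * Nc/Nk.

    s_*: variant allele counts (optionally already weighted and summed over
    variants); n_*: numbers of subjects, i.e. 2n alleles per group.
    Var(S) = 2N p (1 - p) with pooled p is an assumption, not from the slide.
    """
    ratio = n_cases / n_controls
    t = s_cases - s_controls * ratio
    p = (s_cases + s_controls) / (2 * (n_cases + n_controls))
    var_cases = 2 * n_cases * p * (1 - p)
    var_controls = 2 * n_controls * p * (1 - p)
    var_t = var_cases + var_controls * ratio ** 2
    z = t / math.sqrt(var_t)
    return z, math.erfc(abs(z) / math.sqrt(2))


z, p = cast_z(20, 128, 2, 128)
print(round(z, 2), p)  # z is about 3.92
```

With equal group sizes the ratio is 1, so T and Var(T) reduce to the simpler forms noted on the slide.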
14
C-alpha
PLOS Genetics, 2011, Volume 7, Issue 3, e1001322
Effect direction problem
15
C-alpha
16
QQ Plots of Existing Methods (under the null)
  • EFT and C-alpha: inflated with false positives
  • TFT and CAST: no inflation, but they assume a single
    effect direction
  • Objective: more general, powerful methods

Panels: EFT, TFT, CAST, C-alpha
17
More Generalized Methods For Level 2 Data
18
Structure of Level 2 data: one table per variant
(variant 1, variant 2, variant 3, ..., variant i, ..., variant k)
Strategy: instead of testing the total frequency/number,
we test the randomness of all tables.
19
Exact Probability Test (EPT)
1. Calculate the probability of each table based
on the hypergeometric distribution
2. Calculate the log-transformed joint
probability (L) of all k tables
3. Enumerate all possible tables and their L
scores
4. Calculate the p-value P = Prob.( )
ASHG Meeting 1212, Zhang
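A minimal sketch of how the four EPT steps could be coded. Two points are our assumptions rather than the slide's: the p-value in step 4 is taken as the total probability of all table sets whose L is no larger than the observed L, and every variant shares the same case and control allele totals (the Level 2 methods allow these to differ). Full enumeration grows exponentially with k, so this is only practical for small examples.

```python
import itertools
import math


def table_pmf(k, case_alleles, control_alleles, variant_total):
    """Step 1: hypergeometric probability that k variant alleles are in cases."""
    total = case_alleles + control_alleles
    return (math.comb(variant_total, k)
            * math.comb(total - variant_total, case_alleles - k)
            / math.comb(total, case_alleles))


def ept_pvalue(tables, case_alleles, control_alleles):
    """tables: list of (variant alleles in cases, variant alleles in controls)."""
    total = case_alleles + control_alleles
    supports = []
    log_obs = 0.0
    for a, c in tables:
        K = a + c
        lo = max(0, case_alleles - (total - K))
        hi = min(K, case_alleles)
        probs = {k: table_pmf(k, case_alleles, control_alleles, K)
                 for k in range(lo, hi + 1)}
        supports.append(list(probs.values()))
        log_obs += math.log(probs[a])          # step 2: observed L
    p = 0.0
    for combo in itertools.product(*supports):  # step 3: every set of tables
        log_l = sum(math.log(q) for q in combo)
        if log_l <= log_obs + 1e-9:               # step 4: as or less likely
            p += math.exp(log_l)
    return min(1.0, p)


print(ept_pvalue([(5, 0), (3, 1)], 20, 20))
```

Because the test looks at how unusual each table is rather than at a pooled count, variants pushing in opposite directions do not cancel.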
20
Likelihood Ratio Test (LRT)
Binomial distribution
ASHG Meeting 1212, Zhang
21
Q-Q Plots of EPT and LRT (under the null)
Panels: EPT N = 500, LRT N = 500, LRT N = 3000, EPT N = 3000
22
Power Comparison (significance level = 0.00001)
Variant proportions: positive causal 80%, neutral
20%, negative causal 0%
Plots: power vs. sample size
23
Power Comparison (significance level = 0.00001)
Variant proportions: positive causal 60%, neutral
20%, negative causal 20%
Plot: power vs. sample size
24
Power Comparison (significance level = 0.00001)
Variant proportions: positive causal 40%, neutral
20%, negative causal 40%
Plot: power vs. sample size
25
Methods For Level 1 Data
  • Including covariates
  • Extended to quantitative trait
  • Better control for population structure
  • More sophisticated models

26
Collapsing (C) test
Li and Leal, The American Journal of Human
Genetics 83 (2008) 311-321
Step 1
Step 2: logit(P(y = 1)) = a + bX (logistic
regression)
27
Variant Collapsing
Effect: V1 (+), V2 (+), V3 (.), V4 (.)
Subject V1 V2 V3 V4 Collapsed Trait
1 1 0 0 0 1 1
2 0 1 0 0 1 1
3 0 0 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 0 0
6 0 0 0 0 0 0
7 0 0 1 0 1 0
8 0 0 0 1 1 1
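Step 1 of the collapsing test can be sketched as below: any carrier of at least one rare variant gets X = 1, which reproduces the Collapsed column. The function names are hypothetical. Step 2 (logistic regression of the trait on X) is left to a statistics package; note that in this toy table no non-carrier is a case, so the maximum-likelihood estimate of b would not be finite, a small illustration of the sparse-data problem from the Rare Variants slide.

```python
# Genotypes V1-V4 and binary trait from the "Variant Collapsing" table.
GENOTYPES = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
TRAIT = [1, 1, 0, 0, 0, 0, 0, 1]


def collapse(genotypes):
    """Step 1: X = 1 if a subject carries at least one rare variant, else 0."""
    return [1 if any(g > 0 for g in row) else 0 for row in genotypes]


def carrier_by_trait_table(x, trait):
    """2 x 2 counts: rows = collapsed X (1, 0), columns = trait (1, 0)."""
    return tuple(
        tuple(sum(1 for xi, yi in zip(x, trait) if xi == xv and yi == yv)
              for yv in (1, 0))
        for xv in (1, 0)
    )


x = collapse(GENOTYPES)
print(x)                                 # [1, 1, 0, 0, 0, 0, 1, 1]
print(carrier_by_trait_table(x, TRAIT))  # ((3, 1), (0, 4))
```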
28
WSS
29
WSS
30
WSS
31
Weighted Sum Test
  • Collapsing test (Li and Leal, 2008): w_i = 1, and
    s = 1 if s > 1
  • Weighted-sum test (Madsen and Browning, 2009): w_i
    calculated based on allele frequency in the control group
  • aSum, adaptive sum test (Han and Pan, 2010): w_i = -1
    if b < 0 and p < 0.1, otherwise w_i = 1
  • KBAC (Liu and Leal, 2010): w_i = left-tail p-value
  • RBT (Ionita-Laza et al., 2011): w_i = log-scaled
    probability
  • PWST, p-value weighted sum test (Zhang et al., 2011):
    w_i = rescaled left-tail p-value, incorporating both
    significance and direction
  • EREC (Lin et al., 2011): w_i = estimated effect size
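As an example of frequency-based weights, here is a sketch of the Madsen and Browning (2009) weighted sum as we recall it from that paper (check the formulas against it before relying on them): q_i = (m_i + 1) / (2u_i + 2), with m_i the variant alleles among the u_i genotyped controls, and w_i = sqrt(n_i q_i (1 - q_i)), with n_i the subjects genotyped for variant i. Each subject's score is the sum of g_i / w_i, so rarer variants count more. It reuses the Level 1 toy data (1 = case assumed); treating missing genotypes as 0 in the score is our simplification, and the rank-sum and permutation step of the original method is omitted.

```python
import math

# Level 1 toy data from the earlier slide; None = missing call (".").
GENOTYPES = [
    [1, 0, 0, 0],
    [0, 1, 0, None],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, None, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
STATUS = [1, 1, 0, 0, 0, 0, 0, 1]  # assumed: 1 = case, 0 = control


def madsen_browning_weights(genotypes, status):
    """w_i = sqrt(n_i * q_i * (1 - q_i)), q_i = (m_i + 1) / (2 * u_i + 2)."""
    weights = []
    for v in range(len(genotypes[0])):
        called = [(row[v], s) for row, s in zip(genotypes, status)
                  if row[v] is not None]
        control_calls = [g for g, s in called if s == 0]
        q = (sum(control_calls) + 1) / (2 * len(control_calls) + 2)
        weights.append(math.sqrt(len(called) * q * (1 - q)))
    return weights


def weighted_scores(genotypes, weights):
    """Per-subject score sum_i g_i / w_i; missing calls count as 0 here."""
    return [sum((g or 0) / w for g, w in zip(row, weights))
            for row in genotypes]


w = madsen_browning_weights(GENOTYPES, STATUS)
scores = weighted_scores(GENOTYPES, w)
print([round(x, 3) for x in w])
print([round(s, 3) for s in scores])
```

Because q_i uses controls only, a variant enriched in cases keeps a small weight and therefore a large contribution to case scores.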
32
Effect: V1 (+), V2 (+)
Subject V1 V2 Collapsed Trait
1 1 0 1 3.00
2 0 1 1 3.10
3 0 0 0 1.95
4 0 0 0 2.00
5 0 0 0 2.05
6 0 0 0 2.10
When there are only causal (+) variants,
collapsing (Li and Leal, 2008) works well and
power increases
33
Effect: V1 (+), V2 (+), V3 (.), V4 (.)
Subject V1 V2 V3 V4 Collapsed Trait
1 1 0 0 0 1 3.00
2 0 1 0 0 1 3.10
3 0 0 0 0 0 1.95
4 0 0 0 0 0 2.00
5 0 0 0 0 0 2.05
6 0 0 0 0 0 2.10
7 0 0 1 0 1 2.00
8 0 0 0 1 1 2.10
When there are causal (+) and non-causal (.)
variants, collapsing still works, but power
is reduced
34
Effect: V1 (+), V2 (+), V3 (.), V4 (.), V5 (-), V6 (-)
Subject V1 V2 V3 V4 V5 V6 Collapsed Trait
1 1 0 0 0 0 0 1 3.00
2 0 1 0 0 0 0 1 3.10
3 0 0 0 0 0 0 0 1.95
4 0 0 0 0 0 0 0 2.00
5 0 0 0 0 0 0 0 2.05
6 0 0 0 0 0 0 0 2.10
7 0 0 1 0 0 0 1 2.00
8 0 0 0 1 0 0 1 2.10
9 0 0 0 0 1 0 1 0.95
10 0 0 0 0 0 1 1 1.00
When there are causal (+), non-causal (.), and
causal (-) variants, the power of the collapsing
test drops substantially
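The power loss across the last three tables can be seen from the collapsed indicator alone. This sketch regresses the quantitative trait on the Collapsed column by ordinary least squares and reports the slope's t statistic; using a simple linear regression here is our choice, since the slides only name logistic regression for binary traits.

```python
import math


def regression_t(x, y):
    """t statistic for the slope b in the simple linear regression y = a + b x + e."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    sse = sum((yi - ybar) ** 2 for yi in y) - b * sxy  # residual sum of squares
    return b / math.sqrt(sse / (n - 2) / sxx)


# Collapsed indicator and trait from the three tables.
only_positive = ([1, 1, 0, 0, 0, 0],
                 [3.00, 3.10, 1.95, 2.00, 2.05, 2.10])
with_neutral = ([1, 1, 0, 0, 0, 0, 1, 1],
                [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10])
both_directions = ([1, 1, 0, 0, 0, 0, 1, 1, 1, 1],
                   [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10, 0.95, 1.00])

for name, (x, y) in [("(+) only", only_positive),
                     ("(+) and (.)", with_neutral),
                     ("(+), (.) and (-)", both_directions)]:
    print(f"{name:18s} t = {regression_t(x, y):.2f}")
```

With these values t falls from about 17.9 to about 1.8 to essentially 0: once the (-) carriers are included, carriers and non-carriers have the same mean trait (2.025), so the collapsed indicator carries no signal at all.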
35
P-value Weighted Sum Test (PWST)
Effect: V1 (+), V2 (+), V3 (.), V4 (.), V5 (-), V6 (-)
Subject V1 V2 V3 V4 V5 V6 Collapsed pSum Trait
1 1 0 0 0 0 0 1 0.86 3.00
2 0 1 0 0 0 0 1 0.90 3.10
3 0 0 0 0 0 0 0 0.00 1.95
4 0 0 0 0 0 0 0 0.00 2.00
5 0 0 0 0 0 0 0 0.00 2.05
6 0 0 0 0 0 0 0 0.00 2.10
7 0 0 1 0 0 0 1 -0.02 2.00
8 0 0 0 1 0 0 1 0.08 2.10
9 0 0 0 0 1 0 1 -0.90 0.95
10 0 0 0 0 0 1 1 -0.88 1.00
t 1.61 1.84 -0.04 0.11 -1.84 -1.72
p(x<t) 0.93 0.95 0.49 0.54 0.05 0.06
2(p-0.5) 0.86 0.90 -0.02 0.08 -0.90 -0.88
The rescaled left-tail p-value, in [-1, 1], is used as
the weight
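The t, p(x<t), and pSum values in the PWST table can be reproduced with a short sketch, under our reading (which matches the numbers shown) that t is the slope t statistic from regressing the trait on each variant separately and p(x<t) is the left-tail probability of a Student t distribution with n - 2 = 8 degrees of freedom. The closed-form CDF below only covers even degrees of freedom; `scipy.stats.t.cdf` handles the general case. The slide rounds p before rescaling, so the exact weights differ slightly (for example 0.85 rather than 0.86 for V1).

```python
import math

# V1-V6 genotypes and quantitative trait from the PWST table.
GENOTYPES = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
TRAIT = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10, 0.95, 1.00]


def slope_t(x, y):
    """t statistic of the slope in y = a + b x + e (ordinary least squares)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    sse = sum((yi - ybar) ** 2 for yi in y) - b * sxy
    return b / math.sqrt(sse / (n - 2) / sxx)


def t_cdf_even_df(t, df):
    """Student t CDF from the finite series that is exact for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    theta = math.atan(t / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= c2 * (2 * j - 1) / (2 * j)
        total += term
    return 0.5 + 0.5 * math.sin(theta) * total


def pwst(genotypes, y):
    df = len(y) - 2
    ts = [slope_t([row[v] for row in genotypes], y)
          for v in range(len(genotypes[0]))]
    ps = [t_cdf_even_df(t, df) for t in ts]   # left-tail p-values
    ws = [2 * (p - 0.5) for p in ps]          # rescaled to [-1, 1]
    psum = [sum(w * g for w, g in zip(ws, row)) for row in genotypes]
    return ts, ps, ws, psum


ts, ps, ws, psum = pwst(GENOTYPES, TRAIT)
print([round(t, 2) for t in ts])  # [1.61, 1.84, -0.04, 0.11, -1.84, -1.72]
print([round(p, 2) for p in ps])  # [0.93, 0.95, 0.49, 0.54, 0.05, 0.06]
```

The signed weights let carriers of (-) variants get negative pSum values, so they no longer cancel the (+) carriers the way the plain collapsed indicator does.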
36
P-value Weighted Sum Test (PWST)
The power of the collapsing test is retained even when there
are bidirectional effects
37
PWST Q-Q Plots Under the Null
Direct test: inflated type I error
Corrected by a permutation test (permuting the
phenotype)
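Data-driven weights such as PWST's are learned from the phenotype, so the direct test's reference distribution is wrong; permuting the phenotype and recomputing the whole statistic, weights included, in every permutation restores a valid null. Below is a minimal generic sketch for unrelated, exchangeable subjects. `sign_weighted_stat` is a hypothetical stand-in for PWST (it weights each variant by the sign of its covariance with the trait), not the method itself.

```python
import random

# V1-V6 genotypes and trait from the PWST table.
GENOTYPES = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
TRAIT = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10, 0.95, 1.00]


def sign_weighted_stat(genotypes, y):
    """Hypothetical data-driven statistic: weights = sign of each variant's
    covariance with the trait; returns |cov(weighted sum, trait)|."""
    n = len(y)
    ybar = sum(y) / n
    resid = [yi - ybar for yi in y]
    signs = []
    for v in range(len(genotypes[0])):
        s = sum(row[v] * r for row, r in zip(genotypes, resid))
        signs.append(1 if s >= 0 else -1)
    score = [sum(w * g for w, g in zip(signs, row)) for row in genotypes]
    return abs(sum(sc * r for sc, r in zip(score, resid))) / n


def permutation_pvalue(stat, genotypes, y, n_perm=1000, seed=1):
    """Phenotype-permutation p-value for a statistic where larger = more extreme.
    The statistic, including any weights it learns from y, is recomputed on
    every permuted trait vector."""
    rng = random.Random(seed)
    observed = stat(genotypes, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if stat(genotypes, y_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one keeps the estimate above 0


print(permutation_pvalue(sign_weighted_stat, GENOTYPES, TRAIT, n_perm=2000))
```

Shuffling the trait treats subjects as exchangeable, which is exactly what fails for related individuals in family data.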
38
Generalized Linear Mixed Model (GLMM) Weighted
Sum Test (WST)
39
GLMM WST
$$Y = \alpha + \beta \sum_{i=1}^{m} w_i g_i + X\tau + Z\gamma + e$$
Y: quantitative trait or logit(binary trait)
α: intercept
β: regression coefficient of the weighted sum
m: number of RVs to be collapsed
w_i: weight of variant i
g_i: genotype (recoded) of variant i
Σ w_i g_i: weighted sum (WS)
X: covariate(s), such as population structure variable(s)
τ: fixed effect(s) of X
Z: design matrix corresponding to γ
γ: random polygenic effects for individual subjects, γ ~ N(0, G),
G = 2σ²K, where K is the kinship matrix and σ² the additive
polygenic genetic variance
e: residual
40
Weight
  • Based on allele frequency: binary (0, 1) or
    continuous, with a fixed or variable threshold
  • Based on functional annotation/prediction: SIFT,
    PolyPhen, etc.
  • Based on sequencing quality (coverage, mapping
    quality, genotyping quality, etc.)
  • Data-driven, using both genotype and phenotype
    data: weights learned from the data or by adaptive
    selection, assessed with a permutation test
  • Any combination

41
Application 1: Family Data
  • Adjusting for relatedness in family data in
    non-data-driven tests of rare variants

Unadjusted vs. adjusted model
γ ~ N(0, 2σ²K) (random polygenic effect)
42
  • Q-Q Plots of log10(P) under the Null

Li and Leal's collapsing test, ignoring family
structure: inflated type I error
Li and Leal's collapsing test, modeling family
structure via GLMM: inflation is corrected
(From Zhang et al, 2011, BMC Proc.)
43
Application 2: Permuting Family Data
MMPT: Mixed Model-based Permutation
Test, adjusting for relatedness in family data in
data-driven permutation tests of rare variants
γ ~ N(0, 2σ²K)
44
  • Q-Q Plots under the Null

Panels: WSS, aSum, SPWST, PWST
Permutation test ignoring family structure:
inflated type I error
(From Zhang et al, 2011, IGES Meeting)
45
  • Q-Q Plots under the Null

Panels: WSS, aSum, SPWST, PWST
Mixed model-based permutation test (MMPT),
modeling family structure: inflation corrected
(From Zhang et al, 2011, IGES Meeting)
46
Burden Test vs. Non-burden Test
Burden test
Non-burden test
T-test, likelihood ratio test, F-test, score test,
SKAT (sequence kernel association test)
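The burden/non-burden distinction can be illustrated with per-variant score contributions S_j = Σ_i g_ij (y_i − ȳ) under an intercept-only null model. A burden test aggregates them linearly, so effects in opposite directions cancel; a SKAT-style statistic with a linear kernel sums their squares, so they do not. This sketch uses unit weights for both, and neither statistic is turned into a p-value here (a burden test needs the variance of its sum; SKAT's null distribution is a weighted mixture of chi-squares).

```python
# V1-V6 genotypes and trait from the PWST table.
GENOTYPES = [
    [1, 0, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
TRAIT = [3.00, 3.10, 1.95, 2.00, 2.05, 2.10, 2.00, 2.10, 0.95, 1.00]


def score_contributions(genotypes, y):
    """S_j = sum_i g_ij * (y_i - mean(y)), the per-variant score with no covariates."""
    ybar = sum(y) / len(y)
    return [sum(row[j] * (yi - ybar) for row, yi in zip(genotypes, y))
            for j in range(len(genotypes[0]))]


def burden_statistic(scores):
    """Linear aggregation with unit weights: (sum_j S_j)^2."""
    return sum(scores) ** 2


def quadratic_statistic(scores):
    """SKAT-style linear-kernel form with unit weights: sum_j S_j^2."""
    return sum(s * s for s in scores)


s = score_contributions(GENOTYPES, TRAIT)
print(burden_statistic(s))     # essentially 0: the (+) and (-) scores cancel
print(quadratic_statistic(s))  # about 4.32: squared scores cannot cancel
```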
47
SKAT: Sequence Kernel Association Test
48
Extension of SKAT to Family Data
(model terms labeled: kinship matrix; polygenic heritability
of the trait; residual)
Han Chen et al., 2012, Genetic Epidemiology
49
Other problems
  • Missing genotypes: imputation
  • Genotyping errors: QC (family consistency,
    sequence review)
  • Population stratification
  • Inherited variants and de novo mutations
  • Family data: linkage information
  • Variant validation and association validation
  • Public databases
  • And more
