Title: Strategies for genome-wide family-based association analysis
1. Strategies for genome-wide family-based association analysis
James Degnan (1), Jessica Su (1), Cliona Molony (2), Eric Schadt (2), Benjamin Raby (3), and Christoph Lange (1)
(1) Harvard School of Public Health, (2) Genetics, Rosetta Inpharmatics, (3) Harvard Medical School
2. Outline
- Introduction: the data
- Marker-phenotype combinations
- Screening algorithm
- a. Univariate case
- b. Multivariate case (FBAT-PC)
- Simulations
- Conclusions
3. The Data (from Rosetta)
- 287 individuals from 15 CEPH pedigrees (grandparents, parents, children), with family sizes between 13 and 17.
- 2322 SNP markers on each individual (270 microsatellites are also available but are not being used).
- 23,380 gene expression values measured on more than half of the individuals (167), with some missing values. Phenotypes do not include disease.
4. Strategy
- To find marker-phenotype associations, we want to look at a subset of the 2322 x 23,380 ≈ 54 million marker-phenotype combinations.
- To look for cis-acting SNPs, consider only SNPs within 1.0 Mb of a gene. Define the distance from a SNP to a gene as the minimum of the two distances from the marker to each of the two ends of the gene, or 0 if the SNP is in the gene.
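The distance rule above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names, the dictionary layout, and the use of base-pair coordinates on a shared chromosome are all assumptions.

```python
def snp_gene_distance(snp_pos, gene_start, gene_end):
    """Distance in base pairs from a SNP to a gene on the same chromosome.

    Returns 0 if the SNP lies inside the gene, otherwise the smaller of the
    distances to the two gene ends.
    """
    lo, hi = min(gene_start, gene_end), max(gene_start, gene_end)
    if lo <= snp_pos <= hi:
        return 0
    return min(abs(snp_pos - lo), abs(snp_pos - hi))


def cis_pairs(snps, genes, window=1_000_000):
    """Yield (snp_id, gene_id) pairs whose distance is at most `window` bp.

    Hypothetical input layout:
      snps  maps snp_id  -> (chromosome, position)
      genes maps gene_id -> (chromosome, start, end)
    """
    for snp_id, (snp_chrom, pos) in snps.items():
        for gene_id, (gene_chrom, start, end) in genes.items():
            if snp_chrom != gene_chrom:
                continue
            if snp_gene_distance(pos, start, end) <= window:
                yield snp_id, gene_id
```

The double loop is quadratic in the number of SNPs and genes; with 2322 SNPs and 23,380 genes that is still only about 54 million cheap comparisons.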
5. Marker-phenotype combinations
6. Screening Algorithm (Van Steen et al., 2005)
- Compute conditional power based on the conditional mean model for each marker-phenotype combination.
- Rank the combinations based on either conditional power or heritability. (Which is better?)
- Consider a marker-phenotype combination to be significant if its FBAT p-value is less than alpha and it has been screened.
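The rank-then-test logic can be sketched as below. This is an illustration under assumptions: the dictionary keys "id", "power", and "pvalue" are hypothetical, "screened" is read here as "among the top n_top combinations by conditional power", and the power and p-values are taken as already computed (for example by PBAT).

```python
def screen_and_test(results, n_top, alpha):
    """Rank by conditional power, then test only the top n_top combinations.

    `results` is a list of dicts with hypothetical keys:
      "id"     : marker-phenotype identifier
      "power"  : conditional power from the screening step
      "pvalue" : FBAT p-value
    Returns the ids that were selected by the screen and have pvalue < alpha.
    """
    ranked = sorted(results, key=lambda r: r["power"], reverse=True)
    selected = ranked[:n_top]
    return [r["id"] for r in selected if r["pvalue"] < alpha]
```

One possible choice, assumed here rather than stated on the slide, is a Bonferroni level over the selected tests only, for example `screen_and_test(results, n_top=10, alpha=0.05 / 10)`.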
7. Best SNPs sorted by informative families and heritability
9. Best SNPs sorted by conditional power only
10. Computational issues
- The amount of time it takes to run each iteration of PBAT is highly variable: it depends on the pattern of missingness in the marker data, particularly for the parents.
- PBAT takes the same amount of time for each marker, independent of the phenotype. If a marker proves too computationally difficult, we have to skip all phenotypes near that marker (16 on average, but up to 70).
11. Reconstructability of parental markers in CEPH pedigrees
12. Method for speeding up the screen
- Pedigrees with missing parental markers can take very long to analyze when parental genotypes cannot be inferred.
- A modification of the screening algorithm is to keep track of only the top K (say, 500) marker-phenotype combinations.
- For each marker-phenotype combination, if the parents have missing values that cannot be reconstructed, an estimate of the conditional power can be obtained by filling in the missing parental genotypes.
- If the estimated power is low compared to the top K power values, then this marker-phenotype combination can be skipped; otherwise it is computed normally, and the list of the K most powerful combinations is updated.
- The end result should be the same list of the K most powerful marker-phenotype combinations, although many combinations will not have had the conditional power computed using the original (missing) data.
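A minimal sketch of the top-K idea, using a min-heap so that the weakest member of the current top K is always available for comparison. The callables `cheap_power`, `exact_power`, and `needs_fill_in` are hypothetical stand-ins for the corresponding PBAT computations. The sketch also assumes that the filled-in estimate does not underestimate the true conditional power; without that, the final list could differ from a full run.

```python
import heapq


def top_k_screen(combos, cheap_power, exact_power, needs_fill_in, k=500):
    """Return the k combinations with the highest conditional power.

    combos        : iterable of marker-phenotype identifiers (e.g. strings)
    cheap_power   : fast estimate with missing parental genotypes filled in
    exact_power   : full, possibly slow, conditional power computation
    needs_fill_in : True when parents have missing genotypes that cannot
                    be reconstructed
    """
    heap = []  # min-heap of (power, combo); heap[0] is the weakest of the top k
    for combo in combos:
        if needs_fill_in(combo) and len(heap) == k:
            # Skip the slow computation when even the quick estimate
            # would not enter the current top k.
            if cheap_power(combo) <= heap[0][0]:
                continue
        power = exact_power(combo)
        if len(heap) < k:
            heapq.heappush(heap, (power, combo))
        elif power > heap[0][0]:
            heapq.heapreplace(heap, (power, combo))
    return sorted(heap, reverse=True)
```

Processing the easy combinations first would fill the heap early and let more of the difficult ones be skipped.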
13. FBAT-PC
- In this approach, there is one overall phenotype constructed for each marker. This phenotype is the linear combination of expression values within 1.0 Mb of the marker that maximizes heritability.
- In this approach, screening is done on the set of markers, rather than the set of marker-phenotype combinations, thus greatly reducing the number of power values to be screened.
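One way to make "the linear combination that maximizes heritability" concrete is sketched below. The notation is an assumption introduced for illustration and does not come from the slides: Y is the vector of m expression traits near the marker, \Sigma its phenotypic covariance matrix, X the marker genotype, a = (a_1, ..., a_m) the per-trait additive effects, and w the weight vector. The heritability ratio used here is a simplified form and may differ from the exact FBAT-PC definition.

```latex
\[
h^2(w) = \frac{\operatorname{Var}(X)\,(w^{\top} a)^2}{w^{\top} \Sigma\, w},
\qquad
w^{*} = \arg\max_{w} h^2(w) \;\propto\; \Sigma^{-1} a,
\qquad
h^2(w^{*}) = \operatorname{Var}(X)\; a^{\top} \Sigma^{-1} a .
\]
```

Under these assumptions the maximizer follows from the Cauchy-Schwarz inequality in the inner product defined by \Sigma: (w^T a)^2 <= (w^T \Sigma w)(a^T \Sigma^{-1} a), with equality when w is proportional to \Sigma^{-1} a.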
14. FBAT-PC Results (all models)
All three models

Obs  Marker      Allele  Freq     model  infofam  pvalue   power
  1  TSC0616740  2       0.49510  0      6        0.93620  0.022409
  2  TSC0301105  2       0.39306  0      6        0.36774  0.018697
  3  TSC0849086  1       0.25515  0      5        0.78279  0.010796
  4  TSC0050595  2       0.41500  0      6        0.10615  0.008893
  5  TSC1143002  1       0.17403  1      5        0.76342  0.008672
  6  TSC0606794  1       0.45939  0      5        0.39329  0.007477
  7  TSC0057231  1       0.27228  0      6        0.59724  0.007053
  8  TSC0448850  2       0.36022  0      6        0.39794  0.005986
  9  TSC0057231  1       0.27228  1      5        0.57135  0.005553
 10  TSC0078180  1       0.36585  0      5        0.23947  0.005377
 11  TSC0202163  1       0.48276  0      7        0.03660  0.004496
 12  TSC0686093  1       0.24384  0      6        0.62200  0.004218
 13  TSC0026698  2       0.35294  0      6        0.39119  0.004015
 14  TSC1082055  2       0.36000  1      5        0.12454  0.003550
 15  TSC0287131  2       0.43284  0      6        0.99423  0.003488
 16  TSC1027993  2       0.45098  0      7        0.45889  0.003411
 17  TSC1106988  2       0.48477  0      7        0.20923  0.003209
 18  TSC0336892  2       0.31281  0      6        0.85698  0.003146
 19  TSC0722384  2       0.42822  1      6        0.74845  0.003107
 20  TSC0113708  1       0.24755  0      6        0.86111  0.003094
15. FBAT-PC results (additive and dominant models)

Obs  Marker      Allele  Freq     model  infofam  pvalue   power
  1  TSC1143002  1       0.17403  1      5        0.76342  0.008671941
  2  TSC0057231  1       0.27228  1      5        0.57135  0.005553376
  3  TSC1082055  2       0.36000  1      5        0.12454  0.003549725
  4  TSC0722384  2       0.42822  1      6        0.74845  0.003106552
  5  TSC0930497  1       0.18687  1      6        0.43951  0.002934976
  6  TSC0300223  2       0.20297  1      5        0.02937  0.002838104
  7  TSC1021966  1       0.37685  1      6        0.87593  0.002657709
  8  TSC0127388  1       0.36318  1      5        0.16591  0.002651893
  9  TSC0507796  2       0.41707  1      5        0.80357  0.002615662
 10  TSC0130164  1       0.39791  1      5        0.78679  0.002544931
 11  TSC0221720  2       0.33659  1      6        0.29030  0.002276393
 12  TSC0873967  1       0.36683  1      5        0.63685  0.002158055
 13  TSC1085444  2       0.29268  1      5        0.32480  0.002111099
 14  TSC0925231  1       0.37745  1      5        0.72580  0.002102141
 15  TSC0483362  2       0.32250  1      5        0.03309  0.002047252
16. FBAT-PC interpretation
- Although no SNPs achieved genome-wide significance, a few SNPs were promising despite the low number of informative families.
- When the most promising SNPs based on FBAT-PC were analyzed using the univariate approach, the genes with the lowest p-values sometimes had low power.
17. Univariate analysis of most promising SNP based on FBAT-PC
18. Simulations
- Simulations were based on 200 trios, 1000 markers, and 10 gene expression values for each marker. 100 markers were considered genuinely cis-acting on 1 of the 10 genes near the marker (so 100 of the 10,000 marker-phenotype combinations are based on true cis-acting SNPs).
- Allele frequencies were drawn from a Unif(0.1, 0.5) distribution for each marker, and phenotypes were drawn from a N(aX, 1.0) distribution, where X is the number of A alleles and a is the effect size, assuming an additive model.
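The simulation design above could be sketched as follows. This is not the code used for the results: the function names, the effect size 0.3, the seed, and drawing parental genotypes under Hardy-Weinberg proportions are all assumptions made for illustration.

```python
import random


def simulate_marker(p, effects, n_trios, rng):
    """Simulate n_trios trios at one marker with A-allele frequency p.

    effects[g] is the additive effect `a` of the marker on gene g (0 for
    genes it does not act on). Returns a list of
    (father, mother, child, phenotypes), with genotypes coded as the
    number of A alleles (0, 1 or 2).
    """
    trios = []
    for _ in range(n_trios):
        # Parental alleles drawn independently (Hardy-Weinberg assumption).
        father = [rng.random() < p, rng.random() < p]
        mother = [rng.random() < p, rng.random() < p]
        # Mendelian transmission: one random allele from each parent.
        x = rng.choice(father) + rng.choice(mother)
        # Child phenotype for each gene: N(a * X, 1.0).
        phenotypes = [rng.gauss(a * x, 1.0) for a in effects]
        trios.append((sum(father), sum(mother), x, phenotypes))
    return trios


def simulate_study(n_trios=200, n_markers=1000, n_genes=10, n_causal=100,
                   effect=0.3, seed=0):
    """Return (study, truth): trios per marker and the set of true
    cis-acting (marker, gene) pairs."""
    rng = random.Random(seed)
    causal = set(rng.sample(range(n_markers), n_causal))
    study, truth = [], set()
    for m in range(n_markers):
        p = rng.uniform(0.1, 0.5)
        effects = [0.0] * n_genes
        if m in causal:
            g = rng.randrange(n_genes)  # the one gene this marker acts on
            effects[g] = effect
            truth.add((m, g))
        study.append(simulate_marker(p, effects, n_trios, rng))
    return study, truth
```

With the default arguments, `truth` contains 100 of the 1000 x 10 = 10,000 marker-phenotype combinations.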
19. Power of PBAT screen as a function of the number of markers
20. References
- Van Steen, K., et al. (2005). Genomic screening and replication using the same data set in family-based association. Nature Genetics 37: 683-691.
- Monks, S.A., et al. (2004). Genetic inheritance of gene expression in human cell lines. Am. J. Hum. Genet. 75: 1094-1105.