Strategies for genome-wide family-based association analysis James Degnan1, Jessica Su1, Cliona Molony

1
Strategies for genome-wide family-based association analysis
James Degnan1, Jessica Su1, Cliona Molony2, Eric Schadt2, Benjamin Raby3, and Christoph Lange1
1Harvard School of Public Health, 2Genetics, Rosetta Inpharmatics, 3Harvard Medical School
2
Outline

Introduction: the data
Marker-phenotype combinations
Screening algorithm
a. Univariate case
b. Multivariate case (FBAT-PC)
Simulations
Conclusions

3
The Data (from Rosetta)

287 individuals from 15 CEPH pedigrees
(grandparents, parents, children) with family
sizes between 13 and 17.
2322 SNP markers (plus 270 microsatellites, not
used here) on each individual.
23,380 gene expression values measured on more than
half of the individuals (167), with some missing values.
Phenotypes do not include disease.

4
Strategy

To find marker-phenotype associations, we want to
look at a subset of the
2322 x 23,380 ≈ 54 million marker-phenotype
combinations.
To look for cis-acting SNPs, consider only SNPs
within 1.0 Mb of a gene. Define distance of SNP
to gene to be the minimum of the two distances
from the marker to each of the two ends of the
gene, or 0 if the SNP is in the gene.
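
The distance rule above can be sketched in a few lines. This is only an illustration: the coordinate layout (base-pair positions, one chromosome label per SNP and gene), the function names, and the choice to treat "within 1.0 Mb" as inclusive are assumptions, not details given on the slide.

```python
CIS_WINDOW_BP = 1_000_000  # 1.0 Mb window from the slide

def snp_gene_distance(snp_pos, gene_start, gene_end):
    """Return 0 if the SNP lies inside the gene, otherwise the
    distance to the nearer of the two gene ends."""
    if gene_start <= snp_pos <= gene_end:
        return 0
    return min(abs(snp_pos - gene_start), abs(snp_pos - gene_end))

def cis_pairs(snps, genes, window=CIS_WINDOW_BP):
    """List (snp, gene) pairs on the same chromosome within the window.

    snps:  {name: (chrom, pos)}          (hypothetical layout)
    genes: {name: (chrom, start, end)}   (hypothetical layout)
    """
    pairs = []
    for snp, (snp_chrom, pos) in snps.items():
        for gene, (gene_chrom, start, end) in genes.items():
            if snp_chrom != gene_chrom:
                continue  # assumed: cis pairs share a chromosome
            if snp_gene_distance(pos, start, end) <= window:
                pairs.append((snp, gene))
    return pairs
```

Restricting to such pairs is what shrinks the roughly 54 million combinations to a manageable subset.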

5
Marker-phenotype combinations
6
Screening Algorithm (Van Steen et al., 2005)

Compute conditional power based on conditional
mean model for each marker-phenotype combination.
Rank the combinations based on either conditional
power or heritability. (Which is better?)
Consider a marker-phenotype combination to be
significant if its FBAT p-value is less than
alpha and it has been screened.
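
A minimal sketch of the screen-then-test logic described above, assuming the conditional power, heritability, and FBAT p-value of each combination have already been computed elsewhere (for example by PBAT). The dictionary keys, the function name, and the cutoff `k` are hypothetical; how alpha should be adjusted for the screened tests is left to the caller.

```python
def screen_and_test(combos, k, alpha, rank_key="power"):
    """Keep the k combinations ranked highest by `rank_key`
    ("power" or "heritability"), then report those whose
    FBAT p-value is below alpha.

    combos: list of dicts with keys "power", "heritability", "fbat_p"
            (a hypothetical layout for precomputed results).
    """
    ranked = sorted(combos, key=lambda c: c[rank_key], reverse=True)
    screened = ranked[:k]  # only screened combinations are tested
    return [c for c in screened if c["fbat_p"] < alpha]
```

Passing `rank_key="heritability"` instead of `"power"` gives the alternative ranking, which makes it easy to compare the two on the same results.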

7
Best SNPs sorted by informative families and
heritability
8
(No Transcript)
9
Best SNPs sorted by conditional power only
10
Computational issues

The amount of time it takes to run each iteration
of PBAT is highly variable: it depends on the
pattern of missingness in the marker data,
particularly for the parents.
PBAT takes the same amount of time for each
marker, independent of the phenotype. If a
marker proves too computationally difficult, we
have to skip all phenotypes near that marker (16
on average, but up to 70).

11
Reconstructability of parental markers in CEPH
pedigrees
12
Method for speeding-up screen

Pedigrees with missing parental markers can take
very long to analyze when parental genotypes
cannot be inferred.
A modification of the screening algorithm is to
only keep track of the top K (say, 500)
marker-phenotype combinations.
For each marker-phenotype, if the parents have
missing values that cannot be reconstructed, an
estimate of the conditional power can be obtained
by filling in the missing parental genotypes.
If the power is low compared to the top K
power-values, then this marker-phenotype
combination can be skipped; otherwise it is
computed normally, and the list of the K most
powerful combinations is updated.
The end result should be the same list of the K
most powerful marker-phenotype combinations,
although many combinations will not have had the
conditional power computed using the original
(missing) data.
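
The top-K shortcut can be sketched with a min-heap. The three callables (`needs_slow_path`, `quick_power`, `full_power`) are hypothetical stand-ins for "parents have unreconstructable missing genotypes", "power estimated after filling in parental genotypes", and "power computed normally". The sketch only reproduces the exact top-K list if the quick estimate does not understate the true power, which is an assumption here.

```python
import heapq

def top_k_screen(combos, k, needs_slow_path, quick_power, full_power):
    """Return the k most powerful combinations and the number of
    full power computations that were actually run."""
    heap = []     # min-heap of (power, index); heap[0] is the weakest kept
    n_full = 0
    for idx, combo in enumerate(combos):
        if needs_slow_path(combo) and len(heap) == k:
            # Cheap estimate first: skip if it cannot enter the top k.
            if quick_power(combo) <= heap[0][0]:
                continue
        power = full_power(combo)
        n_full += 1
        if len(heap) < k:
            heapq.heappush(heap, (power, idx))
        elif power > heap[0][0]:
            heapq.heapreplace(heap, (power, idx))
    ranked = sorted(heap, reverse=True)  # most powerful first
    return [combos[i] for _, i in ranked], n_full
```

The saving comes from the skipped calls to `full_power`, which is the expensive step when parental genotypes cannot be inferred.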

13
FBAT-PC

In this approach, there is one overall phenotype
constructed for each marker. This phenotype is
the linear combination of expression values
within 1.0 Mb of the marker that maximizes
heritability.
In this approach, screening is done on the set of
markers, rather than the set of marker-phenotype
combinations, thus greatly reducing the number of
power values to be screened.
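
As a rough illustration of "the linear combination that maximizes heritability", the sketch below maximizes the ratio w'Gw / w'Pw by power iteration on P^-1 G. It assumes that G (covariance of the expression values attributable to the marker) and P (total phenotypic covariance, positive definite) have already been estimated; how FBAT-PC actually estimates them is not shown on the slide, and all names here are hypothetical.

```python
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def quad_form(A, v):
    return sum(x * y for x, y in zip(v, mat_vec(A, v)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def max_heritability_weights(G, P, n_iter=500):
    """Return (w, ratio), where w approximately maximizes w'Gw / w'Pw.

    Assumes G is nonzero and the all-ones start vector is not
    orthogonal to the leading direction.
    """
    w = [1.0] * len(G)
    for _ in range(n_iter):
        w = solve(P, mat_vec(G, w))          # one step of P^-1 G
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w, quad_form(G, w) / quad_form(P, w)
```

The resulting weights define a single combined phenotype per marker, which is why only one power value per marker needs to be screened.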

14
FBAT-PC Results (all models)
All three models

Obs  Marker      Allele  Freq     model  infofam  pvalue   power
  1  TSC0616740  2       0.49510  0      6        0.93620  0.022409
  2  TSC0301105  2       0.39306  0      6        0.36774  0.018697
  3  TSC0849086  1       0.25515  0      5        0.78279  0.010796
  4  TSC0050595  2       0.41500  0      6        0.10615  0.008893
  5  TSC1143002  1       0.17403  1      5        0.76342  0.008672
  6  TSC0606794  1       0.45939  0      5        0.39329  0.007477
  7  TSC0057231  1       0.27228  0      6        0.59724  0.007053
  8  TSC0448850  2       0.36022  0      6        0.39794  0.005986
  9  TSC0057231  1       0.27228  1      5        0.57135  0.005553
 10  TSC0078180  1       0.36585  0      5        0.23947  0.005377
 11  TSC0202163  1       0.48276  0      7        0.03660  0.004496
 12  TSC0686093  1       0.24384  0      6        0.62200  0.004218
 13  TSC0026698  2       0.35294  0      6        0.39119  0.004015
 14  TSC1082055  2       0.36000  1      5        0.12454  0.003550
 15  TSC0287131  2       0.43284  0      6        0.99423  0.003488
 16  TSC1027993  2       0.45098  0      7        0.45889  0.003411
 17  TSC1106988  2       0.48477  0      7        0.20923  0.003209
 18  TSC0336892  2       0.31281  0      6        0.85698  0.003146
 19  TSC0722384  2       0.42822  1      6        0.74845  0.003107
 20  TSC0113708  1       0.24755  0      6        0.86111  0.003094
15
FBAT-PC results (additive and dominant models)

Obs  Marker      Allele  Freq     model  infofam  pvalue   power
  1  TSC1143002  1       0.17403  1      5        0.76342  0.008671941
  2  TSC0057231  1       0.27228  1      5        0.57135  0.005553376
  3  TSC1082055  2       0.36000  1      5        0.12454  0.003549725
  4  TSC0722384  2       0.42822  1      6        0.74845  0.003106552
  5  TSC0930497  1       0.18687  1      6        0.43951  0.002934976
  6  TSC0300223  2       0.20297  1      5        0.02937  0.002838104
  7  TSC1021966  1       0.37685  1      6        0.87593  0.002657709
  8  TSC0127388  1       0.36318  1      5        0.16591  0.002651893
  9  TSC0507796  2       0.41707  1      5        0.80357  0.002615662
 10  TSC0130164  1       0.39791  1      5        0.78679  0.002544931
 11  TSC0221720  2       0.33659  1      6        0.29030  0.002276393
 12  TSC0873967  1       0.36683  1      5        0.63685  0.002158055
 13  TSC1085444  2       0.29268  1      5        0.32480  0.002111099
 14  TSC0925231  1       0.37745  1      5        0.72580  0.002102141
 15  TSC0483362  2       0.32250  1      5        0.03309  0.002047252
16
FBAT-PC interpretation

Although no SNPs achieved genome-wide
significance, a few SNPs were promising despite
the low number of informative families.
When the most promising SNPs based on FBAT-PC were
analyzed using the univariate approach, the genes
with the lowest p-values sometimes had low power.

17
Univariate analysis of most promising SNP based
on FBAT-PC
18
Simulations

Simulations were based on
200 trios, 1000 markers, and 10 gene expression
values for each marker. 100 markers were
considered genuinely cis-acting on 1 of the 10
genes near the marker (so 1% of the 10,000
marker-phenotype combinations are based on true
cis-acting SNPs).
Allele frequencies were drawn from a
Unif(0.1,0.5) distribution for each marker, and
phenotypes were drawn from a N(aX,1.0)
distribution, where X is the number of A alleles,
and a is the effect size assuming an additive
model.
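
The simulation design for a single marker might look like the sketch below. Several details are assumptions rather than statements from the slides: parental genotypes are drawn under Hardy-Weinberg proportions, the phenotype belongs to the child, and the non-causal expression values are drawn from N(0, 1). The function and key names are hypothetical.

```python
import random

def simulate_marker(n_trios=200, n_genes=10, effect=0.0,
                    causal_gene=None, rng=None):
    """Simulate one marker in n_trios trios with n_genes phenotypes.

    Returns (allele_freq, trios); genotypes are counts of the A allele.
    """
    rng = rng or random.Random()
    p = rng.uniform(0.1, 0.5)            # allele frequency ~ Unif(0.1, 0.5)

    def allele():
        return 1 if rng.random() < p else 0

    trios = []
    for _ in range(n_trios):
        father = (allele(), allele())
        mother = (allele(), allele())
        # Mendelian transmission: one allele from each parent.
        x = rng.choice(father) + rng.choice(mother)
        phenotypes = [
            rng.gauss(effect * x if g == causal_gene else 0.0, 1.0)
            for g in range(n_genes)      # N(aX, 1) for the causal gene
        ]
        trios.append({"father": sum(father), "mother": sum(mother),
                      "child": x, "phenotypes": phenotypes})
    return p, trios
```

Calling this once per marker, with `causal_gene` set for 100 of the 1000 markers and left as `None` for the rest, would give the 10,000 marker-phenotype combinations described above.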

19
Power of PBAT screen as a function of the number
of markers
20
References

Van Steen, K., et al. (2005). Genomic screening and
replication using the same data set in
family-based association testing. Nature Genetics 37:
683-691.
Monks, S.A., et al. (2004). Genetic inheritance of gene
expression in human cell lines. Am. J. Hum. Genet.
75: 1094-1105.
