Strategies for genome-wide family-based association analysis

1
Strategies for genome-wide family-based association analysis
James Degnan1, Jessica Su1, Cliona Molony2, Eric Schadt2, Benjamin Raby3, and Christoph Lange1
1Harvard School of Public Health, 2Genetics, Rosetta Inpharmatics, 3Harvard Medical School
2
Outline
  • Introduction: the data
  • Marker-phenotype combinations
  • Screening algorithm
      a. Univariate case
      b. Multivariate case (FBAT-PC)
  • Simulations
  • Conclusions

3
The Data (from Rosetta)
  • 287 individuals from 15 CEPH pedigrees
    (grandparents, parents, children), with family
    sizes between 13 and 17.
  • 2322 SNP markers (also 270 microsatellites, not
    used here) on each individual.
  • 23,380 gene expression values measured on more
    than half of the individuals (167), with some
    missing values. Phenotypes do not include disease.

4
Strategy
  • To find marker-phenotype associations, we want to
    look at a subset of the 2322 x 23,380 ≈ 54 million
    marker-phenotype combinations.
  • To look for cis-acting SNPs, consider only SNPs
    within 1.0 Mb of a gene. Define the distance from
    a SNP to a gene as the smaller of the two
    distances from the marker to the two ends of the
    gene, or 0 if the SNP is in the gene.
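
The distance rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the dictionary layout, and the requirement that SNP and gene share a chromosome are assumptions made for the example.

```python
def snp_gene_distance(snp_pos, gene_start, gene_end):
    """Distance in bp from a SNP to a gene; 0 if the SNP lies inside the gene."""
    lo, hi = min(gene_start, gene_end), max(gene_start, gene_end)
    if lo <= snp_pos <= hi:
        return 0
    # Otherwise take the smaller of the distances to the two gene ends.
    return min(abs(snp_pos - lo), abs(snp_pos - hi))


def cis_pairs(snps, genes, window=1_000_000):
    """Return (snp_id, gene_id) pairs within `window` bp of each other.

    snps:  {snp_id: (chromosome, position)}            (hypothetical layout)
    genes: {gene_id: (chromosome, start, end)}         (hypothetical layout)
    Assumption: only same-chromosome pairs are candidates.
    """
    pairs = []
    for snp_id, (s_chrom, s_pos) in snps.items():
        for gene_id, (g_chrom, g_start, g_end) in genes.items():
            if s_chrom != g_chrom:
                continue
            if snp_gene_distance(s_pos, g_start, g_end) <= window:
                pairs.append((snp_id, gene_id))
    return pairs
```

A nested loop is enough at this scale; for much larger marker sets, sorting positions per chromosome and scanning a window would avoid comparing every SNP with every gene.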

5
Marker-phenotype combinations
6
Screening Algorithm (Van Steen et al., 2005)
  • Compute conditional power based on conditional
    mean model for each marker-phenotype combination.
  • Rank the combinations based on either conditional
    power or heritability. (Which is better?)
  • Consider a marker-phenotype combination to be
    significant if it was selected by the screen and
    its FBAT p-value is less than alpha.
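
The three steps above can be sketched as a rank-then-test routine. This is only an outline under stated assumptions: the conditional power and FBAT p-value of each combination are taken as already computed (in practice by PBAT/FBAT), and the function name, dictionary keys, and `top_k` parameter are hypothetical. The threshold `alpha` is passed in as-is; any adjustment for the number of screened combinations is left to the caller.

```python
def screen_and_test(combos, top_k, alpha):
    """Select the top_k combinations by conditional power, then test them.

    combos: list of dicts with keys 'id', 'power', 'pvalue' (hypothetical
            layout; the values are assumed to come from PBAT/FBAT runs).
    Returns the ids of screened combinations whose p-value is below alpha.
    """
    # Step 1-2: rank by the screening statistic (here, conditional power).
    ranked = sorted(combos, key=lambda c: c["power"], reverse=True)
    screened = ranked[:top_k]
    # Step 3: only screened combinations are eligible to be called significant.
    return [c["id"] for c in screened if c["pvalue"] < alpha]
```

Ranking by heritability instead only changes the sort key.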

7
Best SNPs sorted by informative families and
heritability
9
Best SNPs sorted by conditional power only
10
Computational issues
  • The amount of time it takes to run each iteration
    of PBAT is highly variable: it depends on the
    pattern of missingness in the marker data,
    particularly for the parents.
  • PBAT takes the same amount of time for each
    marker, independent of the phenotype. If a
    marker proves too computationally difficult, we
    have to skip all phenotypes near that marker (16
    on average, but up to 70).

11
Reconstructability of parental markers in CEPH
pedigrees
12
Method for speeding up the screen
  • Pedigrees with missing parental markers can take
    very long to analyze when parental genotypes
    cannot be inferred.
  • A modification of the screening algorithm is to
    only keep track of the top K (say, 500)
    marker-phenotype combinations.
  • For each marker-phenotype, if the parents have
    missing values that cannot be reconstructed, an
    estimate of the conditional power can be obtained
    by filling in the missing parental genotypes.
  • If the power is low compared to the top K
    power values, then this marker-phenotype
    combination can be skipped; otherwise it is
    computed normally, and the list of the K most
    powerful combinations is updated.
  • The end result should be the same list of the K
    most powerful marker-phenotype combinations,
    although many combinations will not have had the
    conditional power computed using the original
    (missing) data.
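
The top-K shortcut described above can be sketched with a min-heap. The three callables `is_slow`, `quick_power`, and `exact_power` are hypothetical stand-ins for "parents have unreconstructable missing genotypes", "power estimated after filling in parental genotypes", and "power computed normally"; none of them is a PBAT function. The sketch returns the same list as a full computation only under the added assumption that the quick estimate never falls below the exact value.

```python
import heapq


def top_k_screen(combos, k, is_slow, quick_power, exact_power):
    """Keep the k combinations with the highest conditional power.

    For slow combinations, a cheap estimate is checked first; the exact
    computation is skipped when the estimate cannot enter the current top k.
    """
    heap = []  # min-heap of (power, index, combo); heap[0] is the weakest kept
    for idx, combo in enumerate(combos):
        if len(heap) == k and is_slow(combo) and quick_power(combo) < heap[0][0]:
            continue  # too weak to displace anything in the top k
        power = exact_power(combo)
        item = (power, idx, combo)  # idx breaks ties so combos are never compared
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif power > heap[0][0]:
            heapq.heapreplace(heap, item)
    # Report the survivors from most to least powerful.
    return [combo for _, _, combo in sorted(heap, reverse=True)]
```

The savings grow as the list fills with strong combinations, since later slow combinations are more likely to be rejected on the cheap estimate alone.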

13
FBAT-PC
  • In this approach, there is one overall phenotype
    constructed for each marker. This phenotype is
    the linear combination of expression values
    within 1.0 Mb of the marker that maximizes
    heritability.
  • In this approach, screening is done on the set of
    markers, rather than the set of marker-phenotype
    combinations, thus greatly reducing the number of
    power values to be screened.
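
One common way to formalize "the linear combination that maximizes heritability" is as a generalized eigenvalue problem. The notation below is an assumption introduced for illustration and is not taken from the slides: y is the vector of expression values within 1.0 Mb of the marker, Σ_G its genetic covariance matrix, Σ_P its total phenotypic covariance matrix, and w the weight vector.

```latex
% Heritability of the combined phenotype w^T y (notation assumed, see lead-in)
h^2(w) = \frac{w^{\top} \Sigma_G \, w}{w^{\top} \Sigma_P \, w},
\qquad
w^{*} = \arg\max_{w \neq 0} \; h^2(w).

% The maximizer is the leading generalized eigenvector:
\Sigma_G \, w^{*} = \lambda_{\max} \, \Sigma_P \, w^{*},
\qquad
h^2(w^{*}) = \lambda_{\max}.
```

Under this formulation each marker contributes a single phenotype, w*ᵀy, so the screen ranks one power value per marker.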

14
FBAT-PC Results (all models)
All three models

Obs  Marker      Allele  Freq     Model  Infofam  P-value  Power
  1  TSC0616740  2       0.49510  0      6        0.93620  0.022409
  2  TSC0301105  2       0.39306  0      6        0.36774  0.018697
  3  TSC0849086  1       0.25515  0      5        0.78279  0.010796
  4  TSC0050595  2       0.41500  0      6        0.10615  0.008893
  5  TSC1143002  1       0.17403  1      5        0.76342  0.008672
  6  TSC0606794  1       0.45939  0      5        0.39329  0.007477
  7  TSC0057231  1       0.27228  0      6        0.59724  0.007053
  8  TSC0448850  2       0.36022  0      6        0.39794  0.005986
  9  TSC0057231  1       0.27228  1      5        0.57135  0.005553
 10  TSC0078180  1       0.36585  0      5        0.23947  0.005377
 11  TSC0202163  1       0.48276  0      7        0.03660  0.004496
 12  TSC0686093  1       0.24384  0      6        0.62200  0.004218
 13  TSC0026698  2       0.35294  0      6        0.39119  0.004015
 14  TSC1082055  2       0.36000  1      5        0.12454  0.003550
 15  TSC0287131  2       0.43284  0      6        0.99423  0.003488
 16  TSC1027993  2       0.45098  0      7        0.45889  0.003411
 17  TSC1106988  2       0.48477  0      7        0.20923  0.003209
 18  TSC0336892  2       0.31281  0      6        0.85698  0.003146
 19  TSC0722384  2       0.42822  1      6        0.74845  0.003107
 20  TSC0113708  1       0.24755  0      6        0.86111  0.003094
15
FBAT-PC results (additive and dominant models)
Obs  Marker      Allele  Freq     Model  Infofam  P-value  Power
  1  TSC1143002  1       0.17403  1      5        0.76342  0.008671941
  2  TSC0057231  1       0.27228  1      5        0.57135  0.005553376
  3  TSC1082055  2       0.36000  1      5        0.12454  0.003549725
  4  TSC0722384  2       0.42822  1      6        0.74845  0.003106552
  5  TSC0930497  1       0.18687  1      6        0.43951  0.002934976
  6  TSC0300223  2       0.20297  1      5        0.02937  0.002838104
  7  TSC1021966  1       0.37685  1      6        0.87593  0.002657709
  8  TSC0127388  1       0.36318  1      5        0.16591  0.002651893
  9  TSC0507796  2       0.41707  1      5        0.80357  0.002615662
 10  TSC0130164  1       0.39791  1      5        0.78679  0.002544931
 11  TSC0221720  2       0.33659  1      6        0.29030  0.002276393
 12  TSC0873967  1       0.36683  1      5        0.63685  0.002158055
 13  TSC1085444  2       0.29268  1      5        0.32480  0.002111099
 14  TSC0925231  1       0.37745  1      5        0.72580  0.002102141
 15  TSC0483362  2       0.32250  1      5        0.03309  0.002047252
16
FBAT-PC interpretation
  • Although no SNPs achieved genome-wide
    significance, a few SNPs were promising despite
    the low number of informative families.
  • When the most promising SNPs based on FBAT-PC
    were analyzed using the univariate approach, the
    genes with the lowest p-values sometimes had low
    power.

17
Univariate analysis of most promising SNP based
on FBAT-PC
18
Simulations
  • Simulations were based on 200 trios, 1000
    markers, and 10 gene expression values for each
    marker. 100 markers were considered genuinely
    cis-acting on 1 of the 10 genes near the marker
    (so 1% of the 10,000 marker-phenotype
    combinations are based on true cis-acting SNPs).
  • Allele frequencies were drawn from a
    Unif(0.1,0.5) distribution for each marker, and
    phenotypes were drawn from a N(aX,1.0)
    distribution, where X is the number of A alleles,
    and a is the effect size assuming an additive
    model.
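
The simulation design above can be sketched for a single marker as follows. This is an illustrative outline, not the authors' simulation code: the function name and arguments are hypothetical, parental genotypes are drawn assuming Hardy-Weinberg proportions, the child receives one randomly chosen allele from each parent, and non-causal genes are given effect size 0.

```python
import random


def simulate_marker(n_trios, n_genes, effect, causal_gene, rng):
    """Simulate one marker in n_trios trios with n_genes nearby expression traits.

    causal_gene: index of the gene the marker acts on, or None for a null marker.
    Returns (allele_frequency, trios), where each trio is
    (father_count, mother_count, child_count, child_phenotypes).
    """
    freq = rng.uniform(0.1, 0.5)  # allele frequency ~ Unif(0.1, 0.5)
    trios = []
    for _ in range(n_trios):
        # Each parent carries two alleles; 1 marks the A allele (HWE assumed).
        father = [int(rng.random() < freq) for _ in range(2)]
        mother = [int(rng.random() < freq) for _ in range(2)]
        # Mendelian transmission: one allele from each parent.
        x = rng.choice(father) + rng.choice(mother)
        # Phenotype ~ N(a * X, 1) for the causal gene, N(0, 1) otherwise.
        phenos = [
            rng.gauss(effect * x if g == causal_gene else 0.0, 1.0)
            for g in range(n_genes)
        ]
        trios.append((sum(father), sum(mother), x, phenos))
    return freq, trios
```

Calling this 1000 times with `n_trios=200` and `n_genes=10`, passing a gene index for 100 of the markers and `None` for the rest, reproduces the layout described on the slide; the effect size `a` is left as a free parameter because the slide does not state a value.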

19
Power of PBAT screen as a function of the number
of markers
20
References
  • Van Steen, K., et al. (2005). Genomic screening
    and replication using the same data set in
    family-based association. Nature Genetics
    37:683-691.
  • Monks, S.A., et al. (2004). Genetic inheritance
    of gene expression in human cell lines. Am. J.
    Hum. Genet. 75:1094-1105.