Statistical Issues and Methods in the Meta-analysis of Microarray Data

1
Statistical Issues and Methods in the
Meta-analysis of Microarray Data
  • Debashis Ghosh
  • Department of Biostatistics
  • University of Michigan
  • http://www.sph.umich.edu/ghoshd

2
Outline
  • Introduction
  • Data Example: Prostate Cancer Data
  • Statistical Issues
  • Proposed Method
  • Bioinformatic Investigations
  • Discussion

3
Introduction
  • Proliferation of related gene expression
    studies by several groups
  • Scientific goals typically center on the
    following:
      • Biomarker/biological gene discovery
      • Pathways

4
Cancer Gene Profiling Studies
5
cDNA and Tissue Microarray Paradigm
6
Introduction
  • Scientific interest in performing meta-analyses
    of gene expression data
  • Evidence:
      • Development of public gene expression data
        warehouses (e.g., Stanford Microarray Database,
        Gene Expression Omnibus at NCBI)
      • MIAME (Minimum Information About a Microarray
        Experiment): the minimum information needed
        to describe a microarray experiment
      • Common ontology/text markup language for
        microarray datasets

7
Example: Prostate Cancer Data
  • Four studies in which differential expression
    between cancerous and normal tissue was
    assessed using cDNA and Affymetrix
    oligonucleotide microarrays:
      • Dhanasekaran et al. (2001)
      • Luo et al. (2001)
      • Magee et al. (2001)
      • Welsh et al. (2001)

8
Background of studies
9
Affymetrix versus cDNA
  • Two different array technologies
  • cDNA: mRNA from the test and reference samples
    is labelled with red and green dyes; typically
    the ratio of the two channel intensities is used
  • Affymetrix: PM/MM (perfect match/mismatch) probe
    technology; the goal of the MM probes is to
    adjust for non-specific hybridization
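The two kinds of raw measurement can be summarized as follows. This is a minimal sketch. It assumes the red channel carries the test sample and the green channel the reference, which varies between labs. It uses the mean of PM - MM (the older Affymetrix "average difference" style) as one possible probe-set summary. The function names are hypothetical.

```python
import math
from statistics import mean


def cdna_log_ratio(red: float, green: float) -> float:
    """Log2 ratio of the two dye intensities for one cDNA spot.

    Assumes red = test sample and green = reference (lab-dependent).
    """
    return math.log2(red / green)


def affy_average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean PM - MM difference over the probe pairs of one probe set.

    The MM probe is meant to capture non-specific hybridization, so it
    is subtracted from the matching PM probe before averaging.
    """
    return mean(p - m for p, m in zip(pm, mm))


print(cdna_log_ratio(800.0, 200.0))                       # 2.0 (test is 4x reference)
print(affy_average_difference([10.0, 12.0], [4.0, 6.0]))  # 6.0
```

Because the two platforms produce values on different scales, the numbers above are not directly comparable across studies.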

10
Correlating Affy and cDNA data
  • Study by Kuo et al. (2002, Bioinformatics)
  • Dataset: NCI-60 cell line panel
  • Authors found poor correlation between the gene
    expression measurements from the two platforms
  • Conclusion: data from spotted cDNA microarrays
    could not be directly combined with data from
    synthesized oligonucleotide arrays
  • Our solution: use the within-study t-statistic
    for each gene as our data
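Using within-study t-statistics puts every platform on a common, unitless scale. The sketch below computes one t-statistic per gene within a single study. The slides do not say which t-statistic was used, so the Welch (unequal-variance) form is an assumption. The data layout and function names are hypothetical.

```python
from statistics import mean, variance


def t_statistic(cancer: list[float], normal: list[float]) -> float:
    """Welch two-sample t-statistic (cancer minus normal) for one gene."""
    se = (variance(cancer) / len(cancer) + variance(normal) / len(normal)) ** 0.5
    return (mean(cancer) - mean(normal)) / se


def study_t_stats(expr: dict[str, tuple[list[float], list[float]]]) -> dict[str, float]:
    """Map each gene to its t-statistic within one study.

    expr maps a gene identifier to (cancer values, normal values).
    """
    return {gene: t_statistic(c, n) for gene, (c, n) in expr.items()}


# Hypothetical gene and values, for illustration only.
print(study_t_stats({"GENE_A": ([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])}))
```

Each study is processed separately, so only these statistics (not raw intensities) need to be compared across platforms.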

11
Summary of t-statistic correlations
12
Missing Data
  • The same genes do not appear in all studies

13
Study clone description
14
Duplicate clones
  • Bioinformatic manipulations were used to
    determine the gene identity of each clone
  • 1119 duplicate spots in Dhanasekaran et al.
    (2001), 293 in Luo et al. (2001), 121 in Magee
    et al. (2001) and 757 in Welsh et al. (2001)
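The slides do not say how duplicate spots for the same gene were combined. One simple option, assumed here purely for illustration, is to average all spots that map to the same gene identifier. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_duplicates(spots: list[tuple[str, float]]) -> dict[str, float]:
    """Average the values of all spots mapped to the same gene identifier.

    spots is a list of (gene_id, value) pairs; averaging is an assumed
    rule, not necessarily the one used in the original analysis.
    """
    by_gene: dict[str, list[float]] = defaultdict(list)
    for gene_id, value in spots:
        by_gene[gene_id].append(value)
    return {gene_id: mean(values) for gene_id, values in by_gene.items()}


print(collapse_duplicates([("G1", 1.0), ("G1", 3.0), ("G2", 5.0)]))
```

Other rules, such as keeping the spot with the best quality flag, would slot into the same place.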

15
16
Missing Data
  • Complicated non-nested, non-monotone missing-data
    pattern
  • Assume data are missing at random (MAR)
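Across studies, a missing-data pattern is monotone (nested) when the studies can be ordered so that each study's gene set contains the next one's. The sketch below checks that property and counts how many studies measured each gene. The study and gene names are hypothetical.

```python
from collections import Counter


def is_monotone(gene_sets: dict[str, set[str]]) -> bool:
    """True if the studies' gene sets form a chain under inclusion.

    Sorting by size is enough: if any chain exists, the size ordering is it.
    """
    ordered = sorted(gene_sets.values(), key=len, reverse=True)
    return all(later <= earlier for earlier, later in zip(ordered, ordered[1:]))


def study_coverage(gene_sets: dict[str, set[str]]) -> Counter:
    """Number of studies in which each gene was measured."""
    return Counter(gene for genes in gene_sets.values() for gene in genes)


nested = {"A": {"g1", "g2", "g3"}, "B": {"g1", "g2"}}
overlapping = {"A": {"g1", "g2"}, "B": {"g2", "g3"}}
print(is_monotone(nested), is_monotone(overlapping))  # True False
```

When the check fails, as in the prostate data, simple sequential-imputation schemes for monotone patterns do not apply directly.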

17
Individual Study Analysis
  • Custom software written in Perl
  • Study-specific, gene-specific p-values were
    calculated using a random permutation t-test
  • The t-statistic for each gene was calculated and
    compared with 10,000 random t-statistics
    generated by randomly reassigning the sample
    labels to the gene's expression values
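The permutation procedure can be sketched as follows. The original software was written in Perl; this Python version is only an illustration. It reuses the assumed Welch t-statistic. Following the slides, the p-value is the fraction of permuted statistics strictly greater than the observed one. The function names and the seed parameter are hypothetical.

```python
import random
from statistics import mean, variance


def welch_t(x: list[float], y: list[float]) -> float:
    """Welch two-sample t-statistic (assumed form; the slides only say 't-test')."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se


def permutation_p_value(cancer: list[float], normal: list[float],
                        n_perm: int = 10_000, seed: int = 0) -> float:
    """Fraction of label-permuted t-statistics greater than the observed one."""
    rng = random.Random(seed)
    observed = welch_t(cancer, normal)
    pooled = list(cancer) + list(normal)
    k = len(cancer)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the sample labels
        if welch_t(pooled[:k], pooled[k:]) > observed:
            exceed += 1
    return exceed / n_perm
```

This test is one-sided (cancer > normal). Genes with lower expression in cancer would need the mirrored comparison.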

18
Individual Study Analysis
  • The p-value was the fraction of random
    t-statistics greater than the observed t-statistic
  • To calculate the q-value (gene-specific false
    discovery rate), genes were sorted by p-value;
    for each p-value, the number of genes expected
    at or better than it by chance (pn) was divided
    by the observed number (the gene's rank i)
  • $q = pn/i$, where p is the gene's p-value, n is
    the number of genes, and i is the gene's rank
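The formula q = pn/i translates directly into code. This sketch applies it exactly as stated, with no further adjustment. A common refinement, not described on the slides, takes a running minimum from the largest p-value downward so that q never decreases with rank. The function name is hypothetical.

```python
def q_values(p_values: dict[str, float]) -> dict[str, float]:
    """q = p * n / i, where i is the gene's rank (1 = smallest p) among n genes."""
    n = len(p_values)
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    return {gene: p * n / i for i, (gene, p) in enumerate(ranked, start=1)}


print(q_values({"a": 0.01, "b": 0.04, "c": 0.03, "d": 0.5}))
```

With these hypothetical inputs, gene "b" (q about 0.053) gets a smaller q than the better-ranked gene "c" (0.06). That is the non-monotonicity the refinement above would remove.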

19
Individual Study Analysis
  • q-value

20
Individual Study Analysis
21
Random Simulation
22
Meta-Analysis of Microarrays
  • Summary statistics (S) were computed from the
    p-values of the random permutation t-tests
  • Summary-statistic p-values were calculated by
    comparison with 100,000 summary statistics
    generated by randomly selecting one p-value from
    each individual study contributing to the
    meta-analysis
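The slides do not give the formula for S. The sketch below assumes a Fisher-style statistic, S = -Σ ln p, over the studies contributing to a gene. Larger S means stronger combined evidence. The reference distribution is built as described: draw one p-value at random from each contributing study and recompute S. P-values of exactly 0, which a permutation test can produce, are clamped to a hypothetical floor before taking logs. All names and defaults are illustrative.

```python
import math
import random


def summary_stat(p_values: list[float], floor: float = 1e-6) -> float:
    """Assumed summary statistic S = -sum(ln p); p is clamped at `floor`."""
    return -sum(math.log(max(p, floor)) for p in p_values)


def meta_p_value(gene_p: list[float], study_p_lists: list[list[float]],
                 n_draws: int = 100_000, seed: int = 0) -> float:
    """Fraction of randomly assembled S values at least as large as the gene's S.

    gene_p holds the gene's p-value in each contributing study, and
    study_p_lists holds all p-values from those same studies, in order.
    """
    rng = random.Random(seed)
    observed = summary_stat(gene_p)
    hits = 0
    for _ in range(n_draws):
        draw = [rng.choice(ps) for ps in study_p_lists]  # one p-value per study
        if summary_stat(draw) >= observed:
            hits += 1
    return hits / n_draws
```

Because each gene may appear in a different subset of studies, study_p_lists must be restricted to the studies in which that gene was measured.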

23
Meta-analysis of microarrays
  • As for individual studies, q-values were
    calculated and a ranked gene list was generated
  • Each gene in each study was normalized so that
    the mean expression of the benign samples
    equaled zero and their SD equaled one
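The normalization rescales each gene relative to its benign samples. This is a minimal sketch for one gene in one study. It assumes the sample (n - 1) standard deviation and a boolean flag marking benign samples. The function name is hypothetical.

```python
from statistics import mean, stdev


def normalize_to_benign(values: list[float], is_benign: list[bool]) -> list[float]:
    """Center and scale one gene so its benign samples have mean 0 and SD 1."""
    benign = [v for v, b in zip(values, is_benign) if b]
    mu, sd = mean(benign), stdev(benign)
    return [(v - mu) / sd for v in values]


print(normalize_to_benign([1.0, 2.0, 3.0, 10.0], [True, True, True, False]))
# [-1.0, 0.0, 1.0, 8.0]
```

The cancer samples are then expressed in units of benign-sample standard deviations, which makes heat maps comparable across studies.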

24
A Model for Microarray Meta-Analysis
25
Comparisons of Meta-Analyses
26
Meta-Analysis Heat Map
27
Meta-Analysis Heat Map
28
Dysregulation of Polyamine Biosynthesis in Prostate Cancer
29
Dysregulation of Purine Biosynthesis in Prostate Cancer
30
Acknowledgments
  • Department of Pathology, UM
  • Arul Chinnaiyan, M.D., Ph.D.
  • Dan Rhodes
  • Terry Barrette
  • References
  • Rhodes, D. et al. (2002). Cancer Research.
  • Ghosh, D. et al. (2003). Statistical issues and
    methods for meta-analysis of microarray data: a
    case study in prostate cancer. Available at
    http://www.sph.umich.edu/ghoshd/COMPBIO/