Title: Statistical Issues and Methods in the Meta-analysis of Microarray Data
1Statistical Issues and Methods in the Meta-analysis of Microarray Data
- Debashis Ghosh
- Department of Biostatistics
- University of Michigan
- http://www.sph.umich.edu/ghoshd
2Outline
- Introduction
- Data Example: Prostate Cancer Data
- Statistical Issues
- Proposed Method
- Bioinformatic Investigations
- Discussion
3Introduction
- Proliferation of related gene expression studies by several groups
- Scientific goals typically center on the following questions:
- Biomarker/biological gene discovery
- Pathways
4Cancer Gene Profiling Studies
5cDNA and Tissue Microarray Paradigm
6Introduction
- Scientific interest in performing meta-analyses of gene expression data
- Evidence
- Development of public gene expression data warehouses (e.g. Stanford Microarray Database, Gene Expression Omnibus at NCBI)
- MIAME: minimum amount of information necessary for a microarray experiment
- Common ontology/text markup language for microarray datasets
7Example: Prostate Cancer Data
- Four studies in which differential expression between cancerous tissue and normal tissue was assessed using cDNA microarrays
- Dhanasekaran et al. (2001)
- Luo et al. (2001)
- Magee et al. (2001)
- Welsh et al. (2001)
8Background of studies
9Affymetrix versus cDNA
- Two different types of technologies
- cDNA: mRNA (test and reference) labelled with red and green dye; typically the ratio of intensities for the two channels is used
- Affymetrix: PM/MM technology; the goal of MM is to adjust for non-specific hybridization
10Correlating Affy and cDNA data
- Study by Kuo et al. (2002, Bioinformatics)
- Dataset: NCI 60 cell line dataset
- Authors found poor correlation between the gene expression measurements
- Conclusion: data from spotted cDNA microarrays could not be directly combined with data from synthesized oligonucleotide arrays
- Our solution: start with within-study t-statistics for each gene as our data
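A minimal sketch of the within-study t-statistic idea, one value per gene per study. The slides do not say which form of the t-statistic was used, so the Welch (unequal-variance) form here is an assumption; the gene names, expression values, and the helper name `t_statistic` are hypothetical.

```python
import math
import statistics

def t_statistic(cancer, normal):
    """Welch two-sample t-statistic for one gene (assumed form, not from the slides)."""
    mean_diff = statistics.mean(cancer) - statistics.mean(normal)
    std_err = math.sqrt(statistics.variance(cancer) / len(cancer)
                        + statistics.variance(normal) / len(normal))
    return mean_diff / std_err

# Hypothetical expression values for two genes within a single study.
study = {
    "GENE_A": {"cancer": [2.1, 2.5, 1.9, 2.8], "normal": [0.9, 1.2, 1.0, 1.1]},
    "GENE_B": {"cancer": [1.0, 1.3, 0.8, 1.1], "normal": [1.1, 0.9, 1.2, 1.0]},
}

# One t-statistic per gene; these, not the raw intensities, are carried forward.
t_by_gene = {gene: t_statistic(v["cancer"], v["normal"]) for gene, v in study.items()}
```

Because the t-statistic is unit-free, values from a cDNA study and an Affymetrix study can be placed side by side even when the raw measurements cannot.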
11Summary of t-statistic correlations
12Missing Data
- The same genes do not appear in all studies
13Study clone description
14Duplicate clones
- Bioinformatic manipulations to determine genetic identity for each of the genes
- 1119 duplicate spots in Dhanasekaran et al. (2001), 293 in Luo et al. (2001), 121 in Magee et al. (2001) and 757 in Welsh et al. (2001)
16Missing Data
- Complicated nonnested, nonmonotone missing data pattern
- Assume data are missing at random (MAR)
17Individual Study Analysis
- Custom software written in Perl
- Study-specific, gene-specific p-values calculated using random permutation t-test
- T-stat for an individual gene was calculated and compared to 10,000 random t-stats generated by randomly assigning the sample labels to the gene's expression values
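The permutation t-test described in these slides can be sketched as follows. The original software was written in Perl and is not shown, so this Python version is only an illustration: the Welch form of the t-statistic, the function names, the seed, and the data are all assumptions. The p-value is one-sided, the fraction of random t-stats strictly greater than the actual one, as the slides state.

```python
import math
import random
import statistics

def t_statistic(group_a, group_b):
    """Welch two-sample t-statistic (assumed form)."""
    std_err = math.sqrt(statistics.variance(group_a) / len(group_a)
                        + statistics.variance(group_b) / len(group_b))
    if std_err == 0:
        return 0.0  # guard for constant groups; not part of the described method
    return (statistics.mean(group_a) - statistics.mean(group_b)) / std_err

def permutation_p_value(cancer, normal, n_perm=10_000, seed=0):
    """Fraction of label-shuffled t-stats that exceed the observed t-stat."""
    rng = random.Random(seed)
    observed = t_statistic(cancer, normal)
    pooled = list(cancer) + list(normal)
    n_cancer = len(cancer)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign sample labels
        if t_statistic(pooled[:n_cancer], pooled[n_cancer:]) > observed:
            exceed += 1
    return exceed / n_perm

# Hypothetical expression values for one gene in one study.
cancer = [2.1, 2.5, 1.9, 2.8, 2.4]
normal = [0.9, 1.2, 1.0, 1.1, 1.3]
p_value = permutation_p_value(cancer, normal)
```

With only a handful of samples the number of distinct label assignments is small, so the attainable p-values are coarse regardless of how many permutations are drawn.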
18Individual Study Analysis
- P-value then equaled the fraction of random t-stats that were greater than the actual t-stat.
- To calculate the q-value (or gene-specific false discovery rate), genes were sorted by p-value, and then the ratio of the expected to the observed number of occurrences at or better than each p-value was computed
- q = (p × n) / i, where p is the p-value, n is the total number of genes, and i is the rank of the gene in the sorted list
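A short sketch of the q-value calculation, reading the slide's formula as q = (p × n) / i with n the number of genes and i the 1-based rank of the gene after sorting by p-value. That reading, the function name `q_values`, and the p-values below are assumptions. No monotonicity adjustment or cap at 1 is applied, since the slides mention neither.

```python
def q_values(p_by_gene):
    """Return q = (p * n) / rank for each gene, ranking by ascending p-value."""
    n = len(p_by_gene)
    ranked = sorted(p_by_gene.items(), key=lambda item: item[1])
    # p * n is the expected number of genes at or below p under the null;
    # rank is the number actually observed at or below p.
    return {gene: (p * n) / rank
            for rank, (gene, p) in enumerate(ranked, start=1)}

# Hypothetical permutation p-values for four genes.
p_by_gene = {"GENE_A": 0.001, "GENE_B": 0.01, "GENE_C": 0.04, "GENE_D": 0.5}
q_by_gene = q_values(p_by_gene)
```

For the best-ranked gene the q-value is simply p × n, which is why a gene needs a very small p-value to stand out when thousands of genes are tested.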
19Individual Study Analysis
20Individual Study Analysis
21Random Simulation
22 Meta-Analysis of Microarrays
- p-value summary statistics (S) were computed, using the p-values from the random permutation t-tests
- Summary statistic p-values were calculated by a comparison to 100,000 summary statistics generated by randomly selecting a p-value from each individual study contributing to the meta-analysis
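The resampling step above can be sketched as follows. The slides do not define S, so Fisher's combination S = -2 Σ log p is used here purely as an assumed stand-in; the floor on p-values (to avoid log 0), the use of "greater than or equal" in the comparison, the function names, and all data are likewise assumptions. Only studies that measured the gene contribute, which is one simple way to handle genes missing from some studies.

```python
import math
import random

MIN_P = 1e-10  # hypothetical floor so that a permutation p-value of 0 has a finite log

def summary_statistic(p_values):
    """Assumed form: Fisher's S = -2 * sum(log p). The slides do not specify S."""
    return -2.0 * sum(math.log(max(p, MIN_P)) for p in p_values)

def meta_p_value(gene, p_by_study, n_draws=100_000, seed=0):
    """Compare a gene's S to S values built from one random p-value per study."""
    rng = random.Random(seed)
    studies = [ps for ps in p_by_study.values() if gene in ps]
    observed = summary_statistic([ps[gene] for ps in studies])
    pools = [list(ps.values()) for ps in studies]
    exceed = 0
    for _ in range(n_draws):
        random_s = summary_statistic([rng.choice(pool) for pool in pools])
        if random_s >= observed:
            exceed += 1
    return exceed / n_draws

# Hypothetical within-study p-values; not every gene appears in every study.
p_by_study = {
    "study_1": {"G1": 0.001, "G2": 0.40, "G3": 0.75, "G4": 0.20, "G5": 0.90},
    "study_2": {"G1": 0.002, "G2": 0.55, "G3": 0.30, "G4": 0.85},
    "study_3": {"G1": 0.005, "G3": 0.60, "G4": 0.10, "G5": 0.45},
}
meta_p = {gene: meta_p_value(gene, p_by_study) for gene in ["G1", "G2"]}
```

Drawing from each study's own pool of p-values keeps the null distribution specific to the set of studies in which the gene was actually measured.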
23Meta-analysis of microarrays
- As for individual studies, q-values were calculated and a ranked gene list generated.
- Each gene in each study was normalized such that the mean gene expression of the benign samples equaled zero and the SD equaled one.
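The normalization can be illustrated with a small sketch: every sample for a gene is shifted and scaled by the mean and SD of that gene's benign samples within the study. Using the sample (n - 1) standard deviation is an assumption, and the function name and values are hypothetical.

```python
import statistics

def normalize_to_benign(values, is_benign):
    """Scale one gene's values so the benign samples have mean 0 and SD 1."""
    benign = [v for v, flag in zip(values, is_benign) if flag]
    mean_benign = statistics.mean(benign)
    sd_benign = statistics.stdev(benign)  # sample SD (assumed)
    return [(v - mean_benign) / sd_benign for v in values]

# Hypothetical values for one gene: three benign samples, then two cancer samples.
values = [1.0, 2.0, 3.0, 5.0, 6.0]
is_benign = [True, True, True, False, False]
normalized = normalize_to_benign(values, is_benign)
# normalized == [-1.0, 0.0, 1.0, 3.0, 4.0]
```

After this step a cancer sample's value reads as the number of benign standard deviations above or below the benign mean, which puts studies on a common scale for display.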
24A Model for Microarray Meta-Analysis
25Comparisons of Meta-Analyses
26Meta-Analysis Heat Map
27Meta-Analysis Heat Map
28Dysregulation of Polyamine Biosynthesis in Prostate Cancer
29Dysregulation of Purine Biosynthesis in Prostate Cancer
30Acknowledgments
- Department of Pathology, UM
- Arul Chinnaiyan, M.D., Ph.D.
- Dan Rhodes
- Terry Barrette
- References
- Rhodes, D. et al. (2002). Cancer Research.
- Ghosh, D. et al. (2003). Statistical issues and methods for meta-analysis of microarray data: a case study in prostate cancer. Available at http://www.sph.umich.edu/ghoshd/COMPBIO/ .