Title: A 1000 Gene Approach to the CMap
Slide 1. A 1000 Gene Approach to the C-Map
Aravind Subramanian, March 11, 2007
Slide 2.
- The promise
  - Small-molecule gene-expression profiles reveal connections between drugs, diseases, and genes
- The problem
  - Whole-genome profiles are expensive!
  - Affymetrix: $300 / drug in one cell line
  - Scaling to large chemical libraries, genotypes, cell lines, etc. is prohibitively expensive
- The 1000 gene solution
  - Measure 1000 genes at high throughput and low cost
  - Use whole-genome reference datasets to infer the remaining genes
Slide 3. The 1000 gene solution
Slide 4. Methodology: Datasets
1. Landmark definition dataset
   Compendium of diverse tissue types
   - Pick 1000 universal landmarks
   - Widely expressed genes
   - Internally minimally redundant
2. Cluster definition dataset
   Reference dataset of whole-genome profiles from small-molecule perturbations (currently 1000 Prestwick compounds)
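The two landmark criteria above (widely expressed, internally minimally redundant) can be illustrated with a small greedy filter. This is only a sketch under stated assumptions, not the procedure used in the talk: the function name `pick_landmarks`, the expression threshold, the "expressed in at least 80% of samples" rule, the variance-based ordering, and the correlation cutoff are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def pick_landmarks(expr, k, expr_threshold=1.0, min_expressed_frac=0.8,
                   max_abs_corr=0.9):
    """Greedily pick up to k landmark genes.

    expr maps gene name -> expression values across the compendium samples.
    All thresholds are illustrative assumptions, not values from the slides.
    """
    # Criterion 1 (assumed form): keep genes expressed in most samples.
    candidates = [
        g for g, values in expr.items()
        if sum(v > expr_threshold for v in values) / len(values) >= min_expressed_frac
    ]
    # Assumed ordering: consider the most variable genes first.
    candidates.sort(key=lambda g: pstdev(expr[g]), reverse=True)

    # Criterion 2 (assumed form): skip genes highly correlated with a chosen one.
    chosen = []
    for g in candidates:
        if len(chosen) == k:
            break
        if all(abs(pearson(expr[g], expr[c])) <= max_abs_corr for c in chosen):
            chosen.append(g)
    return chosen
```

For example, with a toy compendium in which gene B is exactly twice gene A and gene D is expressed in only one sample, `pick_landmarks(expr, 2)` keeps B and an uncorrelated gene C, and drops A as redundant and D as not widely expressed.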
Slide 5. Methodology: Reconstructing C-Map instances
- Experimental measurements of landmark genes (drug vs. control): 1000 genes x N conditions
- Correlations from whole-genome reference perturbations: 22,283 genes x n conditions
- Reconstructed drug vs. control whole-genome instances: 22,283 genes x N conditions
Legend: measured gene (landmark); inferred gene. Conditions are drug vs. control, with N >> n.
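The reconstruction step above can be sketched as follows. The slides say only that correlations from the whole-genome reference perturbations are used to infer the unmeasured genes; the specific rule here (predict each non-landmark gene from its single most correlated landmark with a least-squares line) is an assumption chosen for brevity, and the names `build_model` and `reconstruct` are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def fit_line(x, y):
    """Least-squares fit y ~ intercept + slope * x; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return my, 0.0
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def build_model(reference, landmarks):
    """Learn, from the n reference conditions, one predictor per non-landmark gene.

    reference maps gene -> values across the n whole-genome reference conditions.
    Returns gene -> (best landmark, intercept, slope).
    """
    model = {}
    for gene, y in reference.items():
        if gene in landmarks:
            continue
        # Assumed rule: use the landmark with the largest absolute correlation.
        best = max(landmarks, key=lambda lm: abs(pearson(reference[lm], y)))
        model[gene] = (best, *fit_line(reference[best], y))
    return model


def reconstruct(measured, model):
    """Expand landmark-only measurements (N conditions) to all modeled genes."""
    full = {gene: list(values) for gene, values in measured.items()}
    for gene, (lm, intercept, slope) in model.items():
        full[gene] = [intercept + slope * x for x in measured[lm]]
    return full
```

In use, `build_model` is run once on the reference dataset (22,283 genes x n conditions in the talk), and `reconstruct` is then applied to each batch of landmark measurements (1000 genes x N conditions) to produce whole-genome instances.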
Slide 6. Published C-Map connections: do we recover these?
Lamb et al., Science, 2006
Slide 7. HDAC query (n = 33)
Whole-genome profiles (22,283 features)
1000 landmarks
Hit score 0.88, NP < 0.0001
Hit score 0.82, NP < 0.0001
Slide 8. Results: Whole-genome profiles vs. 1000 landmark genes
HDAC
Estrogen antagonists
Estrogen agonists
Phenothiazines
Diet
Gedunin
Sirolimus
Slide 9. Results
With an expanded number of queries (6 Science + 27 new = 33)
Slide 10. Conclusion: Price performance
To profile a 100K-compound library:
24x
- 1000 genes: $10 / drug @ 75% performance
- Whole-genome: $300 / drug
Slide 11. A 1000 Gene Approach to the C-Map
Justin Lamb, David Peck, Todd Golub, Members of the Cancer Program
Slide 12. The end