Title: Chemogenomics Methods
1Chemogenomics Methods
Paul Blower, Pharmacogenomics 725
Feb. 27, 2007
2Definition of Chemogenomics
- Use of genomics to measure the system-wide effect of a compound on an intact biological system, either cells or whole organisms
- Also investigates the consequences of differential gene/protein expression on cellular response to compound treatment
- Combines genomic or proteomic profiling with chemoinformatics and statistical analysis
3Overview
- Chemogenomic methods applied to NCI-60 (Weinstein, NCI)
- Case study of mdr1 (Huang, Sadee)
- Yeast deletion library (Giaever, Stanford)
- Connectivity map (Lamb, MIT/Broad)
4Conceptual Framework
5Molecular Profiling of NCI-60
[Figure: molecular profiling of the NCI-60 at the DNA, RNA, and protein levels, alongside compound screening]
Weinstein, Mol. Cancer Ther., 2006, 5(11), 2601-5
6NCI Cancer Screening
- Tests compounds against 60 tumor cell lines
  - Breast (5), Leukemia (6), Ovarian (7)
  - CNS (6), Lung (9), Prostate (2)
  - Colon (7), Melanoma (10), Kidney (8)
- Compounds tested for growth inhibition of tumor cell lines to determine GI50
- Since 1990, >100,000 compounds screened
Shoemaker, Nat. Rev. Cancer, 2006, 6, 813-23
Source: http://dtp.nci.nih.gov/docs/cancer/cancer_data.html
7NCI Gene Expression
- Compared cDNA from individual cell line (resting) with cDNA from a pool of 12 cell lines
- Microarray contained 9,703 DNA elements
  - 3,700 named genes
  - 1,900 human homologs
  - 4,100 ESTs
- Selected 3,748 genes that passed QC and were sequence verified
- Since 2000, NCI-60 gene expression measured on >5 platforms
U. Scherf et al., Nature Genet., 2000, 24, 236-44
Source: http://discover.nci.nih.gov
8Experimental Methods
Compound activity: growth inhibition of tumor cell lines
- Total protein assessed 48 hours after drug treatment by sulforhodamine assay
- Dose-response: typically a 5-point dilution series, 10^-4 to 10^-8 M
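As a rough illustration of how a GI50 might be read off a five-point dilution series, the sketch below interpolates linearly in log10 concentration between the two doses that bracket 50% growth. The function name `estimate_gi50`, the input layout, and the growth percentages are hypothetical; the slides do not describe the fitting procedure actually used.

```python
import math

def estimate_gi50(concs_molar, growth_pct):
    """Interpolate the concentration giving 50% growth.

    concs_molar: concentrations in mol/L, sorted from lowest to highest.
    growth_pct:  percent growth relative to untreated control at each dose.
    Returns the GI50 in mol/L, or None if 50% is never crossed.
    """
    logs = [math.log10(c) for c in concs_molar]
    for i in range(len(logs) - 1):
        g_lo, g_hi = growth_pct[i], growth_pct[i + 1]
        # Look for the pair of adjacent doses where growth falls through 50%.
        if g_lo >= 50.0 >= g_hi and g_lo != g_hi:
            frac = (g_lo - 50.0) / (g_lo - g_hi)
            log_gi50 = logs[i] + frac * (logs[i + 1] - logs[i])
            return 10.0 ** log_gi50
    return None

# Hypothetical five-point series from 1e-8 M to 1e-4 M (made-up growth values).
concs = [1e-8, 1e-7, 1e-6, 1e-5, 1e-4]
growth = [98.0, 90.0, 70.0, 30.0, 5.0]
gi50 = estimate_gi50(concs, growth)
neg_log_gi50 = -math.log10(gi50)
print(f"GI50 = {gi50:.2e} M, -log10(GI50) = {neg_log_gi50:.2f}")
```

Working in -log10(GI50) units, as the later dataset slide does, makes more potent compounds score higher.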
9Experimental Methods
cDNA Microarrays
mRNA from individual NCI-60 cell lines
mRNA from pool of 12 cell lines
cDNA labeled with Cy5-dUTP (red) or Cy3-dUTP (green)
10Experimental Methods
Oligonucleotide microarrays (Affymetrix)
- 25-mer oligos synthesized by photolithography (photolabile protecting group)
- Probes synthesized from sequence information alone: no clones, PCR, etc.
- Probe set: 10 (PM, MM) oligo pairs per gene, selected for uniqueness
- Signal is the average (PM - MM) difference over the probe set
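The probe-set signal described above can be sketched in a few lines. The function name `probe_set_signal` and the intensity values are hypothetical; this only illustrates averaging the perfect-match minus mismatch differences, not any vendor software.

```python
def probe_set_signal(pm, mm):
    """Average (PM - MM) difference over one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM must be non-empty and the same length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)

# Hypothetical intensities for a 10-pair probe set (made-up numbers).
pm = [520, 610, 480, 700, 555, 630, 590, 615, 505, 645]
mm = [120, 140, 130, 160, 125, 150, 135, 145, 115, 155]
signal = probe_set_signal(pm, mm)
print(signal)
```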
11AT Matrix
- Matrix of Pearson correlation coefficients
- Each row of A and T is normalized by its mean and standard deviation
- The matrices are multiplied to obtain AT (A times the transpose of T)
- Each entry is divided by n - 1, where n (60) is the number of cell lines
- Optionally, the rows and columns of the product matrix are arranged in cluster order
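The recipe on this slide can be sketched in plain Python: standardize each row, take row-by-row dot products (A times the transpose of T), and divide by n - 1. The function names and the toy values are illustrative assumptions, with A holding compounds by cell lines and T holding genes by cell lines.

```python
import math

def standardize_rows(matrix):
    """Center each row on its mean and scale by its sample standard deviation."""
    out = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in row) / (n - 1))
        out.append([(x - mean) / sd for x in row])
    return out

def correlation_matrix(A, T):
    """Return the compounds x genes matrix of Pearson r values.

    A: compounds x cell lines (activity), T: genes x cell lines (expression).
    """
    n = len(A[0])
    Az, Tz = standardize_rows(A), standardize_rows(T)
    # Entry (i, j) is the dot product of standardized rows, divided by n - 1.
    return [[sum(a * t for a, t in zip(arow, trow)) / (n - 1) for trow in Tz]
            for arow in Az]

# Toy data: 2 compounds and 2 genes measured on 4 cell lines (made-up values).
A = [[5.0, 6.0, 7.0, 8.0],
     [4.0, 4.5, 6.0, 5.0]]
T = [[1.0, 2.0, 3.0, 4.0],
     [9.0, 7.0, 5.0, 3.0]]
R = correlation_matrix(A, T)
for row in R:
    print([round(r, 3) for r in row])
```

Because each row is standardized once, the full compound-gene correlation matrix comes from a single matrix product rather than one correlation call per pair.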
12Gene-Compound Correlations
[Figure: gene-compound correlation across NCI-60 cell lines, r = -0.87. Legend by panel: Breast, CNS, Colon, Leukemia, Lung, Melanoma, Ovarian, Prostate, Renal]
13Compound-Gene Correlations
strong correlation (r = 0.88)
weak correlation (r = 0.29)
Pearson correlation coefficient
14Compound-Gene Correlations
Distribution of correlation coefficients between
4,463 compounds and 3,748 genes
15Heat map
16Sample Heatmap
17Sample Heatmap
Compound-gene correlations
18Unclustered Heatmap
Rows and columns in random order
19Clustering
- Purpose: discover natural groupings in a set of objects, e.g., compounds, genes
- Grouping done on basis of similarity between pairs of objects
- No assumptions made about number of groups; differs from classification
20Methods of Clustering
- Hierarchical: produces tree structure
  - Agglomerative: bottom-up
  - Divisive: top-down
- Non-hierarchical
  - Jarvis-Patrick
  - K-means
21Agglomerative Hierarchical Clustering
- Basic linkage method for N objects
  1. Start with N clusters and an N x N distance matrix D = {d_ik}, giving the distance between objects i and k
  2. Search D for the most similar pair of clusters, U and V
  3. Merge U and V to form a new cluster (UV)
  4. Update D by (a) removing the U and V rows and columns, and (b) adding a new row and column giving the distances to (UV)
  5. Repeat steps 2-4 a total of N - 1 times
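A minimal sketch of the linkage procedure above, assuming single linkage (`min`) or complete linkage (`max`) as the update rule. The function name `agglomerate` is illustrative, and the matrix `D` is an assumed example: the slide's own 5 x 5 matrix did not survive, so these values were chosen only to be consistent with the distances quoted on the single- and complete-linkage slides (7, 6, 10, 9).

```python
def agglomerate(dist, linkage=min):
    """Basic linkage clustering on a symmetric distance matrix.

    dist:    N x N list of lists of pairwise distances.
    linkage: min for single linkage, max for complete linkage.
    Returns the merges as (cluster_u, cluster_v, distance) tuples.
    """
    # Step 1: every object starts in its own cluster (labels are 1-based).
    clusters = [(i + 1,) for i in range(len(dist))]
    d = {}
    for i, u in enumerate(clusters):
        for j, v in enumerate(clusters):
            if i < j:
                d[frozenset((u, v))] = dist[i][j]
    merges = []
    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters U and V.
        pair = min(d, key=d.get)
        u, v = sorted(pair)
        height = d[pair]
        # Step 3: merge U and V into (UV).
        uv = tuple(sorted(u + v))
        merges.append((u, v, height))
        # Step 4: drop the U and V entries, add distances to (UV).
        others = [c for c in clusters if c not in (u, v)]
        new_d = {k: val for k, val in d.items() if u not in k and v not in k}
        for w in others:
            new_d[frozenset((uv, w))] = linkage(d[frozenset((u, w))],
                                                d[frozenset((v, w))])
        clusters, d = others + [uv], new_d
    return merges

# ASSUMED distance matrix for objects 1..5 (not taken from the slides).
D = [[0, 9, 3, 6, 11],
     [9, 0, 7, 5, 10],
     [3, 7, 0, 9, 2],
     [6, 5, 9, 0, 8],
     [11, 10, 2, 8, 0]]

for u, v, h in agglomerate(D, linkage=min):
    print("single  ", u, v, h)
for u, v, h in agglomerate(D, linkage=max):
    print("complete", u, v, h)
```

Both runs perform N - 1 = 4 merges; only the rule used to update distances to the merged cluster differs.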
22Hierarchical Methods
- Determining distance between two clusters
  - Single: nearest neighbor
  - Complete: farthest neighbor
  - Average: average distance between all pairs of objects in the two clusters
23Hierarchical Clustering Methods
Method      Distance
single      d24
complete    d15
average     mean of all between-cluster pair distances
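To make the three definitions concrete, the sketch below compares clusters {1, 2} and {3, 4, 5}. The one-dimensional positions are hypothetical, chosen only so that objects 2 and 4 are the nearest cross-cluster pair and objects 1 and 5 the farthest, matching the d24 and d15 entries in the table.

```python
from itertools import product

# Hypothetical 1-D positions for objects 1..5 (not from the slides).
pos = {1: 0.0, 2: 1.0, 3: 4.0, 4: 3.0, 5: 6.0}
cluster_a = [1, 2]
cluster_b = [3, 4, 5]

def pair_distances(a, b):
    """Distances for every (i, k) with i in cluster a and k in cluster b."""
    return {(i, k): abs(pos[i] - pos[k]) for i, k in product(a, b)}

d = pair_distances(cluster_a, cluster_b)
single = min(d.values())              # nearest neighbor
complete = max(d.values())            # farthest neighbor
average = sum(d.values()) / len(d)    # mean over all cross-cluster pairs

print("single  :", single, min(d, key=d.get))
print("complete:", complete, max(d, key=d.get))
print("average :", round(average, 3))
```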
24Distance Matrix
Hypothetical set of 5 objects
[Table: 5 x 5 distance matrix with rows and columns labeled 1-5; entries not preserved]
25Single Linkage Clustering
Initial distance matrix: objects 1, 2, 3, 4, 5
New distance matrix: clusters (35), 1, 2, 4
26Single Linkage Clustering
Initial distance matrix: clusters (35), 1, 2, 4
New distance matrix: clusters (135), 2, 4
Compute distances to cluster (135):
d_(135)2 = min{d_12, d_(35)2} = 7
d_(135)4 = min{d_14, d_(35)4} = 6
27Single Linkage Dendrogram
28Complete Linkage Clustering
Initial distance matrix: objects 1, 2, 3, 4, 5
New distance matrix: clusters (35), 1, 2, 4
[Figure: dendrogram in progress, leaf order 1, 2, 4, 3, 5]
29Complete Linkage Clustering
Initial distance matrix: clusters (35), 1, 2, 4
New distance matrix: clusters (35), (24), 1
Compute distances to cluster (24):
d_(24)(35) = max{d_(35)2, d_(35)4} = 10
d_(24)1 = max{d_12, d_14} = 9
[Figure: dendrogram in progress, leaf order 1, 2, 4, 3, 5]
30Complete Linkage Dendrogram
31Sample Data
Random set of 20 points
32Cluster Comparison
Set cutoff to yield 7 clusters
33Results of Single Linkage Clustering
7 clusters: one large cluster and 6 singletons
34Results of Complete Linkage Clustering
7 clusters: 5 compact clusters and 2 singletons
35Complete vs Single Linkage
- Complete linkage
  - tends to produce clusters with equal diameters
  - can be severely distorted by outliers
- Single linkage
  - imposes no constraints on the shape of clusters
  - has ability to detect elongated and irregular clusters
  - but may miss compact clusters
36Clustering Based on Chemical Similarity
- Similarity measure: calculate a numerical distance between any two molecules
- Characteristics of molecular descriptors
  - Encode all important structural features
  - Distinguish closely related analogs
  - Common among diverse classes of compounds
  - Easy to calculate
  - Recognize chemically equivalent groups as similar
37Molecular Descriptors (Fingerprints)
Example of a predefined set of 2D structural features
Legend: A = any atom except H; Z = N, O, or S; blue = any bond
382D Fingerprints
Legend: A = any atom except H; Z = N, O, or S; blue = any bond
[Figure: fingerprint bit string for Compound 1]
39Chemical Similarity Measure
a = number of 1 bits in compound A
b = number of 1 bits in compound B
c = number of 1 bits in both A and B
40Example of 2D Fingerprint
[Figure: fingerprint bit string for Compound 2]
41Calculation of Tanimoto Coefficient
a = number of 1 bits in compound A
b = number of 1 bits in compound B
c = number of 1 bits in both A and B
a = 11
b = 10
c = 9
d_Tan = c / (a + b - c) = 9/12 = 0.75
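The arithmetic on this slide can be checked with a short sketch that treats each fingerprint as a set of "on" bit positions. The bit positions below are hypothetical, chosen only to give a = 11, b = 10, and c = 9; the actual fingerprints for the two compounds are not recoverable from the slides.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of 'on' bit positions."""
    a, b = len(fp_a), len(fp_b)
    c = len(fp_a & fp_b)
    return c / (a + b - c) if (a + b - c) else 0.0

# Hypothetical bit positions with a = 11, b = 10 and c = 9, as on the slide.
fp_a = set(range(1, 12))            # bits 1..11
fp_b = set(range(3, 12)) | {20}     # bits 3..11 plus one bit not in A
print(len(fp_a), len(fp_b), len(fp_a & fp_b), tanimoto(fp_a, fp_b))
```

The denominator a + b - c is the number of bits set in either compound, so the coefficient is the shared fraction of all features present.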
42Case Study Multidrug Resistance in Cancer
- MDR1 (ABCB1) encodes P-glycoprotein
  - ATP-dependent efflux pump
  - Broad drug specificity: doxorubicin, etoposide, paclitaxel, vincristine, bisantrene
  - Tissue occurrence: intestine, liver, kidney, placenta, blood-brain barrier
Huang, Y. et al. Pharmacogenomics J 2005, 5, (2), 112-25
43ABC Transporters Involved in Chemoresistance
Common names | Gene symbol | Substrates | Associated disease
Pgp, MDR1 | ABCB1 | Neutral and cationic organic compounds, many anticancer drugs | Cancer chemoresistance
MRP1 | ABCC1 | Glutathione conjugates, organic anions, drugs | Unknown
MRP2 | ABCC2 | Glutathione conjugates, organic anions, drugs | Dubin-Johnson syndrome
MRP3 | ABCC3 | Glutathione conjugates, antifolates, bile acids, etoposide | Unknown
MRP4 | ABCC4 | Nucleoside analogs, methotrexate | Unknown
MRP5 | ABCC5 | Nucleoside analogs, cyclic nucleotides, organic anions | Unknown
MRP6 | ABCC6 | Anionic cyclic pentapeptide | Pseudoxanthoma elasticum
MXR, BCRP | ABCG2 | Anthracyclines, mitoxantrone | Unknown
Modified from Gottesman et al., Nature Reviews, 2002; Dean et al., Genome Research, 2001
44Datasets Used in Study
- Microarray data
  - 70-mer oligonucleotide probes
  - 732 human transporter genes
  - NCI-60 cancer cell lines
- Compound activities from NCI database
  - Growth inhibition (-log(GI50)) of 7,466 compounds
  - Tested at least twice
  - Less than 50 missing data
Source: http://dtp.nci.nih.gov/docs/cancer/cancer_data.html
45Conceptual Framework
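[Figure: three data matrices: gene expression (732 genes x 60 cell lines), compound activity (7,466 compounds x 60 cell lines), and structural features (27,000 features x 7,466 compounds)]
Weinstein et al., Science, 1997, 275, 343-49.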
46Conceptual Framework
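[Figure: the gene (732 genes x 60 cell lines), compound (7,466 compounds x 60 cell lines), and feature (27,000 features x 7,466 compounds) matrices are multiplied to give S·A·T^T, the feature-gene correlation matrix (27,000 features x 732 genes)]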
47Gene-Compound Correlations
[Figure: gene-compound correlation across NCI-60 cell lines, r = -0.87, with OVCAR-8 highlighted. Legend by panel: Breast, CNS, Colon, Leukemia, Lung, Melanoma, Ovarian, Prostate, Renal]
48Compound-MDR1 Correlations
3 IQR outliers: 3 positive, 124 negative
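One plausible reading of the "3 IQR" rule is sketched below: flag correlations lying more than three interquartile ranges below the first quartile or above the third. The fence definition, the function name `iqr_outliers`, and the toy correlation values are assumptions; the slide does not state exactly how outliers were defined.

```python
import statistics

def iqr_outliers(values, k=3.0):
    """Split values into those below Q1 - k*IQR and above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    negative = [v for v in values if v < low_fence]
    positive = [v for v in values if v > high_fence]
    return negative, positive

# Made-up compound-MDR1 correlations: mostly near zero, two extreme values.
correlations = [-0.05, -0.02, 0.0, 0.01, 0.02, 0.03,
                0.04, 0.05, 0.06, -0.60, 0.55]
neg, pos = iqr_outliers(correlations)
print("negative outliers:", neg)
print("positive outliers:", pos)
```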
49Clustering of NCI-60 Cell Lines
- Cell lines clustered on expression patterns of 1,376 selected genes
- Used correlation as distance metric (1 - r)
- Average linkage clustering
[Figure: dendrogram of cell lines, labeled by cell line panel]
U. Scherf et al., Nature Genet., 2000, 24, 236-44, Fig. 2.
50MDR1-expressing Cell Lines
- NCI-60 cell lines over-expressing MDR1
  - NCI/ADR-RES (unknown tissue origin)
  - HCT-15 (colon)
  - UO-31 (renal)
- Comparison cell line: OVCAR-8 (ovarian)
  - Most similar to NCI/ADR-RES when comparing cDNA expression patterns
  - Does not over-express MDR1
51Structure-Based Cluster Analysis
- Cluster 7,466 compounds on structural similarity using Leadscope feature set with Tanimoto coefficient
- Complete linkage clustering gave 1,859 clusters (Tanimoto cutoff 0.7)
  - 468 singletons
- For each cluster, computed mean compound-mdr1 correlation
  - 15 classes of 5 compounds with z-score > 3.0
  - 34 classes of 5 compounds with z-score < -3.0
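The per-cluster scoring step might look like the sketch below. The slides do not give the z-score formula, so this assumes the common form z = (cluster mean - overall mean) / (overall sd / sqrt(n)); the function name, compound IDs, cluster labels, and correlation values are all hypothetical.

```python
import math
import statistics

def cluster_z_scores(correlations, cluster_of):
    """Mean compound-gene correlation per cluster and a z-score for that mean.

    correlations: {compound_id: r with the gene of interest}
    cluster_of:   {compound_id: cluster label}
    ASSUMPTION: z = (cluster mean - overall mean) / (overall sd / sqrt(n)).
    """
    values = list(correlations.values())
    overall_mean = statistics.mean(values)
    overall_sd = statistics.stdev(values)
    members = {}
    for cid, r in correlations.items():
        members.setdefault(cluster_of[cid], []).append(r)
    result = {}
    for label, rs in members.items():
        mean_r = statistics.mean(rs)
        z = (mean_r - overall_mean) / (overall_sd / math.sqrt(len(rs)))
        result[label] = (len(rs), mean_r, z)
    return result

# Made-up correlations for eight compounds in two structural clusters.
correlations = {"c1": -0.50, "c2": -0.40, "c3": -0.45, "c4": 0.10,
                "c5": 0.00, "c6": 0.05, "c7": -0.05, "c8": 0.15}
cluster_of = {"c1": "A", "c2": "A", "c3": "A", "c4": "B",
              "c5": "B", "c6": "B", "c7": "B", "c8": "B"}
scores = cluster_z_scores(correlations, cluster_of)
for label, (count, mean_r, z) in scores.items():
    print(label, count, round(mean_r, 2), round(z, 1))
```

Scaling by sqrt(n) means a large class with a modest mean correlation can score as strongly as a small class with an extreme one.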
52Cluster Analysis
Large classes with negative correlations
          ABCB1 (mdr1)          Δ GI50
Count     Mean     Z-score      Mean     Z-score
44        -0.42    -16.5        -2.22    -22.2
17        -0.36     -8.5        -1.14     -7.2
17        -0.35     -8.4        -0.99     -6.3
Δ GI50: difference between LNS:NCI/ADR-RES and OVA:OVCAR-8
53Cluster Analysis
Large classes with positive correlations
          ABCB1 (mdr1)          Δ GI50
Count     Mean     Z-score      Mean     Z-score
41         0.12     6.3          0.0      1.8
15         0.21     6.1          0.24     2.8
13         0.11     3.5         -0.12     0.2
Δ GI50: difference between LNS:NCI/ADR-RES and OVA:OVCAR-8
54Cluster Analysis
Two subclasses of ellipticine analogs
          ABCB1 (mdr1)          Δ GI50
Count     Mean     Z-score      Mean     Z-score
17        -0.36    -8.5         -1.14    -7.2
13         0.11     3.5         -0.12     0.2
Δ GI50: difference between LNS:NCI/ADR-RES and OVA:OVCAR-8
55R-Group Analysis of Ellipticines
[Table columns: R1, R2, Freq; MDR1 correlation (mean, z-score); Δ GI50 (mean, z-score)]
Δ GI50: difference between LNS:NCI/ADR-RES and OVA:OVCAR-8
56Compounds Selected for Dosing
57Experimental Validation
- Cytotoxicity of selected ellipticine analogs in NCI/ADR-RES was studied by
  - siRNA downregulation
  - Inhibition of MDR1 by cyclosporin A
- Inhibition of MDR1-mediated efflux of fluorescent markers
58Ellipticine Dosing Experiments
[Figure: growth inhibition vs. drug concentration (µM); drug only]
59Ellipticine Dosing Experiments
[Figure: growth inhibition vs. drug concentration (µM); curves for drug only, 5 µM CsA, ABCB1 siRNA, and mock siRNA]
60Ellipticine Dosing Experiments
[Figure: growth inhibition vs. drug concentration (µM); drug only]
61Summary of Dosing Experiments
- Ellipticinium analogs (NSC 155694 and 359449) are MDR1 substrates
  - siRNA-mediated down-regulation of MDR1 expression decreased GI50 values in ADR-RES cells by at least 2-fold.
  - Addition of cyclosporin A, an inhibitor of MDR1, decreased GI50 values by at least 20-fold.
- Neutral ellipticines (NSC 86717, 69187, 338258) are not MDR1 substrates
  - No marked effect on GI50 values in ADR-RES cells was observed.
62Analysis of Compound-Gene Correlations
- Summary
  - General method for discovering associations between compound classes and molecular targets
  - Rapidly identified compound classes with relevant genes using NCI dataset
  - In-silico technique aids experimental design
  - Can provide insights into mode of resistance and mechanism of action
63Analysis of Compound-Gene Correlations
- References
  - Huang, Y. et al. Correlating gene expression with chemical scaffolds of cytotoxic agents: ellipticines as substrates and inhibitors of MDR1. Pharmacogenomics J 2005, 5, (2), 112-25
  - Huang, Y. et al. Membrane transporters and channels: role of the transportome in cancer chemosensitivity and chemoresistance. Cancer Res 2004, 64, (12), 4294-301
  - Blower, P. E. et al. Pharmacogenomic analysis: correlating molecular substructure classes with microarray gene expression data. Pharmacogenomics J 2002, 2, (4), 259-71
  - Staunton, J. E. et al. Chemosensitivity prediction by transcriptional profiling. PNAS 2001, 98, (19), 10787-10792
  - Scherf, U. et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000, 24, (3), 236-44
  - Weinstein, J. N. et al. An information-intensive approach to the molecular pharmacology of cancer. Science 1997, 275, (5298), 343-9
64Genomic Profiling with Yeast Library
- Heterozygous yeast deletion library
  - each strain has a single copy of a gene
  - 6,000 deletion strains
  - covers 96.5% of yeast genome
- Heterozygous strain is sensitive to any drug acting on the gene product
  - Haploinsufficiency profiling
Giaever, G. et al. Proc Natl Acad Sci U S A 2004, 101, (3), 793-8
65Drug Induced Haploinsufficiency
- Drug treatment of heterozygous vs wild-type yeast strains
- ALG7 (asparagine-linked glycosyltransferase): known target of tunicamycin
[Figure: growth at tunicamycin concentrations of 0, 0.5, and 2.0 µg/ml]
Giaever, G. et al. Nat Genet 1999, 21, (3), 278-83.
66Heterozygote Deletion Construct
DNA extracted and all tags amplified in a single PCR reaction using common primers
Kanamycin gene allows selection of yeast transformants
Unique 20-base hybridization tag: strain-specific bar-code
67Drug Sensitivity Library Profiling
[Figure: library profiles with no drug vs. with drug]
68Haploinsufficiency Profiling
- Heterozygous deletion library treated with 250 µM methotrexate
- Fitness defect score (FD) proportional to likelihood of seeing experimental value
- DFR1 is known target of methotrexate
- FOL2 required for biosynthesis of folic acid in yeast, upstream from DHF
- No human homolog of FOL1
- Human homolog of YBT1 encodes methotrexate transporter; up-regulation causes methotrexate resistance
- YOR072W encodes suspected methotrexate transporter
69Haploinsufficiency Profiling
Atorvastatin (Lipitor) concentration: 0 µM, 62.5 µM, 125 µM, 250 µM
HMG1, HMG2: yeast isozymes of the drug target, 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) reductase
70Haploinsufficiency Profiling
- Summary
  - Systematic method for identifying gene products that interact with drugs
  - Simultaneous screening of all interactions between compounds and gene products in yeast
  - Chemical genetic probes for further functional studies in other organisms
- References
  - Giaever, G. et al. Chemogenomic profiling: identifying the functional interactions of small molecules in yeast. Proc Natl Acad Sci U S A 2004, 101, (3), 793-8.
  - Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature 2002, 418, (6896), 387-91.
  - Giaever, G. et al. Genomic profiling of drug sensitivities via induced haploinsufficiency. Nat Genet 1999, 21, (3), 278-83.
71Connectivity Map
- Database of drug-gene signatures for linking drugs, genes and diseases
- Profiled bioactive small molecules in four NCI-60 cell lines
- mRNA expression levels were measured after drug treatment, giving a reference signature
  - rank-ordered list of genes, ordered by differential expression relative to control
- Database can be searched by comparing a query signature of up- and down-regulated genes with reference signatures
Lamb, J. et al. Science 2006, 313, 1929-35
72Connectivity Map
- Search database by comparing a query signature with reference signatures
  - Query signature: list of up- and down-regulated genes
  - Reference signatures: ranked gene lists for compounds
  - Output: lists of high- and low-scoring compounds
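The query-versus-reference comparison can be illustrated with a deliberately simplified score. The published Connectivity Map uses a Kolmogorov-Smirnov-style enrichment statistic; the sketch below substitutes a plain mean-rank difference, and every name in it (`connectivity_score`, `search`, the gene and compound labels) is hypothetical.

```python
def connectivity_score(reference_ranking, up_genes, down_genes):
    """Simplified query-vs-reference score in [-1, 1].

    reference_ranking: genes ordered from most up- to most down-regulated
                       after treatment with the reference compound.
    up_genes, down_genes: the query signature.
    NOTE: mean-rank difference used for illustration, not the published metric.
    """
    n = len(reference_ranking)
    # Map each gene to a position in [0, 1]: 0 = top of list, 1 = bottom.
    position = {g: i / (n - 1) for i, g in enumerate(reference_ranking)}
    up = [position[g] for g in up_genes if g in position]
    down = [position[g] for g in down_genes if g in position]
    if not up or not down:
        return 0.0
    # Positive when query up-genes sit near the top and down-genes near the bottom.
    return sum(down) / len(down) - sum(up) / len(up)

def search(reference_db, up_genes, down_genes):
    """Rank reference compounds from highest to lowest score."""
    scores = {name: connectivity_score(rank, up_genes, down_genes)
              for name, rank in reference_db.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Made-up reference signatures over six genes.
reference_db = {
    "cmpd_A": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "cmpd_B": ["g6", "g5", "g4", "g3", "g2", "g1"],
    "cmpd_C": ["g3", "g1", "g5", "g2", "g6", "g4"],
}
hits = search(reference_db, up_genes=["g1", "g2"], down_genes=["g5", "g6"])
for name, score in hits:
    print(name, round(score, 2))
```

High-scoring compounds mimic the query signature, while strongly negative scores point to compounds that reverse it.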
73Summary of Chemogenomics
- Consequences of differential gene/protein expression on cellular response to compound treatment
  - Chemoresistance studies using the NCI-60
- System-wide effect of a compound on an intact biological system
  - Genomic profiling with yeast deletion library
  - Connectivity map