Title: A novel approach to analysis of primary HTS data
Compound Set Enrichment
Thibault Varin, Ansgar Schuffenhauer
H. Gubler, C. Parker, J.-H. Zhang, P. Raman, P. Ertl
INTRODUCTION
Introduction
- Active series identification: can relevant SAR be extracted from primary HTS data?
- Are activity data binary or continuous?
Introduction: Active series identification
Hypothesis 1: Within primary HTS data, structure-activity relationships (SAR) are apparent and can be used to help select active compound classes.
Introduction: Are the activity data binary or continuous?
[Figure: activity of the compounds belonging to Scaffold 1 and Scaffold 2]
- Binary activity: 1 active / 5 inactives, so Scaffold 1 = Scaffold 2
- Continuous activity: Scaffold 1 > Scaffold 2
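The toy comparison above can be reproduced with a few hypothetical activity values (percent inhibition, with an assumed cut-off of 50%; neither the values nor the cut-off come from the study). Both scaffolds have one active out of six compounds, so the binary view ranks them equally, while the continuous values separate them clearly. A minimal sketch:

```python
from statistics import median

CUTOFF = 50.0  # hypothetical activity cut-off (% inhibition)

# Hypothetical activities of the compounds sharing each scaffold (not study data)
scaffold_1 = [72.0, 45.0, 41.0, 38.0, 35.0, 30.0]
scaffold_2 = [72.0, 5.0, 3.0, 0.0, -2.0, -4.0]


def n_active(values, cutoff=CUTOFF):
    """Binary view: count compounds at or above the cut-off."""
    return sum(v >= cutoff for v in values)


for name, values in (("Scaffold 1", scaffold_1), ("Scaffold 2", scaffold_2)):
    # Binary view: 1 active / 5 inactives for both; continuous view: medians differ
    print(name, n_active(values), "active of", len(values),
          "| median activity:", median(values))
```

The median is used here only as a simple summary of the continuous distribution; any statistic that uses all values would make the same point.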
Introduction: Are the activity data binary or continuous?
[Figure: the same activity data binarized at Threshold 1 and at Threshold 2]
Binary scaffold activity depends on the chosen threshold.
Hypothesis 2: Methods based on an activity cut-off distort the activity information, leading to the incorrect assignment of active compound series.
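The threshold dependence behind Hypothesis 2 can be illustrated with two hypothetical compound sets and two assumed cut-offs standing in for "Threshold 1" and "Threshold 2" (all values are invented for illustration). A minimal sketch:

```python
# Hypothetical activities (% inhibition) of two compound sets; not study data
scaffold_a = [80.0, 20.0, 15.0, 10.0]
scaffold_b = [45.0, 42.0, 40.0, 38.0]


def fraction_active(values, cutoff):
    """Binary view: share of compounds at or above the cut-off."""
    return sum(v >= cutoff for v in values) / len(values)


# Two assumed cut-offs standing in for "Threshold 1" and "Threshold 2"
for cutoff in (50.0, 35.0):
    fa = fraction_active(scaffold_a, cutoff)
    fb = fraction_active(scaffold_b, cutoff)
    if fa == fb:
        winner = "tie"
    else:
        winner = "A" if fa > fb else "B"
    print(f"cut-off {cutoff}: A={fa:.2f}, B={fb:.2f} -> more active: {winner}")
```

With the 50% cut-off scaffold A looks more active (0.25 vs 0.00); with the 35% cut-off the ranking flips to B (0.25 vs 1.00), although the underlying activities have not changed.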
METHODS
Methods: The Scaffold Tree classification
A. Schuffenhauer, P. Ertl et al., "The Scaffold Tree: Visualization of the Scaffold Universe by Hierarchical Scaffold Classification", J. Chem. Inf. Model. 47, 47 (2007)
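In the Scaffold Tree each compound is linked to a chain of ever smaller parent scaffolds, and every node of the tree then defines a compound set. A minimal sketch of building these sets, assuming the per-compound scaffold paths have already been produced by a Scaffold Tree implementation; the `paths` input, the scaffold names and the level numbering (1 = root) are hypothetical placeholders:

```python
from collections import defaultdict

# Hypothetical input: for each compound, its scaffold at each tree level,
# from the root (level 1, assumed numbering) down to the compound's own scaffold.
paths = {
    "cpd1": ["benzene", "S2-1", "S3-1"],
    "cpd2": ["benzene", "S2-1", "S3-2"],
    "cpd3": ["benzene", "S2-1", "S3-2"],
    "cpd4": ["pyridine", "S2-2"],
}


def compound_sets(paths):
    """Map each (level, scaffold) node to the set of compounds below it."""
    sets = defaultdict(set)
    for compound, path in paths.items():
        for level, scaffold in enumerate(path, start=1):
            sets[(level, scaffold)].add(compound)
    return dict(sets)


for node, members in sorted(compound_sets(paths).items()):
    print(node, sorted(members))
```

Each node's compound set is what the hypothesis tests are then applied to, so a parent node pools the compounds of all its children.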
Methods: Datasets
- 7 PubChem bioassays
- 9,389 to 263,679 compounds per assay
- 0.03% to 26.29% active compounds
[Diagram: Hypothesis 1; PubChem annotation from CRC; simulation of the primary screening data]
Methods: Single hypothesis test summary procedure
- 1. State the null and the alternative hypotheses:
  - H0: the scaffold is inactive
  - H1: the scaffold is active
- 2. Specify a significance level α = 0.01.
- 3. Compute the test statistic and the p-value, i.e. the probability of observing a result at least as extreme as the one obtained if the scaffold is inactive (H0).
- 4. Decision step:
  - p-value > α: H0 is not rejected
  - p-value < α: H0 is rejected and H1 is accepted; the scaffold is active
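The decision step above reduces to a single comparison. A minimal sketch, where the function name is hypothetical and a p-value exactly equal to α is treated as "not rejected" (the procedure does not specify that case):

```python
ALPHA = 0.01  # significance level from step 2


def scaffold_is_active(p_value, alpha=ALPHA):
    """Decision step: reject H0 (scaffold inactive) when p-value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    return p_value < alpha


print(scaffold_is_active(0.001))  # H0 rejected: scaffold called active
print(scaffold_is_active(0.2))    # H0 not rejected
```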
Methods: The KS and the binomial hypothesis tests
- Continuous data: KS test
  H0: there is no difference between the activity distribution of the compounds having scaffold S3-2 and the background distribution.
- Binary data: binomial test
  H0: there is no difference between the proportion of active compounds among the compounds having scaffold S3-2 and the proportion of active compounds in the full dataset.
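Both tests can be sketched in plain Python. Several choices here are assumptions, not the authors' stated settings: the KS p-value is the two-sided asymptotic Kolmogorov approximation with a common small-sample correction (only approximate for small compound sets), and the binomial test is one-sided (enrichment in actives), with `p0` taken as the fraction of actives in the full dataset. Production code would more likely use `scipy.stats.ks_2samp` and `scipy.stats.binomtest`.

```python
import math
from bisect import bisect_right


def ks_statistic(sample, background):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample), sorted(background)
    d = 0.0
    for x in a + b:  # the maximum gap is attained at one of the observed values
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


def ks_pvalue(d, n, m):
    """Two-sided asymptotic p-value from the Kolmogorov distribution,
    with a small-sample correction on the effective sample size."""
    n_eff = n * m / (n + m)
    lam = (math.sqrt(n_eff) + 0.12 + 0.11 / math.sqrt(n_eff)) * d
    if lam < 0.2:  # series converges slowly here and the p-value is ~1
        return 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                  for k in range(1, 101))
    return min(max(p, 0.0), 1.0)


def binomial_pvalue(k_active, n, p0):
    """One-sided binomial test: P(X >= k_active) for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, i) * p0 ** i * (1 - p0) ** (n - i)
               for i in range(k_active, n + 1))


if __name__ == "__main__":
    scaffold = [55.0, 61.0, 48.0, 70.0, 52.0]  # hypothetical compound-set activities
    background = [float(v) for v in range(0, 100, 2)]  # hypothetical screen activities
    d = ks_statistic(scaffold, background)
    print("KS:", d, ks_pvalue(d, len(scaffold), len(background)))
    print("Binomial:", binomial_pvalue(k_active=3, n=5, p0=0.05))
```

The KS test uses every activity value of the compound set, whereas the binomial test only sees how many of them cross the cut-off.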
Methods: Multiple hypothesis tests: Bonferroni correction
- Problem of false positives
- α: probability of identifying an inactive scaffold as active (for each test done)
- With 100 inactive scaffolds, the probability of identifying at least one as active by chance is 63% (1 - 0.99^100)
- The Bonferroni correction tests each scaffold at a critical significance level α = 0.01 / number of scaffolds
- The 63% estimate assumes that the individual tests are independent; the Bonferroni correction does not require independence, but is conservative when tests are correlated
- Each level of the Scaffold Tree was treated separately
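The 63% figure and the corrected per-test level follow from two one-line formulas. A minimal sketch (function names are illustrative; the first formula assumes independent tests):

```python
def chance_of_false_positive(alpha, n_tests):
    """P(at least one inactive scaffold called active) for n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_level(alpha, n_tests):
    """Per-test significance level that bounds the family-wise error by alpha."""
    return alpha / n_tests


print(chance_of_false_positive(0.01, 100))  # about 0.63 without correction
print(bonferroni_level(0.01, 100))          # corrected per-test level
print(chance_of_false_positive(bonferroni_level(0.01, 100), 100))  # back below 0.01
```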
Methods: Determining the activity of classes
[Workflow diagram: Hypothesis 1 and Hypothesis 2 -> scaffold activity evaluation -> multiple hypothesis test correction (Bonferroni) -> comparison of results]
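The last two workflow steps can be sketched as follows, assuming p-values have already been computed for every scaffold at every Scaffold Tree level, once from the continuous data (KS, Hypothesis 1) and once from thresholded data (binomial, Hypothesis 2). The function names and the dictionary layout are hypothetical, and taking the number of tests as the number of scaffolds at each level is an assumption about how the per-level correction was counted.

```python
def active_scaffolds(pvalues_by_level, alpha=0.01):
    """Bonferroni applied separately at each level; returns (level, scaffold) pairs."""
    active = set()
    for level, pvals in pvalues_by_level.items():
        threshold = alpha / len(pvals)  # number of scaffolds at this level
        active |= {(level, s) for s, p in pvals.items() if p < threshold}
    return active


def compare(ks_pvalues, binom_pvalues, alpha=0.01):
    """Count active classes per method and list the classes found by only one."""
    ksp = active_scaffolds(ks_pvalues, alpha)
    btp = active_scaffolds(binom_pvalues, alpha)
    return {"KSP": len(ksp), "BTP": len(btp), "delta": len(ksp) - len(btp),
            "KSP only": ksp - btp, "BTP only": btp - ksp}


# Hypothetical p-values for two levels (not study data)
ks = {1: {"A": 0.001, "B": 0.02}, 2: {"C": 1e-5, "D": 0.003, "E": 0.5}}
bt = {1: {"A": 0.001, "B": 0.5}, 2: {"C": 0.5, "D": 0.5, "E": 0.5}}
print(compare(ks, bt))
```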
RESULTS
Results: Comparison of KSP and BTP predictions
Numbers of active scaffold classes:

                               |      Total      | BPCA significantly active | BPCA not significantly active
Bioassay                       |  KSP  BTP    Δ  |  BPCA  KSP  BTP    Δ      |  KSP  BTP    Δ
Hydroxysteroid dehydrogenase   |  330  231   99  |   199  183  168   15      |  147   63   84
Caspase-1                      |  331  114  217  |     5    2    2    0      |  329  112  217
PK                             |   12    4    8  |    12    3    3    0      |    9    1    8
Luciferase                     |   67   12   55  |    15   13   11    2      |   54    1   53
Luciferase                     |  178   48  130  |    41   32   35   -3      |  146   13  133
CYP450 2C9                     |   58   33   25  |    34   34   31    3      |   24    2   22
CYP450 3A4                     |  121   64   57  |    60   60   53    7      |   61   11   50
Abbreviations:
- KSP: KS prediction
- BTP: binomial threshold prediction
- Δ: KSP - BTP
- BPCA: binomial PubChem annotation
- Both KSP and BTP retrieve the BPCA significantly active classes.
- KSP identifies more active classes than BTP.
- Most of the additional KSP active classes are not BPCA significantly active.
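The last observation can be checked against the Δ columns of the table. A minimal sketch using only the table's own numbers; note that Δ is a net difference in counts, not a set difference, so the share below approximates "new KSP classes" rather than counting them exactly:

```python
# (bioassay, total Δ, Δ among BPCA significantly active, Δ among BPCA not significantly active)
DELTAS = [
    ("Hydroxysteroid dehydrogenase", 99, 15, 84),
    ("Caspase-1", 217, 0, 217),
    ("PK", 8, 0, 8),
    ("Luciferase", 55, 2, 53),
    ("Luciferase", 130, -3, 133),
    ("CYP450 2C9", 25, 3, 22),
    ("CYP450 3A4", 57, 7, 50),
]

# Consistency: each total Δ splits into its two BPCA groups
for name, total, sig, nonsig in DELTAS:
    assert total == sig + nonsig, name

total_delta = sum(row[1] for row in DELTAS)
nonsig_delta = sum(row[3] for row in DELTAS)
print(total_delta, nonsig_delta, round(nonsig_delta / total_delta, 3))
```

Summed over the seven assays, about 96% of the net extra KSP classes fall in the BPCA not significantly active group.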
Results: KSP significantly active scaffolds annotated as inactive in PubChem
[Figure: example scaffolds marked "WA"; legend: compound activity (PubChem annotation): active, inconclusive, inactive]
Results: Prioritize nodes instead of individual scaffolds
[Figure: Scaffold Tree nodes; legend: scaffold activity (KS prediction / Bonferroni): not significantly active, significantly active]
Results: Visualization tool (Peter Ertl)
CONCLUSION
Conclusion: Compound Set Enrichment
- Validation of the initial hypotheses
- A method to mine HTS data and identify active series of compounds:
  - Chemical classification: Scaffold Tree
  - Statistical analysis: Kolmogorov-Smirnov hypothesis test
  - Multiple hypothesis test correction: Bonferroni correction
- Uses all primary data
- No activity cut-off
- Identifies new active scaffolds not necessarily represented by very active compounds (latent hits) during the primary screen
Acknowledgments
With many thanks to:
Primary mentor: Ansgar Schuffenhauer
Help: MLI group
- Scientific advisers:
  - Christian Parker
  - Hanspeter Gubler
  - Ji-Hu Zhang
  - Peter Ertl
  - Edgar Jacoby
Fellowship: Education office
- Discussions:
  - Martin Beibel
  - Sebastian Bergling
  - Meir Glick
  - Alain Dietrich
  - Marie-Cecile Didiot
Questions?