Title: Novel Approaches To Molecular Similarity
1 Novel Approaches To Molecular Similarity
- Andreas Bender, ab454_at_cam.ac.uk
- Unilever Centre for Molecular Informatics,
- University of Cambridge, UK
2 Outline
- Molecular Similarity: What is it and why is it relevant?
- Our approach to molecular similarity: Finding many active compounds vs. generalizability?
- How to compare molecules
- The algorithm
- Results
- Special Feature: Discovery of Binding Patterns
3 Similarity Searching
- Complementary approach to substructural searching
- In substructure searching, exact retrieval of a subgraph of a molecule is performed
- In similarity searching, an abstract molecular representation in descriptor space is calculated, which is compared to abstract representations of other molecules
- For reviews see e.g.
- Bender, A. and Glen, R.C., Org. Biomol. Chem., 2004, 2, 3204-3218.
- (freely available from www.cheminformatics.org)
4 Molecular Similarity: The way it should be and the way it is
- The God of Molecular Similarity (according to Google Picture Search)
- The Molecular Similarity Principle
- Small structural changes cause small property differences
- Basis of all current structure-property predictions
- A is active, B is not: how about C?
- Solubility of A is known: how does it change if we add group X here?
5 Does the Molecular Similarity Principle work (1)?
6 Does the Molecular Similarity Principle work (2)?
7 The importance of shape
Slides courtesy of Hugo Kubinyi, Erlangen
Lectures: see http://www.cheminformatics.org -> Links -> Education
8 Is it possible and sensible to define molecular similarity?
- YES, but one needs to be careful
- Similarity depends on the context (e.g. the particular receptor; easy in case of non-directional properties, e.g. logP)
- Similar changes may have different (even detrimental!) effects, depending on the system
- Chiral molecules may have totally different activities, which is sometimes problematic to capture
9 Our approaches to molecular similarity
- How to describe and compare molecules
- Description of the system in a suitable form
- Selection of important features
- Model generation and prediction
- Results
- Special Feature: Discovery of Binding Patterns
10 Descriptor Choice
11 2D Environment around an atom (MOLPRINT 2D, a.k.a. Atom Environments)
- Assign Sybyl mol2 atom types
- Find connections
- Find connections to connections
- Create a tree down to n levels
- Bin the atom types for each level
- Create a fingerprint for this atom
[Figure: example atom environment for a central atom of type N2, with the atom types (e.g. Car) counted at Level 0, Level 1 and Level 2]
These features are created for every (heavy) atom in the molecule (J. Chem. Inf. Comput. Sci. 2004, 44, 170-178; 2004, 44, 1708-1718)
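The atom-environment steps above can be sketched in a few lines of Python. This is a minimal illustration, not the published MOLPRINT 2D code: the function names, the string encoding of a feature, and the hand-typed atom labels (Sybyl-style "N.pl3" and "C.ar" for the heavy atoms of aniline) are all assumptions made for this example.

```python
from collections import Counter, deque


def atom_environment(atom_types, bonds, center, max_level=2):
    """Return a string describing the environment of one atom.

    atom_types: dict mapping atom index -> type label (hypothetical input format)
    bonds: iterable of (i, j) index pairs between heavy atoms
    """
    neighbours = {i: set() for i in atom_types}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Breadth-first search assigns each atom its bond distance (level)
    # from the central atom, stopping at max_level.
    level = {center: 0}
    queue = deque([center])
    while queue:
        current = queue.popleft()
        if level[current] == max_level:
            continue
        for nxt in neighbours[current]:
            if nxt not in level:
                level[nxt] = level[current] + 1
                queue.append(nxt)

    # Bin (count) the atom types separately for every level.
    counts = Counter((lvl, atom_types[a]) for a, lvl in level.items())
    return ";".join(f"{lvl}-{n}-{t}" for (lvl, t), n in sorted(counts.items()))


def molecule_fingerprint(atom_types, bonds, max_level=2):
    """Set of atom-environment features, one generated per heavy atom."""
    return {atom_environment(atom_types, bonds, a, max_level) for a in atom_types}


# Heavy atoms of aniline: atom 0 is the nitrogen, atoms 1-6 form the ring.
aniline_types = {0: "N.pl3", 1: "C.ar", 2: "C.ar", 3: "C.ar",
                 4: "C.ar", 5: "C.ar", 6: "C.ar"}
aniline_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]

nitrogen_feature = atom_environment(aniline_types, aniline_bonds, 0)
aniline_fp = molecule_fingerprint(aniline_types, aniline_bonds)
```

Each feature string lists, per level, how many atoms of each type were found. Because the fingerprint is a set, atoms with identical environments (for example the two ortho carbons) contribute a single feature.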
12 Feature Selection
- E.g. comparing faces first requires the identification of key features.
- How do we identify these?
- The same applies to molecules.
13 B) Information-Gain Feature Selection
- How can we select the active compounds (red)?
[Figure legend: red = active, green = inactive]
14 C) Naïve Bayesian Classifier (classification by presumptive evidence)
- The next step is to identify which molecules belong to which class.
- Example from e-mail classification (spam detection)
- Training set from a nerdy chemist's inbox
- Assign weighting factors to individual features, depending on relative frequencies in training set
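The e-mail example can be made concrete with a short Python sketch. It assumes each message is reduced to a set of words and that a weight is the ratio of a word's relative frequency in spam to that in non-spam. The add-alpha (Laplace) correction, the function names and the toy messages are assumptions of this sketch and are not taken from the slides.

```python
def feature_weights(spam_docs, ham_docs, alpha=1.0):
    """Per-feature weights P(f | spam) / P(f | ham) from relative frequencies.

    alpha is an add-alpha (Laplace) correction, an assumption of this sketch
    that prevents zero or infinite weights for rarely seen features.
    """
    vocabulary = set().union(*spam_docs, *ham_docs)
    weights = {}
    for f in vocabulary:
        p_spam = (sum(f in d for d in spam_docs) + alpha) / (len(spam_docs) + 2 * alpha)
        p_ham = (sum(f in d for d in ham_docs) + alpha) / (len(ham_docs) + 2 * alpha)
        weights[f] = p_spam / p_ham
    return weights


def class_ratio(doc, weights, prior_ratio=1.0):
    """Prior ratio times the product of the weights of the features present."""
    ratio = prior_ratio
    for f in doc:
        ratio *= weights.get(f, 1.0)  # features never seen carry no evidence
    return ratio


# Toy training set (hypothetical messages, already split into word sets).
spam = [{"cheap", "pills"}, {"cheap", "offer"}]
ham = [{"nmr", "spectrum"}, {"cheap", "nmr"}]

weights = feature_weights(spam, ham)
spam_like = class_ratio({"cheap", "pills"}, weights)   # ratio above 1
ham_like = class_ratio({"nmr", "spectrum"}, weights)   # ratio below 1
```

Only the features present in a message are multiplied in here, which is a simplification. Replacing words by atom environments and spam/non-spam by active/inactive gives the molecular analogue.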
15 Classification
- To do this we use a Naïve Bayesian Classifier using the features (atom environments) we have identified as being important.
- Ratio > 1: class membership 1
- Ratio < 1: class membership 2
- F: feature vector
- f_i: feature elements
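The expression for the ratio itself is not reproduced in this transcript. Assuming the standard two-class naive Bayes formulation, with CL_1 and CL_2 as this sketch's labels for the two classes, it would read:

```latex
\frac{P(CL_1 \mid F)}{P(CL_2 \mid F)}
  = \frac{P(CL_1)}{P(CL_2)} \prod_{i} \frac{P(f_i \mid CL_1)}{P(f_i \mid CL_2)}
```

The product form follows from treating the feature elements f_i as conditionally independent given the class, which is the "naive" assumption.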
16 Application: lead discovery
- If I have one active compound, will I find another one?
- Database: MDL Drug Data Report (MDDR)
- 957 ligands selected from MDDR
- 49 5HT3 receptor antagonists,
- 40 Angiotensin Converting Enzyme inhibitors (ACE),
- 111 HMG-CoA reductase inhibitors (HMG),
- 134 PAF antagonists and
- 49 Thromboxane A2 antagonists (TXA2)
- 574 inactives
- Briem and Lessel, Perspect. Drug Discov. Des., 2000, 20, 245-264.
- Calculated hit rate among ten nearest neighbours for each molecule
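The evaluation measure, hit rate among the nearest neighbours of a query, can be sketched as follows. The sketch assumes molecules are represented as feature sets and ranked by the Tanimoto coefficient. The function names and the toy fingerprints are hypothetical, and k is set to 2 only because the toy data set is tiny.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def hit_rate(query_idx, fingerprints, labels, k=10):
    """Fraction of the k nearest neighbours that share the query's class."""
    query = fingerprints[query_idx]
    others = [i for i in range(len(fingerprints)) if i != query_idx]
    # Rank all other molecules by decreasing similarity to the query.
    ranked = sorted(others, key=lambda i: tanimoto(query, fingerprints[i]),
                    reverse=True)
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(labels[i] == labels[query_idx] for i in top) / len(top)


# Toy data set: three "ACE" molecules and two inactives (hypothetical features).
fps = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c", "e"},
       {"x", "y"}, {"x", "z"}]
classes = ["ACE", "ACE", "ACE", "inactive", "inactive"]

rate_active_query = hit_rate(0, fps, classes, k=2)
rate_inactive_query = hit_rate(3, fps, classes, k=2)
```

Averaging `hit_rate` over every molecule of an activity class, with k=10, would give a per-class figure of the kind described above.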
17 Comparison: Single Queries
Using Tanimoto Coefficient
Using Bayesian
- Grey bars: Briem and Lessel, Perspect. Drug Disc. Des., 2000, 20, 245-264.
- Black: Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 1708-1718.
18 Combining Information of 5 Actives
Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 170-178.
19 TXA2, Graph-based Descriptors
[Figure: query structure and retrieved hits 1-7]
Very little diversity in heterocyclic systems: no patents, no money!
20 3D Environment around a surface point: solvent accessible surface using local surface properties
Central Point (Layer 0)
Points in Layer 1
Etc.
21 Overall Performance Comparable to 2D methods
Bender, A. et al., J. Chem. Inf. Comput. Sci., 2004, 44, 170-178.
22 TXA2, 7 Hits among Top 10 by MOLPRINT 3D
[Figure: query structure and retrieved hits 1-7]
23 Which features are selected for classification?
- Even if your classifier works, do the selected features make sense?
- Set of active vs. inactive molecules
- Information Gain calculated for each feature; those which are much more frequent among actives are suspicious and might constitute the pharmacophore
- Look at features from HMG and TXA2
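The information gain of a single feature can be computed as sketched below. This assumes the usual entropy-based definition applied to a split on presence or absence of the feature. The function names and the toy feature sets are hypothetical.

```python
import math


def entropy(n_active, n_inactive):
    """Shannon entropy (in bits) of an active/inactive split."""
    total = n_active + n_inactive
    h = 0.0
    for n in (n_active, n_inactive):
        if n:  # empty classes contribute nothing
            p = n / total
            h -= p * math.log2(p)
    return h


def information_gain(feature, actives, inactives):
    """Entropy reduction obtained by splitting on presence of `feature`."""
    a_with = sum(feature in m for m in actives)
    i_with = sum(feature in m for m in inactives)
    a_without = len(actives) - a_with
    i_without = len(inactives) - i_with
    total = len(actives) + len(inactives)

    before = entropy(len(actives), len(inactives))
    after = ((a_with + i_with) / total) * entropy(a_with, i_with) \
        + ((a_without + i_without) / total) * entropy(a_without, i_without)
    return before - after


# Toy feature sets: "ring" occurs in every active and in no inactive.
actives = [{"ring", "acid"}, {"ring", "amide"}]
inactives = [{"acid"}, {"amide"}]

gain_ring = information_gain("ring", actives, inactives)
gain_acid = information_gain("acid", actives, inactives)
```

Information gain is symmetric: a feature found only in inactives scores as high as one found only in actives. To single out candidate pharmacophore features, one would additionally check that the feature is the more frequent one among the actives.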
24 Selected Features - HMG
- Binding Site HMG: rigid lipophilic ring
25 HMG-15
26 TXA2
Yellow: lipophilic side chains
- Yamamoto et al., J. Med. Chem., 1993, 36, 820
27 TXA2-44
28 TXA2-7
29 Summary
- 2D Method: Finds lots of active molecules, but they are similar to what is known already
- (Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 170-178; Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 1708-1718.)
- 3D Method: Finds fewer active compounds but enables discovery of new chemotypes
- (Bender, A. et al., J. Med. Chem., 2004, 47, 6569-6583.)
- Features shown to correlate with binding patterns
30 Acknowledgements
- Robert C Glen (Unilever Centre, Cambridge, UK)
- Hamse Y. Mussa (Unilever Centre, Cambridge, UK)
- Stephan Reiling (Aventis, Bridgewater, USA)
- David Patterson (Tripos)
- Software
- GRID, CACTVS, gOpenMol and many, many others
- Funding
- The Gates Cambridge Trust, Unilever, Tripos