Title: Novel Approaches To Molecular Similarity (Molecular Similarity Searching Using COSMO Screening Charges)
1 Novel Approaches To Molecular Similarity
(Molecular Similarity Searching Using COSMO Screening Charges)
- Andreas Bender, ab454_at_cam.ac.uk
- Unilever Centre for Molecular Science Informatics, Chemistry Dept, University of Cambridge, UK
2 Outline
- Introduction to Molecular Similarity
- What is it and why is it hugely important?
- Our approach to molecular similarity: finding many active compounds vs. generalizability?
- How to compare molecules
- The algorithm
- Results
- Special Feature: Discovery of Binding Patterns
3 Similarity Searching: What is it?
- Describes the identification of molecules similar to a given molecule (in analogy to the psychological concept of similarity in perception)
- Complementary approach to substructural searching
- In substructure searching, exact retrieval of a subgraph of a molecule is performed
- In similarity searching, an abstract molecular representation in descriptor space is calculated, which is compared to abstract representations of other molecules
- For reviews see e.g.
- Bender, A. and Glen, R.C., Org. Biomol. Chem., 2004, 2, 3204-3218.
- (freely available from www.cheminformatics.org)
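The comparison of abstract representations can be sketched as follows. This is a minimal illustration, assuming each molecule is represented as a set of feature identifiers and that the Tanimoto coefficient is the similarity measure; the feature names and the library molecules `mol_A` to `mol_C` are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: shared features / all features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query, library):
    """Return library names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Hypothetical feature sets; real descriptors would come from the molecules themselves.
query = {"f1", "f2", "f3", "f4"}
library = {
    "mol_A": {"f1", "f2", "f3"},
    "mol_B": {"f1", "f9"},
    "mol_C": {"f7", "f8"},
}
ranking = rank_by_similarity(query, library)
```

The ranking only reflects what the chosen representation encodes, which is why descriptor choice matters so much in what follows.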
4 Similarity Searching: Why is it relevant?
- The old complaint: it is expensive to bring drugs to the market (800 million USD) and it takes a long time (10 years)
- Similarity searching can help find new drugs (or rather their early-stage companions, hits or leads) by picking the most promising compounds to synthesize and test
- Even de novo (computer-based) design of completely new structures is possible
- This decreases the need for animal testing; it saves time and money
- -> It is a good thing to do
Well well - but HOW?
5 Similarity Searching: Illustration
- We have a red Porsche, a blue Ferrari and a red kettcar. Which ones are similar?
- 1. Abstract representation "Top Speed": the Porsche and the Ferrari are similar
- 2. Abstract representation "Colour": the Porsche and the kettcar are similar (!)
- Same problem with molecules, but they don't have colours (that is, the important properties are not obvious). How do you encode those molecules so that they are similar in some abstract representation?
6 Molecular Similarity: The way it should be and the way it is
- The God of Molecular Similarity (according to Google Picture Search)
- The Molecular Similarity Principle
- Small structural changes cause small property differences
- Basis of all current structure-property predictions
- A is active, B is not; how about C?
- Solubility of A is known; how does it change if we add group X here?
7 Does the Molecular Similarity Principle work?
8 The importance of shape
Slides courtesy of Hugo Kubinyi, Erlangen
Lectures: see http://www.cheminformatics.org -> Links -> Education
9 What is the relevant property?
- Usually no one knows / it depends on the particular system
- Current descriptors treat molecules as static entities, but even by definition receptor binding involves dynamical motions of the protein
- No agreement exists on which kind of interaction of the ligand with the receptor actually causes (for example agonistic or antagonistic) action. Is it occupancy? Is it on-off rates? Or some completely different property?
- How do you encode shape / surface properties? (We are not dealing with 1-dimensional entities like proteins / DNA; we encounter rings and branching!)
10 Is it possible and sensible to define molecular similarity?
- YES, but one needs to be careful
- Similarity depends on the context (e.g. the particular receptor; easy in case of non-directional properties, e.g. solubility)
- Similar changes may have different (even detrimental!) effects, depending on the system
- Chiral molecules (same structure, but different stereochemistry, like your left and right hand) may have totally different activities; sometimes problematic to capture
11 Our approaches to molecular similarity
- How to describe and compare molecules
- Description of the system in a suitable form
- Selection of important features
- Model generation and prediction
- Results
- Special Feature: Discovery of Binding Patterns
12 Descriptor Choice
13 2D Environment around an atom (MOLPRINT 2D, a.k.a. Atom Environments)
Assign Sybyl mol2 atom types -> find connections -> find connections to connections -> create a tree down to n levels -> bin the atom types for each level -> create a fingerprint for this atom
[Figure: example atom environment with the atom types (N.2, C.ar) binned and counted at Level 0, Level 1 and Level 2]
These features are created for every (heavy) atom in the molecule (Bender, A. et al., J. Chem. Inf. Comput. Sci. 2004, 44, 170-178)
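The steps above can be sketched in a few lines. This is an illustrative reading of the procedure, not the MOLPRINT 2D implementation: it assumes the molecule is given as a list of atom-type strings plus a list of bonds, and that "levels" are topological (bond-count) distances from the central atom. The function names and the pyridine-like example ring are hypothetical.

```python
from collections import Counter, deque


def atom_environment(atom_types, bonds, center, depth=2):
    """Count atom types per level (bond distance) around one central atom."""
    neighbours = {i: set() for i in range(len(atom_types))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Breadth-first search records the bond distance of every atom within `depth`.
    distance = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if distance[atom] == depth:
            continue  # do not expand beyond the outermost level
        for nxt in neighbours[atom]:
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    # Bin the atom types for each level.
    counts = Counter((level, atom_types[a]) for a, level in distance.items())
    return tuple(sorted((level, t, n) for (level, t), n in counts.items()))


def molecule_features(atom_types, bonds, depth=2):
    """One environment per heavy atom; the molecule is the set of its environments."""
    return {atom_environment(atom_types, bonds, i, depth) for i in range(len(atom_types))}


# Hypothetical six-membered aromatic ring with one nitrogen (heavy atoms only).
ring_types = ["N.ar", "C.ar", "C.ar", "C.ar", "C.ar", "C.ar"]
ring_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
nitrogen_env = atom_environment(ring_types, ring_bonds, center=0)
ring_features = molecule_features(ring_types, ring_bonds)
```

Because symmetry-equivalent atoms yield identical environments, the six ring atoms here collapse to four distinct features.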
14 Feature Selection
- E.g. comparing faces first requires the identification of key features.
- How do we identify these?
- The same applies to molecules.
15 B) Information-Gain Feature Selection
- How can we select the active compounds (red)?
[Figure legend: red = active, green = inactive]
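As a rough sketch of what information-gain feature selection computes, the snippet below scores a single binary feature by how much knowing its presence reduces the class entropy of an active/inactive set. It assumes the standard Shannon-entropy definition of information gain; the function names and the toy data are illustrative only.

```python
import math


def entropy(labels):
    """Shannon entropy in bits of a list of boolean class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(has_feature, is_active):
    """Entropy of the whole set minus the weighted entropy after splitting on the feature."""
    with_f = [a for f, a in zip(has_feature, is_active) if f]
    without_f = [a for f, a in zip(has_feature, is_active) if not f]
    n = len(is_active)
    remainder = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(is_active) - remainder


# Toy set: two actives followed by two inactives.
is_active = [True, True, False, False]
gain_informative = information_gain([True, True, False, False], is_active)
gain_useless = information_gain([True, False, True, False], is_active)
```

A feature present in exactly the actives receives the maximum gain, while one spread evenly over both classes receives none, so ranking features by this score keeps the ones that separate the classes.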
16 C) Naïve Bayesian Classifier (classification by presumptive evidence)
- The next step is to identify which molecules belong to which class.
- Example from e-mail classification (spam detection)
- Training set from nerdy chemist's inbox
- Assign weighting factors to individual features, depending on relative frequencies in training set
17 Classification
- To do this we use a Naïve Bayesian Classifier using the features (atom environments) we have identified as being important.
- Ratio > 1: Class membership 1
- Ratio < 1: Class membership 2
- F: feature vector
- f_i: feature elements
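The decision rule can be illustrated with a small sketch. It assumes the usual naive Bayes form of the ratio, P(class 1 | F) / P(class 2 | F), computed as the prior ratio times the product over feature elements f_i of P(f_i | class 1) / P(f_i | class 2). The add-one smoothing, the use of only the features present in the query, and all names below are assumptions made for the example rather than details taken from the slides.

```python
import math


def train(molecules, labels):
    """Count, per class, how many training molecules contain each feature."""
    counts = {1: {}, 2: {}}
    sizes = {1: 0, 2: 0}
    for features, label in zip(molecules, labels):
        sizes[label] += 1
        for f in features:
            counts[label][f] = counts[label].get(f, 0) + 1
    return counts, sizes


def log_ratio(features, counts, sizes):
    """Logarithm of the class-1 to class-2 probability ratio for a feature set."""
    score = math.log(sizes[1] / sizes[2])  # ratio of class priors
    for f in features:
        # Add-one smoothing (an assumption here) avoids zero probabilities.
        p1 = (counts[1].get(f, 0) + 1) / (sizes[1] + 2)
        p2 = (counts[2].get(f, 0) + 1) / (sizes[2] + 2)
        score += math.log(p1 / p2)
    return score


def classify(features, counts, sizes):
    """Ratio > 1 (log ratio > 0) means class 1, otherwise class 2."""
    return 1 if log_ratio(features, counts, sizes) > 0 else 2


# Hypothetical training data: class 1 = active, class 2 = inactive.
training_molecules = [{"a", "b"}, {"a", "c"}, {"x", "y"}, {"x", "c"}]
training_labels = [1, 1, 2, 2]
counts, sizes = train(training_molecules, training_labels)
```

Working with the logarithm turns the product of per-feature ratios into a sum, which is numerically safer when a molecule has many features.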
18 Application: lead discovery
- If I have one active compound, will I find another one?
- Database: MDL Drug Data Report (MDDR)
- 957 ligands selected from MDDR
- 49 5HT3 receptor antagonists,
- 40 Angiotensin Converting Enzyme inhibitors (ACE),
- 111 HMG-CoA reductase inhibitors (HMG),
- 134 PAF antagonists and
- 49 Thromboxane A2 antagonists (TXA2)
- 574 inactives
- Briem and Lessel, Perspect. Drug Discov. Des., 2000, 20, 245-264.
- Calculated: hit rate among ten nearest neighbours for each molecule
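The evaluation measure can be sketched as below: each molecule is used in turn as the query, the remaining molecules are ranked by similarity, and the fraction of the top k that share the query's activity class is reported. The function name, the similarity function passed in and the toy data are assumptions for illustration; they are not the MDDR data set.

```python
def hit_rate_at_k(query_index, fingerprints, classes, similarity, k=10):
    """Fraction of the k nearest neighbours that belong to the query's class."""
    query_fp = fingerprints[query_index]
    others = [i for i in range(len(fingerprints)) if i != query_index]
    # Most similar molecules first; the query itself is excluded.
    others.sort(key=lambda i: similarity(query_fp, fingerprints[i]), reverse=True)
    top = others[:k]
    hits = sum(classes[i] == classes[query_index] for i in top)
    return hits / len(top)


def shared_fraction(a, b):
    """Simple set-overlap similarity used only for this example."""
    return len(a & b) / len(a | b)


# Toy data: five molecules, two activity classes.
toy_fingerprints = [{1, 2, 3}, {1, 2, 4}, {1, 2}, {7, 8}, {7, 9}]
toy_classes = ["A", "A", "A", "B", "B"]
rate_first = hit_rate_at_k(0, toy_fingerprints, toy_classes, shared_fraction, k=2)
rate_fourth = hit_rate_at_k(3, toy_fingerprints, toy_classes, shared_fraction, k=2)
```

Averaging this value over all queries of one activity class gives a per-class hit rate of the kind compared in the following slides.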
19 Comparison: Single Queries
[Figure panels: Using Tanimoto Coefficient; Using Bayesian]
- Grey bars: Briem and Lessel, Perspect. Drug Discov. Des., 2000, 20, 245-264.
- Black bars: Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 1708-1718.
20 Combining Information of 5 Actives
Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 170-178.
21 TXA2, Graph-based Descriptors
[Figure: query structure and hits 1-7]
Very little diversity in heterocyclic systems: no patents, no money, no good!
22 3D Environment around a surface point: solvent accessible surface, using local surface properties
[Figure labels: Central Point (Layer 0); Points in Layer 1; etc.]
Bender, A., et al., J. Med. Chem., 2004, 47, 6569-6583; IEEE SMC 2004 Proceedings
23 Overall Performance Comparable to 2D methods
Bender, A. et al., J. Chem. Inf. Comput. Sci., 2004, 44, 170-178.
24 TXA2, 7 Hits among Top 10 by MOLPRINT 3D
[Figure: query structure and hits 1-7]
25 Which features are selected for classification?
- Even if your classifier works, do the selected features make sense?
- Set of active vs. inactive molecules
- Information Gain calculated for each feature; those which are much more frequent among actives are suspicious and might constitute the pharmacophore
- Look at features from HMG-CoA Reductase Inhibitors
26 Selected Features - HMG
- Binding Site HMG: rigid lipophilic ring
27 HMG-15
28 Disadvantages
- Multiple probes had to be employed to cover putative interactions sufficiently
- Force fields neglect polarization / back-polarization effects (that is, they calculate charges which are not seen in solution)
- Force fields (usually) employ point charges, thus they don't capture directionality of some interactions such as hydrogen bonds
- -> Use more sophisticated QM method!
29 COSMO: Calculation of screening charges in ideal conductor
30 Why COSMO-RS Properties?
- Interactions derived from first principles on single scale
- Gives directionality of H-acceptor lobes (unlike most force fields; exceptions are e.g. the XED force field by Andy Vinter / Cresset)
- Employs solvent model, polarization / back-polarization
- Classification in agreement with chemical intuition (e.g. O of ester, but not O- is H-bond acceptor)
- Inaccessible atoms not used (no accessible surface)
- Secondary effects captured which are not accounted for by atom-typing
31 COSMO σ-Profile
32 A HMG-CoA Reductase Inhibitor
- Statin binding to HMG-CoA reductase involves charge interactions of a carboxylic acid group and hydrogen bond donor/acceptor functions to the pyruvate binding site
- In addition, large lipophilic groups of the ligand are required, which bind to a floppy lipophilic pocket of the target protein.
- Features can be well distinguished from σ screening charges
- Carboxylate is shown to the right (purple), hydrogen bond acceptor functions beneath side chain (red)
- Hydrogen bond donor functions point towards viewer (blue), while the lipophilic bulk of the structure is given in green
33 Encoding as 3-Point Pharmacophores
- Average σ-values calculated for each heavy atom
- Atom typing (pos, neg, lipo, acceptor, donor) according to heavy atom average
Courtesy of Martin Stahl (Roche)
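A possible reading of this encoding is sketched below. It assumes each heavy atom carries an average σ value and 3D coordinates, assigns one of the five types by thresholding σ, and then describes every atom triplet by its three types together with binned inter-atomic distances. The cut-off values, the bin width, the sign convention (positive σ for negatively polarised surface, as is usual in COSMO) and all function names are hypothetical choices for illustration, not values from the talk.

```python
from itertools import combinations
import math


def atom_type(sigma_avg):
    """Map an average screening charge density to a pharmacophore type.

    The thresholds (in e/A^2) are hypothetical and chosen only for this example.
    """
    if sigma_avg >= 0.02:
        return "neg"
    if sigma_avg >= 0.01:
        return "acceptor"
    if sigma_avg <= -0.02:
        return "pos"
    if sigma_avg <= -0.01:
        return "donor"
    return "lipo"


def pharmacophore_triplets(atoms, bin_width=1.0):
    """atoms: list of (sigma_avg, (x, y, z)). Returns the set of triplet keys."""
    typed = [(atom_type(sigma), xyz) for sigma, xyz in atoms]
    keys = set()
    for a, b, c in combinations(typed, 3):
        corners = []
        # Pair each corner's type with the binned length of the opposite edge.
        for vertex, p, q in ((a, b, c), (b, a, c), (c, a, b)):
            opposite = math.dist(p[1], q[1])
            corners.append((vertex[0], int(opposite // bin_width)))
        # Sorting makes the key independent of the order of the atoms.
        keys.add(tuple(sorted(corners)))
    return keys


# Hypothetical three-atom example.
example_atoms = [
    (0.025, (0.0, 0.0, 0.0)),   # strongly positive sigma -> "neg"
    (-0.012, (3.5, 0.0, 0.0)),  # moderately negative sigma -> "donor"
    (0.0, (0.0, 4.5, 0.0)),     # near-zero sigma -> "lipo"
]
example_keys = pharmacophore_triplets(example_atoms)
```

Two molecules can then be compared through the overlap of their triplet sets, in the same way as for any other feature-based fingerprint.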
34 Comparison to other methods
35 Scaffolds Found
36 Summary
- 2D Method: finds lots of active molecules, but they are similar to what is known already
- (Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 170-178; Bender, A., et al., J. Chem. Inf. Comput. Sci., 2004, 44, 1708-1718.)
- 3D Method: finds fewer active compounds but enables discovery of new chemotypes
- (Bender, A. et al., J. Med. Chem., 2004, 47, 6569-6583.)
- Features shown to correlate with binding patterns
- Similarity searching using screening charges derived from first principles shows good performance and possesses a sound theoretical basis
37 Acknowledgements
- Robert C Glen (Unilever Centre, Cambridge, UK)
- Hamse Y. Mussa (Unilever Centre, Cambridge, UK)
- Stephan Reiling (Aventis, Bridgewater, USA)
- Software
- GRID, CACTVS, gOpenMol and many, many others
- Funding
- Bill Gates, Unilever, Tripos