Title: Novel Approaches To Molecular Similarity (Molecular Similarity Searching Using COSMO Screening Charges)


1
Novel Approaches To Molecular Similarity
(Molecular Similarity Searching Using COSMO
Screening Charges)
  • Andreas Bender, ab454_at_cam.ac.uk
  • Unilever Centre for Molecular Science
    Informatics,
  • Chemistry Dept, University of Cambridge, UK

2
Outline
  • Introduction to Molecular Similarity
  • What is it and why is it hugely important?
  • Our approach to molecular similarity: finding
    many active compounds vs. generalizability?
  • How to compare molecules
  • The algorithm
  • Results
  • Special feature: discovery of binding patterns

3
Similarity Searching: What is it?
  • Describes the identification of molecules
    similar to a given molecule (in analogy to the
    psychological concept of similarity in
    perception)
  • Complementary approach to substructural searching
  • In substructure searching, exact retrieval of a
    subgraph of a molecule is performed
  • In similarity searching, an abstract molecular
    representation in descriptor space is calculated
    which is compared to abstract representations of
    other molecules
  • For reviews see e.g.
  • Bender, A. and Glen, R.C., Org. Biomol. Chem.,
    2004, 2, 3204-3218.
  • (freely available from www.cheminformatics.org)

4
Similarity Searching: Why is it relevant?
  • The old complaint: it is expensive to bring drugs
    to the market (800 million USD) and it takes a long
    time (10 years)
  • Similarity searching can help find new drugs
    (or rather their early-stage companions, hits or
    leads) by picking the most promising compounds
    to synthesize and test
  • Even de novo (computer-based) design of
    completely new structures is possible
  • This decreases the need for animal testing and
    saves time and money
  • => It is a good thing to do

Well well - but HOW?
5
Similarity Searching: Illustration
  • We have a red Porsche, a blue Ferrari and a red
    kettcar. Which ones are similar?
  • 1. Abstract representation, top speed: the Porsche
    and the Ferrari are similar
  • 2. Abstract representation, colour: the Porsche
    and the kettcar are similar (!)
  • Same problem with molecules, but they don't have
    colours (meaning the important properties are not
    obvious). How do you encode those molecules to be
    similar in some abstract representation?

6
Molecular Similarity: The way it should be and
the way it is
  • The God of Molecular Similarity (according to
    Google Picture Search)
  • The Molecular Similarity Principle
  • Small structural changes cause small property
    differences
  • Basis of all current structure-property
    predictions
  • A is active, B is not: how about C?
  • Solubility of A is known: how does it change if
    we add group X here?

7
Does the Molecular Similarity Principle work?
8
The importance of shape
Slides courtesy of Hugo Kubinyi, Erlangen
Lectures: see http://www.cheminformatics.org ->
Links -> Education
9
What is the relevant property?
  • Usually no one knows / it depends on the
    particular system
  • Current descriptors treat molecules as static
    entities, but even by definition receptor
    binding involves dynamical motions of the protein
  • No agreement exists on which kind of interaction of
    the ligand with the receptor actually causes (for
    example agonistic or antagonistic) action. Is it
    occupancy? Is it on-off rates? Or some completely
    different property?
  • How do you encode shape / surface properties??
    (We are not dealing with 1-dimensional entities
    like proteins / DNA, we encounter rings,
    branching!)

10
Is it possible and sensible to define molecular
similarity?
  • YES, but one needs to be careful
  • Similarity depends on the context (e.g. the
    particular receptor; easy in the case of
    non-directional properties, e.g. solubility)
  • Similar changes may have different (even
    detrimental!) effects, depending on system
  • Chiral molecules (same structure, but different
    stereochemistry, like your left and right hand)
    may have totally different activities; this is
    sometimes problematic to capture

11
Our approaches to molecular similarity
  • How to describe and compare molecules
  • Description of the system in a suitable form
  • Selection of important features
  • Model generation and prediction
  • Results
  • Special feature: discovery of binding patterns

12
Descriptor Choice
13
2D Environment around an atom (MOLPRINT 2D,
a.k.a. Atom Environments)
  • E.g. 6-aminoquinoline

  • Assign Sybyl mol2 atom types
  • Find connections
  • Find connections to connections
  • Create a tree down to n levels
  • Bin the atom types for each level
  • Create a fingerprint for this atom

[Figure: atom environment of one atom. Level 0:
N2; level 1: Car, Car; level 2: Car, Car, Car]
These features are created for every (heavy) atom
in the molecule (Bender, A. et al., J. Chem. Inf.
Comput. Sci. 2004, 44, 170-178)
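
The steps listed on this slide can be sketched in a few lines of Python. This is an illustrative reading of the procedure, not the MOLPRINT 2D implementation: the molecular graph of 6-aminoquinoline is typed in by hand, the atom labels and the Sybyl-style types (`N.ar`, `C.ar`, `N.pl3`) are assumptions made for the example, and the string format of the fingerprint key is invented here.

```python
from collections import Counter, deque

# Hand-built heavy-atom graph of 6-aminoquinoline.
# Atom labels and Sybyl-style types are assumptions for this sketch.
ATOM_TYPES = {
    "N1": "N.ar", "C2": "C.ar", "C3": "C.ar", "C4": "C.ar", "C4a": "C.ar",
    "C5": "C.ar", "C6": "C.ar", "C7": "C.ar", "C8": "C.ar", "C8a": "C.ar",
    "N6": "N.pl3",
}
BONDS = [
    ("N1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C4a"), ("C4a", "C5"),
    ("C5", "C6"), ("C6", "C7"), ("C7", "C8"), ("C8", "C8a"), ("C8a", "N1"),
    ("C4a", "C8a"), ("C6", "N6"),
]


def build_adjacency(bonds):
    """Turn a bond list into a neighbour lookup."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def atom_environment(center, adjacency, atom_types, max_level=2):
    """Count atom types at each bond distance 0..max_level from `center`."""
    levels = [Counter() for _ in range(max_level + 1)]
    seen = {center}
    queue = deque([(center, 0)])
    while queue:  # breadth-first walk outwards from the central atom
        atom, level = queue.popleft()
        levels[level][atom_types[atom]] += 1
        if level == max_level:
            continue
        for neighbour in sorted(adjacency[atom]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, level + 1))
    return levels


def environment_key(levels):
    """Flatten the per-level counts into one hashable feature string."""
    return ";".join(
        f"{level}-" + ",".join(f"{n}x{t}" for t, n in sorted(counts.items()))
        for level, counts in enumerate(levels)
    )


adjacency = build_adjacency(BONDS)
# One feature per heavy atom, as stated on the slide.
fingerprint = {
    atom: environment_key(atom_environment(atom, adjacency, ATOM_TYPES))
    for atom in ATOM_TYPES
}
print(fingerprint["N1"])
```

For the ring nitrogen this yields one aromatic nitrogen at level 0, two aromatic carbons at level 1 and three at level 2, which matches the level layout recoverable from the slide's figure.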
14
Feature Selection
  • E.g. comparing faces first requires the
    identification of key features.
  • How do we identify these?
  • The same applies to molecules.

15
B) Information-Gain Feature Selection
  • How can we select the active compounds (red)?

Red: active; green: inactive
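
A minimal sketch of information-gain feature selection, assuming each molecule is represented as a set of features plus an active/inactive flag. The function names and the toy data are hypothetical; the gain is the standard entropy reduction obtained by splitting the data set on presence or absence of a feature.

```python
from math import log2


def entropy(n_active, n_inactive):
    """Shannon entropy (in bits) of a two-class set."""
    total = n_active + n_inactive
    h = 0.0
    for n in (n_active, n_inactive):
        if n:
            p = n / total
            h -= p * log2(p)
    return h


def class_counts(subset):
    """Return (number of actives, number of inactives) in a subset."""
    actives = sum(1 for _, is_active in subset if is_active)
    return actives, len(subset) - actives


def information_gain(molecules, feature):
    """Entropy reduction from splitting on presence/absence of `feature`.

    molecules: list of (feature_set, is_active) pairs.
    """
    with_f = [m for m in molecules if feature in m[0]]
    without_f = [m for m in molecules if feature not in m[0]]
    gain = entropy(*class_counts(molecules))
    for subset in (with_f, without_f):
        if subset:
            gain -= len(subset) / len(molecules) * entropy(*class_counts(subset))
    return gain


def select_features(molecules, n_features):
    """Keep the n_features with the highest information gain."""
    all_features = set().union(*(features for features, _ in molecules))
    ranked = sorted(all_features,
                    key=lambda f: (-information_gain(molecules, f), f))
    return ranked[:n_features]


# Toy data set (hypothetical): feature "a" separates the classes perfectly.
molecules = [
    ({"a", "x"}, True),
    ({"a"}, True),
    ({"x"}, False),
    ({"b"}, False),
]
print(select_features(molecules, 2))
```

Note that information gain is symmetric: a feature that occurs only among inactives scores as highly as one that occurs only among actives, so the direction has to be checked separately if only active-enriched features are wanted.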
16
C) Naïve Bayesian Classifier (classification by
presumptive evidence)
  • The next step is to identify which molecules
    belong to which class.
  • Example from e-mail classification (spam
    detection)
  • Training set from a nerdy chemist's inbox
  • Assign weighting factors to individual features,
    depending on relative frequencies in training set
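
The e-mail example can be sketched as follows. This is a toy illustration under stated assumptions, not the classifier used in the cited work: the Laplace correction (`alpha`), the choice to score only features that are present, and the inbox contents are all assumptions introduced here. Class 1 is "spam" and class 2 is "not spam"; both classes must occur in the training set.

```python
from math import log


def train(examples, alpha=1.0):
    """Estimate priors and P(feature | class) from relative frequencies.

    examples: list of (feature_set, label) with label 1 or 2.
    alpha: Laplace correction (an assumption of this sketch) so that
    features never seen in one class do not give zero probabilities.
    """
    n = {1: 0, 2: 0}
    feature_counts = {1: {}, 2: {}}
    vocabulary = set()
    for features, label in examples:
        n[label] += 1
        for f in features:
            feature_counts[label][f] = feature_counts[label].get(f, 0) + 1
            vocabulary.add(f)
    likelihood = {
        label: {
            f: (feature_counts[label].get(f, 0) + alpha) / (n[label] + 2 * alpha)
            for f in vocabulary
        }
        for label in (1, 2)
    }
    prior = {label: n[label] / len(examples) for label in (1, 2)}
    return prior, likelihood


def log_ratio(features, prior, likelihood):
    """log[P(class 1 | F) / P(class 2 | F)], using present features only."""
    score = log(prior[1] / prior[2])
    for f in features:
        if f in likelihood[1]:  # features unseen in training are ignored
            score += log(likelihood[1][f] / likelihood[2][f])
    return score


def classify(features, prior, likelihood):
    """Class 1 if the ratio exceeds 1 (log ratio > 0), otherwise class 2."""
    return 1 if log_ratio(features, prior, likelihood) > 0 else 2


# Hypothetical inbox: class 1 = spam, class 2 = not spam.
inbox = [
    ({"viagra", "free"}, 1),
    ({"free", "offer"}, 1),
    ({"nmr", "seminar"}, 2),
    ({"seminar", "free"}, 2),
]
prior, likelihood = train(inbox)
print(classify({"viagra", "free"}, prior, likelihood))
```

Working with the logarithm of the ratio turns the product of per-feature weights into a sum, which avoids numerical underflow when many features are combined.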

17
Classification
  • To do this we use a Naïve Bayesian Classifier
    using the features (atom environments) we have
    identified as being important.
  • Ratio > 1: class membership 1
  • Ratio < 1: class membership 2
  • F: feature vector
  • f_i: feature elements
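
The formula for the ratio did not survive in this transcript. Under the usual naive Bayes assumption that the feature elements are conditionally independent given the class, it would take the following form, where the class symbols CL_1 and CL_2 are notation assumed here:

```latex
\frac{P(CL_1 \mid F)}{P(CL_2 \mid F)}
  = \frac{P(CL_1)}{P(CL_2)} \prod_i \frac{P(f_i \mid CL_1)}{P(f_i \mid CL_2)}
```

Each factor in the product is the weight of one feature element, estimated from its relative frequencies in the two classes of the training set.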

18
Application: lead discovery
  • If I have one active compound, will I find
    another one?
  • Database: MDL Drug Data Report (MDDR)
  • 957 ligands selected from MDDR
  • 49 5HT3 receptor antagonists,
  • 40 angiotensin converting enzyme inhibitors (ACE),
  • 111 HMG-CoA reductase inhibitors (HMG),
  • 134 PAF antagonists and
  • 49 thromboxane A2 antagonists (TXA2)
  • 574 inactives
  • Briem and Lessel, Perspect. Drug Discov. Des.,
    2000, 20, 245-264.
  • Calculated: hit rate among the ten nearest
    neighbours of each molecule
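
A rough sketch of this evaluation, assuming that each molecule is a set of features, that similarity is measured with the Tanimoto coefficient (named on the comparison slide), and that a "hit" is a neighbour from the same activity class as the query. The function names and the five-molecule data set are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def hit_rate(query_index, fingerprints, labels, k=10):
    """Fraction of the k nearest neighbours in the query's activity class."""
    query = fingerprints[query_index]
    others = [i for i in range(len(fingerprints)) if i != query_index]
    # Most similar first; ties keep their original order (stable sort).
    others.sort(key=lambda i: -tanimoto(query, fingerprints[i]))
    nearest = others[:k]
    hits = sum(1 for i in nearest if labels[i] == labels[query_index])
    return hits / len(nearest)


def mean_hit_rate(fingerprints, labels, activity_class, k=10):
    """Average hit rate over all queries from one activity class."""
    queries = [i for i, label in enumerate(labels) if label == activity_class]
    return sum(hit_rate(q, fingerprints, labels, k) for q in queries) / len(queries)


# Tiny hypothetical data set; the real study uses 957 MDDR ligands and k=10.
fingerprints = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
    {"x", "y"},
    {"x", "z"},
]
labels = ["ACE", "ACE", "ACE", "inactive", "inactive"]
print(mean_hit_rate(fingerprints, labels, "ACE", k=2))
```

Each molecule serves as the query once and is excluded from its own neighbour list, so the hit rate is not inflated by the query finding itself.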

19
Comparison: Single Queries
Using Tanimoto Coefficient
Using Bayesian
  • Grey bars: Briem and Lessel, Perspect. Drug Discov.
    Des., 2000, 20, 245-264.
  • Black: Bender, A., et al., J. Chem. Inf. Comput.
    Sci., 2004, 44, 1708-1718.

20
Combining Information of 5 Actives
Bender, A., et al., J. Chem. Inf. Comput. Sci.,
2004, 44, 170-178.
21
TXA2, Graph-based Descriptors
[Figure: query structure and retrieved structures 1-7]
Very little diversity in heterocyclic systems:
no patents, no money, no good!
22
3D Environment around a surface point: solvent-
accessible surface using local surface properties
  • Central point (layer 0)
  • Points in layer 1
  • Points in layer 2
  • Etc.

Bender, A., et al., J. Med. Chem., 2004, 47,
6569-6583; IEEE SMC 2004 Proceedings
23
Overall Performance: Comparable to 2D methods

Bender, A., et al., J. Chem. Inf. Comput. Sci.,
2004, 44, 170-178.
24
TXA2, 7 Hits among Top 10 by MOLPRINT 3D
[Figure: query structure and hits 1-7]
25
Which features are selected for classification?
  • Even if your classifier works, do the selected
    features make sense?
  • Set of active vs. inactive molecules
  • Information gain is calculated for each feature;
    those which are much more frequent among actives
    are suspicious and might constitute the
    pharmacophore
  • Look at features from HMG-CoA Reductase Inhibitors

26
Selected Features - HMG
  • Binding site (HMG): rigid lipophilic ring

27
HMG-15
28
Disadvantages
  • Multiple probes had to be employed to cover
    putative interactions sufficiently
  • Force fields neglect polarization /
    back-polarization effects (that is, they calculate
    charges which are not seen in solution)
  • Force fields (usually) employ point charges, thus
    they don't capture directionality of some
    interactions such as hydrogen bonds
  • -> Use a more sophisticated QM method!

29
COSMO: Calculation of screening charges in an ideal
conductor

30
Why COSMO-RS Properties?
  • Interactions derived from first principles on a
    single scale
  • Gives directionality of H-acceptor lobes (unlike
    most force fields; exceptions are e.g. the XED
    force field by Andy Vinter / Cresset)
  • Employs solvent model, polarization /
    back-polarization
  • Classification in agreement with chemical
    intuition (e.g. O of ester, but not O- is
    H-bond acceptor)
  • Inaccessible atoms not used (no accessible
    surface)
  • Secondary effects captured which are not
    accounted for by atom-typing

31
COSMO σ-Profile
32
A HMG-CoA Reductase Inhibitor
  • Statin binding to HMG-CoA reductase involves
    charge interactions of a carboxylic acid group
    and hydrogen bond donor/acceptor functions to the
    pyruvate binding site
  • In addition, a large lipophilic group of the ligand
    is required, which binds to a floppy lipophilic
    pocket of the target protein.
  • Features can be well distinguished from σ
    screening charges
  • Carboxylate is shown to the right (purple),
    hydrogen bond acceptor functions beneath side
    chain (red)
  • Hydrogen bond donor functions point towards
    viewer (blue) while the lipophilic bulk of the
    structure is given in green

33
Encoding as 3-Point Pharmacophores
  • Average σ-values are calculated for each heavy atom
  • Atom typing (pos, neg, lipo, acceptor, donor)
    according to the heavy-atom average
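
One possible way to turn typed heavy atoms into 3-point pharmacophore keys is sketched below. The slide gives no numbers, so everything quantitative is an assumption made for illustration: the σ thresholds (`hb`, `ion`), the distance bin width, and the canonical key format are hypothetical. The sign convention follows COSMO, where a positive screening charge density sits over a negatively polarized part of the molecule (acceptor-like) and a negative one over a positively polarized part (donor-like).

```python
from collections import Counter
from itertools import combinations, permutations
from math import dist


def sigma_type(sigma, hb=0.01, ion=0.02):
    """Map an average screening charge density to one of five types.

    Both thresholds are hypothetical values chosen for this sketch.
    """
    if sigma <= -ion:
        return "pos"
    if sigma <= -hb:
        return "donor"
    if sigma < hb:
        return "lipo"
    if sigma < ion:
        return "acceptor"
    return "neg"


def bin_distance(d, width=2.0):
    """Coarse-grain a distance into integer bins of `width` (assumed units)."""
    return int(d // width)


def three_point_pharmacophores(atoms, width=2.0):
    """Count canonical (type, type, type, d12, d23, d13) keys.

    atoms: list of (xyz, average_sigma) pairs for the heavy atoms.
    """
    typed = [(xyz, sigma_type(sigma)) for xyz, sigma in atoms]
    keys = Counter()
    for triplet in combinations(typed, 3):
        candidates = []
        for a, b, c in permutations(triplet):
            candidates.append((
                a[1], b[1], c[1],
                bin_distance(dist(a[0], b[0]), width),
                bin_distance(dist(b[0], c[0]), width),
                bin_distance(dist(a[0], c[0]), width),
            ))
        # The smallest tuple over all orderings makes the key
        # independent of the order in which the atoms are listed.
        keys[min(candidates)] += 1
    return keys


# Three hypothetical heavy atoms: coordinates and average sigma values.
atoms = [
    ((0.0, 0.0, 0.0), 0.025),
    ((3.0, 0.0, 0.0), 0.0),
    ((0.0, 4.0, 0.0), -0.012),
]
print(three_point_pharmacophores(atoms))
```

Two molecules can then be compared through their key counts, for example by treating the keys as features in the same way as the 2D atom environments.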

Courtesy of Martin Stahl (Roche)
34
Comparison to other methods
35
Scaffolds Found
36
Summary
  • 2D method: finds lots of active molecules, but
    they are similar to what is known already
  • (Bender, A., et al., J. Chem. Inf. Comput. Sci.,
    2004, 44, 170-178; Bender, A., et al., J. Chem.
    Inf. Comput. Sci., 2004, 44, 1708-1718.)
  • 3D method: finds fewer active compounds, but
    enables discovery of new chemotypes
  • (Bender, A., et al., J. Med. Chem., 2004, 47,
    6569-6583.)
  • Features shown to correlate with binding patterns
  • Similarity searching using screening charges
    derived from first principles shows good
    performance and possesses a sound theoretical
    basis

37
Acknowledgements
  • Robert C Glen (Unilever Centre, Cambridge, UK)
  • Hamse Y. Mussa (Unilever Centre, Cambridge, UK)
  • Stephan Reiling (Aventis, Bridgewater, USA)
  • Software
  • GRID, CACTVS, gOpenMol and many, many others
  • Funding
  • Bill Gates, Unilever, Tripos