Title: De Novo design tools for the generation of synthetically accessible ligands
Peter Johnson, Krisztina Boda, Shane Weaver,
Aniko Valko, Vilmos Valko
2Receptor Structure Based Drug Design
Objective
- To suggest potential leads that
- bind strongly to a given protein because of shape
and electrostatic complementarity
- are easy to synthesise
Approaches
- Docking methods (preferably flexible docking)
identify new lead structures by rapidly
screening a database of 3-D structures of known
compounds
- De novo design methods (such as SPROUT)
construct a diverse set of entirely novel
potential leads from scratch
3SPROUT Components
4Problem with Large Answer Sets
De novo design programs such as SPROUT can
suggest large sets of entirely novel potential
leads
Powerful heuristics are necessary to evaluate
(and reduce) often large answer sets
5For de novo design, prediction of synthetic
accessibility is equally important
Hypothetical ligands, including those predicted
to bind very strongly, have no practical value
unless they can be readily synthesised.
Our Attempts to Provide Solutions
- CAESA (estimates synthetic accessibility)
- Complexity Analysis (estimates structural
complexity and drug-likeness)
- SynSPROUT (avoids the problem by building
constraints into the structure generation process)
6CAESA: Computer Assisted Estimation of Synthetic
Accessibility
7Goals of CAESA Project
- Clear need for automated method of ranking
hypothetical compounds according to perceived
ease of synthesis
- Good synthetic chemists can do this job
themselves on a small number of compounds but are
unwilling to do it for hundreds or thousands of
compounds
- CAESA attempts to do the same job but never gets
bored!
8Estimation of Synthetic Accessibility: Criteria
used by CAESA
- CAESA scores the synthetic accessibility of
structures using two main criteria
- a) An estimate of structural complexity
- stereocentres
- complex topological features (fusions etc.)
- functional group complexity
- b) Availability of good starting materials
- rapid retrosynthetic analysis
- database of commercially available materials
- reaction rule base (editable)
9CAESA Components
10Automatic Selection of Starting Materials
- Starting Materials and Synthetic Accessibility
- Availability of suitable starting materials is a very
important factor: good starting materials can
dramatically reduce the difficulty of
synthesising a compound.
- Good starting materials for part of the target
molecule mean the analysis of structural
synthetic difficulty or complexity can be
directed to just those portions of the target
molecule that cannot be made from available
starting materials
- Finding good starting materials through
retrosynthetic analysis also provides possible
synthetic routes as a byproduct
11Traditional Retrosynthetic Analysis
12Bidirectional Search for Synthetic Routes
13Example of Starting Material Selection
14Summary of CAESA Features
- CAESA carries out a retrosynthetic analysis which
terminates when a starting material from a
database (such as ACD) is found
- Found starting materials are scored according to
length and difficulty of reaction sequence and
coverage of target compound
- All chemistry rules and transformations are
described in editable text knowledge bases easily
modified by chemists
- Quality of the analysis depends on the chemistry
included in the knowledge bases and the
comprehensiveness of the starting material
libraries
- But CAESA is relatively slow and speedier methods
are needed for pruning of large data sets
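The termination rule in the summary above (stop as soon as a compound is found in the starting material database, then score by route length and difficulty) can be illustrated with a toy search. This is only a sketch under stated assumptions: `TRANSFORMS`, `CATALOGUE`, `find_route` and all compound names are hypothetical, a plain breadth-first search stands in for whatever search CAESA actually uses, and coverage of the target is not modelled.

```python
from collections import deque

# Hypothetical toy knowledge base: product -> list of (precursors, difficulty).
TRANSFORMS = {
    "amide_AB": [(("acid_A", "amine_B"), 1)],
    "acid_A": [(("ester_A",), 1)],
    "amine_B": [(("nitro_B",), 2)],
}
# Hypothetical stand-in for a catalogue of purchasable compounds.
CATALOGUE = {"ester_A", "amine_B", "nitro_B"}


def find_route(target, max_depth=4):
    """Breadth-first retrosynthetic search over the toy knowledge base.

    A branch stops as soon as every open compound is in CATALOGUE.
    Returns (steps, total_difficulty, starting_materials) or None.
    """
    queue = deque([((target,), 0, 0)])
    while queue:
        open_compounds, steps, difficulty = queue.popleft()
        pending = [c for c in open_compounds if c not in CATALOGUE]
        if not pending:
            return steps, difficulty, sorted(open_compounds)
        if steps >= max_depth:
            continue
        first = pending[0]
        rest = tuple(c for c in open_compounds if c != first)
        for precursors, cost in TRANSFORMS.get(first, []):
            queue.append((rest + precursors, steps + 1, difficulty + cost))
    return None


route = find_route("amide_AB")
```

Because the queue is processed in order of step count, the first route returned is also the shortest one in this toy setting; a real scoring scheme would weigh length, difficulty and coverage together.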
15Alternative Approach: Complexity Analysis
Based on statistical distribution of various
substitution patterns found in databases of
existing drugs and available starting
materials.
Krisztina Boda and A. Peter Johnson, "Molecular Complexity
Analysis of de Novo Designed Ligands", J. Med. Chem. 2006,
ASAP Web Release Date 26-Jan-2006
16Assumption
If a molecular structure contains ring and chain
substitution patterns which are common in
existing drugs, then the structure is likely to be
drug-like as well as readily synthesisable.
If the patterns are common in
available starting materials, then the structure
is likely to be readily synthesisable.
Complexity analysis based on statistical
distribution of various substitution patterns
17Building Complexity Database
Enumerate chain patterns
Enumerate ring/ring substitution patterns
Database of chains
Database of rings/ring substitutions
18Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in
hierarchies
The hierarchy stores
- Atom type sequence
- Number of occurrences
- Binding properties
Total occurrences of the topology: 11,801
[Figure: example hierarchy; occurrence counts shown at its nodes: 3591, 1586, 494, 688, 537, 62]
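One way to picture such a hierarchy is as a prefix tree in which each node holds an atom type, an occurrence count and some binding properties. The sketch below is illustrative only: `PatternNode`, `add_pattern`, `lookup` and the atom-type strings are hypothetical names, and the assumption that a parent's count includes all of its more substituted descendants is not confirmed by the slides.

```python
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """One node of a hypothetical substitution hierarchy."""
    atom_type: str
    occurrences: int = 0
    binding_properties: set = field(default_factory=set)
    children: dict = field(default_factory=dict)


def add_pattern(root, atom_types, binding=()):
    """Register one observed pattern; every prefix node is counted too."""
    root.occurrences += 1
    node = root
    for atom_type in atom_types:
        node = node.children.setdefault(atom_type, PatternNode(atom_type))
        node.occurrences += 1
    node.binding_properties.update(binding)


def lookup(root, atom_types):
    """Return the occurrence count of a pattern, or 0 if it is absent."""
    node = root
    for atom_type in atom_types:
        node = node.children.get(atom_type)
        if node is None:
            return 0
    return node.occurrences


root = PatternNode("topology")
add_pattern(root, ["C.ar", "N.ar"], binding={"acceptor"})
add_pattern(root, ["C.ar", "N.ar", "O.3"])
add_pattern(root, ["C.ar", "C.ar"])
```

With this layout, looking up a pattern costs one dictionary access per atom type, and an absent pattern is detected at the first missing branch.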
19Ligand Complexity Analysis
1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern
(e.g. canonical name A, canonical name B, canonical name C)
3. Match canonical name against the hierarchy
roots of the database
4. Retrieve frequency of occurrences and
calculate score
5. Rank structures by complexity score
Speed of Complexity Analysis: 1000-1200
structures / minute on Linux PC (3 GHz)
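The slides do not say how canonical names are generated. A common way to make a name independent of where the pattern is read from is to take the smallest of all equivalent readings, as in the minimal sketch below. The functions `canonical_chain` and `canonical_ring` and the dash-separated naming are assumptions for illustration, not the scheme used in SPROUT.

```python
def canonical_chain(atoms):
    """A chain read from either end is the same pattern: keep the smaller reading."""
    forward = tuple(atoms)
    backward = forward[::-1]
    return "-".join(min(forward, backward))


def canonical_ring(atoms):
    """A ring has no start atom or direction: keep the smallest rotation
    over both reading directions."""
    atoms = tuple(atoms)
    candidates = []
    for seq in (atoms, atoms[::-1]):
        for i in range(len(seq)):
            candidates.append(seq[i:] + seq[:i])
    return "-".join(min(candidates))
```

For example, `canonical_chain(["O", "C", "N"])` and `canonical_chain(["N", "C", "O"])` give the same name, so both readings hit the same database entry.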
20Calculation of Complexity Score
CONCEPT
Penalise atom patterns which
are infrequent or not present in the complexity
database.
Penalty values can be altered to tailor the
system for different applications.
In SPROUT the complexity analysis is followed by
ranking the putative ligands according to their
evaluated complexity score.
The penalty values used in the examples presented
here are 25, 20, 15 and 10 for 1-, 2-, 3- and
4-centred chain patterns, and 40 and 30 for rings and
ring substitutions.
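A minimal sketch of how such penalties might be summed and used for ranking is given below. The penalty values are those quoted above; everything else is an assumption: the `min_count` cut-off for "infrequent", the flat (all-or-nothing) penalty, the pattern names and the function names are hypothetical, and the published method may scale penalties differently.

```python
# Penalty values quoted in the text above.
PENALTIES = {
    "chain1": 25, "chain2": 20, "chain3": 15, "chain4": 10,
    "ring": 40, "ring_substitution": 30,
}


def complexity_score(patterns, frequency, min_count=5):
    """Sum penalties over patterns that are rare in, or missing from, the database.

    patterns  : list of (kind, canonical_name) tuples for one ligand
    frequency : dict mapping canonical_name -> occurrences in the database
    min_count : hypothetical cut-off below which a pattern counts as infrequent
    """
    score = 0
    for kind, name in patterns:
        if frequency.get(name, 0) < min_count:
            score += PENALTIES[kind]
    return score


def rank_by_complexity(ligands, frequency):
    """Return ligand names ordered from least to most complex."""
    return sorted(ligands, key=lambda n: complexity_score(ligands[n], frequency))


frequency = {"ring:benzene": 3591, "chain2:C-N": 688, "chain1:O": 2}
ligands = {
    "lig_a": [("ring", "ring:benzene"), ("chain2", "chain2:C-N")],
    "lig_b": [("ring", "ring:unseen"), ("chain1", "chain1:O")],
}
ranking = rank_by_complexity(ligands, frequency)
```

Changing the entries of `PENALTIES` is the kind of tailoring the text refers to: a stricter application could raise the ring penalty, for instance.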
21Validation Experiment: Comparison with CAESA
- Both methods used to estimate synthetic
accessibility for the same set of 50 top selling
drugs
22CAESA vs. Complexity Analysis
Complexity scores are calculated using the
complexity database derived from available starting materials (SMs).
A penalty of 2.0 is applied for each identified stereo centre in
the structures.
Elapsed time: CAESA 703 sec, Complexity Analysis
8 sec
23Complexity Analysis vs CAESA
- More suitable for prioritization of thousands of
structures within a reasonable time frame.
- Provides acceptable compromise between the speed
of the analysis and the accuracy of calculated
scores.
- Because this approach is based on characteristics
of existing readily available compounds, simple
but novel structural features may be wrongly
identified as complex
24Yet another alternative approach: Build synthetic
feasibility into the structure generation process
25SynSPROUT Approach
Classic SPROUT
SynSPROUT
Ease of synthesis is a key factor
in drug development
Build synthetic constraints
into structure generation process
Joining operations: fuse, spiro, new bond
Built-in / user-defined reactions: amide
formation, ether formation, ester formation, amine
alkylation, reductive amination etc.
SynSPROUT Scheme
VIRTUAL SYNTHESIS IN RECEPTOR CAVITY
Fragment Library: pool of readily available starting materials
Synthetic Knowledge Base: reliable high yielding reactions
Readily synthesisable putative ligand structures
26Current Status
- Promising structures with estimated high binding
affinity
- SynSPROUT provides the equivalent to screening a
large number of combinatorial libraries
- Potential for suggesting starting points for new
combinatorial libraries
- Combination of a large starting material
library with a large reaction knowledge base
causes a combinatorial problem even with
parallel processing
- Restricting either size of library or number of
synthetic reactions gives acceptable run times
27De Novo Structure Generation vs. Lead Optimization
De Novo Structure Generation
AIM: To generate diverse putative
ligands from scratch
No structural information from any existing bound
ligand is utilised
Lead Optimization
AIM: To suggest better ligands
structurally similar to the bound one
The structure of a good bound ligand provides a
starting point (core)
28Variations on the SynSPROUT Theme: SPROUT LeadOpt
- Two modes for structure based lead optimisation
- Core Extension: extends core structure (derived
from lead) by virtual synthetic chemistry
- Monomer Replacement: replaces monomers which
have been identified by retrosynthetic analysis
of a lead compound
29Core Extension
- Import the modified bound ligand (core) and
identify substitution points (functional groups)
- Generate core-monomer products by performing
virtual synthetic reaction(s) at selected
functional groups
- Estimate binding affinity for products
30Core Extension Scheme
General Scheme
Core Structure
Monomer Library: multiple low energy conformers, detected
functional groups
Synthetic Knowledge Base: list of reactions (between functional groups)
All possible core-monomer combinations are
generated
Simulate synthetic reaction in the 3D context
of receptor site
31Automatic Monomer Library Generation
SDF file of 3D monomers
Perception Knowledge Base
Synthetic Knowledge Base
Atom Ring Perception
- Aromaticity
- Normalisation
- Hybridisation
- H-bonding properties
Functional Groups
Detect Functional Groups (joining points)
Synthetic rules
Monomer Library
Multiple low energy conformers detected
functional groups
32Synthetic Knowledge Base
CHEMICAL-LABEL <Carboxylic Acid> CSPCENTRE2(O) -OHS1
CHEMICAL-LABEL <Primary Amine> C-NHS2CONNECTION1
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine THEN
    delete-atom 3
    change-hybridization 5 to SP2
    form-bond - between 1 and 5
    DIHEDRAL-ATOMS 2 1 5 4
    DIHEDRAL 0 0
    BOND-LENGTH 1.35
END-THEN
Steps of Joining Rules
- Steps of formation
- Hybridization changes
- Bond type
- Bond length
- Dihedral penalty/angle
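To make the steps of a joining rule concrete, the amide rule can be mimicked on a toy molecular graph. This is a sketch under assumptions, not the SPROUT rule interpreter: the dictionary layout, `apply_rule` and the atom numbering (1 = carbonyl C, 2 = carbonyl O, 3 = hydroxyl O, 4 = amine C, 5 = amine N) are a hypothetical reading of the rule text, and dihedral handling is left out.

```python
# Hypothetical in-memory form of the amide formation rule.
AMIDE_FORMATION = {
    "explanation": "Amide Formation",
    "delete_atoms": [3],             # delete-atom 3
    "hybridization": {5: "SP2"},     # change-hybridization 5 to SP2
    "form_bond": (1, 5),             # form-bond between 1 and 5
    "bond_length": 1.35,             # BOND-LENGTH 1.35
}


def apply_rule(atoms, bonds, rule):
    """Apply a joining rule to a toy molecular graph.

    atoms : dict of atom id -> {"element": str, "hybridization": str}
    bonds : set of frozenset({id_a, id_b})
    Returns new (atoms, bonds, new_bond_length); the inputs are not modified.
    """
    removed = set(rule["delete_atoms"])
    new_atoms = {i: dict(a) for i, a in atoms.items() if i not in removed}
    new_bonds = {b for b in bonds if not (b & removed)}
    for atom_id, hybridization in rule["hybridization"].items():
        new_atoms[atom_id]["hybridization"] = hybridization
    new_bonds.add(frozenset(rule["form_bond"]))
    return new_atoms, new_bonds, rule["bond_length"]


atoms = {
    1: {"element": "C", "hybridization": "SP2"},
    2: {"element": "O", "hybridization": "SP2"},
    3: {"element": "O", "hybridization": "SP3"},
    4: {"element": "C", "hybridization": "SP3"},
    5: {"element": "N", "hybridization": "SP3"},
}
bonds = {frozenset({1, 2}), frozenset({1, 3}), frozenset({4, 5})}
product_atoms, product_bonds, length = apply_rule(atoms, bonds, AMIDE_FORMATION)
```

Keeping the rule as data rather than code mirrors the idea of an editable knowledge base: adding a reaction means adding an entry, not changing the interpreter.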
33Importing the Core Structure (from MOL/PDB file
in Elephant module)
Importing from a PDB file: a PDB-to-MOL converter is
invoked
Functional group(s) are automatically detected
when the core structure is imported into the
system
Hydrogen donor/acceptor or spherical target sites
anchor the imported core structure inside the
receptor cavity, partially restricting the
displacement of the core during lead
optimization, but allowing slight movements in
order to avoid boundary violations.
34Product Generation I.
Step I.
Generate products by mimicking synthetic
reactions between core monomers
35Product Generation II.
Step II.
Ligand flexibility: generate multiple low energy
conformers
Rigid body docking
- Primary monomer conformers generated by (a)
CORINA ROTATE (b) sampling discrete dihedral
angles around formed bonds
- Secondary conformers generated by twisting about
rotatable bonds of the low energy monomer
conformers
- User defined parameters
- Max deviation
- Sampling of dihedral angles
- Max penalty
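Sampling discrete dihedral angles with a penalty cut-off can be sketched as a filtered Cartesian product. The code below is an illustration under assumptions: `sample_dihedrals`, its defaults and the toy `eclipsed_penalty` function are hypothetical, the "max deviation" parameter is not modelled, and no 3D coordinates are generated.

```python
from itertools import product


def sample_dihedrals(n_bonds, step=120, penalty=None, max_penalty=10.0):
    """Enumerate dihedral-angle combinations for n_bonds rotatable bonds.

    step        : sampling interval in degrees (hypothetical default)
    penalty     : function angle -> penalty, a hypothetical torsional cost
    max_penalty : combinations whose summed penalty exceeds this are dropped
    """
    if penalty is None:
        penalty = lambda angle: 0.0
    angles = range(0, 360, step)
    kept = []
    for combo in product(angles, repeat=n_bonds):
        if sum(penalty(a) for a in combo) <= max_penalty:
            kept.append(combo)
    return kept


def eclipsed_penalty(angle):
    """Toy cost: penalise angles that are multiples of 120 degrees."""
    return 6.0 if angle % 120 == 0 else 0.0


all_combos = sample_dihedrals(2, step=120)
filtered = sample_dihedrals(2, step=60, penalty=eclipsed_penalty, max_penalty=6.0)
```

The number of combinations grows as the number of sampled angles raised to the number of rotatable bonds, which is why a finer sampling interval or a looser penalty limit quickly becomes expensive.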
36Product Generation III.
Step III.
- Docking: rejection of conformers with
- High internal energy
- Boundary violation
37Multiple Extension Points: Combinatorial Problem
- Clients-Master-Slaves architecture
- Mixed SGI/Linux cluster network (TCP/IP socket
network communication)
[Figure labels: Linux, SGI, Client1, Client2, Client3, Master]
Each slave performs optimization on different
core monomer combination
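The division of labour described above, where a master hands each core-monomer combination to a separate worker, can be sketched on a single machine. This is an analogy only: threads from Python's standard `concurrent.futures` stand in for slave processes communicating over TCP/IP sockets, and `optimise`, `run_master` and the monomer labels are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise(combination):
    """Hypothetical stand-in for optimising one core-monomer combination.

    Returns (combination, score); the score here is just a toy number.
    """
    return combination, -float(sum(len(m) for m in combination))


def run_master(monomers_per_site, n_workers=3):
    """Enumerate one monomer per extension point and farm the jobs out."""
    combinations = list(product(*monomers_per_site))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(optimise, combinations))
    return sorted(results, key=lambda r: r[1])


sites = [["Me", "Et", "Ph"], ["NH2", "OH"]]
results = run_master(sites)
```

The job count is the product of the monomer counts at each extension point (3 x 2 = 6 here), which is the combinatorial growth the slide title refers to.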
38Case Study (CDK2)
39Case Study (CDK2)
40Case Study (CDK2)
41Case Study (CDK2)
42Case Study (Generated Products)
Values shown for the generated products: -7.95, -7.47, -7.82, -7.56, -7.75, -7.45, -7.60, -7.07
43Monomer Replacement
- Many lead compounds are composed of readily
available starting materials (monomers) linked by
reliable high yielding reactions
- Retrosynthetic analysis can be used to identify
the monomers
- Structurally related analogues could be generated
by exhaustive monomer replacement
- Considerable efficiency gains if monomer library
is arranged in a hierarchy based on substructural
relationships
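One plausible source of the efficiency gain is pruning: if a parent monomer fails a test, its more substituted children need not be tested at all. The sketch below assumes that this is how the hierarchy is used, which the slides do not confirm; `search`, the `fits` test and the monomer names are hypothetical.

```python
def search(node, fits):
    """Depth-first walk over a monomer hierarchy with pruning.

    node : dict with "name" and "children"; each child is assumed to
           contain its parent as a substructure
    fits : hypothetical test, name -> bool (e.g. a steric check)
    Returns (accepted_names, tested_names).
    """
    tested = [node["name"]]
    if not fits(node["name"]):
        return [], tested  # prune: the children are never tested
    accepted = [node["name"]]
    for child in node["children"]:
        child_accepted, child_tested = search(child, fits)
        accepted += child_accepted
        tested += child_tested
    return accepted, tested


hierarchy = {
    "name": "aniline",
    "children": [
        {"name": "4-chloroaniline", "children": [
            {"name": "4-chloro-2-methylaniline", "children": []},
        ]},
        {"name": "4-bromoaniline", "children": []},
    ],
}
accepted, tested = search(hierarchy, fits=lambda name: "chloro" not in name)
```

In this toy run only three of the four monomers are tested, because the child of the rejected 4-chloroaniline is skipped.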
44Hierarchy Construction
Amide
45Hierarchy Usage
Amide
46Monomer Replacement
Do they exist in starting materials HIERARCHY?
Retro-synthetic analysis
47CASE STUDY: Optimisation of SPROUT designed
inhibitors of P. falciparum Dihydro-orotate
Dehydrogenase using Monomer Replacement
Initial lead compound: MD-155, SPROUT score -7.88
Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reaction for monomer formation
Monomer library: aryl halides and p-halo-anilines
2D structures: 1923
Conformations: 26916
48High scoring monomer replacement results
Monomer replacement gave 840 new structures (including
multiple conformers of the same structure)
Scores 7.50 to 9.30.
49Experimental Results for Some Ligands Suggested
by SPROUT LeadOpt Monomer Replacement
Starting Point: MD-155, PfDHODH Ki 3.0 µM, HsDHODH Ki 11.0 nM
MD-204: PfDHODH Ki 733 nM, HsDHODH Ki 21.0 nM (4 fold enhancement in Ki for PfDHODH)
MD-213: PfDHODH Ki 478 nM, HsDHODH Ki 21.7 nM (6 fold enhancement in Ki for PfDHODH)
50Conclusions
- Scoring functions for assessment of binding
affinity of the hypothetical compounds produced
by de novo design are far from perfect
- Hence only readily synthesisable putative ligands
will undergo experimental evaluation by medicinal
chemists
- Assessment of synthetic feasibility is a
tractable problem
51Acknowledgements
- Matt Davies, Phil Bone and Timo Heikkala for
experimental work
- Molecular Networks GmbH for providing CORINA
ROTATE
- MDL for providing MDDR, one of the databases
used in the complexity analysis project
- for sponsoring the lead optimization
project