De Novo design tools for the generation of synthetically accessible ligands

1
De Novo design tools for the generation of
synthetically accessible ligands
Peter Johnson, Krisztina Boda, Shane Weaver,
Aniko Valko, Vilmos Valko
2
Receptor Structure Based Drug Design
Objective

To suggest potential leads that
bind strongly to a given protein because of shape
and electrostatic complementarity
Are easy to synthesise

Approaches

Docking methods (preferably flexible docking)
identify new lead structures by rapidly
screening a database of 3-D structures of known
compounds
De novo design methods (such as SPROUT)
construct a diverse set of entirely novel
potential leads from scratch

3
SPROUT Components
4
Problem with Large Answer Sets
De novo design programs such as SPROUT can
suggest large sets of entirely novel potential
leads
Powerful heuristics are necessary to evaluate
(and reduce) often large answer sets
5
For de novo design, prediction of synthetic
accessibility is equally important
Hypothetical ligands, including those predicted
to bind very strongly, have no practical value
unless they can be readily synthesised.
Our Attempts to Provide Solutions

CAESA (estimates synthetic accessibility)
Complexity Analysis (estimates structural
complexity and drug-likeness)
SynSPROUT (avoids the problem by building
constraints into the structure generation process)

6
CAESA: Computer Assisted Estimation of Synthetic
Accessibility

Glenn Myatt
Jon Baber

7
Goals of CAESA Project

Clear need for automated method of ranking
hypothetical compounds according to perceived
ease of synthesis
Good synthetic chemists can do this job
themselves for a small number of compounds but are
unwilling to do it for hundreds or thousands of
compounds
CAESA attempts to do the same job but never gets
bored!

8
Estimation of Synthetic Accessibility Criteria
used by CAESA

CAESA scores the synthetic accessibility of
structures
using two main criteria
a) An estimate of structural complexity
stereocentres
complex topological features (fusions etc.)
functional group complexity
b) Availability of good starting materials
rapid retrosynthetic analysis
database of commercially available materials
reaction rule base (editable)

9
CAESA Components
10
Automatic Selection of Starting Materials

Starting Materials and Synthetic Accessibility
Availability of suitable starting materials is a very
important factor: good starting materials can
dramatically reduce the difficulty of
synthesising a compound.
Having good starting materials for part of the target
molecule means that the analysis of synthetic
difficulty or structural complexity can be
directed to just those portions of the target
molecule that cannot be made from available
starting materials
Finding good starting materials through
retrosynthetic analysis also provides possible
synthetic routes as a byproduct

11
Traditional Retrosynthetic Analysis
12
Bidirectional Search for Synthetic Routes
13
Example of Starting Material Selection
14
Summary of CAESA Features

CAESA carries out a retrosynthetic analysis which
terminates when a starting material from a
database (such as ACD) is found
Found starting materials are scored according to
length and difficulty of reaction sequence and
coverage of target compound
All chemistry rules and transformations are
described in editable text knowledge bases easily
modified by chemists
Quality of the analysis depends on the chemistry
included in the knowledge bases and the
comprehensiveness of the starting material
libraries
But CAESA is relatively slow, and speedier methods
are needed for pruning large data sets
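A minimal sketch of a retrosynthetic search of this kind follows. It assumes a hypothetical toy knowledge base in which each compound maps to alternative disconnections (reaction name, difficulty, precursors), and the search stops when every precursor is found in a small starting-material set standing in for a database such as ACD. Routes are scored only by summed reaction difficulty, so longer routes cost more. Coverage of the target, which CAESA also scores, is not modelled, and all names and numbers are illustrative.

```python
# Hypothetical toy knowledge base: compound -> alternative disconnections,
# each given as (reaction name, difficulty, precursors).
DISCONNECTIONS = {
    "amide_target": [
        ("amide formation", 1.0, ["acid_A", "amine_B"]),
        ("acylation", 2.0, ["acid_chloride_A", "amine_B"]),
    ],
    "acid_A": [("ester hydrolysis", 0.5, ["ester_A"])],
}

# Hypothetical starting-material catalogue (stand-in for a database such as ACD).
STARTING_MATERIALS = {"ester_A", "amine_B", "acid_chloride_A"}

MAX_DEPTH = 4  # illustrative limit on route length


def best_route(compound, depth=0):
    """Return (cost, steps) of the cheapest route ending in starting materials, or None."""
    if compound in STARTING_MATERIALS:
        return 0.0, []  # the search terminates at a catalogue compound
    if depth >= MAX_DEPTH:
        return None
    best = None
    for reaction, difficulty, precursors in DISCONNECTIONS.get(compound, []):
        cost, steps = difficulty, [(compound, reaction)]
        for precursor in precursors:
            sub = best_route(precursor, depth + 1)
            if sub is None:
                break  # one unreachable precursor invalidates this disconnection
            cost += sub[0]
            steps += sub[1]
        else:
            if best is None or cost < best[0]:
                best = (cost, steps)
    return best


cost, steps = best_route("amide_target")
print(cost, steps)
```

The `for ... else` clause accepts a disconnection only when every precursor resolved to starting materials; here the amide route via ester hydrolysis (total difficulty 1.5) beats the acylation route (2.0).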

15
Alternative Approach: Complexity Analysis
Based on statistical distribution of various
substitution patterns found in databases of
existing drugs and available starting
materials.
Molecular Complexity Analysis of de Novo Designed Ligands,
Krisztina Boda and A. Peter Johnson,
J. Med. Chem. 2006, ASAP Web Release Date 26-Jan-2006
16
Assumption
If a molecular structure contains ring and chain
substitution patterns which are common in
existing drugs, then the structure is likely to be
drug-like; if it contains patterns which are common in readily
available starting materials, then the structure
is likely to be readily synthesisable
Complexity analysis is based on the statistical
distribution of various substitution patterns
17
Building Complexity Database
Enumerate chain patterns (1-centred, 2-centred, 3-centred, 4-centred) → Database of chains
Enumerate ring/ring substitution patterns → Database of rings/ring substitutions
18
Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in
hierarchies
The hierarchy stores

Atom type sequence
Number of occurrences
Binding properties

[Figure: example substitution hierarchy; total occurrences of the topology: 11,801; further occurrence counts shown: 3591, 1586, 494, 688, 537, 62]
19
Ligand Complexity Analysis
1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern (canonical names A, B, C in the example)
3. Match canonical names against the hierarchy
roots of the database
4. Retrieve frequency of occurrences → calculate score
5. Rank structures by complexity score
Speed of Complexity Analysis: 1000-1200
structures/minute on a Linux PC (3 GHz)
20
Calculation of Complexity Score
CONCEPT
Penalise atom patterns which
are infrequent or not present in the complexity
database.
Penalty values can be altered to tailor the
system for different applications.
In SPROUT the complexity analysis is followed by
ranking the putative ligands according to their
evaluated complexity score.
The penalty values used in the examples presented
here are 25, 20, 15 and 10 for 1-, 2-, 3- and
4-centred chain patterns, respectively, and 40 and 30 for rings and
ring substitutions.
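The scoring concept can be sketched as follows, assuming each ligand has already been broken into chain, ring and ring-substitution patterns with canonical names. The penalty values are those quoted on these slides, including the 2.0 stereocentre penalty from the CAESA comparison. The frequency cut-off `MIN_OCCURRENCES`, the canonical-naming rule and the toy database are hypothetical, and the published method may combine frequencies and penalties differently. Lower scores mean simpler structures.

```python
# Penalty values quoted on the slides.
CHAIN_PENALTY = {1: 25, 2: 20, 3: 15, 4: 10}  # keyed by number of centres
RING_PENALTY = 40
RING_SUBSTITUTION_PENALTY = 30
STEREO_PENALTY = 2.0  # per stereocentre, as in the CAESA comparison

MIN_OCCURRENCES = 5  # hypothetical cut-off below which a pattern counts as infrequent


def canonical_name(centre, neighbours):
    """Hypothetical canonical name: centre atom type plus sorted neighbour types."""
    return f"{centre}({','.join(sorted(neighbours))})"


def pattern_penalty(kind, n_centres=0):
    if kind == "chain":
        return CHAIN_PENALTY[n_centres]
    if kind == "ring":
        return RING_PENALTY
    return RING_SUBSTITUTION_PENALTY  # kind == "ring_substitution"


def complexity_score(patterns, database, n_stereocentres=0):
    """patterns: (kind, n_centres, canonical name) tuples; database: name -> occurrences."""
    score = STEREO_PENALTY * n_stereocentres
    for kind, n_centres, name in patterns:
        if database.get(name, 0) < MIN_OCCURRENCES:  # rare or absent pattern
            score += pattern_penalty(kind, n_centres)
    return score


# Toy occurrence counts standing in for the complexity database.
database = {canonical_name("C", ["O", "N", "C"]): 120, "N(C,C)": 3}
ligands = {
    "ligand_1": [("chain", 1, "C(C,N,O)")],
    "ligand_2": [("chain", 1, "C(C,N,O)"), ("chain", 1, "N(C,C)"),
                 ("ring", 0, "ring:unseen")],
}
ranked = sorted(ligands, key=lambda name: complexity_score(ligands[name], database))
print(ranked)
```

Here `ligand_2` collects 25 for its rare 1-centred chain pattern and 40 for an unseen ring, so it ranks after `ligand_1`, whose only pattern is frequent.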
21
Validation Experiment: Comparison with CAESA

Both methods were used to estimate synthetic
accessibility for the same set of 50 top-selling
drugs

22
CAESA vs. Complexity Analysis
Complexity scores are calculated using the
complexity database derived from available starting materials (SMs),
with a penalty of 2.0 for each identified stereocentre in
the structures.
Elapsed time: CAESA 703 s; Complexity Analysis
8 s
23
Complexity Analysis vs CAESA

More suitable for prioritization of thousands of
structures within a reasonable time frame.
Provides acceptable compromise between the speed
of the analysis and the accuracy of calculated
scores.
Because this approach is based on characteristics
of existing readily available compounds, simple
but novel structural features may be wrongly
identified as complex

24
Yet another alternative approach: build synthetic
feasibility into the structure generation process

25
SynSPROUT Approach
Ease of synthesis is a key factor in drug development:
build synthetic constraints into the structure generation process
Classic SPROUT: generic joins (fuse, spiro, new bond)
SynSPROUT: built-in / user-defined reactions (amide formation,
ether formation, ester formation, amine alkylation,
reductive amination, etc.)
SynSPROUT Scheme: virtual synthesis in receptor cavity
Synthetic Knowledge Base: reliable high-yielding reactions
Fragment Library: pool of readily available starting materials
Output: readily synthesisable putative ligand structures
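Stripped of 3D construction and scoring, the combinatorial core of virtual synthesis can be sketched as pairing starting materials whose perceived functional groups match a reaction in the knowledge base. The reaction table, fragment names and functional-group labels below are hypothetical. SynSPROUT itself builds and scores products inside the receptor cavity, which this sketch does not attempt.

```python
from itertools import product

# Hypothetical synthetic knowledge base: reaction -> (group on fragment 1, group on fragment 2).
REACTIONS = {
    "amide formation": ("carboxylic acid", "primary amine"),
    "ester formation": ("carboxylic acid", "alcohol"),
    "reductive amination": ("aldehyde", "primary amine"),
}

# Hypothetical fragment library: available starting material -> perceived functional groups.
FRAGMENTS = {
    "benzoic acid": {"carboxylic acid"},
    "benzylamine": {"primary amine"},
    "ethanol": {"alcohol"},
    "benzaldehyde": {"aldehyde"},
}


def virtual_products(fragments, reactions):
    """Yield (reaction, fragment 1, fragment 2) for every compatible pairing."""
    for (reaction, (group1, group2)), (frag1, groups1), (frag2, groups2) in product(
        reactions.items(), fragments.items(), fragments.items()
    ):
        if frag1 != frag2 and group1 in groups1 and group2 in groups2:
            yield reaction, frag1, frag2


for item in virtual_products(FRAGMENTS, REACTIONS):
    print(item)
```

Because every reaction is tried against every ordered pair of fragments, the work grows as (number of reactions) × (library size)², which illustrates why combining a large starting-material library with a large reaction knowledge base becomes expensive.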
26
Current Status

Promising structures with estimated high binding
affinity
SynSPROUT provides the equivalent of screening a
large number of combinatorial libraries
Potential for suggesting starting points for new
combinatorial libraries
Combination of a large starting material
library with a large reaction knowledgebase
causes a combinatorial problem even with
parallel processing
Restricting either size of library or number of
synthetic reactions gives acceptable run times

27
De Novo Structure Generation vs. Lead Optimization
De Novo Structure Generation
AIM: To generate diverse putative ligands from scratch
No structural information from any existing bound ligand is utilised
Lead Optimization
AIM: To suggest better ligands structurally similar to the bound one
The structure of a good bound ligand provides a starting point (core)

28
Variations on the SynSPROUT Theme: SPROUT LeadOpt

Two modes for structure based lead optimisation
Core Extension: extends core structure (derived
from lead) by virtual synthetic chemistry
Monomer Replacement: replaces monomers which
have been identified by retrosynthetic analysis
of a lead compound

29
Core Extension

Import the modified bound ligand (core) and
identify substitution points (functional groups)
Generate core-monomer products by performing
virtual synthetic reaction(s) at selected
functional groups
Estimate binding affinity for products

30
Core Extension Scheme
Monomer Library (multiple low energy conformers, detected
functional groups)
Core Structure
Synthetic Knowledge Base (list of reactions between functional groups)
General Scheme: all possible core-monomer combinations are
generated, and each synthetic reaction is simulated in the 3D context
of the receptor site
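Generating all possible core-monomer combinations for a core with several substitution points amounts to a Cartesian product over the compatible monomers at each point. The sketch assumes hypothetical pairing rules and toy monomer names; it enumerates combinations only and leaves out the 3D reaction simulation in the receptor site.

```python
from itertools import product

# Hypothetical pairing rules: functional group on the core -> group a monomer needs.
PARTNER_GROUP = {
    "carboxylic acid": "primary amine",  # amide formation
    "primary amine": "carboxylic acid",  # amide formation, roles reversed
}


def compatible_monomers(core_group, monomers):
    """Monomers carrying a group that can react with the given core group."""
    wanted = PARTNER_GROUP.get(core_group)
    return [name for name, groups in monomers.items() if wanted in groups]


def core_monomer_combinations(substitution_points, monomers):
    """One monomer choice per substitution point; every combination is enumerated."""
    choices = [compatible_monomers(group, monomers) for group in substitution_points]
    return list(product(*choices))


monomers = {
    "aniline": {"primary amine"},
    "glycine": {"primary amine", "carboxylic acid"},
    "acetic acid": {"carboxylic acid"},
}
combos = core_monomer_combinations(["carboxylic acid", "primary amine"], monomers)
print(len(combos), combos)
```

The number of combinations is the product of the per-point counts (2 × 2 = 4 here), which is why cores with multiple extension points quickly become a combinatorial problem.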
31
Automatic Monomer Library Generation
Input: SDF file of 3D monomers
Perception Knowledge Base: atom and ring perception, aromaticity,
normalisation, hybridisation, H-bonding properties
Synthetic Knowledge Base: synthetic rules used to detect functional
groups (joining points)
Output: Monomer Library (multiple low energy conformers, detected
functional groups)
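Detection of functional groups (joining points) can be sketched on a toy molecular graph. The `Mol` class and the two hand-written patterns below are hypothetical; a real perception knowledge base would first normalise structures and perceive rings, aromaticity and hybridisation, none of which is done here.

```python
from dataclasses import dataclass, field


@dataclass
class Mol:
    """Toy heavy-atom graph: element symbols, implicit H counts, bonds {(i, j): order}."""
    elements: list
    hydrogens: list
    bonds: dict = field(default_factory=dict)

    def neighbours(self, i):
        for (a, b), order in self.bonds.items():
            if i in (a, b):
                yield (b if a == i else a), order


def is_carbonyl_carbon(mol, c):
    return any(mol.elements[n] == "O" and order == 2 for n, order in mol.neighbours(c))


def find_carboxylic_acids(mol):
    """(carbon, carbonyl O, hydroxyl O) triples."""
    hits = []
    for c, element in enumerate(mol.elements):
        if element != "C":
            continue
        nbrs = list(mol.neighbours(c))
        carbonyl = [n for n, o in nbrs if mol.elements[n] == "O" and o == 2]
        hydroxyl = [n for n, o in nbrs
                    if mol.elements[n] == "O" and o == 1 and mol.hydrogens[n] == 1]
        if carbonyl and hydroxyl:
            hits.append((c, carbonyl[0], hydroxyl[0]))
    return hits


def find_primary_amines(mol):
    """(carbon, nitrogen) pairs for NH2 groups on a non-carbonyl carbon."""
    hits = []
    for n, element in enumerate(mol.elements):
        if element != "N" or mol.hydrogens[n] != 2:
            continue
        nbrs = list(mol.neighbours(n))
        if len(nbrs) == 1 and mol.elements[nbrs[0][0]] == "C" and nbrs[0][1] == 1:
            if not is_carbonyl_carbon(mol, nbrs[0][0]):  # exclude primary amides
                hits.append((nbrs[0][0], n))
    return hits


# Glycine, H2N-CH2-COOH: atoms N0, C1, C2, O3 (=O), O4 (-OH).
glycine = Mol(["N", "C", "C", "O", "O"], [2, 2, 0, 0, 1],
              {(0, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1})
print(find_carboxylic_acids(glycine), find_primary_amines(glycine))
```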

32
Synthetic Knowledge Base
CHEMICAL-LABEL <Carboxylic Acid> CSPCENTRE2(O)-OHS1
CHEMICAL-LABEL <Primary Amine> C-NHS2
CONNECTION1
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
  delete-atom 3
  change-hybridization 5 to SP2
  form-bond - between 1 and 5
  DIHEDRAL-ATOMS 2 1 5 4
  DIHEDRAL 0 0
  BOND-LENGTH 1.35
END-THEN
Steps of Joining Rules

Steps of formation
Hybridization changes
Bond type
Bond length
Dihedral penalty/angle
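One way to read the amide-formation rule above is as a short list of graph edits plus geometric constraints. The sketch assumes an interpretation of the pattern numbering that the slide does not spell out (1 acid carbon, 2 carbonyl O, 3 hydroxyl O, 4 amine carbon, 5 amine N). The data layout and `apply_rule` are hypothetical, and hydrogens and 3D coordinates are not handled.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    element: str
    hybridization: str


# Hypothetical encoding of the amide-formation joining rule.
AMIDE_RULE = {
    "delete": [3],                     # hydroxyl oxygen of the acid
    "hybridize": {5: "SP2"},           # amine nitrogen becomes planar
    "form_bond": (1, 5, 1),            # acid C to amine N, single bond
    "dihedral": ((2, 1, 5, 4), 0.0),   # O=C-N-C dihedral, degrees
    "bond_length": 1.35,               # angstroms
}


def apply_rule(atoms, bonds, mapping, rule):
    """atoms: {index: Atom}; bonds: {(i, j): order}; mapping: rule atom number -> index.

    Returns the edited atoms and bonds plus the constraints for 3D construction.
    """
    atoms = {i: Atom(a.element, a.hybridization) for i, a in atoms.items()}  # copy
    deleted = {mapping[n] for n in rule["delete"]}
    bonds = {pair: order for pair, order in bonds.items() if not deleted & set(pair)}
    for i in deleted:
        del atoms[i]
    for n, hybridization in rule["hybridize"].items():
        atoms[mapping[n]].hybridization = hybridization
    a, b, order = rule["form_bond"]
    new_bond = tuple(sorted((mapping[a], mapping[b])))
    bonds[new_bond] = order
    torsion, angle = rule["dihedral"]
    constraints = {
        "bond_length": (new_bond, rule["bond_length"]),
        "dihedral": (tuple(mapping[n] for n in torsion), angle),
    }
    return atoms, bonds, constraints


# Acetic acid (atoms 0-3) and methylamine (atoms 4-5) placed in one index space.
atoms = {0: Atom("C", "SP3"), 1: Atom("C", "SP2"), 2: Atom("O", "SP2"),
         3: Atom("O", "SP3"), 4: Atom("C", "SP3"), 5: Atom("N", "SP3")}
bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (4, 5): 1}
mapping = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
new_atoms, new_bonds, constraints = apply_rule(atoms, bonds, mapping, AMIDE_RULE)
print(sorted(new_bonds), constraints)
```

The input dictionaries are copied rather than edited in place, so the same monomer records can be reused for other reaction partners.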

33
Importing the Core Structure (from a MOL/PDB file
in the Elephant module)
When importing from a PDB file, a PDB-to-MOL converter is
invoked
Functional group(s) are automatically detected
when the core structure is imported into the
system
Hydrogen donor/acceptor or spheric target sites
anchor the imported core structure inside the
receptor cavity, partially restricting the
displacement of the core during lead
optimization, but allowing slight movements in
order to avoid boundary violations.

34
Product Generation I.
Step I.
Generate products by mimicking synthetic
reactions between core and monomers
35
Product Generation II.
Step II.
Ligand flexibility: generate multiple low energy
conformers
Rigid body docking

Secondary conformers generated by twisting about
rotatable bonds of the low energy monomer
conformers
User defined parameters
Max deviation
Sampling of dihedral angles
Max penalty

Primary monomer conformers generated by (a)
CORINA ROTATE (b) sampling discrete dihedral
angles around formed bonds
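The user-defined parameters above (sampling step, maximum deviation, maximum penalty) suggest a simple discrete scan around each formed bond. In the sketch below, the harmonic penalty, its force constant `k`, and the rule of rejecting combinations whose summed penalty exceeds the limit are assumptions made for illustration, not documented SPROUT behaviour.

```python
from itertools import product


def sample_dihedral(preferred, step, max_deviation, max_penalty, k=0.01):
    """(angle, penalty) pairs around a preferred dihedral, all in degrees.

    The harmonic penalty k * deviation**2 is a hypothetical stand-in.
    """
    samples = []
    n_steps = int(max_deviation // step)
    for i in range(-n_steps, n_steps + 1):
        deviation = i * step
        penalty = k * deviation ** 2
        if penalty <= max_penalty:
            angle = (preferred + deviation + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
            samples.append((angle, penalty))
    return samples


def sample_conformers(preferred_angles, step, max_deviation, max_penalty):
    """Combine per-bond samples; reject combinations whose total penalty is too high."""
    per_bond = [sample_dihedral(p, step, max_deviation, max_penalty) for p in preferred_angles]
    conformers = []
    for combo in product(*per_bond):
        total = sum(penalty for _, penalty in combo)
        if total <= max_penalty:
            conformers.append(tuple(angle for angle, _ in combo))
    return conformers


print(sample_dihedral(0.0, 15.0, 30.0, 4.0))
print(len(sample_conformers([0.0, 180.0], 15.0, 30.0, 4.0)))
```

With these settings each bond keeps three angles (±30° is rejected on its own penalty), and the four combinations that deviate on both bonds are rejected on their summed penalty, leaving five conformers.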
36
Product Generation III.
Step III.

Docking: rejection of conformers with
high internal energy
boundary violations
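The two rejection criteria amount to a filter over candidate conformers. This sketch assumes internal energies have already been computed and models the receptor cavity as a sphere, a deliberate simplification since a real cavity boundary is far more irregular; the threshold values are illustrative.

```python
import math


def within_cavity(coords, centre, radius):
    """Hypothetical spherical cavity: every atom must lie within `radius` of `centre`."""
    return all(math.dist(p, centre) <= radius for p in coords)


def filter_conformers(conformers, max_energy, centre, radius):
    """conformers: (name, internal energy, atom coordinates). Keep those passing both tests."""
    return [
        name
        for name, energy, coords in conformers
        if energy <= max_energy and within_cavity(coords, centre, radius)
    ]


conformers = [
    ("conf_a", 3.0, [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]),
    ("conf_b", 12.0, [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]),  # internal energy too high
    ("conf_c", 2.0, [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)]),   # boundary violation
]
print(filter_conformers(conformers, max_energy=10.0, centre=(0.0, 0.0, 0.0), radius=5.0))
```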

37
Multiple Extension Points: Combinatorial Problem

Clients-Master-Slaves architecture
Mixed SGI/Linux cluster network (TCP/IP socket
network communication)

[Figure: Client1, Client2 and Client3 connected to a Master, with Linux and SGI slave nodes]

Each slave performs optimization on a different
core-monomer combination
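The division of labour can be sketched with a worker pool standing in for the slaves. Python threads inside one process are used here purely to show the pattern; the original system distributes jobs over TCP/IP sockets across SGI and Linux machines, and a CPU-bound Python version would more likely use `ProcessPoolExecutor` or a socket-based scheduler. The `optimise` function and its score are placeholders, and the sort assumes that more negative scores are better, as with the SPROUT scores quoted in the case studies.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def optimise(combination):
    """Hypothetical slave job: build, dock and score one core-monomer combination.

    The score is a placeholder; a real job would return a binding-affinity estimate.
    """
    core, monomer = combination
    return combination, -float(len(core) + len(monomer))


def master(cores, monomers, n_slaves=3):
    """Hand each combination to the next free worker and collect ranked results."""
    jobs = list(product(cores, monomers))
    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        results = list(pool.map(optimise, jobs))
    return sorted(results, key=lambda result: result[1])  # most negative score first


print(master(["core"], ["ab", "abcd"]))
```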
38
Case Study (CDK2)
39
Case Study (CDK2)
40
Case Study (CDK2)
41
Case Study (CDK2)
42
Case Study (Generated Products)
[Figure: generated products with scores -7.95, -7.47, -7.82, -7.56, -7.75, -7.45, -7.60, -7.07]
43
Monomer Replacement

Many lead compounds are composed of readily
available starting materials (monomers) linked by
reliable high yielding reactions
Retrosynthetic analysis can be used to identify
the monomers
Structurally related analogues could be generated
by exhaustive monomer replacement
Considerable efficiency gains are possible if the monomer library
is arranged in a hierarchy based on substructural
relationships
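One plausible source of that efficiency gain is pruning: if monomers are arranged so that each child is a superstructure of its parent, a parent that cannot be accommodated in the site lets its whole subtree be skipped. The sketch assumes that monotonic relationship, which will not hold for every scoring criterion; the tree, the monomer names and the rejection test are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MonomerNode:
    """A monomer whose children are superstructures (more substituted analogues)."""
    name: str
    children: list = field(default_factory=list)


def hierarchical_search(node, fits, hits, visited):
    """Depth-first search that skips the subtree of any monomer that fails `fits`.

    Assumes that if a parent does not fit the site, none of its superstructures will.
    """
    visited.append(node.name)
    if not fits(node.name):
        return
    hits.append(node.name)
    for child in node.children:
        hierarchical_search(child, fits, hits, visited)


root = MonomerNode("aniline", [
    MonomerNode("4-chloroaniline", [MonomerNode("3,4-dichloroaniline")]),
    MonomerNode("4-methylaniline", [MonomerNode("4-isopropylaniline")]),
])
rejected = {"4-methylaniline"}  # hypothetical: clashes with the receptor
hits, visited = [], []
hierarchical_search(root, lambda name: name not in rejected, hits, visited)
print(hits)
print(len(visited), "of 5 monomers evaluated")
```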

44
Hierarchy Construction
Amide
45
Hierarchy Usage
Amide
46
Monomer Replacement
Do they exist in starting materials HIERARCHY?
Retro-synthetic analysis
47
CASE STUDY: Optimisation of SPROUT-designed
inhibitors of P. falciparum dihydroorotate
dehydrogenase (PfDHODH) using Monomer Replacement
Initial lead compound: MD-155, SPROUT score -7.88
Retrosynthetic analysis finds amide formation and Ullmann/Suzuki reactions for joining the monomers
Monomer library: aryl halides and p-haloanilines (1923 2D structures, 26,916 conformations)
48
High scoring monomer replacement results: Monomer
replacement gave 840 new structures (including
multiple conformers of the same structure)
Scores 7.50 to 9.30.
49
Experimental Results for Some Ligands Suggested
by SPROUT LeadOpt Monomer Replacement
Starting point: MD-155: PfDHODH Ki 3.0 µM; HsDHODH Ki 11.0 nM
MD-204: PfDHODH Ki 733 nM; HsDHODH Ki 21.0 nM (4-fold enhancement in Ki for PfDHODH)
MD-213: PfDHODH Ki 478 nM; HsDHODH Ki 21.7 nM (6-fold enhancement in Ki for PfDHODH)
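The fold enhancements follow from the ratio of Ki values. The short check below reads the MD-155 PfDHODH value as 3.0 µM (3000 nM), the only reading under which the stated 4-fold and 6-fold improvements hold; the other Ki values are copied from the slide.

```python
# Ki values in nM, copied from the slide (MD-155 PfDHODH read as 3.0 µM = 3000 nM).
KI_NM = {
    "MD-155": {"PfDHODH": 3000.0, "HsDHODH": 11.0},
    "MD-204": {"PfDHODH": 733.0, "HsDHODH": 21.0},
    "MD-213": {"PfDHODH": 478.0, "HsDHODH": 21.7},
}


def fold_enhancement(start, new, target="PfDHODH"):
    """Ki(start) / Ki(new): values above 1 mean the new ligand binds more tightly."""
    return KI_NM[start][target] / KI_NM[new][target]


for ligand in ("MD-204", "MD-213"):
    print(ligand, round(fold_enhancement("MD-155", ligand), 1))
```

This prints 4.1 for MD-204 and 6.3 for MD-213, which the slide rounds to 4-fold and 6-fold.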
50
Conclusions

Scoring functions for assessment of binding
affinity of the hypothetical compounds produced
by de novo design are far from perfect
Hence only readily synthesisable putative ligands
will undergo experimental evaluation by medicinal
chemists
Assessment of synthetic feasibility is a
tractable problem

51
Acknowledgements

Matt Davies, Phil Bone and Timo Heikkala for
experimental work
Molecular Networks GmbH for providing CORINA
ROTATE
MDL for providing MDDR, one of the databases
used in the complexity analysis project
for sponsoring the lead optimization
project
