De Novo design tools for the generation of synthetically accessible ligands - PowerPoint PPT Presentation

About This Presentation
Description:

De Novo design tools for the generation of synthetically accessible ligands Peter Johnson, Krisztina Boda, Shane Weaver, Aniko Valko, Vilmos Valko – PowerPoint PPT presentation

Slides: 52
Provided by: Kriszti

Transcript and Presenter's Notes


1
De Novo design tools for the generation of
synthetically accessible ligands
Peter Johnson, Krisztina Boda, Shane Weaver,
Aniko Valko, Vilmos Valko
2
Receptor Structure Based Drug Design
Objective
  • To suggest potential leads that
      • bind strongly to a given protein because of shape
        and electrostatic complementarity
      • are easy to synthesise

Approaches
  • Docking methods (preferably flexible docking)
    identify new lead structures by rapidly
    screening a database of 3-D structures of known
    compounds
  • De novo design methods (such as SPROUT)
    construct a diverse set of entirely novel
    potential leads from scratch

3
SPROUT Components
4
Problem with Large Answer Sets
De novo design programs such as SPROUT can
suggest large sets of entirely novel potential
leads
Powerful heuristics are necessary to evaluate
(and reduce) often large answer sets
5
For de novo design, prediction of synthetic
accessibility is equally important
Hypothetical ligands, including those predicted
to bind very strongly, have no practical value
unless they can be readily synthesised.
Our Attempts to Provide Solutions
  • CAESA (estimates synthetic accessibility)
  • Complexity Analysis (estimates structural
    complexity and drug-likeness)
  • SynSPROUT (avoids the problem by building
    constraints into the structure generation process)

6
CAESA: Computer Assisted Estimation of Synthetic
Accessibility
  • Glenn Myatt
  • Jon Baber

7
Goals of CAESA Project
  • There is a clear need for an automated method of
    ranking hypothetical compounds according to
    perceived ease of synthesis
  • Good synthetic chemists can do this job
    themselves for a small number of compounds but are
    unwilling to do it for hundreds or thousands of
    compounds
  • CAESA attempts to do the same job but never gets
    bored!

8
Estimation of Synthetic Accessibility: Criteria
used by CAESA
  • CAESA scores the synthetic accessibility of
    structures using two main criteria:
  • a) An estimate of structural complexity
      • stereocentres
      • complex topological features (fusions etc.)
      • functional group complexity
  • b) Availability of good starting materials
      • rapid retrosynthetic analysis
      • database of commercially available materials
      • reaction rule base (editable)

9
CAESA Components
10
Automatic Selection of Starting Materials
  • Starting Materials and Synthetic Accessibility
  • The availability of suitable starting materials is a
    very important factor: good starting materials can
    dramatically reduce the difficulty of
    synthesising a compound.
  • When good starting materials exist for part of the
    target molecule, the analysis of structural
    complexity or synthetic difficulty can be
    directed to just those portions of the target
    molecule that cannot be made from available
    starting materials
  • Finding good starting materials through
    retrosynthetic analysis also provides possible
    synthetic routes as a by-product

11
Traditional Retrosynthetic Analysis
12
Bidirectional Search for Synthetic Routes
13
Example of Starting Material Selection
14
Summary of CAESA Features
  • CAESA carries out a retrosynthetic analysis which
    terminates when a starting material from a
    database (such as ACD) is found
  • Found starting materials are scored according to
    the length and difficulty of the reaction sequence
    and their coverage of the target compound
  • All chemistry rules and transformations are
    described in editable text knowledge bases that are
    easily modified by chemists
  • Quality of the analysis depends on the chemistry
    included in the knowledge bases and the
    comprehensiveness of the starting material
    libraries
  • But CAESA is relatively slow, and faster methods
    are needed for pruning large data sets
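The search-and-score loop summarised above can be sketched in Python. This is a toy and not CAESA's implementation. Molecules are plain string labels. The disconnection rules, difficulty values, heavy-atom counts and scoring weights (0.1 per step, 0.05 per unit of difficulty) are all hypothetical. Coverage is approximated by the heavy atoms that starting-material leaves contribute.

```python
# Toy retrosynthetic search in the spirit of the CAESA summary (hypothetical data).
# A branch stops when it reaches a starting material, has no applicable rule,
# or runs out of depth; unresolved leaves simply contribute no coverage.


def expand(mol, rules, starting_materials, depth):
    """Return every disconnection tree for `mol` as (steps, difficulty, leaves)."""
    if mol in starting_materials or depth == 0 or mol not in rules:
        return [((), 0.0, (mol,))]
    trees = []
    for reaction, difficulty, precursors in rules[mol]:
        partial = [((reaction,), difficulty, ())]
        for precursor in precursors:
            sub_trees = expand(precursor, rules, starting_materials, depth - 1)
            # Combine every partial route with every way of making this precursor
            partial = [
                (s1 + s2, d1 + d2, l1 + l2)
                for s1, d1, l1 in partial
                for s2, d2, l2 in sub_trees
            ]
        trees.extend(partial)
    return trees


def coverage(tree, target, heavy_atoms, starting_materials):
    """Fraction of the target's heavy atoms supplied by starting materials."""
    _, _, leaves = tree
    covered = sum(heavy_atoms[m] for m in leaves if m in starting_materials)
    return min(1.0, covered / heavy_atoms[target])


def route_score(tree, target, heavy_atoms, starting_materials):
    """Reward coverage, penalise long and difficult reaction sequences."""
    steps, difficulty, _ = tree
    return (coverage(tree, target, heavy_atoms, starting_materials)
            - 0.1 * len(steps) - 0.05 * difficulty)


# Hypothetical knowledge base: product -> [(reaction, difficulty, precursors)]
RULES = {
    "T": [("amide formation", 1.0, ("A", "B")),
          ("ring construction", 3.0, ("C",))],
    "B": [("Suzuki coupling", 2.0, ("B1", "B2"))],
}
STARTING_MATERIALS = {"A", "B1", "B2"}   # stands in for a catalogue such as ACD
HEAVY_ATOMS = {"T": 20, "A": 8, "B": 12, "B1": 6, "B2": 6, "C": 20}

trees = expand("T", RULES, STARTING_MATERIALS, depth=3)
best = max(trees, key=lambda t: route_score(t, "T", HEAVY_ATOMS, STARTING_MATERIALS))
print(best[0], route_score(best, "T", HEAVY_ATOMS, STARTING_MATERIALS))
```

The amide route wins because three starting materials cover the whole target. Branches that end at an unavailable intermediate are kept with reduced coverage rather than discarded. The uncovered part could then be passed to a complexity estimate.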

15
Alternative Approach: Complexity Analysis
Based on the statistical distribution of various
substitution patterns found in databases of
existing drugs and available starting
materials. Molecular Complexity Analysis of de
Novo Designed Ligands. Krisztina Boda and A. Peter
Johnson, J. Med. Chem. 2006, ASAP Web Release
Date 26-Jan-2006
16
Assumption
If a molecular structure contains ring and chain
substitution patterns which are common in
existing drugs, then the structure is likely to be
drug-like; if those patterns are common in
available starting materials, then the structure
is likely to be readily synthesisable
Complexity analysis based on statistical
distribution of various substitution patterns
17
Building Complexity Database
Enumerate chain patterns
Enumerate ring/ring substitution patterns
  • 1-centred
  • 2-centred
  • 3-centred
  • 4-centred

Database of chains
Database of rings/ring substitutions
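Here is a minimal Python sketch of how occurrence counts for the database could be accumulated. It covers only 1-centred patterns. We read a 1-centred pattern as a centre atom plus the sorted element types of its neighbours. That reading, the heavy-atom-only graphs and the two-molecule corpus are our assumptions, and ring perception is omitted.

```python
from collections import Counter, defaultdict

# Heavy-atom graphs (hypothetical corpus): atoms maps index -> element, bonds are index pairs.
ETHANOL = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
DIMETHYL_ETHER = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])


def neighbour_lists(bonds):
    """Adjacency lists built from an undirected bond list."""
    nbrs = defaultdict(list)
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def one_centred_patterns(atoms, bonds):
    """Yield (centre element, sorted neighbour elements) for every atom.
    Sorting makes the pattern independent of atom numbering."""
    nbrs = neighbour_lists(bonds)
    for index, element in atoms.items():
        yield element, tuple(sorted(atoms[j] for j in nbrs[index]))


def build_pattern_database(corpus):
    """Count how often each pattern occurs across a corpus (drugs or starting materials)."""
    counts = Counter()
    for atoms, bonds in corpus:
        counts.update(one_centred_patterns(atoms, bonds))
    return counts


db = build_pattern_database([ETHANOL, DIMETHYL_ETHER])
for pattern, n in sorted(db.items()):
    print(pattern, n)
```

Running the same function over a drug database and over a starting-material catalogue would give two separate sets of counts for the scoring step to consult. Extending it to 2-, 3- and 4-centred patterns would mean growing the centre to connected groups of atoms.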
18
Atom Substitution Hierarchy
Ring (and chain) substitutions are organised in
hierarchies
The hierarchy stores
  • Atom type sequence
  • Number of occurrences
  • Binding properties

[Figure: example substitution hierarchy; total occurrences of the
topology 11,801; node counts shown: 3591, 1586, 494, 688, 537, 62]
19
Ligand Complexity Analysis
1. Enumerate ring and chain patterns
2. Generate canonical names for each atom pattern
   (canonical names A, B and C)
3. Match canonical names against the hierarchy
   roots of the database
4. Retrieve frequency of occurrences and calculate score
5. Rank structures by complexity score
Speed of Complexity Analysis: 1000-1200
structures/minute on a Linux PC (3 GHz)
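Steps 2 to 4 can be illustrated with a toy two-level hierarchy in Python. For illustration only, we assume that a hierarchy root is the centre element plus its number of neighbours (the topology). We also assume that the children are fully specified canonical names. The hierarchy described on the slides also stores binding properties and may have more levels; the counts here are hypothetical.

```python
from collections import defaultdict


def canonical_name(centre, neighbours):
    """Step 2: an order-independent name, e.g. ('C', ['O', 'C']) -> 'C(C,O)'."""
    return f"{centre}({','.join(sorted(neighbours))})"


def build_hierarchy(pattern_counts):
    """Group pattern counts under a root keyed by (centre element, degree)."""
    hierarchy = defaultdict(lambda: {"total": 0, "children": {}})
    for (centre, neighbours), n in pattern_counts.items():
        node = hierarchy[(centre, len(neighbours))]
        node["total"] += n
        name = canonical_name(centre, neighbours)
        node["children"][name] = node["children"].get(name, 0) + n
    return dict(hierarchy)


def lookup(hierarchy, centre, neighbours):
    """Steps 3-4: match the root, then retrieve the frequency of the exact pattern.
    Returns (occurrences of the topology, occurrences of this pattern)."""
    root = hierarchy.get((centre, len(neighbours)))
    if root is None:
        return 0, 0
    return root["total"], root["children"].get(canonical_name(centre, neighbours), 0)


# Hypothetical occurrence counts
COUNTS = {("C", ("C", "O")): 30, ("C", ("C", "N")): 10, ("O", ("C",)): 25}
h = build_hierarchy(COUNTS)
print(lookup(h, "C", ["O", "C"]))   # -> (40, 30): common pattern
print(lookup(h, "C", ["S", "C"]))   # -> (40, 0): known topology, unseen pattern
print(lookup(h, "N", ["C"]))        # -> (0, 0): unknown topology
```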
20
Calculation of Complexity Score
CONCEPT
Penalise atom patterns which
are infrequent or not present in the complexity
database.
Penalty values can be altered to tailor the
system for different applications.
In SPROUT the complexity analysis is followed by
ranking the putative ligands according to their
evaluated complexity score.
The penalty values used in the examples presented
here are 25, 20, 15 and 10 for 1-, 2-, 3- and
4-centred chain patterns, and 40 and 30 for rings and
ring substitutions, respectively.
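The penalty scheme can be sketched in Python using the penalty values quoted above. It also applies the per-stereocentre penalty of 2.0 used in the CAESA comparison. The slides do not say what counts as "infrequent", so the cut-off `MIN_OCCURRENCES = 5` is a hypothetical choice. The pattern names and counts are also hypothetical. A lower score means a less complex ligand, which is presumed to be more readily synthesisable.

```python
# Penalty values from the slide; the dictionary keys and the cut-off are assumptions.
PENALTIES = {
    ("chain", 1): 25, ("chain", 2): 20, ("chain", 3): 15, ("chain", 4): 10,
    ("ring", None): 40, ("ring substitution", None): 30,
}
MIN_OCCURRENCES = 5  # hypothetical threshold for an "infrequent" pattern


def complexity_score(patterns, database, stereocentres=0, stereo_penalty=0.0):
    """Sum penalties for patterns that are rare or absent in the database.

    patterns: iterable of (kind, number_of_centres, canonical_name)
    database: canonical_name -> number of occurrences
    """
    score = 0.0
    for kind, centres, name in patterns:
        if database.get(name, 0) < MIN_OCCURRENCES:
            score += PENALTIES[(kind, centres)]
    return score + stereocentres * stereo_penalty


def rank_by_complexity(ligands, database):
    """Return ligand names ordered from least to most complex."""
    ordered = sorted(ligands.items(), key=lambda item: complexity_score(item[1], database))
    return [name for name, _ in ordered]


DATABASE = {"C(C,O)": 120, "N(C,C)": 3, "benzene": 900}   # hypothetical counts
LIGANDS = {
    "L1": [("chain", 1, "C(C,O)"), ("ring", None, "benzene")],
    "L2": [("chain", 1, "N(C,C)"), ("chain", 2, "C(C,O)-N(C,C)"),
           ("ring", None, "benzene")],
    "L3": [("ring", None, "azulene"), ("ring substitution", None, "azulene-1-CF3")],
}
print(rank_by_complexity(LIGANDS, DATABASE))   # -> ['L1', 'L2', 'L3']
print(complexity_score(LIGANDS["L2"], DATABASE, stereocentres=2, stereo_penalty=2.0))  # -> 49.0
```

Because the penalties are plain data, they can be retuned for a different application without touching the scoring logic, in line with the slide's remark that penalty values can be altered.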
21
Validation Experiment: Comparison with CAESA
  • Both methods used to estimate synthetic
    accessibility for the same set of 50 top selling
    drugs

22
CAESA vs. Complexity Analysis
Complexity scores are calculated using the
complexity database derived from available starting
materials (SMs), with a penalty of 2.0 for each
stereocentre identified in the structures.
Elapsed time: CAESA 703 s; Complexity Analysis
8 s
23
Complexity Analysis vs CAESA
  • More suitable for prioritization of thousands of
    structures within a reasonable time frame.
  • Provides an acceptable compromise between the speed
    of the analysis and the accuracy of the calculated
    scores.
  • Because this approach is based on characteristics
    of existing readily available compounds, simple
    but novel structural features may be wrongly
    identified as complex

24
Yet another alternative approach: build synthetic
feasibility into the structure generation process


25
SynSPROUT Approach
  • Classic SPROUT: connection types fuse, spiro and
    new bond
  • SynSPROUT: ease of synthesis is a key factor in
    drug development, so synthetic constraints are
    built into the structure generation process
  • Built-in / user-defined reactions: amide
    formation, ether formation, ester formation, amine
    alkylation, reductive amination, etc.
SynSPROUT Scheme: Virtual Synthesis in Receptor Cavity
  • Synthetic Knowledge Base: reliable, high-yielding
    reactions
  • Fragment Library: pool of readily available
    starting materials
  • Result: readily synthesisable putative ligand
    structures
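The virtual-synthesis idea can be illustrated by listing which pairs of starting materials a small reaction knowledge base would allow to be joined. This Python sketch deals only with connectivity. The fragments, their functional-group annotations and the reaction table are hypothetical. It also leaves out the 3D placement in the receptor cavity that SynSPROUT performs.

```python
from itertools import combinations

# Hypothetical fragment library: starting material -> functional groups present
FRAGMENTS = {
    "benzoic acid": {"carboxylic acid"},
    "aniline": {"primary amine"},
    "ethanol": {"alcohol"},
    "benzaldehyde": {"aldehyde"},
}
# Hypothetical synthetic knowledge base: (group, group) -> reaction name
REACTIONS = {
    ("carboxylic acid", "primary amine"): "amide formation",
    ("carboxylic acid", "alcohol"): "ester formation",
    ("aldehyde", "primary amine"): "reductive amination",
}


def virtual_products(fragments, reactions):
    """List (fragment A, fragment B, reaction) for every pair of distinct
    fragments that some reaction in the knowledge base can join."""
    products = []
    for (a, groups_a), (b, groups_b) in combinations(fragments.items(), 2):
        for x in groups_a:
            for y in groups_b:
                # Reactions are stored one way round; check both orders
                reaction = reactions.get((x, y)) or reactions.get((y, x))
                if reaction:
                    products.append((a, b, reaction))
    return products


for item in virtual_products(FRAGMENTS, REACTIONS):
    print(item)
```

Even one such step tests N(N-1)/2 fragment pairs for N starting materials. For N = 10,000 that is roughly 5 × 10^7 pairs, and each accepted pair must then be placed and scored in 3D. This illustrates the combinatorial problem noted under Current Status.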
26
Current Status
  • Promising structures with estimated high binding
    affinity
  • SynSPROUT provides the equivalent of screening a
    large number of combinatorial libraries
  • Potential for suggesting starting points for new
    combinatorial libraries
  • Combining a large starting material library
    with a large reaction knowledge base
    causes a combinatorial problem, even with
    parallel processing
  • Restricting either size of library or number of
    synthetic reactions gives acceptable run times

27
De Novo Structure Generation vs. Lead Optimization
De Novo Structure Generation
  • AIM: to generate diverse putative ligands from
    scratch
  • No structural information from any existing bound
    ligand is utilised
Lead Optimization
  • AIM: to suggest better ligands structurally
    similar to the bound one
  • The structure of a good bound ligand provides a
    starting point (core)

28
Variations on the SynSPROUT Theme: SPROUT LeadOpt
  • Two modes for structure-based lead optimisation:
      • Core Extension: extends the core structure
        (derived from the lead) by virtual synthetic
        chemistry
      • Monomer Replacement: replaces monomers which
        have been identified by retrosynthetic analysis
        of a lead compound

29
Core Extension
  • Import the modified bound ligand (core) and
    identify substitution points (functional groups)
  • Generate core-monomer products by performing
    virtual synthetic reaction(s) at selected
    functional groups
  • Estimate binding affinity for products

30
Core Extension Scheme (General Scheme)
  • Core Structure
  • Monomer Library: multiple low-energy conformers,
    detected functional groups
  • Synthetic Knowledge Base: list of reactions
    (between functional groups)
  • Simulate synthetic reaction in the 3D context
    of the receptor site
  • All possible core-monomer combinations are
    generated
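Generating all possible core-monomer combinations amounts to a Cartesian product over the core's extension points. In this Python sketch, each extension point is tagged with the functional group it carries and receives exactly one compatible monomer. The monomers, groups and reaction pairs are hypothetical. The conformer handling and 3D reaction simulation of the real scheme are omitted.

```python
from itertools import product

# Hypothetical reaction table: unordered pairs of functional groups that can be joined
REACTIONS = {frozenset({"carboxylic acid", "primary amine"}),
             frozenset({"aldehyde", "primary amine"})}

MONOMERS = {  # hypothetical monomer library: name -> functional groups
    "aniline": {"primary amine"},
    "benzylamine": {"primary amine"},
    "benzoic acid": {"carboxylic acid"},
    "benzaldehyde": {"aldehyde"},
}


def compatible(core_group, monomer_groups):
    """True if any group on the monomer can react with the core's group."""
    return any(frozenset({core_group, g}) in REACTIONS for g in monomer_groups)


def core_extensions(core_points, monomers):
    """core_points: extension point id -> functional group on the core.
    Returns one dict per combination, mapping each point to a monomer."""
    per_point = []
    for point, group in sorted(core_points.items()):
        choices = [m for m, groups in monomers.items() if compatible(group, groups)]
        per_point.append([(point, m) for m in choices])
    return [dict(combo) for combo in product(*per_point)]


CORE = {1: "carboxylic acid", 2: "primary amine"}   # hypothetical core
for combo in core_extensions(CORE, MONOMERS):
    print(combo)
```

The number of combinations is the product of the per-point choices, so it grows exponentially with the number of extension points. For example, three points with 500 compatible monomers each already give 500^3 = 1.25 × 10^8 products. This is the combinatorial problem that the Clients-Master-Slaves set-up described later is meant to spread over a cluster.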
31
Automatic Monomer Library Generation
  • Input: SDF file of 3D monomers
  • Perception Knowledge Base: atom and ring perception
      • aromaticity
      • normalisation
      • hybridisation
      • H-bonding properties
  • Synthetic Knowledge Base: functional groups and
    synthetic rules
      • detect functional groups (joining points)
  • Output: Monomer Library (multiple low-energy
    conformers, detected functional groups)

32
Synthetic Knowledge Base
CHEMICAL-LABEL <Carboxylic Acid> CSPCENTRE2(O)-OHS1
CHEMICAL-LABEL <Primary Amine> C-NHS2CONNECTION1
EXPLANATION Amide Formation
IF Carboxylic Acid INTER Primary Amine
THEN
  delete-atom 3
  change-hybridization 5 to SP2
  form-bond - between 1 and 5
  DIHEDRAL-ATOMS 2 1 5 4
  DIHEDRAL 0 0
  BOND-LENGTH 1.35
END-THEN
Steps of Joining Rules
  • Steps of formation
  • Hybridization changes
  • Bond type
  • Bond length
  • Dihedral penalty/angle
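To make the rule's steps concrete, here is a Python sketch that applies the amide-formation rule above to a toy atom graph. We read the rule's atom numbers as 1 = acid carbon, 2 = carbonyl oxygen, 3 = hydroxyl oxygen, 4 = amine carbon and 5 = amine nitrogen. We also read "DIHEDRAL 0 0" as a target angle followed by a penalty. Both readings are our assumptions. Hydrogens are ignored and no 3D coordinates are generated.

```python
import copy


def apply_amide_rule(atoms, bonds):
    """Apply the joining steps of the amide-formation rule to a copy of the input.

    atoms: atom id -> {"el": element, "hyb": hybridisation}
    bonds: frozenset({a, b}) -> (bond type, bond length in angstrom or None)
    Returns (atoms, bonds, dihedral constraint). Atom numbering is our reading of the rule.
    """
    atoms = copy.deepcopy(atoms)
    # delete-atom 3: remove the hydroxyl oxygen together with its bonds
    del atoms[3]
    bonds = {pair: value for pair, value in bonds.items() if 3 not in pair}
    # change-hybridization 5 to SP2: the amide nitrogen becomes planar
    atoms[5]["hyb"] = "SP2"
    # form-bond - between 1 and 5, with BOND-LENGTH 1.35
    bonds[frozenset({1, 5})] = ("-", 1.35)
    # DIHEDRAL-ATOMS 2 1 5 4 / DIHEDRAL 0 0 (assumed: angle, penalty) for later 3D placement
    dihedral = {"atoms": (2, 1, 5, 4), "angle": 0.0, "penalty": 0.0}
    return atoms, bonds, dihedral


REACTANTS = {
    1: {"el": "C", "hyb": "SP2"}, 2: {"el": "O", "hyb": "SP2"}, 3: {"el": "O", "hyb": "SP3"},
    4: {"el": "C", "hyb": "SP3"}, 5: {"el": "N", "hyb": "SP3"},
}
BONDS = {frozenset({1, 2}): ("=", None), frozenset({1, 3}): ("-", None),
         frozenset({4, 5}): ("-", None)}

product_atoms, product_bonds, torsion = apply_amide_rule(REACTANTS, BONDS)
print(sorted(product_atoms), product_atoms[5]["hyb"], product_bonds[frozenset({1, 5})])
# -> [1, 2, 4, 5] SP2 ('-', 1.35)
```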

33
Importing the Core Structure (from MOL/PDB file
in Elephant module)
When importing from a PDB file, a PDB-to-MOL converter is
invoked
Functional group(s) are automatically detected
when the core structure is imported into the
system
Hydrogen donor/acceptor or spheric target sites
anchor the imported core structure inside the
receptor cavity, partially restricting the
displacement of the core during lead
optimization, but allowing slight movements in
order to avoid boundary violations.

34
Product Generation I.
Step I.
Generate products by mimicking synthetic
reactions between the core and monomers
35
Product Generation II.
Step II.
Ligand flexibility: generate multiple low-energy
conformers
Rigid body docking
  • Secondary conformers generated by twisting about
    rotatable bonds of the low-energy monomer
    conformers
  • User-defined parameters:
      • max deviation
      • sampling of dihedral angles
      • max penalty

Primary monomer conformers generated by (a)
CORINA and ROTATE, (b) sampling discrete dihedral
angles around formed bonds
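Generating secondary conformers by twisting about rotatable bonds can be sketched as a grid over dihedral offsets. The parameter names `max_deviation` and `step` are our stand-ins for the user-defined "max deviation" and "sampling of dihedral angles" settings, and the default values are arbitrary. Energies and the max-penalty check are not modelled here.

```python
from itertools import product


def wrap(angle):
    """Map an angle in degrees onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def secondary_conformers(base_dihedrals, max_deviation=30.0, step=15.0):
    """Yield dihedral sets obtained by twisting every rotatable bond of a
    low-energy conformer by multiples of `step`, up to +/- max_deviation."""
    n = int(max_deviation // step)
    offsets = [k * step for k in range(-n, n + 1)]
    for deltas in product(offsets, repeat=len(base_dihedrals)):
        yield tuple(wrap(d + delta) for d, delta in zip(base_dihedrals, deltas))


conformers = list(secondary_conformers((60.0, 180.0)))
print(len(conformers))  # 5 offsets per bond, two bonds -> 25
```

The count grows as (2n + 1)^k for k rotatable bonds, where n = max_deviation // step. A coarse sampling step and a small maximum deviation therefore keep this stage tractable.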
36
Product Generation III.
Step III.
  • Docking: rejection of conformers with
      • high internal energy
      • boundary violation
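The rejection step is essentially a filter. In this Python sketch the receptor cavity is simplified to an axis-aligned box, and the caller supplies the internal energies. Both simplifications and the energy cut-off are assumptions made for illustration, since the slides do not say how boundaries or energies are computed.

```python
Point = tuple[float, float, float]


def inside(point: Point, lower: Point, upper: Point) -> bool:
    """True if the point lies within the axis-aligned box [lower, upper]."""
    return all(lo <= c <= hi for c, lo, hi in zip(point, lower, upper))


def filter_conformers(conformers, energies, lower, upper, max_energy):
    """Keep conformers with acceptable internal energy and no atom outside
    the (box-shaped, hypothetical) cavity."""
    kept = []
    for coords, energy in zip(conformers, energies):
        if energy > max_energy:                                # high internal energy
            continue
        if not all(inside(p, lower, upper) for p in coords):  # boundary violation
            continue
        kept.append(coords)
    return kept


CONFORMERS = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],   # fits, low energy
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)],   # one atom outside the cavity
    [(0.5, 0.5, 0.5)],                    # inside, but strained
]
ENERGIES = [2.0, 1.0, 12.0]               # hypothetical internal energies
kept = filter_conformers(CONFORMERS, ENERGIES, (-2.0, -2.0, -2.0), (2.0, 2.0, 2.0),
                         max_energy=10.0)
print(len(kept))  # -> 1
```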

37
Multiple Extension Points: Combinatorial Problem
  • Clients-Master-Slaves architecture
  • Mixed SGI/Linux cluster network (TCP/IP socket
    network communication)

[Diagram: clients (Client1, Client2, Client3), a Master, and
slave machines running Linux and SGI]

Each slave performs optimization on different
core monomer combination
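The master/slave division of labour can be imitated on a single machine with a worker pool from Python's standard library. This is only an analogy for the TCP/IP-based SGI/Linux set-up described above. Here `concurrent.futures.ThreadPoolExecutor` stands in for the slaves. `placeholder_score` is a hypothetical stand-in for the placement, docking and scoring of one core-monomer combination. Results are sorted by ascending score, on the assumption that more negative is better, as the SPROUT scores in the case studies suggest.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def master(cores, monomers, score_fn, n_workers=4):
    """Build every core-monomer job, farm the jobs out to a pool of workers
    and return (job, score) pairs sorted best (most negative) first."""
    jobs = list(product(cores, monomers))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(score_fn, jobs))   # map keeps results in job order
    return sorted(zip(jobs, scores), key=lambda pair: pair[1])


def placeholder_score(job):
    """Hypothetical stand-in for one slave's docking/scoring run."""
    core, monomer = job
    return -float(len(core) + len(monomer))


results = master(["core-A"], ["aniline", "benzylamine", "4-fluoroaniline"], placeholder_score)
for (core, monomer), score in results:
    print(core, monomer, score)
```

Threads keep the sketch simple, but for CPU-bound scoring in CPython a `ProcessPoolExecutor` or separate machines would be the closer analogue, since each slave then runs truly in parallel.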
38
Case Study (CDK2)
39
Case Study (CDK2)
40
Case Study (CDK2)
41
Case Study (CDK2)
42
Case Study (Generated Products)
[Figure: generated products with scores -7.95, -7.47, -7.82,
-7.56, -7.75, -7.45, -7.60, -7.07]
43
Monomer Replacement
  • Many lead compounds are composed of readily
    available starting materials (monomers) linked by
    reliable high yielding reactions
  • Retrosynthetic analysis can be used to identify
    the monomers
  • Structurally related analogues could be generated
    by exhaustive monomer replacement
  • Considerable efficiency gains if monomer library
    is arranged in a hierarchy based on substructural
    relationships
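One way such a hierarchy can save work is by pruning: if a monomer fails (for example, it does not fit the site), its more elaborate descendants are not tried. This Python sketch assumes that failure is inherited by superstructures. That is our simplifying assumption, not something stated on the slide. The monomer names and the `fits` test are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A monomer in a substructure hierarchy; children are superstructures."""
    name: str
    children: list = field(default_factory=list)


def search(node, fits, evaluated):
    """Depth-first walk that only descends below monomers that fit."""
    evaluated.append(node.name)
    if not fits(node.name):
        return []          # prune: descendants are assumed to fail too
    hits = [node.name]
    for child in node.children:
        hits += search(child, fits, evaluated)
    return hits


LIBRARY = Node("aniline", [
    Node("4-fluoroaniline", [Node("4-fluoro-3-methylaniline")]),
    Node("4-tert-butylaniline", [Node("4-tert-butyl-2-methylaniline")]),
])


def fits(name):
    """Hypothetical test: pretend tert-butyl groups clash with the receptor."""
    return "tert-butyl" not in name


evaluated = []
print(search(LIBRARY, fits, evaluated))
print(f"{len(evaluated)} of 5 monomers evaluated")
# -> ['aniline', '4-fluoroaniline', '4-fluoro-3-methylaniline']
# -> 4 of 5 monomers evaluated
```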

44
Hierarchy Construction
Amide
45
Hierarchy Usage
Amide
46
Monomer Replacement
[Diagram: retrosynthetic analysis of the lead; do the identified
monomers exist in the starting materials hierarchy?]
47
CASE STUDY: Optimisation of SPROUT-designed
inhibitors of P. falciparum Dihydroorotate
Dehydrogenase (PfDHODH) using Monomer Replacement
  • Initial lead compound: MD-155, SPROUT score -7.88
  • Retrosynthetic analysis finds amide formation and
    Ullmann/Suzuki reactions for monomer formation
  • Monomer library: aryl halides and p-halo-anilines;
    1923 2D structures, 26916 conformations
48
High-Scoring Monomer Replacement Results
Monomer replacement gave 840 new structures
(including multiple conformers of the same structure),
with scores from -7.50 to -9.30.
49
Experimental Results for Some Ligands Suggested
by SPROUT LeadOpt Monomer Replacement
  • Starting point, MD-155: PfDHODH Ki 3.0 µM;
    HsDHODH Ki 11.0 nM
  • MD-204: PfDHODH Ki 733 nM; HsDHODH Ki 21.0 nM
    (4-fold enhancement in Ki for PfDHODH)
  • MD-213: PfDHODH Ki 478 nM; HsDHODH Ki 21.7 nM
    (6-fold enhancement in Ki for PfDHODH)
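The quoted enhancements follow from the ratio of Ki values. The MD-155 value is taken as 3.0 µM (3000 nM), because that is what the 4-fold and 6-fold figures imply. A quick check in Python:

```python
# PfDHODH Ki values in nM (MD-155 taken as 3.0 µM = 3000 nM).
KI_PF_NM = {"MD-155": 3000.0, "MD-204": 733.0, "MD-213": 478.0}


def fold_enhancement(reference, improved, ki=KI_PF_NM):
    """How many times lower the improved ligand's Ki is than the reference's."""
    return ki[reference] / ki[improved]


for ligand in ("MD-204", "MD-213"):
    print(ligand, round(fold_enhancement("MD-155", ligand), 1))
# MD-204 4.1
# MD-213 6.3
```

A Ki of 3.0 mM would instead give enhancements of about 4,000-fold and 6,000-fold. The µM reading is therefore the one consistent with the slide.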
50
Conclusions
  • Scoring functions for assessment of binding
    affinity of the hypothetical compounds produced
    by de novo design are far from perfect
  • Hence only readily synthesisable putative ligands
    will undergo experimental evaluation by medicinal
    chemists
  • Assessment of synthetic feasibility is a
    tractable problem

51
Acknowledgements
  • Matt Davies, Phil Bone and Timo Heikkala for
    experimental work
  • Molecular Networks GmbH for providing CORINA and
    ROTATE
  • MDL for providing MDDR, one of the databases
    used in the complexity analysis project
  • for sponsoring the lead optimization
    project