Title: Docking of small molecules
1Docking of small molecules using Discovery Studio
Tel-Aviv University
2Goal of the workshop
- Provide the information required for using the Discovery Studio docking algorithms.
3Outline
- A brief review of docking algorithms
- LigandFit: the workflow
- CDOCKER: the workflow
- Hands-on session
- Visualization tools in Discovery Studio
- Docking with LigandFit
- Docking with CDOCKER
- Post-docking analysis
4The molecular docking problem
- Given two molecules with 3D conformations in atomic detail
- Do the molecules bind to each other?
- What does the molecule-molecule complex look like?
- How strong is the binding affinity?
5What do we dock?
- The two molecules might be
- A protein (enzyme, receptor) and a small molecule (substrate, ligand)
- A protein and a DNA molecule
- Two proteins
6Why do we dock?
- Drug discovery costs are too high: $800 million, 8-14 years, 10,000 compounds (DiMasi et al. 2003; Dickson & Gagnon 2004)
- Drugs interact with their receptors in a highly specific and complementary manner.
- Docking is the core of target-based, structure-based drug design (SBDD) for lead generation and optimization.
- A lead is a compound that
- shows biological activity,
- is novel, and
- has the potential of being structurally modified for improved bioactivity and selectivity
7Three components of docking
- Representation of receptor binding site and ligand (pre- and/or during docking)
- Sampling of configuration space of the ligand-receptor complex (during docking)
- Evaluation of ligand-receptor interactions (during docking and scoring)
8Basic principles
- The association of molecules is based on interactions
- H-bonds
- salt bridges
- hydrophobic contacts
- electrostatic
- very strong repulsive (VdW) interactions at short distances
- Ligands are flexible
- Receptors are mostly rigid
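To make the short-distance repulsion concrete, here is a minimal sketch using the common 12-6 Lennard-Jones form. The well depth and optimal distance are illustrative assumptions, not parameters from any Discovery Studio forcefield.

```python
def lennard_jones(r, epsilon=0.2, r_min=3.5):
    """12-6 energy at separation r; the minimum, -epsilon, lies at r = r_min.

    epsilon (kcal/mol) and r_min (angstrom) are illustrative values only.
    """
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2.0 * ratio ** 6)

# The energy climbs steeply once the atoms are closer than r_min.
for r in (2.5, 3.0, 3.5, 4.5):
    print(f"r = {r:.1f}  E = {lennard_jones(r):8.3f}")
```

The steep wall at short distances is why even a small clash can dominate a pose's energy.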
9Sampling of configuration space of the ligand-receptor complex
- Descriptor matching: pattern-recognizing geometric methods that match ligand and receptor site descriptors
- geometric, chemical, pharmacophore properties, such as distance pairs, triplets, volume, vector, hydrogen-bond, hydrophobic, charged, etc.
- Molecular simulation: MD (molecular dynamics), MC (Monte Carlo)
- Others: GA (genetic algorithm), similarity, fragment-based
- No best method
10Molecular simulation: MD and MC
- Two major components
- The description of the degrees of freedom
- The energy evaluation
- The local movement of the atoms is performed
- Due to the forces present at each step in MD (Molecular Dynamics)
- Randomly in MC (Monte Carlo)
- Usually time consuming
- Search from a starting orientation to a low-energy configuration
- Several simulations with different starting orientations must be performed to get a statistically significant result
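The MC idea above can be sketched on a toy problem. This is a minimal Metropolis search over a one-dimensional energy function with two minima, run from several starting points. The energy function, step size and temperature are assumptions chosen for illustration; a real docking run would evaluate a forcefield over the ligand's degrees of freedom.

```python
import math
import random

def energy(x):
    """Toy energy surface with two minima (a stand-in for a forcefield)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def metropolis_search(start, rng, steps=2000, step_size=0.2, kT=0.1):
    """Random moves: downhill always accepted, uphill with Boltzmann probability."""
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

def multi_start(starts, seed=0):
    """Run one search per starting point, since a single run may get trapped."""
    rng = random.Random(seed)
    return [metropolis_search(s, rng) for s in starts]

starts = [-2.0, -0.5, 0.5, 2.0]
results = multi_start(starts)
for s, (x, e) in zip(starts, results):
    print(f"start {s:+.1f} -> x = {x:+.3f}, E = {e:+.3f}")
```

Runs that start in different basins may end in different minima, which is why several starting orientations are needed.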
11Genetic algorithm docking
- Requires the generation of an initial population, whereas conventional MC and MD require a single starting structure in their standard implementation.
- The collection of genes (chromosome) is assigned a fitness based on a scoring function. There are three genetic operators
- mutation: randomly changes the value of a gene
- crossover: exchanges a set of genes from one parent chromosome to another
- migration: moves individual genes from one sub-population to another.
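The three operators can be sketched as follows, assuming a chromosome is simply a list of torsion angles in degrees. The mutation rate, the one-point crossover and the ring-style migration of whole chromosomes are illustrative choices, not the scheme of any particular docking program.

```python
import random

def mutate(chromosome, rng, rate=0.1, scale=30.0):
    """Mutation: randomly change the value of some genes."""
    return [g + rng.uniform(-scale, scale) if rng.random() < rate else g
            for g in chromosome]

def crossover(parent_a, parent_b, rng):
    """One-point crossover: leading genes from parent_a, the rest from parent_b."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]

def migrate(subpops, rng):
    """Migration (in place): move one random member of each sub-population to the next."""
    movers = [pop.pop(rng.randrange(len(pop))) for pop in subpops]
    for i, member in enumerate(movers):
        subpops[(i + 1) % len(subpops)].append(member)

rng = random.Random(1)
a = [0.0] * 6
b = [1.0] * 6
child = crossover(a, b, rng)
subpops = [[list(a), list(a)], [list(b), list(b)]]
migrate(subpops, rng)
```

In a full GA these operators would be applied each generation, and the scoring function would decide which chromosomes survive.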
12Docking programs
- Dock (UCSF)
- Autodock (Scripps)
- Glide (Schrödinger)
- ICM (Molsoft)
- FRED (OpenEye)
- Gold
- FlexX, etc.
13Evaluation of docking programs
- Evaluation of library ranking efficacy in virtual screening. J Comput Chem. 2005 Jan 15;26(1):11-22.
- Evaluation of docking performance: comparative data on docking algorithms. J Med Chem. 2004 Jan 29;47(3):558-65.
- Impact of scoring functions on enrichment in docking-based virtual screening: an application study on renin inhibitors. J Chem Inf Comput Sci. 2004 May-Jun;44(3):1123-9.
- And more.
14LigandFit and CDOCKER
- Methodology
- LigandFit: shape-based docking
- CDOCKER: CHARMm-based docking/refinement
- Usage
- LigandFit: screening of medium-size libraries in well defined binding cavities
- CDOCKER: screening of small libraries, refinement of docking poses
- Speed
- LigandFit: medium
- CDOCKER: medium-slow
- Associated tools and utilities
- LigandFit: site definition by ligand or receptor, pose interaction filters
- CDOCKER: binding site sphere definition, forcefield typing
15LigandFit
- Active-site finding
- Automatic active site location using a flood-filling algorithm
- Flexible docking of ligands
- Searches the ligand conformational space to find the best fit into the protein active site
- 1,000 conformations per sec
- Fast ligand scoring
- Initial scoring based on both the internal energy of the ligand and the interaction energy between ligand and protein
- With DS LigandScore, a variety of scoring functions are available for final analysis
16LigandFit workflow
- Define binding site/site partition
- Generate ligand conformation (repeated for the number of Monte Carlo trials)
- Ligand/site shape match
- Fail: generate another conformation
- Pass: position and orient ligand to site
- Is it better than saved poses? Is it different from saved poses?
- No: generate another conformation
- Yes: save pose in Save List, replacing the worst pose
- Rank the poses
- Apply scoring function(s)
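The Save List decision can be sketched as a small helper. This is a guess at the bookkeeping implied by the flowchart, not the actual LigandFit code. A pose is a hypothetical (score, coordinates) pair, higher scores are better, and `is_different` stands in for whatever similarity test is used.

```python
def update_save_list(save_list, pose, max_saved, is_different):
    """Keep the pose only if it differs from every saved pose and, once the
    list is full, beats the worst saved pose (which it then replaces)."""
    score, candidate = pose
    if any(not is_different(candidate, saved) for _, saved in save_list):
        return False
    if len(save_list) < max_saved:
        save_list.append(pose)
    else:
        worst = min(range(len(save_list)), key=lambda i: save_list[i][0])
        if score <= save_list[worst][0]:
            return False
        save_list[worst] = pose
    save_list.sort(key=lambda item: item[0], reverse=True)  # rank the poses
    return True

# Toy poses: a score and a 1D "position"; poses closer than 0.5 count as the same.
def far_apart(p, q):
    return abs(p - q) > 0.5

saved = []
trials = [(1.0, 0.0), (2.0, 0.1), (3.0, 5.0), (0.5, 9.0), (2.0, 9.0)]
accepted = [update_save_list(saved, t, max_saved=2, is_different=far_apart)
            for t in trials]
print(saved)
```

In this toy run the second pose is rejected for being too similar to a saved one, and the fourth for scoring below the worst saved pose.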
17Prepare your protein
- All hydrogens must be added
- All atom valencies must be satisfied for correct atom typing
- Use Tools → Protein Modeling → Clean to
- check structural disorder
- fix connectivity and bond orders
- add H at a specified pH
- Use Preferences → Protein Utilities to set Clean tool options
18Binding site identification
- Before beginning docking calculations
- Where is the binding site?
19Binding site characteristics
- Liang et al. 1998 found small molecule binding sites to be
- Indentations,
- Crevices, or
- Cavities
- And often the largest site is the true binding site
- Laskowski et al. 1996 reported an analysis of cleft volumes
- Often the ligand is bound in the largest cleft
- Usually the largest cleft is considerably larger than the others
- Examples shown: HSV-1 thymidine kinase, Abl tyrosine kinase
20Prepare your protein: define the active site using the Site Search Algorithm
- Set up a grid around the protein
- Default resolution is 0.5 Å but can be adjusted
by the user
- Use a probe to test for Van der Waals clashes at
each grid point
21Site Search Algorithm
- An eraser clears the free grid points it can reach
- Eraser size can be varied
22Site Search Algorithm
- If the eraser is unable to enter a cavity, all
grid points inside the cavity are considered as a
site.
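The grid, eraser and flood-fill steps can be illustrated on a toy 2D grid. This is a simplified sketch under stated assumptions, not the Discovery Studio implementation: the eraser is a square tested at every position rather than moved in from outside, and the "protein" is a hand-made block with a narrow pocket.

```python
def find_site_points(occupied, width, height, eraser):
    """Free grid points that a square eraser of half-width `eraser` cannot reach."""
    offsets = [(dx, dy) for dx in range(-eraser, eraser + 1)
               for dy in range(-eraser, eraser + 1)]
    erased = set()
    # Try the eraser at every centre, including a margin just outside the grid.
    for cx in range(-eraser, width + eraser):
        for cy in range(-eraser, height + eraser):
            covered = [(cx + dx, cy + dy) for dx, dy in offsets]
            if not any(p in occupied for p in covered):
                erased.update(covered)
    free = {(x, y) for x in range(width) for y in range(height)} - occupied
    return free - erased

def group_sites(points):
    """Flood fill: group site points into connected sites, largest first."""
    remaining, sites = set(points), []
    while remaining:
        stack, site = [remaining.pop()], []
        while stack:
            x, y = stack.pop()
            site.append((x, y))
            for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if n in remaining:
                    remaining.remove(n)
                    stack.append(n)
        sites.append(sorted(site))
    return sorted(sites, key=len, reverse=True)

# Toy "protein": a 5x5 block with a one-point-wide pocket open at the top edge.
pocket = {(4, 4), (4, 5), (4, 6)}
protein = {(x, y) for x in range(2, 7) for y in range(2, 7)} - pocket
sites = group_sites(find_site_points(protein, 9, 9, eraser=1))
print(sites)
```

A larger eraser leaves more free points behind, so wider clefts are reported as sites. This is the effect of changing the eraser size.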
23Site editing
- A site definition can be modified
- Site Editing links in Binding Site Tool Panel
- Contraction/Expansion
- Site points are objects
- Manually selected and deleted
- Recommended
- manually remove tails
- expand 2-3 times
- Preferences → Binding sites → site opening changes the eraser size (recommended size is 5)
24If the ligand is smaller than the site, use the partition site option
25Site search by protein shape
- Flood-filling algorithm identifies possible binding sites
- Fast (a few seconds)
- Will work on any protein shape
- Not sensitive to the orientation of the protein in the grid
26Prepare your protein: Interaction Filters
- If you know that certain residues/atoms promote your ligand-receptor interaction, you can define an interaction site
- Select protein atom(s) as interaction sites
- Hydrogens for defining a donor on the protein
- Heavy atom (such as O, N) for an acceptor
- Select carbons for hydrophobic
- Attributes and type can be edited
- Accessed by the Edit Attributes menu
- Right-click a selected object and select Attributes of
27Energy grid parameter
- Select a forcefield and partial charge calculation method to be used in the evaluation of ligand pose-receptor interaction energies during docking
- Dreiding: default
- PLP1: good for many interactions, mainly hydrophobic ones
- CFF: more accurate than Dreiding, though time-consuming
- Click on the arrow symbol to reveal advanced parameters
28LigandFit conformational search
- Required for flexible fit of the ligand
- Monte Carlo search in torsional space
- Bond lengths and bond angles fixed
- Multiple torsion changes simultaneously
- Rings are not varied
- Upper limit of random dihedral perturbation is 180°
- Lower limit depends on the number of rotating atoms
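A single torsional Monte Carlo move might look like the sketch below. The slide does not say how the lower limit is derived from the number of rotating atoms, so it is passed in as a plain parameter here. The function name and the number of torsions changed per move are also assumptions.

```python
import random

def perturb_torsions(torsions, rng, n_change=2, lower=10.0, upper=180.0):
    """Change several torsions at once by a random angle of magnitude
    lower..upper degrees and random sign; bond lengths and angles are untouched."""
    new = list(torsions)
    for i in rng.sample(range(len(new)), min(n_change, len(new))):
        delta = rng.uniform(lower, upper) * rng.choice((-1.0, 1.0))
        new[i] = (new[i] + delta + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return new

rng = random.Random(0)
start = [0.0, 60.0, -60.0, 120.0]  # rotatable-bond torsions only; ring torsions are excluded
trial = perturb_torsions(start, rng)
print(trial)
```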
29Monte Carlo (MC) trials dialog
- Perform Rigid Docking
- A docking mode that treats the ligand as a rigid body. The ligand conformation is not changed during docking
- Use a fixed number of MC steps
- Specify a fixed number of iterations for the Monte Carlo conformer generation, which is employed for all input ligands
- Use variable number of MC steps from table
- This table allows you to adjust the number of iterations and consecutive failures based on the number of ligand torsions
30Docking mode
- Docking or Rigid-Body Minimization only
- Docking
- places ligand into the binding site
- shape matching and refinements done
- Rigid-Body Minimization
- position of input ligands specified by the starting coordinates
- rigid-body minimization of the ligand-protein interaction energy
- No attempt is made to place the ligand into the binding site, so the input file should be "pre-docked" for meaningful results
31Evaluating the ligand position
- Once fit is completed
- how good is it?
- Ligand position initially evaluated using DockScore
- Energy-based
- Grid-based
- Higher scores indicate better fit
- Choice of forcefields
- Dreiding: default
- PLP1: good for many interactions, mainly hydrophobic ones
- CFF: more accurate than Dreiding, though time-consuming
32Protein-ligand interaction filters
- Features may be used as a filter for docked poses
- Does not affect how a ligand is positioned or optimized
- Once a ligand is docked, its pose is examined to find how many features are matched between the receptor and the docked pose
- The number of matched features influences whether the pose will be saved to the Save List
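Counting matched features could be sketched like this. The distance cutoff, the feature labels and the rule that a receptor donor matches a ligand acceptor (and so on) are assumptions made for illustration. Discovery Studio's own matching criteria may differ.

```python
import math

# Which ligand feature satisfies each receptor interaction site (assumed pairing).
COMPLEMENT = {"donor": "acceptor", "acceptor": "donor", "hydrophobic": "hydrophobic"}

def count_matched_features(site_features, ligand_atoms, cutoff=3.5):
    """Number of receptor interaction sites with a complementary ligand atom
    within `cutoff` angstroms. Inputs are lists of (type, (x, y, z))."""
    matched = 0
    for site_type, site_pos in site_features:
        wanted = COMPLEMENT[site_type]
        if any(atom_type == wanted and math.dist(site_pos, atom_pos) <= cutoff
               for atom_type, atom_pos in ligand_atoms):
            matched += 1
    return matched

def passes_filter(site_features, ligand_atoms, min_matches):
    """Post-docking check only: the pose itself is never moved or re-optimized."""
    return count_matched_features(site_features, ligand_atoms) >= min_matches

site = [("donor", (0.0, 0.0, 0.0)), ("hydrophobic", (5.0, 0.0, 0.0))]
pose = [("acceptor", (2.8, 0.0, 0.0)), ("hydrophobic", (10.0, 0.0, 0.0))]
print(count_matched_features(site, pose))
```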
33Scoring functions
- Used for final evaluation of positions after the DockScore is computed
- Used during LigandFit Docking Protocol
- Or evaluated for a completed run in Score Ligand Poses Protocol
- Choice of Scores
- LigScore1
- LigScore2
- PLP1
- PLP2
- Jain
- PMF
- Ludi
34Types of scoring functions
- Force field based: nonbonded interaction terms as the score, sometimes in combination with solvation terms
- Empirical: multivariate regression methods to fit coefficients of physically motivated structural functions by using a training set of ligand-receptor complexes with measured binding affinity
- Knowledge-based: statistical atom pair potentials derived from structural databases as the score
- Other: scores and/or filters based on chemical properties, pharmacophore, contact, shape complementarity
- Consensus scoring functions approach
35Force field based scoring functions
e.g. CHARMm in CDOCKER
- Advantages
- FF terms are well studied and have some physical basis
- Transferable, and fast when used on a pre-computed grid
- Disadvantages
- Only parts of the relevant energies, i.e., potential energies, sometimes enhanced by solvation or entropy terms
- Electrostatics often overestimated, leading to systematic problems in ranking complexes
36Empirical scoring functions
e.g. LUDI, PLP, LigScore, Jain
- Count the number of interactions and assign a score based on the number of occurrences
- H-bonds, ionic interactions (easy to quantify)
- Hydrophobic interactions (more difficult to assess and quantify)
- Number of rotatable bonds frozen (linked to the entropic cost of binding, quite difficult to estimate)
- Advantages: fast, direct estimation of binding affinity
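The counting idea can be sketched as a weighted sum. The weights and baseline below are made-up placeholders, not the fitted coefficients of LUDI, PLP, LigScore or Jain. In a real empirical function they come from regression against measured binding affinities.

```python
# Hypothetical weights (arbitrary units); negative values favour binding.
WEIGHTS = {
    "h_bonds": -1.0,
    "ionic": -1.5,
    "hydrophobic_contacts": -0.2,
    "frozen_rotors": 0.3,   # entropic penalty per rotatable bond frozen on binding
}
BASELINE = 1.0              # hypothetical constant term

def empirical_score(counts):
    """Baseline plus weight times number of occurrences for each interaction type."""
    return BASELINE + sum(w * counts.get(name, 0) for name, w in WEIGHTS.items())

example = {"h_bonds": 3, "ionic": 1, "hydrophobic_contacts": 10, "frozen_rotors": 4}
print(empirical_score(example))
```

The score needs only counts, which is why this family of functions is fast.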
37Knowledge-based potentials of mean force scoring functions (PMF)
- Assumptions
- An observed crystallographic complex represents the optimum placement of the ligand atoms relative to the receptor atoms
- Advantages
- Similar to empirical, but more general (much more distance data than binding energy data)
- Disadvantages
- PMF are typically pair-wise, while the probability of finding atoms A and B at a distance r is non-pairwise and depends also on the surrounding atoms
38Consensus Scoring
- Combination of several scoring functions
- The common top rankers get a higher consensus rank than single outliers
- False positives are easier to detect than with a single scoring function
- Advisable to use 2-4 well-suited scoring functions for the consensus score
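One simple way to build a consensus rank is vote counting: a pose gets one vote for each scoring function that places it in the top N. The sketch below assumes higher scores are better for every function, and the numbers are invented for illustration. It is not the Discovery Studio consensus protocol itself.

```python
def consensus_votes(scores_by_function, top_n):
    """Return (pose, votes) pairs, most votes first."""
    poses = next(iter(scores_by_function.values())).keys()
    votes = {pose: 0 for pose in poses}
    for scores in scores_by_function.values():
        for pose in sorted(scores, key=scores.get, reverse=True)[:top_n]:
            votes[pose] += 1
    return sorted(votes.items(), key=lambda item: item[1], reverse=True)

# Invented scores for four poses under three scoring functions.
scores = {
    "LigScore1": {"A": 5.1, "B": 4.0, "C": 2.2, "D": 6.0},
    "PLP1":      {"A": 80,  "B": 95,  "C": 40,  "D": 30},
    "PMF":       {"A": 120, "B": 110, "C": 60,  "D": 50},
}
ranking = consensus_votes(scores, top_n=2)
print(ranking)
```

Pose D tops one function but gets a single vote, so it ranks below A and B, which score well across functions.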
39Take home message
- There is no best method!
- Try different methods, force-fields, scoring functions
- Treat your results as a suggestion
- Use the experimental data
41CDOCKER
- CDOCKER is a CHARMm-based docking algorithm
- Generate ligand conformations through high-temperature molecular dynamics
- Random (rigid-body) rotation
- Grid-based simulated annealing
- Full minimization
- Output of refined ligand poses
42CDOCKER
- CHARMm-based docking/refinement algorithm
- Uses soft-core potentials and an optional grid representation to dock ligands into the receptor active site
- High temperature MD to generate (10) starting conformations
- Take each conformation and perform random rigid body rotations (10)
- Minimise resulting structures (<50)
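The random rigid-body rotation step can be sketched in plain Python. Conformer generation by high-temperature MD is not shown; a toy three-atom conformer stands in for it. The rotation is drawn uniformly via a random unit quaternion, which is one common choice and not necessarily what CDOCKER does internally.

```python
import math
import random

def random_rotation(rng):
    """Uniform random 3x3 rotation matrix from a normalized 4D Gaussian sample."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def rotate_about_centroid(coords, rot):
    """Rotate all atoms rigidly about their geometric centre."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    rotated = []
    for p in coords:
        v = [p[i] - c[i] for i in range(3)]
        rotated.append(tuple(c[i] + sum(rot[i][j] * v[j] for j in range(3))
                             for i in range(3)))
    return rotated

def starting_poses(conformers, n_rotations=10, seed=0):
    """Each conformer times n_rotations random orientations."""
    rng = random.Random(seed)
    return [rotate_about_centroid(conf, random_rotation(rng))
            for conf in conformers for _ in range(n_rotations)]

# Ten copies of a toy conformer stand in for ten MD-generated conformations.
toy = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
poses = starting_poses([toy] * 10)
print(len(poses))
```

With 10 conformations and 10 rotations each, 100 starting poses go into the annealing and minimization steps.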
43Prepare your protein
- All hydrogens must be added
- All atom valencies must be satisfied for correct atom typing
- Use Tools → Protein Modeling → Clean to
- check structural disorder
- fix connectivity and bond orders
- add H at a specified pH
- Use Preferences → Protein Utilities to set Clean tool options
44Prepare your protein: define your binding site
- If you know the residues involved in the interaction with your ligand, you can define your binding site
- Enlarge your site using the attributes of the site sphere
45Advanced parameters
- Advanced parameters for
- Forcefield
- CHARMm
- cff
- Use Full Potential
- Grid extension
- Ligand partial charge method (MMFF/CHARMm)
- Final minimization
- Grid-Based
- Full potential
46Post docking tools
- Score your poses
- Consensus score
- Analyze your poses