Title: Introduction to Molecular Modeling
1. Introduction to Molecular Modeling
- Chemoinformatics and Medicinal Chemistry Group
- Department of Chemistry, UFMG
- http://www.nequim.qui.ufmg.br
- jlopes_at_netuno.lcc.ufmg.br
- jcdlopes_at_gmail.com
Julio C. D. Lopes
UFMG - Oct 2007, Belo Horizonte, Brazil
2. Molecular Informatics
- Storage, retrieval and manipulation of information about molecules or molecular systems
- Typically deals with large numbers of molecules or molecular systems
Molecular Modeling
3. Molecular Modeling
The compendium of methods for mimicking the
behavior of molecules or molecular systems
4. Molecular Modeling
Molecular Modeling is concerned with the
description of the atomic and molecular
interactions that govern microscopic and
macroscopic behaviors of physical systems.
The essence of molecular modeling resides in the
connection between the macroscopic and the
microscopic world provided by the theory of
statistical mechanics.
Macroscopic observable = average of the observable over selected microscopic states
(e.g., solvation energy, affinity between two molecules, H-H distance, conformation, ...)
5. Points for Consideration
- Remember:
- Molecular modeling forms a model of the real world
- Thus we are studying the model, not the world
- A model is valid as long as it reproduces the real world
6. Why Use Molecular Modeling? (And Not Deal Directly with the Real World?)
- Fast, accurate and relatively cheap way to
- Study molecular properties
- Rationalize and interpret experimental results
- Make predictions for yet unstudied systems
- Study hypothetical systems
- Design new molecules
7. Some Molecular Properties
8. Minimal Input for Molecular Modeling
- Topological properties
- Description of the covalent connectivity of the molecules to be modeled
- Structural properties
- The starting conformation of the molecule, provided by an X-ray structure, NMR or a theoretical model
9. Minimal Input for Molecular Modeling
- Energetic properties
- A force field describing the forces acting on the atoms of each molecule
- Thermodynamic properties
- A thermodynamic ensemble that corresponds to the experimental conditions of the system, e.g. NVT, NPT, ...
10. Molecular Simulations
- A method to sample the 3D structures (conformations) accessible to a molecule. Any molecular property is then an average of the values of this property over the sampled conformations.
11. Force Fields
- A method to describe a molecule as a collection
of atoms held together by forces. Based on this
description, each of the many molecular 3D
structures is characterized by an energy value.
This value is then used to optimize the geometry
of the 3D structure. The optimized structure is
then used to calculate many molecular properties.
12. Molecular Structure: Saccharin
13. Molecular Structure and Molecular Properties
- Property
- Activity
- Cell Permeability
- Toxicity
Structure
- Descriptors
- 1D: e.g., molecular weight
- 2D: e.g., number of rotatable bonds
- 3D: e.g., molecular volume
14. Molecular Structure and Molecular Properties
- Property
- Activity
- Cell Permeability
- Toxicity
Structure
- Biological Targets
- 3D structures
- X-ray
- NMR
- Homology
15. Which Molecule(s) Should We Test Next?
- Answer
- An (ordered) list of molecules
- Potential candidates
- Corporate database
- External databases
- Synthesis
- Information
- Biological activity
- Molecular properties
16. The Drug Development Route
[Figure: Lead Discovery and Lead Optimization, each comprising Design, Synthesis and Biological Screening steps, linked by the Lead]
17. Property Space
- Each axis describes a molecular property (descriptor).
- Each molecule is represented by a point.
- The distance between any two points measures how dissimilar the corresponding molecules are in terms of the selected descriptors: the closer the points, the more similar the molecules.
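A minimal sketch of the distance idea, assuming the descriptors have already been scaled to comparable ranges and that plain Euclidean distance is the chosen metric. The function name `descriptor_distance` and the descriptor values are hypothetical, for illustration only.

```python
import math


def descriptor_distance(a, b):
    """Euclidean distance between two molecules represented as points in descriptor space."""
    if len(a) != len(b):
        raise ValueError("both molecules must be described by the same descriptors")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical, already-scaled descriptor vectors:
# (molecular weight, number of rotatable bonds, molecular volume)
mol_a = (0.30, 0.20, 0.40)
mol_b = (0.32, 0.20, 0.38)
mol_c = (0.90, 0.80, 0.10)

# The molecule closest to mol_a in property space is considered the most similar to it.
closer_to_a = min((mol_b, mol_c), key=lambda m: descriptor_distance(mol_a, m))
```

Scaling matters: with raw values, molecular weight (hundreds of units) would dominate the distance over the number of rotatable bonds.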
18. Lead Discovery
Locating Activity Islands Through Diversity
19. Activity Optimization Through Focusing
Lead Discovery
20. Lead Optimization: All the Rest
- Efficacy
- Oral bioavailability
- Cell permeability
- Stability (CYP P450)
- Clearance
- Toxicity
- hERG channel
- Drug-drug interactions
- Selectivity
21. Predictive Models
Active
- Use all available information to build a model
which can differentiate between active and
inactive compounds.
- Use the model to predict the activity of yet
unsynthesized compounds.
- Select for synthesis only compounds predicted to
be active.
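One very simple way to build such a model is a nearest-neighbour rule in descriptor space. This is an illustrative choice, not necessarily the model type used in the course, and the training set, descriptor values and candidate names below are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict_activity(training_set, candidate):
    """1-nearest-neighbour model: give the candidate the label of its closest known compound."""
    _, nearest_label = min(training_set, key=lambda item: euclidean(item[0], candidate))
    return nearest_label


# Hypothetical training data: (descriptor vector, measured class)
training = [
    ((0.10, 0.20), "active"),
    ((0.20, 0.10), "active"),
    ((0.90, 0.80), "inactive"),
]

# Hypothetical, not-yet-synthesized candidates
candidates = {"cand1": (0.15, 0.15), "cand2": (0.85, 0.90)}

# Select for synthesis only the compounds predicted to be active
selected = [name for name, desc in candidates.items()
            if predict_activity(training, desc) == "active"]
```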
22. Types of Models
- CSAR: Classification Structure-Activity Relationship
- Qualitative data
- HTS data
- QSAR: Quantitative Structure-Activity Relationship
- Quantitative data
$\text{Property} = f(\text{structure}) = f(\text{desc}_1, \text{desc}_2, \ldots, \text{desc}_N)$
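A minimal sketch of a QSAR model, assuming $f$ is linear in the descriptors, $\text{Property} \approx w_0 + w_1\,\text{desc}_1 + \dots + w_N\,\text{desc}_N$, with the weights fitted by ordinary least squares through the normal equations. The function names are hypothetical and the data are synthetic, generated from a known linear rule so the fit can be checked.

```python
def fit_linear_qsar(descriptors, activities):
    """Least-squares weights [w0, w1, ..., wN] for Property = w0 + sum(wi * desc_i)."""
    rows = [[1.0] + list(d) for d in descriptors]          # prepend the intercept column
    n = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * y for r, y in zip(rows, activities)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def predict(w, desc):
    """Apply the fitted model to one descriptor vector."""
    return w[0] + sum(wi * d for wi, d in zip(w[1:], desc))


# Synthetic training set generated from Property = 1 + 2*desc1 - 0.5*desc2
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
train_y = [1.0 + 2.0 * d1 - 0.5 * d2 for d1, d2 in train_x]
weights = fit_linear_qsar(train_x, train_y)
```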
23. Docking and Scoring
24. Docking and Scoring
25. Introduction to Molecular Mechanics
26. Introduction
- Energy minimization: single minimum
- Conformational search: multiple minima
- Simulation methods: a complete quantification of the energy surface; minima populated according to their free energy
27. Potential Energy Functions
- QM ab initio: distribution of electrons over the system, given the positions of the atomic nuclei. Gaussian94, GAMESS, ...
- Semi-empirical methods: pre-calculated values or neglect of some parts of the ab initio calculation. MOPAC (MOPAC6, -7, -93, -2000, -2002)
- Empirical methods: observed/fitted values for interactions between atoms. AMBER, MM3, CHARMM, GROMOS, ...
28. Basic Concepts
- Energy: energy as a function of a coordinate $X$; the energy penalty for distorting $X$ from its equilibrium position $X_0$.
- Steric energy: $E = a + k(X - X_0)^2$ (Hooke's law).
- Parameterization: matching a function to a set of data points by varying its parameters ($a$ and $k$).
- Minimization: finding the minimum of a potential function.
- First derivative: $\frac{dE}{dX} = 2k(X - X_0) = 0$ at the minimum; the force is $F = -\frac{dE}{dX}$.
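The Hooke's-law term and its derivative translate directly into code. The fitting function sketches parameterization under the assumption that $X_0$ is known: then $E$ is linear in $u = (X - X_0)^2$, and $a$ and $k$ follow from an ordinary least-squares line fit. The values ($X_0 = 1.53$ Å, $k = 300$, $a = 0.5$) are hypothetical, not parameters from any particular force field.

```python
def steric_energy(x, x0, k, a=0.0):
    """Hooke's-law steric energy E = a + k (X - X0)^2."""
    return a + k * (x - x0) ** 2


def energy_derivative(x, x0, k):
    """dE/dX = 2k (X - X0); the force F = -dE/dX vanishes at X = X0."""
    return 2.0 * k * (x - x0)


def fit_a_k(xs, energies, x0):
    """Parameterization: least-squares fit of a and k for a known X0.

    With u = (X - X0)^2 the model E = a + k u is a straight line in u.
    """
    u = [(x - x0) ** 2 for x in xs]
    n = len(u)
    u_mean = sum(u) / n
    e_mean = sum(energies) / n
    k = sum((ui - u_mean) * (ei - e_mean) for ui, ei in zip(u, energies)) / sum(
        (ui - u_mean) ** 2 for ui in u
    )
    a = e_mean - k * u_mean
    return a, k


# Hypothetical reference data generated from a = 0.5, k = 300, X0 = 1.53
X0 = 1.53
xs = [1.45, 1.50, 1.53, 1.57, 1.62]
reference = [steric_energy(x, X0, 300.0, 0.5) for x in xs]
a_fit, k_fit = fit_a_k(xs, reference, X0)
```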
29. Force Field
30. Potential energy vs. bond length
Similar equation for valence angles
31. Potential energy vs. dihedral angle
32. Potential energy vs. dihedral angle
33. Potential energy vs. dihedral angle
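Dihedral terms are commonly written as a short Fourier series, $E(\phi) = \sum_n \frac{V_n}{2}[1 + \cos(n\phi - \gamma)]$; CHARMM writes each term as $K_\phi[1 + \cos(n\phi - \delta)]$, i.e. $K_\phi = V_n/2$. The sketch below evaluates this form; the single threefold term and its barrier height are hypothetical, chosen only to mimic an ethane-like profile with maxima at eclipsed and minima at staggered geometries.

```python
import math


def torsion_energy(phi_deg, terms):
    """Fourier-series dihedral energy: sum of Vn/2 * (1 + cos(n*phi - gamma)).

    terms is a list of (Vn, n, gamma_deg) tuples.
    """
    phi = math.radians(phi_deg)
    return sum(
        vn / 2.0 * (1.0 + math.cos(n * phi - math.radians(gamma_deg)))
        for vn, n, gamma_deg in terms
    )


# Hypothetical threefold term: barrier height 2.9 energy units, gamma = 0
ethane_like = [(2.9, 3, 0.0)]
profile = {phi: torsion_energy(phi, ethane_like) for phi in range(0, 361, 60)}
```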
34. Van der Waals energy vs. distance
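A common choice for the van der Waals term is the Lennard-Jones 12-6 potential, $E(r) = 4\varepsilon[(\sigma/r)^{12} - (\sigma/r)^6]$, which is zero at $r = \sigma$ and reaches its minimum $-\varepsilon$ at $r = 2^{1/6}\sigma$. CHARMM writes the same function as $\varepsilon[(R_{min}/r)^{12} - 2(R_{min}/r)^6]$ with $R_{min} = 2^{1/6}\sigma$. The $\varepsilon$ and $\sigma$ values below are hypothetical.

```python
def lennard_jones(r, epsilon, sigma):
    """Lennard-Jones 12-6 energy between two atoms separated by r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


EPSILON, SIGMA = 0.2, 3.4            # hypothetical well depth and size parameter
r_min = 2.0 ** (1.0 / 6.0) * SIGMA   # distance of the energy minimum
well_depth = lennard_jones(r_min, EPSILON, SIGMA)
```

Inside $\sigma$ the $r^{-12}$ repulsion dominates; beyond $r_{min}$ the energy rises back towards zero through the $r^{-6}$ attraction.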
35. The CHARMM Force Field
36. The CHARMM Force Field
37. CHARMM Parameter Set
38. Other Interactions
- Hydrogen bond
- Interaction of type D-H --- A
- The origin of this interaction is a dipole-dipole attraction
- Hydrophobic effect
- The origin of this interaction is an unfavorable contact surface between water and an apolar medium (entropically driven)
- The apolar medium reorganizes to minimize the water-exposed surface
39. Treatment of long-range interactions
40. Effect of cutoff on energy calculations
41. Existing Force Fields
- AMBER (Assisted Model Building with Energy Refinement)
- Parameterized specifically for proteins and nucleic acids.
- Uses only 5 bonding and non-bonding terms along with a sophisticated electrostatic treatment.
- No cross terms are included.
- Results can be very good for proteins and nucleic acids, less so for other systems.
- CHARMM (Chemistry at Harvard Macromolecular Mechanics)
- Originally devised for proteins and nucleic acids.
- Now used for a range of macromolecules, molecular dynamics, solvation, crystal packing, vibrational analysis and QM/MM studies.
- Uses 5 valence terms, one of which is an electrostatic term.
- Basis for other force fields (e.g., MOIL).
42. Existing Force Fields
- GROMOS (Groningen Molecular Simulation)
- Popular for predicting the dynamical motion of molecules and bulk liquids.
- Also used for modeling biomolecules.
- Uses 5 valence terms, one of which is an electrostatic term.
- MM1, 2, 3, 4
- General-purpose force fields for (mono-functional) organic molecules.
- MM2 was parameterized for many functional groups.
- MM3 is probably one of the most accurate ways of modeling hydrocarbons.
- MM4 is very new and little is known about its performance.
43. Existing Force Fields
- MMFF (Merck Molecular Force Field)
- General-purpose force field mainly for organic molecules.
- MMFF94 was originally designed for molecular dynamics simulations but is also widely used for geometry optimization.
- Uses 5 valence terms, one of which is an electrostatic term, and one cross term.
- MMFF was parameterized based on high-level ab initio calculations.
- OPLS (Optimized Potentials for Liquid Simulations)
- Designed for modeling bulk liquids.
- Has been extensively used for modeling the molecular dynamics of biomolecules.
- Uses 5 valence terms, one of which is an electrostatic term, but no cross terms.
44. Existing Force Fields
- Tripos (SYBYL force field)
- Designed for modeling organic and biomolecules.
- Often used for CoMFA analysis (QSAR method).
- Uses 5 valence terms, one of which is an electrostatic term.
- CVFF (Consistent Valence Force Field)
- Parameterized for small organic (amides, carboxylic acids, etc.) crystals and gas-phase structures.
- Handles peptides, proteins, and a wide range of organic systems.
- Primarily intended for studies of structures and binding energies, although it predicts vibrational frequencies and conformational energies reasonably well.
45. Energy Minimization: Geometry Optimization
46. Potential Energy Surface
A system of N atoms is defined by 3N Cartesian
coordinates or 3N-6 internal coordinates. These
define a multi-dimensional potential energy
surface (PES).
47. Potential Energy Surface
48. Minimization Definitions
Given a function $f(x_1, x_2, \ldots, x_n)$, find the values of the variables for which $f$ is a minimum.
- Functions: quantum mechanics energy; molecular mechanics energy
- Variables: Cartesian (molecular mechanics); internal (quantum mechanics)
- Minimization algorithms: derivative-based; non-derivative-based
49. A Schematic Representation
Starting geometry
- Easy to implement; useful for well-defined structures
- Depends strongly on the starting geometry
50. Population of Minima
Active Structure
Most populated minimum
Global minimum
Most minimization methods can only go downhill and so locate the closest minimum (in the downhill sense). No minimization method can guarantee the location of the global energy minimum. No method has proven to be the best for all problems.
51. Common minimization protocols
- First order algorithms
- Steepest descent
- Conjugate gradient
- Second order algorithms
- Newton-Raphson
- Adopted basis Newton-Raphson (ABNR)
52. Steepest Descent
- This is the simplest minimization method.
- The first derivative (gradient) of the potential is calculated, and a displacement is added to every coordinate in the direction opposite to the gradient (the direction of the force).
- Advantages: simple and fast.
- Disadvantages: inaccurate; converges slowly near the minimum.
53. Steepest Descent
With a line search along each direction, SD is forced to make 90º turns between subsequent steps and so is slow to converge.
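A minimal steepest-descent sketch. It uses a fixed step size rather than a line search (so it does not reproduce the 90º turns exactly), but it still shows how many steps SD needs on an elongated valley. The test surface $E = x^2 + 10y^2$, the step size and the tolerance are hypothetical choices.

```python
import math


def steepest_descent(grad, x, step=0.05, tol=1e-6, max_iter=10_000):
    """Fixed-step SD: move every coordinate against the gradient (i.e. along the force)."""
    for iteration in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:   # converged: gradient ~ 0
            return x, iteration
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iter


# Hypothetical anisotropic surface E = x^2 + 10 y^2 with its minimum at (0, 0)
def grad_e(p):
    return [2.0 * p[0], 20.0 * p[1]]


xmin, n_iter = steepest_descent(grad_e, [1.0, 1.0])
```

The steep $y$ direction is resolved almost immediately, while the shallow $x$ direction shrinks by only 10% per step, so more than a hundred iterations are needed.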
54. Conjugate Gradient
- Uses first-derivative information plus information from previous steps: the new search direction is a weighted combination of the current gradient and the previous step direction.
- The weight factor is calculated from the ratio of the squared gradients at the current and previous steps.
- This method converges much better than SD.
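A minimal conjugate-gradient sketch in the Fletcher-Reeves form, where the weight is $\beta = |g_{k+1}|^2 / |g_k|^2$. Because the hypothetical test surface $E = x^2 + 10y^2$ is quadratic with a known second-derivative matrix, an exact line search can be used; practical molecular-mechanics minimizers use an approximate line search instead. On an $n$-dimensional quadratic, CG with exact line searches reaches the minimum in at most $n$ steps.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


def conjugate_gradient(hessian, grad, x, tol=1e-10, max_iter=100):
    """Fletcher-Reeves CG with an exact line search (valid for a quadratic surface)."""
    g = grad(x)
    d = [-gi for gi in g]                      # first step is plain steepest descent
    for iteration in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            return x, iteration
        alpha = -dot(g, d) / dot(d, matvec(hessian, d))   # exact minimizer along d
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = grad(x)
        beta = dot(g_new, g_new) / dot(g, g)   # weight from the ratio of squared gradients
        d = [-gn + beta * di for gn, di in zip(g_new, d)]   # mix new gradient with old direction
        g = g_new
    return x, max_iter


A = [[2.0, 0.0], [0.0, 20.0]]   # Hessian of the hypothetical surface E = x^2 + 10 y^2


def grad_e(p):
    return [2.0 * p[0], 20.0 * p[1]]


xmin, n_iter = conjugate_gradient(A, grad_e, [1.0, 1.0])
```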
55. Newton-Raphson Algorithm
- Uses both first-derivative (slope) and second-derivative (curvature) information.
- In the one-dimensional case: $x_{k+1} = x_k - \dfrac{E'(x_k)}{E''(x_k)}$
- Advantage: accurate and converges well.
- Disadvantage: computationally expensive; to converge, it should start near a minimum.
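A minimal one-dimensional Newton-Raphson sketch. The test function $E(x) = (x - 1.5)^2 + 0.1(x - 1.5)^4$ and its analytic derivatives are hypothetical. Note that the update seeks $E'(x) = 0$, so it heads for a minimum only where the curvature $E''(x)$ is positive, which is one reason to start near a minimum.

```python
def newton_raphson_1d(first_deriv, second_deriv, x, tol=1e-12, max_iter=50):
    """1D Newton-Raphson: x_{k+1} = x_k - E'(x_k) / E''(x_k)."""
    for iteration in range(1, max_iter + 1):
        step = first_deriv(x) / second_deriv(x)
        x -= step
        if abs(step) < tol:
            return x, iteration
    return x, max_iter


# Hypothetical test function E(x) = (x - 1.5)^2 + 0.1 (x - 1.5)^4, minimum at x = 1.5
def e_prime(x):
    u = x - 1.5
    return 2.0 * u + 0.4 * u ** 3


def e_double_prime(x):
    u = x - 1.5
    return 2.0 + 1.2 * u ** 2


x_min, n_iter = newton_raphson_1d(e_prime, e_double_prime, x=2.5)
```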
56. Adopted Basis Newton-Raphson (ABNR)
- An adaptation of the NR method that is especially suitable for large systems.
- Instead of using the full second-derivative matrix, it uses a basis that represents the subspace in which the system made the most progress in the past.
- Advantages: uses second-derivative information, converges well, and is faster than the regular NR method.
- Disadvantages: still quite expensive; less accurate than NR.
57. Conformational Search
58. Conformational Analysis
- Conformers
- Structures differing only by rotation around one or more bonds.
- Constitute stationary points on the PES.
- Conformation
- Any point on the PES.
59. Butane
CH3CH2-CH2CH3
60. Cyclohexane
- Chair
- Global minimum
- 6 axial and 6 equatorial bonds
61. Ring Inversion
62. Population of Minima
Active Structure
Most populated minimum
Global minimum
63. Sampling the PES
- Energy minimization
- Single minimum
- Conformational search
- Multiple minima
- Both methods produce results which only reflect
the enthalpic contribution to the free energy.
64. Boltzmann Averaging and Conformational Search
- Approximations arise from the choice of the set of conformers (or conformations) that is considered.
- If we choose to go with conformational searching, we should at least look for the set of energetically accessible minima.
65. Boltzmann Averaged Properties
- Fraction of conformation $i$ in an equilibrium mixture: $f_i = \dfrac{e^{-E_i/RT}}{\sum_j e^{-E_j/RT}}$
- Equilibrium molecular properties are obtained by Boltzmann averaging the properties of the individual conformations: $\langle P \rangle = \sum_i f_i P_i$
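A minimal sketch of Boltzmann weighting over a conformer set, assuming molar energies in kcal/mol (hence $RT$ rather than $k_BT$) and room temperature. The conformer energies and the per-conformer property values are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_fractions(energies, temperature=298.15):
    """Equilibrium fractions f_i = exp(-E_i/RT) / sum_j exp(-E_j/RT); energies in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies)  # shifting by the lowest energy avoids overflow and leaves f_i unchanged
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann_average(properties, energies, temperature=298.15):
    """<P> = sum_i f_i P_i over the individual conformations."""
    fractions = boltzmann_fractions(energies, temperature)
    return sum(f * p for f, p in zip(fractions, properties))


# Hypothetical conformer set: relative energies (kcal/mol) and one property per conformer
energies = [0.0, 0.9, 0.9]
dipoles = [0.0, 1.2, 1.2]
fractions = boltzmann_fractions(energies)
mean_dipole = boltzmann_average(dipoles, energies)
```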
66. Conformational Search Outline
- Randomly or systematically generated starting
geometries
- Representative structures for each potential
minimum
67. Systematic Methods
- Systematically vary each of the $N$ rotatable bonds in the molecule.
- Number of conformations: $\prod_{i=1}^{N} \dfrac{360^\circ}{\theta_i}$, where $\theta_i$ is the angle increment for bond $i$
- Limitations
- The number of processed structures rapidly increases with $N$.
- Problematic structures may be removed prior to minimization.
- Poor coverage of conformational space at the beginning of the run.
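A minimal sketch of the systematic grid, assuming the same angle increment for every rotatable bond so the count reduces to $(360^\circ/\theta)^N$. The function names are hypothetical; building and minimizing a 3D structure for each torsion tuple is not shown.

```python
import itertools


def systematic_conformations(n_bonds, increment_deg):
    """Yield every torsion combination when each rotatable bond is stepped by increment_deg."""
    if 360 % increment_deg != 0:
        raise ValueError("the increment must divide 360 degrees evenly")
    values = range(0, 360, increment_deg)
    return itertools.product(values, repeat=n_bonds)


def conformation_count(n_bonds, increment_deg):
    """Closed form for equal increments: (360 / increment) ** n_bonds."""
    return (360 // increment_deg) ** n_bonds


grid = list(systematic_conformations(3, 30))
```

With a 30º increment the count grows as 12, 144, 1728, ... for 1, 2, 3, ... bonds, which is the combinatorial limitation listed above.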
68. Systematic Methods Example
69. Genetic Algorithm (GA)
- A method for global optimization.
- Uses ideas taken from evolution processes.
- Assumes that good parents have better chance to
produce good offspring.
70. Genetic Algorithm (GA)
- Create an initial population of m conformations.
- Each conformation is represented by a chromosome.
- In this chromosome, each torsion can be represented by one of $2^5 = 32$ values.
- Calculate a fitness function (e.g., energy) for each chromosome.
71. Genetic Algorithm (GA)
- Select a number of chromosome pairs (e.g., m/2)
- Selection is biased towards fitter (e.g., lower-energy) chromosomes:

  Object   Fitness   Roulette fraction
  A        3         1/4
  B        6         1/2
  C        2         1/6
  D        1         1/12

- Subject the new population to genetic operators
- Propagate the highest-ranking individuals.
- Crossover (80%)
- Point mutation (1%)
- Replace the least fit chromosomes with new chromosomes and repeat the procedure on the new population.
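A minimal sketch of the GA building blocks, assuming bit-string chromosomes (5 bits per torsion gives $2^5 = 32$ values) and a fitness where higher is better, so lower energies would first have to be mapped to higher fitness (not shown). The function names are hypothetical; the fitness values reproduce the roulette table, and applying the 80% crossover and 1% mutation probabilities is left to the caller.

```python
import random
from fractions import Fraction


def roulette_fractions(fitness):
    """Chance of each individual being picked: its fitness divided by the total fitness."""
    total = sum(fitness.values())
    return {name: Fraction(value, total) for name, value in fitness.items()}


def roulette_select(fitness, rng):
    """Spin the roulette wheel once and return the selected individual."""
    spin = rng.uniform(0, sum(fitness.values()))
    cumulative = 0
    for name, value in fitness.items():
        cumulative += value
        if spin <= cumulative:
            return name
    return name  # only reached through floating-point round-off at the upper edge


def crossover(parent1, parent2, rng):
    """Single-point crossover of two equal-length bit-string chromosomes."""
    point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]


def point_mutation(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


fitness = {"A": 3, "B": 6, "C": 2, "D": 1}   # fitness values from the table
table = roulette_fractions(fitness)
rng = random.Random(0)
picks = [roulette_select(fitness, rng) for _ in range(12_000)]
child1, child2 = crossover("00000", "11111", rng)   # one 5-bit torsion gene per parent
mutant = point_mutation("0000000000", rate=0.01, rng=rng)
```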
72. Other Methods
- Distance geometry: refinement of NMR structures
- Monte Carlo: simulated annealing
- Molecular dynamics: high temperature; simulated annealing
73. MD or MC
- Molecular dynamics
- Advantages: average properties reflect free energies; good convergence to local energy minima.
- Disadvantages: requires energy derivatives; slow crossing of energy barriers above 2-3 kcal/mol.
- Monte Carlo
- Advantages: average properties reflect free energies; can cross high energy barriers; does not require energy derivatives.
- Disadvantages: slow convergence for large molecules and ring systems.
74. Simulated Annealing (SA) and Simulated Quenching (SQ)
In these techniques, the temperature of the system is raised and lowered several times during a standard MD or MC simulation. Two different types of cooling can be achieved:
- A slow protocol: annealing
- A fast protocol: quenching
75. Solvent Treatment
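78. Monte Carlo Integration
79. Thermodynamic Properties (NVT Ensemble) via Monte Carlo Integration
- The probability of obtaining the configuration $\mathbf{r}^N$: $P(\mathbf{r}^N) = \dfrac{\exp[-V(\mathbf{r}^N)/k_BT]}{Z}$
- $Z = \int \exp[-V(\mathbf{r}^N)/k_BT]\, d\mathbf{r}^N$ is the configuration integral (related to $Q$)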
80. Thermodynamic Properties via Monte Carlo Integration
- Obtain a configuration of the system by randomly generating 3N Cartesian coordinates, which are assigned to the particles.
- Calculate the potential energy of the configuration.
- Calculate the Boltzmann factor.
- Add the Boltzmann factor to the accumulated sum of Boltzmann factors, and the potential energy contribution to its accumulated sum.
- After $N_{trial}$ steps, evaluate the mean potential energy by
  $\langle V \rangle = \dfrac{\sum_{i=1}^{N_{trial}} V(\mathbf{r}^N_i)\, \exp[-V(\mathbf{r}^N_i)/k_BT]}{\sum_{i=1}^{N_{trial}} \exp[-V(\mathbf{r}^N_i)/k_BT]}$
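A minimal sketch of this procedure for a single coordinate instead of 3N, in reduced units ($k_BT = 1$). The harmonic test potential $V(x) = kx^2/2$ and the sampling box $[-6, 6]$ are hypothetical choices; for this potential the exact result is $\langle V \rangle = k_BT/2$, so the estimate should land close to 0.5. Most uniformly drawn points lie where $V$ is large and contribute almost nothing, which is the inefficiency the next slide addresses.

```python
import math
import random


def naive_mc_mean_energy(potential, n_trial, half_width, kT, rng):
    """Simple (non-Metropolis) MC estimate of <V> with uniformly sampled configurations."""
    sum_boltzmann = 0.0      # accumulated sum of Boltzmann factors
    sum_weighted_v = 0.0     # accumulated sum of V * Boltzmann factor
    for _ in range(n_trial):
        x = rng.uniform(-half_width, half_width)   # random configuration
        v = potential(x)                           # its potential energy
        w = math.exp(-v / kT)                      # its Boltzmann factor
        sum_boltzmann += w
        sum_weighted_v += v * w
    return sum_weighted_v / sum_boltzmann


def harmonic(x, k=1.0):
    """Hypothetical 1D test potential V(x) = k x^2 / 2."""
    return 0.5 * k * x * x


rng = random.Random(42)
mean_v = naive_mc_mean_energy(harmonic, n_trial=200_000, half_width=6.0, kT=1.0, rng=rng)
```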
81. Metropolis Monte Carlo
- Only a small part of the phase space (the low-energy region) contributes to physical observables.
- The above procedure is hampered by the presence of many configurations with a negligible contribution to the integral due to their high energies.
- Solution
- Bias the generation of configurations towards those which make the most significant contribution to the integral.
- Metropolis Monte Carlo generates states with a probability of $\exp[-V(\mathbf{r}^N)/k_BT]$ and counts them equally (simple MC generates states with equal probability and then assigns them a weight of $\exp[-V(\mathbf{r}^N)/k_BT]$). By doing so, sampling from the NVT (canonical) ensemble is guaranteed.
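A minimal Metropolis sketch for one coordinate in reduced units ($k_BT = 1$), assuming uniform random trial displacements. The harmonic test potential, the maximum displacement, the number of equilibration steps and the seed are hypothetical choices. Every visited state is counted with equal weight, including the current state when a trial move is rejected; for this potential the exact average is $\langle V \rangle = k_BT/2 = 0.5$.

```python
import math
import random


def metropolis_mean_energy(potential, x0, n_steps, max_disp, kT, rng, n_equil=1_000):
    """Metropolis MC estimate of <V>: states are generated ~ exp(-V/kT) and counted equally."""
    x, v = x0, potential(x0)
    total, count = 0.0, 0
    for step in range(n_steps):
        x_trial = x + rng.uniform(-max_disp, max_disp)   # random trial move
        v_trial = potential(x_trial)
        # Accept downhill moves always, uphill moves with probability exp(-dV/kT)
        if v_trial <= v or rng.random() < math.exp(-(v_trial - v) / kT):
            x, v = x_trial, v_trial
        if step >= n_equil:
            total += v   # the current state is counted even when the trial move was rejected
            count += 1
    return total / count


def harmonic(x, k=1.0):
    """Hypothetical 1D test potential V(x) = k x^2 / 2."""
    return 0.5 * k * x * x


rng = random.Random(7)
mean_v = metropolis_mean_energy(harmonic, x0=3.0, n_steps=200_000, max_disp=1.5, kT=1.0, rng=rng)
```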
82. Energy Minimization (Geometry Optimization) and Conformational Search
- A method to find a set of the most stable (lowest
in energy) 3D structures (conformers) of a
molecule. Any molecular property is a (weighted)
average of the values of this property in the
different conformers.