Title: Bioinformatics IV Quantitative Structure-Activity Relationships (QSAR) and Comparative Molecular Field Analysis (CoMFA)
1Bioinformatics IV
Quantitative Structure-Activity Relationships (QSAR) and
Comparative Molecular Field Analysis (CoMFA)
Martin Ott
2Outline
- Introduction
- Structures and activities
- Regression techniques: PCA, PLS
- Analysis techniques: Free-Wilson, Hansch
- Comparative Molecular Field Analysis
3QSAR The Setting
Quantitative structure-activity relationships are used when there is
little or no receptor information, but there are measured activities of
(many) compounds. They are also useful to supplement docking studies,
which take much more CPU time.
4From Structure to Property
EC50
5From Structure to Property
LD50
6From Structure to Property
7QSAR Which Relationship?
Quantitative structure-activity relationships correlate
chemical/biological activities with structural features or atomic,
group or molecular properties within a range of structurally similar
compounds.
8Free Energy of Binding
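$\Delta G_{binding} = \Delta G_0 + \Delta G_{hb} + \Delta G_{ionic} + \Delta G_{lipo} + \Delta G_{rot}$
Term                 Meaning                                    Energy (kJ/mol per unit feature)
$\Delta G_0$         entropy loss (translational, rotational)   +5.4
$\Delta G_{hb}$      ideal hydrogen bond                        -4.7
$\Delta G_{ionic}$   ideal ionic interaction                    -8.3
$\Delta G_{lipo}$    lipophilic contact                         -0.17
$\Delta G_{rot}$     entropy loss (rotatable bonds)             +1.4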
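The additive scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the original scoring code: the magnitudes are the ones on the slide, while the signs (entropy losses unfavourable, interactions favourable) and the example ligand are assumptions made for the sketch.

```python
# Per-feature contributions in kJ/mol. Magnitudes come from the slide;
# the signs are an assumption: entropy losses are taken as unfavourable (+)
# and interactions as favourable (-).
DG0 = 5.4        # loss of translational/rotational entropy (once per ligand)
DG_HB = -4.7     # per ideal hydrogen bond
DG_IONIC = -8.3  # per ideal ionic interaction
DG_LIPO = -0.17  # per unit of lipophilic contact
DG_ROT = 1.4     # per rotatable bond frozen on binding


def binding_free_energy(n_hbonds, n_ionic, lipo_contact, n_rot_bonds):
    """Additive estimate of the binding free energy in kJ/mol."""
    return (DG0
            + DG_HB * n_hbonds
            + DG_IONIC * n_ionic
            + DG_LIPO * lipo_contact
            + DG_ROT * n_rot_bonds)


# Hypothetical ligand: 3 H-bonds, 1 ionic interaction,
# 100 units of lipophilic contact, 4 rotatable bonds.
estimate = binding_free_energy(3, 1, 100.0, 4)
print(f"Estimated binding free energy: {estimate:.1f} kJ/mol")
```

Each term is simply a count multiplied by its per-feature energy, which is the same linear additivity that QSAR models rely on.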
9Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the equilibrium and rate
constants of ligand-receptor complex formation:
$\Delta G_{binding} = -2.303\,RT \log K = -2.303\,RT \log (k_{on} / k_{off})$
K: equilibrium constant
k_on, k_off: rate constants of association and dissociation
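As a worked illustration of this relation, the sketch below converts rate constants into an equilibrium constant and then into a binding free energy. The rate constants and the temperature of 298.15 K are hypothetical values chosen for the example, and K is taken as the association constant kon / koff.

```python
import math

R_GAS = 8.314462618e-3  # gas constant in kJ/(mol K)


def equilibrium_constant(k_on, k_off):
    """Association equilibrium constant K = kon / koff."""
    return k_on / k_off


def binding_free_energy_from_k(k_assoc, temperature=298.15):
    """Binding free energy in kJ/mol: -2.303 RT log10(K)."""
    return -2.303 * R_GAS * temperature * math.log10(k_assoc)


# Hypothetical rate constants: fast association, slow dissociation.
K = equilibrium_constant(k_on=1.0e6, k_off=1.0e-3)
dg = binding_free_energy_from_k(K)
print(f"K = {K:.3g}, binding free energy = {dg:.1f} kJ/mol")
```

At this temperature every tenfold increase in K lowers the free energy by about 5.7 kJ/mol, which is why activities are compared on a logarithmic scale.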
10Concentration as Activity Measure
- A critical molar concentration C that produces the biological effect
  is related to the equilibrium constant K
- Usually log (1/C) is used (cf. pH)
- For meaningful QSARs, activities need to be spread out over at least
  3 log units
11Molecules Are Not Numbers!
Where are the numbers? Numerical
descriptors
12An Example Capsaicin Analogs
X EC50(μM) log(1/EC50)
H 11.80 4.93
Cl 1.24 5.91
NO2 4.58 5.34
CN 26.50 4.58
C6H5 0.24 6.62
NMe2 4.39 5.36
I 0.35 6.46
NHCHO ? ?
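The activity column can be reproduced from the concentration column. The sketch below assumes the EC50 values are micromolar and are converted to mol/L before taking the logarithm, which is the reading that reproduces the tabulated numbers; the function and variable names are illustrative only.

```python
import math

# EC50 values from the table, taken to be in micromolar (assumption).
ec50_micromolar = {
    "H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
    "C6H5": 0.24, "NMe2": 4.39, "I": 0.35,
}


def log_inverse_concentration(conc_micromolar):
    """log(1/C) with C converted from micromolar to mol/L."""
    return -math.log10(conc_micromolar * 1e-6)


activity = {
    substituent: round(log_inverse_concentration(conc), 2)
    for substituent, conc in ec50_micromolar.items()
}

# Spread of the activities in log units (compare with the 3-log-unit guideline).
spread = max(activity.values()) - min(activity.values())

for substituent, value in activity.items():
    print(f"{substituent:5s} {value:.2f}")
print(f"spread: {spread:.2f} log units")
```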
13An Example Capsaicin Analogs
X      log(1/EC50)  MR     π      σ      Es
H      4.93         1.03   0.00   0.00   0.00
Cl     5.91         6.03   0.71   0.23   -0.97
NO2    5.34         7.36   -0.28  0.78   -2.52
CN     4.58         6.33   -0.57  0.66   -0.51
C6H5   6.62         25.36  1.96   -0.01  -3.82
NMe2   5.36         15.55  0.18   -0.83  -2.90
I      6.46         13.94  1.12   0.18   -1.40
NHCHO  ?            10.31  -0.98  0.00   -0.98
MR: molar refractivity (polarizability) parameter
π: hydrophobicity parameter
σ: electronic sigma constant (para position)
Es: Taft size parameter
14An Example Capsaicin Analogs
$\log(1/EC_{50}) = -0.89 + 0.019\,MR + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$
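An equation of this form can be obtained by ordinary least squares on the descriptor table. The sketch below assumes NumPy is available and refits the seven measured analogs; it is not the calculation behind the slide, so its coefficients need not match the ones shown, and with five parameters for seven compounds the fit is heavily over-parameterized.

```python
import numpy as np

# Descriptor columns: MR, pi, sigma, Es (values from the table).
descriptors = np.array([
    [1.03, 0.00, 0.00, 0.00],     # H
    [6.03, 0.71, 0.23, -0.97],    # Cl
    [7.36, -0.28, 0.78, -2.52],   # NO2
    [6.33, -0.57, 0.66, -0.51],   # CN
    [25.36, 1.96, -0.01, -3.82],  # C6H5
    [15.55, 0.18, -0.83, -2.90],  # NMe2
    [13.94, 1.12, 0.18, -1.40],   # I
])
activity = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])

# Design matrix with a leading column of ones for the constant term.
design = np.column_stack([np.ones(len(activity)), descriptors])
coef, *_ = np.linalg.lstsq(design, activity, rcond=None)

# Explained variance of the fit.
fitted = design @ coef
ss_res = np.sum((activity - fitted) ** 2)
ss_tot = np.sum((activity - activity.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Prediction for the unmeasured NHCHO analog (constant term first).
nhcho = np.array([1.0, 10.31, -0.98, 0.00, -0.98])
predicted_nhcho = float(nhcho @ coef)

print("coefficients (const, MR, pi, sigma, Es):", np.round(coef, 3))
print("R2:", round(float(r2), 3), "predicted NHCHO:", round(predicted_nhcho, 2))
```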
15Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly
additive way to its biological activity, provided there are no
non-linear dependencies of transport or binding on some properties.
16Molecular Descriptors
- Simple counts of features, e.g. of atoms, rings, H-bond donors,
  molecular weight
- Physicochemical properties, e.g. polarisability, hydrophobicity
  (logP), water-solubility
- Group properties, e.g. Hammett and Taft constants, volume
- 2D Fingerprints based on fragments
- 3D Screens based on fragments
17 2D Fingerprints
C  N  O  P  S  X  F  Cl Br I  Ph CO NH OH Me Et Py CHO SO C=C C≡C CN Am Im
1  1  1  0  0  1  0  0  1  0  1  1  1  1  1  0  0  0   0  1   0   0  1  0
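A fingerprint of this kind is a fixed-order bit vector with one position per fragment. The sketch below rebuilds the bit string shown on the slide from a set of fragments present in a molecule; the labels "C=C" and "C≡C" are assumed readings of two garbled column heads, and the helper name is illustrative.

```python
# Fragment dictionary in the column order of the slide.
# "C=C" and "C≡C" are assumed readings of two garbled labels.
FRAGMENTS = [
    "C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
    "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C≡C", "CN", "Am", "Im",
]


def fingerprint(fragments_present):
    """Return one bit per dictionary fragment: 1 if present, else 0."""
    unknown = set(fragments_present) - set(FRAGMENTS)
    if unknown:
        raise ValueError(f"fragments not in dictionary: {sorted(unknown)}")
    return [1 if frag in fragments_present else 0 for frag in FRAGMENTS]


# Fragments that are set in the example bit string on the slide.
present = {"C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me", "C=C", "Am"}
bits = fingerprint(present)
print(" ".join(str(b) for b in bits))
```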
18Principal Component Analysis (PCA)
- Many (> 3) variables to describe objects: high dimensionality of
  descriptor data
- PCA is used to reduce dimensionality
- PCA extracts the most important factors (principal components or PCs)
  from the data
- Useful when correlations exist between descriptors
- The result is a new, small set of variables (PCs) which explain most
  of the data variation
19PCA From 2D to 1D
20PCA From 3D to 3D-
21Different Views on PCA
- Statistically, PCA is a multivariate analysis technique closely
  related to eigenvector analysis
- In matrix terms, PCA is a decomposition of matrix X into two smaller
  matrices plus a set of residuals: $X = T P^T + R$
- Geometrically, PCA is a projection technique in which X is projected
  onto a subspace of reduced dimensions
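The matrix view can be made concrete with a singular value decomposition. The sketch below assumes NumPy, uses the capsaicin descriptor table as X, autoscales the columns (a common but assumed preprocessing choice) and keeps two components, so that T holds the scores, P the loadings and R the residuals.

```python
import numpy as np

# Descriptor table (MR, pi, sigma, Es) for the seven measured analogs.
X_raw = np.array([
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
])

# Autoscale: zero mean and unit variance per descriptor (assumed choice).
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# SVD of the scaled data; rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

n_components = 2
T = U[:, :n_components] * s[:n_components]  # scores, one row per compound
P = Vt[:n_components].T                     # loadings, one column per PC
R = X - T @ P.T                             # residuals not captured by the PCs

explained = s ** 2 / np.sum(s ** 2)         # variance fraction per component
print("explained variance per PC:", np.round(explained, 3))
```

The scores T are the new, smaller set of variables; how many components to keep is a modelling decision that this sketch does not address.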
22Partial Least Squares (PLS)
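$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + e_1$   (compound 1)
$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + e_2$   (compound 2)
$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + e_3$   (compound 3)
...
$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + e_n$   (compound n)
In matrix form: $Y = XA + E$
X: independent variables
Y: dependent variables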
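The slide gives only the regression model, not the algorithm. As one possible illustration, the sketch below implements a NIPALS-style PLS1 (single response) with NumPy and applies it to the capsaicin table. The function name, the choice of two latent variables and the use of unscaled descriptors are assumptions made for the example.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """NIPALS-style PLS1: return (intercept, coefficients) for y ~ X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores of the current latent variable
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W = np.array(weights).T
    P = np.array(loadings).T
    q = np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)  # coefficients in descriptor space
    intercept = y_mean - x_mean @ coef
    return intercept, coef


# Descriptors (MR, pi, sigma, Es) and activities from the capsaicin table.
X = np.array([
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
])
y = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])

intercept, coef = pls1_fit(X, y, n_components=2)
print("intercept:", round(float(intercept), 3), "coefficients:", np.round(coef, 3))
```

With as many latent variables as there are independent descriptors the result coincides with ordinary least squares; using fewer components is what makes PLS usable when descriptors are numerous and correlated.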
23PLS Cross-validation
- Squared correlation coefficient $R^2$
- Value between 0 and 1 (should be > 0.9)
- Indicates the explanatory power of the regression equation
With cross-validation
- Squared correlation coefficient $Q^2$
- Value between 0 and 1 (should be > 0.5)
- Indicates the predictive power of the regression equation
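The difference between the two statistics can be sketched with leave-one-out cross-validation. The example below is a simplified stand-in that uses a one-descriptor linear model (activity against the hydrophobicity parameter from the capsaicin table) rather than PLS, and it assumes the common definition Q2 = 1 - PRESS / SS, under which Q2 can fall below zero for a model that predicts worse than the mean.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Explained variance of the model fitted on all compounds."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out Q2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Hydrophobicity parameter and log(1/EC50) for the seven measured analogs.
pi_values = [0.00, 0.71, -0.28, -0.57, 1.96, 0.18, 1.12]
activities = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]

r2 = r_squared(pi_values, activities)
q2 = q_squared_loo(pi_values, activities)
print(f"R2 = {r2:.3f}, Q2 (leave-one-out) = {q2:.3f}")
```

Q2 is never larger than R2 here, because each left-out compound is predicted by a model that has not seen it.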
24Free-Wilson Analysis
$\log(1/C) = \sum_i a_i x_i + \mu$
x_i: presence of group i (0 or 1)
a_i: activity contribution of group i
μ: activity value of the unsubstituted compound
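The Free-Wilson sum can be sketched as an indicator vector multiplied by group contributions. All numbers below (the substituent set, the contributions and the activity of the unsubstituted compound) are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical substituent set and group contributions a_i.
GROUPS = ["Cl", "NO2", "OMe"]
CONTRIBUTIONS = {"Cl": 0.6, "NO2": -0.2, "OMe": 0.3}
MU = 5.0  # hypothetical activity of the unsubstituted compound


def indicator_vector(groups_present):
    """x_i = 1 if group i is present in the compound, else 0."""
    unknown = set(groups_present) - set(GROUPS)
    if unknown:
        # The model has no contribution for groups outside the training set.
        raise ValueError(f"no contribution available for: {sorted(unknown)}")
    return [1 if group in groups_present else 0 for group in GROUPS]


def free_wilson_activity(groups_present):
    """log(1/C) = sum_i a_i * x_i + mu."""
    x = indicator_vector(groups_present)
    return MU + sum(CONTRIBUTIONS[g] * xi for g, xi in zip(GROUPS, x))


print(indicator_vector({"Cl", "OMe"}))
print(round(free_wilson_activity({"Cl", "OMe"}), 2))
```

In a real analysis the contributions would be obtained by regression on the indicator matrix of all measured compounds.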
25Free-Wilson Analysis
- Computationally straightforward
- Predictions only for substituents already included
- Requires large number of compounds
26Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$\log(1/C) = a (\log P)^2 + b \log P + c \sum \sigma + k$
P: n-octanol/water partition coefficient
σ: Hammett electronic parameter
a, b, c: regression coefficients
k: constant term
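The parabolic dependence on log P implies an optimum lipophilicity whenever the quadratic coefficient a is negative. The sketch below evaluates the equation and locates that optimum; the coefficient values are hypothetical and are not fitted to any data.

```python
def hansch_activity(log_p, sigma_sum, a, b, c, k):
    """log(1/C) = a (log P)^2 + b log P + c * sum(sigma) + k."""
    return a * log_p ** 2 + b * log_p + c * sigma_sum + k


def optimal_log_p(a, b):
    """log P at which the parabola peaks; requires a < 0."""
    if a >= 0:
        raise ValueError("no maximum: quadratic coefficient must be negative")
    return -b / (2.0 * a)


# Hypothetical regression coefficients.
a, b, c, k = -0.25, 1.0, 0.5, 3.0

best_log_p = optimal_log_p(a, b)
best_activity = hansch_activity(best_log_p, sigma_sum=0.0, a=a, b=b, c=c, k=k)
print(f"optimal log P = {best_log_p}, activity at optimum = {best_activity}")
```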
27Hansch Analysis
- Fewer regression coefficients needed for correlation
- Interpretation in physicochemical terms
- Predictions for other substituents possible
28Pharmacophore
- Set of structural features in a drug molecule recognized by a receptor
- Sample features
  - H-bond donor
  - charge
  - hydrophobic center
- Distances, 3D relationship
29Pharmacophore Selection
Pharmacophore
Dopamine
L: lipophilic site
A: H-bond acceptor
D: H-bond donor
PD: protonated H-bond donor
31Comparative Molecular Field Analysis (CoMFA)
- Set of chemically related compounds
- Common pharmacophore or substructure required
- 3D structures needed (e.g., Corina-generated)
- Flexible molecules are folded into pharmacophore constraints and
  aligned
32CoMFA Alignment
33CoMFA Grid and Field Probe
(Only one molecule shown for clarity)
34Electrostatic Potential Contour Lines
35CoMFA Model Derivation
- Molecules are positioned in a regular grid according to alignment
- Probes are used to determine the molecular field
  Van der Waals field (probe is neutral carbon):
  $E_{vdw} = \sum (A_i r_{ij}^{-12} - B_i r_{ij}^{-6})$
  Electrostatic field (probe is charged atom):
  $E_c = \sum q_i q_j / (D r_{ij})$
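The two probe energies can be evaluated at every grid point with a few lines of Python. This is a minimal sketch in reduced units, not a CoMFA implementation: the atom coordinates, the A and B parameters, the charges, the 2 Å grid spacing and the steric cap of 30 are all hypothetical choices made for illustration.

```python
import math
from itertools import product

# Hypothetical two-atom "molecule": position, vdW parameters A and B, charge q.
ATOMS = [
    {"xyz": (0.5, 0.5, 0.5), "A": 1.0e5, "B": 1.0e2, "q": -0.4},
    {"xyz": (1.9, 0.5, 0.5), "A": 1.0e5, "B": 1.0e2, "q": 0.4},
]


def probe_energies(point, atoms, probe_charge=1.0, dielectric=1.0, steric_cap=30.0):
    """Return (E_vdw, E_c) felt by a probe at the given point.

    E_vdw = sum(A_i / r^12 - B_i / r^6), E_c = sum(q_i * q_probe / (D * r)).
    """
    e_vdw = 0.0
    e_c = 0.0
    for atom in atoms:
        r = math.dist(point, atom["xyz"])
        e_vdw += atom["A"] * r ** -12 - atom["B"] * r ** -6
        e_c += atom["q"] * probe_charge / (dielectric * r)
    # Cap the steric term so points inside atoms do not dominate (assumed choice).
    return min(e_vdw, steric_cap), e_c


# Regular grid from -4 to 4 in steps of 2 along each axis (assumed spacing).
# No grid point coincides with an atom, so r is never zero.
axis = range(-4, 5, 2)
grid = list(product(axis, repeat=3))
field = {point: probe_energies(point, ATOMS) for point in grid}

print(len(field), "grid points;", "energies at origin:", field[(0, 0, 0)])
```

Each grid point contributes two columns (steric and electrostatic) per molecule, which is why the resulting descriptor table is very wide.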
36 3D Contour Map for Electronegativity
37CoMFA Pros and Cons
- Suitable to describe receptor-ligand interactions
- 3D visualization of important features
- Good correlation within related set
- Predictive power within scanned space
- Alignment is often difficult
- Training required