Title: Quantitative Structure-Activity Relationships (QSAR) Comparative Molecular Field Analysis (CoMFA)
1 Quantitative Structure-Activity Relationships (QSAR)
Comparative Molecular Field Analysis (CoMFA)
Gijs Schaftenaar
2 Outline
- Introduction
- Structures and activities
- Analysis techniques: Free-Wilson, Hansch
- Regression techniques: PCA, PLS
- Comparative Molecular Field Analysis
3 QSAR: The Setting
Quantitative structure-activity relationships are used when there is little or no receptor information, but measured activities are available for (many) compounds.
4 From Structure to Property
EC50
5 From Structure to Property
LD50
6 From Structure to Property
7 QSAR: Which Relationship?
Quantitative structure-activity relationships correlate chemical/biological activities with structural features or with atomic, group, or molecular properties, within a range of structurally similar compounds.
8 Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the reaction constants of ligand-receptor complex formation:
$$\Delta G_{\mathrm{binding}} = -2.303\,RT \log K = -2.303\,RT \log (k_{\mathrm{on}} / k_{\mathrm{off}})$$
Equilibrium constant: $K$
Rate constants: $k_{\mathrm{on}}$ (association) and $k_{\mathrm{off}}$ (dissociation)
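The relation above can be sketched numerically. This is a minimal illustration, assuming K is an association constant in 1/M, a temperature of 298.15 K, and hypothetical rate constants chosen only for the example; the function names are not from the slides.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def equilibrium_constant(k_on, k_off):
    """Association constant K = k_on / k_off."""
    return k_on / k_off


def binding_free_energy(K, temperature=298.15):
    """Delta G of binding in kJ/mol from Delta G = -2.303 RT log K."""
    return -2.303 * R * temperature * math.log10(K) / 1000.0


# Hypothetical rate constants (assumption): k_on in 1/(M*s), k_off in 1/s.
K = equilibrium_constant(k_on=1.0e6, k_off=1.0e-3)
dG = binding_free_energy(K)
print(f"K = {K:.1e} 1/M, dG = {dG:.1f} kJ/mol")
```

Each factor of ten in K changes the free energy by 2.303 RT, about 5.7 kJ/mol at room temperature.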
9 Concentration as Activity Measure
- A critical molar concentration C that produces the biological effect is related to the equilibrium constant K
- Usually log (1/C) is used (cf. pH)
- For meaningful QSARs, activities need to be spread out over at least 3 log units
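10 Free Energy of Binding
$$\Delta G_{\mathrm{binding}} = \Delta G_{0} + \Delta G_{\mathrm{hb}} + \Delta G_{\mathrm{ionic}} + \Delta G_{\mathrm{lipo}} + \Delta G_{\mathrm{rot}}$$

Term       Meaning                                      Value
ΔG0        entropy loss (translational, rotational)     +5.4
ΔGhb       ideal hydrogen bond                          -4.7
ΔGionic    ideal ionic interaction                      -8.3
ΔGlipo     lipophilic contact                           -0.17
ΔGrot      entropy loss (rotatable bonds)               +1.4

(Energies in kJ/mol per unit feature)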
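The additive scheme above can be written as a small function. This is a sketch under stated assumptions: favourable interactions (hydrogen bond, ionic, lipophilic) are taken as negative contributions and the two entropy terms as positive, and the feature counts of the example ligand are hypothetical.

```python
# Per-feature contributions in kJ/mol. The signs are an assumption:
# favourable interactions negative, entropy penalties positive.
CONTRIBUTIONS = {
    "dG0": 5.4,     # loss of translational and rotational entropy
    "hb": -4.7,     # per ideal hydrogen bond
    "ionic": -8.3,  # per ideal ionic interaction
    "lipo": -0.17,  # per unit of lipophilic contact
    "rot": 1.4,     # per rotatable bond
}


def estimate_binding_energy(n_hbonds, n_ionic, lipo_contact, n_rot_bonds):
    """Sum the per-feature contributions into an estimated Delta G (kJ/mol)."""
    c = CONTRIBUTIONS
    return (
        c["dG0"]
        + c["hb"] * n_hbonds
        + c["ionic"] * n_ionic
        + c["lipo"] * lipo_contact
        + c["rot"] * n_rot_bonds
    )


# Hypothetical ligand: 2 H-bonds, 1 salt bridge, 100 units of
# lipophilic contact, 3 rotatable bonds.
example = estimate_binding_energy(2, 1, 100, 3)
print(f"Estimated dG = {example:.1f} kJ/mol")
```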
11 Molecules Are Not Numbers!
Where are the numbers? Numerical descriptors
12 Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly additive way to its biological activity, provided there are no non-linear dependencies of transport or binding on some properties.
13 An Example: Capsaicin Analogs
X       EC50 (μM)   log(1/EC50)
H       11.80       4.93
Cl       1.24       5.91
NO2      4.58       5.34
CN      26.50       4.58
C6H5     0.24       6.62
NMe2     4.39       5.36
I        0.35       6.46
NHCHO    ?          ?
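The activity column follows from the concentrations by taking log(1/EC50) with EC50 expressed in mol/L. The sketch below assumes the tabulated EC50 values are micromolar, which is the unit that reproduces the listed log(1/EC50) values; the function name is illustrative.

```python
import math

# EC50 values from the table, assumed to be in micromolar.
ec50_uM = {
    "H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
    "C6H5": 0.24, "NMe2": 4.39, "I": 0.35,
}


def log_inverse_ec50(ec50_micromolar):
    """log(1/EC50) with EC50 converted from micromolar to mol/L."""
    return math.log10(1.0 / (ec50_micromolar * 1e-6))


activities = {x: round(log_inverse_ec50(v), 2) for x, v in ec50_uM.items()}
for substituent, value in activities.items():
    print(f"{substituent:5s} {value:.2f}")
```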
14 An Example: Capsaicin Analogs
X       log(1/EC50)   MR      π       σ       Es
H       4.93           1.03    0.00    0.00    0.00
Cl      5.91           6.03    0.71    0.23   -0.97
NO2     5.34           7.36   -0.28    0.78   -2.52
CN      4.58           6.33   -0.57    0.66   -0.51
C6H5    6.62          25.36    1.96   -0.01   -3.82
NMe2    5.36          15.55    0.18   -0.83   -2.90
I       6.46          13.94    1.12    0.18   -1.40
NHCHO   ?             10.31   -0.98    0.00   -0.98

MR: molar refractivity (polarizability) parameter
π: hydrophobicity parameter
σ: electronic sigma constant (para position)
Es: Taft size parameter
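A table like this is the input for a multiple linear regression of activity on the descriptors. The sketch below uses ordinary least squares from NumPy as one way to do that; it is not necessarily the procedure behind the slides, and with seven compounds and five fitted parameters the result is heavily overfitted, so the coefficients it returns are illustrative only.

```python
import numpy as np

# Descriptor columns: MR, pi, sigma, Es for X = H, Cl, NO2, CN, C6H5, NMe2, I
descriptors = np.array([
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
])
activity = np.array([4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46])

# Add a column of ones so the first coefficient is the constant term.
design = np.column_stack([np.ones(len(activity)), descriptors])
coef, *_ = np.linalg.lstsq(design, activity, rcond=None)

fitted = design @ coef
ss_res = float(np.sum((activity - fitted) ** 2))
ss_tot = float(np.sum((activity - activity.mean()) ** 2))
r_squared = 1.0 - ss_res / ss_tot

# Apply the fitted equation to the NHCHO analog (descriptors from the table).
nhcho = np.array([1.0, 10.31, -0.98, 0.00, -0.98])
predicted_nhcho = float(nhcho @ coef)

print("coefficients (const, MR, pi, sigma, Es):", np.round(coef, 3))
print("R^2 on the training compounds:", round(r_squared, 3))
print("predicted log(1/EC50) for NHCHO:", round(predicted_nhcho, 2))
```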
15 An Example: Capsaicin Analogs
$$\log(1/\mathrm{EC}_{50}) = -0.89 + 0.019\,\mathrm{MR} + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$$
17 First Approaches: The Early Days
- Free-Wilson Analysis
- Hansch Analysis
18 Free-Wilson Analysis
$$\log(1/C) = \sum_i a_i x_i + \mu$$
$x_i$: presence of group i (0 or 1)
$a_i$: activity group contribution of group i
$\mu$: activity value of unsubstituted compound
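As a sketch of how the Free-Wilson sum is evaluated, the snippet below adds the contributions of the groups present to the activity of the unsubstituted compound. The group names, the contributions a_i, and the value of μ are hypothetical numbers chosen for illustration, not fitted values.

```python
# Hypothetical group contributions a_i and parent activity mu
# (assumptions for illustration only).
group_contributions = {"Cl": 0.5, "CH3": 0.2, "OH": -0.3}
mu = 5.0


def free_wilson_activity(groups_present):
    """log(1/C) = sum_i a_i * x_i + mu, with x_i = 1 if group i is present."""
    total = mu
    for group, a_i in group_contributions.items():
        x_i = 1 if group in groups_present else 0
        total += a_i * x_i
    return total


print(free_wilson_activity(set()))         # unsubstituted compound
print(free_wilson_activity({"Cl", "OH"}))  # compound carrying Cl and OH
```

In a real analysis the a_i and μ are obtained by regression over a set of compounds; a group that never occurs in that set has no a_i, so no prediction can be made for it.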
19 Free-Wilson Analysis
- Computationally straightforward
- Predictions only for substituents already included
- Requires large number of compounds
20 Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$$\log(1/C) = a\,(\log P)^2 + b \log P + c \sum \sigma + k$$
$P$: n-octanol/water partition coefficient
$\sigma$: Hammett electronic parameter
$a, b, c$: regression coefficients
$k$: constant term
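The parabolic dependence on log P implies an optimum lipophilicity whenever the coefficient a is negative. The sketch below evaluates the Hansch equation and locates that optimum; the coefficient values are hypothetical and serve only to make the example runnable.

```python
def hansch_activity(log_p, sigma_sum, a, b, c, k):
    """log(1/C) = a (log P)^2 + b log P + c * sum(sigma) + k."""
    return a * log_p ** 2 + b * log_p + c * sigma_sum + k


def optimal_log_p(a, b):
    """Vertex of the parabola in log P; a maximum exists only for a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2.0 * a)


# Hypothetical regression coefficients (assumption, not fitted to data).
a, b, c, k = -0.5, 2.0, 1.0, 3.0

best = optimal_log_p(a, b)
print("optimal log P:", best)
print("activity at optimum, sigma = 0:", hansch_activity(best, 0.0, a, b, c, k))
```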
21 Hansch Analysis
- Fewer regression coefficients needed for correlation
- Interpretation in physicochemical terms
- Predictions for other substituents possible
22 Molecular Descriptors
- Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight
- Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility
- Group properties, e.g. Hammett and Taft constants, volume
- 2D Fingerprints based on fragments
- 3D Screens based on fragments
23 2D Fingerprints
C  N  O  P  S  X  F  Cl Br I  Ph CO NH OH Me Et Py CHO SO C=C C≡C CN Am Im
1  1  1  0  0  1  0  0  1  0  1  1  1  1  1  0  0  0   0  1   0   0  1  0
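A fingerprint of this kind is a fixed-length bit vector with one position per fragment. The sketch below pairs the labels with the bits shown above and lists the fragments that are present; reading the 20th and 21st labels as C=C and C≡C is an assumption about the garbled originals.

```python
# Fragment labels in fingerprint order. "C=C" and "C≡C" are an assumed
# reading of two labels that were damaged in the source.
labels = ("C N O P S X F Cl Br I Ph CO NH OH Me Et Py CHO SO "
          "C=C C≡C CN Am Im").split()
bits = [int(b) for b in
        "1 1 1 0 0 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 0 0 1 0".split()]

assert len(labels) == len(bits)  # one bit per fragment

# Fragments whose bit is set.
present = [label for label, bit in zip(labels, bits) if bit == 1]
print(len(bits), "bits,", sum(bits), "set")
print("fragments present:", ", ".join(present))
```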
24 Regression Techniques
- Principal Component Analysis (PCA)
- Partial Least Squares (PLS)
25 Principal Component Analysis (PCA)
- Many (> 3) variables to describe objects: high dimensionality of descriptor data
- PCA is used to reduce dimensionality
- PCA extracts the most important factors (principal components or PCs) from the data
- Useful when correlations exist between descriptors
- The result is a new, small set of variables (PCs) which explain most of the data variation
26 PCA: From 2D to 1D
27 PCA: From 3D to 3D-
28 Different Views on PCA
- Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis
- In matrix terms, PCA is a decomposition of matrix X into two smaller matrices plus a set of residuals: $X = TP^{T} + R$
- Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions
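The matrix view can be made concrete with a singular value decomposition, which is one common way to compute PCA. In this sketch X is mean-centered first, T holds the scores, P the loadings, and R the residuals; the data are synthetic, with three correlated descriptors generated from one hidden variable.

```python
import numpy as np


def pca(X, n_components):
    """Decompose mean-centered X as T @ P.T + R using an SVD."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]  # scores
    P = Vt[:n_components].T                     # loadings
    R = Xc - T @ P.T                            # residuals
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return Xc, T, P, R, explained


# Synthetic data (assumption): three descriptors driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 1))
X = np.hstack([latent, 2 * latent, -latent]) + 0.05 * rng.normal(size=(50, 3))

Xc, T, P, R, explained = pca(X, n_components=1)
print("scores shape:", T.shape, "loadings shape:", P.shape)
print("fraction of variance explained by PC1:", round(float(explained[0]), 4))
```

Because the three descriptors are strongly correlated, a single component carries nearly all of the variance, which is the 3D-to-1D reduction in numerical form.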
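29 Partial Least Squares (PLS)
$$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + e_1 \quad \text{(compound 1)}$$
$$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + e_2 \quad \text{(compound 2)}$$
$$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + e_3 \quad \text{(compound 3)}$$
$$\vdots$$
$$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + e_n \quad \text{(compound n)}$$
$$Y = XA + E$$
X: independent variables
Y: dependent variables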
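One way to estimate A in Y = XA + E is the NIPALS form of PLS for a single response (PLS1), sketched below with NumPy. This is an illustrative implementation under simple assumptions (mean-centering only, no scaling, synthetic data), not the routine of any particular QSAR package.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS; returns coefficients and intercept for y = X @ coef + b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                     # scores for this latent variable
        tt = t @ t
        p = Xr.T @ t / tt              # X loadings
        q_a = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)       # deflate X
        yr = yr - q_a * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Synthetic example (assumption): 30 compounds, 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 0.3 + X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

coef, intercept = pls1_fit(X, y, n_components=2)
print("PLS coefficients:", np.round(coef, 3), "intercept:", round(float(intercept), 3))
```

With as many components as there are independent descriptors, PLS1 reproduces the ordinary least-squares solution; using fewer components is what gives the method its robustness to correlated descriptors.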
30 PLS: Cross-validation
- Squared correlation coefficient R²
- Value between 0 and 1 (should be > 0.9)
- Indicates the explanatory power of the regression equation
With cross-validation:
- Squared correlation coefficient Q²
- Value at most 1 (should be > 0.5); it can become negative for a model without predictive power
- Indicates the predictive power of the regression equation
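The difference between R² and Q² can be sketched with leave-one-out cross-validation. For brevity this example swaps in ordinary least squares as the regression step rather than PLS, uses synthetic data, and defines Q² as 1 - PRESS / SS_total with the overall mean of y in SS_total; all of these are assumptions of the sketch.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, constant term first."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def r_squared(X, y):
    """Explanatory power: fit and evaluate on the same compounds."""
    residuals = y - predict_ols(fit_ols(X, y), X)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)


def q_squared_loo(X, y):
    """Predictive power: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_ols(X[keep], y[keep])
        prediction = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - prediction) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


# Synthetic data (assumption): 20 compounds, 2 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=20)

r2 = r_squared(X, y)
q2 = q_squared_loo(X, y)
print("R^2:", round(float(r2), 3), "Q^2 (leave-one-out):", round(float(q2), 3))
```

Q² never exceeds R² in this setup, because each left-out prediction error is at least as large as the corresponding fitted residual.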
31 PCA vs PLS
- PCA: The principal components describe the variance in the independent variables (descriptors)
- PLS: The principal components describe the variance in both the independent variables (descriptors) and the dependent variable (activity)
32 Comparative Molecular Field Analysis (CoMFA)
- Set of chemically related compounds
- Common substructure required
- 3D structures needed (e.g., Corina-generated)
- Bioactive conformations of the active compounds
are to be aligned
33 CoMFA: Alignment
34 CoMFA: Grid and Field Probe
(Only one molecule shown for clarity)
35 Electrostatic Potential Contour Lines
36 CoMFA: Model Derivation
- Molecules are positioned in a regular grid according to alignment
- Probes are used to determine the molecular field:
  Van der Waals field (probe is neutral carbon): $E_{\mathrm{vdw}} = \sum (A_i r_{ij}^{-12} - B_i r_{ij}^{-6})$
  Electrostatic field (probe is charged atom): $E_{c} = \sum q_i q_j / (D\,r_{ij})$
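The two field expressions can be evaluated at every grid point with a few lines of NumPy. The sketch below assumes per-atom parameters A_i and B_i, atomic charges q_i, a probe charge q_j, and a constant D; the two-atom "molecule", its parameters, and the grid are hypothetical, and units are left arbitrary.

```python
import numpy as np


def comfa_fields(atom_xyz, A, B, q, grid_xyz, probe_charge=1.0, dielectric=1.0):
    """Steric and electrostatic probe energies at each grid point.

    E_vdw = sum_i (A_i r_ij^-12 - B_i r_ij^-6)
    E_c   = sum_i q_i q_j / (D r_ij)
    where i runs over atoms and j is the probe at a grid point.
    """
    diff = grid_xyz[:, None, :] - atom_xyz[None, :, :]
    r = np.linalg.norm(diff, axis=2)  # shape (n_grid, n_atoms)
    e_vdw = np.sum(A * r ** -12 - B * r ** -6, axis=1)
    e_c = np.sum(q * probe_charge / (dielectric * r), axis=1)
    return e_vdw, e_c


# Hypothetical two-atom molecule (assumption): positions, A, B, charges.
atom_xyz = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
A = np.array([1.0e3, 1.0e3])
B = np.array([30.0, 30.0])
q = np.array([-0.4, 0.4])

# Regular grid that does not coincide with any atom position,
# since r = 0 would give a division by zero.
axis = np.array([-3.0, -1.0, 1.0, 3.0])
gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
grid_xyz = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

e_vdw, e_c = comfa_fields(atom_xyz, A, B, q, grid_xyz)
print("grid points:", len(grid_xyz))
print("steric field range:", float(e_vdw.min()), float(e_vdw.max()))
print("electrostatic field range:", float(e_c.min()), float(e_c.max()))
```

The field values of all aligned molecules, one column per grid point and field type, form the X matrix that is then analysed with PLS.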
37 3D Contour Map for Electronegativity
38 CoMFA: Pros and Cons
Pros:
- Suitable to describe receptor-ligand interactions
- 3D visualization of important features
- Good correlation within related set
- Predictive power within scanned space
Cons:
- Alignment is often difficult
- Training required