Slide 1: Introduction to QSAR (Quantitative Structure-Activity Relationships)
Slide 2: Introduction to QSAR
- Example of a qualitative structure-activity relationship (SAR)
Affinity for the serotonin receptor:
1. decreases with N-alkylation (R1/R2)
2. decreases with bis-methylation (R3/R4)
3. increases with methoxylation in R5 and/or R7
4. increases with lipophilic substituents in R6
Slide 3: Introduction to QSAR
- In contrast to qualitative SAR, quantitative SAR (QSAR) seeks a mathematical relationship between biological activity and molecular properties
- General form of a QSAR equation:
  - $\text{biol. activity} = f(P)$, with $P$ = molecular property or properties
  - or, more specifically, $\text{biol. activity} = \text{const.} + c_1 P_1 + c_2 P_2 + c_3 P_3 + \dots$
- The molecular properties (descriptors) $P$ are calculated for each molecule in the data set
- The coefficients $c$ and the constant term are determined by statistical methods (e.g. multiple linear regression)
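A minimal Python sketch of evaluating the general linear form above for one molecule. The function name `qsar_predict` and all numbers are hypothetical placeholders; in practice the constant and the coefficients come from the regression step.

```python
def qsar_predict(const, coeffs, descriptors):
    """Evaluate biol. activity = const + c1*P1 + c2*P2 + ... for one molecule."""
    if len(coeffs) != len(descriptors):
        raise ValueError("need exactly one coefficient per descriptor")
    return const + sum(c * p for c, p in zip(coeffs, descriptors))


# Hypothetical constant, coefficients (c1, c2) and descriptor values (P1, P2)
activity = qsar_predict(const=1.0, coeffs=[0.5, -0.2], descriptors=[2.0, 3.0])
print(activity)  # approximately 1.4
```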
Slide 4: Molecular QSAR Descriptors
- 1D: whole-molecule properties (e.g. molecular weight, melting point, logP)
- 2D: substituent constants (e.g. π, σ, molar refractivity), fragment fingerprints, topological indices
- 3D: surface or field properties (e.g. electrostatic potential, steric fields, hydrophobicity, solvent-accessible surface area)
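As an example of a 1D whole-molecule descriptor, molecular weight can be computed from atom counts. This sketch hard-codes rounded standard atomic weights for C, H, N and O only (an assumption that restricts it to such molecules); the helper name `molecular_weight` is hypothetical, and capsaicin (C18H27NO3) is used as input.

```python
# Rounded standard atomic weights in g/mol (only the elements needed here)
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molecular_weight(atom_counts):
    """Sum of atomic weights weighted by how often each element occurs."""
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in atom_counts.items())


# Capsaicin, C18H27NO3
mw = molecular_weight({"C": 18, "H": 27, "N": 1, "O": 3})
print(round(mw, 2))  # 305.42
```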
Slide 5: Introduction to QSAR
- Why QSAR?
- QSAR models are derived from a series of (similar) molecules with known activity (the training set)
- Once a statistically significant QSAR model has been found, it can be applied to new molecules of the same series (the test set) to predict their activity before biological testing (or even before synthesis!)
Slide 6: Introduction to QSAR
Example: analgesic activity of capsaicin analogs (taken from Walpole et al., Sandoz)
Slide 7: Introduction to QSAR
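Slide 8: Introduction to QSAR
- The relation between the standard free energy of binding and the equilibrium constant, $\Delta G^\circ = -RT \ln K_a = RT \ln K_d$, tells us that equilibrium constants (e.g. EC50, used here as a proxy) depend logarithmically on the free energy of binding
- Thus, the EC50 values have to be transformed to a logarithmic scale before they can be related linearly to descriptors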
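A small sketch of this transformation, assuming EC50 values in mol/L and treating EC50 as a stand-in for K_d (an approximation; EC50 is not a true equilibrium constant). The 1 µM value and the function names are hypothetical. It also shows that each log unit of EC50 corresponds to a constant free-energy step of RT ln 10, about 5.7 kJ/mol at 25 °C, which is why log EC50 is the natural target for a linear model.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol*K)


def log_ec50(ec50_molar):
    """Decadic logarithm of an EC50 given in mol/L."""
    return math.log10(ec50_molar)


def binding_free_energy(kd_molar, temperature=298.15):
    """Delta G = RT ln Kd in kJ/mol (1 M standard state); EC50 is assumed to approximate Kd."""
    return R * temperature * math.log(kd_molar) / 1000.0


ec50 = 1e-6  # hypothetical EC50 of 1 micromolar
print(log_ec50(ec50))          # about -6
dg = binding_free_energy(ec50)
print(round(dg, 1))            # -34.2 (kJ/mol)

# Free-energy change per log unit of EC50: RT ln 10
per_log_unit = R * 298.15 * math.log(10) / 1000.0
print(round(per_log_unit, 2))  # 5.71 (kJ/mol)
```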
Slide 9: Introduction to QSAR
- Now we need some molecular properties (descriptors)...
- The Sandoz group decided to use two substituent constants: the hydrophobic constant π and the molar refractivity (MR), which correlates with the size and polarizability of the substituents
Slide 10: Introduction to QSAR
- We can plot the descriptor values vs. log EC50...
- ...and calculate a linear equation for each of the two parameters:
  - $\log EC_{50} = 0.76 - 0.82\,\pi$
  - $\log EC_{50} = 1.14 - 0.07\,\mathrm{MR}$
Our first QSAR equations!
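Each of these equations is a simple least-squares line. A minimal sketch of such a fit follows; the π and log EC50 values are invented for illustration (chosen to lie near the π equation), not the Sandoz data, and `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope = covariance(x, y) / variance(x); both unnormalized here
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the mean point
    return a, b


# Hypothetical descriptor values (pi) and measured log EC50 values
pi = [0.0, 0.5, 1.0, 1.5, 2.0]
log_ec50 = [0.80, 0.33, -0.07, -0.48, -0.88]
a, b = fit_line(pi, log_ec50)
print(round(a, 3), round(b, 3))  # 0.774 -0.834
```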
Slide 11: Introduction to QSAR
- How large are the errors we make?
π equation: $\log EC_{50} = 0.76 - 0.82\,\pi$
Slide 12: Introduction to QSAR
- How large are the errors we make?
MR equation: $\log EC_{50} = 1.14 - 0.07\,\mathrm{MR}$
Slide 13: Introduction to QSAR
- How large are the errors we make?
Actual (measured) log EC50 vs. predicted log EC50
Squared correlation coefficients (R²): π equation 0.88, MR equation 0.53
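The R² values measure how much of the variance in log EC50 an equation explains: R² = 1 - SS_res/SS_tot, which for a least-squares fit with a constant term equals the squared correlation between actual and predicted values. A minimal sketch with hypothetical actual and predicted values:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted log EC50 values
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(actual, predicted)
print(round(r2, 2))  # 0.98
```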
Slide 14: Introduction to QSAR
- Can we do any better by using both parameters in one equation (multiple linear regression, MLR, instead of simple linear regression)?
Best MLR equation: $\log EC_{50} = 0.76 - 0.82\,\pi + 0.0003\,\mathrm{MR}$ (R² = 0.89)
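Multiple linear regression extends the line fit to several descriptors. This sketch solves the normal equations (X^T X) c = X^T y by Gauss-Jordan elimination in plain Python (real work would normally use a numerical library). The helper name `fit_mlr` and the (π, MR) rows are hypothetical, and the targets are generated from a known equation so that the recovered coefficients can be checked.

```python
def fit_mlr(rows, y):
    """Least squares for y = c0 + c1*x1 + c2*x2 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading column of ones gives the constant c0
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[m][i] * X[m][j] for m in range(n)) for j in range(k)]
         + [sum(X[m][i] * y[m] for m in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical (pi, MR) values; targets generated without noise from a known equation
rows = [(0.0, 10.0), (0.5, 20.0), (1.0, 15.0), (1.5, 30.0), (2.0, 25.0)]
y = [0.5 - 0.8 * x1 + 0.02 * x2 for x1, x2 in rows]
coeffs = fit_mlr(rows, y)
print([round(c, 4) for c in coeffs])  # [0.5, -0.8, 0.02]
```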
Slide 15: Introduction to QSAR
- How can we validate these QSAR models?
  - Prediction within the training set (e.g. by leave-one-out cross-validation):
    - leave out each compound once
    - calculate the QSAR model with the remaining compounds only
    - predict the activity of the left-out compound
    - compare the prediction with the "true" affinity
    - calculate the "cross-validated" R² (often reported as Q²)
  - Prediction of the test set
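The leave-one-out steps listed above can be sketched for a single-descriptor model as follows. Q² = 1 - PRESS/SS_tot, where PRESS sums the squared errors of the left-out predictions; the data and function names are hypothetical. Because every compound is predicted by a model that never saw it, Q² comes out lower than R².

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def loo_q2(x, y):
    """Leave-one-out cross-validated R^2: Q2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(x)):
        # Refit without compound i, then predict compound i
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (a + b * x[i])) ** 2
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_tot


# Hypothetical training set
pi = [0.0, 0.5, 1.0, 1.5, 2.0]
log_ec50 = [0.80, 0.33, -0.07, -0.48, -0.88]

a, b = fit_line(pi, log_ec50)
mean_y = sum(log_ec50) / len(log_ec50)
ss_tot = sum((y - mean_y) ** 2 for y in log_ec50)
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(pi, log_ec50))
r2 = 1.0 - ss_res / ss_tot
q2 = loo_q2(pi, log_ec50)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```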
Slide 16: Introduction to QSAR
- How can we validate these QSAR models?
  - Prediction of the training set (cross-validation):
    - $\log EC_{50} = 0.76 - 0.82\,\pi$: R² = 0.88, Q² = 0.72
    - $\log EC_{50} = 1.14 - 0.07\,\mathrm{MR}$: R² = 0.53, Q² = 0.28
    - $\log EC_{50} = 0.76 - 0.82\,\pi + 0.0003\,\mathrm{MR}$: R² = 0.89, Q² = 0.58
Slide 17: Introduction to QSAR
- How can we validate these QSAR models?
  - Prediction of the test set (compound 6i in this example):
    - $\log EC_{50} = 0.76 - 0.82\,\pi$: predicted log EC50 for 6i = 1.56
    - $\log EC_{50} = 1.14 - 0.07\,\mathrm{MR}$: predicted log EC50 for 6i = 0.42
    - $\log EC_{50} = 0.76 - 0.82\,\pi + 0.0003\,\mathrm{MR}$: predicted log EC50 for 6i = 1.57
- Now we have a problem...
Slide 18: Introduction to QSAR
- Some problems associated with "classical" QSAR:
  - only applicable within a chemical series
  - a good training set must be available
  - activity data should be evenly spread
  - activity data should span 3-4 orders of magnitude (log units)
  - meaningful descriptors must be chosen
  - extrapolation problem (e.g. descriptors of test compounds lie outside the descriptor range of the training set)
  - non-linear relationships are hard to detect
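Two of these checks are easy to automate. This sketch flags test descriptors that fall outside the training min-max range (the simplest bounding-box notion of an applicability domain, chosen here for illustration) and reports the span of the training activities in log units. All values and function names are hypothetical.

```python
def outside_training_range(train_descriptors, test_descriptors):
    """Indices of descriptors whose test value lies outside the training min..max range."""
    flagged = []
    for j, value in enumerate(test_descriptors):
        column = [row[j] for row in train_descriptors]
        if not min(column) <= value <= max(column):
            flagged.append(j)
    return flagged


def activity_span(log_activities):
    """Range of the activity data in log units (orders of magnitude)."""
    return max(log_activities) - min(log_activities)


# Hypothetical training set: (pi, MR) per compound, plus log EC50 values
train = [(0.0, 10.0), (0.5, 20.0), (1.0, 15.0), (1.5, 30.0), (2.0, 25.0)]
log_ec50 = [0.80, 0.33, -0.07, -0.48, -0.88]
test_compound = (3.2, 18.0)  # hypothetical: pi lies beyond the training range

flagged = outside_training_range(train, test_compound)
span = activity_span(log_ec50)
if flagged:
    print(f"extrapolation warning for descriptor(s) {flagged}")
if span < 3:
    print(f"activity data span only {span:.2f} log units")
```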
Slide 19: Artificial Neural Nets (ANN) - An alternative way of deriving QSAR models
Slide 20: J. Zupan, J. Gasteiger, Neural Networks in Chemistry and Drug Design, Second Edition, Wiley-VCH, Weinheim, 1999
Slide 21: Artificial Neural Nets
[Figure: number of chemistry-related publications]
Slide 22: Artificial Neural Nets
The "100 Steps Paradox"
                                   Human brain                    Computer
Reaction time of the basic unit    10^-3 s (firing of a neuron)   10^-9 s (clock rate, 500 MHz)
Recognizing the face of a friend   10^-1 s                        10^-1 s
Number of processing steps         100                            100,000,000
- The human brain works in a highly parallel fashion
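The step counts in the table follow from dividing the recognition time by the reaction time of one basic unit; the brain can only afford about 100 sequential steps, so it must compute in parallel:

```latex
\frac{10^{-1}\,\mathrm{s}}{10^{-3}\,\mathrm{s}} = 10^{2}\ \text{steps (brain)},
\qquad
\frac{10^{-1}\,\mathrm{s}}{10^{-9}\,\mathrm{s}} = 10^{8}\ \text{steps (computer)}
```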
Slide 23: Artificial Neural Nets
Visual Cortex of the Human Brain
Slide 24: Artificial Neural Nets
Biological and Artificial Neurons
Slide 25: Artificial Neural Nets
Input: e.g. molecular descriptors; output: e.g. biological activity
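A minimal sketch of the forward pass from descriptors to activity: each artificial neuron forms a weighted sum of its inputs plus a bias and passes it through an activation function (a logistic sigmoid is assumed here). The two inputs, the layer sizes, the function names and all weights are hypothetical; they are not the architecture used in the capsaicin example.

```python
import math


def neuron(inputs, weights, bias):
    """Artificial neuron: weighted sum of inputs plus bias, passed through a logistic sigmoid."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))


def forward(descriptors, hidden_layer, output_weights, output_bias):
    """Feed-forward pass: descriptors -> sigmoid hidden neurons -> one linear output (activity)."""
    hidden = [neuron(descriptors, w, b) for w, b in hidden_layer]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical net: 2 inputs (e.g. scaled pi and MR), 2 hidden neurons, 1 output
hidden_layer = [([1.0, -1.0], 0.0), ([0.5, 0.5], -0.5)]
y = forward([0.5, 0.2], hidden_layer, output_weights=[1.0, 1.0], output_bias=0.0)
print(round(y, 3))  # 1.037
```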
Slide 26: Artificial Neural Nets - Supervised learning
[Figure: supervised learning cycle; labels: "change weights", "predict new compounds"]
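Supervised learning repeatedly compares the net's output with the known activity and changes the weights to reduce the error. This sketch uses online gradient descent with backpropagation for one sigmoid hidden layer and a linear output. The hyperparameters, function names and training data (a hypothetical curved relationship, activity = x²) are assumptions, chosen to show the kind of non-linear relationship that a single linear QSAR equation cannot capture.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(data, n_hidden=3, lr=0.1, epochs=2000, seed=0):
    """Online gradient descent on squared error; returns (MSE before, MSE after, predict)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    c = 0.0

    def predict(x):
        h = [sigmoid(sum(w * xj for w, xj in zip(W[i], x)) + b[i]) for i in range(n_hidden)]
        return sum(vi * hi for vi, hi in zip(v, h)) + c, h

    def mse():
        return sum((predict(x)[0] - y) ** 2 for x, y in data) / len(data)

    start = mse()
    for _ in range(epochs):
        for x, y in data:
            out, h = predict(x)
            err = out - y  # derivative of 0.5 * err**2 with respect to the output
            for i in range(n_hidden):
                # Error backpropagated to hidden unit i (uses the weight before its update)
                delta = err * v[i] * h[i] * (1.0 - h[i])
                v[i] -= lr * err * h[i]
                b[i] -= lr * delta
                W[i] = [w - lr * delta * xj for w, xj in zip(W[i], x)]
            c -= lr * err
    return start, mse(), predict


# Hypothetical one-descriptor training set with a curved descriptor-activity relationship
DATA = [([x], x * x) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
start, end, predict = train(DATA)
print(f"MSE before training: {start:.3f}, after: {end:.3f}")
print(predict([0.25])[0])  # prediction for a new, unseen descriptor value
```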
Slide 27: Artificial Neural Nets
Net architecture ("topology") in the capsaicin example
Slide 28: Artificial Neural Nets
Capsaicin dataset: predicted vs. actual log EC50 using a trained neural net (R² = 0.92)
Slide 29: Artificial Neural Nets
- Prediction of the test set (compound 6i):
  - (Multiple) linear regression predictions:
    - $\log EC_{50} = 0.76 - 0.82\,\pi$: predicted log EC50 for 6i = 1.56
    - $\log EC_{50} = 1.14 - 0.07\,\mathrm{MR}$: predicted log EC50 for 6i = 0.42
    - $\log EC_{50} = 0.76 - 0.82\,\pi + 0.0003\,\mathrm{MR}$: predicted log EC50 for 6i = 1.57
  - Neural net prediction: predicted log EC50 for 6i = 1.05
- And the experimental activity is...
- log EC50 > 3 (totally inactive)!
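Converting the log-unit errors back into fold differences makes the problem concrete. Using only the predictions listed above and the experimental lower bound of log EC50 = 3, each model underestimates the EC50 of 6i (that is, overestimates its potency) by at least the printed factor; the fold difference does not depend on the concentration unit.

```python
# Predicted log EC50 values for compound 6i, as listed above
predictions = {"pi": 1.56, "MR": 0.42, "pi + MR": 1.57, "neural net": 1.05}
measured_lower_bound = 3.0  # experimental log EC50 > 3 (inactive)

folds = {}
for model, log_pred in predictions.items():
    # A difference of d log units corresponds to a 10**d-fold difference in EC50
    folds[model] = 10 ** (measured_lower_bound - log_pred)
    print(f"{model}: EC50 underestimated at least {folds[model]:.0f}-fold")
```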
Slide 30: Artificial Neural Nets - Unsupervised learning
- Projection that preserves the topology of the input space
Slide 31: Artificial Neural Nets - Unsupervised learning
- Molecules exert their activity via complex surface properties
- In a Kohonen neural net, compounds with similar surfaces are placed in neighboring areas of a 2D map
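A minimal sketch of how a Kohonen map arranges inputs: each grid node holds a weight vector, the node closest to an input (the winner) and its grid neighbors are pulled toward that input, and both the learning rate and the neighborhood radius shrink over time. The 4 × 4 grid, the Gaussian neighborhood, the linear decay schedule, the function names and the two clusters of 3-component "surface descriptor" vectors are all hypothetical choices for illustration.

```python
import math
import random


def winner(nodes, x):
    """Grid position of the node whose weight vector is closest to x (best-matching unit)."""
    return min(nodes, key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(nodes[pos], x)))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialized weight vector per grid node
    nodes = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2.0
    total, t = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled presentation order
            frac = t / total
            lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)    # neighborhood shrinks
            br, bc = winner(nodes, x)
            for (r, c), w in nodes.items():
                # Gaussian neighborhood on the 2D grid around the winner
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            t += 1
    return nodes


# Hypothetical surface-descriptor vectors for two structural classes
cluster_a = [[0.10, 0.10, 0.10], [0.15, 0.10, 0.05], [0.05, 0.12, 0.10]]
cluster_b = [[0.90, 0.85, 0.90], [0.95, 0.90, 0.88], [0.88, 0.92, 0.95]]
nodes = train_som(cluster_a + cluster_b)
print("class A maps to", winner(nodes, cluster_a[0]), "class B maps to", winner(nodes, cluster_b[0]))
```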
Slide 32: Artificial Neural Nets - Unsupervised learning
112 dopamine and 60 benzodiazepine agonists embedded among 8,323 structures of unknown activity
Kohonen map (40 × 30)
Slide 33: Artificial Neural Nets - Unsupervised learning
[Figure: Kohonen map regions labelled "benzodiazepine" (50, 39, 43, 22, 49)]
Slide 34: Artificial Neural Nets
Kohonen map: unsupervised learning
Multilayer network: supervised learning
Slide 35: 3D QSAR - Linking QSAR with Molecular Modeling
Slide 36: 3D QSAR
- Two standard methods:
  - CoMFA (Comparative Molecular Field Analysis, Cramer et al.)
  - QuaSAR (Vedani et al.)
Slide 37: 3D QSAR - The CoMFA Approach
Slide 38: 3D QSAR - The CoMFA Approach
For the capsaicin example, CoMFA predicted $\log EC_{50} = -0.21$!
Slide 39: 3D QSAR - The CoMFA Approach