QSAR/QSPR Model Development and Validation: Essential for Successful Application and Interpretation

1
QSAR/QSPR Model Development and Validation: Essential for Successful Application and Interpretation
8th Iranian Workshop on Chemometrics, IASBS,
7-9 Feb 2009
Mohsen Kompany-Zareh
2
Content
3
Selwood data: 31 molecules, 53 descriptors
D (31x53), y (31x1)
>> load selwood.txt
>> D = selwood(:,1:end-1);
>> y = selwood(:,end);
4
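Simplest model: Multiple Linear Regression (MLR)
D b = y
b = D^+ y
>> b0 = D\y;
>> yEST = D*b0;
Is the model developed? Has it been validated?
22 of the 53 coefficients are zero! (With 31 molecules and 53 descriptors the system is underdetermined, so the backslash solution leaves 53 - 31 = 22 coefficients at zero.)
[Plot: coefficient vector b0]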
5
Problem

A model that fits the training set very accurately can still predict validation sets poorly.

Such a model is not reliable.
6
External validation
There are many different methods for selecting the members of the training and test sets.
Division into calibration and test sets (here every third molecule is held out):
calD = [D(1:3:end,:); D(2:3:end,:)];
valD = D(3:3:end,:);
caly = [y(1:3:end,:); y(2:3:end,:)];
valy = y(3:3:end,:);
calD, caly -> model development
valD, valy -> validation
b1 = calD\caly;   % model development
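The every-third split above can be sketched in plain Python (the slides themselves use MATLAB). The function name `split_every_third` and the stand-in molecule labels are assumptions made for illustration only; they are not from the presentation.

```python
def split_every_third(rows):
    """Return (calibration, validation); every third row is held out."""
    cal_first = rows[0::3]    # rows 1, 4, 7, ... in 1-based numbering
    cal_second = rows[1::3]   # rows 2, 5, 8, ...
    val = rows[2::3]          # rows 3, 6, 9, ... (held out)
    # Same row order as [D(1:3:end,:); D(2:3:end,:)] on the slide
    return cal_first + cal_second, val


# Stand-in labels for the 31 molecules (hypothetical, not the real data)
molecules = list(range(1, 32))
cal, val = split_every_third(molecules)
print(len(cal), len(val))
```

With 31 rows this leaves 21 molecules for calibration and 10 for validation, so the calibration matrix has fewer rows than the 53 descriptors.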
7
>> calyEST = calD*b1;
>> valyEST = valD*b1;   % external model validation
Not a good prediction for the validation set.
8
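>> calyEST = calD*b1;
>> rmsec1 = sqrt(((caly-calyEST)'*(caly-calyEST))/calDr)   % root mean square error of calibration; calDr = number of rows of calD
RMSEC = 2.9396e-014
>> testyEST = testD*b1;   % external model validation; testD, testy are presumably the held-out set (valD, valy earlier)
>> rmsep1 = sqrt(((testy-testyEST)'*(testy-testyEST))/testDr)   % root mean square error of prediction
RMSEP = 2.2940
Not a good prediction.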
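As a small illustration of the two error measures, here is a plain-Python sketch of the same root-mean-square formula. The helper name `rmse` and the numbers are made up for this example; they are not the Selwood results.

```python
import math


def rmse(actual, estimated):
    """Root mean square error: sqrt(sum of squared residuals / n)."""
    if len(actual) != len(estimated):
        raise ValueError("actual and estimated must have the same length")
    ss = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    return math.sqrt(ss / len(actual))


# Made-up values: a perfect fit on calibration, a poor one on validation
cal_y, cal_est = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
val_y, val_est = [1.0, 2.0], [3.0, 0.0]

rmsec = rmse(cal_y, cal_est)   # error on the data used to build the model
rmsep = rmse(val_y, val_est)   # error on data the model has not seen
print(rmsec, rmsep)
```

A near-zero RMSEC next to a large RMSEP is the pattern the slide reports.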
9
[Figure: residual sum of squares (SS), training vs. test set]
10
[Figure: total variance sum of squares (SS), training vs. test set]
11
Training set: R2 = 1.0000
Test set: q2 = -8.5220
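Both R2 and q2 follow the same pattern, one minus the ratio of residual SS to total SS. The sketch below is an illustration in plain Python with made-up numbers. The function name is hypothetical, and taking the total SS around the mean of the values passed in is an assumption, since the slides do not say which mean is used for the test set.

```python
def one_minus_ss_ratio(actual, estimated):
    """1 - residual SS / total SS (total SS taken around the mean of `actual`)."""
    mean = sum(actual) / len(actual)
    residual_ss = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    total_ss = sum((a - mean) ** 2 for a in actual)
    return 1 - residual_ss / total_ss


# Made-up values, not the Selwood data
train_y, train_est = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
test_y, test_est = [1.0, 2.0, 3.0], [3.0, 2.0, -1.0]

r2 = one_minus_ss_ratio(train_y, train_est)   # perfect fit gives 1.0
q2 = one_minus_ss_ratio(test_y, test_est)     # residual SS > total SS gives a negative value
print(r2, q2)
```

A negative q2 means the model predicts the test set worse than simply using the mean of the test responses.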
12
Training set
Internal validation
Cross validation
Leave-one-out
13
Training set
14
[Diagram: model development and validation on successive subsamples, accumulating cumPRESS]
Number of subsamples = number of molecules in the training set
15
LOO CV
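% X, y: training-set descriptors and responses; Dr = number of rows of X
for i = 1:Dr
    calX = [X(1:i-1,:); X(i+1:Dr,:)];   % leave molecule i out
    valX = X(i,:);
    caly = [y(1:i-1,:); y(i+1:Dr,:)];
    valy = y(i,:);
    b = (calX\caly)';                   % row vector of coefficients
    valyEST(i) = valX*b';               % predict the left-out molecule
    press(i) = ((valyEST(i)-valy).^2)';
end
cumpress = sum(press)
rmsecv = sqrt(cumpress/Dr)
q2LOO = 1-((y-valyEST')'*(y-valyEST'))/((y-mean(y))'*(y-mean(y)))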
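The leave-one-out loop can also be sketched in plain Python. To stay self-contained this sketch swaps the multi-descriptor regression for a single descriptor fitted through the origin. That simplification, the function names, and the toy data are assumptions for illustration and not the author's model.

```python
import math


def fit_through_origin(x, y):
    """Least-squares slope b for y ~ b*x (one descriptor, no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def loo_cv(x, y):
    """Leave-one-out cross-validation; returns (RMSECV, q2 LOO)."""
    n = len(y)
    predicted = []
    for i in range(n):
        cal_x = x[:i] + x[i + 1:]          # calibration set without sample i
        cal_y = y[:i] + y[i + 1:]
        b = fit_through_origin(cal_x, cal_y)
        predicted.append(b * x[i])         # predict the left-out sample
    press = sum((p - yi) ** 2 for p, yi in zip(predicted, y))
    mean_y = sum(y) / n
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    return math.sqrt(press / n), 1 - press / total_ss


# Toy data lying exactly on y = 2x, so every left-out point is predicted well
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
rmsecv, q2_loo = loo_cv(x, y)
print(rmsecv, q2_loo)
```

On data this clean the cumulative PRESS vanishes; an overfitted model shows the opposite, a large PRESS and a q2 far below R2.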
16
q2LOO = -4.8574
RMSECV = 2.0397
>> q2ASYMPTOT = 1-(1-R2)*(calDr/(calDr-calDc))^2
q2ASYMPTOT = 1.0000
>> if q2LOO-q2ASYMPTOT<0.005, disp('reject'), end
REJECT
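The asymptotic check can be written as two small functions. This is a sketch in plain Python that reuses the slide's formula and its 0.005 threshold. The function names are hypothetical, and the matrix size 21x53 is an assumption taken from the every-third split of the 31x53 data.

```python
def q2_asymptotic(r2, n_rows, n_cols):
    """1 - (1 - R2) * (n / (n - p))^2, the formula shown on the slide."""
    if n_rows == n_cols:
        raise ValueError("n_rows must differ from n_cols")
    return 1 - (1 - r2) * (n_rows / (n_rows - n_cols)) ** 2


def reject_by_asymptotic_rule(q2_loo, r2, n_rows, n_cols, threshold=0.005):
    """True when q2 LOO minus the asymptotic q2 falls below the threshold."""
    return q2_loo - q2_asymptotic(r2, n_rows, n_cols) < threshold


# Values reported on the slides: R2 = 1.0000, q2LOO = -4.8574
q2_asym = q2_asymptotic(1.0, 21, 53)
rejected = reject_by_asymptotic_rule(-4.8574, 1.0, 21, 53)
print(q2_asym, rejected)
```

With R2 equal to 1 the asymptotic value is 1 whatever the matrix size, so a strongly negative q2LOO fails the check.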
17
QUIK
4 correlated descriptors
M =
   2  1  1  1
   4  2  2  2
   6  3  3  3
   8  4  4  4
  10  5  5  5
y =
  10
  20
  30
  40
  50
>> corr(M)
ans =
  1 1 1 1
  1 1 1 1
  1 1 1 1
  1 1 1 1
>> p = size(M,2);
>> CorrEV = svds(corr(M),p);
It seems possible to use svd(M)
18
>> K = sum(abs((CorrEV/sum(CorrEV))-(1/p)))/(2*(p-1)/p)
All in a function:
>> KM = QUIK(M)
KM = 1.0000
Maximum correlation between descriptors
>> KMY = QUIK([M y])
KMY = 1.0000
>> if KMY-KM<0.05, disp('reject'), else, disp('NOT reject'), end
REJECT
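The K index and the rejection rule can be sketched in plain Python, taking the eigenvalues of the correlation matrix as input rather than computing them. The function names are hypothetical. The eigenvalue lists rest on one fact: a p x p correlation matrix filled with ones (perfectly correlated columns) has one eigenvalue equal to p and the rest equal to zero.

```python
def k_index(eigenvalues):
    """K = sum(|ev/sum(ev) - 1/p|) / (2*(p-1)/p), as on the slide."""
    p = len(eigenvalues)
    if p < 2:
        raise ValueError("need at least two eigenvalues")
    total = sum(eigenvalues)
    spread = sum(abs(ev / total - 1 / p) for ev in eigenvalues)
    return spread / (2 * (p - 1) / p)


def quik_reject(k_x, k_xy, threshold=0.05):
    """QUIK rule from the slide: reject when K_XY - K_X is below the threshold."""
    return k_xy - k_x < threshold


km = k_index([4.0, 0.0, 0.0, 0.0])           # 4 perfectly correlated descriptors
kmy = k_index([5.0, 0.0, 0.0, 0.0, 0.0])     # descriptors plus y, all perfectly correlated
k_uncorrelated = k_index([1.0, 1.0, 1.0, 1.0])  # identity correlation matrix

print(km, kmy, k_uncorrelated, quik_reject(km, kmy))
```

K runs from 0 for uncorrelated columns to 1 for perfectly correlated ones. Here adding y does not raise K above the descriptors' own correlation, so the model is rejected, as on the slide.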
