Title: The Receiver Operating Characteristic (ROC) Curve
1 The Receiver Operating Characteristic (ROC) Curve
- EPP 245
- Statistical Analysis of Laboratory Data
2 Binary Classification
- Suppose we have two groups, that each case is a member of one or the other, and that we know the correct classification (truth).
- Suppose we have a prediction method that produces a single numerical value, and that small values of that number suggest membership in group 1 and large values suggest membership in group 2.
3
- If we pick a cutpoint t, we can assign any case with a predicted value ≤ t to group 1 and the others to group 2.
- For that value of t, we can compute the number correctly assigned to group 2 and the number incorrectly assigned to group 2 (true positives and false positives).
- For t small enough, all will be assigned to group 2, and for t large enough, all will be assigned to group 1.
- The ROC curve is a plot of true positives vs. false positives, each expressed as a proportion of its own group, as t varies.
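The cutpoint sweep described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: the function name `roc_points`, the 1/2 group labels, and the toy scores are assumptions made for the example.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cutpoint.

    labels: 1 for group 1, 2 for group 2 (group 2 is treated as "positive").
    A case is assigned to group 2 when its score is greater than the cutpoint t.
    """
    n_pos = sum(1 for g in labels if g == 2)
    n_neg = sum(1 for g in labels if g == 1)
    # Cutpoints: one below every score (everything is assigned to group 2),
    # then each distinct observed score in increasing order.
    cutpoints = [float("-inf")] + sorted(set(scores))
    points = []
    for t in cutpoints:
        tp = sum(1 for s, g in zip(scores, labels) if s > t and g == 2)
        fp = sum(1 for s, g in zip(scores, labels) if s > t and g == 1)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy, made-up values for illustration only.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [1, 1, 2, 1, 2, 2]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The first point corresponds to a cutpoint below every score (all cases assigned to group 2) and the last to a cutpoint at the largest score (all cases assigned to group 1), matching the two extremes in the bullets.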
4 Juul's IGF data
Description
The 'juul' data frame has 1339 rows and 6 columns. It contains a reference sample of the distribution of insulin-like growth factor (IGF-I), one observation per subject in various ages, with the bulk of the data collected in connection with school physical examinations.
Variables
- age: a numeric vector (years).
- menarche: a numeric vector. Has menarche occurred (code 1: no, 2: yes)?
- sex: a numeric vector (1: boy, 2: girl).
- igf1: a numeric vector. Insulin-like growth factor (μg/l).
- tanner: a numeric vector. Codes 1-5: stages of puberty a.m. Tanner.
- testvol: a numeric vector. Testicular volume (ml).
5 Predicting Menarche
- Subset Juul data to only females between 8 and 20 years old
- Predict menarche from age as a quantitative variable and Tanner score as a qualitative variable using dummy variables
- Menarche re-coded to be 0/1
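The data preparation steps above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `recode_menarche` and `tanner_dummies` are hypothetical, the rows are made up rather than taken from the juul data, and Tanner stage 1 is assumed to be the reference level because the model that follows uses only tan2 through tan5.

```python
def recode_menarche(code):
    """Map the original coding (1 = no, 2 = yes) to 0/1."""
    return code - 1


def tanner_dummies(stage):
    """Indicator variables for Tanner stages 2-5; stage 1 is the reference level."""
    return {f"tan{k}": int(stage == k) for k in range(2, 6)}


# Made-up rows (age, sex, menarche code, Tanner stage), not from the juul data.
rows = [(9.5, 2, 1, 1), (12.0, 2, 1, 3), (15.2, 2, 2, 5), (14.0, 1, 1, 4)]

prepared = []
for age, sex, menarche, tanner in rows:
    # Keep females (sex == 2) aged 8 to 20. Whether the endpoints were
    # included in the original analysis is not stated, so this is an assumption.
    if sex == 2 and 8 <= age <= 20:
        record = {"age": age, "men1": recode_menarche(menarche)}
        record.update(tanner_dummies(tanner))
        prepared.append(record)

for record in prepared:
    print(record)
```

With a reference level, a case at Tanner stage 1 has all four indicators equal to 0, so each tan coefficient compares that stage with stage 1.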
6
. logistic men1 age tan2 tan3 tan4 tan5

Logistic regression                               Number of obs   =        519
                                                  LR chi2(5)      =     568.74
                                                  Prob > chi2     =     0.0000
Log likelihood = -75.327218                       Pseudo R2       =     0.7906

------------------------------------------------------------------------------
        men1 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3.944062   .7162327     7.56   0.000     2.762915    5.630151
        tan2 |   .0444044   .0486937    -2.84   0.005     .0051761    .3809341
        tan3 |   .1369598    .095596    -2.85   0.004     .0348712    .5379227
        tan4 |   .6969611   .3898228    -0.65   0.519     .2328715    2.085935
        tan5 |   9.169558   7.638664     2.66   0.008     1.791671     46.9287
------------------------------------------------------------------------------

. predict pmen
(option p assumed; Pr(men1))

. predict pmen1, xb
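The two predict commands save the fitted probability (pmen) and the linear predictor on the log-odds scale (pmen1). The two are related by the logistic function, which is strictly increasing, so a cutpoint on either scale ranks the cases the same way. The sketch below illustrates this with hypothetical linear-predictor values; `inv_logit` is a name chosen for the example, not something from the Stata session.

```python
import math


def inv_logit(xb):
    """Convert a linear predictor (log odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


# Hypothetical linear-predictor values, for illustration only.
xb_values = [2.0, -3.0, 0.0, -0.5]
p_values = [inv_logit(xb) for xb in xb_values]

# The transform is strictly increasing, so ordering the cases by xb
# or by p gives the same ranking.
order_by_xb = sorted(range(len(xb_values)), key=lambda i: xb_values[i])
order_by_p = sorted(range(len(p_values)), key=lambda i: p_values[i])
print(order_by_xb == order_by_p)
print([round(p, 4) for p in p_values])
```

Because the ranking is unchanged, an ROC curve built from the probabilities coincides with one built from the linear predictor; only the histograms of the two quantities look different.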
7
. histogram pmen
. graph export pmenhist.wmf
. histogram pmen if men1==0, title("Pre-Menarch")
. graph export pmenhist0.wmf
. histogram pmen if men1==1, title("Post-Menarch")
. graph export pmenhist1.wmf
. histogram pmen1
. graph export pmen1hist.wmf
. hist pmen1 if men1==0, title("Pre-Menarche")
. graph export pmen1hist0.wmf
. hist pmen1 if men1==1, title("Post-Menarche")
. graph export pmen1hist1.wmf
. lroc

Logistic model for men1

number of observations =      519
area under ROC curve   =   0.9867

. graph export pmenroc.wmf
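The area under the ROC curve reported by lroc has a direct interpretation: it is the proportion of (group 1, group 2) pairs in which the group 2 case receives the larger predicted value, with ties counted as one half. The sketch below computes it that way on toy, made-up values; the function name `auc` and the 1/2 labels are assumptions for the example, and this is not a description of how Stata computes the figure internally.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for group 1, 2 for group 2. Returns the fraction of
    (group 1, group 2) pairs in which the group 2 case has the larger
    score, counting ties as one half.
    """
    neg = [s for s, g in zip(scores, labels) if g == 1]
    pos = [s for s, g in zip(scores, labels) if g == 2]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [1, 1, 2, 1, 2, 2]
print(auc(scores, labels))  # 8 of the 9 pairs are correctly ordered
```

A value of 0.5 corresponds to a predictor that orders pairs no better than chance, and 1.0 to perfect separation of the two groups.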