ROC Curves - PowerPoint PPT Presentation

1
ROC Curves, Wilcoxon and Mann-Whitney Tests
  • Lindsay Jacks
  • Tutorial Presentation
  • CHL 5210 Categorical Data Analysis
  • October 16th, 2007

2
Outline
  • Binary Classification Model
  • ROC Curve
  • Area under the ROC Curve
  • Nonparametric Methods
  • Mann-Whitney Test
  • Wilcoxon Signed-Rank Test
  • SAS Code

3
Binary Classification Model
  • True Positive: The actual value is positive and
    it is classified as positive
  • False Negative (Type II Error): The actual value
    is positive but it is classified as negative
  • True Negative: The actual value is negative and
    it is classified as negative
  • False Positive (Type I Error): The actual value
    is negative but it is classified as positive
[Figure: Confusion Matrix]

4
Evaluation Metrics
  • True Positive Rate (TPR)
      • TPR = Positives correctly classified / Total
        positives
      • Also called sensitivity
  • False Positive Rate (FPR)
      • FPR = Negatives incorrectly classified / Total
        negatives
      • Equal to 1 - Specificity
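
The two rates can be computed directly from the four confusion-matrix counts. The sketch below is a minimal illustration; the function name `rates` and the counts are hypothetical and are not taken from the presentation.

```python
def rates(tp, fn, fp, tn):
    """Return (TPR, FPR) from confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity: positives correctly classified / total positives
    fpr = fp / (fp + tn)  # 1 - specificity: negatives incorrectly classified / total negatives
    return tpr, fpr


# Hypothetical counts, chosen only for illustration
tpr, fpr = rates(tp=40, fn=10, fp=5, tn=45)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}, specificity = {1 - fpr:.2f}")
```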

5
ROC Curve
  • Receiver Operating Characteristic (ROC) curve
  • A technique for visualizing, organizing and
    selecting classifiers based on their performance
  • Two-dimensional graph in which the TPR is plotted
    on the Y axis and the FPR is plotted on the X
    axis
  • Sensitivity vs. (1 - Specificity)
  • Depicts relative tradeoffs between benefits (true
    positives) and costs (false positives)
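
One way to trace such a curve is to sweep a decision threshold over the classifier's scores and record (FPR, TPR) at each threshold. The sketch below assumes higher scores mean "more likely positive"; the function name `roc_points` and the scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR), one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing classified positive
    for t in sorted(set(scores), reverse=True):
        # Classify as positive every instance whose score is at least t
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical scores and true labels (1 = positive, 0 = negative)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold can only add positives, so both rates are non-decreasing along the sweep and the last point is always (1, 1).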

6
ROC Curve
  • The relationship between sensitivity and
    specificity can be described in the graph below
  • The best possible prediction method produces a
    point in the upper left corner, representing
    100% sensitivity and 100% specificity
  • If a diagnostic procedure has no predictive
    value, the relationship between sensitivity and
    specificity is linear

7
ROC Space
  • Each prediction result or one instance of a
    confusion matrix represents one point in the ROC
    space
  • A completely random guess gives a point along the
    diagonal line (B)
  • Points above the diagonal line (A, C′) indicate
    good classification results
  • Points below the diagonal line (C) indicate
    poor results (worse than random guessing)

8
Area under ROC curve (AUC)
  • The area under the ROC curve depends on the
    overlap of two normal distribution curves (the
    test values for the normal and the diseased
    groups)
  • The greater the overlap of the curves, the
    smaller the area under the ROC curve (the lower
    the predictive power of the test)
  • The area of overlap indicates where the test
    cannot distinguish normal from disease
  • When the normal distribution curves overlap
    totally, the ROC curve turns into a diagonal line

9
Area under ROC curve (AUC)
  • To compare classifiers we may want to reduce the
    ROC performance to a single scalar value
    representing expected performance
  • → Calculate the AUC
  • Since the AUC is a portion of the area of the
    unit square, its value will always be between 0
    and 1
  • However, because random guessing produces the
    diagonal line between (0, 0) and (1, 1), which
    has an area of 0.5, no realistic classifier
    should have an AUC less than 0.5
  • An ideal classifier has an area of 1
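
As a rough sketch of the calculation, the area under an empirical ROC curve can be approximated with the trapezoidal rule over its (FPR, TPR) points. The function name `auc_trapezoid` is hypothetical; the two example curves are the random-guess diagonal and the ideal classifier described above.

```python
def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve given as (FPR, TPR) points."""
    pts = sorted(points)  # order by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # trapezoid between neighbouring points
    return area


diagonal = [(0.0, 0.0), (1.0, 1.0)]          # random guessing
ideal = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # passes through the upper left corner

auc_diagonal = auc_trapezoid(diagonal)
auc_ideal = auc_trapezoid(ideal)
print(auc_diagonal, auc_ideal)
```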

10
Area under ROC curve (AUC)
  • Important statistical property: the AUC is
    equivalent to the probability that the classifier
    will rank a randomly chosen positive instance
    higher than a randomly chosen negative instance
  • This is equivalent to the Mann-Whitney statistic
    (scaled by the number of positive-negative pairs)
  • Comparing two ROC curves
  • The graph represents the areas under two ROC
    curves, A and B. Classifier B has greater area
    and therefore better average performance
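
The probabilistic reading of the AUC can be checked by brute force: compare every positive score with every negative score and count the fraction of pairs in which the positive ranks higher. This is a minimal sketch with made-up scores; counting a tie as half a win is a common convention and is assumed here, and the name `auc_pairwise` is hypothetical.

```python
def auc_pairwise(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # assumed convention: a tie counts as half
    # 'wins' is the Mann-Whitney U count; dividing by the number of pairs gives the AUC
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores for positive and negative instances
pos = [0.9, 0.8, 0.6]
neg = [0.7, 0.4, 0.2]
auc = auc_pairwise(pos, neg)
print(f"AUC = {auc:.3f}")
```

Here 8 of the 9 pairs are ordered correctly, so the AUC is 8/9.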

11
ROC Curve Applications
  • ROC analysis provides a tool to select possibly
    optimal models and to discard suboptimal ones
  • Related to cost/benefit analysis of diagnostic
    decision making
  • Widely used in medicine, radiology, and
    psychology; recently becoming more popular in
    areas like machine learning and data mining
  • The area under the ROC curve is equivalent to the
    Mann-Whitney statistic; however, summarizing the
    ROC curve into a single number loses information
    about the pattern of tradeoffs

12
Nonparametric Methods
  • Parametric methods usually require the use of
    interval- or ratio-scaled data
  • Nonparametric methods provide an alternative
    series of statistical methods that require no or
    very limited assumptions to be made about the
    data
  • Require no assumptions about the population
    probability distributions
  • → Distribution-free methods

13
Mann-Whitney Test
  • Also known as Mann-Whitney-Wilcoxon (MWW) or
    Wilcoxon rank-sum test
  • A nonparametric alternative to the two-sample
    t-test which is based solely on the order in
    which the observations from the two samples fall
  • Method for determining whether there is a
    difference between two populations
  • Requirements
  • Data must be ordinal or continuous measurements
  • The two samples must be independent

14
Mann-Whitney Test
  • Null hypothesis H0: The two populations are
    identical.
  • Process
  • Combine independent samples into one sample
    ($n = n_1 + n_2$)
  • Rank the combined data from lowest to highest
    values, with tied values being assigned the
    average of the tied rankings
  • Compute T, the sum of the ranks for the
    observations in the first sample
  • If the two populations are identical, the average
    rank in the first sample and the average rank in
    the second sample should be close to the same
    value (and so should the rank sums when the
    sample sizes are equal)
  • Compare the observed value of T to the sampling
    distribution of T for identical populations
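
The ranking steps above can be sketched as follows. The helper names `average_ranks` and `rank_sum` and the two samples are hypothetical; tied values receive the average of the ranks they occupy, as described in the process.

```python
def average_ranks(values):
    """1-based ranks from lowest to highest; ties get the average of their ranks."""
    ranks = []
    for v in values:
        less = sum(1 for u in values if u < v)
        equal = sum(1 for u in values if u == v)
        ranks.append(less + (equal + 1) / 2)
    return ranks


def rank_sum(sample1, sample2):
    """T = sum of the ranks of sample1 within the combined sample."""
    combined = list(sample1) + list(sample2)
    ranks = average_ranks(combined)
    return sum(ranks[:len(sample1)])


# Hypothetical measurements; the value 2.5 appears in both samples (a tie)
sample1 = [1.1, 2.5, 3.0, 4.2]
sample2 = [2.5, 3.8, 5.0, 6.1, 7.3]
T = rank_sum(sample1, sample2)
print(f"T = {T}")
```

The two tied observations occupy ranks 2 and 3, so each is assigned 2.5.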

15
Mann-Whitney Test
  • Sampling distribution of T for identical
    populations (under H0)
  • Mean: $\mu_T = \frac{n_1 (n_1 + n_2 + 1)}{2}$
  • Variance: $\sigma_T^2 = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}$
  • Test statistic: $z = \frac{T - \mu_T}{\sqrt{\sigma_T^2}}$,
    which has an asymptotically N(0,1) distribution
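
A minimal sketch of the normal approximation for the rank sum follows. It applies the mean and variance exactly as stated, with no tie or continuity correction (an assumption of this sketch), and the inputs T = 13.5, n1 = 4, n2 = 5 are hypothetical. With samples this small the approximation is rough; an exact test would normally be preferred.

```python
import math


def mann_whitney_z(T, n1, n2):
    """z statistic for the rank sum T of the first sample under H0."""
    mean = n1 * (n1 + n2 + 1) / 2
    var = n1 * n2 * (n1 + n2 + 1) / 12
    return (T - mean) / math.sqrt(var)


# Hypothetical rank sum and sample sizes
z = mann_whitney_z(T=13.5, n1=4, n2=5)
# Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|))
p = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```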

16
Wilcoxon Signed-Rank Test
  • A nonparametric alternative to the paired t-test
    for the case of two related samples or repeated
    measurements on a single sample
  • Method for determining whether there is a
    difference between two populations
  • Requirements
  • Data must be interval measurements
  • Does not require assumptions about the form of
    the distribution of the measurements

17
Wilcoxon Signed-Rank Test
  • Test assumes there is information in the
    magnitudes of the differences between paired
    observations, as well as the signs
  • Null hypothesis H0: The two populations are
    identical.
  • Process
  • Compute the differences between the paired
    observations (discard any differences of zero)
  • Rank the absolute value of the differences from
    lowest to highest, with tied differences being
    assigned the average ranking of their positions
  • Give the ranks the sign of the original
    difference in the data
  • Sum the signed ranks and determine whether the
    sum is significantly different from zero
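
The signed-rank process above can be sketched as follows. The function name `signed_rank_sum` and the paired measurements are hypothetical; zero differences are discarded and tied absolute differences share the average rank, as described.

```python
def signed_rank_sum(x, y):
    """Return (T, n): the sum of signed ranks and the number of nonzero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # discard zero differences
    abs_d = [abs(d) for d in diffs]
    total = 0.0
    for d in diffs:
        # Average rank of |d| among the absolute differences
        less = sum(1 for u in abs_d if u < abs(d))
        equal = sum(1 for u in abs_d if u == abs(d))
        rank = less + (equal + 1) / 2
        total += rank if d > 0 else -rank  # give the rank the sign of the difference
    return total, len(diffs)


# Hypothetical paired measurements on the same six subjects
before = [125, 115, 130, 140, 140, 115]
after = [110, 122, 125, 120, 140, 124]
T, n = signed_rank_sum(before, after)
print(f"T = {T}, n = {n}")
```

The differences are 15, -7, 5, 20, 0, and -9; the zero is dropped, and the signed ranks +4, -2, +1, +5, -3 sum to 5.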

18
Wilcoxon Signed-Rank Test
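  • Sampling distribution of T for identical
    populations (under H0)
  • Mean: $\mu_T = 0$
  • Variance: $\sigma_T^2 = \frac{n(n+1)(2n+1)}{6}$, where n is
    the number of nonzero differences
  • Test statistic: $z = \frac{T}{\sqrt{\sigma_T^2}}$, which has an
    asymptotically N(0,1) distribution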
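
As a small worked sketch of this approximation, the z statistic can be computed from the signed-rank sum and the number of nonzero differences. The inputs T = 5 and n = 5 are hypothetical, no tie correction is applied, and with so few pairs the normal approximation is only indicative.

```python
import math


def signed_rank_z(T, n):
    """z statistic for the signed-rank sum T with n nonzero differences under H0."""
    var = n * (n + 1) * (2 * n + 1) / 6  # variance of T; the mean is 0
    return T / math.sqrt(var)


# Hypothetical signed-rank sum and number of nonzero differences
z = signed_rank_z(T=5, n=5)
print(f"z = {z:.3f}")
```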

19
SAS Code
  • ROC Curve
  • ROCPLOT macro
  • Produces a plot showing the ROC curve associated
    with a fitted binary-response model
  • Plot of the sensitivity against 1-specificity
    values associated with the observations'
    predicted event probabilities
  • You must first run the LOGISTIC procedure to
    fit the desired model

20
SAS Code
  • ROC Curve
  • ROC macro
  • Nonparametric comparison of areas under
    correlated ROC curves
  • Provides point and confidence interval estimates
    of each curve's area and of the pairwise
    differences among the areas
  • Tests of the pairwise differences are also given
  • You must first run the LOGISTIC procedure to
    fit each of the models whose ROC curves are to be
    compared

21
SAS Code
  • Mann-Whitney-Wilcoxon Test
      • PROC NPAR1WAY WILCOXON
      • CLASS variable
      • VAR variable
      • EXACT WILCOXON
  • Wilcoxon Signed-Rank Test
      • PROC UNIVARIATE
      • VAR variable
      • You must first perform a DATA step to create
        the difference; SAS will not calculate the
        difference in PROC UNIVARIATE