Title: Cost-Sensitive Classifier Evaluation using Cost Curves
1. Cost-Sensitive Classifier Evaluation using Cost Curves
MLJ 2006
- Robert Holte
- Computing Science Dept.
- University of Alberta
Joint work with Chris Drummond, NRC, Ottawa
Cost Curve Tool programmed by Alden Flatt
2. Classifiers
- A classifier assigns an object to one of a predefined set of categories or classes.
- Example: a metal detector either
- sounds an alarm, or
- stays quiet when someone walks through.
- This talk: only 2 classes, positive and negative.
3. Two Types of Error
- False positive (false alarm), FP: alarm sounds but person is not carrying metal
- False negative (miss), FN: alarm doesn't sound but person is carrying metal
4. 2-class Confusion Matrix
- Reduce the 4 numbers to two rates (sketched below):
- true positive rate: TP = (true positive count) / P
- false positive rate: FP = (false positive count) / N
- Rates are independent of the class ratio, subject to certain conditions
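A minimal sketch of this reduction in Python (the counts are illustrative, chosen to match Classifier 1 on the next slide):

# Reduce a 2-class confusion matrix to the two rates.
tp, fn = 40, 60   # outcomes on the P = 100 positive examples
fp, tn = 30, 70   # outcomes on the N = 100 negative examples

P, N = tp + fn, fp + tn
tp_rate = tp / P  # true positive rate  = TP / P  -> 0.40
fp_rate = fp / N  # false positive rate = FP / N  -> 0.30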
5. Example: 3 Classifiers
Classifier 1: TP = 0.4, FP = 0.3
Classifier 2: TP = 0.7, FP = 0.5
Classifier 3: TP = 0.6, FP = 0.2
6. Assumptions
- Standard Cost Model:
- correct classification costs 0
- cost of misclassification depends only on the class, not on the individual example
- costs are additive over a set of examples
- True FP and TP rates do not vary with time or location, and are accurately estimated.
- Costs and Class Distributions:
- are not known precisely at evaluation time
- may vary with time
- may depend on where the classifier is deployed
7. How to Evaluate Performance?
- Scalar measures summarizing performance:
- Accuracy
- Expected cost
- Area under the ROC curve (AUC)
- Performance visualization techniques:
- ROC curve
- Cost curve
8. Is AUC = 0.95 Better than AUC = 0.75?
When positives outnumber negatives 25:1, the classifier with AUC = 0.95 has more than twice the error rate of the classifier with AUC = 0.75.
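A hypothetical back-of-the-envelope illustration of how this can happen (the operating points below are assumptions, not numbers from the talk): AUC says nothing about where on its ROC curve a classifier actually operates, and at a 25:1 ratio the error rate is dominated by the false-negative rate.

p_pos = 25 / 26   # positives outnumber negatives 25:1
p_neg = 1 - p_pos

def error_rate(tp_rate, fp_rate):
    # expected error = FN rate * p(+) + FP rate * p(-)
    return (1 - tp_rate) * p_pos + fp_rate * p_neg

# Assumed operating points:
err_high_auc = error_rate(0.90, 0.10)  # point on a high-AUC curve -> 0.100
err_low_auc  = error_rate(0.99, 0.80)  # point on a low-AUC curve  -> 0.040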
9. The Key Question: When?
[Figure: performance curves for two classifiers, A and B]
The key question is: when is A better than B?
10. What's Genuinely Good About Scalar Measures?
- we know how to average them, compute confidence intervals, test for significance, etc.
- being one-dimensional leaves the second dimension free for other uses, e.g.:
- Learning curves
- Multiple datasets
- easily generalize to any number of classes
11. Cost Curves
Classifier 1: TP = 0.4, FP = 0.3
Classifier 2: TP = 0.7, FP = 0.5
Classifier 3: TP = 0.6, FP = 0.2
12. Operating Range
13. Lower Envelope
14. Varying a Threshold
[Figure: cost lines swept out as the decision threshold varies, bounded by the trivial classifiers "always negative" and "always positive"]
15. Taking Costs Into Account
Y = FN·X + FP·(1-X). So far, X = p(+), making Y the error rate.
In general, Y is the expected cost normalized to [0,1].
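The construction is easy to sketch in numpy (a sketch under the slide's definitions, not the authors' Cost Curve Tool; the (TP, FP) pairs are the three classifiers from slide 5):

import numpy as np

# Each (TP, FP) point becomes the cost line Y = FN*X + FP*(1 - X).
classifiers = {"C1": (0.4, 0.3), "C2": (0.7, 0.5), "C3": (0.6, 0.2)}

x = np.linspace(0.0, 1.0, 101)   # X axis: p(+), or PC(+) in general
curves = {name: (1 - tp) * x + fp * (1 - x)
          for name, (tp, fp) in classifiers.items()}

# Trivial classifiers: "always negative" (TP = FP = 0) gives Y = X,
# "always positive" (TP = FP = 1) gives Y = 1 - X.
trivial = np.minimum(x, 1 - x)

# Lower envelope (slide 13): best expected cost achievable at each X.
# A classifier's operating range (slide 12) is the X interval where
# its own line lies below the trivial lines.
envelope = np.minimum(trivial, np.min(list(curves.values()), axis=0))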
16. Averaging Cost Curves
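One reason averaging is straightforward here (a sketch continuing the arrays above): the y-value is an expected cost, so the pointwise vertical mean of two cost curves is itself an expected cost, namely that of picking one of the two classifiers at random.

# Pointwise (vertical) average of two cost curves:
avg_curve = (curves["C1"] + curves["C2"]) / 2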
17. Cost Curve Avg. in ROC Space
18. Confidence Interval Example
19. Paired Resampling to Test Statistical Significance
For the 100 test examples in the negative class, the joint outcomes of the two classifiers: both wrong = 30, only classifier1 wrong = 10, only classifier2 wrong = 0, both right = 60.
FP for classifier1 = (30+10)/100 = 0.40
FP for classifier2 = (30+0)/100 = 0.30
FP2 - FP1 = -0.10
Resample this matrix 10000 times to get (FP2-FP1) values. Do the same for the matrix based on positive test examples. Plot and take the 95% envelope as before.
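A minimal bootstrap sketch of this procedure (plain numpy, not the original tool; the counts are the ones above):

import numpy as np

rng = np.random.default_rng(0)

# One row per negative test example:
# (classifier1 wrong?, classifier2 wrong?)
outcomes = np.array([(1, 1)] * 30 +   # both classifiers wrong
                    [(1, 0)] * 10 +   # only classifier1 wrong
                    [(0, 1)] * 0 +    # only classifier2 wrong
                    [(0, 0)] * 60)    # both right

n = len(outcomes)                     # 100 negative examples
diffs = np.empty(10_000)
for i in range(diffs.size):
    sample = outcomes[rng.integers(0, n, size=n)]  # resample with replacement
    fp1, fp2 = sample.mean(axis=0)                 # FP rates in this resample
    diffs[i] = fp2 - fp1                           # centred near -0.10

Doing the same with the positive-class matrix gives matching (FN2-FN1) values; the paired points are then plotted and the 95% envelope taken.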
20. Statistical Significance Example
[Figure: cost curves for classifier1 and classifier2, with the cloud of resampled (FP2-FP1, FN2-FN1) differences and its 95% envelope]
21. Correlation between Classifiers
[Figure: two joint outcome matrices, one with high correlation (the preceding example) and one with low correlation]
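Why correlation matters here (a standard identity, not from the slides): for the paired difference,

Var(FP2 - FP1) = Var(FP1) + Var(FP2) - 2·Cov(FP1, FP2)

so highly correlated errors make the covariance term large, the resampled differences tightly clustered, and even a small difference significant; with low correlation the cloud of differences spreads out and significance is lost.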
22. Low Correlation, Low Significance
[Figure: resampled (FP2-FP1, FN2-FN1) differences for classifier1 and classifier2 when their errors are weakly correlated]
23. Limited Range of Significance
24. Comparing J48 and AdaBoost
25. Multiple Comparisons
26. Learning Curves
27. Conclusions
- Scalar performance measures, including AUC, do not indicate when one classifier is better than another.
- Cost curves enable easy visualization of:
- Average performance (expected cost)
- operating range
- confidence intervals on performance
- difference in performance and its significance
- See the MLJ 2006 paper for all the details.
- Cost/ROC curve software is available. Contact holte@cs.ualberta.ca