1
Evaluation of Results (classifiers, and beyond)
Biplav Srivastava
  • Sources
  • [Witten & Frank 00] Witten, I.H. and Frank, E.,
    Data Mining: Practical Machine Learning Tools
    and Techniques with Java Implementations,
    Morgan Kaufmann, 2000.
  • [Lim et al. 99] Lim, T.-S., Loh, W.-Y. and Shih,
    Y.-S., A Comparison of Prediction Accuracy,
    Complexity, and Training Time of Thirty-Three
    Old and New Classification Algorithms, Machine
    Learning, forthcoming. (Appendix contains
    complete tables of error rates, ranks, and
    training times; the data sets can be downloaded
    in C4.5 format.)

2
Evaluation of Methods: ideally, find the best
one; practically, find comparable classes
  • Classification accuracy
  • Effect of noise on accuracy
  • Comprehensibility of result
  • Compactness
  • Complexity
  • Training time
  • Scalability with increase in sample size

3
Types of classifiers
  • Decision trees
  • Neural Network
  • Statistical
  • Regression
  • Pi = w0 + w1*x1 + w2*x2 + ... + wm*xm
  • Use regression on the data set: minimize
    Sum(Ci - (w0 + w1*x1 + ... + wm*xm))^2
    to get the weights.
  • Classification
  • Perform a regression for each class (target 1 for
    Ci, 0 otherwise) to get one linear expression per
    class
  • Given test data, evaluate each linear expression
    and choose the class corresponding to the largest
    value
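As a rough illustration of this regression-based classifier, the Python sketch below fits one least-squares indicator regression per class and predicts the class whose linear expression scores highest. It assumes numpy; the function names and data layout are illustrative, not from the presentation.

    import numpy as np

    def fit_regression_classifier(X, y, classes):
        # One least-squares model per class on a 0/1 indicator target
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # column of ones for w0
        weights = {}
        for c in classes:
            target = (y == c).astype(float)              # 1 for class c, 0 otherwise
            w, *_ = np.linalg.lstsq(X1, target, rcond=None)
            weights[c] = w
        return weights

    def predict(X, weights):
        # Evaluate every linear expression and pick the largest
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])
        labels = list(weights)
        scores = np.column_stack([X1 @ weights[c] for c in labels])
        return [labels[i] for i in scores.argmax(axis=1)]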

4
Assumptions in classification
  • Data set is a representative sample
  • The prior probability of each class is
    proportional to its frequency in the training
    sample
  • Changing categories to numerical values
  • If attribute X takes k values C1, C2, ..., Ck,
    use a (k-1)-dimensional vector (d1, d2, ..., dk-1)
    such that di = 1 if X = Ci and di = 0 otherwise.
    For X = Ck, the vector contains all zeros.
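A minimal sketch of this (k-1)-dimensional encoding; the category names in the example are purely illustrative.

    def encode_categorical(value, categories):
        # Map one of k category values to a (k-1)-dimensional 0/1 vector;
        # the last category is represented by the all-zero vector.
        d = [0] * (len(categories) - 1)
        idx = categories.index(value)
        if idx < len(categories) - 1:
            d[idx] = 1
        return d

    # encode_categorical("green", ["red", "green", "blue"]) -> [0, 1]
    # encode_categorical("blue",  ["red", "green", "blue"]) -> [0, 0]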

5
Error Rate of Classifier
  • If data set is large
  • use result from test data
  • test data is usually 1/3 of total data
  • If data size is small
  • K-fold cross validation (usually 10)
  • Divide the data set into K roughly equal sets.
  • Use K-1 sets for training and the remaining set
    for testing
  • Repeat K times and average the error rate
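A minimal sketch of this procedure, assuming hypothetical user-supplied train and error_rate functions (they are not defined in the presentation):

    import random

    def kfold_error(data, train, error_rate, k=10, seed=0):
        data = list(data)
        random.Random(seed).shuffle(data)
        folds = [data[i::k] for i in range(k)]          # K roughly equal sets
        errors = []
        for i in range(k):
            test = folds[i]
            training = [x for j, f in enumerate(folds) if j != i for x in f]
            model = train(training)                     # fit on K-1 folds
            errors.append(error_rate(model, test))      # score on the held-out fold
        return sum(errors) / k                          # average error rate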

6
Error Rate of Classifier (cont)
  • Fold choice can affect the result
  • STRATIFICATION: the class ratio in each fold is
    the same as in the whole data set.
  • Repeat K-fold cross-validation K times to
    overcome random variation in fold choice.
  • Leave-one-out: N-fold cross-validation
  • Bootstrap
  • Training set: select N data items at random with
    replacement. The probability that a given item is
    never selected is (1 - 1/N)^N, approximately 1/e,
    or 0.368
  • Test set = Data set - Training set
  • e = 0.632 * e(test) + 0.368 * e(training)
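The 0.632 bootstrap estimate above could look roughly like the sketch below; train and error_rate are again hypothetical callables, not part of the presentation.

    import random

    def bootstrap_632_error(data, train, error_rate, seed=0):
        rng = random.Random(seed)
        n = len(data)
        train_idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        test_idx = set(range(n)) - set(train_idx)          # ~36.8% of items on average
        training = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        model = train(training)
        return 0.632 * error_rate(model, test) + 0.368 * error_rate(model, training)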

7
Evaluation in Lim et al 99
  • 22 DT, 9 Statistical, 2 NN classifiers
  • Time Measurement
  • Use SPEC marks to rate platforms and scale
    results based on these marks.
  • Error rates
  • Calculation (sketch below)
  • If the test set has more than 1000 cases, use its
    error rate
  • Otherwise, use 10-fold cross-validation
  • Multiple runs of 10-fold cross-validation are not
    used.
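One possible reading of this protocol as code; the helper names (scaled_time, holdout_error, tenfold_cv_error) are invented for illustration and are not from Lim et al.

    def scaled_time(raw_seconds, spec_mark):
        # Normalize a measured training time by the platform's SPEC rating
        return raw_seconds / spec_mark

    def estimated_error(test_set, holdout_error, tenfold_cv_error):
        # Trust the held-out error only when the test set exceeds 1000 cases
        if len(test_set) > 1000:
            return holdout_error(test_set)
        return tenfold_cv_error()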

8
Evaluation in Lim et al 99
  • Acceptable performance
  • If p is the minimum error rate over all
    classifiers, those within one standard error of p
    are accepted (see the sketch after this slide)
  • Standard error of p = sqrt(p(1-p)/N)
  • Statistical significance of error rates
  • Null hypothesis: all algorithms have the same mean
    error rate. Test differences of mean error rates.
    (Hypothesis REJECTED.)
  • Tukey method: a difference between mean error
    rates is significant at the 10% level if they
    differ by more than 0.058.
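A small sketch of this acceptance rule, assuming error_rates maps classifier names to mean error rates and n is the number of test cases:

    import math

    def acceptable_classifiers(error_rates, n):
        p = min(error_rates.values())              # best (minimum) error rate
        se = math.sqrt(p * (1.0 - p) / n)          # standard error of p
        return [name for name, e in error_rates.items() if e <= p + se]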

9
Evaluation in Lim et al 99
  • Rank analysis (no normality assumption)
  • Data Set(i): sort by ascending error rate and
    assign ranks; ties get the average rank (see the
    sketch after this slide), e.g.:
      Error rate: 0.1  0.342  0.5  0.5  0.5  0.543  0.677  0.789
      Rank:         1      2    4    4    4      6      7      8
  • Compute the mean rank across all data sets
  • Statistical significance of ranks
  • The null hypothesis of no difference in mean ranks
    (Friedman test) is REJECTED.
  • A difference in mean ranks greater than 8.7 is
    significant at the 10% level.
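A sketch of this tie-averaged ranking; the commented call reproduces the slide's example.

    def average_ranks(error_rates):
        # 1-based ranks by ascending error rate; tied values share the average rank
        order = sorted(range(len(error_rates)), key=lambda i: error_rates[i])
        ranks = [0.0] * len(error_rates)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and error_rates[order[j + 1]] == error_rates[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1                  # average of positions i..j, 1-based
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks

    # average_ranks([0.1, 0.342, 0.5, 0.5, 0.5, 0.543, 0.677, 0.789])
    #   -> [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0, 8.0]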

10
Evaluation in Lim et al 99
  • Classifiers equivalent to the best one (POL)
    with training time < 10 min
  • 15 found using mean error rate
  • 18 found using mean rank
  • Training times reported as orders of magnitude
    slower than the fastest classifier (10^(x-1) to
    10^x times slower)
  • Decision tree learners (C4.5; FACT, which uses
    statistical tests to choose splits) and
    regression methods train fast
  • Spline-based statistical methods and neural
    networks are slower

11
Evaluation in Lim et al 99
  • Size of trees (in decision trees)
  • Use 10-fold cross-validation
  • Noise typically reduces tree size
  • Scalability with data set size
  • If the data set is small, use bootstrap
    re-sampling (see the sketch below)
  • N items are drawn with replacement
  • The class attribute is randomly changed, with
    probability 0.1, to a value selected uniformly
    from the valid set
  • Otherwise, use the given data set size
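A rough sketch of this noisy bootstrap re-sampling; the (features, label) record layout is an assumption made for illustration.

    import random

    def noisy_bootstrap(records, classes, n, noise=0.1, seed=0):
        rng = random.Random(seed)
        sample = []
        for _ in range(n):
            features, label = rng.choice(records)      # draw with replacement
            if rng.random() < noise:
                label = rng.choice(classes)            # class noise: uniform over valid set
            sample.append((features, label))
        return sample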

12
Evaluation in Lim et al 99
  • log(training time) increases linearly with log(N)
    (see the sketch after this slide)
  • Decision trees usually scale
  • C4.5 rule generation does not scale, but the trees
    themselves do
  • QUEST, FACT (multi-attribute) scale
  • Regression methods usually scale
  • NN methods were not tested
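To check such linear log-log scaling empirically, one could fit a straight line to (log N, log time) points; the measurements in the comment are made up for illustration.

    import math

    def loglog_slope(sizes, times):
        # Least-squares slope of log(time) against log(N)
        xs = [math.log(n) for n in sizes]
        ys = [math.log(t) for t in times]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den

    # loglog_slope([1000, 10000, 100000], [0.5, 5.2, 53.0]) is roughly 1.0,
    # i.e. training time grows about linearly with N in this made-up example.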

13
Lim et al 99 Summary
  • Use method that fits the requirement
  • error rate, training time, ...
  • Use a decision tree with univariate (single
    attribute) splits when the data must be
    interpreted
  • C4.5 trees are big; C4.5 rules don't scale
  • Simple regression methods are good: fast, easy to
    implement, and scalable

14
What more?
  • Precision / Recall (see the sketch below)
  • Cost-based analysis

(Table: costs for predicted vs. ACTUAL classes)
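For reference, precision and recall for a single class of interest can be computed from predicted vs. actual labels along the lines of the sketch below (an illustrative sketch, not part of the original deck):

    def precision_recall(predicted, actual, positive):
        tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
        fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
        fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
        precision = tp / (tp + fp) if tp + fp else 0.0   # fraction of predicted positives that are correct
        recall = tp / (tp + fn) if tp + fn else 0.0      # fraction of actual positives that are found
        return precision, recall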
15
Demonstration from WEKA