1
Evaluation of Results (classifiers, and beyond)
Biplav Srivastava
  • Sources
  • [Witten & Frank 00] Witten, I.H. and Frank, E.,
    Data Mining: Practical Machine Learning Tools
    and Techniques with Java Implementations,
    Morgan Kaufmann, 2000.
  • [Lim et al. 99] Lim, T.-S., Loh, W.-Y. and Shih,
    Y.-S., A Comparison of Prediction Accuracy,
    Complexity, and Training Time of Thirty-Three
    Old and New Classification Algorithms, Machine
    Learning, forthcoming. (Appendix contains
    complete tables of error rates, ranks, and
    training times; the data sets can be downloaded
    in C4.5 format.)

2
Evaluation of Methods: ideally, find the best
one; practically, find comparable classes
  • Classification accuracy
  • Effect of noise on accuracy
  • Comprehensibility of result
  • Compactness
  • Complexity
  • Training time
  • Scalability with increase in sample size

3
Types of classifiers
  • Decision trees
  • Neural Network
  • Statistical
  • Regression
  • Pi = w0 + w1*x1 + w2*x2 + ... + wm*xm
  • Use regression on the data set: minimize
    Sum(Ci - (w0 + w1*x1 + ... + wm*xm))^2
    to get the weights.
  • Classification
  • Perform a regression for each class (target 1 for
    Ci, 0 otherwise) to get one linear expression per
    class
  • Given test data, evaluate each linear expression
    and choose the class corresponding to the largest
    value
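As a rough illustration of this regression-based classifier, the Python sketch below fits one least-squares indicator regression per class and predicts the class whose linear expression scores highest. It assumes numpy; the function names and data layout are illustrative, not from the presentation.

    import numpy as np

    def fit_regression_classifier(X, y, classes):
        # One least-squares model per class on a 0/1 indicator target
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # column of ones for w0
        weights = {}
        for c in classes:
            target = (y == c).astype(float)              # 1 for class c, 0 otherwise
            w, *_ = np.linalg.lstsq(X1, target, rcond=None)
            weights[c] = w
        return weights

    def predict(X, weights):
        # Evaluate every linear expression and pick the largest
        X1 = np.hstack([np.ones((X.shape[0], 1)), X])
        labels = list(weights)
        scores = np.column_stack([X1 @ weights[c] for c in labels])
        return [labels[i] for i in scores.argmax(axis=1)]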

4
Assumptions in classification
  • Data set is a representative sample
  • The prior probability of each class is
    proportional to its frequency in the training
    sample
  • Changing categories to numerical values
  • If attribute X takes k values C1, C2, ..., Ck,
    use a (k-1)-dimensional vector (d1, d2, ..., dk-1)
    such that di = 1 if X = Ci and di = 0 otherwise.
    For X = Ck, the vector contains all zeros.
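A minimal sketch of this (k-1)-dimensional encoding; the category names in the example are purely illustrative.

    def encode_categorical(value, categories):
        # Map one of k category values to a (k-1)-dimensional 0/1 vector;
        # the last category is represented by the all-zero vector.
        d = [0] * (len(categories) - 1)
        idx = categories.index(value)
        if idx < len(categories) - 1:
            d[idx] = 1
        return d

    # encode_categorical("green", ["red", "green", "blue"]) -> [0, 1]
    # encode_categorical("blue",  ["red", "green", "blue"]) -> [0, 0]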

5
Error Rate of Classifier
  • If data set is large
  • use result from test data
  • test data is usually 1/3 of total data
  • If data size is small
  • K-fold cross validation (usually 10)
  • Divide the data set into K roughly equal sets.
  • Use K-1 sets for training and the remaining set
    for testing
  • Repeat K times and average the error rate
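A minimal sketch of this procedure, assuming hypothetical user-supplied train and error_rate functions (they are not defined in the presentation):

    import random

    def kfold_error(data, train, error_rate, k=10, seed=0):
        data = list(data)
        random.Random(seed).shuffle(data)
        folds = [data[i::k] for i in range(k)]          # K roughly equal sets
        errors = []
        for i in range(k):
            test = folds[i]
            training = [x for j, f in enumerate(folds) if j != i for x in f]
            model = train(training)                     # fit on K-1 folds
            errors.append(error_rate(model, test))      # score on the held-out fold
        return sum(errors) / k                          # average error rate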

6
Error Rate of Classifier (cont)
  • Fold choice can affect the result
  • STRATIFICATION: the class ratio in each fold is
    the same as in the whole data set.
  • Repeat K-fold cross-validation K times to
    overcome random variation in fold choice.
  • Leave-one-out: N-fold cross-validation
  • Bootstrap
  • Training set: select N data items at random with
    replacement. The probability that a given item is
    never selected is (1 - 1/N)^N, approximately 1/e,
    or 0.368
  • Test set = Data set - Training set
  • e = 0.632 * e(test) + 0.368 * e(training)
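The 0.632 bootstrap estimate above could look roughly like the sketch below; train and error_rate are again hypothetical callables, not part of the presentation.

    import random

    def bootstrap_632_error(data, train, error_rate, seed=0):
        rng = random.Random(seed)
        n = len(data)
        train_idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        test_idx = set(range(n)) - set(train_idx)          # ~36.8% of items on average
        training = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        model = train(training)
        return 0.632 * error_rate(model, test) + 0.368 * error_rate(model, training)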

7
Evaluation in Lim et al 99
  • 22 DT, 9 Statistical, 2 NN classifiers
  • Time Measurement
  • Use SPEC marks to rate platforms and scale
    results based on these marks.
  • Error rates
  • Calculation (sketch below)
  • If the test set has more than 1000 cases, use its
    error rate
  • Otherwise, use 10-fold cross-validation
  • Multiple runs of 10-fold cross-validation are not
    used.
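One possible reading of this protocol as code; the helper names (scaled_time, holdout_error, tenfold_cv_error) are invented for illustration and are not from Lim et al.

    def scaled_time(raw_seconds, spec_mark):
        # Normalize a measured training time by the platform's SPEC rating
        return raw_seconds / spec_mark

    def estimated_error(test_set, holdout_error, tenfold_cv_error):
        # Trust the held-out error only when the test set exceeds 1000 cases
        if len(test_set) > 1000:
            return holdout_error(test_set)
        return tenfold_cv_error()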

8
Evaluation in Lim et al 99
  • Acceptable performance
  • If p is the minimum error rate over all
    classifiers, those within one standard error of p
    are accepted (see the sketch after this slide)
  • Standard error of p = sqrt(p(1-p)/N)
  • Statistical significance of error rates
  • Null hypothesis: all algorithms have the same mean
    error rate. Test differences of mean error rates.
    (Hypothesis REJECTED.)
  • Tukey method: a difference between mean error
    rates is significant at the 10% level if they
    differ by more than 0.058.
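A small sketch of this acceptance rule, assuming error_rates maps classifier names to mean error rates and n is the number of test cases:

    import math

    def acceptable_classifiers(error_rates, n):
        p = min(error_rates.values())              # best (minimum) error rate
        se = math.sqrt(p * (1.0 - p) / n)          # standard error of p
        return [name for name, e in error_rates.items() if e <= p + se]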

9
Evaluation in Lim et al 99
  • Rank analysis (no normality assumption)
  • Data Set(i): sort by ascending error rate and
    assign ranks; ties get the average rank (see the
    sketch after this slide), e.g.:
      Error rate: 0.1  0.342  0.5  0.5  0.5  0.543  0.677  0.789
      Rank:         1      2    4    4    4      6      7      8
  • Compute the mean rank across all data sets
  • Statistical significance of ranks
  • The null hypothesis of no difference in mean ranks
    (Friedman test) is REJECTED.
  • A difference in mean ranks greater than 8.7 is
    significant at the 10% level.
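A sketch of this tie-averaged ranking; the commented call reproduces the slide's example.

    def average_ranks(error_rates):
        # 1-based ranks by ascending error rate; tied values share the average rank
        order = sorted(range(len(error_rates)), key=lambda i: error_rates[i])
        ranks = [0.0] * len(error_rates)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and error_rates[order[j + 1]] == error_rates[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1                  # average of positions i..j, 1-based
            for k in range(i, j + 1):
                ranks[order[k]] = avg
            i = j + 1
        return ranks

    # average_ranks([0.1, 0.342, 0.5, 0.5, 0.5, 0.543, 0.677, 0.789])
    #   -> [1.0, 2.0, 4.0, 4.0, 4.0, 6.0, 7.0, 8.0]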

10
Evaluation in Lim et al 99
  • Classifiers equivalent to the best one (POL)
    with training time < 10 min
  • 15 found using mean error rate
  • 18 found using mean rank
  • Training times reported as orders of magnitude
    slower than the fastest classifier (10^(x-1) to
    10^x times slower)
  • Decision tree learners (C4.5; FACT, which uses
    statistical tests to choose splits) and
    regression methods train fast
  • Spline-based statistical methods and neural
    networks are slower

11
Evaluation in Lim et al 99
  • Size of trees (in decision trees)
  • Use 10-fold cross-validation
  • Noise typically reduces tree size
  • Scalability with data set size
  • If the data set is small, use bootstrap
    re-sampling (see the sketch below)
  • N items are drawn with replacement
  • The class attribute is randomly changed, with
    probability 0.1, to a value selected uniformly
    from the valid set
  • Otherwise, use the given data set size
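A rough sketch of this noisy bootstrap re-sampling; the (features, label) record layout is an assumption made for illustration.

    import random

    def noisy_bootstrap(records, classes, n, noise=0.1, seed=0):
        rng = random.Random(seed)
        sample = []
        for _ in range(n):
            features, label = rng.choice(records)      # draw with replacement
            if rng.random() < noise:
                label = rng.choice(classes)            # class noise: uniform over valid set
            sample.append((features, label))
        return sample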

12
Evaluation in Lim et al 99
  • log(training time) increases linearly with log(N)
    (see the sketch after this slide)
  • Decision trees usually scale
  • C4.5 rule generation does not scale, but the trees
    themselves do
  • QUEST, FACT (multi-attribute) scale
  • Regression methods usually scale
  • NN methods were not tested
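To check such linear log-log scaling empirically, one could fit a straight line to (log N, log time) points; the measurements in the comment are made up for illustration.

    import math

    def loglog_slope(sizes, times):
        # Least-squares slope of log(time) against log(N)
        xs = [math.log(n) for n in sizes]
        ys = [math.log(t) for t in times]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = sum((x - mx) ** 2 for x in xs)
        return num / den

    # loglog_slope([1000, 10000, 100000], [0.5, 5.2, 53.0]) is roughly 1.0,
    # i.e. training time grows about linearly with N in this made-up example.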

13
Lim et al 99 Summary
  • Use method that fits the requirement
  • error rate, training time, ...
  • Use a decision tree with univariate (single
    attribute) splits when the data must be
    interpreted
  • C4.5 trees are big; C4.5 rules don't scale
  • Simple regression methods are good: fast, easy to
    implement, and scalable

14
What more?
  • Precision / Recall (see the sketch below)
  • Cost-based analysis

(Table: costs for predicted vs. ACTUAL classes)
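For reference, precision and recall for a single class of interest can be computed from predicted vs. actual labels along the lines of the sketch below (an illustrative sketch, not part of the original deck):

    def precision_recall(predicted, actual, positive):
        tp = sum(1 for p, a in zip(predicted, actual) if p == positive and a == positive)
        fp = sum(1 for p, a in zip(predicted, actual) if p == positive and a != positive)
        fn = sum(1 for p, a in zip(predicted, actual) if p != positive and a == positive)
        precision = tp / (tp + fp) if tp + fp else 0.0   # fraction of predicted positives that are correct
        recall = tp / (tp + fn) if tp + fn else 0.0      # fraction of actual positives that are found
        return precision, recall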
15
Demonstration from WEKA