Title: Performance Evaluation: Estimation of Recognition rates
1 Performance Evaluation: Estimation of Recognition Rates
Machine Learning Performance Evaluation
- J.-S. Roger Jang
- CSIE Dept., National Taiwan Univ.
- http://mirlab.org/jang
- jang_at_mirlab.org
2 Outline
- Performance indices of a given classifier/model
- Accuracy (recognition rate)
- Computation load
- Methods to estimate the recognition rate
- Inside test
- One-sided holdout test
- Two-sided holdout test
- M-fold cross validation
- Leave-one-out cross validation
3 Synonyms
- The following sets of synonyms will be used interchangeably:
- Classifier, model
- Recognition rate, accuracy
4 Performance Indices
- Performance indices of a classifier
- Recognition rate
- Requires an objective procedure to derive it
- Computation load
- Design-time computation
- Run-time computation
- Our focus
- Recognition rate and the procedures to derive it
- The estimated accuracy depends on
- Dataset
- Model (types and complexity)
5 Methods for Deriving Recognition Rates
- Methods to derive the recognition rates
- Inside test (resubstitution recog. rate)
- One-sided holdout test
- Two-sided holdout test
- M-fold cross validation
- Leave-one-out cross validation
- Data partitioning
- Training set
- Training and test sets
- Training, validating, and test sets
6 Inside Test
- Dataset partitioning
- Use the whole dataset for both training and evaluation
- Recognition rate
- Inside-test recognition rate
- Resubstitution accuracy
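A minimal sketch (toy data and function names are illustrative, not from the slides) of why the inside test is optimistic: with a 1-nearest-neighbor classifier, every training point is its own nearest neighbor, so the resubstitution recognition rate is always 100%.

```python
# Toy 1-NNC on 1-D features; labels are arbitrary class tags.

def one_nn_predict(train, x):
    """Return the label of the training point nearest to x (1-NNC)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def inside_test_rr(train):
    """Resubstitution RR: train and test on the same dataset, in percent."""
    correct = sum(one_nn_predict(train, x) == y for x, y in train)
    return 100.0 * correct / len(train)

data = [(0.1, 'A'), (0.9, 'B'), (0.4, 'A'), (1.3, 'B')]
print(inside_test_rr(data))  # 100.0
```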
7 Inside Test (2)
- Characteristics
- Too optimistic, since the estimated RR tends to be higher than the true RR
- For instance, 1-NNC always has an inside-test RR of 100%!
- Can be used as an upper bound of the true RR
- Potential reasons for low inside-test RR
- Bad features of the dataset
- Bad method for model construction, such as
- Bad results from neural network training
- Bad results from k-means clustering
8 One-sided Holdout Test
- Dataset partitioning
- Training set for model construction
- Test set for performance evaluation
- Recognition rate
- Inside-test RR
- Outside-test RR
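A sketch of the data partitioning for a one-sided holdout test (function and variable names are illustrative): the dataset is shuffled and split once; the model is built on the training set and the outside-test RR is then measured on the held-out test set.

```python
import random

def holdout_split(data, test_ratio=0.3, seed=0):
    """Shuffle and split the dataset into (training set, test set)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = [(i, 'A' if i < 5 else 'B') for i in range(10)]
train, test = holdout_split(data)
print(len(train), len(test))  # 7 3
```

Because a single random split decides which samples the model never sees, the resulting RR estimate varies from split to split, which is the sensitivity the next slide points out.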
9 One-sided Holdout Test (2)
- Characteristics
- Highly affected by how the data is partitioned
- Usually adopted when the design-time computation load is high
10 Two-sided Holdout Test
- Dataset partitioning
- Training set for model construction
- Test set for performance evaluation
- Role reversal
11 Two-sided Holdout Test (2)
- Two-sided holdout test (used in GMDH)
- Outside-test RR = (RR_A + RR_B) / 2
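A sketch of the two-sided holdout average (the majority-class "model" and all names are toy stand-ins, not from the slides): the dataset is split into halves A and B, the model is trained on A and tested on B (giving RR_A), then the roles are reversed (giving RR_B), and the two outside-test RRs are averaged.

```python
from collections import Counter

def train_majority(train):
    """Toy model: remember the majority label of the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]

def rr(model_label, test):
    """Outside-test recognition rate of the toy model, in percent."""
    correct = sum(model_label == y for _, y in test)
    return 100.0 * correct / len(test)

def two_sided_holdout(part_a, part_b):
    """Train on A / test on B, swap roles, and average the two RRs."""
    rr_a = rr(train_majority(part_a), part_b)
    rr_b = rr(train_majority(part_b), part_a)
    return (rr_a + rr_b) / 2

a = [(0, 'A'), (1, 'A'), (2, 'B')]
b = [(3, 'A'), (4, 'B'), (5, 'B')]
print(two_sided_holdout(a, b))
```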
12 Two-sided Holdout Test (3)
- Characteristics
- Better usage of the dataset
- Still highly affected by the partitioning
- Suitable for models/classifiers with high
design-time computation load
13 M-fold Cross Validation
- Data partitioning
- Partition the dataset into m disjoint folds
- Use one fold for test and the other m-1 folds for training
- Repeat m times, with each fold serving as the test set once
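The partitioning above can be sketched as follows (pure Python, names illustrative): the n sample indices are split into m disjoint folds, and each round yields the training/test index pair for one held-out fold.

```python
def m_fold_indices(n, m):
    """Yield (train_idx, test_idx) pairs for m disjoint folds of n samples."""
    # Spread the remainder over the first n % m folds.
    fold_sizes = [n // m + (1 if k < n % m else 0) for k in range(m)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(m_fold_indices(10, 3))
print([len(te) for _, te in folds])  # [4, 3, 3]
```

The outside-test RR is then the average of the m per-fold recognition rates.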
14 M-fold Cross Validation (2)
[Figure: the dataset is partitioned into m disjoint sets; each model k is constructed on the other m-1 folds and evaluated (outside test) on its held-out fold.]
15 M-fold Cross Validation (3)
- Characteristics
- When m = 2 → two-sided holdout test
- When m = n → leave-one-out cross validation
- The value of m depends on the computation load
imposed by the selected model/classifier.
16 Leave-one-out Cross Validation
- Data partitioning
- When m = n, each fold S_i = {(x_i, y_i)} holds a single input/output pair
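A LOOCV sketch using a toy 1-NN classifier (data and names illustrative, not from the slides): with m = n, each round leaves out a single pair (x_i, y_i), so each per-fold outside-test RR is either 0% or 100%, and the final estimate is the average over all n rounds.

```python
def one_nn_label(train, x):
    """Return the label of the training point nearest to x (1-NNC)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def loocv_rr(data):
    """Leave-one-out cross validation RR of 1-NNC, in percent."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # leave pair i out
        correct += one_nn_label(train, x) == y
    return 100.0 * correct / len(data)

data = [(0.0, 'A'), (0.2, 'A'), (1.0, 'B'), (1.2, 'B')]
print(loocv_rr(data))  # 100.0
```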
17 Leave-one-out Cross Validation (2)
[Figure: the dataset of n i/o pairs is partitioned into n singleton folds; each model k is constructed on the other n-1 pairs and evaluated (outside test) on the held-out pair, so each per-fold RR is either 0 or 100%.]
18 Leave-one-out Cross Validation (3)
- General method for LOOCV
- Perform model construction (as a black box) n times → slow!
- To speed up LOOCV
- Construct a common part that will be reused repeatedly, such as the global mean and covariance for the quadratic classifier (QC)
- More info on cross validation is available on Wikipedia
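A minimal illustration of the speed-up idea, for a model that only needs a class mean (the setting is assumed for illustration; the slides' QC example also reuses a covariance in the same spirit): precompute the common part, the global sum, once; each leave-one-out mean is then (total - x_i) / (n - 1) instead of a full recomputation per round.

```python
def loo_means(xs):
    """Leave-one-out means of xs, reusing one precomputed global sum."""
    total = sum(xs)                      # common part, computed once
    n = len(xs)
    return [(total - x) / (n - 1) for x in xs]

xs = [1.0, 2.0, 3.0, 4.0]
print(loo_means(xs))  # first entry: (10 - 1) / 3 = 3.0
```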
19 Applications and Misuse of CV
- Applications of CV
- Input (feature) selection
- Model complexity determination
- Performance comparison among different models
- Misuse of CV
- Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!