Title: Formal Evaluation Techniques
7.1 What Should Be Evaluated?
- Supervised Model
- Training Data
- Attributes
- Model Builder
- Parameters
- Test Set Evaluation
7.2 Tools for Evaluation
Single-Valued Summary Statistics
- Mean
- Variance
- Standard deviation
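These three summary statistics can be computed in a few lines. A minimal Python sketch (the sample data below is hypothetical):

```python
import math

def summary_stats(values):
    """Return the mean, sample variance, and standard deviation of a sample."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance: average squared deviation from the mean,
    # dividing by n - 1 for the unbiased estimate.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance, math.sqrt(variance)

# Hypothetical sample data.
mean, var, sd = summary_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```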
The Normal Distribution
Normal Distributions and Sample Means
- A distribution of means taken from random sets of independent samples of equal size is distributed normally.
- Any sample mean will vary less than two standard errors from the population mean 95% of the time.
Computing the Standard Error
- The variance of the sampling distribution of the mean is estimated by dividing the sample variance by the sample size.
- The standard error is computed by taking the square root of this estimated variance.
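The two steps above can be sketched directly in Python (the sample variance and size below are hypothetical values):

```python
import math

def standard_error(sample_variance, n):
    """Divide the sample variance by the sample size, then take the square root."""
    estimated_variance = sample_variance / n  # variance of the sample mean
    return math.sqrt(estimated_variance)

# Hypothetical inputs: sample variance 0.09 measured on 100 instances.
se = standard_error(sample_variance=0.09, n=100)  # approximately 0.03
```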
A Classical Model for Hypothesis Testing
7.3 Computing Test Set Confidence Intervals
Computing 95% Test Set Confidence Intervals for the Whole Population
- Given a test set sample S of size n and error rate E
- Compute the sample variance as V = E(1 - E)
- Compute the standard error (SE) as the square root of V divided by n
- Calculate the upper bound error as E + 2(SE)
- Calculate the lower bound error as E - 2(SE)
- E = 10%, i.e. E = 0.1, with n = 100
- Variance = 0.1(1 - 0.1) = 0.09
- SE = (0.09/100)^(1/2) = 0.03
- We can be 95% confident that the actual test set error rate lies somewhere between 2 SE above and 2 SE below 0.1. The actual test set error rate is between 0.04 and 0.16.
- Test set accuracy is between 84% and 96%
- If the number of instances is increased, the width of the confidence interval (on test set accuracy) decreases.
- If n = 1000,
- SE = (0.09/1000)^(1/2) is approximately 0.0095
- Test set accuracy is between roughly 88% and 92%
Cross Validation
- If the test data size is small, apply cross-validation.
- Cross-validation
- Available data is partitioned into n equal-size units
- For the ith unit, where i = 1, ..., n, the other n - 1 units are used for training and the ith unit is used for testing, giving accuracy ai
- Model accuracy = average(ai)
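The partition-and-rotate procedure can be sketched as follows. The `evaluate` function is a hypothetical stand-in; a real run would build and test an actual model in each fold:

```python
def cross_validate(data, n_folds, evaluate):
    """n-fold cross-validation: each unit serves exactly once as the test set.

    `evaluate(train, test)` must return an accuracy a_i for one fold;
    here it is supplied by the caller as a stand-in for model building.
    """
    fold_size = len(data) // n_folds
    folds = [data[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    accuracies = []
    for i in range(n_folds):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        accuracies.append(evaluate(train, test))
    # Model accuracy = average of the per-fold accuracies.
    return sum(accuracies) / n_folds

# Hypothetical evaluator that always reports 80% accuracy.
acc = cross_validate(list(range(20)), 5, lambda train, test: 0.8)
```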
Bootstrapping
- Let the training set selection process choose the same training instance several times
- Select n items from among n items, with duplicates allowed
- After n selections, the training set contains approximately 2/3 of the n instances
- The remaining 1/3 is used for testing
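A minimal sketch of the bootstrap split using Python's random module (the roughly 2/3 coverage comes from 1 - 1/e, about 0.632):

```python
import random

def bootstrap_split(data, rng=random):
    """Select n items from n items with replacement (duplicates allowed).

    Items never chosen form the test set; on average the training set
    covers about 1 - 1/e (roughly 2/3) of the instances.
    """
    n = len(data)
    train = [rng.choice(data) for _ in range(n)]
    chosen = set(train)
    test = [x for x in data if x not in chosen]
    return train, test

random.seed(0)  # fixed seed so the split is reproducible
train, test = bootstrap_split(list(range(1000)))
```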
7.4 Comparing Supervised Learner Models
- Null Hypothesis: There is no significant difference in the test set error rates of two supervised learner models built with the same training data.
Comparing Models with Independent Test Data
- P = |E1 - E2| / sqrt(q(1 - q)(1/n1 + 1/n2))
- where
- E1 = the error rate for model M1
- E2 = the error rate for model M2
- q = (E1 + E2)/2
- n1 = the number of instances in test set A
- n2 = the number of instances in test set B
- q(1 - q) = the combined variance
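Assuming the statistic takes the classical two-proportion form P = |E1 - E2| / sqrt(q(1 - q)(1/n1 + 1/n2)), a Python sketch:

```python
import math

def compare_independent(e1, e2, n1, n2):
    """Test statistic for two error rates measured on independent test sets."""
    q = (e1 + e2) / 2        # combined error rate
    variance = q * (1 - q)   # combined variance
    return abs(e1 - e2) / math.sqrt(variance * (1 / n1 + 1 / n2))

p = compare_independent(0.2, 0.3, 100, 100)
significant = p >= 1.96  # compare against the 95% critical value
```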
Comparing Models with a Single Test Dataset
- P = |E1 - E2| / sqrt(q(1 - q)(2/n))
- where
- E1 = the error rate for model M1
- E2 = the error rate for model M2
- q = (E1 + E2)/2
- n = the number of test set instances
Comparing Models with a Single Test Dataset: Example
- Test M1 with set A and M2 with set B, 100 instances each. M1 has 80% accuracy; M2 has 70% accuracy.
- We wish to know if M1 has performed significantly better than M2.
- E1 = 0.2, E2 = 0.3, q = 0.25, combined variance q(1 - q) = 0.1875
- P = 1.633 < 1.96, so there is no significant difference between M1 and M2
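The arithmetic above can be checked with a short sketch, assuming the single-test-set form P = |E1 - E2| / sqrt(q(1 - q)(2/n)):

```python
import math

def compare_single_test_set(acc1, acc2, n):
    """Test statistic for two models evaluated on one test set of size n."""
    e1, e2 = 1 - acc1, 1 - acc2  # error rates from accuracies
    q = (e1 + e2) / 2
    variance = q * (1 - q)       # combined variance (0.1875 in this example)
    return abs(e1 - e2) / math.sqrt(variance * (2 / n))

p = compare_single_test_set(0.80, 0.70, 100)
# p falls below the 95% critical value of 1.96: no significant difference.
```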
7.5 Attribute Evaluation
Locating Redundant Attributes with Excel
- Correlation Coefficient
- Positive Correlation
- Negative Correlation
- Curvilinear Relationship
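Outside Excel, the same correlation coefficient (Pearson's r) can be computed directly. A sketch with hypothetical attribute values:

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient between two numeric attributes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical attributes: a perfect positive and a perfect negative relationship.
r_pos = correlation([1, 2, 3, 4], [2, 4, 6, 8])  # close to +1
r_neg = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # close to -1
```

Values near +1 or -1 flag a redundant attribute pair; note that r only captures linear association, so a curvilinear relationship can yield a small r despite a strong dependency.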
Creating a Scatterplot Diagram with MS Excel
Hypothesis Testing for Numerical Attribute Significance
7.6 Unsupervised Evaluation Techniques
- Unsupervised Clustering for Supervised Evaluation
- Supervised Evaluation for Unsupervised Clustering
- Additional Methods
7.7 Evaluating Supervised Models with Numeric Output
Mean Squared Error
- MSE = (1/n) * sum over i of (ai - ci)^2
- where for the ith instance,
- ai = actual output value
- ci = computed output value
Mean Absolute Error
- MAE = (1/n) * sum over i of |ai - ci|
- where for the ith instance,
- ai = actual output value
- ci = computed output value
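Both measures can be sketched in a few lines (the actual/computed output values below are hypothetical):

```python
def mean_squared_error(actual, computed):
    """Average of squared differences between actual and computed outputs."""
    return sum((a - c) ** 2 for a, c in zip(actual, computed)) / len(actual)

def mean_absolute_error(actual, computed):
    """Average of absolute differences between actual and computed outputs."""
    return sum(abs(a - c) for a, c in zip(actual, computed)) / len(actual)

# Hypothetical outputs for four test instances.
actual   = [3.0, 5.0, 2.5, 7.0]
computed = [2.5, 5.0, 4.0, 8.0]
mse = mean_squared_error(actual, computed)   # (0.25 + 0 + 2.25 + 1) / 4 = 0.875
mae = mean_absolute_error(actual, computed)  # (0.5 + 0 + 1.5 + 1) / 4 = 0.75
```

MSE penalizes large individual errors more heavily than MAE because the differences are squared before averaging.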