Title: Machine Learning in Practice, Lecture 9
1. Machine Learning in Practice: Lecture 9
- Carolyn Penstein Rosé
- Language Technologies Institute / Human-Computer Interaction Institute
2. Plan for the Day
- Announcements
- Questions?
- Assignment 4
- Quiz
- Today's Data Set: Speaker Identification
- Weka helpful hints
- Visualizing Errors for Regression Problems
- Alternative forms of cross-validation
- Creating Train/Test Pairs
- Intro to Evaluation
3. Speaker Identification
4. Today's Data Set: Speaker Identification
5. Preprocessing Speech
Record speech to WAV files. Extract a variety of
acoustic and prosodic features.
6. Predictions: Which algorithm will perform better?
- What previous data set does this remind you of?
7. Notice Ranges and Contingencies
8. Most Predictive Feature
9. Least Predictive Feature
10. What would 1R do?
11. What would 1R do?
Kappa = 0.16
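For concreteness, here is a minimal sketch of running Weka's 1R implementation (weka.classifiers.rules.OneR) programmatically; the file name speakers.arff and the class-last convention are assumptions, not the actual course data:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneRDemo {
        public static void main(String[] args) throws Exception {
            // "speakers.arff" is a placeholder for the exported data set
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // 1R picks the single most predictive attribute and builds
            // one rule branch per value of that attribute
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new OneR(), data, 10, new Random(1));
            System.out.println("Kappa: " + eval.kappa());

            // Retrain on all the data to inspect which attribute the rule uses
            OneR rule = new OneR();
            rule.buildClassifier(data);
            System.out.println(rule);
        }
    }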
12. Weka Helpful Hints
13. Evaluating Numeric Prediction: CPU data
14. Visualizing Classifier Errors for Numeric Prediction
15. Creating Train/Test Pairs
First click here
16. Creating Train/Test Pairs
If you pick unsupervised, you'll get non-stratified folds; otherwise you'll get stratified folds.
17. Stratified versus Non-Stratified
- Weka's standard cross-validation is stratified
- Data is randomized before dividing it into folds
- Preserves the distribution of class values across folds
- Reduces variance in performance
- Unstratified cross-validation means there is no randomization
- Order is preserved
- Advantage for matching predictions with instances in Weka (sketch below)
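Here is a minimal sketch of the difference in Weka's Java API: stratified folding randomizes and then spreads class values evenly across folds, while the unstratified route keeps the original instance order. The file name is a placeholder:

    import java.util.Random;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class FoldDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);
            int folds = 10;

            // Stratified: randomize, then balance class values across folds.
            // Skip these two calls for unstratified folds: the order is
            // preserved, which makes it easy to match predictions to instances.
            data.randomize(new Random(1));
            data.stratify(folds);

            for (int i = 0; i < folds; i++) {
                Instances train = data.trainCV(folds, i);
                Instances test = data.testCV(folds, i);
                System.out.println("Fold " + i + ": " + train.numInstances()
                        + " train / " + test.numInstances() + " test");
            }
        }
    }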
18. Stratified versus Non-Stratified
- Leave-one-out cross-validation
- Train on all but one instance
- Iterate over all instances
- Extreme version of unstratified cross-validation
- If the test set only has one instance, the distribution of class values cannot be preserved
- Maximizes the amount of data used for training on each fold (sketch below)
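In Weka's API, leave-one-out is just cross-validation with the number of folds set to the number of instances. A sketch, with the file name and the choice of J48 as assumptions:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LeaveOneOutDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // Each fold trains on all but one instance and tests on the
            // single instance that was left out
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, data.numInstances(),
                    new Random(1));
            System.out.println(eval.toSummaryString("Leave-one-out results:", false));
        }
    }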
19. Stratified versus Non-Stratified
- Leave-one-subpopulation-out
- If you have several data points from the same subpopulation, e.g., speech data from the same speaker
- You may have data from the same subpopulation in train and test
- This over-estimates performance because of the overlap between train and test
- When is this not a problem?
- When you can manually make sure that won't happen
- You have to do that by hand (sketch below)
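Weka has no built-in leave-one-subpopulation-out, so the splits do have to be built by hand. A sketch assuming a hypothetical nominal "speaker" attribute marking the subpopulation (this setup only makes sense for tasks where the class is something other than speaker identity):

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LeaveOneSpeakerOut {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("sessions.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);
            // "speaker" is an assumed id attribute; in a real run you would
            // also remove it from the features so it cannot leak into the model
            int spk = data.attribute("speaker").index();

            // One fold per speaker: all of that speaker's instances form the test set
            for (int s = 0; s < data.attribute(spk).numValues(); s++) {
                Instances train = new Instances(data, 0);
                Instances test = new Instances(data, 0);
                for (int i = 0; i < data.numInstances(); i++) {
                    Instance inst = data.instance(i);
                    if ((int) inst.value(spk) == s) test.add(inst);
                    else train.add(inst);
                }
                J48 tree = new J48();
                tree.buildClassifier(train);
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(tree, test);
                System.out.printf("%s: %.1f%% correct%n",
                        data.attribute(spk).value(s), eval.pctCorrect());
            }
        }
    }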
20. Creating Train/Test Pairs
If you pick unsupervised, you'll get non-stratified folds; otherwise you'll get stratified folds.
21. Creating Train/Test Pairs
Now click here
22. Creating Train/Test Pairs
23. Creating Train/Test Pairs
24. Creating Train/Test Pairs
25. Creating Train/Test Pairs
If you're doing Stratified, make sure you have the class attribute selected here.
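The same fold pair the Explorer builds here can be produced in code with the StratifiedRemoveFolds filter. The option flags below (-N folds, -F fold, -V invert) are written as I understand them, so double-check them against your Weka version:

    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.instance.StratifiedRemoveFolds;

    public class MakeFoldPair {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            // The class attribute must be set for stratification to work
            data.setClassIndex(data.numAttributes() - 1);

            // Keep fold 1 of 10 as the test set
            StratifiedRemoveFolds testF = new StratifiedRemoveFolds();
            testF.setOptions(Utils.splitOptions("-N 10 -F 1"));
            testF.setInputFormat(data);
            Instances test = Filter.useFilter(data, testF);

            // -V inverts the selection: everything except fold 1 becomes training
            StratifiedRemoveFolds trainF = new StratifiedRemoveFolds();
            trainF.setOptions(Utils.splitOptions("-N 10 -F 1 -V"));
            trainF.setInputFormat(data);
            Instances train = Filter.useFilter(data, trainF);

            System.out.println(train.numInstances() + " train, "
                    + test.numInstances() + " test");
        }
    }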
26. Creating Train/Test Pairs
27. Creating Train/Test Pairs
28. Creating Train/Test Pairs
29. Doing Manual Train/Test
First load the training data on the Preprocess tab
30. Doing Manual Train/Test
Now select Supplied Test Set as the Test Option
31. Doing Manual Train/Test
Then click Set
32. Doing Manual Train/Test
Next, load the test set
33. Doing Manual Train/Test
Then you're all set, so click on Start
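The same manual train/test evaluation can be scripted against Weka's Java API; train.arff and test.arff are placeholder file names, and the two files must share the same header:

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ManualTrainTest {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("train.arff").getDataSet();
            Instances test = new DataSource("test.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            // Build on the training set only
            J48 tree = new J48();
            tree.buildClassifier(train);

            // Evaluate on the supplied test set
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            System.out.println(eval.toSummaryString());
        }
    }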
34. Evaluation Methodology
35. Intro to Chapter 5
- Many techniques illustrated in Chapter 5 (ROC curves, recall-precision curves) don't show up in applied papers
- They are useful for showing trade-offs between properties of different algorithms
- You see them in theoretical machine learning papers
36. Intro to Chapter 5
- Still important to understand what they represent
- The thinking behind the techniques will show up in your papers
- You need to know what your numbers do and don't demonstrate
- They give you a unified framework for thinking about machine learning techniques
- There is no cookie cutter for a good evaluation
37. Confidence Intervals
- Mainly important if there is some question about whether your data set is big enough
- You average your performance over 10 folds, but how certain can you be that the number you got is correct?
- We saw before that performance varies from fold to fold
[Figure: distribution of fold performance scores; x-axis 0-40]
38. Confidence Intervals
- We know that the distribution of categories found in the training set and in the testing set affects performance
- Performance on two different sets will not be the same
- Confidence intervals allow us to say that the probability of the real performance value being within a certain range of the observed value is 90%
39. Confidence Intervals
- Confidence limits come from the normal distribution
- Computed in terms of the number of standard deviations from the mean
- If the data is normally distributed, there is about a 15% chance of the real value being more than 1 standard deviation above the mean (sketch below)
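A small sketch of this reasoning on invented fold accuracies: treat the fold scores as roughly normal and report mean ± 1.65 standard deviations as an approximate 90% interval (more precisely, about 15.9% of a normal distribution lies more than one standard deviation above the mean):

    public class NormalInterval {
        public static void main(String[] args) {
            // Hypothetical accuracies from 10 folds
            double[] acc = {0.71, 0.74, 0.69, 0.76, 0.73,
                            0.70, 0.75, 0.72, 0.74, 0.71};

            double mean = 0;
            for (double a : acc) mean += a;
            mean /= acc.length;

            double var = 0;
            for (double a : acc) var += (a - mean) * (a - mean);
            double sd = Math.sqrt(var / (acc.length - 1)); // sample std deviation

            // z = 1.65 bounds roughly the middle 90% of a normal distribution
            System.out.printf("mean=%.3f sd=%.3f 90%% interval=[%.3f, %.3f]%n",
                    mean, sd, mean - 1.65 * sd, mean + 1.65 * sd);
        }
    }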
40. What is a significance test?
- How likely is it that the difference you see occurred by chance?
- How could the difference occur by chance?
[Figure: two overlapping performance distributions; x-axis 0-40]
If the mean of one distribution is within the confidence interval of the other, the difference you observe could be by chance.
If you want p < .05, you need the 90% confidence intervals. Find the corresponding z-scores in a standard normal distribution table.
41. Computing Confidence Intervals
- A 90% confidence interval corresponds to z = 1.65
- There is a 5% chance that a data point will occur to the right of the rightmost edge of the interval
- f = percentage of successes
- N = number of trials
- p = (f + z²/2N ± z·√(f/N − f²/N + z²/4N²)) / (1 + z²/N)
- Example: f = 75%, N = 1000, c = 90% → [0.727, 0.773]
42. Significance Tests
- If you want to know whether the difference in performance between Approach A and Approach B is significant
- Get performance numbers for A and B on each fold of a 10-fold cross-validation
- You can use the Experimenter, or you can do the computation in Excel or Minitab
- If you use exactly the same folds across approaches, you can use a paired t-test rather than an unpaired t-test (sketch below)
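A sketch of the paired t-test computation itself, the same arithmetic you would do in Excel or Minitab; the per-fold accuracies are invented for illustration:

    public class PairedTTest {
        public static void main(String[] args) {
            // Hypothetical accuracies for approaches A and B on the same 10 folds
            double[] a = {0.71, 0.74, 0.69, 0.76, 0.73, 0.70, 0.75, 0.72, 0.74, 0.71};
            double[] b = {0.68, 0.72, 0.66, 0.74, 0.70, 0.69, 0.71, 0.70, 0.71, 0.69};

            int n = a.length;
            double meanDiff = 0;
            for (int i = 0; i < n; i++) meanDiff += a[i] - b[i];
            meanDiff /= n;

            double var = 0;
            for (int i = 0; i < n; i++) {
                double d = (a[i] - b[i]) - meanDiff;
                var += d * d;
            }
            double sd = Math.sqrt(var / (n - 1));

            // t follows a t-distribution with n - 1 = 9 degrees of freedom;
            // |t| > 2.262 corresponds to p < .05 (two-tailed)
            double t = meanDiff / (sd / Math.sqrt(n));
            System.out.printf("t = %.3f with %d degrees of freedom%n", t, n - 1);
        }
    }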
43. Significance Tests
- Don't forget that you can get a significant result by chance!
- The Experimenter corrects for multiple comparisons
- Significance tests are less important if you have a large amount of data and the difference in performance between approaches is large
44. Using the Experimenter
45. Using the Experimenter
46. Using the Experimenter
47. Using the Experimenter
48. Using the Experimenter
49. Using the Experimenter
50. Using the Experimenter
51. Using the Experimenter
You should add Naïve Bayes, SMO, and J48
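The comparison the Experimenter sets up here can be approximated in code by cross-validating each classifier on identical folds (same random seed), which is also what makes a paired test valid later; speakers.arff is a placeholder:

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareClassifiers {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            Classifier[] learners = {new NaiveBayes(), new SMO(), new J48()};
            for (Classifier c : learners) {
                // The same seed gives every classifier the same folds
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(c, data, 10, new Random(1));
                System.out.printf("%s: %.2f%% correct, kappa %.3f%n",
                        c.getClass().getSimpleName(), eval.pctCorrect(), eval.kappa());
            }
        }
    }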
52. Using the Experimenter
53. Using the Experimenter
Click on Start
54. Using the Experimenter
When it's done, click on Analyze
55. Using the Experimenter
Click File to load the results file you saved
56. Using the Experimenter
57. Do Analysis
Explicitly select default settings here. Then select Kappa here. Then select Perform Test.
58. Do Analysis
The base case is what you are comparing against
59. (No transcript)
60. CSV Output
61. Analyze with Minitab
62. More Complex Statistical Analyses
63. Take Home Message
- We focused on practical, methodological aspects of the topic of evaluation
- We talked about the concepts of confidence intervals and significance tests
- We learned how to create train/test pairs for manual cross-validation, which is useful in preparing for an error analysis
- We also learned how to use the Experimenter to run experiments and significance tests