Title: Machine Learning in Practice, Lecture 9
1. Machine Learning in Practice: Lecture 9
- Carolyn Penstein Rosé
- Language Technologies Institute / Human-Computer Interaction Institute
2. Plan for the Day
- Announcements
- Questions?
- Assignment 4
- Quiz
- Today's Data Set: Speaker Identification
- Weka helpful hints
- Visualizing Errors for Regression Problems
- Alternative forms of cross-validation
- Creating Train/Test Pairs
- Intro to Evaluation
3. Speaker Identification
4. Today's Data Set: Speaker Identification
5. Preprocessing Speech
Record speech to WAV files. Extract a variety of
acoustic and prosodic features.
6. Predictions: Which algorithm will perform better?
- What previous data set does this remind you of?
7. Notice Ranges and Contingencies
8. Most Predictive Feature
9. Least Predictive Feature
10. What would 1R do?
11. What would 1R do?
Kappa = 0.16
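For concreteness, here is a minimal sketch of running Weka's 1R implementation (weka.classifiers.rules.OneR) programmatically; the file name speakers.arff and the class-last convention are assumptions, not the actual course data:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneRDemo {
        public static void main(String[] args) throws Exception {
            // "speakers.arff" is a placeholder for the exported data set
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // 1R picks the single most predictive attribute and builds
            // one rule branch per value of that attribute
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new OneR(), data, 10, new Random(1));
            System.out.println("Kappa: " + eval.kappa());

            // Retrain on all the data to inspect which attribute the rule uses
            OneR rule = new OneR();
            rule.buildClassifier(data);
            System.out.println(rule);
        }
    }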
12. Weka Helpful Hints
13. Evaluating Numeric Prediction: CPU data
14. Visualizing Classifier Errors for Numeric Prediction
15. Creating Train/Test Pairs
First click here
16. Creating Train/Test Pairs
If you pick unsupervised, you'll get non-stratified folds; otherwise you'll get stratified folds.
17. Stratified versus Non-Stratified
- Weka's standard cross-validation is stratified
- Data is randomized before dividing it into folds
- Preserves the distribution of class values across folds
- Reduces variance in performance
- Unstratified cross-validation means there is no randomization
- Order is preserved
- Advantage for matching predictions with instances in Weka (sketch below)
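Here is a minimal sketch of the difference in Weka's Java API: stratified folding randomizes and then spreads class values evenly across folds, while the unstratified route keeps the original instance order. The file name is a placeholder:

    import java.util.Random;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class FoldDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);
            int folds = 10;

            // Stratified: randomize, then balance class values across folds.
            // Skip these two calls for unstratified folds: the order is
            // preserved, which makes it easy to match predictions to instances.
            data.randomize(new Random(1));
            data.stratify(folds);

            for (int i = 0; i < folds; i++) {
                Instances train = data.trainCV(folds, i);
                Instances test = data.testCV(folds, i);
                System.out.println("Fold " + i + ": " + train.numInstances()
                        + " train / " + test.numInstances() + " test");
            }
        }
    }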
18. Stratified versus Non-Stratified
- Leave-one-out cross-validation
- Train on all but one instance
- Iterate over all instances
- Extreme version of unstratified cross-validation
- If the test set only has one instance, the distribution of class values cannot be preserved
- Maximizes the amount of data used for training on each fold (sketch below)
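In Weka's API, leave-one-out is just cross-validation with the number of folds set to the number of instances. A sketch, with the file name and the choice of J48 as assumptions:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LeaveOneOutDemo {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            // Each fold trains on all but one instance and tests on the
            // single instance that was left out
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, data.numInstances(),
                    new Random(1));
            System.out.println(eval.toSummaryString("Leave-one-out results:", false));
        }
    }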
19. Stratified versus Non-Stratified
- Leave-one-subpopulation-out
- If you have several data points from the same subpopulation, e.g., speech data from the same speaker
- You may have data from the same subpopulation in train and test
- This over-estimates performance because of the overlap between train and test
- When is this not a problem?
- When you can manually make sure that won't happen
- You have to do that by hand (sketch below)
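Weka has no built-in leave-one-subpopulation-out, so the splits do have to be built by hand. A sketch assuming a hypothetical nominal "speaker" attribute marking the subpopulation (this setup only makes sense for tasks where the class is something other than speaker identity):

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class LeaveOneSpeakerOut {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("sessions.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);
            // "speaker" is an assumed id attribute; in a real run you would
            // also remove it from the features so it cannot leak into the model
            int spk = data.attribute("speaker").index();

            // One fold per speaker: all of that speaker's instances form the test set
            for (int s = 0; s < data.attribute(spk).numValues(); s++) {
                Instances train = new Instances(data, 0);
                Instances test = new Instances(data, 0);
                for (int i = 0; i < data.numInstances(); i++) {
                    Instance inst = data.instance(i);
                    if ((int) inst.value(spk) == s) test.add(inst);
                    else train.add(inst);
                }
                J48 tree = new J48();
                tree.buildClassifier(train);
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(tree, test);
                System.out.printf("%s: %.1f%% correct%n",
                        data.attribute(spk).value(s), eval.pctCorrect());
            }
        }
    }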
20. Creating Train/Test Pairs
If you pick unsupervised, you'll get non-stratified folds; otherwise you'll get stratified folds.
21. Creating Train/Test Pairs
Now click here
22. Creating Train/Test Pairs
23. Creating Train/Test Pairs
24. Creating Train/Test Pairs
25. Creating Train/Test Pairs
If you're doing Stratified, make sure you have the class attribute selected here.
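The same fold pair the Explorer builds here can be produced in code with the StratifiedRemoveFolds filter. The option flags below (-N folds, -F fold, -V invert) are written as I understand them, so double-check them against your Weka version:

    import weka.core.Instances;
    import weka.core.Utils;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.instance.StratifiedRemoveFolds;

    public class MakeFoldPair {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            // The class attribute must be set for stratification to work
            data.setClassIndex(data.numAttributes() - 1);

            // Keep fold 1 of 10 as the test set
            StratifiedRemoveFolds testF = new StratifiedRemoveFolds();
            testF.setOptions(Utils.splitOptions("-N 10 -F 1"));
            testF.setInputFormat(data);
            Instances test = Filter.useFilter(data, testF);

            // -V inverts the selection: everything except fold 1 becomes training
            StratifiedRemoveFolds trainF = new StratifiedRemoveFolds();
            trainF.setOptions(Utils.splitOptions("-N 10 -F 1 -V"));
            trainF.setInputFormat(data);
            Instances train = Filter.useFilter(data, trainF);

            System.out.println(train.numInstances() + " train, "
                    + test.numInstances() + " test");
        }
    }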
26. Creating Train/Test Pairs
27. Creating Train/Test Pairs
28. Creating Train/Test Pairs
29. Doing Manual Train/Test
First load the training data on the Preprocess tab
30. Doing Manual Train/Test
Now select Supplied Test Set as the Test Option
31. Doing Manual Train/Test
Then click Set
32. Doing Manual Train/Test
Next, load the test set
33. Doing Manual Train/Test
Then you're all set, so click on Start
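The same manual train/test evaluation can be scripted against Weka's Java API; train.arff and test.arff are placeholder file names, and the two files must share the same header:

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ManualTrainTest {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("train.arff").getDataSet();
            Instances test = new DataSource("test.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);
            test.setClassIndex(test.numAttributes() - 1);

            // Build on the training set only
            J48 tree = new J48();
            tree.buildClassifier(train);

            // Evaluate on the supplied test set
            Evaluation eval = new Evaluation(train);
            eval.evaluateModel(tree, test);
            System.out.println(eval.toSummaryString());
        }
    }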
34. Evaluation Methodology
35. Intro to Chapter 5
- Many techniques illustrated in Chapter 5 (ROC curves, recall-precision curves) don't show up in applied papers
- They are useful for showing trade-offs between properties of different algorithms
- You see them in theoretical machine learning papers
36. Intro to Chapter 5
- Still important to understand what they represent
- The thinking behind the techniques will show up in your papers
- You need to know what your numbers do and don't demonstrate
- They give you a unified framework for thinking about machine learning techniques
- There is no cookie cutter for a good evaluation
37. Confidence Intervals
- Mainly important if there is some question about whether your data set is big enough
- You average your performance over 10 folds, but how certain can you be that the number you got is correct?
- We saw before that performance varies from fold to fold
[Figure: distribution of fold performance scores; x-axis 0-40]
38. Confidence Intervals
- We know that the distribution of categories found in the training set and in the testing set affects performance
- Performance on two different sets will not be the same
- Confidence intervals allow us to say that the probability of the real performance value being within a certain range of the observed value is 90%
39. Confidence Intervals
- Confidence limits come from the normal distribution
- Computed in terms of the number of standard deviations from the mean
- If the data is normally distributed, there is about a 15% chance of the real value being more than 1 standard deviation above the mean (sketch below)
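A small sketch of this reasoning on invented fold accuracies: treat the fold scores as roughly normal and report mean ± 1.65 standard deviations as an approximate 90% interval (more precisely, about 15.9% of a normal distribution lies more than one standard deviation above the mean):

    public class NormalInterval {
        public static void main(String[] args) {
            // Hypothetical accuracies from 10 folds
            double[] acc = {0.71, 0.74, 0.69, 0.76, 0.73,
                            0.70, 0.75, 0.72, 0.74, 0.71};

            double mean = 0;
            for (double a : acc) mean += a;
            mean /= acc.length;

            double var = 0;
            for (double a : acc) var += (a - mean) * (a - mean);
            double sd = Math.sqrt(var / (acc.length - 1)); // sample std deviation

            // z = 1.65 bounds roughly the middle 90% of a normal distribution
            System.out.printf("mean=%.3f sd=%.3f 90%% interval=[%.3f, %.3f]%n",
                    mean, sd, mean - 1.65 * sd, mean + 1.65 * sd);
        }
    }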
40. What is a significance test?
- How likely is it that the difference you see occurred by chance?
- How could the difference occur by chance?
[Figure: two overlapping performance distributions; x-axis 0-40]
If the mean of one distribution is within the confidence interval of the other, the difference you observe could be by chance.
If you want p < .05, you need the 90% confidence intervals. Find the corresponding z-scores in a standard normal distribution table.
41. Computing Confidence Intervals
- A 90% confidence interval corresponds to z = 1.65
- There is a 5% chance that a data point will occur to the right of the rightmost edge of the interval
- f = percentage of successes
- N = number of trials
- p = (f + z²/2N ± z·√(f/N − f²/N + z²/4N²)) / (1 + z²/N)
- Example: f = 75%, N = 1000, c = 90% → [0.727, 0.773]
42. Significance Tests
- If you want to know whether the difference in performance between Approach A and Approach B is significant
- Get performance numbers for A and B on each fold of a 10-fold cross-validation
- You can use the Experimenter, or you can do the computation in Excel or Minitab
- If you use exactly the same folds across approaches, you can use a paired t-test rather than an unpaired t-test (sketch below)
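A sketch of the paired t-test computation itself, the same arithmetic you would do in Excel or Minitab; the per-fold accuracies are invented for illustration:

    public class PairedTTest {
        public static void main(String[] args) {
            // Hypothetical accuracies for approaches A and B on the same 10 folds
            double[] a = {0.71, 0.74, 0.69, 0.76, 0.73, 0.70, 0.75, 0.72, 0.74, 0.71};
            double[] b = {0.68, 0.72, 0.66, 0.74, 0.70, 0.69, 0.71, 0.70, 0.71, 0.69};

            int n = a.length;
            double meanDiff = 0;
            for (int i = 0; i < n; i++) meanDiff += a[i] - b[i];
            meanDiff /= n;

            double var = 0;
            for (int i = 0; i < n; i++) {
                double d = (a[i] - b[i]) - meanDiff;
                var += d * d;
            }
            double sd = Math.sqrt(var / (n - 1));

            // t follows a t-distribution with n - 1 = 9 degrees of freedom;
            // |t| > 2.262 corresponds to p < .05 (two-tailed)
            double t = meanDiff / (sd / Math.sqrt(n));
            System.out.printf("t = %.3f with %d degrees of freedom%n", t, n - 1);
        }
    }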
43. Significance Tests
- Don't forget that you can get a significant result by chance!
- The Experimenter corrects for multiple comparisons
- Significance tests are less important if you have a large amount of data and the difference in performance between approaches is large
44. Using the Experimenter
45. Using the Experimenter
46. Using the Experimenter
47. Using the Experimenter
48. Using the Experimenter
49. Using the Experimenter
50. Using the Experimenter
51. Using the Experimenter
You should add Naïve Bayes, SMO, and J48
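The comparison the Experimenter sets up here can be approximated in code by cross-validating each classifier on identical folds (same random seed), which is also what makes a paired test valid later; speakers.arff is a placeholder:

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareClassifiers {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("speakers.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            Classifier[] learners = {new NaiveBayes(), new SMO(), new J48()};
            for (Classifier c : learners) {
                // The same seed gives every classifier the same folds
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(c, data, 10, new Random(1));
                System.out.printf("%s: %.2f%% correct, kappa %.3f%n",
                        c.getClass().getSimpleName(), eval.pctCorrect(), eval.kappa());
            }
        }
    }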
52. Using the Experimenter
53. Using the Experimenter
Click on Start
54. Using the Experimenter
When it's done, click on Analyze
55. Using the Experimenter
Click File to load the results file you saved
56. Using the Experimenter
57. Do Analysis
Explicitly select default settings here. Then select Kappa here. Then select Perform Test.
58. Do Analysis
The base case is what you are comparing against
59. (No transcript)
60. CSV Output
61. Analyze with Minitab
62. More Complex Statistical Analyses
63. Take Home Message
- We focused on practical, methodological aspects of the topic of evaluation
- We talked about the concepts of confidence intervals and significance tests
- We learned how to create train/test pairs for manual cross-validation, which is useful in preparing for an error analysis
- We also learned how to use the Experimenter to run experiments and significance tests