Title: MSIS 563: Chapter 7 Numerical Prediction
1. MSIS 563 Chapter 7: Numerical Prediction
- Deb Dey
- Professor and McCabe Fellow of Information Systems
- Faculty Director, MSIS Program, UW Business School
2. Why Numerical Prediction
- Traditional classifiers assume a discrete goal variable
- They cannot be used directly on data sets with a continuous goal variable
- Of course, the goal variable can be discretized
- However, that may not be appropriate in some situations
- Numerical prediction techniques can return a numerical value for the goal variable
- Prediction related to economic growth, weather, market conditions
3. Linear Regression
- Fit a multi-dimensional line
- The goal (G) is the dependent variable
- Must be numeric/continuous
- Features (Ai) are independent variables, i = 1, 2, ..., K
- Can be continuous or discrete
- We will start with only numeric features
- The challenge is to find the wi's in such a way that the total (squared) error is minimized (sketched below)
- Error is the difference between the observed value and the predicted value
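In standard notation, the model and objective described above can be written as follows (the intercept term w_0, the error superscripts, and n for the number of training rows are notational assumptions; the slide itself shows no explicit equation):

```latex
% Predicted goal value as a weighted combination of the K features
G^{est} = w_0 + w_1 A_1 + w_2 A_2 + \cdots + w_K A_K

% The weights are chosen to minimize the total squared error
% over the n training observations
\min_{w_0,\dots,w_K} \; \sum_{i=1}^{n} \bigl(G_i^{obs} - G_i^{est}\bigr)^{2}
```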
4. One-dimensional Linear Regression
- Given a set of (X, Y) observations
- Fit a line Y = a + bX
- Of course, observations would have a random error
- Error for observation i: ε_i = Y_i^{obs} - Y_i^{est}
- Pick a and b such that Σ_i ε_i² is minimized (closed-form solution below)
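For the one-dimensional case, the minimizing slope and intercept have a well-known closed form (a standard result, not shown on the slide), where X-bar and Y-bar denote the sample means of the observations:

```latex
b = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```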
5. Example
As is the case with all prediction techniques,
you should be ready to accept some error. The aim
is to keep the error to a minimum.
6. Multi-dimensional Regression
- The formulae for the weights are quite difficult
- They require complex matrix notation
- In principle, we can estimate the weights
- Compute the squared error for each row, add them up, and minimize the total (see the sketch after this list)
- Can be done easily in MS Excel using Solver
- Most statistical and data mining packages support a regression model
- Once a line is fitted
- We can use it for prediction on test cases
- Need a measure of accuracy for testing
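A minimal Python sketch of the "square the error for each row, add them up, and minimize the total" idea, using NumPy's least-squares solver in place of Excel Solver; the feature names A1-A4 and G mirror the WEKA example, but the data values here are purely hypothetical:

```python
import numpy as np

# Hypothetical training data: each row is (A1, A2, A3, A4); g holds the goal values
X = np.array([[1.0, 2.0, 0.5, 3.0],
              [2.0, 1.0, 1.5, 0.0],
              [0.5, 3.0, 2.0, 1.0],
              [3.0, 0.5, 1.0, 2.0],
              [1.5, 1.5, 2.5, 1.0],
              [2.5, 2.0, 0.5, 0.5]])
g = np.array([14.0, 10.5, 16.0, 13.0, 15.5, 11.0])

# Append a column of ones so the last weight plays the role of the intercept
X1 = np.column_stack([X, np.ones(len(X))])

# Least-squares fit: finds the weights that minimize the sum of squared errors,
# the same objective Solver would be asked to minimize numerically
w, *_ = np.linalg.lstsq(X1, g, rcond=None)

pred = X1 @ w                     # predictions for the training rows
sse = np.sum((g - pred) ** 2)     # total squared error
print("weights:", w)
print("sum of squared errors:", sse)
```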
7. Example
WEKA Result (linear1.arff)

Linear Regression Model
G =
    0.5611 * A1 +
    2.5118 * A2 +
    2.9021 * A3 +
    3.461  * A4 +
    3.0972

Evaluation on training set (Summary)
    Correlation coefficient          0.9967
    Mean absolute error              0.306
    Root mean squared error          0.3866
    Relative absolute error          7.8325 %
    Root relative squared error      8.0595 %
    Total Number of Instances        20

Mean squared error = 2.9896 / 20 = 0.14948
Root mean squared error = (0.14948)^0.5 = 0.3866
8. Multi-Dimensional Regression (Discrete Feature)
9. Making a Discrete Feature Numeric
- This is exactly the opposite issue (of discretization)
- Useful, for example, in linear regression
- Two techniques
- Converting a value to a binary (0/1) feature (a small sketch follows this list)
- Consider a discrete feature F with 3 (= n) values: high, medium, and normal
- Create 2 (= n - 1) new features, F-High and F-Medium
- F-Normal is not needed (it is dependent on the other two)
- Assign the values 0 or 1 based on the original values of F
- Replacing a discrete value by a numeric one
- Useful if there is a natural order among the values
- Order the values and assign a numeric equivalent
10. Multi-Dimensional Regression (Discrete Feature)
WEKA Result (linear2.arff)

Linear Regression Model
G =
    0.408  * A1=medium,high +
    0.7046 * A1=high +
    2.5086 * A2 +
    2.9217 * A3 +
    3.4493 * A4 +
    3.6965

Evaluation on training set (Summary)
    Correlation coefficient          0.9968
    Mean absolute error              0.3099
    Root mean squared error          0.381
    Relative absolute error          7.9322 %
    Root relative squared error      7.9424 %
    Total Number of Instances        20
11. Testing and Validation
- Models making numerical predictions should also be tested
- Partitioning of data into training and testing
- Same as before (see notes for Ch. 4)
- Evaluation criteria
- Cannot use previous measures such as accuracy, stratified accuracy, or RIS
- Need to measure the distance between the actual (observed) values (a_i) and the predicted (estimated) ones (p_i) for test case i
12. Performance Measures (Numeric Predictions for n Test Cases)
- Mean-squared error
- Root mean-squared error
- Mean absolute error
- Relative squared error
- Root relative squared error
- Relative absolute error
- Correlation coefficient
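The slide lists these measures without their formulas; the standard definitions (as used in WEKA, which reports the relative measures as percentages, matching the outputs above) are given below, with a_i the actual value, p_i the predicted value, and a-bar, p-bar the corresponding means:

```latex
\begin{align*}
\text{Mean-squared error} &= \tfrac{1}{n}\sum_{i=1}^{n}(p_i - a_i)^2 \\
\text{Root mean-squared error} &= \sqrt{\tfrac{1}{n}\textstyle\sum_{i}(p_i - a_i)^2} \\
\text{Mean absolute error} &= \tfrac{1}{n}\sum_{i=1}^{n}\lvert p_i - a_i\rvert \\
\text{Relative squared error} &= \frac{\sum_i (p_i - a_i)^2}{\sum_i (a_i - \bar{a})^2} \\
\text{Root relative squared error} &= \sqrt{\frac{\sum_i (p_i - a_i)^2}{\sum_i (a_i - \bar{a})^2}} \\
\text{Relative absolute error} &= \frac{\sum_i \lvert p_i - a_i\rvert}{\sum_i \lvert a_i - \bar{a}\rvert} \\
\text{Correlation coefficient} &= \frac{\sum_i (p_i - \bar{p})(a_i - \bar{a})}{\sqrt{\sum_i (p_i - \bar{p})^2 \,\sum_i (a_i - \bar{a})^2}}
\end{align*}
```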
13. Comments on Linear Regression
- Easy to build and use
- Often works well in real-world situations
- Limitations
- The treatment of discrete features is not natural
- The conversion process can be tedious, especially if there are many discrete features, each with several distinct values
- If the relationship is not linear, the prediction can be quite bad
- Of course, one can do non-linear regression, but the number of alternative functional forms is often too large to perform a meaningful search
14. Outliers and Robust Regression
- Outliers are often noisy elements and can seriously change the result
- Demo
- Possible ways of making the regression more robust
- Minimize absolute error instead of squared error (illustrated after this list)
- Remove outliers (e.g., the 10% of points farthest from the regression plane)
- Minimize the median instead of the mean of the squares
- Note: We have been minimizing the sum of squares, which is equivalent to minimizing the mean of squares
- Finds the narrowest strip covering half the observations
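A small illustration (hypothetical data, not the demo referenced above) contrasting the squared-error and absolute-error objectives; the absolute-error fit is far less influenced by the single outlier:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D data with a roughly linear trend (slope about 2)
# and one gross outlier in the last observation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1, 40.0])

def sse(params):                        # sum of squared errors
    a, b = params
    return np.sum((y - (a + b * x)) ** 2)

def sae(params):                        # sum of absolute errors (more robust)
    a, b = params
    return np.sum(np.abs(y - (a + b * x)))

ls  = minimize(sse, x0=[0.0, 1.0]).x                        # ordinary least squares
lad = minimize(sae, x0=[0.0, 1.0], method="Nelder-Mead").x  # least absolute deviations

print("least squares    (a, b):", ls)    # pulled strongly toward the outlier
print("least abs. error (a, b):", lad)   # stays close to the underlying trend
```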
15. Least Median of Squares
16. Example
WEKA Result (linear1.arff)

Linear Regression Model
G =
    0.6126 * A1 +
    2.1711 * A2 +
    2.7891 * A3 +
    3.4815 * A4 +
    3.863

Evaluation on training set (Summary)
    Correlation coefficient          0.9964
    Mean absolute error              0.3006
    Root mean squared error          0.4471
    Relative absolute error          7.6938 %
    Root relative squared error      9.3203 %
    Total Number of Instances        20

Solver does slightly worse than WEKA!
17. Regression Trees
- Addresses the problems associated with linear regression
- The overall goal may be a non-linear function of the features, but it may be locally (piece-wise) linear in the features
- Design
- Build a tree
- Similar to decision trees
- With each leaf node, associate a linear regression to get the value
18. Building a Regression Tree
- Branching (picking a feature)
- Branching is done based on the standard deviation reduction (SDR) of the goal (formula and a small sketch below)
- Pick the feature that gives the highest SDR
- Let T be the set of goal values, and let Ti be the set of goal values along the i-th partition (split) according to the chosen feature
- Stopping
- When there is no more attribute to split on
- When the maximum SDR is below a threshold
- Typical threshold: x% of the original standard deviation, with x = 5-10
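The SDR formula itself is not shown on the slide; below is a minimal Python sketch, assuming the standard definition SDR = sd(T) - sum_i (|T_i| / |T|) * sd(T_i), where sd(.) is the standard deviation of the goal values in a set, and using made-up goal values:

```python
import numpy as np

def sdr(goal_values, partitions):
    """Standard-deviation reduction of a candidate split:
    SDR = sd(T) - sum_i (|T_i| / |T|) * sd(T_i),
    where T is the full set of goal values and the T_i are the subsets
    produced by splitting on the chosen feature."""
    T = np.asarray(goal_values)
    return T.std() - sum(len(Ti) / len(T) * np.asarray(Ti).std()
                         for Ti in partitions)

# Hypothetical goal values and two candidate two-way splits
T = [4.0, 5.0, 6.0, 20.0, 22.0, 24.0]
print(sdr(T, [[4.0, 5.0, 6.0], [20.0, 22.0, 24.0]]))  # clean split: large SDR
print(sdr(T, [[4.0, 22.0, 6.0], [20.0, 5.0, 24.0]]))  # mixed split: small SDR
```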
19. Example
20. Missing Value Compensation
Here, m is the total number of instances without missing values.

Note: The denominator is m, and not |T| as given in the textbook (p. 204).