MSIS 563: Chapter 7 Numerical Prediction
1
MSIS 563 Chapter 7: Numerical Prediction
  • Deb Dey
  • Professor and McCabe Fellow of Information
    Systems
  • Faculty Director, MSIS Program, UW Business School

2
Why Numerical Prediction
  • Traditional classifiers assume a discrete goal
    variable
  • Cannot use them on data sets with continuous goal
    variables directly
  • Of course, the goal variable can be discretized
  • However, discretization may not be appropriate
    in some situations
  • Numerical prediction techniques can return a
    numerical value for the goal variable
  • Examples: predictions related to economic growth,
    weather, and market conditions

3
Linear Regression
  • Fit a multi-dimensional line
  • The goal (G) is the dependent variable
  • Must be numeric/continuous
  • Features (Ai) are independent variables, i = 1, 2, ..., K
  • Can be continuous or discrete
  • We will start with only numeric features
  • The challenge is to find the weights wi in such
    a way that the total squared error is minimized
    (see the sketch after this list)
  • Error is the difference between the observed
    value and the predicted value
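
The model being fitted has the form G = w0 + w1*A1 + ... + wK*AK. A minimal sketch of the two pieces named above, as hypothetical helper functions (not part of any package):

import numpy as np

def predict(row, w):
    """Predicted goal value for one row: w[0] + w[1]*A1 + ... + w[K]*AK."""
    return w[0] + np.dot(w[1:], row)

def total_squared_error(w, rows, goals):
    """Sum over all rows of (observed - predicted)^2 -- the quantity to minimize."""
    return sum((g - predict(row, w)) ** 2 for row, g in zip(rows, goals))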

4
One-dimensional Linear Regression
  • Given a set of (X,Y) observations
  • Fit a line Y = a + bX
  • Of course, observations would have a random error
  • Error: εi = Yi(observed) - Yi(estimated)
  • Pick a and b such that Σ εi² is minimized (the
    standard closed-form solution is sketched after
    this list)
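
A minimal sketch of the standard closed-form least-squares solution for a and b; the (X, Y) values below are made up for illustration:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up observations
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Standard least-squares estimates:
#   b = sum((Xi - mean(X)) * (Yi - mean(Y))) / sum((Xi - mean(X))^2)
#   a = mean(Y) - b * mean(X)
b = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a = Y.mean() - b * X.mean()

errors = Y - (a + b * X)                 # observed minus estimated values
print(a, b, np.sum(errors ** 2))         # intercept, slope, total squared error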

5
Example
As is the case with all prediction techniques,
you should be ready to accept some error. The aim
is to keep the error to a minimum.
6
Multi-dimensional Regression
  • The formulae for the weights are quite
    difficult
  • Require complex matrix notation
  • In principle, we can estimate the weights
    numerically
  • Estimate the squared error for each row, add
    them up, and minimize the total
  • Can be done easily in MS Excel using Solver (a
    sketch of the same idea follows this list)
  • Most statistical and data mining packages would
    support a regression model
  • Once a line is fitted
  • We can use it for prediction of test cases
  • Need a measure of accuracy for testing
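
A sketch of the Solver-style approach described above, with scipy.optimize.minimize standing in for MS Excel Solver. The training data is randomly generated and the true weights are made up; only the "add up the squared errors and minimize" idea is from the slide:

import numpy as np
from scipy.optimize import minimize

# Made-up training data: 20 rows with features A1..A4 and a goal G
rng = np.random.default_rng(0)
X = rng.random((20, 4))
G = 1.5 + X @ np.array([2.0, -1.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(20)

def total_squared_error(w):
    """w[0] is the intercept, w[1:] are the feature weights."""
    pred = w[0] + X @ w[1:]
    return np.sum((G - pred) ** 2)

result = minimize(total_squared_error, x0=np.zeros(5))   # start from all-zero weights
print("fitted weights:", result.x)

# Using the fitted line to predict a test case
test_case = np.array([0.2, 0.5, 0.1, 0.9])
print("predicted G:", result.x[0] + test_case @ result.x[1:])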

7
Example
WEKA Result (linear1.arff)
Linear Regression Model
G = 0.5611 * A1 + 2.5118 * A2 + 2.9021 * A3 + 3.461 * A4 + 3.0972
Evaluation on training set
Summary
Correlation coefficient          0.9967
Mean absolute error              0.306
Root mean squared error          0.3866
Relative absolute error          7.8325 %
Root relative squared error      8.0595 %
Total Number of Instances        20

Mean squared error = 2.9896 / 20 = 0.14948
Root mean squared error = (0.14948)^0.5 = 0.3866
8
Multi-Dimensional Regression (Discrete Feature)
9
Making a Discrete Feature Numeric
  • This is exactly the opposite issue (making a
    discrete feature numeric rather than discretizing
    a numeric one)
  • Useful, for example, in linear regression
  • Two techniques (a pandas sketch of both follows
    this list)
  • Converting a value to a binary (0/1) feature
  • Consider a discrete feature F with 3 (= n) values:
    high, medium, and normal
  • Create 2 (= n - 1) new features: F-High and
    F-Medium
  • F-Normal is not needed (it is dependent on the
    other two)
  • Assign the values of 0 or 1 based on the original
    values of F
  • Replacing a discrete value by a numeric one
  • Useful if there is a natural order among the
    values
  • Order the values and assign a numeric equivalent
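
A minimal sketch of both techniques with pandas; the feature name F and its values are taken from the example above:

import pandas as pd

df = pd.DataFrame({"F": ["high", "medium", "normal", "high", "normal"]})

# Technique 1: one 0/1 column per value, dropping one redundant column (n - 1 in total).
# pandas drops the first category alphabetically; any one of the n columns can be the
# redundant one -- the slide drops F-Normal, which works the same way.
dummies = pd.get_dummies(df["F"], prefix="F", drop_first=True)
print(pd.concat([df, dummies], axis=1))

# Technique 2: replace each value by a numeric equivalent when there is a natural order
order = {"normal": 0, "medium": 1, "high": 2}
print(df["F"].map(order))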

10
Multi-Dimensional Regression (Discrete Feature)
WEKA Result (linear2.arff)
Linear Regression Model
G = 0.408 * A1=medium,high + 0.7046 * A1=high + 2.5086 * A2 + 2.9217 * A3 + 3.4493 * A4 + 3.6965
Evaluation on training set
Summary
Correlation coefficient          0.9968
Mean absolute error              0.3099
Root mean squared error          0.381
Relative absolute error          7.9322 %
Root relative squared error      7.9424 %
Total Number of Instances        20
11
Testing and Validation
  • Models making numerical predictions should also
    be tested
  • Partitioning of data into training and testing
  • Same as before (see notes for Ch. 4)
  • Evaluation criteria
  • Cannot use previous measures such as accuracy,
    stratified accuracy, or RIS
  • Need to measure the distance between the actual
    (observed) values (ai) and the predicted
    (estimated) ones (pi) for test case i

12
Performance Measures (Numeric Predictions for n
test cases)
  • Mean-squared error
  • Root mean-squared error
  • Mean absolute error
  • Relative squared error
  • Root relative squared error
  • Relative absolute error
  • Correlation coefficient (the standard definitions
    of these measures are sketched below)
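
A sketch of the standard definitions of these measures, with ai the actual and pi the predicted value for test case i (WEKA reports the relative measures as percentages):

import numpy as np

def performance_measures(actual, predicted):
    """Standard error measures for numeric prediction over n test cases."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    n = len(a)
    mse  = np.sum((p - a) ** 2) / n                              # mean-squared error
    rmse = np.sqrt(mse)                                          # root mean-squared error
    mae  = np.sum(np.abs(p - a)) / n                             # mean absolute error
    rse  = np.sum((p - a) ** 2) / np.sum((a - a.mean()) ** 2)    # relative squared error
    rrse = np.sqrt(rse)                                          # root relative squared error
    rae  = np.sum(np.abs(p - a)) / np.sum(np.abs(a - a.mean()))  # relative absolute error
    corr = np.corrcoef(a, p)[0, 1]                               # correlation coefficient
    return {"MSE": mse, "RMSE": rmse, "MAE": mae,
            "RSE": rse, "RRSE": rrse, "RAE": rae, "corr": corr}

# Made-up actual and predicted values for a handful of test cases
print(performance_measures([3.0, 5.0, 2.0, 7.0], [2.8, 5.4, 2.3, 6.6]))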

13
Comments on Linear Regression
  • Easy to build and use
  • Often works well in real-world situations
  • Limitations
  • The treatment of discrete features is not natural
  • The conversion process can be tedious, especially
    if there are many discrete features each with
    several distinct values
  • If the relationship is not linear, the prediction
    can be quite bad
  • Of course, one can do non-linear regression, but
    the number of alternative functional forms is
    often too large to perform a meaningful search

14
Outliers and Robust Regression
  • Outliers are often noisy elements and can
    seriously change the result
  • Demo
  • Possible ways of making the regression more
    robust
  • Minimize absolute error instead of squared error
  • Remove outliers (e.g., the 10% of points farthest
    from the regression plane)
  • Minimize the median instead of the mean of the
    squares (a sketch follows this list)
  • Note: We have been minimizing the sum of squares,
    which is equivalent to minimizing the mean of the
    squares
  • This finds the narrowest strip covering half of
    the observations
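
A sketch of the last idea, least median of squares, in the same Solver-style setup as before (one feature, made-up data with a single obvious outlier; Nelder-Mead is used because the median objective is not smooth):

import numpy as np
from scipy.optimize import minimize

# Made-up data: roughly Y = 2X, except the last point is an outlier
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.9, 30.0])

def median_of_squares(params):
    a, b = params
    return np.median((Y - (a + b * X)) ** 2)

robust = minimize(median_of_squares, x0=[0.0, 1.0], method="Nelder-Mead")
print("least-median-of-squares fit (a, b):", robust.x)

# Ordinary least squares for comparison -- the outlier drags this fit upward
b_ls = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
a_ls = Y.mean() - b_ls * X.mean()
print("least-squares fit (a, b):", (a_ls, b_ls))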

15
Least Median of Squares
16
Example
WEKA Result (linear1.arff)
Linear Regression Model
G = 0.6126 * A1 + 2.1711 * A2 + 2.7891 * A3 + 3.4815 * A4 + 3.863
Evaluation on training set
Summary
Correlation coefficient          0.9964
Mean absolute error              0.3006
Root mean squared error          0.4471
Relative absolute error          7.6938 %
Root relative squared error      9.3203 %
Total Number of Instances        20
Solver does slightly worse than WEKA!
17
Regression Trees
  • Addresses the problems associated with linear
    regression
  • The overall goal may be a non-linear function of
    the features, but it may be locally (piece-wise)
    linear in the features
  • Design
  • Build a tree
  • Similar to decision trees
  • With each leaf node, associate a linear
    regression to get the value

18
Building a Regression Tree
  • Branching (picking a feature)
  • Branching is done based on standard deviation
    reduction (SDR) of the goal
  • Pick the feature that has the highest SDR
  • Let T be the set of goal values, and let Ti be
    the set of goal values along the i-th partition
    (split) according to the chosen feature (the SDR
    computation is sketched after this list)
  • Stopping
  • When there are no more attributes to split on
  • When the maximum SDR is below a threshold
  • Typical threshold: x% of the original standard
    deviation, with x = 5 or 10
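
A sketch of the SDR computation, assuming the usual definition SDR = sd(T) - Σ (|Ti| / |T|) * sd(Ti):

import numpy as np

def sdr(T, partitions):
    """Standard deviation reduction of the goal values T for one candidate split.

    partitions is the list of goal-value subsets Ti, one per branch of the split."""
    T = np.asarray(T, dtype=float)
    weighted_sd = sum(len(Ti) / len(T) * np.std(Ti) for Ti in partitions)
    return np.std(T) - weighted_sd

# Made-up goal values and a candidate binary split that separates low from high values
T = [4.0, 5.0, 6.0, 20.0, 21.0, 22.0]
print(sdr(T, [[4.0, 5.0, 6.0], [20.0, 21.0, 22.0]]))   # large reduction => good split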

19
Example
20
Missing Value Compensation
Here, m is the total number of instances without
missing values
Note: The denominator is m, and not |T| as given in
the textbook (p. 204).