Regression Methods - PowerPoint PPT Presentation

Title: Data Mining lecture
Author: Arno Knobbe (last modified by Arno Knobbe)
Created: 6/4/1996 5:33:28 PM
Format: Letter Paper (8.5x11 in), PowerPoint presentation

Slides: 14
Provided by: Arno188




1
Regression Methods
2
Linear Regression
  • Simple linear regression (one predictor)
  • Multiple linear regression (multiple predictors)
  • Ordinary Least Squares estimation
  • Lasso regression
  • Selects features by setting some parameters (coefficients) to exactly 0
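The sketch below, in pure Python, shows how the lasso drives a coefficient to exactly 0. It assumes the common objective (1/2n)·||y - Xw||² + alpha·||w||₁ (the form scikit-learn uses), centred data and no intercept, and solves it with cyclic coordinate descent and soft-thresholding. That is one standard solver among several, not necessarily the one used in the lecture. `lasso_cd`, `soft_threshold` and the toy data are hypothetical. With `alpha=0` the updates converge to the ordinary least-squares solution.

```python
def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards 0 by gamma; anything inside [-gamma, gamma] becomes exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1.

    Assumes the columns of X and the target y are already centred (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            # Closed-form minimiser of the objective along coordinate j
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Hypothetical toy data: y depends (almost) only on the first feature.
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-4.1, -1.9, 0.0, 2.1, 3.9]

print(lasso_cd(X, y, alpha=0.0))  # close to OLS: roughly [2.0, -0.1]
print(lasso_cd(X, y, alpha=0.5))  # second coefficient becomes exactly 0.0
```

The soft-threshold step is what produces exact zeros: whenever a feature's correlation with the residual stays below `alpha`, its coefficient is set to 0, which is how the lasso performs feature selection.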

3
Coefficient of Determination
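  • R² (R squared) indicates how well a model fits the data
  • $R^2 = 1 - SS_{res}/SS_{tot}$
  • $SS_{res} = \sum_i (y_i - f_i)^2$ (residual sum of squares; $f_i$ is the model's prediction for example $i$)
  • $SS_{tot} = \sum_i (y_i - \bar{y})^2$ (total sum of squares; $\bar{y}$ is the mean of the $y_i$)
  • Between 0 and 1 for a least-squares linear model with an intercept (on its training data); other models can give negative values
  • Explained variance
  • The fraction of the variance in $y$ that is explained by the model
  • For simple linear least-squares regression, $R^2 = r^2$, the squared correlation coefficient between $x$ and $y$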
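The pure-Python sketch below computes R² directly from the definitions above. It also checks that R² equals r² for a simple least-squares line, and shows that a model worse than predicting the mean gets a negative R². The function names and the data are hypothetical.

```python
import math


def r_squared(y, f):
    """R^2 = 1 - SS_res / SS_tot for observed values y and predictions f."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def fit_simple_ols(x, y):
    """Least-squares line f(x) = a + b*x for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]  # hypothetical data
a, b = fit_simple_ols(x, y)
f = [a + b * xi for xi in x]

print(r_squared(y, f))                 # about 0.6
print(pearson_r(x, y) ** 2)            # also about 0.6: R^2 = r^2 here
print(r_squared(y, [5, 4, 3, 2, 1]))   # negative: worse than predicting the mean
```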

4
R Squared
  • visual interpretation of R²

5
Regression Trees
  • Regression variant of decision trees
  • Top-down induction
  • Two options for the leaves:
  • Constant value in leaf (piecewise constant)
  • Local linear model in leaf (piecewise linear)
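The two leaf options can be compared on a single binary split (a "stump"), as in the pure-Python sketch below. Each leaf stores either its mean target (piecewise constant) or a least-squares line (piecewise linear). The split point is fixed by hand here; a real tree would choose it with its splitting criterion. `fit_stump`, `sse` and the data are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def fit_line(x, y):
    """Least-squares line a + b*x (a local linear model for one leaf)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def fit_stump(x, y, threshold, linear_leaves):
    """One binary split on x <= threshold; each leaf holds a constant or a line."""
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= threshold]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > threshold]
    leaves = []
    for part in (left, right):
        xs = [p[0] for p in part]
        ys = [p[1] for p in part]
        if linear_leaves:
            leaves.append(fit_line(xs, ys))  # piecewise linear
        else:
            leaves.append((mean(ys), 0.0))   # piecewise constant (slope 0)

    def predict(xq):
        a, b = leaves[0] if xq <= threshold else leaves[1]
        return a + b * xq

    return predict


def sse(predict, x, y):
    """Sum of squared errors of a model on (x, y)."""
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))


x = [-3, -2, -1, 1, 2, 3]
y = [3, 2, 1, 1, 2, 3]  # hypothetical data: y = |x|
const_tree = fit_stump(x, y, 0, linear_leaves=False)
linear_tree = fit_stump(x, y, 0, linear_leaves=True)
print(sse(const_tree, x, y))   # 4.0: each leaf predicts its mean, 2.0
print(sse(linear_tree, x, y))  # 0.0: each leaf fits its line exactly
```

With the same split, linear leaves can capture a trend inside each region that constant leaves can only approximate with more splits.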

6
M5 algorithm (Quinlan, Wang)
  • M5 and M5P in Weka (classifiers > trees > M5P)
  • Offers both regression trees and model trees
  • Model trees are the default
  • -R option (buildRegressionTree) for piecewise constant leaves

7
M5 algorithm (Quinlan, Wang)
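  • Splitting criterion: Standard Deviation Reduction (SDR)
  • $SDR = sd(T) - \sum_i \frac{|T_i|}{|T|} \cdot sd(T_i)$, where $T$ is the set of examples in the node and the $T_i$ are the subsets produced by a split
  • Stopping criterion
  • Standard deviation below some threshold ($0.05 \cdot sd(D)$, with $D$ the full training set)
  • Too few examples in node (e.g. 4)
  • Pruning (bottom-up)
  • Estimated error: $\frac{n+v}{n-v} \times$ average absolute error in the node
  • $n$ is the number of examples in the node, $v$ the number of parameters in the model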
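A pure-Python sketch of the M5 quantities above: SDR for a candidate split, a search over binary thresholds on one numeric attribute, the stopping test, and the pruning error estimate. It assumes the population standard deviation (`statistics.pstdev`), midpoint thresholds, and that "too few examples" means fewer than `min_examples`. These choices and all function names are illustrative, not taken from Quinlan's or Weka's code.

```python
from statistics import mean, pstdev


def sdr(parent, subsets):
    """Standard Deviation Reduction: sd(T) - sum_i |T_i|/|T| * sd(T_i)."""
    n = len(parent)
    return pstdev(parent) - sum(len(t) / n * pstdev(t) for t in subsets)


def best_numeric_split(x, y):
    """Try every binary split x <= t, with t the midpoint of consecutive distinct
    x values; return (best_threshold, best_sdr)."""
    pairs = sorted(zip(x, y))
    all_y = [yv for _, yv in pairs]
    best = (None, float("-inf"))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal x values
        t = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [yv for _, yv in pairs[:k]]
        right = [yv for _, yv in pairs[k:]]
        score = sdr(all_y, [left, right])
        if score > best[1]:
            best = (t, score)
    return best


def should_stop(y_node, sd_all, min_examples=4):
    """Stop if the node has too few examples, or its sd is below 5% of the
    sd of the full training set (strict comparisons are an assumption)."""
    return len(y_node) < min_examples or pstdev(y_node) < 0.05 * sd_all


def pruning_error_estimate(residuals, v):
    """(n + v) / (n - v) times the average absolute error in a node with
    n examples and a model with v parameters."""
    n = len(residuals)
    return (n + v) / (n - v) * mean(abs(r) for r in residuals)


x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 10, 10, 10]  # hypothetical data
print(best_numeric_split(x, y))  # threshold 3.5, SDR 4.5
print(pruning_error_estimate([1, -1, 1, -1, 1, -1, 1, -1], v=2))  # about 1.667
```

The factor (n+v)/(n-v) grows as the number of parameters v approaches the number of examples n, so during bottom-up pruning a complex model in a small node is penalised relative to a simpler one.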

8
Binary Splits
  • All splits are binary
  • Numeric attributes: split on a threshold as usual
  • Nominal attributes: order all values by their average target value (prior to induction)
  • Introduce k-1 indicator variables in this order
  • Example: database of skiing slopes
  • avg(color = green) = 2.5
  • avg(color = blue) = 3.2
  • avg(color = red) = 7.7
  • avg(color = black) = 13.5
  • binary features: Green, Green∨Blue, Green∨Blue∨Red (k-1 = 3 features for k = 4 colours)
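The ordering trick can be sketched in a few lines of pure Python. Values are sorted by their average target, and indicator i is 1 when the value is among the first i+1 values in that order, so a binary split on any indicator separates a prefix of the ordered values from the rest. The function names and the target values below are hypothetical; they do not reproduce the slide's skiing data.

```python
from collections import defaultdict


def order_by_target_mean(values, targets):
    """Sort the distinct nominal values by their average target value."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return sorted(sums, key=lambda v: sums[v] / counts[v])


def indicator_names(order):
    """Names of the k-1 indicators: first value, first two values, ..."""
    return ["|".join(order[: i + 1]) for i in range(len(order) - 1)]


def to_binary(value, order):
    """Indicator i is 1 if value is among the first i+1 values of the ordering."""
    rank = order.index(value)
    return [1 if rank <= i else 0 for i in range(len(order) - 1)]


colors = ["red", "green", "black", "blue", "green", "red"]
targets = [7.0, 2.0, 14.0, 3.0, 3.0, 8.0]  # hypothetical target values
order = order_by_target_mean(colors, targets)
print(order)                    # ['green', 'blue', 'red', 'black']
print(indicator_names(order))   # ['green', 'green|blue', 'green|blue|red']
print(to_binary("red", order))  # [0, 0, 1]
```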

9
Regression tree on Servo dataset (UCI)
10
Model tree on Servo dataset (UCI)
LM1: class = 0.0833 * motor=B,A + 0.0682 * screw=B,A + 0.2215 * screw=A + 0.1315 * pgain=4,3 + 0.3163 * pgain=3 - 0.1254 * vgain=1,2 + 0.3864
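To read LM1, note that each term such as motor=B,A is a 0/1 indicator that is 1 when motor is B or A, as produced by the ordering of nominal values described earlier. The hypothetical helper `lm1` below evaluates this leaf model. It assumes that pgain and vgain are passed as integers and ignores any smoothing M5P may apply. The model is only meaningful for examples that the tree routes to this leaf.

```python
def lm1(motor, screw, pgain, vgain):
    """Evaluate the LM1 leaf model; each (condition) is a 0/1 indicator."""
    return (0.0833 * (motor in {"B", "A"})
            + 0.0682 * (screw in {"B", "A"})
            + 0.2215 * (screw == "A")
            + 0.1315 * (pgain in {4, 3})
            + 0.3163 * (pgain == 3)
            - 0.1254 * (vgain in {1, 2})
            + 0.3864)


print(lm1("E", "E", 6, 5))  # all indicators 0: just the intercept, about 0.3864
print(lm1("A", "A", 3, 1))  # all indicators 1: about 1.0818
```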
11
Regression in Cortana
  • Regression is a natural setting for Subgroup Discovery
  • Local models, not a global prediction model
  • Subgroups are piecewise constant subsets
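In pure Python, a subgroup with a numeric target can be viewed as a piecewise constant model: one average inside the subgroup and one outside it. Subgroup Discovery then scores subgroup descriptions with a quality measure. The sketch uses a z-score style measure, sqrt(n_S)·(mean_S - mean_all)/sd_all, as one common choice for numeric targets; Cortana's own measures and their exact definitions may differ. All names and data are hypothetical.

```python
from math import sqrt
from statistics import pstdev


def avg(values):
    return sum(values) / len(values)


def subgroup_model(x, y, condition):
    """Piecewise constant model: one mean inside the subgroup, one outside it."""
    mu_in = avg([yi for xi, yi in zip(x, y) if condition(xi)])
    mu_out = avg([yi for xi, yi in zip(x, y) if not condition(xi)])
    return lambda xq: mu_in if condition(xq) else mu_out


def z_score_quality(x, y, condition):
    """sqrt(n_S) * (mean_S - mean_all) / sd_all: high when a large subgroup
    has an unusually high average target."""
    sub = [yi for xi, yi in zip(x, y) if condition(xi)]
    return sqrt(len(sub)) * (avg(sub) - avg(y)) / pstdev(y)


def in_subgroup(xi):
    """Hypothetical subgroup description: x > 3."""
    return xi > 3


x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 10, 11, 12]  # hypothetical data
model = subgroup_model(x, y, in_subgroup)
print(model(5), model(2))                  # 11.0 2.0
print(z_score_quality(x, y, in_subgroup))  # about 1.70
```

No prediction for new data is the goal here; the point is to find descriptions whose local average stands out from the whole dataset.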

12
Subgroup Discovery regression
13
Other regression models
  • Functions: LinearRegression, MultilayerPerceptron (artificial neural network), SMOreg (support vector machine)
  • Lazy: IBk (k-nearest neighbours)
  • Rules: M5Rules (decision list)
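Lazy learners such as IBk build no model up front; for a numeric target they predict from the nearest stored examples when a query arrives. The pure-Python sketch below predicts the unweighted mean target of the k nearest examples under Euclidean distance. It is a simplification: Weka's IBk offers further options, such as distance weighting. `knn_regress` and the data are hypothetical.

```python
import math


def knn_regress(train_X, train_y, query, k=1):
    """Predict the unweighted mean target of the k nearest training examples
    (Euclidean distance). All work happens at query time."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


X = [[0.0], [1.0], [2.0], [10.0]]
y = [0.0, 1.0, 2.0, 10.0]  # hypothetical data
print(knn_regress(X, y, [1.2], k=2))  # 1.5: mean of the targets at x=1 and x=2
```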