Multiple Regression

Provided by: mcgrawhill

1
Multiple Regression
  • 12.8 Quadratic Terms in Regression Models
  • 12.9 Modelling Interaction (Moderator Variables)
  • 12.12 Model Building and Effects of
    Multicollinearity

2
Use of Quadratic Terms in Regression Models
Part 2
  • One useful form of linear regression is the
    quadratic regression model
  • Assume that we have n observations of x and y
  • The quadratic regression model relating y to x
    is y = β0 + β1x + β2x² + ε, where
  • β0 + β1x + β2x² is the mean value of the
    dependent variable y when the value of the
    independent variable is x
  • β0, β1, and β2 are unknown regression parameters
    relating the mean value of y to x
  • ε is an error term that describes the effects on
    y of all factors other than x and x²

3
More Variables
  • We have only looked at the simple case where we
    have y and x
  • That gave us the following quadratic regression
    model: y = β0 + β1x + β2x² + ε
  • However, we are not limited to just two terms
  • The following would also be a valid quadratic
    regression model:
    y = β0 + β1x1 + β2x1² + β3x2 + β4x3 + ε
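The quadratic model above can be fit by ordinary least squares on a design matrix with columns [1, x, x²]. A minimal sketch with synthetic data (the coefficients and names are illustrative, not from the slides):

```python
import numpy as np

# Synthetic data roughly following y = 2 + 3x - 0.5x^2 + noise (illustrative)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 3 * x - 0.5 * x**2 + rng.normal(0, 1, size=x.size)

# Design matrix with columns [1, x, x^2]; least squares estimates b0, b1, b2
X = np.column_stack([np.ones_like(x), x, x**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates of b0, b1, b2
```

Note that the model is still "linear" in the statistical sense: it is linear in the parameters β0, β1, β2 even though it is curved in x.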

4
12.9 Interaction
  • Multiple regression models often contain
    interaction variables
  • These are variables that are formed by
    multiplying two independent variables together
  • For example, x1x2
  • In this case, the x1x2 variable would appear in
    the model along with both x1 and x2
  • We use interaction variables when the
    relationship between the mean value of y and one
    of the independent variables is dependent on the
    value of another independent variable
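The interaction variable described above is literally a new column formed by multiplying two predictor columns. A sketch with synthetic data (coefficients and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 5, n)
# True model includes an interaction effect (coefficients are illustrative)
y = 1 + 2 * x1 + 3 * x2 + 1.5 * x1 * x2 + rng.normal(0, 1, n)

# The interaction variable x1*x2 enters the design matrix alongside x1 and x2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # b0, b1, b2, b3 (b3 is the interaction coefficient)
```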

5
Interaction Involving Dummy Variables
  • So far, we have only considered dummy variables
    as stand-alone variables
  • The model so far is y = β0 + β1x + β2D + ε,
    where D is a dummy variable
  • However, we can also look at interaction between
    a dummy variable and other variables
  • That model would take the form
    y = β0 + β1x + β2D + β3xD + ε
  • With an interaction term, both the intercept and
    slope are shifted
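The intercept-and-slope shift can be seen directly from the fitted coefficients: for D = 0 the line is β0 + β1x, while for D = 1 it is (β0 + β2) + (β1 + β3)x. A sketch with synthetic data (coefficients and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(0, 10, n)
D = rng.integers(0, 2, n)          # 0/1 dummy variable for two groups
# Group D=1 gets a shifted intercept (+2) and a shifted slope (+0.5)
y = 1 + 1.0 * x + 2 * D + 0.5 * x * D + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x, D, x * D])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# For D=0: intercept b0, slope b1.  For D=1: intercept b0+b2, slope b1+b3.
print(b)
```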

6
12.12 Model Building and the Effects of
Multicollinearity
Part 4
Multicollinearity refers to the condition where
the independent variables (or predictors) in a
model are dependent, related, or correlated with
each other
Effects
  • Hinders the ability to use t statistics and
    p-values to assess the relative importance of
    predictors
  • Does not hinder the ability to predict the
    dependent (or response) variable
Detection
  • Scatter plot matrix
  • Correlation matrix
  • Variance inflation factors (VIF)
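The second detection tool listed above, the correlation matrix of the predictors, is quick to compute; entries near ±1 flag collinear pairs. A sketch with synthetic data (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                       # unrelated predictor

# Pairwise correlations among predictors; values near +/-1 flag collinearity
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(np.round(corr, 2))
```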
7
12.12 Model Building and the Effects of
Multicollinearity
Example: The Sales Territory Performance Case
8
Correlation Matrix
Example: The Sales Territory Performance Case
9
Variance Inflation Factors (VIF)
The variance inflation factor for the jth
independent (or predictor) variable xj is
VIFj = 1 / (1 − Rj²)
where Rj² is the multiple coefficient of
determination for the regression model relating
xj to the other predictors x1, …, xj−1, xj+1, …, xk
Notes
  • VIFj = 1 implies xj is not related to the other
    predictors
  • max(VIFj) > 10 suggests severe multicollinearity
  • mean(VIFj) substantially greater than 1 suggests
    severe multicollinearity
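The definition above can be computed directly: regress xj on the remaining predictors, take Rj², and form 1/(1 − Rj²). A minimal numpy sketch with synthetic data (all names and data are illustrative):

```python
import numpy as np

def vif(X, j):
    """VIF_j = 1 / (1 - Rj^2), where Rj^2 comes from regressing
    column j of X on the remaining columns (intercept included)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(4)
n = 100
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # strongly related to x1
x3 = rng.normal(size=n)                     # unrelated predictor
X = np.column_stack([x1, x2, x3])
print([round(vif(X, j), 1) for j in range(3)])  # large VIFs for x1 and x2
```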
10
Comparing Regression Models on R², s, Adjusted
R², and Prediction Interval
  • Multicollinearity causes problems evaluating the
    p-values of the model
  • Therefore, we need to evaluate more than the
    additional importance of each independent
    variable
  • We also need to evaluate how the variables work
    together
  • One way to do this is to determine whether the
    overall model gives a high R² and adjusted R², a
    small s, and short prediction intervals

11
Comparing Regression Models on R², s, Adjusted
R², and Prediction Interval Continued
  • Adding any independent variable will increase R²
  • Even adding an unimportant independent variable
  • Thus, R² cannot tell us (by decreasing) that
    adding an independent variable is undesirable
  • A better criterion is the size of the standard
    error s
  • If s increases when an independent variable is
    added, we should not add that variable
  • However, decreasing s alone is not enough
  • Adding a variable reduces the error degrees of
    freedom, which raises the t point used in the
    prediction interval and can make the interval
    for y wider
  • Therefore, an independent variable should only be
    included if it reduces s enough to offset the
    higher t point and thereby shortens the
    prediction interval for y
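The comparison criteria above can be computed from any OLS fit. The sketch below (synthetic data, illustrative names) shows that R² cannot decrease when an unimportant predictor is added, while adjusted R² and s carry the degrees-of-freedom penalty and need not improve:

```python
import numpy as np

def fit_stats(X, y):
    """Return R^2, adjusted R^2, and standard error s for an OLS fit.
    X is assumed to include an intercept column of ones."""
    n, k = X.shape[0], X.shape[1] - 1     # k predictors plus the intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = float(((y - X @ b) ** 2).sum())
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - sse / sst
    adj_r2 = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
    s = np.sqrt(sse / (n - k - 1))
    return r2, adj_r2, s

rng = np.random.default_rng(5)
n = 60
x = rng.uniform(0, 10, n)
noise = rng.normal(size=n)                # an unimportant extra predictor
y = 2 + 1.5 * x + rng.normal(0, 1, n)

X1 = np.column_stack([np.ones(n), x])
X2 = np.column_stack([np.ones(n), x, noise])
r2_1, adj_1, s_1 = fit_stats(X1, y)
r2_2, adj_2, s_2 = fit_stats(X2, y)
print(r2_1, adj_1, s_1)
print(r2_2, adj_2, s_2)  # R^2 rises slightly; adjusted R^2 / s need not improve
```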

12
Stepwise Regression and Backward Elimination
  • Testing various combinations of variables can be
    tedious
  • In many situations, it is useful to have an
    iterative model selection procedure
  • At each step, a single independent variable is
    added to or deleted from the model
  • The model is then reevaluated
  • This continues until a final model is found
  • There are two such approaches
  • Stepwise regression
  • Backward elimination

13
C Statistic
  • Another quantity for comparing regression models
    is called the C statistic
  • Also known as CP statistic
  • First, obtain mean square error for the model
    containing all p potential independent variables
  • Denoted s2p
  • Next, obtain SSE for a reduced model with k
    independent variables
  • Then C is determined by
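The steps above can be sketched directly; this is the usual Mallows' Cp calculation, shown here on synthetic data with illustrative names:

```python
import numpy as np

def sse_of(X, y):
    """Sum of squared errors for an OLS fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

def c_statistic(X_reduced, X_full, y):
    """C = SSE_k / s_p^2 - (n - 2(k+1)), where s_p^2 is the mean
    square error of the full model with all p predictors."""
    n = len(y)
    p = X_full.shape[1] - 1        # predictors in the full model
    k = X_reduced.shape[1] - 1     # predictors in the reduced model
    s2p = sse_of(X_full, y) / (n - p - 1)
    return sse_of(X_reduced, y) / s2p - (n - 2 * (k + 1))

rng = np.random.default_rng(6)
n = 80
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1, n)   # x3 is unimportant

full = np.column_stack([np.ones(n), x1, x2, x3])
good = np.column_stack([np.ones(n), x1, x2])
print(round(c_statistic(good, full, y), 1))  # close to k + 1 = 3 when unbiased
```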

14
C Statistic for Comparing Models
  • We want the value of C to be small
  • Adding unimportant independent variables will
    raise the value of C
  • While we want C to be small, we also wish to find
    a model for which C roughly equals k + 1
  • A model with C substantially greater than k + 1
    has substantial bias and is undesirable
  • If a model has a small value of C and C for this
    model is less than k + 1, then it is not biased
    and the model should be considered desirable

15
Stepwise Regression 1
  • Assume there are p potential independent
    variables
  • Further, assume that p is large
  • Stepwise regression uses t statistics to
    determine the significance of the independent
    variables in various models
  • Stepwise regression needs two alpha values
  • αentry, the probability of a Type I error
    related to entering an independent variable into
    the model
  • αstay, the probability of a Type I error related
    to retaining an independent variable that was
    previously entered into the model

16
Stepwise Regression 2
  • Step 1: The stepwise procedure considers the p
    possible one-independent-variable regression
    models
  • It finds the variable with the largest absolute t
    statistic
  • Denote this variable x1
  • If x1 is not significant at the αentry level,
    the process terminates, concluding that none of
    the independent variables are significant
  • Otherwise, x1 is retained for use in Step 2

17
Stepwise Regression 3
  • Step 2: The stepwise procedure considers the
    p − 1 possible two-independent-variable models of
    the form y = β0 + β1x1 + β2xj + ε
  • For each new variable xj, it tests H0: β2 = 0
    versus Ha: β2 ≠ 0
  • Pick the variable giving the largest absolute t
    statistic
  • If the resulting variable is significant, check
    x1 against αstay to see if it should stay in the
    model
  • This is needed due to multicollinearity

18
Stepwise Regression 4
  • Further steps: This adding and checking for
    removal continues until all non-selected
    independent variables are insignificant and will
    not enter the model
  • The procedure also terminates when the variable
    to be added to the model is the one just removed
    from it
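The entry/stay loop described in Steps 1 and 2 above can be sketched as follows. Here fixed t-value cutoffs stand in for the αentry and αstay significance tests, and the data, cutoffs, and function names are all illustrative assumptions:

```python
import numpy as np

def abs_t(X, y):
    """Absolute t statistics for each coefficient in an OLS fit."""
    n, m = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = float(resid @ resid) / (n - m)
    cov = s2 * np.linalg.inv(X.T @ X)
    return np.abs(b / np.sqrt(np.diag(cov)))

def stepwise(X, y, t_entry=2.0, t_stay=2.0, max_steps=20):
    """Forward stepwise sketch: add the candidate with the largest |t|
    if it clears t_entry, then drop any included variable whose |t|
    falls below t_stay.  The cutoffs stand in for alpha tests."""
    included = []
    for _ in range(max_steps):
        changed = False
        # Entry step: best candidate among the excluded variables
        best, best_t = None, 0.0
        for j in (j for j in range(X.shape[1]) if j not in included):
            Z = np.column_stack([np.ones(len(y)), X[:, included + [j]]])
            tj = abs_t(Z, y)[-1]
            if tj > best_t:
                best, best_t = j, tj
        if best is not None and best_t >= t_entry:
            included.append(best)
            changed = True
        # Stay step: re-check variables already in the model
        if included:
            Z = np.column_stack([np.ones(len(y)), X[:, included]])
            t = abs_t(Z, y)[1:]              # skip the intercept
            worst = int(np.argmin(t))
            if t[worst] < t_stay:
                included.pop(worst)
                changed = True
        if not changed:
            break
    return included

rng = np.random.default_rng(8)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 2 * x1 + 3 * x2 + rng.normal(0, 1, n)   # x3 contributes nothing real
X = np.column_stack([x1, x2, x3])
print(stepwise(X, y))  # x2 (column 1) likely enters first, then x1
```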

19
Backward Elimination
  • With backward elimination, we begin with a full
    regression model containing all p potential
    independent variables
  • We then find the one having the smallest t
    statistic
  • If this variable is significant, we stop
  • If this variable is insignificant, it is dropped
    and the regression is rerun with p-1 potential
    independent variables
  • The process continues to remove variables
    one-at-a-time until all the variables are
    significant
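The loop above is simpler than stepwise regression: refit, drop the smallest-|t| predictor, and repeat. A sketch with a t-value cutoff standing in for the significance test (data, cutoff, and names are illustrative):

```python
import numpy as np

def abs_t(X, y):
    """Absolute t statistics for each coefficient in an OLS fit."""
    n, m = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = float(resid @ resid) / (n - m)
    cov = s2 * np.linalg.inv(X.T @ X)
    return np.abs(b / np.sqrt(np.diag(cov)))

def backward_eliminate(X, y, t_cut=2.0):
    """Drop the predictor with the smallest |t| and refit, until every
    remaining |t| exceeds t_cut (t_cut ~ 2 roughly mimics alpha = 0.05)."""
    keep = list(range(X.shape[1]))
    while keep:
        Z = np.column_stack([np.ones(len(y)), X[:, keep]])
        t = abs_t(Z, y)[1:]            # ignore the intercept's t statistic
        worst = int(np.argmin(t))
        if t[worst] >= t_cut:          # all predictors significant: stop
            break
        keep.pop(worst)                # remove the least significant one
    return keep

rng = np.random.default_rng(7)
n = 100
x1, x2, x3 = rng.normal(size=(3, n))
y = 2 * x1 + 3 * x2 + rng.normal(0, 1, n)   # x3 contributes nothing real
X = np.column_stack([x1, x2, x3])
print(backward_eliminate(X, y))  # expected to retain columns 0 and 1
```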

20
Stepwise Regression for Sales Territory Case
21
Stepwise Regression for Sales Territory Case