Title: Multiple Regression
Multiple Regression
- 12.8 Quadratic Terms in Regression Models
- 12.9 Modelling Interaction (Moderator Variables)
- 12.12 Model Building and Effects of Multicollinearity
Use of Quadratic Terms in Regression Models
Part 2
- One useful form of linear regression is the quadratic regression model
- Assume that we have n observations of x and y
- The quadratic regression model relating y to x is y = β0 + β1x + β2x² + ε (see the sketch below), where
- β0 + β1x + β2x² is the mean value of the dependent variable y when the value of the independent variable is x
- β0, β1, and β2 are unknown regression parameters relating the mean value of y to x
- ε is an error term that describes the effects on y of all factors other than x and x²
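Below is a minimal sketch of fitting such a quadratic model with Python's statsmodels; the data, variable names, and coefficient values are invented for illustration and are not from the text.

import numpy as np
import statsmodels.api as sm

# Hypothetical data: n = 50 observations of a single predictor x and response y
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x - 0.2 * x**2 + rng.normal(scale=1.0, size=50)

# Design matrix with an intercept column, x, and the squared term x^2
X = sm.add_constant(np.column_stack([x, x**2]))

# Fit y = b0 + b1*x + b2*x^2 + e by ordinary least squares
fit = sm.OLS(y, X).fit()
print(fit.params)  # estimates of b0, b1, b2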
More Variables
- We have only looked at the simple case where we have y and x
- That gave us the following quadratic regression model: y = β0 + β1x + β2x² + ε
- However, we are not limited to just two terms
- The following would also be a valid quadratic regression model: y = β0 + β1x1 + β2x1² + β3x2 + β4x3 + ε
12.9 Interaction
- Multiple regression models often contain interaction variables
- These are variables formed by multiplying two independent variables together
- For example, x1x2
- In this case, the x1x2 variable would appear in the model along with both x1 and x2
- We use interaction variables when the relationship between the mean value of y and one of the independent variables depends on the value of another independent variable (see the sketch below)
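A minimal sketch of fitting a model with an x1x2 interaction variable, assuming statsmodels and made-up data (the coefficient values below are illustrative only):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 60
x1 = rng.uniform(0, 5, n)
x2 = rng.uniform(0, 5, n)
# Hypothetical response in which the effect of x1 depends on the value of x2
y = 1.0 + 0.8 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(scale=0.5, size=n)

# The interaction variable x1*x2 appears in the model along with x1 and x2
X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # b0, b1, b2, b3 (b3 is the interaction coefficient)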
Interaction Involving Dummy Variables
- So far, we have only considered dummy variables as stand-alone variables
- The model so far is y = β0 + β1x + β2D + ε, where D is a dummy variable
- However, we can also look at the interaction between a dummy variable and other variables
- That model would take the form y = β0 + β1x + β2D + β3xD + ε (see the sketch below)
- With an interaction term, both the intercept and the slope are shifted
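A small sketch of the dummy-variable interaction model y = β0 + β1x + β2D + β3xD + ε, using statsmodels on hypothetical data; the grouping and coefficient values are invented for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 80
x = rng.uniform(0, 10, n)
D = rng.integers(0, 2, n)  # dummy variable: 0 or 1 (hypothetical grouping)
# Invented data: group D = 1 has a shifted intercept (b2) and slope (b3)
y = 3.0 + 1.0 * x + 2.0 * D + 0.5 * x * D + rng.normal(scale=1.0, size=n)

# Model y = b0 + b1*x + b2*D + b3*x*D + e
X = sm.add_constant(np.column_stack([x, D, x * D]))
fit = sm.OLS(y, X).fit()
print(fit.params)
# For D = 0: intercept b0, slope b1
# For D = 1: intercept b0 + b2, slope b1 + b3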
12.12 Model Building and the Effects of Multicollinearity
Part 4
Multicollinearity refers to the condition where the independent variables (or predictors) in a model are dependent, related, or correlated with each other
Effects
- Hinders the ability to use t statistics and p-values to assess the relative importance of predictors
- Does not hinder the ability to predict the dependent (or response) variable
Detection
- Scatter Plot Matrix
- Correlation Matrix (see the sketch below)
- Variance Inflation Factors (VIF)
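As a rough illustration of the detection tools listed above, the following sketch builds a small set of hypothetical predictors (x3 deliberately constructed from x1) and prints the correlation matrix and a scatter plot matrix with pandas; the variable names and data are invented, not the Sales Territory data:

import numpy as np
import pandas as pd

# Hypothetical predictors; x3 is built from x1, so the two are correlated
rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Correlation matrix: large off-diagonal entries flag possible multicollinearity
print(X.corr().round(2))

# Scatter plot matrix (requires matplotlib to be installed)
pd.plotting.scatter_matrix(X)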
12.12 Model Building and the Effects of Multicollinearity
Example: The Sales Territory Performance Case
Correlation Matrix
Example: The Sales Territory Performance Case
Variance Inflation Factors (VIF)
The variance inflation factor for the jth independent (or predictor) variable xj is
VIFj = 1 / (1 − Rj²)
where Rj² is the multiple coefficient of determination for the regression model relating xj to the other predictors x1, …, xj−1, xj+1, …, xk
Notes
- VIFj = 1 implies xj is not related to the other predictors
- max(VIFj) > 10 suggests severe multicollinearity
- mean(VIFj) substantially greater than 1 suggests severe multicollinearity
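A minimal sketch of computing VIFj = 1 / (1 − Rj²) by regressing each predictor on the others, using statsmodels on the same kind of hypothetical, deliberately correlated data as the previous sketch (column names are placeholders):

import numpy as np
import pandas as pd
import statsmodels.api as sm

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the rest."""
    out = {}
    for col in X.columns:
        others = sm.add_constant(X.drop(columns=col))
        r2 = sm.OLS(X[col], others).fit().rsquared
        out[col] = 1.0 / (1.0 - r2)
    return pd.Series(out)

# Hypothetical correlated predictors (x3 built from x1)
rng = np.random.default_rng(3)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 0.9 * x1 + rng.normal(scale=0.3, size=n)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

vifs = vif(X)
print(vifs)                      # x1 and x3 should show inflated values
print("max VIF:", vifs.max())    # > 10 would suggest severe multicollinearity
print("mean VIF:", vifs.mean())  # well above 1 also suggests a problem

statsmodels also ships a variance_inflation_factor helper in statsmodels.stats.outliers_influence that computes the same quantity.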
Comparing Regression Models on R², s, Adjusted R², and Prediction Interval
- Multicollinearity causes problems evaluating the p-values of the model
- Therefore, we need to evaluate more than the additional importance of each independent variable
- We also need to evaluate how the variables work together
- One way to do this is to determine if the overall model gives a high R² and adjusted R², a small s, and short prediction intervals
Comparing Regression Models on R², s, Adjusted R², and Prediction Interval (Continued)
- Adding any independent variable will increase R²
- Even adding an unimportant independent variable
- Thus, R² cannot tell us (by decreasing) that adding an independent variable is undesirable
- A better criterion is the size of the standard error s
- If s increases when an independent variable is added, we should not add that variable
- However, a decreasing s alone is not enough
- Adding a variable reduces the degrees of freedom, and that makes the prediction interval for y wider
- Therefore, an independent variable should only be included if it reduces s enough to offset the higher t value and thereby reduce the length of the desired prediction interval for y (a comparison sketch follows below)
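The following sketch compares two hypothetical models on R², adjusted R², and s; the data and the "unimportant" extra variable x3 are invented to show how R² always rises while adjusted R² and s need not improve:

import numpy as np
import statsmodels.api as sm

# Invented data: y depends on x1 and x2; x3 is an unimportant extra variable
rng = np.random.default_rng(4)
n = 50
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=1.0, size=n)

def summarize(cols):
    fit = sm.OLS(y, sm.add_constant(np.column_stack(cols))).fit()
    s = np.sqrt(fit.mse_resid)  # the standard error s
    return fit.rsquared, fit.rsquared_adj, s

for label, cols in [("x1, x2", [x1, x2]), ("x1, x2, x3", [x1, x2, x3])]:
    r2, r2_adj, s = summarize(cols)
    print(f"{label}: R2 = {r2:.3f}  adj R2 = {r2_adj:.3f}  s = {s:.3f}")
# R2 always rises when x3 is added; adjusted R2 and s show whether it is worth it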
Stepwise Regression and Backward Elimination
- Testing various combinations of variables can be tedious
- In many situations, it is useful to have an iterative model selection procedure
- At each step, a single independent variable is added to or deleted from the model
- The model is then re-evaluated
- This continues until a final model is found
- There are two such approaches
- Stepwise regression
- Backward elimination
C Statistic
- Another quantity for comparing regression models is called the C statistic
- Also known as the Cp statistic
- First, obtain the mean square error for the model containing all p potential independent variables
- Denoted s²p
- Next, obtain the SSE for a reduced model with k independent variables
- Then C is determined by C = SSE / s²p − [n − 2(k + 1)]
C Statistic for Comparing Models
- We want the value of C to be small
- Adding unimportant independent variables will raise the value of C
- While we want C to be small, we also wish to find a model for which C roughly equals k + 1
- A model with C substantially greater than k + 1 has substantial bias and is undesirable
- If a model has a small value of C and C for this model is less than k + 1, then it is not biased and the model should be considered desirable (see the sketch below)
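A small sketch of computing the C statistic as described above, C = SSE / s²p − [n − 2(k + 1)], for a few reduced models built from hypothetical data (predictor names and coefficients are made up):

import numpy as np
import statsmodels.api as sm

# Invented data with p = 3 potential predictors; x3 is unimportant
rng = np.random.default_rng(5)
n = 50
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# s^2_p: mean square error of the full model containing all p predictors
full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2, x3]))).fit()
s2_p = full.mse_resid

def c_statistic(cols):
    """C = SSE / s^2_p - [n - 2(k + 1)] for a reduced model with k predictors."""
    k = len(cols)
    fit = sm.OLS(y, sm.add_constant(np.column_stack(cols))).fit()
    return fit.ssr / s2_p - (n - 2 * (k + 1))

for label, cols in [("x1", [x1]), ("x1, x2", [x1, x2]), ("x1, x2, x3", [x1, x2, x3])]:
    print(label, round(c_statistic(cols), 2))  # look for a small C close to k + 1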
Stepwise Regression 1
- Assume there are p potential independent variables
- Further, assume that p is large
- Stepwise regression uses t statistics to determine the significance of the independent variables in various models
- Stepwise regression needs two alpha values
- αentry, the probability of a Type I error related to entering an independent variable into the model
- αstay, the probability of a Type I error related to retaining an independent variable that was previously entered into the model
Stepwise Regression 2
- Step 1: The stepwise procedure considers the p possible one-independent-variable regression models
- It finds the variable with the largest absolute t statistic
- Denoted as x1
- If x1 is not significant at the αentry level, the process terminates by concluding that none of the independent variables are significant
- Otherwise, x1 is retained for use in Step 2
Stepwise Regression 3
- Step 2: The stepwise procedure considers the p − 1 possible two-independent-variable models of the form y = β0 + β1x1 + β2xj + ε
- For each new variable, it tests H0: β2 = 0 versus Ha: β2 ≠ 0
- It picks the variable giving the largest absolute t statistic
- If the resulting variable is significant, it checks x1 against αstay to see whether it should stay in the model
- This is needed due to multicollinearity
Stepwise Regression 4
- Further steps: this adding and checking for removal continues until all non-selected independent variables are insignificant and will not enter the model
- The procedure will also terminate when the variable to be added to the model is the one just removed from it (a sketch of the full procedure follows below)
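The following is a simplified sketch of the stepwise idea described on these slides, using the p-values of the t statistics with αentry and αstay thresholds (equivalent to comparing |t| against critical values); the function name, data, and thresholds are assumptions for illustration, not a standard library routine:

import numpy as np
import pandas as pd
import statsmodels.api as sm

def stepwise(X, y, alpha_entry=0.05, alpha_stay=0.05):
    """Simplified stepwise selection using the p-values of the t statistics."""
    selected, last_removed = [], None
    while True:
        # Try each remaining variable and keep the most significant one
        candidates = [c for c in X.columns if c not in selected]
        best, best_p = None, 1.0
        for c in candidates:
            fit = sm.OLS(y, sm.add_constant(X[selected + [c]])).fit()
            if fit.pvalues[c] < best_p:
                best, best_p = c, fit.pvalues[c]
        # Stop if nothing can enter, or if we would re-add the variable just removed
        if best is None or best_p >= alpha_entry or best == last_removed:
            break
        selected.append(best)
        # Re-check previously entered variables against alpha_stay (multicollinearity)
        fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
        dropped = [c for c in selected if fit.pvalues[c] >= alpha_stay]
        if dropped:
            last_removed = dropped[0]
            selected = [c for c in selected if c not in dropped]
    return selected

# Example with invented data: only x1 and x3 actually matter
rng = np.random.default_rng(6)
X = pd.DataFrame(rng.normal(size=(60, 4)), columns=["x1", "x2", "x3", "x4"])
y = 1.0 + 2.0 * X["x1"] - 1.5 * X["x3"] + rng.normal(size=60)
print(stepwise(X, y))  # typically selects x1 and x3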
Backward Elimination
- With backward elimination, we begin with a full regression model containing all p potential independent variables
- We then find the variable having the smallest absolute t statistic
- If this variable is significant, we stop
- If this variable is insignificant, it is dropped and the regression is rerun with p − 1 potential independent variables
- The process continues to remove variables one at a time until all the remaining variables are significant (see the sketch below)
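A minimal sketch of backward elimination as described above; the function name and hypothetical data are invented for illustration, and significance is again judged by p-values of the t statistics:

import numpy as np
import pandas as pd
import statsmodels.api as sm

def backward_elimination(X, y, alpha_stay=0.05):
    """Drop the least significant variable one at a time until all remaining
    variables are significant at the alpha_stay level."""
    remaining = list(X.columns)
    while remaining:
        fit = sm.OLS(y, sm.add_constant(X[remaining])).fit()
        pvals = fit.pvalues.drop("const")  # smallest |t| = largest p-value
        worst = pvals.idxmax()
        if pvals[worst] < alpha_stay:
            break                          # every remaining variable is significant
        remaining.remove(worst)            # drop it and re-fit the smaller model
    return remaining

# Example with invented data: only x1 and x3 actually matter
rng = np.random.default_rng(7)
X = pd.DataFrame(rng.normal(size=(60, 4)), columns=["x1", "x2", "x3", "x4"])
y = 1.0 + 2.0 * X["x1"] - 1.5 * X["x3"] + rng.normal(size=60)
print(backward_elimination(X, y))  # typically keeps x1 and x3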
Stepwise Regression for the Sales Territory Performance Case