Title: Multiple Regression
Multiple Regression
- Selecting the Best Equation
Techniques for Selecting the "Best" Regression Equation
- The best regression equation is not necessarily the equation that explains most of the variance in Y (the highest $R^2$).
- The equation with the highest $R^2$ will always be the one with all the variables included.
- The best equation should also be simple and interpretable (i.e. contain a small number of variables).
- Simple (interpretable) and reliable are opposing criteria.
- The best equation is a compromise between these two.
We will discuss several strategies for selecting the best equation:
- All Possible Regressions
  - Uses $R^2$, $s^2$, Mallows $C_p$
  - $C_p = RSS_p / s^2_{\mathrm{complete}} - [n - 2(p+1)]$
- "Best Subset" Regression
  - Uses $R^2$, $R_a^2$, Mallows $C_p$
- Backward Elimination
- Stepwise Regression
An Example
- In this example the following four chemicals are measured:
  - X1 = amount of tricalcium aluminate, 3 CaO · Al2O3
  - X2 = amount of tricalcium silicate, 3 CaO · SiO2
  - X3 = amount of tetracalcium alumino ferrite, 4 CaO · Al2O3 · Fe2O3
  - X4 = amount of dicalcium silicate, 2 CaO · SiO2
- Y = heat evolved in calories per gram of cement.
The data are given below. [Data table not reproduced.]
I. All Possible Regressions
- Suppose we have the p independent variables X1, X2, ..., Xp.
- Then there are $2^p$ subsets of variables.
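Variables in the equation and the corresponding model (shown for p = 3):
- no variables: $Y = \beta_0 + \varepsilon$
- X1: $Y = \beta_0 + \beta_1 X_1 + \varepsilon$
- X2: $Y = \beta_0 + \beta_2 X_2 + \varepsilon$
- X3: $Y = \beta_0 + \beta_3 X_3 + \varepsilon$
- X1, X2: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$
- X1, X3: $Y = \beta_0 + \beta_1 X_1 + \beta_3 X_3 + \varepsilon$
- X2, X3: $Y = \beta_0 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$
- X1, X2, X3: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$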
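The enumeration of subsets can be sketched in a few lines of Python. This is only an illustration: the helper name `all_subsets` is hypothetical, and the variables are represented by plain string labels.

```python
from itertools import combinations


def all_subsets(names):
    """Return every subset of `names`, from no variables up to all of them."""
    return [
        subset
        for size in range(len(names) + 1)
        for subset in combinations(names, size)
    ]


subsets = all_subsets(["X1", "X2", "X3"])
for subset in subsets:
    # Build the right-hand side of the model, e.g. "b1 X1 + b3 X3".
    terms = " + ".join(f"b{name[1:]} {name}" for name in subset)
    print("Y = b0" + (" + " + terms if terms else "") + " + e")
```

With three variables the list has 2^3 = 8 entries, starting with the model that contains only the constant term.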
Use of $R^2$
- 1. Assume we carry out $2^p$ runs, one for each of the subsets. Divide the runs into the following sets:
  - Set 0: no variables.
  - Set 1: one independent variable.
  - ...
  - Set p: p independent variables.
- 2. Order the runs in each set according to $R^2$.
- 3. Examine the leaders in each set, looking for consistent patterns; take into account the correlation between independent variables.
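The three steps can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: the data are synthetic (not the cement data), and the function names `r_squared` and `runs_by_set` are hypothetical.

```python
from itertools import combinations

import numpy as np


def r_squared(X, y, cols):
    """R^2 from regressing y on an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - rss / tss


def runs_by_set(X, y):
    """Return {set size: [(R^2, cols), ...]}, each set ordered by decreasing R^2."""
    p = X.shape[1]
    sets = {}
    for size in range(p + 1):
        runs = [(r_squared(X, y, cols), cols) for cols in combinations(range(p), size)]
        sets[size] = sorted(runs, reverse=True)
    return sets


# Synthetic illustration data: only the first two columns affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 2.0 + 1.5 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(scale=0.5, size=30)

sets = runs_by_set(X, y)
for size, runs in sets.items():
    best_r2, best_cols = runs[0]  # the leader of this set
    print(f"Set {size}: columns {best_cols}, 100 R^2 = {100 * best_r2:.1f}")
```

The leader of each set can never have a lower $R^2$ than the leader of the previous set, which is why the point where $R^2$ stops improving noticeably matters more than the maximum itself.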
Example (k = 4): X1, X2, X3, X4
- Variables in the leading runs of each set, with $100 R^2$ in parentheses:
  - Set 1: X4 (67.5)
  - Set 2: X1, X2 (97.9); X1, X4 (97.2)
  - Set 3: X1, X2, X4 (98.234)
  - Set 4: X1, X2, X3, X4 (98.237)
- Examination of the correlation coefficients reveals a high correlation between X1 and X3 ($r_{13} = -0.824$) and between X2 and X4 ($r_{24} = -0.973$).
- Best equation: $Y = \beta_0 + \beta_1 X_1 + \beta_4 X_4 + \varepsilon$
Use of $R^2$
The number of variables required, p, coincides with where $R^2$ begins to level out.
Use of the Residual Mean Square (RMS) ($s^2$)
- When all of the variables having a non-zero effect have been included in the model, the residual mean square is an estimate of $\sigma^2$.
- If "significant" variables have been left out, then the RMS will be biased upward.
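The upward bias can be seen in a small simulation. The sketch below assumes synthetic data with a known error variance of 1 and two variables that both have a non-zero effect; the function name `residual_mean_square` is hypothetical.

```python
import numpy as np


def residual_mean_square(X, y, cols):
    """RSS / (n - p - 1) for a model with an intercept and the p columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return rss / (len(y) - len(cols) - 1)


# Synthetic data (an assumption for illustration): true sigma^2 = 1,
# and both columns have a non-zero effect on y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 1.0 + 3.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

rms_full = residual_mean_square(X, y, [0, 1])  # all effective variables included
rms_reduced = residual_mean_square(X, y, [0])  # second variable left out
print(f"RMS with both variables: {rms_full:.2f}")
print(f"RMS with one variable left out: {rms_reduced:.2f}")
```

The first value should sit near the true variance, while the second absorbs the omitted variable's contribution and comes out larger.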
Residual mean squares $s^2(p)$ by number of variables p:
- p = 1: RMS 115.06, 82.39, 176.31, 80.35; average 113.53
- p = 2: RMS 5.79, 122.71, 7.48, 86.59, 17.57; average 47.00
- p = 3: RMS 5.35, 5.33, 5.65, 8.20; average 6.13
- p = 4: RMS 5.98; average 5.98
- Runs X1, X2 and X1, X4: $s^2$ is approximately 6.
Use of $s^2$
The number of variables required, p, coincides with where $s^2$ levels out.
Use of Mallows $C_p$
- If the equation with p variables is adequate, then both $s^2_{\mathrm{complete}}$ and $RSS_p/(n-p-1)$ will be estimating $\sigma^2$.
- If "significant" variables have been left out, then the RMS will be biased upward.
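- Then, for an adequate equation, $RSS_p \approx (n-p-1)\,\sigma^2$ and $s^2_{\mathrm{complete}} \approx \sigma^2$, so
  $C_p = RSS_p / s^2_{\mathrm{complete}} - [n - 2(p+1)] \approx (n-p-1) - [n - 2(p+1)] = p + 1$.
- Thus if we plot, for each run, $C_p$ vs p and look for $C_p$ close to p + 1, we will be able to identify models giving a reasonable fit.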
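A minimal sketch of the $C_p$ calculation for every run, using $C_p = RSS_p / s^2_{\mathrm{complete}} - [n - 2(p+1)]$ with p the number of variables in the run. The data are synthetic and the names `rss` and `mallows_cp_table` are hypothetical.

```python
from itertools import combinations

import numpy as np


def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))


def mallows_cp_table(X, y):
    """Return (cols, p + 1, Cp) for every subset of the columns of X."""
    n, k = X.shape
    # s^2 from the complete equation, i.e. the one containing all k variables.
    s2_complete = rss(X, y, range(k)) / (n - k - 1)
    rows = []
    for p in range(k + 1):
        for cols in combinations(range(k), p):
            cp = rss(X, y, cols) / s2_complete - (n - 2 * (p + 1))
            rows.append((cols, p + 1, cp))
    return rows


# Synthetic illustration data: only the first two columns affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = 2.0 + 1.5 * X[:, 0] + 0.7 * X[:, 1] + rng.normal(scale=0.5, size=30)

rows = mallows_cp_table(X, y)
for cols, target, cp in rows:
    print(f"columns {cols}: Cp = {cp:.1f}, p + 1 = {target}")
```

By construction the complete equation always has $C_p$ exactly equal to p + 1, so it is the smaller runs with $C_p$ near p + 1 that are informative.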
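$C_p$ for each run, with the target value p + 1 (runs are labelled by the variables they contain, e.g. 12 = X1, X2):
- no variables: 443.2 (p + 1 = 1)
- 1, 2, 3, 4: 202.5, 142.5, 315.2, 138.7 (p + 1 = 2)
- 12, 13, 14: 2.7, 198.1, 5.5 (p + 1 = 3)
- 23, 24, 34: 62.4, 138.2, 22.4 (p + 1 = 3)
- 123, 124, 134, 234: 3.0, 3.0, 3.5, 7.5 (p + 1 = 4)
- 1234: 5.0 (p + 1 = 5)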
Use of $C_p$
[Plot of $C_p$ against p.] The number of variables required, p, coincides with where $C_p$ becomes close to p + 1.
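As a worked check, the $C_p$ values tabulated for the example can be screened for runs whose $C_p$ is close to p + 1. The sketch below copies those values; the tolerance of 1.0 used to define "close" is an assumption made here for illustration, not part of the original example.

```python
# Cp values copied from the table of runs; each key lists the variables in the run.
cp_values = {
    "": 443.2,
    "1": 202.5, "2": 142.5, "3": 315.2, "4": 138.7,
    "12": 2.7, "13": 198.1, "14": 5.5,
    "23": 62.4, "24": 138.2, "34": 22.4,
    "123": 3.0, "124": 3.0, "134": 3.5, "234": 7.5,
    "1234": 5.0,
}

TOLERANCE = 1.0  # assumed cut-off for "close"; not from the original example


def close_to_target(run, cp, tol=TOLERANCE):
    """True when Cp is within `tol` of p + 1, with p = number of variables in the run."""
    return abs(cp - (len(run) + 1)) <= tol


adequate = [run for run, cp in cp_values.items() if close_to_target(run, cp)]
print(adequate)  # prints ['12', '123', '124', '134', '1234']
```

The smallest run that passes the screen is the one containing X1 and X2.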
II. "Best Subset" Regression
- Similar to all possible regressions.
- If p, the number of variables, is large, then the number of runs performed, $2^p$, could be extremely large.
- In this algorithm the user supplies the value K and the algorithm identifies the best K subsets of X1, X2, ..., Xp for predicting Y.
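The idea of returning the best K subsets can be sketched by brute force. Two assumptions are made here: subsets are ranked by adjusted $R^2$, taken as $R_a^2 = 1 - \frac{RSS_p/(n-p-1)}{TSS/(n-1)}$, and every non-empty subset is scored, which a practical best-subset routine would try to avoid. The names `adjusted_r2` and `best_k_subsets` are hypothetical and the data are synthetic.

```python
import heapq
from itertools import combinations

import numpy as np


def adjusted_r2(X, y, cols):
    """Adjusted R^2 for an intercept plus the columns in `cols`."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (rss / (n - len(cols) - 1)) / (tss / (n - 1))


def best_k_subsets(X, y, K):
    """Brute-force search: score every non-empty subset and keep the K best."""
    p = X.shape[1]
    subsets = (
        cols for size in range(1, p + 1) for cols in combinations(range(p), size)
    )
    return heapq.nlargest(K, subsets, key=lambda cols: adjusted_r2(X, y, cols))


# Synthetic illustration data: only the first two of five columns affect y.
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 5))
y = 2.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=40)

top = best_k_subsets(X, y, K=3)
for cols in top:
    print(cols, round(adjusted_r2(X, y, cols), 4))
```

Ranking by plain $R^2$ would not work for subsets of different sizes, because it always favours the larger subset; adjusted $R^2$ or $C_p$ penalises the extra variables.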
III. Backward Elimination
- In this procedure the complete regression equation is determined, containing all the variables X1, X2, ..., Xp.
- Then variables are checked one at a time and the
least significant is dropped from the model at
each stage.
- The procedure is terminated when all of the
variables remaining in the equation provide a
significant contribution to the prediction of the
dependent variable Y.
The precise algorithm proceeds as follows:
- 1. Fit a regression equation containing all the variables.
- 2. A partial F-test is computed for each of the independent variables still in the equation.
The partial F statistic is
$F = \frac{RSS_2 - RSS_1}{MSE_1}$
where $RSS_1$ = the residual sum of squares with all variables that are presently in the equation, $RSS_2$ = the residual sum of squares with one of the variables removed, and $MSE_1$ = the mean square for error with all variables that are presently in the equation.
- 3. The lowest partial F value is compared with $F_\alpha$ for some pre-specified $\alpha$.
If $F_{\mathrm{Lowest}} \le F_\alpha$, then remove that variable and return to step 2.
If $F_{\mathrm{Lowest}} > F_\alpha$, then accept the equation as it stands.
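The three steps can be sketched as a short loop. This is an illustration under stated assumptions: the data are synthetic, the names `rss` and `backward_elimination` are hypothetical, and the critical value is passed in as a function `f_crit(df)` of the error degrees of freedom. The demo uses a constant placeholder of 4.0 rather than a value looked up from an F table.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))


def backward_elimination(X, y, f_crit):
    """Return the column indices kept by backward elimination.

    `f_crit(df)` must return the critical value F_alpha(1, df) for the error
    degrees of freedom `df` of the current equation.
    """
    n = len(y)
    kept = list(range(X.shape[1]))  # step 1: all variables in the equation
    while kept:
        df_error = n - len(kept) - 1
        rss1 = rss(X, y, kept)
        mse1 = rss1 / df_error
        # Step 2: partial F for each variable still in the equation.
        partial_f = {
            j: (rss(X, y, [c for c in kept if c != j]) - rss1) / mse1 for j in kept
        }
        weakest = min(partial_f, key=partial_f.get)
        # Step 3: compare the lowest partial F with the critical value.
        if partial_f[weakest] > f_crit(df_error):
            break  # every remaining variable is significant
        kept.remove(weakest)
    return kept


# Synthetic illustration data: only the first two columns affect y.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 1.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=200)

kept = backward_elimination(X, y, f_crit=lambda df: 4.0)  # placeholder threshold
print("Columns kept (0-based):", kept)
```

Passing the critical value as a function keeps the sketch free of any particular F-table source while still allowing the threshold to change as variables are removed and the error degrees of freedom grow.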
Example (k = 4) (same example as before): X1, X2, X3, X4
1. X1, X2, X3, X4 in the equation.
The lowest partial F = 0.018 (X3) is compared with $F_\alpha(1,8) = 3.46$ for $\alpha = 0.10$.
Remove X3.
2. X1, X2, X4 in the equation.
The lowest partial F = 1.86 (X4) is compared with $F_\alpha(1,9) = 3.36$ for $\alpha = 0.10$.
Remove X4.
3. X1, X2 in the equation.
The partial F values for both X1 and X2 exceed $F_\alpha(1,10) = 3.29$ for $\alpha = 0.10$.
The equation is accepted as it stands:
$\hat{Y} = 52.58 + 1.47 X_1 + 0.66 X_2$
Note: "F to remove" = partial F.
IV. Stepwise Regression
- In this procedure the regression equation initially contains no variables.
- Variables are then checked one at a time using the partial correlation coefficient as a measure of importance in predicting the dependent variable Y.
- At each stage the variable with the highest significant partial correlation coefficient is added to the model.
- Once this has been done, the partial F statistic is computed for all variables now in the model, to check if any of the variables previously added can now be deleted.
- This procedure is continued until no further variables can be added or deleted from the model.
- The partial correlation coefficient for a given variable is the correlation between the given variable and the response when the present independent variables in the equation are held fixed.
- It is also the correlation between the residuals of the given variable and the residuals of the response, both computed from fitting an equation with the present independent variables in the equation.
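A minimal sketch of the partial correlation, assuming the usual residual-based definition: both Y and the candidate variable are regressed on the variables already in the equation, and the two residual series are then correlated. The names `residuals` and `partial_correlation` are hypothetical and the data are synthetic.

```python
import numpy as np


def residuals(X, v, cols):
    """Residuals of `v` after regressing it on an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(v))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta


def partial_correlation(X, y, j, present):
    """Correlation of y and column j with the `present` columns held fixed."""
    e_y = residuals(X, y, present)
    e_x = residuals(X, X[:, j], present)
    return float(np.corrcoef(e_x, e_y)[0, 1])


# Synthetic illustration data with correlated predictors.
rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
X[:, 1] += 0.8 * X[:, 0]
y = 1.0 + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=50)

r = partial_correlation(X, y, j=1, present=[0])
print(f"Partial correlation of y with column 1, given column 0: {r:.3f}")
```

With no variables in the equation, the same function reduces to the ordinary correlation between Y and the candidate variable, which matches the first step of the stepwise procedure.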
Example (k = 4) (same example as before): X1, X2, X3, X4
1. With no variables in the equation.
The correlation of each independent variable with the dependent variable Y is computed.
The highest significant correlation (r = -0.821) is with variable X4.
Thus the decision is made to include X4.
Regress Y on X4: the regression is significant, thus we keep X4.
2. Compute partial correlation coefficients of Y with all other independent variables given X4 in the equation.
The highest partial correlation is with the variable X1 ($r^2_{Y1.4} = 0.915$).
Thus the decision is made to include X1.
Regress Y on X1, X4: $R^2 = 0.972$, F = 176.63.
Check to see if variables in the equation can be eliminated:
For X1 the partial F value = 108.22 ($F_{0.10}(1,8) = 3.46$). Retain X1.
For X4 the partial F value = 154.295 ($F_{0.10}(1,8) = 3.46$). Retain X4.
3. Compute partial correlation coefficients of Y with all other independent variables given X4 and X1 in the equation.
The highest partial correlation is with the variable X2 ($r^2_{Y2.14} = 0.358$). Thus the decision is made to include X2.
Regress Y on X1, X2, X4: $R^2 = 0.982$.
Check to see if variables in the equation can be eliminated:
The lowest partial F value = 1.863 for X4, which is below $F_{0.10}(1,9) = 3.36$. Remove X4, leaving X1 and X2.
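The add-then-recheck cycle of stepwise regression can be sketched as follows. Several simplifications are assumed: the entry and removal thresholds `f_in` and `f_out` are fixed constants (the example above instead uses $F_{0.10}$ values that depend on the degrees of freedom), the candidate to enter is chosen by its partial F, which ranks candidates in the same order as the squared partial correlation, and the data are synthetic. The names `rss`, `partial_f` and `stepwise` are hypothetical.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))


def partial_f(X, y, cols, j):
    """Partial F for column j within the model `cols` (j must be in `cols`)."""
    rss1 = rss(X, y, cols)
    rss2 = rss(X, y, [c for c in cols if c != j])
    return (rss2 - rss1) / (rss1 / (len(y) - len(cols) - 1))


def stepwise(X, y, f_in, f_out, max_steps=50):
    """Stepwise selection with fixed thresholds; assumes f_in >= f_out."""
    model = []
    for _ in range(max_steps):  # cap on iterations as a safeguard
        changed = False
        # Entry: try the candidate with the largest partial F.
        outside = [j for j in range(X.shape[1]) if j not in model]
        if outside:
            best = max(outside, key=lambda j: partial_f(X, y, model + [j], j))
            if partial_f(X, y, model + [best], best) > f_in:
                model.append(best)
                changed = True
        # Removal: re-test every variable now in the equation.
        if model:
            worst = min(model, key=lambda j: partial_f(X, y, model, j))
            if partial_f(X, y, model, worst) <= f_out:
                model.remove(worst)
                changed = True
        if not changed:
            break
    return sorted(model)


# Synthetic illustration data: only the first two columns affect y.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 4))
y = 1.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(size=200)

selected = stepwise(X, y, f_in=4.0, f_out=4.0)  # placeholder thresholds
print("Columns selected (0-based):", selected)
```

Unlike backward elimination, a variable that entered early can be dropped later, as happens to X4 in the worked example once X1 and X2 are both in the equation.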
Examples
- Using Statistical Packages