Title: Business Forecasting
1. Business Forecasting
- Chapter 8
- Forecasting with Multiple Regression
2. Chapter Topics
- The Multiple Regression Model
- Estimating the Multiple Regression Model
- The Least Squares Method
- The Standard Error of Estimate
- Multiple Correlation Analysis
- Partial Correlation
- Partial Coefficient of Determination
3. Chapter Topics (continued)
- Inferences Regarding Regression and Correlation Coefficients
- The F-Test
- The t-Test
- Confidence Interval
- Validation of the Regression Model for Forecasting
- Serial or Autocorrelation
4. Chapter Topics (continued)
- Equal Variances or Homoscedasticity
- Multicollinearity
- Curvilinear Regression Analysis
- The Polynomial Curve
- Application to Management
- Chapter Summary
5. The Multiple Regression Model
The relationship between one dependent and two or more independent variables is a linear function:
  Yi = β0 + β1 X1i + β2 X2i + ... + βk Xki + εi
where β0 is the population Y-intercept, β1, ..., βk are the population slopes, εi is the random error term, Y is the dependent (response) variable, and X1, ..., Xk are the independent (explanatory) variables.
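The least squares idea behind this model can be sketched in a few lines of plain Python: build the normal equations (X'X)b = X'Y and solve them. The data below are invented for illustration (Y is constructed exactly as 10 + 2·X1 − 3·X2, so the fit should recover those coefficients); this is a sketch of the method, not the deck's Excel procedure.

```python
# Least squares fit of Y = b0 + b1*X1 + b2*X2 via the normal equations
# (X'X) b = X'Y, solved with Gaussian elimination.  Data are invented:
# Y is built exactly as 10 + 2*X1 - 3*X2, so the fit recovers those values.

def fit_ols(rows, y):
    """rows: list of [1, x1, x2, ...]; y: list of responses."""
    k = len(rows[0])
    # Normal equations: A = X'X, rhs = X'Y
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        coef[i] = (rhs[i] - sum(A[i][j] * coef[j]
                                for j in range(i + 1, k))) / A[i][i]
    return coef

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [10 + 2 * a - 3 * b for a, b in zip(x1, x2)]
rows = [[1.0, a, b] for a, b in zip(x1, x2)]
b0, b1, b2 = fit_ols(rows, y)
```

In practice one would use Excel's regression tool (as the slides do) or a statistics library; the point here is only that the fitted coefficients come from solving the normal equations.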
6. Interpretation of Estimated Coefficients
- Slope (bi): the average value of Y is estimated to change by bi for each 1-unit increase in Xi, holding all other variables constant (ceteris paribus).
- Example: if b1 = -2, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature (X1), given the inches of insulation (X2).
- Y-Intercept (b0): the estimated average value of Y when all Xi = 0.
7. Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
8. Multiple Regression Equation: Example
Excel Output
For each degree increase in temperature, the
estimated average amount of heating oil used is
decreased by 4.86 gallons, holding insulation
constant.
For each increase in one inch of insulation, the
estimated average use of heating oil is decreased
by 15.07 gallons, holding temperature constant.
9. Multiple Regression Using Excel
- Stat → Regression
- Excel spreadsheet for the heating oil example.
10. Simple and Multiple Regression Compared
- Coefficients in a simple regression pick up the impact of that variable on the dependent variable, plus the impacts of other variables that are correlated with it.
- Coefficients in a multiple regression net out the impacts of the other variables in the equation.
11. Simple and Multiple Regression Compared: Example
- Two simple regressions
- Multiple regression
12. Standard Error of Estimate
- Measures the standard deviation of the residuals about the regression plane, and thus specifies the amount of error incurred when the least squares regression equation is used to predict values of the dependent variable.
- The standard error of estimate is computed with the following equation:
  s = sqrt( SSE / (n - k - 1) )
  where SSE is the error sum of squares, n the sample size, and k the number of independent variables.
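As a quick sketch of the computation (with made-up residuals, not the heating-oil data):

```python
import math

# Standard error of estimate: s = sqrt(SSE / (n - k - 1)), where SSE is the
# sum of squared residuals, n the sample size, and k the number of X variables.
# The residuals below are illustrative numbers only.

def standard_error(residuals, k):
    n = len(residuals)
    sse = sum(e * e for e in residuals)
    return math.sqrt(sse / (n - k - 1))

residuals = [1.0, -2.0, 0.5, -0.5, 2.0, -1.0]   # n = 6, SSE = 10.5
s = standard_error(residuals, k=2)               # df = 6 - 2 - 1 = 3
```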
13. Coefficient of Multiple Determination
- Proportion of total variation in Y explained by all X variables taken together:
  R² = SSR / SST
- Never decreases when a new X variable is added to the model, which is a disadvantage when comparing models.
14. Adjusted Coefficient of Multiple Determination
- Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used and the sample size:
  adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)
- Penalizes excessive use of independent variables.
- Smaller than R².
- Useful in comparing among models.
15. Coefficient of Multiple Determination (continued)
- Adjusted R² reflects the number of explanatory variables and the sample size.
- It is smaller than R².
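The adjustment can be checked directly from the deck's own figures: R² = 0.9632, k = 2, and n = 15 (implied by the df = 2 and 12 used in the F test later). A minimal sketch:

```python
# Adjusted R^2 penalizes extra explanatory variables:
#   adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
# Using the slide's figures: R^2 = 0.9632, n = 15 observations, k = 2.

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adj = adjusted_r2(0.9632, n=15, k=2)   # matches the slide's 95.71%
```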
16. Interpretation of Coefficient of Multiple Determination
- 96.32% of the total variation in heating oil can be explained by temperature and amount of insulation.
- 95.71% of the total fluctuation in heating oil can be explained by temperature and amount of insulation, after adjusting for the number of explanatory variables and sample size.
17. Using the Regression Equation to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is 6 inches.
The predicted heating oil used is 304.39 gallons.
18. Predictions Using Excel
- Stat → Regression
- Check the Confidence and Prediction Interval Estimate box.
- Excel spreadsheet for the heating oil example.
19. Residual Plots
- Residuals vs. the fitted values Ŷ: may need to transform the Y variable.
- Residuals vs. X1: may need to transform the X1 variable.
- Residuals vs. X2: may need to transform the X2 variable.
- Residuals vs. time: may have autocorrelation.
20. Residual Plots: Example
[Two residual plots: one suggests a possible non-linear relationship; the other shows no discernible pattern.]
21. Testing for Overall Significance
- Shows whether there is a linear relationship between all of the X variables together and Y.
- Uses the F test statistic.
- Hypotheses:
  H0: β1 = β2 = ... = βk = 0 (no linear relationship)
  H1: at least one βi ≠ 0 (at least one independent variable affects Y)
- The null hypothesis is a very strong statement and is almost always rejected.
22. Testing for Overall Significance (continued)
- Test statistic:
  F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))
  where F has k numerator and (n - k - 1) denominator degrees of freedom.
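Plugging in the deck's rounded figures (R² = 0.9632, n = 15, k = 2) gives an F value near the Excel output of 157.24; the small gap comes from rounding R² to four decimals. A minimal sketch:

```python
# Overall F test: F = (SSR/k) / (SSE/(n-k-1)), equivalently
#   F = (R^2 / k) / ((1 - R^2) / (n - k - 1)).
# Inputs are the slide's rounded R^2 = 0.9632 with n = 15, k = 2.

def overall_f(r2, n, k):
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_stat = overall_f(0.9632, n=15, k=2)   # close to the Excel output 157.24
```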
23. Test for Overall Significance: Excel Output Example
- The ANOVA table reports the F statistic and its p-value.
- k = 2, the number of explanatory variables.
- Total degrees of freedom = n - 1.
24. Test for Overall Significance: Example Solution
- H0: β1 = β2 = 0;  H1: at least one βi ≠ 0
- α = 0.05; df = 2 and 12; critical value: F = 3.89
- Test statistic: F = 157.24 (Excel output)
- Decision: reject H0 at α = 0.05.
- Conclusion: there is evidence that at least one independent variable affects Y.
25. Test for Significance: Individual Variables
- Shows whether there is a linear relationship between the variable Xi and Y.
- Uses the t test statistic.
- Hypotheses:
  H0: βi = 0 (no linear relationship)
  H1: βi ≠ 0 (linear relationship between Xi and Y)
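The statistic itself is just the estimated coefficient divided by its standard error. The values below for the temperature slope (b1 ≈ -4.8563, SE ≈ 0.3220) are assumptions chosen to be consistent with the slide's Excel output, not exact quotes from it:

```python
# t test for an individual slope: t = b_i / SE(b_i), with df = n - k - 1.
# b1 and se_b1 are approximate values for the temperature coefficient,
# consistent with (but not copied exactly from) the slide's Excel output.

b1, se_b1 = -4.8563, 0.3220
t_stat = b1 / se_b1   # close to the slide's -15.084
```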
26. t Test Statistic: Excel Output Example
- t test statistic for X1 (temperature)
- t test statistic for X2 (insulation)
27. t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.
- H0: β1 = 0;  H1: β1 ≠ 0
- df = 12; critical values: t = ±2.1788 (α/2 = 0.025 in each tail)
- Test statistic: t = -15.084
- Decision: reject H0 at α = 0.05.
- Conclusion: there is evidence of a significant effect of temperature on oil consumption.
28. Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope β1 (the effect of temperature on oil consumption):
  -5.56 ≤ β1 ≤ -4.15
The estimated average consumption of oil is reduced by between 4.15 gallons and 5.56 gallons for each increase of 1°F.
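The interval is b1 ± t(0.025, 12) · SE(b1). Using b1 ≈ -4.8563 and SE(b1) ≈ 0.3220 (assumed values consistent with the slide's output) and the tabled t = 2.1788:

```python
# 95% CI for a slope: b1 ± t_{0.025, n-k-1} * SE(b1).
# b1 and se_b1 are approximate values consistent with the slide's Excel
# output; t_crit = 2.1788 is the t table value for df = 12.

b1, se_b1, t_crit = -4.8563, 0.3220, 2.1788
lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1   # roughly (-5.56, -4.15), as on the slide
```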
29. Contribution of a Single Independent Variable
- Let Xk be the independent variable of interest:
  SSR(Xk | all others) = SSR(all X) - SSR(all X except Xk)
- Measures the contribution of Xk in explaining the total variation in Y.
30. Contribution of a Single Independent Variable (continued)
- SSR(all X) is taken from the ANOVA section of the regression on all the X variables; SSR(all X except Xk) from the ANOVA section of the regression that omits Xk.
- Their difference, SSR(Xk | all others), measures the contribution of Xk in explaining Y.
31. Coefficient of Partial Determination of Xk
- r²(Yk | all others) = SSR(Xk | all others) / [SST - SSR(all X) + SSR(Xk | all others)]
- Measures the proportion of variation in the dependent variable that is explained by Xk, while controlling for (holding constant) the other independent variables.
32. Coefficient of Partial Determination of Xk (continued)
- Example, model with two independent variables:
  r²(Y1 | 2) = SSR(X1 | X2) / [SST - SSR(X1, X2) + SSR(X1 | X2)]
33. Coefficient of Partial Determination in Excel
- Stat → Regression
- Check the Coefficient of partial determination box.
- Excel spreadsheet for the heating oil example.
34. Contribution of a Subset of Independent Variables
- Let Xs be the subset of independent variables of interest:
  SSR(Xs | all others) = SSR(all X) - SSR(all X except Xs)
- Measures the contribution of the subset Xs in explaining SST.
35. Contribution of a Subset of Independent Variables: Example
- Let Xs consist of X1 and X3:
  SSR(X1, X3 | X2) = SSR(X1, X2, X3) - SSR(X2)
- SSR(X1, X2, X3) is taken from the ANOVA section of the regression on all three variables; SSR(X2) from the ANOVA section of the regression on X2 alone.
36. Testing Portions of Model
- Examines the contribution of a subset Xs of explanatory variables to the relationship with Y.
- Null hypothesis: the variables in the subset do not significantly improve the model when all other variables are included.
- Alternative hypothesis: at least one variable is significant.
37. Testing Portions of Model (continued)
- One-tailed rejection region.
- Requires comparison of two regressions
- One regression includes everything.
- Another regression includes everything except the
portion to be tested.
38. Partial F Test for the Contribution of a Subset of X Variables
- Hypotheses:
  H0: the variables Xs do not significantly improve the model, given all other variables included.
  H1: the variables Xs significantly improve the model, given all other variables included.
- Test statistic:
  F = [SSR(Xs | all others) / m] / MSE(all X)
  with df = m and (n - k - 1), where m = number of variables in the subset Xs.
39. Partial F Test for the Contribution of a Single Variable Xj
- Hypotheses:
  H0: the variable Xj does not significantly improve the model, given all others included.
  H1: the variable Xj significantly improves the model, given all others included.
- Test statistic:
  F = SSR(Xj | all others) / MSE(all X)
  with df = 1 and (n - k - 1); m = 1 here.
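A minimal sketch of the partial F computation, with invented sums of squares (not the heating-oil numbers):

```python
# Partial F test for a subset of m variables:
#   F = [SSR(full) - SSR(reduced)] / m / MSE(full),  df = m and (n - k - 1).
# All sums of squares below are made up purely for illustration.

def partial_f(ssr_full, ssr_reduced, m, sse_full, n, k):
    mse_full = sse_full / (n - k - 1)          # MSE of the full model
    return ((ssr_full - ssr_reduced) / m) / mse_full

f = partial_f(ssr_full=200.0, ssr_reduced=150.0, m=1,
              sse_full=60.0, n=15, k=2)        # (50/1) / (60/12) = 10.0
```

Compare f against the F table value with df = m and (n - k - 1); reject H0 when it is larger.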
40. Testing Portions of Model: Example
Test at the α = 0.05 level to determine whether the variable average temperature significantly improves the model, given that insulation is included.
41. Testing Portions of Model: Example Solution
- H0: X1 (temperature) does not improve the model with X2 (insulation) included.
- H1: X1 does improve the model.
- α = 0.05, df = 1 and 12; critical value = 4.75
- Compare the regression for X2 alone with the regression for X1 and X2 together.
- Conclusion: reject H0; X1 does improve the model.
42. Testing Portions of Model in Excel
- Stat → Regression
- Calculations for this example are given in the spreadsheet. When using Minitab, simply check the box for the partial coefficient of determination.
- Excel spreadsheet for the heating oil example.
43. Do We Need to Do This for One Variable?
- The F test for the inclusion of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable.
- The only reason to do an F test is to test several variables together.
44. The Quadratic Regression Model
- The relationship between the response variable and the explanatory variable is a quadratic polynomial function.
- Useful when the scatter diagram indicates a non-linear relationship.
- Quadratic model:
  Yi = β0 + β1 X1i + β2 X1i² + εi
- The second explanatory variable is the square of the first variable.
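Because the quadratic model is ordinary multiple regression with X2 = X1², it can be fitted by the same normal-equations machinery. The sketch below solves the 3×3 system with Cramer's rule on data generated exactly from y = 1 + 2x + 3x², so the fit should recover those coefficients:

```python
# Quadratic regression as multiple regression with the design [1, x, x^2].
# The 3x3 normal equations (X'X) b = X'Y are solved with Cramer's rule.
# Data are generated exactly from y = 1 + 2x + 3x^2 for illustration.

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    d = det3(A)
    sol = []
    for j in range(3):
        M = [row[:] for row in A]
        for i in range(3):
            M[i][j] = b[i]          # replace column j with the RHS
        sol.append(det3(M) / d)
    return sol

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
rows = [[1.0, x, x * x] for x in xs]          # the squared term is just a column
A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
b0, b1, b2 = solve3(A, b)                     # should recover 1, 2, 3
```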
45. Quadratic Regression Model (continued)
A quadratic model may be considered when a scatter diagram of Y against X1 takes on one of the following shapes:
[Four scatter shapes: two curves with β2 > 0 and two with β2 < 0, where β2 is the coefficient of the quadratic term.]
46. Testing for Significance: Quadratic Model
- Testing for the overall relationship: similar to the test for the linear model, using the F test statistic.
- Testing the quadratic effect: compare the quadratic model
  Yi = β0 + β1 X1i + β2 X1i² + εi
  with the linear model
  Yi = β0 + β1 X1i + εi
- Hypotheses:
  H0: β2 = 0 (no quadratic term)
  H1: β2 ≠ 0 (quadratic term is needed)
47. Heating Oil Example
Determine whether a quadratic model is needed for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
48. Heating Oil Example: Residual Analysis (continued)
[Two residual plots: one suggests a possible non-linear relationship; the other shows no discernible pattern.]
49. Heating Oil Example: t Test for Quadratic Model (continued)
- Testing the quadratic effect. Model with a quadratic insulation term:
  Yi = β0 + β1 X1i + β2 X2i + β3 X2i² + εi
  Model without the quadratic insulation term:
  Yi = β0 + β1 X1i + β2 X2i + εi
- Hypotheses:
  H0: β3 = 0 (no quadratic term in insulation)
  H1: β3 ≠ 0 (quadratic term is needed in insulation)
50. Example Solution
Is a quadratic term in insulation needed for monthly consumption of heating oil? Test at α = 0.05.
- H0: β3 = 0;  H1: β3 ≠ 0
- df = 11; critical values: t = ±2.2010 (α/2 = 0.025 in each tail)
- Test statistic: t = 0.2786
- Decision: do not reject H0 at α = 0.05.
- Conclusion: there is not sufficient evidence for the need to include a quadratic effect of insulation on oil consumption.
51. Validation of the Regression Model
- Are there violations of the multiple regression assumptions?
- Linearity
- Autocorrelation
- Normality
- Homoscedasticity
52. Validation of the Regression Model (continued)
- The independent variables are nonrandom variables whose values are fixed.
- The error term has an expected value of zero.
- The independent variables are independent of each other.
53. Linearity
- How do we know if the assumption is violated?
- Perform regression analysis on the various forms of the model and observe which model fits best.
- Examine the residuals when plotted against the fitted values.
- Use the Lagrange Multiplier Test.
54. Linearity (continued)
- The linearity assumption can be met by transforming the data using any one of several transformation techniques:
- Logarithmic transformation
- Square-root transformation
- Arc-sine transformation
55. Serial or Autocorrelation
- The assumption of the independence of Y values is not met.
- A major cause of autocorrelated error terms is the misspecification of the model.
- Two approaches to determine whether autocorrelation exists:
- Examine the plot of the error terms, as well as the signs of the error terms over time.
56. Serial or Autocorrelation (continued)
- The Durbin-Watson statistic can be used as a measure of autocorrelation:
  DW = Σ(et - et-1)² / Σ et²
  where the numerator sums over t = 2, ..., n and the denominator over t = 1, ..., n.
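The statistic is straightforward to compute from the residuals (values near 2 suggest no first-order autocorrelation, values near 0 positive autocorrelation, values near 4 negative). The residuals below are illustrative:

```python
# Durbin-Watson statistic: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Alternating-sign residuals (a symptom of negative autocorrelation)
# should push DW toward 4.

def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(x * x for x in e)

dw = durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # 20/6, near 4
```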
57. Serial or Autocorrelation (continued)
- Serial correlation may be caused by a misspecification error, such as an omitted variable, or it can be caused by correlated error terms.
- Serial correlation problems can be remedied by a variety of techniques:
- Cochrane-Orcutt and Hildreth-Lu iterative procedures
58. Serial or Autocorrelation (continued)
- Generalized least squares
- Improved specification
- Various autoregressive methodologies
- First-order differences
59. Homoscedasticity
- One of the assumptions of the regression model is that the error terms all have equal variances.
- This condition of equal variance is known as homoscedasticity.
- Violation of the assumption of equal variances gives rise to the problem of heteroscedasticity.
- How do we know if we have a heteroscedastic condition?
60. Homoscedasticity
- Plot the residuals against the values of X.
- When there is a constant variance appearing as a
band around the predicted values, then we do not
have to be concerned about heteroscedasticity.
61. Homoscedasticity
[Four residual plots: one showing constant variance, three showing fluctuating variance.]
62. Homoscedasticity
- Several approaches have been developed to test for the presence of heteroscedasticity:
- Goldfeld-Quandt test
- Breusch-Pagan test
- White's test
- Engle's ARCH test
63. Homoscedasticity: Goldfeld-Quandt Test
- This test compares the variance of one part of the sample with another using the F test. To perform the test, follow these steps:
- Sort the data from low to high on the independent variable suspected of heteroscedasticity.
- Omit the d observations in the middle fifth or sixth of the sample. This results in two groups with (n - d)/2 observations each.
- Run two separate regressions, one for the low values and the other for the high values.
- Observe the error sum of squares for each group and label them SSE_L and SSE_H.
64. Homoscedasticity: Goldfeld-Quandt Test (continued)
- Compute the ratio SSE_H / SSE_L.
- If there is no heteroscedasticity, this ratio will be distributed as an F statistic with (n - d)/2 - k degrees of freedom in the numerator and denominator, where k is the number of coefficients and d the number of omitted observations.
- Reject the null hypothesis of homoscedasticity if the ratio exceeds the F table value.
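Only the final ratio step is sketched here, with made-up error sums of squares standing in for the two group regressions:

```python
# Goldfeld-Quandt ratio: after sorting on the suspect X, dropping the middle
# observations, and running separate regressions on the low and high groups,
# compare GQ = SSE_H / SSE_L against the F table value.
# The sums of squares below are invented for illustration.

def goldfeld_quandt(sse_low, sse_high):
    return sse_high / sse_low   # > F critical value => heteroscedasticity

gq = goldfeld_quandt(sse_low=40.0, sse_high=160.0)   # ratio of 4.0
```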
65. Multicollinearity
- High correlation between explanatory variables.
- The coefficient of multiple determination measures the combined effect of the correlated explanatory variables.
- Leads to unstable coefficients (large standard errors).
66. Multicollinearity
- How do we know whether we have a problem of multicollinearity?
- When a researcher observes a large coefficient of determination (R²) accompanied by statistically insignificant estimates of the regression coefficients.
- When one (or more) independent variable(s) is an exact linear combination of the others, we have perfect multicollinearity.
67. Detecting Collinearity: Variance Inflationary Factor (VIF)
- Used to measure collinearity:
  VIFj = 1 / (1 - Rj²)
  where Rj² is the coefficient of determination from regressing Xj on the other explanatory variables.
- If VIFj > 5, Xj is highly correlated with the other explanatory variables.
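The VIF formula itself is a one-liner once Rj² is available:

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the
# other explanatory variables.  The deck's rule of thumb flags VIF > 5.

def vif(r2_j):
    return 1.0 / (1.0 - r2_j)

low = vif(0.0)    # uncorrelated regressor: VIF = 1
high = vif(0.8)   # R_j^2 = 0.8 gives VIF = 5, right at the threshold
```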
68. Detecting Collinearity in Excel
- Stat → Regression
- Check the Variance Inflationary Factor (VIF) box.
- Excel spreadsheet for the heating oil example.
- Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet.
- No VIF is > 5, so there is no evidence of collinearity.
69. Chapter Summary
- Developed the Multiple Regression Model.
- Discussed Residual Plots.
- Addressed Testing the Significance of the Multiple Regression Model.
- Discussed Inferences on Population Regression Coefficients.
- Addressed Testing Portions of the Multiple Regression Model.
70. Chapter Summary (continued)
- Described the Quadratic Regression Model.
- Addressed the violations of the regression
assumptions.