Title: Forecasting
1. Forecasting Methods
- Quantitative
  - Causal
  - Time Series
    - Smoothing
    - Trend Projection
    - Trend Projection Adjusted for Seasonal Influence
- Qualitative
2. General Linear Model
- Models in which the parameters ($\beta_0, \beta_1, \ldots, \beta_p$) all have exponents of one are called linear models.
- The term "linear" does not imply that the relationship between y and the $x_i$'s is linear.
- A general linear model involving p independent variables is
  $y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_p z_p + \varepsilon$
- Each of the independent variables $z_j$ is a function of $x_1, x_2, \ldots, x_k$ (the variables for which data have been collected).
3. General Linear Model
- The simplest case is when we have collected data for just one variable $x_1$ and want to estimate y by using a straight-line relationship. In this case $z_1 = x_1$, and the model is
  $y = \beta_0 + \beta_1 x_1 + \varepsilon$
- This model is called a simple first-order model with one predictor variable.
4. Estimated Multiple Regression Equation
A simple random sample is used to compute sample statistics $b_0, b_1, b_2, \ldots, b_p$ that are used as the point estimators of the parameters $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
The estimated multiple regression equation is
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$
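5. Estimation Process
Multiple Regression Model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
Multiple Regression Equation: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$
The unknown parameters are $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.
The sample statistics $b_0, b_1, b_2, \ldots, b_p$ provide estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$.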
6. Least Squares Method
- Computation of Coefficient Values
The formulas for the regression coefficients $b_0, b_1, b_2, \ldots, b_p$ involve the use of matrix algebra. We will rely on computer software packages to perform the calculations.
7. Multiple Regression Equation
- Example: Butler Trucking Company
- To develop better work schedules, the managers want to estimate the total daily travel time for their drivers.
- Data
8. Multiple Regression Equation
9. Multiple Regression Model
- Example: Programmer Salary Survey
A software firm collected data for a sample of 20 computer programmers. A suggestion was made that regression analysis could be used to determine whether salary was related to the years of experience and the score on the firm's programmer aptitude test.
The years of experience, score on the aptitude test, and corresponding annual salary ($1000s) for the sample of 20 programmers are shown on the next slide.
10. Multiple Regression Model
Exper.  Score  Salary    Exper.  Score  Salary
4       78     24        9       88     38
7       100    43        2       73     26.6
1       86     23.7      10      75     36.2
5       82     34.3      5       81     31.6
8       86     35.8      6       74     29
10      84     38        8       87     34
0       75     22.2      4       79     30.1
1       80     23.1      6       94     33.9
6       83     30        3       70     28.2
6       91     33        3       89     30
11. Multiple Regression Model
Suppose we believe that salary (y) is related to the years of experience ($x_1$) and the score on the programmer aptitude test ($x_2$) by the following regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$
where
y = annual salary ($1000s)
$x_1$ = years of experience
$x_2$ = score on programmer aptitude test
12. Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$
- Input Data: ($x_1$, $x_2$, y) = (4, 78, 24), (7, 100, 43), ..., (3, 89, 30)
- Computer Package for Solving Multiple Regression Problems
- Least Squares Output: $b_0$, $b_1$, $b_2$, $R^2$, etc.
13. Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$
- Excel Worksheet (showing partial data entered)
Note: Rows 10-21 are not shown.
14. Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$
- Excel's Regression Dialog Box
15. Solving for the Estimates of $\beta_0$, $\beta_1$, $\beta_2$
- Excel's Regression Equation Output
Note: Columns F-I are not shown.
16. Estimated Regression Equation
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
Note: Predicted salary will be in thousands of dollars.
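The slides leave the matrix algebra to a software package. As a rough illustration of what such a package does, the sketch below solves the normal equations $(X'X)b = X'y$ with plain Gaussian elimination on the 20 survey observations. The helper names `solve` and `least_squares` are illustrative choices, not part of Excel or any other package, and a production routine would normally use a numerically sturdier factorization.

```python
# Programmer salary survey data from the table (salary in $1000s).
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
SALARY = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(columns, y):
    """Return [b0, b1, ..., bp] from the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = least_squares([EXPER, SCORE], SALARY)
print(f"SALARY = {b0:.3f} + {b1:.3f}(EXPER) + {b2:.3f}(SCORE)")
```

The fitted coefficients should agree with the Excel output to the three decimals reported on the slide.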
17. Interpreting the Coefficients
In multiple regression analysis, we interpret each regression coefficient as follows:
$b_i$ represents an estimate of the change in y corresponding to a 1-unit increase in $x_i$ when all other independent variables are held constant.
18. Interpreting the Coefficients
$b_1$ = 1.404
Salary is expected to increase by $1,404 for each additional year of experience (when the variable score on programmer aptitude test is held constant).
19. Interpreting the Coefficients
$b_2$ = 0.251
Salary is expected to increase by $251 for each additional point scored on the programmer aptitude test (when the variable years of experience is held constant).
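The "held constant" reading of the coefficients can be checked numerically. The sketch below evaluates the estimated equation for one programmer and then changes one input at a time. The profile (5 years of experience, a score of 80) is a hypothetical example, not a row singled out by the slides.

```python
def predicted_salary(exper, score):
    """Estimated salary in $1000s from SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)."""
    return 3.174 + 1.404 * exper + 0.251 * score


base = predicted_salary(5, 80)        # hypothetical programmer
more_exper = predicted_salary(6, 80)  # one more year, same score
more_score = predicted_salary(5, 81)  # one more point, same experience

# Each difference equals the corresponding coefficient (in $1000s).
print(round(more_exper - base, 3))
print(round(more_score - base, 3))
```

Because the model is linear in each variable, the two differences do not depend on which starting profile is chosen.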
20. Multiple Coefficient of Determination
- Relationship Among SST, SSR, SSE
SST = SSR + SSE
where
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
21. Multiple Coefficient of Determination
(Output figure with callouts marking SSR and SST)
22. Multiple Coefficient of Determination
$R^2$ = SSR/SST
$R^2$ = 500.3285/599.7855 = .83418
- In general, $R^2$ increases as independent variables are added to the model; it never decreases.
- Adjusting $R^2$ for the number of independent variables avoids overestimating the impact of adding an independent variable.
23. Adjusted Multiple Coefficient of Determination
$R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1}$
- n denoting the number of observations
- p denoting the number of independent variables
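As a quick arithmetic check, the sketch below computes $R^2$ from the sums of squares quoted on the slides and then applies the standard adjustment $R_a^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)$, with n = 20 observations and p = 2 independent variables from the programmer salary example.

```python
SSR = 500.3285  # sum of squares due to regression (from the slides)
SST = 599.7855  # total sum of squares (from the slides)
n, p = 20, 2    # observations, independent variables

r2 = SSR / SST
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(r2, 5), round(adj_r2, 5))
```

The adjusted value comes out slightly below $R^2$, which is the penalty for the two independent variables in the model.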
24. Adjusted Multiple Coefficient of Determination
- Excel's Regression Statistics
25. Assumptions About the Error Term $\varepsilon$
The error $\varepsilon$ is a random variable with mean of zero.
The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for all values of the independent variables.
The values of $\varepsilon$ are independent.
The error $\varepsilon$ is a normally distributed random variable reflecting the deviation between the y value and the expected value of y given by $\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$.
26. Multiple Regression Analysis with Two Independent Variables
27. Testing for Significance
In simple linear regression, the F and t tests provide the same conclusion.
In multiple regression, the F and t tests have different purposes.
28. Testing for Significance: F Test
The F test is used to determine whether a significant relationship exists between the dependent variable and the set of all the independent variables.
The F test is referred to as the test for overall significance.
29. Testing for Significance: t Test
If the F test shows an overall significance, the t test is used to determine whether each of the individual independent variables is significant.
A separate t test is conducted for each of the independent variables in the model.
We refer to each of these t tests as a test for individual significance.
30. Testing for Significance: F Test
Hypotheses
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_a$: One or more of the parameters is not equal to zero.
Test Statistic
F = MSR/MSE
Rejection Rule
Reject $H_0$ if p-value < $\alpha$ or if F > $F_\alpha$, where $F_\alpha$ is based on an F distribution with p d.f. in the numerator and n - p - 1 d.f. in the denominator.
31. Testing for Significance: F Test
- ANOVA Table for a Multiple Regression Model with p Independent Variables
32. F Test for Overall Significance
Hypotheses
$H_0: \beta_1 = \beta_2 = 0$
$H_a$: One or both of the parameters is not equal to zero.
Rejection Rule
For $\alpha$ = .05 and d.f. = 2, 17: $F_{.05}$ = 3.59
Reject $H_0$ if p-value < .05 or F > 3.59
33. F Test for Overall Significance
(Callout on the output: p-value used to test for overall significance)
34. F Test for Overall Significance
Test Statistic
F = MSR/MSE = 250.16/5.85 = 42.76
Conclusion
p-value < .05, so we can reject $H_0$. (Also, F = 42.76 > 3.59)
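The F statistic above can be rebuilt from the sums of squares reported earlier in the deck. The sketch assumes the usual definitions MSR = SSR/p and MSE = SSE/(n - p - 1), and takes the critical value 3.59 from the slides rather than computing it from the F distribution.

```python
SST = 599.7855  # total sum of squares (from the slides)
SSR = 500.3285  # sum of squares due to regression (from the slides)
n, p = 20, 2    # observations, independent variables

SSE = SST - SSR              # sum of squares due to error
MSR = SSR / p                # mean square due to regression
MSE = SSE / (n - p - 1)      # mean square error, 17 d.f.
F = MSR / MSE

F_CRIT = 3.59                # F.05 with 2 and 17 d.f., as quoted on the slides
reject_h0 = F > F_CRIT

print(round(MSR, 2), round(MSE, 2), round(F, 2), reject_h0)
```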
35. Testing for Significance: t Test
Hypotheses
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
Test Statistic
$t = b_i / s_{b_i}$
Rejection Rule
Reject $H_0$ if p-value < $\alpha$ or if t < $-t_{\alpha/2}$ or t > $t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with n - p - 1 degrees of freedom.
36. t Test for Significance of Individual Parameters
Hypotheses
$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$
Rejection Rule
For $\alpha$ = .05 and d.f. = 17, $t_{.025}$ = 2.11
Reject $H_0$ if p-value < .05, or if t < -2.11 or t > 2.11
37. t Test for Significance of Individual Parameters
- Excel's Regression Equation Output
Note: Columns F-I are not shown.
(Callout on the output: t statistic and p-value used to test for the individual significance of Experience)
38. t Test for Significance of Individual Parameters
- Excel's Regression Equation Output
Note: Columns F-I are not shown.
(Callout on the output: t statistic and p-value used to test for the individual significance of Test Score)
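39. t Test for Significance of Individual Parameters
Test Statistics
$t = b_i / s_{b_i}$ for each coefficient, with values taken from the Excel output.
Conclusions
Reject both $H_0: \beta_1 = 0$ and $H_0: \beta_2 = 0$. Both independent variables are significant.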
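The t statistics themselves appear only in the Excel output, which is not reproduced here. As a hedged reconstruction, the sketch below recomputes them from the 20 survey observations using the closed-form two-predictor solution: slopes and standard errors come from the sums of squares and cross-products of deviations. The helper `cross` and the variable names are illustrative.

```python
from math import sqrt

# Programmer salary survey data (salary in $1000s).
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
SALARY = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n, p = len(SALARY), 2
s11, s22, s12 = cross(EXPER, EXPER), cross(SCORE, SCORE), cross(EXPER, SCORE)
s1y, s2y, syy = cross(EXPER, SALARY), cross(SCORE, SALARY), cross(SALARY, SALARY)

det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det   # slope for experience
b2 = (s11 * s2y - s12 * s1y) / det   # slope for test score

mse = (syy - b1 * s1y - b2 * s2y) / (n - p - 1)  # SSE / 17
t1 = b1 / sqrt(mse * s22 / det)      # t statistic for experience
t2 = b2 / sqrt(mse * s11 / det)      # t statistic for test score

T_CRIT = 2.11  # t.025 with 17 d.f., as quoted on the slides
print(round(t1, 2), round(t2, 2), abs(t1) > T_CRIT, abs(t2) > T_CRIT)
```

Both statistics exceed 2.11 in absolute value, which matches the conclusion that both independent variables are significant.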
40. Testing for Significance: Multicollinearity
The term multicollinearity refers to the correlation among the independent variables.
When the independent variables are highly correlated (say, |r| > .7), it is not possible to determine the separate effect of any particular independent variable on the dependent variable.
41. Testing for Significance: Multicollinearity
If the estimated regression equation is to be used only for predictive purposes, multicollinearity is usually not a serious problem.
Every attempt should be made to avoid including independent variables that are highly correlated.
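One simple way to screen for multicollinearity is to compute the sample correlation between pairs of independent variables and compare it with the .7 rule of thumb. The sketch below does this for experience and test score in the programmer salary data; the function name `correlation` is an illustrative choice.

```python
from math import sqrt

EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]


def correlation(u, v):
    """Pearson sample correlation coefficient between two lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


r = correlation(EXPER, SCORE)
highly_correlated = abs(r) > 0.7  # rule of thumb from the slides

print(round(r, 3), highly_correlated)
```

For these data the correlation is well below .7, so the rule of thumb does not flag a problem between the two predictors.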
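42. Modeling Curvilinear Relationships
- Setting $z_1 = x_1$ and $z_2 = x_1^2$ in the general linear model gives
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$
- This model is called a second-order model with one predictor variable.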
43. Modeling Curvilinear Relationships
- Example: Reynolds, Inc.
- Managers at Reynolds want to investigate the relationship between length of employment of their salespeople and the number of electronic laboratory scales sold.
- Data
44. Modeling Curvilinear Relationships
- Scatter Diagram for the Reynolds Example
45. Modeling Curvilinear Relationships
- Let us consider a simple first-order model. The estimated regression equation is
  Sales = 111 + 2.38 Months
- where
  Sales = number of electronic laboratory scales sold,
  Months = the number of months the salesperson has been employed
46. Modeling Curvilinear Relationships
- MINITAB output: first-order model
47. Modeling Curvilinear Relationships
- Standardized residual plot: first-order model
- The standardized residual plot suggests that a curvilinear relationship is needed.
48. Modeling Curvilinear Relationships
- Reynolds Example: the second-order model
- The estimated regression equation is
  Sales = 45.3 + 6.34 Months - .0345 MonthsSq
- where
  Sales = number of electronic laboratory scales sold,
  Months = the number of months the salesperson has been employed,
  MonthsSq = the square of the number of months the salesperson has been employed
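To see how the squared term changes the predictions, the sketch below evaluates both Reynolds equations side by side. The operators in the estimated equations were lost in the slide text, so the signs used here (positive on Months, negative on MonthsSq) are an assumption consistent with a curve that flattens as employment lengthens. The month values in the loop are arbitrary illustration points.

```python
def first_order(months):
    """Sales = 111 + 2.38 Months."""
    return 111 + 2.38 * months


def second_order(months):
    """Sales = 45.3 + 6.34 Months - .0345 MonthsSq (sign of last term assumed)."""
    months_sq = months ** 2  # the MonthsSq variable is just Months squared
    return 45.3 + 6.34 * months - 0.0345 * months_sq


for months in (10, 50, 100):
    print(months, round(first_order(months), 1), round(second_order(months), 1))
```

Note that MonthsSq is not new data: it is computed from Months, so the second-order model is still linear in its parameters.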
49. Modeling Curvilinear Relationships
- MINITAB output: second-order model
50. Modeling Curvilinear Relationships
- Standardized residual plot: second-order model
51. Variable Selection Procedures
- Stepwise Regression
- Forward Selection
- Backward Elimination
These procedures are iterative: one independent variable at a time is added or deleted based on the F statistic.
52. Variable Selection: Stepwise Regression
- Start with no independent variables in the model.
- Compute the F statistic and p-value for each independent variable in the model.
- Any p-value > alpha to remove?
  - Yes: the independent variable with the largest p-value is removed from the model; recompute for the variables still in the model.
  - No: compute the F statistic and p-value for each independent variable not in the model.
- Any p-value < alpha to enter?
  - Yes: the independent variable with the smallest p-value is entered into the model; return to the removal check.
  - No: stop.
53. Variable Selection: Forward Selection
- Start with no independent variables in the model.
- Compute the F statistic and p-value for each independent variable not in the model.
- Any p-value < alpha to enter?
  - Yes: the independent variable with the smallest p-value is entered into the model; repeat the previous step.
  - No: stop.
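The forward selection loop can be sketched in a few lines. To stay within the standard library, this version swaps the slide's "p-value < alpha to enter" check for a fixed F-to-enter cutoff of 4.0, which is an assumption made here and only roughly corresponds to alpha = .05. The partial F statistic compares the SSE before and after adding one candidate. All function names are illustrative.

```python
EXPER = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
SCORE = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
SALARY = [24, 43, 23.7, 34.3, 35.8, 38, 22.2, 23.1, 30, 33,
          38, 26.6, 36.2, 31.6, 29, 34, 30.1, 33.9, 28.2, 30]
CANDIDATES = {"EXPER": EXPER, "SCORE": SCORE}


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Sum of squared errors of the least squares fit of y on the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


def forward_selection(candidates, y, f_to_enter=4.0):
    """Add one variable at a time while the best partial F is at least f_to_enter."""
    selected = []
    current_sse = sse([], y)  # intercept-only model, so this equals SST
    while len(selected) < len(candidates):
        best = None  # (partial F, name, SSE) of the strongest remaining candidate
        for name in candidates:
            if name in selected:
                continue
            trial_sse = sse([candidates[s] for s in selected + [name]], y)
            df_error = len(y) - len(selected) - 2  # n - (variables in trial) - 1
            f_stat = (current_sse - trial_sse) / (trial_sse / df_error)
            if best is None or f_stat > best[0]:
                best = (f_stat, name, trial_sse)
        if best[0] < f_to_enter:
            break  # no remaining variable qualifies: stop
        selected.append(best[1])
        current_sse = best[2]
    return selected


print(forward_selection(CANDIDATES, SALARY))
```

Because a single candidate is tested at each step, the variable with the largest partial F is also the one with the smallest p-value, so the ordering matches the slide's rule even though the stopping check is simplified.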
54. Variable Selection: Backward Elimination
- Start with all independent variables in the model.
- Compute the F statistic and p-value for each independent variable in the model.
- Any p-value > alpha to remove?
  - Yes: the independent variable with the largest p-value is removed from the model; repeat the previous step.
  - No: stop.
55. Qualitative Independent Variables
In many situations we must work with qualitative independent variables such as gender (male, female), method of payment (cash, check, credit card), etc.
For example, $x_2$ might represent gender, where $x_2$ = 0 indicates male and $x_2$ = 1 indicates female.
In this case, $x_2$ is called a dummy or indicator variable.
56. Qualitative Independent Variables
- Example: Programmer Salary Survey
- As an extension of the problem involving the computer programmer salary survey, suppose that management also believes that the annual salary is related to whether the individual has a graduate degree in computer science or information systems.
- The years of experience, the score on the programmer aptitude test, whether the individual has a relevant graduate degree, and the annual salary ($1000s) for each of the sampled 20 programmers are shown on the next slide.
57. Qualitative Independent Variables
Exper.  Score  Degr.  Salary    Exper.  Score  Degr.  Salary
4       78     No     24        9       88     Yes    38
7       100    Yes    43        2       73     No     26.6
1       86     No     23.7      10      75     Yes    36.2
5       82     Yes    34.3      5       81     No     31.6
8       86     Yes    35.8      6       74     No     29
10      84     Yes    38        8       87     Yes    34
0       75     No     22.2      4       79     No     30.1
1       80     No     23.1      6       94     Yes    33.9
6       83     No     30        3       70     No     28.2
6       91     Yes    33        3       89     No     30
58. Estimated Regression Equation
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$
where $x_3$ is a dummy variable.
59. Qualitative Independent Variables
- Excel's Regression Statistics
60. Qualitative Independent Variables
61. Qualitative Independent Variables
- Excel's Regression Equation Output
Note: Columns F-I are not shown.
(Callout on the output: Not significant)
62. More Complex Qualitative Variables
- If a qualitative variable has k levels, k - 1 dummy variables are required, with each dummy variable being coded as 0 or 1.
For example, a variable with levels A, B, and C could be represented by $x_1$ and $x_2$ values of (0, 0) for A, (1, 0) for B, and (0, 1) for C.
Care must be taken in defining and interpreting the dummy variables.
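The k - 1 coding rule can be written as a small helper. In this sketch the first level listed is treated as the baseline and is coded as all zeros, which reproduces the A, B, C example above. The function name `dummy_code` is illustrative, and choosing the first level as the baseline is a convention assumed here.

```python
def dummy_code(value, levels):
    """Return k - 1 indicator values (0 or 1); the first level is the baseline."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(1 if value == level else 0 for level in levels[1:])


LEVELS = ["A", "B", "C"]
for level in LEVELS:
    print(level, dummy_code(level, LEVELS))

# A two-level variable such as the graduate degree column needs one dummy.
print(dummy_code("Yes", ["No", "Yes"]))
```

Whichever level is chosen as the baseline, each dummy coefficient is then interpreted as a difference from that baseline level.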
63. More Complex Qualitative Variables
For example, a variable indicating level of education could be represented by $x_1$ and $x_2$ values as follows:
64. Interaction
- If the original data set consists of observations for y and two independent variables $x_1$ and $x_2$, we might develop a second-order model with two predictor variables:
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon$
- In this model, the variable $z_5 = x_1 x_2$ is added to account for the potential effects of the two variables acting together.
- This type of effect is called interaction.
65. Interaction
- Example: Tyler Personal Care
- For new shampoo products, the two factors believed to have the most influence on sales are unit selling price and advertising expenditure.
- Data
66. Interaction
- Mean Unit Sales (1000s) for the Tyler Personal Care Example
- At higher selling prices, the effect of increased advertising expenditure diminishes. These observations provide evidence of interaction between the price and advertising expenditure variables.
67. Interaction
- Mean Sales as a Function of Selling Price and Advertising Expenditure
68. Interaction
- To account for the effect of interaction, use the following regression model:
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
- where
  y = unit sales (1000s),
  $x_1$ = price ($),
  $x_2$ = advertising expenditure ($1000s).
69. Interaction
- General Linear Model involving three independent variables ($z_1$, $z_2$, and $z_3$):
  $y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \beta_3 z_3 + \varepsilon$
- where
  y = Sales = unit sales (1000s)
  $z_1 = x_1$ (Price) = price of the product ($)
  $z_2 = x_2$ (AdvExp) = advertising expenditure ($1000s)
  $z_3 = x_1 x_2$ (PriceAdv) = interaction term (Price times AdvExp)
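In practice the interaction term is just one more column computed from the data before the regression is run. The sketch below builds the predictor row (1, $z_1$, $z_2$, $z_3$) for one observation. The price and advertising values are hypothetical placeholders, since the Tyler data table is not reproduced in this text.

```python
def interaction_row(price, adv_exp):
    """Return [1, z1, z2, z3] with z1 = Price, z2 = AdvExp, z3 = Price * AdvExp."""
    z1 = price
    z2 = adv_exp
    z3 = price * adv_exp  # PriceAdv, the interaction term
    return [1.0, z1, z2, z3]


# Hypothetical observation: price of $2.00, advertising expenditure of $50 thousand.
row = interaction_row(2.0, 50.0)
print(row)
```

With $z_3$ in the model, the change in expected sales per unit of advertising is $\beta_2 + \beta_3 x_1$, so it depends on the price level, which is what interaction means here.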
70. Interaction
- MINITAB Output for the Tyler Personal Care Example