Title: Cost Estimation
1Cost Estimation
2Regression Day 2
- Continue multiple regression
- Assessing the validity of a regression analysis
- Omitted variable problems and multicollinearity
- Monticello
- Dummy variables
- Other factors
3Assessment
- Plausibility (face validity, smell test)
- Goodness of fit (R2)
- Confidence (F test)
- Specification tests
- Linearity
- Error independence (plots, D-W)
- Constant variance (plots)
- Expected value of error is zero (plots, alternative theories)
- Errors are normally distributed
- Independent variables are uncorrelated
- Omitted variables (biased estimates)
4Omitted variables
- The variable omitted from regression was machine hours. How much does leaving it out matter? Does it say the same thing as DLH?
- Effect on the estimated parameters of the included variables
- The omitted variable is uncorrelated with the included variables: biased intercept and std. error, but not the slope.
- The omitted variable is correlated: even the slope coefficient is biased.
5Direction of slope bias . . .
- If the omitted variable is positively related to both the included variable and the dependent variable, the estimated slope coefficient of the included variable will be too big.
- Otherwise, the estimated slope coefficient of the included variable will be too small.
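The direction rule above can be read off the standard omitted-variable bias result for a regression with one included and one omitted variable. The symbols below are notation introduced here for illustration and do not appear in the slides: x_1 is the included variable, x_2 the omitted one, and b_1 the slope estimated when x_2 is left out.

```latex
% True model: y depends on the included x_1 and the omitted x_2
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon

% Auxiliary relation between the omitted and the included variable
x_2 = \delta_0 + \delta_1 x_1 + u

% Expected slope when x_2 is left out of the regression
E[\hat{b}_1] = \beta_1 + \beta_2 \, \delta_1
```

The bias term \beta_2 \delta_1 is positive (slope too big) when the omitted variable's effect on the dependent variable and its relation to the included variable have the same sign, negative (slope too small) when the signs differ, and zero when \delta_1 = 0, which is the uncorrelated case.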
6Some problems with regression
- Nonlinear relationships
- Outliers
- Spurious relationships
- Data problems
- Inaccurate accounting cut-offs
- Inaccurate recording
- Arbitrarily allocated costs
- Missing data
- Inflation
7Group Work
8Melfort Mining Limited
- We are trying to predict overhead costs from tons of ore extracted.
- Simple regression and adequacy tests
- Interpretation
- Seasonal effects: (1) effect of seasons on the intercept, (2) effect of seasons on the slope.
- Possible trend
9Using time series data
- Common in accounting applications
- Precautions called for with long time series
- structural changes (production technology)
- inflation
- auto regression (correlated errors)
- Possible solutions
- Transform the data, or
- model the cause
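One possible way to "transform the data" for inflation is to restate each period's cost in constant dollars before fitting the regression. The sketch below assumes a price index is available for every period. The function name, the index values and the cost figures are hypothetical and are not taken from the Melfort data.

```python
def deflate(nominal_costs, price_index, base_index=100.0):
    """Restate nominal costs in base-period dollars using a price index."""
    if len(nominal_costs) != len(price_index):
        raise ValueError("need one index value per cost observation")
    return [cost * base_index / idx for cost, idx in zip(nominal_costs, price_index)]


# Hypothetical monthly overhead and a hypothetical price index (base = 100).
costs = [300000.0, 315000.0, 330750.0]
index = [100.0, 105.0, 110.25]
real_costs = deflate(costs, index)
print(real_costs)
```

In this made-up series the nominal cost rises only because of the index, so the deflated costs are flat. A regression on the nominal figures would pick up inflation as if it were a cost driver.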
10Tests of analysis adequacy
- Plausibility: Overhead = 183,681 + (40.36 × tons of ore processed)
- Linearity (see plot, slide 14)
- Goodness of fit: R2 = .21
- Low
- Standard error of the regression is 42,942.
- t-statistic = 3.6, 48 dof. Reject the null hypothesis that the slope is zero.
- Uncorrelated errors: The Durbin-Watson statistic is .0753. The critical lower limit is 1.50 and the upper limit is 1.59 (at k = 2, alpha = .05 and n = 50). The statistic is below the lower limit, so the errors appear to be positively correlated.
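As a quick consistency check on the figures above: in a simple regression with one explanatory variable, the size of the slope's t-statistic can be recovered from R2 and the number of observations. The sketch takes R2 = .21 and n = 50 (48 degrees of freedom) from the slide. The function name is illustrative only.

```python
import math


def slope_t_from_r2(r2, n):
    """Magnitude of the slope t-statistic in a one-variable regression."""
    dof = n - 2  # one slope and one intercept are estimated
    return math.sqrt(r2 / (1.0 - r2) * dof)


t = slope_t_from_r2(0.21, 50)
print(round(t, 1))  # prints 3.6
```

The result agrees with the reported t-statistic, so a low R2 and a clearly nonzero slope are not in conflict here: tons of ore matters, but it explains only a small share of the variation in overhead.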
11Regression Plot
[Figure: scatter plot of Total (220,000 to 440,000) against Tons of Ore (2,500 to 4,750) with the fitted line Y = 183680.998 + 40.326 X, R2 = .21]
12Durbin-Watson
- Measures first-order positive correlation in residuals.
- The critical values are a lower bound (dL) and an upper bound (dU).
- Below dL: there is a problem.
- Above dU: there is no problem.
- Between dL and dU the test is inconclusive.
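The bounds rule above can be sketched in a few lines. This is a minimal illustration that assumes the residuals are listed in time order. The function names and the sample residuals are hypothetical, and the bounds 1.50 and 1.59 are the values quoted in the deck for n = 50, so in practice they would have to be looked up again for a different sample size or number of variables.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den


def dw_decision(d, d_lower, d_upper):
    """Bounds test for positive first-order autocorrelation."""
    if d < d_lower:
        return "problem"
    if d > d_upper:
        return "no problem"
    return "inconclusive"


# Hypothetical residuals that drift slowly from positive to negative.
resid = [5.0, 4.0, 3.0, 1.0, -1.0, -3.0, -4.0, -5.0]
d = durbin_watson(resid)

# The statistic reported for the simple Melfort regression.
print(dw_decision(0.0753, 1.50, 1.59))  # prints problem
```

Residuals that change slowly from one period to the next give small successive differences, which pushes the statistic toward zero. Residuals with no pattern give a value near 2.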
13Inference problems due to autocorrelation
- The data set contains less information than assumed because the observations are not independent.
- Therefore, the standard error/deviation estimates are smaller than they should be, so . . .
- Confidence intervals are too narrow
- Implying more precision than is warranted.
- We will tend to reject the null hypothesis too often.
14Plot the regression residuals against a time
variable.
15Bivariate Scattergram
[Figure: Residual Total (-100,000 to 120,000) plotted against Months in Year. Annotations: "Underestimating winter" at the large positive residuals, "Overestimating summer" at the large negative residuals.]
16Model the situation
- Seasonality can be addressed with indicator (dummy) variables.
- Suppose the fixed part of overhead is different for each season?
- 1. Add an indicator for three of the seasons, or
- 2. Drop the intercept and add an indicator for each season, and
- 3. Rerun the model with 4 (or 5) independent variables.
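The two ways of coding the seasons can be sketched as follows. This is an illustration only: the function names are hypothetical, and summer is assumed to be the season left without an indicator in option 1, chosen to match the later equations in which summer carries no adjustment.

```python
SEASONS = ("summer", "spring", "fall", "winter")


def encode_with_baseline(season, baseline="summer"):
    """Option 1: keep the intercept, one indicator per non-baseline season."""
    if season not in SEASONS:
        raise ValueError(f"unknown season: {season}")
    return [1 if season == s else 0 for s in SEASONS if s != baseline]


def encode_all(season):
    """Option 2: drop the intercept, one indicator for every season."""
    if season not in SEASONS:
        raise ValueError(f"unknown season: {season}")
    return [1 if season == s else 0 for s in SEASONS]


def design_row(tons, season, drop_intercept=False):
    """One observation's independent variables: tons of ore plus indicators."""
    dummies = encode_all(season) if drop_intercept else encode_with_baseline(season)
    return [tons] + dummies


print(design_row(3000, "winter"))                       # [3000, 0, 0, 1]
print(design_row(3000, "winter", drop_intercept=True))  # [3000, 0, 0, 0, 1]
```

Option 1 gives four independent variables and option 2 gives five. Using all four indicators together with an intercept would make the indicators sum to the intercept column, which is why one of the two has to go.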
17Hypothesis
Different intercepts, but equal slopes.
[Figure: Overhead plotted against Ore processed, with a separate parallel line for each season (Winter, Spring, Fall, Summer)]
18Results
19Equations To predict . . .
For summer: Overhead = 132,476 + 38.26 × Tons of Ore
For spring: Overhead = (132,476 + 39,572) + 38.26 × Tons
For fall: Overhead = (132,476 + 75,942) + 38.26 × Tons
For winter: Overhead = (132,476 + 109,851) + 38.26 × Tons
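The four prediction equations can be wrapped in one small function. The sketch below uses the coefficients from this slide and assumes that each seasonal amount is added to the summer intercept, since the operators did not survive in the transcript. The names are illustrative.

```python
BASE_INTERCEPT = 132476.0  # summer intercept reported on the slide
SLOPE = 38.26              # common cost per ton of ore

# Assumption: seasonal amounts are additions to the summer intercept.
SEASON_SHIFT = {
    "summer": 0.0,
    "spring": 39572.0,
    "fall": 75942.0,
    "winter": 109851.0,
}


def predict_overhead(tons, season):
    """Predicted overhead with different intercepts and equal slopes."""
    return BASE_INTERCEPT + SEASON_SHIFT[season] + SLOPE * tons


print(predict_overhead(3000, "summer"))
print(predict_overhead(3000, "winter"))
```

At any tonnage the gap between two seasons is just the difference in their intercept shifts, because the slope is shared.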
22Normal Residual Plot
23Model the situation . . .
- Seasonality can affect the slope instead.
- To form variables that will capture seasonal slope changes . . .
- Create new variables by multiplying each seasonal dummy variable by the tons of ore variable.
- Enter three if tons of ore remains in the model or enter all four if you drop tons of ore.
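Building the slope variables amounts to multiplying each seasonal indicator by tons of ore. The sketch below is illustrative only: the function name is hypothetical, and summer is assumed to be the season without its own variable when tons of ore stays in the model.

```python
SEASONS = ("summer", "spring", "fall", "winter")


def slope_interactions(tons, season, keep_tons=True, baseline="summer"):
    """Seasonal dummy x tons variables for one observation.

    keep_tons=True:  tons plus three interaction terms (baseline omitted).
    keep_tons=False: four interaction terms and no separate tons variable.
    """
    if season not in SEASONS:
        raise ValueError(f"unknown season: {season}")
    if keep_tons:
        terms = [tons if season == s else 0 for s in SEASONS if s != baseline]
        return [tons] + terms
    return [tons if season == s else 0 for s in SEASONS]


print(slope_interactions(3000, "fall"))                   # [3000, 0, 3000, 0]
print(slope_interactions(3000, "fall", keep_tons=False))  # [0, 0, 3000, 0]
```

With tons of ore kept in the model, each interaction coefficient measures how much a season's slope differs from the baseline slope. With tons of ore dropped, each coefficient is that season's slope directly.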
24Hypothesis
The slopes are different, but there is only one
intercept.
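[Figure: Overhead plotted against Ore processed, with a separate line for each season (Winter, Spring, Fall, Summer) starting from a common intercept]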
25Results
26Equations To predict . . .
For summer: Overhead = 195,435 + 20.82 × Tons of Ore
For spring: Overhead = 195,435 + (20.82 + 10.51) × Tons
For fall: Overhead = 195,435 + (20.82 + 21.12) × Tons
For winter: Overhead = 195,435 + (20.82 + 30.38) × Tons
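The four equations come from a single regression in which the seasonal indicators multiply tons of ore. Written out below, with D denoting a 0/1 seasonal indicator (notation introduced here, not from the slides) and assuming the seasonal amounts are additions to the summer slope, since the operators did not survive in the transcript:

```latex
\text{Overhead} = 195{,}435
  + \left( 20.82 + 10.51\, D_{\text{spring}} + 21.12\, D_{\text{fall}} + 30.38\, D_{\text{winter}} \right) \times \text{Tons}

% Worked example under the same assumption: winter, 4,000 tons
\text{Overhead} = 195{,}435 + (20.82 + 30.38) \times 4{,}000
                = 195{,}435 + 204{,}800
                = 400{,}235
```

For a summer month every indicator is zero, which leaves the first equation with a slope of 20.82.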
27The Durbin-Watson is inconclusive
28Bivariate Scattergram
[Figure: Residual Total.3 (-10,000 to 10,000) plotted against Month (0 to 60)]
29Results
Questionable variables? Not a problem if predicting overhead (O/H) is the focus.
32Correlated errors . . .
- The D-W test is not applicable to regression equations in which a lagged value of the dependent variable takes the place of an explanatory variable.
- Plot the residuals and look for an obvious relation to model, or
- Choose a conservative alpha level.
33Monticello
- Plausibility?
- Goodness of fit (R2)
- Confidence (F test)
- Specification tests
- Linearity
- E(error) = 0
- Constant variance (error term)
- Independent errors
- Explanatory variables are uncorrelated
- Normally distributed errors.
36The end!