Title: Heteroskedasticity
Chapter 5
(Figure: a regression line)
What is in this Chapter?
- How do we detect this problem?
- What are the consequences of this problem?
- What are the solutions?
- First, we discuss tests based on OLS residuals: the likelihood ratio test, the Goldfeld-Quandt (G-Q) test, and the Breusch-Pagan (B-P) test. The last one is an LM test.
- Regarding consequences, we show that the OLS estimators are unbiased but inefficient and that the standard errors are also biased, thus invalidating tests of significance.
- Regarding solutions, we discuss solutions that depend on particular assumptions about the error variance, as well as general solutions.
- We also discuss transformation of variables to logs and the problems associated with deflators, both of which are commonly used as solutions to the heteroskedasticity problem.
5.1 Introduction
- Homoskedasticity: the variance of the error terms is constant.
- Heteroskedasticity: the variance of the error terms is non-constant.
- Illustrative Example
- Table 5.1 presents consumption expenditures (y) and income (x) for 20 families.
- Suppose that we estimate the equation by ordinary least squares. We get (figures in parentheses are standard errors):
- $\hat{y} = 0.847 + 0.899\,x$, $R^2 = 0.986$, $\text{RSS} = 31.074$
- (standard errors: 0.703 for the intercept, 0.0253 for the slope)
- The residuals from this equation are presented in Table 5.3.
- In this situation there is no perceptible increase in the magnitudes of the residuals as the value of x increases.
- Thus there does not appear to be a heteroskedasticity problem.
5.2 Detection of Heteroskedasticity
- In the illustrative example in Section 5.1 we plotted the estimated residuals $\hat{u}_i$ against $x_i$ to see whether we notice any systematic pattern in the residuals that suggests heteroskedasticity in the errors.
- Note, however, that by virtue of the normal equations, $\hat{u}_i$ and $x_i$ are uncorrelated, though $\hat{u}_i^2$ could be correlated with $x_i$.
- Thus if we are using a regression procedure to test for heteroskedasticity, we should use a regression of $|\hat{u}_i|$ on $x_i$ or a regression of $\hat{u}_i^2$ on $x_i$ (a sketch of this diagnostic follows below).
- In the case of multiple regression, we should use powers of $\hat{y}_i$, the predicted value of $y_i$, or powers of all the explanatory variables.
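A minimal sketch of this residual diagnostic in Python (statsmodels), using synthetic data in place of Table 5.1, which is not reproduced here; the data-generating process is an illustrative assumption, not the chapter's data.

```python
# Informal check: regress the squared OLS residuals on x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.linspace(1, 20, 20)
y = 1.0 + 0.9 * x + rng.normal(scale=0.1 * x)   # error s.d. grows with x

X = sm.add_constant(x)
u_hat = sm.OLS(y, X).fit().resid

# A significant slope in this auxiliary regression suggests heteroskedasticity.
aux = sm.OLS(u_hat**2, X).fit()
print("slope t-value:", aux.tvalues[1])
```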
- The test suggested by Anscombe and a test called RESET suggested by Ramsey both involve regressing $\hat{u}_i$ on powers of $\hat{y}_i$ and testing whether or not the coefficients are significant.
- The test suggested by White involves regressing $\hat{u}_i^2$ on all the explanatory variables and their squares and cross products. For instance, with explanatory variables $x_1$, $x_2$, $x_3$, it involves regressing $\hat{u}_i^2$ on $x_1, x_2, x_3, x_1^2, x_2^2, x_3^2, x_1x_2, x_1x_3, x_2x_3$.
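A sketch of White's test using the statsmodels implementation, which performs exactly this auxiliary regression (squared residuals on the regressors, their squares, and cross products); the two-regressor data below are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.uniform(1, 10, n), rng.uniform(1, 10, n)
y = 2 + 0.5 * x1 + 0.8 * x2 + rng.normal(scale=0.3 * x1)  # variance rises with x1

X = sm.add_constant(np.column_stack([x1, x2]))
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(resid, X)
print(f"LM = {lm_stat:.2f}, p = {lm_pvalue:.4f}")  # small p => heteroskedasticity
```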
- Glejser suggested estimating regressions of the type
- $|\hat{u}_i| = \gamma_0 + \gamma_1 x_i + v_i$
- $|\hat{u}_i| = \gamma_0 + \gamma_1 \sqrt{x_i} + v_i$
- $|\hat{u}_i| = \gamma_0 + \gamma_1 (1/x_i) + v_i$
- and so on, and testing the hypothesis $\gamma_1 = 0$ in each case.
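A sketch of Glejser's tests under the same kind of synthetic setup: each auxiliary regression of $|\hat{u}_i|$ is run in turn and the slope is t-tested.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(1, 20, 100)
y = 1 + 0.9 * x + rng.normal(scale=0.2 * x)

u_abs = np.abs(sm.OLS(y, sm.add_constant(x)).fit().resid)

# Try x, sqrt(x), and 1/x as proxies for the unknown f(z).
for name, z in [("x", x), ("sqrt(x)", np.sqrt(x)), ("1/x", 1.0 / x)]:
    aux = sm.OLS(u_abs, sm.add_constant(z)).fit()
    print(f"|u| on {name}: slope t = {aux.tvalues[1]:.2f}")
```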
- The implicit assumption behind all these tests is that $\operatorname{var}(u_i) = \sigma_i^2 = f(z_i)$, where $z_i$ is an unknown variable and the different tests use different proxies or surrogates for the unknown function $f(z)$.
- Thus there is evidence of heteroskedasticity even in the log-linear form, although, casually looking at the residuals in Table 5.3, we concluded earlier that the errors were homoskedastic.
- The Goldfeld-Quandt test, to be discussed later in this section, also did not reject the hypothesis of homoskedasticity.
- The Glejser tests, however, show significant heteroskedasticity in the log-linear form.
Assignment
- Redo this illustrative example:
- Plot the absolute value of the residuals against the x variable
- Linear form
- Log-linear form
- Run the three types of tests
- Linear form and log-linear form
- Produce the EViews table
- Reject/accept the null hypothesis of homogeneous variance
5.2 Detection of Heteroskedasticity
- Some Other Tests (General tests)
- Likelihood Ratio Test
- Goldfeld and Quandt Test
- Breusch-Pagan Test
- Likelihood Ratio Test
- If the number of observations is large, one can use a likelihood ratio test.
- Divide the residuals (estimated from the OLS regression) into $k$ groups with $n_i$ observations in the $i$th group, $\sum n_i = n$.
- Estimate the error variances in each group by $\hat{\sigma}_i^2$.
- Let the estimate of the error variance from the entire sample be $\hat{\sigma}^2$. Then if we define $\lambda$ as
- $\lambda = \prod_{i=1}^{k} \left(\hat{\sigma}_i^2\right)^{n_i/2} \big/ \left(\hat{\sigma}^2\right)^{n/2}$
- then $-2\log_e \lambda$ has a $\chi^2$ distribution with $(k-1)$ degrees of freedom (a sketch follows below).
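A sketch of this likelihood ratio test with $k = 4$ groups of pooled-regression residuals; the grouping and data are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(1, 20, 120))
y = 1 + 0.9 * x + rng.normal(scale=0.2 * x)

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
k = 4
groups = np.array_split(resid, k)      # residuals grouped by the size of x

n = len(resid)
sigma2 = np.mean(resid**2)             # whole-sample variance estimate
# -2 log(lambda) = n log(sigma^2) - sum_i n_i log(sigma_i^2)
lr = n * np.log(sigma2) - sum(len(g) * np.log(np.mean(g**2)) for g in groups)
print("p-value:", stats.chi2.sf(lr, df=k - 1))
```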
- Goldfeld and Quandt Test
- If we do not have large samples, we can use the Goldfeld and Quandt test.
- In this test we split the observations into two groups, one corresponding to large values of x and the other corresponding to small values of x.
- Fit separate regressions for each group and then apply an F-test to test the equality of the error variances.
- Goldfeld and Quandt suggest omitting some observations in the middle to increase our ability to discriminate between the two error variances (a sketch follows below).
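A sketch using statsmodels' implementation of the Goldfeld-Quandt test; here the observations are ordered by x and 20% of the middle observations are dropped, both illustrative choices.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_goldfeldquandt

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(1, 20, 60))    # order observations by x
y = 1 + 0.9 * x + rng.normal(scale=0.2 * x)

X = sm.add_constant(x)
F, p, _ = het_goldfeldquandt(y, X, drop=0.2)   # omit the middle 20%
print(f"F = {F:.2f}, p = {p:.4f}")
```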
- Breusch-Pagan Test
- Suppose that $\sigma_i^2 = f(\alpha_0 + \alpha_1 z_{1i} + \cdots + \alpha_r z_{ri})$, where $z_1, z_2, \ldots, z_r$ are variables that influence the error variance.
- The Breusch and Pagan test is a test of the hypothesis $H_0\colon \alpha_1 = \alpha_2 = \cdots = \alpha_r = 0$.
- The function $f(\cdot)$ can be any function.
- For instance, $f(x)$ can be $x^2$, $e^x$, and so on.
- The Breusch and Pagan test does not depend on the functional form.
- Let $S_0$ = the regression sum of squares from a regression of $\hat{u}_i^2$ on $z_1, z_2, \ldots, z_r$, and let $\tilde{\sigma}^2 = \sum \hat{u}_i^2 / n$.
- Then $\lambda = S_0 / (2\tilde{\sigma}^4)$ has a $\chi^2$ distribution with d.f. $r$.
- This test is an asymptotic test. An intuitive justification for the test will be given after an illustrative example.
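A sketch of the Breusch-Pagan test via statsmodels. Note that het_breuschpagan computes the studentized (Koenker) variant of the LM statistic rather than the $S_0/(2\tilde{\sigma}^4)$ form above; the two are asymptotically equivalent under normal errors.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
x = rng.uniform(1, 20, 100)
y = 1 + 0.9 * x + rng.normal(scale=0.1 * (1 + x))   # variance driven by z = x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

lm, lm_p, fval, f_p = het_breuschpagan(resid, X)    # columns of X are the z's
print(f"LM = {lm:.2f}, p = {lm_p:.4f}")
```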
- Illustrative Example
- Consider the data in Table 5.1. To apply the Goldfeld-Quandt test we consider two groups of 10 observations each, ordered by the values of the variable x.
- The first group consists of observations 6, 11, 9, 4, 14, 15, 19, 20, 1, and 16.
- The second group consists of the remaining 10.
- The estimated equations were:
- Group 1: $\hat{y} = 1.0533 + 0.876\,x$, $R^2 = 0.985$, $\hat{\sigma}_1^2 = 0.475$ (standard errors 0.616 and 0.038)
- Group 2: $\hat{y} = 3.279 + 0.835\,x$, $R^2 = 0.904$, $\hat{\sigma}_2^2 = 3.154$ (standard errors 3.443 and 0.096)
- The F-ratio for the test is $F = \hat{\sigma}_2^2/\hat{\sigma}_1^2 = 3.154/0.475 = 6.64$.
- The 1% point for the F-distribution with d.f. 8 and 8 is 6.03.
- Thus the F-value is significant at the 1% level and we reject the hypothesis of homoskedasticity.
- For the log-linear form the corresponding estimates were:
- Group 1: $\widehat{\log y} = 0.128 + 0.934\,\log x$, $R^2 = 0.992$, $\hat{\sigma}_1^2 = 0.001596$ (standard errors 0.079 and 0.030)
- Group 2: $\widehat{\log y} = 0.276 + 0.902\,\log x$, $R^2 = 0.912$, $\hat{\sigma}_2^2 = 0.002789$ (standard errors 0.352 and 0.099)
- The F-ratio for the test is $F = 0.002789/0.001596 = 1.75$.
- For d.f. 8 and 8, the 5% point from the F-tables is 3.44.
- Thus if we use the 5% significance level, we reject the hypothesis of homoskedasticity in the linear form but do not reject it in the log-linear form.
- Note that the White test rejected the hypothesis in both forms.
- Turning now to the Breusch and Pagan test, the regression of $\hat{u}_i^2$ on the variance-related regressors gave the following regression sums of squares.
- For the linear form:
- $S_0 = 40.842$ for the first regression
- $S_0 = 40.065$ for the second regression
- Also $\tilde{\sigma}^2 = \text{RSS}/n = 31.074/20 = 1.554$.
- The test statistic for the $\chi^2$-test is (using the second regression) $\lambda = S_0/(2\tilde{\sigma}^4) = 40.065/(2 \times 1.554^2) \approx 8.3$.
- We use this statistic as a $\chi^2$ with d.f. 2, since two slope parameters are estimated.
- This is significant at the 5% level, thus rejecting the hypothesis of homoskedasticity.
- For the log-linear form, using only $\log x$ and $(\log x)^2$ as regressors, we get $S_0 = 0.000011$.
- The test statistic is again $\lambda = S_0/(2\tilde{\sigma}^4)$.
- Using the $\chi^2$-tables with d.f. 2, we see that this is not significant even at the 50% level.
- Thus the test does not reject the hypothesis of homoskedasticity in the log-linear form.
5.3 Consequences of Heteroskedasticity
- To see this, consider a very simple model with no constant term:
- $y_i = \beta x_i + u_i$, $\operatorname{var}(u_i) = \sigma_i^2 \qquad (5.1)$
- The least squares estimator of $\beta$ is
- $\hat{\beta} = \frac{\sum x_i y_i}{\sum x_i^2} = \beta + \frac{\sum x_i u_i}{\sum x_i^2}$
- If the $x_i$ are independent of the $u_i$, we have $E\left(\sum x_i u_i\right) = 0$ and hence $E(\hat{\beta}) = \beta$.
- Thus $\hat{\beta}$ is unbiased.
- If the $u_i$ are mutually independent, denoting $\operatorname{var}(\hat{\beta})$ by $V(\hat{\beta})$, we can write
- $V(\hat{\beta}) = \frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2} \qquad (5.2)$
- Then dividing (5.1) by $\sigma_i$, we have the model
- $\frac{y_i}{\sigma_i} = \beta \frac{x_i}{\sigma_i} + v_i \qquad (5.3)$
- where $v_i = u_i/\sigma_i$ has a constant variance 1.
- Since we are weighting the $i$th observation by $1/\sigma_i$, the OLS estimation of (5.3) is called weighted least squares (WLS); a sketch follows below.
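A sketch of WLS for the no-constant model above, assuming $\sigma_i$ is known up to scale (here $\sigma_i \propto x_i$, an illustrative assumption); statsmodels takes weights proportional to $1/\sigma_i^2$.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(1, 20, 100)
sigma_i = 0.2 * x                            # assumed form of sigma_i
y = 0.9 * x + rng.normal(scale=sigma_i)      # model with no constant term

ols = sm.OLS(y, x).fit()
wls = sm.WLS(y, x, weights=1.0 / sigma_i**2).fit()
print("OLS slope, SE:", ols.params[0], ols.bse[0])
print("WLS slope, SE:", wls.params[0], wls.bse[0])
```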
- If $\tilde{\beta}$ is the WLS estimator of $\beta$, we have
- $\tilde{\beta} = \frac{\sum (x_i/\sigma_i)(y_i/\sigma_i)}{\sum (x_i/\sigma_i)^2} = \beta + \frac{\sum x_i u_i/\sigma_i^2}{\sum x_i^2/\sigma_i^2}$
- and since the latter term has expectation zero, we have $E(\tilde{\beta}) = \beta$.
- Thus the WLS estimator is also unbiased.
- We will show that $\tilde{\beta}$ is more efficient than the OLS estimator $\hat{\beta}$.
- We have
- $V(\tilde{\beta}) = \frac{1}{\sum x_i^2/\sigma_i^2}$
- and, using equation (5.2), we have
- Thus
- $\frac{V(\tilde{\beta})}{V(\hat{\beta})} = \frac{\left(\sum x_i^2\right)^2}{\left(\sum x_i^2 \sigma_i^2\right)\left(\sum x_i^2/\sigma_i^2\right)}$
- This expression is of the form $\frac{\left(\sum a_i b_i\right)^2}{\left(\sum a_i^2\right)\left(\sum b_i^2\right)}$,
- where $a_i = x_i \sigma_i$ and $b_i = x_i/\sigma_i$.
- An example using Gauss
- By the Cauchy-Schwarz inequality, this ratio is less than 1 and is equal to 1 only if $a_i$ and $b_i$ are proportional, that is, $x_i\sigma_i$ and $x_i/\sigma_i$ are proportional, or $\sigma_i^2$ is a constant, which is the case if the errors are homoskedastic.
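A small Monte Carlo sketch of this efficiency result: across replications, the sampling variance of the WLS slope is smaller than that of the OLS slope, as the Cauchy-Schwarz bound implies. The data-generating values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1, 20, 50)
sigma_i = 0.3 * x
w = 1.0 / sigma_i**2

ols_draws, wls_draws = [], []
for _ in range(2000):
    y = 0.9 * x + rng.normal(scale=sigma_i)
    ols_draws.append((x @ y) / (x @ x))              # OLS slope
    wls_draws.append((w * x) @ y / ((w * x) @ x))    # WLS slope
print("var(OLS) =", np.var(ols_draws))
print("var(WLS) =", np.var(wls_draws))               # smaller variance
```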
- Thus the OLS estimator is unbiased but inefficient (has a higher variance) relative to the WLS estimator.
- Turning now to the estimation of the variance of $\hat{\beta}$, it is estimated by $\hat{V}(\hat{\beta}) = \frac{\text{RSS}}{(n-1)\sum x_i^2}$, where RSS is the residual sum of squares from the OLS model.
- But $E(\text{RSS}) = \sum \sigma_i^2 - \frac{\sum x_i^2 \sigma_i^2}{\sum x_i^2}$.
- Thus we estimate the variance of $\hat{\beta}$ by an expression whose expected value is $\frac{1}{(n-1)\sum x_i^2}\left(\sum \sigma_i^2 - \frac{\sum x_i^2\sigma_i^2}{\sum x_i^2}\right)$,
- whereas the true variance is $\frac{\sum x_i^2 \sigma_i^2}{\left(\sum x_i^2\right)^2}$.
- Thus the estimated variances are also biased.
- If $\sigma_i^2$ and $x_i^2$ are positively correlated, as is often the case with economic data, so that $\frac{1}{n}\sum x_i^2\sigma_i^2 > \left(\frac{1}{n}\sum x_i^2\right)\left(\frac{1}{n}\sum \sigma_i^2\right)$, then the expected value of the estimated variance is smaller than the true variance.
- Thus we would be underestimating the true variance of the OLS estimator and getting shorter confidence intervals than the true ones.
- This also affects tests of hypotheses about $\beta$.
- The solution to the heteroskedasticity problem depends on the assumptions we make about the sources of heteroskedasticity.
- When we are not sure of this, we can at least try to make corrections to the standard errors, since we have seen that the least squares estimator is unbiased but inefficient and, moreover, the standard errors are also biased.
- White suggests that we use the formula (5.2) with $\hat{u}_i^2$ substituted for $\sigma_i^2$ (a sketch follows below).
- Using this formula we find that, in the case of the illustrative example with the data in Table 5.1, the standard error of $\hat{\beta}$, the slope coefficient, is 0.027.
- Earlier, we estimated it from the OLS regression as 0.0253.
- Thus the difference is really not very large in this example.
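A sketch of White's correction via statsmodels, whose HC0 covariance substitutes $\hat{u}_i^2$ for $\sigma_i^2$ in the variance formula, as in (5.2); the data are again synthetic.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.uniform(1, 20, 100)
y = 1 + 0.9 * x + rng.normal(scale=0.2 * x)

X = sm.add_constant(x)
usual = sm.OLS(y, X).fit()                  # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC0")   # White-corrected standard errors
print("usual slope SE :", usual.bse[1])
print("robust slope SE:", robust.bse[1])
```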
5.4 Solutions to the Heteroskedasticity Problem
- There are two types of solutions that have been suggested in the literature for the problem of heteroskedasticity:
- Solutions dependent on particular assumptions about $\sigma_i$.
- General solutions.
- We first discuss category 1. Here we have two methods of estimation: weighted least squares (WLS) and maximum likelihood (ML).
- For example, if $\sigma_i^2 = \sigma^2 x_i^2$, we divide the equation $y_i = \alpha + \beta x_i + u_i$ throughout by $x_i$ and estimate
- $\frac{y_i}{x_i} = \alpha \frac{1}{x_i} + \beta + \frac{u_i}{x_i}$
- Thus the constant term in this equation is the slope coefficient in the original equation.
- Prais and Houthakker found in their analysis of family budget data that the errors from the equation had variance increasing with household income.
- They considered a model in which $\sigma_i^2 = \sigma^2 [E(y_i)]^2$, that is, $\sigma_i$ is proportional to $\alpha + \beta x_i$.
- In this case we cannot divide the whole equation by a known constant as before.
- For this model we can consider a two-step procedure as follows.
- First estimate $\alpha$ and $\beta$ by OLS.
- Let these estimators be $\hat{\alpha}$ and $\hat{\beta}$.
- Now use the WLS procedure as outlined earlier, that is, regress $\frac{y_i}{\hat{\alpha}+\hat{\beta}x_i}$ on $\frac{1}{\hat{\alpha}+\hat{\beta}x_i}$ and $\frac{x_i}{\hat{\alpha}+\hat{\beta}x_i}$ with no constant term (a sketch follows below).
- The limitation of the two-step procedure is that the error involved in the first step will affect the second step.
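A sketch of the two-step procedure under the Prais-Houthakker assumption $\sigma_i \propto \alpha + \beta x_i$; the data-generating values are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.uniform(1, 20, 200)
alpha, beta = 1.0, 0.9
y = alpha + beta * x + rng.normal(scale=0.1 * (alpha + beta * x))

# Step 1: OLS estimates of alpha and beta.
a_hat, b_hat = sm.OLS(y, sm.add_constant(x)).fit().params
sigma_hat = a_hat + b_hat * x                # estimated weights

# Step 2: regress y/sigma_hat on 1/sigma_hat and x/sigma_hat, no constant.
Z = np.column_stack([1.0 / sigma_hat, x / sigma_hat])
step2 = sm.OLS(y / sigma_hat, Z).fit()
print(step2.params)  # estimates of alpha and beta (SEs valid asymptotically)
# Iterating: recompute sigma_hat from step2.params and repeat to convergence.
```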
- This procedure is called a two-step weighted least squares procedure.
- The standard errors we get for the estimates of $\alpha$ and $\beta$ from this procedure are valid only asymptotically.
- They are asymptotic standard errors because the weights have been estimated.
- One can iterate this WLS procedure further, that is, use the new estimates of $\alpha$ and $\beta$ to construct new weights and then use the WLS procedure again, repeating until convergence.
- This procedure is called the iterated weighted least squares procedure. However, there is no gain in (asymptotic) efficiency by iteration.
- If we make some specific assumptions about the errors, say that they are normal,
- we can use the maximum likelihood method, which is more efficient than WLS if the errors are normal.
- Illustrative Example
- As an illustration, again consider the data in Table 5.1. We saw earlier that regressing the absolute values of the residuals on x (in Glejser's tests) gave estimates $\hat{\alpha}$ and $\hat{\beta}$ of the relation $\sigma_i = \alpha + \beta x_i$.
- Now we regress $\frac{y_i}{\hat{\sigma}_i}$ on $\frac{1}{\hat{\sigma}_i}$ and $\frac{x_i}{\hat{\sigma}_i}$ (with no constant term), where $\hat{\sigma}_i = \hat{\alpha} + \hat{\beta} x_i$.
- The resulting equation is
- If we assume that $\operatorname{var}(u_i) = \sigma^2(\alpha + \beta x_i)^2$, the two-step WLS procedure would be as follows.
- Next we compute $\hat{\sigma}_i = \hat{\alpha} + \hat{\beta} x_i$
- and regress $y_i/\hat{\sigma}_i$ on $1/\hat{\sigma}_i$ and $x_i/\hat{\sigma}_i$. The results were
- The $R^2$'s in these equations are not comparable. But our interest is in the estimates of the parameters in the consumption function.
Assignment
- Use the data of Table 5.1 to do the WLS
- Consider the log-linear form
- Run Glejser's tests to check whether the log-linear regression model still has non-constant variance
- Estimate the non-constant variance and run the WLS
- Write a one-step program in Gauss
5.4 Solutions to the Heteroskedasticity Problem
- Comparing the results with the OLS estimates presented in Section 5.2, we notice that the estimates of $\alpha$ are higher than the OLS estimates, the estimates of $\beta$ are lower, and the standard errors are lower.
5.5 Heteroskedasticity and the Use of Deflators
- There are two remedies often suggested and used for solving the heteroskedasticity problem:
- Transforming the data to logs.
- Deflating the variables by some measure of "size."
- One important thing to note is that the purpose in all these procedures of deflation is to get more efficient estimates of the parameters.
- But once those estimates have been obtained, one should make all inferences (calculation of the residuals, prediction of future values, calculation of elasticities at the means, etc.) from the original equation, not the equation in the deflated variables.
- Another point to note is that since the purpose of deflation is to get more efficient estimates, it is tempting to argue about the merits of the different procedures by looking at the standard errors of the coefficients.
- However, this is not correct, because in the presence of heteroskedasticity the standard errors themselves are biased, as we showed earlier.
- For instance, in the five equations presented above, the second and third are comparable, and so are the fourth and fifth.
- In both cases, if we look at the standard errors of the coefficient of X, the coefficient in the undeflated equation has a smaller standard error than the corresponding coefficient in the deflated equation.
- However, if the standard errors are biased, we have to be careful in making too much of these differences.
- In the preceding example we considered miles M as a deflator and also as an explanatory variable.
- In this context we should mention some discussion in the literature on "spurious correlation" between ratios.
- The argument simply is that even if we have two variables X and Y that are uncorrelated, if we deflate both variables by another variable Z, there could be a strong correlation between X/Z and Y/Z because of the common denominator Z.
- It is wrong to infer from this correlation that there exists a close relationship between X and Y.
- Of course, if our interest is in fact in the relationship between X/Z and Y/Z, there is no reason why this correlation need be called "spurious."
- As Kuh and Meyer point out, "The question of spurious correlation quite obviously does not arise when the hypothesis to be tested has initially been formulated in terms of ratios, for instance, in problems involving relative prices.
- Similarly, when a series such as money value of output is divided by a price index to obtain a 'constant dollar' estimate of output, no question of spurious correlation need arise.
- Thus, spurious correlation can only exist when a hypothesis pertains to undeflated variables and the data have been divided through by another series for reasons extraneous to but not in conflict with the hypothesis framed as an exact, i.e., nonstochastic relation."
- In summary, deflated or ratio variables are often used in econometric work to solve the heteroskedasticity problem.
- Deflation can sometimes be justified on purely economic grounds, as in the case of the use of "real" quantities and relative prices.
- In this case all the inferences from the estimated equation will be based on the equation in the deflated variables.
- However, if deflation is used to solve the heteroskedasticity problem, any inferences we make have to be based on the original equation, not the equation in the deflated variables.
- In any case, deflation may increase or decrease the resulting correlations, but this is beside the point. Since the correlations are not comparable anyway, one should not draw any inferences from them.
- Illustrative Example
- In Table 5.5 we present data on
- y = population density
- x = distance from the central business district
- for 39 census tracts in the Baltimore area in 1970. It has been suggested (this is called the density gradient model) that population density follows the relationship
- $y = A e^{-\gamma x}$
- where A is the density of the central business district.
- The basic hypothesis is that as you move away from the central business district, population density drops off.
- For estimation purposes we take logs and write
- $\log y = \alpha + \beta x + u$
- where $\alpha = \log A$ and $\beta = -\gamma$.
- Estimation of this equation by OLS gave the following results (figures in parentheses are t-values, not standard errors):
- The t-values are very high, and the coefficients $\hat{\alpha}$ and $\hat{\beta}$ are significantly different from zero (with a significance level of less than 1%). The sign of $\hat{\beta}$ is negative, as expected.
- With cross-sectional data like these we expect heteroskedasticity, and this could result in an underestimation of the standard errors (and thus an overestimation of the t-ratios).
- To check whether there is heteroskedasticity, we have to analyze the estimated residuals $\hat{u}_i$.
- A plot of $|\hat{u}_i|$ against $x_i$ showed a positive relationship, and hence Glejser's tests were applied.
- Defining $z_i = |\hat{u}_i|$, the following equations were estimated:
- We choose the specification that gives the highest $R^2$ or, equivalently, the highest t-value, since the two are monotonically related in the case of only one regressor.
- The estimated regressions, with t-values in parentheses, were:
- All the t-statistics are significant, indicating the presence of heteroskedasticity.
- Based on the highest t-ratio, we chose the second specification (although the fourth specification is equally valid).
- Deflating throughout by the chosen $\hat{\sigma}_i$ gives the regression equation to be estimated.
- The estimates were (figures in parentheses are t-ratios):
5.6 Testing the Linear Versus Log-Linear Functional Form
- When comparing the linear with the log-linear forms, we cannot compare the $R^2$'s because $R^2$ is the ratio of explained variance to total variance, and the variances of $y$ and $\log y$ are different.
- Comparing $R^2$'s in this case is like comparing two individuals A and B, where A eats 65% of a carrot cake and B eats 70% of a strawberry cake.
- The comparison does not make sense because there are two different cakes.
- The Box-Cox Test
- One solution to this problem is to consider a more general model of which both the linear and log-linear forms are special cases. Box and Cox consider the transformation
- $y(\lambda) = \frac{y^{\lambda} - 1}{\lambda} \qquad (5.6)$
- Box and Cox consider the regression model
- $y_i(\lambda) = \beta x_i(\lambda) + u_i \qquad (5.7)$
- where $u_i \sim IN(0, \sigma^2)$.
- For the sake of simplicity of exposition we are considering only one explanatory variable.
- Also, instead of considering $x_i(\lambda)$, we can consider $x_i$.
- For $\lambda = 0$ this is a log-linear model (since $y(\lambda) \to \log y$ as $\lambda \to 0$), and for $\lambda = 1$ this is a linear model.
- There are two main problems with the specification in equation (5.7):
- The assumption that the errors in (5.7) are $IN(0, \sigma^2)$ for all values of $\lambda$ is not a reasonable assumption.
- Since $y > 0$, unless $\lambda = 0$ the definition of $y(\lambda)$ in (5.6) imposes some constraints on $y(\lambda)$ that depend on the unknown $\lambda$. Since $y > 0$, we have, from equation (5.6), $y(\lambda) > -1/\lambda$ for $\lambda > 0$ and $y(\lambda) < -1/\lambda$ for $\lambda < 0$.
- However, we will ignore these problems and describe the Box-Cox method.
- Based on the specification given by equation (5.7), Box and Cox suggest estimating $\lambda$ by the ML method.
- We can then test the hypotheses $\lambda = 0$ and $\lambda = 1$.
- If the hypothesis $\lambda = 0$ is accepted, we use $\log y$ as the explained variable.
- If the hypothesis $\lambda = 1$ is accepted, we use $y$ as the explained variable.
- A problem arises only if both hypotheses are rejected or both are accepted. In this case we have to use the estimated $\hat{\lambda}$ and work with $y(\hat{\lambda})$.
- The ML method suggested by Box and Cox amounts to the following procedure:
- Divide each $y$ by the geometric mean of the $y$'s.
- Now compute $y(\lambda)$ for different values of $\lambda$ and regress it on $x$. Compute the residual sum of squares and denote it by $RSS(\lambda)$.
- Choose the value of $\lambda$ for which $RSS(\lambda)$ is a minimum. This value of $\lambda$ is the ML estimator of $\lambda$.
- As a special case, consider the problem of choosing between the linear and log-linear models
- $y = \alpha + \beta x + u$
- and
- $\log y = \alpha + \beta x + u$
- What we do is first divide each $y_i$ by the geometric mean of the $y$'s.
- Then we estimate the two regressions and choose the one with the smaller residual sum of squares. This is the Box-Cox procedure (a sketch follows below).
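A sketch of this procedure: y is divided by its geometric mean and $RSS(\lambda)$ is compared across a grid of $\lambda$ values ($\lambda = 1$ versus $\lambda = 0$ reproduces the linear versus log-linear comparison). The data are synthetic, generated from a log-linear model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
x = rng.uniform(1, 10, 100)
y = np.exp(0.3 + 0.2 * x + rng.normal(scale=0.1, size=100))  # log-linear truth

y_scaled = y / np.exp(np.mean(np.log(y)))   # divide by the geometric mean
X = sm.add_constant(x)

def rss(lam):
    # Box-Cox transform; lambda = 0 is the log, by continuity.
    ylam = np.log(y_scaled) if lam == 0 else (y_scaled**lam - 1) / lam
    return sm.OLS(ylam, X).fit().ssr

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"lambda = {lam}: RSS = {rss(lam):.4f}")
# The ML estimate of lambda minimizes RSS(lambda) over the grid.
```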
- The BM Test
- This is the test suggested by Bera and McAleer.
- Suppose the log-linear and linear models to be tested are given by
- $\log y_i = \beta_0 + \beta_1 x_i + u_i$
- and
- $y_i = \alpha_0 + \alpha_1 x_i + v_i$
- The BM test involves three steps.
- Step 1
- Obtain the predicted values $\widehat{\log y_i}$ and $\hat{y}_i$ from the two equations, respectively.
- The predicted value of $y_i$ from the log-linear equation is $\exp(\widehat{\log y_i})$. The predicted value of $\log y_i$ from the linear equation is $\log \hat{y}_i$.
- Step 2
- Compute the artificial regressions
- $\log \hat{y}_i = \beta_0 + \beta_1 x_i + v_{1i}$
- and
- $\exp(\widehat{\log y_i}) = \alpha_0 + \alpha_1 x_i + v_{2i}$
- Let the estimated residuals from these two regression equations be $\hat{v}_{1i}$ and $\hat{v}_{2i}$, respectively.
- Step 3
- The tests for $\theta_1 = 0$ and $\theta_2 = 0$ are based on $\hat{\theta}_1$ and $\hat{\theta}_2$ in the artificial regressions
- $\log y_i = \beta_0 + \beta_1 x_i + \theta_1 \hat{v}_{2i} + w_{1i}$
- and
- $y_i = \alpha_0 + \alpha_1 x_i + \theta_2 \hat{v}_{1i} + w_{2i}$
- We use the usual t-tests to test these hypotheses.
- Step 3 (continued)
- If $\theta_1 = 0$ is accepted, we choose the log-linear model.
- If $\theta_2 = 0$ is accepted, we choose the linear model.
- A problem arises if both these hypotheses are rejected or both are accepted (a sketch of the full procedure follows below).
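A sketch of the three steps as reconstructed above; since parts of the original equations were lost, the exact form of the artificial regressions here is a hedged reading of Bera and McAleer's procedure, not a verbatim transcription.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.uniform(1, 10, 200)
y = np.exp(0.3 + 0.2 * x + rng.normal(scale=0.1, size=200))  # log-linear truth
X = sm.add_constant(x)

# Step 1: fitted values from the linear and log-linear models.
yhat = sm.OLS(y, X).fit().fittedvalues
logyhat = sm.OLS(np.log(y), X).fit().fittedvalues

# Step 2: artificial regressions; keep the residuals.
v1 = sm.OLS(np.log(yhat), X).fit().resid        # log of the linear fit
v2 = sm.OLS(np.exp(logyhat), X).fit().resid     # exp of the log-linear fit

# Step 3: t-tests on theta_1 and theta_2.
t1 = sm.OLS(np.log(y), np.column_stack([X, v2])).fit().tvalues[-1]
t2 = sm.OLS(y, np.column_stack([X, v1])).fit().tvalues[-1]
print(f"theta_1 t = {t1:.2f}  (log-linear kept if insignificant)")
print(f"theta_2 t = {t2:.2f}  (linear kept if insignificant)")
```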
Summary
- 1. If the error variance is not constant for all
the observations, this is known as the
heteroskedasticity problem. The problem is
informally illustrated with an example in Section
5.1.
- 2. First, we would like to know whether the
problem exists. For this purpose some tests have
been suggested. We have discussed the following
tests - (a) Ramsey's test.
- (b) Glejser's tests.
- (c) Breusch and Pagan's test.
- (d) White's test.
- (e) Goldfeld and Quandt's test.
- (f) Likelihood ratio test.
- 3. The consequences of the heteroskedasticity problem are:
- (a) The least squares estimators are unbiased but inefficient.
- (b) The estimated variances are themselves biased.
- If the heteroskedasticity problem is detected, we can try to solve it by the use of weighted least squares.
- Otherwise, we can at least try to correct the error variances.
- 4. There are three solutions commonly suggested for the heteroskedasticity problem:
- (a) Use of weighted least squares.
- (b) Deflating the data by some measure of "size."
- (c) Transforming the data to the logarithmic form.
- In weighted least squares, the particular weighting scheme used will depend on the nature of heteroskedasticity.
- 5. The use of deflators is similar to the weighted least squares method, although it is done in a more ad hoc fashion.
- 6. The question of estimation in linear versus logarithmic form has received considerable attention in recent years. Several statistical tests have been suggested for testing the linear versus logarithmic form.