Title: Heteroskedasticity
What is Heteroskedasticity?
- One of the assumptions of the CLR model (Assumption V) is that the error term in the linear model is drawn from a distribution with a constant variance
- When this is the case, we say that the errors are homoskedastic
- If this assumption does not hold, we run into the problem of heteroskedasticity
- Heteroskedasticity, as a violation of the assumptions of the CLR model, causes the OLS estimates to lose some of their nice properties
- Example: Suppose we take a cross-section sample of firms from a certain industry and we want to estimate a model with sales as the dependent variable
- We may suspect that the sales of larger firms will vary more than those of smaller firms, implying that there will be heteroskedasticity
- Heteroskedasticity is common in cross-section data (firms, banks, insurance companies, mutual funds, real estate, etc.) and needs to be identified and corrected
The Heteroskedasticity Problem
- Heteroskedasticity comes in two versions:
- Pure heteroskedasticity
- Impure heteroskedasticity
- Pure heteroskedasticity refers to heteroskedasticity that occurs in a correctly specified model
- This term distinguishes it from impure heteroskedasticity, which is caused by a specification error such as an omitted variable
- Typically, when we refer to heteroskedasticity, we mean the pure version
- If heteroskedasticity is present in a correctly specified model, then the variance of the error term is not constant:
  Var(ε_i) = σ_i^2, for i = 1, 2, ..., n
- Implication: Rather than being constant across observations, the variance of the error term changes depending on the observation
- Heteroskedasticity is common in cross-section data where there is a wide disparity between large and small observed values of the variables
- The larger this disparity, the greater the chance that the error term will not have a constant variance
- In the most frequent model of heteroskedasticity, the variance of the error term (ε_i) depends on an exogenous variable (Z_i)
- We write:
  Var(ε_i) = σ^2 Z_i^2
- The variable Z, called the proportionality factor, may or may not be one of the explanatory variables in the regression model
- The higher the value of Z_i, the higher the variance of the error term for observation i
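A minimal simulation sketch of this setup (hypothetical data; the variable names and numbers are illustrative, not from the text): errors are drawn with a standard deviation proportional to Z_i, so Var(ε_i) = σ^2 Z_i^2 by construction.

```python
# Sketch: generate data whose error variance grows with a
# proportionality factor Z (illustrative values, not from the text).
import numpy as np

rng = np.random.default_rng(42)
n, sigma = 200, 1.0

Z = rng.uniform(1, 10, size=n)       # proportionality factor, e.g. firm asset size
X = rng.uniform(0, 5, size=n)        # explanatory variable
eps = rng.normal(0, sigma * Z)       # std dev proportional to Z_i -> Var = sigma^2 Z_i^2
y = 2.0 + 0.5 * X + eps              # model with heteroskedastic errors

# Observations with large Z show visibly more dispersed errors:
print(eps[Z < 3].std(), eps[Z > 8].std())
```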
- Example: In trying to explain firm sales with a cross-section sample of firms, the variable Z_i may be the asset size of firm i
- This would imply that the larger the asset size of firm i, the more volatile that firm's sales will be
- This example shows that heteroskedasticity is likely to occur in a cross-section sample because of the larger variability in the observations of the dependent variable
- Heteroskedasticity may also occur in time series data (more rarely) or when the quality of the data changes over the sample
- If there is a specification error in our model, such as an omitted variable, the errors may exhibit impure heteroskedasticity
- Example: In the two-factor model of stock returns estimated without the inflation variable (INF), the errors may exhibit heteroskedasticity
- The error term now includes the omitted variable (INF), so the error may be larger for larger values of (INF)
The Impact of Heteroskedasticity on the OLS Estimates
- Heteroskedasticity has the following implications for the OLS estimates:
- OLS estimates are still unbiased
- OLS estimates no longer have the minimum variance (they are not efficient)
- Heteroskedasticity causes OLS to underestimate the variances and standard errors of the estimated coefficients
- This implies that the t-test and the F-test are not reliable
- The t-statistics tend to be inflated, leading us to reject a null hypothesis that should not be rejected (the Monte Carlo sketch below illustrates this)
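A small Monte Carlo sketch of the last two points, under an assumed design where the error standard deviation grows with X: the spread of the estimated slopes across replications (the true sampling variability) exceeds the standard error that conventional OLS reports.

```python
# Monte Carlo sketch: conventional OLS SEs understate the true sampling
# variability of the slope when the error variance rises with X.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, reps = 100, 2000
x = np.linspace(1, 10, n)
X = sm.add_constant(x)

slopes, reported_se = [], []
for _ in range(reps):
    y = 1.0 + 0.5 * x + rng.normal(0, x)   # error std dev grows with x
    res = sm.OLS(y, X).fit()
    slopes.append(res.params[1])
    reported_se.append(res.bse[1])

print("true sampling SE of slope:", np.std(slopes))
print("average reported OLS SE  :", np.mean(reported_se))  # noticeably smaller
```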
Testing for the Presence of Heteroskedasticity
- There are several tests that can be used to detect the presence of heteroskedasticity in the data
- We will cover two such tests:
- The Breusch-Pagan test
- The White test
- Both test the null hypothesis of homoskedasticity, H0: Var(ε_i) = σ^2 for all i, against the alternative that it is not true
- The steps of the Breusch-Pagan test are as follows:
- Estimate the original regression model by OLS and obtain the squared OLS residuals (one for each observation)
- Run a new linear regression of the squared OLS residuals on all the explanatory variables
- Obtain the R^2 of this regression
- Calculate the following F-statistic using the above R^2:
  F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
- The above F-statistic follows an F distribution with k degrees of freedom in the numerator and (n - k - 1) degrees of freedom in the denominator
- Reject the null hypothesis of no heteroskedasticity if the F-statistic is greater than the critical F-value at the selected level of significance
- If the null is rejected, then heteroskedasticity is present in the data and an alternative estimation method to OLS must be followed
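A sketch of these steps on simulated data, assuming Python with statsmodels; statsmodels' het_breuschpagan packages the auxiliary regression, and its F variant matches the manual computation from the formula above.

```python
# Sketch: Breusch-Pagan test, both by hand and via statsmodels.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
n, k = 200, 1                            # sample size, number of regressors
x = rng.uniform(1, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, x)     # heteroskedastic by construction
X = sm.add_constant(x)

res = sm.OLS(y, X).fit()                 # step 1: original model

# Steps 2-4 by hand: regress squared residuals on the explanatory variables
aux = sm.OLS(res.resid ** 2, X).fit()
F = (aux.rsquared / k) / ((1 - aux.rsquared) / (n - k - 1))

# Packaged version: returns LM and F variants with p-values
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(res.resid, X)
print(F, f_stat)                         # the two F-statistics agree
print(f_p)                               # small p-value -> reject homoskedasticity
```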
- The steps of the White test are as follows:
- Estimate the original regression model by OLS and obtain the squared OLS residuals (one for each observation)
- Run a new regression of the squared OLS residuals on each explanatory variable X, the square of each explanatory variable X, and the product of each variable X with every other variable X
- Example: If our original model has two explanatory variables, then we would run the following regression at the second stage:
  e_i^2 = α_0 + α_1 X_1i + α_2 X_2i + α_3 X_1i^2 + α_4 X_2i^2 + α_5 X_1i X_2i + u_i
- Obtain the R^2 of this regression
- Calculate the test statistic nR^2, where n is the sample size and R^2 is that obtained in the previous step
- This statistic follows a chi-square distribution with K degrees of freedom (K = number of explanatory variables included in the second-stage regression)
- In our example above, there are five explanatory variables, so the test statistic nR^2 will follow a chi-square distribution with five degrees of freedom
- Reject the null hypothesis of no heteroskedasticity if the test statistic is greater than the critical χ^2-value at the selected level of significance
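A sketch of the White test with two regressors, matching the example above (simulated data; statsmodels' het_white builds the squares and cross-products internally).

```python
# Sketch: White test on simulated data with two explanatory variables.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(2)
n = 300
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 5, n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(0, x1)  # variance depends on x1
X = sm.add_constant(np.column_stack([x1, x2]))

res = sm.OLS(y, X).fit()
lm_stat, lm_p, f_stat, f_p = het_white(res.resid, X)

# nR^2 is compared with a chi-square with 5 df here
# (regressors: x1, x2, x1^2, x2^2, x1*x2):
print(f"nR^2 = {lm_stat:.2f}, p-value = {lm_p:.4f}")  # small p -> reject H0
```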
Correcting the Problem of Heteroskedasticity
- If heteroskedasticity is detected, an alternative estimation approach to OLS must be used that corrects the problem
- Two commonly used approaches are:
- The method of Weighted Least Squares (WLS)
- Obtaining heteroskedasticity-corrected standard errors
- The WLS method transforms the original regression model so as to make the errors have the same variance
- After this transformation takes place, the model can be estimated by OLS, since the heteroskedasticity problem has been corrected
- Example: Suppose we want to estimate the following model:
  Y_i = β_0 + β_1 X_i + ε_i
- and that the variance of the error term takes the following form:
  Var(ε_i) = σ^2 Z_i^2
- where σ^2 is the assumed constant variance and Z_i is the proportionality factor
- If we divide the model by the proportionality factor Z_i, we obtain the following model:
  Y_i / Z_i = β_0 (1 / Z_i) + β_1 (X_i / Z_i) + u_i, where u_i = ε_i / Z_i
- The error term of the transformed model (u_i) now has a constant variance (Var(u_i) = Var(ε_i) / Z_i^2 = σ^2) and, thus, the model can be estimated by OLS
- Note: If the proportionality factor (or weight variable) in the above regression is NOT one of the explanatory variables, then we must include a constant in the transformed model; otherwise a constant is already included
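A WLS sketch on simulated data, assuming Python with statsmodels and the variance form above, Var(ε_i) = σ^2 Z_i^2: weighting each observation by 1/Z_i^2 is equivalent to dividing the model through by Z_i.

```python
# Sketch: WLS with weights 1/Z^2, matching Var(e_i) = sigma^2 * Z_i^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
Z = rng.uniform(1, 10, n)              # proportionality factor
x = rng.uniform(0, 5, n)
y = 2.0 + 0.5 * x + rng.normal(0, Z)   # error std dev proportional to Z
X = sm.add_constant(x)

# statsmodels' WLS minimizes sum(w_i * resid_i^2); w_i = 1/Z_i^2
# undoes the heteroskedasticity:
wls_res = sm.WLS(y, X, weights=1.0 / Z**2).fit()
print(wls_res.params)   # estimates of beta_0 and beta_1
print(wls_res.bse)      # standard errors from the transformed model
```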
- Heteroskedasticity-corrected standard errors are the most popular method to correct for heteroskedasticity
- This approach improves the estimation of the model's standard errors (SE) without having to transform the estimated model
- Given that these SE are more accurate, they can be used for t-tests and other hypothesis tests
- Typically, the corrected SE will be larger, leading to lower t-statistics
- This approach works better in large samples, and some software packages include it (such as SPSS)
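A sketch of this approach in Python with statsmodels (simulated data; HC1 is one of several heteroskedasticity-consistent covariance estimators): the coefficients are the OLS ones, only the standard errors change.

```python
# Sketch: heteroskedasticity-corrected (robust) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, x)            # heteroskedastic errors
X = sm.add_constant(x)

ols_res = sm.OLS(y, X).fit()                    # conventional SEs
robust_res = sm.OLS(y, X).fit(cov_type="HC1")   # heteroskedasticity-robust SEs

print("conventional SE of slope:", ols_res.bse[1])
print("robust SE of slope      :", robust_res.bse[1])  # typically larger here
```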