Heteroskedasticity - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Heteroskedasticity

1
Heteroskedasticity
  • Outline
  • 1) What is it?
  • 2) What are the consequences for our Least
    Squares estimator when we have heteroskedasticity?
  • 3) How do we test for heteroskedasticity?
  • 4) How do we correct a model that has
    heteroskedasticity?

2
What is Heteroskedasticity?
Review the Gauss-Markov assumptions:
  • Linear Regression Model: y = β1 + β2x + e
  • Error term has a mean of zero: E(e) = 0 ⇒ E(y) = β1 + β2x
  • Error term has constant variance: Var(e) = E(e²) = σ², a CONSTANT
  • In other words, we assume all the observations
    are equally reliable
  • Error term is not correlated with itself (no
    serial correlation): Cov(ei, ej) = E(eiej) = 0 for i ≠ j
  • Data on X are not random and thus are
    uncorrelated with the error term: Cov(X, e) = E(Xe) = 0

This is the assumption of a homoskedastic error: Var(e) = σ².
A homoskedastic error is one that has constant
variance. A heteroskedastic error is one that has
a nonconstant variance.
Heteroskedasticity is more commonly a problem for
cross-section data sets, although a time-series
model can also have a non-constant variance.
3
This diagram shows a constant variance (OLD
assumption) for the error term. The line shows E(y|x),
expected food expenditure, for any given x. Notice that a family
making 500 is expected to deviate from its E(y|x)
the same as a family making 1500.

[Figure: conditional densities f(y|x) of food expenditure y at
incomes x = 500, 1000, 1500, each centered on the line E(y|x);
σ² is the SAME value at every x.]
4
This diagram shows a non-constant variance for
the error term that appears to increase as x
increases. The variation in the Gates food
budget from their average is greater than for the
Correas'.

[Figure: conditional densities f(y|x) at x_Correas = 500,
x_Trump = 1000, x_Gates = 1500, each centered on E(y|x);
σ² increases with x.]
5
What are the causes?
  • Direct: scale effects, structural shifts,
    learning effects
  • Indirect: omitted variables, outliers,
    parameter variation

Again, heteroskedasticity is more commonly a
problem for cross-section data sets, although a
time-series model can also have a non-constant
variance.
6
What are the Implications for Least Squares?
  • We have to ask: where did we use the
    assumption? Or why was the assumption needed in
    the first place?
  • We used the assumption in the derivation of the
    variance formulas for the least squares
    estimators, b1 and b2.
  • For b2, the formula for Var(b2) was

    Var(b2) = Σ(xt − x̄)²σt² / [Σ(xt − x̄)²]² = σ² / Σ(xt − x̄)²

BUT this last step uses the assumption that σt²
is a constant σ².
7
If this is not the case, then the formula is

    Var(b2) = Σ(xt − x̄)²σt² / [Σ(xt − x̄)²]²

Remember:

    b2 = β2 + Σ(xt − x̄)et / Σ(xt − x̄)²

Therefore, if we ignore the problem of a
heteroskedastic error and estimate the variance
of b2 using the formula on the previous slide,
when in fact we should have used the formula
directly on this slide, then our estimates of
the variance of b2 are wrong. Any hypothesis
tests or confidence intervals based on them will
be invalid. Note, however, that the proof of
unbiasedness, E(b2) = β2, did not use the
assumption of a homoskedastic error. Therefore,
a heteroskedastic error will not bias the
coefficient estimates, but it will bias the
estimates of their variances.
8
In other words, if there is heteroskedasticity:
  • OLS estimators are still LINEAR and UNBIASED
  • OLS estimators are NOT EFFICIENT
  • Usual formulas give INCORRECT STANDARD ERRORS for
    OLS
  • Any hypothesis tests or confidence intervals
    based on the usual formulas for the standard
    errors are WRONG
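These consequences can be seen in a small Monte Carlo sketch (hypothetical simulated data, not part of the original slides): with an error standard deviation that grows with x, the OLS slope stays centered on the true value, but the usual homoskedastic standard-error formula understates its true sampling variability.

```python
import math
import random
import statistics

random.seed(42)

n, reps = 50, 2000
beta1, beta2 = 1.0, 2.0
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]   # fixed (nonrandom) design, x in [1, 10]
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

slopes, usual_ses = [], []
for _ in range(reps):
    # heteroskedastic error: standard deviation grows with x
    y = [beta1 + beta2 * xi + random.gauss(0.0, 0.2 * xi ** 2) for xi in x]
    ybar = sum(y) / n
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    slopes.append(b2)
    usual_ses.append(math.sqrt(sse / (n - 2) / sxx))  # the usual (homoskedastic) formula

mean_b2 = statistics.mean(slopes)          # stays close to beta2 = 2: OLS is still unbiased
true_sd = statistics.stdev(slopes)         # actual sampling sd of b2
avg_usual_se = statistics.mean(usual_ses)  # what the usual formula reports, on average
print(mean_b2, true_sd, avg_usual_se)
```

For this particular design the usual formula understates the true sampling standard deviation by roughly 20%; the direction and size of the error depend on how the variance varies with x.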

9
How Do We Test for a Heteroskedastic Error?
  • 1) Visual inspection of the residuals
  • Because we never observe actual values for the
    error term, we never know for sure whether it is
    heteroskedastic or not. However, we can run a
    least squares regression and examine the
    residuals to see if they show a pattern
    consistent with a non-constant variance.

10
This regression resulted in the following
residuals plotted against the variable x (weekly
income). It appears that the variation in the
residuals increases with higher values of x,
suggesting a heteroskedastic error.
11
  • Formal Tests for Heteroskedasticity: there are
    many different tests that can be used for
    heteroskedasticity. We will look at 4 of them.
  • 2) Goldfeld-Quandt Test
  • a) Suppose we think that the error might be
    heteroskedastic. We examine the residuals and
    notice that the variance in the residuals appears
    to be larger for larger values of a (continuous)
    explanatory variable xj.
  • Note that it is necessary to make some
    assumption about the form of the
    heteroskedasticity, that is, an assumption about
    how the variance of et changes. For the food
    expenditure problem, the residuals tell us that
    an increasing function of xt (weekly income) is a
    good candidate. Other models may have a variance
    that is a decreasing function of xt or is a
    function of some variable other than xt.

12
  • The idea behind the Goldfeld-Quandt Test:
  • We want to test if there is heteroskedasticity of the kind
    that is proportional to xj.
  • Sort the data in descending order by the variable
    xj that you think causes the heteroskedasticity, and then
    split the data in half. Omit a few of the middle
    observations.
  • Run the regression on each half of the data.
  • Conduct a formal hypothesis test to decide
    whether or not there is a heteroskedastic error
    based on an examination of the SSE from each
    half.
  • If the error is heteroskedastic with a larger
    variance for the larger values of xt, then we
    should find

    SSElarge / (tlarge − k)  >  SSEsmall / (tsmall − k)

where SSElarge comes from the regression
using the subset of large values of xt, which
has tlarge observations, and SSEsmall comes from the
regression using the subset of small values of
xt, which has tsmall observations.
13
  • Conducting the Test

    H0: the error is homoskedastic, so that σ²large = σ²small
    H1: the error is heteroskedastic, with σ²large > σ²small

    GQ = [SSElarge / (tl − k)] / [SSEsmall / (ts − k)]

It can be shown that the GQ statistic has an
F-distribution with (tl − k) d.o.f. in the
numerator and (ts − k) d.o.f. in the
denominator. If GQ > Fc, we reject H0. We find
that the error is heteroskedastic.
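The Goldfeld-Quandt recipe can be sketched on simulated data (hypothetical numbers, not the food-expenditure data; for simplicity no middle observations are omitted):

```python
import random

random.seed(7)

def ols_sse(pairs):
    """Fit y = b1 + b2*x by least squares; return the sum of squared errors."""
    n = len(pairs)
    xbar = sum(p[0] for p in pairs) / n
    ybar = sum(p[1] for p in pairs) / n
    sxx = sum((px - xbar) ** 2 for px, _ in pairs)
    sxy = sum((px - xbar) * (py - ybar) for px, py in pairs)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    return sum((py - b1 - b2 * px) ** 2 for px, py in pairs)

# simulate heteroskedastic data: sd of the error grows with x
n, k = 80, 2
data = [(float(t), 1.0 + 2.0 * t + random.gauss(0.0, 0.2 * t)) for t in range(1, n + 1)]

data.sort(key=lambda p: p[0], reverse=True)    # sort descending in x
large, small = data[: n // 2], data[n // 2 :]  # large-x half, small-x half

var_large = ols_sse(large) / (len(large) - k)
var_small = ols_sse(small) / (len(small) - k)
gq = var_large / var_small
print(gq)   # compare with Fc = F(0.05; 38, 38), roughly 1.7
```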
14
Food Expenditure Example
This code sorts the data according to x, because
we believe that the error variance is increasing
in xt:

    proc sort data=food;
      by descending x;
    run;

This code estimates the model for the first 20
observations, which are the observations with
large values of xt:

    data food_large;
      set food;
      if _n_ lt 20;
    run;
    proc reg data=food_large;
      bigvalues: model y = x;
    run;

This code estimates the model for the second 20
observations, which are the observations with
small values of xt:

    data food_small;
      set food;
      if _n_ gt 21;
    run;
    proc reg data=food_small;
      littlevalues: model y = x;
    run;
15
The REG Procedure
Model: bigvalues
Dependent Variable: y

Analysis of Variance
Source           DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1       4756.81422    4756.81422      2.08   0.1663
Error            18            41147    2285.93938
Corrected Total  19            45904

Root MSE         47.81150   R-Square   0.1036
Dependent Mean  148.32250   Adj R-Sq   0.0538
Coeff Var        32.23483

Parameter Estimates
Variable   DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept   1   48.17674         70.24191      0.69     0.5015
x           1    0.11767          0.08157      1.44     0.1663

The REG Procedure
Model: littlevalues
Dependent Variable: y

Analysis of Variance
Source           DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             1       8370.95124    8370.95124     12.27   0.0025
Error            18            12284     682.45537
Corrected Total  19            20655

Root MSE         26.12385   R-Square   0.4053
Dependent Mean  112.30350   Adj R-Sq   0.3722
Coeff Var        23.26183

Parameter Estimates
Variable   DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept   1   12.93884         28.96658      0.45     0.6604
x           1    0.18234          0.05206      3.50     0.0025
GQ = 2285.93938 / 682.45537 ≈ 3.35 > Fc = 2.22 (see SAS) ⇒ Reject H0
16
  • 3) Park Test
  • This test is described in Gujarati (1995, p. 369).
    It proposes that the error variance is a
    log-log function of one (or more) explanatory
    variable(s), say X, in the form

    σt² = σ² Xt^γ e^vt,  i.e.  ln(σt²) = ln(σ²) + γ ln(Xt) + vt

  • Note that the relationship is NOT LINEAR, like
    before. Look at Figure 6.3(b) and Figure 6.3(c) in
    the book.
  • Follow OLS estimation, and use the OLS estimated
    residuals êt in the auxiliary regression

    ln(êt²) = b0 + b1 ln(Xt) + ut

  • The test statistic is the t-ratio on the
    parameter estimate for b1. If the t-ratio shows
    that the estimated parameter b1 is significantly
    different from zero, then there is evidence of
    heteroskedasticity. Since this is an approximate
    test, it is appropriate to consider that the test
    statistic has an asymptotic normal distribution,
    so that at a 5% significance level the critical
    value is 1.96.
  • The Park test and the Goldfeld-Quandt test
    require precise beforehand knowledge of the
    cause of the HET: the xj variable(s) causing the
    HET AND the functional form of this HET. If you
    have such knowledge, then the Park and
    Goldfeld-Quandt tests are more powerful (less
    Type II error: less often do you accept a FALSE
    null, compared to accepting a true alternative
    that you specified) than the subsequent
    tests. The Park test is only asymptotically valid.
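A sketch of the Park test on simulated data, assuming Var(et) = σ²xt² so that the true γ in the auxiliary regression is 2:

```python
import math
import random

random.seed(3)

def ols(xs, ys):
    """Simple regression of y on x; returns (intercept, slope, t-ratio of slope)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    b2 = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((v - b1 - b2 * u) ** 2 for u, v in zip(xs, ys))
    se_b2 = math.sqrt(sse / (n - 2) / sxx)
    return b1, b2, b2 / se_b2

# step 1: OLS on the original model, with Var(e_t) = sigma^2 * x_t^2
n = 200
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]
b1, b2, _ = ols(x, y)
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# step 2: auxiliary regression ln(e_t^2) = b0 + b1*ln(x_t) + u_t
_, gamma_hat, t_ratio = ols([math.log(xi) for xi in x],
                            [math.log(r ** 2) for r in resid])
print(gamma_hat, t_ratio)   # |t| > 1.96 -> evidence of heteroskedasticity
```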

17
  • 4) Breusch-Pagan Test: is there some variation in
    the squared residuals which can be explained by
    variation in some independent variables?
  • Estimate the OLS regression and obtain the
    residuals.
  • Use the squared residuals as the dependent
    variable in a secondary equation that includes
    the independent variables suspected of being
    related to the error term:

    êt² = b0 + b1X1t + b2X2t + … + ut

  • Test the joint hypothesis that the coefficients of
    ALL the Xs in the second regression are zero. (An
    F test of significance. Use TEST in SAS. See Ch
    8.1 and 8.2.)
  • Can also test nR² ~ χ²(df), where R² is the R-squared
    from the auxiliary regression and df = number of
    regressors (Xs) in the auxiliary regression.
  • The Park and Goldfeld-Quandt tests require
    knowledge of the form of the HET, the particular
    functional form. If you have such knowledge,
    then the previous tests are more powerful (less
    Type II error: less often do you accept a FALSE
    null, compared to accepting a true alternative
    that you specified). The Breusch-Pagan
    test does not require knowledge of the functional
    form of the HET, but it still assumes that we
    know which variables cause it. It is also
    sensitive to deviations from normality.
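The nR² version of the Breusch-Pagan test can be sketched as follows (simulated data with a single suspect variable x, so the auxiliary regression has df = 1 and the 5% χ² critical value is 3.84):

```python
import random

random.seed(11)

def simple_ols(xs, ys):
    """Regression of y on x; returns (intercept, slope, R-squared)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    b2 = sum((u - xbar) * (v - ybar) for u, v in zip(xs, ys)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((v - b1 - b2 * u) ** 2 for u, v in zip(xs, ys))
    sst = sum((v - ybar) ** 2 for v in ys)
    return b1, b2, 1.0 - sse / sst

n = 400
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.3 * xi) for xi in x]

# main regression, then regress the squared residuals on x
b1, b2, _ = simple_ols(x, y)
esq = [(yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)]
_, _, r2_aux = simple_ols(x, esq)

bp_stat = n * r2_aux
print(bp_stat)   # compare with the chi-square(1) critical value, 3.84
```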

19
  • 5) White's Test: a variation of Breusch-Pagan, but
    using ALL THE Xs
  • Estimate the OLS regression and obtain the
    residuals.
  • Use the squared residuals as the dependent
    variable in a secondary equation that includes
    EVERY ONE of the explanatory variables, their
    squares, and all their pairwise cross products (i.e.
    x1x2, x1x3 and x2x3, but not x1x2x3):

    êt² = b0 + b1Xt + b2Zt + b3Xt² + b4Zt² + b5XtZt + ut

  • Test the joint hypothesis that the coefficients of
    ALL the Xs in the second regression are zero. (An
    F test of significance. Use TEST in SAS. See Ch
    8.1 and 8.2.)
  • Can also test nR² ~ χ²(df), where R² is the R-squared
    from the auxiliary regression and df = number of
    regressors (Xs) in the auxiliary regression.
  • This test does not assume knowledge of which
    variables cause the HET. If you have such
    knowledge, all the previous tests are more
    powerful (less Type II error). The White test is
    only asymptotically valid (needs lots of data),
    and is more commonly used now. SAS can do it
    automatically:

    PROC REG DATA=whatever;
      MODEL whatever = whatever / SPEC;
    RUN; QUIT;
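A sketch of White's nR² test with one regressor x (so the auxiliary regression uses x and x², df = 2, and the 5% χ² critical value is 5.99). The small Gaussian-elimination helper stands in for a linear-algebra library:

```python
import random

random.seed(5)

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (M[i][n] - sum(M[i][c] * v[c] for c in range(i + 1, n))) / M[i][i]
    return v

def ols_r2(X, y):
    """Multiple regression of y on the columns of X (first column = 1s); returns (beta, R^2)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return beta, 1.0 - sse / sst

n = 400
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.3 * xi) for xi in x]

# main regression of y on (1, x), then aux regression of e^2 on (1, x, x^2)
beta, _ = ols_r2([[1.0, xi] for xi in x], y)
esq = [(yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y)]
_, r2_aux = ols_r2([[1.0, xi, xi * xi] for xi in x], esq)

white_stat = n * r2_aux
print(white_stat)   # compare with the chi-square(2) critical value, 5.99
```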
20
I am running PROC REG with the ACOV and SPEC
options to obtain heteroskedasticity-consistent
("White-corrected") test statistics. I need to
collect these test statistics into a SAS
dataset. The variance-covariance matrix
that is output into the parameters dataset using
the OUTEST and COVOUT options does not seem to
be White-corrected. Any suggestions as to how
I might pull out the test statistics?
Thanks.

    proc reg data=file1;
      model y = x / acov spec;
      ods output ParameterEstimates=the_parms
                 AcovEst=the_acov SpecTest=the_spec;
    run;

which will yield files named (the_parms,
the_acov, and the_spec) with the tables from
those sections of the output.
21
How Do We Correct for a Heteroskedastic Error?
  • Just redefine the variables (for example use
    income per capita instead of income). This works
    some of the time.
  • Robust OLS estimation using the White standard
    errors: earlier we saw that in the presence of
    heteroskedasticity, the correct formula for the
    variance of b2 is

    Var(b2) = Σ(xt − x̄)²σt² / [Σ(xt − x̄)²]²

  • So we just run OLS, and calculate the variance
    of the betas separately, with the formula above.
    In this formula, we use the squared residual êt² for
    each observation as the estimate of its variance σt²;
    these are called White's estimators of the error
    variance.
  • Remember, OLS parameter estimates are still
    UNBIASED: E(b) = true β.
  • We will not do this by hand though. Fortunately,
    White asymptotic covariance estimation can be
    performed with the ACOV option in SAS PROC REG.
    (Also explore PROC ROBUSTREG.)

    PROC REG DATA=thedata;
      MODEL depvar = indepvars / ACOV;
    RUN; QUIT;

  • HOWEVER: the variances are reported separately in
    the White var-cov section of the SAS output. BEWARE
    that the t-stats that SAS reports in the regular
    regression output are WRONG. So we have to
    calculate them manually, dividing
    estimate / sqrt(variance).
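For simple regression, White's variance formula can be computed directly. This sketch (simulated data) places the usual and the White (robust) standard error side by side and forms the corrected t-statistic manually, as the slide describes:

```python
import math
import random

random.seed(13)

n = 200
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.4 * xi) for xi in x]

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]

# usual formula: s^2 / Sxx, with s^2 = SSE/(n-2)
s2 = sum(r ** 2 for r in resid) / (n - 2)
se_usual = math.sqrt(s2 / sxx)

# White: sum (x_t - xbar)^2 e_t^2 / Sxx^2 -- each e_t^2 estimates sigma_t^2
var_white = sum(((xi - xbar) ** 2) * (r ** 2) for xi, r in zip(x, resid)) / sxx ** 2
se_white = math.sqrt(var_white)

t_robust = b2 / se_white   # the t-stat computed manually, estimate / sqrt(variance)
print(se_usual, se_white, t_robust)
```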

22
How Do We Correct for a Heteroskedastic Error?
  • It is a pain to have to calculate the t
    statistics manually from the regression output.
    There is a way to make SAS do this for us too:

    PROC REG DATA=thedata;
      MODEL depvar = firstX secondX thirdX / ACOV;
      TEST firstX = 0;
      TEST secondX = 0;
      TEST thirdX = 0;
    RUN; QUIT;

  • This will provide us with the correct
    t-statistics and p-values for each of the
    regressors, so we do not have to calculate them
    manually.
  • It is important to say that this only works for
    large samples (LOTS OF DATA!!!).

23
  • 3) Generalized Least Squares (GLS)
  • Idea: transform the model with a heteroskedastic
    error into a model with a homoskedastic error.
    Then apply the method of least squares. This
    requires us to assume a specification for the
    error variance. As earlier, we will assume that
    the variance increases with xt:

    yt = β1 + β2xt + et,  where Var(et) = σt² = σ²xt

Transform the model by dividing every piece of it
by the standard deviation of the error, √xt:

    yt/√xt = β1(1/√xt) + β2(xt/√xt) + et/√xt
24
This new model has an error term that is the
original error term divided by the square root of
xt. Its variance is constant:

    Var(et/√xt) = (1/xt) Var(et) = (1/xt) σ²xt = σ²

This method is called Weighted Least
Squares. It is more efficient than simply
applying least squares to the untransformed model. Least
squares gives equal weight to all observations.
Weighted least squares gives each observation a
weight that is inversely related to its value of
the square root of xt. Therefore, large values
of xt, which we have assumed to have a large variance,
will get less weight than smaller values of xt
when estimating the intercept and slope of the
regression line.
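The weighted-least-squares transformation can be sketched on simulated data with Var(et) = σ²xt: divide everything by √xt and run no-intercept least squares on the two constructed regressors 1/√xt and √xt, solving the 2×2 normal equations by hand:

```python
import math
import random

random.seed(21)

n = 400
x = [1.0 + 9.0 * i / (n - 1) for i in range(n)]
# heteroskedastic model: y = b1 + b2*x + e, with Var(e) = sigma^2 * x
y = [5.0 + 2.0 * xi + random.gauss(0.0, math.sqrt(xi)) for xi in x]

# transformed variables: ystar = y/sqrt(x), x1star = 1/sqrt(x), x2star = sqrt(x)
ystar = [yi / math.sqrt(xi) for xi, yi in zip(x, y)]
x1 = [1.0 / math.sqrt(xi) for xi in x]
x2 = [math.sqrt(xi) for xi in x]

# no-intercept least squares on two regressors: solve the 2x2 normal equations
s11 = sum(a * a for a in x1)
s12 = sum(a * b for a, b in zip(x1, x2))
s22 = sum(b * b for b in x2)
s1y = sum(a * yi for a, yi in zip(x1, ystar))
s2y = sum(b * yi for b, yi in zip(x2, ystar))
det = s11 * s22 - s12 * s12
b1 = (s22 * s1y - s12 * s2y) / det   # estimate of the original intercept
b2 = (s11 * s2y - s12 * s1y) / det   # estimate of the original slope
print(b1, b2)
```

Both estimates land close to the true values (5 and 2), and because the transformed error is homoskedastic, this WLS fit is more efficient than plain OLS on the original data.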
25
We need to estimate this model:

    yt/√xt = β1(1/√xt) + β2(xt/√xt) + et/√xt

This requires us to construct 3 new variables:

    y*t = yt/√xt,   x*1t = 1/√xt,   x*2t = xt/√xt = √xt

and to estimate the model

    y*t = β1x*1t + β2x*2t + e*t

Notice that it does NOT have an INTERCEPT, so use
the /NOINT option in SAS.
It is possible to do this in SAS automatically
using

    PROC REG DATA=thedata;
      MODEL depvar = indepvars / noint;
      WEIGHT variabletoweightby;
    RUN; QUIT;
    /* or PROC MODEL or PROC GLM */
26
SAS code to do the test for heterosk. and perform
Weighted Least Squares. NOTE: look at 11.24 for
another way to do it.

    data whatever;
      set whatever;
      ystar = y/sqrt(x);
      x1star = 1/sqrt(x);
      x2star = x/sqrt(x);
      output;
    run;

    proc reg data=whatever;
      foodgls: model ystar = x1star x2star / noint;  /* noint to run the model without an intercept */
    run;

28
SAS code to test for heterosk. another way:

    /* We have the variables dep_var, inc and height
       from our dataset whatever */
    /* The following code just runs the tests */
    proc model data=whatever;
      parms a1 b1 b2;    /* declares the parameters of a model. Each
                            parameter has a single value associated with it,
                            which is the same for all observations */
      dep_var = a1 + b1*inc + b2*height;
      fit dep_var        /* fit estimates the model */
          / white pagan(inc height)   /* we do White's test, and Pagan's test
                                         on the vars inc and height */
          out=resid1 outresid;        /* output residuals to an outside file */
    run;
    /* white and pagan may also work with PROC REG.
       We may not necessarily have to use proc model */
29
SAS code to perform weighted least squares
another way:

    proc model data=whatever;
      parms a1 b1 b2;
      inc2_inv = 1/inc2;   /* we create the weights. In this case they are
                              1/x-squared, because we are assuming the heterosk.
                              is of the form sigma2_t = constant_sigma2 * X_jt**2 */
      exp = a1 + b1*inc + b2*inc2;
      /* NOTE: the model above DOES have an intercept. This is because we
         assumed the form of the heterosk. to be such that sigma depends on
         the SQUARE of X. See also 11.26 */
      fit exp              /* fit exp tells SAS to estimate just the dependent
                              variable exp. We could omit the exp and just write
                              fit, because there is only one equation being
                              fitted in this model */
          / white pagan(1 inc inc2);
      weight inc2_inv;     /* tells SAS to weight each observation by the
                              variable inc2_inv */
    run;
    /* NOTE II: WEIGHT vartoweightby works also with the proc reg and
       proc autoreg commands, right before the run statement */
30
  • If instead, the proportional heteroskedasticity
    is suspected to be of the form

    Var(et) = σt² = σ²xt²

  • Then the transformed model would be

    yt/xt = β1(1/xt) + β2 + et/xt

  • We could proceed by forming the variables
    y*t = yt/xt and x*1t = 1/xt,
    and then proceeding EXACTLY as we did before,
    with the model

    y*t = β2 + β1x*1t + e*t

31
  • Testing for Heteroscedasticity: the White test in SAS
  • The regression model is specified as yi = xi'β + εi,
    where the εi's are identically and independently
    distributed with E(εi) = 0 and Var(εi) = σ². If the εi's are not
    independent or their variances are not constant,
    the parameter estimates are unbiased, but the
    estimate of the covariance matrix is
    inconsistent. In the case of heteroscedasticity,
    the ACOV option provides a consistent estimate of
    the covariance matrix. If the regression data are
    from a simple random sample, the ACOV option
    produces the covariance matrix. This matrix is
  • (X'X)⁻¹ (X' diag(ei²) X) (X'X)⁻¹
  • where
  • ei = yi − xi'b
  • The SPEC option performs a model specification
    test. The null hypothesis for this test maintains
    that the errors are homoscedastic, independent of
    the regressors and that several technical
    assumptions about the model specification are
    valid. For details, see Theorem 2 and Assumptions
    1-7 of White (1980). When the model is correctly
    specified and the errors are independent of the
    regressors, the rejection of this null hypothesis
    is evidence of heteroscedasticity. In
    implementing this test, an estimator of the
    average covariance matrix (White 1980, p. 822) is
    constructed and inverted. The nonsingularity of
    this matrix is one of the assumptions in the null
    hypothesis about the model specification. When
    PROC REG determines this matrix to be numerically
    singular, a generalized inverse is used and a
    note to this effect is written to the log. In
    such cases, care should be taken in interpreting
    the results of this test.
  • When you specify the SPEC option, tests listed in
    the TEST statement are performed with both the
    usual covariance matrix and the
    heteroscedasticity consistent covariance matrix.
    Tests performed with the consistent covariance
    matrix are asymptotic. For more information,
    refer to White (1980).
  • Both the ACOV and SPEC options can be specified
    in a MODEL or PRINT statement.