1
Heteroskedasticity
  • Lecture 17

2
Today's plan
  • How to test for heteroskedasticity: graphs, and
    the Park and Glejser tests
  • What we can do if we find heteroskedasticity
  • How to estimate in the presence of
    heteroskedasticity

3
Palm Beach County revisited
  • How far is Palm Beach an outlier?
  • Can the outlier be explained by
    heteroskedasticity?
  • If so, what are the consequences?
  • Heteroskedasticity will affect the variance of
    the regression line
  • It will consequently affect the variance of the
    estimated coefficients and estimated 95 percent
    confidence interval for the prediction (see
    Lecture 10).
  • L17.xls provides an example of how to work
    through a problem like this using Excel

4
Palm Beach County revisited (2)
  • Palm Beach is a good example to use since there
    are scale effects in the data
  • The voting pattern shows that voting behavior
    and the number of registered voters are related to
    the population in each county
  • As a county gets larger, voting patterns may
    diverge from what would be expected given the
    number of registered voters
  • Note from the graph: as we move away from the
    origin, the difference between registered Reform
    voters and Reform votes cast increases
  • We'll hypothesize that this scale effect produces
    heteroskedasticity

5
Notation
  • Heteroskedasticity is observed as cross-section
    variability in the data
  • data across units at a point in time
  • In our notation, heteroskedasticity is
  • E(ei²) ≠ σ²
  • We can also write
  • E(ei²) = σi²
  • This means that we expect variable variance: the
    variance changes with each unit of observation

6
Consequences
  • When heteroskedasticity is present:
  • 1) The OLS estimator is still linear
  • 2) The OLS estimator is still unbiased
  • 3) The OLS estimator is not efficient - the minimum
    variance property no longer holds
  • 4) Estimates of the variances are biased
  • 5) The usual estimator of the error variance, s²YX,
    is not an unbiased estimator of σ²YX
  • 6) We can't trust the confidence intervals or
    hypothesis tests (t-tests, F-tests): we
    may draw the wrong conclusions

7
Consequences (2)
  • When BLUE holds and there is homoskedasticity,
    the first-order conditions give Var(b) = σ²Σci²
  • With heteroskedasticity, we have Var(b) = Σci²σi²
  • If we substitute the expression for ci into both
    equations, we find Var(b) = σ²/Σxi² under
    homoskedasticity and Var(b) = Σxi²σi²/(Σxi²)²
    under heteroskedasticity

where ci = xi/Σxj² is the OLS weight on observation i
(with xi measured as a deviation from its mean)
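A compact sketch of the algebra behind these two variance expressions (standard bivariate OLS results, with xi in deviations from the mean):

```latex
b = \beta + \sum_i c_i e_i, \qquad c_i = \frac{x_i}{\sum_j x_j^2}

\text{Homoskedastic } \bigl(E(e_i^2)=\sigma^2\bigr):\quad
\operatorname{Var}(b) = \sigma^2 \sum_i c_i^2 = \frac{\sigma^2}{\sum_i x_i^2}

\text{Heteroskedastic } \bigl(E(e_i^2)=\sigma_i^2\bigr):\quad
\operatorname{Var}(b) = \sum_i c_i^2 \sigma_i^2
  = \frac{\sum_i x_i^2 \sigma_i^2}{\bigl(\sum_i x_i^2\bigr)^2}
```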
8
Cases
  • With homoskedasticity, the variance around the
    regression line is constant at each point
  • With heteroskedasticity, the variance around the
    regression line varies with each value of the
    independent variable (with each i)

9
Detecting heteroskedasticity
  • There are three ways of detecting
    heteroskedasticity:
  • 1) Graphically
  • 2) Park Test
  • 3) Glejser Test

10
Graphical detection
  • Graph the errors (or errors squared) against the
    independent variable(s). Note you can use either
    e or e² on the y-axis.
  • With homoskedasticity we have E(ei, X) = 0
  • The errors are independent of the independent
    variables
  • With heteroskedasticity we can get a variety of
    patterns
  • The errors show a systematic relationship with
    the independent variables (a code sketch of this
    check follows)
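A minimal sketch of this graphical check in Python; the data here are simulated stand-ins for the L17.xls county figures, not the actual numbers:

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

# Simulated stand-in data: X = registered Reform voters per county,
# Y = Reform votes cast, with an error spread that grows with X
rng = np.random.default_rng(0)
X = rng.uniform(100, 10000, 67)
Y = 0.6 * X + rng.normal(0, 0.05 * X)

ols = sm.OLS(Y, sm.add_constant(X)).fit()   # Y = a + b*X + e

# Plot residuals (or squared residuals) against the regressor;
# a fanning-out pattern suggests heteroskedasticity
plt.scatter(X, ols.resid)
plt.axhline(0, linewidth=1)
plt.xlabel("Registered Reform voters (X)")
plt.ylabel("Residual e")
plt.show()
```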

11
Graphical detection (2)
  • Using the Palm Beach example (L17.xls), we
    estimated a regression of Reform votes cast on
    the number of registered Reform party voters
  • The errors from this equation can be graphed
    against the number of registered Reform party
    voters (the independent variable)
  • The graph shows the errors increasing with the
    number of registered Reform voters
  • While the graphs may be convincing, we also want
    a formal test to confirm this. We have two

12
Park Test
  • Procedure (sketched in code below)
  • 1) Run the regression Yi = a + bXi + ei despite the
    heteroskedasticity problem (it can also be
    multivariate)
  • 2) Obtain the residuals (ei), square them (ei²), and
    take their logs (ln ei²)
  • 3) Run the auxiliary ("spurious") regression
    ln ei² = g0 + g1 ln Xi + vi
  • 4) Do a hypothesis test on g1 with H0: g1 = 0
  • 5) Look at the results of the hypothesis test
  • reject the null: you have heteroskedasticity
  • fail to reject the null: homoskedasticity, i.e.
    the error variance is a constant
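A minimal code sketch of the Park test procedure (simulated data standing in for the actual county figures):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroskedastic data (stand-in for the L17.xls counties)
rng = np.random.default_rng(0)
X = rng.uniform(100, 10000, 67)
Y = 0.6 * X + rng.normal(0, 0.05 * X)

# Steps 1-2: run the original regression and build ln(e^2)
ols = sm.OLS(Y, sm.add_constant(X)).fit()
ln_e2 = np.log(ols.resid ** 2)

# Step 3: auxiliary regression  ln(e_i^2) = g0 + g1*ln(X_i) + v_i
park = sm.OLS(ln_e2, sm.add_constant(np.log(X))).fit()

# Steps 4-5: t-test of H0: g1 = 0; a small p-value points to heteroskedasticity
print("g1 =", park.params[1], " p-value =", park.pvalues[1])
```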

13
Glejser Test
  • When we use the Glejser Test, we're looking for a
    scaling effect
  • The procedure:
  • 1) Run the regression Yi = a + bXi + ei (it can
    also be multivariate)
  • 2) Collect the residuals ei
  • 3) Take the absolute value of the errors, |ei|
  • 4) Regress |ei| against the independent variable(s)
  • you can run different functional forms of this
    regression
14
Glejser Test (2)
  • 4) continued: common functional forms include
    |ei| = g0 + g1 Xi + ui,
    |ei| = g0 + g1 √Xi + ui, and
    |ei| = g0 + g1 (1/Xi) + ui
  • If heteroskedasticity takes one of these forms,
    this will suggest an appropriate transformation
    of the model
  • The null hypothesis is still H0: g1 = 0 since
    we're testing for a relationship between the
    errors and the independent variables
  • We reach the same conclusions as in the Park Test
    (a code sketch follows below)
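A minimal sketch of the Glejser procedure in code, trying the three functional forms above (again with simulated stand-in data):

```python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in data with error spread growing in X
rng = np.random.default_rng(0)
X = rng.uniform(100, 10000, 67)
Y = 0.6 * X + rng.normal(0, 0.05 * X)

# Steps 1-3: original regression, residuals, absolute values
abs_e = np.abs(sm.OLS(Y, sm.add_constant(X)).fit().resid)

# Step 4: regress |e| on X, sqrt(X), and 1/X; test H0: g1 = 0 in each
for label, Z in [("X", X), ("sqrt(X)", np.sqrt(X)), ("1/X", 1.0 / X)]:
    aux = sm.OLS(abs_e, sm.add_constant(Z)).fit()
    print(f"{label:8s} g1 = {aux.params[1]: .4f}   p-value = {aux.pvalues[1]:.4f}")
```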

15
A cautionary note
  • The errors in the Park Test (vi) and the Glejser
    Test (ui) might themselves be heteroskedastic
  • If this is the case, we cannot trust the
    hypothesis test of H0: g1 = 0 or the t-test
  • If we find heteroskedastic disturbances in the
    data, what can we do?
  • Estimate the model Yi = a + bXi + ei using
    weighted least squares
  • We'll look at two examples of weighted least
    squares: one where we know the true variance, and
    one where we don't

16
Correction with known ?i2
  • Given that the true variance σi² is known and our
    model is
  • Yi = a + bXi + ei
  • Consider the following transformation of the
    model, dividing every term by σi:
  • Yi/σi = a(1/σi) + b(Xi/σi) + ei/σi
  • In the transformed model, let ei* = ei/σi
  • So the expected value of the error squared is
  • E(ei*²) = E(ei²)/σi²

17
Correction with known ?i2 (2)
  • Given that there is heteroskedasticity, E(ei²) = σi²
  • thus E(ei*²) = σi²/σi² = 1, which is constant
  • In this simple example, we re-weighted the model
    by dividing through by the constant σi
  • What this example shows: when the variance is
    known, we must transform our model to obtain a
    homoskedastic error term (see the sketch below)
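A minimal sketch of this correction in Python, assuming the per-observation standard deviations σi are known (here they are simulated stand-ins):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data where the error s.d. of each observation is "known"
rng = np.random.default_rng(0)
X = rng.uniform(100, 10000, 67)
sigma_i = 0.05 * X                          # assumed-known error standard deviations
Y = 0.6 * X + rng.normal(0, sigma_i)

# Divide Y_i = a + b*X_i + e_i through by sigma_i:
# Y_i/sigma_i = a*(1/sigma_i) + b*(X_i/sigma_i) + e_i/sigma_i
Y_star = Y / sigma_i
Z = np.column_stack([1.0 / sigma_i, X / sigma_i])   # a rides on 1/sigma_i, b on X/sigma_i
print(sm.OLS(Y_star, Z).fit().params)               # estimates of (a, b)

# The same estimates via statsmodels' built-in WLS with weights 1/sigma_i^2
print(sm.WLS(Y, sm.add_constant(X), weights=1.0 / sigma_i**2).fit().params)
```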

18
Correction with unknown ?i2
  • Given an unknown variance, we need to state
    ad-hoc but plausible assumptions about the
    variance σi² (how the errors vary with the
    independent variable)
  • For example, we can assert that E(ei²) = σ²Xi
  • Remember: the Glejser Test allows us to choose a
    relationship between the errors and the
    independent variable

19
Correction with unknown ?i2 (2)
  • In this example you would transform the
    estimating equation by dividing through by √Xi
    to get
  • Yi/√Xi = a(1/√Xi) + b√Xi + ei/√Xi
  • Letting ei* = ei/√Xi
  • The expected value of this error squared is
  • E(ei*²) = E(ei²)/Xi

20
Correction with unknown ?i2 (3)
  • Recalling the earlier assumption E(ei²) = σ²Xi, we
    find E(ei*²) = σ²Xi/Xi = σ², which is constant
  • When we don't know the true variance, we re-scale
    the estimating equation using the independent
    variable (see the sketch below)
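A minimal sketch of this re-scaling in Python under the assumed form E(ei²) = σ²Xi (the data are again simulated stand-ins):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data whose error variance is proportional to X
rng = np.random.default_rng(0)
X = rng.uniform(100, 10000, 67)
Y = 0.6 * X + rng.normal(0, 2.0 * np.sqrt(X))

# Divide Y_i = a + b*X_i + e_i through by sqrt(X_i):
# Y_i/sqrt(X_i) = a*(1/sqrt(X_i)) + b*sqrt(X_i) + e_i/sqrt(X_i)
w = np.sqrt(X)
Z = np.column_stack([1.0 / w, w])           # columns carry a and b respectively
print(sm.OLS(Y / w, Z).fit().params)        # (a_hat, b_hat) with a homoskedastic error

# Equivalent built-in: WLS with weights proportional to 1/X_i
print(sm.WLS(Y, sm.add_constant(X), weights=1.0 / X).fit().params)
```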

21
Returning to Palm Beach
  • On L17.xls we have presidential election data by
    county in Florida
  • To get a correct estimating equation, we can run
    a regression without Palm Beach if we think it's
    an outlier
  • Then we can see if we can obtain a prediction for
    the number of Reform votes cast in Palm Beach
  • We can perform a Glejser Test for the regression
    excluding Palm Beach
  • We run a regression of the absolute value of the
    errors (|ei|) against registered Reform voters (Xi)

22
Returning to Palm Beach (2)
  • The t-test rejects the null
  • this indicates the presence of heteroskedasticity
  • We can re-scale the model in different ways or
    introduce a new independent variable (such as the
    total number of registered voters by county)
  • Keep transforming the model and re-running the
    Glejser Test
  • When we fail to reject the null, there is no
    longer evidence of heteroskedasticity in the model

23
Robust estimation
  • Heteroskedasticity tests are not much used any
    more; most software reports robust standard
    errors instead. This is also the approach of the
    textbook.
  • We have looked at tests for heteroskedasticity to
    get you used to weighted least squares, which is
    important for the topics to come.
  • Robust standard errors approximate the variance
    of a coefficient estimate when the error variance
    is not constant. The approximation only holds in
    large samples.
  • Recall that for a homoskedastic error term,
    Var(ui | Xi) = σ²
  • Var(b) = σ²/Σxi²

24
Robust estimation (2)
  • Using analogous arguments, we can state that for
    the heteroskedastic case Var(ui | Xi) = σi²
  • Var(b) = Σxi²σi²/(Σxi²)²
  • This can be approximated (in the bivariate model
    case) by
  • Var(b) ≈ Σxi²ui²/(Σxi²)²
  • See L17_robust.xls and hetero.pdf to compare the
    results from calculating the robust standard
    error on the spreadsheet using EXCEL and the
    results from STATA for robust estimation.
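A minimal sketch of that calculation in Python: it computes the bivariate approximation above by hand and compares it with statsmodels' heteroskedasticity-consistent (HC0) standard error (simulated data as before):

```python
import numpy as np
import statsmodels.api as sm

# Simulated heteroskedastic data
rng = np.random.default_rng(0)
X = rng.uniform(100, 10000, 67)
Y = 0.6 * X + rng.normal(0, 0.05 * X)

fit = sm.OLS(Y, sm.add_constant(X)).fit()
x = X - X.mean()                  # xi as deviations from the mean, as in the slide formula
u2 = fit.resid ** 2

# Robust approximation: Var(b) ~ sum(xi^2 * ui^2) / (sum(xi^2))^2
var_b = (x**2 * u2).sum() / (x**2).sum() ** 2
print("robust s.e. of b (by hand):", np.sqrt(var_b))

# statsmodels' HC0 robust standard error for the slope, for comparison
print("robust s.e. of b (HC0):    ",
      sm.OLS(Y, sm.add_constant(X)).fit(cov_type="HC0").bse[1])
```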

25
Summary
  • Even with re-weighted equations, we might still
    have heteroskedastic errors
  • so we have to re-run the Glejser Test until we
    can no longer reject the null
  • If we keep rejecting the null, we may have to
    rethink our model transformation
  • if we suspect a scale effect, we may want to
    introduce new scaling variables
  • Coefficients from the re-scaled equation are
    comparable with the coefficients from the
    original model