Title: Heteroske...what?
1. Heteroske...what?
2. O.L.S. is B.L.U.E.
- BLUE means Best Linear Unbiased Estimator.
- What does that mean? We need to define some terms.
- Unbiased: the mean of the sampling distribution is the true population parameter.
- What is a sampling distribution? Imagine taking a sample, finding b, taking another sample, finding b again, and repeating over and over. The sampling distribution describes the possible values b can take on in repeated sampling (see the simulation sketch below).
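A minimal Stata sketch of this idea, assuming a made-up data-generating process with true slope 2 (the program name onedraw and all variables are hypothetical):

capture program drop onedraw      // allow re-running this do-file
program define onedraw, rclass
    drawnorm x e, n(100) clear    // draw a fresh sample: x and error
    gen y = 1 + 2*x + e           // true intercept 1, true slope 2
    regress y x
    return scalar b = _b[x]       // keep this sample's slope estimate
end
simulate b=r(b), reps(500): onedraw
summarize b                       // the mean of b should sit near 2

The 500 saved estimates are a simulated sampling distribution; unbiasedness means their average lands on the true slope.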
3. We hope that E(b) = β
- If the sampling distribution centers on the true population parameter, our estimates will, on average, be right.
- We get this with the 10 assumptions.
4. If some assumptions don't hold...
[Figure: a sampling distribution whose average does not line up with β]
- We can get a biased estimate. That is, E(b) ≠ β.
5. Bias is Bad
- If your parameter estimates are biased, your answers (coefficients) relating x and y are wrong. They do not describe the true relationship.
6. Efficiency / Inefficiency
- What makes one unbiased estimator better than another?
7. Efficiency
- Sampling distributions with less variance (smaller standard errors) are more efficient.
- OLS is the Best linear unbiased estimator because its sampling distribution has less variance than other linear unbiased estimators.
[Figure: sampling distributions of OLS regression vs. LAV regression]
8. Under the 10 regression assumptions and assuming normally distributed errors...
- We will get estimates using OLS.
- Those estimates will be unbiased.
- Those estimates will be efficient (the best).
- They will be the Best Unbiased Estimator out of all possible estimators.
9. If we violate...
- No perfect collinearity or n > k: we cannot get any estimates, and there is nothing we can do to fix it.
- Normal error term assumption: OLS is BLUE, but not BUE.
- Heteroskedasticity or serial correlation: OLS is still unbiased, but not efficient.
- Everything else (omitted variables, endogeneity, linearity): OLS is biased.
10. What do Bias and Efficiency Mean?
[Figure: four sampling distributions around β, one for each combination: biased but very efficient; unbiased but inefficient; biased and inefficient; unbiased and efficient]
11. Today: Heteroskedasticity
- Consequence: OLS is still unbiased, but it is not efficient (and the std. errors are wrong).
- Today we will learn how to diagnose heteroskedasticity and how to remedy it, either with:
- A new estimator for coefficients and std. errs., or
- Keeping the OLS estimator but fixing the std. errs.
12. What is heteroskedasticity?
- Heteroskedasticity occurs when the size of the errors varies across observations. This arises generally in two ways.
- First, when increases in an independent variable are associated with changes in the error of prediction.
13. What is Heteroskedasticity?
- Heteroskedasticity occurs when the size of the errors varies across observations. This arises generally in two ways.
- Second, when you have subgroups or clusters in your data.
- Example: we might try to predict presidential popularity, measuring average popularity in each year. Of course, there are clusters of years where the same president is in office. Because each president is unique, the errors in predicting Bush's popularity are likely to be a bit different from the errors in predicting Clinton's.
14. How do we recognize this beast?
- Three methods:
- Think about your data: look for analogs of the two ways heteroskedasticity can strike.
- Graphical analysis.
- A formal statistical test.
15. Graphical Analysis
- Plot residuals against the independent variables.
- Expect to see residuals randomly clustered around zero.
- However, you might see a pattern. This is bad.
- Examples follow.
16. Examples
[Figure: three residual plots]
- As x increases, so does the error variance: scatter resid x
- As x increases, the error variance decreases: scatter resid x
- As the predicted value of y increases, so does the error variance: rvfplot (or scatter resid yhat)
17. Good Examples
[Figure: plots with residuals randomly scattered around zero, no pattern]
- scatter y x
- scatter resid x
- rvfplot (scatter resid yhat)
(A runnable version of these commands follows.)
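A minimal Stata sketch for producing these diagnostic plots, assuming a model of y on x (the variable names are placeholders):

regress y x
predict resid, residuals    // store the residuals from the fit
scatter resid x             // residuals vs. an independent variable
rvfplot                     // residuals vs. fitted values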
18. Formal Statistical Tests
- White's Test.
- Heteroskedasticity occurs when the size of the errors is correlated with one or more independent variables.
- We can run OLS, get the residuals, and then see if they are correlated with the independent variables.
19. More Formally,

state  district   turnout  diplomau  mdnincm  pred_turnout    residual
AL            1   151,188      14.7   27,360     200,757.4   -49,569.4
AL            2   216,788      16.7   29,492     205,330      11,457.96
AL            3   147,317      12.3   26,800     197,491.7   -50,174.7
AL            4   226,409       8.1   25,401     191,310.8    35,098.16
AL            5   186,059      20.4   33,189     213,514.6   -27,455.6
20. So, if error increases with x, we violate homoskedasticity
- If we can predict the error with a regression line, we have heteroskedasticity.
- To make this prediction, we need to make every residual positive (square it).
21. So, if error increases with x, we violate homoskedasticity
- Finally, we use these squared residuals as the dependent variable in a new regression.
- If we can predict increases/decreases in the size of the residual, we have found evidence of heteroskedasticity.
- For ind. vars., we use the same ones as in the original regression plus their squares and their cross-products.
22. The Result
- Take the $R^2$ from this regression and multiply it by n.
- This test statistic is distributed $\chi^2$ with degrees of freedom equal to the number of independent variables in the 2nd regression.
- In other words, $nR^2$ is the $\chi^2$ you calculate from your data; compare it to a critical $\chi^2$ from a $\chi^2$ table. If your calculated $\chi^2$ is greater than the critical $\chi^2$, you reject the null hypothesis of homoskedasticity (see the sketch below).
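A minimal do-file sketch of this procedure done by hand, using the turnout example above (the generated variable names like ehat2 and cross are made up for illustration):

* original regression and residuals
regress turnout diplomau mdnincm
predict ehat, residuals
gen ehat2 = ehat^2                  // squared residuals (all positive)

* auxiliary regression: ind. vars., their squares, cross-product
gen diplomau2 = diplomau^2
gen mdnincm2  = mdnincm^2
gen cross     = diplomau*mdnincm
regress ehat2 diplomau mdnincm diplomau2 mdnincm2 cross

* test statistic: n * R-squared, here chi2 with 5 df
display "White chi2(5) = " e(N)*e(r2)
display "p-value       = " chi2tail(5, e(N)*e(r2))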
23. A Sigh of Relief
- Stata will calculate this for you.
- After running the regression, type: imtest, white

. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(5)     =      9.97
         Prob > chi2 =    0.0762

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df       p
---------------------+-----------------------------
  Heteroskedasticity |       9.97      5    0.0762
            Skewness |       3.96      2    0.1378
            Kurtosis |  -28247.96      1    1.0000
---------------------+-----------------------------
               Total |  -28234.03      8    1.0000
---------------------------------------------------
24. An Alternative Test: Breusch/Pagan
- Based on similar logic, with three changes:
- Instead of using $\hat{e}_i^2$ as the D.V. in the 2nd regression, use $\hat{e}_i^2 / \hat{\sigma}^2$, where $\hat{\sigma}^2 = \sum_i \hat{e}_i^2 / n$.
- Instead of using every variable (plus squares and cross-products), you specify the variables you think are causing the heteroskedasticity.
- Alternatively, use only the fitted values $\hat{y}$ as a catch-all.
25. An Alternative Test: Breusch/Pagan
- 3. The test statistic is the RegSS from the 2nd regression divided by 2. It is distributed $\chi^2$ with degrees of freedom equal to the number of independent variables in the 2nd regression (see the sketch below).
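A minimal sketch of the Breusch/Pagan calculation by hand, again using the turnout example (the generated names like u are made up):

* original regression, residuals, and fitted values
regress turnout diplomau mdnincm
predict ehat, residuals
predict yhat, xb
gen ehat2 = ehat^2

* standardized squared residuals
summarize ehat2
gen u = ehat2 / r(mean)

* catch-all version: regress on the fitted values only
regress u yhat
display "B-P chi2(1) = " e(mss)/2    // RegSS of 2nd regression / 2
display "p-value     = " chi2tail(1, e(mss)/2)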
26. Stata Command: hettest

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of turnout

         chi2(1)     =     8.76
         Prob > chi2 =   0.0031

. hettest senate

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: senate

         chi2(1)     =     4.59
         Prob > chi2 =   0.0321

. hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: diplomau mdnincm senate guber

         chi2(4)     =    11.33
         Prob > chi2 =   0.0231
27. What are you gonna do about it?
- Two remedies:
- We might need to try a different estimator. This will be the Generalized Least Squares (GLS) estimator. The GLS estimator can be applied to data with heteroskedasticity and serial correlation.
- OLS is still consistent (just inefficient), and its standard errors are wrong. We could fix the standard errors and stick with OLS.
28. Generalized Least Squares
- When used to correct heteroskedasticity, we refer to GLS as Weighted Least Squares, or WLS.
- Intuition: some data points have better-quality information about the regression line than others because they have less error. We should give those observations more weight.
29. Non-Constant Variance
- We want constant error variance for all observations: $E(e_i^2) = \sigma^2$, estimated by the RMSE.
- However, with heteroskedasticity, the error variance is not constant: $E(e_i^2) = \sigma_i^2$ (indexed by i).
- If we know what $\sigma_i^2$ is, we can re-weight the equation to make the error variance constant.
30Re-weighting the regression
Begin with the formula Add x0i, a variable that
is always 1
Divide through by si to weight it
We can simplify notation and show its really
just a regression with transformed variables.
Last, we just need to show that the
transformation makes the new error term, ei,
constant
31. GLS vs. OLS
- In OLS, we minimize the sum of the squared errors: $\sum_i e_i^2$.
- In GLS, we minimize a weighted sum of the squared errors: $\sum_i w_i e_i^2$, where we let $w_i = 1/\sigma_i^2$.
- Set the partial derivatives to 0 and solve for a and b to get the estimating equations:
  $a \sum_i w_i + b \sum_i w_i x_i = \sum_i w_i y_i$
  $a \sum_i w_i x_i + b \sum_i w_i x_i^2 = \sum_i w_i x_i y_i$
32. GLS vs. OLS
- OLS: minimize errors, $\sum_i e_i^2$. GLS (WLS): minimize weighted errors, $\sum_i w_i e_i^2$.
- GLS (WLS) is just doing OLS with transformed variables.
- In the same way that we transformed non-linear data to fit the assumptions of OLS, we can transform the data with weights to help heteroskedastic data meet the assumptions of OLS.
33. GLS vs. OLS
- In matrix form:
- OLS: $b = (X'X)^{-1}X'y$
- GLS: $b = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$
- The weights are included in a matrix, $\Omega^{-1}$.
34. Problem
- We rarely know exactly how to weight our data.
- Solutions:
- Plan A: if the heteroskedasticity comes from one specific variable, we can use that variable as the weight.
- Plan B: alternatively, we could run OLS and use the residuals to estimate the weights (observations with large OLS residuals get little weight in the WLS estimates).
35. Plan A: A Single, Known Villain
- Example: household income.
- Households that earn little must spend it all on necessities. When income is low, there is little variance in spending.
- Households that earn a great deal can either spend it all or buy just the essentials and save the rest. There is more error variance as income increases.
- Note the changes in interpretation (a weighting sketch follows).
36. Plan B: Estimate the weights
- Run OLS and get an estimate of the residuals.
- Regress those residuals (squared) on the set of independent variables and get the predicted values.
- Use those predicted values as the weights (see the sketch below).
- Because this is a GLS that is doable, it is called Feasible GLS, or FGLS.
- FGLS converges to GLS as the sample size goes to infinity.
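A minimal sketch of this recipe with hypothetical variables y, x1, and x2. It follows the steps above directly; note that the predicted variances must be positive for the weights to make sense (a log specification for the squared residuals is a common safeguard):

* Step 1: OLS and residuals
regress y x1 x2
predict ehat, residuals

* Step 2: regress squared residuals on the ind. vars.
gen ehat2 = ehat^2
regress ehat2 x1 x2
predict sig2hat, xb                    // predicted error variances

* Step 3: WLS, downweighting high-variance observations
regress y x1 x2 [aweight = 1/sig2hat]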
37. I don't want to do GLS
- I don't blame you. GLS is usually best if we know something about the nature of the heteroskedasticity.
- OLS was unbiased; why can't we just use that?
- It is inefficient (but this is only problematic with very severe heteroskedasticity).
- Its standard errors are incorrect (the formula changes).
- What if we could just fix the standard errors?
38. White Standard Errors
- We can use OLS and just fix the standard errors. There are a number of ways to do this, but the classic is White standard errors.
- This goes by a number of names:
- White Std. Errs.
- Huber-White Std. Errs.
- Robust Std. Errs.
- Heteroskedasticity-Consistent Std. Errs.
39. The big idea
- In OLS, standard errors come from the variance-covariance matrix.
- The Std. Err. is the Std. Dev. of a sampling distribution.
- Variance is the square of the standard deviation (the Std. Dev. is the square root of the variance).
- The variance-covariance matrix for OLS is given by $\hat{\sigma}_e^2 (X'X)^{-1}$.

. vce

Variances:
             diplomau    mdnincm      _cons
----------------------------------------------
 diplomau      254467
  mdnincm    -178.899    .187128
    _cons     1.4e+06   -3172.43    9.3e+07
40. With Heteroskedasticity
- The variance-covariance matrix for OLS is given by $\hat{\sigma}_e^2 (X'X)^{-1}$.
- The variance-covariance matrix under heteroskedasticity is given by $(X'X)^{-1} (X'\Omega X) (X'X)^{-1}$.
- Problem: we still don't know $\Omega$.
- Solution: we can estimate $(X'\Omega X)$ quite well using the OLS residuals, via $\sum_i \hat{e}_i^2 x_i' x_i$, where $x_i$ is the row of X for obs. i.
41. In Stata
- Specify the robust option after the regression:

. regress turnout diplomau mdnincm, robust

Regression with robust standard errors            Number of obs =     426
                                                  F(  2,   423) =   33.93
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.1291
                                                  Root MSE      =   47766

------------------------------------------------------------------------------
             |               Robust
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1101.359   548.7361     2.01   0.045     22.77008    2179.948
     mdnincm |   1.111589   .4638605     2.40   0.017       .19983    2.023347
       _cons |   154154.4   9903.283    15.57   0.000     134688.6    173620.1
------------------------------------------------------------------------------
42. Drawbacks
- OLS is still inefficient (though this is not much of a problem unless the heteroskedasticity is really bad).
- Robust SEs require larger sample sizes to give good estimates of the std. errs. (which means the t tests are only OK asymptotically).
- If there is no heteroskedasticity and you use robust SEs, you do slightly worse than with regular std. errs.
43. Moral of the Story
- If you know something about the nature of the heteroskedasticity, WLS is good: it is BLUE.
- If you don't, use OLS with robust std. errs.
- Now: group heteroskedasticity.
44. Group Heteroskedasticity
- There is no GLS/WLS option.
- There is a robust std. err. option: it essentially stacks the clusters into their own kind of mini-White correction (see the sketch below).
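A minimal Stata sketch of that option, using the presidential-popularity example from earlier (the variables popularity, econgrowth, and president are hypothetical):

* Cluster-robust standard errors: allow errors to be
* correlated and differently sized within each president.
regress popularity econgrowth, vce(cluster president)

As with the White correction, this leans on asymptotics, here in the number of clusters rather than the number of observations.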