Title: Heteroske...what?
1. Heteroske...what?
2. O.L.S. is B.L.U.E.
- BLUE means Best Linear Unbiased Estimator.
- What does that mean? We need to define some terms.
- Unbiased: the mean of the sampling distribution is the true population parameter.
- What is a sampling distribution? Imagine taking a sample, finding b, taking another sample, finding b again, and repeating over and over. The sampling distribution describes the possible values b can take on in repeated sampling (see the simulation sketch below).
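A minimal Stata sketch of this idea, assuming a made-up data-generating process with true slope 2 (the program name onedraw and all variables are hypothetical):

capture program drop onedraw      // allow re-running this do-file
program define onedraw, rclass
    drawnorm x e, n(100) clear    // draw a fresh sample: x and error
    gen y = 1 + 2*x + e           // true intercept 1, true slope 2
    regress y x
    return scalar b = _b[x]       // keep this sample's slope estimate
end
simulate b=r(b), reps(500): onedraw
summarize b                       // the mean of b should sit near 2

The 500 saved estimates are a simulated sampling distribution; unbiasedness means their average lands on the true slope.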
3. We hope that E(b) = β
- If the sampling distribution centers on the true population parameter, our estimates will, on average, be right.
- We get this with the 10 assumptions.
4. If some assumptions don't hold...
[Figure: a sampling distribution whose average does not line up with β]
- We can get a biased estimate. That is, E(b) ≠ β.
5. Bias is Bad
- If your parameter estimates are biased, your answers (coefficients) relating x and y are wrong. They do not describe the true relationship.
6. Efficiency / Inefficiency
- What makes one unbiased estimator better than another?
7. Efficiency
- Sampling distributions with less variance (smaller standard errors) are more efficient.
- OLS is the Best linear unbiased estimator because its sampling distribution has less variance than other linear unbiased estimators.
[Figure: sampling distributions of OLS regression vs. LAV regression]
8. Under the 10 regression assumptions and assuming normally distributed errors...
- We will get estimates using OLS.
- Those estimates will be unbiased.
- Those estimates will be efficient (the best).
- They will be the Best Unbiased Estimator out of all possible estimators.
9. If we violate...
- No perfect collinearity or n > k: we cannot get any estimates, and there is nothing we can do to fix it.
- Normal error term assumption: OLS is BLUE, but not BUE.
- Heteroskedasticity or serial correlation: OLS is still unbiased, but not efficient.
- Everything else (omitted variables, endogeneity, linearity): OLS is biased.
10. What do Bias and Efficiency Mean?
[Figure: four sampling distributions around β, one for each combination: biased but very efficient; unbiased but inefficient; biased and inefficient; unbiased and efficient]
11. Today: Heteroskedasticity
- Consequence: OLS is still unbiased, but it is not efficient (and the std. errors are wrong).
- Today we will learn how to diagnose heteroskedasticity and how to remedy it, either with:
- A new estimator for coefficients and std. errs., or
- Keeping the OLS estimator but fixing the std. errs.
12. What is heteroskedasticity?
- Heteroskedasticity occurs when the size of the errors varies across observations. This arises generally in two ways.
- First, when increases in an independent variable are associated with changes in the error of prediction.
13. What is Heteroskedasticity?
- Heteroskedasticity occurs when the size of the errors varies across observations. This arises generally in two ways.
- Second, when you have subgroups or clusters in your data.
- Example: we might try to predict presidential popularity, measuring average popularity in each year. Of course, there are clusters of years where the same president is in office. Because each president is unique, the errors in predicting Bush's popularity are likely to be a bit different from the errors in predicting Clinton's.
14. How do we recognize this beast?
- Three methods:
- Think about your data: look for analogs of the two ways heteroskedasticity can strike.
- Graphical analysis.
- A formal statistical test.
15. Graphical Analysis
- Plot residuals against the independent variables.
- Expect to see residuals randomly clustered around zero.
- However, you might see a pattern. This is bad.
- Examples follow.
16. Examples
[Figure: three residual plots]
- As x increases, so does the error variance: scatter resid x
- As x increases, the error variance decreases: scatter resid x
- As the predicted value of y increases, so does the error variance: rvfplot (or scatter resid yhat)
17. Good Examples
[Figure: plots with residuals randomly scattered around zero, no pattern]
- scatter y x
- scatter resid x
- rvfplot (scatter resid yhat)
(A runnable version of these commands follows.)
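A minimal Stata sketch for producing these diagnostic plots, assuming a model of y on x (the variable names are placeholders):

regress y x
predict resid, residuals    // store the residuals from the fit
scatter resid x             // residuals vs. an independent variable
rvfplot                     // residuals vs. fitted values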
18. Formal Statistical Tests
- White's Test.
- Heteroskedasticity occurs when the size of the errors is correlated with one or more independent variables.
- We can run OLS, get the residuals, and then see if they are correlated with the independent variables.
19. More Formally,

state  district   turnout  diplomau  mdnincm  pred_turnout    residual
AL            1   151,188      14.7   27,360     200,757.4   -49,569.4
AL            2   216,788      16.7   29,492     205,330      11,457.96
AL            3   147,317      12.3   26,800     197,491.7   -50,174.7
AL            4   226,409       8.1   25,401     191,310.8    35,098.16
AL            5   186,059      20.4   33,189     213,514.6   -27,455.6
20. So, if error increases with x, we violate homoskedasticity
- If we can predict the error with a regression line, we have heteroskedasticity.
- To make this prediction, we need to make every residual positive (square it).
21. So, if error increases with x, we violate homoskedasticity
- Finally, we use these squared residuals as the dependent variable in a new regression.
- If we can predict increases/decreases in the size of the residual, we have found evidence of heteroskedasticity.
- For ind. vars., we use the same ones as in the original regression plus their squares and their cross-products.
22. The Result
- Take the $R^2$ from this regression and multiply it by n.
- This test statistic is distributed $\chi^2$ with degrees of freedom equal to the number of independent variables in the 2nd regression.
- In other words, $nR^2$ is the $\chi^2$ you calculate from your data; compare it to a critical $\chi^2$ from a $\chi^2$ table. If your calculated $\chi^2$ is greater than the critical $\chi^2$, you reject the null hypothesis of homoskedasticity (see the sketch below).
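A minimal do-file sketch of this procedure done by hand, using the turnout example above (the generated variable names like ehat2 and cross are made up for illustration):

* original regression and residuals
regress turnout diplomau mdnincm
predict ehat, residuals
gen ehat2 = ehat^2                  // squared residuals (all positive)

* auxiliary regression: ind. vars., their squares, cross-product
gen diplomau2 = diplomau^2
gen mdnincm2  = mdnincm^2
gen cross     = diplomau*mdnincm
regress ehat2 diplomau mdnincm diplomau2 mdnincm2 cross

* test statistic: n * R-squared, here chi2 with 5 df
display "White chi2(5) = " e(N)*e(r2)
display "p-value       = " chi2tail(5, e(N)*e(r2))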
23. A Sigh of Relief
- Stata will calculate this for you.
- After running the regression, type: imtest, white

. imtest, white

White's test for Ho: homoskedasticity
         against Ha: unrestricted heteroskedasticity

         chi2(5)     =      9.97
         Prob > chi2 =    0.0762

Cameron & Trivedi's decomposition of IM-test

---------------------------------------------------
              Source |       chi2     df       p
---------------------+-----------------------------
  Heteroskedasticity |       9.97      5    0.0762
            Skewness |       3.96      2    0.1378
            Kurtosis |  -28247.96      1    1.0000
---------------------+-----------------------------
               Total |  -28234.03      8    1.0000
---------------------------------------------------
24. An Alternative Test: Breusch/Pagan
- Based on similar logic, with three changes:
- Instead of using $\hat{e}_i^2$ as the D.V. in the 2nd regression, use $\hat{e}_i^2 / \hat{\sigma}^2$, where $\hat{\sigma}^2 = \sum_i \hat{e}_i^2 / n$.
- Instead of using every variable (plus squares and cross-products), you specify the variables you think are causing the heteroskedasticity.
- Alternatively, use only the fitted values $\hat{y}$ as a catch-all.
25. An Alternative Test: Breusch/Pagan
- 3. The test statistic is the RegSS from the 2nd regression divided by 2. It is distributed $\chi^2$ with degrees of freedom equal to the number of independent variables in the 2nd regression (see the sketch below).
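A minimal sketch of the Breusch/Pagan calculation by hand, again using the turnout example (the generated names like u are made up):

* original regression, residuals, and fitted values
regress turnout diplomau mdnincm
predict ehat, residuals
predict yhat, xb
gen ehat2 = ehat^2

* standardized squared residuals
summarize ehat2
gen u = ehat2 / r(mean)

* catch-all version: regress on the fitted values only
regress u yhat
display "B-P chi2(1) = " e(mss)/2    // RegSS of 2nd regression / 2
display "p-value     = " chi2tail(1, e(mss)/2)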
26. Stata Command: hettest

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of turnout

         chi2(1)     =     8.76
         Prob > chi2 =   0.0031

. hettest senate

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: senate

         chi2(1)     =     4.59
         Prob > chi2 =   0.0321

. hettest, rhs

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: diplomau mdnincm senate guber

         chi2(4)     =    11.33
         Prob > chi2 =   0.0231
27. What are you gonna do about it?
- Two remedies:
- We might need to try a different estimator. This will be the Generalized Least Squares (GLS) estimator. The GLS estimator can be applied to data with heteroskedasticity and serial correlation.
- OLS is still consistent (just inefficient), and its standard errors are wrong. We could fix the standard errors and stick with OLS.
28. Generalized Least Squares
- When used to correct heteroskedasticity, we refer to GLS as Weighted Least Squares, or WLS.
- Intuition: some data points have better-quality information about the regression line than others because they have less error. We should give those observations more weight.
29. Non-Constant Variance
- We want constant error variance for all observations: $E(e_i^2) = \sigma^2$, estimated by the RMSE.
- However, with heteroskedasticity, the error variance is not constant: $E(e_i^2) = \sigma_i^2$ (indexed by i).
- If we know what $\sigma_i^2$ is, we can re-weight the equation to make the error variance constant.
30Re-weighting the regression
Begin with the formula Add x0i, a variable that
is always 1
Divide through by si to weight it
We can simplify notation and show its really
just a regression with transformed variables.
Last, we just need to show that the
transformation makes the new error term, ei,
constant
31. GLS vs. OLS
- In OLS, we minimize the sum of the squared errors: $\sum_i e_i^2$.
- In GLS, we minimize a weighted sum of the squared errors: $\sum_i w_i e_i^2$, where we let $w_i = 1/\sigma_i^2$.
- Set the partial derivatives to 0 and solve for a and b to get the estimating equations:
  $a \sum_i w_i + b \sum_i w_i x_i = \sum_i w_i y_i$
  $a \sum_i w_i x_i + b \sum_i w_i x_i^2 = \sum_i w_i x_i y_i$
32. GLS vs. OLS
- OLS: minimize errors, $\sum_i e_i^2$. GLS (WLS): minimize weighted errors, $\sum_i w_i e_i^2$.
- GLS (WLS) is just doing OLS with transformed variables.
- In the same way that we transformed non-linear data to fit the assumptions of OLS, we can transform the data with weights to help heteroskedastic data meet the assumptions of OLS.
33. GLS vs. OLS
- In matrix form:
- OLS: $b = (X'X)^{-1}X'y$
- GLS: $b = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$
- The weights are included in a matrix, $\Omega^{-1}$.
34. Problem
- We rarely know exactly how to weight our data.
- Solutions:
- Plan A: if the heteroskedasticity comes from one specific variable, we can use that variable as the weight.
- Plan B: alternatively, we could run OLS and use the residuals to estimate the weights (observations with large OLS residuals get little weight in the WLS estimates).
35. Plan A: A Single, Known Villain
- Example: household income.
- Households that earn little must spend it all on necessities. When income is low, there is little variance in spending.
- Households that earn a great deal can either spend it all or buy just the essentials and save the rest. There is more error variance as income increases.
- Note the changes in interpretation (a weighting sketch follows).
36. Plan B: Estimate the weights
- Run OLS and get an estimate of the residuals.
- Regress those residuals (squared) on the set of independent variables and get the predicted values.
- Use those predicted values as the weights (see the sketch below).
- Because this is a GLS that is doable, it is called Feasible GLS, or FGLS.
- FGLS converges to GLS as the sample size goes to infinity.
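A minimal sketch of this recipe with hypothetical variables y, x1, and x2. It follows the steps above directly; note that the predicted variances must be positive for the weights to make sense (a log specification for the squared residuals is a common safeguard):

* Step 1: OLS and residuals
regress y x1 x2
predict ehat, residuals

* Step 2: regress squared residuals on the ind. vars.
gen ehat2 = ehat^2
regress ehat2 x1 x2
predict sig2hat, xb                    // predicted error variances

* Step 3: WLS, downweighting high-variance observations
regress y x1 x2 [aweight = 1/sig2hat]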
37. I don't want to do GLS
- I don't blame you. GLS is usually best if we know something about the nature of the heteroskedasticity.
- OLS was unbiased; why can't we just use that?
- It is inefficient (but this is only problematic with very severe heteroskedasticity).
- Its standard errors are incorrect (the formula changes).
- What if we could just fix the standard errors?
38. White Standard Errors
- We can use OLS and just fix the standard errors. There are a number of ways to do this, but the classic is White standard errors.
- This goes by a number of names:
- White Std. Errs.
- Huber-White Std. Errs.
- Robust Std. Errs.
- Heteroskedasticity-Consistent Std. Errs.
39. The big idea
- In OLS, standard errors come from the variance-covariance matrix.
- The Std. Err. is the Std. Dev. of a sampling distribution.
- Variance is the square of the standard deviation (the Std. Dev. is the square root of the variance).
- The variance-covariance matrix for OLS is given by $\hat{\sigma}_e^2 (X'X)^{-1}$.

. vce

Variances:
             diplomau    mdnincm      _cons
----------------------------------------------
 diplomau      254467
  mdnincm    -178.899    .187128
    _cons     1.4e+06   -3172.43    9.3e+07
40. With Heteroskedasticity
- The variance-covariance matrix for OLS is given by $\hat{\sigma}_e^2 (X'X)^{-1}$.
- The variance-covariance matrix under heteroskedasticity is given by $(X'X)^{-1} (X'\Omega X) (X'X)^{-1}$.
- Problem: we still don't know $\Omega$.
- Solution: we can estimate $(X'\Omega X)$ quite well using the OLS residuals, via $\sum_i \hat{e}_i^2 x_i' x_i$, where $x_i$ is the row of X for obs. i.
41. In Stata
- Specify the robust option after the regression:

. regress turnout diplomau mdnincm, robust

Regression with robust standard errors            Number of obs =     426
                                                  F(  2,   423) =   33.93
                                                  Prob > F      =  0.0000
                                                  R-squared     =  0.1291
                                                  Root MSE      =   47766

------------------------------------------------------------------------------
             |               Robust
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    diplomau |   1101.359   548.7361     2.01   0.045     22.77008    2179.948
     mdnincm |   1.111589   .4638605     2.40   0.017       .19983    2.023347
       _cons |   154154.4   9903.283    15.57   0.000     134688.6    173620.1
------------------------------------------------------------------------------
42. Drawbacks
- OLS is still inefficient (though this is not much of a problem unless the heteroskedasticity is really bad).
- Robust SEs require larger sample sizes to give good estimates of the std. errs. (which means the t tests are only OK asymptotically).
- If there is no heteroskedasticity and you use robust SEs, you do slightly worse than with regular std. errs.
43. Moral of the Story
- If you know something about the nature of the heteroskedasticity, WLS is good: it is BLUE.
- If you don't, use OLS with robust std. errs.
- Now: group heteroskedasticity.
44. Group Heteroskedasticity
- There is no GLS/WLS option.
- There is a robust std. err. option: it essentially stacks the clusters into their own kind of mini-White correction (see the sketch below).
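A minimal Stata sketch of that option, using the presidential-popularity example from earlier (the variables popularity, econgrowth, and president are hypothetical):

* Cluster-robust standard errors: allow errors to be
* correlated and differently sized within each president.
regress popularity econgrowth, vce(cluster president)

As with the White correction, this leans on asymptotics, here in the number of clusters rather than the number of observations.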