Title: Econometric Analysis of Panel Data
1Econometric Analysis of Panel Data
- William Greene
- Department of Economics
- Stern School of Business
2Endogeneity
- $y = X\beta + \varepsilon$,
- Definition: $E[\varepsilon|x] \neq 0$
- Why not?
- Omitted variables
- Unobserved heterogeneity (equivalent to omitted variables)
- Measurement error on the RHS (equivalent to omitted variables)
- Structural aspects of the model
- Endogenous sampling and attrition
- Simultaneity (?)
3Instrumental Variable Estimation
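- One "problem" variable: the last one
- $y_{it} = \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + \varepsilon_{it}$
- $E[\varepsilon_{it}|x_{Kit}] \neq 0$. ($= 0$ for all others)
- There exists a variable $z_{it}$ such that
- $E[x_{Kit}|x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = g(x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it})$. In the presence of the other variables, $z_{it}$ "explains" $x_{Kit}$.
- $E[\varepsilon_{it}|x_{1it}, x_{2it}, \dots, x_{K-1,it}, z_{it}] = 0$. In the presence of the other variables, $z_{it}$ and $\varepsilon_{it}$ are uncorrelated.
- A projection interpretation: In the projection
- $x_{Kit} = \theta_1 x_{1it} + \theta_2 x_{2it} + \dots + \theta_{K-1} x_{K-1,it} + \theta_K z_{it}$, $\theta_K \neq 0$.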
4The First IV Study: Natural Experiment (Snow, J., On the Mode of Communication of Cholera, 1855) http://www.ph.ucla.edu/epi/snow/snowbook3.html
- London cholera epidemic, ca. 1853-4
- Cholera = f(Water Purity, u) + $\varepsilon$.
- Causal effect of water purity on cholera?
- Purity = f(cholera-prone environment (poor, garbage in streets, rodents, etc.)). Regression does not work.
- Two London water companies: Lambeth, Southwark
- [Diagram labels: main sewage discharge, River Thames]
Paul Grootendorst: A Review of Instrumental Variables Estimation of Treatment Effects, http://individual.utoronto.ca/grootendorst/pdf/IV_Paper_Sept6_2007.pdf
5IV Estimation
- Cholera = f(Purity, u) + $\varepsilon$
- Z = water company
- Cov(Cholera, Z) = $\delta$ Cov(Purity, Z)
- Z is randomly mixed in the population (two full sets of pipes) and uncorrelated with behavioral unobservables, u
- Cholera = $\alpha$ + $\delta$ Purity + u + $\varepsilon$
- Purity = Mean + random variation + $\lambda$u
- Cov(Cholera, Z) = $\delta$ Cov(Purity, Z)
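A minimal simulation sketch of the covariance-ratio logic on this slide. The data are synthetic, and every number below (including the true effect of -2.0) and every variable name is an assumption chosen for illustration, not an estimate from Snow's study. It compares the ordinary regression slope with the ratio Cov(Cholera, Z)/Cov(Purity, Z).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
delta = -2.0                       # hypothetical causal effect of purity on cholera

z = rng.integers(0, 2, size=n)     # water company, randomly mixed in the population
u = rng.normal(size=n)             # unobserved "cholera-prone environment"
purity = 1.0 + 1.5 * z - 0.8 * u + rng.normal(size=n)    # purity depends on z and on u
cholera = 3.0 + delta * purity + u + rng.normal(size=n)  # u also affects cholera directly

def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))

delta_ols = cov(cholera, purity) / cov(purity, purity)   # regression slope, contaminated by u
delta_iv = cov(cholera, z) / cov(purity, z)              # covariance ratio using the instrument

print("OLS slope:", delta_ols)
print("IV ratio: ", delta_iv)
```

Because z is generated independently of u, the ratio recovers the assumed effect up to sampling error, while the regression slope absorbs the correlation between purity and u.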
6Cornwell and Rupert Data
Cornwell and Rupert Returns to Schooling Data, 595 Individuals, 7 Years
Variables in the file are:
EXP = work experience
WKS = weeks worked
OCC = occupation, 1 if blue collar
IND = 1 if manufacturing industry
SOUTH = 1 if resides in south
SMSA = 1 if resides in a city (SMSA)
MS = 1 if married
FEM = 1 if female
UNION = 1 if wage set by union contract
ED = years of education
LWAGE = log of wage = dependent variable in regressions
These data were analyzed in Cornwell, C. and Rupert, P., "Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variable Estimators," Journal of Applied Econometrics, 3, 1988, pp. 149-155. See Baltagi, page 122 for further analysis. The data were downloaded from the website for Baltagi's text.
8Specification: Quadratic Effect of Experience
9The Effect of Education on LWAGE
10What Influences LWAGE?
11An Exogenous Influence
12Instrumental Variables
- Structure
- LWAGE (ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION)
- ED (MS, FEM)
- Reduced Form: LWAGE[ ED (MS, FEM), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
13Two Stage Least Squares Strategy
- Reduced Form: LWAGE[ ED (MS, FEM, X), EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION ]
- Strategy
- (1) Purge ED of the influence of everything but MS, FEM (and the other variables). Predict ED using all exogenous information in the sample (X and Z).
- (2) Regress LWAGE on this prediction of ED and everything else.
- Standard errors must be adjusted for the predicted ED
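The two-step strategy can be sketched with NumPy on synthetic data. This is an illustration under stated assumptions, not the computation behind the slides: the variables `x1`, `z1`, `z2` are hypothetical stand-ins for the exogenous regressors and for MS and FEM, and the coefficient values are invented. The last lines show why the standard errors need adjusting: the residual variance must use the actual ED, not the prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x1 = rng.normal(size=n)                  # exogenous regressor (stand-in for EXP, WKS, ...)
z1, z2 = rng.normal(size=(2, n))         # excluded instruments (stand-ins for MS, FEM)
u = rng.normal(size=n)                   # unobservable that drives both ED and LWAGE
ed = 0.5 * x1 + 0.7 * z1 - 0.4 * z2 + u + rng.normal(size=n)
lwage = 1.0 + 0.10 * ed + 0.3 * x1 - 0.5 * u + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, ed, x1])      # regressors in the wage equation
Z = np.column_stack([ones, x1, z1, z2])  # all exogenous information

def ols(A, y):
    """Least squares coefficients of y on the columns of A."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Step 1: predict ED from all exogenous variables.
ed_hat = Z @ ols(Z, ed)

# Step 2: regress LWAGE on the prediction of ED and everything else.
X_hat = np.column_stack([ones, ed_hat, x1])
b_2sls = ols(X_hat, lwage)
b_ols = ols(X, lwage)

# Adjustment: residuals must be computed with the actual ED, not ed_hat.
e_wrong = lwage - X_hat @ b_2sls         # what the second-step regression reports
e_right = lwage - X @ b_2sls             # residuals the 2SLS covariance matrix needs
s2_wrong = e_wrong @ e_wrong / n
s2_right = e_right @ e_right / n
se_2sls = np.sqrt(s2_right * np.diag(np.linalg.inv(X_hat.T @ X_hat)))

print("OLS  coefficient on ED:", b_ols[1])
print("2SLS coefficient on ED:", b_2sls[1], "s.e.", se_2sls[1])
print("s2 from step 2:", s2_wrong, " corrected s2:", s2_right)
```

Running the second step in a regression package without this correction would report standard errors based on `s2_wrong`.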
14OLS
15The weird results for the coefficient on ED happened because the instruments, MS and FEM, are dummy variables. There is not enough variation in these variables.
16Source of Endogeneity
- LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + $\varepsilon$
- ED = f(MS, FEM, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u
17Remove the Endogeneity
- LWAGE = f(ED, EXP, EXPSQ, WKS, OCC, SOUTH, SMSA, UNION) + u + $\varepsilon$
- Strategy
- Estimate u
- Add the estimated u to the equation. ED is uncorrelated with $\varepsilon$ when u is in the equation.
18Auxiliary Regression for ED to Obtain Residuals
19OLS with Residual (Control Function) Added
2SLS
20A Warning About Control Functions
Sum of squares is not computed correctly because
U is in the regression. A general result. Control
function estimators usually require a fix to the
estimated covariance matrix for the estimator.
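The warning can be illustrated with a small sketch on synthetic data (all names and coefficient values are assumptions for illustration). Adding the first-stage residual to the wage equation reproduces the 2SLS coefficients, but the residual sum of squares from that regression is not the one the 2SLS covariance matrix needs.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x1 = rng.normal(size=n)                  # exogenous regressor
z = rng.normal(size=n)                   # excluded instrument
u = rng.normal(size=n)                   # disturbance in the ED equation
ed = 0.5 * x1 + 0.8 * z + u
lwage = 1.0 + 0.10 * ed + 0.3 * x1 - 0.5 * u + rng.normal(size=n)

ones = np.ones(n)
X = np.column_stack([ones, ed, x1])
Z = np.column_stack([ones, x1, z])

def ols(A, y):
    """Least squares coefficients of y on the columns of A."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Auxiliary regression for ED, keeping the residuals.
u_hat = ed - Z @ ols(Z, ed)

# OLS with the residual (control function) added.
X_cf = np.column_stack([X, u_hat])
b_cf = ols(X_cf, lwage)

# IV estimator for comparison (just identified here, so it equals 2SLS).
b_2sls = np.linalg.solve(Z.T @ X, Z.T @ lwage)

# Residual variances: the control function regression understates the 2SLS one.
e_cf = lwage - X_cf @ b_cf
e_2sls = lwage - X @ b_2sls
s2_cf = e_cf @ e_cf / n
s2_2sls = e_2sls @ e_2sls / n

print("control function:", b_cf[:3])
print("2SLS:            ", b_2sls)
print("s2 (control function):", s2_cf, " s2 (2SLS):", s2_2sls)
```

The coefficients on the original regressors agree to machine precision, while `s2_cf` is smaller than `s2_2sls`, so standard errors taken directly from the control function regression are not the 2SLS standard errors.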
21On Sat, May 3, 2014 at 4:48 PM, [name withheld] wrote:
Dear Professor Greene, I am giving an Econometrics course in Brazil and we are using your great textbook. I got a question which I think only you can help me. In our last class, I did a formal proof that var(beta_hat_OLS) is lower or equal than var(beta_hat_2SLS), under homoscedasticity. We know this assertive is also valid under heteroscedasticity, but a graduate student asked me the proof (which is my problem). Do you know where can I find it?
25The General Problem
26Instrumental Variables
- Framework: $y = X\beta + \varepsilon$, K variables in X.
- There exists a set of K variables, Z, such that
- $\text{plim}(Z'X/n) \neq 0$ but $\text{plim}(Z'\varepsilon/n) = 0$
- The variables in Z are called instrumental variables.
- An alternative (to least squares) estimator of $\beta$ is
- $b_{IV} = (Z'X)^{-1}Z'y$
- We consider the following:
- Why use this estimator?
- What are its properties compared to least squares?
- We will also examine an important application
27IV Estimators
- Consistent
- $b_{IV} = (Z'X)^{-1}Z'y$
- $= (Z'X/n)^{-1}(Z'X/n)\beta + (Z'X/n)^{-1}Z'\varepsilon/n$
- $= \beta + (Z'X/n)^{-1}Z'\varepsilon/n \rightarrow \beta$
- Asymptotically normal (same approach to proof as for OLS)
- Inefficient: to be shown.
- By construction, the IV estimator is consistent. We have an estimator that is consistent when least squares is not.
28IV Estimation
- Why use an IV estimator? Suppose that X and $\varepsilon$ are not uncorrelated. Then least squares is neither unbiased nor consistent.
- Recall the proof of consistency of least squares:
- $b = \beta + (X'X/n)^{-1}(X'\varepsilon/n)$.
- $\text{plim}\, b = \beta$ requires $\text{plim}(X'\varepsilon/n) = 0$. If this does not hold, the estimator is inconsistent.
29A Popular Misconception
- If only one variable in X is correlated with $\varepsilon$, the other coefficients are consistently estimated. False.
- The problem is "smeared" over the other coefficients.
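A short simulation, offered as a sketch on invented data, shows the smearing. Only `x2` is correlated with the disturbance, but because `x1` and `x2` are correlated, the least squares coefficient on the exogenous `x1` is also off. The matrix IV estimator with instrument `z` (a hypothetical variable) recovers both coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x1 = rng.normal(size=n)                  # exogenous regressor
z = rng.normal(size=n)                   # instrument for x2
eps = rng.normal(size=n)                 # structural disturbance
x2 = 0.6 * x1 + 0.8 * z + 0.7 * eps + rng.normal(size=n)   # x2 is correlated with eps
y = 1.0 + 2.0 * x1 + 3.0 * x2 + eps

ones = np.ones(n)
X = np.column_stack([ones, x1, x2])
Z = np.column_stack([ones, x1, z])       # x1 instruments itself, z replaces x2

b_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y

print("true:", [1.0, 2.0, 3.0])
print("OLS: ", b_ols)
print("IV:  ", b_iv)
```

The least squares error on `x1` comes entirely through its correlation with `x2`. If `x1` and `x2` were uncorrelated in this design, only the coefficient on `x2` would be affected.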
30Consistency and Asymptotic Normality of the IV
Estimator
31Asymptotic Covariance Matrix of bIV
32Asymptotic Efficiency
- Asymptotic efficiency of the IV estimator. The variance is larger than that of LS. (A large sample type of Gauss-Markov result is at work.)
- (1) It's a moot point. LS is inconsistent.
- (2) Mean squared error is uncertain:
- MSE[estimator|$\beta$] = Variance + square of bias.
- IV may be better or worse. Depends on the data.
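The trade-off between variance and squared bias can be seen in a small Monte Carlo sketch. The design (one regressor, no intercept, mild endogeneity, the particular coefficients and sample sizes) is an assumption made for illustration only.

```python
import numpy as np

def mse_ols_iv(n, reps, beta=1.0, seed=0):
    """Monte Carlo mean squared error of the OLS and IV slope estimators."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))                 # instrument
    v = rng.normal(size=(reps, n))                 # part of x that is correlated with eps
    eps = 0.1 * v + rng.normal(size=(reps, n))     # mild endogeneity
    x = 0.5 * z + v
    y = beta * x + eps
    b_ols = (x * y).sum(axis=1) / (x * x).sum(axis=1)
    b_iv = (z * y).sum(axis=1) / (z * x).sum(axis=1)
    return np.mean((b_ols - beta) ** 2), np.mean((b_iv - beta) ** 2)

mse_ols_small, mse_iv_small = mse_ols_iv(n=50, reps=2000)
mse_ols_large, mse_iv_large = mse_ols_iv(n=5000, reps=200)

print("n = 50:   MSE OLS", mse_ols_small, " MSE IV", mse_iv_small)
print("n = 5000: MSE OLS", mse_ols_large, " MSE IV", mse_iv_large)
```

In this design the small bias of least squares is outweighed by the sampling variance of IV when n is small, while in the large sample the bias dominates and IV has the lower mean squared error.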
33Two Stage Least Squares
- How to use an "excess" of instrumental variables
- (1) X is K variables. Some (at least one) of the K variables in X are correlated with $\varepsilon$.
- (2) Z is M > K variables. Some of the variables in Z are also in X, some are not. None of the variables in Z are correlated with $\varepsilon$.
- (3) Which K variables to use to compute $Z'X$ and $Z'y$?
34Choosing the Instruments
- Choose K randomly?
- Choose the included Xs and the remainder randomly?
- Use all of them? How?
- A theorem (Brundy and Jorgenson, ca. 1972): There is a most efficient way to construct the IV estimator from this subset:
- (1) For each column (variable) in X, compute the predictions of that variable using all the columns of Z.
- (2) Linearly regress y on these K predictions.
- This is two stage least squares
35Algebraic Equivalence
- Two stage least squares is equivalent to
- (1) each variable in X that is also in Z is replaced by itself.
- (2) Variables in X that are not in Z are replaced by predictions of that X using
- All other variables in X that are not correlated with $\varepsilon$
- All the variables in Z that are not in X.
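The equivalence can be checked numerically. The sketch below uses synthetic data with M = 5 instruments and K = 3 regressors (all names and values are assumptions). It regresses every column of X on Z, confirms that the columns of X that are also in Z are reproduced exactly, and compares the regression on the predictions with the formula $(\hat{X}'X)^{-1}\hat{X}'y$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
x1 = rng.normal(size=n)                          # exogenous, appears in both X and Z
z1, z2, z3 = rng.normal(size=(3, n))             # instruments that are not in X
eps = rng.normal(size=n)
x2 = 0.5 * x1 + 0.6 * z1 + 0.4 * z2 - 0.3 * z3 + 0.8 * eps + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + eps

ones = np.ones(n)
X = np.column_stack([ones, x1, x2])              # K = 3
Z = np.column_stack([ones, x1, z1, z2, z3])      # M = 5 > K

# Stage 1: predict every column of X using all the columns of Z.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

# Stage 2: regress y on the K predictions.
b_two_step = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Same estimator written as an IV estimator with instruments X_hat.
b_formula = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

print("columns also in Z reproduced:", np.allclose(X_hat[:, :2], X[:, :2]))
print("two-step:", b_two_step)
print("formula: ", b_formula)
```

The constant and `x1` are predicted perfectly because they are columns of Z, so only `x2` is actually replaced by a prediction.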
36The weird results for the coefficient on ED happened because the instruments, MS and FEM, are dummy variables. There is not enough variation in these variables.
372SLS Algebra
38Asymptotic Covariance Matrix for 2SLS
392SLS Has Larger Variance than LS
40Estimating $\sigma^2$
41Robust estimation of VC
[Formula labels: Predicted X, Actual X]
422SLS vs. Robust Standard Errors
Robust Standard Errors
-------------------------------------------------
Variable  Coefficient   Standard Error  b/St.Er.
-------------------------------------------------
B_1       45.4842872    4.02597121       11.298
B_2         .05354484    .01264923        4.233
B_3        -.00169664    .00029006       -5.849
B_4         .01294854    .05757179         .225
B_5         .38537223    .07065602        5.454
B_6         .36777247    .06472185        5.682
B_7         .95530115    .08681261       11.000
-------------------------------------------------
2SLS Standard Errors
-------------------------------------------------
Variable  Coefficient   Standard Error  b/St.Er.
-------------------------------------------------
B_1       45.4842872     .36908158      123.236
B_2         .05354484    .03139904        1.705
B_3        -.00169664    .00069138       -2.454
B_4         .01294854    .16266435         .080
B_5         .38537223    .17645815        2.184
B_6         .36777247    .17284574        2.128
B_7         .95530115    .20846241        4.583
43Endogeneity Test? (Hausman)
         Exogenous                 Endogenous
OLS      Consistent, Efficient     Inconsistent
2SLS     Consistent, Inefficient   Consistent
- Base a test on $d = b_{2SLS} - b_{OLS}$. Use a Wald statistic, $d'[\text{Var}(d)]^{-1}d$
- What to use for the variance matrix? Hausman: $V_{2SLS} - V_{OLS}$
44Hausman Test
45Hausman Test One at a Time?
46Endogeneity Test: Wu
- Considerable complication in Hausman test (text, pp. 234-237)
- Simplification: Wu test.
- Regress y on X and the predictions $\hat{X}$ estimated for the endogenous part of X. Then use an ordinary Wald test.
47Wu Test
48Regression Based Endogeneity Test
49Testing Endogeneity of WKS
(1) Regress WKS on 1, EXP, EXPSQ, OCC, SOUTH, SMSA, MS. U = residual, WKSHAT = prediction
(2) Regress LWAGE on 1, EXP, EXPSQ, OCC, SOUTH, SMSA, WKS, and U or WKSHAT
Regression including U:
------------------------------------------------------------------------
Variable  Coefficient    Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
------------------------------------------------------------------------
Constant  -9.97734299     .75652186      -13.188    .0000
EXP         .01833440     .00259373        7.069    .0000    19.8537815
EXPSQ      -.799491D-04   .603484D-04     -1.325    .1852    514.405042
OCC        -.28885529     .01222533      -23.628    .0000      .51116447
SOUTH      -.26279891     .01439561      -18.255    .0000      .29027611
SMSA        .03616514     .01369743        2.640    .0083      .65378151
WKS         .35314170     .01638709       21.550    .0000    46.8115246
U          -.34960141     .01642842      -21.280    .0000     -.341879D-14
------------------------------------------------------------------------
Regression including WKSHAT:
------------------------------------------------------------------------
Variable  Coefficient    Standard Error  b/St.Er.  P[|Z|>z]  Mean of X
------------------------------------------------------------------------
Constant  -9.97734299     .75652186      -13.188    .0000
EXP         .01833440     .00259373        7.069    .0000    19.8537815
EXPSQ      -.799491D-04   .603484D-04     -1.325    .1852    514.405042
OCC        -.28885529     .01222533      -23.628    .0000      .51116447
SOUTH      -.26279891     .01439561      -18.255    .0000      .29027611
SMSA        .03616514     .01369743        2.640    .0083      .65378151
WKS         .00354028     .00116459        3.040    .0024    46.8115246
WKSHAT      .34960141     .01642842       21.280    .0000    46.8115246
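The two regressions carry the same information, which a sketch on synthetic data can confirm. Variable names mimic the slide but the data and coefficients are invented. Because WKS = WKSHAT + U, the coefficient on WKSHAT is the negative of the coefficient on U, the t ratios match in absolute value, and the WKS coefficient in the second regression is the sum of the WKS and U coefficients in the first.

```python
import numpy as np

def ols(A, y):
    """OLS coefficients and conventional standard errors."""
    b = np.linalg.lstsq(A, y, rcond=None)[0]
    e = y - A @ b
    s2 = e @ e / (len(y) - A.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(A.T @ A)))
    return b, se

rng = np.random.default_rng(6)
n = 4_000
exp_ = rng.normal(size=n)                          # exogenous control
ms = rng.integers(0, 2, size=n).astype(float)      # instrument (dummy variable)
eps = rng.normal(size=n)
wks = 1.0 + 0.5 * exp_ + 1.0 * ms + 0.6 * eps + rng.normal(size=n)   # endogenous
lwage = 2.0 + 0.3 * exp_ + 0.05 * wks + eps

ones = np.ones(n)
Z = np.column_stack([ones, exp_, ms])

# (1) First-stage regression: prediction and residual.
g = np.linalg.lstsq(Z, wks, rcond=None)[0]
wkshat = Z @ g
u = wks - wkshat

# (2) Augmented regressions with U or with WKSHAT.
b_u, se_u = ols(np.column_stack([ones, exp_, wks, u]), lwage)
b_h, se_h = ols(np.column_stack([ones, exp_, wks, wkshat]), lwage)

t_u = b_u[3] / se_u[3]
t_h = b_h[3] / se_h[3]
print("coef on U:", b_u[3], " t:", t_u)
print("coef on WKSHAT:", b_h[3], " t:", t_h)
print("WKS coef with WKSHAT:", b_h[2], " = WKS + U coefs:", b_u[2] + b_u[3])
```

A large absolute t ratio on the added variable is evidence against exogeneity of WKS. In this synthetic design the endogeneity is built in, so the test rejects.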
50General Test for Endogeneity
51Alternative to Hausman's Formula?
- H test requires the difference between an efficient and an inefficient estimator.
- Any way to compare any two competing estimators even if neither is efficient?
- Bootstrap? (Maybe)
53Weak Instruments
- Symptom: The relevance condition, $\text{plim}\, Z'X/n$ not zero, is close to being violated.
- Detection
- Standard F test in the regression of $x_k$ on Z. F < 10 suggests a problem.
- F statistic based on 2SLS: see text p. 351.
- Remedy
- Not much: most of the discussion is about the condition, not what to do about it.
- Use LIML? Requires a normality assumption. Probably not too restrictive.
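The detection step can be sketched as follows. The function `first_stage_F` is a hypothetical helper written for this illustration, not a library routine. It computes the standard F statistic for the hypothesis that the coefficients on the excluded instruments are all zero in the regression of $x_k$ on Z. The data are synthetic.

```python
import numpy as np

def first_stage_F(xk, exog, excluded):
    """F statistic for H0: all coefficients on the excluded instruments are zero
    in the regression of xk on [exog, excluded]."""
    def ssr(A):
        resid = xk - A @ np.linalg.lstsq(A, xk, rcond=None)[0]
        return resid @ resid
    full = np.column_stack([exog, excluded])
    q = full.shape[1] - exog.shape[1]          # number of excluded instruments
    dof = len(xk) - full.shape[1]              # residual degrees of freedom
    return ((ssr(exog) - ssr(full)) / q) / (ssr(full) / dof)

rng = np.random.default_rng(7)
n = 500
exog = np.column_stack([np.ones(n), rng.normal(size=n)])   # included exogenous variables
z = rng.normal(size=n)                                     # excluded instrument
v = rng.normal(size=n)

F_strong = first_stage_F(0.5 * z + v, exog, z)    # instrument clearly relevant
F_weak = first_stage_F(0.02 * z + v, exog, z)     # instrument barely relevant

print("F, strong instrument:", F_strong)
print("F, weak instrument:  ", F_weak)
```

Comparing each value with the rule-of-thumb threshold of 10 from the slide flags the second design as a likely weak-instrument case.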
54Weak Instruments (cont.)
55Weak Instruments
56A study of moral hazard
Riphahn, Wambach, Million: Incentive Effects in the Demand for Healthcare, Journal of Applied Econometrics, 2003
Did the presence of the ADDON insurance influence the demand for health care (doctor visits and hospital visits)?
For a simple example, we examine the PUBLIC insurance (89%) instead of ADDON insurance (2%).
57Application: Health Care Panel Data
German Health Care Usage Data, 7,293 Individuals, Varying Numbers of Periods
Data downloaded from Journal of Applied Econometrics Archive. This is an unbalanced panel with 7,293 individuals. They can be used for regression, count models, binary choice, ordered choice, and bivariate binary choice. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987.) Note, the variable NUMOBS below tells how many observations there are for each person. This variable is repeated in each row of the data for the person.
Variables in the file are:
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000. (4 observations with income = 0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
MARRIED = marital status
58Evidence of Moral Hazard?
59Regression Study
60Endogenous Dummy Variable
- Doctor Visits = f(Age, Educ, Health, Presence of Insurance, Other unobservables)
- Insurance = f(Expected Doctor Visits, Other unobservables)
61Approaches
- (Parametric) Control Function: Build a structural model for the two variables (Heckman)
- (Semiparametric) Instrumental Variable: Create an instrumental variable for the dummy variable (Barnow/Cain/Goldberger, Angrist, current generation of researchers)
- (?) Propensity Score Matching (Heckman et al., Becker/Ichino, many recent researchers)
62Heckman's Control Function Approach
- $Y = x\beta + \delta T + E[\varepsilon|T] + \{\varepsilon - E[\varepsilon|T]\}$
- $\lambda = E[\varepsilon|T]$, computed from a model for whether T = 0 or 1
Magnitude = 11.1200 is nonsensical in this context.
63Instrumental Variable Approach
- Construct a prediction for T using only the exogenous information
- Use 2SLS using this instrumental variable.
Magnitude = 23.9012 is also nonsensical in this context.
64Propensity Score Matching
- Create a model for T that produces probabilities for T = 1: "Propensity Scores"
- Find people with the same propensity score: some with T = 1, some with T = 0
- Compare number of doctor visits of those with T = 1 to those with T = 0.
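The three steps can be sketched on synthetic data. Everything here is an assumption made for illustration: the covariates `age` and `health`, the logit model for the scores, the true effect of 1.0, and the choice of single nearest-neighbor matching with replacement, which is only one of several matching schemes.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 20_000
age = rng.normal(size=n)                      # standardized, hypothetical covariates
health = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 0.8 * age - 0.8 * health)))
T = (rng.uniform(size=n) < p_true).astype(float)          # insurance dummy
visits = 3.0 + 1.0 * T + 1.5 * age - 1.5 * health + rng.normal(size=n)

# Step 1: logit model for T, fitted by Newton's method, gives propensity scores.
W = np.column_stack([np.ones(n), age, health])
g = np.zeros(W.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-W @ g))
    grad = W.T @ (T - p)
    hess = (W * (p * (1.0 - p))[:, None]).T @ W
    g = g + np.linalg.solve(hess, grad)
score = 1.0 / (1.0 + np.exp(-W @ g))

# Step 2: for each treated person, find the control with the closest score.
treated = np.flatnonzero(T == 1)
control = np.flatnonzero(T == 0)
order = np.argsort(score[control])
c_sorted = score[control][order]
pos = np.searchsorted(c_sorted, score[treated])
lo = np.clip(pos - 1, 0, len(c_sorted) - 1)
hi = np.clip(pos, 0, len(c_sorted) - 1)
use_lo = np.abs(c_sorted[lo] - score[treated]) <= np.abs(c_sorted[hi] - score[treated])
matched = control[order[np.where(use_lo, lo, hi)]]

# Step 3: compare outcomes of the treated with their matched controls.
att = np.mean(visits[treated] - visits[matched])
naive = visits[T == 1].mean() - visits[T == 0].mean()
print("naive difference in means:", naive)
print("matched estimate (effect on the treated):", att)
```

The naive comparison mixes the insurance effect with differences in age and health between the two groups; matching on the score compares people who were about equally likely to be insured.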
65Treatment Effect
- Earnings and Education: Effect of an additional year of schooling
- Estimating Average and Local Average Treatment Effects of Education when Compulsory Schooling Laws Really Matter
- Philip Oreopoulos
- AER, 96, 1, 2006, 152-175
66Treatment Effects and Natural Experiments
67How do panel data fit into this?
- We can use the usual models.
- We can use far more elaborate models
- We can study effects through time
- Observations are surely correlated.
- The same individual is observed more than once
- Unobserved heterogeneity that appears in the disturbance in a cross section remains persistent across observations (on the same unit).
- Procedures must be adjusted.
- Dynamic effects are likely to be present.
68Appendix Structure and Regression
69Least Squares Revisited
70Inference with IV Estimators
71Comparing OLS and IV
72Testing for Endogeneity(?)
73Structure vs. Regression
- Reduced Form vs. Structural Model
- Simultaneous equations origin
- Q(d) = a0 + a1P + a2I + e(d) (demand)
  Q(s) = b0 + b1P + b2R + e(s) (supply)
  Q(.) = Q(d) = Q(s)
  What is the effect of a change in I on Q(.)? (Not a regression)
- Reduced form: Q = c0 + c1I + c2R + v. (Regression)
- Modern concepts of structure vs. regression: The search for causal effects.
74Implications
- The structure is the theory
- The regression is the conditional mean
- There is always a conditional mean
- It may not equal the structure
- It may be linear in the same variables
- What is the implication for least squares estimation?
- LS estimates regressions
- LS does not necessarily estimate structures
- Structures may not be estimable: they may not be identified.
75Structure and Regression
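- Simultaneity? What if $E[\varepsilon|x] \neq 0$
- $y = x\beta + \varepsilon$, $x = \delta y + u$. $\text{Cov}[x, \varepsilon] \neq 0$
- $x\beta$ is not the regression?
- What is the regression?
- Reduced form: Assume $\varepsilon$ and u are uncorrelated.
- $y = [\beta/(1 - \beta\delta)]u + [1/(1 - \beta\delta)]\varepsilon$
- $x = [1/(1 - \beta\delta)]u + [\delta/(1 - \beta\delta)]\varepsilon$
- $\text{Cov}[x, y]/\text{Var}[x] = \gamma$
- The regression is $y = \gamma x + v$, where $E[v|x] = 0$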
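The regression slope can be worked out from the two reduced form equations. The derivation below is a sketch under the slide's assumption that the two disturbances are uncorrelated. The variance symbols $\sigma_u^2$ and $\sigma_\varepsilon^2$ are notation introduced here, with $\beta$ and $\delta$ the two structural coefficients and $\gamma$ the slope $\text{Cov}[x,y]/\text{Var}[x]$.

```latex
\begin{aligned}
\operatorname{Cov}[x,y] &= \frac{\beta\,\sigma_u^2 + \delta\,\sigma_\varepsilon^2}{(1-\beta\delta)^2},
\qquad
\operatorname{Var}[x] = \frac{\sigma_u^2 + \delta^2\sigma_\varepsilon^2}{(1-\beta\delta)^2},\\[4pt]
\gamma &= \frac{\operatorname{Cov}[x,y]}{\operatorname{Var}[x]}
        = \frac{\beta\,\sigma_u^2 + \delta\,\sigma_\varepsilon^2}{\sigma_u^2 + \delta^2\sigma_\varepsilon^2},\\[4pt]
\gamma - \beta &= \frac{\delta\,\sigma_\varepsilon^2\,(1-\beta\delta)}{\sigma_u^2 + \delta^2\sigma_\varepsilon^2}.
\end{aligned}
```

With $\beta\delta \neq 1$ and $\sigma_\varepsilon^2 > 0$, the regression slope equals the structural coefficient only when $\delta = 0$, that is, when there is no feedback from y to x.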
76Structure vs. Regression
Supply = a + b Price + c Capacity
Demand = A + B Price + C Income