Title: NonExperimental Data: Natural Experiments and more on IV
1Non-Experimental Data: Natural Experiments and more on IV
3Non-Experimental Data
- Refers to all data that has not been collected as part of an experiment
- Quality of analysis depends on how well one can deal with problems of:
  - Omitted variables
  - Reverse causality
  - Measurement error
  - Selection
- Or how close one can get to experimental conditions
4Natural/ Quasi Experiments
- Used to refer to a situation that is not experimental but is as if it was
- Not a precise definition: saying your data is a natural experiment makes it sound better
- Refers to case where variation in X is good variation (directly or indirectly via instrument)
- A famous example: London, 1854
5The Case of the Broad Street Pump
- Regular cholera epidemics in 19th century London
- Widely believed to be caused by bad air
- John Snow thought bad water was cause
- Experimental design would be to randomly give some people good water and some bad water
- Ethical problems with this
6Soho Outbreak August/September 1854
- People closest to Broad Street Pump most likely to die
- But breathe same air so does not resolve air vs. water hypothesis
- Nearby workhouse had own well and few deaths
- Nearby brewery had own well and no deaths (workers all drank beer)
7Why is this a Natural experiment?
- Variation in water supply as if it had been randomly assigned: other factors (air) held constant
- Can then estimate treatment effect using difference in means
- Or run regression of death on water source, distance to pump, other factors
- Strongly suggests water the cause
- Woman died in Hampstead, niece in Islington
8What's that got to do with it?
- Aunt liked taste of water from Broad Street pump
- Had it delivered every day
- Niece had visited her
- Investigation of well found contamination by sewer
- This is non-experimental data but analysed in a way that makes a very powerful case: no theory either
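The difference-in-means estimate mentioned for the Snow case can be sketched as follows. The data are made up purely for illustration (they are not Snow's figures), and the variable names `died` and `pump` are assumptions. The sketch also shows that a regression of death on a constant and a water-source dummy gives the same number as its slope.

```python
import numpy as np

# Hypothetical, made-up data for illustration only (NOT Snow's records).
# died[i] = 1 if person i died; pump[i] = 1 if they drank Broad Street water.
died = np.array([1, 1, 0, 1, 0, 0, 0, 1, 0, 0])
pump = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Treatment effect as a difference in mean death rates by water source.
diff_means = died[pump == 1].mean() - died[pump == 0].mean()

# OLS of death on a constant and the pump dummy: the slope coefficient
# equals the difference in means.
X = np.column_stack([np.ones(len(pump)), pump]).astype(float)
beta = np.linalg.lstsq(X, died.astype(float), rcond=None)[0]

print("difference in means:", diff_means)
print("OLS slope on pump dummy:", beta[1])
```

Adding further columns to `X` (distance to pump, other factors) would turn this into the fuller regression the slide mentions.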
9Methods for Analysing Data from Natural
Experiments
- If data is as if it were experimental then can use all techniques described for experimental data:
  - OLS (perhaps Snow case)
  - IV to get appropriate units of measurement
- Will say more about IV than OLS:
  - IV perhaps more common
  - If can use OLS, not much more to say
  - With IV there is more to say: weak instruments
10Conditions for Instrument Validity
- To be a valid instrument:
  - Must be correlated with X: testable
  - Must be uncorrelated with error: untestable, have to argue case for this assumption
- These conditions guaranteed with instrument for experimental data
- But more problematic for data from quasi-experiments
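A small simulation may help show what the two conditions buy. This is a sketch under assumed values: a true effect of 2, an unobserved confounder `u`, and an instrument `z` that satisfies both relevance and exogeneity by construction. None of these numbers come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder (in the error)
x = 1.0 * z + u + rng.normal(size=n)         # relevance: x is correlated with z
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # exogeneity: z affects y only through x

def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]

b_ols = cov(x, y) / cov(x, x)   # inconsistent: x is correlated with u
b_iv = cov(z, y) / cov(z, x)    # just-identified IV estimator

print("OLS:", b_ols)
print("IV: ", b_iv)
```

In this design OLS converges to 3 rather than 2, while IV recovers the true effect. In real quasi-experimental data only the relevance condition can be checked this directly.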
11Bombs, Bones and Breakpoints: The Geography of Economic Activity (Davis and Weinstein, AER, 2002)
- Existence of agglomerations (e.g. cities) a puzzle
- Land and labour costs higher, so why don't firms relocate to increase profits?
- Must be some compensatory productivity effect
- Different hypotheses about this:
  - Locational fundamentals
  - Increasing returns (Krugman): path-dependence
12Testing these Hypotheses
- Consider a temporary shock to city population
- Locational fundamentals theory would predict no permanent effect
- Increasing returns would suggest permanent effect
- Would like to do experiment of randomly assigning shocks to city size
- This is not going to happen
13The Davis-Weinstein idea
- Use US bombing of Japanese cities in WW2
- This is a natural experiment, not a true experiment, because:
  - WW2 not caused by desire to test theories of economic geography
  - Pattern of US bombing not random
- Sample is 303 Japanese cities; data is:
  - Population before and after bombing
  - Measures of destruction
14Basic Equation
- $\Delta s_{i,47-40}$ is change in population just before and after war
- $\Delta s_{i,60-47}$ is change in population at later period
- How to test hypotheses:
  - Locational fundamentals predicts $\beta_1 = -1$
  - Increasing returns predicts $\beta_1 = 0$
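The equation itself did not survive on this slide. A sketch consistent with the two predictions just listed, assuming a simple linear specification (the intercept $\beta_0$ and error $\varepsilon_i$ are assumed notation, not taken from the slide), would be:

```latex
\Delta s_{i,60-47} = \beta_0 + \beta_1 \, \Delta s_{i,47-40} + \varepsilon_i
```

Under this reading, $\beta_1 = -1$ means the wartime change in population is fully reversed in the later period (no permanent effect), while $\beta_1 = 0$ means later growth is unrelated to the wartime shock, so the shock is permanent.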
15The IV approach
- $\Delta s_{i,47-40}$ might be influenced by both permanent and temporary factors
- Only want part that is transitory shock caused by war damage
- Instrument $\Delta s_{i,47-40}$ by measures of death and destruction
16The First-Stage Correlation of $\Delta s_{i,47-40}$ with Z
17Why Do We Need First-Stage?
- Establishes instrument relevance: correlation of X and Z
- Gives an idea of how strong this correlation is: weak instrument problem
- In this case the reported first-stage is not obviously the one implicit in what follows
- That would be bad practice
18The IV Estimates
19Why Are these other variables included?
- Potential criticisms of instrument exogeneity:
  - Government post-war reconstruction expenses correlated with destruction and had an effect on population growth
  - US bombing heavier of cities of strategic importance (perhaps they had higher growth rates)
- Inclusion of the extra variables designed to head off these criticisms
- Assumption is that of exogeneity conditional on the inclusion of these variables
- Conclusion favours locational fundamentals view
20An additional piece of supporting evidence
- Always trying to build a strong evidence base: many potential ways to do this, not just estimating equations
21The Problem of Weak Instruments
- Say that instruments are weak if correlation between X and Z is low (after inclusion of other exogenous variables)
- Rule of thumb: if F-statistic on instruments in first-stage is less than 10 then there may be a problem (will explain this a bit later)
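The first-stage F-statistic in the rule of thumb can be computed by comparing the residual sum of squares with and without the instruments. The helper `first_stage_f` below is a hypothetical illustration written for this note (not a library function), and the simulated coefficients 0.5 and 0.02 are arbitrary assumptions.

```python
import numpy as np

def first_stage_f(x, Z, W=None):
    """F-statistic for H0: all coefficients on the instruments Z are zero
    in a regression of x on a constant, W (other exogenous variables) and Z."""
    n = len(x)
    const = np.ones((n, 1))
    W = const if W is None else np.column_stack([const, W])
    Z = np.asarray(Z).reshape(n, -1)
    full = np.column_stack([W, Z])

    def rss(A):
        # Residual sum of squares from an OLS regression of x on A.
        resid = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
        return resid @ resid

    q = Z.shape[1]            # number of instruments being tested
    df = n - full.shape[1]    # residual degrees of freedom
    return ((rss(W) - rss(full)) / q) / (rss(full) / df)

rng = np.random.default_rng(1)
n = 1000
z = rng.normal(size=n)
x_strong = 0.5 * z + rng.normal(size=n)    # strong first stage
x_weak = 0.02 * z + rng.normal(size=n)     # weak first stage

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print("strong:", f_strong, "weak:", f_weak)
```

Passing other exogenous regressors through `W` gives the F-statistic on the instruments after those variables are partialled out, which matches the definition of weakness given above.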
22Why Do Weak Instruments Matter?
- A whole range of problems tend to arise if instruments are weak
- Asymptotic problems:
  - High asymptotic variance
  - Small departures from instrument exogeneity lead to big inconsistencies
- Finite-sample problems:
  - Small-sample distribution may be very different from asymptotic one
  - May be large bias
  - Computed variance may be wrong
  - Distribution may be very different from normal
23Asymptotic Problems I: Low precision
- Asymptotic variance of IV estimator is larger the weaker the instruments
- Intuition: variance of any estimator tends to be lower the bigger the variation in X; think of $\sigma^2 (X'X)^{-1}$
- IV only uses variation in X that is associated with Z
- As instruments get weaker, using less and less variation in X
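The intuition can be made concrete in the simplest case. As a sketch, assume one regressor $x$, one instrument $z$, homoskedastic errors with variance $\sigma^2$, and sample size $n$; the symbols $\sigma_x^2$ (variance of $x$) and $\rho_{xz}$ (correlation of $x$ and $z$) are notation introduced here, not taken from the slides.

```latex
\operatorname{Var}\!\left(\hat\beta_{OLS}\right) \approx \frac{\sigma^2}{n\,\sigma_x^2},
\qquad
\operatorname{Var}\!\left(\hat\beta_{IV}\right) \approx \frac{\sigma^2}{n\,\sigma_x^2\,\rho_{xz}^2}
```

Since $\rho_{xz}^2 \le 1$, the IV variance is never smaller than the OLS variance in this case, and it grows without bound as $\rho_{xz} \to 0$, that is, as the instrument gets weaker.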
24Asymptotic Problems II: Small Departures from Instrument Exogeneity Lead to Big Inconsistencies
- Suppose true causal model is
- $y = X\beta + Z\gamma + e$
- So possibly direct effect of Z on y
- Instrument exogeneity is $\gamma = 0$
- Obviously want this to be zero but might hope that no big problem if close to zero: a small deviation from exogeneity
25But this will not be the case if instruments weak: consider just-identified case
- If instruments weak then $\Sigma_{ZX}$ small so $\Sigma_{ZX}^{-1}$ large so $\gamma$ multiplied by a large number
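The expression this slide refers to did not survive. A sketch of the standard derivation for the just-identified case, assuming the model $y = X\beta + Z\gamma + e$ with $\gamma$ the direct effect of $Z$ on $y$, and writing $\Sigma_{ZX} = \operatorname{plim} Z'X/n$ and $\Sigma_{ZZ} = \operatorname{plim} Z'Z/n$ (the latter symbol is an assumption), with $\operatorname{plim} Z'e/n = 0$:

```latex
\hat\beta_{IV} = (Z'X)^{-1} Z'y
             = \beta + (Z'X)^{-1} Z'Z\,\gamma + (Z'X)^{-1} Z'e
\quad\Longrightarrow\quad
\operatorname{plim}\hat\beta_{IV} = \beta + \Sigma_{ZX}^{-1}\,\Sigma_{ZZ}\,\gamma
```

The inconsistency is $\Sigma_{ZX}^{-1}\Sigma_{ZZ}\gamma$, so even a small $\gamma$ is scaled up when $\Sigma_{ZX}$ is close to zero.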
26An Example: The Return to Education
- Economists long interested in whether investment in human capital is a good investment
- Some theory shows that coefficient on s in regression
- $y = \beta_0 + \beta_1 s + \beta_2 x + e$
- is a measure of the rate of return to education (y is log earnings, s is years of schooling, x other controls)
- OLS estimates around 8%: suggests very good investment
- Might be liquidity constraints
- Might be bias
27Potential Sources of Bias
- Most commonly mentioned is ability bias:
  - Ability correlated with earnings independent of education
  - Ability correlated with education
- If ability omitted from x variables then usual formula for omitted variables bias suggests upward bias in OLS estimate
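The "usual formula" can be written out in a simplified form. As a sketch, ignore the other controls and suppose the true model includes ability $a$ with coefficient $\delta$ (both symbols are assumed notation): $y = \beta_0 + \beta_1 s + \delta a + e$. Omitting $a$ from the regression gives:

```latex
\operatorname{plim}\hat\beta_{1,OLS}
  = \beta_1 + \delta\,\frac{\operatorname{Cov}(s,a)}{\operatorname{Var}(s)}
```

If ability raises earnings ($\delta > 0$) and is positively correlated with education ($\operatorname{Cov}(s,a) > 0$), the second term is positive, which is the upward bias.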
28Potential Solution
- Find an instrument correlated with education but uncorrelated with ability (or other excluded variables)
- Angrist and Krueger, "Does Compulsory School Attendance Affect Schooling and Earnings?", QJE 1991, suggest using quarter of birth
- Argue correlated with education because of school start age policies and school leaving laws (instrument relevance)
- Don't have to accept this: can test it
29A graphical version of first-stage (correlation
between education and Z)
30In this case
- Their instrument is binary so IV estimator can be written in Wald form
- And this leads to following expression for potential inconsistency
- Note denominator is difference in schooling for those born in first- and other quarters
- Instrument will be weak if this difference is small
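The expressions on this slide did not survive. A sketch of what they presumably looked like, assuming a binary instrument $Z$ (1 for first-quarter births, 0 otherwise), schooling $s$, log earnings $y$, a direct effect $\gamma$ of $Z$ on $y$, and ignoring other controls (the bar notation for group means is an assumption):

```latex
\hat\beta_{Wald} = \frac{\bar y_{Z=1} - \bar y_{Z=0}}{\bar s_{Z=1} - \bar s_{Z=0}},
\qquad
\operatorname{plim}\hat\beta_{Wald}
  = \beta_1 + \frac{\gamma}{E[s \mid Z=1] - E[s \mid Z=0]}
```

The second expression follows because the difference in mean earnings equals $\beta_1$ times the difference in mean schooling plus $\gamma$. A small schooling gap in the denominator therefore inflates any non-zero $\gamma$.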
31Their Results
32Interpretation (and Potential Criticism)
- IV estimates not much below OLS estimates (higher in one case)
- Suggests ability bias no big deal
- But instrument is weak
- Being born in 1st quarter reduces education by 0.1 years
- Means $\gamma$ (any direct effect of quarter of birth on earnings) will be multiplied by 10
33But why should we have $\gamma \neq 0$?
- Remember this would imply a direct effect of quarter of birth on earnings, not just one that works through the effect on education
- Bound, Jaeger and Baker argued that there is evidence that quarter of birth is correlated with:
  - Mental and physical health
  - Socioeconomic status of parents
- Unlikely that any effects are large but don't have to be when instruments are weak
34An example: UK data
Effect is small but significantly different from zero
35A Back-of-the-Envelope Calculation
- Being born in first quarter means 0.01 less likely to have a managerial/professional parent
- Being a manager/professional raises log earnings by 0.64
- Correlation between earnings of children and parents 0.4
- Effect on earnings through this route: $0.01 \times 0.64 \times 0.4 = 0.00256$, i.e. about ¼ of 1 per cent
- Small, but weak instrument causes effect on inconsistency of IV estimate to be multiplied by 10: 0.0256
- Now large relative to OLS estimate of 0.08
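The arithmetic on this slide can be reproduced in a few lines. The numbers are the ones quoted on the slides (including the 0.1-year first-stage gap and the OLS estimate of 0.08); the variable names are made up for this sketch, and everything is treated in magnitudes.

```python
# Back-of-the-envelope numbers as quoted on the slides (magnitudes only).
share_gap = 0.01        # first-quarter births: 0.01 less likely to have a
                        # managerial/professional parent
parent_premium = 0.64   # log-earnings premium of manager/professional
intergen_corr = 0.4     # correlation between child and parent earnings
first_stage_gap = 0.1   # years of schooling lost by first-quarter births
ols_return = 0.08       # OLS estimate of the return to education

# Direct effect of quarter of birth on earnings via parental background.
direct_effect = share_gap * parent_premium * intergen_corr

# The Wald-form inconsistency divides the direct effect by the first-stage
# gap, i.e. multiplies it by 1 / 0.1 = 10.
inconsistency = direct_effect / first_stage_gap

# Size of the inconsistency relative to the OLS estimate.
ratio = inconsistency / ols_return

print(f"direct effect: {direct_effect:.5f}")
print(f"inconsistency: {inconsistency:.4f}")
print(f"relative to OLS estimate: {ratio:.2f}")
```

On these numbers the potential inconsistency is roughly a third of the OLS estimate, which is why a seemingly negligible direct effect matters.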
36Summary
- Small deviations from instrument exogeneity lead to big inconsistencies in IV estimate if instruments are weak
- Suspect this is often of great practical importance
- Quite common to use an odd instrument: argue that there is no reason to believe it is correlated with e but show correlation with X
37Finite Sample Problems
- This is a very complicated topic
- Exact results for special cases, approximations for more general cases
- Hard to say anything that is definitely true but can give useful guidance
- Problems in 3 areas:
  - Bias
  - Incorrect measurement of variance
  - Non-normal distribution
- But really all different symptoms of same thing
38Review and Reminder
- If ask STATA to estimate equation by IV:
  - Coefficients computed using formula given
  - Standard errors computed using formula for asymptotic variance
  - T-statistics, confidence intervals and p-values computed using assumption that estimator is unbiased with variance as computed and normally distributed
- All are asymptotic results
39Difference between asymptotic and finite-sample distributions
- This is the normal case
- Only in special cases, e.g. linear regression model with normally distributed errors, are small-sample and asymptotic distributions the same
- Difference likely to be bigger:
  - The smaller the sample size
  - The weaker the instruments
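A Monte Carlo sketch can illustrate how the finite-sample distribution of the IV estimator departs from its asymptotic approximation when the instrument is weak. Everything here is an assumption chosen for illustration: the true coefficient of 1, the sample size of 200, the first-stage coefficients 1.0 and 0.05, and the helper names `iv_draws` and `iqr`.

```python
import numpy as np

def iv_draws(pi, n=200, reps=2000, seed=0):
    """Simulate the just-identified IV estimator (no intercept, since all
    variables have mean zero by construction). True beta is 1."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        z = rng.normal(size=n)                 # instrument
        u = rng.normal(size=n)                 # confounder shared by x and y
        x = pi * z + u + rng.normal(size=n)    # first stage, strength pi
        y = 1.0 * x + u + rng.normal(size=n)   # structural equation
        est[r] = (z @ y) / (z @ x)             # IV estimate for this sample
    return est

def iqr(a):
    """Interquartile range, a spread measure robust to extreme draws."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25

strong = iv_draws(pi=1.0)
weak = iv_draws(pi=0.05)

print("strong: median", np.median(strong), "IQR", iqr(strong))
print("weak:   median", np.median(weak), "IQR", iqr(weak))
```

With the weak instrument the estimates are far more dispersed and their median is pulled away from the true value towards the OLS limit. The median and interquartile range are used rather than the mean and variance because the weak-instrument draws have very heavy tails.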
40Rule of Thumb for Weak Instruments
- F-test for instruments in first-stage > 10
- Stricter than significant, e.g. if one instrument, F = 10 is equivalent to t ≈ 3.2 (since F = t² with a single instrument)
41Conclusion
- Natural experiments useful source of knowledge
- Often requires use of IV
- Instrument exogeneity and relevance need justification
- Weak instruments potentially serious
- Good practice to present first-stage regression
- Finding more robust alternative to IV an active research area