Review of Probability and Statistics - PowerPoint PPT Presentation

Provided by: PatriciaM47

1
Instrumental Variables and Two Stage Least
Squares
  • $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
  • $x_1 = \pi_0 + \pi_1 z + \pi_2 x_2 + \dots + \pi_k x_k + v$

2
OUTLINE
  • When do we need Instrumental Variables?
  • What is an Instrumental Variable?
  • An Example
  • IV Estimation in the Simple RM
  • IV Estimator
  • Inference
  • Poor Instruments
  • Some Applications
  • IV Estimation in the Multiple RM
  • 2SLS Estimator
  • Inference
  • An Application: Institutions and Development
  • Addressing Errors-in-Variables with IV Estimation
  • Testing for Endogeneity
  • Testing for Overidentifying Restrictions
  • 2SLS with Heteroscedasticity
  • 2SLS with Serial Correlation

3
1. When do we need Instrumental Variables?
  • Instrumental Variables (IV) estimation is used
    when your model has endogenous x's
  • That is, whenever $\mathrm{Cov}(x,u) \neq 0$
  • Thus, IV can be used to address the problem of
    omitted variable bias
  • Additionally, IV can be used to solve the classic
    errors-in-variables problem

4
2. What is an Instrumental Variable?
  • In order for a variable, z, to serve as a valid
    instrument for x, the following three conditions
    must be true:
  • (1) The instrument must be exogenous
  • That is, $\mathrm{Cov}(z,u) = 0$
  • (2) The instrument must be correlated with the
    endogenous variable x
  • That is, $\mathrm{Cov}(z,x) \neq 0$
  • (3) The instrument must not be a regressor in
    the equation for y, nor be perfectly correlated
    with the regressors in that equation.

5
What is an Instrumental Variable? (cont.)
  • Conditions (2) and (3) are simple to verify. They
    can be tested using the data.
  • For condition (2): test $H_0: \pi_1 = 0$ in
    $x = \pi_0 + \pi_1 z + v$
  • For condition (3): look at the R-squared from
    the regression of z on all the regressors other
    than x.
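
Condition (2) can be checked with a t-test on the first-stage slope (the coefficient on z in the regression of x on z). A minimal sketch, assuming simulated data and homoskedastic errors; the function name `first_stage_t` and all numbers are illustrative, not from the slides:

```python
import numpy as np

def first_stage_t(x, z):
    """t-statistic for H0: pi1 = 0 in the regression x = pi0 + pi1*z + v."""
    n = len(x)
    Z = np.column_stack([np.ones(n), z])          # regressors: constant and instrument
    coef, *_ = np.linalg.lstsq(Z, x, rcond=None)  # OLS estimates (pi0, pi1)
    resid = x - Z @ coef
    sigma2 = resid @ resid / (n - 2)              # homoskedastic error variance
    cov = sigma2 * np.linalg.inv(Z.T @ Z)         # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

# Simulated data (illustrative only): z shifts x with pi1 = 0.5.
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
x = 1.0 + 0.5 * z + rng.normal(size=1000)
t_stat = first_stage_t(x, z)
print(round(t_stat, 2))
```

A large t-statistic rejects the null that z is unrelated to x. It says nothing about condition (1).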

6
What is an Instrumental Variable? (cont.)
  • Condition (1), $\mathrm{Cov}(z,u) = 0$, is the key one, and
    it cannot be tested
  • To justify that condition (1) holds we need a
    model that makes clear which variables are in
    the error term u.
  • We have to use economic theory and common sense
    to decide if it makes sense to assume $\mathrm{Cov}(z,u) = 0$

7
3. An Example
  • Problem: estimate the effect of treatment (T) on
    outcome (Y), i.e., estimate $\beta_1$ in
  • $Y_i = \beta_0 + \beta_1 T_i + u_i$
  • For simplicity, suppose:
  • Dichotomous treatment variable: $T = 1$ if treated, 0
    otherwise
  • Homogeneous treatment effect ($\beta_1$)
  • No other regressors.

8
3. An Example
  • For concreteness, suppose that we are interested in
    the effect of an investment subsidy on firms'
    capital investment.
  • Ti is the binary variable that indicates whether a
    firm has applied for and has been granted the
    subsidy.
  • Yi represents the firm's investment rate.

9
3. An Example
  • OLS estimation yields the estimator
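
The formula on this slide did not survive in the transcript. For reference, the textbook OLS slope in this simple regression, which with a binary T reduces to a difference in group means, can be written as follows (the notation $\hat{\beta}_1^{OLS}$, $\bar{Y}$, $\bar{T}$ is assumed here, not taken from the slide):

```latex
\hat{\beta}_1^{OLS}
  = \frac{\sum_{i=1}^{n} (T_i - \bar{T})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (T_i - \bar{T})^2}
  = \bar{Y}_{T=1} - \bar{Y}_{T=0}
```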

However, the key assumption for the consistency
of the OLS estimator (no correlation between T
and u) is unlikely to hold, because treatment is
related to omitted factors in u that also
influence the outcome.

10
Four Solutions to this Problem
  • Randomized Controlled Trial
  • Natural Experiments: find similar observations
    with different treatment for arbitrary reasons
    (e.g. regulatory rules, law changes).
    Difference-in-difference estimates
  • Control for Observable Differences
  • Attempt to condition on sufficient X's such that
    $E(Tu) = 0$
  • Then estimate directly by least squares
  • (1) $Y = \beta_0 + \beta_1 T + X\gamma + u$
  • Instrumental Variables (IV)
  • Suppose there exists an instrumental variable (Z)
    that is
  • (A1) correlated with treatment: $E(ZT) \neq 0$
  • (A2) uncorrelated with the residual: $E(Zu) = 0$

11
Example: Simple IV
  • For instance, in the investment subsidy example,
    suppose that only a randomly selected set of
    firms can apply for the subsidy.
  • Let Z be the dummy variable (0,1) that indicates
    whether a firm can apply to obtain the subsidy or
    not.
  • Because Z is purely random, it is not related to
    u.
  • However, Z should be correlated with T because to
    be granted the subsidy ($T = 1$) it is necessary to
    be eligible ($Z = 1$).

12
Example: Simple IV
  • Then, we have the following moment conditions:
  • $E(u_i) = 0$, which implies
    $E(Y_i - \beta_0 - \beta_1 T_i) = 0$
  • $E(Z_i u_i) = 0$, which implies
    $E[Z_i (Y_i - \beta_0 - \beta_1 T_i)] = 0$
  • Using the method of moments, we estimate $\beta_0$ and
    $\beta_1$ using the sample moment conditions
    associated with the previous population moment
    conditions.

13
Example: Simple IV
  • In this example, the IV estimator is
  • (difference in mean outcomes between Z = 1 and
    Z = 0) / (difference in treatment rates between
    Z = 1 and Z = 0)
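
The ratio above can be sketched in code. The example below simulates the subsidy setting under assumptions that are not in the slides (a true effect of 2, an unobserved "ability" term that drives both take-up and investment, and the hypothetical helper `wald_estimator`), and compares the IV ratio with OLS:

```python
import numpy as np

def wald_estimator(y, t, z):
    """IV estimate with a binary instrument z:
    (difference in mean outcomes) / (difference in treatment rates)."""
    y, t, z = map(np.asarray, (y, t, z))
    dy = y[z == 1].mean() - y[z == 0].mean()
    dt = t[z == 1].mean() - t[z == 0].mean()
    return dy / dt

# Simulated subsidy example (all numbers are made up for illustration).
rng = np.random.default_rng(1)
n = 20000
z = rng.integers(0, 2, size=n)                  # random eligibility
ability = rng.normal(size=n)                    # unobserved factor in u
# Only eligible firms can be treated; higher-ability firms are treated more often.
t = ((z == 1) & (ability + rng.normal(size=n) > 0)).astype(float)
y = 1.0 + 2.0 * t + ability + rng.normal(size=n)   # true effect is 2

iv = wald_estimator(y, t, z)
ols = np.cov(t, y)[0, 1] / np.var(t, ddof=1)           # OLS slope of y on t
cov_ratio = np.cov(z, y)[0, 1] / np.cov(z, t)[0, 1]    # Cov(z,y) / Cov(z,t)
print(round(iv, 3), round(ols, 3), round(cov_ratio, 3))
```

With a binary instrument the difference-in-means ratio and the covariance ratio coincide, while OLS is pushed away from the true effect by the omitted ability term.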

14
4. IV Estimation in the Simple RM
  • For $y = \beta_0 + \beta_1 x + u$, and given our assumptions:
  • $\mathrm{Cov}(z,y) = \beta_1 \mathrm{Cov}(z,x) + \mathrm{Cov}(z,u)$, so
  • $\beta_1 = \mathrm{Cov}(z,y) / \mathrm{Cov}(z,x)$
  • Therefore, given a random sample of (x, y, z), by
    the LLN a consistent estimator of $\beta_1$ is the IV
    estimator
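
The estimator itself did not survive in the transcript. The standard sample analogue of the ratio Cov(z,y)/Cov(z,x), written here in assumed notation, is:

```latex
\hat{\beta}_1^{IV}
  = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})}
```

When z = x this collapses to the OLS slope.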

15
Inference with IV Estimation
  • The homoskedasticity assumption in this case is
    $E(u^2 \mid z) = \sigma^2 = \mathrm{Var}(u)$
  • As in the OLS case, given the asymptotic
    variance, we can estimate the standard error

16
Comparison of IV and OLS standard errors
  • The standard error in the IV case differs from
    OLS only in the $R^2$ from regressing x on z
  • Since $R^2 < 1$, IV standard errors are larger
  • However, IV is consistent, while OLS is
    inconsistent, when $\mathrm{Cov}(x,u) \neq 0$
  • The stronger the correlation between z and x, the
    smaller the IV standard errors
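
The link between instrument strength and precision can be illustrated numerically. This is a sketch on simulated data, assuming homoskedastic errors and the standard-error formula sigma^2 / (SST_x * R^2 of x on z); `simple_iv` and the "strength" values are illustrative choices, not from the slides:

```python
import numpy as np

def simple_iv(y, x, z):
    """IV slope and its homoskedastic standard error in y = b0 + b1*x + u."""
    n = len(y)
    b1 = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x                    # residuals use the actual x
    sigma2 = resid @ resid / (n - 2)
    sst_x = ((x - x.mean()) ** 2).sum()
    r2_xz = np.corrcoef(x, z)[0, 1] ** 2       # R-squared from regressing x on z
    return b1, np.sqrt(sigma2 / (sst_x * r2_xz))

rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)

results = {}
for label, strength in [("strong", 1.0), ("weak", 0.2)]:
    x = strength * z + 0.5 * u + rng.normal(size=n)   # endogenous: depends on u
    y = 1.0 + 2.0 * x + u                             # true slope is 2
    results[label] = simple_iv(y, x, z)

for label, (b1, se) in results.items():
    print(label, round(b1, 3), round(se, 3))
```

The weaker instrument produces a visibly larger standard error for the same sample size.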

17
Poor Instruments
  • We have a poor instrument when z and x are weakly
    correlated.
  • The problem of weak instruments is not just that
    the variance of the IV estimator is much larger
    than the variance of the OLS.
  • A more serious problem is that the IV estimator
    can have a large asymptotic bias even if z and u
    are only moderately correlated.

18
Poor Instruments (cont.)
  • We can compare the asymptotic bias in OLS and IV
  • Prefer IV if $\mathrm{Corr}(z,u)/\mathrm{Corr}(z,x) < \mathrm{Corr}(x,u)$,
    that is, if
  • $\mathrm{Corr}(z,x) > \mathrm{Corr}(z,u)/\mathrm{Corr}(x,u)$
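
The comparison rests on the probability limits of the two estimators. In the simple regression model the standard expressions, in assumed notation with $\sigma_u$ and $\sigma_x$ the standard deviations of u and x, are:

```latex
\begin{aligned}
\operatorname{plim} \hat{\beta}_1^{IV}  &= \beta_1 + \frac{\operatorname{Corr}(z,u)}{\operatorname{Corr}(z,x)} \cdot \frac{\sigma_u}{\sigma_x} \\
\operatorname{plim} \hat{\beta}_1^{OLS} &= \beta_1 + \operatorname{Corr}(x,u) \cdot \frac{\sigma_u}{\sigma_x}
\end{aligned}
```

A small Corr(z,x) in the denominator inflates even a small Corr(z,u), which is why weak instruments can leave IV more biased than OLS.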

19
Some Applications of IV: Estimating treatment
effects in AMI
  • McClellan, M., B. McNeil and J. Newhouse, JAMA,
    1994.
  • "Does More Intensive Treatment of Acute
    Myocardial Infarction Reduce Mortality?"
  • Medicare claims data, elderly with heart
    attack (AMI), 1987-91
  • Treatment: Cardiac Catheterization (marker for
    aggressive care)
  • Outcome: Survival to 1 day, 30 days, 90 days,
    etc.
  • Instrument: Is the nearest hospital a
    catheterization hospital?
  • Differential Distance:
  • (distance to nearest cath) - (distance to
    nearest non-cath)
  • based on zip code of residence and zip code of
    hospital

20
Poor Instruments (cont.)
  • Suppose that $\mathrm{Corr}(z,x) = 0.10$ (which in fact is
    larger than in many applications).
  • Then, the IV estimator has a smaller bias than
    the OLS estimator only if $\mathrm{Corr}(z,u)$ is at least
    10 times smaller than $\mathrm{Corr}(x,u)$.
  • Suppose instead that $\mathrm{Corr}(x,u) = 0.10$ and that
    $\mathrm{Corr}(z,u) = 0.01$.
  • Then, the IV estimator has a smaller bias than
    the OLS estimator only if $\mathrm{Corr}(z,x) > 0.10$.
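
The rule of thumb above is simple arithmetic. A small sketch, using absolute values of the correlations and a hypothetical helper `iv_preferred`:

```python
def iv_preferred(corr_zx, corr_zu, corr_xu):
    """True when the asymptotic bias of IV is smaller than that of OLS,
    i.e. |Corr(z,u) / Corr(z,x)| < |Corr(x,u)|."""
    return abs(corr_zu / corr_zx) < abs(corr_xu)

# Corr(x,u) = 0.10 and Corr(z,u) = 0.01, as in the slide's second example.
stronger = iv_preferred(corr_zx=0.20, corr_zu=0.01, corr_xu=0.10)
weaker = iv_preferred(corr_zx=0.05, corr_zu=0.01, corr_xu=0.10)
print(stronger, weaker)
```

An instrument with Corr(z,x) above the 0.10 threshold favours IV; one below it favours OLS.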

21
Is Differential Distance a Good Instrument?
  • 1. Correlated with treatment (Cath)? Yes.
  • 26.2% get Cath if nearest hospital is a Cath
    hospital
  • 19.5% get Cath if nearest hospital is not a Cath
    hospital
  • 2. Uncorrelated with unobserved patient severity?
    Never sure! But unrelated to observable patient
    severity in claims

22
Major Findings of McClellan et al.
  • 1. Least squares dramatically overstates treatment
    effect, because Cath is associated with fewer risk
    factors.
  • 1-year mortality is 30 points lower (17% vs. 47%)
    if Cath
  • OLS estimate is 24 points, adjusting for
    observable risk factors
  • 2. IV estimates suggest Cath is associated with a
    5-10 percentage point reduction in mortality,
    nearly all in 1st day.

23
Validating McClellan et al.
  • Recent work replicates and validates earlier work
    using
  • 1. more comprehensive control variables
  • 2. alternative instruments
  • McClellan and Noguchi, 1998 (Tables 1 & 2 below)
  • Geppert, McClellan and Staiger, 2001 (Table 4
    below)
  • -- Data from Cooperative Cardiovascular Project
    (CCP)
  • Chart data for approx. 180,000 AMI patients from
    1994-95
  • Linked Medicare claims data
  • -- Treatments and outcomes of AMI in elderly as
    in earlier work
  • -- Instruments
  • (1) Differential distance
  • (2) Variation in hospital Cath rate (>4000
    dummies)

24
Key Validation Questions
  • Are severity measures unobserved in claims data
    uncorrelated with instrument (differential
    distance)?
  • Are OLS results closer to IV with more extensive
    controls?
  • Are IV results robust to more extensive controls?
  • Are IV results robust to alternative instruments?

25
(No Transcript)
26
(No Transcript)
27
(No Transcript)
28
Conclusions of Validation
  • Measured individual covariates can be used to
    assess bias of alternative methods for estimating
    treatment effects with observational data.
  • Methods that attempt to adjust for observable
    differences are quite sensitive to the use of
    more detailed chart data, and yield biased
    estimates of treatment effects in commonly
    available datasets.
  • IV methods for evaluating AMI treatment are not
    sensitive to the use of more detailed chart data,
    and appear to have minimal bias.

29
Growing number of applications of IV using a
variety of instruments
  • Geography as an instrument
    (distance, rivers, small area variation)
  • Legal/political institutions as an instrument
    (laws, election dynamics)
  • Administrative rules as an instrument
    (wage/staffing rules, reimbursement rules,
    eligibility rules)
  • Naturally occurring randomization
    (draft, birth timing, lottery, roommate
    assignment, weather)

30
Example: Angrist & Krueger
  • Data: US Census 5% PUMS
  • Sample: 329,509 men born 1930-39
  • $\ln(\text{earnings}) = \text{Education}\,\beta + X\gamma + \varepsilon$
  • Instruments: (Quarter of Birth) × (Year of
    birth), (Quarter of Birth) × (State of birth)
  • Number of instruments: 178
  • First-stage F: 1.869 (p-value 0.000)

31
Example: Angrist & Krueger (cont.)

  Estimator                        Estimate of $\beta$   95% Confidence interval
  OLS                              0.063                 (0.062, 0.063)
  2SLS                             0.081                 (0.060, 0.102)
  2SLS with random instruments     0.060                 (0.031, 0.089)
  LIML                             0.098                 (0.068, 0.128)
  Valid confidence interval
  (Anderson-Rubin)                                       (-0.015, 0.240)

32
Example: Geppert, McClellan & Staiger
  • Use between-hospital variation in treatment
    intensity (e.g. cath rate) as instrument to
    estimate treatment effects
  • Equivalent to using >4000 hospital dummies as
    instruments
  • But instruments are weak: 1st Stage F-statistic
    is 10-25
  • 2SLS estimates have small bias (1/F) towards OLS
  • 2SLS SEs are too small (many instruments,
    modest F)
  • LIML SEs should be okay
  • Using hierarchical structure, we develop an
    alternative GMM estimation procedure to correct
    estimates and SEs (asymptotically equivalent to
    LIML, but simpler)
  • Cath effects similar to McClellan et al., but
    more precisely estimated

33
4. IV Estimation in the Multiple RM
  • IV estimation can be extended to the multiple
    regression case.
  • Call the model we are interested in estimating
    the structural model.
  • Our problem is that one or more of the variables
    are endogenous.
  • We need an instrument for each endogenous variable

34
IV Estimation in the Multiple RM (cont.)
  • Write the structural model as
  • $y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + u_1$
  • where $y_2$ is endogenous and $z_1$ is exogenous.
  • Let $z_2$ be the instrument, so $\mathrm{Cov}(z_2,u_1) = 0$ and
  • $y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + v_2$
  • where $\pi_2 \neq 0$
  • This reduced form equation regresses the
    endogenous variable on all exogenous ones

35
Two Stage Least Squares (2SLS)
  • It's possible to have multiple instruments
  • Consider our original structural model, and let
  • $y_2 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3 + v_2$
  • Here we're assuming that both $z_2$ and $z_3$ are valid
    instruments: they do not appear in the
    structural model and are uncorrelated with the
    structural error term, $u_1$

36
2SLS: Best Instrument
  • Could use either $z_2$ or $z_3$ as an instrument
  • The best instrument is a linear combination of
    all of the exogenous variables,
    $y_2^* = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 z_3$
  • We can estimate $y_2^*$ by regressing $y_2$ on $z_1$, $z_2$
    and $z_3$; call this the first stage
  • If we then substitute $\hat{y}_2$ for $y_2$ in the structural
    model, we get the same coefficient as IV
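
The two stages can be sketched as follows. This is an illustrative simulation, not the presenter's code: the coefficients, the sample size, and the helper `ols` are assumptions. It also checks that regressing on the fitted value gives the same coefficients as IV with the fitted value as the instrument:

```python
import numpy as np

def ols(X, y):
    """OLS coefficients from regressing y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data; every coefficient below is an assumption for illustration.
rng = np.random.default_rng(3)
n = 5000
z1, z2, z3 = rng.normal(size=(3, n))
v2 = rng.normal(size=n)
u1 = 0.8 * v2 + rng.normal(size=n)            # u1 correlated with v2: y2 is endogenous
y2 = 0.5 + 0.5 * z1 + z2 + z3 + v2            # reduced form
y1 = 1.0 + 2.0 * y2 - 1.0 * z1 + u1           # structural model, coefficient on y2 is 2

const = np.ones(n)
Z = np.column_stack([const, z1, z2, z3])      # all exogenous variables
X = np.column_stack([const, y2, z1])          # structural regressors

# First stage: fitted values of y2 from the exogenous variables.
y2_hat = Z @ ols(Z, y2)

# Second stage: replace y2 by its fitted value.
X_hat = np.column_stack([const, y2_hat, z1])
b_2sls = ols(X_hat, y1)

# IV using y2_hat as the single instrument for y2 gives the same coefficients.
b_iv = np.linalg.solve(X_hat.T @ X, X_hat.T @ y1)

b_ols = ols(X, y1)                            # inconsistent in this design
print(b_2sls.round(3), b_iv.round(3), b_ols.round(3))
```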

37
More on 2SLS
  • While the coefficients are the same, the standard
    errors from doing 2SLS by hand are incorrect, so
    let Stata do it for you.
  • The method extends to multiple endogenous
    variables: we need to be sure that we have at
    least as many excluded exogenous variables
    (instruments) as there are endogenous variables
    in the structural equation
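
Why the by-hand standard errors are wrong can be seen from the residuals. A by-hand second stage computes residuals with the fitted value of y2, whereas the correct 2SLS residuals use the actual y2. A sketch on simulated data (design and numbers are assumptions, homoskedastic formula only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
z1, z2 = rng.normal(size=(2, n))
v2 = rng.normal(size=n)
u1 = 0.8 * v2 + rng.normal(size=n)            # Var(u1) = 1.64 by construction
y2 = 0.5 + 0.5 * z1 + z2 + v2
y1 = 1.0 + 2.0 * y2 - 1.0 * z1 + u1

const = np.ones(n)
Z = np.column_stack([const, z1, z2])
X = np.column_stack([const, y2, z1])

y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
X_hat = np.column_stack([const, y2_hat, z1])
b = np.linalg.lstsq(X_hat, y1, rcond=None)[0]      # 2SLS coefficients

k = X.shape[1]
resid_correct = y1 - X @ b        # uses the actual y2
resid_naive = y1 - X_hat @ b      # what a by-hand second stage reports
sigma2_correct = resid_correct @ resid_correct / (n - k)
sigma2_naive = resid_naive @ resid_naive / (n - k)

cov_unscaled = np.linalg.inv(X_hat.T @ X_hat)
se_correct = np.sqrt(sigma2_correct * np.diag(cov_unscaled))
se_naive = np.sqrt(sigma2_naive * np.diag(cov_unscaled))
print(se_correct.round(4), se_naive.round(4))
```

Only the error-variance estimate differs between the two sets of standard errors. Packaged 2SLS routines apply the correct one automatically.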

38
Institutions as the fundamental cause of
long-run growth
  • D. Acemoglu, S. Johnson, J. Robinson (2004) with
    some additions
  • Theoretical Framework
  • Economic Institutions and Income differences
  • Natural Experiments
  • The Colonial Origins of Comparative Development:
    An Empirical Investigation (2001)
  • Why do Institutions differ?
  • Sources of inefficiencies
  • Political implications
  • Summary

39
Theoretical Framework
40
Economic Institutions and Income Differences
  • Economic institutions (vs. geography and culture)
    as fundamental cause of different patterns of
    economic growth
  • Good economic institutions
  • (to simplify and focus the discussion)
    institutions that provide security of property
    rights and relatively equal access to economic
    resources to a broad cross-section of society

41
Economic Institutions and Income Differences
Average Protection Against Risk of Expropriation
1985-95 and log GDP per capita 1995
42
Economic Institutions and Income Differences
  • Secure property rights cause prosperity?
  • Problems with making such an inference!
  • It could be reverse causation!
  • It could be a problem of omitted variable bias
  • What can we do?
  • look for a natural experiment
  • find a source of variation in economic
    institutions that should have no effect on
    economic outcomes

43
Natural Experiment: The Korean Experiment
  • At the time of separation
  • approximately the same GDP per capita
  • Few geographic and cultural distinctions
  • North: followed the model of Soviet socialism and
    the Chinese Revolution in abolishing private
    property. Economic decisions not mediated by the
    market
  • South: system of private property, markets and
    private incentives to develop the economy
GDP per capita in North and South Korea 1950-98
44
Natural Experiment: The Korean Experiment
  • The only possible explanation for the radically
    different economic experience
  • their very different INSTITUTIONS
  • Necessity to look at a larger scale natural
    experiment in institutional divergence!!!

45
Natural Experiment: The Colonial Experiment
  • Europeans imposed different sets of institutions
    in different parts of the globe
  • The Reversal of Fortune
  • The nation states that coincide today with the
    boundaries of prosperous empires (Incas, Aztecs)
    in 1500 are among the poorer societies today!
  • The less developed civilisations in North America
    and Australia are now much richer than those in
    the lands of the Incas and Aztecs

log GDP per capita in 1995 and log Population
Density in 1500
46
The Colonial Experiment
  • Institutions hypothesis of the Reversal of Fortune
  • Densely settled, relatively developed places:
    worse institutions
  • Sparsely settled areas: better institutions
  • Why?
  • Europeans introduced or maintained a
    resource-extraction economy in densely settled
    areas (where they could exploit the population)
  • They protected their own rights in sparsely
    settled areas, where the Europeans were the
    majority

47
The Colonial Origins of Comparative Development:
An Empirical Investigation
  • An unfavourable disease environment reduced the
    attractiveness of European settlement
  • Settler mortality is used as an exogenous source
    of variation in the subsequent path of
    institutional development, to pin down the causal
    effect of economic institutions on prosperity
  • This variable has no impact on current income
    levels except through the economic institutions
    established during the colonial period
  • Measure: mortality rate faced by Europeans
    (primarily soldiers, sailors and bishops)

48
The Colonial Origins of Comparative Development:
An Empirical Investigation
  • Hypothesis
  • (potential) settler mortality → settlements
    → early institutions → current institutions
    → current performance
  • Empirical Results
  • Institutions cause growth!!!

49
5. Addressing Errors-in-Variables with IV
  • Remember the classical errors-in-variables
    problem, where we observe $x_1$ instead of $x_1^*$
  • Here $x_1 = x_1^* + e_1$, and $e_1$ is uncorrelated with
    $x_1^*$ and $x_2$
  • If there is a z such that $\mathrm{Corr}(z,u) = 0$ and
    $\mathrm{Corr}(z,x_1) \neq 0$, then IV will remove the
    measurement error bias
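
A common choice of z is a second, independently mismeasured report of the same variable. That choice is an assumption of this sketch, not something the slide specifies. The simulation below drops x2 for simplicity and shows OLS attenuated toward zero while IV recovers the slope:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
x_star = rng.normal(size=n)                  # true regressor, unobserved
x1 = x_star + rng.normal(size=n)             # observed with measurement error e1
z = x_star + rng.normal(size=n)              # second noisy measure, independent error
y = 1.0 + 2.0 * x_star + rng.normal(size=n)  # true slope is 2

b_ols = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)   # attenuated toward zero
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x1)[0, 1]    # uses z as the instrument for x1
print(round(b_ols, 3), round(b_iv, 3))
```

With equal variances for the true regressor and the measurement error, the OLS slope converges to half the true value in this design.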

50
6. Testing for Endogeneity
  • Since OLS is preferred to IV if we do not have an
    endogeneity problem, we'd like to be able to
    test for endogeneity
  • If we do not have endogeneity, both OLS and IV
    are consistent
  • Idea of Hausman test is to see if the estimates
    from OLS and IV are different.

51
Testing for Endogeneity (cont)
  • While it's a good idea to see if IV and OLS have
    different implications, it's easier to use a
    regression test for endogeneity
  • If $y_2$ is endogenous, then $v_2$ (from the reduced
    form equation) and $u_1$ from the structural model
    will be correlated
  • The test is based on this observation

52
Testing for Endogeneity (cont)
  • Save the residuals from the first stage
  • Include the residual in the structural equation
    (which of course has y2 in it)
  • If the coefficient on the residual is
    statistically different from zero, reject the
    null of exogeneity
  • If multiple endogenous variables, jointly test
    the residuals from each first stage

53
7. Testing Overidentifying Restrictions
  • If there is just one instrument for our
    endogenous variable, we can't test whether the
    instrument is uncorrelated with the error
  • We say the model is just identified
  • If we have multiple instruments, it is possible
    to test the overidentifying restrictions to see
    if some of the instruments are correlated with
    the error

54
Testing Overidentifying Restrictions
  • Estimate the structural model using IV and obtain
    the residuals
  • Regress the residuals on all the exogenous
    variables and obtain the $R^2$ to form $nR^2$
  • Under the null that all instruments are
    uncorrelated with the error, $LM \sim \chi^2_q$, where q is
    the number of extra instruments
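
A sketch of this procedure on simulated data, with two instruments for one endogenous variable so that q = 1. The design, the helper `overid_lm`, and the deliberately invalid instrument are assumptions for illustration; the statistic is compared with the 5% chi-square(1) critical value:

```python
import numpy as np

def overid_lm(y1, y2, z1, instruments):
    """n*R^2 from regressing 2SLS residuals on all exogenous variables."""
    n = len(y1)
    const = np.ones(n)
    Z = np.column_stack([const, z1, instruments])
    X = np.column_stack([const, y2, z1])
    y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
    X_hat = np.column_stack([const, y2_hat, z1])
    b = np.linalg.lstsq(X_hat, y1, rcond=None)[0]
    u_hat = y1 - X @ b                                   # 2SLS residuals
    fitted = Z @ np.linalg.lstsq(Z, u_hat, rcond=None)[0]
    r2 = 1.0 - ((u_hat - fitted) ** 2).sum() / ((u_hat - u_hat.mean()) ** 2).sum()
    return n * r2

rng = np.random.default_rng(7)
n = 5000
z1, z2, z3 = rng.normal(size=(3, n))
v2 = rng.normal(size=n)
y2 = 0.5 + 0.5 * z1 + z2 + z3 + v2
u_valid = 0.8 * v2 + rng.normal(size=n)
u_invalid = u_valid + 0.5 * z3                 # z3 now violates Cov(z3, u1) = 0

instruments = np.column_stack([z2, z3])        # two instruments, one endogenous: q = 1
lm_valid = overid_lm(1 + 2 * y2 - z1 + u_valid, y2, z1, instruments)
lm_invalid = overid_lm(1 + 2 * y2 - z1 + u_invalid, y2, z1, instruments)
CHI2_1_CRIT_5PCT = 3.841                       # 5% critical value of chi-square(1)
print(round(lm_valid, 2), round(lm_invalid, 2))
```

A statistic above the critical value rejects the null that all instruments are valid, but it does not say which instrument is the problem.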

55
8. Testing for Heteroskedasticity
  • When using 2SLS, we need a slight adjustment to
    the Breusch-Pagan test
  • Get the residuals from the IV estimation
  • Regress these residuals squared on all of the
    exogenous variables in the model (including the
    instruments)
  • Test for the joint significance

56
9. Testing for Serial Correlation
  • When using 2SLS, we need a slight adjustment to
    the test for serial correlation
  • Get the residuals from the IV estimation
  • Re-estimate the structural model by 2SLS,
    including the lagged residuals, and using the
    same instruments as originally
  • Can do 2SLS on a quasi-differenced model, using
    quasi-differenced instruments