Title: Estimating Causal Effects with Experimental Data
1. Estimating Causal Effects with Experimental Data
2. Some Basic Terminology
- Start with example where X is binary (though simple to generalize)
- $X = 0$ is control group
- $X = 1$ is treatment group
- Causal effect sometimes called treatment effect
- Randomization implies everyone has same probability of treatment
3. Why is Randomization Good?
- If X allocated at random then know that X is independent of all pre-treatment variables in the whole wide world: an amazing claim but true
- Implies there cannot be a problem of omitted variables, reverse causality etc.
- On average, only reason for difference between treatment and control group is different receipt of treatment
4. Why is this useful? An Example: Racial Discrimination
- Black men earn less than white men in US
  LOGWAGE          Coef.   Std. Err.        t
  -------------------------------------------
  BLACK        -.1673813    .0066708   -25.09
  NO_HS        -.2138331    .0077192   -27.70
  SOMECOLL      .1104148    .0049139    22.47
  COLLEGE       .4660205    .0048839    95.42
  AGE           .0704488    .0008552    82.38
  AGESQUARED   -.0007227    .0000101   -71.41
  _cons         1.088116    .0172715    63.00
- Could be discrimination or other factors unobserved by the researcher but observed by the employer?
- Hard to fully resolve with non-experimental data
5. An Experimental Design
- Bertrand/Mullainathan, Are Emily and Greg More Employable Than Lakisha and Jamal?, American Economic Review, 2004
- Create fake CVs and send them in reply to job adverts
- Allocate names at random to CVs: some given black-sounding names, others white-sounding
6. An Experimental Design (continued)
- Outcome variable is call-back rates
- Interpretation: not direct measure of racial discrimination, just effect of having a black-sounding name, which may have other connotations
- But name uncorrelated by construction with other material on CV
7. The Treatment Effect
8. Estimating Treatment Effects: the Statistics Course Approach
- Take mean of outcome variable in treatment group
- Take mean of outcome variable in control group
- Take difference between the two
- No problems but
- Does not generalize to where X is not binary
- Does not directly compute standard errors
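The three steps above can be sketched in a few lines of Python. This is only an illustration: the function name `difference_in_means` and the call-back data are made up, not taken from any study.

```python
from statistics import mean

def difference_in_means(outcomes, treated):
    """Mean outcome in the treatment group minus mean outcome in the control group."""
    treat = [y for y, x in zip(outcomes, treated) if x == 1]
    control = [y for y, x in zip(outcomes, treated) if x == 0]
    return mean(treat) - mean(control)

# Hypothetical call-back indicators (1 = called back), purely for illustration.
y = [1, 0, 0, 1, 0, 0, 0, 1]
x = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = treatment group, 0 = control group
effect = difference_in_means(y, x)
```

As the slide notes, this gives the point estimate only; it says nothing yet about the standard error.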
9. Estimating Treatment Effects: A Regression Approach
- Run regression
- $y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- Proposition 2.2: The OLS estimator of $\beta_1$ is an unbiased estimator of the causal effect of X on y
- Proof: Many ways to prove this but simplest way is perhaps
  - Proposition 1.1 says OLS estimates $E(y \mid X)$
  - $E(y \mid X=0) = \beta_0$, so OLS estimate of intercept is consistent estimate of $E(y \mid X=0)$
  - $E(y \mid X=1) = \beta_0 + \beta_1$, so OLS estimate of $\beta_1$ is consistent estimate of $E(y \mid X=1) - E(y \mid X=0)$
- Hence can read off estimate of treatment effect from coefficient on X
- Approach easily generalizes to where X is not binary
- Also gives estimate of standard error
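A small numerical check of the argument above: with a binary X, the OLS intercept equals the control-group mean and the OLS slope equals the difference in means. The helper `ols_simple` and the data are hypothetical, written from the textbook formula slope = Cov(X, y) / Var(X).

```python
from statistics import mean

def ols_simple(y, x):
    """Return (intercept, slope) from an OLS regression of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up outcomes: first four units treated, last four in control.
y = [3.0, 5.0, 4.0, 6.0, 1.0, 2.0, 3.0, 2.0]
x = [1, 1, 1, 1, 0, 0, 0, 0]

b0, b1 = ols_simple(y, x)
mean_treat = mean([yi for yi, xi in zip(y, x) if xi == 1])
mean_control = mean([yi for yi, xi in zip(y, x) if xi == 0])
# b0 matches mean_control and b1 matches mean_treat - mean_control.
```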
10. Computing Standard Errors
- Unless told otherwise regression package will compute standard errors assuming errors are homoskedastic, i.e. $Var(\varepsilon_i \mid X_i) = \sigma^2$
- Even if only interested in effect of treatment on mean, X may affect other aspects of distribution e.g. variance
- This will cause heteroskedasticity
- Heteroskedasticity does not make OLS regression coefficients inconsistent but does make OLS standard errors inconsistent
11. Robust Standard Errors
- Also called
  - Huber standard errors
  - White standard errors
  - Heteroskedasticity-consistent standard errors
- Simple to use in practice e.g. in STATA
  - . reg y x, robust
- Statistics course approach
  - Get variance of estimate of mean of treatment and control group
  - Sum to give estimate of variance of difference in means
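The link between the two approaches can be checked numerically. The sketch below assumes the simplest (HC0) form of the White variance for the slope, with no small-sample correction; regression packages typically rescale it, so their output will differ slightly. The function name and data are hypothetical.

```python
from statistics import mean

def slope_variances(y, x):
    """Return (HC0 robust variance of the OLS slope, sum of group-mean variances)."""
    treat = [yi for yi, xi in zip(y, x) if xi == 1]
    control = [yi for yi, xi in zip(y, x) if xi == 0]
    n1, n0 = len(treat), len(control)
    m1, m0 = mean(treat), mean(control)
    # With a binary regressor, OLS residuals are deviations from the group means.
    ssr1 = sum((yi - m1) ** 2 for yi in treat)
    ssr0 = sum((yi - m0) ** 2 for yi in control)
    # HC0: sum_i (x_i - xbar)^2 e_i^2 / (sum_i (x_i - xbar)^2)^2
    x_bar = n1 / (n1 + n0)
    sxx = n1 * (1 - x_bar) ** 2 + n0 * x_bar ** 2
    meat = (1 - x_bar) ** 2 * ssr1 + x_bar ** 2 * ssr0
    hc0 = meat / sxx ** 2
    # Statistics course approach, using divisor n (not n - 1) for each group variance.
    two_sample = ssr1 / n1 ** 2 + ssr0 / n0 ** 2
    return hc0, two_sample

y = [3.0, 5.0, 4.0, 6.0, 1.0, 2.0, 3.0, 2.0]
x = [1, 1, 1, 1, 0, 0, 0, 0]
hc0, two_sample = slope_variances(y, x)
```

Under these assumptions the two numbers coincide, which is why the robust option reproduces the statistics course calculation up to a degrees-of-freedom adjustment.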
12. Bertrand/Mullainathan: Basic Results
13. Summary So Far
- Econometrics very easy if all data comes from randomized controlled experiment
- Just need to collect data on treatment/control and outcome variables
- Just need to compare means of outcomes of treatment and control groups
- Is data on other variables of any use at all?
- Not necessary but useful
14. Including Other Regressors
- Can get consistent estimate of treatment effect without worrying about other variables
- Reason is that randomization ensures no problem of omitted variables bias
- But there are reasons to include other regressors
  - Improved efficiency
  - Check for randomization
  - Improve randomization
  - Control for conditional randomization
  - Heterogeneity in treatment effects
15. The Uses of Other Regressors I: Improved Efficiency
- Don't just want consistent estimate of causal effect, also want low standard error (or high precision or efficiency)
- Standard formula for asymptotic variance of OLS estimate of $\beta$ is $\sigma^2 / Var(X)$ (divide by sample size and take square root for standard error)
- $\sigma^2$ comes from variance of residual in regression: $(1 - R^2)\,Var(y)$
- Include more variables and $R^2$ rises: formal proof (Proposition 2.4) a bit more complicated but this is basic idea
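A back-of-the-envelope version of this argument, assuming the approximation SE = sqrt((1 - R^2) Var(y) / (N Var(X))) and assuming that the added regressors are independent of X (as randomization implies), so only the R^2 term changes. All numbers and the function name are hypothetical.

```python
from math import sqrt

def approx_slope_se(r_squared, var_y, var_x, n):
    """Approximate standard error of the OLS coefficient on X."""
    sigma2 = (1.0 - r_squared) * var_y  # residual variance
    return sqrt(sigma2 / (n * var_x))

# Hypothetical experiment: half the sample treated, so Var(X) = 0.25.
se_without_w = approx_slope_se(0.10, var_y=1.0, var_x=0.25, n=1000)
se_with_w = approx_slope_se(0.50, var_y=1.0, var_x=0.25, n=1000)
# Raising R^2 from 0.10 to 0.50 shrinks the approximate standard error.
```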
16. The Uses of Other Regressors II: Check for Randomization
- Randomization can go wrong
  - Poor implementation of research design
  - Bad luck
- If randomization done well then W should be independent of X: this is testable
  - Test for differences in W in treatment/control groups
  - Probit model for X on W
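One way to run the first of these checks is a two-sample t-statistic for the difference in mean W between groups. This is a minimal sketch with made-up data; `balance_t_stat` is a hypothetical helper, and it uses the unequal-variance form of the standard error.

```python
from math import sqrt
from statistics import mean, variance

def balance_t_stat(w, treated):
    """t-statistic for the difference in mean W between treatment and control."""
    w1 = [wi for wi, x in zip(w, treated) if x == 1]
    w0 = [wi for wi, x in zip(w, treated) if x == 0]
    diff = mean(w1) - mean(w0)
    se = sqrt(variance(w1) / len(w1) + variance(w0) / len(w0))
    return diff / se

# Hypothetical pre-treatment characteristic (e.g. age) and treatment indicator.
w = [30, 41, 35, 28, 33, 39, 31, 36]
x = [1, 0, 1, 0, 1, 0, 1, 0]
t_stat = balance_t_stat(w, x)
# A small |t_stat| gives no evidence against successful randomization.
```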
17. The Uses of Other Regressors III: Improve Randomization
- Can also use W at stage of assigning treatment
- Can guarantee that in your sample X and W are independent instead of it being just probabilistic
- This is what Bertrand/Mullainathan do when assigning names to CVs
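A generic way to build this guarantee into the design is to randomize within groups defined by W (blocking). The sketch below is an illustration of that idea, not a description of the exact procedure used by Bertrand/Mullainathan; the function name, seed and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_assignment(w, seed=0):
    """Treat exactly half (rounded down) of the units within each value of W."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, wi in enumerate(w):
        by_stratum[wi].append(i)
    x = [0] * len(w)
    for indices in by_stratum.values():
        rng.shuffle(indices)  # random order within the stratum
        for i in indices[: len(indices) // 2]:
            x[i] = 1
    return x

# Hypothetical CV quality labels.
w = ["high", "low", "high", "low", "high", "low", "high", "low"]
x = stratified_assignment(w)
treated_high = sum(xi for xi, wi in zip(x, w) if wi == "high")
treated_low = sum(xi for xi, wi in zip(x, w) if wi == "low")
```

Who is treated within each stratum is still random, but the treated share is the same in every stratum by construction.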
18. The Uses of Other Regressors IV: Adjust for Conditional Randomization
- This is case where must include W to get consistent estimates of treatment effects
- Conditional randomization is where probability of treatment is different for people with different values of W, but random conditional on W
- Why have conditional randomization?
  - May have no choice
  - May want to do it (c.f. stratification)
19. An Example: Project STAR
- Allocation of students to classes is random within schools
- But small number of classes per school
- This leads to following relationship between probability of treatment and number of kids in school
20. Controlling for Conditional Randomization
- X can now be correlated with W
- But, conditional on W, X independent of other factors
- But must get functional form of relationship between y and W correct: matching procedures
- This is not the case with (unconditional) randomization: see class exercise
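A constructed example of why W must be controlled for. The data are artificial: treatment probability is 0.2 when W = 0 and 0.8 when W = 1, the true treatment effect is 1, and W itself raises y by 2. Comparing within each value of W and averaging is one simple way to sidestep the functional form issue when W is discrete; the helper names are hypothetical.

```python
from statistics import mean

def diff_in_means(y, x):
    treat = [yi for yi, xi in zip(y, x) if xi == 1]
    control = [yi for yi, xi in zip(y, x) if xi == 0]
    return mean(treat) - mean(control)

def stratified_effect(y, x, w):
    """Average of within-stratum differences in means, weighted by stratum size."""
    n = len(y)
    total = 0.0
    for value in sorted(set(w)):
        idx = [i for i in range(n) if w[i] == value]
        share = len(idx) / n
        total += share * diff_in_means([y[i] for i in idx], [x[i] for i in idx])
    return total

# Artificial data: 10 units with W = 0 (2 treated), 10 units with W = 1 (8 treated).
w = [0] * 10 + [1] * 10
x = [1] * 2 + [0] * 8 + [1] * 8 + [0] * 2
y = [1.0 * xi + 2.0 * wi for xi, wi in zip(x, w)]  # true treatment effect is 1

naive = diff_in_means(y, x)            # ignores W and picks up its effect
adjusted = stratified_effect(y, x, w)  # compares like with like
```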
21. Heterogeneity in Treatment Effects
- So far have assumed causal (treatment) effect the same for everyone
- No good reason to believe this
- Start with case of no other regressors
- $y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$
- Random assignment implies X independent of $\beta_{1i}$
- Sometimes called random coefficients model
22. What treatment effect to estimate?
- Would like to estimate causal effect for everyone: this is not possible (Holland's fundamental problem of causal inference)
- Can only hope to estimate some average
- Average treatment effect (ATE): $E(\beta_{1i})$
- Proposition 2.5: OLS estimates ATE
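A simulation sketch of Proposition 2.5 under assumed distributions: individual effects drawn uniformly on [0, 4] (so the ATE is 2), assignment by coin flip, standard normal noise. The seed, sample size and distributions are arbitrary choices for illustration.

```python
import random
from statistics import mean

rng = random.Random(0)
n = 20000
beta1 = [rng.uniform(0.0, 4.0) for _ in range(n)]  # individual treatment effects
x = [rng.randint(0, 1) for _ in range(n)]          # random assignment, independent of beta1
y = [1.0 + b * xi + rng.gauss(0.0, 1.0) for b, xi in zip(beta1, x)]

treated = [yi for yi, xi in zip(y, x) if xi == 1]
control = [yi for yi, xi in zip(y, x) if xi == 0]
ols_slope = mean(treated) - mean(control)  # equals the OLS coefficient on binary X
ate = mean(beta1)                          # sample average of the individual effects
# ols_slope should be close to ate even though no individual effect is observed.
```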
23. Observable Heterogeneity
- Full outcomes notation
- Outcome if in control group
  - $y_{0i} = \gamma_0 W_i + u_{0i}$
- Outcome if in treatment group
  - $y_{1i} = \gamma_1 W_i + u_{1i}$
- Treatment effect is $(y_{1i} - y_{0i})$ and can be written as
  - $(y_{1i} - y_{0i}) = (\gamma_1 - \gamma_0) W_i + u_{1i} - u_{0i}$
- Note treatment effect has observable and unobservable component
- Can estimate as
  - Two separate equations
  - One single equation
24. Combining treatment and control groups into single regression
- We can write
  - $y_i = X_i y_{1i} + (1 - X_i) y_{0i}$
- Combining outcomes equations leads to
  - $y_i = \gamma_0 W_i + (\gamma_1 - \gamma_0) W_i X_i + u_{0i} + (u_{1i} - u_{0i}) X_i$
- Regression includes W and interactions of W with X: these are observable part of treatment effect
- Note error likely to be heteroskedastic
25. Bertrand/Mullainathan
- Different treatment effect for high and low quality CVs
26. Units of Measurement
- Causal effect measured in units of experiment: not very helpful
- Often want to convert causal effects to more meaningful units e.g. in Project STAR what is effect of reducing class size by one child
27. Simple estimator of this would be
- $\frac{E(y \mid X=1) - E(y \mid X=0)}{E(S \mid X=1) - E(S \mid X=0)}$
- where S is class size
- Takes the treatment effect on outcome variable and divides by treatment effect on class size
- Not hard to compute but how to get standard error?
28. IV Can Do the Job
- Can't run regression of y on S: S influenced by factors other than treatment status
- But X is
  - Correlated with S
  - Uncorrelated with unobserved stuff (because of randomization)
- Hence X can be used as an instrument for S
- IV estimator has form (just-identified case)
29. The Wald Estimator
- This will give estimate of standard error of treatment effect
- Where instrument is binary and no other regressors included the IV estimate of slope coefficient can be shown to be
  - $\hat{\beta}_{IV} = \frac{E(y \mid X=1) - E(y \mid X=0)}{E(S \mid X=1) - E(S \mid X=0)}$
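The equivalence can be verified on a toy dataset. The sketch assumes the just-identified IV slope takes the familiar form Cov(y, X) / Cov(S, X), with X the instrument and S the regressor, and compares it with the ratio of differences in means. Class sizes and test scores below are invented.

```python
from statistics import mean

def cov(a, b):
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

def iv_slope(y, s, x):
    """Just-identified IV slope of y on s, using x as the instrument."""
    return cov(y, x) / cov(s, x)

def wald(y, s, x):
    """Ratio of the treatment effect on y to the treatment effect on s."""
    def group_mean(values, group):
        return mean([v for v, xi in zip(values, x) if xi == group])
    return (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(s, 1) - group_mean(s, 0))

x = [1, 1, 1, 1, 0, 0, 0, 0]                          # assigned to small class or not
s = [15, 16, 14, 17, 22, 24, 23, 21]                  # hypothetical class sizes
y = [52.0, 50.0, 55.0, 51.0, 46.0, 44.0, 45.0, 47.0]  # hypothetical test scores

beta_iv = iv_slope(y, s, x)
beta_wald = wald(y, s, x)
# Both give the estimated effect on y of one extra child in the class.
```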
30. Partial Compliance
- So far
  - In control group implies no treatment
  - In treatment group implies get treatment
- Often things are not as clean as this
  - Treatment is an opportunity
  - Close substitutes available to those in control group
  - Implementation not perfect e.g. pushy parents
31. An Example: Moving to Opportunity
- Designed to investigate the impact of living in bad neighbourhoods on outcomes
- Gave some residents of public housing projects chance to move out
- Two treatments
  - Voucher for private rental housing
  - Voucher for private rental housing restricted for use in good neighbourhoods
- No-one forced to move so imperfect compliance: 60% and 40% did use it
32. Some Terminology
- Z denotes whether in control or treatment group: intention-to-treat
- X denotes whether actually get treatment
- With perfect compliance
  - $\Pr(X=1 \mid Z=1) = 1$
  - $\Pr(X=1 \mid Z=0) = 0$
- With imperfect compliance
  - $1 > \Pr(X=1 \mid Z=1) > \Pr(X=1 \mid Z=0) > 0$
33. What Do We Want to Estimate?
- Intention-to-Treat
  - $ITT = E(y \mid Z=1) - E(y \mid Z=0)$
  - This can be estimated in usual way
- Treatment Effect on Treated (TOT)
34. Estimating TOT
- Can't use simple regression of y on Z
- But should recognize TOT as Wald estimator
- Can be estimated by regressing y on X using Z as instrument
- Relationship between TOT and ITT
  - $TOT = \frac{ITT}{\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)}$
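A toy calculation of both quantities, assuming TOT is obtained as the Wald ratio of ITT to the difference in take-up rates between the two groups. The data are invented: half of the treatment group takes up the treatment, nobody in the control group does, and treatment raises the outcome by 2.

```python
from statistics import mean

def itt_and_tot(y, x, z):
    """Return (ITT, TOT) where TOT = ITT / difference in take-up rates."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    itt = mean(y1) - mean(y0)
    take_up_gap = mean(x1) - mean(x0)
    return itt, itt / take_up_gap

z = [1] * 10 + [0] * 10           # offered the treatment or not
x = [1] * 5 + [0] * 5 + [0] * 10  # actually treated: 50% take-up among those offered
y = [10.0 + 2.0 * xi for xi in x] # hypothetical outcomes

itt, tot = itt_and_tot(y, x, z)
# With 50% take-up, TOT is twice ITT.
```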
35. Most Important Results from MTO
- No effects on adult economic outcomes
- Improvements in adult mental health
- Beneficial outcomes for teenage girls
- Adverse outcomes for teenage boys
36. Sample results from MTO
- TOT approximately twice the size of ITT
- Consistent with 50% use of vouchers
37. IV with Heterogeneous Treatment Effects (More Difficult)
- If treatment effect same for everyone then TOT recovers this (obvious)
- But what if treatment effect heterogeneous?
- No simple answer to this question
- Suppose model for treatment effect is
38. Proposition 2.6
- The IV estimate for the heterogeneous treatment case is a consistent estimate of
  - $\frac{E(\beta_{1i} \pi_i)}{E(\pi_i)}$
- where $\pi_i$ is the difference in the probability of treatment for individual i when in treatment and control group
39. Interpretation
- This is weighted average of treatment effects
- Weights will vary with instrument: contrast with heterogeneous treatment case
- Some cases in which can interpret IV estimate as ATE
40. How will IV estimate differ from ATE?
- IV is ATE if no correlation between $\beta_{1i}$ and $\pi_i$
- Previous formula says depends on covariance of $\beta_{1i}$ and $\pi_i$
- In some situations can sign but not always
- Example 1: no-one gets treatment in the absence of the programme so $\pi_i = \Pr(X_i=1 \mid Z_i=1)$
- If those who get treatment when in the treatment group are those with the highest returns then
  - IV > ATE
41. How will IV estimate differ from ATE? (continued)
- Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group
- This implies $\pi_i = 1 - \Pr(X_i=1 \mid Z_i=0)$
- If those who get treatment in control are those with highest returns then
  - IV < ATE
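Both examples can be illustrated with a few made-up numbers. The sketch takes the IV limit to be the weighted average E(beta_1i * pi_i) / E(pi_i), where pi_i is the difference in individual i's probability of treatment between treatment and control group; that reading of the weighting formula, along with every value below, is an assumption for illustration.

```python
def iv_probability_limit(beta1, pi):
    """Weighted average of individual effects: E(beta1 * pi) / E(pi)."""
    return sum(b * p for b, p in zip(beta1, pi)) / sum(pi)

beta1 = [1.0, 2.0, 3.0, 4.0]    # hypothetical individual treatment effects
ate = sum(beta1) / len(beta1)   # 2.5

# Example 1: no treatment without the programme, and those with higher
# returns are more likely to take it up when offered.
pi_example1 = [0.2, 0.4, 0.6, 0.8]

# Example 2: treatment compulsory in the treatment group; those with higher
# returns are more likely to get it anyway in the control group, so
# pi = 1 - Pr(treated in control) is smaller for them.
pi_example2 = [0.8, 0.6, 0.4, 0.2]

iv1 = iv_probability_limit(beta1, pi_example1)  # above the ATE
iv2 = iv_probability_limit(beta1, pi_example2)  # below the ATE
```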
42. Angrist/Imbens Monotonicity Assumption
- Case where IV estimate is not ATE
- Assume that everyone moved in same direction by treatment: monotonicity assumption
- Then can show that IV is average of treatment effect for those whose behaviour changed by being in treatment group
- They call this the Local Average Treatment Effect (LATE)
43. Problems with Experiments
- Expense
- Ethical Issues
- Threats to Internal Validity
  - Failure to follow experiment
  - Experimental effects (Hawthorne effects)
- Threats to External Validity
  - Non-representative programme
  - Non-representative sample
  - Scale effects
44. Conclusions on Experiments
- Are gold standard of empirical research
- Are becoming more common
- Not enough of them to keep us busy
- Study of non-experimental data can deliver useful knowledge
- Some issues similar, others different