Title: Estimating Causal Effects with Experimental Data
Some Basic Terminology
- Start with example where X is binary (though simple to generalize)
- X=0 is control group
- X=1 is treatment group
- Causal effect sometimes called treatment effect
- Randomization implies everyone has same probability of treatment
Why is Randomization Good?
- If X allocated at random then know that X is independent of all pre-treatment variables in whole wide world
- An amazing claim but true
- Implies there cannot be a problem of omitted variables, reverse causality etc.
- On average, only reason for difference between treatment and control group is different receipt of treatment
Proposition 2.1: Pre-treatment characteristics must be independent of randomized treatment
- Proof: joint distribution of X and W is $f(X,W)$
- Can decompose this into
- $f(X,W) = f_{X|W}(X|W)\, f_W(W)$
- Now random assignment means
- $f_{X|W}(X|W) = f_X(X)$
- This implies
- $f(X,W) = f_X(X)\, f_W(W)$
- This implies X and W independent
Why is this useful? An Example: Racial Discrimination
- Black men earn less than white men in US

LOGWAGE          Coef.   Std. Err.        t
-------------------------------------------
BLACK        -.1673813    .0066708   -25.09
NO_HS        -.2138331    .0077192   -27.70
SOMECOLL      .1104148    .0049139    22.47
COLLEGE       .4660205    .0048839    95.42
AGE           .0704488    .0008552    82.38
AGESQUARED   -.0007227    .0000101   -71.41
_cons         1.088116    .0172715    63.00

- Could be discrimination or other factors unobserved by the researcher but observed by the employer?
- Hard to fully resolve with non-experimental data
An Experimental Design
- Bertrand/Mullainathan, "Are Emily and Greg More Employable Than Lakisha and Jamal?", American Economic Review, 2004
- Create fake CVs and send them in reply to job adverts
- Allocate names at random to CVs: some given black-sounding names, others white-sounding
- Outcome variable is call-back rates
- Interpretation: not direct measure of racial discrimination, just effect of having a black-sounding name, which may have other connotations
- But name uncorrelated by construction with other material on CV
The Treatment Effect
Estimating Treatment Effects: the Statistics Course Approach
- Take mean of outcome variable in treatment group
- Take mean of outcome variable in control group
- Take difference between the two
- No problems but
- Does not generalize to where X is not binary
- Does not directly compute standard errors
Estimating Treatment Effects: A Regression Approach
- Run regression
- $y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
- Proposition 2.2: The OLS estimator of $\beta_1$ is an unbiased estimator of the causal effect of X on y
- Proof: many ways to prove this but simplest way is perhaps
- Proposition 1.1 says OLS estimates $E(y|X)$
- $E(y|X=0) = \beta_0$ so OLS estimate of intercept is consistent estimate of $E(y|X=0)$
- $E(y|X=1) = \beta_0 + \beta_1$ so OLS estimate of $\beta_1$ is consistent estimate of $E(y|X=1) - E(y|X=0)$
- Hence can read off estimate of treatment effect from coefficient on X
- Approach easily generalizes to where X is not binary
- Also gives estimate of standard error
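The equivalence between the two approaches can be checked numerically. The sketch below is a minimal illustration on simulated data: the sample size, the true effect of 0.5 and the helper names `mean` and `ols_simple` are all assumptions made for this example, not taken from the slides.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_simple(x, y):
    """Return (intercept, slope) from OLS of y on a constant and x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Simulated experiment (all numbers are made up for illustration).
rng = random.Random(0)
n = 2000
x = [rng.randint(0, 1) for _ in range(n)]            # random assignment
y = [1.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]   # assumed true effect: 0.5

# Statistics course approach: difference in group means.
treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
diff_in_means = mean(treated) - mean(control)

# Regression approach: coefficient on X.
intercept, slope = ols_simple(x, y)
print(diff_in_means, slope)       # identical up to rounding error
print(mean(control), intercept)   # intercept is the control-group mean
```

With a binary X the two numbers coincide exactly (up to floating-point error), which is why the regression can be used to read off the treatment effect.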
Computing Standard Errors
- Unless told otherwise regression package will compute standard errors assuming errors are homoskedastic, i.e. $Var(\varepsilon_i | X_i) = \sigma^2$
- Even if only interested in effect of treatment on mean, X may affect other aspects of distribution, e.g. variance
- This will cause heteroskedasticity
- Heteroskedasticity does not make OLS regression coefficients inconsistent but does make OLS standard errors inconsistent
Robust Standard Errors
- Also called
- Huber standard errors
- White standard errors
- Heteroskedasticity-consistent standard errors
- Statistics course approach
- Get variance of estimate of mean of treatment and control group
- Sum to give estimate of variance of difference in means
A Regression-Based Approach
- Can estimate this by using sample equivalents
- Note that this is same as OLS standard errors if X and $\varepsilon$ are independent
Proposition 2.3: If $\varepsilon$ and X are independent, the OLS formula for the standard errors will be consistent even if the variance of $\varepsilon$ differs across individuals.
- Proof: if $\varepsilon$ and X are independent
- Putting this in expression for asymptotic variance of OLS estimator
- A consistent estimate of the final term is the mean of the squared residuals, i.e. usual estimate of $\sigma^2$
A Regression-Based Approach
- Have to interpret residual variance differently: not common to all individuals but the mean across individuals
- With one regressor can write robust standard error as
- Simple to use in practice, e.g. in STATA
- . reg y x, robust
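To make the link between the robust and the "statistics course" standard errors concrete, here is a minimal sketch in Python. It assumes the simplest heteroskedasticity-consistent (HC0) formula for one regressor, $\sum_i (x_i-\bar x)^2 \hat\varepsilon_i^2 / [\sum_i (x_i-\bar x)^2]^2$; a regression package may apply a small-sample correction on top of this, so its output need not match to the last digit. The data and all names are invented for illustration.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def ols_with_standard_errors(x, y):
    """OLS of y on a constant and one regressor x.

    Returns (slope, classical_se, robust_se); robust_se uses the HC0 formula.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical formula: one common error variance, estimated by s^2.
    s2 = sum(e * e for e in resid) / (n - 2)
    classical_se = math.sqrt(s2 / sxx)
    # Robust formula: each squared residual is weighted by (x_i - x_bar)^2.
    robust_se = math.sqrt(sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2)
    return slope, classical_se, robust_se

def var_of_mean(values):
    """Estimated variance of a group mean (divisor n for the group variance)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values) ** 2

# Simulated data: treatment raises both the mean and the variance of y.
rng = random.Random(1)
x = [1] * 300 + [0] * 1700
y = [1.0 + 0.5 * xi + rng.gauss(0, 3.0 if xi == 1 else 1.0) for xi in x]

slope, classical_se, robust_se = ols_with_standard_errors(x, y)

# Statistics course approach: sum the variances of the two group means.
treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
stats_course_se = math.sqrt(var_of_mean(treated) + var_of_mean(control))
print(classical_se, robust_se, stats_course_se)
```

In this design the small treatment group has the noisier outcome, so the classical formula understates the standard error, while the HC0 value coincides with the sum-of-variances calculation.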
Bertrand/Mullainathan: Basic Results
Summary So Far
- Econometrics very easy if all data comes from randomized controlled experiment
- Just need to collect data on treatment/control and outcome variables
- Just need to compare means of outcomes of treatment and control groups
- Is data on other variables of any use at all?
- Not necessary but useful
Including Other Regressors
- Can get consistent estimate of treatment effect without worrying about other variables
- Reason is that randomization ensures no problem of omitted variables bias
- But there are reasons to include other regressors:
- Improved efficiency
- Check for randomization
- Improve randomization
- Control for conditional randomization
- Heterogeneity in treatment effects
The Uses of Other Regressors I: Improved Efficiency
- Don't just want consistent estimate of causal effect, also want low standard error (or high precision or efficiency)
- Standard formula for variance of OLS estimate of $\beta$ is $\sigma^2 (X'X)^{-1}$
- $\sigma^2$ comes from variance of residual in regression: $(1-R^2)\, Var(y)$
Proposition 2.4: The asymptotic variance of $\hat\beta$ is lower when W is included
- Proof (will only do case where X and W are one-dimensional)
- When W is included, variance of the estimate of the treatment effect will be first diagonal element of
Proof (continued)
- Now
- Using trick from end of notes on causal effects
we can write this as
Proof (continued)
- Inverting leads to
- By randomization X and W are independent so
- The only difference is in the error variance: this must be smaller when W is included as $R^2$ rises
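The efficiency gain can be illustrated by simulation. The sketch below compares the classical standard error of the treatment coefficient with and without a covariate W. To avoid matrix algebra it uses the Frisch-Waugh partialling-out result (regressing residualized y on residualized X gives the same coefficient as the full regression), which is a choice made for this example rather than the route taken in the proof. All data and names are hypothetical.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def demean(v):
    v_bar = mean(v)
    return [vi - v_bar for vi in v]

def residualize(v, w):
    """Residuals from an OLS regression of v on a constant and w."""
    dv, dw = demean(v), demean(w)
    slope = sum(a * b for a, b in zip(dw, dv)) / sum(b * b for b in dw)
    return [a - slope * b for a, b in zip(dv, dw)]

def slope_and_se(x_tilde, y_tilde, n_params):
    """Slope and classical standard error from regressing y_tilde on x_tilde
    (both mean zero); n_params is the parameter count of the full model."""
    sxx = sum(xi * xi for xi in x_tilde)
    slope = sum(xi * yi for xi, yi in zip(x_tilde, y_tilde)) / sxx
    ssr = sum((yi - slope * xi) ** 2 for xi, yi in zip(x_tilde, y_tilde))
    s2 = ssr / (len(x_tilde) - n_params)
    return slope, math.sqrt(s2 / sxx)

rng = random.Random(2)
n = 2000
w = [rng.gauss(0, 1) for _ in range(n)]      # pre-treatment covariate
x = [rng.randint(0, 1) for _ in range(n)]    # randomized, so independent of w
y = [1.0 + 0.5 * xi + 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Regression of y on X only.
beta_short, se_short = slope_and_se(demean(x), demean(y), n_params=2)
# Regression of y on X and W, via partialling out W.
beta_long, se_long = slope_and_se(residualize(x, w), residualize(y, w), n_params=3)
print(beta_short, se_short)
print(beta_long, se_long)
```

Both regressions are centred on the same treatment effect, but the one that includes W has the smaller residual variance and hence the smaller standard error.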
The Uses of Other Regressors II: Check for Randomization
- Randomization can go wrong
- Poor implementation of research design
- Bad luck
- If randomization done well then W should be independent of X: this is testable
- Test for differences in W in treatment/control groups
- Probit model for X on W, or regress W on X
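A simple version of this check is a t-test for the difference in mean W between the two groups, which is what a regression of W on a binary X delivers. The sketch below is an illustration on simulated data; the covariate, the "flawed" assignment rule and the function name `balance_t_stat` are assumptions made for the example.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def balance_t_stat(w, x):
    """t-statistic for the difference in mean W between treatment (x == 1)
    and control (x == 0), allowing the groups to have different variances."""
    def var(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)
    w1 = [wi for wi, xi in zip(w, x) if xi == 1]
    w0 = [wi for wi, xi in zip(w, x) if xi == 0]
    se = math.sqrt(var(w1) / len(w1) + var(w0) / len(w0))
    return (mean(w1) - mean(w0)) / se

rng = random.Random(3)
n = 5000
w = [rng.gauss(40, 10) for _ in range(n)]   # hypothetical pre-treatment variable

# Proper randomization: assignment ignores W.
x_good = [rng.randint(0, 1) for _ in range(n)]
# Flawed implementation: those with high W are more likely to be treated.
x_bad = [1 if rng.random() < (0.7 if wi > 40 else 0.3) else 0 for wi in w]

t_good = balance_t_stat(w, x_good)
t_bad = balance_t_stat(w, x_bad)
print(t_good, t_bad)
```

A large t-statistic, as in the flawed design, signals that W differs between the groups by more than chance would suggest.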
The Uses of Other Regressors III: Improve Randomization
- Can also use W at stage of assigning treatment
- Can guarantee that in your sample X and W are independent instead of it being just probabilistic
- This is what Bertrand/Mullainathan do when assigning names to CVs
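One simple way to use W at the assignment stage is to randomize within each value of W, so that treatment and control groups have exactly the same composition. The sketch below is a generic illustration of this idea, not a description of the procedure used in any particular study; the stratum sizes and the function name `blocked_assignment` are hypothetical.

```python
import random

def blocked_assignment(w, rng):
    """Randomly assign treatment within each value of w so that, as nearly as
    possible, exactly half of every stratum is treated."""
    strata = {}
    for i, wi in enumerate(w):
        strata.setdefault(wi, []).append(i)
    x = [0] * len(w)
    for members in strata.values():
        rng.shuffle(members)                      # random order within stratum
        for i in members[: len(members) // 2]:    # first half gets treatment
            x[i] = 1
    return x

rng = random.Random(4)
w = [1] * 40 + [0] * 60        # hypothetical indicator, e.g. 1 = high-quality CV
x = blocked_assignment(w, rng)

n_treated = sum(x)
share_high_treated = sum(wi for wi, xi in zip(w, x) if xi == 1) / n_treated
share_high_control = sum(wi for wi, xi in zip(w, x) if xi == 0) / (len(x) - n_treated)
print(share_high_treated, share_high_control)
```

Who is treated within each stratum is still random, but the share of high-W units is identical in the two groups by construction rather than only in expectation.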
The Uses of Other Regressors IV: Adjust for Conditional Randomization
- This is case where must include W to get consistent estimates of treatment effects
- Conditional randomization is where probability of treatment is different for people with different values of W, but random conditional on W
- Why have conditional randomization?
- May have no choice
- May want to do it (cf. stratification)
An Example: Project STAR
- Allocation of students to classes is random within schools
- But small number of classes per school
- This leads to following relationship between probability of treatment and number of kids in school
Controlling for Conditional Randomization
- X can now be correlated with W
- But, conditional on W, X independent of other factors
- But must get functional form of relationship between y and W correct: matching procedures
- This is not the case with (unconditional) randomization: see class exercise
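The sketch below illustrates why W must be controlled for under conditional randomization. It assumes a binary W, so that comparing treated and control units within each W cell (an exact-matching estimator) gets the functional form right automatically. The treatment probabilities, effect sizes and names are invented for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def diff_in_means(y, x):
    return (mean([yi for yi, xi in zip(y, x) if xi == 1])
            - mean([yi for yi, xi in zip(y, x) if xi == 0]))

rng = random.Random(5)
n = 20000
p_treat = {0: 0.2, 1: 0.6}     # probability of treatment depends on W
w = [rng.randint(0, 1) for _ in range(n)]
x = [1 if rng.random() < p_treat[wi] else 0 for wi in w]
# Assumed outcome model: treatment effect 1.0, and W shifts y by 2.0.
y = [1.0 * xi + 2.0 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# Ignoring W: treated units have higher W on average, so this is biased.
naive = diff_in_means(y, x)

# Conditioning on W: difference within each W cell, then average the
# cell-level differences using each cell's share of the sample.
adjusted = 0.0
for cell in (0, 1):
    idx = [i for i in range(n) if w[i] == cell]
    cell_diff = diff_in_means([y[i] for i in idx], [x[i] for i in idx])
    adjusted += cell_diff * len(idx) / n
print(naive, adjusted)
```

The naive comparison mixes the treatment effect with the effect of W, whereas the within-cell comparison recovers the assumed effect of 1.0.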
Heterogeneity in Treatment Effects
- So far have assumed causal (treatment) effect the same for everyone
- No good reason to believe this
- Start with case of no other regressors
- $y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$
- Random assignment implies X independent of $\beta_{1i}$
- Sometimes called random coefficients model
What treatment effect to estimate?
- Would like to estimate causal effect for everyone: this is not possible
- Can only hope to estimate some average
- Average treatment effect (ATE): $E(\beta_{1i})$
Proposition 2.5: OLS estimates ATE
- Proof for single regressor
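One way to see the result for a binary X is sketched below, under the random-coefficients model $y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$. The sketch assumes that randomization makes $X_i$ independent of both $\beta_{1i}$ and $\varepsilon_i$, and it may differ in detail from the argument given in the lecture.

```latex
\begin{aligned}
E(y_i \mid X_i = 1) - E(y_i \mid X_i = 0)
  &= E(\beta_{1i} \mid X_i = 1)
     + E(\varepsilon_i \mid X_i = 1) - E(\varepsilon_i \mid X_i = 0) \\
  &= E(\beta_{1i})
     \qquad \text{(independence of } X_i \text{ from } \beta_{1i}, \varepsilon_i \text{)}
\end{aligned}
```

Since the OLS coefficient on a binary X is the difference in sample means between the two groups, it is a consistent estimate of $E(\beta_{1i})$, the average treatment effect.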
Observable Heterogeneity
- Potential outcomes notation
- Outcome if in control group
- $y_{0i} = \gamma_0 W_i + u_{0i}$
- Outcome if in treatment group
- $y_{1i} = \gamma_1 W_i + u_{1i}$
- Treatment effect is $(y_{1i} - y_{0i})$ and can be written as
- $(y_{1i} - y_{0i}) = (\gamma_1 - \gamma_0) W_i + u_{1i} - u_{0i}$
- Note treatment effect has observable and unobservable component
- Can estimate as
- Two separate equations
- One single equation
Combining treatment and control groups into single regression
- We can write
- $y_i = X_i y_{1i} + (1 - X_i) y_{0i}$
- Combining outcomes equations leads to
- $y_i = \gamma_0 W_i + (\gamma_1 - \gamma_0) W_i X_i + u_{0i} + (u_{1i} - u_{0i}) X_i$
- Regression includes W and interactions of W with X: these are observable part of treatment effect
- Note error likely to be heteroskedastic
Bertrand/Mullainathan
- Different treatment effect for high- and low-quality CVs
Units of Measurement
- Causal effect measured in units of experiment: not very helpful
- Often want to convert causal effects to more meaningful units, e.g. in Project STAR what is effect of reducing class size by one child?
Simple estimator of this would be
- $\hat\beta = \dfrac{E(y|X=1) - E(y|X=0)}{E(S|X=1) - E(S|X=0)}$
- where S is class size
- Takes the treatment effect on outcome variable and divides by treatment effect on class size
- Not hard to compute but how to get standard error?
IV Can Do the Job
- Can't run regression of y on S: S influenced by factors other than treatment status
- But X is:
- Correlated with S
- Uncorrelated with unobserved stuff (because of randomization)
- Hence X can be used as an instrument for S
- IV estimator has form (just-identified case): $\hat\beta_{IV} = (X'S)^{-1} X'y$
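The Wald Estimator
- This will give estimate of standard error of treatment effect
- Where instrument is binary and no other regressors included, the IV estimate of slope coefficient can be shown to be
- $\hat\beta_{IV} = \dfrac{E(y|X=1) - E(y|X=0)}{E(S|X=1) - E(S|X=0)}$, with expectations replaced by sample means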
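The equivalence between the IV slope and the ratio of differences in means can be verified directly. The sketch below uses simulated data loosely styled on a class-size experiment; the class sizes, the effect of -0.5 per pupil and the unobserved `ability` variable are assumptions made up for the example, not figures from Project STAR.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

def diff_by_group(v, x):
    return (mean([vi for vi, xi in zip(v, x) if xi == 1])
            - mean([vi for vi, xi in zip(v, x) if xi == 0]))

rng = random.Random(6)
n = 5000
x = [rng.randint(0, 1) for _ in range(n)]       # 1 = assigned to a small class
ability = [rng.gauss(0, 1) for _ in range(n)]   # unobserved, affects S and y
# Class size: about 7 fewer pupils if treated, but also related to ability.
s = [22 - 7 * xi + ai + rng.gauss(0, 2) for xi, ai in zip(x, ability)]
# Outcome: each extra pupil is assumed to lower the score by 0.5.
y = [50 - 0.5 * si + 3.0 * ai + rng.gauss(0, 1) for si, ai in zip(s, ability)]

ols_slope = cov(y, s) / cov(s, s)               # biased: S is not randomized
iv_slope = cov(y, x) / cov(s, x)                # X as instrument for S
wald = diff_by_group(y, x) / diff_by_group(s, x)
print(ols_slope, iv_slope, wald)
```

The IV and Wald numbers agree to rounding error and lie close to the assumed -0.5, while the OLS regression of y on S is pulled away from it by the unobserved factor.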
Partial Compliance
- So far:
- In control group implies no treatment
- In treatment group implies get treatment
- Often things are not as clean as this
- Treatment is an opportunity
- Close substitutes available to those in control group
- Implementation not perfect, e.g. pushy parents
An Example: Moving to Opportunity
- Designed to investigate the impact of living in bad neighbourhoods on outcomes
- Gave some residents of public housing projects chance to move out
- Two treatments:
- Voucher for private rental housing
- Voucher for private rental housing restricted for use in good neighbourhoods
- No-one forced to move so imperfect compliance: 60% and 40% did use it
Some Terminology
- Z denotes whether in control or treatment group: intention-to-treat
- X denotes whether actually get treatment
- With perfect compliance
- $\Pr(X=1|Z=1) = 1$
- $\Pr(X=1|Z=0) = 0$
- With imperfect compliance
- $1 > \Pr(X=1|Z=1) > \Pr(X=1|Z=0) > 0$
What Do We Want to Estimate?
- Intention-to-Treat (ITT)
- $ITT = E(y|Z=1) - E(y|Z=0)$
- This can be estimated in usual way
- Treatment Effect on Treated (TOT)
Estimating TOT
- Can't use simple regression of y on Z
- But should recognize TOT as Wald estimator
- Can be estimated by regressing y on X using Z as instrument
- Relationship between TOT and ITT: $TOT = ITT / [\Pr(X=1|Z=1) - \Pr(X=1|Z=0)]$
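The scaling from ITT to TOT can be illustrated with a small simulation. The sketch assumes a constant treatment effect of 2.0, a 50% take-up rate among those offered treatment and no access to treatment in the control group; these numbers and the variable names are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def diff_by_group(v, z):
    return (mean([vi for vi, zi in zip(v, z) if zi == 1])
            - mean([vi for vi, zi in zip(v, z) if zi == 0]))

rng = random.Random(7)
n = 20000
z = [rng.randint(0, 1) for _ in range(n)]     # 1 = offered treatment
# Assumed compliance: half of those offered take it up, nobody else can.
x = [1 if zi == 1 and rng.random() < 0.5 else 0 for zi in z]
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]  # assumed effect of treatment: 2.0

itt = diff_by_group(y, z)           # effect of being offered treatment
take_up_gap = diff_by_group(x, z)   # Pr(X=1|Z=1) - Pr(X=1|Z=0)
tot = itt / take_up_gap             # Wald estimator with Z as instrument
print(itt, take_up_gap, tot)
```

With half of the offers taken up, the ITT is about half the effect of treatment itself, and dividing by the take-up gap scales it back up.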
Most Important Results from MTO
- No effects on adult economic outcomes
- Improvements in adult mental health
- Beneficial outcomes for teenage girls
- Adverse outcomes for teenage boys
Sample results from MTO
- TOT approximately twice the size of ITT
- Consistent with 50% use of vouchers
IV with Heterogeneous Treatment Effects
- If treatment effect same for everyone then TOT recovers this (obvious)
- But what if treatment effect heterogeneous?
- No simple answer to this question
- Suppose model for treatment effect is
Proposition 2.6: The IV estimate for the heterogeneous treatment case is a consistent estimate of $E(\pi_i \beta_{1i}) / E(\pi_i)$, where $\pi_i$ is the difference in the probability of treatment for individual i when in treatment and control group
Proof
- Model for effect of intention to treat on being
treated
Proof (continued)
- Can write reduced-form as
- Wald estimator then becomes
- As
Hence Wald estimator can be thought of as estimator of $E(\pi_i \beta_{1i}) / E(\pi_i)$
- This is weighted average of treatment effects
- Weights will vary with instrument: contrast with heterogeneous treatment case
- Some cases in which can interpret IV estimate as ATE
Proposition 2.7: IV estimate is ATE if either (a) no heterogeneity in treatment effect, or (b) $\beta_{1i}$ uncorrelated with $\pi_i$
- Proof
- (a) This should be obvious as
- (b) Can write as
How will IV estimate differ from ATE?
- Previous formula says depends on covariance of $\beta_{1i}$ and $\pi_i$
- In some situations can sign but not always
- Example 1: no-one gets treatment in the absence of the programme so
- If those who get treatment when in the treatment group are those with the highest returns then
- IV > ATE
- Example 2: treatment is voluntary for those in the control group but compulsory for those in the treatment group
- This implies
- If those who get treatment in control are those with highest returns then
- IV < ATE
Angrist/Imbens Monotonicity Assumption
- Case where IV estimate is not ATE
- Assume that everyone moved in same direction by treatment: monotonicity assumption
- Then can show that IV is average of treatment effect for those whose behaviour changed by being in treatment group
- They call this the Local Average Treatment Effect (LATE)
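A simulation makes the distinction between LATE and ATE concrete. The sketch below assumes three types of individual (the labels "complier", "always" and "never" and their treatment effects are invented for the example) and rules out anyone who would take treatment only when in the control group, so that monotonicity holds.

```python
import random

def mean(values):
    return sum(values) / len(values)

def diff_by_group(v, z):
    return (mean([vi for vi, zi in zip(v, z) if zi == 1])
            - mean([vi for vi, zi in zip(v, z) if zi == 0]))

rng = random.Random(8)
n = 60000
z = [rng.randint(0, 1) for _ in range(n)]   # 1 = in treatment group
# Types: compliers take treatment only if z == 1; always-takers take it
# regardless; never-takers never do. Nobody moves the "wrong" way.
types = [rng.choice(["complier", "always", "never"]) for _ in range(n)]
effect = {"complier": 1.0, "always": 3.0, "never": 0.0}   # assumed effects

x = [1 if t == "always" or (t == "complier" and zi == 1) else 0
     for t, zi in zip(types, z)]
y = [effect[t] * xi + rng.gauss(0, 1) for t, xi in zip(types, x)]

wald = diff_by_group(y, z) / diff_by_group(x, z)   # IV estimate
ate = mean([effect[t] for t in types])             # average over everyone
print(wald, ate)
```

The IV estimate is close to 1.0, the assumed effect for those whose behaviour is changed by being in the treatment group, and not to the population average of roughly 1.33.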
Spill-overs / Externalities / General Equilibrium Effects
- Have assumed that treatment only affects outcome for person who receives it
- Many situations in which this is not true
- E.g. externalities, spill-overs, effects on market prices
- Example: Miguel and Kremer, "Worms: Identifying Impacts on Education and Health in the Presence of Treatment Externalities", Econometrica, 2004
Background
- Infection from intestinal worms is rife among Kenyan schoolchildren
- Major cause of school absence
- Leads to lower human capital accumulation, lower growth?
- Investigation of effectiveness of anti-worming drugs on health, education
Existing studies
- Randomize drug treatment within schools
- But probability of re-infection affected by infection rate among contacts, i.e. externalities very likely
- This research design will not capture these effects
- To see this, consider model
Miguel/Kremer Methodology
- Existing methodology cannot measure externality, only individual effect
- Randomize treatment across schools, not individuals
- This can identify $\beta_1 + \beta_2$
- Could have had design in which randomized proportion of individuals within schools getting treatment
Typical Result
- Cannot separate externality from direct effect but this is important for public policy
- Have non-experimental approach to this using fact that not all kids from same village go to same school
- This gives variation in X
Some examples of how they do this
- Include number of kids in local area who are in
treatment schools
Problems with Experiments
- Expense
- Ethical Issues
- Threats to Internal Validity
- Failure to follow experiment
- Experimental effects (Hawthorne effects)
- Threats to External Validity
- Non-representative programme
- Non-representative sample
- Scale effects
Conclusions on Experiments
- Are gold standard of empirical research
- Are becoming more common
- Not enough of them to keep us busy
- Study of non-experimental data can deliver useful knowledge
- Some issues similar, others different