Estimating Causal Effects with Experimental Data

Provided by: alanm160

1
Estimating Causal Effects with Experimental Data
2
Some Basic Terminology

Start with example where X is binary (though
simple to generalize)
X = 0 is control group
X = 1 is treatment group
Causal effect sometimes called treatment effect
Randomization implies everyone has same
probability of treatment

3
Why is Randomization Good?

If X is allocated at random then we know that X is
independent of all pre-treatment variables in
the whole wide world
An amazing claim, but true
Implies there cannot be a problem of omitted
variables, reverse causality etc
On average, only reason for difference between
treatment and control group is different receipt
of treatment

4
Proposition 2.1: Pre-treatment characteristics
must be independent of randomized treatment

Proof: Joint distribution of X and W is $f(X,W)$
Can decompose this into
$f(X,W) = f_{X|W}(X|W) f_W(W)$
Now random assignment means
$f_{X|W}(X|W) = f_X(X)$
This implies
$f(X,W) = f_X(X) f_W(W)$
This implies X and W are independent

5
Why is this useful? An Example: Racial
Discrimination

Black men earn less than white men in US
LOGWAGE          Coef.    Std. Err.        t
---------------------------------------------
BLACK        -.1673813    .0066708    -25.09
NO_HS        -.2138331    .0077192    -27.70
SOMECOLL      .1104148    .0049139     22.47
COLLEGE       .4660205    .0048839     95.42
AGE           .0704488    .0008552     82.38
AGESQUARED   -.0007227    .0000101    -71.41
_cons         1.088116    .0172715     63.00
Could be discrimination or other factors
unobserved by the researcher but observed by the
employer?
Hard to fully resolve with non-experimental data

6
An Experimental Design

Bertrand/Mullainathan, "Are Emily and Greg More
Employable Than Lakisha and Jamal?", American
Economic Review, 2004
Create fake CVs and send them in reply to job adverts
Allocate names at random to CVs: some given
black-sounding names, others white-sounding

Outcome variable is call-back rates
Interpretation: not a direct measure of racial
discrimination, just the effect of having a
black-sounding name, which may have other
connotations
But name uncorrelated by construction with other
material on CV

8
The Treatment Effect

Want estimate of $E(y|X=1) - E(y|X=0)$

9
Estimating Treatment Effects: the Statistics
Course Approach

Take mean of outcome variable in treatment group
Take mean of outcome variable in control group
Take difference between the two
No problems but
Does not generalize to where X is not binary
Does not directly compute standard errors

10
Estimating Treatment Effects: A Regression
Approach

Run regression
$y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Proposition 2.2: The OLS estimator of $\beta_1$ is an
unbiased estimator of the causal effect of X on
y
Proof: Many ways to prove this but simplest way
is perhaps:
Proposition 1.1 says OLS estimates $E(y|X)$
$E(y|X=0) = \beta_0$ so OLS estimate of intercept is
consistent estimate of $E(y|X=0)$
$E(y|X=1) = \beta_0 + \beta_1$ so OLS estimate of $\beta_1$ is
consistent estimate of $E(y|X=1) - E(y|X=0)$
Hence can read off estimate of treatment effect
from coefficient on X
Approach easily generalizes to where X is not
binary
Also gives estimate of standard error
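
The equivalence between the two approaches can be checked numerically. The sketch below uses made-up call-back data (not the Bertrand/Mullainathan results) and two illustrative helper functions, diff_in_means and ols_slope, which are names introduced here. With a binary X the OLS slope and the difference in group means coincide.

```python
import numpy as np

def diff_in_means(y, x):
    """Mean outcome of the treated (x == 1) minus mean outcome of controls (x == 0)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    return y[x == 1].mean() - y[x == 0].mean()

def ols_slope(y, x):
    """Slope coefficient from regressing y on a constant and x."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(x, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Made-up data: x = 1 is the treatment group, y = 1 means a call-back.
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])

print(diff_in_means(y, x))  # control mean 0.75, treatment mean 0.25
print(ols_slope(y, x))      # same number, up to floating-point error
```

The regression route is the one that carries over unchanged when X takes more than two values.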

11
Computing Standard Errors

Unless told otherwise regression package will
compute standard errors assuming errors are
homoskedastic, i.e. $Var(\varepsilon_i | X_i) = \sigma^2$
Even if only interested in effect of treatment on
mean, X may affect other aspects of distribution,
e.g. variance
This will cause heteroskedasticity
Heteroskedasticity does not make OLS regression
coefficients inconsistent but does make OLS
standard errors inconsistent

12
Robust Standard Errors

Also called
Huber standard errors
White standard errors
Heteroskedastic-consistent standard errors
Statistics course approach
Get variance of estimate of mean of treatment and
control group
Sum to give estimate of variance of difference in
means

13
A Regression-Based Approach

Can estimate this by using sample equivalents
Note that this is same as OLS standard errors if
X and $\varepsilon$ are independent

14
Proposition 2.3: If $\varepsilon$ and X are independent the
OLS formula for the standard errors will be
consistent even if the variance of $\varepsilon$ differs
across individuals.

Proof: If $\varepsilon$ and X are independent
Putting this in expression for asymptotic
variance of OLS estimator
A consistent estimate of the final term is the
mean of the squared residuals, i.e. usual estimate
of $\sigma^2$
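
The equations for this proof did not survive in the transcript. A standard textbook version of the argument is sketched below; it is a reconstruction under the usual assumptions, not necessarily the presenter's notation. Here $x_i$ is the regressor vector including the constant, and $\bar\sigma^2$ is a label introduced here for the average error variance.

```latex
\begin{align*}
\text{Asy.Var}\big(\sqrt{n}(\hat\beta-\beta)\big)
  &= E(x_i x_i')^{-1}\, E(\varepsilon_i^2 x_i x_i')\, E(x_i x_i')^{-1} \\
E(\varepsilon_i^2 x_i x_i')
  &= E(\varepsilon_i^2)\, E(x_i x_i')
  \quad \text{(independence of } \varepsilon_i \text{ and } x_i\text{)} \\
\Rightarrow\ \text{Asy.Var}\big(\sqrt{n}(\hat\beta-\beta)\big)
  &= \bar\sigma^2\, E(x_i x_i')^{-1},
  \qquad \bar\sigma^2 \equiv E(\varepsilon_i^2)
\end{align*}
```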

15
A Regression-Based Approach

Have to interpret residual variance differently:
not common to all individuals but the mean across
individuals
With one regressor can write robust standard
error as
Simple to use in practice, e.g. in STATA
. reg y x, robust
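
For readers without Stata, the link between the robust standard error and the statistics-course approach can be sketched in Python. The code below is a minimal illustration on made-up data: robust_se_slope computes the basic White (HC0) standard error for a single regressor, and diff_in_means_se sums the variances of the two group means. Both function names are introduced here. Stata's robust option typically applies a finite-sample adjustment on top of this, so its reported value can differ slightly.

```python
import numpy as np

def robust_se_slope(y, x):
    """HC0 (White) standard error of the slope in a regression of y on a constant and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    dx = x - x.mean()
    slope = (dx @ y) / (dx @ dx)
    intercept = y.mean() - slope * x.mean()
    resid = y - intercept - slope * x
    # Sandwich formula with one regressor: sum(dx^2 * e^2) / (sum(dx^2))^2
    return np.sqrt(np.sum(dx**2 * resid**2)) / (dx @ dx)

def diff_in_means_se(y, x):
    """Square root of (variance of treated mean + variance of control mean)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x)
    y1, y0 = y[x == 1], y[x == 0]
    # np.var uses divisor n by default, which matches HC0 exactly
    return np.sqrt(y1.var() / len(y1) + y0.var() / len(y0))

# Made-up data with unequal group sizes and unequal spread
x = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 9.0, 2.0, 7.0, 6.0, 4.0])

print(robust_se_slope(y, x), diff_in_means_se(y, x))
```

With a binary regressor the two numbers agree, which is why the regression-based robust standard error reproduces the familiar standard error of a difference in means.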

16
Bertrand/Mullainathan: Basic Results
17
Summary So Far

Econometrics very easy if all data comes from
randomized controlled experiment
Just need to collect data on treatment/control
and outcome variables
Just need to compare means of outcomes of
treatment and control groups
Is data on other variables of any use at all?
Not necessary but useful

18
Including Other Regressors

Can get consistent estimate of treatment effect
without worrying about other variables
Reason is that randomization ensures no problem
of omitted variables bias
But there are reasons to include other
regressors
Improved efficiency
Check for randomization
Improve randomization
Control for conditional randomization
Heterogeneity in treatment effects

19
The Uses of Other Regressors I: Improved
Efficiency

Don't just want consistent estimate of causal
effect, also want low standard error (or high
precision or efficiency)
Standard formula for variance of OLS
estimate of $\beta$ is $\sigma^2 (X'X)^{-1}$
$\sigma^2$ comes from variance of residual in regression:
$\sigma^2 = (1-R^2) Var(y)$

20
Proposition 2.4: The asymptotic variance of the
estimate of $\beta$ is lower when W is included

Proof (Will only do case where X and W are
one-dimensional)
When W is included variance of the estimate of
the treatment effect will be first diagonal
element of

21
Proof (continued)

Now
Using trick from end of notes on causal effects
we can write this as

22
Proof (continued)

Inverting leads to
By randomization X and W are independent so
The only difference is in the error variance:
this must be smaller when W is included as $R^2$
rises
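
The efficiency gain can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the sample size, the coefficients (a true treatment effect of 0.5 and a strong effect of W), and the helper name ols_with_se. It compares the conventional standard error on X with and without W in the regression.

```python
import numpy as np

def ols_with_se(y, regressors):
    """OLS of y on a constant plus the given regressors.

    Returns the coefficients and their conventional (homoskedastic) standard errors.
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(regressors))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000
x = rng.integers(0, 2, size=n).astype(float)  # randomized binary treatment
w = rng.normal(size=n)                        # pre-treatment covariate, independent of x
y = 1.0 + 0.5 * x + 2.0 * w + rng.normal(size=n)

coef_short, se_short = ols_with_se(y, [x])     # regression without W
coef_long, se_long = ols_with_se(y, [x, w])    # regression with W

print("without W:", coef_short[1], se_short[1])
print("with W:   ", coef_long[1], se_long[1])
```

Both regressions target the same treatment effect because X is randomized; including W only shrinks the residual variance and hence the standard error.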

23
The Uses of Other Regressors II: Check for
Randomization

Randomization can go wrong
Poor implementation of research design
Bad luck
If randomization done well then W should be
independent of X: this is testable
Test for differences in W in treatment/control
groups
Probit model for X on W or regress W on X.
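
A simple version of the balance test described above is a two-sample t-statistic for the mean of W in the two groups. The sketch below uses made-up data and an illustrative function name, balance_t_stat; it is one way to carry out the check, not the only one.

```python
import numpy as np

def balance_t_stat(w, x):
    """t-statistic for equal means of covariate w in treatment (x == 1) and control (x == 0)."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x)
    w1, w0 = w[x == 1], w[x == 0]
    diff = w1.mean() - w0.mean()
    se = np.sqrt(w1.var(ddof=1) / len(w1) + w0.var(ddof=1) / len(w0))
    return diff / se

x = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Made-up covariate that is perfectly balanced across groups
w_balanced = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
# Made-up covariate that is much higher in the treatment group
w_unbalanced = np.array([1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0])

t_balanced = balance_t_stat(w_balanced, x)
t_unbalanced = balance_t_stat(w_unbalanced, x)
print(t_balanced, t_unbalanced)
```

A large absolute t-statistic suggests that randomization failed for that covariate, through poor implementation or bad luck.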

24
The Uses of Other Regressors III: Improve
Randomization

Can also use W at stage of assigning treatment
Can guarantee that in your sample X and W are
independent instead of it being just
probabilistic
This is what Bertrand/Mullainathan do when
assigning names to CVs

25
The Uses of Other Regressors IV: Adjust for
Conditional Randomization

This is case where must include W to get
consistent estimates of treatment effects
Conditional randomization is where probability of
treatment is different for people with different
values of W, but random conditional on W
Why have conditional randomization?
May have no choice
May want to do it (c.f. stratification)

26
An Example: Project STAR

Allocation of students to classes is random
within schools
But small number of classes per school
This leads to following relationship between
probability of treatment and number of kids in
school

27
Controlling for Conditional Randomization

X can now be correlated with W
But, conditional on W, X independent of other
factors
But must get functional form of relationship
between y and W correct (cf. matching procedures)
This is not the case with (unconditional)
randomization: see class exercise
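
The need to control for W under conditional randomization can be seen in a small noise-free example. The cell counts and coefficients below are made up for illustration: treatment is rare when W = 0 and common when W = 1, the true treatment effect is 1, and W has its own effect of 3 on the outcome.

```python
import numpy as np

# Made-up cell counts: Pr(X = 1 | W = 0) = 0.2 and Pr(X = 1 | W = 1) = 0.8
w = np.array([0] * 10 + [1] * 10, dtype=float)
x = np.array([0] * 8 + [1] * 2 + [0] * 2 + [1] * 8, dtype=float)
y = 1.0 * x + 3.0 * w  # true treatment effect is 1; no noise, for clarity

# Naive comparison of means ignores that X is correlated with W
naive = y[x == 1].mean() - y[x == 0].mean()

# Regression that controls for W
design = np.column_stack([np.ones(len(y)), x, w])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[1]

print(naive)     # overstates the effect because treated units have higher W
print(adjusted)  # recovers the true effect of 1
```

The adjusted estimate works here because the regression uses the correct functional form for W, which is the caveat raised on this slide.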

28
Heterogeneity in Treatment Effects

So far have assumed causal (treatment) effect the
same for everyone
No good reason to believe this
Start with case of no other regressors
$y_i = \beta_0 + \beta_{1i} X_i + \varepsilon_i$
Random assignment implies X independent of $\beta_{1i}$
Sometimes called random coefficients model

29
What treatment effect to estimate?

Would like to estimate causal effect for everyone:
this is not possible
Can only hope to estimate some average
Average treatment effect

30
Proposition 2.5: OLS estimates ATE

Proof for single regressor

31
Observable Heterogeneity

Potential outcomes notation
Outcome if in control group
$y_{0i} = \gamma_0 W_i + u_{0i}$
Outcome if in treatment group
$y_{1i} = \gamma_1 W_i + u_{1i}$
Treatment effect is $(y_{1i} - y_{0i})$ and can be written
as
$(y_{1i} - y_{0i}) = (\gamma_1 - \gamma_0) W_i + u_{1i} - u_{0i}$
Note treatment effect has observable and
unobservable component
Can estimate as
Two separate equations
One single equation

32
Combining treatment and control groups into
single regression

We can write
Combining outcomes equations leads to
Regression includes W and interactions of W with
X: these are the observable part of treatment effect
Note error likely to be heteroskedastic
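
The equations on this slide are missing from the transcript. A derivation consistent with the surrounding text is sketched below, writing the coefficients of the two outcome equations as $\gamma_0$ and $\gamma_1$; these symbols are an assumed notation, since the original characters were lost.

```latex
\begin{align*}
y_i &= X_i\, y_{1i} + (1 - X_i)\, y_{0i} \\
    &= y_{0i} + X_i\,(y_{1i} - y_{0i}) \\
    &= \gamma_0 W_i + (\gamma_1 - \gamma_0)\, X_i W_i
       + \underbrace{u_{0i} + X_i\,(u_{1i} - u_{0i})}_{\text{error term}}
\end{align*}
```

The error term depends on X unless $u_{1i} = u_{0i}$, which is why heteroskedasticity is likely.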

33
Bertrand/Mullainathan

Different treatment effect for high and low
quality CVs

34
Units of Measurement

Causal effect measured in units of experiment:
not very helpful
Often want to convert causal effects to more
meaningful units, e.g. in Project STAR, what is
effect of reducing class size by one child?

35
Simple estimator of this would be
$\dfrac{E(y|X=1) - E(y|X=0)}{E(S|X=1) - E(S|X=0)}$
where S is class size
Takes the treatment effect on outcome variable
and divides by treatment effect on class size
Not hard to compute but how to get standard
error?

36
IV Can Do the Job

Can't run regression of y on S: S is influenced by
factors other than treatment status
But X is:
Correlated with S
Uncorrelated with unobserved stuff (because of
randomization)
Hence X can be used as an instrument for S
IV estimator has form (just-identified case)

37
The Wald Estimator

This will give estimate of standard error of
treatment effect
Where instrument is binary and no other
regressors included the IV estimate of slope
coefficient can be shown to be
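
The Wald form, the treatment effect on the outcome divided by the treatment effect on S, can be compared with the just-identified IV slope numerically. The class sizes and test scores below are made up for illustration and are not Project STAR data; wald_estimate and iv_slope are names introduced here.

```python
import numpy as np

def wald_estimate(y, s, x):
    """Difference in mean y across groups divided by difference in mean s across groups."""
    y, s, x = (np.asarray(a, dtype=float) for a in (y, s, x))
    effect_on_y = y[x == 1].mean() - y[x == 0].mean()
    effect_on_s = s[x == 1].mean() - s[x == 0].mean()
    return effect_on_y / effect_on_s

def iv_slope(y, s, x):
    """Just-identified IV slope for y on s, using x as the instrument (constant included)."""
    y, s, x = (np.asarray(a, dtype=float) for a in (y, s, x))
    dx = x - x.mean()
    return (dx @ y) / (dx @ s)

# Made-up data: x = 1 means assigned to a small class
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
s = np.array([22, 24, 23, 23, 15, 16, 14, 15])   # class size
y = np.array([50, 48, 52, 50, 55, 57, 54, 58])   # test score

print(wald_estimate(y, s, x))  # (56 - 50) / (15 - 23)
print(iv_slope(y, s, x))       # identical when the instrument is binary
```

Running the same calculation through an IV routine in a regression package has the practical advantage of also producing a standard error.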

38
Partial Compliance

So far:
Being in control group implies no treatment
Being in treatment group implies get treatment
Often things are not as clean as this
Treatment is an opportunity
Close substitutes available to those in control
group
Implementation not perfect e.g. pushy parents

39
An Example: Moving to Opportunity

Designed to investigate the impact of living in
bad neighbourhoods on outcomes
Gave some residents of public housing projects
chance to move out
Two treatments
Voucher for private rental housing
Voucher for private rental housing restricted for
use in good neighbourhoods
No-one forced to move so imperfect compliance:
60% and 40% did use it

40
Some Terminology

Z denotes whether in control or treatment group
(intention-to-treat)
X denotes whether actually get treatment
With perfect compliance
$Pr(X=1|Z=1) = 1$
$Pr(X=1|Z=0) = 0$
With imperfect compliance
$1 > Pr(X=1|Z=1) > Pr(X=1|Z=0) > 0$

41
What Do We Want to Estimate?

Intention-to-Treat
$ITT = E(y|Z=1) - E(y|Z=0)$
This can be estimated in usual way
Treatment Effect on Treated

42
Estimating TOT

Can't use simple regression of y on Z
But should recognize TOT as Wald estimator
Can be estimated by regressing y on X using Z as
instrument
Relationship between TOT and ITT
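
Under the Wald interpretation, the TOT estimate is the ITT estimate divided by the difference in treatment take-up between the two groups. The sketch below shows the arithmetic on made-up data in which nobody in the control group is treated and half of the treatment group takes up the offer; itt_and_tot is an illustrative name, and the numbers are not MTO results.

```python
import numpy as np

def itt_and_tot(y, x, z):
    """Return (ITT, TOT): ITT is the difference in mean y by assignment z,
    and TOT divides ITT by the difference in take-up of treatment x by assignment z."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    itt = y[z == 1].mean() - y[z == 0].mean()
    take_up_gap = x[z == 1].mean() - x[z == 0].mean()
    return itt, itt / take_up_gap

# Made-up data: z is assignment, x is actual receipt of treatment
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x = np.array([0, 0, 0, 0, 1, 1, 0, 0])
y = np.array([10.0, 12.0, 11.0, 11.0, 14.0, 16.0, 11.0, 11.0])

itt, tot = itt_and_tot(y, x, z)
print(itt, tot)  # with 50% take-up the TOT is twice the ITT
```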

43
Most Important Results from MTO

No effects on adult economic outcomes
Improvements in adult mental health
Beneficial outcomes for teenage girls
Adverse outcomes for teenage boys

44
Sample results from MTO

TOT approximately twice the size of ITT
Consistent with 50% use of vouchers

45
IV with Heterogeneous Treatment Effects

If treatment effect same for everyone then TOT
recovers this (obvious)
But what if treatment effect heterogeneous?
No simple answer to this question
Suppose model for treatment effect is

46
Proposition 2.6: The IV estimate for the
heterogeneous treatment case is a consistent
estimate of $E(\beta_{1i}\pi_i)/E(\pi_i)$, where $\pi_i$ is
the difference in the probability of treatment
for individual i when in treatment and control group
47
Proof

Model for effect of intention to treat on being
treated

48
Proof (continued)

Can write reduced-form as
Wald estimator then becomes
As

49
Hence Wald estimator can be thought of as
an estimator of

This is weighted average of treatment effects:
weights will vary with instrument (contrast
with heterogeneous treatment case)
Some cases in which can interpret IV estimate as
ATE

50
Proposition 2.7: IV estimate is ATE if
a. no heterogeneity in treatment effect
b. $\beta_{1i}$ uncorrelated with $\pi_i$

Proof
A. This should be obvious as
B. Can write as
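
The formulas for both parts are missing from the transcript. Assuming the IV estimand is the weighted average $E(\beta_{1i}\pi_i)/E(\pi_i)$, which is a standard result but is a reconstruction here, the decomposition behind part B can be sketched as:

```latex
\begin{align*}
\operatorname{plim}\ \hat\beta_{IV}
  = \frac{E(\beta_{1i}\pi_i)}{E(\pi_i)}
  &= \frac{E(\beta_{1i})\,E(\pi_i) + Cov(\beta_{1i},\pi_i)}{E(\pi_i)} \\
  &= E(\beta_{1i}) + \frac{Cov(\beta_{1i},\pi_i)}{E(\pi_i)}
\end{align*}
```

If $\beta_{1i} = \beta_1$ for everyone the ratio equals $\beta_1$ directly (part A), and if the covariance is zero the second term vanishes, leaving the ATE (part B).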

51
How will IV estimate differ from ATE?

Previous formula says depends on covariance of
$\beta_{1i}$ and $\pi_i$
In some situations can sign but not always
Example 1: no-one gets treatment in the absence
of the programme, so $Pr(X=1|Z=0) = 0$
If those who get treatment when in the treatment
group are those with the highest returns then
IV > ATE

Example 2: treatment is voluntary for those in
the control group but compulsory for those in the
treatment group
This implies $Pr(X=1|Z=1) = 1$
If those who get treatment in control are those
with highest returns then
IV < ATE

53
Angrist/Imbens Monotonicity Assumption

Case where IV estimate is not ATE
Assume that everyone moved in same direction by
treatment: the monotonicity assumption
Then can show that IV is average of treatment
effect for those whose behaviour changed by being
in treatment group
They call this the Local Average Treatment Effect
(LATE)
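
The LATE interpretation can be illustrated with a tiny hypothetical population in which each person's treatment status under both assignments is known. All numbers below are made up. Monotonicity holds because nobody is treated under control assignment but untreated under treatment assignment. Averaging over the whole population under each assignment stands in for the large-sample limit of a randomized experiment.

```python
import numpy as np

# Hypothetical population of six people
beta = np.array([1.0, 1.0, 2.0, 4.0, 6.0, 6.0])  # individual treatment effects
d0 = np.array([0, 0, 0, 0, 1, 1])                # treated if assigned to control?
d1 = np.array([0, 0, 1, 1, 1, 1])                # treated if assigned to treatment?
y_base = np.full(6, 10.0)                        # outcome without treatment

# Outcomes the population would show under each assignment
y_if_z0 = y_base + beta * d0
y_if_z1 = y_base + beta * d1

# Wald estimator in the large-sample limit
wald = (y_if_z1.mean() - y_if_z0.mean()) / (d1.mean() - d0.mean())

# Average effect among those whose behaviour is changed by assignment
changed = d1 > d0
late = beta[changed].mean()
ate = beta.mean()

print(wald, late, ate)
```

In this example the Wald estimate matches the average effect for the two people whose treatment status changes, and it differs from the ATE because the always-treated and never-treated have different effects.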

54
Spill-overs/ Externalities /General Equilibrium
Effects

Have assumed that treatment only affects outcome
for person who receives it
Many situations in which this is not true
E.g. externalities, spill-overs, effects on
market prices
Example: Miguel and Kremer, "Worms: Identifying
Impacts on Education and Health in the Presence
of Treatment Externalities", Econometrica, 2004

55
Background

Infection from intestinal worms is rife among
Kenyan schoolchildren
Major cause of school absence
Leads to lower human capital accumulation, lower
growth?
Investigation of effectiveness of anti-worming
drugs on health, education

56
Existing studies

Randomize drug treatment within schools
But probability of re-infection affected by
infection rate among contacts, i.e. externalities
very likely
This research design will not capture these
effects
To see this, consider model

57
Miguel/Kremer Methodology

Existing methodology cannot measure externality,
only individual effect
Randomize treatment across schools, not
individuals
This can identify $\beta_1 + \beta_2$
Could have had design in which randomized
proportion of individuals within schools getting
treatment

58
Typical Result

Cannot separate externality from direct effect,
but this is important for public policy
Have non-experimental approach to this, using
fact that not all kids from same village go to
same school
This gives variation in X

59
Some examples of how they do this