Estimating Causal Effects with Experimental Data - PowerPoint PPT Presentation

Provided by: alanm160
1
Estimating Causal Effects with Experimental Data
2
Some Basic Terminology
  • Start with example where X is binary (though
    simple to generalize)
  • $X=0$ is control group
  • $X=1$ is treatment group
  • Causal effect sometimes called treatment effect
  • Randomization implies everyone has same
    probability of treatment

3
Why is Randomization Good?
  • If X allocated at random then know that X is
    independent of all pre-treatment variables in
    whole wide world
  • An amazing claim but true.
  • Implies there cannot be a problem of omitted
    variables, reverse causality etc
  • On average, only reason for difference between
    treatment and control group is different receipt
    of treatment

4
Proposition 2.1: Pre-treatment characteristics
must be independent of randomized treatment
  • Proof: Joint distribution of X and W is $f(X,W)$
  • Can decompose this into
  • $f(X,W) = f_{X|W}(X \mid W)\, f_W(W)$
  • Now random assignment means
  • $f_{X|W}(X \mid W) = f_X(X)$
  • This implies
  • $f(X,W) = f_X(X)\, f_W(W)$
  • This implies X and W independent

5
Why is this useful? An Example: Racial
Discrimination
  • Black men earn less than white men in US
  • Regression output, dependent variable LOGWAGE:
      Variable      Coef.       Std. Err.    t
      ------------------------------------------------
      BLACK        -.1673813    .0066708    -25.09
      NO_HS        -.2138331    .0077192    -27.70
      SOMECOLL      .1104148    .0049139     22.47
      COLLEGE       .4660205    .0048839     95.42
      AGE           .0704488    .0008552     82.38
      AGESQUARED   -.0007227    .0000101    -71.41
      _cons         1.088116    .0172715     63.00
  • Could be discrimination or other factors
    unobserved by the researcher but observed by the
    employer?
  • Hard to fully resolve with non-experimental data

6
An Experimental Design
  • Bertrand/Mullainathan, "Are Emily and Greg More
    Employable Than Lakisha and Jamal?", American
    Economic Review, 2004
  • Create fake CVs and send them in reply to job adverts
  • Allocate names at random to CVs: some given
    black-sounding names, others white-sounding

7
  • Outcome variable is call-back rates
  • Interpretation: not a direct measure of racial
    discrimination, just the effect of having a
    black-sounding name, which may have other
    connotations.
  • But name uncorrelated by construction with other
    material on CV

8
The Treatment Effect
  • Want estimate of $E(y \mid X=1) - E(y \mid X=0)$

9
Estimating Treatment Effects: the Statistics
Course Approach
  • Take mean of outcome variable in treatment group
  • Take mean of outcome variable in control group
  • Take difference between the two
  • No problems, but
  • Does not generalize to where X is not binary
  • Does not directly compute standard errors

10
Estimating Treatment Effects: A Regression
Approach
  • Run regression
  • $y_i = \beta_0 + \beta_1 X_i + e_i$
  • Proposition 2.2: The OLS estimator of $\beta_1$ is an
    unbiased estimator of the causal effect of X on
    y
  • Proof: Many ways to prove this but simplest way
    is perhaps
  • Proposition 1.1 says OLS estimates $E(y \mid X)$
  • $E(y \mid X=0) = \beta_0$ so OLS estimate of intercept is
    consistent estimate of $E(y \mid X=0)$
  • $E(y \mid X=1) = \beta_0 + \beta_1$ so OLS estimate of $\beta_1$ is
    consistent estimate of $E(y \mid X=1) - E(y \mid X=0)$
  • Hence can read off estimate of treatment effect
    from coefficient on X
  • Approach easily generalizes to where X is not
    binary
  • Also gives estimate of standard error
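
The equivalence between the two approaches can be checked numerically. The following is a minimal sketch on simulated data; the sample size, the true effect of 2.0 and all variable names are assumptions made for illustration.

```python
import random
import statistics

random.seed(0)

# Hypothetical experiment: binary treatment X assigned by coin flip,
# outcome y with an assumed true treatment effect of 2.0.
n = 1000
x = [random.randint(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

# "Statistics course" estimate: difference in group means.
mean_treated = statistics.fmean(yi for xi, yi in zip(x, y) if xi == 1)
mean_control = statistics.fmean(yi for xi, yi in zip(x, y) if xi == 0)
diff_in_means = mean_treated - mean_control

# Regression estimate: OLS slope and intercept of y on X.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

print(diff_in_means, slope)      # the two numbers agree up to rounding
print(mean_control, intercept)   # the intercept is the control-group mean
```

With a binary regressor the agreement is an algebraic identity, not a feature of this particular sample.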

11
Computing Standard Errors
  • Unless told otherwise regression package will
    compute standard errors assuming errors are
    homoskedastic, i.e. $Var(e_i \mid X_i) = \sigma^2$
  • Even if only interested in effect of treatment on
    mean, X may affect other aspects of distribution
    e.g. variance
  • This will cause heteroskedasticity
  • Heteroskedasticity does not make OLS regression
    coefficients inconsistent but does make OLS
    standard errors inconsistent

12
Robust Standard Errors
  • Also called
      • Huber standard errors
      • White standard errors
      • Heteroskedasticity-consistent standard errors
  • Statistics course approach
      • Get variance of estimate of mean of treatment and
        control group
      • Sum to give estimate of variance of difference in
        means

13
A Regression-Based Approach
  • Can estimate this by using sample equivalents
  • Note that this is same as OLS standard errors if
    X and e are independent

14
Proposition 2.3: If e and X are independent, the
OLS formula for the standard errors will be
consistent even if the variance of e differs
across individuals.
  • Proof: If e and X are independent
  • Putting this in expression for asymptotic
    variance of OLS estimator
  • A consistent estimate of the final term is the
    mean of the squared residuals, i.e. usual estimate
    of $\sigma^2$
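
One way the steps of this proof could be written out is sketched below. The notation is an assumption chosen for illustration: $x_i$ is the regressor vector including the constant, and expectations are read as averages across individuals.

```latex
% Asymptotic variance of the OLS estimator, allowing heteroskedasticity
\operatorname{Avar}\big(\sqrt{n}\,(\hat\beta-\beta)\big)
  = E(x_i x_i')^{-1}\, E(e_i^2\, x_i x_i')\, E(x_i x_i')^{-1}

% If e_i and x_i are independent, the middle term factorizes
E(e_i^2\, x_i x_i') = E(e_i^2)\, E(x_i x_i')

% Substituting gives the usual OLS formula, with E(e_i^2) playing the role of \sigma^2
\operatorname{Avar}\big(\sqrt{n}\,(\hat\beta-\beta)\big)
  = E(e_i^2)\, E(x_i x_i')^{-1}
```

The final term $E(e_i^2)$ is what the mean of the squared residuals estimates.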

15
A Regression-Based Approach
  • Have to interpret residual variance differently:
    not common to all individuals but the mean across
    individuals
  • With one regressor can write robust standard
    error as
  • Simple to use in practice, e.g. in Stata
  • . reg y x, robust
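
The link between the statistics-course calculation and the regression-based robust standard error can be illustrated on simulated data. This is a sketch under stated assumptions: the data-generating process and group sizes are made up, and the robust formula coded here is the plain White ("HC0") version for one regressor. Stata's robust option is understood to apply an additional finite-sample correction, so its reported value would differ slightly.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical data: treatment shifts the mean AND triples the error spread,
# so the errors are heteroskedastic by construction.
n = 2000
x = [1 if i < 500 else 0 for i in range(n)]   # unequal group sizes
y = [1.0 + 2.0 * xi + random.gauss(0, 3.0 if xi else 1.0) for xi in x]

treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]
n1, n0 = len(treated), len(control)

# Statistics-course approach: add the variances of the two group means.
se_stats = math.sqrt(statistics.pvariance(treated) / n1
                     + statistics.pvariance(control) / n0)

# Regression approach: OLS residuals, then the sandwich formula for the slope.
x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

# Heteroskedasticity-robust (White, HC0) standard error of the slope.
se_robust = math.sqrt(sum(((xi - x_bar) * ei) ** 2
                          for xi, ei in zip(x, resid))) / sxx

# Conventional OLS standard error, which assumes a common error variance.
s2 = sum(ei ** 2 for ei in resid) / (n - 2)
se_ols = math.sqrt(s2 / sxx)

print(se_stats, se_robust, se_ols)
```

In this design the small treatment group has the large variance, so the conventional formula understates the standard error while the first two numbers coincide.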

16
Bertrand/Mullainathan: Basic Results
17
Summary So Far
  • Econometrics very easy if all data comes from
    randomized controlled experiment
  • Just need to collect data on treatment/control
    and outcome variables
  • Just need to compare means of outcomes of
    treatment and control groups
  • Is data on other variables of any use at all?
  • Not necessary but useful

18
Including Other Regressors
  • Can get consistent estimate of treatment effect
    without worrying about other variables
  • Reason is that randomization ensures no problem
    of omitted variables bias
  • But there are reasons to include other
    regressors
      • Improved efficiency
      • Check for randomization
      • Improve randomization
      • Control for conditional randomization
      • Heterogeneity in treatment effects

19
The Uses of Other Regressors I: Improved
Efficiency
  • Don't just want consistent estimate of causal
    effect, also want low standard error (or high
    precision or efficiency).
  • Standard formula for variance of OLS
    estimate of $\beta$ is $\sigma^2 (X'X)^{-1}$; standard errors are
    square roots of its diagonal
  • $\sigma^2$ comes from variance of residual in regression:
    $(1-R^2)\,Var(y)$

20
Proposition 2.4: The asymptotic variance of $\hat\beta$ is
lower when W is included
  • Proof: (Will only do case where X and W are
    one-dimensional)
  • When W is included variance of the estimate of
    the treatment effect will be first diagonal
    element of

21
Proof (continued)
  • Now
  • Using trick from end of notes on causal effects
    we can write this as

22
Proof (continued)
  • Inverting leads to
  • By randomization X and W are independent so
  • The only difference is in the error variance:
    this must be smaller when W is included as $R^2$
    rises
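
The efficiency gain can be seen in a small simulation. This is a sketch under assumed numbers (true effect 2.0, a covariate W with coefficient 3.0); the helper `simple_ols` is hypothetical, and the regression with W is computed by partialling W out of y and X (Frisch-Waugh-Lovell), which reproduces the coefficient on X from the regression of y on X and W.

```python
import math
import random
import statistics

random.seed(2)

def simple_ols(x, y):
    """Return (intercept, slope, residuals) from regressing y on x."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return intercept, slope, resid

# Hypothetical experiment: X randomized, W a pre-treatment covariate that
# strongly predicts y. Assumed true treatment effect: 2.0.
n = 2000
x = [random.randint(0, 1) for _ in range(n)]
w = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + 3.0 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

# Short regression: y on X only.
_, b_short, e_short = simple_ols(x, y)
x_bar = statistics.fmean(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
se_short = math.sqrt(sum(e ** 2 for e in e_short) / (n - 2) / sxx)

# Long regression: y on X and W, via residual-on-residual regression.
_, _, y_tilde = simple_ols(w, y)
_, _, x_tilde = simple_ols(w, x)
_, b_long, e_long = simple_ols(x_tilde, y_tilde)
se_long = math.sqrt(sum(e ** 2 for e in e_long) / (n - 3)
                    / sum(xt ** 2 for xt in x_tilde))

print(b_short, se_short)   # consistent, but noisy
print(b_long, se_long)     # consistent, with a smaller standard error
```

Both coefficients estimate the same effect; only the residual variance, and hence the standard error, changes when W is added.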

23
The Uses of Other Regressors II: Check for
Randomization
  • Randomization can go wrong
      • Poor implementation of research design
      • Bad luck
  • If randomization done well then W should be
    independent of X: this is testable
  • Test for differences in W in treatment/control
    groups
  • Probit model for X on W or regress W on X.
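
A balance check of this kind might look as follows. The sketch compares the mean of W across groups with a two-sample t-statistic, which for a binary X is closely related to the t-test on X in a regression of W on X. The function `balance_t_stat` and both assignment rules are hypothetical.

```python
import math
import random
import statistics

random.seed(3)

def balance_t_stat(x, w):
    """t-statistic for the difference in mean W between treatment and control."""
    w1 = [wi for xi, wi in zip(x, w) if xi == 1]
    w0 = [wi for xi, wi in zip(x, w) if xi == 0]
    diff = statistics.fmean(w1) - statistics.fmean(w0)
    se = math.sqrt(statistics.variance(w1) / len(w1)
                   + statistics.variance(w0) / len(w0))
    return diff / se

n = 2000
w = [random.gauss(0, 1) for _ in range(n)]   # hypothetical pre-treatment covariate

# Proper randomization: a coin flip that ignores W.
x_good = [random.randint(0, 1) for _ in range(n)]

# Poor implementation: units with high W are more likely to be treated.
x_bad = [1 if wi + random.gauss(0, 1) > 0 else 0 for wi in w]

t_good = balance_t_stat(x_good, w)
t_bad = balance_t_stat(x_bad, w)
print(t_good)   # should usually be small in absolute value
print(t_bad)    # large: W differs systematically across groups
```

A small t-statistic does not prove the randomization was sound, but a large one is a warning sign about implementation or luck.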

24
The Uses of Other Regressors III: Improve
Randomization
  • Can also use W at stage of assigning treatment
  • Can guarantee that in your sample X and W are
    independent instead of it being just
    probabilistic
  • This is what Bertrand/Mullainathan do when
    assigning names to CVs

25
The Uses of Other Regressors IV: Adjust for
Conditional Randomization
  • This is case where must include W to get
    consistent estimates of treatment effects
  • Conditional randomization is where probability of
    treatment is different for people with different
    values of W, but random conditional on W
  • Why have conditional randomization?
  • May have no choice
  • May want to do it (c.f. stratification)

26
An Example: Project STAR
  • Allocation of students to classes is random
    within schools
  • But small number of classes per school
  • This leads to following relationship between
    probability of treatment and number of kids in
    school

27
Controlling for Conditional Randomization
  • X can now be correlated with W
  • But, conditional on W, X independent of other
    factors
  • But must get functional form of relationship
    between y and W correct (cf. matching procedures)
  • This is not the case with (unconditional)
    randomization: see class exercise
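
The need to control for W under conditional randomization can be illustrated with a simulated two-stratum design. Everything here is an assumption made for the sketch: a binary W, treatment probabilities of 0.2 and 0.8, and a true effect of 2.0. Because W is binary, comparing within strata sidesteps the functional-form issue.

```python
import random
import statistics

random.seed(4)

# Hypothetical design: W is a binary stratum. Treatment probability is 0.2
# when W = 0 and 0.8 when W = 1, but assignment is random within stratum.
# Assumed true treatment effect: 2.0; W itself raises y by 3.0.
n = 4000
w = [random.randint(0, 1) for _ in range(n)]
x = [1 if random.random() < (0.8 if wi else 0.2) else 0 for wi in w]
y = [1.0 + 2.0 * xi + 3.0 * wi + random.gauss(0, 1) for xi, wi in zip(x, w)]

def diff_in_means(rows):
    """Treated-minus-control mean outcome for a list of (x, y) pairs."""
    treated = [yi for xi, yi in rows if xi == 1]
    control = [yi for xi, yi in rows if xi == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

# Ignoring W: biased, because treated units are disproportionately W = 1.
naive = diff_in_means(list(zip(x, y)))

# Conditioning on W: compare within each stratum, then average the
# within-stratum effects using stratum shares as weights.
adjusted = 0.0
for stratum in (0, 1):
    rows = [(xi, yi) for xi, yi, wi in zip(x, y, w) if wi == stratum]
    adjusted += diff_in_means(rows) * len(rows) / n

print(naive)      # well above 2.0
print(adjusted)   # close to 2.0
```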

28
Heterogeneity in Treatment Effects
  • So far have assumed causal (treatment) effect the
    same for everyone
  • No good reason to believe this
  • Start with case of no other regressors
  • $y_i = \beta_0 + \beta_{1i} X_i + e_i$
  • Random assignment implies X independent of $\beta_{1i}$
  • Sometimes called random coefficients model

29
What treatment effect to estimate?
  • Would like to estimate causal effect for everyone:
    this is not possible
  • Can only hope to estimate some average
  • Average treatment effect: $ATE = E(\beta_{1i})$

30
Proposition 2.5: OLS estimates ATE
  • Proof for single regressor
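
A sketch of how the single-regressor argument could go, in assumed notation where $\bar\beta_1 = E(\beta_{1i})$ denotes the average treatment effect and $v_i$ is a composite error introduced for illustration:

```latex
% Random-coefficients model rewritten around the average effect
y_i = \beta_0 + \beta_{1i} X_i + e_i
    = \beta_0 + \bar\beta_1 X_i + v_i ,
\qquad v_i = (\beta_{1i} - \bar\beta_1) X_i + e_i

% The OLS slope converges to Cov(y, X) / Var(X)
\operatorname{plim} \hat\beta_1
  = \bar\beta_1 + \frac{\operatorname{Cov}(X_i, v_i)}{\operatorname{Var}(X_i)}

% Random assignment: X_i is independent of \beta_{1i} and of e_i
\operatorname{Cov}(X_i, v_i)
  = E(\beta_{1i} - \bar\beta_1)\, E\big[ X_i (X_i - E X_i) \big]
    + \operatorname{Cov}(X_i, e_i) = 0

% Hence
\operatorname{plim} \hat\beta_1 = \bar\beta_1 = E(\beta_{1i})
```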

31
Observable Heterogeneity
  • Potential outcomes notation
  • Outcome if in control group
  • $y_{0i} = \gamma_0 W_i + u_{0i}$
  • Outcome if in treatment group
  • $y_{1i} = \gamma_1 W_i + u_{1i}$
  • Treatment effect is $(y_{1i} - y_{0i})$ and can be written
    as
  • $(y_{1i} - y_{0i}) = (\gamma_1 - \gamma_0) W_i + (u_{1i} - u_{0i})$
  • Note treatment effect has observable and
    unobservable component
  • Can estimate as
      • Two separate equations
      • One single equation

32
Combining treatment and control groups into
single regression
  • We can write
  • $y_i = X_i y_{1i} + (1 - X_i) y_{0i} = y_{0i} + X_i (y_{1i} - y_{0i})$
  • Combining outcomes equations leads to
  • $y_i = \gamma_0 W_i + (\gamma_1 - \gamma_0) X_i W_i + u_{0i} + X_i (u_{1i} - u_{0i})$
  • Regression includes W and interactions of W with
    X: these are observable part of treatment effect
  • Note error likely to be heteroskedastic

33
Bertrand/Mullainathan
  • Different treatment effect for high and low
    quality CVs

34
Units of Measurement
  • Causal effect measured in units of experiment:
    not very helpful
  • Often want to convert causal effects to more
    meaningful units, e.g. in Project STAR: what is
    effect of reducing class size by one child?

35
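Simple estimator of this would be
  • $\dfrac{\bar{y}_{X=1} - \bar{y}_{X=0}}{\bar{S}_{X=1} - \bar{S}_{X=0}}$
  • where S is class size and bars denote sample means within each group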
  • Takes the treatment effect on outcome variable
    and divides by treatment effect on class size
  • Not hard to compute but how to get standard
    error?

36
IV Can Do the Job
  • Can't run regression of y on S: S influenced by
    factors other than treatment status
  • But X is
      • Correlated with S
      • Uncorrelated with unobserved stuff (because of
        randomization)
  • Hence X can be used as an instrument for S
  • IV estimator has form (just-identified case)

37
The Wald Estimator
  • This will give estimate of standard error of
    treatment effect
  • Where instrument is binary and no other
    regressors included, the IV estimate of slope
    coefficient can be shown to be
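
The claim that the just-identified IV slope collapses to a ratio of differences in means can be checked numerically. This is a sketch on invented class-size data: the coefficients, the unobserved factor `u` and the helper names are all assumptions, and the IV slope is computed as the ratio of sample covariances, which is the just-identified IV estimator with a constant.

```python
import random
import statistics

random.seed(5)

def cov(a, b):
    """Sample covariance (divisor n; the divisor cancels in the ratios below)."""
    a_bar, b_bar = statistics.fmean(a), statistics.fmean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / len(a)

# Hypothetical class-size experiment. X = 1 assigns a pupil to a "small class"
# arm; actual class size S also depends on an unobserved factor u that
# affects test scores y directly. Assumed true effect of one extra pupil: -0.5.
n = 4000
x = [random.randint(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
s = [22 - 7 * xi + 2 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]
y = [60 - 0.5 * si - 3 * ui + random.gauss(0, 1) for si, ui in zip(s, u)]

def mean_by_arm(v, arm):
    """Mean of v among units with X == arm."""
    return statistics.fmean(vi for vi, xi in zip(v, x) if xi == arm)

# Wald estimator: treatment effect on y divided by treatment effect on S.
wald = ((mean_by_arm(y, 1) - mean_by_arm(y, 0))
        / (mean_by_arm(s, 1) - mean_by_arm(s, 0)))

# Just-identified IV slope with X as the instrument for S.
iv = cov(x, y) / cov(x, s)

# OLS of y on S, which also picks up the unobserved factor u.
ols = cov(s, y) / cov(s, s)

print(wald, iv)   # identical up to rounding
print(ols)        # pulled away from -0.5 by the confounder
```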

38
Partial Compliance
  • So far
      • In control group implies no treatment
      • In treatment group implies get treatment
  • Often things are not as clean as this
      • Treatment is an opportunity
      • Close substitutes available to those in control
        group
      • Implementation not perfect e.g. pushy parents

39
An Example: Moving to Opportunity
  • Designed to investigate the impact of living in
    bad neighbourhoods on outcomes
  • Gave some residents of public housing projects
    chance to move out
  • Two treatments
      • Voucher for private rental housing
      • Voucher for private rental housing restricted for
        use in good neighbourhoods
  • No-one forced to move so imperfect compliance:
    60% and 40% did use it

40
Some Terminology
  • Z denotes whether in control or treatment group:
    intention-to-treat
  • X denotes whether actually get treatment
  • With perfect compliance
  • $\Pr(X=1 \mid Z=1) = 1$
  • $\Pr(X=1 \mid Z=0) = 0$
  • With imperfect compliance
  • $1 > \Pr(X=1 \mid Z=1) > \Pr(X=1 \mid Z=0) > 0$

41
What Do We Want to Estimate?
  • Intention-to-Treat
  • $ITT = E(y \mid Z=1) - E(y \mid Z=0)$
  • This can be estimated in usual way
  • Treatment Effect on Treated
  • $TOT = \dfrac{E(y \mid Z=1) - E(y \mid Z=0)}{\Pr(X=1 \mid Z=1) - \Pr(X=1 \mid Z=0)}$

42
Estimating TOT
  • Can't use simple regression of y on Z
  • But should recognize TOT as Wald estimator
  • Can be estimated by regressing y on X using Z as
    instrument
  • Relationship between TOT and ITT
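
The relationship between the two quantities can be sketched with simulated voucher data. The take-up rate of 50%, the homogeneous effect of 2.0 and the helper `mean_where` are assumptions for illustration, not figures from any study.

```python
import random
import statistics

random.seed(6)

# Hypothetical voucher experiment. Z = 1 means offered a voucher. Only about
# half of those offered use it (X = 1); nobody in the control group can.
# Assumed effect of actually using the voucher: 2.0 for everyone.
n = 10000
z = [random.randint(0, 1) for _ in range(n)]
x = [1 if zi == 1 and random.random() < 0.5 else 0 for zi in z]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

def mean_where(v, arm):
    """Mean of v among units with Z == arm."""
    return statistics.fmean(vi for vi, zi in zip(v, z) if zi == arm)

# Intention-to-treat effect: compare outcomes by assignment, not by take-up.
itt = mean_where(y, 1) - mean_where(y, 0)

# Difference in take-up rates between the two arms.
take_up_gap = mean_where(x, 1) - mean_where(x, 0)

# Wald / IV estimate of the effect of treatment itself.
tot = itt / take_up_gap

print(itt, take_up_gap, tot)
```

With take-up near one half, the estimate of the effect of treatment comes out at roughly twice the intention-to-treat effect.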

43
Most Important Results from MTO
  • No effects on adult economic outcomes
  • Improvements in adult mental health
  • Beneficial outcomes for teenage girls
  • Adverse outcomes for teenage boys

44
Sample results from MTO
  • TOT approximately twice the size of ITT
  • Consistent with 50% use of vouchers

45
IV with Heterogeneous Treatment Effects
  • If treatment effect same for everyone then TOT
    recovers this (obvious)
  • But what if treatment effect heterogeneous?
  • No simple answer to this question
  • Suppose model for treatment effect is

46
Proposition 2.6: The IV estimate for the
heterogeneous treatment case is a consistent
estimate of $E(\pi_i \beta_{1i}) / E(\pi_i)$, where $\pi_i$ is the difference in the
probability of treatment for individual i when in
treatment and control group
47
Proof
  • Model for effect of intention to treat on being
    treated

48
Proof (continued)
  • Can write reduced-form as
  • Wald estimator then becomes
  • As

49
Hence Wald estimator can be thought of as
estimator of
  • This is weighted average of treatment effects
  • Weights will vary with instrument: contrast
    with heterogeneous treatment case
  • Some cases in which can interpret IV estimate as
    ATE

50
Proposition 2.7: IV estimate is ATE if (a) no
heterogeneity in treatment effect, or (b) $\beta_{1i}$
uncorrelated with $\pi_i$
  • Proof
  • A. This should be obvious as
  • B. Can write as

51
How will IV estimate differ from ATE?
  • Previous formula says depends on covariance of
    $\beta_{1i}$ and $\pi_i$
  • In some situations can sign but not always
  • Example 1: no-one gets treatment in the absence
    of the programme so
  • If those who get treatment when in the treatment
    group are those with the highest returns then
  • IV > ATE

52
  • Example 2: treatment is voluntary for those in
    the control group but compulsory for those in the
    treatment group
  • This implies
  • If those who get treatment in control are those
    with highest returns then
  • IV < ATE

53
Angrist/Imbens Monotonicity Assumption
  • Case where IV estimate is not ATE
  • Assume that everyone moved in same direction by
    treatment: monotonicity assumption
  • Then can show that IV is average of treatment
    effect for those whose behaviour changed by being
    in treatment group
  • They call this the Local Average Treatment Effect
    (LATE)
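
The local average treatment effect can be illustrated with a simulation in which monotonicity holds by construction. The set-up is hypothetical and mirrors Example 1: nobody is treated without the offer, and only individuals with above-average returns take it up.

```python
import random
import statistics

random.seed(7)

# Hypothetical setting: nobody is treated without the programme, and those
# who take it up when offered are those with the highest returns.
n = 20000
beta = [random.gauss(1.0, 1.0) for _ in range(n)]   # individual treatment effects
complier = [b > 1.0 for b in beta]                  # takes treatment only if offered
z = [random.randint(0, 1) for _ in range(n)]        # randomized offer
x = [1 if zi == 1 and ci else 0 for zi, ci in zip(z, complier)]
y = [bi * xi + random.gauss(0, 1) for bi, xi in zip(beta, x)]

def mean_where(v, arm):
    """Mean of v among units with Z == arm."""
    return statistics.fmean(vi for vi, zi in zip(v, z) if zi == arm)

# Wald / IV estimate using the offer Z as the instrument for treatment X.
wald = ((mean_where(y, 1) - mean_where(y, 0))
        / (mean_where(x, 1) - mean_where(x, 0)))

ate = statistics.fmean(beta)                                      # everyone
late = statistics.fmean(b for b, c in zip(beta, complier) if c)   # compliers only

print(wald, late, ate)
```

The IV estimate tracks the average effect among those whose behaviour is changed by the offer, and in this design it lies well above the ATE.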

54
Spill-overs/ Externalities /General Equilibrium
Effects
  • Have assumed that treatment only affects outcome
    for person who receives it
  • Many situations in which this is not true
  • E.g. externalities, spill-overs, effects on
    market prices
  • Example: Miguel and Kremer, "Worms: Identifying
    Impacts on Education and Health in the Presence
    of Treatment Externalities", Econometrica, 2004

55
Background
  • Infection from intestinal worms is rife among
    Kenyan schoolchildren
  • Major cause of school absence
  • Leads to lower human capital accumulation, lower
    growth?
  • Investigation of effectiveness of anti-worming
    drugs on health, education

56
Existing studies
  • Randomize drug treatment within schools
  • But probability of re-infection affected by
    infection rate among contacts, i.e. externalities
    very likely
  • This research design will not capture these
    effects
  • To see this, consider model

57
Miguel/Kremer Methodology
  • Existing methodology cannot measure externality,
    only individual effect
  • Randomize treatment across schools not
    individuals
  • This can identify $\beta_1 + \beta_2$
  • Could have had design in which randomized
    proportion of individuals within schools getting
    treatment
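
The difference between the two designs can be sketched in a simulation. The outcome model below is purely hypothetical: it assumes a direct effect of a pupil's own treatment (`BETA_DIRECT`) and a spillover from the treated share in the pupil's school (`BETA_SPILLOVER`), with invented values for both.

```python
import random
import statistics

random.seed(8)

BETA_DIRECT = 1.0       # assumed effect of a pupil's own treatment
BETA_SPILLOVER = 0.5    # assumed effect of the treated share in the pupil's school
N_SCHOOLS, PUPILS = 200, 50

def outcome(own_treatment, treated_share):
    """Hypothetical outcome model with a within-school externality."""
    return (BETA_DIRECT * own_treatment
            + BETA_SPILLOVER * treated_share
            + random.gauss(0, 1))

def diff_in_means(pairs):
    """Treated-minus-control mean outcome for a list of (x, y) pairs."""
    treated = [y for x, y in pairs if x == 1]
    control = [y for x, y in pairs if x == 0]
    return statistics.fmean(treated) - statistics.fmean(control)

# Design A: randomize pupils within every school, treating exactly half.
within = []
for _ in range(N_SCHOOLS):
    assignment = [1] * (PUPILS // 2) + [0] * (PUPILS // 2)
    random.shuffle(assignment)
    within += [(x, outcome(x, 0.5)) for x in assignment]

# Design B: randomize whole schools; every pupil in a treated school is treated.
across = []
for _ in range(N_SCHOOLS):
    school_treated = random.randint(0, 1)
    across += [(school_treated, outcome(school_treated, float(school_treated)))
               for _ in range(PUPILS)]

est_within = diff_in_means(within)
est_across = diff_in_means(across)
print(est_within)   # near BETA_DIRECT: controls share the spillover
print(est_across)   # near BETA_DIRECT + BETA_SPILLOVER
```

Under within-school randomization the control pupils benefit from the same spillover as the treated, so it differences out; school-level randomization recovers the sum of the two effects but cannot separate them.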

58
Typical Result
  • Cannot separate externality from direct effect
    but this is important for public policy
  • Have non-experimental approach to this using
    fact that not all kids from same village go to
    same school
  • This gives variation in X

59
Some examples of how they do this
  • Include number of kids in local area who are in
    treatment schools

60
Problems with Experiments
  • Expense
  • Ethical Issues
  • Threats to Internal Validity
      • Failure to follow experiment
      • Experimental effects (Hawthorne effects)
  • Threats to External Validity
      • Non-representative programme
      • Non-representative sample
      • Scale effects
61
Conclusions on Experiments
  • Are gold standard of empirical research
  • Are becoming more common
  • Not enough of them to keep us busy
  • Study of non-experimental data can deliver useful
    knowledge
  • Some issues similar, others different