Title: ENDOGENEITY - SIMULTANEITY
1. ENDOGENEITY - SIMULTANEITY
2. What is endogeneity and why do we not like it?
REPETITION
- Three causes
  - X influences Y, but Y reinforces X too
  - Z causes both X and Y fairly contemporaneously
  - X causes Y, but we cannot observe X, and Z (which we observe) is influenced by X but also by Y
- Consequences
  - No matter how many observations we have, the estimators remain biased (this is called inconsistency)
  - Ergo, whatever point estimates we find, we cannot even tell whether they are positive, negative or significant, because we do not know the size of the bias and have no way to estimate it
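The first cause (X influences Y, but Y reinforces X too) can be illustrated with a small simulation. This is only a sketch under assumed values: the structural coefficients `beta` and `gamma`, the unit-variance normal shocks and the helper names are all hypothetical and do not come from the slides. With these values the OLS slope settles near 0.8 instead of the true 0.5, however large the sample.

```python
import numpy as np

def ols_slope(x, y):
    """OLS slope of y on x (regression with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def simulate_slope(n, beta=0.5, gamma=0.5, seed=0):
    """Draw n observations from the simultaneous system
         y = beta * x + u      (the effect we care about)
         x = gamma * y + v     (the feedback from y to x)
    and return the OLS slope of y on x. All parameter values are assumed."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    denom = 1.0 - beta * gamma
    # Reduced form: solve the two equations for x and y.
    x = (gamma * u + v) / denom
    y = (beta * v + u) / denom
    return ols_slope(x, y)

# The estimate does not approach beta = 0.5 as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(simulate_slope(n), 3))
```

With unit-variance shocks the probability limit of the slope is (gamma + beta) / (gamma^2 + 1) = 0.8, so more data only makes the wrong answer more precise.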
3. The magic of ceteris paribus
- Each regression is actually ceteris paribus
- Problem: data may be at odds with ceteris paribus
- Examples?
4. Problems with Inferring Causal Effects from Regressions
- Regressions tell us about correlations, but correlation is not causation
- Example: regression of whether a person currently has a health problem on whether they have been in hospital in the past year

  HEALTHPROB |     Coef.   Std. Err.       t
  -----------+-------------------------------
     PATIENT |   .262982    .0095126   27.65
       _cons |   .153447     .003092   49.63

- Do hospitals make you sick? That would be a causal effect
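A simulated version of the hospital example shows how such a positive coefficient can arise without hospitals harming anyone. Everything here is assumed for illustration: the unobserved `frailty` variable, the thresholds, and a hospital stay that actually reduces health problems. The numbers are not meant to reproduce the table above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

frailty = rng.normal(size=n)                    # unobserved confounder (assumed)
patient = frailty + rng.normal(size=n) > 1.0    # frail people end up in hospital
shock = rng.normal(size=n)

# Potential outcomes: by construction a hospital stay LOWERS the
# latent health-problem index by 0.5 for everybody.
problem_if_hospital = frailty - 0.5 + shock > 0.5
problem_if_not = frailty + shock > 0.5
healthprob = np.where(patient, problem_if_hospital, problem_if_not)

# With a single binary regressor, the OLS slope equals the difference in means.
naive_coef = healthprob[patient].mean() - healthprob[~patient].mean()
true_effect = problem_if_hospital.mean() - problem_if_not.mean()

print("naive regression coefficient:", round(naive_coef, 3))
print("true average effect:         ", round(true_effect, 3))
```

The naive coefficient comes out positive while the true effect is negative: the regression picks up who goes to hospital, not what the hospital does.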
5. The problem in causal inference in case of simultaneity
[Diagram: Confounding Influence, Treatment, Outcome]
6. Any solutions?
[Diagram: Confounding Influence, Treatment, Outcome]
7. Instrumental Variables solution
[Diagram: Confounding Influence, Treatment, Outcome, Instrumental Variable(s)]
8. Fixed Effects Solution (DiD does pretty much the same)
[Diagram: Fixed Influences, Confounding Influence, Treatment, Outcome]
9. Short motivating story: ALMPs in Poland
- Basic statement: 50% of the unemployed have found employment because of ALMPs
- Facts
  - 50% of whom? Only of those who were treated (only those were monitored)
  - Only 90% of the treated completed the programmes
  - Of those who completed, indeed 50% work, but only 60% of those who work say it was because of the programme
- So how many are actually employed because of the programme?
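A back-of-the-envelope answer simply chains the three shares. This takes the self-reports at face value and assumes that those who did not complete the programme gained nothing from it; both are assumptions made for the sake of the arithmetic, and the result says nothing about the untreated.

```python
completed_share = 0.90    # share of the treated who completed the programme
employed_share = 0.50     # share of completers who work
attributed_share = 0.60   # share of working completers who credit the programme

share_of_treated = completed_share * employed_share * attributed_share
print(f"{share_of_treated:.0%} of the treated")   # prints "27% of the treated"
```

So the headline 50% shrinks to roughly 27% of the treated, and even that figure is not a causal effect, because nobody knows how many of them would have found work anyway.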
10. Short motivating story: ALMPs in Poland
11. Basic problems in causal inference
- Compare somebody before and after
- If they were already different before, the difference will be wrongly attributed to the treatment
  - Can we measure/capture this inherent difference?
  - Does it stay unchanged before and after?
  - What if we only observe the "after"?
- If the difference stays the same -> DiD estimator -> this is an assumption that cannot be tested
- If the difference cannot be believed to stay the same?
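The case where the inherent difference stays the same can be sketched on simulated data. The group gap, the common trend and the treatment effect below are assumed values chosen for illustration only; the point is that the DiD estimator removes a constant gap, while the two simpler comparisons do not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
group_gap, common_trend, effect = 2.0, 1.0, 0.5   # assumed values

# Outcomes for both groups in both periods; the gap between groups is constant.
control_before = rng.normal(0.0, 1.0, n)
control_after = rng.normal(common_trend, 1.0, n)
treated_before = rng.normal(group_gap, 1.0, n)
treated_after = rng.normal(group_gap + common_trend + effect, 1.0, n)

# Comparing groups after treatment mixes the effect with the inherent gap.
after_only = treated_after.mean() - control_after.mean()
# Comparing the treated before and after mixes the effect with the trend.
before_after = treated_after.mean() - treated_before.mean()
# DiD subtracts the change in the control group from the change in the treated.
did = before_after - (control_after.mean() - control_before.mean())

print(round(after_only, 2), round(before_after, 2), round(did, 2))
```

If the gap between the groups changed over time, the last line would be biased as well, which is exactly the assumption that cannot be tested.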
12. Faked counterfactual, or generating a parallel world
- MEDICINE uses control groups: people who are just as sick, but who get a different treatment or a placebo -> experimenting
- What if an experiment is impossible?
13. What if an experiment is impossible?
Only cross-sectional data
Panel data
Instrumental variables
Propensity Score Matching DiD
Before After Estimators
Propensity Score Matching
Difference in Difference Estimators (DiD)
Regression Discontinuity Design
14. Propensity Score Matching
[Diagram: Confounding Influence, Treatment (shown twice), Outcome]
15. Propensity score matching

  Group              | $Y_1$                           | $Y_0$
  Treated ($D=1$)    | observed                        | counterfactual (does not exist)
  Nontreated ($D=0$) | counterfactual (does not exist) | observed

- Average treatment effect
  - $E(\Delta Y) = E(Y_1 - Y_0) = E(Y_1) - E(Y_0)$
- Average treatment effect for the untreated
  - $E(Y_1 - Y_0 \mid D=0) = E(Y_1 \mid D=0) - E(Y_0 \mid D=0)$
- Average treatment effect for the treated (ATT)
  - $E(Y_1 - Y_0 \mid D=1) = E(Y_1 \mid D=1) - E(Y_0 \mid D=1)$
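In a simulation both potential outcomes exist for every unit, so the three effects above can be computed directly and compared with the naive difference in observed means. The data-generating process (an `ability` variable that drives both the outcome and selection into treatment, and an effect that grows with ability) is assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

ability = rng.normal(size=n)
y0 = ability + rng.normal(size=n)          # outcome without treatment
y1 = y0 + 1.0 + 0.5 * ability              # outcome with treatment (assumed effect)
d = ability + rng.normal(size=n) > 0       # abler units select into treatment

ate = (y1 - y0).mean()                     # E(Y1 - Y0)
att = (y1 - y0)[d].mean()                  # E(Y1 - Y0 | D=1)
atu = (y1 - y0)[~d].mean()                 # E(Y1 - Y0 | D=0)

# What real data allow: observed Y1 for the treated, observed Y0 for the rest.
naive = y1[d].mean() - y0[~d].mean()

print(round(ate, 2), round(att, 2), round(atu, 2), round(naive, 2))
```

The naive difference exceeds all three effects here because it replaces the missing counterfactual $E(Y_0 \mid D=1)$ with $E(Y_0 \mid D=0)$, which is lower when the treated are abler to begin with.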
16. Propensity Score Matching
- Idea
  - Compares outcomes of similar units where the only difference is treatment; discards the rest
- Example
  - Low-ability students will have lower future achievement, and are also more likely to be retained in class
  - Naïve comparison of untreated/treated students creates bias, where the untreated do better in the post period
  - Matching methods make the proper comparison
- Problems
  - If similar units do not exist, this estimator cannot be used
17. How to get the PSM estimator?
- First stage: regress treatment on observable characteristics
- Second stage: estimate the probability of treatment (the propensity score)
- Third stage: compare outcomes of the treated with those of similar non-treated units (statistical twins)
  - The less similar they are, the less they should be compared with one another
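The three stages can be sketched end to end with NumPy alone. This is a minimal illustration, not a reference implementation: the logit fitted by Newton-Raphson, the nearest-neighbour matching with replacement, the function names `fit_logit` and `psm_att`, and the simulated data with a true effect of 1.0 are all assumptions made for the example.

```python
import numpy as np

def fit_logit(X, d, n_iter=25):
    """Stage 1: logit of treatment on observables (X includes a constant)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (d - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)   # Newton-Raphson step
    return beta

def psm_att(y, d, X):
    """Stages 2 and 3: propensity scores, then compare each treated unit
    with its nearest non-treated 'statistical twin' (with replacement)."""
    pscore = 1.0 / (1.0 + np.exp(-X @ fit_logit(X, d)))
    treated = np.flatnonzero(d == 1)
    control = np.flatnonzero(d == 0)
    dist = np.abs(pscore[treated][:, None] - pscore[control][None, :])
    twins = control[dist.argmin(axis=1)]
    return np.mean(y[treated] - y[twins])

# Simulated data: x drives both treatment and outcome; true effect is 1.0.
rng = np.random.default_rng(3)
n = 4_000
x = rng.normal(size=n)
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 1.0 * d + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

naive = y[d == 1].mean() - y[d == 0].mean()
att = psm_att(y, d, X)
print("naive:", round(naive, 2), " matched:", round(att, 2))
```

The matched estimate lands close to the assumed effect of 1.0, while the naive difference in means is far above it because treated units have higher `x` on average.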
18. The obtained propensity score is irrelevant (as long as consistent)
- NEAREST NEIGHBOR (NN)
  - Pros -> so-called 1:1 matching
  - Cons -> if a 1:1 match does not exist, completely senseless
19. The obtained propensity score is irrelevant (as long as consistent)
- CALIPER/RADIUS MATCHING (NN)
  - Pros -> more flexible than NN
  - Cons -> who specifies the radius/caliper?
20. The obtained propensity score is irrelevant (as long as consistent)
- Stratification and Interval
  - Pros -> eliminates discretion in radius/caliper choice
  - Cons -> within a stratum/interval, units do not have to be similar
  - (some people say 10 strata is fine)
21. The obtained propensity score is irrelevant (as long as consistent)
- KERNEL MATCHING (KM)
  - Pros -> always uses all observations
  - Cons -> need to remember about common support
[Figure labels: Treatment, Control]
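Three of the matching variants above differ only in how they build the counterfactual outcome for a treated unit from the non-treated units. The sketch below makes that concrete for nearest neighbour, radius/caliper and kernel matching (stratification is left out). The function name, the Gaussian kernel, the default caliper and bandwidth, and the toy numbers are all assumptions for illustration.

```python
import numpy as np

def counterfactual(p_t, p_c, y_c, method="nn", caliper=0.05, bandwidth=0.05):
    """Estimated untreated outcome for one treated unit with pscore p_t,
    built from control pscores p_c and control outcomes y_c."""
    dist = np.abs(p_c - p_t)
    if method == "nn":
        # 1:1 matching: take the single closest control, however far it is.
        return y_c[np.argmin(dist)]
    if method == "radius":
        # Average all controls within the caliper; no match if there are none.
        inside = dist <= caliper
        return y_c[inside].mean() if inside.any() else np.nan
    if method == "kernel":
        # Use every control, weighted down with distance (Gaussian kernel).
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        return np.sum(w * y_c) / np.sum(w)
    raise ValueError(f"unknown method: {method}")

p_c = np.array([0.10, 0.30, 0.32, 0.90])   # control propensity scores
y_c = np.array([1.0, 2.0, 4.0, 10.0])      # control outcomes

for method in ("nn", "radius", "kernel"):
    print(method, counterfactual(0.29, p_c, y_c, method))

# A treated unit with no close control: NN still returns a (poor) match,
# radius matching refuses to match.
print(counterfactual(0.60, p_c, y_c, "nn"), counterfactual(0.60, p_c, y_c, "radius"))
```

The last line shows the trade-off from the slides: nearest neighbour always finds someone, even when nobody is similar, while the caliper version depends entirely on a radius that somebody has to choose.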
22. What is common support?
- Distributions of pscore may differ substantially across units
- Only sensible solutions!
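One simple way to impose common support is to keep only units whose propensity score lies in the range covered by both groups. The min/max rule below is one common convention, chosen here as an assumption; the slides do not say which rule is intended, and the function name and toy scores are hypothetical.

```python
import numpy as np

def common_support_mask(pscore, d):
    """True for units whose pscore lies in the overlap of the treated
    and non-treated pscore ranges (min/max rule)."""
    lo = max(pscore[d == 1].min(), pscore[d == 0].min())
    hi = min(pscore[d == 1].max(), pscore[d == 0].max())
    return (pscore >= lo) & (pscore <= hi)

pscore = np.array([0.05, 0.20, 0.40, 0.60, 0.30, 0.55, 0.80, 0.95])
d = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # first four are controls

mask = common_support_mask(pscore, d)
print(mask)   # only scores between 0.30 and 0.60 are kept
```

Units outside the overlap have no comparable counterpart in the other group, so any estimate for them would rest on extrapolation rather than on matching.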
23. Real world examples
24. Next week: practical exercise
- Read the papers posted on the web
- I will post one that we will replicate soon