Title: Regression Discontinuity
1Regression Discontinuity
2What is R.D.?
- Regression--the econometric/statistical tool
social scientists use to analyze multivariate
correlations
$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + e$$
Where Y is some sort of dependent variable,
alpha is a constant, the Xs are a bunch of
independent variables, the betas are
coefficients, and e is the error term.
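A minimal sketch of fitting that kind of regression, assuming made-up data: the sample size, the two Xs, and the "true" constant and coefficients are all invented for illustration. It only uses NumPy's least-squares solver rather than a stats package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: two independent variables and an error term.
X = rng.normal(size=(n, 2))
e = rng.normal(size=n)

# Made-up "true" constant and coefficients, chosen only for illustration.
alpha_true = 1.0
beta_true = np.array([2.0, -0.5])
Y = alpha_true + X @ beta_true + e

# OLS: put a column of ones (the constant) next to the Xs and solve least squares.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

print("estimated constant:", round(float(alpha_hat), 2))
print("estimated betas:", np.round(beta_hat, 2))
```

Because the simulated error term really is unrelated to the Xs here, the estimates should land close to the made-up true values.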
3Discontinuity
- Some sort of arbitrary jump/change thanks to a
quirk in law or nature.
- We're interested in the ones that make very
similar people get very dissimilar results.
4Discontinuity Examples
- PSAT/NMSQT
- Basically the top 16,000 or so test-takers become
National Merit Semifinalists, in the running for a
scholarship.
- A small difference in test score can mean a
discontinuous jump in scholarship amount.
5Discontinuity Examples
- School Class Size
- Maimonides' Rule--No more than 40 kids in a class
in Israel.
- 40 kids in school means 40 kids per class. 41
kids means two classes with 20 and 21.
- (Angrist & Lavy, QJE 1999)
6Discontinuity Examples
- Union Elections
- If employees want to unionize, the NLRB holds an
election. 50% means the employer doesn't have to
recognize the union, and 50% + 1 means the
employer is required to bargain in good faith
with the union.
- (DiNardo & Lee, QJE 2004)
7Discontinuity Examples
- U.S. House Elections
- Incumbency advantage. If you're first past the
post in the previous election, even by just one
vote, you get a huge advantage in the next
election.
- (David Lee, Journal of Econometrics 2007)
8Discontinuity Examples
- Air Pollution and Home Values
- The Clean Air Act's National Ambient Air Quality
Standards say if the geometric mean concentration
of 5 pollutant particulates is 75 micrograms per
cubic meter or greater, a county is classified as
non-attainment and is subject to much more
stringent regulation.
- (Ken Chay, Michael Greenstone, JPE 2005)
9Combine the R and the D
- Run a regression based on a situation where
you've got a discontinuity.
- Treat above-the-cutoff and below-the-cutoff like
the treatment and control groups from a
randomization.
10Why are we doing this?
- Why do we have to look for quirks like this?
Can't we just control for whatever we want using
OLS or some other line-fitting tool?
- Just get a bunch of people's salaries and PSAT
scores. PSATs are X, income is Y, run a
regression in SPSS/Stata, or heck, even Excel,
and we have causal inference, right? Higher test
scores cause people to earn more later in life.
11No.
- The statistical methods we use are based on a lot
of assumptions. Importantly, the error term
(which is really full of things we can't measure,
the unobservables) is supposed to be
uncorrelated with the Xs and normally
distributed.
- In reality, those conditions probably haven't been
met in any of the previous situations.
- For example, class size is probably correlated
with some type of neighborhood quality.
- Please turn to your neighbor and discuss what is
probably wrong with each of the previous 5
examples (PSAT, class size, union elections,
House elections, air pollution).
12No.
- The statistical methods we use are based on a lot
of assumptions. Importantly, the error term
(which is really full of things we can't measure,
the unobservables) is supposed to be
uncorrelated with the Xs and normally
distributed.
- In reality, those conditions probably haven't been
met in any of the previous situations.
- Higher PSAT kids might have higher ability.
- Crowded classrooms might be in poorer schools.
- Unionized workers might work for certain types of
firms.
- Incumbent politicians might be better. They won
before, didn't they?
- Pollution might be correlated with economic growth,
which could increase home values.
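To see why the naive regression misleads, here is a small simulation sketch for the PSAT case. Everything in it is an assumption for illustration: an unobserved `ability` drives both `score` and `income`, and the test score itself has zero causal effect on income by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000

# Unobservable: sits in the error term because we can't measure it.
ability = rng.normal(size=n)

# Test score depends on ability plus luck on test day.
score = ability + rng.normal(size=n)

# Income depends on ability only. The score has NO causal effect here.
income = 2.0 * ability + rng.normal(size=n)

# Naive OLS of income on score.
design = np.column_stack([np.ones(n), score])
coef, *_ = np.linalg.lstsq(design, income, rcond=None)
naive_slope = coef[1]

print("naive slope on score:", round(float(naive_slope), 2))
```

The slope comes out clearly positive even though the true causal effect is zero, because the error term (ability) is correlated with the X.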
13Controlling for everything?
- Focus on the Israeli schools for a second.
- We can try and control for neighborhood poverty
level.
- Does that solve the problem?
- No.
- If neighborhood poverty level is correlated with
the X of interest (class size), why would you
think it's safe to assume that the unobservables
aren't correlated? Have you really magically
controlled for every single thing that's
correlated with the X of interest? Probably not.
- So let's find a bandwidth in which these things
are uncorrelated.
14A Bandwidth of Randomness
- Test scores aren't random, and neither is class
size, nor air pollution.
- But is a kid in the 94.9th percentile really that
different from the 95th percentile kid?
- Is a school with 40 kids that different from a
school with 41?
- Right around the cutoff, there's a good chance
that which side you land on is basically random.
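A sketch of the bandwidth idea under assumed data: simulated test scores driven partly by an unobserved `ability`, a cutoff at the 95th percentile, and a hypothetical bandwidth `h` of 0.05 score units. It compares how different the two groups are in the whole sample versus right around the cutoff.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200000

ability = rng.normal(size=n)              # unobservable
score = ability + rng.normal(size=n)      # ability plus test-day luck

cutoff = np.quantile(score, 0.95)         # 95th percentile cutoff
above = score >= cutoff

# Whole sample: above-cutoff kids vs. everyone else.
gap_all = ability[above].mean() - ability[~above].mean()

# Narrow bandwidth: only kids within h of the cutoff.
h = 0.05
near = np.abs(score - cutoff) <= h
gap_near = ability[near & above].mean() - ability[near & ~above].mean()

print("ability gap, whole sample:", round(float(gap_all), 2))
print("ability gap, near cutoff: ", round(float(gap_near), 2))
```

In the whole sample the groups differ a lot on the unobservable; inside the narrow band the gap should shrink toward zero, which is the sense in which assignment there is close to random.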
15No Sorting - Observables
- But don't take my word for it. Look at the
averages of the observables in your below-cutoff
group, and the averages of the observables in the
above-cutoff group. Are they the same?
Hopefully, but maybe not.
- Do people know about this cutoff? Are they doing
some endogenous sorting? When deciding where to
live, did good moms look for schools where their
kids would be the 41st kid? Did certain types of
polluters look for counties where they'd be
below the cutoff?
- These things can be checked to some degree--look
at the average observables above and below the
cutoff.
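The observables check can be sketched as a difference in means with a t-statistic. This is a rough illustration, not a full balance table: `balance_check`, the simulated `enrollment` counts, and the `poverty` covariate are hypothetical names and data, and the bandwidth of 3 kids is arbitrary.

```python
import numpy as np

def balance_check(running, covariate, cutoff, bandwidth):
    """Difference in covariate means (above minus below) near the cutoff,
    plus an unequal-variance t-statistic for that difference."""
    near = np.abs(running - cutoff) <= bandwidth
    a = covariate[near & (running >= cutoff)]
    b = covariate[near & (running < cutoff)]
    diff = a.mean() - b.mean()
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return diff, diff / se

rng = np.random.default_rng(3)
n = 50000

# Hypothetical grade enrollment (20 to 60 kids) and a neighborhood poverty rate
# that, in this simulation, is unrelated to enrollment (no sorting).
enrollment = rng.integers(20, 61, size=n)
poverty = rng.normal(loc=0.3, scale=0.1, size=n)

diff, t_stat = balance_check(enrollment, poverty, cutoff=41, bandwidth=3)
print("difference in mean poverty:", round(float(diff), 4))
print("t-statistic:", round(float(t_stat), 2))
```

A small difference with a small t-statistic is what you hope to see. A large one suggests the two sides of the cutoff are not comparable on that observable.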
16No Sorting - Clumping
- In addition to checking the observables on either
side of the cutoff, we should check the density
of the distribution. Is it unusually low/high
right around the cutoff?
- If there's some abnormally large share of
people piled up just on one side of the cutoff, it's quite
possible that you don't have random assignment.
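A crude sketch of the clumping check: count observations in equal-width windows just below and just above the cutoff and ask whether the split is far from 50/50. This is a simplification that assumes a locally flat density, not a formal density test. `clumping_z`, the poverty-index scores, and the 30% cheating rate are all hypothetical.

```python
import numpy as np

def clumping_z(running, cutoff, width):
    """Counts just below/above the cutoff and a z-score for a 50/50 split."""
    below = int(np.sum((running >= cutoff - width) & (running < cutoff)))
    above = int(np.sum((running >= cutoff) & (running < cutoff + width)))
    total = below + above
    # With no sorting and a flat density, each observation is equally
    # likely to fall on either side: Binomial(total, 0.5).
    z = (below - total / 2) / np.sqrt(total / 4)
    return below, above, z

rng = np.random.default_rng(4)
cutoff = 50.0

# Honest scores on a hypothetical 0-100 poverty index (eligible if below 50).
honest = rng.uniform(0, 100, size=20000)

# Manipulated scores: 30% of those just above the cutoff get re-scored just below.
manipulated = honest.copy()
just_above = (manipulated >= cutoff) & (manipulated < cutoff + 5)
cheat = just_above & (rng.uniform(size=manipulated.size) < 0.3)
manipulated[cheat] = cutoff - rng.uniform(0.01, 2.0, size=int(cheat.sum()))

_, _, z_honest = clumping_z(honest, cutoff, width=5)
_, _, z_manip = clumping_z(manipulated, cutoff, width=5)
print("z-score, honest data:     ", round(float(z_honest), 2))
print("z-score, manipulated data:", round(float(z_manip), 2))
```

A z-score near zero is consistent with no sorting; a large positive one means too many people sit just below the cutoff.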
17No Sorting - Clumping
- Dude, you're totally cheating. Please stop.
- Emily Conover & Adriana Camacho, "Manipulation of
Social Program Eligibility"
18GSP--Multiple Analyses
- "Incentives to Learn," Ted Miguel, Michael
Kremer, Rebecca Thornton
- Girls' Scholarship Program, Busia, Kenya.
- Randomize holding a scholarship competition
across schools in Busia and Teso districts.
- Treatment: If a girl finishes in the top 15% in
her district on the end-of-year exam, she wins a
two-year scholarship.
- Randomization Analysis: Does attending a school
with the competition make you work harder/improve
schooling outcomes?
- RD Analysis: Does winning the award improve
schooling outcomes?
19P-900 in Chile
- "The Central Role of Noise in Evaluating
Interventions That Use Test Scores to Rank
Schools," Kenneth Y. Chay, Patrick J. McEwan,
Miguel Urquiola, AER 2005
- Mean Reversion: Sophomore Slump, SI Cover Curse,
Heisman Trophy Curse, Madden Curse, and the same
thing in the opposite direction.
20THIS IS THE MOST AMAZING THING EVER!
- HOLY CRAP! Look at the educational outcomes of
treatment schools in 1990, compared to those same
schools in 1988, before the program. AMAZING!
FANTABULOUS!
21Oh, wait.
- Hmm. That's kind of disappointing.
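The mean-reversion trap behind that before/after comparison can be illustrated with a simulation in which there is no program at all. The setup is an assumption, not the paper's data: each school has a stable `quality`, each year's score adds independent noise, and the bottom 20% of schools on the 1988 score are "selected."

```python
import numpy as np

rng = np.random.default_rng(5)
n_schools = 5000

# Stable school quality plus independent noise each year. No program exists.
quality = rng.normal(50, 5, size=n_schools)
score_1988 = quality + rng.normal(0, 5, size=n_schools)
score_1990 = quality + rng.normal(0, 5, size=n_schools)

# "Select" the lowest-scoring 20% of schools based on the noisy 1988 score.
cutoff = np.quantile(score_1988, 0.2)
selected = score_1988 < cutoff

gain = score_1990 - score_1988
gain_selected = gain[selected].mean()
gain_others = gain[~selected].mean()

print("average gain, selected schools:", round(float(gain_selected), 2))
print("average gain, other schools:   ", round(float(gain_others), 2))
```

The selected schools "improve" substantially with zero treatment, simply because some of them were picked for having an unlucky year.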
22So how do we actually do this?
- 1. Draw two pretty pictures:
- Eligibility criterion (test score, income, or
whatever) vs. Program Enrollment
- Eligibility criterion vs. Outcome
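A sketch of the numbers behind the two pictures: binned averages of enrollment and of the outcome, with bins that never straddle the cutoff. The helper `binned_means` and the simulated data (cutoff of 60, a true jump of 5) are hypothetical; the resulting arrays are what you would hand to whatever plotting tool you like.

```python
import numpy as np

def binned_means(running, values, cutoff, bin_width, n_bins):
    """Average `values` in n_bins bins on each side of the cutoff."""
    edges = cutoff + bin_width * np.arange(-n_bins, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    idx = np.digitize(running, edges) - 1   # bin index; out-of-range values fall outside 0..2*n_bins-1
    means = np.full(centers.size, np.nan)
    for b in range(centers.size):
        in_bin = idx == b
        if in_bin.any():
            means[b] = values[in_bin].mean()
    return centers, means

rng = np.random.default_rng(6)
n = 20000
cutoff = 60.0
score = rng.uniform(0, 100, size=n)                 # eligibility criterion
enrolled = (score >= cutoff).astype(float)          # everyone eligible enrolls
outcome = 10 + 0.2 * score + 5.0 * enrolled + rng.normal(0, 2, size=n)

# Picture 1: eligibility criterion vs. program enrollment.
centers, enroll_rate = binned_means(score, enrolled, cutoff, bin_width=2.0, n_bins=10)
# Picture 2: eligibility criterion vs. outcome.
_, outcome_mean = binned_means(score, outcome, cutoff, bin_width=2.0, n_bins=10)

for c, e, o in zip(centers, enroll_rate, outcome_mean):
    print(f"score ~{c:5.1f}  enrollment {e:4.2f}  outcome {o:6.2f}")
```

In this simulated case enrollment jumps from 0 to 1 at the cutoff and the outcome series shows a visible step at the same place.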
23So how do we actually do this?
- 2. Run a simple regression.
- (Yes, this is basically all we ever do, and the
stats programs we use can run the calculation in
almost any situation, but before we do it, it's
necessary to make sure the situation is
appropriate and draw the graphs so that we can
have confidence that our estimates are actually
causal.)
- Outcome as a function of test score (or
whatever), with a binary (1 if yes, 0 if no)
variable for program enrollment.
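The simple regression described above can be sketched as follows. The data are simulated under assumptions chosen for illustration (a cutoff of 60, a linear relationship with the score, and a true program effect of 5); the coefficient on the enrollment dummy is the RD estimate.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20000
cutoff = 60.0

score = rng.uniform(0, 100, size=n)
enrolled = (score >= cutoff).astype(float)   # 1 if yes, 0 if no

# Made-up outcome: rises with the score, plus a jump of 5 for enrollees.
outcome = 10 + 0.2 * score + 5.0 * enrolled + rng.normal(0, 2, size=n)

# Outcome as a function of the (centered) score and the enrollment dummy.
design = np.column_stack([np.ones(n), score - cutoff, enrolled])
coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
rd_estimate = coef[2]

print("estimated jump at the cutoff:", round(float(rd_estimate), 2))
```

Centering the score at the cutoff is a convenience here: it makes the constant the predicted outcome just below the cutoff, without changing the estimated jump.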
24Is it really that simple?
- Don't be silly.
- You could totally have a situation where the
outcome is some sort of quadratic or cubic or nth-degree
polynomial function of the test score. Try
controlling for that. This is going to depend on
the situation and is somewhat arbitrary.
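A sketch of why the functional form matters, assuming a simulated outcome that is quadratic in the score with a true jump of 3. The helper `rd_fit` is hypothetical; it adds polynomial terms in the centered score up to a chosen degree and returns the coefficient on the treatment dummy.

```python
import numpy as np

def rd_fit(score, outcome, cutoff, degree):
    """RD estimate controlling for a polynomial of the given degree."""
    x = score - cutoff
    treated = (x >= 0).astype(float)
    columns = [np.ones_like(x)] + [x ** p for p in range(1, degree + 1)] + [treated]
    coef, *_ = np.linalg.lstsq(np.column_stack(columns), outcome, rcond=None)
    return coef[-1]   # coefficient on the treatment dummy

rng = np.random.default_rng(8)
n = 20000
cutoff = 60.0
score = rng.uniform(0, 100, size=n)
x = score - cutoff
treated = (x >= 0).astype(float)

# Made-up outcome: curved in the score, with a true jump of 3 at the cutoff.
outcome = 20 + 0.3 * x + 0.005 * x ** 2 + 3.0 * treated + rng.normal(0, 2, size=n)

tau_linear = rd_fit(score, outcome, cutoff, degree=1)
tau_quadratic = rd_fit(score, outcome, cutoff, degree=2)

print("linear control:   ", round(float(tau_linear), 2))
print("quadratic control:", round(float(tau_quadratic), 2))
```

With only a straight line as the control, the curvature gets loaded onto the treatment dummy and the estimate is badly off; the quadratic control should recover something close to 3.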
25Wait, somewhat arbitrary?
- Yeah, lame, I know. Arbitrary's what we're trying
to avoid. But two things aren't universally
clear:
- 1. How wide a bandwidth around the cutoff are we
looking at?
- We're really only confident in our estimate for
people that are close to the cutoff. This is a
LOCAL AVERAGE TREATMENT EFFECT. We can
confidently say that a school right around the
cutoff would improve average test scores by X if
it received the treatment, but we're not so
confident that already awesome schools would get
the same benefit.
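One way to show your work on the bandwidth question is to re-estimate the jump at several bandwidths and report all of them. This is a sketch under assumptions: `local_linear_rd` is a hypothetical helper that fits separate straight lines on each side within the bandwidth, and the simulated outcome includes a cubic term so that wide bandwidths get into trouble. The true jump is 3.

```python
import numpy as np

def local_linear_rd(score, outcome, cutoff, bandwidth):
    """Jump at the cutoff from separate linear fits within the bandwidth."""
    x = score - cutoff
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], outcome[keep]
    treated = (x >= 0).astype(float)
    design = np.column_stack([np.ones_like(x), x, treated, x * treated])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2], int(keep.sum())

rng = np.random.default_rng(9)
n = 50000
cutoff = 60.0
score = rng.uniform(0, 100, size=n)
x = score - cutoff
treated = (x >= 0).astype(float)

# Made-up outcome with some curvature and a true jump of 3 at the cutoff.
outcome = 20 + 0.3 * x + 1e-4 * x ** 3 + 3.0 * treated + rng.normal(0, 2, size=n)

results = {h: local_linear_rd(score, outcome, cutoff, h) for h in (5, 10, 20, 40)}
for h, (estimate, n_used) in results.items():
    print(f"bandwidth {h:2d}: estimate {estimate:5.2f} using {n_used} observations")
```

Narrow bandwidths use fewer observations but stay close to the true jump; the widest one uses most of the data and drifts away from it, which is the trade-off behind the bandwidth choice.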
26Wait, somewhat arbitrary?
- 2. Without the program, what shape would the
function take naturally?
- What sort of function do we throw in to control
for the fact that even if there was no National
Merit Semifinalist scholarship, smarter kids are
likely to earn more later in life?
- The solution: SHOW YOUR WORK
27You're Such a Phony.
- In addition to showing your work, another good
robustness check is to test for the effects of
non-existent programs.
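A sketch of the phony-program check, assuming simulated data with a real cutoff at 60 and a true jump of 3. The helper `local_linear_jump` and the placebo cutoff of 40 are hypothetical choices. The placebo is run only on untreated observations, so the real jump cannot contaminate it.

```python
import numpy as np

def local_linear_jump(score, outcome, cutoff, bandwidth):
    """Jump at `cutoff` from separate linear fits on each side."""
    x = score - cutoff
    keep = np.abs(x) <= bandwidth
    x, y = x[keep], outcome[keep]
    side = (x >= 0).astype(float)
    design = np.column_stack([np.ones_like(x), x, side, x * side])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]

rng = np.random.default_rng(10)
n = 50000
real_cutoff = 60.0
score = rng.uniform(0, 100, size=n)
treated = (score >= real_cutoff).astype(float)
outcome = 20 + 0.3 * score + 3.0 * treated + rng.normal(0, 2, size=n)

# The real program: there should be a jump here.
real_jump = local_linear_jump(score, outcome, real_cutoff, bandwidth=10)

# The non-existent program: pretend the cutoff was 40, using untreated data only.
untreated = score < real_cutoff
fake_jump = local_linear_jump(score[untreated], outcome[untreated], 40.0, bandwidth=10)

print("jump at the real cutoff:  ", round(float(real_jump), 2))
print("jump at the phony cutoff: ", round(float(fake_jump), 2))
```

If the phony cutoff produced a jump as big as the real one, that would be a sign the specification is picking up something other than the program.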
28You're Such a Phony.
29Conclusion
- Find a threshold
- Look at people just above and just below
- Make sure there's no sorting
- It's only a local effect
30In Your Groups
- Do we have a threshold?
- Are people sorting?
- It's a local effect--is that what we want?