Matching Estimators - PowerPoint PPT Presentation

About This Presentation

Title:

Matching Estimators

Description:

Differences-in-Differences and A (Very) Brief Introduction to Panel Data. Author: suntory. Last modified by: RLAB. Created Date: 2/5/2006 11:28:05 AM

Slides: 33

Provided by: sunt6

Transcript and Presenter's Notes

1
Matching Estimators

Methods of Economic Investigation
Lecture 11

2
Last Time

General Theme: If you don't have an experiment,
how do you get a control group?
Difference in Differences
How it works: compare before-after changes between two
comparable entities
Assumptions: Fixed differences over time
Tests to improve credibility of assumption:
Pre-treatment trends
Ashenfelter Dip

3
Today's Class

Another way to get a control group: Matching
Assumptions for identification
A specific form of matching called propensity
score matching
Is it better than just a plain old regression?

4
The Counterfactual Framework

Counterfactual: what would have happened to the
treated subjects, had they not received
treatment?
Idea: individuals selected into treatment and
nontreatment groups have potential outcomes in
both states:
the one in which they are observed
the one in which they are not observed.

5
Reminder of Terms

For the treated group, we have the observed mean
outcome under the condition of treatment,
$E(Y_1 \mid T=1)$, and the unobserved mean outcome under the
condition of nontreatment, $E(Y_0 \mid T=1)$.
For the control group, we have both the observed mean
$E(Y_0 \mid T=0)$ and the unobserved mean $E(Y_1 \mid T=0)$.

6
What is matching?

Pairing treatment and comparison units that are
similar in terms of observable characteristics
Can do this in regressions (with covariates) or
prior to regression to define your treatment and
control samples

7
Matching Assumption

Conditioning on observables (X), we can take
assignment to treatment as if random, i.e.
$(Y_0, Y_1) \perp T \mid X$
What is the implicit statement? Unobservables
(stuff not in X) play no role in treatment
assignment (T)

8
A matched estimator

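$E(Y_1 - Y_0 \mid X, T=1)$
$= \{E[Y_1 \mid X, T=1] - E[Y_0 \mid X, T=0]\}$
$- \{E[Y_0 \mid X, T=1] - E[Y_0 \mid X, T=0]\}$
First bracket: matched treatment effect
Second bracket: assumed to be zero
Key idea: all selection occurs only through
observed X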
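
A minimal sketch of the key idea on made-up data: when selection runs only through an observed X, comparing treated units with controls that share the same X removes the selection term, while the raw comparison does not. The rows, the binary x, and the true effect of 2 are assumptions chosen for illustration, not part of the lecture.

```python
# Hypothetical toy data: (x, treated, outcome) with outcome = 10*x + 2*treated,
# so the true effect is 2 and x drives both selection and outcomes.
rows = (
    [(0, 0, 0.0)] * 8 + [(0, 1, 2.0)] * 2 +    # x = 0: mostly untreated
    [(1, 0, 10.0)] * 2 + [(1, 1, 12.0)] * 8    # x = 1: mostly treated
)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_difference(rows):
    """Raw treated-minus-control difference in means, ignoring x."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def matched_att(rows):
    """Average over treated units of own outcome minus the mean control outcome at the same x."""
    control_mean = {
        x: mean(y for xx, t, y in rows if xx == x and t == 0)
        for x in {x for x, _, _ in rows}
    }
    return mean(y - control_mean[x] for x, t, y in rows if t == 1)


print(naive_difference(rows))  # mixes the effect with selection on x
print(matched_att(rows))       # compares like with like within each x
```

In this toy data the naive difference is 8 while the matched estimate is 2: the gap of 6 is the selection term that conditioning on x removes.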
9
Just do a regression

Regressions are flexible
If you only put in a main effect, the regression
will estimate a purely linear specification
Interactions and fixed effects allow different
slopes and intercepts for any combination of
variables
Can include quadratic and higher-order polynomial
terms if necessary
But fundamentally you specify additively separable
terms

10
Sometimes regression is not feasible

The issue is largely related to dimensionality
Each time you add an observable characteristic,
you partition your data into more bins.
Imagine all variables are zero-one variables
Then if you have k Xs, you have 2^k potential
different values
Need enough observations in each bin to
estimate the effect there precisely
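
A quick sketch of how fast the bins multiply. The sample size of 10,000 is a hypothetical number, and spreading observations evenly across bins is an optimistic assumption.

```python
def cells(k):
    """Number of distinct covariate combinations for k zero-one variables."""
    return 2 ** k


def avg_obs_per_cell(n_obs, k):
    """Average observations per bin if n_obs were spread evenly (an optimistic assumption)."""
    return n_obs / cells(k)


for k in (2, 5, 10, 20):
    print(k, cells(k), avg_obs_per_cell(10_000, k))
```

With 20 binary covariates there are over a million bins, so most bins are empty even with 10,000 observations.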

11
Reducing the Dimensionality

Use of the propensity score: probability of receiving
treatment, conditional on covariates
Key assumption: if $(Y_0, Y_1) \perp T \mid X$
and defining $p(X) = \Pr(T=1 \mid X)$,
then $(Y_0, Y_1) \perp T \mid p(X)$
If this is true, can interpret the estimate of
differences in outcomes conditional on p(X) as the
causal effect

12
Why not control for X?

Matching is flexible in a different way
Avoid specifying a particular form for the outcome
equation, decision process or unobservable term
Just need the right observables
Flexible in the form of how Xs affect treatment
probability but inflexible in how treatment
probability affects outcome

13
Participation decision

Remember our 3 groups:
Always takers: take the treatment if offered AND
take the treatment if not offered
We observe them if T=0 but R=1
Never takers: don't take the treatment if not
offered AND don't take it even if it is offered
We observe them if T=1 but R=0
Compliers: just do what they're assigned to do
T=1, R=1 OR T=0, R=0

14
Conditions for Matching to Work

Take 1-sided non-compliance for ease: if not
offered, can't take it, but some people don't
take it even if offered

Error term for never takers
Error term for compliers
On average, conditional on X, unobservables are the same
If it's zero → perfect compliance, so
conditioning on X replicates the experimental setting
15
Common Support

Matches can only exist if there is a region of common
support
People with the same X values are in both the
treatment and the control groups
Let S be the set of all observables X, then
$0 < \Pr(T=1 \mid X) < 1$ for X in some subset S* of S
Intuition: Someone in control close enough to
match to treatment unit OR enough overlap in the
distribution of treated and untreated individuals

16
Lots of common support
[Figure: the area between the red and blue lines is the area of common
support]
17
Not so much common support
18
Trimming

Option 1: Define Min and Max values of X for region of
overlap; drop all units not in that region
Option 2: Remove regions which do not have strictly
positive propensity score in both treatment and
control distributions
(Petra and Todd, 2005)
Both are quite similar when used in practice, but
if sections are missing in the middle of the distribution, can
use the second option
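
A minimal sketch of the min-max rule, applied here to propensity scores rather than raw X (an assumption; the slide states it in terms of X). The scores are made up, and the second option, which also handles gaps in the middle of the distribution, is not shown.

```python
def common_support_bounds(p_treated, p_control):
    """Min-max rule: the overlap of the two propensity-score ranges."""
    low = max(min(p_treated), min(p_control))
    high = min(max(p_treated), max(p_control))
    return low, high


def trim(scores, low, high):
    """Keep only the units whose score lies inside the overlap region."""
    return [p for p in scores if low <= p <= high]


# Hypothetical estimated propensity scores
p_treated = [0.35, 0.50, 0.70, 0.95]
p_control = [0.05, 0.20, 0.40, 0.60, 0.75]

low, high = common_support_bounds(p_treated, p_control)
kept_treated = trim(p_treated, low, high)
kept_control = trim(p_control, low, high)
print(low, high, kept_treated, kept_control)
```

Here the treated unit at 0.95 and the controls at 0.05 and 0.20 fall outside the overlap and are dropped.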

19
How do we match on p(X)?

Taken literally, should match on exactly p(Xi)
In practice this is hard to do, so the strategy is to match
treated units to comparison units whose p-scores
are sufficiently close to consider
Issues:
How many times can 1 unit be a match?
How many to match to each treatment unit?
How to match if using more than 1 control unit
per treatment unit?

20
Replacement

Issue: once control group person Z is a match for
individual A, can she also be a match for
individual B?
Trade-off between bias and precision
With replacement: minimizes the propensity score
distance between the treated unit and its matched comparison
unit
Without replacement

21
Are we doing a one-to-one match?

If 1-to-1 match: units closely related but estimates may
not be very precise
The more you include in the match, the more the p-score
of the control group will differ from the
treatment group
Trade-off between bias and precision
Typically use 1-to-many match because 1-to-1 is
extremely data intensive if X is multi-dimensional

22
Different matching algorithms-1

Can use nearest neighbor, which chooses the m closest
comparison units
Implicitly weights these all the same
Get fixed m but may end up with different p-scores
Can use caliper: radius around a point
Again implicitly weights these the same
Fixed difference in p-scores, but may not be many
units in radius
Stratify
Break sample up into intervals
Estimate treatment effect separately in each
region
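
A rough sketch of nearest-neighbor matching with an optional caliper, using made-up scores and outcomes. It matches with replacement (every treated unit searches the full control pool) and weights the matched controls equally; the function names and the caliper value are hypothetical.

```python
def nearest_neighbors(p_i, controls, m=1, caliper=None):
    """Return up to m controls closest to p_i in propensity score.

    controls is a list of (p_score, outcome). With a caliper, candidates farther
    than `caliper` from p_i are dropped, so fewer than m (even zero) may come back.
    """
    ranked = sorted(controls, key=lambda c: abs(c[0] - p_i))
    if caliper is not None:
        ranked = [c for c in ranked if abs(c[0] - p_i) <= caliper]
    return ranked[:m]


def matched_gap(y_i, p_i, controls, m=1, caliper=None):
    """Treated outcome minus the equally weighted mean of its matches; None if unmatched."""
    matches = nearest_neighbors(p_i, controls, m, caliper)
    if not matches:
        return None
    return y_i - sum(y for _, y in matches) / len(matches)


# Hypothetical control pool: (propensity score, outcome)
controls = [(0.10, 4.0), (0.30, 5.0), (0.45, 6.0), (0.80, 9.0)]

gap_one = matched_gap(8.0, 0.40, controls, m=1)                # uses the control at 0.45
gap_two = matched_gap(8.0, 0.40, controls, m=2)                # adds the control at 0.30
gap_none = matched_gap(8.0, 0.95, controls, m=1, caliper=0.1)  # nobody within 0.1
print(gap_one, gap_two, gap_none)
```

Going from m=1 to m=2 pulls in a control whose p-score is farther away, which is the bias-precision trade-off in miniature; the caliper shows the opposite risk of ending up with no match at all.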

23
Different Matching Algorithms-2

Can also use some type of distribution
Kernel estimator: puts some type of distribution
(e.g. normal) around each treatment unit and
weights closer control units more and farther
control units less
Explicit weighting function can be used if you
have some knowledge of how related units of
certain distances are to each other
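
A small sketch of kernel weighting with a normal (Gaussian) kernel. The bandwidth of 0.1 and the control pool are hypothetical; in practice the bandwidth is a tuning choice.

```python
import math


def gaussian_kernel_weights(p_i, control_scores, bandwidth=0.1):
    """Weights that fall with p-score distance from p_i, normalized to sum to 1."""
    raw = [math.exp(-0.5 * ((p_j - p_i) / bandwidth) ** 2) for p_j in control_scores]
    total = sum(raw)
    return [w / total for w in raw]


def kernel_counterfactual(p_i, controls, bandwidth=0.1):
    """Weighted mean control outcome used as the counterfactual for a treated unit at p_i."""
    scores = [p for p, _ in controls]
    weights = gaussian_kernel_weights(p_i, scores, bandwidth)
    return sum(w * y for w, (_, y) in zip(weights, controls))


# Hypothetical control pool: (propensity score, outcome)
controls = [(0.10, 4.0), (0.30, 5.0), (0.45, 6.0), (0.80, 9.0)]
weights = gaussian_kernel_weights(0.40, [p for p, _ in controls])
counterfactual = kernel_counterfactual(0.40, controls)
print(weights, counterfactual)
```

Unlike nearest neighbor or caliper matching, every control contributes, but the control at 0.45 gets the largest weight and the one at 0.80 almost none.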

24
How close is close enough?

No right answer in these choices; will depend
heavily on sample issues
How deep is the common support (i.e. are there
lots of people in both control and treatment
groups at all the p-score values)?
Should all be the same asymptotically, but in
finite samples (which is everything) may differ

25
Tradeoffs in different methods
Source: Caliendo and Kopeinig, 2005
26
How to estimate a p-score

Typically use a logit
Specific, useful functional form for estimating
discrete choice models
You haven't learned these yet but you will
For now, think of running a regular OLS
regression where the outcome is 1 if you got the
treatment and zero if you didn't
Take the $E[T \mid X]$ and that's your propensity score
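
A bare-bones sketch of fitting a logit by hand, with one covariate and made-up data, to show where the fitted probabilities come from. The Newton-Raphson routine and the variable names are assumptions for illustration; in practice you would use a statistics package.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, ts, steps=25):
    """Fit Pr(T=1 | x) = sigmoid(a + b*x) by Newton-Raphson on the log-likelihood."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        p = [sigmoid(a + b * x) for x in xs]
        # Gradient of the log-likelihood with respect to (a, b)
        g_a = sum(t - pi for t, pi in zip(ts, p))
        g_b = sum((t - pi) * x for t, pi, x in zip(ts, p, xs))
        # Negative Hessian, built from the weights p * (1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        h_aa = sum(w)
        h_ab = sum(wi * x for wi, x in zip(w, xs))
        h_bb = sum(wi * x * x for wi, x in zip(w, xs))
        det = h_aa * h_bb - h_ab ** 2
        # Newton step: solve the 2x2 system by hand
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Hypothetical data: 1 of 4 treated at x=0, 2 of 4 at x=1, 3 of 4 at x=2
xs = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
ts = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

a, b = fit_logit(xs, ts)
p_scores = [sigmoid(a + b * x) for x in xs]  # fitted propensity scores
print(a, b, sorted(set(round(p, 3) for p in p_scores)))
```

The fitted values stay strictly between 0 and 1, which an OLS fit of T on X does not guarantee. The routine assumes treated and untreated units overlap in x; with perfectly separated data it would not converge.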

27
The Treatment Effect

CIA holds and there is a sufficient region of common
support
Difference in outcome between treated individual
i and weighted comparison group J, with weight
generated by the p-score distribution in the
common support region

J is the comparison group, with |J| the number of
comparison group units matched to i
N is the treatment group and |N| is the size of
the treatment group
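
The formula on this slide did not survive in the transcript. One standard way to write the estimator described here, consistent with the slide's definitions of J and N, is sketched below; the subscript on J_i, the weights w_{ij}, and the hat notation are assumptions for illustration.

```latex
\hat{\tau}_{ATT}
  = \frac{1}{|N|} \sum_{i \in N}
    \Big( Y_i - \sum_{j \in J_i} w_{ij}\, Y_j \Big),
\qquad \sum_{j \in J_i} w_{ij} = 1
```

With equal weights on the matched units, w_{ij} = 1/|J_i|; kernel matching instead lets w_{ij} fall with the p-score distance between i and j.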
28
General Procedure

Run Regression
Dependent variable: T=1 if participate, T=0
otherwise.
Choose appropriate conditioning variables, X
Obtain propensity score: predicted probability
(p)

1-to-1 match
estimate difference in outcomes for each pair
Take average difference as treatment effect

1-to-n Match
Nearest neighbor matching
Caliper matching
Nonparametric/kernel matching

Multivariate analysis based on new sample
29
Standard Errors

Problem: Estimated variance of treatment effect
should include additional variance from
estimating p
Typically people bootstrap, which is a
non-parametric way of re-estimating your
coefficients over and over on resampled data until you get a
distribution of those coefficients; use the
variance from that
Will do this in a few weeks

30
Some concerns about Matching

Data intensive in propensity score estimation
May reduce dimensionality of treatment effect
estimation but still need enough of a sample to
estimate propensity score over common support
Need LOTS of Xs for this to be believable
Inflexible in how p-score is related to treatment
Worry about heterogeneity
Bias terms much more difficult to sign
(non-linear p-score bias)

31
Matching Diff-in-Diff

Worry that unobservables are causing selection
because matching on X is not sufficient
Can combine this with difference-in-differences
estimates
Take control group J for each individual i
Estimate difference before treatment
If the groups are truly as if random, should be
zero
If it's not zero, can assume fixed differences
over time and take before-after difference in
treatment and control groups
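
A short sketch of the combined estimator on made-up numbers. Each tuple holds a treated unit's outcomes before and after treatment alongside the mean outcomes of its matched control group; the tuple layout and the values are hypothetical.

```python
def matched_did(pairs):
    """pairs: (treated_pre, control_pre, treated_post, control_post) per treated unit i,
    where the control values are mean outcomes of its matched group J.

    Returns (average pre-treatment gap, difference-in-differences estimate).
    """
    n = len(pairs)
    pre_gap = sum(tp - cp for tp, cp, _, _ in pairs) / n
    post_gap = sum(ta - ca for _, _, ta, ca in pairs) / n
    return pre_gap, post_gap - pre_gap


# Hypothetical matched data: the pre-treatment gap is 1, the post-treatment gap is 3
pairs = [
    (5.0, 4.0, 9.0, 6.0),
    (7.0, 6.0, 10.0, 7.0),
    (6.0, 5.0, 9.0, 6.0),
]
pre_gap, did = matched_did(pairs)
print(pre_gap, did)
```

The nonzero pre-treatment gap says matching on X alone did not make the groups comparable; subtracting it leaves an estimate of 2 under the fixed-differences assumption.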

32
Next Time

Comparing Non-Experimental Methods to the
experiments they are trying to replicate
Goal: See how well these techniques work to get
the estimated experimental effect
