Title: GY460 Techniques of Spatial Analysis
Lecture 4: Techniques for dealing with spatial sorting and selection (fixed effects, diff-in-diff, matching, discontinuities, etc.)
Introduction
- Sometimes we just want to eliminate problems induced by spatial sorting and heterogeneity
- i.e. differences between places which may lead to confounding factors and biased estimates of relationships of interest
- Selection (sorting) on observable and unobservable characteristics
- Examples
- Eliminating spatial factors from models of firm behaviour
- Eliminating geographical influences from models of school quality
- Various methods are available for dealing with this; we have looked at some of these already
Regression models with spatial effects
Data with discrete zones
- N observations in the data
- Grouped into M zones (regions, districts, neighbourhoods)
- E.g.
- Cross-section data with more than one cross-sectional observation in each neighbourhood
- Or panel data with more than one time period for each neighbourhood
Spatial variation in the mean
- Empirical model, with discrete neighbourhoods m
$y_{im} = x_{im}'\beta + u_m + \varepsilon_{im}$
- $y_{im}$, the outcome for observation i in place m, depends on
- $x_{im}$: characteristics of observation i in place m
- $\varepsilon_{im}$: unobserved factors for observation i in place m
- $u_m$: unobserved factors common to all observations in place m
- Cross-sectional case: i indexes cross-sectional units, m places
- Panel data case: i indexes time units, m places
Random effects
- Empirical model, with discrete neighbourhoods m
- If $u_m$ is uncorrelated with $x_{im}$, then OLS is consistent, just like the spatial error model
- Error terms $u_m + \varepsilon_{im}$ are correlated within spatial groups m
- But uncorrelated between spatial groups
- Use GLS or ML (assuming normality) for efficient estimates and unbiased standard errors (multi-level modelling)
Fixed area effects: dummy variables
- Empirical model, with discrete neighbourhoods m
- If $u_m$ is correlated with $x_{im}$, then OLS is inconsistent.
- Options
- Estimate the area fixed effects using OLS
- Least Squares Dummy Variable (LSDV) model: include neighbourhood dummy variables
Fixed area effects: within groups
- Or within-groups transformation: difference the variables from the neighbourhood mean
$y_{im} - \bar{y}_m = (x_{im} - \bar{x}_m)'\beta + (\varepsilon_{im} - \bar{\varepsilon}_m)$
- Where $\bar{y}_m$ is the mean of y in group m
- Eliminates $u_m$
- Estimate by OLS
- Only uses deviation of variables from neighbourhood means, so only within-neighbourhood variation counts
- LSDV and Within Groups (or Fixed Effects) models are equivalent
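The equivalence of LSDV and within groups can be checked numerically. The following is a minimal sketch on simulated data, not an analysis from the lecture: the number of neighbourhoods, the true coefficient and the strength of the correlation between x and the neighbourhood effect are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n_per = 30, 20                       # hypothetical: 30 neighbourhoods, 20 observations each
group = np.repeat(np.arange(M), n_per)  # neighbourhood index m for each observation
u = rng.normal(size=M)                  # neighbourhood effects u_m
x = 0.8 * u[group] + rng.normal(size=M * n_per)   # x is correlated with u_m
beta = 2.0                              # hypothetical true coefficient
y = beta * x + u[group] + rng.normal(size=M * n_per)

# Pooled OLS with an intercept: inconsistent because u_m is correlated with x
X_pooled = np.column_stack([np.ones_like(x), x])
b_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# LSDV: x plus a full set of neighbourhood dummies (no separate intercept)
dummies = (group[:, None] == np.arange(M)[None, :]).astype(float)
b_lsdv = np.linalg.lstsq(np.column_stack([x, dummies]), y, rcond=None)[0][0]

# Within groups: subtract neighbourhood means, then OLS without an intercept
def demean(v, g, n_groups):
    means = np.bincount(g, weights=v, minlength=n_groups) / np.bincount(g, minlength=n_groups)
    return v - means[g]

xd, yd = demean(x, group, M), demean(y, group, M)
b_within = (xd @ yd) / (xd @ xd)

print(b_pooled, b_lsdv, b_within)
```

The LSDV and within-groups slopes agree up to floating-point error, while the pooled OLS slope is pulled away from the true value by the correlation between x and the neighbourhood effect.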
Fixed area effects: panel data
- Even better information with repeated observations on panel units (individuals, firms, regions etc.) over time
- Panel data
- Now all relationships of interest can be estimated from variation within panel units over time
- Use within-groups or first-differences over time, e.g.
- Q: what does $(v_t - v_{t-1})$ represent? How could you control for it? Then, what variation in the data allows us to estimate $\beta$? Hence what do we assume, if $\beta$ is to be estimated consistently?
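As a sketch of the first-difference algebra behind the question above, assume the panel model has a unit effect $u_i$, a common time effect $v_t$ and an idiosyncratic error $\varepsilon_{it}$. Only $v_t$ appears in the text; the other symbols and the linear form are assumptions made for illustration.

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta + u_i + v_t + \varepsilon_{it} \\
y_{it} - y_{i,t-1} &= (x_{it} - x_{i,t-1})'\beta + (v_t - v_{t-1}) + (\varepsilon_{it} - \varepsilon_{i,t-1})
\end{aligned}
```

Under this assumed form, the time-invariant unit effect $u_i$ drops out of the differenced equation, while the term $(v_t - v_{t-1})$ is common to all units in a given period.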
Dynamic panel data models
- It would be useful to estimate this model, e.g. to estimate the dependence of y on past values (or control for mean reversion)
- Q: Can this within-group model be estimated consistently by OLS?
- See Nickell (1981) Econometrica
- What about the first-differenced model?
- Q: Is there a useful IV here?
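One way to explore these questions is by simulation. The sketch below assumes a simple dynamic model with no x variables, y_it = rho * y_i,t-1 + u_i + e_it, with hypothetical values for the panel dimensions and rho. It compares the within-groups estimate with a simple just-identified IV on first differences that uses y dated t-2 as the instrument (in the spirit of Anderson and Hsiao; this is not the GMM estimator and is only one possible choice of instrument).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, rho = 5000, 6, 0.5           # hypothetical panel: many units, few periods
u = rng.normal(size=N)             # unit fixed effects u_i

# Simulate y_it = rho * y_i,t-1 + u_i + e_it, discarding a burn-in period
y = u / (1 - rho) + rng.normal(size=N)
burn_in = 50
panel = np.empty((N, T))
for t in range(burn_in + T):
    y = rho * y + u + rng.normal(size=N)
    if t >= burn_in:
        panel[:, t - burn_in] = y

y_lag, y_cur = panel[:, :-1], panel[:, 1:]

# Within-groups estimator: demean y_t and y_{t-1} by unit, then OLS
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
rho_within = (yl * yc).sum() / (yl ** 2).sum()

# IV on first differences: instrument (y_{t-1} - y_{t-2}) with the level y_{t-2}
dy = np.diff(panel, axis=1)        # dy[:, k] = y_{k+1} - y_k
dy_cur, dy_lag = dy[:, 1:], dy[:, :-1]
z = panel[:, :-2]                  # y_{t-2}, dated before both differenced errors
rho_iv = (z * dy_cur).sum() / (z * dy_lag).sum()

print(rho_within, rho_iv)
```

With only a handful of periods, the within-groups estimate falls well below the true rho, because the demeaned lag is correlated with the demeaned error, whereas the IV estimate on first differences is centred close to the true value in this set-up.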
Dynamic panel data models
- In principle you could use lagged levels of y (dated t-2 and earlier) as instruments for the first-differenced lag of y
- This is the basis of the Arellano-Bond estimator (1991, Review of Economic Studies)
- They develop a GMM estimator which weights the instruments taking into account the first-differenced error structure, e.g. implemented in xtabond in Stata
- Problems: serial correlation in error terms? Also, if the coefficient on lagged y is close to zero the instruments will be very weak (since lagged values don't predict current values if the coefficient is approximately zero)
- Can also use lagged first differences of y as instruments for the lagged level of y
- System GMM (Blundell and Bond 1998): xtabond2
Spatial panel data models
- These look attractive, e.g. to eliminate sorting, i.e. $u_i$
- But this still suffers from the simultaneity problems of the spatial y model: requires maximum likelihood or instruments for the spatial lag of y
- Also difficult to defend that there is spatial correlation, but no time-dynamics
- So you have to estimate a model with both a time lag and a spatial lag of y
- Have to deal with time-dynamic y and spatial y!
Spatial panel data models
- Probably more useful to consider the reduced form e.g.
Difference-in-difference
- Suppose we have places, firms or individuals i observed over time.
- Treatment group (D=1) is exposed to some treatment x (x = 1, 0) at time t=1, whereas a control group (D=0) is not
- There is selection into the treatment group ($E[f \mid D] \neq 0$) and common time effects g
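Difference-in-difference
- The effect of the treatment can be estimated by a difference-in-difference estimator: the change over time in mean y for the treatment group, minus the change over time in mean y for the control group
- Note that this is the same as you'd get from OLS on a regression of y on the treatment-group dummy D, a post-treatment time dummy, and their interaction (the estimate is the coefficient on the interaction)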
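The equivalence between the difference-in-difference of group means and the OLS interaction coefficient can be verified on simulated data. This is a minimal sketch with hypothetical numbers (baseline gap between groups, common time effect, treatment effect), using repeated cross-sections rather than a true panel.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
D = rng.integers(0, 2, size=n)       # treatment-group indicator
post = rng.integers(0, 2, size=n)    # 1 if observed after the treatment date
effect = 3.0                         # hypothetical true treatment effect
# Selection: treated group has a higher baseline (+5); common time effect (+2)
y = 10 + 5 * D + 2 * post + effect * D * post + rng.normal(size=n)

def cell_mean(d, p):
    """Mean of y for group d (0/1) in period p (0/1)."""
    return y[(D == d) & (post == p)].mean()

# Difference in difference of the four cell means
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# OLS of y on a constant, D, post and the interaction D * post
X = np.column_stack([np.ones(n), D, post, D * post])
coef = np.linalg.lstsq(X, y, rcond=None)[0]

print(did, coef[3])
```

Because the regression is saturated in the two dummies, the interaction coefficient reproduces the difference in difference of cell means exactly (up to floating-point error), and neither the baseline gap nor the common time effect contaminates it.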
Difference-in-difference
- The DiD estimator is commonly used for evaluation of policy interventions
- DiD doesn't work if the treatment and control groups have different time trends
- Or if the composition of the treatment or control groups changes before and after treatment e.g.
Matching
Matching estimators
- Matching tries to do something similar, when treatment and control group are not both observed pre and post policy
- Suppose we observe two groups
- Suppose the goal is to estimate the Average effect of the Treatment on the Treated (ATT)
- As we know, a simple difference in means won't work
- i.e. because the treated and non-treated would have different Y in the absence of treatment
Matching estimators
- But suppose we have some observable characteristics Z for which
$E[Y_0 \mid Z, D=1] = E[Y_0 \mid Z, D=0]$
- i.e. mean pre-treatment Y for individuals with characteristics Z is the same, whether or not they are in the treatment group
- Called the Conditional Independence Assumption (CIA)
- Allows for selection into treated and non-treated groups by Z (selection on observables), but not by unobservables.
- So if you can find individuals in group 0 who have the same Z as those in group 1 you can estimate $E[Y_0 \mid Z, D=1]$ from the individuals in group 0
- If Z is discrete this is straightforward.
Matching estimators
- So we can estimate
- The naïve estimate of the effect of the treatment is 190 - 125 = 65
Matching estimators
- For the treated, $Y_0$ is unobserved but can be estimated by re-weighting (under the CIA)
- So the ATT is 190 - 180 = 10
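The re-weighting step can be illustrated with a small worked example. The cell shares and cell means below are hypothetical: they are not the table from the slides, and were chosen only so that the totals agree with the figures quoted above (treated mean 190, untreated mean 125, re-weighted untreated mean 180). Z is assumed to be binary.

```python
# Hypothetical cell data chosen so that the totals match the figures quoted above
# (treated mean 190, untreated mean 125, re-weighted untreated mean 180).
cells = {
    # Z: (share of treated with this Z, mean Y of treated,
    #     share of untreated with this Z, mean Y of untreated)
    "Z=0": (0.2, 110.0, 0.75, 100.0),
    "Z=1": (0.8, 210.0, 0.25, 200.0),
}

# Raw group means, each weighted by the group's own distribution of Z
mean_treated = sum(pt * yt for pt, yt, pc, yc in cells.values())
mean_untreated = sum(pc * yc for pt, yt, pc, yc in cells.values())
naive = mean_treated - mean_untreated

# Counterfactual for the treated: untreated cell means weighted by the treated Z shares
counterfactual = sum(pt * yc for pt, yt, pc, yc in cells.values())
att = mean_treated - counterfactual

print(naive, att)
```

In this constructed example most of the naïve gap of 65 comes from the treated group containing far more Z=1 individuals, who have higher Y with or without treatment; once the untreated are re-weighted to the treated distribution of Z, the estimated ATT is 10 (up to floating-point rounding).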
Matching estimators
- But what if (as is usual) Z is not discrete? Propensity score matching does this reweighting using an estimate of the probability that an individual with characteristics z is in the treatment group
- (Rosenbaum and Rubin (1983) Biometrika)
- Requires a first stage estimate of $\Pr(D=1 \mid Z)$, e.g. from a probit or logit regression on Z
- Then the treatment effect for an individual i in the treated group can be estimated as
- Where the weights depend on the difference between the propensity score for individual i and the untreated controls j, and
Matching estimators
- In practice matching estimators behave like kitchen sink regressions: you are just controlling for as many observable characteristics as possible (Z)
- However, you are controlling for these Z in a very non-linear way, like having lots of control variables and their interactions in an OLS regression
- Matching estimators allow for heterogeneous treatment effects
- You can re-weight in other ways, e.g. to estimate the effect of the treatment on the population, or on the untreated
- No solution to selection on unobservables, which is surely the main issue!
- Requires common support: if there is no overlap between Z in the treated and untreated groups, you can't match.
Discontinuity designs
- Regression discontinuity method tries to identify causal effects from abrupt changes
- Requires a discontinuity induced by institutional rules, policy etc.
- e.g. majority voting
- Class size rules, e.g. Maimonides' rule
- Geographical administrative boundaries
- Assumption is that assignment to treatment is determined by some covariate X when it reaches a value d
- The outcome is otherwise only related to X by a smooth function, e.g. $E[y \mid X] = m(X)$
Discontinuity designs
- So the idea is to estimate the average effect of the treatment at the discontinuity point
- We could control for m(x) parametrically (polynomial series etc.)
- Or restrict the sample to observations for which x is close to c i.e.
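The two options above (controlling for m(x) and restricting the sample to a window around the cutoff) can be combined in a local linear regression. The sketch below uses simulated data; the cutoff c, the bandwidth h, the shape of m(x) and the size of the jump are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, effect = 20000, 0.0, 1.0        # hypothetical sample size, cutoff and jump
x = rng.uniform(-1, 1, size=n)        # assignment covariate
treated = (x >= c).astype(float)      # treatment switches on at the cutoff
# Outcome: smooth function m(x) = sin(2x) plus a jump at the cutoff plus noise
y = np.sin(2 * x) + effect * treated + rng.normal(scale=0.5, size=n)

# Naive comparison of means on either side, using the whole sample
naive = y[treated == 1].mean() - y[treated == 0].mean()

# Local linear regression within a bandwidth h of the cutoff,
# with separate slopes on each side
h = 0.2
near = np.abs(x - c) <= h
xc = x[near] - c
X = np.column_stack([np.ones(near.sum()), treated[near], xc, xc * treated[near]])
rd = np.linalg.lstsq(X, y[near], rcond=None)[0][1]

print(naive, rd)
```

The naive difference in means mixes the jump with the variation in m(x) across the whole range of x, while the local estimate recovers something close to the jump at the cutoff. Narrowing h reduces the bias from m(x) but leaves fewer observations.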
Boundary discontinuities
[Figure: school quality in district A and school quality in district B on either side of a boundary; price and homeowner characteristics on each side; an unobserved local amenity; positive quality-price relationship across the boundary]
Discontinuity designs
- In principle, X is identical for treatment and controls exactly at the discontinuity
- But practical applications require non-zero differences between X and the discontinuity
- E.g. can rarely find a large enough sample of housing transactions exactly on the boundary
- Trade-off between adequate sample size and elimination of biases due to m(x)
- We looked at practical spatial examples, e.g. Black (1999), Duranton et al (2006)
- See also Gibbons, S., Machin, S. and Silva, O. (2009), Valuing School Quality Using Boundary Discontinuity Regressions, SERC DP0018 http://www.spatialeconomics.ac.uk/textonly/SERC/publications/download/sercdp0018.pdf
Applications to spatial policy evaluation
- Research designs can incorporate elements of all these methods, e.g. match treatment and control groups using propensity score matching, then implement diff-in-diff
- Machin, S., McNally, S., Meghir, C. (2007), Resources and Standards in Urban Schools, IZA DP2653 http://ftp.iza.org/dp2653.pdf
- Busso, M. and P. Kline (2006) Do Local Economic Development Programs Work? Evidence from the Federal Empowerment Zone Program, http://www.econ.berkeley.edu/pkline/papers/Busso-Kline20EZ20(web).pdf
- Romero, R. and M. Noble (2008) Evaluating England's New Deal for Communities Programme Using the Difference in Difference method, Journal of Economic Geography 8(6) 1-20
The partial linear model
Continuous space
- A general model with spatial heterogeneity
$y_i = x_i'\beta + m(s_i) + \varepsilon_i$
- $s_i$ is an index of the location of observation i
- Model continuous unobserved variation over space
- m(.) is supposed to represent large-scale predictable variation over space, e.g. land values
- $\varepsilon$: random shocks, e.g. sales price of specific houses
- We discussed these issues in the lecture on smoothing
- Could do it parametrically, e.g. polynomial series or Cheshire and Sheppard (1995), see earlier lectures
Partial linear model
- Suppose
$y_i = x_i'\beta + m(s_{1i}, s_{2i}) + \varepsilon_i$
- If we know $\beta$, function m(.) is just the expected (mean) value of $y - x'\beta$ given the location $s_1, s_2$
- Refer to the lecture on smoothing: this can be inferred from values of y in neighbouring locations once we know $\beta$
- Spatial weighting again
- Kernel weighting, nearest neighbours etc.
Semi-parametric spatial models
- Must get estimates of $\beta$ first? How?
- e.g. see Robinson (1988), Econometrica, Root-N-Consistent Semiparametric Regression
- Estimate averages of y and all x at each point in the data, non-parametrically
- Estimate the betas by OLS on the deviations of y and x from these local averages
- Note analogy to the within-groups model
- Can then estimate m(.) by smoothing $y - x'\hat{\beta}$ over space
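The steps above can be sketched in a few lines. This is an illustrative implementation on simulated data, not the exact procedure in Robinson (1988): the Gaussian kernel, the fixed bandwidth h, the spatial surface m(s) and the single regressor are all assumptions, and the smoother includes each observation's own value rather than leaving it out.

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 1000, 1.5                        # hypothetical sample size and coefficient
s = rng.uniform(0, 1, size=(n, 2))         # locations (s1, s2)
m = np.sin(3 * s[:, 0]) + s[:, 1] ** 2     # smooth spatial surface m(s)
x = m + rng.normal(size=n)                 # x varies with location, so omitting m(s) biases OLS
y = beta * x + m + rng.normal(scale=0.5, size=n)

def kernel_smooth(values, coords, bandwidth):
    """Nadaraya-Watson estimate of E[values | location] at each data point (Gaussian kernel)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    return (w @ values) / w.sum(axis=1)

h = 0.1
y_res = y - kernel_smooth(y, s, h)         # y minus its local spatial average
x_res = x - kernel_smooth(x, s, h)         # x minus its local spatial average

beta_hat = (x_res @ y_res) / (x_res @ x_res)    # OLS on the spatially de-meaned data
m_hat = kernel_smooth(y - beta_hat * x, s, h)   # second step: smooth y - x * beta_hat

# For comparison: OLS that ignores m(s)
X = np.column_stack([np.ones(n), x])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(beta_ols, beta_hat)
```

Subtracting kernel-weighted local averages plays the same role as subtracting neighbourhood means in the within-groups model, with the bandwidth determining how local the comparison is.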
Applications to housing analysis
- Clapp, J. M., H.-J. Kim, and A. E. Gelfand (2002) "Predicting Spatial Patterns of House Prices Using LPR and Bayesian Smoothing," Real Estate Economics, 30, (4), 505-532
- Use of non-parametric methods to construct house price indices
- Gibbons, S., and S. Machin (2003) "Valuing English Primary Schools," Journal of Urban Economics, 53, (2).
- Use of the semi-parametric model for eliminating larger-scale neighbourhood effects on school performance
Conclusions
- Underlying issue we have considered is selection or sorting, e.g. people, firms etc. of different types sort into different locations and this can lead to biased estimates of causal relationships
- Selection can be on unobservables, or observables
- We considered various techniques for dealing with these problems
- Other solutions: random assignment, IV; we have considered or will consider these elsewhere.