Title: GY460 Techniques of Spatial Analysis
1 GY460 Techniques of Spatial Analysis
Lecture 7: Measures of Inequality, Concentration and Segregation
2 Introduction
- Many situations where we want summary statistics that characterise the distribution of a characteristic across data units, e.g.
  - Number of industries in different regions
  - Income across individuals
  - Crime rates across wards
  - Proportion of the population that is non-white in different wards
- This lecture discusses the use of these indices in relation to spatial patterns
3 Descriptions of distributions
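4 Cumulative distribution function
- Basic statistical concept
- With a random variable that takes on discrete values, an estimate is

$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(x_i \le x)$$

x     F(x)
100   0.2
120   0.4
140   0.6
250   0.8
400   1.0

[Figure: F(x) (vertical axis, 0 to 1) plotted against x (horizontal axis, 100 to 400)]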
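
As a minimal sketch of this estimate, the empirical CDF can be computed by ranking the observations. The function name `ecdf` is an assumption made for illustration, and the code assumes there are no tied values.

```python
def ecdf(values):
    """Return (x, F(x)) pairs for the empirical CDF of a list of numbers."""
    xs = sorted(values)
    n = len(xs)
    # With no ties, the i-th smallest value has F = i / n
    return [(x, i / n) for i, x in enumerate(xs, start=1)]


points = ecdf([100, 120, 140, 250, 400])
for x, f in points:
    print(x, f)
```

With five observations each step is 1/5, which reproduces the F(x) column of the table.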
5 Lorenz curve
- Commonly used to describe inequality (e.g. income)
- With a random variable that takes on discrete values, an estimate is

$$\hat{L}(x) = \frac{\sum_{i:\, x_i \le x} x_i}{\sum_{i=1}^{n} x_i}$$
6 Lorenz curve

x     F(x)   L(x)
100   0.2    0.10
120   0.4    0.22
140   0.6    0.36
240   0.8    0.60
400   1.0    1.00

[Figure: Lorenz curve, L(x) (vertical axis, 0 to 1) plotted against F(x) (horizontal axis, 0 to 1)]
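
The L(x) column can be reproduced with a short sketch: rank the units, then accumulate each unit's share of the total. The function name `lorenz` is an assumption for illustration, and all units are assumed to be the same size.

```python
def lorenz(values):
    """Return (F, L) pairs: cumulative share of units and of the total of x."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    running = 0.0
    curve = []
    for i, x in enumerate(xs, start=1):
        running += x
        curve.append((i / n, running / total))
    return curve


curve = lorenz([100, 120, 140, 240, 400])
for f, l in curve:
    print(round(f, 2), round(l, 2))
```

The poorest 60% of units hold 36% of the total here, which is the point (0.6, 0.36) on the curve.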
7 Segregation curve
- This is a variant of the Lorenz curve that is appropriate when considering inequality in proportions
  - E.g. white/non-white
- Suppose we are interested in ethnic segregation. Should we consider whites or non-whites?
  - The Lorenz curve gives different results
- The segregation curve is based on comparing the cumulative contribution of each unit (school, ward, district, firm etc.) to the white total with its cumulative contribution to the non-white total
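8 White Lorenz curve

White   F(w)   S(w)    L(w)
0.05    0.2    0.025   0.025
0.10    0.4    0.050   0.075
0.15    0.6    0.075   0.150
0.75    0.8    0.375   0.525
0.95    1.0    0.475   1.000

Note: here sum(white) = 2. S(w) is each unit's share of the white total and L(w) is the cumulative sum of S(w).

[Figure: L(w) (vertical axis, 0 to 1) plotted against F(x) (horizontal axis, 0 to 1)]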
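9 Non-white Lorenz curve

Non-White   F(nw)   S(nw)   L(nw)
0.05        0.2     0.017   0.017
0.25        0.4     0.083   0.100
0.85        0.6     0.283   0.383
0.90        0.8     0.300   0.683
0.95        1.0     0.317   1.000

Note: here sum(nonwhite) = 3. S(nw) is each unit's share of the non-white total and L(nw) is the cumulative sum of S(nw).

[Figure: L(nw) (vertical axis, 0 to 1) plotted against F(x) (horizontal axis, 0 to 1)]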
10 Segregation curve

Non-White   L(nw)   White   L(w)
0.05        0.017   0.95    0.475
0.25        0.100   0.75    0.850
0.85        0.383   0.15    0.925
0.90        0.683   0.10    0.975
0.95        1.000   0.05    1.000

Note: units are ranked by nw here

[Figure: segregation curve, L(nw) (vertical axis, 0 to 1) plotted against L(w) (horizontal axis, 0 to 1)]
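
A hedged sketch of how the segregation curve table can be built from the non-white proportions alone. It assumes equal-sized units (so the white proportion is one minus the non-white proportion), and the function name `segregation_curve` is illustrative.

```python
def segregation_curve(nonwhite_props):
    """Return (L(w), L(nw)) points with units ranked by non-white proportion."""
    ranked = sorted(nonwhite_props)
    white = [1 - p for p in ranked]  # assumes every unit has the same size
    total_nw, total_w = sum(ranked), sum(white)
    cum_nw = cum_w = 0.0
    points = []
    for nw, w in zip(ranked, white):
        cum_nw += nw
        cum_w += w
        points.append((cum_w / total_w, cum_nw / total_nw))
    return points


seg = segregation_curve([0.05, 0.25, 0.85, 0.90, 0.95])
for l_w, l_nw in seg:
    print(round(l_w, 3), round(l_nw, 3))
```

Because both axes are cumulative group shares, ranking by white instead of non-white traces the mirror image of the same curve, which removes the asymmetry seen in the two separate Lorenz curves.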
11 A smorgasbord of inequality indices
12 Indices
- All the useful information about the distributions is contained in the Cumulative/Lorenz/Segregation curves plus the mean
- But it is useful to be able to summarise the features of these distributions using single numbers
- Indices are intended to rank distributions in study areas/periods according to their inequality
- Unfortunately, no single index provides a complete summary
13 Generalised entropy family
- Many commonly used indices have the same general form

$$GE(\beta) = \frac{1}{\beta(\beta - 1)} \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i}{\bar{x}} \right)^{\beta} - 1 \right]$$

- Indices of this form have the key properties of scale invariance and decomposability
- Scale invariance means that x and λx give the same index
  - units of measurement or inflation don't matter for income inequality
- Decomposability means that the index is a weighted sum of the indices for sub-groups of the population
  - e.g. regions
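14 Coefficient of variation
- For β = 2, gives half the squared coefficient of variation
- So

$$GE(2) = \frac{1}{2} \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i}{\bar{x}} \right)^{2} - 1 \right] = \frac{1}{2} \frac{s^2}{\bar{x}^2} = \frac{1}{2} CV^2$$

- (where the sample variance s^2 is the 1/n version)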
15 Herfindahl
- This is closely related to the Herfindahl index

$$H = \sum_{i=1}^{n} \left( \frac{x_i}{\sum_{j=1}^{n} x_j} \right)^{2} = \frac{1 + CV^2}{n}$$

- Which is often used to measure industrial concentration
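16 Theil index
- Another commonly used index is the Theil index

$$T = \frac{1}{n} \sum_{i=1}^{n} \frac{x_i}{\bar{x}} \ln\left( \frac{x_i}{\bar{x}} \right)$$

- Which corresponds to the generalised entropy measure in the limiting case when β → 1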
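
The relationships between these indices can be checked numerically. The sketch below assumes the usual textbook form of the generalised entropy family with parameter beta; the function names are illustrative, and the beta = 0 branch (the mean log deviation) is included only so the function is defined everywhere.

```python
import math


def generalised_entropy(values, beta):
    """Generalised entropy index for positive values and parameter beta."""
    n = len(values)
    mean = sum(values) / n
    ratios = [x / mean for x in values]
    if beta == 1:  # Theil index: the limit as beta -> 1
        return sum(r * math.log(r) for r in ratios) / n
    if beta == 0:  # mean log deviation: the limit as beta -> 0
        return -sum(math.log(r) for r in ratios) / n
    return (sum(r ** beta for r in ratios) / n - 1) / (beta * (beta - 1))


def herfindahl(values):
    """Sum of squared shares of the total."""
    total = sum(values)
    return sum((x / total) ** 2 for x in values)


x = [100, 120, 140, 240, 400]
half_cv_squared = generalised_entropy(x, 2)
theil = generalised_entropy(x, 1)
h = herfindahl(x)
print(half_cv_squared, theil, h)
```

For these five values GE(2) is 0.154, so CV squared is 0.308 and the Herfindahl index is (1 + 0.308) / 5 = 0.2616. Multiplying every x by a constant leaves GE unchanged, which is the scale invariance property.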
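17 Additive decomposability
- Good thing about CV (squared), Theil index and generalised entropy is that they can be decomposed into sub-groups
- E.g. suppose we have K regions with index I_k. Then the total inequality I_total can be written as a sum of within-region and between-region indices

$$I_{total} = \sum_{k=1}^{K} w_k I_k + I_{between}$$

- Where w_k is a region-specific weight which depends on the regional share of total x
- (In the generalised entropy case it can be shown that

$$w_k = \left( \frac{n_k}{n} \right)^{1 - \beta} \left( \frac{\sum_{i \in k} x_i}{\sum_{i=1}^{n} x_i} \right)^{\beta}$$

where n_k is the number of data units in region k)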
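
A minimal sketch of the decomposition for the Theil index, where the weight on each region is simply its share of total x. The split of the five example values into two regions is hypothetical, and the function names are illustrative.

```python
import math


def theil(values):
    """Theil index of a list of positive values."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values) / n


def theil_decomposition(groups):
    """Return (within, between) components for a list of groups of values."""
    pooled = [x for g in groups for x in g]
    total_x = sum(pooled)
    mean = total_x / len(pooled)
    within = between = 0.0
    for g in groups:
        share = sum(g) / total_x          # w_k: region's share of total x
        group_mean = sum(g) / len(g)
        within += share * theil(g)
        between += share * math.log(group_mean / mean)
    return within, between


regions = [[100, 120, 140], [240, 400]]  # hypothetical grouping into 2 regions
within, between = theil_decomposition(regions)
total = theil([x for g in regions for x in g])
print(within, between, total)
```

The within and between components add up exactly to the Theil index of the pooled data, whatever grouping is chosen.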
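18 Gini index
- The Gini isn't a member of the generalised entropy family
- The Gini is twice the area between the Lorenz curve and the 45-degree line (equality across data units)
- Computed in practice using (when units are the same size, with x ranked in ascending order)

$$\text{Gini} = \frac{2}{n^2 \bar{x}} \sum_{i=1}^{n} i \, x_i - \frac{n + 1}{n}$$

x
100
120
140
240
400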
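
Two equivalent ways of computing the Gini for these five values are sketched below: a rank-based formula and the area definition applied to a piecewise-linear Lorenz curve. Both assume equal-sized units; the function names are illustrative, and the rank-based formula is one common textbook version, not necessarily the one shown in the lecture.

```python
def gini_ranked(values):
    """Rank-based Gini: 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n


def gini_from_lorenz(values):
    """One minus twice the area under the piecewise-linear Lorenz curve."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    cumulative = area = 0.0
    for x in xs:
        previous = cumulative
        cumulative += x / total
        area += (previous + cumulative) / (2 * n)  # trapezoid of width 1/n
    return 1 - 2 * area


x = [100, 120, 140, 240, 400]
print(gini_ranked(x), gini_from_lorenz(x))
```

Both give 0.288 for the example data, so the area between the Lorenz curve and the 45-degree line is 0.144.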
19 Gini index
[Figure: Lorenz curve and the 45-degree line; the area between them is 0.5 x Gini]
20 Gini index for household incomes in Britain
Source: Poverty and Inequality in Britain 2005, IFS, London
21 Segregation indices
22 Indices for categorical variables
- Gini, generalised entropy family can be used when interest is in a categorical variable, e.g.
  - Black/white, industrial classification
  - Though there is a problem with asymmetry, cf. the Lorenz curves for white/non-white shown earlier
- Various segregation indices are often used to describe the distribution of categorical variables
  - They measure inequality in one group relative to another group or the total
  - The benchmark is the same proportion of each group in each data unit (e.g. regions)
  - All have been re-invented many times
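23 Dissimilarity index
- Used for measuring the distribution of some group j across units of aggregation i
- e.g.

$$D = \frac{1}{2} \sum_{i} \left| \frac{b_i}{B} - \frac{w_i}{W} \right|$$

where b_i and w_i are the numbers in each group (e.g. black and white) in unit i, and B and W are the group totals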
24 Dissimilarity index
- Dissimilarity ranges between 0 (all units have the same composition) and 1 (units are either all group j or zero group j), e.g.

D = 0:
b     w
800   800
600   600
400   400
200   200

D = 1:
b      w
1000   0
1000   0
0      1000
0      1000
25 Dissimilarity index
- Indicates the proportion of one group that would have to re-locate to generate no segregation

Before (D = 0.4):
b     w
200   800
400   600
600   400
800   200

After (D = 0):
b     w
800   800
600   600
400   400
200   200

Moves of 200 and 600, i.e. 800 of the 2,000 in group b (40%) re-locate
26 Dissimilarity index
- One problem is that it isn't scale invariant, i.e. it is sensitive to proportional changes in one group when unit totals are held fixed (here b doubles and each unit still totals 1,000)

D = 0.27:
b     w
100   900
200   800
300   700
400   600

D = 0.40:
b     w
200   800
400   600
600   400
800   200
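
The dissimilarity examples can be checked with a few lines of code. This sketch assumes the standard half-sum-of-absolute-differences form of the index; the function name `dissimilarity` is illustrative.

```python
def dissimilarity(group_b, group_w):
    """Half the sum over units of |share of b in unit - share of w in unit|."""
    total_b, total_w = sum(group_b), sum(group_w)
    return 0.5 * sum(abs(b / total_b - w / total_w)
                     for b, w in zip(group_b, group_w))


# Same composition everywhere, and complete separation
d_even = dissimilarity([800, 600, 400, 200], [800, 600, 400, 200])
d_full = dissimilarity([1000, 1000, 0, 0], [0, 0, 1000, 1000])

# Doubling b while each unit still totals 1,000
d_small = dissimilarity([100, 200, 300, 400], [900, 800, 700, 600])
d_large = dissimilarity([200, 400, 600, 800], [800, 600, 400, 200])

print(d_even, d_full, round(d_small, 3), round(d_large, 3))
```

The last two cases differ because w changes along with b when unit totals are fixed; scaling b alone while leaving w untouched would leave D unchanged.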
27 Segregation index
- Same purpose: all that's different is that the comparison is with total numbers in unit i, not numbers that are not in the j group
- e.g.

$$S = \frac{1}{2} \sum_{i} \left| \frac{b_i}{B} - \frac{t_i}{T} \right|$$

where t_i is the total number in unit i and T is the overall total
- The Krugman index is just 2 x this, using employment or GDP
  - Specialisation of place i: i as geographical units, j as industries
  - Concentration of industry j: j as geographical units, i as industries
28 Segregation/Krugman index
- Not sensitive to proportional changes in the group of interest

b     All
100   1000
200   1000
300   1000
400   1000

b     All
200   1000
400   1000
600   1000
800   1000
29 Segregation/Krugman index
- But the upper bound varies with the total proportion in the group
- The index equals (1 - proportion in group j) x D, so its maximum is (1 - proportion in group j)

b      All
1000   1000
1000   1000
0      1000
0      1000

b      All
2000   2000
2000   2000
0      1000
0      1000
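
A sketch that checks both properties on the example tables: the index is unchanged when b doubles and totals stay fixed, and it equals (1 - proportion in the group) times the dissimilarity index. The formulas are the standard half-sum-of-absolute-differences forms, assumed here, and the function names are illustrative.

```python
def segregation_index(group, totals):
    """Half the sum of |unit's share of the group - unit's share of the total|."""
    g, t = sum(group), sum(totals)
    return 0.5 * sum(abs(b / g - n / t) for b, n in zip(group, totals))


def dissimilarity(group, totals):
    """Dissimilarity between the group and everyone else in each unit."""
    others = [n - b for b, n in zip(group, totals)]
    g, o = sum(group), sum(others)
    return 0.5 * sum(abs(b / g - w / o) for b, w in zip(group, others))


all_1000 = [1000, 1000, 1000, 1000]
s_base = segregation_index([100, 200, 300, 400], all_1000)
s_doubled = segregation_index([200, 400, 600, 800], all_1000)

s_half = segregation_index([1000, 1000, 0, 0], all_1000)
s_two_thirds = segregation_index([2000, 2000, 0, 0], [2000, 2000, 1000, 1000])

print(s_base, s_doubled, s_half, round(s_two_thirds, 3))
```

In the last two cases segregation is complete (D = 1), yet the index is 0.5 when the group is half the population and 0.33 when it is two thirds. The Krugman index would be twice each of these values.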
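30 Isolation index
- Measures the probability that a random minority group member (e.g. black) shares a unit with another minority member

$$\text{Isolation} = \sum_{i} \frac{b_i}{B} \cdot \frac{b_i}{t_i}$$

where t_i = b_i + w_i is the total number in unit i
- Rather sensitive to overall share

b     w
250   750
250   750
250   750
250   750

b     w
250   750
0     1000
0     1000
0     1000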
31 Isolation index
- Modified by Cutler, Glaeser and Vigdor (Journal of Political Economy 1999) to allow for overall minority group size
- Divide by the maximum value to scale between 0 and 1
32 Isolation index

b     w
250   750
250   750
250   750
250   750

b     w
250   750
0     1000
0     1000
0     1000
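
The two tables can be used to compare the raw and adjusted isolation indices. The adjusted version below is a sketch of my reading of the Cutler, Glaeser and Vigdor adjustment (subtract the overall minority share, then divide by the largest value the numerator could take); it should be checked against the paper before use. Function names are illustrative.

```python
def isolation(minority, totals):
    """Average minority share of the unit experienced by a minority member."""
    m = sum(minority)
    return sum((b / m) * (b / t) for b, t in zip(minority, totals) if t > 0)


def adjusted_isolation(minority, totals):
    """Isolation net of the overall share, scaled by its assumed maximum."""
    share = sum(minority) / sum(totals)
    # Assumed upper bound: all minority members packed into the smallest unit
    upper = min(sum(minority) / min(totals), 1.0)
    return (isolation(minority, totals) - share) / (upper - share)


totals = [1000, 1000, 1000, 1000]
even = [250, 250, 250, 250]
concentrated = [250, 0, 0, 0]

print(isolation(even, totals), isolation(concentrated, totals))
print(adjusted_isolation(even, totals), adjusted_isolation(concentrated, totals))
```

The raw index is 0.25 in both tables even though the minority is spread evenly in the first and confined to one unit in the second. Under the assumed adjustment the two cases separate to 0 and 1.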
33 Spatial indices
- All the indices discussed measure inequality between data units, so are spatial only if the data units are regions, districts or other spatial units!
- No measure here of how data is distributed within units
  - E.g. all poor residents live in one part of the district
- Or whether there are spatial patterns across units
  - e.g. all the majority-poor districts next to each other
- Some indices try to take account of these factors
  - See Massey and Denton (1988) or White (1983), The Measurement of Spatial Segregation, AJS, 88: 1008-1019
  - Echenique and Fryer (2005), On the Measurement of Segregation, NBER W11258
34 Example applications of segregation indices
35 Ethnic segregation indices in English secondary schools
Source: Burgess and Wilson (2003)
36 Ethnic segregation in US cities
37 Ethnic segregation in US cities
38 US segregation and black-white test gap
Source: Vigdor and Ludwig (2007), NBER Working Paper W12988
39 Segregation indices are descriptive!
- Remember that segregation indices are descriptive statistics!
- Usual rules apply about inferring causality
- See Hoxby (2000) on the reading list for an example of an attempt to use similar indices for causal analysis
  - Uses numbers of rivers in US metropolitan areas as an instrument for market fragmentation in schooling
40 Industrial concentration using aggregated data
41 Another segregation index
- Variation on a theme: square the difference rather than take the absolute difference
- I.e. it's the squared difference between the contribution of unit i to the total of j and the contribution of i to the overall total (or other comparison group), summed over units
- Can be used for measuring concentration due to agglomeration forces?
- Ellison and Glaeser (1997) develop this index
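42 Another segregation index
- The G index

$$G = \sum_{i} \left( \frac{b_i}{B} - \frac{t_i}{T} \right)^2$$

where b_i / B is unit i's share of group b and t_i / T is its share of the overall total (All)
- Sometimes called Gini, though the Gini here is (by one calculation) 0.23

b     All    L(b)
100   1000   0.1
200   1000   0.3
300   1000   0.6
400   1000   1.0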
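
A minimal sketch of the G index for this table, taking it to be the sum over units of the squared difference between the unit's share of b and its share of the total. The function name `g_index` is illustrative.

```python
def g_index(group, totals):
    """Sum of squared gaps between group shares and overall shares."""
    g, t = sum(group), sum(totals)
    return sum((b / g - n / t) ** 2 for b, n in zip(group, totals))


b = [100, 200, 300, 400]
all_units = [1000, 1000, 1000, 1000]
G = g_index(b, all_units)
print(round(G, 4))
```

The shares of b are 0.1, 0.2, 0.3 and 0.4 against 0.25 for every unit, so G = 0.15^2 + 0.05^2 + 0.05^2 + 0.15^2 = 0.05.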
43 The Ellison and Glaeser index
- But with G it is not possible to distinguish industrial concentration caused by market concentration (a few large plants) from agglomerative forces (many small plants co-located)
- Ellison and Glaeser (Journal of Political Economy 1997) correct the index to allow for this
- Requires the plant-level Herfindahl for industry j, H_j
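
The correction can be sketched as follows, using the form of the index as I understand it from Ellison and Glaeser (1997): gamma = (G - (1 - sum of x_i^2) H) / ((1 - sum of x_i^2)(1 - H)), where x_i is unit i's share of total employment. The plant size lists are hypothetical, made up only to show how the plant Herfindahl changes the answer.

```python
def ellison_glaeser(industry_emp, total_emp, plant_sizes):
    """EG-style index from regional employment and plant sizes (a sketch)."""
    ind_total, all_total = sum(industry_emp), sum(total_emp)
    s = [e / ind_total for e in industry_emp]   # industry shares by region
    x = [e / all_total for e in total_emp]      # overall shares by region
    raw_g = sum((si - xi) ** 2 for si, xi in zip(s, x))
    plant_total = sum(plant_sizes)
    h = sum((p / plant_total) ** 2 for p in plant_sizes)  # plant Herfindahl
    scale = 1 - sum(xi ** 2 for xi in x)
    return (raw_g - scale * h) / (scale * (1 - h))


industry = [100, 200, 300, 400]
totals = [1000, 1000, 1000, 1000]

many_small = [1] * 1000   # hypothetical: 1,000 plants with 1 worker each
few_large = [100] * 10    # hypothetical: 10 plants with 100 workers each

gamma_small = ellison_glaeser(industry, totals, many_small)
gamma_large = ellison_glaeser(industry, totals, few_large)
print(round(gamma_small, 4), round(gamma_large, 4))
```

The raw G is 0.05 in both cases, but the corrected index is positive with many small plants and negative with ten large ones, because a handful of large plants would produce that much geographic unevenness by chance.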
44 US: 446/449 industries more concentrated than expected (state-level data)
45 Industrial location
- See the further readings on the list
- Holmes, T. and J. Stevens (2004) The Spatial Distribution of Economic Activities in North America, Handbook of Urban and Regional Economics, Volume 4, Jacques Thisse and Vernon Henderson (eds.)
- Combes, P. P. and H. G. Overman (2004) The Spatial Distribution of Economic Activities in the EU, Handbook of Urban and Regional Economics, Volume 4, Jacques Thisse and Vernon Henderson (eds.)
46 References
- Cutler, D. M., Glaeser, E. L. and Vigdor, J. L. (1999) The rise and decline of the American ghetto, Journal of Political Economy, 107(3): 455-506
- Burgess, S. and D. Wilson (2003) Ethnic Segregation in England's Schools, CMPO Working Paper 03/086
- Ellison, G. and E. Glaeser (1997) Geographic Concentration in US Manufacturing Industries: A Dartboard Approach, Journal of Political Economy, 105(5): 889-927