Title: Spatial Data Analysis Areas I: Rate Smoothing and the MAUP
Ifgi, Muenster, Fall School 2005
- Gilberto Câmara
- INPE, Brazil
2. Areal data
- The study region is partitioned into disjoint areas
- The region is the union of the areas
- Each area has one or more associated measures
- These measures are treated as random variables
- Examples
- Map of Germany divided into municipalities. For each area, we measure the unemployment rate and the literacy rate.
- Is unemployment correlated with years of schooling?
- What about Brazil?
3-5. Violence in Minas Gerais (three map slides)
6. Attributes in areal data
- As a general rule, each measure is a sum, a count, or a similar aggregate function computed over the whole area
- Each value is associated with the entire corresponding area
- If a single location must be chosen, the polygon centroid is usually taken
- There are no intermediate values within an area
7. What is mapped in areal data?
- Typical values are rates or proportions
- Numerator: number of events
- Denominator: population at risk
- Log maps?
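A rate is the number of events (numerator) divided by the population at risk (denominator), usually scaled to a round base such as 100,000 residents, as in the maps that follow. A minimal Python sketch, assuming counts and populations are plain numbers; the helper names `rate_per` and `log_rate` and the example areas are hypothetical. Areas with zero events have no defined log rate, so this sketch returns `None` for them instead of applying any correction.

```python
import math


def rate_per(events, population, per=100_000):
    """Observed rate: events divided by population at risk, scaled to `per` residents."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return events / population * per


def log_rate(events, population, per=100_000):
    """Natural log of the scaled rate; None when there are no events (log 0 is undefined)."""
    r = rate_per(events, population, per)
    return math.log(r) if r > 0 else None


# Hypothetical areas: name -> (deaths, residents)
areas = {"A": (12, 40_000), "B": (3, 2_500), "C": (0, 800)}
for name, (deaths, residents) in areas.items():
    print(name, rate_per(deaths, residents), log_rate(deaths, residents))
```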
8. Log rate of motor vehicle accident deaths per 100,000 residents, 1990-92
9. Log rate of homicide deaths of males aged 15-49 per 100,000 residents in the same age group, 1990-92
10. Models of Discrete Spatial Variation
Random variable Zi in area i, for example:
- number of ill people
- number of newborn babies
- per capita income
Source: Renato Assunção (UFMG/Brasil)
11. Dealing with rates and proportions
When the study variable is a rate or a proportion, mapping those rates is the first obvious step in any analysis. However, the use of raw observed rates might be misleading, since the variability of those rates will be a function of the population counts, which differ widely between the areas. (Bailey, 1995)
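Bailey's point can be made concrete with the binomial variance of an observed proportion: if every area shares the same true rate p, the observed rate in an area with n people at risk has variance p(1 - p)/n, so small areas produce extreme rates by chance alone. A minimal Python sketch, assuming a binomial model and a made-up true rate; `rate_sd` and `simulated_rates` are hypothetical helper names.

```python
import math
import random
import statistics


def rate_sd(true_rate, population):
    """Theoretical standard deviation of an observed proportion (binomial model)."""
    return math.sqrt(true_rate * (1 - true_rate) / population)


def simulated_rates(true_rate, population, trials=200, seed=1):
    """Draw `trials` observed rates for one area whose true rate is `true_rate`."""
    rng = random.Random(seed)
    return [
        # each person independently experiences the event with probability true_rate
        sum(rng.random() < true_rate for _ in range(population)) / population
        for _ in range(trials)
    ]


p = 0.01  # the same (hypothetical) true rate in every area
for n in (100, 1_000, 10_000):
    sims = simulated_rates(p, n)
    print(f"n={n:>6}  theoretical sd={rate_sd(p, n):.4f}  "
          f"simulated sd={statistics.stdev(sims):.4f}  "
          f"range={min(sims):.4f}-{max(sims):.4f}")
```

The simulated standard deviations should track the theoretical ones, which shrink tenfold from n = 100 to n = 10,000; this is why maps of raw rates tend to highlight the least-populated areas.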
12. (Figure slide) Source: Fred Ramos (CEDEST/Brasil)
13. Model-Driven Approaches
- Model of discrete spatial variation
- Each subregion is described by a statistical distribution Zi; e.g., homicide counts follow a Poisson distribution
- The main objective of the analysis is to estimate the joint distribution of the random variables Z = (Z1, ..., Zn)
- We use a model-driven approach to correct for missing data
- One such approach is the Empirical Bayes method
- We could also use the Full Bayes method (but that is another story...)
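14. In Bayesian statistics, the best estimate of the true and unknown rate $\theta_i$ of area $i$ is

$$\hat{\theta}_i = w_i\, r_i + (1 - w_i)\, m_i, \qquad w_i = \frac{\sigma_i^2}{\sigma_i^2 + m_i / n_i},$$

where $r_i$ is the measured rate, $m_i$ and $\sigma_i^2$ are the prior mean and variance of the rate, and $n_i$ is the population at risk in area $i$.
Source: Fred Ramos (CEDEST/Brasil)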
15. Empirical Bayes
Simplifying assumptions are made to estimate the means and variances of the random variables of all areas (Marshall, 1991)
Source: Fred Ramos (CEDEST/Brasil)
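A minimal Python sketch of a global Empirical Bayes estimator, following the method-of-moments formulas commonly attributed to Marshall (1991): the prior mean m is the overall rate, the prior variance A is estimated from the population-weighted spread of the observed rates, and each raw rate r_i is shrunk toward m with weight w_i = A / (A + m / n_i). The function name `empirical_bayes_rates` and the example data are hypothetical, and this global variant is only one of several Empirical Bayes estimators.

```python
def empirical_bayes_rates(events, populations):
    """Global Empirical Bayes smoothed rates (method of moments, Marshall-style).

    events[i]: number of events in area i; populations[i]: population at risk.
    Returns one smoothed rate per area.
    """
    k = len(events)
    total_pop = sum(populations)
    m = sum(events) / total_pop  # prior mean: overall rate
    raw = [e / n for e, n in zip(events, populations)]
    # Population-weighted variance of the observed rates around m
    s2 = sum(n * (r - m) ** 2 for r, n in zip(raw, populations)) / total_pop
    n_bar = total_pop / k  # mean population per area
    prior_var = max(s2 - m / n_bar, 0.0)  # truncate negative estimates at zero
    smoothed = []
    for r, n in zip(raw, populations):
        denom = prior_var + m / n
        w = prior_var / denom if denom > 0 else 0.0  # shrinkage weight in [0, 1)
        smoothed.append(w * r + (1 - w) * m)
    return smoothed


# Hypothetical infant deaths and births in four areas
deaths = [0, 5, 30, 2]
births = [50, 1000, 10000, 20]
for d, b, s in zip(deaths, births, empirical_bayes_rates(deaths, births)):
    print(f"{d:>3}/{b:<6} raw={d / b:.4f}  smoothed={s:.4f}")
```

Areas with few births (here 20 and 50) are pulled strongly toward the overall rate, while the area with 10,000 births keeps nearly its raw value; this is the kind of correction the raw and corrected infant mortality maps illustrate.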
16. (Figure slide) Source: Fred Ramos (CEDEST/Brasil)
17. Infant Mortality Rate, São Paulo (Raw)
Source: Fred Ramos (CEDEST/Brasil)
18. Infant Mortality Rate, São Paulo (Corrected)
Source: Fred Ramos (CEDEST/Brasil)
19. Some Important Questions
- How does scale matter?
- How do the spatial partitions matter?
- How does proximity matter?
- What can we learn by studying how multiple variables vary in space?
- How many prior assumptions can we impose on our spatial data?
20. A Question of Scale
Modifiable Areal Unit Problem (MAUP)
- A basic problem with areal data
- The spatial definition of the boundaries of the areas affects the results
- Different results can be obtained just by changing the boundaries of these zones
- This problem is known as the modifiable areal unit problem
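The zoning effect can be reproduced on a toy grid: the same eight small areas, aggregated into four two-cell zones in two different ways, give correlations of opposite sign. A minimal Python sketch with made-up values; `pearson` and `aggregate` are hypothetical helpers written here (not a library API), and zone values are plain unweighted means.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def aggregate(values, zones):
    """Mean of `values` (dict cell -> value) within each zone (list of cells)."""
    return [sum(values[c] for c in zone) / len(zone) for zone in zones]


# Hypothetical 2 x 4 grid of small areas; cells are (row, col)
cells = [(r, c) for r in range(2) for c in range(4)]
x = {(r, c): 4 * r + c + 1 for r, c in cells}  # values 1..8
y = {(r, c): x[(r, c)] + (3 if r == 0 else -3) for r, c in cells}

# Two zonings, each with four contiguous zones of two cells
vertical = [[(0, c), (1, c)] for c in range(4)]
horizontal = [[(r, c), (r, c + 1)] for r in range(2) for c in (0, 2)]

print("single areas:    ", round(pearson([x[c] for c in cells], [y[c] for c in cells]), 3))
for name, zones in (("vertical zones:  ", vertical), ("horizontal zones:", horizontal)):
    print(name, round(pearson(aggregate(x, zones), aggregate(y, zones)), 3))
```

Run as-is, this prints a correlation of about -0.218 for the eight single areas, exactly 1.0 for the vertical zoning, and about -0.316 for the horizontal zoning: the row-dependent offset cancels inside vertical zones but not inside horizontal ones.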
21. Scale Effects
(maps: per capita income; jobs/population; illiterates/population)
Source: Fred Ramos (CEDEST/Brasil)
22. Scale Effects
(maps: per capita income; jobs/population; illiterates/population)
Source: Fred Ramos (CEDEST/Brasil)
23. Scale Effects: Fighting the MAUP
270 ZONES OD97 (maps: population >60 years; illiterates; per capita income)
Source: Fred Ramos (CEDEST/Brasil)
24. Scale Effects: Fighting the MAUP
96 DISTRICTS OF SÃO PAULO (maps: population >60 years; illiterates; per capita income)
Source: Fred Ramos (CEDEST/Brasil)
25. Scale Effects: Fighting the MAUP
96 INCOME-HOMOGENEOUS ZONES IN SÃO PAULO (maps: population >60 years; illiterates; per capita income)
Source: Fred Ramos (CEDEST/Brasil)
26. Correlation matrices
Zonings compared: 270 ZONES OD97; 96 DISTRICTS; 96 INCOME-AGGREGATED zones
Variables:
A) Percentage of population aged 60 or more
B) Percentage of illiterate population
C) Per capita individual income
Source: Fred Ramos (CEDEST/Brasil)
27. The Question of Scale
- Get census data
- Adaptation
- Identify inter-tract variation
- Reduce data variability
- Minimize the outlier effect
28. Regionalization
- Reaggregate N small areas (finest scale available) into M bigger regions to reduce scale effects
- A possible solution: constrained clustering
29. Regionalization: Maps as graphs
30. Regionalization: Maps as graphs
(maps: simple aggregation; population-constrained aggregation)
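Treating the map as a graph can be sketched as follows: areas are nodes, shared borders are edges, and each region is grown by merging contiguous neighbours until it reaches a minimum population. This is a simple greedy heuristic written for illustration, assuming the adjacency lists and populations are already known; it is not the specific constrained-clustering method behind these slides, and the function name `grow_regions` is hypothetical.

```python
def grow_regions(adjacency, population, min_pop):
    """Greedy, population-constrained aggregation of contiguous areas.

    adjacency:  dict area -> set of areas sharing a border with it
    population: dict area -> population of the area
    min_pop:    minimum population each region should reach
    Returns a dict area -> integer region id.
    """
    region = {}
    next_id = 0
    # Heuristic: seed new regions from the least-populated unassigned areas first
    for seed in sorted(population, key=lambda a: (population[a], a)):
        if seed in region:
            continue
        members, total = {seed}, population[seed]
        region[seed] = next_id
        while total < min_pop:
            # Only unassigned neighbours of the region are candidates (keeps it contiguous)
            frontier = {nb for a in members for nb in adjacency[a] if nb not in region}
            if not frontier:
                break
            nb = min(frontier, key=lambda a: (population[a], a))
            members.add(nb)
            region[nb] = next_id
            total += population[nb]
        next_id += 1

    # Merge regions still below min_pop into their least-populated adjacent region
    merged = True
    while merged:
        merged = False
        totals = {}
        for a, r in region.items():
            totals[r] = totals.get(r, 0) + population[a]
        for r_id in sorted(totals, key=totals.get):
            if totals[r_id] >= min_pop:
                continue
            members = [a for a in region if region[a] == r_id]
            neighbours = {region[nb] for a in members for nb in adjacency[a]} - {r_id}
            if neighbours:
                target = min(neighbours, key=lambda r: (totals[r], r))
                for a in members:
                    region[a] = target
                merged = True
                break  # totals changed; recompute before the next merge
    return region


# Hypothetical 2 x 3 grid of areas:  A B C / D E F
adjacency = {
    "A": {"B", "D"}, "B": {"A", "C", "E"}, "C": {"B", "F"},
    "D": {"A", "E"}, "E": {"B", "D", "F"}, "F": {"C", "E"},
}
population = {"A": 500, "B": 3000, "C": 800, "D": 1200, "E": 400, "F": 2500}
print(grow_regions(adjacency, population, min_pop=3000))
```

With these numbers the sketch groups A, D, E and F into one region (4,600 people) and B and C into another (3,800 people). Dedicated regionalization methods optimize explicit objective functions instead of relying on this greedy order.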