Spatial Data Analysis of Areas: Regression - PowerPoint PPT Presentation

About This Presentation
Title:

Spatial Data Analysis of Areas: Regression

Description:

Dependent variable (Y) determined by independent variables X1, X2 (e.g., Y = mX + b) ... The betas vary in space (each location has a different coefficient) ...

Slides: 61
Provided by: gcam9

Transcript and Presenter's Notes

Title: Spatial Data Analysis of Areas: Regression


1
Spatial Data Analysis of Areas: Regression
2
Introduction
  • Basic Idea
  • Dependent variable (Y) determined by independent
    variables X1, X2 (e.g., Y = mX + b).
  • Uses of regression
  • Description
  • Control
  • Prediction

3
Simple Linear Regression
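  • $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
  • $Y_i$: value of dependent variable on trial i
  • $\beta_0$, $\beta_1$: unknown parameters
  • $X_i$: value of independent variable on trial i
  • $\varepsilon_i$: ith error term (unexplained variation),
    where
  • $E[\varepsilon_i] = 0$,
  • $\sigma^2(\varepsilon_i) = \sigma^2$
  • error terms are $N(0, \sigma^2)$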

basic model
4
Multiple Regression
Basic Model
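  • $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \ldots + \beta_p X_{ip} + \varepsilon_i$
  • $Y_i$ is the ith observation of the dependent
    variable
  • $\beta_0, \beta_1, \ldots, \beta_p$ are parameters
  • $X_{i1}, \ldots, X_{ip}$ are observations of the
    independent variables
  • $\varepsilon_i$ are independent and normal

estimated model: $\hat{Y}_i = b_0 + b_1 X_{i1} + \ldots + b_p X_{ip}$
ith residual: $e_i = Y_i - \hat{Y}_i$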
5
Sometimes we need to transform the data
Scatter plots: (a) Y versus PORC3_NR (percentage
of large farms, by number); (b) log10 Y versus
log10(PORC3_NR).
Predicted versus observed plots: (a) model with
untransformed variables, R² = 0.61; (b) Model 7,
R² = 0.85.
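As a minimal sketch of why a log-log transform can help, assume a hypothetical power-law relationship y = a * x^b (the numbers below are invented for illustration and are not the PORC3_NR data). Taking log10 of both sides gives a straight line whose slope is b, so a simple least-squares fit on the transformed values should recover it:

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1

# Hypothetical data following y = 2 * x**1.5 exactly (illustration only).
x = [1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
y = [2.0 * xi ** 1.5 for xi in x]

# Transform both variables, then fit a straight line in log-log space.
log_x = [math.log10(xi) for xi in x]
log_y = [math.log10(yi) for yi in y]
intercept, slope = fit_line(log_x, log_y)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

In log-log space the slope estimates the exponent b and the intercept estimates log10(a).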
6
Precision of estimates and fit
  • Analysis of variation
  • Sum of squares of Y = Sum of squares of
    estimate + Sum of squares of residuals (TSS = ESS + RSS)
  • Dividing both sides by TSS (sum of squares of Y):
  • 1 = ESS/TSS + RSS/TSS
  • where ESS/TSS = r² (coefficient of determination)
  • r² gives the proportion of total variation
    explained by the sample regression equation.
  • The closer r² is to 1.00, the better the fit.
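The decomposition can be checked numerically. The sketch below fits a simple linear regression to a small made-up data set (the values are hypothetical) and computes the three sums of squares, so that TSS = ESS + RSS and r² = ESS/TSS can be verified directly:

```python
def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_mean - b1 * x_mean, b1

def variation_decomposition(x, y):
    """Return (TSS, ESS, RSS) for the fitted simple regression."""
    b0, b1 = fit_simple_ols(x, y)
    y_mean = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_mean) ** 2 for yi in y)              # total variation
    ess = sum((yh - y_mean) ** 2 for yh in y_hat)          # explained by the fit
    rss = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left in residuals
    return tss, ess, rss

# Hypothetical observations (illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

tss, ess, rss = variation_decomposition(x, y)
r2 = ess / tss
print(f"TSS = {tss:.4f}, ESS = {ess:.4f}, RSS = {rss:.4f}, r2 = {r2:.4f}")
```

The identity TSS = ESS + RSS holds for least-squares fits that include an intercept, which is the case assumed here.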

7
Analysis of Residuals
  • It is a good idea to plot the residuals against
    the independent variables to see if they show a
    trend.
  • Possible behaviors
  • Correlation (e.g., the higher the independent
    variable, the higher the residual)
  • Nonlinearity
  • Heteroskedasticity (i.e., the variance of the
    residual increases or decreases with the
    independent variable).
  • Regression assumes that residuals have constant
    variance and are normally distributed.

8
Good Residual Plot
9
Nonlinearity
[Plot: residual (vertical axis, about -0.15 to 0.25) versus X (horizontal axis, 0 to 60)]
10
Heteroskedasticity
[Plot: residual (vertical axis, -1 to 1) versus X (horizontal axis, 0 to 60)]
11
Regression with Spatial Data: Understanding
Deforestation in Amazonia
12
The forest...
13
(No Transcript)
14
The rains...
15
The rivers...
16
Deforestation...
17
Fire...
18
Fire...
19
Amazon Deforestation 2003
Deforestation 2002/2003
Deforestation until 2002
Source: INPE PRODES Digital, 2004.
20
What Drives Tropical Deforestation?
Legend: % of the cases (5, 10, 50)
Underlying Factors driving proximate causes
Causative interlinkages at proximate/underlying
levels
Internal drivers
If less than 5% of cases, not depicted here.
Source: Geist & Lambin
21
1973
22
1991
Courtesy: INPE/OBT
23
1999
Courtesy: INPE/OBT
24
Deforestation in Amazonia
PRODES (total 1997): 532,086 km²
PRODES (total 2001): 607,957 km²
25
Modelling Tropical Deforestation
Coarse: 100 km x 100 km grid
Fine: 25 km x 25 km grid
26
Amazônia in 2015?
Source: Aguiar et al., 2004
27
Factors Affecting Deforestation
28
Coarse resolution candidate models
29
Coarse resolution Hot-spots map
30
Modelling Deforestation in Amazonia
  • High coefficients of multiple determination were
    obtained on all models built (R2 from 0.80 to
    0.86).
  • The main factors identified were
  • Population density
  • Connection to national markets
  • Climatic conditions
  • Indicators related to land distribution between
    large and small farmers.
  • The main current agricultural frontier areas, in
    Pará and Amazonas States, where intense
    deforestation processes are now taking place,
    were correctly identified as hot-spots of change.

31
Spatial regression models
32
Spatial regression
  • Specifying the Structure of Spatial dependence
  • which locations/observations interact
  • Testing for the Presence of Spatial Dependence
  • what type of dependence, what is the alternative
  • Estimating Models with Spatial Dependence
  • spatial lag, spatial error, higher order
  • Spatial Prediction
  • interpolation, missing values

Source: Luc Anselin
33
Nonspatial regression
  • Objective
  • Predict the behaviour of a response variable,
    given a set of known factors (explanatory
    variables).
  • Multivariate nonspatial models
  • $y_k = \beta_0 + \beta_1 x_{1k} + \ldots + \beta_i x_{ik} + \varepsilon_k$
  • $y_k$: estimate of response variable for object k
  • $\beta_i$: regression coefficient for factor i
  • $x_{ik}$: explanatory variable i for region k
  • $\varepsilon_k$: random error
  • Adjustment quality

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
34
Nonspatial regression hypotheses
  • $Y = X\beta + \varepsilon$ (model)
  • Explanatory variables are linearly independent
  • $Y$: vector of samples of response variable (n x
    1)
  • $X$: matrix of explanatory variables (n x k)
  • $\beta$: coefficient vector (k x 1)
  • $\varepsilon$: error vector (n x 1)
  • $E(\varepsilon_i) = 0$ (expected value)
  • $\varepsilon_i \sim N(0, \sigma_i^2)$ (normal distribution)

35
Generalized linear models
  • $g(Y) = X\beta + U$
  • Response is some function of the explanatory
    variables
  • $g(\cdot)$ is a link function
  • Example: logarithm function
  • $U$: error vector
  • $E(U) = 0$ (expected value)
  • $E(UU^T) = C$ (covariance matrix)
  • if $C = \sigma^2 I$, the error is homoskedastic

36
Spatial regression
  • Spatial effects
  • What happens if the original data is spatially
    autocorrelated?
  • The results will be influenced, showing
    statistical association where there is none
  • How can we evaluate the spatial effects?
  • Measure the spatial autocorrelation (Moran's I)
    of the regression residuals
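A minimal sketch of the global Moran's I statistic applied to a vector of residuals follows. The four regions, their binary contiguity weights and the residual values are hypothetical; the formula used is the usual I = (n / S0) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2), where z are deviations from the mean and S0 is the sum of all weights:

```python
def morans_i(values, weights):
    """Global Moran's I for `values` given a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]            # deviations from the mean
    s0 = sum(sum(row) for row in weights)     # sum of all weights
    cross = sum(weights[i][j] * z[i] * z[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi * zi for zi in z)

# Hypothetical layout: four regions in a row, neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [1.0, 1.0, -1.0, -1.0]    # similar residuals sit side by side
alternating = [1.0, -1.0, 1.0, -1.0]  # neighbours always differ

print("clustered:  ", morans_i(clustered, W))
print("alternating:", morans_i(alternating, W))
```

Positive values suggest that similar residuals cluster in space, while negative values suggest that neighbours tend to differ. Assessing significance (for example by permutation) is not covered in this sketch.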

37
Regression using spatial data
  • Try a linear model first
  • Fit the model and calculate residuals
  • Are the residuals spatially autocorrelated?
  • No: we're OK
  • Yes: the nonspatial model will be biased and we
    should propose a spatial model

38
Spatial dependence
  • Estimating the Form/Extent of Spatial Interaction
  • substantive spatial dependence
  • spatial lag models
  • Correcting for the Effect of Spatial Spill-overs
  • spatial dependence as a nuisance
  • spatial error models

Source: Luc Anselin
39
Spatial dependence
  • Substantive Spatial Dependence
  • lag dependence
  • include Wy as explanatory variable in regression
  • $y = \rho W y + X\beta + \varepsilon$
  • Dependence as a Nuisance
  • error dependence
  • non-spherical error variance
  • $E[\varepsilon\varepsilon'] = \Omega$
  • where $\Omega$ incorporates dependence structure
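To make the Wy term concrete, the sketch below builds a spatially lagged variable for a hypothetical set of four regions. It assumes a row-standardised weights matrix (each row sums to 1), a common but not mandatory choice, so that each element of Wy is the average of y over the neighbours of a region:

```python
def row_standardise(weights):
    """Scale each row of the weights matrix so that it sums to 1."""
    result = []
    for row in weights:
        total = sum(row)
        result.append([w / total if total else 0.0 for w in row])
    return result

def spatial_lag(weights, y):
    """Compute Wy: the weighted combination of neighbouring values."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in weights]

# Hypothetical layout: four regions in a row, neighbours share a border.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
y = [10.0, 20.0, 30.0, 40.0]

W_std = row_standardise(W)
lagged = spatial_lag(W_std, y)
print(lagged)  # neighbour averages, one per region
```

In a spatial lag model this lagged vector enters the right-hand side next to X. Because Wy depends on y itself, ordinary least squares is generally not appropriate for estimating the lag coefficient, and maximum likelihood or instrumental-variable estimators are normally used instead.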

40
Interpretation of spatial lag
  • True Contagion
  • related to economic-behavioral process
  • only meaningful if areal units appropriate
    (ecological fallacy)
  • interesting economic interpretation (substantive)
  • Apparent Contagion
  • scale problem, spatial filtering

Source: Luc Anselin
41
Interpretation of Spatial Error
  • Spill-Over in Ignored Variables
  • poor match of process with unit of observation or
    level of aggregation
  • apparent contagion: regional structural change
  • economic interpretation less interesting: nuisance
    parameter
  • Common in Empirical Practice

Source: Luc Anselin
42
Cost of ignoring spatial dependence
  • Ignoring Spatial Lag
  • omitted variable problem
  • OLS estimates biased and inconsistent
  • Ignoring Spatial Error
  • efficiency problem
  • OLS still unbiased, but inefficient
  • OLS standard errors and t-tests biased

Source: Luc Anselin
43
Spatial regression models
  • Incorporate spatial dependency
  • Spatial lag model
  • Two explanatory terms
  • One is the dependent variable in the neighbourhood
    (the spatial lag)
  • The other is the set of remaining explanatory
    variables

44
Spatial regimes
  • Extension of the non-spatial regression model
  • Considers clusters of areas
  • Groups each cluster in a different explanatory
    variable
  • $y_i = \beta_0 + \beta_1 x_1 + \ldots + \beta_i x_i + \varepsilon_i$
  • Gets different parameters for each cluster

45
A study of the spatially varying relationship
between homicide rates and socio-economic data of
São Paulo using GWR
Frederico Roman Ramos CEDEST/Brasil
46
Geographically Weighted Regression
  • Extension of the traditional regression model where
    the parameters are estimated locally
  • $y_i = \beta_0(u_i, v_i) + \sum_k \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$
  • $(u_i, v_i)$ are the geographical coordinates of point
    i.
  • The betas vary in space (each location has a
    different coefficient)
  • We estimate an ordinary regression for each point,
    in which the neighbours have more weight
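The idea of one weighted regression per location can be sketched for a single explanatory variable as follows. The Gaussian kernel, the fixed bandwidth and the synthetic data are assumptions made for illustration; they are not the settings or the software used in the São Paulo study:

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: nearby observations get weights close to 1."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def gwr_simple(coords, x, y, bandwidth):
    """Fit a weighted simple regression at every location.

    Returns a list of (beta0, beta1) pairs, one per location.
    """
    betas = []
    for (ui, vi) in coords:
        # Weight each observation by its distance to the current location.
        w = [gaussian_weight(math.hypot(ui - uj, vi - vj), bandwidth)
             for (uj, vj) in coords]
        sw = sum(w)
        x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - x_bar) * (yj - y_bar)
                  for wj, xj, yj in zip(w, x, y))
        b1 = sxy / sxx                     # weighted least-squares slope
        betas.append((y_bar - b1 * x_bar, b1))
    return betas

# Synthetic data: ten points on a line; the true slope is 1 in the west
# (u < 5) and 3 in the east, so one global slope would hide the difference.
coords = [(float(i), 0.0) for i in range(10)]
x = [float(i % 3 + 1) for i in range(10)]
y = [(1.0 if u < 5 else 3.0) * xi for (u, _), xi in zip(coords, x)]

betas = gwr_simple(coords, x, y, bandwidth=1.0)
for (u, _), (b0, b1) in zip(coords, betas):
    print(f"u = {u:.0f}: local slope = {b1:.3f}")
```

The local slopes stay close to 1 at the western end and close to 3 at the eastern end, with a transition in between. In practice the bandwidth is usually chosen by cross-validation or an information criterion rather than fixed by hand.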

47
Introducing São Paulo
Some numbers
Metropolitan region: population 17,878,703 (ibge,200); 39 municipalities
Municipality of São Paulo: population 10,434,252; HDI_M 0.841 (pnud, 2000); 96 districts
IEX: 74 out of 96 districts were classified as socially excluded (cedest, 2002)
4,637 homicide victims in 2001
48
Data
4,637 homicide victims, geocoded by residence address (2001)
456 census sample tracts (2000)
49
Density surface of victim-based homicides
50
Victim-based homicide rate (Tx_homic)
$\text{Tx\_homic} = \dfrac{\text{count of homicide events (2001)}}{\text{population (census, 2000)}} \times 100{,}000$
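As a rough worked example, the same rate can be computed for the municipality as a whole from the totals quoted earlier in the deck (4,637 victims in 2001 and a population of 10,434,252). This assumes the rate is defined as events divided by population, times 100,000. The study itself computes the rate per census sample tract, so the city-wide figure is only an illustration:

```python
def rate_per_100k(events, population):
    """Events per 100,000 inhabitants."""
    return 100_000 * events / population

# City-wide totals taken from the "Introducing São Paulo" slide.
city_rate = rate_per_100k(4637, 10_434_252)
print(f"{city_rate:.1f} homicides per 100,000 inhabitants")
```

This gives roughly 44 homicides per 100,000 inhabitants for the municipality as a whole.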
51
LISA: Victim-based homicide rate
52
Percentage of illiterate house-head (Xanlf)
Definition: the house-head is the person responsible
for the household; generally, but not necessarily,
the person with the highest income in the household.
53
LISA: Percentage of illiterate house-head
54
OLS regression results for TX_homic and X_analf
55
OLS regression results for TX_homic and X_analf
56
LISA for standardized residuals of the OLS
regression for TX_homic and X_analf
Moran's I = 0.2624
57
GWR regression results for TX_homic and Xanlf

GWR ESTIMATION

Fitting Geographically Weighted Regression Model...
Number of observations............ 456
Number of independent variables... 2 (Intercept is variable 1)
Bandwidth (in data units)......... 0.0246524516
Number of locations to fit model.. 456

Diagnostic information...
Residual sum of squares......... 111179.875
Effective number of parameters.. 83.1309998
Sigma........................... 17.2677182
Akaike Information Criterion.... 4007.32139
Coefficient of Determination.... 0.699720224
58
GWR regression results for TX_homic and Xanlf
residuals
Moran's I = -0.0303
59
GWR regression results for TX_homic and Xanlf
Local Beta1
Local t-value
60
CONCLUSIONS
  • There are significant differences in the
    relationship between violence rates and social
    territorial data over the intra-urban area of São
    Paulo
  • These results reinforce our hypothesis that we
    should avoid using general concepts
  • The GWR technique is a useful instrument in
    social territorial analysis