Title: Spatial Statistics
1Spatial Statistics
- Modified from Dr. YU-FEN LI
2Point Pattern Descriptors
- Central tendency
- Mean Center (Spatial Mean)
- Weighted Mean Center
- Median Center (Spatial Median): not widely used
because of its ambiguity
- Consider n points with coordinates (x_i, y_i), i = 1, ..., n
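3Central tendency -- Mean Center (Spatial Mean)
- For n points (x_i, y_i), the two means of the
coordinates define the location of the mean center as
$$(\bar{x}_{mc}, \bar{y}_{mc}) = \left( \frac{1}{n}\sum_{i=1}^{n} x_i,\; \frac{1}{n}\sum_{i=1}^{n} y_i \right)$$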
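4Central tendency -- Weighted Mean Center
- The two weighted means of the coordinates define the
location of the weighted mean center as
$$(\bar{x}_{wmc}, \bar{y}_{wmc}) = \left( \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},\; \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right)$$
- where $w_i$ is the weight at point i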
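The mean center and weighted mean center can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function names and the sample points and weights are made up for the example.

```python
def mean_center(points):
    """Return the mean center of a list of (x, y) tuples."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Return the mean center with each point weighted by its attribute value."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)


# Hypothetical data: four corners of a 4 x 2 rectangle.
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(mean_center(pts))                         # (2.0, 1.0)
print(weighted_mean_center(pts, [1, 1, 3, 3]))  # (2.0, 1.5)
```

Giving the two upper points three times the weight pulls the center upward from y = 1.0 to y = 1.5 while x is unchanged.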
5Point Pattern Descriptors
- Dispersion and Orientation
- Standard distance
- Weighted standard distance
- Standard deviational ellipse
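6Dispersion and Orientation -- Standard Distance
- How points deviate from the mean center
- Recall the population standard deviation
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
- The standard distance is its two-dimensional analogue
$$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x}_{mc})^2 + \sum_{i=1}^{n} (y_i - \bar{y}_{mc})^2}{n}}$$
- where $(\bar{x}_{mc}, \bar{y}_{mc})$ is the mean center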
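7Dispersion and Orientation -- Weighted Standard
Distance
- Points may have different attribute values that
reflect their relative importance
$$SD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_{wmc})^2 + \sum_{i=1}^{n} w_i (y_i - \bar{y}_{wmc})^2}{\sum_{i=1}^{n} w_i}}$$
- where $(\bar{x}_{wmc}, \bar{y}_{wmc})$ is the weighted mean center
and $w_i$ is the weight at point i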
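A small sketch of the standard distance, with the weighted version as an option. It assumes the population form (division by n, or by the sum of the weights); the function name and sample data are illustrative only.

```python
import math


def standard_distance(points, weights=None):
    """Standard distance around the (weighted) mean center.

    Assumes the population form: the squared deviations are divided by n,
    or by the sum of the weights when weights are given.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    ss = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
             for (x, y), w in zip(points, weights))
    return math.sqrt(ss / total)


pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
sd_plain = standard_distance(pts)                   # sqrt(5)
sd_weighted = standard_distance(pts, [1, 1, 3, 3])  # sqrt(4.75)
print(round(sd_plain, 4), round(sd_weighted, 4))
```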
8Dispersion and Orientation -- Standard
Deviational Ellipse
- Standard distance is a good single measure of the
dispersion of the incidents around the mean
center, but it does not capture any directional
bias
- The standard deviational ellipse gives dispersion
in two dimensions and is defined by 3 parameters
- Angle of rotation
- Dispersion along major axis
- Dispersion along minor axis
9Dispersion and Orientation -- Standard
Deviational Ellipse
- Basic concept is to
- Find the axis going through maximum dispersion
(thus derive the angle of rotation)
- Calculate the standard deviation of the points along
this axis (thus derive the length of the major axis)
- Calculate the standard deviation of the points along the
axis perpendicular to the major axis (thus derive the
length of the minor axis)
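The three steps above can be sketched as follows. This is one possible approach, not necessarily the formula used in the original slides: it takes the axis of maximum dispersion from the coordinate variances and covariance (the principal-axis angle), then measures the standard deviation along that axis and the perpendicular one. The angle here is assumed to be measured counterclockwise from the x axis, whereas some textbooks measure it clockwise from north.

```python
import math


def standard_deviational_ellipse(points):
    """Return (angle_deg, sd_major, sd_minor) for a list of (x, y) points.

    Assumptions: population variances (divide by n); angle measured
    counterclockwise from the x axis, reported in [0, 180).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Direction of maximum variance (principal axis of the 2 x 2 covariance).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    def sd_along(angle):
        c, s = math.cos(angle), math.sin(angle)
        var = sxx * c * c + 2.0 * sxy * s * c + syy * s * s
        return math.sqrt(max(var, 0.0))  # guard against tiny negative round-off

    return (math.degrees(theta) % 180.0,
            sd_along(theta),
            sd_along(theta + math.pi / 2.0))


# Hypothetical points lying exactly on the line y = x.
angle, major, minor = standard_deviational_ellipse(
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
print(angle, major, minor)
```

For points on the line y = x the major axis is rotated about 45 degrees and the dispersion along the minor axis is essentially zero.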
10Statistical Methods in GIS
- Point pattern analyzers
- Location information only
- Line pattern analyzers
- Location and attribute information
- Polygon pattern analyzers
- Location and attribute information
11POINT PATTERN ANALYZERS
- Two primary approaches
- Quadrat Analysis
- based on observing the frequency distribution or
density of points within a set of grids
- Nearest Neighbor Analysis
- based on distances between points
12Quadrat Analysis (QA)
- Point Density approach
- The density measured by QA is compared with that of
a random pattern
[Figure: example point patterns labeled random, clustered, and uniform/dispersed]
13Quadrat Analysis (QA)
[Figure: quadrat placement by exhaustive census and by random sampling]
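14Quadrat Analysis (QA)
- Apply a uniform or random grid over the area (A) with
the size of quadrats given by
$$\text{quadrat size} = \frac{2A}{r}$$
- where r is the number of points
- width of a square quadrat is $\sqrt{2A/r}$
- radius of a circular quadrat is $\sqrt{2A/(\pi r)}$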
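As a quick numeric check of the quadrat sizing rule, the sketch below assumes the commonly cited rule of thumb that the quadrat area is 2A/r. The function name and the example values (A = 100, r = 50) are hypothetical.

```python
import math


def quadrat_dimensions(area, n_points):
    """Quadrat size under the assumed rule of thumb: size = 2 * A / r."""
    size = 2.0 * area / n_points
    return {
        "size": size,                                # area of one quadrat
        "square_width": math.sqrt(size),             # side of a square quadrat
        "circle_radius": math.sqrt(size / math.pi),  # radius of a circular quadrat
    }


dims = quadrat_dimensions(area=100.0, n_points=50)
print(dims)
```

With 50 points in an area of 100, each quadrat covers 4 area units, so a square quadrat is 2 units wide.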
15Quadrat Analysis (QA) -- Frequency distribution
comparison
- Treat each cell as an observation and count the
number of points within it
- Compare observed frequencies in the quadrats with
expected frequencies that would be generated by
- a random process (modeled by the Poisson
distribution)
- a clustered process (e.g. one cell with r
points, n-1 cells with 0 points) (n = number of
quadrats)
- a uniform process (e.g. each cell has r/n
points)
- The standard Kolmogorov-Smirnov (K-S) test for
comparing two frequency distributions can then be
applied
16Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test
- The test statistic D is simply given by
$$D = \max_i \left| O_i - E_i \right|$$
- where $O_i$ and $E_i$ are the observed and expected
cumulative proportions of the ith category in the
two distributions
- i.e. the largest difference (irrespective of
sign) between observed cumulative frequency and
expected cumulative frequency
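To make the D statistic concrete, the sketch below compares observed quadrat counts with a Poisson distribution whose mean is the average count per quadrat. The categories are the counts 0, 1, 2, ... up to the largest observed count; that choice of categories and the function name are assumptions made for illustration.

```python
import math
from collections import Counter


def ks_d_poisson(counts):
    """Largest gap between observed and Poisson cumulative proportions.

    counts: number of points found in each quadrat.
    Assumes the Poisson mean is estimated by the mean count per quadrat.
    """
    n = len(counts)
    lam = sum(counts) / n
    freq = Counter(counts)
    d = obs_cum = exp_cum = 0.0
    for k in range(max(counts) + 1):
        obs_cum += freq.get(k, 0) / n
        exp_cum += math.exp(-lam) * lam ** k / math.factorial(k)
        d = max(d, abs(obs_cum - exp_cum))
    return d


# Hypothetical clustered pattern: all 4 points fall in one of 4 quadrats.
d_clustered = ks_d_poisson([0, 0, 0, 4])
print(round(d_clustered, 4))
```

In this example the largest gap occurs at the zero-count category, where 75% of quadrats are empty but the Poisson model expects only about 37%.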
17Kolmogorov-Smirnov Test (1)
- A. Situations in which the control and treatment
groups do not differ in mean, but only in some
other way. For example, consider the datasets
- controlA = {0.22, -0.87, -2.39, -1.79, 0.37,
-1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17,
-0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50,
-0.09}
- treatmentA = {-5.13, -2.19, -2.43, -3.83, 0.50,
-3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87,
-3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}
18Kolmogorov-Smirnov Test (1)
- There are then a few situations in which it is a
mistake to trust the results of a t-test
- Notice that both datasets are approximately
balanced around zero; evidently the mean in both
cases is "near" zero. However, there is
substantially more variation in the treatment
group, which ranges approximately from -6 to 6,
whereas the control group ranges approximately
from -2½ to 2½. The datasets are different, but
the t-test cannot see the difference.
19Kolmogorov-Smirnov Test (1)
20Kolmogorov-Smirnov Test (1)
- The percentile plot of this data (in red), along
with the behavior expected for the above
lognormal distribution (in blue)
21Kolmogorov-Smirnov Test (2)
- Situations in which the treatment and control
groups are smallish datasets (say 20 items each)
that differ in mean, but a substantially non-normal
distribution masks the difference. For example,
consider the datasets
- controlB = {1.26, 0.34, 0.70, 1.75, 50.57, 1.55,
0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95, 0.24,
1.37, 0.17, 6.98, 0.10, 0.94, 0.38}
- treatmentB = {2.37, 2.16, 14.82, 1.73, 41.04,
0.23, 1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51,
4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19}
- These datasets were drawn from lognormal
distributions that differ substantially in mean.
The K-S test detects this difference; the t-test
does not. Of course, if the user knew that the
data were non-normally distributed, s/he would
know not to apply the t-test in the first place.
22Kolmogorov-Smirnov Test (2)
- Sorted controlB = {0.08, 0.10, 0.15, 0.17, 0.24,
0.34, 0.38, 0.42, 0.49, 0.50, 0.70, 0.94, 0.95,
1.26, 1.37, 1.55, 1.75, 3.20, 6.98, 50.57}
23Kolmogorov-Smirnov Test (2)
24Kolmogorov-Smirnov Test (2)
25Kolmogorov-Smirnov Test (2)
26Kolmogorov-Smirnov Test (2)
- The percentile plot of this data (in red), along
with the behavior expected for the above
lognormal distribution (in blue)
27Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test
- The critical value at the 5% level is given by
$$D_{\alpha = 0.05} = \frac{1.36}{\sqrt{n}}$$
- where n is the number of quadrats
- in a two-sample case,
$$D_{\alpha = 0.05} = 1.36 \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$$
- where $n_1$ and $n_2$ are the
numbers of quadrats in the two sets of
distributions
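A minimal sketch of the 5% critical values, assuming the usual large-sample coefficient of 1.36. The function names and the quadrat counts in the example are hypothetical.

```python
import math


def ks_critical_one_sample(n, coeff=1.36):
    """Approximate 5% critical value of D for n quadrats (assumed coefficient 1.36)."""
    return coeff / math.sqrt(n)


def ks_critical_two_sample(n1, n2, coeff=1.36):
    """Approximate 5% critical value of D when comparing two observed distributions."""
    return coeff * math.sqrt((n1 + n2) / (n1 * n2))


crit_one = ks_critical_one_sample(100)
crit_two = ks_critical_two_sample(50, 50)
print(round(crit_one, 3), round(crit_two, 3))
```

An observed D larger than the critical value suggests that the two distributions differ at the 5% level.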
28Quadrat Analysis -- Variance-Mean Ratio (VMR)
- Test if the observed pattern is different from a
random pattern (generated from a Poisson
distribution, for which mean = variance)
- Treat each cell as an observation and count the
number of points within it, to create the
variable X
- Calculate the variance and mean of X, and create the
variance-to-mean ratio: VMR = variance / mean
29Quadrat Analysis -- Variance-Mean Ratio (VMR)
- For a uniform distribution, the variance is
zero.
- we expect a variance-mean ratio close to 0
- For a random distribution, the variance and mean
are the same.
- we expect a variance-mean ratio around 1
- For a clustered distribution, the variance is
relatively large
- we expect a variance-mean ratio above 1
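30Significance Test for VMR
- the mean of the observed distribution
$$\bar{x} = \frac{\sum_i n_i x_i}{n}$$
- the variance of the observed distribution
$$s^2 = \frac{\sum_i n_i (x_i - \bar{x})^2}{n - 1}$$
- where $x_i$ is the number
of points in a quadrat, $n_i$ is the number of
quadrats with $x_i$ points, and n is the total
number of quadrats
- under a random pattern the VMR is expected to be 1 with
standard error $\sqrt{2/(n-1)}$, so a standard test statistic is
$$t = \frac{VMR - 1}{\sqrt{2/(n-1)}}$$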
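A sketch of the variance-mean ratio and one commonly used significance test. Two assumptions are made here that the transcript does not confirm: the variance uses the n - 1 denominator, and the test statistic is (VMR - 1) divided by the standard error sqrt(2 / (n - 1)). The function name and data are illustrative.

```python
import math


def vmr_test(counts):
    """Return (mean, variance, vmr, t) for a list of quadrat counts.

    Assumptions: sample variance with n - 1 in the denominator, and
    t = (VMR - 1) / sqrt(2 / (n - 1)) as the test statistic.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    vmr = variance / mean
    t = (vmr - 1.0) / math.sqrt(2.0 / (n - 1))
    return mean, variance, vmr, t


# Hypothetical clustered pattern: all 4 points in one of 4 quadrats.
mean, variance, vmr, t = vmr_test([0, 0, 0, 4])
print(mean, variance, vmr, round(t, 3))

# Hypothetical uniform pattern: one point per quadrat gives VMR = 0.
uniform_vmr = vmr_test([1, 1, 1, 1])[2]
```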
31Weakness of Quadrat Analysis
- Results may depend on quadrat size and
orientation
- Is a measure of dispersion, and not really
pattern, because it is based primarily on the
density of points, and not their arrangement in
relation to one another
- Results in a single measure for the entire
distribution, so variations within the region are
not recognized (could have clustering locally in
some areas, but not overall)
32Weakness of Quadrat Analysis
- For example, quadrat analysis cannot distinguish
between these two, obviously different, patterns
33Nearest-Neighbor Index (NNI)
- Uses distances between points as its basis.
- Compares the observed average distance between
each point and its nearest neighbor with the
expected average distance that would occur if the
distribution were random
- NNI = r_obs / r_exp
- For a random pattern, NNI = 1
- For a clustered pattern, NNI < 1
- For a dispersed pattern, NNI > 1
34Nearest-Neighbor Index (NNI) Significance test
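The index and a z-score can be sketched as below. The expected distance and standard error are the commonly quoted values for a random pattern of n points in an area A, r_exp = 0.5 * sqrt(A / n) and SE = 0.26136 / sqrt(n^2 / A); these are assumptions here, since the slide's own formulas did not survive. No edge correction is applied, and the function name and points are made up.

```python
import math


def nearest_neighbor_index(points, area):
    """Return (nni, z) for a list of (x, y) points in a study area of given size.

    Assumes r_exp = 0.5 * sqrt(A / n) and SE = 0.26136 / sqrt(n**2 / A),
    with no adjustment for edge effects.
    """
    n = len(points)
    nearest = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    r_obs = sum(nearest) / n
    r_exp = 0.5 * math.sqrt(area / n)
    se = 0.26136 / math.sqrt(n * n / area)
    return r_obs / r_exp, (r_obs - r_exp) / se


# Hypothetical dispersed pattern: corners of a unit square in an area of 4.
nni, z = nearest_neighbor_index(
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)], area=4.0)
print(nni, round(z, 3))
```

Here every nearest-neighbor distance is 1 while the expected distance is 0.5, so the index is 2, indicating dispersion.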
36Nearest-Neighbor Index (NNI)
- Advantages
- NNI takes into account distance
- No quadrat size problem to be concerned with
- However, NNI is not as good as it might appear:
- Index is highly dependent on the boundary for the
area: its size and its shape (perimeter)
- Fundamentally based on only the mean distance
- Does not incorporate local variations (could have
clustering locally in some areas, but not
overall)
- Based on point location only and does not
incorporate the magnitude of the phenomenon at that point
37Nearest-Neighbor Index (NNI)
- An adjustment for edge effects is available, but
it does not solve all the problems
38Nearest-Neighbor Index (NNI)
- Some alternatives to the NNI are
- the G and F functions, based on the entire
frequency distribution of nearest neighbor
distances, and
- the K function, based on all interpoint distances.
39Spatial Autocorrelation
- Most statistical analyses are based on the
assumption that the values of observations in
each sample are independent of one another
- Positive spatial autocorrelation violates this,
because samples taken from nearby areas are
related to each other and are not independent
40Spatial Autocorrelation
- In ordinary least squares regression (OLS), for
example, the correlation coefficients will be
biased and their precision exaggerated
- Bias implies correlation coefficients may be
higher than they really are
- They are biased because the areas with higher
concentrations of events will have a greater
impact on the model estimate
- Exaggerated precision (lower standard error)
implies they are more likely to be found
statistically significant
- They will overestimate precision because, since
events tend to be concentrated, there are
actually fewer independent
observations than is being assumed.
41Spatial Autocorrelation
- Several measures available
- Join Count Statistic
- Moran's I
- Geary's Ratio C
- General (Getis-Ord) G
- Anselin's Local Index of Spatial Autocorrelation
(LISA)
- These are discussed later
42LINE PATTERN ANALYZERS
- Two general types of linear features
- Vectors (lines with arrows)
- Networks
- Spatial attributes of linear features
- Length
- Orientation and Direction
- Spatial attribute of network features
- Connectivity or Topology
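43Spatial Attributes of Linear Features -- Length
[Figure: right triangle with vertices (x1, y1), (x1, y2), and (x2, y2); legs a and b, hypotenuse c]
- Length of the straight line from (x1, y1) to (x2, y2)
$$c = \sqrt{a^2 + b^2} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$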
44Spatial Attributes of Linear Features -- Length
- Great circle distance D between locations A and B
$$\cos D = \sin a \sin b + \cos a \cos b \cos \Delta\lambda$$
- where
- a and b are the latitude readings of locations A
and B
- $\Delta\lambda$ is the absolute difference in longitude
between A and B
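The great circle formula yields D as an angle, and multiplying by the Earth's radius gives a length. The sketch below uses the spherical law of cosines with an assumed mean Earth radius of 6371 km; the function name and test coordinates are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat_a, lon_a, lat_b, lon_b, radius=EARTH_RADIUS_KM):
    """Great circle distance from latitudes/longitudes given in degrees."""
    a, b = math.radians(lat_a), math.radians(lat_b)
    dlon = math.radians(abs(lon_a - lon_b))
    cos_d = (math.sin(a) * math.sin(b)
             + math.cos(a) * math.cos(b) * math.cos(dlon))
    cos_d = max(-1.0, min(1.0, cos_d))  # clamp round-off before acos
    return radius * math.acos(cos_d)


# Two points on the equator, 90 degrees of longitude apart:
# a quarter of the Earth's circumference.
quarter = great_circle_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```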
45Spatial Attributes of Linear Features --
Orientation and Direction
- Orientation
- Directional
- e.g. West-East orientation
- Non-directional (no distinction between "from" and "to")
- e.g. to describe a fault line,
"from location y to location x" = "from location x to
location y"
- Direction
- Dependent on the beginning and ending locations
- "from location y to location x" ≠ "from location x to
location y"
46Directional Statistics -- Directional Mean
- Directional mean: the average direction of a set of
vectors
47Directional Statistics -- Directional Mean
[Figure: a vector on X-Y axes with its direction angle θ]
48Directional Statistics -- Circular Variance
- Shows the angular variability of the set of
vectors
[Figure: a set of vectors on X-Y axes]
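49Directional Statistics -- Circular Variance
- For a set of n vectors with directions $\theta_1, \dots, \theta_n$,
$$S_v = 1 - \frac{R}{n}, \qquad R = \sqrt{\left(\sum_{i=1}^{n} \sin\theta_i\right)^2 + \left(\sum_{i=1}^{n} \cos\theta_i\right)^2}$$
- where R is the length of the resultant of the n unit vectors
- $S_v = 0$: all vectors have the same direction
(no circular variability)
- $S_v = 1$: all vectors are in opposite
directions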
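Directional mean and circular variance can be computed together from the sines and cosines of the vector directions. In this sketch each vector is treated as a unit vector and angles are assumed to be in degrees, measured counterclockwise from the X axis; both are assumptions of the example, as is the function name.

```python
import math


def directional_stats(angles_deg):
    """Return (directional_mean_deg, circular_variance) for a set of directions.

    Assumes unit vectors and angles in degrees, counterclockwise from the X axis.
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    resultant = math.hypot(s, c)                      # length of the resultant vector
    mean_dir = math.degrees(math.atan2(s, c)) % 360.0
    circ_var = 1.0 - resultant / len(angles_deg)
    return mean_dir, circ_var


mean_dir, circ_var = directional_stats([0.0, 90.0])
print(round(mean_dir, 6), round(circ_var, 4))

# Two vectors pointing in opposite directions cancel out.
opposite_var = directional_stats([0.0, 180.0])[1]
```

Using atan2 rather than a plain arctangent keeps the mean direction in the correct quadrant.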
50Network Analysis
- Connectivity: how different links are connected
- Vertices: junctions or nodes
- Links/edges: the lines joining the vertices
51Connectivity Matrix (C)
- $C_{ij} = 1$ if there is a direct connection between i and j
- $C_{ij} = 0$ otherwise
52Connectivity Matrix (C)
- $C^1$: direct connections
- $C^2$: number of 2-step paths from i to j
- Example: from i to k to j is a 2-step path with
one intermediate vertex k
- $C^3$: number of 3-step paths from i to j
- Example: from i to k to m to j is a 3-step path
with two intermediate vertices
53Network as a matrix
$$C^2 = C^1 \times C^1, \quad C^3 = C^2 \times C^1, \quad C^4 = C^3 \times C^1, \quad C^5 = C^4 \times C^1, \quad \dots$$
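Repeated multiplication of the connectivity matrix can be illustrated on a tiny network. The three-vertex chain below is a made-up example; note that the counts include routes that revisit a vertex (for instance 1 to 2 and back to 1 counts as a 2-step path from vertex 1 to itself).

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Hypothetical chain network: vertex 1 -- vertex 2 -- vertex 3.
c1 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]
c2 = matmul(c1, c1)  # entry (i, j) = number of 2-step paths from i to j
c3 = matmul(c2, c1)  # entry (i, j) = number of 3-step paths from i to j
print(c2)
print(c3)
```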
54Minimally connected network
- Each vertex is connected to the network, and
there are no superfluous linkages
- The minimum number of edges needed to create a
network is V-1, one less than the number of
vertices in the network, i.e. $e_{min} = V - 1 = 5$ when V = 6
55Maximally connected network
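- Nonplanar
- the maximum number of edges is
- $e_{max} = V(V-1)$ when each link is directional (i to j differs from j to i)
- $e_{max} = V(V-1)/2$ when links are non-directional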
56Maximally connected network
- Planar
- the maximum number of edges is $e_{max} = 3(V-2)$
57Gamma Index
- Gamma index provides a useful basic ratio for
evaluating the relative connectivity of an entire
network
- Ratio between the number of edges actually in a
given network and the maximum number possible in
that network
- $\gamma$ = actual edges / maximum edges
- for a minimally connected (planar) network,
- $\gamma = (V-1) / [3(V-2)]$
58Alpha Index
- compares the number of actual (fundamental)
"circuits" with the maximum number of all
possible fundamental circuits
- $\alpha = (E - V + 1) / (2V - 5)$, where $2V - 5$ is the
maximum number of fundamental circuits
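Both indices are simple ratios, sketched here for planar networks (maximum edges 3(V - 2), maximum fundamental circuits 2V - 5). The function names and the two example networks are hypothetical.

```python
def gamma_index(edges, vertices):
    """Actual edges divided by the planar maximum 3 * (V - 2)."""
    return edges / (3 * (vertices - 2))


def alpha_index(edges, vertices):
    """Actual fundamental circuits (E - V + 1) divided by the planar maximum 2V - 5."""
    return (edges - vertices + 1) / (2 * vertices - 5)


# Minimally connected network with 6 vertices and 5 edges: no circuits.
print(gamma_index(5, 6), alpha_index(5, 6))

# Fully connected planar network with 4 vertices and 6 edges.
print(gamma_index(6, 4), alpha_index(6, 4))
```

Both indices run from low values for sparse networks up to 1 for a maximally connected planar network.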
59Diameter
- the number of linkages or steps needed to connect
the two most remote nodes in the network
- the better connected the network, the lower the
diameter
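One way to compute the diameter is a breadth-first search from every vertex, keeping the largest number of steps found. This sketch assumes a connected, undirected network stored as a dictionary of neighbor lists; the function name and the example networks are made up.

```python
from collections import deque


def network_diameter(adjacency):
    """Largest shortest-path step count between any two vertices.

    adjacency: dict mapping each vertex to its neighbors.
    Assumes the network is connected and undirected.
    """
    def farthest_steps(start):
        steps = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for u in adjacency[v]:
                if u not in steps:
                    steps[u] = steps[v] + 1
                    queue.append(u)
        return max(steps.values())

    return max(farthest_steps(v) for v in adjacency)


chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(network_diameter(chain), network_diameter(ring))
```

Adding the single link between d and a turns the chain into a ring and lowers the diameter from 3 to 2.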
60POLYGON PATTERN ANALYZERS
- We will discuss the use of spatial statistics to
describe and measure spatial patterns formed by
geographic objects that are associated with areas
or polygons.
61Spatial Autocorrelation (SA) -- Spatial Weights
Matrices
- SA measures the degree of sameness of attribute
values among areal units (or polygons) within
their neighborhood
- Different ways of specifying spatial relationships
62Neighborhood Definitions -- Adjacency Criterion
- Immediate (first-order) neighbors of X
- Rook's case
63Neighborhood Definitions -- Binary Connectivity
Matrix
- C = connectivity matrix with elements $c_{ij}$,
- $c_{ij} = 1$ if the ith polygon is adjacent to the jth
polygon
- $c_{ij} = 0$ if the ith polygon is NOT adjacent to the
jth polygon
- Symmetrical: $c_{ij} = c_{ji}$
- Not efficient
64Neighborhood Definitions -- Stochastic Matrix
- Row-standardized matrix (stochastic matrix)
- Assume each neighbor exerts the same amount of
influence
- W = spatial weights matrix with elements $w_{ij}$,
$$w_{ij} = \frac{c_{ij}}{\sum_{j} c_{ij}}$$
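Row standardization divides each entry of the binary connectivity matrix by its row sum, so every unit's neighbors share a total weight of 1. A minimal sketch, with a made-up three-unit chain as input:

```python
def row_standardize(c):
    """Turn a binary connectivity matrix into a row-standardized weights matrix."""
    w = []
    for row in c:
        total = sum(row)
        # A unit with no neighbors keeps a row of zeros.
        w.append([value / total if total else 0.0 for value in row])
    return w


# Hypothetical chain of three areal units: 1 -- 2 -- 3.
c = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
w = row_standardize(c)
print(w)  # [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
```

Unlike the binary matrix, the result is generally not symmetric: unit 1 gives unit 2 a weight of 1, while unit 2 gives unit 1 only 0.5.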
65Neighborhood Definitions -- Distance between
polygon centroids
- For example,
- Within a radius of 1 mile
- Adjacency measure is just a binary representation
of the distance measure
- 1 = zero distance between two neighboring units
66Spatial Weights Matrices -- Centroid Distances
- $d_{ij}$ represents the distance between areal units i
and j
- Weight: $w_{ij} = 1/d_{ij}$
- Inversely proportional to the distance
- Weight: e.g. $w_{ij} = 1/d_{ij}^2$
- Distance decay: spatial relationships diminish
more than just proportionally to the distance
67Space as a matrix
- W where wij is some measure of interaction
- adjacency
- decreasing function of distance
- invariant under rotation, displacement
- readily obtained from a GIS
68Spatial Autocorrelation (SA)
- Univariate: handle one variable and evaluate how
that variable is correlated over space
- Several measures available
- Global measures: SA is assumed stable across the study
region
- Join Count Statistic: measures the magnitude of
SA among polygons with binary nominal data
- Moran's I Index
- Geary's Ratio C
- G statistic
- The last three are for interval or ratio data
69Spatial Autocorrelation (SA)
- Several measures available
- Local measures: SA may not be stable over the study
region
- Local version of the G statistic
- Local Index of Spatial Autocorrelation (LISA):
local version of Moran's I and Geary's Ratio C
70Spatial Autocorrelation (SA) -- Join Count
Statistics
- Binary attribute data (B = black, W = white)
- WW
- BW
- BB
- Compare the observed numbers of joins of various
types (BB, WW, BW) with those expected from a
random pattern
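Counting the observed joins is the first step of the join count statistic. The sketch below only tallies BB, WW, and BW joins from a list of adjacent pairs; the expected counts and their variances under a random pattern are not covered. The function name, the 2 x 2 grid, and the rook's-case joins are made up for illustration.

```python
def join_counts(values, joins):
    """Tally BB, WW, and BW joins.

    values: list of "B" or "W", one per areal unit.
    joins: list of (i, j) index pairs, each adjacent pair listed once.
    """
    counts = {"BB": 0, "WW": 0, "BW": 0}
    for i, j in joins:
        if values[i] != values[j]:
            counts["BW"] += 1
        elif values[i] == "B":
            counts["BB"] += 1
        else:
            counts["WW"] += 1
    return counts


# Hypothetical 2 x 2 grid, units numbered row by row:
#   0(B) 1(B)
#   2(W) 3(W)
values = ["B", "B", "W", "W"]
rook_joins = [(0, 1), (2, 3), (0, 2), (1, 3)]
observed = join_counts(values, rook_joins)
print(observed)  # {'BB': 1, 'WW': 1, 'BW': 2}
```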
71Applications of the W matrix
- Spatial regression
- add spatially lagged terms weighted by W
- Anselin's SPACESTAT
- Moran and Geary indices of spatial dependence
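72Global spatial autocorrelation statistic --
Moran's I
$$I = \frac{n \sum_{i}\sum_{j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_{i} (x_i - \bar{x})^2}$$
- where
- $x_i$ is the value of the interval or ratio variable in
areal unit i,
- W is the sum of all elements of the spatial
weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and
- n is the number of areal units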
73Global spatial autocorrelation statistic --
Moran's I
- I ranges from -1 to 1
- If no spatial autocorrelation exists, $E(I) = -1/(n-1)$
- $E(I) < 0$
- inversely related to n
- Z-test
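Moran's I can be sketched directly from its definition: n times the weighted sum of cross-products of deviations, divided by the sum of weights times the sum of squared deviations. The example uses a made-up chain of four areal units with binary adjacency weights; the variance needed for the Z-test is not included.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a weights matrix w (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    w_sum = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return n * cross / (w_sum * sum(d * d for d in dev))


# Hypothetical chain of four units with steadily increasing values.
x = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
i_obs = morans_i(x, w)
i_expected = -1.0 / (len(x) - 1)  # expected value with no spatial autocorrelation
print(round(i_obs, 4), round(i_expected, 4))
```

The observed value (about 0.33) lies above the expected value (about -0.33), consistent with similar values sitting next to each other.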
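74Global spatial autocorrelation statistic --
Geary's Ratio
$$C = \frac{(n-1) \sum_{i}\sum_{j} w_{ij} (x_i - x_j)^2}{2W \sum_{i} (x_i - \bar{x})^2}$$
- where
- $x_i$ is the value of the interval or ratio variable in
areal unit i,
- W is the sum of all elements of the spatial
weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and
- n is the number of areal units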
75Global spatial autocorrelation statistic --
Geary's Ratio
- C ranges from 0 to 2
- C = 0 indicates perfect positive spatial
autocorrelation, when all neighboring values are
the same
- C = 2 indicates extremely negative spatial
autocorrelation
- E(C) = 1, not affected by n
- Z-test
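Geary's Ratio uses squared differences between neighboring values instead of cross-products of deviations. A minimal sketch on a made-up chain of four units with binary adjacency weights; the variance for the Z-test is not included.

```python
def gearys_c(x, w):
    """Geary's Ratio C for values x and a weights matrix w (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    w_sum = sum(sum(row) for row in w)
    diff = sum(w[i][j] * (x[i] - x[j]) ** 2
               for i in range(n) for j in range(n))
    return (n - 1) * diff / (2.0 * w_sum * sum((v - mean) ** 2 for v in x))


# Hypothetical chain of four units with steadily increasing values.
x = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
c_obs = gearys_c(x, w)
print(round(c_obs, 4))
```

The result (0.3) is well below the expected value of 1, pointing to positive spatial autocorrelation.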
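76Global spatial autocorrelation statistic --
General G Statistic
- Moran's I and Geary's C cannot tell clusters of high
values (HH) from clusters of low values (LL), as
they are concerned only with whether neighboring
values are similar or not
- The general G-statistic
$$G(d) = \frac{\sum_{i}\sum_{j} w_{ij}(d)\, x_i x_j}{\sum_{i}\sum_{j} x_i x_j}, \quad i \neq j$$
- where $w_{ij}(d) = 1$ if areal unit j is within d of
areal unit i; otherwise $w_{ij}(d) = 0$.
- Z-test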
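The general G statistic is the share of all pairwise products x_i * x_j (i not equal to j) that comes from pairs within distance d of each other. In this sketch the binary matrix w(d) is supplied directly rather than derived from coordinates; the function name and data are hypothetical.

```python
def general_g(x, w):
    """General G(d) for values x and a binary within-distance matrix w."""
    n = len(x)
    near = sum(w[i][j] * x[i] * x[j]
               for i in range(n) for j in range(n) if i != j)
    everything = sum(x[i] * x[j]
                     for i in range(n) for j in range(n) if i != j)
    return near / everything


# Hypothetical chain of four units; only immediate neighbors are within d.
x = [1.0, 2.0, 3.0, 4.0]
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
g_obs = general_g(x, w)
print(round(g_obs, 4))
```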
77Local spatial autocorrelation statistic -- LISA
- Local Index of Spatial Autocorrelation (LISA):
local version of Moran's I and Geary's Ratio C
- Local Moran statistic for areal unit i
$$I_i = z_i \sum_{j} w_{ij} z_j$$
- where $z_i = (x_i - \bar{x})/s$ and s is the standard deviation of x
- High: clustering of similar values (all high or
all low)
- Low: clustering of dissimilar values
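A sketch of the local Moran statistic, under assumptions the transcript does not confirm: values are standardized with the population standard deviation, and the weights matrix is row-standardized. The function name and data are made up.

```python
import math


def local_moran(x, w):
    """Local Moran statistic for each areal unit.

    Assumes z-scores based on the population standard deviation and a
    row-standardized weights matrix w.
    """
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    z = [(v - mean) / sd for v in x]
    return [z[i] * sum(w[i][j] * z[j] for j in range(n) if j != i)
            for i in range(n)]


# Hypothetical chain of four units with row-standardized adjacency weights.
x = [1.0, 2.0, 3.0, 4.0]
w = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]
local_i = local_moran(x, w)
print([round(v, 4) for v in local_i])
```

All four values are positive, and the two end units score highest because their only neighbor lies on the same side of the mean as they do.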
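78Local spatial autocorrelation statistic -- LISA
- Local Geary's Ratio C for areal unit i
$$c_i = \sum_{j} w_{ij} (z_i - z_j)^2$$
- where $z_i$ and $z_j$ are the standardized values of x
- Low: clustering of similar values (all high or
all low)
- High: clustering of dissimilar values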
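79Local spatial autocorrelation statistic -- local
G-statistic
- Local G-statistic for areal unit i
$$G_i(d) = \frac{\sum_{j} w_{ij}(d)\, x_j}{\sum_{j} x_j}, \quad j \neq i$$
- Standard scores
$$Z(G_i) = \frac{G_i - E(G_i)}{\sqrt{Var(G_i)}}$$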
80Local spatial autocorrelation statistic -- local
G-statistic
- Interpretation of standard scores for $G_i(d)$
81More Discussions on GIS and Spatial Statistics
82Spatial dependence
- The First Law of Geography (Tobler)
- all things are related, but nearby things are more
related than distant things
- Acceptance of the null hypothesis of no spatial
dependence is always a Type II error
- Hell is a place with no spatial dependence
83It's chilly today in Seattle
Spoken word
Text
Picture
x, y, T
84Spatial heterogeneity
- Uncontrolled variance over the Earth's surface
- There is no average place
- Results depend explicitly on bounds
- Places as samples
- Consider the model
- y a bx
85(No Transcript)