Title: Spatial Statistics

1
Spatial Statistics

Modified from Dr. YU-FEN LI

2
Point Pattern Descriptors

Central tendency
Mean Center (Spatial Mean)
Weighted Mean Center
Median Center (Spatial Median): not widely used
because of its ambiguity
Consider n points $(x_1, y_1), \ldots, (x_n, y_n)$

3
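Central tendency -- Mean Center (Spatial Mean)

The two means of the coordinates define the
location of the mean center as
$(\bar{x}_{mc}, \bar{y}_{mc}) = \left( \frac{\sum_{i=1}^{n} x_i}{n}, \frac{\sum_{i=1}^{n} y_i}{n} \right)$
where $(x_i, y_i)$ are the coordinates of point i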

4
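Central tendency -- Weighted Mean Center

The two weighted means of the coordinates define the
location of the weighted mean center as
$(\bar{x}_{wmc}, \bar{y}_{wmc}) = \left( \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right)$
where $w_i$ is the weight at point i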
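The two centers described on these slides can be computed directly from a list of coordinates. The following is a minimal sketch; the function names, the sample points and the weights are made up for illustration and do not come from the presentation.

```python
def mean_center(points):
    """Unweighted mean center: average x and average y."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Weighted mean center: coordinates averaged with weights w_i."""
    total = sum(weights)
    xw = sum(w * x for (x, _), w in zip(points, weights)) / total
    yw = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (xw, yw)


# Hypothetical data: four corners of a square, with the two upper
# corners weighted three times as heavily as the lower ones.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
weights = [1.0, 1.0, 3.0, 3.0]

mc = mean_center(points)
wmc = weighted_mean_center(points, weights)
print(mc, wmc)
```

With these weights the weighted center is pulled toward the heavier upper corners, while the unweighted center stays at the middle of the square.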

5
Point Pattern Descriptors

Dispersion and Orientation
Standard distance
Weighted standard distance
Standard deviational ellipse

6
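Dispersion and Orientation -- Standard Distance

How points deviate from the mean center
Recall population standard deviation
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
By analogy, the standard distance is
$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x}_{mc})^2 + \sum_{i=1}^{n} (y_i - \bar{y}_{mc})^2}{n}}$
where $(\bar{x}_{mc}, \bar{y}_{mc})$ is the mean center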

7
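Dispersion and Orientation -- Weighted Standard
Distance

Points may have different attribute values that
reflect their relative importance
$SD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_{wmc})^2 + \sum_{i=1}^{n} w_i (y_i - \bar{y}_{wmc})^2}{\sum_{i=1}^{n} w_i}}$
where $(\bar{x}_{wmc}, \bar{y}_{wmc})$ is the weighted mean center
and $w_i$ is the weight at point i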
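Standard distance and its weighted variant can be sketched with one function, since the unweighted case is simply the weighted case with all weights equal to 1. This is an illustrative sketch only: it assumes the population-style form (dividing by n, or by the sum of weights), and the sample data are invented.

```python
import math


def standard_distance(points, weights=None):
    """Root of the (weighted) mean squared distance to the (weighted) mean center."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted case
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    squared = sum(
        w * ((x - cx) ** 2 + (y - cy) ** 2)
        for (x, y), w in zip(points, weights)
    )
    return math.sqrt(squared / total)


# Hypothetical data: corners of a 4 x 4 square.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
sd = standard_distance(points)
sd_w = standard_distance(points, [1.0, 1.0, 3.0, 3.0])
print(sd, sd_w)
```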

8
Dispersion and Orientation -- Standard
Deviational Ellipse

Standard distance is a good single measure of the
dispersion of the incidents around the mean
center, but it does not capture any directional
bias
The standard deviational ellipse gives dispersion
in two dimensions and is defined by 3 parameters
Angle of rotation
Dispersion along major axis
Dispersion along minor axis

9
Dispersion and Orientation -- Standard
Deviational Ellipse

The basic concept is to
Find the axis going through maximum dispersion
(thus derive angle of rotation)
Calculate standard deviation of the points along
this axis (thus derive the length of major axis)

Calculate standard deviation of points along the
axis perpendicular to major axis (thus derive the
length of minor axis)
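The three steps above can be sketched as follows. Note the assumptions: this sketch finds the axis of maximum dispersion from the principal axes of the 2 x 2 coordinate covariance matrix (a closed-form shortcut, not necessarily the formula shown on the original slide), it measures the rotation angle counterclockwise from the positive x-axis, and it divides by n. GIS packages may use a different angle convention or scale the axes differently.

```python
import math


def standard_deviational_ellipse(points):
    """Return (angle in radians, std. dev. along major axis, std. dev. along minor axis)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n

    # Direction of maximum dispersion (major axis).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # Variances along the major and minor axes (eigenvalues of the covariance matrix).
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    major = math.sqrt(half_trace + root)
    minor = math.sqrt(max(half_trace - root, 0.0))
    return theta, major, minor


# Hypothetical data: points lying exactly on a 45-degree line.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
theta, major, minor = standard_deviational_ellipse(points)
print(math.degrees(theta), major, minor)
```

For points on a straight diagonal line, all dispersion lies along the major axis, so the minor-axis dispersion is zero.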

10
Statistical Methods in GIS

Point pattern analyzers
Location information only
Line pattern analyzers
Location + attribute information
Polygon pattern analyzers
Location + attribute information

11
POINT PATTERN ANALYZERS

Two primary approaches
Quadrat Analysis
based on observing the frequency distribution or
density of points within a set of grids
Nearest Neighbor Analysis
based on distances of points

12
Quadrat Analysis (QA)

Point Density approach
The density measured by QA is compared with that of
a random pattern

[Figure: example point patterns -- random, clustered, uniform/dispersed]
13
Quadrat Analysis (QA)
Exhaustive census
Random sampling
14
Quadrat Analysis (QA)

Apply uniform or random grid over area (A) with
size of quadrats given by $2A / r$,
where r = number of points
width of square quadrat is $\sqrt{2A / r}$
radius of circular quadrat is $\sqrt{2A / (\pi r)}$
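As a small worked example of choosing quadrat dimensions, the sketch below assumes the commonly cited rule of thumb that a quadrat should cover twice the average area per point (2A / r). The function name and the sample numbers are illustrative only.

```python
import math


def quadrat_dimensions(area, n_points):
    """Return (quadrat area, width of square quadrat, radius of circular quadrat)."""
    size = 2.0 * area / n_points          # assumed rule of thumb: 2A / r
    width = math.sqrt(size)               # square quadrat with that area
    radius = math.sqrt(size / math.pi)    # circular quadrat with that area
    return size, width, radius


# Hypothetical study area of 100 square units containing 50 points.
size, width, radius = quadrat_dimensions(area=100.0, n_points=50)
print(size, width, radius)
```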

15
Quadrat Analysis (QA) -- Frequency distribution
comparison

Treat each cell as an observation and count the
number of points within it
Compare observed frequencies in the quadrats with
expected frequencies that would be generated by
a random process (modeled by the Poisson
distribution)
a clustered process (e.g. one cell with r
points, n-1 cells with 0 points) (n = number of
quadrats)
a uniform process (e.g. each cell has r/n
points)
The standard Kolmogorov-Smirnov (K-S) test for
comparing two frequency distributions can then be
applied

16
Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test

The test statistic D is simply given by
$D = \max_i |O_i - E_i|$
where $O_i$ and $E_i$ are the observed and expected
cumulative proportions of the ith category in the
two distributions,
i.e. the largest difference (irrespective of
sign) between observed cumulative frequency and
expected cumulative frequency
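To make the comparison concrete, here is a minimal sketch that computes D for a set of quadrat counts against a Poisson (random) pattern. It assumes the Poisson mean is estimated as the average number of points per quadrat; the function names and the counts are hypothetical.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of exactly k points in a quadrat under a Poisson process."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def ks_d_against_poisson(counts):
    """Largest absolute gap between observed and Poisson cumulative proportions."""
    n = len(counts)
    lam = sum(counts) / n           # mean points per quadrat
    freq = Counter(counts)          # number of quadrats holding k points
    observed_cum = 0.0
    expected_cum = 0.0
    d = 0.0
    for k in range(max(counts) + 1):
        observed_cum += freq.get(k, 0) / n
        expected_cum += poisson_pmf(k, lam)
        d = max(d, abs(observed_cum - expected_cum))
    return d


# Hypothetical extreme clustering: 10 quadrats, all 10 points in one of them.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
d = ks_d_against_poisson(counts)
print(d)
```

The resulting D is then compared with a critical value for the chosen significance level.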

17
Kolmogorov-Smirnov Test (Example 1)

A. Situations in which the control and treatment
groups do not differ in mean, but only in some
other way. For example, consider the datasets
controlA = {0.22, -0.87, -2.39, -1.79, 0.37,
-1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17,
-0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50,
-0.09}
treatmentA = {-5.13, -2.19, -2.43, -3.83, 0.50,
-3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87,
-3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}

18
Kolmogorov-Smirnov Test (Example 1)

There are then a few situations in which it is a
mistake to trust the results of a t-test.
Notice that both datasets are approximately
balanced around zero; evidently the mean in both
cases is "near" zero. However, there is
substantially more variation in the treatment
group, which ranges approximately from -6 to 6,
whereas the control group ranges approximately
from -2½ to 2½. The datasets are different, but
the t-test cannot see the difference.

19
Kolmogorov-Smirnov Test (Example 1)
20
Kolmogorov-Smirnov Test (Example 1)

The percentile plot of this data (in red) along
with the behavior expected for the above
lognormal distribution (in blue)

21
Kolmogorov-Smirnov Test (Example 2)

Situations in which the treatment and control
groups are smallish datasets (say 20 items each)
that differ in mean, but a substantially non-normal
distribution masks the difference. For example,
consider the datasets
controlB = {1.26, 0.34, 0.70, 1.75, 50.57, 1.55,
0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95, 0.24,
1.37, 0.17, 6.98, 0.10, 0.94, 0.38}
treatmentB = {2.37, 2.16, 14.82, 1.73, 41.04,
0.23, 1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51,
4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19}
These datasets were drawn from lognormal
distributions that differ substantially in mean.
The KS test detects this difference; the t-test
does not. Of course, if the user knew that the
data were non-normally distributed, s/he would
know not to apply the t-test in the first place.

22
Kolmogorov-Smirnov Test (Example 2)

Sorted controlB = {0.08, 0.10, 0.15, 0.17, 0.24,
0.34, 0.38, 0.42, 0.49, 0.50, 0.70, 0.94, 0.95,
1.26, 1.37, 1.55, 1.75, 3.20, 6.98, 50.57}

23
Kolmogorov-Smirnov Test (Example 2)
24
Kolmogorov-Smirnov Test (Example 2)
25
Kolmogorov-Smirnov Test (Example 2)
26
Kolmogorov-Smirnov Test (Example 2)
The percentile plot of this data (in red) along
with the behavior expected for the above
lognormal distribution (in blue).
27
Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test

The critical value at the 5% level is given by
$D_{0.05} = \frac{1.36}{\sqrt{n}}$
where n is the number of quadrats
in a two-sample case -- $D_{0.05} = 1.36 \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$
where n1 and n2 are the
numbers of quadrats in the two sets of
distributions
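A short sketch of the critical-value calculation follows. It assumes the usual large-sample approximation for the K-S test, in which the coefficient 1.36 corresponds to the 5% level; the function names and quadrat counts are illustrative.

```python
import math


def ks_critical_one_sample(n, coefficient=1.36):
    """Approximate critical D for n quadrats (1.36 assumed for the 5% level)."""
    return coefficient / math.sqrt(n)


def ks_critical_two_sample(n1, n2, coefficient=1.36):
    """Approximate critical D when comparing two sets of n1 and n2 quadrats."""
    return coefficient * math.sqrt((n1 + n2) / (n1 * n2))


one = ks_critical_one_sample(100)
two = ks_critical_two_sample(50, 50)
print(one, two)
```

If the computed D exceeds the critical value, the observed pattern is judged significantly different from the expected one at that level.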

28
Quadrat Analysis -- Variance-Mean Ratio (VMR)

Test if the observed pattern is different from a
random pattern (generated from a Poisson
distribution, for which mean = variance)
Treat each cell as an observation and count the
number of points within it, to create the
variable X
Calculate variance and mean of X, and create the
variance to mean ratio: VMR = variance / mean

29
Quadrat Analysis -- Variance-Mean Ratio (VMR)

For a uniform distribution, the variance is
zero.
We expect a variance-mean ratio close to 0
For a random distribution, the variance and mean
are the same.
We expect a variance-mean ratio around 1
For a clustered distribution, the variance is
relatively large.
We expect a variance-mean ratio above 1
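The three cases can be checked numerically with a few lines of code. This sketch assumes the sample variance (dividing by n - 1); the quadrat counts are invented to show the two extremes.

```python
def variance_mean_ratio(counts):
    """VMR of the number of points per quadrat."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)  # sample variance assumed
    return variance / mean


# Hypothetical quadrat counts, 16 points over 8 quadrats in both cases.
uniform = [2, 2, 2, 2, 2, 2, 2, 2]        # every quadrat holds the same number
clustered = [0, 0, 0, 0, 0, 0, 0, 16]     # all points fall in one quadrat

vmr_uniform = variance_mean_ratio(uniform)
vmr_clustered = variance_mean_ratio(clustered)
print(vmr_uniform, vmr_clustered)
```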

30
Significance Test for VMR

The mean of the observed distribution is
$\bar{x} = \frac{\sum_i n_i x_i}{n}$, where $x_i$ is the number
of points in a quadrat, $n_i$ is the number of
quadrats with $x_i$ points, and n is the total
number of quadrats

31
Weakness of Quadrat Analysis

Results may depend on quadrat size and
orientation
Is a measure of dispersion, and not really
pattern, because it is based primarily on the
density of points, and not their arrangement in
relation to one another
Results in a single measure for the entire
distribution, so variations within the region are
not recognized (could have clustering locally in
some areas, but not overall)

32
Weakness of Quadrat Analysis

For example, quadrat analysis cannot distinguish
between these two, obviously different, patterns

33
Nearest-Neighbor Index (NNI)

Uses distances between points as its basis.
Compares the observed average distance between
each point and its nearest neighbor with the
expected average distance that would occur if the
distribution were random
$NNI = \bar{r}_{obs} / \bar{r}_{exp}$
For random pattern, NNI = 1
For clustered pattern, NNI < 1
For dispersed pattern, NNI > 1
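A minimal sketch of the index is given below. It assumes the standard expected mean nearest-neighbor distance for a random pattern, 0.5 * sqrt(A / n), which is not spelled out in this transcript, and it ignores edge effects. The point sets are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Observed mean nearest-neighbor distance divided by the expected one."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        nearest.append(
            min(
                math.hypot(xi - xj, yi - yj)
                for j, (xj, yj) in enumerate(points)
                if j != i
            )
        )
    r_obs = sum(nearest) / n
    r_exp = 0.5 * math.sqrt(area / n)  # assumed expectation under randomness
    return r_obs / r_exp


# Hypothetical regular 3 x 3 grid inside a 3 x 3 study area (dispersed).
grid = [(i + 0.5, j + 0.5) for i in range(3) for j in range(3)]
nni_grid = nearest_neighbor_index(grid, area=9.0)

# Hypothetical tight cluster inside a 10 x 10 study area (clustered).
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
nni_cluster = nearest_neighbor_index(cluster, area=100.0)
print(nni_grid, nni_cluster)
```

Because the expected distance depends on the area A, the result changes with how the study boundary is drawn.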

34
Nearest-Neighbor Index (NNI) Significance test
36
Nearest-Neighbor Index (NNI)

Advantages
NNI takes into account distance
No quadrat size problem to be concerned with
However, NNI is not as good as it might appear --
Index is highly dependent on the boundary for the
area:
its size and its shape (perimeter)
Fundamentally based on only the mean distance
Doesn't incorporate local variations (could have
clustering locally in some areas, but not
overall)
Based on point location only and doesn't
incorporate magnitude of phenomena at that point

37
Nearest-Neighbor Index (NNI)

An adjustment for edge effects is available, but
it does not solve all the problems

38
Nearest-Neighbor Index (NNI)

Some alternatives to the NNI are
the G and F functions, based on the entire
frequency distribution of nearest neighbor
distances, and
the K function based on all interpoint distances.

39
Spatial Autocorrelation

Most statistical analyses are based on the
assumption that the values of observations in
each sample are independent of one another
Positive spatial autocorrelation violates this,
because samples taken from nearby areas are
related to each other and are not independent

40
Spatial Autocorrelation

In ordinary least squares regression (OLS), for
example, the correlation coefficients will be
biased and their precision exaggerated
Bias implies correlation coefficients may be
higher than they really are
They are biased because the areas with higher
concentrations of events will have a greater
impact on the model estimate
Exaggerated precision (lower standard error)
implies they are more likely to be found
statistically significant
They will overestimate precision because, since
events tend to be concentrated, there are
actually fewer independent
observations than is being assumed.

41
Spatial Autocorrelation

Several measures available
Join Count Statistic
Moran's I
Geary's Ratio C
General (Getis-Ord) G
Anselin's Local Index of Spatial Autocorrelation
(LISA)

Discuss them later
42
LINE PATTERN ANALYZERS

Two general types of linear features
Vectors (lines with arrows)
Networks
Spatial attributes of linear features
Length
Orientation and Direction
Spatial attribute of network features
Connectivity or Topology

43
Spatial Attributes of Linear Features -- Length

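Linear distance
$c = \sqrt{a^2 + b^2} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$

[Figure: right triangle with vertices (x1, y1), (x1, y2) and (x2, y2); legs a and b, hypotenuse c]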
44
Spatial Attributes of Linear Features -- Length

Great circle distance D of locations A and B
$\cos D = \sin a \sin b + \cos a \cos b \cos|\delta\lambda|$
where
a and b are the latitude readings of locations A
and B
$|\delta\lambda|$ is the absolute difference in longitude
between A and B
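The great circle calculation can be sketched as below, using the spherical law of cosines with the latitudes of A and B and their absolute longitude difference. The Earth radius of 6371 km and the function name are assumptions added for illustration; the formula yields an angle, which is multiplied by the radius to get a length.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_distance_km(lat_a, lon_a, lat_b, lon_b):
    """Great circle distance in km between two points given in decimal degrees."""
    a = math.radians(lat_a)
    b = math.radians(lat_b)
    dlon = math.radians(abs(lon_a - lon_b))
    cos_d = math.sin(a) * math.sin(b) + math.cos(a) * math.cos(b) * math.cos(dlon)
    cos_d = max(-1.0, min(1.0, cos_d))  # guard against rounding just outside [-1, 1]
    return EARTH_RADIUS_KM * math.acos(cos_d)


# Two points on the equator, 90 degrees of longitude apart:
# a quarter of the Earth's circumference.
quarter = great_circle_distance_km(0.0, 0.0, 0.0, 90.0)
same = great_circle_distance_km(0.0, 0.0, 0.0, 0.0)
print(quarter, same)
```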

45
Spatial Attributes of Linear Features --
Orientation and Direction

Orientation
Directional
e.g. West-East orientation
Non-directional (from to )
e.g. To describe a fault line --
from location y to location x
= from location x to location y
Direction
Dependent on the beginning and ending locations
from location y to location x
≠ from location x to location y

46
Directional Statistics -- Directional Mean
Directional Mean: average direction of a set of
vectors
47
Directional Statistics -- Directional Mean

[Figure: a vector plotted against X and Y axes, with its direction angle marked]
48
Directional Statistics -- Circular Variance

Shows the angular variability of the set of
vectors

[Figure: plot with X and Y axes]
49
Directional Statistics -- Circular Variance

For a set of n vectors,
$S_v = 1 - OR / n$, where OR is the length of the
resultant vector
$S_v = 0$: all vectors have the same direction
or no circular variability
$S_v = 1$: all vectors are in opposite
directions
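Directional mean and circular variance can be computed together from the sums of sines and cosines of the vector angles. This sketch assumes unit-length vectors, angles in degrees measured counterclockwise from the positive x-axis, and circular variance defined as 1 minus the resultant length divided by n. The angle sets are hypothetical.

```python
import math


def directional_mean_and_variance(angles_deg):
    """Return (directional mean in degrees, circular variance) for unit vectors."""
    n = len(angles_deg)
    sin_sum = sum(math.sin(math.radians(a)) for a in angles_deg)
    cos_sum = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 picks the correct quadrant for the resultant vector.
    mean_deg = math.degrees(math.atan2(sin_sum, cos_sum)) % 360.0
    resultant_length = math.hypot(sin_sum, cos_sum)
    return mean_deg, 1.0 - resultant_length / n


mean_dir, variance = directional_mean_and_variance([80.0, 90.0, 100.0])
_, variance_same = directional_mean_and_variance([45.0, 45.0, 45.0])
_, variance_opposite = directional_mean_and_variance([0.0, 180.0])
print(mean_dir, variance, variance_same, variance_opposite)
```

When the vectors cancel out exactly, the variance is 1 and the directional mean itself is not meaningful.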

50
Network Analysis

Connectivity: how different links are connected
Vertices: junctions or nodes
Links/edges: the lines joining the vertices

51
Connectivity Matrix (C)

$C_{ij} = 1$ if there is a direct connection between i and j
$C_{ij} = 0$ otherwise

52
Connectivity Matrix (C)

C1: direct connections
C2: number of 2-step paths from i to j
Example: from i to k to j is a 2-step path with
one intermediate vertex k
C3: number of 3-step paths from i to j
Example: from i to k to m to j is a 3-step path
with two intermediate vertices

53
Network as a matrix
C2 = C1 × C1, C3 = C2 × C1, C4 = C3 × C1, C5 = C4 × C1, ...
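The repeated multiplication can be illustrated with a tiny network. In this sketch the 4-vertex chain network and the helper name `mat_mul` are made up for the example. Note that the matrix powers count every step sequence, including routes that go back and forth over the same link.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical chain network: 0 - 1 - 2 - 3
c1 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
c2 = mat_mul(c1, c1)  # 2-step connections
c3 = mat_mul(c2, c1)  # 3-step connections

print(c2[0][2])  # one 2-step route from vertex 0 to vertex 2 (via 1)
print(c3[0][3])  # one 3-step route from vertex 0 to vertex 3 (via 1 and 2)
```

The diagonal of C2 shows the back-and-forth effect: vertex 1 has two 2-step routes that return to itself, one through each neighbor.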
54
Minimally connected network

Each vertex is connected to the network, and
there are no superfluous linkages
The minimum number of edges needed to create a
network is V-1, one less than the number of
vertices in the network, i.e., $e_{min} = V - 1 = 5$
(for a network with 6 vertices)

55
Maximally connected network

Nonplanar
the maximum number of edges is

Directional

$e_{max} = V(V-1)$

Non-directional

$e_{max} = V(V-1)/2$
56
Maximally connected network

Planar --
the maximum number of edges is $e_{max} = 3(V-2)$

57
Gamma Index

Gamma index provides a useful basic ratio for
evaluating the relative connectivity of an entire
network
Ratio between the number of edges actually in a
given network and the maximum number possible in
that network
$\gamma$ = actual edges / maximum edges
For a minimally connected planar network,
$\gamma = (V-1) / [3(V-2)]$

58
Alpha Index

Compares the number of actual (fundamental)
"circuits" with the maximum number of all
possible fundamental circuits
$\alpha = (E - V + 1) / (2V - 5)$, where $2V - 5$ is the
maximum number of fundamental circuits
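Both indices reduce to simple arithmetic on the counts of edges and vertices. The sketch below assumes a connected planar network, so the maximum number of edges is 3(V - 2); the 6-vertex examples are hypothetical.

```python
def gamma_index(edges, vertices):
    """Actual edges divided by the planar maximum 3(V - 2)."""
    return edges / (3 * (vertices - 2))


def alpha_index(edges, vertices):
    """Actual fundamental circuits divided by the planar maximum 2V - 5."""
    return (edges - vertices + 1) / (2 * vertices - 5)


# Hypothetical 6-vertex networks.
gamma_min = gamma_index(edges=5, vertices=6)    # minimally connected: V - 1 edges
alpha_min = alpha_index(edges=5, vertices=6)    # no circuits at all
gamma_max = gamma_index(edges=12, vertices=6)   # maximally connected planar
alpha_max = alpha_index(edges=12, vertices=6)
print(gamma_min, alpha_min, gamma_max, alpha_max)
```

A minimally connected network has no circuits, so its alpha index is 0, while a maximally connected planar network scores 1 on both indices.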

59
Diameter

the number of linkages or steps needed to connect
the two most remote nodes in the network
the better connected the network, the lower the
diameter

60
POLYGON PATTERN ANALYZERS

We will discuss the use of spatial statistics to
describe and measure spatial patterns formed by
geographic objects that are associated with areas
or polygons.

61
Spatial Autocorrelation (SA) -- Spatial Weights
Matrices

SA measures the degree of sameness of attribute
values among areal units (or polygons) within
their neighborhood
Different ways of specifying spatial relationships

62
Neighborhood Definitions -- Adjacency Criterion

Immediate (first-order) neighbors of X
Rook's case

Queen's case

63
Neighborhood Definitions -- Binary Connectivity
Matrix

C = connectivity matrix with elements $c_{ij}$,
$c_{ij} = 1$ if the ith polygon is adjacent to the jth
polygon
$c_{ij} = 0$ if the ith polygon is NOT adjacent to the
jth polygon
Symmetrical: $c_{ij} = c_{ji}$
Not efficient

64
Neighborhood Definitions -- Stochastic Matrix

Row-standardized matrix (stochastic matrix)
Assume each neighbor exerts the same amount of
influence
W = spatial weights matrix with elements $w_{ij}$,
$w_{ij} = c_{ij} / \sum_j c_{ij}$
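Row standardization can be sketched in a few lines: each entry of the binary connectivity matrix is divided by its row total, so every unit's neighbors share an influence of 1 equally. The 4-unit matrix below is a made-up example, and rows without neighbors are left as zeros by assumption.

```python
def row_standardize(c):
    """Divide each row of a binary connectivity matrix by its row sum."""
    w = []
    for row in c:
        total = sum(row)
        # Units with no neighbors keep a row of zeros (an assumption of this sketch).
        w.append([value / total if total else 0.0 for value in row])
    return w


# Hypothetical binary connectivity matrix for 4 areal units.
c = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
w = row_standardize(c)
for row in w:
    print(row)
```

Unlike the binary matrix, the row-standardized matrix is generally not symmetric: here unit 1 has a weight of 1/3 toward unit 3, while unit 3 has a weight of 1 toward unit 1.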

65
Neighborhood Definitions -- Distance between
polygon centroids

For example,
Within a radius of 1 mile
Adjacency measure is just a binary representation
of the distance measure
1 = zero distance between two neighboring units

66
Spatial Weights Matrices -- Centroid Distances

$d_{ij}$ represents the distance between areal units i
and j
Weight $w_{ij} = 1 / d_{ij}$
Inversely proportional to the distance
Weight $w_{ij} = 1 / d_{ij}^2$
Distance decay: spatial relationships diminish
more than just proportionally to the distance

67
Space as a matrix

W, where $w_{ij}$ is some measure of interaction
adjacency
decreasing function of distance
invariant under rotation, displacement
readily obtained from a GIS

68
Spatial Autocorrelation (SA)

Univariate: handle one variable and evaluate how
that variable is correlated over space
Several measures available
Global measures: SA stable across the study
region
Join Count Statistic: measures the magnitude of
SA among polygons with binary nominal data
Moran's I Index
Geary's Ratio C
G statistic

For interval or ratio data
69
Spatial Autocorrelation (SA)

Several measures available
Local measures: SA may not be stable over the study
region
Local version of the G statistic
Local Index of Spatial Autocorrelation (LISA):
local version of Moran's I and Geary's Ratio C

70
Spatial Autocorrelation (SA) -- Join Count
Statistics

Binary attribute data
WW
BW
BB
Compare the observed numbers of joins of various
types (BB, WW, BW) with those expected from a
random pattern

71
Applications of the W matrix

Spatial regression
add spatially lagged terms weighted by W
Anselin's SPACESTAT
Moran and Geary indices of spatial dependence

72
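Global spatial autocorrelation statistic --
Moran's I

$I = \frac{n \sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_i (x_i - \bar{x})^2}$
where
$x_i$ is the value of interval or ratio variable in
areal unit i,
W is the sum of all elements of the spatial
weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and
n is the number of areal units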

73
Global spatial autocorrelation statistic --
Moran's I

I ranges from -1 to 1
If no spatial autocorrelation exists,
$E(I) = -1 / (n - 1) < 0$,
inversely related to n
Z-test
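A minimal sketch of Moran's I follows, assuming the usual definition (n over the sum of weights, times the weighted cross-products of deviations divided by the sum of squared deviations). The binary chain of four areal units and the attribute values are hypothetical, and no significance test is included.

```python
def morans_i(values, w):
    """Global Moran's I for attribute values and a spatial weights matrix w."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in w)
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    squares = sum(d * d for d in dev)
    return (n / w_sum) * (cross / squares)


# Hypothetical chain of 4 areal units: 0 - 1 - 2 - 3 (binary adjacency).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_trend = morans_i([1, 2, 3, 4], w)        # similar values next to each other
i_alternating = morans_i([1, 4, 1, 4], w)  # dissimilar values next to each other
print(i_trend, i_alternating)
```

With n = 4 the expected value under no spatial autocorrelation is -1/3, so the smooth trend scores well above it and the alternating pattern well below.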

74
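Global spatial autocorrelation statistic --
Geary's Ratio

$C = \frac{(n - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2W \sum_i (x_i - \bar{x})^2}$
where
$x_i$ is the value of interval or ratio variable in
areal unit i,
W is the sum of all elements of the spatial
weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and
n is the number of areal units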

75
Global spatial autocorrelation statistic --
Geary's Ratio

C ranges from 0 to 2
C = 0 indicates a perfect positive spatial
autocorrelation when all neighboring values are
the same
C = 2 indicates an extremely negative spatial
autocorrelation
E(C) = 1, not affected by n
Z-test
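Geary's Ratio can be sketched in the same way, assuming the usual definition based on squared differences between neighboring values. The chain of four areal units and the values are hypothetical.

```python
def gearys_c(values, w):
    """Global Geary's C for attribute values and a spatial weights matrix w."""
    n = len(values)
    mean = sum(values) / n
    w_sum = sum(sum(row) for row in w)
    diffs = sum(
        w[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    squares = sum((v - mean) ** 2 for v in values)
    return ((n - 1) * diffs) / (2.0 * w_sum * squares)


# Hypothetical chain of 4 areal units: 0 - 1 - 2 - 3 (binary adjacency).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
c_trend = gearys_c([1, 2, 3, 4], w)        # neighbors similar: C below 1
c_alternating = gearys_c([1, 4, 1, 4], w)  # neighbors dissimilar: C above 1
print(c_trend, c_alternating)
```

Note that the scale runs opposite to Moran's I: low values of C indicate positive spatial autocorrelation.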

76
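Global spatial autocorrelation statistic --
General G Statistic

Moran's I and Geary's C cannot tell high-high (HH)
clustering from low-low (LL) clustering, as
they are concerned only with whether neighboring
values are similar or not
The general G-statistic
$G(d) = \frac{\sum_i \sum_j w_{ij}(d) x_i x_j}{\sum_i \sum_j x_i x_j}, \quad i \ne j$
where $w_{ij}(d) = 1$ if areal unit j is within d from
areal unit i; otherwise $w_{ij}(d) = 0$.
Z-test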

77
Local spatial autocorrelation statistic -- LISA

Local Index of Spatial Autocorrelation (LISA):
local version of Moran's I and Geary's Ratio C
Local Moran statistic for areal unit i
$I_i = z_i \sum_j w_{ij} z_j$, where $z_i = (x_i - \bar{x}) / s$
High: clustering of similar values (all high or
all low)
Low: clustering of dissimilar values

78
Local spatial autocorrelation statistic -- LISA

Local Geary's Ratio C for areal unit i
$c_i = \sum_j w_{ij} (z_i - z_j)^2$
Low: clustering of similar values (all high or
all low)
High: clustering of dissimilar values

79
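Local spatial autocorrelation statistic -- local
G-statistic

Local G-statistic for areal unit i
$G_i(d) = \frac{\sum_j w_{ij}(d) x_j}{\sum_j x_j}, \quad j \ne i$
Standard Scores
$Z(G_i) = \frac{G_i - E(G_i)}{\sqrt{Var(G_i)}}$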

80
Local spatial autocorrelation statistic local
G-statistic

Interpretation of standard scores for

81
More Discussions on GIS and Spatial Statistics
82
Spatial dependence

The First Law of Geography (Tobler)
all things are related but nearby things are more
related than distant things
Acceptance of the null hypothesis of no spatial
dependence is always a Type II error
Hell is a place with no spatial dependence

83
It's chilly today in Seattle
Spoken word
Text
Picture
x, y, T
84
Spatial heterogeneity

Uncontrolled variance over the Earth's surface
There is no average place
Results depend explicitly on bounds
Places as samples
Consider the model
y = a + bx
