Spatial Statistics

Provided by: sand94

1
Spatial Statistics
  • Modified from Dr. YU-FEN LI

2
Point Pattern Descriptors
  • Central tendency
  • Mean Center (Spatial Mean)
  • Weighted Mean Center
  • Median Center (Spatial Median): not widely used
    because of its ambiguity
  • Consider n points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$

3
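Central tendency: Mean Center (Spatial Mean)
  • The two means of the coordinates define the
    location of the mean center as
    $(\bar{x}, \bar{y}) = \left( \frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i \right)$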

4
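Central tendency: Weighted Mean Center
  • The two weighted means of the coordinates define
    the location of the weighted mean center as
    $(\bar{x}_w, \bar{y}_w) = \left( \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},\ \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i} \right)$
  • where $w_i$ is the weight at point i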
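
As a minimal sketch of the two central-tendency measures, the following computes the mean center and the weighted mean center for a small set of points. The function names, the sample coordinates, and the weights are illustrative assumptions, not part of the original slides.

```python
def mean_center(points):
    """Return the mean center (x_bar, y_bar) of a list of (x, y) tuples."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Return the weighted mean center; weights[i] is the weight at point i."""
    total = sum(weights)
    xw = sum(w * x for (x, _), w in zip(points, weights)) / total
    yw = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (xw, yw)


# Hypothetical data: four corners of a square, upper corners weighted more.
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
wts = [1.0, 1.0, 3.0, 3.0]

print(mean_center(pts))                # (2.0, 2.0)
print(weighted_mean_center(pts, wts))  # (2.0, 3.0)
```

The heavier weights on the two upper points pull the weighted center upward while the unweighted center stays in the middle of the square.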

5
Point Pattern Descriptors
  • Dispersion and Orientation
  • Standard distance
  • Weighted standard distance
  • Standard deviational ellipse

6
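Dispersion and Orientation: Standard Distance
  • How points deviate from the mean center
  • Recall the population standard deviation
    $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
  • The standard distance is
    $SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (y_i - \bar{y})^2}{n}}$,
    where $(\bar{x}, \bar{y})$ is the mean center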

7
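Dispersion and Orientation: Weighted Standard
Distance
  • Points may have different attribute values that
    reflect their relative importance
  • $SD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2 + \sum_{i=1}^{n} w_i (y_i - \bar{y}_w)^2}{\sum_{i=1}^{n} w_i}}$,
    where $(\bar{x}_w, \bar{y}_w)$ is the weighted mean center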
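
The standard distance and its weighted variant can be sketched in one function, assuming the usual definition: the square root of the (weighted) average squared distance of the points from the (weighted) mean center. The function name and sample data are assumptions for illustration.

```python
import math


def standard_distance(points, weights=None):
    """Standard distance around the (weighted) mean center.

    With weights=None every point gets weight 1, which gives the
    unweighted standard distance.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    ss = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
             for (x, y), w in zip(points, weights))
    return math.sqrt(ss / total)


# Hypothetical data: corners of a 4 x 4 square.
pts = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
sd = standard_distance(pts)                         # sqrt(8)
sd_w = standard_distance(pts, [1.0, 1.0, 3.0, 3.0])  # sqrt(7)
print(sd, sd_w)
```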

8
Dispersion and Orientation: Standard
Deviational Ellipse
  • Standard distance is a good single measure of the
    dispersion of the incidents around the mean
    center, but it does not capture any directional
    bias
  • The standard deviational ellipse gives dispersion
    in two dimensions and is defined by 3 parameters
  • Angle of rotation
  • Dispersion along major axis
  • Dispersion along minor axis

9
Dispersion and Orientation: Standard
Deviational Ellipse
  • The basic concept is to
  • Find the axis going through maximum dispersion
    (thus derive angle of rotation)
  • Calculate standard deviation of the points along
    this axis (thus derive the length of major axis)
  • Calculate standard deviation of points along the
    axis perpendicular to major axis (thus derive the
    length of minor axis)
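
The three steps can be sketched with the 2 x 2 covariance matrix of the coordinates: its principal axis is the direction of maximum dispersion, and the square roots of its two eigenvalues are the standard deviations along the major and minor axes. This is one common formulation, assumed here for illustration; GIS packages may measure the angle clockwise from north or scale the axes differently. The function name and sample points are hypothetical.

```python
import math


def standard_deviational_ellipse(points):
    """Return (theta, major_sd, minor_sd).

    theta is the angle of the major axis in radians, measured
    counter-clockwise from the x axis (an assumption of this sketch).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Population variances and covariance about the mean center.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Step 1: axis through maximum dispersion (angle of rotation).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Steps 2 and 3: standard deviations along and across that axis,
    # from the closed-form eigenvalues of the covariance matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    major_sd = math.sqrt(half_trace + spread)
    minor_sd = math.sqrt(max(half_trace - spread, 0.0))
    return theta, major_sd, minor_sd


# Hypothetical points lying exactly on the line y = x.
theta, major_sd, minor_sd = standard_deviational_ellipse(
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
)
print(math.degrees(theta), major_sd, minor_sd)
```

For points on the line y = x the major axis is rotated 45 degrees and the minor-axis dispersion collapses to zero, which is the directional bias that the single standard distance cannot show.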

10
Statistical Methods in GIS
  • Point pattern analyzers
  • Location information only
  • Line pattern analyzers
  • Location + attribute information
  • Polygon pattern analyzers
  • Location + attribute information

11
POINT PATTERN ANALYZERS
  • Two primary approaches
  • Quadrat Analysis
  • based on observing the frequency distribution or
    density of points within a set of grids
  • Nearest Neighbor Analysis
  • based on distances of points

12
Quadrat Analysis (QA)
  • Point Density approach
  • The density measured by QA is compared with that
    of a random pattern

(Figure: example point patterns labeled RANDOM,
CLUSTERED, and UNIFORM/DISPERSED)
13
Quadrat Analysis (QA)
Exhaustive census
Random sampling
14
Quadrat Analysis (QA)
  • Apply a uniform or random grid over the area (A),
    with the size (area) of each quadrat given by $2A/r$
  • where r = number of points
  • width of a square quadrat is $\sqrt{2A/r}$
  • radius of a circular quadrat is $\sqrt{2A/(\pi r)}$
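
A short sketch of the quadrat sizing, assuming the commonly cited rule of thumb that each quadrat should have an area of 2A/r (A = study area, r = number of points). The function name and the example numbers are illustrative.

```python
import math


def quadrat_dimensions(area, n_points):
    """Quadrat size under the assumed rule of thumb: quadrat area = 2A / r."""
    q_area = 2.0 * area / n_points
    return {
        "quadrat_area": q_area,
        "square_width": math.sqrt(q_area),             # side of a square quadrat
        "circle_radius": math.sqrt(q_area / math.pi),  # radius of a circular quadrat
    }


# Hypothetical study area of 100 square units containing 50 points.
dims = quadrat_dimensions(area=100.0, n_points=50)
print(dims)
```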

15
Quadrat Analysis (QA) -- Frequency distribution
comparison
  • Treat each cell as an observation and count the
    number of points within it
  • Compare observed frequencies in the quadrats with
    expected frequencies that would be generated by
  • a random process (modeled by the Poisson
    distribution)
  • a clustered process (e.g. one cell with r
    points, n-1 cells with 0 points) (n = number of
    quadrats)
  • a uniform process (e.g. each cell has r/n
    points)
  • The standard Kolmogorov-Smirnov (K-S) test for
    comparing two frequency distributions can then be
    applied

16
Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test
  • The test statistic D is simply given by
    $D = \max_i |O_i - E_i|$
  • where $O_i$ and $E_i$ are the observed and expected
    cumulative proportions of the ith category in the
    two distributions.
  • i.e. the largest difference (irrespective of
    sign) between observed cumulative frequency and
    expected cumulative frequency
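
The comparison against a random process can be sketched as follows: count the points per quadrat, build the observed cumulative proportions, build the expected cumulative proportions from a Poisson distribution whose mean is the average count per quadrat, and take the largest absolute difference. The function names and the example counts are assumptions for illustration.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k points in a quadrat under a Poisson process."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def ks_d_against_poisson(counts):
    """K-S statistic D between observed quadrat counts and a Poisson model.

    counts[i] is the number of points observed in quadrat i.
    """
    n = len(counts)
    lam = sum(counts) / n          # mean number of points per quadrat
    d = 0.0
    obs_cum = 0.0
    exp_cum = 0.0
    for k in range(max(counts) + 1):
        obs_cum += counts.count(k) / n   # observed cumulative proportion
        exp_cum += poisson_pmf(k, lam)   # expected cumulative proportion
        d = max(d, abs(obs_cum - exp_cum))
    return d


# Hypothetical extreme clustering: 10 points, all in one of 10 quadrats.
clustered_counts = [0] * 9 + [10]
d_clustered = ks_d_against_poisson(clustered_counts)
print(round(d_clustered, 4))
```

The resulting D is then compared with a critical value for the chosen significance level to decide whether the pattern departs from randomness.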

17
Kolmogorov-Smirnov Test (?1)
  • A. Situations in which the control and treatment
    groups do not differ in mean, but only in some
    other way. For example consider the datasets
  • controlA = {0.22, -0.87, -2.39, -1.79, 0.37,
    -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17,
    -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50,
    -0.09}
  • treatmentA = {-5.13, -2.19, -2.43, -3.83, 0.50,
    -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87,
    -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}

18
Kolmogorov-Smirnov Test (?1)
  • There are then a few situations in which it is a
    mistake to trust the results of a t-test
  • Notice that both datasets are approximately
    balanced around zero; evidently the mean in both
    cases is "near" zero. However, there is
    substantially more variation in the treatment
    group, which ranges approximately from -6 to 6,
    whereas the control group ranges approximately
    from -2½ to 2½. The datasets are different, but
    the t-test cannot see the difference.
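
To make the comparison concrete, the sketch below computes the two-sample K-S statistic D (the largest gap between the two empirical cumulative distributions) for the controlA and treatmentA values listed in this example. The helper names are assumptions; only the data come from the slides.

```python
def ecdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def ks_two_sample_d(a, b):
    """Largest absolute difference between the two empirical CDFs."""
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


control_a = [0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74,
             1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19,
             -0.50, -0.09]
treatment_a = [-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18,
               -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07,
               5.76, 3.50]

d_a = ks_two_sample_d(control_a, treatment_a)
print(round(d_a, 2))  # about 0.45
```

The gap is largest near x = 2.30, where all of the control values but only 11 of the 20 treatment values have been passed.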

19
Kolmogorov-Smirnov Test (?1)
20
Kolmogorov-Smirnov Test (?1)
  • the percentile plot of this data (in red) along
    with the behavior expected for the above
    lognormal distribution (in blue)

21
Kolmogorov-Smirnov Test (?2)
  • Situations in which the treatment and control
    groups are smallish datasets (say 20 items each)
    that differ in mean, but substantial non-normal
    distribution masks the difference. For example,
    consider the datasets
  • controlB = {1.26, 0.34, 0.70, 1.75, 50.57, 1.55,
    0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95, 0.24,
    1.37, 0.17, 6.98, 0.10, 0.94, 0.38}
  • treatmentB = {2.37, 2.16, 14.82, 1.73, 41.04,
    0.23, 1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51,
    4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19}
  • These datasets were drawn from lognormal
    distributions that differ substantially in mean.
    The KS test detects this difference; the t-test
    does not. Of course, if the user knew that the
    data were non-normally distributed, s/he would
    know not to apply the t-test in the first place.

22
Kolmogorov-Smirnov Test (?2)
  • Sorted controlB = {0.08, 0.10, 0.15, 0.17, 0.24,
    0.34, 0.38, 0.42, 0.49, 0.50, 0.70, 0.94, 0.95,
    1.26, 1.37, 1.55, 1.75, 3.20, 6.98, 50.57}

23
Kolmogorov-Smirnov Test (?2)
24
Kolmogorov-Smirnov Test (?2)
25
Kolmogorov-Smirnov Test (?2)
26
Kolmogorov-Smirnov Test (?2)
the percentile plot of this data (in red) along
with the behavior expected for the above
lognormal distribution (in blue).
27
Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S)
Test
  • The critical value at the 5% level is given by
    $D_{0.05} = \frac{1.36}{\sqrt{n}}$
  • where n is the number of quadrats
  • in a two-sample case,
    $D_{0.05} = 1.36 \sqrt{\frac{n_1 + n_2}{n_1 n_2}}$,
    where $n_1$ and $n_2$ are the numbers of quadrats in
    the two sets of distributions
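
A minimal sketch of the 5% critical values, assuming the usual large-sample approximation with coefficient 1.36: 1.36 / sqrt(n) for one sample and 1.36 * sqrt((n1 + n2) / (n1 * n2)) for two samples. The function names are hypothetical; an observed D larger than the critical value suggests the two distributions differ.

```python
import math


def ks_critical_one_sample(n, coefficient=1.36):
    """Approximate critical D; 1.36 corresponds to the 5% level."""
    return coefficient / math.sqrt(n)


def ks_critical_two_sample(n1, n2, coefficient=1.36):
    """Approximate two-sample critical D at the 5% level."""
    return coefficient * math.sqrt((n1 + n2) / (n1 * n2))


crit_100 = ks_critical_one_sample(100)       # 100 quadrats
crit_20_20 = ks_critical_two_sample(20, 20)  # two sets of 20 observations
print(round(crit_100, 3), round(crit_20_20, 3))
```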

28
Quadrat Analysis: Variance-Mean Ratio (VMR)
  • Test if the observed pattern is different from a
    random pattern (generated from a Poisson
    distribution, for which mean = variance)
  • Treat each cell as an observation and count the
    number of points within it, to create the
    variable X
  • Calculate the variance and mean of X, and create
    the variance-to-mean ratio: VMR = variance / mean

29
Quadrat Analysis: Variance-Mean Ratio (VMR)
  • For a uniform distribution, the variance is
    zero.
  • we expect a variance-mean ratio close to 0
  • For a random distribution, the variance and mean
    are the same.
  • we expect a variance-mean ratio around 1
  • For a clustered distribution, the variance is
    relatively large
  • we expect a variance-mean ratio above 1

30
Significance Test for VMR
  • The mean of the observed distribution is
    $\bar{x} = \frac{\sum_i n_i x_i}{n}$,
    where $x_i$ is the number
    of points in a quadrat, $n_i$ is the number of
    quadrats with $x_i$ points, and n is the total
    number of quadrats
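
The ratio and a simple significance check can be sketched as below. The test statistic is an assumption of this sketch: it uses the commonly quoted approximation that, under a random (Poisson) pattern, VMR has expected value 1 and standard error sqrt(2 / (n - 1)). The sample variance (divisor n - 1), the function name, and the example counts are also illustrative choices.

```python
import math


def variance_mean_ratio(counts):
    """Return (mean, variance, vmr, t) for per-quadrat point counts.

    t = (VMR - 1) / sqrt(2 / (n - 1)) is an assumed approximate test
    statistic for departure from a random pattern.
    """
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    vmr = variance / mean
    t = (vmr - 1.0) / math.sqrt(2.0 / (n - 1))
    return mean, variance, vmr, t


# Hypothetical counts for 10 quadrats.
uniform = variance_mean_ratio([2] * 10)          # every quadrat has 2 points
clustered = variance_mean_ratio([0] * 9 + [10])  # all points in one quadrat
print(uniform[2], clustered[2])                  # VMR of 0 versus VMR of 10
```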

31
Weakness of Quadrat Analysis
  • Results may depend on quadrat size and
    orientation
  • Is a measure of dispersion, and not really
    pattern, because it is based primarily on the
    density of points, and not their arrangement in
    relation to one another
  • Results in a single measure for the entire
    distribution, so variations within the region are
    not recognized (could have clustering locally in
    some areas, but not overall)

32
Weakness of Quadrat Analysis
  • For example, quadrat analysis cannot distinguish
    between these two, obviously different, patterns

33
Nearest-Neighbor Index (NNI)
  • Uses distances between points as its basis.
  • Compares the observed average distance between
    each point and its nearest neighbor with the
    expected average distance that would occur if the
    distribution were random
  • NNI = $r_{obs} / r_{exp}$
  • For a random pattern, NNI = 1
  • For a clustered pattern, NNI < 1
  • For a dispersed pattern, NNI > 1
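
A sketch of the index and a z-score, assuming the standard Clark-Evans expressions for a random pattern: expected mean nearest-neighbor distance 0.5 / sqrt(n / A) and standard error 0.26136 / sqrt(n^2 / A), with A the study area. These formulas are not spelled out in the transcript, and the function name and sample grid are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Return (NNI, z) for a list of (x, y) points in a region of given area."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point.
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    r_obs = sum(nearest) / n
    density = n / area
    r_exp = 0.5 / math.sqrt(density)          # assumed expectation under randomness
    se = 0.26136 / math.sqrt(n * density)     # assumed standard error
    return r_obs / r_exp, (r_obs - r_exp) / se


# Hypothetical dispersed pattern: a regular 3 x 3 grid in a 3 x 3 region.
grid = [(i + 0.5, j + 0.5) for i in range(3) for j in range(3)]
nni, z = nearest_neighbor_index(grid, area=9.0)
print(nni, round(z, 2))
```

The regular grid gives an index of 2, well above 1, which matches the interpretation of a dispersed pattern.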

34
Nearest-Neighbor Index (NNI) Significance test
35
(No Transcript)
36
Nearest-Neighbor Index (NNI)
  • Advantages
  • NNI takes into account distance
  • No quadrat size problem to be concerned with
  • However, NNI is not as good as it might appear:
  • Index highly dependent on the boundary for the
    area
  • its size and its shape (perimeter)
  • Fundamentally based on only the mean distance
  • Doesn't incorporate local variations (could have
    clustering locally in some areas, but not
    overall)
  • Based on point location only and doesn't
    incorporate magnitude of phenomena at that point

37
Nearest-Neighbor Index (NNI)
  • An adjustment for edge effects is available, but
    it does not solve all the problems

38
Nearest-Neighbor Index (NNI)
  • Some alternatives to the NNI are
  • the G and F functions, based on the entire
    frequency distribution of nearest neighbor
    distances, and
  • the K function based on all interpoint distances.

39
Spatial Autocorrelation
  • Most statistical analyses are based on the
    assumption that the values of observations in
    each sample are independent of one another
  • Positive spatial autocorrelation violates this,
    because samples taken from nearby areas are
    related to each other and are not independent

40
Spatial Autocorrelation
  • In ordinary least squares regression (OLS), for
    example, the correlation coefficients will be
    biased and their precision exaggerated
  • Bias implies correlation coefficients may be
    higher than they really are
  • They are biased because the areas with higher
    concentrations of events will have a greater
    impact on the model estimate
  • Exaggerated precision (lower standard error)
    implies they are more likely to be found
    statistically significant
  • they will overestimate precision because, since
    events tend to be concentrated, there are
    actually fewer independent observations than is
    being assumed.

41
Spatial Autocorrelation
  • Several measures available
  • Join Count Statistic
  • Moran's I
  • Geary's Ratio C
  • General (Getis-Ord) G
  • Anselin's Local Index of Spatial Autocorrelation
    (LISA)

(These are discussed later.)
42
LINE PATTERN ANALYZERS
  • Two general types of linear features
  • Vectors (lines with arrows)
  • Networks
  • Spatial attributes of linear features
  • Length
  • Orientation and Direction
  • Spatial attribute of network features
  • Connectivity or Topology

43
Spatial Attributes of Linear Features -- Length
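  • Linear distance:
    $c = \sqrt{a^2 + b^2} = \sqrt{(x_2 - x_1)^2 + (y_1 - y_2)^2}$

(Figure: right triangle with vertices (x1, y1),
(x1, y2), and (x2, y2); legs a and b, hypotenuse c)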
44
Spatial Attributes of Linear Features -- Length
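  • Great circle distance D between locations A and B:
    $D = R \arccos(\sin a \sin b + \cos a \cos b \cos|\Delta\lambda|)$
  • where
  • a and b are the latitude readings of locations A
    and B
  • $|\Delta\lambda|$ is the absolute difference in longitude
    between A and B
  • R is the radius of the Earth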
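
A hedged sketch of the great circle calculation using the spherical law of cosines, which takes the two latitudes and the absolute longitude difference as inputs. The Earth radius of 6371 km is an assumed constant (the transcript does not preserve the one used on the slide), and the function name is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_distance(lat_a, lon_a, lat_b, lon_b, radius=EARTH_RADIUS_KM):
    """Great circle distance between A and B; inputs in decimal degrees."""
    a = math.radians(lat_a)
    b = math.radians(lat_b)
    dlon = math.radians(abs(lon_a - lon_b))
    cos_angle = (math.sin(a) * math.sin(b)
                 + math.cos(a) * math.cos(b) * math.cos(dlon))
    # Guard against rounding pushing the value just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return radius * math.acos(cos_angle)


# A quarter of the equator: 90 degrees of longitude at latitude 0.
quarter = great_circle_distance(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))
```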

45
Spatial Attributes of Linear Features
Orientation and Direction
  • Orientation
  • Directional
  • e.g. West-East orientation
  • Non-directional (from to )
  • e.g. to describe a fault line:
  • from location y to location x
  • = from location x to
    location y
  • Direction
  • Dependent on the beginning and ending locations
  • from location y to location x
  • ≠ from location x to
    location y

46
Directional Statistics: Directional Mean
Directional Mean: average direction of a set of
vectors
47
Directional Statistics: Directional Mean
(Figure: a vector plotted on X and Y axes with its
direction angle θ)
48
Directional Statistics: Circular Variance
  • Shows the angular variability of the set of
    vectors

(Figure: a set of vectors plotted on X and Y axes)
49
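Directional Statistics: Circular Variance
  • For a set of n vectors,
    $S_v = 1 - \frac{OR}{n}$, where
    $OR = \sqrt{(\sum_i \sin\theta_i)^2 + (\sum_i \cos\theta_i)^2}$
    is the length of the resultant vector
  • $S_v = 0$: all vectors have the same direction,
    or no circular variability
  • $S_v = 1$: all vectors are in opposite
    directions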
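
Both directional statistics can be sketched from the sums of sines and cosines of the vector angles, treating each vector as having unit length. The directional mean is taken as atan2 of the two sums and the circular variance as 1 minus the resultant length divided by n. Measuring angles counter-clockwise from the X axis, the function name, and the sample angles are assumptions of this sketch.

```python
import math


def directional_mean_and_variance(angles_deg):
    """Return (directional mean in degrees, circular variance)."""
    n = len(angles_deg)
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    mean_deg = math.degrees(math.atan2(s, c)) % 360.0
    resultant = math.hypot(s, c)   # length of the resultant vector
    return mean_deg, 1.0 - resultant / n


mean_dir, var_close = directional_mean_and_variance([80.0, 100.0])
_, var_same = directional_mean_and_variance([45.0, 45.0, 45.0])
_, var_opposite = directional_mean_and_variance([0.0, 180.0])
print(round(mean_dir, 6), round(var_close, 4), var_same, var_opposite)
```

Identical directions give a variance of (numerically) 0, and two exactly opposite vectors give a variance of 1, matching the two limiting cases on the slide.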

50
Network Analysis
  • Connectivity: how different links are connected
  • Vertices: junctions or nodes
  • Links/edges: the lines joining the vertices

51
Connectivity Matrix (C)
  • $C_{ij} = 1$ if there is a direct connection
    between i and j
  • $C_{ij} = 0$ otherwise

52
Connectivity Matrix (C)
  • $C^1$: direct connections
  • $C^2$: number of 2-step paths from i to j
  • Example: from i to k to j is a 2-step path with
    one intermediate vertex k
  • $C^3$: number of 3-step paths from i to j
  • Example: from i to k to m to j is a 3-step path
    with two intermediate vertices

53
Network as a matrix
$C^2 = C^1 \times C^1$, $C^3 = C^2 \times C^1$, $C^4 = C^3 \times C^1$,
$C^5 = C^4 \times C^1$, ...
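
The repeated multiplication can be sketched with plain nested lists: each further power of the connectivity matrix counts the paths that are one step longer. The helper names and the four-vertex chain network are assumptions for illustration; note that these counts include paths that revisit a vertex.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)]
            for i in range(size)]


def path_counts(c1, steps):
    """Return C^steps, whose (i, j) entry counts the steps-long paths i -> j."""
    result = c1
    for _ in range(steps - 1):
        result = mat_mul(result, c1)
    return result


# Hypothetical chain network: 0 - 1 - 2 - 3
c1 = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
c2 = path_counts(c1, 2)
c3 = path_counts(c1, 3)
print(c2[0][2])  # 1: the 2-step path 0 -> 1 -> 2
print(c3[0][3])  # 1: the 3-step path 0 -> 1 -> 2 -> 3
```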
54
Minimally connected network
  • Each vertex is connected to the network, and
    there are no superfluous linkages
  • The minimum number of edges needed to create a
    network is V-1, one less than the number of
    vertices in the network, i.e., $e_{min} = V - 1 = 5$
    (for a network of 6 vertices)

55
Maximally connected network
  • Nonplanar
  • the maximum number of edges is
  • Directional: $e_{max} = V(V-1)$
  • Non-directional: $e_{max} = V(V-1)/2$
56
Maximally connected network
  • Planar
  • the maximum number of edges is $e_{max} = 3(V-2)$

57
Gamma Index
  • Gamma index provides useful basic ratio for
    evaluating the relative connectivity of an entire
    network
  • Ratio between the number of edges actually in a
    given network and the maximum number possible in
    that network
  • γ = actual edges / maximum edges
  • for a minimally connected (planar) network,
  • γ = (V-1) / 3(V-2)

58
Alpha Index
  • compares the number of actual (fundamental)
    "circuits" with the maximum number of all
    possible fundamental circuits
  • α = (E - V + 1) / (2V - 5), where 2V - 5 is the
    maximum number of fundamental circuits
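
Both connectivity ratios reduce to simple arithmetic on the number of edges E and vertices V. The sketch below assumes a planar network, so the maximum number of edges is 3(V - 2) and the maximum number of fundamental circuits is 2V - 5; the function names and the six-vertex examples are illustrative.

```python
def gamma_index(edges, vertices):
    """Actual edges divided by the planar maximum 3(V - 2)."""
    return edges / (3 * (vertices - 2))


def alpha_index(edges, vertices):
    """Actual fundamental circuits (E - V + 1) over the maximum (2V - 5)."""
    return (edges - vertices + 1) / (2 * vertices - 5)


# Hypothetical 6-vertex networks: minimally connected (5 edges)
# and maximally connected planar (12 edges).
print(gamma_index(5, 6), alpha_index(5, 6))    # about 0.417 and 0.0
print(gamma_index(12, 6), alpha_index(12, 6))  # 1.0 and 1.0
```

A minimally connected network has no circuits, so its alpha index is 0, while its gamma index stays above 0 because the edges needed to hold the network together still count.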

59
Diameter
  • the number of linkages or steps needed to connect
    the two most remote nodes in the network
  • the better connected the network, the lower the
    diameter
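
One way to compute the diameter, sketched here under the assumption of a connected, unweighted network, is a breadth-first search from every vertex: the largest shortest-path step count found is the diameter. The adjacency-dictionary representation, the function name, and the two example networks are hypothetical.

```python
from collections import deque


def network_diameter(adjacency):
    """Diameter in steps; adjacency maps each vertex to its neighbors."""
    def farthest_steps(start):
        # Breadth-first search gives shortest step counts from start.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for nb in adjacency[v]:
                if nb not in dist:
                    dist[nb] = dist[v] + 1
                    queue.append(nb)
        return max(dist.values())

    return max(farthest_steps(v) for v in adjacency)


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}        # 0 - 1 - 2 - 3
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # chain plus edge 3 - 0
print(network_diameter(chain), network_diameter(ring))  # 3 2
```

Adding the single closing edge lowers the diameter from 3 to 2, illustrating that better-connected networks have smaller diameters.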

60
POLYGON PATTERN ANALYZERS
  • We will discuss the use of spatial statistics to
    describe and measure spatial patterns formed by
    geographic objects that are associated with areas
    or polygons.

61
Spatial Autocorrelation (SA) Spatial Weights
Matrices
  • SA measures the degree of sameness of attribute
    values among areal units (or polygons) within
    their neighborhood
  • Different ways of specifying spatial relationships

62
Neighborhood Definitions Adjacency Criterion
  • Immediate (first-order) neighbors of X
  • Rook's case
  • Queen's case

63
Neighborhood Definitions Binary Connectivity
Matrix
  • C = connectivity matrix with elements $c_{ij}$,
  • $c_{ij} = 1$ if the ith polygon is adjacent to the
    jth polygon
  • $c_{ij} = 0$ if the ith polygon is NOT adjacent to
    the jth polygon
  • Symmetrical: $c_{ij} = c_{ji}$
  • Not efficient

64
Neighborhood Definitions: Stochastic Matrix
  • Row-standardized matrix (stochastic matrix)
  • Assume each neighbor exerts the same amount of
    influence
  • W = spatial weights matrix with elements
    $w_{ij} = c_{ij} / \sum_j c_{ij}$
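
Row standardization can be sketched as dividing each row of the binary connectivity matrix by its row sum, so every unit's neighbors share a total weight of 1. The function name and the three-unit example are assumptions; rows with no neighbors are left as zeros in this sketch.

```python
def row_standardize(c):
    """Turn a binary connectivity matrix into a row-standardized W."""
    w = []
    for row in c:
        total = sum(row)
        # Each neighbor gets an equal share; isolated units keep zeros.
        w.append([v / total if total else 0.0 for v in row])
    return w


# Hypothetical case: unit 0 borders units 1 and 2; 1 and 2 border only 0.
c = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
w = row_standardize(c)
print(w)  # [[0.0, 0.5, 0.5], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
```

Unlike the binary matrix, the row-standardized matrix is generally not symmetric: here w[0][1] is 0.5 while w[1][0] is 1.0.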

65
Neighborhood Definitions Distance between
polygon centroids
  • For example,
  • Within a radius of 1 mile
  • The adjacency measure is just a binary
    representation of the distance measure
  • 1 = zero distance between two neighboring units

66
Spatial Weights Matrices Centroid Distances
  • dij represents the distance between areal units i
    and j
  • Weight $w_{ij} = 1/d_{ij}$
  • Inversely proportional to the distance
  • Weight $w_{ij} = 1/d_{ij}^2$
  • Distance decay: spatial relationships diminish
    more than just proportionally to the distance

67
Space as a matrix
  • W where wij is some measure of interaction
  • adjacency
  • decreasing function of distance
  • invariant under rotation, displacement
  • readily obtained from a GIS

68
Spatial Autocorrelation (SA)
  • Univariate: handles one variable and evaluates how
    that variable is correlated over space
  • Several measures available
  • Global measures: SA is assumed stable across the
    study region
  • Join Count Statistic: measures the magnitude of
    SA among polygons with binary nominal data
  • Moran's I Index
  • Geary's Ratio C
  • G statistic

(Moran's I, Geary's Ratio C, and the G statistic are
for interval or ratio data)
69
Spatial Autocorrelation (SA)
  • Several measures available
  • Local measures: SA may not be stable over the
    study region
  • Local version of the G statistic
  • Local Index of Spatial Autocorrelation (LISA):
    local version of Moran's I and Geary's Ratio C

70
Spatial Autocorrelation (SA): Join Count
Statistics
  • Binary attribute data
  • WW
  • BW
  • BB
  • Compare the observed numbers of joins of various
    types (BB, WW, BW) with those expected from a
    random pattern
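
Counting the observed joins can be sketched as a pass over each adjacent pair of areal units, classifying the pair by its two binary values. The function name, the coding of black as 1 and white as 0, and the 2 x 2 grid with rook's-case adjacency are assumptions; the comparison with the counts expected under a random pattern is not shown.

```python
def join_counts(c, values):
    """Count BB, BW and WW joins.

    c is a symmetric binary connectivity matrix; values[i] is 1 for a
    black unit and 0 for a white unit. Each join is counted once.
    """
    counts = {"BB": 0, "BW": 0, "WW": 0}
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            if not c[i][j]:
                continue
            if values[i] and values[j]:
                counts["BB"] += 1
            elif not values[i] and not values[j]:
                counts["WW"] += 1
            else:
                counts["BW"] += 1
    return counts


# Hypothetical 2 x 2 grid (cells 0 1 / 2 3) with rook's-case adjacency.
rook = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
checkerboard = join_counts(rook, [1, 0, 0, 1])
split = join_counts(rook, [1, 1, 0, 0])
print(checkerboard)  # {'BB': 0, 'BW': 4, 'WW': 0}
print(split)         # {'BB': 1, 'BW': 2, 'WW': 1}
```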

71
Applications of the W matrix
  • Spatial regression
  • add spatially lagged terms weighted by W
  • Anselin's SPACESTAT
  • Moran and Geary indices of spatial dependence

72
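Global spatial autocorrelation statistic --
Moran's I
  • $I = \frac{n \sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_i (x_i - \bar{x})^2}$, where
  • $x_i$ is the value of the interval or ratio
    variable in areal unit i,
  • W is the sum of all elements of the spatial
    weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and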
  • n is the number of areal units

73
Global spatial autocorrelation statistic --
Moran's I
  • I ranges from -1 to 1
  • If no spatial autocorrelation exists,
    $E(I) = -\frac{1}{n-1}$
  • $E(I) < 0$
  • its magnitude is inversely related to n
  • Z-test
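
A minimal sketch of the global Moran's I, assuming the standard form: n divided by the sum of all weights, times the weighted sum of cross-products of deviations from the mean, divided by the sum of squared deviations. The function name, the binary chain adjacency, and the sample values are hypothetical; the Z-test itself is not implemented here.

```python
def morans_i(x, w):
    """Global Moran's I for values x and spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    w_sum = sum(sum(row) for row in w)   # W: sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_sum) * (num / den)


# Hypothetical four areal units in a row, binary adjacency weights.
chain_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_trend = morans_i([1.0, 2.0, 3.0, 4.0], chain_w)        # similar neighbors
i_alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain_w)  # dissimilar neighbors
print(round(i_trend, 4), round(i_alternating, 4))  # 0.3333 -1.0
```

With n = 4 the expected value under no spatial autocorrelation is -1/(n - 1) = -1/3, so the smooth trend sits well above it and the alternating pattern well below it.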

74
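Global spatial autocorrelation statistic --
Geary's Ratio
  • $C = \frac{(n-1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2W \sum_i (x_i - \bar{x})^2}$, where
  • $x_i$ is the value of the interval or ratio
    variable in areal unit i,
  • W is the sum of all elements of the spatial
    weights matrix (i.e. $W = \sum_i \sum_j w_{ij}$), and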
  • n is the number of areal units

75
Global spatial autocorrelation statistic --
Geary's Ratio
  • C ranges from 0 to 2
  • C = 0 indicates a perfect positive spatial
    autocorrelation when all neighboring values are
    the same
  • C = 2 indicates an extremely negative spatial
    autocorrelation
  • E(C) = 1, not affected by n
  • Z-test
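
Geary's ratio can be sketched in the same way as Moran's I, but using squared differences between neighboring values instead of cross-products of deviations. The standard form assumed here is (n - 1) times the weighted sum of squared differences, divided by 2W times the sum of squared deviations from the mean. The function name and sample data are hypothetical.

```python
def gearys_c(x, w):
    """Geary's ratio C for values x and spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    w_sum = sum(sum(row) for row in w)   # W: sum of all weights
    num = sum(w[i][j] * (x[i] - x[j]) ** 2
              for i in range(n) for j in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return ((n - 1) * num) / (2.0 * w_sum * den)


# Hypothetical four areal units in a row, binary adjacency weights.
chain_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
c_trend = gearys_c([1.0, 2.0, 3.0, 4.0], chain_w)        # below 1: positive SA
c_alternating = gearys_c([1.0, 4.0, 1.0, 4.0], chain_w)  # above 1: negative SA
print(round(c_trend, 4), round(c_alternating, 4))  # 0.3 1.5
```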

76
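Global spatial autocorrelation statistic --
General G Statistic
  • Moran's I and Geary's C cannot tell clusters of
    high values (HH) from clusters of low values (LL),
    as they are concerned only with whether
    neighboring values are similar or not
  • The general G-statistic:
    $G(d) = \frac{\sum_i \sum_j w_{ij}(d) x_i x_j}{\sum_i \sum_j x_i x_j}$, $i \neq j$
  • where $w_{ij}(d) = 1$ if areal unit j is within d of
    areal unit i; otherwise $w_{ij}(d) = 0$.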
  • Z-test
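
The general G-statistic can be sketched as the share of all pairwise products of values (for distinct units) that comes from pairs lying within distance d of each other. Using centroid coordinates to measure distance, the function name, and the sample data are assumptions of this sketch; the Z-test is not implemented.

```python
import math


def general_g(centroids, x, d):
    """General G(d) for values x located at (x, y) centroids."""
    n = len(x)
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            product = x[i] * x[j]
            den += product
            dist = math.hypot(centroids[i][0] - centroids[j][0],
                              centroids[i][1] - centroids[j][1])
            if dist <= d:   # w_ij(d) = 1 when j is within d of i
                num += product
    return num / den


# Hypothetical centroids one unit apart along a line.
cents = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
g_high_cluster = general_g(cents, [10.0, 10.0, 1.0, 1.0], d=1.0)
g_alternating = general_g(cents, [10.0, 1.0, 10.0, 1.0], d=1.0)
print(round(g_high_cluster, 4), round(g_alternating, 4))
```

Placing the two high values next to each other yields a much larger G than alternating them, which is how the statistic picks out clusters of high values.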

77
Local spatial autocorrelation statistic: LISA
  • Local Index of Spatial Autocorrelation (LISA):
    local version of Moran's I and Geary's Ratio C
  • Local Moran statistic for areal unit i:
    $I_i = z_i \sum_j w_{ij} z_j$, where $z_i$ is the
    standardized value of $x_i$
  • High value: clustering of similar values (all
    high or all low)
  • Low value: clustering of dissimilar values

78
Local spatial autocorrelation statistic: LISA
  • Local Geary's Ratio C for areal unit i:
    $c_i = \sum_j w_{ij} (z_i - z_j)^2$
  • Low value: clustering of similar values (all
    high or all low)
  • High value: clustering of dissimilar values
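
Both local statistics can be sketched from standardized values z: the local Moran for unit i multiplies z_i by the weighted sum of its neighbors' z values, and the local Geary sums the weighted squared differences between z_i and its neighbors. Standardizing with the population standard deviation, the function name, and the sample data are assumptions of this sketch; implementations differ in their scaling conventions.

```python
import math


def local_moran_and_geary(x, w):
    """Return (list of local Moran I_i, list of local Geary c_i)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    z = [(v - mean) / std for v in x]
    local_i = [z[i] * sum(w[i][j] * z[j] for j in range(n))
               for i in range(n)]
    local_c = [sum(w[i][j] * (z[i] - z[j]) ** 2 for j in range(n))
               for i in range(n)]
    return local_i, local_c


# Hypothetical four areal units in a row, binary adjacency weights.
chain_w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
local_i, local_c = local_moran_and_geary([10.0, 10.0, 1.0, 1.0], chain_w)
print(local_i)  # high at both ends, where neighbors are alike
print(local_c)  # high in the middle, where high meets low
```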

79
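Local spatial autocorrelation statistic: local
G-statistic
  • Local G-statistic for areal unit i:
    $G_i(d) = \frac{\sum_j w_{ij}(d) x_j}{\sum_j x_j}$, $j \neq i$
  • Standard scores:
    $Z(G_i) = \frac{G_i - E(G_i)}{\sqrt{Var(G_i)}}$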
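
The local G-statistic for unit i can be sketched as the sum of the values of the other units within distance d of unit i, divided by the sum of the values of all other units. Using centroid distances, the function name, and the sample data are assumptions; the expected value and variance needed for the standard scores are not implemented in this sketch.

```python
import math


def local_g(centroids, x, d):
    """Return the list of G_i(d), excluding unit i itself from both sums."""
    n = len(x)
    result = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(n):
            if j == i:
                continue
            den += x[j]
            dist = math.hypot(centroids[i][0] - centroids[j][0],
                              centroids[i][1] - centroids[j][1])
            if dist <= d:   # w_ij(d) = 1 when j is within d of i
                num += x[j]
        result.append(num / den)
    return result


# Hypothetical centroids one unit apart along a line.
cents = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
g_local = local_g(cents, [10.0, 10.0, 1.0, 1.0], d=1.0)
print([round(g, 4) for g in g_local])
```

Units next to high values receive the largest scores, while the last unit, surrounded only by a low value, receives the smallest.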

80
Local spatial autocorrelation statistic: local
G-statistic
  • Interpretation of standard scores for $G_i(d)$

81
More Discussions on GIS and Spatial Statistics
82
Spatial dependence
  • The First Law of Geography (Tobler)
  • all things are related but nearby things are more
    related than distant things
  • Acceptance of the null hypothesis of no spatial
    dependence is always a Type II error
  • Hell is a place with no spatial dependence

83
It's chilly today in Seattle
Spoken word
Text
Picture
x, y, T
84
Spatial heterogeneity
  • Uncontrolled variance over the Earth's surface
  • There is no average place
  • Results depend explicitly on bounds
  • Places as samples
  • Consider the model
  • y = a + bx

85
(No Transcript)