1
Statistical Analysis of Geographical Information
  • Dr. Marina Gavrilova

2
Topics
  • Introduction
  • Distribution Descriptors One Variable
  • Relationship Descriptors Two Variables
  • Point Pattern Descriptors
  • Point Pattern Analyzers
  • Autocorrelation

3
Introduction: quantitative measures to describe
data
  • Statistics classification
  • Classified by function:
  • Descriptive statistics
  • Inferential statistics
  • Classified by area of application:
  • Classical statistics: sociology, political
    science, medicine and engineering.
  • Spatial statistics: based on classical statistics
    and extended to spatially referenced data.
  • Geostatistics: one kind of spatial statistics,
    originating in geoscience.

4
Random and Systematic Processes
  • When a certain phenomenon occurs, is it the
    result of a random process or a systematic
    process?
  • Soil example
  • Hypothesis: the soil fertility of a farm is low.
  • To test the hypothesis, gather more data about
    the soil.
  • Collect a sample of soil for further examination
    instead of examining the entire population.
  • Observation: each examined location. Sample size:
    the number of observations selected.

5
Features of spatial data (1)
  • A region can be partitioned in many ways based
    on the given criteria, for example USA state
    boundaries or census geography. The Modifiable
    Areal Unit Problem (MAUP) includes:
  • Scale effect: analyzing data at multiple levels
    of spatial resolution yields inconsistent results.
  • Zoning effect: analyzing data derived from
    different zonal systems with a similar number of
    areal units yields inconsistent results.

6
Features of spatial data (2)
  • Spatial autocorrelation represents the nature of
    geography and, consequently, will almost always
    be present in spatial data.
  • Tobler's First Law of Geography:
  • All things are related to each other, but closer
    things are more related.
  • Butterfly Effect: a butterfly flapping its wings
    in China may cause a hurricane landfall in the US
    due to spatial propagation of air disturbances.

7
Distribution descriptors: one variable
8
Measures of central tendency
  • Mode: the value that occurs most frequently in a
    set of data, also called the modal value. If two
    or more categories have the highest frequency,
    then the data is bimodal or multimodal.
  • Median: the middle value after all values are
    sorted in ascending or descending order.
  • Mean or Average: for n observations, each with an
    observed value x_i, the simple arithmetic mean is
    defined as
    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

9
Measures of central tendency
  • Grouped or weighted mean: if data values are
    grouped into classes, then all data within each
    group are represented by one value as the overall
    value in that class. A mean derived from the
    grouped data is called a grouped mean or a
    weighted mean.
  • If x_i is the midpoint of the i-th class (k
    classes altogether) with f_i as the number of data
    values in that class (frequency), the weighted
    mean is
    $$\bar{x}_w = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

10
Measures of dispersion (1)
  • While the mean is a good measure of the central
    tendency of a set of data, it captures no
    information about how the values are concentrated
    or scattered around the mean.
  • Range, Minimum, Maximum, and Percentiles
  • Range = Maximum - Minimum
  • Percentiles are the data values that have certain
    percentages of the data smaller than these
    values. Data Xa and Xb have the same median, 7,
    but different 25th percentiles (3 for Xa and -5
    for Xb).
  • Xa: 1 3 5 7 9 11 13
  • Xb: -11 -5 1 7 13 19 25

11
Measures of dispersion (2)
  • Mean Deviation: unlike the dispersion measures
    discussed so far, which use one or a few data
    values in the series, the mean deviation takes
    into account all data values. It is calculated by
    summing the absolute differences between the
    individual data values and the mean, and then
    dividing this sum by the number of observations.

12
Measures of dispersion (3)
  • Variance and Standard Deviation: another way to
    avoid the offsets caused by adding positive and
    negative deviations from the mean together is to
    square all deviations from the mean before
    summing them.
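
The dispersion measures above can be sketched in a few lines of Python, using the Xa and Xb series from the percentile slide. This is a minimal illustration, not the deck's own formulas: it assumes the population form of the variance (dividing by n rather than n - 1), and the function names are chosen here for illustration.

```python
from math import sqrt


def mean(values):
    """Simple arithmetic mean."""
    return sum(values) / len(values)


def mean_deviation(values):
    """Average absolute difference from the mean."""
    m = mean(values)
    return sum(abs(v - m) for v in values) / len(values)


def variance(values):
    """Population variance: mean of squared deviations (divides by n)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def std_dev(values):
    """Population standard deviation: square root of the variance."""
    return sqrt(variance(values))


xa = [1, 3, 5, 7, 9, 11, 13]
xb = [-11, -5, 1, 7, 13, 19, 25]

for name, series in (("Xa", xa), ("Xb", xb)):
    print(name, mean(series), mean_deviation(series), std_dev(series))
```

Both series share the same mean and median, yet Xb is three times as spread out as Xa on every measure, which is what the mean alone cannot show.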

13
Measures of dispersion (4)
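  • Weighted Variance and Weighted Standard
    Deviation:
    $$\sigma_w^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x}_w)^2}{\sum_{i=1}^{k} f_i}, \qquad \sigma_w = \sqrt{\sigma_w^2}$$
  • f_i is the frequency for the i-th group or class,
  • x_i is the midpoint value in the i-th group,
  • \bar{x}_w is the weighted mean, and
  • k is the number of groups.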
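
As a rough sketch of the grouped (weighted) measures, the following Python computes a weighted mean, variance and standard deviation from class midpoints and frequencies. The three classes in the example are made-up values for illustration, and the variance is assumed to divide by the total frequency.

```python
from math import sqrt


def weighted_mean(midpoints, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)


def weighted_variance(midpoints, freqs):
    """Variance of grouped data around the weighted mean."""
    m = weighted_mean(midpoints, freqs)
    return sum(f * (x - m) ** 2 for x, f in zip(midpoints, freqs)) / sum(freqs)


def weighted_std_dev(midpoints, freqs):
    """Square root of the weighted variance."""
    return sqrt(weighted_variance(midpoints, freqs))


# Hypothetical classes 0-10, 10-20, 20-30 with their frequencies.
midpoints = [5, 15, 25]
freqs = [2, 5, 3]

print(weighted_mean(midpoints, freqs))
print(weighted_variance(midpoints, freqs))
print(weighted_std_dev(midpoints, freqs))
```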

14
  • Relationship Descriptors
  • Two Variables

15
One Variable
  • The mean and its variations address the issue of
    location, that is, where the observations are
    distributed along the continuous value line. The
    median and mode also address this central
    tendency issue. Variance, standard deviation, and
    percentiles address the issue of dispersion.
    Skewness deals with the direction of clustering.
    Kurtosis addresses the issue of concentration.
    All these measures focus on the distribution of
    the values using one variable at a time.

16
Relationship Descriptors
  • The mean and standard deviation cannot measure
    the relationships between different distributions
    quantitatively.
  • One branch of statistics is based on the concept
    of correlation, which measures statistically the
    direction and strength of the relationship
    between two sets of data or two variables for a
    number of observations. Regression measures the
    dependence of one variable on another.

17
Correlation Analysis (1)
  • Education is traditionally regarded as an asset.
    It enriches a person's life in many ways. We
    usually believe that education and income are
    somewhat related and change in the same
    direction. If we recognize the value of education
    in eventually achieving a higher income, it would
    be nice to know how strong this relationship is,
    that is, how these aspects of life are related or
    correlated.

18
Correlation Analysis (2)
  • Each relationship has two important aspects: the
    direction and the strength of the relationship.
    Between two related variables, the relationship
    is typically measured as correlation, a
    statistical measure indicating how values in one
    variable are related to values in the other
    variable.
  • Positive or direct correlation
  • Negative or inverse correlation
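
One common way to quantify both direction and strength is the Pearson correlation coefficient. The slides do not say which coefficient they use, so treat this as an assumed choice. The sign gives the direction (positive or negative) and the magnitude gives the strength. The education and income figures below are invented for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: years of education and income (thousands).
education = [10, 12, 14, 16, 18]
income = [28, 35, 41, 52, 60]

print(pearson_r(education, income))       # close to +1: strong direct correlation
print(pearson_r(education, income[::-1]))  # close to -1: strong inverse correlation
```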

19
Trend Analysis
  • Trend analysis is a technique for measuring the
    trend, while correlation is a statistical measure
    of the relationship between two variables.
  • Trend analysis addresses the dependence of one
    variable on another.
  • Going beyond the strength and direction of the
    relationship, trend analysis allows us to model
    the relationship and to estimate the likely value
    of one variable based on the value of another
    variable.
  • Models that are constructed with this technique
    are known as regression models.

20
Simple Linear Regression Model
  • Simple linear regression model, or bivariate
    regression model: using a straight line to model
    the relationship between two variables. Here is
    an example: a regression between median household
    income and median house value for 51 states.
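
A straight-line model of this kind can be fitted by ordinary least squares. The sketch below is a minimal version under that assumption. The numbers are placeholders rather than the income and house-value data from the slide, and `fit_line` and `predict` are names made up for this example.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_value):
    """Estimate the dependent variable for a given x."""
    return intercept + slope * x_value


# Placeholder data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

intercept, slope = fit_line(x, y)
print(intercept, slope, predict(intercept, slope, 5))
```

The fitted line lets us estimate the likely value of one variable from the other, which is the step beyond correlation that the trend analysis slide describes.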

21
Regression model
  • Some phenomena may be modeled by regression
    reasonably well, and others may not.
  • The regression model assumes a linear
    relationship between the variables. If the
    relationship is not linear, or if the two
    variables have a weak relationship or none, then
    the model will perform poorly. Under either
    circumstance, we may have committed a model
    specification error.
  • A multivariate regression model can accommodate
    multiple independent variables.

22
Point Pattern descriptors and analyzers
23
Point Pattern
  • Point Pattern Descriptors
  • Central Tendency
  • Dispersion and Orientation
  • Point Pattern Analyzers
  • Quadrat Analysis
  • Nearest-Neighbor Analysis
  • Spatial Autocorrelation of Points
  • K-Function

24
The Nature of Point Features
  • Point pattern descriptors cover:
  • The methods for determining the overall patterns
    of a given set of points.
  • Measures used to describe the magnitude of
    spatial dispersion of a given set of points.
  • How the directional bias of a set of points can
    be extracted statistically.

25
Central Tendency of Point Distributions
  • A set of point descriptors provide certain
    descriptive information on the distribution of a
    set of points.
  • Central tendency information, mean centers,
    weighted mean centers, and median centers provide
    a good summary of how a set of points distributes
    in the geographic space.
  • To describe the spatial dispersion
    characteristics of a set of points, the measures
    of standard distance and standard ellipse will be
    discussed. These measures indicate the spatial
    variation and orientation of a point distribution.

26
Mean Center
  • The mean center, or spatial mean, is a central
    or average location of a set of points. For n
    points,
    $$x_{mc} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad y_{mc} = \frac{\sum_{i=1}^{n} y_i}{n}$$
    where x_mc and y_mc are the coordinates of the
    mean center, x_i and y_i are the coordinates of
    point i, and n is the number of points.

27
Weighted Mean Center
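  • The weighted mean center of a distribution of
    points can be found by multiplying the x- and y-
    coordinates of each point by the weight assigned
    to each observation or location, and dividing
    the sums by the total weight:
    $$x_{wmc} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad y_{wmc} = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$
  • w_i is the weight at point i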
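
The two centers can be computed directly from coordinate lists. The following is a small sketch with made-up points and weights, where the weight might stand for something like population at each location.

```python
def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n)


def weighted_mean_center(points, weights):
    """Weight each coordinate, then divide by the total weight."""
    total = sum(weights)
    return (sum(w * x for (x, _), w in zip(points, weights)) / total,
            sum(w * y for (_, y), w in zip(points, weights)) / total)


# Hypothetical points at the corners of a square; the last one is heavier.
points = [(0, 0), (4, 0), (4, 4), (0, 4)]
weights = [1, 1, 1, 5]

print(mean_center(points))                    # centre of the square
print(weighted_mean_center(points, weights))  # pulled toward (0, 4)
```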

28
Dispersion and Orientation of Point Distributions
  • Two sets of points may occupy the same geographic
    space and may be interrelated.
  • For example, one set of points represents the
    locations of forest fires and the other the
    locations of camping cabins in a wildlife region.
    They may have the same overall locations, but
    forest fires have a more dispersed spatial
    pattern than cabins.
  • In addition to spatial central tendency, it may
    be interesting to evaluate the magnitude of
    dispersion of locations and the orientation of
    the spatial distribution.

29
Standard Distance
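  • The standard distance is the spatial counterpart
    of the standard deviation in classical statistics
    (the population standard deviation, σ, or the
    sample standard deviation, S). With (x_mc, y_mc)
    as the mean center of n points, it can be
    computed as
    $$SD = \sqrt{\frac{\sum_{i=1}^{n} (x_i - x_{mc})^2 + \sum_{i=1}^{n} (y_i - y_{mc})^2}{n}}$$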

30
Weighted Standard Distance
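  • Points in a distribution may have different
    attribute values that reflect the relative
    importance of different point observations. The
    weighted standard distance can be computed as
    $$SD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - x_{wmc})^2 + \sum_{i=1}^{n} w_i (y_i - y_{wmc})^2}{\sum_{i=1}^{n} w_i}}$$
  • w_i is the weight for point i, and
  • (x_wmc, y_wmc) is the weighted spatial mean.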
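
A minimal Python sketch of the standard distance and its weighted form follows. It assumes the population-style definition (dividing by n, or by the total weight), and the points and weights are invented for illustration.

```python
from math import sqrt


def standard_distance(points):
    """Root mean squared distance of the points from their mean center."""
    n = len(points)
    x_mc = sum(x for x, _ in points) / n
    y_mc = sum(y for _, y in points) / n
    squared = sum((x - x_mc) ** 2 + (y - y_mc) ** 2 for x, y in points)
    return sqrt(squared / n)


def weighted_standard_distance(points, weights):
    """Same idea, measured from the weighted mean center with weights w_i."""
    total = sum(weights)
    x_wmc = sum(w * x for (x, _), w in zip(points, weights)) / total
    y_wmc = sum(w * y for (_, y), w in zip(points, weights)) / total
    squared = sum(w * ((x - x_wmc) ** 2 + (y - y_wmc) ** 2)
                  for (x, y), w in zip(points, weights))
    return sqrt(squared / total)


# Hypothetical points and weights.
points = [(0, 0), (4, 0), (4, 4), (0, 4)]
weights = [1, 1, 1, 5]

print(standard_distance(points))
print(weighted_standard_distance(points, weights))
```

The resulting value can be used as the radius of a standard distance circle drawn around the (weighted) mean center.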

31
Standard Deviational Ellipses
  • The standard distance circle is a very effective
    visualization tool to show the spatial spread of
    a set of point locations.
  • A logical extension of the standard distance
    circle is the standard deviational ellipse. It
    can capture the directional bias in a point
    distribution. Three components are needed to
    describe it:
  • An angle of rotation
  • Deviation along the major axis
  • Deviation along the minor axis
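
The three components can be estimated in several ways. The sketch below uses one simple approach: the principal axes of the 2x2 covariance matrix of the coordinates. This may differ in detail from the formulas behind the slides, so treat it as an illustrative assumption rather than the deck's own method. The angle is measured counterclockwise from the x-axis to the major axis.

```python
from math import atan2, sqrt


def standard_deviational_ellipse(points):
    """Return (angle in radians, major-axis deviation, minor-axis deviation)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Population variances and covariance of the coordinates.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the major axis.
    angle = 0.5 * atan2(2 * sxy, sxx - syy)
    # Eigenvalues of the covariance matrix give the variance along each axis.
    half_trace = (sxx + syy) / 2
    spread = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    major = sqrt(half_trace + spread)
    minor = sqrt(max(half_trace - spread, 0.0))  # guard against rounding below 0
    return angle, major, minor


# Hypothetical points lying on the diagonal y = x.
angle, major, minor = standard_deviational_ellipse([(0, 0), (1, 1), (2, 2)])
print(angle, major, minor)
```

For points on a diagonal line the ellipse degenerates: the angle is 45 degrees and the minor-axis deviation is essentially zero.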

32
Elements defining a standard deviational ellipse
33
Standard deviational ellipses for men-only and
women-only shelters
34
Point Pattern Analyzers
  • To fully understand the various states and
    dynamics of a particular geographic phenomenon,
    an analyst must be able to detect spatial
    patterns from the point distributions and to
    track the changes in point patterns at different
    times.

35
Point Pattern Analyzers
  • Quadrat Analysis allows analysts to determine if
    a point distribution is similar to a random
    pattern using a spatial sampling framework.
  • Nearest Neighbor Analysis compares the average
    distance between nearest neighbors in a set of
    points to that of a theoretical pattern.
  • Spatial autocorrelation coefficients measure how
    similar neighboring points are.
  • K-function analysis can identify and evaluate the
    clustering of points at different spatial scales,
    or extents.

36
Quadrat Analysis
  • Quadrat Analysis evaluates a point distribution
    by examining how its density changes over space.
  • The density measured by Quadrat Analysis is then
    compared with the density of a theoretically
    constructed random pattern to see if the point
    distribution in question is more clustered or
    more dispersed than the random pattern.

37
General Concept in Quadrat Analysis (1)
  • Start with a regular square grid and a number of
    points falling in some of the squares.
  • The squares are referred to as quadrats, which
    are essentially sampling units in spatial
    statistical jargon.
  • The circle is the most geometrically compact
    shape; however, circles cannot cover the entire
    geographic space unless they overlap.
  • In an extremely clustered point pattern, all or
    most of the points fall inside one or a few
    squares only. In an extremely dispersed pattern,
    referred to as a uniform pattern or a triangular
    lattice, all squares contain a similar number of
    points.

38
Observed pattern of Ohio cities and hypothetical
clustered and dispersed patterns
39
General Concept in Quadrat Analysis (2)
  • Statistically, Quadrat Analysis will achieve a
    fair evaluation of the density across the study
    area if it applies a large enough number of
    randomly generated quadrats.
  • An optimal quadrat size can be calculated as
    2A/r, where A is the area of the study area and
    r is the number of points in the distribution.
  • Once the quadrat size for a point distribution
    is determined, Quadrat Analysis can proceed to
    establish the frequency distribution of the
    number of points for all quadrats.
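
The counting step can be sketched as follows for a systematic (regular grid) layout of square quadrats. Two assumptions are made here: the size 2A/r is read as the area of a quadrat, so its side is the square root of that value, and points outside the grid are simply ignored. The sample points are hypothetical.

```python
from collections import Counter
from math import sqrt


def optimal_quadrat_side(area, num_points):
    """Side of a square quadrat whose area is 2A/r (assumed reading of 'size')."""
    return sqrt(2 * area / num_points)


def quadrat_frequency(points, xmin, ymin, side, ncols, nrows):
    """Count points per quadrat and tabulate how many quadrats hold k points."""
    per_quadrat = Counter()
    for x, y in points:
        col = int((x - xmin) // side)
        row = int((y - ymin) // side)
        if 0 <= col < ncols and 0 <= row < nrows:
            per_quadrat[(col, row)] += 1
    # Include empty quadrats so the frequency distribution is complete.
    counts = [per_quadrat.get((c, r), 0)
              for c in range(ncols) for r in range(nrows)]
    return counts, Counter(counts)


# Hypothetical points in a 3 x 3 study area, using unit quadrats.
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (2.5, 2.5)]
counts, frequency = quadrat_frequency(points, 0, 0, 1, 3, 3)

print(counts)      # points in each of the 9 quadrats
print(frequency)   # number of quadrats with 0, 1, 2, ... points
```

The `frequency` table is the observed distribution that is later compared with the one expected from a random pattern.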

40
Examples of systematic and random quadrats
41
Comparing Observed and Expected Patterns
  • Besides using the K-S statistic to test if the
    observed pattern is different from a random
    pattern, one may perform the Variance-Mean Ratio
    Test by taking advantage of a specific
    statistical property of the Poisson
    distribution: its variance equals its mean.
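
Because the variance of a Poisson distribution equals its mean, the ratio of the variance to the mean of the quadrat counts should be close to 1 for a random pattern, above 1 for a clustered pattern and below 1 for a dispersed one. The sketch below computes only the ratio. It assumes the sample variance (dividing by the number of quadrats minus one) and leaves out the significance test.

```python
def variance_mean_ratio(counts):
    """Variance-to-mean ratio of the number of points per quadrat."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical quadrat counts.
print(variance_mean_ratio([1, 1, 1, 1]))   # evenly spread: ratio is 0
print(variance_mean_ratio([4, 0, 0, 0]))   # all points in one quadrat: ratio is 4
print(variance_mean_ratio([2, 0, 0, 1, 0, 0, 0, 0, 1]))
```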

42
Ordered Neighbor Analysis
  • Quadrat Analysis is useful in comparing an
    observed point pattern to a random or
    theoretically known distribution. However, it has
    certain limitations.
  • The analysis captures information on the points
    within each quadrat, but no information on
    points between quadrats is used in the analysis.
    As a result, Quadrat Analysis may be
    insufficient to distinguish between certain point
    patterns, such as those in the following figures.

43
Spatial Configurations
  • Visually, the two patterns are different. Using
    Quadrat Analysis, however, the two patterns yield
    the same result.

44
Nearest Neighbor Statistic
  • The Nearest Neighbor Statistic is derived from
    the average distance between points and each of
    their nearest neighbors.
  • The second-order neighbor statistic uses the
    distance to the second nearest neighbors.
    Higher-order neighbors can be defined in
    similar ways.
  • Ordered neighbor statistics can evaluate the
    pattern at different spatial scales.

45
Quadrat Analysis and Nearest Neighbor Analysis
  • While both Quadrat Analysis and Nearest Neighbor
    Analysis test point distributions, they utilize
    different spatial concepts.
  • Quadrat Analysis tests a point distribution with
    the points-per-area concept, using quadrats as
    sampling units.
  • Nearest Neighbor Analysis uses the concept of
    area per point.
  • Both methods are similar in the sense that the
    observed pattern is compared with some known
    distribution (a random pattern).

46
Nearest Neighbor statistics
  • How Nearest Neighbor Analysis works.
  • In a homogeneous region, the most uniform pattern
    formed by a set of points occurs when this region
    is partitioned into a set of identical hexagons,
    each with a point at its center. The distance
    between points will be
    $$1.075 \sqrt{A/n}$$
    where A is the area of the region and n is the
    number of points.

47
R statistic or R scale
  • The R statistic is the ratio of the observed
    average distance between nearest neighbors of a
    point distribution to the expected average
    nearest neighbor distance. It is also called the
    nearest neighbor statistic.
    $$R = \frac{r_{obs}}{r_{exp}}$$
  • r_obs is the observed average distance between
    nearest neighbors and r_exp is the expected
    average distance between nearest neighbors as
    determined by the theoretical pattern.
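
A brute-force sketch of the R statistic is given below. It assumes a random pattern as the theoretical reference, with an expected distance of 0.5 times the square root of A/n for the first order and 0.75 times that root for the second order (the two constants quoted in the second-order slide). Edge effects and the significance test are ignored, and the sample points are hypothetical.

```python
from math import dist, sqrt

# Constants for the expected distance under a random pattern, by neighbor order.
EXPECTED_CONSTANT = {1: 0.5, 2: 0.75}


def observed_mean_distance(points, order=1):
    """Average distance from each point to its order-th nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        distances = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        total += distances[order - 1]
    return total / len(points)


def expected_mean_distance(n, area, order=1):
    """Expected order-th neighbor distance for n random points in the area."""
    return EXPECTED_CONSTANT[order] * sqrt(area / n)


def r_statistic(points, area, order=1):
    """Ratio of observed to expected average neighbor distance."""
    return (observed_mean_distance(points, order)
            / expected_mean_distance(len(points), area, order))


# Hypothetical evenly spaced points in a 2 x 2 study area.
points = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
print(r_statistic(points, area=4.0))           # above 1: more dispersed than random
print(r_statistic(points, area=4.0, order=2))
```

Values of R below 1 indicate that neighbors are closer than expected (clustering), and values above 1 indicate that they are farther apart (dispersion).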

48
Calculation of the observed nearest neighbor
distance
  • d_1 = d_13, d_2 = d_23, d_3 = d_32, d_4 = d_43
  • (For point 1, the nearest neighbor is point 3.)

49
Cities in Ohio
  • By selecting the seven largest cities in Ohio,
    we can compute their nearest neighbor distances
    and the observed average nearest neighbor
    distance, r_obs = 51.82 miles.

50
Higher-order neighbor statistics
  • Nearest Neighbor Analysis has been extended to
    accommodate the second, third, and other
    higher-order neighbor definitions. When two
    points are not immediate nearest neighbors but
    rather the second nearest neighbors, the way
    distances are computed between them will need to
    be adjusted accordingly.

51
Second-order nearest neighbor distance
  • The second-order nearest neighbor statistic is
    R_2 = r_obs / r_exp.
  • d_i is the distance between point i and its
    second nearest neighbor.
  • The expected nearest neighbor distance in the
    denominator of the R_2 statistic is similar to
    the first-order expected distance, except that
    the constant changes from 0.5 to 0.75.

52
Observed and expected higher-order nearest neighbor
distance
  • Standard error estimate for the second-order
    nearest neighbor distance
  • Generally, for the k-order neighbor statistic,
    two constants are used for the expected distance
    and the standard error, respectively.

53
K-Function Analysis Steps (1)
  • Another statistic that can offer some insights,
    and is more parsimonious for evaluating whether
    the magnitude of clustering is uniform over
    different spatial scales, is K-function analysis.
    It is an extension of the ordered neighbor
    statistics. For a set of points in a region,
    K-function analysis involves the following steps:
  • Step 1: Select a distance increment or spatial
    lag, d, that is analogous to the unit reflecting
    the change in the spatial scale.
  • Step 2: Set the iteration number g = 1 to begin
    the process.

54
K-Function Analysis Steps (2)
  • Step 3: Around each point i in a region, create a
    circular buffer with a radius of h, where
    h = d·g. Therefore, the buffer will have a size
    d in the first iteration, 2d in the second, and
    so on.
  • Step 4: For each point, count the number of
    points falling within its buffer of size h and
    denote that count as n(h).
  • Step 5: Increase the radius of the buffer by d.
  • Step 6: Repeat steps 3, 4, and 5 by increasing h
    until g = r or g = D/d.

55
Estimation of the K-function
  • The figure in the next slide uses only four
    points to illustrate the procedure.
  • Only three rings or buffers were created instead
    of the full range up to D. For a given h, we
    count the number of points within the buffers
    centered at all points. Point A is rather
    dispersed from the other points, and therefore
    the counts are relatively low for buffers with
    small h. Point B is in the middle of the cluster,
    and therefore the point counts are relatively
    high with the small buffers, but the increases in
    point counts are substantial with large values of
    h. Points C and D are themselves apart from the
    cluster.

56
Estimation of the K-function
57
Relationship between point counts and the spatial
lag h
  • The relationship between point counts and the
    spatial lag from empirical observation can be
    compared with a known pattern, most likely a
    random pattern.
  • In a random pattern, point counts increase with
    increasing h but in no particular pattern.
  • The K-function detects clustering at different
    scales by comparing the relationship between
    point counts and the size of h to that in a
    random distribution.

58
Computation of K-Function
  • The number of points within the buffers with a
    lag h is counted as follows:
    $$n(h) = \sum_{i} \sum_{j \neq i} I_h(d_{ij})$$
  • i and j are the indices of points.
  • d_ij is the distance between the two points i, j.
  • I_h is an indicator function such that I_h = 1 if
    d_ij < h and I_h = 0 otherwise.
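
The counting procedure can be sketched as a direct double loop over point pairs. This minimal version returns only the raw pair counts for each lag h = d·g. It applies no scaling by area or number of points and no boundary correction, and the four sample points are invented.

```python
from math import dist


def k_function_counts(points, d, steps):
    """For each lag h = d * g (g = 1..steps), count ordered pairs closer than h."""
    results = []
    for g in range(1, steps + 1):
        h = d * g
        count = sum(
            1
            for i, p in enumerate(points)
            for j, q in enumerate(points)
            if i != j and dist(p, q) < h  # indicator I_h(d_ij)
        )
        results.append((h, count))
    return results


# Hypothetical pattern: a tight cluster of three points and one distant point.
points = [(0, 0), (1, 0), (0, 1), (5, 5)]
for h, count in k_function_counts(points, d=2.5, steps=3):
    print(h, count)
```

In this example the count is already high at the smallest lag and stays flat until the lag is large enough to reach the isolated point, which is the kind of scale-dependent behavior the K-function is meant to reveal.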

59
Boundary Problems in K-Function
  • Like other spatial statistical and analytical
    techniques, the K-function is subject to
    boundary problems.
  • Imagine that a point is located rather close to
    the edge of the study region. When buffers are
    formed around the point, a significant proportion
    of the buffers will be outside of the study area
    and thus will distort the probability of finding
    a point within the vicinity of h.

60
Spatial Autocorrelation of Points
  • Spatial autocorrelation coefficients measure and
    test how clustered or dispersed the point
    locations are with respect to their attribute
    values.
  • Spatial autocorrelation of a set of points refers
    to the degree of similarity between points or
    events occurring at these points and points or
    events in nearby locations.
  • With the spatial autocorrelation coefficient, we
    can measure:
  • The proximity of locations.
  • The similarity of the characteristics of these
    locations.

61
Measures for Spatial Autocorrelation
  • Two popular indices for measuring spatial
    autocorrelation applicable to a point
    distribution: Geary's Ratio and Moran's I Index.
  • s_ij represents the similarity of point i's and
    point j's attributes.
  • w_ij represents the proximity of point i's and
    point j's locations, with w_ii = 0 for all points.
  • x_i represents the value of the attribute of
    interest for point i.
  • n represents the total number of points.

62
SAC (1)
  • The spatial autocorrelation coefficient (SAC) is
    proportional to the weighted similarity of the
    point attribute values.

63
SAC (2)
  • The spatial weights in the computation of the
    spatial autocorrelation coefficient may take on a
    form other than a distance-based format. For
    example:
  • w_ij can take a binary form of 1 or 0, depending
    on whether point i and point j are spatially
    adjacent.
  • If two regions share a common boundary, the two
    centroids of these regions can be defined as
    spatially adjacent: w_ij = 1; otherwise w_ij = 0.

64
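Geary's Ratio
  • In Geary's Ratio, the similarity of attribute
    values between two points is defined as
    $$s_{ij} = (x_i - x_j)^2$$
  • The computation of Geary's Ratio:
    $$C = \frac{(n-1) \sum_{i} \sum_{j} w_{ij} (x_i - x_j)^2}{2 \left( \sum_{i} \sum_{j} w_{ij} \right) \sum_{i} (x_i - \bar{x})^2}$$
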
65
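Moran's I Index
  • In Moran's I Index, the similarity of attribute
    values between two points is defined as
    $$s_{ij} = (x_i - \bar{x})(x_j - \bar{x})$$
  • The computation of Moran's I Index:
    $$I = \frac{n \sum_{i} \sum_{j} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left( \sum_{i} \sum_{j} w_{ij} \right) \sum_{i} (x_i - \bar{x})^2}$$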
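
Both indices can be computed from a list of attribute values and a square weights matrix. The sketch below uses the standard textbook forms of Moran's I and Geary's C with a binary adjacency matrix. The four values and the chain-like adjacency are made up for illustration, and no significance test is included.

```python
def moran_i(values, weights):
    """Moran's I: cross-products of deviations from the mean, weighted by w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * dev[i] * dev[j]
                    for i in range(n) for j in range(n))
    return n * numerator / (w_total * sum(d * d for d in dev))


def geary_c(values, weights):
    """Geary's C: squared differences between pairs, weighted by w_ij."""
    n = len(values)
    mean = sum(values) / n
    w_total = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * (values[i] - values[j]) ** 2
                    for i in range(n) for j in range(n))
    denominator = 2 * w_total * sum((v - mean) ** 2 for v in values)
    return (n - 1) * numerator / denominator


def expected_moran_i(n):
    """Expected value of Moran's I when there is no spatial autocorrelation."""
    return -1 / (n - 1)


# Hypothetical example: four points in a row, each adjacent to its neighbors.
values = [1, 2, 3, 4]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

print(moran_i(values, weights), expected_moran_i(len(values)))
print(geary_c(values, weights))
```

Here neighboring points have similar values, so Moran's I lies above its expected value and Geary's C falls below 1, both pointing to a clustered pattern.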

66
Geary's Ratio vs. Moran's I Index
Numerical scales of Geary's Ratio and Moran's I

Spatial pattern | Geary's C | Moran's I
Clustered pattern, in which adjacent or nearby points show similar characteristics | 0 < C < 1 | I > E(I)
Random pattern, in which points do not show particular patterns of similarity | C ≈ 1 | I ≈ E(I)
Dispersed pattern, in which adjacent or nearby points show different characteristics | 1 < C < 2 | I < E(I)

E(I) = -1/(n-1), with n denoting the number of points in the distribution.

67
Scales of Geary's Ratio and Moran's I Index
  • The scale of Geary's Ratio does not correspond to
    our conventional impression of a correlation
    coefficient on the (-1, 1) scale, while the scale
    of Moran's I resembles more closely the scale of
    a conventional correlation measure.
  • The value for no spatial autocorrelation is not
    zero but -1/(n-1).
  • The values of Moran's I Index in some empirical
    studies are not bounded by (-1, 1), especially
    the upper bound of 1.

68
Conclusions
  • Distribution Descriptors using a single variable
    and Relationship Descriptors using two (or more)
    variables are typical statistical tools.
  • Point Pattern Descriptors and Point Pattern
    Analyzers can be used to study deeper patterns
    in the data, in combination with various
    representations (spatial, grid, k-mean, ellipse,
    etc.).
  • Autocorrelation analysis is used to further
    understand data relationships with respect to
    the distance between spatial locations.