Spatial Correspondence of Areal Distributions - PowerPoint PPT Presentation

About This Presentation
Title:

Spatial Correspondence of Areal Distributions

Description:

Table 2. Cities Falling Inside a County Won by Either Bush or Gore ... ZGore/Gore 15.47; ZBush/Bush 8.75. Overlay Analysis ... – PowerPoint PPT presentation

Slides: 31
Provided by: lem76

Transcript and Presenter's Notes



1
Spatial Correspondence of Areal Distributions
  • Quadrat and nearest-neighbor analysis deal with a
    single distribution of points
  • Often, we want to measure how the distributions of
    two or more variables correspond
  • The coefficient of areal correspondence and the
    chi-square statistic perform these tasks

2
Coefficient of Areal Correspondence
  • Simple measure of the extent to which two
    distributions correspond to one another
  • Example: compare areas of wheat farming with areas
    of minimal rainfall
  • Based on the approach of overlay analysis

3
Overlay Analysis
  • Two distributions of interest are mapped at the
    same scale, and the outline of one is overlaid
    on the other

4
Coefficient of Areal Correspondence
  • CAC is the ratio of the area where the two
    distributions overlap to the total area covered
    by either distribution (the area of their union)
    within the study region
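A minimal sketch of the ratio, assuming (hypothetically) that both distributions have been rasterized onto the same grid so that area can be measured by counting cells. The function name and the boolean-grid representation are illustrative choices, not part of the original method:

```python
def coefficient_of_areal_correspondence(a, b):
    """Return CAC = cells covered by both / cells covered by either.

    `a` and `b` are equally sized grids (lists of rows) of booleans,
    True where the distribution is present in that cell.
    """
    both = 0
    either = 0
    for row_a, row_b in zip(a, b):
        for in_a, in_b in zip(row_a, row_b):
            if in_a and in_b:
                both += 1      # overlap (intersection)
            if in_a or in_b:
                either += 1    # total area covered (union)
    if either == 0:
        raise ValueError("neither distribution covers any cell")
    return both / either


# Hypothetical 3 x 3 grids: wheat farming and minimal rainfall
wheat = [[True, True, False],
         [True, False, False],
         [False, False, False]]
dry = [[True, False, False],
       [True, True, False],
       [False, False, False]]
print(coefficient_of_areal_correspondence(wheat, dry))  # 2 / 4 = 0.5
```

Identical grids give 1 and grids with no shared cells give 0, matching the limits described on the next slides.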

5
(No Transcript)
6
Result of CAC
  • Where there is no correspondence, CAC is equal to
    0
  • Where there is total correspondence, CAC is equal
    to 1
  • CAC provides a simple measure of the extent of
    spatial association between two distributions,
    but it cannot provide any information about the
    statistical significance of the relationship

7
Resemblance Matrix
  • Proposed by Court (1970)
  • Advantages over CAC
  • Limits are -1 to +1, with a perfect negative
    correspondence given a value of -1
  • Sampling distribution is roughly normal, so you
    can test for statistical significance

8
Chi-Square Statistic
  • Measures the strength of association between two
    distributions
  • Class Example
  • Relationship between wheat yield and
    precipitation
  • Two maps showing high and low yields and high and
    low precipitation

9
[Maps: HIGH PRECIP and HIGH YIELD]
10
[Map: High Precip. and High Yield]
11
Chi-Square
  • By combining both distributions on one map, we can
    better understand the relationship between the
    two distributions
  • In this example we are using a grid
  • The finer the grid, the more precise the
    measurement
  • Four possibilities exist
  • Low rainfall, low yield
  • Low rainfall, high yield
  • High rainfall, low yield
  • High rainfall, high yield

12
Chi-Square
  • Record the total number of occurrences into a
    table of observed frequencies

[Table: observed frequencies, PRECIP. (High, Low) by WHEAT yield (High, Low)]
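Tallying the four combinations listed on the previous slide can be sketched as follows, assuming each map has been reduced to a grid of True ("high") and False ("low") cells. The function name and the grids are made up for illustration:

```python
def observed_frequencies(precip_high, yield_high):
    """Tally grid cells into a 2 x 2 table of observed frequencies.

    Both arguments are equally sized grids of booleans (True = "high").
    Rows of the result are precipitation (high, low); columns are
    wheat yield (high, low).
    """
    table = [[0, 0], [0, 0]]
    for row_p, row_y in zip(precip_high, yield_high):
        for p, y in zip(row_p, row_y):
            table[0 if p else 1][0 if y else 1] += 1
    return table


# Hypothetical 3 x 3 grids
precip = [[True, True, False],
          [True, False, False],
          [False, False, False]]
wheat_yield = [[True, False, False],
               [True, True, False],
               [False, False, False]]
print(observed_frequencies(precip, wheat_yield))  # [[2, 1], [1, 5]]
```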
13
Chi-Square
  • Create a table of expected frequencies using
    probability (e.g., the expected number of
    high-rainfall, high-yield cells)
  • Expected frequency = row total × column total /
    table total

[Tables: observed and expected frequencies, PRECIP. (High, Low) by WHEAT yield (High, Low)]
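The row total × column total / table total rule can be sketched in a few lines. The observed counts below are hypothetical, chosen only to make the arithmetic easy to follow:

```python
def expected_frequencies(observed):
    """Expected count for each cell = row total * column total / table total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Hypothetical observed counts: rows = precipitation (high, low),
# columns = wheat yield (high, low)
observed = [[30, 10],
            [20, 40]]
print(expected_frequencies(observed))  # [[20.0, 20.0], [30.0, 30.0]]
```

The expected table keeps the same row and column totals as the observed table; only the distribution within the table changes.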
14
Compute Chi-Square
  • Therefore, in our example we have

[Table: observed and expected frequencies for the High/High, High/Low, Low/High, and Low/Low cells]
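The chi-square statistic is conventionally the sum of (O - E)² / E over every cell, where O is the observed and E the expected frequency. A minimal sketch with hypothetical 2 × 2 tables (the function name is illustrative):

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 tables (rows = precipitation, columns = yield)
observed = [[30, 10], [20, 40]]
expected = [[20.0, 20.0], [30.0, 30.0]]  # row total * column total / 100
print(chi_square(observed, expected))  # 10 + 20/3, about 16.67
```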
15
Interpreting Chi Square
  • A value of zero indicates no relationship (the
    observed frequencies equal the expected ones)
  • Larger values indicate a stronger relationship
    (a larger departure from the expected frequencies)
  • Alternatively, a table of critical values can be
    consulted to determine whether a specific value
    is statistically significant
  • Showing that there is an association between
    variables does NOT tell us anything about WHY it
    exists. In our analysis we might state our
    assumptions as to why, but we would need to
    perform other analyses to show causation.
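For a 2 × 2 table such as the wheat/precipitation example, the table lookup can be replaced by a direct calculation. This sketch is limited to 1 degree of freedom: a chi-square variable with 1 df is the square of a standard normal Z, so its upper-tail probability is erfc(√(x/2)). Function names are illustrative:

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With 1 df the statistic is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


df = degrees_of_freedom(2, 2)          # 1 for a 2 x 2 table
p = chi_square_p_value_df1(50 / 3)     # hypothetical statistic of about 16.67
print(df, p < 0.05)                    # 1 True
```

A statistic of about 3.84 gives p ≈ 0.05, which is the familiar 95% critical value for 1 degree of freedom. Larger tables need the general chi-square distribution, for example from a statistics library.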

16
If you don't have Chi-Square values
  • Yule's Q
  • The value of Yule's Q always lies between -1 and +1
  • A value of 0 indicates no relationship
  • A value of +1 indicates a positive relationship
  • A value of -1 indicates a negative relationship
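Yule's Q is conventionally computed from the four cells of a 2 × 2 table, labelled a and b (first row) and c and d (second row), as Q = (ad - bc) / (ad + bc). A minimal sketch with hypothetical counts; the orientation of rows and columns is an assumption and only affects the sign convention:

```python
def yules_q(table):
    """Yule's Q for a 2 x 2 table [[a, b], [c, d]]: (ad - bc) / (ad + bc)."""
    (a, b), (c, d) = table
    ad, bc = a * d, b * c
    if ad + bc == 0:
        raise ValueError("Yule's Q is undefined when ad + bc = 0")
    return (ad - bc) / (ad + bc)


# Hypothetical counts: rows = precipitation (high, low),
# columns = wheat yield (high, low)
print(yules_q([[30, 10], [20, 40]]))  # (1200 - 200) / (1200 + 200), about 0.71
```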

17
Analysis of Election 2000
  • Polygon to Polygon
  • Point to Polygon

18
Assessing Our Cultural Divide: Results from the
2000 Presidential Election
Arthur J. Lembo, Jr., Ph.D., Cornell University
Paul Overberg, USAToday
ANALYSIS OF SPATIAL AUTOCORRELATION: JOIN COUNT ANALYSIS
ANALYSIS OF SPATIAL CORRESPONDENCE: OVERLAY ANALYSIS
A second analysis was used to determine the
likelihood of a county with urban areas voting
for either candidate. For this study, four
categories were evaluated: counties with small
cities (under 50,000), medium-sized cities
(50,000 to 75,000), large cities (greater than
75,000), and no cities. Based on the percentage
of counties won by each candidate (Gore 22%,
Bush 78%), we computed the random probability
that a city would fall within a Bush county or a
Gore county. This probability allowed us to
determine the expected number of cities that
would be located within Gore counties or Bush
counties. The actual number of cities located in
a Gore county or Bush county was determined using
overlay analysis with ArcView. Similar to the
previous example, z-scores were computed for each
of the categories as follows:
$z = (O - E) / \sqrt{npq}$
where O is the observed number of cities falling
within a county, E is the expected number of
cities falling within a county, p is the
probability of a city falling in a Bush county, q
is the probability of a city falling in a Gore
county, and n is the total number of cities.
Table 2. Cities Falling Inside a County Won by
Either Bush or Gore
                  Expected      Observed      Z
                  Gore  Bush    Gore  Bush    Gore  Bush
Large (>75K)      66    238     184   119     267   272
Medium (50-75K)   54    196     147   98      470   55
Small             2030  1236    4,998 3
No City           427   1588    347   1690    18    29
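A minimal sketch of a binomial z-score using the quantities O, E, p, q, and n defined above, assuming the form z = (O - E) / √(npq) with E = np. The function and the example counts are hypothetical and are not the poster's data:

```python
import math


def binomial_z(observed, n, p):
    """z-score of an observed count against a binomial expectation.

    observed: cities falling in counties won by one candidate (O)
    n: total number of cities in the size category
    p: probability that a city falls in that candidate's counties
    """
    q = 1 - p
    expected = n * p                      # E
    return (observed - expected) / math.sqrt(n * p * q)


# Hypothetical category of 100 cities with p = 0.22 for Gore counties
print(round(binomial_z(40, 100, 0.22), 2))
```

Because each city falls in either a Bush or a Gore county, √(npq) is the same for both candidates within a category; only the sign and size of O - E differ.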
As previously stated, at the 95% confidence level
a purely random sample drawn from a population
whose true mean is 0 would have a z-score within
±1.96. Table 2 indicates that each of the z-score
values exceeds 1.96 in magnitude. This implies a
significant correlation between votes for Al Gore
and counties with cities, and between votes for
George W. Bush and counties without cities (rural
areas).
Join Count Analysis is a method of spatial
autocorrelation that evaluates the statistical
significance of clustering among neighboring
polygons. Based upon the total number of
counties won by each candidate (Gore 588, Bush
2214), the expected number of adjacent counties
that voted for the same candidate (i.e., two
adjacent counties voting for Bush) was computed.
In addition, the actual number of adjacent
counties that voted for the same candidate was
computed using spatial analysis techniques in
ArcView GIS. The results were as follows:
Table 1. Expected vs. Actual Joins of Adjacent
Counties Voting for the Same Candidate
                  Expected   Actual
Gore/Gore Joins   438        879
Bush/Bush Joins   5516       6253
Assuming an independent random process, we
computed the z-score, or number of standard
deviations away from the mean, for each
candidate's specified number of joins
(Z Gore/Gore = 15.47; Z Bush/Bush = 8.75). At the
95% confidence level, a purely random sample drawn
from a population whose true mean is 0 would fall
within a z-score range of ±1.96. Both numbers
were significantly higher than 1.96, indicating
significant positive spatial autocorrelation.
Therefore, the join count analysis showed that
clustering exists within the county voting
patterns. Inferred from this analysis is the
observation that regionalized voting patterns
existed in the 2000 Presidential Election.
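The poster does not spell out its variance formula, so the following is only a sketch of one common formulation: free sampling, where each area independently has probability p of being won by a given candidate. Under that assumption the expected number of like joins is J·p², where J is the number of joins, and the variance follows from counting pairs of joins that share an area. The function name and inputs are hypothetical, and this sketch is not guaranteed to reproduce the expected counts (438, 5516) or z-scores reported above:

```python
import math


def join_count_like(edges, has_attr, p=None):
    """Join count test for like joins (e.g., Gore/Gore) under free sampling.

    edges: (i, j) pairs of adjacent areas, each join listed once
    has_attr: booleans, True if area i has the attribute (e.g., won by Gore)
    p: probability of the attribute; the observed share is used if None
    Returns (observed, expected, variance, z).
    """
    n = len(has_attr)
    if p is None:
        p = sum(has_attr) / n
    observed = sum(1 for i, j in edges if has_attr[i] and has_attr[j])

    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1

    joins = len(edges)
    expected = joins * p ** 2
    # Variance of each join's indicator, plus covariance for every ordered
    # pair of distinct joins that share one area (disjoint joins are independent)
    shared = sum(d * (d - 1) for d in degree)
    variance = joins * (p ** 2 - p ** 4) + shared * (p ** 3 - p ** 4)
    z = (observed - expected) / math.sqrt(variance)
    return observed, expected, variance, z


# Hypothetical map: four areas arranged in a ring, two won by Gore
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
gore = [True, True, False, False]
print(join_count_like(edges, gore))  # (1, 1.0, 1.25, 0.0)
```

A positive z means more like joins than the independent process would produce, i.e. clustering; the poster's other common variant (sampling without replacement from the fixed county totals) changes the expectation and variance slightly.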
ABSTRACT
Although the 2000 Presidential election was one
of the closest in recent history, many
commentators noted that the voting patterns
appeared to exhibit a cultural divide, with
urban areas voting for Al Gore and rural areas
voting for George W. Bush. Because most of these
comments are based on a subjective view of the
county voting patterns, this project attempts to
provide a quantifiable measure of the voting
patterns exhibited during the 2000 election.
Specifically, we were interested in determining
whether a statistically significant clustering
pattern existed in the county-wide results, and
whether each candidate won their assumed cultural
association (Gore: urban; Bush: rural). To test
these hypotheses, two separate spatial analysis
methods were applied to county-wide voting
patterns within the United States. The first
method used a measure of spatial autocorrelation
called join count analysis to determine whether
voting patterns exhibited evidence of spatial
clustering. The second method used map overlay
to determine the likelihood of cities falling
within either Bush or Gore counties.
Conclusion
This analysis provided quantifiable evidence that
positive spatial autocorrelation (clustering) of
voting patterns existed during the 2000
Presidential Election. The analysis also showed a
high statistical correlation between urbanized
areas and county votes for Al Gore. Further
analysis is necessary to better understand
causation (e.g., ethnicity, income, age); however,
both analyses indicate that geographic regions
(e.g., urban areas) may have played a large role
in the vote determination for Election 2000.
Data provided courtesy of Election Data Services
and USAToday.
Figure 1. Examples of Cities in Relation to the
Distribution of Counties. These examples from
New York and Minnesota show that although Bush
(in red) won a majority of the counties, the
cities appear clustered primarily within the few
counties that Gore won (in blue). For example,
in Minnesota a majority of the cities lie within
Hennepin County, while in New York virtually
every county Gore won has a city within its
border.
19
Election 2000 Results
  • Join Count Analysis
  • Table 1. Expected vs. Actual Joins of Adjacent
    Counties Voting for the Same Candidate
                          Expected   Actual
        Gore/Gore Joins   438        879
        Bush/Bush Joins   5516       6253
  • Z Gore/Gore = 15.47; Z Bush/Bush = 8.75
  • Overlay Analysis
  • Table 2. Cities Falling Inside a County Won by
    Either Bush or Gore
                          Expected      Observed      Z
                          Gore  Bush    Gore  Bush    Gore  Bush
        Large (>75K)      66    238     184   119     267   272
        Medium (50-75K)   54    196     147   98      470   55
        Small             2030  1236    4,998 3
        No City           427   1588    347   1690    18    29

Not mutually exclusive of large cities; we must
account for this.
20
Election 2000 Results
  • There was obvious spatial autocorrelation in the
    way people voted. That is, Bush counties and
    Gore counties were highly clustered
  • Also, there was a very high correlation between
    urbanized counties and votes for Gore, and
    between non-urbanized counties and votes for Bush

21
Analysis of Environmental Justice
  • Point in Polygon Analysis
  • By
  • Greg Thorhaug, css620 project, Spring 2001

22
(No Transcript)
23
(No Transcript)
24
Erie Chi-Squared
25
Summary
  • Spatial data analysis is possible through basic
    statistical methods
  • More in-depth analysis is possible using spatial
    statistics
  • GIS software may be used to prepare data for
    statistical analysis
  • Spatial data analysis techniques provide a
    powerful tool for analyzing GIS data and enable
    users to solve problems creatively

26
Cross Tabulation
  • Assume we have two 9-cell land cover maps, one
    from 1980 and one from 2000, with three
    categories: A, B, and C.
  • The resulting cross tabulation provides a
    pixel-by-pixel comparison of the interpreted land
    cover types on the two dates. For the upper
    left-hand cell, the 1980 land cover was A, and
    the 2000 land cover was also A. Therefore, this
    is a match between the 1980 data and the 2000
    data. However, in the lower right cell the 1980
    data indicated a value of C, while the 2000 value
    was B. This is not a match, and would indicate
    an error between the two sources.
  • We can now quantify the results in a matrix, as
    shown below. This matrix is often called a
    confusion matrix.

[Figure: 3 × 3 Ground Reference Data and Interpreted Land Cover Data grids (classes A, B, C), cross tabulated cell by cell into a Cross Tabulated Grid]

                         Ground Reference
                         A    B    C
Map Classification  A    2    2    0
                    B    0    2    1
                    C    0    1    1
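Cross tabulation itself is a simple tally. As a sketch, assuming both maps are stored as co-registered grids of class labels: the function name is illustrative, and the two grids are hypothetical, chosen so that they reproduce the confusion matrix used on the following slides (rows = map classification, columns = ground reference):

```python
def cross_tabulate(map_grid, reference_grid, categories):
    """Confusion matrix from two co-registered categorical grids.

    Rows are map (interpreted) classes and columns are ground reference
    classes, matching the matrix layout on these slides.
    """
    index = {c: k for k, c in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for map_row, ref_row in zip(map_grid, reference_grid):
        for m, r in zip(map_row, ref_row):
            matrix[index[m]][index[r]] += 1
    return matrix


# Hypothetical 3 x 3 grids chosen to reproduce the slides' matrix
map_grid = [["A", "B", "A"],
            ["B", "C", "C"],
            ["A", "A", "B"]]
reference_grid = [["B", "B", "A"],
                  ["B", "B", "C"],
                  ["B", "A", "C"]]
for row in cross_tabulate(map_grid, reference_grid, "ABC"):
    print(row)
# [2, 2, 0]
# [0, 2, 1]
# [0, 1, 1]
```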
27
Confusion Matrix
  • The matrix on the right compares the two
    hypothetical data sets: the 1980 data set and the
    2000 data set.
  • As an example, geographic features that were
    classified as A on the map in 1980 and were still
    A in 2000 are counted in the upper left-hand cell
    of the matrix, which has the value 2 (there were
    two pixels that met this criterion). This means
    that 2 units in the overall map that were mapped
    as A actually are A. The same holds for
    classifications B and C.
  • But there may have been times when the 1980 value
    was A and the 2000 value was B. In this case, the
    2 in the top row of the matrix says that there
    are 2 units of something that was A in 1980 but
    is now B in 2000.
  • We can begin to add these numbers up by adding
    an additional row and column. But what do these
    numbers tell us?

                         Ground Reference
                         A    B    C    Total
Map Classification  A    2    2    0    4
                    B    0    2    1    3
                    C    0    1    1    2
                Total    2    5    2

28
Comparing the maps
  • The bottom row tells us that there were two cells
    that were A, five cells that were B, and two
    cells that were C. The rightmost column tells us
    that we mapped four cells as A, three cells as B,
    and two cells as C. Adding up the diagonal cells
    shows that there were 5 cells where we actually
    got it right.
  • So, the overall map comparison is really a
    function of
  • Total cells on the diagonal / total number of
    cells
  • (2 + 2 + 1) / (2 + 2 + 0 + 0 + 2 + 1 + 0 + 1 + 1)
    = 5/9 ≈ .55 agreement
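The totals and the overall agreement can be sketched directly from the matrix, with rows as map classification and columns as ground reference. The function names are illustrative:

```python
def totals(matrix):
    """Row totals (cells mapped as each class) and column totals (reference)."""
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    return row_totals, col_totals


def overall_agreement(matrix):
    """Cells on the diagonal divided by the total number of cells."""
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Confusion matrix from these slides (rows = map, columns = ground reference)
matrix = [[2, 2, 0],
          [0, 2, 1],
          [0, 1, 1]]
print(totals(matrix))             # ([4, 3, 2], [2, 5, 2])
print(overall_agreement(matrix))  # 5 / 9, about 0.556
```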


29
Other Accuracy Assessment
  • The total correspondence of our example is 55%.
    But that only tells us part of the story. What
    if we were really interested in classification B?
    Were there changes in classification B? Even
    here, there are two different ways of
    interpreting that question.
  • If I were interested in mapping all the areas of
    B, how well did I get them all? This is called
    the map producer's accuracy. That is, how well
    did we produce a map of classification B?
  • If I were to use the map to find B, how
    successful would I be? This is called the map
    user's accuracy. That is, how much confidence
    should a user of the map have in a given
    classification?
  • To compute the map user's accuracy, we divide
    the number correct within a row by the total
    number in the whole row. Staying with our
    example of classification B:
  • We had two cells where B was correct. However,
    we actually said that there were three cells that
    contained B (in other words, we incorrectly
    called a cell B when it should have been C).
    Therefore, we have
  • 2 correct B values / 3 total values ≈ .66 user's
    accuracy.
  • This means that if we were to use this map to
    look for classification B, we would be correct
    about 66% of the time.
  • To compute the map producer's accuracy, we divide
    the number correct within a column by the total
    number in the whole column. Staying with our
    example of classification B:
  • We had two cells where B was correct. However,
    there were actually five cells that should have
    been B. Therefore, we have
  • 2 correct B values / 5 total values that should
    be B = .4 producer's accuracy
  • This means that the map captured only 40% of all
    the B cells that were out there.
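Both accuracies divide the same diagonal cell by different totals: the row total for user's accuracy and the column total for producer's accuracy. A minimal sketch with illustrative function names, using the matrix from these slides (rows = map classification, columns = ground reference):

```python
def users_accuracy(matrix, k):
    """Correct cells in row k / all cells mapped as class k."""
    return matrix[k][k] / sum(matrix[k])


def producers_accuracy(matrix, k):
    """Correct cells in column k / all reference cells of class k."""
    return matrix[k][k] / sum(row[k] for row in matrix)


matrix = [[2, 2, 0],
          [0, 2, 1],
          [0, 1, 1]]
b = 1  # index of classification B
print(users_accuracy(matrix, b))      # 2 / 3
print(producers_accuracy(matrix, b))  # 2 / 5 = 0.4
```

Changing the index to 0 or 2 gives the values for classifications A and C.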


30
User and Producer Accuracy
  • To test your understanding, compute the user's
    and producer's accuracies for classifications A
    and C.
  • This also gives us some indication of the nature
    of the errors. For instance, it appears that we
    confused classification A with classification B
    (on two occasions we mapped as A cells that were
    actually B). By understanding the nature of the
    errors, perhaps we can go back, review our
    process, and correct for that mistake.

                         Ground Reference
                         A    B    C    Total    User's Accuracy
Map Classification  A    2    2    0    4
                    B    0    2    1    3        .66
                    C    0    1    1    2
                Total    2    5    2
  Producer's Accuracy         .4