Title: SPATIAL MODELS
1SPATIAL MODELS FOR DATA REPORTED AS COUNTS
OVER GEOGRAPHIC AREAS
Gary Simon, 28 APRIL 2006
2With special thanks to Frank LoPresti (Academic
Computing Services, GIS Group) and Kevin Tun
(Stern I.T. Group)
3Here's an interesting, obscure formula. Consider
a set of points:
Point 1 (x1, y1)
Point 2 (x2, y2)
...
Point n (xn, yn)
4Connect the points in order. Draw a line from
point 1 to point 2, then from point 2 to point 3,
..., from point n-1 to point n. Finally draw a
line from point n back to point 1. Assume that
none of the segments cross, so that this is a
polygon.
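5The area of the resulting polygon is given by
$\text{Area} = \pm\frac{1}{2}\sum_{i=1}^{n}\left(x_i y_{i+1} - x_{i+1} y_i\right)$, where point n+1 is taken to be point 1.
The + occurs when the perimeter is drawn
counter-clockwise, the - when drawn clockwise.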
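The polygon area formula can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard "shoelace" form of the formula; the function name and the test square are made up for the example and do not come from the slides.

```python
def signed_polygon_area(points):
    """Signed area of a simple polygon given as [(x1, y1), ..., (xn, yn)].

    The result is positive when the points run counter-clockwise and
    negative when they run clockwise.
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        x_i, y_i = points[i]
        # The segment after point n goes back to point 1.
        x_next, y_next = points[(i + 1) % n]
        total += x_i * y_next - x_next * y_i
    return total / 2.0


# Hypothetical example: a 2-by-2 square listed counter-clockwise.
square_ccw = [(0, 0), (2, 0), (2, 2), (0, 2)]
area_ccw = signed_polygon_area(square_ccw)
area_cw = signed_polygon_area(square_ccw[::-1])  # same square, clockwise
```

Taking the absolute value gives the area; the sign alone reports the direction in which the perimeter was drawn.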
7The data:
K regions
Counts z1, z2, ..., zK; total count z
Populations P1, P2, ..., PK; total population P
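8The obvious null hypothesis of uniformity is
tested by
$G^2 = 2\sum_{k=1}^{K} z_k \log\left(z_k / e_k\right)$, with $e_k = z P_k / P$, on K - 1 degrees of freedom.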
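As a rough sketch of the uniformity test, the following Python function computes a likelihood-ratio G2 statistic. It assumes that under uniformity each region's expected count is the total count times the region's share of the total population; the function name is hypothetical.

```python
import math


def g2_uniformity(counts, populations):
    """Return (G2, degrees of freedom) for the uniform-rate hypothesis.

    Assumes expected counts e_k = z * P_k / P, where z is the total count
    and P is the total population.
    """
    total_count = sum(counts)
    total_pop = sum(populations)
    g2_half = 0.0
    for z_k, p_k in zip(counts, populations):
        e_k = total_count * p_k / total_pop
        if z_k > 0:  # a zero count contributes nothing to the sum
            g2_half += z_k * math.log(z_k / e_k)
    return 2.0 * g2_half, len(counts) - 1
```

When the counts are exactly proportional to the populations, every ratio z_k / e_k is 1 and the statistic is zero.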
9Uniformity is often rejected. What should be the
alternative to uniformity?
Techniques like kriging assess covariance
structure and not the structure of the expected
counts.
10There are also techniques that measure spatial
association (Cliff and Ord, 1973, 1981) with Moran's I
and with Geary's c, and these also relate to covariance
notions.
Cliff, A.D. and Ord, J.K. (1973) Spatial Autocorrelation, London: Pion.
Cliff, A.D. and Ord, J.K. (1981) Spatial Processes: Models and Applications, London: Pion.
Spatial association can also be given angular
interpretations (Simon, 1997).
Simon, Gary (1997) An Angular Version of Spatial Correlations, with Exact Significance Tests, Geographical Analysis, vol 29, no 3, pp 267-278.
11Let's form a model for the spatial force and
give this model a central location or hot spot.
Denote this location by s = (sx, sy).
Here sx and sy are parameters to be estimated.
12Let f(z) be the spatial force at location z.
Then let f(z)
13At the hot spot itself, f(s) = c.
At any z with ||z - s|| = a, f(z) = c/2.
Thus a is a half-strength distance.
14In this form, the only role of c is to assure the
condition
15This can be generalized to mix uniform and
hot-spot features. f(z)
The parameter θ assesses the strength of the
hot spot relative to uniformity.
Negative θ indicates a protective effect.
16The maximum likelihood expected counts ek
will be used in the test statistic
$G^2 = 2\sum_{k=1}^{K} z_k \log\left(z_k / e_k\right)$
17The value of ek will be computed as Pk ×
(average force on county k), scaled so that
the ek sum to the total count z.
18Consider cancer rates in Florida.
Age-Adjusted Death
Rates for Florida, 1998-2002.
http://www.stateofflorida.com
19Florida has 67 counties. There were 38,814 cases
in a population of 15,982,378. The rate is 2.43
per 1,000. The G2 statistic is 2,816.27 on 66
degrees of freedom. The cancer rates are not
uniform.
20The maximum likelihood fit occurred at parameter
values sx = 375.8877, sy = 300.6793, a = 13.4375,
θ = 2.325
21This fit has G2 = 2,246.93 on 66 - 4 = 62
degrees of freedom. This is still an inadequate
fit, but the reduction in G2 is 569.34 with four
degrees of freedom.
22The fitted values are these
The hot spot is at (82.56° W longitude, 28.80° N latitude),
in Citrus County.
23Map information comes in (longitude, latitude)
form that needs to be converted to (x, y) form in
(say) miles.
24Each degree of latitude has the same mile
equivalent.
One degree of latitude cuts off the same arc
length at all latitudes.
(Figure labels: North Pole, equatorial plane)
25However, a degree of longitude represents a small
distance near the poles and a large distance near
the equator.
(Figure labels: 30° N latitude, equator)
26Problem: Find the length of one degree of
longitude at latitude φ.
Solution: Form a spherical triangle with one corner at the
North Pole, an angle of one degree at the North
Pole, and with two sides each equal to 90° - φ.
27In a spherical triangle, the sides also have
angle measure.
(Figure labels: 30° N latitude, equator)
28We can use the law of sines for spherical
triangles
$\frac{\sin A}{\sin a} = \frac{\sin B}{\sin b} = \frac{\sin C}{\sin c}$
A, B, C are the angles and a, b, c are the sides opposite them.
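The degree-of-longitude problem can be checked numerically. The sketch below is not the slides' own calculation: it solves the same polar triangle with the spherical law of cosines (swapped in for the law of sines because it needs only the two known sides and the one-degree angle), and it also gives the arc length measured along the parallel itself. The Earth radius of 3958.8 miles and all function names are assumptions made for this example.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius, not from the slides


def miles_per_degree_latitude():
    """Arc length of one degree along a meridian (same at every latitude)."""
    return EARTH_RADIUS_MILES * math.radians(1.0)


def miles_per_degree_longitude(lat_deg):
    """Arc length of one degree of longitude measured along the parallel."""
    return miles_per_degree_latitude() * math.cos(math.radians(lat_deg))


def great_circle_miles_for_one_degree(lat_deg):
    """Great-circle distance between two points at latitude lat_deg that
    are one degree of longitude apart.

    Uses the polar triangle: two sides of (90 - latitude) degrees meeting
    at the North Pole with an included angle of one degree.
    """
    side = math.radians(90.0 - lat_deg)
    apex = math.radians(1.0)
    cos_c = math.cos(side) ** 2 + math.sin(side) ** 2 * math.cos(apex)
    return EARTH_RADIUS_MILES * math.acos(min(1.0, cos_c))
```

For a span as small as one degree the two longitude measures agree closely, so either can serve when converting (longitude, latitude) to (x, y) in miles over a region the size of a state.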
29The expected count E(zk) = ek is found as Pk ×
(average force on county k). This average force
could be f(ck), where ck is the center of the
county.
30Instead we will use $\frac{1}{\mathrm{Area}(\Omega)}\iint_{\Omega} f(h)\,dh$, where Ω denotes the
county and h is the two-dimensional variable of
integration.
31The value of can be
obtained from outside sources.
The challenge comes in finding the integral of f over Ω.
This can be difficult even for simple figures, and Ω
is not simple.
32Finding the integral over Ω requires some
organized description of ∂Ω, the boundary of Ω.
Fortunately, such descriptions are available from
mapping programs.
33Consider this geographical region
34The mapping program MapInfo will export an MIF file
giving coordinates of (longitude, latitude)
points on the boundary.
The file has this layout (point count, then one point per line):
26
-75 40.1288
-75.0154 40.1378
-75.1094 40.0454
. . .
-75 40.0294
-74.9755 40.0485
-74.9893 40.1259
-75 40.1288
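Reading a boundary listing like this one takes only a few lines. The sketch below handles just the fragment shown on the slide (a point count followed by coordinate pairs) and makes no claim about the rest of the MIF format; the function name and the four-point sample are hypothetical, with the sample reusing a few of the slide's coordinates rather than reproducing the real file.

```python
def parse_boundary(text):
    """Parse 'count, then longitude latitude pairs' into a list of points.

    Only the layout shown on the slide is assumed; headers and other
    sections of a full export file are not handled here.
    """
    tokens = text.split()
    n_points = int(tokens[0])
    values = [float(tok) for tok in tokens[1:]]
    if len(values) != 2 * n_points:
        raise ValueError("point count does not match the number of coordinates")
    return [(values[i], values[i + 1]) for i in range(0, len(values), 2)]


# Abbreviated, made-up sample in the same layout (not the actual file).
sample = """4
-75 40.1288
-75.0154 40.1378
-74.9893 40.1259
-75 40.1288
"""
boundary = parse_boundary(sample)
```

Note that the listing closes the polygon by repeating the first point at the end, so the repeated point adds a zero-length segment to any boundary sum.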
35A graph of these points
36With the boundary so identified, county Ω is a
polygon, so the task of finding the integral of f over Ω
is equivalent to integrating over that polygon.
The mathematics can be done with Green's theorem.
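37Green's theorem for connected region Ω and for
scalar functions P and Q of two variables is
$\oint_{\partial\Omega}\left(P\,dx + Q\,dy\right) = \iint_{\Omega}\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dx\,dy$
with the boundary ∂Ω traversed counter-clockwise.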
38The boundary ∂Ω needs to be parameterized as a
function of a single variable, say t. This is
possible when the boundary is made up of simple
curves or, as in the MapInfo story, straight
lines.
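39The line connecting $(x_k, y_k)$
to $(x_{k+1}, y_{k+1})$ is parameterized as
$x = x_k + t\,(x_{k+1} - x_k), \quad y = y_k + t\,(y_{k+1} - y_k), \quad 0 \le t \le 1$
Note that dy means $(y_{k+1} - y_k)\,dt$.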
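40In the statement of Green's theorem,
let's use $\partial Q/\partial x = 1$ and $\partial P/\partial y = 0$,
so that $\partial Q/\partial x - \partial P/\partial y = 1$.
41Green's theorem is now
$\mathrm{Area}(\Omega) = \iint_{\Omega} dx\,dy = \oint_{\partial\Omega}\left(P\,dx + Q\,dy\right)$
42This solves as P(x, y) = 0 and Q(x, y) = x,
and then $\mathrm{Area}(\Omega) = \oint_{\partial\Omega} x\,dy$.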
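43With the boundary ∂Ω given as a polygon, the
calculation is routine. The consequence
is $\mathrm{Area}(\Omega) = \frac{1}{2}\sum_{k=1}^{m_\Omega}\left(x_k + x_{k+1}\right)\left(y_{k+1} - y_k\right) = \frac{1}{2}\sum_{k=1}^{m_\Omega}\left(x_k y_{k+1} - x_{k+1} y_k\right)$,
where $m_\Omega$ is the number of boundary
points of region Ω and point $m_\Omega + 1$ is point 1.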
44This calculation finds the area of region Ω and,
as a side benefit, discovers whether the point
ordering was clockwise or counter-clockwise.
45We also need the integrated force function $\iint_{\Omega} f(x, y)\,dx\,dy$.
46Match to Green's theorem
with P(x, y) = 0 and $\partial Q/\partial x = f(x, y)$.
47This means that we need to be able to find Q(x,
y) with $\partial Q/\partial x = f(x, y)$. The solution is Q(x, y)
48Then $\iint_{\Omega} f(x, y)\,dx\,dy = \oint_{\partial\Omega} Q(x, y)\,dy$
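50Let $(x_1, y_1), (x_2, y_2), \ldots, (x_{m_\Omega}, y_{m_\Omega})$ be
the boundary points of Ω. Then
$\oint_{\partial\Omega} Q\,dy = \sum_{k=1}^{m_\Omega}\int_{\text{segment } k} Q\,dy$
Segment k connects point k to point k + 1.
(The last segment goes back to point 1.)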
51Each segment is parameterized by t, with 0 ≤ t ≤
1. The integral can be found by any reasonable
approximation method. If the interval is short,
use the average of the integrand at the endpoints.
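52In particular,
$\int_{\text{segment } k} Q\,dy \approx \frac{1}{2}\left[Q(x_k, y_k) + Q(x_{k+1}, y_{k+1})\right]\left(y_{k+1} - y_k\right)$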
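53The summation over k collapses to the very simple
form
$\oint_{\partial\Omega} Q\,dy \approx \frac{1}{2}\sum_{k=1}^{m_\Omega} Q(x_k, y_k)\left(y_{k+1} - y_{k-1}\right)$
The counter k is to be interpreted mod $m_\Omega$.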
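The boundary sum can be sketched as follows. With endpoint averaging on each segment, the sum over segments regroups into half the sum of Q at each boundary point times the difference between the next and previous y values, with indices wrapping around. The slides' own Q is not recoverable from this transcript, so the function takes any Q as a callable; the function name and the unit-square checks are assumptions made for illustration.

```python
def integrate_over_polygon(q, points):
    """Approximate the line integral of Q dy around a polygon boundary.

    q       callable Q(x, y) chosen so that dQ/dx equals the integrand f
    points  boundary points in counter-clockwise order, first point not
            repeated at the end

    Uses the endpoint-average (trapezoid) rule on each segment, regrouped
    as (1/2) * sum of Q(x_k, y_k) * (y_{k+1} - y_{k-1}).
    """
    m = len(points)
    total = 0.0
    for k in range(m):
        x_k, y_k = points[k]
        y_next = points[(k + 1) % m][1]  # index wraps mod m
        y_prev = points[(k - 1) % m][1]
        total += q(x_k, y_k) * (y_next - y_prev)
    return total / 2.0


# Hypothetical checks on the unit square, listed counter-clockwise.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
area = integrate_over_polygon(lambda x, y: x, unit_square)             # f = 1
x_integral = integrate_over_polygon(lambda x, y: x * x / 2.0, unit_square)  # f = x
```

With Q(x, y) = x the routine reduces to the polygon area calculation, which makes a convenient sanity check before plugging in a force function. Long boundary segments would need to be subdivided for the endpoint average to remain a good approximation.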
54For any values of sx, sy, a, θ it is possible to
express a Poisson likelihood and thus to get
maximum likelihood estimates. This is not an easy
computation.
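To make the likelihood step concrete, here is a minimal sketch of the Poisson log-likelihood for one set of parameter values. It assumes the expected counts are Pk times the average force on region k, rescaled so that they add up to the total observed count; the average forces are taken as given inputs, and both function names are hypothetical. Maximizing over sx, sy, a, θ would wrap this in a numerical optimizer, which is not shown.

```python
import math


def expected_counts(counts, populations, avg_forces):
    """Expected counts proportional to population times average force,
    rescaled (an assumption of this sketch) to sum to the observed total."""
    weights = [p * f for p, f in zip(populations, avg_forces)]
    scale = sum(counts) / sum(weights)
    return [scale * w for w in weights]


def poisson_log_likelihood(counts, expected):
    """Log-likelihood of independent Poisson counts with the given means."""
    return sum(
        z * math.log(e) - e - math.lgamma(z + 1)  # lgamma(z + 1) = log(z!)
        for z, e in zip(counts, expected)
    )
```

With equal average forces in every region, the expected counts reduce to the uniform-rate values, total count times each region's share of the population.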
55Trevelyan, Smallman-Raynor, and Cliff provided a
spatial analysis of the 1916 polio epidemic that
hit the northeastern United States.
Trevelyan, Barry, Smallman-Raynor, Matthew, and Cliff, Andrew D. (2005) The Spatial Structure of Epidemic Emergence: Geographical Aspects of Poliomyelitis in North-eastern USA, July-October 1916, Journal of the Royal Statistical Society, Series A, vol 168, part 4, pp 701-722.
56Their region of inquiry: county-based data
for 148 counties.
57These counties had total population 20,532,602
and 20,777 cases of polio. This is about 1.01
cases per thousand people.
58Observed polio rates
59The test for uniformity gives G2 = 16,713.64 on 147
degrees of freedom.
60Maximum likelihood estimates: sx = 450.78, sy =
135.77, a = 56.80, θ = 15.66. The
center is offshore, east of Ocean County, New
Jersey.
61The display of fitted rates
62The fit measure is G2 = 7,045.73 on 143 degrees of
freedom. The reduction in G2 is 9,667.91, for four
degrees of freedom.
63Next step: Use the integrated force function as
a carrier in a Poisson regression.
64The End