Title: Spatial Statistics: Relationships
Lecture 3
- Spatial Statistics: Relationships
Relationship statements
- Male moderate drinkers are less likely to suffer from insulin-dependent diabetes than non-drinkers.
- A better economy has more potential for people to be employed.
- The number of watches someone wears is directly proportional to the number of arms they have.
- Mountains cause rainfall. Smoking causes cancer.
- Girls are better than boys, so there.
- The number of teapots in China has no effect on the frequency of volcanic eruptions in Italy.
This lecture
- Correlation: how much do two variables vary together?
- Regression: what is their relationship?
- Spatial autocorrelation and cross-correlation.
- Semi-variograms.
- Geographically Weighted Regression.
Correlation
- As one variable changes, how closely do others follow it?
- Usually represented on graphs.
- Can plot unrelated pairs of data from datasets of different sizes (q-q plots): rank both datasets and plot the 10th-percentile value against the 10th-percentile value, the 90th against the 90th, and so on.
- Or plot linked pairs of data in scatterplots.
- Correlation can be positive or negative.
Positive correlations
- Attractiveness of chosen gender vs. alcohol intake.
- Bus trip time from Headingley vs. importance of travel reason.
- Ash's cumulative Pokémon losses vs. matches played.
Negative correlations
- Ability to perform with chosen gender vs. alcohol intake.
- Money vs. clubs visited.
- Will to live vs. time in statistics lectures.
Correlation is one of the most useful and used statistical techniques.
- Correlation is an essential part of science.
- Correlation is an essential part of politics.
Examples
Correlation is one of the most abused statistical techniques.
- Correlation is an essential part of dodgy science.
- Correlation is an essential part of political misinformation.
- Data can be selectively correlated.
- There is no cause-and-effect link just because two variables correlate.
Examples
What can we do about this?
- Tricky, but one start is to build convincing cause-and-effect models that demonstrate the same behavior.
- This gives us something concrete to investigate.
- But we then have to test our predictions.
Correlation can be strong or weak.
How do we measure correlation?
- We use correlation coefficients.
- These are usually denoted r and vary between -1 and 1.
  - -1: very strong negative correlation.
  - 0: no correlation.
  - 1: very strong positive correlation.
Which one depends on the type of data
- Parametric tests: used for data that is
  - Interval or ratio.
  - Normally distributed.
  - From sample populations with the same standard deviations.
- Non-parametric tests: used for all other data, including
  - Ranked data.
  - Categorized data.
One parametric test: Pearson's Correlation Coefficient
- The idea is to calculate the average covariance: how much one variable varies as the other varies.
- Deviation = value - mean.
- The product of the two variables' deviations gives a measure of covariance:
  (valueOne - meanOne) x (valueTwo - meanTwo)
- If both variables' values are far from the mean, the product is large. If one deviation is large and the other small, the product will be smaller.
Pearson's Correlation Coefficient
- Pearson's correlation coefficient r is the sum of these products, normalised by the standard deviations.
- The simplest way of calculating this is:

  r = \frac{\sum xy / n - \bar{x}\,\bar{y}}{s_x s_y}

- where x and y are the samples, \bar{x} and \bar{y} the sample means, s_x and s_y the sample standard deviations, and n the sample size.
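A minimal sketch of this calculation in Python (numpy assumed; the data values below are invented for illustration):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: mean of the products of paired values, minus the
    product of the means, normalised by the two standard deviations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    covariance = np.sum(x * y) / n - x.mean() * y.mean()
    return covariance / (x.std() * y.std())   # population (n) std devs

# Example: strongly positively correlated data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
print(pearson_r(x, y))            # close to 1
print(np.corrcoef(x, y)[0, 1])    # numpy's own version, for comparison
```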
One non-parametric test: Spearman Rank Correlation Coefficient
- Given x and y sample pairs, we convert the xs into their rank among all the xs, and the ys into their rank among all the ys.
- Spearman's coefficient is then calculated using:

  r_s = 1 - \frac{6 \sum d^2}{n^3 - n}

- where d is the difference between the ranks for any given pair (a measure of the covariance).
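A hedged sketch of the same in Python, assuming no tied values (ties need averaged ranks); the sample data is invented:

```python
import numpy as np

def spearman_rs(x, y):
    """Spearman's rank correlation: rank both samples, then apply
    r_s = 1 - 6*sum(d^2) / (n^3 - n), where d is the rank difference."""
    x, y = np.asarray(x), np.asarray(y)
    # argsort of argsort gives 0-based ranks (assumes no tied values).
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    d = rank_x - rank_y
    n = len(x)
    return 1 - 6 * np.sum(d**2) / (n**3 - n)

print(spearman_rs([3, 1, 4, 15, 9], [30, 12, 40, 160, 85]))  # 1.0: same ordering
```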
Testing the significance of the coefficients
- When the data can be assumed normal, we can test the null hypothesis that there is no correlation (r = 0) using the following statistic:

  t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}

- which has a t distribution with n - 2 degrees of freedom.
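As a sketch, assuming scipy is available for the t distribution (the r and n values below are invented):

```python
import numpy as np
from scipy import stats

def correlation_p_value(r, n):
    """Two-tailed p-value for the null hypothesis r = 0, assuming
    normal data, using t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 d.o.f."""
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
    return 2 * stats.t.sf(abs(t), df=n - 2)

# r = 0.75 from only 10 samples: significant at the 5% level?
print(correlation_p_value(0.75, 10))   # ~0.012, so yes
```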
Problem correlations
- Bizarre.
- Strong but non-linear: with many non-linear relationships we can transform the data to a linear form. For example, exponential data can be made linear by taking the natural log of the data.
- Very bizarre.

[Figures: example scatterplots for each case.]
Regression
- Quantifying the relationship between two or more variables.
- Linear regression with two variables: we aim to produce a single line that quantifies the relationship.

[Figure: scatterplot of the dependent variable (y) against the independent variable (x), with a fitted line of intercept a, rise Δy and run Δx.]

The equation for such a line is y = a + bx, where b is the slope (Δy/Δx on the figure). We can use this line to predict new values given an independent value.
Finding the regression line
- We take the line that minimizes some measure of how well the line fits the data.
- In the case of two-variable linear regression, we try to minimize the deviations between the data and the line, or residuals.

The coefficients of such a line are given by:

  b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}, \qquad a = \bar{y} - b\bar{x}
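A minimal sketch of these two formulas in Python (numpy assumed; data invented):

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = a + b*x from the slide's formulas:
    b = sum((x-xbar)(y-ybar)) / sum((x-xbar)^2),  a = ybar - b*xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    b = np.sum(dx * dy) / np.sum(dx * dx)
    a = y.mean() - b * x.mean()
    return a, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a, b = fit_line(x, y)
print(a, b)           # intercept and slope
print(a + b * 6.0)    # predict y for a new independent value
```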
How much the line explains the data
- The sum of the squared residuals gives us a measure of how much of the data is not explained by the line.
- This value, divided by the total variation in the data (the sum of the squared deviations from the mean), gives the fraction of the data the line fails to match.
- One minus this gives how well it matches: the coefficient of determination.
- Conveniently, this value is the square of the correlation coefficient r, and is therefore also known as r².
- Thus, the significance test for the r value also gives us the significance of our line.
Multiple regression
- We can still do regression when there is more than one independent variable.
- For example, in the case of three variables (two independent) we are looking for a solution sheet, not a line.

[Figure: plane fitted through points plotted against axes x1, x2 and y.]

We can do the same thing with a computer for as many dimensions as we like, but more than three become hard to visualize as graphs. We're essentially trying to fit a line with the equation:

  y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots
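A sketch of a two-independent-variable fit via least squares in numpy; the data is synthetic, built so the true answer is known:

```python
import numpy as np

# Synthetic data: y = 1 + 2*x1 - 3*x2 plus a little noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 10, 50)
y = 1 + 2 * x1 - 3 * x2 + rng.normal(0, 0.1, 50)

# Design matrix: a column of ones for a, then one column per variable.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coeffs)   # approximately [1, 2, -3] = (a, b1, b2)
```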
Polynomial regression
- In some cases we may want to fit a curve through non-linear data in multi-variable space.
- For one independent variable, the equation (a type known as a polynomial) for the curve is:

  y = a + b_1 x + b_2 x^2 + b_3 x^3

- Excel, for example, will fit this for you.
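The same fit is also a one-liner in numpy; a sketch with invented data:

```python
import numpy as np

# Fit y = a + b1*x + b2*x^2 + b3*x^3 by least squares.
x = np.linspace(0, 5, 20)
y = 2 - x + 0.5 * x**3 + np.random.default_rng(1).normal(0, 0.5, 20)

coeffs = np.polyfit(x, y, deg=3)    # coefficients, highest power first
print(coeffs)                       # roughly [0.5, 0, -1, 2]
print(np.polyval(coeffs, 2.5))      # predict y at x = 2.5
```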
Polynomial curve fitting
- The degree of the polynomial is the power of the last term. Higher-degree curves fit the sample data better.
- However, we've seen that a sample doesn't necessarily have the same distribution as the population.
- Our curve should reflect a general population model, not the sample data with all its measurement and random errors.
- When we look at AI techniques we'll see that predictive models based on data can become less accurate about the world as they increasingly match our samples and not the population.
- Therefore we have to make a judgement as to the polynomial degree, and not necessarily pick the highest.
Summary
- Correlation measures covariance, but doesn't say anything about causal relationships.
- We can measure correlation and test its significance.
- We can quantify relationships using regression equations and use these to predict.
Spatial autocorrelation
- One of the major issues in dealing with geographical data.
- The idea that values at one point may be correlated with values of the same variable nearby (or cross-correlated with another variable nearby).
- Geodemographics: people living near each other may have the same interests because they have the same opportunities and self-cluster.
- Rainfall in one geographical area stops rainfall in another.
- Crime spots cause social decay, which in turn causes more crime in a limited geographical area.
- All graded or clustered information suffers from this.
Frog averaging
- Say we want to know the average number of frogs in the country.
- We take a sample of six points.
- Three of them are normal and fall across the whole country, but three fall in a small area where there's a hidden pond.
- It's as if we've only really taken four samples.
How does this affect significance testing / correlation / regression?
- Essentially, if our data is spatially correlated, we aren't sampling as randomly as we would like in our attempts to get an overview of the population.
- i.e. some of our samples are the same (not independent) / don't count. This is the equivalent of taking a smaller sample.
- In correlation, it is possible that all our correlation is due to geography, and none to our variables.
How do we test for it?
- First, plot the data and look for geographical trends.
- In particular, plot the residuals of any regressions.

For example, the plot to the left might represent murder-rate residuals in an area after deprivation and policing levels are taken into account / regressed out. Anyone want to guess where Dr Lecter lives?

- Cluster analysis (in two weeks) looks for these kinds of trends.
But what if it's more pervasive?
- How do we test if, for example, it's a constant relationship between neighbours? Statistics that lump individuals together are useless.
- Example: ring speciation.
  - If you head east from Alaska there's no real difference between herring gulls in one area of the Arctic Ocean and the next. But the minor differences build up around the globe, so that Alaskan and Siberian gulls can't interbreed.
  - There's negative spatial autocorrelation in the fertility that you wouldn't understand if you mixed the whole population up.
- Example: factors in the spread of Ebola.
- We need a measure of the covariance between neighbours.
Plotting autocorrelations
- Imagine we had the following map of mineral deposits, showing just one variable.

[Figure: map of deposit values with a trend running NNE.]

- Obviously there is spatial autocorrelation in the NNE direction and not in the others.
h-scattergrams
- One way of displaying autocorrelation is to plot the values of points against the values of neighbours a distance h away in some direction.
- Usually the correlation will decrease with distance.
- Correlation may vary with direction as well.
Correlogram
- We can get a number of h-scatterplots for different h, and work out their correlation coefficients.
- This shows how the strength of the correlation drops off with distance from a set of points.
- We can plot these against each other for one direction, or as a contour plot for all directions.
Moment of inertia
- If a point x1 and its neighbour x2 were identical and plotted against each other, they'd fall on the 45° line x1 = x2.
- A measure of how far the data falls from this line is the moment of inertia:

  m = \frac{1}{2n} \sum (x_1 - x_2)^2

- Unlike the correlation coefficient, m increases as the data gets more spread out.
Variograms
- A plot of the moment of inertia vs. h is called a semi-variogram or, more usually, just a variogram.

[Figures: m plotted against h-distance for one direction, and m plotted against h-angle.]
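A sketch of an experimental variogram in Python, ignoring direction (omnidirectional) for simplicity; the gridded example data is invented:

```python
import numpy as np

def semivariogram(coords, values, lags, tol):
    """Experimental variogram: for each lag h, the moment of inertia
    m(h) = 1/(2n) * sum((x1 - x2)^2) over all pairs of points whose
    separation distance is within tol of h."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    # Matrix of distances between every pair of points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))
    sq = (values[:, None] - values[None, :])**2
    gamma = []
    for h in lags:
        mask = np.triu(np.abs(dist - h) < tol, k=1)  # count each pair once
        n = mask.sum()
        gamma.append(sq[mask].sum() / (2 * n) if n else np.nan)
    return np.array(gamma)

# Example: values on a 10x10 grid with a smooth east-west trend.
xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([xs.ravel(), ys.ravel()])
values = xs.ravel() + np.random.default_rng(2).normal(0, 0.2, 100)
print(semivariogram(coords, values, lags=[1, 2, 4, 8], tol=0.5))
# m generally rises with h: nearby points are more alike (autocorrelated).
```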
Problems
- Variograms can't use all the data values without additional assumptions, e.g. what is north of the northernmost data point? Usually we ignore the boundaries.
- All the correlation plots can suffer badly from a few unusual values, which can badly reduce the correlations.
- h-scatterplots allow us to see which unusual points are causing the problems and let us decide whether to remove them.
Multi-variate plots
- We may be interested in the relationship between two variables and whether they are spatially cross-correlated.
- We can plot h-scatterplots for a variable x and a variable y, but shift the y locations by h.

[Figure: h-scatterplot of x against y offset by h.]
Multi-variate correlation
- We can also calculate the cross-correlation for this h-scatterplot:

  r(h) = \frac{\sum xy / n - \bar{x}\,\bar{y}}{s_x s_y}

- where the means and standard deviations are just for the variable points used, i.e. x at one position, y at another.
Multi-variate variograms
- Equally, the equation for the moment of inertia can be extended to:

  m = \frac{1}{2n} \sum (x_1 - x_2)(y_1 - y_2)

- Note that this is no longer strictly the moment of inertia, as the line can be off 45°.
- Also note that it uses both x and y at positions 1 and 2.
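Extending the earlier variogram sketch to two variables (a hedged sketch; `dist` is the pairwise distance matrix built there):

```python
import numpy as np

def cross_variogram_at_lag(x_vals, y_vals, dist, h, tol=0.5):
    """Cross-variogram value at lag h: 1/(2n) * sum((x1-x2)*(y1-y2))
    over all pairs of locations separated by roughly h. `dist` is the
    pairwise distance matrix from the single-variable sketch above."""
    x_vals, y_vals = np.asarray(x_vals, float), np.asarray(y_vals, float)
    mask = np.triu(np.abs(dist - h) < tol, k=1)   # count each pair once
    n = mask.sum()
    dx = x_vals[:, None] - x_vals[None, :]
    dy = y_vals[:, None] - y_vals[None, :]
    return (dx * dy)[mask].sum() / (2 * n) if n else np.nan
```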
The use of variograms
- As we'll see in later lectures, variograms can be very useful.
- They represent the variability at different distances from a point.
- You can therefore use them to construct probability models of a landscape and predict the value of missing areas.
- This is known as kriging, and we'll look at it in later sessions.
- However, it might be nice to have a single statistic we can use to assess autocorrelation. One way is using joint count statistics.
Joint count statistics
- Moran and Geary in the 1950s.
- Define a binary variable: something is either present (white) or not (black).
- Calculate the number of B-B, W-W and B-W connections.
- These totals can then be compared with the normal distribution, which is what we'd get if the process was random.
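As a sketch, here is a joint count on a small binary grid, assuming rook (shared-edge) contiguity; the clustered example grid is invented:

```python
import numpy as np

def joint_counts(grid):
    """Count B-B, W-W and B-W joins on a binary grid, using rook
    (horizontal and vertical) contiguity."""
    g = np.asarray(grid, bool)
    bb = ww = bw = 0
    # Pair each cell with its right-hand and lower neighbour.
    for a, b in [(g[:, :-1], g[:, 1:]), (g[:-1, :], g[1:, :])]:
        bb += np.sum(~a & ~b)   # black-black (absent-absent)
        ww += np.sum(a & b)     # white-white (present-present)
        bw += np.sum(a != b)    # black-white
    return bb, ww, bw

# Clustered pattern: fewer B-W joins than a random pattern would give.
grid = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
print(joint_counts(grid))   # (8, 8, 8)
```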
Developments of this
- Moran's I for contiguous areas.
- Geary's c for contiguous areas.
- However, this is strongly dependent on:
  - which directions you take as contiguous,
  - variation in the size of areas and boundaries.
Cliff and Ord's Moran's I test
- Core values are the deviations from the mean at two locations.
- These are then multiplied by an a priori weight which represents how much two areas might affect each other.
- This is then normalized by the variation and the sample-number-to-weights ratio.
How do we define the weights?
- Various options:
  - One or zero, depending on whether the areas are adjacent.
  - Each area has a total of one, and this is divided up between its adjacent neighbours depending on the number of them.
  - Exponentially related to the distance between the areas (it's possible to assess the relationship between each area and all the others).
- We have to pick the most reasonable. A sketch of the calculation with the simplest weighting follows.
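A minimal sketch of Moran's I with the first weighting option (one/zero adjacency); the four-area example is invented:

```python
import numpy as np

def morans_i(values, W):
    """Moran's I: products of deviations from the mean for each pair of
    areas, multiplied by the weights W, normalised by the variation and
    the sample-number-to-weights ratio. W[i, j] links areas i and j."""
    x = np.asarray(values, float)
    z = x - x.mean()
    n = len(x)
    return (n / W.sum()) * (z @ W @ z) / np.sum(z**2)

# Four areas in a row, binary (0/1) adjacency weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
print(morans_i([1.0, 2.0, 3.0, 4.0], W))   # positive: neighbours are alike
print(morans_i([1.0, 4.0, 2.0, 3.0], W))   # negative: neighbours differ
```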
Geographically Weighted Regression
- Pioneered by the Newcastle United team of Fotheringham, Brunsdon, and Charlton.
- A bit like Moran's for regression, only even more arduous.
The core idea
- A standard regression has the same parameters wherever you are geographically, e.g. the relationship between socioeconomics and secondary school performance.
- Usually the residuals tell you where you've gone wrong.
- GWR allows the parameters to vary spatially, so you can look at these.
- It assumes a link between where you are and the strength of a relationship.
Locally weighted regression
- Run a standard regression for each point, but weight near points as more important (see the sketch after this list).
- Often the weights are an exponential function of distance and/or limited to a fixed number of nearest neighbours.
- Weakly dependent on the form of the weights; strongly dependent on how far the weights stretch around an area.
- Can try to find the best distance: the one that gives the best prediction for each point, if that point is excluded from the GWR calculations.
- Run, run and run again.
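A minimal sketch of one local regression step, assuming a Gaussian (exponential in squared distance) kernel and an invented bandwidth of 20; real GWR software also handles the bandwidth search and the significance tests:

```python
import numpy as np

def gwr_at_point(coords, X, y, point, bandwidth):
    """One local regression: weighted least squares where each sample's
    weight decays exponentially with distance from `point`. Repeating
    this for every point gives spatially varying parameters."""
    d = np.sqrt(((coords - point)**2).sum(axis=1))
    w = np.exp(-(d / bandwidth)**2)             # Gaussian kernel weights
    Xd = np.column_stack([np.ones(len(X)), X])  # intercept column
    sw = np.sqrt(w)
    # Weighted least squares: scale rows by sqrt(w), then solve.
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta                                 # local (a, b1, ...)

# Synthetic data where the true slope grows from west to east.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, (200, 2))
slope = coords[:, 0] / 50
x = rng.uniform(0, 10, 200)
y = slope * x + rng.normal(0, 0.1, 200)

for easting in [10.0, 50.0, 90.0]:
    beta = gwr_at_point(coords, x[:, None], y, np.array([easting, 50.0]), 20.0)
    print(easting, beta[1])   # the local slope rises eastwards
```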
GWR software
- Derives local t statistics.
- Performs tests to assess the significance of the spatial variation in the local parameter estimates.
- Performs tests to determine if the local model performs better than the global one, accounting for differences in degrees of freedom.
- http://www.ncl.ac.uk/geography/GWR
GWR example: spatial variations in school performance
- Did a global regression of primary school maths results vs. demographics.
- Then did a GWR regression, derived the weights for each factor at each point, and plotted them.
- Divided by an error-variation term to give a rough idea of the significance of the weights, and plotted these.
GWR example: results
- In Leeds / Bradford, school size was much more important than elsewhere (inverse relationship).
- In Manchester, middle-class children do proportionally better than their social group elsewhere.
- While the combination of variables and unknowns is complex, GWR does suggest interesting avenues of investigation.

[Figure: map of the weights for school size.]
Summary
- Correlation measures covariance, but doesn't say anything about causal relationships.
- We can measure correlation and test its significance.
- We can quantify relationships using regression equations and use these to predict.
Summary
- Spatial autocorrelation means our sampling strategies aren't as random / large as we'd like.
- Correlations can be due to geographical correlations, not the ones we've tested for.
- Plotting residuals geographically may let us see autocorrelation.
- Variograms and h-scatterplots are another good way.
- Moran's I test allows us to quantify spatial autocorrelation for a given weight scheme.
- Geographically Weighted Regression helps us to take autocorrelation into account and investigate the weight it has.
Next lecture
- Interpolation
- Homework:
  - Read the handout on autocorrelation stats.
  - View the GWR keynote talk: http://www.geocomputation.org/2001/