CPSC 601.04 - PowerPoint PPT Presentation

About This Presentation

Title:

CPSC 601.04

Description:

Title: CPSC 601.82 Lecture 8 Author: marina Last modified by: marina Created Date: 2/6/2003 4:10:28 AM Document presentation format: On-screen Show (4:3) – PowerPoint PPT presentation

Slides: 38

Provided by: Mar5276

Transcript and Presenter's Notes

1
CPSC 601.04

Statistical Analysis in GIS
Dr. M. Gavrilova

2
Overview

Importance of correct data representation
Variance and covariance
Autocorrelation
Applications to pattern analysis and geometric
modeling

3
Overuse of color and dimensionality
Four colors, three dimensions, and two plots to
visualize five data points
http://www.math.yorku.ca/SCS/Gallery/
4
Misleading data axis
5
Overcrowded data
Steven Skiena, Stony Brook, NY
http://www.cs.sunysb.edu/skiena
6
Time increasing over time
http://www.math.yorku.ca/SCS/Gallery/
7
Scatterplot: linear or logarithmic?
Results of a World Values Survey poll of
happiness among people throughout the world,
plotted against the economy (GNP per capita).
Many countries, particularly those in Latin
America, had higher marks for happiness than
their economic situation would predict. This
conclusion rests on the assumption that
happiness should be linearly related to GNP.
8
GIS goals

An organized collection of computer hardware,
software, geographic data, and personnel designed
to efficiently capture, store, update,
manipulate, analyze, and display all forms of
geographically referenced data.

9
Spatial Analysis

Provides
an efficient and generally reliable means of
obtaining knowledge about spatial processes,
a way of maximizing our knowledge of spatial
processes with the minimum of error.

10
Spatial processes

Spatial Data
location and attribute → P_i (x, y, z)
Spatial Stochastic Processes
statistics and inference
Spatial is special
spatial autocorrelation
spatial non-stationarity
proximity

11
Examples of data analysis
The Space Shuttle Challenger exploded shortly
after take-off in January 1986. Cause: failure of
the O-ring seals used to isolate the fuel supply
from burning gases. Graph from the Report of the
Presidential Commission on the Space Shuttle
Challenger Accident, 1986. NASA staff had
analysed the data on the relation between
temperature and number of O-ring failures (out of
6), but they had excluded observations where no
O-rings failed, believing that they were
uninformative. These were mainly observations
showing no failures at warm temperatures (65-80
degF).
12
Better graph: curve fitting
Apart from the disastrous omission of the
observations with 0 failures, the graph can be
improved by
1. drawing a smoothed curve to fit the points
2. removing the background grid, which obscures
the data
This gives a graph which shows excessive risks
associated with both high and low temperatures.
13
Logistic regression model
14
Challenger disaster

Reanalysis of the O-ring data involved fitting a
logistic regression model. This provides a
predicted extrapolation (black curve) of the
probability of failure to the low (31 degF)
temperature at the time of the launch and
confidence bands on that extrapolation (red
curves). See also Tappin, L. (1994). "Analyzing
data relating to the Challenger disaster".
Mathematics Teacher, 87, 423-426
There's not much data at low temperatures (the
confidence band is quite wide), but the predicted
probability of failure is uncomfortably high.
Would you take a ride on Challenger when the
weather is cold?
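
The kind of fit described on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reanalysis itself: the `temps` and `failed` lists are invented placeholder values (not the Challenger records), the function names are hypothetical, the model is a two-parameter logistic curve fitted by Newton's method, and no confidence bands are computed.

```python
import math


def sigmoid(t):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(temps, failed, iterations=25):
    """Fit P(failure) = sigmoid(b0 + b1 * (temp - centre)) by Newton's method."""
    centre = sum(temps) / len(temps)      # centring improves conditioning
    xs = [t - centre for t in temps]
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Gradient (g) and negative Hessian (h) of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, failed):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system h * delta = g.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1, centre


def predict(model, temp):
    """Predicted failure probability at a given temperature."""
    b0, b1, centre = model
    return sigmoid(b0 + b1 * (temp - centre))


# Hypothetical observations: temperature (degF) and whether a failure occurred.
temps = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
failed = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]

model = fit_logistic(temps, failed)
p_cold = predict(model, 31)   # extrapolation below the observed range
p_warm = predict(model, 75)
```

Evaluating `predict` at a temperature outside the observed range, as with `p_cold`, is exactly the extrapolation the slide warns has a wide confidence band.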

15
Good examples
The French engineer, Charles Minard (1781-1870),
illustrated the disastrous result of Napoleon's
failed Russian campaign of 1812. The graph shows
the size of the army by the width of the band
across the map of the campaign on its outward and
return legs, with temperature on the retreat
shown on the line graph at the bottom. Many
consider Minard's original the best statistical
graphic ever drawn.
16
Florence Nightingale's Coxcomb diagrams
17
Escaping the 2D
18
Definitions: statistical variables

Samples and populations consist of individuals.
Values of certain attributes are called
observations (e.g. age, income).
Attributes vary across individuals, and they
are called variables.
Variables are described by distributions and
their parameters (e.g. Normal, Poisson).
A random variable X assumes its value according
to the outcome of a chance experiment (coin,
dice).

19
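Definitions: Variance

Variance is the sum of squared deviations from
the mean, divided by n - 1 (sample) or by n
(population), where n is the number of
observations.

Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Population variance: $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2$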
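
The two definitions differ only in the divisor, which a short sketch makes concrete. The function names and the `ages` values below are hypothetical, chosen only for illustration; Python's standard `statistics` module provides the same quantities as `variance` and `pvariance`.

```python
from statistics import mean, pvariance, variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / len(values)


# Hypothetical observations of one variable (e.g. age) for five individuals.
ages = [23, 35, 41, 29, 52]

s2 = sample_variance(ages)
sigma2 = population_variance(ages)

# The standard-library functions should agree with the hand-written ones.
assert abs(s2 - variance(ages)) < 1e-9
assert abs(sigma2 - pvariance(ages)) < 1e-9
```

The sample version is always the larger of the two, and the gap shrinks as n grows.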
20
Autocorrelation

Spatial autocorrelation is a measure of the
similarity of objects within an area.
Jay Lee and Louis K. Marion, 2001

21
Moran's Index

The formula to compute Moran's index is the
following

where n is the number of individual points,
A is the area of the bounding polygon, i.e. the
total area of the map including all points,
z_i is the value of the parameter (attribute)
measured for point i.
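
For reference, the form of Moran's index most commonly given in textbooks is written out here. It is an assumption that it corresponds to the formula on this slide: the symbols $w_{ij}$ (spatial weight between points i and j) and $\bar{z}$ (mean of the attribute values) follow the usual convention, and this form normalises by the sum of the weights rather than by an area term.

```latex
I = \frac{n}{\sum_{i}\sum_{j \neq i} w_{ij}}
    \cdot
    \frac{\sum_{i}\sum_{j \neq i} w_{ij}\,(z_i - \bar{z})(z_j - \bar{z})}
         {\sum_{i} (z_i - \bar{z})^2},
\qquad
\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i
```

Values well above the expectation under no autocorrelation, $-1/(n-1)$, suggest that nearby points carry similar attribute values; values well below it suggest that neighbours tend to differ.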

22
Features

w_ij is computed according to the following rule,
where min(d_ij) is the smallest of all distances
between all pairs of points computed.
In this formula, the distance d_ij is computed
according to the formulas for the Euclidean,
supremum or Manhattan metrics. Since d_ii is equal
to 0, w_ii would become infinite, thus cases
where i = j should be excluded. This results in
n^2 - n pairs of points.
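
A minimal sketch of the computation over all n^2 - n ordered pairs follows. Two things are assumptions rather than the slide's own definitions: the weight rule is taken to be w_ij = min(d_ij) / d_ij (an inverse-distance weight scaled by the smallest distance), and the index uses the common textbook normalisation by the sum of the weights. The function names are hypothetical, and all points are assumed to be distinct so that no distance is zero.

```python
import math


def distance(p, q, metric="euclidean"):
    """Distance between two 2D points under one of the three metrics."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    if metric == "euclidean":
        return math.hypot(dx, dy)
    if metric == "manhattan":
        return dx + dy
    if metric == "supremum":
        return max(dx, dy)
    raise ValueError(f"unknown metric: {metric}")


def morans_i(points, z, metric="euclidean"):
    """Moran's index with assumed weights w_ij = min(d) / d_ij."""
    n = len(points)
    # All ordered pairs with i != j: n^2 - n of them.
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    d = {(i, j): distance(points[i], points[j], metric) for i, j in pairs}
    d_min = min(d.values())
    w = {pair: d_min / dist for pair, dist in d.items()}

    z_bar = sum(z) / n
    dev = [value - z_bar for value in z]
    cross = sum(w[i, j] * dev[i] * dev[j] for i, j in pairs)
    total = sum(x * x for x in dev)
    return (n / sum(w.values())) * (cross / total)


# Hypothetical example: four points on a line with two attribute layouts.
line = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i(line, [1, 1, 5, 5])      # similar values are neighbours
alternating = morans_i(line, [1, 5, 1, 5])    # neighbours always differ
```

With so few points the absolute values carry little meaning; the useful comparison is that the clustered layout scores higher than the alternating one.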

23
Selecting pairs of points

The sum by all i,j means that ALL ORDERED PAIRS
of points (i.e. order of consideration of pair ij
is important) should be considered by the
formula.
Sometimes, only pairs of sample points within a
specific distance from each other are considered.

24
Application to pattern analysis

Example: autocorrelation on a grid.
Sample points are combined in one cell. The size
and location of the cell define the
autocorrelation parameters.
Consider all pairs of GRID CELLS, where XC and YC
now denote the coordinates of the center of each
grid cell and the attribute z for each grid cell
is the sum of the combined attributes of all
points that belong to this cell.
Result: insight into pattern analysis and
correlation can be obtained.
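
The aggregation step described above can be sketched as follows. This is an illustration under assumptions: a square grid with a hypothetical `cell_size` and `origin`, points given as (x, y, z) tuples, and only non-empty cells returned. The function name and sample values are invented for the example.

```python
import math
from collections import defaultdict


def aggregate_to_grid(points, cell_size, origin=(0.0, 0.0)):
    """Combine (x, y, z) points into square cells.

    Returns {(col, row): (xc, yc, z_sum)}, where (xc, yc) is the cell
    center and z_sum is the sum of the attributes of the points in it.
    """
    sums = defaultdict(float)
    for x, y, z in points:
        col = math.floor((x - origin[0]) / cell_size)
        row = math.floor((y - origin[1]) / cell_size)
        sums[(col, row)] += z

    cells = {}
    for (col, row), z_sum in sums.items():
        xc = origin[0] + (col + 0.5) * cell_size
        yc = origin[1] + (row + 0.5) * cell_size
        cells[(col, row)] = (xc, yc, z_sum)
    return cells


# Hypothetical sample points: two fall in one cell, one in its neighbour.
samples = [(0.2, 0.3, 1.0), (0.8, 0.9, 2.0), (1.5, 0.5, 4.0)]
cells = aggregate_to_grid(samples, cell_size=1.0)
```

The resulting cell centers and summed attributes can then be used in place of the original points when computing autocorrelation, so changing `cell_size` or `origin` changes the result.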

25
Case study 1: Pattern Analysis

Analysis of instances of patients undergoing
cardiac catheterization, and location of those
instances, i.e. city blocks.
Primary question: is the spatial variation of
heart disease a random or non-random pattern?
Secondary question: relationship between disease
occurrence and social and demographic factors
(Spatial Regression).

26
Set up

Analysis results are affected by grid size
prone to subjective choices
constrained by spatial resolution of data
Solving the problem by
using a non-arbitrary grid(s)
implementing a guided selection of the
square unit area or grid size

27
City blocks in Calgary
28
Methodology

Definition of a city-block grid based on the
main division in the city, i.e. using the squared
grid centered on the intersection between Center
Street and Center Avenue as the main axes of the
geometric plan thus created.
Grid regularity decreases as distance increases
from its center.
L_p norms provide the flexibility to adjust grid
size and shape accordingly.
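
As a small illustration of how the choice of p changes distances, here is a sketch of the L_p (Minkowski) distance. The function name `lp_distance` and the parameter name `order` are hypothetical; p = 1 gives the Manhattan metric, p = 2 the Euclidean metric, and p = infinity the supremum metric.

```python
import math


def lp_distance(p, q, order=2.0):
    """L_p (Minkowski) distance between two points of equal dimension."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if math.isinf(order):
        return max(diffs)               # supremum metric
    if order < 1:
        raise ValueError("order must be >= 1 for a valid norm")
    return sum(d ** order for d in diffs) ** (1.0 / order)


# The same pair of points is a different distance apart under each norm.
a, b = (0.0, 0.0), (3.0, 4.0)
manhattan = lp_distance(a, b, order=1)
euclidean = lp_distance(a, b, order=2)
supremum = lp_distance(a, b, order=math.inf)
```

Because the distance between the same two points shrinks as p grows, the set of points within a fixed radius of a center changes shape, from a diamond (p = 1) through a circle (p = 2) to a square (p = infinity).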

29
Methodology

Application of varying L_p norms
Varying spatial weights for spatial
autocorrelation
Autocorrelation analysis at varying scales
(CDA, community)
Data: 2001/1996 census

30
Experiments
31
Observations

Sensitivity of Spatial Autocorrelation to
L_p norm
spatial weight
Proposed method useful in determining
best distance
best spatial weight
In context of multivariate spatial regression
best ⇔ lowest variance

32
Results

The Calgary Journal, Regional publication,
Researchers link heart disease to urban
lifestyles on SPARCS activity profile, Oct. 26 -
Nov. 8, 2005
High risk of heart attack: male, high education,
married

33
Case study 2: Oil spill discharge
34
Summary statistics
                  Cells         Min.  Max.  Mean   St. dev.  Sum     Skew  Kurt.
Oil spill counts  44 (2,741)    0     3     0.02   0.162     53      9.85  113.6
Flight counts     2151 (2,741)  0     309   13.75  27.12     37,681  4.21  25.6
The mean and the standard deviation provide
information about the centre and the statistical
dispersion of the data, while the skewness
(asymmetry) and the kurtosis ("bulging" in Greek)
indicate highly skewed distributions, i.e. a lack
of normality in the data.
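
Summary statistics of this kind can be computed as in the sketch below. The estimators are an assumption, since the slide does not say which ones were used: this version uses population (divide by n) moments, and reports kurtosis in its non-excess form, for which a normal distribution gives 3. The function name and the `counts` values are hypothetical.

```python
import math


def summarize(values):
    """Min, max, mean, standard deviation, sum, skewness and kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4 (population form, divide by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "st_dev": math.sqrt(m2),
        "sum": sum(values),
        "skew": m3 / m2 ** 1.5,     # 0 for a symmetric distribution
        "kurt": m4 / m2 ** 2,       # 3 for a normal distribution
    }


# Hypothetical per-cell counts: mostly zeros with a few positive values,
# the shape typical of rare-event count data.
counts = [0, 0, 0, 0, 0, 0, 0, 0, 1, 3]
stats = summarize(counts)
symmetric = summarize([1, 2, 3, 4, 5])
```

A strongly positive skewness, as for `counts`, reflects a distribution dominated by zeros with a long right tail, which is the pattern the table reports for both variables.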
35
Data clustering
36
Statistical analysis

Our exploratory analyses indicate that there is a
positive spatial autocorrelation within datasets
for all variables.
An initial overview of the statistical
distribution and normality of each of the
variables selected for this study indicated
absence of normality in the data.

Exploratory Spatial Analysis of Illegal Oil
Discharges Detected off Canada's Pacific
Coast. Norma Serra-Sogas, Patrick O'Hara,
Rosaline Canessa, Stefania Bertazzon and Marina
Gavrilova
37
Lecture summary

Proper statistical analysis is important
Variance and autocorrelation are two important
vehicles for data analysis
Combining these measures with various metrics,
hierarchical structures, grids, attributes and
also data filtering/visualization methods is a
direction of current research.
