Title: Going Beyond GIS


1
Going Beyond GIS for Environmental Health
Frank C. Curriero
fcurrier_at_jhsph.edu
Environmental Health Sciences and Biostatistics
Bloomberg School of Public Health
EnviroHealth Connections Summer Institute 2006
2
Bio
  • Joint appt. in Env Health Sci and Biostatistics
  • PhD in Statistics
  • Research agenda is spatial statistics

Statistics
Geography (GIS)
Env Health
Spatial Statistics
3
Objectives
  • Provide exposure to the field of spatial
    statistics.
  • Keep it simple (non-technical)
  • Applications of GIS in Environmental
    Health
  • Beyond GIS, maps make you think/question
  • Current research topics
  • Geography (location) is a source of variation
    worth considering in environmental health
    investigations.

4
What is Spatial Statistics?
Statistics for the analysis of spatial data
(spatial, i.e. geographic)

What is Spatial Data?
The "where", in addition to the "what" that was
observed or measured, is important and is recorded
with the data. Location information (the
"where") can vary.
What is GIS?
Stands for Geographic Information System. Anything
more depends on who you ask!
5
What is a GIS?
One-word definition: Database
Two-word definition: Visual Database
Visual database for geographic data
  • Stores
  • Manipulates
  • Analyzes
  • Queries
  • Creates
  • Displays

. . . .
MAPS
Layer cake of information
6
What else
  • A computer system (piece of software) with a
    tremendous amount of capability for storing,
    querying, combining, presenting, . . . , spatial data.
  • GIS is designed specifically for spatial data and
    hence built to handle all of its complicated features.
  • GIS is a generic name, like "word processor".
    ArcGIS, MapInfo, and Idrisi are examples of
    different GIS.
  • The earth does not have to be the backdrop for every
    GIS application, but it is certainly the most common.
7
What else (cont.)
  • Public health was not the first, and will probably
    not be the last, application of GIS and spatial
    statistics.
  • GIS as a mechanism for generating hypotheses
    (exploratory spatial data analysis).
  • GIS is a tool, a very powerful and valuable tool
    when working with spatial data.
8
Applications in Spatial Statistics and GIS
  • Waterborne disease outbreaks
  • DDE soil contamination
  • Lyme Disease
  • Prostate cancer mapping
  • Chesapeake Bay water quality assessment

9
US Waterborne Disease Outbreaks, 1948-1994
Outbreak Data
Location          Longitude   Latitude   Month   Year
AL, Anniston         -85.83      33.65   Oct     1953
AL, Center Pt.       -86.68      33.63   Nov     1958
WY, Cody            -109.06      44.53   July    1986
. . .
10
US Waterborne Disease Outbreaks, 1948-1994
Substantive Questions
Do outbreaks occur at random across the US?
Are outbreaks preceded by extreme precipitation
events?
Does the risk of an outbreak vary spatially, and is
it related to watershed vulnerability?
11
Objective: Association between extreme precip. and
outbreaks
Methods: Overlay map of outbreaks and extreme precip
events
  • 2,105 watersheds (USGS)
  • 16,000 weather stations (NCDC)
  • define extreme precipitation
  • aggregate precip and outbreak to watershed
Results: 51% of outbreaks were coincident with
extreme levels of precip within a 2 month lag
preceding the outbreak month.
Conclusion: Is this evidence of an association?
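The coincidence check described in the Methods and Results above can be sketched roughly as follows. This is only an illustration: the slide does not say how "extreme" was defined, so the 90th-percentile cutoff, the choice to count the outbreak month itself plus the two preceding months, and all of the numbers are assumptions made up for the example.

```python
def percentile_threshold(values, q):
    """Return the q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def extreme_months(monthly_precip, q=90.0):
    """Indices of months whose precipitation exceeds the q-th percentile.

    The 90th percentile default is an assumed definition of "extreme".
    """
    cutoff = percentile_threshold(monthly_precip, q)
    return {i for i, p in enumerate(monthly_precip) if p > cutoff}


def coincident(outbreak_month, extremes, max_lag=2):
    """True if an extreme month falls in the outbreak month or up to
    max_lag months before it (assumed reading of "2 month lag")."""
    return any((outbreak_month - lag) in extremes for lag in range(max_lag + 1))


# Made-up monthly precipitation for one hypothetical watershed.
precip = [2, 3, 2, 9, 3, 2, 2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 8, 2, 3, 2]
# Made-up outbreak months, given as indices into the series above.
outbreaks = [5, 10, 17]

extremes = extreme_months(precip)
hits = [coincident(m, extremes) for m in outbreaks]
share = sum(hits) / len(hits)  # fraction of outbreaks preceded by an extreme month
```

A share computed this way is descriptive only. Whether it is evidence of an association depends on how often such coincidences would arise by chance, which is the question the Conclusion line raises.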
12
16,000 Weather Stations Reporting Monthly
Precipitation
13
2105 US Watersheds
14
US Waterborne Disease Outbreaks, 1948-1994
Results: 51% of outbreaks were coincident with
extreme levels of precip within a 2 month lag
preceding the outbreak month.
Conclusion: Is this evidence of an association?
15
US Waterborne Disease Outbreaks, 1948-1994
  • Map generation included many involved GIS tasks
    on numerous data sources, GIS Spatial Analysis.
  • Statistically speaking, though, it represents risk
    factor data.
  • Spatial statistics often considers the map as a
    starting point, which in GIS is often an
    endpoint.

16
Western Maryland Superfund Site
DDE Soil Sample Data
Sample   Easting   Northing   DDE (ppm)
1        1108420   725173     160
2        1108300   725378     4
110      1108490   725038     92
. . .
17
Substantive Questions
Does the site exceed regulated levels of DDE
contamination, and is it in need of remediation?
What is the level of DDE in my backyard?
18
(No Transcript)
19
(No Transcript)
20
Kriged DDE Predictions
Kriging: Spatial prediction at unsampled
locations based on data from sampled locations.
Environmental health applications of kriging:
exposure maps
21
Baltimore County Lyme Disease 1989-1990
Legend: Lyme Case, Lyme Control
Lyme Disease Cases and Controls
        Cases                  Controls
Longitude   Latitude    Longitude   Latitude
-76.4047    39.3421     -76.4054    39.3419
-76.3433    39.3736     -76.3522    39.3718
-76.7592    39.3265     -76.7665    39.3119
. . .
22
Baltimore County Lyme Disease 1989-1990
Legend: Lyme Case, Lyme Control
Substantive Questions
Do cases of Lyme Disease tend to cluster,
generally or as localized hot spots?
Does risk of Lyme Disease vary spatially over Balt.
County?
Identify and quantify environmental risk
factors associated with Lyme Disease.
23
Baltimore County Lyme Disease Risk 1989-1990
Spatial Case/Control Analysis
  • Spatial density estimate of cases divided by
    spatial density estimate of controls
    (nonparametric kernel approach).
  • Logistic regression approach to include
    covariates.
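A minimal sketch of the kernel density ratio idea, under stated assumptions: a Gaussian kernel with a single fixed bandwidth, planar coordinates (real longitude/latitude would normally be projected first), and made-up point locations. The function names are illustrative and not taken from the presentation.

```python
import math


def kernel_density(point, events, bandwidth):
    """Gaussian kernel density estimate at `point` from (x, y) events."""
    x0, y0 = point
    total = 0.0
    for x, y in events:
        d2 = (x - x0) ** 2 + (y - y0) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    # Normalize so the estimated density integrates to 1 over the plane.
    return total / (len(events) * 2.0 * math.pi * bandwidth ** 2)


def density_ratio(point, cases, controls, bandwidth):
    """Case density divided by control density at `point`.

    Values above 1 suggest relatively more cases than controls nearby.
    """
    return (kernel_density(point, cases, bandwidth)
            / kernel_density(point, controls, bandwidth))


# Made-up planar coordinates: cases bunch near the origin,
# controls are spread more evenly.
cases = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (2.0, 2.0)]
controls = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (1.0, 0.0), (2.0, 1.0)]

ratio_near_cluster = density_ratio((0.0, 0.0), cases, controls, bandwidth=0.5)
ratio_far_corner = density_ratio((2.0, 2.0), cases, controls, bandwidth=0.5)
```

The bandwidth controls how smooth the resulting risk surface is, and the ratio becomes unstable wherever the control density is close to zero, so both deserve care in a real analysis.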

24
Statistical Methods Exist to Address
  • Do cases (events) show a tendency to cluster?
  • Identifying clusters or hot spots.
  • Does risk of disease (or outcome of interest)
    vary spatially?
  • Is disease risk elevated near a particular point
    source?
  • Spatial prediction of outcomes at unobserved
    locations.
  • Risk factor estimation in the presence of
    residual spatial variation.

25
Types of Spatial Data
1. Geostatistical Data
Basic structure is data tagged with locations.
Locations can essentially exist anywhere.
Referred to as continuous spatial variation.
Example: MD Superfund Site DDE
26
2. Point Pattern Data
Locations are the data, denoting occurrence of
events.
Common to aggregate to area-level data.
Example: Baltimore County Lyme Disease Cases,
Baltimore County Lyme Disease Controls
3. Area-level Data
Data summarized to an area unit.
Rarely arises naturally.
Often an aggregate form of point pattern data.
Referred to as discrete spatial variation.
Example: Maryland prostate cancer by zip code
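As a small illustration of how point pattern data become area-level data, the sketch below counts events per area unit. The square grid cells are a hypothetical stand-in for zip codes or census tracts, and the event coordinates are made up.

```python
from collections import Counter


def grid_cell(x, y, cell_size=1.0):
    """Hypothetical area lookup: the square grid cell containing (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def aggregate_to_areas(events, area_of):
    """Count events per area; `area_of` maps a location to an area label."""
    return Counter(area_of(x, y) for x, y in events)


# Made-up event locations (point pattern data).
events = [(0.2, 0.3), (0.8, 0.1), (1.5, 0.4), (1.1, 1.9), (0.5, 0.5)]

# Area-level data: one count per area unit.
counts = aggregate_to_areas(events, grid_cell)
```

Once aggregated, the exact locations are gone: only the counts per area remain, which is why the choice of area unit matters for later analysis.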
27
Why Collect Locations as Part of Data?
  • Sometimes locations are the only data (as in
    point patterns).
  • Risk (or outcome of interest) may vary
    spatially.
  • Location can serve as an information gateway to
    other linked data sources: environmental,
    demographic, social, etc.
  • Data are spatially dependent and locations are
    used in statistical methods that account for this
    dependence.
  • In general, things can vary spatially and
    geography (location) may be a source of variation
    worth considering.

28
Temporal Dependence
  • Time series or longitudinal data.
  • Past/present direction inherent in temporal data.

Spatial Dependence
  • Dimensions > 1 and loss of directional
    component.
  • Observations closer together in space are more
    similar than observations further away
    (clustering).
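One common way to put a number on "closer observations are more similar" is Moran's I, a standard spatial autocorrelation statistic. It is not named on the slide, so it is brought in here only as an illustration. The sketch assumes binary neighbor weights (1 if two sites are within `max_dist`, else 0) and uses made-up values along a line.

```python
import math


def morans_i(points, values, max_dist):
    """Moran's I with binary weights: sites within max_dist are neighbors."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0       # sum of products of deviations over neighbor pairs
    weight_sum = 0.0  # total number of (ordered) neighbor pairs
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= max_dist:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    return (n / weight_sum) * (cross / sum(d * d for d in dev))


# Six made-up sites along a line, one unit apart.
points = [(float(x), 0.0) for x in range(6)]

clustered = [1, 1, 1, 5, 5, 5]    # similar values sit next to each other
alternating = [1, 5, 1, 5, 1, 5]  # neighbors are always dissimilar

i_clustered = morans_i(points, clustered, max_dist=1.0)
i_alternating = morans_i(points, alternating, max_dist=1.0)
```

Positive values indicate that neighbors tend to be alike (spatial clustering), while negative values indicate that neighbors tend to differ.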

in space
on the earth
29
Spatial Dependence (clustering) in Environmental
Health Data
Could be due to
  • A contagious agent of the outcome under
    investigation.
  • The spatial variation in the population at risk.
  • An underlying shared environmental
    characteristic, measured or unmeasured, that also
    varies spatially (Shared Environment Effect).

30
What GIS is Not
  • A complete system for statistical or scientific
    inference.
  • Maps, the most basic and fundamental concept in
    GIS, are not statistical inference.
  • A GIS map of
      • one variable is analogous to a histogram
        display
      • two variables overlaid is analogous to an
        x-y scatterplot or 2x2 table.
  • In statistics we go beyond histograms and
    scatterplots.

31
An Important Distinction
In the GIS literature, "analysis" or "spatial
analysis" often means spatial data manipulation,
which is something different from statistical
analysis.
32
Two Current Research Problems in Spatial
Statistics and GIS
  • Non-geocoded Data
  • Non-Euclidean Distance
33
Geographic Analysis of Prostate Cancer in Maryland
PI: Ann Klassen (HPM Oncology)
Collaborators: Margaret Ensminger, Chyvette
Williams, JeanHeeHong (HPM), Frank Curriero
(Biostat), Anthony Alberg (Epi), Martin Kulldorff
(Harvard), Helen Meissner (NCI)

Cooperative Agreement from Association of Schools
of Public Health and Centers for Disease Control
Data Agreement with the Maryland Cancer Registry
One of six CDC projects investigating geography
and prostate cancer, including NY, CT/MA, NJ,
Kansas/Iowa, and Louisiana.
34
Prostate Cancer Reported to MD Cancer Registry
1992-1997
Proportion of an Outcome of Interest

Legend
No Data
0 - 12
13 - 30
31 - 67
68 - 100

All geocoded cases
Outcomes of Interest Include
  • Incidence
  • Stage at diagnosis
  • Tumor grade at diagnosis
  • Failure to stage or grade
  • Treatment and mortality

35

Proportion of an Outcome of Interest
Legend
No Data
0 - 12
13 - 30
31 - 67
68 - 100
All geocoded cases
36
What is Geocoding?
GIS process of translating mailing address
information to coordinates on a map, such as
longitude and latitude
16 Goucher Woods Ct, Towson, MD 21286 -> (-76.5883, 39.4005)
Nongeocoded Data
Mailing addresses that could not be geocoded
8123 Rose Haven Road, Rosedale, MD 21237 -> Nongeocoded
37
Reasons for Nongeocodes
  • Address error
  • PO Box
  • Rural routes
  • Base maps out of date
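To make the geocoded versus nongeocoded distinction concrete, here is a toy sketch. The one-entry lookup table is a hypothetical stand-in for a GIS base map, the pattern checks are rough guesses at how PO Box and rural route addresses might be flagged, and the PO Box and rural route addresses are invented. A real geocoder does far more, such as interpolating along street segments.

```python
import re

# Hypothetical reference table standing in for a GIS base map.
BASE_MAP = {
    "16 GOUCHER WOODS CT, TOWSON, MD 21286": (-76.5883, 39.4005),
}


def normalize(address):
    """Upper-case the address and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", address.strip().upper())


def geocode(address):
    """Return (longitude, latitude), or None if the address is not matched."""
    return BASE_MAP.get(normalize(address))


def likely_reason(address):
    """Rough guess at why an address failed to geocode."""
    text = normalize(address)
    if re.search(r"\bP\.?\s?O\.?\s?BOX\b", text):
        return "PO Box"
    if re.search(r"\b(RR|RURAL ROUTE)\b", text):
        return "Rural route"
    return "Address error or base map out of date"


matched = geocode("16 Goucher Woods Ct,  Towson, MD 21286")
unmatched = geocode("8123 Rose Haven Road, Rosedale, MD 21237")
reason_po = likely_reason("PO Box 12, Rosedale, MD 21237")    # invented address
reason_rr = likely_reason("RR 2 Box 5, Rosedale, MD 21237")   # invented address
```

An unmatched address still carries its zip code and other attributes, which is what makes the later imputation ideas possible.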
38
Proportion of Outcome of Interest
Geocoded Cases (15,585)
Legend
No Data
0 - 12
13 - 30
31 - 67
68 - 100
All Cases (17,091)
39
Statistical Issues
(1) Common to just ignore nongeocodes
  • What's the consequence?
  • Historically not well documented in publications
(2) Level of aggregation for analyses?
  • Zip code level
  • Census tract, county, etc.
40
Statistical Issues (cont.)
(3) Nongeocodes represent missing data and are
most likely not missing at random
MD Prostate Cancer: Proportion of Nongeocodes
Nongeocoded
0 - 9
10 - 25
26 - 47
48 - 75
76 - 100
41
Statistical Issues (cont.)
(4) Nongeocodes carry plenty of information
Known Information (fictitious example)
Age: 72
Race: White
Year of Diagnosis: 1991
Stage at Diagnosis: Late
Tumor Grade: Aggressive
Zip Code: 21237
42
Statistical Solutions
(a) Impute a location for nongeocodes
  • Determine the age-race distribution within known
    zip codes
  • Weighted random selection based on known age and
    race
  • Sampling with and without replacement
  • Multiple imputation to assess bias
(Joint work with Ann Klassen, HPM)
(b) Develop statistical models for outcomes at
different levels of aggregation
  • Spatial variation in risk model for geocoded
    household level data and nongeocoded zip
    code level data
(Joint work with Peter Diggle, Biost)
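Solution (a) might look something like the sketch below. It rests on several assumptions: the candidate sub-areas within the zip code, their centroids, the age-race population counts, and the "70-79" age grouping are all invented, and the draw is made with replacement. It is not the authors' actual procedure, only one reading of "weighted random selection based on known age and race".

```python
import random


def impute_location(candidates, age_group, race, rng):
    """Pick one candidate area, weighted by its population in the
    case's age-race stratum (sampling with replacement)."""
    weights = [c["population"].get((age_group, race), 0) for c in candidates]
    if sum(weights) == 0:
        raise ValueError("no population in this age-race stratum")
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical sub-areas of zip code 21237 with made-up centroids and counts.
candidates = [
    {"id": "A", "centroid": (-76.50, 39.33), "population": {("70-79", "White"): 300}},
    {"id": "B", "centroid": (-76.49, 39.34), "population": {("70-79", "White"): 100}},
    {"id": "C", "centroid": (-76.51, 39.35), "population": {("70-79", "White"): 0}},
]

# Multiple imputation: repeat the draw many times for the same case and
# see how much downstream results change across the imputed datasets.
rng = random.Random(0)
draws = [impute_location(candidates, "70-79", "White", rng)["id"]
         for _ in range(1000)]
```

Areas with more residents matching the case's age and race are chosen more often, and areas with none are never chosen.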
43
Chesapeake Bay Water Quality Assessment
Data: Temperature, Turbidity, Dissolved Oxygen,
Chlorophyll a
Needed: Assessments at unsampled locations
44
Kriging
A spatial regression method that provides
optimal prediction at unsampled
locations. Kriged predictions are weighted
averages of sampled data, with higher weights given
to data closer to the prediction site. Proximity is
measured by the straight-line Euclidean distance
(as the crow flies).
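The weighted-average idea can be sketched with ordinary kriging, one common variant. The exponential covariance model, its sill and range of 1, and the sample values are assumptions chosen for illustration. In practice the covariance model is estimated from the data.

```python
import math


def covariance(h, sill=1.0, corr_range=1.0):
    """Exponential covariance as a function of distance h (assumed model)."""
    return sill * math.exp(-h / corr_range)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kriging_weights(sites, target):
    """Ordinary kriging weights; the extra row and column force the
    weights to sum to 1."""
    n = len(sites)
    a = [[covariance(math.dist(sites[i], sites[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [covariance(math.dist(s, target)) for s in sites] + [1.0]
    return solve(a, b)[:n]  # drop the Lagrange multiplier


def krige(sites, values, target):
    """Kriged prediction: a weighted average of the sampled values."""
    weights = kriging_weights(sites, target)
    return sum(w * z for w, z in zip(weights, values))


# Made-up sampled locations and measurements.
sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0)]
values = [10.0, 12.0, 14.0, 30.0]

weights = kriging_weights(sites, (0.2, 0.1))
prediction = krige(sites, values, (0.2, 0.1))
```

Distance enters only through `math.dist`, the straight-line Euclidean distance, so the nearest sampled site receives the largest weight.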

45
Chesapeake Bay Fixed Station Data
Euclidean distance may not be appropriate.
Propose a water metric.
Currently kriging only works for Euclidean distance.
New methods needed.
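To see why a water metric can differ from Euclidean distance, the sketch below compares the two on a tiny made-up grid. The grid, the "~" (water) and "#" (land) symbols, and the four-direction movement rule are all illustrative assumptions, with a breadth-first search finding the shortest path that stays in the water.

```python
import math
from collections import deque


def water_distance(grid, start, goal):
    """Shortest path length, in cell steps, from start to goal moving
    only through water cells ("~"). Returns None if no path exists."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), steps = queue.popleft()
        if (r, c) == goal:
            return steps
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "~" and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), steps + 1))
    return None


# Made-up map: a strip of land ("#") separates two monitoring
# stations that look close together on the map.
grid = [
    "~#~",
    "~#~",
    "~#~",
    "~~~",
]
station_a = (0, 0)
station_b = (0, 2)

by_water = water_distance(grid, station_a, station_b)  # around the land
as_crow_flies = math.dist(station_a, station_b)        # straight across it
```

Two stations separated by land can be close in a straight line but far apart by water. Simply swapping this distance into a kriging covariance model is not guaranteed to give a valid model, which is consistent with the slide's call for new methods.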
46
Closing Remarks
  • GIS for spatial database management and
    hypothesis generation (posing the questions)
  • Spatial Statistics for inferential methods
    (answering the questions)
  • Why consider location?
      • Scientific inference may depend on it
      • Gateway to environmental data
      • Source of variation worth considering
  • Biography and Geography of Public Health