Title: Erin Peterson
1Predicting Water Quality Impaired Stream Segments
using Landscape-scale Data and a Regional
Geostatistical Model
- Erin Peterson
- Geosciences Department
- Colorado State University, Fort Collins, Colorado
2Space-Time Aquatic Resources Modeling and
Analysis Program
The work reported here was developed under STAR
Research Assistance Agreement CR-829095 awarded
by the U.S. Environmental Protection Agency (EPA)
to Colorado State University. This presentation
has not been formally reviewed by EPA. EPA does
not endorse any products or commercial services
mentioned in this presentation.
3Overview
- Introduction
- Background
- Patterns of spatial autocorrelation in stream
water chemistry
- Predicting water quality impaired stream segments
using landscape-scale data and a regional
geostatistical model: A case study in Maryland
4The Clean Water Act (CWA) 1972
- Section 303(d)
- Requires states and tribes to ID water quality
impaired stream segments
- Section 305(b)
- Create a biennial water quality inventory
- Characterizes regional water quality
- Based on attainment of designated-use standards
assigned to individual stream segments
5Probability-based Random Survey Designs
- Used to meet section 305(b) requirements
- Derive a regional estimate of stream condition
- Assign a weight based on stream order
- Provides representative sample of streams by
order
- Statistical inference about population of
streams, within stream order, over large area
- Reported in stream miles based on inference of
attainment
- Disadvantages
- Does not take watershed influence into account
- Does not ID spatial location of impaired stream
segments
- Fails to meet requirements of CWA Section 303(d)
6Purpose
Develop a geostatistical methodology based on
coarse-scale GIS data and field surveys that can
be used to predict water quality characteristics
about stream segments found throughout a large
geographic area (e.g., state)
8Geostatistical Modeling
- a.k.a. Kriging
- Interpolation method
- Allows spatial autocorrelation in error term
- More accurate predictions
- Fit an autocovariance function to data
- Describes relationship between observations based
on separation distance
- 3 Autocovariance Parameters
- Nugget: variation between sites as separation
distance approaches zero
- Sill: delineated where semivariance asymptotes
- Range: distance within which spatial
autocorrelation occurs
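The three parameters above can be illustrated with a small sketch of an exponential semivariogram. This is not taken from the slides: the function name, the argument names, and the `exp(-3h / range)` parameterization (so that the range is the distance at which roughly 95% of the sill is reached) are assumptions made for illustration.

```python
import numpy as np

def exponential_semivariogram(h, nugget, partial_sill, practical_range):
    """Semivariance at separation distance h under an exponential model.

    Assumed parameterization: sill = nugget + partial_sill, and the
    semivariance reaches about 95% of the sill at h = practical_range.
    """
    h = np.asarray(h, dtype=float)
    gamma = nugget + partial_sill * (1.0 - np.exp(-3.0 * h / practical_range))
    # The semivariance is zero at exactly h = 0; the nugget is the limit
    # as h approaches zero from above.
    return np.where(h > 0, gamma, 0.0)

# Example: semivariance at a few separation distances (km).
distances_km = np.array([0.0, 1.0, 10.0, 30.0, 100.0])
print(exponential_semivariogram(distances_km, nugget=0.1,
                                partial_sill=0.9, practical_range=30.0))
```

Plotting the returned values against distance shows the curve starting near the nugget, rising, and flattening at the sill.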
9Distance Measures Spatial Relationships
Distances and relationships are represented
differently depending on the distance measure
- Straight-line Distance (SLD)
- Geostatistical models typically based on SLD
10Distance Measures Spatial Relationships
Distances and relationships are represented
differently depending on the distance measure
- Symmetric Hydrologic Distance (SHD)
- Hydrologic connectivity: fish movement
11Distance Measures Spatial Relationships
Distances and relationships are represented
differently depending on the distance measure
- Asymmetric Hydrologic Distance (AHD)
- Longitudinal transport of material
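The difference between the two hydrologic distance measures can be sketched on a toy network. Everything here is hypothetical: the network, the reach lengths, and the simplification that survey sites sit exactly on network nodes. The FLoWS tools mentioned later in this presentation are not reproduced.

```python
# Toy stream network: each node maps to (downstream node, reach length in km).
# Sites A and B are on separate tributaries that join at C.
NETWORK = {"A": ("C", 2.0), "B": ("C", 3.0), "C": ("outlet", 4.0)}

def downstream_distances(site, network):
    """Distance from `site` to itself and to every node downstream of it."""
    distances = {site: 0.0}
    node, total = site, 0.0
    while node in network:
        next_node, length = network[node]
        total += length
        distances[next_node] = total
        node = next_node
    return distances

def symmetric_hydrologic_distance(a, b, network):
    """Shortest distance along the network, moving up or down stream."""
    da = downstream_distances(a, network)
    db = downstream_distances(b, network)
    return min(da[n] + db[n] for n in da if n in db)

def asymmetric_hydrologic_distance(upstream, downstream, network):
    """Distance in the direction of flow, or None if flow never connects them."""
    return downstream_distances(upstream, network).get(downstream)

print(symmetric_hydrologic_distance("A", "B", NETWORK))   # both sites related
print(asymmetric_hydrologic_distance("A", "C", NETWORK))  # A flows to C
print(asymmetric_hydrologic_distance("A", "B", NETWORK))  # not flow-connected
```

Sites A and B have a finite symmetric distance through the confluence, but no asymmetric distance because water does not flow from one to the other.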
12Distance Measures Spatial Relationships
Distances and relationships are represented
differently depending on the distance measure
- Challenge
- Spatial autocovariance models developed for SLD
may not be valid for hydrologic distances
- Covariance matrix is not guaranteed to be
positive definite
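One way to check whether a candidate covariance matrix is valid is to test its eigenvalues. The sketch below is a generic numerical check, not the procedure used in this work; the function name and tolerance are assumptions.

```python
import numpy as np

def is_positive_definite(cov, tol=1e-10):
    """Return True if `cov` is symmetric with all eigenvalues above `tol`."""
    cov = np.asarray(cov, dtype=float)
    if not np.allclose(cov, cov.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(cov).min() > tol)

# A valid covariance matrix, and a symmetric matrix that is not one.
print(is_positive_definite([[1.0, 0.5], [0.5, 1.0]]))
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))
```

A matrix that fails this check cannot be used for kriging, because it would imply a negative variance for some linear combination of the sites.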
13Asymmetric Autocovariance Models for Stream
Networks
- Weighted asymmetric hydrologic distance (WAHD)
- Developed by Jay Ver Hoef, National Marine Mammal
Laboratory, Seattle
- Moving average models
- Incorporate flow volume, flow direction, and use
hydrologic distance
- Positive definite covariance matrices
Ver Hoef, J.M., Peterson, E.E., and Theobald,
D.M. Spatial Statistical Models that Use Flow
and Stream Distance. Environmental and Ecological
Statistics. In press.
14Patterns of Spatial Autocorrelation in Stream
Water Chemistry
15Objectives
- Evaluate 8 chemical response variables
- pH measured in the lab (PHLAB)
- Conductivity measured in the lab (COND), µmho/cm
- Dissolved oxygen (DO), mg/l
- Dissolved organic carbon (DOC), mg/l
- Nitrate-nitrogen (NO3), mg/l
- Sulfate (SO4), mg/l
- Acid neutralizing capacity (ANC), µeq/l
- Temperature (TEMP), °C
- Determine which distance measure is most
appropriate
- SLD
- SHD
- WAHD
- More than one?
16Dataset
- Maryland Biological Stream Survey (MBSS) Data
- Maryland Department of Natural Resources
- 1995, 1996, 1997
- Stratified probability-based random survey design
- 881 sites in 17 interbasins
18Spatial Distribution of MBSS Data
19GIS Tools
Automated tools needed to extract data about
hydrologic relationships between survey sites did
not exist! Wrote Visual Basic for Applications
(VBA) programs to:
- Calculate watershed covariates for each stream
segment
- Functional Linkage of Watersheds and Streams
(FLoWS)
- Calculate separation distances between sites
- SLD, SHD, asymmetric hydrologic distance (AHD)
- Calculate the spatial weights for the WAHD
- Convert GIS data to a format compatible with
statistics software
- FLoWS tools will be available on the STARMAP
website
- http://nrel.colostate.edu/projects/starmap
20Spatial Weights for WAHD
- Proportional influence (PI): influence of each
neighboring survey site on a downstream survey
site
- Weighted by catchment area: surrogate for flow
volume
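A minimal sketch of how such weights might be computed is shown below. The slides state only that the PI is weighted by catchment area. Two steps are assumptions here and should be checked against Ver Hoef et al.: multiplying segment PIs along the flow path, and taking the square root to obtain the spatial weight. The segment names and areas are hypothetical.

```python
import math

def segment_pi(catchment_areas):
    """PI of each segment meeting at one confluence.

    catchment_areas: dict of segment name -> catchment area (any unit).
    Each segment's PI is its share of the total area, so PIs sum to 1.
    """
    total = sum(catchment_areas.values())
    return {seg: area / total for seg, area in catchment_areas.items()}

def site_weight(path_pis):
    """Weight between an upstream and a downstream site (assumed form).

    path_pis: PIs of the segments passed through on the way downstream.
    """
    return math.sqrt(math.prod(path_pis))

# Two tributaries with hypothetical catchment areas (km^2) join.
pis = segment_pi({"trib_1": 30.0, "trib_2": 10.0})
print(pis)
# A site on trib_1 whose water passes one more confluence with PI 0.5.
print(site_weight([pis["trib_1"], 0.5]))
```

The larger tributary receives the larger PI, which matches the idea of catchment area standing in for flow volume.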
23Data for Geostatistical Modeling
- Distance matrices
- SLD, SHD, AHD
- Spatial weights matrix
- Contains flow dependent weights for WAHD
- Watershed covariates
- Lumped watershed covariates
- Mean elevation, Urban
- Observations
- MBSS survey sites
24Geostatistical Modeling Methods
- Validation Set
- Unique for each chemical response variable
- 100 sites
- Initial Covariate Selection
- Reduce covariates to 5
- Model Development
- Restricted model space to all possible linear
models
- Model set = 32 models (2^5 models)
- One model set for each model type
- General linear model (GLM), SLD, SHD, and WAHD
models
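The size of the model set follows from counting every subset of the 5 covariates, including the intercept-only model. A short sketch of that enumeration, with placeholder covariate names:

```python
from itertools import combinations

def all_linear_models(covariates):
    """Every subset of the covariates, from intercept-only to the full model."""
    models = []
    for size in range(len(covariates) + 1):
        models.extend(combinations(covariates, size))
    return models

# Placeholder names; the actual covariates differ by response variable.
model_set = all_linear_models(["x1", "x2", "x3", "x4", "x5"])
print(len(model_set))  # 2**5 subsets
```

The same function applied to 10 covariates gives 2^10 = 1024 models.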
25Geostatistical Modeling Methods
- Geostatistical model parameter estimation
- Maximize the profile log-likelihood function
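A minimal sketch of the profiled quantity, under assumptions not stated on the slide: the covariance is written as sigma^2 times a matrix V built from the autocorrelation parameters, and the regression coefficients and sigma^2 are replaced by their closed-form estimates for a fixed V. The autocorrelation parameters would then be found by minimizing the returned value with a numerical optimizer. Function and variable names are illustrative.

```python
import numpy as np

def neg2_profile_loglik(y, X, V):
    """Minus twice the profile log-likelihood for a fixed matrix V.

    y: (n,) observations; X: (n, p) design matrix; V: (n, n) positive
    definite matrix so that the covariance is sigma^2 * V (assumed form).
    Returns (-2 * log-likelihood, beta_hat, sigma2_hat).
    """
    n = len(y)
    L = np.linalg.cholesky(V)          # V = L @ L.T
    Xw = np.linalg.solve(L, X)         # whitened design matrix
    yw = np.linalg.solve(L, y)         # whitened response
    # Ordinary least squares on whitened data is generalized least squares.
    beta_hat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta_hat
    sigma2_hat = float(resid @ resid) / n
    logdet_V = 2.0 * np.sum(np.log(np.diag(L)))
    value = n * np.log(2.0 * np.pi * sigma2_hat) + logdet_V + n
    return float(value), beta_hat, sigma2_hat
```

Profiling in this way reduces the search to the autocorrelation parameters alone, which is why it is a common choice when fitting such models.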
26Geostatistical Modeling Methods
Fit exponential autocorrelation function
- Model selection within model set
- GLM: Akaike Information Corrected Criterion
(AICC)
- Geostatistical models: Spatial AICC (Hoeting et
al., in press)
In the spatial AICC, n is the number of
observations, p-1 is the number of covariates, and
k is the number of autocorrelation parameters.
http://www.stat.colostate.edu/jah/papers/spavarsel.pdf
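The formula itself did not survive in this transcript. As far as it can be reconstructed from the symbol definitions above and the usual form of the corrected AIC, the spatial AICC is likely the following; this is an assumption and should be checked against Hoeting et al.

```latex
\mathrm{AICC} = -2 \log L(\hat{\theta}, \hat{\beta})
              + 2n \, \frac{p + k + 1}{n - p - k - 2}
```

Here L is the maximized likelihood, with the autocorrelation parameters and regression coefficients at their estimates. With k = 0 the expression reduces to the standard corrected AIC for a linear model with p regression coefficients.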
27Geostatistical Modeling Methods
- Model selection between model types
- 100 predictions: universal kriging algorithm
- Mean square prediction error (MSPE)
- Cannot use AICC to compare models based on
different distance measures
- Model comparison: r2 for observed vs. predicted
values
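Both comparison statistics are simple to compute from the validation set. In this sketch r2 is taken to be the squared Pearson correlation between observed and predicted values, which is an assumption about how it was calculated.

```python
import numpy as np

def mspe(observed, predicted):
    """Mean square prediction error over the validation sites."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((observed - predicted) ** 2))

def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    r = np.corrcoef(observed, predicted)[0, 1]
    return float(r ** 2)

# Hypothetical validation values.
obs = [2.1, 3.4, 1.8, 5.0, 4.2]
pred = [2.4, 3.0, 2.0, 4.5, 4.4]
print(mspe(obs, pred), r_squared(obs, pred))
```

Unlike AICC, neither statistic depends on the likelihood, so both can be compared across models built on different distance measures.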
28Results
- Summary statistics for distance measures
- Spatial neighborhood differs
- Affects number of neighboring sites
- Affects median, mean, and maximum separation
distance
29Results
Mean range values: SLD = 28.2 km, SHD = 88.03 km,
WAHD = 57.8 km
- Range of spatial autocorrelation differs
- Shortest for SLD
- TEMP shortest range values
- DO largest range values
30Results
- Distance Measures
- GLM always has less predictive ability
- More than one distance measure usually performed
well
- SLD, SHD, WAHD: PHLAB, DOC
- SLD and SHD: ANC, DO, NO3
- WAHD and SHD: COND, TEMP
- SLD: SO4
31Results
Predictive ability of models (r2)
- Strong: ANC, COND, DOC, NO3, PHLAB
- Weak: DO, TEMP, SO4
32Discussion
Distance measure influences how spatial
relationships are represented in a stream network
- Site's relative influence on other sites
- Dictates form and size of spatial neighborhood
- Important because
- Impacts accuracy of the geostatistical model
predictions
34Discussion
- Probability-based random survey design negatively
affected WAHD
- Maximize spatial independence of sites
- Does not represent spatial relationships in
networks
- Validation sites randomly selected
35Discussion
WAHD models explained more variability as the
number of neighboring sites increased
- Not when neighbors had
- Similar watershed conditions
- Significantly different chemical response values
36Discussion
- GLM predictions improved as number of neighbors
increased
- Clusters of sites in space have similar watershed
conditions
- Statistical regression pulled towards the cluster
- GLM contained hidden spatial information
- Explained additional variability in data with
more neighbors
37Predictive Ability of Geostatistical Models
[Figure; axis: r2]
38Conclusions
- Spatial autocorrelation exists in stream
chemistry data at a relatively coarse scale
- Geostatistical models improve the accuracy of
water chemistry predictions
- Patterns of spatial autocorrelation differ
between chemical response variables
- Ecological processes acting at different spatial
scales
- SLD is the most suitable distance measure at
regional scale at this time
- Unsuitable survey designs
- SHD: GIS processing time is prohibitive
39Conclusions
- Results are scale specific
- Spatial patterns change with survey scale
- Other patterns may emerge at shorter separation
distances
- Further research is needed at finer scales
- Watershed or small stream network
- Need new survey designs for stream networks
- Capture both coarse and fine scale variation
- Ensure that hydrologic neighborhoods are
represented
40Predicting Water Quality Impaired Stream Segments
using Landscape-scale Data and a Regional
Geostatistical Model: A Case Study in Maryland
41Objective
Demonstrate how a geostatistical methodology can
be used to meet the requirements of the Clean
Water Act
- Predict regional water quality conditions
- ID the spatial location of potentially impaired
stream segments
43Methods
Potential covariates
44Methods
Potential covariates after initial model
selection (10)
45Methods
- Fit geostatistical models
- Two distance measures: SLD and WAHD
- Restricted model space to all possible linear
models
- 1024 models per set (2^10 models)
- Parameter Estimation
- Maximized the profile log-likelihood function
46Methods
47Results
- SLD models performed better than WAHD
- Exception: Spherical model
- Best models
- SLD: Exponential, Mariah, and Rational Quadratic
models
- r2 for SLD model predictions
- Almost identical
- Further analysis restricted to SLD Mariah model
48Results
- Covariates for SLD Mariah model
- WATER, EMERGWET, WOODYWET, FELPERC, MINTEMP
- Positive relationship with DOC
- WATER, EMERGWET, WOODYWET, MINTEMP
- Negative relationship with DOC
- FELPERC
49Cross-validation intervals for Mariah model
regression coefficients
- Cross-validation interval: 95% of regression
coefficients produced by leave-one-out cross
validation procedure
- Narrow intervals
- Few extreme regression coefficient values
- Not produced by common sites
- Covariate values for the site are represented in
observed data
- Not clustered in space
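The cross-validation interval described above can be sketched as follows. To keep the example short, ordinary least squares stands in for the geostatistical model, which is a deliberate simplification and not the method used here. The interval is taken as the central 95% of the leave-one-out coefficient estimates, and the function name is illustrative.

```python
import numpy as np

def loo_coefficient_intervals(X, y, level=0.95):
    """Central `level` interval of each coefficient across leave-one-out fits.

    X: (n, p) design matrix including an intercept column; y: (n,) response.
    Returns (lower, upper), each an array of length p.
    """
    n, p = X.shape
    coefs = np.empty((n, p))
    for i in range(n):
        keep = np.arange(n) != i      # drop site i, refit on the rest
        coefs[i], *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    tail = (1.0 - level) / 2.0 * 100.0
    lower = np.percentile(coefs, tail, axis=0)
    upper = np.percentile(coefs, 100.0 - tail, axis=0)
    return lower, upper
```

A narrow interval means that no single site changes a coefficient much. Coefficient estimates that fall outside the interval point to influential sites.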
50r2 Observed vs. Predicted Values
n = 312 sites, r2 = 0.72
1 influential site, r2 without site = 0.66
51Model Fit
52Discussion
- SLD models more accurate than WAHD models
- Landscape-scale covariates were not restricted to
watershed boundaries
- Geology type
- Temperature
- Wetlands and water
53Discussion
- Regression Coefficients
- Narrow cross-validation intervals
- Spatial location of the sites not as important as
watershed characteristics
- Extreme regression coefficient values
- Not produced by common sites
- Not clustered in space
- Local-scale factor may have affected stream DOC
- Point source of organic waste
54Spatial Patterns in Model Fit
- North and east of Chesapeake Bay: large squared
prediction error (SPE) values
- Naturally acidic blackwater streams with elevated
DOC
- Not well represented in observed dataset
- 2 blackwater sites
- Geostatistical model unable to account for
natural variability
- Large squared prediction errors
- Large prediction variances
55Spatial Patterns in Model Fit
- West of Chesapeake Bay: low SPE values
- Due to statistical and spatial distribution of
observed data
- Regression equation fit to the mean in the data
- Most observed sites: low DOC values
- Less variation in western and central Maryland
- Neighboring sites tend to be similar
- Separation distances shorter in the west
- Short separation distances = stronger covariances
56Model Performance
Unable to account for abrupt differences in DOC
values between neighboring sites with similar
watershed conditions
- What caused abrupt differences?
- Point sources of organic pollution
- Not represented in the model
- Non-point sources of pollution
- Lumped watershed attributes are non-spatial
- Differences due to spatial location of landuse
are not represented
- Challenging to represent ecological processes
using coarse-scale lumped attributes
- e.g., flow path of water
57Generate Model Predictions
- Prediction sites
- Study area
- 1st, 2nd, and 3rd order non-tidal streams
- 3083 segments = 5973 stream km
- ID downstream node of each segment
- Create prediction site
- More than one site at each confluence
- Generate predictions and prediction variances
- SLD Mariah model
- Universal kriging algorithm
- Assigned predictions and prediction variances
back to stream segments in GIS
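The prediction step can be sketched with the standard universal kriging equations. This is a textbook form, not the code used for this study: the covariance matrix `Sigma` and the vector `c0` are assumed to come from an already fitted autocovariance model, and all names are illustrative.

```python
import numpy as np

def universal_kriging(y, X, Sigma, x0, c0, sigma0_sq):
    """Prediction and prediction variance at one unobserved site.

    y: (n,) observations; X: (n, p) covariates at observed sites;
    Sigma: (n, n) covariance among observed sites;
    x0: (p,) covariates at the prediction site;
    c0: (n,) covariance between the prediction site and observed sites;
    sigma0_sq: variance at the prediction site.
    """
    Sinv_X = np.linalg.solve(Sigma, X)
    Sinv_y = np.linalg.solve(Sigma, y)
    Sinv_c0 = np.linalg.solve(Sigma, c0)
    XtSX = X.T @ Sinv_X
    # Generalized least squares estimate of the regression coefficients.
    beta = np.linalg.solve(XtSX, X.T @ Sinv_y)
    # Trend plus a correction from the spatially weighted residuals.
    pred = x0 @ beta + Sinv_c0 @ (y - X @ beta)
    # Last term accounts for uncertainty in the estimated coefficients.
    d = x0 - X.T @ Sinv_c0
    var = sigma0_sq - c0 @ Sinv_c0 + d @ np.linalg.solve(XtSX, d)
    return float(pred), float(var)
```

Calling this once per prediction site yields the prediction and prediction variance that are then joined back to the stream segments.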
59Weak Model Fit
60Strong Model Fit
61Water Quality Attainment by Stream Kilometers
- Threshold values for DOC
- Set by Maryland Department of Natural Resources
- High DOC values may indicate biological or
ecological stress
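Summarizing attainment by stream kilometers amounts to classifying each segment's predicted DOC against the thresholds and summing segment lengths per class. In the sketch below the cut points and class labels are placeholders, not the values set by the Maryland Department of Natural Resources.

```python
from bisect import bisect_right

def km_by_class(predictions, lengths_km, cutpoints, labels):
    """Total stream km in each class.

    predictions: predicted DOC per segment (mg/l);
    lengths_km: segment lengths, in the same order;
    cutpoints: ascending thresholds; labels: len(cutpoints) + 1 names.
    A prediction equal to a cut point falls in the higher class.
    """
    totals = {label: 0.0 for label in labels}
    for doc, km in zip(predictions, lengths_km):
        totals[labels[bisect_right(cutpoints, doc)]] += km
    return totals

# Placeholder thresholds (mg/l) and hypothetical segments.
print(km_by_class([2.0, 6.0, 9.0], [1.5, 2.0, 0.5],
                  cutpoints=[5.0, 8.0], labels=["low", "medium", "high"]))
```

The same totals can be reported in stream miles by converting the segment lengths first.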
62Implications for Water Quality Monitoring
- Tradeoff between cost-efficiency and model
accuracy
- Western Maryland
- Can be described using a single geostatistical
model
- Eastern and northeastern Maryland
- Accept poor model fit
- Collect additional survey data for regional
geostatistical model
- Develop a separate geostatistical model for
eastern Maryland
63Implications for Water Quality Monitoring
- Apply this methodology to other regulated
constituents
- Technical and Regulatory Services Administration
within the MDE is modifying the NHD
- Include water quality standards and stream-use
designations by NHD segment
- Use water quality standards instead of thresholds
- Categorize predictions into potentially impaired
or unimpaired status
- Report on attainment in stream miles/kilometers
64Conclusions
- Geostatistical models generated more accurate DOC
predictions than previous non-spatial models
based on coarse-scale landscape data
- SLD is more appropriate than WAHD for regional
geostatistical modeling of DOC at this time
- Adds value to existing water quality monitoring
efforts
- Used to comply with the CWA more easily
- Additional field sampling is not necessary
- Inferences about regional stream condition can be
generated
- It can be used to identify the spatial location
of potentially impaired stream segments
65Conclusions
- Model predictions and prediction variances
- Allow additional field efforts to be concentrated
in
- Areas with large amounts of uncertainty
- Areas with a greater potential for water quality
impairment
- Model results can be displayed visually
- Allows professionals to communicate results to a
wide variety of audiences
66Thank You!
Advisors: Dave Theobald and Melinda Laituri
Committee Members: Will Clements and Brian Bledsoe
Collaborators: N. Scott Urquhart, Jay M. Ver Hoef,
and Andrew A. Merton
Team Theobald: Grant Wilcox, John Norman, Nate
Peterson, and Melissa Sherburne
Dennis Ojima and Keith Paustian
Family and friends
My husband Nate
67Questions?