Title: Predicting Median Substrate
1Predicting Median Substrate
- for Oregon and Washington EMAP sites
- Utilizing GIS data
Julia J. Smith December 12, 2005
2Why Predict Median Substrate?
- Indicator of overall stream health
- Bed load transport
- Stream Power
- Macroinvertebrate habitat
- Fish habitat
- How human development is affecting a stream
3What is LD50?
- LD50 is a measure of median substrate.
- Geometric mean of class boundaries
- Log10 of the geometric means
- Several samples at each site
- LD50 is the median value of log10(geometric mean of class)
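The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the class names and size boundaries in `SIZE_CLASSES_MM` are hypothetical placeholders (the actual EMAP class table is not reproduced in this transcript), and `ld50` is a name chosen here for convenience.

```python
import math
from statistics import median

# Hypothetical size classes (mm): name -> (lower bound, upper bound).
# These boundaries are placeholders, not the EMAP class table.
SIZE_CLASSES_MM = {
    "sand": (0.06, 2.0),
    "fine gravel": (2.0, 16.0),
    "coarse gravel": (16.0, 64.0),
    "cobble": (64.0, 250.0),
}

def log10_geometric_mean(lower, upper):
    """log10 of the geometric mean of a class's two boundaries."""
    return math.log10(math.sqrt(lower * upper))

def ld50(observed_classes):
    """Median of log10(geometric mean of class) over a site's samples."""
    values = [log10_geometric_mean(*SIZE_CLASSES_MM[c]) for c in observed_classes]
    return median(values)

# Several samples at one (made-up) site.
site_samples = ["sand", "fine gravel", "fine gravel", "cobble", "coarse gravel"]
print(round(ld50(site_samples), 3))
```

Because each sample can only take one of a fixed set of class values, LD50 computed this way is ordinal rather than truly continuous.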
4Substrate Classifications
7Geomorphic Metrics
- τ is the total bank-full shear stress
- ρs is the density of sediment
- ρ is fluid density
- g is gravitational acceleration
- h is bank-full depth
- S is channel slope
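The formula these variables belong to is not part of the transcript. As a point of reference only, the standard depth-slope product relates four of the listed quantities, with τ the total bank-full shear stress, ρ the fluid density, g gravitational acceleration, h the bank-full depth, and S the channel slope. Whether the slide used exactly this form is an assumption.

```latex
\tau = \rho \, g \, h \, S
```

The sediment density ρs does not enter this product; it would appear in a grain-size relation built on τ, which is not reconstructed here.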
8Geomorphic Metrics
Distance-weighted stream power versus LD50: r = 0.327,
p-value = 2.63 x 10^-12
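The r values reported for each metric are presumably Pearson correlation coefficients, although the transcript does not say so. Under that assumption, the sketch below shows how r and the associated t statistic (n - 2 degrees of freedom) would be computed. The data are made up for illustration and are not EMAP measurements.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: no linear correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Made-up values for illustration only (not EMAP data).
stream_power = [1.2, 2.5, 3.1, 4.8, 5.0, 6.7]
ld50 = [0.1, 0.4, 0.2, 0.9, 1.1, 1.0]

r = pearson_r(stream_power, ld50)
print(round(r, 3), round(t_statistic(r, len(ld50)), 3))
```

The p-value then comes from comparing the t statistic with a Student t distribution. With several hundred sites, even a modest r can yield a very small p-value, so the p-values on these slides indicate that an association exists, not that it is strong.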
9Geomorphic Metrics
Outlet link mean slope versus LD50: r = 0.214,
p-value = 3.78 x 10^-6
10Geologic Metrics
- Percent unconsolidated geologic type versus LD50:
r = -0.246, p-value = 1.18 x 10^-7
11Climatic Metrics
Annual average precipitation versus LD50: r = 0.199,
p-value = 1.56 x 10^-6
12Climatic Metrics
- Average annual potential evapotranspiration (mm)
versus LD50: r = -0.046, p-value = 0.342
13Land Cover Metrics
- 1. Developed
- 2. Barren
- 3. Forest
- 4. Grasses
- 5. Agriculture
- 6. Wetlands
- 7. Open water/perennial ice and snow
- 8. Shrubland
14Land Cover Metrics
Percentage of watershed that is forest versus
LD50: r = 0.19, p-value = 3.516 x 10^-5
15Distance-Weighted metrics
- j represents the land cover type of concern
- Aj represents the total area for land cover type j in the watershed
- A coefficient of exponential decay
- The average distance from the outlet for land cover of type j
- n represents the total number of land cover types
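The formula for the distance-weighted metric is not part of the transcript. One plausible reading of the definitions above, offered purely as an assumption, is that each land cover type's area is down-weighted exponentially by its average distance from the outlet and then normalized over all n types. The names `decay` and `mean_distances` and the function itself are hypothetical.

```python
import math

def distance_weighted_share(areas, mean_distances, decay, j):
    """Assumed form: A_j * exp(-decay * d_j) / sum_k A_k * exp(-decay * d_k).

    areas[k] is the total area of land cover type k in the watershed,
    mean_distances[k] its average distance from the outlet, and decay the
    coefficient of exponential decay. This is an illustrative guess at the
    metric, not the formula from the slide.
    """
    weights = [a * math.exp(-decay * d) for a, d in zip(areas, mean_distances)]
    return weights[j] / sum(weights)

# Made-up watershed with three land cover types.
areas = [2.0, 6.0, 2.0]            # e.g. km^2
mean_distances = [1.0, 5.0, 10.0]  # e.g. km from the outlet

print(distance_weighted_share(areas, mean_distances, decay=0.0, j=0))
print(distance_weighted_share(areas, mean_distances, decay=0.5, j=0))
```

With a decay of zero the metric reduces to the plain area share, and a larger decay gives more weight to land cover near the outlet.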
16Additional Land Cover Metrics
- Buffered metrics: computed within a fixed distance of the stream (30 meters, 100 meters, 300 meters)
- Buffered and distance-weighted metrics
17Goals
- Predict LD50 without visiting sites
- Small number of predictors for scientifically
sensible model
18Methods-Stepwise Variable Selection
- Multiple Linear Regression
- Top-in-tier models
- Top geomorphic models plus one from each of the
remaining tiers
19Akaike's Information Criterion
- N is the number of observations
- p is the number of predictors
- RSS is the sum of squared residuals
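The AIC formula itself is not part of the transcript. A common form for least-squares regression, written with N observations, p predictors, and RSS the sum of squared residuals, is shown below. It is correct up to an additive constant, and whether p counts the intercept is a convention the slide does not state.

```latex
\mathrm{AIC} = N \ln\!\left(\frac{\mathrm{RSS}}{N}\right) + 2p
```

Lower AIC is better: the first term rewards a smaller RSS, and the second penalizes each additional predictor.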
20AIC in stepwise variable selection
- Forward stepwise selection:
- Method for choosing the top predictor from each tier
- Start with the intercept-only model
- Choose the variable that reduces AIC the most and include it in the model
- Stepwise selection in both directions:
- Method for choosing all top geomorphic predictors
- Start with the full model
- Add and remove variables until the model with minimum AIC is found or iteration stops
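A single forward step can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: it uses the least-squares AIC form N ln(RSS/N) + 2k with k counting the intercept, the helper names (`solve`, `rss`, `aic`, `forward_step`) are hypothetical, and the data are made up. It covers only the forward direction, not the both-directions search from the full model.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def aic(columns, y):
    """Assumed least-squares AIC: N * ln(RSS / N) + 2 * (predictors + intercept)."""
    n = len(y)
    return n * math.log(rss(columns, y) / n) + 2 * (len(columns) + 1)

def forward_step(candidates, y, selected=()):
    """Return the candidate whose addition lowers AIC the most, or None."""
    best_name = None
    best_aic = aic([candidates[s] for s in selected], y)
    for name in candidates:
        if name in selected:
            continue
        trial = aic([candidates[s] for s in selected] + [candidates[name]], y)
        if trial < best_aic:
            best_name, best_aic = name, trial
    return best_name

# Made-up predictors and response, for illustration only.
candidates = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "precip": [5.0, 3.0, 6.0, 2.0, 7.0, 1.0, 8.0, 4.0],
}
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 8.1]
print(forward_step(candidates, y))
```

Calling `forward_step` repeatedly, passing the growing tuple of selected names, gives forward selection; it stops when the function returns `None`.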
21Methods: CART (Classification and Regression
Trees)
22Methods: CART (Classification and Regression
Trees)
23Hybrid of Multiple Linear Regression and CART
- Fit CART to the residuals of the multiple linear regression
- Add indicator variables to the multiple linear regression equation, one for each terminal node in the tree except one (number of terminal nodes minus one)
- Create a new multiple regression model with the original variables and the indicator variables
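The hybrid idea can be illustrated with the smallest possible residual tree: a single split on one predictor, giving two terminal nodes and therefore one indicator variable. This is a sketch under simplifying assumptions, not the study's procedure. A real CART fit would search all predictors, grow a deeper tree, and prune it. The function names and data are hypothetical.

```python
def best_split(x, residuals):
    """Find the threshold on x that minimises the within-node sum of squares."""
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    best_threshold, best_sse = None, float("inf")
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold, best_sse = threshold, total
    return best_threshold

def node_indicators(x, threshold):
    """Two terminal nodes need one indicator: 1 if the site falls in the right node."""
    return [1 if xi > threshold else 0 for xi in x]

# Made-up residuals from a hypothetical linear fit, for illustration only.
elevation = [100, 150, 200, 250, 800, 900, 950, 1000]
residuals = [-0.5, -0.4, -0.6, -0.5, 0.5, 0.6, 0.4, 0.5]

threshold = best_split(elevation, residuals)
indicator = node_indicators(elevation, threshold)
print(threshold, indicator)
```

The `indicator` column would then be appended to the original predictors and the multiple regression refit. One node is left without an indicator so that the indicators are not perfectly collinear with the intercept.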
24Predictive-ability Statistics
25Analysis Comparison Top 4-tier Models
- Problems with top 4-tier models
- Low adjusted R^2
- Low predictive ability
- Over-prediction and under-prediction of fine and bedrock substrate
- Non-normal residuals
- Benefit of top 4-tier models
- Small number of predictors
26Example of Non-normality of Residuals: Top 4-Tier
Model
27Analysis Comparison Geomorphic plus Top 3-Tier
Models
- Problems with top geomorphic plus top 3-tier model
- Increase in number of variables
- Predictive ability still low
- Over-prediction and under-prediction of fine and bedrock substrate
- Some collinearity between variables
28Analysis Comparison Geomorphic plus Top 3-Tier
Models
- Benefits of top geomorphic plus top 3-tier model
- Improved predictions
- Improved normality of residuals
29Comparison of Analysis - CART
- Problems with CART
- Low predictive ability
- Predicts several observed substrate sizes in one node
- Over-prediction and under-prediction of fines and bedrock substrate
- Omitting one site creates a different tree
- Benefits of CART
- Simple analysis
- Missing values not an issue
30CART Predictions
31Comparison of Analysis-Hybrids
- Problems with hybrid models
- Increased number of variables
- Collinearity with introduction of node indicator variables
- Non-normal residuals
32Comparison of Analysis-Hybrids
- Benefit of hybrid models
- Residuals closer to normal
- Increased predictive ability
- Explains some of the variation created by fitting
a linear model to ordinal data
33One example: Residual Tree for Hybrid Geomorphic
plus Top 3-Tier Model
- Most promising multiple regression prediction
model: geomorphic plus top 3-tier
34One example: Residual Tree for Hybrid Geomorphic
plus Top 3-Tier Model
35One example: Observed vs. Predicted for Hybrid
Geomorphic plus Top 3-Tier Model
- Plot of predictions against observed LD50
36QQ-Plot of Residuals for Hybrid Model
37Coast Range Ecoregion
- Less skewed distribution of LD50
- No measurements are outliers
- Similar ecosystem throughout region
38Ecoregion Distributions
39Coast Range EMAP Sites
40Top 4-Tier Coast Range Model
- Predictors
- Average aspect (climatic)
- Average watershed elevation (geomorphic)
- Percent of watershed as volcanic geologic type (geologic)
- Percent wetlands (distance weighted and buffered)
41QQ-Plot Top 4-Tier Coast Range
42Observed versus Predicted Top 4-Tier Coast
Range Model
43Coast Range Model: Top Geomorphic Variables
- Average watershed elevation (m)
- Drainage density
- Mean slope within a 300-meter buffer
- Ratio of width of stream to width of floodplain
- Coefficient of average hill connectivity
- Distance to the first tributary (m)
- Percent of landscape with less than 4% slope
- Percent of landscape with less than 7% slope
- Measure of size and complexity of river
- Percent of stream as cascade
- Distance-weighted stream power
- Watershed relief divided by its length
44QQ-Plot Coast Range Geomorphic plus Top 3-Tier
model
45Observed versus Predicted Coast Range
Geomorphic Top 3-Tier
46CART - Coast Range Ecoregion
Predictions versus Observed LD50
47Coast Range Hybrid Models
- Benefits of hybrid
- Improved prediction
- Improved fit
- Improved normality of residuals
- Problems with hybrid
- Increased number of predictors
- Collinearity with node indicator variables
48QQ-Plot: Coast Range Hybrid Top 4-Tier
49Observed versus Predicted: Coast Range Hybrid Top
4-Tier
50QQ-Plot Coast Range Hybrid Geomorphic plus Top
3-Tier
51Observed versus Predicted Coast Hybrid
Geomorphic plus Top 3-Tier
52Comparison of Coast Models
53Conclusions
- LD50 is difficult to predict
- Additional geomorphic predictors increase
prediction ability
- Hybrid models increase prediction ability
- More success in Coast Range Ecoregion
54Future Work
- Logistic Regression
- Ordinal data treated as continuous in this study
- 12 categories might require more sophisticated methods
- Spatial Analysis
- There appears to be spatial correlation in the distribution of LD50