Predicting Median Substrate - PowerPoint PPT Presentation

About This Presentation
Title:

Predicting Median Substrate

Description:

Predicting Median Substrate for Oregon and Washington EMAP sites utilizing GIS data. Julia J. Smith, December 12, 2005.

Provided by: statColo1
Transcript and Presenter's Notes

1
Predicting Median Substrate
  • for Oregon and Washington EMAP sites
  • Utilizing GIS data

Julia J. Smith December 12, 2005
2
Why Predict Median Substrate?
  • Indicator of overall stream health
  • Bed load transport
  • Stream Power
  • Macroinvertebrate habitat
  • Fish habitat
  • How is human development affecting a stream?

3
What is LD50?
  • LD50 is a measure of median substrate size.
  • Each substrate class is represented by the
    geometric mean of its class boundaries
  • The log10 of each geometric mean is taken
  • Several samples are taken at each site
  • LD50 is the median value of
    log10(geometric mean of class) at a site
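
The LD50 definition above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `ld50` and the class boundaries in `site_samples` are hypothetical, and every class boundary is assumed to be positive so that the logarithm is defined.

```python
import math
import statistics

def ld50(class_bounds_mm):
    """Median of log10(geometric mean of class boundaries) over one site's samples.

    class_bounds_mm: list of (lower, upper) size-class boundaries in mm,
    one pair per sampled particle. Boundaries must be positive.
    """
    log_means = [math.log10(math.sqrt(lower * upper))
                 for lower, upper in class_bounds_mm]
    return statistics.median(log_means)

# Hypothetical site with three sampled particles (boundaries are made up).
site_samples = [(16.0, 64.0), (2.0, 16.0), (64.0, 250.0)]
site_ld50 = ld50(site_samples)
```

Because the median is taken over class-level values, LD50 can only take a limited set of values, which is relevant to the ordinal-data concern raised later in the talk.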

4
Substrate Classifications
5
(No Transcript)
6
(No Transcript)
7
Geomorphic Metrics
  • τ is the total bank-full shear stress
  • ρs is the density of sediment
  • ρ is fluid density
  • g is gravitational acceleration
  • h is bank-full depth
  • S is channel slope
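
The formula on this slide did not survive in the transcript. As a hedged sketch only, the symbols listed are consistent with the standard depth-slope product for shear stress and a Shields-type estimate of the grain size a flow can move. Whether the presenter used exactly this form is an assumption, as are the function names, the default densities, and the critical Shields value of 0.03.

```python
def bankfull_shear_stress(h, S, rho=1000.0, g=9.81):
    """Depth-slope product: tau = rho * g * h * S (Pa), with h in m."""
    return rho * g * h * S

def competent_grain_size(h, S, rho_s=2650.0, rho=1000.0, g=9.81,
                         critical_shields=0.03):
    """Grain size (m) at the threshold of motion under a Shields criterion.

    D = tau / ((rho_s - rho) * g * critical_shields)
    The critical Shields value is an assumed, illustrative constant.
    """
    tau = bankfull_shear_stress(h, S, rho=rho, g=g)
    return tau / ((rho_s - rho) * g * critical_shields)

# Hypothetical reach: 1 m bank-full depth, 1% slope.
tau_example = bankfull_shear_stress(1.0, 0.01)
d_example = competent_grain_size(1.0, 0.01)
```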

8
Geomorphic Metrics
Distance-weighted stream power versus LD50:
r = 0.327, p-value = 2.63 x 10^-12
9
Geomorphic Metrics
Outlet link mean slope versus LD50:
r = 0.214, p-value = 3.78 x 10^-6
10
Geologic Metrics
  • Percent unconsolidated geologic type versus LD50
  • r = -0.246, p-value = 1.18 x 10^-7

11
Climatic Metrics
Annual average precipitation versus LD50:
r = 0.199, p-value = 1.56 x 10^-6
12
Climatic Metrics
  • Average annual potential evapotranspiration (mm)
    versus LD50
  • r = -0.046, p-value = 0.342

13
Land Cover Metrics
  • 1. Developed
  • 2. Barren
  • 3. Forest
  • 4. Grasses
  • 5. Agriculture
  • 6. Wetlands
  • 7. Open water/perennial ice and snow
  • 8. Shrubland

14
Land Cover Metrics
Percentage of watershed that is forest versus LD50:
r = 0.19, p-value = 3.516 x 10^-5
15
Distance-Weighted metrics
  • j represents the land cover type of concern
  • Aj represents the total area for land cover type
    j in the watershed
  • A coefficient of exponential decay controls how
    quickly influence falls off with distance
  • The average distance from the outlet is computed
    for land cover of type j
  • n represents the total number of land cover
    types
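
The slide's formula and two of its symbols were lost in the transcript, so the following is only one plausible reading of the quantities listed: each land cover type's area is discounted by an exponential function of its average distance from the outlet, then normalized over all n types. The function name `distance_weighted_share`, the parameter name `decay`, and the normalization itself are assumptions made for illustration.

```python
import math

def distance_weighted_share(areas, mean_distances, decay, j):
    """Distance-weighted share of land cover type j (assumed form).

    areas[k]          : total area of land cover type k in the watershed
    mean_distances[k] : average distance from the outlet for type k
    decay             : coefficient of exponential decay (>= 0)
    """
    weights = [a * math.exp(-decay * d)
               for a, d in zip(areas, mean_distances)]
    return weights[j] / sum(weights)

# Hypothetical watershed: two cover types of equal area, one far from the outlet.
near_share = distance_weighted_share([1.0, 1.0], [0.0, 10.0], 0.1, 0)
far_share = distance_weighted_share([1.0, 1.0], [0.0, 10.0], 0.1, 1)
```

With `decay` set to zero the metric reduces to the plain percentage of watershed area, which makes the effect of the distance weighting easy to check.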

16
Additional Land Cover Metrics
  • Buffered Metrics Buffered within a measure of
    the stream (30 meters, 100 meters, 300 meters)
  • Buffered and Distance-weighted metrics

17
Goals
  • Predict LD50 without visiting sites
  • Small number of predictors for scientifically
    sensible model

18
Methods-Stepwise Variable Selection
  • Multiple Linear Regression
  • Top-in-tier models
  • Top geomorphic models plus one from each of the
    remaining tiers

19
Akaike's Information Criterion
N is the number of observations, p is the number of
predictors, and RSS is the sum of squared residuals
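
The AIC formula itself is missing from the transcript. A common form for least-squares regression, stated here as an assumption rather than as the presenter's exact expression, is AIC = N log(RSS / N) + 2p, up to an additive constant. Whether p counts the intercept is also an assumption of this sketch.

```python
import math

def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least-squares fit, up to an additive constant.

    rss      : sum of squared residuals (must be > 0)
    n_obs    : number of observations N
    n_params : number of fitted parameters p
    """
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Two hypothetical fits to the same 100 observations.
small_model = aic_least_squares(rss=50.0, n_obs=100, n_params=3)
large_model = aic_least_squares(rss=49.5, n_obs=100, n_params=8)
```

In this made-up comparison the larger model lowers RSS only slightly, so its penalty term outweighs the gain and the smaller model has the lower AIC.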
20
AIC in stepwise variable selection
  • Forward stepwise selection:
  • Method for choosing the top predictor from each
    tier
  • Start with the intercept-only model
  • Choose the variable that reduces AIC the most and
    include it in the model
  • Stepwise selection in both directions:
  • Method for choosing all top geomorphic
    predictors
  • Start with the full model
  • Add and remove variables until the model with
    minimum AIC is found or iteration stops
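
The first forward step described above (start from the intercept-only model and add the single variable that lowers AIC the most) can be sketched with the standard library alone. This is an illustration under stated assumptions: the AIC form N log(RSS / N) + 2p is assumed, the intercept is counted as a parameter, predictors are assumed non-constant, and all names and data are hypothetical.

```python
import math

def aic(rss, n, p):
    # Assumed least-squares AIC, up to an additive constant.
    return n * math.log(rss / n) + 2 * p

def rss_intercept_only(y):
    mean_y = sum(y) / len(y)
    return sum((v - mean_y) ** 2 for v in y)

def rss_one_predictor(y, x):
    # Closed-form residual sum of squares for y ~ intercept + x.
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return syy - sxy ** 2 / sxx

def first_forward_step(y, candidates):
    """Return the candidate name that lowers AIC the most, or None."""
    n = len(y)
    best_name, best_aic = None, aic(rss_intercept_only(y), n, 1)
    for name, x in candidates.items():
        candidate_aic = aic(rss_one_predictor(y, x), n, 2)
        if candidate_aic < best_aic:
            best_name, best_aic = name, candidate_aic
    return best_name

# Hypothetical data: one informative predictor and one uninformative one.
y = [1.0, 2.1, 2.9, 4.2, 5.0]
tier = {"slope": [1.0, 2.0, 3.0, 4.0, 5.0], "noise": [3.0, 1.0, 4.0, 1.0, 5.0]}
chosen = first_forward_step(y, tier)
```

Running this once per tier would give a "top-in-tier" predictor for each tier. The both-directions search would repeat the comparison while also testing removals, which this sketch does not attempt.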

21
Methods: CART (Classification and Regression
Trees)
22
Methods: CART (Classification and Regression
Trees)
  • Predicted Response

23
Hybrid of Multiple Linear Regression and CART
  • Utilize CART on the residuals
  • Add indicator variables to the multiple linear
    regression equation for all but one of the
    terminal nodes in the tree (the number of
    terminal nodes minus one)
  • Create new multiple regression model with
    variables and indicator variables
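
The indicator-variable step can be sketched as follows, assuming a residual tree has already been fitted and each site has been assigned a terminal node. The function name `node_indicators` and the node ids are hypothetical. The lowest node id is used as the reference level, which is an arbitrary choice.

```python
def node_indicators(leaf_ids):
    """Build 0/1 indicator columns for all but one terminal node.

    leaf_ids: terminal-node id of the residual tree for each site.
    Returns (column_nodes, rows); each row holds one indicator per
    non-reference node, so a tree with k terminal nodes yields k - 1 columns.
    """
    nodes = sorted(set(leaf_ids))
    column_nodes = nodes[1:]  # first node is the reference level
    rows = [[1 if leaf == node else 0 for node in column_nodes]
            for leaf in leaf_ids]
    return column_nodes, rows

# Hypothetical assignment of four sites to three terminal nodes.
columns, indicator_rows = node_indicators([3, 5, 3, 7])
```

These columns would then be appended to the original predictors before refitting the multiple regression. Dropping one node avoids perfect collinearity with the intercept, although the slides note that some collinearity with other predictors can remain.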

24
Predictive-ability Statistics

25
Analysis Comparison: Top 4-Tier Models
  • Problems with top 4-tier models
  • Low Adjusted R2
  • Low Predictive Ability
  • Over-prediction and under-prediction of fine and
    bedrock substrate
  • Non-normal residuals
  • Benefit of top 4-tier models
  • Small number of predictors

26
Example of Non-normality of Residuals: Top 4-Tier
Model
27
Analysis Comparison: Geomorphic plus Top 3-Tier
Models
  • Problems with top geomorphic plus top 3-tier
    model
  • Increase in number of variables
  • Predictive ability still low
  • Over-prediction and under-prediction of fine and
    bedrock substrate
  • Some collinearity between variables

28
Analysis Comparison: Geomorphic plus Top 3-Tier
Models
  • Benefits with top geomorphic plus top 3-tier
    model
  • Improved predictions
  • Improved normality of residuals

29
Comparison of Analysis - CART
  • Problems with CART
  • Low predictive-ability
  • Predicts several observed substrate sizes in one
    node
  • Over-prediction and under-prediction of fines and
    bedrock substrate
  • Omitting one site creates different tree
  • Benefits of CART
  • Simple analysis
  • Missing variables not an issue

30
CART Predictions
31
Comparison of Analysis-Hybrids
  • Problems with hybrid models
  • Increased number of variables
  • Collinearity with introduction of node indicator
    variables
  • Non-normal residuals

32
Comparison of Analysis-Hybrids
  • Benefit of hybrid models
  • Residuals closer to normal
  • Increased predictive-ability
  • Explains some of the variation created by fitting
    a linear model to ordinal data

33
One Example: Residual Tree for Hybrid Geomorphic
plus Top 3-Tier Model
  • Most promising multiple regression prediction
    model: Geomorphic plus top 3-tier

34
One Example: Residual Tree for Hybrid Geomorphic
plus Top 3-Tier Model
35
One Example: Observed vs. Predicted for Hybrid
Geomorphic plus Top 3-Tier Model
  • Plot of predictions against observed LD50

36
QQ-Plot of Residuals for Hybrid Model
37
Coast Range Ecoregion
  • Less skewed distribution of LD50
  • No measurements are outliers
  • Similar ecosystem throughout region

38
Ecoregion Distributions
39
Coast Range EMAP Sites
40
Top 4-Tier Coast Range Model
  • Predictors
  • Average aspect (climatic)
  • Average watershed elevation (geomorphic)
  • Percent of watershed as volcanic geologic type
    (geologic)
  • Percent wetlands (distance weighted and buffered)

41
QQ-Plot Top 4-Tier Coast Range
42
Observed versus Predicted Top 4-Tier Coast
Range Model
43
Coast Range Model: Top Geomorphic Variables
  • Average watershed elevation (m)
  • Drainage density
  • Mean slope within a 300-meter buffer
  • Ratio of width of stream to width of floodplain
  • Coefficient of average hill connectivity
  • Distance to the first tributary (m)
  • Percent of landscape with less than 4% slope
  • Percent of landscape with less than 7% slope
  • Measure of size and complexity of river
  • Percent of stream as cascade
  • Distance-weighted stream power
  • Watershed relief divided by its length

44
QQ-Plot Coast Range Geomorphic plus Top 3-Tier
model
45
Observed versus Predicted Coast Range
Geomorphic Top 3-Tier
46
CART - Coast Range Ecoregion
Predictions versus Observed LD50
47
Coast Range Hybrid Models
  • Benefits of hybrid
  • Improved prediction
  • Improved fit
  • Improved normality of residuals
  • Problems with hybrid
  • Increased number of predictors
  • Collinearity with node indicator variables

48
QQ-Plot: Coast Range Hybrid Top 4-Tier
49
Observed versus Predicted: Coast Range Hybrid Top
4-Tier
50
QQ-Plot Coast Range Hybrid Geomorphic plus Top
3-Tier
51
Observed versus Predicted Coast Hybrid
Geomorphic plus Top 3-Tier
52
Comparison of Coast Models
53
Conclusions
  • LD50 is difficult to predict
  • Additional geomorphic predictors increase
    prediction ability
  • Hybrid models increase prediction ability
  • More success in Coast Range Ecoregion

54
Future Work
  • Logistic Regression
  • Ordinal data treated as continuous in this study
  • 12 categories might require more sophisticated
    methods
  • Spatial Analysis
  • Appears to be spatial correlation in distribution
    of LD50