Title: Hierarchical Models with Ecological Data
1Hierarchical Models with Ecological Data
- Edward Boone
- Keying Ye
- Eric P. Smith
- Department of Statistics, Virginia Tech
2Outline
- An ecological data set
- Hierarchical modeling with missing data and spatial correlation
- Analysis of the data
- Hierarchical model or not, does it make a
difference?
3The Data
- Water Quality data from the Ohio EPA.
- Response: IBI (Index of Biotic Integrity).
- Predictors
- QHEI: Qualitative Habitat Evaluation Index.
- DO: Dissolved Oxygen.
- Others such as NH3, Talk (alkalinity), pH, Al, hardness, Pb, Mn, TSS, and more.
4The Problem
- Understanding the relationship between biology and environmental conditions.
- Which variables are important?
- Is the relationship similar across the state?
- Some of the attributes common to ecological data
- (1) Measures at different scales (i.e., site, river basin, state).
- (2) Spatial correlation.
- (3) Missing data.
6Which Predictors are Important?
- We performed forward, backward, stepwise, and highest-posterior-probability selection in the regression over the entire data set to determine which models could be candidates for the hierarchical models.
- After data reduction, only 190 of 2087 observations were used, so the results may not be reliable.
8The Hierarchical Model
- Site Level
- $Y_i \sim N(X_i \beta_i, \Sigma(\theta))$ (spatial, continuous)
- Basin Level
- $\beta_i \sim N(T\gamma, \Gamma(\phi))$ (spatial, lattice)
- Hyperparameters
- $\gamma \sim N(\alpha, A)$
9- where we have
- $\Sigma(\theta)$ is a continuous (geostatistical) spatial covariance model.
- $\Gamma(\phi)$ is a lattice spatial model.
- This allows for modeling of spatial correlation among the site-level variables and among the basin-level variables.
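As a rough illustration of the two-level structure above, the following Python sketch simulates data from a site-level normal layer nested within a basin-level normal layer. Everything specific here is an assumption made for illustration, not taken from the slides: the exponential form of the site covariance, an identity design matrix T, an independent (non-lattice) basin covariance, and all numeric values and function names.

```python
import numpy as np

def exponential_cov(coords, sill, range_):
    """Geostatistical covariance: sill * exp(-distance / range_)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sill * np.exp(-dist / range_)

def simulate_hierarchy(rng, n_basins=4, n_sites=6):
    gamma = np.array([40.0, 0.5])      # hypothetical hyperparameter mean
    Gamma = np.diag([4.0, 0.01])       # assumed basin covariance, no lattice term
    basins = []
    for _ in range(n_basins):
        # Basin level: beta_i ~ N(T gamma, Gamma), with T taken as the identity.
        beta_i = rng.multivariate_normal(gamma, Gamma)
        # Site level: y_i ~ N(X_i beta_i, Sigma_i), Sigma_i built from site coordinates.
        coords = rng.uniform(0.0, 10.0, size=(n_sites, 2))
        X_i = np.column_stack([np.ones(n_sites), rng.normal(50.0, 10.0, n_sites)])
        Sigma_i = exponential_cov(coords, sill=9.0, range_=3.0)
        y_i = rng.multivariate_normal(X_i @ beta_i, Sigma_i)
        basins.append({"X": X_i, "y": y_i, "beta": beta_i, "Sigma": Sigma_i})
    return basins

basins = simulate_hierarchy(np.random.default_rng(0))
```

Replacing the diagonal `Gamma` with a lattice covariance across basins would bring the sketch closer to the full model; it is left out here to keep the example short.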
10Missing Data
- Missing data are a common feature of ecological data sets. Data augmentation is used to deal with the problem.
- Our goal is to incorporate all uncertainties into the model, so our imputed values vary during simulation. This feature is not present in many analyses.
11Data Augmentation Algorithm
- Partition the response vector and data matrix into observed and missing components,
- where Z is the vector of missing values.
12For the Normal case
- Prior for Z
- $Z \sim N(\mu_Z, \Sigma_Z)$
- The full conditional for Z
- $Z \mid \text{others} \sim N(\mu, \Phi)$
- where
- and
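To make the idea of a normal full conditional concrete, here is a minimal Python sketch for the simplest case, which is an assumption for illustration and not the model on the slides: a single missing scalar predictor x in a regression y ~ N(b0 + b1 x, sigma2) with independent errors and prior x ~ N(m, s2). Combining the two normal kernels gives a normal full conditional whose precision is the sum of the likelihood and prior precisions. The function names and all numbers are hypothetical.

```python
import numpy as np

def missing_x_conditional(y, b0, b1, sigma2, m, s2):
    """Mean and variance of x | y for y ~ N(b0 + b1*x, sigma2), x ~ N(m, s2)."""
    precision = b1 ** 2 / sigma2 + 1.0 / s2
    mean = (b1 * (y - b0) / sigma2 + m / s2) / precision
    return mean, 1.0 / precision

def draw_missing_x(rng, y, b0, b1, sigma2, m, s2):
    """One data-augmentation draw of the missing predictor values."""
    y = np.asarray(y, dtype=float)
    mean, var = missing_x_conditional(y, b0, b1, sigma2, m, s2)
    return rng.normal(mean, np.sqrt(var))

rng = np.random.default_rng(0)
y_with_missing_x = np.array([32.0, 41.0, 27.0])   # made-up responses
imputed = draw_missing_x(rng, y_with_missing_x, b0=10.0, b1=0.5,
                         sigma2=25.0, m=50.0, s2=100.0 ** 2)
```

Inside a sampler, a draw like this would be repeated at every iteration with the current parameter values, so the imputed values change from one iteration to the next.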
13Estimation
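- To estimate our model we will use the Gibbs sampler.
- Suppose we have a parameter vector $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ whose posterior distribution we wish to determine. We can sample the posterior distribution by drawing in turn from each full conditional:
- $\theta_1 \mid \theta_2, \theta_3, \ldots, \theta_k$
- $\theta_2 \mid \theta_1, \theta_3, \ldots, \theta_k$
- $\theta_i \mid \theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k$
- $\theta_k \mid \theta_1, \theta_2, \ldots, \theta_{k-1}$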
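The cycle of conditional draws above can be sketched on a toy problem. The example below is an assumption chosen for brevity, not the hierarchical model of this talk: data y_j ~ N(mu, sigma2) with assumed priors mu ~ N(m0, s0sq) and sigma2 ~ Inverse-Gamma(a, b), for which both full conditionals are available in closed form. The function name and defaults are hypothetical.

```python
import numpy as np

def gibbs_normal(y, n_iter=3000, m0=0.0, s0sq=1e4, a=2.0, b=2.0, seed=0):
    """Gibbs sampler for (mu, sigma2) with y_j ~ N(mu, sigma2).

    Assumed priors: mu ~ N(m0, s0sq) and sigma2 ~ Inverse-Gamma(a, b).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    mu, sigma2 = y.mean(), y.var()       # starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # mu | sigma2, y is normal.
        prec = 1.0 / s0sq + n / sigma2
        mean = (m0 / s0sq + y.sum() / sigma2) / prec
        mu = rng.normal(mean, np.sqrt(1.0 / prec))
        # sigma2 | mu, y is inverse-gamma: draw a gamma variate and invert it.
        shape = a + 0.5 * n
        rate = b + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws[t] = mu, sigma2
    return draws

y = np.random.default_rng(1).normal(5.0, 2.0, size=200)   # simulated data
draws = gibbs_normal(y)
posterior_mean = draws[500:].mean(axis=0)   # discard an initial burn-in
```

Each pass through the loop updates one block of parameters given the current values of the others, which is the same pattern the larger model follows with more blocks.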
14Analysis of the data (no spatial)
- Base Model
- QHEI and DO are relatively mound-shaped, thus normal priors with $\mu = 50$ and $\sigma = 100$ for QHEI and $\mu = 5$ and $\sigma = 10$ for DO are used (flat priors).
15Model estimation using Gibbs sampling
16Model convergence checking
- Posterior distribution is simulated. The samples
had autocorrelation at the first lag so we used a
thinning interval of 2.
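The effect of thinning on lag-one autocorrelation can be checked with a short Python sketch. The chain below is a stand-in AR(1) series with a made-up coefficient of 0.5, not output from the sampler used in this analysis, and the helper name is hypothetical.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a 1-D chain at the given positive lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Stand-in chain: AR(1) with coefficient 0.5 (an assumption for illustration).
rng = np.random.default_rng(0)
chain = np.empty(20000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.5 * chain[t - 1] + rng.standard_normal()

thinned = chain[::2]                 # thinning interval of 2
r_full = autocorr(chain, 1)          # lag-one autocorrelation before thinning
r_thin = autocorr(thinned, 1)        # lag-one autocorrelation after thinning
```

For an AR(1) chain the lag-one autocorrelation of the thinned series is roughly the square of the original, so keeping every second draw reduces dependence at the cost of half the samples.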
18- Hierarchical Model Results of IBI vs. QHEI and DO with no spatial correlation using Data Augmentation. (Significance was determined via a 95% probability interval.)
Significance of QHEI
Significance of DO
19Data Analysis (with spatial correlation)
- Conditional Autoregressive Model
- This is a model for lattice data. It is similar to the autoregressive (AR) time series model. The main assumption is pairwise dependence in the data,
- where $a_{ii} = 0$ and $a_{ji} = a_{ij}$.
20- This can be translated into a covariance matrix,
- $\Sigma = (I - A)^{-1} D$,
- where $D$ is a diagonal matrix and $A$ is the matrix of the $a_{ij}$.
- The model we use here is the one-parameter model,
- $\Sigma = (I - \rho A)^{-1} D$,
- with constraints on $\rho$ to ensure that $\Sigma$ is positive definite, and $a_{ij} = 1$ if basin i is a neighbor of basin j, zero otherwise. Our model will have three $\rho$ parameters, one for each predictor variable at level one. This is a CAR model.
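The one-parameter covariance above can be built and checked numerically. The Python sketch below assumes D is a constant multiple of the identity (`tau2 * I`), which keeps the matrix symmetric when the adjacency matrix is symmetric. In that case positive definiteness holds exactly when the dependence parameter lies strictly between the reciprocals of the smallest and largest eigenvalues of A. The four-basin chain, the function name, and the parameter values are hypothetical.

```python
import numpy as np

def car_covariance(A, rho, tau2=1.0):
    """Return (I - rho*A)^{-1} D with D = tau2 * I for a symmetric 0/1 adjacency A."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T) or np.any(np.diag(A) != 0):
        raise ValueError("A must be symmetric with a zero diagonal")
    eig = np.linalg.eigvalsh(A)                    # ascending eigenvalues
    lower, upper = 1.0 / eig[0], 1.0 / eig[-1]     # eig[0] < 0 < eig[-1] if A has an edge
    if not lower < rho < upper:
        raise ValueError(f"rho must lie in ({lower:.3f}, {upper:.3f})")
    return tau2 * np.linalg.inv(np.eye(A.shape[0]) - rho * A)

# Hypothetical lattice: four basins in a chain, 1-2-3-4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Sigma = car_covariance(A, rho=0.5)
```

With three dependence parameters, one per level-one coefficient, the same construction would be applied once for each coefficient.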
21Estimation using Gibbs sampling with CAR modeling
22Model convergence checking
- Posterior distribution is simulated. The samples
had autocorrelation at the first lag so we used a
thinning interval of 2.
23- Hierarchical Model Results of IBI vs. QHEI and DO with no spatial correlation using Data Augmentation. (Significance was determined via a 95% probability interval.)
Significance of QHEI
Significance of DO
24(1) No spatial (2) with CAR model
QHEI variable
25(1) No spatial (2) with CAR model
DO variable
26Possible Interpretations
- Qualitative Habitat Evaluation Index (QHEI)
- (1) The non-CAR model showed overall significance of QHEI, while the CAR model did not.
- (2) Since QHEI was significant in three basins under either model, we should not deem QHEI unimportant.
- (3) Any policy decisions should be made at the river basin level rather than at the statewide level.
27Possible Interpretations (cont.)
- Dissolved Oxygen
- (1) DO is significant in both models, so DO should be treated as an important predictor.
- (2) There is significant variation in the mean parameter for DO, so the effect of DO differs across basins.
- (3) Any policy decisions should be made at the river basin level rather than at the statewide level.
28Can this be done without a hierarchical structure?
- Separable covariograms
- Suppose a covariogram function $C(\cdot)$ of $h$ can be written as $C(h) = C_1(h_1)\,C_2(h_2)$, where $h = (h_1, h_2)$. Then $C$ is called a separable covariogram.
- The advantage of this is that the two spatial analyses may be done separately.
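A small Python sketch can show what separability buys in practice. It assumes exponential covariograms for both components and a regular grid of locations, neither of which comes from the slides; the ranges, grid values, and function names are hypothetical. On such a grid, the full covariance matrix of a product-form covariogram equals the Kronecker product of the two component matrices.

```python
import numpy as np

def exp_cov(h, range_):
    """Exponential covariogram exp(-|h| / range_)."""
    return np.exp(-np.abs(h) / range_)

def separable_cov(h1, h2, range1=2.0, range2=5.0):
    """Product-form (separable) covariogram C(h) = C1(h1) * C2(h2)."""
    return exp_cov(h1, range1) * exp_cov(h2, range2)

# Hypothetical grid: 3 locations on the first coordinate, 4 on the second.
u = np.array([0.0, 1.0, 3.0])
v = np.array([0.0, 2.0, 4.0, 6.0])
K1 = exp_cov(u[:, None] - u[None, :], 2.0)     # 3 x 3 matrix from C1
K2 = exp_cov(v[:, None] - v[None, :], 5.0)     # 4 x 4 matrix from C2

# Full 12 x 12 covariance over all (u, v) pairs, ordered with v varying fastest.
pts = np.array([(a, b) for a in u for b in v])
full = separable_cov(pts[:, None, 0] - pts[None, :, 0],
                     pts[:, None, 1] - pts[None, :, 1])
kron = np.kron(K1, K2)   # the same matrix, assembled from the two small pieces
```

Because `full` and `kron` agree, each component can be handled through its own small matrix, which is the sense in which the two analyses can be done separately.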
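29- Consider the problem we are interested in as follows.
- A Hierarchical Stochastic Process
- Level I: $\{Y(i,s) : s \in D_i\} \mid \beta(i)$
- where $E[Y(i,s) \mid \beta(i)] = X_i(s)\,\beta(i)$ and
- $\mathrm{Var}[Y(i,s) \mid \beta(i)] = \Sigma_i$
- Level II: $\{\beta(i) : i \in D_2\} \mid \gamma$
- where $E[\beta(i) \mid \gamma] = T\gamma$ and $\mathrm{Var}[\beta(i) \mid \gamma] = W$.
- For hyperparameters, $\gamma \sim \pi(\gamma)$.
- Note that we have two types of spatial problems here: one continuous (s) and the other lattice (i).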
30 31- Rewriting the model
- and
- Variance-covariance matrix for Y(s)
32- On an individual-observation basis, we have the following covariance structures
- where $\sigma_i(jl)$ is the $(j,l)$ element of the matrix $\Sigma_i$, and
- for $i \neq k$.
- Both spatial components are additively separable.
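One way the covariance structure described above could be written out is sketched below. This is a reconstruction under stated assumptions, not the display from the slides: the site-level errors $\varepsilon_i$ are taken to be independent across basins and independent of the basin-level deviations $u_i$, $x_i(s_j)^\top$ denotes the row of $X_i$ for site $s_j$, $\sigma_i(jl)$ is the $(j,l)$ element of the site-level covariance matrix, and $W_{ik}$ is notation introduced here for the covariance between $\beta(i)$ and $\beta(k)$ induced by the lattice model.

```latex
\begin{align*}
Y_i &= X_i \beta(i) + \varepsilon_i, \qquad \beta(i) = T\gamma + u_i, \\
\operatorname{Cov}\bigl(Y(i,s_j),\, Y(i,s_l) \mid \gamma\bigr)
  &= x_i(s_j)^\top W\, x_i(s_l) + \sigma_i(jl), \\
\operatorname{Cov}\bigl(Y(i,s_j),\, Y(k,s_l) \mid \gamma\bigr)
  &= x_i(s_j)^\top W_{ik}\, x_k(s_l), \qquad i \neq k.
\end{align*}
```

Under these assumptions the lattice term and the continuous term enter as a sum, not as a product, which is consistent with calling the two components additively separable.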
33- Comments
- (1) Y(i,s) is not second-order stationary for any i unless $W = 0$ (in which case there is no lattice spatial component), so many standard spatial methods do not work in this case.
- (2) Y(s) is not second-order stationary.
- (3) The spatial components of s and i (the lattice) are NOT separable, so we cannot model the covariance structures of the two separately.
34- What can we do?
- Hierarchical modeling with spatial correlation
of the Y(i,s) and lattice spatial component of i
(such as CAR or SAR models) is definitely one of
the natural answers.