Hierarchical Models with Ecological Data

1
Hierarchical Models with Ecological Data
  • Edward Boone
  • Keying Ye
  • Eric P. Smith
  • Department of Statistics, Virginia Tech

2
Outline
  • An ecological data set
  • Hierarchical modeling with missing data and
    spatial correlation
  • Analysis of the data
  • Hierarchical model or not, does it make a
    difference?

3
The Data
  • Water quality data from the Ohio EPA.
  • Response: IBI (Index of Biotic Integrity).
  • Predictors:
  • QHEI (Quality of Habitat Environment Index)
  • DO (dissolved oxygen)
  • Others, such as NH3, total alkalinity, pH, Al,
    hardness, Pb, Mn, TSS and more.

4
The Problem
  • Understanding the relationship between biology
    and environmental conditions.
  • Which variables are important?
  • Is the relationship similar across the state?
  • Some attributes common to ecological data:
  • (1) Measures at different scales (e.g., site,
    river basin, state).
  • (2) Spatial correlation.
  • (3) Missing data.

5
(No Transcript)
6
Which Predictors are Important?
  • We ran forward, backward, stepwise and
    highest-posterior-probability selection on a
    regression over the entire data set to determine
    which models could be candidates for the
    hierarchical models.
  • After data reduction, only 190 of 2087
    observations were used, so the results may not
    be reliable.

7
(No Transcript)
8
The Hierarchical Model
  • Site level (spatial, continuous):
  • $Y_i \sim N(X_i \beta_i, \Sigma(\theta))$
  • Basin level (spatial, lattice):
  • $\beta_i \sim N(T\gamma, \Gamma(\phi))$
  • Hyperparameters:
  • $\gamma \sim N(a, A)$

9
  • where
  • $\Sigma(\theta)$ is a continuous (geostatistical)
    spatial covariance model.
  • $\Gamma(\phi)$ is a lattice spatial model.
  • This allows for modeling of spatial correlation
    among the site-level variables and among the
    basin-level variables.
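
To make the three levels concrete, here is a minimal simulation sketch in Python. It is not the authors' code. The exponential site-level covariance, the function names, and all parameter values are assumptions chosen for illustration, and each basin's coefficients are drawn independently, so the lattice dependence between basins is deliberately left out.

```python
import numpy as np

def exponential_cov(coords, sill=1.0, range_=1.0):
    """Exponential covariance between site coordinates (an assumed model)."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-dists / range_)

def simulate_two_level(X_list, coords_list, T, a, A, Gamma, rng):
    """Draw the hyperparameter, then basin coefficients, then site responses."""
    gamma = rng.multivariate_normal(a, A)                  # hyperparameter level
    betas, ys = [], []
    for X, coords in zip(X_list, coords_list):
        # Basin level: independent draws here (no lattice correlation).
        beta = rng.multivariate_normal(T @ gamma, Gamma)
        # Site level: spatially correlated errors within the basin.
        Sigma = exponential_cov(coords)
        ys.append(rng.multivariate_normal(X @ beta, Sigma))
        betas.append(beta)
    return gamma, betas, ys

# Hypothetical setup: 4 basins, 10 sites each, intercept plus 2 predictors.
rng = np.random.default_rng(0)
p, n_basins, n_sites = 3, 4, 10
coords_list = [rng.uniform(0, 10, size=(n_sites, 2)) for _ in range(n_basins)]
X_list = [
    np.column_stack([np.ones(n_sites), rng.normal(size=(n_sites, p - 1))])
    for _ in range(n_basins)
]
gamma, betas, ys = simulate_two_level(
    X_list, coords_list, np.eye(p), np.zeros(p), np.eye(p), 0.25 * np.eye(p), rng
)
```

Each entry of `ys` is one basin's vector of simulated responses; replacing the independent basin draws with a joint draw under a lattice covariance would bring the sketch closer to the model on the slide.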

10
Missing Data
  • Missing data are a common feature of ecological
    data sets. Data augmentation is used to deal
    with the problem.
  • Our goal is to incorporate all uncertainties into
    the model, so the imputed values vary from
    iteration to iteration during simulation. This
    feature is absent from many analyses.

11
Data Augmentation Algorithm
  • Partition the response vector and data matrix
    into observed and missing parts (partition
    equation not transcribed),
  • where Z is the vector of missing values.

12
For the Normal case
  • Prior for Z:
  • $Z \sim N(\mu_Z, \Sigma_Z)$
  • The full conditional for Z:
  • $Z \mid \text{others} \sim N(\mu, \Phi)$
  • (The expressions for $\mu$ and $\Phi$ are not
    transcribed.)
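
As an illustration of what a data augmentation step can look like, the sketch below works out the simplest normal case: one predictor x with a normal prior, a linear regression of y on x, and missing x values redrawn from their full conditional at each iteration. The single-predictor setup, the function names, and the numbers are assumptions for illustration, not the slides' actual conditional.

```python
import numpy as np

def missing_x_conditional(y, beta0, beta1, sigma2, prior_mean, prior_sd):
    """Mean and variance of x | y when y = beta0 + beta1*x + e,
    e ~ N(0, sigma2), and x ~ N(prior_mean, prior_sd**2)."""
    precision = beta1**2 / sigma2 + 1.0 / prior_sd**2
    mean = (beta1 * (y - beta0) / sigma2 + prior_mean / prior_sd**2) / precision
    return mean, 1.0 / precision

def impute_missing_x(x, y, beta0, beta1, sigma2, prior_mean, prior_sd, rng):
    """Return a copy of x with NaN entries replaced by fresh random draws."""
    x_new = np.array(x, dtype=float)
    y = np.asarray(y, dtype=float)
    miss = np.isnan(x_new)
    mean, var = missing_x_conditional(
        y[miss], beta0, beta1, sigma2, prior_mean, prior_sd
    )
    x_new[miss] = rng.normal(mean, np.sqrt(var))
    return x_new

# Hypothetical values: one missing predictor value out of three.
rng = np.random.default_rng(0)
x = np.array([48.0, np.nan, 61.0])
y = np.array([33.0, 30.0, 41.0])
x_filled = impute_missing_x(x, y, 10.0, 0.5, 4.0, 50.0, 100.0, rng)
```

Calling `impute_missing_x` inside every Gibbs iteration, with the current parameter values, is what lets the imputed values vary during simulation rather than being fixed once.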

13
Estimation
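  • To estimate our model we use the Gibbs sampler.
  • Suppose we wish to determine the posterior
    distribution of a parameter vector
    $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$.
    We can sample from the posterior by drawing from
    each full conditional in turn:
  • $\theta_1 \mid \theta_2, \theta_3, \ldots, \theta_k$
  • $\theta_2 \mid \theta_1, \theta_3, \ldots, \theta_k$
  • $\theta_i \mid \theta_1, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_k$
  • $\theta_k \mid \theta_1, \theta_2, \ldots, \theta_{k-1}$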
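
The cycling scheme can be seen on a toy target. The sketch below runs a two-component Gibbs sampler for a standard bivariate normal with correlation `rho`, where both full conditionals are known normals. The target and the function name are illustrative assumptions, unrelated to the model fitted in these slides.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = np.random.default_rng(seed)
    draws = np.empty((n_iter, 2))
    theta1, theta2 = 0.0, 0.0                 # arbitrary starting values
    cond_sd = np.sqrt(1.0 - rho**2)           # sd of each full conditional
    for t in range(n_iter):
        # Draw each component given the current value of the other.
        theta1 = rng.normal(rho * theta2, cond_sd)
        theta2 = rng.normal(rho * theta1, cond_sd)
        draws[t] = (theta1, theta2)
    return draws

draws = gibbs_bivariate_normal(rho=0.6, n_iter=20000, seed=1)
sample_corr = np.corrcoef(draws.T)[0, 1]      # should be close to rho
```

With more than two components the loop body simply gains one line per full conditional, each using the most recent values of all the others.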

14
Analysis of the data (no spatial)
  • Base Model
  • QHEI and DO are relatively mound shaped, so
    normal priors with $\mu = 50$ and $\sigma = 100$
    for QHEI and $\mu = 5$ and $\sigma = 10$ for DO
    are used (effectively flat priors).

15
Model estimation using Gibbs sampling
  • Full Conditionals

16
Model convergence checking
  • The posterior distribution is simulated. The
    samples showed autocorrelation at the first lag,
    so we used a thinning interval of 2.
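
A small sketch of this check, assuming the chain for one parameter is stored as a one-dimensional array: estimate the lag-1 autocorrelation, keep every second draw, and estimate it again. The AR(1) series stands in for real sampler output, and the helper names are hypothetical.

```python
import numpy as np

def lag_autocorr(chain, lag=1):
    """Sample autocorrelation of a 1-D chain at the given lag."""
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def thin(chain, interval=2):
    """Keep every `interval`-th draw, starting with the first."""
    return np.asarray(chain)[::interval]

# Stand-in for MCMC output: an AR(1) series with coefficient 0.5.
rng = np.random.default_rng(0)
chain = np.empty(20000)
chain[0] = 0.0
for t in range(1, chain.size):
    chain[t] = 0.5 * chain[t - 1] + rng.normal()

before = lag_autocorr(chain)          # near 0.5 for this series
after = lag_autocorr(thin(chain, 2))  # near 0.25 after thinning by 2
```

Thinning lowers the correlation between retained draws at the cost of discarding half the samples.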

17
(No Transcript)
18
  • Hierarchical model results of IBI vs. QHEI and DO
    with no spatial correlation, using data
    augmentation. (Significance was determined via a
    95% probability interval.)

(Figure panels: Significance of QHEI; Significance of DO)
19
Data Analysis (with spatial correlation)
  • Conditional autoregressive (CAR) model
  • This is a model for lattice data. It is similar
    to the time series MA model. The main assumption
    is pairwise dependence in the data (model
    equation not transcribed),
  • where $a_{ii} = 0$ and $a_{ji} = a_{ij}$.

20
  • This can be translated into a covariance matrix,
  • $\Sigma = (I - A)^{-1} D$
  • where D is a diagonal matrix and A is the matrix
    of the $a_{ij}$.
  • The model we use here is the one-parameter model,
  • $\Sigma = (I - \rho A)^{-1} D$
  • with constraints on $\rho$ to ensure that
    $\Sigma$ is positive definite, and $a_{ij} = 1$
    if basin i is a neighbor of basin j, zero
    otherwise. Our model has three $\rho$ parameters,
    one for each predictor variable at level one.
    This is a CAR model.
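
The one-parameter covariance can be built directly from a neighbor matrix. The sketch below assumes the diagonal matrix is a constant times the identity (written `tau2` here, a name not in the slides), in which case positive definiteness holds exactly when the dependence parameter lies between the reciprocals of the smallest and largest eigenvalues of the neighbor matrix. The four-basin chain is a made-up example.

```python
import numpy as np

def car_rho_bounds(adjacency):
    """Open interval of rho giving a positive definite I - rho*A."""
    A = np.asarray(adjacency, dtype=float)
    if not np.allclose(A, A.T) or np.any(np.diag(A) != 0):
        raise ValueError("adjacency must be symmetric with a zero diagonal")
    if not A.any():
        raise ValueError("adjacency must contain at least one neighbor pair")
    eigvals = np.linalg.eigvalsh(A)       # real eigenvalues, ascending order
    return 1.0 / eigvals[0], 1.0 / eigvals[-1]

def car_covariance(adjacency, rho, tau2=1.0):
    """Covariance (I - rho*A)^{-1} D with D = tau2 * I (an assumption)."""
    A = np.asarray(adjacency, dtype=float)
    lower, upper = car_rho_bounds(A)
    if not lower < rho < upper:
        raise ValueError(f"rho must lie in ({lower:.3f}, {upper:.3f})")
    return tau2 * np.linalg.inv(np.eye(A.shape[0]) - rho * A)

# Hypothetical lattice: four basins in a chain, 1-2-3-4.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
Sigma = car_covariance(A, rho=0.4)
```

With a non-constant diagonal matrix the product is not symmetric in general, so a different parameterization (for example, row-scaled neighbor weights) would be needed; that case is outside this sketch.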

21
Estimation using Gibbs sampling with CAR modeling
  • Full Conditionals

22
Model convergence checking
  • The posterior distribution is simulated. The
    samples showed autocorrelation at the first lag,
    so we used a thinning interval of 2.

23
  • Hierarchical model results of IBI vs. QHEI and DO
    with spatial correlation (CAR model), using data
    augmentation. (Significance was determined via a
    95% probability interval.)

(Figure panels: Significance of QHEI; Significance of DO)
24
QHEI variable: (1) no spatial correlation, (2) with CAR model
25
DO variable: (1) no spatial correlation, (2) with CAR model
26
Possible Interpretations
  • Quality of Habitat Environment Index (QHEI)
  • (1) The non-CAR model showed overall
    significance of QHEI, while the CAR model did
    not.
  • (2) Since QHEI was significant in three basins
    under either model, we should not deem QHEI
    unimportant.
  • (3) Any policy decisions should be made at the
    river basin level rather than the statewide
    level.

27
Possible Interpretations (cont.)
  • Dissolved oxygen (DO)
  • (1) DO is significant in both models, so DO
    should be treated as an important predictor.
  • (2) There is significant variation in the mean
    parameter for DO, so the effect of DO differs
    across basins.
  • (3) Any policy decisions should be made at the
    river basin level rather than the statewide
    level.

28
Can this be done without a hierarchical structure?
  • Separable covariograms
  • Suppose a covariogram function $C(\cdot)$ of h
    can be written as $C(h) = C_1(h_1)\,C_2(h_2)$,
    where $h = (h_1, h_2)$. Then C is called a
    separable covariogram.
  • The advantage of this is that the two spatial
    analyses may be done separately.
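
One way to see the practical benefit, assuming the product form of separability and a regular grid of locations: the full covariance matrix is then the Kronecker product of the two one-dimensional covariance matrices. The exponential covariances and grid values below are illustrative assumptions.

```python
import numpy as np

def exp_cov_1d(points, range_):
    """Exponential covariance along a single coordinate axis."""
    lags = np.abs(points[:, None] - points[None, :])
    return np.exp(-lags / range_)

# Hypothetical grid: 3 positions on axis 1, 2 positions on axis 2.
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([0.0, 0.5])
C1 = exp_cov_1d(x1, range_=1.0)
C2 = exp_cov_1d(x2, range_=2.0)

# Separable covariance over all 6 grid points (axis 1 varies slowest).
C_sep = np.kron(C1, C2)

# The same matrix built directly from the product C1(h1) * C2(h2).
grid = np.array([(u, v) for u in x1 for v in x2])
h1 = np.abs(grid[:, None, 0] - grid[None, :, 0])
h2 = np.abs(grid[:, None, 1] - grid[None, :, 1])
C_direct = np.exp(-h1 / 1.0) * np.exp(-h2 / 2.0)

same = np.allclose(C_sep, C_direct)
```

Because the inverse and determinant of a Kronecker product factor into those of its pieces, each component can be handled on its own, which is the sense in which the two analyses decouple.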

29
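  • Consider the problem we are interested in as
    follows.
  • A hierarchical stochastic process
  • Level I: $\{Y(i,s) : s \in D_i\} \mid \beta(i)$
  • where $E[Y(i,s) \mid \beta(i)] = X_i(s)\,\beta(i)$ and
  • $\mathrm{Var}[Y(i,s) \mid \beta(i)] = \Sigma_i$
  • Level II: $\{\beta(i) : i \in D_2\} \mid \gamma$
  • where $E[\beta(i) \mid \gamma] = T\gamma$ and
    $\mathrm{Var}[\beta(i) \mid \gamma] = \Omega$.
  • For hyperparameters, $\gamma \sim \pi(\gamma)$.
  • Note that we have two types of spatial problems
    here: one is continuous (s) and the other is
    lattice (i).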

30
  • Matrix notation

31
  • Rewriting the model
  • and
  • Variance-covariance matrix for Y(s)

32
  • On an individual-observation basis, we have the
    following covariance structures (equations not
    transcribed),
  • where $\sigma_{i(jl)}$ is the (j, l) element of
    the matrix $\Sigma_i$, and
  • for $i \neq k$.
  • Both spatial components are additively separable.
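
As a sketch of how an additive structure of this kind arises (a reconstruction under stated assumptions, not the slide's missing equations): suppose sites in different basins are conditionally independent given the basin coefficients, and write $\Omega_{ik}$ for the covariance between the coefficients of basins i and k given the hyperparameters. The symbol $\Omega_{ik}$ is notation introduced here. The law of total covariance then gives

```latex
\operatorname{Cov}\bigl(Y(i,s_j),\, Y(k,s_l) \mid \gamma\bigr)
  = \mathbf{1}\{i = k\}\,\sigma_{i(jl)}
  + X_i(s_j)\,\Omega_{ik}\,X_k(s_l)^{\top}
```

The first term is the site-level (continuous) piece and the second is the lattice piece. Because the second term depends on the covariate values at each site, the covariance is not a function of the separation between sites alone.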

33
  • Comments
  • (1) Y(i,s) is not second-order stationary for
    any i unless $\Omega = 0$ (in which case there
    is no lattice spatial component), so many
    standard spatial methods do not work in this
    case.
  • (2) Y(s) is not second-order stationary.
  • (3) The spatial components of s and i (the
    lattice) are NOT separable, so we cannot model
    the covariance structures of the two separately.

34
  • What can we do?
  • Hierarchical modeling with spatial correlation
    of the Y(i,s) and lattice spatial component of i
    (such as CAR or SAR models) is definitely one of
    the natural answers.