Statistical Models for Stream Ecology Data: Random Effects Graphical Models

Title: Statistical Models for Stream Ecology Data: Random Effects Graphical Models


1
Statistical Models for Stream Ecology
Data: Random Effects Graphical Models
  • Devin S. Johnson
  • Jennifer A. Hoeting
  • STARMAP
  • Department of Statistics
  • Colorado State University

2
The work reported here was developed under the
STAR Research Assistance Agreement CR-829095
awarded by the U.S. Environmental Protection
Agency (EPA) to Colorado State University. This
presentation has not been formally reviewed by
EPA.  The views expressed here are solely those
of the presenter and STARMAP, the program he
represents. EPA does not endorse any products or
commercial services mentioned in this
presentation.
3
Motivating Problem
  • Various stream sites in the Mid-Atlantic region
    of the United States were visited in Summer 1994.
  • For each site, each observed fish species was
    cross categorized according to several traits
  • Environmental variables were also measured at each
    site (e.g. precipitation, chloride
    concentration, ...)
  • Relative proportions are more informative.
  • How can we determine if collected environmental
    variables affect species richness compositions
    (which ones)?

4
Outline
  • Introduction
  • Compositional data
  • Probability models
  • Brief introduction to chain graphs
  • A graphical model for compositional data
  • Modeling individual probabilities
  • Markov properties of random effects graphical
    models
  • Analysis of fish species richness compositional
    data
  • Conclusions and Future Research

5
Discrete Compositions and Probability Models
  • Compositional data are multivariate observations
  • $Z = (Z_1, \dots, Z_D)$ subject to the constraints that
    $\sum_i Z_i = 1$ and $Z_i \ge 0$.
  • Compositional data are usually modeled with the
    Logistic-Normal distribution (Aitchison 1986).
  • Scale and location parameters provide a large
    amount of flexibility compared to the Dirichlet
    model
  • LN model defined for positive compositions only
  • Problem: With discrete counts, one has a
    non-trivial probability of observing 0
    individuals in a particular category
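
The zero-count problem can be made concrete with a minimal Python sketch. It assumes NumPy and uses the additive log-ratio transform on which the logistic-normal distribution is built; the function name `alr` and the example counts are illustrative assumptions, not taken from the presentation.

```python
import numpy as np

def alr(z):
    """Additive log-ratio transform: log(z_j / z_D) for j = 1, ..., D-1."""
    z = np.asarray(z, dtype=float)
    # Suppress the warning raised by log(0); the result is -inf.
    with np.errstate(divide="ignore"):
        return np.log(z[:-1] / z[-1])

# Hypothetical species counts in D = 3 categories at two sites.
counts_ok = np.array([12, 5, 3])     # every category observed
counts_zero = np.array([12, 0, 3])   # one category not observed

comp_ok = counts_ok / counts_ok.sum()        # composition on the simplex
comp_zero = counts_zero / counts_zero.sum()  # composition with a zero part

alr_ok = alr(comp_ok)      # finite values: the LN model applies
alr_zero = alr(comp_zero)  # contains -inf: the LN density is undefined here
```

A composition with an empty category maps to an infinite log-ratio, which is why a model placed directly on the counts is attractive for discrete data.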

6
Existing Compositional Data Models
  • Billheimer and Guttorp (2001) proposed using a
    multinomial state-space model for a single
    composition,
  • where $Y_{ij}$ is the number of individuals belonging
    to category $j = 1, \dots, D$ at site $i = 1, \dots, S$.
  • Limitations
  • Models proportions of a single categorical
    variable.
  • Abstract interpretation of included covariate
    effects

7
Existing Graphical Models
  • Graphical model theory (see Lauritzen 1996) has been
    used for many years to
  • model cell probabilities for high dimensional
    contingency tables
  • determine dependence relationships among
    categorical and continuous variables
  • Limitation
  • Graphical models are designed for a single sample
    (or site in the case of the Oregon stream data).
    Compositional data may arise at many sites

8
New Improvements for Compositional Data Models
  • The Billheimer and Guttorp model can be
    generalized by the application of graphical model
    theory.
  • Generalized models can be applied to
    cross-classified compositions
  • Simple interpretation of covariate effects as a
    variable in a Markov random field
  • Conversely, graphical model theory can be
    expanded to include models for multiple site
    sampling schemes

9
Chain Graphs
[Figure: example chain graph on vertices a, b, c, d, e]
  • Mathematical graphs are used to illustrate
    complex dependence relationships in a
    multivariate distribution.
  • A random vector is represented as a set of
    vertices, V .
  • Pairs of vertices are connected by directed edges
    if a causal relationship is assumed, undirected
    if the relationship is mutual
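
As a rough illustration of the structure described above, the following Python sketch stores a chain graph as two edge lists and recovers its chain components (the blocks of vertices joined by undirected edges). The edges are hypothetical: the slide's figure is not preserved, so only the vertex labels a through e come from the source, and the function name `chain_components` is an assumption.

```python
from collections import defaultdict

def chain_components(vertices, undirected):
    """Connected components of the graph after dropping all directed edges."""
    nbrs = defaultdict(set)
    for u, v in undirected:
        nbrs[u].add(v)
        nbrs[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                      # depth-first search over undirected edges
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(nbrs[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

vertices = ["a", "b", "c", "d", "e"]
directed = [("a", "c"), ("b", "c"), ("b", "d")]  # (parent, child): assumed causal links
undirected = [("c", "d"), ("d", "e")]            # assumed mutual associations

components = chain_components(vertices, undirected)
# Parents of the block {c, d, e}: vertices with a directed edge into the block.
block = {"c", "d", "e"}
block_parents = {p for p, child in directed if child in block}
```

In this hypothetical graph, a and b act as explanatory vertices and {c, d, e} forms one block of mutually associated responses.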

10
Probability Model for Individuals (Unobserved
Composition)
  • Response variables
  • Set $\Phi$ of discrete categorical variables
  • Notation: $y$ is a specific cell
  • Explanatory variables
  • Set $\Gamma \cup \Delta$ of categorical ($\Delta$) and/or continuous
    ($\Gamma$) variables
  • Notation: $x$ refers to a specific explanatory
    observation
  • Random effects
  • Allow flexibility when sampling many sites
  • Unobserved covariates
  • Notation: $\varepsilon_f$, $f \subseteq \Phi$, refers to a random effect.

11
Probability Model and Extended Chain Graph, $\mathcal{G}^\varepsilon$
  • Joint distribution
  • $f(y, x, \varepsilon) = f(y \mid x, \varepsilon) \cdot f(x) \cdot f(\varepsilon)$
  • Graph illustrating possible dependence
    relationships for the full model, $\mathcal{G}^\varepsilon$.

12
Random Effects Discrete Regression Model (REDR)
  • Sampling of individuals occurs at many different
    random sites, $i = 1, \dots, S$, where covariates are
    measured only once per site
  • Hierarchical model for individual probabilities

13
Random Effects Discrete Regression Model (REDR)
  • Response parameter constraints
  • The function $\alpha_\Phi(x, \varepsilon)$ is a normalizing constant
    w.r.t. $y \mid (x, \varepsilon)$, and therefore is not a function
    of $y$.
  • The parameters $\beta_{fcd}(y, x_\Delta)$, $\omega_{f\gamma dm}(y, x_\Delta)$, and
    $\varepsilon_f(y)$ are interaction effects that depend on $y$
    and $x_\Delta$ through the levels of the variables in $f$
    and $d$ only.
  • Interaction parameters (and random effects) are
    set to zero for identifiability of the model if
    the cells $y$ or $x_\Delta$ are indexed by the first level
    of any variable in $f$ or $d$.

14
Random Effects Discrete Regression Model (REDR)
  • Model for explanatory variables (CG
    distribution)
  • Again, interactions depend on $x_\Delta$ through the
    levels of the variables in the set $d$ only, and
    identifiability constraints are imposed.

15
Graphical Models for Discrete Compositions
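  • Sampling many individuals at a site results in
    cell counts,
  • $C(y)_i$ = number of individuals in cell $y$ at site $i$.
  • Conditional count likelihood
  • $\{C(y)_i\}_y \mid x_i, N_i \sim \text{multinomial}(N_i; \{f_{RE}(y \mid x_i, \varepsilon_i)\}_y)$,
  • or
  • $C(y)_i \mid x_i \sim \text{indep. Poisson}(\kappa(x_i) \cdot f_{RE}(y \mid x_i, \varepsilon_i))$
  • Joint covariate count likelihood
  • $\text{multinomial}(N_i; \{f_{RE}(y \mid x_i, \varepsilon_i)\}_y) \times \text{CG}(\lambda, \tau, \Psi_\emptyset)$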
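
The two count likelihoods can be sketched by simulation. The REDR density itself is not preserved in this transcript, so the code below assumes a generic log-linear form in which cell probabilities are proportional to the exponential of a fixed term plus a site-level random effect. All numeric values, the function name `cell_probs`, and the flat indexing of cells are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cell_probs(eta, eps):
    """Cell probabilities proportional to exp(eta + eps) (assumed log-linear form)."""
    logits = np.asarray(eta, dtype=float) + np.asarray(eps, dtype=float)
    logits = logits - logits.max()   # stabilise the exponentials
    p = np.exp(logits)
    return p / p.sum()               # normalising constant does not depend on y

n_cells = 6                                        # e.g. 2 x 3 cross-classification
eta = np.array([0.0, 0.4, -0.3, 0.2, -0.5, 0.1])   # hypothetical fixed terms
eps = rng.normal(0.0, 0.5, size=n_cells)           # hypothetical site random effects
eps[0] = 0.0   # simplified stand-in for the first-level identifiability constraint

p = cell_probs(eta, eps)

# Multinomial form: the site total N_i is conditioned on.
N_i = 25
counts_multinomial = rng.multinomial(N_i, p)

# Poisson form: independent counts with means kappa * p(y); kappa is assumed constant here.
kappa = 25.0
counts_poisson = rng.poisson(kappa * p)
```

The two forms are linked by a standard result: independent Poisson counts with means proportional to the cell probabilities, conditioned on their total, follow the multinomial distribution with those probabilities.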

16
Markov Properties of Chain Graph Models
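  • Let $P$ denote a probability measure on the product
    space
  • $\mathcal{X} = \times_{\alpha \in V} \mathcal{X}_\alpha$
  • Markov (Global) property
  • The probability measure $P$ is Markovian with
    respect to a chain graph $\mathcal{G}$ if for any triple $(A,
    B, S)$ of disjoint sets in $V$, such that $S$
    separates $A$ from $B$ in $(\mathcal{G}_{\mathrm{An}(A \cup B \cup S)})^m$, the
    moral graph of the smallest ancestral set containing
    $A \cup B \cup S$, we have
  • $A \perp\!\!\!\perp B \mid S$.
  • There are two weaker Markov properties, the pairwise
    and local Markov properties.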
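
The separation criterion in the global Markov property can be checked mechanically. The sketch below is one possible Python implementation under stated assumptions: the graph is given as lists of directed (parent, child) and undirected edges, the example edges are hypothetical, and all function names are illustrative. It builds the smallest ancestral set, moralizes the induced subgraph by joining the parents of each chain component, and then tests separation.

```python
from itertools import combinations

def smallest_ancestral_set(targets, directed, undirected):
    """Close `targets` under taking parents and undirected neighbours."""
    closed, stack = set(), list(targets)
    while stack:
        v = stack.pop()
        if v in closed:
            continue
        closed.add(v)
        stack += [p for p, c in directed if c == v]
        stack += [w if u == v else u for u, w in undirected if v in (u, w)]
    return closed

def moral_graph(nodes, directed, undirected):
    """Undirected adjacency of the moralized subgraph induced by `nodes`."""
    directed = [(p, c) for p, c in directed if p in nodes and c in nodes]
    undirected = [(u, w) for u, w in undirected if u in nodes and w in nodes]
    adj = {v: set() for v in nodes}
    for u, w in directed + undirected:       # drop edge directions
        adj[u].add(w)
        adj[w].add(u)
    unvisited = set(nodes)
    while unvisited:                          # loop over chain components
        comp, stack = set(), [unvisited.pop()]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack += [w if u == v else u for u, w in undirected if v in (u, w)]
        unvisited -= comp
        parents = {p for p, c in directed if c in comp}
        for p, q in combinations(parents, 2):  # "marry" parents of the component
            adj[p].add(q)
            adj[q].add(p)
    return adj

def is_separated(A, B, S, directed, undirected):
    """True if S separates A from B in the moral graph of An(A u B u S)."""
    A, B, S = set(A), set(B), set(S)
    anc = smallest_ancestral_set(A | B | S, directed, undirected)
    adj = moral_graph(anc, directed, undirected)
    seen, stack = set(), list(A)
    while stack:                              # search from A, never entering S
        v = stack.pop()
        if v in seen or v in S:
            continue
        seen.add(v)
        stack += list(adj[v] - seen)
    return not (seen & B)

directed = [("a", "c"), ("b", "c"), ("b", "d")]   # hypothetical example graph
undirected = [("c", "d"), ("d", "e")]
```

For this example graph, `is_separated({"a"}, {"b"}, set(), directed, undirected)` holds, while conditioning on `c` brings in the whole block {c, d, e}, marries its parents a and b, and destroys the separation.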

17
Markov Properties of the REDR Model
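  • Proposition 1. A REDR model is $\mathcal{G}^\varepsilon$ Markovian if
    and only if the following six constraints are
    satisfied for a given extended graph $\mathcal{G}^\varepsilon$.
  • Response model
  • $\beta_{fcd}(y, x_\Delta) = 0$ unless $f \cup c \cup d$ is complete, for
    $c \cup d \neq \emptyset$.
  • $\omega_{f\gamma dm}(y, x_\Delta) = 0$ for $m = 1, \dots, M$, unless $f \cup \{\gamma\} \cup
    d$ is complete, where $\gamma \in \Gamma$ and $d \subseteq \Delta$.
  • $\varepsilon_f(y) = -\beta_{f\emptyset\emptyset}(y)$ with probability 1 if $f$ is
    not complete.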

18
Markov Properties of the REDR Model
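  • Proposition 1. A REDR model is $\mathcal{G}^\varepsilon$ Markovian if
    and only if the following six constraints are
    satisfied for a given extended graph $\mathcal{G}^\varepsilon$.
  • Covariate model
  • $\lambda_d(x_d) = 0$ unless $d$ is complete.
  • $\tau_{d\gamma}(x_d) = 0$ unless $\{\gamma\} \cup d$ is complete, where $\gamma
    \in \Gamma$ and $d \subseteq \Delta$.
  • $\psi_{\mu\gamma} = 0$ unless $\{\mu, \gamma\}$ is complete, where $\gamma, \mu \in \Gamma$
    and $\psi_{\mu\gamma}$ is the $(\mu, \gamma)$ element of $\Psi_\emptyset$.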

19
Markov Properties of the REDR Model
  • Sketch of proof.
  • Lauritzen and Wermuth (1989) prove conditions
    concerning the $\lambda$, $\tau$, and $\Psi_\emptyset$ parameters for the CG
    distribution.
  • If the $\beta$ and $\omega$ parameters are 0 for the specified
    sets, then the density factorizes according to
    Frydenberg's (1990) theorem.
  • A modified version of the proof of the
    Hammersley-Clifford theorem shows that if $f(y \mid x,
    \varepsilon)$ separates into complete factors, then the
    corresponding $\beta$ and $\omega$ vectors for non-complete
    sets must be 0.

20
Preservative REDR Models
  • Preservative REDR models are defined by the
    following conditions
  • All connected components $a_q$, $q = 1, \dots, Q$, of $\Phi$ in
    $\mathcal{G}^\varepsilon$ are complete, where $Q$ is the total number of
    connected components.
  • Any $d \in \Gamma \cup \Delta$ that is a parent of $f \in a_q$ is also a
    parent of every other vertex in $a_q$, $q = 1, \dots, Q$.

21
Markov Properties of the REDR Model
  • Proposition 2. If $P$ is a preservative REDR model,
    and $P$ is $\mathcal{G}^\varepsilon$ Markovian, then the marginal
    distribution, $P_{\Phi \cup \Gamma \cup \Delta}$, of the covariates and
    response variables is $\mathcal{G} = (\mathcal{G}^\varepsilon)_{\Phi \cup \Gamma \cup \Delta}$ Markovian.

Sketch of proof. The integrated REDR density
follows Frydenberg's (1990) factorization
criterion. The factor functions, however, do not
exist in closed form.
22
Parameter Estimation
  • A Gibbs sampling approach is used for parameter
    estimation
  • Hierarchical centering
  • Produces Gibbs samplers which converge to the
    posterior distributions faster
  • Most parameters have standard full conditionals
    if given conditionally conjugate prior distributions.
  • Independent priors imply that covariate and
    response models can be analyzed with separate
    MCMC procedures.
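
Hierarchical centering is easiest to see in a toy model rather than in the REDR model itself. The sketch below is an assumption-laden illustration, not the presenters' sampler: it uses a normal random-intercept model with known variances and a flat prior on the overall mean, and it samples the centered site effects $\eta_i = \mu + \alpha_i$ instead of the zero-mean effects $\alpha_i$. The function name `gibbs_centered` and all numeric settings are hypothetical.

```python
import numpy as np

def gibbs_centered(y, sigma2, tau2, n_iter=2000, seed=0):
    """Gibbs sampler for y_ij ~ N(eta_i, sigma2), eta_i ~ N(mu, tau2), flat prior on mu.

    Variances are treated as known to keep the sketch short.
    Returns the draws of mu.
    """
    rng = np.random.default_rng(seed)
    S, n = y.shape
    ybar = y.mean(axis=1)
    prec = n / sigma2 + 1.0 / tau2            # full-conditional precision of each eta_i
    mu = 0.0
    mu_draws = np.empty(n_iter)
    for t in range(n_iter):
        # eta_i | mu, y: precision-weighted mix of the site mean and mu
        mean_eta = (n * ybar / sigma2 + mu / tau2) / prec
        eta = rng.normal(mean_eta, np.sqrt(1.0 / prec))
        # mu | eta: depends on the data only through the centered effects
        mu = rng.normal(eta.mean(), np.sqrt(tau2 / S))
        mu_draws[t] = mu
    return mu_draws

# Simulated data with hypothetical settings.
rng = np.random.default_rng(1)
S, n, mu_true, sigma2, tau2 = 30, 5, 2.0, 1.0, 0.5
eta_true = rng.normal(mu_true, np.sqrt(tau2), size=S)
y = rng.normal(eta_true[:, None], np.sqrt(sigma2), size=(S, n))

draws = gibbs_centered(y, sigma2, tau2)
post_mean = draws[500:].mean()                # discard burn-in draws
```

In this balanced toy model the exact posterior mean of $\mu$ is the grand mean of the data, which gives a simple check on the sampler. Centering tends to help most when the random-effect variance is large relative to the sampling variance of the site means.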

23
Fish Species Richness in the Mid-Atlantic
Highlands
  • 91 stream sites in the Mid-Atlantic region of the
    United States were visited in an EPA EMAP study
  • Response composition
  • Observed fish species were cross-categorized
    according to 2 discrete variables
  • Habit
  • Column species
  • Benthic species
  • Pollution tolerance
  • Intolerant
  • Intermediate
  • Tolerant

24
Stream Covariates
  • Environmental covariates
  • Values were measured at each site for the
    following covariates
  • Mean watershed precipitation (m)
  • Minimum watershed elevation (m)
  • Turbidity (ln NTU)
  • Chloride concentration (ln meq/L)
  • Sulfate concentration (ln meq/L)
  • Watershed area (ln km²)

25
Fish Species Richness Model
  • Composition Graphical Model
  • and
  • Prior distributions

26
Model Selection
  • Three different models are considered
  • Independent response
  • (i.e. $\beta_{f\gamma}(y_i) = \varepsilon_f(y_i) = 0$ for $f = \{H, T\}$)
  • Dependent response w/ independent errors
  • Dependent response w/ correlated errors
  • (equivalent to the Billheimer-Guttorp model)
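
The next slide ranks the candidate models by DIC. A minimal Python sketch of that criterion, using its standard definition (posterior mean deviance plus the effective number of parameters), is given here; the function name `dic` and the deviance values are hypothetical and are not results from this study.

```python
import numpy as np

def dic(deviance_draws, deviance_at_posterior_mean):
    """Return (DIC, p_D) from MCMC deviance draws.

    p_D = mean deviance - deviance evaluated at the posterior mean of the parameters
    DIC = mean deviance + p_D
    """
    d_bar = float(np.mean(deviance_draws))
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d, p_d

# Hypothetical deviance draws for one candidate model.
example_dic, example_pd = dic([102.0, 98.0, 100.0], 96.0)
```

Each candidate model would supply its own deviance draws, and the model with the smallest DIC is preferred.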

27
Fish Species Functional Groups
Posterior suggested chain graph for independence
model (lowest DIC model)
  • Edge exclusion determined from 95% HPD intervals
    for $\beta$ parameters and off-diagonal elements of $\Psi_\emptyset$.
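
One way to carry out this kind of edge screening is sketched below in Python. It assumes a unimodal posterior, computes the shortest interval containing 95% of the MCMC draws, and keeps an edge only when that interval excludes zero. The function names and the simulated draws are illustrative assumptions, not output from the fish data analysis.

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(prob * n))                 # number of draws inside the interval
    widths = x[k - 1:] - x[: n - k + 1]        # widths of all candidate intervals
    j = int(np.argmin(widths))
    return x[j], x[j + k - 1]

def keep_edge(samples, prob=0.95):
    """Keep the edge if the HPD interval for its parameter excludes zero."""
    lo, hi = hpd_interval(samples, prob)
    return not (lo <= 0.0 <= hi)

rng = np.random.default_rng(42)
strong_effect = rng.normal(1.0, 0.2, size=4000)   # hypothetical posterior draws
weak_effect = rng.normal(0.05, 0.2, size=4000)    # hypothetical posterior draws

edge_strong = keep_edge(strong_effect)
edge_weak = keep_edge(weak_effect)
```

With these simulated draws the first parameter's interval lies well away from zero, so its edge is retained, while the second interval covers zero and its edge is dropped.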

28
Comments and Conclusions
  • Using a discrete response (DR) model with random
    effects, the Billheimer-Guttorp model can be
    generalized
  • Relationships evaluated through a graphical model
  • Multi-way compositions can be analyzed with
    specified dependence structure between cells
  • MVN random effects imply that the cell
    probabilities have a constrained LN distribution
  • DR models also extend the capabilities of
    graphical models
  • Data from many sites can be analyzed
  • Overdispersion in cell counts can be accommodated

29
Future Work
  • Model determination under a Bayesian framework
  • Models involve regression coefficients as well as
    many random effects
  • Initial investigation suggests that selection based on
    exclusion/inclusion of parameters, not edges,
    produces models with higher posterior mass
  • Accounting for spatial correlation