Title: Random Effects Graphical Models and the Analysis of Compositional Data
1. Random Effects Graphical Models and the Analysis of Compositional Data
- Devin S. Johnson and Jennifer A. Hoeting
- STARMAP
- Department of Statistics
- Colorado State University
- Developed under the EPA STAR Research Assistance Agreement CR-829095
2. Motivating Problem
- Various stream sites in the Mid-Atlantic region of the United States were visited in summer 1994.
- At each site, every observed fish species was cross-categorized according to several traits.
- Environmental variables were also measured at each site (e.g., precipitation, chloride concentration).
- Relative proportions (the species composition) are more informative than raw counts.
- How can we examine complex relationships between and within the covariates and the response traits?
3. Graphical Models (Chain Graphs)
- Graphical models (e.g., log-linear models, lattice spatial models) have been used to examine conditional dependencies within multivariate random variables.
- Model: $f(y_1, y_2, x_1, x_2) = f(y_1, y_2 \mid x_1, x_2) \cdot f(x_1) \cdot f(x_2)$
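The chain graph factorization above can be checked numerically on a toy example. This is a minimal sketch with hypothetical binary variables and made-up probabilities (none of these numbers come from the study): it builds the joint table from $f(y_1, y_2 \mid x_1, x_2)$, $f(x_1)$ and $f(x_2)$ and confirms that the covariate margin factorizes, which is what the absence of an edge between $x_1$ and $x_2$ means.

```python
from itertools import product

# Hypothetical marginals for two binary covariates (illustrative numbers only).
P_X1 = {0: 0.3, 1: 0.7}
P_X2 = {0: 0.6, 1: 0.4}

# Hypothetical conditional table f(y1, y2 | x1, x2); each row sums to 1.
P_Y_GIVEN_X = {
    (0, 0): {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1},
    (0, 1): {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25},
    (1, 0): {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4},
    (1, 1): {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.7},
}


def chain_graph_joint(p_x1, p_x2, p_y_given_x):
    """Return f(y1, y2, x1, x2) = f(y1, y2 | x1, x2) * f(x1) * f(x2) as a dict."""
    joint = {}
    for x1, x2 in product((0, 1), repeat=2):
        for y, p_y in p_y_given_x[(x1, x2)].items():
            joint[y + (x1, x2)] = p_y * p_x1[x1] * p_x2[x2]
    return joint


def covariate_margin(joint):
    """Sum the joint over (y1, y2) to recover f(x1, x2)."""
    margin = {}
    for (_y1, _y2, x1, x2), p in joint.items():
        margin[(x1, x2)] = margin.get((x1, x2), 0.0) + p
    return margin


joint = chain_graph_joint(P_X1, P_X2, P_Y_GIVEN_X)
margin = covariate_margin(joint)
# With no x1-x2 edge, f(x1, x2) equals f(x1) * f(x2) in every cell.
for (x1, x2), p in margin.items():
    assert abs(p - P_X1[x1] * P_X2[x2]) < 1e-12
```

The responses may depend on both covariates, yet the covariates stay marginally independent because the model multiplies separate factors $f(x_1)$ and $f(x_2)$.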
4. Probability Model for Individuals
- Response variables
  - Set $F$ of discrete categorical variables
  - Notation: $y$ denotes a specific cell
- Explanatory variables
  - Set $G$ of explanatory variables (covariates)
  - Notation: $x$ denotes a specific explanatory observation
- Random effects
  - Allow flexibility when sampling many sites
  - Account for unobserved covariates
  - Notation: $e_f$, $f \in F$, denotes a random effect
5. Probability Model and Extended Chain Graph, $G_e$
- Joint distribution
  - $f(y, x, e) = f(y \mid x, e) \cdot f(x) \cdot f(e)$
- [Figure: graph illustrating possible dependence relationships for the full model, $G_e$]
6. Graphical Models for Discrete Compositions
- Sampling many individuals at a site results in cell counts
  - $C(y)_i$ individuals in cell $y$ at site $i = 1, \ldots, S$
- Conditional count likelihood
  - $\{C(y)_i\}_y \mid x_i, e_i \sim \text{multinomial}\big(N_i, \{f(y \mid x_i, e_i)\}_y\big)$
- Joint covariate and count likelihood
  - $\text{multinomial}\big(N_i, \{f(y \mid x_i, e_i)\}_y\big) \times \text{MVN}(m, \Omega^{-1})$
- Parameter estimation
  - Gibbs sampler with hierarchical centering
    - Easier to implement
    - Improved convergence
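Each site's contribution to the conditional count likelihood can be evaluated straight from the multinomial formula. This sketch assumes the cell probabilities $f(y \mid x_i, e_i)$ have already been computed for every site; how they depend on $x_i$ and $e_i$ is deliberately left abstract, and the function names are hypothetical. Summing over sites gives the conditional count log-likelihood.

```python
import math


def multinomial_logpmf(counts, probs):
    """log P(C = counts) for C ~ multinomial(N, probs), with N = sum(counts)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # log of the multinomial coefficient N! / prod_y C(y)!
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_kernel = 0.0
    for c, p in zip(counts, probs):
        if c > 0:  # cells with zero count contribute 0 * log p = 0
            if p <= 0.0:
                return -math.inf  # positive count in a cell with zero probability
            log_kernel += c * math.log(p)
    return log_coef + log_kernel


def conditional_count_loglik(site_counts, site_probs):
    """Sum over sites i = 1..S of log multinomial(C(y)_i; N_i, f(y | x_i, e_i))."""
    return sum(multinomial_logpmf(c, p) for c, p in zip(site_counts, site_probs))
```

For example, counts $(2, 0, 1)$ with probabilities $(0.5, 0.25, 0.25)$ give $3 \cdot 0.5^2 \cdot 0.25 = 0.1875$ on the probability scale.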
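The slides do not show the sampler itself, so the following only illustrates what hierarchical centering means, on a swapped-in Gaussian one-way random effects model rather than the multinomial model: $y_{ij} \sim N(\eta_i, \sigma^2)$, $\eta_i \sim N(\mu, \tau^2)$, a flat prior on $\mu$, and known variances (all assumptions of this sketch). In the centered parameterization the sampler updates the site-level means $\eta_i = \mu + e_i$ directly instead of the deviations $e_i$, so the full conditional of $\mu$ depends on the data only through the $\eta_i$.

```python
import math
import random
import statistics


def centered_gibbs(groups, sigma2, tau2, n_iter, seed=0):
    """Gibbs sampler for y_ij ~ N(eta_i, sigma2), eta_i ~ N(mu, tau2), flat prior on mu.

    Hierarchical centering: the chain tracks eta_i = mu + e_i rather than e_i.
    sigma2 and tau2 are treated as known to keep the sketch short.
    Returns the sampled values of mu.
    """
    rng = random.Random(seed)
    n_sites = len(groups)
    mu = statistics.fmean(y for obs in groups for y in obs)  # start at the grand mean
    mu_draws = []
    for _ in range(n_iter):
        # eta_i | mu, y ~ N(mean_i, 1 / prec_i): conjugate normal update per site
        etas = []
        for obs in groups:
            prec = len(obs) / sigma2 + 1.0 / tau2
            mean = (sum(obs) / sigma2 + mu / tau2) / prec
            etas.append(rng.gauss(mean, math.sqrt(1.0 / prec)))
        # mu | eta ~ N(mean(eta), tau2 / S) under the flat prior
        mu = rng.gauss(statistics.fmean(etas), math.sqrt(tau2 / n_sites))
        mu_draws.append(mu)
    return mu_draws
```

Centering tends to reduce autocorrelation in the $\mu$ chain when the random-effect variance $\tau^2$ is large relative to the within-site noise $\sigma^2 / n_i$; the non-centered version behaves better in the opposite regime.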
7. Fish Species Richness in the Mid-Atlantic Highlands
- 91 stream sites in the Mid-Atlantic region of the United States were visited in an EPA EMAP study
- Response composition
  - Observed fish species were cross-categorized according to 2 discrete variables
    - Habit
      - Column species
      - Benthic species
    - Pollution tolerance
      - Intolerant
      - Intermediate
      - Tolerant
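Cross-classifying by habit and pollution tolerance gives $2 \times 3 = 6$ cells $y$. A minimal sketch of that cell structure and of a site's composition $C(y)_i / N_i$ follows; the string labels and the function name are hypothetical, and the counts passed in are whatever is tallied per cell at a site.

```python
from itertools import product

# Levels taken from the slide; the string labels themselves are illustrative.
HABIT = ("column", "benthic")
TOLERANCE = ("intolerant", "intermediate", "tolerant")

# Each cell y is one (habit, tolerance) combination: 2 x 3 = 6 cells.
CELLS = tuple(product(HABIT, TOLERANCE))


def site_composition(counts):
    """Relative proportions C(y)_i / N_i over all six cells for one site.

    `counts` maps (habit, tolerance) -> count in that cell;
    cells missing from the mapping are treated as zero.
    """
    unknown = set(counts) - set(CELLS)
    if unknown:
        raise KeyError(f"unrecognized cells: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified observations at this site")
    return {cell: counts.get(cell, 0) / total for cell in CELLS}
```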
8. Fish Species Richness in the Mid-Atlantic Highlands
- Environmental covariates
  - Values of the following covariates were measured at each site:
    - Mean watershed precipitation (m)
    - Minimum watershed elevation (m)
    - Turbidity (ln NTU)
    - Chloride concentration (ln meq/L)
    - Sulfate concentration (ln meq/L)
    - Watershed area (ln km²)
9. Fish Species Richness Model
- Composition graphical model
- Prior distributions
10. Fish Species Functional Groups
Posterior-suggested chain graph for the independence model (lower DIC than the dependent-response model)
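The model comparison rests on DIC. In its standard form (Spiegelhalter and colleagues), $\text{DIC} = \bar{D} + p_D$, where $D(\theta) = -2 \log L(\theta)$, $\bar{D}$ is the posterior mean deviance and $p_D = \bar{D} - D(\bar{\theta})$. The slides do not say which DIC variant was used, so this is a sketch of the standard form only, with a hypothetical function name and inputs.

```python
import statistics


def dic(loglik_draws, loglik_at_posterior_mean):
    """Deviance information criterion (standard form).

    loglik_draws: log-likelihood evaluated at each posterior draw.
    loglik_at_posterior_mean: log-likelihood at the posterior mean of the parameters.
    Returns (DIC, p_D); the model with the lower DIC is preferred.
    """
    deviances = [-2.0 * ll for ll in loglik_draws]
    d_bar = statistics.fmean(deviances)       # posterior mean deviance
    d_hat = -2.0 * loglik_at_posterior_mean   # deviance at the posterior mean
    p_d = d_bar - d_hat                       # effective number of parameters
    return d_bar + p_d, p_d
```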
- Edge exclusion was determined from 95% HPD intervals for the $b$ parameters and for the off-diagonal elements of $\Omega$.
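A 95% HPD interval can be estimated from MCMC output as the shortest interval covering 95% of the sorted draws. The sketch below does that; the edge rule in `keep_edge` (keep an edge only when its interval excludes zero) is a hypothetical reading of "edge exclusion", not a rule stated on the slide, and the empirical estimate assumes a unimodal posterior.

```python
import math


def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the MCMC draws."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one draw")
    k = min(n, max(1, math.ceil(prob * n)))  # number of draws the interval must cover
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def keep_edge(samples, prob=0.95):
    """Hypothetical rule: keep an edge only if its HPD interval excludes zero."""
    lo, hi = hpd_interval(samples, prob)
    return not (lo <= 0.0 <= hi)
```

Applied to draws of each $b$ parameter and each off-diagonal precision element, this yields the set of edges retained in the chain graph.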
11. Comments and Conclusions
- Using the proposed state-space model for discrete compositional data:
  - Relationships are evaluated as a Markov random field
  - Multi-way compositions can be analyzed with a specified dependence structure between cells
  - MVN random effects imply that the cell probabilities have a constrained LN distribution
- DR models also extend the capabilities of graphical models
  - Data from multiple sites can be analyzed
  - Overdispersion in cell counts can be added
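Reading LN as logistic-normal (an assumption; the slide only gives the abbreviation), this sketch shows how a multivariate normal vector maps to cell probabilities through the additive logistic transform, with the last cell as the reference. The mean and covariance used are hypothetical, and the sketch does not impose whatever constraints the graphical model places on that distribution.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L^T = cov (cov must be symmetric positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def draw_logistic_normal(mean, cov, rng):
    """Draw z ~ MVN(mean, cov) and map it to a composition with len(mean) + 1 cells.

    Additive logistic transform: p_k = exp(z_k) / (1 + sum_j exp(z_j)) for the
    first cells and 1 / (1 + sum_j exp(z_j)) for the reference cell.
    """
    L = cholesky(cov)
    u = [rng.gauss(0.0, 1.0) for _ in mean]
    z = [m + sum(L[i][k] * u[k] for k in range(i + 1)) for i, m in enumerate(mean)]
    weights = [math.exp(v) for v in z] + [1.0]
    total = sum(weights)
    return [w / total for w in weights]
```

Every draw is strictly positive and sums to one, so Gaussian structure on the log-ratio scale becomes a distribution on the simplex of cell probabilities.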
12. The work reported here was developed under the
STAR Research Assistance Agreement CR-829095
awarded by the U.S. Environmental Protection
Agency (EPA) to Colorado State University. This
presentation has not been formally reviewed by
EPA. The views expressed here are solely those
of the presenter and STARMAP, the Program he
represents. EPA does not endorse any products or
commercial services mentioned in this
presentation.