Title: Uncertainty in Ecological Risk Assessment using Bayesian Model Averaging
1. Uncertainty in Ecological Risk Assessment using Bayesian Model Averaging
Eric P. Smith, Keying Ye, Ilya A. Lipkovich
Department of Statistics, Virginia Polytechnic Institute and State University
May 14, 2002
2. Outline
- Stressor-response modeling with species taxa data
- Canonical Correspondence Analysis
- Problems with basic regression approaches
- Bayesian Model Averaging
- General approach for using and combining important models
- Example: Great Lakes data
3. The problem
- Stressor-response modeling
- Response: species abundance
- Stressors: chemicals, habitat alteration, etc.
- Approach: regression, CCA
- Summarize the model with a graphical display that describes the relationship between taxa abundances and stressor variables
- Generally unwise to use all the variables
4. Current approach: graphical
- Example: reference site analysis
- Great Lakes sediment samples (1991-93)
- 233 sites
- 39 benthic invertebrate families
- 38 chemical and physical variables
- Common taxa: Chironomidae (midge larvae), Tubificidae (worms), Sphaeriidae (fingernail clams), Naididae (worms), Pontoporeiidae (shrimps), and Spongillidae (sponges)
- Abundant taxa (>20): Tubificidae and Pontoporeiidae, plus Chironomidae and recent invaders, the Dreissenidae (zebra and quagga mussels)
5. Location of sites
6. Canonical Correspondence Analysis (CCA)
CCA (ter Braak, 1986) relates the latent gradients associated with the distribution of species over sites to environmental variables measured at each site.
Y: responses (species)
X: explanatory variables
7. Canonical Correspondence Analysis
- Correspondence analysis (analyze Y only)
- Analysis of a table of counts (species abundance by site)
- Seeks to reduce the data to gradient(s)
- Plot the gradients and the scores on the gradients
- Focus is on chi-square analysis (since we are dealing with counts)
- Canonical correspondence analysis (analyze Y and X)
- Focus is on relationships between counts and descriptor variables
- Correspondence analysis of predicted counts (predicted from multivariate regression)
- Variable selection through testing
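The constrained-ordination computation sketched above (chi-square residual matrix, projection onto the weighted predictor space, then SVD) can be written in a few lines of numpy. This is a minimal illustration of the standard ter Braak algorithm, not the authors' implementation; all function and variable names are my own.

```python
import numpy as np

def cca(Y, X, n_axes=2):
    """Minimal canonical correspondence analysis (ter Braak, 1986).

    Y : (sites x species) matrix of counts.
    X : (sites x predictors) matrix of environmental variables.
    Returns canonical eigenvalues plus site and species scores.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    P = Y / Y.sum()                       # relative frequencies
    r = P.sum(axis=1)                     # site (row) masses
    c = P.sum(axis=0)                     # species (column) masses
    # Chi-square standardized residual matrix of correspondence analysis
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    # Weighted-center the predictors, then weight rows by sqrt(site mass)
    Xc = X - r @ X
    B = np.sqrt(r)[:, None] * Xc
    # Project S onto the column space of the weighted predictors
    H = B @ np.linalg.solve(B.T @ B, B.T)
    u, s, vt = np.linalg.svd(H @ S, full_matrices=False)
    eigvals = s[:n_axes] ** 2             # canonical eigenvalues
    site_scores = u[:, :n_axes] / np.sqrt(r)[:, None]
    species_scores = vt[:n_axes].T / np.sqrt(c)[:, None]
    return eigvals, site_scores, species_scores
```

Because H is an orthogonal projector, the canonical eigenvalues are bounded by the unconstrained correspondence analysis eigenvalues, which are at most 1.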
8. CCA
- What do we want to know?
- Is there a stressor-response relationship?
- Which stressors are involved?
- Which species are affected?
- Which locations are affected?
- How large are the uncertainties in the modeling process?
9. Correspondence Analysis Biplot
Axes are interpreted as gradients. Sites close together are similar. Species near the center are at their expectation.
10. Adding Explanatory Variables to the Plot
Ray length = importance. Projection onto a ray = size on that variable.
11. What's wrong?
- Method is graphical
- No uncertainty
- No test of variable importance
- No measure of species importance
- No measure of site score variability
- No accounting for uncertainty due to model selection
12. Current approach: inferential
- In CCA, use permutation testing to
- Test whether the overall relationship is significant
- Select a subset of variables
- Analogous to stepwise (forward) selection methods
- Species and site importance judged from scores and locations
13. Traditional Approaches to Model Selection (subset selection)
- Model selection criteria, typically a form of penalized likelihood: adjusted R2, AIC, BIC, Mallows' Cp, cross-validation, and the bootstrap
- Incremental model building based on maximum contribution to the likelihood (stepwise, backward deletion, forward addition, leaps and bounds, etc.)
- A combination of the above. Example: use forward selection to choose the best model for each number of variables k, then use Mallows' Cp to pick the best one
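The incremental strategy in the second bullet can be sketched as greedy forward addition. This hedged example scores candidate models with BIC rather than Mallows' Cp; the helper names and the OLS setup are my own.

```python
import numpy as np

def bic(y, X_sub):
    """BIC of an OLS fit with intercept (smaller is better)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X_sub])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

def forward_select(y, X):
    """At each step add the variable that most improves BIC;
    stop when no remaining variable improves it."""
    chosen, remaining = [], list(range(X.shape[1]))
    best = bic(y, X[:, []])               # intercept-only model
    while remaining:
        score, j = min((bic(y, X[:, chosen + [j]]), j) for j in remaining)
        if score >= best:
            break
        best = score
        chosen.append(j)
        remaining.remove(j)
    return chosen, best
```

With a strong signal, the truly active variables enter in the first steps and the BIC penalty blocks most spurious additions.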
14. Criticism of the approach
- Assumes a fixed (true) model
- All inference (predicted values, etc.) is based on the assumption that the estimated model is correct
- Alternative approach: use several models
- Can take a frequentist or Bayesian approach
15. Frequentist and Bayesian Views of Model Selection Uncertainty
- Frequentist view: first, some parameters are estimated as zeroes (a model is selected); when the parameters left in the model are estimated, we must also account for the uncertainty associated with the first stage (via the bootstrap)
- Bayesian view: the model is a random variable for which posterior probabilities can be computed; the uncertainty associated with the model is accounted for by averaging over it
16. Bayesian Model Averaging: details for the math hungry
- Posterior model probabilities:
  P(Mk | D) = P(D | Mk) P(Mk) / Σl P(D | Ml) P(Ml)
- Predictive posterior distribution for a quantity of interest:
  P(Δ | D) = Σk P(Δ | Mk, D) P(Mk | D)
Here D is the data, Mk is a model in some model space M, K is the number of models, and Δ is a quantity of interest (such as a regression coefficient).
17. Elements of BMA
- Assigning prior distributions for models and model parameters
- Searching the model space for data-supported models
- Computing posterior model probabilities
- Obtaining estimates and confidence intervals for the parameters and observables of interest
- Obtaining variance components associated with model selection uncertainty
- Predicting quantities of interest (new sites)
18. Searching the Model Space for Data-Supported Models
Need to fit all the models and evaluate their fit. Why? Because we want to weight each model by its probability given the data, so we need to calculate each model's posterior probability.
19. Problems to solve in the BMA approach
- Searching the model space for data-supported models
- Stochastic model search via Markov chain Monte Carlo
- Implementing BMA: Markov Chain Monte Carlo Model Composition, MC3 (Madigan and Raftery, 1994)
- Computing acceptance probabilities via Bayes factors and the BIC approximation (how do I decide whether or not to move?)
- How do I approximate the value of a model? BIC
- How do I combine information to estimate uncertainty?
- Need to align eigenvectors from different models
20. Skyline Metaphor of MC3: walking through the city to find the tallest building, checking the height of each building
[Figure: buildings labeled M1-M9, each representing a candidate model]
21. Uncertainty estimation in BMA: a Semi-Bayesian Approach
One can use classical procedures to obtain any quantity of interest Δ, then use the model weights to obtain BMA estimates of its unconditional expectation and variance:
  E(Δ | D) = Σk Δk P(Mk | D)
  Var(Δ | D) = Σk [ Vark + (Δk − E(Δ | D))² ] P(Mk | D)
where Δk and Vark are the estimate and its sampling variance under model Mk. The (Δk − E)² term is the model selection uncertainty component; Vark is estimated via a classical procedure or the bootstrap.
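The semi-Bayesian combination above is a one-liner per component. A minimal sketch, with hypothetical per-model estimates; the function name is my own.

```python
import numpy as np

def bma_moments(estimates, variances, weights):
    """Combine per-model estimates of a scalar quantity of interest.

    estimates : theta_k estimated under each model M_k
    variances : within-model (sampling) variances of theta_k
    weights   : posterior model probabilities P(M_k | D), summing to 1
    Returns the BMA mean and the two variance components.
    """
    est, var, w = map(np.asarray, (estimates, variances, weights))
    mean = np.sum(w * est)
    within = np.sum(w * var)                 # sampling variability
    between = np.sum(w * (est - mean) ** 2)  # model selection uncertainty
    return mean, within, between
```

The total BMA variance is the sum of the within and between components.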
22. Bayesian Variable Assessment
The posterior effect probabilities can be computed from the standard BMA output:
  P(xj in model | D) = Σ{k: xj ∈ Mk} P(Mk | D)
i.e., the posterior-weighted proportion of models containing the variable. One can plot the ordered values of these probabilities (a scree plot) to decide which variables should be selected in the final model, if we still want to build a single model.
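The posterior effect (activation) probability is just a weighted sum over the model inclusion indicators. A minimal sketch with an assumed 0/1 inclusion matrix as input; names are my own.

```python
import numpy as np

def activation_probs(inclusion, weights):
    """Posterior effect probabilities from BMA output.

    inclusion : (n_models x n_vars) 0/1 matrix; entry (k, j) is 1 if
                variable x_j appears in model M_k
    weights   : posterior model probabilities P(M_k | D)
    Returns P(x_j in model | D) for each variable j.
    """
    inclusion = np.asarray(inclusion, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ inclusion
```

Sorting the returned vector in decreasing order gives the scree plot described on the slide.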
23. Is prediction better for a single model or a combination of models?
Prediction error is based on the differences between actual and fitted values in the prediction data, DP, when applying parameters estimated from the training data, DT. The relative efficiency of BMA with respect to the best and full models is the ratio of their prediction errors.
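The slide's formulas did not survive extraction, so the definitions below are an assumption: squared-error loss on the prediction set and relative efficiency as a ratio of prediction errors (values below 1 favoring BMA).

```python
import numpy as np

def prediction_error(y_pred, y_actual):
    """Mean squared prediction error on the prediction set D_P
    (assumed squared-error definition)."""
    return np.mean((np.asarray(y_actual) - np.asarray(y_pred)) ** 2)

def relative_efficiency(pe_bma, pe_other):
    """Efficiency of BMA relative to another model, as a ratio of
    prediction errors; values below 1 favor BMA."""
    return pe_bma / pe_other
```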
24. Summary of BMA Predictive Performance in a Simulation Study
25. Interpreting BMA Output: A Simulated Example
- Y = XAB + E, where AB is a rank-2 matrix of coefficients for the five variables x1-x5
- E contains iid normal errors with unit variance
- The coefficient matrix was augmented with 5 more rows of zeroes, representing additional irrelevant variables x6-x10
26. Interpreting BMA Output: A Simulated Example (cont.)
27. Interpreting BMA Output: A Simulated Example (cont.)
28. Representing the Variance Components
- The outer ellipses, shown with dashed lines, represent total variability
- The inner ellipses represent only sampling variability
- The variable scores and sampling variances in the right-hand display are rescaled by multiplying them by the activation probabilities
29. Great Lakes Data: biplot display of species and variables
30. Great Lakes Data
31. Uncertainty Intervals for Species Scores on the First Axis
Larger uncertainties tend to correspond to rarer species; a score of 0 means a taxon is at its expectation (evenly distributed).
32. Sites, species and uncertainties
33. Biplot display showing model and sampling uncertainty
[Figure: ellipses distinguishing sampling uncertainty from model uncertainty]
34. Biplot of Species and Environmental Variables after Scaling Variable Scores by Activation Probabilities
- Illustrates the important species
- Shows the depth-temperature gradient
- Shows the importance of agricultural input
35. Another application of the methodology: BMA-based Outlier and Cluster Analysis
The model space can be extended to incorporate both variables and observations (Hoeting, Raftery, and Madigan, 1996). Each model is an indicator vector, e.g.
  variables: (1, 1, 0, 1, 0, 1, 0); observations: (1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, ..., 1, 1, 1, 1, 1)
- We can use this combined model space to add/delete both variables and observations
- The goal is to obtain clusters of observations with individual sets of predictors
- Outliers can be isolated in a separate cluster
36. BMA-based Outlier and Cluster Analysis
- Example of use
- Data collected in a watershed
- The effect of a stressor is confined to a spatial subset of the watershed
- Fitting a regression model to the entire data set may reduce or mask the relationship
- Find the subset of the data that supports the stressor-response relationship
- Great Lakes: look for sets of sampling sites that are geographically close and have biology related to certain environmental variables
37. Summary
- BMA is a method for developing good inference without discarding variables
- Components: model uncertainty, importance of species and variables
- Further developments
- Develop BMA cluster analysis for CCA
- Investigate its properties
- Implement missing value imputation in CCA, possibly adding missing value modeling to the cluster process
38. References
Bensmail, H., Celeux, G., Raftery, A.E., and Robert, C. (1997). Inference in model-based cluster analysis. Statistics and Computing, 7, 1-10.
Hoeting, J.A., Raftery, A.E., and Madigan, D. (1996). A method for simultaneous variable selection and outlier identification in linear regression. Computational Statistics and Data Analysis, 22, 251-270.
Gabriel, K.R. (1971). The biplot graphic display of matrices with application to principal component analysis. Biometrika, 58, 453-467.
George, E.I. and McCulloch, R.E. (1993). Variable selection via Gibbs sampling. JASA, 88, 881-889.
Eckart, C. and Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1, 211-218.
Madigan, D. and Raftery, A.E. (1994). Model selection and accounting for model uncertainty in graphical models using Occam's window. JASA, 89(428), 1535-1546.
Madigan, D. and York, J. (1995). Bayesian graphical models for discrete data. International Statistical Review, 63, 215-232.
Raftery, A.E., Madigan, D., and Hoeting, J.A. (1997). Bayesian model averaging for linear regression models. JASA, 92(437), 179-191.
Ter Braak, C.J.F. (1994). Biplots in reduced-rank regression. Biometrical Journal, 36(8), 983-1003.
39. Problem: Estimating Variance Components of Eigenvectors
- We estimate the sampling uncertainty of the species and site coordinates via the bootstrap, and model selection uncertainty via MC3
- In either procedure, we may obtain eigenvectors whose signs are flipped or whose positions are interchanged
- A general solution is the Procrustes transformation
- The Procrustes transformation is an orthogonal transformation that makes the new coordinates as close as possible to some original coordinates
- For the bootstrap, these original coordinates are the eigenvectors obtained from the original data
- For BMA, the original coordinates are those obtained with the full model
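The alignment step above is the classical orthogonal Procrustes problem, solved by one SVD. A minimal sketch; the function name is my own.

```python
import numpy as np

def procrustes_rotate(coords, reference):
    """Orthogonal Procrustes: rotate/reflect `coords` so it matches
    `reference` as closely as possible in least squares. This undoes
    sign flips and axis swaps in bootstrap or MC3 replicates of the
    eigenvector scores."""
    u, _, vt = np.linalg.svd(coords.T @ reference)
    R = u @ vt                  # optimal orthogonal transformation
    return coords @ R
```

Applying this to each replicate before pooling makes the variance components of the scores meaningful.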
40. Searching the Model Space for Data-Supported Models
Problem: a large number of models in the regression setting. In the subset selection problem, when the number of explanatory variables exceeds 25-30, evaluating the model-averaging sum directly is infeasible: the number of models is K > 2^30.
Solution: stochastic model search via Markov chain Monte Carlo.
41. Implementing BMA: Markov Chain Monte Carlo Model Composition, MC3 (Madigan and Raftery, 1994)
Idea: define a rule for wandering through the possible models. Walk in a way that finds a best model by starting at a random model and comparing it with neighboring models. Stop at the best model, but record the other model fits. Occasionally move to a worse model to avoid getting stuck.
More precisely: a Markov chain is constructed by defining a neighborhood nbd(M) for each model. In linear regression the neighborhood can be the set of models with either one variable more or one variable less than M, plus M itself. Moves are accepted with a probability that depends on the fit of the proposed model.
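The walk described above can be sketched for linear regression in a few dozen lines. This is a hedged illustration, not the authors' code: it uses the add/remove-one-variable neighborhood from the slide, scores models with the BIC approximation discussed on the next slides, and all names are my own.

```python
import numpy as np

def bic_linear(y, X, subset):
    """BIC of the OLS model on the given variable subset (with intercept)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X[:, sorted(subset)]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    return n * np.log(rss / n) + Z.shape[1] * np.log(n)

def mc3(y, X, n_steps=2000, seed=0):
    """MC3 over variable subsets; the chain's visit frequencies estimate
    posterior model probabilities under the BIC approximation."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    current = frozenset()
    cur_bic = bic_linear(y, X, current)
    visits = {}
    for _ in range(n_steps):
        j = rng.integers(p)                 # propose toggling one variable
        proposal = current ^ {j}
        prop_bic = bic_linear(y, X, proposal)
        # BIC approximates -2 log P(D|M), so the posterior ratio is
        # exp((cur_bic - prop_bic) / 2); accept with min(1, ratio)
        if rng.random() < np.exp(min(0.0, (cur_bic - prop_bic) / 2)):
            current, cur_bic = proposal, prop_bic
        visits[current] = visits.get(current, 0) + 1
    total = sum(visits.values())
    return {m: v / total for m, v in visits.items()}
```

Because the toggle proposal is symmetric, this is a plain Metropolis chain; worse models are occasionally accepted, which is exactly the "avoid getting stuck" behavior on the slide.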
42. Computing Acceptance Probabilities via Bayes Factors and the BIC Approximation
Bayes factor for a proposed model M' against the current model M:
  BF = P(D | M') / P(D | M)
Acceptance probability: min{ 1, P(M' | D) / P(M | D) }
The posterior odds ratio P(M' | D) / P(M | D) = (Bayes factor) x (prior odds ratio); it is > 1 if M' is the better-supported model.
43. Computing Bayes Factors via BIC
The marginal likelihood P(D | M) must be computed, but this is not easy, so we approximate it. The Bayesian Information Criterion (BIC; Schwarz, 1978; Kass and Wasserman, 1995) approximation is the simplest:
  2 log BF10 ≈ χ² − (d1 − d0) log n
where χ² is the likelihood ratio test statistic, d1 − d0 is the difference in the number of parameters, n is the sample size, and M0 is nested within M1. One advantage of the BIC approximation is that it eliminates the need to specify prior distributions for the model parameters.
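The BIC approximation above is a single arithmetic step; a minimal sketch (function name is my own):

```python
import numpy as np

def approx_log_bayes_factor(chi2_stat, df_diff, n):
    """BIC approximation to the log Bayes factor of M1 over a nested M0:
    2 log BF10 ~ chi2 - (d1 - d0) log n, so log BF10 is half of that.

    chi2_stat : likelihood ratio test statistic comparing M1 to M0
    df_diff   : difference in the number of parameters, d1 - d0
    n         : sample size
    """
    return 0.5 * (chi2_stat - df_diff * np.log(n))
```

A positive value favors the larger model M1; the log-n penalty means a small chi-square improvement cannot justify an extra parameter.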