Title: Characterizing dose-response model uncertainty using model averaging
1Characterizing dose-response model uncertainty
using model averaging
- Matthew W. Wheeler
- NIOSH
- MWheeler_at_cdc.gov
The findings and conclusions in this report are
those of the authors and do not necessarily
represent the views of the National Institute for
Occupational Safety and Health
2Acknowledgements
- A. John Bailer
- Miami University/NIOSH
3Outline
- Introduction/Motivation.
- Model Averaging 101.
- Validation Simulation Study.
- MA Software for dichotomous response.
- Low Dose Extrapolations.
- Conclusions/Future research.
4Introduction/Motivation
- Model choice is often a point of contention among risk assessors and risk managers.
- Frequently, multiple models describe the data equivalently.
- Two models' estimates of risk, especially at the lower bound, can differ dramatically.
- Model uncertainty is inherent in most risk estimation, though in practice it is ignored in most situations.
5Introduction/Motivation (cont)
- Consider the problem of estimating a benchmark dose (BMD) from dichotomous dose-response data.
- Here we seek to estimate the BMD from a plausible model, given experimental data.
- In these experiments:
  - Animals are exposed to some potential hazard.
  - The adverse response is assumed to be binomially distributed.
  - Risk (i.e., probability of adverse response) is estimated using regression modeling.
- Multiple dose-response models can be used to estimate risk.
6Common Dose-Response Models Used
- logistic model (1)
- log-logistic model (2)
- gamma (3)
- multistage (4)
- probit (5)
7Common Dose-Response Models Used
- log-probit (6)
- quantal-linear (7)
- quantal-quadratic (8)
- Weibull (9)
- where $\Gamma(\alpha)$ is the gamma function evaluated at $\alpha$, $\Phi(x)$ is the CDF of N(0,1), and $\pi_i = \gamma$ when $d_i = 0$ for models (2) and (7).
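The formulas for models (1)-(9) did not survive in this transcript. As a reference point only, the block below lists the parameterizations commonly used for these models in dichotomous benchmark dose software. They are an assumption, not recovered from the slides, and the parameter names ($\alpha$, $\beta$, $\gamma$, $\theta_k$) are chosen here for illustration.

```latex
\begin{align*}
\text{(1) logistic:}          \quad & \pi(d) = \frac{1}{1 + \exp[-(\alpha + \beta d)]} \\
\text{(2) log-logistic:}      \quad & \pi(d) = \gamma + \frac{1 - \gamma}{1 + \exp[-(\alpha + \beta \ln d)]} \\
\text{(3) gamma:}             \quad & \pi(d) = \gamma + (1 - \gamma)\,\frac{1}{\Gamma(\alpha)} \int_0^{\beta d} t^{\alpha - 1} e^{-t}\, dt \\
\text{(4) multistage:}        \quad & \pi(d) = \gamma + (1 - \gamma)\Bigl[1 - \exp\Bigl(-\sum_{k=1}^{K} \theta_k d^k\Bigr)\Bigr] \\
\text{(5) probit:}            \quad & \pi(d) = \Phi(\alpha + \beta d) \\
\text{(6) log-probit:}        \quad & \pi(d) = \gamma + (1 - \gamma)\,\Phi(\alpha + \beta \ln d) \\
\text{(7) quantal-linear:}    \quad & \pi(d) = \gamma + (1 - \gamma)\bigl[1 - \exp(-\beta d)\bigr] \\
\text{(8) quantal-quadratic:} \quad & \pi(d) = \gamma + (1 - \gamma)\bigl[1 - \exp(-\beta d^2)\bigr] \\
\text{(9) Weibull:}           \quad & \pi(d) = \gamma + (1 - \gamma)\bigl[1 - \exp(-\beta d^{\alpha})\bigr]
\end{align*}
```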
8Benchmark dose estimation
- The BMD is the dose associated with a specified increase in response relative to the control response (the benchmark response, BMR), e.g.,
  - the dose $d$ such that $\mathrm{BMR} = (\pi_d - \pi_0)/(1 - \pi_0)$ (extra risk), or
  - $\mathrm{BMR} = \pi_d - \pi_0$ (added risk).
- The BMR is commonly set at 1%, 5%, or 10%.
- BMDL = $100(1-\alpha)\%$ lower confidence limit on the BMD.
- NOTE: because $\pi_d$ depends on the model, the BMD is model dependent!
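To make the model dependence of the BMD concrete, here is a minimal Python sketch; it is not the author's software. It assumes the extra-risk definition of the BMR, (p(d) - p(0)) / (1 - p(0)), the commonly used quantal-linear and quantal-quadratic forms (those formulas do not appear in this transcript), and made-up parameter values. The names `extra_risk`, `bmd`, `quantal_linear`, and `quantal_quadratic` are hypothetical helpers.

```python
import math


def extra_risk(p, d):
    """Extra risk at dose d: (p(d) - p(0)) / (1 - p(0)). Assumes p(0) < 1."""
    p0 = p(0.0)
    return (p(d) - p0) / (1.0 - p0)


def bmd(p, bmr, upper=1.0, tol=1e-12):
    """Dose whose extra risk equals bmr, found by bisection.

    Assumes p is non-decreasing in dose. Returns math.inf if the
    extra risk has not reached bmr by a dose of about 1e6.
    """
    lo, hi = 0.0, upper
    while extra_risk(p, hi) < bmr:  # widen the bracket until it holds the root
        hi *= 2.0
        if hi > 1e6:
            return math.inf
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if extra_risk(p, mid) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantal_linear(gamma, beta):
    """Assumed form: gamma + (1 - gamma) * (1 - exp(-beta * d))."""
    return lambda d: gamma + (1.0 - gamma) * (1.0 - math.exp(-beta * d))


def quantal_quadratic(gamma, beta):
    """Assumed form: gamma + (1 - gamma) * (1 - exp(-beta * d**2))."""
    return lambda d: gamma + (1.0 - gamma) * (1.0 - math.exp(-beta * d * d))


# Illustrative (made-up) parameters, not fitted to any data set.
for name, model in [("quantal-linear", quantal_linear(0.05, 1.0)),
                    ("quantal-quadratic", quantal_quadratic(0.05, 1.0))]:
    print(name, [round(bmd(model, r), 4) for r in (0.10, 0.01)])
```

With identical parameter values the two models return different BMDs at the same BMR, and the gap widens as the BMR shrinks.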
9Typical Risk Estimation Process
- Given data (in the absence of mechanistic information), a typical analyst will:
  - Estimate the regression coefficients for models (1)-(9).
  - Estimate the BMD/BMDL given the model.
  - Pick the best model.
10Example of this
- Consider TiO2 lung tumor data combined from the studies of Heinrich et al. (1995), Muhle et al. (1991), and Lee et al. (1985).
- Here the benchmark dose (BMD) and its lower bound (BMDL) are estimated at BMRs of 10% and 1%.
11- All fits were obtained using the US EPA's BMDS.
- Calculated using the number of parameters vs. the number of non-bounded parameters.
13Model Choice
- If we pick the best AIC, the quantal-quadratic model BMD estimates would be chosen.
- If we pick the best Pearson $\chi^2$ test statistic, the 3-degree multistage model would be chosen.
- All of the models are reasonable based on these statistics.
14Model Choice (cont)
- Estimates are reasonably similar at the 10% BMR.
- Estimates vary by a factor of 5 at the 1% BMR.
- If a BMR of 0.1% is used (results not shown), the TiO2 BMD/BMDL estimates differ by a factor of 35.
- This heterogeneity in model estimates exists even though the model fit statistics are very similar.
- Model uncertainty results when any one of the above models is chosen.
15Model Averaging
- A better approach would be to combine all estimates in an adequate way, and thus describe and account for model uncertainty.
- Model averaging (MA) is one such method that may satisfactorily account for model uncertainty.
- Instead of focusing on a single model, it allows researchers to focus on plausible behavior.
16Model Averaging (cont.)
- We can think of any model as contributing information (including possible bias) to an analysis.
- Picking any one model ignores other plausible information and possibly introduces bias into the analysis.
- Model averaging is a method that attempts to synthesize all of the information available.
17Model Averaging (cont)
- Kang et al. (Regulatory Toxicology and Pharmacology, 2000) and Bailer et al. (Risk Analysis, 2005) proposed model averaging for risk assessment.
- They used an average-BMD methodology (i.e., the calculated statistics were averaged, not the corresponding dose-response curves).
- Average-BMD MA is not described here, but its performance is often poor (Wheeler & Bailer, Environmental and Ecological Statistics, 2009).
18Average-Model MA
- Instead of averaging statistics, we could average models.
- Given the fits of models (1)-(9), an MA procedure:
  - Calculates the dose-response based upon a weighted average of dose-responses (Raftery et al., 1997; Buckland et al., 1997), with the MA dose-response curve estimated as $\pi_{MA}(d) = \sum_i w_i \, \pi_i(d)$.
  - Weights are formed as $w_i = \exp(-I_i/2) / \sum_j \exp(-I_j/2)$,
  - where $I_i = \mathrm{AIC}_i$, $\mathrm{KIC}_i$, or $\mathrm{BIC}_i$. Other weights are possible.
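The weighting step can be sketched in Python as follows. This is an illustration, not the author's code: it assumes weights proportional to exp(-I_i / 2), the form given by Buckland et al. (1997), where I_i is an information criterion such as AIC or BIC. The helper names `ma_weights` and `ma_curve`, the criterion values, and the three toy curves are all hypothetical.

```python
import math


def ma_weights(criteria):
    """Weights proportional to exp(-I_i / 2), normalized to sum to 1.

    The smallest criterion is subtracted first; this leaves the
    weights unchanged but avoids underflow for large values.
    """
    best = min(criteria)
    raw = [math.exp(-(c - best) / 2.0) for c in criteria]
    total = sum(raw)
    return [r / total for r in raw]


def ma_curve(models, weights):
    """Return the averaged curve d -> sum_i w_i * p_i(d)."""
    def pi_ma(d):
        return sum(w * m(d) for w, m in zip(weights, models))
    return pi_ma


# Hypothetical AIC values and toy fitted curves, for illustration only.
aic = [100.0, 101.0, 104.0]
models = [
    lambda d: 1.0 - math.exp(-0.5 * d),
    lambda d: 1.0 - math.exp(-0.7 * d * d),
    lambda d: 1.0 / (1.0 + math.exp(2.0 - 3.0 * d)),
]
weights = ma_weights(aic)
pi_ma = ma_curve(models, weights)
print([round(w, 3) for w in weights], round(pi_ma(0.5), 4))
```

Because only differences between criterion values enter the weights, a model whose criterion is a few units worse than the best still contributes to the average.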
20Average-Model MA Benchmark Dose
- Given this average model, the benchmark dose is then computed by finding the dose $d$ that satisfies the equation
  - $\mathrm{BMR} = [\pi_{MA}(d) - \pi_{MA}(0)] / [1 - \pi_{MA}(0)]$.
- The BMDL is computed through a parametric bootstrap. Here the 5th percentile of the bootstrap distribution is used as the 95% lower-tailed confidence limit on the BMD.
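A rough Python sketch of these two steps follows. It is a sketch under stated assumptions, not the procedure as implemented in the author's software: the root is found by simple bisection, and the refitting of the averaged model to each bootstrap data set is delegated to a user-supplied callable `fit_ma` (hypothetical), which must return a new averaged curve. The names `ma_bmd` and `bootstrap_bmdl` are also hypothetical.

```python
import math
import random


def ma_bmd(pi_ma, bmr, upper=1.0, tol=1e-12):
    """Solve bmr = (pi_ma(d) - pi_ma(0)) / (1 - pi_ma(0)) for d by bisection.

    Assumes pi_ma is non-decreasing and pi_ma(0) < 1. Returns math.inf
    if the extra risk has not reached bmr by a dose of about 1e6.
    """
    p0 = pi_ma(0.0)

    def risk(d):
        return (pi_ma(d) - p0) / (1.0 - p0)

    lo, hi = 0.0, upper
    while risk(hi) < bmr:
        hi *= 2.0
        if hi > 1e6:
            return math.inf
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if risk(mid) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bootstrap_bmdl(pi_ma, fit_ma, doses, n_per_group, bmr,
                   n_boot=1000, alpha=0.05, seed=0):
    """Percentile BMDL from a parametric bootstrap of the averaged model.

    fit_ma(doses, n_per_group, counts) is supplied by the caller and
    must return a refitted averaged curve (hypothetical interface).
    """
    rng = random.Random(seed)
    bmds = []
    for _ in range(n_boot):
        # Parametric resample: binomial counts drawn from the fitted curve.
        counts = [sum(rng.random() < pi_ma(d) for _ in range(n))
                  for d, n in zip(doses, n_per_group)]
        refit = fit_ma(doses, n_per_group, counts)
        bmds.append(ma_bmd(refit, bmr))
    bmds.sort()
    # Order statistic at (approximately) the alpha quantile.
    return bmds[min(len(bmds) - 1, int(alpha * len(bmds)))]
```

Refitting every model in the average on each resample is what makes the bootstrap expensive; the sketch leaves that cost inside `fit_ma`.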
21Average-Model vs. Average-Dose
- This is substantially different from the average-dose method.
- Average-Dose
  - Model fits → individual BMD estimation → BMD MA estimate
- Average-Model
  - Model fits → MA model estimate → MA-BMD estimation
22TiO2 Analysis Revisited
23Validation Study
- MA seems like a good idea; however, we need to know whether it works well in practice.
- A simulation study was conducted to investigate the behavior of MA.
24Validation Study
- 54 true model conditions, using models (1)-(9), were used in the simulation.
- The full study is described in Wheeler and Bailer (Risk Analysis, 2007).
25Validation Study (Cont)
- The simulation proceeded by generating hypothetical toxicology experiments with response probability $\pi(d)$,
  - with $\pi(d)$ specified by a parameterization of one of the models (1)-(9).
- These experiments used a 4-dose-group design with doses of 0, 0.25, 0.50, and 1.0, as well as a 6-dose-group design (not reported).
  - n = 50 for all dose groups.
- 2000 experiments were generated per true dose-response curve.
- Bias as well as coverage, i.e., $\Pr(\mathrm{BMDL} \le \mathrm{BMD}_{true})$, was estimated.
  - Coverage is reported here.
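The structure of such a coverage study can be sketched in Python as below. This is only an outline of the idea, not the simulation code used in the study: the BMDL estimator is passed in as a callable `bmdl_estimator(doses, n, counts)` (a hypothetical interface), and `simulate_experiment` and `estimate_coverage` are hypothetical names. The default design mirrors the 4-dose-group setup with n = 50 and 2000 replicates.

```python
import random


def simulate_experiment(pi_true, doses, n, rng):
    """One experiment: a binomial(n, pi_true(d)) count for each dose group."""
    return [sum(rng.random() < pi_true(d) for _ in range(n)) for d in doses]


def estimate_coverage(pi_true, bmd_true, bmdl_estimator,
                      doses=(0.0, 0.25, 0.50, 1.0), n=50,
                      n_sim=2000, seed=1):
    """Fraction of simulated experiments with BMDL <= the true BMD."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sim):
        counts = simulate_experiment(pi_true, doses, n, rng)
        if bmdl_estimator(doses, n, counts) <= bmd_true:
            covered += 1
    return covered / n_sim
```

For a 95% lower bound, an estimate near 0.95 indicates nominal coverage; values well above it indicate conservative bounds, and values below it indicate bounds that are too high too often.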
26Validation Study (Cont)
- In each experiment, the average-model BMD as well as the BMDL was estimated.
- BMRs of 1% and 10% were used to estimate the BMD.
- Two model spaces for averaging were considered.
  - One space consisted of three flexible models: the multistage, Weibull, and log-probit models.
  - The second space had seven models, adding the probit, logistic, quantal-linear, and quantal-quadratic models to the three-model space.
- The simulation took approximately 1 CPU year of computation.
27Coverage BMR 10%
28Coverage BMR 1%
29Coverage (Summary)
- Nominal coverage is reached for most simulation conditions.
- MA fails to reach nominal coverage in the quantal-linear and similar cases.
30Quantal-Linear Problems
- It is important to understand why the BMD is mischaracterized in the quantal-linear case.
- We study this by investigating the sampling distribution.
- Here we can see that the skewed sampling distribution at low doses might be the culprit.
31Sampling distribution for the quantal-linear
model
32Average fit for 3-model MA models
33Quantal Linear Bias
- The flexibility of the models, combined with the sampling distribution, introduces bias into the estimation of the dose-response curve.
- The bias carries through to BMD estimation.
- This may also be the cause of the conservative behavior (i.e., coverage > 99%) seen in the quantal-quadratic case.
34Final notes on the Quantal Linear model
- Improved coverage can be obtained using BCa bootstrap intervals.
- Other results suggest that MA is superior to picking the best model.
- The results show that MA is not a panacea; it is, however, a step in the right direction.
36Model Averaging Software
- Implementation of average-model MA is difficult.
- The simulation code has been repackaged to allow users to implement dichotomous dose-response model averaging.
- This is done in a simple MS Windows command prompt program.
37- The software should be available online shortly at the Journal of Statistical Software's web site (http://www.jstatsoft.org/).
- Implementation is described in Wheeler and Bailer (Journal of Statistical Software, in press, 2008).
38MA Low Dose Extrapolations
- Because model averaging accounts for model uncertainty, many are curious about MA and the use of low-dose extrapolations.
- Specifically, people want to know whether it is still best practice to calculate the BMDL at the point of departure using MA (usually specified at a BMR of 10%) and then do a linear extrapolation.
39Low Dose Extrapolations
- The answer is yes and no.
- This is because at low doses the MA procedure is performing its own linear extrapolation.
- Thus if you don't, it will.
- As an example, consider the TiO2 data above.
- We look at the extra risk curve using:
  - A linear low-dose extrapolation, using the linearized multistage model, with an excess risk of 10% being the point of departure.
  - The MA lower bound estimate.
  - The MA estimated extra risk curve.
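For readers unfamiliar with the linear low-dose extrapolation mentioned here, a minimal Python sketch is given below. It assumes the usual construction, a straight line from the origin to the point of departure (the BMDL at the chosen BMR); the function name `linear_extrapolated_risk` and the BMDL value of 2.0 are hypothetical and are not taken from the TiO2 analysis.

```python
def linear_extrapolated_risk(dose, pod_dose, pod_risk=0.10):
    """Extra risk on the straight line from (0, 0) to (pod_dose, pod_risk)."""
    if pod_dose <= 0.0:
        raise ValueError("point-of-departure dose must be positive")
    return pod_risk * dose / pod_dose


# Hypothetical BMDL of 2.0 (in dose units) at a 10% BMR.
for dose in (0.01, 0.1, 1.0):
    print(dose, linear_extrapolated_risk(dose, pod_dose=2.0))
```

Under this construction the risk per unit dose is the constant `pod_risk / pod_dose`, so the choice of point of departure fixes the whole low-dose line.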
41Low Dose Extrapolations
- The MA lower bound estimate is approximately linear at doses below 0.1.
- The estimated risk, given a dose, is not substantially different (i.e., by an order of magnitude) from the linearized multistage model estimate of risk.
- Other examples (with smaller sample sizes) show even less departure from the linearized multistage low-dose linear extrapolation.
42Linear Extrapolations
- Consider a fixed response of 20%, 20%, 20%, 50%, and 90% at doses of 0, 0.5, 1, 2, and 4.
- Further consider experiments having n = 10, 20, 50, and 100, given the above response.
- Thus we have the same response; all we do is increase the sample size. We ask: how does MA respond?
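The four data sets in this thought experiment can be written out explicitly. The short Python sketch below assumes the response values are percentages responding in each dose group (an interpretation, since the units are not shown in the transcript); `fixed_response_counts` is a hypothetical helper.

```python
def fixed_response_counts(percent_responding, n):
    """Number of responders per dose group for group size n."""
    counts = []
    for pct in percent_responding:
        if (pct * n) % 100 != 0:
            raise ValueError("n does not give a whole number of responders")
        counts.append(pct * n // 100)
    return counts


doses = [0, 0.5, 1, 2, 4]
percent = [20, 20, 20, 50, 90]  # assumed to be percent responding
designs = {n: fixed_response_counts(percent, n) for n in (10, 20, 50, 100)}
for n, counts in designs.items():
    print(n, list(zip(doses, counts)))
```

Each design has identical observed proportions, so any change in the averaged curve or its lower bound across them reflects sample size alone.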
43Estimated MA Dose Response
Estimated MA 95% Lower Bound
44Low Dose Extrapolations
- At low doses, model averaging is essentially performing a linear extrapolation.
- The only difference is that it is essentially picking the point of departure, which is often very close to the standard 10%.
- It is going to be different from the standard approach.
- The difference is not the order of magnitude that is often suggested by non-linear dose-response curves.
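One way to see why an averaged curve can look linear at low dose is sketched below in Python. This is an illustration under stated assumptions, not the author's argument: it averages only two models, a quantal-linear and a quantal-quadratic curve sharing the same background rate, with made-up weights and slopes. Under those assumptions the extra risk per unit dose tends to `w_linear * beta_linear` as the dose shrinks, so any linear model with nonzero weight dominates the low-dose region. The name `ma_extra_risk` is hypothetical.

```python
import math


def ma_extra_risk(d, w_linear, beta_linear, beta_quad):
    """Extra risk of a two-model average with a shared background rate.

    Components (assumed forms): 1 - exp(-beta_linear * d) and
    1 - exp(-beta_quad * d**2). expm1 keeps precision at tiny doses.
    """
    w_quad = 1.0 - w_linear
    return (w_linear * -math.expm1(-beta_linear * d)
            + w_quad * -math.expm1(-beta_quad * d * d))


# Made-up weight and slopes, for illustration only.
w, b1, b2 = 0.3, 2.0, 5.0
for d in (1.0, 0.1, 0.01, 0.001):
    print(d, ma_extra_risk(d, w, b1, b2) / d)
print("limiting slope:", w * b1)
```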
45Conclusions
- The simulation results, the software, and the linearization study together imply that dichotomous model averaging can be used reliably in one's own research.
- Though we have tested the software and fixed many bugs, it is still a use-at-your-own-risk program.
46- As mentioned before, model averaging is not a panacea.
- As such, it does not:
  - Relieve scientists from using their expert judgment.
  - Remove the need for adequate individual model fit diagnostics.
  - Remove all model uncertainty from the analysis.
47- It does:
  - Reframe the debate of model choice.
  - Produce relatively stable central estimates, often independent of a given model being included in the average.
48Selected References
- Raftery, A.E. (1995). Bayesian model selection in social research. Sociological Methodology, 25, 111-163.
- Hoeting, J.A., Madigan, D., Raftery, A.E., Volinsky, C.T. (1999). Bayesian model averaging: a tutorial. Statistical Science, 14, 382-417.
- Buckland, S.T., Burnham, K.P., Augustin, N.H. (1997). Model Selection: An Integral Part of Inference. Biometrics, 53, 603-618.
- Kang, S.H., Kodell, R.L., Chen, J.J. (2000). Incorporating Model Uncertainties along with Data Uncertainties in Microbial Risk Assessment. Regulatory Toxicology and Pharmacology, 32, 68-72.
- Bailer, A.J., Noble, R.B., Wheeler, M. (2005). Model uncertainty and risk estimation for quantal responses. Risk Analysis, 25, 291-299.
- Wheeler, M.W., Bailer, A.J. (2007). Properties of model-averaged BMDLs: A study of model averaging in dichotomous risk estimation. Risk Analysis, 27, 659-670.
- Wheeler, M.W., Bailer, A.J. (2008). Model Averaging Software for Dichotomous Dose Response Risk Estimation. Journal of Statistical Software (accepted).