Title: Variable Selection for Gaussian Process Models in
Computer Experiments
Crystal Linkletter and Derek Bingham
Department of Statistics and Actuarial Science,
Simon Fraser University
David Higdon and Nick Hengartner
Statistical Sciences, Discrete Event Simulations,
Los Alamos National Laboratory
Kenny Q. Ye
Department of Epidemiology and Population Health,
Albert Einstein College of Medicine
Introduction
Computer simulators often require a large number of
inputs and are computationally demanding. A main goal
of computer experimentation may be screening:
identifying which inputs have a significant impact on
the process being studied. Gaussian spatial process
(GASP) models are commonly used to model computer
simulators. These models are flexible, but they make
variable selection challenging. We present reference
distribution variable selection (RDVS) as a new
approach to screening for GASP models.
Results
Simulated Example
We used a 54-run space-filling Latin hypercube design
with p = 10 factors. The response is generated by
[equation not shown]. A GASP model is used to analyse
the generated response, and the RDVS algorithm
identifies the first four factors as active.
Figure: Posterior distributions for the correlation
parameters of the 10 factors. The horizontal line marks
the 10th percentile of the reference distribution.
Correlation parameters with posterior medians below
this line indicate active factors.
Taylor Cylinder Experiment
A 118-run, 5-level nearly-orthogonal design was used.
Exploratory analysis suggests factor 6 is important;
otherwise, significant factors are difficult to
identify. RDVS identifies factor 6 and six other
factors as having a significant impact on cylinder
deformation.
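The simulated example relies on a 54-run Latin hypercube design in 10 factors. As a minimal sketch of what such a design looks like, the code below builds a random Latin hypercube on the unit cube with NumPy. The function name `latin_hypercube` is hypothetical, and the sketch makes no attempt to optimise a space-filling criterion, so it should not be read as the construction used for the poster's design.

```python
import numpy as np


def latin_hypercube(n_runs, n_factors, seed=None):
    """Random Latin hypercube on [0, 1)^n_factors (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Each column is an independent random permutation of the strata 0..n_runs-1,
    # so every stratum is used exactly once per factor.
    strata = np.column_stack(
        [rng.permutation(n_runs) for _ in range(n_factors)]
    )
    # Jitter each point uniformly inside its stratum, then scale to [0, 1).
    return (strata + rng.uniform(size=(n_runs, n_factors))) / n_runs


# Same dimensions as the simulated example: 54 runs, 10 factors.
design = latin_hypercube(54, 10, seed=0)
```

Each column of `design` places exactly one point in each of the 54 equal-width intervals of [0, 1), which is the defining property of a Latin hypercube.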
Discussion
RDVS is able to correctly identify when none of the
true factors are active. This variable selection
technique complements methods in sensitivity analysis.
It can be used as a precursor to alternative
visualization and ANOVA approaches to screening. The
method is robust to the specification of the prior
distributions. Since the inert variable is assigned the
same prior as the true factors, the method
self-calibrates.
Gaussian Spatial Process Model
To model the response from a computer experiment, we
use a Bayesian version of the GASP model originally
used by Sacks et al. (1989), with the following
components:
- y(X): simulator response, an (n x 1) vector
- X: input to the computer code, an (n x p) design
matrix
- ε: white-noise process, independent of z(X)
The Gaussian spatial process, z(X), is specified to
have mean zero and a covariance function governed by
one correlation parameter ρ_k per input. Under this
parameterization, if ρ_k is close to one, the kth input
is not active. RDVS is a method for gauging the
relative magnitudes of the correlation parameters ρ_k.
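To make the role of the correlation parameters (written ρ_k here) concrete, the sketch below evaluates one covariance form that is consistent with this parameterization: Cov(z(x_i), z(x_j)) = σ² ∏_k ρ_k^(4 (x_ik − x_jk)²). The poster's own formula is not reproduced in the text, so the exponent scaling of 4, the variance name `sigma2`, and the function name `gasp_covariance` are assumptions made for illustration.

```python
import numpy as np


def gasp_covariance(X, rho, sigma2=1.0):
    """Covariance matrix of z(X) under an assumed product power correlation.

    X      : (n, p) design matrix, inputs scaled to [0, 1]
    rho    : length-p correlation parameters, each in (0, 1]
    sigma2 : process variance (assumed name)
    """
    X = np.asarray(X, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # Squared distance between every pair of runs, per input: shape (n, n, p).
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    # rho_k ** (4 * d2) equals 1 whenever rho_k == 1, so that input drops out.
    return sigma2 * np.prod(rho[None, None, :] ** (4.0 * d2), axis=2)


# Two inputs; the second has rho = 1 and therefore cannot affect the covariance.
X_demo = np.array([[0.0, 0.0], [0.5, 1.0], [1.0, 0.25]])
R_demo = gasp_covariance(X_demo, rho=[0.5, 1.0])
```

Under this form, setting ρ_k = 1 makes the kth factor contribute a multiplicative factor of one for every pair of runs, which is why a correlation parameter near one signals an inactive input.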
Conclusions and Future Research
- RDVS is a new method for variable selection for
Bayesian Gaussian Spatial Process models.
- The methodology is motivated by asking: what would
the posterior distribution of the correlation parameter
for an inert factor look like, given the data?
- The approach is Bayesian and only requires the
generation of an inert factor, but the screening has a
frequentist flavour, using the distribution of the
inert factor as a reference distribution.
- Future research
  - Using a linear regression model for the mean of the
GASP model
  - Using RDVS for variable selection for other models
Computer Experiment Example
Taylor Cylinder Experiment (Los Alamos National Lab)
The simulator is a finite element code used to simulate
the high-velocity impact of a cylinder. In the
experiment, copper cylinders (length 5.08 cm, radius
1 cm) are fired into a fixed barrier at a velocity of
177 m/s. The cylinder length after impact is used as
the outcome. The process is governed by 14 parameters
which control the behaviour of the cylinder after
impact. Over the limited range that the computer
experiment exercises the simulator, the response is
expected to be dominated by only a few of the 14
parameters.
RDVS Algorithm
To implement RDVS, a factor which is known to be inert
is appended to the design matrix X. This provides a
benchmark against which the other input factors can be
compared.
Algorithm
1. Augment the design matrix by adding a new design
column corresponding to an inert factor.
2. Find the posterior median of the correlation
parameter corresponding to the added inert factor.
3. Repeat steps 1 and 2 many times to obtain the
distribution of the posterior median of an inert
factor, to use as a reference distribution.
4. Compare the posterior medians of the correlation
parameters of the true factors to the reference
distribution. The percentile of the reference
distribution used for comparison reflects the rate of
falsely identifying an inert factor as active.
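The four steps can be sketched as an outer loop around any routine that returns posterior medians of the correlation parameters. In the sketch below, `rdvs` and `fit_rho_medians` are hypothetical names; drawing the inert column uniformly on [0, 1) and summarising each true factor by its mean over repetitions are assumptions, not details given above. The helper `toy_rho_medians` is only a cheap placeholder (one minus the absolute correlation with the response) so that the loop can be run; it is not a GASP posterior, and a real analysis would substitute a Bayesian GASP fit.

```python
import numpy as np


def rdvs(X, y, fit_rho_medians, n_reps=100, percentile=10.0, seed=None):
    """Outer loop of RDVS. fit_rho_medians(X_aug, y) is caller-supplied and
    must return one posterior median per column of X_aug."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    factor_medians = np.empty((n_reps, p))
    inert_medians = np.empty(n_reps)
    for r in range(n_reps):
        # Step 1: append a column for a factor known to be inert.
        # Assumption: drawn uniformly on [0, 1), independently of y.
        X_aug = np.hstack([X, rng.uniform(size=(n, 1))])
        # Step 2: posterior medians for all p + 1 correlation parameters.
        medians = np.asarray(fit_rho_medians(X_aug, y), dtype=float)
        factor_medians[r] = medians[:p]
        inert_medians[r] = medians[p]
    # Step 3: the inert factor's medians form the reference distribution.
    cutoff = np.percentile(inert_medians, percentile)
    # Step 4: a factor is flagged active if its median falls below the cutoff.
    # Assumption: each true factor is summarised by its mean over repetitions.
    active = factor_medians.mean(axis=0) < cutoff
    return active, cutoff


def toy_rho_medians(X_aug, y):
    """Placeholder only, NOT a GASP posterior: 1 - |corr(x_k, y)| in [0, 1]."""
    Xc = X_aug - X_aug.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return 1.0 - np.abs(corr)


# Illustrative data (not from the poster): only the first four inputs matter.
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 10))
y = X[:, :4].sum(axis=1) + 0.05 * rng.normal(size=200)
active, cutoff = rdvs(X, y, toy_rho_medians, n_reps=50, seed=2)
```

Because the cutoff is a percentile of the inert factor's own distribution, the `percentile` argument plays the role described in step 4: it sets the approximate rate at which a truly inert factor would be flagged as active.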
Acknowledgements
This research was initiated while Linkletter, Bingham
and Ye were visiting the Statistical Sciences group at
Los Alamos National Laboratory. This work was supported
by a grant from the Natural Sciences and Engineering
Research Council of Canada. Ye's research was supported
by NSF DMS-0306306.