Title: Fast Simulators for Assessment and Propagation of Model Uncertainty*
1 Fast Simulators for Assessment and Propagation of Model Uncertainty
- Jim Berger, M.J. Bayarri, German Molina
- June 20, 2001
- SAMO 2001, Madrid
- Project of the National Institute of Statistical
Sciences
2 Some activities requiring numerous runs of a complex computer model
- Output analysis: with random inputs, what is the distribution of output variables?
- Optimization: finding the optimal setting for process control variables (e.g., signal timing).
- Design of computer or field experiments.
- Bayesian inference: learning about unknown model parameters or inputs from field data (i.e., data from the process being modeled).
3 The problem and solution
- If runs of the computer model are too slow, the activity cannot be completed.
- The natural solution is to approximate the computer model; most common is approximation by a faster computer model:
  - models of lower resolution
  - linearized versions of the model
  - response surface (or Gaussian process) approximations
  - probability networks of various types.
4 An Example: Bayesian input analysis for CORSIM
- The microsimulator CORSIM is a computer model of street and highway traffic.
- It models vehicles entering the network and moving according to interaction rules.
- The traffic network studied consists of a 44-intersection neighborhood in Chicago.
- CORSIM was applied to model a one-hour period during rush hour.
5 Network (Chicago)
[Figure: map of the street network; labels O'Hare, Kingsbury, Huron, Erie, Ontario, Ohio, Grand, Illinois, Hubbard, Dearborn, Orleans, Franklin, LaSalle, Clark, Wells, LOOP.]
6 Key Unknown Inputs
- Demands, $\lambda$: the means of exponential inter-arrival time distributions that determine the (random) numbers of vehicles that enter the system from external streets. $\lambda$ is 16-dimensional.
- Turning probabilities, $P$: the probabilities that vehicles turn right, left, or go through each intersection. $P$ is 84-dimensional.
7 Data: vehicle counts, C
- Demand counts: the numbers of vehicles entering the network at each street, recorded by observers placed on the external streets.
- Turning counts: made by observers over short time intervals at all intersections.
- Video counts: at central intersections, cameras were placed that produced an exact count of vehicles.
8 Problems with the Data
- Demand counts are inaccurate, some by as much as 40%.
- Turning counts were made over short time periods.
- Some of the turning counts were missing.
- The observer counts were incompatible with the video counts (reality), so they were tuned to bring them into accordance.
9 Example of a tuning adjustment
Observer reported 1969 vehicles entering here. This was adjusted to 1790 vehicles to fit the observed video count here.
[Figure: map detail; street labels Erie, Ontario, LaSalle.]
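For scale, the size of this adjustment can be worked out from the two counts quoted in the example (simple arithmetic, nothing beyond the figures on the slide):

```latex
\[
\frac{1969 - 1790}{1969} = \frac{179}{1969} \approx 0.091
\]
```

That is, the observer count was reduced by roughly 9% to match the video count.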
10 Problems with tuning
- Often, too few inputs are tuned, and those that are tuned are then over-tuned.
- The often considerable uncertainty in the tuned inputs is ignored, resulting in overly optimistic assessment of output variance.
- Tuning can mask model biases that actually exist, making the model less accurate for prediction outside the range of the data (not applicable here).
11 A solution: Bayesian analysis
- Compute the posterior distribution of the true model inputs, given the data.
- But this typically requires use of Markov chain Monte Carlo (MCMC) methods, involving thousands of model runs; this is too time-consuming for CORSIM.
- Thus a fast simulator is needed, one which represents those features of CORSIM that allow the data to be related to model inputs.
13 Structure of the fast simulator
- It is a probability network
  - with the same nodal structure as CORSIM's
  - with unknown inputs $\lambda$ (vehicle inter-arrival rates) and $P$ (turning probabilities) that mean the same as in CORSIM
  - but with instantaneous vehicles that (i) enter the network, (ii) turn appropriately, (iii) exit.
- Note: fast simulators often have a limited purpose, and are not general replacements for the computer model; here, we ignore the key features of time, interactions, signals, etc.
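A minimal sketch of what such a probability network might look like for a single intersection with one entry street. This is an illustration only, not the authors' simulator: the function name, the 2-second mean inter-arrival time, the one-hour horizon in seconds, and the turning probabilities are all assumed values.

```python
import random

def simulate_counts(mean_interarrival_s, turn_probs, horizon_s=3600.0, seed=0):
    """Toy one-intersection probability network (hypothetical, not CORSIM).

    Vehicles enter with exponential inter-arrival times, turn at once
    according to fixed probabilities, and exit; no time dynamics,
    vehicle interactions, or signals are modelled.
    """
    rng = random.Random(seed)
    # (i) Enter: count arrivals whose cumulative time falls within the horizon.
    entered = 0
    t = rng.expovariate(1.0 / mean_interarrival_s)
    while t <= horizon_s:
        entered += 1
        t += rng.expovariate(1.0 / mean_interarrival_s)
    # (ii) Turn: each vehicle independently goes right, left, or through.
    moves = list(turn_probs)
    weights = [turn_probs[m] for m in moves]
    exits = {m: 0 for m in moves}
    for m in rng.choices(moves, weights=weights, k=entered):
        exits[m] += 1
    # (iii) Exit: every vehicle that entered leaves by exactly one movement.
    return entered, exits

entered, exits = simulate_counts(2.0, {"right": 0.2, "left": 0.1, "through": 0.7})
print(entered, exits)
```

Because vehicles pass through instantaneously, one call costs a few thousand random draws, which is what makes thousands of evaluations inside an MCMC loop affordable.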
15 Modeling the demand counts data
- Demand counts: each demand count, $C_i^D$, is modelled by a Poisson distribution with mean $b_i N_i$, where $N_i$ is the true count and $b_i - 1$ is the unknown observer bias.
- The $b_i$ are modelled as being i.i.d. Gamma($\alpha$, $\beta$). The hyperparameters satisfy $\alpha < 2\beta$ (so that the expected bias is less than 100%), but are otherwise unknown and are assigned a uniform prior distribution.
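The link between the hyperparameter constraint and the "expected bias less than 100%" statement can be written out in one line. In this sketch the two Gamma hyperparameters are called $\alpha$ (shape) and $\beta$ (rate); both the symbol names and the shape/rate parametrisation are assumptions, chosen because they reproduce the stated constraint.

```latex
\begin{align*}
C_i^D \mid N_i, b_i &\sim \mathrm{Poisson}(b_i N_i), \qquad
b_i \mid \alpha, \beta \sim \mathrm{Gamma}(\alpha, \beta), \\
\mathrm{E}[\,b_i - 1\,] &= \frac{\alpha}{\beta} - 1 < 1
\iff \alpha < 2\beta .
\end{align*}
```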
16 Modeling the turning counts data
- If $N_i$ vehicles arrive at an intersection from a given direction, the numbers turning right, left, and going through, $(N_i^R, N_i^L, N_i^T)$, are assumed to follow a multinomial distribution with probabilities $(P_i^R, P_i^L, P_i^T)$.
- The $(P_i^R, P_i^L, P_i^T)$ are assigned the Jeffreys prior distribution, proportional to $(P_i^R P_i^L P_i^T)^{-1/2}$.
- The observed turning counts, $C_i^T$, were assumed to be accurate.
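The multinomial likelihood and Jeffreys prior above are conjugate: given observed right/left/through counts alone, the turning probabilities have a Dirichlet distribution with parameters equal to the counts plus 1/2. The sketch below illustrates this for one approach to one intersection; the counts are made up, and the full model also draws on the latent counts, which this example ignores.

```python
import random

def dirichlet_posterior_draw(counts, rng, prior=0.5):
    """Draw (P_R, P_L, P_T) from Dirichlet(counts + prior).

    prior=0.5 is the Jeffreys prior for a multinomial; the draw uses the
    standard construction of a Dirichlet from normalised Gamma variates.
    """
    gammas = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(1)
turning_counts = (12, 5, 43)   # hypothetical right, left, through counts
draws = [dirichlet_posterior_draw(turning_counts, rng) for _ in range(5000)]
post_mean = [sum(d[j] for d in draws) / len(draws) for j in range(3)]
# Closed-form posterior mean, for comparison with the Monte Carlo estimate.
exact_mean = [(c + 0.5) / (sum(turning_counts) + 1.5) for c in turning_counts]
print(post_mean, exact_mean)
```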
17 Latent Variables and Restrictions
- Introduce latent $N_i$, counts on all streets:
  - the total number of vehicles entering an intersection must equal the number leaving
  - the video counts, assumed to be accurate, lead to known values of some sums of these $N_i$.
- Eliminate excess $N_i$ (from an initial ?? to 74), in such a way that the restrictions have a simple structure. (Poster by G. Molina.)
- Let $\mathcal{N}$ denote the constrained region of the $N_i$.
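22 The posterior distribution
- By Bayes' theorem, the posterior distribution, $\pi(N, \lambda, P, b, \alpha, \beta \mid C)$, of all unknowns given the data $C$, is simply proportional to the product of the likelihood and the prior, i.e.

$$
\begin{aligned}
\pi(N, \lambda, P, b, \alpha, \beta \mid C) \propto{}
& f_{\mathrm{Poisson}}(C^D \mid N^D, b)\, f_{\mathrm{multinomial}}(C^T \mid P) \\
& \times \pi_{\mathrm{multinomial}}(N \mid P)\, \pi_{\mathrm{Poisson}}(N^D \mid \lambda) \\
& \times \pi_{\mathrm{Jeffreys}}(P, \lambda)\, \pi_{\mathrm{Gamma}}(b \mid \alpha, \beta)\,
  1_{\{\alpha < 2\beta\}}\, 1_{\mathcal{N}} .
\end{aligned}
$$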
23 Computation
- The posterior has 192 unknown parameters.
- Computation must be done by MCMC. We utilize a Gibbs sampling scheme.
- The full conditional distributions for $P$, $\lambda$, $b$, and $\beta$ are, respectively, Dirichlet, Gamma, Gamma, and restricted Gamma; these are easy to sample.
- $\alpha$ has a log-concave density: rejection sampling.
- Each $N_i$ is sampled directly from its discrete distribution (restricted range).
- Roughly 100,000 iterations needed.
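To make the Gibbs scheme concrete, here is a heavily simplified sketch for a single entry street. It is not the authors' sampler. It assumes one observed demand count, holds the Gamma hyperparameters of the bias fixed (named `a` and `r` here, hypothetical values) instead of sampling them, and truncates the latent count to an arbitrary grid. It does keep the pattern described above: Gamma full conditionals for the bias and the demand, and direct sampling of the latent count from its discrete full conditional.

```python
import math
import random

def gibbs_one_street(C, a=20.0, r=20.0, n_iter=3000, seed=0):
    """Toy Gibbs sampler for one entry street (a sketch, not the authors' code).

    Model: C | N, b ~ Poisson(b N);  N | lam ~ Poisson(lam);
           b ~ Gamma(shape a, rate r) with a, r held fixed (a simplification);
           lam has the Jeffreys prior, proportional to lam ** -0.5.
    """
    rng = random.Random(seed)
    grid = list(range(1, 4 * C + 1))          # assumed restricted range for N
    log_fact = [math.lgamma(n + 1) for n in grid]
    log_n = [math.log(n) for n in grid]
    N, draws = C, []
    for _ in range(n_iter):
        # b | N, C ~ Gamma(shape a + C, rate r + N); gammavariate takes a scale.
        b = rng.gammavariate(a + C, 1.0 / (r + N))
        # lam | N ~ Gamma(shape N + 1/2, rate 1).
        lam = rng.gammavariate(N + 0.5, 1.0)
        # N | b, lam, C: discrete, with weight exp(-b n) n^C lam^n / n!.
        log_w = [-b * n + C * ln + n * math.log(lam) - lf
                 for n, ln, lf in zip(grid, log_n, log_fact)]
        top = max(log_w)                      # subtract the max for stability
        weights = [math.exp(v - top) for v in log_w]
        N = rng.choices(grid, weights=weights, k=1)[0]
        draws.append((N, b, lam))
    return draws

draws = gibbs_one_street(C=60)
kept = draws[500:]                            # discard burn-in
mean_N = sum(d[0] for d in kept) / len(kept)
print(round(mean_N, 1))
```

The latent count and the bias are strongly correlated in this toy model, so the chain mixes slowly; that is in line with the large number of iterations reported for the full problem.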
27 Gridlock and model constraints
- In CORSIM, gridlock (all vehicles stopped) can occur (20% of the runs in the last graph).
- This essentially defines the unfeasibility region of the parameter space.
- This can be handled in CORSIM by simply ignoring runs that yield gridlock (in the Bayesian inference, this corresponds to multiplying the posterior by the indicator function of the parameter values outside that region).
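Ignoring gridlocked runs amounts to rejection sampling from the posterior truncated to the feasible region. A minimal sketch follows, using stand-ins for the real ingredients: the scalar "demand" draws and the threshold rule `causes_gridlock` are both hypothetical and only mimic a run that ends in gridlock.

```python
import random

def feasible_draws(draws, causes_gridlock):
    """Keep only parameter draws whose model run does not end in gridlock."""
    return [theta for theta in draws if not causes_gridlock(theta)]

# Hypothetical stand-ins: a scalar 'demand' parameter and a capacity rule.
rng = random.Random(2)
posterior_draws = [rng.gammavariate(50.0, 40.0) for _ in range(1000)]
causes_gridlock = lambda demand: demand > 2300.0   # assumed toy threshold

kept = feasible_draws(posterior_draws, causes_gridlock)
print(len(kept), "of", len(posterior_draws), "draws kept")
```

The surviving draws are a sample from the original distribution multiplied by the indicator of the feasible set and renormalised.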
28 Conclusions
- Tuning should be replaced by Bayesian inference for unknown parameters or inputs.
- It may be necessary to constrain the parameter space by ignoring model runs that fall inside the unfeasibility region.
- If evaluation of the computer model is too slow, fast simulators should be sought for which Bayesian inference is feasible.