Fast Simulators for Assessment and Propagation of Model Uncertainty* presentation

1
Fast Simulators for Assessment and Propagation of
Model Uncertainty

Jim Berger, M.J. Bayarri, German Molina
June 20, 2001
SAMO 2001, Madrid
Project of the National Institute of Statistical
Sciences

2
Some activities requiring numerous runs of a
complex computer model

Output analysis: with random inputs, what is the
distribution of output variables?
Optimization: finding the optimal setting for
process control variables (e.g., signal timing).
Design of computer or field experiments.
Bayesian inference: learning about unknown model
parameters or inputs from field data (i.e., data
from the process being modeled).

3
The problem and solution

If runs of the computer model are too slow, the
activity cannot be completed.
The natural solution is to approximate the
computer model; the most common approach is
approximation by a faster computer model:
models of lower resolution;
linearized versions of the model;
response surface (or Gaussian process)
approximations;
probability networks of various types.

4
An Example: Bayesian input analysis for CORSIM

The microsimulator CORSIM is a computer model of
street and highway traffic.
It models vehicles entering the network and
moving according to interaction rules.
The traffic network studied consists of a
44-intersection neighborhood in Chicago.
CORSIM was applied to model a one-hour period
during rush hour.

5
Network (Chicago)

[Figure: map of the study network in Chicago. Street
labels: Kingsbury, Huron, Erie, Ontario, Ohio, Grand,
Illinois, Hubbard, Dearborn, Orleans, Franklin,
LaSalle, Clark, Wells. Other labels: O'Hare, Loop.]

6
Key Unknown Inputs

Demands, $\lambda$: the means of exponential
inter-arrival time distributions that determine
the (random) numbers of vehicles that enter the
system from external streets. $\lambda$ is
16-dimensional.
Turning probabilities, $P$: the probabilities that
vehicles turn right, turn left, or go through at each
intersection. $P$ is 84-dimensional.

7
Data: vehicle counts, $C$

Demand counts: the numbers of vehicles entering
the network at each street, recorded by observers
placed on the external streets.
Turning counts: made by observers over short time
intervals at all intersections.
Video counts: at central intersections, cameras
were placed that produced an exact count of
vehicles.

8
Problems with the Data

Demand counts are inaccurate, some by as much as
40%.
Turning counts were made over short time periods.
Some of the turning counts were missing.
The observer counts were incompatible with the
video counts (reality), so they were tuned to
bring them into agreement.

9
Example of a tuning adjustment

Observer reported 1969 vehicles entering
here. This was adjusted to 1790 vehicles to fit
the observed video count here.
[Figure: network detail with Erie, Ontario, and
LaSalle labeled; each "here" points to a location
on the map.]

10
Problems with tuning

Often, too few inputs are tuned, and those that
are tuned are then over-tuned.
The often considerable uncertainty in the tuned
inputs is ignored, resulting in overly optimistic
assessment of output variance.
Tuning can mask model biases that actually exist,
making the model less accurate for prediction
outside the range of the data (not applicable
here).

11
A solution: Bayesian analysis

Compute the posterior distribution of the true
model inputs, given the data.
But this typically requires use of Markov chain
Monte Carlo (MCMC) methods, involving thousands
of model runs, which is too time-consuming for CORSIM.
Thus a fast simulator is needed, one which
represents those features of CORSIM that allow
the data to be related to model inputs.

12
(No Transcript)
13
Structure of the fast simulator

It is a probability network
with the same nodal structure as CORSIM;
with unknown inputs $\lambda$ (vehicle inter-arrival
rates) and $P$ (turning probabilities) that mean
the same as in CORSIM;
but with instantaneous vehicles that (i)
enter the network, (ii) turn appropriately, (iii)
exit.
Note: fast simulators often have a limited
purpose, and are not general replacements for the
computer model; here, we ignore the key features
of time, interactions, signals, etc.
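
The structure described on this slide can be illustrated with a minimal sketch for one entry street feeding one intersection. This is not the authors' implementation: the function names, the use of seconds as the time unit, and the reduction to a single entry and a single intersection are assumptions made for illustration only.

```python
import random


def simulate_entries(mean_interarrival, period=3600.0, rng=random):
    """Count vehicles entering within `period` seconds.

    Inter-arrival times are exponential with the given mean (seconds).
    Vehicles are "instantaneous": only the count matters, not timing.
    """
    t, count = 0.0, 0
    while True:
        # expovariate takes a rate, i.e. 1 / mean
        t += rng.expovariate(1.0 / mean_interarrival)
        if t > period:
            return count
        count += 1


def split_turns(n_vehicles, probs, rng=random):
    """Multinomial split of n_vehicles into right, left, and through moves."""
    moves = ("right", "left", "through")
    counts = dict.fromkeys(moves, 0)
    for move in rng.choices(moves, weights=probs, k=n_vehicles):
        counts[move] += 1
    return counts
```

For example, `split_turns(simulate_entries(2.0), (0.2, 0.1, 0.7))` draws one hour of arrivals with a mean gap of two seconds and distributes them over the three movements. No signals, queues, or vehicle interactions are represented, which is what makes each run cheap.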

14
(No Transcript)
15
Modeling the demand counts data

Demand counts: each demand count, $C_i^D$, is
modelled by a Poisson distribution with mean
$b_i N_i$, where $N_i$ is the true count and
$b_i - 1$ is the unknown observer bias.
The $b_i$ are modelled as being i.i.d.
Gamma($\alpha$, $\beta$), with $\alpha < 2\beta$ (so
that the expected bias is less than 100%);
$\alpha$ and $\beta$ are otherwise unknown, and are
assigned a uniform prior distribution.
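
The link between the parameter constraint and the bound on the expected bias can be spelled out as follows. This assumes the Gamma distribution is parameterized by shape and rate (written $\alpha$ and $\beta$ here); the slide does not state the parameterization, but it is the one under which the stated bound holds.

```latex
\[
E[b_i] = \frac{\alpha}{\beta} < 2
\quad\Longleftrightarrow\quad
E[b_i - 1] = \frac{\alpha}{\beta} - 1 < 1 ,
\]
```

so on average an observer over-counts by less than 100% of the true count. As a purely illustrative reading of the numbers on slide 9, a report of 1969 vehicles against a true count of 1790 would correspond to $b_i = 1969/1790 = 1.10$, a bias of 10%.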

16
Modeling the turning counts data

If $N_i$ vehicles arrive at an intersection from a
given direction, the numbers turning right, turning left,
and going through, $(N_i^R, N_i^L, N_i^T)$, are assumed
to follow a multinomial distribution with
probabilities $(P_i^R, P_i^L, P_i^T)$.
The $(P_i^R, P_i^L, P_i^T)$ are assigned the Jeffreys
prior distribution, which is proportional to
$(P_i^R P_i^L P_i^T)^{-1/2}$.
The observed turning counts, $C_i^T$, were assumed to
be accurate.
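
The Jeffreys prior here is a Dirichlet(1/2, 1/2, 1/2) distribution, which is conjugate to the multinomial. A short derivation, conditioning only on the turning numbers at one approach (a simplification; in the full model the conditional also involves the latent counts), with $k$ ranging over right (R), left (L), and through (T):

```latex
\[
\pi\bigl(P_i \mid N_i^R, N_i^L, N_i^T\bigr)
\;\propto\;
\prod_{k \in \{R, L, T\}} \bigl(P_i^k\bigr)^{N_i^k}
\;\times\;
\prod_{k \in \{R, L, T\}} \bigl(P_i^k\bigr)^{-1/2}
\;=\;
\prod_{k \in \{R, L, T\}} \bigl(P_i^k\bigr)^{N_i^k + 1/2 - 1},
\]
```

which is the kernel of a Dirichlet distribution with parameters $(N_i^R + 1/2,\; N_i^L + 1/2,\; N_i^T + 1/2)$.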

17
Latent Variables and Restrictions

Introduce latent $N_i$, counts on all streets:
the total number of vehicles entering an
intersection must equal the number leaving;
the video counts, assumed to be accurate, lead to
known values of some sums of these $N_i$.
Eliminate excess $N_i$ (from an initial ?? to 74),
in such a way that the restrictions have a simple
structure. (Poster by G. Molina.)
Let $\mathcal{N}$ denote the constrained region of the $N_i$.

18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
The posterior distribution

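By Bayes' theorem, the posterior distribution,
$\pi(N, \lambda, P, b, \alpha, \beta \mid C)$, of all
unknowns given the data $C$, is simply proportional
to the product of the likelihood and the prior, i.e.
\[
f_{\text{Poisson}}(C^D \mid N^D, b)\, f_{\text{multinomial}}(C^T \mid P)
\times \pi_{\text{multinomial}}(N \mid P)\, \pi_{\text{Poisson}}(N^D \mid \lambda)
\times \pi_{\text{Jeffreys}}(P, \lambda)\, \pi_{\text{Gamma}}(b \mid \alpha, \beta)\,
1_{\{\alpha < 2\beta\}}\, 1_{\mathcal{N}}.
\]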

23
Computation

The posterior has 192 unknown parameters.
Computation must be done by MCMC; we use a
Gibbs sampling scheme.
The full conditional distributions for $P$, $\lambda$, $b$,
and $\beta$ are, respectively, Dirichlet, Gamma, Gamma,
and restricted Gamma; these are easy to sample.
$\alpha$ has a log-concave density, so it is sampled
by rejection sampling.
Each $N_i$ is sampled directly from its discrete
distribution (restricted range).
Roughly 100,000 iterations are needed.
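
Two of the "easy to sample" conditionals can be sketched with the Python standard library. This is an illustrative sketch under simplifying assumptions, not the authors' sampler: it treats the counts entering each draw as given, assumes the Gamma prior on the observer factor uses a shape/rate parameterization, and all function names are hypothetical.

```python
import random


def draw_dirichlet(params, rng=random):
    """Draw from a Dirichlet by normalizing independent Gamma(shape, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(gammas)
    return [g / total for g in gammas]


def draw_turning_probs(turn_counts, rng=random):
    """Jeffreys prior plus multinomial counts gives Dirichlet(count + 1/2)."""
    return draw_dirichlet([c + 0.5 for c in turn_counts], rng)


def draw_observer_factor(observed, true_count, alpha, beta, rng=random):
    """Conditional of b given a Poisson(b * N) count and a Gamma(alpha, rate beta) prior.

    The result is Gamma(alpha + observed, rate beta + true_count).
    random.gammavariate expects a scale, so the rate is inverted.
    """
    return rng.gammavariate(alpha + observed, 1.0 / (beta + true_count))
```

A Gibbs sweep would cycle through draws like these together with the remaining conditionals (the demand parameters, the Gamma hyperparameters, and the restricted discrete draws of the latent counts), which are omitted here.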

24
(No Transcript)
25
(No Transcript)
26
(No Transcript)
27
Gridlock and model constraints

In CORSIM, gridlock (all vehicles stopped) can
occur (20 of the runs in last graph).
This essentially defines the infeasible
region of the parameter space.
This can be handled in CORSIM by simply ignoring
runs that yield gridlock (in the Bayesian
inference, this corresponds to multiplying the
posterior by the indicator function of the
feasible region).
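
Ignoring gridlocked runs amounts to filtering posterior draws through a feasibility check. A minimal sketch, in which `causes_gridlock` is a hypothetical stand-in for running the full model on a draw and checking whether it gridlocks:

```python
def keep_feasible(draws, causes_gridlock):
    """Keep only the draws whose model run does not end in gridlock.

    Dropping a draw is equivalent to giving it weight zero, i.e. multiplying
    the posterior by an indicator that is 1 on feasible inputs and 0 otherwise.
    """
    return [d for d in draws if not causes_gridlock(d)]
```

With a toy rule such as `lambda demand: demand > 1000`, the call `keep_feasible([800, 1200, 950], ...)` keeps the first and third draws. In practice the check requires a run of the slow model for each draw, so it is applied to the draws being propagated rather than inside the MCMC.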

28
Conclusions

Tuning should be replaced by Bayesian inference
for unknown parameters or inputs.
It may be necessary to constrain the parameter
space by ignoring model runs that lie outside the
feasible region.
If evaluation of the computer model is too slow,
fast simulators should be sought for which
Bayesian inference is feasible.
