Title: Probability bounds analysis
1. Probability bounds analysis
Scott Ferson, scott_at_ramas.com, 2 October 2007
Stony Brook University, MAR 550, Challenger 165
2. Outline
- P-boxes: marrying intervals and probability
- Calculations via the Cartesian product
- Getting p-boxes
- Moment and shape constraints
- From empirical data
- Dependence
- Model uncertainty
- Updating
- Backcalculation
- Sensitivity analysis
- Differential equations
- Conclusions
Examples: dike reliability, PCBs and duck hunters, dioxin inhalation, hydrocarbon plume
3. Typical risk analysis problems
- Sometimes little or even no data
- Updating is rarely used
- Very simple arithmetic (or logical) calculations
- Occasionally, finite element meshes or differential equations
- Usually small in number of inputs
  - Nuclear power plants are the exception
- Results are important and often high profile
- But the approach is being used ever more widely
4. Probability vs. intervals
- Probability theory
  - Can handle likelihoods and dependence well
  - Has an inadequate model of ignorance
  - Lying: saying more than you really know
- Interval analysis
  - Can handle epistemic uncertainty (ignorance) well
  - Inadequately models frequency and dependence
  - Cowardice: saying less than you know
5. What's needed
- Reliable, conservative assessments of tail risks
- Using available information but without forcing analysts to make unjustified assumptions
- Neither computationally expensive nor intellectually taxing
7. Probability box (p-box)
Interval bounds on a cumulative distribution function (CDF)
[Figure: a p-box, cumulative probability (0 to 1) versus X (0.0 to 3.0)]
8. Generalizes an uncertain number
[Figure: three uncertain numbers plotted as cumulative probability over 0 to 40: a probability distribution, a probability box, and an interval. The interval, drawn as a box, is not a uniform distribution.]
9. Probability bounds analysis
- Marries intervals with probability theory
- Distinguishes variability and incertitude
- Solves many problems in uncertainty analysis
  - Input distributions unknown
  - Imperfectly known correlation and dependency
  - Large measurement error, censoring, small sample sizes
  - Model uncertainty
10. Calculations
- All standard mathematical operations
  - Arithmetic operations (+, -, ×, ÷, ^, min, max)
  - Logical operations (and, or, not, if, etc.)
  - Transformations (exp, ln, sin, tan, abs, sqrt, etc.)
  - Backcalculation (deconvolutions, updating)
  - Magnitude comparisons (<, ≤, >, ≥, =)
  - Other operations (envelope, mixture, etc.)
- Faster than Monte Carlo
- Guaranteed to bound the answer
- Good solutions often easy to compute
11. Faithful generalization
- When inputs are distributions, its answers conform with probability theory
- When inputs are intervals, it agrees with interval analysis
12. Probability bounds arithmetic
[Figure: p-boxes for A (over 0 to 6) and B (over 0 to 14), each as bounds on a cumulative distribution]
What's the sum A+B?
13. Cartesian product
A = mix(1, [1,3]; 1, [2,4]; 1, [3,5]);  B = mix(1, [2,8]; 1, [6,10]; 1, [8,12])
A + B under independence: (range = [3,17], mean = [7.3,14], var = [0,22])
A + B making no assumption about dependence: (range = [3,17], mean = [7.3,14], var = [0,38])

A+B (independence)     A ∈ [1,3], p1 = 1/3   A ∈ [2,4], p2 = 1/3   A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3   A+B ∈ [3,11], 1/9     A+B ∈ [4,12], 1/9     A+B ∈ [5,13], 1/9
B ∈ [6,10], q2 = 1/3   A+B ∈ [7,13], 1/9     A+B ∈ [8,14], 1/9     A+B ∈ [9,15], 1/9
B ∈ [8,12], q3 = 1/3   A+B ∈ [9,15], 1/9     A+B ∈ [10,16], 1/9    A+B ∈ [11,17], 1/9
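As a concrete sketch of this Cartesian-product arithmetic, the following R code (illustrative, not any particular package's API) builds the nine cells above and reassembles bounds on the CDF of A+B:

# Each uncertain number is a list of focal elements: an interval [lo, hi] with probability p
A <- list(list(lo=1, hi=3, p=1/3), list(lo=2, hi=4, p=1/3), list(lo=3, hi=5, p=1/3))
B <- list(list(lo=2, hi=8, p=1/3), list(lo=6, hi=10, p=1/3), list(lo=8, hi=12, p=1/3))

cartesian_sum <- function(A, B) {
  cells <- list()
  for (a in A)
    for (b in B)
      cells[[length(cells) + 1]] <- list(
        lo = a$lo + b$lo,      # interval sum: endpoints add
        hi = a$hi + b$hi,
        p  = a$p * b$p)        # independence: cell probabilities multiply
  cells
}

cells <- cartesian_sum(A, B)
z <- seq(3, 17, by = 0.1)
# Upper CDF bound: mass that could already lie at or below z (left endpoints)
upper <- sapply(z, function(t) sum(sapply(cells, function(c) c$p * (c$lo <= t))))
# Lower CDF bound: mass that must lie at or below z (right endpoints)
lower <- sapply(z, function(t) sum(sapply(cells, function(c) c$p * (c$hi <= t))))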
14. A+B under independence
[Figure: the resulting p-box for A+B, cumulative probability 0 to 1 over A+B from 0 to 18]
15. Case study: dike reliability
[Figure: cross-section of a dike, showing the revetment blocks of thickness D on a clay layer, an incoming wave of height H, the sea level, and the revetment slope α]
Z = ΔD - (H tan α) / (cos α · M · √s)
(all variables uncertain)
16. Inputs
- relative density of the revetment blocks: Δ ∈ [1.60, 1.65]
- revetment block thickness: D ∈ [0.68, 0.72] meters
- slope of the revetment: α ∈ arctan([0.32, 0.34]) = [0.309, 0.328] radians
- model parameter: M ∈ [3.0, 5.2]
- significant wave height: H ~ Weibull(scale = [1.2, 1.5] meters, shape = [10, 12])
- offshore peak wave steepness: s ~ normal(mean = [0.039, 0.041], stdev = [0.005, 0.006])

Reliability depends on the density and thickness of the dike's facing masonry. For the calculation, the two distributional inputs were discretized as mixtures of intervals:
- H ~ mix([1, 1.5], 1/8; [1.1, 1.5], 1/4; [1.3, 1.6], 1/4; [1.3, 1.4], 1/4; [1.5, 1.7], 1/8) meters
- s ~ mix([3, 3.6], 1/20; [3.4, 4.2], 9/20; [3.9, 4], 9/20; [4.5, 4.8], 1/20) × 10⁻²
17. Input p-boxes
[Figure: six panels showing the inputs as p-boxes or intervals, each with cumulative probability from 0 to 1: Δ (1.5 to 1.7), H (0 to 2 meters), D (0.6 to 0.8 meters), α (0.2 to 0.4 radians), M (2 to 6), and s (0.02 to 0.06)]
18. Discretize each p-box
[Figure: the p-box for H (0 to 2 meters) sliced into horizontal probability levels]
19. 100 × 100 Cartesian product
Z = f(Δ, D, M, α, H, s), with the inputs treated as independent. Discretizing H and s into 100 slices each (e.g., H ∈ [0, 1.02] with probability 0.01) gives 10,000 cells, each an interval for Z carrying probability 0.01 × 0.01 = 0.0001, e.g., Z ∈ [0.290, 1.19], [0.314, 1.19], [0.536, 1.19], [0.544, 1.19], ...
20. Each cell is an interval calculation
Note that the variable α is uncertain and repeated, but it does not lead to an inflation of the uncertainty because the function is monotone increasing over the range of α's uncertainty.
21. Reliability function
- Reassembled from the 10,000 intervals
[Figure: the p-box for the reliability function Z]
PBA says the risk that Z is less than zero is less than 0.044; Monte Carlo said this risk is about 0.000014.
The risk that Z is less than zero is less than 0.091.
// PBA (Risk Calc syntax)
delta = [1.60, 1.65]                  // relative density of the revetment blocks
D = [0.68, 0.72]                      // revetment block thickness
alpha = atan([0.32, 0.34])            // slope of the revetment
M = [3.0, 5.2]                        // model parameter
H = weibull([1.2, 1.5], [10, 12])     // significant wave height
s = N([0.039, 0.041], [0.005, 0.006]) // offshore peak wave steepness
Zb = delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))

delta = U(1.60, 1.65)
D = U(0.68, 0.72)
alpha = atan(U(0.32, 0.34))
M = U(3.0, 5.2)
H = weibull(1.35, 11)
s = normal(0.04, 0.0055)
Zp = delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))
show Zb in green; show Zp in yellow
_alpha(Zb, 0)   // [0, 0.050505050506]
_alpha(Zp, 0)   // [0, 0.010101010102]

// (precise) probabilistic calculation in R
many <- 1000000
delta <- runif(many, 1.60, 1.65)
D <- runif(many, 0.68, 0.72)
alpha <- atan(runif(many, 0.32, 0.34))
M <- runif(many, 3.0, 5.2)
H <- rweibull(many, 11, 1.35)    # parameter order reversed; midpoints of [1.2,1.5], [10,12]
rgumbel <- function(n, a, b) return(a - b*log(log(1/runif(n))))  # helper used in a later example
s <- rnorm(many, 0.04, 0.0055)   # midpoints of [0.039,0.041], [0.005,0.006]
Zp <- delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))
risk <- Zp[Zp < 0]
length(risk)/many                # [1] 1.6e-05
plot(sort(Zp), (1:length(Zp))/many, t='s', xlab='Z', ylab='Probability', ylim=c(0,1))  # SLOW!

Over 3 orders of magnitude.
22. Implementation in R

levels <- 100
delta.1 <- 1.60; delta.2 <- 1.65
D.1 <- 0.68; D.2 <- 0.72
alpha.1 <- atan(0.32); alpha.2 <- atan(0.34)
M.1 <- 3.0; M.2 <- 5.2
R.1 <- tan(alpha.1) / cos(alpha.1); R.2 <- tan(alpha.2) / cos(alpha.2)
slice <- function(p, df, a1, a2, b1, b2) {
  w <- do.call(df, list(p, a1, b1)); x <- do.call(df, list(p, a1, b2))
  y <- do.call(df, list(p, a2, b1)); z <- do.call(df, list(p, a2, b2))
  list(one = pmin(w, x, y, z)[1:levels], two = pmax(w, x, y, z)[2:(levels + 1)])
}
p <- c(0.005, (1:(levels - 1))/levels, 0.995)
H <- slice(p, 'qweibull', 10, 12, 1.2, 1.5)
s <- slice(p, 'qnorm', 0.039, 0.041, 0.005, 0.006)
Hs.1 <- as.vector(outer(H$one, sqrt(s$two), '/'))   # lower endpoints of H/sqrt(s)
Hs.2 <- as.vector(outer(H$two, sqrt(s$one), '/'))   # upper endpoints of H/sqrt(s)
Zb.1 <- delta.1 * D.1 - (Hs.2 * R.2) / M.1          # lower endpoints of Z
Zb.2 <- delta.2 * D.2 - (Hs.1 * R.1) / M.2          # upper endpoints of Z
Zb.1 <- sort(Zb.1); Zb.2 <- sort(Zb.2)
23. Monte Carlo simulation

many <- 1000000
delta <- runif(many, 1.60, 1.65)
D <- runif(many, 0.68, 0.72)
alpha <- atan(runif(many, 0.32, 0.34))
M <- runif(many, 3.0, 5.2)
H <- rweibull(many, 11, 1.35)    # parameter order reversed: shape, then scale
s <- rnorm(many, 0.04, 0.0055)
Zp <- delta * D - (H * tan(alpha) / cos(alpha)) / (M * sqrt(s))
risk <- Zp[Zp < 0]
length(risk)/many                # [1] 1.6e-05
Zp <- sort(Zp)[1:2000]           # plot only the lower tail; else plotting takes too long
plot(Zp, (1:length(Zp))/many, t='s', xlab='Z', ylab='Probability', ylim=c(0,1))

Intervals are replaced by uniform distributions or, if they're parameters, by their midpoint values.
24. Selecting p-box inputs
25. Where do we get p-boxes?
- Assumption
- Modeling
- Robust Bayesian analysis
- Constraint propagation
- Data with incertitude
- Measurement error
- Sampling error
- Censoring
26. Constraint propagation
27. (Maximum entropy erases uncertainty)
[Figure: nine panels contrasting maximum-entropy distributions with the p-boxes implied by the same information: mean and std; positive with a known quantile; min and max; min, max, and mean; sample data and range; integral constraint with min and max; min and mean; min, max, mean, and std; and a p-box]
28. Data of poor repute
- Data sets whose values are intervals
  - All measurements are actually intervals
  - Generalize empirical distributions to p-boxes
  - Censoring is sometimes significant
- The contrary attitude: "Interval uncertainty doesn't exist in real life"; ignore intervals or model them as uniform distributions
  - Tony O'Hagan
29. Incertitude is common in data
- Periodic observations
  - When did the fish in my aquarium die during the night?
- Plus-or-minus measurement uncertainties
  - Coarse measurements, measurements from digital readouts
- Non-detects and data censoring
  - Chemical detection limits, studies prematurely terminated
- Privacy requirements
  - Epidemiological or medical information, census data
- Theoretical constraints
  - Concentrations, solubilities, probabilities, survival rates
- Bounding studies
  - Presumed or hypothetical limits in what-if calculations
30. A tale of two data sets
Skinny          Puffy
[1.00, 1.52]    [3.5, 6.4]
[2.68, 2.98]    [6.9, 8.8]
[7.52, 7.67]    [6.1, 8.4]
[7.73, 8.35]    [2.8, 6.7]
[9.44, 9.99]    [3.5, 9.7]
[3.66, 4.58]    [6.5, 9.9]
                [0.15, 3.8]
                [4.5, 4.9]
                [7.1, 7.9]
31. Empirical p-boxes
[Figure: empirical p-boxes for the Skinny and Puffy data sets, cumulative probability 0 to 1 over x from 0 to 10]
32. Many possible precise data sets
[Figure: one precise data set consistent with the Puffy intervals, drawn inside its empirical p-box, x from 0 to 10]
33. Statistics for data that are intervals
- Some statistics are easy to compute
  - Empirical distribution, mean, median, percentiles, etc.
- Some are tricky, but easy for a computer (see the sketch below)
  - Variance, upper confidence limit, correlation, etc.
- Tradeoff between more versus better data
- Review just published as a Sandia report
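A minimal sketch of such computations, using the Puffy intervals from the previous slide (illustrative code, not the Sandia report's software):

lo <- c(3.5, 6.9, 6.1, 2.8, 3.5, 6.5, 0.15, 4.5, 7.1)   # left endpoints
hi <- c(6.4, 8.8, 8.4, 6.7, 9.7, 9.9, 3.8, 4.9, 7.9)    # right endpoints

c(mean(lo), mean(hi))                    # the mean ranges over this interval
c(quantile(lo, 0.5), quantile(hi, 0.5))  # likewise each percentile, endpoint-wise

# Variance is one of the tricky ones: its extremes are attained at some
# configuration of points inside the intervals, so exact bounds require
# optimization (easy for a computer, as the bullet above says, but not a one-liner).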
34. P-box parameter fitting
- Method of matching moments
- Regression approaches
- Maximum likelihood
35. Method of matching moments
- Mean, variance, etc. are now intervals
- Very easy for one-parameter distributions
- In general, moments are dependent
  - (The largest mean cannot be associated with the largest variance)
- Envelope distributions that extremize the moments (a sketch follows)
- Resulting p-boxes represent the assumptions about the shape family as informed by the available data
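For location-scale families like the normal, the enveloping is easy because quantiles are monotone in both parameters; a sketch in the same spirit as the slice() helper in the dike example:

m1 <- 0.039; m2 <- 0.041; s1 <- 0.005; s2 <- 0.006   # interval mean and interval std
p  <- seq(0.001, 0.999, length.out = 999)
q  <- cbind(qnorm(p, m1, s1), qnorm(p, m1, s2), qnorm(p, m2, s1), qnorm(p, m2, s2))
left  <- apply(q, 1, min)    # left bounding quantile function of the p-box
right <- apply(q, 1, max)    # right bounding quantile function of the p-box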
37. Sampling uncertainty
- When you've measured only part of a population
- Kolmogorov-Smirnov confidence intervals (sketch below)
  - Distribution-free
  - Assumes only random sampling
- Parametric versions
  - normal, lognormal, exponential, Weibull, etc.
- Bound the whole distribution
  - 95% of the time, the true distribution falls entirely inside the limits
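A sketch of the distribution-free KS band, using the asymptotic 95% critical value (exact small-sample values are tabled):

x <- sort(rnorm(25, 400000, 40000))   # illustrative sample
n <- length(x)
D <- 1.36 / sqrt(n)                   # asymptotic 95% one-sample KS critical value
Fhat  <- (1:n) / n                    # empirical CDF at the data points
upper <- pmin(Fhat + D, 1)            # upper confidence limit on the CDF
lower <- pmax(Fhat - D, 0)            # lower confidence limit on the CDF
plot(x, Fhat, type = 's', ylim = c(0, 1))
lines(x, upper, type = 's'); lines(x, lower, type = 's')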
38. Distributional confidence limits
[Figure: Kolmogorov-Smirnov (distribution-free) limits and normal-distribution parametric limits around a sample CDF; horizontal axis is volumetric heat capacity, ρCp (J/m³·°C), from 300,000 to 500,000]
39. Confidence limits on p-boxes
[Figure: 95% KS confidence limits around the p-box for the Skinny and Puffy data pooled (n = 15), x from 0 to 10]
40. Uncertainties expressible with p-boxes
- Sampling uncertainty: distributional confidence limits
- Measurement incertitude: intervals
- Uncertainty about distribution shape: constraints (non-parametric p-boxes)
- Surrogacy uncertainty: modeling
41. Example: PCBs and duck hunters
- Location: Massachusetts and Connecticut
- Receptor: adult human hunters of waterfowl
- Contaminant: PCBs (polychlorinated biphenyls)
- Exposure route: dietary consumption of contaminated waterfowl
Based on the assessment for non-cancer risks from PCBs to adult hunters who consume contaminated waterfowl described in Human Health Risk Assessment: GE/Housatonic River Site, Rest of River, Volume IV, DCN GE-031203-ABMP, April 2003, Weston Solutions (West Chester, Pennsylvania), Avatar Environmental (Exton, Pennsylvania), and Applied Biomathematics (Setauket, New York).
42. Hazard quotient
HQ = (EF × IR × C × (1 - LOSS)) / (AT × BW × RfD)
where mmms(min, max, mean, std) denotes a p-box built from those constraints:
- EF = mmms(1, 52, 5.4, 10) meals per year // exposure frequency, censored data, n = 23
- IR = mmms(1.5, 675, 188, 113) grams per meal // poultry ingestion rate from EPA's EFH
- C = [7.1, 9.73] mg per kg // exposure point (mean) concentration
- LOSS = 0 // loss due to cooking
- AT = 365.25 days per year // averaging time (not just a units conversion)
- BW = mixture(BWfemale, BWmale) // Brainard and Burmaster (1992)
- BWmale = lognormal(171, 30) pounds // adult male, n = 9,983
- BWfemale = lognormal(145, 30) pounds // adult female, n = 10,339
- RfD = 0.00002 mg per kg per day // reference dose considered tolerable
43. Inputs
[Figure: inputs plotted as exceedance risk (1 - CDF): EF (0 to 60 meals per year), IR (0 to 600 grams per meal), C (0 to 20 mg per kg), and BW (0 to 300 pounds, males and females)]
44. Results

EF = mmms(1, 52, 5.4, 10) units('meals per year')      // exposure frequency, censored data, n = 23
IR = mmms(1.5, 675, 188, 113) units('grams per meal')  // poultry ingestion rate from EPA's EFH
C = [7.1, 9.73] units('mg per kg')                     // exposure point (mean) concentration
LOSS = 0                                               // loss due to cooking
AT = 365.25 units('days per year')                     // averaging time (not just a units conversion)
BWmale = lognormal(171, 30) units('pounds')            // adult male, n = 9,983
BWfemale = lognormal(145, 30) units('pounds')          // adult female, n = 10,339
BW = mixture(BWfemale, BWmale)                         // Brainard and Burmaster (1992)
RfD = 0.00002 units('mg per kg per day')               // reference dose considered tolerable
HQ = (EF * IR * C * (1 - LOSS)) / (AT * BW * RfD)
HQ ~ (range = [5.52769, 557906], mean = [1726, 13844], var = [0, 7e9]) meals grams meal^-1 days^-1 pounds^-1 day
HQ = HQ + 0      // simplify units to a dimensionless quotient
HQ ~ (range = [0.0121865, 1229.97], mean = [3.8, 30.6], var = [0, 34557])
mean(HQ)         // [3.8072899212, 30.520271393]
sd(HQ)           // [0, 185.89511004]
median(HQ)       // [0.66730580299, 54.338212132]
cut(HQ, 95%)     // [3.5745830321, 383.33656983]
range(HQ)        // [0.012186461038, 1229.9710013]

Summary: mean = [3.8, 31], standard deviation = [0, 186], median = [0.6, 55], 95th percentile = [3.5, 384], range = [0.01, 1230]
[Figure: exceedance risk for HQ, 0 to 1000]
45. Dependence
46. Dependence
- Not all variables are independent
  - Body size and skin surface area
  - Common-cause variables
- Known dependencies should be modeled
- What can we do when we don't know them?
47. How do we handle other dependencies?
- Independent
- Perfect (comonotonic)
- Opposite (countermonotonic)
48. Perfect dependence
A+B (perfect positive)  A ∈ [1,3], p1 = 1/3   A ∈ [2,4], p2 = 1/3   A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3    A+B ∈ [3,11], 1/3     A+B ∈ [4,12], 0       A+B ∈ [5,13], 0
B ∈ [6,10], q2 = 1/3    A+B ∈ [7,13], 0       A+B ∈ [8,14], 1/3     A+B ∈ [9,15], 0
B ∈ [8,12], q3 = 1/3    A+B ∈ [9,15], 0       A+B ∈ [10,16], 0      A+B ∈ [11,17], 1/3
49. Opposite dependence
A+B (opposite)          A ∈ [1,3], p1 = 1/3   A ∈ [2,4], p2 = 1/3   A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3    A+B ∈ [3,11], 0       A+B ∈ [4,12], 0       A+B ∈ [5,13], 1/3
B ∈ [6,10], q2 = 1/3    A+B ∈ [7,13], 0       A+B ∈ [8,14], 1/3     A+B ∈ [9,15], 0
B ∈ [8,12], q3 = 1/3    A+B ∈ [9,15], 1/3     A+B ∈ [10,16], 0      A+B ∈ [11,17], 0
50. Perfect and opposite dependencies
[Figure: p-boxes for A+B under perfect and opposite dependence, over A+B from 0 to 18]
51. Uncertainty about dependence
- Sensitivity analyses are usually used
  - Vary the correlation coefficient between -1 and 1
- But this underestimates the true uncertainty
- Example: suppose X, Y ~ uniform(0, 24) but we don't know the dependence between X and Y
52. Varying the correlation coefficient
[Figure: distributions of X+Y for correlations ranging between -1 and 1, X, Y ~ uniform(0, 24), over X+Y from 0 to 50]
53. Counterexample
54. What about other dependencies?
- Independent
- Perfectly positive
- Opposite
- Positively or negatively associated
- Specified correlation coefficient
- Nonlinear dependence (copula)
- Unknown dependence
55. Fréchet inequalities
- They make no assumption about dependence (Fréchet 1935); each translates into a one-line bound (sketch below)
- max(0, P(A) + P(B) - 1) ≤ P(A & B) ≤ min(P(A), P(B))
- max(P(A), P(B)) ≤ P(A ∨ B) ≤ min(1, P(A) + P(B))
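The inequalities as R helpers (a trivial sketch):

frechet_and <- function(pa, pb) c(max(0, pa + pb - 1), min(pa, pb))   # bounds on P(A & B)
frechet_or  <- function(pa, pb) c(max(pa, pb), min(1, pa + pb))       # bounds on P(A v B)
frechet_and(0.3, 0.4)   # [0.0, 0.3]
frechet_or(0.3, 0.4)    # [0.4, 0.7]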
56. Fréchet case (no assumption)
A+B (Fréchet)           A ∈ [1,3], p1 = 1/3     A ∈ [2,4], p2 = 1/3     A ∈ [3,5], p3 = 1/3
B ∈ [2,8],  q1 = 1/3    A+B ∈ [3,11], [0,1/3]   A+B ∈ [4,12], [0,1/3]   A+B ∈ [5,13], [0,1/3]
B ∈ [6,10], q2 = 1/3    A+B ∈ [7,13], [0,1/3]   A+B ∈ [8,14], [0,1/3]   A+B ∈ [9,15], [0,1/3]
B ∈ [8,12], q3 = 1/3    A+B ∈ [9,15], [0,1/3]   A+B ∈ [10,16], [0,1/3]  A+B ∈ [11,17], [0,1/3]
57. Naïve Fréchet case
[Figure: the resulting p-box for A+B, over 0 to 18; this p-box is not best possible]
58. Fréchet can be improved
- Interval estimates of probabilities don't reflect the fact that the masses must sum to one
- The resulting p-box is too fat
- Linear programming is needed to get the optimal answer using this approach
- Frank, Nelsen and Sklar gave a way to compute the optimal answer directly
59. Frank, Nelsen and Sklar (1987)
If X ~ F and Y ~ G, then the distribution of X+Y is
F_{X+Y}(z) = ∫∫_{x+y≤z} dC(F(x), G(y))
where C is the copula between F and G. In any case, and irrespective of this dependence, this distribution is bounded by
sup_{x+y=z} max(F(x) + G(y) - 1, 0)  ≤  F_{X+Y}(z)  ≤  inf_{x+y=z} min(F(x) + G(y), 1).
This formula can be generalized to work with bounds on F and G.
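A sketch of those bounds evaluated numerically (a grid approximation of the sup and inf, assuming continuous F and G; rigorous codes handle the optimization exactly):

sum_cdf_bounds <- function(F, G, zs, xs) {
  lo <- sapply(zs, function(z) max(pmax(F(xs) + G(z - xs) - 1, 0)))  # sup of max(F+G-1, 0)
  hi <- sapply(zs, function(z) min(pmin(F(xs) + G(z - xs), 1)))      # inf of min(F+G, 1)
  list(lower = lo, upper = hi)
}
F  <- function(x) punif(x, 1, 24)      # X ~ uniform(1, 24), as on a later slide
G  <- function(y) punif(y, 1, 24)      # Y ~ uniform(1, 24)
xs <- seq(-5, 30, length.out = 2001)   # grid over which the sup/inf are searched
zs <- seq(0, 50, length.out = 201)     # values at which to bound P(X+Y <= z)
b  <- sum_cdf_bounds(F, G, zs, xs)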
60. Best possible bounds
[Figure: the optimal p-box for A+B, over 0 to 18]
61. Unknown dependence
[Figure: Fréchet bounds on the distribution of X+Y, X, Y ~ uniform(1, 24), over 0 to 50]
62. Between independence and Fréchet
- May know something about dependence that tightens better than Fréchet
- Dependence is positive (PQD)
  - P(X ≤ x, Y ≤ y) ≥ P(X ≤ x) P(Y ≤ y) for all x and y
- Variables are uncorrelated
  - Pearson correlation r is zero
- Dependence is of a particular kind
63. Unknown but positive dependence
[Figure: bounds on X+Y under positive dependence, X, Y ~ uniform(1, 24), over 0 to 50]
64. Uncorrelated variables
[Figure: bounds on X+Y for uncorrelated X, Y ~ uniform(1, 24), over 0 to 50]
65. Varying correlation between -1 and 1
[Figure: bounds on X+Y for Pearson-normal correlations between -1 and 1, X, Y ~ uniform(1, 24), over 0 to 50]
66. Can model dependence exactly too
[Figure: distributions of X+Y for X ~ normal(5, 1) and Y ~ uniform(2, 5) under various correlations and dependence functions (copulas): Frank (medial), Mardia (Kendall), and Clayton; X+Y from 4 to 12]
67. Example: dioxin inhalation
- Location: Superfund site in California
- Receptor: adults in neighboring community
- Contaminant: dioxin
- Exposure route: inhalation of windborne soil
Modified from Tables II and IV in Copeland, T.L., A.M. Holbrow, J.M. Otani, K.T. Conner and D.J. Paustenbach 1994. Use of probabilistic methods to understand the conservatism in California's approach to assessing health risks posed by air contaminants. Journal of the Air and Waste Management Association 44: 1399-1413.
68. Total daily intake from inhalation
- R = normal(20, 2) // respiration rate, m³/day
- CGL = 2 // concentration at ground level, mg/m³
- Finh = uniform(0.46, 1) // fraction of particulates retained in lung, unitless
- ED = exponential(11) // exposure duration, years
- EF = uniform(0.58, 1) // exposure frequency, fraction of a year
- BW = normal(64.2, 13.19) // receptor body weight, kg
- AT = gumbel(70, 8) // averaging time, years
69. Input distributions
[Figure: the seven inputs, including the point value CGL]
70. Results
[Figure: exceedance risk for TDI (mg kg⁻¹ day⁻¹, 0 to 2), computed with all variables mutually independent versus with no assumptions about dependencies]
71. Uncertainty about dependence
- Impossible with sensitivity analysis, since it's an infinite-dimensional problem
- Kolmogorov-Fréchet bounding lets you be sure
- Can be a large or a small consequence
72. Model uncertainty
73. Model uncertainty
- Doubt about the structural form of the model
- Usually incertitude rather than variability
- Often the elephant in the middle of the room
74. Uncertainty in probabilistic analyses
- Parameters
- Data surrogacy
- Distribution shape
- Intervariable dependence
- Arithmetic expression
- Level of abstraction
75. General strategies
- Sensitivity (what-if) studies
- Probabilistic mixture
- Bayesian model averaging
- Enveloping and bounding analyses
76. Sensitivity (what-if) studies
- Simply re-computes the analysis with alternative assumptions
- Used, e.g., by the Intergovernmental Panel on Climate Change
- No theory required to use or understand
77. Example
- The function f is one of two possibilities. Either
  f(A,B) = fPlus(A,B) = A + B
  or
  f(A,B) = fTimes(A,B) = A × B
  is the correct model, but the analyst does not know which. Suppose that A ~ triangular(-2.6, 0, 2.6) and B ~ triangular(2.4, 5, 7.6).
78. [Figure: CDFs of the two answers, Plus and Times, over X from -15 to 15]
79. Drawbacks of what-if
- Consider a long-term model of the economy under global warming stress:
  - 3 baseline weather trends
  - 3 emission scenarios
  - 3 population models
  - 3 mitigation plans
- Combinatorially complex as more model components are considered (3⁴ = 81 runs already)
- Cumbersome to summarize results
80. Probabilistic mixture
- Identify all possible models
- Translate model uncertainty into choices about distributions
- Vertically average probability densities (or cumulative probabilities)
- Can use weights to account for credibility
81. Example
[Figure: CDFs of the what-if answers and their mixture, weighted so that P(f+) = 2P(f×), over X from -15 to 15]
82. Probabilistic mixture
- State of the art in probabilistic risk analysis
  - Nuclear power plant risk assessments
- Need to know what all the possibilities are
- If you don't know the weights, assume they're equal
83. Drawbacks of mixture
- If you cannot enumerate the possible models, you can't use this approach
- Averages together incompatible theories and yields an answer that neither theory supports
- Can underestimate tail risks
84. Bayesian model averaging
- Similar to the probabilistic mixture
- Updates prior probabilities to get weights
- Takes account of available data
85. Bayesian model averaging
- Assume one of the two models is correct
- Compute the probability distribution for f(A,B)
- Read off the probability density of the observed data
  - That's the likelihood of the model
- Repeat the above steps for each model
- Compute posterior ∝ prior × likelihood
  - This gives the Bayes factors
- Use the posteriors as weights for the mixture (sketch below)
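A Monte Carlo sketch of those steps for the two candidate models, using kernel density estimates for the likelihoods (rtriangle is from the contributed 'triangle' package; the datum 7.59 is the one used on the next slide):

library(triangle)
many <- 100000
A <- rtriangle(many, -2.6, 2.6, 0)                    # A ~ triangular(-2.6, 0, 2.6)
B <- rtriangle(many, 2.4, 7.6, 5)                     # B ~ triangular(2.4, 5, 7.6)
datum <- 7.59
lik_plus  <- approx(density(A + B), xout = datum)$y   # density of A+B at the datum
lik_times <- approx(density(A * B), xout = datum)$y   # density of A*B at the datum
post <- c(lik_plus, lik_times) / (lik_plus + lik_times)  # posterior weights, equal priors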
86. [Figure: CDFs of the what-if answers, the mixture, and the Bayes model average given the datum f(A,B) = 7.59, over X from -15 to 15]
87. Bounding probabilities
- Translate model uncertainties to a choice among distributions
- Envelope the cumulative distributions
88. [Figure: the envelope of the candidate models, shown alongside the what-if, mixture, and Bayes answers, over X from -15 to 15]
89. Strategy for enumerable models
- What-if analysis isn't feasible in big problems
- The probabilistic mixture is, at best, ad hoc
- For abundant data, the Bayesian approach is best
  - Otherwise, it's probably just wishful thinking
- Bounding is reliable, but may be too wide
90. Uncertainty about distribution shape
- Suppose the correct model is known to be f(A,B) = A+B, but the distributions for the independent variables A and B are not precisely known.
[Figure: p-boxes for the two inputs and the resulting p-box for the answer]
91. Strategy for distribution shape
- Very challenging for sensitivity analysis, since it is an infinite-dimensional problem
- Bayesians usually fall back on a maximum entropy approach, which erases uncertainty rather than propagating it
- Bounding seems most reasonable, but should reflect all available information
92. Uncertainty about dependence
- Neither sensitivity studies nor Monte Carlo simulation can comprehensively assess it
- Bayesian model averaging can't even begin
- Only bounding strategies work
93. What-if sensitivity analysis
- Simple theory
- Straightforward to implement
- Doesn't confuse variability and incertitude
- Must enumerate all possible models
- Combinatorial complexity
- Hard to summarize
94. Probabilistic mixture
- Produces a single distribution as the answer
- Can account for differential credibility
- Must enumerate all possible models
- Confounds variability and incertitude
- Averages together incompatible theories
- Underestimates tail risks
95. Bayesian model averaging
- Produces a single distribution as the answer
- Can account for differential prior credibility
- Takes account of available data
- Must enumerate all possible models
- May be computationally challenging
- Confounds variability and incertitude
- Averages together incompatible theories
- Underestimates tail risks
96. Bounding probability
- Straightforward theoretically
- Yields a single mathematical object as the answer
- Doesn't confuse variability and incertitude
- Doesn't underestimate tail risks
- Cannot account for differential credibility
- Cannot take account of available data
- Optimality may be computationally expensive
97. Strategy for model uncertainty
- Bounding seems best when data are sparse
- When the answer is clear, strong assurance for the conclusion
- If the answer is too wide, you need more information to tighten it
  - Unless it's okay to mislead people about the reliability of your conclusions
98. Updating
99. Updating
- Using knowledge of how variables are related to tighten their estimates
- Removes internal inconsistency and explicates unrecognized knowledge
- Also called constraint updating or editing
- Also called natural extension
100. Example
- Suppose
  - W = [23, 33]
  - H = [112, 150]
  - A = [2000, 3200]
- Does knowing W × H = A let us say any more?
101. Answer
- Yes, we can infer that
  - W = [23, 28.57]
  - H = [112, 139.13]
  - A = [2576, 3200]
- The formulas are just W = intersect(W, A/H), etc. (see the sketch below). To get the largest possible W, for instance, let A be as large as possible and H as small as possible, and solve W = A/H = 3200/112 ≈ 28.57.
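A minimal interval sketch of one sweep of this constraint propagation:

intersect_int <- function(x, y) c(max(x[1], y[1]), min(x[2], y[2]))
W <- c(23, 33); H <- c(112, 150); A <- c(2000, 3200)
W <- intersect_int(W, c(A[1]/H[2], A[2]/H[1]))   # W = A/H in interval arithmetic
H <- intersect_int(H, c(A[1]/W[2], A[2]/W[1]))   # H = A/W
A <- intersect_int(A, c(W[1]*H[1], W[2]*H[2]))   # A = W*H
# After one sweep: W = [23, 28.57], H = [112, 139.13], A = [2576, 3200]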
102. Bayesian strategy
[Figure: prior, likelihood, and posterior for the W, H, A example]
103. Bayes' rule
- Concentrates mass onto the manifold of feasible combinations of W, H, and A
- Answers have the same supports as the intervals
- Computationally complex
- Needs specification of priors
- Yields distributions that are not justified (they come from the choice of priors)
- Expresses less uncertainty than is present
104. Updating with p-boxes
[Figure: prior p-boxes for A (2000 to 4000), W (20 to 40), and H (120 to 160)]
105. Answers
106. Calculation with p-boxes
- Agrees with interval analysis whenever inputs are intervals
- Relaxes the Bayesian strategy when precise priors are not warranted
- Produces more reasonable answers when priors are not well known
- Much easier to compute than Bayes' rule
107. Backcalculation
108. Backcalculation
- Needed for cleanup and remediation planning
- Untangles an equation in uncertain numbers when we know all but one of the variables
- For instance, backcalculation finds B such that A + B = C, from estimates for A and C
109. Hard with probability distributions
- Inverting the equation doesn't work
- Available analytical algorithms are unstable for almost all problems
- Except in a few special cases, Monte Carlo simulation cannot compute backcalculations; trial and error methods are required
110. Can't just invert the equation
- Dose = Concentration × Intake
- Concentration = Dose / Intake
- When this concentration is put back into the forward equation, the resulting dose is wider than planned
111. Backcalculation with p-boxes
- Suppose A + B = C, where
  - A = normal(5, 1)
  - C = {0 ≤ C, median ≤ 1.5, 90th percentile ≤ 35, max ≤ 50}
[Figure: the p-box for C, over 0 to 60]
112. Getting the answer
- The backcalculation algorithm basically reverses the forward convolution
- Not hard at all, but a little messy to show
- Any distribution totally inside B is sure to satisfy the constraint; it's a kernel
[Figure: the kernel p-box B, over -10 to 50]
113. Check it by plugging it back in
114. Precise distributions don't work
- Precise distributions can't express the target
- A concentration distribution giving a prescribed distribution of doses seems to say we want some doses to be high
- Any distribution to the left would be better
- A p-box on the dose target expresses this idea
115. Backcalculation algebra
- Can define untanglings for all basic operations (see the sketch below)
  - e.g., if A × B = C, then B = exp(backcalc(ln A, ln C))
- Can chain them together for big problems
- Assuming independence widens the result
- Repeated terms need special strategies
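A minimal interval-only sketch of the kernel idea for addition (the distributional version applies a similar endpoint pairing, quantile by quantile):

backcalc_add <- function(A, C) {
  B <- c(C[1] - A[1], C[2] - A[2])   # kernel: pair endpoints the opposite way from A - C
  if (B[1] > B[2]) stop('no kernel: C is narrower than A')
  B
}
A <- c(2, 4); C <- c(5, 12)
B <- backcalc_add(A, C)              # [3, 8]
c(A[1] + B[1], A[2] + B[2])          # forward check: [5, 12], inside C as required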
116. Conclusion
- Planning cleanup requires backcalculation
- Monte Carlo methods don't generally work, except in a trial-and-error approach
- Can express the dose target as a p-box
117. Sensitivity analysis
118. Sensitivity analysis with p-boxes
- Local sensitivity via derivatives
- Explored macroscopically over the uncertainty in the input
- Describes the ensemble of tangent slopes to the function over the range of uncertainty
119. [Figure: a monotone function and a nonlinear function, each shown with the range of the input projected through to the range of the output]
120. Sensitivity analysis of p-boxes
- Quantifies the reduction in uncertainty of a result when an input is pinched
- Pinching is hypothetically replacing it by a less uncertain characterization
121. Pinching to a point value
[Figure: a p-box for X over 0 to 3, pinched to a single point value]
122. Pinching to a (precise) distribution
[Figure: a p-box for X over 0 to 3, pinched to a precise distribution]
123. Pinching to a zero-variance interval
[Figure: a p-box for X over 0 to 3, pinched to an interval]
- Assumes the value is constant, but unknown
- There's no analog of this in Monte Carlo
124. Using sensitivity analyses
- There is only one take-home message:
- Shortlisting variables for treatment is bad
  - It reduces dimensionality, but erases uncertainty (a scoring sketch follows)
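One common way to score a pinching study is the percent reduction in the area between the bounds of the result; the deck does not prescribe a particular measure, so this is just one choice (illustrative numbers):

uncert <- function(left, right) mean(right - left)            # area between bounding quantiles
p <- seq(0.005, 0.995, length.out = 199)
base_l <- qnorm(p, 0.9, 0.20); base_r <- qnorm(p, 1.1, 0.30)  # result with all inputs uncertain
pin_l  <- qnorm(p, 1.0, 0.25); pin_r  <- qnorm(p, 1.05, 0.27) # result after pinching one input
100 * (1 - uncert(pin_l, pin_r) / uncert(base_l, base_r))     # percent reduction, about 75% here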
125. Differential equations
126. Uncertainty usually explodes
[Figure: interval bounds on x widening over time]
The explosion can be traced to numerical instabilities.
127. Uncertainty
- Artifactual uncertainty
  - Too few polynomial terms
  - Numerical instability
  - Can be reduced by a better analysis
- Authentic uncertainty
  - Genuine unpredictability due to input uncertainty
  - Cannot be reduced by a better analysis, only by more information, data or assumptions
128. Uncertainty propagation
- We want the prediction to break down if that's what should happen
- But we don't want artifactual uncertainty
  - Numerical instabilities
  - Wrapping effect
  - Dependence problem
  - Repeated parameters
129. Problem
- Nonlinear ordinary differential equation (ODE)
  dx/dt = f(x, θ)
  with uncertain θ and uncertain initial state x0
- Information about θ and x0 comes as
  - interval ranges
  - probability distributions
  - probability boxes
130. Model: initial states (ranges) and parameters (ranges)
VSPODE (Mark Stadtherr et al., Notre Dame): an interval Taylor series code using Taylor models; see also COSY and VNODE. It returns the solution as a list of constants plus a remainder interval.
131. Example ODE
- dx1/dt = θ1 x1 (1 - x2)
- dx2/dt = θ2 x2 (x1 - 1)
- What are the states at t = 10?
  - x0 = (1.2, 1.1)ᵀ
  - θ1 ∈ [2.99, 3.01]
  - θ2 ∈ [0.99, 1.01]
- VSPODE settings
  - constant step size h = 0.1, order of Taylor model q = 5,
  - order of interval Taylor series k = 17, QR factorization
132. VSPODE tells how to compute x1
x1(10) =
   1.916037656181642 θ̂2
 + 0.689979149231081 θ̂1
 - 4.690741189299572 θ̂2^2
 - 2.275734193378134 θ̂1 θ̂2
 - 0.450416914564394 θ̂1^2
 - 29.788252573360062 θ̂2^3
 - 35.200757076497972 θ̂1 θ̂2^2
 - 12.401600707197074 θ̂1^2 θ̂2
 - 1.349694561113611 θ̂1^3
 + 6.062509834147210 θ̂2^4
 - 29.503128650484253 θ̂1 θ̂2^3
 - 25.744336555602068 θ̂1^2 θ̂2^2
 - 5.563350070358247 θ̂1^3 θ̂2
 - 0.222000132892585 θ̂1^4
 + 218.607042326120308 θ̂2^5
 + 390.260443722081675 θ̂1 θ̂2^4
 + 256.315067368131281 θ̂1^2 θ̂2^3
 + 86.029720297509172 θ̂1^3 θ̂2^2
 + 15.322357274648443 θ̂1^4 θ̂2
 + 1.094676837431721 θ̂1^5
 + [1.1477537620811058, 1.1477539164945061]
where the θ̂'s are centered forms of the parameters: θ̂1 = θ1 - 3, θ̂2 = θ2 - 1
133. Input p-boxes
[Figure: p-boxes for θ1 and θ2 under four information levels: uniform; normal; (min, max, mean, var), e.g., mmms(2.99, 3.01, 3, 0.005); and precise]
134. Results
[Figure: p-boxes for x1 and x2 at t = 10 under each input treatment: uniform, normal, (min, max, mean, var), and precise]
135. Still repeated uncertainties
The Taylor-model polynomial for x1 (slide 132) contains many repeated occurrences of θ̂1 and θ̂2, so naive interval evaluation of it still inflates the uncertainty.
136. Subinterval reconstitution
- Subinterval reconstitution (SIR), sketched below
  - Partition the inputs into subintervals
  - Apply the function to each subinterval
  - Form the union of the results
- Still rigorous, but often tighter
  - The finer the partition, the tighter the union
- Many strategies for partitioning
- Apply to each cell in the Cartesian product
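A sketch of SIR on a function with a repeated variable, f(x) = x(1 - x) over [0, 1], whose naive interval evaluation is inflated (the true range is [0, 0.25]):

f_int <- function(lo, hi) {          # naive interval evaluation; x appears twice
  prods <- c(lo*(1-lo), lo*(1-hi), hi*(1-lo), hi*(1-hi))
  c(min(prods), max(prods))
}
sir <- function(n) {                 # partition, evaluate each piece, take the union
  cuts <- seq(0, 1, length.out = n + 1)
  parts <- t(sapply(1:n, function(i) f_int(cuts[i], cuts[i+1])))
  c(min(parts[, 1]), max(parts[, 2]))
}
sir(1)      # [0, 1.00]: the naive answer
sir(10)     # [0, 0.30]: tighter
sir(100)    # [0, 0.2550]: converging toward the true range [0, 0.25]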
137. Discretizations
[Figure: discretization of θ1 ∈ [2.99, 3.01] into subintervals]
138. Contraction from SIR
[Figure: p-boxes for x1 (0.87 to 0.90) and x2 (1.12 to 1.16) contracting as the partition is refined; the best possible bounds reveal the authentic uncertainty]
139. Monte Carlo is more limited
- Monte Carlo cannot propagate incertitude
- Monte Carlo cannot produce validated results
  - Though results can be checked by repeating the simulation
- Validated results from distributions can be obtained by modeling inputs with (narrow) p-boxes and applying probability bounds analysis
- Results converge to the narrow p-boxes obtained from infinitely many Monte Carlo replications
140. Conclusions
- VSPODE is useful for bounding solutions of parametric nonlinear ODEs
- Probability bounds analysis is useful when distributions are known imprecisely
- Together, they rigorously propagate uncertainty about initial states and parameters through a nonlinear ODE, whether expressed as
  - intervals
  - distributions
  - p-boxes
141. Conclusions
142. Rigorousness
- Automatically verified calculations
- The computations are guaranteed to enclose the true results (so long as the inputs do)
- You can still be wrong, but the method won't be the reason if you are
143. How to use the results
- When uncertainty makes no difference (because results are so clear), bounding gives confidence in the reliability of the decision
- When uncertainty obscures the decision:
  - (i) use results to identify inputs to study better, or
  - (ii) use other criteria within probability bounds
144. Can uncertainty swamp the answer?
- Sure, if uncertainty is huge
- This should happen (it's not unhelpful)
- If you think the bounds are too wide, then put in whatever information is missing
- If there isn't any such information, do you want to mislead your readers?
145. Monte Carlo is problematic
- Probability doesn't accumulate gross uncertainty in an intuitive way
- Precision of the answer (measured as CV) depends strongly on the number of inputs
- The more inputs, the tighter the answer, irrespective of the distribution shape
146. A few grossly uncertain inputs
147. A lot of grossly uncertain inputs...
Where does this surety come from? What justifies it?
148. Smoke-and-mirrors certainty
- P-boxes give a vacuous answer if all you provide are vacuous inputs
- Conventional probability theory, at least as it's naively applied, seems to manufacture certainty