Title: Imprecise probabilities in engineering design

1. Imprecise probabilities in engineering design
Scott Ferson, Applied Biomathematics (scott_at_ramas.com)
Workshop on Uncertainty Representation in Robust and Reliability-based Design
ASME DETC/CIE, Philadelphia, 10 September 2006
2. Imprecise probabilities (IP)
- Credal set (of possible probability measures)
- Relaxes the idea of a single probability measure
- Coherent upper and lower previsions
- de Finetti's notion of a fair price
- Generalizes probability and expectation
- Gambles
3. Three pillars of IP
- Behavioral definition of probability
- Can be operationalized
- Natural extension
- Linear programming to compute answers
- Rationality criteria
- Avoiding sure losses (Dutch books)
- Coherence (logical closure)

Avoiding sure loss (ASL) means you cannot be made into a money pump; inverted interval bounds on a probability would violate ASL. Coherence means fully recognizing the implications of your betting rates; P(A ∪ B) > P(A) + P(B), for disjoint A and B, would violate coherence.
4. Probability of an event
- Imagine a gamble that pays one dollar if an event occurs (but nothing otherwise)
- How much would you pay to buy this gamble?
- How much would you be willing to sell it for?
- Probability theory requires the same price for both
- By asserting the probability of the event, you agree to buy any such gamble offered for this amount or less, and to sell the same gamble for any amount greater than or equal to this fair price, and for every event!
- IP just says that, sometimes, your highest buying price might be smaller than your lowest selling price
5. Credal set
- Knowledge and judgments are used to define a set of possible probability measures M
- All distributions within bounds are possible
- Only distributions having a given shape
- Probability of an event is within some interval
- Event A is at least as probable as event B
- Nothing is known about the probability of C
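For a finite set of candidate probability measures, the lower and upper probabilities of an event are just minima and maxima over the credal set. A toy sketch (the three candidate measures below are invented for illustration):

```python
# Lower and upper probability of an event over a small credal set.
# Outcomes are indexed 0..2; each measure is a probability vector over them.
credal_set = [
    (0.2, 0.5, 0.3),   # hypothetical candidate measures
    (0.4, 0.4, 0.2),
    (0.1, 0.6, 0.3),
]

def lower_upper(event):
    """event is a set of outcome indices; returns (lower, upper) probability."""
    probs = [sum(p[i] for i in event) for p in credal_set]
    return min(probs), max(probs)

lo, hi = lower_upper({0, 1})
print(round(lo, 3), round(hi, 3))   # → 0.7 0.8
```

If the event's probability were precise, lo and hi would coincide; the gap between them is exactly the imprecision the credal set expresses.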
6. IP generalizes other approaches
- Probability theory
- Bayesian analysis
- Worst-case analysis, info-gap theory
- Possibility / necessity models
- Dempster-Shafer theory, belief / plausibility functions
- Probability intervals, probability bounds analysis
- Lower/upper mass/density functions
- Robust Bayes, Bayesian sensitivity analysis
- Random set models
- Coherent lower previsions

Related structures: de Finetti probability measures, credal sets, distributions with interval-valued parameters, contamination models, Choquet capacities, 2-monotone capacities
7. Assumptions
- Everyone makes assumptions
- But not all sets of assumptions are equal!
- Linear / Gaussian / Independent
- Monotonic / Unimodal / Known correlation sign
- Any function / Any distribution / Any dependence
- IP doesn't require unwarranted assumptions
- "Certainties lead to doubt; doubts lead to certainty"
8. Activities in engineering design
- Decision making
- Optimization
- Constraint propagation
- Convolutions
- Arithmetic
- Logic (event trees)
- Updating
- Validation
- Sensitivity analyses

[Slide annotations mark how frequently each activity arises: "often", "sometimes", "a lot"]
9. Convolutions (i.e., adding, multiplying, and-gating, or-gating, etc., for quantifying the reliability or risk associated with a design)
10. Probability boxes (p-boxes)
Interval bounds on a cumulative distribution function (CDF)

[Figure: a p-box; vertical axis "Cumulative probability" from 0 to 1, horizontal axis X from 0.0 to 3.0]
11. A few ways p-boxes arise
[Figure: CDF plots (vertical axis 0 to 1), including a precise distribution]
12. P-box arithmetic (and logic)
- All standard mathematical operations
- Arithmetic operations (+, −, ×, ÷, ^, min, max)
- Logical operations (and, or, not, if, etc.)
- Transformations (exp, ln, sin, tan, abs, sqrt, etc.)
- Other operations (envelope, mixture, etc.)
- Faster than Monte Carlo
- Guaranteed to bound the answer
- Optimal answers generally require LP
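A minimal sketch of how such bounds can be computed for the sum of two independent uncertain numbers: represent each p-box by equally weighted atoms discretizing its two bounding CDFs, and convolve the bounds. This is an illustrative discretization, not the outward-rounded algorithm production tools use.

```python
# Sketch: bounds on the CDF of Z = X + Y under independence.
# A p-box is a dict of two lists of equally weighted atoms:
# "hi" discretizes the upper (left) bounding CDF, "lo" the lower (right) one.
from itertools import product

def independent_sum(x, y):
    """Pairwise sums of atoms bound the sum's CDF (illustrative, not rigorous)."""
    return {
        "hi": sorted(a + b for a, b in product(x["hi"], y["hi"])),
        "lo": sorted(a + b for a, b in product(x["lo"], y["lo"])),
    }

def cdf_bounds(pbox, z):
    """Bounds on P(Z <= z): fraction of atoms at or below z in each bound."""
    lo = sum(v <= z for v in pbox["lo"]) / len(pbox["lo"])
    hi = sum(v <= z for v in pbox["hi"]) / len(pbox["hi"])
    return lo, hi

# X is only known to lie in [1, 2]; Y only in [3, 4] (interval p-boxes)
X = {"hi": [1.0, 1.0], "lo": [2.0, 2.0]}
Y = {"hi": [3.0, 3.0], "lo": [4.0, 4.0]}
Z = independent_sum(X, Y)
print(cdf_bounds(Z, 5.0))   # → (0.0, 1.0): P(Z <= 5) is only known to lie in [0, 1]
```

When both inputs are precise distributions the two bounds coincide and the result is the ordinary convolution, which is the "generalizes probability theory" behavior described later in the talk.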
13. Example
- Calculate A + B + C + D, with partial information:
- A's distribution is known, but not its parameters
- B's parameters are known, but not its shape
- C has a small empirical data set
- D is known to be a precise distribution
- Bounds assuming independence?
- Without any assumption about dependence?
14. Example (continued)
- A = lognormal, mean [0.5, 0.6], variance [0.001, 0.01]
- B: min = 0, max = 0.5, mode = 0.3
- C: sample data = 0.2, 0.5, 0.6, 0.7, 0.75, 0.8
- D = uniform(0, 1)

[Figure: four panels showing the p-boxes for A, B, C, and D as CDF bounds, each on roughly the unit interval]
15. A + B + C + D
[Figure: resulting p-box under independence; cumulative probability 0 to 1, horizontal axis 0.0 to 3.0]
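One common way p-boxes like A's arise is by enveloping the CDFs of a parametric family over its interval-valued parameters. A rough sketch of that idea (using a normal family rather than the slide's lognormal, and a coarse parameter grid, so this is an inner approximation rather than guaranteed bounds):

```python
# Envelope of a parametric CDF family over interval-valued parameters.
import math

def normal_cdf(x, mu, sigma):
    """Normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def envelope_cdf(x, mus, sigmas):
    """Pointwise min/max of the CDFs over a grid of parameter values."""
    vals = [normal_cdf(x, m, s) for m in mus for s in sigmas]
    return min(vals), max(vals)   # (lower CDF bound, upper CDF bound)

# hypothetical interval parameters: mean in [5, 6], sd in [1, 2]
mus = [5.0, 5.5, 6.0]
sigmas = [1.0, 1.5, 2.0]
lo, hi = envelope_cdf(5.5, mus, sigmas)
print(lo, hi)   # the CDF value at 5.5 is only pinned down to [lo, hi]
```

A rigorous p-box would use the exact parameter intervals (here monotonicity in the mean makes the grid endpoints sufficient) and round the bounds outward.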
16. Generalization of methods
- Marries interval analysis with probability theory
- When information is abundant, same as probability theory
- When inputs are only ranges, agrees with interval analysis
- Can't get these answers from Monte Carlo methods
- Fewer assumptions
- Not just different assumptions
- Distribution-free methods
- Rigorous results
- Automatically verified calculations
- Built-in quality assurance
17. Can uncertainty swamp the answer?
- Sure, if uncertainty is huge
- This should happen (it's not unhelpful)
- If you think the bounds are too wide, then put in whatever information is missing
- If there isn't any such information, do you want the results to mislead?
18. Decision making
19. Knight's dichotomy
- Decisions under risk
- The probabilities of various outcomes are known
- Maximize expected utility
- Not good for big unique decisions or when gambler's ruin is possible
- Decisions under uncertainty
- Probabilities of the outcomes are unknown
- Several strategies, depending on the analyst
20. Decisions under uncertainty
- Pareto (some strategy dominates in all scenarios)
- Maximin (largest minimum payoff)
- Maximax (largest maximum payoff)
- Hurwicz (largest average of min and max payoffs)
- Minimax regret (smallest maximum regret)
- Bayes-Laplace (maximum expected payoff assuming scenarios are equiprobable)
21. Decision making in IP
- State of the world is a random variable, X ∈ 𝒳
- Outcome (reward) of an action depends on X
- We identify an action a with its reward fa : 𝒳 → ℝ
- In principle, we'd like to choose the decision with the largest expected reward, but how do we do this?
- We explore how the decision changes for different probability measures in M, the set of possible ones
22. Comparing actions a and b
- Strictly preferred: a > b iff Ep(fa) > Ep(fb) for all p ∈ M
- Almost preferred: a ≥ b iff Ep(fa) ≥ Ep(fb) for all p ∈ M
- Indifferent: a ≡ b iff Ep(fa) = Ep(fb) for all p ∈ M
- Incomparable: a || b iff Ep(fa) < Ep(fb) and Eq(fa) > Eq(fb) for some p, q ∈ M
- where Ep(f) = Σx∈𝒳 p(x) f(x), and M is the set of possible probability distributions
23. E-admissibility
- Vary p in M and, assuming it is the correct probability measure, see which decision emerges as the one that maximizes expected utility
- The result is the set of all such decisions for all p ∈ M
24. Alternative: maximality
- Maximal decisions are undominated: for every action b, Ep(fa) ≥ Ep(fb) for some p ∈ M
- Actions cannot be linearly ordered, but only partially ordered
25. Another alternative: Γ-maximin
- We could take the decision that maximizes the worst-case expected reward
- Essentially a worst-case optimization
- Generalizes two criteria from traditional theory
- Maximize expected utility
- Maximin
26. Several IP decision criteria
- Γ-maximax
- Γ-maximin
- E-admissible
- maximal
- interval dominance
27. Example (due to Troffaes 2004)
- Suppose we are betting on a coin toss
- Only know probability of heads ∈ [0.28, 0.7]
- Want to decide among six available gambles
- 1: Pays 4 for heads, pays 0 for tails
- 2: Pays 0 for heads, pays 4 for tails
- 3: Pays 3 for heads, pays 2 for tails
- 4: Pays ½ for heads, pays 3 for tails
- 5: Pays 2.35 for heads, pays 2.35 for tails
- 6: Pays 4.1 for heads, pays −0.3 for tails

f1(H) = 4, f1(T) = 0; f2(H) = 0, f2(T) = 4; f3(H) = 3, f3(T) = 2; f4(H) = ½, f4(T) = 3; f5(H) = 2.35, f5(T) = 2.35; f6(H) = 4.1, f6(T) = −0.3
28. E-admissibility
- M is a one-dimensional space of probability measures
- p(H) < 2/5: 2
- p(H) = 2/5: 2, 3 (indifferent)
- 2/5 < p(H) < 2/3: 3
- p(H) = 2/3: 1, 3 (indifferent)
- 2/3 < p(H): 1
29. Criteria yield different answers
- Γ-maximax: 2
- Γ-maximin: 5
- E-admissible: 1, 2, 3
- maximal: 1, 2, 3, 5
- interval dominance: 1, 2, 3, 5, 6
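These answers can be checked numerically. Because the expected reward Ep(f) = p·f(H) + (1−p)·f(T) is linear in p = p(H), every expectation over p ∈ [0.28, 0.7] attains its extremes at the interval endpoints, which makes each criterion a few lines of code:

```python
# Verify the coin-toss example's answers for five IP decision criteria.
gambles = {1: (4, 0), 2: (0, 4), 3: (3, 2), 4: (0.5, 3),
           5: (2.35, 2.35), 6: (4.1, -0.3)}
p_lo, p_hi = 0.28, 0.7

def E(g, p):
    h, t = gambles[g]
    return p * h + (1 - p) * t

lower = {g: min(E(g, p_lo), E(g, p_hi)) for g in gambles}
upper = {g: max(E(g, p_lo), E(g, p_hi)) for g in gambles}

gamma_maximin = max(gambles, key=lambda g: lower[g])   # best worst case
gamma_maximax = max(gambles, key=lambda g: upper[g])   # best best case

# Interval dominance: discard g if some gamble's lower bound beats g's upper bound
interval_dom = sorted(g for g in gambles if upper[g] >= max(lower.values()))

# Maximality: g survives unless some b has E_p(f_b) > E_p(f_g) for ALL p
# (by linearity it suffices to check both endpoints)
def dominated(g):
    return any(E(b, p_lo) > E(g, p_lo) and E(b, p_hi) > E(g, p_hi)
               for b in gambles)
maximal = sorted(g for g in gambles if not dominated(g))

# E-admissibility: which gambles maximize E_p for some p in [0.28, 0.7]?
grid = [p_lo + k * (p_hi - p_lo) / 200 for k in range(201)]
e_admissible = sorted({max(gambles, key=lambda g: E(g, p)) for p in grid})

print(gamma_maximin, gamma_maximax, e_admissible, maximal, interval_dom)
# → 5 2 [1, 2, 3] [1, 2, 3, 5] [1, 2, 3, 5, 6]
```

The output reproduces the slide's table exactly, including gamble 6 surviving interval dominance but failing maximality (gamble 1 beats it for every p in the interval).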
30. So many answers
- Topic of current discussion and research
- Different criteria are useful in different settings
- The more precise the input, the tighter the outputs
- Γ criteria usually yield only one decision
- Γ criteria not good if many sequential decisions
- Some argue that E-admissibility is best overall
- Maximality is close to E-admissibility, but much easier to compute, especially for large problems
31. IP versus traditional approaches
- Decisions under IP allow indecision when your uncertainty entails it
- Bayes always produces a single decision (up to indifference), no matter how little information may be available
- IP unifies the two poles of Knight's division into a continuum
32. Comparison to Bayesian approach
- Axioms identical except IP doesn't use completeness
- Bayesian rationality implies not only avoidance of sure loss and coherence, but also the idea that an agent must agree to buy or sell any bet at one price
- Uncertainty about probability is meaningful, and it's operationalized as the difference between the max buying price and min selling price
- If you know all the probabilities (and utilities) perfectly, then IP reduces to Bayes
33. Why Bayes fares poorly
- Bayesian approaches don't distinguish ignorance from equiprobability
- Neuroimaging and clinical psychology show humans strongly distinguish uncertainty from risk
- Most humans regularly and strongly deviate from Bayes
- Hsu et al. (2005) reported that people with brain lesions at the site believed to handle uncertainty behave according to the Bayesian normative rules
- Bayesians are too sure of themselves (e.g., Clippy)
34. Robust Bayes
35. Derivation of Bayes' rule
- P(A ∩ B) = P(B) P(A|B) = P(A) P(B|A)
- P(A|B) = P(A) P(B|A) / P(B)
- The prevalence of a disease in the general population is 0.01%.
- If a diseased person is tested, there's a 99.9% chance the test is positive.
- If a healthy person is tested, there's a 99.99% chance the test is negative.
- If you test positive, what's the chance you have the disease?

Almost all doctors say 99% or greater, but the true answer is about 50%.
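The arithmetic behind the 50% answer is a direct application of the rule above (note that the prevalence must be read as 0.01%, i.e. 0.0001, for the slide's numbers to work out):

```python
# Posterior probability of disease given a positive test, via Bayes' rule.
prevalence = 0.0001      # P(disease) = 0.01%
sensitivity = 0.999      # P(positive | disease)
specificity = 0.9999     # P(negative | healthy)

# Total probability of a positive test: true positives plus false positives.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Bayes' rule: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))   # → 0.5
```

The intuition: at this prevalence, the rare false positives among the huge healthy population are about as numerous as the true positives among the tiny diseased one.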
36. Bayes' rule on distributions
- posterior ∝ prior × likelihood

[Figure: prior and likelihood curves combining into a normalized posterior]
37. Two main problems
- Subjectivity required
- Beliefs needed for priors may be inconsistent with public policy/decision making
- Inadequate model of ignorance
- Doesn't distinguish between ignorance and equiprobability
38. Solution: study robustness
- Answer is robust if it doesn't depend sensitively on the assumptions and inputs
- Robust Bayes analysis, also called Bayesian sensitivity analysis, investigates this
39. Uncertainty about the prior
- class of prior distributions → class of posteriors

[Figure: a family of priors and a fixed likelihood yielding a family of posteriors]
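A toy illustration of the idea, using a class of conjugate Beta priors for a binomial proportion (the prior class and data below are invented): each prior in the class yields its own posterior, so a posterior summary such as the mean is only known to an interval.

```python
# Robust Bayes sketch: a class of Beta(a, b) priors for a binomial proportion.
# With k successes in n trials, the posterior is Beta(a + k, b + n - k),
# whose mean is (a + k) / (a + b + n).

k, n = 7, 10   # hypothetical data: 7 successes in 10 trials

# hypothetical prior class: all Beta(a, b) with a and b ranging over [1, 5]
prior_class = [(a, b) for a in (1, 2, 3, 4, 5) for b in (1, 2, 3, 4, 5)]

post_means = [(a + k) / (a + b + n) for a, b in prior_class]
print(min(post_means), max(post_means))   # → 0.5 0.75
```

A single prior would report one posterior mean; the class reports the interval [0.5, 0.75], and the answer is robust only if that interval is narrow enough for the decision at hand.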
40. Uncertainty about the likelihood
- class of likelihood functions → class of posteriors

[Figure: a family of likelihoods and a fixed prior yielding a family of posteriors]
41. Uncertainty about both
[Figure: classes of priors and likelihoods yielding a class of posteriors]
42. Uncertainty about decisions
- class of probability models → class of decisions
- class of utility functions → class of decisions
- If you end up with a single decision, great.
- If the class of decisions is large and diverse, then any conclusion should be rather tentative.
43. Bayesian dogma of ideal precision
- Robust Bayes is inconsistent with the Bayesian idea that uncertainty should be measured by a single additive probability measure and values should always be measured by a precise utility function.
- Some Bayesians justify it as a convenience
- Others suggest it accounts for uncertainty beyond probability theory
44. Sensitivity analysis
45. Sensitivity analysis with p-boxes
- Local sensitivity via derivatives
- Explored macroscopically over the uncertainty in the input
- Describes the ensemble of tangent slopes to the function over the range of uncertainty
46. [Figure: two panels, "Monotone function" and "Nonlinear function", each showing the range of input mapped through the function]
47. Sensitivity analysis of p-boxes
- Quantifies the reduction in uncertainty of a result when an input is pinched
- Pinching means hypothetically replacing an input by a less uncertain characterization
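A bare-bones version of the idea, with intervals standing in for p-boxes (the inputs are invented): pinch one input to a point value and see how much the output's uncertainty shrinks.

```python
# Pinching sketch with interval arithmetic on y = a + b.
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def width(x):
    return x[1] - x[0]

a, b = (2.0, 8.0), (0.0, 4.0)   # hypothetical uncertain inputs
base = add(a, b)                 # (2.0, 12.0), width 10

a_pinched = (5.0, 5.0)           # hypothetically replace a by a point value
pinched = add(a_pinched, b)      # (5.0, 9.0), width 4

reduction = 100 * (1 - width(pinched) / width(base))
print(f"{reduction:.0f}% reduction in uncertainty from pinching a")   # → 60%
```

With real p-boxes the same bookkeeping is done on the breadth of the output p-box, input by input, to rank which inputs' uncertainty matters most.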
48. Pinching to a point value
[Figure: two panels of cumulative probability versus X (0 to 3); the p-box on the left is replaced by a single point value on the right]
49. Pinching to a (precise) distribution
[Figure: two panels of cumulative probability versus X (0 to 3); the p-box on the left is replaced by a precise distribution on the right]
50. Pinching to a zero-variance interval
[Figure: cumulative probability versus X (0 to 3); the p-box is replaced by a zero-variance interval]
- Assumes the value is constant, but unknown
- There's no analog of this in Monte Carlo
51. Using sensitivity analyses
- There is only one take-home message
- Shortlisting variables for treatment is bad
- Reduces dimensionality, but erases uncertainty
52. Validation
53. How the data come
[Figure: scatter of temperature (degrees Celsius, 200 to 400) against time (seconds, 600 to 1000)]
54. How we look at them
55. One suggestion for a metric
Area (or average horizontal distance) between the empirical distribution Sn and the predicted distribution

[Figure: probability (0 to 1) versus temperature (200 to 450), showing Sn and the prediction]
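The metric can be sketched directly: integrate the absolute gap between the step empirical CDF Sn and the predicted CDF over the range of interest. The data and the uniform(0, 1) prediction below are made up for illustration.

```python
# Area validation metric: integral of |Sn(x) - F(x)| dx over a grid.
def empirical_cdf(data):
    xs = sorted(data)
    def Sn(x):
        return sum(v <= x for v in xs) / len(xs)
    return Sn

def area_metric(data, F, lo, hi, steps=10000):
    """Numerically integrate |Sn(x) - F(x)| between lo and hi (midpoint rule)."""
    Sn = empirical_cdf(data)
    dx = (hi - lo) / steps
    return sum(abs(Sn(lo + (i + 0.5) * dx) - F(lo + (i + 0.5) * dx)) * dx
               for i in range(steps))

# hypothetical data compared against a uniform(0, 1) prediction
data = [0.2, 0.5, 0.6, 0.7, 0.75, 0.8]
F = lambda x: min(max(x, 0.0), 1.0)   # CDF of uniform(0, 1)
print(round(area_metric(data, F, 0.0, 1.0), 3))
```

A value of zero means the data sit exactly on the prediction; larger areas mean larger discrepancy, in the units of the physical quantity.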
56. Pooling data comparisons
- When data are to be compared against a single distribution, they're pooled into Sn
- When data are compared against different distributions, this isn't possible
- Conformance must be expressed on some universal scale
57. Universal scale
Examples (Risk Calc output):
- N(2, 0.6) = normal(range = [0.4545, 3.5455], mean = 2, var = 0.36)
- max(0.0001, exponential(1.7)): range = [0.0001, 9.00714], mean = [1.699999, 1.7001], var = [2.43, 2.89]
- mix(U(1,5), N(10,1)) × 2.3: range = [2.3, 28.9244], mean = 14.95, var = 70.9742

[Figure: three CDF panels (probability 0 to 1) for these quantities, on axes spanning roughly 1-1000 (log scale), 0-4, and 0-10]

- ui = Fi(xi), where the xi are the data and the Fi are their respective predictions
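The transformation ui = Fi(xi) (the probability integral transform) puts each datum onto the common [0, 1] scale of its own prediction, so data validated against different distributions become comparable. A small sketch with invented data and predictions:

```python
# Probability integral transform: map each datum through its predicted CDF.
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expo_cdf(x, mean):
    return 1.0 - math.exp(-x / mean)

# hypothetical data, each with its own predicted distribution
pairs = [
    (2.1, lambda x: normal_cdf(x, 2.0, 0.6)),   # datum predicted by N(2, 0.6)
    (0.9, lambda x: expo_cdf(x, 1.7)),          # datum predicted by exponential(1.7)
]
u = [F(x) for x, F in pairs]   # all u_i now live on the same [0, 1] scale
print([round(v, 3) for v in u])
```

If every prediction were exactly right, the ui would behave like a uniform(0, 1) sample; systematic departures from uniformity signal model-data disagreement.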
58. Backtransforming to physical scale
[Figure: the common-scale values u are mapped through G⁻¹ back to a physical scale; probability axes 0 to 1]
59. Backtransforming to physical scale
- The distribution of G⁻¹(Fi(xi)) represents the empirical data (as Sn does) but in a common, transformed scale
- Could pick any of many scales, and each leads to a different value for the metric
- The likely distribution of interest is the one used for the validation statement
60. Epistemic uncertainty in predictions
Risk Calc session (the prediction is a p-box, the observation a point):
a = N([5,11], 1); b = 8.1; then b = 15: breadth(env(rightside(a), b)) = 4.023; then b = 11: breadth(env(rightside(a), b)) / 2 = 0.409

[Figure: three panels of probability versus value (0 to 20) with discrepancies d = 0, d ≈ 4, and d ≈ 0.4]

- In the left panel, the datum evidences no discrepancy at all
- In the middle, the discrepancy is relative to the edge of the prediction
- In the right, the discrepancy is even smaller
61. Epistemic uncertainty in both
Risk Calc session: the prediction a = N([6,7], 1) − 1 is compared against observations expressed as mixtures of intervals; breadth(env(rightside(a), b)) again quantifies the discrepancy.

[Figure: three panels of probability versus value (0 to 10) with discrepancies d = 0, d ≈ 0.05, and d ≈ 0.07; predictions in white, observations in blue]
62. Backcalculation
63. A typical problem
- How can we design a shielding system if we can't well specify the radiation distribution?
- Could plan for worst-case analysis
- Often wasteful
- Can't account for rare, even worse extremes
- Could pretend we know the distribution
- Unreasonable for new designs or environments
64. IP solution
- Natural compromise that can express both
- Gross uncertainty like intervals and worst cases
- Distributional information about tail risks
- Need to solve equations containing uncertain numbers
- Constraint propagation, or backcalculation
65. Can't just invert the equation
- Total ionizing dose = Radiation / Shielding
- Shielding = Radiation / Dose
- When Shielding is put back into the forward equation, the resulting dose is wider than planned
66. How come?
a = [2, 8]; b = [0, 4]; c = a × b = [0, 32]
bb = c / a = [0, 16]; cc = a × bb = [0, 128]
128 / 32 = 4

- Suppose dose should be less than 32, and radiation ranges between 50 and 200
- If we solved for shielding by division, we'd get a distribution ranging between << >>
- But if we put that answer back into the equation, Dose = Radiation / Shielding, we'd get a distribution with values as large as 128, which is four times larger than planned
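The interval blow-up on this slide can be reproduced with a few lines of naive interval arithmetic (the rules below are valid here only because all endpoints are nonnegative):

```python
# Naive interval arithmetic for nonnegative intervals.
def imul(x, y):
    return (x[0] * y[0], x[1] * y[1])

def idiv(x, y):
    return (x[0] / y[1], x[1] / y[0])   # requires y[0] > 0

a, b = (2.0, 8.0), (0.0, 4.0)
c = imul(a, b)          # forward: c = a * b = (0, 32)
bb = idiv(c, a)         # naive inversion: bb = c / a = (0, 16)
cc = imul(a, bb)        # plug back in: cc = a * bb = (0, 128)
print(c, bb, cc)        # → (0.0, 32.0) (0.0, 16.0) (0.0, 128.0)
```

The repeated appearance of the uncertain quantity a on both sides of the division is what inflates the result: inversion is not backcalculation.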
67. Backcalculation with p-boxes
- Suppose A + B = C, where
- A = normal(5, 1)
- C: 0 ≤ C, median ≤ 15, 90th %ile ≤ 35, max ≤ 50
68. Getting the answer
- The backcalculation algorithm basically reverses the forward convolution
- Not hard at all, but a little messy to show
- Any distribution totally inside B is sure to satisfy the constraint; it's a kernel

[Figure: the kernel B plotted as a p-box over roughly −10 to 50]
69. Check it by plugging it back in
70. When you know that… and you have estimates for… use this formula to find the unknown:
- A + B = C; estimates for A, C: B = backcalc(A, C); estimates for B, C: A = backcalc(B, C)
- A − B = C; estimates for A, C: B = −backcalc(A, C); estimates for B, C: A = backcalc(−B, C)
- A × B = C; estimates for A, C: B = factor(A, C); estimates for B, C: A = factor(B, C)
- A / B = C; estimates for A, C: B = 1/factor(A, C); estimates for B, C: A = factor(1/B, C)
- A ^ B = C; estimates for A, C: B = factor(log A, log C); estimates for B, C: A = exp(factor(B, log C))
- 2A = C: A = C / 2
- A² = C: A = sqrt(C)
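For plain intervals, the backcalculation for A + B = C has a simple closed form, which we can sketch and then verify by plugging the result back into the forward equation. This is a toy stand-in for the p-box version; the function name backcalc follows the slide's table.

```python
# Interval backcalculation for A + B = C: find B so that A + B reproduces C.
def iadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def backcalc(a, c):
    """Kernel solution B = [c.lo - a.lo, c.hi - a.hi] (needs width(C) >= width(A))."""
    return (c[0] - a[0], c[1] - a[1])

A = (2.0, 3.0)
C = (10.0, 20.0)
B = backcalc(A, C)
print(B, iadd(A, B))   # → (8.0, 17.0) (10.0, 20.0): plugging B back recovers C
# Naive inversion B = C - A = (7.0, 18.0) would instead give A + B = (9.0, 21.0),
# which is wider than the planned C.
```

Note the reversed subtraction pattern (lo − lo, hi − hi) versus ordinary interval subtraction (lo − hi, hi − lo): that reversal is exactly what makes the answer plug back in without inflating.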
71. Hard with probability distributions
- Inverting the equation doesn't work
- Available analytical algorithms are unstable for almost all problems
- Except in a few special cases, Monte Carlo simulation cannot compute backcalculations; trial-and-error methods are required
72. Precise distributions don't work
- Precise distributions can't express the target
- A specification for shielding giving a prescribed distribution of doses seems to say we want some doses to be high
- Any distribution to the left would be better
- A p-box on the dose target expresses this idea
73. Conclusions
74. New organization
- In the past, focus was on where uncertainty arose
- Parameters
- Drivers
- Model structure
- Today, focus is on the nature of uncertainty
- Ignorance (epistemic uncertainty)
- Variability (aleatory uncertainty)
- Vagueness (semantic uncertainty, fuzziness)
- Confusion, mistakes
75. Untenable assumptions
- Uncertainties are small
- Sources of variation are independent
- Uncertainties cancel each other out
- Linearized models are good enough
- Underlying physics is known and modeled
- Computations are inexpensive to make
76. Need ways to relax assumptions
- Possibly large uncertainties
- Non-independent, or unknown dependencies
- Uncertainties that may not cancel
- Arbitrary mathematical operations
- Model uncertainty
77. Good engineering
[Figure: a grid contrasting "Good engineering", "Dumb luck", "Honorable failure", and "Negligence"]
78. Take-home messages
- It seems antiscientific (or at least silly) to say you know more than you do
- Bayesian decision making always yields one answer, even if this is not really tenable
- IP tells you when you need to be careful and reserve judgment
79. References
- Cosmides, L., and J. Tooby. 1996. Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition 58: 1-73.
- Hsu, M., M. Bhatt, R. Adolphs, D. Tranel, and C.F. Camerer. 2005. Neural systems responding to degrees of uncertainty in human decision-making. Science 310: 1680-1683.
- Kmietowicz, Z.W., and A.D. Pearman. 1981. Decision Theory and Incomplete Knowledge. Gower, Hampshire, England.
- Knight, F.H. 1921. Risk, Uncertainty and Profit. L.S.E., London.
- Troffaes, M. 2004. Decision making with imprecise probabilities: a short review. The SIPTA Newsletter 2(1): 4-7.
- Walley, P. 1991. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, London.
80. Web-accessible reading
- http://maths.dur.ac.uk/dma31jm/durham-intro.pdf (Gert de Cooman's gentle introduction to imprecise probabilities)
- http://www.cs.cmu.edu/qbayes/Tutorial/quasi-bayesian.html (Fabio Cozman's introduction to imprecise probabilities)
- http://idsia.ch/zaffalon/events/school2004/school.htm (summer school on imprecise probabilities)
- http://www.sandia.gov/epistemic/Reports/SAND2002-4015.pdf (introduction to p-boxes and related structures)
- http://www.ramas.com/depend.zip (handling dependencies in uncertainty modeling)
- http://www.ramas.com/bayes.pdf (introduction to Bayesian and robust Bayesian methods in risk analysis)
- http://www.ramas.com/intstats.pdf
81. End