Title: Decision Making Under Uncertainty
1Decision Making Under Uncertainty
- Russell and Norvig ch 16
- CMSC421 Fall 2006
2Utility-Based Agent
3Non-deterministic vs. Probabilistic Uncertainty
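- Non-deterministic model: the possible outcomes are {a, b, c}; choose the decision that is best for the worst case (as in adversarial search)
- Probabilistic model: each outcome has a probability; choose the decision that maximizes expected utility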
4Expected Utility
- Random variable $X$ with $n$ values $x_1, \ldots, x_n$ and distribution $(p_1, \ldots, p_n)$. E.g., $x_i$ is $Result_i(A) \mid Do(A), E$, the state reached after doing an action $A$ given $E$, what we know about the current state
- Function $U$ of $X$. E.g., $U$ is the utility of a state
- The expected utility of $A$ is $EU[A \mid E] = \sum_{i=1}^{n} p(x_i \mid A)\, U(x_i) = \sum_{i=1}^{n} P(Result_i(A) \mid Do(A), E)\, U(Result_i(A))$
5One State/One Action Example
$U(S_0) = 100 \times 0.2 + 50 \times 0.7 + 70 \times 0.1 = 20 + 35 + 7 = 62$
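The arithmetic on this slide is one instance of the expected-utility sum. A minimal Python sketch, assuming outcomes are given as (probability, utility) pairs; the name `expected_utility` is illustrative and not from the slides:

```python
def expected_utility(outcomes):
    """Return sum_i p_i * U(x_i) for a list of (probability, utility) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)

# Outcomes of the single action available in S0, taken from the slide.
s0_outcomes = [(0.2, 100), (0.7, 50), (0.1, 70)]
eu_s0 = expected_utility(s0_outcomes)
print(round(eu_s0, 6))
```

The result should match the value computed by hand above, up to floating-point rounding.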
6One State/Two Actions Example
- $U_1(S_0) = 62$
- $U_2(S_0) = 74$
- $U(S_0) = \max\{U_1(S_0), U_2(S_0)\} = 74$
7Introducing Action Costs
- $U_1(S_0) = 62 - 5 = 57$
- $U_2(S_0) = 74 - 25 = 49$
- $U(S_0) = \max\{U_1(S_0), U_2(S_0)\} = 57$
[Figure: tree for S0 with action costs -5 and -25]
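Once actions have costs, the choice can flip. A small sketch of picking the action with the highest net utility, using the numbers from the slides; the action labels `A1` and `A2` are placeholders introduced here:

```python
# Expected utility of the outcomes of each action, and the cost of taking it.
outcome_eu = {"A1": 62, "A2": 74}
action_cost = {"A1": 5, "A2": 25}

def net_utility(action):
    """Expected utility of the action's outcomes minus the action's cost."""
    return outcome_eu[action] - action_cost[action]

best_action = max(outcome_eu, key=net_utility)
u_s0 = net_utility(best_action)
print(best_action, u_s0)
```

Without costs the second action wins (74 vs. 62); with costs the first action wins (57 vs. 49).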
8MEU Principle
- A rational agent should choose the action that maximizes the agent's expected utility
- This is the basis of the field of decision theory
- Normative criterion for rational choice of action
AI is Solved!!!
9Not quite
- Must have complete model of:
  - Actions
  - Utilities
  - States
- Even if you have a complete model, finding the best action will be computationally intractable
- In fact, a truly rational agent takes into account the utility of reasoning as well (bounded rationality)
- Nevertheless, great progress has been made in this area recently, and we are able to solve much more complex decision-theoretic problems than ever before
10We'll look at
- Decision Theoretic Reasoning
- Simple decision making (ch. 16)
- Sequential decision making (ch. 17)
11Preferences
- An agent chooses among prizes (A, B, etc.) and lotteries, i.e., situations with uncertain prizes
- Lottery $L = [p, A;\ (1 - p), B]$
- Notation:
  - $A \succ B$: A preferred to B
  - $A \sim B$: indifference between A and B
  - $A \succsim B$: B not preferred to A
12Rational Preferences
- Idea: preferences of a rational agent must obey constraints
- Axioms of Utility Theory:
  - Orderability: $(A \succ B) \lor (B \succ A) \lor (A \sim B)$
  - Transitivity: $(A \succ B) \land (B \succ C) \Rightarrow (A \succ C)$
  - Continuity: $A \succ B \succ C \Rightarrow \exists p\ [p, A;\ 1-p, C] \sim B$
  - Substitutability: $A \sim B \Rightarrow [p, A;\ 1-p, C] \sim [p, B;\ 1-p, C]$
  - Monotonicity: $A \succ B \Rightarrow (p \ge q \Leftrightarrow [p, A;\ 1-p, B] \succsim [q, A;\ 1-q, B])$
13Rational Preferences
- Violating the constraints leads to irrational behavior
- E.g., an agent with intransitive preferences can be induced to give away all its money:
  - if $B \succ C$, then an agent who has C would pay some amount, say 1, to get B
  - if $A \succ B$, then an agent who has B would pay, say, 1 to get A
  - if $C \succ A$, then an agent who has A would pay, say, 1 to get C
  - ...uh oh! The agent is back where it started, holding C, and 3 poorer
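The money-pump argument can be simulated directly. A sketch under stated assumptions: the cyclic preference table, the fee of 1 per trade, and the starting budget of 9 are illustrative choices, not values from the slides.

```python
# Intransitive strict preferences forming a cycle: B > C, A > B, C > A.
# prefers[x] is the item the agent would pay to swap x for.
prefers = {"C": "B", "B": "A", "A": "C"}

def money_pump(start_item, money, fee=1, max_trades=1000):
    """Offer the preferred swap repeatedly; the agent pays `fee` per trade."""
    item, trades = start_item, 0
    while money >= fee and trades < max_trades:
        item = prefers[item]   # agent accepts: it strictly prefers the new item
        money -= fee
        trades += 1
    return item, money, trades

final_item, final_money, n_trades = money_pump("C", money=9)
print(final_item, final_money, n_trades)
```

Every individual trade looks like an improvement to the agent, yet after three full cycles it holds the item it started with and has no money left.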
14Rational Preferences ⇒ Utility
- Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944): Given preferences satisfying the constraints, there exists a real-valued function $U$ such that $U(A) \ge U(B) \Leftrightarrow A \succsim B$ and $U([p_1, S_1; \ldots; p_n, S_n]) = \sum_i p_i U(S_i)$
- MEU principle: Choose the action that maximizes expected utility
15Utility Assessment
- Standard approach to assessment of human utilities: compare a given state A to a standard lottery $L_p$ that has the best possible prize w/ prob. $p$ and the worst possible catastrophe w/ prob. $(1 - p)$
- Adjust lottery probability $p$ until $A \sim L_p$
[Figure: $A \sim L_p$, where $L_p$ gives "continue as before" with probability $p$ and "instant death" with probability $1 - p$]
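The "adjust p until indifferent" procedure can be sketched as a bisection. This is an illustration only: `prefers_lottery` is a hypothetical oracle standing in for asking the subject, and the simulated subject with hidden utility 0.73 is made up. It assumes utilities are normalized so that U(best prize) = 1 and U(worst catastrophe) = 0, in which case the indifference probability equals U(A).

```python
def assess_utility(prefers_lottery, tol=1e-6):
    """Bisect on p until the subject is (nearly) indifferent between A and L_p.

    prefers_lottery(p) is a hypothetical oracle: True if the subject prefers
    the standard lottery L_p to the state A.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        p = (lo + hi) / 2
        if prefers_lottery(p):
            hi = p   # lottery too attractive: lower p
        else:
            lo = p   # state A still preferred: raise p
    return (lo + hi) / 2

# Simulated subject whose true (hidden) utility for A is 0.73.
def simulated_subject(p):
    return p > 0.73

u_a = assess_utility(simulated_subject)
print(round(u_a, 4))
```

Bisection relies on the monotonicity axiom: a higher p can only make the lottery more attractive.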
16Aside: Money ≠ Utility function
- Given a lottery $L$ with expected monetary value $EMV(L)$,
- usually $U(L) < U(EMV(L))$
- i.e., people are risk-averse
- Would you rather have 1,000,000 for sure, or a lottery $[0.5, 0;\ 0.5, 3{,}000{,}000]$?
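A quick numeric illustration of risk aversion for this lottery. The square-root utility of money is an assumption chosen only because it is concave; the slides do not specify a utility function.

```python
import math

def utility(wealth):
    # Assumed concave utility of money; sqrt is illustrative only.
    return math.sqrt(wealth)

lottery = [(0.5, 0), (0.5, 3_000_000)]
emv = sum(p * x for p, x in lottery)                  # expected monetary value
u_lottery = sum(p * utility(x) for p, x in lottery)   # U(L)
u_of_emv = utility(emv)                               # U(EMV(L))
u_sure = utility(1_000_000)                           # utility of the sure amount

print(emv, round(u_lottery, 1), round(u_of_emv, 1), u_sure)
```

Under this assumed utility the lottery is worth less than its expected monetary value, and the sure 1,000,000 is preferred even though the lottery's EMV is 1,500,000.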
17Decision Networks
- Extend BNs to handle actions and utilities
- Also called Influence diagrams
- Make use of BN inference
- Can do Value of Information calculations
18Decision Networks cont.
- Chance nodes: random variables, as in BNs
- Decision nodes: actions that the decision maker can take
- Utility/value nodes: the utility of the outcome state
19R&N example
20Prenatal Testing Example
21Umbrella Network
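[Figure: decision network with decision node Take Umbrella (take / don't take), chance nodes rain and umbrella, and utility node happiness]
- $P(\text{rain}) = 0.4$
- $P(\text{umb} \mid \text{take}) = 1.0$, $P(\neg\text{umb} \mid \neg\text{take}) = 1.0$
- $U(\neg\text{umb}, \neg\text{rain}) = 100$, $U(\neg\text{umb}, \text{rain}) = -100$
- $U(\text{umb}, \neg\text{rain}) = 0$, $U(\text{umb}, \text{rain}) = -25$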
22Evaluating Decision Networks
- Set the evidence variables for the current state
- For each possible value of the decision node:
  - Set the decision node to that value
  - Calculate the posterior probability of the parent nodes of the utility node, using BN inference
  - Calculate the resulting utility for the action
- Return the action with the highest utility
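The evaluation loop above can be sketched in a few lines for a network small enough to enumerate. This is a sketch under assumptions, not a general BN inference engine: it uses the umbrella network as read from the slides (P(rain) = 0.4, the umbrella is carried exactly when taken, utilities 100, -100, 0, -25), and the names `P_UMB`, `posterior`, `evaluate`, and the label `dont_take` are introduced here.

```python
from itertools import product

P_RAIN = 0.4
# P(umb = True | decision); deterministic version of the umbrella network.
P_UMB = {"take": 1.0, "dont_take": 0.0}
# Utility table U(umb, rain), keyed by booleans.
U = {(False, False): 100, (False, True): -100,
     (True, False): 0, (True, True): -25}

def posterior(decision):
    """P(umb, rain | decision) for the parents of the utility node."""
    dist = {}
    for umb, rain in product([False, True], repeat=2):
        p_umb = P_UMB[decision] if umb else 1 - P_UMB[decision]
        p_rain = P_RAIN if rain else 1 - P_RAIN
        dist[(umb, rain)] = p_umb * p_rain
    return dist

def evaluate(decisions):
    """Return (best decision, expected utility of every decision)."""
    eu = {d: sum(p * U[s] for s, p in posterior(d).items()) for d in decisions}
    return max(eu, key=eu.get), eu

best, eu = evaluate(["take", "dont_take"])
print(best, eu)
```

Here the posterior factorizes because rain has no parents and umbrella depends only on the decision; a larger network would need real BN inference in place of `posterior`.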
23Umbrella Network
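[Figure: decision network with decision node Take Umbrella (take / don't take), chance nodes rain and umbrella, and utility node happiness]
- $P(\text{rain}) = 0.4$
- $P(\text{umb} \mid \text{take}) = 1.0$, $P(\text{umb} \mid \neg\text{take}) = 0$
- $U(\neg\text{umb}, \neg\text{rain}) = 100$, $U(\neg\text{umb}, \text{rain}) = -100$
- $U(\text{umb}, \neg\text{rain}) = 0$, $U(\text{umb}, \text{rain}) = -25$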
24Umbrella Network
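[Figure: decision network with decision node Take Umbrella (take / don't take), chance nodes rain and umbrella, and utility node happiness]
- $P(\text{rain}) = 0.4$
- $P(\text{umb} \mid \text{take}) = 0.8$, $P(\text{umb} \mid \neg\text{take}) = 0.1$
- $U(\neg\text{umb}, \neg\text{rain}) = 100$, $U(\neg\text{umb}, \text{rain}) = -100$
- $U(\text{umb}, \neg\text{rain}) = 0$, $U(\text{umb}, \text{rain}) = -25$

umb  rain  P(umb, rain | take)
0    0     0.2 x 0.6
0    1     0.2 x 0.4
1    0     0.8 x 0.6
1    1     0.8 x 0.4

#1: $EU(\text{take}) = 100 \times 0.12 + (-100) \times 0.08 + 0 \times 0.48 + (-25) \times 0.32 = ???$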
25Umbrella Network
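[Figure: decision network with decision node Take Umbrella (take / don't take), chance nodes rain and umbrella, and utility node happiness]
- $P(\text{rain}) = 0.4$
- $P(\text{umb} \mid \text{take}) = 0.8$, $P(\text{umb} \mid \neg\text{take}) = 0.1$
- $U(\neg\text{umb}, \neg\text{rain}) = 100$, $U(\neg\text{umb}, \text{rain}) = -100$
- $U(\text{umb}, \neg\text{rain}) = 0$, $U(\text{umb}, \text{rain}) = -25$

umb  rain  P(umb, rain | ¬take)
0    0     0.9 x 0.6
0    1     0.9 x 0.4
1    0     0.1 x 0.6
1    1     0.1 x 0.4

#2: $EU(\neg\text{take}) = 100 \times 0.54 + (-100) \times 0.36 + 0 \times 0.06 + (-25) \times 0.04 = ???$

So, in this case I would?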
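Carrying out the two sums with the numbers given on the slides (worked arithmetic added here, with the negation signs reconstructed from the probability tables):

```latex
\begin{align*}
EU(\text{take})      &= 100(0.12) - 100(0.08) + 0(0.48) - 25(0.32) = 12 - 8 + 0 - 8 = -4 \\
EU(\neg\text{take})  &= 100(0.54) - 100(0.36) + 0(0.06) - 25(0.04) = 54 - 36 + 0 - 1 = 17
\end{align*}
```

Since $17 > -4$, the MEU choice under these numbers is not to take the umbrella.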
26Value of Information
- Idea: Compute the expected value of acquiring possible evidence
- Example: buying oil drilling rights
- Two blocks A and B, exactly one of them has oil, worth $k$
- Prior probability 0.5
- Current price of each block is $k/2$
- What is the value of getting a survey of A done?
- Survey will say "oil in A" or "no oil in A" w/ prob. 0.5
- Compute expected value of information (VOI):
- expected value of best action given the information minus expected value of best action without information
- VOI(Survey) = [0.5 x value of "buy A" given "oil in A" + 0.5 x value of "buy B" given "no oil in A"] - 0 = ??
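The oil example can be checked numerically. A sketch under two assumptions that the slide implies but does not state outright: the survey is perfectly accurate, and the agent may also buy nothing (value 0). The function name `voi_survey` is introduced here.

```python
def voi_survey(k):
    """Value of a perfect survey of block A in the two-block oil example."""
    price = k / 2
    p_oil_a = 0.5

    # Without the survey: either purchase has the same expected profit.
    eu_buy_a = p_oil_a * k - price
    eu_buy_b = (1 - p_oil_a) * k - price
    eu_no_info = max(eu_buy_a, eu_buy_b, 0.0)   # 0.0 = buy nothing

    # With the survey: buy whichever block is now known to hold the oil.
    eu_if_oil_in_a = k - price       # buy A
    eu_if_no_oil_in_a = k - price    # buy B
    eu_with_info = (p_oil_a * eu_if_oil_in_a
                    + (1 - p_oil_a) * eu_if_no_oil_in_a)

    return eu_with_info - eu_no_info

print(voi_survey(1.0))
```

Under these assumptions the survey is worth $k/2$, i.e. as much as the block itself costs.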
27Value of Information (VOI)
- Suppose the agent's current knowledge is $E$. The value of the current best action $\alpha$ is $EU(\alpha \mid E) = \max_A \sum_i U(Result_i(A))\, P(Result_i(A) \mid Do(A), E)$
28Umbrella Network
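[Figure: umbrella decision network with an added chance node forecast; decision node Take Umbrella (take / don't take), chance nodes rain, umbrella, and forecast, utility node happiness]
- $P(\text{rain}) = 0.4$
- $P(\text{umb} \mid \text{take}) = 0.8$, $P(\text{umb} \mid \neg\text{take}) = 0.1$
- $U(\neg\text{umb}, \neg\text{rain}) = 100$, $U(\neg\text{umb}, \text{rain}) = -100$
- $U(\text{umb}, \neg\text{rain}) = 0$, $U(\text{umb}, \text{rain}) = -25$

R  P(F = rainy | R)
0  0.2
1  0.7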
29VOI
- $VOI(\text{forecast}) = P(\text{rainy})\, EU(\alpha_{\text{rainy}}) + P(\neg\text{rainy})\, EU(\alpha_{\neg\text{rainy}}) - EU(\alpha)$
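30Expected utilities given the forecast (tables left blank on the slide)
- Four tables to fill in, each with rows (umb, rain) = 00, 01, 10, 11: $P(\text{umb}, \text{rain} \mid \text{take}, \text{rainy})$, $P(\text{umb}, \text{rain} \mid \text{take}, \neg\text{rainy})$, $P(\text{umb}, \text{rain} \mid \neg\text{take}, \text{rainy})$, $P(\text{umb}, \text{rain} \mid \neg\text{take}, \neg\text{rainy})$
- Four expected utilities to compute (numbered #1 to #4 on the slide): $EU(\text{take} \mid \text{rainy})$, $EU(\text{take} \mid \neg\text{rainy})$, $EU(\neg\text{take} \mid \text{rainy})$, $EU(\neg\text{take} \mid \neg\text{rainy})$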
31Umbrella Network
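[Figure: umbrella decision network with the forecast node now a parent of rain; decision node Take Umbrella (take / don't take), chance nodes forecast, rain, and umbrella, utility node happiness]
- $P(F = \text{rainy}) = 0.4$
- $P(\text{umb} \mid \text{take}) = 0.8$, $P(\text{umb} \mid \neg\text{take}) = 0.1$
- $U(\neg\text{umb}, \neg\text{rain}) = 100$, $U(\neg\text{umb}, \text{rain}) = -100$
- $U(\text{umb}, \neg\text{rain}) = 0$, $U(\text{umb}, \text{rain}) = -25$

F  P(R = rain | F)
0  0.2
1  0.7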
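32Expected utilities given the forecast (tables left blank on the slide)
- Four tables to fill in, each with rows (umb, rain) = 00, 01, 10, 11: $P(\text{umb}, \text{rain} \mid \text{take}, \text{rainy})$, $P(\text{umb}, \text{rain} \mid \text{take}, \neg\text{rainy})$, $P(\text{umb}, \text{rain} \mid \neg\text{take}, \text{rainy})$, $P(\text{umb}, \text{rain} \mid \neg\text{take}, \neg\text{rainy})$
- Four expected utilities to compute (numbered #1 to #4 on the slide): $EU(\text{take} \mid \text{rainy})$, $EU(\text{take} \mid \neg\text{rainy})$, $EU(\neg\text{take} \mid \text{rainy})$, $EU(\neg\text{take} \mid \neg\text{rainy})$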
33VOI
- $VOI(\text{forecast}) = P(\text{rainy})\, EU(\alpha_{\text{rainy}}) + P(\neg\text{rainy})\, EU(\alpha_{\neg\text{rainy}}) - EU(\alpha)$
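The VOI formula can be evaluated for the umbrella network with the forecast node. A sketch using the numbers as read from the slides (P(F = rainy) = 0.4, P(rain | rainy) = 0.7, P(rain | not rainy) = 0.2, P(umb | take) = 0.8, P(umb | don't take) = 0.1); the variable and function names are introduced here, and the utility table's negation signs are reconstructed from the earlier probability tables.

```python
P_RAINY = 0.4                                    # P(F = rainy)
P_RAIN_GIVEN = {"rainy": 0.7, "not_rainy": 0.2}  # P(rain | F)
P_UMB = {"take": 0.8, "dont_take": 0.1}          # P(umb | decision)
U = {(False, False): 100, (False, True): -100,
     (True, False): 0, (True, True): -25}        # U(umb, rain)

def eu(decision, p_rain):
    """Expected utility of a decision when the probability of rain is p_rain."""
    total = 0.0
    for umb in (False, True):
        for rain in (False, True):
            p_u = P_UMB[decision] if umb else 1 - P_UMB[decision]
            p_r = p_rain if rain else 1 - p_rain
            total += p_u * p_r * U[(umb, rain)]
    return total

def best_eu(p_rain):
    """Expected utility of the best action for a given probability of rain."""
    return max(eu(d, p_rain) for d in P_UMB)

# Marginal P(rain), obtained by summing out the forecast.
p_rain_prior = (P_RAINY * P_RAIN_GIVEN["rainy"]
                + (1 - P_RAINY) * P_RAIN_GIVEN["not_rainy"])

voi = (P_RAINY * best_eu(P_RAIN_GIVEN["rainy"])
       + (1 - P_RAINY) * best_eu(P_RAIN_GIVEN["not_rainy"])
       - best_eu(p_rain_prior))
print(round(voi, 2))
```

The forecast has positive value here because it changes the best action: with a rainy forecast taking the umbrella is better, otherwise leaving it is better.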
34Summary: Simple Decision Making
- Decision Theory = Probability Theory + Utility Theory
- Rational Agent operates by MEU
- Decision Networks
- Value of Information