Decision Making Under Uncertainty

1
Decision Making Under Uncertainty
  • Russell and Norvig, ch. 16 and 17
  • CMSC 671, Fall 2005

Material from Lise Getoor, Jean-Claude Latombe,
and Daphne Koller
2
Decision Making Under Uncertainty
  • Many environments have multiple possible outcomes
  • Some of these outcomes may be good; others may be
    bad
  • Some may be very likely; others unlikely
  • What's a poor agent to do??

3
Non-Deterministic vs. Probabilistic Uncertainty
  • Non-deterministic model: uncertainty is represented by a set of
    possible outcomes, e.g., {a, b, c}; choose the decision that is
    best for the worst case (as in adversarial search)
  • Probabilistic model: uncertainty is represented by a probability
    distribution over the possible outcomes; choose the decision that
    maximizes expected utility
4
Expected Utility
  • Random variable X with n values x1, …, xn and
    distribution (p1, …, pn); e.g., X is the state
    reached after doing an action A under uncertainty
  • Function U of X; e.g., U is the utility of a state
  • The expected utility of A is
    EU(A) = Σ_i P(xi | A) U(xi)   (sum over i = 1, …, n)

5
One State/One Action Example
U(S0) = 100 × 0.2 + 50 × 0.7 + 70 × 0.1
      = 20 + 35 + 7 = 62
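
A minimal sketch of this computation in Python (numbers from the slide; the helper name is illustrative):

    # EU(A) = sum_i P(xi | A) * U(xi): a probability-weighted sum of state utilities
    def expected_utility(outcomes):
        """outcomes: iterable of (probability, utility) pairs for the states A can reach."""
        return sum(p * u for p, u in outcomes)

    print(expected_utility([(0.2, 100), (0.7, 50), (0.1, 70)]))   # 62.0
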
6
One State/Two Actions Example
  • U1(S0) = 62
  • U2(S0) = 74
  • U(S0) = max{U1(S0), U2(S0)} = 74
7
Introducing Action Costs
  • Action costs: the first action costs 5, the second costs 25
  • U1(S0) = 62 - 5 = 57
  • U2(S0) = 74 - 25 = 49
  • U(S0) = max{U1(S0), U2(S0)} = 57
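
The same helper idea extends to choosing among actions, anticipating the MEU principle on the next slide; the labels A1 and A2 below are illustrative names for the two actions on this slide:

    # Pick the action whose expected utility, net of its cost, is maximal
    action_values = {"A1": 62 - 5, "A2": 74 - 25}     # EU minus action cost
    best = max(action_values, key=action_values.get)
    print(best, action_values[best])                  # A1 57
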
8
MEU Principle
  • A rational agent should choose the action that
    maximizes the agent's expected utility
  • This is the basis of the field of decision theory
  • The MEU principle provides a normative criterion
    for rational choice of action

AI is Solved!!!
9
Not quite
  • Must have a complete model of
  • Actions
  • Utilities
  • States
  • Even if you have a complete model, decision making
    will be computationally intractable
  • In fact, a truly rational agent takes into
    account the utility of reasoning as
    well (bounded rationality)
  • Nevertheless, great progress has been made in
    this area recently, and we are able to solve much
    more complex decision-theoretic problems than
    ever before

10
We'll look at
  • Decision-Theoretic Planning
  • Simple decision making (ch. 16)
  • Sequential decision making (ch. 17)

11
Axioms of Utility Theory
  • Orderability
    (A ≻ B) ∨ (A ≺ B) ∨ (A ~ B)
  • Transitivity
    (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
  • Continuity
    A ≻ B ≻ C ⇒ ∃p  [p, A; 1-p, C] ~ B
  • Substitutability
    A ~ B ⇒ [p, A; 1-p, C] ~ [p, B; 1-p, C]
  • Monotonicity
    A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1-p, B] ≽ [q, A; 1-q, B])
  • Decomposability
    [p, A; 1-p, [q, B; 1-q, C]] ~ [p, A; (1-p)q, B; (1-p)(1-q), C]

12
Money Versus Utility
  • Money ≠ utility
  • More money is better, but not always in a linear
    relationship to the amount of money
  • Expected Monetary Value: EMV(L) is the expected payoff of a
    lottery L, and S_EMV(L) is the state of having that amount for certain
  • Risk-averse: U(L) < U(S_EMV(L))
  • Risk-seeking: U(L) > U(S_EMV(L))
  • Risk-neutral: U(L) = U(S_EMV(L))

13
Value Function
  • Provides a ranking of alternatives, but not a
    meaningful metric scale
  • Also known as an ordinal utility function
  • Remember the expectiminimax example
  • Sometimes, only relative judgments (value
    functions) are necessary
  • At other times, absolute judgments (utility
    functions) are required

14
Multiattribute Utility Theory
  • A given state may have multiple utilities
  • ...because of multiple evaluation criteria
  • ...because of multiple agents (interested
    parties) with different utility functions
  • We will talk about this more later in the
    semester, when we discuss multi-agent systems and
    game theory

15
Decision Networks
  • Extend BNs to handle actions and utilities
  • Also called influence diagrams
  • Use BN inference methods to solve
  • Perform Value of Information calculations

16
Decision Networks cont.
  • Chance nodes: random variables, as in BNs
  • Decision nodes: actions that the decision maker can
    take
  • Utility/value nodes: the utility of the outcome
    state

17
R&N example
18
Umbrella Network
Decision node: umbrella (take / don't take)
Chance nodes: weather, forecast, have umbrella
Utility node: happiness

P(rain) = 0.4
P(have | take) = 1.0;   P(¬have | ¬take) = 1.0

Forecast model, P(f | w):
  P(sunny | rain) = 0.3      P(rainy | rain) = 0.7
  P(sunny | no rain) = 0.8   P(rainy | no rain) = 0.2

Utilities:
  U(have, rain) = -25        U(have, ¬rain) = 0
  U(¬have, rain) = -100      U(¬have, ¬rain) = 100
19
Evaluating Decision Networks
  • Set the evidence variables for the current state
  • For each possible value of the decision node:
  • Set the decision node to that value
  • Calculate the posterior probability of the parent
    nodes of the utility node, using BN inference
  • Calculate the resulting utility for the action
  • Return the action with the highest utility

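For the umbrella network on the previous slide, this procedure reduces to a small enumeration when no forecast evidence is set. A minimal sketch, assuming the agent acts on the prior P(rain) alone (variable names are illustrative):

    # Umbrella network by enumeration: no forecast evidence has been observed
    P_RAIN = 0.4
    UTILITY = {                      # U(have_umbrella, rain)
        (True, True): -25, (True, False): 0,
        (False, True): -100, (False, False): 100,
    }

    def expected_utility(take):
        have = take                  # P(have | take) = 1 and P(not have | not take) = 1
        return sum(p * UTILITY[(have, rain)]
                   for rain, p in ((True, P_RAIN), (False, 1 - P_RAIN)))

    for take in (True, False):
        print(take, expected_utility(take))      # True -10.0, False 20.0
    # Highest expected utility: don't take the umbrella (EU = 20)
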
20
Decision Making: Umbrella Network
Should I take my umbrella??
(Same umbrella network and tables as on slide 18.)
21
Value of Information (VOI)
  • Suppose the agent's current knowledge is E. The
    value of the current best action α is
    EU(α | E) = max_a Σ_i P(Result_i(a) | E, Do(a)) U(Result_i(a))
  • After observing new evidence E_j = e_j, the agent chooses a new
    best action α_{e_j}, with value EU(α_{e_j} | E, E_j = e_j)
  • The value of (perfect) information of E_j is the expected gain:
    VPI_E(E_j) = Σ_k P(E_j = e_jk | E) EU(α_{e_jk} | E, E_j = e_jk) - EU(α | E)
22
Value of Information: Umbrella Network
What is the value of knowing the weather forecast?
(Same umbrella network and tables as on slide 18.)
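
A minimal sketch of the forecast's value of information by enumeration, using the tables above (names are illustrative; the no-forecast expected utility of 20 comes from the previous sketch):

    # Value of knowing the forecast before deciding
    P_RAIN = 0.4
    P_F_GIVEN_W = {('sunny', True): 0.3, ('rainy', True): 0.7,    # P(forecast | rain)
                   ('sunny', False): 0.8, ('rainy', False): 0.2}  # P(forecast | no rain)
    UTILITY = {(True, True): -25, (True, False): 0,
               (False, True): -100, (False, False): 100}

    def best_eu(p_rain):
        """Max over take / don't take of the expected utility, given P(rain)."""
        return max(sum(p * UTILITY[(take, rain)]
                       for rain, p in ((True, p_rain), (False, 1 - p_rain)))
                   for take in (True, False))

    eu_without = best_eu(P_RAIN)                                   # 20.0: act on the prior
    eu_with = 0.0
    for f in ('sunny', 'rainy'):
        p_f = sum(P_F_GIVEN_W[(f, w)] * pw
                  for w, pw in ((True, P_RAIN), (False, 1 - P_RAIN)))
        p_rain_given_f = P_F_GIVEN_W[(f, True)] * P_RAIN / p_f     # Bayes' rule
        eu_with += p_f * best_eu(p_rain_given_f)                   # act on the posterior
    print(round(eu_with - eu_without, 2))                          # 9.0: the forecast is worth 9
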
23
Sequential Decision Making
  • Finite Horizon
  • Infinite Horizon

24
Simple Robot Navigation Problem
  • In each state, the possible actions are U, D, R,
    and L

25
Probabilistic Transition Model
  • In each state, the possible actions are U, D, R,
    and L
  • The effect of U is as follows (transition
    model)
  • With probability 0.8 the robot moves up one
    square (if the robot is already in the top
    row, then it does not move)

26
Probabilistic Transition Model
  • In each state, the possible actions are U, D, R,
    and L
  • The effect of U is as follows (transition
    model)
  • With probability 0.8 the robot moves up one
    square (if the robot is already in the top
    row, then it does not move)
  • With probability 0.1 the robot moves right one
    square (if the robot is already in the
    rightmost column, then it does not move)

27
Probabilistic Transition Model
  • In each state, the possible actions are U, D, R,
    and L
  • The effect of U is as follows (transition
    model)
  • With probability 0.8 the robot moves up one
    square (if the robot is already in the top
    row, then it does not move)
  • With probability 0.1 the robot moves right one
    square (if the robot is already in the
    rightmost column, then it does not move)
  • With probability 0.1 the robot moves left one
    square (if the robot is already in the
    leftmost column, then it does not move)

28
Markov Property
The transition properties depend only on the
current state, not on previous history (how that
state was reached)
29
Sequence of Actions
[Figure: 4×3 grid world; the robot starts in square (3,2)]
  • Planned sequence of actions (U, R)

30
Sequence of Actions
  • Planned sequence of actions (U, R)
  • U is executed

31
Histories
  • Planned sequence of actions (U, R)
  • U has been executed
  • R is executed
  • There are 9 possible sequences of states
    called histories and 6 possible final states
    for the robot!

32
Probability of Reaching the Goal
Note the importance of the Markov property in this
derivation
  • P([4,3] | (U,R).[3,2])
    = P([4,3] | R.[3,3]) × P([3,3] | U.[3,2])
    + P([4,3] | R.[4,2]) × P([4,2] | U.[3,2])
  • P([3,3] | U.[3,2]) = 0.8
  • P([4,2] | U.[3,2]) = 0.1
  • P([4,3] | R.[3,3]) = 0.8
  • P([4,3] | R.[4,2]) = 0.1
  • P([4,3] | (U,R).[3,2]) = 0.8 × 0.8 + 0.1 × 0.1 = 0.65

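A small sketch of this transition model in Python, propagating the state distribution through (U, R) and reading off P([4,3]). It assumes the standard 4×3 layout with an unreachable square at (2,2), which is consistent with the "9 histories, 6 final states" count above:

    # Grid: columns x = 1..4, rows y = 1..3; square (2,2) is a wall (assumption)
    WALL = {(2, 2)}
    MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
    PERP = {'U': 'LR', 'D': 'RL', 'L': 'DU', 'R': 'UD'}   # the two 0.1-probability slips

    def step(state, move):
        """One deterministic move; stay put if it would leave the grid or hit the wall."""
        nxt = (state[0] + MOVES[move][0], state[1] + MOVES[move][1])
        blocked = nxt in WALL or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3)
        return state if blocked else nxt

    def transition(state, action):
        """P(next | action, state): 0.8 intended direction, 0.1 each perpendicular slip."""
        dist = {}
        for move, p in [(action, 0.8), (PERP[action][0], 0.1), (PERP[action][1], 0.1)]:
            nxt = step(state, move)
            dist[nxt] = dist.get(nxt, 0.0) + p
        return dist

    # Propagate the plan (U, R) from (3,2) and read off the probability of reaching (4,3)
    belief = {(3, 2): 1.0}
    for action in ('U', 'R'):
        new_belief = {}
        for s, p in belief.items():
            for s2, p2 in transition(s, action).items():
                new_belief[s2] = new_belief.get(s2, 0.0) + p * p2
        belief = new_belief
    print(round(belief[(4, 3)], 2))   # 0.65, as derived above
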
33
Utility Function
  • 4,3 provides power supply
  • 4,2 is a sand area from which the robot cannot
    escape

34
Utility Function
  • 4,3 provides power supply
  • 4,2 is a sand area from which the robot cannot
    escape
  • The robot needs to recharge its batteries

35
Utility Function
  • 4,3 provides power supply
  • 4,2 is a sand area from which the robot cannot
    escape
  • The robot needs to recharge its batteries
  • 4,3 or 4,2 are terminal states

36
Utility of a History
  • 4,3 provides power supply
  • 4,2 is a sand area from which the robot cannot
    escape
  • The robot needs to recharge its batteries
  • 4,3 or 4,2 are terminal states
  • The utility of a history is defined by the
    utility of the last state (+1 or -1) minus
    n/25, where n is the number of moves

37
Utility of an Action Sequence
[Figure: 4×3 grid world with terminal rewards +1 at (4,3) and -1 at (4,2)]
  • Consider the action sequence (U,R) from 3,2

38
Utility of an Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability

39
Utility of an Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability
  • The utility of the sequence is the expected
    utility of the histories:
    U = Σ_h U(h) P(h)

40
Optimal Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability
  • The utility of the sequence is the expected
    utility of the histories
  • The optimal sequence is the one with maximal
    utility

41
Optimal Action Sequence
  • Consider the action sequence (U,R) from 3,2
  • A run produces one among 7 possible histories,
    each with some probability
  • The utility of the sequence is the expected
    utility of the histories
  • The optimal sequence is the one with maximal
    utility
  • But is the optimal action sequence what we want
    to compute?

42
Reactive Agent Algorithm
  • Repeat
  • s ← sensed state
  • If s is terminal then exit
  • a ← choose action (given s)
  • Perform a

43
Policy (Reactive/Closed-Loop Strategy)
  • A policy Π is a complete mapping from states to
    actions

44
Reactive Agent Algorithm
  • Repeat
  • s ← sensed state
  • If s is terminal then exit
  • a ← Π(s)
  • Perform a

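A minimal sketch of this loop, assuming hypothetical sense_state() and perform() interfaces to the environment and a policy stored as a dict:

    def run_reactive_agent(policy, sense_state, perform, terminals):
        """Closed-loop execution of a policy: sense the state, look up the action, act."""
        while True:
            s = sense_state()            # s <- sensed state
            if s in terminals:
                return s                 # exit when a terminal state is reached
            perform(policy[s])           # a <- Pi(s); perform a
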
45
Optimal Policy
  • A policy Π is a complete mapping from states to
    actions
  • The optimal policy Π* is the one that always
    yields a history (ending at a terminal state)
    with maximal expected utility

46
Optimal Policy
  • A policy Π is a complete mapping from states to
    actions
  • The optimal policy Π* is the one that always
    yields a history with maximal expected utility

47
Additive Utility
  • History H = (s0, s1, …, sn)
  • The utility of H is additive iff
    U(s0, s1, …, sn) = R(0) + U(s1, …, sn) = Σ_i R(i)
    (R(i) is the reward received in state si)
48
Additive Utility
  • History H = (s0, s1, …, sn)
  • The utility of H is additive iff
    U(s0, s1, …, sn) = R(0) + U(s1, …, sn) = Σ_i R(i)
  • Robot navigation example:
  • R(n) = +1 if sn = [4,3]
  • R(n) = -1 if sn = [4,2]
  • R(i) = -1/25 if i = 0, …, n-1

49
Principle of Max Expected Utility
  • History H = (s0, s1, …, sn)
  • Utility of H: U(s0, s1, …, sn) = Σ_i R(i)
  • First-step analysis:
  • U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
  • Π*(i) = arg max_a Σ_k P(k | a.i) U(k)

50
Value Iteration
  • Initialize the utility of each non-terminal
    state si to U0(i) = 0
  • For t = 0, 1, 2, …, do:
    Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)

51
Value Iteration
Note the importance of terminal states
and connectivity of the state-transition graph
  • Initialize the utility of each non-terminal
    state si to U0(i) = 0
  • For t = 0, 1, 2, …, do:
    Ut+1(i) ← R(i) + max_a Σ_k P(k | a.i) Ut(k)

Converged utilities for the 4×3 grid ((2,2) is an unreachable obstacle):
  y = 3:   0.812   0.868   0.918    +1
  y = 2:   0.762  (wall)   0.660    -1
  y = 1:   0.705   0.655   0.611   0.388
           x = 1   x = 2   x = 3   x = 4
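
A compact sketch of value iteration for this grid in Python. The transition model (0.8 / 0.1 / 0.1), the step reward of -1/25, the terminal rewards, and the obstacle at (2,2) come from the slides; the stopping threshold is an illustrative choice, and no discounting is used (γ = 1):

    # Value iteration on the 4x3 grid world
    WALL, TERMINALS = {(2, 2)}, {(4, 3): 1.0, (4, 2): -1.0}
    STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) not in WALL]
    MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
    PERP = {'U': 'LR', 'D': 'RL', 'L': 'DU', 'R': 'UD'}

    def step(s, m):
        n = (s[0] + MOVES[m][0], s[1] + MOVES[m][1])
        return n if n in STATES else s            # bounce off the wall and the grid edge

    def q(s, a, U):
        """Expected utility of doing a in s, given the current utility estimates U."""
        return sum(p * U[step(s, m)]
                   for m, p in [(a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)])

    U = {s: TERMINALS.get(s, 0.0) for s in STATES}
    while True:
        delta, new_U = 0.0, dict(U)
        for s in STATES:
            if s in TERMINALS:
                continue                          # terminal utilities stay at +1 / -1
            new_U[s] = -1.0 / 25 + max(q(s, a, U) for a in MOVES)
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        if delta < 1e-6:                          # illustrative stopping criterion
            break
    print(round(U[(3, 3)], 3))                    # about 0.918, matching the grid above
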
52
Policy Iteration
  • Pick a policy Π at random

53
Policy Iteration
  • Pick a policy Π at random
  • Repeat
  • Compute the utility of each state for Π:
    Ut+1(i) ← R(i) + Σ_k P(k | Π(i).i) Ut(k)

54
Policy Iteration
  • Pick a policy Π at random
  • Repeat
  • Compute the utility of each state for Π:
    Ut+1(i) ← R(i) + Σ_k P(k | Π(i).i) Ut(k)
  • Compute the policy Π' given these utilities:
    Π'(i) = arg max_a Σ_k P(k | a.i) U(k)

55
Policy Iteration
  • Pick a policy Π at random
  • Repeat
  • Compute the utility of each state for Π:
    Ut+1(i) ← R(i) + Σ_k P(k | Π(i).i) Ut(k)
  • Compute the policy Π' given these utilities:
    Π'(i) = arg max_a Σ_k P(k | a.i) U(k)
  • If Π' = Π then return Π

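A companion sketch of policy iteration on the same grid world (self-contained, so it repeats the model set-up). The policy-evaluation step here uses a fixed number of Bellman sweeps, an illustrative simplification of solving the linear system exactly:

    import random
    # Same 4x3 grid world as in the value iteration sketch
    WALL, TERMINALS = {(2, 2)}, {(4, 3): 1.0, (4, 2): -1.0}
    STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) not in WALL]
    MOVES = {'U': (0, 1), 'D': (0, -1), 'R': (1, 0), 'L': (-1, 0)}
    PERP = {'U': 'LR', 'D': 'RL', 'L': 'DU', 'R': 'UD'}
    R_STEP = -1.0 / 25

    def step(s, m):
        n = (s[0] + MOVES[m][0], s[1] + MOVES[m][1])
        return n if n in STATES else s

    def q(s, a, U):
        return sum(p * U[step(s, m)]
                   for m, p in [(a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)])

    # 1. Pick a policy at random
    policy = {s: random.choice(list(MOVES)) for s in STATES if s not in TERMINALS}
    while True:
        # 2. Policy evaluation: repeated fixed-policy backups (approximate)
        U = {s: TERMINALS.get(s, 0.0) for s in STATES}
        for _ in range(50):                        # illustrative number of sweeps
            U = {s: U[s] if s in TERMINALS else R_STEP + q(s, policy[s], U)
                 for s in STATES}
        # 3. Policy improvement: act greedily with respect to the computed utilities
        new_policy = {s: max(MOVES, key=lambda a: q(s, a, U)) for s in policy}
        if new_policy == policy:                   # 4. Stop when the policy is stable
            break
        policy = new_policy
    print(policy[(3, 1)])   # 'L': head away from the -1 square, as in the optimal policy
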
56
n-Step decision process
  • Assume that
  • Each state reached after n steps is terminal,
    hence has known utility
  • There is a single initial state
  • Any two states reached after i and j steps are
    different

57
n-Step Decision Process
  • Π*(i) = arg max_a Σ_k P(k | a.i) U(k)
  • U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)
  • For j = n-1, n-2, …, 0 do:
  • For every state si attained after step j:
  • Compute the utility of si
  • Label that state with the corresponding action

58
What is the Difference?
  • Π*(i) = arg max_a Σ_k P(k | a.i) U(k)
  • U(i) = R(i) + max_a Σ_k P(k | a.i) U(k)

59
Infinite Horizon
In many problems, e.g., the robot navigation
example, histories are potentially unbounded and
the same state can be reached many times
What if the robot lives forever?
One trick: use discounting to make the
infinite-horizon problem mathematically tractable
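
With a discount factor γ, 0 ≤ γ < 1 (the slides do not fix a particular value), the additive utility of an unbounded history stays finite:

    U(s0, s1, s2, …) = Σ_t γ^t R(st) ≤ Rmax / (1 - γ)

and the value iteration update simply becomes Ut+1(i) ← R(i) + γ max_a Σ_k P(k | a.i) Ut(k), which is a contraction and therefore converges.
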
60
Example: Tracking a Target
  • The robot must keep the target in view
  • The target's trajectory is not known in advance
  • The environment may or may not be known
  • An optimal policy cannot be computed ahead of time:
  • The environment might be unknown
  • The environment may only be partially observable
  • The target may not wait
  • ⇒ A policy must be computed on-the-fly

61
POMDP (Partially Observable Markov Decision
Problem)
  • A sensing operation returns multiple states,
    with a probability distribution
  • Choosing the action that maximizes the
  • expected utility of this state distribution
    assuming state utilities computed as
  • above is not good enough, and actually
  • does not make sense (is not rational)

62
Example: Target Tracking
63
Summary
  • Decision making under uncertainty
  • Utility function
  • Optimal policy
  • Maximal expected utility
  • Value iteration
  • Policy iteration