Title: Chapter 1: The DP Algorithm
1. Chapter 1: The DP Algorithm
- To do:
  - sequential decision-making
  - state
  - random elements
  - discrete-time stochastic dynamic system
  - optimal control/decision problem
  - actions vs. strategies (information gathering, feedback)
- Illustrated via examples; later on, the general model will be described.
2. Example: Inventory Control Problem
[Figure: timeline of periods 0, 1, 2, ..., k-1, k, k+1, ..., N-1, N]
Quantity of a certain item, e.g. gas in a service station, oil in a refinery, cars in a dealership, spare parts in a maintenance facility, etc. The stock is checked at equally spaced points in time, e.g. every morning, at the end of each week, etc. At those times, a decision must be made as to what quantity of the item to order, so that demand over the present period is satisfactorily met (we will give this a quantitative meaning).
[Figure: the kth period, between check times k-1, k, and k+1; at each check time we check stock and place an order]
3. Example: Inventory Control Problem
- Stochastic difference equation:
  x_{k+1} = x_k + u_k - w_k
- x_k = stock at the beginning of the kth period
- u_k = quantity ordered at the beginning of the kth period; assume it is delivered during the kth period
- w_k = demand during the kth period; {w_k} is a stochastic process
- Assume real-valued variables.
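The difference equation above can be sketched in Python; the demand distribution and the numbers below are illustrative assumptions, not from the slides:

```python
import random

def simulate_inventory(x0, orders, demand_sampler):
    """Simulate the stochastic difference equation x_{k+1} = x_k + u_k - w_k.

    x0: initial stock; orders: [u_0, ..., u_{N-1}];
    demand_sampler: callable returning one demand sample w_k.
    Negative stock means backlogged demand.
    """
    x = x0
    trajectory = [x]
    for u in orders:
        w = demand_sampler()
        x = x + u - w  # system equation
        trajectory.append(x)
    return trajectory

# Assumed demand: uniform on {0, 1, 2}, for illustration only
random.seed(0)
print(simulate_inventory(x0=5, orders=[2, 2, 2],
                         demand_sampler=lambda: random.choice([0, 1, 2])))
```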
4. Example: Inventory Control Problem
- Negative stock is interpreted as excess demand, which is backlogged and filled as soon as possible.
- Cost of operation:
  - purchasing cost: c u_k (c = cost per unit)
  - H(x_{k+1}): penalty for holding and storage of extra quantity (x_{k+1} > 0), or for shortage (x_{k+1} < 0)
- Cost for period k: c u_k + H(x_k + u_k - w_k) = g(x_k, u_k, w_k), since x_k + u_k - w_k = x_{k+1}.
5. Example: Inventory Control Problem
Let, for example, H(x) = h x for x >= 0, or H(x) = -p x for x < 0, where h is the holding cost per unit and p the shortage penalty per unit (a typical piecewise-linear choice).
6. Example: Inventory Control Problem
Objective: to minimize, in some meaningful sense, the total cost of operation over a finite number of periods (a finite horizon). The total cost over N periods is
  sum over k = 0, ..., N-1 of ( c u_k + H(x_{k+1}) ).
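As a sketch, the per-period cost g and the N-period total can be computed as follows; the constants c, h, p and the piecewise-linear H are illustrative assumptions:

```python
def period_cost(x, u, w, c=2.0, h=1.0, p=3.0):
    """g(x_k, u_k, w_k) = c*u_k + H(x_{k+1}), with an assumed
    piecewise-linear H: h per unit held, p per unit backlogged."""
    x_next = x + u - w
    H = h * x_next if x_next >= 0 else -p * x_next
    return c * u + H

def total_cost(x0, orders, demands, c=2.0):
    """Total cost over N periods for one realized demand sequence."""
    x, total = x0, 0.0
    for u, w in zip(orders, demands):
        total += period_cost(x, u, w, c=c)
        x = x + u - w  # system equation
    return total

print(total_cost(x0=2, orders=[1, 1], demands=[1, 1]))
```

In the stochastic case discussed below, this total is a random quantity and is minimized in expectation.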
7. Example: Inventory Control Problem
- Two distinct situations can arise.
- Deterministic case: x_0 is perfectly known, and the demands are known in advance to the manager.
  - At k = 0, all future demands are known: w_0, w_1, ..., w_{N-1}.
  - => select all orders at once, so as to exactly meet the demand
  - => x_1 = x_2 = ... = x_{N-1} = 0
  - 0 = x_1 = x_0 + u_0 - w_0
  - => u_0 = w_0 - x_0 (assume x_0 <= w_0)
  - u_k = w_k, 1 <= k <= N-1
  - => a fixed order schedule
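The fixed order schedule derived above can be sketched as:

```python
def deterministic_schedule(x0, demands):
    """With all demands known in advance, order exactly enough so that
    x_1 = x_2 = ... = 0:  u_0 = w_0 - x_0, and u_k = w_k for k >= 1.
    Requires x_0 <= w_0 so that the first order is nonnegative."""
    assert x0 <= demands[0], "assumes x_0 <= w_0"
    return [demands[0] - x0] + list(demands[1:])

print(deterministic_schedule(x0=3, demands=[5, 2, 4]))  # [2, 2, 4]
```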
8. Example: Inventory Control Problem
- What we did was to select a set of fixed actions (numbers, i.e. a precomputed order schedule).
- Suppose instead that at the beginning of period k, w_k becomes known (a perfect forecast). Then we must gather information and make decisions sequentially.
- Strategy: a rule for making decisions based on information (here, the forecast) as it becomes available.
9. Stochastic Case
- Stochastic case: x_0 is perfectly known (we can generalize to the case when only its distribution is known), but {w_k} is a random process.
- Assume that the w_k are i.i.d. real-valued random variables with pdf f_w, independent of k.
- P_w = probability distribution or measure, i.e. P_w(B) is the probability that w_k takes a value in the set B.
10. Stochastic Case
- Note that the stock x_k is now a random variable.
- Alternatively, we can describe the evolution of the system in terms of a transition law:
  P(x_{k+1} in B | x_k = x, u_k = u) = P_w({w : x + u - w in B}),
which follows directly from the system equation x_{k+1} = x_k + u_k - w_k.
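For an integer-valued demand with a known pmf (an illustrative assumption; the slides allow real-valued demand), the transition law can be derived from the system equation like so:

```python
def transition_law(x, u, demand_pmf):
    """P(x_{k+1} = j | x_k = x, u_k = u) = P(w_k = x + u - j),
    derived from the system equation x_{k+1} = x_k + u_k - w_k."""
    law = {}
    for w, p in demand_pmf.items():
        j = x + u - w
        law[j] = law.get(j, 0.0) + p  # accumulate if two demands tie
    return law

pmf = {0: 0.25, 1: 0.5, 2: 0.25}  # assumed demand distribution
print(transition_law(x=1, u=1, demand_pmf=pmf))
```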
11. Stochastic Case
- The cost is now also a random quantity: minimize the expected cost.
- Actions: select all orders (numbers) at k = 0. Most likely not optimal (this reduces to a nonlinear programming problem).
- vs.
- Strategy: select a sequence of functions mu_0, ..., mu_{N-1} such that u_k = mu_k(x_k), where x_k is the information available at the kth period.
- A difficult problem! The optimization is over a function space.
12. Stochastic Dynamic Program
- Let pi = (mu_0, mu_1, ..., mu_{N-1}) be a control/decision strategy (policy, law).
- Pi = set of all admissible strategies (e.g. those with mu_k(x) >= 0).
Then, the stochastic DP problem is:
  minimize E[ sum over k = 0, ..., N-1 of g(x_k, mu_k(x_k), w_k) ]
  s.t. pi in Pi and the system equation.
If the problem is feasible, then there exists an optimal strategy pi*, i.e. one whose expected cost is no larger than that of any other admissible strategy.
13. Summary of the Problem
1. Discrete-time stochastic system:
   system equation: x_{k+1} = max(0, x_k + u_k - w_k)
   transition law: P(x_{k+1} in B | x_k, u_k), induced by the system equation
   Note: no backlogging (the stock is kept nonnegative).
14. Stochastic Dynamic Program
2. Stochastic element: {w_k}, assumed i.i.d. (for example); this will be generalized to w_k depending on x_k and u_k.
3. Control constraint: u_k >= 0; if there is a maximum capacity M, then 0 <= u_k <= M - x_k.
4. Additive cost: the total cost is a sum of per-period costs g(x_k, u_k, w_k).
15. Stochastic Dynamic Program
5. Optimization over admissible strategies.
We will see later on that this problem has a neat closed-form solution:
  mu_k(x) = T_k - x if x < T_k, and 0 otherwise,
for some threshold levels T_k (a base-stock policy).
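A minimal sketch of such a base-stock (threshold) policy:

```python
def base_stock_policy(x, T):
    """Base-stock policy: if stock x is below the threshold T_k,
    order up to T_k; otherwise order nothing."""
    return T - x if x < T else 0

print(base_stock_policy(3, 10))   # orders 7, bringing stock up to 10
print(base_stock_policy(12, 10))  # above threshold: order nothing, 0
```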
16. Role of Information: Actions vs. Strategies
Example: let a two-stage problem be given with initial state x_0 = 0, where w_0 is a random variable taking the values +1 and -1, each with probability 1/2.
17. Role of Information: Actions vs. Strategies
Problem A: choose actions (u_0, u_1) (open loop, a control schedule) to minimize the expected cost. Equivalently, with N = 2, minimize the two-stage cost subject to the system equation (*).
18. Role of Information: Actions vs. Strategies
Solution A:
Case (i):
19. Role of Information: Actions vs. Strategies
Case (ii):
20. Role of Information: Actions vs. Strategies
In this case u_1 can be anything; then choose u_0 appropriately.
No information gathering: we choose u_0 and u_1 at the start and do not take x_1 into consideration at the beginning of stage 1.
21. Role of Information: Actions vs. Strategies
Problem B: choose u_0 and u_1 sequentially, using the observed value of x_1.
Solution B: from (*), we select u_1 as a function of the observed x_1.
Sequential decision-making, feedback control. Thus, to take decision u_1, we wait until the outcome x_1 becomes available, and act accordingly.
22. Role of Information: Actions vs. Strategies
Note: information gathering doesn't always help.
Let the disturbance be deterministic (the deterministic case): then we do not gain anything by making decisions sequentially.
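The contrast between open-loop actions and feedback strategies can be illustrated numerically. The instance below is an assumed toy problem, not necessarily the slides' exact example: x_1 = u_0 + w_0 with w_0 = +1 or -1 equally likely, and cost E[(x_1 + u_1)^2].

```python
def open_loop_cost(u0, u1):
    """u_0 and u_1 are fixed numbers chosen before w_0 is observed."""
    return sum(0.5 * (u0 + w0 + u1) ** 2 for w0 in (-1, 1))

def closed_loop_cost(u0, policy):
    """u_1 = policy(x_1) is chosen after x_1 is observed (feedback)."""
    total = 0.0
    for w0 in (-1, 1):
        x1 = u0 + w0
        total += 0.5 * (x1 + policy(x1)) ** 2
    return total

print(open_loop_cost(0.0, 0.0))             # best open-loop cost: 1.0
print(closed_loop_cost(0.0, lambda x: -x))  # feedback achieves 0.0
```

No open-loop pair (u_0, u_1) can beat cost 1 here, while waiting for x_1 and playing u_1 = -x_1 drives the cost to zero; with a deterministic w_0 the two coincide.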
23. Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
1. Discrete-time stochastic dynamic system (t, k can index time or events):
   x_{k+1} = f_k(x_k, u_k, w_k), k = 0, 1, ..., N-1
   x_k in S_k = state space at time k
   u_k in C_k = control space
   w_k in D_k = disturbance space (countable)
Also, depending on the state of the system, there are constraints on the actions that can be taken:
   u_k in U_k(x_k), a nonempty subset of C_k.
24. Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
2. Stochastic disturbance {w_k}: each w_k has a probability measure (distribution) that may depend explicitly on the time, the current state, and the action, but not on the previous disturbances w_{k-1}, ..., w_0.
25. Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
3. Admissible control/decision laws (strategies, policies): pi = (mu_0, ..., mu_{N-1}), with mu_k(x_k) in U_k(x_k) for all x_k.
The information patterns define the admissible classes: feasible policies; Markov policies; deterministic vs. randomized policies.
26. Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
4. Finite-horizon optimal control/decision problem: given an initial state x_0 and cost functions g_k, k = 0, ..., N-1, find pi in Pi that minimizes the cost functional
   J_pi(x_0) = E[ sum over k = 0, ..., N-1 of g_k(x_k, mu_k(x_k), w_k) ],
subject to the system equation constraint x_{k+1} = f_k(x_k, mu_k(x_k), w_k), k = 0, ..., N-1.
27. Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
We say that pi* in Pi is optimal for the initial state x_0 if
   J_{pi*}(x_0) <= J_pi(x_0) for all pi in Pi.
Optimal N-stage cost (or value) function: J*(x_0) = inf over pi in Pi of J_pi(x_0).
Likewise, for a given epsilon > 0, pi is said to be epsilon-optimal if
   J_pi(x_0) <= J*(x_0) + epsilon.
28. Discrete-Time Stochastic Dynamic System Model and Optimal Decision/Control Problem
This stochastic optimal control problem is difficult! We are optimizing over strategies.
The dynamic programming algorithm will give us necessary and sufficient conditions to decompose this problem into a sequence of coupled minimization problems over actions, from which we will obtain the optimal strategy and the optimal cost.
DP is the only general approach to sequential decision-making under uncertainty.
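A minimal sketch of the backward DP recursion for a finite-horizon problem, applied to a small inventory instance; the state/action spaces, demand pmf, and cost constants below are illustrative assumptions:

```python
def dp_backward(states, actions, transition, stage_cost, terminal_cost, N):
    """Backward DP recursion:
        J_N(x) = g_N(x)
        J_k(x) = min over u in U(x) of E[ g(x, u, w) + J_{k+1}(f(x, u, w)) ]
    transition(x, u) returns a list of (prob, next_state, w) triples.
    Returns the value functions J_0..J_N and a Markov policy mu_0..mu_{N-1}.
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = terminal_cost(x)
    for k in range(N - 1, -1, -1):
        for x in states:
            best_u, best_val = None, float("inf")
            for u in actions(x):
                val = sum(p * (stage_cost(x, u, w) + J[k + 1][x2])
                          for p, x2, w in transition(x, u))
                if val < best_val:
                    best_u, best_val = u, val
            J[k][x] = best_val
            mu[k][x] = best_u
    return J, mu

# Tiny assumed inventory instance: capacity M = 3, no backlogging
# (x_{k+1} = max(0, x + u - w)), horizon N = 3.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
c, h, p = 1.0, 0.5, 3.0
M = 3

def inv_actions(x):
    return range(M - x + 1)  # capacity constraint: x + u <= M

def inv_transition(x, u):
    return [(q, max(0, x + u - w), w) for w, q in pmf.items()]

def inv_cost(x, u, w):
    leftover = x + u - w
    holding = h * leftover if leftover >= 0 else -p * leftover
    return c * u + holding

J, mu = dp_backward(range(M + 1), inv_actions, inv_transition, inv_cost,
                    terminal_cost=lambda x: 0.0, N=3)
print(J[0], mu[0])
```

Each stage is now a minimization over actions u for each state x, rather than one minimization over the function space of strategies.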
29. Alternative System Description
Given a dynamic description of a system via a system equation x_{k+1} = f_k(x_k, u_k, w_k), we can alternatively describe the system via a transition law.
30. Alternative System Description
Given x_k and u_k, x_{k+1} has the distribution
   P(x_{k+1} in B | x_k, u_k) = P(w_k in {w : f_k(x_k, u_k, w) in B}).
=> A system equation description is equivalent to a system transition law description.