Decision-Theoretic Planning with Asynchronous Events

About This Presentation
Title: Decision-Theoretic Planning with Asynchronous Events
Provided by: hakany3
Learn more at: http://www.tempastic.org
Transcript and Presenter's Notes

Title: Decision-Theoretic Planning with Asynchronous Events


1
Decision-Theoretic Planning with Asynchronous
Events
  • Håkan L. S. Younes
  • Carnegie Mellon University

2
Introduction
  • Asynchronous processes are abundant in the real
    world
  • Discrete-time models are inappropriate for
    systems with asynchronous events
  • Generalized semi-Markov (decision) processes are
    great for this!

3
Stochastic Processes with Asynchronous Events
[Timeline: machines m1 and m2 both up; state (m1 up, m2 up) at t = 0]
4
Stochastic Processes with Asynchronous Events
[Timeline: m2 crashes at t = 2.5; state (m1 up, m2 up) at t = 0 becomes (m1 up, m2 down)]
5
Stochastic Processes with Asynchronous Events
[Timeline: m2 crashes at t = 2.5, then m1 crashes at t = 3.1; states (m1 up, m2 up) → (m1 up, m2 down) → (m1 down, m2 down)]
6
Stochastic Processes with Asynchronous Events
[Timeline: m2 crashes at t = 2.5, m1 crashes at t = 3.1, m2 is rebooted at t = 4.9; states (m1 up, m2 up) → (m1 up, m2 down) → (m1 down, m2 down) → (m1 down, m2 up); the "m1 crashes" event remains enabled across the second transition]
7
A Model of Stochastic Discrete-Event Systems
  • Generalized semi-Markov process (GSMP) Matthes
    1962
  • A set of events E
  • A set of states S

8
Events
  • In a state s, events E_s ⊆ E are enabled
  • With each event e is associated:
  • A distribution G_e governing the time e must
    remain enabled before it triggers
  • A next-state probability distribution p_e(s′ | s)

9
Semantics of GSMP Model
  • Associate a real-valued clock t_e with each event e
  • For each e ∈ E_s, sample t_e from G_e
  • Let e* = argmin_{e ∈ E_s} t_e and t* = min_{e ∈ E_s} t_e
  • Sample s′ from p_{e*}(s′ | s)
  • For each e ∈ E_{s′}:
  • t′_e = t_e − t* if e ∈ E_s \ {e*}
  • sample t′_e from G_e otherwise
  • Repeat with s = s′ and t_e = t′_e
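The sampling semantics above can be sketched as a small simulator. A minimal sketch, assuming an illustrative two-machine crash example with Exp(1) clock distributions (the event names, state encoding, and distributions are assumptions, not from the talk):

```python
import random

def enabled_events(state):
    """E_s: a crash event for each machine that is currently up."""
    return [f"crash_{m}" for m in ("m1", "m2") if state[m] == "up"]

def sample_clock(event):
    """Sample a trigger time from G_e (here Exp(1) for every event)."""
    return random.expovariate(1.0)

def next_state(state, event):
    """p_e(s' | s): a crash deterministically brings the machine down."""
    machine = event.split("_")[1]
    s = dict(state)
    s[machine] = "down"
    return s

def gsmp_step(state, clocks, now):
    """One GSMP transition: trigger the enabled event with the smallest
    clock, shift clocks of events that stay enabled, resample the rest."""
    es = enabled_events(state)
    e_star = min(es, key=lambda e: clocks[e])   # e* = argmin t_e
    t_star = clocks[e_star]                     # t* = min t_e
    s_next = next_state(state, e_star)
    new_clocks = {}
    for e in enabled_events(s_next):
        if e in es and e != e_star:
            new_clocks[e] = clocks[e] - t_star  # stays enabled: not rescheduled
        else:
            new_clocks[e] = sample_clock(e)     # newly enabled: fresh sample
    return s_next, new_clocks, now + t_star

random.seed(0)
state = {"m1": "up", "m2": "up"}
clocks = {e: sample_clock(e) for e in enabled_events(state)}
t = 0.0
while enabled_events(state):
    state, clocks, t = gsmp_step(state, clocks, t)
```

After the loop both machines have crashed and no event is enabled; the key line is the clock shift t_e − t*, which is what makes the events asynchronous rather than rescheduled.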

10
Semantics Example
[Diagram: transition from (m1 up, m2 up) to (m1 up, m2 down) on "m2 crashes"; clocks before: m2 crashes = 2.5, m1 crashes = 3.1; clocks after: m1 crashes = 0.6 (= 3.1 − 2.5), reboot m2 = 2.4 (freshly sampled)]
11
Notes on Semantics
  • Events that remain enabled across state
    transitions without triggering are not
    rescheduled
  • Asynchronous events!
  • Differs from semi-Markov process in this respect
  • Continuous-time Markov chain if all Ge are
    exponential distributions

12
General State-Space Markov Chain (GSSMC)
  • The model is Markovian if we include the clocks
    in the state space
  • Extended state space X
  • Next-state distribution f(x′ | x) is well-defined
  • Clock values are not known to an observer
  • The time events have been enabled is known

13
Observation Model
  • An observation o is a state s and, for each
    enabled event e, a real value u_e representing
    the time e has currently been enabled
  • f(x | o) is well-defined

14
Observations Example
[Diagram: actual model — transition (m1 up, m2 up) → (m1 up, m2 down) on "m2 crashes", with sampled clocks m2 crashes = 2.5, m1 crashes = 3.1 before and m1 crashes = 0.6, reboot m2 = 2.4 after; observed model — same transition, recording only enabled times: all 0.0 before, and m1 crashes = 2.5, reboot m2 = 0.0 after]
15
Actions and Policies (GSMDPs)
  • Identify a set A ⊆ E of controllable events
    (actions)
  • A policy is a mapping from observations to sets
    of actions
  • Action choice can change at any time in a state s

16
Rewards and Discounts
  • Lump-sum reward k(s, e, s′) associated with the
    transition from s to s′ caused by e
  • Continuous reward rate r(s, a) associated with a
    being enabled in s
  • Discount factor γ
  • A unit reward earned at time t counts as e^(−γt)

17
Value Function for GSMDPs
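The equation for this slide is not in the transcript; one form consistent with the reward model on the previous slide — lump-sum rewards k(s, e, s′), reward rate r(s, a), discount factor γ — is the following reconstruction (an assumption, not the slide's original):

```latex
v^\pi(x) \;=\; \mathrm{E}^\pi_x\!\left[\,
  \sum_{i=1}^{\infty} e^{-\gamma t_i}\, k(s_{i-1}, e_i, s_i)
  \;+\; \int_0^{\infty} e^{-\gamma t}\, r(s_t, a_t)\, dt \right]
```

where t_i is the time of the i-th transition and e_i is the event that causes it.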
18
GSMDP Solution Method Younes Simmons 2004
GSMDP → Continuous-time MDP → Discrete-time MDP
  • GSMDP to continuous-time MDP: phase-type
    distributions
  • Continuous-time MDP to discrete-time MDP:
    uniformization Jensen 1953
19
Continuous Phase-Type Distributions Neuts 1981
  • Time to absorption in a continuous-time Markov
    chain with n transient states
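A phase-type distribution is given by an initial distribution α over the transient states and the subgenerator S of the CTMC; its mean is −αS⁻¹1. A minimal sketch in pure Python (the Erlang-3 example is an assumption for illustration):

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (small dense systems)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def phase_type_mean(alpha, S):
    """Mean time to absorption: -alpha S^{-1} 1, computed as
    alpha . m where S m = -1 (m = expected time from each phase)."""
    m = solve(S, [-1.0] * len(alpha))
    return sum(a * mi for a, mi in zip(alpha, m))

# Erlang-3 with rate 2: three phases in series, mean 3/2
S = [[-2.0,  2.0,  0.0],
     [ 0.0, -2.0,  2.0],
     [ 0.0,  0.0, -2.0]]
alpha = [1.0, 0.0, 0.0]
mean = phase_type_mean(alpha, S)  # 1.5
```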

20
Exponential Distribution
[Phase diagram: a single transient phase with rate λ to the absorbing state 0]
21
Two-Phase Coxian Distribution
[Phase diagram: from phase 1, rate pλ1 to phase 0 and rate (1 − p)λ1 to absorption; from phase 0, rate λ2 to absorption]
22
Generalized Erlang Distribution
[Phase diagram: from phase n − 1, rate pλ into a chain of phases (each with rate λ) through phase 1 to phase 0, and rate (1 − p)λ directly to absorption]
23
Method of Moments
  • Approximate general distribution G with
    phase-type distribution PH by matching the first
    n moments

24
Moments of a Distribution
  • The i-th moment: μi = E[X^i]
  • Mean: μ1
  • Variance: σ² = μ2 − μ1²
  • Coefficient of variation: cv = σ/μ1

25
Matching One Moment
  • Exponential distribution: λ = 1/μ1

26
Matching Two Moments
[Figure: cv² axis with marks at 0 and 1; the exponential distribution sits at the single point cv² = 1]
27
Matching Two Moments
[Figure: cv² axis; exponential distribution at cv² = 1; generalized Erlang distribution covers 0 < cv² ≤ 1]
28
Matching Two Moments
[Figure: cv² axis; exponential at cv² = 1; generalized Erlang covers 0 < cv² ≤ 1; two-phase Coxian covers cv² ≥ 1/2]
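One concrete two-moment fit for the high-variability case is the standard closed-form two-phase Coxian with λ1 = 2/μ1, λ2 = 1/(μ1·cv²), p = 1/(2cv²), valid for cv² ≥ 1/2. This is a common parameterization, not necessarily the one used in the talk; a sketch, checked against the Weibull example below:

```python
def fit_coxian2(mu1, cv_sq):
    """Two-phase Coxian matching mean mu1 and squared CV cv_sq (>= 1/2):
    phase 1 has rate lam1; with probability p continue to phase 2
    (rate lam2), otherwise absorb immediately."""
    assert cv_sq >= 0.5
    lam1 = 2.0 / mu1
    lam2 = 1.0 / (mu1 * cv_sq)
    p = 1.0 / (2.0 * cv_sq)
    return lam1, lam2, p

def coxian2_moments(lam1, lam2, p):
    """Mean and squared CV of the fitted two-phase Coxian."""
    m1 = 1.0 / lam1 + p / lam2
    m2 = 2.0 / lam1 ** 2 + 2.0 * p / (lam1 * lam2) + 2.0 * p / lam2 ** 2
    var = m2 - m1 ** 2
    return m1, var / m1 ** 2

# Weibull W(1, 1/2) example from the talk: mu1 = 2, cv^2 = 5
lam1, lam2, p = fit_coxian2(2.0, 5.0)
m1, c2 = coxian2_moments(lam1, lam2, p)  # recovers 2.0 and 5.0
```

Plugging the fitted rates back into the moment formulas recovers the target mean and cv² exactly, which is the whole point of the method of moments.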
29
Matching Moments Example 1
  • Weibull distribution W(1, 1/2)
  • μ1 = 2, cv² = 5

30
Matching Moments Example 2
  • Uniform distribution U(0, 1)
  • μ1 = 1/2, cv² = 1/3

31
Matching More Moments
  • Closed-form solution for matching three moments
    of positive distributions Osogami
    Harchol-Balter 2003
  • Combination of Erlang distribution and two-phase
    Coxian distribution

32
Approximating GSMDP with Continuous-time MDP
  • Each event with a non-exponential distribution is
    approximated by a set of events with exponential
    distributions
  • Phases become part of state description

33
Policy Execution
  • Phases represent discretization into
    random-length intervals of the time events have
    been enabled
  • Phases are not part of real model
  • Simulate phase transition during execution

34
The Foreman's Dilemma
  • When to enable the Service action in the Working
    state?

[Model: states Working (c = 1), Failed (c = 0), Serviced (c = 0.5); events Service ~ Exp(10), Fail ~ G, Return ~ Exp(1), Replace ~ Exp(1/100)]
35
The Foreman's Dilemma: Optimal Solution
  • Find t0 that maximizes v0

Y is the time to failure in the Working state
36
The Foreman's Dilemma: SMDP Solution
  • Same formulas, but restricted choice:
  • Action is immediately enabled (t0 = 0)
  • Action is never enabled (t0 = ∞)

37
The Foreman's Dilemma: Performance
[Plot: percent of optimal (50–100) vs. x (5–50) for failure-time distribution U(5, x)]
38
The Foreman's Dilemma: Performance
[Plot: percent of optimal (50–100) vs. x (5–40) for failure-time distribution W(1.6x, 4.5)]
39
System Administration
  • Network of n machines
  • Reward rate c(s) = k in states where k machines
    are up
  • One crash event and one reboot action per machine
  • At most one action enabled at any time
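The reward structure and the one-action-at-a-time constraint can be sketched as follows; the state encoding and the reboot-first-down policy are illustrative assumptions, not the optimized policy from the talk:

```python
def reward_rate(up):
    """c(s) = k when k machines are up; `up` is a list of booleans."""
    return sum(up)

def choose_action(up):
    """At most one action may be enabled at any time:
    reboot the first down machine, if any (a simple
    illustrative policy, not the one computed by the planner)."""
    for i, is_up in enumerate(up):
        if not is_up:
            return f"reboot_m{i + 1}"
    return None

up = [True, False, True, False]
rate = reward_rate(up)        # 2 machines up
action = choose_action(up)    # "reboot_m2"
```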

40
System Administration: Performance
Reboot-time distribution U(0, 1)
[Plot: reward (15–50) vs. number of machines n (1–13)]
41
System Administration Performance
42
The Role of Phases
  • Foreman's dilemma
  • Phases permit a delay in the enabling of actions
  • System administration
  • Phases allow us to take into account the time an
    action has already been enabled

43
Summary
  • Generalized semi-Markov (decision) processes
    allow asynchronous events
  • Phase-type distributions can be used to
    approximate a GSMDP with an MDP
  • Allows us to approximately solve GSMDPs using
    existing MDP techniques
  • Phase does matter!

44
Future Work
  • Discrete phase-type distributions
  • Handles deterministic distributions
  • Avoids uniformization step
  • Value function approximation
  • Take advantage of GSMDP structure
  • Other optimization criteria
  • Finite horizon, etc.

45
Tempastic-DTP
  • A tool for GSMDP planning
  • http://www.cs.cmu.edu/~lorens/tempastic-dtp.html