Title: Stanford University
Duke University
- Value computed by linear programming
- One variable V(x) for each state
- One constraint for each state x and action a
- Number of states and actions exponential!
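The LP above can be sketched concretely. A minimal example (assuming SciPy's `linprog`; the 2-state, 2-action MDP numbers below are made-up stand-ins):

```python
# Solving an MDP exactly by linear programming (a sketch; the 2-state,
# 2-action MDP below is a made-up stand-in).
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
P = np.array([[[0.8, 0.2],   # P[a, x, x'] transition probabilities
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
R = np.array([[1.0, 0.0],    # R[x, a] rewards
              [0.0, 2.0]])
n_states, n_actions = 2, 2

# One LP variable V(x) per state; objective: minimize sum_x V(x).
c = np.ones(n_states)

# One constraint per (state x, action a):
#   V(x) >= R(x, a) + gamma * sum_x' P(x'|x, a) V(x')
# rewritten into linprog's A_ub @ V <= b_ub form.
A_ub, b_ub = [], []
for a in range(n_actions):
    for x in range(n_states):
        row = gamma * P[a, x, :].copy()
        row[x] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[x, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
V = res.x  # optimal value function V*(x)
```

Here the LP has only 4 constraints, but with joint states and joint actions both spaces grow exponentially, and so does the LP.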
- Search and rescue
- Factory management
- Supply chain
- Firefighting
- Network routing
- Air traffic control
- Use variable elimination for maximization [Bertelè & Brioschi '72]
- Pick local basis functions hi
- Single LP algorithm for Factored MDPs
- Compute Qi's using one-step lookahead
- Coordination graph computes optimal action
max_{A1,A2,A3,A4} [ Q1(A1,A2) + Q2(A1,A3) + Q3(A3,A4) + Q4(A2,A4) ]
  = max_{A1,A2,A3} [ Q1(A1,A2) + Q2(A1,A3) + max_{A4} [ Q3(A3,A4) + Q4(A2,A4) ] ]
max_{A1,A2,A3} [ Q1(A1,A2) + Q2(A1,A3) + g1(A2,A3) ],
  where g1(A2,A3) = max_{A4} [ Q3(A3,A4) + Q4(A2,A4) ]
- Multiple, simultaneous decisions
- Limited observability
- Limited communication
Here we need only 23, instead of 63, sum operations.
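The variable-elimination maximization can be checked numerically. A sketch with random stand-in Q tables over binary actions, using the structure Q1(A1,A2) + Q2(A1,A3) + Q3(A3,A4) + Q4(A2,A4) and eliminating A4 first:

```python
# Variable elimination for max over a factored sum of local Q functions
# (a sketch; the Q tables are random stand-ins, all actions binary).
import itertools
import numpy as np

rng = np.random.default_rng(0)
Q1 = rng.random((2, 2))  # Q1(A1, A2)
Q2 = rng.random((2, 2))  # Q2(A1, A3)
Q3 = rng.random((2, 2))  # Q3(A3, A4)
Q4 = rng.random((2, 2))  # Q4(A2, A4)

# Eliminate A4: g1(A2, A3) = max_{A4} [ Q3(A3, A4) + Q4(A2, A4) ]
g1 = (Q3[None, :, :] + Q4[:, None, :]).max(axis=2)

# Remaining maximization over A1, A2, A3 only.
ve_max = (Q1[:, :, None] + Q2[:, None, :] + g1[None, :, :]).max()

# Brute force over all 2^4 joint actions, for comparison.
bf_max = max(Q1[a1, a2] + Q2[a1, a3] + Q3[a3, a4] + Q4[a2, a4]
             for a1, a2, a3, a4 in itertools.product(range(2), repeat=4))
```

Elimination never enumerates the joint action space: each step touches only the small tables mentioning the eliminated variable.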
- Limited communication for optimal action choice
- Comm. bandwidth = induced width of coord. graph
- Every time step:
- Observe only variables in Qi
- Instantiate state in Qi → Qi depends only on actions
- Use coordination graph for optimal action
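The instantiation step amounts to simple table slicing. A sketch with a hypothetical local Q2 over binary actions and state variables (values are random stand-ins):

```python
# State instantiation sketch: each agent observes only the state variables
# its local Qi depends on; fixing them leaves a table over actions alone.
import numpy as np

rng = np.random.default_rng(1)
Q2 = rng.random((2, 2, 2, 2))  # hypothetical Q2(A1, A2, X1, X2)

x1, x2 = 1, 0          # observed values of X1, X2 at this time step
q2 = Q2[:, :, x1, x2]  # now a function of the actions A1, A2 only
```

The resulting action-only tables are what the coordination graph maximizes over.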
- Represent as MDP:
- Action space: joint action a for all agents
- State space: joint state x of all agents
- Reward function: total reward r
- Action space is exponential:
- Action is assignment a = {a1, …, an}
- State space:
- Exponential in number of variables
- Global decision requires complete observation
Comparing to Distributed Reward and Distributed Value Function algorithms [Schneider et al. '99]
(running-time comparison plot)
- One-step utility:
- SysAdmin Ai receives reward if process completes
- Total utility: sum of rewards
- Optimal action requires long-term planning:
- Long-term utility Q(x,a)
- Expected reward, given current state x and action a
- Optimal action at state x is argmax_a Q(x,a)
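The greedy choice from Q can be sketched as follows (made-up 2-state, 2-action numbers; V is some given value estimate):

```python
# One-step lookahead sketch: Q(x,a) = R(x,a) + gamma * E[V(x') | x, a],
# and the optimal action at x is argmax_a Q(x,a).  Numbers are stand-ins.
import numpy as np

gamma = 0.95
P = np.array([[[0.9, 0.1],   # P[a, x, x'] transition probabilities
               [0.2, 0.8]],
              [[0.6, 0.4],
               [0.5, 0.5]]])
R = np.array([[0.0, 1.0],    # R[x, a] rewards
              [1.0, 0.0]])
V = np.array([1.0, 2.0])     # value estimate over next states

Q = R + gamma * np.einsum('axy,y->xa', P, V)  # Q[x, a]
best_action = Q.argmax(axis=1)                # greedy action per state
```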
Exponentially many linear constraints → one nonlinear constraint
- Functions are factored, so we can use variable elimination to represent the constraints
Q(A1,…,A4, X1,…,X4) = Q1(A1,A4, X1,X4) + Q2(A1,A2, X1,X2) + Q3(A2,A3, X2,X3) + Q4(A3,A4, X3,X4)
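A factored Q like this is never materialized as one exponential table; it is evaluated as a sum of small local terms. A sketch with random stand-in tables over binary actions and variables:

```python
# Factored Q evaluation sketch: the global Q is a sum of small local
# tables, one per term of the decomposition (random stand-in values).
import numpy as np

rng = np.random.default_rng(2)
Q1 = rng.random((2, 2, 2, 2))  # Q1(A1, A4, X1, X4)
Q2 = rng.random((2, 2, 2, 2))  # Q2(A1, A2, X1, X2)
Q3 = rng.random((2, 2, 2, 2))  # Q3(A2, A3, X2, X3)
Q4 = rng.random((2, 2, 2, 2))  # Q4(A3, A4, X3, X4)

def q_global(a, x):
    """Evaluate Q(a, x) as the sum of the four local terms."""
    a1, a2, a3, a4 = a
    x1, x2, x3, x4 = x
    return (Q1[a1, a4, x1, x4] + Q2[a1, a2, x1, x2]
            + Q3[a2, a3, x2, x3] + Q4[a3, a4, x3, x4])

val = q_global((0, 1, 0, 1), (1, 1, 0, 0))
```

Storage is 4 tables of 16 entries each, instead of one table with 2^8 entries; the gap widens exponentially with more agents and variables.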
- Q1 function of
- Actions A1 and A2
- Variables X1, X2 and X4
- In general, Qi function of
- Parents of hi
- Parents of Ri
- Compute Qi efficiently
- Coordination graph for action
- Where do the hi's come from?
(Figure: DBN for the factored MDP — action nodes A1…A4, state variables X1…X4 at times t and t+1, basis function h1, reward nodes R1…R4; annotation: "Associated with Agent 3")
Must choose action to maximize Σi Qi
Number of constraints exponentially smaller
- Local Qi
- Limited communication (coordination graph)
- Limited observation (observe variables in Qi only)
- Where do Qi's come from?
- Factored MDPs
- One-step lookahead
- Solving MDP