Title: Stanford University
Duke University
- Value computed by linear programming
- One variable V(x) for each state
- One constraint for each state x and action a
- Number of states and actions exponential!
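The LP above can be sketched concretely. A minimal example (assuming SciPy's `linprog`; the 2-state, 2-action MDP numbers below are made-up stand-ins):

```python
# Solving an MDP exactly by linear programming (a sketch; the 2-state,
# 2-action MDP below is a made-up stand-in).
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
P = np.array([[[0.8, 0.2],   # P[a, x, x'] transition probabilities
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.3, 0.7]]])
R = np.array([[1.0, 0.0],    # R[x, a] rewards
              [0.0, 2.0]])
n_states, n_actions = 2, 2

# One LP variable V(x) per state; objective: minimize sum_x V(x).
c = np.ones(n_states)

# One constraint per (state x, action a):
#   V(x) >= R(x, a) + gamma * sum_x' P(x'|x, a) V(x')
# rewritten into linprog's A_ub @ V <= b_ub form.
A_ub, b_ub = [], []
for a in range(n_actions):
    for x in range(n_states):
        row = gamma * P[a, x, :].copy()
        row[x] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[x, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
V = res.x  # optimal value function V*(x)
```

Here the LP has only 4 constraints, but with joint states and joint actions both spaces grow exponentially, and so does the LP.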
- Search and rescue
- Factory management
- Supply chain
- Firefighting
- Network routing
- Air traffic control
- Use variable elimination for maximization [Bertelè & Brioschi '72]
- Pick local basis functions hi
- Single LP algorithm for Factored MDPs
- Compute Qi's using one-step lookahead
- Coordination graph computes optimal action
max_{A1,A2,A3,A4} [ Q1(A1,A2) + Q2(A1,A3) + Q3(A3,A4) + Q4(A2,A4) ]
  = max_{A1,A2,A3} [ Q1(A1,A2) + Q2(A1,A3) + max_{A4} [ Q3(A3,A4) + Q4(A2,A4) ] ]
max_{A1,A2,A3} [ Q1(A1,A2) + Q2(A1,A3) + g1(A2,A3) ],
  where g1(A2,A3) = max_{A4} [ Q3(A3,A4) + Q4(A2,A4) ]
- Multiple, simultaneous decisions
- Limited observability
- Limited communication
Here we need only 23, instead of 63, sum operations.
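The variable-elimination maximization can be checked numerically. A sketch with random stand-in Q tables over binary actions, using the structure Q1(A1,A2) + Q2(A1,A3) + Q3(A3,A4) + Q4(A2,A4) and eliminating A4 first:

```python
# Variable elimination for max over a factored sum of local Q functions
# (a sketch; the Q tables are random stand-ins, all actions binary).
import itertools
import numpy as np

rng = np.random.default_rng(0)
Q1 = rng.random((2, 2))  # Q1(A1, A2)
Q2 = rng.random((2, 2))  # Q2(A1, A3)
Q3 = rng.random((2, 2))  # Q3(A3, A4)
Q4 = rng.random((2, 2))  # Q4(A2, A4)

# Eliminate A4: g1(A2, A3) = max_{A4} [ Q3(A3, A4) + Q4(A2, A4) ]
g1 = (Q3[None, :, :] + Q4[:, None, :]).max(axis=2)

# Remaining maximization over A1, A2, A3 only.
ve_max = (Q1[:, :, None] + Q2[:, None, :] + g1[None, :, :]).max()

# Brute force over all 2^4 joint actions, for comparison.
bf_max = max(Q1[a1, a2] + Q2[a1, a3] + Q3[a3, a4] + Q4[a2, a4]
             for a1, a2, a3, a4 in itertools.product(range(2), repeat=4))
```

Elimination never enumerates the joint action space: each step touches only the small tables mentioning the eliminated variable.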
- Limited communication for optimal action choice
- Comm. bandwidth = induced width of coord. graph
- Every time step:
- Observe only variables in Qi
- Instantiate state in Qi → Qi depends only on actions
- Use coordination graph for optimal action
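The instantiation step amounts to simple table slicing. A sketch with a hypothetical local Q2 over binary actions and state variables (values are random stand-ins):

```python
# State instantiation sketch: each agent observes only the state variables
# its local Qi depends on; fixing them leaves a table over actions alone.
import numpy as np

rng = np.random.default_rng(1)
Q2 = rng.random((2, 2, 2, 2))  # hypothetical Q2(A1, A2, X1, X2)

x1, x2 = 1, 0          # observed values of X1, X2 at this time step
q2 = Q2[:, :, x1, x2]  # now a function of the actions A1, A2 only
```

The resulting action-only tables are what the coordination graph maximizes over.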
- Represent as MDP:
- Action space: joint action a for all agents
- State space: joint state x of all agents
- Reward function: total reward r
- Action space is exponential:
- Action is assignment a = {a1, …, an}
- State space:
- Exponential in number of variables
- Global decision requires complete observation
Comparing to Distributed Reward and Distributed Value Function algorithms [Schneider et al. '99]
(running-time comparison plot)
- One-step utility:
- SysAdmin Ai receives reward if process completes
- Total utility: sum of rewards
- Optimal action requires long-term planning:
- Long-term utility Q(x,a)
- Expected reward, given current state x and action a
- Optimal action at state x is argmax_a Q(x,a)
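The greedy choice from Q can be sketched as follows (made-up 2-state, 2-action numbers; V is some given value estimate):

```python
# One-step lookahead sketch: Q(x,a) = R(x,a) + gamma * E[V(x') | x, a],
# and the optimal action at x is argmax_a Q(x,a).  Numbers are stand-ins.
import numpy as np

gamma = 0.95
P = np.array([[[0.9, 0.1],   # P[a, x, x'] transition probabilities
               [0.2, 0.8]],
              [[0.6, 0.4],
               [0.5, 0.5]]])
R = np.array([[0.0, 1.0],    # R[x, a] rewards
              [1.0, 0.0]])
V = np.array([1.0, 2.0])     # value estimate over next states

Q = R + gamma * np.einsum('axy,y->xa', P, V)  # Q[x, a]
best_action = Q.argmax(axis=1)                # greedy action per state
```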
Exponentially many linear constraints → one nonlinear constraint
- Functions are factored, so we can use variable elimination to represent the constraints
Q(A1,…,A4, X1,…,X4) = Q1(A1,A4, X1,X4) + Q2(A1,A2, X1,X2) + Q3(A2,A3, X2,X3) + Q4(A3,A4, X3,X4)
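A factored Q like this is never materialized as one exponential table; it is evaluated as a sum of small local terms. A sketch with random stand-in tables over binary actions and variables:

```python
# Factored Q evaluation sketch: the global Q is a sum of small local
# tables, one per term of the decomposition (random stand-in values).
import numpy as np

rng = np.random.default_rng(2)
Q1 = rng.random((2, 2, 2, 2))  # Q1(A1, A4, X1, X4)
Q2 = rng.random((2, 2, 2, 2))  # Q2(A1, A2, X1, X2)
Q3 = rng.random((2, 2, 2, 2))  # Q3(A2, A3, X2, X3)
Q4 = rng.random((2, 2, 2, 2))  # Q4(A3, A4, X3, X4)

def q_global(a, x):
    """Evaluate Q(a, x) as the sum of the four local terms."""
    a1, a2, a3, a4 = a
    x1, x2, x3, x4 = x
    return (Q1[a1, a4, x1, x4] + Q2[a1, a2, x1, x2]
            + Q3[a2, a3, x2, x3] + Q4[a3, a4, x3, x4])

val = q_global((0, 1, 0, 1), (1, 1, 0, 0))
```

Storage is 4 tables of 16 entries each, instead of one table with 2^8 entries; the gap widens exponentially with more agents and variables.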
- Q1 function of
- Actions A1 and A2
- Variables X1, X2 and X4
- In general, Qi function of
- Parents of hi
- Parents of Ri
- Compute Qi efficiently
- Coordination graph for action
- Where do the hi's come from?
(Figure: DBN for the factored MDP — action nodes A1…A4, state variables X1…X4 at times t and t+1, basis function h1, reward nodes R1…R4; annotation: "Associated with Agent 3")
Must choose action to maximize Σi Qi
Number of constraints exponentially smaller
- Local Qi
- Limited communication (coordination graph)
- Limited observation (observe variables in Qi only)
- Where do Qi's come from?
- Factored MDPs
- One-step lookahead
- Solving MDP