Stanford University - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Stanford University


1
Duke University
Stanford University
  • Value function computed by linear programming
  • One variable V(x) for each state
  • One constraint for each state x and action a
  • Number of states and actions is exponential! (flat-LP sketch after this list)
  • Search and rescue
  • Factory management
  • Supply chain
  • Firefighting
  • Network routing
  • Air traffic control
  • Use variable elimination for maximization
    [Bertele & Brioschi 72]
  1. Pick local basis functions hi
  2. Single LP algorithm for factored MDPs
  3. Compute the Qi's using one-step lookahead
  4. Coordination graph computes the optimal action
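
To make the flat LP from the first bullets concrete, here is a minimal sketch for a tiny hypothetical 2-state, 2-action MDP (the P, R, and gamma values are made-up numbers for illustration; this is the exponential, table-based formulation, not the factored LP of step 2):

import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# P[a][x][x']: transition probabilities; R[x][a]: rewards (illustrative)
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.3, 0.7]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
n_states, n_actions = 2, 2

# minimize sum_x V(x)  s.t.  V(x) >= R(x,a) + gamma * sum_x' P(x'|x,a) V(x')
# (one LP variable per state, one constraint per state-action pair)
c = np.ones(n_states)
A_ub, b_ub = [], []
for x in range(n_states):
    for a in range(n_actions):
        row = gamma * P[a][x] - np.eye(n_states)[x]   # row @ V <= -R(x,a)
        A_ub.append(row)
        b_ub.append(-R[x, a])
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=(None, None))
print("V* =", res.x)

With n binary state variables, this table-based LP needs 2^n variables and 2^n x |A| constraints, which is exactly the blow-up the factored algorithm avoids.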







max_{A1,A2,A3} [ Q1(A1,A2) + Q2(A1,A3) + max_{A4} ( Q3(A3,A4) + Q4(A2,A4) ) ]



max_{A1,A2,A3} [ Q1(A1,A2) + Q2(A1,A3) + g(A2,A3) ],
where g(A2,A3) = max_{A4} ( Q3(A3,A4) + Q4(A2,A4) )
  • Multiple, simultaneous decisions
  • Limited observability
  • Limited communication

Here we need only 23 sum operations, instead of 63 (see the sketch below).
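
A minimal sketch of the nested maximization above, assuming binary actions and randomly filled Q tables (the table contents are illustrative; only the elimination structure follows the slides):

import numpy as np
rng = np.random.default_rng(0)

# Q1(A1,A2), Q2(A1,A3), Q3(A3,A4), Q4(A2,A4): one 2x2 table each
Q1, Q2, Q3, Q4 = (rng.standard_normal((2, 2)) for _ in range(4))

# eliminate A4:  g(A2,A3) = max_A4 [ Q3(A3,A4) + Q4(A2,A4) ]
g = np.max(Q3[np.newaxis, :, :] + Q4[:, np.newaxis, :], axis=2)  # g[a2,a3]

# remaining maximization over A1, A2, A3
total = (Q1[:, :, np.newaxis]      # Q1[a1,a2]
         + Q2[:, np.newaxis, :]    # Q2[a1,a3]
         + g[np.newaxis, :, :])    # g[a2,a3]
a1, a2, a3 = np.unravel_index(np.argmax(total), total.shape)
a4 = int(np.argmax(Q3[a3] + Q4[a2]))   # recover A4's maximizing value
print("joint action:", (a1, a2, a3, a4), "value:", total[a1, a2, a3])

Eliminating A4 touches only Q3 and Q4, which is why the cost is governed by the induced width of the coordination graph rather than by the number of agents.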
  • Limited communication for optimal action choice
  • Comm. bandwidth = induced width of coord. graph

(Figure: two time slices, t and t+1.)
  • Every time step:
  • Observe only the variables in Qi
  • Instantiate the state in Qi ⇒ Qi depends only on
    actions
  • Use coordination graph for optimal action (sketch after this list)
  • Represent as an MDP:
  • Action space: joint action a for all agents
  • State space: joint state x of all agents
  • Reward function: total reward r
  • Action space is exponential:
  • Action is an assignment a = (a1,…,an)
  • State space: exponential in the number of variables
  • Global decision requires complete observation
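
Referring back to the per-time-step protocol bullets above, a minimal sketch of the "instantiate state" step (array shapes and the agent/variable assignment are illustrative assumptions, not from the talk):

import numpy as np
rng = np.random.default_rng(1)

# Q1(A1, A2, X1, X2): two binary actions, plus the two state
# variables Q1 depends on
Q1_full = rng.standard_normal((2, 2, 2, 2))

x1, x2 = 1, 0               # agent 1 observes only X1 and X2 this step
Q1 = Q1_full[:, :, x1, x2]  # instantiated: now a function of (A1, A2) only
# ...likewise for the other Qi, then run the coordination graph as sketched earlier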

Comparing to the Distributed Reward and Distributed Value Function algorithms [Schneider et al. 99]. (Figure: running time comparison.)
  • One-step utility:
  • SysAdmin Ai receives reward if its process completes
  • Total utility: sum of rewards
  • Optimal action requires long-term planning
  • Long-term utility Q(x,a):
  • Expected reward, given current state x and action
    a
  • Optimal action at state x is argmax_a Q(x,a) (lookahead sketch below)
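
A minimal sketch of one-step lookahead in the flat case, reusing the illustrative P, R, gamma, and LP value V from the earlier sketch (the talk computes each local Qi the same way, but over a factored model):

import numpy as np

def one_step_lookahead(P, R, V, gamma):
    # Q[x,a] = R[x,a] + gamma * sum_x' P(x'|x,a) * V[x']
    n_states, n_actions = R.shape
    Q = np.empty((n_states, n_actions))
    for a in range(n_actions):
        Q[:, a] = R[:, a] + gamma * (P[a] @ V)
    return Q

# optimal action at state x: Q.argmax(axis=1)[x]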

Exponentially many linear constraints ⇒ one nonlinear
constraint
  • Functions are factored, so we can use variable
    elimination to represent the constraints (a small
    worked example follows)
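
A small worked example of that rewriting, with illustrative local functions f1, f2 and binary-or-larger variables X1, X2, X3 (names assumed, not from the slides). The exponential family of linear constraints

  φ ≥ f1(X1,X2) + f2(X2,X3)   for every assignment (X1,X2,X3)

is equivalent to the single nonlinear constraint φ ≥ max_x [ f1(X1,X2) + f2(X2,X3) ]. Eliminating X1 introduces a new LP variable u(X2) with the small constraint sets

  u(X2) ≥ f1(X1,X2)           for every (X1,X2)
  φ ≥ u(X2) + f2(X2,X3)       for every (X2,X3)

so the constraint count drops from |X1|·|X2|·|X3| to |X1|·|X2| + |X2|·|X3|, exponentially smaller in general.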

Q(A1,…,A4, X1,…,X4) ≈ Q1(A1, A4, X1,X4) + Q2(A1, A2, X1,X2) + Q3(A2, A3, X2,X3) + Q4(A3, A4, X3,X4)
  • Q1: function of
  • actions A1 and A2
  • variables X1, X2 and X4
  • In general, Qi is a function of
  • parents of hi
  • parents of Ri
  • Compute Qi efficiently (sketch below)
  • Coordination graph for action selection
  • Where do the hi's come from?
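A hedged reconstruction of how each Qi combines hi and Ri via one-step lookahead (following the factored-MDP literature; wi is the weight the LP assigns to basis function hi, and the formula is an assumption, not quoted from the slides):

  Qi(x,a) = Ri(x,a) + γ · wi · Σ_x' P(x'|x,a) · hi(x')

The sum ranges only over the next-state variables hi depends on, so Qi's scope stays small: the DBN parents of hi plus the scope of Ri.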

(Figure: factored MDP over two time slices, with state variables X1,…,X4, actions A1,…,A4, rewards R1,…,R4, and basis function h1; one fragment is marked "Associated with Agent 3".)

Must choose action to maximize Σi Qi
Number of constraints exponentially smaller
2
  • Local Qi
  • Limited communication (coordination graph)
  • Limited observation (observe variables in Qi
    only)
  • Where do the Qi's come from?
  • Factored MDPs
  • One-step lookahead
  • Solving the MDP