Networked Distributed POMDPs: DCOPInspired Distributed POMDPs - PowerPoint PPT Presentation

1 / 23

About This Presentation

Title:

Networked Distributed POMDPs: DCOPInspired Distributed POMDPs

Description:

Number of Views:44

Avg rating:3.0/5.0

Slides: 24

Provided by: Empl206

Category:

more less

Transcript and Presenter's Notes

Title: Networked Distributed POMDPs: DCOPInspired Distributed POMDPs

1
Networked Distributed POMDPs DCOP-Inspired
Distributed POMDPs

2
Background DPOMDP

Distributed Partially Observable Markov Decision
Problems (DPOMDP) a decision theoretic approach
Performance linked to optimality of decision
making
Explicitly reasons about (/-ve) rewards and
uncertainty.
Current methods use centralized planning and
distributed execution
The complexity of finding optimal policy is
NEXP-Complete
In many domains, not all agents can interact or
affect each other
Most current DPOMDP algorithms do not exploit
locality of interaction

Disaster Rescue simulations
Distributed sensors
Battlefield simulations
3
Background DCOP

4
Key ideas and contributions

Key ideas
Exploit locality of interaction to enable
scale-up
Hybrid DCOP DPOMDP approach to collaboratively
find joint policy
Distributed offline planning and distributed
execution
Key contributions
ND-POMDP
Distributed POMDP model that captures locality of
interaction
Locally Interacting Distributed Joint
Equilibrium-based Search for Policies (LID-JESP)
Hill climbing like Distributed Breakout Algorithm
(DBA)
Distributed Parallel Algorithm for Finding
Locally Optimal Joint Policy
Globally Optimal Algorithm (GOA)
Variable Elimination

5
Outline

Sensor net domain
Networked Distributed POMDPs (ND-POMDPs)
Locally interacting distributed joint
equilibrium-based search for policies (LID-JESP)
Globally optimal algorithm
Experiments
Conclusions and Future Work

6
Example Domain

7
Networked Distributed POMDP

8
ND-POMDP

Transition independence Agent is local state
cannot be affected by other agents
Pi Si Su Ai Si ? 0,1
Pu Su Su ? 0,1
O O1 On , where Oi is set of observations
for agent i
E.g. Target present in sector
Observation independence Agent is observations
not dependent on others
Oi Si Su Ai Oi ? 0,1
Reward function R is decomposable
R(s,a) ?l Rl (sl1, slk, su, al1, alk)
l ? Ag, and k l
Goal To find a joint policy p lt p1, , pngt
where pi is the local policy of agent i such
that p maximizes the expected joint reward over
finite horizon T

9
ND-POMDP as a DCOP

R1 Ag1s cost for scanning R12 Reward for Ag1
and Ag2 tracking target
10
ND-POMDP theorems

Theorem 1 For an ND-POMDP, expected reward for a
policy ? is the sum of expected rewards for each
of the links for policy ?
Global value function is decomposable into value
functions for each link
Local Neighborhood Utility V?Ni Expected
reward obtained from all links involving agent i
for executing policy ?
Theorem 2 Locality of interaction For policies
? and ?, if ?i ?i and ?Ni ?Ni then V?Ni
V?Ni
Given its neighbors policies, local neighborhood
utility of agent i does not depend on any
non-neighbors policy

11
LID-JESP

LID-JESP Algorithm (based on Distributed Breakout
Algorithm)
Choose local policy randomly
Communicate local policy to neighbors
Compute local neighborhood utility of current
policy wrt to neighbors policies
Compute local neighborhood utility of best
response policy wrt neighbors (GetValue)
Communicate the gain (4 - 3) to neighbors
If gain is greater than gain of neighbors
Change local policy to best response policy
Communicate changed policy to neighbors
Else
If not reached termination go to step 3
Theorem 3 Global Utility is strictly increasing
with each iteration until local optimum is
reached

12
Termination Detection

13
Computing best response policy

Given neighbors fixed policies, each agent is
faced with solving a single agent POMDP
State is
Note state is not fully observable
Transition function
Observation function
Reward function
Best response computed using Bellman backup
approach

14
Global Optimal Algorithm (GOA)

15
Experiments

16
Experiments

17
Experiments

Reasons for speedup
C No. of cycles
G No. of GetValue calls
W No. of agents that change their policies in a
cycle
LID-JESP converges in fewer cycles (column C)
LID-JESP allows multiple agents to change their
policies in a single cycle (column W)
JESP has fewer GetValue calls than LID-JESP
But each such call was slower

18
Complexity