Solving Decentralized Markov Decision Processes

1
Solving Decentralized Markov Decision Processes
  • Marek Petrik, Shlomo Zilberstein
  • University of Massachusetts Amherst
  • petrik_at_cs.umass.edu

2
Motivation
  • Real problems often have multiple agents that
    must coordinate
  • Sensor networks must coordinate use of the
    equipment by multiple users
  • Fleets of unmanned vehicles must automatically
    coordinate

3
Decentralized Problems
  • Types
  • Same as single agent
  • Conflicting objectives (Game theory)
  • Imperfect sharing of information
  • Imperfect information problems
  • The general problem is very hard (NEXP)
  • We assume there is NO information sharing
    (DEC-MDP)
  • Agents are bound together through the reward
  • NP-complete problem

4
Multi-agent Rover Example
  • Simple rovers on Mars
  • Must explore sites (rocks)
  • Must coordinate, without communication
  • Policy, possibly stochastic

[Figure: the two rovers, Opportunity and Spirit]

5
Why It Is Hard
  • Must also plan for actions of other agents
  • Complicated structure of feasible policies
  • Exponential explosion with the number of agents

6
Ideal Algorithm
  • General
  • Simple to implement
  • Simple to apply (modeling)
  • Efficient
  • Quickly provide reasonable results (Good anytime
    behavior)

7
Naïve Algorithms
  • Search over all the combinations of policies and
    choose the best one
  • Too many policies
  • For every policy of Spirit calculate the best
    policy of Opportunity
  • Fewer policies
  • Still too many policies
  • How to reduce the number of policies?
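
To make "too many policies" concrete, the sketch below counts deterministic policies under a simplifying assumption that is not stated on the slide: each rover's policy maps each of its local states to one action. The state and action counts are hypothetical.

```python
def deterministic_policies(num_states: int, num_actions: int) -> int:
    """One action is chosen independently for every local state."""
    return num_actions ** num_states


def joint_search_size(num_states: int, num_actions: int, num_agents: int = 2) -> int:
    """Naive search: every combination of one policy per agent."""
    return deterministic_policies(num_states, num_actions) ** num_agents


def best_response_search_size(num_states: int, num_actions: int) -> int:
    """Enumerate Spirit's policies only; Opportunity's best response to each
    one is then a single-agent planning problem, not a second enumeration."""
    return deterministic_policies(num_states, num_actions)


# Hypothetical sizes: 10 local states and 3 actions per rover.
states, actions = 10, 3
print(joint_search_size(states, actions))          # 3**20 joint policies
print(best_response_search_size(states, actions))  # 3**10 policies of Spirit
```

The second count is much smaller than the first, yet it still grows exponentially with the number of states, which is why the slide calls it "still too many".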

8
Properties of the Policies

9
Coverage Set Algorithm [Becker2003]
  • Non-dominated policy: a policy of Opportunity that
    is optimal for at least one policy of Spirit
  • Main idea
  • Efficiently identify non-dominated policies of
    Opportunity
  • Calculate best Spirit response for them
  • Non-dominated best response of Opportunity
  • Discover non-dominated policies at intersections
    of existing policies
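
The idea of discovering non-dominated policies at intersections can be illustrated with a one-dimensional toy, which is a sketch and not the algorithm of [Becker2003]. Assumptions made here: Spirit's policy is summarized by a single parameter t in [lo, hi], each policy of Opportunity has a value that is linear in t (stored as a hypothetical `(intercept, slope)` pair of integers), and the `best_line` oracle scans a finite list where the real algorithm would solve a planning problem.

```python
from fractions import Fraction


def value(line, t):
    """Value of one Opportunity policy at Spirit parameter t."""
    intercept, slope = line
    return intercept + slope * t


def best_line(lines, t):
    """Oracle: index of the best policy at t (ties go to the lowest index)."""
    return max(range(len(lines)), key=lambda i: (value(lines[i], t), -i))


def coverage_set(lines, lo, hi):
    """Indices of the policies that are optimal for at least one t in [lo, hi]."""
    found = set()

    def explore(a, b):
        i, j = best_line(lines, a), best_line(lines, b)
        found.update((i, j))
        if i == j:
            return  # best at both ends, hence best everywhere in between
        (c_i, s_i), (c_j, s_j) = lines[i], lines[j]
        t = Fraction(c_j - c_i, s_i - s_j)  # where the two lines intersect
        k = best_line(lines, t)
        if value(lines[k], t) > value(lines[i], t):
            # A strictly better policy exists at the intersection: split.
            explore(a, t)
            explore(t, b)

    explore(lo, hi)
    return sorted(found)


# Hypothetical policies; the last one is dominated everywhere on [0, 1].
lines = [(4, 0), (0, 10), (3, 4), (1, 1)]
print(coverage_set(lines, 0, 1))
```

Only intersections of already discovered lines are queried, so the oracle is called a handful of times instead of once per candidate value of t.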

10
Demonstration

11
Mars Rover Experiment
  • Number of sites: 5
  • Processing a site takes a random amount of time
  • Time limited to 15 time steps
  • Visit time does not matter
  • Primitive event / event
  • See [Becker2006] for more details
  • Experiments for multiple problems, using random
    rewards, processing times, and shared sites (2 to
    5)

12
Experimental Results
[Figure: experimental results plot; surviving annotation: "4 hours"]

13
Analyzing Results
  • Very good anytime behavior, but only in hindsight
  • Determine the approximation error online?
  • The crucial property: the number of interactions
    (rocks) is small
  • Determine the number of interactions?

14
Coverage Set Algorithm
  • General: No
  • Simple to implement: No
  • Simple to apply (modeling): Somewhat
  • Efficient: Yes
  • Quickly provide reasonable results: Yes, in
    hindsight

15
Bilinear Program
  • Two linear programs
  • Independent x and y constraints
  • Nonlinear term R
  • Concave minimization: multiple local minima
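
The formula on this slide did not survive in the transcript. A standard way to write a bilinear program with the properties listed above is sketched below; only x, y, and R appear on the slide, while r_1, r_2, A_1, A_2, b_1, b_2 are names assumed here for illustration.

```latex
\begin{align*}
\max_{x,\,y} \quad & r_1^{\top} x + x^{\top} R\, y + r_2^{\top} y \\
\text{s.t.} \quad  & A_1 x = b_1, \quad x \ge 0, \\
                   & A_2 y = b_2, \quad y \ge 0.
\end{align*}
```

With y fixed this is a linear program in x, and with x fixed it is a linear program in y. The constraints never mix x and y; the only coupling is the term x^T R y. Maximizing over x first leaves a convex function of y, and maximizing a convex function is the same as minimizing a concave one, which is where the multiple local optima come from.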

16
Best Response
  • Best response function
  • Convex
  • Approximate it
  • Bound error using convexity
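
The best response function itself was lost in extraction. Under the assumption that the bilinear objective has the form r_1^T x + x^T R y + r_2^T y with independent linear constraints on x (the names r_1, r_2, A_1, b_1 are assumed here), it can be sketched as:

```latex
\[
g(y) \;=\; \max_{x} \bigl\{\, r_1^{\top} x + x^{\top} R\, y + r_2^{\top} y
      \;:\; A_1 x = b_1,\ x \ge 0 \,\bigr\}
\]
% For each fixed x the objective is affine in y, so g is a pointwise
% maximum of affine functions and therefore convex. Convexity gives
\[
g\Bigl(\sum_k \lambda_k y_k\Bigr) \;\le\; \sum_k \lambda_k\, g(y_k),
\qquad \lambda_k \ge 0,\quad \sum_k \lambda_k = 1 .
\]
```

The inequality is what makes an error bound possible: exact values of g at a few points y_k bound g from above everywhere inside their convex hull.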

17
Best Response Function
  • Best response function
  • Approximate it
  • Error
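
A one-dimensional toy shows how convexity yields an error bound, as a sketch under stated assumptions and not the authors' implementation. The best response function is represented by a hypothetical finite list of affine pieces `(intercept, slope)`; the approximation keeps only the pieces already discovered, and the exact values at the two interval ends stand in for solved linear programs.

```python
def g(pieces, y):
    """Upper envelope of affine pieces: a convex, piecewise linear function."""
    return max(c + s * y for c, s in pieces)


def error_bound(pieces, known, y_lo, y_hi, y):
    """Bound on g(y) minus its approximation, for y in [y_lo, y_hi].

    Uses only exact values at the two endpoints plus the known pieces.
    """
    lam = (y_hi - y) / (y_hi - y_lo)
    # By convexity, g lies on or below the chord through the endpoints.
    upper = lam * g(pieces, y_lo) + (1 - lam) * g(pieces, y_hi)
    # The known pieces are a subset of all pieces, so they bound g from below.
    lower = g(known, y)
    return upper - lower


pieces = [(4, 0), (0, 10), (3, 4)]  # hypothetical full best response function
known = [(4, 0), (0, 10)]           # pieces found at the endpoints 0 and 1
bound = error_bound(pieces, known, 0.0, 1.0, 0.4)
true_error = g(pieces, 0.4) - g(known, 0.4)
print(true_error, bound)
```

The bound is computed without knowing the undiscovered piece `(3, 4)`, which is what allows the approximation error to be reported online instead of in hindsight.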

18
Online Bound Results
[Figure: online bound results plot; surviving annotation: "20s"]

19
Reducing Dimensionality
  • Dimensionality: the size of R, i.e. the number of
    interactions
  • Motivation
  • Crucial for good performance
  • Not obvious in many problems
  • The data is often not precise
  • Also need to determine partial dependence

20
Best Response Function
  • Approximate the problem

21
Reducing Dimensionality (2)
  • Best response quadratic function
  • Identify the significant subspace

[Figure: two panels; surviving axis labels: "y1 y2" and "y1 - y2"]

22
Implementation
  • Singular Value Decomposition to determine the
    significant subspace
  • Approximate best response only in the significant
    subspace
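
One way to read "significant subspace" is through a truncated singular value decomposition of the interaction matrix R. The derivation below is a sketch of that reading; the truncation rank k and the vectors u_i, v_i are notation assumed here, not taken from the slides.

```latex
\begin{align*}
R &= U \Sigma V^{\top} = \sum_{i} \sigma_i\, u_i v_i^{\top},
   \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0, \\
x^{\top} R\, y &= \sum_{i} \sigma_i\, (u_i^{\top} x)(v_i^{\top} y)
   \;\approx\; \sum_{i \le k} \sigma_i\, (u_i^{\top} x)(v_i^{\top} y), \\
\bigl| x^{\top} R\, y - x^{\top} R_k\, y \bigr|
   &\le \sigma_{k+1}\, \lVert x \rVert_2\, \lVert y \rVert_2,
   \qquad R_k = \sum_{i \le k} \sigma_i\, u_i v_i^{\top}.
\end{align*}
```

After truncation the coupling depends on y only through the k numbers v_1^T y, ..., v_k^T y, so the best response needs to be approximated over a k-dimensional space, and the discarded singular value sigma_{k+1} bounds the error that this introduces.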

[Figure: two panels; surviving axis labels: "y1 y2" and "y1 - y2"]

23
Improved Coverage Set Algorithm
  • General: Yes
  • Simple to implement: Yes, using linear programs
  • Simple to apply (modeling): Yes, automatically
    reduces the model
  • Efficient: Yes
  • Quickly provide reasonable results: Yes, online

24
Other Algorithms and Problems
  • Coordination
      • Average reward DEC-MDPs
      • Competitive DEC-MDPs (extensive games)
  • Mathematical programming
      • Concave quadratic programming
      • Linear complementarity problems
  • Global optimization: a bilinear program is a
    concave minimization problem
      • Cutting plane based methods

25
Conclusion
  • Some decentralized problems may be formulated as
    bilinear programs
  • Presented an algorithm that is efficient on a
    standard benchmark
  • Further work
  • Other problems
  • Other algorithms

26
Thank you
[Figure: the two rovers, Opportunity and Spirit]