Title: Optimal Sequential Planning in Partially Observable Multiagent Settings
1. Optimal Sequential Planning in Partially Observable Multi-agent Settings
9th AAAI/SIGART Doctoral Consortium
Prashant Doshi, Dept. of Computer Science, Univ. of Illinois at Chicago
Joint work with Piotr Gmytrasiewicz
2. Outline
- Motivation: Real-world Application Settings
- General Problem Setting
- Background: Single-agent POMDPs
  - Definition and Solution
  - Single-agent Tiger game
- Interactive POMDPs
  - Definition and Solution
  - Multi-agent Tiger game
- Research Contributions
3. Real-World Application Settings
- Surface Mapping of Mars by Autonomous Rovers
  - Coordinate to explore a pre-defined region of Mars optimally
  - Uncertainty
- Robot Soccer
  - RoboCup competitions
  - Coordination with teammates, deception of opponents
  - Anticipate and track others' actions
(Images: Spirit and Opportunity rovers; AIBO robots)
4. General Problem Setting
5. Background: Single-agent POMDPs
- Partially Observable Markov Decision Processes
- Standard optimal sequential planning framework
- Realistic
POMDP Parameters
- S, physical state space of the environment
- A, action space of the agent
- Ω, observation space of the agent
- T: S × A × S → [0,1], transition function
- O: S × A × Ω → [0,1], observation function
- R: S × A → ℝ, preference (reward) function
6. Single-agent Tiger game
- Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger
- Tiger emits a growl periodically
- Agent may open doors or listen
Tiger game as a POMDP: S = {TL, TR}, A = {L, OL, OR}, Ω = {GL, GR}
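As a concrete illustration, here is a minimal Python sketch of the Tiger POMDP. The numeric values (0.85 listening accuracy; -1 listen cost, +10 gold, -100 tiger) are the ones commonly used in the POMDP literature and are assumptions, not values stated on the slide.

# Minimal sketch of the single-agent Tiger POMDP.
# Numeric values are the conventional ones from the literature (assumed).
S = ["TL", "TR"]            # tiger behind left / right door
A = ["L", "OL", "OR"]       # listen, open left, open right
Z = ["GL", "GR"]            # growl heard from the left / right

def T(s, a, s_next):
    """Transition probability Pr(s' | s, a)."""
    if a == "L":                      # listening leaves the tiger in place
        return 1.0 if s_next == s else 0.0
    return 0.5                        # opening a door resets the game

def O(s_next, a, z):
    """Observation probability Pr(z | s', a)."""
    if a == "L":
        correct = (s_next == "TL" and z == "GL") or (s_next == "TR" and z == "GR")
        return 0.85 if correct else 0.15
    return 0.5                        # growls are uninformative after opening

def R(s, a):
    """Immediate reward for taking action a in state s."""
    if a == "L":
        return -1.0
    opened_tiger = (a == "OL" and s == "TL") or (a == "OR" and s == "TR")
    return -100.0 if opened_tiger else 10.0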
7. Single-agent Tiger game
Belief Update (SE(b, a, o)):
b'(s') = SE(b, a, o)(s') = O(s', a, o) Σ_{s∈S} T(s, a, s') b(s) / Pr(o | a, b)
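The belief update can be transcribed directly into Python; the sketch below assumes the T and O functions of the POMDP are available as callables (e.g., the Tiger model sketched above).

def SE(b, a, o, states, T, O):
    """Return the updated belief b' = SE(b, a, o); b maps states to probabilities."""
    unnormalized = {}
    for s_next in states:
        # Pr(s', o | a, b) = O(s', a, o) * sum_s T(s, a, s') * b(s)
        unnormalized[s_next] = O(s_next, a, o) * sum(
            T(s, a, s_next) * b[s] for s in states
        )
    norm = sum(unnormalized.values())          # Pr(o | a, b)
    if norm == 0.0:
        raise ValueError("observation o has zero probability under (b, a)")
    return {s: p / norm for s, p in unnormalized.items()}

# Example with the Tiger model: a uniform belief shifts toward TL after
# listening and hearing a growl from the left.
# SE({"TL": 0.5, "TR": 0.5}, "L", "GL", S, T, O)   # -> {"TL": 0.85, "TR": 0.15}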
8. Single-agent Tiger game
Policy Computation:
V(b) = max_{a∈A} [ Σ_{s∈S} R(s, a) b(s) + γ Σ_{o∈Ω} Pr(o | a, b) V(SE(b, a, o)) ]
(Figure: trace of policy computation)
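A minimal Python sketch of finite-horizon policy computation by exhaustive lookahead over beliefs follows; it is a naive illustration of the Bellman backup above, not the algorithm used in the thesis.

def backup(b, horizon, S, A, Z, T, O, R, gamma=1.0):
    """Finite-horizon value of belief b (a dict mapping states to probabilities)."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in A:
        q = sum(R(s, a) * b[s] for s in S)      # expected immediate reward
        for z in Z:
            # Unnormalized updated belief and Pr(z | a, b) in one pass.
            bz = {s2: O(s2, a, z) * sum(T(s, a, s2) * b[s] for s in S) for s2 in S}
            p_z = sum(bz.values())
            if p_z > 0.0:
                b_next = {s2: p / p_z for s2, p in bz.items()}
                q += gamma * p_z * backup(b_next, horizon - 1, S, A, Z, T, O, R, gamma)
        best = max(best, q)
    return best

# Example with the Tiger model sketched earlier:
# backup({"TL": 0.5, "TR": 0.5}, 2, S, A, Z, T, O, R)

The optimal action at a belief is the maximizing a in the same expression.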
9. Single-agent Tiger game
(Figures: policy and value function)
- Properties of the value function
  - Value function is piecewise linear and convex
  - Value function converges asymptotically
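The piecewise linear and convex property means the finite-horizon value function can be written as a maximum over a finite set of alpha-vectors, one per conditional plan; in standard notation (not reproduced from the slide):

\[
  V_n(b) \;=\; \max_{\alpha \in \Gamma_n} \sum_{s \in S} \alpha(s)\, b(s)
\]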
10. I-POMDPs
- Interactive Partially Observable Markov Decision Processes
  - Generalization of POMDPs to multi-agent settings
- Main Idea:
  1. Consider other agents as part of the environment
  2. Agent maintains possible models of other agents, including their beliefs and their beliefs about others' beliefs
- Borrows concepts from several fields
  - Bayesian games
  - Interactive epistemology / recursive modeling
  - Decision-theoretic planning
  - Decision-theoretic approach to game theory
11. I-POMDPs
- I-POMDP_i Parameters: I-POMDP_i = ⟨IS_i, A, T_i, Ω_i, O_i, R_i⟩
  - IS_i = S × M_j, interactive state space, where M_j are the possible models of agent j; an intentional model or type θ_j = ⟨b_j, θ̂_j⟩ is Bayes rational and computable
  - A = A_i × A_j, joint action space
  - T_i: S × A × S → [0,1], transition function
    - Belief Non-manipulability Assumption (BNM): actions don't directly manipulate beliefs (instead, actions → observations → belief update)
  - Ω_i, observation space of agent i
  - O_i: S × A × Ω_i → [0,1], observation function
    - Belief Non-observability Assumption (BNO): beliefs of other agents cannot be directly observed (instead, beliefs → actions → observations)
  - R_i: IS_i × A → ℝ, preference function; preferences are generally over physical states and actions
12. I-POMDPs
- Beliefs
  - Single-agent POMDP: b_i ∈ Δ(S)
  - I-POMDP_i: b_i ∈ Δ(IS_i), where IS_i = S × M_j
    - Interactive state space is uncountably infinite
    - Nesting of beliefs (beliefs about beliefs about ...) is countably infinite
13. I-POMDPs
- Finitely nested I-POMDP: I-POMDP_{i,l}
- Computable approximations of I-POMDPs, constructed bottom up
- A 0th-level type is a POMDP
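One way to write the bottom-up construction, following the standard I-POMDP notation (the slide itself does not spell it out):

\[
  IS_{i,0} = S, \qquad
  \Theta_{j,0} = \{\, \langle b_{j,0}, \hat{\theta}_j \rangle : b_{j,0} \in \Delta(S) \,\}
\]
\[
  IS_{i,l} = S \times \Theta_{j,l-1}, \qquad
  \Theta_{j,l} = \{\, \langle b_{j,l}, \hat{\theta}_j \rangle : b_{j,l} \in \Delta(IS_{j,l}) \,\}
\]

At level l, agent i's beliefs range over the physical state and level (l-1) types of j, which keeps each finitely nested I-POMDP computable.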
14. Multi-agent Tiger game
- Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger
- Each agent hears growls as well as creaks
- Each agent may open doors or listen
- Each agent is unable to perceive the other's action or observation
- 2 agents
Multi-agent Tiger game as a level 1 I-POMDP:
S = {TL, TR}, A = {L, OL, OR} × {L, OL, OR}, Ω_i = {GL, GR} × {S, CL, CR}
15. Multi-agent Tiger game
Example: agent i's level 1 beliefs (figure panels)
- i is uninformed about j's beliefs
- i knows j is clueless
- i believes j is informed
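A minimal Python sketch of how such level 1 beliefs could be represented: a distribution over interactive states (s, b_j), with j's belief summarized by Pr_j(TL) on a small grid. The particular distributions below only illustrate the three panels named on this slide and are assumptions for illustration.

from itertools import product

S = ["TL", "TR"]
BJ_GRID = [0.0, 0.25, 0.5, 0.75, 1.0]   # candidate values of Pr_j(TL)

def uniform_over(points):
    """Uniform distribution over a list of points."""
    return {p: 1.0 / len(points) for p in points}

# "i is uninformed about j's beliefs": uniform over s and over b_j.
b_i_uninformed = uniform_over(list(product(S, BJ_GRID)))

# "i knows j is clueless": all of j's mass on Pr_j(TL) = 0.5.
b_i_j_clueless = uniform_over([(s, 0.5) for s in S])

# "i believes j is informed": j's belief points at the true tiger location.
b_i_j_informed = {("TL", 1.0): 0.5, ("TR", 0.0): 0.5}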
16. Multi-agent Tiger game
Agent i's belief update process
(Figure: belief update after agent i listens (L) and observes ⟨GL, S⟩, i.e., growl from the left and no creak)
17. Multi-agent Tiger game
Policy Computation
(Figure: policy computation traces plotted over the belief dimensions Pr(TL, b_j) and Pr(TR, b_j))
18. Multi-agent Tiger game
Value Function (figure)
Team behavior amongst agents: i prefers coordination with j
19. I-POMDPs
- Theoretical Results
  - Proposition 1 (Sufficiency): In an I-POMDP, belief over the interactive states IS_i is a sufficient statistic for the past history of i's observations
  - Proposition 2 (Belief Update): Under the BNM and BNO assumptions, the belief update function for I-POMDP_i is well defined (a sketch of the update is given after this list)
  - Theorem 1 (Convergence): For any finitely nested I-POMDP, the Value Iteration algorithm starting from an arbitrary value function converges to a unique fixed point
  - Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex
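The equation referenced in Proposition 2 is not reproduced in this text; the following sketch gives its general form in the notation of the published I-POMDP framework, where is^t = (s^t, θ_j^t), a^{t-1} = (a_i^{t-1}, a_j^{t-1}), β is a normalizing constant, and τ indicates whether j's own belief update produces b_j^t:

\[
  b_i^t(is^t) = \beta \sum_{is^{t-1}} b_i^{t-1}(is^{t-1})
    \sum_{a_j^{t-1}} \Pr(a_j^{t-1} \mid \theta_j^{t-1})\,
    T(s^{t-1}, a^{t-1}, s^t)\, O_i(s^t, a^{t-1}, o_i^t)
    \sum_{o_j^t} O_j(s^t, a^{t-1}, o_j^t)\,
    \tau\big(b_j^{t-1}, a_j^{t-1}, o_j^t, b_j^t\big)
\]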
20. Research Contributions
- Limitations of Nash equilibrium as a general multi-agent control paradigm in AI
  - Incomplete: does not say what to do off-equilibrium
  - Non-unique: multiple solutions, no way to choose
- Our approach complements Nash equilibrium: it adopts optimality and best response to anticipated actions, rather than stability
- Game-theoretic concepts + decision theory → strategic and long-term planning
- Formalizes greater autonomy amongst agents: actions and observations of other agents are not known (BNO, BNM)
- Applicable to games of cooperation and competition
21. Related Work
- Multi-agent Decision-making
  - Learning in repeated games
    - Fictitious play (Fudenberg & Levine 97)
    - Rational (Bayesian) learning (Kalai & Lehrer 93, Nyarko 97)
  - Learning in stochastic games
    - Multi-agent reinforcement learning (Littman 94, Hu & Wellman 98, Bowling & Veloso 00)
  - Other extensions of POMDPs
    - DEC-POMDP: restricted to team behavior (common payoffs) (Bernstein et al. 02, Nair et al. 03)
- Prior work places importance on Nash equilibrium
  - Learning in game theory → attempts to justify Nash eq.
  - Learning in stochastic games → impractical assumptions to obtain convergence to Nash eq.
  - Is the emphasis on Nash eq. in AI misguided?
22. Proposed Work
- Develop approximate solution techniques that trade off quality with time
- Investigate the effect of increasing levels of belief nesting on the error bounds of approximate solutions
- Investigate if, and how, solutions to I-POMDPs lead to Nash equilibrium type conditions
- Study settings of the multi-agent Tiger game that lead to human-like social interaction patterns
- Empirically evaluate the I-POMDP framework on another realistic problem domain
- Develop a graphical model using the language of influence diagrams to solve I-POMDPs online
23. Thank You
Questions?