Title: Optimal Sequential Planning in Partially Observable Multiagent Settings
1. Optimal Sequential Planning in Partially Observable Multi-agent Settings
9th AAAI/SIGART Doctoral Consortium
Prashant Doshi, Dept. of Computer Science, Univ. of Illinois at Chicago
Joint work with Piotr Gmytrasiewicz
2. Outline
- Motivation: Real-world Application Settings
- General Problem Setting
- Background: Single-agent POMDPs
  - Definition and Solution
  - Single-agent Tiger game
- Interactive POMDPs
  - Definition and Solution
  - Multi-agent Tiger game
- Research Contributions
3. Real-World Application Settings
- Surface Mapping of Mars by Autonomous Rovers
  - Coordinate to explore a pre-defined region of Mars optimally
  - Uncertainty
- Robot Soccer
  - RoboCup competitions
  - Coordination with teammates, deception of opponents
  - Anticipate and track others' actions
(Images: Spirit and Opportunity rovers; AIBO robots)
4. General Problem Setting
5. Background: Single-agent POMDPs
- Partially Observable Markov Decision Processes
- Standard optimal sequential planning framework
- Realistic
POMDP Parameters
- S, physical state space of the environment
- A, action space of the agent
- Ω, observation space of the agent
- T: S × A × S → [0,1], transition function
- O: S × A × Ω → [0,1], observation function
- R: S × A → ℝ, preference (reward) function
6. Single-agent Tiger game
- Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger
- Tiger emits a growl periodically
- Agent may open doors or listen
Tiger game as a POMDP: S = {TL, TR}, A = {L, OL, OR}, Ω = {GL, GR}
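As a concrete illustration, here is a minimal Python sketch of the Tiger POMDP. The numeric values (0.85 listening accuracy; -1 listen cost, +10 gold, -100 tiger) are the ones commonly used in the POMDP literature and are assumptions, not values stated on the slide.

# Minimal sketch of the single-agent Tiger POMDP.
# Numeric values are the conventional ones from the literature (assumed).
S = ["TL", "TR"]            # tiger behind left / right door
A = ["L", "OL", "OR"]       # listen, open left, open right
Z = ["GL", "GR"]            # growl heard from the left / right

def T(s, a, s_next):
    """Transition probability Pr(s' | s, a)."""
    if a == "L":                      # listening leaves the tiger in place
        return 1.0 if s_next == s else 0.0
    return 0.5                        # opening a door resets the game

def O(s_next, a, z):
    """Observation probability Pr(z | s', a)."""
    if a == "L":
        correct = (s_next == "TL" and z == "GL") or (s_next == "TR" and z == "GR")
        return 0.85 if correct else 0.15
    return 0.5                        # growls are uninformative after opening

def R(s, a):
    """Immediate reward for taking action a in state s."""
    if a == "L":
        return -1.0
    opened_tiger = (a == "OL" and s == "TL") or (a == "OR" and s == "TR")
    return -100.0 if opened_tiger else 10.0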
7. Single-agent Tiger game
Belief Update (SE(b, a, o)):
b'(s') = SE(b, a, o)(s') = O(s', a, o) Σ_{s∈S} T(s, a, s') b(s) / Pr(o | a, b)
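The belief update can be transcribed directly into Python; the sketch below assumes the T and O functions of the POMDP are available as callables (e.g., the Tiger model sketched above).

def SE(b, a, o, states, T, O):
    """Return the updated belief b' = SE(b, a, o); b maps states to probabilities."""
    unnormalized = {}
    for s_next in states:
        # Pr(s', o | a, b) = O(s', a, o) * sum_s T(s, a, s') * b(s)
        unnormalized[s_next] = O(s_next, a, o) * sum(
            T(s, a, s_next) * b[s] for s in states
        )
    norm = sum(unnormalized.values())          # Pr(o | a, b)
    if norm == 0.0:
        raise ValueError("observation o has zero probability under (b, a)")
    return {s: p / norm for s, p in unnormalized.items()}

# Example with the Tiger model: a uniform belief shifts toward TL after
# listening and hearing a growl from the left.
# SE({"TL": 0.5, "TR": 0.5}, "L", "GL", S, T, O)   # -> {"TL": 0.85, "TR": 0.15}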
8. Single-agent Tiger game
Policy Computation:
V(b) = max_{a∈A} [ Σ_{s∈S} R(s, a) b(s) + γ Σ_{o∈Ω} Pr(o | a, b) V(SE(b, a, o)) ]
(Figure: trace of policy computation)
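A minimal Python sketch of finite-horizon policy computation by exhaustive lookahead over beliefs follows; it is a naive illustration of the Bellman backup above, not the algorithm used in the thesis.

def backup(b, horizon, S, A, Z, T, O, R, gamma=1.0):
    """Finite-horizon value of belief b (a dict mapping states to probabilities)."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for a in A:
        q = sum(R(s, a) * b[s] for s in S)      # expected immediate reward
        for z in Z:
            # Unnormalized updated belief and Pr(z | a, b) in one pass.
            bz = {s2: O(s2, a, z) * sum(T(s, a, s2) * b[s] for s in S) for s2 in S}
            p_z = sum(bz.values())
            if p_z > 0.0:
                b_next = {s2: p / p_z for s2, p in bz.items()}
                q += gamma * p_z * backup(b_next, horizon - 1, S, A, Z, T, O, R, gamma)
        best = max(best, q)
    return best

# Example with the Tiger model sketched earlier:
# backup({"TL": 0.5, "TR": 0.5}, 2, S, A, Z, T, O, R)

The optimal action at a belief is the maximizing a in the same expression.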
9. Single-agent Tiger game
(Figures: policy and value function)
- Properties of the value function
  - Value function is piecewise linear and convex
  - Value function converges asymptotically
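The piecewise linear and convex property means the finite-horizon value function can be written as a maximum over a finite set of alpha-vectors, one per conditional plan; in standard notation (not reproduced from the slide):

\[
  V_n(b) \;=\; \max_{\alpha \in \Gamma_n} \sum_{s \in S} \alpha(s)\, b(s)
\]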
10. I-POMDPs
- Interactive Partially Observable Markov Decision Processes
  - Generalization of POMDPs to multi-agent settings
- Main Idea:
  1. Consider other agents as part of the environment
  2. Agent maintains possible models of other agents, including their beliefs and their beliefs about others' beliefs
- Borrows concepts from several fields
  - Bayesian games
  - Interactive epistemology / recursive modeling
  - Decision-theoretic planning
  - Decision-theoretic approach to game theory
11. I-POMDPs
- I-POMDP_i Parameters: I-POMDP_i = ⟨IS_i, A, T_i, Ω_i, O_i, R_i⟩
  - IS_i = S × M_j, interactive state space, where M_j are the possible models of agent j; an intentional model or type θ_j = ⟨b_j, θ̂_j⟩ is Bayes rational and computable
  - A = A_i × A_j, joint action space
  - T_i: S × A × S → [0,1], transition function
    - Belief Non-manipulability Assumption (BNM): actions don't directly manipulate beliefs (instead, actions → observations → belief update)
  - Ω_i, observation space of agent i
  - O_i: S × A × Ω_i → [0,1], observation function
    - Belief Non-observability Assumption (BNO): beliefs of other agents cannot be directly observed (instead, beliefs → actions → observations)
  - R_i: IS_i × A → ℝ, preference function; preferences are generally over physical states and actions
12. I-POMDPs
- Beliefs
  - Single-agent POMDP: b_i ∈ Δ(S)
  - I-POMDP_i: b_i ∈ Δ(IS_i), where IS_i = S × M_j
    - Interactive state space is uncountably infinite
    - Nesting of beliefs (beliefs about beliefs about ...) is countably infinite
13. I-POMDPs
- Finitely nested I-POMDP: I-POMDP_{i,l}
- Computable approximations of I-POMDPs, constructed bottom up
- A 0th-level type is a POMDP
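One way to write the bottom-up construction, following the standard I-POMDP notation (the slide itself does not spell it out):

\[
  IS_{i,0} = S, \qquad
  \Theta_{j,0} = \{\, \langle b_{j,0}, \hat{\theta}_j \rangle : b_{j,0} \in \Delta(S) \,\}
\]
\[
  IS_{i,l} = S \times \Theta_{j,l-1}, \qquad
  \Theta_{j,l} = \{\, \langle b_{j,l}, \hat{\theta}_j \rangle : b_{j,l} \in \Delta(IS_{j,l}) \,\}
\]

At level l, agent i's beliefs range over the physical state and level (l-1) types of j, which keeps each finitely nested I-POMDP computable.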
14. Multi-agent Tiger game
- Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger
- Each agent hears growls as well as creaks
- Each agent may open doors or listen
- Each agent is unable to perceive the other's action or observation
- 2 agents
Multi-agent Tiger game as a level 1 I-POMDP:
S = {TL, TR}, A = {L, OL, OR} × {L, OL, OR}, Ω_i = {GL, GR} × {S, CL, CR}
15. Multi-agent Tiger game
Example: agent i's level 1 beliefs (figure panels)
- i is uninformed about j's beliefs
- i knows j is clueless
- i believes j is informed
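A minimal Python sketch of how such level 1 beliefs could be represented: a distribution over interactive states (s, b_j), with j's belief summarized by Pr_j(TL) on a small grid. The particular distributions below only illustrate the three panels named on this slide and are assumptions for illustration.

from itertools import product

S = ["TL", "TR"]
BJ_GRID = [0.0, 0.25, 0.5, 0.75, 1.0]   # candidate values of Pr_j(TL)

def uniform_over(points):
    """Uniform distribution over a list of points."""
    return {p: 1.0 / len(points) for p in points}

# "i is uninformed about j's beliefs": uniform over s and over b_j.
b_i_uninformed = uniform_over(list(product(S, BJ_GRID)))

# "i knows j is clueless": all of j's mass on Pr_j(TL) = 0.5.
b_i_j_clueless = uniform_over([(s, 0.5) for s in S])

# "i believes j is informed": j's belief points at the true tiger location.
b_i_j_informed = {("TL", 1.0): 0.5, ("TR", 0.0): 0.5}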
16. Multi-agent Tiger game
Agent i's belief update process
(Figure: belief update after agent i listens (L) and observes ⟨GL, S⟩, i.e., growl from the left and no creak)
17. Multi-agent Tiger game
Policy Computation
(Figure: policy computation traces plotted over the belief dimensions Pr(TL, b_j) and Pr(TR, b_j))
18. Multi-agent Tiger game
Value Function (figure)
Team behavior amongst agents: i prefers coordination with j
19. I-POMDPs
- Theoretical Results
  - Proposition 1 (Sufficiency): In an I-POMDP, belief over the interactive states IS_i is a sufficient statistic for the past history of i's observations
  - Proposition 2 (Belief Update): Under the BNM and BNO assumptions, the belief update function for I-POMDP_i is well defined (a sketch of the update is given after this list)
  - Theorem 1 (Convergence): For any finitely nested I-POMDP, the Value Iteration algorithm starting from an arbitrary value function converges to a unique fixed point
  - Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex
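The equation referenced in Proposition 2 is not reproduced in this text; the following sketch gives its general form in the notation of the published I-POMDP framework, where is^t = (s^t, θ_j^t), a^{t-1} = (a_i^{t-1}, a_j^{t-1}), β is a normalizing constant, and τ indicates whether j's own belief update produces b_j^t:

\[
  b_i^t(is^t) = \beta \sum_{is^{t-1}} b_i^{t-1}(is^{t-1})
    \sum_{a_j^{t-1}} \Pr(a_j^{t-1} \mid \theta_j^{t-1})\,
    T(s^{t-1}, a^{t-1}, s^t)\, O_i(s^t, a^{t-1}, o_i^t)
    \sum_{o_j^t} O_j(s^t, a^{t-1}, o_j^t)\,
    \tau\big(b_j^{t-1}, a_j^{t-1}, o_j^t, b_j^t\big)
\]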
20. Research Contributions
- Limitations of Nash equilibrium as a general multi-agent control paradigm in AI
  - Incomplete: does not say what to do off-equilibrium
  - Non-unique: multiple solutions, no way to choose
- Our approach complements Nash equilibrium: it adopts optimality and best response to anticipated actions, rather than stability
- Game-theoretic concepts + decision theory → strategic and long-term planning
- Formalizes greater autonomy amongst agents: actions and observations of other agents are not known (BNO, BNM)
- Applicable to games of cooperation and competition
21. Related Work
- Multi-agent Decision-making
  - Learning in repeated games
    - Fictitious play (Fudenberg & Levine 97)
    - Rational (Bayesian) learning (Kalai & Lehrer 93, Nyarko 97)
  - Learning in stochastic games
    - Multi-agent reinforcement learning (Littman 94, Hu & Wellman 98, Bowling & Veloso 00)
  - Other extensions of POMDPs
    - DEC-POMDP: restricted to team behavior (common payoffs) (Bernstein et al. 02, Nair et al. 03)
- Prior work places importance on Nash equilibrium
  - Learning in game theory → attempts to justify Nash eq.
  - Learning in stochastic games → impractical assumptions to obtain convergence to Nash eq.
  - Is the emphasis on Nash eq. in AI misguided?
22. Proposed Work
- Develop approximate solution techniques that trade off quality with time
- Investigate the effect of increasing levels of belief nesting on the error bounds of approximate solutions
- Investigate if, and how, solutions to I-POMDPs lead to Nash equilibrium type conditions
- Study settings of the multi-agent Tiger game that lead to human-like social interaction patterns
- Empirically evaluate the I-POMDP framework on another realistic problem domain
- Develop a graphical model using the language of influence diagrams to solve I-POMDPs online
23. Thank You
Questions?