Title: Optimal Sequential Planning in Partially Observable Multiagent Settings
1. Optimal Sequential Planning in Partially Observable Multiagent Settings
- Prashant Doshi
- Department of Computer Science
- University of Illinois at Chicago
Thesis Committee: Piotr Gmytrasiewicz, Bing Liu, Peter Nelson, Gyorgy Turan, Avi Pfeffer
2. Introduction
- Background
  - POMDPs: the well-known framework for planning in single-agent partially observable settings
  - Game theory: the traditional analysis of multiagent interactions
- Problem
  - "... there is currently no good way to combine game theoretic and POMDP control strategies." (Russell and Norvig, AI: A Modern Approach, 2nd Ed.)
3. Introduction
- General Problem Setting
- [Diagram labels: Environment, State]
- Optimize an agent's preferences given its beliefs
4. Introduction
- Significance: real-world applications
  - Robotics
    - Planetary exploration
      - Surface mapping by rovers
      - Coordinate to explore a pre-defined region optimally
      - Uncertainty due to sensors
    - Robot soccer
      - Coordinate with teammates and deceive opponents
      - Anticipate and track others' actions
- [Images: the Spirit and Opportunity rovers; a RoboCup competition]
5. Introduction
- Defense
  - Coordinate troop movements in battlefields
    - Exact ground situation unknown
  - Coordinate anti-air defense units (Noh and Gmytrasiewicz 04)
- Distributed Systems
- Networked Systems
  - Packet routing
  - Sensor networks
6. Introduction
- Related Work
  - Game Theory
    - Learning in repeated games: convergence to Nash equilibrium
      - Fictitious play (Fudenberg and Levine 97)
      - Rational (Bayesian) learning (Kalai and Lehrer 93, Nyarko 97)
    - Shortcomings: the framework of repeated games is not realistic
  - Decision Theory
    - Multiagent reinforcement learning (Littman 94, Hu and Wellman 98, Bowling and Veloso 00)
    - Shortcomings: assumes the state is completely observable; slow in generating an optimal plan
  - Multi-body planning: Nash equilibrium
    - DEC-POMDPs (Bernstein et al. 02, Nair et al. 03)
    - Shortcomings: restricted to teams; assumes centralized planning
7. Introduction
- Limitations of Nash Equilibrium
  - Not suitable for general control
    - Incomplete: does not say what to do off-equilibrium
    - Non-unique: multiple solutions, with no way to choose among them
  - "game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment." (Russell and Norvig, AI: A Modern Approach, 2nd Ed.)
8. Introduction
- Our approach: key ideas
  - Integrate game theoretic concepts into a decision theoretic framework
  - Include possible models of other agents in the decision making → intentional (types) and subintentional models
  - Address uncertainty by maintaining beliefs over the state and the models of other agents → Bayesian learning
  - Beliefs over models give rise to interactive belief systems → interactive epistemology, recursive modeling
  - Computable approximation of the interactive belief system → finitely nested belief systems
  - Compute best responses to one's beliefs → subjective rationality
9. Introduction
- Claims and Contributions
  - Framework
    - Novel framework: applicable to agents in complex multiagent domains that optimize locally with respect to their beliefs
    - Addresses the limitations of Nash equilibrium: the solution technique is complete, and unique (up to plans of equal expected utility) in contrast to Nash equilibrium
    - Generality: combines strategic and long-term planning in one framework; applicable to non-cooperative and cooperative settings
    - Better quality plans: interactive beliefs result in plans with larger values than approaches that use flat beliefs
10. Introduction
- Claims and Contributions (contd.)
  - Algorithms and Analysis
    - Approximation methods
      - Interactive particle filter: an online, (bounded) anytime approximation technique for addressing the curse of dimensionality
      - Look-ahead reachability tree sampling: a complementary method for mitigating the policy space complexity
    - Exact solutions: solutions for several non-cooperative and cooperative versions of the multiagent tiger problem
    - Approximate solutions: empirical validation of the approximation methods on the multiagent tiger and machine maintenance problems
    - Convergence to equilibria: theoretical convergence to subjective equilibrium under a truth compatibility condition; illustrated the computational obstacles to satisfying the condition
11. Introduction
- Claims and Contributions (contd.)
  - Application
    - Simulation of social behaviors: agent-based simulation of commonly observed, intuitive social behaviors
    - Significant applications in robotics, defense, healthcare, economics, and networking
12. Roadmap
- Interactive POMDPs
- Background POMDPs
- Generalization to I-POMDPs
- Formal Definition and Key Theorems
- Results and Limitations
- Approximating I-POMDPs
- Curses of Dimensionality and History
- Interactive Particle Filter
- Convergence and Error Bounds
- Results
- Sampling the Look Ahead Reachability Tree
- Subjective Equilibrium in I-POMDPs
- Conclusion
13. Background: POMDPs
- Planning in single-agent, complex domains: Partially Observable Markov Decision Processes
- Single Agent Tiger Problem
  - Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
  - The tiger emits a growl periodically (GL or GR)
  - The agent may listen or open doors (L, OL, or OR)
14Background POMDPs
Steps to compute a plan 1. Model of the decision
making situation
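A minimal sketch of such a model for the single-agent tiger problem, using the standard parameterization from the POMDP literature (the thesis' reward values, e.g. for the collected gold, may differ):

```python
# Minimal single-agent tiger model (standard literature parameters; illustrative only).
STATES = ["TL", "TR"]          # tiger behind left / right door
ACTIONS = ["L", "OL", "OR"]    # listen, open left, open right
OBS = ["GL", "GR"]             # growl heard from left / right

def transition(s, a):
    """P(s' | s, a): listening leaves the state unchanged; opening a door
    resets the problem, placing the tiger uniformly at random."""
    if a == "L":
        return {s: 1.0}
    return {"TL": 0.5, "TR": 0.5}

def observation(s_next, a):
    """P(o | s', a): listening localizes the growl correctly with prob. 0.85;
    observations after opening a door are uninformative."""
    if a == "L":
        correct = "GL" if s_next == "TL" else "GR"
        return {o: (0.85 if o == correct else 0.15) for o in OBS}
    return {"GL": 0.5, "GR": 0.5}

def reward(s, a):
    """Listening costs a little; opening the tiger's door is catastrophic."""
    if a == "L":
        return -1.0
    opened_tiger = (a == "OL" and s == "TL") or (a == "OR" and s == "TR")
    return -100.0 if opened_tiger else 10.0
```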
15. Background: POMDPs
- 3. Optimal plan computation (a dynamic-programming sketch follows)
  - Build the look-ahead reachability tree
  - Dynamic programming
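A minimal dynamic-programming sketch over the look-ahead tree, reusing the tiger model functions above (undiscounted, finite horizon; a sketch of the idea rather than the thesis' exact algorithm):

```python
def belief_update(b, a, o):
    """Bayes filter: predict with the transition model, correct with the observation."""
    new_b = {}
    for s_next in STATES:
        pred = sum(b[s] * transition(s, a).get(s_next, 0.0) for s in STATES)
        new_b[s_next] = observation(s_next, a)[o] * pred
    norm = sum(new_b.values())
    return {s: p / norm for s, p in new_b.items()} if norm > 0 else b

def lookahead_value(b, horizon):
    """Dynamic programming over the look-ahead reachability tree:
    returns (value, best_action) for belief b and the given horizon."""
    if horizon == 0:
        return 0.0, None
    best = (float("-inf"), None)
    for a in ACTIONS:
        val = sum(b[s] * reward(s, a) for s in STATES)
        for o in OBS:
            # probability of observing o after doing a from belief b
            p_o = sum(observation(s2, a)[o] *
                      sum(b[s] * transition(s, a).get(s2, 0.0) for s in STATES)
                      for s2 in STATES)
            if p_o > 0:
                val += p_o * lookahead_value(belief_update(b, a, o), horizon - 1)[0]
        best = max(best, (val, a))
    return best

# With the rewards above, listening is optimal at the uniform belief:
# lookahead_value({"TL": 0.5, "TR": 0.5}, horizon=2) -> (-2.0, "L")
```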
16. Interactive POMDPs
- Generalize POMDPs to multiagent settings
  - Modify the state space: include models of other agents (agent types) in the state space
  - Modify the belief update
- The space of agent types is uncountably infinite → hierarchical belief systems (Mertens and Zamir 85, Brandenburger and Dekel 93, Aumann and Heifetz 02)
- New belief update: predict → correct (a sketch of the update follows)
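As a sketch of this predict-correct update, in the notation of the accompanying JAIR 2005 paper (the exact statement, with its normalizer and model-consistency conditions, is in the paper), the level-l belief of agent i over an interactive state is^t = (s^t, θ_j^t) is updated roughly as

\[
b_{i,l}^{t}(is^{t}) \;\propto\; \sum_{is^{t-1}} b_{i,l}^{t-1}(is^{t-1}) \sum_{a_j^{t-1}} \Pr\!\big(a_j^{t-1} \mid \theta_j^{t-1}\big)\, T\big(s^{t-1}, a_i^{t-1}, a_j^{t-1}, s^{t}\big)\, O_i\big(s^{t}, a_i^{t-1}, a_j^{t-1}, o_i^{t}\big) \sum_{o_j^{t}} O_j\big(s^{t}, a_i^{t-1}, a_j^{t-1}, o_j^{t}\big)\, \tau\big(b_{j,l-1}^{t-1}, a_j^{t-1}, o_j^{t}, b_{j,l-1}^{t}\big)
\]

The prediction step uses j's model to anticipate its action and the joint dynamics; the correction step weights by i's own observation likelihood and by the belief updates that j itself could have performed.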
17. Interactive POMDPs
- Formal Definition and Key Properties
  - Proposition 1 (Sufficiency): In an I-POMDP, the belief over the interactive states is a sufficient statistic for the past history of agent i's observations
  - Proposition 2 (Belief Update): Under the BNM and BNO assumptions, the belief update function for I-POMDP_i, when the model m_j is intentional, is given by the nested predict-correct update
  - Theorem 1 (Convergence): For any finitely nested I-POMDP, value iteration starting from an arbitrary value function converges to a unique fixed point (the backup is sketched below)
  - Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex
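For reference, the backup that Theorems 1 and 2 concern has, in a hedged sketch of the JAIR notation, the familiar Bellman form, with the expectation over the other agent's actions taken with respect to its predicted model:

\[
U(\theta_i) \;=\; \max_{a_i} \Big\{ \sum_{is} b_i(is)\, \mathrm{ER}_i(is, a_i) \;+\; \gamma \sum_{o_i} \Pr\!\big(o_i \mid a_i, b_i\big)\, U\big(\langle SE_{\theta_i}(b_i, a_i, o_i), \hat{\theta}_i\rangle\big) \Big\},
\qquad
\mathrm{ER}_i(is, a_i) = \sum_{a_j} R_i(s, a_i, a_j)\, \Pr\!\big(a_j \mid \theta_j\big).
\]

Value iteration applies this backup repeatedly; Theorem 1 says the iterates converge to a unique fixed point from any starting value function, and Theorem 2 that each iterate is piecewise linear and convex in the belief.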
18. Interactive POMDPs
- Multiagent Tiger Problem
  - Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
  - Each agent hears growls as well as creaks (S, CL, or CR)
  - Each agent may open doors or listen
  - Each agent is unable to perceive the other's observation
- Understanding the I-POMDP (level 1) belief update
19. Interactive POMDPs
- Q. Is the extra modeling effort justified?
- Q. What is the computational cost?
  - Number of POMDPs that need to be solved for level l and K other agents (an illustrative count follows)
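The slide's formula did not survive extraction. As a rough illustration only, under the assumption (made here, not stated in the slide) that the agent entertains M candidate models of each of the K other agents at every nesting level, the number of lower-level models that must be solved can be counted recursively:

```python
def num_models_to_solve(level, K, M):
    """Illustrative count of lower-level models solved for a level-`level` agent,
    assuming M candidate models per other agent per level (an assumption for
    illustration; the thesis' exact formula may differ).
    Level-0 agents are plain POMDPs, so nothing further needs solving."""
    if level == 0:
        return 0
    # K * M models at the next level down, each of which recursively
    # requires solving its own lower-level models.
    return K * M * (1 + num_models_to_solve(level - 1, K, M))

# e.g. a level-2 agent with 1 other agent and 2 models per level:
# 2 level-1 models, each needing 2 level-0 POMDPs -> 2 + 4 = 6
print(num_models_to_solve(2, K=1, M=2))   # -> 6
```

The count grows exponentially with the nesting level l, which is the point of the slide's question about computational cost.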
20. Interactive POMDPs
- Interesting plans in the multiagent tiger problem
- Rule of thumb: two consistent observations from the same side lead to opening of doors
[Policy diagrams (x-axis: p_j(TL)): listen (L) nodes with observation-labeled transitions (GL/GR combined with S/CL/CR) leading to further listening or to opening a door (OL/OR)]
21. Interactive POMDPs
- Application
  - Agent-based simulation of intuitive social behaviors
    - Follow the leader
      - Unconditional follow the leader
      - Conditional follow the leader
22. Interactive POMDPs
Approximation techniques that trade off solution quality for computation are critically required to apply I-POMDPs to realistic settings.
23. Roadmap
- Interactive POMDPs
- Approximating I-POMDPs
- Curses of Dimensionality and History
- Key Idea Sampling
- Interactive Particle Filter
- Convergence and Error Bounds
- Results
- Sampling the Look Ahead Reachability Tree
- Subjective Equilibrium in I-POMDPs
- Convergence of Bayesian Learning
- Subjective Equilibrium
- Computational Limitations
- Conclusion
24. Approximating I-POMDPs
- Two sources of complexity
  - Curse of dimensionality
    - The belief dimension grows with the number of interactive states
  - Curse of history
    - The cardinality of the policy space grows with the planning horizon
25Approximating I-POMDPs
Addressing the curse of dimensionality
Details of Particle Filtering
Single agent tiger problem
Projection
Overview of our method
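A basic single-agent particle filter step, for orientation before the interactive version; `transition` and `observation` are the model functions from the earlier tiger sketch:

```python
import random

def particle_filter_step(particles, a, o, transition, observation):
    """One projection/weighting/resampling step of a basic particle filter.
    `particles` is a list of sampled states approximating the current belief."""
    # Projection: push each particle through the transition model.
    propagated = []
    for s in particles:
        dist = transition(s, a)
        s_next = random.choices(list(dist.keys()), weights=dist.values())[0]
        propagated.append(s_next)
    # Weighting: score each propagated particle by the likelihood of the observation.
    weights = [observation(s_next, a)[o] for s_next in propagated]
    if sum(weights) == 0:
        return propagated  # degenerate case: keep the propagated set
    # Resampling: draw a new particle set in proportion to the weights.
    return random.choices(propagated, weights=weights, k=len(particles))
```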
26. Approximating I-POMDPs
- Interactive Particle Filtering (a sketch of the propagation step follows)
  - Propagation
    - Sample the other agent's action
    - Sample the next physical state
    - For each of the other agent's observations, update its belief
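A sketch of the propagation step just listed. Each particle pairs a physical state with the other agent's lower-level belief; `frames` bundles hypothetical helpers (policy_j, transition, observation_j, belief_update_j) that stand in for solving and simulating the other agent's model, so this illustrates the control flow rather than the thesis' exact algorithm:

```python
import random

def ipf_propagate(particles, a_i, frames):
    """Propagation step of an interactive particle filter (illustrative sketch).
    Each particle is a pair (s, b_j): a physical state and the other agent's
    lower-level belief."""
    propagated = []
    for s, b_j in particles:
        # 1. Sample the other agent's action from its (solved) model.
        a_j = frames.policy_j(b_j)
        # 2. Sample the next physical state given the joint action.
        dist = frames.transition(s, a_i, a_j)
        s_next = random.choices(list(dist.keys()), weights=dist.values())[0]
        # 3. For each observation the other agent could receive,
        #    update its belief and weight the branch by its likelihood.
        for o_j, p_oj in frames.observation_j(s_next, a_i, a_j).items():
            if p_oj > 0:
                b_j_next = frames.belief_update_j(b_j, a_j, o_j)
                propagated.append(((s_next, b_j_next), p_oj))
    # Weighted interactive-state particles, before correction with agent i's own observation.
    return propagated
```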
27. Approximating I-POMDPs
- Convergence and Error Bounds
  - The approximation does not necessarily converge
  - Theorem: For a singly-nested t-horizon I-POMDP with discount factor γ, the error introduced by our approximation technique is upper bounded
  - The bound follows from the Chernoff-Hoeffding bounds (the key inequality is recalled below)
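The bound's statement did not survive the slide export; the Chernoff-Hoeffding ingredient it rests on is the standard concentration inequality for a particle estimate. If \(\hat{V}_N\) is the average of N independent samples of a quantity normalized to [0, 1] with mean V, then

\[
\Pr\big(|\hat{V}_N - V| \ge \varepsilon\big) \;\le\; 2\exp\!\big(-2N\varepsilon^2\big),
\]

and, roughly speaking, this per-step sampling error is then accumulated over the t-horizon, γ-discounted backups to give the overall bound.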
28. Approximating I-POMDPs
- Empirical Results
  - Q. How good is the approximation?
- [Figures: performance profiles for level 1 and level 2 beliefs on the multiagent tiger problem]
29Approximating I-POMDPs
Performance Profiles (Contd.)
Level 1 belief
Level 2 belief
Multiagent Machine Maintenance Problem
30. Approximating I-POMDPs
- Q. Does it save on computational costs?
  - Reduction in the number of POMDPs that need to be solved
- [Table: runtimes on a Pentium IV 2.0 GHz, 2 GB RAM, Linux; some entries ran out of memory]
31. Approximating I-POMDPs
- Reducing the impact of the curse of history (a sketch follows)
  - Sample observations while building the look-ahead reachability tree
  - Consider only the likely future beliefs
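A sketch of the idea, with `sample_obs` and `belief_update` as assumed helpers (the thesis' algorithm and its sample-allocation details may differ):

```python
import random

def sampled_reachability_tree(b, horizon, actions, sample_obs, belief_update, n_obs_samples):
    """Look-ahead tree construction with observation sampling: instead of
    branching on every observation, expand only `n_obs_samples` observations
    drawn from Pr(o | b, a), so only the likely future beliefs are kept."""
    if horizon == 0:
        return {"belief": b, "children": []}
    children = []
    for a in actions:
        for _ in range(n_obs_samples):
            o = sample_obs(b, a)                 # draw a likely observation
            b_next = belief_update(b, a, o)
            children.append((a, o, sampled_reachability_tree(
                b_next, horizon - 1, actions, sample_obs, belief_update, n_obs_samples)))
    return {"belief": b, "children": children}
```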
32. Approximating I-POMDPs
- [Figures: performance profiles for horizons 3 and 4 on the multiagent tiger problem]
- Computational Savings
- [Table: runtimes on a Pentium IV 2.0 GHz, 2 GB RAM, Linux; some entries ran out of memory]
33. Roadmap
- Interactive POMDPs
- Approximating I-POMDPs
- Subjective Equilibrium in I-POMDPs
- Convergence of Bayesian Learning
- Subjective Equilibrium
- Computational Limitations
- Conclusion
- Summary
- Future Work
34. Subjective Equilibrium in I-POMDPs
- Theoretical Analysis
  - Joint observation histories in the multiagent tiger problem
  - Absolute Continuity Condition (ACC)
    - The agent's initial belief, induced over the future observation paths, should not rule out paths considered possible by the true distribution
  - Cautious beliefs → grain of truth assumption
35. Subjective Equilibrium in I-POMDPs
- Theorem 1: Under the ACC, an agent's belief over the other's models, updated using the I-POMDP belief update, converges with probability 1
  - Proof sketch: show that Bayesian learning is a martingale (see the identity below), then apply the Martingale Convergence Theorem (Doob 53)
- Subjective ε-Equilibrium (Kalai and Lehrer 93): a profile of strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation histories
- Subjective equilibrium is stable under learning and optimization
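The martingale identity behind the proof sketch is the standard property of Bayesian posteriors: conditioned on the observation history h^t, the expected next posterior assigned to any model m_j of the other agent equals the current posterior,

\[
\mathbb{E}\big[\, b_i^{t+1}(m_j) \,\big|\, h^{t} \,\big] \;=\; b_i^{t}(m_j),
\]

so the bounded sequence of posteriors converges almost surely by the Martingale Convergence Theorem.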
36. Subjective Equilibrium in I-POMDPs
- Corollary: If the agents' beliefs within the I-POMDP framework satisfy the ACC, then after a finite time T their strategies are in subjective ε-equilibrium, where ε is a function of T
  - When ε → 0, subjective equilibrium obtains
  - The proof follows from the convergence of the I-POMDP belief update
- The ACC is a sufficient condition, but not a necessary one
37. Subjective Equilibrium in I-POMDPs
- Computational Limitations
  - There exist computable strategies that admit no computable exact best responses (Nachbar and Zame 96)
    - If j's possible strategies are assumed computable, then i's best response may not be computable; therefore, j's cautious beliefs cannot contain a grain of truth
  - Subtle tension between prediction and optimization
  - Strictness of the ACC
    - Theorem 2: Within the finitely nested I-POMDP framework, the agents' beliefs will never all simultaneously satisfy the grain of truth assumption
38. Roadmap
- Interactive POMDPs
- Approximating I-POMDPs
- Subjective Equilibrium in I-POMDPs
- Conclusion
- Summary
- Future Work
39. Summary
- I-POMDP: a novel framework for planning in complex multiagent settings
  - Combines concepts from decision theory and game theory
  - Allows strategic as well as long-term planning
  - Applicable to cooperative and non-cooperative settings
  - Solutions are complete and unique (up to plans of equal expected utility)
- Online anytime approximation techniques
  - Interactive particle filter: addresses the curse of dimensionality
  - Reachability tree sampling: reduces the effect of the curse of history
- Equilibria in I-POMDPs
  - Theoretical convergence to subjective equilibrium given the ACC
  - Computational obstacles to satisfying the ACC
- Applications
  - Agent-based simulation of social behaviors
  - Robotics, defense, healthcare, economics, and networking
40. Future Work
- Other approximation methods
- Tighter error bounds
- Multiagent planning with bounded rational agents
  - Models for describing bounded rational agents
- Communication between agents
- Cost-optimality profiles for plans as a function of the level of nesting
- Other applications
41. Thank You
Questions?
42. Selected Publications (full publication list at http://dali.ai.uic.edu/pdoshi)
- Selected Journals
  - Piotr Gmytrasiewicz, Prashant Doshi, "A Framework for Sequential Planning in Multiagent Settings," Journal of AI Research (JAIR), Vol. 23, 2005
  - Prashant Doshi, Richard Goodwin, Rama Akkiraju, Kunal Verma, "Dynamic Workflow Composition using Markov Decision Processes," Journal of Web Services Research (JWSR), 2(1):1-17, 2005
- Selected Conferences
  - Prashant Doshi, Piotr Gmytrasiewicz, "A Particle Filtering Based Approach to Approximating Interactive POMDPs," National Conference on AI (AAAI), pp. 969-974, July 2005
  - Prashant Doshi, Piotr Gmytrasiewicz, "Approximating State Estimation in Multiagent Settings using Particle Filters," Autonomous Agents and Multiagent Systems Conference (AAMAS), July 2005
  - Piotr Gmytrasiewicz, Prashant Doshi, "Interactive POMDPs: Properties and Preliminary Results," Autonomous Agents and Multiagent Systems Conference (AAMAS), pp. 1374-1375, July 2004
  - Prashant Doshi, Richard Goodwin, Rama Akkiraju, Kunal Verma, "Dynamic Workflow Composition using Markov Decision Processes," International Conference on Web Services (ICWS), pp. 576-582, July 2004
  - Piotr Gmytrasiewicz, Prashant Doshi, "A Framework for Sequential Planning in Multiagent Settings," International Symposium on AI and Math (AMAI), January 2004
43. Interactive POMDPs
- Finitely nested I-POMDP: I-POMDP_{i,l}
  - Computable approximations of I-POMDPs
  - Solved bottom up: a 0th-level type is a POMDP (a sketch of the recursion follows)
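A sketch of the bottom-up recursion; `solve_pomdp`, `best_response`, and `models_of_others` are hypothetical placeholders standing in for the thesis' actual solvers:

```python
def solve_ipomdp(agent, level):
    """Bottom-up solution of a finitely nested I-POMDP (illustrative sketch)."""
    if level == 0:
        # A 0th-level type folds the other agents' effects into noise,
        # so it is solved as an ordinary POMDP.
        return solve_pomdp(agent)
    # Solve every candidate model of the other agents at the level below ...
    lower_policies = [solve_ipomdp(m, level - 1)
                      for m in agent.models_of_others(level - 1)]
    # ... then compute the agent's own best response to its belief over those models.
    return best_response(agent, lower_policies)
```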
44. Interactive POMDPs
- Solutions to the enemy version of the multiagent tiger problem
- Agent i believes that j is likely to be uninformed
45. Interactive POMDPs
- Agent i believes that j is likely to be almost informed
46. Interactive POMDPs
- The value of an interaction for an agent is greater when its enemy is uninformed than when it is informed
47. Background: POMDPs
- Policy Computation
- [Figure: trace of policy computation]
48. Background: POMDPs
- Policy
- Value Function
- Properties of the value function
  - The value function is piecewise linear and convex (see the expression below)
  - The value function converges asymptotically
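Piecewise linearity and convexity means the value function can be written as a maximum over a finite set Γ of α-vectors, one per conditional plan:

\[
V(b) \;=\; \max_{\alpha \in \Gamma} \; \sum_{s \in S} \alpha(s)\, b(s),
\]

a maximum of linear functions of the belief, hence convex; value iteration updates Γ and converges asymptotically.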
49. Approximating I-POMDPs
- [Figure: performance profiles on the multiagent tiger problem]