Title: Optimal Sequential Planning in Partially Observable Multiagent Settings
1. Optimal Sequential Planning in Partially Observable Multiagent Settings
- Prashant Doshi
- Department of Computer Science
- University of Illinois at Chicago
Thesis Committee: Piotr Gmytrasiewicz, Bing Liu, Peter Nelson, Gyorgy Turan, Avi Pfeffer
2. Introduction
- Background
  - POMDPs: the well-known framework for planning in single-agent partially observable settings
  - Game theory: the traditional analysis of multiagent interactions
- Problem
  - "... there is currently no good way to combine game theoretic and POMDP control strategies." (Russell and Norvig, AI: A Modern Approach, 2nd Ed.)
3. Introduction
- General Problem Setting
- [Diagram labels: Environment, State]
- Optimize an agent's preferences given its beliefs
4. Introduction
- Significance: real-world applications
  - Robotics
    - Planetary exploration
      - Surface mapping by rovers
      - Coordinate to explore a pre-defined region optimally
      - Uncertainty due to sensors
    - Robot soccer
      - Coordinate with teammates and deceive opponents
      - Anticipate and track others' actions
- [Images: the Spirit and Opportunity rovers; a RoboCup competition]
5. Introduction
- Defense
  - Coordinate troop movements in battlefields
    - Exact ground situation unknown
  - Coordinate anti-air defense units (Noh and Gmytrasiewicz 04)
- Distributed Systems
- Networked Systems
  - Packet routing
  - Sensor networks
6. Introduction
- Related Work
  - Game Theory
    - Learning in repeated games: convergence to Nash equilibrium
      - Fictitious play (Fudenberg and Levine 97)
      - Rational (Bayesian) learning (Kalai and Lehrer 93, Nyarko 97)
    - Shortcomings: the framework of repeated games is not realistic
  - Decision Theory
    - Multiagent reinforcement learning (Littman 94, Hu and Wellman 98, Bowling and Veloso 00)
    - Shortcomings: assumes the state is completely observable; slow in generating an optimal plan
  - Multi-body planning: Nash equilibrium
    - DEC-POMDPs (Bernstein et al. 02, Nair et al. 03)
    - Shortcomings: restricted to teams; assumes centralized planning
7. Introduction
- Limitations of Nash Equilibrium
  - Not suitable for general control
    - Incomplete: does not say what to do off-equilibrium
    - Non-unique: multiple solutions, with no way to choose among them
  - "game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment." (Russell and Norvig, AI: A Modern Approach, 2nd Ed.)
8. Introduction
- Our approach: key ideas
  - Integrate game theoretic concepts into a decision theoretic framework
  - Include possible models of other agents in the decision making → intentional (types) and subintentional models
  - Address uncertainty by maintaining beliefs over the state and the models of other agents → Bayesian learning
  - Beliefs over models give rise to interactive belief systems → interactive epistemology, recursive modeling
  - Computable approximation of the interactive belief system → finitely nested belief systems
  - Compute best responses to one's beliefs → subjective rationality
9. Introduction
- Claims and Contributions
  - Framework
    - Novel framework: applicable to agents in complex multiagent domains that optimize locally with respect to their beliefs
    - Addresses the limitations of Nash equilibrium: the solution technique is complete, and unique (up to plans of equal expected utility) in contrast to Nash equilibrium
    - Generality: combines strategic and long-term planning in one framework; applicable to non-cooperative and cooperative settings
    - Better quality plans: interactive beliefs result in plans with larger values than approaches that use flat beliefs
10. Introduction
- Claims and Contributions (contd.)
  - Algorithms and Analysis
    - Approximation methods
      - Interactive particle filter: an online, (bounded) anytime approximation technique for addressing the curse of dimensionality
      - Look-ahead reachability tree sampling: a complementary method for mitigating the policy space complexity
    - Exact solutions: solutions for several non-cooperative and cooperative versions of the multiagent tiger problem
    - Approximate solutions: empirical validation of the approximation methods on the multiagent tiger and machine maintenance problems
    - Convergence to equilibria: theoretical convergence to subjective equilibrium under a truth compatibility condition; illustrated the computational obstacles to satisfying the condition
11. Introduction
- Claims and Contributions (contd.)
  - Application
    - Simulation of social behaviors: agent-based simulation of commonly observed, intuitive social behaviors
    - Significant applications in robotics, defense, healthcare, economics, and networking
12. Roadmap
- Interactive POMDPs
- Background POMDPs
- Generalization to I-POMDPs
- Formal Definition and Key Theorems
- Results and Limitations
- Approximating I-POMDPs
- Curses of Dimensionality and History
- Interactive Particle Filter
- Convergence and Error Bounds
- Results
- Sampling the Look Ahead Reachability Tree
- Subjective Equilibrium in I-POMDPs
- Conclusion
13. Background: POMDPs
- Planning in single-agent, complex domains: Partially Observable Markov Decision Processes
- Single Agent Tiger Problem
  - Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
  - The tiger emits a growl periodically (GL or GR)
  - The agent may listen or open doors (L, OL, or OR)
14Background POMDPs
Steps to compute a plan 1. Model of the decision
making situation
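A minimal sketch of such a model for the single-agent tiger problem, using the standard parameterization from the POMDP literature (the thesis' reward values, e.g. for the collected gold, may differ):

```python
# Minimal single-agent tiger model (standard literature parameters; illustrative only).
STATES = ["TL", "TR"]          # tiger behind left / right door
ACTIONS = ["L", "OL", "OR"]    # listen, open left, open right
OBS = ["GL", "GR"]             # growl heard from left / right

def transition(s, a):
    """P(s' | s, a): listening leaves the state unchanged; opening a door
    resets the problem, placing the tiger uniformly at random."""
    if a == "L":
        return {s: 1.0}
    return {"TL": 0.5, "TR": 0.5}

def observation(s_next, a):
    """P(o | s', a): listening localizes the growl correctly with prob. 0.85;
    observations after opening a door are uninformative."""
    if a == "L":
        correct = "GL" if s_next == "TL" else "GR"
        return {o: (0.85 if o == correct else 0.15) for o in OBS}
    return {"GL": 0.5, "GR": 0.5}

def reward(s, a):
    """Listening costs a little; opening the tiger's door is catastrophic."""
    if a == "L":
        return -1.0
    opened_tiger = (a == "OL" and s == "TL") or (a == "OR" and s == "TR")
    return -100.0 if opened_tiger else 10.0
```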
15. Background: POMDPs
- 3. Optimal plan computation (a dynamic-programming sketch follows)
  - Build the look-ahead reachability tree
  - Dynamic programming
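A minimal dynamic-programming sketch over the look-ahead tree, reusing the tiger model functions above (undiscounted, finite horizon; a sketch of the idea rather than the thesis' exact algorithm):

```python
def belief_update(b, a, o):
    """Bayes filter: predict with the transition model, correct with the observation."""
    new_b = {}
    for s_next in STATES:
        pred = sum(b[s] * transition(s, a).get(s_next, 0.0) for s in STATES)
        new_b[s_next] = observation(s_next, a)[o] * pred
    norm = sum(new_b.values())
    return {s: p / norm for s, p in new_b.items()} if norm > 0 else b

def lookahead_value(b, horizon):
    """Dynamic programming over the look-ahead reachability tree:
    returns (value, best_action) for belief b and the given horizon."""
    if horizon == 0:
        return 0.0, None
    best = (float("-inf"), None)
    for a in ACTIONS:
        val = sum(b[s] * reward(s, a) for s in STATES)
        for o in OBS:
            # probability of observing o after doing a from belief b
            p_o = sum(observation(s2, a)[o] *
                      sum(b[s] * transition(s, a).get(s2, 0.0) for s in STATES)
                      for s2 in STATES)
            if p_o > 0:
                val += p_o * lookahead_value(belief_update(b, a, o), horizon - 1)[0]
        best = max(best, (val, a))
    return best

# With the rewards above, listening is optimal at the uniform belief:
# lookahead_value({"TL": 0.5, "TR": 0.5}, horizon=2) -> (-2.0, "L")
```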
16. Interactive POMDPs
- Generalize POMDPs to multiagent settings
  - Modify the state space: include models of other agents (agent types) in the state space
  - Modify the belief update
- The space of agent types is uncountably infinite → hierarchical belief systems (Mertens and Zamir 85, Brandenburger and Dekel 93, Aumann and Heifetz 02)
- New belief update: predict → correct (a sketch of the update follows)
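As a sketch of this predict-correct update, in the notation of the accompanying JAIR 2005 paper (the exact statement, with its normalizer and model-consistency conditions, is in the paper), the level-l belief of agent i over an interactive state is^t = (s^t, θ_j^t) is updated roughly as

\[
b_{i,l}^{t}(is^{t}) \;\propto\; \sum_{is^{t-1}} b_{i,l}^{t-1}(is^{t-1}) \sum_{a_j^{t-1}} \Pr\!\big(a_j^{t-1} \mid \theta_j^{t-1}\big)\, T\big(s^{t-1}, a_i^{t-1}, a_j^{t-1}, s^{t}\big)\, O_i\big(s^{t}, a_i^{t-1}, a_j^{t-1}, o_i^{t}\big) \sum_{o_j^{t}} O_j\big(s^{t}, a_i^{t-1}, a_j^{t-1}, o_j^{t}\big)\, \tau\big(b_{j,l-1}^{t-1}, a_j^{t-1}, o_j^{t}, b_{j,l-1}^{t}\big)
\]

The prediction step uses j's model to anticipate its action and the joint dynamics; the correction step weights by i's own observation likelihood and by the belief updates that j itself could have performed.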
17. Interactive POMDPs
- Formal Definition and Key Properties
  - Proposition 1 (Sufficiency): In an I-POMDP, the belief over the interactive states is a sufficient statistic for the past history of agent i's observations
  - Proposition 2 (Belief Update): Under the BNM and BNO assumptions, the belief update function for I-POMDP_i, when the model m_j is intentional, is given by the nested predict-correct update
  - Theorem 1 (Convergence): For any finitely nested I-POMDP, value iteration starting from an arbitrary value function converges to a unique fixed point (the backup is sketched below)
  - Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex
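For reference, the backup that Theorems 1 and 2 concern has, in a hedged sketch of the JAIR notation, the familiar Bellman form, with the expectation over the other agent's actions taken with respect to its predicted model:

\[
U(\theta_i) \;=\; \max_{a_i} \Big\{ \sum_{is} b_i(is)\, \mathrm{ER}_i(is, a_i) \;+\; \gamma \sum_{o_i} \Pr\!\big(o_i \mid a_i, b_i\big)\, U\big(\langle SE_{\theta_i}(b_i, a_i, o_i), \hat{\theta}_i\rangle\big) \Big\},
\qquad
\mathrm{ER}_i(is, a_i) = \sum_{a_j} R_i(s, a_i, a_j)\, \Pr\!\big(a_j \mid \theta_j\big).
\]

Value iteration applies this backup repeatedly; Theorem 1 says the iterates converge to a unique fixed point from any starting value function, and Theorem 2 that each iterate is piecewise linear and convex in the belief.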
18. Interactive POMDPs
- Multiagent Tiger Problem
  - Task: maximize the collection of gold over a finite or infinite number of steps while avoiding the tiger
  - Each agent hears growls as well as creaks (S, CL, or CR)
  - Each agent may open doors or listen
  - Each agent is unable to perceive the other's observation
- Understanding the I-POMDP (level 1) belief update
19. Interactive POMDPs
- Q. Is the extra modeling effort justified?
- Q. What is the computational cost?
  - Number of POMDPs that need to be solved for level l and K other agents (an illustrative count follows)
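The slide's formula did not survive extraction. As a rough illustration only, under the assumption (made here, not stated in the slide) that the agent entertains M candidate models of each of the K other agents at every nesting level, the number of lower-level models that must be solved can be counted recursively:

```python
def num_models_to_solve(level, K, M):
    """Illustrative count of lower-level models solved for a level-`level` agent,
    assuming M candidate models per other agent per level (an assumption for
    illustration; the thesis' exact formula may differ).
    Level-0 agents are plain POMDPs, so nothing further needs solving."""
    if level == 0:
        return 0
    # K * M models at the next level down, each of which recursively
    # requires solving its own lower-level models.
    return K * M * (1 + num_models_to_solve(level - 1, K, M))

# e.g. a level-2 agent with 1 other agent and 2 models per level:
# 2 level-1 models, each needing 2 level-0 POMDPs -> 2 + 4 = 6
print(num_models_to_solve(2, K=1, M=2))   # -> 6
```

The count grows exponentially with the nesting level l, which is the point of the slide's question about computational cost.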
20. Interactive POMDPs
- Interesting plans in the multiagent tiger problem
- Rule of thumb: two consistent observations from the same side lead to opening of doors
[Policy diagrams (x-axis: p_j(TL)): listen (L) nodes with observation-labeled transitions (GL/GR combined with S/CL/CR) leading to further listening or to opening a door (OL/OR)]
21. Interactive POMDPs
- Application
  - Agent-based simulation of intuitive social behaviors
    - Follow the leader
      - Unconditional follow the leader
      - Conditional follow the leader
22. Interactive POMDPs
Approximation techniques that trade off solution quality for computation are critically required to apply I-POMDPs to realistic settings.
23. Roadmap
- Interactive POMDPs
- Approximating I-POMDPs
- Curses of Dimensionality and History
- Key Idea Sampling
- Interactive Particle Filter
- Convergence and Error Bounds
- Results
- Sampling the Look Ahead Reachability Tree
- Subjective Equilibrium in I-POMDPs
- Convergence of Bayesian Learning
- Subjective Equilibrium
- Computational Limitations
- Conclusion
24. Approximating I-POMDPs
- Two sources of complexity
  - Curse of dimensionality
    - The belief dimension grows with the number of interactive states
  - Curse of history
    - The cardinality of the policy space grows with the planning horizon
25Approximating I-POMDPs
Addressing the curse of dimensionality
Details of Particle Filtering
Single agent tiger problem
Projection
Overview of our method
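A basic single-agent particle filter step, for orientation before the interactive version; `transition` and `observation` are the model functions from the earlier tiger sketch:

```python
import random

def particle_filter_step(particles, a, o, transition, observation):
    """One projection/weighting/resampling step of a basic particle filter.
    `particles` is a list of sampled states approximating the current belief."""
    # Projection: push each particle through the transition model.
    propagated = []
    for s in particles:
        dist = transition(s, a)
        s_next = random.choices(list(dist.keys()), weights=dist.values())[0]
        propagated.append(s_next)
    # Weighting: score each propagated particle by the likelihood of the observation.
    weights = [observation(s_next, a)[o] for s_next in propagated]
    if sum(weights) == 0:
        return propagated  # degenerate case: keep the propagated set
    # Resampling: draw a new particle set in proportion to the weights.
    return random.choices(propagated, weights=weights, k=len(particles))
```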
26. Approximating I-POMDPs
- Interactive Particle Filtering (a sketch of the propagation step follows)
  - Propagation
    - Sample the other agent's action
    - Sample the next physical state
    - For each of the other agent's observations, update its belief
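A sketch of the propagation step just listed. Each particle pairs a physical state with the other agent's lower-level belief; `frames` bundles hypothetical helpers (policy_j, transition, observation_j, belief_update_j) that stand in for solving and simulating the other agent's model, so this illustrates the control flow rather than the thesis' exact algorithm:

```python
import random

def ipf_propagate(particles, a_i, frames):
    """Propagation step of an interactive particle filter (illustrative sketch).
    Each particle is a pair (s, b_j): a physical state and the other agent's
    lower-level belief."""
    propagated = []
    for s, b_j in particles:
        # 1. Sample the other agent's action from its (solved) model.
        a_j = frames.policy_j(b_j)
        # 2. Sample the next physical state given the joint action.
        dist = frames.transition(s, a_i, a_j)
        s_next = random.choices(list(dist.keys()), weights=dist.values())[0]
        # 3. For each observation the other agent could receive,
        #    update its belief and weight the branch by its likelihood.
        for o_j, p_oj in frames.observation_j(s_next, a_i, a_j).items():
            if p_oj > 0:
                b_j_next = frames.belief_update_j(b_j, a_j, o_j)
                propagated.append(((s_next, b_j_next), p_oj))
    # Weighted interactive-state particles, before correction with agent i's own observation.
    return propagated
```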
27. Approximating I-POMDPs
- Convergence and Error Bounds
  - The approximation does not necessarily converge
  - Theorem: For a singly-nested t-horizon I-POMDP with discount factor γ, the error introduced by our approximation technique is upper bounded
  - The bound follows from the Chernoff-Hoeffding bounds (the key inequality is recalled below)
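The bound's statement did not survive the slide export; the Chernoff-Hoeffding ingredient it rests on is the standard concentration inequality for a particle estimate. If \(\hat{V}_N\) is the average of N independent samples of a quantity normalized to [0, 1] with mean V, then

\[
\Pr\big(|\hat{V}_N - V| \ge \varepsilon\big) \;\le\; 2\exp\!\big(-2N\varepsilon^2\big),
\]

and, roughly speaking, this per-step sampling error is then accumulated over the t-horizon, γ-discounted backups to give the overall bound.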
28. Approximating I-POMDPs
- Empirical Results
  - Q. How good is the approximation?
- [Figures: performance profiles for level 1 and level 2 beliefs on the multiagent tiger problem]
29Approximating I-POMDPs
Performance Profiles (Contd.)
Level 1 belief
Level 2 belief
Multiagent Machine Maintenance Problem
30. Approximating I-POMDPs
- Q. Does it save on computational costs?
  - Reduction in the number of POMDPs that need to be solved
- [Table: runtimes on a Pentium IV 2.0 GHz, 2 GB RAM, Linux; some entries ran out of memory]
31. Approximating I-POMDPs
- Reducing the impact of the curse of history (a sketch follows)
  - Sample observations while building the look-ahead reachability tree
  - Consider only the likely future beliefs
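A sketch of the idea, with `sample_obs` and `belief_update` as assumed helpers (the thesis' algorithm and its sample-allocation details may differ):

```python
import random

def sampled_reachability_tree(b, horizon, actions, sample_obs, belief_update, n_obs_samples):
    """Look-ahead tree construction with observation sampling: instead of
    branching on every observation, expand only `n_obs_samples` observations
    drawn from Pr(o | b, a), so only the likely future beliefs are kept."""
    if horizon == 0:
        return {"belief": b, "children": []}
    children = []
    for a in actions:
        for _ in range(n_obs_samples):
            o = sample_obs(b, a)                 # draw a likely observation
            b_next = belief_update(b, a, o)
            children.append((a, o, sampled_reachability_tree(
                b_next, horizon - 1, actions, sample_obs, belief_update, n_obs_samples)))
    return {"belief": b, "children": children}
```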
32. Approximating I-POMDPs
- [Figures: performance profiles for horizons 3 and 4 on the multiagent tiger problem]
- Computational Savings
- [Table: runtimes on a Pentium IV 2.0 GHz, 2 GB RAM, Linux; some entries ran out of memory]
33. Roadmap
- Interactive POMDPs
- Approximating I-POMDPs
- Subjective Equilibrium in I-POMDPs
- Convergence of Bayesian Learning
- Subjective Equilibrium
- Computational Limitations
- Conclusion
- Summary
- Future Work
34. Subjective Equilibrium in I-POMDPs
- Theoretical Analysis
  - Joint observation histories in the multiagent tiger problem
  - Absolute Continuity Condition (ACC)
    - The agent's initial belief, induced over the future observation paths, should not rule out paths considered possible by the true distribution
  - Cautious beliefs → grain of truth assumption
35. Subjective Equilibrium in I-POMDPs
- Theorem 1: Under the ACC, an agent's belief over the other's models, updated using the I-POMDP belief update, converges with probability 1
  - Proof sketch: show that Bayesian learning is a martingale (see the identity below), then apply the Martingale Convergence Theorem (Doob 53)
- Subjective ε-Equilibrium (Kalai and Lehrer 93): a profile of strategies, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation histories
- Subjective equilibrium is stable under learning and optimization
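The martingale identity behind the proof sketch is the standard property of Bayesian posteriors: conditioned on the observation history h^t, the expected next posterior assigned to any model m_j of the other agent equals the current posterior,

\[
\mathbb{E}\big[\, b_i^{t+1}(m_j) \,\big|\, h^{t} \,\big] \;=\; b_i^{t}(m_j),
\]

so the bounded sequence of posteriors converges almost surely by the Martingale Convergence Theorem.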
36. Subjective Equilibrium in I-POMDPs
- Corollary: If the agents' beliefs within the I-POMDP framework satisfy the ACC, then after a finite time T their strategies are in subjective ε-equilibrium, where ε is a function of T
  - When ε → 0, subjective equilibrium obtains
  - The proof follows from the convergence of the I-POMDP belief update
- The ACC is a sufficient condition, but not a necessary one
37. Subjective Equilibrium in I-POMDPs
- Computational Limitations
  - There exist computable strategies that admit no computable exact best responses (Nachbar and Zame 96)
    - If j's possible strategies are assumed computable, then i's best response may not be computable; therefore, j's cautious beliefs cannot contain a grain of truth
  - Subtle tension between prediction and optimization
  - Strictness of the ACC
    - Theorem 2: Within the finitely nested I-POMDP framework, the agents' beliefs will never all simultaneously satisfy the grain of truth assumption
38. Roadmap
- Interactive POMDPs
- Approximating I-POMDPs
- Subjective Equilibrium in I-POMDPs
- Conclusion
- Summary
- Future Work
39. Summary
- I-POMDP: a novel framework for planning in complex multiagent settings
  - Combines concepts from decision theory and game theory
  - Allows strategic as well as long-term planning
  - Applicable to cooperative and non-cooperative settings
  - Solutions are complete and unique (up to plans of equal expected utility)
- Online anytime approximation techniques
  - Interactive particle filter: addresses the curse of dimensionality
  - Reachability tree sampling: reduces the effect of the curse of history
- Equilibria in I-POMDPs
  - Theoretical convergence to subjective equilibrium given the ACC
  - Computational obstacles to satisfying the ACC
- Applications
  - Agent-based simulation of social behaviors
  - Robotics, defense, healthcare, economics, and networking
40. Future Work
- Other approximation methods
- Tighter error bounds
- Multiagent planning with bounded rational agents
  - Models for describing bounded rational agents
- Communication between agents
- Cost-optimality profiles for plans as a function of the level of nesting
- Other applications
41. Thank You
Questions?
42. Selected Publications (full publication list at http://dali.ai.uic.edu/pdoshi)
- Selected Journals
  - Piotr Gmytrasiewicz, Prashant Doshi, "A Framework for Sequential Planning in Multiagent Settings," Journal of AI Research (JAIR), Vol. 23, 2005
  - Prashant Doshi, Richard Goodwin, Rama Akkiraju, Kunal Verma, "Dynamic Workflow Composition using Markov Decision Processes," Journal of Web Services Research (JWSR), 2(1):1-17, 2005
- Selected Conferences
  - Prashant Doshi, Piotr Gmytrasiewicz, "A Particle Filtering Based Approach to Approximating Interactive POMDPs," National Conference on AI (AAAI), pp. 969-974, July 2005
  - Prashant Doshi, Piotr Gmytrasiewicz, "Approximating State Estimation in Multiagent Settings using Particle Filters," Autonomous Agents and Multiagent Systems Conference (AAMAS), July 2005
  - Piotr Gmytrasiewicz, Prashant Doshi, "Interactive POMDPs: Properties and Preliminary Results," Autonomous Agents and Multiagent Systems Conference (AAMAS), pp. 1374-1375, July 2004
  - Prashant Doshi, Richard Goodwin, Rama Akkiraju, Kunal Verma, "Dynamic Workflow Composition using Markov Decision Processes," International Conference on Web Services (ICWS), pp. 576-582, July 2004
  - Piotr Gmytrasiewicz, Prashant Doshi, "A Framework for Sequential Planning in Multiagent Settings," International Symposium on AI and Math (AMAI), January 2004
43. Interactive POMDPs
- Finitely nested I-POMDP: I-POMDP_{i,l}
  - Computable approximations of I-POMDPs
  - Solved bottom up: a 0th-level type is a POMDP (a sketch of the recursion follows)
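A sketch of the bottom-up recursion; `solve_pomdp`, `best_response`, and `models_of_others` are hypothetical placeholders standing in for the thesis' actual solvers:

```python
def solve_ipomdp(agent, level):
    """Bottom-up solution of a finitely nested I-POMDP (illustrative sketch)."""
    if level == 0:
        # A 0th-level type folds the other agents' effects into noise,
        # so it is solved as an ordinary POMDP.
        return solve_pomdp(agent)
    # Solve every candidate model of the other agents at the level below ...
    lower_policies = [solve_ipomdp(m, level - 1)
                      for m in agent.models_of_others(level - 1)]
    # ... then compute the agent's own best response to its belief over those models.
    return best_response(agent, lower_policies)
```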
44. Interactive POMDPs
- Solutions to the enemy version of the multiagent tiger problem
- Agent i believes that j is likely to be uninformed
45. Interactive POMDPs
- Agent i believes that j is likely to be almost informed
46. Interactive POMDPs
- The value of an interaction for an agent is greater when its enemy is uninformed than when it is informed
47. Background: POMDPs
- Policy Computation
- [Figure: trace of policy computation]
48. Background: POMDPs
- Policy
- Value Function
- Properties of the value function
  - The value function is piecewise linear and convex (see the expression below)
  - The value function converges asymptotically
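Piecewise linearity and convexity means the value function can be written as a maximum over a finite set Γ of α-vectors, one per conditional plan:

\[
V(b) \;=\; \max_{\alpha \in \Gamma} \; \sum_{s \in S} \alpha(s)\, b(s),
\]

a maximum of linear functions of the belief, hence convex; value iteration updates Γ and converges asymptotically.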
49. Approximating I-POMDPs
- [Figure: performance profiles on the multiagent tiger problem]