Optimal Sequential Planning in Partially Observable Multiagent Settings

1
Optimal Sequential Planning in Partially
Observable Multiagent Settings
  • Prashant Doshi
  • Department of Computer Science
  • University of Illinois at Chicago

Thesis Committee: Piotr Gmytrasiewicz, Bing Liu, Peter Nelson, Gyorgy Turan, Avi Pfeffer
2
Introduction
  • Background
  • Well-known framework for planning in single agent partially observable settings: POMDP
  • Traditional analysis of multiagent interactions: Game Theory
  • Problem
  • "... there is currently no good way to combine game theoretic and POMDP control strategies."
  • - Russell and Norvig
  • AI: A Modern Approach, 2nd Ed.

3
Introduction
General Problem Setting

[Diagram: agents acting in a shared environment with a partially observable state]
Optimize an agent's preferences given its beliefs
4
Introduction
  • Significance: Real-world applications
  • Robotics
  • Planetary exploration
  • Surface mapping by rovers
  • Coordinate to explore a pre-defined region optimally
  • Uncertainty due to sensors
  • Robot soccer
  • Coordinate with teammates and deceive opponents
  • Anticipate and track others' actions

[Images: the Spirit and Opportunity rovers; the RoboCup competition]
5
Introduction
  • Defense
  • Coordinate troop movements in battlefields
  • Exact ground situation unknown
  • Coordinate anti-air defense units (Noh and Gmytrasiewicz 04)
  • Distributed Systems
  • Networked Systems
  • Packet routing
  • Sensor networks

6
Introduction
  • Related Work
  • Game Theory
  • Learning in repeated games: Convergence to Nash equilibrium
  • Fictitious play (Fudenberg and Levine 97)
  • Rational (Bayesian) learning (Kalai and Lehrer 93, Nyarko 97)
  • Shortcomings: Framework of repeated games not realistic
  • Decision Theory
  • Multiagent Reinforcement Learning (Littman 94, Hu and Wellman 98, Bowling and Veloso 00)
  • Shortcomings: Assumes the state is completely observable; slow in generating an optimal plan
  • Multi-body Planning: Nash equilibrium
  • DEC-POMDP (Bernstein et al. 02, Nair et al. 03)
  • Shortcomings: Restricted to teams, assumes centralized planning

7
Introduction
  • Limitations of Nash Equilibrium
  • Not suitable for general control
  • Incomplete: Does not say what to do off-equilibrium
  • Non-unique: Multiple solutions, no way to choose
  • "... game theory has been used primarily to analyze environments that are at equilibrium, rather than to control agents within an environment."
  • - Russell and Norvig
  • AI: A Modern Approach, 2nd Ed.

8
Introduction
  • Our approach: Key ideas
  • Integrate game theoretic concepts into a decision theoretic framework
  • Include possible models of other agents in your decision making → intentional models (types) and subintentional models
  • Address uncertainty by maintaining beliefs over the state and models of other agents → Bayesian learning
  • Beliefs over models give rise to interactive belief systems → interactive epistemology, recursive modeling
  • Computable approximation of the interactive belief system → finitely nested belief system
  • Compute best responses to your beliefs → subjective rationality

9
Introduction
  • Claims and Contributions
  • Framework
  • Novel framework: Applicable to agents in complex multiagent domains that optimize locally with respect to their beliefs
  • Addresses limitations of Nash eq.: Solution technique is complete and unique (up to plans of equal expected utility), in contrast to Nash equilibrium
  • Generality: Combines strategic and long-term planning in one framework. Applicable to non-cooperative and cooperative settings
  • Better quality plans: Interactive beliefs result in plans that have larger values than approaches that use flat beliefs

10
Introduction
  • Claims and Contributions (Contd.)
  • Algorithms and Analysis
  • Approximation methods
  • Interactive particle filter: Online (bounded) anytime approximation technique for addressing the curse of dimensionality
  • Look ahead reachability tree sampling: Complementary method for mitigating the policy space complexity
  • Exact solutions: Solutions for several non-cooperative and cooperative versions of the multiagent tiger problem
  • Approximate solutions: Empirical validation of the approximate method using the multiagent tiger and machine maintenance problems
  • Convergence to equilibria: Theoretical convergence to subjective equilibrium under a truth compatibility condition. Illustrated the computational obstacles in satisfying the condition

11
Introduction
  • Claims and Contributions (Contd.)
  • Application
  • Simulation of social behaviors: Agent-based simulation of commonly observed intuitive social behaviors
  • Significant applications in robotics, defense, healthcare, economics, and networking

12
Roadmap
  • Interactive POMDPs
  • Background: POMDPs
  • Generalization to I-POMDPs
  • Formal Definition and Key Theorems
  • Results and Limitations
  • Approximating I-POMDPs
  • Curses of Dimensionality and History
  • Interactive Particle Filter
  • Convergence and Error Bounds
  • Results
  • Sampling the Look Ahead Reachability Tree
  • Subjective Equilibrium in I-POMDPs
  • Conclusion

13
Background: POMDPs
  • Planning in single agent complex domains: Partially Observable Markov Decision Processes

Single Agent Tiger Problem
Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger. The tiger emits a growl periodically (GL or GR); the agent may listen or open a door (L, OL, or OR).
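To make the model concrete, here is a minimal sketch of the tiger POMDP in Python. The parameter values (85% accurate listening, -1 to listen, +10 for the correct door, -100 for the tiger) are the standard ones from the POMDP literature; the talk's exact numbers may differ.

    # Classic single-agent tiger POMDP; parameter values are the standard
    # ones from the literature, not necessarily the talk's exact numbers.
    TIGER = {
        "states": ["TL", "TR"],          # tiger behind the left/right door
        "actions": ["L", "OL", "OR"],    # listen, open left, open right
        "observations": ["GL", "GR"],    # growl heard on the left/right
        # P(o | s', a=L): listening is informative but noisy
        "O": {("TL", "L"): {"GL": 0.85, "GR": 0.15},
              ("TR", "L"): {"GL": 0.15, "GR": 0.85}},
        # R(s, a): opening a door yields gold or the tiger, then resets
        "R": {("TL", "L"): -1,    ("TR", "L"): -1,
              ("TL", "OL"): -100, ("TL", "OR"): 10,
              ("TR", "OL"): 10,   ("TR", "OR"): -100},
    }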
14
Background: POMDPs
Steps to compute a plan:
  • 1. Model the decision-making situation
  • 2. Update beliefs (a code sketch follows)
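A minimal sketch of step 2, the standard Bayes-filter belief update, assuming generic callables T(s'|s,a) and O(o|s',a); this is textbook POMDP machinery, not the thesis's own code:

    def update_belief(b, a, o, states, T, O):
        """Bayes filter: b'(s') ∝ O(o|s',a) * Σ_s T(s'|s,a) * b(s)."""
        new_b = {}
        for s2 in states:
            predicted = sum(T(s2, s, a) * b[s] for s in states)  # predict
            new_b[s2] = O(o, s2, a) * predicted                  # correct
        total = sum(new_b.values())
        return {s: p / total for s, p in new_b.items()}

In the tiger problem, listening leaves the state unchanged (T is the identity), so two consistent growls concentrate the belief on one door.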

15
Background: POMDPs
  • 3. Optimal plan computation
  • Build the look ahead reachability tree
  • Dynamic programming (a recursive sketch follows)
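A recursive sketch of the dynamic program over the look ahead reachability tree, reusing update_belief from above; prob_o is an assumed helper computing P(o | b, a):

    def lookahead_value(b, t, actions, observations, R, prob_o, update):
        """V(b,t) = max_a [ R(b,a) + Σ_o P(o|b,a) · V(τ(b,a,o), t-1) ]."""
        if t == 0:
            return 0.0
        best = float("-inf")
        for a in actions:
            q = sum(p * R[(s, a)] for s, p in b.items())  # immediate reward
            for o in observations:
                p_o = prob_o(b, a, o)
                if p_o > 0.0:                 # expand only reachable beliefs
                    q += p_o * lookahead_value(update(b, a, o), t - 1, actions,
                                               observations, R, prob_o, update)
            best = max(best, q)
        return best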

16
Interactive POMDPs
  • Generalize POMDPs to multiagent settings
  • Modify the state space
  • Include models of other agents in the state space
  • Modify the belief update

[Diagram: interactive states pair the physical state with the other agent's type; the space of agent types is uncountably infinite]
Hierarchical belief systems (Mertens and Zamir 85, Brandenberger and Dekel 93, Aumann and Heifetz 02)
New belief update: Predict → Correct (sketched below)
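An abridged form of this predict-correct update over interactive states is^t = (s^t, m_j^t), following the structure of the published I-POMDP belief update (notation compressed here; treat it as a sketch, not the slide's exact equation):

    % Sketch of the I-POMDP predict-correct belief update (abridged):
    b_i^t(is^t) \propto
      \sum_{is^{t-1}} b_i^{t-1}(is^{t-1})
      \sum_{a_j} \Pr(a_j \mid \theta_j^{t-1})\,
      T(s^t \mid s^{t-1}, a_i, a_j)\;                      % predict
      O_i(o_i^t \mid s^t, a_i, a_j)
      \sum_{o_j} O_j(o_j \mid s^t, a_i, a_j)\,
      \mathbb{1}\big[ b_j^t = \tau(b_j^{t-1}, a_j, o_j) \big]   % correct

The prediction step marginalizes over j's possible actions and the state transition; the correction step weighs by i's observation likelihood and updates the nested model of j.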
17
Interactive POMDPs
  • Formal Definition and Key Properties

Proposition 1 (Sufficiency): In an I-POMDP, belief over the interactive states IS is a sufficient statistic for the past history of i's observations.
Proposition 2 (Belief Update): Under the BNM and BNO assumptions, the belief update function for I-POMDPi, when the model mj is intentional, is the predict-correct update above.
Theorem 1 (Convergence): For any finitely nested I-POMDP, the value iteration algorithm starting from an arbitrary value function converges to a unique fixed point.
Theorem 2 (PWLC): For any finitely nested I-POMDP, the value function is piecewise linear and convex.
18
Interactive POMDPs
  • Results

Multiagent Tiger Problem
Task: Maximize collection of gold over a finite or infinite number of steps while avoiding the tiger. Each agent hears growls as well as creaks (S, CL, or CR). Each agent may open doors or listen. Neither agent can perceive the other's observation.
Understanding the I-POMDP (level 1) belief update
19
Interactive POMDPs
  • Q. Is the extra modeling effort justified?
  • Q. What is the computational cost?

# of POMDPs that need to be solved for level l and K other agents (an illustrative count follows below)
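The slide's actual expression did not survive the transcript. Purely as an illustrative count (an assumption, not the talk's formula): if each agent ascribes M candidate models to every other agent at each nesting level, the models that must be solved at level 0 number

    % Illustrative count only, assuming M candidate models per agent per level:
    \#\,\text{level-0 POMDPs} \;=\; (K M)^{l}

so the effort grows exponentially with the nesting level l.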
20
Interactive POMDPs
  • Interesting plans in the multiagent tiger problem

Rule of thumb: Two consistent observations from the same side lead to opening of doors

[Policy graph over pj(TL): nodes are actions (L, OL, OR); edges carry observation pairs such as GL,S / GR,S / GL,CL / GR,CL / GL,CR / GR,CR]
21
Interactive POMDPs
  • Application
  • Agent-based simulation of intuitive social behaviors

[Figures: "follow the leader" simulations, unconditional and conditional variants]
22
Interactive POMDPs
  • Limitations

Approximation techniques that trade off solution quality against computation are critically required to apply I-POMDPs to realistic settings
23
Roadmap
  • Interactive POMDPs
  • Approximating I-POMDPs
  • Curses of Dimensionality and History
  • Key Idea: Sampling
  • Interactive Particle Filter
  • Convergence and Error Bounds
  • Results
  • Sampling the Look Ahead Reachability Tree
  • Subjective Equilibrium in I-POMDPs
  • Convergence of Bayesian Learning
  • Subjective Equilibrium
  • Computational Limitations
  • Conclusion

24
Approximating I-POMDPs
  • Two sources of complexity
  • Curse of dimensionality
  • Belief dimension grows with the # of interactive states
  • Curse of history
  • Cardinality of the policy space grows with the horizon (standard counts below)
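For context, the standard counts behind the two curses (background facts, not the slide's own expressions): a belief over the interactive states IS has dimension |IS| - 1, and the number of distinct H-horizon policy trees with actions A and observations Ω is

    % Background counts for the two curses:
    \dim(b) = |IS| - 1, \qquad
    |\Pi_H| \;=\; |A|^{\frac{|\Omega|^{H} - 1}{|\Omega| - 1}}

which is doubly exponential in the horizon H.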

25
Approximating I-POMDPs
Addressing the curse of dimensionality

[Figures: details of particle filtering (projection step) on the single agent tiger problem; overview of our method]
26
Approximating I-POMDPs
Interactive Particle Filtering
  • Propagation
  • Sample the other agent's action
  • Sample the next physical state
  • For each of the other agent's observations, update its belief (see the sketch below)
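A minimal sketch of this propagation step in Python; the helper names (solve_model, sample_T, O_j_dist, update_belief_j) are hypothetical stand-ins, not the thesis code. Each particle is an interactive state (s, model_j), so the other agent's belief is filtered inside the particle:

    import random

    def propagate(particles, a_i, solve_model, sample_T, O_j_dist, update_belief_j):
        """Propagation step of an interactive particle filter (sketch).
        Returns weighted particles; a full filter would resample next."""
        propagated = []
        for s, model_j in particles:
            # 1. Sample the other agent's action from its solved model
            a_j = random.choice(solve_model(model_j))
            # 2. Sample the next physical state from the transition model
            s_next = sample_T(s, a_i, a_j)
            # 3. For each of j's possible observations, update j's nested
            #    belief and weight the child by the observation likelihood
            for o_j, w in O_j_dist(s_next, a_i, a_j).items():
                model_j_next = update_belief_j(model_j, a_j, o_j)
                propagated.append(((s_next, model_j_next), w))
        return propagated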

27
Approximating I-POMDPs
  • Convergence and Error Bounds
  • Does not necessarily converge
  • Theorem: For a singly-nested t-horizon I-POMDP with discount factor γ, the error introduced by our approximation technique is upper bounded

Chernoff-Hoeffding Bounds (a standard form follows)
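The bound itself is not legible in the transcript. For context only, the standard Chernoff-Hoeffding form it builds on: if an estimate V̂ averages N i.i.d. samples, each lying in an interval of length R (for discounted values, R ≤ 2·Rmax/(1-γ)), then

    % Standard Chernoff-Hoeffding bound (context, not the thesis's exact bound):
    \Pr\big( |\hat{V} - V| \ge \epsilon \big)
      \;\le\; 2 \exp\!\big( -2 N \epsilon^{2} / R^{2} \big)

so the approximation error shrinks as the number of particles N grows.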
28
Approximating I-POMDPs
  • Empirical Results
  • Q. How good is the approximation?

Performance Profiles
[Plots: level 1 and level 2 beliefs, multiagent tiger problem]
29
Approximating I-POMDPs
Performance Profiles (Contd.)
[Plots: level 1 and level 2 beliefs, multiagent machine maintenance problem]
30
Approximating I-POMDPs
  • Q. Does it save on computational costs?

Reduction in the # of POMDPs that need to be solved

[Table: runtimes on a Pentium IV 2.0GHz, 2GB RAM, Linux. * = out of memory]
31
Approximating I-POMDPs
  • Reducing the impact of the curse of history
  • Sample observations while building the look ahead reachability tree (see the sketch below)
  • Consider only the likely future beliefs
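A minimal sketch of the sampled tree expansion, mirroring the exact lookahead recursion from the POMDP background slides; sample_obs is an assumed helper drawing o ~ P(o | b, a):

    def sampled_lookahead(b, t, n, actions, expected_R, sample_obs, update):
        """Lookahead DP that samples n observation branches per action
        instead of enumerating all of them, so only the likely future
        beliefs are expanded (sketch; helper names are hypothetical)."""
        if t == 0:
            return 0.0
        best = float("-inf")
        for a in actions:
            q = expected_R(b, a)            # expected immediate reward
            for _ in range(n):              # o ~ P(o | b, a)
                o = sample_obs(b, a)
                q += sampled_lookahead(update(b, a, o), t - 1, n, actions,
                                       expected_R, sample_obs, update) / n
            best = max(best, q)
        return best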

32
Approximating I-POMDPs
  • Empirical Results

Performance Profiles
[Plots: horizon 3 and horizon 4, multiagent tiger problem]
Computational Savings
[Table: runtimes on a Pentium IV 2.0GHz, 2GB RAM, Linux. * = out of memory]
33
Roadmap
  • Interactive POMDPs
  • Approximating I-POMDPs
  • Subjective Equilibrium in I-POMDPs
  • Convergence of Bayesian Learning
  • Subjective Equilibrium
  • Computational Limitations
  • Conclusion
  • Summary
  • Future Work

34
Subjective Equilibrium in I-POMDPs
  • Theoretical Analysis
  • Joint observation histories in the multiagent tiger problem
  • Absolute Continuity Condition (ACC)
  • An agent's initial belief, induced over the future observation paths, should not rule out the ones considered possible by the true distribution
  • Cautious beliefs → Grain of truth assumption
35
Subjective Equilibrium in I-POMDPs
  • Theorem 1: Under ACC, an agent's belief over others' models, updated using the I-POMDP belief update, converges with probability 1
  • Proof sketch: Show that Bayesian learning is a martingale (see the identity below)
  • Apply the Martingale Convergence Theorem (Doob 53)
  • Subjective ε-Equilibrium (Kalai and Lehrer 93): A profile of strategies of agents, each of which is an exact best response to a belief that is ε-close to the true distribution over the observation history
  • Subjective equilibrium is stable under learning and optimization
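Background for the proof sketch (a standard identity, not the slide's own derivation): the sequence of Bayesian posteriors over a model m is a martingale with respect to the observation history h^t, which is exactly what the Martingale Convergence Theorem requires:

    % Bayesian posteriors form a martingale: averaging the next posterior
    % over the next observation returns the current posterior.
    \mathbb{E}\big[ b^{t+1}(m) \mid h^t \big]
      = \sum_{o} \Pr(o \mid h^t)\,
        \frac{\Pr(o \mid m, h^t)\, b^t(m)}{\Pr(o \mid h^t)}
      = b^t(m) \sum_{o} \Pr(o \mid m, h^t)
      = b^t(m)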

36
Subjective Equilibrium in I-POMDPs
  • Corollary: If agents' beliefs within the I-POMDP framework satisfy the ACC, then after finite time T, their strategies are in subjective ε-equilibrium, where ε is a function of T
  • When ε = 0, subjective equilibrium obtains
  • Proof follows from the convergence of the I-POMDP belief update
  • ACC is a sufficient condition, but not a necessary one

37
Subjective Equilibrium in I-POMDPs
  • Computational Limitations
  • There exist computable strategies that admit no computable exact best responses (Nachbar and Zame 96)
  • If possible strategies are assumed computable, then i's best response may not be computable. Therefore, j's cautious beliefs admit no grain of truth
  • Subtle tension between prediction and optimization
  • Strictness of ACC
  • Theorem 2: Within the finitely nested I-POMDP framework, all the agents' beliefs will never simultaneously satisfy the grain of truth assumption

38
Roadmap
  • Interactive POMDPs
  • Approximating I-POMDPs
  • Subjective Equilibrium in I-POMDPs
  • Conclusion
  • Summary
  • Future Work

39
Summary
  • I-POMDP A novel framework for planning in
    complex multiagent settings
  • Combines concepts from decision theory and game
    theory
  • Allows strategic as well as long-term planning
  • Applicable to cooperative and non-cooperative
    settings
  • Solution is complete and unique (up to plans of equal expected utility)
  • Online anytime approximation technique
  • Interactive Particle Filter Addresses the curse
    of dimensionality
  • Reachability Tree Sampling Reduces the effect of
    the curse of history
  • Equilibria in I-POMDPs
  • Theoretical convergence to subjective equilibrium
    given ACC
  • Computational obstacles to satisfying ACC
  • Applications
  • Agent-based simulation of social behaviors
  • Robotics, defense, healthcare, economics, and
    networking

40
Future Work
  • Other approximation methods
  • Tighter error bounds
  • Multiagent planning with boundedly rational agents
  • Models for describing boundedly rational agents
  • Communication between agents
  • Cost optimality profile for plans as a function
    of levels of nesting
  • Other applications

41
Thank You
Questions
42
Selected Publications (full publication list at http://dali.ai.uic.edu/pdoshi)
  • Selected Journals
  • Piotr Gmytrasiewicz, Prashant Doshi, A Framework for Sequential Planning in Multiagent Settings, Journal of AI Research (JAIR), Vol. 23, 2005
  • Prashant Doshi, Richard Goodwin, Rama Akkiraju, Kunal Verma, Dynamic Workflow Composition using Markov Decision Processes, Journal of Web Services Research (JWSR), 2(1):1-17, 2005
  • Selected Conferences
  • Prashant Doshi, Piotr Gmytrasiewicz, A Particle Filtering Based Approach to Approximating Interactive POMDPs, National Conference on AI (AAAI), pp. 969-974, July 2005
  • Prashant Doshi, Piotr Gmytrasiewicz, Approximating State Estimation in Multiagent Settings using Particle Filters, Autonomous Agents and Multiagent Systems Conference (AAMAS), July 2005
  • Piotr Gmytrasiewicz, Prashant Doshi, Interactive POMDPs: Properties and Preliminary Results, Autonomous Agents and Multiagent Systems Conference (AAMAS), pp. 1374-1375, July 2004
  • Prashant Doshi, Richard Goodwin, Rama Akkiraju, Kunal Verma, Dynamic Workflow Composition using Markov Decision Processes, International Conference on Web Services (ICWS), pp. 576-582, July 2004
  • Piotr Gmytrasiewicz, Prashant Doshi, A Framework for Sequential Planning in Multiagent Settings, International Symposium on AI and Math (AMAI), Jan 2004

43
Interactive POMDPs
  • Finitely nested I-POMDP: I-POMDPi,l
  • Computable approximations of I-POMDPs
  • Solved bottom up: the 0th level type is a POMDP (a recursive sketch follows)
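A recursive sketch of the bottom-up order in Python (helper names solve_pomdp and best_response are hypothetical): a 0th level type is solved as a flat POMDP, and each higher level best-responds to the solved level below:

    def solve_nested(model, level, solve_pomdp, best_response):
        """Bottom-up solution of a finitely nested I-POMDP (sketch)."""
        if level == 0:
            return solve_pomdp(model)   # base case: 0th level type is a POMDP
        # Solve every candidate model of the other agent one level below,
        # then best-respond to the predicted behaviors.
        solved = [solve_nested(m, level - 1, solve_pomdp, best_response)
                  for m in model.other_agent_models]
        return best_response(model, solved)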

44
Interactive POMDPs
  • Solutions to the enemy version of the multiagent
    tiger problem

Agent i believes that j is likely to be uninformed
45
Interactive POMDPs
Agent i believes that j is likely to be almost
informed
46
Interactive POMDPs
The value of an interaction for an agent is greater when its enemy is uninformed than when it is informed
47
Background: POMDPs
Policy Computation
[Figure: trace of the policy computation]
48
Background: POMDPs
  • Value of all beliefs

[Plot: value function over the belief space, with the induced policy]
  • Properties of the value function
  • Value function is piecewise linear and convex (see below)
  • Value function converges asymptotically
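As background for the PWLC property (a standard POMDP fact, not the slide's own equation): the value function is the upper surface of a finite set Γ of linear "alpha vectors", one per conditional plan:

    % Standard PWLC representation of the POMDP value function:
    V(b) \;=\; \max_{\alpha \in \Gamma} \sum_{s \in S} b(s)\, \alpha(s)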

49
Approximating I-POMDPs
Performance Profiles
[Plots: multiagent tiger problem]