1
Markov Games as a Framework for Multi-agent Reinforcement Learning
Mike L. Littman
  • Presented by Jinzhong Niu
  • March 30, 2004

2
Overview
  • An MDP can describe only single-agent environments.
  • A new mathematical framework is needed to support multi-agent reinforcement learning: Markov games.
  • A single step in this direction is explored here: two-player zero-sum Markov games.

3
Definitions
  • Markov Decision Process (MDP)
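In standard notation, matching the paper, an MDP is a tuple

    \langle S, A, T, R \rangle

with a finite set of states S, a set of actions A, a transition function T : S \times A \to \Pi(S) (where \Pi(S) is the set of probability distributions over S), and a reward function R : S \times A \to \mathbb{R}. The agent seeks to maximize the expected discounted return E[\sum_{t} \gamma^{t} r_{t}] for a discount factor 0 \le \gamma < 1.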

4
Definitions (cont.)
  • Markov Game (MG)
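A Markov game extends the MDP tuple with one action set and one reward function per agent:

    \langle S, A_1, \ldots, A_k, T, R_1, \ldots, R_k \rangle,
    T : S \times A_1 \times \cdots \times A_k \to \Pi(S),
    R_i : S \times A_1 \times \cdots \times A_k \to \mathbb{R},

and each agent i maximizes its own expected discounted return.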

5
Definitions (cont.)
  • Two-player zero-sum Markov Game (2P-MG)
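With two players whose action sets are A (the agent) and O (the opponent), zero-sum means the rewards cancel,

    R_1(s, a, o) = -R_2(s, a, o),

so a single function R : S \times A \times O \to \mathbb{R} suffices: the agent maximizes it while the opponent minimizes it.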

6
Is the 2P-MG Framework Expressive Enough?
Yes
  • The zero-sum restriction precludes cooperation!
  • It generalizes:
  • MDPs (when |O| = 1): the opponent has constant behavior, which may be viewed as part of the environment.
  • Matrix games (when |S| = 1): the environment holds no information, and rewards are decided entirely by the actions.

7
Matrix Games
  • Example: rock, paper, scissors
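The row player's payoffs (win = 1, loss = -1, tie = 0), with rows the agent's action and columns the opponent's:

                 rock   paper   scissors
    rock           0     -1        1
    paper          1      0       -1
    scissors      -1      1        0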

8
What exactly does optimality mean?
  • MDP
  • A stationary, deterministic, and undominated optimal policy always exists.
  • MG
  • The performance of a policy depends on the opponent's policy, so policies cannot be evaluated out of context.
  • Game theory therefore uses a different definition of optimality:
  • an optimal policy is one that performs best in its worst case, compared with all other policies.
  • At least one optimal policy exists, and it may or may not be deterministic, because the agent is uncertain of its opponent's move; in rock, paper, scissors, for example, the only optimal policy plays each action with probability 1/3.

9
Finding Optimal Policy - Matrix Games
  • The optimal agent's minimum expected reward should be as large as possible.
  • Use V to denote this minimum value, then consider how to maximize it, as formalized below.
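Letting \pi range over the probability distributions \Pi(A) on the agent's actions, the quantity to maximize is

    V = \max_{\pi \in \Pi(A)} \min_{o \in O} \sum_{a \in A} R(a, o)\, \pi_a,

which can be computed by linear programming.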

10
Finding Optimal Policy - MDP
  • Value of a state
  • Quality of a state-action pair
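In Bellman-equation form:

    V(s) = \max_{a \in A} Q(s, a),
    Q(s, a) = R(s, a) + \gamma \sum_{s'} T(s, a, s')\, V(s').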

11
Finding Optimal Policy - 2P-MG
  • Value of a state
  • Quality of a state-action-opponent-action (s, a, o) triple
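The max over single actions in the MDP equations becomes a maximin over mixed policies:

    V(s) = \max_{\pi \in \Pi(A)} \min_{o \in O} \sum_{a \in A} Q(s, a, o)\, \pi_a,
    Q(s, a, o) = R(s, a, o) + \gamma \sum_{s'} T(s, a, o, s')\, V(s').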

12
Learning Optimal Policies
  • Q-learning
  • minimax-Q learning
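After experience \langle s, a, r, s' \rangle, Q-learning performs the update

    Q(s, a) \leftarrow Q(s, a) + \alpha [ r + \gamma \max_{a'} Q(s', a') - Q(s, a) ];

minimax-Q also observes the opponent's action o and replaces the max with the maximin value V(s') of the matrix game induced by the Q-values at s':

    Q(s, a, o) \leftarrow Q(s, a, o) + \alpha [ r + \gamma V(s') - Q(s, a, o) ].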

13
Minimax-Q Algorithm
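Below is a minimal Python sketch of the two core steps, the Q backup and re-solving the per-state matrix game by linear programming. The use of numpy/scipy and the dict-of-arrays layout of Q, V, and pi are assumptions of this sketch, not details taken from the slides.

    import numpy as np
    from scipy.optimize import linprog

    def solve_matrix_game(Q_s):
        """Solve max_pi min_o sum_a pi[a] * Q_s[a, o] as a linear program.

        Q_s is an |A| x |O| array of Q-values for a single state.
        Returns the maximin policy pi and the game value v.
        """
        n_a, n_o = Q_s.shape
        # Variables are [pi_1, ..., pi_|A|, v]; maximizing v = minimizing -v.
        c = np.zeros(n_a + 1)
        c[-1] = -1.0
        # One inequality per opponent action o:
        #   v - sum_a pi[a] * Q_s[a, o] <= 0.
        A_ub = np.hstack([-Q_s.T, np.ones((n_o, 1))])
        b_ub = np.zeros(n_o)
        # The policy must be a probability distribution: sum_a pi[a] = 1.
        A_eq = np.zeros((1, n_a + 1))
        A_eq[0, :n_a] = 1.0
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, 1)] * n_a + [(None, None)])
        return res.x[:n_a], res.x[-1]

    def minimax_q_update(Q, V, pi, s, a, o, r, s_next, alpha, gamma):
        """One minimax-Q backup after experience (s, a, o, r, s_next)."""
        Q[s][a, o] = (1 - alpha) * Q[s][a, o] + alpha * (r + gamma * V[s_next])
        pi[s], V[s] = solve_matrix_game(Q[s])

In the full algorithm, actions are drawn from pi[s] with occasional uniform exploration, and the learning rate alpha decays toward zero over training.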
14
Experiment - Problem
  • Soccer: a zero-sum grid game in which two players compete for the ball, each trying to carry it into the opponent's goal.

15
Experiment - Training
  • Four agents were trained through 10^6 steps:
  • minimax-Q learning
  • vs. a random opponent - MR
  • vs. itself - MM
  • Q-learning
  • vs. a random opponent - QR
  • vs. itself - QQ

16
Experiment - Testing
  • Test 1
  • QR > MR?
  • Test 2
  • QR << QQ?
  • Test 3
  • Are QR and QQ 100% losers against their worst-case opponents?

17
Contributions
  • A solution method for two-player zero-sum Markov games: a modified Q-learning algorithm in which minimax takes the place of max.
  • Minimax can also be used in single-agent environments to avoid risky behavior.

18
Future work
  • Possible performance improvements for the minimax-Q learning method
  • Linear programming accounts for most of the computational cost.
  • Iterative methods may yield approximate minimax solutions much faster, and such approximations may be sufficiently accurate.

19
Discussions
  • The paper claims that the training was not sufficient for MR and MM to attain the optimal policy. How soon would it be possible for them to do so?
  • It is claimed that MR and MM should break even with even the strongest opponent. Why?
  • After training and before testing, the agents' policies are fixed. What if they were instead left unfixed, with their learning abilities in place? We could then examine how they adapt over the long run, e.g., how their winning rates change.
  • What counts as a "slow enough exponentially weighted average"?