1
Regret Minimization in Stochastic Games
  • Shie Mannor and Nahum Shimkin
  • Technion, Israel Institute of Technology
  • Dept. of Electrical Engineering

2
Introduction
  • Modeling of a dynamic decision process as a
    stochastic game
  • Non-stationarity of the environment
  • Environments are not (necessarily) hostile
  • Looking for the best possible strategy in light
    of the environment's actions.

3
Repeated Matrix Games
  • The sets of single-stage mixed strategies P and Q
    are simplices.
  • Rewards are defined by a reward matrix G:
    r(p,q) = pᵀGq (see the sketch below)
  • Reward criterion - average reward
  • Need not converge, since stationarity is not
    assumed
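As a concrete illustration (not taken from the slides), the single-stage reward of a pair of mixed strategies is just the bilinear form pᵀGq; the matrix and strategies in the sketch below are made-up examples.

```python
import numpy as np

# Hypothetical 2x2 reward matrix G for P1; the numbers are made up.
G = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def reward(p, q, G):
    """Expected single-stage reward r(p, q) = p^T G q for mixed strategies p and q."""
    return float(np.asarray(p) @ G @ np.asarray(q))

# P1 plays (0.5, 0.5), P2 plays (0.2, 0.8).
print(reward([0.5, 0.5], [0.2, 0.8], G))  # 0.5
```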

4
Regret for Repeated Matrix Games
  • Suppose that by time t the average reward is r̂t and
    the opponent's empirical strategy is q̂t.
  • The regret is defined as Lt = r*(q̂t) - r̂t, where
    r*(q) = max over p in P of r(p,q) is the Bayes reward
    against q (see sketch below).
  • A policy is called regret minimizing if
    limsup Lt ≤ 0 as t → ∞, almost surely, for any
    strategy of the opponent.
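A minimal sketch of this computation, assuming the standard definitions above; the game matrix, the helper names (`bayes_reward`, `empirical_regret`), and the play history are illustrative, not from the talk.

```python
import numpy as np

def bayes_reward(G, q):
    """Bayes (best-response) reward against a mixed strategy q: max_p p^T G q.
    The maximum over the simplex is attained at a pure action, i.e. the largest entry of G q."""
    return float(np.max(G @ q))

def empirical_regret(G, actions_p1, actions_p2):
    """Regret by time t: Bayes reward against P2's empirical strategy minus the average reward."""
    t = len(actions_p1)
    avg_reward = np.mean([G[a, b] for a, b in zip(actions_p1, actions_p2)])
    q_hat = np.bincount(actions_p2, minlength=G.shape[1]) / t  # P2's empirical strategy
    return bayes_reward(G, q_hat) - avg_reward

# Made-up 2x2 game and a short play history.
G = np.array([[1.0, 0.0],
              [0.0, 1.0]])
print(empirical_regret(G, actions_p1=[0, 0, 1], actions_p2=[1, 0, 1]))  # 0.0
```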

5
Regret minimization for repeated matrix games
  • Such policies do exist (Hannan, 1956)
  • A proof using approachability theory (Blackwell,
    1956)
  • Also for games with partial observation (Auer et
    al., 1995; Rustichini, 1999)

6
Stochastic Games
  • Formal Model (see the sketch below)
  • S = {1, ..., s} - the state space
  • A = A(s) - actions of the regret-minimizing player, P1
  • B = B(s) - actions of the environment, P2
  • r - reward function, r(s,a,b)
  • P - transition kernel, P(s'|s,a,b)
  • Expected average reward for p ∈ P, q ∈ Q is r(p,q)
  • Single-state recurrence assumption
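For concreteness, a finite model of this form could be stored as plain arrays; the layout and names below (`StochasticGame`, `reward`, `transitions`) are only an illustrative assumption, not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class StochasticGame:
    """A finite stochastic game: reward r(s, a, b) and transition kernel P(s' | s, a, b)."""
    reward: np.ndarray       # shape (S, A, B)
    transitions: np.ndarray  # shape (S, A, B, S); sums to 1 over the last axis

# Tiny made-up instance: 2 states, 2 actions per player, random numbers.
S, A, B = 2, 2, 2
rng = np.random.default_rng(0)
r = rng.random((S, A, B))
P = rng.random((S, A, B, S))
P /= P.sum(axis=-1, keepdims=True)  # normalize each P(. | s, a, b) to a distribution
game = StochasticGame(reward=r, transitions=P)
print(game.transitions[0, 0, 0])    # a probability distribution over next states
```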

7
Bayes Reward in Strategy Space
  • For every stationary strategy q ∈ Q, the Bayes
    reward is defined as r*(q) = max over p in P of r(p,q)
  • Problems:
  • P2's strategy is not completely observed
  • P1's observations may depend on the strategies
    of both players

8
Bayes Reward in State-Action Space
  • Let ψ(s,b) be the observed frequency of P2's action
    b in state s.
  • A natural estimate of q is
    q̂(b|s) = ψ(s,b) / Σb' ψ(s,b') (see sketch below)
  • The associated Bayes envelope is the set of
    (frequency, reward) points whose reward coordinate is
    at least the Bayes reward of the estimated strategy,
    BE = {(ψ, ρ) : ρ ≥ r*(q̂(ψ))}
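A small sketch of the estimate in the second bullet: normalizing P2's observed state-action counts per state; the function name and the counts are made up.

```python
import numpy as np

def estimate_opponent_strategy(counts):
    """counts[s, b] = number of times P2 played action b while in state s.
    Returns q_hat[s, b] = counts[s, b] / sum_b' counts[s, b'],
    falling back to uniform in states that were never visited."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 1.0 / counts.shape[1])

# Made-up counts for 2 states and 3 opponent actions.
print(estimate_opponent_strategy([[4, 1, 0],
                                  [0, 0, 0]]))
```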

9
Approachability Theory
  • A standard tool in the theory of repeated matrix
    games (Blackwell, 1956)
  • For a game with vector-valued rewards m(a,b) and
    average reward vector m̂t
  • A set A is approachable by P1 with a policy σ if
    dist(m̂t, A) → 0 almost surely, for any strategy of P2
  • Was extended to recurrent stochastic games
    (Shimkin and Shwartz, 1993)

10
The Convex Bayes Envelope
  • In general BE is not approachable.
  • Define CBE = co(BE), that is, the lower convex hull
    of BE
  • Theorem: CBE is approachable.
  • (val is the value of the game)

11
Single Controller Games
  • Theorem: Assume that P2 alone controls the
    transitions, i.e. P(s'|s,a,b) = P(s'|s,b) for all a;
    then BE itself is approachable.

12
An Application to Prediction with Expert Advice
  • Given a channel and a set of experts
  • At each time epoch each expert states his
    prediction of the next symbol, and P1 has to
    choose his own prediction ŷ (see sketch below)
  • Then a letter y appears in the channel and P1
    receives his prediction reward r(y, ŷ)
  • The problem can be formulated as a stochastic game,
    where P2 stands for all the experts and the channel
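The slides do not spell out P1's forecasting strategy; purely as a point of reference, one standard regret-minimizing scheme for prediction with expert advice is the exponential-weights forecaster sketched below (all names, parameters, and reward values are made up, not the construction from the talk).

```python
import numpy as np

def exponential_weights(expert_rewards, eta=0.5, seed=0):
    """Exponential-weights forecaster: follow expert i with probability
    proportional to exp(eta * cumulative reward of expert i).
    expert_rewards[t, i] is the reward expert i's prediction would earn at time t.
    Returns the forecaster's total reward."""
    rng = np.random.default_rng(seed)
    T, n = expert_rewards.shape
    cumulative = np.zeros(n)
    total = 0.0
    for t in range(T):
        w = np.exp(eta * (cumulative - cumulative.max()))  # shift for numerical stability
        choice = rng.choice(n, p=w / w.sum())              # follow a randomly drawn expert
        total += expert_rewards[t, choice]
        cumulative += expert_rewards[t]                    # all experts' rewards are observed
    return total

# Made-up rewards for 3 experts over 100 rounds.
rewards = np.random.default_rng(1).random((100, 3))
print(exponential_weights(rewards))
```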

13
Prediction Example (cont)
  • Theorem: P1 has a zero-regret strategy.

14
An example in which BE is not approachable
  • It can be proved that BE for the
  • above game is not approachable

15
Example (cont)
  • In r(q) space, the envelopes (BE and CBE) are plotted.

16
Open questions
  • Characterization of minimal approachable sets in
    reward-state-action space
  • On-line learning schemes for stochastic games
    with unknown parameters
  • Other ways of formulating optimality with respect
    to observed state action frequencies

17
Conclusions
  • The problem of regret minimization for stochastic
    games was considered
  • The proposed solution concept, CBE, is based on
    convexification of the Bayes envelope in the
    natural state-action space.
  • The concept of CBE ensures an average reward that
    is higher than the value of the game when the
    opponent is suboptimal

18
Regret Minimization in Stochastic Games
  • Shie Mannor and Nahum Shimkin
  • Technion, Israel Institute of Technology
  • Dept. of Electrical Engineering

19
Approachability Theory
  • Let m(p,q) be the average vector-valued reward in
    a game when P1 and P2 play p and q
  • Define m(p,q) = Σa Σb p(a) q(b) m(a,b)
  • Theorem (Blackwell, 1956): A convex set C is
    approachable if and only if for every q ∈ Q there
    exists p ∈ P such that m(p,q) ∈ C (see the numeric
    check below)
  • Extended to stochastic games (Shimkin and
    Shwartz, 1993)
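As an illustration of how Blackwell's condition could be checked numerically in the simplest case of a half-space target set C = {x : c·x ≤ d}; the game, payoffs, and the grid-based check below are made-up assumptions, not from the slides.

```python
import numpy as np

def halfspace_condition_holds(M, c, d, grid=101):
    """Check Blackwell's condition for the convex half-space C = {x : c.x <= d}:
    for every mixed strategy q of P2 there must exist a mixed p of P1
    with c . m(p, q) <= d.  M[a, b, :] is the vector payoff m(a, b);
    q is sampled on a grid, assuming P2 has 2 actions."""
    Mc = M @ c                              # Mc[a, b] = c . m(a, b)
    for q1 in np.linspace(0.0, 1.0, grid):
        q = np.array([q1, 1.0 - q1])
        # smallest achievable value of c . m(p, q); the minimum over
        # P1's simplex is attained at a pure action
        if np.min(Mc @ q) > d + 1e-12:
            return False
    return True

# Made-up 2x2 game with 2-dimensional vector payoffs.
M = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
print(halfspace_condition_holds(M, c=np.array([1.0, 1.0]), d=1.0))  # True
```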

20
A Related Vector-Valued Game
  • Define the following vector-valued game
  • If in state s action b is played by P2 and a
    reward r is gained, then the vector-valued reward mt
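Stated only as an assumption consistent with the state-action frequencies used earlier (the construction is not spelled out on the slide), the vector payoff would stack the scalar reward with an indicator of P2's current state-action pair, so that its running average records both the average reward and the empirical state-action frequencies:

```latex
% Assumed form (not verbatim from the slides): the first coordinate carries the
% scalar reward, the remaining coordinates indicate P2's state-action pair.
m_t \;=\; \Bigl( r(s_t, a_t, b_t),\;
            \bigl( \mathbf{1}\{ s_t = s,\; b_t = b \} \bigr)_{s \in S,\, b \in B(s)} \Bigr)
```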