Nash Equilibria and Reachability Games

1
Nash Equilibria and Reachability Games
  • Rupak Majumdar
  • University of California, Los Angeles

2
Systems and Models
[Diagram: a system (e.g., an aircraft) is abstracted into a mathematical model; we build, analyze, and calculate with the model in order to predict and test the system.]
3
(Qualitative) Systems Theory
  • Trajectory: dynamic evolution of state (a
    sequence of states)
  • Model: generates a set of trajectories (a
    transition graph)
  • Property: assigns boolean values to trajectories
    (a logical formula)
  • Algorithm: compute the values of the trajectories
    generated by a model

Example property: red and green alternate
4
Model: Colored Transition Graphs
[Figure: a colored transition graph with states a, b, c.]
5
Property: Eventually red
[Figure: the colored transition graph with states a, b, c.]
On graphs: ∃◇red holds if some trajectory has the
property ◇red
6
For qualitative properties over discrete systems,
there is a beautiful, robust theory [Büchi, Rabin,
Emerson, Pnueli et al.]

The ω-Regular Properties
- logical characterization (S1S, second-order
  monadic theory)
- modal characterization (LTL, first-order fragment)
- nondeterministic characterization (Büchi automata)
- deterministic characterization (Rabin automata)
- topological characterization (2.5 Borel levels of
  the Cantor topology)
- fixpoint characterization (µ-calculus)
- effectively closed under boolean operations
- decidable (S1S nonelementary, Büchi linear)
7
Richer Models: Games
FAIRNESS: ω-automaton / parity game graph
ADVERSARIAL CONCURRENCY: game graph
  • for compositional modeling of systems
  • for computing winning strategies (control)

8
  • Two players
  • Finite set of states S
  • Finite set of actions Σ
  • Action assignments Γ1, Γ2 : S → 2^Σ \ {∅}
  • Deterministic transition function δ(s, a1, a2) = t

[Figure: a game graph with states a, b, c; edges are
labeled by joint moves (1,1), (1,2), (2,1), (2,2).]
On games: ⟨⟨left⟩⟩◇red holds if player "left" has a
strategy to enforce ◇red
9
Strategies
  • Deterministic strategies
  • Functions from histories (finite sequences of
    states) to enabled moves
  • Given a play s0 s1 … sk, the strategy picks
  • πi(s0 s1 … sk) = a for some a ∈ Γi(sk)

10
Winning Conditions
  • Outcome: a sequence of states
  • Winning condition:
  • a language Φ over outcomes
  • Player 1's objective:
  • ensure that the outcome is a member of Φ,
  • no matter what player 2 does

11
Fundamental Questions
  • Fundamental property: determinacy
  • The set of states can be partitioned into states
    where player 1 wins and states where player 2
    wins
  • Fundamental algorithmic question:
  • Given a deterministic turn-based game and a
    winning condition, find the set of states from
    which player 1 can win. Also find a
    (deterministic) winning strategy.

12
One-Step Game
  • Regions are sets of states
  • Let U be a set
  • From where can we reach U surely in one step?
  • CPre1(U) =
    { s | ∃ a ∈ Γ1(s). ∀ b ∈ Γ2(s). δ(s,a,b) ∈ U }
  • CPre1 is a transformer on regions (a code sketch
    follows below)
  • Similarly, we can define CPre2 for player 2
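A minimal Python sketch of the one-step operator, assuming a toy
representation of a deterministic concurrent game (the container name
Game and its fields are our own, not from the talk):

```python
from dataclasses import dataclass

@dataclass
class Game:
    states: set   # finite state set S
    moves1: dict  # s -> moves enabled for player 1 (Gamma_1)
    moves2: dict  # s -> moves enabled for player 2 (Gamma_2)
    delta: dict   # (s, a, b) -> successor state (deterministic)

def cpre1(game, U):
    """CPre1(U): states where player 1 has a move that forces the
    successor into U, no matter which enabled move player 2 picks."""
    return {s for s in game.states
            if any(all(game.delta[(s, a, b)] in U for b in game.moves2[s])
                   for a in game.moves1[s])}
```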

13
Multistep Reachability
  • Winning condition: can player 1 eventually reach
    P?
  • This is the least fixpoint
  • µ x. P ∪ CPre1(x)   (a sketch of the fixpoint
    iteration follows below)

[Figure: the target P and the successive layers
CPre(P), CPre²(P), … of the fixpoint iteration.]
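Continuing the same sketch, the least fixpoint µ x. P ∪ CPre1(x) is
computed by iterating cpre1 until the region stabilizes (a standard
attractor computation, reusing the hypothetical Game and cpre1 above):

```python
def attractor1(game, P):
    """Least fixpoint mu x. P | CPre1(x): the states from which
    player 1 can force the play into P in finitely many steps."""
    X = set(P)
    while True:
        X_new = set(P) | cpre1(game, X)
        if X_new == X:    # fixpoint reached
            return X
        X = X_new
```

The complement of the returned region is where player 2 can avoid P
forever, which is the point of the next slide.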
14
Multistep Reachability
  • The proof is not yet complete.
  • To finish the proof, we must show that player 1
    cannot win from the complement of the fixpoint.

[Figure: the same layered picture, with the region
outside the fixpoint marked.]
15
More Objectives
  • ω-regular objectives
  • [Büchi Landweber 69, Gurevich Harrington 82,
    Emerson Jutla 91] Every two-player game with
    ω-regular winning conditions is determined.
  • [Emerson Jutla 91] Winning states for parity
    objectives can be computed in NP ∩ coNP
  • Borel objectives
  • [Martin 75] Every two-player game with Borel
    winning conditions is determined.

16
Quantitative Systems Theory
  • Trajectory: dynamic evolution of state (a
    sequence of states)
  • Model: generates a set of trajectories (a game
    graph)
  • Property: assigns real values to trajectories (a
    quantitative logical formula)
  • Algorithm: compute the real values of the
    trajectories generated by a model

Example property: what fraction of paths see red nodes?
17
Models with Probability
FAIRNESS: ω-automaton / parity game graph
ADVERSARIAL CONCURRENCY: game graph
PROBABILITIES: Markov Decision Processes
Combining all three: stochastic games
18
Concurrent Games
  • Two players
  • Finite set of states S
  • Finite set of actions Σ
  • Action assignments Γ1, Γ2 : S → 2^Σ \ {∅}
  • Probabilistic transition function
  • δ(s, a1, a2)(t) = Pr[ t | s, a1, a2 ]
    (a representation sketch follows below)
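As a data-structure sketch (ours, not from the talk), a concurrent
probabilistic game can be stored like the deterministic one, except
that delta now maps a state and a pair of moves to a distribution over
successors; the class name StochasticGame is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StochasticGame:
    states: set   # finite state set S
    moves1: dict  # s -> moves enabled for player 1 (Gamma_1)
    moves2: dict  # s -> moves enabled for player 2 (Gamma_2)
    delta: dict   # (s, a, b) -> {successor: probability}, summing to 1

# Example entry, in the spirit of the figure on the next slide:
# delta[('a', 'left', 'right')] = {'a': 0.6, 'b': 0.4}
```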

19
Concurrent Games
[Figure: a concurrent game with states a, b, c. In each state the
players simultaneously pick one of their moves (1 or 2, also drawn as
left/right); the successor is then sampled from distributions such as
(a: 0.6, b: 0.4), (a: 0.5, b: 0.5), (a: 0.1, b: 0.9), (a: 0.2, b: 0.8),
(a: 0.7, b: 0.3), (a: 0.0, c: 1.0), (a: 0.0, b: 1.0).]
Maximal probability with which player "left" can
enforce ◇red against all randomized strategies of
player "right"?
20
Overview of Types of Games
Turn-based, deterministic: tic-tac-toe; control of
ω-automata
Turn-based, probabilistic: control of probabilistic
I/O automata
Concurrent, deterministic: matching pennies,
rock-paper-scissors; control of synchronous components
Concurrent, probabilistic: stochastic games; control
of general competitive Markov processes
21
Overview of Types of Games
Turn-based, deterministic: ∀ s ∈ S. |Γ1(s)| = 1 or
|Γ2(s)| = 1, and ∀ a ∈ Γ1(s). ∀ b ∈ Γ2(s). δ(s,a,b)
is a single state
Turn-based, probabilistic: ∀ s ∈ S. |Γ1(s)| = 1 or
|Γ2(s)| = 1
Concurrent, deterministic: ∀ a ∈ Γ1(s). ∀ b ∈ Γ2(s).
δ(s,a,b) is a single state
Concurrent, probabilistic: the general case
Concurrent
22
Concurrent Games Example
[Figure: a concurrent game whose edges are labeled
by joint moves 01, 10, 00, 11.]
The probability of winning with deterministic
strategies is 0.
Player 1 has a randomized strategy to win with
probability 1/2.
Quantitative winning!
23
Strategies
  • Randomized strategies
  • Functions from histories to lotteries over
    enabled moves: given a play s0 s1 … sk,
  • the strategy picks ξi(s0 s1 … sk) = D
  • for some distribution D over the enabled moves
  • A strategy is memoryless if ξi(s0 s1 … sk) = ξi(sk)

24
Winning Conditions: Concurrent Games
  • A language Ψ over outcomes
  • The value of the game is the maximal probability
    of ensuring that the outcome is in Ψ
  • ⟨⟨1⟩⟩Ψ(s) = sup_{ξ1} inf_{ξ2} Pr_s^{ξ1,ξ2}[Ψ]
  • (the probability, under strategies ξ1, ξ2 from s,
    of the set of plays satisfying Ψ)

25
Winning Conditions: Concurrent Games
  • Fundamental property: determinacy
  • For each state s, ⟨⟨1⟩⟩Ψ(s) = 1 - ⟨⟨2⟩⟩(¬Ψ)(s)
  • Fundamental algorithmic question: given a
    concurrent game and a winning condition, find at
    each state the maximal probability with which
    player 1 can ensure the winning condition holds

26
One-Step Game
  • Regions are functions f : S → [0,1]
  • Suppose f is a payoff function on states
  • From state s, the players choose actions a1, a2
    (simultaneously and independently)
  • The next state Q is chosen according to the
    distribution δ(s, a1, a2), and player 1 gets
    payoff f(Q)

27
One-Step Game
  • Player 1's value:
  • the maximal expectation of f(Q)
  • Define the one-step valuation
  • Ppre(f)(s) = sup_{ξ1} inf_{ξ2} E_s^{ξ1,ξ2}[ f(Q) ]

28
Fundamental Theorem of Zero Sum Games
  • Equivalent to zero-sum matrix games
  • Value and optimal randomized strategies exist for
    both players
  • Minimax theorem [von Neumann 28]
  • Can be computed by linear programming (a sketch
    follows below)
  • This also gives the value of finitely repeated
    games
  • But we are interested in infinite games
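A hedged sketch of the LP computation: matrix_game_value solves the
zero-sum matrix game for the maximizing row player with scipy's
linprog. The function name and representation are ours; the one-step
value Ppre(f)(s) is exactly such a matrix-game value.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and an optimal mixed strategy for the row (maximizing)
    player of the zero-sum matrix game with payoff matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x_1..x_m (row mixed strategy) and v (the game value).
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v <= sum_a A[a,b]*x_a for all b
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum_a x_a = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

# Matching pennies has value 0 with the uniform strategy:
# matrix_game_value([[1, -1], [-1, 1]])  ->  (0.0, [0.5, 0.5])
```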

29
Reachability
  • Maximal probability of reaching a set U of states
  • Can be reduced to positive stochastic games
  • Characterizing the winning value:
  • X0 = 0,  Xn+1 = max(U, Ppre(Xn)),  X = limn Xn
    (U is identified with its indicator function; a
    value-iteration sketch follows below)
  • Correctness is by induction on the n-step game
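A sketch of this value iteration, reusing the hypothetical
StochasticGame and matrix_game_value from the earlier sketches; in
practice one iterates until the change drops below a tolerance, since
the limit may only be approached from below.

```python
import numpy as np

def ppre(game, f, s):
    """One-step value Ppre(f)(s): the value of the matrix game whose
    (a, b) entry is the expected payoff E[f(Q)] at state s."""
    acts1, acts2 = sorted(game.moves1[s]), sorted(game.moves2[s])
    A = np.array([[sum(p * f[t] for t, p in game.delta[(s, a, b)].items())
                   for b in acts2] for a in acts1])
    value, _ = matrix_game_value(A)
    return value

def reach_value(game, U, tol=1e-9, max_iter=10000):
    """Value iteration X_0 = 0, X_{n+1} = max(1_U, Ppre(X_n))."""
    x = {s: 0.0 for s in game.states}
    for _ in range(max_iter):
        x_new = {s: 1.0 if s in U else ppre(game, x, s)
                 for s in game.states}
        if max(abs(x_new[s] - x[s]) for s in game.states) < tol:
            return x_new
        x = x_new
    return x
```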

30
Reachability Example
[Figure: a reachability example with states S1, S2, S3, S4;
edges are labeled by joint moves 01, 10, 00, 11.]
31
No Optimal Strategy: Example
[Figure: a concurrent game whose edges are labeled by
joint moves 01, 10, 00, 11.]
The probability of winning is 1:
player 1 has a randomized strategy to win with
probability 1-ε for every ε > 0, but no optimal strategy.
32
More Objectives
  • ω-regular objectives
  • [de Alfaro M 01] Every two-player concurrent game
    with ω-regular winning conditions is determined.
  • [de Alfaro M 01] Algorithms to approximate the
    value in 3EXPTIME
  • [Chatterjee M Jurdzinski 04] Algorithms to
    approximate the value of reachability games in
    NP ∩ coNP
  • Borel objectives
  • [Martin 98, Maitra Sudderth 98] Every two-player
    concurrent game with Borel winning conditions is
    determined.
    determined.

33
Reachability Game
[Figure: a three-state reachability game with states
s, t, u; moves a, b.]
Value of reaching u from t: (-3 + 2√5)/5
34
Non Zero Sum Games
  • So far, our games had two players
  • Player 1's goal was Φ
  • Player 2's goal was ¬Φ
  • Strictly competitive!

35
Non Zero Sum Games
  • But systems are not (always) malicious
  • Usually player 1 has a goal Φ1, and player 2 has a
    goal Φ2
  • These goals are not necessarily contradictory
  • Each player is happy to ensure his own goal
  • Such a game is non-zero-sum

36
Simple Example: Ethernet
[Figure: the Ethernet game graph; edges are labeled by
joint moves such as (s,s), (ns,ns), (s,n), (n,s).]
37
History: Non-Zero-Sum Games
  • Every finite n-player game has an equilibrium
    [Nash 50]
  • The complexity of finding a Nash equilibrium is
    open [Pap 94, Pap 01]
  • Discounted stochastic n-player games have a Nash
    equilibrium [Fink 64, Mertens Parthasarathy 86]
  • 2-player nonzero-sum stochastic games with
    limiting average payoff [Vieille 00]
  • Closed sets [Sudderth Secchi 02]
  • Open sets (reachability) [Chatterjee Jurdzinski M 03]
  • (This talk)

38
One Shot Games
  • Games in strategic form
  • Bimatrix games
  • A matrix of payoffs for each player
  • If player 1 plays a and player 2 plays b, then
  • player 1 gets P1[a,b]
  • player 2 gets P2[a,b]

39
Examples
  • Prisoner's Dilemma

Chicken
40
Nash Equilibrium
  • A pair of strategies (σ1, σ2) is an ε-Nash
    equilibrium if
  • for all strategies σ1', σ2':
  • Value2(σ1, σ2') ≤ Value2(σ1, σ2) + ε
  • Value1(σ1', σ2) ≤ Value1(σ1, σ2) + ε
  • Neither player gains more than ε by deviating
    from the equilibrium strategy (a small check is
    sketched below)
  • A 0-Nash equilibrium is called a Nash equilibrium
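As a small illustration of the definition (our own sketch), for a
bimatrix game the ε-Nash condition can be checked by comparing each
player's payoff under (x, y) with the best unilateral deviation, which
can always be taken to be a pure action:

```python
import numpy as np

def is_eps_nash(P1, P2, x, y, eps):
    """Check whether mixed strategies (x, y) form an eps-Nash
    equilibrium of the bimatrix game with payoff matrices P1, P2."""
    P1, P2, x, y = (np.asarray(z, dtype=float) for z in (P1, P2, x, y))
    v1, v2 = x @ P1 @ y, x @ P2 @ y     # payoffs under (x, y)
    best1 = (P1 @ y).max()              # player 1's best deviation against y
    best2 = (x @ P2).max()              # player 2's best deviation against x
    return best1 - v1 <= eps and best2 - v2 <= eps

# Matching pennies: the uniform profile is a (0-)Nash equilibrium.
# is_eps_nash([[1, -1], [-1, 1]], [[-1, 1], [1, -1]],
#             [0.5, 0.5], [0.5, 0.5], eps=0.0)  ->  True
```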

41
Nash's Theorem
  • Theorem: Every bimatrix game has a Nash
    equilibrium in randomized strategies.
  • The proof uses Kakutani's fixpoint theorem

42
Nash's Theorem
  • Theorem: Every bimatrix game has a Nash
    equilibrium in randomized strategies.
  • Idea of proof: define a mapping (the best-response
    correspondence)
  • By Kakutani's fixpoint theorem, this map has a
    fixpoint
  • The fixpoint is a Nash equilibrium point

43
Nash's Theorem
  • Theorem: Every bimatrix game has a Nash
    equilibrium in randomized strategies.
  • This also shows that Nash equilibria exist in
    finitely repeated games

44
Algorithms?
  • The proof is existential.
  • No polynomial-time algorithm for finding Nash
    equilibria is known, even for 2-person games!

45
Reachability Games
  • A non-zero-sum reachability game consists of
  • a concurrent game G
  • two sets of states S1 and S2 of G
  • Player 1's goal is to reach S1
  • Player 2's goal is to reach S2
  • Given strategies σ1 and σ2, Valuei(σ1, σ2) is the
    probability with which the resulting stochastic
    process visits Si

46
Nash Equilibrium in Reachability Games
  • Fundamental question: do ε-Nash equilibria exist
    in nonzero-sum reachability games for every ε > 0?
  • This does not follow from Nash's Theorem!
  • For safety games, the answer is yes
    [Sudderth Secchi 02]
  • In fact, Nash equilibria exist
  • But the reachability case does not follow by
    duality
  • For reachability games, the question was open

47
No Nash Equilibrium Example
[Figure: the same concurrent game, with edges labeled
by joint moves 01, 10, 00, 11.]
Player 1 has a randomized strategy to win with
probability 1-ε for every ε > 0, but no optimal strategy.
48
Main Theorem
  • Theorem [Chatterjee M Jurdzinski 04]: An n-player
    nonzero-sum reachability game has an ε-Nash
    equilibrium in memoryless strategies, for all ε > 0.

49
Idea of proof
  • Define λ-discounted games and show that memoryless
    Nash equilibria exist in such games.
  • Consider a Nash equilibrium in the λ-discounted
    reachability game. This equilibrium can be
    approximated by strategies of a simple form
    (k-uniform).
  • This strategy profile is an ε-Nash equilibrium in
    the original game, for a suitable λ.
  • This is because if we fix the strategy of player
    2, then in the resulting MDP the value is close
    to the discounted value
  • Similarly for player 1

50
Discounted Reachability Games
  • A λ-discounted reachability game is played as
    follows.
  • At each stage, the game stops with probability λ
    and continues with probability 1-λ (a construction
    sketch follows below).
  • Theorem: A λ-discounted reachability game has a
    Nash equilibrium in memoryless strategies.
  • The proof is an application of Kakutani's
    fixpoint theorem
  • This is related to Nash equilibria in discounted
    reward games [Fink 64, Sobel 71]
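Under our earlier StochasticGame representation, the λ-discounted game
can be obtained by redirecting a λ-fraction of every transition to an
absorbing stop state (the names are ours; this is only a sketch of the
construction described above):

```python
def discounted(game, lam, stop_state="STOP"):
    """Return the lam-discounted version of `game`: at every step the
    play halts (moves to an absorbing stop state) with probability lam."""
    delta = {key: {t: (1.0 - lam) * p for t, p in dist.items()}
             for key, dist in game.delta.items()}
    for dist in delta.values():
        dist[stop_state] = dist.get(stop_state, 0.0) + lam
    # The stop state has a single trivial move per player and loops forever.
    delta[(stop_state, "stay", "stay")] = {stop_state: 1.0}
    return StochasticGame(states=game.states | {stop_state},
                          moves1={**game.moves1, stop_state: {"stay"}},
                          moves2={**game.moves2, stop_state: {"stay"}},
                          delta=delta)
```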

51
Approximating Strategies
  • Let J be a game in strategic form with n players
  • Each player has m actions
  • A strategy is k-uniform if it is a uniform
    distribution over a multiset of size k
  • Let σ be a Nash equilibrium profile.
  • [Lipton Markakis Mehta 03] For every ε > 0, for every
  • k > (3 n² ln(n² m)) / ε², there exists a
    k-uniform strategy profile σ' such that for every
    action a,
  • if σ(a) = 0, then σ'(a) = 0;
  • if σ(a) > 0, then |σ(a) - σ'(a)| < ε
    (a small calculation of the bound follows below)
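To get a feel for the bound quoted above, a tiny calculation of the
smallest admissible k (our own helper, following the formula on the
slide directly):

```python
import math

def k_uniform_bound(n, m, eps):
    """Smallest integer k with k > 3 * n^2 * ln(n^2 * m) / eps^2."""
    return math.floor(3 * n * n * math.log(n * n * m) / (eps * eps)) + 1

# For n = 2 players, m = 10 actions each, eps = 0.1:
# 3 * 4 * ln(40) / 0.01 is about 4426.7, so k_uniform_bound(2, 10, 0.1) == 4427
```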

52
Markov Decision Processes
  • A Markov decision process (MDP) is a one-player
    game.
  • Reachability and discounted reachability are
    defined on MDPs by restriction from games.
  • When we fix the strategies of all players other
    than player i, we obtain an MDP Gi.

53
Approximating Equilibria in Discounted Games
  • For an n-player discounted reachability game Gλ,
    for every ε > 0, there exists a memoryless strategy
    profile σ such that
  • σ is an ε-Nash equilibrium profile of Gλ, and
  • for every player i, the minimum transition
    probability in the MDP Gi is at least f(ε, n, G).

54
Approximating MDPs
  • Let G be an MDP with a reachability objective
  • [Condon 90] For all ε > 0 there exists a discount
    factor λ such that for all states s ∈ S of the
    λ-discounted game Gλ we have
  • |v(s) - vλ(s)| < ε

55
Complexity
  • We can approximate an ε-Nash equilibrium to within
    ε', for constant ε and ε', in NP
  • Guess the memoryless (k-uniform) strategy
    profiles
  • Solve the MDPs obtained after fixing all but one
    player's strategy
  • Payoffs can be irrational, so we can only hope to
    approximate
    approximate

56
More Objectives
  • Fundamental open question: is there a nonzero-sum
    version of Martin's Theorem for concurrent games?
  • We don't know, even for
  • mixed safety and reachability objectives
  • These are likely to be hard problems

57
Turn Based Games
  • Theorem [Chatterjee M Jurdzinski 04]:
  • n-player turn-based probabilistic games with
    Borel payoffs have ε-Nash equilibria in
    deterministic strategies.
  • n-player turn-based deterministic games with
    Borel payoffs have Nash equilibria in
    deterministic strategies.

58
Trick with Deterministic Strategies
  • For an n-player game where player i has objective
    Φi,
  • consider the zero-sum game of player i with
    objective Φi against all other players with
    objective ¬Φi
  • Suppose this zero-sum game has a deterministic
    winning strategy σi for player i and a deterministic
    strategy τi for the coalition of the others
  • Nash equilibrium:
  • Every player i plays σi from above.
  • As soon as some player j deviates, all the other
    players punish by switching to τj against j
  • Deterministic strategies are necessary to observe
    deviations
  • Folk result? [Thuijsman Raghavan 97].

59
Turn Based Games
  • A careful study of Martin's determinacy proof
    shows that we can construct ε-optimal
    deterministic strategies for turn-based
    probabilistic games,
  • and optimal pure strategies for deterministic
    turn-based games

60
Las Vegas Game
[Figure: the Las Vegas game. Each day you can Work or
Go to Vegas; in Vegas, with probability 1/2 you hit the
Jackpot and otherwise you lose ("Sorry, you lose"),
after which you may Play again.]
61
Las Vegas Game
  • For every ε > 0, the Las Vegas game has a
    (1-ε)-optimal winning strategy
  • For ε = 1/2^n, work for n days before heading to
    Vegas
  • But there is no optimal winning strategy
  • The winning condition is not ω-regular:
  • the number of times you are allowed to play is the
    number of days you have worked

62
ω-Regular?
  • The Las Vegas game is not ω-regular
  • For ω-regular games, optimal deterministic
    winning strategies exist [Chatterjee Jurdzinski
    Henzinger 04]
  • Thus, turn-based nonzero-sum games with ω-regular
    objectives have pure Nash equilibria.
  • For parity conditions, we can compute the value
    profile of some Nash equilibrium in NP

63
Credits
  • Work done in collaboration with:
  • Luca de Alfaro. Quantitative solution of
    concurrent games, STOC 01
  • Krishnendu Chatterjee and Marcin Jurdzinski. On
    Nash equilibria in stochastic games, CSL 04

64
Thank You!
  • http://www.cs.ucla.edu/rupak