Nash Equilibria and Reachability Games

1
Nash Equilibria and Reachability Games
  • Rupak Majumdar
  • University of California, Los Angeles

2
Systems and Models
[Diagram: a system (e.g., an aircraft) is abstracted into a mathematical model; we build, analyze, and calculate with the model in order to predict and test the system.]
3
(Qualitative) Systems Theory
  • Trajectory: dynamic evolution of state (a
    sequence of states)
  • Model: generates a set of trajectories (a
    transition graph)
  • Property: assigns boolean values to trajectories
    (a logical formula)
  • Algorithm: compute the values of the trajectories
    generated by a model

Example property: red and green alternate
4
Model: Colored Transition Graphs
[Figure: a colored transition graph with states a, b, c.]
5
Property: Eventually red
[Figure: the colored transition graph with states a, b, c.]
On graphs: ∃◇red holds if some trajectory has the
property ◇red
6
For qualitative properties over discrete systems,
there is a beautiful, robust theory [Büchi, Rabin,
Emerson, Pnueli et al.]

The ω-Regular Properties
- logical characterization (S1S, second-order
  monadic theory)
- modal characterization (LTL, first-order fragment)
- nondeterministic characterization (Büchi automata)
- deterministic characterization (Rabin automata)
- topological characterization (2.5 Borel levels of
  the Cantor topology)
- fixpoint characterization (µ-calculus)
- effectively closed under boolean operations
- decidable (S1S nonelementary, Büchi linear)
7
Richer Models: Games
FAIRNESS: ω-automaton / parity game graph
ADVERSARIAL CONCURRENCY: game graph
  • for compositional modeling of systems
  • for computing winning strategies (control)

8
  • Two players
  • Finite set of states S
  • Finite set of actions Σ
  • Action assignments Γ1, Γ2 : S → 2^Σ \ {∅}
  • Deterministic transition function δ(s, a1, a2) = t

[Figure: a game graph with states a, b, c; edges are
labeled by joint moves (1,1), (1,2), (2,1), (2,2).]
On games: ⟨⟨left⟩⟩◇red holds if player "left" has a
strategy to enforce ◇red
9
Strategies
  • Deterministic strategies
  • Functions from histories (finite sequences of
    states) to enabled moves
  • Given a play s0 s1 … sk, the strategy picks
  • πi(s0 s1 … sk) = a for some a ∈ Γi(sk)

10
Winning Conditions
  • Outcome: a sequence of states
  • Winning condition:
  • a language Φ over outcomes
  • Player 1's objective:
  • ensure that the outcome is a member of Φ,
  • no matter what player 2 does

11
Fundamental Questions
  • Fundamental property: determinacy
  • The set of states can be partitioned into states
    where player 1 wins and states where player 2
    wins
  • Fundamental algorithmic question:
  • Given a deterministic turn-based game and a
    winning condition, find the set of states from
    which player 1 can win. Also find a
    (deterministic) winning strategy.

12
One-Step Game
  • Regions are sets of states
  • Let U be a set
  • From where can we reach U surely in one step?
  • CPre1(U) =
    { s | ∃ a ∈ Γ1(s). ∀ b ∈ Γ2(s). δ(s,a,b) ∈ U }
  • CPre1 is a transformer on regions (a code sketch
    follows below)
  • Similarly, we can define CPre2 for player 2
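A minimal Python sketch of the one-step operator, assuming a toy
representation of a deterministic concurrent game (the container name
Game and its fields are our own, not from the talk):

```python
from dataclasses import dataclass

@dataclass
class Game:
    states: set   # finite state set S
    moves1: dict  # s -> moves enabled for player 1 (Gamma_1)
    moves2: dict  # s -> moves enabled for player 2 (Gamma_2)
    delta: dict   # (s, a, b) -> successor state (deterministic)

def cpre1(game, U):
    """CPre1(U): states where player 1 has a move that forces the
    successor into U, no matter which enabled move player 2 picks."""
    return {s for s in game.states
            if any(all(game.delta[(s, a, b)] in U for b in game.moves2[s])
                   for a in game.moves1[s])}
```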

13
Multistep Reachability
  • Winning condition: can player 1 eventually reach
    P?
  • This is the least fixpoint
  • µ x. P ∪ CPre1(x)   (a sketch of the fixpoint
    iteration follows below)

[Figure: the target P and the successive layers
CPre(P), CPre²(P), … of the fixpoint iteration.]
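Continuing the same sketch, the least fixpoint µ x. P ∪ CPre1(x) is
computed by iterating cpre1 until the region stabilizes (a standard
attractor computation, reusing the hypothetical Game and cpre1 above):

```python
def attractor1(game, P):
    """Least fixpoint mu x. P | CPre1(x): the states from which
    player 1 can force the play into P in finitely many steps."""
    X = set(P)
    while True:
        X_new = set(P) | cpre1(game, X)
        if X_new == X:    # fixpoint reached
            return X
        X = X_new
```

The complement of the returned region is where player 2 can avoid P
forever, which is the point of the next slide.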
14
Multistep Reachability
  • The proof is not yet complete.
  • To finish the proof, we must show that player 1
    cannot win from the complement of the fixpoint.

[Figure: the same layered picture, with the region
outside the fixpoint marked.]
15
More Objectives
  • ω-regular objectives
  • [Büchi Landweber 69, Gurevich Harrington 82,
    Emerson Jutla 91] Every two-player game with
    ω-regular winning conditions is determined.
  • [Emerson Jutla 91] Winning states for parity
    objectives can be computed in NP ∩ coNP
  • Borel objectives
  • [Martin 75] Every two-player game with Borel
    winning conditions is determined.

16
Quantitative Systems Theory
  • Trajectory: dynamic evolution of state (a
    sequence of states)
  • Model: generates a set of trajectories (a game
    graph)
  • Property: assigns real values to trajectories (a
    quantitative logical formula)
  • Algorithm: compute the real values of the
    trajectories generated by a model

Example property: what fraction of paths see red nodes?
17
Models with Probability
FAIRNESS: ω-automaton / parity game graph
ADVERSARIAL CONCURRENCY: game graph
PROBABILITIES: Markov Decision Processes
Combining all three: stochastic games
18
Concurrent Games
  • Two players
  • Finite set of states S
  • Finite set of actions Σ
  • Action assignments Γ1, Γ2 : S → 2^Σ \ {∅}
  • Probabilistic transition function
  • δ(s, a1, a2)(t) = Pr[ t | s, a1, a2 ]
    (a representation sketch follows below)
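As a data-structure sketch (ours, not from the talk), a concurrent
probabilistic game can be stored like the deterministic one, except
that delta now maps a state and a pair of moves to a distribution over
successors; the class name StochasticGame is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class StochasticGame:
    states: set   # finite state set S
    moves1: dict  # s -> moves enabled for player 1 (Gamma_1)
    moves2: dict  # s -> moves enabled for player 2 (Gamma_2)
    delta: dict   # (s, a, b) -> {successor: probability}, summing to 1

# Example entry, in the spirit of the figure on the next slide:
# delta[('a', 'left', 'right')] = {'a': 0.6, 'b': 0.4}
```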

19
Concurrent Games
[Figure: a concurrent game with states a, b, c. In each state the
players simultaneously pick one of their moves (1 or 2, also drawn as
left/right); the successor is then sampled from distributions such as
(a: 0.6, b: 0.4), (a: 0.5, b: 0.5), (a: 0.1, b: 0.9), (a: 0.2, b: 0.8),
(a: 0.7, b: 0.3), (a: 0.0, c: 1.0), (a: 0.0, b: 1.0).]
Maximal probability with which player "left" can
enforce ◇red against all randomized strategies of
player "right"?
20
Overview of Types of Games
Turn-based, deterministic: tic-tac-toe; control of
ω-automata
Turn-based, probabilistic: control of probabilistic
I/O automata
Concurrent, deterministic: matching pennies,
rock-paper-scissors; control of synchronous components
Concurrent, probabilistic: stochastic games; control
of general competitive Markov processes
21
Overview of Types of Games
Turn-based, deterministic: ∀ s ∈ S. |Γ1(s)| = 1 or
|Γ2(s)| = 1, and ∀ a ∈ Γ1(s). ∀ b ∈ Γ2(s). δ(s,a,b)
is a single state
Turn-based, probabilistic: ∀ s ∈ S. |Γ1(s)| = 1 or
|Γ2(s)| = 1
Concurrent, deterministic: ∀ a ∈ Γ1(s). ∀ b ∈ Γ2(s).
δ(s,a,b) is a single state
Concurrent, probabilistic: the general case
Concurrent
22
Concurrent Games Example
[Figure: a concurrent game whose edges are labeled
by joint moves 01, 10, 00, 11.]
The probability of winning with deterministic
strategies is 0.
Player 1 has a randomized strategy to win with
probability 1/2.
Quantitative winning!
23
Strategies
  • Randomized strategies
  • Functions from histories to lotteries over
    enabled moves: given a play s0 s1 … sk,
  • the strategy picks ξi(s0 s1 … sk) = D
  • for some distribution D over the enabled moves
  • A strategy is memoryless if ξi(s0 s1 … sk) = ξi(sk)

24
Winning Conditions: Concurrent Games
  • A language Ψ over outcomes
  • The value of the game is the maximal probability
    of ensuring that the outcome is in Ψ
  • ⟨⟨1⟩⟩Ψ(s) = sup_{ξ1} inf_{ξ2} Pr_s^{ξ1,ξ2}[Ψ]
  • (the probability, under strategies ξ1, ξ2 from s,
    of the set of plays satisfying Ψ)

25
Winning Conditions: Concurrent Games
  • Fundamental property: determinacy
  • For each state s, ⟨⟨1⟩⟩Ψ(s) = 1 - ⟨⟨2⟩⟩(¬Ψ)(s)
  • Fundamental algorithmic question: given a
    concurrent game and a winning condition, find at
    each state the maximal probability with which
    player 1 can ensure the winning condition holds

26
One-Step Game
  • Regions are functions f : S → [0,1]
  • Suppose f is a payoff function on states
  • From state s, the players choose actions a1, a2
    (simultaneously and independently)
  • The next state Q is chosen according to the
    distribution δ(s, a1, a2), and player 1 gets
    payoff f(Q)

27
One-Step Game
  • Player 1's value:
  • the maximal expectation of f(Q)
  • Define the one-step valuation
  • Ppre(f)(s) = sup_{ξ1} inf_{ξ2} E_s^{ξ1,ξ2}[ f(Q) ]

28
Fundamental Theorem of Zero Sum Games
  • Equivalent to zero-sum matrix games
  • Value and optimal randomized strategies exist for
    both players
  • Minimax theorem [von Neumann 28]
  • Can be computed by linear programming (a sketch
    follows below)
  • This also gives the value of finitely repeated
    games
  • But we are interested in infinite games
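A hedged sketch of the LP computation: matrix_game_value solves the
zero-sum matrix game for the maximizing row player with scipy's
linprog. The function name and representation are ours; the one-step
value Ppre(f)(s) is exactly such a matrix-game value.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value and an optimal mixed strategy for the row (maximizing)
    player of the zero-sum matrix game with payoff matrix A."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: x_1..x_m (row mixed strategy) and v (the game value).
    c = np.zeros(m + 1)
    c[-1] = -1.0                                 # maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v <= sum_a A[a,b]*x_a for all b
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])   # sum_a x_a = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1], res.x[:m]

# Matching pennies has value 0 with the uniform strategy:
# matrix_game_value([[1, -1], [-1, 1]])  ->  (0.0, [0.5, 0.5])
```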

29
Reachability
  • Maximal probability of reaching a set U of states
  • Can be reduced to positive stochastic games
  • Characterizing the winning value:
  • X0 = 0,  Xn+1 = max(U, Ppre(Xn)),  X = limn Xn
    (U is identified with its indicator function; a
    value-iteration sketch follows below)
  • Correctness is by induction on the n-step game
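A sketch of this value iteration, reusing the hypothetical
StochasticGame and matrix_game_value from the earlier sketches; in
practice one iterates until the change drops below a tolerance, since
the limit may only be approached from below.

```python
import numpy as np

def ppre(game, f, s):
    """One-step value Ppre(f)(s): the value of the matrix game whose
    (a, b) entry is the expected payoff E[f(Q)] at state s."""
    acts1, acts2 = sorted(game.moves1[s]), sorted(game.moves2[s])
    A = np.array([[sum(p * f[t] for t, p in game.delta[(s, a, b)].items())
                   for b in acts2] for a in acts1])
    value, _ = matrix_game_value(A)
    return value

def reach_value(game, U, tol=1e-9, max_iter=10000):
    """Value iteration X_0 = 0, X_{n+1} = max(1_U, Ppre(X_n))."""
    x = {s: 0.0 for s in game.states}
    for _ in range(max_iter):
        x_new = {s: 1.0 if s in U else ppre(game, x, s)
                 for s in game.states}
        if max(abs(x_new[s] - x[s]) for s in game.states) < tol:
            return x_new
        x = x_new
    return x
```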

30
Reachability Example
[Figure: a reachability example with states S1, S2, S3, S4;
edges are labeled by joint moves 01, 10, 00, 11.]
31
No Optimal Strategy: Example
[Figure: a concurrent game whose edges are labeled by
joint moves 01, 10, 00, 11.]
The probability of winning is 1:
player 1 has a randomized strategy to win with
probability 1-ε for every ε > 0, but no optimal strategy.
32
More Objectives
  • ω-regular objectives
  • [de Alfaro M 01] Every two-player concurrent game
    with ω-regular winning conditions is determined.
  • [de Alfaro M 01] Algorithms to approximate the
    value in 3EXPTIME
  • [Chatterjee M Jurdzinski 04] Algorithms to
    approximate the value of reachability games in
    NP ∩ coNP
  • Borel objectives
  • [Martin 98, Maitra Sudderth 98] Every two-player
    concurrent game with Borel winning conditions is
    determined.
    determined.

33
Reachability Game
[Figure: a three-state reachability game with states
s, t, u; moves a, b.]
Value of reaching u from t: (-3 + 2√5)/5
34
Non Zero Sum Games
  • So far, our games had two players
  • Player 1's goal was Φ
  • Player 2's goal was ¬Φ
  • Strictly competitive!

35
Non Zero Sum Games
  • But systems are not (always) malicious
  • Usually player 1 has a goal Φ1, and player 2 has a
    goal Φ2
  • These goals are not necessarily contradictory
  • Each player is happy to ensure his own goal
  • Such a game is non-zero-sum

36
Simple Example: Ethernet
[Figure: the Ethernet game graph; edges are labeled by
joint moves such as (s,s), (ns,ns), (s,n), (n,s).]
37
History: Non-Zero-Sum Games
  • Every finite n-player game has an equilibrium
    [Nash 50]
  • The complexity of finding a Nash equilibrium is
    open [Pap 94, Pap 01]
  • Discounted stochastic n-player games have a Nash
    equilibrium [Fink 64, Mertens Parthasarathy 86]
  • 2-player nonzero-sum stochastic games with
    limiting average payoff [Vieille 00]
  • Closed sets [Sudderth Secchi 02]
  • Open sets (reachability) [Chatterjee Jurdzinski M 03]
  • (This talk)

38
One Shot Games
  • Games in strategic form
  • Bimatrix games
  • A matrix of payoffs for each player
  • If player 1 plays a and player 2 plays b, then
  • player 1 gets P1[a,b]
  • player 2 gets P2[a,b]

39
Examples
  • Prisoner's Dilemma

Chicken
40
Nash Equilibrium
  • A pair of strategies (σ1, σ2) is an ε-Nash
    equilibrium if
  • for all strategies σ1', σ2':
  • Value2(σ1, σ2') ≤ Value2(σ1, σ2) + ε
  • Value1(σ1', σ2) ≤ Value1(σ1, σ2) + ε
  • Neither player gains more than ε by deviating
    from the equilibrium strategy (a small check is
    sketched below)
  • A 0-Nash equilibrium is called a Nash equilibrium
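As a small illustration of the definition (our own sketch), for a
bimatrix game the ε-Nash condition can be checked by comparing each
player's payoff under (x, y) with the best unilateral deviation, which
can always be taken to be a pure action:

```python
import numpy as np

def is_eps_nash(P1, P2, x, y, eps):
    """Check whether mixed strategies (x, y) form an eps-Nash
    equilibrium of the bimatrix game with payoff matrices P1, P2."""
    P1, P2, x, y = (np.asarray(z, dtype=float) for z in (P1, P2, x, y))
    v1, v2 = x @ P1 @ y, x @ P2 @ y     # payoffs under (x, y)
    best1 = (P1 @ y).max()              # player 1's best deviation against y
    best2 = (x @ P2).max()              # player 2's best deviation against x
    return best1 - v1 <= eps and best2 - v2 <= eps

# Matching pennies: the uniform profile is a (0-)Nash equilibrium.
# is_eps_nash([[1, -1], [-1, 1]], [[-1, 1], [1, -1]],
#             [0.5, 0.5], [0.5, 0.5], eps=0.0)  ->  True
```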

41
Nash's Theorem
  • Theorem: Every bimatrix game has a Nash
    equilibrium in randomized strategies.
  • The proof uses Kakutani's fixpoint theorem

42
Nash's Theorem
  • Theorem: Every bimatrix game has a Nash
    equilibrium in randomized strategies.
  • Idea of proof: define a mapping (the best-response
    correspondence)
  • By Kakutani's fixpoint theorem, this map has a
    fixpoint
  • The fixpoint is a Nash equilibrium point

43
Nash's Theorem
  • Theorem: Every bimatrix game has a Nash
    equilibrium in randomized strategies.
  • This also shows that Nash equilibria exist in
    finitely repeated games

44
Algorithms?
  • The proof is existential.
  • No polynomial-time algorithm for finding Nash
    equilibria is known, even for 2-person games!

45
Reachability Games
  • A non-zero-sum reachability game consists of
  • a concurrent game G
  • two sets of states S1 and S2 of G
  • Player 1's goal is to reach S1
  • Player 2's goal is to reach S2
  • Given strategies σ1 and σ2, Valuei(σ1, σ2) is the
    probability with which the resulting stochastic
    process visits Si

46
Nash Equilibrium in Reachability Games
  • Fundamental question: do ε-Nash equilibria exist
    in nonzero-sum reachability games for every ε > 0?
  • This does not follow from Nash's Theorem!
  • For safety games, the answer is yes
    [Sudderth Secchi 02]
  • In fact, Nash equilibria exist
  • But the reachability case does not follow by
    duality
  • For reachability games, the question was open

47
No Nash Equilibrium Example
[Figure: the same concurrent game, with edges labeled
by joint moves 01, 10, 00, 11.]
Player 1 has a randomized strategy to win with
probability 1-ε for every ε > 0, but no optimal strategy.
48
Main Theorem
  • Theorem [Chatterjee M Jurdzinski 04]: An n-player
    nonzero-sum reachability game has an ε-Nash
    equilibrium in memoryless strategies, for all ε > 0.

49
Idea of proof
  • Define λ-discounted games and show that memoryless
    Nash equilibria exist in such games.
  • Consider a Nash equilibrium in the λ-discounted
    reachability game. This equilibrium can be
    approximated by strategies of a simple form
    (k-uniform).
  • This strategy profile is an ε-Nash equilibrium in
    the original game, for a suitable λ.
  • This is because if we fix the strategy of player
    2, then in the resulting MDP the value is close
    to the discounted value
  • Similarly for player 1

50
Discounted Reachability Games
  • A λ-discounted reachability game is played as
    follows.
  • At each stage, the game stops with probability λ
    and continues with probability 1-λ (a construction
    sketch follows below).
  • Theorem: A λ-discounted reachability game has a
    Nash equilibrium in memoryless strategies.
  • The proof is an application of Kakutani's
    fixpoint theorem
  • This is related to Nash equilibria in discounted
    reward games [Fink 64, Sobel 71]
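Under our earlier StochasticGame representation, the λ-discounted game
can be obtained by redirecting a λ-fraction of every transition to an
absorbing stop state (the names are ours; this is only a sketch of the
construction described above):

```python
def discounted(game, lam, stop_state="STOP"):
    """Return the lam-discounted version of `game`: at every step the
    play halts (moves to an absorbing stop state) with probability lam."""
    delta = {key: {t: (1.0 - lam) * p for t, p in dist.items()}
             for key, dist in game.delta.items()}
    for dist in delta.values():
        dist[stop_state] = dist.get(stop_state, 0.0) + lam
    # The stop state has a single trivial move per player and loops forever.
    delta[(stop_state, "stay", "stay")] = {stop_state: 1.0}
    return StochasticGame(states=game.states | {stop_state},
                          moves1={**game.moves1, stop_state: {"stay"}},
                          moves2={**game.moves2, stop_state: {"stay"}},
                          delta=delta)
```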

51
Approximating Strategies
  • Let J be a game in strategic form with n players
  • Each player has m actions
  • A strategy is k-uniform if it is a uniform
    distribution over a multiset of size k
  • Let σ be a Nash equilibrium profile.
  • [Lipton Markakis Mehta 03] For every ε > 0, for every
  • k > (3 n² ln(n² m)) / ε², there exists a
    k-uniform strategy profile σ' such that for every
    action a,
  • if σ(a) = 0, then σ'(a) = 0;
  • if σ(a) > 0, then |σ(a) - σ'(a)| < ε
    (a small calculation of the bound follows below)
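To get a feel for the bound quoted above, a tiny calculation of the
smallest admissible k (our own helper, following the formula on the
slide directly):

```python
import math

def k_uniform_bound(n, m, eps):
    """Smallest integer k with k > 3 * n^2 * ln(n^2 * m) / eps^2."""
    return math.floor(3 * n * n * math.log(n * n * m) / (eps * eps)) + 1

# For n = 2 players, m = 10 actions each, eps = 0.1:
# 3 * 4 * ln(40) / 0.01 is about 4426.7, so k_uniform_bound(2, 10, 0.1) == 4427
```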

52
Markov Decision Processes
  • A Markov decision process (MDP) is a one-player
    game.
  • Reachability and discounted reachability are
    defined on MDPs by restriction from games.
  • When we fix the strategies of all players other
    than player i, we obtain an MDP Gi.

53
Approximating Equilibria in Discounted Games
  • For an n-player discounted reachability game Gλ,
    for every ε > 0, there exists a memoryless strategy
    profile σ such that
  • σ is an ε-Nash equilibrium profile of Gλ, and
  • for every player i, the minimum transition
    probability in the MDP Gi is at least f(ε, n, G).

54
Approximating MDPs
  • Let G be an MDP with a reachability objective
  • [Condon 90] For all ε > 0 there exists a discount
    factor λ such that for all states s ∈ S of the
    λ-discounted game Gλ we have
  • |v(s) - vλ(s)| < ε

55
Complexity
  • We can approximate an ε-Nash equilibrium to within
    ε', for constant ε and ε', in NP
  • Guess the memoryless (k-uniform) strategy
    profiles
  • Solve the MDPs obtained after fixing all but one
    player's strategy
  • Payoffs can be irrational, so we can only hope to
    approximate
    approximate

56
More Objectives
  • Fundamental open question: is there a nonzero-sum
    version of Martin's Theorem for concurrent games?
  • We don't know, even for
  • mixed safety and reachability objectives
  • These are likely to be hard problems

57
Turn Based Games
  • Theorem [Chatterjee M Jurdzinski 04]:
  • n-player turn-based probabilistic games with
    Borel payoffs have ε-Nash equilibria in
    deterministic strategies.
  • n-player turn-based deterministic games with
    Borel payoffs have Nash equilibria in
    deterministic strategies.

58
Trick with Deterministic Strategies
  • For an n-player game where player i has objective
    Φi,
  • consider the zero-sum game of player i with
    objective Φi against all other players with
    objective ¬Φi
  • Suppose this zero-sum game has a deterministic
    winning strategy σi for player i and a deterministic
    strategy τi for the coalition of the others
  • Nash equilibrium:
  • Every player i plays σi from above.
  • As soon as some player j deviates, all the other
    players punish by switching to τj against j
  • Deterministic strategies are necessary to observe
    deviations
  • Folk result? [Thuijsman Raghavan 97].

59
Turn Based Games
  • A careful study of Martin's determinacy proof
    shows that we can construct ε-optimal
    deterministic strategies for turn-based
    probabilistic games,
  • and optimal pure strategies for deterministic
    turn-based games

60
Las Vegas Game
[Figure: the Las Vegas game. Each day you can Work or
Go to Vegas; in Vegas, with probability 1/2 you hit the
Jackpot and otherwise you lose ("Sorry, you lose"),
after which you may Play again.]
61
Las Vegas Game
  • For every ε > 0, the Las Vegas game has a
    (1-ε)-optimal winning strategy
  • For ε = 1/2^n, work for n days before heading to
    Vegas
  • But there is no optimal winning strategy
  • The winning condition is not ω-regular:
  • the number of times you are allowed to play is the
    number of days you have worked

62
ω-Regular?
  • The Las Vegas game is not ω-regular
  • For ω-regular games, optimal deterministic
    winning strategies exist [Chatterjee Jurdzinski
    Henzinger 04]
  • Thus, turn-based nonzero-sum games with ω-regular
    objectives have pure Nash equilibria.
  • For parity conditions, we can compute the value
    profile of some Nash equilibrium in NP

63
Credits
  • Work done in collaboration with:
  • Luca de Alfaro. Quantitative solution of
    concurrent games, STOC 01
  • Krishnendu Chatterjee and Marcin Jurdzinski. On
    Nash equilibria in stochastic games, CSL 04

64
Thank You!
  • http://www.cs.ucla.edu/rupak