Pre-Bayesian Games - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Pre-Bayesian Games

1
Pre-Bayesian Games
  • Moshe Tennenholtz
  • Technion - Israel Institute of Technology

2
Acknowledgements
  • Based on joint work with Itai Ashlagi, Ronen
    Brafman and Dov Monderer.

3
GT with CS flavor
  • Program equilibrium / strong mediated equilibrium
  • Ranking systems
  • Non-cooperative computing
  • Pre-Bayesian games
  • Distributed Games
  • Recommender systems for GT

4
Modeling Uncertainty
  • In game theory and economics the Bayesian
    approach is mainly used.
  • Work in computer science frequently uses
    non-probabilistic models.
  • Work on Pre-Bayesian games incorporates
    game-theoretic reasoning into non-Bayesian
    decision-making settings.

5
Pre-Bayesian Games
  • Modeling and solution concepts in Pre-Bayesian
    games.
  • Applications: congestion games with incomplete
    information.
  • Pre-Bayesian repeated/stochastic games as a
    framework for multi-agent learning.

6
Games with Incomplete Information

7
Model
8
Model (cont.)
9
Flexibility of the Model

10
  • Solution Concepts in Pre-Bayesian Games

11
Dominant Strategies

12
Ex-Post Equilibrium

13
Safety-Level Equilibrium
  • For every type, play a strategy that maximizes
    the worst-case payoff given the other players'
    strategies (formalized below).
  • The worst case is taken over the set of possible states!
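
In symbols, a possible rendering of this concept (our notation, kept consistent with the definitions later in the talk, not the slide's own formula):

```latex
% Safety-level (worst-case) best response of player i with type t_i,
% given the other players' type-contingent strategies s_{-i}:
s_i(t_i) \in \arg\max_{\sigma_i \in \Delta(S_i)} \;
    \min_{t_{-i}} \; u_i\big(t_i, t_{-i}, \sigma_i, s_{-i}(t_{-i})\big)
% A profile (s_1, \dots, s_n) is a safety-level equilibrium if this holds
% for every player i and every type t_i.
```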

14
Safety-Level Equilibrium

(Slide figure: a worked matrix-game example with mixed-strategy
probabilities w, 1-w, z, 1-z, p, 1-p, and q, 1-q.)
15
Safety-Level Equilibrium (cont.)
16
Other Non-Bayesian Solution Concepts
  • Minimax-Regret equilibrium (Hyafil and Boutilier
    2004)
  • Competitive-Ratio equilibrium

17
Existence in Mixed Strategies
  • Theorem: Safety-level, minimax-regret, and
    competitive-ratio equilibria exist in every
    concave pre-Bayesian game.
  • A concave pre-Bayesian game:
  • - for every type, the set of possible actions is
      compact and convex (for every player)
  • - u_i(·, ·) is a concave function for every player i
  • The proof follows by applying Kakutani's fixed-point
    theorem (a sketch of the argument follows below).
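
A brief sketch of the standard fixed-point argument behind this theorem, in our notation (the slide itself gives no further detail):

```latex
% Safety-level best-response correspondence of player i:
B_i(s_{-i})(t_i) \;=\; \arg\max_{\sigma_i \in A_i(t_i)} \;
    \min_{t_{-i}} \; u_i\big(t_i, t_{-i}, \sigma_i, s_{-i}(t_{-i})\big)
% With A_i(t_i) compact and convex and u_i concave in \sigma_i, the inner
% minimum is a concave function of \sigma_i, so B_i is nonempty- and
% convex-valued and (via Berge's theorem) has a closed graph.  Kakutani's
% fixed-point theorem applied to B = B_1 \times \dots \times B_n yields a
% safety-level equilibrium; a similar argument covers the minimax-regret
% and competitive-ratio objectives.
```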

18
Related Work On Non-Bayesian Solutions
  • Safety-level equilibria:
    Aghassi and Bertsimas (2004),
    Levin and Ozdenoren (2004)
  • Pure safety-level equilibria:
    Shoham and Tennenholtz (1992), Moses and
    Tennenholtz (1992), Tennenholtz (1991)
  • Axiomatic foundations:
    Brafman and Tennenholtz (1996)

19
Beyond Existence
  • The main goal: analysis!

20
Modeling Congestion Settings
21
Modeling Congestion Settings
  • Examples:
  • Transportation engineering (Wardrop 1952, Beckmann
    et al. 1956)
  • Congestion games (Rosenthal 1973)
  • Potential games (Monderer and Shapley 1996)
  • Price of anarchy (Papadimitriou 1999, Tardos and
    Roughgarden 2001)
  • Resource selection games with player-specific
    cost functions (Milchtaich 1996)
  • Local effect games (Leyton-Brown and Tennenholtz
    2003)
  • ...

22
Where are we heading?
  • Our goal: incorporate incomplete information into
    congestion settings.
  • Types of uncertainty:
  • number of players
  • job sizes
  • network structure
  • players' cost functions

23
Resource Selection Games with Unknown Number of
Players
24
Resource Selection Games
25
Symmetric Equilibrium
  • Theorem:
  • Every resource selection game with
    increasing resource cost functions has a unique
    symmetric equilibrium.

26
Resource Selection Games with Unknown Number of
Players
27
Uniqueness of the Symmetric Safety-Level Equilibrium
  •   - the game with complete information
  •   - the game with incomplete information
  • Theorem:
  • Let Γ be a resource selection system with
    increasing resource cost functions. Then
    has a unique symmetric safety-level
    equilibrium.
  • The symmetric safety-level equilibrium profile is
      .
  •   is the unique symmetric equilibrium in
    the game   .

28
Is Ignorance Bad?
  • The real state: K = k, with k < n
  • Known number of players:
      cost of every player
  • Unknown number of players:
      - cost of every player
  • Which is lower?

29
Is Ignorance Bad?
w_j(k) = w_j(1) + (k-1)·d_j
Main Theorem: Let Γ be a linear resource selection
system with increasing resource cost functions.
There exists an integer     such that for all     :
1.
2.
All inequalities above are strict if and only if
there exists     such that     .
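
As a small illustration of the linear cost structure w_j(k) = w_j(1) + (k-1)d_j above, the sketch below computes a player's expected cost on each resource when each of the other k-1 players independently selects resource j with probability p_j (a symmetric mixed profile); for linear costs the expectation is simply w_j(1) + d_j(k-1)p_j. The function and parameter names are ours, not the presentation's.

```python
def expected_costs(w1, d, k, p):
    """Expected cost of one player on each resource in a linear
    resource selection game.

    w1[j]: cost of resource j with a single user, d[j]: linear increment,
    k: total number of players, p[j]: probability that each of the other
    k-1 players picks resource j (a symmetric mixed profile).
    For linear costs, E[w_j(1 + Bin(k-1, p_j))] = w1[j] + d[j]*(k-1)*p[j].
    """
    return [w1[j] + d[j] * (k - 1) * p[j] for j in range(len(w1))]

# Example: two identical resources, uniform profile, 5 players.
print(expected_costs([1.0, 1.0], [0.5, 0.5], 5, [0.5, 0.5]))  # [2.0, 2.0]
```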
30
Where is this Useful?
  • Example: Mechanism design.
  • The organizer knows the exact number of active
    players.
  • Wishing to maximize social surplus, the organizer
    will not reveal this information.

31
More Detailed Analysis
Theorem: Let Γ be a linear resource selection
system with increasing resource cost functions.
There exists an integer L such that for every
k > L: the minimal social cost in     attained
with symmetric mixed-action profiles is
attained at     . Consequently,     is
minimized at n = 2k-1.
32
Further Research
  • Routing games with unknown number of players:
      - extension to general networks
      - a unique symmetric equilibrium exists in a model
        where an agent's job can be split
      - ignorance helps as long as n < k²
  • Routing games with unknown job sizes:
      - extension to variable job sizes
      - uncertainty about job sizes does not change the
        surplus in several general settings
      - ignorance helps when there is uncertainty about
        both the number of participants and the job sizes
  • Minimax-regret equilibria in the different
    congestion settings
  • Non-Bayesian equilibria in social choice settings

33
Conclusions so far
  • Non-Bayesian equilibria exist in pre-Bayesian
    games.
  • Players are better off with common lack of
    knowledge about the number of participants.
  • More generally, we show illuminating results
    using non-Bayesian solution concepts in
    pre-Bayesian games.

34
  • Non-Bayesian solutions for repeated (and
    stochastic) games with incomplete information:
    efficient learning equilibrium.

35
Learning in multi-agent systems
  • Multi-Agent Learning lies in the intersection of
    Machine Learning/Artificial Intelligence and Game
    Theory
  • Basic settings:
  • A repeated game where the game (payoff
    functions) is initially unknown, but may be
    learned based on observed history.
  • A stochastic game where both the stage games
    and the transition probabilities are initially
    unknown.
  • What can be observed following an action is part
    of the problem specification.
  • No Bayesian assumptions!

36
The objective of learning in multi-agent systems
  • Descriptive objective: how do people behave and adapt
    their behavior in (e.g. repeated) games?
  • Normative objective: can we provide the agents
    with advice about how they should behave that will
    be followed by rational agents and will also
    lead to a good social outcome?

37
Learning in games: an existing perspective
  • Most work on learning in games (in machine
    learning/AI, extending work in game theory)
    deals with the search for learning algorithms
    that, if adopted by all agents, will lead to
    equilibrium.
  • (Another approach, regret minimization, will be
    discussed and compared to later.)

38
Re-Considering Learning in Games
  • But why should the agents adopt these learning
    algorithms?
  • This seems to contradict the whole idea of
    self-motivated agents (which led to considering
    equilibrium concepts).

39
Re-Considering Learning in Games
  • (New) Normative answer: The learning algorithms
    themselves should be in equilibrium!
  • We call this form of equilibrium Learning
    Equilibrium, and in particular we consider
    Efficient Learning Equilibrium (ELE).
  • Remark: In this talk we refer to optimal ELE
    (extending the basic ELE we introduced) but
    use the term ELE.

40
Efficient Learning Equilibrium: Informal
Definition
  • The learning algorithms themselves are in
    equilibrium. It is irrational for an agent to
    deviate from its algorithm assuming that the
    others stick to their algorithms, regardless of
    the nature of the (actual) game that is being
    played.
  • If the agents follow the provided learning
    algorithms then they will obtain a value that is
    close to the value obtained in an optimal (or
    Pareto-optimal) Nash equilibrium (of the actual
    game) after polynomially many iterations.
  • It is irrational to deviate from the learning
    algorithm. Moreover, the irrationality of
    deviation is manifested within a polynomial
    number of iterations.

41
  • Efficient Learning Equilibrium is a form of
    ex-post equilibrium in Pre-Bayesian repeated games

42
Basic Definitions
  • Game G = <N = {1,…,n}, S_1,…,S_n, U_1,…,U_n>
  • U_i: S_1 × … × S_n → R is the utility function
    of player i
  • Δ(S_i): the mixed strategies of i.
  • A tuple of (mixed) strategies t = (t_1,…,t_n) is a
    Nash equilibrium if
    for every i ∈ N, U_i(t) ≥ U_i(t_1,…,t_{i-1}, t', t_{i+1},…,t_n)
    for every t' ∈ S_i
  • An optimal Nash equilibrium maximizes the social
    surplus (the sum of the agents' payoffs).
  • val(t, i, g): the minimal expected payoff that
    may be obtained by i when employing t in the game g.
  • A strategy t ∈ Δ(S_i) for which val(·, i, g) is
    maximized is a safety-level strategy (or probabilistic
    maximin strategy), and its value is the
    safety-level value (a computational sketch follows below).
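
The safety-level (probabilistic maximin) strategy and value defined above can be computed by a standard linear program; the following is a minimal sketch for a finite two-player game, assuming the payoff matrix U of player i is given and using scipy. The function name and setup are ours, not the presentation's.

```python
import numpy as np
from scipy.optimize import linprog

def safety_level_strategy(U):
    """Safety-level (probabilistic maximin) mixed strategy of the row player.

    U[a, b] is player i's payoff when playing row a against column b.
    We maximize v subject to: for every column b, sum_a x[a]*U[a, b] >= v,
    with x a probability vector over the rows.
    """
    k, m = U.shape
    c = np.zeros(k + 1)
    c[-1] = -1.0                                   # linprog minimizes, so use -v
    A_ub = np.hstack([-U.T, np.ones((m, 1))])      # v - sum_a x[a]*U[a,b] <= 0
    b_ub = np.zeros(m)
    A_eq = np.hstack([np.ones((1, k)), np.zeros((1, 1))])  # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * k + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:k], res.x[-1]                    # (mixed strategy, safety-level value)

# Matching pennies: the uniform mix guarantees the safety-level value 0.
print(safety_level_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]])))
```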

43
Basic Definitions
  • R(G): the repeated game with respect to a
    (one-shot) game G.
  • History of player i after t iterations of R(G):
  • Perfect monitoring: H_t^i = ((a_1^j,…,a_n^j), (p_1^j,…,p_n^j))_{j=1,…,t}
    - a player can observe all previously chosen
      actions and payoffs.
  • Imperfect monitoring: H_t^i = ((a_1^j,…,a_n^j), p_i^j)_{j=1,…,t}
    - a player can observe all previously chosen
      actions (of all players) and only its own payoffs.
  • Strictly imperfect monitoring: H_t^i = (a_i^j, p_i^j)_{j=1,…,t}
    - a player can observe only its own
      actions and payoffs.
  • Possible histories for agent i: H^i = ∪_{t≥1} H_t^i
    (together with the empty history).
  • Policy for agent i: π: H^i → Δ(S_i)
  • Remark: in the game-theory literature the term
    "perfect monitoring" is used for what is called
    imperfect monitoring above.

44
Basic Definitions
  • Let G be a (one-shot) game, let M = R(G) be the
    corresponding repeated game, and let n(G) be an
    optimal Nash equilibrium of G. Denote the
    expected payoff of agent i in that equilibrium by
    NV_i(n(G)).
  • Given M = R(G) and a natural number T, we denote
    the expected T-step undiscounted average reward
    of player i when the players follow the policy
    profile (π_1,…,π_n) by U_i(M, π_1,…,π_n, T).
  • U_i(M, π_1,…,π_n) = liminf_{T→∞} U_i(M, π_1,…,π_n, T)

45
Definition: (Optimal) ELE (in a 2-person repeated
game)

(σ, τ) is an efficient learning equilibrium with
respect to the class of games Γ (where each
one-shot game has k actions) if for every ε > 0,
0 < δ < 1, there exists some T > 0, where T is
polynomial in 1/ε, 1/δ, and k, such that with
probability of at least 1-δ: (1) If player
1 (resp. 2) deviates from σ to σ' (resp. from τ
to τ') in iteration l, then U1(M, (σ', τ), l+t) ≤
U1(M, (σ, τ), l+t) + ε (resp. U2(M, (σ, τ'), l+t) ≤
U2(M, (σ, τ), l+t) + ε) for every t ≥ T and for
every repeated game M = R(G) ∈ Γ. (2) For every t
≥ T and for every repeated game M = R(G) ∈ Γ,
U1(M, (σ, τ), t) + U2(M, (σ, τ), t) ≥
NV1(n(G)) + NV2(n(G)) - ε for an optimal (surplus
maximizing) Nash equilibrium n(G).
46
The Existence of ELE

Theorem: Let M be a class of repeated games.
Then, there exists an ELE w.r.t. M given perfect
monitoring. The proof is constructive and uses
ideas from our R-max algorithm (the first
near-optimal polynomial-time algorithm for
reinforcement learning in stochastic games) and
the folk theorem in economics.
47
The ELE algorithm
  • For ease of presentation assume that the payoff
    functions are non-negative and bounded by Rmax.
  • Player 1 performs action a_i k times in a row,
    for each i = 1,2,...,k.
  • In parallel, player 2 performs the sequence of
    actions (a_1,…,a_k) k times.
  • If both players behaved according to the above,
    then an optimal Nash equilibrium of the
    corresponding (revealed) game is computed, and
    the players behave according to the corresponding
    strategies from that point on. If several such
    Nash equilibria exist, one is selected based on a
    pre-determined arrangement.
  • If one of the players deviated from the above,
    we call this player the adversary and the
    other player the agent, and do the following:
  • Let G be the Rmax-sum game in which the
    adversary's payoff is identical to his payoff in
    the original game, and the agent's payoff
    is Rmax minus the adversary's payoff. Let M
    denote the corresponding repeated game. Thus, G
    is a constant-sum game in which the agent's goal
    is to minimize the adversary's payoff. Notice
    that some of these payoffs will be unknown
    (because the adversary did not cooperate in the
    exploration phase). The agent now plays according
    to the following:

48
The ELE algorithm (cont.)
  • Initialize: Construct the following model M' of
    the repeated game M, where the game G is replaced
    by a game G' in which all the entries of the game
    matrix are assigned the rewards (Rmax, 0) (we
    assume w.l.o.g. positive payoffs, and also assume
    the maximal possible reward Rmax is known).
  • We associate a boolean-valued variable, assumed/known,
    with each joint action. This variable is
    initialized to the value assumed.
  • Repeat:
  • Compute and Act: Compute the optimal
    probabilistic maximin of G' and execute it.
  • Observe and update: Following each joint action,
    do as follows:
      Let a be the action the agent performed and
      let a' be the adversary's action.
      If (a, a') is performed for the first
      time, update the reward associated with
      (a, a') in G', as observed, and mark it
      known.
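
A minimal sketch of the punishment-phase bookkeeping just described, assuming a k-by-k stage game with payoffs bounded by Rmax; `maximin_fn` and `play_round` are hypothetical callables (a wrapper around the LP sketch given earlier could serve as `maximin_fn`), and all names are ours rather than the presentation's.

```python
import numpy as np

def punishment_phase(k, r_max, maximin_fn, play_round, total_rounds):
    """Sketch of the punishment phase against a deviating adversary.

    g_prime is the agent's optimistic model of the Rmax-sum game: entries
    not yet observed are assumed to give the agent r_max.  maximin_fn(matrix)
    is assumed to return a probabilistic-maximin mixed strategy for the row
    player; play_round(mixed) is assumed to play one round and return
    (agent_action, adversary_action, agent_payoff_in_rmax_sum_game).
    """
    g_prime = np.full((k, k), float(r_max))   # optimistic 'assumed' entries
    known = np.zeros((k, k), dtype=bool)
    for _ in range(total_rounds):
        mixed = maximin_fn(g_prime)           # Compute and Act
        a, b, payoff = play_round(mixed)      # Observe ...
        if not known[a, b]:                   # ... and update (first time only)
            g_prime[a, b] = payoff
            known[a, b] = True
    return g_prime, known
```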

49
Imperfect Monitoring

Theorem: There exist classes of games for which
an ELE does not exist given imperfect
monitoring. The proof is based on showing that
one cannot attain the values obtained in the Nash
equilibria of the following games when one does
not know initially which game is being played
and cannot observe the other agent's payoffs:
50
The Existence of ELE for Imperfect Monitoring
Settings

Theorem: Let M be a class of repeated symmetric
games. Then, there exists an ELE w.r.t. M given
imperfect monitoring.
51
The Existence of ELE for Imperfect Monitoring
Settings Proof Idea

Agents are instructed to explore the game matrix.
If this has been done without deviations,
action profiles (s,t) and (t,s) with optimal
surplus are selected to be played indefinitely,
with (s,t) played in odd iterations and (t,s)
played in even iterations. If there has
been a deviation, then we remain with the problem
of effective and efficient punishment. Notice
that here an agent does not learn the other
agent's payoff in an entry once it is played!
52
The Existence of ELE for Imperfect Monitoring
Settings Proof Idea (cont.)

Assume the row agent is about to punish the
column agent. We say that the column associated
with action s is known if the row agent knows its
payoff for every pair (t,s). Notice that at each
point the square sub-matrix corresponding to the
actions associated with known columns has the
property that the row agent knows all payoffs of
both agents in it. With some small probability
the row agent plays a random action, and
otherwise it plays the probabilistic maximin of
the above (known) square sub-matrix, where its
payoffs are taken to be the complement to 0 of
the column agent's payoffs (i.e., their negation).
Many details and computations are omitted.
53
Extensions
  • The results are extended to n-person games and
    stochastic games, providing a general solution to
    the normative problem of multi-agent learning.

54
ELE and Efficiency

Our results for symmetric games imply that we can
get the optimal social surplus as a result of the
learning process, where the learning algorithms
are in equilibrium! This is impossible in
general games without having side payments as
part of the policies, which leads to another
version of the ELE concept.
55
Pareto ELE

Given a 2-person game G, a pair (a,b) of
strategies is (economically) efficient if
U1(a,b) + U2(a,b) = max_{s ∈ S1, t ∈ S2} (U1(s,t) + U2(s,t)).
Obtaining economically efficient outcomes is in
general impossible without side payments (the
probabilistic maximin value for i may be higher
than what he gets in the economically efficient
tuple); an illustrative example follows below.
Side payments: an agent may be asked
to pay the other as part of its policy. If its
payoff at a particular point is p_i and the agent
pays c_i, then the actual payoff/utility is p_i - c_i.
Pareto ELE is defined similarly to (Nash) ELE
with the following distinctions: 1. The
agents should obtain an average total reward
close to the sum of their rewards in an
efficient outcome. 2. Side payments are
allowed as part of the agents' policies.
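
A small illustrative game for the parenthetical claim above (our example, not from the presentation): player 1's maximin value exceeds its payoff in the surplus-maximizing profile, so without side payments it has no incentive to settle there.

```latex
% Row player is player 1; payoffs written as (U_1, U_2).
\begin{array}{c|cc}
      & L        & R        \\ \hline
  T   & (1,\,0)  & (1,\,0)  \\
  B   & (0,\,10) & (0,\,10)
\end{array}
% The surplus-maximizing profiles are (B, L) and (B, R), where U_1 = 0,
% yet player 1 can guarantee 1 by playing T: its maximin value (1) exceeds
% its payoff (0) in any efficient profile, so a transfer (side payment) is
% needed to make the efficient outcome acceptable.
```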
56
Pareto ELE

Theorem: Let M be a class of repeated games.
Then, there exists a Pareto ELE w.r.t. M given
perfect monitoring. Theorem: There exist classes
of games for which a Pareto ELE does not exist
given imperfect monitoring.
57
Common Interest Games

A game is called a common-interest game if for
every joint action all agents receive the same
reward. Theorem: Let Mc be the class of
common-interest repeated games in which the
number of actions each agent has is a. There
exists an ELE for Mc under strictly imperfect
monitoring. The above result is obtained for the
general case where there are no a-priori
conventions on the agents' ordering or on the
ordering of strategies.
58
Efficient Learning Equilibrium and Regret
Minimization
  • The literature on regret minimization attempts to
    find a best response to arbitrary action
    sequences of an opponent.
  • Notice that in general an agent cannot devise a
    best response against an adversary whose action
    selection depends on the agent's previous
    actions.
  • In such situations it is hard to avoid
    equilibrium concepts.
  • Efficient Learning Equilibrium requires that
    deviations be irrational for any
    game from a given set of games, and therefore has
    the flavor of ex-post equilibrium.

59
Stochastic Game
(Slide figure: a stochastic game diagram with agent actions a1, a2,
adversary actions a1, a2, and transition probabilities 1, 0.3, 0.4,
0.5, 0.6, 0.7.)
60
SGs Are An Expressive Model
  • SGs are more general than Markov decision
    processes and repeated games:
  • Markov decision process: the adversary has a
    single action
  • Repeated game: a unique stage game

61
Extending ELE to stochastic games

Let M be a stochastic game and let ε > 0, 0 < δ
< 1. Let v_i(M, ε) be the ε-return mixing time of
a probabilistic maximin (safety-level) strategy
for agent i. Consider a stochastic game M_i which
is identical to M except that the payoffs of
player i are taken as the complement to Rmax of
the other player's payoffs. Let v_i'(M_i, ε) be
the ε-return mixing time of an optimal policy
(safety-level strategy) of i in that game.
Consider also the game M', where M' is a Markov
decision process which is isomorphic to M, but
where the (single) player's reward for the action
(a,b) in state s is the sum of the players'
rewards in M. Let Opt(M') be the value of an
optimal policy in M'. Let v_c(M, ε) be the
ε-return mixing time of that optimal policy (in
M'). Let v(M, ε) = max(v_1(M, ε), v_1'(M_1, ε),
v_2(M, ε), v_2'(M_2, ε), v_c(M, ε)).
62
Extending ELE to stochastic games

A policy profile (σ, τ) is a Pareto efficient
learning equilibrium w.r.t. the class M of
stochastic games if for every ε > 0, 0 < δ < 1,
and M ∈ M, there exists some T > 0, where T
is polynomial in 1/ε, 1/δ, the size of M, and
v(M, ε), such that with probability of at least
1-δ: (1) for every t ≥ T, U1(M, σ, τ, t) +
U2(M, σ, τ, t) ≥ (1-ε)·Opt(M') - ε; (2)
if player 1 (resp. 2) deviates from σ to σ' (resp.
from τ to τ') in iteration l, then
U1(M, σ', τ, l+t) ≤ U1(M, σ, τ, l+t) + ε (resp.
U2(M, σ, τ', l+t) ≤ U2(M, σ, τ, l+t) + ε). Theorem:
Given a perfect monitoring setting for stochastic
games, there always exists a Pareto ELE.
63
The R-max Algorithm
  • R-max is the first near-optimal efficient
    reinforcement learning algorithm for stochastic
    games. In particular, it is applicable to
    (efficiently) obtaining the safety-level value in
    stochastic games where the stage games and
    transition probabilities are initially unknown
  • Therefore, when adopted by all agents, R-max
    determines an ELE in zero-sum stochastic games.
  • Efficiency is measured as a function of the
    mixing time of the optimal policy in the known
    model.

64
The R-max Algorithm
  • A model-based learning algorithm utilizing an
    optimistic, fictitious model.
  • Model initialization:
  • States: the original states plus one fictitious state
  • All game-matrix entries are marked unknown
  • All joint actions lead to the fictitious state
    with probability 1
  • The agent's payoff is Rmax everywhere (the
    adversary's payoff plays no role; 0 is fine)

65
Initial Model
(Slide figure: the fictitious stage game, with all entries marked
unknown, is reached with probability 1; the real stage games are
shown alongside.)
66
The Algorithm (cont.)
  • Repeat
  • Compute optimal policy
  • Execute current policy
  • Update model

67
Model Update
  • Occurs after we play a joint action corresponding
    to an unknown entry:
  • Record the payoff in the matrix (once only)
  • Record the observed transition
  • Once enough transitions from this entry are
    recorded:
  • Update the transition model based on the observed
    frequencies
  • Mark the entry as known
  • Recompute the policy

68
The Algorithm (cont.)
  • Repeat:
  • Compute an optimal T-step policy
  • Execute the current policy
  • Update the model: an entry is known when it has
    been visited enough times (a sketch of this
    bookkeeping follows below).
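
A minimal sketch of the optimistic-model bookkeeping from the last few slides, restricted for brevity to the repeated-game case (a single stage game, so the transition model is omitted); the class name, field names, and the `visits_needed` parameter are ours, not the original algorithm's constants.

```python
import numpy as np

class RmaxModel:
    """Optimistic model of an unknown k x k stage game (repeated-game case).

    Entries that have not been observed often enough are marked unknown and
    optimistically assumed to pay the agent r_max, which drives the agent
    either to exploit or to (implicitly) explore them.
    """

    def __init__(self, k, r_max, visits_needed=1):
        self.rewards = np.full((k, k), float(r_max))   # optimistic initialization
        self.known = np.zeros((k, k), dtype=bool)
        self.visits = np.zeros((k, k), dtype=int)
        self.visits_needed = visits_needed             # hypothetical threshold

    def update(self, a, b, reward):
        """Record the outcome of joint action (a, b); return True if the
        entry just became known (so the policy should be recomputed)."""
        if self.known[a, b]:
            return False
        self.visits[a, b] += 1
        self.rewards[a, b] = reward    # deterministic rewards: once is enough
        if self.visits[a, b] >= self.visits_needed:
            self.known[a, b] = True
            return True
        return False
```

Each time `update` returns True, the agent would recompute its T-step optimal (or probabilistic maximin) policy on the current optimistic model, as in the Repeat loop above.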

69
Main Theorem
  • Let M be an SG with N states and k actions. Let
    ε > 0 and 0 < δ < 1 be constants denoting desired
    error bounds. Denote the policies for M whose
    ε-return mixing time is T by Π_M(ε,T), and the
    optimal expected return achievable by such policies
    by Opt_M(Π(ε,T)) (i.e., the best value of a policy
    that ε-mixes in time T).

70
Main Theorem (cont.)
  • Then, with probability no less than 1-δ, the
    R-max algorithm will attain an actual average
    return of no less than Opt_M(Π(ε,T)) - ε within a
    number of steps polynomial in
    N, T, k, 1/δ, 1/ε.

71
Main Technical Contribution: Implicit Explore or
Exploit (IEE)
  • R-max either explores efficiently or exploits
    efficiently
  • The adversary can influence whether we exploit
    efficiently or explore
  • But, it cannot prevent us from doing one of the
    two

72
Conclusion (ELE)
  • ELE captures the requirement that the learning
    algorithms themselves should be in equilibrium.
  • Somewhat surprisingly, (optimal) ELE exists for
    large classes of games. The proofs are
    constructive.
  • ELE can be viewed as ex-post equilibrium in
    repeated pre-Bayesian games with (initial) strict
    uncertainty about payoffs.
  • The results can be extended to stochastic games
    (more complicated, and need to refer to mixing
    time of policies in the definition of
    efficiency).

73
Conclusion
  • Pre-Bayesian Games are a natural setting for the
    study of multi-agent interactions with incomplete
    information, where there is no exact
    probabilistic information about the environment.
  • Natural solution concepts such as ex-post
    equilibrium can be extended to non-Bayesian
    equilibria (such as safety-level equilibria),
    which always exist.
  • The study of non-Bayesian equilibrium leads to
    illuminating results in areas connecting CS and
    GT.

74
Conclusion (cont.)
  • There are tight connections between pre-Bayesian
    repeated games and multi-agent learning.
  • Equilibrium of learning algorithms can be shown
    to exist in rich settings. ELE is a notion of
    ex-post equilibrium in Pre-Bayesian repeated
    games.
  • The study of Pre-Bayesian games is a rich,
    attractive, and illuminating research direction!

75
Our research agenda: GT with CS flavor
  • Program equilibrium
  • Ranking systems
  • Non-cooperative computing
  • Pre-Bayesian games
  • Distributed Games
  • Recommender systems for GT

76
GT with CS flavor: re-visiting equilibrium
analysis
  • Program equilibrium
  • CS brings the idea that strategies can be
    of low capability (resource bounds), but also of
    high capability: programs can serve both as data
    and as a set of instructions. This makes it
    possible to obtain phenomena observed in repeated
    games in the context of one-shot games.

77
GT with CS flavor: re-visiting social choice
  • Ranking systems
  • The Internet suggests the need to
    extend the theory of social choice to a context
    where the set of players and the set of
    alternatives coincide and transitive effects are
    taken into account. This makes it possible to
    treat the foundations of page ranking systems and
    of reputation systems (e.g. an axiomatization of
    Google's PageRank).

78
GT with CS flavor: re-visiting mechanism design
  • Non-cooperative computing
  • Informational mechanism design, where
    goals are informational states and agents'
    payoffs are determined by informational states,
    is essential in order to deal with distributed
    computing with selfish participants. It makes it
    possible to answer the question of which functions
    can be jointly computed by self-motivated
    participants.

79
GT with CS flavor: action prediction in one-shot
games
  • Recommender systems for GT
  • Find correlations between agents' behaviors
    in different games, in order to try to predict
    an agent's behavior in a game (one he has not
    played yet) based on his behavior in other games.
    This is a useful technique when, e.g., selling
    books on Amazon, and here it is suggested for
    action prediction in games, with surprisingly
    good initial success (an experimental study).

80
GT with CS flavor: incorporating distributed-
systems features into game-theoretic models
  • Distributed Games
  • The effects of asynchronous interactions
  • The effects of message syntax and the
    communication structure on implementation
  • The effects of failures.

81
GT with CS flavor: revisiting uncertainty in
games and learning
  • Pre-Bayesian games
  • This talk.