Title: Cooperative Agent Systems: Artificial Agents Play the Ultimatum Game
1. Cooperative Agent Systems: Artificial Agents Play the Ultimatum Game
- Steven O. Kimbrough
- Presented at FMEC 2001, Oslo
- Joint work with Fang Zhong and D.J. Wu
2. Research Motivation
- How to design and control cooperative agent systems in strategic situations.
- How well do different identity-centric agents perform against each other?
- How well do various adaptive mechanisms perform?
- Value of intelligence: what does intelligence buy you?
3. Methodology
- Adaptive artificial agents play the iterated ultimatum game.
- The ultimatum game is the most fundamental building block for negotiation (e.g., Croson, 1996).
- Reinforcement learning (a simple version).
- Regimes of play:
  - Two agents play against each other.
  - Populations of different types of agents.
4. One-Shot Ultimatum Game
- Two players, A and B.
- Player A has an endowment of N.
- Player A offers x ∈ [0, N] (N = 100 in this study).
- Player B can either accept or reject the offer.
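The payoff rule above can be sketched in a few lines. This is a minimal illustration, not code from the study; the function and parameter names are my own.

```python
# One-shot ultimatum game: A offers x from an endowment N; B accepts or
# rejects. On acceptance A keeps N - x and B gets x; on rejection both
# get nothing.

def play_ultimatum(offer: int, accept: bool, endowment: int = 100):
    """Return the (proposer, responder) payoffs for one round."""
    if not 0 <= offer <= endowment:
        raise ValueError("offer must lie in [0, endowment]")
    if accept:
        return endowment - offer, offer
    return 0, 0

# Example: A offers 40 out of 100 and B accepts.
print(play_ultimatum(40, True))  # (60, 40)
```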
5. One-Shot Ultimatum Game (Cont.)
- Classical game theory:
  - Player A offers a tiny amount ε, and player B will always accept this offer.
  - Infinite number of Nash equilibria along the line x + y = N.
- Behavioral game theory:
  - Human beings in the lab do not behave as classical game theory predicts (e.g., people tend to be fair, and reject offers that do not meet their threshold share).
6. Repeated Ultimatum Game
- A supergame consists of iterations of the ultimatum game.
- Indefinite episodes: agents do not know how many iterations are yet to come.
- There is no single best strategy for the repeated ultimatum game.
7. Reinforcement Learning
- Favor actions producing better results.
- Estimate the values of state–action pairs.
- Sample-average method for estimation/evaluation.
- ε-greedy action selection.
8Reinforcement Learning (Cont.)
Initialize Q(s, a) 0 Repeat for each
episode Choose action a from current
state Receive immediate payoff r, and arrive at
the next state. Q(s, a) lt- QB(s, a)(k-1)/k
r/k Until n episodes have been played.
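The pseudocode above combines a sample-average value estimate with ε-greedy selection. The sketch below implements that update; the class and state/action encodings are my own, since the slides do not specify them.

```python
import random
from collections import defaultdict

# Sample-average Q update with ε-greedy selection, as on the slide:
# Q(s, a) ← Q(s, a)·(k−1)/k + r/k, where k counts visits to (s, a).

class SampleAverageAgent:
    def __init__(self, actions, epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.q = defaultdict(float)  # Q(s, a), initialised to 0
        self.k = defaultdict(int)    # visit counts per (s, a)

    def choose(self, state):
        # ε-greedy: explore with probability ε, otherwise act greedily.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward):
        self.k[(state, action)] += 1
        k = self.k[(state, action)]
        q = self.q[(state, action)]
        self.q[(state, action)] = q * (k - 1) / k + reward / k

# After k updates, Q(s, a) equals the average of the k rewards seen.
agent = SampleAverageAgent(actions=[0, 1], epsilon=0.0)
agent.update("s0", 0, 10.0)
agent.update("s0", 0, 20.0)
print(agent.q[("s0", 0)])  # 15.0
```

The incremental form avoids storing past rewards: after k updates the estimate is exactly the running mean of the k payoffs received for that state–action pair.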
9. Experiment 0: Repeated One-Shot Game
- Agents have no memory of past actions.
- Agents find the game-theoretic result.
- No cooperation among agents.
10. Experiment 1: Learning Agent Against Fixed Rules
- Fix player B's strategy:

    IF (currentOffer > p × Endowment) Accept currentOffer
    ELSE Reject currentOffer
    (0 < p < 1)
11. Experiment 1 (Cont.)
- Player A will propose an offer no greater than his last offer if player B accepted his last offer.
- Player A eventually learns the value of p, and proposes only the amount p × N.
12. Experiment 2: Learning Agent Against Dynamic Rules
- The value of p changes over the course of play.
- Agent A can track the change well, given enough time periods.
13. Experiment 3: Learning Agent Against Rotating Rules
- The value of p changes in a rotating pattern, e.g. p(t−1) = .40, p(t) = .50, p(t+1) = .60.
- Player A converges to a proposal of 60, which is the highest value of p × 100.
- Memory of at least one previous move might let player A track the rotating rule.
14. Experiment 4: Learning Simultaneously
- Both agents have memory of one previous move.
- Player B chooses the value of p for each episode according to:

    IF b(t−1) is Accept THEN p(t) = d(t−1) / N
    ELSE p(t) = (random draw from [0, N]) / N
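Player B's rule can be sketched as follows. This is an illustrative reading of the slide, assuming d(t−1) is B's previous demand and the rejection branch draws a fresh threshold uniformly; the function name is my own.

```python
import random

# Player B's adaptive threshold: if the last offer was accepted, keep
# p(t) = d(t-1)/N; otherwise redraw p(t) uniformly from [0, N]/N.

def next_threshold(prev_accepted: bool, prev_demand: float, n: int = 100) -> float:
    if prev_accepted:
        return prev_demand / n
    return random.uniform(0, n) / n

# If B's demand of 55 was accepted last episode, the threshold stays 0.55.
print(next_threshold(True, 55))  # 0.55
```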
15. Experiment 4 (Cont.)
- Decision-making process using finite automata.
- Agent A's automaton (diagram)
16. Experiment 4 (Cont.)
17. Experiment 4: Results
- Cooperation emerges through co-evolution within 2000 episodes. Player A converges to proposing 55 or 56, and correspondingly, player B converges to setting his lower limit at 55 or 56.
18. Value of Intelligence
- Will smart agents be able to do better than dumb ones through learning?
- Experiments:
  - 5a: A population of smart agents plays against a population of various dumb agents.
  - 5b: A population of smart agents plays against each other and against a population of various dumb agents.
19. Experiment 5a: One Smart Agent vs. Multiple Dumb Agents
- Three types of dumb agents using fixed rules:
  - db1: demand/accept 70 or higher
  - db2: demand/accept 50 or higher
  - db3: demand/accept 30 or higher
- Smart agents learn via reinforcement learning.
- There is a 25 percent probability that a smart agent is chosen to play each game.
- Track the changing population of dumb agents for each generation.
20. Experiment 5a: Process
- Draw one smart agent with 25 percent probability; otherwise draw one dumb agent randomly, in proportion to type frequencies.
- Draw another dumb agent randomly, in proportion to type frequencies.
- Decide the role of each agent (proposer or responder).
- Agents play the one-shot game against each other.
- Go back to the first step until a certain number of games (e.g., 1000 episodes) has been completed.
- Update the frequencies of the dumb agent types.
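The matchmaking steps above can be sketched as a small sampler. This is a sketch under stated assumptions: the type names and frequencies are illustrative, and role assignment is modeled as a fair coin flip.

```python
import random

# Experiment 5a matchmaking: with probability 0.25 the first player is a
# smart agent, otherwise a dumb agent drawn in proportion to its type's
# current frequency; the second player is always a dumb agent.

def draw_pair(freqs: dict, p_smart: float = 0.25, rng=random):
    types = list(freqs)
    weights = list(freqs.values())
    if rng.random() < p_smart:
        first = "smart"
    else:
        first = rng.choices(types, weights=weights)[0]
    second = rng.choices(types, weights=weights)[0]
    # Roles (proposer, responder) are assigned at random.
    if rng.random() < 0.5:
        return first, second
    return second, first

proposer, responder = draw_pair({"db1": 0.3, "db2": 0.5, "db3": 0.2})
print(proposer, responder)
```

After each batch of games, the dumb-type frequencies passed to `draw_pair` would be updated from realized payoffs, which is what drives the population dynamics reported on the next slide.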
21. Experiment 5a: Results
- Fair dumb agents (db2: demand/accept 50 or higher) take over the dumb agent population.
- Smart agents learn to be fair.
22. Experiment 5a: Results (Cont.)
23. Experiment 5a: Results (Cont.)
24. Experiment 5b: Multiple Smart Agents vs. Dumb Agents
- Smart agents can play against each other.
25. Experiment 5b (Cont.)
26. Comparison of 5a and 5b
27. Impact of Memory
- Repeat experiments 5a and 5b, but introduce different memory sizes for each experiment.
28. Conclusions
- Artificial agents using reinforcement learning are able to play the ultimatum game efficiently and effectively.
- Agent intelligence and memory affect performance.
- The agent-based approach better replicates and explains real human behavior.
29. Future Research
- Toward cooperative agent systems in strategic situations in virtual communities, especially in electronic commerce such as supply chains.
- Currently investigating two versions of the trust game: the classical economic trust game vs. the Mad Mex game.
- Comments?