Cooperative Agent Systems: Artificial Agents Play the Ultimatum Game (Transcript)
1
Cooperative Agent Systems: Artificial Agents
Play the Ultimatum Game
  • Steven O. Kimbrough
  • Presented at FMEC 2001, Oslo
  • Joint work with Fang Zhong and D.J. Wu

2
Research Motivation
  • How to design and control cooperative agent systems in strategic situations.
  • How well do different identity-centric agents perform against each other?
  • How well do various adaptive mechanisms perform?
  • Value of intelligence: what does intelligence buy you?

3
Methodology
  • Adaptive artificial agents play the iterated ultimatum game.
  • The ultimatum game is the most fundamental building block for negotiation (e.g., Croson, 1996).
  • Reinforcement learning (a simple version).
  • Regimes of play:
  • Two agents play against each other.
  • Populations of different types of agents.

4
One-shot Ultimatum Game
  • Two players, A and B.
  • Player A has an endowment of N.
  • Player A offers x ∈ [0, N] (N = 100 in this study).
  • Player B can either accept the offer or reject the offer.
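The one-shot game above can be sketched in a few lines of Python; the function and parameter names are ours, introduced only for illustration:

```python
def play_ultimatum(offer, threshold, endowment=100):
    """One round of the ultimatum game: A proposes `offer` out of
    `endowment`; B accepts iff the offer meets B's threshold.
    Returns the pair (payoff_A, payoff_B)."""
    if not 0 <= offer <= endowment:
        raise ValueError("offer must lie in [0, endowment]")
    if offer >= threshold:               # B accepts: A keeps the remainder
        return endowment - offer, offer
    return 0, 0                          # B rejects: both players get nothing
```

For example, `play_ultimatum(40, 30)` returns `(60, 40)`, while `play_ultimatum(10, 30)` returns `(0, 0)`.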

5
One-shot Ultimatum Game (Cont.)
  • Classical Game Theory
  • Player A offers a tiny amount ε, and player B will always accept this offer.
  • Infinite number of Nash equilibria along the line x + y = N.
  • Behavioral Game Theory
  • Human beings in the lab do not behave as classical game theory predicts (e.g., people tend to be fair, and reject offers that do not meet their threshold share).

6
Repeated Ultimatum Game
  • A supergame consists of iterations of the
    ultimatum game.
  • Indefinite episodes
  • Agents do not know how many iterations are yet to
    come.
  • No single best strategy for the repeated
    ultimatum game.

7
Reinforcement Learning
  • Favoring actions producing better results.
  • Estimating the values of state-action pairs.
  • Sample-average for estimation/evaluation.
  • ε-greedy for selection.

8
Reinforcement Learning (Cont.)
  • Algorithm

Initialize Q(s, a) = 0
Repeat for each episode:
    Choose action a from the current state.
    Receive immediate payoff r, and arrive at the next state.
    Q(s, a) ← Q(s, a) · (k−1)/k + r/k
Until n episodes have been played.
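The update rule above can be sketched in Python as a single-state, sample-average learner with ε-greedy selection; the class and attribute names are ours, not from the slides:

```python
import random

class SampleAverageLearner:
    """Minimal sketch of the slides' update rule in a single-state setting:
    Q(a) <- Q(a)*(k-1)/k + r/k, i.e. a running average of payoffs,
    with epsilon-greedy action selection."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.q = {a: 0.0 for a in self.actions}   # estimated value per action
        self.k = {a: 0 for a in self.actions}     # times each action was played

    def choose(self):
        if random.random() < self.epsilon:        # explore with probability epsilon
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[a])  # otherwise exploit

    def update(self, action, reward):
        self.k[action] += 1
        k = self.k[action]
        # sample-average update from the slide: Q <- Q*(k-1)/k + r/k
        self.q[action] = self.q[action] * (k - 1) / k + reward / k
```

After observing payoffs 10 and 6 for the same action, its Q-value is their average, 8.0.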
9
Experiment 0 Repeated One-Shot Game
  • Agents have no memory of past actions.
  • Agents find the game-theoretic result.
  • No cooperation among agents.

10
Experiment 1 Learning Agent Against Fixed Rules
  • Fixing player B's strategy:

IF (currentOffer > p × Endowment)
    Accept currentOffer.
ELSE
    Reject currentOffer.
(0 < p < 1)
11
Experiment 1 (Cont.)
  • Player A will propose an offer no greater than
    his last offer if player B accepted his last
    offer.
  • Player A eventually learns the value of p, and proposes only the amount p × N.

12
Experiment 2 Learning Agent Against Dynamic
Rules
  • The value of p changes over the course of play.
  • Agent A can track the change very well, given enough time periods.

13
Experiment 3 Learning Agent Against Rotating
Rules
  • The value of p changes with a rotating pattern, i.e., p(t−1) = .40, p(t) = .50, p(t+1) = .60.
  • Player A converges to a proposal of 60, which is the highest value of p × 100.
  • Memory of at least one previous move might enable player A to track the rotating rule.
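A small sketch of the rotating rule helps explain the observed convergence: a memoryless proposer that wants every episode accepted must clear the largest threshold in the cycle. The function name is ours:

```python
from itertools import cycle

def rotating_thresholds(n, pattern=(40, 50, 60)):
    """Acceptance thresholds (currency units, N = 100) that player A
    faces over n episodes under the rotating rule of experiment 3."""
    p = cycle(pattern)
    return [next(p) for _ in range(n)]

# A proposal accepted in every episode must meet the cycle's maximum,
# which matches the convergence to 60 reported on the slide.
safe_offer = max(rotating_thresholds(3))
```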

14
Experiment 4 Learning Simultaneously
  • Both agents have memory of one previous move.
  • Player B chooses the value of p for each episode according to:

IF b(t−1) is accept THEN
    p(t) = d(t−1) / N
ELSE
    p(t) = (random draw from [0, N]) / N
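Our reading of that (garbled) rule, sketched in Python: if B accepted last episode, B keeps the threshold share at the amount just demanded; otherwise B re-draws it uniformly. The function name is ours:

```python
import random

def next_threshold_share(last_accepted, last_demand, endowment=100):
    """Player B's threshold share p(t) for the next episode, per our
    interpretation of the slide-14 rule."""
    if last_accepted:
        return last_demand / endowment                    # p(t) = d(t-1) / N
    return random.randint(0, endowment) / endowment       # fresh uniform share
```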
15
Experiment 4 (Cont.)
  • Decision-making process using finite automata
  • Agent A

16
Experiment 4 (Cont.)
  • Agent B

17
Experiment 4 - Result
  • Cooperation emerges through co-evolution within 2,000 episodes: player A converges to proposing 55 or 56, and, correspondingly, player B converges to setting his lower limit at 55 or 56.

18
Value of Intelligence
  • Will smart agents be able to do better than dumb
    ones through learning?
  • Experiments:
  • 5a: A population of smart agents plays against a population of various dumb agents.
  • 5b: A population of smart agents plays against each other and against a population of various dumb agents.

19
Experiment 5a One Smart Agent vs. Multiple Dumb
Agents
  • Three types of dumb agents using fixed rules:
  • db1: demand/accept 70 or higher.
  • db2: demand/accept 50 or higher.
  • db3: demand/accept 30 or higher.
  • The smart agent learns via reinforcement learning.
  • There is a 25 percent probability that a smart agent is chosen to play the game.
  • The changing population of dumb agents is tracked for each generation.

20
Experiment 5a Process
  • Draw one smart agent with 25 percent probability; otherwise, draw one dumb agent at random, in proportion to type frequencies.
  • Draw another dumb agent at random, in proportion to type frequencies.
  • Decide the role of each agent (proposer or responder).
  • The agents play the one-shot game against each other.
  • Return to the first step until a certain number of games, e.g., 1000 episodes, has been completed.
  • Update the frequencies of the dumb agent types.
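The sampling steps above can be sketched as follows; the payoff bookkeeping and the learning step are elided, and the names and thresholds follow the db1/db2/db3 types from the previous slide:

```python
import random

# Dumb-agent types and their fixed demand/accept levels (slide 19).
DUMB_THRESHOLDS = {"db1": 70, "db2": 50, "db3": 30}

def draw_first_agent(frequencies, p_smart=0.25):
    """With probability p_smart, pick the smart agent; otherwise pick a
    dumb type at random, in proportion to its current frequency."""
    if random.random() < p_smart:
        return "smart"
    types = list(frequencies)
    return random.choices(types, weights=[frequencies[t] for t in types])[0]

def draw_second_agent(frequencies):
    """The opponent is always a dumb agent, drawn in proportion to frequency."""
    types = list(frequencies)
    return random.choices(types, weights=[frequencies[t] for t in types])[0]
```

One full generation would repeat these draws for, e.g., 1000 episodes, then update the frequency of each dumb type from its accumulated payoffs.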

21
Experiment 5a Results
  • Fair dumb agents (db2: demand/accept 50 or higher) take over the dumb-agent population.
  • Smart agents learn to be fair.

22
Experiment 5a Result (Cont.)
23
Experiment 5a Result (Cont.)
24
Experiment 5b Multiple Smart Agents vs. Dumb
Agents
  • Smart agents can play against each other.

25
Experiment 5b (Cont.)
26
Comparison of 5a and 5b
27
Impact of Memory
  • Repeat experiments 5a and 5b, but introduce a different memory size for each experiment.

28
Conclusions
  • Artificial agents using reinforcement learning are able to play the ultimatum game efficiently and effectively.
  • Agent intelligence and memory affect performance.
  • The agent-based approach better replicates and explains real human behavior.

29
Future Research
  • Toward cooperative agent systems in strategic situations in virtual communities, especially in electronic commerce settings such as supply chains.
  • Currently investigating two versions of the trust game: the classical economic trust game vs. the Mad Mex Game.
  • Comments?