Evolving Strategies for the Prisoner's Dilemma

1
Evolving Strategies for the Prisoner's Dilemma
  • Jennifer Golbeck
  • University of Maryland, College Park
  • Department of Computer Science
  • July 23, 2002

2
Overview
  • Previous Research
  • Prisoner's Dilemma
  • The Genetic Algorithm
  • Results
  • Conclusions

3
Previous Research

4
Axelrod
  • Robert Axelrod's experiments of the 1980s served
    as the starting point for this research
  • Implementation closely adheres to the
    configuration of his experiments
  • Same model for the Prisoner's Dilemma
  • Minor variation in the implementation of the
    Genetic Algorithm

5
Prisoner's Dilemma

6
The Prisoner's Dilemma Model
  • The basic two-player Prisoner's Dilemma
  • Both players are arrested for the same crime
  • Each has a choice
  • Confess - Cooperate with the authorities (admit
    to doing the crime)
  • Deny - Defect against the other player (claim the
    other person is responsible)
  • No knowledge of the opponent's action

7
Payoff Matrix
  • Optimization
  • If both players cooperate, they each receive 3
    points
  • If both players defect, each receives 1 point
  • If the outcome is mixed, the defector gets 5
    points and the cooperator gets 0 points
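
The payoffs above can be written as a small lookup table. This is a minimal sketch; the names `PAYOFF` and `payoff` are illustrative and do not come from the presentation.

```python
# Payoffs from the slide, keyed by (player A's move, player B's move).
# "C" = cooperate, "D" = defect. Values are (A's points, B's points).
PAYOFF = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("D", "D"): (1, 1),  # mutual defection
    ("D", "C"): (5, 0),  # A defects against a cooperator
    ("C", "D"): (0, 5),  # A cooperates against a defector
}


def payoff(move_a, move_b):
    """Return the points earned by A and B for one round."""
    return PAYOFF[(move_a, move_b)]
```

For example, `payoff("D", "C")` gives the defector 5 points and the cooperator 0.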

8
Iterated Game
  • In simulation, the endpoint of the game is
    unknown to the players, making it essentially an
    infinitely iterated game
  • Each player has a memory of the previous three
    rounds on which to base his strategy
  • Strategies are deterministic - for a given
    history h, a player always makes the same move
  • With 4 possible configurations in each round and
    a history of 3, each strategy comprises
    4^3 = 64 moves
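
The count of 64 can be checked by enumerating the histories directly. The sketch below only reproduces the arithmetic on the slide; the variable names are illustrative.

```python
from itertools import product

MOVES = ("C", "D")

# Each round has 2 x 2 = 4 possible outcomes (own move, opponent's move).
round_outcomes = list(product(MOVES, repeat=2))

# A three-round memory gives 4^3 = 64 distinct histories.
histories = list(product(round_outcomes, repeat=3))

# A deterministic strategy fixes one move per history,
# so there are 2^64 possible strategies.
strategy_count = 2 ** len(histories)

print(len(round_outcomes), len(histories), strategy_count)
```

The size of this strategy space is why a search method such as a genetic algorithm is used rather than exhaustive comparison.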

9
Previous Results
  • Axelrod tournaments
  • Using the three-round history model, teams
    submitted strategies that competed in a
    round-robin tournament
  • Tit for Tat
  • The Pavlov strategy, developed after these
    tournaments, was also shown to be an effective
    strategy
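
The slide names the two strategies without defining them. As commonly defined, Tit for Tat repeats the opponent's previous move, and Pavlov (win-stay, lose-shift) cooperates only when both players made the same move in the previous round. A minimal sketch of those rules, using only the most recent round and with illustrative function names:

```python
def tit_for_tat(my_last, opp_last):
    """Copy whatever the opponent did in the previous round."""
    return opp_last


def pavlov(my_last, opp_last):
    """Win-stay, lose-shift: cooperate after matching moves, else defect."""
    return "C" if my_last == opp_last else "D"
```

Both rules cooperate after mutual cooperation and defect after being exploited, which is the common ground the hypothesis later builds on.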

10
The Genetic Algorithm

11
The Model
  • Darwinian Survival of the Fittest
  • Genetic representation of entities
  • Fitness function
  • Select most fit individuals to reproduce
  • Mutate
  • Traits of most fit will be passed on
  • Over time, the population will evolve to be more
    fit, approaching an optimum
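
The model above can be sketched as one generation step over bit-string individuals. The operators shown here (fitness-proportional selection, single-point crossover, bit-flip mutation) are common choices used for illustration; the function names and the exact replacement scheme are assumptions, not the author's code.

```python
import random


def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitness):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(parent_a, parent_b, rng):
    """Single-point crossover at a random cut position."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def next_generation(population, fitness, rate, rng):
    """Build a same-sized population from selected, recombined parents."""
    children = []
    for _ in range(len(population)):
        mother = roulette_select(population, fitness, rng)
        father = roulette_select(population, fitness, rng)
        children.append(mutate(crossover(mother, father, rng), rate, rng))
    return children
```

Parents are drawn with replacement, so a highly fit individual can contribute to several children in the same generation.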

12
GAs and the Prisoner's Dilemma
  • Population: 20 individuals
  • Chromosome: 64-bit string in which each bit
    represents the Cooperate or Defect move played
    for a specific three-round history
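
One way to read this encoding is sketched below. The bit convention (1 = Cooperate), the ordering of round outcomes, and all names are assumptions made for illustration; the presentation does not specify them.

```python
import random

POP_SIZE = 20
GENES = 64

# Assumed code for one round: own move followed by opponent's move.
OUTCOME_CODE = {"CC": 0, "CD": 1, "DC": 2, "DD": 3}


def gene_index(history):
    """Map three round outcomes (oldest first) to a bit position 0..63."""
    index = 0
    for outcome in history:
        index = index * 4 + OUTCOME_CODE[outcome]
    return index


def next_move(chromosome, history):
    """Look up the move this chromosome plays after the given history."""
    # Assumption: bit 1 means Cooperate, bit 0 means Defect.
    return "C" if chromosome[gene_index(history)] == 1 else "D"


def random_population(rng):
    """Create 20 chromosomes of 64 random bits each."""
    return [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(POP_SIZE)]
```

For example, `next_move(chromosome, ["CC", "CD", "DD"])` reads the single bit that governs play after that particular three-round sequence.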

14
GAs and PD II
  • Fitness: Each player competes against every other
    for 64 consecutive rounds, and a cumulative score
    is maintained
  • Selection: Roulette wheel selection
  • Reproduction: Random point crossover with
    replacement
  • Mutation rate: 0.001
  • Generations: 200,000
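
The fitness evaluation described above can be sketched as a round-robin over 64-bit chromosomes. The presentation does not say how the first three rounds are handled, so this sketch assumes both players start from an all-cooperate history; the bit convention (1 = Cooperate) and all names are likewise assumptions.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
CODE = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}


def play_match(chrom_a, chrom_b, rounds=64):
    """Play two chromosomes against each other and return both scores."""
    # Assumption: the initial three-round history is all mutual cooperation.
    hist_a = hist_b = 0
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = "C" if chrom_a[hist_a] else "D"
        move_b = "C" if chrom_b[hist_b] else "D"
        points_a, points_b = PAYOFF[(move_a, move_b)]
        score_a += points_a
        score_b += points_b
        # Shift the oldest round out and the newest round in (base-4 digits).
        hist_a = (hist_a * 4 + CODE[(move_a, move_b)]) % 64
        hist_b = (hist_b * 4 + CODE[(move_b, move_a)]) % 64
    return score_a, score_b


def round_robin_fitness(population, rounds=64):
    """Cumulative score of each individual against every other individual."""
    scores = [0] * len(population)
    for i in range(len(population)):
        for j in range(i + 1, len(population)):
            score_i, score_j = play_match(population[i], population[j], rounds)
            scores[i] += score_i
            scores[j] += score_j
    return scores
```

Under these assumptions, an always-cooperate chromosome scores 0 against an always-defect chromosome over 64 rounds, while the defector scores 320.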

15
Simulation and Results

16
Hypothesis
  • Past research has looked at which strategy was
    best. This research looks at what makes a
    good strategy.
  • Tit for Tat and Pavlov both perform very well,
    and share two traits:
  • Defend against Defectors
  • Cooperate with other cooperators

17
Hypothesis
  • All populations evolve over time to possess and
    exhibit these two traits
  • This behavior evolves regardless of the initial
    makeup of the population

18
Experiment I
  • Five Initial Populations
  • All Always Cooperate (Confess) (AllC)
  • All Always Defect (Deny) (AllD)
  • All Tit for Tat
  • All Pavlov
  • All Randomly generated (independently)
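
The five starting populations could be constructed as follows. This is a sketch under assumptions not stated in the presentation: bit 1 means Cooperate, the lowest base-4 digit of a bit position encodes the most recent round (0 = CC, 1 = CD, 2 = DC, 3 = DD, own move first), and Tit for Tat and Pavlov look only at that most recent round.

```python
import random

POP_SIZE = 20
GENES = 64


def from_rule(cooperates_after):
    """Build a chromosome from a rule on the most recent round's code."""
    return [1 if cooperates_after(i % 4) else 0 for i in range(GENES)]


all_c = [1] * GENES
all_d = [0] * GENES
# Tit for Tat cooperates when the opponent just cooperated (CC or DC).
tit_for_tat = from_rule(lambda last: last in (0, 2))
# Pavlov cooperates when both players just made the same move (CC or DD).
pavlov = from_rule(lambda last: last in (0, 3))


def initial_populations(rng):
    """Return the five starting populations keyed by name."""
    return {
        "AllC": [list(all_c) for _ in range(POP_SIZE)],
        "AllD": [list(all_d) for _ in range(POP_SIZE)],
        "TitForTat": [list(tit_for_tat) for _ in range(POP_SIZE)],
        "Pavlov": [list(pavlov) for _ in range(POP_SIZE)],
        "Random": [[rng.randint(0, 1) for _ in range(GENES)]
                   for _ in range(POP_SIZE)],
    }
```

Each individual in the random population is generated independently, matching the last bullet above.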

19
Experiment II
  • Controls: Tit for Tat and Pavlov
  • Statistically equal performance
  • Support the hypothesis by showing
  • Traits are not present in other initial
    populations
  • Over time, populations evolve to exhibit those
    traits and perform as well as Tit For Tat and
    Pavlov

20
Experiment II
  • To show that the hypothesized traits evolve,
    populations must demonstrate
  • In the presence of Defectors, evolved populations
    perform identically to the controls
  • In the presence of cooperators, evolved
    populations perform identically to controls

21
Part 1: Defend Against Defectors I
  • Mix each initial population with a small set of
    AllD
  • Tit for Tat and Pavlov (controls) perform at
    about 80% of maximum
  • All others perform significantly worse than Tit
    for Tat and Pavlov
  • AllC and Random populations perform significantly
    worse than their normal behavior
  • This shows that a priori, the AllC and random
    populations cannot defend against Defectors

22
Part 1: Defend Against Defectors II
  • Evolve each population and then mix with small
    set of AllD
  • All populations now perform as well as
    each other, and as well as the TFT and Pavlov
    controls
  • Fitness at about 80% of maximum

23
Part 2: Cooperate with Cooperators
  • As before, each startup population is mixed with
    a small set of AllC
  • TFT and Pavlov do very well
  • AllC does exceptionally well
  • Others do significantly worse
  • Evolve and then add AllC
  • All populations perform as well as each
    other
  • Identical performance to TFT and Pavlov

24
Performance of Different Experiments

25
Conclusions

26
Conclusions I
  • Performance measures show that AllC, AllD, and
    random populations do not generally possess
    defensive or cooperative traits a priori
  • After evolution, all populations have changed to
    incorporate both traits
  • Evolved strategies perform as well as TFT and
    Pavlov, traditional best strategies

27
Conclusions II
  • In both experiments there is no statistical
    difference between the performance of evolved
    populations before and after the introduction of
    AllC or AllD players
  • This indicates that the populations not only
    exhibit the hypothesized traits under
    experimental conditions; doing so is their
    normal behavior

28
Future Work

29
Non-deterministic Players
  • This work shows results for players with
    deterministic strategies
  • Much previous research has been done on
    stochastic strategies
  • Preliminary results suggest that the findings
    presented here also apply to stochastic
    strategies, but a formal study is necessary
