Transcript and Presenter's Notes

Title: JuariBot


1
JuariBot Poker Playing Bot
  • Nimar S. Arora

2
Why Research Poker
  • Game of Imperfect Information
  • Game of Chance
  • Dynamic Environment (unlike solitaire)
  • Although zero-sum rather than general-sum, poker
    approximates many issues of the real world.

3
Texas Hold'em
  • 2 Hole Cards per player
  • 4 rounds of betting: Pre-flop, Flop (3 cards),
    Turn (1 card), River (1 card)
  • At most 4 raises per round
  • Bet size doubles on the Turn and River.
  • Big-blind and small-blind (cost of playing)

4
Poker Approaches: Game Theory
  • Zero-sum games have a Nash Equilibrium strategy
    which can be proven to be optimal for each
    player.
  • To solve for the Nash equilibrium, one needs to
    represent each player's strategy and form a
    payoff matrix for every combination of
    strategies.
  • Normal form of the payoff matrix is exponential
    in the game tree size
  • Koller's sequence form is linear in the game
    tree size. But Texas Hold'em has 10^18 nodes in
    the game tree.
  • PsOpti approximates the game tree down to 10^7
    nodes. There is no way to tell how good the
    approximation is.

5
Poker Approaches: Opponent Modeling
  • Opponent modeling based approaches deduce some
    sort of model of the opponent and predict
    expected payoffs
  • Prediction of payoff is based on either an
    expectimax search or Bayesian reasoning.
  • Action selection is based on an arbitrary
    function of expected payoffs
  • Example: Pr(a_i) = e^Exp(a_i) / n (sketched below)
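
A minimal Python sketch of this selection rule; the function name, the
example payoff numbers, and the reading of n as a normalizing constant are
assumptions for illustration, not part of the original slides.

  import math
  import random

  def softmax_action(expected_payoffs):
      """Pick an action with probability proportional to e^(expected payoff).

      expected_payoffs: dict mapping action -> estimated payoff from the
      opponent model. n normalizes the weights into a distribution.
      """
      weights = {a: math.exp(v) for a, v in expected_payoffs.items()}
      n = sum(weights.values())
      actions = list(weights)
      return random.choices(actions, weights=[weights[a] / n for a in actions])[0]

  # Illustrative expected payoffs (in bets) for the three legal actions
  print(softmax_action({"fold": 0.0, "call": 0.8, "raise": 1.2}))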

6
Rejected Approach: Reinforcement Learning
  • Learns a policy with a deterministic action in
    each state, but in poker we need a randomized
    strategy
  • RL in Markov Games (Littman) can learn a
    randomized policy. However, this requires both
    players to know the game state
  • In poker if both players know the game state then
    the best policy is deterministic!

7
My Take
  • The key issue is action selection
  • Build a decent opponent modeling system to
    predict odds of winning and expected payoffs
  • Compare different methods for action selection
  • Pr(a_i) = e^Exp(a_i) / n
  • Rule based (example: if odds > 0.7 and the
    opponent has not raised, then raise)

8
Opponent Modeling
  • Strategy Class
  • Learn Pr(action | hand)
  • Problem: the hand is not revealed unless the
    game goes to showdown
  • Observation Class
  • Learn Pr(action)
  • Deduce hand strength from the occurrence of
    infrequent actions

9
Action Observation
  • Build a histogram of bets made in each round of
    the game for each opponent
  • For an opponent who ends the round with a raise,
    assume that he would have gone to the maximum bet
    (see the sketch below)
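
A minimal sketch of such a per-opponent histogram, assuming the surrounding
code can report how many bets an opponent made in a round and whether his
last action was a raise; the names record_round and bet_distribution are
hypothetical.

  from collections import defaultdict

  MAX_BETS = 4  # betting is capped at 4 raises per round

  # histogram[(opponent, round)][bets] = number of times that bet count was seen
  bet_histograms = defaultdict(lambda: defaultdict(int))

  def record_round(opponent, betting_round, bets_made, ended_with_raise):
      """Update the opponent's bet histogram for one betting round.

      If the opponent ended the round with a raise we never saw how far he
      was willing to go, so credit him with the maximum bet (as on the slide).
      """
      if ended_with_raise:
          bets_made = MAX_BETS
      bet_histograms[(opponent, betting_round)][bets_made] += 1

  def bet_distribution(opponent, betting_round):
      """Empirical Pr(bets) for this opponent in this betting round."""
      counts = bet_histograms[(opponent, betting_round)]
      total = sum(counts.values()) or 1
      return {bets: c / total for bets, c in counts.items()}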

10
Deducing Hand Strength
  • From each opponent action (call or raise) deduce
    the upper and lower limit of the relative hand
    strength
  • From a raise we can only deduce a lower limit
  • Definition of Relative Hand Strength
  • Compute the odds of winning for each legal hand
    (randomly simulate unknown cards)
  • Sort all the legal hands by their odds of winning
  • Relative position of a hand in this order is its
    relative hand strength (a real number in [0, 1];
    sketched below)
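
A rough sketch of this definition; win_odds is assumed to be a Monte-Carlo
estimator of the winning probability (randomly simulating the unknown cards)
supplied by the caller, and the card/deck representation is left open.

  import itertools

  def relative_hand_strength(my_hole, board, deck, win_odds):
      """Relative hand strength as defined on this slide.

      win_odds(hole_cards, board) -> estimated probability of winning
      (placeholder for the simulation described above).
      """
      seen = set(my_hole) | set(board)
      remaining = [c for c in deck if c not in seen]
      # Every legal two-card hand, including my own
      legal_hands = [tuple(h) for h in itertools.combinations(remaining, 2)]
      legal_hands.append(tuple(my_hole))
      ranked = sorted(legal_hands, key=lambda h: win_odds(h, board))
      # Relative position in the sorted order, scaled into [0, 1]
      return ranked.index(tuple(my_hole)) / (len(ranked) - 1)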

11
Odds of Winning
  • Down-weight all possible opponent hands outside
    the limits deduced from the opponent actions
  • Compare my hand to each possible opponent hand
  • Compute weighted odds of winning (see the sketch below)
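
A sketch of the weighted comparison, assuming the limits [lower, upper] have
already been deduced from the opponent's actions; strength_of (relative
hand strength lookup) and beats (showdown comparison) are placeholder
callables, not functions from the original presentation.

  def weighted_win_odds(my_hand, board, opponent_hands,
                        strength_of, beats, lower, upper, discount=0.1):
      """Weighted odds of winning against one opponent.

      Hands whose relative strength falls outside [lower, upper] are
      down-weighted by the discount factor rather than excluded outright.
      """
      total = 0.0
      wins = 0.0
      for opp_hand in opponent_hands:
          weight = 1.0 if lower <= strength_of(opp_hand) <= upper else discount
          total += weight
          if beats(my_hand, opp_hand, board):
              wins += weight
      return wins / total if total else 0.0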

12
Action Selection
  • Simple rules
  • Never fold if odds of winning > 0.5
  • Never call a raise if odds of winning < 0.1
  • Always raise if odds of winning > 0.7 (and no one
    else has raised yet)
  • Otherwise Pr(a_i) = e^Exp(a_i) / n (sketched below)
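
A minimal sketch combining these rules with the softmax fallback; the action
names, the facing_raise flag, and the expected_payoffs argument (estimates
from the opponent model) are illustrative assumptions.

  import math
  import random

  def choose_action(win_odds, facing_raise, expected_payoffs):
      """Apply the simple rules above, then fall back to the softmax rule.

      win_odds: estimated probability of winning the hand.
      facing_raise: True if another player has already raised this round.
      expected_payoffs: dict action -> estimated payoff,
          e.g. {"fold": 0.0, "call": 0.6, "raise": 0.9}.
      """
      if win_odds > 0.7 and not facing_raise:
          return "raise"
      allowed = dict(expected_payoffs)
      if win_odds > 0.5:
          allowed.pop("fold", None)   # never fold with odds > 0.5
      if facing_raise and win_odds < 0.1:
          allowed.pop("call", None)   # never call a raise with odds < 0.1
      # Otherwise: Pr(a_i) = e^Exp(a_i) / n over the remaining actions
      weights = {a: math.exp(v) for a, v in allowed.items()}
      n = sum(weights.values())
      actions = list(weights)
      return random.choices(actions, weights=[weights[a] / n for a in actions])[0]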

13
Results
  • On the Univ. of Alberta poker server, the bot
    very consistently deduces the opponents' hands
  • Over the course of thousands of hands, it seemed
    to neither lose nor win

14
Future Work
  • Need to improve hand strength accuracy.
  • Prone to simulation errors
  • Python is quite slow!
  • Need to combine hand strength and relative hand
    strength in one pass to reduce the number of
    simulations
  • Need to experiment with different action
    selection rules
  • Experiment with different poker bots
  • Experiment with heads up play (so far only ring
    play)

15
Conclusions
  • Simple opponent modeling with easily available
    information
  • Provides a platform for future research in action
    selection