Pondering Probabilistic Play Policies for Pig (PowerPoint PPT presentation transcript)
Provided by: toddwn; http://cs.gettysburg.edu

Transcript and Presenter's Notes

1
Pondering Probabilistic Play Policies for Pig
  • Todd W. Neller
  • Gettysburg College

2
Sow, What's This All About?
  • The Dice Game Pig
  • Odds and Ends
  • Playing to Win
  • Piglet
  • Value Iteration
  • Machine Learning

3
Pig: The Game
  • Object: First to score 100 points
  • On your turn, roll until
  • You roll a 1, and score NOTHING.
  • You hold, and KEEP the sum.
  • Simple game → simple strategy?
  • Let's play!

4
Playing to Score
  • Simple odds argument:
  • Roll until you risk more than you stand to gain.
  • Hold at 20:
  • 1/6 of the time: −20 → −20/6
  • 5/6 of the time: +4 (avg. of 2,3,4,5,6) → +20/6
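
That break-even arithmetic can be checked directly (a quick sketch; exact fractions avoid rounding noise):

```python
from fractions import Fraction

# One more roll with a turn total of 20:
# a roll of 1 (prob 1/6) loses the 20 banked this turn;
# any other roll r (prob 1/6 each) adds r to the turn total.
turn_total = 20
gain = Fraction(1, 6) * (-turn_total) + sum(Fraction(r, 6) for r in range(2, 7))
print(gain)  # 0: at 20, the risk exactly balances the expected gain
```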

5
Hold at 20?
  • Is there a situation in which you wouldn't want
    to hold at 20?
  • Your score: 99 → hold as soon as you roll a 2
  • Worst-case scenario:
  • You: 79, Opponent: 99
  • Your turn total stands at 20

6
What's Wrong With Playing to Score?
  • It's mathematically optimal!
  • But what are we optimizing?
  • Playing to score ≠ playing to win
  • Optimizing score per turn ≠ optimizing
    probability of a win

7
Piglet
  • Simpler version of Pig with a coin
  • Object: First to score 10 points
  • On your turn, flip until
  • You flip tails, and score NOTHING.
  • You hold, and KEEP the # of heads.
  • Even simpler: play to 2 points

8
Essential Information
  • What is the information I need to make a fully
    informed decision?
  • My score
  • The opponent's score
  • My turn score

9
A Little Notation
  • P(i,j,k) = probability of a win, where i = my score,
    j = the opponent's score, k = my turn score
  • Hold: P(i,j,k) = 1 - P(j, i+k, 0)
  • Flip: P(i,j,k) = ½(1 - P(j,i,0)) + ½ P(i,j,k+1)

10
Assume Rationality
  • To make a smart player, assume a smart opponent.
  • (To make a smarter player, know your opponent.)
  • P(i,j,k) = max(1 - P(j, i+k, 0), ½(1 - P(j,i,0)
    + P(i,j,k+1)))
  • Probability of win based on best decisions in any
    state
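
That max can be written as a one-step backup. A minimal sketch (the dict `p`, the helper name `piglet_backup`, and treating goal-reaching states as certain wins are my assumptions, not from the slides):

```python
def piglet_backup(p, i, j, k, goal=2):
    """One application of P(i,j,k) = max(hold, flip) for Piglet.

    p maps non-winning states (i, j, k) to current win-probability
    estimates; a state whose banked total reaches the goal is a sure win.
    """
    def P(i2, j2, k2):
        if i2 + k2 >= goal:      # banking the turn total reaches the goal
            return 1.0
        return p[(i2, j2, k2)]
    hold = 1 - P(j, i + k, 0)    # bank k points; the opponent moves next
    flip = 0.5 * ((1 - P(j, i, 0)) + P(i, j, k + 1))
    return max(hold, flip)
```

With all estimates at 0, the backup for the start state gives max(1, ½) = 1; later sweeps pull the values back down toward the true probabilities.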

11
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(1 - P(0,0,0) + P(0,0,2)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(1 - P(1,0,0) + P(0,1,2)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(1 - P(0,1,0) + P(1,0,1)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(1 - P(1,1,0) + P(1,1,1)))

12
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(1 - P(0,0,0) + P(0,0,2)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(1 - P(1,0,0) + P(0,1,2)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(1 - P(0,1,0) + P(1,0,1)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(1 - P(1,1,0) + P(1,1,1)))

These are winning states!
13
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(1 - P(0,0,0) + 1))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(1 - P(1,0,0) + 1))
  • P(1,0,0) = max(1 - P(0,1,0), ½(1 - P(0,1,0) + 1))
  • P(1,1,0) = max(1 - P(1,1,0), ½(1 - P(1,1,0) + 1))
  • Simplified

14
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(2 - P(0,0,0)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(2 - P(1,0,0)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(2 - P(0,1,0)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(2 - P(1,1,0)))
  • And simplified more into a hamsome set of
    equations

15
How to Solve It?
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(2 - P(0,0,0)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(2 - P(1,0,0)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(2 - P(0,1,0)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(2 - P(1,1,0)))
  • P(0,1,0) depends on P(0,1,1), which depends on P(1,0,0),
    which depends on P(0,1,0), which depends on ...

16
A System of Pigquations
Dependencies between non-winning states
17
How Bad Is It?
  • The intersection of a set of bent hyperplanes in
    a hypercube
  • In the general case, no known method (read: PhD
    research)
  • Is there a method that works (without being
    guaranteed to work in general)?
  • Yes! Value Iteration!

18
Value Iteration
  • Start out with some values (0s, 1s, random #s)
  • Do the following until the values converge (stop
    changing):
  • Plug the values into the RHSs
  • Recompute the LHS values
  • That's easy. Let's do it!
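
Those steps, applied to Piglet, fit in a few lines. A sketch under my own assumptions (play-to-2 by default, all values initialized to 0, in-place sweeps until the largest change is tiny):

```python
def solve_piglet(goal=2, eps=1e-9):
    """Value iteration on P(i, j, k) = max(hold, flip) for Piglet."""
    # Non-winning states: both scores below the goal, and a turn
    # total that has not yet reached it.
    states = [(i, j, k)
              for i in range(goal) for j in range(goal)
              for k in range(goal - i)]
    p = {s: 0.0 for s in states}                # start with all zeros

    def P(i, j, k):
        return 1.0 if i + k >= goal else p[(i, j, k)]

    while True:
        delta = 0.0                             # biggest change this sweep
        for (i, j, k) in states:
            hold = 1 - P(j, i + k, 0)           # bank k; opponent moves
            flip = 0.5 * ((1 - P(j, i, 0)) + P(i, j, k + 1))
            new = max(hold, flip)
            delta = max(delta, abs(new - p[(i, j, k)]))
            p[(i, j, k)] = new
        if delta < eps:                         # values stopped changing
            return p

probs = solve_piglet()
print(probs[(0, 0, 0)])   # first player's winning chances, about 4/7
```

By my arithmetic the play-to-2 values settle at P(0,0,0) = 4/7, with the flip option winning the max in every state.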

19
Value Iteration
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(2 - P(0,0,0)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(2 - P(1,0,0)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(2 - P(0,1,0)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(2 - P(1,1,0)))
  • Assume P(i,j,k) is 0 unless it's a win
  • Repeat: Compute RHSs, assign to LHSs

20
But That's GRUNT Work!
  • So have a computer do it, slacker!
  • Not difficult: end-of-CS1 level
  • Fast! Don't blink or you'll miss it
  • Optimal play:
  • Compute the probabilities
  • Determine flip/hold from the RHS maxes
  • (For our equations, always FLIP)

21
Piglet Solved
  • Game to 10
  • Play to Score: Hold at 1
  • Play to Win

[Chart: optimal policy plotted over Opponent score vs. Your score]
22
Pig Probabilities
  • Just like Piglet, but more possible outcomes
  • P(i,j,k) = max(1 - P(j, i+k, 0), 1/6((1 -
    P(j,i,0)) + P(i,j,k+2) + P(i,j,k+3) +
    P(i,j,k+4) + P(i,j,k+5) + P(i,j,k+6)))
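
The roll branch now averages six outcomes instead of two. A sketch (the function name and passing `P` as a lookup that returns 1 for goal-reaching states are my assumptions):

```python
def pig_backup(P, i, j, k):
    """One application of the Pig equation: max of hold vs. roll."""
    hold = 1 - P(j, i + k, 0)     # bank the turn total; opponent moves
    # 1/6 chance: roll a 1 and the turn is lost;
    # otherwise a roll r = 2..6 is added to the turn total.
    roll = (1 / 6) * ((1 - P(j, i, 0)) +
                      sum(P(i, j, k + r) for r in range(2, 7)))
    return max(hold, roll)
```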

23
Solving Pig
  • 505,000 such equations
  • Same simple solution method (value iteration)
  • Speedup: Solve groups of interdependent
    probabilities
  • Watch and see!

24
Pig Sow-lution
25
Pig Sow-lution
26
Reachable States
[Figure: reachable states over Player 1 score (i) vs. Player 2 score (j)]
27
Reachable States
28
Sow-lution for Reachable States
29
Probability Contours
30
Summary
  • Playing to score is not playing to win.
  • A simple game is not always simple to play.
  • The computer is an exciting power tool for the
    mind!

31
When Value Iteration Isn't Enough
  • Value Iteration assumes a model of the problem
  • Probabilities of state transitions
  • Expected rewards for transitions
  • Loaded die?
  • Optimal play vs. suboptimal player?
  • Game rules unknown?

32
No Model? Then Learn!
  • Can't write equations → can't solve
  • Must learn from experience!
  • Reinforcement Learning
  • Learn optimal sequences of actions
  • From experience
  • Given positive/negative feedback

33
Clever Mabel the Cat
34
Clever Mabel the Cat
  • Mabel claws new La-Z-Boy → BAD!
  • Cats hate water → spray bottle = negative
    reinforcement
  • Mabel claws La-Z-Boy → Todd gets up → Todd sprays
    Mabel → Mabel gets negative feedback
  • Mabel learns

35
Clever Mabel the Cat
  • Mabel learns to run when Todd gets up.
  • Mabel first learns local causality
  • Todd gets up → Todd sprays Mabel
  • Mabel eventually sees no correlation, learns
    indirect cause
  • Mabel happily claws carpet. The End.

36
Backgammon
  • Tesauro's TD-Gammon
  • Reinforcement Learning + Neural Network (memory
    for learning)
  • Learned backgammon through self-play
  • Got better than all but a handful of people in
    the world!
  • Downside Took 1.5 million games to learn

37
Greased Pig
  • My continuous variant of Pig
  • Object: First to score 100 points
  • On your turn, generate a random number from 0.5
    to 6.5 until
  • Your rounded number is 1, and you score NOTHING.
  • You hold, and KEEP the sum.
  • How does this change things?
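
One Greased Pig turn can be simulated directly (a sketch; the fixed hold-at-20 policy and banking the unrounded sum are my assumptions):

```python
import random

def greased_pig_turn(hold_at=20.0):
    """Simulate one Greased Pig turn under a simple hold-at policy."""
    total = 0.0
    while total < hold_at:
        x = random.uniform(0.5, 6.5)
        if round(x) == 1:        # draws in (0.5, 1.5) round to 1: prob 1/6
            return 0.0           # pig! the whole turn is lost
        total += x               # keep the unrounded sum
    return total
```

Every turn ends at 0 or at 20-something, but the scores themselves are now continuous, which is exactly what creates the challenges on the next slide.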

38
Greased Pig Challenges
  • Infinite possible game states
  • Infinite possible games
  • Limited experience
  • Limited memory
  • Learning and approximation challenge

39
Summary
  • Solving equations can only take you so far (but
    much farther than we can fathom).
  • Machine learning is an exciting area of research
    that can take us farther.
  • The power of computing will increasingly aid in
    our learning in the future.