Pondering Probabilistic Play Policies for Pig (PowerPoint PPT presentation transcript)
Provided by: toddwn; http://cs.gettysburg.edu

Transcript and Presenter's Notes

1
Pondering Probabilistic Play Policies for Pig
  • Todd W. Neller
  • Gettysburg College

2
Sow, What's This All About?
  • The Dice Game Pig
  • Odds and Ends
  • Playing to Win
  • Piglet
  • Value Iteration
  • Machine Learning

3
Pig: The Game
  • Object: First to score 100 points
  • On your turn, roll until
  • You roll a 1, and score NOTHING.
  • You hold, and KEEP the sum.
  • Simple game → simple strategy?
  • Let's play!

4
Playing to Score
  • Simple odds argument:
  • Roll until you risk more than you stand to gain.
  • Hold at 20:
  • 1/6 of the time: −20 → −20/6
  • 5/6 of the time: +4 (avg. of 2,3,4,5,6) → +20/6
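
That break-even arithmetic can be checked directly (a quick sketch; exact fractions avoid rounding noise):

```python
from fractions import Fraction

# One more roll with a turn total of 20:
# a roll of 1 (prob 1/6) loses the 20 banked this turn;
# any other roll r (prob 1/6 each) adds r to the turn total.
turn_total = 20
gain = Fraction(1, 6) * (-turn_total) + sum(Fraction(r, 6) for r in range(2, 7))
print(gain)  # 0: at 20, the risk exactly balances the expected gain
```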

5
Hold at 20?
  • Is there a situation in which you wouldn't want
    to hold at 20?
  • Your score: 99 → hold as soon as you roll a 2
  • Worst-case scenario:
  • You: 79, Opponent: 99
  • Your turn total stands at 20

6
What's Wrong With Playing to Score?
  • It's mathematically optimal!
  • But what are we optimizing?
  • Playing to score ≠ playing to win
  • Optimizing score per turn ≠ optimizing
    probability of a win

7
Piglet
  • Simpler version of Pig with a coin
  • Object: First to score 10 points
  • On your turn, flip until
  • You flip tails, and score NOTHING.
  • You hold, and KEEP the # of heads.
  • Even simpler: play to 2 points

8
Essential Information
  • What is the information I need to make a fully
    informed decision?
  • My score
  • The opponent's score
  • My turn score

9
A Little Notation
  • P(i,j,k) = probability of a win, where i = my score,
    j = the opponent's score, k = my turn score
  • Hold: P(i,j,k) = 1 - P(j, i+k, 0)
  • Flip: P(i,j,k) = ½(1 - P(j,i,0)) + ½ P(i,j,k+1)

10
Assume Rationality
  • To make a smart player, assume a smart opponent.
  • (To make a smarter player, know your opponent.)
  • P(i,j,k) = max(1 - P(j, i+k, 0), ½(1 - P(j,i,0)
    + P(i,j,k+1)))
  • Probability of win based on best decisions in any
    state
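
That max can be written as a one-step backup. A minimal sketch (the dict `p`, the helper name `piglet_backup`, and treating goal-reaching states as certain wins are my assumptions, not from the slides):

```python
def piglet_backup(p, i, j, k, goal=2):
    """One application of P(i,j,k) = max(hold, flip) for Piglet.

    p maps non-winning states (i, j, k) to current win-probability
    estimates; a state whose banked total reaches the goal is a sure win.
    """
    def P(i2, j2, k2):
        if i2 + k2 >= goal:      # banking the turn total reaches the goal
            return 1.0
        return p[(i2, j2, k2)]
    hold = 1 - P(j, i + k, 0)    # bank k points; the opponent moves next
    flip = 0.5 * ((1 - P(j, i, 0)) + P(i, j, k + 1))
    return max(hold, flip)
```

With all estimates at 0, the backup for the start state gives max(1, ½) = 1; later sweeps pull the values back down toward the true probabilities.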

11
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(1 - P(0,0,0) + P(0,0,2)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(1 - P(1,0,0) + P(0,1,2)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(1 - P(0,1,0) + P(1,0,1)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(1 - P(1,1,0) + P(1,1,1)))

12
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(1 - P(0,0,0) + P(0,0,2)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(1 - P(1,0,0) + P(0,1,2)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(1 - P(0,1,0) + P(1,0,1)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(1 - P(1,1,0) + P(1,1,1)))

These are winning states!
13
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(1 - P(0,0,0) + 1))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(1 - P(1,0,0) + 1))
  • P(1,0,0) = max(1 - P(0,1,0), ½(1 - P(0,1,0) + 1))
  • P(1,1,0) = max(1 - P(1,1,0), ½(1 - P(1,1,0) + 1))
  • Simplified

14
The Whole Story
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(2 - P(0,0,0)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(2 - P(1,0,0)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(2 - P(0,1,0)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(2 - P(1,1,0)))
  • And simplified more into a hamsome set of
    equations

15
How to Solve It?
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(2 - P(0,0,0)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(2 - P(1,0,0)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(2 - P(0,1,0)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(2 - P(1,1,0)))
  • P(0,1,0) depends on P(0,1,1), which depends on P(1,0,0),
    which depends on P(0,1,0), which depends on ...

16
A System of Pigquations
Dependencies between non-winning states
17
How Bad Is It?
  • The intersection of a set of bent hyperplanes in
    a hypercube
  • In the general case, no known method (read: PhD
    research)
  • Is there a method that works (without being
    guaranteed to work in general)?
  • Yes! Value Iteration!

18
Value Iteration
  • Start out with some values (0s, 1s, random #s)
  • Do the following until the values converge (stop
    changing):
  • Plug the values into the RHSs
  • Recompute the LHS values
  • That's easy. Let's do it!
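
Those steps, applied to Piglet, fit in a few lines. A sketch under my own assumptions (play-to-2 by default, all values initialized to 0, in-place sweeps until the largest change is tiny):

```python
def solve_piglet(goal=2, eps=1e-9):
    """Value iteration on P(i, j, k) = max(hold, flip) for Piglet."""
    # Non-winning states: both scores below the goal, and a turn
    # total that has not yet reached it.
    states = [(i, j, k)
              for i in range(goal) for j in range(goal)
              for k in range(goal - i)]
    p = {s: 0.0 for s in states}                # start with all zeros

    def P(i, j, k):
        return 1.0 if i + k >= goal else p[(i, j, k)]

    while True:
        delta = 0.0                             # biggest change this sweep
        for (i, j, k) in states:
            hold = 1 - P(j, i + k, 0)           # bank k; opponent moves
            flip = 0.5 * ((1 - P(j, i, 0)) + P(i, j, k + 1))
            new = max(hold, flip)
            delta = max(delta, abs(new - p[(i, j, k)]))
            p[(i, j, k)] = new
        if delta < eps:                         # values stopped changing
            return p

probs = solve_piglet()
print(probs[(0, 0, 0)])   # first player's winning chances, about 4/7
```

By my arithmetic the play-to-2 values settle at P(0,0,0) = 4/7, with the flip option winning the max in every state.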

19
Value Iteration
  • P(0,0,0) = max(1 - P(0,0,0), ½(1 - P(0,0,0) + P(0,0,1)))
  • P(0,0,1) = max(1 - P(0,1,0), ½(2 - P(0,0,0)))
  • P(0,1,0) = max(1 - P(1,0,0), ½(1 - P(1,0,0) + P(0,1,1)))
  • P(0,1,1) = max(1 - P(1,1,0), ½(2 - P(1,0,0)))
  • P(1,0,0) = max(1 - P(0,1,0), ½(2 - P(0,1,0)))
  • P(1,1,0) = max(1 - P(1,1,0), ½(2 - P(1,1,0)))
  • Assume P(i,j,k) is 0 unless it's a win
  • Repeat: Compute RHSs, assign to LHSs

20
But That's GRUNT Work!
  • So have a computer do it, slacker!
  • Not difficult: end-of-CS1 level
  • Fast! Don't blink or you'll miss it
  • Optimal play:
  • Compute the probabilities
  • Determine flip/hold from the RHS maxes
  • (For our equations, always FLIP)

21
Piglet Solved
  • Game to 10
  • Play to Score: Hold at 1
  • Play to Win

[Chart: optimal policy plotted over Opponent score vs. Your score]
22
Pig Probabilities
  • Just like Piglet, but more possible outcomes
  • P(i,j,k) = max(1 - P(j, i+k, 0), 1/6((1 -
    P(j,i,0)) + P(i,j,k+2) + P(i,j,k+3) +
    P(i,j,k+4) + P(i,j,k+5) + P(i,j,k+6)))
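
The roll branch now averages six outcomes instead of two. A sketch (the function name and passing `P` as a lookup that returns 1 for goal-reaching states are my assumptions):

```python
def pig_backup(P, i, j, k):
    """One application of the Pig equation: max of hold vs. roll."""
    hold = 1 - P(j, i + k, 0)     # bank the turn total; opponent moves
    # 1/6 chance: roll a 1 and the turn is lost;
    # otherwise a roll r = 2..6 is added to the turn total.
    roll = (1 / 6) * ((1 - P(j, i, 0)) +
                      sum(P(i, j, k + r) for r in range(2, 7)))
    return max(hold, roll)
```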

23
Solving Pig
  • 505,000 such equations
  • Same simple solution method (value iteration)
  • Speedup: Solve groups of interdependent
    probabilities
  • Watch and see!

24
Pig Sow-lution
25
Pig Sow-lution
26
Reachable States
[Figure: reachable states over Player 1 score (i) vs. Player 2 score (j)]
27
Reachable States
28
Sow-lution for Reachable States
29
Probability Contours
30
Summary
  • Playing to score is not playing to win.
  • A simple game is not always simple to play.
  • The computer is an exciting power tool for the
    mind!

31
When Value Iteration Isn't Enough
  • Value Iteration assumes a model of the problem
  • Probabilities of state transitions
  • Expected rewards for transitions
  • Loaded die?
  • Optimal play vs. suboptimal player?
  • Game rules unknown?

32
No Model? Then Learn!
  • Can't write equations → can't solve
  • Must learn from experience!
  • Reinforcement Learning
  • Learn optimal sequences of actions
  • From experience
  • Given positive/negative feedback

33
Clever Mabel the Cat
34
Clever Mabel the Cat
  • Mabel claws new La-Z-Boy → BAD!
  • Cats hate water → spray bottle = negative
    reinforcement
  • Mabel claws La-Z-Boy → Todd gets up → Todd sprays
    Mabel → Mabel gets negative feedback
  • Mabel learns

35
Clever Mabel the Cat
  • Mabel learns to run when Todd gets up.
  • Mabel first learns local causality
  • Todd gets up → Todd sprays Mabel
  • Mabel eventually sees no correlation, learns
    indirect cause
  • Mabel happily claws carpet. The End.

36
Backgammon
  • Tesauro's TD-Gammon
  • Reinforcement Learning + Neural Network (memory
    for learning)
  • Learned backgammon through self-play
  • Got better than all but a handful of people in
    the world!
  • Downside Took 1.5 million games to learn

37
Greased Pig
  • My continuous variant of Pig
  • Object: First to score 100 points
  • On your turn, generate a random number from 0.5
    to 6.5 until
  • Your rounded number is 1, and you score NOTHING.
  • You hold, and KEEP the sum.
  • How does this change things?
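
One Greased Pig turn can be simulated directly (a sketch; the fixed hold-at-20 policy and banking the unrounded sum are my assumptions):

```python
import random

def greased_pig_turn(hold_at=20.0):
    """Simulate one Greased Pig turn under a simple hold-at policy."""
    total = 0.0
    while total < hold_at:
        x = random.uniform(0.5, 6.5)
        if round(x) == 1:        # draws in (0.5, 1.5) round to 1: prob 1/6
            return 0.0           # pig! the whole turn is lost
        total += x               # keep the unrounded sum
    return total
```

Every turn ends at 0 or at 20-something, but the scores themselves are now continuous, which is exactly what creates the challenges on the next slide.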

38
Greased Pig Challenges
  • Infinite possible game states
  • Infinite possible games
  • Limited experience
  • Limited memory
  • Learning and approximation challenge

39
Summary
  • Solving equations can only take you so far (but
    much farther than we can fathom).
  • Machine learning is an exciting area of research
    that can take us farther.
  • The power of computing will increasingly aid in
    our learning in the future.