Title: Game Playing
1
game playing
Notes adapted from lecture notes for CMSC 421 by
B.J. Dorr
2
Outline
  • What are games?
  • Optimal decisions in games
  • Which strategy leads to success?
  • α-β pruning
  • Games of imperfect information
  • Games that include an element of chance

3
What are games and why study them?
  • Games are a form of multi-agent environment
  • What do other agents do and how do they affect
    our success?
  • Cooperative vs. competitive multi-agent
    environments.
  • Competitive multi-agent environments give rise to
    adversarial problems a.k.a. games
  • Why study games?
  • Fun; historically entertaining
  • Interesting subject of study because they are
    hard
  • Easy to represent, and agents are restricted to
    a small number of actions

4
Relation of Games to Search
  • Search: no adversary
  • Solution is a (heuristic) method for finding a
    goal
  • Heuristics and CSP techniques can find the
    optimal solution
  • Evaluation function: estimate of cost from start
    to goal through a given node
  • Examples: path planning, scheduling activities
  • Games: adversary
  • Solution is a strategy (a strategy specifies a
    move for every possible opponent reply)
  • Time limits force an approximate solution
  • Evaluation function: evaluates the goodness of a
    game position
  • Examples: chess, checkers, Othello, backgammon

5
Types of Games
6
Game setup
  • Two players: MAX and MIN
  • MAX moves first, and they take turns until the
    game is over. The winner gets a reward, the
    loser a penalty.
  • Games as search:
  • Initial state: e.g. the board configuration in
    chess
  • Successor function: a list of (move, state) pairs
    specifying legal moves
  • Terminal test: is the game finished?
  • Utility function: gives a numerical value for
    terminal states, e.g. win (+1), lose (−1), and
    draw (0) in tic-tac-toe (next)
  • MAX uses a search tree to determine its next move.

7
Partial Game Tree for Tic-Tac-Toe
8
Optimal strategies
  • Find the contingent strategy for MAX assuming an
    infallible MIN opponent.
  • Assumption: both players play optimally!
  • Given a game tree, the optimal strategy can be
    determined by using the minimax value of each
    node
  • MINIMAX-VALUE(n) =
  • UTILITY(n), if n is a terminal node
  • max_{s ∈ successors(n)} MINIMAX-VALUE(s), if n is
    a MAX node
  • min_{s ∈ successors(n)} MINIMAX-VALUE(s), if n is
    a MIN node

9
Two-Ply Game Tree
10
Two-Ply Game Tree
  • MAX nodes
  • MIN nodes
  • Terminal nodes: utility values for MAX
  • Other nodes: minimax values
  • MAX's best move at the root is a1: it leads to the
    successor with the highest minimax value
  • MIN's best reply is b1: it leads to the successor
    with the lowest minimax value

11
What if MIN does not play optimally?
  • The definition of optimal play for MAX assumes MIN
    plays optimally: it maximizes the worst-case
    outcome for MAX.
  • But if MIN does not play optimally, MAX will do at
    least as well, and possibly better. This can be
    proved.

12
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
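The pseudocode above can be sketched as runnable Python. The game interface (successors, terminal_test, utility functions) and the sample two-ply tree below are illustrative assumptions, not part of the slides.

```python
# Minimal runnable sketch of MINIMAX-DECISION; the function names mirror
# the pseudocode: successors yields (action, state) pairs.

def minimax_decision(state, successors, terminal_test, utility):
    """Return the action whose successor has the highest minimax value."""
    best_action, best_value = None, float('-inf')
    for a, s in successors(state):
        v = min_value(s, successors, terminal_test, utility)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('-inf')
    for a, s in successors(state):
        v = max(v, min_value(s, successors, terminal_test, utility))
    return v

def min_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('inf')
    for a, s in successors(state):
        v = min(v, max_value(s, successors, terminal_test, utility))
    return v

# Hypothetical two-ply tree in the spirit of the earlier example;
# leaves are utility values for MAX.
tree = {'root': [('a1', 'B'), ('a2', 'C'), ('a3', 'D')],
        'B': [('b1', 3), ('b2', 12), ('b3', 8)],
        'C': [('c1', 2), ('c2', 4), ('c3', 6)],
        'D': [('d1', 14), ('d2', 5), ('d3', 2)]}
succ = lambda s: tree[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
print(minimax_decision('root', succ, term, util))  # -> a1
```

MAX picks a1 because its MIN node backs up 3, the highest of the three minimax values (3, 2, 2).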
13
Properties of Minimax
  • Complete? Yes, if the tree is finite
  • Optimal? Yes, against an optimal opponent
  • Time complexity? O(b^m)
  • Space complexity? O(bm) (depth-first exploration)
14
Tic-Tac-Toe
  • Depth bound: 2
  • Breadth-first search until all nodes at level 2
    are generated
  • Apply evaluation function to positions at these
    nodes

15
Tic-Tac-Toe
  • Evaluation function e(p) of a position p
  • If p is not a winning position for either player:
  • e(p) = (number of complete rows, columns, or
    diagonals that are still open for MAX) − (number
    of complete rows, columns, or diagonals that are
    still open for MIN)
  • If p is a win for MAX:
  • e(p) = +∞
  • If p is a win for MIN:
  • e(p) = −∞
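This evaluation function can be sketched in Python. The board representation (a 3×3 grid of 'x', 'o', or None, with MAX playing 'x') is an assumption made for illustration.

```python
import math

# All eight complete lines of the 3x3 board, as lists of (row, col) cells.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +               # rows
         [[(r, c) for r in range(3)] for c in range(3)] +               # columns
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])  # diagonals

def open_lines(board, player):
    """A line is still open for player if it contains no opponent mark."""
    other = 'o' if player == 'x' else 'x'
    return sum(all(board[r][c] != other for r, c in line) for line in LINES)

def won(board, player):
    return any(all(board[r][c] == player for r, c in line) for line in LINES)

def e(board):
    """e(p): open lines for MAX minus open lines for MIN; +/-inf for wins."""
    if won(board, 'x'):
        return math.inf
    if won(board, 'o'):
        return -math.inf
    return open_lines(board, 'x') - open_lines(board, 'o')

empty = [[None] * 3 for _ in range(3)]
center = [[None] * 3 for _ in range(3)]
center[1][1] = 'x'
print(e(empty), e(center))  # -> 0 4
```

On the empty board both players have all 8 lines open, so e = 0; with MAX in the center, MAX keeps all 8 lines while MIN retains only the 4 lines avoiding the center, so e = 8 − 4 = 4.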

16
Tic-Tac-Toe - First stage
[Figure: first stage of the search. MAX's move; boards one ply deep with MIN's replies below them, each labeled with its static evaluation, e.g. 6 − 5 = 1, 5 − 5 = 0, 5 − 6 = −1, 4 − 5 = −1, 4 − 6 = −2, 6 − 6 = 0.]
17
Tic-Tac-Toe - Second stage
[Figure: second stage of the search. MAX's move; boards labeled with static evaluations such as 3 − 2 = 1, 4 − 2 = 2, 4 − 3 = 1, 3 − 3 = 0, 5 − 2 = 3, 5 − 3 = 2.]
18
Multiplayer games
  • Some games allow more than two players
  • Scalar minimax values become vectors

19
Problem of minimax search
  • The number of game states is exponential in the
    number of moves.
  • Solution: do not examine every node
  • Alpha-beta pruning
  • Alpha: value of the best choice found so far at
    any choice point along the path for MAX
  • Beta: value of the best choice found so far at
    any choice point along the path for MIN
  • Revisit the example

20
Tic-Tac-Toe - First stage
[Figure: first-stage tree revisited, with nodes A, B, and C marked. The start node's alpha value is −1 (from node A's backed-up value); node C's static value of −1 gives node B a beta value of −1. Static evaluations include 4 − 5 = −1, 5 − 5 = 0, 6 − 5 = 1, 5 − 6 = −1.]
21
Alpha-beta pruning
  • Search progresses in a depth-first manner
  • Whenever a tip node is generated, its static
    evaluation is computed
  • Whenever a position can be given a backed-up
    value, this value is computed
  • Node A and all its successors have been generated;
    its backed-up value is −1
  • Node B and its successors have not yet been
    generated
  • We now know that the backed-up value of the start
    node is bounded from below by −1

22
Alpha-beta pruning
  • Depending on the backed-up values of the other
    successors of the start node, the final backed-up
    value of the start node may be greater than −1,
    but it cannot be less
  • This lower bound is the alpha value of the start
    node

23
Alpha-beta pruning
  • Depth-first search proceeds: node B and its
    first successor, node C, are generated
  • Node C is given a static value of −1
  • The backed-up value of node B is therefore
    bounded from above by −1
  • This upper bound is the beta value of node B
  • Therefore, search below node B can be
    discontinued: node B cannot turn out to be
    preferable to node A

24
Alpha-beta pruning
  • The reduction in search effort is achieved by
    keeping track of bounds on backed-up values
  • As successors of a node are given backed-up
    values, these bounds can be revised
  • Alpha values of MAX nodes can never decrease
  • Beta values of MIN nodes can never increase

25
Alpha-beta pruning
  • Therefore, search can be discontinued
  • Below any MIN node having a beta value less than
    or equal to the alpha value of any of its MAX
    node ancestors; the final backed-up value of this
    MIN node can then be set to its beta value
  • Below any MAX node having an alpha value greater
    than or equal to the beta value of any of its
    MIN node ancestors; the final backed-up value of
    this MAX node can then be set to its alpha value

26
Alpha-beta pruning
  • During search, alpha and beta values are computed
    as follows
  • The alpha value of a MAX node is set equal to the
    current largest final backed-up value of its
    successors
  • The beta value of a MIN node is set equal to the
    current smallest final backed-up value of its
    successors

27
Alpha-Beta Example
Do DF-search until first leaf
Range of possible values: root [−∞, +∞]; first MIN node [−∞, +∞]
28
Alpha-Beta Example (continued)
Root: [−∞, +∞]; first MIN node: [−∞, 3]
29
Alpha-Beta Example (continued)
Root: [−∞, +∞]; first MIN node: [−∞, 3]
30
Alpha-Beta Example (continued)
Root: [3, +∞]; first MIN node: [3, 3]
31
Alpha-Beta Example (continued)
Root: [3, +∞]
This node is worse for MAX
Second MIN node: [−∞, 2]; first MIN node: [3, 3]
32
Alpha-Beta Example (continued)
Root: [3, 14]; second MIN node: [−∞, 2]; first MIN node: [3, 3]; third MIN node: [−∞, 14]
33
Alpha-Beta Example (continued)
Root: [3, 5]; second MIN node: [−∞, 2]; first MIN node: [3, 3]; third MIN node: [−∞, 5]
34
Alpha-Beta Example (continued)
Root: [3, 3]; third MIN node: [2, 2]; second MIN node: [−∞, 2]; first MIN node: [3, 3]
35
Alpha-Beta Example (continued)
Root: [3, 3]; third MIN node: [2, 2]; second MIN node: [−∞, 2]; first MIN node: [3, 3]
36
Alpha-Beta Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
37
Alpha-Beta Algorithm
function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
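The two pseudocode slides can be sketched together as runnable Python. As in the minimax sketch, the (successors, terminal_test, utility) interface and the sample tree are illustrative assumptions.

```python
# Sketch of ALPHA-BETA-SEARCH: identical in shape to minimax, but each
# recursive call carries the bounds (alpha, beta) and cuts off when they cross.

def alpha_beta_search(state, successors, terminal_test, utility):
    best_action, alpha = None, float('-inf')
    for a, s in successors(state):
        v = min_value(s, alpha, float('inf'), successors, terminal_test, utility)
        if v > alpha:
            best_action, alpha = a, v
    return best_action

def max_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('-inf')
    for a, s in successors(state):
        v = max(v, min_value(s, alpha, beta, successors, terminal_test, utility))
        if v >= beta:           # beta cutoff: a MIN ancestor will avoid this node
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('inf')
    for a, s in successors(state):
        v = min(v, max_value(s, alpha, beta, successors, terminal_test, utility))
        if v <= alpha:          # alpha cutoff: a MAX ancestor will avoid this node
            return v
        beta = min(beta, v)
    return v

# Illustrative two-ply tree; leaves are utility values for MAX.
tree = {'root': [('a1', 'B'), ('a2', 'C'), ('a3', 'D')],
        'B': [('b1', 3), ('b2', 12), ('b3', 8)],
        'C': [('c1', 2), ('c2', 4), ('c3', 6)],
        'D': [('d1', 14), ('d2', 5), ('d3', 2)]}
succ = lambda s: tree[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
print(alpha_beta_search('root', succ, term, util))  # -> a1
```

On this tree the second MIN node is cut off after its first leaf (2 ≤ α = 3), so its remaining successors are never examined, yet the chosen move is the same as full minimax.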
38
General alpha-beta pruning
  • Consider a node n somewhere in the tree
  • If the player has a better choice at
  • the parent node of n
  • or any choice point further up,
  • then n will never be reached in actual play.
  • Hence, once enough is known about n, it can be
    pruned.

39
Final Comments about Alpha-Beta Pruning
  • Pruning does not affect the final result
  • Entire subtrees can be pruned
  • Good move ordering improves the effectiveness of
    pruning
  • Alpha-beta pruning can look ahead roughly twice
    as far as minimax in the same amount of time
  • Repeated states are again possible
  • Store them in memory: a transposition table
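A transposition table can be sketched as a dictionary keyed by (state, player-to-move). This sketch memoizes plain minimax for clarity; with alpha-beta, stored entries would also need to record whether a value is exact or only a bound. The tree and hashable-state assumption are illustrative.

```python
# Sketch of a transposition table: backed-up values are stored by position,
# so a repeated state is looked up instead of being searched again.

table = {}

def minimax_value(state, successors, terminal_test, utility, is_max=True):
    key = (state, is_max)
    if key in table:
        return table[key]           # repeated state: reuse the stored value
    if terminal_test(state):
        v = utility(state)
    else:
        values = [minimax_value(s, successors, terminal_test, utility, not is_max)
                  for _, s in successors(state)]
        v = max(values) if is_max else min(values)
    table[key] = v
    return v

# Illustrative tree in which the leaf value 3 recurs under both MIN nodes.
tree = {'root': [('a1', 'B'), ('a2', 'C')],
        'B': [('b1', 3), ('b2', 12)],
        'C': [('c1', 2), ('c2', 3)]}
succ = lambda s: tree[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
print(minimax_value('root', succ, term, util))  # -> 3
```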

40
Games that include chance
  • Possible moves: (5-10,5-11), (5-11,19-24),
    (5-10,10-16) and (5-11,11-16)

41
Games that include chance
chance nodes
  • Possible moves: (5-10,5-11), (5-11,19-24),
    (5-10,10-16) and (5-11,11-16)
  • Rolls (1,1) and (6,6) have probability 1/36;
    every other roll has probability 1/18

42
Games that include chance
  • Rolls (1,1) and (6,6) have probability 1/36;
    every other roll has probability 1/18
  • We cannot calculate a definite minimax value,
    only an expected value

43
Expected minimax value
  • EXPECTED-MINIMAX-VALUE(n) =
  • UTILITY(n), if n is a terminal node
  • max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s),
    if n is a MAX node
  • min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s),
    if n is a MIN node
  • Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s),
    if n is a chance node
  • These equations can be backed up recursively all
    the way to the root of the game tree.
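The equations above can be sketched in Python over an explicit tree. The node encoding (tagged tuples, with bare numbers as terminal utilities) and the dice example are illustrative assumptions.

```python
# Sketch of EXPECTED-MINIMAX-VALUE over tagged tuples:
#   ('max', [children]) / ('min', [children]) /
#   ('chance', [(prob, child), ...]) / a bare number (terminal utility).

def expected_minimax(node):
    if isinstance(node, (int, float)):
        return node                                   # UTILITY(n)
    kind, children = node
    if kind == 'max':
        return max(expected_minimax(c) for c in children)
    if kind == 'min':
        return min(expected_minimax(c) for c in children)
    if kind == 'chance':                              # sum of P(s) * value(s)
        return sum(p * expected_minimax(c) for p, c in children)
    raise ValueError(f'unknown node kind: {kind}')

# A coin-flip chance node over two MIN replies:
# 0.5 * min(2, 4) + 0.5 * min(0, 6) = 0.5 * 2 + 0.5 * 0
game = ('chance', [(0.5, ('min', [2, 4])),
                   (0.5, ('min', [0, 6]))])
print(expected_minimax(game))  # -> 1.0
```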

44
Position evaluation with chance nodes
  • Left: A1 wins
  • Right: A2 wins
  • The outcome of the evaluation function may change
    when values are scaled differently.
  • Behavior is preserved only under a positive
    linear transformation of EVAL.

45
Discussion
  • Examine the section on state-of-the-art game
    programs yourself
  • Minimax assumes the right tree is better than the
    left one, yet ...
  • One could instead return a probability
    distribution over possible values
  • But this is an expensive calculation

46
Discussion
  • Utility of node expansion
  • Expand only those nodes that lead to significantly
    better moves
  • Both suggestions require meta-reasoning

47
Summary
  • Games are fun (and dangerous)
  • They illustrate several important points about AI
  • Perfection is unattainable, so we must approximate
  • It is a good idea to think about what to think
    about
  • Uncertainty constrains the assignment of values
    to states
  • Games are to AI as Grand Prix racing is to
    automobile design.