Title: Game Playing

1. Game Playing
Notes adapted from lecture notes for CMSC 421 by B.J. Dorr
2. Outline
- What are games?
- Optimal decisions in games
- Which strategy leads to success?
- α-β pruning
- Games of imperfect information
- Games that include an element of chance
3. What Are Games, and Why Study Them?
- Games are a form of multi-agent environment
  - What do the other agents do, and how do they affect our success?
  - Cooperative vs. competitive multi-agent environments
  - Competitive multi-agent environments give rise to adversarial problems, a.k.a. games
- Why study games?
  - Fun; historically entertaining
  - Interesting subject of study because they are hard
  - Easy to represent, and agents are restricted to a small number of actions
4. Relation of Games to Search
- Search: no adversary
  - Solution is a (heuristic) method for finding a goal
  - Heuristics and CSP techniques can find the optimal solution
  - Evaluation function: estimates the cost from the start to the goal through a given node
  - Examples: path planning, scheduling activities
- Games: adversary
  - Solution is a strategy (a strategy specifies a move for every possible opponent reply)
  - Time limits force an approximate solution
  - Evaluation function: evaluates the goodness of a game position
  - Examples: chess, checkers, Othello, backgammon
5. Types of Games
[Table: games classified along two dimensions, deterministic vs. chance moves and perfect vs. imperfect information.]
6. Game Setup
- Two players: MAX and MIN
- MAX moves first, and the players take turns until the game is over; the winner gets a reward, the loser gets a penalty
- Games as search (a minimal code sketch of this formulation follows below):
  - Initial state: e.g. the board configuration of chess
  - Successor function: list of (move, state) pairs specifying legal moves
  - Terminal test: is the game finished?
  - Utility function: gives a numerical value for terminal states, e.g. win (+1), lose (-1) and draw (0) in tic-tac-toe (next)
- MAX uses a search tree to determine its next move.
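For concreteness, here is a minimal Python sketch of this formulation for tic-tac-toe. The board encoding (a tuple of 9 cells) and the helper names are my own illustrative choices, not part of the original notes.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

INITIAL_STATE = (' ',) * 9        # empty board; MAX plays 'x', MIN plays 'o'

def to_move(state):
    """MAX ('x') moves first; the players then alternate."""
    return 'x' if state.count('x') == state.count('o') else 'o'

def successors(state):
    """List of (move, state) pairs specifying the legal moves."""
    player = to_move(state)
    return [(i, state[:i] + (player,) + state[i + 1:])
            for i, cell in enumerate(state) if cell == ' ']

def winner(state):
    for a, b, c in WIN_LINES:
        if state[a] != ' ' and state[a] == state[b] == state[c]:
            return state[a]
    return None

def terminal_test(state):
    """Terminal test: is the game finished?"""
    return winner(state) is not None or ' ' not in state

def utility(state):
    """Utility of a terminal state for MAX: win +1, lose -1, draw 0."""
    return {'x': 1, 'o': -1, None: 0}[winner(state)]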
7. Partial Game Tree for Tic-Tac-Toe
8. Optimal Strategies
- Find the contingent strategy for MAX assuming an infallible MIN opponent.
- Assumption: both players play optimally!
- Given a game tree, the optimal strategy can be determined from the minimax value of each node:
- MINIMAX-VALUE(n) =
  - UTILITY(n)                                    if n is a terminal node
  - max_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MAX node
  - min_{s ∈ successors(n)} MINIMAX-VALUE(s)      if n is a MIN node
9. Two-Ply Game Tree
10. Two-Ply Game Tree
- MAX nodes
- MIN nodes
- Terminal nodes: utility values for MAX
- Other nodes: minimax values
- MAX's best move at the root is a1: it leads to the successor with the highest minimax value
- MIN's best reply is b1: it leads to the successor with the lowest minimax value
11. What If MIN Does Not Play Optimally?
- The definition of optimal play for MAX assumes that MIN plays optimally: it maximizes the worst-case outcome for MAX.
- But if MIN does not play optimally, MAX will do even better (this can be proved).
12. Minimax Algorithm

function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
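As a runnable companion to the pseudocode, here is a small Python sketch of MINIMAX-DECISION on an explicit two-ply game tree. The tree encoding is my own illustration; its leaf values (3, 12, 8 / 2, 4, 6 / 14, 5, 2) are chosen to be consistent with the value ranges shown in the alpha-beta example later.

GAME_TREE = {
    'A': [('a1', 'B'), ('a2', 'C'), ('a3', 'D')],   # MAX to move at the root
    'B': [('b1', 3), ('b2', 12), ('b3', 8)],        # MIN to move at B, C, D
    'C': [('c1', 2), ('c2', 4), ('c3', 6)],
    'D': [('d1', 14), ('d2', 5), ('d3', 2)],
}

def successors(state):
    return GAME_TREE[state]

def terminal_test(state):
    return state not in GAME_TREE        # leaves are plain utility values

def utility(state):
    return state

def max_value(state):
    if terminal_test(state):
        return utility(state)
    return max(min_value(s) for _, s in successors(state))

def min_value(state):
    if terminal_test(state):
        return utility(state)
    return min(max_value(s) for _, s in successors(state))

def minimax_decision(state):
    """Return the action leading to the successor with the best minimax value."""
    return max(successors(state), key=lambda move_state: min_value(move_state[1]))[0]

print(minimax_decision('A'))   # -> 'a1' (the root's minimax value is 3)

This is exhaustive search; the depth-bounded variant used for tic-tac-toe below simply replaces UTILITY at the depth cutoff with the evaluation function e(p).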
13. Properties of Minimax
- Complete? Yes, if the game tree is finite
- Optimal? Yes, against an optimal opponent
- Time complexity? O(b^m), for branching factor b and maximum depth m
- Space complexity? O(bm) (depth-first exploration)
14. Tic-Tac-Toe
- Depth bound: 2
- Breadth-first search until all nodes at level 2 are generated
- Apply the evaluation function to the positions at these nodes
15. Tic-Tac-Toe
- Evaluation function e(p) of a position p (a code sketch follows below):
  - If p is not a winning position for either player:
    e(p) = (number of complete rows, columns, or diagonals that are still open for MAX) - (number of complete rows, columns, or diagonals that are still open for MIN)
  - If p is a win for MAX: e(p) = +∞
  - If p is a win for MIN: e(p) = -∞
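A minimal sketch of this evaluation function, assuming the same 9-cell tuple board encoding used in the earlier tic-tac-toe sketch (an illustrative assumption, not part of the original notes):

import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def e(p):
    """Open lines for MAX minus open lines for MIN; +/- infinity for a win."""
    for a, b, c in LINES:
        if p[a] != ' ' and p[a] == p[b] == p[c]:
            return math.inf if p[a] == 'x' else -math.inf
    open_for_max = sum(1 for line in LINES if all(p[i] != 'o' for i in line))
    open_for_min = sum(1 for line in LINES if all(p[i] != 'x' for i in line))
    return open_for_max - open_for_min

# Example: 'x' in the centre, 'o' in the top-left corner.
board = ('o', ' ', ' ',
         ' ', 'x', ' ',
         ' ', ' ', ' ')
print(e(board))   # 5 - 4 = 1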
16. Tic-Tac-Toe: First Stage
[Game-tree diagram: MAX's possible opening moves and MIN's replies, each resulting position scored with e(p), e.g. 6 - 4 = 2, 5 - 4 = 1, 4 - 6 = -2, 6 - 6 = 0, 4 - 5 = -1, 5 - 6 = -1, 5 - 5 = 0, 6 - 5 = 1; MAX's move is indicated.]
17. Tic-Tac-Toe: Second Stage
[Game-tree diagram: the second stage of the search, with MAX to move and each position scored with e(p), e.g. 3 - 2 = 1, 4 - 2 = 2, 4 - 3 = 1, 5 - 2 = 3, 3 - 3 = 0, 5 - 3 = 2.]
18. Multiplayer Games
- Games allow more than two players
- Single minimax values become vectors
19. Problem of Minimax Search
- The number of game states is exponential in the number of moves.
- Solution: do not examine every node
  - Alpha-beta pruning
  - Alpha: value of the best choice found so far at any choice point along the path for MAX
  - Beta: value of the best choice found so far at any choice point along the path for MIN
- Revisit the example
20. Tic-Tac-Toe: First Stage
[Game-tree diagram: node A and its successors have been evaluated, giving the start node an alpha value of -1; node B's first successor, node C, has static value -1, giving node B a beta value of -1.]
21. Alpha-Beta Pruning
- Search progresses in a depth-first manner
- Whenever a tip node is generated, its static evaluation is computed
- Whenever a position can be given a backed-up value, this value is computed
- Node A and all its successors have been generated
  - Backed-up value: -1
- Node B and its successors have not yet been generated
- Now we know that the backed-up value of the start node is bounded from below by -1
22. Alpha-Beta Pruning
- Depending on the backed-up values of the other successors of the start node, the final backed-up value of the start node may be greater than -1, but it cannot be less
- This lower bound is the alpha value of the start node
23. Alpha-Beta Pruning
- Depth-first search proceeds: node B and its first successor, node C, are generated
- Node C is given a static value of -1
- The backed-up value of node B is therefore bounded from above by -1
- This upper bound is the beta value of node B
- Therefore, search below node B can be discontinued:
  node B will not turn out to be preferable to node A
24. Alpha-Beta Pruning
- The reduction in search effort is achieved by keeping track of bounds on backed-up values
- As the successors of a node are given backed-up values, these bounds can be revised:
  - Alpha values of MAX nodes can never decrease
  - Beta values of MIN nodes can never increase
25. Alpha-Beta Pruning
- Therefore, search can be discontinued:
  - Below any MIN node having a beta value less than or equal to the alpha value of any of its MAX-node ancestors; the final backed-up value of this MIN node can then be set to its beta value
  - Below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN-node ancestors; the final backed-up value of this MAX node can then be set to its alpha value
26. Alpha-Beta Pruning
- During the search, alpha and beta values are computed as follows:
  - The alpha value of a MAX node is set equal to the current largest final backed-up value of its successors
  - The beta value of a MIN node is set equal to the current smallest final backed-up value of its successors
27. Alpha-Beta Example
- Do depth-first search until the first leaf
- Range of possible values: [-∞, +∞] at the root and [-∞, +∞] at the first MIN node
28. Alpha-Beta Example (continued)
- Root: [-∞, +∞]; first MIN node: [-∞, 3]
29. Alpha-Beta Example (continued)
- Root: [-∞, +∞]; first MIN node: still [-∞, 3]
30. Alpha-Beta Example (continued)
- Root: [3, +∞]; first MIN node: [3, 3]
31. Alpha-Beta Example (continued)
- Root: [3, +∞]; first MIN node: [3, 3]; second MIN node: [-∞, 2]
- This node is worse for MAX, so its remaining successors can be pruned
32. Alpha-Beta Example (continued)
- Root: [3, 14]; first MIN node: [3, 3]; second MIN node: [-∞, 2]; third MIN node: [-∞, 14]
33. Alpha-Beta Example (continued)
- Root: [3, 5]; first MIN node: [3, 3]; second MIN node: [-∞, 2]; third MIN node: [-∞, 5]
34. Alpha-Beta Example (continued)
- Root: [3, 3]; first MIN node: [3, 3]; second MIN node: [-∞, 2]; third MIN node: [2, 2]
35. Alpha-Beta Example (continued)
- Final state of the search: the root's minimax value is 3
36. Alpha-Beta Algorithm

function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, -∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← -∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
37. Alpha-Beta Algorithm

function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
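A runnable Python sketch of this algorithm on the same small explicit tree used in the earlier minimax sketch (an illustrative tree, not the slides' figure); note how the second MIN node is cut off after its first leaf, matching the example slides.

import math

GAME_TREE = {
    'A': [('a1', 'B'), ('a2', 'C'), ('a3', 'D')],   # MAX to move at the root
    'B': [('b1', 3), ('b2', 12), ('b3', 8)],        # MIN to move at B, C, D
    'C': [('c1', 2), ('c2', 4), ('c3', 6)],
    'D': [('d1', 14), ('d2', 5), ('d3', 2)],
}

def terminal_test(state):
    return state not in GAME_TREE             # leaves are plain utility values

def max_value(state, alpha, beta):
    if terminal_test(state):
        return state
    v = -math.inf
    for _, s in GAME_TREE[state]:
        v = max(v, min_value(s, alpha, beta))
        if v >= beta:                          # a MIN ancestor will never allow this
            return v                           # prune the remaining successors
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if terminal_test(state):
        return state
    v = math.inf
    for _, s in GAME_TREE[state]:
        v = min(v, max_value(s, alpha, beta))
        if v <= alpha:                         # a MAX ancestor already has >= alpha
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(state):
    """MAX-VALUE at the root, also remembering which action achieves the value."""
    best_action, alpha, beta = None, -math.inf, math.inf
    for action, s in GAME_TREE[state]:
        v = min_value(s, alpha, beta)
        if v > alpha:
            alpha, best_action = v, action
    return best_action

print(alpha_beta_search('A'))   # -> 'a1'; the leaves 4 and 6 are never examined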
38. General Alpha-Beta Pruning
- Consider a node n somewhere in the tree
- If a player has a better choice at
  - the parent node of n,
  - or at any choice point further up,
- then n will never be reached in actual play
- Hence, once enough is known about n, it can be pruned
39. Final Comments about Alpha-Beta Pruning
- Pruning does not affect the final result
- Entire subtrees can be pruned
- Good move ordering improves the effectiveness of pruning
- With good ordering, alpha-beta pruning can look ahead roughly twice as far as minimax in the same amount of time
- Repeated states are again possible
  - Store them in memory: a transposition table (a sketch follows below)
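A minimal sketch of the transposition-table idea for tic-tac-toe, where the same board position is often reached through different move orders; the board encoding and the use of functools.lru_cache as the table are my own illustrative choices.

from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)                    # the cache acts as a transposition table
def value(board, to_move):
    w = winner(board)
    if w is not None:
        return 1 if w == 'x' else -1        # win for MAX (+1) or for MIN (-1)
    if ' ' not in board:
        return 0                            # draw
    children = [value(board[:i] + (to_move,) + board[i + 1:],
                      'o' if to_move == 'x' else 'x')
                for i, cell in enumerate(board) if cell == ' ']
    return max(children) if to_move == 'x' else min(children)

print(value((' ',) * 9, 'x'))               # -> 0: optimal play is a draw
print(value.cache_info().currsize)          # far fewer entries than game-tree nodes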
40. Games That Include Chance
- Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16)
41. Games That Include Chance
- Chance nodes
- Possible moves: (5-10, 5-11), (5-11, 19-24), (5-10, 10-16) and (5-11, 11-16)
- The rolls 1-1 and 6-6 each have chance 1/36; every other distinct roll has chance 1/18
42. Games That Include Chance
- The rolls 1-1 and 6-6 each have chance 1/36; every other distinct roll has chance 1/18
- We cannot calculate a definite minimax value, only an expected value
43. Expected Minimax Value
- EXPECTED-MINIMAX-VALUE(n) =
  - UTILITY(n)                                              if n is a terminal node
  - max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)       if n is a MAX node
  - min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s)       if n is a MIN node
  - Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s)  if n is a chance node
- These equations can be backed up recursively all the way to the root of the game tree (a code sketch follows below).
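A small Python sketch of this recursion on an explicit tree containing a chance node; the tree, its probabilities, and its utilities are illustrative assumptions, not the dice example from the slides.

def expected_minimax(node):
    kind = node['type']
    if kind == 'terminal':
        return node['utility']
    if kind == 'max':
        return max(expected_minimax(c) for c in node['children'])
    if kind == 'min':
        return min(expected_minimax(c) for c in node['children'])
    if kind == 'chance':                       # probability-weighted sum
        return sum(p * expected_minimax(c) for p, c in node['children'])
    raise ValueError(kind)

leaf = lambda u: {'type': 'terminal', 'utility': u}
tree = {'type': 'max', 'children': [
    {'type': 'chance', 'children': [(0.5, leaf(2)), (0.5, leaf(8))]},    # EV = 5.0
    {'type': 'chance', 'children': [(0.9, leaf(6)), (0.1, leaf(-10))]},  # EV = 4.4
]}
print(expected_minimax(tree))   # -> 5.0: MAX prefers the first chance node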
44. Position Evaluation with Chance Nodes
- Left: A1 wins
- Right: A2 wins
- With chance nodes, the outcome of the evaluation may change when the leaf values are scaled differently
- Behavior is preserved only by a positive linear transformation of EVAL
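A small numeric illustration of this point. The leaf values and probabilities below are my own illustrative choices: an order-preserving but non-linear rescaling flips the preferred move, while a positive linear transformation does not.

def expected(probabilities, values):
    return sum(p * v for p, v in zip(probabilities, values))

probs = (0.9, 0.1)

# Original leaf evaluations: move A1 looks better.
a1, a2 = (2, 3), (1, 4)
print(expected(probs, a1), expected(probs, a2))        # 2.1 vs 1.3 -> choose A1

# Order-preserving but non-linear rescaling (1->1, 2->20, 3->30, 4->400):
a1_s, a2_s = (20, 30), (1, 400)
print(expected(probs, a1_s), expected(probs, a2_s))    # 21.0 vs 40.9 -> choose A2

# Positive linear transformation EVAL' = 10*EVAL + 7: the preference is unchanged.
lin = lambda v: 10 * v + 7
print(expected(probs, tuple(map(lin, a1))),
      expected(probs, tuple(map(lin, a2))))            # 28.0 vs 20.0 -> still A1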
45. Discussion
- Examine the section on state-of-the-art game programs yourself
- Minimax assumes the right-hand tree is better than the left, yet ...
  - One could instead return a probability distribution over possible values
  - But that is an expensive calculation
46. Discussion
- Utility of node expansion
  - Only expand those nodes which lead to significantly better moves
- Both suggestions require meta-reasoning
47. Summary
- Games are fun (and dangerous)
- They illustrate several important points about AI:
  - Perfection is unattainable; we must approximate
  - It is a good idea to think about what to think about
  - Uncertainty constrains the assignment of values to states
- Games are to AI as Grand Prix racing is to automobile design.