Title: Game Playing
1
game playing
Notes adapted from lecture notes for CMSC 421 by
B.J. Dorr
2
Outline
  • What are games?
  • Optimal decisions in games
  • Which strategy leads to success?
  • α-β pruning
  • Games of imperfect information
  • Games that include an element of chance

3
What are games and why study them?
  • Games are a form of multi-agent environment
  • What do other agents do and how do they affect
    our success?
  • Cooperative vs. competitive multi-agent
    environments.
  • Competitive multi-agent environments give rise to
    adversarial problems a.k.a. games
  • Why study games?
  • Fun; historically entertaining
  • Interesting subject of study because they are
    hard
  • Easy to represent, and agents are restricted to
    a small number of actions

4
Relation of Games to Search
  • Search: no adversary
  • Solution is a (heuristic) method for finding a
    goal
  • Heuristics and CSP techniques can find the
    optimal solution
  • Evaluation function: estimate of cost from start
    to goal through a given node
  • Examples: path planning, scheduling activities
  • Games: adversary
  • Solution is a strategy (a strategy specifies a
    move for every possible opponent reply)
  • Time limits force an approximate solution
  • Evaluation function: evaluates the goodness of a
    game position
  • Examples: chess, checkers, Othello, backgammon

5
Types of Games
6
Game setup
  • Two players: MAX and MIN
  • MAX moves first, and they take turns until the
    game is over. The winner gets a reward, the
    loser a penalty.
  • Games as search:
  • Initial state: e.g. the board configuration in
    chess
  • Successor function: a list of (move, state) pairs
    specifying legal moves
  • Terminal test: is the game finished?
  • Utility function: gives a numerical value for
    terminal states, e.g. win (+1), lose (−1), and
    draw (0) in tic-tac-toe (next)
  • MAX uses a search tree to determine its next move.

7
Partial Game Tree for Tic-Tac-Toe
8
Optimal strategies
  • Find the contingent strategy for MAX assuming an
    infallible MIN opponent.
  • Assumption: both players play optimally!
  • Given a game tree, the optimal strategy can be
    determined by using the minimax value of each
    node
  • MINIMAX-VALUE(n) =
  • UTILITY(n), if n is a terminal node
  • max_{s ∈ successors(n)} MINIMAX-VALUE(s), if n is
    a MAX node
  • min_{s ∈ successors(n)} MINIMAX-VALUE(s), if n is
    a MIN node

9
Two-Ply Game Tree
10
Two-Ply Game Tree
  • MAX nodes
  • MIN nodes
  • Terminal nodes: utility values for MAX
  • Other nodes: minimax values
  • MAX's best move at the root is a1: it leads to the
    successor with the highest minimax value
  • MIN's best reply is b1: it leads to the successor
    with the lowest minimax value

11
What if MIN does not play optimally?
  • The definition of optimal play for MAX assumes MIN
    plays optimally: it maximizes the worst-case
    outcome for MAX.
  • But if MIN does not play optimally, MAX will do at
    least as well, and possibly better. This can be
    proved.

12
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s))
  return v

function MIN-VALUE(state) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s))
  return v
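The pseudocode above can be sketched as runnable Python. The game interface (successors, terminal_test, utility functions) and the sample two-ply tree below are illustrative assumptions, not part of the slides.

```python
# Minimal runnable sketch of MINIMAX-DECISION; the function names mirror
# the pseudocode: successors yields (action, state) pairs.

def minimax_decision(state, successors, terminal_test, utility):
    """Return the action whose successor has the highest minimax value."""
    best_action, best_value = None, float('-inf')
    for a, s in successors(state):
        v = min_value(s, successors, terminal_test, utility)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('-inf')
    for a, s in successors(state):
        v = max(v, min_value(s, successors, terminal_test, utility))
    return v

def min_value(state, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('inf')
    for a, s in successors(state):
        v = min(v, max_value(s, successors, terminal_test, utility))
    return v

# Hypothetical two-ply tree in the spirit of the earlier example;
# leaves are utility values for MAX.
tree = {'root': [('a1', 'B'), ('a2', 'C'), ('a3', 'D')],
        'B': [('b1', 3), ('b2', 12), ('b3', 8)],
        'C': [('c1', 2), ('c2', 4), ('c3', 6)],
        'D': [('d1', 14), ('d2', 5), ('d3', 2)]}
succ = lambda s: tree[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
print(minimax_decision('root', succ, term, util))  # -> a1
```

MAX picks a1 because its MIN node backs up 3, the highest of the three minimax values (3, 2, 2).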
13
Properties of Minimax
  • Complete? Yes, if the tree is finite
  • Optimal? Yes, against an optimal opponent
  • Time complexity? O(b^m)
  • Space complexity? O(bm) (depth-first exploration)
14
Tic-Tac-Toe
  • Depth bound: 2
  • Breadth-first search until all nodes at level 2
    are generated
  • Apply evaluation function to positions at these
    nodes

15
Tic-Tac-Toe
  • Evaluation function e(p) of a position p
  • If p is not a winning position for either player:
  • e(p) = (number of complete rows, columns, or
    diagonals that are still open for MAX) − (number
    of complete rows, columns, or diagonals that are
    still open for MIN)
  • If p is a win for MAX:
  • e(p) = +∞
  • If p is a win for MIN:
  • e(p) = −∞
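This evaluation function can be sketched in Python. The board representation (a 3×3 grid of 'x', 'o', or None, with MAX playing 'x') is an assumption made for illustration.

```python
import math

# All eight complete lines of the 3x3 board, as lists of (row, col) cells.
LINES = ([[(r, c) for c in range(3)] for r in range(3)] +               # rows
         [[(r, c) for r in range(3)] for c in range(3)] +               # columns
         [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])  # diagonals

def open_lines(board, player):
    """A line is still open for player if it contains no opponent mark."""
    other = 'o' if player == 'x' else 'x'
    return sum(all(board[r][c] != other for r, c in line) for line in LINES)

def won(board, player):
    return any(all(board[r][c] == player for r, c in line) for line in LINES)

def e(board):
    """e(p): open lines for MAX minus open lines for MIN; +/-inf for wins."""
    if won(board, 'x'):
        return math.inf
    if won(board, 'o'):
        return -math.inf
    return open_lines(board, 'x') - open_lines(board, 'o')

empty = [[None] * 3 for _ in range(3)]
center = [[None] * 3 for _ in range(3)]
center[1][1] = 'x'
print(e(empty), e(center))  # -> 0 4
```

On the empty board both players have all 8 lines open, so e = 0; with MAX in the center, MAX keeps all 8 lines while MIN retains only the 4 lines avoiding the center, so e = 8 − 4 = 4.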

16
Tic-Tac-Toe - First stage
[Figure: first stage of the search. MAX's move; boards one ply deep with MIN's replies below them, each labeled with its static evaluation, e.g. 6 − 5 = 1, 5 − 5 = 0, 5 − 6 = −1, 4 − 5 = −1, 4 − 6 = −2, 6 − 6 = 0.]
17
Tic-Tac-Toe - Second stage
[Figure: second stage of the search. MAX's move; boards labeled with static evaluations such as 3 − 2 = 1, 4 − 2 = 2, 4 − 3 = 1, 3 − 3 = 0, 5 − 2 = 3, 5 − 3 = 2.]
18
Multiplayer games
  • Some games allow more than two players
  • Scalar minimax values become vectors

19
Problem of minimax search
  • The number of game states is exponential in the
    number of moves.
  • Solution: do not examine every node
  • Alpha-beta pruning
  • Alpha: value of the best choice found so far at
    any choice point along the path for MAX
  • Beta: value of the best choice found so far at
    any choice point along the path for MIN
  • Revisit the example

20
Tic-Tac-Toe - First stage
[Figure: first-stage tree revisited, with nodes A, B, and C marked. The start node's alpha value is −1 (from node A's backed-up value); node C's static value of −1 gives node B a beta value of −1. Static evaluations include 4 − 5 = −1, 5 − 5 = 0, 6 − 5 = 1, 5 − 6 = −1.]
21
Alpha-beta pruning
  • Search progresses in a depth-first manner
  • Whenever a tip node is generated, its static
    evaluation is computed
  • Whenever a position can be given a backed-up
    value, this value is computed
  • Node A and all its successors have been generated;
    its backed-up value is −1
  • Node B and its successors have not yet been
    generated
  • We now know that the backed-up value of the start
    node is bounded from below by −1

22
Alpha-beta pruning
  • Depending on the backed-up values of the other
    successors of the start node, the final backed-up
    value of the start node may be greater than −1,
    but it cannot be less
  • This lower bound is the alpha value of the start
    node

23
Alpha-beta pruning
  • Depth-first search proceeds: node B and its
    first successor, node C, are generated
  • Node C is given a static value of −1
  • The backed-up value of node B is therefore
    bounded from above by −1
  • This upper bound is the beta value of node B
  • Therefore, search below node B can be
    discontinued: node B cannot turn out to be
    preferable to node A

24
Alpha-beta pruning
  • The reduction in search effort is achieved by
    keeping track of bounds on backed-up values
  • As successors of a node are given backed-up
    values, these bounds can be revised
  • Alpha values of MAX nodes can never decrease
  • Beta values of MIN nodes can never increase

25
Alpha-beta pruning
  • Therefore, search can be discontinued
  • Below any MIN node having a beta value less than
    or equal to the alpha value of any of its MAX
    node ancestors; the final backed-up value of this
    MIN node can then be set to its beta value
  • Below any MAX node having an alpha value greater
    than or equal to the beta value of any of its
    MIN node ancestors; the final backed-up value of
    this MAX node can then be set to its alpha value

26
Alpha-beta pruning
  • During search, alpha and beta values are computed
    as follows
  • The alpha value of a MAX node is set equal to the
    current largest final backed-up value of its
    successors
  • The beta value of a MIN node is set equal to the
    current smallest final backed-up value of its
    successors

27
Alpha-Beta Example
Do DF-search until first leaf
Range of possible values: root [−∞, +∞]; first MIN node [−∞, +∞]
28
Alpha-Beta Example (continued)
Root: [−∞, +∞]; first MIN node: [−∞, 3]
29
Alpha-Beta Example (continued)
Root: [−∞, +∞]; first MIN node: [−∞, 3]
30
Alpha-Beta Example (continued)
Root: [3, +∞]; first MIN node: [3, 3]
31
Alpha-Beta Example (continued)
Root: [3, +∞]
This node is worse for MAX
Second MIN node: [−∞, 2]; first MIN node: [3, 3]
32
Alpha-Beta Example (continued)
Root: [3, 14]; second MIN node: [−∞, 2]; first MIN node: [3, 3]; third MIN node: [−∞, 14]
33
Alpha-Beta Example (continued)
Root: [3, 5]; second MIN node: [−∞, 2]; first MIN node: [3, 3]; third MIN node: [−∞, 5]
34
Alpha-Beta Example (continued)
Root: [3, 3]; third MIN node: [2, 2]; second MIN node: [−∞, 2]; first MIN node: [3, 3]
35
Alpha-Beta Example (continued)
Root: [3, 3]; third MIN node: [2, 2]; second MIN node: [−∞, 2]; first MIN node: [3, 3]
36
Alpha-Beta Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
  inputs: state, current state in game
  v ← MAX-VALUE(state, −∞, +∞)
  return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← −∞
  for a, s in SUCCESSORS(state) do
    v ← MAX(v, MIN-VALUE(s, α, β))
    if v ≥ β then return v
    α ← MAX(α, v)
  return v
37
Alpha-Beta Algorithm
function MIN-VALUE(state, α, β) returns a utility value
  if TERMINAL-TEST(state) then return UTILITY(state)
  v ← +∞
  for a, s in SUCCESSORS(state) do
    v ← MIN(v, MAX-VALUE(s, α, β))
    if v ≤ α then return v
    β ← MIN(β, v)
  return v
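The two pseudocode slides can be sketched together as runnable Python. As in the minimax sketch, the (successors, terminal_test, utility) interface and the sample tree are illustrative assumptions.

```python
# Sketch of ALPHA-BETA-SEARCH: identical in shape to minimax, but each
# recursive call carries the bounds (alpha, beta) and cuts off when they cross.

def alpha_beta_search(state, successors, terminal_test, utility):
    best_action, alpha = None, float('-inf')
    for a, s in successors(state):
        v = min_value(s, alpha, float('inf'), successors, terminal_test, utility)
        if v > alpha:
            best_action, alpha = a, v
    return best_action

def max_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('-inf')
    for a, s in successors(state):
        v = max(v, min_value(s, alpha, beta, successors, terminal_test, utility))
        if v >= beta:           # beta cutoff: a MIN ancestor will avoid this node
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta, successors, terminal_test, utility):
    if terminal_test(state):
        return utility(state)
    v = float('inf')
    for a, s in successors(state):
        v = min(v, max_value(s, alpha, beta, successors, terminal_test, utility))
        if v <= alpha:          # alpha cutoff: a MAX ancestor will avoid this node
            return v
        beta = min(beta, v)
    return v

# Illustrative two-ply tree; leaves are utility values for MAX.
tree = {'root': [('a1', 'B'), ('a2', 'C'), ('a3', 'D')],
        'B': [('b1', 3), ('b2', 12), ('b3', 8)],
        'C': [('c1', 2), ('c2', 4), ('c3', 6)],
        'D': [('d1', 14), ('d2', 5), ('d3', 2)]}
succ = lambda s: tree[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
print(alpha_beta_search('root', succ, term, util))  # -> a1
```

On this tree the second MIN node is cut off after its first leaf (2 ≤ α = 3), so its remaining successors are never examined, yet the chosen move is the same as full minimax.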
38
General alpha-beta pruning
  • Consider a node n somewhere in the tree
  • If the player has a better choice at
  • the parent node of n
  • or any choice point further up,
  • then n will never be reached in actual play.
  • Hence, once enough is known about n, it can be
    pruned.

39
Final Comments about Alpha-Beta Pruning
  • Pruning does not affect the final result
  • Entire subtrees can be pruned
  • Good move ordering improves the effectiveness of
    pruning
  • Alpha-beta pruning can look ahead roughly twice
    as far as minimax in the same amount of time
  • Repeated states are again possible
  • Store them in memory: a transposition table
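A transposition table can be sketched as a dictionary keyed by (state, player-to-move). This sketch memoizes plain minimax for clarity; with alpha-beta, stored entries would also need to record whether a value is exact or only a bound. The tree and hashable-state assumption are illustrative.

```python
# Sketch of a transposition table: backed-up values are stored by position,
# so a repeated state is looked up instead of being searched again.

table = {}

def minimax_value(state, successors, terminal_test, utility, is_max=True):
    key = (state, is_max)
    if key in table:
        return table[key]           # repeated state: reuse the stored value
    if terminal_test(state):
        v = utility(state)
    else:
        values = [minimax_value(s, successors, terminal_test, utility, not is_max)
                  for _, s in successors(state)]
        v = max(values) if is_max else min(values)
    table[key] = v
    return v

# Illustrative tree in which the leaf value 3 recurs under both MIN nodes.
tree = {'root': [('a1', 'B'), ('a2', 'C')],
        'B': [('b1', 3), ('b2', 12)],
        'C': [('c1', 2), ('c2', 3)]}
succ = lambda s: tree[s]
term = lambda s: isinstance(s, int)
util = lambda s: s
print(minimax_value('root', succ, term, util))  # -> 3
```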

40
Games that include chance
  • Possible moves: (5-10,5-11), (5-11,19-24),
    (5-10,10-16) and (5-11,11-16)

41
Games that include chance
chance nodes
  • Possible moves: (5-10,5-11), (5-11,19-24),
    (5-10,10-16) and (5-11,11-16)
  • Rolls (1,1) and (6,6) have probability 1/36;
    every other roll has probability 1/18

42
Games that include chance
  • Rolls (1,1) and (6,6) have probability 1/36;
    every other roll has probability 1/18
  • We cannot calculate a definite minimax value,
    only an expected value

43
Expected minimax value
  • EXPECTED-MINIMAX-VALUE(n) =
  • UTILITY(n), if n is a terminal node
  • max_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s),
    if n is a MAX node
  • min_{s ∈ successors(n)} EXPECTED-MINIMAX-VALUE(s),
    if n is a MIN node
  • Σ_{s ∈ successors(n)} P(s) · EXPECTED-MINIMAX-VALUE(s),
    if n is a chance node
  • These equations can be backed up recursively all
    the way to the root of the game tree.
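The equations above can be sketched in Python over an explicit tree. The node encoding (tagged tuples, with bare numbers as terminal utilities) and the dice example are illustrative assumptions.

```python
# Sketch of EXPECTED-MINIMAX-VALUE over tagged tuples:
#   ('max', [children]) / ('min', [children]) /
#   ('chance', [(prob, child), ...]) / a bare number (terminal utility).

def expected_minimax(node):
    if isinstance(node, (int, float)):
        return node                                   # UTILITY(n)
    kind, children = node
    if kind == 'max':
        return max(expected_minimax(c) for c in children)
    if kind == 'min':
        return min(expected_minimax(c) for c in children)
    if kind == 'chance':                              # sum of P(s) * value(s)
        return sum(p * expected_minimax(c) for p, c in children)
    raise ValueError(f'unknown node kind: {kind}')

# A coin-flip chance node over two MIN replies:
# 0.5 * min(2, 4) + 0.5 * min(0, 6) = 0.5 * 2 + 0.5 * 0
game = ('chance', [(0.5, ('min', [2, 4])),
                   (0.5, ('min', [0, 6]))])
print(expected_minimax(game))  # -> 1.0
```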

44
Position evaluation with chance nodes
  • Left: A1 wins
  • Right: A2 wins
  • The outcome of the evaluation function may change
    when values are scaled differently.
  • Behavior is preserved only under a positive
    linear transformation of EVAL.

45
Discussion
  • Examine the section on state-of-the-art game
    programs yourself
  • Minimax assumes the right tree is better than the
    left one, yet ...
  • One could instead return a probability
    distribution over possible values
  • But this is an expensive calculation

46
Discussion
  • Utility of node expansion
  • Expand only those nodes that lead to significantly
    better moves
  • Both suggestions require meta-reasoning

47
Summary
  • Games are fun (and dangerous)
  • They illustrate several important points about AI
  • Perfection is unattainable, so we must approximate
  • It is a good idea to think about what to think
    about
  • Uncertainty constrains the assignment of values
    to states
  • Games are to AI as Grand Prix racing is to
    automobile design.