Title: Single-Person Game
1. Single-Person Game
- conventional search problem
- identify a sequence of moves that leads to a winning state
- examples: Solitaire, Dungeons and Dragons, Rubik's Cube
- has received little attention in AI
- some games can be quite challenging
  - some versions of Solitaire
  - a heuristic for Rubik's Cube was found by the Absolver program
2. Two-Person Game
- games with two opposing players
  - often called MAX and MIN
  - usually MAX moves first, then MIN
- in game terminology, a move comprises one step, or ply, by each player
  - typically you are MAX
- MAX wants a strategy to reach a winning state
  - no matter what MIN does
- MAX must assume MIN does the same
  - or at least tries to prevent MAX from winning
3. Perfect Decisions
- optimal strategy for MAX
  - traverse all relevant parts of the search tree
    - this must include possible moves by MIN
  - identify a path that leads MAX to a winning state
- so MAX must estimate all the possible moves (to a certain depth) from the current position and try to plan the best way forward such that he will win
- often impractical
  - time and space limitations
4. Nodes are discovered using DFS
- once leaf nodes are discovered, they are scored
- here MAX is building a tree of possibilities
- which way should he play when the tree is finished?
[Tree diagram: alternating levels of MAX's and MIN's possible moves; first discovered leaf values 4, 7, 9]
5. Max-Min Example
[Tree diagram: terminal node values 4, 7, 9, 6, 9, 8, 8, 5, 6, 7, 5, 2, 3, 2, 5, 4, 9, 3]
- terminal nodes: values calculated from the utility function
6. MiniMax Example
[Tree diagram: Min-level values 4, 7, 6, 2, 6, 3, 4, 5, 1, 2, 5, 4, 1, 2, 6, 3, 4, 3 above the terminal values 4, 7, 9, 6, 9, 8, 8, 5, 6, 7, 5, 2, 3, 2, 5, 4, 9, 3]
- other nodes: values calculated via the minimax algorithm
- here the green (Min) nodes pick the minimum value from the nodes underneath
7. MiniMax Example
[Tree diagram: Max-level values 7, 6, 5, 5, 6, 4; Min-level values 4, 7, 6, 2, 6, 3, 4, 5, 1, 2, 5, 4, 1, 2, 6, 3, 4, 3; terminal values 4, 7, 9, 6, 9, 8, 8, 5, 6, 7, 5, 2, 3, 2, 5, 4, 9, 3]
- other nodes: values calculated via the minimax algorithm
- here the red (Max) nodes pick the maximum value from the nodes underneath
8. MiniMax Example
[Tree diagram: Min-level values 5, 3, 4; Max-level values 7, 6, 5, 5, 6, 4; lower Min-level and terminal values as before]
- other nodes: values calculated via the minimax algorithm
9. MiniMax Example
[Tree diagram: root Max value 5; Min-level values 5, 3, 4; Max-level values 7, 6, 5, 5, 6, 4; lower Min-level and terminal values as before]
10. MiniMax Example
[Same tree as on the previous slide, root Max value 5]
- moves by MAX and countermoves by MIN
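The value propagation shown in these examples can be sketched in a few lines of code. This is an illustrative implementation, not from the slides: a tree is a nested Python list whose leaves are utility values, and the shape of the small example tree is assumed for demonstration.

```python
# Minimax sketch: an internal node is a list of children, a leaf is its
# utility-function value. `maximizing` says whose turn it is at this node.

def minimax(node, maximizing):
    if not isinstance(node, list):              # leaf: utility value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A small Max-over-Min tree (structure assumed for illustration):
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))   # Min nodes evaluate to 3, 2, 2, so Max picks 3
```

Max chooses the first branch here because its Min node guarantees the highest value (3), exactly the propagation rule the slides illustrate.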
11. MiniMax Observations
- the values of some of the leaf nodes are irrelevant for decisions at the next level
  - this also holds for decisions at higher levels
- as a consequence, under certain circumstances, some parts of the tree can be disregarded
  - it is possible to still make an optimal decision without considering those parts
12. What is Pruning?
- you don't have to look at every node in the tree
- pruning discards parts of the search tree
  - those that are guaranteed not to contain good moves
- results in substantial time and space savings
  - as a consequence, longer sequences of moves can be explored
- the leftover part of the task may still be exponential, however
13. Alpha-Beta Pruning
- extension of the minimax approach
- results in the same move as minimax, but with less overhead
- prunes uninteresting parts of the search tree
  - certain moves are not considered
    - they won't result in a better evaluation value than a move further up in the tree
    - they would lead to a less desirable outcome
- applies to moves by both players
- α indicates the best choice for Max so far; it never decreases
- β indicates the best choice for Min so far; it never increases
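A minimal sketch of alpha-beta on a nested-list tree (my encoding, not the slides'): α never decreases, β never increases, and a branch is abandoned as soon as α ≥ β.

```python
import math

# Alpha-beta sketch: lists are internal nodes, numbers are leaf utilities.
def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)   # best choice for Max so far
            if alpha >= beta:           # Min above already has something better
                break                   # prune the remaining children
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)     # best choice for Min so far
            if beta <= alpha:
                break
    return value

# A tree shaped like the worked example on the following slides
# (leaf values assumed for illustration):
tree = [[7, 6, 5], [3, 2, 8], [6, 5, 9]]
print(alphabeta(tree, True))   # -> 5; the leaves 2, 8 and 9 are never visited
```

The result (5) is identical to plain minimax on the same tree; only the amount of work differs.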
14. Note
- for the following example, remember:
  - nodes are found with DFS
  - as a terminal or leaf node (or a temporary terminal node at a certain depth) is found, a utility function gives it a score
- so do we need to evaluate F and G once we find E? Can we prune the tree?
[Diagram: a node with successors E = 5, F and G]
15. Alpha-Beta Example 1
[Diagram: Max root with local values (-∞, +∞); Min successor with local values (-∞, +∞)]
α (best choice for Max): ?   β (best choice for Min): ?
- Step 1: we expand the tree a little
- we assume a depth-first, left-to-right search as the basic strategy
- the range (-∞, +∞) of the possible values for each node is indicated
- initially the local values (-∞, +∞) reflect the values of the sub-trees in that node from Max's or Min's perspective
  - since we haven't expanded, they are infinite
- the global values α and β are the best overall choices so far for Max or Min
16. Alpha-Beta Example 2
[Diagram: Max root (-∞, +∞); Min node (-∞, 7); first leaf 7]
α (best choice for Max): ?   β (best choice for Min): 7
@ Min node: if value of node < value of parent, then abandon. NO, because no α yet
- we evaluate a node
- Min obtains the first value from a successor node
17. Alpha-Beta Example 3
[Diagram: Max root (-∞, +∞); Min node (-∞, 6); leaves 7, 6]
α (best choice for Max): ?   β (best choice for Min): 6
@ Min node: if value of node < value of parent, then abandon. NO
- Min obtains the second value from a successor node
18. Alpha-Beta Example 4
[Diagram: Max root (5, +∞); Min node = 5; leaves 7, 6, 5]
α (best choice for Max): 5   β (best choice for Min): 5
@ Min node: if value of node < value of parent, then abandon. No more nodes
- Min obtains the third value from a successor node
  - this is the last value from this sub-tree, and the exact value is known
  - Min is finished on this branch
- Max now has a value for its first successor node, but hopes that something better might still come
19. Alpha-Beta Example 5
[Diagram: Max root (5, +∞); first Min node = 5; second Min node (-∞, 3) with leaf 3]
α (best choice for Max): 5   β (best choice for Min): 3
@ Min node: if value of node < value of parent, then abandon. YES!
- Min continues with the next sub-tree, and gets a better value
- Max has a better choice from its perspective (the 5) and should not consider a move into the sub-tree currently being explored by Min
20. Alpha-Beta Example 6
[Diagram: as before; the unexplored successors of the second Min node are marked as pruned]
α (best choice for Max): 5   β (best choice for Min): 3
@ Min node: if value of node < value of parent, then abandon. YES!
- Max won't consider a move to this sub-tree: it would choose 5 first => abandon expansion
- this is a case of pruning, indicated in the diagram
- pruning means that we won't do a DFS into these nodes
  - no need to use the utility function to calculate the nodes' values
  - no need to explore that part of the tree further
21. Alpha-Beta Example 7
[Diagram: Max root (5, +∞); Min nodes = 5, (-∞, 3), (-∞, 6); new leaf 6]
α (best choice for Max): 5   β (best choice for Min): 3
@ Min node: if value of node < value of parent, then abandon. NO
- Min explores the next sub-tree, and finds a value that is worse than the other nodes at this level
- Min knows this 6 is good for Max, and will then evaluate the leaf nodes looking for a lower value (if it exists)
- if Min is not able to find something lower, then Max will choose this branch
22. Alpha-Beta Example 8
[Diagram: Max root (5, +∞); Min nodes = 5, (-∞, 3), (-∞, 5); new leaf 5]
α (best choice for Max): 5   β (best choice for Min): 3
@ Min node: if value of node < value of parent, then abandon. YES!
- Min is lucky, and finds a value that is the same as the current worst value at this level
- Max can choose this branch, or the other branch with the same value
23. Alpha-Beta Example 9
[Diagram: Max root = 5; Min nodes = 5, (-∞, 3), (-∞, 5)]
α (best choice for Max): 5   β (best choice for Min): 3
- Min could continue searching this sub-tree to see if there is a value that is less than the current worst alternative, in order to give Max as few choices as possible
  - this depends on the specific implementation
- Max knows the best value for its sub-tree
24. Alpha-Beta Example Overview
[Diagram: Max root = 5; Min-node values 5, <5, <3; pruned branches marked]
α (best choice for Max): 5   β (best choice for Min): 3
- some branches can be pruned because they would never be considered
- after looking at one branch, Max already knows that they will not be of interest, since Min would choose a value that is less than what Max already has at its disposal
25. Properties of Alpha-Beta Pruning
- in the ideal case, the best successor node is examined first
  - alpha-beta can look ahead twice as far as minimax
- assumes an idealized tree model
  - uniform branching factor, path length
  - random distribution of leaf evaluation values
- requires additional information for good players
  - game-specific background knowledge
  - empirical data
26. Imperfect Decisions
- does this mean I have to expand the tree all the way?
- complete search is impractical for most games
- alternative: search the tree only to a certain depth
  - requires a cutoff test to determine where to stop
27. Evaluation Function
- if I stop part of the way down, how do I score these nodes (which are not terminal nodes!)?
- use an evaluation function to score these nodes!
- must be consistent with the utility function
  - values for terminal nodes (or at least their order) must be the same
- tradeoff between accuracy and time cost
  - without time limits, minimax could be used
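The cutoff idea can be sketched as a depth-limited minimax. Everything below is a hypothetical illustration: the nested-list tree encoding and the `estimate` function (which simply averages a sub-tree's leaf values in place of a real, game-specific evaluation function) are mine, not the slides'.

```python
# Depth-limited minimax: when the cutoff test fires (depth == 0) at a
# non-terminal node, an evaluation function scores it instead of the
# utility function.

def leaves(node):
    if not isinstance(node, list):
        return [node]
    return [v for child in node for v in leaves(child)]

def estimate(node):
    # Hypothetical evaluation function: mean of the leaf utilities below.
    vals = leaves(node)
    return sum(vals) / len(vals)

def minimax_limited(node, maximizing, depth):
    if not isinstance(node, list):          # true terminal node: exact utility
        return node
    if depth == 0:                          # cutoff test: stop and estimate
        return estimate(node)
    values = [minimax_limited(c, not maximizing, depth - 1) for c in node]
    return max(values) if maximizing else min(values)

tree = [[[9, 1], [2, 8]], [[5, 5], [4, 7]]]
print(minimax_limited(tree, True, 1))   # -> 5.25: cutoff estimates 5.0 vs 5.25
print(minimax_limited(tree, True, 3))   # -> 8: the full search prefers the left branch
```

Note how the shallow search picks the branch the full search rejects: that is the accuracy/time tradeoff the slide mentions.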
28. Example: Tic-Tac-Toe
- simple evaluation function
  - E(s) = (rx + cx + dx) - (ro + co + do)
    - where r, c, d are the number of rows, columns and diagonal lines still available for a win for X (or O) in that state
    - x and o are the pieces of the two players
- 1-ply lookahead
  - start at the top of the tree
  - evaluate all 9 choices for player 1
  - pick the maximum E-value
- 2-ply lookahead
  - also look at the opponent's possible moves
  - assuming that the opponent picks the minimum E-value
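This E(s) is easy to compute directly. The 9-character-string board encoding below is my assumption; the line counting follows the slide's definition exactly.

```python
# Tic-tac-toe evaluation E(s) = (rx + cx + dx) - (ro + co + do):
# lines (rows, columns, diagonals) still open for X, minus lines still
# open for O. Board: a 9-character string of 'X', 'O' or ' ' (my encoding).

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
         (0, 4, 8), (2, 4, 6)]                 # diagonals

def E(board):
    open_for = lambda p: sum(1 for line in LINES
                             if all(board[i] in (p, ' ') for i in line))
    return open_for('X') - open_for('O')

# First X in a corner (state s11 on the 1-ply slide): E = 8 - 5 = 3
print(E('X        '))   # -> 3
# First X in the centre (state s15): E = 8 - 4 = 4
print(E('    X    '))   # -> 4
```

The centre move scores highest because it blocks four of O's potential lines, which is exactly why the 1-ply lookahead on the next slides picks it.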
29. Tic-Tac-Toe: 1-Ply
E(s0) = Max{E(s11), ..., E(s1n)} = Max{2, 3, 4} = 4
E(s11) = 8 - 5 = 3   E(s12) = 8 - 6 = 2   E(s13) = 8 - 5 = 3
E(s14) = 8 - 6 = 2   E(s15) = 8 - 4 = 4   E(s16) = 8 - 6 = 2
E(s17) = 8 - 5 = 3   E(s18) = 8 - 6 = 2   E(s19) = 8 - 5 = 3
[Board diagrams: the nine boards with X placed in each of the nine squares]
- E(s11) = E of the state at depth-level 1, number 1
- simple evaluation function: E(s11) = (rx + cx + dx) - (ro + co + do)
- for s11, X has (3 rows + 3 cols + 2 diags) = 8 open lines; O has (2r + 2c + 1d) = 5
- E(s11) = 8 - 5 = 3
30. Tic-Tac-Toe: 2-Ply
E(s0) = Max{E(s11), ..., E(s1n)} = Max{2, 3, 4} = 4
Level 1 (X's possible first moves):
E(s11) = 8 - 5 = 3   E(s12) = 8 - 6 = 2   E(s13) = 8 - 5 = 3
E(s14) = 8 - 6 = 2   E(s15) = 8 - 4 = 4   E(s16) = 8 - 6 = 2
E(s17) = 8 - 5 = 3   E(s18) = 8 - 6 = 2   E(s19) = 8 - 5 = 3
Level 2 (O's replies to the expanded level-1 nodes):
E(s21) = 6 - 5 = 1   E(s22) = 5 - 5 = 0   E(s23) = 6 - 5 = 1   E(s24) = 4 - 5 = -1
E(s25) = 6 - 5 = 1   E(s26) = 5 - 5 = 0   E(s27) = 6 - 5 = 1   E(s28) = 5 - 5 = 0
E(s29) through E(s216): all 5 - 6 = -1
E(s241) = 5 - 4 = 1   E(s242) = 6 - 4 = 2   E(s243) = 5 - 4 = 1   E(s244) = 6 - 4 = 2
E(s245) = 6 - 4 = 2   E(s246) = 5 - 4 = 1   E(s247) = 6 - 4 = 2   E(s248) = 5 - 4 = 1
[Board diagrams for the expanded states omitted]
Note: for 2-ply we must expand all nodes in the 1st level, but as scores of 2 and 3 repeat a few times, we only expand one of each.
31. Tic-Tac-Toe: 2-Ply
[Same tree as on the previous slide]
It seems that the centre X (i.e. a move to s15) is the best, because its successor nodes have a value of 1 (as opposed to 0, or worse, -1).
32. Checkers Case Study
- initial board configuration
  - Black: single on 20, single on 21, king on 31
  - Red: single on 23, king on 22
- evaluation function: E(s) = (5x1 + x2) - (5r1 + r2)
  - where
    - x1 = black king advantage
    - x2 = black single advantage
    - r1 = red king advantage
    - r2 = red single advantage
[Checkers board diagram, squares numbered 1-32]
33. Part 1: MiniMax using DFS
[Tree diagram: MAX's moves 31->27, 20->16, 21->17, 31->26; MIN's reply 22->17; MAX's reply 21->14, which takes the red king]
- typically you expand DFS-style to a certain depth and then evaluate the node:
  E(s) = (5x1 + x2) - (5r1 + r2) = (5·1 + 2) - (5·0 + 1) = 6
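The checkers evaluation is a straight weighted piece count; a minimal sketch (the function signature is my own) reproduces the value computed on this slide:

```python
# Checkers evaluation from the case study: E(s) = (5*x1 + x2) - (5*r1 + r2),
# where x1/x2 are Black's king/single counts and r1/r2 are Red's. The weight
# of 5 for a king follows the slides; the parameter encoding is mine.

def E(black_kings, black_singles, red_kings, red_singles):
    return (5 * black_kings + black_singles) - (5 * red_kings + red_singles)

# Initial position: Black has 1 king + 2 singles, Red has 1 king + 1 single.
print(E(1, 2, 1, 1))   # -> 1
# After MAX's 21->14 jump captures the red king:
print(E(1, 2, 0, 1))   # -> 6, the value computed on the slide
```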
34. We fast-forward and see the full tree to a certain depth
[Tree diagram to depth 3: MAX's moves (31->27, 20->16, 21->17, 31->26); MIN's replies (22->17, 22->18, 22->25, 22->26, 22->13, 22->31, 23->26, 23->27, 23->30, 23->32); MAX's counter-moves (16->11, 31->27, 31->24, 21->17, 20->16, 21->14, 31->26, ...)]
35. Then we score leaf nodes
[Same tree as on the previous slide, with each leaf node scored by the evaluation function]
36. Then we propagate Min-Max scores upwards
[Same tree, with minimax values propagated from the leaves up to the root]
37. Could we make this a little easier?
- this tree will really expand some considerable distance
- what if we introduce a little pruning?
- OK then, but first we have to go back to the start
38. Checkers: Min-Max with α/β Pruning
Temporarily propagate values up.
[Tree diagram: MAX's moves 31->27, 20->16, 21->17, 31->26; MIN's reply 22->17, scored 6, with the 6 temporarily propagated up to the MIN node; MAX's reply 21->14]
39. Checkers: Alpha-Beta Example
[Tree diagram: MIN's replies 22->17 and 22->18; below them, MAX nodes 21->14, 31->27 and 16->11; the next node to be expanded is marked]
@ Max node: if value of node > value of parent, then abandon. No! No! No! ... No ... No!
- alas, no pruning below this Max node!
40. Checkers: Alpha-Beta Example
[Board diagram, squares numbered 1-32, and the tree: MIN's replies 22->17, 22->18, 22->25; MAX nodes 16->11, 31->27, 21->14]
Notice the new value propagated up!
@ Max node: if value of node > value of parent, then abandon. No! So it's in the interest of the Max node to see if it can do better. No! etc. No! etc. ... No, etc. ... No!
41. Checkers: Alpha-Beta Example
[Board diagram, squares numbered 1-32, and the same tree as on the previous slide]
It's not in MAX's interest to bother expanding these nodes, as they have the same value as 16->11, so we prune them.
42. Checkers: Alpha-Beta Example
[Board diagram, squares numbered 1-32, and the tree: MIN's replies 22->26, 22->17, 22->18, 22->25, with value 1 propagated up; MAX nodes 16->11, 31->22, 31->27, 21->14]
43. Checkers: Alpha-Beta Example
[Tree diagram: MIN's replies 22->26, 22->17, 22->18, 22->25, 23->26, with value 1; MAX nodes 16->11, 31->22, 31->27, 21->14]
44. Search Limits
- search must be cut off because of time or space limitations
- strategies like depth-limited or iterative deepening search can be used
  - they don't take advantage of knowledge about the problem
- more refined strategies apply background knowledge
  - quiescent search
    - cut off only parts of the search space that don't exhibit big changes in the evaluation function
45. Games and Computers
- state of the art for some game programs
  - Chess
  - Checkers
  - Othello
  - Backgammon
  - Go
46. Chess
- Deep Blue, a special-purpose parallel computer, defeated the world champion Garry Kasparov in 1997
  - the human player didn't show his best game
    - some claim that the circumstances were questionable
  - Deep Blue used a massive database of games from the literature
- Fritz, a program running on an ordinary PC, played the world champion Vladimir Kramnik to an eight-game draw in 2002
- top programs and top human players are roughly equal
47. Checkers
- Arthur Samuel developed a checkers program in the 1950s that learned its own evaluation function
  - it reached an expert level in the 1960s
- Chinook became world champion in 1994
  - its human opponent, Dr. Marion Tinsley, withdrew for health reasons
    - Tinsley had been the world champion for 40 years
  - Chinook uses off-the-shelf hardware, alpha-beta search, and an end-game database for six-piece positions
48. Othello
- Logistello defeated the human world champion in 1997
- many programs play far better than humans
  - smaller search space than chess
  - little evaluation expertise available
49. Backgammon
- TD-Gammon, a neural-network-based program, ranks among the best players in the world
  - it improves its own evaluation function through learning techniques
- search-based methods are practically hopeless
  - chance elements, large branching factor