Title: Lecture 6: Adversarial Search
1 Lecture 6: Adversarial Search Games
2 Adversarial search
- So far: single-agent search, with no opponents or collaborators
- Multi-agent search
  - Playing a game with an opponent: adversarial search
  - Economies are even more complex: societies of cooperative and non-cooperative agents
- Game playing and AI
  - Games can be complex and require (?) human intelligence
  - Play has to evolve in real time
  - Well-defined problems
  - Limited scope
3 Games and AI

                    Deterministic                   Chance
    perfect info    Checkers, Chess, Go, Othello    Backgammon, Monopoly
    imperfect info                                  Bridge, Poker, Scrabble
4 Games and search
- Traditional search: a single agent searches for its own well-being, unobstructed
- Games: search against an opponent
- Consider a two-player board game
  - e.g., chess, checkers, tic-tac-toe
  - board configuration: a unique arrangement of "pieces"
- Representing board games as a search problem (a concrete sketch follows below)
  - states: board configurations
  - operators: legal moves
  - initial state: current board configuration
  - goal state: winning/terminal board configuration
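To make this representation concrete, here is a minimal sketch in Python of tic-tac-toe cast as a search problem; all names (legal_moves, apply_move, etc.) are illustrative, not a fixed API:

```python
# Tic-tac-toe as a search problem: states are board configurations,
# operators are legal moves. Names are illustrative only.

EMPTY = "."

def initial_state():
    # 3x3 board stored as a tuple of 9 cells; 'X' moves first
    return (EMPTY,) * 9

def legal_moves(board):
    # Operators: any empty cell may be filled by the player to move
    return [i for i, cell in enumerate(board) if cell == EMPTY]

def apply_move(board, move, player):
    # Produce the successor board configuration
    b = list(board)
    b[move] = player
    return tuple(b)

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    # Goal test: a winning (terminal) configuration for either player
    for i, j, k in LINES:
        if board[i] != EMPTY and board[i] == board[j] == board[k]:
            return board[i]
    return None

def is_terminal(board):
    return winner(board) is not None or EMPTY not in board
```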
5 Wrong representation
- We want to optimize our (agent's) goal, hence build a search tree based on our possible moves/actions
- Problem: this discounts the opponent
6 Better representation: game search tree
- Include the opponent's actions as well
[Figure: tree alternating agent moves and opponent moves; one agent move plus the opponent's reply makes a full move; utilities (e.g., 10, 1, 5) are assigned to the terminal (goal) nodes]
7 Game search trees
- What is the size of a game search tree?
- O(b^d)
- Tic-tac-toe: 9! = 362,880 leaves (max depth 9)
- Chess: ~35 legal moves per position, average game depth ~100
  - b^d = 35^100 ≈ 10^154 states, though only about 10^40 legal states
- Too deep for exhaustive search!
8 Utilities in search trees
- Assign a utility to each (terminal) state, describing how much it is valued for the agent
  - High utility: good for the agent
  - Low utility: good for the opponent
[Figure: the computer's possible moves from node A, followed by the opponent's possible moves, lead to terminal states with board evaluations from the agent's perspective, e.g., B = -5, C = 9, D = 2, E = 3; A takes the value 9]
9 Search strategy
- Worst-case scenario: assume the opponent will always make a best move (i.e., the worst move for us)
- Minimax search: maximize the utility for our agent while expecting the opponent to play his best moves
  - High utility favors the agent => choose the move with maximal utility
  - Low utility favors the opponent => assume the opponent makes the move with lowest utility
[Figure: tree with the computer's possible moves at node A (max level), the opponent's possible moves below (min level), and terminal states at the leaves]
10 Minimax algorithm
- Start with the utilities of the terminal nodes
- Propagate them back to the root node by choosing the minimax strategy (a sketch follows below)
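A minimal recursive sketch of minimax in Python, reusing the illustrative tic-tac-toe helpers from the earlier sketch; utility scores terminal boards from the maximizing player's ('X') perspective:

```python
def utility(board):
    # Terminal evaluation from the agent's ('X') perspective; illustrative only
    w = winner(board)
    return 1 if w == "X" else -1 if w == "O" else 0

def minimax(board, player):
    # Returns the minimax value of `board` with `player` to move
    if is_terminal(board):
        return utility(board)
    nxt = "O" if player == "X" else "X"
    values = [minimax(apply_move(board, m, player), nxt)
              for m in legal_moves(board)]
    # Maximizing level picks the highest value; minimizing level the lowest
    return max(values) if player == "X" else min(values)

def best_move(board, player="X"):
    # Choose the move whose resulting position has the best minimax value
    nxt = "O" if player == "X" else "X"
    key = lambda m: minimax(apply_move(board, m, player), nxt)
    moves = legal_moves(board)
    return max(moves, key=key) if player == "X" else min(moves, key=key)
```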
11 Complexity of minimax algorithm
- Utilities propagate up in a recursive fashion
  - DFS
- Space complexity
  - O(bd)
- Time complexity
  - O(b^d)
- Problem: time complexity. It's a game, so there is only finite time to make a move
12 Reducing complexity of minimax (1)
- Don't search to the full depth d: terminate early
- Prune bad paths
- Problem
  - we don't have utilities for non-terminal nodes
- Estimate the utility of non-terminal nodes
  - a static board evaluation function (SBE) is a heuristic that assigns a utility to non-terminal nodes
  - it reflects the computer's chances of winning from that node
  - it must be easy to calculate from the board configuration
- For example, in chess (a sketch follows below):
  - SBE = α · materialBalance + β · centerControl + …
  - material balance = value of white pieces − value of black pieces (pawn = 1, rook = 5, queen = 9, etc.)
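A minimal sketch of such an SBE in Python, assuming the board's pieces are given as a collection of characters (uppercase = white, lowercase = black); the weights alpha, beta and the centerControl feature are illustrative placeholders, not values from the slides:

```python
# Hypothetical static board evaluation (SBE) for chess: a weighted sum of
# simple features. Piece values per the slide: pawn 1, rook 5, queen 9
# (knight/bishop 3 added here; the king is not counted).
PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(pieces):
    # Value of white pieces minus value of black pieces
    white = sum(PIECE_VALUE.get(p, 0) for p in pieces if p.isupper())
    black = sum(PIECE_VALUE.get(p.upper(), 0) for p in pieces if p.islower())
    return white - black

def sbe(pieces, center_control=0, alpha=1.0, beta=0.5):
    # SBE = alpha * materialBalance + beta * centerControl + ...
    # alpha, beta, and center_control are illustrative placeholders
    return alpha * material_balance(pieces) + beta * center_control
```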
13 Minimax with Evaluation Functions
- Same as general minimax (see the sketch below), except
  - it only goes to depth m
  - it estimates non-terminal nodes using the SBE function
- How would this algorithm perform at chess?
  - if it could look ahead 4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
  - if it could look ahead 8 pairs, as done on a typical PC, it is as good as a human master
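The change relative to the plain minimax sketch above is small: cut off at a depth limit and fall back on an SBE. Here sbe(board) stands for whatever heuristic fits the game at hand (e.g., the chess sketch above); it is an assumption of this sketch, not part of the algorithm itself:

```python
def minimax_depth_limited(board, player, depth):
    # Same as minimax, but stop at depth 0 and estimate with the SBE
    if is_terminal(board):
        return utility(board)
    if depth == 0:
        return sbe(board)  # heuristic estimate, not a true utility
    nxt = "O" if player == "X" else "X"
    values = [minimax_depth_limited(apply_move(board, m, player), nxt, depth - 1)
              for m in legal_moves(board)]
    return max(values) if player == "X" else min(values)
```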
14 Reducing complexity of minimax (2)
- Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?
- Prune off paths that do not need to be explored
- Alpha-beta pruning (a sketch follows after this list)
- Keep track of two values while doing DFS of the game tree:
  - maximizing level: alpha
    - highest value seen so far
    - lower bound on the node's evaluation/score
  - minimizing level: beta
    - lowest value seen so far
    - upper bound on the node's evaluation/score
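A minimal sketch of alpha-beta pruning in Python, in the same style as the earlier minimax sketch and using the same illustrative helpers:

```python
import math

def alphabeta(board, player, depth, alpha=-math.inf, beta=math.inf):
    # Minimax with alpha-beta pruning. alpha = best value the maximizer can
    # guarantee so far (lower bound); beta = best value the minimizer can
    # guarantee so far (upper bound).
    if is_terminal(board):
        return utility(board)
    if depth == 0:
        return sbe(board)  # assumed heuristic for the game at hand
    nxt = "O" if player == "X" else "X"
    if player == "X":                      # maximizing level
        value = -math.inf
        for m in legal_moves(board):
            value = max(value, alphabeta(apply_move(board, m, player), nxt,
                                         depth - 1, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:              # beta cutoff: min will avoid this node
                break
        return value
    else:                                  # minimizing level
        value = math.inf
        for m in legal_moves(board):
            value = min(value, alphabeta(apply_move(board, m, player), nxt,
                                         depth - 1, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:              # alpha cutoff: max will avoid this node
                break
        return value
```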
15 Alpha-Beta Example
[Figure: example game tree used for the trace in slides 15-26, with max and min levels alternating below the root A; terminal values include D = 0, G = -5, H = 3, I = 8, L = 2, N = 4, R = 0, P = 9, Q = -6, S = 3, T = 5, U = -7, V = -9, W = -3, X = -5. The search begins at the root A, a maximizing node. Call stack: A]
16 Alpha-Beta Example
[Figure: B, a minimizing node, is expanded and its β is initialized. Call stack: B, A]
17 Alpha-Beta Example
[Figure: F, a maximizing node, is expanded and its α is initialized. Call stack: F, B, A]
18 Alpha-Beta Example
[Figure: N, a terminal state (blue) with value 4, is visited. Call stack: N, F, B, A]
19 Alpha-Beta Example
- Control returns to minimax(F,2,4)
  - α = 4, the maximum seen so far
[Figure: F's α is updated to 4. Call stack: F, B, A]
20 Alpha-Beta Example
[Figure: O, a minimizing node, is expanded and its β is initialized. Call stack: O, F, B, A]
21 Alpha-Beta Example
[Figure: W, a terminal state (blue, at the depth limit) with value -3, is visited. Call stack: W, O, F, B, A]
22 Alpha-Beta Example
- Control returns to minimax(O,3,4)
  - β = -3, the minimum seen so far
[Figure: O's β is updated to -3. Call stack: O, F, B, A]
23 Alpha-Beta Example
- Control returns to minimax(O,3,4)
  - O's β ≤ F's α: stop expanding O (alpha cutoff)
[Figure: O's remaining child X = -5 is never examined. Call stack: O, F, B, A]
24 Alpha-Beta Example
- Why? A smart opponent will choose W or worse, so O's upper bound is -3
- So the computer shouldn't choose O (worth at most -3), since N (worth 4) is better
[Figure: same tree; call stack: O, F, B, A]
25 Alpha-Beta Example
- Control returns to minimax(F,2,4)
  - α is not changed (maximizing level)
[Figure: F keeps α = 4. Call stack: F, B, A]
26 Alpha-Beta Example
- Control returns to minimax(B,1,4)
  - β = 4, the minimum seen so far
[Figure: B's β is updated to 4. Call stack: B, A]
27 Effectiveness of Alpha-Beta Search
- Effectiveness depends on the order in which successors are examined; more effective if the best are examined first
- Worst case
  - ordered so that no pruning takes place
  - no improvement over exhaustive search
- Best case
  - each player's best move is evaluated first (left-most)
- In practice, performance is closer to the best case than to the worst case
28 Effectiveness of Alpha-Beta Search
- In practice we often get O(b^(d/2)) rather than O(b^d)
  - same as having a branching factor of √b
  - since (√b)^d = b^(d/2)
- For example, chess (a quick numeric check follows below)
  - goes from b = 35 to b ≈ 6
  - permits much deeper search in the same time
  - makes computer chess competitive with humans
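A quick sanity check of these numbers in Python (√35 ≈ 5.9, and halving the exponent at a fixed time budget roughly doubles the reachable depth):

```python
import math

b, d = 35, 100
print(math.sqrt(b))                                   # ~5.92: the effective branching factor
print(math.isclose(b ** (d / 2), math.sqrt(b) ** d))  # True: (sqrt b)^d equals b^(d/2)
print(d * math.log10(b))                              # ~154.4: why 35^100 is about 10^154
```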
29 Dealing with Limited Time
- In real games, there is usually a time limit T on making a move
- How do we take this into account?
  - we cannot stop alpha-beta midway and expect to use its results with any confidence
  - so we could set a conservative depth limit that guarantees we will find a move in time < T
  - but then the search may finish early, and the opportunity to do more search is wasted
30 Dealing with Limited Time
- In practice, iterative deepening search (IDS) is used (a sketch follows below)
  - run alpha-beta search with an increasing depth limit
  - when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
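A minimal sketch of time-limited iterative deepening on top of the alphabeta sketch above; the move-selection loop, the max_depth cap, and all helper names are illustrative:

```python
import math
import time

def ids_move(board, player, time_limit, max_depth=64):
    # Run alpha-beta with an increasing depth limit; keep the move from the
    # deepest fully completed search, and stop when the clock runs out.
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        move = None
        value = -math.inf if player == "X" else math.inf
        for m in legal_moves(board):
            if time.monotonic() >= deadline:
                return best                # this depth did not complete: use last result
            v = alphabeta(apply_move(board, m, player),
                          "O" if player == "X" else "X", depth - 1)
            if (player == "X" and v > value) or (player == "O" and v < value):
                value, move = v, m
        best = move                        # depth completed: remember its best move
    return best
```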
31 The Horizon Effect
- Sometimes disaster lurks just beyond the search depth
  - e.g., the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)
- The computer has a limited horizon: it cannot see that this significant event could happen
- How do you avoid catastrophic losses due to short-sightedness?
  - quiescence search
  - secondary search
32 The Horizon Effect
- Quiescence search (a sketch follows below)
  - when the evaluation is changing frequently, look deeper than the depth limit
  - look for a point where the game "quiets down"
- Secondary search
  1. find the best move looking to depth d
  2. look k steps beyond to verify that it still looks good
  3. if it doesn't, repeat step 2 for the next best move
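A minimal sketch of the quiescence idea, assuming a hypothetical noisy_moves(board, player) that returns only the "non-quiet" moves (e.g., captures), plus the earlier illustrative helpers:

```python
import math

def quiescence(board, player, alpha=-math.inf, beta=math.inf):
    # At the depth limit, keep searching "noisy" moves until the position
    # quiets down, instead of trusting the SBE of a volatile node.
    stand_pat = sbe(board)                 # evaluation if we stop searching here
    if player == "X":
        alpha = max(alpha, stand_pat)
    else:
        beta = min(beta, stand_pat)
    if alpha >= beta:
        return stand_pat
    nxt = "O" if player == "X" else "X"
    for m in noisy_moves(board, player):   # hypothetical: only non-quiet moves
        v = quiescence(apply_move(board, m, player), nxt, alpha, beta)
        if player == "X":
            alpha = max(alpha, v)
        else:
            beta = min(beta, v)
        if alpha >= beta:
            break
    return alpha if player == "X" else beta
```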
33 Book Moves
- Build a database of opening moves, end games, and studied configurations
- If the current state is in the database, use the database (a sketch follows below)
  - to determine the next move
  - to evaluate the board
- Otherwise, do alpha-beta search
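A minimal sketch of the lookup-then-search control flow; the BOOK dictionary and all names here are illustrative:

```python
# Hypothetical opening/endgame book: known configurations mapped to moves.
BOOK = {
    # board configuration -> best known move (illustrative entries only)
}

def choose_move(board, player, time_limit=1.0):
    # Prefer the database when the current state is in it...
    if board in BOOK:
        return BOOK[board]
    # ...otherwise fall back on (time-limited) alpha-beta search.
    return ids_move(board, player, time_limit)
```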
34 Examples of Algorithms which Learn to Play Well
- Checkers
  - A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, 3(3):210-229, 1959
  - Learned by playing a copy of itself thousands of times
  - Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
  - Successful enough to compete well at human tournaments
35 Examples of Algorithms which Learn to Play Well
- Backgammon
  - G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3):357-390, 1989
  - Also learns by playing copies of itself
  - Uses a non-linear evaluation function: a neural network
  - Rated one of the top three players in the world
36 Non-deterministic Games
- Some games involve chance, for example:
  - roll of dice
  - spin of a game wheel
  - deal of cards from a shuffled deck
- How can we handle games with random elements?
- The game tree representation is extended to include chance nodes:
  - agent moves
  - chance nodes
  - opponent moves
37 Non-deterministic Games
- The game tree representation is extended
[Figure: game tree with a max level, a chance level, and a min level]
38 Non-deterministic Games
- Weight the score by the probability that the move occurs
- Use the expected value for a move: sum over the possible random outcomes (a sketch follows below)
[Figure: chance nodes, each a 50/50 coin flip, take expected values 4 and -2 between the max and min levels]
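A minimal expectiminimax sketch in Python over an explicit tree; the tree encoding (chance nodes as lists of (probability, subtree) pairs) and the hard-coded level order are illustrative assumptions:

```python
def expectiminimax(node, kind):
    # kind is "max", "chance", or "min"; a leaf is a plain numeric utility.
    if isinstance(node, (int, float)):
        return node
    if kind == "max":
        return max(expectiminimax(c, "chance") for c in node)
    if kind == "chance":
        # Expected value: weight each outcome's value by its probability
        return sum(p * expectiminimax(c, "min") for p, c in node)
    return min(expectiminimax(c, "max") for c in node)   # "min" level

# Illustrative tree shaped like the slide: a max node over two 50/50 chance
# nodes whose expected values come out to 4 and -2; the max level chooses 4.
tree = [
    [(0.5, [2, 5]), (0.5, [6, 8])],   # 0.5*min(2,5) + 0.5*min(6,8) = 4
    [(0.5, [-4, 0]), (0.5, [0, 3])],  # 0.5*(-4) + 0.5*0 = -2
]
print(expectiminimax(tree, "max"))    # 4
```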
39 Non-deterministic Games
- Choose the move with the highest expected value
[Figure: the root A (max) takes α = 4, the highest expected value among its chance-node children]
40 Non-deterministic Games
- Non-determinism increases the branching factor
  - 21 possible rolls with 2 dice
- The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
  - alpha-beta pruning is less effective
- TD-Gammon
  - depth-2 search
  - very good heuristic
  - plays at world-champion level
41 Computers can play Grandmaster Chess
- Deep Blue (IBM)
  - Parallel processor, 32 nodes
  - Each node has 8 dedicated VLSI chess chips
  - Can search 200 million configurations/second
  - Uses minimax, alpha-beta, and sophisticated heuristics
  - It can currently search to 14 ply (i.e., 7 pairs of moves)
  - Can avoid the horizon effect by searching as deep as 40 ply
  - Uses book moves
42 Computers can play Grandmaster Chess
- Kasparov vs. Deep Blue, May 1997
  - 6-game full-regulation chess match sponsored by ACM
  - Kasparov lost the match 2½ to 3½: 1 win and 3 ties to Deep Blue's 2 wins and 3 ties
  - This was a historic achievement for computer chess, being the first time a computer became the best chess player on the planet
  - Note that Deep Blue plays by brute force (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness
43 Status of Computers in Other Deterministic Games
- Checkers/Draughts
  - current world champion is Chinook
  - can beat any human (beat Tinsley in 1994)
  - uses alpha-beta search and book moves (> 443 billion)
- Othello
  - computers can easily beat the world experts
- Go
  - branching factor b ≈ 360 (very large!)
  - $2 million prize for any system that can beat a world expert