Title: Lecture 6: Adversarial Search
1 Lecture 6: Adversarial Search Games
2 Adversarial search
- So far: single-agent search, with no opponents or collaborators
- Multi-agent search
  - Playing a game with an opponent: adversarial search
  - Economies are even more complex: societies of cooperative and non-cooperative agents
- Game playing and AI
  - Games can be complex and require (?) human intelligence
  - Play has to evolve in real time
  - Well-defined problems
  - Limited scope
3 Games and AI

                    Deterministic                   Chance
    perfect info    Checkers, Chess, Go, Othello    Backgammon, Monopoly
    imperfect info                                  Bridge, Poker, Scrabble
4 Games and search
- Traditional search: a single agent searches for its own well-being, unobstructed
- Games: search against an opponent
- Consider a two-player board game
  - e.g., chess, checkers, tic-tac-toe
  - board configuration: a unique arrangement of "pieces"
- Representing board games as a search problem (a concrete sketch follows below)
  - states: board configurations
  - operators: legal moves
  - initial state: current board configuration
  - goal state: winning/terminal board configuration
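To make this representation concrete, here is a minimal sketch in Python of tic-tac-toe cast as a search problem; all names (legal_moves, apply_move, etc.) are illustrative, not a fixed API:

```python
# Tic-tac-toe as a search problem: states are board configurations,
# operators are legal moves. Names are illustrative only.

EMPTY = "."

def initial_state():
    # 3x3 board stored as a tuple of 9 cells; 'X' moves first
    return (EMPTY,) * 9

def legal_moves(board):
    # Operators: any empty cell may be filled by the player to move
    return [i for i, cell in enumerate(board) if cell == EMPTY]

def apply_move(board, move, player):
    # Produce the successor board configuration
    b = list(board)
    b[move] = player
    return tuple(b)

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    # Goal test: a winning (terminal) configuration for either player
    for i, j, k in LINES:
        if board[i] != EMPTY and board[i] == board[j] == board[k]:
            return board[i]
    return None

def is_terminal(board):
    return winner(board) is not None or EMPTY not in board
```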
5 Wrong representation
- We want to optimize our (agent's) goal, hence build a search tree based on our possible moves/actions
- Problem: this discounts the opponent
6 Better representation: game search tree
- Include the opponent's actions as well
[Figure: tree alternating agent moves and opponent moves; one agent move plus the opponent's reply makes a full move; utilities (e.g., 10, 1, 5) are assigned to the terminal (goal) nodes]
7 Game search trees
- What is the size of a game search tree?
- O(b^d)
- Tic-tac-toe: 9! = 362,880 leaves (max depth 9)
- Chess: ~35 legal moves per position, average game depth ~100
  - b^d = 35^100 ≈ 10^154 states, though only about 10^40 legal states
- Too deep for exhaustive search!
8 Utilities in search trees
- Assign a utility to each (terminal) state, describing how much it is valued for the agent
  - High utility: good for the agent
  - Low utility: good for the opponent
[Figure: the computer's possible moves from node A, followed by the opponent's possible moves, lead to terminal states with board evaluations from the agent's perspective, e.g., B = -5, C = 9, D = 2, E = 3; A takes the value 9]
9 Search strategy
- Worst-case scenario: assume the opponent will always make a best move (i.e., the worst move for us)
- Minimax search: maximize the utility for our agent while expecting the opponent to play his best moves
  - High utility favors the agent => choose the move with maximal utility
  - Low utility favors the opponent => assume the opponent makes the move with lowest utility
[Figure: tree with the computer's possible moves at node A (max level), the opponent's possible moves below (min level), and terminal states at the leaves]
10 Minimax algorithm
- Start with the utilities of the terminal nodes
- Propagate them back to the root node by choosing the minimax strategy (a sketch follows below)
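A minimal recursive sketch of minimax in Python, reusing the illustrative tic-tac-toe helpers from the earlier sketch; utility scores terminal boards from the maximizing player's ('X') perspective:

```python
def utility(board):
    # Terminal evaluation from the agent's ('X') perspective; illustrative only
    w = winner(board)
    return 1 if w == "X" else -1 if w == "O" else 0

def minimax(board, player):
    # Returns the minimax value of `board` with `player` to move
    if is_terminal(board):
        return utility(board)
    nxt = "O" if player == "X" else "X"
    values = [minimax(apply_move(board, m, player), nxt)
              for m in legal_moves(board)]
    # Maximizing level picks the highest value; minimizing level the lowest
    return max(values) if player == "X" else min(values)

def best_move(board, player="X"):
    # Choose the move whose resulting position has the best minimax value
    nxt = "O" if player == "X" else "X"
    key = lambda m: minimax(apply_move(board, m, player), nxt)
    moves = legal_moves(board)
    return max(moves, key=key) if player == "X" else min(moves, key=key)
```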
11 Complexity of minimax algorithm
- Utilities propagate up in a recursive fashion
  - DFS
- Space complexity
  - O(bd)
- Time complexity
  - O(b^d)
- Problem: time complexity. It's a game, so there is only finite time to make a move
12 Reducing complexity of minimax (1)
- Don't search to the full depth d: terminate early
- Prune bad paths
- Problem
  - we don't have utilities for non-terminal nodes
- Estimate the utility of non-terminal nodes
  - a static board evaluation function (SBE) is a heuristic that assigns a utility to non-terminal nodes
  - it reflects the computer's chances of winning from that node
  - it must be easy to calculate from the board configuration
- For example, in chess (a sketch follows below):
  - SBE = α · materialBalance + β · centerControl + …
  - material balance = value of white pieces − value of black pieces (pawn = 1, rook = 5, queen = 9, etc.)
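A minimal sketch of such an SBE in Python, assuming the board's pieces are given as a collection of characters (uppercase = white, lowercase = black); the weights alpha, beta and the centerControl feature are illustrative placeholders, not values from the slides:

```python
# Hypothetical static board evaluation (SBE) for chess: a weighted sum of
# simple features. Piece values per the slide: pawn 1, rook 5, queen 9
# (knight/bishop 3 added here; the king is not counted).
PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(pieces):
    # Value of white pieces minus value of black pieces
    white = sum(PIECE_VALUE.get(p, 0) for p in pieces if p.isupper())
    black = sum(PIECE_VALUE.get(p.upper(), 0) for p in pieces if p.islower())
    return white - black

def sbe(pieces, center_control=0, alpha=1.0, beta=0.5):
    # SBE = alpha * materialBalance + beta * centerControl + ...
    # alpha, beta, and center_control are illustrative placeholders
    return alpha * material_balance(pieces) + beta * center_control
```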
13 Minimax with Evaluation Functions
- Same as general minimax (see the sketch below), except
  - it only goes to depth m
  - it estimates non-terminal nodes using the SBE function
- How would this algorithm perform at chess?
  - if it could look ahead 4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
  - if it could look ahead 8 pairs, as done on a typical PC, it is as good as a human master
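The change relative to the plain minimax sketch above is small: cut off at a depth limit and fall back on an SBE. Here sbe(board) stands for whatever heuristic fits the game at hand (e.g., the chess sketch above); it is an assumption of this sketch, not part of the algorithm itself:

```python
def minimax_depth_limited(board, player, depth):
    # Same as minimax, but stop at depth 0 and estimate with the SBE
    if is_terminal(board):
        return utility(board)
    if depth == 0:
        return sbe(board)  # heuristic estimate, not a true utility
    nxt = "O" if player == "X" else "X"
    values = [minimax_depth_limited(apply_move(board, m, player), nxt, depth - 1)
              for m in legal_moves(board)]
    return max(values) if player == "X" else min(values)
```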
14 Reducing complexity of minimax (2)
- Some branches of the tree will not be taken if the opponent plays cleverly. Can we detect them ahead of time?
- Prune off paths that do not need to be explored
- Alpha-beta pruning (a sketch follows after this list)
- Keep track of two values while doing DFS of the game tree:
  - maximizing level: alpha
    - highest value seen so far
    - lower bound on the node's evaluation/score
  - minimizing level: beta
    - lowest value seen so far
    - upper bound on the node's evaluation/score
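A minimal sketch of alpha-beta pruning in Python, in the same style as the earlier minimax sketch and using the same illustrative helpers:

```python
import math

def alphabeta(board, player, depth, alpha=-math.inf, beta=math.inf):
    # Minimax with alpha-beta pruning. alpha = best value the maximizer can
    # guarantee so far (lower bound); beta = best value the minimizer can
    # guarantee so far (upper bound).
    if is_terminal(board):
        return utility(board)
    if depth == 0:
        return sbe(board)  # assumed heuristic for the game at hand
    nxt = "O" if player == "X" else "X"
    if player == "X":                      # maximizing level
        value = -math.inf
        for m in legal_moves(board):
            value = max(value, alphabeta(apply_move(board, m, player), nxt,
                                         depth - 1, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:              # beta cutoff: min will avoid this node
                break
        return value
    else:                                  # minimizing level
        value = math.inf
        for m in legal_moves(board):
            value = min(value, alphabeta(apply_move(board, m, player), nxt,
                                         depth - 1, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:              # alpha cutoff: max will avoid this node
                break
        return value
```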
15 Alpha-Beta Example
[Figure: example game tree used for the trace in slides 15-26, with max and min levels alternating below the root A; terminal values include D = 0, G = -5, H = 3, I = 8, L = 2, N = 4, R = 0, P = 9, Q = -6, S = 3, T = 5, U = -7, V = -9, W = -3, X = -5. The search begins at the root A, a maximizing node. Call stack: A]
16 Alpha-Beta Example
[Figure: B, a minimizing node, is expanded and its β is initialized. Call stack: B, A]
17 Alpha-Beta Example
[Figure: F, a maximizing node, is expanded and its α is initialized. Call stack: F, B, A]
18 Alpha-Beta Example
[Figure: N, a terminal state (blue) with value 4, is visited. Call stack: N, F, B, A]
19 Alpha-Beta Example
- Control returns to minimax(F,2,4)
  - α = 4, the maximum seen so far
[Figure: F's α is updated to 4. Call stack: F, B, A]
20 Alpha-Beta Example
[Figure: O, a minimizing node, is expanded and its β is initialized. Call stack: O, F, B, A]
21 Alpha-Beta Example
[Figure: W, a terminal state (blue, at the depth limit) with value -3, is visited. Call stack: W, O, F, B, A]
22 Alpha-Beta Example
- Control returns to minimax(O,3,4)
  - β = -3, the minimum seen so far
[Figure: O's β is updated to -3. Call stack: O, F, B, A]
23 Alpha-Beta Example
- Control returns to minimax(O,3,4)
  - O's β ≤ F's α: stop expanding O (alpha cutoff)
[Figure: O's remaining child X = -5 is never examined. Call stack: O, F, B, A]
24 Alpha-Beta Example
- Why? A smart opponent will choose W or worse, so O's upper bound is -3
- So the computer shouldn't choose O (worth at most -3), since N (worth 4) is better
[Figure: same tree; call stack: O, F, B, A]
25 Alpha-Beta Example
- Control returns to minimax(F,2,4)
  - α is not changed (maximizing level)
[Figure: F keeps α = 4. Call stack: F, B, A]
26 Alpha-Beta Example
- Control returns to minimax(B,1,4)
  - β = 4, the minimum seen so far
[Figure: B's β is updated to 4. Call stack: B, A]
27 Effectiveness of Alpha-Beta Search
- Effectiveness depends on the order in which successors are examined; more effective if the best are examined first
- Worst case
  - ordered so that no pruning takes place
  - no improvement over exhaustive search
- Best case
  - each player's best move is evaluated first (left-most)
- In practice, performance is closer to the best case than to the worst case
28 Effectiveness of Alpha-Beta Search
- In practice we often get O(b^(d/2)) rather than O(b^d)
  - same as having a branching factor of √b
  - since (√b)^d = b^(d/2)
- For example, chess (a quick numeric check follows below)
  - goes from b = 35 to b ≈ 6
  - permits much deeper search in the same time
  - makes computer chess competitive with humans
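A quick sanity check of these numbers in Python (√35 ≈ 5.9, and halving the exponent at a fixed time budget roughly doubles the reachable depth):

```python
import math

b, d = 35, 100
print(math.sqrt(b))                                   # ~5.92: the effective branching factor
print(math.isclose(b ** (d / 2), math.sqrt(b) ** d))  # True: (sqrt b)^d equals b^(d/2)
print(d * math.log10(b))                              # ~154.4: why 35^100 is about 10^154
```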
29 Dealing with Limited Time
- In real games, there is usually a time limit T on making a move
- How do we take this into account?
  - we cannot stop alpha-beta midway and expect to use its results with any confidence
  - so we could set a conservative depth limit that guarantees we will find a move in time < T
  - but then the search may finish early, and the opportunity to do more search is wasted
30 Dealing with Limited Time
- In practice, iterative deepening search (IDS) is used (a sketch follows below)
  - run alpha-beta search with an increasing depth limit
  - when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
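A minimal sketch of time-limited iterative deepening on top of the alphabeta sketch above; the move-selection loop, the max_depth cap, and all helper names are illustrative:

```python
import math
import time

def ids_move(board, player, time_limit, max_depth=64):
    # Run alpha-beta with an increasing depth limit; keep the move from the
    # deepest fully completed search, and stop when the clock runs out.
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        move = None
        value = -math.inf if player == "X" else math.inf
        for m in legal_moves(board):
            if time.monotonic() >= deadline:
                return best                # this depth did not complete: use last result
            v = alphabeta(apply_move(board, m, player),
                          "O" if player == "X" else "X", depth - 1)
            if (player == "X" and v > value) or (player == "O" and v < value):
                value, move = v, m
        best = move                        # depth completed: remember its best move
    return best
```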
31 The Horizon Effect
- Sometimes disaster lurks just beyond the search depth
  - e.g., the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)
- The computer has a limited horizon: it cannot see that this significant event could happen
- How do you avoid catastrophic losses due to short-sightedness?
  - quiescence search
  - secondary search
32 The Horizon Effect
- Quiescence search (a sketch follows below)
  - when the evaluation is changing frequently, look deeper than the depth limit
  - look for a point where the game "quiets down"
- Secondary search
  1. find the best move looking to depth d
  2. look k steps beyond to verify that it still looks good
  3. if it doesn't, repeat step 2 for the next best move
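A minimal sketch of the quiescence idea, assuming a hypothetical noisy_moves(board, player) that returns only the "non-quiet" moves (e.g., captures), plus the earlier illustrative helpers:

```python
import math

def quiescence(board, player, alpha=-math.inf, beta=math.inf):
    # At the depth limit, keep searching "noisy" moves until the position
    # quiets down, instead of trusting the SBE of a volatile node.
    stand_pat = sbe(board)                 # evaluation if we stop searching here
    if player == "X":
        alpha = max(alpha, stand_pat)
    else:
        beta = min(beta, stand_pat)
    if alpha >= beta:
        return stand_pat
    nxt = "O" if player == "X" else "X"
    for m in noisy_moves(board, player):   # hypothetical: only non-quiet moves
        v = quiescence(apply_move(board, m, player), nxt, alpha, beta)
        if player == "X":
            alpha = max(alpha, v)
        else:
            beta = min(beta, v)
        if alpha >= beta:
            break
    return alpha if player == "X" else beta
```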
33 Book Moves
- Build a database of opening moves, end games, and studied configurations
- If the current state is in the database, use the database (a sketch follows below)
  - to determine the next move
  - to evaluate the board
- Otherwise, do alpha-beta search
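A minimal sketch of the lookup-then-search control flow; the BOOK dictionary and all names here are illustrative:

```python
# Hypothetical opening/endgame book: known configurations mapped to moves.
BOOK = {
    # board configuration -> best known move (illustrative entries only)
}

def choose_move(board, player, time_limit=1.0):
    # Prefer the database when the current state is in it...
    if board in BOOK:
        return BOOK[board]
    # ...otherwise fall back on (time-limited) alpha-beta search.
    return ids_move(board, player, time_limit)
```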
34 Examples of Algorithms which Learn to Play Well
- Checkers
  - A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," IBM Journal of Research and Development, 3(3):210-229, 1959
  - Learned by playing a copy of itself thousands of times
  - Used only an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
  - Successful enough to compete well at human tournaments
35 Examples of Algorithms which Learn to Play Well
- Backgammon
  - G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3):357-390, 1989
  - Also learns by playing copies of itself
  - Uses a non-linear evaluation function: a neural network
  - Rated one of the top three players in the world
36 Non-deterministic Games
- Some games involve chance, for example:
  - roll of dice
  - spin of a game wheel
  - deal of cards from a shuffled deck
- How can we handle games with random elements?
- The game tree representation is extended to include chance nodes:
  - agent moves
  - chance nodes
  - opponent moves
37 Non-deterministic Games
- The game tree representation is extended
[Figure: game tree with a max level, a chance level, and a min level]
38 Non-deterministic Games
- Weight the score by the probability that the move occurs
- Use the expected value for a move: sum over the possible random outcomes (a sketch follows below)
[Figure: chance nodes, each a 50/50 coin flip, take expected values 4 and -2 between the max and min levels]
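A minimal expectiminimax sketch in Python over an explicit tree; the tree encoding (chance nodes as lists of (probability, subtree) pairs) and the hard-coded level order are illustrative assumptions:

```python
def expectiminimax(node, kind):
    # kind is "max", "chance", or "min"; a leaf is a plain numeric utility.
    if isinstance(node, (int, float)):
        return node
    if kind == "max":
        return max(expectiminimax(c, "chance") for c in node)
    if kind == "chance":
        # Expected value: weight each outcome's value by its probability
        return sum(p * expectiminimax(c, "min") for p, c in node)
    return min(expectiminimax(c, "max") for c in node)   # "min" level

# Illustrative tree shaped like the slide: a max node over two 50/50 chance
# nodes whose expected values come out to 4 and -2; the max level chooses 4.
tree = [
    [(0.5, [2, 5]), (0.5, [6, 8])],   # 0.5*min(2,5) + 0.5*min(6,8) = 4
    [(0.5, [-4, 0]), (0.5, [0, 3])],  # 0.5*(-4) + 0.5*0 = -2
]
print(expectiminimax(tree, "max"))    # 4
```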
39 Non-deterministic Games
- Choose the move with the highest expected value
[Figure: the root A (max) takes α = 4, the highest expected value among its chance-node children]
40 Non-deterministic Games
- Non-determinism increases the branching factor
  - 21 possible rolls with 2 dice
- The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
  - alpha-beta pruning is less effective
- TD-Gammon
  - depth-2 search
  - very good heuristic
  - plays at world-champion level
41 Computers can play Grandmaster Chess
- Deep Blue (IBM)
  - Parallel processor, 32 nodes
  - Each node has 8 dedicated VLSI chess chips
  - Can search 200 million configurations/second
  - Uses minimax, alpha-beta, and sophisticated heuristics
  - It can currently search to 14 ply (i.e., 7 pairs of moves)
  - Can avoid the horizon effect by searching as deep as 40 ply
  - Uses book moves
42 Computers can play Grandmaster Chess
- Kasparov vs. Deep Blue, May 1997
  - 6-game full-regulation chess match sponsored by ACM
  - Kasparov lost the match 2½ to 3½: 1 win and 3 ties to Deep Blue's 2 wins and 3 ties
  - This was a historic achievement for computer chess, being the first time a computer became the best chess player on the planet
  - Note that Deep Blue plays by brute force (i.e., raw power from computer speed and memory); it uses relatively little that is similar to human intuition and cleverness
43 Status of Computers in Other Deterministic Games
- Checkers/Draughts
  - current world champion is Chinook
  - can beat any human (beat Tinsley in 1994)
  - uses alpha-beta search and book moves (> 443 billion)
- Othello
  - computers can easily beat the world experts
- Go
  - branching factor b ≈ 360 (very large!)
  - $2 million prize for any system that can beat a world expert