Lecture 6: Adversarial Search

Lecture 6: Adversarial Search & Games. Reading: Ch. 6, AIMA


1
Lecture 6: Adversarial Search & Games
  • Reading: Ch. 6, AIMA

2
Adversarial search
  • So far, single-agent search: no opponents or
    collaborators
  • Multi-agent search
  • Playing a game with an opponent: adversarial
    search
  • Economies: even more complex, societies of
    cooperative and non-cooperative agents
  • Game playing and AI
  • Games can be complex, require (?) human
    intelligence
  • Have to evolve in real-time
  • Well-defined problems
  • Limited scope

3
Games and AI
                 Deterministic                   Chance
perfect info     Checkers, Chess, Go, Othello    Backgammon, Monopoly
imperfect info                                   Bridge, Poker, Scrabble

4
Games and search
  • Traditional search: single agent, searches for
    its well-being, unobstructed
  • Games: search against an opponent
  • Consider a two-player board game
  • e.g., chess, checkers, tic-tac-toe
  • board configuration: unique arrangement of
    "pieces"
  • Representing board games as a search problem
  • states: board configurations
  • operators: legal moves
  • initial state: current board configuration
  • goal state: winning/terminal board configuration

5
Wrong representation
  • We want to optimize our (agent's) goal, hence
    build a search tree based on possible
    moves/actions
  • Problem: this discounts the opponent

6
Better representation: game search tree
  • Include the opponent's actions as well

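[Figure: game search tree whose levels alternate agent
move, opponent move, agent move; an agent move plus the
opponent's reply is a full move. Utilities (10, 1, 5 in
the diagram) are assigned to goal nodes]
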
7
Game search trees
  • What is the size of the game search trees?
  • O(b^d)
  • Tic-tac-toe: 9! leaves (max depth 9)
  • Chess: 35 legal moves, average depth 100
  • b^d ≈ 35^100 ≈ 10^154 states, only about 10^40
    legal states
  • Too deep for exhaustive search!
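
The orders of magnitude on this slide can be checked with a few lines of Python. This is only a rough sketch using the slide's own figures (9! for tic-tac-toe, b = 35 and d = 100 for chess); the variable names are illustrative.

```python
import math

# Tic-tac-toe: 9! move sequences if every game filled the whole board.
tic_tac_toe_leaves = math.factorial(9)

# Chess: b^d with b = 35 and d = 100, expressed as a power of ten.
chess_exponent = 100 * math.log10(35)

print(tic_tac_toe_leaves)      # 362880
print(round(chess_exponent))   # 154, i.e. 35^100 is about 10^154
```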

8
Utilities in search trees
  • Assign utility to (terminal) states, describing
    how much they are valued for the agent
  • High utility: good for the agent
  • Low utility: good for the opponent

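[Figure: game tree with the computer's possible moves at
the top, the opponent's possible moves below, and terminal
states at the bottom. Node values shown: A 9, B -5, C 9,
D 2, E 3 (board evaluation from the agent's perspective)]
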
9
Search strategy
  • Worst-case scenario: assume the opponent will
    always make a best move (i.e., worst move for us)
  • Minimax search: maximize the utility for our
    agent while expecting that the opponent plays his
    best moves
  • High utility favors agent => choose move with
    maximal utility
  • Low utility favors opponent => assume opponent
    makes the move with lowest utility

[Figure: minimax tree with root A and child A1; levels
labelled computer's possible moves, opponent's possible
moves, and terminal states]

10
Minimax algorithm
  1. Start with utilities of terminal nodes
  2. Propagate them back to root node by choosing the
    minimax strategy
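
The two steps above can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes the game tree is already built as nested lists, with numbers as terminal utilities, and the example tree is hypothetical.

```python
def minimax(node, maximizing=True):
    """Return the minimax value of a tree given as nested lists.

    A leaf is a number (its utility for the agent); an internal node is a
    list of child nodes. Levels alternate between the agent (max) and the
    opponent (min).
    """
    if not isinstance(node, list):      # terminal node: utility is known
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply tree: the agent picks a branch, the opponent replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))   # 3
```

The opponent would answer the three branches with 3, 2, and 2, so the agent's best guaranteed outcome is 3.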

11
Complexity of minimax algorithm
  • Utilities propagate up in a recursive fashion
  • DFS
  • Space complexity
  • O(bd), i.e., b times d
  • Time complexity
  • O(b^d)
  • Problem: time complexity. It's a game, with
    finite time to make a move

12
Reducing complexity of minimax (1)
  • Don't search to full depth d, terminate early
  • Prune bad paths
  • Problem
  • Don't have utility of non-terminal nodes
  • Estimate utility for non-terminal nodes
  • static board evaluation function (SBE) is a
    heuristic that assigns utility to non-terminal
    nodes
  • it reflects the computer's chances of winning
    from that node
  • it must be easy to calculate from board
    configuration
  • For example, Chess
  • SBE = α * materialBalance + β * centerControl
    + γ * ...
  • material balance = value of white pieces - value
    of black pieces (pawn = 1, rook = 5, queen = 9,
    etc.)
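
A weighted-sum SBE of this kind might look like the sketch below. Everything beyond the three piece values is an assumption made for illustration: the one-letter piece encoding, the `center_control` argument, and the weights `alpha` and `beta` are placeholders, not values from the lecture.

```python
# Piece values from the slide; other pieces are omitted in this sketch.
PIECE_VALUES = {"P": 1, "R": 5, "Q": 9}


def material_balance(pieces):
    """White material minus black material.

    `pieces` is an iterable of one-letter codes: uppercase for white,
    lowercase for black (a convention assumed here, not from the lecture).
    """
    total = 0
    for p in pieces:
        value = PIECE_VALUES.get(p.upper(), 0)
        total += value if p.isupper() else -value
    return total


def sbe(pieces, center_control, alpha=1.0, beta=0.1):
    """Weighted sum of features; the weights are placeholders, not tuned."""
    return alpha * material_balance(pieces) + beta * center_control


# White: queen + rook + 2 pawns (16); black: rook + 3 pawns (8).
print(material_balance("QRPPrppp"))   # 8
```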

13
Minimax with Evaluation Functions
  • Same as general Minimax, except
  • only goes to depth m
  • estimates using SBE function
  • How would this algorithm perform at chess?
  • if it could look ahead 4 pairs of moves (i.e., 8
    ply), it would be consistently beaten by average
    players
  • if it could look ahead 8 pairs, as done in a
    typical PC, it is as good as a human master

14
Reducing complexity of minimax (2)
  • Some branches of the tree will not be taken if
    the opponent plays cleverly. Can we detect them
    ahead of time?
  • Prune off paths that do not need to be explored
  • Alpha-beta pruning
  • Keep track of two bounds while doing DFS of game
    tree
  • maximizing level: alpha
  • highest value seen so far
  • lower bound on node's evaluation/score
  • minimizing level: beta
  • lowest value seen so far
  • upper bound on node's evaluation/score
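
The bookkeeping described above can be sketched as a small Python function. This is one common way to write alpha-beta, assuming the tree is given as nested lists with numeric leaves; the example tree is hypothetical.

```python
import math


def alphabeta(node, maximizing=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of a nested-list tree, skipping branches that cannot
    change the result. Leaves are numbers; internal nodes are lists."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)      # best value max can force so far
            if alpha >= beta:              # min above would never allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)            # best value min can force so far
        if beta <= alpha:                  # max above already has better
            break
    return value


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))   # 3, the same value plain minimax returns
```

In this tree the second branch is abandoned after its first leaf (2), because the maximizer already has 3 from the first branch.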

15
Alpha-Beta Example
  • minimax(A,0,4)

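[Figure: game tree rooted at A (max level, α not yet set),
searched to depth limit 4. Node values visible in the
diagram: D 0, G -5, H 3, I 8, L 2, N 4, P 9, Q -6, R 0,
S 3, T 5, U -7, V -9, W -3, X -5; nodes B, C, E, F, J, K,
M, O carry no value yet. Call stack: A]
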
16
Alpha-Beta Example
  • minimax(B,1,4)

[Figure: same tree; B (min level) is entered with β not
yet set. Call stack: B, A]

17
Alpha-Beta Example
  • minimax(F,2,4)

[Figure: same tree; F (max level) is entered with α not
yet set. Call stack: F, B, A]

18
Alpha-Beta Example
  • minimax(N,3,4)

[Figure: same tree; N is a terminal state (drawn in blue)
with value 4. Call stack: N, F, B, A]

19
Alpha-Beta Example
  • minimax(F,2,4) is returned to

alpha = 4, maximum seen so far
[Figure: same tree; F is now labelled α = 4.
Call stack: F, B, A]

20
Alpha-Beta Example
  • minimax(O,3,4)

[Figure: same tree; O (min level) is entered with β not
yet set. Call stack: O, F, B, A]

21
Alpha-Beta Example
  • minimax(W,4,4)

[Figure: same tree; W is treated as a terminal state
(drawn in blue) because the depth limit is reached; its
value is -3. Call stack: W, O, F, B, A]

22
Alpha-Beta Example
  • minimax(O,3,4) is returned to

beta = -3, minimum seen so far
[Figure: same tree; O is now labelled β = -3.
Call stack: O, F, B, A]

23
Alpha-Beta Example
  • minimax(O,3,4) is returned to

O's beta ≤ F's alpha: stop expanding O (alpha cut-off)
[Figure: same tree with O β = -3 and F α = 4; O's
remaining child X (-5) is not explored.
Call stack: O, F, B, A]

24
Alpha-Beta Example
  • Why? Smart opponent will choose W or worse, thus
    O's upper bound is -3
  • So computer shouldn't choose O (-3) since N (4) is
    better

[Figure: same tree with F α = 4 and O β = -3.
Call stack: O, F, B, A]

25
Alpha-Beta Example
  • minimax(F,2,4) is returned to

alpha not changed (maximizing)
[Figure: same tree; F keeps α = 4 after O returns -3.
Call stack: F, B, A]

26
Alpha-Beta Example
  • minimax(B,1,4) is returned to

beta = 4, minimum seen so far
[Figure: same tree; B is now labelled β = 4.
Call stack: B, A]

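
The cut-off traced in slides 18 to 25 can be reproduced on the part of the tree whose shape is recoverable from this transcript: F (max) with children N = 4 and O (min), and O with children W = -3 and X = -5. The sketch below rests on that assumed shape; the tuple encoding and the `visited` list are illustrative, and the rest of the example tree is left out because its structure is not clear here.

```python
import math


def alphabeta(node, maximizing, alpha, beta, visited):
    """Alpha-beta over (name, payload) nodes; payload is a number for a
    leaf or a list of children. Evaluated leaves are appended to `visited`."""
    name, payload = node
    if not isinstance(payload, list):
        visited.append(name)
        return payload
    value = -math.inf if maximizing else math.inf
    for child in payload:
        score = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if beta <= alpha:        # cut-off: remaining children are skipped
            break
    return value


# Subtree under F as read from the slides (assumed shape).
F = ("F", [("N", 4), ("O", [("W", -3), ("X", -5)])])
visited = []
print(alphabeta(F, True, -math.inf, math.inf, visited))   # 4
print(visited)                                            # ['N', 'W']
```

X never appears in `visited`: once W gives O a beta of -3, which is below F's alpha of 4, O is abandoned.
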
27
Effectiveness of Alpha-Beta Search
  • Effectiveness depends on the order in which
    successors are examined. More effective if the
    best successors are examined first
  • Worst Case
  • ordered so that no pruning takes place
  • no improvement over exhaustive search
  • Best Case
  • each player's best move is evaluated first
    (left-most)
  • In practice, performance is closer to best rather
    than worst case

28
Effectiveness of Alpha-Beta Search
  • In practice often get O(b^(d/2)) rather than O(b^d)
  • same as having a branching factor of √b
  • since (√b)^d = b^(d/2)
  • For example, Chess
  • goes from b ≈ 35 to b ≈ 6
  • permits much deeper search for the same time
  • makes computer chess competitive with humans
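
The arithmetic behind the effective branching factor can be checked directly. The sketch below only uses the slide's b = 35; the depth d = 8 is an arbitrary example value.

```python
import math

b = 35                       # chess branching factor from the slide
effective_b = math.sqrt(b)   # best-case alpha-beta behaves like sqrt(b)

print(round(effective_b, 2))                          # 5.92
# (sqrt(b))^d equals b^(d/2); check for an example depth d = 8.
d = 8
print(math.isclose(effective_b ** d, b ** (d / 2)))   # True
```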

29
Dealing with Limited Time
  • In real games, there is usually a time limit T on
    making a move
  • How do we take this into account?
  • cannot stop alpha-beta midway and expect to
    use results with any confidence
  • so, we could set a conservative depth-limit that
    guarantees we will find a move in time < T
  • but then, the search may finish early and the
    opportunity is wasted to do more search

30
Dealing with Limited Time
  • In practice, iterative deepening search (IDS) is
    used
  • run alpha-beta search with an increasing depth
    limit
  • when the clock runs out, use the solution
    found for the last completed alpha-beta
    search (i.e., the deepest search that was
    completed)
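
One way to organise this loop is sketched below. It is a simplified illustration: `search_to_depth` is a hypothetical callback standing in for a depth-limited alpha-beta search, `OutOfTime` is an exception invented for this sketch, and the demo uses a fake clock so that its behaviour is deterministic.

```python
import time


class OutOfTime(Exception):
    """Raised by a search that notices the deadline has passed."""


def iterative_deepening(search_to_depth, time_limit, max_depth=64,
                        clock=time.monotonic):
    """Run depth 1, 2, 3, ... and keep the last result that finished.

    `search_to_depth(depth, deadline, clock)` is a hypothetical callback
    that returns a move or raises OutOfTime when it runs past `deadline`.
    """
    deadline = clock() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        try:
            best = search_to_depth(depth, deadline, clock)
        except OutOfTime:
            break              # discard the unfinished, deeper search
    return best


def demo_search(depth, deadline, clock):
    """Stand-in for alpha-beta: pretends depth d costs d clock checks."""
    for _ in range(depth):
        if clock() > deadline:
            raise OutOfTime
    return depth


ticks = iter(range(1000))
fake_clock = lambda: next(ticks)      # deterministic clock for the demo
print(iterative_deepening(demo_search, 5, clock=fake_clock))   # 2
```

With a budget of 5 ticks, the searches to depth 1 and 2 complete, the depth-3 search runs out of time, and the depth-2 result is used.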

31
The Horizon Effect
  • Sometimes disaster lurks just beyond search depth
  • computer captures queen, but a few moves later
    the opponent checkmates (i.e., wins)
  • The computer has a limited horizon: it cannot see
    that this significant event could happen
  • How do you avoid catastrophic losses due to
    short-sightedness?
  • quiescence search
  • secondary search

32
The Horizon Effect
  • Quiescence Search
  • when the evaluation is changing frequently, look
    deeper than the depth limit
  • look for a point when the game quiets down
  • Secondary Search
  • find best move looking to depth d
  • look k steps beyond to verify that it still looks
    good
  • if it doesn't, repeat the previous step for the
    next best move

33
Book Moves
  • Build a database of opening moves, end games, and
    studied configurations
  • If the current state is in the database, use
    database
  • to determine the next move
  • to evaluate the board
  • Otherwise, do alpha-beta search
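
The lookup-then-search rule can be sketched as a dictionary check. The string encoding of positions, the sample book entry, and the `fallback` function are all hypothetical stand-ins for illustration.

```python
def choose_move(state, book, search):
    """Return the book move for `state` if one is stored, else search."""
    if state in book:
        return book[state]
    return search(state)


# Hypothetical book keyed by a string encoding of the position.
book = {"start": "e2e4"}
fallback = lambda state: "searched:" + state   # stand-in for alpha-beta

print(choose_move("start", book, fallback))     # e2e4
print(choose_move("midgame", book, fallback))   # searched:midgame
```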

34
Examples of Algorithms which Learn to Play Well
  • Checkers
  • A. L. Samuel, Some Studies in Machine Learning
    Using the Game of Checkers, IBM Journal of
    Research and Development, 3(3):210-229, 1959
  • Learned by playing a copy of itself thousands of
    times
  • Used only an IBM 704 with 10,000 words of RAM,
    magnetic tape, and a clock speed of 1 kHz
  • Successful enough to compete well at human
    tournaments

35
Examples of Algorithms which Learn to Play Well
  • Backgammon
  • G. Tesauro and T. J. Sejnowski, A Parallel
    Network that Learns to Play Backgammon,
    Artificial Intelligence 39(3), 357-390, 1989
  • Also learns by playing copies of itself
  • Uses a non-linear evaluation function - a neural
    network
  • Rated one of the top three players in the world

36
Non-deterministic Games
  • Some games involve chance, for example
  • roll of dice
  • spin of game wheel
  • deal of cards from shuffled deck
  • How can we handle games with random elements?
  • The game tree representation is extended to
    include chance nodes
  • agent moves
  • chance nodes
  • opponent moves

37
Non-deterministic Games
  • The game tree representation is extended

[Figure: game tree with a chance level inserted between
the max and min levels]

38
Non-deterministic Games
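  • Weight score by the probability that the move
    occurs
  • Use expected value for move: sum of possible
    random outcomes, each weighted by its probability

[Figure: game tree with two 50/50 chance nodes, labelled
4 and -2, between the max and min levels]
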
39
Non-deterministic Games
  • Choose move with highest expected value

[Figure: game tree with max, chance, and min levels; root
A is labelled α = 4]

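
Minimax extended with chance nodes (often called expectiminimax) can be sketched as below. The tuple encoding is an assumption of this sketch, and the leaf values are hypothetical, chosen only so that the two chance nodes evaluate to 4 and -2 like the labels in the slides' diagram.

```python
def expectiminimax(node):
    """Value of a tree with max, min, and chance nodes.

    Node encodings (assumed for this sketch):
      number                      -> terminal utility
      ("max", [children])         -> agent to move
      ("min", [children])         -> opponent to move
      ("chance", [(p, child)...]) -> random event with probabilities p
    """
    if not isinstance(node, tuple):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average of the outcomes
    return sum(p * expectiminimax(c) for p, c in children)


# Hypothetical leaves; each chance node has two 50/50 branches.
chance_a = ("chance", [(0.5, ("min", [2, 9])), (0.5, ("min", [6, 8]))])
chance_b = ("chance", [(0.5, ("min", [0, 5])), (0.5, ("min", [-4, 1]))])
root = ("max", [chance_a, chance_b])

print(expectiminimax(chance_a))   # 4.0
print(expectiminimax(chance_b))   # -2.0
print(expectiminimax(root))       # 4.0
```
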
40
Non-deterministic Games
  • Non-determinism increases branching factor
  • 21 possible rolls with 2 dice
  • Value of lookahead diminishes as depth increases:
    probability of reaching a given node decreases
  • alpha-beta pruning less effective
  • TD-Gammon
  • depth-2 search
  • very good heuristic
  • plays at world champion level

41
Computers can play Grandmaster Chess
  • Deep Blue (IBM)
  • Parallel processor, 32 nodes
  • Each node has 8 dedicated VLSI chess chips
  • Can search 200 million configurations/second
  • Uses minimax, alpha-beta, sophisticated
    heuristics
  • It currently can search to 14 ply (i.e., 7 pairs
    of moves)
  • Can avoid horizon effect by searching as deep as 40 ply
  • Uses book moves

42
Computers can play Grandmaster Chess
  • Kasparov vs. Deep Blue, May 1997
  • 6 game full-regulation chess match sponsored by
    ACM
  • Kasparov lost the match 2.5 to 3.5 (1 win,
    2 losses, 3 draws)
  • This was a historic achievement for computer
    chess, being the first time a computer became the
    best chess player on the planet
  • Note that Deep Blue plays by brute force (i.e.,
    raw power from computer speed and memory); it
    uses relatively little that is similar to human
    intuition and cleverness

43
Status of Computers in Other Deterministic Games
  • Checkers/Draughts
  • current world champion is Chinook
  • can beat any human (beat Tinsley in 1994)
  • uses alpha-beta search, book moves (> 443
    billion)
  • Othello
  • computers can easily beat the world experts
  • Go
  • branching factor b ≈ 360 (very large!)
  • $2 million prize for any system that can beat a
    world expert