1
Learning Shape in Computer Go
  • David Silver

2
A brief introduction to Go
  • Black and White take turns placing stones on the board
  • Once played, a stone cannot move
  • The aim is to surround the most territory
  • Usually played on a 19x19 board

3
Capturing
  • The lines radiating from a stone are called
    liberties
  • If a connected group of stones has all of its
    liberties removed then it is captured
  • Captured stones are removed from the board
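  A minimal sketch of the capture rule, assuming a board stored as a dict mapping (row, col) to 'B' or 'W' with empty points absent; the function names and representation are illustrative, not from the slides:

    def group_and_liberties(board, start, size=19):
        """Flood-fill the connected group containing `start` and collect its liberties."""
        colour = board[start]
        group, liberties, frontier = {start}, set(), [start]
        while frontier:
            r, c = frontier.pop()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if not (0 <= nr < size and 0 <= nc < size):
                    continue                      # off the board
                if (nr, nc) not in board:
                    liberties.add((nr, nc))       # empty neighbour = liberty
                elif board[(nr, nc)] == colour and (nr, nc) not in group:
                    group.add((nr, nc))
                    frontier.append((nr, nc))
        return group, liberties

    def remove_if_captured(board, point):
        """A group with no liberties left has been captured and is removed."""
        group, libs = group_and_liberties(board, point)
        if not libs:
            for p in group:
                del board[p]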

5
Atari Go (Capture Go)
  • Atari Go is a simplified version of Go
  • The winner is the first player to make a capture
  • Often used to teach Go to beginners
  • Circumvents several tricky issues:
    • The game ending only by agreement
    • Ko (local repetitions of position)
    • Seki (local stalemates)

6
Computer Go
  • Computer Go programs are very weak
  • Search space is too large for brute force
    techniques
  • No good evaluation functions
  • Human intuition (shape knowledge) has proven
    difficult to capture.
  • Why not learn shape knowledge?
  • And use it to learn an evaluation function?

7
Local shape
  • Local shape describes a pattern of stones
  • It is used extensively by current Computer Go
    programs (pattern databases)
  • Inputting local shape by hand takes many years of
    hard labour
  • We would like to:
    • Learn local shapes by trial and error
    • Assign a value for the goodness of a shape: just how good is a particular shape?

8
Enumerating local shapes
  • In these experiments all possible local shapes are used as features
    • Up to a small maximum size (e.g. 2x2)
  • A local shape is defined to be:
    • A particular configuration of stones
    • At a canonical position on the board
  • Local shapes are used as binary features by the learning algorithm (see the sketch below)
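  One way this could be realised (a sketch under assumptions, not necessarily the representation used in the experiments): every assignment of empty/black/white to the cells of a fixed window is one shape, and a binary feature is a (shape, board position) pair that is active when that configuration appears there.

    from itertools import product

    EMPTY, BLACK, WHITE = 0, 1, 2

    def enumerate_shapes(height=2, width=2):
        """Index every possible stone configuration of a height x width window."""
        shapes = product((EMPTY, BLACK, WHITE), repeat=height * width)
        return {shape: i for i, shape in enumerate(shapes)}

    def active_features(board, shape_index, height=2, width=2):
        """Binary features: which (shape, board position) pairs occur in this position."""
        size = len(board)                       # board as a list of lists of 0/1/2
        active = set()
        for r in range(size - height + 1):
            for c in range(size - width + 1):
                window = tuple(board[r + dr][c + dc]
                               for dr in range(height) for dc in range(width))
                active.add((shape_index[window], r, c))
        return active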

9
Invariances
  • Each canonical local shape can be:
    • Rotated
    • Reflected
    • Inverted (black and white swapped)
  • So each position may cause updates to multiple instances of each feature (one way to fold the symmetries together is sketched below)
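  A sketch of one way to exploit these invariances, assuming a square window stored as a tuple of tuples of 0/1/2: generate all rotations, reflections and colour inversions and use the lexicographically smallest as the canonical representative, so that symmetric shapes share a feature.

    def transforms(shape):
        """All rotations, reflections and colour inversions of a square shape."""
        def rotate(s):                          # 90-degree rotation
            return tuple(zip(*s[::-1]))
        def reflect(s):                         # horizontal mirror
            return tuple(row[::-1] for row in s)
        def invert(s):                          # swap black and white, keep empty
            swap = (0, 2, 1)
            return tuple(tuple(swap[x] for x in row) for row in s)
        variants, current = [], shape
        for _ in range(4):
            current = rotate(current)
            for v in (current, reflect(current)):
                variants.append(v)
                variants.append(invert(v))
        return variants

    def canonical(shape):
        """Fixed representative shared by all symmetric variants of a shape."""
        return min(transforms(shape))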

10
Algorithm
  • A value function is learnt for afterstates
  • Move selection is done by 1-ply greedy search (ε = 0) over the value function
  • To evaluate a position:
    • Active local shapes are identified
    • A linear combination is taken
    • A sigmoid squashing function is applied
  • Backups are performed using TD(0) (see the sketch below)
  • Reward of 1 for winning, 0 for losing
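  A sketch of the update described above. The names, the step size, and the use of a plain dict for the weights are assumptions; `active` is the set of local-shape feature indices present in an afterstate.

    import math

    def value(weights, active):
        """Sigmoid of the linear combination of the active binary features."""
        return 1.0 / (1.0 + math.exp(-sum(weights.get(i, 0.0) for i in active)))

    def td0_update(weights, active, next_active, reward, terminal, alpha=0.1):
        """TD(0) backup towards the final reward (1 win / 0 loss) or the next afterstate's value."""
        v = value(weights, active)
        target = reward if terminal else value(weights, next_active)
        delta = target - v
        grad = v * (1.0 - v)                    # derivative of the sigmoid
        for i in active:
            weights[i] = weights.get(i, 0.0) + alpha * delta * grad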

11
Value function approximation
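  The bullets on the previous slide imply an approximator of roughly this form (a reconstruction from those bullets, not a formula copied from the slide):

    V(s) = \sigma\Big(\sum_i w_i \, \phi_i(s)\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}

  where \phi_i(s) \in \{0, 1\} indicates whether local shape i is present in afterstate s, and w_i is its learned weight.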
12
Training procedure
  • The challenge: learn to beat the average liberty player
  • So the learning algorithm was trained specifically against the average liberty player
  • The problem: learning is very slow, since the agent almost never wins any games by chance
  • The solution: mix in a proportion of random moves until the agent wins 50% of all games (a sketch follows below)
  • Reduce the proportion of randomness as the agent learns to win more games
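  A sketch of the randomness-mixing idea; which player's moves are randomised, the step size, and the annealing schedule are assumptions rather than details given on the slide.

    import random

    def mix_random(base_move, legal_moves, p_random):
        """With probability p_random play a uniformly random legal move, else the base move."""
        if random.random() < p_random:
            return random.choice(legal_moves)
        return base_move

    def anneal(p_random, recent_win_rate, target=0.5, step=0.05):
        """Shrink the random fraction once the learner wins at least `target` of its games."""
        return max(0.0, p_random - step) if recent_win_rate >= target else p_random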

14
Results for different shape sizes
15
Results for different board sizes
16
Shapes learned (1x1)
17
Shapes learned (2x2)
18
Shapes learned (3x3)
19
Conclusions
  • Local shape information is sufficient to beat a
    naïve rule-based player
  • Significant shapes can be learned
  • The goodness of shapes can be learned
  • A linear threshold unit can provide a reasonable
    evaluation function
  • Enumerating all local shapes reaches a natural
    limit at 3x3
  • Training methodology is crucial

20
Future work
  • Learn shapes selectively rather than enumerating
    all possible shapes
  • Learn shapes to answer specific questions:
    • Can black B4 be captured?
    • Can white connect A2 to D5?
  • Learn non-local shape:
    • Use connectivity relationships
    • Build hierarchies of shapes