Experience-Based Chess Play - PowerPoint PPT Presentation

Provided by: IIPL

1
Experience-Based Chess Play
Stanford ML Seminar March 16, 2005

Robert Levinson
Machine Intelligence Lab
University of California,
Santa Cruz

2
Outline

1. Review of State-of-the-art in Games
2. Review of Computer Chess Method
3. Blindspots and Unsolved Issues
4. Morph
4a. Philosophy and Results
4b. Patterns/Evaluation
4c. Learning
TD-learning
Neural Nets
Genetic Algorithms

3
Why Chess?

Human/computer approaches very different
Most studied game
Well-known internationally and by public
Cognitive studies available
Accurate, well-defined rating system!
Complex and Non-Uniform
John McCarthy, Alan Turing, Claude Shannon,
Herb Simon, Ken Thompson and others.
Game theoretic value unknown
Active Research Community/Journal

4
State of the art
5
State of the art (2)
6
State of the art (3)
7
Kasparov vs. Deep Blue

1. Deep Blue can examine and evaluate up to
200,000,000 chess positions per second
Garry Kasparov can examine and evaluate up to
three chess positions per second
2. Deep Blue has a small amount of chess
knowledge and an enormous amount of calculation
ability.
Garry Kasparov has a large amount of chess
knowledge and a somewhat smaller amount of
calculation ability.
3. Garry Kasparov uses his tremendous sense of
feeling and intuition to play world
champion-calibre chess.
Deep Blue is a machine that is incapable of
feeling or intuition.
4. Deep Blue has benefitted from the guidance of
five IBM research scientists and one
international grandmaster.
Garry Kasparov is guided by his coach Yuri
Dokhoian and by his own driving passion to play
the finest chess in the world.
5. Garry Kasparov is able to learn and adapt very
quickly from his own successes and mistakes.
Deep Blue, as it stands today, is not a "learning
system." It is therefore not capable of utilizing
artificial intelligence to either learn from its
opponent or "think" about the current position of
the chessboard.
6. Deep Blue can never forget, be distracted or
feel intimidated by external forces (such as
Kasparov's infamous "stare").
Garry Kasparov is an intense competitor, but he
is still susceptible to human frailties such as
fatigue, boredom and loss of concentration.

8
Recent Man vs. Machine Matches

Garry Kasparov versus Deep Junior, January 26 -
February 7, 2003, in New York City, USA.
Result: 3 - 3 draw.
Evgeny Bareev versus Hiarcs-X, January 28 - 31,
2003, in Maastricht, Netherlands.
Result: 2 - 2 draw.
Vladimir Kramnik versus Deep Fritz, October 2 -
22, 2002, in Manama, Bahrain.
Result: 4 - 4 draw.

9
Minimax Example
(Figure: game tree with alternating Max and Min
levels and numeric values at the terminal nodes)

Terminal node values are calculated from some
evaluation function

10
(Figure: the same game tree with one more level
of values filled in above the terminal nodes)

Other node values are calculated via the minimax
algorithm
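
A minimal Python sketch of the back-up step just described. The nested-list tree and its leaf values are made up for illustration and are not the tree from the slides; each leaf is a number standing in for an evaluation-function score, and the function names are hypothetical.

```python
def minimax(node, maximizing):
    """Back up leaf values through alternating Max and Min levels."""
    if not isinstance(node, list):
        # Leaf: the value supplied by the evaluation function.
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(children):
    """Index of the child that Max should move to (the root is a Max node)."""
    scores = [minimax(child, False) for child in children]
    return scores.index(max(scores))


# Hypothetical tree: root (Max) -> three Min nodes -> terminal values.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))
print(best_move(tree))
```

After the opponent replies, the same functions would simply be called again on the new, regenerated tree.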

11
(Figure: the same game tree with a further level
of backed-up values filled in: 7, 6, 5, 5, 6, 4)
12
(Figure: the same game tree with the next level
of backed-up values filled in: 5, 3, 4)
13
(Figure: the completed game tree with root value
5, annotated "Actual move made by Max" and
"Possible later moves")

Max makes the first move down the left-hand side
of the tree, in the expectation that Min's
countermove is now predicted. After Min moves,
the tree will need to be regenerated and the
minimax procedure re-applied to the new tree

14
(Figure: the completed game tree, root value 5)
15
Computer chess

Let's say you start with a chess board set up for
the start of a game. Each player has 16 pieces.
Let's say that white starts. White has 20
possible moves
The white player can move any pawn forward one or
two positions.
The white player can move either knight in two
different ways.
The white player chooses one of those 20 moves
and makes it. For the black player, the options
are the same 20 possible moves. So black chooses
a move.
Now white can move again. This next move depends
on the first move that white chose to make, but
there are about 20 or so moves white can make
given the current board position, and then black
has 20 or so moves it can make, and so on.

16
Chess complexity
There are 20 possible moves for white. There are
20 × 20 = 400 possible moves for black, depending
on what white does. Then there are 400 × 20 =
8,000 for white. Then there are 8,000 × 20 =
160,000 for black, and so on. If you were to
fully develop the entire tree for all possible
chess moves, the total number of board positions
is about 10^120 (a 1 followed by 120 zeros).
There have only been 10^26 nanoseconds since the
Big Bang. There are thought to be only 10^75
atoms in the entire universe.
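
The growth described on this slide can be checked with a few lines of Python. This is only a sketch under the slide's simplification of a constant 20 moves per position; the last calculation asks how many plies that simplified model would need to reach 10^120, which is not necessarily how the 10^120 estimate itself was derived.

```python
import math

BRANCHING = 20  # the slide's rough figure for moves per position

# Move sequences after 1, 2, 3 and 4 plies (half-moves).
counts = [BRANCHING ** ply for ply in range(1, 5)]
print(counts)  # [20, 400, 8000, 160000]

# Plies needed before 20**plies reaches 10**120 in this simplified model.
plies = math.ceil(120 / math.log10(BRANCHING))
print(plies)
```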
17
How computer chess works
No computer is ever going to calculate the entire
tree. What a chess computer tries to do is
generate the board-position tree five or 10 or 20
moves into the future. Assuming that there are
about 20 possible moves for any board position, a
five-level tree contains 3,200,000 board
positions. A 10-level tree contains about
10,000,000,000,000 (10 trillion) positions. The
depth of the tree that a computer can calculate
is controlled by the speed of the computer
playing the game.
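
As a rough illustration of why speed limits depth, the sketch below combines the slide's assumed 20 moves per position with the 200,000,000 positions per second quoted for Deep Blue earlier in the talk. It ignores pruning and every other refinement, so the times are only order-of-magnitude figures; the function names are hypothetical.

```python
POSITIONS_PER_SECOND = 200_000_000  # Deep Blue figure quoted earlier in the talk
BRANCHING = 20                      # the slide's assumed moves per position


def tree_size(depth):
    """Board positions at the bottom level of a depth-level tree."""
    return BRANCHING ** depth


def seconds_to_search(depth):
    """Time to evaluate every bottom-level position at the quoted speed."""
    return tree_size(depth) / POSITIONS_PER_SECOND


for depth in (5, 10, 20):
    print(depth, tree_size(depth), seconds_to_search(depth))
```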
18
Is computer chess intelligent?
The minimax algorithm alternates between the
maximums and minimums as it moves up the tree.
This process is completely mechanical and
involves no insight. It is simply a brute force
calculation that applies an evaluation function
to all possible board positions in a tree of a
certain depth. What is interesting is that this
sort of technique works pretty well. On a
fast-enough computer, the algorithm can look far
enough ahead to play a very good game. If you add
in learning techniques that modify the evaluation
function based on past games, the machine can
even improve over time. The key thing to keep in
mind, however, is that this is nothing like human
thought. But does it have to be?
19
(Figure: leaf nodes examined versus search depth)
20
Hmmm. What to do?

Black is to move

21
Ra5!

White to move

22
(No Transcript)
23
(No Transcript)
24
What is strategy?

the art of devising or employing plans or
stratagems toward a goal where favorable.

25
Chess is Multi-Level
26
Science

Would you say a traditional chess program's
chess strength is based mainly on a firm
axiomatic theory about the strengths and balances
of competing differential semiotic trajectory
units in a multiagent hypergeometric topological
manifold-like zero-sum dynamic environment,
or simply because
it is an accurate short-term calculator??!

27
Morph Philosophy

Reinforcement Learning
Mathematical View of
Chess
Don't Cheat!!!

28
Cheating!

#define DOUBLED_PAWN_PENALTY      10
#define ISOLATED_PAWN_PENALTY     20
#define BACKWARDS_PAWN_PENALTY     8
#define PASSED_PAWN_BONUS         20
#define ROOK_SEMI_OPEN_FILE_BONUS 10
#define ROOK_OPEN_FILE_BONUS      15
#define ROOK_ON_SEVENTH_BONUS     20
/* the values of the pieces */
int piece_value[6] = { 100, 300, 350, 500, 900, 0 };

29
White-Queen-square table
30
Goal ELO 3000

2800 World Champion
2600 GM
2400 IM
2200 FM (> 99 percent)
Expert
1600 Median (> 50 percent)
1543 Morph
1000 Novice (beginning tournament player)
555 Random
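
To give a feel for what the gaps on this ladder mean, here is a small Python sketch of the standard Elo expected-score formula. The formula is the usual one for the Elo system; applying it to the ratings on this slide is an illustration only, and the function name is hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Expected score (win = 1, draw = 0.5) of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


print(expected_score(1543, 555))   # Morph against the random player
print(expected_score(1543, 2800))  # Morph against a world champion
```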

31
Milestones

Morph I, 1995. Graph Matching.
Draws GnuChess every 10 games. 800 rating.
Morph II, 1998. Plays any game, compiles
neural net based on First Order Logic
rules. Optimal Tic-Tac-Toe, NIM.
Morph IV, 2003. Neural Neighborhoods, Improved
Eval. Reaches 1036.
Summer 2004: 1450 (improved implementation)
Today: 1558 (neural net, pattern variety,
genetic alg.)
Next: genetically selected patterns and nets.

32
Total Information = Diversity + Symmetry

Diversity corresponds to Comp Sci Complexity
resources required.
Diversity can often only be resolved with
Combinatorial Search ???

33
Exploit Symmetry !! ?

Invariant with respect to transformation.
Shared information between objects
or systems or their representations.
AB + AC = A(B + C). ?

34
Symmetry Synonyms

similarity
commonality
structure
mutual information
relationship
pattern
redundancy

35
Novices
Experts
36
Levels of Learning

Levels
0. None: Brute Force It
1. Empirically: Statistical Understanding
- Inductive
1a. Supervise
1b. Reward/Punish: TD Learning
1c. Imitate
2. Analytically: Mathematical - Deductive
But Must Be Efficient!

37
Chess and Stock Trading
Patterns and their interpretation

Technical Analysis: Forecasting market activity
based on the study of price charts, trends and
other technical data.
Fundamental Analysis: Forecasting based on
analysis of economic and geopolitical data.
A Trading plan is formulated and executed only
after a thorough and systematic Technical and
Fundamental analysis.
All trading plans incorporate Risk Management
procedures.

38
Trade Example I GBP/USD
39
THE MORPH ARCHITECTURE

Pattern-based!

CHESS GAMES
NEW POSITION
INTUITIVELY BEST MOVE
COMPILED MEMORY
Pattern Processor Matcher
4-6 PLY Search
WEIGHTS

ADB
40
MORPH
Positions
Chess
Patterns
Weights
4-Ply Search
Reply
41
Morph patterns

graph patterns
both nodes, edge labeled
direct attack, indirect attack, discovered attack
material patterns
Piece/square tables
(later graph patterns realized as neural
networks)

42
The game of Chess example

White has over 30 possible moves
If it were Black's turn, Black could capture the
pawn at c3 with check (also a fork)

43
(No Transcript)
44
(No Transcript)
45
Exploiting Graph Isomorphism
46
Pattern weight formulation of search knowledge

weights
real number within the reinforcement value range,
e.g. [0,1]
expected value of reinforcement given that the
current state satisfied the pattern
pws: <p1, .7>
the states that have p1 as a feature are more
likely to lead to a win than a loss
advantage
low level of granularity and uniformity

47
Evaluation Scheme
64 Neighborhood Values are Computed and Then
Combined
48
Morph - evaluation function
KEY: Use product rather than sum!!
WHY? Makes it risk averse.
49
Morph - evaluation as product

x1    x2    x1 × x2
.5    .5    0.25
.2    .8    0.16
.2    .5    0.10
.8    .5    0.40
.8    .8    0.64
.2    .2    0.04
.1    .8    0.08
.2    .9    0.18
1     .8    0.80
0     .2    0.00
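
A short Python sketch of one way to read the "risk averse" claim: two pairs of values with the same sum get different products, and the uneven pair scores lower. Morph combines 64 neighborhood values; only two are used here, and the function names are hypothetical.

```python
from math import prod


def evaluate_product(values):
    """Combine neighborhood values in [0, 1] by multiplying them."""
    return prod(values)


def evaluate_sum(values):
    """Combine the same values by adding them, for comparison."""
    return sum(values)


balanced = [0.5, 0.5]
lopsided = [0.2, 0.8]
print(evaluate_sum(balanced), evaluate_sum(lopsided))
print(evaluate_product(balanced), evaluate_product(lopsided))
```

Under the product, a single value of 0 forces the whole evaluation to 0, as in the last row of the table.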
50
Learning Ingredients

Graph Patterns
Neural Networks
Temporal Difference Learning
Simulated Annealing
Representation Change
Genetic Algorithms

51
Modifying the weight of patterns
1. Each state in the sequence of states that
preceded the reinforcement value is
assigned a new value using temporal
difference learning (assume W won)

       W      B      W      B      W
old    0.6    0.35   0.70   0.3    0.8
new    0.675  0.25   0.85   0.0    1.0

2. The new value assigned to each state is
propagated down to the patterns that matched
the state
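
The slide does not spell out the update rule, so the Python sketch below is a guess: it uses a learning rate of 0.5, sets the final two positions directly to the game result (1 for the winner, 0 for the loser), and moves each earlier value halfway toward the complement of its successor's new value. These assumptions were chosen only because they reproduce the old and new numbers shown on the slide; the function name is hypothetical.

```python
def td_update(old_values, final_reward, rate=0.5):
    """Return new state values for one finished game.

    old_values holds the estimate for each position from the viewpoint of
    the side that reached it (sides alternate), with the last position last.
    """
    new = list(old_values)
    new[-1] = final_reward                # winner's final position
    if len(new) > 1:
        new[-2] = 1.0 - final_reward      # loser's final position
    # Walk backwards: the target is the complement of the successor's new value.
    for i in range(len(new) - 3, -1, -1):
        target = 1.0 - new[i + 1]
        new[i] = old_values[i] + rate * (target - old_values[i])
    return new


old = [0.6, 0.35, 0.70, 0.3, 0.8]
print([round(v, 3) for v in td_update(old, 1.0)])
```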
52
Difference between Morph and traditional TD
learning
1. The feature set changes throughout
the learning process
2. Morph uses a simulated annealing
type scheme to give the weight of each pattern
its own learning rate
3. The more a pattern gets
updated, the slower its learning
rate becomes
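
A minimal Python sketch of a per-pattern learning rate that slows with use. The 1/n schedule is an assumption, not Morph's actual schedule; it is chosen because it makes the weight the running average of the targets seen, which fits the earlier description of a weight as an expected reinforcement value. The class name is hypothetical.

```python
class PatternWeight:
    """A pattern weight whose learning rate shrinks with each update."""

    def __init__(self, initial=0.5):
        self.value = initial
        self.updates = 0

    def update(self, target):
        self.updates += 1
        rate = 1.0 / self.updates  # assumed schedule: 1, 1/2, 1/3, ...
        self.value += rate * (target - self.value)
        return self.value


w = PatternWeight()
for outcome in (1.0, 0.0, 1.0, 1.0):
    w.update(outcome)
print(w.value, w.updates)
```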
53
Simulated annealing
The DB comprises a complex system of many
particles.
Optimal configuration: each weight has its
proper value.
The average error serves as the objective
evaluation function.
Average error: the difference between APS's
prediction of a state's value and that
provided by temporal-difference learning.
54
Neural Networks

Perhaps not so dumb?

Morph's neighborhood network is a NON-LINEAR
PERCEPTRON!
55
Biological inspirations

Some numbers
The human brain contains about 10 billion nerve
cells (neurons)
Each neuron is connected to the others through
10000 synapses
Properties of the brain
It can learn, reorganize itself from experience
It adapts to the environment
It is robust and fault tolerant

56
Biological neuron

A neuron has
A branching input (dendrites)
A branching output (the axon)
The information circulates from the dendrites to
the axon via the cell body
Axon connects to dendrites via synapses
Synapses vary in strength
Synapses may be excitatory or inhibitory

57
What is an artificial neuron ?

Definition: Non-linear, parameterized function
with restricted output range

(Figure: a neuron with inputs x1, x2, x3, bias
w0 and output y)
58
Activation functions
Linear
Logistic
Hyperbolic tangent
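
A minimal Python sketch of an artificial neuron with the three activation functions listed on this slide. The input and weight values are arbitrary examples, and the function names are hypothetical.

```python
import math


def linear(a):
    return a


def logistic(a):
    return 1.0 / (1.0 + math.exp(-a))


def neuron(inputs, weights, bias, activation=logistic):
    """Compute y = f(w0 + sum_i w_i * x_i), where bias plays the role of w0."""
    a = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(a)


x = [1.0, 0.5, -1.0]
w = [0.4, -0.2, 0.1]
print(neuron(x, w, bias=0.0))                        # logistic, output in (0, 1)
print(neuron(x, w, bias=0.0, activation=math.tanh))  # tanh, output in (-1, 1)
print(neuron(x, w, bias=0.0, activation=linear))     # linear, unrestricted
```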
59
Feed Forward Neural Networks

The information is propagated from the inputs to
the outputs
Computations of non linear functions from input
variables by compositions of algebraic functions
Time has no role (NO cycle between outputs and
inputs)

(Figure: inputs x1, x2, ..., xn feeding a 1st
hidden layer, a 2nd hidden layer and an output
layer)
60
Recurrent Neural Networks

Can have arbitrary topologies
Can model systems with internal states (dynamic
ones)
Delays are associated to a specific weight
Training is more difficult
Performance may be problematic
Stable Outputs may be more difficult to evaluate
Unexpected behavior (oscillation, chaos, ...)

(Figure: a recurrent network with inputs x1 and
x2 and delays of 0 or 1 on its connections)
61
Properties of Neural Networks

Supervised networks are universal approximators
(non-recurrent networks)
Theorem: Any bounded function can be
approximated by a neural network with a finite
number of hidden neurons to an arbitrary
precision

62
Multi-Layer Perceptron

One or more hidden layers
Sigmoid activation functions

Output layer
2nd hidden layer
1st hidden layer
Input data
63
Different non linearly separable problems

Structure      Types of Decision Regions
Single-Layer   Half Plane Bounded By Hyperplane
Two-Layer      Convex Open Or Closed Regions
Three-Layer    Arbitrary (Complexity Limited by
               No. of Nodes)

(Figure columns: Exclusive-OR Problem, Classes
with Meshed Regions, Most General Region Shapes)

Neural Networks: An Introduction, Dr. Andrew
Hunter
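
The Exclusive-OR case can be made concrete with a tiny hand-wired network. This Python sketch uses threshold units and weights picked by hand for illustration: one layer of two hidden units followed by one output unit, which is enough for XOR even though no single unit can compute it.

```python
def step(a):
    """Threshold activation: fires when the weighted sum is positive."""
    return 1 if a > 0 else 0


def xor_net(x1, x2):
    """Two-layer threshold network: hidden OR and AND units, then OR-but-not-AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```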
64
Perceptron Update

We employ
Gradient descent: w0 + w1 x1 + w2 x2 + ... + w17 x17
Exponential gradient: w0 + w1 e^(-x1) + ... + w17 e^(-x17)
Non-linear terms: w0 + w12 x1 x2
Triples: w0 + w123 x1 x2 x3
Soon some higher order terms:
w0 + w1457 x1 x4 x5 x7 + ...
genetically selected
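
A rough Python sketch of a gradient-descent update on a unit that includes product terms. It assumes a linear output and squared error, expands three inputs rather than Morph's 17, and uses every pair (and optionally every triple) rather than a genetically selected subset. All names and values are illustrative.

```python
from itertools import combinations


def features(x, order=2):
    """Inputs plus the product of every pair (and every triple if order=3)."""
    feats = [1.0] + list(x)  # the leading 1.0 carries the bias w0
    for k in range(2, order + 1):
        for idx in combinations(range(len(x)), k):
            term = 1.0
            for i in idx:
                term *= x[i]
            feats.append(term)
    return feats


def gradient_step(weights, x, target, rate=0.1, order=2):
    """One gradient-descent step on squared error for a linear output."""
    f = features(x, order)
    prediction = sum(w * v for w, v in zip(weights, f))
    error = target - prediction
    return [w + rate * error * v for w, v in zip(weights, f)]


x = [0.5, 0.2, 0.8]
w = [0.0] * len(features(x))
w = gradient_step(w, x, target=1.0)
print(len(w), w)
```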

65
Genetic Algorithms

May Help Supervise Training and Development of
the Neural Net structure!

BONUS
We believe use of diversity in weight updating is
underrated!!
66
Charles Darwin (1809-1882)

Botanist, Zoologist, Geologist, General Man of
Science

Sailed on the Beagle for about 5 years.

1859: Wrote On the Origin of Species, a very
popular scientific treatise

Goal: Develop an A.I. learning method using
principles of Darwin's amazing Beagle trip and
subsequent theories of biological evolution

67
Genetic Algorithms (GA)

John Holland, father of Genetic Algorithms
Computer programs that "evolve" in ways that
resemble natural selection can solve complex
problems even their creators do not fully
understand
GA allows A.I. knowledge discovery not yet known
to humans or machines
Adaptable to complex domains where human
knowledge is limited, or sufficient domain
knowledge cannot be easily provided
New hypotheses are generated by mutating and
recombining current hypotheses
At each step we have a population of hypotheses
from which we select the most fit
GA does parallel search over different parts of
the hypothesis space

68
Blondie24 A GA Rock Star Success ??

David Fogel's Neural Net/GA Checker Playing
Program
Final rating 2045.85 (Master Level). Better than
95% of all checkers players
Neural Network: 2 hidden layers with 40 and 10
nodes respectively, fully connected
Initially 15 randomly weighted NNs
Each of the 15 NNs produces 1 offspring (using
mutation), for a total of 30 NNs
Each player plays against 5 randomly selected
opponents using depth-4 alpha-beta
Top 15 performers retained
250 generations (time taken: about 1 month)

69
Morph GA: Fitness, Selection, Variation

Initial Experiment: 20 Nan (2-Tuple) Brains play
Round Robin each Generation, for several hundred
Generations
Fitness: 1 point given for game won, -1 for
game lost. Draw receives no points. Points
totalled for each Nan-Brain at Generation end
Selection: Ten highest scoring Brains are
selected for survival, remaining Brains
eliminated.
Variation: 10-50% of weights are randomly mutated
using a Box-Muller algorithm for Normal
Distribution and a Sigmoid Function
Initial Results: 300 ELO Points gained from
initial 555 (Random) ELO using Nan (2-Tuple)
Neighborhood Knowledge Representation Schema
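
The loop above can be sketched in Python as follows. Several parts are assumptions made for illustration: the play function is a stand-in that compares weight sums instead of playing chess, each survivor is assumed to produce one mutated offspring to refill the population, and the sigmoid is assumed to squash mutated weights into (0, 1). Only the round robin, the +1/-1/0 scoring, the top-half survival and the Box-Muller normal noise come from the slide.

```python
import math
import random


def box_muller(rng):
    """One standard-normal sample built from two uniform samples."""
    u1 = 1.0 - rng.random()  # in (0, 1], which avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def mutate(weights, rng, fraction=0.3, scale=1.0):
    """Perturb a random fraction of the weights and squash them into (0, 1)."""
    child = list(weights)
    count = max(1, int(fraction * len(child)))
    for i in rng.sample(range(len(child)), count):
        child[i] = sigmoid(child[i] + scale * box_muller(rng))
    return child


def play(a, b):
    """Stand-in for a chess game: +1 if a wins, -1 if b wins, 0 for a draw."""
    sa, sb = sum(a), sum(b)
    return (sa > sb) - (sa < sb)


def generation(brains, rng):
    """Round robin, keep the top half, refill with mutated copies."""
    scores = [0] * len(brains)
    for i in range(len(brains)):
        for j in range(i + 1, len(brains)):
            result = play(brains[i], brains[j])
            scores[i] += result
            scores[j] -= result
    ranked = sorted(range(len(brains)), key=lambda k: scores[k], reverse=True)
    survivors = [brains[k] for k in ranked[: len(brains) // 2]]
    return survivors + [mutate(s, rng) for s in survivors]


rng = random.Random(0)
brains = [[rng.random() for _ in range(8)] for _ in range(20)]
for _ in range(50):
    brains = generation(brains, rng)
print(max(sum(b) for b in brains))
```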

70
Future Experiments

We are confident we can achieve higher ELO
rating levels using GA, but it is important we do
not cheat (provide specific chess knowledge)
Variations upon the Knowledge Representation,
including X-Tuple Neighborhood Networks
Evolving meta-parameters using Genetic Algorithms,
e.g. range of mutating weights, size of knowledge
representation, number of brains competing
Lamarckian Evolution: Learning during a
generation (not just selection/mutation alone),
i.e., TD updating
Using GA with alternative Knowledge
Representations, including various Neural Net
configurations

71
Grandmaster Database
Database: all (507,734 games)
Report: 1.d4 Nf6 2.Bg5 Ne4 3.h4 (173 games)
ECO: A45s Trompowsky: Raptor Variation
Generated by Scid 3.0, 2001.11.15

1. STATISTICS AND HISTORY
1.1 Statistics

                    Games   1-0   =-=   0-1   Score
All report games     173     61    44    68   47.9%
Both rated 2600+       0      0     0     0    0.0%
Both rated 2500+      17      8     5     4   61.7%
Both rated 2400+      33     16     9     8   62.1%
Both rated 2300+      67     26    21    20   54.4%
