Title: Experience-Based Chess Play
1Experience-Based Chess Play
Stanford ML Seminar March 16, 2005
- Robert Levinson
- Machine Intelligence Lab
- University of California, Santa Cruz
2Outline
- 1. Review of State-of-the-art in Games
- 2. Review of Computer Chess Method
- 3. Blindspots and Unsolved Issues
- 4. Morph
- 4a. Philosophy and Results
- 4b. Patterns/Evaluation
- 4c. Learning
- TD-learning
- Neural Nets
- Genetic Algorithms
3Why Chess?
- Human/computer approaches very different
- Most studied game
- Well-known internationally and by public
- Cognitive studies available
- Accurate, well-defined rating system!
- Complex and Non-Uniform
- Studied by John McCarthy, Alan Turing, Claude Shannon, Herb Simon, Ken Thompson and ...
- Game theoretic value unknown
- Active Research Community/Journal
4State of the art
5State of the art (2)
6State of the art (3)
7Kasparov vs. Deep Blue
- 1. Deep Blue can examine and evaluate up to 200,000,000 chess positions per second.
- Garry Kasparov can examine and evaluate up to three chess positions per second.
- 2. Deep Blue has a small amount of chess knowledge and an enormous amount of calculation ability.
- Garry Kasparov has a large amount of chess knowledge and a somewhat smaller amount of calculation ability.
- 3. Garry Kasparov uses his tremendous sense of feeling and intuition to play world champion-calibre chess.
- Deep Blue is a machine that is incapable of feeling or intuition.
- 4. Deep Blue has benefitted from the guidance of five IBM research scientists and one international grandmaster.
- Garry Kasparov is guided by his coach Yuri Dokhoian and by his own driving passion to play the finest chess in the world.
- 5. Garry Kasparov is able to learn and adapt very quickly from his own successes and mistakes.
- Deep Blue, as it stands today, is not a "learning system." It is therefore not capable of utilizing artificial intelligence to either learn from its opponent or "think" about the current position of the chessboard.
- 6. Deep Blue can never forget, be distracted or feel intimidated by external forces (such as Kasparov's infamous "stare").
- Garry Kasparov is an intense competitor, but he is still susceptible to human frailties such as fatigue, boredom and loss of concentration.
8Recent Man vs. Machine Matches
- Garry Kasparov versus Deep Junior, January 26 - February 7, 2003, in New York City, USA. Result: 3 - 3 draw.
- Evgeny Bareev versus Hiarcs-X, January 28 - 31, 2003, in Maastricht, Netherlands. Result: 2 - 2 draw.
- Vladimir Kramnik versus Deep Fritz, October 2 - 22, 2002, in Manama, Bahrain. Result: 4 - 4 draw.
9Minimax Example
(Figure: game tree with alternating Max and Min levels; only the bottom row of terminal nodes carries values)
- terminal node values are calculated from some evaluation function
10(Figure: the same tree with the level above the terminal nodes filled in)
- other node values are calculated via the minimax algorithm
11(Figure: the same tree with one more level filled in; new values 7, 6, 5, 5, 6, 4)
12(Figure: the same tree with the level below the root filled in; new values 5, 3, 4)
13(Figure: the completed tree with root value 5; labels mark the "Actual move made by Max" and the "Possible later moves" beneath it)
- Max makes its first move down the left-hand side of the tree, expecting the countermove that the search now predicts for Min. After Min has moved, the tree must be regenerated and the minimax procedure re-applied to the new tree.
14(Figure: the completed tree, root value 5)
15Computer chess
- Let's say you start with a chess board set up for the start of a game. Each player has 16 pieces. Let's say that white starts. White has 20 possible moves:
- The white player can move any pawn forward one or two positions.
- The white player can move either knight in two different ways.
- The white player chooses one of those 20 moves and makes it. For the black player, the options are the same 20 possible moves. So black chooses a move.
- Now white can move again. This next move depends on the first move that white chose to make, but there are about 20 or so moves white can make given the current board position, and then black has 20 or so moves it can make, and so on.
16Chess complexity
There are 20 possible moves for white. There are
20 x 20 = 400 possible positions after black's reply,
depending on what white does. Then there are 400 x 20 =
8,000 after white's second move, then 8,000 x 20 =
160,000 after black's, and so on. If you were to
fully develop the entire tree for all possible
chess moves, the total number of board positions
is about 10^120 (a 1 followed by 120 zeros). There have
only been 10^26 nanoseconds since the Big Bang.
There are thought to be only 10^75 atoms in the
entire universe.
17How computer chess works
No computer is ever going to calculate the entire
tree. What a chess computer tries to do is
generate the board-position tree five or 10 or 20
moves into the future. Assuming that there are
about 20 possible moves for any board position, a
five-level tree contains 3,200,000 board
positions. A 10-level tree contains about
10,000,000,000,000 (10 trillion) positions. The
depth of the tree that a computer can calculate
is controlled by the speed of the computer
playing the game.
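The tree sizes quoted on this slide follow from a constant branching factor. A minimal Python sketch, assuming exactly 20 legal moves in every position (the slide's own simplification); the function name `positions_at_depth` is chosen here for illustration only:

```python
def positions_at_depth(depth: int, branching: int = 20) -> int:
    """Number of positions on the bottom level of a uniform game tree."""
    return branching ** depth


# Assumption: every position has the same number of legal moves.
for depth in (5, 10):
    print(f"{depth}-level tree: {positions_at_depth(depth):,} positions")
```

The 10-level figure comes out slightly above the rounded "10 trillion" on the slide, because 20^10 is 10.24 trillion.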
18Is computer chess intelligent?
The minimax algorithm alternates between the
maximums and minimums as it moves up the tree.
This process is completely mechanical and
involves no insight. It is simply a brute force
calculation that applies an evaluation function
to all possible board positions in a tree of a
certain depth. What is interesting is that this
sort of technique works pretty well. On a
fast-enough computer, the algorithm can look far
enough ahead to play a very good game. If you add
in learning techniques that modify the evaluation
function based on past games, the machine can
even improve over time. The key thing to keep in
mind, however, is that this is nothing like human
thought. But does it have to be?
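The mechanical procedure described here can be sketched in a few lines of Python. This is an illustrative depth-limited minimax, not the code of any particular chess program; the toy tree and the helper names `toy_children` and `toy_evaluate` are assumptions made for the example.

```python
def minimax(state, depth, maximizing, children, evaluate):
    """Back up evaluation-function values from a fixed search depth."""
    successors = list(children(state))
    if depth == 0 or not successors:
        return evaluate(state)
    values = [
        minimax(s, depth - 1, not maximizing, children, evaluate)
        for s in successors
    ]
    # Max picks the largest backed-up value, Min the smallest.
    return max(values) if maximizing else min(values)


# Hypothetical toy tree: nested lists are internal nodes, integers are leaves.
toy_tree = [[3, 5], [2, 9], [4, 1]]


def toy_children(node):
    return node if isinstance(node, list) else []


def toy_evaluate(node):
    return node  # only reached for integer leaves in this toy tree


root_value = minimax(toy_tree, 2, True, toy_children, toy_evaluate)
print(root_value)
```

In a chess program `children` would generate legal moves and `evaluate` would score a board position; everything else stays the same.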
19Leaf nodes examined
Search Depth
20Hmmm. What to do?
21Ra5!
22(No Transcript)
23(No Transcript)
24What is strategy?
- the art of devising or employing plans or
stratagems toward a goal where favorable.
25Chess is Multi-Level
26Science
- Would you say a traditional chess program's chess strength is based mainly on a firm axiomatic theory about the strengths and balances of competing differential semiotic trajectory units in a multiagent hypergeometric topological manifold-like zero-sum dynamic environment
- or simply because it is an accurate short-term calculator??!
27Morph Philosophy
- Reinforcement Learning
- Mathematical View of Chess
- Don't Cheat!!!
28Cheating!
- #define DOUBLED_PAWN_PENALTY 10
- #define ISOLATED_PAWN_PENALTY 20
- #define BACKWARDS_PAWN_PENALTY 8
- #define PASSED_PAWN_BONUS 20
- #define ROOK_SEMI_OPEN_FILE_BONUS 10
- #define ROOK_OPEN_FILE_BONUS 15
- #define ROOK_ON_SEVENTH_BONUS 20
- /* the values of the pieces */
- int piece_value[6] = {100, 300, 350, 500, 900, 0};
29White-Queen-square table
30Goal ELO 3000
- 2800 World Champion
- 2600 GM
- 2400 IM
- 2200 FM (> 99 percent)
- Expert
- 1600 Median (> 50 percent)
- 1543 Morph
- 1000 Novice (beginning tournament player)
- 555 Random
31Milestones
- Morph I, 1995. Graph Matching. Draws GnuChess every 10 games. 800 rating.
- Morph II, 1998. Plays any game, compiles neural net based on First Order Logic rules. Optimal Tic-Tac-Toe, NIM.
- Morph IV, 2003. Neural Neighborhoods, Improved Eval. Reaches 1036.
- Summer 2004: 1450 (improved implementation)
- Today: 1558 (neural net, pattern variety, genetic alg.)
- Next: genetically selected patterns and nets.
32Total Information = Diversity + Symmetry
- Diversity corresponds to Comp Sci Complexity (resources required).
- Diversity can often only be resolved with Combinatorial Search ???
33Exploit Symmetry !! ?
- Invariant with respect to transformation.
- Shared information between objects or systems or their representations.
- AB + AC = A(B + C). ?
34Symmetry Synonyms
- similarity
- commonality
- structure
- mutual information
- relationship
- pattern
- redundancy
35Novices
Experts
36Levels of Learning
- Levels
- 0. None: Brute Force It
- 1. Empirically: Statistical Understanding - Inductive
- 1a. Supervise
- 1b. Reward/Punish: TD Learning
- 1c. Imitate
- 2. Analytically: Mathematical - Deductive
- But Must Be Efficient!
37Chess and Stock Trading
Patterns and their interpretation
- Technical Analysis: Forecasting market activity based on the study of price charts, trends and other technical data.
- Fundamental Analysis: Forecasting based on analysis of economic and geopolitical data.
- A Trading plan is formulated and executed only after a thorough and systematic Technical and Fundamental analysis.
- All trading plans incorporate Risk Management procedures.
38Trade Example I GBP/USD
39THE MORPH ARCHITECTURE
CHESS GAMES
NEW POSITION
INTUITIVELY BEST MOVE
COMPILED MEMORY
Pattern Processor Matcher
4-6 PLY Search
WEIGHTS
ADB
40MORPH
Positions
Chess
Patterns
Weights
4-Ply Search
Reply
41Morph patterns
- graph patterns
- both nodes, edge labeled
- direct attack, indirect attack, discovered attack
- material patterns
- Piece/square tables
- (later graph patterns realized as neural networks)
42The game of Chess example
- White has over 30 possible moves
- If it is Black's turn, Black can capture the pawn at c3 with check (also a fork)
43(No Transcript)
44(No Transcript)
45Exploiting Graph Isomorphism
46Pattern weight formulation of search knowledge
- weights
- real number within the reinforcement value range, e.g. [0,1]
- expected value of reinforcement given that the current state satisfied the pattern
- pws: <p1, .7>
- the states that have p1 as a feature are more likely to lead to a win than a loss
- advantage
- low-level of granularity and uniformity
47Evaluation Scheme
64 Neighborhood Values are Computed and Then
Combined
48Morph - evaluation function
KEY: Use product rather than sum!!
WHY? Makes it risk averse.
49Morph - evaluation as product
x1    x2    x1*x2
.5    .5    0.25
.2    .8    0.16
.2    .5    0.10
.8    .5    0.40
.8    .8    0.64
.2    .2    0.04
.1    .8    0.08
.2    .9    0.18
1     .8    0.80
0     .2    0.00
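To see why a product is more cautious than a sum, the short Python sketch below compares the two on a few rows of the table. It is only an illustration: Morph combines 64 neighborhood values, while this example uses two, and the arithmetic mean is used here as an assumed stand-in for "sum".

```python
import math


def combine_product(values):
    """Product of the inputs: one low value pulls the whole result down."""
    return math.prod(values)


def combine_mean(values):
    """Assumed additive baseline for comparison: the arithmetic mean."""
    return sum(values) / len(values)


# Rows taken from the slide's table.
pairs = [(0.5, 0.5), (0.2, 0.8), (0.1, 0.8), (0.0, 0.2)]
for pair in pairs:
    print(pair, round(combine_product(pair), 2), round(combine_mean(pair), 2))
```

The pairs (.5, .5) and (.2, .8) have the same mean, but the product prefers the balanced pair (0.25 against 0.16), and any input of 0 forces the product to 0.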
50Learning Ingredients
- Graph Patterns
- Neural Networks
- Temporal Difference Learning
- Simulated Annealing
- Representation Change
- Genetic Algorithms
51Modifying the weight of patterns
1. Each state in the sequence of states that
preceded the reinforcement value is
assigned a new value using temporal
difference learning.
state  W      B     W     B    W    (assume W won)
old    0.6    0.35  0.70  0.3  0.8
new    0.675  0.25  0.85  0.0  1.0
2. The new value assigned to each state is
propagated down to the patterns that matched
the state.
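The slide does not spell out the update rule, but one reading reproduces its numbers exactly. The Python sketch below assumes that each side's final position receives the game outcome (1.0 for the winner, 0.0 for the loser), and that every earlier position moves halfway (learning rate 0.5) toward one minus the updated value of the position that follows it. The function name `td_backup` and the rate are assumptions, not Morph's documented implementation.

```python
def td_backup(old_values, final_outcome, alpha=0.5):
    """Back up values along one game, last position first.

    old_values: position values, each from the viewpoint of the side that
    reached it (sides alternate). final_outcome: reinforcement for the side
    that reached the last position (1.0 = win, 0.0 = loss).
    """
    n = len(old_values)
    new_values = list(old_values)
    # Each side's final position receives the actual result.
    new_values[-1] = final_outcome
    if n >= 2:
        new_values[-2] = 1.0 - final_outcome
    # Earlier positions move toward the complement of their successor's
    # already-updated value (the successor belongs to the opponent).
    for i in range(n - 3, -1, -1):
        target = 1.0 - new_values[i + 1]
        new_values[i] = old_values[i] + alpha * (target - old_values[i])
    return new_values


old = [0.6, 0.35, 0.70, 0.3, 0.8]  # W, B, W, B, W from the slide
new = td_backup(old, final_outcome=1.0)
print([round(v, 3) for v in new])
```

Under these assumptions the printed values match the "new" row on the slide.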
52Difference between Morph and traditional TD learning
1. the feature set changes throughout the learning process
2. use a simulated annealing type scheme to give the weight of each pattern its own learning rate
3. the more a pattern gets updated, the slower its learning rate becomes
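Points 2 and 3 can be illustrated with a pattern weight that carries its own update counter. The Python sketch below assumes a cooling schedule of 1 / (updates + 1); the actual schedule used by Morph is not given on the slide, and the class name `PatternWeight` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternWeight:
    weight: float = 0.5  # middle of the [0, 1] reinforcement range
    updates: int = 0

    def learning_rate(self) -> float:
        # Assumed schedule: the rate shrinks with every update received.
        return 1.0 / (self.updates + 1)

    def update(self, target: float) -> None:
        rate = self.learning_rate()
        self.weight += rate * (target - self.weight)
        self.updates += 1


pw = PatternWeight()
rates = []
for target in (1.0, 0.0, 1.0, 1.0):
    rates.append(pw.learning_rate())
    pw.update(target)
print(pw.weight, rates)
```

With this particular schedule the weight equals the running average of the targets seen so far, which fits the description of a weight as the expected reinforcement given that the pattern matched.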
53Simulated annealing
- The DB comprises a complex system of many particles
- optimal configuration: each weight has its proper value
- The average error serves as the objective evaluation function
- average error: the difference between APS's prediction of a state's value and that provided by temporal-difference learning
54Neural Networks
Morph's neighborhood network is a NON-LINEAR
PERCEPTRON!
55Biological inspirations
- Some numbers
- The human brain contains about 10 billion nerve cells (neurons)
- Each neuron is connected to the others through 10000 synapses
- Properties of the brain
- It can learn, reorganize itself from experience
- It adapts to the environment
- It is robust and fault tolerant
56Biological neuron
- A neuron has
- A branching input (dendrites)
- A branching output (the axon)
- The information circulates from the dendrites to the axon via the cell body
- Axon connects to dendrites via synapses
- Synapses vary in strength
- Synapses may be excitatory or inhibitory
57What is an artificial neuron ?
- Definition: non-linear, parameterized function with restricted output range
(Figure: a neuron with inputs x1, x2, x3, bias weight w0 and output y)
58Activation functions
Linear
Logistic
Hyperbolic tangent
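A minimal Python sketch of a single artificial neuron with the three activation functions named on this slide. The weights and inputs are made-up values for illustration; the bias plays the role of w0.

```python
import math


def linear(z):
    return z


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation=logistic):
    """y = f(w0 + sum_i w_i * x_i), where f is the activation function."""
    z = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(z)


x = [1.0, 0.0, -1.0]    # hypothetical inputs x1, x2, x3
w = [0.5, -0.3, 0.8]    # hypothetical weights
w0 = 0.1                # hypothetical bias weight
for name, f in (("linear", linear), ("logistic", logistic), ("tanh", math.tanh)):
    print(name, neuron(x, w, w0, f))
```

The logistic function restricts the output to (0, 1) and the hyperbolic tangent to (-1, 1); the linear activation leaves the weighted sum unbounded.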
59Feed Forward Neural Networks
- The information is propagated from the inputs to the outputs
- Computations of non linear functions from input variables by compositions of algebraic functions
- Time has no role (NO cycle between outputs and inputs)
(Figure: inputs x1, x2, ..., xn feeding a 1st hidden layer, a 2nd hidden layer and an output layer)
60Recurrent Neural Networks
- Can have arbitrary topologies
- Can model systems with internal states (dynamic ones)
- Delays are associated to a specific weight
- Training is more difficult
- Performance may be problematic
- Stable Outputs may be more difficult to evaluate
- Unexpected behavior (oscillation, chaos, ...)
(Figure: recurrent network with inputs x1 and x2; connections labeled 0 or 1)
61Properties of Neural Networks
- Supervised networks are universal approximators (Non recurrent networks)
- Theorem: Any bounded, sufficiently regular function can be approximated by a neural network with a finite number of hidden neurons to an arbitrary precision
62Multi-Layer Perceptron
- One or more hidden layers
- Sigmoid activation functions
(Figure: input data feeding a 1st hidden layer, a 2nd hidden layer and an output layer)
63Different non linearly separable problems
Structure      Types of Decision Regions
Single-Layer   Half Plane Bounded By Hyperplane
Two-Layer      Convex Open Or Closed Regions
Three-Layer    Arbitrary (Complexity Limited by No. of Nodes)
(Illustration columns: Exclusive-OR Problem, Classes with Meshed Regions, Most General Region Shapes)
Source: Neural Networks: An Introduction, Dr. Andrew Hunter
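The Exclusive-OR problem is the standard case that a single layer cannot solve, because no single half plane separates the two classes. The Python sketch below solves it with one hidden layer and hand-picked weights. It uses hard threshold units rather than sigmoids so the arithmetic is exact; that choice and the specific weights are assumptions made for the example.

```python
def step(z):
    """Hard threshold unit: fires when the weighted sum is positive."""
    return 1 if z > 0 else 0


def xor_net(x1, x2):
    # Hidden layer: each unit carves out one half plane.
    h_or = step(x1 + x2 - 0.5)    # on when at least one input is on
    h_and = step(x1 + x2 - 1.5)   # on only when both inputs are on
    # Output layer: combines the two half planes into the region
    # "OR but not AND", which is exactly XOR.
    return step(h_or - h_and - 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```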
64Perceptron Update
- We employ
- Gradient descent: w_0 + w_1 x_1 + w_2 x_2 + ... + w_{17} x_{17}
- Exponential gradient: w_0 + w_1 e^{-x_1} + ... + w_{17} e^{-x_{17}}
- Non-linear terms: w_0 + w_{12} x_1 x_2
- Triples: w_0 + w_{123} x_1 x_2 x_3
- Soon some higher order terms
- w_0 + w_{1457} x_1 x_4 x_5 x_7 ...
- genetically selected
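A rough Python sketch of a perceptron extended with product terms and trained by gradient descent on squared error (the delta rule). It uses two inputs instead of Morph's 17, and the learning rate, epoch count and function names are assumptions. The XOR-style targets are chosen because no purely linear form w_0 + w_1 x_1 + w_2 x_2 can fit them, while adding the single term w_{12} x_1 x_2 can.

```python
def expand(x, pairs):
    """Features: constant 1, the raw inputs, then x_i * x_j per chosen pair."""
    return [1.0] + list(x) + [x[i] * x[j] for i, j in pairs]


def predict(weights, features):
    return sum(w * f for w, f in zip(weights, features))


def train(samples, pairs, rate=0.1, epochs=2000):
    """Delta rule: move each weight along error * feature."""
    n_features = 1 + len(samples[0][0]) + len(pairs)
    weights = [0.0] * n_features
    for _ in range(epochs):
        for x, target in samples:
            features = expand(x, pairs)
            error = target - predict(weights, features)
            weights = [w + rate * error * f for w, f in zip(weights, features)]
    return weights


samples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
           ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
with_pair = train(samples, pairs=[(0, 1)])   # includes the x_1 * x_2 term
linear_only = train(samples, pairs=[])       # plain linear perceptron
for x, target in samples:
    print(x, target,
          round(predict(with_pair, expand(x, [(0, 1)])), 3),
          round(predict(linear_only, expand(x, [])), 3))
```

Which product terms to include is left open in this sketch; the slide indicates that Morph selects them genetically.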
65Genetic Algorithms
- May Help Supervise Training and Development of the Neural Net structure!
BONUS
We believe use of diversity in weight updating is
underrated!!
66Charles Darwin (1809-1882)
- Botanist, Zoologist, Geologist, General Man of
Science
- Sailed on the Beagle for about 5 years.
- 1859: Wrote On the Origin of Species, a very popular scientific treatise
- Goal: Develop an A.I. learning method using principles of Darwin's amazing Beagle trip and subsequent theories of biological evolution
67Genetic Algorithms (GA)
- John Holland, father of Genetic Algorithms: Computer programs that "evolve" in ways that resemble natural selection can solve complex problems even their creators do not fully understand
- GA allows A.I. knowledge discovery not yet known to humans or machines
- Adaptable to complex domains where human knowledge is limited, or sufficient domain knowledge cannot be easily provided
- New hypotheses are generated by mutating and recombining current hypotheses
- At each step we have a population of hypotheses from which we select the most fit
- GA does parallel search over different parts of the hypothesis space
68Blondie24 A GA Rock Star Success ??
- David Fogel's Neural Net/GA Checker Playing Program
- Final rating 2045.85 (Master Level). Better than 95% of all checkers players
- Neural Network: 2 hidden layers with 40 and 10 nodes respectively, fully connected
- Initial: 15 randomly-weighted NNs
- Each of the 15 NNs produces 1 offspring (using mutation), total of 30 NNs
- Each player plays against 5 randomly-selected opponents using depth 4 alpha-beta
- Top 15 performers retained
- 250 generations (time taken about 1 month)
69Morph GA Fitness, Selection Variation
- Initial Experiment: 20 Nan (2-Tuple) Brains play Round Robin each Generation, for several hundred Generations
- Fitness: 1 point given for game won, -1 for game lost. Draw receives no points. Points totalled for each Nan-Brain at Generation end
- Selection: Ten highest scoring Brains are selected for survival, remaining Brains eliminated.
- Variation: 10-50% of wgts are randomly mutated using a Box-Muller algorithm for Normal Distribution and a Sigmoid Function
- Initial Results: 300 ELO Points gained from initial 555 (Random) ELO using Nan (2-Tuple) Neighborhood Knowledge Representation Schema
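The generation loop on this slide can be sketched in Python as follows. Several parts are assumptions made for illustration: the game itself is replaced by a toy comparison (`toy_play`), each survivor is assumed to produce one mutated child to refill the population, and the sigmoid is applied when a raw weight is read rather than during mutation. Only the round robin, the +1/-1/0 scoring, the top-ten selection, the 10-50% mutation fraction and the Box-Muller noise come from the slide.

```python
import math
import random


def box_muller(rng):
    """One standard-normal sample via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def strength(brain):
    # Hypothetical stand-in for playing skill: sum of squashed weights.
    return sum(sigmoid(gene) for gene in brain)


def toy_play(a, b):
    # Placeholder for a real game: +1 if a wins, -1 if b wins, 0 for a draw.
    diff = strength(a) - strength(b)
    return (diff > 0) - (diff < 0)


def mutate(brain, rng, fraction):
    """Add normal noise to roughly `fraction` of the weights."""
    return [g + box_muller(rng) if rng.random() < fraction else g for g in brain]


def next_generation(population, play, rng, survivors=10):
    scores = [0] * len(population)
    for i in range(len(population)):          # round robin over all pairs
        for j in range(i + 1, len(population)):
            result = play(population[i], population[j])
            scores[i] += result
            scores[j] -= result
    ranked = sorted(range(len(population)), key=lambda k: scores[k], reverse=True)
    kept = [population[k] for k in ranked[:survivors]]
    # Assumption: one mutated child per survivor refills the population.
    fraction = rng.uniform(0.10, 0.50)
    return kept + [mutate(parent, rng, fraction) for parent in kept]


rng = random.Random(0)
population = [[box_muller(rng) for _ in range(8)] for _ in range(20)]
best_start = max(strength(b) for b in population)
for _ in range(5):
    population = next_generation(population, toy_play, rng)
best_end = max(strength(b) for b in population)
print(best_start, best_end)
```

Because the ten best scorers survive unchanged, the best brain in this toy setting can never get worse from one generation to the next.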
70Future Experiments
- We are confident we can achieve higher ELO rating levels using GA, but it is important we do not cheat (provide specific chess knowledge)
- Variations upon the Knowledge Representation, including X-Tuple Neighborhood Networks
- Evolving meta-parameters using Genetic Algorithms, e.g. range of mutating weights, size of knowledge representation, number of brains competing
- Lamarckian Evolution: Learning during a generation (not just selection/mutation alone), i.e., TD updating
- Using GA with alternative Knowledge Representations, including various Neural Net configurations
71Grandmaster Database
Database: all (507,734 games)
Report: 1.d4 Nf6 2.Bg5 Ne4 3.h4 (173 games)
ECO: A45s Trompowsky Raptor Variation
Generated by Scid 3.0, 2001.11.15

1. STATISTICS AND HISTORY
1.1 Statistics
                   Games   1-0   Draws   0-1   Score (%)
All report games     173    61      44    68    47.9
Both rated 2600        0     0       0     0     0.0
Both rated 2500       17     8       5     4    61.7
Both rated 2400       33    16       9     8    62.1
Both rated 2300       67    26      21    20    54.4