Title: A new crossover technique in Genetic Programming
1. A new crossover technique in Genetic Programming
Janet Clegg, Intelligent Systems Group, Department of Electronics
2. This presentation
Describe basic evolutionary optimisation
Overview of failed attempts at crossover methods
Describe the new crossover technique
Results from testing on two regression problems
3. Evolutionary optimisation
4. Start by choosing a set of random trial solutions (the population)
5. Each trial solution is evaluated (fitness / cost)
(figure: the population shown as a grid of trial solutions, each labelled with its fitness value)
6. Parent selection
Select mother: fitness 2.1, picked from the candidates 0.9, 2.1, 1.3, 0.7
Select father: fitness 1.7, picked from the candidates 1.4, 0.6, 1.7, 1.5
7. Perform crossover
(figure: the two parents produce child 1 and child 2)
8. Mutation
The probability of mutation is small (say 0.1).
9. This provides a new population of solutions: the next generation
Repeat, generation after generation:
1. select parents
2. perform crossover
3. mutate
until converged (a code sketch of this loop follows).
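A minimal sketch of this select / crossover / mutate loop in Python (all names, parameter values and the toy cost function are illustrative assumptions, not taken from the slides):

```python
import random

POP_SIZE = 20
GENES = 6              # numbers per trial solution
MUTATION_RATE = 0.1    # small, as suggested on slide 8

def fitness(solution):
    # toy cost: sum of squares, so the all-zeros solution is optimal
    return sum(g * g for g in solution)

def select_parent(population):
    # tournament selection: the best of four random candidates
    return min(random.sample(population, 4), key=fitness)

def crossover(mother, father):
    # one-point crossover: swap the tails at a random cut
    cut = random.randrange(1, GENES)
    return mother[:cut] + father[cut:], father[:cut] + mother[cut:]

def mutate(solution):
    # with small probability, nudge each gene by a random amount
    return [g + random.gauss(0, 0.1) if random.random() < MUTATION_RATE else g
            for g in solution]

population = [[random.uniform(-1, 1) for _ in range(GENES)]
              for _ in range(POP_SIZE)]
for generation in range(100):          # repeat generation after generation
    children = []
    while len(children) < POP_SIZE:
        c1, c2 = crossover(select_parent(population), select_parent(population))
        children += [mutate(c1), mutate(c2)]
    population = children
print(min(fitness(s) for s in population))
```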
10. Two types of evolutionary optimisation
Genetic Algorithms (GA): optimise some quantity by varying parameters which have numerical values
Genetic Programming (GP): optimises some quantity by varying parameters which are functions, parts of computer code, or circuit components
11. Representation
12. Representation
Traditional GAs use a binary representation, e.g. 1011100001111
A floating-point GA performs better than binary, e.g. 7.2674554
In a Genetic Program (GP), nodes represent functions whose inputs are the branches attached below the node
13. Some crossover methods
14. Crossover in a binary GA (crossover point after the first bit)
Mother: 1 0 0 0 0 0 1 0 = 130
Father: 0 1 1 1 1 0 1 0 = 122
Child 1: 1 1 1 1 1 0 1 0 = 250
Child 2: 0 0 0 0 0 0 1 0 = 2
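The example above can be reproduced with a small sketch (the helper name is mine; the cut after the first bit is inferred from the child values):

```python
def one_point(mother, father, cut):
    # swap the tails of two bitstrings at the cut point
    return mother[:cut] + father[cut:], father[:cut] + mother[cut:]

mum, dad = "10000010", "01111010"    # 130 and 122
c1, c2 = one_point(mum, dad, 1)      # cut after the first bit
print(int(c1, 2), int(c2, 2))        # 250 and 2, as above
```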
15. Crossover in a floating-point GA
The mother and father lie between the minimum and maximum parameter values.
Offspring are chosen as a random point between the mother and the father.
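In code, one common reading of this operator (a sketch; the parent values here are hypothetical):

```python
import random

def blend(mother, father):
    # offspring is a uniform random point between the two parent values
    return mother + random.random() * (father - mother)

print(blend(7.2674554, 9.13))   # lands somewhere between the two parents
```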
16. Traditional method of crossover in a GP
(figure: sub-trees of the mother and father are swapped to produce child 1 and child 2)
17. Motivation for this work
Tree crossover in a GP does not always perform well. Angeline, and Luke and Spencer, compared (1) the performance of tree crossover with (2) simple mutation of the branches; the difference in performance was statistically insignificant. Consequently, some people implement GPs with no crossover, using mutation only.
18. Motivation for this work
In a GP, many people do not use crossover, so mutation is the more important operator. In a GA, the crossover operator contributes a great deal to performance, and mutation is a secondary operator. AIM: find a crossover technique for GP which works as well as the crossover in a GA.
19. Cartesian Genetic Programming
20. Cartesian Genetic Programming (CGP)
Julian Miller introduced CGP. It replaces the tree representation with directed graphs, represented by a string of integers. The CGP representation will be explained within the first test problem. CGP uses mutation only, no crossover.
21. First simple test problem
A simple regression problem: finding the function which best fits data taken from a known target function (shown as an equation on the slide). The GP should find this exact function as the optimal fit.
22. The traditional GP method for this problem
Set of functions and terminals:
Functions: +, -, *
Terminals: 1, x
23. Initial population created by randomly choosing functions and terminals within the tree structures, e.g. (1 - x) * (x * x)
24. Crossover by randomly swapping sub-branches of the parent trees
(figure: mother and father trees produce child 1 and child 2)
25. CGP representation
Set of functions, each identified by an integer:
Function | Integer representation
+        | 0
-        | 1
*        | 2
Set of terminals, each identified by an integer:
Terminal | Integer representation
1        | 0
x        | 1
26. Creating the initial population
Genome so far: 2 0 1 | 0 1 1 (nodes 2 and 3)
The first integer of each node is a random choice of function: 0 (+), 1 (-) or 2 (*).
For the first node, the next two integers are a random choice of terminals: 0 (the constant 1) or 1 (x).
For the second node, the input integers are a random choice from the terminals 0 (1) and 1 (x) or from node 2.
27. Creating the initial population
Full genome: 2 0 1 | 0 1 1 | 1 3 1 | 0 2 3 | 2 4 1 | output 5
The terminals are numbered 0 and 1, and the nodes are numbered 2 to 6.
Each node's inputs are a random choice from the terminals and any earlier node (0, 1, 2, 3, 4, 5 for the last node).
The output is a random choice from the terminals and all nodes (0, 1, 2, 3, 4, 5, 6).
28. Decoding the genome 2 0 1 | 0 1 1 | 1 3 1 | 0 2 3 | 2 4 1 | output 5
Node 2 computes 1 * x, node 3 computes x + x, and node 5 (selected as the output) computes node 2 + node 3.
The decoded function is (1 * x) + (x + x) = 3x.
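A minimal Python sketch of this decoding (the genome layout follows slides 25 to 28; the function and variable names are mine):

```python
import operator

FUNCS = {0: operator.add, 1: operator.sub, 2: operator.mul}  # +, -, * as on slide 25

def evaluate(genome, x, n_nodes=5):
    # values[0] and values[1] are the terminals 1 and x;
    # each node triple is (function, input a, input b), then one output gene
    values = [1.0, x]
    for n in range(n_nodes):
        f, a, b = genome[3 * n: 3 * n + 3]
        values.append(FUNCS[f](values[a], values[b]))
    return values[genome[-1]]

genome = [2, 0, 1, 0, 1, 1, 1, 3, 1, 0, 2, 3, 2, 4, 1, 5]
print(evaluate(genome, 2.0))    # 6.0, i.e. 3x at x = 2
```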
29. Run the CGP with test data taken from the target function
Population size 30, with 28 offspring created at each generation.
Mutation only to begin with.
Fitness is the sum of squared differences between the data and the candidate function.
30. Result
Genome: 0 0 1 | 2 2 1 | 1 2 2 | 0 3 2 | 0 5 1 | output 5
(figure: the decoded graph; node 2 computes 1 + x, node 3 computes (1 + x) * x, and the output, node 5, computes node 3 + node 2 = (1 + x)^2)
31. Statistical analysis of GP
Any two runs of a GP (or GA) will not be exactly the same. To analyse the convergence of the GP we need many runs. All the following graphs depict the average convergence over 4000 runs.
32. Introduce crossover
33. Introducing some crossover
Pick a random crossover point:
Parent 1: 0 0 1 2 2 1 | 1 2 2 0 3 2 0 5 1 5
Parent 2: 2 0 1 0 1 1 | 1 3 1 0 2 3 2 4 1 5
Child 1: 0 0 1 2 2 1 1 3 1 0 2 3 2 4 1 5
Child 2: 2 0 1 0 1 1 1 2 2 0 3 2 0 5 1 5
34. GP with and without crossover (graph)
35. GA with and without crossover (graph)
36. Random crossover point, but it must fall between the nodes
Parent 1: 0 0 1 2 2 1 | 1 2 2 0 3 2 0 5 1 5
Parent 2: 2 0 1 0 1 1 | 1 3 1 0 2 3 2 4 1 5
Child 1: 0 0 1 2 2 1 1 3 1 0 2 3 2 4 1 5
Child 2: 2 0 1 0 1 1 1 2 2 0 3 2 0 5 1 5
(a code sketch of this node-boundary crossover follows)
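A sketch of this node-boundary crossover (the names are mine; the parents are the genome strings above):

```python
import random

NODE_LEN = 3    # each node is a (function, input a, input b) triple

def crossover_at_nodes(p1, p2, n_nodes=5):
    # the cut may fall only between node triples, never inside one
    cut = NODE_LEN * random.randrange(1, n_nodes)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

p1 = [0, 0, 1, 2, 2, 1, 1, 2, 2, 0, 3, 2, 0, 5, 1, 5]
p2 = [2, 0, 1, 0, 1, 1, 1, 3, 1, 0, 2, 3, 2, 4, 1, 5]
print(crossover_at_nodes(p1, p2))
```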
37. Genome: 0 0 1 | 1 1 2 | 2 3 2 | 2 4 1 | 0 3 5 | output 6
(figure: the decoded graph)
38. GP crossover at nodes (graph)
39. Pick a random node along the string and swap this single node
Parent 1: 0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
Parent 2: 2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
Child 1: 0 0 1 2 2 1 1 3 1 0 2 3 2 4 1 5
Child 2: 2 0 1 0 1 1 1 2 2 0 3 2 0 5 1 5
40. Crossover only one node (graph)
41. Each integer in a child randomly takes its value from either the mother or the father
Parent 1: 0 0 1 2 2 1 1 2 2 0 3 2 0 5 1 5
Parent 2: 2 0 1 0 1 1 1 3 1 0 2 3 2 4 1 5
Child 1: 2 0 1 0 2 1 1 2 1 0 2 2 2 5 1 5
Child 2: 0 0 1 2 1 1 1 3 2 0 3 3 0 4 1 5
(a code sketch follows)
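A sketch of this uniform (random swap) crossover, under the same genome layout as before:

```python
import random

def uniform_crossover(p1, p2):
    # each gene of child 1 comes from a randomly chosen parent;
    # child 2 takes the corresponding gene from the other parent
    c1, c2 = [], []
    for a, b in zip(p1, p2):
        if random.random() < 0.5:
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return c1, c2
```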
42. Random swap crossover (graph)
43. Comparison with random search
GP with no crossover performs better than any of the trial crossover methods here. How much better than a completely random search is it? The only means by which it can improve on a random search are parent selection and mutation.
44. Comparison with a random search (graph)
45. Comparison with a random search
The GP converges in 58 generations; a random search takes 73 generations.
46. GA performance compared with a completely random search
The GA was tested on a large problem, where a random search would have involved searching through 150,000,000 data points. The GA reached the solution after testing 27,000 data points (average convergence of 5000 GA runs). The probability of a random search reaching the solution within 27,000 trials is 0.00018.
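That figure can be checked directly: assuming a single solution among the 150,000,000 points, the probability of a random search hitting it within 27,000 independent trials is 1 - (1 - 1/150,000,000)^27,000 ≈ 27,000 / 150,000,000 = 0.00018.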
47. Why does GP tree crossover not always work well?
48.
f1( f2( f4(x1,x2), f5(x3,x4) ), f3( f6(x5,x6), f7(x7,x8) ) )
g1( g2( g4(y1,y2), g5(y3,y4) ), g3( g6(y5,y6), g7(y7,y8) ) )
Swapping a random sub-tree between the two parents gives, for example,
f1( f2( f4(x1,x2), f5(x3,x4) ), f3( f6(x5,x6), g7(y7,y8) ) )
49. (figure: parent trees g(x1) and f(x2), marked 'Good!')
50. (figure: crossover of g(x1) and f(x2) produces g(x2) and f(x1), marked 'Good!')
51. Suppose we are trying to find a function to fit a set of data looking like this (figure)
52. Suppose we have two parents which fit the data fairly well (figure)
53. Choose a crossover point (figure)
54. Choose a crossover point (figure)
55. (figure only)
56. (figure only)
57. Introducing the new technique
Based on Julian Miller's Cartesian Genetic Programming (CGP). The CGP representation is changed: integer values are replaced by floating-point values. Crossover is then performed as in a floating-point GA.
58. CGP representation: 0 0 1 | 2 2 1 | 1 2 2 | 0 3 2 | 0 5 1 | output 5
New representation: replace the integers with floating-point variables x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16, where the xi lie in a defined range, say [0, 1].
59. Interpretation of the new representation
For the variables xi which represent a choice of function, if the set of functions is +, -, *, the range [0, 1] is divided into three equal intervals: values in [0, 1/3) represent +, values in [1/3, 2/3) represent -, and values in [2/3, 1] represent *.
60. (figure: decoding the floating-point genes x1 ... x16; the [0, 1] range is divided into equal intervals, one per available function, terminal or node, and the interval containing the gene value gives the decoded choice, e.g. a gene value of 0.66 decoding to the terminal x)
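One plausible reading of this decoding in Python (the equal-interval scheme is inferred from the tick marks on the slide; the helper name is mine):

```python
FUNCS = ['+', '-', '*']

def decode_gene(value, n_choices):
    # split [0, 1] into n_choices equal intervals and
    # return the index of the interval containing value
    return min(int(value * n_choices), n_choices - 1)

print(FUNCS[decode_gene(0.4, 3)])   # 0.4 lies in [1/3, 2/3) -> '-'
```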
61. The crossover operator
Crossover is performed as in a floating-point GA. Two parents, p1 and p2, are chosen. Offspring oi are created by choosing a uniformly generated random number ri, 0 < ri < 1, and setting
oi = p1 + ri (p2 - p1), when p1 < p2
(a gene-by-gene sketch follows)
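A gene-by-gene sketch of this operator on two floating-point genomes (whether each gene gets its own random ri is my assumption; the slide states the scalar rule only):

```python
import random

def float_crossover(p1, p2):
    # each offspring gene is a uniform random point between
    # the two corresponding parent genes
    o1, o2 = [], []
    for a, b in zip(p1, p2):
        lo, hi = min(a, b), max(a, b)
        o1.append(lo + random.random() * (hi - lo))
        o2.append(lo + random.random() * (hi - lo))
    return o1, o2
```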
62. Crossover in the new representation
The mother and father lie between the minimum and maximum parameter values.
Offspring are chosen as a random point between the mother and the father.
63. Why is this crossover likely to work better than tree crossover?
64. Mathematical interpretation of tree crossover
f1( f2( f4(x1,x2), f5(x3,x4) ), f3( f6(x5,x6), f7(x7,x8) ) )
g1( g2( g4(y1,y2), g5(y3,y4) ), g3( g6(y5,y6), g7(y7,y8) ) )
Crossover swaps whole sub-expressions in a single discrete step, e.g. producing
f1( f2( f4(x1,x2), f5(x3,x4) ), f3( f6(x5,x6), g7(y7,y8) ) )
65. Mathematical interpretation of the new method
The genome is the vector of floating-point variables x1 x2 ... x16 (figure). The fitness can be thought of as a function of these 16 variables, say F(x1, ..., x16), and the optimisation becomes that of finding the values of these 16 variables which give the best fitness. The new crossover guides each variable towards its optimal value in a continuous manner.
66. The new technique has been tested on two regression problems studied by Koza.
67. Test data: 50 points in the interval [-1, 1]. Fitness is the sum of the absolute errors over the 50 points.
Population size 50, with 48 offspring each generation. Tournament selection is used to select parents. Various rates of crossover were tested: 0%, 25%, 50% and 75%. Number of nodes in the representation: 10.
68. Statistical analysis of the new method
The following results are based on the average convergence of the new method over 1000 runs. A run is considered converged when the absolute error is less than 0.01 at all of the 50 data points (the same criterion as Koza). Results are based on (1) average convergence graphs, (2) the average number of generations to converge, and (3) Koza's computational effort figure.
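For reference, the computational effort figure is Koza's standard measure (this definition comes from the GP literature, not from the slides): with population size $M$ and cumulative success probability $P(M, i)$ by generation $i$, the number of runs needed to find a solution by generation $i$ with probability $z$ (usually $z = 0.99$) is

$$R(M, i, z) = \left\lceil \frac{\ln(1 - z)}{\ln(1 - P(M, i))} \right\rceil,$$

and the computational effort is the minimum number of individuals that must be processed:

$$E = \min_i \; M \,(i + 1)\, R(M, i, z).$$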
69. Average convergence for the first problem (graph)
70. Convergence in later generations (graph)
71. Introduce variable crossover
At generation 1, crossover is performed 90% of the time. The rate of crossover then decreases linearly until, at generation 180, it is 0%. (A sketch of this schedule follows.)
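A sketch of that schedule (the exact interpolation endpoints are my assumption):

```python
def crossover_rate(generation, start=0.9, end_generation=180):
    # 90% at generation 1, falling linearly to 0% at generation 180
    fraction = (generation - 1) / (end_generation - 1)
    return max(0.0, start * (1 - fraction))
```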
72. Variable crossover (graph)
73. Average number of generations to converge and computational effort
Percentage crossover | Average generations to converge | Koza's computational effort
0%                   | 168 | 30,000
25%                  |  84 |  9,000
50%                  |  57 |  8,000
75%                  |  71 |  6,000
Variable crossover   |  47 | 10,000
74. Average convergence for the second problem (graph)
75. Variable crossover (graph)
76. Average number of generations to converge and computational effort
Percentage crossover | Average generations to converge | Koza's computational effort
0%                   | 516 | 44,000
25%                  | 735 | 24,000
50%                  | 691 | 14,000
75%                  | 655 | 11,000
Variable crossover   | 278 | 13,000
77. Number of generations to converge for both problems over 100 runs (graph)
78. Average convergence ignoring runs which take over 1000 generations to converge (graph)
79. Conclusions
The new technique has reduced the average number of generations to converge: from 168 down to 47 for the first problem tested, and from 516 down to 278 for the second.
80. Conclusions
When crossover is 0%, this new method is equivalent to the traditional CGP (mutation only). The computational effort figures for 0% crossover here are similar to those reported for the traditional CGP, although a larger mutation rate and population size have been used here.
81. Future work
Investigate the effects of varying the GP parameters: population size, mutation rate, selection strategies. Test the new technique on other problems: larger problems and other types of problem.
82. Thank you for listening!
83. Average convergence for the first problem, using 50 nodes instead of 10 (graph)
84. Average number of generations to converge and computational effort
Percentage crossover | Average generations to converge | Koza's computational effort
0%                   |  78 | 18,000
25%                  |  85 | 13,000
50%                  |  71 | 11,000
75%                  | 104 | 13,000
Variable crossover   |  45 | 14,000
85. Average convergence for the second problem, using 50 nodes instead of 10 (graph)
86. Average number of generations to converge and computational effort
Percentage crossover | Average generations to converge | Koza's computational effort
0%                   | 131 | 18,000
25%                  | 193 | 17,000
50%                  | 224 | 12,000
75%                  | 152 | 19,000
Variable crossover   |  58 | 16,000