The RBF-Gene Model
Virginie Lefort, Carole Knibbe, Guillaume Beslon, Joël Favrel
PRISMa Lab., INSA Lyon, 69621 Villeurbanne CEDEX, France
{vlefort, cknibbe, gbeslon}_at_prisma.insa-lyon.fr, joel.favrel_at_insa-lyon.fr
Abstract
We present the Radial Basis Function Gene model (RBF-Gene), a new approach to evolutionary computation. This model introduces a quasi-realistic notion of gene into artificial genomes. It thus enables artificial organisms to evolve both on a functional basis (i.e. to enhance their fitness) and on a structural basis (i.e. to permanently reorganize their genome). This approach allows us to introduce new reproduction operators and gives the chromosome the opportunity to evolve its own evolvability, thus allowing better convergence.

Acknowledgements
This project is supported by the French Department of Research and by the Rhône-Alpes Region. It lies within the framework of the pluridisciplinary research group Systems Biology and Cellular Modeling (UCBL/INSA Lyon). We thank Hédi Soula for his contribution to the technical aspects.
Introduction
Although Genetic Algorithms (GAs) are directly inspired by Darwinian principles, they use an over-simplistic model of the chromosome and of the genotype-to-phenotype mapping. In living organisms, the structure of the chromosome is free to evolve (see [1]). The main feature permitting this is the presence of an intermediate layer (the proteins) between genotype and phenotype: whatever the size and the locus of a gene, it is translated into a protein, and all the proteins are combined to "produce" the phenotype. We therefore propose a new model of GA, the RBF-Gene: a bio-inspired algorithm introducing such an intermediate level between the genotype and the phenotype, the kernel level.
RBF-Gene overview
In the RBF-Gene algorithm, the genetic sequence is analyzed to find genes, and each gene is translated into an elementary function: a kernel. The phenotype is then the linear combination of all the kernels. The kernel level enables us to introduce features that simple GAs cannot have:
- a variable-size genotype;
- a variable number of parameters (i.e. number of kernels);
- coding and non-coding sequences;
- the use of a genetic code, so the phenotype is always computable.
The algorithm is thus able to choose the complexity (number of kernels) and the precision of the solution (size of the coding sequences) during and by the evolution process itself.
Chromosome representation
Our chromosome is not a list of parameters: it is a sequence of bases, with a variable size. Coding and non-coding sequences are differentiated thanks to two special bases: Start (base A) and Stop (base B). Each coding sequence represents a gene, and each gene is translated into a kernel. Kernels are simply basic Rn to Rm functions (Gaussian functions here). The size of the chromosome and the number of genes are then free to evolve, as there is no global rule associating a gene with a parameter. The evolutionary process can therefore permanently reorganize the genome.
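The Start/Stop delimitation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the extra bases in the example string, and the handling of an unterminated gene are assumptions.

```python
# Sketch of gene extraction from a base-string chromosome. 'A' marks Start
# and 'B' marks Stop, as in the model; everything between a Start and the
# next Stop is one coding sequence (a gene), the rest is non-coding.

def extract_genes(chromosome):
    """Return the list of coding sequences found between Start/Stop bases."""
    genes = []
    i = 0
    while i < len(chromosome):
        if chromosome[i] == 'A':            # Start base: a gene begins
            stop = chromosome.find('B', i + 1)
            if stop == -1:                  # no Stop base: the gene never closes
                break
            genes.append(chromosome[i + 1:stop])
            i = stop + 1                    # resume reading after the Stop base
        else:
            i += 1                          # non-coding base, skip it
    return genes

# Example: two genes separated by non-coding sequences.
print(extract_genes("CCACDDCBEEACCDB"))     # ['CDDC', 'CCD']
```

Note that an individual whose chromosome contains no Start/Stop pair simply yields an empty gene list, which the mapping turns into the null function.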
[Figure: a regression task solved by the RBF-Gene algorithm vs. a simple GA, showing the samples of the learning and validation sets.]
Genotype to phenotype mapping
As the structure of the chromosome is not fixed, we need a special mapping in three steps. First, all the genes are extracted from the chromosome, using the Start and Stop bases. Then, we compute the parameters of each kernel (the mean vector µ and the standard deviation s of the Gaussian, and the weight of the kernel in the linear combination) thanks to a genetic code: each base is associated with a (parameter, value) couple, and all the bases concerning the same parameter are read sequentially as a variable-length Gray code. The longer a gene, the longer the coding sequences of its parameters, and the more precise the associated kernel. Finally, we compute the phenotype as the linear combination of all the kernels: if there is no kernel, the individual is the null function; otherwise it is the weighted sum of the kernels. The phenotype is computable whatever the genotype, so we can use a more bio-inspired reproduction loop.
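The third mapping step can be sketched for the one-dimensional case. The helper below is an assumed structure (a list of decoded (weight, µ, s) triples), not the paper's code; it only illustrates the weighted sum and the null-function case.

```python
# Minimal sketch of the phenotype computation: the weighted sum of Gaussian
# kernels, with the null function for an individual that decodes no kernel.
import math

def phenotype(kernels, x):
    """kernels: list of (weight, mu, sigma) triples for a 1-D problem."""
    if not kernels:                         # no gene decoded: null function
        return 0.0
    return sum(w * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
               for (w, mu, sigma) in kernels)

print(phenotype([], 0.3))                       # 0.0 (null function)
print(phenotype([(0.625, 0.25, 0.0625)], 0.25)) # 0.625 (value at the kernel's mean)
```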
[Figure: simple GA vs. RBF-Gene. Simple GA: choose f and the number of parameters n manually, and let the algorithm optimize the parameter values. RBF-Gene: let the algorithm itself optimize the number of kernels (n) and their parameters; the chromosome carries genes (G1, G2, G3, G4) that are translated through the genetic code.]
The reproduction loop
The reproduction loop is similar to the one used in simple GAs:
- Evaluation of the individuals: the fitness function is the mean squared error (for regression tasks).
- Selection of the parents: roulette-wheel selection.
- Reproduction: specific operators for mutation and/or crossover.
Since there is no global rule to analyze the sequence, the operators can modify the length of the chromosome or the locus of the genes; they can also create or destroy genes. We can thus use a large set of biologically-inspired operators:
- Local operators: switch, deletion and insertion, affecting one base.
- Large operators: translocation, deletion and duplication, affecting subsequences of the chromosome.
- External operators: one-point crossover, two-point crossover and transfer, using genetic material from another individual.
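The three local operators can be sketched on a variable-length base string. This is an illustrative sketch, not the authors' implementation: the function names, the base alphabet, and the uniform choice of loci are assumptions.

```python
# Hedged sketch of the local operators (switch, insertion, deletion of one
# base). Unlike fixed-length GA operators, insertion and deletion change
# the chromosome length, so they can shift the loci of downstream genes.
import random

ALPHABET = "ABCDEFGH"   # assumed base alphabet; 'A' = Start, 'B' = Stop

def switch(chrom, rng):
    """Replace one base by a random one; length is unchanged."""
    i = rng.randrange(len(chrom))
    return chrom[:i] + rng.choice(ALPHABET) + chrom[i + 1:]

def insertion(chrom, rng):
    """Insert one random base; the chromosome grows by one."""
    i = rng.randrange(len(chrom) + 1)
    return chrom[:i] + rng.choice(ALPHABET) + chrom[i:]

def deletion(chrom, rng):
    """Delete one base; the chromosome shrinks by one."""
    i = rng.randrange(len(chrom))
    return chrom[:i] + chrom[i + 1:]

rng = random.Random(0)
c = "CCACDDCB"
print(len(insertion(c, rng)), len(deletion(c, rng)))   # 9 7
```

An inserted or deleted base can fall inside a coding sequence, shortening or lengthening a parameter's Gray code, or create/destroy a Start or Stop base, which is how these operators create or destroy genes.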
Example of the genetic code at work, decoding the three parameters of a kernel:
- w: 111 (Gray) → 101 (binary) → 0.625
- µ: 0110 (Gray) → 0100 (binary) → 0.25
- s: 00011 (Gray) → 00010 (binary) → 0.0625
These values replace the parameters in the global formula, yielding kernel K1.
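The variable-length Gray decoding can be sketched as follows. The normalization by 2^k for a k-bit string is an assumption, chosen because it is consistent with the worked example above.

```python
# Sketch of the variable-length Gray decoding used by the genetic code:
# an MSB-first Gray string is converted to binary, then normalized by 2**k,
# so a longer string gives a finer-grained value in [0, 1).

def gray_to_value(bits):
    """Decode an MSB-first Gray-coded bit string into a value in [0, 1)."""
    n = 0
    for b in bits:
        # next binary bit = Gray bit XOR previous binary bit
        n = (n << 1) | ((n & 1) ^ int(b))
    return n / 2 ** len(bits)

print(gray_to_value("111"))    # 0.625 (weight w)
print(gray_to_value("0110"))   # 0.25  (mean µ)
```

Because the code is variable-length, a mutation that lengthens a parameter's coding sequence refines the kernel's precision rather than breaking the decoding.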
Results
In the figure, we used our algorithm on a simple problem in order to obtain visual results. Of course, the algorithm also handles more complex functions (Rn to Rm). For an Rn to Rm function, we use 2n bases for the mean vector, 2m bases for the weight vector, 2 bases for the standard deviation and 2 special bases (Start and Stop), i.e. 2(n+m+2) bases. We have tested our algorithm on the abalone benchmark [2] and compared our results with [3]. This problem is an R8 to R regression task. The RBF-Gene algorithm generates good results, but the important fact is its ability (actually used by the algorithm) to reorganize its genome while evolving functionally. In the abalone simulations, we can see two stages: first, the chromosome grows in length, adding new genes; then it stabilizes around 3000 bases while optimizing the existing genes.
Final results (RBF-Gene):
- Number of kernels: 10
- Learning fitness: 0.0206
- Validation fitness: 0.0497
Final results (simple GA):
- Number of parameters: 30 (sum of 10 Gaussian functions)
- Learning fitness: 0.0521
- Validation fitness: 0.1111
Conclusion and references
The RBF-Gene algorithm introduces a real independence between the genotype and the phenotype: all genes code for "proteins" (i.e. kernels), and the phenotype is always computable, whatever the number of genes. Moreover, the chromosome structure is dynamically modified during the evolution process, improving future evolution. Although it is still under study, our model gives promising results and proves its ability to add new genes during the evolutionary process. Future work will focus on biologically inspired operators: so far, we have used random operators, whereas biological large-scale operators (e.g. translocation, transfer, crossover) are often based on sequence similarity. We could use this feature to enhance reorganization efficiency.

References
[1] Pennisi, E.: How the genome readies itself for evolution. Science 281 (1998) 1131-1134
[2] UCI Machine Learning Website: Abalone data set (consulted in 2003). http://www.ics.uci.edu/mlearn/MLRepository.html
[3] Automatic Knowledge Miner (AKM) Server: Data mining analysis (request: abalone). Technical report, AKM (WEKA), University of Waikato, Hamilton, New Zealand (2003)
Interesting features of evolution in the RBF-Gene model

Genome structure: this figure shows the size of the chromosome (in red) and the number of kernels (in green). Both stabilize without any direct external influence. On the one hand, a large genome is an advantage because it favors the discovery of new genes; on the other hand, large non-coding sequences are dangerous because mutations in them often create deleterious genes. As a consequence, the evolutionary process globally favors small (yet, of course, adapted) genomes, thus indirectly preventing the massive recruitment of new genes.
Fitness: this figure shows the variation of the fitness during evolution. The green curve is the fitness on the validation set and the red one on the learning set. The two curves remain close, so there is no over-fitting here.