The RBF-Gene Model
Virginie Lefort, Carole Knibbe, Guillaume Beslon, Joël Favrel
PRISMa Lab., INSA Lyon, 69621 Villeurbanne CEDEX, France
{vlefort, cknibbe, gbeslon}_at_prisma.insa-lyon.fr, joel.favrel_at_insa-lyon.fr
Abstract
We present the Radial Basis Function Gene model (RBF-Gene), a new approach to evolutionary computation. This model introduces a quasi-realistic notion of gene into artificial genomes. It thus enables artificial organisms to evolve both on a functional basis (i.e. to enhance their fitness) and on a structural basis (i.e. to permanently reorganize their genome). This approach allows us to introduce new reproduction operators and gives the chromosome the opportunity to evolve its own evolvability, thus allowing better convergence.

Acknowledgements
This project is supported by the French Department of Research and by the Rhône-Alpes Region. It lies within the framework of the pluridisciplinary research group Systems Biology and Cellular Modeling (UCBL/INSA Lyon). We thank Hédi Soula for his contribution to the technical aspects.
Introduction
Although Genetic Algorithms (GAs) are directly inspired by Darwinian principles, they use an over-simplistic model of the chromosome and of the genotype-to-phenotype mapping. In living organisms, the structure of the chromosome is free to evolve (see [1]). The main feature permitting this is the presence of an intermediate layer (the proteins) between genotype and phenotype: whatever the size and the locus of a gene, it is translated into a protein, and all the proteins are combined to "produce" the phenotype. We therefore propose a new model of GA, the RBF-Gene: a bio-inspired algorithm introducing such an intermediate level between the genotype and the phenotype, the kernel level.
RBF-Gene overview
In the RBF-Gene algorithm, the genetic sequence is analyzed to find genes, and each gene is translated into an elementary function: a kernel. The phenotype is then the linear combination of all the kernels. The kernel level enables us to introduce features that simple GAs cannot have:
- a variable-size genotype;
- a variable number of parameters (i.e. number of kernels);
- coding and non-coding sequences;
- the use of a genetic code, so the phenotype is always computable.
The algorithm is thus able to choose the complexity (number of kernels) and the precision of the solution (size of the coding sequences) during and by the evolution process itself.
Chromosome representation
Our chromosome is not a list of parameters: it is a sequence of bases, with a variable size. Coding and non-coding sequences are differentiated thanks to two special bases: Start (base A) and Stop (base B). Each coding sequence represents a gene, and each gene is translated into a kernel. Kernels are simply basic Rn to Rm functions (Gaussian functions here). The size of the chromosome and the number of genes are then free to evolve, as there is no global rule associating a gene with a parameter. The evolutionary process can therefore permanently reorganize the genome.
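The Start/Stop delimitation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the extra bases in the example string, and the handling of an unterminated gene are assumptions.

```python
# Sketch of gene extraction from a base-string chromosome. 'A' marks Start
# and 'B' marks Stop, as in the model; everything between a Start and the
# next Stop is one coding sequence (a gene), the rest is non-coding.

def extract_genes(chromosome):
    """Return the list of coding sequences found between Start/Stop bases."""
    genes = []
    i = 0
    while i < len(chromosome):
        if chromosome[i] == 'A':            # Start base: a gene begins
            stop = chromosome.find('B', i + 1)
            if stop == -1:                  # no Stop base: the gene never closes
                break
            genes.append(chromosome[i + 1:stop])
            i = stop + 1                    # resume reading after the Stop base
        else:
            i += 1                          # non-coding base, skip it
    return genes

# Example: two genes separated by non-coding sequences.
print(extract_genes("CCACDDCBEEACCDB"))     # ['CDDC', 'CCD']
```

Note that an individual whose chromosome contains no Start/Stop pair simply yields an empty gene list, which the mapping turns into the null function.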
[Figure: a regression task solved by the RBF-Gene algorithm vs. a simple GA, showing the samples of the learning and validation sets.]
Genotype to phenotype mapping
As the structure of the chromosome is not fixed, we need a special mapping in three steps. First, all the genes are extracted from the chromosome, using the Start and Stop bases. Then, we compute the parameters of each kernel (the mean vector µ and the standard deviation s of the Gaussian, and the weight of the kernel in the linear combination) thanks to a genetic code: each base is associated with a (parameter, value) couple, and all the bases concerning the same parameter are read sequentially as a variable-length Gray code. The longer a gene, the longer the coding sequences of its parameters, and the more precise the associated kernel. Finally, we compute the phenotype as the linear combination of all the kernels: if there is no kernel, the individual is the null function; otherwise it is the weighted sum of the kernels. The phenotype is computable whatever the genotype, so we can use a more bio-inspired reproduction loop.
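The third mapping step can be sketched for the one-dimensional case. The helper below is an assumed structure (a list of decoded (weight, µ, s) triples), not the paper's code; it only illustrates the weighted sum and the null-function case.

```python
# Minimal sketch of the phenotype computation: the weighted sum of Gaussian
# kernels, with the null function for an individual that decodes no kernel.
import math

def phenotype(kernels, x):
    """kernels: list of (weight, mu, sigma) triples for a 1-D problem."""
    if not kernels:                         # no gene decoded: null function
        return 0.0
    return sum(w * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
               for (w, mu, sigma) in kernels)

print(phenotype([], 0.3))                       # 0.0 (null function)
print(phenotype([(0.625, 0.25, 0.0625)], 0.25)) # 0.625 (value at the kernel's mean)
```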
[Figure: simple GA vs. RBF-Gene. Simple GA: choose f and the number of parameters n manually, and let the algorithm optimize the parameter values. RBF-Gene: let the algorithm itself optimize the number of kernels (n) and their parameters; the chromosome carries genes (G1, G2, G3, G4) that are translated through the genetic code.]
The reproduction loop
The reproduction loop is similar to the one used in simple GAs:
- Evaluation of the individuals: the fitness function is the mean squared error (for regression tasks).
- Selection of the parents: roulette-wheel selection.
- Reproduction: specific operators for mutation and/or crossover.
Since there is no global rule to analyze the sequence, the operators can modify the length of the chromosome or the locus of the genes; they can also create or destroy genes. We can thus use a large set of biologically-inspired operators:
- Local operators: switch, deletion and insertion, affecting one base.
- Large operators: translocation, deletion and duplication, affecting subsequences of the chromosome.
- External operators: one-point crossover, two-point crossover and transfer, using genetic material from another individual.
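The three local operators can be sketched on a variable-length base string. This is an illustrative sketch, not the authors' implementation: the function names, the base alphabet, and the uniform choice of loci are assumptions.

```python
# Hedged sketch of the local operators (switch, insertion, deletion of one
# base). Unlike fixed-length GA operators, insertion and deletion change
# the chromosome length, so they can shift the loci of downstream genes.
import random

ALPHABET = "ABCDEFGH"   # assumed base alphabet; 'A' = Start, 'B' = Stop

def switch(chrom, rng):
    """Replace one base by a random one; length is unchanged."""
    i = rng.randrange(len(chrom))
    return chrom[:i] + rng.choice(ALPHABET) + chrom[i + 1:]

def insertion(chrom, rng):
    """Insert one random base; the chromosome grows by one."""
    i = rng.randrange(len(chrom) + 1)
    return chrom[:i] + rng.choice(ALPHABET) + chrom[i:]

def deletion(chrom, rng):
    """Delete one base; the chromosome shrinks by one."""
    i = rng.randrange(len(chrom))
    return chrom[:i] + chrom[i + 1:]

rng = random.Random(0)
c = "CCACDDCB"
print(len(insertion(c, rng)), len(deletion(c, rng)))   # 9 7
```

An inserted or deleted base can fall inside a coding sequence, shortening or lengthening a parameter's Gray code, or create/destroy a Start or Stop base, which is how these operators create or destroy genes.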
Example of the genetic code at work, decoding the three parameters of a kernel:
- w: 111 (Gray) → 101 (binary) → 0.625
- µ: 0110 (Gray) → 0100 (binary) → 0.25
- s: 00011 (Gray) → 00010 (binary) → 0.0625
These values replace the parameters in the global formula, yielding kernel K1.
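The variable-length Gray decoding can be sketched as follows. The normalization by 2^k for a k-bit string is an assumption, chosen because it is consistent with the worked example above.

```python
# Sketch of the variable-length Gray decoding used by the genetic code:
# an MSB-first Gray string is converted to binary, then normalized by 2**k,
# so a longer string gives a finer-grained value in [0, 1).

def gray_to_value(bits):
    """Decode an MSB-first Gray-coded bit string into a value in [0, 1)."""
    n = 0
    for b in bits:
        # next binary bit = Gray bit XOR previous binary bit
        n = (n << 1) | ((n & 1) ^ int(b))
    return n / 2 ** len(bits)

print(gray_to_value("111"))    # 0.625 (weight w)
print(gray_to_value("0110"))   # 0.25  (mean µ)
```

Because the code is variable-length, a mutation that lengthens a parameter's coding sequence refines the kernel's precision rather than breaking the decoding.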
Results
In the figure, we used our algorithm on a simple problem in order to obtain visual results. Of course, the algorithm also handles more complex functions (Rn to Rm). For an Rn to Rm function, we use 2n bases for the mean vector, 2m bases for the weight vector, 2 bases for the standard deviation and 2 special bases (Start and Stop), i.e. 2(n+m+2) bases. We have tested our algorithm on the abalone benchmark [2] and compared our results with [3]. This problem is an R8 to R regression task. The RBF-Gene algorithm generates good results, but the important fact is its ability (actually used by the algorithm) to reorganize its genome while evolving functionally. In the abalone simulations, we can see two stages: first, the chromosome grows in length, adding new genes; then it stabilizes around 3000 bases while optimizing the existing genes.
Final results (RBF-Gene):
- Number of kernels: 10
- Learning fitness: 0.0206
- Validation fitness: 0.0497
Final results (simple GA):
- Number of parameters: 30 (sum of 10 Gaussian functions)
- Learning fitness: 0.0521
- Validation fitness: 0.1111
Conclusion and references
The RBF-Gene algorithm introduces a real independence between the genotype and the phenotype: all genes code for "proteins" (i.e. kernels), and the phenotype is always computable, whatever the number of genes. Moreover, the chromosome structure is dynamically modified during the evolution process, improving future evolution. Although it is still under study, our model gives promising results and proves its ability to add new genes during the evolutionary process. Future work will focus on biologically inspired operators: so far, we have used random operators, whereas biological large-scale operators (e.g. translocation, transfer, crossover) are often based on sequence similarity. We could use this feature to enhance reorganization efficiency.

References
[1] Pennisi, E.: How the genome readies itself for evolution. Science 281 (1998) 1131-1134
[2] UCI Machine Learning Website: Abalone data set (consulted in 2003). http://www.ics.uci.edu/mlearn/MLRepository.html
[3] Automatic Knowledge Miner (AKM) Server: Data mining analysis (request: abalone). Technical report, AKM (WEKA), University of Waikato, Hamilton, New Zealand (2003)
Interesting features of evolution in the RBF-Gene model

Genome structure: this figure shows the size of the chromosome (in red) and the number of kernels (in green). Both stabilize without any direct external influence. On the one hand, a large genome is an advantage because it favors the discovery of new genes; on the other hand, large non-coding sequences are dangerous because mutations in them often create deleterious genes. As a consequence, the evolutionary process globally favors small (yet, of course, adapted) genomes, thus indirectly preventing the massive recruitment of new genes.
Fitness: this figure shows the variation of the fitness during evolution. The green curve is the fitness on the validation set and the red one on the learning set. The two curves remain close, so there is no over-fitting here.