Title: Genetic Programming
1Genetic Programming
2GP quick overview
- Developed: USA in the 1990s
- Early names: J. Koza
- Typically applied to:
  - machine learning tasks (prediction,
    classification)
- Attributed features:
  - competes with neural nets and the like
  - needs huge populations (thousands)
  - slow
- Special:
  - non-linear chromosomes: trees, graphs
  - mutation possible but not necessary (disputed!)
3GP technical summary
4Preparatory Steps of Genetic Programming
- The preparatory steps (shown at the top of the
  figure) are the human-supplied input to the
  genetic programming system. The computer program
  (shown at the bottom) is the output of the
  genetic programming system.
- The first two preparatory steps specify the
  ingredients that are available to create the
  computer programs. A run of genetic programming
  is a competitive search among a diverse
  population of programs composed of the available
  functions and terminals.
5Preparatory Steps of Genetic Programming
- The set of terminals (e.g., the independent
  variables of the problem, zero-argument
  functions, and constants) for each branch (or
  leaf) of the to-be-evolved program tree
- The set of primitive functions for each node of
  the to-be-evolved program tree
- The fitness measure (for explicitly or implicitly
  measuring the fitness of individuals in the
  population)
- Certain parameters for controlling the run
- The termination criterion and method for
  designating the result of the run
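As a rough illustration, the five preparatory steps could be collected in one configuration object. This is only a sketch: the `GPSetup` class, its field names, the toy target x*x + 1 and the parameter values are assumptions made for the example, not part of any particular GP system.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Terminal = Union[str, float]


@dataclass
class GPSetup:
    """Hypothetical container for the five human-supplied inputs."""
    terminals: List[Terminal]                    # step 1: variables and constants
    functions: Dict[str, Tuple[Callable, int]]   # step 2: name -> (callable, arity)
    fitness: Callable[[Callable[[float], float]], float]  # step 3: lower is better here
    population_size: int                         # step 4: run-control parameter
    max_generations: int                         # step 5: termination criterion
    target_error: float = 0.0                    # step 5: stop early at this error


def squared_error(candidate: Callable[[float], float]) -> float:
    """Toy fitness measure: summed squared error against x*x + 1."""
    cases = [(float(x), float(x * x + 1)) for x in range(-3, 4)]
    return sum((candidate(x) - y) ** 2 for x, y in cases)


setup = GPSetup(
    terminals=["x", 1.0],
    functions={
        "+": (operator.add, 2),
        "-": (operator.sub, 2),
        "*": (operator.mul, 2),
    },
    fitness=squared_error,
    population_size=1000,
    max_generations=50,
)
```

A candidate that matches the target exactly, such as `lambda x: x * x + 1`, scores an error of zero under this toy measure.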
6Genetic Programming Flowchart
7GP flowchart
GA flowchart
8Introductory example Symbolic Regression
- Genetic programming circumvents the potential for
  syntactical errors (e.g. when changing C code)
  by considering only the relevant part of the
  syntax in a computer-friendly format: the parse
  tree.
- double solution(double x, double y)
  {
    return x y sqrt(0.3);
  }
- For all practical purposes the C-style function
  above can be described by the parse tree:
  . . x y .sqrt 0.3
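A parse tree of this shape can be sketched as nested tuples in prefix form. The operators of the C function are not legible in the slide, so the sketch assumes, purely for illustration, that it computes x * y + sqrt(0.3); the `FUNCTIONS` table and the `evaluate` helper are likewise hypothetical names.

```python
import math

# Assumed primitive set; only sqrt is certain from the slide.
FUNCTIONS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "sqrt": math.sqrt,
}


def evaluate(node, env):
    """Recursively evaluate a prefix-form tree against variable bindings."""
    if isinstance(node, tuple):          # internal node: (function, child, ...)
        name, *children = node
        return FUNCTIONS[name](*(evaluate(child, env) for child in children))
    if isinstance(node, str):            # leaf holding a variable name
        return env[node]
    return node                          # leaf holding a constant


# (+ (* x y) (sqrt 0.3)) under the assumed operators
tree = ("+", ("*", "x", "y"), ("sqrt", 0.3))
result = evaluate(tree, {"x": 2.0, "y": 3.0})
```

Because every tuple is built from a known function and the right number of children, any tree assembled this way is syntactically valid by construction, which is the point made above.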
9Introductory example Symbolic Regression
- Symbolic regression tries to find a function
  that fits some input data. Think of it as linear or
  logistic regression in which the structure of the
  target function is unknown.
- In symbolic regression we typically build
  candidate functions from basic building blocks
  such as addition, subtraction, multiplication and
  division.
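The fit of a candidate expression to the data can be scored as sketched below. The sum of squared errors, the protected division (returning 1.0 for a near-zero divisor, a common GP convention) and the toy data set are assumptions for the example rather than details taken from the slides.

```python
def protected_div(a, b):
    """Division that returns 1.0 when the divisor is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0


FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": protected_div,
}


def evaluate(tree, x):
    """Evaluate a binary prefix-form tree for one value of the variable x."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else tree


def sum_squared_error(tree, data):
    """Lower is better; 0.0 means the tree reproduces every data point."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data)


# Toy data sampled from x*x + x (an arbitrary target for illustration).
data = [(x, x * x + x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
exact = ("+", ("*", "x", "x"), "x")   # x*x + x
rough = ("*", "x", 2.0)               # 2x
```

Here `exact` has an error of zero while `rough` does not, so selection would favour `exact`.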
10Introductory example Symbolic Regression
Symbolic Regression Applet
11Tree based representation
- Trees are a universal form, e.g. consider
  - Arithmetic formula
  - Logical formula
  - Program
(x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
i = 1; while (i < 20) { i = i + 1 }
12Tree based representation
Original function
Tree representation
LISP Code (parse-tree):
(+ (· 2 π) (- (+ x 3) (/ y (+ 5 1))))
13Tree based representation
Tree representation
Original function
i = 1; while (i < 20) { i = i + 1 }
LISP Code (parse-tree)
(; (= i 1) (while (< i 20) (= i (+ i 1))))
14Tree based representation
- In GA, ES, EP chromosomes are linear structures
  (bit strings, integer strings, real-valued
  vectors, permutations)
- Tree-shaped chromosomes are non-linear structures
- In GA, ES, EP the size of the chromosomes is
  fixed
- Trees in GP may vary in depth and width
15Tree based representation
- Symbolic expressions can be defined by
  - Terminal set T
  - Function set F (with the arities of function
    symbols)
- Adopting the following general recursive
  definition:
  - Every t ∈ T is a correct expression
  - f(e1, …, en) is a correct expression if f ∈ F,
    arity(f) = n and e1, …, en are correct expressions
  - There are no other forms of correct expressions
- In general, expressions in GP are not typed
  (closure property: any f ∈ F can take any g ∈ F
  as argument)
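The recursive definition translates almost directly into a checker. In this sketch the contents of `ARITY` (the function set F with arities) and `TERMINALS` (the terminal set T) are illustrative assumptions, and expressions are written as nested tuples in prefix form.

```python
# Illustrative function set F with arities, and terminal set T.
ARITY = {"+": 2, "*": 2, "sqrt": 1}
TERMINALS = {"x", "y", 0.3}


def is_correct(expr):
    """Return True if expr is a correct expression over F and T."""
    if isinstance(expr, tuple):
        if not expr:
            return False
        f, *args = expr
        return (
            f in ARITY                        # f is in F
            and len(args) == ARITY[f]         # arity(f) = n
            and all(is_correct(e) for e in args)  # e1..en are correct
        )
    return expr in TERMINALS                  # every t in T is correct
```

For example, `is_correct(("+", ("*", "x", "y"), ("sqrt", 0.3)))` holds, while `("sqrt", "x", "y")` is rejected because sqrt has arity 1.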
16Offspring creation scheme
- Compare
  - GA scheme using crossover AND mutation
    sequentially (each applied probabilistically)
  - GP scheme using crossover OR mutation (chosen
    probabilistically)
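The difference between the two schemes can be sketched with the variation operators passed in as plain callables. The function names and signatures here are hypothetical; `crossover` and `mutate` stand for whatever operators the representation provides.

```python
import random


def ga_offspring(p1, p2, crossover, mutate, pc, rng):
    """GA style: crossover (with probability pc) AND THEN mutation."""
    child = crossover(p1, p2) if rng.random() < pc else p1
    # mutate is expected to apply its own per-gene probability internally
    return mutate(child)


def gp_offspring(p1, p2, crossover, mutate, pm, rng):
    """GP style: exactly one operator, mutation with probability pm."""
    if rng.random() < pm:
        return mutate(p1)
    return crossover(p1, p2)
```

With `pm = 0` the GP scheme never mutates and every child comes from crossover alone.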
17Mutation
- Most common mutation: replace a randomly chosen
  sub-tree with a randomly generated tree
18Mutation
- Mutation has two parameters
  - Probability pm to choose mutation vs.
    recombination
  - Probability to choose an internal point as the
    root of the sub-tree to be replaced
- Remarkably, pm is advised to be 0 (Koza '92) or
  very small, like 0.05 (Banzhaf et al. '98)
- The size of the child can exceed the size of the
  parent
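Sub-tree mutation on tuple-based trees might look like the sketch below. The primitive sets, the 0.3 chance of stopping early in `random_tree` and all helper names are assumptions for illustration; `p_internal` plays the role of the probability of choosing an internal point.

```python
import random

FUNCTIONS = {"+": 2, "*": 2}      # name -> arity (illustrative)
TERMINALS = ["x", "y", 1.0]


def random_tree(max_depth, rng):
    """Grow a random tree of depth at most max_depth."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(random_tree(max_depth - 1, rng)
                           for _ in range(FUNCTIONS[name]))


def points(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from points(tree[i], path + (i,))


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def pick_point(tree, p_internal, rng):
    """Choose an internal node with probability p_internal, else a leaf."""
    nodes = list(points(tree))
    internal = [p for p, sub in nodes if isinstance(sub, tuple)]
    leaves = [p for p, sub in nodes if not isinstance(sub, tuple)]
    if internal and rng.random() < p_internal:
        return rng.choice(internal)
    return rng.choice(leaves)


def subtree_mutation(parent, p_internal, max_depth, rng):
    """Replace a randomly chosen sub-tree with a randomly generated one."""
    point = pick_point(parent, p_internal, rng)
    return replace_at(parent, point, random_tree(max_depth, rng))
```

Since the new sub-tree may be deeper than the one it replaces, the child can end up larger than the parent, as noted above.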
19Recombination
- Most common recombination: exchange two randomly
  chosen sub-trees among the parents
- Recombination has two parameters
  - Probability pc to choose recombination vs.
    mutation
  - Probability to choose an internal point within
    each parent as crossover point
- The size of offspring can exceed that of the
  parents
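A minimal sketch of sub-tree crossover on tuple-based trees follows. The helper names are hypothetical, and `p_internal` is the probability of choosing an internal point as crossover point in each parent.

```python
import random


def points(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in range(1, len(tree)):
            yield from points(tree[i], path + (i,))


def get_at(tree, path):
    """Return the sub-tree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced."""
    if not path:
        return new_subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], new_subtree),) + tree[i + 1:]


def pick_point(tree, p_internal, rng):
    """Choose an internal node with probability p_internal, else a leaf."""
    nodes = list(points(tree))
    internal = [p for p, sub in nodes if isinstance(sub, tuple)]
    leaves = [p for p, sub in nodes if not isinstance(sub, tuple)]
    if internal and rng.random() < p_internal:
        return rng.choice(internal)
    return rng.choice(leaves)


def subtree_crossover(parent1, parent2, p_internal, rng):
    """Swap one randomly chosen sub-tree between the two parents."""
    a = pick_point(parent1, p_internal, rng)
    b = pick_point(parent2, p_internal, rng)
    child1 = replace_at(parent1, a, get_at(parent2, b))
    child2 = replace_at(parent2, b, get_at(parent1, a))
    return child1, child2


def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in points(tree))
```

The swap conserves the total number of nodes across the two children, but one child may grow well beyond either parent when a small sub-tree is traded for a large one.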
20[Figure: Parent 1, Parent 2, Child 1, Child 2]
21Selection
- Parent selection: typically fitness proportionate
- Over-selection in very large populations
  - rank population by fitness and divide it into two
    groups
  - group 1: best x% of population, group 2: other
    (100-x)%
  - 80% of selection operations choose from group 1,
    20% from group 2
  - for pop. size = 1000, 2000, 4000, 8000: x = 32%,
    16%, 8%, 4%
  - motivation: to increase efficiency; the
    percentages come from a rule of thumb
- Survivor selection
  - Typical: generational scheme (thus none)
  - Recently steady-state is becoming popular for its
    elitism
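Over-selection as described above could be sketched as follows. The function name and signature are hypothetical, higher fitness is assumed to be better, and the choice inside each group is uniform as a simplification (fitness-proportionate choice within the group would be closer to the usual setup).

```python
import random

# Rule-of-thumb share of the population placed in group 1, per population size.
TOP_SHARE = {1000: 0.32, 2000: 0.16, 4000: 0.08, 8000: 0.04}


def over_select(population, fitness, rng, p_top=0.8):
    """Pick one parent: 80% of picks come from group 1, 20% from group 2."""
    ranked = sorted(population, key=fitness, reverse=True)
    cut = max(1, round(len(ranked) * TOP_SHARE[len(ranked)]))
    group1, group2 = ranked[:cut], ranked[cut:]
    group = group1 if rng.random() < p_top else group2
    # Simplification: uniform choice inside the selected group.
    return rng.choice(group)
```

In a real run the ranking would be computed once per generation rather than on every call; it is repeated here only to keep the sketch short.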
22Initialization
- Maximum initial depth of trees Dmax is set
- Full method (each branch has depth = Dmax):
  - nodes at depth d < Dmax randomly chosen from
    function set F
  - nodes at depth d = Dmax randomly chosen from
    terminal set T
- Grow method (each branch has depth ≤ Dmax):
  - nodes at depth d < Dmax randomly chosen from F ∪ T
  - nodes at depth d = Dmax randomly chosen from T
- Common GP initialisation: ramped half-and-half,
  where grow and full method each deliver half of
  initial population
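The full and grow methods can be sketched on tuple-based trees as below. The primitive sets are illustrative, and ramping the depth limit from 2 up to Dmax across the population is an assumption about what "ramped" means here; the slides only state that each method delivers half of the population.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # name -> arity (illustrative)
TERMINALS = ["x", "y", 1.0]


def full(d_max, rng):
    """Every branch reaches depth d_max: functions above, terminals at d_max."""
    if d_max == 0:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(full(d_max - 1, rng) for _ in range(FUNCTIONS[name]))


def grow(d_max, rng):
    """Branches may stop early: nodes drawn from F and T together."""
    if d_max == 0:
        return rng.choice(TERMINALS)
    symbol = rng.choice(sorted(FUNCTIONS) + TERMINALS)
    if symbol in FUNCTIONS:
        return (symbol,) + tuple(grow(d_max - 1, rng)
                                 for _ in range(FUNCTIONS[symbol]))
    return symbol


def ramped_half_and_half(pop_size, d_max, rng):
    """Alternate full and grow; depth limit ramps over 2..d_max (needs d_max >= 2)."""
    population = []
    for i in range(pop_size):
        depth_limit = 2 + (i // 2) % (d_max - 1)
        method = full if i % 2 == 0 else grow
        population.append(method(depth_limit, rng))
    return population


def depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])
```

Mixing the two methods gives an initial population with both bushy and sparse trees of varying depth.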
23Bloat
- Bloat = "survival of the fattest", i.e., the tree
  sizes in the population are increasing over time
- Ongoing research and debate about the reasons
- Needs countermeasures, e.g.
  - Impose a maximum tree depth or size, and enforce
    the threshold in the variation operators
  - Parsimony pressure: penalty for being oversized
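Both countermeasures can be sketched in a few lines. The linear penalty, its weight of 0.1 and the policy of falling back to the parent when a child is too deep are assumptions chosen for illustration; other penalty forms and rejection policies are possible.

```python
def size(tree):
    """Number of nodes in a tuple-based tree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


def depth(tree):
    """Depth of a tree; a single terminal has depth 0."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def enforce_depth_limit(parent, child, max_depth):
    """Reject a too-deep child produced by a variation operator."""
    return child if depth(child) <= max_depth else parent


def penalized_error(raw_error, tree, weight=0.1):
    """Parsimony pressure: add a size penalty to an error to be minimized."""
    return raw_error + weight * size(tree)


small = ("+", "x", 1.0)
large = ("+", ("*", "x", 1.0), ("-", 1.0, 0.0))
```

With equal raw error, `small` receives the lower penalized error, so selection prefers the more compact tree.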
24Problems involving physical environments
- Trees for data fitting vs. trees (programs) that
  are really executable
- Execution can change the environment → the
  calculation of fitness
- Example: robot controller
- Fitness calculations mostly by simulation,
  ranging from expensive to extremely expensive (in
  time)
- Evolved controllers are often very good
25Example RoboCup Soccer Evolving team behaviors
- In contrast to other learning methods, the
  automated programming nature of Genetic
  Programming makes it a natural approach
  for developing algorithmic robot behaviors.
- A team of soft-bots was evolved bottom-up to the
  level of coordinating behaviors and cooperating
  so as to play a game of soccer.
- The team was able to beat hand-coded heuristic
  competitors.
26Primordial soup for GP soccer function set (sample)
Function            Returns  Description
(s1)                bool     Returns my internal state flag (1 or 0).
(mate-closer)       bool     1 if a teammate is closer than I am to the ball, else 0.
(near-opp)          bool     1 if there is an opponent within r distance from me, else 0.
(squadn)            bool     1 if I am squad mate n, else 0.
(rand)              bool     1 or 0, depending on a random event.
(home)              vect     A vector to my home position.
(ball)              vect     A vector to the ball.
(defgoal)           vect     A vector to the goal I am defending.
(goal)              vect     A vector to the goal I am attacking.
(closest-mate)      vect     A vector to my closest teammate.
(/2 vect1)          vect     Divides the magnitude of vect1 by 2.
(dribble vect1)     vect     Sets the magnitude of vect1 to some constant c.
(avg vect1 vect2)   vect     Returns the average of vect1 and vect2.
27GP for evolving team behavior
- Example GP tree is shown
- A GP individual's tree can be thought of as a
  chunk of LISP program code: each node in the tree
  is a function, which takes as arguments the
  results of the children of the node.
- Hence, the figure equates with the LISP code
  (if (maggt1/2 (/2 goal))
      (if near-opp goal closest-mate)
      (dribble goal))
28GP for evolving team behavior Mutation
- The point mutation operator in action. This
operator replaces some sub-tree in an individual
with a randomly generated sub-tree.
29GP for evolving team behavior Crossover
- The sub-tree crossover operator in action. This
  operator swaps sub-trees between two individuals.
30Fitness function(s)
- Fitness functions tried:
  - Maximize number of goals scored
  - Maximize time in possession of the ball
  - Maximize number of successful passes
  - Minimize number of kicks going out of bounds
- Evolved teams played against each other
  (competitive fitness)
31Algorithm specifics
- Player representation
  - Homogeneous
    - Each GP program was a single soccer-playing
      program that every player in the team used.
  - Heterogeneous
    - Each GP program had different branches assigned
      to different players, OR
    - Each GP program represented a different player
- Control parameters
  - 200 individuals in population
  - 50 generations
32Initial Random Population
33Kiddies Soccer
34Learning to Block the Goal
35Becoming Territorial
36Evolved tree for passing behavior
37Problems ?
- Population Size Was Too Small (Not enough
  diversity)
- Teams Competed Too Infrequently (Evaluations not
  sufficiently spread)
- Evolved Teams versus Individuals (Credit
  assignment problem)