Title: GENETIC ALGORITHMS AND GENETIC PROGRAMMING
1GENETIC ALGORITHMS AND GENETIC PROGRAMMING
2- John R. Koza
- Consulting Professor (Medical Informatics)
- Department of Medicine
- School of Medicine
- Consulting Professor
- Department of Electrical Engineering
- School of Engineering
- Stanford University
- Stanford, California 94305
- koza_at_stanford.edu
- http//www.smi.stanford.edu/people/koza/
3DEFINITION OF THE GENETIC ALGORITHM (GA)
- The genetic algorithm is a probabalistic search
algorithm that iteratively transforms a set
(called a population) of mathematical objects
(typically fixed-length binary character
strings), each with an associated fitness value,
into a new population of offspring objects using
the Darwinian principle of natural selection and
using operations that are patterned after
naturally occurring genetic operations, such as
crossover (sexual recombination) and mutation.
4GENETIC ALGORITHM (GA)
Generation 0 Generation 1
Individuals Fitness Offspring
011 3 111
001 1 010
110 6 110
010 2 010
5HAMBURGER RESTAURANT PROBLEM
- Price
- 1 0.50 price
- 0 10.00 price
- Drink
- 1 Coca Cola
- 0 Wine
- Ambiance
- 1 Fast snappy service
- 0 Leisurely service with tuxedoed waiter
6CHROMOSOME (GENOME) OF THE GLOBAL OPTIMUM
1 1 1
7THE SEARCH SPACE
1 000
2 001
3 010
4 011
5 100
6 101
7 110
8 111
- Alphabet size K2, Length L3
- Size of search space KL2L238
8IMPRACTICALITY OF RANDOM OR ENUMERATIVE SEARCH
- 81-bit problems are very small for GA
- However, even if L is as small as 81, 281 1027
number of nanoseconds since the beginning of
the universe 15 billion years ago
9GA FLOWCHART
10GENERATION 0
Generation 0 Generation 0 Generation 0
1 011 3
2 001 1
3 110 6
4 010 2
Total Total
Worst Worst
Average Average
Best Best
11DEFINITION OF THE GENETIC ALGORITHM (GA)
- The genetic algorithm is a probabalistic search
algorithm that iteratively transforms a set
(called a population) of mathematical objects
(typically fixed-length binary character
strings), each with an associated fitness value,
into a new population of offspring objects using
the Darwinian principle of natural selection and
using operations that are patterned after
naturally occurring genetic operations, such as
crossover (sexual recombination) and mutation.
12PROBABILISTIC SELECTION BASED ON FITNESS
- Better individuals are preferred
- Best is not always picked
- Worst is not necessarily excluded
- Nothing is guaranteed
- Mixture of greedy exploitation and adventurous
exploration - Similarities to simulated annealing (SA)
-
13PROBABILISTIC SELECTION BASED ON FITNESS
14DARWINIAN FITNESS PROPORTIONATE SELECTION
Generation 0 Generation 0 Generation 0 Mating pool Mating pool
1 011 3 .25 011 3
2 001 1 .08 110 6
3 110 6 .50 110 6
4 010 2 .17 010 2
Total Total 12 12 17
Worst Worst 1 1 2
Average Average 3.00 3.00 4.5
Best Best 6 6 6
15DEFINITION OF THE GENETIC ALGORITHM (GA)
- The genetic algorithm is a probabalistic search
algorithm that iteratively transforms a set
(called a population) of mathematical objects
(typically fixed-length binary character
strings), each with an associated fitness value,
into a new population of offspring objects using
the Darwinian principle of natural selection and
using operations that are patterned after
naturally occurring genetic operations, such as
crossover (sexual recombination) and mutation.
16MUTATION OPERATION
- Parent chosen probabilistically based on fitness
- Mutation point chosen at random
- One offspring
Parent
010
Parent
--0
Offspring
011
17AFTER MUTATION OPERATION
Generation 0 Generation 0 Generation 0 Mating pool Mating pool Generation 1 Generation 1 Generation 1
1 011 3 .25 011 3
2 001 1 .08 110 6
3 110 6 .50 110 6
4 010 2 .17 010 2 --- 011 3
Total Total 12 12 17
Worst Worst 1 1 2
Average Average 3.00 3.00 4.5
Best Best 6 6 6
18CROSSOVER OPERATION
- 2 parents chosen probabilistically based on
fitness
Parent 1 Parent 2
011 110
19CROSSOVER (CONTINUED)
- Interstitial point picked at random
- 2 remainders
- 2 offspring produced by crossover
Fragment 1 Fragment 2
01- 11-
Remainder 1 Remainder 2
- - 1 - - 0
Offspring 1 Offspring 2
111 010
20AFTER CROSSOVER OPERATION
Generation 0 Generation 0 Generation 0 Mating pool Mating pool Generation 1 Generation 1 Generation 1
1 011 3 .25 011 3 2 111 7
2 001 1 .08 110 6 2 010 2
3 110 6 .50 110 6
4 010 2 .17 010 2
Total Total 12 12 17
Worst Worst 1 1 2
Average Average 3.00 3.00 4.5
Best Best 6 6 6
21AFTER REPRODUCTION OPERATION
Generation 0 Generation 0 Generation 0 Mating pool Mating pool Generation 1 Generation 1 Generation 1
1 011 3 .25
2 001 1 .08
3 110 6 .50 110 6 --- 110 6
4 010 2 .17
Total Total 12 12 17
Worst Worst 1 1 2
Average Average 3.00 3.00 4.5
Best Best 6 6 6
22DEFINITION OF THE GENETIC ALGORITHM (GA)
- The genetic algorithm is a probabalistic search
algorithm that iteratively transforms a set
(called a population) of mathematical objects
(typically fixed-length binary character
strings), each with an associated fitness value,
into a new population of offspring objects using
the Darwinian principle of natural selection and
using operations that are patterned after
naturally occurring genetic operations, such as
crossover (sexual recombination) and mutation.
23GENERATION 1
Generation 0 Generation 0 Generation 0 Mating pool Mating pool Generation 1 Generation 1 Generation 1
1 011 3 .25 011 3 2 111 7
2 001 1 .08 110 6 2 010 2
3 110 6 .50 110 6 --- 110 6
4 010 2 .17 010 2 --- 011 3
Total Total 12 12 17 18
Worst Worst 1 1 2 2
Average Average 3.00 3.00 4.5 4.5
Best Best 6 6 6 7
24DEFINITION OF THE GENETIC ALGORITHM (GA)
- The genetic algorithm is a probabalistic search
algorithm that iteratively transforms a set
(called a population) of mathematical objects
(typically fixed-length binary character
strings), each with an associated fitness value,
into a new population of offspring objects using
the Darwinian principle of natural selection and
using operations that are patterned after
naturally occurring genetic operations, such as
crossover (sexual recombination) and mutation.
25DEFINITION OF THE GENETIC ALGORITHM (GA)
- The genetic algorithm is a probabalistic search
algorithm that iteratively transforms a set
(called a population) of mathematical objects
(typically fixed-length binary character
strings), each with an associated fitness value,
into a new population of offspring objects using
the Darwinian principle of natural selection and
using operations that are patterned after
naturally occurring genetic operations, such as
crossover (sexual recombination) and mutation.
26PROBABILISTIC STEPS
- The initial population is typically random
- Probabilistic selection based on fitness
- - Best is not always picked
- - Worst is not necessarily excluded
- Random picking of mutation and crossover points
- Often, there is probabilistic scenario as part of
the fitness measure
27ANTENNA DESIGN
28ANTENNA DESIGN
- The problem (Altshuler and Linden 1998) is to
determine the x-y-z coordinates of the
3-dimensional position of the ends (X1, Y1, Z1,
X2, Y2, Z2, , X7, Y7, Z7) of 7 straight wires so
that the resulting 7-wire antenna satisfies
certain performance requirements - The first wire starts at feed point (0, 0, 0) in
the middle of the ground plane - The antenna must fit inside the 0.5? cube
29ANTENNA GENOME
X1 Y1 Z1 X2 Y2 Z2
0010 -1110 0001 0011 -1011 0011
- 105-bit chromosome (genome)
- Each x-y-z coordinate is represented by 5 bits
(4-bit granularity for data plus a sign bit) - Total chromosome is 3 ? 7 ? 5 105 bits
30ANTENNA FITNESS
- Antenna is for ground-to-satellite communications
for cars and handsets - We desire near-uniform gain pattern 10? above the
horizon - Fitness is measured based on the antenna's
radiation pattern. The radiation pattern is
simulated by National Electromagnetics Code (NEC)
31ANTENNA FITNESS
- Fitness is sum of the squares of the difference
between the average gain and the antenna's gain - Sum is taken for angles ? between -90? and 90?
and all azimuth angles ? from 0? to 180? - The smaller the value of fitness, the better
32GRAPH OF ANTENNA FITNESS
33U. S. PATENT 5,719,794
3410-MEMBER TRUSS
3510-MEMBER TRUSS
- Prespecified topological arrangement of the 10
members, the load, and the wall (Goldberg and
Samtani 1986) - Truss has 10 members (6 are length of 30 feet and
4 are length 30v2 41 feet) - The problem is to determine the cross-sectional
areas (A1, , A10) of each of the 10 members so
as to minimize weight of the material for a truss
that supports the 2 loads - The weight is based on volume (i.e.,
cross-sectional area ? length)
36TRUSS GENOME
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10
0010 1110 0001 0011 1011 0011 1111 0011 0011 1010
- 40-bit chromosome (genome)
- 4-bit granularity for truss diameters
- 0000 smallest diameter
- 1111 largest diameter
- Total chromosome is 4 ? 10 40 bits
37TRUSS FITNESS
- Two-part (multiobjective) fitness measure
- First, fitness is computed by taking the sum,
over the 10 members, of the cross-sectional area
of each member times the length of each member
(30 feet or 30v2 41 feet). - Second, a penalty (up to 10) is imposed for
violating the stress constraints. Stresses are
computed using standard mechanical engineering
techniques. - The smaller the total fitness, the better
38CELLULAR AUTOMATA
39STATE TRANSITION TABLE
WWW WW W X E EE EEE Rule
0 0 0 0 0 0 0 1 a0
1 0 0 0 0 0 1 0 a1
2 0 0 0 0 1 0 0 a2
3 0 0 0 0 1 1 0 a3
4 0 0 1 1 0 0 0 a4
127 1 1 1 1 1 1 1 a127
40CELLULAR AUTOMATA
A0 A1 A2 A127
a0 a1 a2 a127
- 128-bit chromosome (genome)
41PROBLEM-SPECIFIC GENOMES
1 1 0 1 1 1 1
1 1 1 1 0 1 1
1 0 1 1 1 0 0
1 1 1 1 1 1 1
1 1 0 1 1 1 1
1 1 1 1 0 1 1
1 1 0 1 1 1 0
42GENETIC ALGORITHM USING VARIABLE-LENGTH STRINGS
- 5-WIRE ANTENNA (5 ? 15 75 bits)
- 4-WIRE ANTENNA (4 ? 15 60 bits)
X1 Y1 Z1 X5 Y5 Z5
0010 -1110 0001 0010 -1110 0001
X1 Y1 Z1 X4 Y4 Z4
1010 -0110 1101 1010 -0110 1001
43GENETIC PROGRAMMING
44THE CHALLENGE
- "How can computers learn to solve problems
without being explicitly programmed? In other
words, how can computers be made to do what is
needed to be done, without being told exactly how
to do it?" - ? Attributed to Arthur Samuel (1959)
45CRITERION FOR SUCCESS
- "The aim is ... to get machines to exhibit
behavior, which if done by humans, would be
assumed to involve the use of intelligence. - ? Arthur Samuel (1983)
46REPRESENTATIONS
- Binary decision diagrams
- Formal grammars
- Coefficients for polynomials
- Reinforcement learning tables
- Conceptual clusters
- Classifier systems
- Decision trees
- If-then production rules
- Horn clauses
- Neural nets
- Bayesian networks
- Frames
- Propositional logic
47A COMPUTER PROGRAM
48GENETIC PROGRAMMING (GP)
- GP applies the approach of the genetic algorithm
to the space of possible computer programs - Computer programs are the lingua franca for
expressing the solutions to a wide variety of
problems - A wide variety of seemingly different problems
from many different fields can be reformulated as
a search for a computer program to solve the
problem.
49GP ? MAIN POINTS
- Genetic programming now routinely delivers
high-return human-competitive machine
intelligence. - Genetic programming is an automated invention
machine. - Genetic programming has delivered a progression
of qualitatively more substantial results in
synchrony with five approximately
order-of-magnitude increases in the expenditure
of computer time.
50DEFINITION OF HIGH-RETURN
- The AI ratio (the artificial-to-intelligence
ratio) of a problem-solving method as the ratio
of that which is delivered by the automated
operation of the artificial method to the amount
of intelligence that is supplied by the human
applying the method to a particular problem
51DEFINITION OF ROUTINE
- A problem solving method is routine if it is
general and relatively little human effort is
required to get the method to successfully handle
new problems within a particular domain and to
successfully handle new problems from a different
domain.
52CRITERIA FOR HUMAN-COMPETITIVENESS
- Previously patented, an improvement over a
patented invention, or patentable today - Publishable in its own right as a new scientific
result ¾ independent of the fact that the result
was mechanically created - Holds it own in regulated competition against
humans (or programs) - 5 other similar criteria that are arms-length
from the fields of AI, ML, GP
53PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL
RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
54GP FLOWCHART
55A COMPUTER PROGRAM IN C
- int foo (int time)
-
- int temp1, temp2
- if (time gt 10)
- temp1 3
- else
- temp1 4
- temp2 temp1 1 2
- return (temp2)
56OUTPUT OF C PROGRAM
Time Output
0 6
1 6
2 6
3 6
4 6
5 6
6 6
7 6
8 6
9 6
10 6
11 7
12 7
57PROGRAM TREE
- ( 1 2 (IF (gt TIME 10) 3 4))
58CREATING RANDOM PROGRAMS
59CREATING RANDOM PROGRAMS
- Available functions
F , -, , , IFLTE - Available terminals
T X, Y, Random-Constants - The random programs are
- Of different sizes and shapes
- Syntactically valid
- Executable
60GP GENETIC OPERATIONS
- Reproduction
- Mutation
- Crossover (sexual recombination)
- Architecture-altering operations
61MUTATION OPERATION
62MUTATION OPERATION
- Select 1 parent probabilistically based on
fitness - Pick point from 1 to NUMBER-OF-POINTS
- Delete subtree at the picked point
- Grow new subtree at the mutation point in same
way as generated trees for initial random
population (generation 0) - The result is a syntactically valid executable
program - Put the offspring into the next generation of the
population
63CROSSOVER OPERATION
64CROSSOVER OPERATION
- Select 2 parents probabilistically based on
fitness - Randomly pick a number from 1 to NUMBER-OF-POINTS
for 1st parent - Independently randomly pick a number for 2nd
parent - The result is a syntactically valid executable
program - Put the offspring into the next generation of the
population - Identify the subtrees rooted at the two picked
points
65REPRODUCTION OPERATION
- Select parent probabilistically based on fitness
- Copy it (unchanged) into the next generation of
the population
66FIVE MAJOR PREPARATORY STEPS FOR GP
- Determining the set of terminals
- Determining the set of functions
- Determining the fitness measure
- Determining the parameters for the run
- Determining the method for designating a result
and the criterion for terminating a run
67ILLUSTRATIVE GP RUN
68SYMBOLIC REGRESSION
Independent variable X Dependent variable Y
-1.00 1.00
-0.80 0.84
-0.60 0.76
-0.40 0.76
-0.20 0.84
0.00 1.00
0.20 1.24
0.40 1.56
0.60 1.96
0.80 2.44
1.00 3.00
69PREPARATORY STEPS
Objective Find a computer program with one input (independent variable X) whose output equals the given data
1 Terminal set T X, Random-Constants
2 Function set F , -, ,
3 Fitness The sum of the absolute value of the differences between the candidate programs output and the given data (computed over numerous values of the independent variable x from 1.0 to 1.0)
4 Parameters Population size M 4
5 Termination An individual emerges whose sum of absolute errors is less than 0.1
70SYMBOLIC REGRESSION
- POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR
GENERATION 0
71SYMBOLIC REGRESSION x2 x 1
- FITNESS OF THE 4 INDIVIDUALS IN GEN 0
72SYMBOLIC REGRESSION x2 x 1
73CLASSIFICATION
74GP TABLEAU INTERTWINED SPIRALS
Objective Create a program to classify a given point in the x-y plane to the red or blue spiral
1 Terminal set T X,Y,Random-Constants
2 Function set F ,-,,,IFLTE,SIN,COS
3 Fitness The number of correctly classified points (0 194)
4 Parameters M 10,000. G 51
5 Termination An individual program scores 194
75WALL-FOLLOWER
76FITNESS
77BEST OF GENERATION 57
78BOX MOVER BEST OF GEN 0
79BOX MOVERGEN 45 FITNESS CASE 1
80TRUCK BACKER UPPER
81TRUCK BACKER UPPER
- 4-Dimensional control problem
- horizontal position, x
- vertical position, y
- angle between trailer and horizontal, Qt
- angle between trailer and cab, Qd
- One control variable (steering wheel turn angle)
- State transition equations map the 4 state
variables into 1 output (the control variable) - Simulation run over many initial conditions and
over hundreds of time steps
82GENETIC PROGRAMMING ON THE PROGRAMMING OF
COMPUTERS BY MEANS OF NATURAL SELECTION(Koza
1992)
832 MAIN POINTS FROM 1992 BOOK
- Virtually all problems in artificial
intelligence, machine learning, adaptive systems,
and automated learning can be recast as a search
for a computer program. - Genetic programming provides a way to
successfully conduct the search for a computer
program in the space of computer programs.
84SOME RESULTS FROM 1992 BOOK
- Intertwined Spirals
- Truck Backer Upper
- Broom Balancer
- Wall Follower
- Box Mover
- Artificial Ant
- Differential Games
- Inverse Kinematics
- Central Place Foraging
- Block Stacking
- Randomizer
- Cellular Automata
- Task Prioritization
- Image Compression
- Econometric Equation
- Optimization
- Boolean Function Learning
- Co-Evolution of Game-Playing Strategies
85PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL
RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
86COMPUTER PROGRAMS
- Subroutines provide one way to REUSE code ?
possibly with different instantiations of the
dummy variables (formal parameters) - Loops (and iterations) provide a 2nd way to REUSE
code - Recursion provide a 3rd way to REUSE code
- Memory provides a 4th way to REUSE the results of
executing code
87SYMBOLIC REGRESSION
Fitness case L0 W0 H0 L1 W1 H1 Dependent variable D
1 3 4 7 2 5 3 54
2 7 10 9 10 3 1 600
3 10 9 4 8 1 6 312
4 3 9 5 1 6 4 111
5 4 3 2 7 6 1 -18
6 3 3 1 9 5 4 -171
7 5 9 9 1 7 6 363
8 1 2 9 3 9 2 -36
9 2 6 8 2 6 10 -24
10 8 1 10 7 5 1 45
88EVOLVED SOLUTION
- (- ( ( W0 L0) H0)
- ( ( W1 L1) H1))
89DIFFERENCE IN VOLUMES
90AUTOMATICALLY DEFINED FUNCTION volume
91AUTOMATICALLY DEFINED FUNCTION volume
- (progn
- (defun volume (arg0 arg1 arg2)
- (values
- ( arg0 ( arg1 arg2))))
- (values
- (- (volume L0 W0 H0)
- (volume L1 W1 H1))))
92AUTOMATICALLY DEFINED FUNCTIONS
- ADFs provide a way to REUSE code
- Code is typically reused with different
instantiations of the dummy variables (formal
parameters)
93ADDITION OF V0 AND V1
Fitness case L0 W0 H0 L1 W1 H1 V0 V1 D
1 3 4 7 2 5 3 84 30 54
2 7 10 9 10 3 1 630 30 600
3 10 9 4 8 1 6 360 48 312
4 3 9 5 1 6 4 135 24 111
5 4 3 2 7 6 1 24 42 -18
6 3 3 1 9 5 4 9 180 -171
7 5 9 9 1 7 6 405 42 363
8 1 2 9 3 9 2 18 54 -36
9 2 6 8 2 6 10 96 120 -24
10 8 1 10 7 5 1 80 35 45
94DIVIDE AND CONQUER
95DIVIDE AND CONQUER
- Decompose a problem into sub-problems
- Solve the sub-problems
- Assemble the solutions of the sub-problems into a
solution for the overall problem
96CHANGE OF REPRESENTATION
97CHANGE OF REPRESENTATION
- Identify regularities
- Change the representation
- Solve the overall problem
98ADF IMPLEMENTATION
- Each overall program in population includes
- a main result-producing branch (RPB) and
- function-defining branch (i.e., automatically
defined function, ADF) - In generation 0, create random programs with
different ingredients for the RPB and the ADF - Terminal set for ADF typically contains dummy
arguments (formal parameters), such as ARG0,
ARG1, - Function set of the RPB contains ADF0
- ADFs are private and associated with a particular
individual program in the population
99ADF MUTATION
- Select parent probabilistically on the basis of
fitness - Pick a mutation point from either RPB or an ADF
- Delete sub-tree rooted at the picked point
- Grow a new sub-tree at the picked point composed
of the allowable ingredients appropriate for the
picked point - The offspring is a syntactically valid executable
program
100ADF CROSSOVER
- Select parent probabilistically on the basis of
fitness - Pick a crossover point from either RPB or an ADF
of the FIRST patent - The choice of crossover point in the SECOND
parent is RESTRICTED to the picked RPB or to the
picked ADF - The sub-trees are swapped
- The offspring are syntactically valid executable
programs
101GENETIC PROGRAMMING II AUTOMATIC DISCOVERY OF
REUSABLE PROGRAMS(Koza 1994)
102MAIN POINTS OF 1994 BOOK
- Scalability is essential for solving non-trivial
problems in artificial intelligence, machine
learning, adaptive systems, and automated
learning - Scalability can be achieved by reuse
- Genetic programming provides a way to
automatically discover and reuse subprograms in
the course of automatically creating computer
programs to solve problems
103COMPUTER PROGRAMS
- Subroutines provide one way to REUSE code ?
possibly with different instantiations of the
dummy variables (formal parameters) - Loops (and iterations) provide a 2nd way to REUSE
code - Recursion provide a 3rd way to REUSE code
- Memory provides a 4th way to REUSE the results of
executing code
104MEMORY
Settable (named) variables Indexed vector memory Matrix memory Relational memory
105LANGDON'S DATA STRUCTURES
- Stacks
- Queues
- Lists
- Rings
106COMPUTER PROGRAMS
- Subroutines provide one way to REUSE code ?
possibly with different instantiations of the
dummy variables (formal parameters) - Loops (and iterations) provide a 2nd way to REUSE
code - Recursion provide a 3rd way to REUSE code
- Memory provides a 4th way to REUSE the results of
executing code
107AUTOMATICALLY DEFINED ITERATION (ADI)
- The overall program includes an
iteration-performing branch (IPB) in addition to
a result-producing branch (RPB) and
function-defining branches (ADF) - There are no infinite loops because the iteration
is performed over a known, fixed set - protein or DNA sequence (of varying length)
- time-series data
- two-dimensional array of pixels
- Memory is usually involved and is used to
communicate between IPB, RPB, and ADF
108TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
- Goal is to classify a given protein segment as
being a transmembrane domain or non-transmembrane
area of the protein
109TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
- (progn
- (defun ADF0 ()
- (ORN (ORN (ORN (I?) (H?)) (ORN (P?) (G?))) (ORN
(ORN (ORN (Y?) (N?)) (ORN (T?) (Q?))) (ORN (A?)
(H?)))))) - (defun ADF1 ()
- (values (ORN (ORN (ORN (A?) (I?)) (ORN (L?)
(W?))) (ORN (ORN (T?) (L?)) (ORN (T?) (W?)))))) - (defun ADF2 ()
- (values (ORN (ORN (ORN (ORN (ORN (D?) (E?)) (ORN
(ORN (ORN (D?) (E?)) (ORN (ORN (T?) (W?)) (ORN
(Q?) (D?)))) (ORN (K?) (P?)))) (ORN (K?) (P?)))
(ORN (T?) (W?))) (ORN (ORN (E?) (A?)) (ORN (N?)
(R?)))))) - (progn (loop-over-residues (SETM0 ( (-
(ADF1) (ADF2)) (SETM3 M0)))) - (values ( ( M3 M0) ( ( ( (- L -0.53) ( M0
M0)) ( ( ( M3 M0) ( ( M0 M3) ( M1 M2)))
M2)) ( M3 M0))))))
110 TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
- in-sample correlation of 0.976
- out-of-sample correlation of 0.968
- out-of-sample error rate 1.6
111AUTOMATICALLY DEFINED LOOP (ADL)
- loop initialization branch, LIB
- loop condition branch, LCB
- loop body branch, LBB
- loop update branch, LUB
112ADL
113COMPUTER PROGRAMS
- Subroutines provide one way to REUSE code ?
possibly with different instantiations of the
dummy variables (formal parameters) - Loops (and iterations) provide a 2nd way to REUSE
code - Recursion provide a 3rd way to REUSE code
- Memory provides a 4th way to REUSE the results of
executing code
114AUTOMATICALLY DEFINED RECURSION (ADR)
- recursion condition branch, RCB
- recursion body branch, RBB
- recursion update branch, RUB
- recursion ground branch, RGB
115ADR
116HUMAN-COMPETITIVE RESULTS(NOT RELATED TO PATENTS)
Transmembrane segment identification problem for proteins
Motifs for DEAD box family and manganese superoxide dismutase family of proteins
Cellular automata rule for Gacs-Kurdyumov-Levin (GKL) problem
Quantum algorithm for the Deutsch-Jozsa early promise problem
Quantum algorithm for Grovers database search problem
Quantum algorithm for the depth-two AND/OR query problem
Quantum algorithm for the depth-one OR query problem
Protocol for communicating information through a quantum gate
Quantum dense coding
Soccer-playing program that won its first two games in the 1997 Robo Cup competition
Soccer-playing program that ranked in the middle of field in 1998 Robo Cup competition
Antenna designed by NASA for use on spacecraft
Sallen-Key filter
117PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL
RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
118GENETIC PROGRAMMING III DARWINIAN INVENTION AND
PROBLEM SOLVING(Koza, Bennett, Andre, Keane 1999)
119SUBROUTINE DUPLICATION
120SUBROUTINE CREATION
121SUBROUTINE DELETION
122ARGUMENT DUPLICATION
123ARGUMENT DELETION
12416 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY
CREATING COMPUTER PROGRAMS
- Starts with "What needs to be done"
- Tells us "How to do it"
- Produces a computer program
- Automatic determination of program size
- Code reuse
- Parameterized reuse
- Internal storage
- Iterations, loops, and recursions
- Self-organization of hierarchies
- Automatic determination of program architecture
- Wide range of programming constructs
- Well-defined
- Problem-independent
- Wide applicability
- Scalable
- Competitive with human-produced results
125GENETIC PROGRAMMING PROBLEM SOLVER (GPPS)
126AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND
SIZING OF ANALOG ELECTRICAL CIRCUITS BY MEANS OF
DEVELOPMENTAL GENETIC PROGRAMMING
127AUTOMATED CIRCUIT SYNTHESIS
- The topology of a circuit includes specifying the
gross number of components in the circuit, the
type of each component (e.g., a capacitor), and a
netlist specifying where each lead of each
component is to be connected. - Sizing involves specifying the values (typically
numerical) of each of the circuit's components.
128COMPONENT-CREATING FUNCTIONS
- Resistor R function
- Capacitor C function
- Inductor L function
- Diode D function
- Transistor Q function (3-leaded)
129COMPONENT-CREATING FUNCTIONS
130TOPOLOGY-MODIFYING FUNCTIONS
- SERIES division
- PARALLEL division
- VIA
- FLIP
131TOPOLOGY-MODIFYING FUNCTIONS
132DEVELOPMENT-CONTROLLING FUNCTIONS
- END function
- NOP (No Operation) function
- SAFE_CUT function
133THE INITIAL CIRCUIT
134DEVELOPMENTAL GP
- (LIST (C ( 0.963 ( ( -0.875 -0.113) 0.880))
(series (flip end) (series (flip end) (L -0.277
end) end) (L ( -0.640 0.749) (L -0.123 end))))
(flip (nop (L -0.657 end)))))
135CAPACITOR-CREATING FUNCTION
- (LIST (C ( 0.963 ( ( -0.875
- -0.113) 0.880)) (series (flip
- end) (series (flip end) (L
- 0.277 end) end) (L ( -0.640
- 0.749) (L -0.123 end)))) (flip
- (nop (L -0.657 end)))))
136CAPACITOR-CREATING FUNCTION
137SERIES DIVISION FUNCTION
- (LIST (C ( 0.963 ( ( -0.875
- -0.113) 0.880)) (series (flip
- end) (series (flip end) (L
- 0.277 end) end) (L ( -0.640
- 0.749) (L -0.123 end)))) (flip
- (nop (L -0.657 end)))))
138SERIES DIVISION
139DEVELOPMENTAL GP
140EVALUATION OF FITNESS
141DESIRED BEHAVIOR OF A LOWPASS FILTER
142EVOLVED CAMPBELL FILTER
- U. S. patent 1,227,113
- George Campbell
- American Telephone and Telegraph
- 1917
143EVOLVED ZOBEL FILTER
- U. S. patent 1,538,964
- Otto Zobel
- American Telephone and Telegraph Company
- 1925
144EVOLVED SALLEN-KEY FILTER
145EVOLVED DARLINGTON EMITTER-FOLLOWER SECTION
U. S. patent 2,663,806 Sidney Darlington Bell
Telephone Laboratories 1953
146NEGATIVE FEEDBACK
147HAROLD BLACKS RIDE ON THE LACKAWANNA FERRY
Courtesy of Lucent Technologies
14820th-CENTURY PATENTS
Campbell ladder topology for filters
Zobel M-derived half section and constant K filter sections
Crossover filter
Negative feedback
Cauer (elliptic) topology for filters
PID and PID-D2 controllers
Darlington emitter-follower section and voltage gain stage
Sorting network for seven items using only 16 steps
60 and 96 decibel amplifiers
Analog computational circuits
Real-time analog circuit for time-optimal robot control
Electronic thermometer
Voltage reference circuit
Philbrick circuit
NAND circuit
Simultaneous synthesis of topology, sizing, placement, and routing
149PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL
RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
150SIX POST-2000 PATENTED INVENTIONS
151EVOLVED HIGH CURRENT LOAD CIRCUIT
152REGISTER-CONTROLLED CAPACITOR CIRCUIT
153LOW-VOLTAGE CUBIC CIRCUIT
154VOLTAGE-CURRENT-CONVERSION CIRCUIT
155LOW-VOLTAGE BALUN CIRCUIT
156TUNABLE INTEGRATED ACTIVE FILTER
15721st-CENTURY PATENTED INVENTIONS
Low-voltage balun circuit
Mixed analog-digital variable capacitor circuit
High-current load circuit
Voltage-current conversion circuit
Cubic function generator
Tunable integrated active filter
158PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL
RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
159NOVELTY-DRIVEN EVOLUTION
- Two factors in fitness measure
- Circuits behavior in the frequency domain
- Largest number of nodes and edges (circuit
components) of a subgraph of the given circuit
that is isomorphic to a subgraph of a template
representing the prior art. Graph isomorphism
algorithm with the cost function being based on
the number of shared nodes and edges (instead of
just the number of nodes).
160NOVELTY-DRIVEN EVOLUTION
- For circuits not scoring the maximum number of
hits (101), the fitness of a circuit is the
product of the two factors. - For circuits scoring 101 hits (100-compliant
individuals), fitness is the number of shared
nodes and edges divided by 10,000.
161PRIOR ART TEMPLATE
162NON-INFRINGING SOLUTION NO. 1
163NON-INFRINGING SOLUTION NO. 5
164GP AS AN INVENTION MACHINE
165CIRCUIT-CONSTRUCTING PROGRAM TREE WITH ADFs
166LOWPASS FILTER WITH ADFs
167ADF0
168AUTOMATIC SYNTHESIS OF CIRCUIT LAYOUTINCLUDING
THE PLACEMENT OF COMPONENTS AND ROUTING OF WIRES
ALONG WITH THE TOPOLOGY AND SIZING
169CIRCUIT LAYOUT
- Circuit placement involves the assignment of each
of the circuit's components to a particular
physical location on a printed circuit board or
silicon wafer. - Routing involves the assignment of a particular
physical location to the wires between the leads
of the circuit's components.
170LAYOUT
171LAYOUT GENERATION 0
172100-COMPLIANT LOWPASS FILTERÂ GENERATION 25
WITH 5 CAPACITORS AND 11 INDUCTORS ? AREA OF
1775.2
173100-COMPLIANT LOWPASS FILTERGENERATION 30 WITH
10 INDUCTORS AND 5 CAPACITORS ? AREA OF 950.3
174100-COMPLIANT LOWPASS FILTERBEST-OF-RUN
CIRCUIT OF GENERATION 138 WITH 4 INDUCTORS AND 4
CAPACITORS ? AREA OF 359.4
175LAYOUT ? 60 DB AMPLIFIER
176AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND
TUNING OF CONTROLLERS
177PROGRAM TREE FOR A CONTROLLER
178CONTROLLER BLOCKS
- gain
- integrator
- differentiator
- adder
- subtractor
- multiplier
- differential input integrators
- inverter
- lead
- lag
- two-parameter lag
- absolute value
- limiter
- divider
- delay
- conditional operators (switches)
179FUNCTION SET FOR CONTROLLER SYNTHESIS
- F GAIN,INVERTER,LEAD,LAG,LAG2,
DIFFERENTIAL_INPUT_INTEGRATOR, DIFFERENTIATOR,
ADD_SIGNAL, SUB_SIGNAL,ADD_3_SIGNAL,ADF0,
ADF1,ADF2,ADF3,ADF4
180TERMINAL SET FOR CONTROLLER SYNTHESIS
- T REFERENCE_SIGNAL, CONTROLLER_OUTPUT,
PLANT_OUTPUT
181CONSTRAINED SYNTACTIC STRUCTURE
- A grammar that specifies what functions and
terminals are allowed as arguments to particular
functions - For example, the first argument of the GAIN
function must be a value-setting subtree whereas
the second can be from the general pool of
functions - Also known as strong typing
182TWO-LAG PLANT
1838 FITNESS CASES
- 8 elements of the fitness measure represent 2 ? 2
? 2 choices - 2 different values of the plant's internal gain,
K (1.0 and 2.0), - 2 different values of the plant's time constant ?
(0.5 and 1.0), - 2 different values for the height of the
reference signal (rising from 0 to 1 volts or
from 0 to 1 microvolts at t 100 milliseconds
184FITNESS MEASURE
- For each of these 8 fitness cases, a transient
analysis (time domain) is performed using the
SPICE simulator. - The contribution to fitness for the 8 elements is
- Integral of time-weighted absolute error (ITAE)
- e(t) is difference between plant output and
reference signal. - Multiplication by B (106 or 1) makes both
reference signals equally influential. - Additional weighting function, A, heavily
penalizes non-compliant amounts of overshoot. A
weights all variations up to 2 above the
reference signal by 1.0, but bigger variations by
10.0.
185EVOLVED CONTROLLER FOR TWO-LAG PLANT
186LESS ITAE AND OVERSHOOT
187BETTER DISTURBANCE REJECTION
188REVERSE ENGINEERING OF METABOLIC PATHWAYS
189EVOLVED PATHWAY
190ANTENNA SYNTHESIS USING GP
- (PROGN3
- (TURN-RIGHT 0.125)
- (LANDMARK
- (REPEAT 2
- (PROGN2
- (DRAW 1.0 HALF-MM-WIRE)
- (DRAW 0.5 NO-WIRE)))
- (TRANSLATE-RIGHT 0.125 0.75))
191USING A TURTLE TO DRAW TWO-DIMENSIONAL ANTENNA
192BEST-OF-RUN ANTENNA FROM GENERATION 90
1933-DIMENSIONAL ANTENNA
194NASA EVOLVED ANTENNA
- To be on satellite to be launched in 2004
195OTHER STRUCTURES
196GENETIC NETWORK FOR lac operon
197EVOLVED NETWORK
- (IF (lt LACTOSE_LEVEL 9.139 ) (IF (lt
- REPRESSOR_LEVEL 6.270 ) (IF (gt GLUCOSE_LEVEL
- 5.491 ) 2.02 (IF (lt CAP_LEVEL 0.639 ) 2.033 (IF
- (lt CAP_LEVEL 4.858 ) (IF (gt LACTOSE_LEVEL 2.511 )
- (IF (gt CAP_LEVEL 7.807 ) 5.586 (IF (gt
- LACTOSE_LEVEL 2.114 ) 1.978 2.137 ) ) 0.0 ) (IF
- (gt REPRESSOR_LEVEL 4.015 ) 0.036 (IF (lt
- GLUCOSE_LEVEL 5.128 ) 10.0 (IF (lt REPRESSOR_LEVEL
- 4.268 ) 2.022 9.122 ) ) ) ) ) ) (IF (gt CAP_LEVEL
- 0.842 ) 0.0 5.97 ) ) (IF (lt CAP_LEVEL 1.769 )
- 2.022 (IF (lt GLUCOSE_LEVEL 2.382 ) (IF (gt
- LACTOSE_LEVEL 1.256 ) (IF (gt LACTOSE_LEVEL 1.933
- ) (IF (gt GLUCOSE_LEVEL 2.022 ) (IF (lt
- GLUCOSE_LEVEL 5.183 ) 6.323 (IF (gt CAP_LEVEL
- 1.208 ) 9.713 0.842 ) ) 10.0 ) (IF (gt
- GLUCOSE_LEVEL 6.270 ) 2.109 ) 1.965 ) ) 0.665 )
- 1.982 ) ) )
198IN C-STYLE PSEUDO CODE
- else
-
- if(CAP_LEVEL lt 1.769)
-
- LAC_mRNA_LEVEL 2.022
-
- else
-
- if(GLUCOSE_LEVEL lt 2.382)
-
- LAC_mRNA_LEVEL 10.0
-
- else
-
- LAC_mRNA_LEVEL 1.982
-
- if(LACTOSE_LEVEL lt 9.139)
-
- if(REPRESSOR_LEVEL lt 6.270)
-
- LAC_mRNA_LEVEL 2.022
-
- else
-
- LAC_mRNA_LEVEL 0.0
-
199PARAMETERIZED TOPOLOGIES
- One of the most important characteristics of
computer programs is that they ordinarily contain
inputs (free variables) and conditional operations
200PARAMETERIZED TOPOLOGY FOR LOWPASS FILTER
201PARAMETERIZED TOPOLOGY FOR HIGHPASS FILTER
202PARAMETERIZED TOPOLOGY FOR GENERAL-PURPOSE
CONTROLLER
203EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
204EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
205PATENTABLE NEW INVENTIONS
PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules
General-purpose controllers outperforming Ziegler-Nichols and Åström-Hägglund rules
206PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL
RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
207PARALLELIZATION WITH SEMI-ISOLATED SUBPOPULATIONS
208GP PARALLELIZATION
- Like Hormel, Get Everything Out of the Pig,
Including the Oink - Keep on Trucking
- It Takes a Licking and Keeps on Ticking
- The Whole is Greater than the Sum of the Parts
209PETA-OPS
- Human brain operates at 1012 neurons operating at
103 per second 1015 ops per second - 1015 ops 1 peta-op 1 bs (brain second)
210EVOLVABLE HARDWARECORNER OF XILINX XC6216
211FUNCTION UNIT FOR CELL OF XILINIX XC6216
212SORTING NETWORK
213EVOLVED SORTING NETWORK
214GP 19872002
System Dates Speed-up over first system Human-competitive results Problem Category
Serial LISP 19871994 1 (base) 0 toy problems
64 transputers 19941997 9 2 human-competitive results not related to patented inventions
64 PowerPCs 19952000 204 12 20th-century patented inventions
70 Alphas 19992001 1,481 2 20th-century patented inventions
1,000 Pentium IIs 20002002 13,900 12 21st-century patented inventions
4-week runs on 1,000 Pentium IIs 2002-2003 130,000 2 patentable new inventions
215PROMISING GP APPLICATION AREAS
- Problem areas involving many variables that are
interrelated in highly non-linear ways - Inter-relationship of variables is not well
understood - Discovery of the size and shape of the solution
is a major part of the problem - "Black art" problems
216PROMISING GP APPLICATION AREAS (CONTINUED)
- Areas where you simply have no idea how to
program a solution, but where you know what you
want
217PROMISING GP APPLICATION AREAS (CONTINUED)
- Problem areas where a good approximate solution
is satisfactory - ? design
- ? control
- ? bioinformatics
- ? classification
- ? data mining
- ? system identification
- ? forecasting
218PROMISING GP APPLICATION AREAS (CONTINUED)
- ? Areas where large computerized databases are
accumulating and computerized techniques are
needed to analyze the data - ? genome, protein, microarray data
- ? satellite image data
- ? astronomical data
- ? petroleum databases
- ? financial databases
- ? medical records
- ? marketing databases
219PROMISING GP APPLICATION AREAS (CONTINUED)
- ? Areas for which humans find it very difficult
to write good programs - ? parallel computers
- ? cellular automata
- ? multi-agent strategies
- ? field-programmable game arrays
- ? digital signal processors
- ? swarm intelligence
220CHARACTERISTICS SUGGESTING USE OF GP
- (1) discovering the size and shape of the
solution, - (2) reusing substructures,
- (3) discovering the number of substructures,
- (4) discovering the nature of the hierarchical
references among substructures, - (5) passing parameters to a substructure,
- (6) discovering the type of substructures (e.g.,
subroutines, iterations, loops, recursions, or
storage), - (7) discovering the number of arguments possessed
by a substructure, - (8) maintaining syntactic validity and locality
by means of a developmental process, or - (9) discovering a general solution in the form of
a parameterized topology containing free variables
221DESIGNING A GIRAFFE
- Long neck
- Long tongue
- Vegetable-digesting enzymes in stomach
- 4 legs
- Long legs
- Brown coloration
222THE DESIGN OF A GOOD GIRAFE
Neck length Tongue length Carnivorous? Number of legs Leg length Coloration
15.11 feet 14 inches No 4 9.96 feet Brown
Floating point Floating point Boolean Integer Floating point Categorical
223NON-LINEARITY GIRAFE
- Taken one-by-one, some gene values found in a
giraffe, such as the long neck contribute (alone)
negatively to fitness - requires considerable material to construct
- requires considerable energy to maintain
- prone to injury (thereby hurting rate of survival
and reproduction) - Thus, maximizing any one variable will not lead
to the global optimum solution
224NON-LINEARITY (CONTINUED)
- When the variables are taken in pairs (there are
15 possible pairs), many combinations of pairs
(e.g., Long neck and long tongue) are doubly
detrimental
225NON-LINEARITY (CONTINUED)
- But, certain combinations of traits, when taken
together, are "co-adapted sets of alleles" that
yield a very fit animal for eating high acacia
leaves in the jungle environment, having good
camouflage, having high escape velocity when
faced with predators, and exploiting a niche (and
avoiding competition) with other animals feeding
on low-hanging vegetation
226SEARCH METHODS IN GENERAL
- Initial structure(s)
- Fitness measure
- Operations for creating new structures
- Parameters
- Termination criterion and method of designating
the result
227SPACE WITH MANY LOCAL OPTIMA
228SEARCH METHODS
- Blind random search does not use acquired
information in deciding on the future direction
of the search - Hill combing and gradient descent use acquired
information however, they are prone to becoming
trapped on local optima - The previous point is especially true for
non-trivial search spaces
2297 DIFFERENCES BETWEEN GP AND ARTIFICIAL
INTELLIGENCE AND MACHINE LEARNING APPROACHES
230REPRESENTATION
- Genetic programming overtly conducts it
- search for a solution to the given problem
- in program space
231ROLE OF POINT-TO-POINT TRANSFORMATIONS IN THE
SEARCH
- Genetic programming does not conduct its
- search by transforming a single point in the
- search space into another single point, but
- instead transforms a set of points into
- another set of points
232ROLE OF HILL CLIMBING IN THE SEARCH
- Genetic programming does not rely
- exclusively on greedy hill climbing to
- conduct its search, but instead allocates a
- certain number of trials, in a principled
- way, to choices that are known to be
- inferior
233DETERMINISM IN THE SEARCH
- Genetic programming conducts its search
- probabilistically
234ROLE OF AN EXPLICIT KNOWLEDGE BASE
- Genetic programming does NOT make use
- of a knowledge base
235ROLE OF FORMAL LOGIC IN THE SEARCH
- Genetic programming does not utilize
- formal logic in its search strategy.
Contradictory alternatives are created and
actively maintained.
236UNDERPINNINGS OF THE TECHNIQUE
237TURING (1948)
- Turing made the connection between
- searches and the challenge of getting a
- computer to solve a problem without
- explicitly programming it in his 1948 essay
- Intelligent Machines
- "Further research into intelligence of machinery
will probably be very greatly concerned with
'searches' ...
238TURINGS 3 APPROACHES TO MACHINE INTELLIGENCE
(1948)
- LOGIC-BASED SEARCH
- One approach that Turing identified is a
- search through the space of integers
- representing candidate computer
- programs.
239TURINGS 3 APPROACHES (CONTINUED)
- CULTURAL SEARCH
- A second approach is the "cultural search
- which relies on knowledge and expertise
- acquired over a period of years from
- others (akin to present-day knowledge-
- based systems).
240TURINGS 3 APPROACHES (CONTINUED)
- GENETICAL OR EVOLUTIONARY SEARCH
- "There is the genetical or evolutionary
- search by which a combination of genes is
- looked for, the criterion being the survival
- value.
241TURING (1950)
- From Turings 1950 paper "Computing
- Machinery and Intelligence"
- We cannot expect to find a good child-machine at
the first attempt. One must experiment with
teaching one such machine and see how well it
learns. One can then try another and see if it
is better or worse. There is an obvious
connection between this process and evolution, by
the identifications
242TURING (1950) (CONTINUED)
- Structure of the child machine
- Hereditary material
- Changes of the child machine
- Mutations
- Natural selection
Judgment of the experimenter