Title: GENETIC PROGRAMMING
- John R. Koza
- Consulting Professor (Biomedical Informatics)
- Department of Medicine
- School of Medicine
- Consulting Professor
- Department of Electrical Engineering
- School of Engineering
- Stanford University
- Stanford, California 94305
- koza_at_stanford.edu
- http://www.smi.stanford.edu/people/koza/
THE CHALLENGE OF ARTIFICIAL INTELLIGENCE
- "How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?" (Attributed to Arthur Samuel, 1959)
CRITERION FOR SUCCESS
- "The aim is ... to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence." (Arthur Samuel, 1983)
MAIN POINT No. 1
- Genetic programming now routinely delivers high-return human-competitive machine intelligence
MAIN POINT No. 2
- Genetic programming is an automated invention machine
MAIN POINT No. 3
- Genetic programming has delivered a progression of qualitatively more substantial results in synchrony with five approximately order-of-magnitude increases in the expenditure of computer time
HUMAN-COMPETITIVE
- The result is equal to or better than a human-designed solution to the same problem
NASA EVOLVED ANTENNA
- X-Band Antenna for NASA's Space Technology 5 Mission in 2004
HUMAN-COMPETITIVE
- Previously patented, an improvement over a
patented invention, or patentable today
DEFINITION OF HIGH-RETURN
- The AI ratio (the "artificial-to-intelligence" ratio) of a problem-solving method is the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem. A method is high-return when this ratio is high.
DEFINITION OF ROUTINE
- A problem-solving method is routine if it is general and relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain.
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
- Toy problems
- Human-competitive non-patent results
- 20th-century patented inventions
- 21st-century patented inventions
- Patentable new inventions
REPRESENTATIONS
- Decision trees
- If-then production rules
- Horn clauses
- Neural nets
- Bayesian networks
- Frames
- Propositional logic
- Binary decision diagrams
- Formal grammars
- Coefficients for polynomials
- Reinforcement learning tables
- Conceptual clusters
- Classifier systems
A COMPUTER PROGRAM
DESIRED OUTPUT OF PROGRAM
Time Output
0 6
1 6
2 6
3 6
4 6
5 6
6 6
7 6
8 6
9 6
10 6
11 7
12 7
PROGRAM TREE
- (+ 1 2 (IF (> TIME 10) 3 4))
GENETIC PROGRAMMING
- Create initial population (random)
- Main generational loop
  - Execute all programs
  - Evaluate fitness of all programs
  - Select single individuals or pairs of individuals based on fitness to participate in the genetic operations (mutation, crossover, reproduction, architecture-altering operations)
- Termination criterion
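The generational loop above can be sketched in Python and run on the symbolic-regression data from the illustrative run. This is a minimal sketch under stated assumptions, not Koza's implementation: programs are nested tuples over F = {+, -, *, %} and T = {X, constants}, `%` is protected division, parents are chosen by tournament selection (a common alternative to fitness-proportionate selection), the best program is copied forward unchanged (elitism), and the operation rates, depth limit, and names such as `evolve` are hypothetical.

```python
import math
import operator
import random

# Fitness cases (X, Y) from the symbolic-regression table in the illustrative run.
DATA = [(-1.0, 1.00), (-0.8, 0.84), (-0.6, 0.76), (-0.4, 0.76), (-0.2, 0.84),
        (0.0, 1.00), (0.2, 1.24), (0.4, 1.56), (0.6, 1.96), (0.8, 2.44), (1.0, 3.00)]

def pdiv(a, b):
    """Protected division: returns 1.0 when the denominator is zero."""
    return a / b if b != 0 else 1.0

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "%": pdiv}

def grow(rng, depth):
    """Random tree: a tuple (op, left, right), the terminal 'X', or a constant."""
    if depth == 0 or rng.random() < 0.3:
        return "X" if rng.random() < 0.5 else round(rng.uniform(-1.0, 1.0), 2)
    return (rng.choice(list(FUNCS)), grow(rng, depth - 1), grow(rng, depth - 1))

def run(tree, x):
    """Execute a program tree for one value of X."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](run(left, x), run(right, x))
    return x if tree == "X" else tree

def error(tree):
    """Sum of absolute errors over all fitness cases (lower is better)."""
    total = sum(abs(run(tree, x) - y) for x, y in DATA)
    return total if math.isfinite(total) else math.inf

def points(tree, path=()):
    """Yield the path to every point of the tree, root first."""
    yield path
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from points(tree[i], path + (i,))

def subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0

def evolve(pop_size=200, generations=30, seed=0, max_depth=8):
    rng = random.Random(seed)
    pop = [grow(rng, 4) for _ in range(pop_size)]        # generation 0 is random
    history = []
    for gen in range(generations + 1):
        scored = [(error(t), t) for t in pop]            # execute and evaluate all programs
        best_err, best = min(scored, key=lambda s: s[0])
        history.append(best_err)
        if best_err < 0.1 or gen == generations:         # termination criterion
            return best, history

        def select():                                    # tournament of size 3
            return min(rng.sample(scored, 3), key=lambda s: s[0])[1]

        nxt = [best]                                     # elitism (an assumption)
        while len(nxt) < pop_size:
            r, parent = rng.random(), select()
            if r < 0.8:                                  # crossover, one offspring kept
                donor = select()
                child = replace(parent, rng.choice(list(points(parent))),
                                subtree(donor, rng.choice(list(points(donor)))))
            elif r < 0.9:                                # mutation
                child = replace(parent, rng.choice(list(points(parent))), grow(rng, 3))
            else:                                        # reproduction
                child = parent
            nxt.append(child if depth(child) <= max_depth else parent)
        pop = nxt
```

`evolve()` returns the best program and the best error of each generation. Because of the elitism assumption that error never increases; whether it drops below 0.1 within the generation limit depends on the seed and parameters.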
CREATING RANDOM PROGRAMS
- Function set: F = {+, -, *, %, IFLTE}
- Terminal set: T = {X, Y, Random-Values}
CREATING RANDOM PROGRAMS
- The random programs are:
  - Of different sizes and shapes
  - Syntactically valid
  - Executable
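The following Python sketch creates random programs from this function and terminal set. It assumes a nested-list prefix representation, the "grow" style of creation with a depth limit, `%` as protected division returning 1 when the denominator is 0, `IFLTE` as the four-argument "if less than or equal" (return the third argument if the first is <= the second, else the fourth), and random constants drawn uniformly from [-1, 1]; the names `random_program`, `execute`, and `size` are illustrative.

```python
import random

# Function set F = {+, -, *, %, IFLTE} with the arity of each function.
ARITY = {"+": 2, "-": 2, "*": 2, "%": 2, "IFLTE": 4}

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "%": lambda a, b: a / b if b != 0 else 1.0,      # protected division
    "IFLTE": lambda a, b, c, d: c if a <= b else d,   # if a <= b then c else d
}


def random_program(rng, max_depth):
    """'Grow' creation: pick functions or terminals at random, forcing terminals at max depth."""
    if max_depth == 0 or rng.random() < 0.3:
        terminal = rng.choice(["X", "Y", "R"])
        # "R" stands for a random constant, fixed once when the program is created
        return round(rng.uniform(-1.0, 1.0), 3) if terminal == "R" else terminal
    f = rng.choice(list(ARITY))
    return [f] + [random_program(rng, max_depth - 1) for _ in range(ARITY[f])]


def execute(program, x, y):
    """Evaluate a program in prefix-list form for given values of X and Y."""
    if isinstance(program, list):
        args = [execute(arg, x, y) for arg in program[1:]]
        return OPS[program[0]](*args)
    if program == "X":
        return x
    if program == "Y":
        return y
    return program


def size(program):
    """Number of points (functions plus terminals) in the program."""
    if isinstance(program, list):
        return 1 + sum(size(arg) for arg in program[1:])
    return 1


rng = random.Random(42)
for program in [random_program(rng, 4) for _ in range(3)]:
    print(size(program), execute(program, 0.5, -0.25), program)
```

Because every function accepts any numeric inputs and division is protected, every tree produced this way can be executed, while the trees still differ in size and shape.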
DARWINIAN SELECTION
- Selection based on fitness
- Better individuals are more likely to be selected
- Probabilistic selection
  - Best is not always picked
  - Worst is not necessarily excluded
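Probabilistic, fitness-based selection can be illustrated with fitness-proportionate ("roulette wheel") selection, a classic choice in early GP work; tournament selection is another common option. In this sketch the standardized fitness is an error (0 is best) and is converted to Koza's adjusted fitness 1 / (1 + error) before being used as a selection weight. The function names and example data are hypothetical.

```python
import random


def adjusted_fitness(standardized):
    """Map a standardized fitness (an error, 0 is best) into (0, 1]; smaller error -> larger value."""
    return 1.0 / (1.0 + standardized)


def select(population, errors, rng):
    """Fitness-proportionate ("roulette wheel") selection of one individual."""
    weights = [adjusted_fitness(e) for e in errors]
    return rng.choices(population, weights=weights, k=1)[0]


rng = random.Random(0)
population = ["best", "good", "fair", "worst"]
errors = [0.0, 1.0, 3.0, 9.0]
counts = {name: 0 for name in population}
for _ in range(10_000):
    counts[select(population, errors, rng)] += 1
print(counts)  # every individual, even the worst, is picked some of the time
```

With errors 0, 1, 3 and 9 the weights are 1, 0.5, 0.25 and 0.1, so roughly 54%, 27%, 14% and 5% of the draws go to the four individuals: the best is not always picked and the worst is not excluded.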
MUTATION OPERATION
- Select 1 parent probabilistically based on fitness
- Pick point from 1 to NUMBER-OF-POINTS
- Delete subtree at the picked point
- Grow new subtree at the mutation point in the same way as the trees were generated for the initial random population (generation 0)
- The result is a syntactically valid executable program
- Put the offspring into the next generation of the population
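The mutation steps can be sketched in Python on trees stored as nested lists. Assumptions: points are numbered depth-first (so NUMBER-OF-POINTS is `len(points(tree))`), new subtrees are grown over a small hypothetical function set {+, -, *} and terminal set {X, constant} with a depth limit of 3, and the parent is copied rather than modified.

```python
import copy
import random


def grow(rng, depth):
    """Grow a random subtree over the hypothetical sets F = {+, -, *} and T = {X, constant}."""
    if depth == 0 or rng.random() < 0.3:
        return "X" if rng.random() < 0.5 else round(rng.uniform(-1.0, 1.0), 2)
    return [rng.choice("+-*"), grow(rng, depth - 1), grow(rng, depth - 1)]


def points(tree):
    """Index paths to all points, depth-first; len(points(tree)) is NUMBER-OF-POINTS."""
    result = [()]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            result += [(i,) + path for path in points(child)]
    return result


def mutate(parent, rng, new_depth=3):
    """Delete the subtree at a random point and grow a new subtree in its place."""
    path = rng.choice(points(parent))
    if not path:                        # the root was picked: replace the whole tree
        return grow(rng, new_depth)
    child = copy.deepcopy(parent)       # leave the parent itself unchanged
    node = child
    for i in path[:-1]:
        node = node[i]
    node[path[-1]] = grow(rng, new_depth)
    return child


rng = random.Random(1)
parent = ["+", ["*", "X", "X"], ["+", "X", 1.0]]
print(mutate(parent, rng))
```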
CROSSOVER OPERATION
- Select 2 parents probabilistically based on fitness
- Randomly pick a number from 1 to NUMBER-OF-POINTS for the 1st parent
- Independently randomly pick a number for the 2nd parent
- Identify the subtrees rooted at the two picked points
- Swap the two subtrees to produce two offspring
- The result is a syntactically valid executable program
- Put the offspring into the next generation of the population
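A Python sketch of the crossover steps, assuming trees stored as nested lists and depth-first point numbering; the names `points`, `get`, `put`, and `crossover` are hypothetical. Because each picked point roots a complete subtree and every function here accepts any numeric arguments, both offspring are syntactically valid.

```python
import copy
import random


def points(tree):
    """Index paths to every point of a tree written as nested lists [op, arg1, arg2]."""
    result = [()]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            result += [(i,) + path for path in points(child)]
    return result


def get(tree, path):
    """Subtree rooted at the given path."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, new):
    """Copy of tree with the subtree at path replaced by a copy of new."""
    if not path:
        return copy.deepcopy(new)
    tree = copy.deepcopy(tree)
    get(tree, path[:-1])[path[-1]] = copy.deepcopy(new)
    return tree


def crossover(mom, dad, rng):
    """Pick a point in each parent independently and swap the subtrees rooted there."""
    p1 = rng.choice(points(mom))
    p2 = rng.choice(points(dad))
    return put(mom, p1, get(dad, p2)), put(dad, p2, get(mom, p1))


rng = random.Random(5)
mom = ["+", ["*", "X", "X"], 1.0]
dad = ["-", "X", ["%", "X", 2.0]]
print(crossover(mom, dad, rng))
```

Swapping subtrees conserves the total number of points across the two offspring, which makes a handy sanity check.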
REPRODUCTION OPERATION
- Select parent probabilistically based on fitness
- Copy it (unchanged) into the next generation of
the population
PROBABILISTIC STEPS
- The initial population is typically random
- Probabilistic selection based on fitness
  - Best is not always picked
  - Worst is not necessarily excluded
- Random picking of mutation and crossover points
ILLUSTRATIVE GP RUN
SYMBOLIC REGRESSION
Independent variable X Dependent variable Y
-1.00 1.00
-0.80 0.84
-0.60 0.76
-0.40 0.76
-0.20 0.84
0.00 1.00
0.20 1.24
0.40 1.56
0.60 1.96
0.80 2.44
1.00 3.00
5 MAJOR PREPARATORY STEPS OF GP
- Determining the set of terminals
- Determining the set of functions
- Determining the fitness measure
- Determining the parameters for the run
- Determining the criterion for terminating a run
PREPARATORY STEPS
Objective: Find a computer program with one input (independent variable X) whose output equals the given data
1. Terminal set: T = {X, Random-Constants}
2. Function set: F = {+, -, *, %}
3. Fitness: The sum of the absolute values of the differences between the candidate program's output and the given data (computed over numerous values of the independent variable X from -1.0 to +1.0)
4. Parameters: Population size M = 4
5. Termination: An individual emerges whose sum of absolute errors is less than 0.1
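The fitness measure and termination criterion in this tableau can be sketched in Python. As simplifying assumptions, candidate programs are ordinary Python functions of X (an evolved program tree would be executed the same way), and the "numerous values" of X are the 11 tabulated points from -1.0 to +1.0.

```python
# Fitness cases (X, Y) taken from the symbolic-regression table.
DATA = [(-1.0, 1.00), (-0.8, 0.84), (-0.6, 0.76), (-0.4, 0.76), (-0.2, 0.84),
        (0.0, 1.00), (0.2, 1.24), (0.4, 1.56), (0.6, 1.96), (0.8, 2.44), (1.0, 3.00)]


def sum_abs_error(program):
    """Fitness (step 3): sum over the fitness cases of |program(X) - Y|; 0 is perfect."""
    return sum(abs(program(x) - y) for x, y in DATA)


def terminated(program, threshold=0.1):
    """Termination criterion (step 5): sum of absolute errors below 0.1."""
    return sum_abs_error(program) < threshold


print(sum_abs_error(lambda x: x + 1))         # off by x**2 at every fitness case
print(terminated(lambda x: x * x + x + 1))    # the solution reported for this run
```

The candidate X + 1 misses every point by X², for a total error of about 4.4, while X² + X + 1 reproduces the table and satisfies the termination criterion.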
SYMBOLIC REGRESSION
- POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0
SYMBOLIC REGRESSION: x² + x + 1
CLASSIFICATION
GP TABLEAU: INTERTWINED SPIRALS
Objective: Create a program to classify a given point in the x-y plane as belonging to the red or the blue spiral
1. Terminal set: T = {X, Y, Random-Constants}
2. Function set: F = {+, -, *, %, IFLTE}
3. Fitness: Accuracy of classification (0 to 194 points correctly classified)
4. Parameters: M = 10,000, G = 51
5. Termination: A program is 100% accurate
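The fitness measure in this tableau counts correctly classified points. The sketch below generates the 194 fitness cases with the formulation commonly used for the intertwined-spirals benchmark (97 points per spiral) and reads a program's output with a sign wrapper; both the point formula and the wrapper are assumptions, and programs are plain Python functions of x and y.

```python
import math


def spiral_points():
    """194 fitness cases (x, y, label): 97 points on each of two intertwined spirals."""
    cases = []
    for i in range(97):
        angle = i * math.pi / 16.0
        radius = 6.5 * (104 - i) / 104.0
        x, y = radius * math.sin(angle), radius * math.cos(angle)
        cases.append((x, y, +1))      # first spiral
        cases.append((-x, -y, -1))    # second spiral: the first rotated by 180 degrees
    return cases


def accuracy(program, cases):
    """Fitness: number of points classified correctly (0 to 194).

    Wrapper (an assumption): an output >= 0 means the first spiral, < 0 the second.
    """
    return sum(1 for x, y, label in cases if (1 if program(x, y) >= 0 else -1) == label)


cases = spiral_points()
print(accuracy(lambda x, y: 1.0, cases))  # a constant program gets half the points right
```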
BOX MOVER: BEST OF GEN 0
BOX MOVER: GEN 45, FITNESS CASE 1
GENETIC PROGRAMMING: ON THE PROGRAMMING OF COMPUTERS BY MEANS OF NATURAL SELECTION (Koza 1992)
2 MAIN POINTS FROM 1992 BOOK
- Virtually all problems in artificial intelligence, machine learning, adaptive systems, and automated learning can be recast as a search for a computer program.
- Genetic programming provides a way to successfully conduct the search for a computer program in the space of computer programs.
COMPUTER PROGRAMS
- Subroutines provide one way to REUSE code, possibly with different instantiations of the dummy variables (formal parameters)
- Loops (and iterations) provide a 2nd way to REUSE code
- Recursion provides a 3rd way to REUSE code
- Memory provides a 4th way to REUSE the results of executing code
DIFFERENCE IN VOLUMES
EVOLVED SOLUTION
    (- (* (* W0 L0) H0)
       (* (* W1 L1) H1))
AUTOMATICALLY DEFINED FUNCTION volume
    (progn
      (defun volume (arg0 arg1 arg2)
        (values
          (* arg0 (* arg1 arg2))))
      (values
        (- (volume L0 W0 H0)
           (volume L1 W1 H1))))
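The reuse provided by the `volume` ADF can be mimicked with a tiny evaluator in which a call to an ADF binds its dummy arguments (arg0, arg1, arg2) to the values supplied by the caller. This is a hypothetical Python sketch that uses tuples in place of S-expressions; the box dimensions are made-up example values.

```python
import operator

PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, env, adfs):
    """Evaluate a prefix expression; a call to an ADF binds its dummy arguments."""
    if isinstance(expr, str):
        return env[expr]                          # variable or dummy argument
    if isinstance(expr, (int, float)):
        return expr
    name, *args = expr
    values = [evaluate(arg, env, adfs) for arg in args]
    if name in adfs:                              # automatically defined function
        params, body = adfs[name]
        return evaluate(body, dict(zip(params, values)), adfs)
    return PRIMITIVES[name](*values)


# The evolved program: ADF "volume" is defined once and reused twice.
ADFS = {"volume": (("arg0", "arg1", "arg2"), ("*", "arg0", ("*", "arg1", "arg2")))}
MAIN = ("-", ("volume", "L0", "W0", "H0"), ("volume", "L1", "W1", "H1"))

box_sizes = {"L0": 2, "W0": 3, "H0": 4, "L1": 1, "W1": 5, "H1": 2}
print(evaluate(MAIN, box_sizes, ADFS))  # 24 - 10 = 14
```

The ADF body sees only its own dummy arguments, which is what lets the same code be reused with different instantiations.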
AUTOMATICALLY DEFINED FUNCTIONS (SUBROUTINES)
- ADFs provide a way to REUSE code
- Code is typically reused with different
instantiations of the dummy variables (formal
parameters)
DIVIDE AND CONQUER
- Decompose a problem into sub-problems
- Solve the sub-problems
- Assemble the solutions of the sub-problems into a
solution for the overall problem
CHANGE OF REPRESENTATION
- Identify regularities
- Change the representation
- Solve the overall problem
GENETIC PROGRAMMING II: AUTOMATIC DISCOVERY OF REUSABLE PROGRAMS (Koza 1994)
MAIN POINTS OF 1994 BOOK
- Scalability is essential for solving non-trivial problems in artificial intelligence, machine learning, adaptive systems, and automated learning
- Scalability can be achieved by reuse
- Genetic programming provides a way to automatically discover and reuse subprograms in the course of automatically creating computer programs to solve problems
MEMORY
- Settable (named) variables
- Indexed vector memory
- Matrix memory
- Relational memory
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
    (progn
      (defun ADF0 ()
        (ORN (ORN (ORN (I?) (H?)) (ORN (P?) (G?))) (ORN (ORN (ORN (Y?) (N?)) (ORN (T?) (Q?))) (ORN (A?) (H?))))))
      (defun ADF1 ()
        (values (ORN (ORN (ORN (A?) (I?)) (ORN (L?) (W?))) (ORN (ORN (T?) (L?)) (ORN (T?) (W?))))))
      (defun ADF2 ()
        (values (ORN (ORN (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (T?) (W?)) (ORN (Q?) (D?)))) (ORN (K?) (P?)))) (ORN (K?) (P?))) (ORN (T?) (W?))) (ORN (ORN (E?) (A?)) (ORN (N?) (R?))))))
      (progn (loop-over-residues (SETM0 ( (- (ADF1) (ADF2)) (SETM3 M0))))
        (values ( ( M3 M0) ( ( ( (- L -0.53) ( M0 M0)) ( ( ( M3 M0) ( ( M0 M3) ( M1 M2))) M2)) ( M3 M0))))))
ADL (AUTOMATICALLY DEFINED LOOP)
ADR (AUTOMATICALLY DEFINED RECURSION)
HUMAN-COMPETITIVE RESULTS (NOT RELATED TO PATENTS)
- Transmembrane segment identification problem for proteins
- Motifs for the DEAD box family and the manganese superoxide dismutase family of proteins
- Cellular automata rule for the Gacs-Kurdyumov-Levin (GKL) problem
- Quantum algorithm for the Deutsch-Jozsa early promise problem
- Quantum algorithm for Grover's database search problem
- Quantum algorithm for the depth-two AND/OR query problem
- Quantum algorithm for the depth-one OR query problem
- Protocol for communicating information through a quantum gate
- Quantum dense coding
- Soccer-playing program that won its first two games in the 1997 RoboCup competition
- Soccer-playing program that ranked in the middle of the field in the 1998 RoboCup competition
- Antenna designed by NASA for use on spacecraft
- Sallen-Key filter
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND SIZING OF ANALOG ELECTRICAL CIRCUITS
COMPONENT-CREATING FUNCTIONS
- Resistor R function
- Capacitor C function
- Inductor L function
- Diode D function
- Transistor Q function (3-leaded)
TOPOLOGY-MODIFYING FUNCTIONS
- SERIES division
- PARALLEL division
- VIA
- FLIP
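The effect of the topology-modifying functions can be illustrated on a toy netlist. This is a heavily simplified, hypothetical Python sketch, not the exact semantics from Koza's work (which also involve writing heads, modifiable wires, and construction-continuing subtrees): SERIES joins a component and a copy through a new node, PARALLEL places a copy across the same two nodes, and FLIP reverses a component's polarity. VIA is omitted, and the placeholder component kind "Z" and the node numbering are assumptions.

```python
import itertools


class Circuit:
    """Toy netlist of two-leaded components: name -> (kind, lead_a_node, lead_b_node)."""

    def __init__(self):
        self.parts = {}
        self._serial = itertools.count()
        self._new_node = itertools.count(100)   # hypothetical numbering for new nodes

    def add(self, kind, a, b):
        name = f"{kind}{next(self._serial)}"
        self.parts[name] = (kind, a, b)
        return name

    def series(self, name):
        """Simplified SERIES division: the component and a copy, joined at a new node."""
        kind, a, b = self.parts[name]
        mid = next(self._new_node)
        self.parts[name] = (kind, a, mid)
        return name, self.add(kind, mid, b)

    def parallel(self, name):
        """Simplified PARALLEL division: a copy connected across the same two nodes."""
        kind, a, b = self.parts[name]
        return name, self.add(kind, a, b)

    def flip(self, name):
        """FLIP: reverse the polarity (lead order) of a component."""
        kind, a, b = self.parts[name]
        self.parts[name] = (kind, b, a)


circuit = Circuit()
z = circuit.add("Z", 1, 2)            # a placeholder component between nodes 1 and 2
first, second = circuit.series(z)     # now Z0 spans nodes 1-100 and Z1 spans 100-2
circuit.parallel(second)              # Z2 is added across nodes 100-2
print(circuit.parts)
```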
DEVELOPMENT-CONTROLLING FUNCTIONS
- END function
- NOP (No Operation) function
- SAFE_CUT function
DEVELOPMENTAL GP
    (LIST (C (+ 0.963 (+ (+ -0.875 -0.113) 0.880))
             (series (flip end)
                     (series (flip end) (L -0.277 end) end)
                     (L (+ -0.640 0.749) (L -0.123 end))))
          (flip (nop (L -0.657 end))))
CAPACITOR-CREATING FUNCTION
SERIES DIVISION FUNCTION
EVALUATION OF FITNESS
DESIRED BEHAVIOR OF A LOWPASS FILTER
EVOLVED CAMPBELL FILTER
- U.S. patent 1,227,113
- George Campbell
- American Telephone and Telegraph
- 1917
EVOLVED ZOBEL FILTER
- U.S. patent 1,538,964
- Otto Zobel
- American Telephone and Telegraph Company
- 1925
EVOLVED SALLEN-KEY FILTER
EVOLVED DARLINGTON EMITTER-FOLLOWER SECTION
- U.S. patent 2,663,806
- Sidney Darlington
- Bell Telephone Laboratories
- 1953
NEGATIVE FEEDBACK
HAROLD BLACK'S RIDE ON THE LACKAWANNA FERRY
Courtesy of Lucent Technologies
20th-CENTURY PATENTS
- Campbell ladder topology for filters
- Zobel M-derived half section and constant K filter sections
- Crossover filter
- Negative feedback
- Cauer (elliptic) topology for filters
- PID and PID-D2 controllers
- Darlington emitter-follower section and voltage gain stage
- Sorting network for seven items using only 16 steps
- 60 and 96 decibel amplifiers
- Analog computational circuits
- Real-time analog circuit for time-optimal robot control
- Electronic thermometer
- Voltage reference circuit
- Philbrick circuit
- NAND circuit
- Simultaneous synthesis of topology, sizing, placement, and routing
SIX POST-2000 PATENTED INVENTIONS
EVOLVED HIGH CURRENT LOAD CIRCUIT
REGISTER-CONTROLLED CAPACITOR CIRCUIT
LOW-VOLTAGE CUBIC CIRCUIT
VOLTAGE-CURRENT-CONVERSION CIRCUIT
LOW-VOLTAGE BALUN CIRCUIT
TUNABLE INTEGRATED ACTIVE FILTER
21st-CENTURY PATENTED INVENTIONS
- Low-voltage balun circuit
- Mixed analog-digital variable capacitor circuit
- High-current load circuit
- Voltage-current conversion circuit
- Cubic function generator
- Tunable integrated active filter
NOVELTY-DRIVEN EVOLUTION
- Two factors in the fitness measure:
  - The circuit's behavior
  - The largest number of nodes and edges (circuit components) of a subgraph of the given circuit that is isomorphic to a subgraph of a template representing the prior art
PRIOR ART TEMPLATE
NON-INFRINGING SOLUTION NO. 1
NON-INFRINGING SOLUTION NO. 5
GP AS AN INVENTION MACHINE
AUTOMATIC SYNTHESIS OF CIRCUIT LAYOUT, INCLUDING THE PLACEMENT OF COMPONENTS AND ROUTING OF WIRES, ALONG WITH THE TOPOLOGY AND SIZING
CIRCUIT LAYOUT
- Circuit placement involves the assignment of each of the circuit's components to a particular physical location on a printed circuit board or silicon wafer.
- Routing involves the assignment of a particular physical location to the wires between the leads of the circuit's components.
LAYOUT
LAYOUT: GENERATION 0
100%-COMPLIANT LOWPASS FILTER: GENERATION 25, WITH 5 CAPACITORS AND 11 INDUCTORS, AREA OF 1775.2
100%-COMPLIANT LOWPASS FILTER: BEST-OF-RUN CIRCUIT OF GENERATION 138, WITH 4 INDUCTORS AND 4 CAPACITORS, AREA OF 359.4
REVERSE ENGINEERING OF METABOLIC PATHWAYS
USING A TURTLE TO DRAW A TWO-DIMENSIONAL ANTENNA
BEST-OF-RUN ANTENNA FROM GENERATION 90
3-DIMENSIONAL ANTENNA
EVOLVED SORTING NETWORK
GENETIC NETWORK FOR lac operon
EVOLVED NETWORK
    (IF (< LACTOSE_LEVEL 9.139 ) (IF (<
    REPRESSOR_LEVEL 6.270 ) (IF (> GLUCOSE_LEVEL
    5.491 ) 2.02 (IF (< CAP_LEVEL 0.639 ) 2.033 (IF
    (< CAP_LEVEL 4.858 ) (IF (> LACTOSE_LEVEL 2.511 )
    (IF (> CAP_LEVEL 7.807 ) 5.586 (IF (>
    LACTOSE_LEVEL 2.114 ) 1.978 2.137 ) ) 0.0 ) (IF
    (> REPRESSOR_LEVEL 4.015 ) 0.036 (IF (<
    GLUCOSE_LEVEL 5.128 ) 10.0 (IF (< REPRESSOR_LEVEL
    4.268 ) 2.022 9.122 ) ) ) ) ) ) (IF (> CAP_LEVEL
    0.842 ) 0.0 5.97 ) ) (IF (< CAP_LEVEL 1.769 )
    2.022 (IF (< GLUCOSE_LEVEL 2.382 ) (IF (>
    LACTOSE_LEVEL 1.256 ) (IF (> LACTOSE_LEVEL 1.933
    ) (IF (> GLUCOSE_LEVEL 2.022 ) (IF (<
    GLUCOSE_LEVEL 5.183 ) 6.323 (IF (> CAP_LEVEL
    1.208 ) 9.713 0.842 ) ) 10.0 ) (IF (>
    GLUCOSE_LEVEL 6.270 ) 2.109 ) 1.965 ) ) 0.665 )
    1.982 ) ) )
OTHER STRUCTURES
GENETIC PROGRAMMING III: DARWINIAN INVENTION AND PROBLEM SOLVING (Koza, Bennett, Andre, Keane 1999)
SUBROUTINE DUPLICATION
SUBROUTINE CREATION
SUBROUTINE DELETION
ARGUMENT DUPLICATION
ARGUMENT DELETION
16 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY CREATING COMPUTER PROGRAMS
- Starts with "What needs to be done"
- Tells us "How to do it"
- Produces a computer program
- Automatic determination of program size
- Code reuse
- Parameterized reuse
- Internal storage
- Iterations, loops, and recursions
- Self-organization of hierarchies
- Automatic determination of program architecture
- Wide range of programming constructs
- Well-defined
- Problem-independent
- Wide applicability
- Scalable
- Competitive with human-produced results
GENETIC PROGRAMMING PROBLEM SOLVER (GPPS)
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND TUNING OF CONTROLLERS
PARAMETERIZED TOPOLOGY FOR GENERAL-PURPOSE CONTROLLER
EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
PATENTABLE NEW INVENTIONS
- PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules
- General-purpose controllers that outperform the Ziegler-Nichols and Åström-Hägglund rules
PARALLELIZATION WITH SEMI-ISOLATED SUBPOPULATIONS
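Semi-isolated subpopulations (demes) evolve independently and exchange a few individuals from time to time. The Python sketch below shows one migration step, assuming demes arranged in a ring, lower fitness values being better, and emigrants replacing the worst individuals of the receiving deme; the ring topology, the migration count `k`, and the function name are illustrative assumptions (a two-dimensional toroidal arrangement is another common layout).

```python
import copy


def migrate(demes, fitness, k=2):
    """One migration step between semi-isolated subpopulations arranged in a ring.

    Each deme sends copies of its k best individuals (lowest fitness value) to the
    next deme, where they replace that deme's k worst individuals.
    """
    emigrants = [sorted(deme, key=fitness)[:k] for deme in demes]
    for i, deme in enumerate(demes):
        deme.sort(key=fitness)
        # emigrants[i - 1] is the previous deme's elite; for i == 0 it wraps to the last deme
        deme[len(deme) - k:] = [copy.deepcopy(ind) for ind in emigrants[i - 1]]
    return demes


demes = [[5, 1, 9, 7], [4, 8, 6, 2], [3, 10, 0, 11]]
print(migrate(demes, fitness=lambda v: v, k=1))  # [[1, 5, 7, 0], [2, 4, 6, 1], [0, 3, 10, 2]]
```

Between migration steps each deme would run its own generational loop, which is what makes the scheme easy to spread across many processors.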
GP PARALLELIZATION
- Like Hormel, Get Everything Out of the Pig, Including the Oink
- Keep on Trucking
- It Takes a Licking and Keeps on Ticking
- The Whole is Greater than the Sum of the Parts
PETA-OPS
- The human brain has about 10^12 neurons operating at about 10^3 operations per second, i.e., about 10^15 ops per second
- 10^15 ops = 1 peta-op = 1 bs (brain second)
GP 1987-2002
System | Dates | Speed-up over first system | Human-competitive results | Problem category
Serial LISP | 1987-1994 | 1 (base) | 0 | Toy problems
64 transputers | 1994-1997 | 9 | 2 | Human-competitive results not related to patented inventions
64 PowerPCs | 1995-2000 | 204 | 12 | 20th-century patented inventions
70 Alphas | 1999-2001 | 1,481 | 2 | 20th-century patented inventions
1,000 Pentium IIs | 2000-2002 | 13,900 | 12 | 21st-century patented inventions
4-week runs on 1,000 Pentium IIs | 2002-2003 | 130,000 | 2 | Patentable new inventions
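Main Point No. 3 speaks of approximately order-of-magnitude increases in computer time. The speed-up column of this table can be checked with a few lines of Python that compute the ratio between consecutive systems; the numbers are copied from the table and the function name is illustrative.

```python
import math

# Speed-up over the first system, copied from the table.
SPEEDUPS = [
    ("Serial LISP", 1),
    ("64 transputers", 9),
    ("64 PowerPCs", 204),
    ("70 Alphas", 1481),
    ("1,000 Pentium IIs", 13900),
    ("4-week runs on 1,000 Pentium IIs", 130000),
]


def successive_ratios(rows):
    """Ratio of each system's speed-up to that of the system before it."""
    return [(name, cur / prev) for (_, prev), (name, cur) in zip(rows, rows[1:])]


for name, ratio in successive_ratios(SPEEDUPS):
    print(f"{name:>34}: x{ratio:6.1f} (log10 = {math.log10(ratio):.2f})")
print(f"total: 10^{math.log10(SPEEDUPS[-1][1]):.2f}")
```

The five successive ratios come out at about 9, 23, 7, 9 and 9, and the total speed-up of 130,000 is about 10^5.1.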
PROMISING GP APPLICATION AREAS
- Problem areas involving many variables that are interrelated in highly non-linear ways
- Inter-relationship of variables is not well understood
- Discovery of the size and shape of the solution is a major part of the problem
PROMISING GP APPLICATION AREAS (CONTINUED)
- "Black art" problems
- Areas where you simply have no idea how to
program a solution, but where you know what you
want
PROMISING GP APPLICATION AREAS (CONTINUED)
- Problem areas where a good approximate solution is satisfactory
  - design
  - control
  - bioinformatics
  - classification
  - data mining
  - system identification
  - forecasting
PROMISING GP APPLICATION AREAS (CONTINUED)
- Areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data
  - genome, protein, microarray data
  - satellite image data
  - astronomical data
  - petroleum databases
  - financial databases
  - medical records
  - marketing databases
PROMISING GP APPLICATION AREAS (CONTINUED)
- Areas for which humans find it very difficult to write good programs
  - parallel computers
  - cellular automata
  - multi-agent strategies
  - field-programmable gate arrays
  - digital signal processors
  - swarm intelligence
DIFFERENCES BETWEEN GP AND ARTIFICIAL INTELLIGENCE (AI) AND MACHINE LEARNING (ML) APPROACHES
REPRESENTATION
- Genetic programming overtly conducts its search for a solution to the given problem in program space
POINT-TO-POINT TRANSFORMATIONS IN THE SEARCH
- Genetic programming does not conduct its search by transforming a single point in the search space into another single point, but instead transforms a set of points into another set of points
HILL CLIMBING IN THE SEARCH
- Genetic programming does not rely exclusively on greedy hill climbing to conduct its search, but instead allocates a certain number of trials, in a principled way, to choices that are known to be inferior
DETERMINISM IN THE SEARCH
- Genetic programming conducts its search probabilistically
EXPLICIT KNOWLEDGE BASE
- Genetic programming does NOT make use of a knowledge base
ROLE OF FORMAL LOGIC IN THE SEARCH
- Genetic programming does not utilize formal logic in its search strategy. Contradictory alternatives are created and actively maintained.
SOURCE
- Genetic programming is biologically inspired.
RESULTS
- Genetic programming now routinely delivers high-return human-competitive machine intelligence