Advanced Artificial Intelligence Lecture 18: Genetic Programming

1
Advanced Artificial Intelligence Lecture 18
Genetic Programming
  • Bob McKay
  • School of Computer Science and Engineering
  • College of Engineering
  • Seoul National University

2
Outline
  • Genetic Programming
  • Introduction
  • Applications
  • Example
  • Representation
  • EDAs in GP

3
Evolutionary Computation: Underlying Idea
  • If Darwinian evolution can create solutions to
    complex problems of survival in the natural
    world…
  • …why not apply it to creating solutions to
    problems of interest to us?

4
Evolutionary Computation: Generic Generational
Algorithm
  • Generate the initial population P(0) stochastically
    REPEAT
      Evaluate the fitness of each individual in P(t)
      Select parents from P(t) based on their fitness
      Use stochastic variation operators to get P(t+1)
    UNTIL termination conditions are satisfied
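A minimal sketch of this loop in Python; init, fitness, select and vary are placeholders for the problem-specific components, not part of any particular GP system:

    import random

    def evolve(init, fitness, select, vary, pop_size, max_gens):
        # Generate the initial population P(0) stochastically
        pop = [init() for _ in range(pop_size)]
        for t in range(max_gens):
            # Evaluate fitness and select parents from P(t)
            parents = select(pop, fitness)
            # Stochastic variation produces P(t+1)
            pop = [vary(random.choice(parents), random.choice(parents))
                   for _ in range(pop_size)]
        return max(pop, key=fitness)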

5
Evolutionary Computation Variants
  • Evolution Strategies
  • Evolutionary Programming
  • Genetic Algorithms
  • Classifier Systems
  • Genetic Programming

6
Tree Based Genetic Programming
  • Original Idea
  • Evolve populations of trees representing problem
    solutions
  • Cramer (1985), Schmidhuber (1987), Koza (1992)
  • Closure assumption: any function can apply to any
    argument

7
GP Initialisation
  • Typical ramped half-and-half initialisation (see
    the sketch after this list)
  • Ramped
  • Choose a lower and upper bound for tree depth
  • Generate trees with maximum depths distributed
    uniformly between these bounds
  • Half and half
  • 50% full trees
  • At the depth bound, nodes chosen uniformly randomly
    from constant symbols
  • Elsewhere, nodes chosen randomly from the
    function symbols
  • 50% grow trees
  • At the depth bound, nodes chosen randomly from
    constant symbols
  • Elsewhere, nodes chosen randomly from all symbols
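A sketch of ramped half-and-half over nested-list trees. The symbols are the Boolean 6-multiplexer set used later in this lecture; the arities and depth defaults are assumptions of the sketch:

    import random

    FUNCTIONS = {'and': 2, 'or': 2, 'not': 1}          # symbol -> arity
    TERMINALS = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']

    def gen_tree(depth, full):
        # At the depth bound only terminals; elsewhere 'full' trees use
        # functions only, while 'grow' trees choose from all symbols.
        n_term, n_func = len(TERMINALS), len(FUNCTIONS)
        if depth == 0 or (not full and
                          random.random() < n_term / (n_term + n_func)):
            return random.choice(TERMINALS)
        f = random.choice(list(FUNCTIONS))
        return [f] + [gen_tree(depth - 1, full) for _ in range(FUNCTIONS[f])]

    def ramped_half_and_half(pop_size, min_depth=2, max_depth=6):
        pop = []
        for i in range(pop_size):
            depth = random.randint(min_depth, max_depth)    # ramped
            pop.append(gen_tree(depth, full=(i % 2 == 0)))  # half and half
        return pop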

8
GP Selection
  • Truncation selection
  • Select the best k of the population
  • Generally too eager
  • Fitness proportionate selection
  • Probability of selection proportionate to fitness
  • Tournament selection
  • Choose k individuals uniformly randomly
  • Select the best of those individuals
  • Eagerness tunable by k
  • Larger k gives a more eager algorithm
  • The most commonly used today (see the sketch below)
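Tournament selection is nearly a one-liner; a possible Python sketch:

    import random

    def tournament_select(pop, fitness, k=3):
        # Choose k individuals uniformly at random, keep the best;
        # larger k makes the search more eager.
        return max(random.sample(pop, k), key=fitness)

Selecting a full parent pool is then just [tournament_select(pop, fitness, k) for _ in range(len(pop))].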

9
Stochastic Variation Operator Mutation
  • Randomly choose a node in the parent tree
  • Delete the sub-tree below that node
  • Generate a new random sub-tree

10
Stochastic Variation Operator Crossover
  • Randomly choose a node in each parent tree
  • Exchange the sub-trees rooted at those points
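Both operators reduce to replacing the subtree at a randomly chosen node. A sketch over the nested-list trees used above; paths, get and put are hypothetical helpers of this sketch:

    import copy, random

    def paths(tree, prefix=()):
        # Every node position: () is the root, (1,) its first child, ...
        yield prefix
        if isinstance(tree, list):
            for i, child in enumerate(tree[1:], start=1):
                yield from paths(child, prefix + (i,))

    def get(tree, path):
        for i in path:
            tree = tree[i]
        return tree

    def put(tree, path, sub):
        # Return a copy of tree with the subtree at path replaced by sub
        if not path:
            return sub
        tree = copy.deepcopy(tree)
        node = tree
        for i in path[:-1]:
            node = node[i]
        node[path[-1]] = sub
        return tree

    def mutate(parent, grow):          # grow: () -> new random subtree
        return put(parent, random.choice(list(paths(parent))), grow())

    def crossover(p1, p2):
        a = random.choice(list(paths(p1)))
        b = random.choice(list(paths(p2)))
        return put(p1, a, get(p2, b)), put(p2, b, get(p1, a))

For mutation, grow can be the initialisation sketch above, e.g. grow=lambda: gen_tree(3, full=False).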

11
Grammar-Based Representations
  • The chromosome is a derivation tree in a
    predefined grammar
  • S → B
  • B → B or B
  • B → B and B
  • B → not B
  • B → if B B B
  • B → a0 | a1
  • B → d0 | d1 | d2 | d3

12
Graph-Based Representation
  • PADO (Teller & Veloso, 1995)
  • Graph represents execution sequence
  • Permits parallelism
  • Wide range of variants

13
Logic-Based Representation
  • Representation is a subset of Prolog
  • A number of implementations subsequent to
    Giordana's REGAL (1993)

14
Linear Chromosomes
  • Wide variety of approaches
  • Machine-code Genetic Programming
  • The genotype is a machine-code program
  • Stack-based Genetic Programming
  • The genotype is a program in a stack-based
    language
  • Somewhat Forth-like
  • Grammar-based (Grammatical Evolution)
  • The genotype is a linear representation of a
    grammar derivation tree

15
Developmental GP
  • The chromosome is a program for generating the
    phenotype to be evaluated
  • Cellular developmental systems
  • The program specifies rules for iteratively
    rewriting the graph representing the phenotype
  • Best known example: the phenotype is a circuit
    diagram
  • L-systems
  • The genotype is an L-system
  • The phenotype is the generated tree

16
Turing Completeness and Genetic Programming
  • Generally Turing Complete
  • Stack-based GP
  • Machine coded GP
  • Graph-based GP
  • Logic-based GP
  • Generally Turing Incomplete
  • Tree-based GP
  • Grammar-based GP
  • Grammatical Evolution
  • Developmental GP
  • Does Turing completeness matter?
  • The overwhelming majority of applications don't
    use Turing completeness
  • The primary focus of this tutorial will be on
    Turing-incomplete search spaces

17
Electronic Design
  • Koza et al.: Zobel filter

18
Quantum Algorithms
  • Barnum, Bernstein and Spector: Depth-One OR Query

19
Control System Parameters
  • Koza et al.
  • Parameter Equations for Proportional Integral
    Derivative (PID) Controller

20
Bioinformatics
  • Wide variety of applications
  • Well known: motif detection for gene families
  • D-E-A-D
  • Manganese superoxide dismutase
  • Koza et al. 1999

21
Medical Data Mining
22
Antenna Design
  • Lohn et al (2003)
  • Design of wire antenna for NASA spacecraft

23
Chemical Dynamics Modelling
  • Evolving systems of differential equations
  • Predicting discharge behaviour of a battery
  • Cao et al. 2000

24
Ecological Modelling
25
Some GP Implementations
  • C/C++ based GP systems
  • http://garage.cps.msu.edu/software/software-index.html
  • http://beagle.gel.ulaval.ca/index.html
  • Java-based
  • http://cs.gmu.edu/eclab/projects/ecj/
  • PushGP
  • http://hampshire.edu/lspector/push.html
  • Grammatical Evolution
  • http://www.grammatical-evolution.org/src.html
  • DCTG-GP
  • http://sandcastle.cosc.brocku.ca/bross/research/
  • TAG3P
  • http://www.cs.adfa.edu.au/z3013620/we/hoai.htm

26
Using GP Typical Steps
  • Choose your favourite GP system
  • Define the Chromosome
  • Write code to implement the fitness function
  • Set values for evolutionary parameters
  • Population size
  • Stopping criteria
  • Minimum and maximum genotype sizes
  • Tournament size
  • etc

27
Simple Example 6 Multiplexer
  • Boolean Circuit
  • Six inputs
  • Two address lines
  • Four data lines
  • One output

28
The 6 Multiplexer Problem
  • From the 64 input-output pairs
  • Learn an appropriate program

29
6 Multiplexer in DCTG-GP: Grammar and Semantics

    bool ::= terminal^^T
        <:> (value(Input,V) :- T^^value(Input,V)).
    bool ::= [and], bool^^B1, bool^^B2
        <:> (value(Input,V) :-
                B1^^value(Input,V1), B2^^value(Input,V2),
                ((V1 = 0 ; V2 = 0) -> V = 0 ; V = 1)).
    bool ::= [or], bool^^B1, bool^^B2
        <:> (value(Input,V) :-
                B1^^value(Input,V1), B2^^value(Input,V2),
                ((V1 = 1 ; V2 = 1) -> V = 1 ; V = 0)).
    bool ::= [not], bool^^B1
        <:> (value(Input,V) :- B1^^value(Input,V1), V is 1 - V1).

    terminal ::= [a0] <:> (value(A0,_,_,_,_,_,A0) :- true).
    terminal ::= [a1] <:> (value(_,A1,_,_,_,_,A1) :- true).
    terminal ::= [d0] <:> (value(_,_,D0,_,_,_,D0) :- true).
    terminal ::= [d1] <:> (value(_,_,_,D1,_,_,D1) :- true).
    terminal ::= [d2] <:> (value(_,_,_,_,D2,_,D2) :- true).
    terminal ::= [d3] <:> (value(_,_,_,_,_,D3,D3) :- true).
30
6 Multiplexer in DCTG-GP: Fitness and Parameters
  • Fitness is the proportion of the 64 instances
    correctly predicted
  • max_depth(8).
  • The maximum acceptable depth of the derivation
    trees
  • population_size(150).
  • The number of individuals
  • generations(500).
  • How long to run for
  • prob_crossover(0.9).
  • Crossover rate
  • prob_mutation(0.1).
  • Mutation rate
  • tournament_size(3).
  • Determines the eagerness of the search
  • Etc.
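The fitness function itself is a few lines of Python. The address convention here (a0 = a1 = 0 selects d0, and so on) is one common wiring and an assumption of this sketch:

    from itertools import product

    def fitness(program):
        # Proportion of the 64 input-output pairs predicted correctly
        correct = 0
        for a0, a1, d0, d1, d2, d3 in product([0, 1], repeat=6):
            target = (d0, d1, d2, d3)[2 * a1 + a0]
            if program(a0, a1, d0, d1, d2, d3) == target:
                correct += 1
        return correct / 64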

31
Representation in GP
  • Representation is a key issue in intelligent
    systems
  • Emphasis on
  • Sufficiency - the representation can encode the
    class of problems
  • Effectiveness - the representation permits simple
    search
  • Also a key issue in evolutionary systems
  • How to design representations which give rise to
    smoother fitness landscapes?

32
Representation GP vs GA / ES
  • Much of our insight into GP representation comes
    from studies in Genetic Algorithms and Evolution
    Strategies
  • However these insights must be tempered by key
    differences between GP and GA / ES
    representations
  • Feasibility and Connectivity
  • Neighbourhood Complexity
  • Genotype complexity

33
Threshold Question
  • What should we view as the underlying distance
    metric in GP genotype/phenotype spaces?
  • Need a natural analogue of Manhattan distance in
    GA
  • Fine-grained enough to underlie other distance
    metrics based on search operators
  • Taking into account both
  • Content variation (as in GA)
  • Structure change

34
Edit Distance
  • We follow O'Reilly (1997) in viewing edit
    distance as a natural underlying metric
  • Edit distance is the number of operations
    required to transform one genotype into another
  • Single node insertions
  • Single node deletions
  • Single node content substitutions
  • Many variants, but generally only differ by O(1)
  • But edit distance ignores symmetries of the
    domain
  • A*B and B*A can be arbitrarily far apart
  • Depending on complexity of A and B
  • Can we design a better (perhaps domain-sensitive)
    metric?

35
Feasibility and Connectivity
  • In GA, the Manhattan distance metric passes
    through the feasible region
  • That is, for any distance δ < d(A,C), there is a
    valid genotype B with d(A,C) = d(A,B) + d(B,C) and
    d(A,B) = δ
  • In most GP representations, this is not the case
  • Deletion and insertion from a GP tree will
    usually result in an invalid tree
  • At best, wrong number of arguments for functions
  • At worst, no longer a tree
  • Feasible paths may be unboundedly longer than the
    direct (infeasible) path

36
Neighbourhood Complexity
  • For an individual A, the δ-neighbourhood is
    defined
  • Nδ(A) = { X : d(A,X) < δ }
  • Neighbourhood size: |Nδ(A)|
  • In GA/ES, |Nδ(A)| is generally independent of A
  • In GP, |Nδ(A)| varies over the search space
  • If the search space is unbounded, the
    neighbourhood size is generally monotonic in the
    size of A
  • If we impose size or depth bounds, it may be
    non-monotonic
  • Neighbourhood connectivity
  • In GA / ES, neighbourhoods are graph-connected
  • In many GP representations, neighbourhoods are
    not connected

37
Genotype Complexity
  • Virtually all GP representations offer
  • (in principle) unbounded size individuals
  • (in practice) individuals of varying complexity
  • GA / ES representation studies generally assume
    fixed complexity

38
Problem Specific Representation
  • In most areas of evolutionary computation, there
    is a strong emphasis on tailoring problem
    representations
  • There are many feasible representations for a
    problem
  • The representation is chosen to optimise
    performance
  • Suitable (problem specific) operators
  • Smooth fitness landscape
  • Redundancy and neutral paths

39
Problem Specific Representation GP
  • Most GP representations are generic
  • One basic representation fits all problems
  • Permit only the tailoring necessary to encode the
    problem
  • Suitable function set

40
Problem Specific Representation: Grammars
  • Grammar-based GP permits further tailoring
  • The same problem search space may be encoded
    multiple ways
  • Usually used to bias search toward particular
    subspaces
  • Little emphasis on transforming the fitness
    landscape or tailoring operators
  • It is unclear whether grammar tailoring can
    usefully transform the search space
  • Standard Context-Free Grammar GP introduces
    stronger feasibility constraints than standard GP
  • The search spaces may be even more disconnected

41
Problem Specific Representation: Summary
  • Problem-specific encoding in current GP systems
    is very weak
  • There is a need for more flexible representations
    permitting tailoring of the search space and
    fitness landscape
  • For the moment, the focus is on the properties of
    generic representations
  • There is still enormous potential for logic-based
    representations

42
Structural Difficulty and Connectivity
  • Daida has demonstrated that standard tree-based
    GP cannot search some regions effectively
  • GP cannot effectively search for very full or
    very narrow trees
  • Not (as Daida argues) a consequence of tree
    representation
  • With variable-arity trees (TAG3P) we are able to
    solve even with hillclimbing search
  • Rather, a consequence of poor neighbourhood
    connectivity

43
Genotype-Phenotype Mappings
  • Most GP representations do not have an explicit
    genotype-phenotype mapping
  • The genotype is the phenotype
  • Grammar-guided systems usually do have an
    explicit genotype-phenotype mapping
  • Desirable properties
  • Redundancy
  • Connectedness
  • Extension
  • Continuity

44
Redundant Genotype-Phenotype Mappings
  • Mappings in which many genotypes map to one
    phenotype
  • The pre-image of a point forms a neutral set
  • Extensively studied (in GA) by Shipman et al.
  • Identified two desirable characteristics
  • Connectedness
  • The pre-image of a point is connected
  • Hence forms a neutral path
  • Extension
  • The pre-images of points are intertwined
  • Permitting movement between neutral paths
  • Most GP Genotype-Phenotype maps appear to be
    neither connected nor extensive

45
Continuity in Genotype-Phenotype Mappings
  • Continuity
  • Neighbourhoods map to neighbourhoods
  • I.e. small genotype changes result in small
    phenotype changes
  • Sometimes known as strong causality
  • Often not a property of GP Mappings

46
Operators
  • Feasibility constraints make it difficult to
    design fine-grained operators
  • With tree-based GP, the minimum step size is
    determined by the height of the node
  • Operators applied high in the tree cause large
    steps
  • A key reason why standard GP converges top-down
  • Feasibility constraints make it difficult to
    separate structure modification and content
    modification
  • Search must optimise both at once

47
Context Free Grammars
  • Grammar represents structure of solution space
  • S → B
  • B → B or B
  • B → B and B
  • B → not B
  • B → if B B B
  • B → a0 | a1
  • B → d0 | d1 | d2 | d3

48
Grammar Guided Genetic Programming (GGGP)
  • Problem space represented by a Context Free
    Grammar G
  • Individuals are derivation trees in G
  • Crossover uses sub-tree crossover
  • But the root nodes must have the same label
  • Mutation uses sub-tree mutation
  • But the generated sub-tree must be consistent
    with the grammar
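A sketch of the label-constrained crossover, assuming derivation trees as nested lists whose head element is the node's non-terminal label, and reusing the paths, get and put helpers from the mutation/crossover sketch earlier:

    import random

    def label(tree):
        # Non-terminal label of a node, e.g. 'B' for ['B', ...]
        return tree[0] if isinstance(tree, list) else tree

    def gggp_crossover(p1, p2):
        a = random.choice(list(paths(p1)))
        # Crossover points must carry the same non-terminal label
        matches = [b for b in paths(p2)
                   if label(get(p2, b)) == label(get(p1, a))]
        if not matches:
            return p1, p2            # fail-safe: no compatible node in p2
        b = random.choice(matches)
        return put(p1, a, get(p2, b)), put(p2, b, get(p1, a))

The label check is what keeps every offspring a valid derivation tree of the grammar.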

49
GGGP Representation Properties
  • Problem specific representation
  • The same language can be represented multiple ways
  • S → B
  • B → B Op B
  • B → not B
  • B → if B B B
  • B → a0 | a1
  • B → d0 | d1 | d2 | d3
  • Op → and | or
  • It is unclear whether this can usefully improve
    the fitness landscape
  • However it is clear that poor choice of grammar
    can lead to serious search problems
  • E.g. problems with initialisation

50
GGGP Properties (2)
  • Structural Difficulty
  • Likely to be worse than tree-based GP
  • Neighbourhood connectivity very poor
  • Because of the constraints imposed by grammar
  • Genotype-Phenotype mapping
  • Redundant
  • Dis-connected
  • Extensive
  • Continuous

51
GGGP Properties (3)
  • Operators
  • Structure modification and content modification
    may be partially separated by choice of grammar
  • Content as lexicon
  • The second grammar above
  • Step size distribution similar to standard GP
  • Extremely difficult to define new
    grammar-consistent operators

52
The Grammatical Evolution Transformation
  • Inorder traversal of numbered productions
  • 1: S → B
  • 2: B → a0
  • 3: B → a1
  • 4: B → d0
  • 5: B → d1
  • 6: B → d2
  • 7: B → d3
  • 8: B → B or B
  • 9: B → B and B
  • 10: B → not B
  • 11: B → if B B B

Resulting genotype: 1 4 5 4 6 10 9 4 6 8 7
53
Grammatical Evolution
  • Problem space represented by linear strings of
    integers
  • Can apply normal GA-style operators
  • Genotype-phenotype mapping uses the GE
    transformation
  • Using modular arithmetic to guarantee feasibility
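A sketch of that mapping for the Boolean grammar of this lecture; the leftmost-non-terminal policy and wrapping follow standard GE, but the exact limits here are illustrative:

    GRAMMAR = {
        'S': [['B']],
        'B': [['a0'], ['a1'], ['d0'], ['d1'], ['d2'], ['d3'],
              ['B', 'or', 'B'], ['B', 'and', 'B'],
              ['not', 'B'], ['if', 'B', 'B', 'B']],
    }

    def ge_decode(codons, start='S', max_wraps=2):
        seq, i = [start], 0
        budget = len(codons) * (max_wraps + 1)
        while any(s in GRAMMAR for s in seq):
            if i >= budget:
                return None                      # invalid individual
            j = next(k for k, s in enumerate(seq) if s in GRAMMAR)
            rules = GRAMMAR[seq[j]]
            # Modular arithmetic guarantees a feasible production choice
            rule = rules[codons[i % len(codons)] % len(rules)]
            seq = seq[:j] + rule + seq[j + 1:]
            i += 1
        return ' '.join(seq)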

54
GE Representation Properties
  • Problem specific representation
  • Essentially the same issues as GGGP
  • Genotype-Phenotype mapping
  • Redundant
  • Dis-connected
  • Non-extensive
  • Highly dis-continuous
  • Genotype-Phenotype mapping is context-dependent
  • The same genotypic component generates different
    phenotypic components depending on context

55
GE Properties (2)
  • Structural Difficulty
  • Unlikely to be better than GGGP?
  • Neighbourhood connectivity very poor
  • Because of the constraints imposed by the grammar
  • Operators
  • Difficult to separate structure and content
    modification
  • Because of the discontinuity of the
    genotype-phenotype mapping
  • Genotypic step size readily controllable
  • Corresponds to highly discontinuous phenotypic
    step size
  • Simple to define new grammar-consistent operators

56
  • Introduction
  • Problems and Issues
  • Representation
  • GP vs GA/ES
  • Issues in GP Representation
  • Examples
  • GGGP Representation
  • GE Representation
  • TAG Representation
  • Tree Adjunct Grammars (TAGs)
  • TAG for GP
  • TAG3P Representation Properties
  • Search Algorithms
  • Structural Components and Incremental Learning
  • Population Structure
  • Biases in Learning
  • Computational Cost

57
Tree Adjunct Grammars: Motivation
  • A confession
  • Originally conceived to address perceived
    problems with the many-one nature of the GE
    genotype-phenotype map
  • Actually, it doesn't solve those problems
  • Fortunately, it does have many other useful
    representational properties

58
Tree Adjunct Grammars (TAGs)
  • Arise from more modern efforts to represent
    natural language
  • Joshi et al. 1975
  • Two types of elementary trees
  • α trees
  • Represent complete syntactic units
  • β trees
  • Represent insertable elements
  • Must have an identical non-terminal at root and
    at frontier
  • The foot node

59
TAG Elementary Trees
  • α tree examples
  • β tree example

60
TAG Operations
  • Adjunction
  • Substitution



61
TAG to CFG Mapping
  • Derivation Tree
  • Derived Tree

62
TAG Genetic Programming (TAG3P)
  • Basic Form
  • Problem space represented by a TAG Grammar G
  • Individuals are derivation trees in G
  • Crossover uses sub-tree crossover
  • But the root nodes must have the same label
  • Mutation uses sub-tree mutation
  • But the generated sub-tree must be consistent
    with the grammar

63
TAG3P vs GGGP
  • Both tree-based representations
  • What have we gained?
  • GGGP trees have fixed arity
  • Each production determines a fixed number of
    children
  • TAG trees have flexible arity
  • Any sub-tree may be deleted without affecting
    tree validity
  • This non-fixed-arity buys us flexibility

64
TAG3P Representation Properties
  • Problem specific representation
  • The same language can be represented multiple
    ways
  • Similar issues to GGGP
  • Tree adjunct languages are a super-set of context
    free languages

65
TAG3P Properties (2)
  • Operators
  • Structure modification and content modification
    separated through substitution and adjunction
  • Content as lexicon
  • Structure as TAG structure
  • Easy to define new operators
  • Point insertion
  • Point deletion
  • Transposition
  • Relocation
  • Duplication
  • Step size may be minimal (point
    insertion/deletion)

66
TAG3P Properties (3)
  • Structural Difficulty
  • Dramatically less than tree-based GP
  • Especially, using point insertion and deletion
    operators
  • Neighbourhood connectivity good
  • Because of the non-fixed-arity property
  • Genotype-Phenotype mapping
  • Redundant
  • Dis-connected
  • Extensive
  • Continuous

67
Some GP Representations
68
Summary: Representation Issues
  • Connectedness of Neighbourhoods
  • Genotype-Phenotype Mappings
  • Connectedness
  • Extension
  • Continuity
  • Operators
  • Grain
  • Separation of Structure and Content Change
  • Problem-specific Representation

69
The Issue
  • GP research has demonstrated that
    population-based stochastic search is effective
    in GP problem spaces
  • But evolutionary search isn't the only
    population-based stochastic search algorithm
  • Could other search methods be effective?

70
Estimation of Distribution Algorithms
  • EDAs have been extremely successful in
    fixed-complexity search spaces
  • Could they extend to GP representations?
  • EDA for GP must explicitly distinguish between
  • Structure learning
  • Content learning
  • In EDA terms, explicitly distinguish
  • Probability Model
  • Probabilities
  • Two primary strands
  • Prototype Tree
  • Grammar-based

71
EDA Algorithm
  • Initialise the EDA probability model
    REPEAT
      Generate population from probability model
      Evaluate population fitness
      Optionally, generate a new probability model
      Update the probabilities in the model based on population fitness
    UNTIL stopping criteria are satisfied
  • (GP algorithms to date use truncation
    selection, and update the probability tables to
    increase the probability of generating the
    truncated population)
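The loop above, with the truncation-selection update just noted, in sketch form; sample and update stand for whatever operations the chosen probability model provides:

    def eda(model, sample, fitness, update, pop_size, trunc, max_gens):
        for gen in range(max_gens):
            pop = [sample(model) for _ in range(pop_size)]
            pop.sort(key=fitness, reverse=True)
            best = pop[:int(trunc * pop_size)]    # truncation selection
            # Shift the model toward regenerating the truncated population
            model = update(model, best)
        return model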

72
Exploration vs Exploitation in EDA
  • If the algorithm simply learns the probability
    distribution from the current population, it will
    be overly-exploitative
  • Premature convergence
  • EDAs avoid this by using a uniform prior
    probability
  • Either explicitly, with a discount rate
  • Or implicitly, by generating additional
    individuals at random
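The explicit variant amounts to blending the learned frequencies with the uniform distribution. A sketch for one categorical choice (e.g. which production of a non-terminal to apply), where the discount parameter is an illustrative knob controlling how far the empirical counts can pull away from the prior:

    def smoothed_probs(counts, discount=0.9):
        total, n = sum(counts), len(counts)
        return [discount * c / total + (1 - discount) / n for c in counts]

Since every option keeps probability at least (1 - discount)/n, no choice is ever ruled out entirely.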

73
Prototype Tree EDAs
  • The underlying model is a full tree of maximum
    arity
  • Each node holds a probability table for the
    content of the node
  • Original version (PIPE, Salustowicz &
    Schmidhuber 1997) has the node probabilities
    independent
  • More recent versions learn dependent
    probabilities
  • Prototype tree gives position-dependent
    probabilities
  • Cannot learn position-independent building blocks

74
PIPE Prototype Tree
75
Grammar-Based EDAs
  • The underlying model is a stochastic Context-Free
    Grammar
  • B → B or B (0.6)
  • B → B and B (0.3)
  • B → not B (0.1)
  • Permits position-independent building blocks
  • B → C or D (0.6)
  • C → C and C (0.8)
  • D → not D (0.8)
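Sampling from such a model is straightforward; a sketch. The slide shows only the recursive productions, so the terminal fallback at the depth bound is an assumption of this sketch:

    import random

    SCFG = {'B': [(['B', 'or', 'B'], 0.6),
                  (['B', 'and', 'B'], 0.3),
                  (['not', 'B'], 0.1)]}
    TERMINALS = ['a0', 'a1', 'd0', 'd1', 'd2', 'd3']

    def sample(symbol='B', depth=8):
        if symbol not in SCFG:
            return symbol
        if depth == 0:
            return random.choice(TERMINALS)       # depth-bound fallback
        rules, probs = zip(*SCFG[symbol])
        rule = random.choices(rules, weights=probs)[0]
        return '(' + ' '.join(sample(s, depth - 1) for s in rule) + ')'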

76
Learning Building Blocks
  • Learning building blocks in a grammar-based EDA
    is (relatively) easy if the grammar already
    records the building block
  • Just a matter of learning the probabilities
  • Learning new building blocks requires learning
    new, more specific, grammar models
  • B → C or D
  • C → C and C
  • D → not D
  • From
  • B → B or B
  • B → B and B
  • B → not B
  • Grammar Learning is extraordinarily
    computationally intensive
  • Current grammar learning methods have been
    developed for other tasks and are not optimal for
    the purpose

77
Learning New Grammars
  • Grammar learning can be
  • Top-down or bottom-up (or inside-out)
  • Specific to general or general to specific (or
    both)
  • We have experimented with
  • Inside-out, general to specific
  • PEEL system, 2003
  • Bottom-up, specific to general
  • GMPE system, 2004
  • Clearly, many other approaches are possible
  • But may require identifying large repeated
    sub-trees

78
Ant-Based Search
  • Closely related to EDA search
  • Also uses probability tables to generate
    individuals
  • Primary differences
  • Learning is from individual ants rather than
    populations
  • Somewhat akin to the distinction between
    generational and steady-state evolutionary
    algorithms
  • Probability update functions are pragmatic
  • Rather than statistically based

79
EDA and Ant Search Issues
  • Can non-grammar representations learn
    position-independent building blocks?
  • If the probability model changes, what uniform
    prior should be used?
  • The uniform prior on the original model?
  • The uniform prior on the new model?
  • Or some combination?
  • How should change of model be triggered?
  • How can grammar learning methods, developed for
    noise-free one-shot learning, be adapted for
    multi-shot learning from noisy data?

80
Summary
  • Genetic Programming
  • Introduction
  • Applications
  • Example
  • Representation
  • EDAs in GP

81
Thank you