1
Behavior of Variable-Length Genetic Algorithms
Under Random Selection
  • A thesis submitted for the degree of Master of
    Science, School of Electrical Engineering and
    Computer Science, College of Engineering and
    Computer Science, University of Central Florida,
    Orlando, Florida
  • Hal Stringer
  • Spring 2007

2
Defense Overview
  • Introduction and Motivation
  • Overview of EC and GAs
  • Literature Review
  • Variable-Length Structures in EC
  • Variable-Length Genetic Algorithms
  • GAs under Random Selection
  • Over Production of Shorter-Than-Average Children
  • Increase in Size Diversity
  • Putting It All Together
  • New Ideas for Bloat Control
  • Conclusions

3
Introduction and Motivation
  • Why Variable-Length GAs (VLGAs)?
  • Fixed Length GAs dominate current GA
    research/practice
  • Must know number of variables a priori
  • VLGAs have potential to solve more complex and
    open-ended problems
  • Number of genes varies with number of variables
  • Example: Finding location of cell towers
  • Example: Finding rules for autonomous agents
  • Why Random Selection?
  • VLGAs suffer from bloat...growth in chromosome
    size
  • Major stumbling block to practical use of VLGAs
  • VLGAs do not grow under random selection
  • Instead they shrink!
  • Understanding why VLGAs shrink under random
    selection
  • provides new information for understanding bloat
    and
  • new ideas for bloat control methods

4
Narrowing Down the Topic
EC
5
Genetic Algorithms
  • A search technique inspired by biology and
    principles of evolution
  • Natural Selection (survival of the fittest)
  • Genetic Reproduction (DNA)
  • GA applications
  • Function Optimization
  • Scheduling
  • Pattern Recognition
  • Robot Control
  • Simulated Evolution
  • Many Others
  • Goal is to maximize/minimize individual fitness
    thus finding an optimal or near optimal solution

6
Components of a Genetic Algorithm
  • Population of Individuals (Chromosomes)
  • Each Individual represents a possible solution
  • Individuals typically coded as binary strings
  • Fitness Function
  • Evaluates individuals and returns a numeric value
    indicating its fitness (closeness to optimal
    solution)
  • Selection Method
  • Chooses individuals for reproduction
  • Exploits good partial solutions by mating best
    parents
  • Genetic Operators
  • Recombines partial solutions (e.g., bit strings)
    from two parents
  • Crossover, Mutation most common

7
Variable-Length GAs
  • Uses operators similar to those of the simple GA (SGA)
  • Primary difference is in crossover operator
  • points chosen separately on each parent, usually
    at random
  • same as SGA crossover only if both points match
  • VLGA with single point crossover
  • 000010101 111001000 011110110 -> 000010101
    000000010
  • 000000010 101110000 000000010 -> 000000010
    101110000 111001000 011110110
  • Different crossover points cause children of
    different lengths
  • Over time VLGA chromosomes grow in length unless
    restrained by some mechanism
  • Unrestricted growth in VLGA chromosomes is known
    as bloat
  • Bloat does not seem to occur when selection is
    random
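The crossover on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the thesis code: the function names `crossover_at` and `vl_one_point_crossover` are hypothetical, chromosomes are modeled as lists of genes, and cut points are assumed to run from 1 to the parent's length so that no child is empty.

```python
import random

def crossover_at(p1, p2, i, m):
    """Cut parent 1 after gene i and parent 2 after gene m, then swap tails."""
    child1 = p1[:i] + p2[m:]
    child2 = p2[:m] + p1[i:]
    return child1, child2

def vl_one_point_crossover(p1, p2, rng=random):
    """Variable-length crossover: an independent random cut on each parent."""
    i = rng.randint(1, len(p1))  # assumed range 1..len(p1)
    m = rng.randint(1, len(p2))  # assumed range 1..len(p2)
    return crossover_at(p1, p2, i, m)

# The two parents shown on the slide, one 9-bit gene per list element.
p1 = ["000010101", "111001000", "011110110"]
p2 = ["000000010", "101110000", "000000010"]
c1, c2 = crossover_at(p1, p2, 1, 2)  # cuts that reproduce the slide's children
```

With cuts after gene 1 of the first parent and gene 2 of the second, the children have lengths 2 and 4, while the total amount of genetic material (6 genes) is unchanged.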

8
Effect of Selection on Length
9
Hypothesis
  • The average size of individuals in a
    variable-length GA will shrink over time given a
    selection function that is random (a.k.a. uniform
    or constant)
  • Based on prior work by other researchers in GP
    and our own work on the Chunking GA
  • Believe size reduction occurs due to three forces
    at work over the course of a GA run
  • Increase in Size Diversity Within the Population
  • Over Production of Shorter-Than-Average Children
  • Stochastic Errors During Selection
  • Our goal is to determine why and how this happens
  • What are the mechanisms, processes and conditions
    required to make this occur

10
Literature Review
  • Topic of interest is VLGAs under Random
    Selection
  • Not much found in the literature that addresses
    this specific topic
  • Split research into two parts
  • Variable-length GAs
  • Bloat control in EC
  • From these two major topics gleaned whatever
    information was available on VLGAs under random
    selection

11
Variable Length Structures in EC
12
Variable-Length Genetic Algorithms
  • Variety of different forms
  • Found it necessary to classify these different
    GAs in some fashion
  • Chose to classify based on solution size
  • Fixed Solution Size
  • Bounded Solution Size
  • Unbounded Solution Size
  • Other classifications possible but this seems
    most appropriate for our work

13
VLGAs - Fixed Solution Size
  • Used for problems with required number of
    variables
  • Requires instantiation of all variables for
    fitness evaluation
  • Must address over- and under-specification
  • Goal is to have GA evolve solutions without
    positional biases: bits/genes not restricted to
    a single locus
  • Examples
  • Messy GA (Goldberg, Korb & Deb, 1989)
  • Gene-Based Tagging GAs
  • The Virtual Virus (Burke, et al., 1998)
  • Proportional Genetic Algorithm (Wu & Garibay,
    2002)

14
VLGAs - Bounded Solution Size
  • Applicable to problems with known # of potential
    variables
  • Used with tags (each gene maps to specific
    variable/condition) or
  • Used without tags (limit to number of value
    options)
  • DOES NOT require instantiation of all variables
    for fitness evaluation
  • Must address over-specification
  • Goal is to have GA evolve solutions that include
    the best variables with the best values
  • Examples
  • GAs for MAV Rule Development (Wu, Schultz & Agah,
    1999)
  • SAMUEL System (Grefenstette, Ramsey & Schultz,
    1990)
  • PGA ... again (Wu & Garibay, 2002)

15
VLGAs - Unbounded Solution Size
  • Applicable to problems with infinite # of
    variables
  • Used without tags (but no limit to number of
    genes/variables)
  • Fitness evaluation possible for all potential
    solutions
  • May need to address over-specification
  • Goal is to have GA evolve solutions to complex
    open-ended problems with no restrictions on
    granularity
  • Examples
  • Harvey (1992a) argues, regarding SAGA:
  • ...genotypes must be unrestricted in length if
    we are to evolve a structure with arbitrary and
    potentially unrestricted capabilities
  • Function identification using Voronoi regions
  • (Kavka & Schoenauer, 2003)

16
VLGAs - Summary
  • Given our classification scheme, believe this
    work will be applicable to GAs with bounded or
    unbounded solution sizes
  • Other characteristics
  • Any alphabet as long as it is finite and stable
  • Any base unit as long as stable and of fixed size
  • Absence of non-coding regions
  • Bits or genes are position independent

17
GP and Random Selection
  • Tackett, Recombination, Selection, and the
    Genetic Construction of Computer Programs, Ph.D
    Thesis 1994.
  • Under random selection bloating did not occur
  • Figure 7.1 actually shows a slight reduction in
    tree size
  • Langdon and Poli, Fitness causes bloat, 1997
  • Experiments confirm some of Tackett's
    findings.
  • Bloat related to selection pressure.
  • The absence of selection pressure stops the
    growth of program size.
  • Found a slow reduction in program size under
    random selection.

18
Other Related Work
  • From Stephens and Waelbroeck
  • generically, there is no preference for short,
    low-order schemata. In fact, if schema
    reconstruction dominates, the opposite is true:
    typically large schemata will be favored.
  • Explains why variable-length chromosomes grow
    rapidly in early generations
  • From Langdon, McPhee, Poli and Rowe
  • Development of exact schema theory for linear
    structures
  • consist solely of 1-arity functions and a single
    terminal.
  • similar to variable-length GAs

19
More from Langdon, McPhee, Poli & Rowe
  • Results under standard crossover assuming an
    infinite population and constant fitness
    function
  • average size of individuals within a population
    remains constant from one generation to the next
  • fixed point equal to the average size of the
    initial population
  • distribution of lengths is not a fixed point but
    changes over time
  • shorter-than-average structures are sampled more
    often than larger ones
  • large individuals become larger but fewer in
    number
  • shorter individuals shrink but become more
    numerous.
  • distribution of lengths is a set of fixed points
    over time defined by a family of discrete gamma
    functions.
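The first result above (average size stays constant) can be checked in expectation for one round of crossover. The sketch below is an illustration under stated assumptions, not the exact schema theory itself: parent lengths are taken to be uniform on 1..n, every mating and every pair of cut points is equally likely, and the function name `mean_child_length` is hypothetical.

```python
from fractions import Fraction

def mean_child_length(n):
    """Expected child length when parent lengths are uniform on 1..n."""
    total = Fraction(0)
    for j in range(1, n + 1):          # length of parent 1
        for k in range(1, n + 1):      # length of parent 2
            lengths = []
            for i in range(1, j + 1):      # cut point on parent 1
                for m in range(1, k + 1):  # cut point on parent 2
                    lengths.append(i + (k - m))  # head of P1 + tail of P2
                    lengths.append(m + (j - i))  # head of P2 + tail of P1
            cell_mean = Fraction(sum(lengths), len(lengths))
            total += cell_mean * Fraction(1, n * n)  # each mating equally likely
    return total

def mean_parent_length(n):
    """Mean of the lengths 1..n."""
    return Fraction(n + 1, 2)
```

Each crossover event hands out exactly j + k genes to its two children, so the expected child length equals the expected parent length even though the distribution of lengths changes.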

20
Intermission 1
  • What we know / have
  • For GAs (and GP with linear structures) subject
    to random selection, changes in chromosome sizes
    do occur
  • Infinite populations see changes in
    distribution of lengths
  • Finite populations see changes in average size
    of population
  • Hypothesis with three forces that may explain why
    this occurs in finite populations
  • Increase in Size Diversity Within the Population
  • Over Production of Shorter-Than-Average Children
  • Stochastic Errors During Selection
  • If correct, may be of benefit to variable-length
    GAs with bounded and unbounded solution sizes
  • What we don't know / don't have
  • Confirmation of hypothesis and full understanding
    of the mechanisms and process that cause
    reduction to occur
  • Ways in which to apply this knowledge to combat
    bloat

21
General Approach
  • How to show our hypothesis is correct for GAs
    under random selection?
  • Start with a tractable population
  • Perform probabilistic analyses
  • Perform confirming empirical analyses using
    models of same population
  • Compare with performance of similar experiments
    using other GAs and population types

22
A Tractable Population
  • Model population using an n×n matrix
  • Row/Column headings indicate length of each
    individual in the population
  • Each cell represents the possible mating of two
    parents
  • Probability of a single mating is 1/n^2
  • Can be thought of as
  • Population of n individuals with lengths from 1
    to n, selection with replacement
  • Population of 2n individuals with lengths from 1
    to n, selection without replacement

Mating Probability Matrix
23
Counting Crossover Events and Children
  • A crossover event occurs when two parents are
    selected and two children are produced
  • Points are randomly chosen on both parents to
    determine crossover locations
  • The number of crossover events possible for a
    given mating is jk where j and k are the lengths
    of the two parents
  • Each crossover event yields two children

(# of crossover events / # of children)
$\text{CrossoverEvents}_{j,k} = jk$
$E_{j,k}(\text{children}) = 2jk$
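Summing the per-mating counts over the whole n×n matrix gives closed forms for the totals. This short derivation is added for illustration and uses only the jk and 2jk counts stated on this slide.

```latex
\begin{align*}
\sum_{j=1}^{n}\sum_{k=1}^{n} jk
  &= \Bigl(\sum_{j=1}^{n} j\Bigr)^{2}
   = \Bigl(\frac{n(n+1)}{2}\Bigr)^{2}
  && \text{(crossover events)} \\
\sum_{j=1}^{n}\sum_{k=1}^{n} 2jk
  &= \frac{n^{2}(n+1)^{2}}{2}
  && \text{(children)}
\end{align*}
```

For n = 5 this gives 225 crossover events and 450 children.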
24
Counting Shorter Than Average Children
$P_l$ = parent of length $l$; $X_i$ = crossover point at $i$; $C_l$ = child of length $l$; $l$ = # of crossover points
(# of shorter-than-average children / # of children)
  • For each mating, look at all possible crossover
    events and determine the length of all possible
    children produced.
  • Count the number of possible offspring which are
    shorter than the average size of the parent
    population
  • Divide shorter count by total to determine
    percentage of shorter than average children for
    given cell
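The counting procedure above can be sketched by brute-force enumeration. This is an illustrative sketch rather than the thesis code: the names `child_lengths` and `count_shorter` are hypothetical, and cut points are assumed to run from 1 to each parent's length, which yields the jk crossover events per mating described earlier.

```python
def child_lengths(j, k):
    """Lengths of every possible child from parents of length j and k."""
    out = []
    for i in range(1, j + 1):          # cut point on parent of length j
        for m in range(1, k + 1):      # cut point on parent of length k
            out.append(i + (k - m))    # head of first parent + tail of second
            out.append(m + (j - i))    # head of second parent + tail of first
    return out

def count_shorter(n):
    """Return (# shorter-than-average children, # children) for an n x n matrix."""
    avg = (n + 1) / 2                  # mean parent length for lengths 1..n
    shorter = total = 0
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            lengths = child_lengths(j, k)
            shorter += sum(1 for length in lengths if length < avg)
            total += len(lengths)
    return shorter, total
```

For a 5×5 matrix, `count_shorter(5)` returns `(130, 450)`.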

25
Probability for Entire Matrix
  • Simplistic Approach
  • Divide total shorter children by total children
  • 130 / 450 = .289, or approximately 29%
  • But this is wrong
  • doesn't take into account mating probabilities
  • Better Approach
  • Weight the percentage of shorter than average
    children in each cell by the probability of a
    mating occurring in that cell.
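The difference between the two approaches can be made concrete with a small sketch. It is an illustration under assumptions, not the thesis equations: every mating has probability 1/n^2, cut points run from 1 to each parent's length, and the names `cell_fraction`, `weighted_pr_lt` and `simplistic_ratio` are hypothetical.

```python
from fractions import Fraction

def cell_fraction(j, k, avg):
    """Fraction of children shorter than avg for one mating (j, k)."""
    shorter = 0
    for i in range(1, j + 1):
        for m in range(1, k + 1):
            shorter += (i + (k - m) < avg) + (m + (j - i) < avg)
    return Fraction(shorter, 2 * j * k)

def weighted_pr_lt(n):
    """Weight each cell's fraction by its mating probability 1/n^2."""
    avg = Fraction(n + 1, 2)
    cells = [cell_fraction(j, k, avg)
             for j in range(1, n + 1) for k in range(1, n + 1)]
    return sum(cells) * Fraction(1, n * n)

def simplistic_ratio(n):
    """Total shorter children divided by total children (ignores weighting)."""
    avg = Fraction(n + 1, 2)
    shorter = sum(cell_fraction(j, k, avg) * 2 * j * k
                  for j in range(1, n + 1) for k in range(1, n + 1))
    total = sum(2 * j * k for j in range(1, n + 1) for k in range(1, n + 1))
    return Fraction(shorter, total)
```

For the 5×5 case the simplistic ratio is 130/450, while the weighted value is noticeably higher (about 0.443), because matings between short parents produce few children but are just as likely as any other mating.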

26
Example of Better Approach
27
Calculating Over-Production of Shorter Than
Average Children for Any Size Matrix
  • Need to determine equation for calculating
    expected number of less than average children
  • Problem: no continuous function could be found
  • Must partition mating space into distinct regions
    or areas each with its own equation.

28
Partitioned Mating Space
29
Equations for Expected Number of Shorter than
Average Children by Area
  • The above were then used to calculate PrLT values
    for each area on a cell by cell basis with the
    exception of Area B (required calculating for
    entire area all at once)

30
Results of Probabilistic Analysis
31
Empirical Analysis
  • Conducted experiments to determine probabilities
    for different matrices varying in size from 1 to
    200
  • Shorter than Average
  • Longer than Average
  • Equal to Average
  • Results show curve approaching 55% for
    probability of shorter than average children.
  • Also performed final experiment with larger
    values of n = 499 and n = 500
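An experiment of this kind can be approximated by sampling instead of full enumeration. The sketch below is a Monte Carlo stand-in under assumptions (uniform parent lengths 1..n, uniform cut points, a hypothetical `simulate` function and seed); it is not the procedure used in the thesis.

```python
import random

def simulate(n, trials, seed=0):
    """Estimate the share of children shorter than, longer than, or equal to
    the parent average for an n x n mating matrix."""
    rng = random.Random(seed)
    avg = (n + 1) / 2
    counts = {"shorter": 0, "longer": 0, "equal": 0}
    for _ in range(trials):
        j, k = rng.randint(1, n), rng.randint(1, n)   # random mating
        i, m = rng.randint(1, j), rng.randint(1, k)   # random cut points
        for length in (i + (k - m), m + (j - i)):     # the two children
            if length < avg:
                counts["shorter"] += 1
            elif length > avg:
                counts["longer"] += 1
            else:
                counts["equal"] += 1
    total = 2 * trials
    return {name: count / total for name, count in counts.items()}

shares = simulate(4, 100_000, seed=1)
```

When n is even the average is not an integer, so the "equal" share is zero and every child falls into one of the other two groups.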

32
Results of Simulations
33
Probabilistic vs. Empirical Analysis
  • Estimates from Equations lower than enumerative
    tests due to imprecise nature of calculation for
    Area B
  • Results for non-integer averages slightly lower

34
Percent of Probability
35
What About Real Populations?
  • GAs suffering from bloat contain a high average
    size population
  • Population gets stuck around the average with
    little variation
  • Very little chance of matings in A or C areas
  • Need greater size diversity to increase chances
    of better matings

36
Increase in Size Diversity
  • Stringer & Wu (2005) presented empirical
    information about how crossover under random
    selection can maintain length diversity in a
    population of evenly distributed sizes
  • Looked at probabilities for different types of
    crossovers

37
Size Diversity Probabilistic Analysis
  • Take an approach similar to that used to
    determine probability of shorter than average
    children

(# of possible crossover events)
38
Size Diversity - Probabilistic Analysis (cont.)
  • Assume theoretical population in an nxn matrix
  • For any given mating pair of sizes j and k, where
    j, k ≤ n, the percentage of crossover events by type is

39
Size Diversity - Probabilistic Analysis (cont.)
  • Percentages multiplied by Mating Probability
    Matrix (PrM) to determine contribution of each
    cell to each type of crossover event.

40
Size Diversity - Probabilistic Analysis (cont.)
41
Size Diversity Empirical Results
  • Percentage of crossover events by type for n×n
    population matrices varying in size from n = 1 to
    200. Event probabilities weighted based on
    mating occurrences.

42
Speed of Diversification
  • The surprising element in this part of the
    research was the speed of diversification.
  • Happened quickly for most initial distributions
  • Began moving to gamma-like distribution

Probability of size distributions over time for a
GA with an initial population (Gen 0) uniformly
distributed with respect to size.
43
Speed of Diversification (starting with single
point distribution)
Probability of size distributions over time for a
GA with an initial population (Gen 0) set to a
single distribution point (100) with respect to
length.
44
Speed of Diversification (starting with single
point distribution)
Scatter plot showing sizes of each individual in
a modeled GA population after two generations.
The initial population (Gen 0) contains only
individuals of size 100.
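The single-point experiment can be imitated with a length-only model of the population. This is a rough sketch under assumptions: only chromosome lengths are tracked, parents are chosen uniformly at random with replacement, no size cap is applied, and the names `next_generation` and `run` are hypothetical.

```python
import random
from statistics import pstdev

def next_generation(lengths, rng):
    """Produce a new list of lengths by random selection and crossover."""
    children = []
    while len(children) < len(lengths):
        j, k = rng.choice(lengths), rng.choice(lengths)  # random selection
        i, m = rng.randint(1, j), rng.randint(1, k)      # cut points
        children.append(i + (k - m))
        children.append(m + (j - i))
    return children[:len(lengths)]

def run(initial, generations, seed=0):
    """Return the list of length populations, generation 0 first."""
    rng = random.Random(seed)
    history = [list(initial)]
    for _ in range(generations):
        history.append(next_generation(history[-1], rng))
    return history

history = run([100] * 200, generations=2, seed=3)
spread = [pstdev(gen) for gen in history]  # size diversity per generation
```

Generation 0 has zero spread by construction; a single round of crossover is already enough to scatter lengths across a wide range.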
45
Speed of Diversification (starting with two-point
distribution)
Probability of size distributions over time for a
GA with an initial population (Gen 0) set to
two different distribution points (20, 180) with
respect to length.
46
Speed of Diversification (starting with two-point
distribution)
Scatter plot showing sizes of each individual in
a modeled GA population after two generations.
The initial population (Gen 0) contains equal
numbers of individuals of size 20 or 180.
47
Speed of Diversification (starting with upper
region distribution)
Probability of size distributions over time for a
GA with an initial population (Gen 0) whose size
distribution includes only longer individuals
(> 180).
48
Speed of Diversification (starting with upper
region distribution)
Scatter plot showing sizes of each individual in
a modeled GA population after two and ten
generations. The initial population (Gen 0)
contains only individuals of length greater than
180.
49
Stochastic Errors During Selection
  • Using a finite population with random selection
    does not ensure conservation of material
  • Due to stochastic errors in selection process,
    may not choose long individuals equal to their
    proportion in the population
  • May over-select smaller individuals to be parents
    for the next generation (speeds up reduction) or
  • May under-select smaller individuals (slows down
    reduction)
  • Intuitively, the fewer large individuals the
    greater chance of missing one of them.
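The sampling effect described above can be illustrated with a toy population. The numbers below (95 individuals of length 10, 5 of length 200) and the function name `sampled_parent_means` are hypothetical choices for illustration only.

```python
import random
from statistics import mean

def sampled_parent_means(lengths, trials, seed=0):
    """Mean length of a randomly selected parent pool, repeated `trials` times.

    Each pool is drawn uniformly with replacement and has the same size as
    the population, mimicking random selection in a finite population."""
    rng = random.Random(seed)
    n = len(lengths)
    return [mean(rng.choices(lengths, k=n)) for _ in range(trials)]

population = [10] * 95 + [200] * 5     # a few long individuals among many short
population_mean = mean(population)     # 19.5
pool_means = sampled_parent_means(population, trials=200, seed=1)
```

Some parent pools come out below the population mean and some above it, so the amount of genetic material passed to the next generation is not conserved from one draw to the next.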

50
Chunking GA: An Example of the Process
Partition of a ChGA run for 3x8 MaxSum Problem
using 8 Memory Slots. Average Base Chromosome
Length shown with +/- one standard deviation.
51
Intermission 2
  • What we've learned
  • Better understanding of the mechanisms and
    processes that cause average chromosome size to
    decrease in finite population variable-length GAs
  • Probabilistic and Empirical Analysis shows a 55%
    chance that a GA will produce shorter than
    average children if
  • Population size is distributed uniformly
  • Diversity of chromosome sizes within a single
    generation is sufficiently large
  • Selection is random
  • GAs under random selection create their own size
    diversity
  • Stochastic errors in sampling can speed up (or
    hinder) the process
  • What's left?
  • Ways in which to apply this knowledge to combat
    bloat

52
Combating Bloat
  • Thesis does not directly investigate cause of
    bloat. Believe it is related to prior work by
    Stephens and Waelbroeck on exact schema theory.
  • Chromosomes grow to better distribute building
    blocks along the chromosome and improve chances
    of creating more fit individuals
  • Most work in bloat control has been related to
    artificially reducing growth via fitness
    evaluation, selection or tailored operators
    (e.g., parsimony pressure, truncation)
  • Assuming Stephens and Waelbroeck are correct, we
    can't eliminate the cause of bloat but can try
    the following
  • Deterministically manage growth (SAGA Cross)
  • Convert variable-length representations to
    fixed-length on the fly (ChGA)
  • Control location of building blocks (a.k.a.
    distribution causes bloat)
  • Find ways to take advantage of the GA's natural
    tendency to evolve shorter chromosomes under
    random selection

53
Bloat Control Ideas
  • Maintain size diversity
  • Turn random selection on and off periodically
  • Longer Later vs. Shorter Sooner
  • Truncate randomly rather than at size cap

Size probability distributions for 200 member GA
with initial population uniformly distributed
with respect to size. Any children of length >
200 are either 1) right truncated at 200 or 2)
truncated randomly.
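The two truncation options in the caption can be sketched as follows. The slides do not say how the random cut is chosen, so `random_truncate` assumes a new length drawn uniformly from 1..cap; both function names are hypothetical.

```python
import random

def right_truncate(chromosome, cap=200):
    """Keep the first `cap` genes; shorter chromosomes are unchanged."""
    return chromosome[:cap]

def random_truncate(chromosome, cap=200, rng=random):
    """Cut an over-long chromosome to a random length in 1..cap (assumed rule)."""
    if len(chromosome) <= cap:
        return chromosome
    new_length = rng.randint(1, cap)
    return chromosome[:new_length]

long_child = list(range(250))          # a child that exceeds the size cap
capped = right_truncate(long_child)
randomly_cut = random_truncate(long_child, rng=random.Random(0))
```

Right truncation piles every over-long child up at exactly the cap, whereas a random cut spreads those children across the whole length range, which is one way to keep size diversity in the population.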
54
Bloat Control Ideas (cont.)
  • Base selection on partitioned map of mating space
  • Use a probabilistic rather than deterministic
    approach
  • P1 > .75A
  • choose a P2 such that mating is in area A or C
  • .5A < P1 < .75A
  • choose a P2 such that mating is in area C or B
  • .25A < P1 < .5A
  • choose a P2 such that mating is in area C or D
  • P1 < .25A
  • choose a P2 such that mating is in area D or E

55
Bloat Control Ideas (cont.)
  • Match Maker selection matrix
  • At each generation
  • Determine fitness of all individuals
  • Select best fit individual for each size
  • Generate pLT matrix for all possible matings of
    best fit
  • Create match maker matrix (or vector) combining
    weighted fitness and percentage of shorter than
    average children possible from the mating
  • Sort based on best result
  • Use top matches for reproduction

56
Conclusions
  • Created a new classification system for VLGAs
    based on solution size
  • Presented a detailed analysis of the behavior of
    genetic algorithms under random selection
  • Identified and explained the mechanisms and
    forces that make reduction happen
  • Increase in Size Diversity Within the Population
  • Over Production of Shorter-Than-Average Children
  • Stochastic Errors During Selection
  • Showed how a large variation in size within a
    population is a precondition
  • Explain the behavior of the Chunking GA

57
Conclusions (cont.)
  • Developed tools for other GA researchers
  • Partitioned map of the mating space (five areas)
  • Equations for calculating # of shorter than
    average children
  • Methods for computing the probability of
    producing shorter than average children for
  • any single mating (pLT_{j,k} · PrM_{j,k})
  • an entire population (PrLT)
  • Equations for calculating percentages of possible
    crossover events by type (inside, outside and
    equals)
  • But improving our understanding of the impact of
    mating choices on the size of offspring may be
    most important
  • Opens up new avenues of exploration related to
    bloat control in genetic algorithms as well as EC
    in general.
  • Offered several new ideas for controlling bloat
    to be investigated in future research

58
SUPPORT SLIDES
59
Evolutionary Computation
  • GA: Genetic Algorithm (Holland)
  • EP: Evolutionary Programming (Fogel, Owens &
    Walsh)
  • ES: Evolutionary Strategies (Rechenberg &
    Schwefel)
  • GE: Grammatical Evolution (Ryan, Collins &
    O'Neill)
  • GP: Genetic Programming (Koza)
  • LCS: Learning Classifier Systems (Holland &
    Smith)
  • Other biologically Inspired Paradigms

60
Evolutionary Computation
  • Most EC paradigms share common traits
  • the concept of a population of individuals which
    represent possible solutions to a given problem,
  • a notion of fitness associated with each
    individual,
  • a birth-reproduction-death cycle repeated for
    some number of generations, and
  • use of Darwinian-inspired operators or processes
    to create new individuals from selected members
    of the current population
  • e.g., selection, crossover, mutation

61
Population of Individuals
  • Single Individual (9-bit, 3-gene example)
  • Bit strings converted to integer values (most
    common)
  • 000010101 111001000 011110110
  • 21 -56 246
  • Population
  • n-Bit strings, each representing a different
    solution
  • 1 000010101 111001000 011110110
  • 2 000000010 101110000 000000010
  • n 111001010 100010100 000000000
  • Search space is 2^27
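The bit-string-to-integer conversion in the example can be sketched as below. The slide does not name the encoding; the values 21, -56 and 246 are consistent with 9-bit two's complement, so that is assumed here, and the names `decode_gene` and `decode` are hypothetical.

```python
def decode_gene(bits):
    """Interpret a bit string as a two's-complement integer (assumed encoding)."""
    value = int(bits, 2)
    if bits[0] == "1":                 # sign bit set: subtract 2^len(bits)
        value -= 1 << len(bits)
    return value

def decode(chromosome):
    """Decode a space-separated chromosome into a list of integers."""
    return [decode_gene(gene) for gene in chromosome.split()]

values = decode("000010101 111001000 011110110")
search_space = 2 ** (3 * 9)            # three 9-bit genes
```

Here `values` is `[21, -56, 246]`, and `search_space` is 134217728.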

62
Fitness Function
  • Determines the fitness of each individual or how
    close the solution represented by chromosome is
    to optimal
  • Seen by GA as a black box
  • 1 000010101 111001000 011110110 --> [fitness function] --> 2534
  • 2 000000010 101110000 000000010 --> [fitness function] --> 154
  • 3 000000010 101110000 000000010 --> [fitness function] --> 154
  • n 111001010 100010100 000000000 --> [fitness function] --> 77469

Fitness Function
63
Selection Function
  • Method for choosing which individuals will be
    mated to create the next generation
  • Goal of selection is to find best partial
    solutions to pass on to next generation
    (exploitation)
  • Variety of selection methods
  • Fitness Proportional
  • Rank Proportional
  • Tournament Selection
  • Random / Uniform / Constant

64
Genetic Operators
  • Used to mix genetic material within the
    population. Based on rates set at run time.
  • Crossover (exploitation & exploration)
  • 000010101 111001000 011110110 -> 000010101
    101110000 000000010
  • 000000010 101110000 000000010 -> 000000010
    111001000 011110110
  • Mutation (exploration)
  • 1 0
  • 000010101 101110000 000000010
  • Many different versions of crossover and mutation
    operators exist. Also other types of operators.

65
Simple GA Pseudo-Code
  • procedure SGA
  •   initialize population
  •   while (stopping condition not satisfied)
  •     // Evaluate Current Population
  •     for (j = 1 to population size)
  •       evaluate fitness of individual j
  •     // Select and Reproduce
  •     for (j = 1 to population size / 2)
  •       select two parents for reproduction
  •       crossover to produce two children
  •     // Mutate Chromosomes of Children
  •     for (j = 1 to population size)
  •       perform mutation on child j
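A runnable version of this pseudo-code might look like the sketch below. Everything not in the pseudo-code is an assumption made for illustration: binary tournament selection, a fixed number of generations as the stopping condition, the default rates, an even population size, and the function name `sga`.

```python
import random

def sga(fitness, n_bits=27, pop_size=20, generations=30,
        p_cross=0.7, p_mut=0.01, seed=0):
    """Simple GA over fixed-length bit lists; returns the final population."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):       # stopping condition: generation count
        # Evaluate current population
        scores = [fitness(individual) for individual in pop]

        def select():
            # Binary tournament (assumed selection method)
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        # Select and reproduce
        children = []
        for _ in range(pop_size // 2):
            p1, p2 = select(), select()
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)   # same cut on both parents
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            children += [c1, c2]

        # Mutate chromosomes of children (bit flip)
        for child in children:
            for position in range(n_bits):
                if rng.random() < p_mut:
                    child[position] ^= 1
        pop = children
    return pop

final_population = sga(sum)            # OneMax: fitness is the number of 1 bits
```

Because both parents are cut at the same point, every child keeps the original length, which is the key contrast with the variable-length crossover discussed in the main talk.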

66
Bloat Control in GAs
  • Causes of Bloat
  • high mutation rates (Ramsey et al., 1998)
  • additional genetic material is free and provides
    more exploration (Burke et al., 1998)
  • Bloat Control Mechanisms
  • Use of competitive templates (MessyGA)
  • Parsimony pressure added to fitness (Virtual
    Virus)
  • Right truncation (PGA)
  • Removing duplicates after crossover (SAMUEL)
  • Controlled growth through crossover (SAGA)

67
Bloat Control in GP
  • Sample of Causes of Bloat
  • hitchhiking of non-coding regions (Tackett, 1994)
  • defense against crossover (Blickle & Thiele,
    1994)
  • removal bias (Soule & Foster, 1998)
  • fitness causes bloat (Langdon & Poli, 1997)
  • depth of modification points within a GP parse
    tree (Luke, 2003)
  • Sample of Bloat Control Mechanisms
  • Parsimony pressure added to fitness
  • Reproduction restrictions
  • New Ideas (Panait & Luke, 2004)
  • Multi-Objective Optimization (size vs. raw
    fitness)
  • Waiting Room
  • Death by Size for Steady State GP

68
Other Related Work
  • Poli, R. and Langdon, W.B., A new schema theory
    for genetic programming with one-point crossover
    and point mutation. 1997
  • Stephens, C.R. and Waelbroeck, H., Effective
    degrees of freedom in genetic algorithms and the
    block hypothesis. 1997
  • Stephens, C.R. and Waelbroeck, H., Schemata
    evolution and building blocks. 1999
  • Poli, R., Exact schema theorem and effective
    fitness for GP with one-point crossover. 2000
  • McPhee, N.F. and Poli, R., A schema theory
    analysis of the evolution of size in genetic
    programming with linear representations. 2001
  • Poli, R. and McPhee, N.F., Exact schema theorems
    for GP with one-point and standard crossover
    operating on linear structures and their
    application to the study of the evolution of
    size. 2001
  • Rowe, J.E. and McPhee, N.F., The effects of
    crossover and mutation operators on variable
    length linear structures. 2001

69
Changing the Average and Variance
70
Changing the Average and Variance (cont.)
  • Effect of changes in variance on probability of
    children whose size is less than the mean length
    of the parent population matrix. n is equal to
    the average times 2.

71
Changing the Variance Only
72
Changing the Variance Only (cont.)
  • Effect of changes in variance (given a fixed
    average) on probabilities of children whose size
    is less than the average size of the parent
    population for an n×n matrix.