Exact GP Schema Theorems
1
Lecture 5
  • Exact GP Schema Theorems. We now turn to Ch. 5 and the claim that, finally, schema theorems can be made precise enough to provide a reliable predictive (at least theoretical) tool. The primary tool is the capability of accounting for schema creation (recall the results of Altenberg, which were really too complex to be useful, and those of Goldberg, used in the study of deception).
  • Let H denote a schema over some population, and let α(H, t) denote the schema transmission probability: the probability that, at generation t, the individual produced by selection, cloning, crossover and mutation will match schema H.
  • The previous chapters show, if nothing else, that computing α(H, t) is difficult. Let's start from the other end: if α(H, t) is known, what becomes easy?

2
Lecture 5
  • The first observation is that if each selection/crossover event is independent (usually so in GA and GP contexts), then each child has the same probability of matching H. But this means that m(H, t+1), the number of instances of schema H at time t+1, is binomially distributed: the probability that the population contains exactly k instances (in M trials) of the schema at generation t+1 is given by

        Pr{m(H, t+1) = k} = C(M, k) α(H, t)^k (1 − α(H, t))^(M−k).
  • (For a quick introduction to Probability and Queuing Theory you may want to check some of the materials in giam/Mikkeli/Lecture5.ppt, ../IntroToProbabilityAndApplications.pdf, and ../IntroQueTheo.ppt.)
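
As a quick illustration, here is a minimal sketch of this binomial model in Python (the population size M and transmission probability α are made-up numbers):

    from math import comb

    def schema_count_pmf(k, M, alpha):
        """Pr{m(H, t+1) = k}: binomial probability that exactly k of the
        M independently generated children match schema H."""
        return comb(M, k) * alpha ** k * (1 - alpha) ** (M - k)

    # toy numbers: population of 100, transmission probability 0.05
    M, alpha = 100, 0.05
    print(schema_count_pmf(5, M, alpha))                              # most likely count
    print(sum(schema_count_pmf(k, M, alpha) for k in range(M + 1)))   # sums to 1.0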

3
Lecture 5
  • We can compute the expected number of instances, E[m(H, t+1)] = Mα(H, t), as well as the variance, Var[m(H, t+1)] = Mα(H, t)(1 − α(H, t)) (see IntroToProbabilityAndApplications.pdf for details).
  • The schema transmission probability at generation t corresponds to the expected proportion of the population sampling the schema H at generation t+1.
  • Another useful idea is that of the signal-to-noise ratio for a schema: the mean over the standard deviation, E[m]/√Var[m] = √(Mα(H, t)/(1 − α(H, t))).
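
A small sketch of these three quantities (same made-up numbers as before):

    from math import sqrt

    def schema_stats(M, alpha):
        """Mean, variance and signal-to-noise ratio of the binomially
        distributed schema count m(H, t+1)."""
        mean = M * alpha
        var = M * alpha * (1 - alpha)
        snr = mean / sqrt(var)      # equals sqrt(M * alpha / (1 - alpha))
        return mean, var, snr

    print(schema_stats(100, 0.05))  # (5.0, 4.75, ~2.29)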

4
Lecture 5
  • We can also compute the probability of schema extinction in one generation:

        Pr{m(H, t+1) = 0} = (1 − α(H, t))^M.

  • We now turn to another result that relates means and variances.
  • Theorem (Chebyshev's Inequality). Let X be a random variable with mean μ = E(X) and variance σ² = Var(X). Then, for any t > 0,

        Pr{|X − μ| ≥ t} ≤ σ²/t².

  • Proof. Feller, p. 219.
  • If we take t = kσ and use the assumption that m(H, t+1) is binomially distributed with mean Mα(H, t) and variance Mα(H, t)(1 − α(H, t)), we can rewrite Chebyshev's Inequality to provide:

5
Lecture 5
  • Theorem 5.2.1 (Two-Sided Probabilistic Schema Theorem). For any given constant k > 0,

        Pr{ Mα(H, t) − k√(Mα(H, t)(1 − α(H, t))) < m(H, t+1) < Mα(H, t) + k√(Mα(H, t)(1 − α(H, t))) } ≥ 1 − 1/k².

  • Proof. Let t = kσ. The previous theorem, with X replaced by m(H, t+1), gives Pr{|m(H, t+1) − μ| ≥ kσ} ≤ 1/k². Turning the inequality around, we obtain immediately Pr{|m(H, t+1) − μ| < kσ} ≥ 1 − 1/k². Replacing μ by Mα(H, t) and σ by √(Mα(H, t)(1 − α(H, t))) gives the result.
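
To make the bounds concrete, a short sketch (toy numbers again) computing the one-generation extinction probability and the Chebyshev interval of Theorem 5.2.1:

    from math import sqrt

    def extinction_prob(M, alpha):
        """Pr{m(H, t+1) = 0} = (1 - alpha)^M."""
        return (1 - alpha) ** M

    def two_sided_bounds(M, alpha, k):
        """Interval containing m(H, t+1) with probability >= 1 - 1/k**2."""
        mu = M * alpha
        sigma = sqrt(M * alpha * (1 - alpha))
        return mu - k * sigma, mu + k * sigma, 1 - 1 / k ** 2

    M, alpha = 100, 0.05
    print(extinction_prob(M, alpha))          # ~0.0059
    lo, hi, p = two_sided_bounds(M, alpha, k=2)
    print(f"{lo:.2f} < m(H, t+1) < {hi:.2f} with probability >= {p}")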

6
Lecture 5
  • Theorem 5.2.2 (Probabilistic Schema Theorem). For any given constant k > 0,

        Pr{ m(H, t+1) > Mα(H, t) − k√(Mα(H, t)(1 − α(H, t))) } ≥ 1 − 1/k².

  • Proof. This follows from the observation that Pr{X > μ − kσ} ≥ Pr{μ − kσ < X < μ + kσ}, since the first expression also contains the area of the right tail, while the second doesn't. The previous theorem does the rest.
  • Note: the results are valid in both GA and GP contexts; they depend only on the transmission probability.

7
Lecture 5
  • Stephens and Waelbroeck (1997, 1999). For a bit-string genetic algorithm over strings of length N, with one-point crossover applied with probability pxo,

        α(H, t) = (1 − pxo) p(H, t) + (pxo/(N − 1)) Σ_{i=1}^{N−1} p(L(H, i), t) p(R(H, i), t),

    where L(H, i) = the schema obtained by replacing all elements of H from position i+1 to N with don't care (*) symbols; R(H, i) = the schema obtained by replacing all elements of H from position 1 to i with don't care (*) symbols; and p(H, t) is the probability of selecting an individual matching schema H to be a parent, which, under fitness-proportionate reproduction, is given by

        p(H, t) = m(H, t) f(H, t) / (M f̄(t)).

  • Note: although α(H, t) and p(H, t) are both probabilities associated with the current generation, α(H, t) provides us with information about the next generation, while p(H, t) does not (except through this formula).
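
A minimal sketch of this recipe; the helper names are made up, and the selection-probability function p is assumed to be supplied (in practice it would be estimated from the current population):

    DONT_CARE = '*'

    def left_part(H, i):
        """L(H, i): keep positions 1..i of schema H, wildcard the rest."""
        return H[:i] + DONT_CARE * (len(H) - i)

    def right_part(H, i):
        """R(H, i): wildcard positions 1..i of schema H, keep the rest."""
        return DONT_CARE * i + H[i:]

    def alpha_ga(H, p, p_xo):
        """Exact GA transmission probability for schema H (a string over
        {0, 1, *}); p maps a schema to its selection probability."""
        N = len(H)
        xo = sum(p(left_part(H, i)) * p(right_part(H, i))
                 for i in range(1, N)) / (N - 1)
        return (1 - p_xo) * p(H) + p_xo * xo

    print(left_part('1*01', 2), right_part('1*01', 2))   # 1***  **01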

8
Lecture 5
  • Example of L(H, i) and R(H, i)

9
Lecture 5
  • How do we extend these results to GP (where things are more complicated)? Let's start with the simplest case: all programs are of exactly the same size and shape. Allowing for translation into GP, the formula becomes

        α(H, t) = (1 − pxo) p(H, t) + (pxo/N(H)) Σ_i p(l(H, i), t) p(u(H, i), t),

    where N(H) = number of nodes in schema H, which has the same size and shape as the programs in the population (so 1/N(H) is the probability of choosing any one of them for crossover); l(H, i) = the schema obtained by replacing all nodes above crossover point i with don't care symbols (the lower part of the schema); u(H, i) = the schema obtained by replacing all nodes below crossover point i with don't care symbols (the upper part of the schema); and the summation is over all valid crossover points.

10
Lecture 5
  • Example

11
Lecture 5
  • Note: the detailed internal structure of the formula depends on the specific ordering of the crossover points; the result does not. Thus we need not worry about how we numbered the crossover points, since it is the shape and size of the program tree alone that determines the result.
  • The formula would be identical to that for GAs if all the functions at the interior nodes were unary; see the text for a more detailed explanation.
  • We now need to extend all of this to populations of programs of arbitrary size and shape, with functions of arbitrary arity. It will take a while.

12
Lecture 5
  • Definition 5.4.1 (GP hyperschema). A GP hyperschema is a rooted tree composed of internal nodes from the set F ∪ {=} and leaves from T ∪ {=, #}. That is, the hyperschema function set is the function set used in a GP run plus =, and the hyperschema terminal set is the GP terminal set plus = and #. The operator = is a don't care symbol which stands for exactly one node, while the operator # stands for any valid subtree.
  • Note: the introduction of the second terminal operator, #, allows the represented trees to vary arbitrarily in size and shape.
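
A minimal sketch of hyperschema matching, with trees encoded as nested lists [label, child1, ...]; the encoding and the helper name are illustrative assumptions, not the book's notation:

    def matches(tree, schema):
        """True iff `tree` is an instance of the GP hyperschema `schema`.
        '=' matches any one node (same arity); '#' matches any subtree."""
        if schema == '#':
            return True
        s_label = schema[0] if isinstance(schema, list) else schema
        t_label = tree[0] if isinstance(tree, list) else tree
        if s_label != '=' and s_label != t_label:
            return False
        s_kids = schema[1:] if isinstance(schema, list) else []
        t_kids = tree[1:] if isinstance(tree, list) else []
        return len(s_kids) == len(t_kids) and all(
            matches(t, s) for t, s in zip(t_kids, s_kids))

    # (+ x (* x y)) is an instance of the hyperschema (= x #):
    print(matches(['+', 'x', ['*', 'x', 'y']], ['=', 'x', '#']))   # True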

13
Lecture 5
  • We have now restricted Altenberg's ideas and generalized those of Rosca and O'Reilly; this may be enough (if done correctly).
  • Theorem 5.4.1 (Microscopic Exact GP Schema Theorem). The total transmission probability for a fixed-size-and-shape GP schema H under one-point crossover and no mutation is

        α(H, t) = (1 − pxo) p(H, t) + pxo Σ_{h1} Σ_{h2} [p(h1, t) p(h2, t) / NC(h1, h2)] Σ_{i ∈ C(h1, h2)} δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)),

    where the first two summations are over all individuals in the population;
  • NC(h1, h2) is the number of nodes in the tree fragment representing the common region between program h1 and program h2;
  • C(h1, h2) is the set of indices of the crossover points in the common region;

14
Lecture 5
  • δ(x) is a function that returns 1 if x is true and 0 otherwise;
  • L(H, i) is the hyperschema obtained by replacing all nodes on the path between crossover point i and the root node with = nodes, and all the subtrees connected to those nodes with # nodes;
  • U(H, i) is the hyperschema obtained by replacing the subtree below crossover point i with a # node.
  • Example: start with a schema H = (* = (+ x =)) and examine the figure on the next slide.
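
One reading of these two constructions as code (a sketch; trees are nested lists again, and a crossover point i is identified by a path of child indices from the root; all names are illustrative):

    def hyper_U(H, path):
        """U(H, i): replace the subtree below crossover point i with '#'."""
        if not path:
            return '#'
        node = list(H)
        node[path[0] + 1] = hyper_U(node[path[0] + 1], path[1:])
        return node

    def hyper_L(H, path):
        """L(H, i): keep the subtree at i; replace each node on the path
        from the root down to i with '=', and the subtrees hanging off
        that path with '#'."""
        if not path:
            return H
        kids = H[1:]
        new_kids = ['#'] * len(kids)
        new_kids[path[0]] = hyper_L(kids[path[0]], path[1:])
        return ['='] + new_kids

    H = ['*', '=', ['+', 'x', '=']]
    print(hyper_U(H, (1,)))   # ['*', '=', '#']
    print(hyper_L(H, (1,)))   # ['=', '#', ['+', 'x', '=']]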

15
Lecture 5
  • Ex. H = (* = (+ x =))

16
Lecture 5
  • Proof. Let p(h1, h2, i, t) be the probability that, at generation t, the selection/crossover process will select h1 and h2 as parents and i as the crossover point. Consider

        g(h1, h2, i, H) = δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)).

  • Given two parents, a schema and a crossover point, g returns 1 if crossing over h1 and h2 at position i yields an offspring which is an instance of H, and 0 otherwise. If h1, h2 and i are random variables with joint probability distribution (dependent on the generation) given by p(h1, h2, i, t), the expected value of g is just the sum

        E[g] = Σ_{h1} Σ_{h2} Σ_i δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)) p(h1, h2, i, t).

  • Now we need to find a usable expression for p(h1, h2, i, t).

17
Lecture 5
  • We add a reasonable assumption: the choice of crossover point is independent of the generation, and depends only on the two parents. Using conditional probabilities, we can write

        p(h1, h2, i, t) = p(i | h1, h2) p(h1, h2, t),

    where p(h1, h2, t) is the probability of choosing h1 and h2 at generation t (and must be generation dependent), while p(i | h1, h2) is just the conditional probability of choosing crossover point i, given that we have h1 and h2 as parents for crossover. Since the parents are chosen independently (selection with replacement), we can rewrite the formula as

        p(h1, h2, i, t) = p(i | h1, h2) p(h1, t) p(h2, t),

    where p(h1, t) and p(h2, t) are the selection probabilities for the parents.

18
Lecture 5
  • Also, in one-point crossover the crossover point is chosen uniformly in the common region, so p(i | h1, h2) = 1/NC(h1, h2), and therefore

        E[g] = Σ_{h1} Σ_{h2} [p(h1, t) p(h2, t) / NC(h1, h2)] Σ_{i ∈ C(h1, h2)} δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)).

  • When we substitute this into the summation, and adjust for the cloning/crossover possibilities, we obtain the formula of the theorem.
  • We now have an expression for the total transmission probability α(H, t) of a GP schema. All the derivative results (expectation, variance, S/N ratio, extinction probability) can also be obtained.
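
Putting the pieces together, a brute-force sketch of Theorem 5.4.1 for a toy population; it reuses matches, hyper_L and hyper_U from the earlier sketches, and the selection probabilities are assumed to be supplied:

    def arity(t):
        return len(t) - 1 if isinstance(t, list) else 0

    def common_region(h1, h2, path=()):
        """Paths of the crossover points in the common region of h1 and h2:
        the root is always shared; children are shared where arities agree."""
        paths = [path]
        if arity(h1) == arity(h2):
            for j in range(arity(h1)):
                paths += common_region(h1[j + 1], h2[j + 1], path + (j,))
        return paths

    def has_node(t, path):
        """True iff `path` addresses an existing node of tree t."""
        return (not path) or (isinstance(t, list) and path[0] < arity(t)
                              and has_node(t[path[0] + 1], path[1:]))

    def alpha_gp(H, pop, p_xo):
        """Total transmission probability of schema H per Theorem 5.4.1;
        pop is a list of (program, selection probability) pairs."""
        a = (1 - p_xo) * sum(p for h, p in pop if matches(h, H))  # cloning
        for h1, p1 in pop:
            for h2, p2 in pop:
                C = common_region(h1, h2)   # NC(h1, h2) = len(C)
                hits = sum(1 for i in C
                           if has_node(H, i)            # i outside H: delta = 0
                           and matches(h1, hyper_L(H, i))
                           and matches(h2, hyper_U(H, i)))
                a += p_xo * p1 * p2 * hits / len(C)
        return a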

19
Lecture 5
  • Our final schema theorem will use the set of all possible hyperschemata. We can certainly number all possible program shapes (schemata of order 0): G0, G1, G2, ... There is no canonical ordering, but the number of program shapes (over finite function and terminal alphabets) of each depth is finite, so the numbering up to the shapes of any finite depth is, in principle, possible. The hyperschemata Gj represent disjoint sets of programs, and their union represents the whole search space. In particular, any program h1 belongs to exactly one Gj, so that Σ_j δ(h1 ∈ Gj) = 1. We start by modifying the second term in the previous formula (recall that all sums are finite).

20
Lecture 5
  • Rearranging the summations (and noting that the common region depends only on the shapes of the parents), the crossover term becomes

        pxo Σ_j Σ_k [1/NC(Gj, Gk)] Σ_{i ∈ C(Gj, Gk)} [Σ_{h1} δ(h1 ∈ Gj) δ(h1 ∈ L(H, i)) p(h1, t)] [Σ_{h2} δ(h2 ∈ Gk) δ(h2 ∈ U(H, i)) p(h2, t)].

21
Lecture 5
  • The expression Σ_{h1} δ(h1 ∈ Gj) δ(h1 ∈ L(H, i)) p(h1, t) is simply the probability that we extract a program h1 which both belongs to the hyperschema Gj and matches a specific generalization of the schema H. It can be restated as p(L(H, i) ∩ Gj, t). Using this simplification, we have
  • Theorem 5.4.3 (Macroscopic Exact GP Schema Theorem for One-Point Crossover). The total transmission probability for a fixed-size-and-shape GP schema H under one-point crossover and no mutation is

        α(H, t) = (1 − pxo) p(H, t) + pxo Σ_j Σ_k [1/NC(Gj, Gk)] Σ_{i ∈ C(Gj, Gk)} p(L(H, i) ∩ Gj, t) p(U(H, i) ∩ Gk, t),

    where the sets L(H, i) ∩ Gj and U(H, i) ∩ Gk either are (or can be represented by) fixed-size-and-shape schemata or are ∅.

22
Lecture 5
  • The textbook has several examples where the computations are carried out, showing that it would be feasible to compare theoretical predictions to actual runs. We end this series of theoretical lectures with a final idea: that of Effective Fitness.
  • Effective fitness is a measure of the reproductive success of a schema that incorporates the creative and disruptive effects of the genetic operators. Without any mutation or crossover, fitness-proportionate reproduction leads to the equation

        E[m(H, t+1)] = m(H, t) f(H, t) / f̄(t),

    where all the terms have their usual meaning; in particular, f(H, t) is the fitness of the schema H in generation t and f̄(t) is the average fitness of the population. We want an equation of the same form that remains valid when the operators act:

        E[m(H, t+1)] = m(H, t) feff(H, t) / f̄(t).

23
Lecture 5
  • Recalling from previous computations that E[m(H, t+1)] = Mα(H, t), and that

        p(H, t) = m(H, t) f(H, t) / (M f̄(t)),

    where p(H, t) is the probability of selecting schema H for reproduction under fitness-proportionate selection, we have

        feff(H, t) = (α(H, t) / p(H, t)) f(H, t).

    Since α(H, t) is computable via the formulae just derived, we can compute effective fitness (at least for fixed-size-and-shape schemata under one-point crossover and no mutation).
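
As a sketch (the numbers are invented): a schema that the operators create more often than selection alone would pick it behaves as if it were fitter.

    def effective_fitness(alpha, p_sel, f):
        """feff(H, t) = (alpha(H, t) / p(H, t)) * f(H, t)."""
        return alpha / p_sel * f

    # alpha > p: crossover creates extra instances of H, so the schema
    # propagates as if its fitness were 12 rather than 10
    print(effective_fitness(alpha=0.06, p_sel=0.05, f=10.0))   # 12.0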

24
Lecture 5
  • A similar formula can be obtained for genetic algorithm schemata, by modifying the earlier result of Stephens and Waelbroeck.
  • Our text has a discussion comparing the meanings of fitness and effective fitness; it appears that effective fitness can provide at least some plausible explanations for certain aspects of computational evolution (in particular, program bloat in GP).
  • It is plausible to conclude that effective fitness, more than fitness itself, is what drives propagation down the generations, and that the search is shaped at least as much by effective fitness as by the fitness formula applied statically to the individuals of each generation.