Exact GP Schema Theorems
1
Lecture 5
  • Exact GP Schema Theorems. We now turn to Ch. 5 and the claim that, finally, schema theorems can be made precise enough to provide a reliable predictive (at least theoretical) tool. The primary tool is the capability of accounting for schema creation (recall the results of Altenberg, which were really too complex to be useful, and those of Goldberg, used in the study of deception).
  • Let H denote a schema over some population, and let α(H, t) denote the schema transmission probability: the probability that, at generation t, the individual produced by selection, cloning, crossover and mutation will match schema H.
  • The previous chapters show, if nothing else, that computing α(H, t) is difficult. Let's start from the other end: if α(H, t) is known, what becomes easy?

2
Lecture 5
  • The first observation is that if each selection/crossover event is independent (usually so in GA and GP contexts), then each child has the same probability of matching H. But this means that m(H, t+1), the number of instances of schema H at time t+1, is binomially distributed: the probability that the population contains exactly k instances (in M trials) of the schema at generation t+1 is given by

        Pr{m(H, t+1) = k} = C(M, k) α(H, t)^k (1 − α(H, t))^(M−k).
  • (For a quick introduction to Probability and Queuing Theory you may want to check some of the materials in giam/Mikkeli/Lecture5.ppt, ../IntroToProbabilityAndApplications.pdf, and ../IntroQueTheo.ppt.)
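
As a quick illustration, here is a minimal sketch of this binomial model in Python (the population size M and transmission probability α are made-up numbers):

    from math import comb

    def schema_count_pmf(k, M, alpha):
        """Pr{m(H, t+1) = k}: binomial probability that exactly k of the
        M independently generated children match schema H."""
        return comb(M, k) * alpha ** k * (1 - alpha) ** (M - k)

    # toy numbers: population of 100, transmission probability 0.05
    M, alpha = 100, 0.05
    print(schema_count_pmf(5, M, alpha))                              # most likely count
    print(sum(schema_count_pmf(k, M, alpha) for k in range(M + 1)))   # sums to 1.0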

3
Lecture 5
  • We can compute the expected number of instances, E[m(H, t+1)] = Mα(H, t), as well as the variance, Var[m(H, t+1)] = Mα(H, t)(1 − α(H, t)) (see IntroToProbabilityAndApplications.pdf for details).
  • The schema transmission probability at generation t corresponds to the expected proportion of the population sampling the schema H at generation t+1.
  • Another useful idea is that of the signal-to-noise ratio for a schema: the mean over the standard deviation, E[m]/√Var[m] = √(Mα(H, t)/(1 − α(H, t))).
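
A small sketch of these three quantities (same made-up numbers as before):

    from math import sqrt

    def schema_stats(M, alpha):
        """Mean, variance and signal-to-noise ratio of the binomially
        distributed schema count m(H, t+1)."""
        mean = M * alpha
        var = M * alpha * (1 - alpha)
        snr = mean / sqrt(var)      # equals sqrt(M * alpha / (1 - alpha))
        return mean, var, snr

    print(schema_stats(100, 0.05))  # (5.0, 4.75, ~2.29)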

4
Lecture 5
  • We can also compute the probability of schema extinction in one generation:

        Pr{m(H, t+1) = 0} = (1 − α(H, t))^M.

  • We now turn to another result that relates means and variances.
  • Theorem (Chebyshev's Inequality). Let X be a random variable with mean μ = E(X) and variance σ² = Var(X). Then, for any t > 0,

        Pr{|X − μ| ≥ t} ≤ σ²/t².

  • Proof. Feller, p. 219.
  • If we take t = kσ and use the assumption that m(H, t+1) is binomially distributed with mean Mα(H, t) and variance Mα(H, t)(1 − α(H, t)), we can rewrite Chebyshev's Inequality to provide:

5
Lecture 5
  • Theorem 5.2.1 (Two-Sided Probabilistic Schema Theorem). For any given constant k > 0,

        Pr{ Mα(H, t) − k√(Mα(H, t)(1 − α(H, t))) < m(H, t+1) < Mα(H, t) + k√(Mα(H, t)(1 − α(H, t))) } ≥ 1 − 1/k².

  • Proof. Let t = kσ. The previous theorem, with X replaced by m(H, t+1), gives Pr{|m(H, t+1) − μ| ≥ kσ} ≤ 1/k². Turning the inequality around, we obtain immediately Pr{|m(H, t+1) − μ| < kσ} ≥ 1 − 1/k². Replacing μ by Mα(H, t) and σ by √(Mα(H, t)(1 − α(H, t))) gives the result.
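
To make the bounds concrete, a short sketch (toy numbers again) computing the one-generation extinction probability and the Chebyshev interval of Theorem 5.2.1:

    from math import sqrt

    def extinction_prob(M, alpha):
        """Pr{m(H, t+1) = 0} = (1 - alpha)^M."""
        return (1 - alpha) ** M

    def two_sided_bounds(M, alpha, k):
        """Interval containing m(H, t+1) with probability >= 1 - 1/k**2."""
        mu = M * alpha
        sigma = sqrt(M * alpha * (1 - alpha))
        return mu - k * sigma, mu + k * sigma, 1 - 1 / k ** 2

    M, alpha = 100, 0.05
    print(extinction_prob(M, alpha))          # ~0.0059
    lo, hi, p = two_sided_bounds(M, alpha, k=2)
    print(f"{lo:.2f} < m(H, t+1) < {hi:.2f} with probability >= {p}")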

6
Lecture 5
  • Theorem 5.2.2 (Probabilistic Schema Theorem). For any given constant k > 0,

        Pr{ m(H, t+1) > Mα(H, t) − k√(Mα(H, t)(1 − α(H, t))) } ≥ 1 − 1/k².

  • Proof. This follows from the observation that Pr{X > μ − kσ} ≥ Pr{μ − kσ < X < μ + kσ}, since the first expression also contains the area of the right tail, while the second doesn't. The previous theorem does the rest.
  • Note: the results are valid in both GA and GP contexts; they depend only on the transmission probability.

7
Lecture 5
  • Stephens and Waelbroeck (1997, 1999). For a bit-string genetic algorithm over strings of length N, with one-point crossover applied with probability pxo,

        α(H, t) = (1 − pxo) p(H, t) + (pxo/(N − 1)) Σ_{i=1}^{N−1} p(L(H, i), t) p(R(H, i), t),

    where L(H, i) = the schema obtained by replacing all elements of H from position i+1 to N with don't care (*) symbols; R(H, i) = the schema obtained by replacing all elements of H from position 1 to i with don't care (*) symbols; and p(H, t) is the probability of selecting an individual matching schema H to be a parent, which, under fitness-proportionate reproduction, is given by

        p(H, t) = m(H, t) f(H, t) / (M f̄(t)).

  • Note: although α(H, t) and p(H, t) are both probabilities associated with the current generation, α(H, t) provides us with information about the next generation, while p(H, t) does not (except through this formula).
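
A minimal sketch of this recipe; the helper names are made up, and the selection-probability function p is assumed to be supplied (in practice it would be estimated from the current population):

    DONT_CARE = '*'

    def left_part(H, i):
        """L(H, i): keep positions 1..i of schema H, wildcard the rest."""
        return H[:i] + DONT_CARE * (len(H) - i)

    def right_part(H, i):
        """R(H, i): wildcard positions 1..i of schema H, keep the rest."""
        return DONT_CARE * i + H[i:]

    def alpha_ga(H, p, p_xo):
        """Exact GA transmission probability for schema H (a string over
        {0, 1, *}); p maps a schema to its selection probability."""
        N = len(H)
        xo = sum(p(left_part(H, i)) * p(right_part(H, i))
                 for i in range(1, N)) / (N - 1)
        return (1 - p_xo) * p(H) + p_xo * xo

    print(left_part('1*01', 2), right_part('1*01', 2))   # 1***  **01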

8
Lecture 5
  • Example of L(H, i) and R(H, i)

9
Lecture 5
  • How do we extend these results to GP (where things are more complicated)? Let's start with the simplest case: all programs are of exactly the same size and shape. Allowing for translation into GP, the formula becomes

        α(H, t) = (1 − pxo) p(H, t) + (pxo/N(H)) Σ_i p(l(H, i), t) p(u(H, i), t),

    where N(H) = number of nodes in schema H, which has the same size and shape as the programs in the population (so 1/N(H) is the probability of choosing any one of them for crossover); l(H, i) = the schema obtained by replacing all nodes above crossover point i with don't care symbols (the lower part of the schema); u(H, i) = the schema obtained by replacing all nodes below crossover point i with don't care symbols (the upper part of the schema); and the summation is over all valid crossover points.

10
Lecture 5
  • Example

11
Lecture 5
  • Note: the detailed internal structure of the formula depends on the specific ordering of the crossover points; the result does not. Thus we need not worry about how we numbered the crossover points, since it is the shape and size of the program tree alone that determines the result.
  • The formula would be identical to that for GAs if all the functions at the interior nodes were unary; see the text for a more detailed explanation.
  • We now need to extend all of this to populations of programs of arbitrary size and shape, with functions of arbitrary arity. It will take a while.

12
Lecture 5
  • Definition 5.4.1 (GP hyperschema). A GP hyperschema is a rooted tree composed of internal nodes from the set F ∪ {=} and leaves from T ∪ {=, #}. That is, the hyperschema function set is the function set used in a GP run plus =, and the hyperschema terminal set is the GP terminal set plus = and #. The operator = is a don't care symbol which stands for exactly one node, while the operator # stands for any valid subtree.
  • Note: the introduction of the second terminal operator, #, allows the represented trees to vary arbitrarily in size and shape.
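
A minimal sketch of hyperschema matching, with trees encoded as nested lists [label, child1, ...]; the encoding and the helper name are illustrative assumptions, not the book's notation:

    def matches(tree, schema):
        """True iff `tree` is an instance of the GP hyperschema `schema`.
        '=' matches any one node (same arity); '#' matches any subtree."""
        if schema == '#':
            return True
        s_label = schema[0] if isinstance(schema, list) else schema
        t_label = tree[0] if isinstance(tree, list) else tree
        if s_label != '=' and s_label != t_label:
            return False
        s_kids = schema[1:] if isinstance(schema, list) else []
        t_kids = tree[1:] if isinstance(tree, list) else []
        return len(s_kids) == len(t_kids) and all(
            matches(t, s) for t, s in zip(t_kids, s_kids))

    # (+ x (* x y)) is an instance of the hyperschema (= x #):
    print(matches(['+', 'x', ['*', 'x', 'y']], ['=', 'x', '#']))   # True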

13
Lecture 5
  • We have now restricted Altenberg's ideas and generalized those of Rosca and O'Reilly; this may be enough (if done correctly).
  • Theorem 5.4.1 (Microscopic Exact GP Schema Theorem). The total transmission probability for a fixed-size-and-shape GP schema H under one-point crossover and no mutation is

        α(H, t) = (1 − pxo) p(H, t) + pxo Σ_{h1} Σ_{h2} [p(h1, t) p(h2, t) / NC(h1, h2)] Σ_{i ∈ C(h1, h2)} δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)),

    where the first two summations are over all individuals in the population;
  • NC(h1, h2) is the number of nodes in the tree fragment representing the common region between program h1 and program h2;
  • C(h1, h2) is the set of indices of the crossover points in the common region;

14
Lecture 5
  • δ(x) is a function that returns 1 if x is true and 0 otherwise;
  • L(H, i) is the hyperschema obtained by replacing all nodes on the path between crossover point i and the root node with = nodes, and all the subtrees connected to those nodes with # nodes;
  • U(H, i) is the hyperschema obtained by replacing the subtree below crossover point i with a # node.
  • Example: start with a schema H = (* = (+ x =)) and examine the figure on the next slide.
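
One reading of these two constructions as code (a sketch; trees are nested lists again, and a crossover point i is identified by a path of child indices from the root; all names are illustrative):

    def hyper_U(H, path):
        """U(H, i): replace the subtree below crossover point i with '#'."""
        if not path:
            return '#'
        node = list(H)
        node[path[0] + 1] = hyper_U(node[path[0] + 1], path[1:])
        return node

    def hyper_L(H, path):
        """L(H, i): keep the subtree at i; replace each node on the path
        from the root down to i with '=', and the subtrees hanging off
        that path with '#'."""
        if not path:
            return H
        kids = H[1:]
        new_kids = ['#'] * len(kids)
        new_kids[path[0]] = hyper_L(kids[path[0]], path[1:])
        return ['='] + new_kids

    H = ['*', '=', ['+', 'x', '=']]
    print(hyper_U(H, (1,)))   # ['*', '=', '#']
    print(hyper_L(H, (1,)))   # ['=', '#', ['+', 'x', '=']]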

15
Lecture 5
  • Ex. H = (* = (+ x =))

16
Lecture 5
  • Proof. Let p(h1, h2, i, t) be the probability that, at generation t, the selection/crossover process will select h1 and h2 as parents and i as the crossover point. Consider

        g(h1, h2, i, H) = δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)).

  • Given two parents, a schema and a crossover point, g returns 1 if crossing over h1 and h2 at position i yields an offspring which is an instance of H, and 0 otherwise. If h1, h2 and i are random variables with joint probability distribution (dependent on the generation) given by p(h1, h2, i, t), the expected value of g is just the sum

        E[g] = Σ_{h1} Σ_{h2} Σ_i δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)) p(h1, h2, i, t).

  • Now we need to find a usable expression for p(h1, h2, i, t).

17
Lecture 5
  • We add a reasonable assumption: the choice of crossover point is independent of the generation, and depends only on the two parents. Using conditional probabilities, we can write

        p(h1, h2, i, t) = p(i | h1, h2) p(h1, h2, t),

    where p(h1, h2, t) is the probability of choosing h1 and h2 at generation t (and must be generation dependent), while p(i | h1, h2) is just the conditional probability of choosing crossover point i, given that we have h1 and h2 as parents for crossover. Since the parents are chosen independently (selection with replacement), we can rewrite the formula as

        p(h1, h2, i, t) = p(i | h1, h2) p(h1, t) p(h2, t),

    where p(h1, t) and p(h2, t) are the selection probabilities for the parents.

18
Lecture 5
  • Also, in one-point crossover the crossover point is chosen uniformly in the common region, so p(i | h1, h2) = 1/NC(h1, h2), and therefore

        E[g] = Σ_{h1} Σ_{h2} [p(h1, t) p(h2, t) / NC(h1, h2)] Σ_{i ∈ C(h1, h2)} δ(h1 ∈ L(H, i)) δ(h2 ∈ U(H, i)).

  • When we substitute this into the summation, and adjust for the cloning/crossover possibilities, we obtain the formula of the theorem.
  • We now have an expression for the total transmission probability α(H, t) of a GP schema. All the derivative results (expectation, variance, S/N ratio, extinction probability) can also be obtained.
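
Putting the pieces together, a brute-force sketch of Theorem 5.4.1 for a toy population; it reuses matches, hyper_L and hyper_U from the earlier sketches, and the selection probabilities are assumed to be supplied:

    def arity(t):
        return len(t) - 1 if isinstance(t, list) else 0

    def common_region(h1, h2, path=()):
        """Paths of the crossover points in the common region of h1 and h2:
        the root is always shared; children are shared where arities agree."""
        paths = [path]
        if arity(h1) == arity(h2):
            for j in range(arity(h1)):
                paths += common_region(h1[j + 1], h2[j + 1], path + (j,))
        return paths

    def has_node(t, path):
        """True iff `path` addresses an existing node of tree t."""
        return (not path) or (isinstance(t, list) and path[0] < arity(t)
                              and has_node(t[path[0] + 1], path[1:]))

    def alpha_gp(H, pop, p_xo):
        """Total transmission probability of schema H per Theorem 5.4.1;
        pop is a list of (program, selection probability) pairs."""
        a = (1 - p_xo) * sum(p for h, p in pop if matches(h, H))  # cloning
        for h1, p1 in pop:
            for h2, p2 in pop:
                C = common_region(h1, h2)   # NC(h1, h2) = len(C)
                hits = sum(1 for i in C
                           if has_node(H, i)            # i outside H: delta = 0
                           and matches(h1, hyper_L(H, i))
                           and matches(h2, hyper_U(H, i)))
                a += p_xo * p1 * p2 * hits / len(C)
        return a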

19
Lecture 5
  • Our final schema theorem will use the set of all possible hyperschemata. We can certainly number all possible program shapes (schemata of order 0): G0, G1, G2, ... There is no canonical ordering, but the number of program shapes (over finite function and terminal alphabets) of each depth is finite, so the numbering up to the shapes of any finite depth is, in principle, possible. The hyperschemata Gj represent disjoint sets of programs, and their union represents the whole search space. In particular, any program h1 belongs to exactly one Gj, so that Σ_j δ(h1 ∈ Gj) = 1. We start by modifying the second term in the previous formula (recall that all sums are finite).

20
Lecture 5
  • Rearranging the summations (and noting that the common region depends only on the shapes of the parents), the crossover term becomes

        pxo Σ_j Σ_k [1/NC(Gj, Gk)] Σ_{i ∈ C(Gj, Gk)} [Σ_{h1} δ(h1 ∈ Gj) δ(h1 ∈ L(H, i)) p(h1, t)] [Σ_{h2} δ(h2 ∈ Gk) δ(h2 ∈ U(H, i)) p(h2, t)].

21
Lecture 5
  • The expression Σ_{h1} δ(h1 ∈ Gj) δ(h1 ∈ L(H, i)) p(h1, t) is simply the probability that we extract a program h1 which both belongs to the hyperschema Gj and matches a specific generalization of the schema H. It can be restated as p(L(H, i) ∩ Gj, t). Using this simplification, we have
  • Theorem 5.4.3 (Macroscopic Exact GP Schema Theorem for One-Point Crossover). The total transmission probability for a fixed-size-and-shape GP schema H under one-point crossover and no mutation is

        α(H, t) = (1 − pxo) p(H, t) + pxo Σ_j Σ_k [1/NC(Gj, Gk)] Σ_{i ∈ C(Gj, Gk)} p(L(H, i) ∩ Gj, t) p(U(H, i) ∩ Gk, t),

    where the sets L(H, i) ∩ Gj and U(H, i) ∩ Gk either are (or can be represented by) fixed-size-and-shape schemata or are ∅.

22
Lecture 5
  • The textbook has several examples where the computations are carried out, showing that it would be feasible to compare theoretical predictions to actual runs. We end this series of theoretical lectures with a final idea: that of Effective Fitness.
  • Effective fitness is a measure of the reproductive success of a schema that incorporates the creative and disruptive effects of the genetic operators. Without any mutation or crossover, fitness-proportionate reproduction leads to the equation

        E[m(H, t+1)] = m(H, t) f(H, t) / f̄(t),

    where all the terms have their usual meaning; in particular, f(H, t) is the fitness of the schema H in generation t and f̄(t) is the average fitness of the population. We want an equation of the same form that remains valid when the operators act:

        E[m(H, t+1)] = m(H, t) feff(H, t) / f̄(t).

23
Lecture 5
  • Recalling from previous computations that E[m(H, t+1)] = Mα(H, t), and that

        p(H, t) = m(H, t) f(H, t) / (M f̄(t)),

    where p(H, t) is the probability of selecting schema H for reproduction under fitness-proportionate selection, we have

        feff(H, t) = (α(H, t) / p(H, t)) f(H, t).

    Since α(H, t) is computable via the formulae just derived, we can compute effective fitness (at least for fixed-size-and-shape schemata under one-point crossover and no mutation).
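
As a sketch (the numbers are invented): a schema that the operators create more often than selection alone would pick it behaves as if it were fitter.

    def effective_fitness(alpha, p_sel, f):
        """feff(H, t) = (alpha(H, t) / p(H, t)) * f(H, t)."""
        return alpha / p_sel * f

    # alpha > p: crossover creates extra instances of H, so the schema
    # propagates as if its fitness were 12 rather than 10
    print(effective_fitness(alpha=0.06, p_sel=0.05, f=10.0))   # 12.0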

24
Lecture 5
  • A similar formula can be obtained for genetic algorithm schemata, by modifying the earlier result of Stephens and Waelbroeck.
  • Our text has a discussion comparing the meanings of fitness and effective fitness; it appears that effective fitness can provide at least some plausible explanations for certain aspects of computational evolution (in particular, program bloat in GP).
  • It is plausible to conclude that effective fitness, more than fitness itself, is what drives propagation down the generations, and that the search is shaped at least as much by effective fitness as by the fitness formula applied statically to the individuals of each generation.