Title: Geometric Crossover for Biological Sequences
EuroGP 2006
Alberto Moraglio, Riccardo Poli, Rolv Seehuus
Contents
- Geometric Crossover
- Geometric Crossover for Sequences
- Is Biological Recombination Geometric?
I. Geometric Crossover
Geometric Crossover
- Representation-independent generalization of traditional crossover
- Informally: all offspring are between the parents
- Search space: all offspring lie on shortest paths connecting the parents
Geometric Crossover and Distance
- Search space is a metric space: $d(A,B)$ = length of the shortest paths between A and B
- Metric space: all offspring C are in the segment between the parents
- $C \in [A,B]_d \iff d(A,C) + d(C,B) = d(A,B)$
Example 1: Traditional Crossover
- Traditional crossover is geometric crossover under Hamming distance
Parent 1: 011101, Parent 2: 010111, Child: 011111
$HD(P_1,C) + HD(C,P_2) = HD(P_1,P_2)$: $1 + 1 = 2$
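To make the segment condition concrete, here is a minimal Python sketch (the helper names `hamming`, `mask_crossover` and `in_segment` are our own, hypothetical) that checks the example above and then confirms, for random masks, that mask-based crossover always produces offspring in the Hamming segment. It assumes the convention that a mask bit of 1 copies from Parent 1.

```python
import random


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def mask_crossover(p1: str, p2: str, mask: str) -> str:
    """Traditional (mask-based) crossover: '1' copies from p1, '0' from p2."""
    return "".join(x if m == "1" else y for x, y, m in zip(p1, p2, mask))


def in_segment(p1: str, c: str, p2: str) -> bool:
    """C lies in the Hamming segment [P1, P2] iff HD(P1,C) + HD(C,P2) == HD(P1,P2)."""
    return hamming(p1, c) + hamming(c, p2) == hamming(p1, p2)


p1, p2, child = "011101", "010111", "011111"
print(hamming(p1, child), hamming(child, p2), hamming(p1, p2))  # 1 1 2
print(in_segment(p1, child, p2))  # True
print(mask_crossover(p1, p2, "111101"))  # 011111

# Any mask yields an offspring in the segment, not just the one above.
rng = random.Random(0)
for _ in range(1000):
    mask = "".join(rng.choice("01") for _ in p1)
    assert in_segment(p1, mask_crossover(p1, p2, mask), p2)
```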
Example 2: Blending Crossover
- Blending crossover for real vectors is geometric under Euclidean distance
$ED(P_1,C) + ED(C,P_2) = ED(P_1,P_2)$
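A minimal Python sketch of the Euclidean case, assuming blending crossover means the line form $C = \lambda P_1 + (1-\lambda) P_2$ with a single $\lambda \in [0,1]$ per offspring (helper names are hypothetical). Variants that sample each coordinate independently or extrapolate beyond the parents generally leave the Euclidean segment, which the last line illustrates.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length real vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def blend(p1, p2, lam):
    """Offspring on the straight line between parents: lam*p1 + (1-lam)*p2, lam in [0, 1]."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(p1, p2)]


def in_segment(p1, c, p2, tol=1e-9):
    """ED(P1,C) + ED(C,P2) == ED(P1,P2), up to floating-point tolerance."""
    return math.isclose(euclidean(p1, c) + euclidean(c, p2), euclidean(p1, p2), abs_tol=tol)


rng = random.Random(42)
p1, p2 = [0.0, 1.0, 2.0], [3.0, -1.0, 5.0]
for _ in range(1000):
    child = blend(p1, p2, rng.random())
    assert in_segment(p1, child, p2)

# Taking coordinates from different parents generally leaves the Euclidean segment.
box_child = [3.0, 1.0, 2.0]  # hypothetical offspring: first coordinate from p2, rest from p1
print(in_segment(p1, box_child, p2))  # False
```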
Many Recombinations are Geometric
- Traditional Crossover for multary strings
- Box and Discrete recombinations for real vectors
- PMX, Cycle and Order Crossovers for permutations
- Homologous Crossover for GP trees
- Ask me for more examples over a coffee!
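The slide does not name the distance for box and discrete recombination. Assuming Manhattan distance, the segment between two real vectors is the axis-aligned box they span (every coordinate of C must lie between the parents' coordinates for $d(A,C) + d(C,B) = d(A,B)$ to hold), so both operators stay in the segment. A minimal Python check, with hypothetical helper names:

```python
import random


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length real vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def box_recombination(p1, p2, rng):
    """Each coordinate drawn uniformly between the parents' values."""
    return [rng.uniform(min(x, y), max(x, y)) for x, y in zip(p1, p2)]


def discrete_recombination(p1, p2, rng):
    """Each coordinate copied from one parent chosen at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]


def in_segment(p1, c, p2, tol=1e-9):
    """Under Manhattan distance the segment [P1, P2] is the axis-aligned box."""
    return abs(manhattan(p1, c) + manhattan(c, p2) - manhattan(p1, p2)) <= tol


rng = random.Random(7)
p1, p2 = [0.0, 4.0, -2.0], [3.0, 1.0, 5.0]
for _ in range(1000):
    assert in_segment(p1, box_recombination(p1, p2, rng), p2)
    assert in_segment(p1, discrete_recombination(p1, p2, rng), p2)
```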
Being geometric crossover is important because:
- We know how geometric crossover will search the search space, for any representation: convex search
- We have a rule of thumb for the type of landscape on which geometric crossover will perform well: smooth landscapes
- This is just the beginning of a general theory; in the future we will know more!
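One consequence of convex search can be illustrated under Hamming distance: since every offspring lies in the segment between its parents, crossover alone can never change a position on which all individuals of the population agree. A small Python sketch (function names are hypothetical; uniform crossover without mutation is our illustrative choice):

```python
import random


def uniform_crossover(p1: str, p2: str, rng: random.Random) -> str:
    """Traditional uniform crossover: each bit copied from a randomly chosen parent."""
    return "".join(rng.choice((x, y)) for x, y in zip(p1, p2))


def agreed_positions(population):
    """Map position -> bit for every position on which all individuals agree."""
    first = population[0]
    return {i: b for i, b in enumerate(first) if all(ind[i] == b for ind in population)}


rng = random.Random(1)
population = ["1101001", "1001101", "1100011", "1001001"]
fixed_before = agreed_positions(population)
print(fixed_before)  # {0: '1', 2: '0', 6: '1'}

# Crossover-only evolution: no mutation, parents picked at random each time.
for _ in range(50):
    population = [uniform_crossover(rng.choice(population), rng.choice(population), rng)
                  for _ in range(len(population))]

fixed_after = agreed_positions(population)
# Every position fixed at the start is still fixed, with the same bit.
assert all(fixed_after.get(i) == b for i, b in fixed_before.items())
```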
II. Geometric Crossover for Sequences
Sequences and Edit Distance
- Sequence: variable-length string of characters from an alphabet A
- Edit distance: minimum number of edit operations (insertion, deletion, substitution) needed to transform one sequence into the other
- A = {a, c, t, g}, Seq1 = agcacaca, Seq2 = acacacta
- Seq1 = agcacaca → acacaca → acacacta = Seq2
- $ED(Seq_1, Seq_2) = 2$ (g deleted, t inserted)
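A minimal Python sketch of unit-cost edit distance using the standard Wagner-Fischer dynamic programming recurrence (the function name `edit_distance` is our own), reproducing the numbers in the example:

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and
    substitutions (unit cost each) turning s into t, by dynamic programming."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, a in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,              # delete a
                            curr[j - 1] + 1,          # insert b
                            prev[j - 1] + (a != b)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


print(edit_distance("agcacaca", "acacacta"))  # 2
print(edit_distance("agcacaca", "acacaca"))   # 1 (g deleted)
print(edit_distance("acacaca", "acacacta"))   # 1 (t inserted)
```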
Sequence Alignment (on Contents)
- Alignment: insert spaces (-) into both sequences so that they become the same length
- Seq1: agcacac-a
- Seq2: a-cacacta
- Alignment score: number of mismatches = 2
- Optimal alignment: the alignment with minimal score (best inexact alignment on contents)
- The score of the optimal alignment of two sequences equals their edit distance: $ED(Seq_1, Seq_2) = Score(A) = 2$, where A is the optimal alignment
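The link between optimal alignment and edit distance can be seen by tracing back through the edit-distance table: each step of an optimal path becomes one column of an alignment. A minimal Python sketch (the names `align` and `score` are hypothetical; ties are broken arbitrarily, so the alignment it returns may differ from the one above while having the same score):

```python
def align(s: str, t: str, gap: str = "-"):
    """Return one optimal alignment of s and t under unit-cost edit distance,
    as two equal-length strings in which gap marks a space."""
    n, m = len(s), len(t)
    # D[i][j] = edit distance between the prefixes s[:i] and t[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1,      # s[i-1] against a space
                          D[i][j - 1] + 1,      # space against t[j-1]
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    # Trace one optimal path back from D[n][m], collecting columns in reverse.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            col, i, j = (s[i - 1], t[j - 1]), i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            col, i = (s[i - 1], gap), i - 1
        else:
            col, j = (gap, t[j - 1]), j - 1
        cols.append(col)
    cols.reverse()
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)


def score(a1: str, a2: str) -> int:
    """Alignment score: number of columns whose two symbols differ."""
    return sum(x != y for x, y in zip(a1, a2))


print(score("agcacac-a", "a-cacacta"))  # 2 (the alignment shown above)
a1, a2 = align("agcacaca", "acacacta")
print(a1)
print(a2)
print(score(a1, a2))  # 2, equal to the edit distance
```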
Homologous Crossover
- Align the two parent sequences optimally
- Randomly generate a crossover mask as long as the alignment
- Recombine as in traditional crossover
- Remove the dashes from the offspring
Mask: 111111000 (here a 1 takes the character from Seq2, a 0 from Seq1)
Seq1: agcacac-a
Seq2: a-cacacta
SeqC: a-cacac-a
SeqC (dashes removed): acacaca
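The four steps above can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, ties in the alignment are broken arbitrarily, and the mask convention (1 takes the column from the second parent) is the one implied by the example's mask and offspring. The final loop checks the geometric property on random DNA-like sequences.

```python
import random


def edit_distance(s, t):
    """Unit-cost Levenshtein distance (dynamic programming, two rows)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = curr
    return prev[-1]


def align(s, t, gap="-"):
    """One optimal unit-cost alignment of s and t, as two equal-length gapped strings."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    cols, i, j = [], n, m
    while i > 0 or j > 0:  # trace back one optimal path
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            col, i, j = (s[i - 1], t[j - 1]), i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            col, i = (s[i - 1], gap), i - 1
        else:
            col, j = (gap, t[j - 1]), j - 1
        cols.append(col)
    cols.reverse()
    return "".join(x for x, _ in cols), "".join(y for _, y in cols)


def recombine_aligned(a1, a2, mask, gap="-"):
    """Mask crossover on an alignment ('1' takes the column from a2, '0' from a1),
    then remove the dashes from the offspring."""
    child = "".join(y if bit == "1" else x for x, y, bit in zip(a1, a2, mask))
    return child.replace(gap, "")


def homologous_crossover(p1, p2, rng, gap="-"):
    """Align optimally, draw a random mask as long as the alignment, recombine."""
    a1, a2 = align(p1, p2, gap)
    mask = "".join(rng.choice("01") for _ in a1)
    return recombine_aligned(a1, a2, mask, gap)


# The example above, using its alignment and mask.
print(recombine_aligned("agcacac-a", "a-cacacta", "111111000"))  # acacaca

# Offspring always satisfy ED(P1,C) + ED(C,P2) == ED(P1,P2).
rng = random.Random(3)
for _ in range(500):
    p1 = "".join(rng.choice("acgt") for _ in range(rng.randint(0, 10)))
    p2 = "".join(rng.choice("acgt") for _ in range(rng.randint(0, 10)))
    c = homologous_crossover(p1, p2, rng)
    assert edit_distance(p1, c) + edit_distance(c, p2) == edit_distance(p1, p2)
```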
Theorem: Geometricity of HC
- Homologous crossover is geometric crossover under edit distance
- Seq1 = agcacaca → SeqC = acacaca → acacacta = Seq2
- $ED(Seq_1, Seq_C) + ED(Seq_C, Seq_2) = ED(Seq_1, Seq_2)$: $1 + 1 = 2$
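One way to see why the theorem holds for unit-cost edit distance, sketched here as a reconstruction rather than the paper's own proof: write $x_i$ and $y_i$ for the symbols in column $i$ of the optimal alignment of $P_1$ and $P_2$, and $c_i \in \{x_i, y_i\}$ for the offspring's symbol in that column before dashes are removed. The same columns induce an alignment of $P_1$ with $C$ and one of $C$ with $P_2$, and a column where the parents disagree costs exactly 1 on one side and 0 on the other.

```latex
\begin{align*}
ED(P_1, C) + ED(C, P_2)
  &\le \left|\{\, i : c_i \neq x_i \,\}\right| + \left|\{\, i : c_i \neq y_i \,\}\right|
  && \text{(cost of the induced alignments)} \\
  &= \left|\{\, i : x_i \neq y_i \,\}\right|
  && (c_i \in \{x_i, y_i\}) \\
  &= ED(P_1, P_2)
  && \text{(optimal alignment score = edit distance)}
\end{align*}
```

Combined with the triangle inequality $ED(P_1, P_2) \le ED(P_1, C) + ED(C, P_2)$, this gives equality, i.e. $C \in [P_1, P_2]_{ED}$.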
More Theory on HC in the Paper
- Extension to weighted edit distances
- Extension to block insertion/deletion edit distances
- Peculiarities of metric segments in edit distance spaces
- Bounds on offspring size imposed by the parents' sizes
III. Is Biological Recombination Geometric?
Recombination at a Molecular Level
- DNA strands align on their contents, not positionally
- DNA strands are flexible: they can be stretched or folded to align better with each other
- DNA strands do not need to be aligned at the extremities
- Some pair matchings are preferred to others
- DNA strands can form loops
- Crossover points tend to occur where DNA strands align better
- Not all details have been worked out yet!
Homologous Crossover as a Model of Biological Recombination
There are many possible variants of edit distance that fit many real requirements of biological recombination.
Minimum Free Energy Edit Distance
- DNA strands align optimally according to an edit distance because:
- (i) The alignment of two DNA strands (macromolecules) obeys chemistry: it is the state at minimum free energy
- (ii) The weights of the edit moves can be interpreted as repulsion forces at the level of single bases
- (iii) The best alignment under edit distance is the best trade-off, the one for which the global effect of the repulsion forces is minimized: the minimum free energy alignment
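To illustrate what "weights of the edit moves" means computationally, here is a minimal Python sketch of a weighted edit distance in which every move carries its own cost. The weights are hypothetical placeholders (0.5 for a transition between a and g or between c and t, 1.0 for other substitutions and for insertions/deletions), not values from the paper or from free-energy measurements; with unit weights the recurrence reduces to ordinary edit distance.

```python
# Hypothetical weights for illustration only; not derived from chemistry.
TRANSITIONS = {frozenset("ag"), frozenset("ct")}  # purine<->purine, pyrimidine<->pyrimidine


def hypothetical_sub_cost(a: str, b: str) -> float:
    """0 for a match, 0.5 for a transition, 1.0 for any other substitution."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in TRANSITIONS else 1.0


def weighted_edit_distance(s, t, sub_cost, indel_cost=1.0):
    """Edit distance in which every move has its own weight: sub_cost(a, b) for
    aligning a against b and indel_cost for aligning a base against a space."""
    prev = [j * indel_cost for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [i * indel_cost]
        for j, b in enumerate(t, start=1):
            curr.append(min(prev[j] + indel_cost,           # delete a
                            curr[j - 1] + indel_cost,       # insert b
                            prev[j - 1] + sub_cost(a, b)))  # match or substitute
        prev = curr
    return prev[-1]


print(weighted_edit_distance("a", "g", hypothetical_sub_cost))  # 0.5
print(weighted_edit_distance("a", "c", hypothetical_sub_cost))  # 1.0
print(weighted_edit_distance("agcacaca", "acacacta", hypothetical_sub_cost))
```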
Is Biological Recombination Geometric? Yes?!
So What?
Bridging Natural and Artificial Evolution
- Bridging natural and artificial evolution into a common theoretical framework
- Change in perspective: this allows us to study real biological evolution as a computational process
- In the paper we use geometric arguments to claim that biological evolution performs efficient adaptation!
Summary
- Geometric crossover
  - Geometric crossover: offspring are between the parents
  - Many recombinations are geometric
  - Some general theory for geometric crossover
- Homologous crossover
  - Homologous crossover for sequences: alignment on contents before recombination
  - Homologous crossover is geometric under edit distance
- Biological recombination
  - Homologous crossover models biological recombination at the DNA level, so it is geometric
  - Geometric theory applies to biological recombination, bridging biological and artificial evolution
Questions?