Title: Coevolving Solutions to the Shortest Common Superstring Problem
1. Coevolving Solutions to the Shortest Common Superstring Problem
- Assaf Zaritsky, Moshe Sipper
- Ben-Gurion University, Israel
- www.cs.bgu.ac.il/assafza
2. Outline
- The Shortest Common Superstring problem.
- DNA sequencing and the input domain.
- Simple genetic algorithm (GA).
- Cooperative coevolutionary algorithm.
- Experimental results.
3. The Shortest Common Superstring Problem (SCS)
- Let $S = \{s_1, \dots, s_n\}$ be a set of strings (blocks) over some alphabet $\Sigma$. A superstring of $S$ is a string $x$ such that each $s_i \in S$ is a substring of $x$.
- Problem: find a shortest (common) superstring of $S$.
- NP-complete.
- MAX SNP-hard: unless P = NP, SCS has no polynomial-time approximation scheme.
4. SCS Example
- S = {ate, half, lethal, alpha, alfalfa}
- A trivial superstring is atehalflethalalphaalfalfa, of length 25 (a simple concatenation of all blocks).
- A shortest common superstring is lethalphalfalfate, of length 17.
- Note that a compressed permutation of the blocks (the blocks in some order, each merged with the preceding string on their maximal overlap) is always a superstring.
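The "compressed permutation" idea can be made concrete in a few lines. This is a minimal Python illustration, not code from the talk; the helper names `overlap`, `compress`, and `is_superstring` are hypothetical. Each block is appended to the string built so far, minus the longest prefix that already appears as a suffix of that string, so every block ends up inside the result.

```python
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def compress(order: List[str]) -> str:
    """Merge blocks in the given order, overlapping each new block as much as
    possible with the string built so far (a "compressed permutation")."""
    result = ""
    for block in order:
        result += block[overlap(result, block):]
    return result


def is_superstring(candidate: str, blocks: List[str]) -> bool:
    """True if every block occurs as a substring of `candidate`."""
    return all(block in candidate for block in blocks)


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
best = compress(["lethal", "alpha", "half", "alfalfa", "ate"])
print(best, len(best), is_superstring(best, blocks))  # lethalphalfalfate 17 True
```

The input order (ate, half, lethal, alpha, alfalfa) compresses to 22 characters, already shorter than the 25-character concatenation; the order lethal, alpha, half, alfalfa, ate reaches the 17-character optimum, so the search problem becomes one of finding a good ordering.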
5. Approximation Algorithms
- Several linear (constant-factor) approximation algorithms for SCS have been proposed, most of which rely on greedy approaches.
- GREEDY
  - The most widely used heuristic in DNA sequencing.
  - Conjecture (Blum 1994, Sweedyk 1999): the superstring produced by GREEDY is at most twice as long as the optimal one.
- We are not aware of any previous evolutionary approach to the SCS problem.
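GREEDY is usually described as: discard blocks contained in other blocks, then repeatedly merge the two strings with the largest overlap until a single string remains. The following is a minimal Python sketch of that textbook description, not the implementation used in the experiments; the function names and the tie-breaking rule (the first maximal pair found wins) are assumptions, and different tie-breaking can change the final length.

```python
from itertools import permutations
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest proper suffix of `left` equal to a prefix of `right`."""
    for k in range(min(len(left), len(right)) - 1, 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def greedy_superstring(blocks: List[str]) -> str:
    # Drop duplicates and blocks contained in another block; they add nothing.
    unique = list(dict.fromkeys(blocks))
    strings = [s for s in unique if not any(s != t and s in t for t in unique)]
    while len(strings) > 1:
        # Find the ordered pair (i, j) with the largest overlap; ties keep the first found.
        best_k, best_i, best_j = -1, 0, 1
        for i, j in permutations(range(len(strings)), 2):
            k = overlap(strings[i], strings[j])
            if k > best_k:
                best_k, best_i, best_j = k, i, j
        merged = strings[best_i] + strings[best_j][best_k:]
        strings = [s for idx, s in enumerate(strings) if idx not in (best_i, best_j)]
        strings.append(merged)
    return strings[0] if strings else ""


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
result = greedy_superstring(blocks)
print(result, len(result))
```

On the slide-4 example this sketch happens to reach a 17-character superstring, matching the optimum; on larger inputs GREEDY can be suboptimal, which is what the factor-of-two conjecture quantifies.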
6. DNA Sequencing
The most common application of the SCS problem.
7. The Input Domain
The input strings used in the experiments were inspired by DNA sequencing.
8. Input Generation Setup Parameters
NB: increasing the number of blocks results in exponential growth of the problem's complexity (n blocks admit n! different orderings).
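A sequencing-inspired instance can be produced by sampling fragments from a random DNA string, much as shotgun sequencing reads fragments of a genome. This minimal Python sketch is an assumption about how such inputs might be generated; `generate_instance` and all parameter values (genome length, number of blocks, fragment-length range) are hypothetical placeholders, not the settings used in the experiments.

```python
import random
from typing import List, Optional, Tuple


def generate_instance(genome_length: int, n_blocks: int, min_len: int,
                      max_len: int, seed: Optional[int] = None) -> Tuple[str, List[str]]:
    """Hypothetical SCS instance: draw a random DNA string, then cut
    n_blocks fragments of random length out of it at random positions."""
    if not 1 <= min_len <= max_len <= genome_length:
        raise ValueError("need 1 <= min_len <= max_len <= genome_length")
    rng = random.Random(seed)
    genome = "".join(rng.choice("ACGT") for _ in range(genome_length))
    blocks = []
    for _ in range(n_blocks):
        length = rng.randint(min_len, max_len)          # fragment length
        start = rng.randint(0, genome_length - length)  # fragment position
        blocks.append(genome[start:start + length])
    return genome, blocks


genome, blocks = generate_instance(genome_length=400, n_blocks=50,
                                   min_len=15, max_len=30, seed=1)
# The source string is itself a superstring, so it bounds the SCS length from above.
print(len(blocks), all(b in genome for b in blocks))  # 50 True
```

Keeping the source string is convenient for evaluation: any algorithm's output can be compared with its length as a known upper bound on the optimum.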
9. Simple GA for the SCS Problem
- Given a set of strings as input, generate an initial population of random candidate solutions.
- The fitness of each individual depends on its length and accuracy.
- The GA uses selection, recombination, and mutation to create the next generation, each individual of which is then evaluated.
- These steps are repeated a predefined number of times or until the solution is deemed satisfactory.
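The loop above can be sketched in Python under several assumptions not stated on the slides: an individual is a fixed-length sequence of block indices (repeats allowed, so some blocks may be missing), it is decoded by merging blocks on their maximal overlaps, and fitness ranks accuracy (number of input blocks present) above length. The weighting, the operators (binary tournament, one-point crossover, per-gene mutation), the fixed generation budget, and all parameter values are illustrative choices, not necessarily those of the original GA.

```python
import random
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def decode(genome: List[int], blocks: List[str]) -> str:
    """Assemble the blocks named by `genome`, merging each on its maximal overlap."""
    result = ""
    for idx in genome:
        result += blocks[idx][overlap(result, blocks[idx]):]
    return result


def fitness(genome: List[int], blocks: List[str]) -> int:
    """Hypothetical weighting: accuracy (input blocks present) dominates,
    and among equally accurate candidates the shorter one wins."""
    candidate = decode(genome, blocks)
    covered = sum(block in candidate for block in blocks)
    weight = len(genome) * max(map(len, blocks)) + 1  # exceeds any decoded length
    return covered * weight - len(candidate)


def tournament(pop: List[List[int]], scores: List[int], rng: random.Random) -> List[int]:
    """Binary tournament selection: the fitter of two random individuals."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if scores[i] >= scores[j] else pop[j]


def run_ga(blocks: List[str], pop_size: int = 50, generations: int = 200,
           p_mut: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    n = len(blocks)
    # Initial population: random sequences of n block indices (repeats allowed).
    pop = [[rng.randrange(n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g, blocks) for g in pop]
        next_pop = [pop[scores.index(max(scores))][:]]  # elitism: keep the best
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, scores, rng), tournament(pop, scores, rng)
            cut = rng.randrange(n + 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: each gene becomes a random block index with probability p_mut.
            child = [rng.randrange(n) if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=lambda g: fitness(g, blocks))
    return decode(best, blocks)


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
print(run_ga(blocks))
```

Because accuracy is weighted above length, an incomplete candidate can never beat one that contains every block, however short it is.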
10. Simple GA for the SCS Problem (contd)
- Blocks of the input set are atomic components.
- Representation.
  - Permutation representation: good or bad?
- Evaluation.
- Genetic operators: selection, recombination, mutation.
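With a permutation representation every block appears exactly once, so every individual compresses to a valid superstring and fitness reduces to length. The price is that ordinary one-point crossover would duplicate some blocks and drop others, so permutation-preserving operators are needed. A minimal Python sketch of two standard choices, a simple variant of order crossover (OX) and swap mutation; the function names are hypothetical and these are not necessarily the operators the original GA used.

```python
import random
from typing import List


def order_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """OX variant: copy a random slice of p1 in place, then fill the remaining
    positions left to right with the missing genes in the order they occur in p2."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n + 1), 2))
    child: List[int] = [-1] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    fill = iter(g for g in p2 if g not in kept)
    for i in range(n):
        if not a <= i < b:
            child[i] = next(fill)
    return child


def swap_mutation(perm: List[int], rng: random.Random) -> List[int]:
    """Exchange two randomly chosen positions; the result is still a permutation."""
    child = perm[:]
    i, j = rng.randrange(len(child)), rng.randrange(len(child))
    child[i], child[j] = child[j], child[i]
    return child


rng = random.Random(0)
p1, p2 = [0, 1, 2, 3, 4], [4, 3, 2, 1, 0]
child = swap_mutation(order_crossover(p1, p2, rng), rng)
print(sorted(child) == p1)  # True: still a permutation of the block indices
```

The trade-off runs the other way too: the permutation guarantee removes the need for an accuracy term, but it also forbids the GA from exploring candidates that temporarily omit or repeat blocks.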
11. Coevolution
- Simultaneous evolution of two or more species with coupled fitness.
- Coevolving species either compete or cooperate.
12. Cooperative Coevolution
13. Cooperative Coevolution (contd)
- Cooperative coevolution involves a number of independently evolving species.
- Interaction between species occurs via the fitness function only.
- The fitness of an individual depends on its ability to collaborate with individuals from other species.
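The scheme can be illustrated on a toy problem before turning to SCS. In this minimal Python sketch, each species evolves one real-valued component of a solution in its own population, and an individual is scored only by plugging it into a team with the current representatives (best individuals) of the other species. The toy objective, the truncation selection, the Gaussian mutation, and the function names are assumptions for illustration, loosely following the Potter and De Jong architecture rather than reproducing it.

```python
import random
from typing import Callable, List, Sequence

Objective = Callable[[Sequence[float]], float]


def collaborative_fitness(individual: float, species: int,
                          representatives: List[float], objective: Objective) -> float:
    """Score an individual by combining it with the current representatives
    of all other species; this is the only way the species interact."""
    team = list(representatives)
    team[species] = individual
    return objective(team)


def coevolve(objective: Objective, n_species: int, pop_size: int = 20,
             generations: int = 50, seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    # Each species evolves its own component in its own population.
    pops = [[rng.uniform(-10, 10) for _ in range(pop_size)] for _ in range(n_species)]
    reps = [pop[0] for pop in pops]  # arbitrary initial representatives
    for _ in range(generations):
        for s in range(n_species):
            ranked = sorted(pops[s], reverse=True,
                            key=lambda x: collaborative_fitness(x, s, reps, objective))
            reps[s] = ranked[0]                # best collaborator represents species s
            parents = ranked[: pop_size // 2]  # truncation selection; parents survive
            children = [rng.choice(parents) + rng.gauss(0, 1)
                        for _ in range(pop_size - len(parents))]
            pops[s] = parents + children
    return reps


def toy_objective(team: Sequence[float]) -> float:
    return -(team[0] - 3) ** 2 - (team[1] + 1) ** 2


print(coevolve(toy_objective, n_species=2))  # representatives move toward (3, -1)
```

Because each representative survives into its species' next population, the team formed by the representatives never gets worse from one species update to the next.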
14. Cooperative Coevolution (contd)
Source: Potter and De Jong (1997)
15. Cooperative Coevolutionary Algorithm for the SCS Problem
- Two species evolve simultaneously.
- The first species contains candidate prefixes of solutions to the SCS problem at hand.
- The second species contains candidate suffixes.
- The fitness of an individual in each species depends on how well it interacts with representatives from the other species to construct a global solution.
16. Cooperative Coevolutionary Algorithm for the SCS Problem (Evaluation Process)
Merge
17. Cooperative Coevolutionary Algorithm for the SCS Problem (Evaluation Process)
Evaluate
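The two-step evaluation (merge, then evaluate) can be sketched in Python under assumptions the slides leave open: prefix and suffix individuals are sequences of block indices, merging is plain concatenation of the two sequences, and the assembled solution is scored with a hypothetical accuracy-then-length fitness. An individual's fitness is the score of the global solution it forms with the current representative of the other species. All names here are illustrative.

```python
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge(prefix: List[int], suffix: List[int]) -> List[int]:
    """Merge step: the global candidate is the prefix's blocks followed by the suffix's."""
    return prefix + suffix


def evaluate(genome: List[int], blocks: List[str]) -> int:
    """Evaluate step: compress the blocks into one string, then reward
    accuracy (blocks present) first and shortness second (hypothetical weighting)."""
    candidate = ""
    for idx in genome:
        candidate += blocks[idx][overlap(candidate, blocks[idx]):]
    covered = sum(block in candidate for block in blocks)
    weight = len(genome) * max(map(len, blocks)) + 1  # exceeds any assembled length
    return covered * weight - len(candidate)


def prefix_fitness(prefix: List[int], suffix_rep: List[int], blocks: List[str]) -> int:
    return evaluate(merge(prefix, suffix_rep), blocks)


def suffix_fitness(suffix: List[int], prefix_rep: List[int], blocks: List[str]) -> int:
    return evaluate(merge(prefix_rep, suffix), blocks)


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
suffix_rep = [1, 4, 0]                             # half, alfalfa, ate
print(prefix_fitness([2, 3], suffix_rep, blocks))  # lethal, alpha + representative: 163
print(prefix_fitness([0, 0], suffix_rep, blocks))  # ate, ate + representative: 95
```

The first prefix completes the representative into the 17-character optimum lethalphalfalfate, while the second leaves lethal and alpha uncovered, so it scores lower even though its assembled string (atehalfalfate) is shorter.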
18. Experiments
Comparison of GREEDY, the simple GA, and cooperative coevolution.
19. Experimental Setup
Each type of GA was executed twice on each problem instance; the better run of the two was used for the statistical analysis.
20. Results: Experiment I (50 blocks)
21. Results: Experiment II (80 blocks)
22. Results Summary
Table: size of the shortest common superstring found, by algorithm (Greedy, Genetic, Cooperative) and problem size (50 blocks, 80 blocks).
23. Conclusions
- Evolutionary algorithms can be applied to the SCS problem.
- Cooperative coevolution outperforms the simple GA and GREEDY.
- The collaboration between the two populations results in a good decomposition of the problem into two smaller sub-problems.
24. Future Work
- Tackle larger problem instances.
- Construct new species on the fly (as suggested by Potter and De Jong 2000).
- An improved method: the puzzle algorithm.
26. Current Work: The Puzzle Algorithm
27. Puzzle Algorithm: The Idea
- Improve the recombination operator.
- Preserve good building blocks discovered by the GA by selecting recombination loci that do not destroy them.
- Result: good building blocks are assembled into better solutions (as in a puzzle).
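The slides describe the puzzle algorithm only at the level of its idea. One hypothetical reading of "recombination loci that do not destroy good building blocks" is sketched below in Python. It assumes the block-index-sequence representation (repeats allowed) and treats a strongly overlapping pair of adjacent blocks as a building block, so one-point crossover may only cut where adjacent blocks overlap by less than a threshold in both parents. The names `safe_loci` and `puzzle_crossover` and the `threshold` parameter are illustrative, not taken from the original work.

```python
import random
from typing import List


def overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def safe_loci(genome: List[int], blocks: List[str], threshold: int) -> List[int]:
    """Cut positions i (between genome[i-1] and genome[i]) whose two adjacent
    blocks overlap by fewer than `threshold` characters."""
    return [i for i in range(1, len(genome))
            if overlap(blocks[genome[i - 1]], blocks[genome[i]]) < threshold]


def puzzle_crossover(p1: List[int], p2: List[int], blocks: List[str],
                     threshold: int, rng: random.Random) -> List[int]:
    """One-point crossover restricted to loci that are safe in both parents
    (equal-length parents assumed); falls back to copying p1 if none exist."""
    loci = sorted(set(safe_loci(p1, blocks, threshold))
                  & set(safe_loci(p2, blocks, threshold)))
    if not loci:
        return p1[:]
    cut = rng.choice(loci)
    return p1[:cut] + p2[cut:]


blocks = ["ate", "half", "lethal", "alpha", "alfalfa"]
p1 = [2, 3, 1, 4, 0]  # lethal, alpha, half, alfalfa, ate
print(safe_loci(p1, blocks, threshold=3))  # [1, 2, 4]: the half|alfalfa junction is kept
```

Here the 3-character overlap between half and alfalfa is never cut, so that piece of the puzzle passes intact to the offspring.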