Multiple Sequence Alignment Using A Genetic Algorithm

1
Multiple Sequence Alignment Using A Genetic
Algorithm
  • Megan Smedinghoff

2
Project Goals
Goal 1
Extend my work from last year to produce a
program that performs multiple alignments using a
genetic algorithm
Goal 2
Evaluate the performance of my program by varying
the number of sequences, the length of the
sequences, and the similarity of the sequences
Goal 3
Compare the quality of my alignments as well as
the time it takes to generate them to one or
more commonly used alignment programs
3
History of Multiple Alignment using GAs
Programs that perform multiple alignment using a
genetic algorithm
  • SAGA (sequence alignment by genetic algorithm)
    breaks possible alignments into pieces and then
    introduces gaps at various positions
  • L. Abdesslem, M. Soham, and B. Mohamed produced
    an alignment program by combining a genetic
    algorithm and quantum computing principles. The
    quantum computing portion was intended to reduce
    the number of generations needed to obtain an
    alignment.
  • MAGA (multiple alignment by genetic algorithm)
    for protein sequences implemented a genetic
    algorithm with a basic scoring function, random
    introduction of gaps, and linear propagation to
    the next generation. Results were impressive for
    small alignments with high similarity but poor
    for larger, more divergent alignments.

4
Review of Multiple Alignment
Definition
Given a set of sequences, gaps are inserted into
each of the sequences in such a way that all
characters in a column are as similar as possible
Example
(Figure: example alignment, not reproduced in this transcript)
Comments
This problem is known to be NP-hard, and it is
further complicated by the fact that there
is no obvious way to score an alignment.
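The definition above can be made concrete with a small sketch. It assumes an alignment is stored as a list of equal-length strings with "-" as the gap character; that representation and the helper names `is_valid_alignment` and `columns` are illustrative assumptions, not something the slides specify.

```python
def is_valid_alignment(sequences, alignment, gap="-"):
    """Check that `alignment` is the input sequences with only gaps added.

    Assumed representation: one string per sequence, "-" for a gap.
    """
    # Every row must have the same number of columns.
    same_length = len({len(row) for row in alignment}) <= 1
    # Removing the gaps must give back the original sequences, in order.
    unchanged = [row.replace(gap, "") for row in alignment] == list(sequences)
    return same_length and unchanged


def columns(alignment):
    """Return the alignment column by column, as strings."""
    return ["".join(col) for col in zip(*alignment)]
```

For example, `columns(["ACGT", "A-GT"])` gives `["AA", "C-", "GG", "TT"]`, which is the view a column-based scoring function works on.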
5
Review of Genetic Algorithms
Choose an initial population
Evaluate the fitness of each individual
Select individuals to reproduce
Breed a new generation via mutation and crossover
Evaluate the fitness of the offspring
Replace part of the population with the offspring
Report the best individual
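The steps above can be sketched as a generic loop. This is a minimal illustration, not the program described in these slides: tournament selection, replacement of the worst individuals, and the parameter names (`generations`, `n_offspring`) are all assumptions chosen to keep the example short. The `fitness`, `mutate`, and `crossover` callables are supplied by the caller.

```python
import random


def run_ga(initial_population, fitness, mutate, crossover,
           generations=50, n_offspring=10, rng=None):
    """Generic GA loop; returns (best_score, best_individual)."""
    rng = rng or random.Random()
    # Choose an initial population and evaluate each individual.
    scored = [(fitness(ind), ind) for ind in initial_population]
    size = len(scored)
    for _ in range(generations):
        offspring = []
        for _ in range(n_offspring):
            # Select parents: best of two random individuals (assumed scheme).
            p1 = max(rng.sample(scored, 2), key=lambda s: s[0])[1]
            p2 = max(rng.sample(scored, 2), key=lambda s: s[0])[1]
            # Breed via crossover, then mutation, and evaluate the child.
            child = mutate(crossover(p1, p2, rng), rng)
            offspring.append((fitness(child), child))
        # Replace the worst individuals, keeping the population size fixed.
        scored = sorted(scored + offspring, key=lambda s: s[0],
                        reverse=True)[:size]
    # Report the best individual.
    return max(scored, key=lambda s: s[0])
```

Because the best individuals always survive the replacement step in this sketch, the best score never decreases from one generation to the next. The population needs at least two individuals for the selection step to work.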
6
Review of Mutation and Crossover
Mutation
Definition: Mutation occurs when a randomly
chosen bit of an individual is changed to a
different state
Example: Box c is mutated from blue to purple and
becomes c'
(Figure: Mutation Event)
Crossover
Definition: Crossover occurs when part of one
individual is replaced with the analogous
part of a second individual
Example: abdc and abcd undergo crossover to
produce abcd
(Figure: Crossover)
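The two generic operators defined above can be illustrated on a list of bits. This is a textbook-style sketch under the assumption that an individual is a list of 0/1 values; the function names are hypothetical, and the alignment-specific versions used later in the talk work on gaps and sequences instead.

```python
import random


def mutate_bit(individual, rng):
    """Flip one randomly chosen bit; the input list is left unchanged."""
    child = list(individual)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def crossover_tail(first, second, rng):
    """Replace the tail of `first` with the analogous part of `second`."""
    point = rng.randrange(1, len(first))  # cut point, never at position 0
    return first[:point] + second[point:]
```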
7
Choosing an Initial Population
Current Scheme
Currently I begin with an alignment produced by
another program and copy it multiple times to
create an initial population
(Figure: Alignment 1, Alignment 2, Alignment 3, Alignment 4)
Improved Scheme
An easy improvement to implement would be to
copy the initial alignment multiple times and
then randomly distribute the gaps within each
alignment
(Figure: Alignment 1, Alignment 2, Alignment 3, Alignment 4)
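One way the improved scheme might look is sketched below. It assumes each alignment is a list of equal-length strings with "-" for gaps, and that "randomly distribute the gaps" means keeping each row's length and residue order while choosing new gap columns uniformly at random. The names `shuffle_gaps` and `initial_population` are hypothetical.

```python
import random


def shuffle_gaps(row, rng, gap="-"):
    """Return `row` with the same residues in order but gaps at random columns."""
    residues = [ch for ch in row if ch != gap]
    n_gaps = len(row) - len(residues)
    # Pick which columns hold a gap; every other column takes the next residue.
    gap_columns = set(rng.sample(range(len(row)), n_gaps))
    remaining = iter(residues)
    return "".join(gap if i in gap_columns else next(remaining)
                   for i in range(len(row)))


def initial_population(seed_alignment, size, rng=None):
    """Copy the seed alignment `size` times, redistributing gaps in each copy."""
    rng = rng or random.Random()
    return [[shuffle_gaps(row, rng) for row in seed_alignment]
            for _ in range(size)]
```

A side effect of this simple version is that it can create columns consisting only of gaps; a fuller implementation would probably strip those.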
Ideal Scheme
Ideally I would like to start with sequences
without gaps and find a way to introduce gaps
into the alignment. This would avoid the
problem of having to start with an alignment
generated by another program.
(Figure: Alignment 1, Alignment 2, Alignment 3, Alignment 4)
8
Evaluating Fitness
Naive Scoring Function
  • One point for each match per column
  • No penalty for gaps
  • A gap does not match a gap

Total score for this column: 4
Formula
Score(column) = nC2(A) + nC2(C) + nC2(G) + nC2(T),
where nC2(x) is the number of ways you can choose
2 items from a group of x, and A, C, G, T stand for
the number of times each base appears in the column
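The naive scoring function can be written in a few lines. The sketch below assumes a column is a string of characters with "-" for gaps; it counts matching pairs for any character rather than only A, C, G, and T, which is a small generalization of the formula above. The function names are illustrative.

```python
from collections import Counter
from math import comb


def column_score(column, gap="-"):
    """One point per matching pair in the column; gaps never match anything."""
    counts = Counter(ch for ch in column if ch != gap)
    # comb(n, 2) is the number of ways to choose 2 items from a group of n.
    return sum(comb(n, 2) for n in counts.values())


def alignment_score(alignment):
    """Sum the column scores over all columns of the alignment."""
    return sum(column_score(col) for col in zip(*alignment))
```

For instance, a column with three A's and two C's scores 3 + 1 = 4, and a column such as "A-A-" scores 1 because the two gaps are not counted as a match.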
Improved Scoring Function
Ideally, I would like to implement a scoring
function with more biological significance. The
most likely choice at present is a pairwise
function with affine gap penalties, but I intend
to research other possibilities as well.
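A pairwise function with affine gap penalties could look roughly like the sketch below. All numeric values (`match`, `mismatch`, `gap_open`, `gap_extend`) are placeholder assumptions, as is the convention that the first position of a gap run costs `gap_open` and each further position costs `gap_extend`. Columns where both rows have a gap are skipped for that pair.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1, mismatch=-1,
               gap_open=2, gap_extend=1, gap="-"):
    """Score two aligned rows with affine gap penalties (assumed values)."""
    score = 0
    in_gap_a = in_gap_b = False  # are we inside a gap run in row a / row b?
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # a shared gap says nothing about this pair
        if a == gap:
            score -= gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif b == gap:
            score -= gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if a == b else mismatch
            in_gap_a = in_gap_b = False
    return score


def sum_of_pairs(alignment, **penalties):
    """Add up pair_score over every pair of rows in the alignment."""
    return sum(pair_score(a, b, **penalties)
               for a, b in combinations(alignment, 2))
```

With the default values, `pair_score("AC--T", "ACGGT")` is 1 + 1 - 2 - 1 + 1 = 0: the two-column gap costs 3 in total, less than two separately opened gaps would.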
9
Mutation
Current Scheme
Step 1: Determine whether or not to mutate a gap
using a random number generator
Step 2: Determine how far that gap will be
moved using a random number generator
(2 spaces in the slide's example)
Step 3: Move the gap
(Figure: example sequences showing the gap selected for mutation and its position after the move, not reproduced in this transcript)
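The three steps can be sketched on a single row of an alignment. This is a simplified illustration under several assumptions: rows are strings with "-" for gaps, a gap moves left or right with equal probability, the move is clamped to the ends of the row, and at most one gap is moved per call. The names `move_gap`, `mutate_row`, `p_mutate`, and `max_shift` are hypothetical.

```python
import random


def move_gap(row, index, shift, gap="-"):
    """Move the gap at `index` by `shift` columns (negative means left)."""
    cells = list(row)
    if cells[index] != gap:
        raise ValueError("no gap at the given index")
    target = min(max(index + shift, 0), len(cells) - 1)  # stay inside the row
    cells.insert(target, cells.pop(index))
    return "".join(cells)


def mutate_row(row, rng, p_mutate=0.1, max_shift=3, gap="-"):
    """Apply the three mutation steps to the first gap that is selected."""
    for i, ch in enumerate(row):
        # Step 1: decide whether this gap is mutated.
        if ch == gap and rng.random() < p_mutate:
            # Step 2: decide how far (and in which direction) it moves.
            shift = rng.choice((-1, 1)) * rng.randint(1, max_shift)
            # Step 3: move the gap.
            return move_gap(row, i, shift, gap)
    return row
```

Moving a gap this way keeps the row length and the order of the residues unchanged, so the result is still a valid row of the same alignment. For example, `move_gap("AC-GT", 2, 2)` returns `"ACGT-"`.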
10
Crossover
Current Scheme
A crossover alignment is built by randomly
selecting a first sequence from one of the two
alignments, a second sequence from one of the
two alignments, etc.
Example
(Figure: Alignment 1, Alignment 2, Crossover Alignment)
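The crossover scheme described above reduces to one random choice per sequence. The sketch assumes both parent alignments are lists of strings holding the same sequences in the same order and having the same number of columns, so that any mix of rows is still a rectangular alignment; the function name is illustrative.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Build a child by taking each row from one parent or the other at random."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must contain the same number of sequences")
    return [rng.choice((row_a, row_b))
            for row_a, row_b in zip(parent_a, parent_b)]
```

If the two parents could differ in width, the child would need its shorter rows padded with trailing gaps before scoring; the equal-width assumption avoids that here.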
11
Termination
Current Scheme
The algorithm terminates after a set number of
generations
Improved Scheme
The algorithm terminates when one of the
following three criteria is met
  • The alignment has not improved in a set number of
    generations
  • The algorithm has run for a maximum number of
    generations
  • The algorithm has run for a set amount of time
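The three criteria can be combined into a single check, sketched below. It assumes the caller keeps a list with the best score of each generation so far; the thresholds (`patience`, `max_generations`, `max_seconds`) are placeholder values, and the `now` argument exists only so the time limit can be tested without waiting.

```python
import time


def should_stop(generation, best_scores, start_time, patience=50,
                max_generations=1000, max_seconds=60.0, now=None):
    """Return True if any of the three termination criteria is met."""
    now = time.monotonic() if now is None else now
    # 1. No improvement during the last `patience` generations.
    stalled = (len(best_scores) > patience and
               max(best_scores[-patience:]) <= max(best_scores[:-patience]))
    # 2. Generation limit reached.  3. Time limit reached.
    return (stalled or generation >= max_generations
            or now - start_time >= max_seconds)
```

In a main loop, `start_time` would be taken from `time.monotonic()` before the first generation and `should_stop` called once per generation.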

Additional Ideas
  • Add a function to graph scores vs. generations
  • Output the best alignment every n generations so
    that the user can kill the program at any time

12
Parameter Optimization
Parameters to Algorithm
  • Probability of gap getting mutated
  • Gap Mutation
  • Probability of crossover
  • Probability of alignment making it to next
    generation
  • Population size
  • Number of generations

Ideas for Optimization
  • I hope to implement a GARLI-type scheme where
    parameters are adjusted based on the alignment and
    the number of generations that have been bred
  • I also intend to run several alignments and try
    to figure out what default parameters work well

13
Time Optimization
Goal
Make my algorithm run as fast and efficiently
as possible while still producing the best
alignment
Ideas for improvement
  • Find/build a good data structure to store
    alignments
  • Implement some (or all) of the algorithm in C

14
Comparing to Other Programs
Ideas for testing quality of my algorithm
  • Find at least one other well-known alignment
    program
  • Run several different test cases on both my
    program and a well-established alignment program
  • Compare quality of alignments
  • Compare running times

15
References
  • Wikipedia pages on Multiple Alignment and Genetic
    Algorithms
  • Gusfield, Dan. Algorithms on Strings, Trees, and
    Sequences. New York: Cambridge University Press,
    1997.
  • Harada, Yoshitomo, Masato Wayama, and Toshio
    Shimizu. "An Inspection of the Multiple Alignment
    Method with use of a Genetic Algorithm." Genome
    Informatics 8 (1997): 272-273.
  • Abdesslem, L., M. Soham, and B. Mohamed.
    "Multiple Sequence Alignment by Quantum Genetic
    Algorithm." Parallel and Distributed Processing
    Symposium, 2006 (IPDPS 2006), 20th International,
    25-29 April 2006, 8 pp.