Title: GA for Sequence Alignment
1GA for Sequence Alignment
- Pair-wise alignment
- Multiple string alignment
2Pairwise Sequence Alignment
- VNRLQQNIVSLEVDHKVANYKP
- VNRLQQSIVSLRDAFNDGELDHRVLNYKP
- Solved by dynamic programming using a Dayhoff (PAM) scoring matrix
- Each pairwise alignment needs O(n1 n2) time, where n1 and n2 are the sequence lengths
- VNRLQQNIVSL_______EVDHKVANYKP
- VNRLQQSIVSLRDAFNDGELDHRVLNYKP
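The O(n1 n2) cost comes from filling one table cell per pair of prefixes. Below is a minimal sketch of that dynamic program in Python. It assumes simple match/mismatch/gap scores in place of a Dayhoff matrix; the function name `nw_score` and the default score values are illustrative choices, not part of the slides.

```python
def nw_score(a, b, match=1, mismatch=-2, gap=-1):
    """Best global alignment score of a and b, in O(len(a) * len(b)) time."""
    n1, n2 = len(a), len(b)
    # prev[j]: best score for aligning the first i-1 symbols of a with b[:j].
    prev = [j * gap for j in range(n2 + 1)]
    for i in range(1, n1 + 1):
        curr = [i * gap] + [0] * n2
        for j in range(1, n2 + 1):
            # Assumed scoring: a flat match/mismatch value, not a real matrix.
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                prev[j] + gap,       # a[i-1] against a gap
                curr[j - 1] + gap,   # b[j-1] against a gap
            )
        prev = curr
    return prev[n2]


if __name__ == "__main__":
    print(nw_score("VNRLQQNIVSLEVDHKVANYKP", "VNRLQQSIVSLRDAFNDGELDHRVLNYKP"))
```

Only two rows of the table are kept, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.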
3How to implement a GA ?
- Representation
- Fitness
- Operators design
- Selection strategy
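The four ingredients above plug into one generic loop. The following Python sketch shows one possible arrangement, assuming binary tournament selection and one-individual elitism; `run_ga` and all of its parameters are hypothetical names, and the callables `init`, `fitness`, `crossover`, and `mutate` are supplied by the user.

```python
import random


def run_ga(init, fitness, crossover, mutate,
           pop_size=20, generations=50, rng=random):
    """Generic GA loop: representation comes from init, the rest is pluggable."""
    pop = [init(rng) for _ in range(pop_size)]

    def pick():
        # Selection strategy (assumed): binary tournament.
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism (assumed): carry the current best individual over unchanged.
        children = [max(pop, key=fitness)]
        while len(children) < pop_size:
            child = crossover(pick(), pick(), rng)
            children.append(mutate(child, rng))
        pop = children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy problem (maximize the number of 1 bits), only to exercise the loop.
    def init(rng):
        return [rng.randint(0, 1) for _ in range(12)]

    def one_point(a, b, rng):
        cut = rng.randrange(1, len(a))
        return a[:cut] + b[cut:]

    def flip_one(ind, rng):
        ind = list(ind)
        i = rng.randrange(len(ind))
        ind[i] = 1 - ind[i]
        return ind

    print(run_ga(init, sum, one_point, flip_one, rng=random.Random(0)))
```

For sequence alignment, only the four callables change; the loop itself stays the same.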
4Pair-wise Alignment Representation
- What do you think?
- For example (my intuitive approach)
- Guess an alignment length n
- Chromosome
5Pair-wise Alignment Representation
- So the chromosome becomes
- You can also use the gap position
(1,2,4,5,6,8.)
(2,4,5,7,8,10.)
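A gap-position chromosome can be decoded back into an aligned row once the length n is fixed. The Python sketch below shows one way to do that; it assumes 0-based column indices and `_` as the gap symbol, and the name `decode` is hypothetical.

```python
def decode(seq, gap_positions, n, gap="_"):
    """Build an aligned row of length n with gaps at the given 0-based columns."""
    gaps = set(gap_positions)
    # A valid chromosome has exactly n - len(seq) distinct gap columns in range.
    if len(gaps) != n - len(seq) or any(not 0 <= g < n for g in gaps):
        raise ValueError("gap positions are inconsistent with length n")
    residues = iter(seq)
    return "".join(gap if col in gaps else next(residues) for col in range(n))


if __name__ == "__main__":
    print(decode("ACGT", [1, 4], 6))  # A_CG_T
```

Storing only the gap columns keeps the chromosome short when sequences are long and gaps are few.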
6Pair-wise Alignment Fitness Function
- Simplest
- Match: +1
- Mismatch: -2
- Gap: -1
- Using the scoring matrix
- Protein: PAM
- DNA: substitution matrix
- Sum the column scores to get the total score.
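The simplest scheme can be written as a column-by-column sum. This Python sketch uses the match, mismatch, and gap values listed above; the function name `pair_fitness`, the `_` gap symbol, and the decision to ignore columns where both rows have a gap are assumptions made for illustration.

```python
def pair_fitness(row_a, row_b, match=1, mismatch=-2, gap=-1, gap_char="_"):
    """Total score of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == gap_char and y == gap_char:
            continue              # assumption: a gap-gap column scores nothing
        if x == gap_char or y == gap_char:
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    print(pair_fitness("A_CG_T", "AACGTT"))  # 4 matches, 2 gaps -> 2
```

Swapping in a scoring matrix would replace the `match`/`mismatch` branch with a table lookup on the pair `(x, y)`.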
7Pair-wise Alignment Genetic Operators
- All our previous operators.
- Imagine one!
- Selection
- Try it!!!
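As one possible answer to "imagine one", here is a sketch of a mutation operator for a gap-position chromosome. It is a hypothetical operator, not one given in the slides: it moves a single randomly chosen gap to a column that currently holds no gap, so the number of gaps (and therefore the alignment length n) is preserved.

```python
import random


def shift_gap_mutation(gap_positions, n, rng=random):
    """Move one random gap to a random gap-free column; returns sorted columns."""
    gaps = set(gap_positions)
    free = [c for c in range(n) if c not in gaps]
    if not gaps or not free:
        return sorted(gaps)       # nothing to move, or nowhere to move it
    gaps.remove(rng.choice(sorted(gaps)))
    gaps.add(rng.choice(free))
    return sorted(gaps)


if __name__ == "__main__":
    print(shift_gap_mutation([1, 4], 6, random.Random(0)))
```

Because the gap count never changes, every mutated chromosome still decodes to a valid row of length n.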
8Conclusion About Pair-wise Alignment
- DP can solve it in O(NM)
- A GA can't offer much advantage here.
9S (input sequences) and A (their alignment)
- s1: RPCVCPVLRQAAQ
- s2: RPCACCPVLRQVVQ
- s3: KPCLCPRQLRQV
- s4: KPCCPRQAAQ
- a1: RPCVC_P__VLRQAAQ
- a2: RPCACCP__VLRQVVQ
- a3: KPCLC_ P RQLRQV_ _
- a4: KPC_C_P____RQAAQ
10Multiple String Alignment Representation
- What do you think?
- For example (my intuitive approach)
- Guess an alignment length n
- Chromosome
11Multiple String Alignment Representation
- So the chromosome becomes
- You can also use the gap position
- Needs less space
- Some good operators...
(1,2,4,5,6,8.)
(2,4,5,7,8,10.)
12Multiple String Alignment Fitness Function
- The hardest part
- You can never know what the real scoring system is, and neither can biologists!
- Approximation
- Using SOP (sum of pairs)
- The most widely used
- Using PAM
- Motif-based
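The sum-of-pairs (SOP) idea scores every pair of rows in every column and adds the results. The Python sketch below assumes flat match/mismatch/gap scores in place of a PAM matrix, `_` as the gap symbol, and a zero score for gap-gap pairs; the name `sum_of_pairs` and these defaults are illustrative assumptions.

```python
from itertools import combinations


def sum_of_pairs(rows, match=1, mismatch=-2, gap=-1, gap_char="_"):
    """SOP score of a multiple alignment given as equal-length rows."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    total = 0
    for column in zip(*rows):
        # Score each unordered pair of symbols in this column.
        for x, y in combinations(column, 2):
            if x == gap_char and y == gap_char:
                continue          # assumption: gap-gap pairs score nothing
            if x == gap_char or y == gap_char:
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC_", "ACG", "A_G"]))  # columns score 3, -1, -1 -> 1
```

With k rows of length n this costs O(k^2 n) per evaluation, which is cheap enough to use as a GA fitness function.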