Title: Incorporating Bioinformatics in an Algorithms Course
1. Incorporating Bioinformatics in an Algorithms Course
- Lawrence D'Antonio
- Ramapo College of New Jersey
2. What is Bioinformatics?
- Algorithms to analyze DNA, RNA, or protein sequences
- Database searches to find homologous sequences
- Construction of evolutionary trees
- Structure prediction
- Human Genome Project
3. Why use Bioinformatics in an Algorithms Course?
- Real-life applications of algorithms
- Variety of string processing algorithms
- Use of similarity instead of exact matching
- Dynamic programming examples
- Theory vs. Practice Issues
4. Models for Incorporating Bioinformatics
- Infusion: include material from bioinformatics in computer science courses
- Paired Courses: have joint lectures and projects from, e.g., Algorithms and Genetics courses
- Tracked Courses: have a separate Algorithms for Bioinformatics course
5. Biology Basics
- Primary DNA structure: an oriented character string
- Double strand constructed through base pairing
- Central Dogma: information passes in one direction, from DNA to RNA to protein
- Amino acids are encoded by triples of bases, called codons
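A minimal Python sketch of these ideas, treating a strand as a character string. The function names are hypothetical, and the codon table is deliberately truncated to four of the 64 codons, so this illustrates the data flow only and is not a usable translator.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Illustrative subset only: the full genetic code has 64 codons.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "Stop"}


def reverse_complement(dna):
    """Return the opposite strand, read in its own orientation."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand):
    """DNA to RNA: the transcript matches the coding strand with U for T."""
    return coding_strand.replace("T", "U")


def translate(rna):
    """RNA to protein: read non-overlapping codons until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError outside the subset
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    print(reverse_complement("AATAAGC"))
    print(translate(transcribe("ATGTTTGGCTAA")))
```

Reversing before complementing reflects the orientation of the strands: the two strands of a double helix run in opposite directions.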
6. Bonding along a strand
7. Bonding between strands
8. Complexity of DNA Problems
- 3 billion base pairs in human genome
- Many NP-complete problems
- About 10^600 possible alignments for two 1000-character sequences
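One way to arrive at a figure of this size is the central binomial coefficient C(2n, n), which counts the ways to interleave two sequences of length n. Whether the slide used exactly this count is an assumption; the sketch below only checks the order of magnitude. The function name is hypothetical.

```python
import math


def interleaving_count(n):
    """Number of ways to interleave two length-n sequences: C(2n, n).

    Assumption: this is used here as a stand-in for "number of alignments";
    other counting conventions give different (also astronomically large) totals.
    """
    return math.comb(2 * n, n)


if __name__ == "__main__":
    count = interleaving_count(1000)
    # Report the size as a number of decimal digits rather than printing it.
    print(len(str(count)), "decimal digits")
```

Even at a billion candidate alignments per second, exhaustive enumeration is out of reach, which is the motivation for dynamic programming.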
9. Sequence Alignment
- Determine the alignment of two sequences that maximizes similarity (global alignment)
- Determine substrings of two sequences with maximum similarity (local alignment)
- Determine the alignment for several sequences that maximizes the sum-of-pairs similarity (multiple alignment)
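The sum-of-pairs measure can be sketched as follows: score every pair of rows in each column and add the results. The score values (+1 match, -1 mismatch, -2 for a character against a space, 0 for two spaces) and the function names are assumptions chosen for illustration.

```python
from itertools import combinations


def pair_score(x, y, match=1, mismatch=-1, gap=-2):
    """Score one pair of symbols from the same column ('-' is a space)."""
    if x == "-" and y == "-":
        return 0  # assumed convention: two spaces contribute nothing
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sum_of_pairs(rows):
    """Sum pair_score over every pair of rows in every column."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(
        pair_score(x, y)
        for column in zip(*rows)
        for x, y in combinations(column, 2)
    )


if __name__ == "__main__":
    print(sum_of_pairs(["AATAAGC", "ATTAAGC", "AA-AAGC"]))
```

This only evaluates a given multiple alignment; finding the alignment that maximizes the score is the hard part.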
10. Edit Operations
Substitution    Insertion    Deletion
AATAAGC         AAT-AAGC     AATAAGC
ATTAAGC         AATTAAGC     AA-AAGC
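The three operations can be read off an aligned pair column by column. The sketch below assumes the top row is the original sequence and the bottom row is the edited one, so a space in the top row means an insertion and a space in the bottom row means a deletion. The function name and output format are hypothetical.

```python
def edit_operations(top, bottom):
    """List the edits that turn the top row into the bottom row.

    Each entry is (operation, column index, detail); '-' marks a space.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    operations = []
    for position, (x, y) in enumerate(zip(top, bottom)):
        if x == y:
            continue  # identical column: no edit needed
        if x == "-":
            operations.append(("insertion", position, y))
        elif y == "-":
            operations.append(("deletion", position, x))
        else:
            operations.append(("substitution", position, x + ">" + y))
    return operations


if __name__ == "__main__":
    print(edit_operations("AATAAGC", "ATTAAGC"))
    print(edit_operations("AAT-AAGC", "AATTAAGC"))
    print(edit_operations("AATAAGC", "AA-AAGC"))
```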
11. Dynamic Programming Alignment Algorithm (Needleman-Wunsch)
If a_1, a_2, ..., a_i and b_1, b_2, ..., b_j have been aligned, there are three possible next moves:
- Match a_{i+1} with b_{j+1}
- Match a_{i+1} with a space
- Match b_{j+1} with a space
Choose the move that maximizes the similarity of the two sequences.
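The three moves translate directly into a table-filling recurrence. The following is a minimal sketch, not a reference implementation: the function name is hypothetical, the default scores (+1 match, -1 mismatch, -2 per space) are illustrative, and ties in the traceback are broken in favor of matching two characters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a best global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best similarity of a[:i] aligned with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] against spaces only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] against spaces only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # match a_i with b_j
                score[i - 1][j] + gap,       # match a_i with a space
                score[i][j - 1] + gap,       # match b_j with a space
            )
    # Trace back from the bottom-right cell to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("AATAAGC", "AAAAGC"))
```

The table has (n+1)(m+1) cells and each is filled in constant time, so the running time is proportional to the product of the sequence lengths.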
12. Alignment Scoring System
- +1 for a character match
- -1 for a mismatch (substitution)
- -2 for using a space (indel)
- or
- a penalty of a + bk for a gap of k spaces (affine gap penalty)
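To see how the two gap models differ, the sketch below scores a given alignment with a pluggable gap penalty. The symbols a and b follow the slide; the values a = 3 and b = 1, and the function names, are assumptions chosen for illustration.

```python
from itertools import groupby


def column_kind(column):
    """Classify one alignment column ('-' is a space)."""
    x, y = column
    if x == "-":
        return "gap_in_top"
    if y == "-":
        return "gap_in_bottom"
    return "match" if x == y else "mismatch"


def score_alignment(top, bottom, gap_penalty, match=1, mismatch=-1):
    """Score an alignment; gap_penalty(k) is the cost of a run of k spaces."""
    total = 0
    for kind, run in groupby(zip(top, bottom), key=column_kind):
        k = len(list(run))  # number of consecutive columns of this kind
        if kind == "match":
            total += match * k
        elif kind == "mismatch":
            total += mismatch * k
        else:
            total -= gap_penalty(k)
    return total


def linear_penalty(k):
    return 2 * k  # 2 per space, as in the fixed -2 score


def affine_penalty(k, a=3, b=1):
    return a + b * k  # a to open the gap, b per space (assumed values)


if __name__ == "__main__":
    one_long_gap = ("AATTAAGC", "AA--AAGC")
    two_short_gaps = ("AATTAAGC", "A-T-AAGC")
    for pair in (one_long_gap, two_short_gaps):
        print(score_alignment(*pair, linear_penalty),
              score_alignment(*pair, affine_penalty))
```

Under the linear model both alignments score the same, while the affine model charges the opening cost a twice for the two separate gaps and so prefers the single longer gap.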
13. Global Alignment Matrix
14. Optimal Alignment
15. Other Bioinformatics Algorithms
- Palindromes
- Tandem Repeats
- Longest Common Subsequence
- Double Digest (NP-complete)
- Shortest Common Superstring (NP-complete)
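Of the problems listed, Longest Common Subsequence is the closest relative of the alignment recurrence and makes a compact classroom example. A minimal sketch follows; the function name is hypothetical, and when several longest subsequences exist it returns only one of them.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b."""
    n, m = len(a), len(b)
    # length[i][j] = LCS length of a[:i] and b[:j]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    # Trace back to recover the characters of one optimal subsequence.
    result = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i, j = i - 1, j - 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


if __name__ == "__main__":
    print(longest_common_subsequence("AATAAGC", "ATTAAGC"))
```

This is global alignment with a score of 1 for a match and 0 for everything else, which makes it a convenient bridge between the two topics.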
16. References
- Clote and Backofen, Computational Molecular Biology, Wiley
- Gusfield, Algorithms on Strings, Trees, and Sequences, Cambridge University Press
- Mount, Bioinformatics, Cold Spring Harbor Press
- Setubal and Meidanis, Introduction to Computational Molecular Biology, PWS
- Waterman, Introduction to Computational Biology, CRC Press