Title: The Genome Access Course Multiple Sequence Alignments
1. The Genome Access Course: Multiple Sequence Alignments
2. Pairwise vs. Multiple Alignment
- Aligning two sequences with the Smith-Waterman
algorithm is straightforward, but the computational
cost of extending the alignment to more sequences
grows exponentially with the number of sequences.
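To make the exponential growth concrete, the sketch below counts the cells of a full dynamic-programming lattice for N sequences. It assumes the direct generalization of the pairwise scoring matrix to N dimensions; the function names are illustrative only.

```python
from math import prod

def lattice_cells(lengths):
    """Cells in the full N-dimensional lattice: one axis per sequence."""
    return prod(n + 1 for n in lengths)

def predecessors_per_cell(num_sequences):
    """Each interior cell looks back at 2^N - 1 neighbours
    (every non-empty subset of sequences may advance by one residue)."""
    return 2 ** num_sequences - 1

# Sequences of length 100: the work multiplies with every added sequence.
for n in (2, 3, 4, 5):
    print(n, lattice_cells([100] * n), predecessors_per_cell(n))
```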
3. Methods for Aligning Multiple Sequences
- Dynamic Programming
- Progressive
- Iterative
- Genetic Algorithm
- Hidden Markov Models (HMM)
4. Dynamic Programming
- A technique for designing efficient algorithms
for optimization problems
- Some specific properties of the problem are
required for the dynamic programming technique to
be applicable
- Applicable when a large search problem can be
broken down into stages
- Trivial solutions to sub-problems contribute to
the overall solution
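As an illustration of sub-problem solutions building up the overall solution, here is a minimal sketch of Smith-Waterman local alignment scoring for two sequences. The match, mismatch, and gap values are arbitrary assumptions, and a linear gap penalty is used for brevity.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty)."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1];
    # 0 means "start a fresh alignment here".
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + s,   # align the two residues
                H[i - 1][j] + gap,     # gap in b
                H[i][j - 1] + gap,     # gap in a
            )
            best = max(best, H[i][j])
    return best

print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell is a trivial sub-problem that depends only on three neighbours, which is what makes the two-sequence case tractable.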
5. Optimal Alignment
- For a given group of sequences, there is no
single "correct" alignment, only an alignment
that is "optimal" according to some set of
calculations.
- Determining which alignment is best for a given
set of sequences is an individual decision.
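The dependence of "optimal" on the scoring scheme can be sketched with a sum-of-pairs score. The scoring values and the two candidate alignments below are made up for illustration; sum-of-pairs is only one of several possible objective functions.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings."""
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue              # gap opposite gap contributes nothing
            elif x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total

ungapped = ["ACGT", "AGGT"]
gapped = ["AC-GT", "A-GGT"]

# Default scheme prefers the ungapped alignment ...
print(sum_of_pairs(ungapped), sum_of_pairs(gapped))
# ... while a scheme with costly mismatches and cheap gaps prefers the gapped one.
print(sum_of_pairs(ungapped, mismatch=-5, gap=-1),
      sum_of_pairs(gapped, mismatch=-5, gap=-1))
```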
6. Progressive Methods
- Add new sequences one at a time to a pairwise
alignment generated with dynamic programming
- Most closely related sequences are aligned first
- Modeled by an evolutionary tree
- Sensitive to initial alignments
7. Progressive Pairwise Methods
- Most of the available multiple alignment programs
use some sort of incremental or progressive
method that makes pairwise alignments, then adds
new sequences one at a time to these
aligned groups.
- This is an approximate method!
8. ClustalW
- Weighted Clustal
- Performs pairwise alignments of all sequences
- Produces a phylogenetic tree by the neighbor-joining
method
- Aligns sequences sequentially, following the
branching order of the tree
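The "distances, then tree, then order of alignment" idea can be sketched as follows. This is not ClustalW's implementation: it assumes equal-length toy sequences, uses the fraction of differing positions instead of real pairwise alignment scores, and substitutes simple average-linkage clustering for neighbor-joining. All names and sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def guide_order(seqs):
    """Merge order from average-linkage clustering of a {name: sequence} dict."""
    clusters = [(name,) for name in seqs]
    merges = []

    def dist(c1, c2):
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(p_distance(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two closest clusters; these would be aligned next.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges

toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "ACGAACTA", "s4": "TTGAACTA"}
for step in guide_order(toy):
    print(step)
```

The closest pair is joined first, and the final step joins the two groups, mirroring the order in which a progressive method would build up the alignment.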
9. Other Software
- MultAlin
- SAM
- HMMER
- PIMA
- treealign
- PAM
10. Commercial Software
- VectorNTI Suite AlignX
- GCG Pileup
- DNAStar
11. Genetic Algorithm
- Machine learning algorithm
- Simulates evolutionary changes in the sequences
- Alignments not necessarily optimal
- Seeks to increase the initial MSA score by
simulating gap insertion and recombination
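A toy sketch of the idea follows. It is not the algorithm of any particular program: the sum-of-pairs objective, the fixed alignment width, the gap-moving mutation, the row-wise recombination, and all parameter values are assumptions chosen to keep the example short.

```python
import random
from itertools import combinations

def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of equal-length gapped rows."""
    total = 0
    for col in zip(*rows):
        for x, y in combinations(col, 2):
            if x == "-" and y == "-":
                continue
            total += gap if "-" in (x, y) else (match if x == y else mismatch)
    return total

def random_alignment(seqs, width, rng):
    """Pad each sequence to `width` by inserting gaps at random positions."""
    rows = []
    for s in seqs:
        chars = list(s)
        for _ in range(width - len(s)):
            chars.insert(rng.randint(0, len(chars)), "-")
        rows.append("".join(chars))
    return rows

def mutate(rows, rng):
    """Move one gap in one randomly chosen row to a new position."""
    rows = list(rows)
    i = rng.randrange(len(rows))
    chars = list(rows[i])
    gaps = [k for k, c in enumerate(chars) if c == "-"]
    if gaps:
        chars.pop(rng.choice(gaps))
        chars.insert(rng.randint(0, len(chars)), "-")
        rows[i] = "".join(chars)
    return rows

def recombine(a, b, rng):
    """Child takes each row from one parent or the other."""
    return [rng.choice(pair) for pair in zip(a, b)]

def evolve(seqs, generations=200, pop_size=30, seed=0):
    rng = random.Random(seed)
    width = max(len(s) for s in seqs) + 2
    population = [random_alignment(seqs, width, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=sp_score, reverse=True)
        parents = population[: pop_size // 2]   # better half survives unchanged
        children = [
            mutate(recombine(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=sp_score)

best = evolve(["ACGT", "AGT", "ACT"])
print("\n".join(best), sp_score(best))
```

Because the better half of each generation is carried over unchanged, the best score never decreases, but nothing guarantees that the search reaches the optimal alignment.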
12. Markov Models
A Markov model is a probabilistic process over a
finite set, S1, ..., Sk, usually called its
states. Each state-transition generates a
character from the alphabet of the
process. State transitions are determined by a
transition probability matrix, which is
ordinarily independent of history.
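A minimal sketch of such a process for DNA, under the common simplification that the state is the nucleotide just emitted. The transition probabilities are hypothetical values chosen for illustration, and the uniform initial probability is also an assumption.

```python
import math

# Hypothetical transition probabilities P(next | current); each row sums to 1.
TRANSITIONS = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2},
    "G": {"A": 0.2, "C": 0.2, "G": 0.4, "T": 0.2},
    "T": {"A": 0.2, "C": 0.2, "G": 0.2, "T": 0.4},
}

def log_probability(seq, transitions, initial=0.25):
    """Log-probability of seq: initial symbol, then one transition per step."""
    logp = math.log(initial)
    for current, nxt in zip(seq, seq[1:]):
        # Only the current state matters, not the earlier history.
        logp += math.log(transitions[current][nxt])
    return logp

print(log_probability("AAAA", TRANSITIONS), log_probability("ACGT", TRANSITIONS))
```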
13. Hidden Markov Models
- A Markov model in which the states are hidden.
- A given state in the sequence cannot be
determined, but probabilities can be calculated.
- Can be designed such that biological relevance
can be assigned to a state
- Originally applied to speech recognition
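To illustrate inferring hidden states from observed symbols, here is a small sketch of Viterbi decoding for a two-state HMM. The states "H" (GC-rich region) and "L" (AT-rich region) and every probability below are hypothetical values picked for the example.

```python
import math

STATES = "HL"
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def viterbi(obs, states, start, trans, emit):
    """Most probable hidden-state path for a non-empty observed sequence."""
    # V[s] = log-probability of the best path that ends in state s.
    V = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []
    for symbol in obs[1:]:
        prev, V, pointers = V, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            V[s] = (prev[best_prev] + math.log(trans[best_prev][s])
                    + math.log(emit[s][symbol]))
            pointers[s] = best_prev
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: V[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))

print(viterbi("ATATCGCGCG", STATES, START, TRANS, EMIT))
```

The states themselves are never observed; the path is the most probable explanation of the sequence under the model.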
14. Uses of Hidden Markov Models
- Multiple sequence alignment
- Gene prediction
- Protein families (Profile HMM)
- Fold Recognition
- CpG island detection
15. DNA Sequence Transition Tables
Normal DNA
CpG Island
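Two such transition tables can be compared with a log-odds score, sketched below. The tables here are estimated from tiny made-up training strings with a pseudocount of 1; they are not the values from the slide, and the function names are illustrative.

```python
import math

ALPHABET = "ACGT"

def estimate_transitions(training_seqs, pseudocount=1.0):
    """Row-normalized dinucleotide transition table with pseudocounts."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in training_seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {
        a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
        for a in ALPHABET
    }

def log_odds(seq, model_plus, model_minus):
    """Log2 odds that seq came from model_plus rather than model_minus."""
    return sum(
        math.log2(model_plus[a][b] / model_minus[a][b])
        for a, b in zip(seq, seq[1:])
    )

# Made-up training data standing in for real annotated sequences.
island = estimate_transitions(["CGCGCGCG"])
normal = estimate_transitions(["ATATATAT"])

print(log_odds("CGCG", island, normal))   # positive: looks like an island
print(log_odds("ATAT", island, normal))   # negative: looks like normal DNA
```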
16. Building a Profile HMM
- Construct a transition matrix for same-length
sequences with no gaps
- Correct for single insertions
- Correct for variable length insertions
- Correct for constant deletions
- Correct for variable length deletions
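The first step, a model for same-length sequences with no gaps, can be sketched as one set of emission probabilities per column. This covers only the gapless starting point, not the insert and delete corrections; the pseudocount and the example sequences are assumptions.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def match_emissions(seqs, pseudocount=1.0):
    """Per-column emission probabilities from same-length, ungapped sequences."""
    columns = []
    for col in zip(*seqs):
        counts = Counter(col)
        total = len(col) + pseudocount * len(ALPHABET)
        columns.append({a: (counts[a] + pseudocount) / total for a in ALPHABET})
    return columns

def log_likelihood(seq, columns):
    """Log-probability of seq passing through the match states in order."""
    return sum(math.log(col[ch]) for ch, col in zip(seq, columns))

profile = match_emissions(["ACGT", "ACGA", "ACCT"])
print(log_likelihood("ACGT", profile), log_likelihood("TTTT", profile))
```

With no gaps, every sequence moves from one match state to the next with probability 1, so only the emissions need to be estimated at this stage.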
17. Sequence Alignment HMM
Match State
Delete State
Insert State
18. MSA with HMMs
- Construct a profile HMM
- Find most likely path for each sequence
- Sequence of matching/insert/delete states is the
alignment
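The last step, reading the alignment off the state paths, can be sketched as below. The paths are supplied by hand here (in practice they would come from a most-likely-path search over the profile HMM). The output convention is an assumption: upper case for match columns, "-" for deletes, lower case for inserts, and "." as padding in insert columns.

```python
def paths_to_alignment(seqs, paths):
    """Turn per-sequence state paths (strings over M, I, D) into aligned rows."""
    model_len = sum(state in "MD" for state in paths[0])
    parsed = []
    for seq, path in zip(seqs, paths):
        residues = iter(seq)
        columns = []                        # one symbol per model position
        inserts = [""] * (model_len + 1)    # inserts[k]: residues after k positions
        for state in path:
            if state == "M":
                columns.append(next(residues).upper())
            elif state == "D":
                columns.append("-")         # delete state emits no residue
            else:                           # "I": extra residue between positions
                inserts[len(columns)] += next(residues).lower()
        parsed.append((columns, inserts))
    # Each insert region is as wide as the longest insert found there.
    widths = [max(len(ins[k]) for _, ins in parsed) for k in range(model_len + 1)]
    rows = []
    for columns, inserts in parsed:
        row = inserts[0].ljust(widths[0], ".")
        for k, symbol in enumerate(columns, start=1):
            row += symbol + inserts[k].ljust(widths[k], ".")
        rows.append(row)
    return rows

rows = paths_to_alignment(["ACGT", "AGT", "ACTGT"], ["MMMM", "MDMM", "MMIMM"])
print("\n".join(rows))
```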
19. Editing MSAs
- CINEMA
- MACAW
- BOXSHADE
- PRETTYBOX
20. Databases Based on MSAs
- PROSITE
- FINGERPRINTS
- BLOCKS