Multiple Sequence Alignment - PowerPoint PPT Presentation

Slides: 18

Provided by: patt86

1
Chapter 5
Multiple Sequence Alignment
2

Multiple alignment is an extension of pairwise
alignment where multiple sequences are aligned
This alignment provides insights not possible in
pairwise alignments, such as
Conserved sequence patterns
Conserved and functionally critical amino acid
residues
Prerequisite for phylogenetic analyses
Prediction of protein secondary and tertiary
structures
Design of degenerate PCR primers

3
Scoring Function

The purpose of multiple alignment is to line up
sequences in a way so that a maximum number of
residues from each sequence are matched according
to a scoring function
The scoring function is generally based on sum
of pairs (SP)
The SP is the sum of all pairwise scores for all
residues in the alignment

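Sequence 1   G  K  N
Sequence 2   T  R  N
Sequence 3   S  H  E

Column 1: G-T = 1, T-S = 1, G-S = 0, total = 2
Column 2: K-R = 2, R-H = 0, K-H = -1, total = 1
Column 3: N-N = 6, N-E = 0, N-E = 0, total = 6
SP score = 2 + 1 + 6 = 9
Blosum62 substitution matrix
Thus 2^9 = 512 times more likely than by random
chance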
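
The sum-of-pairs arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not a general scoring tool: the pair scores are copied by hand from the slide's worked example rather than loaded from a substitution matrix file, and the names `PAIR_SCORES`, `pair_score` and `column_scores` are made up for this illustration.

```python
from itertools import combinations

# Pair scores copied from the slide's worked example
# (keys are alphabetically sorted residue pairs).
PAIR_SCORES = {
    ("G", "T"): 1, ("S", "T"): 1, ("G", "S"): 0,
    ("K", "R"): 2, ("H", "R"): 0, ("H", "K"): -1,
    ("N", "N"): 6, ("E", "N"): 0,
}

def pair_score(a, b):
    """Look up the score for an unordered residue pair."""
    return PAIR_SCORES[tuple(sorted((a, b)))]

def column_scores(alignment):
    """Sum of all pairwise scores for each column of a gap-free alignment."""
    return [
        sum(pair_score(a, b) for a, b in combinations(column, 2))
        for column in zip(*alignment)
    ]

alignment = ["GKN", "TRN", "SHE"]
per_column = column_scores(alignment)
sp_total = sum(per_column)
print(per_column, sp_total, 2 ** sp_total)  # [2, 1, 6] 9 512
```

Each column contributes one score per pair of sequences, so three sequences give three pair scores per column.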
4
Exhaustive Algorithms
Brute Force Algorithm
Similar to dynamic programming: searches for the
best solution by examining every possible solution
In pairwise alignment, use a 2D matrix
For N sequences, use an N-dimensional matrix
Number of calculations increases exponentially
(N x N x N x N)
Generally only useful for <10 short sequences
Divide and Conquer Alignment (DCA)
Identify regional similarities in multiple
sequences
Do a brute force alignment of the similar regions
Join the independently aligned regions
http://bibiserv.techfak.uni-bielefeld.de/dca/
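
To make the exponential growth concrete, the sketch below counts the cells of the full dynamic-programming matrix for a given set of sequence lengths. It assumes the standard formulation in which each dimension has one extra position for the leading gap and each cell is reached from 2^N - 1 neighbouring cells; the function names are illustrative only.

```python
def dp_cells(lengths):
    """Number of cells in the full N-dimensional dynamic-programming matrix."""
    cells = 1
    for length in lengths:
        cells *= length + 1  # one extra position per dimension for the leading gap
    return cells

def predecessors_per_cell(n_sequences):
    """Each cell can be reached from 2**N - 1 neighbouring cells."""
    return 2 ** n_sequences - 1

# Sequences of 100 residues each: the work grows with every added sequence.
for n in (2, 3, 5, 10):
    print(n, dp_cells([100] * n), predecessors_per_cell(n))
```

Two sequences of 100 residues need about ten thousand cells, while ten such sequences already need on the order of 10^20, which is why the exhaustive approach is limited to a handful of short sequences.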
5
(No Transcript)
6
Heuristic Algorithm
Progressive Alignment Method

Pairwise alignment by Needleman-Wunsch of all
pairs
Records similarity scores of aligned pairs
Scores entered into matrix
Guide tree constructed that reflects similarity
between aligned pairs
Most closely related sequences re-aligned with
Needleman-Wunsch
Different substitution matrices are selected
depending on evolutionary distance between
sequences to be aligned
Aligned pair converted to consensus sequence
with fixed gaps
Consensus sequence treated as an ordinary sequence
for the next step, which is pairwise alignment with
the most closely related sequence in the guide tree
Next consensus sequence is calculated and
process repeated until all sequences are aligned
Most famous: ClustalW (command line) and ClustalX
(GUI)
http://www.ebi.ac.uk/Tools/clustalw2/index.html
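
The first steps of the progressive method (score all pairs, then pick the most similar pair to align first) can be sketched as follows. This is a simplified illustration, not ClustalW's implementation: it assumes a flat match/mismatch score and a linear gap penalty instead of substitution matrices and context-sensitive gaps, it reduces the guide tree to a single "closest pair" lookup, and the sequences and function names are made up.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score by Needleman-Wunsch with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def score_matrix(seqs):
    """Similarity score for every pair of named sequences."""
    return {
        (x, y): nw_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }

def closest_pair(scores):
    """The pair with the highest similarity score is aligned first."""
    return max(scores, key=scores.get)

# Hypothetical toy sequences, for illustration only.
seqs = {"s1": "GATTACA", "s2": "GATTGCA", "s3": "CCTTAGG"}
scores = score_matrix(seqs)
print(scores)
print(closest_pair(scores))
```

A full progressive aligner would go on to build a tree from the score matrix, merge the closest pair into a consensus or profile, and repeat until every sequence has been added.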

7
Download and install ClustalW from
ftp://ftp.ebi.ac.uk/pub/software/clustalw2/2.0.9/
Spend a few minutes entering sequences and doing
alignments
8

ClustalW uses gap penalties that are context
sensitive
Gaps cost more next to runs of hydrophobic
amino acids (more likely to be in internal
conserved regions of a protein) than next to
hydrophilic regions or G, which are likely to be
on the outside in loops
Weighting scheme: closely related sequences are
given a lower weight
The weight depends on the branch length divided
by the number of sequences sharing that branch
This minimizes a possible dominating effect of
a group of closely related sequences
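
The weighting idea can be illustrated with a small sketch. It assumes a rooted guide tree stored as a child-to-parent mapping and computes each sequence's weight as the sum, along the path from leaf to root, of branch length divided by the number of sequences below that branch. The tree, its branch lengths and the function names are hypothetical, and ClustalW's own normalization steps are left out.

```python
def leaves_below(tree, node):
    """Count leaf sequences under a node; tree maps child -> (parent, branch_length)."""
    children = [c for c, (p, _) in tree.items() if p == node]
    if not children:
        return 1
    return sum(leaves_below(tree, c) for c in children)

def sequence_weight(tree, leaf):
    """Sum branch_length / number_of_sharing_sequences from the leaf up to the root."""
    weight, node = 0.0, leaf
    while node in tree:
        parent, length = tree[node]
        weight += length / leaves_below(tree, node)
        node = parent
    return weight

# Hypothetical guide tree: A and B are close relatives, C is more distant.
tree = {
    "A": ("AB", 0.1),
    "B": ("AB", 0.1),
    "AB": ("root", 0.4),
    "C": ("root", 0.5),
}
weights = {leaf: sequence_weight(tree, leaf) for leaf in ("A", "B", "C")}
print(weights)
```

Here A and B split the shared 0.4 branch and end up with a weight of about 0.3 each, while the lone sequence C keeps its full 0.5, so the two near-duplicates do not outvote the divergent sequence.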

9
Drawbacks and Solutions

Based on global alignment, so only sequences of
similar length can be aligned
The long gaps required to align sequences of
dissimilar length are penalized
Greedy algorithm: once gaps are introduced, they
stay in subsequent consensus sequences

10
T-Coffee

Tree-based Consistency Objective Function for
alignment Evaluation
http://www.ebi.ac.uk/Tools/t-coffee/
http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
Performs global alignment with clustal
Local pairwise alignment with Lalign
Global and ten best local alignments are pooled
to form a library
All pairwise alignments are then aligned with a
third possible sequence
Distance matrix calculated to build a guide tree
Guide tree used for final multiple alignment
Does not get stuck in sub-optimal initial
alignments
Slower than clustal
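
The step in which pairwise alignments are checked against a third sequence can be sketched as a re-weighting of the library. The sketch assumes the commonly described rule that a residue pair gains, for every residue of a third sequence aligned to both, the smaller of the two connecting weights. The library entries and weights are hypothetical, and only pairs already present in the primary library are re-weighted here.

```python
from collections import defaultdict

def extend_library(primary):
    """Re-weight residue pairs by their consistency with third sequences.

    primary maps ((seq, pos), (seq, pos)) -> weight from the pooled pairwise alignments.
    """
    neighbours = defaultdict(dict)
    for (r1, r2), w in primary.items():
        neighbours[r1][r2] = w
        neighbours[r2][r1] = w

    extended = {}
    for (r1, r2), w in primary.items():
        total = w
        for mid, w1 in neighbours[r1].items():
            if mid[0] in (r1[0], r2[0]):
                continue  # the intermediate residue must come from a third sequence
            w2 = neighbours[r2].get(mid)
            if w2 is not None:
                total += min(w1, w2)  # support is limited by the weaker link
        extended[(r1, r2)] = total
    return extended

# Hypothetical primary library: residue 0 of A, B and C all pair with each other.
primary = {
    (("A", 0), ("B", 0)): 88,
    (("A", 0), ("C", 0)): 77,
    (("C", 0), ("B", 0)): 100,
    (("A", 1), ("B", 1)): 88,
}
extended = extend_library(primary)
print(extended)
```

Pairs that are supported through a third sequence gain weight, while the unsupported pair (A1, B1) keeps its original value, so consistent matches are favoured when the final alignment is built.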

11
dbClustal

First performs BLASTP search for a query sequence
Aligned pairs are analyzed to obtain anchor
points (local conserved regions) using a program
called Ballast
Global alignment generated by Clustal, weighted
toward the anchor points
Initial local alignment minimizes errors in
divergent sequences
Multiple alignment subsequently evaluated by
NorMD which removes poorly aligned sequences
http://bips.u-strasbg.fr/PipeAlign/jump_to.cgi?DbClustalnoid

12
Partial Order Alignment (POA)

http://bioinformatics.ucla.edu/poa/
Multiple alignments performed on more and more
sequences from a list
Identical residues condensed to nodes
Each new sequence aligned with each sequence of
the graph model
Eliminates the problem of error fixation
Faster and more accurate than clustal

13
PRALINE

http://zeus.cs.vu.nl/programs/pralinewww/
Builds profiles of sequences to be aligned
Profiles generated by PSI-BLAST
Because profiles contain information on close
relatives, divergent sequences are more
accurately aligned
Program can incorporate secondary protein
structure
Very sophisticated but very slow

14
Iterative Alignment

PRRN
Find optimal solution by iteratively modifying
sub-optimal solutions
http://prrn.ims.u-tokyo.ac.jp/
Multiple alignment is performed on whole group of
sequences
Sequences randomly distributed into two groups
Dynamic programming applied to consensus
sequences derived from each group
The random split is repeated and another round of
dynamic programming alignment performed
This is repeated until the alignment score no
longer increases
A multiple alignment of the sequences is then
performed again
Process repeated until multiple alignment score
no longer improves

15
Iterative Alignment

DIALIGN2
http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=dialign
Breaks all sequences down into segments, and
performs alignment between segments
High-scoring segments are progressively assembled
into larger and larger sequences
The score of an alignment is calculated from the
blocks and not from individual residues
Sequence regions between blocks are left unaligned
Well suited to alignment of divergent sequences

16
Practical Issues

DNA alignments are based on only 4 nucleotides,
and are less reliable than protein sequence
alignments
Alignment of DNA sequences does not consider
functional issues, such as gene boundaries
Insertion of gaps may break codons or cause a
frameshift that would not be tolerated in the
protein and is functional nonsense
Thus, it is always better to align protein
sequences
Possible to convert DNA to amino acid sequence,
then align, and then decode back to DNA
RevTrans (http://www.cbs.dtu.dk/services/RevTrans/)
PROTA2DNA (missing link)
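
The "decode back to DNA" step can be sketched as threading each sequence's codons onto its aligned protein, so that every gap becomes three gap characters and no codon is split. This is a minimal illustration of the idea and not how RevTrans is implemented: it assumes the protein alignment is already available, that each DNA sequence is exactly the coding region with no stop codon, and the example sequences and the name `thread_codons` are made up.

```python
def thread_codons(aligned_protein, dna, gap="-"):
    """Map an aligned protein back onto its coding DNA, three bases per residue."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) != 3 * n_residues:
        raise ValueError("DNA length must be three times the number of residues")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    # A protein gap becomes a whole-codon gap, so the reading frame is preserved.
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)

# Hypothetical protein alignment and the matching coding sequences.
protein_alignment = {"seq1": "MK-L", "seq2": "MKTL"}
coding_dna = {"seq1": "ATGAAACTG", "seq2": "ATGAAGACCCTG"}

dna_alignment = {
    name: thread_codons(protein_alignment[name], coding_dna[name])
    for name in protein_alignment
}
for name, row in dna_alignment.items():
    print(name, row)
```

Because gaps are only ever inserted in multiples of three at codon boundaries, the resulting DNA alignment cannot contain a frameshift.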

17
Editing and Format

Most alignment programs require final editing by
a human to ensure that there are no problems in
functionality
Finding badly aligned regions
Removing non-sensical gaps etc.
http://www.mbio.ncsu.edu/bioEdit/bioedit.html
Need to convert one sequence format to another
http://iubio.bio.indiana.edu/cgi-bin/readseq.cgi/
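
As an illustration of what a format conversion involves, the sketch below reads an aligned FASTA file and writes it in sequential PHYLIP layout (a header line with the number of sequences and the alignment length, then one line per sequence with the name padded to 10 characters). It is a minimal example with made-up input and function names, not a replacement for a dedicated converter such as readseq.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].split()[0], []  # keep only the first word as the name
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records

def to_phylip(records):
    """Write aligned records in sequential PHYLIP layout (names cut or padded to 10 chars)."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    lines = [f" {len(records)} {lengths.pop()}"]
    lines += [f"{name[:10]:<10}{seq}" for name, seq in records]
    return "\n".join(lines)

# Hypothetical aligned input.
fasta = ">seq1 first sequence\nMK-L\n>seq2\nMKTL\n"
records = parse_fasta(fasta)
phylip = to_phylip(records)
print(phylip)
```

The length check matters because PHYLIP describes an alignment: unaligned sequences of different lengths cannot be written in this format.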
