Multiple Sequence Alignment - PowerPoint PPT Presentation

About This Presentation
Title:

Multiple Sequence Alignment

Slides: 18
Provided by: patt86

Transcript and Presenter's Notes

1
Chapter 5
Multiple Sequence Alignment
2
  • Multiple alignment is an extension of pairwise
    alignment where multiple sequences are aligned
  • This alignment provides insights not possible in
    pairwise alignments, such as
  • Conserved sequence patterns
  • Conserved and functionally critical amino acid
    residues
  • Prerequisite for phylogenetic analyses
  • Prediction of protein secondary and tertiary
    structures
  • Design of degenerate PCR primers

3
Scoring Function
  • The purpose of multiple alignment is to line up
    sequences so that a maximum number of residues
    from each sequence are matched according to a
    scoring function
  • The scoring function is generally based on sum
    of pairs (SP)
  • The SP is the sum of all pairwise scores for all
    residues in the alignment

Sequence 1     G        K        N
Sequence 2     T        R        N
Sequence 3     S        H        E

Pair scores per column (Blosum62 substitution matrix)
Seq 1 vs 2     G-T -2   K-R  2   N-N  6
Seq 2 vs 3     T-S  1   R-H  0   N-E  0
Seq 1 vs 3     G-S  0   K-H -1   N-E  0
Column sum         -1        1        6    Total 6

Thus 2^6 = 64 times more likely than by random
chance
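
The sum-of-pairs idea described on this slide can be sketched in a few lines of Python. This is only an illustration, not the scoring code of any alignment program: the names sp_score and toy_score are made up for this example, and toy_score (1 for a match, 0 otherwise) is a deliberately simple stand-in for a real substitution matrix such as Blosum62. Gaps are not handled.

```python
from itertools import combinations

def toy_score(a, b):
    """Hypothetical stand-in for a substitution matrix lookup:
    1 for identical residues, 0 otherwise (NOT Blosum62)."""
    return 1 if a == b else 0

def sp_score(alignment, score=toy_score):
    """Sum-of-pairs score: for every column, add the score of
    every pair of residues found in that column."""
    total = 0
    for column in zip(*alignment):           # walk the alignment column by column
        for a, b in combinations(column, 2):  # every pair of rows in this column
            total += score(a, b)
    return total

alignment = ["GKN", "TRN", "SHE"]
print(sp_score(alignment))
```

Passing a function that looks up a real substitution matrix in place of toy_score would give the kind of score worked out on the slide.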
4
Exhaustive Algorithms
Brute Force Algorithm
  • Like the dynamic programming algorithms used for
    pairwise alignment, it searches for the best
    solution by examining every possible solution
  • Pairwise alignment uses a 2D matrix
  • For N sequences, use an N-dimensional matrix
  • Number of calculations increases exponentially
    (N x N x N x N)
  • Generally only useful for <10 short sequences
Divide and Conquer Alignment (DCA)
  • Identify regional similarities in multiple
    sequences
  • Do a brute force alignment of the similar regions
  • Join the independently aligned regions
  • http://bibiserv.techfak.uni-bielefeld.de/dca/
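
To see why the exhaustive approach is limited to a handful of short sequences, the size of the dynamic programming matrix can be counted directly. The sketch below assumes, for simplicity, that all N sequences have the same length L, so the matrix has (L + 1)^N cells; the function name dp_cells is made up for this illustration.

```python
def dp_cells(seq_length, n_sequences):
    """Cells in an N-dimensional dynamic programming matrix,
    assuming every sequence has the same length."""
    return (seq_length + 1) ** n_sequences

# Matrix size for sequences of 100 residues as more sequences are added
for n in (2, 3, 5, 10):
    print(n, dp_cells(100, n))
```

Each extra sequence multiplies the matrix size by roughly the sequence length, which is the exponential growth the slide refers to.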
5
(No Transcript)
6
Heuristic Algorithm
Progressive Alignment Method
  • Pairwise alignment by Needleman-Wunsch of all
    pairs
  • Records similarity scores of aligned pairs
  • Scores entered into matrix
  • Guide tree constructed that reflects similarity
    between aligned pairs
  • Most closely related sequences re-aligned with
    Needleman-Wunsch
  • Different substitution matrices are selected
    depending on evolutionary distance between
    sequences to be aligned
  • Aligned pair converted to consensus sequence
    with fixed gaps
  • Consensus sequence treated as an ordinary
    sequence for the next step, which is pairwise
    alignment with the most related sequence in the
    guide tree
  • Next consensus sequence is calculated and the
    process repeated until all sequences are aligned
  • Most famous: ClustalW (command line) and ClustalX
    (GUI)
  • http://www.ebi.ac.uk/Tools/clustalw2/index.html
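
The first steps of the progressive method (score all pairs, then pick the most similar pair to align first) can be sketched as follows. This is a simplified illustration and not ClustalW's implementation: the match, mismatch and gap values are arbitrary assumptions, a linear gap penalty is used instead of a substitution matrix with context-sensitive gaps, and the function names are made up for the example.

```python
from itertools import combinations

def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score with a linear gap
    penalty. Scoring values are illustrative assumptions."""
    prev = [j * gap for j in range(len(b) + 1)]   # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def pairwise_scores(seqs):
    """Similarity score for every pair; keys are index pairs (i, j), i < j."""
    return {(i, j): nw_score(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

seqs = ["GKNLT", "GKNLS", "TRHE"]
scores = pairwise_scores(seqs)
closest = max(scores, key=scores.get)   # pair that would be aligned first
print(scores, closest)
```

A full implementation would go on to build the guide tree from these scores and merge sequences in tree order, which is left out here.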

7
Download and install ClustalW from
ftp://ftp.ebi.ac.uk/pub/software/clustalw2/2.0.9/
Spend a few minutes entering sequences and doing
alignments
8
  • ClustalW uses gap penalties that are context
    sensitive
  • Gaps cost more close to runs of hydrophobic
    amino acids (more likely to be in internal
    conserved regions of a protein) than next to
    hydrophilic regions or glycine (G), which are
    likely to be on the outside in loops
  • Weighting scheme: closely related sequences are
    given a lower weight
  • The weighting score is dependent upon the branch
    length divided by the number of shared branches
  • This has the effect of minimizing a possible
    dominating effect of common sequences
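
The weighting idea can be sketched on a small guide tree. This example reads "branch length divided by the number of shared branches" as: each branch length is split evenly among the sequences that share that branch, and a sequence's weight is the sum of its shares from root to leaf. That reading, the tree encoding (a leaf is a string, an internal node is a list of (child, branch_length) pairs) and the function names are assumptions made for illustration.

```python
def count_leaves(node):
    """Number of sequences (leaves) below a node."""
    if isinstance(node, str):
        return 1
    return sum(count_leaves(child) for child, _ in node)

def sequence_weights(node, inherited=0.0, weights=None):
    """Weight of each leaf: sum over the branches on its root-to-leaf
    path of (branch length / number of leaves sharing that branch)."""
    if weights is None:
        weights = {}
    if isinstance(node, str):
        weights[node] = inherited
        return weights
    for child, length in node:
        share = length / count_leaves(child)
        sequence_weights(child, inherited + share, weights)
    return weights

# A and B are close relatives sharing a branch of length 0.2; C is distant
tree = [([("A", 0.1), ("B", 0.1)], 0.2), ("C", 0.5)]
print(sequence_weights(tree))
```

In this toy tree the two close relatives A and B end up with smaller weights than the distant sequence C, so a cluster of similar sequences does not dominate the alignment.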

9
Drawbacks and Solutions
  • Based on global alignment, thus only sequences
    of similar length can be aligned
  • Long gaps required to align sequences of
    dissimilar length are penalized
  • Greedy algorithm: once gaps are introduced,
    they stay in subsequent consensus sequences

10
T-Coffee
  • Tree-based Consistency Objective Function for
    alignment Evaluation
  • http://www.ebi.ac.uk/Tools/t-coffee/
  • http://tcoffee.vital-it.ch/cgi-bin/Tcoffee/tcoffee_cgi/index.cgi
  • Performs global alignment with clustal
  • Local pairwise alignment with Lalign
  • Global and ten best local alignments are pooled
    to form a library
  • All pairwise alignments are then aligned with a
    third possible sequence
  • Distance matrix calculated to build a guide tree
  • Guide tree used for final multiple alignment
  • Does not get stuck in sub-optimal initial
    alignments
  • Slower than clustal
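
The step in which pairwise alignments are checked against a third sequence can be sketched as a re-weighting of the library. The example assumes the library is a dictionary from a pair of residues (each residue written as a (sequence name, position) tuple) to a weight, and that support through a third residue z adds min(W(x, z), W(z, y)) to the pair (x, y). The data layout, the weights and the function name are assumptions for illustration, not T-Coffee's actual code.

```python
from collections import defaultdict

def extend_library(library):
    """library: frozenset({(seq, pos), (seq, pos)}) -> weight.
    Returns weights increased by support through third residues."""
    # For every residue, collect the residues it is aligned with
    neighbours = defaultdict(dict)
    for pair, w in library.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(float)
    for pair, w in library.items():
        extended[pair] += w                      # start from the direct weight

    for z, linked in neighbours.items():         # z acts as the third residue
        residues = list(linked)
        for i in range(len(residues)):
            for j in range(i + 1, len(residues)):
                x, y = residues[i], residues[j]
                if x[0] == y[0]:
                    continue                      # skip residues of the same sequence
                extended[frozenset((x, y))] += min(linked[x], linked[y])
    return dict(extended)

a, b, c = ("A", 0), ("B", 0), ("C", 0)
library = {frozenset((a, b)): 88, frozenset((a, c)): 77, frozenset((c, b)): 100}
extended = extend_library(library)
print(extended[frozenset((a, b))])
```

Pairs that are supported by alignments through other sequences gain weight, which is how consistent residue pairs come to be preferred in the final alignment.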

11
dbClustal
  • First performs BLASTP search for a query sequence
  • Aligned pairs are analyzed to obtain anchor
    points (local conserved regions) using a program
    called Ballast
  • Global alignment generated by Clustal, weighted
    toward anchor points
  • Initial local alignment minimizes errors in
    divergent sequences
  • Multiple alignment subsequently evaluated by
    NorMD which removes poorly aligned sequences
  • http://bips.u-strasbg.fr/PipeAlign/jump_to.cgi?DbClustalnoid

12
Partial Order Alignment (POA)
  • http://bioinformatics.ucla.edu/poa/
  • Multiple alignments performed on more and more
    sequences from a list
  • Identical residues condensed to nodes
  • Each new sequence aligned with each sequence of
    the graph model
  • Eliminates the problem of error fixation
  • Faster and more accurate than clustal

13
PRALINE
  • http://zeus.cs.vu.nl/programs/pralinewww/
  • Builds profiles of sequences to be aligned
  • Profiles generated by PSI-BLAST
  • Because profiles contain information on close
    relatives, divergent sequences are more
    accurately aligned
  • Program can incorporate secondary protein
    structure
  • Very sophisticated but very slow

14
Iterative Alignment
  • PRRN
  • Find optimal solution by iteratively modifying
    sub-optimal solutions
  • http://prrn.ims.u-tokyo.ac.jp/
  • Multiple alignment is performed on whole group of
    sequences
  • Sequences randomly distributed into two groups
  • Dynamic programming applied to consensus
    sequences derived from each group
  • The random split is repeated and another round of
    dynamic programming alignment performed
  • This is repeated until the alignment score no
    longer increases
  • A multiple alignment of the sequences is then
    performed again
  • Process repeated until multiple alignment score
    no longer improves

15
Iterative Alignment
  • DIALIGN2
  • http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?formdialign
  • Breaks all sequences down into segments, and
    performs alignment between segments
  • High-scoring segments are progressively assembled
    into larger and larger sequences
  • The score of an alignment is calculated from the
    blocks and not from individual residues
  • Sequence regions between blocks are left unaligned
  • Well suited to alignment of divergent sequences

16
Practical Issues
  • DNA alignments are based on only 4 nucleotides,
    and are less reliable than protein sequence
    alignments
  • Alignments of DNA sequences do not consider
    functional issues, such as gene boundaries
  • Insertion of gaps may break codons or cause
    frameshifts that would not be tolerated in the
    protein, which is functional nonsense
  • Thus, it is always better to align protein
    sequences
  • Possible to convert DNA to amino acid sequence,
    then align, and then decode back to DNA
  • RevTrans (http://www.cbs.dtu.dk/services/RevTrans/)
  • PROTA2DNA (missing link)
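
The "align as protein, then decode back to DNA" step can be sketched as threading the original codons onto the gapped protein sequence. This is a minimal illustration of the idea, not how RevTrans itself is implemented: the function name is made up, the coding sequence is assumed to be in frame with no stop codon issues, and the code does not check that the codons actually translate to the given residues.

```python
def protein_to_codon_alignment(aligned_protein, dna, gap="-"):
    """Map a gapped protein sequence back to DNA: each residue takes
    the next codon from `dna`, each gap becomes three gap characters."""
    ungapped = aligned_protein.replace(gap, "")
    if len(dna) < 3 * len(ungapped):
        raise ValueError("DNA is too short for the protein sequence")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    out = []
    for residue in aligned_protein:
        out.append(gap * 3 if residue == gap else next(codons))
    return "".join(out)

print(protein_to_codon_alignment("M-KV", "ATGAAAGTT"))  # ATG---AAAGTT
```

Because gaps are always inserted in multiples of three, codons stay intact and no frameshift is introduced in the DNA alignment.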

17
Editing and Format
  • Most alignment programs require final editing by
    a human to ensure that there are no problems in
    functionality
  • Finding badly aligned regions
  • Removing non-sensical gaps etc.
  • http://www.mbio.ncsu.edu/bioEdit/bioedit.html
  • Need to convert one sequence format to another
    http://iubio.bio.indiana.edu/cgi-bin/readseq.cgi/