Multiple Alignment - PowerPoint PPT Presentation

1
Multiple Alignment
  • Stuart M. Brown
  • NYU School of Medicine

3
Pairwise Alignment
  • The alignment of two sequences (DNA or protein)
    is a relatively straightforward computational
    problem.
  • Can be done in a word processor
  • But how to evaluate the results?
  • A natural job for a computer
  • The best solution seems to be an approach called
    Dynamic Programming.

TGCCATAGAGCGTAGTCGTTCCCT <>
CTAGAGAGCGTAGTCAGAGTGTCTTTGAGTTCC
4
Protein or DNA
  • Most of the examples that I will show illustrate
    protein alignments
  • All of the tools can handle either protein or DNA
  • More distantly related protein sequences can be
    aligned because different amino acids can be
    similar

5
Dynamic Programming
  • Dynamic Programming is a very general programming
    technique.
  • It is applicable when a large search space can be
    structured into a succession of stages, such
    that
  • the initial stage contains trivial solutions to
    sub-problems
  • each partial solution in a later stage can be
    calculated by recurring on a fixed number of
    partial solutions in an earlier stage
  • the final stage contains the overall solution
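The three properties above can be seen in a small global alignment scorer. This is only a sketch in the style of Needleman and Wunsch: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, not values from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a linear gap penalty (illustrative values)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initial stage: trivial solutions. A prefix aligned against nothing
    # costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Later stages: each cell is computed from exactly three earlier cells.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Final stage: the bottom-right cell holds the overall score.
    return score[-1][-1]


print(global_alignment_score("ACGT", "AGT"))  # 2 (three matches, one gap)
```

The table has one cell per pair of prefixes, so the work grows with the product of the two sequence lengths.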

7
Global vs. Local Alignments
  • Global alignment algorithms start at the
    beginning of two sequences and add gaps to each
    until the end of one is reached.
  • Local alignment algorithms find the region (or
    regions) of highest similarity between two
    sequences and build the alignment outward from
    there.
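One way to see the difference in code: a local scorer in the style of Smith and Waterman never lets a cell drop below zero, so an alignment may start and end anywhere in either sequence. The sketch below is an assumption-laden illustration (the function name and the scores match +2, mismatch -1, gap -1 are made up for the example), and it returns only the best score, not the alignment itself.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score with a linear gap penalty (illustrative values)."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring table
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a new local alignment start at this cell.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared region GATTACA scores 7 matches x 2; the flanks are ignored.
print(local_alignment_score("AAAGATTACATTT", "CCCGATTACAGGG"))  # 14
```

A global scorer applied to the same pair would also be charged for the unrelated flanking residues.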

9
Global Alignment
  • Global algorithms are often not effective for
    highly diverged sequences and do not reflect the
    biological reality that two sequences may only
    share limited regions of conserved sequence.
  • Sometimes two sequences may be derived from
    ancient recombination events where only a single
    functional domain is shared.
  • Global alignment is useful when you want to force
    two sequences to align over their entire length

10
Global Alignment Programs
  • The Fasta program Align implements the Needleman
    and Wunsch Global alignment algorithm.
  • EMBOSS has the same program called Needle, but it
    also has an improved version called Stretcher
  • (Myers and Miller, CABIOS, 1989)

11
Local Alignment
  • The Fasta program LALIGN implements the
    Smith-Waterman local alignment algorithm.
  • FASTA and BLAST are heuristic local alignment programs
  • NCBI has a BLAST 2 Sequences feature on its
    website
  • http://www.ncbi.nlm.nih.gov/gorf/bl2.html

12
Pairwise Alignment on the Web
  • The ALIGN global alignment program is available
    at several servers
  • http://molbiol.soton.ac.uk/compute/align.html
  • http://www2.igh.cnrs.fr/bin/align-guess.cgi
  • LALIGN local alignment program is available at
    several servers
  • http://www2.igh.cnrs.fr/bin/lalign-guess.cgi
  • http://www.ch.embnet.org/software/LALIGN_form.html
  • LFASTA uses FASTA for local alignment of 2
    sequences
  • http://pbil.univ-lyon1.fr/lfasta.html
  • BLAST 2 Sequences (NCBI)
  • http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html

14
Multiple Alignments
  • In theory, making an optimal alignment between
    two sequences is computationally straightforward
    (Smith-Waterman algorithm), but aligning a large
    number of sequences using the same method is
    almost impossible.
  • The size of the problem increases exponentially
    with the number of sequences involved
  • (the dynamic programming table has as many cells
    as the product of the sequence lengths)
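A quick calculation makes the growth concrete. The helper below is a hypothetical illustration that multiplies the sequence lengths to estimate the number of cells a full dynamic programming table would need; the length of 300 residues is an arbitrary example.

```python
from math import prod


def dp_cells(lengths):
    """Approximate number of table cells: the product of the sequence lengths."""
    return prod(lengths)


# Each added sequence of length 300 multiplies the table size by 300.
for n in range(2, 6):
    print(n, "sequences:", dp_cells([300] * n), "cells")
```

Two sequences of 300 residues need 90,000 cells, three need 27,000,000, and five need about 2.4 trillion, which is why exact multiple alignment is rarely attempted.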

15
Optimal Alignment
  • For a given group of sequences, there is no
    single "correct" alignment, only an alignment
    that is "optimal" according to some set of
    calculations.
  • Determining what alignment is best for a given
    set of sequences is really up to the judgment of
    the investigator.

16
Progressive Pairwise Methods
  • Most of the available multiple alignment programs
    use some sort of incremental or progressive
    method that makes pairwise alignments, then adds
    new sequences one at a time to these
    aligned groups.
  • This is an approximate method!

17
Common Programs
  • CLUSTAL is the most popular alignment program
    that uses a progressive pairwise algorithm.
  • PILEUP is the multiple alignment program in the
    GCG package

18
The CLUSTAL Algorithm
  • First, CLUSTAL calculates approximate pairwise
    similarity scores between all sequences to be
    aligned, and they are clustered into a dendrogram
    (tree structure).
  • Then the most similar pairs of sequences are
    aligned.
  • Averages (similar to consensus sequences) are
    calculated for the aligned pairs.
  • New sequences and clusters of sequences are added
    one by one, according to the branching order in
    the dendrogram.
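The first two steps can be sketched as follows. This is a simplified illustration, not CLUSTAL's actual code: the similarity measure (fraction of shared 2-letter words), the average-linkage clustering, and all function names are assumptions. It only reports the order in which sequences and clusters would be joined; it does not build the alignments.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """All overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=2):
    """Approximate pairwise similarity: shared k-mers / all k-mers (Jaccard)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def join_order(seqs, k=2):
    """Return the merge order of a simple average-linkage clustering."""
    sim = {frozenset(p): similarity(seqs[p[0]], seqs[p[1]], k)
           for p in combinations(seqs, 2)}

    def average(c1, c2):
        # Mean similarity over every pair of members drawn from the two clusters.
        total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        # Join the two most similar clusters first, as in a dendrogram.
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2))
    return merges


example = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}
for left, right in join_order(example):
    print(left, "+", right)
```

In this toy input, sequences "a" and "b" are joined first and "c" is added last, mirroring the branching order a progressive aligner would follow.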

19
CLUSTAL
  • CLUSTAL is a stand-alone multiple alignment
    program (UNIX) that has been improved on an ad
    hoc basis to deal with the real biology of
    sequence alignment
  • Gap penalties can be adjusted based on specific
    amino acid residues, regions of hydrophobicity,
    proximity to other gaps, or secondary structure.
  • It can re-align just selected sequences or
    selected regions in an existing alignment
  • It can compute phylogenetic trees from a set of
    aligned sequences.
  • There are also Mac and PC versions with a nice
    graphical interface (CLUSTALX).

20
Using Clustal
  • Clustal requires that sequences be input as a
    single multi-sequence Fasta file
  • This is where an integrated software platform
    like GCG is nice
  • Output can be in many formats including GCG/MSF,
    Clustal, and multi-Fasta.
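For readers unfamiliar with the format, a multi-sequence Fasta file is a series of records, each with a ">" header line followed by one or more sequence lines. The reader below is a minimal sketch with a hypothetical function name and made-up sample data; real files may need more careful handling of headers and odd characters.

```python
def parse_fasta(text):
    """Parse multi-sequence Fasta text into a dict of name -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The record name is the first word after ">".
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    # Sequence lines that were wrapped in the file are joined back together.
    return {n: "".join(parts) for n, parts in records.items()}


sample = """>seq1 first example
ATGCCCATAC
TACTCTTC
>seq2 second example
TATAGAAATA
"""
print(parse_fasta(sample))
```

Combining several single-sequence files into one input for Clustal amounts to concatenating their records in this form.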

21
>PYCDA07TF input_file_1
ATGCCCATACTACTCTTCTGGTAGTTGGAATGAAGCCCAAAATATGATAAAACCTTTTCT
TACTAAAGTTTGTCAGGAAGTAGAAAGAATTGCTCATTGTGGAAAATGGGAAGAATGGAG
TGAATGTTCTACTACTTGT
>PYCDA08TR input_file_2
TATAGAAATAAAACTCCATTAAAAAATATTTTCCTTTTTCCTAATTATTTCTCTAAAATA
TAACAATCTAATTCATATAATATCATTACAATCACATATATATCTCTTTAAATTTTGTTC
CCTTTTTCCCTACGAGTTGTATCAGCAATAATCTCCTACAAGGTTAGACGTTGCTTCAAG
TTATTTTCAACAAATTTGGTCATTTTCAGCAAATTTTGCCATTTTCAGCAAATTTTGCCA
TTTTCAACAAATTTTGCCATTTTCAACAGATTTTGCCATTTTCAACAGATTTTGCCATTT
TCAGCAAATTTTGCCATATTCAACAAATTTTGCCATTTTCAGCAAATTTTACCATTTTTA
GCAAATTAGTATACCGTGTTAT
>PYCDA09TRB input_file_3
GCGGGAATATAGAAATAAAACTCCATTAAAAAATATAAACCTTTTTTTTAATTATCACCC
TAAAACATAACAATCTAATTCATATAATATCATTACAATCACATATATATCTCTTTAAAT
AATGATCCCTTTTTCCCTACGAGTTGTATCAGCAATAATCTCCTACAACGGATAGACGTT
GCTTCAAGTTCTTTTCAACAAATTGGGTCATTTTCAGCGAATTTTGCCATTTTCAGCAAA
TTTGGCCATACTCAACAAATTTTGCCATTGGCAACAGATTTTGCCATTTTCAACAGATTT
TGCCGTTGTCAGCAAATTTTGCCATATTCAACAAATTTTGCCAATCTCAGCAAATTTTAC

22
Multiple Alignment tools on the Web
  • There are a variety of multiple alignment tools
    available for free on the web.
  • CLUSTAL is available from a number of sites (with
    a variety of restrictions)
  • Other algorithms are available too
  • Watch out for experimental algorithms: there may
    be a good reason why you have never heard of
    some oddball program

23
Some URLs
  • EMBL-EBI
  • http://www.ebi.ac.uk/clustalw/
  • BCM Search Launcher Multiple Alignment
  • http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html
  • Multiple Sequence Alignment for Proteins (Wash.
    U. St. Louis)
  • http://www.ibc.wustl.edu/service/msa/

24
Editing Multiple Alignments
  • There are a variety of tools that can be used to
    modify a multiple alignment.
  • These programs can be very useful in formatting
    and annotating an alignment for publication.
  • An editor can also be used to make modifications
    by hand to improve biologically significant
    regions in a multiple alignment created by one of
    the automated alignment programs.

25
Challenge of Formats
  • There are many different file formats for
    multiple alignments.
  • Hopefully you can get your alignment program and
    your formatting program to agree on one.
  • The GCG MSF format is very common
  • You can also represent aligned sequences in
    multi-sequence Fasta format with gaps.

26
BOXSHADE
  • Shades by similarity
  • UNIX, Mac, and Windows versions, built into GCG
  • On the web at
  • http://www.ch.embnet.org/software/BOX_form.html
  • http://bioweb.pasteur.fr/seqanal/interfaces/boxshade.html
  • http://huge.eng.uiowa.edu/tscheetz/sequence-analysis/examples/BoxShade/BOX_form.html

29
Other editors
  • The MACAW and SeqVu programs for Macintosh and
    GeneDoc and DCSE for PCs are free and provide
    excellent editor functionality.
  • Many comprehensive molecular biology programs
    include multiple alignment functions
  • MacVector, OMIGA, Vector NTI, and
    GeneTool/PepTool all include a built-in version
    of CLUSTAL

30
SeqVu
31
Editors on the Web
  • Check out CINEMA (Colour INteractive Editor for
    Multiple Alignments)
  • It is an editor created completely in JAVA (old
    browsers beware)
  • It includes a fully functional version of
    CLUSTAL, BLAST, and a DotPlot module

http://www.bioinf.man.ac.uk/dbbrowser/CINEMA2.1/