1
Multiple Alignment
  • Stuart M. Brown
  • NYU School of Medicine

3
Pairwise Alignment
  • The alignment of two sequences (DNA or protein)
    is a relatively straightforward computational
    problem.
  • The best solution seems to be an approach called
    Dynamic Programming.

4
Dynamic Programming
  • Dynamic Programming is a very general programming
    technique.
  • It is applicable when a large search space can be
    structured into a succession of stages, such
    that:
  • the initial stage contains trivial solutions to
    sub-problems
  • each partial solution in a later stage can be
    calculated by recurring on a fixed number of
    partial solutions in an earlier stage
  • the final stage contains the overall solution
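
The three stages above can be sketched for pairwise alignment as follows. This is a minimal illustration of a Needleman-Wunsch-style score, not the code of any GCG program; the function name and the match, mismatch, and gap values are assumptions chosen for the example.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b with a linear gap penalty.

    The scoring values are illustrative assumptions, not program defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initial stage: a prefix aligned against nothing consists only of gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Later stages: each cell depends on exactly three earlier cells.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )

    # Final stage: the last cell holds the score for the full sequences.
    return score[-1][-1]
```

For example, `global_alignment_score("ACGT", "ACT")` scores three matches and one gap.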

6
Global vs. Local Alignments
  • Global alignment algorithms start at the
    beginning of two sequences and add gaps to each
    until the end of one is reached.
  • Local alignment algorithms find the region (or
    regions) of highest similarity between two
    sequences and build the alignment outward from
    there.
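
A local alignment score can be sketched with the same kind of table as a global one, with one change: a cell is never allowed to fall below zero, so an alignment may start and end anywhere. This is a minimal Smith-Waterman-style sketch; the function name and scoring values are assumptions for illustration only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    The scoring values are illustrative assumptions, not program defaults.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 lets a new local alignment start at this cell.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Two sequences that share only a short conserved region, such as `"TTTTGATTACATTTT"` and `"CCCCGATTACACCCC"`, still get a high local score from the shared region alone.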

8
GAP
  • The GCG program GAP implements the Needleman and
    Wunsch global alignment algorithm.
  • Global algorithms are often not effective for
    highly diverged sequences and do not reflect the
    biological reality that two sequences may only
    share limited regions of conserved sequence.
  • Sometimes two sequences may be derived from
    ancient recombination events where only a single
    functional domain is shared.
  • GAP is useful when you want to force two
    sequences to align over their entire length

9
BESTFIT
  • The GCG program BESTFIT implements the
    Smith-Waterman local alignment algorithm.
  • FASTA and BLAST are local alignment algorithms
  • NCBI has a BLAST 2 Sequences feature on its
    website
  • http://www.ncbi.nlm.nih.gov/gorf/bl2.html

10
Pairwise Alignment on the Web
  • The ALIGN global alignment program is available
    at several servers
  • http://molbiol.soton.ac.uk/compute/align.html
  • http://www2.igh.cnrs.fr/bin/align-guess.cgi
  • The LALIGN local alignment program is available
    at several servers
  • http://www2.igh.cnrs.fr/bin/lalign-guess.cgi
  • http://www.ch.embnet.org/software/LALIGN_form.html
  • LFASTA uses FASTA for local alignment of 2
    sequences
  • http://pbil.univ-lyon1.fr/lfasta.html
  • BLAST 2 Sequences (NCBI)
  • http://www.ncbi.nlm.nih.gov/blast/bl2seq/bl2.html

12
Multiple Alignments
  • In theory, making an optimal alignment between
    two sequences is computationally straightforward
    (dynamic programming, e.g. the Smith-Waterman
    algorithm), but aligning a large number of
    sequences using the same method is almost
    impossible.
  • The size of the problem increases exponentially
    with the number of sequences involved
  • (it is proportional to the product of the
    sequence lengths)
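
A quick calculation shows how fast the problem grows. The sketch below counts the cells in the dynamic programming table for a set of sequences, and the number of earlier cells each cell must consult; the function names are made up for this illustration.

```python
from math import prod

def dp_cells(lengths):
    """Number of cells in a full dynamic programming table:
    the product of (length + 1) over all sequences."""
    return prod(n + 1 for n in lengths)

def neighbours_per_cell(k):
    """Earlier cells consulted per cell when aligning k sequences:
    every non-empty choice of which sequences advance by one residue."""
    return 2 ** k - 1

# Sequences of 300 residues each (an arbitrary example length).
for k in (2, 3, 5, 10):
    print(k, dp_cells([300] * k), neighbours_per_cell(k))
```

Two sequences of 300 residues need about 90 thousand cells; three need about 27 million, and each added sequence multiplies the table size by roughly 300 again.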

13
Optimal Alignment
  • For a given group of sequences, there is no
    single "correct" alignment, only an alignment
    that is "optimal" according to some set of
    calculations.
  • Determining what alignment is best for a given
    set of sequences is really up to the judgement of
    the investigator.

14
Progressive Pairwise Methods
  • Most of the available multiple alignment programs
    use some sort of incremental or progressive
    method that makes pairwise alignments, then adds
    new sequences one at a time to these
    aligned groups.
  • This is an approximate method!

15
PILEUP
  • PILEUP is the multiple alignment program in the
    GCG package
  • CLUSTAL is another popular program (also
    available on the RCR server) that uses a similar
    algorithm.

16
The PILEUP Algorithm
  • First, PILEUP calculates approximate pairwise
    similarity scores between all sequences to be
    aligned, and they are clustered into a dendrogram
    (tree structure).
  • Then the most similar pairs of sequences are
    aligned.
  • Averages (similar to consensus sequences) are
    calculated for the aligned pairs.
  • New sequences and clusters of sequences are added
    one by one, according to the branching order in
    the dendrogram.
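
The first step, clustering by pairwise similarity to get a joining order, can be sketched as below. This is not PILEUP's actual code: the similarity measure (fraction of identical positions, with no alignment) and the average-linkage clustering are simplifying assumptions, and all names are hypothetical.

```python
from itertools import combinations

def similarity(a, b):
    """Crude similarity: fraction of identical positions over the shorter sequence."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n if n else 0.0

def guide_order(seqs):
    """Return the merge steps, most similar clusters first.

    seqs maps sequence names to sequences; clusters are tuples of names.
    """
    def avg_sim(c1, c2):
        # Average similarity over all pairs drawn from the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(similarity(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in seqs]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: avg_sim(*p))
        merges.append((c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTACCA"}
print(guide_order(example))
```

In the example, s1 and s2 are joined first because they are the most similar pair, and s3 is then added to that cluster. A real program would align the sequences at each merge step in this order.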

17
PILEUP Considerations
  • Since the alignment is calculated on a
    progressive basis, the order of the initial
    sequences can affect the final alignment.
  • PILEUP parameters: 2 gap penalties (gap insert
    and gap extend) and an amino acid comparison
    matrix.
  • PILEUP will refuse to align sequences that
    require too many gaps or mismatches.
  • PILEUP will take quite a while to align more than
    about 10 sequences

18
Instructions for running PILEUP
  • PILEUP uses a list of sequence files as input
  • You can use output from a FASTA or LOOKUP search
    as a list or make your own list in a text editor
  • A list file can include files from your own
    directory and/or GCG database files.

19
LIST file format
  • List files always begin with two dots (..)
  • ..
  • gp:S31321
  • gp:Yno3_Yeast
  • S51900.pep
  • Yan2_Schpo
  • Ypd1_Caeel
  • A36205
  • Mpp1_Rat begin:100 end:345
  • B46665.pep
  • Ymxg_Bacsu begin:150 end:464
  • A48043.pep
  • List files can also include Begin and End
    positions within a sequence
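
To check a list file before running PILEUP, a small reader such as the one below may help. It is a sketch based only on the format shown on this slide: everything up to the `..` line is skipped, each following line is a sequence name, and optional begin and end positions follow the name. The function name is hypothetical, and the colon in `begin:100` is treated as optional.

```python
import re

# Matches "begin:100", "end:345", and the same without the colon.
ATTR = re.compile(r"(begin|end):?(\d+)$", re.IGNORECASE)

def parse_list_file(text):
    """Return one dict per sequence entry found after the '..' line."""
    entries = []
    in_list = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_list:
            # Nothing counts as an entry until the two dots have been seen.
            in_list = line == ".."
            continue
        if not line:
            continue
        name, *attrs = line.split()
        entry = {"name": name, "begin": None, "end": None}
        for tok in attrs:
            m = ATTR.match(tok)
            if m:
                entry[m.group(1).lower()] = int(m.group(2))
        entries.append(entry)
    return entries

sample = "my sequences\n..\nS51900.pep\nMpp1_Rat begin:100 end:345\n"
print(parse_list_file(sample))
```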

20
PILEUP @myseqs.list
  • Now at the > prompt, type PILEUP and the name of
    the file that is your list of sequence names.
  • However, GCG requires that you precede the
    name of your list file with the @ character.
  • So the command looks like this:
  • > PILEUP @myseqs.list

21
PILEUP Output
> more myseqs.msf

          1501                                              1550
Hsirf2    SERPSKKGKK PKTEKEDKVK HIKQEPVESS LGLSNGVSDL SPEYAVLTST
Muirf2    SERPSKKGKK PKTEKEERVK HIKQEPVESS LGLSNGVSGF SPEYAVLTSA
Chirf2    SERPSKKGKK TKSEKDDKFK QIKQEPVESS FGI.NGLNDV TSDY.FLSSS
Muirf1    LTRNQRKERK SKSSRDTKSK TKRKLCGDVS PDTFS..DGL SSSTLPDDHS
Ratirf1   LTKNQRKERK SKSSRDTKSK TKRKLCGDSS PDTLS..DGL SSSTLPDDHS
Hsirf1    LTKNQRKERK SKSSRDAKSK AKRKSCGDSS PDTFS..DGL SSSTLPDDHS
Chkirf1a  LTKDQKKERK SKSSREARNK SKRKLYEDMR MEESA..ERL TSTPLPDDHS
Hsirf3a
Mmuirf3
Hsirf5    GPAPTDSQPP EDYSFGAGEE EEEEEELQRM LPSLSLTDAV QSGPHMTPYS
Mmuirf6   IPQPQGS.VI NPGSTGSAPW DEKDNDVDED EEEDELEQSQ HHVPIQDTFP
Hump48    ...PPGIVSG QPGTQKVPSK RQHSSVSSER KEEEDAMQNC TLSPSVLQDS
Mup48     ...PAGTLPN QPRNQKSPCK RSISCVSPER EEN...MENG RTNGVVNHSD
Hsirf4    ...PEGAKKG AKQLTLEDPQ MSMSHPYTMT TPYPSLPA.Q VHNYMMPPLD
Mupip     ...PEGAKKG AKQLTLDDTQ MAMGHPYPMT APYGSLPAQQ VHNYMMPPHD
Huicsbp   ...PEEDQK. .......... .......... CKLGVATAGC VNEVTEMECG
Muicsbp   ...PEEEQK. .......... .......... CKLGVAPAGC MSEVPEMECG
Chkicsbp  ...PEEEQK. .......... .......... CKIGVGNGSS LTDVGDMDCS

          1551                                              1600
Hsirf2    IKNEVDSTVN IIVVGQSHLD SNIENQEIVT NPPDICQVVE VTTESDEQPV
Muirf2    IKNEVDSTVN IIVVGQSHLD SNIEDQEIVT NPPDICQVVE VTTESDDQPV
Chirf2    IKNEVDSTVN IVVVGQPHLD GSSEEQVIVA NPPDVCQVVE VTTESDEQPL
Muirf1    SYTTQGYLGQ DLDMER.DIT PALSPCVVSS SLSEWHMQMD I.IPDSTTDL
Ratirf1   SYTAQGYLGQ DLDMDR.DIT PALSPCVVSS SLSEWHMQMD I.MPDSTTDL
Hsirf1    SYTVPGYM.Q DLEVEQ.ALT PALSPCAVSS TLPDWHIPVE V.VPDSTSDL
Chkirf1a  SYTAHDYTGQ EVEVENTSIT LDLSSCEVSG SLTDWRMPME IAMADSTNDI
Hsirf3a
Mmuirf3
Hsirf5    LLKEDVKWPP TLQPPTLQPP VVLGPPAPDP SPLAPPPGNP AGFRELLSEV
Mmuirf6   FL........ NINGSPMAPA SVGNCSVGNC SPESVWP... ......KTEP
Hump48    LNNEEEGASG GAVHSDIGSS SSSSSPEPQE VTDTTEAPFQ ........GD
Mup48     SGSNIGGGGN GSNRSD...S NSNCNSELEE GAGTTEATIR ........ED
Hsirf4    RSWRDYVPDQ PHPEIPYQCP MTFGPRGHHW QGPACENGCQ VTGTFYACAP
Mupip     RSWRDYAPDQ SHPEIPYQCP VTFGPRGHHW QGPSCENGCQ VTGTFYACAP
Huicsbp   RSEIDELIKE .PSVDDYMGM IKRSPSP... P.EACRS..Q LLPDWWAHEP
Muicsbp   RSEIEELIKE .PSVDEYMGM TKRSPSP... P.EACRS..Q ILPDWWVQQP
Chkicsbp  PSAIDDLMKE PPCVDEYLGI IKRSPSP... PQETCRN..P PIPDWWMQQP

22
PILEUP options
  • For a first try, take the default options, but
    give the output file a meaningful name.
  • If you don't get a good alignment, try a less
    stringent matrix and/or gap penalties.
  • > PILEUP -matr=oldpep.cmp
  • It is a good idea to run PILEUP in batch mode if
    you have more than 10 sequences to align
  • > PILEUP -bat

23
CLUSTAL
  • CLUSTAL is a stand-alone (i.e. not integrated
    into GCG) multiple alignment program that is
    superior in some respects to PILEUP
  • Gap penalties can be adjusted based on specific
    amino acid residues, regions of hydrophobicity,
    proximity to other gaps, or secondary structure.
  • It can re-align just selected sequences or
    selected regions in an existing alignment
  • It can compute phylogenetic trees from a set of
    aligned sequences.
  • There are also Mac and PC versions with a nice
    graphical interface (CLUSTALX).

24
Using CLUSTAL
  • On mcrcr0, type: clustal
  • CLUSTAL can only work with sequences in
    multi-sequence FASTA format.
  • The GCG program TOFASTA can convert lists of file
    names into FASTA multi-sequence format.
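
If your sequences are not in GCG files, a multi-sequence FASTA file is simple to write yourself: each sequence gets a header line starting with `>` followed by its name, then the residues on the following lines. The sketch below is an assumption-level helper, not the TOFASTA program; the function name and the 60-residue line width are choices made for this example.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-sequence FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        # Wrap the residues so that no line is longer than `width`.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"

print(to_fasta([("Hsirf2", "SERPSKKGKKPKTEKEDKVK"),
                ("Muirf2", "SERPSKKGKKPKTEKEERVK")]))
```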

25
Multiple Alignment tools on the Web
  • There are a variety of multiple alignment tools
    available for free on the web.
  • CLUSTAL is available from a number of sites (with
    a variety of restrictions)
  • Other algorithms are available too
  • Watch out for experimental algorithms: there
    may be a good reason why you have never heard of
    some oddball program

26
Some URLs
  • EMBL-EBI
  • http://www.ebi.ac.uk/clustalw/
  • BCM Search Launcher Multiple Alignment
  • http://dot.imgen.bcm.tmc.edu:9331/multi-align/multi-align.html
  • Multiple Sequence Alignment for Proteins (Wash.
    U. St. Louis)
  • http://www.ibc.wustl.edu/service/msa/

27
Editing Multiple Alignments
  • There are a variety of tools that can be used to
    modify a multiple alignment.
  • These programs can be very useful in formatting
    and annotating an alignment for publication.
  • An editor can also be used to make modifications
    by hand to improve biologically significant
    regions in a multiple alignment created by one of
    the automated alignment programs.

28
GCG alignment editors
  • Alignments produced with PILEUP (or CLUSTAL) can
    be adjusted with LINEUP.
  • Nicely shaded printouts can be produced with
    PRETTYBOX
  • GCG's SeqLab X-Windows interface has a superb
    multiple sequence editor - the best editor
    of any kind.

30
Other editors
  • The MACAW and SeqVu programs for Macintosh and
    GeneDoc and DCSE for PCs are free and provide
    excellent editor functionality.
  • Many comprehensive molecular biology programs
    include multiple alignment functions
  • MacVector, OMIGA, Vector NTI, and
    GeneTool/PepTool all include a built-in version
    of CLUSTAL

31
SeqVu
32
Editors on the Web
  • Check out CINEMA (Colour INteractive Editor for
    Multiple Alignments)
  • It is an editor created completely in Java (old
    browsers beware)
  • It includes fully functional versions of
    CLUSTAL and BLAST, and a DotPlot module
  • http://www.bioinf.man.ac.uk/dbbrowser/CINEMA2.1/
