Multiple Alignment - PowerPoint PPT Presentation

Title: Multiple Alignment


1
Multiple Alignment
  • Stuart M. Brown
  • NYU School of Medicine

3
Pairwise Alignment
  • The alignment of two sequences (DNA or protein)
    is a relatively straightforward computational
    problem.
  • The best solution seems to be an approach called
    Dynamic Programming.

5
Dynamic Programming
  • Dynamic Programming is a general programming
    technique.
  • It is applicable when a large search space can be
    structured into a succession of stages, such
    that
  • the initial stage contains trivial solutions to
    sub-problems
  • each partial solution in a later stage can be
    calculated by recurring on a fixed number of
    partial solutions in an earlier stage
  • the final stage contains the overall solution
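
For pairwise alignment, the three stages above map onto the cells of a scoring matrix. The sketch below is a minimal illustration only: the function name and the match, mismatch and gap values are assumptions chosen for the example, not values from the slides.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of a and b.

    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Initial stage: trivial sub-problems (a prefix aligned against nothing).
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Later stages: each cell depends on a fixed number (three) of earlier cells.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap              # gap in b
            left = score[i][j - 1] + gap            # gap in a
            score[i][j] = max(diagonal, up, left)

    # Final stage: the bottom-right cell holds the overall solution.
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Recovering the alignment itself, rather than only its score, would also require storing which of the three cells gave each maximum and tracing back from the final cell.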

6
Multiple Alignments
  • Making an optimal alignment between two sequences
    is computationally straightforward, but aligning
    a large number of sequences using the same method
    is almost impossible.
  • The problem increases exponentially with the
    number of sequences involved, so it becomes
    computationally expensive (and inefficient) for
    large numbers of sequences.

7
Longer Sequences
  • What happens to the number of cells in the matrix
    when we add another base to one sequence?
  • How about to both?
  • # of cells = L1 x L2, or L^2 if we use 2
    sequences of the same length.
  • So the amount of computing grows with the square
    of sequence length: bad but not terrible, because
    the compute time for each cell remains constant.

8
Align Three Sequences by Dynamic programming
Georg Fullen, VSNS Biocomputing, Univ. Munster
So how many cells (that contain values that must
be computed) do we add for each additional
sequence? It's a power function! For N
sequences of length L, # of cells = 2^N x L^N.
This is very bad for computing alignments of a
lot of sequences!
If the calculation takes 1 nanosecond per cell,
then for 6 sequences of length 100, we'll have a
running time of 2^6 x 100^6 x 10^-9 seconds
(64,000 seconds). Just add 2 more sequences, and
the running time is 2^8 x 100^8 x 10^-9 = 2.6 x 10^9
seconds (about 81 years).
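
The arithmetic on this slide can be reproduced with a few lines of code. This is a sketch under the slide's own assumptions (2^N x L^N units of work, 1 nanosecond each); the function names are made up for the example.

```python
def pairwise_cells(length_1, length_2):
    """Number of cells in the matrix for two sequences."""
    return length_1 * length_2


def estimated_seconds(n_sequences, length, ns_per_cell=1):
    """Estimated running time using the 2^N x L^N figure from the slide."""
    work = (2 ** n_sequences) * (length ** n_sequences)
    return work * ns_per_cell / 1e9  # nanoseconds to seconds


if __name__ == "__main__":
    print("cells for two sequences of length 100:", pairwise_cells(100, 100))
    for n in range(2, 9):
        seconds = estimated_seconds(n, 100)
        print(f"N={n}: {seconds:.3g} seconds ({seconds / 86400:.3g} days)")
```

With N = 6 and L = 100 the function returns 64,000 seconds, and with N = 8 it returns 2.56 x 10^9 seconds.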
9
Global vs. Local Multiple Alignments
  • Global alignment algorithms start at the
    beginning of two sequences and add gaps to each
    until the end of one is reached.
  • Local alignment algorithms find the region (or
    regions) of highest similarity between two
    sequences and build the alignment outward from
    there.
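
For two sequences, the local case can be sketched with the same kind of matrix as global dynamic programming, with one change: a cell score is never allowed to drop below zero, so an alignment can start anywhere. The sketch below follows the Smith-Waterman scheme; the function name and scoring values are assumptions for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, (row, col) where the best region ends).

    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_end = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,                            # start a fresh local alignment here
                score[i - 1][j - 1] + pair,   # extend along the diagonal
                score[i - 1][j] + gap,        # gap in b
                score[i][j - 1] + gap,        # gap in a
            )
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)

    return best, best_end


print(local_alignment_score("TTTGATTACATTT", "CCCGATTACACCC"))
```

In the usage line the two sequences share only the central GATTACA, so the best-scoring region ends right after it in both sequences, regardless of the unrelated flanks.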

10
Optimal Alignment
  • For a given group of sequences, there is no
    single "correct" alignment, only an alignment
    that is "optimal" according to some set of
    calculations.
  • Determining what alignment is best for a given
    set of sequences is really up to the judgement of
    the investigator.

11
Progressive Pairwise Methods
  • Most of the available multiple alignment programs
    use some sort of incremental or progressive
    method that makes pairwise alignments, averages
    them into a consensus (actually a profile), then
    adds new sequences one at a time to the aligned
    set.
  • This is an approximate method!
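
A profile, as mentioned above, can be thought of as a table of residue frequencies, one entry per alignment column. The following is a minimal sketch of that idea, not the method of any particular program: the function names are hypothetical, gaps ("-") are counted as an ordinary symbol, and the match and mismatch scores are assumptions.

```python
from collections import Counter


def build_profile(aligned):
    """Return one dict of symbol frequencies per alignment column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({sym: n / len(column) for sym, n in counts.items()})
    return profile


def score_against_column(residue, column_freqs, match=1.0, mismatch=-1.0):
    """Frequency-weighted score of one new residue against a profile column."""
    return sum(
        freq * (match if sym == residue else mismatch)
        for sym, freq in column_freqs.items()
    )


profile = build_profile(["ACG-", "ACGT", "AGGT"])
print(profile[1])
print(score_against_column("C", profile[1]))
```

A progressive method would use a column score of this kind in place of the simple match/mismatch score when it aligns each new sequence to the set aligned so far.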

12
CLUSTALW
  • CLUSTAL is the most popular multiple alignment
    program
  • Gap penalties can be adjusted based on specific
    amino acid residues, regions of hydrophobicity,
    proximity to other gaps, or secondary structure.
  • It can re-align just selected sequences or
    selected regions in an existing alignment
  • It can compute phylogenetic trees from a set of
    aligned sequences.
  • Unix command line program
  • Website: http://www.ebi.ac.uk/Tools/clustalw2/index.html
  • There are also Mac and PC versions with a nice
    graphical interface (CLUSTALX).

13
http://www.ebi.ac.uk/Tools/clustalw2/index.html
CLUSTALW2 at the EBI website
14
Other Multiple Alignment Tools
  • MUSCLE: http://www.ebi.ac.uk/Tools/muscle/index.html
  • T-COFFEE: http://www.ebi.ac.uk/Tools/t-coffee/
  • MSA

15
Editing Multiple Alignments
  • There are a variety of tools that can be used to
    modify and display a multiple alignment.
  • These programs can be very useful in formatting
    and annotating an alignment for publication.
  • An editor can also be used to make modifications
    by hand to improve biologically significant
    regions in a multiple alignment created by an
    alignment program.

16
Alignment editors
  • The MACAW and SeqVu programs for Macintosh and
    GeneDoc and DCSE for PCs are free and provide
    excellent editor functionality.
  • Many comprehensive molecular biology programs
    include multiple alignment functions
  • Sequencher, MacVector, DS Gene, Vector NTI, all
    include a built-in version of CLUSTAL

17
SeqVu
18
JalView
  • Install on your machine
  • or run as a Java WebStart application

19
  • Check out CINEMA (Colour INteractive Editor for
    Multiple Alignments)
  • It is an editor created completely in Java (old
    browsers beware)
  • It includes a fully functional version of
    CLUSTAL, BLAST, and a DotPlot module

http://www.bioinf.man.ac.uk/dbbrowser/CINEMA2.1/
21
Analysis of Alignments
  • Once you have a multiple alignment, what can you
    do with it?
  • 1) Identify regions of similarity and difference
  • Conserved regions may be functionally important,
    and/or sites for inclusive (cross species) primer
    design
  • Variable regions may be functionally important,
    and/or sites for gene/allele-specific primer
    design
  • 2) Create a sequence logo
  • 3) Build a Phylogenetic Tree (next week)

22
Format a Multiple Alignment
  • The concept of a consensus sequence is implied
    by any multiple alignment. There can be various
    rules for building the consensus: simple majority
    rules, plurality by a specific %, etc.
  • The alignment may look nicer by showing how each
    letter matches the consensus: highlight the
    differences.
  • PLOTSIMILARITY (a graph of overall similarity
    across the alignment): EMBOSS plotcon
  • Show match to consensus: showalign
  • Shade by similarity: prettyplot/Boxshade
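
A consensus rule of the kind described above can be sketched in a few lines. This example is an illustration only and does not reproduce the behaviour of plotcon, showalign or prettyplot: the function names are hypothetical, the rule is "most common residue, if it occurs in more than a given fraction of the sequences", and gaps are assumed to be written as "-".

```python
from collections import Counter


def consensus(aligned, min_fraction=0.5, gap="-", unknown="-"):
    """Most common residue per column if its share exceeds min_fraction."""
    n_sequences = len(aligned)
    result = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != gap)
        symbol = unknown
        if counts:
            residue, count = counts.most_common(1)[0]
            if count / n_sequences > min_fraction:
                symbol = residue
        result.append(symbol)
    return "".join(result)


def show_differences(aligned, cons, gap="-", same="."):
    """Replace residues that match the consensus with a dot; keep the rest."""
    return [
        "".join(
            same if c != gap and c.upper() == k else c
            for c, k in zip(seq, cons)
        )
        for seq in aligned
    ]


alignment = ["ACGT", "ACGA", "AC-A", "TCGA"]
cons = consensus(alignment)
print(cons)
for line in show_differences(alignment, cons):
    print(line)
```

Setting min_fraction to 0.5 gives a simple majority rule; lower values give a plurality rule, in which case ties between equally common residues are broken arbitrarily by this sketch.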

24
Plurality: 2.00  Threshold: 4  AveWeight 0.55  AveMatch 2.91  AvMisMatch -2.00
PRETTY of: @pretty.list  October 7, 1998 10:35  ..

           1                                                   50
fa10.ugly  .......... .......... .......... ..TTttGESA D.PvtTtVE.
fa12.ugly  .......... .......... .......... ..TTatGESA D.PvtTtVE.
fo1k.ugly  .......... .......... .......... ..TTsaGESA D.PvtTtVE.
e.ugly     Gvenae.kgv tEnTna.Tad fvaqpvyLPe .nqT...... kv.Affynrs
p1m.ugly   GlgqmlEsmI .dnTvreTvg AatsrdaLPn teasGPthSk eiPALTAVET
p1s.ugly   GlgqmlEsmI .dnTvreTvg AatsrdaLPn teasGPahSk eiPALTAVET
p2s.ugly   GigdmiEgav .Egitknalv pptstnsLPg hkpsGPahSk eiPALTAVET
p3s.ugly   Giedliseva .qgal..Tls lpkqqdsLPd tkasGPahSk evPALTAVET
cb3.ugly   ...gpvEdaI .......T.. Aaigr..vad tvgTGPtnSe aiPALTAaET
r14.ugly   GlgdelEevI vEkT.kqTv. Asi....... ..ssGPkhtq kvPiLTAnET
r2.ugly    ...npvEnyI dEvlnevlv. .......vPn inssnPttSn saPALdAaET
Consensus  G-----E--I -E-T---T-- A------LP- --TTGPGESA D-PALTAVET

/////////////////////////////////////////////////////////////////

           301                                              349
fa10.ugly  aElyCPRPll AIkvtsqdRy KqKI.iAPa. ..KQll.... .........
fa12.ugly  aElyCPRPll AIevssqdRh KqKI.iAPg. ..KQll.... .........
fo1k.ugly  aEtyCPRPll AIhpt.eaRh KqKI.vAPv. ..KQTl.... .........
e.ugly     krvfCPRPtv ffPwpTsG.D Kidmtpragv lmlespnald isrty....
p1m.ugly   irvWCPRPPR AlaYygpGvD ykdgtltPls tkdlTTy... .........
p1s.ugly   irvWCPRPPR AvaYygpGvD ykdgtltPls tkdlTTy... .........
p2s.ugly   VrvWCPRPPR AvPYfgpGvD ykdg.ltPlp ekglTTy... .........
p3s.ugly   VrvWCPRPPR AvPYygpGvD yrn.nldPls ekglTTy... .........
cb3.ugly   VkaWiPRPPR lcqYekakn. vnfrssgvtt trqsiTtmtn tgaiwtti.
r14.ugly   VEaWiPRaPR AlPY.Tsigr tny..pknte pvikkrk.gd i.ksy....
r2.ugly    VkaWCPRPPR AleY.Trahr tnfkiedrsi qtaivTrpii ttagpsdmy
Consensus  VE-WCPRPPR AIPY-T-GRD K-KI--AP-- --KQTT---- ---------
25
Boxshade
  • Shade each letter of the alignment based on its
    match to the consensus
  • Highlights conserved regions
  • Much more informative for protein alignments
    (shades of grey for similar amino acids)
http://mobyle.pasteur.fr/cgi-bin/MobylePortal/portal.py?form=boxshade
http://www.ch.embnet.org/software/BOX_form.html
28
Sequence Logos
http://weblogo.berkeley.edu/logo.cgi
http://weblogo.threeplusone.com/create.cgi
http://genome.tugraz.at/Logo/
T. D. Schneider and R. M. Stephens. Sequence
logos: a new way to display consensus sequences.
Nucleic Acids Research, Vol. 18, No. 20, p.
6097-6100.
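
In a sequence logo, the height of each column is its information content in bits, and each letter takes a share of that height in proportion to its frequency. The sketch below computes those heights from an alignment. It is a simplified illustration: it omits the small-sample correction used in the published method, ignores gaps (assumed to be written as "-"), and uses hypothetical function names. The alphabet size is 4 for DNA and would be 20 for protein.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4, gap="-"):
    """Information content of one column in bits: log2(alphabet) - entropy."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (n / total) * math.log2(n / total)
        for n in Counter(residues).values()
    )
    return math.log2(alphabet_size) - entropy


def logo_heights(aligned, alphabet_size=4, gap="-"):
    """Per column, the height in bits of each letter in the logo stack."""
    heights = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        info = column_information(column, alphabet_size, gap)
        counts = Counter(residues)
        heights.append(
            {res: info * n / len(residues) for res, n in counts.items()}
        )
    return heights


for stack in logo_heights(["AA", "AC"]):
    print(stack)
```

A fully conserved DNA column reaches the maximum of 2 bits, while a column with all four bases in equal proportion has a height of 0.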
29
Building on Alignments
  • Multiple Alignments are the starting point for
    calculating phylogenetic trees
  • Motifs and Profiles are calculated from multiple
    alignments