Algorithms for Multiple Alignment - PowerPoint PPT Presentation

About This Presentation
Title:

Algorithms for Multiple Alignment

Description:

Christian E. Loza and Joseph Sheinberg. University of North Texas - 2005. The Task ... Note that in the case of the biased algorithm, we proceed in the same way, but ... – PowerPoint PPT presentation

Learn more at: http://www.cse.unt.edu

Transcript and Presenter's Notes


1
Algorithms for Multiple Alignment
  • Christian E. Loza and Joseph Sheinberg
  • University of North Texas - 2005

2
The Task
  • Given N sequences, provide an optimal sequence
    alignment

[Figure: example alignment of several short nucleotide sequences; the sequence layout is not recoverable from the transcript]
3
Brute Force approach
  • For 3 sequences ( k = 3 )

O( nk! )
4
Dynamic Programming
O( 2^k n^k )
5
Pairwise algorithm (Bains)
[Figure: pairwise alignment example; the sequence layout is not recoverable from the transcript]
O( k n^2 )
  • Problems: initial errors are chaotic
6
Clustal (Higgins and Sharp)
  • Alignments are done in four steps.
  • Pairwise alignments
  • Similarity matrix
  • Cluster similar sequences based on their
    similarity score
  • Progressive multiple alignment

O( k n^2 ) or faster, depending on the pairwise
algorithm
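
The first two steps above (pairwise alignments, then a similarity matrix) can be sketched as follows. This is a minimal illustration, not Clustal's actual implementation: it scores each pair with a plain Needleman-Wunsch global alignment, and the function names and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score of a and b.

    The scoring values are illustrative assumptions, not Clustal's.
    Only two rows of the dynamic-programming table are kept.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs):
    """Score every pair of sequences (hypothetical helper)."""
    k = len(seqs)
    return [[global_score(seqs[i], seqs[j]) for j in range(k)] for i in range(k)]


# Made-up example sequences, not taken from the slides.
matrix = similarity_matrix(["ACGT", "AGT", "ACGA"])
```

Each call to `global_score` fills an n-by-n table, so a full matrix over k sequences makes on the order of k^2 such calls; the cost of this step depends on which pairwise method is used.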
7
Discrete Alignment (Loza, Sheinberg)
  • Discrete alignment
  • Calculate window size
  • Calculation of relative frequencies
  • Alignment and consensus

8
1. Window Size
  • Assuming uniformly random symbols, a window of w
    symbols is expected to appear about once in a
    sequence of length n when
  • 4^w = n, so w = (0.5) lg n
  • Where 4 is the number of symbols, in this case,
    ACTG.
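
As a rough sketch of this calculation, the code below evaluates w = (0.5) lg n and also picks an integer window size. The slides do not say how w is rounded, so rounding up to the smallest w with 4^w >= n (and using at least 1) is an assumption, as are the function names.

```python
import math


def exact_window(n):
    """Real-valued solution of 4**w = n, i.e. w = 0.5 * log2(n)."""
    return 0.5 * math.log2(n)


def window_size(n, alphabet_size=4):
    """Smallest integer w >= 1 with alphabet_size**w >= n.

    Rounding up is an assumption; the slides only give the formula.
    Integer arithmetic avoids floating-point rounding issues.
    """
    if n < 1:
        raise ValueError("sequence length must be positive")
    w = 1
    while alphabet_size ** w < n:
        w += 1
    return w
```

For example, a sequence of length 1024 gives `exact_window(1024) == 5.0`, since 4^5 = 1024.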

9
2. Calculation of frequencies
  • We slide a window of the chosen size along each
    sequence and store the occurrences of the
    different subsequences.
  • For example, ACTCGTGCGT with a window size of
    three returns
  • ACT, CTC, TCG, CGT(2), GTG, TGC, GCG
  • We store not only the frequencies, but also the
    positions to be used in the next step.
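
The counting step can be sketched with a dictionary that maps each window to the list of positions where it starts, so both the frequency and the positions are available. The function name `window_positions` is hypothetical; the input is the example sequence from the slide.

```python
from collections import defaultdict


def window_positions(seq, w):
    """Map each length-w subsequence of seq to its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - w + 1):
        positions[seq[i:i + w]].append(i)
    return dict(positions)


table = window_positions("ACTCGTGCGT", 3)
# Frequencies follow from the number of stored positions.
counts = {window: len(starts) for window, starts in table.items()}
```

Here `table["CGT"]` holds two start positions, which matches the CGT(2) entry in the slide's list.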

10
2. (Biased Option)
  • As before, we slide a window of the chosen size
    along each sequence and store the occurrences of
    the different subsequences.
  • For example, ACTCGTGCGT with a window size of
    three returns
  • ACT, CTC, TCG, CGT(2), GTG, TGC, GCG
  • We do this for all of the sequences except the
    one we want to be biased to.
  • In the Biased Option, we store just the
    frequencies relative to one of the sequences.
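
One possible reading of the biased option is sketched below: windows are counted in every sequence except the reference, and only windows that also occur in the reference are kept. The slides do not spell out the details, so this interpretation, the function names, and the two extra sequences are all assumptions.

```python
from collections import Counter


def windows(seq, w):
    """All length-w subsequences of seq, in order."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def biased_frequencies(reference, others, w):
    """Count windows of the other sequences that occur in the reference.

    Assumed interpretation of "frequencies relative to one of the
    sequences"; not confirmed by the slides.
    """
    ref_windows = set(windows(reference, w))
    counts = Counter()
    for seq in others:
        for win in windows(seq, w):
            if win in ref_windows:
                counts[win] += 1
    return counts


# The reference is the slide's example; the other two are made up.
freq = biased_frequencies("ACTCGTGCGT", ["CGTACT", "TTCGTG"], 3)
```

Windows such as GTA, which never occur in the reference, are simply dropped, so the table stays focused on the sequence the alignment is biased to.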

11
3. Final alignment
  • We use the same clustering algorithm as Higgins
    and Sharp to align the resulting subsequences
    hierarchically. We cluster them and do an
    incremental global alignment.
  • Note that in the case of the biased algorithm, we
    proceed in the same way, but the result is going
    to be biased to one of the sequences.
  • This is a greedy approach
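
The greedy, hierarchical order of merging can be illustrated with a small sketch. It repeatedly joins the two clusters with the highest average similarity (average linkage), which is an assumed stand-in for the clustering used in the slides; the function name and the similarity values are made up for the example.

```python
def greedy_merge_order(similarity):
    """Return the order in which clusters are merged.

    similarity is a symmetric matrix of pairwise scores. Average
    linkage is an assumption, not necessarily the slides' method.
    """
    clusters = [(i,) for i in range(len(similarity))]
    merges = []

    def average(c1, c2):
        total = sum(similarity[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        # Greedy choice: always merge the currently most similar pair.
        a, b = max(pairs, key=lambda p: average(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[a], clusters[b]))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges


# Made-up similarity scores for three sequences.
order = greedy_merge_order([[4, 2, 3], [2, 3, 0], [3, 0, 4]])
```

With these scores, sequences 0 and 2 are joined first and sequence 1 is added last; each merge would correspond to one incremental alignment step.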

12
3. Final alignment
  • The final alignment resembles the pairwise
    alignment, with the difference that it is between
    one sequence and the rest of the sequences taken
    globally.
  • Therefore, the time complexity is the same as in
    pairwise alignment, O( k n^2 ).

13
Time Complexity
  • This approximation algorithm has a time complexity
    of O( k n^2 ).
  • The biased algorithm has a time complexity of
    O( k^2 n^2 ).
  • It is less vulnerable to initial bad choices.
  • The biased alignment shows us how we can align
    all of the sequences considering one of the
    sequences more important than the others.

14
Questions?
  • Thank you