Sequence alignment with rearrangements - PowerPoint PPT Presentation

Provided by: csPu

1
Sequence alignment with rearrangements
  • Pankaj Kumar
  • Jitesh Kumar
  • Shouvik Som

2
Alignments
  • Global
  • One string is transformed into the other.
  • Less prone to reporting false homology.
  • Each letter of one sequence is constrained to
    align with only one letter of the other.
  • Examples: Needleman-Wunsch (1970), LAGAN (2003).
  • Local
  • All regions of similarity between the two
    strings are returned.
  • Higher false-positive rates; cannot produce an
    overall conservation map.
  • Examples: CHAOS (2002), BLASTZ (2003).
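
The global/local distinction can be sketched with a single dynamic-programming routine. This is a minimal illustration, not the code of any aligner named on the slide: the global mode follows the Needleman-Wunsch recurrence, the local mode is a Smith-Waterman-style variant swapped in for illustration, and the scoring values (match 1, mismatch -1, linear gap -1) are assumptions.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Score of the best global (default) or local alignment of a and b."""
    # First row: leading gaps cost nothing in local mode.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                # A local alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    # Global: both strings must be consumed entirely. Local: best cell anywhere.
    return best if local else prev[-1]


print(alignment_score("ACGT", "AGT"))
print(alignment_score("AAATGTCC", "GGCATGTCCAG", local=True))
```

The global score forces every letter of both strings into the alignment, while the local score reports only the best-matching region.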

3
Rearrangement Events: What are they?
  • DNA mutates through various rearrangement events
    during evolution. They are:
  • Translocation: a subsegment is removed and
    inserted at a different location, in the same
    orientation.
  • Inversion: a subsegment is removed from the
    sequence and reinserted at the same location, in
    the opposite orientation.
  • Duplication: a copy of a subsegment is inserted
    into the sequence; the original subsegment is
    unchanged.
  • Or a combination of the above.
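
The three events can be sketched as string operations. This is only an illustration: the function names and the 0-based, half-open index convention are assumptions, and an inversion is modelled as the reverse complement of the subsegment.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translocate(seq, start, end, dest):
    """Cut seq[start:end] and reinsert it at index dest of what remains."""
    segment = seq[start:end]
    rest = seq[:start] + seq[end:]
    return rest[:dest] + segment + rest[dest:]


def invert(seq, start, end):
    """Replace seq[start:end] in place with its reverse complement."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def duplicate(seq, start, end, dest):
    """Insert a copy of seq[start:end] at index dest; the original stays."""
    return seq[:dest] + seq[start:end] + seq[dest:]


genome = "GGGACTTCCC"
print(translocate(genome, 3, 6, 7))  # GGGTCCCACT
print(invert(genome, 3, 6))          # GGGAGTTCCC
print(duplicate(genome, 3, 6, 10))   # GGGACTTCCCACT
```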

4
Glocal Aligner: why is it needed?
  • Neither global nor local aligners handle
    rearrangement events satisfactorily.
  • Global aligners do not handle these events at
    all, because a global alignment must be
    monotonically increasing in both sequences.
  • Local aligners do not suggest how the sequences
    could have evolved from a common ancestor.
  • So a hybrid is needed: a glocal aligner capable
    of quickly aligning long genomic sequences.

5
Glocal Alignment
  • A glocal alignment between two sequences is a
    series of operations that transform one sequence
    into the other.
  • The set of operations includes insertions,
    deletions, point mutations, inversions,
    translocations, and duplications.
  • Each operation incurs a penalty, and the total
    edit distance is the sum of the penalties.
  • One such glocal alignment algorithm is SLAGAN
    (Shuffle-LAGAN).
  • (Michael Brudno, Sanket Malde, Alexander
    Poliakov, Chuong B. Do, Olivier Couronne,
    Inna Dubchak and Serafim Batzoglou)

6
The Shuffle-LAGAN algorithm
  • Consists of three distinct stages:
  • Generation of local alignments using the CHAOS
    tool.
  • Building the 1-monotonic conservation map.
  • Aligning consistent subsegments using the LAGAN
    global aligner.

7
Generation of Local Alignments
  • SLAGAN uses CHAOS (Brudno and Morgenstern, 2002;
    Brudno et al., 2003), a method that finds short
    matching words with degeneracy and chains them
    into local alignments.
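
The idea of finding short matching words can be sketched as follows. This is a deliberately simplified stand-in, not CHAOS itself: it finds exact word matches only (no degeneracy) and does not chain them, and the function name and 1-based reporting are assumptions.

```python
from collections import defaultdict


def shared_words(seq1, seq2, word_length):
    """Return 1-based (pos1, pos2) pairs where both sequences share a word."""
    # Index every word of seq1 by its start position.
    index = defaultdict(list)
    for i in range(len(seq1) - word_length + 1):
        index[seq1[i:i + word_length]].append(i)
    # Slide over seq2 and look each word up in the index.
    seeds = []
    for j in range(len(seq2) - word_length + 1):
        for i in index.get(seq2[j:j + word_length], ()):
            seeds.append((i + 1, j + 1))
    return seeds


query = "AAATGTCC"
target = "GGCATGTCCAGAAAATCCAA"  # first 20 bases of the deck's database sequence
print(shared_words(query, target, 5))  # [(3, 4), (4, 5)]
```

The two overlapping seeds lie on the same diagonal, which is the kind of evidence a chaining step would merge into one local alignment.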

8
Working with CHAOS
  • ./chaos sample.fasta dbase.fasta -co 4 -b -wl 5
  • -co: cutoff score; -b: search both strands; -wl:
    word length
  • sample.fasta
  • >sample1
  • AAATGTCC
  • dbase.fasta
  • >sample2
  • GGCATGTCCAGAAAATCCAAGTGCCTCTTCCTCTTGATCTTCTCCAACGA
    TGTCCAGA
  • AAATCCAAGTGCCTCATTCCTCTTGATCTTCTCCAGGCATGTCCAGAAAA
    TCCAAGTG
  • CCTCTTCCTCTCTGATCTTCTCCTCGGTTGGTCCAGAAAATCCAAGTGCC
    TCTTCCTC
  • TTGATCTTCTCCAGAAATGTCCAGAAAATCCAAGTAGCCTCTTCCTCTTG
    ATCGGCTC
  • CAGAAATGTCCAGAAAAATCCAAGTGCCTCTTCCTCTTGATCGGCTCCAT
    AAATGTCC
  • AGAAAATCCAACGTGCCTCTTCCTCTTGATCGGCTCCAGAAATGTCCAGA
    AATATCCA
  • AGTGCCTCTTCCTCTTGATCGGCTCCTTA

9
CHAOS Output
  • sample1 1 4 sample2 246 243 score 4100.000000 (-)
  • sample1 1 8 sample2 340 347 score 624.000000 (+)
  • sample1 1 7 sample2 19 25 score 411.000000 (+)
  • sample1 1 7 sample2 65 71 score 411.000000 (+)
  • sample1 1 7 sample2 112 118 score 411.000000 (+)
  • sample1 1 7 sample2 160 166 score 411.000000 (+)
  • sample1 1 8 sample2 189 196 score 1369.000000 (+)
  • sample1 1 8 sample2 236 243 score 1369.000000 (+)
  • sample1 1 7 sample2 254 260 score 411.000000 (+)
  • sample1 1 8 sample2 330 337 score 1369.000000 (+)
  • sample1 1 7 sample2 348 354 score 411.000000 (+)
  • sample1 1 4 sample2 59 62 score 364.000000 (+)
  • sample1 1 4 sample2 106 109 score 364.000000 (+)
  • sample1 1 4 sample2 154 157 score 364.000000 (+)
  • sample1 2 8 sample2 3 9 score 459.000000 (+)
  • sample1 2 8 sample2 49 55 score 542.000000 (+)
  • sample1 2 8 sample2 96 102 score 459.000000 (+)
  • sample1 2 8 sample2 300 306 score 317.000000 (+)
  • sample1 5 8 sample2 147 150 score 391.000000 (+)
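
Hit lines of this shape can be turned into records with a small parser. This is a sketch under assumptions: the record type and regular expression are illustrative, the pattern tolerates an optional `;` after each coordinate pair and an optional `=` after `score` because the transcript may have lost punctuation, and the strand is assumed to be written as `(+)` or `(-)`.

```python
import re
from typing import NamedTuple


class LocalAlignment(NamedTuple):
    start1: int
    end1: int
    start2: int
    end2: int
    score: float
    strand: str


# name start end [;] name start end [;] score [=] value (strand)
HIT = re.compile(
    r"^\S+\s+(\d+)\s+(\d+);?\s+\S+\s+(\d+)\s+(\d+);?\s+"
    r"score\s*=?\s*([\d.]+)\s*\(([+-])\)\s*$"
)


def parse_hits(text):
    """Parse hit lines into LocalAlignment records, skipping other lines."""
    hits = []
    for line in text.splitlines():
        m = HIT.match(line.strip())
        if m is None:
            continue
        s1, e1, s2, e2, score, strand = m.groups()
        hits.append(
            LocalAlignment(int(s1), int(e1), int(s2), int(e2), float(score), strand)
        )
    return hits


sample = """\
sample1 1 4 sample2 246 243 score 4100.000000 (-)
sample1 1 8 sample2 340 347 score 624.000000 (+)
sample1 5 8 sample2 147 150 score 391.000000 (+)
"""
hits = parse_hits(sample)
print(len(hits), hits[0].strand, hits[2].start2)
```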

10
Building the 1-monotonic map
  • What is a 1-monotonic chain?
  • A local alignment is represented as
  • L = (start1, end1, start2, end2, score, strand)
  • Consider two alignments L1 and L2. We call them
    1-monotonic if
  • L2.start1 > L1.end1.
  • An ordered list of local alignments L1, ..., Lk is
    1-monotonic if, for any pair of local alignments
    Li and Lj with i < j, Li and Lj are 1-monotonic.
    Intuitively, a list of local alignments is
    1-monotonic if it is strictly increasing in the
    first sequence.
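
The definition translates directly into a pairwise check. A minimal sketch, with the record type and function names chosen for illustration:

```python
from typing import NamedTuple


class LocalAlignment(NamedTuple):
    start1: int
    end1: int
    start2: int
    end2: int
    score: float
    strand: str


def is_1_monotonic_pair(l1, l2):
    """True if l2 starts after l1 ends in the first sequence."""
    return l2.start1 > l1.end1


def is_1_monotonic(chain):
    """True if every earlier alignment is 1-monotonic with every later one."""
    return all(
        is_1_monotonic_pair(chain[i], chain[j])
        for i in range(len(chain))
        for j in range(i + 1, len(chain))
    )


a = LocalAlignment(1, 4, 246, 243, 4100.0, "-")
b = LocalAlignment(5, 8, 147, 150, 391.0, "+")
c = LocalAlignment(2, 8, 3, 9, 459.0, "+")
print(is_1_monotonic([a, b]))  # True: 5 > 4
print(is_1_monotonic([a, c]))  # False: 2 is not > 4
```

Only coordinates in the first sequence matter here, so alignments on different strands or out of order in the second sequence can still form a 1-monotonic chain.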

11
Implementation of 1-monotonic chaining
  • Parse the CHAOS output file.
  • Create a maximum-score chain using dynamic
    programming.

12
Structure used for a local alignment
  • Linked list of local alignments after parsing
    the CHAOS output file
  • typedef struct frag {
  •   int startx;
  •   int endx;
  •   int starty;
  •   int endy;
  •   float score;
  •   float chainscore;
  •   char strand;
  •   struct frag *chained_with;
  •   struct frag *chained_with_front;
  •   struct frag *forward;
  •   struct frag *backward;
  • } Anchor;
  • [Figure: fragments frag1 to frag4 joined in a linked list]

13
Dynamic programming pseudo code
  • for every v
  •   v->chainscore = 0
  •   traverse every node w before v in the
      doubly linked list
  •     if (chainable(w, v) && cal_chainscore(w, v) > v->chainscore)
  •       v->chained_with = w
  •       v->chainscore = cal_chainscore(w, v)
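
The pseudocode can be sketched as a runnable quadratic-time chaining routine. Several things here are assumptions rather than the authors' code: `chainable` is taken to mean 1-monotonic, `cal_chainscore` simply adds scores with no gap penalty, each fragment's chain score starts at its own score (so a chain of one fragment keeps its score), and a sorted Python list stands in for the doubly linked list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frag:
    startx: int
    endx: int
    starty: int
    endy: int
    score: float
    strand: str
    chainscore: float = 0.0
    chained_with: Optional["Frag"] = None


def chainable(w, v):
    # Assumption: v may follow w if it starts after w ends in sequence 1.
    return v.startx > w.endx


def cal_chainscore(w, v):
    # Assumption: plain sum of scores; a real scorer would subtract a gap penalty.
    return w.chainscore + v.score


def best_chain(frags):
    """Return the maximum-score chain, in order along the first sequence."""
    frags = sorted(frags, key=lambda f: f.startx)
    for i, v in enumerate(frags):
        v.chainscore = v.score
        v.chained_with = None
        for w in frags[:i]:  # every node before v
            if chainable(w, v) and cal_chainscore(w, v) > v.chainscore:
                v.chained_with = w
                v.chainscore = cal_chainscore(w, v)
    if not frags:
        return []
    # Backtrack from the best-scoring end point.
    node = max(frags, key=lambda f: f.chainscore)
    chain = []
    while node is not None:
        chain.append(node)
        node = node.chained_with
    return chain[::-1]


hits = [
    Frag(1, 4, 246, 243, 4100.0, "-"),
    Frag(1, 8, 340, 347, 624.0, "+"),
    Frag(2, 8, 3, 9, 459.0, "+"),
    Frag(5, 8, 147, 150, 391.0, "+"),
]
chain = best_chain(hits)
print([(f.startx, f.endx, f.starty, f.endy, f.strand) for f in chain])
```

For these four hits from the first CHAOS run, the chain consists of the fragments (1, 4, 246, 243, -) and (5, 8, 147, 150, +), the same two fragments listed on the deck's final slide.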

14
Consistent Subsegments
  • What are consistent subsegments?
  • Two local alignments L1 and L2 are consistent if:
  • They are 1-monotonic.
  • Both are on the same strand.
  • L2.start2 > L1.end2 on the +ve strand, or
    L2.start2 < L1.end2 on the -ve strand.

15
Implementation
  • Traverse the 1-monotonic chain to find consistent
    subsegments.
  • For every consistent subsegment, create two files,
    seq1.fasta and seq2.fasta, containing the
    corresponding substrings of the sequences to be
    aligned.
  • Call LAGAN to align the two files.
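
The traversal step can be sketched as follows. This is an illustration under assumptions: the consistency test reads the third condition as "increasing in the second sequence on the + strand, decreasing on the - strand", the helper names are hypothetical, and writing the FASTA files and invoking LAGAN are left out.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start1: int
    end1: int
    start2: int
    end2: int
    strand: str


def consistent(l1, l2):
    """1-monotonic, same strand, and ordered in sequence 2 as the strand requires."""
    if l2.start1 <= l1.end1 or l1.strand != l2.strand:
        return False
    if l1.strand == "+":
        return l2.start2 > l1.end2
    return l2.start2 < l1.end2


def split_consistent(chain):
    """Cut a 1-monotonic chain into maximal runs of consistent neighbours."""
    groups = []
    for hit in chain:
        if groups and consistent(groups[-1][-1], hit):
            groups[-1].append(hit)
        else:
            groups.append([hit])
    return groups


def bounds(group):
    """Coordinate range covered in each sequence: the substrings to extract."""
    xs = [c for h in group for c in (h.start1, h.end1)]
    ys = [c for h in group for c in (h.start2, h.end2)]
    return (min(xs), max(xs)), (min(ys), max(ys))


# Two hits taken from the deck's second CHAOS run: one reversed, one direct.
chain = [Hit(1, 17, 25, 9, "-"), Hit(28, 44, 217, 233, "+")]
groups = split_consistent(chain)
for group in groups:
    print(bounds(group))
```

Because the two hits lie on different strands, they fall into separate subsegments, each of which would be extracted and aligned on its own.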

16
Test Case (one direct and one reversed)
  • GGCATGTCCAGAAAATCCAAGTGCCTCTTCCTCTTGATCTTCTCCAACGA
    TGTCCAGA
  • AAATCCAAGTGCCTCATTCCTCTTGATCTTCTCCAGGCATGTCCAGAAAA
    TCCAAGTG
  • CCTCTTCCTCTCTGATCTTCTCCTCGGTTGGTCCAGAAAATCCAAGTGCC
    TCTTCCTC
  • TTGATCTTCTCCAGAAATGTCCAGAAAATCCAAGTAGCCTCTTCCTCTTG
    ATCGGCTC
  • CAGAAATGTCCAGAAAAATCCAAGTGCCTCTTCCTCTTGATCGGCTCCAT
    AAATGTCC
  • AGAAAATCCAACGTGCCTCTTCCTCTTGATCGGCTCCAGAAATGTCCAGA
    AATATCCA
  • AGTGCCTCTTCCTCTTGATCGGCTCCTTA
  • GGCACTTGGATTTTCTTCTCTCTTTAAACCTCTTGATCGGCTCC

17
CHAOS output
  • ./chaos sample1.fasta sample2.fasta -co 5 -wl 10
    -b
  • -co: cutoff score; -wl: word length; -b: search
    both strands
  • sample1 28 44 sample2 217 233 score 5315.000000 (+)
  • sample1 28 44 sample2 264 280 score 5315.000000 (+)
  • sample1 28 44 sample2 311 327 score 5315.000000 (+)
  • sample1 28 44 sample2 358 374 score 5315.000000 (+)
  • sample1 28 39 sample2 29 40 score 2177.000000 (+)
  • sample1 28 39 sample2 76 87 score 2382.000000 (+)
  • sample1 28 39 sample2 170 181 score 2177.000000 (+)
  • sample1 1 14 sample2 354 341 score 1114.000000 (-)
  • sample1 1 15 sample2 260 246 score 2494.000000 (-)
  • sample1 1 17 sample2 166 150 score 5157.000000 (-)
  • sample1 1 17 sample2 118 102 score 5157.000000 (-)
  • sample1 1 17 sample2 71 55 score 5157.000000 (-)
  • sample1 1 17 sample2 25 9 score 5157.000000 (-)
  • sample1 3 17 sample2 210 196 score 2054.000000 (-)
  • sample1 5 17 sample2 302 290 score 3042.000000 (-)

18
Result 1
  • seq1 GGCACTTGGATTTTCT-------------------------
    -------------------
  • seq2 GGCACTTGGATTTTCTGGACCAACCGAGGAGAAGATCAGAG
    AGGAAGAGGCACTTGGATT
  • 10 20 30 40
    50 60
  • seq1 -----------------------------------------
    -------------------
  • seq2 TTCTGGACATGCCTGGAGAAGATCAAGAGGAATGAGGCACT
    TGGATTTTCTGGACATCGT
  • 70 80 90 100
    110 120
  • seq1 -----------------------------------------
    -----------------------------------T
  • seq2 TGGAGAAGATCAAGAGGAAGAGGCACTTGGATTTTCTGGAC
    ATGCC
  • 130 140 150 160

19
Result 2
  • seq1 CTCTCTT----------------------------------
    -------------------
  • seq2 TCTTCCTCTTGATCTTCTCCAGAAATGTCCAGAAAATCCAA
    GTAGCCTCTTCCTCTTGAT
  • 10 20 30 40
    50 60
  • seq1 -----------------------------------------
    -------------------
  • seq2 CGGCTCCAGAAATGTCCAGAAAAATCCAAGTGCCTCTTCCT
    CTTGATCGGCTCCATAAAT
  • 70 80 90 100
    110 120
  • seq1 -----------------------------------------
    -------------------
  • seq2 GTCCAGAAAATCCAACGTGCCTCTTCCTCTTGATCGGCTCC
    AGAAATGTCCAGAAATATC
  • 130 140 150 160
    170 180
  • 10 20
  • seq1 -------------TAAACCTCTTGATCGGCTCC---

20
Cost Function
  • The gap penalty charged for all
    transitions consists of 3 parts:
  • Gap open: constant
  • Gap continue penalty:
    ((L1.end1 - L1.end2) - (L2.start1 - L2.start2)) * constant
  • Distance alignment penalty:
    min(L1.end1 - L2.start1, L1.end2 - L2.start2)
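
The three parts can be combined in a short function. This is a sketch under loudly stated assumptions: the operators missing from the slide's formulas are read as subtractions, absolute values are added so the penalty cannot go negative, and the constant values (10.0 and 0.5) are hypothetical placeholders rather than values from the source.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start1: int
    end1: int
    start2: int
    end2: int


def gap_penalty(l1, l2, gap_open=10.0, constant=0.5):
    """Penalty for moving from alignment l1 to alignment l2 in a chain."""
    # Gap continue: how far l2 is shifted off l1's diagonal (assumed absolute).
    diagonal_shift = abs((l1.end1 - l1.end2) - (l2.start1 - l2.start2))
    # Distance: the smaller of the two gaps between the alignments.
    distance = min(abs(l1.end1 - l2.start1), abs(l1.end2 - l2.start2))
    return gap_open + constant * diagonal_shift + distance


first = Hit(1, 4, 59, 62)
second = Hit(5, 8, 147, 150)
print(gap_penalty(first, second))  # 10 + 0.5 * 84 + 1 = 53.0
```

Two alignments on the same diagonal that touch end to start pay only the gap-open constant under this reading.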

21
  • startx 1 endx 4 starty 246 endy 243 strand -
  • startx 5 endx 8 starty 147 endy 150 strand +