Improving MSA - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Improving MSA
  • Bill Zeller
  • Trinity College

April 11, 2005

2
Presenting
  • Improving the Practical Space and Time Efficiency
    of the Shortest-Paths Approach to Sum-of-Pairs
    Multiple Sequence Alignment
  • Sandeep K. Gupta
  • John D. Kececioglu
  • Alejandro A. Schäffer

3
Background
  • Tries
  • Name comes from the word "retrieval"
  • Allows for fast key lookup
  • Requires less space than BST
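The fast key lookup mentioned above can be illustrated with a minimal trie sketch. The class and method names here are illustrative assumptions, not taken from the presentation. Lookup walks one node per character, so its cost depends on the key length rather than on the number of stored keys.

```python
class TrieNode:
    """One trie node; children are keyed by a single character."""

    def __init__(self):
        self.children = {}
        self.is_key = False  # True if a stored key ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key):
        node = self.root
        for ch in key:
            # Shared prefixes reuse the same nodes.
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def contains(self, key):
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_key
```

Keys that share a prefix share the nodes for that prefix, which is where the space saving can come from.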

4
Background (cont.)
  • Dijkstra's Algorithm
  • Finds shortest path from source vertex to every
    other vertex in a graph
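As a reminder of how the algorithm works, here is a minimal sketch using a binary heap as the priority queue. The graph representation (a dict of adjacency lists) is an assumption made for this example, and edge weights are assumed to be nonnegative.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source.

    graph: dict mapping vertex -> list of (neighbor, weight) pairs,
    with nonnegative weights (assumed representation).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry; v was already settled with a smaller d
        for w, weight in graph.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist
```

Only vertices reachable from the source appear in the returned dictionary.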

6
MSA Definitions
  • S1, ..., SK: input sequences
  • Σ is the input alphabet
  • Resulting alignment is a rectangular char array
    which satisfies the following
  • There are exactly K rows
  • Row i is precisely the string Si if dashes are
    ignored
  • --VLSPADKTNVKAAWGKVGAHAGEYGAEALE-
  • -VHLTPEEKSAVTALWGKVNVD--EVGGEALGR
  • --GLSDGEWQLVLNVWGKVEADIPGHGQEVLI-
  • MKFFAVLALCIVGAIASPLTADEASLVQSS---
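The two conditions listed on this slide can be checked mechanically. The sketch below is only an illustration; the function name is made up for this example, and it tests just the conditions stated above (rectangular array, K rows, each row equal to its sequence once dashes are removed).

```python
def is_alignment(rows, sequences, gap="-"):
    """Check the slide's conditions for rows to align the given sequences."""
    if len(rows) != len(sequences):
        return False  # must have exactly K rows
    if len({len(r) for r in rows}) > 1:
        return False  # not rectangular
    # Row i must spell out sequence i once gap characters are ignored.
    return all(r.replace(gap, "") == s for r, s in zip(rows, sequences))
```

For instance, the rows `AC-T` and `A-GT` form an alignment of `ACT` and `AGT` under this check.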

7
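MSA Definitions (cont.)
  • Function sub(a,b): cost of aligning character a
    with character b
  • sub(-,-) = 0
  • sub(a,b) is symmetric
  • Function c(Ai,j): cost of the pairwise alignment
    induced by rows i and j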
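To make these two functions concrete, here is a small sketch that scores the pairwise alignment induced by two rows, column by column. The scoring function `unit_sub` is a hypothetical stand-in (0 for equal characters, 1 otherwise), not the cost matrix used by MSA, and gap penalties beyond the per-column cost are ignored.

```python
from itertools import combinations


def unit_sub(a, b):
    """Hypothetical scoring: symmetric, and unit_sub('-', '-') == 0."""
    return 0 if a == b else 1


def pair_cost(row_i, row_j, sub=unit_sub):
    """Cost of the pairwise alignment induced by two rows of an alignment."""
    return sum(sub(a, b) for a, b in zip(row_i, row_j))


def sum_of_pairs(rows, sub=unit_sub):
    """Sum the induced pairwise costs over all pairs of rows."""
    return sum(pair_cost(r, s, sub) for r, s in combinations(rows, 2))
```

Because sub(-,-) = 0, a column in which both rows have a dash contributes nothing to that pair's cost.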

8
The Problem
  • Basic sum-of-pairs multiple sequence alignment
    problem is to minimize the pairwise sum of
    c(Ai,j) over all pairs i < j
  • By default, MSA multiplies each pairwise
    alignment cost by a weight, denoted
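Written as a formula, the weighted objective described on this slide would look roughly as follows. The weight symbol did not survive in this transcript, so w_{i,j} is a placeholder name and not necessarily the authors' notation; setting every w_{i,j} = 1 gives the basic unweighted problem.

```latex
\[
\min_{A} \; \sum_{1 \le i < j \le K} w_{i,j} \, c(A_{i,j})
\]
```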

9
Bounding
  • MSA searches only among alignments whose cost is
    < U for some U
  • How to get U?
  • U = L + δ

10
More MSA
  • For each pair of sequences Si, Sj, MSA computes
    the standard two-dimensional dynamic programming
    graph Di,j
  • For each vertex in the graph, the cost of an
    optimal alignment through that vertex is also
    computed
  • A vertex is accepted if it is admissible
  • Admissible if this cost is at most
    d(Si,Sj) + εi,j
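The admissibility test above can be sketched for a single pair of sequences. This is a simplified illustration, not MSA's code: it assumes a hypothetical unit substitution cost and a fixed per-character gap cost, and all names are made up for the example. The cost of the best alignment through a vertex (x, y) is the best prefix cost up to (x, y) plus the best suffix cost from (x, y) onward.

```python
def unit_sub(a, b):
    """Hypothetical scoring: 0 for a match, 1 for a mismatch."""
    return 0 if a == b else 1


def prefix_costs(s, t, sub, gap):
    """F[x][y] = minimum cost of aligning s[:x] with t[:y]."""
    m, n = len(s), len(t)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for x in range(m + 1):
        for y in range(n + 1):
            if x == 0 and y == 0:
                continue
            best = float("inf")
            if x > 0:
                best = min(best, F[x - 1][y] + gap)
            if y > 0:
                best = min(best, F[x][y - 1] + gap)
            if x > 0 and y > 0:
                best = min(best, F[x - 1][y - 1] + sub(s[x - 1], t[y - 1]))
            F[x][y] = best
    return F


def admissible_vertices(s, t, eps, sub=unit_sub, gap=1):
    """Return (d, vertices) where d is the optimal pairwise cost and
    vertices are the points whose best through-cost is at most d + eps."""
    m, n = len(s), len(t)
    F = prefix_costs(s, t, sub, gap)
    # Suffix costs are prefix costs of the reversed strings.
    R = prefix_costs(s[::-1], t[::-1], sub, gap)
    d = F[m][n]
    keep = {
        (x, y)
        for x in range(m + 1)
        for y in range(n + 1)
        if F[x][y] + R[m - x][n - y] <= d + eps
    }
    return d, keep
```

With eps = 0 only vertices on optimal pairwise paths survive; a larger eps widens the band of vertices that the multiple alignment search is allowed to pass through.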

12
MSA Setup
  • Directed acyclic graph in which source-to-sink
    paths of weight C correspond to alignments of
    cost C
  • Vertices represented as integer points in
    K-dimensional space
  • Example
  • <0,0,0,0> → <0,0,0,1> → <0,1,0,2>
  • For Dijkstra, instead of array D, each vertex has
    label v.D
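One way to picture the graph is to generate the successors of a lattice point: each edge advances at least one coordinate by one, which corresponds to one alignment column (a coordinate that does not advance receives a dash). The sketch below is illustrative only; the function name and the `lengths` parameter (the K sequence lengths) are assumptions, and no pruning is applied.

```python
from itertools import product


def successors(v, lengths):
    """Yield lattice points reachable from v by one alignment column."""
    for step in product((0, 1), repeat=len(v)):
        if not any(step):
            continue  # the all-zero step would be a column of dashes only
        w = tuple(a + b for a, b in zip(v, step))
        if all(c <= n for c, n in zip(w, lengths)):
            yield w
```

An interior vertex has 2^K - 1 successors, which is one reason the pairwise pruning described on the previous slides matters so much.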

13
MSA w/o Gap Penalties
  • Two Stages
  • Stage 1
  • Find the set of points that make up a path
    between Si and Sj with cost at most
    d(Si,Sj) + εi,j
  • Stage 2
  • Shortest source to sink path is computed

14
MSA Pruning
  • Two Types of pruning
  • Return cost pruning
  • Carrillo-Lipman

15
MSA w/ Gap Penalties
  • Principle of Optimality
  • Allows us to do
  • Principle does not apply in the same form with
    gap penalties

16
MSA Improvements
  • Running time mainly depends on three functions
  • Cost
  • Adjacent
  • Edge
  • Majority of space taken up by records that store
    edges

17
Original Edge Storage
  • Edge v → w stored at w, the vertex it enters
  • Pros
  • Easy backtracking
  • Easily determine whether edge exists
  • Cons
  • Many list searches

18
New Edge Storage
  • Store outgoing edges v → w at v
  • Add backtracking field to each edge
  • Add reference count to each edge

19
Facts Helping to Free Space
  • Once an edge e is extracted from PQ, e.D will
    never decrease
  • Edge e can be deleted when either of the
    following holds (with e = v → w and w ≠ t)
  • w has no outgoing edges
  • All edges of the form w → x have an edge other
    than e as backtrack edge
  • If the list of outgoing edges for v becomes
    empty after v has been visited, then all edges
    into v can be deleted
  • If vertex v has no outgoing edges or incoming
    edges after some edge outgoing from v is
    extracted, then v can never occur in an optimal
    path. v can be deleted and need not be recreated
  • Deletion of vertices permits the deletion of all
    outgoing and incoming edges, which can cause
    cascading deletions.
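The cascading effect can be sketched with reference counts. This is a simplified illustration under stated assumptions, not the authors' implementation: the reference count of an edge is taken to be the number of edges that backtrack through it, the edge passed in is assumed to have been extracted already, and the vertex-deletion rules above are left out. Under that reading, both deletion conditions for an edge reduce to "no edge backtracks through it".

```python
class Edge:
    def __init__(self, tail, head):
        self.tail, self.head = tail, head
        self.backtrack = None  # previous edge on the best known path
        self.refcount = 0      # edges backtracking through this one (assumed meaning)


def link(out_edges, tail, head):
    """Create edge tail -> head and store it at its tail vertex."""
    e = Edge(tail, head)
    out_edges.setdefault(tail, []).append(e)
    out_edges.setdefault(head, [])
    return e


def set_backtrack(f, e):
    f.backtrack = e
    e.refcount += 1


def release(e, out_edges, sink):
    """Delete e if nothing backtracks through it, then cascade upward.

    Returns the (tail, head) pairs of the deleted edges, in order.
    """
    deleted = []
    while e is not None and e.refcount == 0 and e.head != sink:
        out_edges[e.tail].remove(e)
        deleted.append((e.tail, e.head))
        prev = e.backtrack
        if prev is not None:
            prev.refcount -= 1  # e no longer needs its predecessor
        e.backtrack = None
        e = prev                # the predecessor may now be deletable too
    return deleted
```

Edges into the sink are never released here, since they may be the last edge of the optimal path.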

20
Improvements to Cost Function
  • Inlined
  • Loop invariants
  • Comparison of D values
  • (Given e = p → q and f = q → r)
  • We check to see if e.D + cost(e,f) < f.D
  • Experiments showed that this test very often
    fails.
  • Is it possible to check this without calling the
    expensive function Cost?
  • Yes. Check e.D + extra < f.D instead, where extra
    < cost(e,f)
  • Changed cost function to check extra in loop and
    exit early if value is too large
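The early-exit idea can be sketched as follows. The assumption, labeled here because the slides do not spell it out, is that cost(e,f) is a sum of nonnegative per-pair terms, so every running partial sum is a lower bound on the full cost. The function name and its parameters are made up for this example.

```python
def improved_distance(e_dist, f_dist, pair_terms):
    """Return e_dist + cost if that is < f_dist, else None.

    pair_terms: iterable of nonnegative per-pair contributions to
    cost(e, f) (assumed decomposition). Stops consuming terms as soon
    as the partial sum shows the test must fail.
    """
    if e_dist >= f_dist:
        return None  # fails even with extra == 0
    extra = 0
    for term in pair_terms:
        extra += term
        if e_dist + extra >= f_dist:
            return None  # partial sum already too large; exit early
    return e_dist + extra
```

Since the slide reports that the test fails very often, most calls would stop after only a few terms instead of evaluating the whole sum.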

21
Results

22
Conclusions
  • Made feasible many problems that were previously
    infeasible
  • Most of the changes were low-level
    implementation details
  • Shows that in programs which allocate a lot of
    memory, it is very useful to look for
    opportunities to free memory
  • Coding of inner loop that takes most of the time
    is crucial