A Parallel Solution to Global Sequence Comparisons

1
A Parallel Solution to Global Sequence Comparisons
  • CSC 583 Parallel Programming
  • By Nnamdi Ihuegbu
  • 12/19/03

2
Brief Introduction
  • Human Genome Project (and others) -> vast amount
    of biological data
  • Venture: Computer Science and Biology (BCB) ->
    genetic databases (map, genomic, proteomic)
  • Expected date of completed map of human genome:
    end of 2003
  • Next stage: sequence comparison and
    sequence-protein function
  • Useful to pharmaceutical companies (CADD, e.g. SKB's
    Relenza)

3
Results - Sequence
  • Current sequence generation technologies:
  • Maxam-Gilbert (uses chemicals to cleave DNA at a
    specific base/length)
  • Sanger (uses enzymatic procedures to produce DNA
    based on a specific base, i.e. length)

4
Derivation of nucleotide sequence from human
chromosome
5
Sequence Comparison Methods
  • Types of sequence comparisons/alignments:
  • Global (How similar are these two sequences?)
      • To find the best overall alignment between two
        sequences
      • 1970: Needleman and Wunsch (global, dynamic)
      • Shortcomings: weak at finding small similarities
        within two subsequences
  • Local (What sequences in a database are most
    similar to this sequence?)
      • To find the best subsequence match between two
        sequences
      • 1981: Smith and Waterman (local, dynamic)
      • Shortcomings: not computationally efficient, slow

6
Results - Sequence
7
Results - Sequence
  • Heuristic search (quick, approximate)
  • Quickly search for words that match the sequence,
    then recursively perform a local search on each
    matched word until there are no other matches
  • FASTA (1988), BLAST (1990)
  • Shortcomings: approximate, not exact; E-value
    (significant if < 0.05)
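
The word-matching step above can be sketched as follows. This is a simplified illustration, not the actual FASTA or BLAST procedure: it assumes exact matches of fixed-length words, and the names `find_seeds`, `extend_seed`, and the word length `k` are hypothetical.

```python
from collections import defaultdict


def find_seeds(query, target, k=3):
    """Return (query_pos, target_pos) pairs where a k-letter word matches exactly."""
    # Index every k-letter word of the target by its start position.
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)
    # Look up every k-letter word of the query in that index.
    seeds = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, target, i, j, k):
    """Grow an exact word match left and right (no gaps) while letters agree.

    Returns (query_start, target_start, length) of the extended match.
    """
    start_i, start_j = i, j
    while start_i > 0 and start_j > 0 and query[start_i - 1] == target[start_j - 1]:
        start_i -= 1
        start_j -= 1
    end_i, end_j = i + k, j + k
    while end_i < len(query) and end_j < len(target) and query[end_i] == target[end_j]:
        end_i += 1
        end_j += 1
    return start_i, start_j, end_i - start_i


seeds = find_seeds("GATTACA", "TTAC", k=3)
extended = extend_seed("GATTACA", "TTAC", *seeds[0], 3)
```

Real heuristic tools score the extension and stop on a threshold rather than on the first mismatch, which is where the approximation (and the E-value) comes in.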

8
Results - Sequence (CSC Implementation)
  • Sequence alignment can be represented as matrices
    and graphs (using rules and costs)
  • When converted into a directed acyclic graph, the
    solution of the sequence alignment is the path
    with maximum total value (a maximum-path problem,
    the counterpart of a shortest-path problem).
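
A minimal sketch of the graph view, assuming nodes are index pairs (i, j), each node has up to three outgoing edges (diagonal, down, across), and the edge costs are illustrative values that do not come from the slides. The function names are hypothetical.

```python
from functools import lru_cache


def edges(i, j, s1, s2, match=1, mismatch=-1, gap=-2):
    """Yield (next_node, weight) for the edges leaving node (i, j)."""
    if i < len(s1) and j < len(s2):
        # Diagonal edge: pair s1[i] with s2[j].
        yield (i + 1, j + 1), (match if s1[i] == s2[j] else mismatch)
    if i < len(s1):
        # Down edge: s1[i] is paired with a gap.
        yield (i + 1, j), gap
    if j < len(s2):
        # Across edge: s2[j] is paired with a gap.
        yield (i, j + 1), gap


def max_path_value(s1, s2):
    """Value of the best path from node (0, 0) to node (len(s1), len(s2))."""

    @lru_cache(maxsize=None)
    def best_from(i, j):
        options = [w + best_from(*node) for node, w in edges(i, j, s1, s2)]
        # The sink node has no outgoing edges, so the remaining value is 0.
        return max(options) if options else 0

    return best_from(0, 0)
```

The recursion is only suitable for short strings; because the graph is acyclic, the same values can be filled in iteratively in matrix order.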

9
Sequencing (CSC Implementation)
  • Can be solved by dynamic programming as a running
    max score (RMS).
  • For each D(i,j), best RMS = max(west + gap1,
    north + gap2, NW + current_score)
  • Replace D(i,j) with max
  • Needleman-Wunsch Dynamic Program

Diag. edge: character matches; down edge: gap
in string 2; across edge: gap in string 1
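
The running-max rule above can be sketched as a small Needleman-Wunsch routine. The scoring values are assumptions for illustration (the slides do not give them), a single `gap` cost stands in for both gap1 and gap2, and rows are taken to follow string 1 while columns follow string 2.

```python
def needleman_wunsch(s1, s2, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s1, aligned_s2) for a global alignment."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    # The first column and first row can only be reached through gaps.
    for i in range(1, rows):
        D[i][0] = i * gap
    for j in range(1, cols):
        D[0][j] = j * gap

    def pair_score(i, j):
        return match if s1[i - 1] == s2[j - 1] else mismatch

    # D(i,j) = max(west + gap, north + gap, NW + current_score)
    for i in range(1, rows):
        for j in range(1, cols):
            D[i][j] = max(D[i][j - 1] + gap,
                          D[i - 1][j] + gap,
                          D[i - 1][j - 1] + pair_score(i, j))

    # Trace back from the bottom-right corner to recover one best path.
    a1, a2 = [], []
    i, j = len(s1), len(s2)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + pair_score(i, j):
            a1.append(s1[i - 1]); a2.append(s2[j - 1])   # diagonal edge
            i -= 1; j -= 1
        elif i > 0 and D[i][j] == D[i - 1][j] + gap:
            a1.append(s1[i - 1]); a2.append("_")         # down edge: gap in string 2
            i -= 1
        else:
            a1.append("_"); a2.append(s2[j - 1])         # across edge: gap in string 1
            j -= 1
    return D[-1][-1], "".join(reversed(a1)), "".join(reversed(a2))


score, top, bottom = needleman_wunsch("GAT", "GT")
```

When several paths tie, the traceback prefers the diagonal edge, so other equally good alignments may exist.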
10
Parallel Solution
Work (Slaves) allocated in stripes
11
Parallel Solution (Cont'd)
 
Allocating Strips in SubMatrix
12
Parallel Results
Path
T A -1
G T -3
_ T -6
-10
 
Each cell in each strip computes maximum of
NEIGHBORS (running max)
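
One way to picture the strip allocation is sketched below. It is an assumption-laden illustration, not the presenter's implementation: columns are split into one strip per worker, rows are split into blocks, and blocks on the same anti-diagonal are computed together because each block only needs its west, north, and north-west neighbours. Python threads are used purely to show the schedule; they are not expected to give a speedup. All names and scoring values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(n, parts):
    """Split indices 1..n into `parts` contiguous, nearly equal ranges."""
    bounds = [1 + (n * p) // parts for p in range(parts + 1)]
    return [range(bounds[p], bounds[p + 1]) for p in range(parts)]


def fill_block(D, s1, s2, rows, cols, match, mismatch, gap):
    """Each cell takes the running max over its west, north and NW neighbours."""
    for i in rows:
        for j in cols:
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            D[i][j] = max(D[i][j - 1] + gap,
                          D[i - 1][j] + gap,
                          D[i - 1][j - 1] + score)


def striped_score(s1, s2, workers=3, row_blocks=3,
                  match=1, mismatch=-1, gap=-2):
    """Global alignment score computed strip by strip in wavefront order."""
    D = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i in range(1, len(s1) + 1):
        D[i][0] = i * gap
    for j in range(1, len(s2) + 1):
        D[0][j] = j * gap

    strips = split(len(s2), workers)      # one column strip per worker
    blocks = split(len(s1), row_blocks)   # row blocks passed down each strip

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # At a given step, worker s handles row block (step - s); all of its
        # neighbouring blocks were finished in earlier steps.
        for step in range(row_blocks + workers - 1):
            jobs = [pool.submit(fill_block, D, s1, s2,
                                blocks[step - s], strips[s],
                                match, mismatch, gap)
                    for s in range(workers) if 0 <= step - s < row_blocks]
            for job in jobs:
                job.result()   # wait for the whole wavefront before moving on
    return D[len(s1)][len(s2)]
```

With `workers=1` and `row_blocks=1` the schedule degenerates to the sequential fill, which gives a simple check that the striped version produces the same score.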
13
Improvements
  • Parallel Smith-Waterman (localized: start and
    continue while > 0, then end) (BLAZE-Stanford).
  • Pipeline implementation on an actual mesh
    topology
  • Other possible data structures to traverse
    data in search of the shortest path (e.g. trees --
    specialized)
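
The "continue while > 0" idea behind Smith-Waterman can be sketched sequentially as below. This is a minimal score-only version under assumed scoring values; it is not the BLAZE implementation and does not attempt the parallel or pipelined variants listed above.

```python
def smith_waterman_score(s1, s2, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between s1 and s2."""
    best = 0
    prev = [0] * (len(s2) + 1)          # previous row of the score matrix
    for i in range(1, len(s1) + 1):
        cur = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            # Clamping at 0 lets an alignment start anywhere and end
            # as soon as extending it would drop the score below zero.
            cur[j] = max(0,
                         cur[j - 1] + gap,
                         prev[j] + gap,
                         prev[j - 1] + score)
            best = max(best, cur[j])
        prev = cur
    return best
```

Only two rows are kept because the score alone is needed; recovering the aligned substrings would require the full matrix and a traceback from the best cell.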

14
Improvements (Cont'd)
  • Faster means of comparing and aligning multiple
    sequences simultaneously (e.g. comparing novel
    protein sequence to family).

15
Any Questions?