Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis

Learn more at: http://www.cs.ucr.edu

1
Efficient Algorithms for Locating the
Length-Constrained Heaviest Segments, with
Applications to Biomolecular Sequence Analysis
  • Yaw-Ling Lin, Tao Jiang, Kun-Mao Chao
  • Dept CSIM, Providence Univ, Taiwan
  • Dept CS and Engineering, UC Riverside, USA
  • Dept Life Science, Nat. Yang-Ming Univ, Taiwan

2
Outline
  • Introduction
  • Applications to Biomolecular Sequence Analysis
  • Maximum Sum Consecutive Subsequence
  • Maximum Average Consecutive Subsequence
  • Implementation and Preliminary Experiments
  • Concluding Remarks

3
Introduction
  • Two fundamental algorithms in searching for
    interesting regions in sequences:
  • Given a sequence of real numbers of length n and
    an upper bound U, find a consecutive subsequence
    of length at most U with the maximum sum
    (an O(n)-time algorithm).
  • Given a sequence of real numbers of length n and
    a lower bound L, find a consecutive subsequence
    of length at least L with the maximum average
    (an O(n log L)-time algorithm).
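The two problem statements above can be pinned down with a brute-force reference, useful mainly as a correctness baseline for faster methods. This is a minimal sketch, not the algorithms of the talk: the function names are illustrative, indices are 0-based and half-open (a[i:j]), and integer inputs are assumed so that averages stay exact.

```python
from fractions import Fraction


def prefix_sums(a):
    """Return pre with pre[k] = a[0] + ... + a[k-1]."""
    pre = [0]
    for x in a:
        pre.append(pre[-1] + x)
    return pre


def brute_max_sum(a, U):
    """Best (sum, i, j) over segments a[i:j] with 1 <= j - i <= U."""
    pre = prefix_sums(a)
    best = None
    for i in range(len(a)):
        for j in range(i + 1, min(i + U, len(a)) + 1):
            s = pre[j] - pre[i]
            if best is None or s > best[0]:
                best = (s, i, j)
    return best


def brute_max_avg(a, L):
    """Best (average, i, j) over segments a[i:j] with j - i >= L."""
    pre = prefix_sums(a)
    best = None
    for i in range(len(a)):
        for j in range(i + L, len(a) + 1):
            avg = Fraction(pre[j] - pre[i], j - i)
            if best is None or avg > best[0]:
                best = (avg, i, j)
    return best


print(brute_max_sum([5, -3, 4, -1, 2, -6], 3))  # (6, 0, 3)
print(brute_max_avg([5, 3, 4, 1, 2, 6], 2)[0])  # 4
```

Both functions keep the first optimum they meet, so ties resolve to the leftmost and then shortest segment.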

4
Applications to Biomolecular Sequence Analysis (I)
  • Locating GC-Rich Regions
  • Finding GC-rich regions is an important problem in
    gene recognition and comparative genomics.
  • CpG islands (200 to 1400 bp).
  • Huang94: an O(nL)-time algorithm.
  • Post-Processing Sequence Alignments
  • Comparative analysis of human and mouse DNA is
    useful in gene prediction in the human genome.
  • Mosaic effect: a bad inner sequence.
  • Normalized local alignment.
  • Post-processing locally aligned subsequences.
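One way to connect GC-rich regions to the maximum-average problem is to score each base 1 if it is G or C and 0 otherwise, so that the average score of a segment equals its GC fraction. The encoding and the function names below are assumptions made for illustration, not something stated on the slides.

```python
def gc_indicator(seq):
    """Map a DNA string to 1 for G or C and 0 for any other symbol."""
    return [1 if base in ("G", "C") else 0 for base in seq.upper()]


def gc_fraction(seq, i, j):
    """GC fraction of seq[i:j], i.e. the average indicator score."""
    scores = gc_indicator(seq)
    return sum(scores[i:j]) / (j - i)


print(gc_indicator("ATGCGCAT"))       # [0, 0, 1, 1, 1, 1, 0, 0]
print(gc_fraction("ATGCGCAT", 2, 6))  # 1.0
```

Under this encoding, the GC-richest region of length at least L is a maximum-average segment of the indicator sequence.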

5
Applications to Biomolecular Sequence Analysis (II)
  • Annotating Multiple Sequence Alignments
  • Stojanovic99: conserved regions in
    biomolecular sequences.
  • Numerical scores for columns of a multiple
    alignment: each column score shall be adjusted by
    subtracting an anchor value.
  • Ungapped Local Alignments with Length Constraints
  • Computing the length-constrained segment of each
    diagonal in the matrix with the largest sum (or
    average) of scores.
  • Applications in motif identification.
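The anchor-value adjustment mentioned for multiple alignments can be read as turning an average threshold into a sum test. Writing s_k for the score of column k and c for the anchor value (both symbols are notation assumed here, not taken from the slides), for any segment of columns i..j:

```latex
\sum_{k=i}^{j} (s_k - c) > 0
\quad\Longleftrightarrow\quad
\frac{1}{j-i+1} \sum_{k=i}^{j} s_k > c
```

So a segment has a positive adjusted sum exactly when its average raw score exceeds the anchor value.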

6
Maximum Sum Consecutive Subsequence
<-4, 1, -2, 3> is left-negative.
<5, -3, 4, -1, 2, -6> is not.
<5> <-3, 4> <-1, 2> <-6> is its minimal left-negative
partition.

7
Minimal left-negative partition

8
MLN-partition: linear time
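A minimal sketch of a linear-time MLN partition, assuming the definition that a sequence is left-negative when every proper prefix has a non-positive sum (this agrees with the examples on slide 6). The right-to-left merging scheme and all names are a reconstruction for illustration, not the pseudocode of the slide.

```python
def is_left_negative(seg):
    """True if every proper prefix of seg has a non-positive sum."""
    total = 0
    for x in seg[:-1]:
        total += x
        if total > 0:
            return False
    return True


def mln_partition(a):
    """Minimal left-negative partition of a, as a list of segments."""
    n = len(a)
    end = list(range(n))  # end[i]: last index of the first segment of a[i:]
    total = list(a)       # total[i]: sum of that segment
    # Scan right to left; a segment with non-positive sum absorbs the
    # segment that follows it. Each segment is absorbed at most once.
    for i in range(n - 1, -1, -1):
        while end[i] < n - 1 and total[i] <= 0:
            nxt = end[i] + 1
            total[i] += total[nxt]
            end[i] = end[nxt]
    parts, i = [], 0
    while i < n:
        parts.append(a[i:end[i] + 1])
        i = end[i] + 1
    return parts


print(mln_partition([5, -3, 4, -1, 2, -6]))  # [[5], [-3, 4], [-1, 2], [-6]]
```

The arrays end and total also describe the partition of every suffix a[i:], which is what makes them reusable by a max-sum search.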

9
Max-Sum with LC
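The sketch below shows how left-negative segments could drive a max-sum search under the length bound U. It is a reconstruction under assumptions, not the slide's own procedure: names are illustrative, indices are 0-based and half-open, and the right end j is carried forward between starting positions so that the scan is intended to take linear time overall.

```python
def left_negative_pointers(a):
    """end[i], total[i]: last index and sum of the first MLN segment of a[i:]."""
    n = len(a)
    end, total = list(range(n)), list(a)
    for i in range(n - 1, -1, -1):
        while end[i] < n - 1 and total[i] <= 0:
            nxt = end[i] + 1
            total[i] += total[nxt]
            end[i] = end[nxt]
    return end, total


def max_sum_at_most(a, U):
    """Return (sum, i, j) for a maximum-sum segment a[i:j] with 1 <= j - i <= U."""
    n = len(a)
    if n == 0 or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")
    end, total = left_negative_pointers(a)
    pre = [0]
    for x in a:
        pre.append(pre[-1] + x)
    # Fallback: the largest single element covers inputs with no positive entry.
    k = max(range(n), key=a.__getitem__)
    best = (a[k], k, k + 1)
    j = 0  # inclusive right end, never moves backwards
    for i in range(n):
        if a[i] <= 0:
            continue  # a best segment of length >= 2 never starts at a non-positive entry
        j = max(i, j)
        # Append whole left-negative segments while they fit and add a positive sum.
        while j < n - 1 and end[j + 1] < i + U and total[j + 1] > 0:
            j = end[j + 1]
        s = pre[j + 1] - pre[i]
        if s > best[0]:
            best = (s, i, j + 1)
    return best


print(max_sum_at_most([5, -3, 4, -1, 2, -6], 3))  # (6, 0, 3)
```

Stopping inside a left-negative segment cannot help, because every proper prefix of such a segment has a non-positive sum; this is why only segment boundaries are tried as right ends.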

10
Analysis of MSLC

11
Max Average Subsequence
<4, 2, 3, 8> is right-skew.
<5, 3, 4, 1, 2, 6> is not.
<5> <3, 4> <1, 2, 6> is its decreasing right-skew
partition.

12
Decreasing right-skew partition

13
DRS-partition: linear time
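A minimal sketch of a linear-time DRS partition, assuming the definition that a sequence is right-skew when the average of every proper prefix is at most the average of the remaining suffix (this agrees with the examples on slide 11). The merging scheme and names are a reconstruction for illustration; averages are compared by cross-multiplication to avoid floating-point division.

```python
def is_right_skew(seg):
    """True if each proper prefix has average <= that of the remaining suffix."""
    total, n = sum(seg), len(seg)
    run = 0
    for k in range(1, n):
        run += seg[k - 1]
        # avg(prefix) <= avg(suffix)  <=>  run * (n - k) <= (total - run) * k
        if run * (n - k) > (total - run) * k:
            return False
    return True


def drs_partition(a):
    """Decreasing right-skew partition of a, as a list of segments."""
    n = len(a)
    end = list(range(n))  # end[i]: last index of the first segment of a[i:]
    total = list(a)       # total[i]: sum of that segment
    width = [1] * n       # width[i]: length of that segment
    for i in range(n - 1, -1, -1):
        while end[i] < n - 1:
            nxt = end[i] + 1
            # Stop once avg(current) > avg(next), so averages strictly decrease.
            if total[i] * width[nxt] > total[nxt] * width[i]:
                break
            total[i] += total[nxt]
            width[i] += width[nxt]
            end[i] = end[nxt]
    parts, i = [], 0
    while i < n:
        parts.append(a[i:end[i] + 1])
        i = end[i] + 1
    return parts


print(drs_partition([5, 3, 4, 1, 2, 6]))  # [[5], [3, 4], [1, 2, 6]]
```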

14
Max-Avg-Seq with LC
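The sketch below illustrates how right-skew segments can be used to find a maximum-average segment of length at least L. It is a simplified stand-in, not the algorithm of the talk: for each start it extends the mandatory first L elements by scanning right-skew segments one at a time, where the talk locates the stopping point ("good partner") faster, so this version does not reach the O(n log L) bound. Names are illustrative, indices are 0-based and half-open, and integer inputs are assumed.

```python
from fractions import Fraction


def right_skew_pointers(a):
    """end, total, width of the first DRS segment of every suffix a[i:]."""
    n = len(a)
    end, total, width = list(range(n)), list(a), [1] * n
    for i in range(n - 1, -1, -1):
        while end[i] < n - 1:
            nxt = end[i] + 1
            if total[i] * width[nxt] > total[nxt] * width[i]:
                break
            total[i] += total[nxt]
            width[i] += width[nxt]
            end[i] = end[nxt]
    return end, total, width


def max_avg_at_least(a, L):
    """Return (average, i, j) for a maximum-average segment a[i:j] with j - i >= L."""
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")
    end, total, width = right_skew_pointers(a)
    pre = [0]
    for x in a:
        pre.append(pre[-1] + x)
    best = None  # (sum, length, i, j)
    for i in range(n - L + 1):
        j = i + L - 1                    # inclusive right end
        s, w = pre[j + 1] - pre[i], L
        while j < n - 1:
            nxt = j + 1
            # Extend only if the next right-skew segment has a larger average.
            if total[nxt] * w <= s * width[nxt]:
                break
            s += total[nxt]
            w += width[nxt]
            j = end[nxt]
        if best is None or s * best[1] > best[0] * w:
            best = (s, w, i, j + 1)
    s, w, i, j = best
    return Fraction(s, w), i, j


print(max_avg_at_least([5, 3, 4, 1, 2, 6], 2))  # (Fraction(4, 1), 0, 2)
```

Because the segment averages of a DRS partition strictly decrease, the average of the extended segment rises and then falls, so the scan can stop at the first segment that fails to raise it.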

15
Locate good-partner

16
Analysis of MaxAvgSeq

17
Implementation and Preliminary Experiments

18
Implementation and Preliminary Experiments

19
Conclusion
  • Finding a max-sum subsequence of length at most U
    can be done in O(n) time.
  • Finding a max-avg subsequence of length at least L
    can be done in O(n log L) time.
  • Is there a linear-time algorithm to find a
    max-avg subsequence of length at least L?

20
Future Research
  • Best k (nonintersecting) subsequences?
  • Normalized local alignment?
  • Measurement of goodness?