Title: Efficient Algorithms for Locating the Length-Constrained Heaviest Segments, with Applications to Biomolecular Sequence Analysis
1Efficient Algorithms for Locating the
Length-Constrained Heaviest Segments, with
Applications to Biomolecular Sequence Analysis
- Yaw-Ling Lin, Tao Jiang, Kun-Mao Chao
- Dept CSIM, Providence Univ, Taiwan
- Dept CS and Engineering, UC Riverside, USA
- Dept Life Science, Nat. Yang-Ming Univ, Taiwan
2Outline
- Introduction.
- Applications to Biomolecular Sequence Analysis.
- Maximum Sum Consecutive Subsequence.
- Maximum Average Consecutive Subsequence.
- Implementation and Preliminary Experiments
- Concluding Remarks
3Introduction
- Two fundamental algorithms in searching for
interesting regions in sequences:
- Given a sequence of real numbers of length n and
an upper bound U, find a consecutive subsequence
of length at most U with the maximum sum:
an O(n)-time algorithm.
- Given a sequence of real numbers of length n and
a lower bound L, find a consecutive subsequence
of length at least L with the maximum average:
an O(n log L)-time algorithm.
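The two problem statements above can be pinned down with a small Python sketch. It is illustrative only and not the algorithms of this talk: `max_sum_at_most` uses the standard prefix-sum plus monotonic-deque (sliding-window minimum) technique to reach O(n) time, while `max_avg_at_least` is a plain O(n^2) reference that serves only as an executable specification. The function names and the half-open `(value, i, j)` return convention (segment `a[i:j]`) are assumptions made for this example.

```python
from collections import deque
from typing import Sequence, Tuple


def max_sum_at_most(a: Sequence[float], U: int) -> Tuple[float, int, int]:
    """Best non-empty segment a[i:j] with j - i <= U, maximizing the sum."""
    if not a or U < 1:
        raise ValueError("need a non-empty sequence and U >= 1")
    prefix = [0.0]
    for x in a:
        prefix.append(prefix[-1] + x)
    best = (float("-inf"), 0, 0)
    starts = deque()  # candidate start indices, prefix values strictly increasing
    for j in range(1, len(a) + 1):
        new_start = j - 1
        # A later start with a smaller-or-equal prefix dominates earlier ones.
        while starts and prefix[starts[-1]] >= prefix[new_start]:
            starts.pop()
        starts.append(new_start)
        # Discard starts that would make the segment longer than U.
        while starts[0] < j - U:
            starts.popleft()
        i = starts[0]
        if prefix[j] - prefix[i] > best[0]:
            best = (prefix[j] - prefix[i], i, j)
    return best


def max_avg_at_least(a: Sequence[float], L: int) -> Tuple[float, int, int]:
    """Brute-force reference: best segment a[i:j] with j - i >= L by average."""
    n = len(a)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(a)")
    prefix = [0.0]
    for x in a:
        prefix.append(prefix[-1] + x)
    best = (float("-inf"), 0, 0)
    for i in range(n - L + 1):
        for j in range(i + L, n + 1):
            avg = (prefix[j] - prefix[i]) / (j - i)
            if avg > best[0]:
                best = (avg, i, j)
    return best
```

For example, `max_sum_at_most([5, -3, 4, -1, 2, -6], 3)` selects the segment `[5, -3, 4]` with sum 6, and `max_avg_at_least([5, 3, 4, 1, 2, 6], 2)` selects `[5, 3]` with average 4.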
4Applications to Biomolecular Sequence Analysis (I)
- Locating GC-Rich Regions
- Finding GC-rich regions is an important problem in
gene recognition and comparative genomics.
- CpG islands (200 to 1400 bp).
- Huang94: an O(nL)-time algorithm.
- Post-Processing Sequence Alignments
- Comparative analysis of human and mouse DNA is
useful in gene prediction in the human genome.
- Mosaic effect: bad inner sequence.
- Normalized local alignment.
- Post-processing locally aligned subsequences.
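The GC-rich region problem on this slide can be read as a maximum-average segment problem. The sketch below assumes one natural encoding (score 1 for G or C, 0 for A or T), which the slide itself does not spell out, so the most GC-rich window of length at least L is the segment with the highest average score. It only examines lengths from L to 2L-1: any segment of length 2L or more can be cut into two pieces of length at least L, one of which has an average no smaller than the whole. That gives O(nL) time, consistent with the bound quoted for Huang94, though this is not claimed to be that paper's exact procedure. The names `gc_scores` and `richest_gc_window` are hypothetical.

```python
from typing import List, Tuple


def gc_scores(dna: str) -> List[int]:
    """Assumed encoding: G/C score 1, every other base scores 0."""
    return [1 if base in "GC" else 0 for base in dna.upper()]


def richest_gc_window(dna: str, L: int) -> Tuple[float, int, int]:
    """Return (gc_ratio, i, j) for the best window dna[i:j] with j - i >= L."""
    scores = gc_scores(dna)
    n = len(scores)
    if not 1 <= L <= n:
        raise ValueError("need 1 <= L <= len(dna)")
    prefix = [0]
    for s in scores:
        prefix.append(prefix[-1] + s)
    best = (-1.0, 0, 0)
    for i in range(n - L + 1):
        # Lengths L .. 2L-1 suffice; longer windows never do strictly better.
        for j in range(i + L, min(n, i + 2 * L - 1) + 1):
            ratio = (prefix[j] - prefix[i]) / (j - i)
            if ratio > best[0]:
                best = (ratio, i, j)
    return best
```

For instance, `richest_gc_window("ATGCGCAT", 4)` picks the window `GCGC` at positions 2 to 6 with a GC ratio of 1.0.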
5Applications to Biomolecular Sequence Analysis (II)
- Annotating Multiple Sequence Alignments
- Stojanovic99: conserved regions in
biomolecular sequences.
- Numerical scores for columns of a multiple
alignment; each column score shall be adjusted by
subtracting an anchor value.
- Ungapped Local Alignments with Length Constraints
- Computing the length-constrained segment of each
diagonal in the matrix with the largest sum (or
average) of scores.
- Applications in motif identification.
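To make the diagonal formulation above concrete, here is a minimal Python sketch for the sum variant. It assumes a simple match/mismatch scoring scheme (+1 / -1), which is not specified on the slide, and scans every diagonal of the implicit score matrix for the best-scoring run of length at most U. The inner search is deliberately naive (O(U) work per start position); a linear-time max-sum routine could be substituted per diagonal. The function name and return convention are hypothetical.

```python
from typing import Tuple


def best_diagonal_segment(
    s: str, t: str, U: int, match: int = 1, mismatch: int = -1
) -> Tuple[int, int, int, int]:
    """Return (score, start_in_s, start_in_t, length) of the best ungapped
    local alignment of length at most U, under the assumed scoring."""
    if not s or not t or U < 1:
        raise ValueError("need non-empty sequences and U >= 1")
    best = (-(10**9), 0, 0, 0)
    # Diagonal d pairs s[i] with t[j] where i - j == d.
    for d in range(-(len(t) - 1), len(s)):
        i0, j0 = (d, 0) if d >= 0 else (0, -d)
        size = min(len(s) - i0, len(t) - j0)
        diag = [match if s[i0 + k] == t[j0 + k] else mismatch for k in range(size)]
        for start in range(size):
            total = 0
            for length in range(1, min(U, size - start) + 1):
                total += diag[start + length - 1]
                if total > best[0]:
                    best = (total, i0 + start, j0 + start, length)
    return best
```

With `s = "GATTACA"`, `t = "TTAC"` and `U = 3`, the best segment lies on the diagonal that aligns `s[2:]` with `t`, giving a score of 3 for three consecutive matches.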
6Maximum Sum Consecutive Subsequence
<-4, 1, -2, 3> is left-negative.
<5, -3, 4, -1, 2, -6> is not.
<5> <-3, 4> <-1, 2> <-6> is minimal left-negative
partitioned.
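The examples above can be reproduced with a short Python sketch. It assumes the following reading of the terms, which the slide text does not define explicitly: a sequence is left-negative when every proper prefix has a sum that is negative or zero, and a partition is minimal left-negative when every block is left-negative and every block except possibly the last has a positive sum. Under that assumption, a right-to-left scan that merges a block into its right neighbour while the block's sum is not positive builds the partition; each block is pushed and popped at most once, so the scan is linear. The function names are hypothetical.

```python
from typing import List, Sequence


def is_left_negative(seg: Sequence[float]) -> bool:
    """True if every proper prefix of seg has sum <= 0 (assumed definition)."""
    total = 0
    for x in seg[:-1]:
        total += x
        if total > 0:
            return False
    return True


def mln_partition(a: Sequence[float]) -> List[List[float]]:
    """Minimal left-negative partition via a right-to-left merging scan."""
    stack = []  # blocks to the right as (start, end, sum); leftmost block on top
    for i in range(len(a) - 1, -1, -1):
        start, end, total = i, i + 1, a[i]
        # A block with non-positive sum is absorbed into the block after it.
        while total <= 0 and stack:
            _, next_end, next_total = stack.pop()
            end = next_end
            total += next_total
        stack.append((start, end, total))
    return [list(a[s:e]) for s, e, _ in reversed(stack)]
```

Running `mln_partition([5, -3, 4, -1, 2, -6])` yields `[[5], [-3, 4], [-1, 2], [-6]]`, matching the partition shown on the slide.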
7Minimal left-negative partition
8MLN-partition linear time
9Max-Sum with LC
10Analysis of MSLC
11Max Average Subsequence
<4, 2, 3, 8> is right-skew.
<5, 3, 4, 1, 2, 6> is not.
<5> <3, 4> <1, 2, 6> is decreasing right-skew
partitioned.
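A similar sketch reproduces the right-skew examples above. The assumed definitions, not stated on the slide, are: a sequence is right-skew when the average of every proper prefix is at most the average of the remaining suffix, and a decreasing right-skew partition splits the sequence into right-skew blocks whose averages strictly decrease from left to right. The right-to-left scan merges a block with its right neighbour while its average does not exceed the neighbour's; averages are compared by cross-multiplication to avoid floating-point division. The function names are hypothetical.

```python
from typing import List, Sequence


def is_right_skew(seg: Sequence[float]) -> bool:
    """True if avg(prefix) <= avg(remaining suffix) for every proper prefix."""
    n, total, prefix = len(seg), sum(seg), 0
    for k in range(1, n):
        prefix += seg[k - 1]
        # prefix / k <= (total - prefix) / (n - k), cross-multiplied
        if prefix * (n - k) > (total - prefix) * k:
            return False
    return True


def drs_partition(a: Sequence[float]) -> List[List[float]]:
    """Decreasing right-skew partition via a right-to-left merging scan."""
    stack = []  # blocks to the right as (start, end, sum); leftmost block on top
    for i in range(len(a) - 1, -1, -1):
        start, end, total = i, i + 1, a[i]
        while stack:
            next_start, next_end, next_total = stack[-1]
            cur_len, next_len = end - start, next_end - next_start
            # Stop once the current average strictly exceeds the next block's.
            if total * next_len > next_total * cur_len:
                break
            stack.pop()
            end, total = next_end, total + next_total
        stack.append((start, end, total))
    return [list(a[s:e]) for s, e, _ in reversed(stack)]
```

Running `drs_partition([5, 3, 4, 1, 2, 6])` yields `[[5], [3, 4], [1, 2, 6]]`, with block averages 5, 3.5 and 3.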
12Decreasing right-skew partition
13DRS-partition linear time
14Max-Avg-Seq with LC
15Locate good-partner
16Analysis of MaxAvgSeq
17Implementation and Preliminary Experiments
18Implementation and Preliminary Experiments
19Conclusion
- Finding a max-sum subsequence of length at most U
can be done in O(n) time.
- Finding a max-avg subsequence of length at least L
can be done in O(n log L) time.
- Is there a linear-time algorithm to find a
max-avg subsequence of length at least L?
20Future Research
- Best k (nonintersecting) subsequences?
- Normalized local alignment?
- Measurement of goodness?