1
Computer Science Department
Technion - Israel Institute of Technology
Genomic Sorting with Length-Weighted Reversals
Ron Y. Pinter (Technion)
Steve Skiena (SUNY Stony Brook)
2
Genome Rearrangement
  • events
    • duplication
    • translocation
    • reversal (inversion)
  • occur primarily during reproduction
  • allow large-scale genomic comparisons

3
Sorting by Reversals
  • genome represented as a permutation on
    $\{1, 2, \ldots, n\}$
  • n homologous genes among species
  • assumptions
    • can identify genes
    • genes are distinct
  • operation: reversal of a subsequence (of genes)
    • models inversion (occurs during crossover)
  • one of the permutations can be $1, 2, \ldots, n$
    • appropriately relabel others
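
The relabeling step can be sketched as follows. This is a minimal illustration, assuming both genomes contain exactly the same distinct genes; the function name `relabel` and the single-letter gene names are hypothetical placeholders, not from the slides.

```python
def relabel(reference, other):
    """Rename genes so that `reference` becomes 1, 2, ..., n.

    Returns `other` expressed in the new labels, i.e. the permutation
    that has to be sorted. Assumes both orders hold the same distinct genes.
    """
    # rank[g] is the 1-based position of gene g in the reference genome
    rank = {gene: k for k, gene in enumerate(reference, start=1)}
    return [rank[gene] for gene in other]


# Placeholder gene names, for illustration only
print(relabel(["c", "a", "d", "b"], ["a", "b", "c", "d"]))  # [2, 4, 1, 3]
```

Sorting the returned permutation by reversals then transforms one gene order into the other.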

4
Example
4 3 2 8 7 1 5 6 11 10 9

4 3 2 1 7 8 5 6 9 10 11

1 2 3 4 8 7 6 5 9 10 11

1 2 3 4 5 6 7 8 9 10 11
5
Our Model
  • unsigned
  • cost of reversal of subsequence of length $l$ is
    $f(l)$
  • total sorting cost (or distance) is
    $\sum_j f(\mathrm{length}(s_j))$,
    where the $s_j$ are the reversed subsequences
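
As a worked instance of this cost model, the sketch below replays the Example slide and sums the cost of each reversal. The decomposition of each displayed step into individual reversals is an assumption read off the four rows (the slide's own markings did not survive), and `apply_reversals` is a hypothetical helper name.

```python
def apply_reversals(perm, reversals, f=lambda l: l):
    """Apply reversals given as 1-based inclusive (i, j) pairs.

    Returns the resulting permutation and the total cost,
    i.e. the sum of f(length) over all reversed subsequences.
    """
    a = list(perm)
    cost = 0
    for i, j in reversals:
        a[i - 1:j] = a[i - 1:j][::-1]  # reverse positions i..j
        cost += f(j - i + 1)           # charge f(length of the subsequence)
    return a, cost


start = [4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]
# Assumed reading of the example: two, then three, then one reversal
steps = [(4, 6), (9, 11), (1, 4), (5, 6), (7, 8), (5, 8)]

print(apply_reversals(start, steps))                  # additive cost f(l) = l
print(apply_reversals(start, steps, f=lambda l: 1))   # unit cost f(l) = 1
```

Under this reading the six reversals cost 18 with f(l) = l and 6 with unit cost.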
6
Cost Functions
  • additive
    • $f(x+y) = f(x) + f(y)$
  • subadditive
    • $f(x+y) < f(x) + f(y)$
  • superadditive
    • $f(x+y) > f(x) + f(y)$
  • other
    • e.g. bitonic
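
These classes can be checked numerically for a candidate cost function. The sketch below is only a finite test over lengths up to a hypothetical bound `max_len`, not a proof, and it uses exact comparison, so floating-point cost functions may be misclassified.

```python
def classify(f, max_len=50):
    """Classify f by the sign of f(x+y) - (f(x) + f(y)) for all
    positive integer lengths x, y with x + y <= max_len."""
    signs = set()
    for x in range(1, max_len):
        for y in range(1, max_len - x + 1):
            d = f(x + y) - (f(x) + f(y))
            signs.add((d > 0) - (d < 0))  # -1, 0 or +1
    if signs == {0}:
        return "additive"
    if signs == {-1}:
        return "subadditive"
    if signs == {1}:
        return "superadditive"
    return "other"  # mixed behaviour


print(classify(lambda l: l))      # additive
print(classify(lambda l: 1))      # subadditive (unit cost)
print(classify(lambda l: l * l))  # superadditive
```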

7
Problems
  • algorithm to sort any permutation
    • worst-case min cost
  • approximate min cost for a given permutation

8
Extremal Costs
  • highly subadditive, e.g. unit cost, $f(l) = 1$
    • NP-complete [Caprara, 97]
    • series of approximation ratios: 2, 1.75, 1.375
  • highly superadditive: $f(l) > l^2$
    • essentially bubblesort
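
To make the bubblesort remark concrete, here is a minimal sketch that sorts using only reversals of length 2 (adjacent swaps), each charged f(2). It illustrates the strategy only and does not show that it is optimal; the default cost f(l) = l^2 and the function name are assumptions chosen for the example.

```python
def bubble_sort_by_reversals(perm, f=lambda l: l * l):
    """Sort with length-2 reversals only.

    Returns (sorted list, number of reversals, total cost).
    The number of reversals equals the number of inversions of perm.
    """
    a = list(perm)
    swaps = 0
    cost = 0
    for end in range(len(a) - 1, 0, -1):
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # reversal of length 2
                swaps += 1
                cost += f(2)
    return a, swaps, cost


print(bubble_sort_by_reversals([3, 1, 2]))  # ([1, 2, 3], 2, 8)
```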

9
Our Results
  • additive cost function
    • specifically $f(l) = l$
  • QuickSort-like algorithm for worst-case
    • complexity $O(n \lg^2 n)$
  • min cost approximation ratio of $O(\lg^2 n)$

10
MedianEject(a,b)
  • find r maximal blocks of wrong-sided elements
    with respect to median
  • repeat lg r times: flip every other pair of blocks
    of wrong-sided and adjacent blocks
  • move wrong-sided blocks to median boundary
  • reverse left and right blocks
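
The first step, finding the maximal blocks of wrong-sided elements, can be sketched as below. This covers only that step, not the flipping rounds. It assumes the whole list is the range (a, b), takes the left side to be the first ceil(n/2) positions, and treats runs that meet at the median boundary as separate blocks; the function name is hypothetical.

```python
def wrong_sided_blocks(perm):
    """Return maximal runs of wrong-sided elements as 0-based
    half-open (start, end) index pairs.

    An element is wrong-sided if it is larger than the median but sits
    in the left half, or at most the median but sits in the right half.
    """
    n = len(perm)
    if n == 0:
        return []
    half = (n + 1) // 2                 # size of the left side
    median = sorted(perm)[half - 1]     # largest value belonging on the left
    blocks = []
    start = None
    for i, v in enumerate(perm):
        wrong = (v > median) if i < half else (v <= median)
        if i == half and start is not None:
            blocks.append((start, i))   # close a block at the boundary
            start = None
        if wrong and start is None:
            start = i
        elif not wrong and start is not None:
            blocks.append((start, i))
            start = None
    if start is not None:
        blocks.append((start, n))
    return blocks


print(wrong_sided_blocks([4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9]))  # [(3, 5), (6, 8)]
```

For the permutation of the Example slide this gives r = 2 blocks: "8 7" on the left and "5 6" on the right.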

11
Sample Run
  • complexity O((b-a) lg r)

12
ReversalSort(a,b)
  • MedianEject (a,b)
  • ReversalSort (a, )
  • ReversalSort ( ,b)
  • Complexity
    • $T(n) = 2 \cdot T(n/2) + O(f(n) \lg n) = O(f(n) \lg^2 n)$
    • $O(n \lg^2 n)$ for $f(n) = n$
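
The recursive structure can be sketched in runnable form as follows. Note that the partition step here is a simpler divide-and-conquer stand-in, not the slides' MedianEject: it partitions each half recursively and then reverses the middle "large then small" stretch. With f(l) = l this stand-in also costs O(n lg n) per partition, so the sketch follows the same recurrence. All names are hypothetical, and elements are assumed distinct.

```python
def reversal_sort(perm, f=lambda l: l):
    """Sort using only reversals; return (sorted list, total cost)."""
    a = list(perm)
    cost = 0

    def reverse(i, j):
        # Reverse a[i:j] and charge f(length)
        nonlocal cost
        a[i:j] = a[i:j][::-1]
        cost += f(j - i)

    def partition(lo, hi, pivot):
        # Rearrange a[lo:hi] so elements <= pivot come first;
        # return the index where the larger elements start.
        if hi - lo == 0:
            return lo
        if hi - lo == 1:
            return lo + 1 if a[lo] <= pivot else lo
        mid = (lo + hi) // 2
        s1 = partition(lo, mid, pivot)  # a[lo:s1] small, a[s1:mid] large
        s2 = partition(mid, hi, pivot)  # a[mid:s2] small, a[s2:hi] large
        if s1 < mid < s2:               # both middle blocks non-empty
            reverse(s1, s2)             # large+small becomes small+large
        return s1 + (s2 - mid)

    def sort(lo, hi):
        if hi - lo <= 1:
            return
        # The median value is looked up directly; only the moves are reversals
        pivot = sorted(a[lo:hi])[(hi - lo - 1) // 2]
        split = partition(lo, hi, pivot)
        sort(lo, split)
        sort(split, hi)

    sort(0, len(a))
    return a, cost


result, total = reversal_sort([4, 3, 2, 8, 7, 1, 5, 6, 11, 10, 9])
print(result, total)
```

An already sorted input triggers no reversals in this sketch, since at every merge one of the two middle blocks is empty.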

13
Algorithmic Improvements
  • I: simplify short phases
  • II: merge 2 last steps of MedianEject
    when possible (2pq vs. 3pq)
  • III: apply II recursively

14
Approximation Ratio
  • M(p) is the maximal total distance between pairs
    of out-of-order elements
  • Lemma 4: min cost is $\Omega(M(p))$
  • but
    • Lemma 6: number of out-of-order elts $< 3 \cdot M(p)$
    • Lemma 7: MedianEject touches only elements within
      linear range from out-of-order elements
  • yields
    • each round of MedianEject takes $O(M(p) \cdot \lg^2 n)$
    • ReversalSort costs $O(M(p) \cdot \lg^2 n)$
    • ReversalSort is at most $O(\lg^2 n)$ times optimal
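
The last step follows by dividing the upper bound by the lower bound. A short derivation, reading Lemma 4 as a lower bound on the minimum cost; the names OPT, c_1 and c_2 are introduced here for illustration and are not from the slides:

```latex
\begin{align*}
\mathrm{OPT}(p) &\ge c_1 \, M(p)
  && \text{(Lemma 4, for some constant } c_1 > 0\text{)} \\
\mathrm{cost}_{\mathrm{ReversalSort}}(p) &\le c_2 \, M(p) \, \lg^2 n
  && \text{(for some constant } c_2 > 0\text{)} \\
\frac{\mathrm{cost}_{\mathrm{ReversalSort}}(p)}{\mathrm{OPT}(p)}
  &\le \frac{c_2}{c_1} \, \lg^2 n = O(\lg^2 n)
  && \text{(whenever } M(p) > 0\text{)}
\end{align*}
```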

15
Bioinformatic Validation
  • use our cost (= distance) to build phylogenetic
    trees
  • 4 plants (chloroplastic genes)
  • consistent with Martin et al., PNAS Sept 02
  • work in progress M. Shoham

[Figure labels: Cyanophora, Cyanidium, Guillardia, Porphyra]
16
Open Problems: Algorithmic
  • weighted genes
  • tighter approximation ratio
    • close to O(lg n)
    • can get to constant?
  • other cost functions (incl. bitonic)
  • the signed case

17
Open Problems: Modeling
  • chromosomal ordering
  • what is the right cost function?
    • consider $\mathrm{cost}(l) = l^d$
    • combine with constant-based models
  • restricted regions
  • undesired reversal sequences
  • deal with duplication and translocation events