Title: An Algorithm for the Consecutive Ones Property
1An Algorithm for the Consecutive Ones Property
Claudio Eccher
2Outline
- C1P definition
- Biological background
- Hybridization mapping
- An algorithm for the C1P problem
- Dividing in components
- Taking care of a component
- Joining the components together
3The consecutive ones property
Definition: A binary matrix is said to have
the consecutive ones property (C1P) if a
permutation of its columns can be found such that
all 1s in each row are consecutive
A B C D
1 1 0 0 1
2 0 1 0 1
3 1 0 1 0
C A D B
1 0 1 1 0
2 0 0 1 1
3 1 1 0 0
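The definition can be checked mechanically on the small matrix above. The following is only a brute-force sketch, not the algorithm presented later in these slides; the names `rows_consecutive` and `find_c1p_order` are chosen here for illustration, and trying every permutation is feasible only for tiny matrices.

```python
from itertools import permutations

def rows_consecutive(matrix, order):
    """True if, with columns taken in `order`, the 1s of every row are consecutive."""
    for row in matrix:
        ones = [pos for pos, col in enumerate(order) if row[col] == 1]
        # Consecutive means the span of positions equals the number of 1s.
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True

def find_c1p_order(matrix):
    """Return one column order witnessing the C1P, or None (brute force)."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if rows_consecutive(matrix, order):
            return order
    return None

# Example matrix from the slide; columns A, B, C, D are indices 0..3.
M = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
names = "ABCD"

print(rows_consecutive(M, [2, 0, 3, 1]))          # the order C A D B
print("".join(names[c] for c in find_c1p_order(M)))
```

The search may return a different witness than C A D B (for instance its reversal), since a C1P order is not unique.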
4The consecutive ones property
Observation: the C1P is closed under taking
submatrices
A bad matrix
C A D
1 0 1 1
2 1 0 1
3 1 1 0
Whichever column x is put in the middle, there is a
row in which x is 0 while the other two columns are 1,
so the 1s of that row cannot be consecutive
Hence, every matrix containing this submatrix is
bad
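A quick way to convince oneself of both claims is an exhaustive check. This is a sketch under the assumption that brute force is acceptable for such small inputs; `has_c1p_bruteforce` and the `extended` matrix are made up here for illustration.

```python
from itertools import permutations

def has_c1p_bruteforce(matrix):
    """Try every column order; True if some order makes all rows' 1s consecutive."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        ok = True
        for row in matrix:
            ones = [pos for pos, col in enumerate(order) if row[col] == 1]
            if ones and ones[-1] - ones[0] + 1 != len(ones):
                ok = False
                break
        if ok:
            return True
    return False

# The bad 3 x 3 matrix from the slide (columns C, A, D).
bad = [[0, 1, 1],
       [1, 0, 1],
       [1, 1, 0]]

# A hypothetical larger matrix that contains `bad` in rows 0..2, columns 0..2.
extended = [[0, 1, 1, 0],
            [1, 0, 1, 1],
            [1, 1, 0, 0],
            [0, 0, 0, 1]]

print(has_c1p_bruteforce(bad))
print(has_c1p_bruteforce(extended))
```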
5Hybridization mapping (1)
- Copies of a DNA molecule are broken into several
fragments (10^4 bases) and replicated by cloning
(clones)
- The possible binding of small sequences (probes)
to each clone is checked; the subset of the probes
bound (hybridized) to a clone becomes its
fingerprint
- Clone overlaps, and thus the clones' relative order,
are determined by comparing fingerprints
6Hybridization mapping (2)
Two clones sharing part of their respective
fingerprints are likely to have come from
overlapping DNA regions
[Figure: Clone 1 and Clone 2 drawn against the probes A, D, C, B]
7Assumptions
- All clone × probe hybridization experiments
have been done
8Model
9Problem
Obtaining a physical map from M
10An algorithm for the C1P problem
- The algorithm is from Fulkerson and Gross (1965)
11Algorithm sketch
- Separation of the rows into components (subsets
of rows)
- Permutation of the columns of each component
- Joining of the components together
12Row relations
Definition: ∀ row i ∈ M, Si = {columns k | Mi,k = 1}
- Given two rows i and j, exactly one of the following holds:
- Si ∩ Sj = ∅ or
- Si ⊆ Sj or Sj ⊆ Si or
- Si ∩ Sj ≠ ∅ and none of them is a subset of the
other
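The three relations translate directly into set operations. The sketch below assumes rows are given as 0/1 lists and columns are numbered from 1; the helper names `row_sets` and `relation`, and the small matrix, are illustrative only.

```python
def row_sets(matrix):
    """Si for each row: the 1-based indices of the columns holding a 1."""
    return [{k for k, bit in enumerate(row, start=1) if bit == 1}
            for row in matrix]

def relation(si, sj):
    """Classify two rows as 'disjoint', 'contained' or 'overlap'."""
    if not (si & sj):
        return "disjoint"
    if si <= sj or sj <= si:
        return "contained"
    return "overlap"   # non-empty intersection, neither is a subset

# Hypothetical 4 x 4 matrix used only to exercise the three cases.
S = row_sets([[1, 1, 1, 0],    # S[0] = {1, 2, 3}
              [0, 1, 1, 0],    # S[1] = {2, 3}
              [0, 0, 1, 1],    # S[2] = {3, 4}
              [0, 0, 0, 1]])   # S[3] = {4}

print(relation(S[0], S[1]), relation(S[0], S[2]), relation(S[0], S[3]))
```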
13Dividing in components (1)
Let's initially lump together in the same
component the rows with non-empty intersection
14Dividing in components (2)
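The components we want are the connected
components of Gc, the graph with one vertex per row
and an edge (i, j) whenever Si ∩ Sj ≠ ∅ and neither
set is a subset of the other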
15Building Gc: an example
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: graph Gc with edge (l1, l2) drawn; l1 and l2 lie in component α]
16Building Gc: an example
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: graph Gc with edge (l1, l2) in component α and the new edge (l4, l5) in component γ]
17Building Gc: an example
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: graph Gc with the new edge (l6, l7); l6, l7 and l8 lie in component δ]
18Building Gc: an example
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: graph Gc with the new edge (l6, l8); l6, l7 and l8 lie in component δ]
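The construction walked through on the last four slides can be reproduced in a few lines. This sketch assumes that Gc joins two rows exactly when their column sets intersect and neither contains the other, which is what the example edges suggest; it compares all row pairs, so it does not attempt to match any stated running time. The function names are illustrative.

```python
def row_sets(matrix):
    """Column sets Si (1-based column indices) for every row."""
    return [{k for k, bit in enumerate(row, start=1) if bit} for row in matrix]

def build_gc(sets):
    """Adjacency sets of Gc: edge (i, j) if Si and Sj overlap without containment."""
    adj = {i: set() for i in range(len(sets))}
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            a, b = sets[i], sets[j]
            if a & b and not a <= b and not b <= a:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def connected_components(adj):
    """Connected components of an undirected graph, each as a sorted list."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps

# The 8 x 9 example matrix (rows l1..l8, columns c1..c9).
M = [[1, 1, 0, 1, 1, 0, 1, 0, 1],
     [0, 1, 1, 1, 1, 1, 1, 1, 1],
     [0, 1, 0, 1, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 0, 0, 0, 1]]

comps = connected_components(build_gc(row_sets(M)))
print([[f"l{i + 1}" for i in comp] for comp in comps])
```

Row l3 ends up alone because it is contained in both l1 and l2, so it has no edge in Gc.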
19Taking care of a component (1)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
The 1s of the first row must be placed
consecutively. The possible solutions can be
represented as follows:
2,7,8 2,7,8 2,7,8
l1 0 1 1 1 0
The second row is adjacent to the first one in Gc.
Hence, for the second row (l2) there are two
choices: its 1s can be placed to the left or to
the right of those of row l1. The direction does
not really matter, since the two placements are
mirror images of each other
5 2,7 2,7 8
l1 0 0 1 1 1 0
l2 0 1 1 1 0 0
20Taking care of a component (2)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
For the third row (l3) we have to consider its
relations with the rows connected to l3 by edges
Let's place l3 with respect to l2: we are not free
to choose either direction (left or right),
because of l3's relation with l1
To take into account the relation between l1 and
l3, it is necessary to consider the number of
elements in the intersections between S1, S2 and
S3
21Taking care of a component (3)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
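Definition: Let x·y = |Sx ∩ Sy| be the internal
product of rows x and y
Compare l1·l3 with min(l1·l2, l2·l3): if it is
smaller, l3 goes in the same direction in which l2
was placed with respect to l1; if it is larger,
l3 goes in the opposite direction
If we have equality, it isn't possible to have the
1s of l3 consecutive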
22Taking care of a component (4)
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 0 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
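For l3: S3 = {1,4,7,8}, l1·l3 = 2, l1·l2 = 2,
l2·l3 = 1. Since l1·l3 > min(l1·l2, l2·l3) = 1,
l3 has to be placed in the direction opposite to
the one used for l2, i.e. to the right of l2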
5 2 7 8 1,4 1,4
l1 0 0 1 1 1 0 0 0
l2 0 1 1 1 0 0 0 0
l3 0 0 0 1 1 1 1 0
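The direction test for a third row can be written down as a small function. This is a sketch of the comparison rule only, not of the full placement; the function name `place_direction`, the string labels "left"/"right" and the second pair of test sets are assumptions made for this example.

```python
def dot(sx, sy):
    """Internal product of two rows: size of the intersection of their column sets."""
    return len(sx & sy)

def place_direction(s_u, s_v, s_w, dir_v_wrt_w):
    """Direction in which the new row u goes with respect to v.

    v was already placed in direction `dir_v_wrt_w` ('left' or 'right')
    with respect to w. Returns None in the equality case, where the
    slides say the 1s of u cannot be made consecutive.
    """
    uw = dot(s_u, s_w)
    bound = min(dot(s_u, s_v), dot(s_v, s_w))
    if uw == bound:
        return None
    if uw < bound:
        return dir_v_wrt_w                                   # same direction
    return "right" if dir_v_wrt_w == "left" else "left"      # opposite direction

# Component from the slides: l2 was placed to the left of l1.
S1, S2, S3 = {2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}
print(dot(S1, S3), dot(S1, S2), dot(S2, S3))
print(place_direction(S3, S2, S1, "left"))
```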
23Taking care of a component (5)
The only choice made was in the placement of l2
with respect to l1 and both possibilities result
in the same solutions up to reversal.
24String generator
We have seen the following examples of string
generators:
2,7,8
5 2,7 8
5 2 7 8 1,4
A permutation p of the probes is compatible with
a string generator if whenever A, B, C appear in
this order in p and A and C are in a group G,
then B is also included in G
An invariant of the algorithm is that, after
considering rows 1..k, a permutation p
certifies the C1P of the submatrix on rows
1..k iff either p or its reversal is compatible
with the string generator
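One way to make the compatibility test concrete is shown below. It rests on an assumption about the notation: a string generator is read as an ordered list of column groups, where the columns inside a group may appear in any order but the groups keep their left-to-right order. The names `matches` and `compatible_up_to_reversal` are illustrative.

```python
def matches(perm, generator):
    """True if `perm` splits into consecutive chunks equal, as sets, to the groups."""
    pos = 0
    for group in generator:
        chunk = perm[pos:pos + len(group)]
        if set(chunk) != set(group):
            return False
        pos += len(group)
    return pos == len(perm)

def compatible_up_to_reversal(perm, generator):
    """True if `perm` or its reversal matches the generator."""
    perm = list(perm)
    return matches(perm, generator) or matches(perm[::-1], generator)

# Generator "5 2 7 8 1,4" from the slides, as an ordered list of groups.
generator = [{5}, {2}, {7}, {8}, {1, 4}]

print(compatible_up_to_reversal([5, 2, 7, 8, 4, 1], generator))
print(compatible_up_to_reversal([4, 1, 8, 7, 2, 5], generator))
print(compatible_up_to_reversal([2, 5, 7, 8, 1, 4], generator))
```

Only the columns that appear in the generator are considered here; columns that are 0 in every row of the component are left out.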
25Taking care of a component: a bad component
c1 c2 c3 c4 c5 c6 c7 c8
l1 0 1 1 0 0 0 1 1
l2 0 1 0 0 1 0 1 0
l3 1 0 0 1 0 0 1 1
The relations between the rows are the same as
in the preceding component
5 2,7 8, 3
l1 0 0 1 1 0
l2 0 1 1 0 0
5 2 7 8 3 1,4 1,4
l1 0 0 1 1 1 1 0 0
l2 0 1 1 1 0 0 0 0
l3 0 0 0 1 1 0 1 1
26Taking care of a component (6)
For a new row k in the same component, find two
previously placed rows i and j s.t. (k,i) and
(i,j) are edges of Gc, and proceed as for the
three-row case. Also check the consistency with
the string generator
The algorithm gives all possible permutations of
a component having the C1P, up to reversal
27Algorithm implementation
Construct Gc and traverse it using depth-first
search
When visiting a vertex invoke procedure Place
Algorithm Place
  input: u, v, w vertices of Gc(V,E) s.t. (u,v) ∈ E and (v,w) ∈ E
  output: a placement for row u, if possible
  if v = nil and w = nil then
    Place all 1s of u consecutively
  else if w = nil then
    Left- or right-place the 1s of u with respect to the 1s of v
    Record direction used
  else if u·w < min(u·v, v·w) then
    Place u with respect to v in the same direction used
    in the v, w placement
    Record direction used
  else
    Place u with respect to v in the opposite direction used
    in the v, w placement
    Record direction used
  Check consistency of column sets
If column sets are not consistent then the
component doesn't have the C1P
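The traversal that feeds Place can be sketched separately from the placement itself. The code below only produces the (u, v, w) argument triples in depth-first order; it assumes that v is the DFS parent of u and that w is any previously placed neighbour of v, which is one reading of the rule for a new row given earlier. The name `place_calls` and the string row labels are illustrative.

```python
def place_calls(adj, root):
    """Return the (u, v, w) triples with which Place would be invoked.

    u: vertex being visited; v: its DFS parent (None for the root);
    w: an already placed neighbour of v, or None if there is none yet.
    """
    calls, placed = [], []

    def visit(u, v):
        w = None
        if v is not None:
            # Any earlier row adjacent to v can serve as the reference w.
            w = next((x for x in placed if x in adj[v]), None)
        calls.append((u, v, w))
        placed.append(u)
        for x in sorted(adj[u]):
            if x not in placed:
                visit(x, u)

    visit(root, None)
    return calls

# Component δ of the example: edges (l6, l7) and (l6, l8).
adj = {"l6": {"l7", "l8"}, "l7": {"l6"}, "l8": {"l6"}}
for call in place_calls(adj, "l6"):
    print(call)
```

The recursion is fine for a sketch; a large component would call for an explicit stack.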
28Algorithm running time
For an n × m matrix, building graph Gc takes O(nm)
time
To check consistency of column sets requires O(m)
time per row and there are n rows to process
Total time is thus O(nm)
29Joining components together (1)
GM tells us how the components of M fit together
30GM for the example matrix
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: graph GM on the components α, β, γ, δ]
31Joining components together (2)
For two sets Si ∈ β, Sj ∈ α, if Si ⊆ Sj then there
is no row k ∈ α s.t. Si ⊄ Sk and Si ∩ Sk ≠ ∅
The exact same containments and disjunctions hold
for all other sets from β
GM is acyclic
32Joining components together (3)
The joining of components depends on the way sets
in one component contain or are contained in sets
from other components
Components having sets not contained anywhere
else should be processed first
Containment is specified by the directed edges in
GM
33Joining components together (4)
GM has to be processed in topological order
Remove all sources from GM (e.g. α) and make the
union of their string generators
While GM is not empty, take the next source β,
remove β from GM, and refine the current string
generator with the string generator of β
34Example (1)
c1 c2 c3 c4 c5 c6 c7 c8 c9
l1 1 1 0 1 1 0 1 0 1
l2 0 1 1 1 1 1 1 1 1
l3 0 1 0 1 1 0 1 0 1
l4 0 0 1 0 0 0 0 1 0
l5 0 0 1 0 0 1 0 0 0
l6 0 0 0 1 0 0 1 0 0
l7 0 1 0 0 0 0 1 0 0
l8 0 0 0 1 1 0 0 0 1
[Figure: graph GM on the components α, β, γ, δ]
One topological order is α, β, γ, δ
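A topological order of the components can be computed with the standard library. The sketch assumes that GM has a directed edge from component X to component Y when some set of Y is strictly contained in some set of X; the exact edge set drawn in the figure may differ (for example by omitting edges implied by others), but any such order must respect these containments. `graphlib` requires Python 3.9 or later, and the names `contained_in` and `build_gm` are illustrative.

```python
from graphlib import TopologicalSorter

# Column sets of the rows in each component of the example matrix.
components = {
    "alpha": [{1, 2, 4, 5, 7, 9}, {2, 3, 4, 5, 6, 7, 8, 9}],   # l1, l2
    "beta":  [{2, 4, 5, 7, 9}],                                # l3
    "gamma": [{3, 8}, {3, 6}],                                 # l4, l5
    "delta": [{4, 7}, {2, 7}, {4, 5, 9}],                      # l6, l7, l8
}

def contained_in(inner, outer):
    """True if some set of `inner` is a strict subset of some set of `outer`."""
    return any(s < t for s in inner for t in outer)

def build_gm(components):
    """Map each component to the set of components that must precede it."""
    preds = {name: set() for name in components}
    for a in components:
        for b in components:
            if a != b and contained_in(components[b], components[a]):
                preds[b].add(a)
    return preds

gm = build_gm(components)
order = list(TopologicalSorter(gm).static_order())
print(order)
```

`TopologicalSorter` raises `CycleError` if the graph has a cycle, which gives a cheap sanity check on the claim that GM is acyclic.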
35Example (2)
Component α:
1 2,4,5,7,9 3,6,8
l1 1 1 1 1 1 1 0 0 0
l2 0 1 1 1 1 1 1 1 1
Component β:
2,4,5,7,9
l3 1 1 1 1 1
Component δ:
9,5 4 7 2
l6 0 0 1 1 0
l7 0 0 0 1 1
l8 1 1 1 0 0
Component γ:
6 3 8
l4 0 1 1
l5 1 1 0
36Example (3)
1 2,4,5,7,9 3,6,8
l1 1 1 1 1 1 1 0 0 0
l2 0 1 1 1 1 1 1 1 1
l3 0 1 1 1 1 1 0 0 0
9,5 4 7 2
l6 0 0 1 1 0
l7 0 0 0 1 1
l8 1 1 1 0 0
37Example (4)
1 9,5 4 7 2 3,6,8
l1 1 1 1 1 1 1 0 0 0
l2 0 1 1 1 1 1 1 1 1
l3 0 1 1 1 1 1 0 0 0
l6 0 0 0 1 1 0 0 0 0
l7 0 0 0 0 1 1 0 0 0
l8 0 1 1 1 0 0 0 0 0
6 3 8
l4 0 1 1
l5 1 1 0
38Example (5)
1 9,5 4 7 2 6 3 8
l1 1 1 1 1 1 1 0 0 0
l2 0 1 1 1 1 1 1 1 1
l3 0 1 1 1 1 1 0 0 0
l6 0 0 0 1 1 0 0 0 0
l7 0 0 0 0 1 1 0 0 0
l8 0 1 1 1 0 0 0 0 0
l4 0 0 0 0 0 0 0 1 1
l5 0 0 0 0 0 0 1 1 0
In this particular case there are two solutions
corresponding to the permutation of identical
columns (5 and 9)
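The final column order can be verified directly against the original matrix. This is a minimal check, assuming columns are numbered from 1 as on the slides; `rows_consecutive` is a helper written for this example.

```python
def rows_consecutive(matrix, order):
    """True if every row's 1s are consecutive when columns follow `order` (1-based)."""
    for row in matrix:
        ones = [pos for pos, col in enumerate(order) if row[col - 1] == 1]
        if ones and ones[-1] - ones[0] + 1 != len(ones):
            return False
    return True

# The 8 x 9 example matrix (rows l1..l8, columns c1..c9).
M = [[1, 1, 0, 1, 1, 0, 1, 0, 1],
     [0, 1, 1, 1, 1, 1, 1, 1, 1],
     [0, 1, 0, 1, 1, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1, 0, 0, 0],
     [0, 0, 0, 1, 0, 0, 1, 0, 0],
     [0, 1, 0, 0, 0, 0, 1, 0, 0],
     [0, 0, 0, 1, 1, 0, 0, 0, 1]]

solution = [1, 9, 5, 4, 7, 2, 6, 3, 8]   # order shown on the slide
swapped = [1, 5, 9, 4, 7, 2, 6, 3, 8]    # identical columns 5 and 9 exchanged

print(rows_consecutive(M, solution))
print(rows_consecutive(M, swapped))
print(rows_consecutive(M, list(range(1, 10))))   # the original column order
```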
39Algorithm solution is not unique
In general, multiple solutions may exist because:
- Each component may on its own have several
solutions
- Each solution can be used in two ways: the
permutation and its reversal
40Algorithm running time
Topological sorting of GM takes time O(nm)
If the entries of M are preprocessed the queries
needed for traversing GM can take constant time
Preprocessing takes at most O(nm)
Total time for processing each component ci is
O(ni·m), where ni is the number of rows of ci
Algorithm running time is O(nm)
41Concluding remarks (1)
Even if a C1P permutation exists, this is not
necessarily the true permutation
- The solution is not unique
- In general errors do exist, so the true
permutation is not the C1P one
42Concluding remarks (2)
Generalizations to account for errors yield
NP-hard problems
Also relaxing the assumption of unique probes
yields NP-hard problems
43Related works
A considerably more complicated algorithm by
Booth and Lueker (1976) exists that takes
O(n+m+r) time (r is the total number of 1s)
Quite recently a simple O(n+m+r)-time algorithm
has been presented by Hsu: J. Algorithms 43
(2002), no. 1, 1-16