Physical Mapping - PowerPoint PPT Presentation

Slides: 38
Provided by: lu8380

Transcript and Presenter's Notes

Title: Physical Mapping


1
  • Physical Mapping
  • --An Algorithm and An Approximation for
    Hybridization Mapping
  • Shi Chen
  • CSE497
  • 04Mar2004

2
Introduction
  • Why physical mapping?
  • -Physical mapping is a central problem in molecular
    biology.
  • -DNA is cut into small fragments (clones) for replication
    and study, and information on their ordering is
    lost.
  • -The goal of physical mapping is to reconstruct
    the relative ordering of the clones.

3
Introduction
  • Two popular ways of obtaining fingerprints:
  • Restriction site analysis.
  • Measure the lengths of a clone's fragments; these
    lengths are its fingerprint.
  • Hybridization.
  • Check whether a small sequence, known as a probe,
    binds (hybridizes) to the clone, which is a DNA
    fragment.
  • Most often a probe is an STS (sequence-tagged
    site), a DNA string of 200-300 bp whose ends
    occur only once in the entire genome.

4
Models for Hybridization Mapping
  • -Interval Graph Models
  • Vertices represent clones and edges represent
    overlap information between clones.
  • -Disadvantage: the resulting problems are NP-hard.

5
Models for Hybridization Mapping - C1P definition
  • Definition
  • A binary matrix is said to have the
    consecutive ones property (C1P)
    if a permutation of its columns can be found such
    that all 1s in each row are consecutive.

Example matrix:
   A B C D
1  1 0 0 1
2  0 1 0 1
3  1 0 1 0
The same matrix with its columns permuted to C, A, D, B:
   C A D B
1  0 1 1 0
2  0 0 1 1
3  1 1 0 0
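The definition can be tried out directly on the small example. The following is only a brute-force sketch (it tries every column order, so it is exponential and is not the polynomial algorithm presented later); the function names are made up for illustration.

```python
from itertools import permutations

def ones_are_consecutive(row):
    """Return True if the 1s in a 0/1 sequence form one contiguous block."""
    positions = [i for i, v in enumerate(row) if v == 1]
    return not positions or positions[-1] - positions[0] + 1 == len(positions)

def find_c1p_order(matrix):
    """Try every column order; return the first one under which every row
    has consecutive 1s, or None if no such order exists."""
    n_cols = len(matrix[0])
    for order in permutations(range(n_cols)):
        if all(ones_are_consecutive([row[c] for c in order]) for row in matrix):
            return order
    return None

# Example matrix from the slide; columns are A, B, C, D.
names = "ABCD"
M = [[1, 0, 0, 1],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
order = find_c1p_order(M)
print([names[c] for c in order])

# The order shown on the slide (C, A, D, B) is also a valid answer.
print(all(ones_are_consecutive([row[c] for c in (2, 0, 3, 1)]) for row in M))
```

A matrix may have several valid column orders; the sketch simply returns the first one it finds.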
6
Models for Hybridization Mapping - C1P
  • Assumptions for the Consecutive Ones Property (C1P)
    model
  • a. Probes are unique: a probe can bind to a
    clone in at most one place (use STSs, sequence
    tagged sites).
  • b. No errors (a C1P permutation exists).
  • c. All clones × probes hybridization experiments
    have been done (difficult to achieve).
  • Advantage
  • Polynomial-time solvable.

7
Models for Hybridization Mapping - C1P model
  • n clones and m probes
  • n × m binary matrix M built from experimental
    data
  • M_ij = 1: probe j hybridized to clone i
  • M_ij = 0: probe j did not hybridize to clone i

    c1 c2 c3 c4 c5 c6 c7 c8 c9
l1   1  1  0  1  1  0  1  0  1
l2   0  1  1  1  1  1  1  1  1
l3   0  1  0  1  1  0  1  0  1
l4   0  0  1  0  0  0  0  1  0
l5   0  0  1  0  0  1  0  0  0
l6   0  0  0  1  0  0  1  0  0
l7   0  1  0  0  0  0  1  0  0
l8   0  0  0  1  1  0  0  0  1
8
Algorithm for C1P - Introduction
  • Goal: find a permutation of the columns such
    that in each row all 1s are consecutive.
  • Assumptions:
  • All rows are different, i.e. no two clones have
    the same fingerprint.
  • No row is all zeros, i.e. every clone is
    hybridized by at least one probe.

9
Algorithm for C1P - Algorithm sketch
  • Separate the rows into components (subsets
    of rows).
  • Permute the columns of each component.
  • Join the components together.

10
Algorithm for C1P - Row relations
Definition: ∀ row i ∈ M, S_i = {columns k : M_i,k = 1}
  • Given two rows i and j, one of the following holds:
  • S_i ∩ S_j = ∅, or
  • S_i ⊆ S_j or S_j ⊆ S_i, or
  • S_i ∩ S_j ≠ ∅ and neither is a subset of the other.
  • First case: i and j have no conflicts - they can
    be dealt with separately.
  • Second case: i and j are compatible - any
    solution for the row with
    fewer 1s is acceptable.
  • Third case: i and j have to be treated
    simultaneously - they are
    connected.
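The three cases translate almost directly into set operations. Below is a minimal sketch, assuming that components are the connected components of the graph that joins two rows whenever they fall into the third case. The function names are hypothetical, and the naive pairwise comparison makes no attempt to be efficient. The data is the 8 × 9 matrix of Table 5.1.

```python
def column_set(row):
    """S_i: the set of (1-based) columns where the row has a 1."""
    return {k + 1 for k, v in enumerate(row) if v == 1}

def relation(si, sj):
    """Classify two column sets into the three cases on the slide."""
    if not (si & sj):
        return "disjoint"      # first case: no conflicts
    if si <= sj or sj <= si:
        return "contained"     # second case: compatible
    return "connected"         # third case: treat simultaneously

def components(matrix):
    """Group rows into components with a simple depth-first search over
    the 'connected' relation. Rows are reported with 1-based numbers."""
    sets = [column_set(r) for r in matrix]
    n = len(sets)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i + 1)
            for j in range(n):
                if j not in seen and relation(sets[i], sets[j]) == "connected":
                    seen.add(j)
                    stack.append(j)
        comps.append(sorted(comp))
    return comps

TABLE_5_1 = [
    [1, 1, 0, 1, 1, 0, 1, 0, 1],  # l1
    [0, 1, 1, 1, 1, 1, 1, 1, 1],  # l2
    [0, 1, 0, 1, 1, 0, 1, 0, 1],  # l3
    [0, 0, 1, 0, 0, 0, 0, 1, 0],  # l4
    [0, 0, 1, 0, 0, 1, 0, 0, 0],  # l5
    [0, 0, 0, 1, 0, 0, 1, 0, 0],  # l6
    [0, 1, 0, 0, 0, 0, 1, 0, 0],  # l7
    [0, 0, 0, 1, 1, 0, 0, 0, 1],  # l8
]
print(components(TABLE_5_1))  # [[1, 2], [3], [4, 5], [6, 7, 8]]
```

The four groups correspond to the components {l1, l2}, {l3}, {l4, l5} and {l6, l7, l8} that are handled one at a time on the following slides.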

11
Algorithm for C1P - Taking care of a component
TABLE 5.1 A binary matrix.
    c1 c2 c3 c4 c5 c6 c7 c8 c9
l1   1  1  0  1  1  0  1  0  1
l2   0  1  1  1  1  1  1  1  1
l3   0  1  0  1  1  0  1  0  1
l4   0  0  1  0  0  0  0  1  0
l5   0  0  1  0  0  1  0  0  0
l6   0  0  0  1  0  0  1  0  0
l7   0  1  0  0  0  0  1  0  0
l8   0  0  0  1  1  0  0  0  1
Figure 5.7 Graph Gc corresponding to the matrix
of Table 5.1.
[Figure: vertices l1 to l8 grouped into the components α = {l1, l2}, β = {l3}, γ = {l4, l5}, δ = {l6, l7, l8}.]
12
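Algorithm for C1P - Example Matrix
A section of a binary matrix
    c1 c2 c3 c4 c5 c6 c7 c8
l1   0  1  0  0  0  0  1  1
l2   0  1  0  0  1  0  1  0
l3   1  0  0  1  0  0  1  1
A set in braces labels that many adjacent columns whose internal order is not yet fixed.
Placing l1 (columns: ... | {2,7,8} | ...):
l1 →  0 | 1 1 1 | 0
Placing l2 with column 5 on the left (columns: ... | 5 | {2,7} | 8 | ...):
l1 →  0 | 0 | 1 1 | 1 | 0
l2 →  0 | 1 | 1 1 | 0 | 0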
13
Algorithm for C1P - Example Matrix
    c1 c2 c3 c4 c5 c6 c7 c8
l1   0  1  0  0  0  0  1  1
l2   0  1  0  0  1  0  1  0
l3   1  0  0  1  0  0  1  1
What will happen if we place 5 on the right?
Columns: ... | 8 | {7,2} | 5 | ...
l1 →  0 | 1 | 1 1 | 0 | 0
l2 →  0 | 0 | 1 1 | 1 | 0
14
Algorithm for C1P - Example Matrix
    c1 c2 c3 c4 c5 c6 c7 c8
l1   0  1  0  0  0  0  1  1
l2   0  1  0  0  1  0  1  0
l3   1  0  0  1  0  0  1  1
  • How to place l3?
  • Consider the number of elements in the
    intersections between S_1, S_2 and S_3.
  • Definition: let x·y = |S_x ∩ S_y| be the internal
    product of rows x and y.
  • -If l1·l3 < min(l1·l2, l2·l3), place l3 in the
    same direction that l2 was placed with respect to
    l1.
  • -If l1·l3 > min(l1·l2, l2·l3), place l3 in the
    opposite direction to the one in which l2 was placed with
    respect to l1.

15
Algorithm for C1P - Example Matrix
    c1 c2 c3 c4 c5 c6 c7 c8
l1   0  1  0  0  0  0  1  1
l2   0  1  0  0  1  0  1  0
l3   1  0  0  1  0  0  1  1
  • In our case
  • S_3 = {1,4,7,8}. Then
  • l1·l3 = 2, l1·l2 = 2, l3·l2 = 1.
  • So, place l3 to the right of l2.

Resulting layout (columns: ... | 5 | 2 | 7 | 8 | {1,4} | ...):
l1 →  0 | 0 | 1 | 1 | 1 | 0 0 | 0
l2 →  0 | 1 | 1 | 1 | 0 | 0 0 | 0
l3 →  0 | 0 | 0 | 1 | 1 | 1 1 | 0
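The placement rule for l3 can be checked numerically. This is a small sketch with hypothetical function names. The slides only state the two strict cases, so treating a tie as "opposite" is an assumption made here.

```python
def internal_product(sx, sy):
    """x·y = |S_x ∩ S_y|, the number of columns the two rows share."""
    return len(sx & sy)

def direction_for_third_row(s1, s2, s3):
    """Return 'same' or 'opposite': where l3 goes relative to the direction
    in which l2 was placed with respect to l1.
    Assumption: equality is treated as 'opposite'."""
    threshold = min(internal_product(s1, s2), internal_product(s2, s3))
    if internal_product(s1, s3) < threshold:
        return "same"
    return "opposite"

# Column sets of the three example rows.
S1, S2, S3 = {2, 7, 8}, {2, 5, 7}, {1, 4, 7, 8}
print(internal_product(S1, S3), internal_product(S1, S2), internal_product(S3, S2))  # 2 2 1
print(direction_for_third_row(S1, S2, S3))  # opposite

# l2 was placed to the left of l1, so 'opposite' puts l3 on the right.
# Check that the resulting column order keeps every row's 1s together.
layout = [5, 2, 7, 8, 1, 4]
layout_ok = True
for s in (S1, S2, S3):
    pos = sorted(layout.index(c) for c in s)
    layout_ok = layout_ok and (pos[-1] - pos[0] + 1 == len(pos))
print(layout_ok)  # True
```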
16
Algorithm for C1P - Complexity
  • Building Graph Gc takes O(nm) time.
  • Process n rows, spending O(m) per row to check
    consistency of column sets.
  • Total time is O(nm).

17
Algorithm for C1P - Joining Components Together
TABLE 5.1 A binary matrix.
    c1 c2 c3 c4 c5 c6 c7 c8 c9
l1   1  1  0  1  1  0  1  0  1
l2   0  1  1  1  1  1  1  1  1
l3   0  1  0  1  1  0  1  0  1
l4   0  0  1  0  0  0  0  1  0
l5   0  0  1  0  0  1  0  0  0
l6   0  0  0  1  0  0  1  0  0
l7   0  1  0  0  0  0  1  0  0
l8   0  0  0  1  1  0  0  0  1
Figure 5.9 Graph GM corresponding to the
components of the matrix from Table 5.1.
[Figure: one vertex per component: α, β, γ, δ.]
18
Algorithm for C1P - Joining Components Together
  • Process GM in topological order:
  • -First process components whose sets are
    not contained anywhere else.
  • -Suppose we are following edge (α, β); find a reference
    column in component α that will tell us how to
    place the rows of β.
  • a. Choose the row l from β that has the leftmost 1,
    and call the column where this 1 is c_β.
  • b. Find all rows from α that contain S_l, and
    find the leftmost column where all such rows have
    1s; this column, c_α, is the reference column.

19
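Algorithm for C1P - Joining Components Together
Component α (columns: 1 | {2,4,5,7,9} | {3,6,8}):
l1 →  1 | 1 1 1 1 1 | 0 0 0
l2 →  0 | 1 1 1 1 1 | 1 1 1
Component β (columns: {2,4,5,7,9}):
l3 →  1 1 1 1 1
After joining β to α (columns: 1 | {2,4,5,7,9} | {3,6,8}):
l1 →  1 | 1 1 1 1 1 | 0 0 0
l2 →  0 | 1 1 1 1 1 | 1 1 1
l3 →  0 | 1 1 1 1 1 | 0 0 0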
20
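Algorithm for C1P - Joining Components Together
Component δ (columns: {9,5} | 4 | 7 | 2):
l6 →  0 0 | 1 | 1 | 0
l7 →  0 0 | 0 | 1 | 1
l8 →  1 1 | 1 | 0 | 0
After joining δ (columns: 1 | {9,5} | 4 | 7 | 2 | {3,6,8}):
l1 →  1 | 1 1 | 1 | 1 | 1 | 0 0 0
l2 →  0 | 1 1 | 1 | 1 | 1 | 1 1 1
l3 →  0 | 1 1 | 1 | 1 | 1 | 0 0 0
l6 →  0 | 0 0 | 1 | 1 | 0 | 0 0 0
l7 →  0 | 0 0 | 0 | 1 | 1 | 0 0 0
l8 →  0 | 1 1 | 1 | 0 | 0 | 0 0 0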
21
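Algorithm for C1P - Joining Components Together
Component γ (columns: 6 | 3 | 8):
l4 →  0 | 1 | 1
l5 →  1 | 1 | 0
After joining γ (columns: 1 | {9,5} | 4 | 7 | 2 | 6 | 3 | 8):
l1 →  1 | 1 1 | 1 | 1 | 1 | 0 | 0 | 0
l2 →  0 | 1 1 | 1 | 1 | 1 | 1 | 1 | 1
l3 →  0 | 1 1 | 1 | 1 | 1 | 0 | 0 | 0
l6 →  0 | 0 0 | 1 | 1 | 0 | 0 | 0 | 0
l7 →  0 | 0 0 | 0 | 1 | 1 | 0 | 0 | 0
l8 →  0 | 1 1 | 1 | 0 | 0 | 0 | 0 | 0
l4 →  0 | 0 0 | 0 | 0 | 0 | 0 | 1 | 1
l5 →  0 | 0 0 | 0 | 0 | 0 | 1 | 1 | 0
[Figure: component labels α, β, δ, γ.]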
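The final column order can be verified against Table 5.1. The following is a small checking sketch (the function name is made up); it only confirms a given order and does not perform the joining procedure itself. Columns 9 and 5 are interchangeable, so 9, 5 is one of the two possible choices.

```python
def has_consecutive_ones(matrix, column_order):
    """Check that, with columns read in `column_order` (1-based labels),
    the 1s of every row are contiguous."""
    for row in matrix:
        positions = [p for p, c in enumerate(column_order) if row[c - 1] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True

TABLE_5_1 = [
    [1, 1, 0, 1, 1, 0, 1, 0, 1],  # l1
    [0, 1, 1, 1, 1, 1, 1, 1, 1],  # l2
    [0, 1, 0, 1, 1, 0, 1, 0, 1],  # l3
    [0, 0, 1, 0, 0, 0, 0, 1, 0],  # l4
    [0, 0, 1, 0, 0, 1, 0, 0, 0],  # l5
    [0, 0, 0, 1, 0, 0, 1, 0, 0],  # l6
    [0, 1, 0, 0, 0, 0, 1, 0, 0],  # l7
    [0, 0, 0, 1, 1, 0, 0, 0, 1],  # l8
]

final_order = [1, 9, 5, 4, 7, 2, 6, 3, 8]
print(has_consecutive_ones(TABLE_5_1, final_order))          # True
print(has_consecutive_ones(TABLE_5_1, list(range(1, 10))))   # False: original order
```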
22
Algorithm for C1P - Joining Components Together
  • Complexity
  • Topological sorting: O(nm).
  • Preprocessing takes at most O(nm),
    e.g. storing for each row the column where its
    leftmost 1 is.
  • Total time: O(nm).

23
Approximation for Hybridization Mapping with
Errors
  • Example row: 0 1 1 0 1 1 1 1 0 0
  • A false negative (the 0 between the 1s)
    separates two blocks of 1s, creating another gap.
  • Approach: find a permutation where the total
    number of gaps in the matrix is minimum.

24
Approximation - Graph Model
Gap minimization is equivalent to solving a
traveling salesman problem (TSP).
TABLE 5.3 A clones × probes matrix with added
column p6.
    p1 p2 p3 p4 p5 p6
c1   1  1  1  0  0  0
c2   0  1  1  1  0  0
c3   1  0  0  1  1  0
c4   1  1  1  1  0  0
25
Approximation - Graph Model
    p1 p2 p3 p4 p5 p6
c1   1  1  1  0  0  0
c2   0  1  1  1  0  0
c3   1  0  0  1  1  0
c4   1  1  1  1  0  0
The weight on each edge of G is the number of
rows where the two corresponding columns differ.
FIGURE 5.10 TSP graph for matrix of Table 5.3.
[Figure: complete graph on the vertices p1 to p6 with a weight on each edge.]
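The edge weights of the TSP graph follow mechanically from the matrix. The sketch below (function names are illustrative) computes, for every pair of columns of Table 5.3, the number of rows in which they differ.

```python
def column_distance(matrix, a, b):
    """Number of rows in which columns a and b (0-based) differ."""
    return sum(1 for row in matrix if row[a] != row[b])

def tsp_weights(matrix):
    """Weight of every edge of the complete graph on the columns."""
    m = len(matrix[0])
    return {(a, b): column_distance(matrix, a, b)
            for a in range(m) for b in range(a + 1, m)}

TABLE_5_3 = [
    [1, 1, 1, 0, 0, 0],  # c1
    [0, 1, 1, 1, 0, 0],  # c2
    [1, 0, 0, 1, 1, 0],  # c3
    [1, 1, 1, 1, 0, 0],  # c4
]
weights = tsp_weights(TABLE_5_3)
for (a, b), w in sorted(weights.items()):
    print(f"p{a + 1}-p{b + 1}: {w}")
```

For instance, p2 and p3 are identical columns, so their edge has weight 0, while p2 and p5 differ in all four rows.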
26
Approximation - Graph Model
  • A gap: a transition from 1 to 0 and, further on, a
    transition from 0 to 1.
  • -There are two transitions for each gap, so each gap
    contributes 2 to the weight of the cycle.
  • Extremal transitions: transitions between
    elements in an extremal (1 or m) column.
  • -Include an extra column of zeros as column
    m+1 to ensure every row has a pair of extremal
    transitions. This prevents consecutive 1s from wrapping
    around in each row.

27
Approximation - Graph Model
  • -Relationship between cycles and permutations:
  • Cycle weight = number of gap transitions + 2n
  • For a given n, minimizing cycle weight is the
    same as minimizing the number of gaps.
  • -Drawback: one or a few rows may have many gaps,
    while others may have none. This would mean that one clone was subject
    to many more errors than other clones, which
    contradicts laboratory experience.
  • -Solution: minimize the number of gaps per row.
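The relation between cycle weight and gaps can be checked by brute force on the 4 × 5 clones × probes matrix. This is a sketch under two assumptions: each gap counts as two gap transitions, and no row is all zeros. The function names are made up.

```python
from itertools import permutations

def cycle_weight(matrix, order):
    """Weight of the TSP cycle that visits the columns in `order`, then the
    extra all-zero column, and returns to the first column."""
    total = 0
    for row in matrix:
        seq = [row[c] for c in order] + [0]   # append the zero column
        total += sum(1 for i in range(len(seq))
                     if seq[i] != seq[(i + 1) % len(seq)])
    return total

def total_gaps(matrix, order):
    """Total number of gaps over all rows for the given column order."""
    gaps = 0
    for row in matrix:
        seq = [row[c] for c in order]
        blocks = sum(1 for i, v in enumerate(seq)
                     if v == 1 and (i == 0 or seq[i - 1] == 0))
        gaps += max(blocks - 1, 0)
    return gaps

# Clones × probes matrix without the added zero column.
CLONES_PROBES = [
    [1, 1, 1, 0, 0],  # c1
    [0, 1, 1, 1, 0],  # c2
    [1, 0, 0, 1, 1],  # c3
    [1, 1, 1, 1, 0],  # c4
]
n = len(CLONES_PROBES)
identity = (0, 1, 2, 3, 4)
print(cycle_weight(CLONES_PROBES, identity))             # 10
print(2 * total_gaps(CLONES_PROBES, identity) + 2 * n)   # 10

# The same equality holds for every column order of this matrix.
relation_holds = all(
    cycle_weight(CLONES_PROBES, p) == 2 * total_gaps(CLONES_PROBES, p) + 2 * n
    for p in permutations(range(5))
)
print(relation_holds)  # True
```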

28
Approximation - Guarantee
-Assumptions:
a. The number of probes is sufficiently large.
b. The mapping process obeys a certain mathematical model.
-Features:
a. Each clone's position is an independent random
variable; clone positions are distributed
uniformly over [0, N-1].
b. Occurrences of a given probe obey a Poisson process with rate λ:
Pr[a given probe occurs k times in a given
clone] = e^(-λ) λ^k / k!.
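The Poisson formula can be evaluated directly. In the sketch below the rate value 0.5 is a hypothetical choice made only for illustration, and the function name is not from the slides.

```python
import math

def prob_k_occurrences(k, lam):
    """Poisson probability e^(-lam) * lam^k / k! that a given probe
    occurs k times in a given clone."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 0.5  # hypothetical rate, for illustration only
for k in range(4):
    print(k, round(prob_k_occurrences(k, lam), 4))

# The probabilities over all k sum to 1 (up to rounding).
total_probability = sum(prob_k_occurrences(k, lam) for k in range(50))
print(round(total_probability, 6))
```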
29
Approximation - Guarantee
The TSP permutation is a good approximation to the
true permutation. This is proved in terms of graph
weights, or clone distances.
t_ij = |l_j - l_i| + |r_j - r_i| = 2|l_j - l_i|
t_ij: true distance between clones i and j; l (left) and
r (right) are the clones' coordinates.
h_ij: Hamming distance between clones i and j.
Given any four clones i, j, r, and s:
h_ij < h_rs implies t_ij < t_rs, and
t_ij < t_rs implies h_ij < h_rs.
30
Approximation - Computational Practice
  • Define hybridization graph H as a bipartite graph
    (U, V, E)
  • Clones are the vertices of the U partition
  • Probes are the vertices of the V partition
  • There is an edge between two vertices if the
    corresponding probe hybridized to the
    corresponding clone.

31
Approximation - Computational Practice
TABLE 5.3 A clones × probes matrix with added
column p6.
    p1 p2 p3 p4 p5 p6
c1   1  1  1  0  0  0
c2   0  1  1  1  0  0
c3   1  0  0  1  1  0
c4   1  1  1  1  0  0
FIGURE 5.11 Hybridization graph H corresponding
to hybridization matrix from Table 5.3, without
the added column.
[Figure: bipartite graph with probe vertices p1 to p5 and clone vertices c1 to c4.]
32
Approximation - Computational Practice
  • Observations
  • a. H may not be connected; then we are not able to tell
    the relative order between probes that belong to
    different components.
  • b. A connected component may be as simple as a
    singleton vertex: a probe with no hybridization (an all-0 column).
  • c. There may be redundant probes, i.e. probes that hybridize to
    exactly the same set of clones (same 1s and 0s
    in their columns).
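These three observations can be checked mechanically on a hybridization matrix. The sketch below uses hypothetical function names. It builds H from the matrix of Table 5.3, keeping the added zero column p6 so that a singleton component shows up, then lists the connected components and the groups of redundant probes.

```python
from collections import defaultdict

def hybridization_components(matrix):
    """Connected components of the bipartite graph H. Vertices are named
    'c1', 'c2', ... (clones, rows) and 'p1', 'p2', ... (probes, columns)."""
    n, m = len(matrix), len(matrix[0])
    adj = defaultdict(set)
    for i in range(n):
        for j in range(m):
            if matrix[i][j] == 1:          # probe j hybridized to clone i
                adj[f"c{i + 1}"].add(f"p{j + 1}")
                adj[f"p{j + 1}"].add(f"c{i + 1}")
    vertices = [f"c{i + 1}" for i in range(n)] + [f"p{j + 1}" for j in range(m)]
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        seen.add(v)
        stack, comp = [v], []
        while stack:                        # depth-first search
            u = stack.pop()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

def redundant_probes(matrix):
    """Groups of probes whose columns are identical."""
    groups = defaultdict(list)
    for j in range(len(matrix[0])):
        column = tuple(row[j] for row in matrix)
        groups[column].append(f"p{j + 1}")
    return [g for g in groups.values() if len(g) > 1]

TABLE_5_3 = [
    [1, 1, 1, 0, 0, 0],  # c1
    [0, 1, 1, 1, 0, 0],  # c2
    [1, 0, 0, 1, 1, 0],  # c3
    [1, 1, 1, 1, 0, 0],  # c4
]
print(hybridization_components(TABLE_5_3))
print(redundant_probes(TABLE_5_3))  # [['p2', 'p3']]
```

Here all clones and the probes p1 to p5 end up in one component, p6 forms a singleton, and p2 and p3 are redundant.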

33
Approximation - Computational Practice
  • Evaluation of a mapping algorithm is a difficult
    task. The fraction of strong adjacencies is used
    to measure a mapping algorithm.
  • -Strong adjacencies: let b be the number of blocks of
    consecutive 1s present in a hybridization matrix
    with a given probe permutation π = p_1, p_2, ..., p_m.
  • -Translocations: operations that reverse the
    order of a set of consecutive probes.
  • Two adjacent probes p_i and p_{i+1} represent a
    strong adjacency if placing these probes apart by
    any translocation increases b in each row.

34
Approximation - Computational Practice
Strong adjacency cost = 100 · (1/(m-1)) · Σ_i d_i, where d_i = 1 if
p_i and p_{i+1} form a strong adjacency in the true
permutation but these probes are not adjacent in
the proposed permutation, and d_i = 0 otherwise.
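The cost formula can be sketched as follows. Deciding which adjacencies are strong is not attempted here: the list `strong` is taken as given input, which is an assumption of this example, and the function name and probe labels are illustrative.

```python
def strong_adjacency_cost(true_perm, strong, proposed_perm):
    """100 * (1/(m-1)) * sum of d_i.
    `strong[i]` says whether (true_perm[i], true_perm[i+1]) is a strong
    adjacency in the true permutation (assumed to be known)."""
    m = len(true_perm)
    pos = {p: k for k, p in enumerate(proposed_perm)}
    d_sum = 0
    for i in range(m - 1):
        adjacent = abs(pos[true_perm[i]] - pos[true_perm[i + 1]]) == 1
        if strong[i] and not adjacent:
            d_sum += 1                      # d_i = 1
    return 100.0 * d_sum / (m - 1)

true_perm = ["p1", "p2", "p3", "p4", "p5"]
strong = [True, True, True, True]           # hypothetical: all adjacencies strong
proposed = ["p1", "p2", "p3", "p5", "p4"]   # p3 and p4 are no longer adjacent
print(strong_adjacency_cost(true_perm, strong, proposed))   # 25.0
print(strong_adjacency_cost(true_perm, strong, true_perm))  # 0.0
```

A reversed permutation keeps every pair adjacent, so it also has cost 0 under this measure.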

35
Approximation - Computational Practice
TABLE 5.4 Strong adjacency costs for two
algorithms on matrices with different kinds of
errors. Error rates are indicated in the heading
of each column (only one type of error per
column). Coverage in all cases is 10, where
coverage is the ratio between the total length
of all clones and the target DNA length.
             C1P    Chimerism   False positives   False negatives
             (0)    (0.5)       (0.04)            (0.32)
Greedy TSP   1.9    0.9         16.0              28.3
Random       86.4   89.7        94.4              94.9
36
REFERENCES
1. Sections 5.3 and 5.4 in our textbook: Introduction to Computational
   Molecular Biology, Setubal/Meidanis, 1997.
2. On the Complexity of DNA Physical Mapping, Martin
   Charles Golumbic, Haim Kaplan and Ron Shamir,
   Advances in Applied Mathematics 15, 251-261
   (1994).