Title: Engineering a Scalable Placement Heuristic for DNA Probe Arrays
- A.B. Kahng, I.I. Mandoiu, P. Pevzner,
- S. Reda (all UCSD), A. Zelikovsky (GSU)
2 Outline
- DNA probe arrays and unwanted illumination
- Synchronous array design (2-D placement)
- Asynchronous array design (3-D placement)
- Experimental results
- Extensions
- Conclusions
4 DNA Probe Arrays
- Used in a wide range of genomic analyses
- Gene expression monitoring, SNP mapping, sequencing by hybridization
- Arrays with up to 1000x1000 probes in commercial use, 10^8 probes envisioned for next-generation arrays
- Highly scalable algorithms required for array design
5 Simplified DNA Array Flow
[Flow diagram]
- Input: gene sequences, position of SNPs, etc.
- Stages: Probe Selection -> Mask Design (Placement, Embedding) -> Mask Manufacturing -> Array Manufacturing -> Hybridization Experiment -> Analysis of Hybridization Intensities
- Domain labels: Soft/Computational Domain; Hard/Biochemistry Domain
6 Array Manufacturing Process
- Very Large-Scale Immobilized Polymer Synthesis:
- Step 1: Treat substrate with chemically protected linker molecules, creating a rectangular array
- Site size approx. 10x10 microns
- Step 2: Selectively expose array sites to light
- Light deprotects exposed molecules, activating further synthesis
- Step 3: Flush chip surface with solution of protected A, C, G, T
- Binding occurs at previously deprotected sites
- Step 4: Repeat steps 2-3 until desired probes are synthesized
7 Photo-Deprotection Step
Our concern: diffraction -> unwanted illumination -> yield decrease
8 Probe Synthesis
9 Measuring Unwanted Illumination
Unwanted illumination is measured by border length
10 Synchronous vs. Asynchronous Synthesis
(a) periodic deposition sequence
(b) Synchronous embedding of CTG
(c) Asynchronous leftmost embedding of CTG
(d) Another asynchronous embedding
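The difference between panels (b) and (c) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a periodic deposition sequence with period ACTG, represents an embedding as the list of 0-based deposition steps at which the probe receives a nucleotide, and uses illustrative function names.

```python
PERIOD = "ACTG"  # assumed deposition period; any ordering of A, C, G, T works


def synchronous_embedding(probe, period=PERIOD):
    """The i-th nucleotide of the probe is deposited during the i-th period."""
    return [i * len(period) + period.index(base) for i, base in enumerate(probe)]


def leftmost_embedding(probe, deposition):
    """Each nucleotide takes the earliest deposition step still available."""
    steps, start = [], 0
    for base in probe:
        step = deposition.index(base, start)  # raises ValueError if no step fits
        steps.append(step)
        start = step + 1
    return steps


deposition = PERIOD * 3  # (ACTG)^3 for a probe of length 3
sync = synchronous_embedding("CTG")
left = leftmost_embedding("CTG", deposition)
```

For CTG the synchronous embedding uses one step per period, while the leftmost embedding packs the whole probe into the first period. Any other increasing choice of matching steps, as in panel (d), is also a valid asynchronous embedding.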
12 Problem Formulation (Synchronous Case)
- Synchronous Array Design (2-D Placement) Problem
- Minimize placement cost of Hamming graph H
- (vertices = probes, distance = Hamming)
- On 2-dimensional grid graph G2 (N x N array, edges b/w distance-1 neighbors)
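As a rough illustration of this objective, the placement cost can be evaluated by summing Hamming distances over all pairs of adjacent sites. The sketch below assumes probes are equal-length strings stored in a list of rows; the names `hamming` and `placement_cost` are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def placement_cost(grid):
    """Sum of Hamming distances over horizontally and vertically adjacent sites."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:                      # right neighbor
                cost += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:                      # neighbor below
                cost += hamming(grid[r][c], grid[r + 1][c])
    return cost
```

Each grid edge is counted once here. A convention that counts every conflict in both directions would simply double the value.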
13 2-D Placement Lower Bound
- Sum of Hamming distances to 4 closest neighbors minus weight of 4N heaviest arcs
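One way to read this bound is sketched below. It is an interpretation under stated assumptions rather than the authors' implementation: every probe contributes directed arcs to its 4 closest probes, and the 4N heaviest arcs are dropped because an N x N grid has 4N fewer directed neighbor arcs than 4 per site. Since arcs are directed, the value bounds the placement cost with each adjacent pair counted once in each direction.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def lower_bound(probes, n):
    """Directed-arc lower bound for placing n*n probes on an n x n array."""
    arcs = []
    for i, p in enumerate(probes):
        dists = sorted(hamming(p, q) for j, q in enumerate(probes) if j != i)
        arcs.extend(dists[:4])           # arcs from p to its 4 closest probes
    arcs.sort()
    keep = max(0, len(arcs) - 4 * n)     # drop the 4N heaviest arcs
    return sum(arcs[:keep])
```

The brute-force scan over all probe pairs is quadratic in the number of probes, so this form only suits small instances.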
14 TSP + 1-Threading Placement
- Hubbell 90s:
- Find TSP tour/path over given probes w.r.t. Hamming distance
- Thread TSP path in the grid row by row
- Hannenhalli, Hubbell, Lipshutz, Pevzner 02:
- Place the probes according to 1-Threading
- Further decreases total border by 20%
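The row-by-row variant can be sketched as follows. Two assumptions are made: a nearest-neighbor path stands in for a real TSP heuristic, and "row by row" is read as reversing every other row so that consecutive probes on the path stay adjacent on the chip. The 1-threading pattern itself is not shown.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_tour(probes):
    """Nearest-neighbor path; a cheap stand-in for a TSP heuristic."""
    remaining = list(probes)
    tour = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda p: hamming(tour[-1], p))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def thread_row_by_row(path, n):
    """Lay the path into an n x n grid, reversing every other row."""
    rows = [path[r * n:(r + 1) * n] for r in range(n)]
    return [row if r % 2 == 0 else row[::-1] for r, row in enumerate(rows)]
```

With this threading, horizontally adjacent sites always hold consecutive probes of the path; vertical neighbors are not controlled, which is the weakness that other threadings aim to reduce.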
15 Lexicographical Sorting + 1-Threading
16 Matching Based Probe Placement
Runtime roughly proportional to square of
independent set size
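A minimal sketch of the matching step, assuming it works as follows: pick a set of pairwise non-adjacent sites, so that the neighbors of every chosen site stay fixed, and reassign the probes on those sites at minimum total cost. The sketch enumerates permutations for brevity; a practical version would solve a min-cost bipartite matching instead. All names are illustrative.

```python
from itertools import permutations


def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbours(grid, r, c):
    """Coordinates of the up-to-4 grid neighbors of site (r, c)."""
    rows, cols = len(grid), len(grid[0])
    cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [(rr, cc) for rr, cc in cells if 0 <= rr < rows and 0 <= cc < cols]


def reassign(grid, sites):
    """Optimally permute the probes on pairwise non-adjacent sites (in place)."""
    chosen = set(sites)
    for r, c in sites:
        if chosen & set(neighbours(grid, r, c)):
            raise ValueError("sites must form an independent set")
    probes = [grid[r][c] for r, c in sites]

    def cost(probe, site):
        # Neighbors are outside the independent set, so they do not move.
        return sum(hamming(probe, grid[rr][cc]) for rr, cc in neighbours(grid, *site))

    best = min(permutations(range(len(sites))),
               key=lambda perm: sum(cost(probes[i], s) for i, s in zip(perm, sites)))
    for i, (r, c) in zip(best, sites):
        grid[r][c] = probes[i]
    return grid
```

The cost matrix has one entry per (probe, site) pair, which is consistent with a runtime that grows with the square of the independent set size.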
17 Sliding Window Matching
Iterate SlidingWindowMatching over the chip until improvement drops below 0.1%
There is a trade-off between solution quality and size/overlap of windows
18 Effect of Window Size on Solution Quality
Increased window size/overlap decreases number of
conflicts, but increases runtime
19 Epitaxial Placement Algorithm
- Simulates crystal growth
- Start with arbitrary probe placed at center
- Maintain a best probe-candidate (i.e., a probe with min number of conflicts to the already placed neighbors) for each border site
- Iteratively fill the border site with minimum increase in border length
- Give priority to sites with more neighbors filled
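The growth loop can be sketched as below. This is a simplified illustration: it recomputes the best candidate for every border site in each iteration instead of maintaining candidates incrementally, and it models the priority for sites with more filled neighbors by dividing the cost by the number of filled neighbors. That weighting is an assumption, since the slide does not specify one.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def epitaxial(probes, n):
    """Grow an n x n placement outward from the center site."""
    if len(probes) != n * n:
        raise ValueError("need exactly n*n probes")
    unplaced = list(probes)
    grid = [[None] * n for _ in range(n)]
    grid[n // 2][n // 2] = unplaced.pop(0)        # arbitrary seed probe

    def filled_neighbours(r, c):
        cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
        return [grid[rr][cc] for rr, cc in cells
                if 0 <= rr < n and 0 <= cc < n and grid[rr][cc] is not None]

    while unplaced:
        best = None                               # (priority, row, col, probe index)
        for r in range(n):
            for c in range(n):
                placed = filled_neighbours(r, c)
                if grid[r][c] is not None or not placed:
                    continue                      # not an empty border site
                # Best candidate: fewest conflicts with already placed neighbors.
                idx = min(range(len(unplaced)),
                          key=lambda i: sum(hamming(unplaced[i], q) for q in placed))
                cost = sum(hamming(unplaced[idx], q) for q in placed)
                priority = cost / len(placed)     # assumed weighting
                if best is None or priority < best[0]:
                    best = (priority, r, c, idx)
        _, r, c, idx = best
        grid[r][c] = unplaced.pop(idx)
    return grid
```

The full rescan makes this sketch far slower than an implementation that caches candidates, so it only shows the order in which sites get filled.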
20 Tile- and Row-Epitaxial
- Tile-epitaxial:
- Divide array into 100x100 tiles
- Run Epitaxial within each tile
- Take into account border of already placed tiles
- Row-epitaxial:
- Place probes by a fast method, e.g., sort + 1-thread
- Re-place probes row by row, sequentially filling sites within a row
- Assign to each site a probe with min number of conflicts among the unplaced probes from following K rows
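The row-epitaxial pass might look like the sketch below. It assumes that the candidate pool for a site consists of the probes not yet finalized in the current row and in the following K rows of the initial placement, and that the chosen probe is swapped into the site so the displaced probe remains available. Both choices are interpretations of the slide.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def row_epitaxial(grid, k):
    """Re-place probes in row-major order with a lookahead of k rows (in place)."""
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            placed = []                           # neighbors already finalized
            if c > 0:
                placed.append(grid[r][c - 1])
            if r > 0:
                placed.append(grid[r - 1][c])
            if not placed:
                continue

            def conflicts(probe):
                return sum(hamming(probe, q) for q in placed)

            best, best_cost = (r, c), conflicts(grid[r][c])
            # Candidates: later sites of this row and all sites of the next k rows.
            for rr in range(r, min(rows, r + k + 1)):
                for cc in range(c + 1 if rr == r else 0, cols):
                    cost = conflicts(grid[rr][cc])
                    if cost < best_cost:
                        best, best_cost = (rr, cc), cost
            br, bc = best
            grid[r][c], grid[br][bc] = grid[br][bc], grid[r][c]
    return grid
```

A larger K widens the candidate pool at each site at the price of a proportionally longer scan.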
21 2-D Placement Algorithm Comparison: Border Conflict
22 2-D Placement Algorithm Comparison: Runtime
24 Problem Formulation (Asynchronous Case)
- Asynchronous synthesis:
- Periodic nucleotide deposition sequence, e.g., (ACTG)^p
- Every probe grows asynchronously
- Border length = Hamming distance between embedded probes
- Asynchronous Array Design (3-D Placement) Problem:
- Minimize placement cost of embedded-probe Hamming graph H
- (vertices = probes, distance = Hamming b/w embedded probes)
- On 2-dimensional grid graph G2 (N x N array, edges b/w neighbors)
25 Lower Bound
- Sum of distances to 4 closest neighbors minus weight of 4N heaviest arcs
- Distance between two probes of length p = 2(p - LCS), where LCS is the length of their longest common subsequence
- Non-tight bound: example with LB = 8 and best placement cost = 10
[Figure: (c) lower-bound graph over probes AC, GA, CT, TG with arcs labeled 1; nucleotide deposition sequence S = ACTGA; optimum placement]
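The gap in this example can be checked numerically. The sketch reads the probe distance as 2(p - LCS): two embedded probes of length p can share at most LCS deposition steps, so at least p - LCS steps of each probe go unmatched. It also uses the fact that with S = ACTGA each of the four probes has exactly one embedding, so the leftmost embedding is the only choice. Function names are illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (classic quadratic DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def probe_distance(a, b):
    """Smallest possible Hamming distance between embeddings of two length-p probes."""
    return 2 * (len(a) - lcs_length(a, b))


def leftmost_steps(probe, deposition):
    """Deposition steps used when every base takes the earliest free step."""
    steps, start = [], 0
    for base in probe:
        step = deposition.index(base, start)
        steps.append(step)
        start = step + 1
    return steps


def embedded_distance(a, b, deposition):
    """Hamming distance between the embedded probes (symmetric difference of steps)."""
    return len(set(leftmost_steps(a, deposition)) ^ set(leftmost_steps(b, deposition)))


cycle = ["AC", "GA", "TG", "CT"]                  # 2x2 array read around its 4 edges
pairs = list(zip(cycle, cycle[1:] + cycle[:1]))
bound = sum(probe_distance(a, b) for a, b in pairs)
cost = sum(embedded_distance(a, b, "ACTGA") for a, b in pairs)
```

Here `bound` evaluates to 8 and `cost` to 10: the pair AC, GA shares the nucleotide A, but the deposition sequence forces A to be deposited at different steps for the two probes.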
26 Optimal Probe Alignment
- Find best alignment of probe w.r.t. embedded neighbors
- Dynamic Programming:
- Source-sink paths correspond to feasible embeddings
- O((probe length) x (deposition sequence length))
- Can be extended to simultaneous alignment of two adjacent probes (2x1) with increase by O(probe length)
[Figure: alignment DP graph; axis labels A C G T A C G T and A C T]
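A minimal sketch of such a DP follows. It assumes neighbors are given as sets of 0-based deposition steps and charges one conflict per neighbor whose lit/masked state differs from the probe's at a step. The table layout and the names are illustrative, not taken from the slides.

```python
def best_embedding(probe, deposition, neighbour_steps):
    """Return (conflicts, steps) of an optimal embedding against fixed neighbors."""
    m, k, n = len(deposition), len(probe), len(neighbour_steps)
    # lit[t]: how many neighbors receive a nucleotide at deposition step t
    lit = [sum(t in steps for steps in neighbour_steps) for t in range(m)]
    INF = float("inf")
    # cost[i][t]: fewest conflicts on steps 0..t-1 with the first i bases embedded
    cost = [[INF] * (m + 1) for _ in range(k + 1)]
    used = [[False] * (m + 1) for _ in range(k + 1)]
    cost[0][0] = 0
    for t in range(m):
        for i in range(k + 1):
            if cost[i][t] == INF:
                continue
            skip = cost[i][t] + lit[t]            # probe masked, lit neighbors conflict
            if skip < cost[i][t + 1]:
                cost[i][t + 1], used[i][t + 1] = skip, False
            if i < k and deposition[t] == probe[i]:
                use = cost[i][t] + n - lit[t]     # probe lit, masked neighbors conflict
                if use < cost[i + 1][t + 1]:
                    cost[i + 1][t + 1], used[i + 1][t + 1] = use, True
    if cost[k][m] == INF:
        raise ValueError("probe cannot be embedded in this deposition sequence")
    steps, i = [], k
    for t in range(m, 0, -1):                     # trace the sink-to-source path
        if used[i][t]:
            steps.append(t - 1)
            i -= 1
    return cost[k][m], steps[::-1]
```

Each cell (i, t) is a node of the DP graph, so a path from cell (0, 0) to cell (k, m) fixes one feasible embedding, and the table has (k + 1) x (m + 1) cells.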
27 3-D Placement Flows
- Simultaneous placement and alignment:
- Asynchronous epitaxial (slow and low quality)
- Synchronous placement followed by in-place probe alignment (analogous to the standard partitioning of VLSI flows):
- Using previous DP to do in-place probe alignment
- Synchronous placement followed by probe alignment with reshuffle (analogous to feedback loops in VLSI flows):
- Asynchronous sliding window matching
28 Algorithms for In-Place Probe Alignment
- Asynchronous re-embedding after 2-dim placement
- Greedy Algorithm:
- While there exist probes to re-embed with gain
- Optimally re-embed the probe with the largest gain
- Batched greedy: speed-up by avoiding recalculations
- Chessboard Algorithm:
- While there is gain
- Re-embed probes in green sites
- Re-embed probes in red sites
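The chessboard loop can be sketched as follows, assuming green and red sites are the two colors of a checkerboard coloring, so that sites of one color are never adjacent and can be re-embedded independently. The helper `embed` is a compact alignment DP written for this illustration; all names are hypothetical.

```python
def embed(probe, deposition, neighbours):
    """Optimal (conflicts, steps) for one probe against fixed neighbor step sets."""
    n, INF = len(neighbours), float("inf")
    row = [(0, ())] + [(INF, ())] * len(probe)    # row[i]: first i bases embedded
    for t, base in enumerate(deposition):
        lit = sum(t in s for s in neighbours)
        new = []
        for i in range(len(probe) + 1):
            best = (row[i][0] + lit, row[i][1])   # skip step t
            if i > 0 and probe[i - 1] == base and row[i - 1][0] < INF:
                use = (row[i - 1][0] + n - lit, row[i - 1][1] + (t,))
                best = min(best, use)
            new.append(best)
        row = new
    return row[-1]


def total_cost(emb):
    """Sum of embedded Hamming distances over all adjacent site pairs."""
    rows, cols = len(emb), len(emb[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                cost += len(set(emb[r][c]) ^ set(emb[r][c + 1]))
            if r + 1 < rows:
                cost += len(set(emb[r][c]) ^ set(emb[r + 1][c]))
    return cost


def chessboard(probes, deposition, emb):
    """Alternately re-embed the two colors until no site gains (in place)."""
    rows, cols = len(probes), len(probes[0])
    improved = True
    while improved:
        improved = False
        for colour in (0, 1):                     # 0 = "green", 1 = "red"
            for r in range(rows):
                for c in range(cols):
                    if (r + c) % 2 != colour:
                        continue
                    cells = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    around = [set(emb[rr][cc]) for rr, cc in cells
                              if 0 <= rr < rows and 0 <= cc < cols]
                    old = sum(len(set(emb[r][c]) ^ s) for s in around)
                    new, steps = embed(probes[r][c], deposition, around)
                    if new < old:
                        emb[r][c] = steps
                        improved = True
    return emb
```

Every accepted re-embedding lowers the total cost by at least one, so the loop terminates. The placement itself never changes, only the embeddings.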
29 Comparison of In-Place Probe Alignments
Chip size | LB  | TSP+1Thr | Greedy       | Chessboard   | 2x1 Chessboard
          | %LB | %LB      | %LB   | CPU  | %LB   | CPU  | %LB   | CPU
100       | 100 | 152.0    | 125.7 | 40   | 120.5 | 54   | 119.4 | 480
200       | 100 | 150.2    | 126.3 | 154  | 120.9 | 221  | 119.7 | 1915
300       | 100 | 149.1    | 126.7 | 357  | 121.5 | 522  | 121.6 | 4349
500       | 100 | 147.9    | 127.1 | 943  | 121.4 | 1423 | 120.2 | 15990
- Post-placement LB = sum of distances to adjacent probes
- Distance between two probes of length p = 2(p - LCS)
- Useful for assessing quality of algorithms that change probe embeddings but do not change probe placement
31 3-D vs. 2-D Placement Results
Chip size | TSP+1Thr | TSP+1Thr+Chessboard | Epitaxial+Chessboard | SyncSWM+Chessboard | AsyncSWM
          | Cost     | Cost       | CPU    | Cost       | CPU     | Cost      | CPU    | Cost     | CPU
100       | 554849   | 439829     | 113    | 419069     | 274     | 433274    | 1      | 417890   | 875
200       | 2140903  | 1723352    | 1901   | 1624988    | 4441    | 1693658   | 46     | 1636658  | 3676
300       | 4667882  | 3801765    | 12028  | ---        | ---     | 3746722   | 112    | 3615282  | 8406
500       | 12702474 | 10426237   | 109648 | ---        | ---     | 10049442  | 302    | 9686918  | 22351
1000      | ---      | ---        | ---    | ---        | ---     | 38898792  | 1307   | 38005039 | 54501
32 3-D Placement Algorithm Comparison: Border Conflict
33 3-D Placement Algorithm Comparison: Runtime
35 Practical Extensions
- Distance-dependent border conflict weights:
- Take into account conflicts between 2-,3-hop neighbors rather than only immediate neighbors
- Position-dependent border conflict weights:
- In alignment DP for two sequences, take into account the importance of conflicts in the middle of probes: alignment cost has weights on conflicts which depend on conflict position
- Polymorphic probes:
- Chip contains SNPs, e.g., pairs of probes that differ in a single position; they should be placed together and the alignment DP should align them simultaneously
36 Alignment DP for 2-SNPs
Optimal Embedding of AC,TT
43 Enhanced DNA Array Design Flow
[Flow diagram] Boxes: Probe Selection; Design Rules, Parameters; Probe Pools; Deposition Mask Design; Test/Control Structure Design; Conflict Map; Mask Design (Placement, Embedding)
44 Summary
- Contributions:
- Epitaxial placement -> reduces border by an extra 10% over the previously best known method
- Asynchronous placement problem formulation
- Post-placement improvement by an extra 15.5-21.8%
- Lower bounds
- Scalable placements (1000x1000 in 20 min)
- Ongoing work:
- Comparison on industrial benchmarks
- Experiments with algorithms for extended formulations (SNPs, distance-dependent weights, etc.)
- Future directions:
- Design flow enhancements
- Nucleotide deposition sequence design
- Partitioning and integration for manufacturing cost reduction