Title: UC Riverside Talk
1 Engineering a Scalable Placement Heuristic for DNA Arrays
A.B. Kahng (UCSD), P. Pevzner (UCSD), S. Reda (UCSD), A. Zelikovsky (GSU)
2 Outline
- DNA probe arrays and unwanted illumination
- Synchronous array design (2-D placement)
- Asynchronous array design (3-D placement)
- Experimental results
- Extensions
- Conclusions
3 DNA Arrays
- DNA arrays
- Short DNA probes bound to a glass substrate
- Detect matching single-strand DNA molecules
- Growing number of applications
- Diagnosis of genetically based conditions
- Point-of-care diagnosis (low-cost, real-time)
- Targeted treatment
- E.g., antibiotic sensitivity
- Drug discovery
- Sequencing, genotyping, gene expression monitoring
- Agricultural research, environmental impact, bio-warfare agents
4 Scaling Trends and Challenges
- Smaller is better for DNA arrays
- Reduced reagent consumption
- Higher reaction speed
- Higher parallelism
- But smaller size brings challenging system complexity and new dominant physical effects
- ½ million probes/array → 100 million probes/next-generation array
- Unwanted illumination caused by light diffraction
- → Emerging DNA array design automation field
- Need scalable design tools, mature methodologies
- Great potential for transfer of techniques and
methodologies from VLSI design automation
5 Array Manufacturing Process
- Very Large-Scale Immobilized Polymer Synthesis
- Treat substrate with chemically protected linker molecules, creating a rectangular array
- Site size approx. 10 x 10 microns
- Selectively expose array sites to light
- Light deprotects exposed molecules, activating further synthesis
- Flush chip surface with a solution of protected A, C, G, or T
- Binding occurs at previously deprotected sites
- Repeat the exposure and flushing steps until the desired probes are synthesized
6 Unwanted Illumination Effect
- Unwanted illumination → erroneous probes
- Effect gets worse with technology scaling
7-9 Example Probe Synthesis
10 Measure of Unwanted Illumination
- Unwanted illumination is measured by total border length
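A minimal Python sketch of the border-length measure, assuming each site's probe embedding is given as a binary string with one character per deposition step ('1' means the site is exposed to light in that step, i.e., that step's mask is open there). Under this assumption, every mask contributes one border conflict for each pair of adjacent sites whose exposure differs. The function name `border_length` and the string encoding are illustrative, not taken from the talk.

```python
from typing import List


def border_length(embeddings: List[List[str]]) -> int:
    """Total border conflicts over all masks of a rectangular array.

    embeddings[r][c] is a (hypothetical) binary string for the probe at site (r, c):
    character t is '1' if the site is exposed in deposition step t.
    """
    rows, cols = len(embeddings), len(embeddings[0])
    steps = len(embeddings[0][0])
    total = 0
    for t in range(steps):  # one mask per deposition step
        for r in range(rows):
            for c in range(cols):
                here = embeddings[r][c][t]
                # Compare with the right and lower neighbours only,
                # so every adjacent pair is counted once per mask.
                if c + 1 < cols and embeddings[r][c + 1][t] != here:
                    total += 1
                if r + 1 < rows and embeddings[r + 1][c][t] != here:
                    total += 1
    return total


# Two adjacent sites whose exposures differ in the last two steps.
print(border_length([["1010", "1001"]]))  # 2
```

Summing per mask is the same as summing, over adjacent pairs, the Hamming distance between the two embedding strings; the per-mask loop just mirrors how the conflicts arise physically.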
11 Synchronous Synthesis
- Periodic deposition sequence, e.g., (ACTG)^k
→ Border conflicts between adjacent probes = 2 x Hamming distance
12 2D Placement Problem
Edge cost = 2 x Hamming distance between probes at adjacent sites
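Under synchronous synthesis each period adds exactly one nucleotide to every probe, so two adjacent probes conflict twice in every period where they differ: once in the step that serves only the first probe and once in the step that serves only the second. A small Python sketch, assuming equal-length probes and the (ACTG)^k deposition sequence, checks this 2 x Hamming identity and sums the edge cost over a grid placement. `synchronous_embedding` and `placement_cost` are hypothetical names chosen for illustration.

```python
from typing import List

PERIOD = "ACTG"  # deposition order within one period, as in (ACTG)^k


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def synchronous_embedding(probe: str) -> str:
    # Period i exposes the site only in the step that deposits probe[i].
    return "".join(
        "".join("1" if base == nt else "0" for base in PERIOD) for nt in probe
    )


def embedding_conflicts(e1: str, e2: str) -> int:
    # Steps in which exactly one of the two sites is exposed.
    return sum(a != b for a, b in zip(e1, e2))


def placement_cost(grid: List[List[str]]) -> int:
    """Sum of edge costs (2 x Hamming distance) over horizontally and vertically adjacent sites."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                cost += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                cost += 2 * hamming(grid[r][c], grid[r + 1][c])
    return cost


p, q = "ACGT", "ACTT"
print(embedding_conflicts(synchronous_embedding(p), synchronous_embedding(q)))  # 2
print(2 * hamming(p, q))  # 2
print(placement_cost([["AC", "AG"], ["TC", "AC"]]))  # 8
```

Because the edge cost depends only on the two probe strings, the synchronous case reduces to a pure 2D placement problem: choose which probe goes to which site.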
13 Previous Approaches
- Hubbell (1990s)
- Find a TSP tour through the probes w.r.t. Hamming distance
- Thread the TSP tour onto the grid row by row
- TSP-based methods do not scale to > 10^6 probes
- → Transfer scalable techniques from VLSI placement!
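The TSP-threading idea can be sketched in a few lines of Python. The slides do not say which TSP heuristic was used, so this sketch swaps in the simple nearest-neighbour heuristic; it also threads the tour in serpentine (alternating-direction) rows so that consecutive tour probes stay adjacent. Both choices are assumptions. Note that even this simple heuristic needs O(N^2) Hamming evaluations, which becomes prohibitive at the probe counts mentioned above.

```python
from typing import List


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def nearest_neighbor_tour(probes: List[str]) -> List[str]:
    # Greedy stand-in for a TSP solver: start at the first probe and
    # repeatedly move to the closest unvisited probe (O(N^2) distance evaluations).
    remaining = list(probes[1:])
    tour = [probes[0]]
    while remaining:
        last = tour[-1]
        nxt = min(range(len(remaining)), key=lambda i: hamming(last, remaining[i]))
        tour.append(remaining.pop(nxt))
    return tour


def thread_rows(tour: List[str], cols: int) -> List[List[str]]:
    # Cut the tour into rows of length `cols`; reverse every other row
    # (serpentine order) so consecutive tour probes occupy adjacent sites.
    grid = [tour[i:i + cols] for i in range(0, len(tour), cols)]
    for r in range(1, len(grid), 2):
        grid[r].reverse()
    return grid


tour = nearest_neighbor_tour(["AAAA", "TTTT", "AAAT", "AATT"])
print(tour)  # ['AAAA', 'AAAT', 'AATT', 'TTTT']
print(thread_rows(tour, cols=2))  # [['AAAA', 'AAAT'], ['TTTT', 'AATT']]
```

Threading only keeps tour neighbours adjacent along one direction; vertical neighbours come out of the tour order essentially by chance, which is one reason 2D-aware placement can do better.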
14 2D Placement: Sliding-Window Matching
- Slide window over entire chip
- Repeat until improvement drops below a certain threshold
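A compact Python sketch of sliding-window matching under the synchronous cost (2 x Hamming distance). Within each window it reassigns the probes on one colour of a checkerboard (sites with equal (r + c) parity, so no two are adjacent and each probe's cost depends only on fixed neighbours), then slides the window and repeats until the gain drops below a threshold. Assumptions: the checkerboard independent set, the non-overlapping window stride, and brute-force search over permutations, which stands in for the minimum-weight bipartite matching a real implementation would use and is practical only for small windows.

```python
import itertools
from typing import List


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def total_cost(grid: List[List[str]]) -> int:
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                cost += 2 * hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:
                cost += 2 * hamming(grid[r][c], grid[r + 1][c])
    return cost


def site_cost(grid: List[List[str]], r: int, c: int, probe: str) -> int:
    # Cost of putting `probe` at (r, c), given the probes currently on its neighbours.
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            cost += 2 * hamming(probe, grid[nr][nc])
    return cost


def optimize_window(grid: List[List[str]], r0: int, c0: int, size: int, parity: int) -> None:
    rows, cols = len(grid), len(grid[0])
    # Independent set: window sites of one checkerboard colour (never adjacent).
    sites = [(r, c)
             for r in range(r0, min(r0 + size, rows))
             for c in range(c0, min(c0 + size, cols))
             if (r + c) % 2 == parity]
    probes = [grid[r][c] for r, c in sites]

    def assignment_cost(perm):
        return sum(site_cost(grid, r, c, probes[i]) for (r, c), i in zip(sites, perm))

    # Brute-force optimal assignment; the identity permutation comes first,
    # so ties keep the current placement.
    best = min(itertools.permutations(range(len(probes))), key=assignment_cost)
    for (r, c), i in zip(sites, best):
        grid[r][c] = probes[i]


def sliding_window_matching(grid: List[List[str]], size: int = 3, min_gain: int = 1) -> int:
    rows, cols = len(grid), len(grid[0])
    cost = total_cost(grid)
    while True:
        for parity in (0, 1):
            for r0 in range(0, rows, size):
                for c0 in range(0, cols, size):
                    optimize_window(grid, r0, c0, size, parity)
        new_cost = total_cost(grid)
        if cost - new_cost < min_gain:  # improvement below threshold: stop
            return new_cost
        cost = new_cost


grid = [["AA", "TT", "TT", "AA"]]
print(sliding_window_matching(grid, size=4))  # 4
print(grid)  # [['TT', 'TT', 'AA', 'AA']]
```

Since the reassigned sites are never adjacent, the change in total cost equals the change in their summed site costs, so each window step can only keep or lower the border cost.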
15 Effect of Window Size
16 2D Placement: Epitaxial Growth
- O(N^{3/2}) row-order implementation, where N = number of probes
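A simplified Python sketch of row-epitaxial growth: sites are filled in row order, and each site receives the unplaced probe with the fewest conflicts (2 x Hamming distance) against its already-placed left and upper neighbours. Scanning every unplaced probe, as done here, costs O(N^2) distance evaluations; reaching an O(N^{3/2}) bound would require examining only about sqrt(N) candidates per site on average, and how those candidates are chosen is not given on the slide, so that refinement is omitted. `row_epitaxial` is a hypothetical name.

```python
from typing import List


def hamming(p: str, q: str) -> int:
    return sum(a != b for a, b in zip(p, q))


def row_epitaxial(probes: List[str], rows: int, cols: int) -> List[List[str]]:
    if len(probes) != rows * cols:
        raise ValueError("need exactly one probe per site")
    unplaced = list(probes)
    grid: List[List[str]] = [[""] * cols for _ in range(rows)]

    def cost(p: str, r: int, c: int) -> int:
        # Conflicts with neighbours that are already placed (left and above).
        total = 0
        if c > 0:
            total += 2 * hamming(p, grid[r][c - 1])
        if r > 0:
            total += 2 * hamming(p, grid[r - 1][c])
        return total

    for r in range(rows):
        for c in range(cols):
            best = min(range(len(unplaced)), key=lambda i: cost(unplaced[i], r, c))
            grid[r][c] = unplaced.pop(best)
    return grid


print(row_epitaxial(["AA", "TT", "AT", "TA"], rows=2, cols=2))
# [['AA', 'AT'], ['TA', 'TT']]
```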
17 Asynchronous Synthesis
- Probes grow at different speeds
- Border conflicts between adjacent probes depend on their embeddings into the nucleotide deposition sequence
→ 3D placement problem
18 Single-Probe Embedding
- Dynamic programming algorithm similar to LCS
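A Python sketch of an LCS-like dynamic program for embedding a single probe, assuming its neighbours' embeddings are fixed binary strings over the deposition sequence. D[i][t] is the minimum number of conflicts after placing the first i nucleotides within the first t steps; step t either synthesizes probe[i-1] (allowed only when the deposited nucleotide matches) or leaves the site dark. Each step costs the number of neighbours whose exposure differs from this site's. The cost model and the name `embed_probe` are assumptions for illustration.

```python
from typing import List, Tuple


def embed_probe(probe: str, seq: str, neighbors: List[str]) -> Tuple[int, str]:
    """Min-conflict embedding of `probe` into deposition sequence `seq`.

    `neighbors` holds the fixed embeddings (binary strings, len(seq) each)
    of adjacent probes; returns (conflicts, embedding).
    """
    k, T = len(probe), len(seq)
    # Conflicts in step t if this site is exposed / left dark.
    exposed = [sum(e[t] == "0" for e in neighbors) for t in range(T)]
    dark = [sum(e[t] == "1" for e in neighbors) for t in range(T)]
    INF = float("inf")
    D = [[INF] * (T + 1) for _ in range(k + 1)]
    D[0][0] = 0
    for t in range(1, T + 1):
        D[0][t] = D[0][t - 1] + dark[t - 1]
    for i in range(1, k + 1):
        for t in range(1, T + 1):
            best = D[i][t - 1] + dark[t - 1]  # step t not used by this probe
            if probe[i - 1] == seq[t - 1]:  # step t synthesizes probe[i-1]
                best = min(best, D[i - 1][t - 1] + exposed[t - 1])
            D[i][t] = best
    if D[k][T] == INF:
        raise ValueError("probe is not a subsequence of the deposition sequence")
    # Trace back one optimal embedding.
    bits, i = [], k
    for t in range(T, 0, -1):
        if i > 0 and probe[i - 1] == seq[t - 1] and D[i][t] == D[i - 1][t - 1] + exposed[t - 1]:
            bits.append("1")
            i -= 1
        else:
            bits.append("0")
    return int(D[k][T]), "".join(reversed(bits))


print(embed_probe("AC", "ACAC", ["1100"]))  # (0, '1100')
```

Like LCS, the table has (len(probe) + 1) x (len(seq) + 1) entries, each filled in constant time once the per-step costs are precomputed.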
19 Post-Placement Embedding Optimization
- 2D placement fixed; only probe embeddings are allowed to change
- Greedy: optimally re-embed the probe with the largest gain
- Chessboard: alternate re-embedding of red and green probes
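A Python sketch of the chessboard scheme, assuming the same binary-string embeddings and a fixed 2D placement. Sites are coloured like a chessboard; since no two sites of one colour are adjacent, every site of that colour can be re-embedded optimally while the other colour stays fixed, and the colours alternate until the total border length stops decreasing. For brevity, the optimal re-embedding here enumerates every embedding of a short probe instead of using a single-probe embedding DP; `chessboard` and `all_embeddings` are hypothetical names.

```python
import itertools
from typing import Iterator, List


def all_embeddings(probe: str, seq: str) -> Iterator[str]:
    # Every way to synthesize `probe` as a subsequence of the deposition sequence.
    for pos in itertools.combinations(range(len(seq)), len(probe)):
        if all(seq[t] == b for t, b in zip(pos, probe)):
            chosen = set(pos)
            yield "".join("1" if t in chosen else "0" for t in range(len(seq)))


def conflicts(e1: str, e2: str) -> int:
    return sum(a != b for a, b in zip(e1, e2))


def total_border(emb: List[List[str]]) -> int:
    rows, cols = len(emb), len(emb[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += conflicts(emb[r][c], emb[r][c + 1])
            if r + 1 < rows:
                total += conflicts(emb[r][c], emb[r + 1][c])
    return total


def chessboard(probes: List[List[str]], seq: str, emb: List[List[str]],
               max_rounds: int = 10) -> List[List[str]]:
    rows, cols = len(probes), len(probes[0])
    for _ in range(max_rounds):
        before = total_border(emb)
        for color in (0, 1):  # "red" sites, then "green" sites
            for r in range(rows):
                for c in range(cols):
                    if (r + c) % 2 != color:
                        continue
                    nbrs = [emb[r + dr][c + dc]
                            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                            if 0 <= r + dr < rows and 0 <= c + dc < cols]
                    # Optimal embedding given the (fixed) opposite-colour neighbours.
                    emb[r][c] = min(all_embeddings(probes[r][c], seq),
                                    key=lambda e: sum(conflicts(e, n) for n in nbrs))
        if total_border(emb) >= before:  # no further improvement
            break
    return emb


result = chessboard([["AC", "AC"]], "ACAC", [["1100", "0011"]])
print(result, total_border(result))  # [['0011', '0011']] 0
```

Re-embedding a whole colour class at once is safe because each site's best embedding depends only on opposite-colour neighbours, which do not change during that half-round.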
20 Embedding Optimization Results
- Chessboard is 5-6% better than greedy
- Within 21% of lower bound
21 Comparison of Placement Algorithms
Chip sizes 100x100 to 500x500
- SWM 600x faster (5 min. vs. 30 hours) with up to 4% border conflict decrease
- Row-Epitaxial 6-10% better than TSP+Threading, >10x faster for 500x500
22 Practical Extensions
- Distance-dependent border conflict weights
- Take into account conflicts between 2- and 3-hop neighbors rather than only immediate neighbors
- Position-dependent border conflict weights
- In the alignment DP for two sequences, take into account the importance of conflicts in the middle of probes: the alignment cost weights each conflict according to its position
- Perfect match/mismatch probes
- Pairs of probes that differ only in the middle position
- Should be placed and aligned together
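One possible reading of distance-dependent border conflict weights, sketched in Python: every pair of sites within a few hops contributes its embedding conflicts, scaled by a weight that depends on the hop distance. Measuring hops as Manhattan distance, and the weight values themselves, are assumptions for illustration; the all-pairs loop is quadratic and only meant to make the cost function concrete.

```python
from typing import List, Sequence


def conflicts(e1: str, e2: str) -> int:
    return sum(a != b for a, b in zip(e1, e2))


def weighted_border(emb: List[List[str]], weights: Sequence[float]) -> float:
    """weights[d - 1] is the (hypothetical) weight of a pair d hops apart."""
    cells = [(r, c) for r in range(len(emb)) for c in range(len(emb[0]))]
    total = 0.0
    for i, (r1, c1) in enumerate(cells):
        for r2, c2 in cells[i + 1:]:
            d = abs(r1 - r2) + abs(c1 - c2)  # hop count (Manhattan distance)
            if d <= len(weights):
                total += weights[d - 1] * conflicts(emb[r1][c1], emb[r2][c2])
    return total


# Immediate neighbours weigh 1.0, 2-hop neighbours 0.5.
print(weighted_border([["10", "01", "01"]], [1.0, 0.5]))  # 3.0
```

With a single weight of 1.0 the function reduces to the ordinary border length over immediate neighbours.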
23 Alignment DP for 2-SNPs
Optimal Embedding of AC,TT
24 Summary
- Contributions
- Epitaxial placement → reduces border conflicts by an extra 10% over the previously best known method
- Asynchronous placement problem formulation
- Post-placement improvement by an extra 15.5-21.8%
- Lower bounds
- Scalable placements (1000x1000 in 20 min)
- Ongoing work
- Comparison on industrial benchmarks
- Experiments with algorithms for extended
formulations (SNPs, distance-dependent weights,
etc.)
25 Summary
- Results demonstrate the effectiveness of VLSI placement techniques for DNA probe placement
- Currently exploring other VLSI placement techniques, e.g., recursive 4-way partitioning based on linear-time clustering methods
- Algorithms validated on industry data
- Extended to handle practical constraints such as control probes and match/mismatch probe pairs
- 5% border length improvement over industry placements
- → Improved design results in fewer erroneous probes, smaller array area, and/or more probes per array
26 Simplified DNA Array Flow
[Figure: design and manufacturing flow. Input: gene sequences, position of SNPs, etc. Stages: Probe Selection, Probe Placement, Probe Alignment (Mask Design), Mask Manufacturing, Array Manufacturing, Hybridization Experiment, Analysis of Hybridization Intensities. Regions: Soft/Computational Domain, Hard/Biochemistry Domain.]