Engineering a Scalable Placement Heuristic for DNA Probe Arrays

Provided by: csG7
Learn more at: http://www.cs.gsu.edu
1
Engineering a Scalable Placement Heuristic for
DNA Probe Arrays
  • A.B. Kahng, I.I. Mandoiu, P. Pevzner,
  • S. Reda (all UCSD), A. Zelikovsky (GSU)

2
Outline
  • DNA probe arrays and unwanted illumination
  • Synchronous array design (2-D placement)
  • Asynchronous array design (3-D placement)
  • Experimental results
  • Extensions
  • Conclusions

4
DNA Probe Arrays
  • Used in a wide range of genomic analyses
  • Gene expression monitoring, SNP mapping,
    sequencing by hybridization
  • Arrays with up to 1000x1000 probes in commercial
    use, 10^8 probes envisioned for next generation
    arrays
  • Highly scalable algorithms required for array
    design

5
Simplified DNA Array Flow
  • Input: gene sequences, position of SNPs, etc.
  • Probe Selection
  • Mask Design (Placement, Embedding)
  • Mask Manufacturing
  • Array Manufacturing
  • Hybridization Experiment
  • Analysis of Hybridization Intensities
  • Stages belong either to the Soft/Computational Domain
    or to the Hard/Biochemistry Domain

6
Array Manufacturing Process
  • Very Large-Scale Immobilized Polymer Synthesis
  • 1. Treat substrate with chemically
    protected linker molecules,
    creating rectangular array
  • Site size approx. 10x10 microns
  • 2. Selectively expose array sites to light
  • Light deprotects exposed molecules,
    activating further synthesis
  • 3. Flush chip surface with solution of
    protected A, C, G, or T
  • Binding occurs at previously
    deprotected sites
  • 4. Repeat steps 2-3 until desired
    probes are synthesized

7
Photo-Deprotection Step
Our concern: diffraction → unwanted illumination
→ yield decrease
8
Probe Synthesis
9
Measuring Unwanted Illumination
Unwanted illumination is measured by border length
10
Synchronous vs. Asynchronous Synthesis

(a) periodic deposition sequence
(b) Synchronous embedding of CTG
(c) Asynchronous leftmost embedding of CTG
(d) Another asynchronous embedding
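
The difference between the two embedding styles can be sketched in a few lines of Python. This is only an illustration: the period ACTG, the three repetitions, and the function names are assumptions, and the slide figure may use a different deposition sequence.

```python
def synchronous_embedding(probe, period):
    """Deposit the i-th nucleotide of the probe during the i-th period."""
    return [i * len(period) + period.index(base) for i, base in enumerate(probe)]


def leftmost_embedding(probe, deposition):
    """Deposit each nucleotide at the earliest deposition step still available."""
    steps, pos = [], 0
    for base in probe:
        pos = deposition.index(base, pos)  # ValueError if the probe does not fit
        steps.append(pos)
        pos += 1
    return steps


period = "ACTG"          # assumed period
deposition = period * 3  # one period per nucleotide of a 3-mer
sync_steps = synchronous_embedding("CTG", period)
left_steps = leftmost_embedding("CTG", deposition)
print(sync_steps, left_steps)
```

Each list holds the 0-based deposition steps at which the probe's site is exposed. The synchronous embedding spreads CTG over three periods, while the leftmost embedding packs it into the first one.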
12
Problem Formulation (Synchronous Case)
  • Synchronous Array Design (2-D Placement) Problem
  • Minimize placement cost of Hamming graph H
  • (vertices = probes, distance = Hamming)
  • On 2-dimensional grid graph G2 (N x N array,
    edges between distance-1 neighbors)
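
As a minimal sketch of this objective, the placement cost can be computed by summing Hamming distances over horizontally and vertically adjacent sites. The function names and the tiny 2x2 example are illustrative assumptions, not part of the original slides.

```python
def hamming(a, b):
    """Number of positions at which two equal-length probes differ."""
    return sum(x != y for x, y in zip(a, b))


def placement_cost(grid):
    """Sum of Hamming distances over all pairs of adjacent grid sites."""
    rows, cols = len(grid), len(grid[0])
    cost = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # right neighbor
                cost += hamming(grid[r][c], grid[r][c + 1])
            if r + 1 < rows:  # lower neighbor
                cost += hamming(grid[r][c], grid[r + 1][c])
    return cost


good = [["ACG", "ACT"], ["TCG", "TCT"]]
bad = [["ACG", "TCT"], ["TCG", "ACT"]]
print(placement_cost(good), placement_cost(bad))
```

The same four probes cost 4 in the first arrangement and 6 in the second, which is the gap a placement heuristic tries to close.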

13
2-D Placement Lower Bound
  • Sum of Hamming distances to 4 closest neighbors
    minus weight of 4N heaviest arcs
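
One way to read this bound is sketched below. Every probe contributes arcs to its 4 closest probes, and the 4N heaviest arcs are dropped because border sites have fewer than 4 neighbors. Halving the result is an assumption made here, on the reasoning that each grid edge is counted once from each endpoint; the slides do not state the counting convention.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def placement_lower_bound(probes, n):
    """Lower bound on the border length of any placement on an n x n array."""
    if n < 3 or len(probes) != n * n:
        raise ValueError("sketch assumes n >= 3 and exactly n*n probes")
    arcs = []
    for i, p in enumerate(probes):
        dists = sorted(hamming(p, q) for j, q in enumerate(probes) if j != i)
        arcs.extend(dists[:4])          # arcs to the 4 closest probes
    arcs.sort()
    kept = arcs[:len(arcs) - 4 * n]     # drop the 4N heaviest arcs
    return sum(kept) / 2                # assumed: each edge is seen from both ends


example_probes = ["AAA", "AAC", "ACC", "CCC", "CCG", "CGG", "GGG", "GGT", "GTT"]
print(placement_lower_bound(example_probes, 3))
```

The all-pairs distance computation is quadratic in the number of probes, so this form is only practical for small instances.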

14
TSP + 1-Threading Placement
  • Hannenhalli, Hubbell, Lipshutz, Pevzner 02
  • Place the probes according to
  • 1-Threading
  • Further decreases total border by 20%
  • Hubbell 90s
  • Find TSP tour/path over given probes w.r.t.
    Hamming distance
  • Thread TSP path in the grid row by row
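
The row-by-row variant can be sketched as follows. A nearest-neighbor path stands in for a real TSP heuristic, and the threading shown is plain serpentine row threading, not the 1-threading pattern, which is not reproduced here. All names are illustrative assumptions.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def greedy_tour(probes):
    """Nearest-neighbor path: a cheap stand-in for a TSP heuristic."""
    remaining = list(probes)
    tour = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda p: hamming(tour[-1], p))
        remaining.remove(nxt)
        tour.append(nxt)
    return tour


def thread_row_by_row(tour, n):
    """Lay the path on an n x n grid, reversing every other row."""
    rows = [tour[r * n:(r + 1) * n] for r in range(n)]
    return [row if r % 2 == 0 else row[::-1] for r, row in enumerate(rows)]


tour = greedy_tour(["AAA", "CCC", "AAC", "CCA"])
grid = thread_row_by_row(tour, 2)
print(grid)
```

Passing `sorted(probes)` to `thread_row_by_row` instead of a tour gives a sorting-based variant with the same threading.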

15
Lexicographical Sorting + 1-Threading
16
Matching Based Probe Placement
Runtime roughly proportional to square of
independent set size
17
Sliding Window Matching
Iterate SlidingWindowMatching over the chip until
improvement drops below 0.1%
There is a trade-off between solution quality and
size/overlap of windows
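
A rough sketch of the idea, under stated assumptions: inside each window, sites of one chessboard color form an independent set, so their probes can be reassigned among themselves without affecting each other's cost. Brute-force enumeration of permutations stands in for a minimum-cost matching solver, and the window size, step, and stopping threshold are illustrative parameters.

```python
from itertools import permutations


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def site_cost(grid, r, c, probe):
    """Conflicts of `probe` at site (r, c) with its current grid neighbors."""
    n = len(grid)
    nbrs = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return sum(hamming(probe, grid[rr][cc]) for rr, cc in nbrs
               if 0 <= rr < n and 0 <= cc < n)


def border_length(grid):
    n = len(grid)
    horiz = sum(hamming(grid[r][c], grid[r][c + 1]) for r in range(n) for c in range(n - 1))
    vert = sum(hamming(grid[r][c], grid[r + 1][c]) for r in range(n - 1) for c in range(n))
    return horiz + vert


def window_matching(grid, r0, c0, size, parity):
    """Optimally reassign probes among same-colored sites of one window."""
    n = len(grid)
    sites = [(r, c) for r in range(r0, min(n, r0 + size))
             for c in range(c0, min(n, c0 + size)) if (r + c) % 2 == parity]
    probes = [grid[r][c] for r, c in sites]
    # Same-colored sites are never adjacent, so the total cost is separable.
    best = min(permutations(range(len(sites))),
               key=lambda perm: sum(site_cost(grid, r, c, probes[k])
                                    for (r, c), k in zip(sites, perm)))
    for (r, c), k in zip(sites, best):
        grid[r][c] = probes[k]


def sliding_window_matching(grid, size=3, step=2, tol=0.001):
    """Sweep overlapping windows until the relative gain falls below `tol`."""
    cost = border_length(grid)
    while True:
        for parity in (0, 1):
            for r0 in range(0, len(grid), step):
                for c0 in range(0, len(grid), step):
                    window_matching(grid, r0, c0, size, parity)
        new_cost = border_length(grid)
        if cost == 0 or (cost - new_cost) / cost < tol:
            return new_cost
        cost = new_cost
```

The grid is modified in place. Because the identity assignment is always among the candidates, a sweep never increases the border length. Larger windows give the matching more freedom at the price of a larger assignment problem.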
18
Effect of Window Size on Solution Quality
Increased window size/overlap decreases number of
conflicts, but increases runtime
19
Epitaxial Placement Algorithm
  • Simulates crystal-growth
  • Start with arbitrary probe placed at center
  • Maintain a best probe-candidate (i.e., a probe
    with min number of conflicts to the already
    placed neighbors) for each border site
  • Iteratively fill the border site with minimum
    increase in border length
  • Give priority to sites with more neighbors
    filled
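
The growth process can be sketched as below. This simplified version recomputes candidates at every step instead of maintaining a best candidate per border site, and it expresses the priority rule as average added conflict per filled neighbor with ties going to sites with more filled neighbors. Both choices are assumptions made for brevity, not the authors' implementation.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def neighbors(r, c, n):
    cand = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [(rr, cc) for rr, cc in cand if 0 <= rr < n and 0 <= cc < n]


def epitaxial_placement(probes, n):
    """Grow a placement outward from the center; expects exactly n*n probes."""
    grid = [[None] * n for _ in range(n)]
    unplaced = list(probes)
    grid[n // 2][n // 2] = unplaced.pop(0)      # arbitrary seed probe
    while unplaced:
        best = None
        for r in range(n):
            for c in range(n):
                if grid[r][c] is not None:
                    continue
                filled = [grid[rr][cc] for rr, cc in neighbors(r, c, n)
                          if grid[rr][cc] is not None]
                if not filled:                  # not a border site yet
                    continue
                for p in unplaced:
                    added = sum(hamming(p, q) for q in filled)
                    # assumed priority: low average conflict, then more filled neighbors
                    key = (added / len(filled), -len(filled))
                    if best is None or key < best[0]:
                        best = (key, r, c, p)
        _, r, c, p = best
        grid[r][c] = p
        unplaced.remove(p)
    return grid
```

Recomputing every candidate at every step makes this sketch far slower than a version with cached candidates, so it is suitable only for small arrays.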

20
Tile- and Row- Epitaxial
  • Tile-epitaxial
  • Divide array into 100x100 tiles
  • Run Epitaxial within each tile
  • Take into account border of already placed tiles
  • Row-epitaxial
  • Place probes by a fast method, e.g.,
    sort + 1-thread
  • Re-place probes row by row, sequentially filling
    sites within a row
  • Assign to each site a probe with min number of
    conflicts among the unplaced probes from
    following K rows
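
The row-epitaxial step might look like the sketch below. It assumes a square initial placement and treats the remainder of the current row together with the following K rows as the pool of unplaced probes. Conflicts are counted against the left and upper neighbors, the only ones already finalized. The function name and the pool definition are assumptions.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def row_epitaxial(grid, k):
    """Re-place probes row by row, looking ahead at most k rows."""
    n = len(grid)
    grid = [row[:] for row in grid]             # work on a copy
    for r in range(n):
        for c in range(n):
            placed = []
            if c > 0:
                placed.append(grid[r][c - 1])   # left neighbor, already final
            if r > 0:
                placed.append(grid[r - 1][c])   # upper neighbor, already final

            def conflicts(site):
                rr, cc = site
                return sum(hamming(grid[rr][cc], q) for q in placed)

            # Pool: rest of the current row, then the following k rows.
            pool = [(r, cc) for cc in range(c, n)]
            pool += [(rr, cc) for rr in range(r + 1, min(n, r + k + 1))
                     for cc in range(n)]
            br, bc = min(pool, key=conflicts)   # ties keep the current probe
            grid[r][c], grid[br][bc] = grid[br][bc], grid[r][c]
    return grid
```

A larger `k` widens the pool and tends to lower the border length, at a proportional cost in runtime.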

21
2-D Placement Algorithm Comparison: Border Conflict

22
2-D Placement Algorithm Comparison: Runtime

24
Problem Formulation (Asynchronous Case)
  • Asynchronous synthesis
  • Periodic nucleotide deposition sequence, e.g.,
    (ACTG)^p
  • Every probe grows asynchronously
  • → Border length = Hamming distance between
    embedded probes
  • Asynchronous Array Design (3-D Placement)
    Problem
  • Minimize placement cost of embedded-probe Hamming
    graph H
  • (vertices = probes, distance = Hamming between embedded
    probes)
  • On 2-dimensional grid graph G2 (N x N array,
    edges between neighbors)
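
For intuition, the distance between two embedded probes can be computed as the number of deposition steps used by exactly one of them. The sketch below assumes leftmost embeddings in (ACTG)^3; the helper names are illustrative.

```python
def leftmost_embedding(probe, deposition):
    """Earliest deposition steps (0-based) at which the probe can be grown."""
    steps, pos = [], 0
    for base in probe:
        pos = deposition.index(base, pos)
        steps.append(pos)
        pos += 1
    return steps


def embedded_distance(steps_a, steps_b):
    """Steps at which exactly one of the two sites is exposed to light."""
    return len(set(steps_a) ^ set(steps_b))


deposition = "ACTG" * 3
ctg = leftmost_embedding("CTG", deposition)
atg = leftmost_embedding("ATG", deposition)
gca = leftmost_embedding("GCA", deposition)
print(embedded_distance(ctg, atg), embedded_distance(ctg, gca))
```

The distance depends on the chosen embeddings as well as on the probes themselves, which is why the asynchronous problem adds a third dimension to placement.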

25
Lower Bound
  • Sum of distances to 4 closest neighbors minus
    weight of 4N heaviest arcs
  • Distance between two probes of length p = 2(p - LCS),
    where LCS is the length of their longest common subsequence
  • Non-tight bound example with LB = 8 and best
    placement cost = 10

Figure: example with probes AC, GA, CT, TG; nucleotide deposition sequence S = ACTGA; optimum placement
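
The numbers in this example can be checked with a short sketch. It reads the pairwise distance as 2(p - LCS), since each probe keeps p - LCS nucleotides that cannot be aligned with the other, and it assumes the four probes occupy a 2x2 array whose sites form a 4-cycle. Function names are illustrative.

```python
from itertools import permutations


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            table[i][j] = (table[i - 1][j - 1] + 1 if x == y
                           else max(table[i - 1][j], table[i][j - 1]))
    return table[len(a)][len(b)]


def lcs_distance(a, b):
    return len(a) + len(b) - 2 * lcs_length(a, b)


def leftmost_steps(probe, deposition):
    steps, pos = set(), 0
    for base in probe:
        pos = deposition.index(base, pos)
        steps.add(pos)
        pos += 1
    return steps


def cycle_cost(order, dist):
    """Cost of placing four probes around the 4-cycle of a 2x2 array."""
    return sum(dist(order[i], order[(i + 1) % 4]) for i in range(4))


probes = ["AC", "GA", "CT", "TG"]
S = "ACTGA"
orders = [(probes[0],) + rest for rest in permutations(probes[1:])]
bound = min(cycle_cost(o, lcs_distance) for o in orders)
# In S each of these probes has exactly one feasible embedding.
steps = {p: leftmost_steps(p, S) for p in probes}
actual = min(cycle_cost(o, lambda a, b: len(steps[a] ^ steps[b])) for o in orders)
print(bound, actual)
```

The bound evaluates to 8 and the best embedded placement to 10. The gap arises because the pairwise-optimal alignments cannot all be realized in one shared deposition sequence.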
26
Optimal Probe Alignment
  • Find best alignment of probe w.r.t. embedded
    neighbors
  • Dynamic Programming
  • Source-sink paths correspond to feasible
    embeddings
  • O((probe length) x (deposition sequence length))
  • Can be extended to simultaneous alignment of two
    adjacent probes (2x1) with runtime increase by a
    factor of O(probe length)
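
A minimal sketch of such a dynamic program follows. Neighbor embeddings are given as sets of deposition steps. A step that the probe uses conflicts with every neighbor that skips it, and a step that the probe skips conflicts with every neighbor that uses it. The function name and the data representation are assumptions.

```python
def best_embedding(probe, deposition, neighbor_embeddings):
    """Return (min conflicts, steps) for embedding `probe` next to fixed neighbors."""
    m, n, k = len(probe), len(deposition), len(neighbor_embeddings)
    used = [sum(j in e for e in neighbor_embeddings) for j in range(n)]
    inf = float("inf")
    # dp[i][j]: min conflicts with the first i nucleotides placed in the first j steps
    dp = [[inf] * (n + 1) for _ in range(m + 1)]
    dp[0][0] = 0
    for j in range(1, n + 1):
        for i in range(m + 1):
            skip = dp[i][j - 1] + used[j - 1]
            take = inf
            if i > 0 and probe[i - 1] == deposition[j - 1]:
                take = dp[i - 1][j - 1] + (k - used[j - 1])
            dp[i][j] = min(skip, take)
    if dp[m][n] == inf:
        raise ValueError("probe cannot be embedded in the deposition sequence")
    # Trace one optimal source-sink path back to recover the steps.
    steps, i = [], m
    for j in range(n, 0, -1):
        if (i > 0 and probe[i - 1] == deposition[j - 1]
                and dp[i][j] == dp[i - 1][j - 1] + (k - used[j - 1])):
            steps.append(j - 1)
            i -= 1
    steps.reverse()
    return dp[m][n], steps


cost, steps = best_embedding("ACT", "ACGTACGT", [{4, 5, 7}, {4, 5, 7}])
print(cost, steps)
```

In the example both neighbors use the second period, so the probe is shifted there and all conflicts disappear. The table has (probe length + 1) x (deposition length + 1) entries, matching the stated complexity.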

Figure: alignment DP graph (labels: A C G T A C G T; A C T)
27
3-D Placement Flows
  • Simultaneous placement and alignment
  • asynchronous epitaxial (slow and low quality)
  • Synchronous placement followed by in-place probe
    alignment (analogous to the standard VLSI flow
    partition)
  • using previous DP to do in-place probe alignment
  • Synchronous placement followed by probe alignment
    with reshuffle (analogous to feedback loops in
    VLSI flows)
  • asynchronous sliding window matching

28
Algorithms for In-Place Probe Alignment
  • Asynchronous re-embedding after 2-dim placement
  • Greedy Algorithm
  • While there exist probes to re-embed with gain
  • Optimally re-embed the probe with the largest
    gain
  • Batched greedy: speed-up by avoiding
    recalculations
  • Chessboard Algorithm
  • While there is gain
  • Re-embed probes in green sites
  • Re-embed probes in red sites
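
The chessboard idea can be sketched as follows. Sites of one color are never adjacent, so all of them can be re-embedded against fixed neighbors of the other color. To keep the sketch short, the best embedding is found by enumerating all feasible embeddings, which stands in for the alignment DP and is only viable for short probes. Names and data layout are assumptions.

```python
def embeddings(probe, deposition, start=0):
    """Yield every feasible embedding of `probe` as a list of steps."""
    if not probe:
        yield []
        return
    for j in range(start, len(deposition)):
        if deposition[j] == probe[0]:
            for rest in embeddings(probe[1:], deposition, j + 1):
                yield [j] + rest


def conflicts(emb, nbrs):
    return sum(len(set(emb) ^ set(e)) for e in nbrs)


def chessboard(probes, emb, deposition):
    """Alternately re-embed 'green' (even) and 'red' (odd) sites while there is gain."""
    emb = [row[:] for row in emb]
    rows = len(emb)
    improved = True
    while improved:
        improved = False
        for color in (0, 1):
            for r in range(rows):
                for c in range(len(emb[r])):
                    if (r + c) % 2 != color:
                        continue
                    cand = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    nbrs = [emb[rr][cc] for rr, cc in cand
                            if 0 <= rr < rows and 0 <= cc < len(emb[rr])]
                    best = min(embeddings(probes[r][c], deposition),
                               key=lambda e: conflicts(e, nbrs))
                    if conflicts(best, nbrs) < conflicts(emb[r][c], nbrs):
                        emb[r][c] = best
                        improved = True
    return emb


result = chessboard([["ACT", "ACT"]], [[[0, 1, 3], [4, 5, 7]]], "ACGTACGT")
print(result)
```

Every accepted change strictly lowers the total number of conflicts, so the loop terminates.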

29

Comparison of In-Place Probe Alignments
Chip size   LB    TSP+1Thr   Greedy          Chessboard      2x1 Chessboard
                  %LB        %LB     CPU     %LB     CPU     %LB     CPU
100         100   152.0      125.7   40      120.5   54      119.4   480
200         100   150.2      126.3   154     120.9   221     119.7   1915
300         100   149.1      126.7   357     121.5   522     121.6   4349
500         100   147.9      127.1   943     121.4   1423    120.2   15990
  • Post-placement LB = sum of distances to adjacent
    probes
  • Distance between two probes of length p = 2(p - LCS),
    where LCS is the length of their longest common subsequence
  • Useful for assessing quality of algorithms that
    change probe embeddings but do not change probe
    placement

31
3-D vs. 2-D Placement Results
Chip   TSP+1Thr    TSP+1Thr+Chessboard   Epitaxial+Chessboard   SyncSWM+Chessboard   AsyncSWM
size   Cost        Cost        CPU       Cost        CPU        Cost        CPU      Cost        CPU
100    554849      439829      113       419069      274        433274      1        417890      875
200    2140903     1723352     1901      1624988     4441       1693658     46       1636658     3676
300    4667882     3801765     12028     ---         ---        3746722     112      3615282     8406
500    12702474    10426237    109648    ---         ---        10049442    302      9686918     22351
1000   ---         ---         ---       ---         ---        38898792    1307     38005039    54501
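
To read the table as relative improvements, the cost reduction of a flow over the TSP+1Thr baseline can be computed directly from the listed costs. The snippet below copies two rows of the table; the helper name is an illustrative assumption.

```python
def reduction_percent(before, after):
    """Relative cost reduction, in percent."""
    return 100.0 * (before - after) / before


# (chip size) -> costs copied from the table above
costs = {
    100: {"TSP+1Thr": 554849, "TSP+1Thr+Chessboard": 439829, "AsyncSWM": 417890},
    500: {"TSP+1Thr": 12702474, "TSP+1Thr+Chessboard": 10426237, "AsyncSWM": 9686918},
}

for size, row in costs.items():
    base = row["TSP+1Thr"]
    for flow in ("TSP+1Thr+Chessboard", "AsyncSWM"):
        print(size, flow, round(reduction_percent(base, row[flow]), 1))
```

On these rows AsyncSWM lowers the cost by roughly a quarter relative to the purely synchronous TSP+1Thr placement.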
32
3-D Placement Algorithm Comparison: Border Conflict

33
3-D Placement Algorithm Comparison: Runtime

35
Practical Extensions
  • Distance-dependent border conflict weights
  • Take into account conflicts between 2- and 3-hop
    neighbors rather than only immediate neighbors
  • Position-dependent border conflict weights
  • In alignment DP for two sequences, take into
    account the importance of conflicts in the middle of
    probes: alignment cost has weights on conflicts
    that depend on conflict position
  • Polymorphic probes
  • Chip contains SNPs, e.g., pairs of probes
    differing in a single position; they should be
    placed together and alignment DP should align
    them simultaneously
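
As an illustration of the first extension, the placement cost can be generalized so that pairs of sites several hops apart also contribute, scaled by a weight per hop count. Measuring hops as Manhattan distance on the grid and the specific weights used below are assumptions; the slides do not give values.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def weighted_cost(grid, hop_weights):
    """Conflict cost where pairs at hop distance h are scaled by hop_weights[h]."""
    sites = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    total = 0.0
    for i, (r1, c1) in enumerate(sites):
        for r2, c2 in sites[i + 1:]:
            hops = abs(r1 - r2) + abs(c1 - c2)   # assumed: Manhattan distance
            weight = hop_weights.get(hops, 0.0)
            if weight:
                total += weight * hamming(grid[r1][c1], grid[r2][c2])
    return total


row = [["AA", "AC", "CC"]]
print(weighted_cost(row, {1: 1.0}), weighted_cost(row, {1: 1.0, 2: 0.5}))
```

With weights only on 1-hop pairs this reduces to the ordinary border cost. The all-pairs loop is meant for small examples; a practical version would visit only sites within the maximum hop distance.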

36
Alignment DP for 2-SNPs
Optimal Embedding of AC,TT
37
Simplified DNA Array Flow
  • Input: gene sequences, position of SNPs, etc.
  • Probe Selection
  • Mask Design (Placement, Embedding)
  • Mask Manufacturing
  • Array Manufacturing
  • Hybridization Experiment
  • Analysis of Hybridization Intensities
  • Stages belong either to the Soft/Computational Domain
    or to the Hard/Biochemistry Domain

38-43
Enhanced DNA Array Design Flow
  • Probe Selection
  • Design Rules Parameters
  • Probe Pools
  • Deposition Mask Design
  • Test/Control Structure Design
  • Conflict Map
  • Mask Design (Placement, Embedding)

44
Summary
  • Contributions
  • Epitaxial placement: reduces border length by an extra 10% over
    the previously best known method
  • Asynchronous placement problem formulation
  • Post-placement improvement by an extra 15.5-21.8%
  • Lower bounds
  • Scalable placements (1000x1000 in 20 min)
  • Ongoing work
  • Comparison on industrial benchmarks
  • Experiments with algorithms for extended
    formulations (SNPs, distance-dependent weights,
    etc.)
  • Future Directions
  • Design flow enhancements
  • Nucleotide deposition sequence design
  • Partitioning and integration for manufacturing
    cost reduction

45
  • Thank you!