Title: Timing Driven Gate Duplication: Complexity Issues and Algorithms
1. Timing Driven Gate Duplication: Complexity Issues and Algorithms
- Ankur Srivastava, Ryan Kastner and Majid Sarrafzadeh
- Embedded Reconfigurable System Design ER-Group
- UCLA
2. Motivation
- New methodologies for delay improvement are needed in light of the
stringent timing constraints that designers face
- Gate duplication has been studied primarily for cut-set minimization.
Its applicability to improving delay has not been studied by the
research community
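3. Load Dependent Delay Model (LDDM)
[Figure: a gate with input pins i and j; each pin carries its own set of delay parameters.]
- delay(i) = (fixed term of pin i) + (load coefficient of pin i) * C_OUT
- Wire delays are assumed to be zero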
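The delay model can be sketched in a few lines of Python. This is only an illustration, assuming the usual load-dependent form (a per-pin fixed term plus a per-pin coefficient times the output load) and assuming that C_OUT is the sum of the input capacitances of the driven pins. The names `Pin`, `intrinsic`, `load_coeff`, `cap`, `output_load` and `pin_delay` are hypothetical, and the numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """Per-pin parameters of a gate input (hypothetical names)."""
    intrinsic: float   # load-independent part of the pin-to-output delay
    load_coeff: float  # extra delay per unit of output load
    cap: float         # capacitance this pin presents to its driver


def output_load(fanout_pins):
    """Assumed C_OUT: sum of the input capacitances of the driven pins."""
    return sum(p.cap for p in fanout_pins)


def pin_delay(pin, c_out):
    """Delay from `pin` to the gate output; wire delays are taken as zero."""
    return pin.intrinsic + pin.load_coeff * c_out


# Made-up example: pin i of a gate that drives two other pins.
pin_i = Pin(intrinsic=1.0, load_coeff=2.0, cap=0.5)
driven = [Pin(1.0, 2.0, 1.0), Pin(1.0, 2.0, 2.0)]

c_out = output_load(driven)       # 1.0 + 2.0 = 3.0
print(pin_delay(pin_i, c_out))    # 1.0 + 2.0 * 3.0 = 7.0
```

Because the delay depends on the total load, anything that removes fanouts from a gate (such as moving them to a duplicate) shortens that gate's delay.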
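4. Gate Duplication for Delay Improvement
[Figure: example circuit with gates A, B and C and nodes D and E, before duplication. Each gate is annotated with the parameter values 2 and 5, the pins with 1, 1 and 0.1; load capacitances C_D = 15 and C_E = 0.1.]
- Input pin required time = required time at output - gate delay
- Required times before duplication: r = -14 and r = -15.1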
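5. Gate Duplication for Delay Improvement
[Figure: the example circuit with gates A, B and C and node E, after duplication. Each gate is annotated with the parameter values 2 and 5, the pins with 1, 1 and 0.1.]
- Required times after duplication: r = -9 and r = -10.2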
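To see why duplication can raise required times under a load-dependent delay model, here is a small sketch. The numbers and the names `required_at_input`, `heavy` and `critical` are hypothetical; they are not the values of the slide example. A gate drives one heavy, relaxed fanout and one light, critical fanout; after duplication each copy drives one of them.

```python
def required_at_input(intrinsic, load_coeff, fanouts):
    """fanouts: list of (required_time_at_fanout, capacitance) pairs.

    Required time at the gate input = tightest fanout requirement
    minus the gate delay under the total load the gate drives.
    """
    c_out = sum(cap for _, cap in fanouts)
    delay = intrinsic + load_coeff * c_out
    return min(r for r, _ in fanouts) - delay


# Hypothetical gate parameters and fanouts (not the slide's values).
INTRINSIC, LOAD_COEFF = 1.0, 2.0
heavy = (20.0, 4.0)     # relaxed requirement, large capacitance
critical = (5.0, 0.5)   # tight requirement, small capacitance

# Before duplication: one gate drives both fanouts.
before = required_at_input(INTRINSIC, LOAD_COEFF, [heavy, critical])

# After duplication: the original keeps `heavy`, the copy takes `critical`.
after_orig = required_at_input(INTRINSIC, LOAD_COEFF, [heavy])
after_copy = required_at_input(INTRINSIC, LOAD_COEFF, [critical])

print(before)                        # 5 - (1 + 2 * 4.5) = -5.0
print(min(after_orig, after_copy))   # min(20 - 9, 5 - 2) = 3.0
```

The sketch ignores one cost: the copy adds its own input capacitance to the gates that drive it, which slows them down. That trade-off is why duplication is not always beneficial.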
6. Complexity Issues
- Theorem: Global Gate Duplication is NP-Complete in LDDM
- MONO3SAT gets transformed to an instance of the global problem
- Theorem: Local Gate Duplication is NP-Complete
- PARTITION problem gets transformed to an instance of the local problem
7. Complexity Issues (Comparison with Buffer Insertion)
- Local Buffer Insertion Problem: polynomially solvable if the net
topology is fixed
- Global Buffer Insertion Problem: polynomially solvable if the delay
model has the same pin-to-pin parameters
- In situations where buffer insertion is polynomially solvable, gate
duplication becomes NP-Complete
8. Algorithm for Gate Duplication
- Based on the structure of dynamic programming
- Applies duplication to all the gates in the circuit, and hence works
in a proactive mode
- Assumption: the circuit has only single-output combinational gates
9. Algorithm for Gate Duplication
- Stage 1: Traverse the network from primary outputs (POs) to primary
inputs (PIs) in topological order, evaluating tuples at every step
- Stage 2: Traverse the network from PI to PO in topological order,
deciding which gates are to be duplicated
- Stage 3: Traverse the network from PO to PI, physically duplicating
the gates
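The three passes can be sketched as a traversal skeleton. This is a minimal illustration, not the authors' implementation: the netlist `fanouts`, the function `run_three_stages` and its callback parameters are hypothetical, and the per-gate work of each stage is left as a placeholder. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical netlist: gate -> gates it drives (its fanouts).
fanouts = {
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["e"],
    "d": ["e"],
    "e": [],          # drives a primary output
}

# TopologicalSorter expects node -> predecessors and emits predecessors
# first. Passing the fanout map therefore yields every gate after all of
# its fanouts, i.e. a PO-to-PI order.
po_to_pi = list(TopologicalSorter(fanouts).static_order())
pi_to_po = list(reversed(po_to_pi))


def run_three_stages(evaluate, decide, duplicate):
    """Call one placeholder callback per gate in each of the three passes."""
    for gate in po_to_pi:      # Stage 1: evaluate tuples
        evaluate(gate)
    for gate in pi_to_po:      # Stage 2: decide which gates to duplicate
        decide(gate)
    for gate in po_to_pi:      # Stage 3: physically duplicate
        duplicate(gate)


log = []
run_three_stages(
    evaluate=lambda g: log.append(("eval", g)),
    decide=lambda g: log.append(("decide", g)),
    duplicate=lambda g: log.append(("dup", g)),
)
print(po_to_pi)
```

Visiting a gate only after all of its fanouts is what lets Stage 1 reuse the tuples already computed for those fanouts, which is the dynamic-programming structure the previous slide refers to.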
10. Stage 1
[Figure: gate g with input pin i and its fanouts.]
- Need to find the best duplication strategy of the fanouts such that
the required time at input pin i is maximized
- Tuple for pin i of gate g: tup(i,g).dup.r_small, tup(i,g).dup.r_large,
tup(i,g).nodup
11. Stage 1
[Figure: gate g with input pin i and its fanouts.]
- Need to find the best duplication strategy of the fanouts and the best
fanout partitioning between g and its duplicate g' such that the
required time at input pin i is maximized
- Tuple for pin i of gate g: tup(i,g).dup.r_small, tup(i,g).dup.r_large,
tup(i,g).nodup
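A brute-force sketch of the fanout-partitioning subproblem follows. It is an illustration under stated assumptions, not the authors' method: the objective (maximize the smaller of the two input required times, then the larger) is a guess at how r_small and r_large might be ranked, the load-dependent delay is taken as a fixed term plus a coefficient times the total fanout capacitance, and the names `required_at_input` and `best_partition` are hypothetical.

```python
from itertools import combinations


def required_at_input(intrinsic, load_coeff, fanouts):
    """fanouts: list of (required_time, capacitance) pairs, non-empty."""
    c_out = sum(cap for _, cap in fanouts)
    return min(r for r, _ in fanouts) - (intrinsic + load_coeff * c_out)


def best_partition(intrinsic, load_coeff, fanouts):
    """Try every split of `fanouts` between a gate and its duplicate.

    Returns (r_small, r_large, side_a, side_b) for the best split, or None
    when there are fewer than two fanouts. The search is exponential in
    the number of fanouts, so it only suits tiny examples.
    """
    n = len(fanouts)
    others = range(1, n)
    best = None
    # Fanout 0 always stays with the original gate; this skips mirror images.
    for k in range(n - 1):                 # side_b must stay non-empty
        for chosen in combinations(others, k):
            side_a = [fanouts[0]] + [fanouts[i] for i in chosen]
            side_b = [fanouts[i] for i in others if i not in chosen]
            ra = required_at_input(intrinsic, load_coeff, side_a)
            rb = required_at_input(intrinsic, load_coeff, side_b)
            cand = (min(ra, rb), max(ra, rb), side_a, side_b)
            if best is None or cand[:2] > best[:2]:
                best = cand
    return best


# Made-up fanouts with equal required times and capacitances 3, 1, 2.
example = [(10.0, 3.0), (10.0, 1.0), (10.0, 2.0)]
r_small, r_large, side_a, side_b = best_partition(0.0, 1.0, example)
print(r_small, r_large)   # the balanced split {3} | {1, 2} gives 7.0 7.0
```

When all fanouts have the same required time, as in this example, the search amounts to balancing capacitance between the two copies. That resemblance to PARTITION gives some intuition for the hardness result on the Complexity Issues slide, although it is not the reduction itself.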
12. Stage 1
- NODUP: Sort the fanouts and duplicate in that order (n + 1 duplication
strategies in total)
- RESULT: This algorithm is optimal
[Figure: gate g.]
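The idea of trying only the n + 1 prefixes of a sorted fanout list can be sketched as follows. The slide does not say which sort key is used, so everything specific here is an assumption: fanouts are sorted by their non-duplicated required time (most critical first), duplicating a fanout replaces its requirement `r_nodup` by `r_dup` and makes the gate drive one extra copy of that fanout's input pin, and the function name `best_nodup_strategy` is hypothetical.

```python
def best_nodup_strategy(intrinsic, load_coeff, fanouts):
    """Pick how many fanouts to duplicate when the gate itself is not.

    fanouts: non-empty list of (r_nodup, r_dup, cap) triples.
    Returns (best_required_time_at_input, number_of_duplicated_fanouts).
    """
    order = sorted(fanouts, key=lambda f: f[0])    # assumed sort key
    base_load = sum(cap for _, _, cap in order)
    best = None
    for k in range(len(order) + 1):                # the n + 1 prefixes
        dup, keep = order[:k], order[k:]
        # Assumption: each duplicated fanout adds one more pin to drive.
        load = base_load + sum(cap for _, _, cap in dup)
        tightest = min([r_dup for _, r_dup, _ in dup] +
                       [r_nodup for r_nodup, _, _ in keep])
        value = tightest - (intrinsic + load_coeff * load)
        if best is None or value > best[0]:
            best = (value, k)
    return best


# Made-up fanouts: (r_nodup, r_dup, cap).
example = [(8.0, 9.0, 1.0), (2.0, 9.0, 1.0), (10.0, 12.0, 1.0)]
print(best_nodup_strategy(0.0, 1.0, example))      # (4.0, 1)
```

In this simplified model, duplicating only the single most critical fanout is best: it lifts the tightest requirement from 2 to 8 at the cost of one extra unit of load, and duplicating further fanouts adds load without raising the requirement enough to pay for it.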
13. Stage 1
[Figure: gate g.]
14. Stage 2
- Stage 2: Forward traversal in topologically sorted order
[Figure: circuit annotated with the values 1 and 0.]
15. Stage 3
- Stage 3: Traverse the circuit backwards from PO to PI, physically
duplicating the gates
16. Experimental Results
- The circuit was first optimized using script.rugged of SIS, followed
by speed_up
- Results were obtained in two categories: one with minimum-delay
technology mapping (map -n 1), the other with minimum-delay technology
mapping with fanout optimization (map -n 1 -AFG)
17. Experimental Results (map -n 1)
18. Experimental Results (map -n 1 -AFG)
19. Conclusion
- We presented an algorithm for gate duplication and showed its
effectiveness in reducing circuit delay, both with and without buffer
insertion
- We proved the local problem NP-Complete
- Future work includes extending this algorithm to a layout-driven
framework