Title: Post-Layout Logic Optimization of Domino Circuits
1Post-Layout Logic Optimization of Domino Circuits
- Aiqun Cao and Cheng-Kok Koh
- School of Electrical and Computer Engineering
- Purdue University
This work was supported in part by NSF under
contract number CCR-9984553
2Outline
- Introduction
- Timing constraint on Domino circuit
- Logic optimization
- Post-layout optimization
- Experimental results
- Conclusions
3Introduction
- Domino logic synthesis
- Only non-inverting logic can be implemented
- Binate-to-unate logic network transformation is imperative
- Inverters pushed toward primary inputs by applying DeMorgan's law (bubble pushing)
- Inverters trapped at intermediate fan-out nets
- Logic duplication applied to remove trapped inverters
- Size of circuit doubled in the worst case
- High area and power penalties incurred
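The bubble-pushing step above can be illustrated with a minimal Python sketch. The netlist encoding, the function names, and the example circuit are all hypothetical and are not taken from the slides. The sketch walks back from the outputs, absorbs each inverter by flipping the requested phase (DeMorgan's law), and reports the gates that end up being needed in both phases, which are the ones logic duplication would have to copy.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical netlist encoding: node name -> (gate type, fan-in names).
# Gate types: "IN" (primary input), "AND", "OR", "NOT".
Netlist = Dict[str, Tuple[str, Tuple[str, ...]]]


def required_phases(net: Netlist, outputs: List[str]) -> Dict[str, Set[bool]]:
    """For every non-inverter node, collect the phases it is needed in.

    True is the positive phase, False the complemented phase
    (an AND becomes an OR of complemented fan-ins, and vice versa).
    """
    phases: Dict[str, Set[bool]] = {}
    stack = [(out, True) for out in outputs]
    while stack:
        name, positive = stack.pop()
        kind, fanins = net[name]
        if kind == "NOT":
            # Bubble pushing: drop the inverter and flip the requested phase.
            stack.append((fanins[0], not positive))
            continue
        seen = phases.setdefault(name, set())
        if positive in seen:
            continue
        seen.add(positive)
        # By DeMorgan's law the fan-ins are needed in the same phase as the gate.
        stack.extend((f, positive) for f in fanins)
    return phases


def duplicated_gates(net: Netlist, outputs: List[str]) -> List[str]:
    """Gates (not primary inputs) that must exist in both phases."""
    phases = required_phases(net, outputs)
    return sorted(n for n, p in phases.items()
                  if len(p) == 2 and net[n][0] != "IN")


# Example: g1 fans out directly to g2 and through an inverter to g3,
# so the inverter is trapped at g1's fan-out net.
example_net: Netlist = {
    "a": ("IN", ()), "b": ("IN", ()), "c": ("IN", ()),
    "g1": ("AND", ("a", "b")),
    "n1": ("NOT", ("g1",)),
    "g2": ("OR", ("g1", "c")),
    "g3": ("AND", ("n1", "c")),
}
print(duplicated_gates(example_net, ["g2", "g3"]))
```

Primary inputs are excluded from the duplicate count on the assumption that both input phases are available at the circuit boundary.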
4Trapped inverters
Trapped inside logic reconvergent paths
Trapped outside logic reconvergent paths
6Previous work
- Previous work on reducing logic duplication cost for Domino logic synthesis
- Assign proper phases to primary outputs to minimize inverters trapped outside reconvergent paths
- Limited reduction on duplication cost
- Apply mixed static-Domino circuit to reduce duplication cost caused by inverters inside reconvergent paths
- Allow inverters trapped in the logic network
- Follow Domino logic with static CMOS logic
- Compromise circuit performance
7Our work
- Minimize logic duplication cost in pure Domino logic circuit
- Satisfy a certain timing constraint
- Allow inverters trapped inside reconvergent paths
- Reduce area and power substantially
8Early-Late Delay Difference Bound (ELDDB) constraint
10Early-Late Delay Difference Bound (ELDDB) constraint
- Cause false discharge of P!
12Candidate gate
- Candidate gate
- Must be an AND gate
- Fan-ins satisfy the ELDDB constraint
15Conflict
[Figure: conflict between candidate gates]
17Output phase assignment
- Problem
- Search for the optimal output phase assignment such that logic duplication cost is minimized
- Formulation
- Construct an incompatibility graph G(V,E)
- A vertex in V corresponds to a pair of reconvergent paths with a weight equal to the duplication cost of the fan-in cone of the corresponding reconvergent paths
- An edge in E connects two incompatible vertices
- Search for the least-weight vertex cover, which corresponds to searching for the optimal output phase assignment
- Solution
- Branch and bound algorithm
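The vertex cover formulation above can be sketched in a few lines of Python. This is a generic textbook branch and bound for minimum-weight vertex cover, not necessarily the authors' algorithm; the function name, the vertex labels, and the weights in the example are made up for illustration. One plausible reading, assumed here, is that vertices left out of the cover are mutually compatible and avoid duplication, while vertices in the cover pay their duplication cost.

```python
from typing import Dict, List, Set, Tuple


def min_weight_vertex_cover(
    weights: Dict[str, float],
    edges: List[Tuple[str, str]],
) -> Tuple[float, Set[str]]:
    """Exact minimum-weight vertex cover by branch and bound.

    Branching rule: every edge needs at least one endpoint in the cover,
    so pick an uncovered edge and try each endpoint in turn.
    Bounding rule: abandon a branch once its cost reaches the best known.
    """
    best_cost = sum(weights.values())
    best_cover = set(weights)  # taking every vertex is always a valid cover

    def search(remaining: List[Tuple[str, str]],
               cover: Set[str], cost: float) -> None:
        nonlocal best_cost, best_cover
        if cost >= best_cost:
            return  # bound: cannot improve on the incumbent
        uncovered = [(u, v) for (u, v) in remaining
                     if u not in cover and v not in cover]
        if not uncovered:
            best_cost, best_cover = cost, set(cover)
            return
        u, v = uncovered[0]
        for pick in (u, v):
            cover.add(pick)
            search(uncovered[1:], cover, cost + weights[pick])
            cover.discard(pick)

    search(list(edges), set(), 0.0)
    return best_cost, best_cover


# Hypothetical incompatibility graph: three reconvergent-path pairs,
# where p2 is incompatible with both p1 and p3.
example_weights = {"p1": 3.0, "p2": 1.0, "p3": 2.0}
example_edges = [("p1", "p2"), ("p2", "p3")]
cover_cost, cover = min_weight_vertex_cover(example_weights, example_edges)
print(cover_cost, sorted(cover))
```

The search is exponential in the worst case, which is acceptable only when the incompatibility graph is small; the slides do not state graph sizes.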
18Robustness
- Question
- No accurate timing info at logic level
- Robustness?
- Solution
- Delay the logic duplication minimization until post-layout with accurate timing info
19Post-layout optimization
- Logic level preprocessing step
- Apply the output phase assignment synthesis scheme to produce maximum potential candidates
- with estimated timing info
- Layout
- Place and route the circuit with fan-in cones of all reconvergent paths duplicated
- Post-layout optimization
- Post-layout timing analysis
- Eliminate the duplicated fan-in cones of real candidates
- Satisfy the ELDDB constraint after elimination
- without layout compaction
- with layout compaction
20Logic level preprocessing: output phase assignment
Initial placement and routing
Post-layout timing analysis
Real candidate cells
SE: Simultaneous Elimination without compaction
IEC: Iterative Elimination with Compaction
End
22Simultaneous Elimination without compaction (SE)
- Remove redundant duplicated fan-in cones for all real candidates simultaneously
- Keep empty space after elimination without affecting other parts of the layout
- Maintain timing behavior for the majority of the circuit
- Reduce power consumption
24Iterative Elimination with Compaction (IEC)
Initial layout
Eliminate the duplicated fan-in cone of one candidate
Scale and legalize placement
Route
Constraints satisfied? If no, restore previous valid layout
More candidates? If yes, repeat from the elimination step; if no, end
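The IEC flow above amounts to a greedy accept-or-roll-back loop. The sketch below assumes the layout tools are wrapped as callbacks; `eliminate`, `compact_and_route`, and `constraints_ok` are hypothetical names standing in for cone removal, placement scaling plus legalization plus routing, and the post-layout timing check respectively. It is meant only to show the control flow, not any real tool interface.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Layout = TypeVar("Layout")
Candidate = TypeVar("Candidate")


def iterative_elimination(
    layout: Layout,
    candidates: Iterable[Candidate],
    eliminate: Callable[[Layout, Candidate], Layout],
    compact_and_route: Callable[[Layout], Layout],
    constraints_ok: Callable[[Layout], bool],
) -> Tuple[Layout, List[Candidate]]:
    """Try candidates one at a time, keeping only changes that pass the check."""
    accepted: List[Candidate] = []
    for cand in candidates:
        # Work on a trial layout so the last valid layout is never lost.
        trial = compact_and_route(eliminate(layout, cand))
        if constraints_ok(trial):
            layout = trial
            accepted.append(cand)
        # Otherwise `layout` is left untouched, which plays the role of
        # "restore previous valid layout" in the flowchart.
    return layout, accepted


# Toy usage: a layout is a frozenset of cell names, and we pretend that
# removing "g2_dup" violates the timing constraint.
start = frozenset({"g1", "g1_dup", "g2", "g2_dup", "g3"})
final_layout, kept = iterative_elimination(
    start,
    ["g1_dup", "g2_dup"],
    eliminate=lambda lay, c: lay - {c},
    compact_and_route=lambda lay: lay,
    constraints_ok=lambda lay: "g2_dup" in lay,
)
print(sorted(final_layout), kept)
```

Treating layouts as immutable values makes the rollback free; with a mutable layout database, an explicit snapshot and restore would be needed instead.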
26Placement scaling and legalization
- Placement scaling
- Scale placement area after eliminating redundant cells
- Relocate each cell by scaling its center's coordinates according to the scaling ratio of the area
- Cause cells to overlap
- Placement legalization
- Maintain signal timing behavior by keeping circuit topology
- Adopt a dynamic programming based row-by-row legalization approach in Fengshui [Agnihotri et al., ICCAD'03]
- Minimize displacement for cells after legalization
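A small Python sketch may help make the two steps above concrete. Two assumptions are made that the slides do not confirm: the placement is scaled uniformly, so each coordinate is multiplied by the square root of the area ratio, and legalization is shown as a simple greedy order-preserving pass on one row, which is a stand-in for, not a reproduction of, the dynamic programming legalizer cited above. All function names and numbers are illustrative.

```python
import math
from typing import Dict, List, Tuple


def scale_centers(
    centers: Dict[str, Tuple[float, float]],
    old_area: float,
    new_area: float,
) -> Dict[str, Tuple[float, float]]:
    """Scale cell centers toward the origin for a smaller placement area.

    Assumes uniform scaling: each axis shrinks by sqrt(new_area / old_area).
    """
    s = math.sqrt(new_area / old_area)
    return {name: (x * s, y * s) for name, (x, y) in centers.items()}


def legalize_row(
    cells: List[Tuple[str, float, float]],
    row_width: float,
) -> Dict[str, float]:
    """Remove overlaps in one row while preserving left-to-right order.

    `cells` holds (name, desired left edge, width). A forward pass pushes
    overlapping cells right; a backward pass pulls cells back inside the row.
    Assumes the total cell width fits within `row_width`.
    """
    ordered = sorted(cells, key=lambda c: c[1])
    lefts: List[float] = []
    cursor = 0.0
    for _, x, width in ordered:
        left = max(x, cursor)
        lefts.append(left)
        cursor = left + width
    limit = row_width
    for i in range(len(ordered) - 1, -1, -1):
        width = ordered[i][2]
        lefts[i] = min(lefts[i], limit - width)
        limit = lefts[i]
    return {ordered[i][0]: lefts[i] for i in range(len(ordered))}


scaled = scale_centers({"a": (10.0, 20.0), "b": (30.0, 40.0)}, 100.0, 81.0)
row = legalize_row([("a", 0.0, 4.0), ("b", 2.0, 4.0), ("c", 9.0, 4.0)], 12.0)
print(scaled, row)
```

Preserving the left-to-right order of cells in each row is one simple way to keep the relative topology, in the spirit of the timing-preservation goal stated above.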
27Experiments
- Domino cell library
- up-to-three-input AND gates
- up-to-six-input OR gates
- Each cell has a footer and a half-latch keeper
- Add an output pin for the internal dynamic node of each cell
- QPLACE and WROUTE for initial placement and routing
- Post-layout simulations by NanoSim and PathMill
- The ELDDB value B is set to 2 ns
- Experiments on ten ISCAS benchmark circuits
28Power comparison
DUP 1.0 SYN 1.0 SE 0.70 IEC 0.79
29Area comparison
DUP 1.0 SYN 0.99 SE 0.99 IEC 0.81
30Critical path delay comparison
DUP 1.0 SYN 0.99 SE 0.98 IEC 0.91
31Conclusions
- Propose a synthesis scheme for Domino logic to reduce the duplication cost caused by reconvergent paths
- Present two post-layout optimization approaches to guarantee the robustness of the synthesis scheme
- Achieve significant reductions in power and area, and may improve delay as a by-product