Title: Cost-aware synthesis of asynchronous circuits based on partial acknowledgement
1. Cost-aware synthesis of asynchronous circuits based on partial acknowledgement
- Yu Zhou, Danil Sokolov and Alex Yakovlev
- University of Newcastle upon Tyne
2. Contents
- Motivation of this research
- Overall design approach and background in self-timed designs using delay-insensitive (DI) codes
- What is partial acknowledgement
- Area reduction of asynchronous circuits using partial acknowledgement
- Design flow 1: a unate covering problem (UCP)
- Design flow 2: a binate covering problem (BCP)
- Experimental results
- Conclusion and future work
3. Motivation
- Address the poor CAD tool support for asynchronous circuit design by reusing synchronous logic synthesis tools in our design flows.
- Reduce the area of an asynchronous implementation in a cost-aware manner by employing partial acknowledgement in the two design flows.
- Speed up an asynchronous implementation and design speed-independent circuits that respect the real data flow (current work).
- During synthesis, locate the wires in an asynchronous implementation whose routing needs special attention.
4. General Design Approach
- The input of the design flow is the structural description of a synchronous circuit: the single-rail (SR) netlist.
- The output of the design flow is a self-timed implementation of the input synchronous circuit, obtained by mapping each gate in the SR netlist to a dual-rail (DR) encoded functional module with the equivalent function.
- Both design flows use the return-to-zero (RTZ) protocol, in which wavefronts of valid code words and nulls flow alternately.
- Design flow 1 (DF1) replaces each gate in the SR netlist with a coarse-grain functional module, while design flow 2 (DF2) uses fine-grain modules to reduce the area further.
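The dual-rail encoding and the RTZ protocol described above can be illustrated with a minimal Python sketch. The rail ordering `(true_rail, false_rail)` and the helper names `encode`, `decode` and `rtz_stream` are assumptions made for this illustration; they do not come from the slides.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed convention: a dual-rail signal is (true_rail, false_rail).
Rail = Tuple[int, int]
NULL: Rail = (0, 0)  # the spacer between two valid code words


def encode(bit: int) -> Rail:
    """Map a single-rail bit to its dual-rail code word."""
    return (1, 0) if bit else (0, 1)


def decode(word: Rail) -> Optional[int]:
    """Return 0 or 1 for a valid code word and None for null."""
    if word == NULL:
        return None
    if word == (1, 1):
        raise ValueError("(1, 1) is not a legal dual-rail code word")
    return 1 if word == (1, 0) else 0


def rtz_stream(bits: Iterable[int]) -> List[Rail]:
    """Follow every valid code word with a null, as in return-to-zero."""
    stream: List[Rail] = []
    for bit in bits:
        stream.append(encode(bit))
        stream.append(NULL)
    return stream
```

For example, `rtz_stream([1, 0])` yields a valid 1, a null, a valid 0 and another null, which is the alternation of wavefronts that both design flows rely on.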
5. Background: dual-rail, input-complete (IC) class functional modules
- The delay-insensitive minterm synthesis (DIMS) dual-rail encoding
Figure: 2-input AND gate
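As a behavioural illustration of a DIMS 2-input AND gate, the sketch below models one Muller C-element per input minterm and ORs the three minterms that produce a 0. It is a simplified, zero-delay model; the class names `CElement` and `DimsAnd2` and the `(true_rail, false_rail)` ordering are assumptions of this sketch, not taken from the slides.

```python
from typing import Dict, Tuple

Rail = Tuple[int, int]  # assumed ordering: (true_rail, false_rail)


class CElement:
    """Muller C-element: the output follows the inputs when they agree, else holds."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, x: int, y: int) -> int:
        if x == y:
            self.out = x
        return self.out


class DimsAnd2:
    """DIMS-style dual-rail AND: one C-element per minterm of (a, b)."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, int], CElement] = {
            (i, j): CElement() for i in (0, 1) for j in (0, 1)
        }

    @staticmethod
    def _rail(word: Rail, value: int) -> int:
        # Pick the rail that is high when the signal carries `value`.
        return word[0] if value else word[1]

    def update(self, a: Rail, b: Rail) -> Rail:
        m = {
            (i, j): cell.update(self._rail(a, i), self._rail(b, j))
            for (i, j), cell in self.cells.items()
        }
        y_true = m[(1, 1)]                           # a = 1 and b = 1
        y_false = m[(0, 0)] | m[(0, 1)] | m[(1, 0)]  # any minterm giving 0
        return (y_true, y_false)
```

Because every minterm waits for one rail of each input, the output stays null until both inputs are valid and stays valid until both inputs are null.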
6. Synthesis of the basic functional modules: the input-complete (IC) class
- The null convention logic (NCL) implementation of a 2-input AND gate based on threshold logic.
Figure: NCL 2-input AND gate. Labels: OR; threshold gate with weighted inputs
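The behaviour of a threshold gate with hysteresis can be sketched as follows: the output is set once the weighted input sum reaches the threshold and is reset only when all inputs are low. The particular mapping of the AND gate onto a TH22 gate and a weighted TH34w22 gate is an assumption of this sketch, since the slide's figure is not available; the class names and the `(true_rail, false_rail)` ordering are likewise hypothetical.

```python
from typing import Sequence, Tuple

Rail = Tuple[int, int]  # assumed ordering: (true_rail, false_rail)


class ThresholdGate:
    """Threshold gate with hysteresis, as used in NCL."""

    def __init__(self, threshold: int, weights: Sequence[int]) -> None:
        self.threshold = threshold
        self.weights = tuple(weights)
        self.out = 0

    def update(self, inputs: Sequence[int]) -> int:
        total = sum(w * x for w, x in zip(self.weights, inputs))
        if total >= self.threshold:
            self.out = 1      # set: enough weighted inputs are high
        elif not any(inputs):
            self.out = 0      # reset: only when every input is low
        return self.out       # otherwise hold the previous value


class NclAnd2:
    """Input-complete dual-rail AND built from two threshold gates (assumed mapping)."""

    def __init__(self) -> None:
        self.true_gate = ThresholdGate(2, (1, 1))         # TH22(a.t, b.t)
        self.false_gate = ThresholdGate(3, (2, 2, 1, 1))  # TH34w22(a.f, b.f, a.t, b.t)

    def update(self, a: Rail, b: Rail) -> Rail:
        y_true = self.true_gate.update((a[0], b[0]))
        y_false = self.false_gate.update((a[1], b[1], a[0], b[0]))
        return (y_true, y_false)
```

With these weights the false rail rises for the input pairs (0, 0), (0, 1) and (1, 0) but not while one input is still null, which is the input-complete behaviour.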
7. Synthesis of the basic functional modules: the input-complete (IC) class
- The IC class of functional modules synchronises its inputs in both the valid code word and the null wavefronts, i.e., the output becomes a valid code word (null) only after all the inputs have become valid code words (nulls).
- A small timing assumption exists for certain wires in the IC functional module: the ends of those wires must settle before the next wavefront arrives. E.g., the dashed inputs should satisfy this requirement when both inputs have value 1 in the valid code word wavefront.
Figure: IC module example with a = 1, b = 1, y = 1
8. Synthesis of the basic functional modules: the early-propagative (EP) class
- Unlike in the IC class, a valid code word (null) can propagate to the output of an EP functional module without waiting for the other inputs to become valid code words (nulls).
Figure: EP module implementations in threshold logic and in static CMOS
9. Synthesis of the basic functional modules: the early-propagative (EP) class
- A valid code word (null) can propagate to the output of an EP functional module without waiting for the other inputs to become valid code words (nulls).
Figure: EP module example with a = 0, y = 0
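Early propagation can be shown with a minimal, purely combinational model of a dual-rail AND gate. The function name `ep_and2` and the `(true_rail, false_rail)` ordering are assumptions of this sketch.

```python
from typing import Tuple

Rail = Tuple[int, int]  # assumed ordering: (true_rail, false_rail)


def ep_and2(a: Rail, b: Rail) -> Rail:
    """Early-propagative dual-rail AND with no state-holding elements."""
    a_t, a_f = a
    b_t, b_f = b
    y_true = a_t & b_t   # 1 only when both inputs carry 1
    y_false = a_f | b_f  # 0 as soon as either input carries 0
    return (y_true, y_false)
```

With `a` carrying 0 and `b` still null, `ep_and2((0, 1), (0, 0))` already returns the valid code word for 0, so the transition on `b` is not observed at the output.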
10. Implementing asynchronous circuits using NCL-D
- NCL-D replaces each gate in the SR netlist with an IC-type functional module.
- Robust, because no input or internal wire needs verification for timing closure
- Large area, because of the constituent IC modules
- Slow, because it synchronises the inputs to each gate
A circuit example
NCL-D implementation of the example
11. Implementing asynchronous circuits using NCL-X
- Two constituent parts:
  - Functional part, obtained by replacing each gate with the corresponding EP module
  - Completion detection (CD) circuitry made up of OR gates and C-element trees
- Reduces the area of the functional part but has more interconnects
- Some inter-module wires need timing verification (shown as dashed lines)
- Real data flows in the functional part, but the explicit CD circuitry still synchronises the input and internal signals
NCL-X implementation of the example
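A completion detector of the kind described above can be sketched as one OR gate per dual-rail signal feeding a tree of 2-input C-elements. This is a zero-delay behavioural model; the class names and the tree-building strategy (pairing neighbours and passing an odd signal through to the next level) are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

Rail = Tuple[int, int]  # assumed ordering: (true_rail, false_rail)


class CElement:
    """Muller C-element: the output follows the inputs when they agree, else holds."""

    def __init__(self) -> None:
        self.out = 0

    def update(self, x: int, y: int) -> int:
        if x == y:
            self.out = x
        return self.out


class CompletionDetector:
    """OR gate per signal plus a tree of 2-input C-elements."""

    def __init__(self, n_signals: int) -> None:
        if n_signals < 1:
            raise ValueError("need at least one signal")
        self.levels: List[List[CElement]] = []
        width = n_signals
        while width > 1:
            pairs = width // 2
            self.levels.append([CElement() for _ in range(pairs)])
            width = pairs + (width % 2)

    def update(self, signals: Sequence[Rail]) -> int:
        # A dual-rail signal is valid when either rail is high.
        layer = [t | f for t, f in signals]
        for cells in self.levels:
            nxt = [c.update(layer[2 * i], layer[2 * i + 1])
                   for i, c in enumerate(cells)]
            if len(layer) % 2:
                nxt.append(layer[-1])  # odd signal passes to the next level
            layer = nxt
        return layer[0]
```

The `done` output rises only after every monitored signal is valid and falls only after every one is null, which is how the CD circuitry acknowledges both phases.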
12. Partial acknowledgement
Figure: input a fans out to G1 (DR-OR, output b) and G2 (DR-AND, output c)
a is the dual-rail input to G1 and G2, the dual-rail encoded functional modules. b and c are the dual-rail outputs of G1 and G2, respectively.
13. Partial acknowledgement in the rising phase transition
Figure: input a fans out to G1 (DR-OR, output b) and G2 (DR-AND, output c)
14. Partial acknowledgement in the falling phase transition
Figure: input a fans out to G1 (DR-OR, output b) and G2 (DR-AND, output c)
15. Revisiting NCL-D and NCL-X
- Every input to an IC-class functional module is partially acknowledged by the module's output for both its rising and falling phase transitions.
- No input to an EP-class functional module is partially acknowledged by the module's output for its rising or falling phase transition.
- In NCL-D, both the rising and falling phase transitions of an input (internal) circuit variable are partially acknowledged by all the functional modules it fans into. Therefore, no timing verification is required for any inter-module wire.
- In NCL-X, both the rising and falling phase transitions of an input (internal) circuit variable are partially acknowledged only by the CD circuitry it fans into. Therefore, the wires fanning into the EP functional modules in the computational part of the circuit need timing verification.
16. Any other implementations of the circuit example?
YES!
- Both IC and EP functional modules are used for the mapping, where b and c are partially acknowledged by e and g, respectively, for both rising and falling phase transitions.
- Reduces the area compared with the previous methods
- Fewer wires require timing verification (shown as dashed lines)
Intuitive implementation 1
17. Design Objective
- Implement an asynchronous circuit from the synchronous circuit by replacing each gate in the SR netlist with a DR functional module of a suitable type and equivalent functionality. Find the implementation with minimum area cost (in terms of transistor count), subject to the requirement that both the rising and falling phase transitions of each input and internal variable are partially acknowledged.
- DF1: the functional module prototypes include only the IC and EP classes. The design is formulated as a UCP.
- DF2: additional fine-grained functional modules are used as prototypes, and the design is formulated as a BCP.
18. Design flow 1
- A circuit variable is partially acknowledged by the IC-class functional modules it fans into, so all partial acknowledgement is concentrated at the inputs to the IC functional modules in the circuit.
- DF1 is formulated as a UCP (given here as a constraint function, though the matrix form is also available).
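The unate covering formulation can be sketched with a brute-force solver: each circuit variable contributes one clause requiring that at least one gate it fans into is mapped to an IC module, and the cheapest satisfying choice is kept. The netlist, the transistor counts and the function name `solve_ucp` below are hypothetical and serve only as an illustration; exhaustive search is also a stand-in for a real covering solver.

```python
from itertools import product
from typing import Dict, List, Optional, Set, Tuple


def solve_ucp(
    fanout: Dict[str, List[str]],
    cost_ic: Dict[str, int],
    cost_ep: Dict[str, int],
) -> Optional[Tuple[int, Set[str]]]:
    """Return (total cost, set of gates mapped to IC), or None if infeasible.

    fanout maps each input or internal variable to the gates it fans into.
    Every clause contains only positive literals, so the problem is unate.
    """
    gates = sorted(cost_ic)
    best: Optional[Tuple[int, Set[str]]] = None
    for choice in product((False, True), repeat=len(gates)):
        is_ic = dict(zip(gates, choice))
        # One clause per variable: some gate in its fanout must be IC.
        covered = all(any(is_ic[g] for g in sinks) for sinks in fanout.values())
        if not covered:
            continue
        total = sum(cost_ic[g] if is_ic[g] else cost_ep[g] for g in gates)
        if best is None or total < best[0]:
            best = (total, {g for g in gates if is_ic[g]})
    return best


# Hypothetical netlist: e and f are the outputs of G1 and G2.
FANOUT = {"a": ["G1", "G2"], "b": ["G1"], "c": ["G2", "G3"],
          "e": ["G3"], "f": ["G3"]}
COST_IC = {"G1": 30, "G2": 30, "G3": 30}  # made-up transistor counts
COST_EP = {"G1": 12, "G2": 12, "G3": 12}
RESULT = solve_ucp(FANOUT, COST_IC, COST_EP)
```

In this made-up example the clauses for `b`, `e` and `f` force G1 and G3 to be IC modules, which already covers `a` and `c`, so G2 can be the cheaper EP module.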
19. DF1: circuit example
Figure: circuit example with gates G1, G2 and G3
Exactly the intuitive implementation!
Concurrently with our work on design flow 1, Cheoljoo Jeong and Steve Nowick contributed a similar solution that reduces the overhead of an asynchronous implementation while maintaining its robustness. Ref: "Optimization of robust asynchronous circuits by local input completeness relaxation", by Cheoljoo Jeong and Steve Nowick, Columbia University, accepted by ASP-DAC 2007.
20. Design flow 2
- We implement functional modules that can partially acknowledge a particular phase transition (rising, falling, or both rising and falling).
- Adding these new prototypes enables a more flexible design when tuning the circuit's area, performance and robustness.
- DF2 is formulated as a BCP. In its formulation, the clause of a circuit variable is the sum of all the functional modules that can partially acknowledge it. However, the constraint function of DF2 has extra terms that exclude the possibility of casting the same gate as different prototypes, which is why the binate covering problem fits our task.
- We first introduce the synthesis methods used in DF2.
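The binate covering formulation can be sketched as follows: one Boolean variable per (gate, prototype) pair, one positive clause per (variable, phase) that must be acknowledged, and negative-literal clauses that forbid casting one gate as two prototypes. The prototype names, the costs, the added "each gate gets a prototype" clause and the brute-force search are assumptions of this illustration, not the authors' solver.

```python
from itertools import combinations, product
from typing import Dict, Iterable, List, Optional, Set, Tuple

Var = Tuple[str, str]        # (gate, prototype)
Need = Tuple[str, str]       # (circuit variable, phase)
Literal = Tuple[Var, bool]   # (variable, required polarity)


def build_clauses(options: Dict[str, Dict[str, Set[Need]]],
                  required: Iterable[Need]) -> List[List[Literal]]:
    clauses: List[List[Literal]] = []
    for need in required:
        # Covering clause: some chosen prototype must acknowledge this need.
        clauses.append([((g, p), True) for g, protos in options.items()
                        for p, acks in protos.items() if need in acks])
    for g, protos in options.items():
        clauses.append([((g, p), True) for p in protos])  # gate is implemented
        for p, q in combinations(protos, 2):
            # Negative literals make the problem binate.
            clauses.append([((g, p), False), ((g, q), False)])
    return clauses


def solve_bcp(clauses: List[List[Literal]],
              cost: Dict[Var, int]) -> Optional[Tuple[int, Set[Var]]]:
    names = sorted(cost)
    best: Optional[Tuple[int, Set[Var]]] = None
    for bits in product((False, True), repeat=len(names)):
        x = dict(zip(names, bits))
        if all(any(x[v] == pol for v, pol in cl) for cl in clauses):
            total = sum(cost[v] for v in names if x[v])
            if best is None or total < best[0]:
                best = (total, {v for v in names if x[v]})
    return best


# Hypothetical example: variable a fans into G1 and G2.
ACKS = {"EP": set(), "RISE": {("a", "rise")}, "FALL": {("a", "fall")},
        "IC": {("a", "rise"), ("a", "fall")}}
OPTIONS = {"G1": dict(ACKS), "G2": dict(ACKS)}
COST = {("G1", "EP"): 10, ("G1", "RISE"): 14, ("G1", "FALL"): 16, ("G1", "IC"): 24,
        ("G2", "EP"): 10, ("G2", "RISE"): 15, ("G2", "FALL"): 13, ("G2", "IC"): 24}
RESULT = solve_bcp(build_clauses(OPTIONS, [("a", "rise"), ("a", "fall")]), COST)
```

With these made-up costs, splitting the acknowledgement of `a` between a rising-phase prototype at G1 and a falling-phase prototype at G2 is cheaper than using one IC module and one EP module.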
21. Design partially acknowledged functional modules (1)
- Step 1: Dual-rail expansion of the Boolean function.
- Step 2: Apply Boole's expansion theorem to the dual-rail outputs w.r.t. the selected inputs whose rising phase transitions are to be partially acknowledged.
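Steps 1 and 2 can be illustrated on a 2-input AND gate, assuming input a is the one selected for acknowledgement. The dual-rail expansion is taken as y.t = a.t·b.t and y.f = a.f + b.f, and Boole's expansion of y.f with respect to a gives y.f = a.f + a.t·b.f, so that every product term contains a rail of a. The gate choice, the function names and the `(true_rail, false_rail)` ordering are assumptions of this sketch; the slide's own formulas are not reproduced here.

```python
from itertools import product
from typing import Callable, Tuple

Rail = Tuple[int, int]  # assumed ordering: (true_rail, false_rail)
STATES = [(0, 0), (1, 0), (0, 1)]  # null, 1, 0; (1, 1) is illegal


def y_plain(a: Rail, b: Rail) -> Rail:
    """Step 1 only: y.t = a.t b.t, y.f = a.f + b.f."""
    return (a[0] & b[0], a[1] | b[1])


def y_expanded(a: Rail, b: Rail) -> Rail:
    """Step 2 w.r.t. a: y.f = a.f + a.t b.f, each term contains a rail of a."""
    return (a[0] & b[0], a[1] | (a[0] & b[1]))


def fires_while_null(module: Callable[[Rail, Rail], Rail], which: int) -> bool:
    """True if the output can be valid while input `which` (0 = a, 1 = b) is null."""
    for a, b in product(STATES, repeat=2):
        if (a, b)[which] == (0, 0) and module(a, b) != (0, 0):
            return True
    return False


def same_on_valid_inputs() -> bool:
    """Both forms compute the same function when a and b are valid."""
    valid = [(1, 0), (0, 1)]
    return all(y_plain(a, b) == y_expanded(a, b)
               for a, b in product(valid, repeat=2))
```

After the expansion the output cannot become valid while a is null, so the rising phase of a is acknowledged, whereas b can still be bypassed when a carries 0.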
22. Design partially acknowledged functional modules (2)
- Step 3: Connect the n-type transistors in the pull-down network according to the dual-rail expansions from step 2. This ensures partial acknowledgement of the rising phase transitions.
- Step 4: Implement the pull-up network of the functional module. To acknowledge the falling phase transitions of an input, cascade two p-type transistors, controlled by the two rails of that input, in the paths from power to the dual-rail outputs.
Prototype of the pseudo-static functional module
23. DF2: a BCP design example (1)
A circuit example
Covering table of the circuit example
24. DF2: a BCP design example (2)
25. Experimental results for DF1 and DF2
26. Conclusions and future work
- Conclusions
  - We construct a framework for synthesising asynchronous circuits based on the concept of partial acknowledgement.
  - DF1 reduces area by 25%, 15% and 15% compared with NCL-D when implemented in DIMS, reduced direct logic (RDL) and threshold logic, respectively. DF2 reduces area by 28% compared with NCL-X.
  - DF1 and DF2 reduce the verification work by 78% and 67%, respectively, compared with NCL-X.
- Future work
  - Partial acknowledgement hinders the input-dependent data flows in a circuit. Our aim is to understand this influence through timing analysis. We propose to apply the idea of latency non-increasing partial acknowledgement to design speed-independent circuits that respect the real data flows, which are limited solely by the circuit's logic.
27. Thank you!