Cost-aware synthesis of asynchronous circuits based on partial acknowledgement

1
Cost-aware synthesis of asynchronous circuits
based on partial acknowledgement
  • Yu Zhou, Danil Sokolov and Alex Yakovlev
  • University of Newcastle upon Tyne

2
Contents
  • Motivation of this research
  • Overall design approach and background in
    self-timed designs using delay-insensitive (DI)
    codes
  • What is partial acknowledgement
  • Area reduction of asynchronous circuits using
    partial acknowledgement
  • Design flow 1: a unate covering problem (UCP)
  • Design flow 2: a binate covering problem (BCP)
  • Experimental results
  • Conclusion and future work

3
Motivation
  • Solve the problem of poor CAD tool support for
    asynchronous circuit design by reusing the
    synchronous logic synthesis tools in our design
    flows.
  • Reduce the area of an asynchronous implementation
    in a cost-aware manner by employing the idea of
    partial acknowledgement in the two design flows.
  • Speed up an asynchronous implementation and
    design the speed-independent circuits respecting
    the real data flow. (current work)
  • Locate the wires in an asynchronous
    implementation, during the synthesis process,
    whose routing needs special attention.

4
General Design Approach
  • The input of the design is the structural
    description of a synchronous circuit: the
    single-rail (SR) netlist.
  • The output of the design flow is a self-timed
    implementation of the input synchronous circuit,
    obtained by mapping each gate in the SR netlist
    to the dual-rail (DR) encoded functional module
    with the equivalent function.
  • We use the return-to-zero (RTZ) protocol in both
    design flows, where the wavefronts of valid
    code words and nulls flow alternately.
  • Design flow 1 (DF1) uses coarse-grain
    functional modules to replace a gate in the SR
    netlist, while design flow 2 (DF2) uses
    fine-grain ones to further reduce the area.

5
Background: dual-rail, input-complete (IC) class
functional modules
  • Delay-insensitive minterm synthesis (DIMS)

[Figure: dual-rail encoding; 2-input AND gate]
6
Synthesis of the basic functional modules: the
input-complete (IC) class
  • The null-convention-logic (NCL) implementation of
    a 2-input AND gate based on threshold logic.

[Figure: NCL 2-input AND gate; OR gate; threshold gate with weighted inputs]
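
The behaviour of an NCL-style threshold gate with hysteresis can be sketched as follows: the output is set once the weighted sum of the inputs reaches the threshold, is reset only when every input is 0, and holds otherwise. The class name and the example weights are illustrative assumptions; the transcript does not give the actual gate parameters used for the AND gate.

```python
class ThresholdGate:
    """Threshold gate with hysteresis (behavioural sketch, hypothetical names)."""

    def __init__(self, threshold, weights):
        self.threshold = threshold
        self.weights = weights
        self.out = 0

    def update(self, inputs):
        total = sum(w * x for w, x in zip(self.weights, inputs))
        if total >= self.threshold:
            self.out = 1          # enough inputs asserted: set
        elif not any(inputs):
            self.out = 0          # all inputs returned to zero: reset
        return self.out           # otherwise the previous value is held


gate = ThresholdGate(threshold=2, weights=[1, 1])
trace = [gate.update(v) for v in ([1, 0], [1, 1], [1, 0], [0, 0])]
print(trace)
```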
7
Synthesis of the basic functional modules: the
input-complete (IC) class
  • The IC class of functional modules synchronises
    its inputs in both the valid code word and the
    null wavefronts, i.e., the output does not become
    a valid code word (null) until all the inputs
    have become valid code words (nulls).
  • A small timing assumption exists for certain
    wires in the IC functional module: those wires
    must settle before the next wavefront arrives.
    E.g., the dashed inputs should satisfy this
    requirement when both inputs have value 1 in the
    valid code word wavefront.

[Figure: IC module with inputs a = 1, b = 1 and output y = 1]
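
A behavioural sketch of an input-complete dual-rail AND in the DIMS style may help here: one C-element per input minterm, with the minterms ORed into the output rails. The class names and the (rail1, rail0) ordering are assumptions for illustration, and wire delays are not modelled.

```python
class CElement:
    """Muller C-element: follows its inputs when they agree, otherwise holds."""

    def __init__(self):
        self.out = 0

    def update(self, x, y):
        if x == y:
            self.out = x
        return self.out


class DimsAnd:
    """Input-complete dual-rail AND built from four minterm C-elements."""

    def __init__(self):
        self.m = {key: CElement() for key in ("00", "01", "10", "11")}

    def update(self, a, b):
        (a1, a0), (b1, b0) = a, b
        m00 = self.m["00"].update(a0, b0)
        m01 = self.m["01"].update(a0, b1)
        m10 = self.m["10"].update(a1, b0)
        m11 = self.m["11"].update(a1, b1)
        return (m11, m00 | m01 | m10)  # (y1, y0)


gate = DimsAnd()
print(gate.update((0, 1), (0, 0)))  # a = 0 is valid, b is still null
print(gate.update((0, 1), (1, 0)))  # both inputs valid
```

The output stays null until both inputs are valid, and it stays valid until both inputs have returned to null.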
8
Synthesis of the basic functional modules: the
early-propagative (EP) class
  • Unlike the IC class, a valid code word
    (null) can propagate to the output of an EP
    functional module without waiting for the other
    inputs to become valid code words (nulls).

[Figure: EP module implementations in threshold logic and static CMOS]
9
Synthesis of the basic functional modules: the
early-propagative (EP) class
  • A valid code word (null) can propagate to the
    output of an EP functional module without waiting
    for the other inputs to become valid code words
    (nulls).

[Figure: EP module with input a = 0 and output y = 0]
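
Early propagation can be illustrated with a purely combinational dual-rail AND. This is a sketch under the assumption that the EP module computes y1 = a1·b1 and y0 = a0 + b0; the function name and the (rail1, rail0) ordering are chosen for this example.

```python
def ep_and(a, b):
    """Early-propagative dual-rail AND: no state, no input synchronisation."""
    (a1, a0), (b1, b0) = a, b
    return (a1 & b1, a0 | b0)  # (y1, y0)


NULL = (0, 0)
ZERO = (0, 1)
ONE = (1, 0)

print(ep_and(ZERO, NULL))  # a = 0 alone already yields y = 0
print(ep_and(ONE, NULL))   # a = 1 alone is not enough; output stays null
```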
10
Implementing asynchronous circuits using NCL-D
  • NCL-D replaces each gate in the SR netlist with
    the IC-type functional module.
  • Robust, because no input or internal wire needs
    timing verification
  • Large area, because it is composed of IC modules
  • Slow, because it synchronises the inputs to
    each gate

[Figure: a circuit example; NCL-D implementation of the example]
11
Implementing asynchronous circuits using NCL-X
  • Two constituent parts
  • Functional part, obtained by replacing each gate
    with the corresponding EP module
  • Completion detection (CD) circuitry made up of OR
    gates and C-element trees
  • Reduces the area of the functional part but has
    more interconnects
  • Some inter-module wires need timing verification
    (shown as dashed lines)
  • Real data flows in the functional part, but the
    explicit CD circuitry still synchronises the
    input and internal signals

[Figure: NCL-X implementation of the example]
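
The completion detection idea can be sketched as follows: an OR gate per dual-rail signal reports whether that signal is valid, and the C-element tree is collapsed here into a single n-input C-element that sets when all signals are valid and resets when all are null. That collapse is a simplifying assumption that holds only while inputs change monotonically within a wavefront; the class name is hypothetical.

```python
class CompletionDetector:
    """OR gates followed by an n-input C-element (behavioural sketch)."""

    def __init__(self):
        self.done = 0

    def update(self, signals):
        # OR of the two rails: 1 for a valid code word, 0 for a null.
        valid = [r1 | r0 for (r1, r0) in signals]
        if all(valid):
            self.done = 1      # every signal carries a valid code word
        elif not any(valid):
            self.done = 0      # every signal has returned to null
        return self.done       # otherwise hold the previous value


cd = CompletionDetector()
print(cd.update([(1, 0), (0, 0)]))  # one signal still null
print(cd.update([(1, 0), (0, 1)]))  # all signals valid
```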
12
Partial acknowledgement
[Figure: functional modules G1 and G2 (DR-OR and DR-AND) with input a and outputs b and c]
a is the dual-rail input to G1 and G2, the
dual-rail encoded functional modules. b and c
are the dual-rail outputs of G1 and G2,
respectively.
13
Partial acknowledgement in the rising phase
transition
[Figure: G1 and G2 (DR-OR and DR-AND) with input a and outputs b and c, rising phase]
14
Partial acknowledgement in the falling phase
transition
[Figure: G1 and G2 (DR-OR and DR-AND) with input a and outputs b and c, falling phase]
15
Revisiting NCL-D and NCL-X
  • Every input to an IC-class functional module is
    partially acknowledged by the module's output for
    its rising and falling phase transitions.
  • No input to an EP-class functional module is
    partially acknowledged by the module's output for
    its rising or falling phase transition.
  • In NCL-D, both the rising and falling phase
    transitions of an input (internal) circuit
    variable are partially acknowledged by all the
    functional modules it fans into. Therefore, no
    timing verification is required for any
    inter-module wires.
  • In NCL-X, both the rising and falling phase
    transitions of an input (internal) circuit
    variable are partially acknowledged by the CD
    circuitry it fans into. Therefore, the wires
    fanning into the EP functional modules in the
    computational part of the circuit need timing
    verification.

16
Any other implementations of the circuit example?
YES!
  • Both the IC and EP functional modules are used
    for the mapping, where b and c are partially
    acknowledged by e and g, respectively, for both
    rising and falling phase transitions.
  • Reduces the area compared with the previous
    methods
  • Fewer wires require timing verification
    (shown as dashed lines)

[Figure: intuitive implementation 1]
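
For a mixed IC/EP mapping, the bookkeeping behind these claims can be sketched as below. The netlist, the variable names and the `analyse` helper are hypothetical (the slide's own circuit is not recoverable from the transcript), and explicit CD circuitry is not modelled. A variable counts as acknowledged if it fans into at least one IC module, and every wire into an EP module is listed for timing verification.

```python
def analyse(fanin, module_class):
    """Return (unacknowledged variables, wires needing timing verification)."""
    acknowledged = set()
    to_verify = []
    for gate, inputs in fanin.items():
        for var in inputs:
            if module_class[gate] == "IC":
                acknowledged.add(var)          # IC output acknowledges its inputs
            else:
                to_verify.append((var, gate))  # wire into an EP module
    all_vars = {v for ins in fanin.values() for v in ins}
    return sorted(all_vars - acknowledged), sorted(to_verify)


# Hypothetical netlist: w1 and w2 are the outputs of G1 and G2.
fanin = {"G1": ["in1", "in2"], "G2": ["in1", "in2"], "G3": ["w1", "w2"]}
mixed = {"G1": "IC", "G2": "EP", "G3": "IC"}

unacknowledged, wires = analyse(fanin, mixed)
print(unacknowledged, wires)
```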
17
Design Objective
  • Implement an asynchronous circuit from the
    synchronous circuit by replacing each gate in the
    SR netlist with a DR functional module of a
    suitable type and equivalent functionality. Find
    the implementation with minimum area cost
    (in terms of transistor count) under the
    requirement that both the rising and falling
    phase transitions of each input and internal
    variable are partially acknowledged.
  • DF1: the functional module prototypes include
    only the IC and EP classes. The design is
    formulated as a UCP.
  • DF2: additional fine-grained functional modules
    are used as prototypes, and the design is
    formulated as a BCP.

18
Design flow 1
  • A circuit variable is partially acknowledged by
    the IC-class functional modules it fans into, so
    all partial acknowledgement is concentrated at
    the inputs of the IC functional modules in the
    circuit.
  • Formulation of DF1 as a UCP (in the form of a
    constraint function, though the matrix form is
    also available)
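
One way to read the DF1 formulation is sketched below: a Boolean variable per gate says whether it is mapped to an IC module, each circuit variable contributes a clause containing only positive literals (the gates it fans into), and the goal is the cheapest assignment satisfying every clause. The netlist, the cost figures and the exhaustive search are illustrative assumptions; the transcript does not show the actual constraint function or the solver used.

```python
from itertools import product


def solve_df1(fanin, extra_cost):
    """Brute-force unate covering: choose which gates become IC modules."""
    gates = sorted(fanin)
    variables = sorted({v for ins in fanin.values() for v in ins})
    best = None
    for bits in product((0, 1), repeat=len(gates)):
        is_ic = dict(zip(gates, bits))
        # Clause for v: at least one gate that v fans into must be IC.
        covered = all(
            any(is_ic[g] for g in gates if v in fanin[g]) for v in variables
        )
        if not covered:
            continue
        cost = sum(extra_cost[g] for g in gates if is_ic[g])
        if best is None or cost < best[0]:
            best = (cost, [g for g in gates if is_ic[g]])
    return best


# Hypothetical netlist and hypothetical extra transistor counts of IC over EP.
fanin = {"G1": ["in1", "in2"], "G2": ["in1", "in2"], "G3": ["w1", "w2"]}
extra_cost = {"G1": 10, "G2": 14, "G3": 10}
print(solve_df1(fanin, extra_cost))
```

Exhaustive search is only practical for toy netlists; it is used here to keep the covering constraint itself visible.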

19
DF1: circuit example
[Figure: circuit example with gates G1, G2 and G3]
Exactly the intuitive Implementation!
Concurrently with our work on design flow 1,
Cheoljoo Jeong and Steve Nowick contributed a
similar solution to reduce the overhead of an
asynchronous implementation while maintaining its
robustness. Ref.: "Optimization of robust
asynchronous circuits by local input completeness
relaxation", by Cheoljoo Jeong and Steve Nowick,
Columbia University, accepted by ASP-DAC 2007.
20
Design flow 2
  • We implement functional modules that can
    partially acknowledge a particular phase
    transition (rising, falling, or both rising and
    falling).
  • Adding these new prototypes enables a more
    flexible design when tuning the circuit's area,
    performance and robustness.
  • DF2 is formulated as a BCP. In its formulation,
    the clause of a circuit variable is the sum of
    all the functional modules that can partially
    acknowledge it. However, the constraint function
    of DF2 has extra terms that exclude the
    possibility of implementing the same gate with
    different prototypes, so the binate covering
    problem fits our task.
  • We first introduce the synthesis methods used in
    DF2.
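
The structure of such a binate formulation can be sketched as follows. Covering clauses (positive literals) require each phase transition of each variable to be acknowledged by some selected prototype, and exclusion clauses (two negated literals) forbid selecting two prototypes for the same gate. The candidate list, costs and names are hypothetical, a gate with no selected prototype is assumed to stay EP, and the exhaustive search stands in for a real BCP solver.

```python
from itertools import combinations, product


def build_clauses(candidates, requirements):
    """Return clauses as lists of (candidate index, required polarity)."""
    clauses = []
    for req in requirements:  # covering clauses: positive literals only
        clauses.append(
            [(i, True) for i, c in enumerate(candidates) if req in c["acks"]]
        )
    for i, j in combinations(range(len(candidates)), 2):  # exclusion clauses
        if candidates[i]["gate"] == candidates[j]["gate"]:
            clauses.append([(i, False), (j, False)])
    return clauses


def solve_bcp(candidates, requirements):
    """Brute-force search for the cheapest assignment satisfying all clauses."""
    clauses = build_clauses(candidates, requirements)
    best = None
    for x in product((False, True), repeat=len(candidates)):
        if all(any(x[i] == pol for i, pol in cl) for cl in clauses):
            cost = sum(c["cost"] for c, xi in zip(candidates, x) if xi)
            if best is None or cost < best[0]:
                best = (cost, [c["name"] for c, xi in zip(candidates, x) if xi])
    return best


# Hypothetical example: variable w fans into gates G1 and G2.
requirements = [("w", "rise"), ("w", "fall")]
candidates = [
    {"name": "G1_rise", "gate": "G1", "cost": 2, "acks": {("w", "rise")}},
    {"name": "G1_both", "gate": "G1", "cost": 6,
     "acks": {("w", "rise"), ("w", "fall")}},
    {"name": "G2_fall", "gate": "G2", "cost": 3, "acks": {("w", "fall")}},
]
print(solve_bcp(candidates, requirements))
```

The negated literals in the exclusion clauses are what make the problem binate rather than unate.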

21
Design partially acknowledged functional modules
(1)
  • Step 1: Dual-rail expansion of the Boolean
    function.
  • Step 2: Apply Boole's expansion theorem to the
    dual-rail outputs w.r.t. the selected inputs
    whose rising phase transitions are to be
    partially acknowledged.
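
As a small worked instance of steps 1 and 2, consider a 2-input AND under one plausible reading of the method (the slide's own equations are not in the transcript). The dual-rail expansion gives y1 = a1·b1 and y0 = a0 + b0. Expanding y0 with respect to input a gives y0 = a0·1 + a1·b0, so every product term contains a rail of a. The check below confirms that both forms agree on valid code words, and that only the expanded form waits for a to become valid.

```python
from itertools import product

VALID = [(1, 0), (0, 1)]  # (rail1, rail0) code words for 1 and 0
NULL = (0, 0)


def y0_plain(a, b):
    """y0 = a0 + b0: plain dual-rail expansion of AND."""
    return a[1] | b[1]


def y0_expanded(a, b):
    """y0 = a0 + a1*b0: Boole's expansion of y0 with respect to input a."""
    (a1, a0), (_, b0) = a, b
    return a0 | (a1 & b0)


# Both forms compute the same function on valid code words.
same_on_valid = all(
    y0_plain(a, b) == y0_expanded(a, b) for a, b in product(VALID, VALID)
)
print(same_on_valid)

# With a still null and b = 0, only the plain form fires early.
print(y0_plain(NULL, (0, 1)), y0_expanded(NULL, (0, 1)))
```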

22
Design partially acknowledged functional modules
(2)
  • Step 3: Connect the n-type transistors in the
    pull-down network according to the dual-rail
    expansions in step 2. Partial acknowledgement of
    the rising phase transitions is ensured.
  • Step 4: Implement the pull-up network of the
    functional module. To acknowledge the falling
    phase transitions of an input, cascade two
    p-type transistors controlled by its two rails
    in the paths from power to the dual-rail
    outputs.

[Figure: prototype of the pseudo-static functional module]
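
A purely behavioural reading of steps 3 and 4 is sketched below for a dual-rail AND that acknowledges both phases of input a only. Each output rail is set when its pull-down condition holds, reset when the series p-type path conducts (both rails of a low), and held otherwise. Signal polarity, output inverters and keeper details are abstracted away, and all names are assumptions rather than the authors' circuit.

```python
class PseudoStaticNode:
    """One output rail: set on the pull-down condition, reset on the pull-up one."""

    def __init__(self):
        self.out = 0

    def update(self, set_cond, reset_cond):
        if set_cond:
            self.out = 1
        elif reset_cond:
            self.out = 0
        return self.out  # neither network conducts: hold


class AckAnd:
    """Dual-rail AND acknowledging rising and falling transitions of a (assumed)."""

    def __init__(self):
        self.y1 = PseudoStaticNode()
        self.y0 = PseudoStaticNode()

    def update(self, a, b):
        (a1, a0), (b1, b0) = a, b
        # Two series p-type devices gated by a1 and a0 conduct only if both are low.
        reset = a1 == 0 and a0 == 0
        y1 = self.y1.update(a1 & b1, reset)          # every term contains a rail of a
        y0 = self.y0.update(a0 | (a1 & b0), reset)
        return (y1, y0)


gate = AckAnd()
print(gate.update((0, 0), (0, 1)))  # b is valid but a is null: output waits
print(gate.update((1, 0), (0, 1)))  # a arrives: output becomes valid
```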
23
DF2: a BCP design example (1)
[Figure: a circuit example; covering table of the circuit example]
24
DF2: a BCP design example (2)
25
Experimental results for DF1 and DF2
26
Conclusions and future work
  • Conclusions
  • We construct a framework to synthesise
    asynchronous circuits based on the concept of
    partial acknowledgement.
  • The area reductions of DF1 are 25%, 15%, and 15%
    compared with NCL-D when implemented by DIMS,
    reduced direct logic (RDL) and threshold
    logic, respectively. The area reduction of DF2
    is 28% compared with NCL-X.
  • The verification work of DF1 and DF2 is reduced
    by 78% and 67% compared with NCL-X, respectively.
  • Future Work
  • Partial acknowledgement hinders the
    input-dependent data flows in a circuit. Our aim
    is to understand this influence through timing
    analysis. We propose to apply the idea of
    latency non-increasing partial acknowledgement
    to design a speed-independent circuit that
    respects the real data flows, limited solely by
    the circuit's logic.

27
Thank you!
  • Questions?