1
Post-Fabrication, Automatically Tunable,
Programmable Delay Elements for Clock-Delayed
Domino Logic
  • Miodrag Vujkovic and Carl Sechen
  • VLSI Design Laboratory
  • Department of Electrical Engineering
  • University of Washington
  • Seattle

2
Dynamic Logic
  • We MUST ensure that dynamic inputs never make
    1-to-0 transitions during evaluation
  • Two solutions:
  • 1. precharge outputs low using an inverting gate
    (standard domino)
  • 2. delay the evaluate clock until inputs settle
    (CD domino)

3
Clock-Delayed (CD) Domino Logic
  • Self-timed dynamic logic family
  • Consists of a dynamic gate, and an optional delay
    element for the clock signal

4
CD Domino Inverting Gates
  • Novel pre-charge low inverting CD domino gate
  • The size (speed) of the NAND2 gate is tuned to
    the speed of the nMOS dynamic gate; the glitch at
    the output is constrained to be less than 1% of
    VDD

5
Advantages of CD Domino Logic
  • Uses single-rail circuits, rather than dual-rail
    for standard domino
  • Provides both inverting and non-inverting
    functions
  • High-speed, large fan-in NOR and OR circuits
  • We previously showed that random logic circuits
    implemented in CD domino were 60% faster than
    optimized (Synopsys) static CMOS implementations
  • T. Thorp, G. Yee and C. Sechen, "Monotonic
    Static CMOS and Dual Vt Technology," Proc. Int.
    Symp. on Low Power Electronics and Design
    (ISLPED), San Diego, CA, August 16-17, 1999.

6
Delay Matching
Problem
  • CD domino requires delay matching between the
    slowest dynamic gate at a level and a delay
    element
  • A 20% margin is typically added to the delay of
    the fixed delay element to account for PVT
    variations
  • Thus, 20% of the speed gain possible with CD
    domino is not realized
  • An average speed gain of (60 + 20)% is
    theoretically possible

Goal
  • Use digitally programmable delay elements (PDEs)
    to reduce the margin and attain a speed
    improvement without affecting the reliability in
    the presence of variations
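
The cost of a fixed margin can be illustrated with a little arithmetic. The sketch below is only an illustration, not the authors' method: the per-level gate delays and the function names are hypothetical, and the levels are simply treated as delays in series.

```python
def delay_element_target(worst_gate_delay_ps, margin):
    """Delay a level's delay element must provide: slowest gate plus margin."""
    return worst_gate_delay_ps * (1.0 + margin)


def total_clock_delay(worst_gate_delays_ps, margin):
    """Sum of per-level delay-element targets (simplified: levels in series)."""
    return sum(delay_element_target(d, margin) for d in worst_gate_delays_ps)


# Hypothetical worst-case gate delays (ps) for four logic levels.
levels_ps = [200.0, 250.0, 180.0, 220.0]

fixed = total_clock_delay(levels_ps, margin=0.20)  # fixed elements, 20% margin
ideal = total_clock_delay(levels_ps, margin=0.0)   # perfect matching, no margin

# Fraction of the fixed-element delay budget that is pure margin.
overhead = (fixed - ideal) / fixed
print(f"fixed: {fixed:.1f} ps, ideal: {ideal:.1f} ps, margin share: {overhead:.1%}")
```

Any margin that a tunable element removes comes straight out of this overhead, which is the motivation for the PDE approach.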

7
CD Domino Clocking Scheme
  • The circuits are fully levelized
  • The delay element on each level is tuned to the
    slowest gate at its level, plus a 20% margin

8
New PDE Self-tuning Scheme

9
Clock Generation Scheme
  • Number of stages in the clock trees is determined
    by the level with the largest clock load
  • Only clk(i) and clk(i+1) are enabled during level
    i tuning

[Figure: clock enable gate]

10
Non-inverting Gates
  • Add a transistor in parallel for OR circuits, plus
    an enable transistor to disable the branch in
    normal mode (enab = 0)
  • Add the longest branch in parallel for AOI
    circuits
  • All logical inputs are 0 (previous level
    is in precharge)
  • clk(i) is tied to the transistor(s) in the
    additional branch
  • Transistor sizes are chosen to give the worst-case
    delay from clk(i) to Out

11
Inverting Gates
  • No need to add additional transistors
  • All logical inputs at 0 (previous level
    is in precharge)
  • Worst-case delay
    from clk(i) to Out

12
Programmable Delay Element (PDE)
  • PDE consists of two stages:
  • - 6 conditional inverters in parallel
  • - strong output driver
  • PDE delay is determined by the input
    combination bit5..0
  • Conditional inverters are geometrically sized:
    w(n) = 2·w(n-1) = 2^n·w0
  • Output inverter can be sized according to the
    load at level i+1
  • Granularity of the PDE is proportional to the
    number of conditional inverters
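
The geometric sizing means the total enabled inverter width is simply the binary value of bit5..0 times w0. A minimal sketch follows. The delay formula (a constant divided by the enabled width, plus a fixed output-driver delay) is an assumed first-order model, and `k_ps`, `t_driver_ps` and all function names are hypothetical.

```python
def inverter_widths(w0=1.0, n_bits=6):
    """Widths of the conditional inverters: w(n) = 2**n * w0."""
    return [w0 * 2 ** n for n in range(n_bits)]


def enabled_width(code, w0=1.0, n_bits=6):
    """Total width of the inverters switched on by the bits of `code`."""
    widths = inverter_widths(w0, n_bits)
    return sum(w for n, w in enumerate(widths) if (code >> n) & 1)


def pde_delay_ps(code, k_ps=6000.0, t_driver_ps=50.0, n_bits=6):
    """Assumed first-order model: delay = k / enabled width + driver delay."""
    if not 0 < code < 2 ** n_bits:
        raise ValueError("code must enable at least one inverter")
    return k_ps / enabled_width(code, n_bits=n_bits) + t_driver_ps


# Weakest setting, largest single inverter, and all inverters enabled.
for code in (0b000001, 0b100000, 0b111111):
    print(f"{code:06b}: width {enabled_width(code):4.0f}, "
          f"delay {pde_delay_ps(code):7.1f} ps")
```

Under this model the delay steps are coarse at small codes and fine at large ones, so a real design would place the expected operating point where the resolution is adequate.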

13
6-bit Binary Counter
  • The binary counter generates the input combination
    to the PDE (bit5..0)
  • Delay is inversely proportional to the count of
    the counter
  • When enab(i) becomes low, the counter keeps its
    previous state, thereby maintaining the desired
    PDE delay
  • Overflow is generated if the desired PDE delay is
    less than the minimum possible PDE delay, given
    by the vector 111111
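
The counter behaviour described above (count while enabled, hold when enab(i) is low, flag overflow at 111111) can be sketched behaviourally. This is an illustrative model, not the authors' circuit: the class name, the starting count of 1, and the choice to saturate at the maximum count are assumptions.

```python
class TuningCounter:
    """6-bit up-counter with an enable input and an overflow flag (sketch)."""

    def __init__(self, n_bits=6, start=1):
        self.max_count = 2 ** n_bits - 1  # 111111 for 6 bits
        self.count = start                # assumed start: weakest PDE setting
        self.overflow = False

    def clock(self, enab):
        """Advance one clock cycle; return the count driving the PDE."""
        if not enab:
            return self.count             # hold: PDE delay is maintained
        if self.count == self.max_count:
            self.overflow = True          # a delay below the minimum was requested
        else:
            self.count += 1
        return self.count


counter = TuningCounter()
for _ in range(5):
    counter.clock(enab=True)   # tuning: delay shrinks as the count grows
held = counter.clock(enab=False)  # tuning finished: count is frozen
print(f"held count: {held:06b}, overflow: {counter.overflow}")
```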

14
Counter Enable Controller (CEC)
  • Each level has OR gates and 2 DFFs
  • First DFF stage works asynchronously
  • en(i) are synchronously changed at the
    falling edge of the last level clk
  • Detected error err(i) or count overflow ovfl(i)
    disables level i

15
Error Sensor Circuit
  • Muller C-element output follows
    the leading input
  • The worst-case output at level i has fully
    evaluated before the evaluation phase of the
    clock for level i+1
  • Delays are matched - no error

16
Error Sensor Circuit
  • The worst-case output at level i has not fully
    evaluated before the evaluation phase of the
    clock for level i+1
  • Delays are not matched - error is detected
  • Margin is varied by sizing of the NAND3 pull-down
    transistors

17
Error Detector with Several Stages
  • Each error detector includes a number of error
    sensors
  • Error err(i) is latched and used by the Counter
    Enable Controller to disable the current level and
    enable the subsequent level
  • Error detector size is proportional to the number
    of scanned outputs at the gate level
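
Putting the counter, error detector and enable controller together, the level-by-level tuning sequence can be sketched as a simple loop. Everything numeric here is hypothetical, and several behaviours are assumptions rather than facts from the slides: the counter starts at 1 and counts up, the sensor trips when the PDE delay falls below the gate delay times (1 + sensor margin), the code that tripped the sensor is the one held, and the PDE delay follows a constant-over-count model.

```python
MAX_CODE = 0b111111


def pde_delay_ps(code, k_ps=6000.0, t_driver_ps=50.0):
    """Assumed model: PDE delay falls inversely with the counter value."""
    return k_ps / code + t_driver_ps


def error_sensor(clock_delay_ps, gate_delay_ps, sensor_margin):
    """True when the next-level clock arrives too close to the gate output."""
    return clock_delay_ps < gate_delay_ps * (1.0 + sensor_margin)


def tune_level(gate_delay_ps, sensor_margin=0.05):
    """Count up until err(i) or ovfl(i); return (code, err, ovfl, cycles)."""
    code, cycles = 1, 0
    while True:
        cycles += 1
        if error_sensor(pde_delay_ps(code), gate_delay_ps, sensor_margin):
            return code, True, False, cycles   # err(i): hold this code
        if code == MAX_CODE:
            return code, False, True, cycles   # ovfl(i): PDE cannot go faster
        code += 1


def tune_circuit(worst_gate_delays_ps, sensor_margin=0.05):
    """Tune one level at a time; err/ovfl at a level enables the next one."""
    return [tune_level(d, sensor_margin) for d in worst_gate_delays_ps]


# Hypothetical worst-case gate delays (ps) for three tuned levels.
gate_delays_ps = [180.0, 220.0, 160.0]
results = tune_circuit(gate_delays_ps)
for level, (code, err, ovfl, cycles) in enumerate(results):
    print(f"level {level}: code {code:06b}, delay {pde_delay_ps(code):.1f} ps, "
          f"err={err}, ovfl={ovfl}, cycles={cycles}")
```

In this model the held delay stays above the worst-case gate delay only if the sensor margin is larger than one PDE step at the operating point, which is one reason some residual margin is unavoidable.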

18
Error Detector with Several Stages
  • Number of error sensors reduced by multiplexing
    the gate outputs
  • Wide OR gates implemented as fast pseudo-NMOS
    gates
  • Power is consumed only by the stage that is being
    tuned; none is consumed after the tuning is
    completed

19
Experimental Results
  • t481 and term1 MCNC benchmarks were simulated
  • 4 levels of logic, 3 PDEs, 3 counters, controller
  • Needs approximately 300-400 cycles to tune itself
    (for a 100 MHz clock, i.e. 10 ns per cycle, about
    4 µs)
  • Randomly generated capacitances at all gate
    output nodes (can increase gate delay 50-75%)
  • Added capacitance simulates process variations
    that can increase gate delay and cause circuit
    failure
  • Temperature varied in the range 0°C to 70°C
  • Goal: verify the self-tuning process and look for
    the margins (clock delay vs. worst-case gate
    delay)
  • 10 separate tuning processes performed

20
Experimental Results
  • Self-timed margin requirement of 20% reduced to
    6% on average
  • Some amount of margin still needed (tuning
    circuitry variations, clock skew)

[Figure panels: t481, term1]

21
Conclusion
  • Random logic circuits implemented in CD domino
    were 60% faster (including 20% margin) than
    optimized (Synopsys) static CMOS implementations
  • New self-tuning CD domino logic has been
    developed
  • - programmable delay elements (instead of fixed)
  • - associated tuning circuitry to perform delay
    matching
  • Tuning circuitry dynamically sets
    (post-fabrication) each PDE delay to the
    corresponding worst-case gate delay at its logic
    level
  • Simulations of the MCNC benchmark circuits
    performed to verify the functionality in the
    presence of modeled process variations
  • Self-timed margin of 20% reduced to 6% on
    average, resulting in a circuit speed improvement
    without affecting the reliability