Clockless Logic: Dynamic Logic Pipelines (contd.)


Provided by: Montek5
Learn more at: http://www.cs.unc.edu

1
Clockless Logic: Dynamic Logic Pipelines (contd.)
  • Drawbacks of Williams' PS0 Pipelines
  • Lookahead Pipelines

2
Drawbacks of PS0 Pipelining
  • Poor throughput
    • long cycle time: 6 events per cycle
    • data tokens are forced far apart in time
  • Limited storage capacity
    • max: only 50% of stages can hold distinct tokens
    • data tokens must be separated by at least one spacer
  • Our Research Goals: address both issues
    • still maintain very low latency
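
A minimal sketch of the capacity limit listed above, assuming a linear pipeline modeled as a list of stage contents. The function names are hypothetical, and rounding the 50% bound down for odd stage counts is an assumption.

```python
# Illustrative model only: 'D' marks a stage holding a data token,
# 'S' marks a stage holding a spacer (a precharged, empty stage).

def is_legal_ps0_state(stages):
    """True if no two data tokens sit in neighboring stages,
    i.e. every pair of tokens is separated by at least one spacer."""
    return all(not (a == "D" and b == "D") for a, b in zip(stages, stages[1:]))

def max_distinct_tokens(n_stages):
    """Bound from the slide: at most 50% of stages hold distinct tokens."""
    return n_stages // 2

print(is_legal_ps0_state(list("DSDSDSDSDS")))  # tokens alternate with spacers
print(is_legal_ps0_state(list("DDSDSDSDSS")))  # two tokens are adjacent
print(max_distinct_tokens(10))                 # bound for a 10-stage pipeline
```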

3
Recent Approaches
  • 3 novel styles for high-speed async pipelining
    • Lookahead Pipelines (LP) [Singh/Nowick, Async-00]
    • High-Capacity Pipelines (HC) [Singh/Nowick, WVLSI-00]
    • MOUSETRAP Pipelines [Singh/Nowick, TAU-00]
  • Goal: significantly improve throughput of PS0
  • Two Distinct Strategies
    • LP: introduce protocol optimizations
      • "shave off" components from critical cycle
    • HC: fundamentally new protocol
      • greater concurrency: loosely-coupled stages

4
Outline
  • New Asynchronous Pipelines
  • Lookahead Pipelines (LP)
  • High-Capacity Pipelines (HC)
  • MOUSETRAP Pipelines

5
Lookahead Pipelines: Strategy 1
  • Use non-neighbor communication
    • stage receives information from multiple later stages
    • allows early evaluation

Benefit: stage gets head-start on next cycle
6
Lookahead Pipelines: Strategy 2
  • Use early completion detection
    • completion detector moved before stage (not after)
    • stage indicates early done in parallel with computation

[Figure: stage with early completion detector]
Benefit: again, stage gets head-start on next cycle
7
Lookahead Pipelines: Overview
  • 5 New Designs
  • Dual-Rail Data Signaling
    • LP3/1: early evaluation
    • LP2/2: early done
    • LP2/1: early evaluation + early done
  • Single-Rail Bundled-Data Signaling
    • LPSR2/2: early done
    • LPSR2/1: early evaluation + early done

8
Dual-Rail Design 1: LP3/1
[Figure: stages N, N+1, N+2, each a Processing Block with a Completion Detector; stage N has Data in, Data out, and control inputs PC and Eval, with Eval coming from N+2]
  • Optimization: early evaluation
    • each stage has two control inputs, from stages N+1 and N+2
  • Idea: shorten precharge phase
    • terminate precharge early, when N+2 is done evaluating

9
LP3/1 Protocol
  • PRECHARGE N when N+1 completes evaluation
  • EVALUATE N when N+2 completes evaluation

[Figure: stages N, N+1, N+2; N evaluates, N+1 evaluates, N+2 evaluates, N+2 indicates done]
10
LP3/1: Comparison with PS0
[Figure: critical cycles across stages N, N+1, N+2. LP3/1: only 4 events in cycle! PS0: 6 events in cycle]
11
LP3/1 Performance
[Figure: cycle diagram with the saved path marked]
Savings over PS0: 1 Precharge + 1 Completion Detection
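
The event counts can be tallied as below. The ordering of events in each cycle is an assumption reconstructed from the protocol slides (PS0: stage N evaluates only after N+1 has precharged; LP3/1: stage N evaluates as soon as N+2 has evaluated), not taken from the original figure.

```python
from collections import Counter

# Each event is (stage offset from N, action).
PS0_CYCLE = [
    (0, "evaluate"),
    (1, "evaluate"),
    (2, "evaluate"),
    (2, "completion detection"),  # N+2 signals done, so N+1 may precharge
    (1, "precharge"),
    (1, "completion detection"),  # N+1 signals precharged, so N may evaluate
]

LP3_1_CYCLE = [
    (0, "evaluate"),
    (1, "evaluate"),
    (2, "evaluate"),
    (2, "completion detection"),  # N+2 signals done, so N evaluates early
]

def saved_events(baseline, improved):
    """Multiset difference of action types between two critical cycles."""
    diff = Counter(a for _, a in baseline) - Counter(a for _, a in improved)
    return dict(diff)

print(len(PS0_CYCLE), len(LP3_1_CYCLE))
print(saved_events(PS0_CYCLE, LP3_1_CYCLE))
```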
12
LP3/1: Inside a Stage
Merging 2 Control Inputs
A NAND gate merges 2 control inputs
  • Precharge when PC=1 (and Eval=0)
  • Evaluate early when Eval=1 (or PC=0)
  • Problem: early Eval=1 is non-persistent!
    • may be de-asserted before stage completes evaluation!
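
The merge can be written out as a truth table. This is a sketch under assumed polarities: the slide only states when the stage precharges and when it evaluates, so feeding the NAND with PC and the inverse of Eval, and reading a high output as "evaluate", are illustrative choices.

```python
def nand(a, b):
    return not (a and b)

def stage_evaluates(pc, ev):
    """Merged control: True means evaluate, False means precharge.
    Assumed wiring: NAND(PC, NOT Eval)."""
    return nand(pc, not ev)

# Enumerate all four input combinations.
for pc in (0, 1):
    for ev in (0, 1):
        phase = "evaluate" if stage_evaluates(pc, ev) else "precharge"
        print(f"PC={pc} Eval={ev} -> {phase}")
```

Only PC=1 with Eval=0 yields precharge; every other combination lets the stage evaluate.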

13
LP3/1 Timing Constraints: Example
Problem (cont.): early Eval=1 is non-persistent
  • Observation: PC=0 arrives soon after Eval=1, and is persistent
  • Solution: no change!
    • ⇒ use PC as safe takeover for Eval!
  • Timing Constraint: PC=0 must arrive before Eval is de-asserted
    • simple one-sided timing requirement
    • other constraints as well: all easily satisfied in practice
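
A minimal sketch of the one-sided constraint as a timing check. The function name and the example times are hypothetical; only the inequality itself comes from the slide.

```python
def pc_takes_over_in_time(eval_rise, eval_fall, pc_fall):
    """One-sided check: PC=0 must arrive before Eval is de-asserted,
    so the stage never slips back into precharge mid-evaluation.

    All arguments are event times in the same (arbitrary) unit.
    """
    if not eval_rise < eval_fall:
        raise ValueError("Eval must rise before it falls")
    return pc_fall < eval_fall

# Hypothetical timings, for illustration only.
print(pc_takes_over_in_time(eval_rise=0.0, eval_fall=0.5, pc_fall=0.3))  # PC=0 arrives first
print(pc_takes_over_in_time(eval_rise=0.0, eval_fall=0.5, pc_fall=0.7))  # Eval drops first
```

The constraint is one-sided because it only bounds how late PC=0 may arrive; arriving earlier is always safe.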

14
Dual-Rail Design 2: LP2/2
  • Optimization: early done
  • Idea: move completion detector before processing block
    • stage indicates when it is about to precharge/evaluate

[Figure: early Completion Detector placed ahead of the Processing Block, producing early done; Data in, Data out]
15
LP2/2 Completion Detector
  • Modified completion detectors needed
    • Done=1 when stage starts evaluating, and inputs valid
    • Done=0 when stage starts precharging
    • asymmetric C-element
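
A behavioral sketch of the early-done rule above, assuming dual-rail inputs where a bit is valid once either of its rails is high. The class and function names are hypothetical, and the model abstracts away the transistor-level gate.

```python
class AsymmetricCElement:
    """State-holding gate: sets when ALL inputs are high,
    resets when the control input alone goes low."""

    def __init__(self):
        self.done = False

    def update(self, evaluating, inputs_valid):
        if evaluating and inputs_valid:
            self.done = True    # stage starts evaluating on valid inputs
        elif not evaluating:
            self.done = False   # stage starts precharging
        # evaluating but inputs not valid: hold the previous value
        return self.done

def dual_rail_valid(bits):
    """bits is a list of (true_rail, false_rail) pairs."""
    return all(bool(t or f) for t, f in bits)

cd = AsymmetricCElement()
print(cd.update(True, dual_rail_valid([(0, 0), (0, 0)])))   # no data yet
print(cd.update(True, dual_rail_valid([(1, 0), (0, 1)])))   # data arrives
print(cd.update(True, dual_rail_valid([(0, 0), (0, 0)])))   # inputs reset, Done held
print(cd.update(False, dual_rail_valid([(0, 0), (0, 0)])))  # precharge begins
```

The asymmetry shows in the reset path: Done falls as soon as the stage starts precharging, without waiting for the data inputs to return to spacer.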

16
LP2/2 Protocol
  • Completion Detection
    • performed in parallel with evaluation/precharge of stage

[Figure: stages N, N+1, N+2; N evaluates, N+1 evaluates]
17
LP2/2 Performance
[Figure: cycle diagram with numbered events]
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
18
Dual-Rail Design 3: LP2/1
  • Hybrid of LP3/1 and LP2/2
  • Combines
    • early evaluation of LP3/1
    • early done of LP2/2

Cycle time: best of our dual-rail lookahead pipelines
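
A rough comparison of cycle times across the dual-rail styles. The event counts for PS0, LP3/1 and LP2/2 follow from the savings stated on the earlier slides; the LP2/1 entry is an assumption inferred from the naming pattern (LPx/y read as x evaluations plus y completion detections). The sketch ignores the delay of the control-merging NAND gate, and the unit delays are hypothetical.

```python
# (evaluations, completion detections, precharges) on the critical cycle.
CRITICAL_CYCLE = {
    "PS0":   (3, 2, 1),
    "LP3/1": (3, 1, 0),
    "LP2/2": (2, 2, 0),
    "LP2/1": (2, 1, 0),  # assumed from the naming pattern
}

def cycle_time(style, t_eval, t_cd, t_prech):
    """Sum the delays of the events on the style's critical cycle."""
    n_eval, n_cd, n_prech = CRITICAL_CYCLE[style]
    return n_eval * t_eval + n_cd * t_cd + n_prech * t_prech

# Hypothetical unit delays, chosen only to make the ordering visible.
for style in CRITICAL_CYCLE:
    print(style, cycle_time(style, t_eval=1.0, t_cd=1.0, t_prech=1.0))
```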
19
Dual-Rail Design 3: LP2/1
  • Hybrid of LP3/1 and LP2/2. Combines
    • early evaluation of LP3/1
    • early done of LP2/2

20
Lookahead Pipelines: Overview
  • 5 New Designs
  • Dual-Rail Data Signaling
    • LP3/1: early evaluation
    • LP2/2: early done
    • LP2/1: early evaluation + early done
  • Single-Rail Bundled-Data Signaling
    • LPSR2/2: early done
    • LPSR2/1: early evaluation + early done

21
Single-Rail Design: LPSR2/1
  • Derivative of LP2/1, adapted to single-rail
  • bundled-data: matched delays instead of completion detectors
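
In a bundled-data design the done signal comes from a delay element rather than from inspecting the data, so that delay must be no shorter than the slowest logic path it stands in for. A minimal sketch of that sizing rule follows; the function names, path delays, and margin are hypothetical.

```python
def required_matched_delay(path_delays, margin=0.0):
    """Smallest safe matched delay: worst-case logic path plus a margin."""
    return max(path_delays) + margin

def matched_delay_ok(matched_delay, path_delays, margin=0.0):
    """True if the done signal cannot arrive before the data is stable."""
    return matched_delay >= required_matched_delay(path_delays, margin)

# Hypothetical per-path delays for one stage, in arbitrary units.
paths = [0.8, 1.1, 0.95]
print(required_matched_delay(paths, margin=0.1))
print(matched_delay_ok(1.3, paths, margin=0.1))  # long enough
print(matched_delay_ok(1.0, paths, margin=0.1))  # done would fire too early
```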

22
Inside an LPSR2/1 Stage
23
LPSR2/1 Protocol
[Figure: stages N, N+1, N+2; N evaluates]
24
Results
  • Designed/simulated FIFOs for each pipeline style
  • Experimental Setup
    • design: 4-bit wide, 10-stage FIFO
    • technology: 0.6 µm HP CMOS
    • operating conditions: 3.3 V and 300 K

25
Comparison with Williams' PS0
[Figure: results grouped as dual-rail and single-rail]
  • LP2/1: >2X faster than Williams' PS0
  • LPSR2/1: 1.2 Giga items/sec

26
Comparison: LPSR2/1 vs. Molnar FIFOs
  • LPSR2/1 FIFO: 1.2 Giga items/sec
  • Adding logic processing to FIFO
    • simply fold logic into dynamic gate ⇒ little overhead
  • Comparison with Molnar FIFOs
    • asp FIFO: 1.1 Giga items/sec
      • more complex timing assumptions ⇒ not easily formalized
      • requires explicit latches, separate from logic!
      • adding logic processing between stages ⇒ significant overhead
    • micropipeline: 1.7 Giga items/sec
      • two parallel FIFOs, each only 0.85 Giga/sec
      • very expensive transition latches
      • cannot add logic processing to FIFO!
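
The throughput figures above can be related to cycle times and aggregate rates with simple arithmetic. This sketch only restates the slide's numbers; the function names are hypothetical.

```python
def cycle_time_ns(giga_items_per_sec):
    """Cycle time in nanoseconds for a throughput given in Giga items/sec."""
    return 1.0 / giga_items_per_sec

def aggregate_throughput(per_fifo_giga, n_parallel):
    """Combined rate of several FIFOs running side by side."""
    return per_fifo_giga * n_parallel

print(round(cycle_time_ns(1.2), 3))        # LPSR2/1 FIFO
print(round(cycle_time_ns(1.1), 3))        # asp FIFO
print(aggregate_throughput(0.85, 2))       # micropipeline: two parallel FIFOs
```

Per individual FIFO, the micropipeline's 0.85 Giga items/sec is below the LPSR2/1 figure; its 1.7 total comes from doubling the hardware.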

27
Practicality of Gate-Level Pipelining
When datapath is wide
  • Can often split into narrow streams
  • Use localized completion detector for each stream
    • need to examine only a few bits ⇒ small fan-in
    • send done to only a few gates ⇒ small fan-out
  • ⇒ completion detector: fairly low cost!
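
A small sketch of the splitting idea, assuming each stream gets its own completion detector whose fan-in equals the number of bits in that stream. The function name and the 32-bit example width are hypothetical.

```python
import math

def split_into_streams(width_bits, stream_bits):
    """Return the bit width of each stream; the last may be narrower."""
    n_streams = math.ceil(width_bits / stream_bits)
    return [min(stream_bits, width_bits - i * stream_bits)
            for i in range(n_streams)]

streams = split_into_streams(32, 4)
print(len(streams))   # number of localized completion detectors
print(max(streams))   # largest detector fan-in, versus 32 for one global detector
```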

28
Conclusions
  • Introduced several new dynamic pipelines
  • Use two novel protocols
    • early evaluation
    • early done
  • Especially suitable for fine-grain (gate-level) pipelining
  • Very high throughputs obtained
    • dual-rail: >2X improvement over Williams' PS0
    • single-rail: 1.2 Giga items/second in 0.6 µm CMOS
  • Use easy-to-satisfy, one-sided timing constraints
  • Robustly handle arbitrary-speed environments
    • overcome a major shortcoming of Williams' PS0 pipelines
  • Recent Improvement: even faster single-rail pipeline (WVLSI-00)