Title: Clockless Logic
1. Clockless Logic
- Recap Lookahead Pipelines
- High-Capacity Pipelines
2. Recap: Lookahead Pipeline Styles
- 2 Strategies
- Early Evaluation
- Early Done
3. Lookahead Pipelines: Strategy 1
- Use non-neighbor communication
  - stage receives information from multiple later stages
  - allows early evaluation
Benefit: stage gets head-start on next cycle
4. Lookahead Pipelines: Strategy 2
- Use early completion detection
  - completion detector moved before stage (not after)
  - stage indicates "early done" in parallel with computation
[Figure label: early completion detector]
Benefit: again, stage gets head-start on next cycle
5. Single-Rail Styles
- Adapt dual-rail styles to single-rail
  - replace dual-rail function blocks by single-rail blocks
  - replace completion detectors by matched delays
Example: LPsr2/2
6. Single-Rail Styles (contd.)
7. High-Capacity Pipelines
- Singh/Nowick: WVLSI-00, ISSCC-02, Async-02
8. Recent Approaches
- 3 novel styles for high-speed async pipelining
  - Lookahead Pipelines (LP): Singh/Nowick, Async-00
  - High-Capacity Pipelines (HC): Singh/Nowick, WVLSI-00
  - MOUSETRAP Pipelines: Singh/Nowick, TAU-00
- Goal: significantly improve throughput of PS0
- Two distinct strategies
  - LP: introduce protocol optimizations
    - shave off components from critical cycle
  - HC: fundamentally new protocol
    - greater concurrency: loosely-coupled stages
9. High-Capacity Pipeline (HC)
- Key idea: decouple control for pull-up and pull-down
  - increases pipeline concurrency: initiates next cycle early
  - once N+1 evaluates, can enter isolate (hold) phase
  - stage N allowed to complete entire next cycle!
[Figure: stages N, N+1, N+2]
10. Inside an HC stage
- Decoupled control: pull-up and pull-down stacks are independently controllable
[Figure: dynamic gate with precharge control (pc), evaluation control (eval), keeper, pull-down stack, data inputs, data outputs]
- pc asserted: precharge
- eval asserted: evaluate
- both de-asserted: enter isolate (hold) phase
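The three phases listed above can be sketched as a small behavioural model. This is only an illustration, not the circuit from the slides: the names hc_phase and DynamicGate are hypothetical, the pull-down stack is assumed to be a 2-input AND, the output is assumed to read low after precharge (as seen after a domino-style output inverter), and asserting pc and eval together is assumed to be illegal.

```python
from enum import Enum


class Phase(Enum):
    PRECHARGE = "precharge"
    EVALUATE = "evaluate"
    ISOLATE = "isolate"


def hc_phase(pc: bool, eval_: bool) -> Phase:
    """Map the two independent control signals of an HC stage to its phase."""
    if pc and eval_:
        # Assumption: the controller never asserts both at once.
        raise ValueError("pc and eval must not be asserted together")
    if pc:
        return Phase.PRECHARGE
    if eval_:
        return Phase.EVALUATE
    return Phase.ISOLATE


class DynamicGate:
    """Behavioural model of one dynamic gate (assumed 2-input AND pull-down)."""

    def __init__(self):
        self.out = 0  # assumed polarity: precharged output reads as 0

    def step(self, pc: bool, eval_: bool, a: int, b: int) -> int:
        phase = hc_phase(pc, eval_)
        if phase is Phase.PRECHARGE:
            self.out = 0
        elif phase is Phase.EVALUATE:
            # Dynamic logic is monotonic: the output can only rise here.
            self.out |= a & b
        # ISOLATE: the keeper holds the value; inputs are ignored.
        return self.out
```

In the isolate phase the stage neither precharges nor evaluates, so its output survives any change at its inputs. That is what lets the previous stage move on to its next cycle.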
11. Cycle of an LPHC Stage
[Figure: Stage N and Stage N+1]
- Only a single backward synchronization arc
  - once stage N+1 has completed Eval, N can perform entire next cycle!
  - why safe? N+1 enters isolate phase: key to greater concurrency
  - almost all existing approaches require 2 arcs
- One (natural) forward synchronization arc
  - stage N+1 evaluates new data only after N has evaluated
12. Formal Specification of Controller
- Problem: specification too concurrent for direct synthesis
  - desired precharge condition: N and N+1 have evaluated same data
  - problem: this condition not uniquely captured by given signals!
  - N may evaluate next data item, while N+1 stuck on current item!
13. Modified Specification of Controller
- Solution: add a state variable, ok2pc
  - ok2pc records whether N+1 has absorbed N's data item
  - ok2pc resets immediately when N deletes item (N precharges)
  - ok2pc is set when N+1 deletes item (N+1 precharges)
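The role of ok2pc can be illustrated with an event-level sketch that tracks which data item each stage holds. This is a simplified model under stated assumptions, not the formal specification: the class StagePair and its method names are hypothetical, the pipeline is assumed to start empty with ok2pc set, and within one cycle N is assumed to precharge before N+1 does.

```python
class StagePair:
    """Stages N and N+1, tracked by the data item each currently holds."""

    def __init__(self):
        self.item_n = None   # None means the stage is precharged (empty)
        self.item_n1 = None
        self.ok2pc = True    # assumption: starts set, N+1 holds no stale item

    def n_evaluates(self, item):
        assert self.item_n is None, "N must be precharged before evaluating"
        self.item_n = item

    def n1_evaluates(self):
        # Forward arc: N+1 evaluates only after N has evaluated.
        assert self.item_n is not None and self.item_n1 is None
        self.item_n1 = self.item_n

    def n_precharges(self):
        self.item_n = None
        self.ok2pc = False   # reset as soon as N deletes its item

    def n1_precharges(self):
        self.item_n1 = None
        self.ok2pc = True    # set when N+1 deletes its item

    def both_evaluated(self):
        # What the raw "has evaluated" signals alone can express.
        return self.item_n is not None and self.item_n1 is not None

    def may_precharge_n(self):
        # Precharge condition once the state variable is added.
        return self.both_evaluated() and self.ok2pc


pair = StagePair()
pair.n_evaluates("d0")
pair.n1_evaluates()
first = pair.may_precharge_n()    # True: both stages hold d0
pair.n_precharges()
pair.n_evaluates("d1")            # N races ahead; N+1 still holds d0
naive = pair.both_evaluated()     # True: raw signals cannot tell d0 from d1
guarded = pair.may_precharge_n()  # False: ok2pc blocks the early precharge
```

The naive condition fires while N holds d1 and N+1 still holds d0, which would destroy d1 before N+1 has absorbed it. With ok2pc, precharge of N stays blocked until N+1 has deleted d0 and evaluated d1.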
14. Controller implementation
[Figure: controller gates. pc: NAND3; ok2pc: aC; eval: INV; input signals S and T]
- Controller implementation is very simple
  - each signal implemented using a single gate
  - ok2pc typically off the critical path
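A gate-level reading of this controller can be sketched as follows. Several details are assumptions because the figure is not available: S and T are taken to be the "has evaluated" indications of stages N and N+1, aC is read as an asymmetric C-element that resets when S is low and sets when S is high and T is low, the NAND3 is assumed to take S, T and ok2pc and to produce an active-low precharge signal (named pc_bar here), and the function names are hypothetical.

```python
def nand3(a: int, b: int, c: int) -> int:
    return 0 if (a and b and c) else 1


def inv(a: int) -> int:
    return 1 - a


def asym_c(prev: int, s: int, t: int) -> int:
    """Assumed asymmetric C-element behaviour for ok2pc."""
    if s == 0:
        return 0      # N has precharged: reset
    if t == 0:
        return 1      # N holds data and N+1 has precharged: set
    return prev       # both stages hold data: keep the stored value


def controller(s: int, t: int, ok2pc_prev: int):
    """Return (pc_bar, eval, ok2pc) for stage N; one gate per signal."""
    ok2pc = asym_c(ok2pc_prev, s, t)
    pc_bar = nand3(s, t, ok2pc)   # 0 means "precharge N"
    eval_ = inv(s)                # N is enabled to evaluate once precharged
    return pc_bar, eval_, ok2pc


def run(trace):
    """Apply a sequence of (S, T) values and collect the controller outputs."""
    ok2pc, outputs = 0, []
    for s, t in trace:
        pc_bar, eval_, ok2pc = controller(s, t, ok2pc)
        outputs.append((pc_bar, eval_, ok2pc))
    return outputs


# (S, T) over time: empty, N evaluates, N+1 evaluates, N precharges,
# N evaluates the next item while N+1 still holds the old one,
# N+1 precharges, N+1 evaluates the next item.
trace = [(0, 0), (1, 0), (1, 1), (0, 1), (1, 1), (1, 0), (1, 1)]
outputs = run(trace)
```

In this trace the input pair (1, 1) occurs three times, but pc_bar goes low only on the first and third occurrence. On the second, ok2pc is still 0 because N+1 has not yet deleted the old item, so N keeps its new data.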
15. Performance
[Figure: stages N, N+1, N+2, with labeled events: N enables itself for next evaluation; N precharges; N evaluates; N+1 evaluates]
16. Ripple-Carry Adder: One Stage
- Mixed dual-rail/single-rail datapath
  - single-rail sum
  - dual-rail A, B, carry-in and carry-out
  - must implement binate functions using unate dynamic logic
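The last point can be made concrete with a functional sketch of one adder stage. Dynamic logic cannot invert its inputs, so a binate function such as XOR is built from AND/OR terms over the true and false rails of the dual-rail inputs. The function names and the (true rail, false rail) tuple encoding are assumptions for illustration; the actual transistor-level stage is not given in the slides.

```python
def encode(bit: int):
    """Dual-rail encoding, assumed as (true_rail, false_rail)."""
    return (bit, 1 - bit)


def adder_stage(a, b, cin):
    """One full-adder stage using only AND/OR of rails (no inversion).

    a, b, cin are dual-rail pairs. Returns (sum, (cout_t, cout_f)):
    a single-rail sum and a dual-rail carry-out.
    """
    a_t, a_f = a
    b_t, b_f = b
    c_t, c_f = cin
    # Carry-out true rail: majority of the true rails.
    cout_t = (a_t & b_t) | (a_t & c_t) | (b_t & c_t)
    # Carry-out false rail: majority of the false rails.
    cout_f = (a_f & b_f) | (a_f & c_f) | (b_f & c_f)
    # Sum = a XOR b XOR cin, written as a unate sum of products.
    s = ((a_t & b_f & c_f) | (a_f & b_t & c_f)
         | (a_f & b_f & c_t) | (a_t & b_t & c_t))
    return s, (cout_t, cout_f)


def ripple_add(x: int, y: int, width: int = 32) -> int:
    """Chain the stages, least significant bit first (result modulo 2**width)."""
    carry = encode(0)
    total = 0
    for i in range(width):
        s, carry = adder_stage(encode((x >> i) & 1), encode((y >> i) & 1), carry)
        total |= s << i
    return total
```

With all rails low (the precharged state) every output is low, and raising any rail can only raise outputs, which is the monotonic behaviour dynamic logic requires.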
17. Final Adder Architecture
[Figure: adder stages ordered from least significant to most significant, each with inputs A, B and carry-in and outputs carry-out and sum; shift-registers provide operand bits; shift-registers accumulate sum bits]
18. Results
- Designed/simulated adder in each pipeline style
- Experimental setup
  - design: 32-bit ripple-carry adder
  - technology: 0.6 µm HP CMOS, at 3.3 V and 300 K
New LPHC style: 10% faster than LPSR2/1
19. Conclusions
- Introduced 2 new asynchronous adders
- Use novel pipeline protocols
  - observe events from multiple later stages
  - decouple control of pull-up/pull-down
- Especially suitable for fine-grain (gate-level) pipelining
- Very high throughputs obtained
  - 0.93-1.02 GHz in 0.6 µm
  - expected to outperform the best (IPCMOS: 3.3-4.5 GHz / 0.18 µm)
- LPHC doubles the typical storage capacity
- Robustly handle arbitrary-speed environments
  - useful as IPs
- Future work: layout/fabrication, application to DSPs