Title: A Quasi-Delay-Insensitive Method to Overcome Transistor Variation
- Charlie Brej
- APT Group
- University of Manchester
Overview
- Synchronous Problems
- Asynchronous Logic
- Why?
- How?
- Asynchronous Benefits
- Delay Insensitivity
- Early Output
Problems: Communication
- Communication horizon
- For a 60 nanometer process, a signal can reach only 5% of the die's length in a clock cycle (D. Matzke, 1997)
- Clock distributed using wave pipelining
Can't keep ramping up the clock
- Intel pulls the plug on 4GHz Pentium 4
- AMD and Intel using PR-based model numbers
- New ranges run at much slower clock rates
- Greater emphasis on parallel execution
- Hyper-threading
- Multiple cores
Problems: Performance
Figure: cycle time, worst-case vs. average-case performance, broken down into:
- Real computation
- Unbalanced stages
- Clock overheads
- Clock skew/jitter
- Transistor variability
- Timing assumption overheads
- Signal integrity
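To make the worst-case vs. average-case point concrete, here is a minimal Python sketch assuming a 32-bit ripple-carry adder whose delay is set by its longest carry chain. The unit gate delay and the two per-operation overheads (`GATE_DELAY`, `SYNC_OVERHEAD`, `ASYNC_OVERHEAD`) are hypothetical placeholders, not figures from the talk. The clocked version charges every addition the full 32-stage ripple, while a self-timed version finishes when its own carries settle.

```python
import random

WIDTH = 32
GATE_DELAY = 1.0      # hypothetical: one time unit per carry stage
SYNC_OVERHEAD = 2.0   # hypothetical: latch setup, clock-to-Q, skew/jitter margin
ASYNC_OVERHEAD = 2.0  # hypothetical: completion detection and handshake


def longest_carry_chain(a: int, b: int, width: int) -> int:
    """Number of bit positions in the longest carry ripple of a + b."""
    longest = run = carry = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x & y:                # generate: a new carry starts here
            run, carry = 1, 1
        elif (x ^ y) and carry:  # propagate: the carry ripples one bit further
            run += 1
        else:                    # kill, or nothing to propagate: the chain ends
            run, carry = 0, 0
        longest = max(longest, run)
    return longest


def sync_time(ops) -> float:
    # The clock period must cover the worst case: a carry through every bit.
    return len(ops) * (WIDTH * GATE_DELAY + SYNC_OVERHEAD)


def async_time(ops) -> float:
    # Each addition completes when its own carries settle (at least one stage).
    return sum(max(1, longest_carry_chain(a, b, WIDTH)) * GATE_DELAY + ASYNC_OVERHEAD
               for a, b in ops)


rng = random.Random(0)
ops = [(rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)) for _ in range(10_000)]
print(f"clocked (worst case):      {sync_time(ops):.0f} time units")
print(f"self-timed (actual delay): {async_time(ops):.0f} time units")
```

With random operands the longest carry chain is usually far shorter than the word width, so the self-timed total comes out well below the clocked one. The exact ratio depends entirely on the assumed overheads.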
Clock! What is it good for?
- No arguing with the clock
- 9am - 5pm. No excuses!
Bundled-Data
Figure: request, delay element, acknowledge
- When you finish, do the next task
- Flexitime
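A bundled-data channel only works if the request, slowed by the delay element, arrives after every data wire has settled. The sketch below assumes each data bit has its own settling time and that the receiver latches the bus at the instant the delayed request arrives. The function name `bundled_transfer` and all delay values are hypothetical, chosen to show what happens when transistor variation slows one wire past the designed margin.

```python
from typing import List


def bundled_transfer(old_word: int, new_word: int,
                     wire_delays: List[float], req_delay: float) -> int:
    """Sender drives new_word and raises Request at t = 0. Request reaches the
    receiver through a delay element after req_delay, and the receiver latches
    the bus at that instant (it would then raise Acknowledge). Any bit whose
    wire has not settled yet is captured with its old value."""
    latched = 0
    for i, settle in enumerate(wire_delays):
        source = new_word if settle <= req_delay else old_word
        latched |= ((source >> i) & 1) << i
    return latched


WIDTH = 8
REQ_DELAY = 1.2            # hypothetical: nominal wire delay plus a 20% margin
nominal = [1.0] * WIDTH    # hypothetical: every data wire settles at t = 1.0

print(hex(bundled_transfer(0x00, 0xA5, nominal, REQ_DELAY)))  # 0xa5

slow = nominal.copy()
slow[5] = 1.5              # hypothetical: variation slows bit 5 beyond the margin
print(hex(bundled_transfer(0x00, 0xA5, slow, REQ_DELAY)))     # 0x85
```

Nothing in the channel signals that bit 5 was captured wrongly. The timing assumption fails silently, and this is the risk a delay-insensitive design avoids.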
Remove the Clock
Figure: cycle time, worst-case vs. average-case performance, broken down into:
- Real computation
- Unbalanced stages
- Clock overheads
- Clock skew/jitter
- Transistor variability
- Timing assumption overheads
- Signal integrity
How do you know when you are finished?
- Synchronous
- Estimate
- Global timing reference
- Asynchronous (bundled-data)
- Estimate
- Local delay elements
- Asynchronous (delay-insensitive)
- When the data arrives
- Intrinsic
Becoming Delay Insensitive
- Dual-Rail
- Two wires
- 00 NULL
- 01 Zero
- 10 One
- (11 Not used)
- Four-phase handshake
- Return to zero
Figure: waveforms of R0, R1 and Ack
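The dual-rail code and the four-phase handshake can be sketched in Python as follows. The sketch rests on two assumptions. First, each bit is written as the pair (R1, R0), so the tuples read like the codes above (01 = Zero, 10 = One, 00 = NULL). Second, wires change one at a time in a random order, standing in for unknown wire and gate delays. The helper names are hypothetical. Whatever the order, the receiver raises Ack only once every bit has left NULL, and lowers it only once every bit is back at NULL.

```python
import random
from typing import List, Optional, Tuple

# Assumption: a dual-rail bit is the wire pair (R1, R0), so (0, 1) = Zero,
# (1, 0) = One and (0, 0) = NULL, matching the codes listed above.
DualRail = Tuple[int, int]
NULL: DualRail = (0, 0)


def encode(bit: int) -> DualRail:
    return (1, 0) if bit else (0, 1)


def decode(pair: DualRail) -> Optional[int]:
    """0 or 1 for a valid code, None for NULL; 11 is illegal."""
    r1, r0 = pair
    if r1 and r0:
        raise ValueError("11 is not a legal dual-rail code")
    if r1:
        return 1
    if r0:
        return 0
    return None


def complete(word: List[DualRail]) -> bool:
    """Completion detection: every bit has left NULL."""
    return all(decode(p) is not None for p in word)


def all_null(word: List[DualRail]) -> bool:
    return all(p == NULL for p in word)


def four_phase_transfer(bits: List[int], rng: random.Random) -> List[str]:
    """One return-to-zero transfer; wires change in a random order."""
    log = []
    word = [NULL] * len(bits)
    ack = 0
    # Phase 1: sender drives a codeword; bits arrive in any order.
    for i in rng.sample(range(len(bits)), len(bits)):
        word[i] = encode(bits[i])
        if not ack and complete(word):
            ack = 1  # Phase 2: receiver has a complete value and raises Ack
            log.append(f"Ack+ with value {[decode(p) for p in word]}")
    assert ack == 1  # sender waits for Ack before releasing the data
    # Phase 3: sender returns every bit to NULL, again in any order.
    for i in rng.sample(range(len(bits)), len(bits)):
        word[i] = NULL
        if ack and all_null(word):
            ack = 0  # Phase 4: receiver sees the NULL spacer and lowers Ack
            log.append("Ack- (all NULL)")
    return log


for line in four_phase_transfer([1, 0, 1, 1], random.Random(42)):
    print(line)
```

Because the completion check looks at the data itself, the transfer is correct for any arrival order, and no delay element has to be matched to the slowest wire.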
Delay Insensitivity
- No assumptions on speed of wires or gates
- Environmental effects
- Heat
- Voltage supply
- Manufacturing defects
- Thin-film transistors
- Next generation process sizes
Early Output Logic
- Dual-Rail interfaces
- Output generated as early as possible
- Two early-output cases (AND gate): input A is 0, or input B is 0
- If either input is 0, the output is 0
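An early-output dual-rail AND gate can be sketched as below. Each bit is again written as (R1, R0) with (0, 0) as NULL. The names `and_strict` and `settle_time` and the arrival times in the example are hypothetical, introduced only for illustration. The output's R0 rail rises as soon as either input's R0 rail does, so a 0 on one input produces the result without waiting for the other; a 1 needs both inputs.

```python
from typing import Callable, Tuple

DualRail = Tuple[int, int]  # assumption: (R1, R0); (0, 0) is NULL
NULL: DualRail = (0, 0)
ZERO: DualRail = (0, 1)
ONE: DualRail = (1, 0)


def encode(bit: int) -> DualRail:
    return ONE if bit else ZERO


def and_early(a: DualRail, b: DualRail) -> DualRail:
    """Early-output dual-rail AND: either R0 high forces a 0 immediately;
    R1 rises only when both inputs are 1."""
    (a1, a0), (b1, b0) = a, b
    return (a1 & b1, a0 | b0)


def and_strict(a: DualRail, b: DualRail) -> DualRail:
    """Hypothetical non-early AND for comparison: waits for both inputs."""
    if a == NULL or b == NULL:
        return NULL
    return and_early(a, b)


def settle_time(gate: Callable[[DualRail, DualRail], DualRail],
                a: int, ta: float, b: int, tb: float) -> float:
    """Earliest time the gate's output leaves NULL, given input arrival times."""
    for t in sorted({ta, tb}):
        ina = encode(a) if t >= ta else NULL
        inb = encode(b) if t >= tb else NULL
        if gate(ina, inb) != NULL:
            return t
    raise RuntimeError("output never became valid")


print(settle_time(and_early, 0, 1.0, 1, 10.0))   # 1.0
print(settle_time(and_strict, 0, 1.0, 1, 10.0))  # 10.0
print(settle_time(and_early, 1, 1.0, 1, 10.0))   # 10.0
```

Here a 0 arriving at t = 1 lets the early-output gate respond at t = 1, even though the other input arrives at t = 10, while the strict gate waits for both. A plain early-output gate still has to deal with the late input before the next data phase; the paper's contribution concerns isolating such late inputs.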
Bit-level pipelining
- Forward completed parts of the result
- Pace work
- Don't stall parts unless you have to
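A simplified timing model of bit-level pipelining is sketched below. It assumes the bits of a word are processed independently (no carries between bits) and uses hypothetical per-bit delays for two stages. With word-level completion, each stage waits for its slowest bit before the next stage starts. With bit-level pipelining, each bit moves on as soon as it is done. The function names are hypothetical.

```python
from typing import List


def word_level(stage_delays: List[List[float]]) -> List[float]:
    """Every stage waits for its slowest bit before handing the word on."""
    t = 0.0
    for delays in stage_delays:
        t += max(delays)
    return [t] * len(stage_delays[0])


def bit_level(stage_delays: List[List[float]]) -> List[float]:
    """Each bit is forwarded to the next stage as soon as that bit completes."""
    ready = [0.0] * len(stage_delays[0])
    for delays in stage_delays:
        ready = [r + d for r, d in zip(ready, delays)]
    return ready


# Hypothetical delays: a different bit is the slowest in each stage.
stages = [[1.0, 1.0, 1.0, 4.0],
          [4.0, 1.0, 1.0, 1.0]]
print(word_level(stages))  # [8.0, 8.0, 8.0, 8.0]
print(bit_level(stages))   # [5.0, 2.0, 2.0, 5.0]
```

The whole word is ready at t = 5 rather than t = 8 because the slow bits of the two stages no longer stall each other. The gain disappears when the same bit is slowest in every stage.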
Early Output Cases
Paper contribution
- Still generates results when some inputs are missing
- Isolates late inputs
- Allows next data phase
Remove Unnecessary Computation
Figure: cycle time, worst-case vs. average-case performance, broken down into:
- Real computation
- Unnecessary computation/delays
- Unbalanced stages
- Clock overheads
- Clock skew/jitter
- Transistor variability
- Timing assumption overheads
- Signal integrity
Summary
- Asynchronous
- Delay Insensitive
- Safe
- No timing assumptions
- Average case performance
- Remove unnecessary computation