Title: Low-Latency Interfaces for Mixed-Timing Domains [in DAC-01]
1 Low-Latency Interfaces for Mixed-Timing Domains (in DAC-01)
- Tiberiu Chelcea Steven M. Nowick
- Department of Computer Science
- Columbia University
- tibi,nowick_at_cs.columbia.edu
2 Introduction
- Key trend in VLSI systems: systems-on-a-chip (SoC)
- Two fundamental challenges:
  - mixed-timing domains
  - long interconnect delays
- Our goal: design of efficient interface circuits
- Desirable features:
  - arbitrarily robust
  - low-latency, high-throughput
  - modularity, scalability
- Few satisfactory solutions to date.
3 Timing Issues in SoC Design
[Figure: (a) single-clock and (b) mixed-timing domains. In each case Domain 1 and Domain 2 communicate over a long interconnect; in (b) each domain is sync or async.]
4 Timing Issues in SoC Design (cont.)
- Solution: provide interface circuits
[Figure: (a) single-clock and (b) mixed-timing domains, as on the previous slide, with interface circuits added on the long interconnect between Domain 1 and Domain 2. Labels: "Carloni et al.: relay stations", "NEW: mixed-timing FIFOs", "NEW: mixed-timing relay stations".]
5 Contributions
- Complete set of mixed-timing interface circuits:
  - sync-sync, async-sync, sync-async, async-async
- Features:
  - Arbitrary robustness with respect to synchronization failures
  - High throughput: in steady-state operation, no synchronization overhead
  - Low latency ("fast restart"): in an empty FIFO, only synchronization overhead
  - Reusability: each interface partitioned into reusable sub-components
- Two contributions:
  - Mixed-Timing FIFOs
  - Mixed-Timing Relay Stations
6 Contribution 1: Mixed-Timing FIFOs
- Addresses issue of interfacing mixed-timing domains
- Features: token ring architecture
  - circular array of identical cells
  - shared buses: data and control
  - data immobile once enqueued
  - distributed control allows concurrent put/get operations
  - 2 circulating tokens define tail and head of queue
- Potential benefits:
  - low latency
  - low power
  - scalability
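The token ring idea on this slide can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class name `TokenRingFifo` and its fields are hypothetical, it is untimed (no clocks, no synchronizers), and it is not the circuit from the talk. It shows just the two properties named above: data stays in the cell where it was written, and only the put and get tokens move.

```python
class TokenRingFifo:
    """Untimed behavioural sketch of a token-ring FIFO (hypothetical model)."""

    def __init__(self, n_cells):
        self.data = [None] * n_cells    # one register per cell; data never moves
        self.full = [False] * n_cells   # per-cell "full" flag
        self.ptok = 0                   # cell holding the put token (tail)
        self.gtok = 0                   # cell holding the get token (head)

    def put(self, item):
        """Write into the cell holding the put token; return False if it is occupied."""
        i = self.ptok
        if self.full[i]:
            return False                # tail cell still occupied: FIFO is full
        self.data[i] = item
        self.full[i] = True
        self.ptok = (i + 1) % len(self.data)   # pass the put token to the next cell
        return True

    def get(self):
        """Read from the cell holding the get token; return None if it is empty."""
        i = self.gtok
        if not self.full[i]:
            return None                 # head cell empty: FIFO is empty
        item = self.data[i]
        self.full[i] = False
        self.gtok = (i + 1) % len(self.data)   # pass the get token to the next cell
        return item


fifo = TokenRingFifo(4)
for x in "abcd":
    fifo.put(x)
print(fifo.put("e"))   # the ring is full, so this put is refused
print(fifo.get())      # oldest item comes out first
```

Because each operation touches only the cell that holds its token, a put and a get can proceed at the same time as long as the two tokens sit on different cells.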
7 Contribution 2: Mixed-Timing Relay Stations
- Addresses issue of long interconnect delays
- Latency-insensitive protocols: safely tolerate long interconnect delays between systems
- Prior contribution: introduce "relay stations"
  - single-clock domains (Carloni et al., ICCAD-99)
- Our contribution: introduce "mixed-timing relay stations"
  - mixed-clock (sync-sync)
  - async-sync
- First proposed solutions to date.
8 Related Work
- Single-clock domains: handling clock discrepancies
  - clock skew and jitter (Kol98, Greenstreet95)
  - long interconnect delays (Carloni99)
- Mixed-timing domains: 3 common approaches
  - Use "wrapper logic"
    - add logic layer to synchronize data/control (Seitz80, Seizovic94)
    - drawback: long latencies in communication
  - Modify receiver's clock
    - stretchable and pausible clocks (Chapiro84, Yun96, Bormann97, Sjogren/Myers97)
    - drawback: penalties in restarting clock
9 Related Work: Closer Approaches
- Mixed-timing domains (cont.)
  - Interface circuits: mixed-clock FIFOs (Intel, Jex et al. 1997)
    - drawback: significant area overhead (synchronizer for each cell)
- Our approach: mixed-clock FIFOs
  - only 2 synchronizers for entire FIFO
10 Outline
- Mixed-Clock Interfaces
  - FIFO
  - Relay Station
- Async-Sync Interfaces
  - FIFO
  - Relay Station
- Results
- Conclusions
11 Mixed-Clock FIFO: Block Level
[Figure: block diagram of the mixed-clock FIFO. Synchronous put interface: CLK_put, req_put, data_put, full. Synchronous get interface: CLK_get, req_get, data_get, valid_get, empty.]
13 Mixed-Clock FIFO: Steady-State Simulation (slides 13-17)
- Steady state: FIFO neither full nor empty
[Figure sequence: steady-state simulation of the FIFO, with signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty. Annotations: "At the end of clock cycle" (slide 13), "Get Operation" (slide 15).]
18 Mixed-Clock FIFO: Full Scenario (slides 18-21)
[Figure sequence: simulation of the FIFO becoming full and then not full, with signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty. Annotations: "FIFO FULL" (slide 18), "FIFO NOT FULL" (slide 20).]
22 Mixed-Clock FIFO: Cell Implementation
[Figure: one FIFO cell. Put side: CLK_put, en_put, req_put, data_put, ptok_in, ptok_out. Storage and state: REG, f_i, e_i. Get side: CLK_get, en_get, gtok_in, gtok_out, valid, data_get.]
23 Mixed-Clock FIFO: Architecture
[Figure: FIFO architecture, with signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty.]
24 Synchronization Issues
- Challenge: interfaces are highly concurrent
  - Global FIFO state controlled by 2 different clocks
- Problem 1: Metastability
  - Each FIFO interface needs clean state signals
- Solution: Synchronize full and empty signals
  - full with CLK_put
  - empty with CLK_get
  - Add 2 (or more) synchronizing latches to each signal
- Observable full/empty safely approximate true FIFO state
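The latency cost of the synchronizing latches can be seen in a small cycle-level model. This is a sketch only: the class name `Synchronizer` is hypothetical, the chain is modelled as an ideal shift register clocked by the receiving interface, and metastability itself is not modelled.

```python
class Synchronizer:
    """Ideal model of a chain of synchronizing latches (hypothetical sketch)."""

    def __init__(self, stages=2, initial=False):
        self.chain = [initial] * stages

    def clock(self, raw):
        """Advance one clock edge of the receiving interface.

        `raw` is the unsynchronized detector output sampled on this edge.
        Returns the value the interface observes after the edge.
        """
        self.chain = [raw] + self.chain[:-1]
        return self.chain[-1]


sync_full = Synchronizer(stages=2)
observed = [sync_full.clock(raw) for raw in (True, True, False, False)]
print(observed)   # the change in `raw` shows up only on the second edge
```

With two latches, the interface sees each change of the raw signal one edge later than it would with a single register, which is the extra latency that the following slides compensate for.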
25 Synchronization Issues (cont.)
- Problem 2: FIFO now may underflow/overflow!
  - synchronizing latches add extra latency
- Solution: Modify definitions of full and empty
  - New FULL: 0 or 1 empty cells left
  - New EMPTY: 0 or 1 full cells left
[Figure: new full detector]
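The modified definitions can be written down directly as predicates over the per-cell state. This is a minimal sketch, assuming the cell state is given as a list of booleans (True for a full cell); the function names are hypothetical and the gate-level detector from the figure is not reproduced.

```python
def new_full(cell_full):
    """FULL under the modified definition: 0 or 1 empty cells left."""
    return cell_full.count(False) <= 1


def new_empty(cell_full):
    """EMPTY under the modified definition: 0 or 1 full cells left."""
    return cell_full.count(True) <= 1


# 4-cell FIFO with one empty cell left: already reported as full
print(new_full([True, True, True, False]))
# 4-cell FIFO with one data item left: already reported as empty
print(new_empty([False, True, False, False]))
```

Both signals are asserted one cell early, which leaves a margin of one cell while the synchronized signal catches up with the true state.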
26 Synchronization Issues (cont.)
- Problem 3: Potential for deadlock
  - Scenario: suppose only 1 data item in quiescent FIFO
    - FIFO still considered empty (new definition)
    - Get interface cannot dequeue data item!
- Solution: bi-modal empty detector, combines
  - "New empty" detector (0 or 1 data items)
  - "True empty" detector (0 data items)
- Two results folded into single global empty signal
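The deadlock scenario and its fix can be sketched as follows. The selection rule used here is an assumption, not something the slides spell out: the conservative "new empty" result is used while get operations are in progress, and the "true empty" result is used once the get interface has gone quiet. The function names and the `recent_get` flag are hypothetical, and synchronizer delays are ignored.

```python
def true_empty(cell_full):
    """True empty detector: 0 data items."""
    return not any(cell_full)


def new_empty(cell_full):
    """New empty detector: 0 or 1 data items."""
    return cell_full.count(True) <= 1


def bimodal_empty(cell_full, recent_get):
    """Fold the two detectors into a single empty signal.

    Assumption: `recent_get` is True while gets are in progress
    (conservative mode) and False when the get interface is quiescent.
    """
    return new_empty(cell_full) if recent_get else true_empty(cell_full)


one_item = [True, False, False, False]
print(bimodal_empty(one_item, recent_get=True))    # conservative: reported empty
print(bimodal_empty(one_item, recent_get=False))   # quiescent: last item can be dequeued
```

In the quiescent case the single remaining item is no longer hidden behind the conservative definition, so the get interface is able to dequeue it.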
27 Synchronization Issues: Avoiding Deadlock
Bi-modal empty detection: select either ne or oe
[Figure: bi-modal empty detector. The ne and oe detectors are built from the cell signals f_0 to f_3 and clocked by CLK_get; further inputs are en_get and req_get; output is empty.]
28 Mixed-Clock FIFO: Architecture
[Figure: FIFO architecture, with signals full, req_put, data_put, CLK_put, CLK_get, data_get, req_get, valid_get, empty.]
29 Put/Get Controllers
[Figure: put controller (req_put, full, en_put) and get controller (req_get, empty, valid, en_get, valid_get).]
- Put Controller
  - enables put operation
  - disabled when FIFO full
- Get Controller
  - enables get operation
  - indicates when data valid
  - disabled when FIFO empty
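The behaviour listed for the two controllers can be sketched as boolean functions. The exact gating is an assumption read off the bullets and signal names (the slides do not give the logic equations), and the function names are hypothetical.

```python
def put_controller(req_put, full):
    """Return en_put: a put is enabled when requested and the FIFO is not full."""
    return req_put and not full


def get_controller(req_get, empty, valid):
    """Return (en_get, valid_get).

    en_get:    a get is enabled when requested and the FIFO is not empty.
    valid_get: the dequeued item is flagged valid only when a get is
               enabled and the cell reports valid data (assumption).
    """
    en_get = req_get and not empty
    valid_get = en_get and valid
    return en_get, valid_get


print(put_controller(req_put=True, full=True))                # put blocked by full
print(get_controller(req_get=True, empty=False, valid=True))  # get proceeds, data valid
```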
30 Outline
- Mixed-Clock Interfaces
  - FIFO
  - Relay Station
- Async-Sync Interfaces
  - FIFO
  - Relay Station
- Results
- Conclusions
31 Relay Stations: Overview
Proposed by Carloni et al. (ICCAD-99)
[Figure: relay stations overview; labels: System 1, System 2.]
32 Relay Stations: Implementation
[Figure: relay station with ports packetIn, packetOut, stopIn, stopOut.]
- In normal operation:
  - packetIn copied to MR and forwarded on packetOut
- When stopped (stopIn = 1):
  - stopOut raised on the next clock edge
  - extra packet copied to AR
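The stop behaviour described above can be sketched as a cycle-level model. This is a simplified illustration, not the circuit from the slide: MR and AR are read here as a main and an auxiliary register, the class name `RelayStation` is hypothetical, and the sender is assumed to stop sending in any cycle where it sees stopOut high.

```python
class RelayStation:
    """Cycle-level sketch of a single-clock relay station (hypothetical model)."""

    def __init__(self):
        self.mr = None          # MR: drives packetOut
        self.ar = None          # AR: holds the one extra packet after a stop
        self.stop_out = False   # registered stopOut, seen by the sender next cycle

    def clock(self, packet_in, stop_in):
        """One clock edge; returns the packet delivered downstream (None if stalled)."""
        if stop_in:
            if not self.stop_out:
                self.ar = packet_in    # sender has not seen the stop yet
                self.stop_out = True   # raised on this edge
            return None                # MR holds its packet while stopped
        delivered = self.mr
        if self.stop_out:
            self.mr, self.ar = self.ar, None   # drain the extra packet first
            self.stop_out = False
        else:
            self.mr = packet_in        # normal operation: packetIn copied to MR
        return delivered


rs = RelayStation()
stimulus = [("a", False), ("b", False), ("c", True), (None, True),
            (None, False), ("d", False), ("e", False)]
print([rs.clock(p, s) for p, s in stimulus])
```

Packet "c" arrives in the cycle where the stop is first seen, so it is parked in AR and delivered after "b" once the stop is released; no packet is lost or reordered.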
33 Relay Station vs. Mixed-Clock FIFO
[Figure: relay station (validIn, dataIn, stopOut; validOut, dataOut, stopIn) beside mixed-clock FIFO (req_put, dataIn, full; req_get, dataOut, empty).]
- Relay Station
  - Steady state: always pass data
  - Data items: both valid and invalid
  - Stopping mechanism: stopIn and stopOut
- Mixed-Clock FIFO
  - Steady state: only pass data when requested
  - Data items: only valid data
  - Stopping mechanism: none (only full/empty)
34 Mixed-Clock Relay Stations (MCRS)
[Figure: System 1 and System 2 connected through relay stations; label: CLK.]
Mixed-Clock Relay Station: derived from the Mixed-Clock FIFO
35 Mixed-Clock Relay Station: Implementation
Mixed-Clock Relay Station vs. Mixed-Clock FIFO
- Identical:
  - FIFO cells
  - Full/Empty detectors (...or can simplify)
- Only modify: Put and Get Controllers
[Figure: put controller (validIn, full, en_put to cells) and get controller (stopIn, empty, valid, en_get to cells, validOut).]
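One way the modified controllers could look is sketched below. All of the gating here is an assumption inferred from the signal names in the figure (validIn, full, en_put; stopIn, empty, valid, en_get, validOut); in particular, using full directly as the stopOut signal to the sender is a guess, and the function names are hypothetical.

```python
def rs_put_controller(valid_in, full):
    """Return (en_put, stop_out) for the relay-station put side (assumed logic).

    Only valid packets are enqueued; full is reused as stopOut.
    """
    en_put = valid_in and not full
    stop_out = full
    return en_put, stop_out


def rs_get_controller(stop_in, empty, valid):
    """Return (en_get, valid_out) for the relay-station get side (assumed logic).

    A packet is dequeued every cycle unless the receiver has asserted
    stopIn or the FIFO is empty; otherwise an invalid packet is passed on.
    """
    en_get = not stop_in and not empty
    valid_out = en_get and valid
    return en_get, valid_out


print(rs_put_controller(valid_in=True, full=False))
print(rs_get_controller(stop_in=True, empty=False, valid=True))
```

Compared with the FIFO controllers, the request inputs disappear: the put side is driven by validIn and the get side runs continuously until stopped, which matches the "always pass data" behaviour of a relay station.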
36 Outline
- Mixed-Clock Interfaces
  - FIFO
  - Relay Station
- Async-Sync Interfaces
  - FIFO
  - Relay Station
- Results
- Conclusions
37 Async-Sync FIFO: Block Level
[Figure: mixed-clock FIFO (CLK_put, req_put, data_put, full; CLK_get, req_get, data_get, valid_get, empty) beside async-sync FIFO (async domain: put_req, put_ack, put_data; sync domain: CLK_get, req_get, data_get, valid_get, empty).]
- Asynchronous put interface: uses handshaking communication
  - put_req: request operation
  - put_ack: acknowledge completion
  - no full signal
- Synchronous get interface: no change
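The handshaking put interface can be illustrated with a small level-based model. The slides name only put_req and put_ack; the four-phase (return-to-zero) protocol used below is an assumption, as are the class name `AsyncPutInterface` and the untimed get side. The point of the sketch is that a full FIFO needs no full signal: the acknowledge is simply withheld until a cell is free.

```python
from collections import deque


class AsyncPutInterface:
    """Sketch of an asynchronous put port with an assumed four-phase handshake."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.put_ack = False

    def drive(self, put_req, put_data=None):
        """Apply the sender's put_req level; return the resulting put_ack level."""
        if put_req and not self.put_ack:
            if len(self.queue) < self.capacity:
                self.queue.append(put_data)   # room available: latch the data
                self.put_ack = True           # acknowledge completion
            # no room: put_ack stays low and the sender keeps waiting
        elif not put_req and self.put_ack:
            self.put_ack = False              # return-to-zero phase
        return self.put_ack

    def get(self):
        """Stand-in for the synchronous get side (untimed)."""
        return self.queue.popleft() if self.queue else None


port = AsyncPutInterface(capacity=1)
print(port.drive(True, "a"))   # accepted and acknowledged
print(port.drive(False))       # handshake returns to zero
print(port.drive(True, "b"))   # FIFO full: no acknowledge yet
port.get()
print(port.drive(True, "b"))   # space freed: now acknowledged
```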
38 Async-Sync FIFO: Architecture
[Figure: array of cells with asynchronous put side (put_req, put_ack, put_data) and synchronous get side (CLK_get, req_get, data_get, valid_get, empty).]
39 Async-Sync FIFO: Cell Implementation
[Figure: one async-sync FIFO cell. Put side: put_req, put_ack, put_data, we, we1. Storage and state: REG, e_i, f_i. Get side: CLK_get, en_get, gtok_in, gtok_out, get_data.]
40 Async-Sync Relay Stations (ASRS)
[Figure: ASRS block diagram; labels: Micropipeline, ASRS, optional, CLK2.]
41 Outline
- Mixed-Clock Interfaces
  - FIFO
  - Relay Station
- Async-Sync Interfaces
  - FIFO
  - Relay Station
- Results
- Conclusions
42 Results
- Each circuit implemented
  - using both academic and industry tools
    - MINIMALIST: Burst-Mode controllers (Nowick et al. 99)
    - PETRIFY: Petri-Net controllers (Cortadella et al. 97)
- Pre-layout simulations: 0.6 µm HP CMOS technology
- Experiments:
  - various FIFO capacities (4/8/16 cells)
  - various data widths (8/16 bits)
43 Results: Latency
Experimental setup:
- 8-bit data items
- various FIFO capacities (4, 8, 16)
Latency: time from enqueuing a data item into an empty FIFO to dequeuing it

Design            4-place        8-place        16-place
                  Min    Max     Min    Max     Min    Max
Mixed-Clock       5.43   6.34    5.79   6.64    6.14   7.17
Async-Sync        5.53   6.45    6.13   7.17    6.47   7.51
Mixed-Clock RS    5.48   6.41    6.05   7.02    6.23   7.28
Async-Sync RS     5.61   6.35    6.18   7.13    6.57   7.62

For each design, latency is not uniquely defined: Min/Max
44 Results: Maximum Operating Rate
Synchronous interfaces: MegaHertz. Asynchronous interfaces: MegaOps/sec.

Design            4-place        8-place        16-place
                  Put    Get     Put    Get     Put    Get
Mixed-Clock       565    549     544    523     505    484
Async-Sync        421    549     379    523     357    484
Mixed-Clock RS    580    539     550    517     509    475
Async-Sync RS     421    539     379    517     357    475

Put vs. Get rates:
- sync put faster than sync get
- async put slower than sync get
45 Conclusions
- Introduced several new low-latency interface circuits
- Address 2 major issues in SoC design:
  - Mixed-timing domains
    - mixed-clock FIFO
    - async-sync FIFO
  - Long interconnect delays
    - mixed-clock relay station
    - async-sync relay station
- Other designs implemented and simulated:
  - Sync-Async FIFO and Relay Station
  - Async-Async FIFO and Relay Station
- Reusable components: mix and match to build circuits
- Provide useful set of interface circuits for SoC design