Demystifying Data-Driven and Pausible Clocking Schemes

1
Demystifying Data-Driven and Pausible Clocking Schemes
  • Robert Mullins
  • Computer Architecture Group
  • Computer Laboratory, University of Cambridge
  • ASYNC 2007, 13th IEEE International Symposium on
    Asynchronous Circuits and Systems

2
System Timing: Emerging Challenges
  • The current shift is from complex monolithic designs
    to networks of energy-efficient cores
  • Distinct block-level and system-level timing challenges
  • Network-level timing:
    • Physically distributed
    • Activity may be sparse
    • Interconnect delay and power are significant
  • Significant variations in temperature, supply
    voltage and process parameters

Higher-level control, timing and scheduling are
naturally event-driven
3
Combining Local and Global Approaches to Timing
  • Synchronization-free approaches
  • Coping with metastability:
    • Timing-safe: allocate a fixed period of time for
      metastability to resolve, e.g. a two-flip-flop
      synchronizer
    • Value-safe: wait for metastability to resolve, e.g.
      clock-stretching or clock-pausing techniques; the
      clock is generated locally
  • Value-safe ideas are less well understood and are
    avoided by industry
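To make the timing-safe trade-off concrete, the sketch below evaluates the widely used first-order synchronizer model, MTBF = exp(t_r / τ) / (T_w × f_clk × f_data), where t_r is the time allowed for metastability to resolve, τ is the flip-flop's resolution time constant and T_w is its capture window. The function name and every parameter value are illustrative assumptions, not figures from the talk.

```python
import math

def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order MTBF model (seconds) for a timing-safe synchronizer.

    t_resolve: time allowed for metastability to resolve (s)
    tau:       flip-flop resolution time constant (s)
    t_window:  metastability capture window (s)
    f_clk:     sampling clock frequency (Hz)
    f_data:    rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)

# Illustrative (assumed) parameters: 1 GHz clock, 100 MHz data rate,
# tau = T_w = 20 ps; compare one and two nanoseconds of resolution time.
params = dict(tau=20e-12, t_window=20e-12, f_clk=1e9, f_data=100e6)
one_cycle = synchronizer_mtbf(1e-9, **params)
two_cycles = synchronizer_mtbf(2e-9, **params)
print(f"1 ns resolution time: MTBF ~ {one_cycle:.2e} s")
print(f"2 ns resolution time: MTBF ~ {two_cycles:.2e} s")
```

With these numbers, allowing an extra nanosecond multiplies the MTBF by exp(50), but it never becomes infinite. A value-safe scheme instead waits until metastability has resolved, so there is no MTBF to trade against latency.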

4
Advantages of a value-safe approach
  • Efficiency
    • Synchronization delay is minimized
    • Opportunities for optimization
  • Robustness
    • Inherently robust, with no trade-off against
      performance
    • The only way to guarantee that data is never lost
      (there is no MTBF). Functional failures are still
      possible if we are delayed for so long that
      performance requirements are not met
  • Transparency
    • The synchronous block is unaffected by the clocking
      wrapper
    • This is less true of traditional synchronization and
      clock-gating approaches
  • Simplicity and modularity
    • I aim to illustrate how simple these schemes are

5-8
Adding an asynchronous interface to a clock generator
9
Input register driven by a pausible clock
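As a rough behavioural sketch of a pausible clock driving an input register, assume a mutual-exclusion element (mutex) arbitrates between the clock generator's request for its next rising edge and an asynchronous input request: the clock holds the mutex while it is high, and a granted input request holds it while the input register latches data, which pauses the clock. The function `pausible_clock`, its parameters and the tie-breaking rule are hypothetical, and the time a real mutex takes to resolve metastability is not modelled.

```python
from typing import List, Tuple

def pausible_clock(requests: List[float], high: float, low: float,
                   latch_time: float, n_edges: int) -> Tuple[List[float], List[float]]:
    """Behavioural model of a pausible clock with one asynchronous input.

    Returns (rising-edge times, input grant times). Hypothetical model:
    the clock owns the mutex while high; a granted input request holds it
    for `latch_time` while the input register captures data.
    """
    pending = sorted(requests)
    edges: List[float] = []
    grants: List[float] = []
    i = 0
    fall = 0.0  # the clock starts low at t = 0
    while len(edges) < n_edges:
        due = fall + low      # when the oscillator wants the next rising edge
        mutex_free = fall     # the clock releases the mutex on its falling edge
        # Grant waiting requests that reach a free mutex before the clock asks.
        # A tie with `due` is where a real mutex may go metastable; here the
        # waiting clock simply wins and resolution time is ignored.
        while i < len(pending) and max(pending[i], mutex_free) < due:
            grant = max(pending[i], mutex_free)
            grants.append(grant)
            mutex_free = grant + latch_time
            i += 1
        rise = max(due, mutex_free)  # the edge is paused until the mutex is released
        edges.append(rise)
        fall = rise + high
    return edges, grants

edges, grants = pausible_clock([0.3, 4.9, 5.2], high=1.0, low=1.0,
                               latch_time=0.5, n_edges=4)
print("rising edges:", edges)
print("input grants:", grants)
```

With these assumed numbers, the request at t=4.9 pauses the third rising edge from t=5.0 to t=5.4. The request at t=5.2 finds the mutex busy while the clock is already waiting, so (under the assumed rule that a waiting clock wins) it is granted at the next falling edge, t=6.4, and does not delay the fourth edge.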
10
Data-Driven Clock
  • May need to add a mechanism to ensure the block
    receives enough clock edges, e.g. to flush a pipeline
Pausible Clock
  • Need to add an explicit sleep mechanism if we want to
    halt the clock generator during periods of inactivity

This classification helps us understand existing
techniques. In reality, the design space is a
continuum.
11
Stretchable Clocks
  • A type of data-driven clock
  • A rising clock edge is generated
  • A stretch signal may be asserted (synchronously) in
    response to clk
  • The low phase of the clock is stretched until some
    operation has completed and the stretch signal is
    removed
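A minimal timing model of the stretchable clock described above, assuming fixed nominal high and low phases and at most one stretch per cycle. `release_after` is a hypothetical input giving, for each cycle, the time (measured from that cycle's rising edge) at which the operation completes and the stretch signal is removed.

```python
from typing import List, Optional, Tuple

def stretchable_clock(release_after: List[Optional[float]], high: float,
                      low: float) -> List[Tuple[float, float]]:
    """Return (rise, fall) times for one clock cycle per entry of `release_after`.

    release_after[k] is None if no stretch is requested in cycle k; otherwise
    it is when the stretch is removed, relative to cycle k's rising edge.
    """
    cycles = []
    rise = 0.0
    for release in release_after:
        fall = rise + high        # the high phase is never stretched
        next_rise = fall + low    # nominal low phase
        if release is not None:
            # Stretch was asserted synchronously after this rising edge; the
            # low phase is extended until the stretch signal is removed.
            next_rise = max(next_rise, rise + release)
        cycles.append((rise, fall))
        rise = next_rise
    return cycles

for rise, fall in stretchable_clock([None, 5.0, 1.5, None], high=1.0, low=1.0):
    print(f"rise={rise:4.1f}  fall={fall:4.1f}")
```

In the second cycle the 5-unit operation lengthens the low phase from 1 to 4 units; the 1.5-unit operation in the third cycle completes within the nominal period and has no effect. Only the low phase ever changes.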

12-16
Stretchable Clocks
17
Input Ports
  • Arbitrated inputs
    • At most one input can be served per cycle
  • Synchronised inputs
    • Cannot proceed until multiple inputs are ready
  • Sampled inputs
    • Can progress with a variable number of data
      inputs (or none)
    • We also need to choose the event that triggers
      sampling of the inputs
  • The paper provides implementation details for each
    input port type, for both pausible and data-driven
    clock generators
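The three input-port types can be contrasted with a behavioural sketch (not the paper's circuits) that decides, for one cycle, which inputs are served given the time data arrived on each port. All names are hypothetical; `sample_time` stands in for the sampling event the designer must choose for sampled inputs.

```python
from typing import List, Optional, Tuple

# Arrival time of data on each input port; None means no data this cycle.
Arrivals = List[Optional[float]]

def arbitrated(arrivals: Arrivals) -> Optional[Tuple[int, float]]:
    """At most one input is served per cycle: the earliest arrival wins.

    Exact ties are broken by index here; a hardware mutex would resolve
    them arbitrarily (and value-safely).
    """
    present = [(t, i) for i, t in enumerate(arrivals) if t is not None]
    if not present:
        return None
    t, i = min(present)
    return i, t

def synchronised(arrivals: Arrivals) -> Optional[Tuple[List[int], float]]:
    """Cannot proceed until every input is ready; fires at the last arrival."""
    if not arrivals or any(t is None for t in arrivals):
        return None
    times = [t for t in arrivals if t is not None]
    return list(range(len(arrivals))), max(times)

def sampled(arrivals: Arrivals, sample_time: float) -> List[int]:
    """Admit whichever inputs have arrived by `sample_time` (possibly none)."""
    return [i for i, t in enumerate(arrivals)
            if t is not None and t <= sample_time]

arrivals = [3.0, None, 1.5]
print(arbitrated(arrivals))           # (2, 1.5)
print(synchronised(arrivals))         # None
print(sampled(arrivals, 2.0))         # [2]
print(sampled(arrivals, 0.5))         # []
print(synchronised([3.0, 2.0, 1.5]))  # ([0, 1, 2], 3.0)
```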

18
Output Ports
  • Scheduled
    • Ensure data is output on a particular clock
      cycle; stall until the data is consumed
  • Registered
    • Adding an output register allows the next
      computation to proceed while the data is consumed
  • Polled
    • Sample the output port's ready signal and take
      appropriate action. The clock period is only ever
      extended to allow metastability to resolve, not
      because the output is blocked
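The practical difference between the three output-port types can be sketched with a simple timing model. Assume the block produces one result per nominal clock period and the consumer needs a fixed time to accept each item; each function returns when the last of `n` items has been consumed. The names, the fixed consumer latency and the treatment of a blocked polled port (hold the data and retry at the next edge) are all assumptions, and synchronizing the ready signal itself is not modelled.

```python
def scheduled(n: int, period: float, consume: float) -> float:
    """No output register: after computing each result, the clock is stalled
    until the consumer has accepted it, so nothing overlaps."""
    return n * (period + consume)

def registered(n: int, period: float, consume: float) -> float:
    """Output register: the next result is computed while the previous one is
    consumed; the clock stalls only if the register is still full."""
    t = 0.0         # local time
    reg_free = 0.0  # time at which the output register is next empty
    for _ in range(n):
        t += period             # compute the next result
        t = max(t, reg_free)    # stretch the clock while the register is full
        reg_free = t + consume  # consumer drains the register
    return reg_free

def polled(n: int, period: float, consume: float) -> float:
    """Fixed clock period: the ready signal is sampled at each rising edge and
    data is sent only when the port is ready; a blocked output never
    stretches the clock."""
    ready_at = 0.0
    edge = 0
    sent = 0
    while sent < n:
        edge += 1
        t = edge * period
        if t >= ready_at:       # port sampled as ready at this edge
            ready_at = t + consume
            sent += 1
        # otherwise the block holds its data and does other useful work
    return ready_at

for policy in (scheduled, registered, polled):
    print(f"{policy.__name__:>10}: last item consumed at t = {policy(4, 1.0, 2.5)}")
```

With these assumed numbers the registered port finishes first because computation overlaps consumption, while the polled port keeps a fixed clock period throughout and never stretches the clock while waiting for the consumer.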

19
A GALS Wrapper Example
  • Free-running clock
  • Asynchronous input
    • We know nothing about when data will arrive
    • For simplicity, let's assume we can always accept
      new data
  • Registered output feeding an asynchronous FIFO

It is simple to combine the clock generator, input
ports and output ports
20
A GALS Wrapper Example, Step 1
Local clock generator with a handshake (H/S) interface
21
A GALS Wrapper Example, Step 2
Pausible clock template
22
A GALS Wrapper Example, Step 3
Provide registered output port support
(stretchable clock template)
23
A GALS Wrapper Example, Step 4
24
Data-Driven Clocking for On-Chip Networks
  • Why is global synchrony limiting for on-chip
    networks?
    • Reconfigurable networks, adaptive low-voltage
      interconnect drivers, irregular topologies, etc.
  • Problems with traditional synchronization
    techniques
    • Latency: could easily double the best-case latency
      (our routers are single-cycle, support virtual
      channels (VCs) and have a 30 FO4 cycle time)
  • Problems with fully-asynchronous implementations
    • Latency (for the router designs we have examined)
    • More difficult to speculate? Scheduling is
      expensive?

25
Data-Driven Clocking for On-Chip Routers
  • A router should be clocked when one or more inputs
    are valid (or flits are buffered)
  • Elevator analogy
    • Free-running (paternoster) elevator
      • A chain of open compartments
      • You must synchronise before you jump on!
    • Traditional elevator (data-driven clock)
      • Wait for someone to arrive
      • Close the doors, decide who is in and who is out
      • Metastability issue again (potentially painful!)

26
Data-Driven Clock with Sampled Inputs
[Figure: incoming data feeding the local clock generator template]
  • Sample inputs when at least one input is ready
    (and the clock is low)
  • Assert Lock (close the lift doors)
  • Incoming data is either admitted or locked out
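A behavioural sketch of the data-driven clock with sampled inputs, following the lift-door analogy: once at least one input is ready and the clock is low, Lock is asserted; inputs that arrived before the doors close are admitted on the next rising edge, and later ones are locked out until the following cycle. The function name, the fixed `lock_delay`, the minimum high/low phases and the choice to place the rising edge at the lock instant are assumptions for illustration.

```python
from typing import List, Tuple

def data_driven_sampled(arrivals: List[Tuple[float, str]], lock_delay: float,
                        high: float, low: float) -> List[Tuple[float, List[str]]]:
    """Behavioural model of a data-driven clock with sampled inputs.

    arrivals: (time, input name) for each incoming flit.
    Returns one (rising-edge time, admitted inputs) entry per generated cycle.
    """
    waiting = sorted(arrivals)
    cycles = []
    doors_open = 0.0  # the clock is low and inputs may be sampled from here on
    while waiting:
        # Sample when at least one input is ready and the clock is low.
        trigger = max(waiting[0][0], doors_open)
        # Assert Lock ("close the lift doors"); it takes effect lock_delay later.
        lock = trigger + lock_delay
        # Earlier arrivals are admitted; later ones are locked out until the
        # next cycle. An arrival exactly at the lock instant is where a
        # per-input mutex would resolve the race; here it is locked out.
        admitted = [name for t, name in waiting if t < lock]
        waiting = [(t, name) for t, name in waiting if t >= lock]
        rise = lock                      # rising edge once the sample is fixed
        cycles.append((rise, admitted))
        doors_open = rise + high + low   # minimum high and low phases
    return cycles

trace = [(0.0, "N"), (0.6, "E"), (6.0, "S")]
for rise, admitted in data_driven_sampled(trace, lock_delay=0.5, high=1.0, low=1.0):
    print(f"edge at t={rise}: admitted {admitted}")
```

Input E arrives just after the doors close and is admitted on the next cycle instead; between t=3.0 and t=6.5 there is no data, so no clock edges are generated. A router, as described above, would also keep requesting edges while flits remain buffered, which this sketch leaves out.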
27
Clock Tree Insertion Delays
  • The delay from the root to the leaves of the clock
    tree can be considerable (certainly non-zero!)
  • If every clock cycle is the same length, clock
    insertion delay is not normally an issue
  • If we stretch the clock, the insertion delay must
    be considered in our timing analysis (this is also
    true for clock gating in the synchronous world)
  • Not difficult to handle, but it can increase the
    time required to admit new data

28-29
Clock Tree Insertion Delays
  • How do we handle multi-cycle insertion delays?
  • In practice, we would want to avoid very large
    synchronous blocks
  • We need to ensure that data is admitted on the
    correct clock cycle
  • We cannot cheat and promote data to an earlier cycle!

We simply remember on which clock cycle the data has
been scheduled to be admitted
30
Summary
  • Value-safe techniques are simple and robust
  • Powerful framework for composing synchronous
    sub-systems
  • Build efficient event-driven global communication
    and scheduling infrastructure?
  • Scope for supporting low-power techniques?
    (self-timed power-gating, DVFS support,
    timing-speculation)
  • Scope for exploiting event-driven scheduling and
    clocking at the system level
  • Synchronization costs are low enough to prompt
    their use in on-chip network applications
  • More in the paper, which aims to be a useful survey
    and hopefully fills some gaps too

31
Thank You!