A Power Aware System Level Interconnect Design Methodology for Latency-Insensitive Systems

Provided by: vikasc

Transcript and Presenter's Notes


1
A Power Aware System Level Interconnect Design
Methodology for Latency-Insensitive Systems
  • Vikas Chandra, Herman Schmit
  • Tabula Inc.
  • Santa Clara, CA
  • Anthony Xu, Larry Pileggi
  • Electrical and Computer Engineering, Carnegie Mellon
    University

2
Latency-Insensitive (LI) Design Methodology
  • Interconnect is a significant contributor to delay
    and power
  • Separates communication from computation
  • A system is a collection of computational
    processes
  • Data is exchanged using communication channels
  • Maintains the simplicity of a synchronous design
  • Depends only on the signal events and not on
    their exact timing
  • Does not suffer from interconnect delay problem

3
LI Design Methodology
  • Properties of an LI system
  • All the IP (computation) modules are synchronous
  • The signals can take multiple clock cycles in the
    channel
  • The IP modules are stallable
  • The channel can apply back-pressure

[Figure: LI system in which Modules 1 to 5 communicate through relay stations]
4
System-on-Chip Design Challenges
  • Delays dominated by interconnecting wires
  • RC delay of a metal wire is getting worse with
    shrinking process
  • Interconnect delay a larger fraction of clock
    cycle-time due to
  • increase in operating frequency
  • increase in die size
  • increase in average interconnect length
  • Multiple clock cycles needed by a signal to span
    the chip
  • wire pipelining

[Figure: Scaling. Global wires do not scale!]
5
System-on-Chip Design Challenges
  • Power consumption in the interconnect
  • A significant contributor to total on-chip power
    consumption
  • Reasons for increase in power consumption
  • Wire scaling increases wire capacitance
  • Inserted repeaters also burn power
  • Latch repeaters are expensive and power hungry
  • The cause of concern
  • Battery power for portable devices
  • Thermal reliability

6
Design Solutions for SoC challenges
  • Latency-insensitive (LI) design methodology
  • Methodology for coping with interconnect
    dominance
  • Alleviates wire uncertainty problems
  • Enables patient processes
  • A general case of GALS design methodology
  • Voltage and frequency Islands
  • IP cores have different voltage and clock
    frequencies
  • To reduce power consumption in the interconnect
  • Also helps in IP cores integration
  • Relay stations in an LI channel can be voltage
    and frequency scaled

7
Wire Optimization in a single clock system
  • A buffered wire in a design
  • Not suitable for
  • Stallable modules
  • Latency-insensitive designs
  • A wire in an LI channel

8
Background Terms
  • FIFO
  • First-In-First-Out queue
  • Synchronous FIFOs are also called elastic buffers
  • Not a shift register!
  • FIFO size
  • Maximum number of data elements in a FIFO
  • Channel
  • One or more FIFOs connected in series
  • Stage
  • Each of the FIFOs in the channel is called a stage
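
The terms above can be made concrete with a small model. This is only a sketch under assumptions: the class name `ElasticFifo` and the helper `step_channel` are hypothetical and do not come from the slides. It shows why an elastic buffer is not a shift register: when the next stage is full, data stays where it is instead of being shifted onward.

```python
from collections import deque


class ElasticFifo:
    """One channel stage: a bounded FIFO that signals when it must stall."""

    def __init__(self, size):
        if size < 1:
            raise ValueError("FIFO size must be at least 1")
        self.size = size      # FIFO size: maximum number of stored items
        self.items = deque()

    def can_push(self):
        return len(self.items) < self.size

    def can_pop(self):
        return len(self.items) > 0

    def push(self, item):
        if not self.can_push():
            raise OverflowError("stage is full; upstream must stall")
        self.items.append(item)

    def pop(self):
        if not self.can_pop():
            raise IndexError("stage is empty; downstream must stall")
        return self.items.popleft()


def step_channel(stages):
    """Advance a channel (FIFOs in series) by one clock tick."""
    # Walk from the consumer end so an item moves at most one stage per tick.
    for i in range(len(stages) - 1, 0, -1):
        if stages[i - 1].can_pop() and stages[i].can_push():
            stages[i].push(stages[i - 1].pop())


# A two-stage channel with FIFO sizes 2 and 1.
channel = [ElasticFifo(2), ElasticFifo(1)]
channel[0].push("a")
channel[0].push("b")
step_channel(channel)   # "a" advances to the second stage
step_channel(channel)   # second stage is full, so "b" waits (back-pressure)
```

After the second tick "b" is still held in the first stage, which is the back-pressure behaviour an LI channel relies on.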

9
System Parameters
  • λ: average data production rate
  • λ = 0.5 means there is a 50% probability that
    a new data item is generated in each clock cycle
  • μ: average data consumption rate
  • Throughput requirement
  • Number of data items read per time unit
  • Performance metric of the channel
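
As an illustration of the production-rate parameter, the sketch below treats each producer clock cycle as an independent coin flip that yields a data item with the given probability. The function name and the fixed seed are assumptions made for reproducibility, not part of the presentation.

```python
import random


def observed_production_rate(rate, cycles, seed=0):
    """Fraction of cycles in which a producer with the given rate emits data."""
    rng = random.Random(seed)  # seeded so that repeated runs agree
    produced = sum(1 for _ in range(cycles) if rng.random() < rate)
    return produced / cycles


# With a production rate of 0.5, about half of the cycles produce an item.
estimate = observed_production_rate(0.5, 100_000)
```

The same model applies to the consumer, with the consumption rate in place of the production rate.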

10
Power Model
  • The dynamic power consumption in CMOS
  • Frequency and voltage are also related
  • Vt: threshold voltage
  • γ depends on carrier velocity saturation
  • γ ranges from 1 to 2
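
The equations on this slide were not preserved in the transcript. The sketch below uses the standard textbook forms instead, which may differ in detail from what the slide showed: dynamic power proportional to switched capacitance, supply voltage squared and frequency, and the alpha-power law relating achievable frequency to supply voltage. The activity factor and the constant `k` are assumptions.

```python
def dynamic_power(c_switched, vdd, freq, activity=1.0):
    """Textbook CMOS dynamic power: activity * C * Vdd^2 * f."""
    return activity * c_switched * vdd ** 2 * freq


def max_frequency(vdd, vt, gamma, k=1.0):
    """Alpha-power law: f is proportional to (Vdd - Vt)^gamma / Vdd."""
    if vdd <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return k * (vdd - vt) ** gamma / vdd


# Halving both voltage and frequency cuts dynamic power by a factor of 8.
p_full = dynamic_power(1e-12, 1.0, 1e9)
p_scaled = dynamic_power(1e-12, 0.5, 0.5e9)
```

This cubic-like dependence is why running a channel at a lower clock rate, and therefore a lower voltage, can save substantial power.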

11
FIFO Power Model
  • The dynamic power consumption increases as FIFO
    size increases
  • Switching capacitance increases with FIFO size

12
Interconnect channel synthesis
  • Design variables
  • α
  • n_i: size of the FIFO in the i-th stage
  • Data production rate: λf/p
  • Data consumption rate: μf/q

13
Bounds on channel frequency scaling
  • Lower bound on α
  • In the best case, the data transfer rate through the
    channel equals the faster of the producer and
    consumer clock rates
  • Channel clock should not be any faster than the
    fastest data rate
  • Upper bound on α
  • Derived from the channel's data rate
  • Channel clock should not be any slower than the
    slowest data rate

14
Power aware channel synthesis
  • Power consumption in a FIFO
  • Synthesis cost function
  • channel power consumption expression

15
Power aware channel synthesis
  • Problem statement
  • Given
  • number of channel stages (n)
  • frequencies of producer and consumer (p, q)
  • data production and consumption rates (λ, μ)
  • To find
  • sizes of the FIFOs in the LI channel
  • voltage and clock frequency of the channel
  • Such that
  • the channel meets the given throughput
    requirement
  • the power consumption is minimized

16
FIFO sizing algorithm
  • Given
  • p: frequency of the producer
  • q: frequency of the consumer
  • α: frequency of the channel
  • λ: average data production rate
  • μ: average data consumption rate
  • Throughput requirement
  • To find: FIFO sizes (n_1, ..., n_N) giving the
    minimum-power solution
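
The sizing problem can be stated as a brute-force baseline: try every combination of FIFO sizes, keep those that meet the throughput requirement, and return the cheapest. This is not the presentation's FIFO_size_search algorithm, whose steps are not given in the transcript. The function name, the `throughput` and `cost` callables, and the toy models in the usage lines are all hypothetical stand-ins.

```python
from itertools import product


def brute_force_sizing(n_stages, max_size, throughput, cost, target):
    """Return (sizes, cost) of the cheapest sizing meeting the target, or None."""
    best = None
    for sizes in product(range(1, max_size + 1), repeat=n_stages):
        if throughput(sizes) < target:
            continue  # this sizing misses the throughput requirement
        c = cost(sizes)
        if best is None or c < best[1]:
            best = (sizes, c)
    return best


# Toy stand-ins: throughput limited by the smallest stage, cost grows with size.
toy_throughput = lambda sizes: min(sizes) / 5
toy_cost = lambda sizes: sum(sizes)
result = brute_force_sizing(3, 5, toy_throughput, toy_cost, target=0.5)
```

The number of candidate sizings grows exponentially with the number of stages, which is why a guided search becomes attractive for longer channels.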

17
FIFO sizing algorithm
  • The search straddles the target performance
    frontier

[Figure: power-optimal FIFO sizing, showing starting and final FIFO sizes across the FIFO stages]
  • The algorithm selects the lowest cost sizing
  • Based on synthesis cost function

18
Channel frequency search algorithm
  • Given
  • p: frequency of the producer
  • q: frequency of the consumer
  • λ: average data production rate
  • μ: average data consumption rate
  • Throughput requirement
  • To find: power-optimal value of α

19
Channel frequency search algorithm
[Figure: search range from α_min to α_max]
  • Exploring each value of α is expensive
  • especially if λ, μ are small or p, q are large
  • Binary search employed for finding optimal α
  • FIFO sizing is done for each α using
    FIFO_size_search
  • Cost calculated for each α to find an optimal
    solution
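
The transcript does not say how the binary search decides which half to keep. One common approach for a cost that has a single minimum is to compare the cost at the midpoint with the cost at its right-hand neighbour. The sketch below assumes integer values of the channel parameter and a strictly U-shaped or monotone cost; `search_alpha` and the toy cost functions are hypothetical. In the flow described above, each cost evaluation would itself involve a FIFO sizing run.

```python
def search_alpha(alpha_min, alpha_max, cost):
    """Binary search for the minimum of a single-minimum cost over integers."""
    lo, hi = alpha_min, alpha_max
    while lo < hi:
        mid = (lo + hi) // 2
        if cost(mid) <= cost(mid + 1):
            hi = mid       # cost is rising: the minimum is at mid or to its left
        else:
            lo = mid + 1   # cost is still falling: the minimum is to the right
    return lo, cost(lo)


# Toy U-shaped cost: one term falls as the clock slows, one grows with buffering.
u_shaped = lambda a: 100 / a + 3 * a
best_alpha, best_cost = search_alpha(1, 10, u_shaped)

# Toy monotonically decreasing cost: the search ends at the upper bound.
falling = lambda a: 100 / a
edge_alpha, _ = search_alpha(1, 10, falling)
```

The search needs a number of cost evaluations that grows with the logarithm of the range, rather than one per candidate value.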

20
Channel synthesis analysis
  • Two kinds of channel analyzed
  • Balanced channel
  • Skewed channel
  • Balanced channel
  • Data production rate (λf/p) is similar to data
    consumption rate (μf/q)
  • Skewed channel
  • Data production rate (λf/p) and data consumption
    rate (μf/q) are skewed
  • Shape of cost function varies significantly

21
Balanced 3 stage channel
  • Data production rate ≈ data consumption rate,
    on average

Examples:
λ = 0.1, p = 1; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1)
λ = 0.8, p = 8; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1)
α_max = max(p/λ, q/μ)
α_min = min(p, q)
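
The two bounds can be evaluated directly for the balanced examples on this slide. The function name `alpha_bounds` is an assumption; the formulas are the ones stated above.

```python
def alpha_bounds(lam, p, mu, q):
    """Return (alpha_min, alpha_max) for a channel, per the slide's formulas."""
    alpha_min = min(p, q)
    alpha_max = max(p / lam, q / mu)
    return alpha_min, alpha_max


# The two balanced example channels from this slide.
first = alpha_bounds(lam=0.1, p=1, mu=0.1, q=1)
second = alpha_bounds(lam=0.8, p=8, mu=0.1, q=1)
```

Both examples give the same search range, from 1 to 10, even though the producer parameters differ.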
22
Balanced 4 and 5 stage channel
  • The cost function is U-shaped with respect to α
  • α ↑ ⇒ cost ↓
  • n_i ↑ ⇒ cost ↑

23
Skewed 3 stage channel
  • Data production rate << or >> data consumption rate

Examples:
λ = 0.5, p = 1; μ = 0.5, q = 8 (λ/p = 0.5, μ/q = 0.0625): congested
λ = 0.5, p = 8; μ = 0.5, q = 1 (λ/p = 0.0625, μ/q = 0.5): starved
α_max = max(p/λ, q/μ)
α_min = min(p, q)
24
Skewed 4 and 5 stage channel
  • The cost function decreases with α
  • Optimal to have the channel frequency close to
    α_max

25
Optimality analysis for a 5 stage channel
  • The search schemes find near-optimal results
  • balanced channel
  • skewed channel
  • Search approach is more efficient than the
    brute-force simulation approach
  • Efficiency increases with number of channel stages

26
Experiments
  • Considered 3 stage, 4 stage and 5 stage channels
  • Given p, q, λ, μ
  • Goal: to synthesize the channel
  • in terms of channel frequency (voltage) and FIFO
    sizes
  • 4 types of channels analyzed
  • λ = 0.8, p = 8; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1):
    balanced
  • λ = 0.5, p = 8; μ = 0.5, q = 1 (λ/p = 0.0625, μ/q = 0.5):
    starved
  • λ = 0.5, p = 1; μ = 0.5, q = 8 (λ/p = 0.5, μ/q = 0.0625):
    congested
  • λ = 0.1, p = 1; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1):
    balanced

27
4 and 5 stage channels
[Figure: position of α_sim between α_min and α_max]
  • For power efficient balanced channels
  • α_sim lies approximately in the middle of α_min and
    α_max
  • For power efficient skewed channels
  • α_sim lies closer to α_max

28
Results Summary
  • Developed two algorithms to synthesize an LI
    channel
  • FIFO_size_search() algorithm finds out the FIFO
    sizes
  • given a channel frequency
  • Channel_frequency_search() algorithm searches for
    channel clock period
  • binary search on clock period search space
  • Channel synthesis results are near optimal
  • verified by an exhaustive brute-force
    simulation approach
  • Choice of α_sim as compared to α_min results in
  • 77.7%, 83.6% and 87% power savings for a 3, 4 and
    5 stage channel respectively