Title: A Power Aware System Level Interconnect Design Methodology for Latency-Insensitive Systems
1. A Power Aware System Level Interconnect Design Methodology for Latency-Insensitive Systems
- Vikas Chandra, Herman Schmit
- Tabula Inc.
- Santa Clara, CA
- Anthony Xu, Larry Pileggi
- Electrical & Computer Engineering, Carnegie Mellon University
2. Latency-Insensitive (LI) Design Methodology
- Interconnect is a significant contributor to delay and power
- Separates communication from computation
- A system is a collection of computational processes
- Data is exchanged using communication channels
- Maintains the simplicity of a synchronous design
- Depends only on the signal events and not on their exact timing
- Does not suffer from the interconnect delay problem
3. LI Design Methodology
- Properties of an LI system
- All the IP (computation) modules are synchronous
- The signals can take multiple clock cycles in the channel
- The IP modules are stallable
- The channel can apply back-pressure
[Figure: LI system in which Modules 1 to 5 communicate through channels containing relay stations]
4. System-on-Chip Design Challenges
- Delays dominated by interconnecting wires
- RC delay of a metal wire is getting worse with shrinking process
- Interconnect delay is a larger fraction of clock cycle time due to
- increase in operating frequency
- increase in die size
- increase in average interconnect length
- Multiple clock cycles needed by a signal to span the chip
- wire pipelining
[Figure: wire scaling. Global wires do not scale!]
5. System-on-Chip Design Challenges
- Power consumption in the interconnect
- A significant contributor to total on-chip power consumption
- Reasons for increase in power consumption
- Wire scaling increases wire capacitance
- Inserted repeaters also burn power
- Latch repeaters are expensive and power hungry
- The cause of concern
- Battery power for portable devices
- Thermal reliability
6. Design Solutions for SoC challenges
- Latency-insensitive (LI) design methodology
- Methodology for coping with interconnect dominance
- Alleviates wire uncertainty problems
- Enables patient processes
- A general case of GALS design methodology
- Voltage and frequency islands
- IP cores have different voltage and clock frequencies
- To reduce power consumption in the interconnect
- Also helps in IP core integration
- Relay stations in an LI channel can be voltage and frequency scaled
7. Wire Optimization in a single clock system
- A buffered wire in a design
- Not suitable for
- Stallable modules
- Latency-insensitive designs
8. Background Terms
- FIFO
- First-In-First-Out queue
- Synchronous FIFOs are also called elastic buffers
- Not a shift register!
- FIFO size
- Maximum number of data elements in a FIFO
- Channel
- One or more FIFOs connected in series
- Stage
- Each of the FIFOs in the channel is called a stage
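The terms above can be illustrated with a minimal Python sketch. The class names `Fifo` and `Channel` and the one-item-per-boundary `step` rule are assumptions made for illustration; the slides do not specify an implementation.

```python
from collections import deque


class Fifo:
    """Bounded first-in-first-out queue (one stage of a channel)."""

    def __init__(self, size):
        self.size = size          # FIFO size: maximum number of data elements
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.size

    def is_empty(self):
        return not self.items

    def push(self, item):
        """Accept an item; return False (back-pressure) when the FIFO is full."""
        if self.is_full():
            return False
        self.items.append(item)
        return True

    def pop(self):
        return self.items.popleft()


class Channel:
    """One or more FIFO stages connected in series."""

    def __init__(self, sizes):
        self.stages = [Fifo(n) for n in sizes]

    def write(self, item):
        """Producer side: returns False when the first stage is full."""
        return self.stages[0].push(item)

    def read(self):
        """Consumer side: returns None when the last stage is empty."""
        last = self.stages[-1]
        return None if last.is_empty() else last.pop()

    def step(self):
        """Advance one channel clock cycle (assumed rule, for illustration).

        Stage boundaries are visited from the output end, so an item
        moves forward by at most one stage per call.
        """
        for i in range(len(self.stages) - 1, 0, -1):
            src, dst = self.stages[i - 1], self.stages[i]
            if not src.is_empty() and not dst.is_full():
                dst.push(src.pop())
```

For example, `Channel([1, 2, 1])` models a three-stage channel; a second `write` before any `step` is refused because the first stage holds only one element.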
9. System Parameters
- λ: Average data production rate
- λ of 0.5 means that there is a 50% probability that a new data item is generated in each clock cycle
- μ: Average data consumption rate
- Throughput requirement
- Number of data items read per time unit
- Performance metric of the channel
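To make λ, μ and throughput concrete, here is a minimal simulation sketch of a single FIFO between a producer and a consumer. It assumes that p and q are integer clock periods counted in base ticks and that a stalled producer does not generate data in that cycle. The function name `simulate_fifo` and the parameters `ticks` and `seed` are illustrative and do not come from the slides.

```python
import random


def simulate_fifo(lam, mu, p, q, size, ticks, seed=0):
    """Return measured throughput: data items read per base tick.

    lam, mu : per-clock-cycle probabilities of producing / consuming
    p, q    : producer / consumer clock periods in base ticks (assumption)
    size    : FIFO size (maximum number of stored data items)
    """
    rng = random.Random(seed)
    occupancy = 0
    reads = 0
    for t in range(ticks):
        # Consumer clock edge: read one item with probability mu.
        if t % q == 0 and occupancy > 0 and rng.random() < mu:
            occupancy -= 1
            reads += 1
        # Producer clock edge: write one item with probability lam,
        # unless the FIFO is full (back-pressure stalls the producer).
        if t % p == 0 and occupancy < size and rng.random() < lam:
            occupancy += 1
    return reads / ticks
```

With λ = μ = 1, p = 2 and q = 1, the measured throughput is limited by the producer to one item every two ticks, which matches the data production rate λ/p.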
10. Power Model
- The dynamic power consumption in CMOS
- Frequency and voltage are also related
- V_t: Threshold voltage
- γ depends on carrier velocity saturation
- ranges from 1 to 2
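The equations on this slide did not survive in the transcript. As a hedged reconstruction, the bullets match the standard textbook forms of CMOS dynamic power and the alpha-power-law speed model. The symbols C_eff (effective switched capacitance), V_dd (supply voltage) and f (clock frequency) are labels assumed here, not taken from the slide:

```latex
P_{\mathrm{dyn}} = C_{\mathrm{eff}} \, V_{dd}^{2} \, f,
\qquad
f \propto \frac{(V_{dd} - V_t)^{\gamma}}{V_{dd}},
\qquad 1 \le \gamma \le 2
```

Under this model, lowering the clock frequency permits a lower supply voltage, so power falls faster than linearly with frequency.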
11. FIFO Power Model
- The dynamic power consumption increases as FIFO size increases
- Switching capacitance increases with FIFO size
12. Interconnect channel synthesis
- Design variables
- α
- n_i: size of the FIFO in the i-th stage
- Data production rate: λf/p
- Data consumption rate: μf/q
13. Bounds on channel frequency scaling
- Lower bound on α
- In the best case, data transfer through the channel will equal the faster of the clock rates of the producer and the consumer
- Channel clock should not be any faster than the fastest data rate
- Upper bound on α
- Derived from the channel's data rate
- Channel clock should not be any slower than the slowest data rate
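The two bounds can be computed directly from the expressions the deck gives later, α_min = min(p, q) and α_max = max(p/λ, q/μ). The sketch below assumes that p, q and α behave as clock periods (dividers of a base frequency), which is what those expressions suggest. The function name `alpha_bounds` is illustrative.

```python
def alpha_bounds(p, q, lam, mu):
    """Return (alpha_min, alpha_max) for the channel clock.

    alpha_min: the channel need not be clocked faster than the faster
               of the producer and consumer clocks.
    alpha_max: the channel must not be slower than the slowest data
               rate, i.e. the longest average interval between items.
    """
    alpha_min = min(p, q)
    alpha_max = max(p / lam, q / mu)
    return alpha_min, alpha_max
```

For the balanced example λ = 0.8, p = 8, μ = 0.1, q = 1 this gives a search range of roughly 1 to 10. For the congested example λ = 0.5, p = 1, μ = 0.5, q = 8 the range is 1 to 16.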
14. Power aware channel synthesis
- Power consumption in a FIFO
- Synthesis cost function
- channel power consumption expression
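The cost expressions on this slide were not preserved. The sketch below is therefore a hypothetical cost model, not the authors' function. It assumes a per-stage switched capacitance that grows linearly with FIFO size (c0 + c1·n_i), a supply voltage chosen as the lowest one that sustains the channel frequency under the (V − V_t)^γ / V speed model, and α acting as a clock period. All constants (vt, gamma, v_max, c0, c1) are illustrative.

```python
def voltage_for_frequency(f_norm, vt=0.3, gamma=1.5, v_max=1.0):
    """Lowest supply voltage whose speed reaches f_norm (0 < f_norm <= 1).

    Speed model (assumed): speed(V) = (V - vt)**gamma / V, normalized so
    that speed(v_max) corresponds to f_norm == 1. Solved by bisection,
    which is valid because speed(V) increases with V for V > vt.
    """
    def speed(v):
        return (v - vt) ** gamma / v

    target = f_norm * speed(v_max)
    lo, hi = vt, v_max
    for _ in range(60):
        mid = (lo + hi) / 2
        if speed(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


def channel_power(alpha, sizes, alpha_min=1.0, c0=1.0, c1=0.5):
    """Hypothetical channel cost: sum over stages of C(n_i) * V^2 * f."""
    f_norm = alpha_min / alpha      # frequency relative to the fastest setting
    v = voltage_for_frequency(f_norm)
    return sum((c0 + c1 * n) * v * v * f_norm for n in sizes)
```

In this model the cost falls as α grows (slower, lower-voltage channel) and rises as any FIFO grows, which are the two opposing trends a channel synthesis has to balance.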
15. Power aware channel synthesis
- Problem statement
- Given
- number of channel stages (n)
- frequencies of producer and consumer (p, q)
- data production and consumption rates (λ, μ)
- To find
- sizes of the FIFOs in the LI channel
- voltage and clock frequency of the channel
- Such that
- the channel meets the given throughput requirement
- the power consumption is minimized
16. FIFO sizing algorithm
- Given
- p: frequency of the producer
- q: frequency of the consumer
- α: frequency of the channel
- λ: average data production rate
- μ: average data consumption rate
- Throughput requirement
- To find: FIFO sizes (n_1, ..., n_N) of the minimum power solution
17. FIFO sizing algorithm
- The search straddles the target performance frontier
[Figure: power-optimal FIFO sizing, showing the starting and final FIFO size for each FIFO stage]
- The algorithm selects the lowest cost sizing
- Based on synthesis cost function
18. Channel frequency search algorithm
- Given
- p: frequency of the producer
- q: frequency of the consumer
- λ: average data production rate
- μ: average data consumption rate
- Throughput requirement
- To find: power optimal value of α
19. Channel frequency search algorithm
[Figure: search range of α from α_min to α_max]
- Exploring each value of α is expensive
- especially if λ, μ are small or p, q are large
- Binary search employed for finding the optimal α
- FIFO sizing is done for each α using FIFO_size_search
- Cost calculated for each α to find an optimal solution
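The slides do not give the details of the binary search. One common way to binary-search for the minimum of a cost that first falls and then rises is sketched below. It assumes integer values of α and a strictly unimodal cost; the `cost` callable stands in for "size the FIFOs at this α, then evaluate the synthesis cost function". The function name and signature are illustrative, not the authors' Channel_frequency_search().

```python
def channel_frequency_search(cost, alpha_min, alpha_max):
    """Binary search for the integer alpha that minimizes a unimodal cost.

    cost(alpha) is assumed to decrease up to the optimum and increase
    after it; a monotonically decreasing cost is handled as well.
    """
    lo, hi = alpha_min, alpha_max
    while lo < hi:
        mid = (lo + hi) // 2
        if cost(mid) <= cost(mid + 1):
            hi = mid          # cost is rising: minimum is at mid or to its left
        else:
            lo = mid + 1      # cost is still falling: minimum is to the right
    return lo
```

Each iteration evaluates the cost at two neighboring values of α, so the number of FIFO sizing runs grows with the logarithm of the search range instead of with its length.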
20. Channel synthesis analysis
- Two kinds of channel analyzed
- Balanced channel
- Skewed channel
- Balanced channel
- Data production rate (λf/p) is similar to data consumption rate (μf/q)
- Skewed channel
- Data production rate (λf/p) and data consumption rate (μf/q) are skewed
- Shape of cost function varies significantly
21. Balanced 3 stage channel
- Data production rate ≈ Data consumption rate
- on average
Example:
- λ = 0.1, p = 1; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1)
- λ = 0.8, p = 8; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1)
α_max = max(p/λ, q/μ)
α_min = min(p, q)
22. Balanced 4 and 5 stage channel
- The cost function is U shaped with respect to α
- α ↑, cost ↓
- n_i ↑, cost ↑
23. Skewed 3 stage channel
- Data production rate << or >> Data consumption rate
Example:
- λ = 0.5, p = 1; μ = 0.5, q = 8 (λ/p = 0.5, μ/q = 0.0625): congested
- λ = 0.5, p = 8; μ = 0.5, q = 1 (λ/p = 0.0625, μ/q = 0.5): starved
α_max = max(p/λ, q/μ)
α_min = min(p, q)
24. Skewed 4 and 5 stage channel
- The cost function decreases with α
- Optimal to have the channel frequency close to α_max
25. Optimality analysis for a 5 stage channel
- The search schemes find near-optimal results
- balanced channel
- skewed channel
- Search approach is more efficient than the brute force simulation approach
- Efficiency increases with number of channel stages
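For comparison, a brute force baseline can be sketched as an exhaustive enumeration of FIFO sizings. This is an illustration under assumptions, not the authors' simulator: `throughput` and `cost` are caller-supplied callables (for example, a channel simulation and a power model), and `max_size` is an assumed cap on each FIFO size.

```python
from itertools import product


def brute_force_sizing(num_stages, max_size, throughput, cost, target):
    """Try every sizing (n_1, ..., n_N) with 1 <= n_i <= max_size.

    Returns (best_sizes, best_cost) among the sizings whose throughput
    meets the target, or (None, inf) if no sizing qualifies.
    """
    best_sizes, best_cost = None, float("inf")
    for sizes in product(range(1, max_size + 1), repeat=num_stages):
        if throughput(sizes) < target:
            continue                      # misses the throughput requirement
        c = cost(sizes)
        if c < best_cost:
            best_sizes, best_cost = sizes, c
    return best_sizes, best_cost
```

The enumeration visits max_size ** num_stages candidates, so its cost grows exponentially with the number of stages. This is consistent with the search approach gaining efficiency as stages are added.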
26. Experiments
- Considered 3 stage, 4 stage and 5 stage channels
- Given p, q, λ, μ
- Goal: To synthesize the channel
- in terms of channel frequency (voltage) and FIFO sizes
- 4 types of channels analyzed
- λ = 0.8, p = 8; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1): balanced
- λ = 0.5, p = 8; μ = 0.5, q = 1 (λ/p = 0.0625, μ/q = 0.5): starved
- λ = 0.5, p = 1; μ = 0.5, q = 8 (λ/p = 0.5, μ/q = 0.0625): congested
- λ = 0.1, p = 1; μ = 0.1, q = 1 (λ/p = 0.1, μ/q = 0.1): balanced
27. 4 and 5 stage channels
[Figure: α_sim located between α_min and α_max]
- For power efficient balanced channels
- α_sim lies approximately in the middle of α_min and α_max
- For power efficient skewed channels
- α_sim lies closer to α_max
28. Results Summary
- Developed two algorithms to synthesize an LI channel
- FIFO_size_search() algorithm finds the FIFO sizes
- given a channel frequency
- Channel_frequency_search() algorithm searches for the channel clock period
- binary search on the clock period search space
- Channel synthesis results are near optimal
- verified by an exhaustive brute force simulation approach
- Choice of α_sim as compared to α_min results in
- 77.7%, 83.6% and 87% power savings for a 3, 4 and 5 stage channel respectively