Title: Runtime Power Gating of On-Chip Routers Using Look-Ahead Routing
1Runtime Power Gating of On-Chip Routers Using
Look-Ahead Routing
- Hiroki Matsutani (Keio Univ, Japan)
- Michihiro Koibuchi (NII, Japan)
- Daihan Wang (Keio Univ, Japan)
- Hideharu Amano (Keio Univ, Japan)
2Background Leakage Power gating
- Leakage power
- Major component of standby power
- Power gating (PG)
- Leakage power reduction
- Turning on/off the power supply to the circuit block
- Examples of PG
- Processor core
- Execution unit
- ALU, FPU, MAC,
[Figure: Standby power breakdown of an on-chip router (90nm CMOS, 200MHz): dynamic vs. leakage (60.9%)]
We focus on power gating to reduce the standby power of NoCs
3Outline
- Network-on-Chip (NoC)
- On-Chip Router
- Architecture
- Power consumption
- Runtime power gating of routers
- Overheads
- Look-Ahead sleep control
- Evaluations
- Performance penalty
- Compensated sleep cycles
- Leakage reduction
4Network-on-Chip (NoC)
- Processor core
- On-chip router
[Figure: An example tile architecture (ASPLA 90nm CMOS) with the processor core and the router labeled]
5Network-on-Chip (NoC)
- Processor core
- Largest component
- Various low-power techniques are used
- On-chip router
- Area is not so large
- Infrastructure that affects on-chip communication
e.g., Standby current of 11 uA [Ishikawa, IEICE'05]
Stopping routers makes the topology irregular
[Figure: An example tile architecture (ASPLA 90nm CMOS)]
The next slides show the router architecture and its power consumption
6On-Chip Router Architecture
- 5-input 5-output router (data width is 64-bit)
Two virtual channels (64-bit x 4 x 2)
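[Figure: Router block diagram. Input channels X+, X-, Y+, Y-, and CORE each feed a FIFO; an ARBITER controls a 5x5 XBAR that drives the output channels X+, X-, Y+, Y-, and CORE]
The hardware amount is 34 kilo gates, and 64% of the area is used for the FIFOs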
7On-Chip Router Pipeline
- A header flit goes through a router in 3 cycles
- RC (Routing Computation)
- SA (Switch Allocation)
- ST (Switch Traversal)
- E.g., Packet transfer from router A to C
Packet size is 4-flit including 1-flit header
[Figure: Pipeline timing of a 4-flit packet at routers A, B, and C. The header flit (HEAD) spends RC, SA, and ST at each router; the data flits (DATA 1 to DATA 3) follow with ST only. Horizontal axis: elapsed time in cycles (1 to 12)]
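The timing on this slide can be checked with a small calculation. The sketch below assumes a contention-free network, no extra link delay, and data flits that follow the header one cycle apart; the function name `packet_latency` is hypothetical and not taken from the slides.

```python
def packet_latency(hops: int, flits: int, stages: int = 3) -> int:
    """Cycles until the last flit has traversed the last router.

    Assumptions (not from the slides): no contention, no link delay,
    and one data flit follows the header per cycle.
    """
    if hops < 1 or flits < 1:
        raise ValueError("hops and flits must be at least 1")
    header_cycles = hops * stages      # RC, SA, ST at every router
    body_cycles = flits - 1            # remaining flits trail by one cycle each
    return header_cycles + body_cycles


# Routers A, B, C (3 hops) and a 4-flit packet, as in the example above.
print(packet_latency(hops=3, flits=4))  # 12
```

Under these assumptions the result matches the 12-cycle time axis of the figure.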
8On-Chip Router Power consumption
- Place-and-routed with 90nm CMOS
- Post layout simulation at 200MHz
[Figure: Power consumption of a router when n ports are used (mW)]
A router consumes more power as the router
processes more packets
9On-Chip Router Power consumption
Power consumption when no port is used → standby power
Leakage of the channel buffers is the largest component, so it should be reduced
11On-Chip Router Leakage reduction
- Runtime power gating of router channels
- No packets in a channel → Sleep
- Packet arrives at the channel → Wakeup
[Figure: Router block diagram (ARBITER, 5x5 XBAR, and the X+, X-, Y+, Y-, and CORE channels with their FIFOs)]
13Power Gating Various overheads
- Area overhead
- Power switches
- Performance overhead
- Wakeup delay
- Pipeline stall is caused
- Power overhead
- Driving power switches
- Short sleeps adversely increase dynamic power
[Figure: A pipeline stall occurs in a router while a packet waits for a sleeping FIFO channel to wake up]
→ Early detection of packet arrivals
→ Detect and avoid short-term sleeps
14Power Gating Various overheads
Sleep control that detects arrival of packets
early is needed
15Look-Ahead Sleep Control
- Look-ahead sleep control
- To mitigate the wakeup delay and short-term sleeps
- Normal routing
- Router i calculates the output port of Router i
- Look-ahead routing
- Router i calculates the output port of Router i+1
[Figure: 3x3 mesh of routers R0 to R8. E.g., a packet goes through R3, R4, R5, and R2. With look-ahead, R2 detects the packet arrival when the packet arrives at R4]
Look-ahead can eliminate a wakeup delay of less than 5 cycles
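The difference between normal and look-ahead routing can be sketched in a few lines. The code below assumes a 3x3 mesh with routers numbered row by row (R0 to R8) and XY dimension-order routing, which is consistent with the example path R3, R4, R5, R2 but is not stated on the slide. The function names are hypothetical.

```python
from typing import List, Optional

WIDTH = 3  # assumed 3x3 mesh, routers numbered row by row (R0..R8)


def next_hop(cur: int, dst: int) -> Optional[int]:
    """XY dimension-order routing: correct X first, then Y. None means arrived."""
    cx, cy = cur % WIDTH, cur // WIDTH
    dx, dy = dst % WIDTH, dst // WIDTH
    if cx != dx:
        cx += 1 if dx > cx else -1
    elif cy != dy:
        cy += 1 if dy > cy else -1
    else:
        return None
    return cy * WIDTH + cx


def look_ahead_target(cur: int, dst: int) -> Optional[int]:
    """Router i computes the output of router i+1, i.e. the router two hops
    ahead of the packet. That router can be woken up early."""
    nxt = next_hop(cur, dst)
    return None if nxt is None else next_hop(nxt, dst)


def path(src: int, dst: int) -> List[int]:
    """Full route taken under normal (hop-by-hop) routing."""
    route = [src]
    while (nxt := next_hop(route[-1], dst)) is not None:
        route.append(nxt)
    return route


print(path(3, 2))               # [3, 4, 5, 2]
print(look_ahead_target(4, 2))  # 2
```

With these assumptions, the packet at R4 already identifies R2 as the router after R5, which is the moment R2 can start waking up.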
17Evaluations Sleep control methods
- Evaluation items
- Network throughput
- Leakage reduction
- Parameters
- Ideal method
- Ideal case
- No wakeup delay
- Look-ahead method
- Detects packet arrival 5 cycles ahead
- Naïve method
- Original router
- No look-ahead
Traffic patterns: uniform and NPB programs (BT, SP, CG, MG, and IS)
18Evaluations Performance of naïve
- Throughput for various wakeup delays (e.g., 0, 1, 2, 3 cycles)
- Naïve
Performance is reduced as Twakeup increases
[Figure: Throughput of the naïve method; panels: uniform traffic (16-core) and MG.W traffic (16-core)]
19Evaluations Performance of lookahead
- Throughput for various wakeup delays (e.g., 0, 1, 2, 3 cycles)
- Naïve
- Ideal
- Look-ahead
Performance is degraded as Twakeup increases
[Figure: Throughput of the three methods; panels: uniform traffic (16-core) and MG.W traffic (16-core)]
Look-ahead can conceal a wakeup delay of less than 5 cycles
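A first-order way to read this result: if a channel learns about an incoming packet some cycles before it arrives, only the part of the wakeup delay that exceeds this lead time shows up as a pipeline stall. The sketch below is an idealized model under that assumption; `stall_cycles` and `lead` are hypothetical names, and the model ignores wakeup-signal propagation and other boundary effects.

```python
def stall_cycles(t_wakeup: int, lead: int) -> int:
    """Cycles a packet waits for a sleeping channel (idealized model).

    t_wakeup: wakeup delay of the channel in cycles.
    lead: how many cycles before arrival the channel is told to wake up
          (0 for the naive method, 5 for look-ahead per the slides).
    """
    if t_wakeup < 0 or lead < 0:
        raise ValueError("cycle counts must be non-negative")
    return max(0, t_wakeup - lead)


for t in (1, 3, 8):
    print(t, stall_cycles(t, lead=0), stall_cycles(t, lead=5))
# 1 1 0
# 3 3 0
# 8 8 3
```

In this model the naive method always pays the full wakeup delay, while look-ahead hides short delays entirely and only exposes the excess of longer ones.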
20Evaluations Breakeven point of PG
- Power gating model
- Eoverhead: Energy consumed for turning the power switches (PS) on/off
- Esaved: Leakage energy saved by an N-cycle sleep
[Hu, ISLPED'04]
How many cycles of sleep are required to compensate for Eoverhead?
We calculate the breakeven point of PG based on the following parameters
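The breakeven question can be illustrated with a deliberately simplified model: assume the leakage saved grows linearly with the sleep length, so the breakeven point is the first cycle count at which Esaved reaches Eoverhead. This is a toy model, not the model of [Hu, ISLPED'04]. The 95 uW figure is the per-channel leakage without power gating quoted later in the slides; the overhead energy `E_OVERHEAD_J` is a hypothetical value picked only so that the toy model lands near the cycle counts reported on the next slide. It is not a measured figure.

```python
import math


def breakeven_cycles(e_overhead_j: float, p_saved_w: float, freq_hz: float) -> int:
    """Smallest N with Esaved(N) >= Eoverhead, assuming Esaved(N) = N * P / f."""
    e_saved_per_cycle = p_saved_w / freq_hz
    return math.ceil(e_overhead_j / e_saved_per_cycle)


P_LEAK_W = 95e-6        # per-channel leakage without PG (from the slides)
E_OVERHEAD_J = 2.6e-12  # HYPOTHETICAL overhead energy, for illustration only

print(breakeven_cycles(E_OVERHEAD_J, P_LEAK_W, 200e6))  # 6
print(breakeven_cycles(E_OVERHEAD_J, P_LEAK_W, 500e6))  # 14
```

The model shows why the breakeven point grows with the clock frequency: each cycle is shorter, so each sleeping cycle saves less energy while the switching overhead stays the same.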
21Evaluations Breakeven point of PG
[Figure: Power consumption vs. sleep duration for no power gating (PG), the PG router at 200MHz, and the PG router at 500MHz]
Power consumption is reduced as the sleep duration becomes longer
Breakeven point is 6 cycles (200MHz)
Breakeven point is 14 cycles (500MHz)
22Evaluations Compensated sleep ratio
- States of router channels
- Nactive: Active operation. Power is consumed as usual
- Ncsc: Compensated sleep. Sleep longer than Tbreakeven
- Nusc: Uncompensated sleep. Sleep shorter than Tbreakeven
- Estimate the ratio of compensated sleep cycles
- We performed the network simulation again
- Comparison between three sleep control methods: Ideal, Look-ahead, Naïve
[Figure: Timeline of a channel alternating between sleep and wakeup, with its cycles labeled Nactive, Nusc, and Ncsc]
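The three states can be counted from a list of sleep lengths recorded during a simulation. The sketch below is one possible bookkeeping, with two assumptions that the slides do not settle: a sleep of exactly Tbreakeven cycles is counted as compensated, and the ratio is taken over all simulated cycles. The function name and the sample numbers are hypothetical.

```python
from typing import List, Tuple


def classify_cycles(sleeps: List[int], n_total: int,
                    t_breakeven: int = 6) -> Tuple[int, int, int]:
    """Return (Nactive, Ncsc, Nusc) in cycles for one channel.

    sleeps: length in cycles of every sleep period of the channel.
    n_total: total simulated cycles.
    Assumption: a sleep of exactly t_breakeven cycles counts as compensated.
    """
    n_csc = sum(s for s in sleeps if s >= t_breakeven)
    n_usc = sum(s for s in sleeps if s < t_breakeven)
    n_active = n_total - n_csc - n_usc
    if n_active < 0:
        raise ValueError("sleep cycles exceed total cycles")
    return n_active, n_csc, n_usc


# Made-up trace: four sleeps within 100 simulated cycles.
n_active, n_csc, n_usc = classify_cycles([2, 10, 4, 20], n_total=100)
print(n_active, n_csc, n_usc)   # 64 30 6
print(n_csc / 100)              # 0.3
```

Only the cycles counted in Ncsc actually reduce energy. The cycles in Nusc cost more than they save.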
23Evaluations Compensated sleep ratio
[Figure: Ratio of compensated sleep cycles; panels: uniform traffic (16-core) and MG.W traffic (16-core)]
Ncsc rate: 80% (low workload)
Ncsc rate: 25% (high workload)
Ncsc decreases as traffic increases: Ideal > Look-ahead > Naïve
24Evaluations Leakage power reduction
- Leakage power at each channel (Tbreakeven = 6)
- No power gating consumes 95 uW
- Leakage reduction of PG with 3 sleep control methods
This includes the overhead energy to turn the power switches on/off
[Figure: Leakage power per channel; panels: uniform traffic (16-core) and MG.W traffic (16-core)]
Leakage increases as traffic increases: Ideal < Look-ahead < Naïve
25Summary Look-ahead sleep control
- Runtime power gating of router channels
- Wakeup delay introduces pipeline stalls of routers
- Short-term sleeps overwhelm the leakage reduction
- Look-ahead sleep control
- An extension of look-ahead routing
- Detects the arrival of packets five cycles ahead
- Evaluation results
- Look-ahead conceals a wakeup delay of less than 5 cycles
- Look-ahead reduces more leakage than naïve
26Thank you for your attention
27Backup slides
28Look-ahead method HW resources
- Routing computation of next router
- Just changing the routing function
- Area overhead is very small
- Wakeup signals are needed
- Sender asserts a wakeup signal to the receiver
- Wakeup signals become long
- Negative impact of multi-cycle or repeater buffers
NRC stage: Next Routing Computation
[Figure: Pipeline timing with look-ahead routing. The header flit (HEAD) spends NRC, SA, and ST at each of three routers; the data flits (DATA 1, DATA 2) follow with ST only. Horizontal axis: cycles 0 to 8. Wakeup signals to router 1 are marked]
29Wakeup delay Performance impact
- Wakeup delays in the literature
- ALU: 2 cycles
- AES core: approx. 4 cycles
- FPMAC in Intel's 80-tile chip: 6 cycles
- It depends on circuit block size, clock frequency, noise, etc.
- Performance of look-ahead method (@ uniform traffic)
[Figure: Two panels: wakeup delay of 0, 1, 2, 3, 4, 5 cycles (Twakeup = 0 to 5) and wakeup delay of 5, 6, 7, 8 cycles (Twakeup = 5 to 8)]
30Breakeven point leakage reduction
- Breakeven point in the literature
- Execution unit in processor: 10 cycles
- It depends on circuit block size, clock frequency, etc.
- Leakage power reduction (@ uniform traffic)
[Figure: Leakage power reduction; panels: Tbreakeven = 6 cycles and Tbreakeven = 14 cycles]
A longer Tbreakeven reduces the opportunity for compensated sleep
31Finer grain PG of NoC routers
- Virtual channel (VC) level power gating
- Packet routing scheme for VC-level PG
- All packets use VC0 when they are injected into the NoC
- VC number is increased when the packet conflicts
[Figure: Routers (a), (b), and (c), each with VC0, VC1, and VC2]
Only VC0 is used if the workload is low
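The VC selection rule on this slide can be read as "always take the lowest-numbered free VC". The sketch below illustrates that reading for a single input channel. The function names are hypothetical, and the rule that a VC only needs power while it holds a packet is an assumption, not a statement from the slides.

```python
from typing import List, Optional


def allocate_vc(busy: List[bool]) -> Optional[int]:
    """Pick the lowest-numbered free VC, so packets start at VC0 and move
    to a higher VC only on a conflict. None means every VC is occupied."""
    for vc, in_use in enumerate(busy):
        if not in_use:
            return vc
    return None


def powered_vcs(busy: List[bool]) -> int:
    """Number of VCs that must stay powered (assumed: only occupied ones)."""
    return sum(busy)


busy = [False, False, False]     # VC0, VC1, VC2 all idle
first = allocate_vc(busy)        # low workload: VC0
busy[first] = True
second = allocate_vc(busy)       # conflict on VC0: fall back to VC1
busy[second] = True
print(first, second, powered_vcs(busy))  # 0 1 2
```

At low workload this keeps traffic on VC0, so the higher-numbered VCs can remain power-gated.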
32Finer grain PG of NoC routers
[Figure: Routers (a), (b), and (c), each with VC0, VC1, and VC2]
All VCs are activated if the workload is high
High peak performance of VCs with the least leakage power
33Buffer design Registers or SRAMs
- It depends on buffer depth, not width
- Depth > 32 flits → buffers are designed with SRAMs
- Otherwise → buffers are designed with registers
In our design, the buffer depth is 4 flits
FIFO buffers are designed with registers
[Figure: Router block diagram (ARBITER, 5x5 XBAR, and the X+, X-, Y+, Y-, and CORE channels with their FIFOs)]
34Leakage power calculation
- Power estimation flow
- Perform the network simulation
- Obtain the length of every sleep during the simulation
- Average leakage of each sleep is estimated according to its length, based on the sleep duration vs. leakage graph
[Figure: Sleep duration vs. leakage power]
[Figure: Leakage reduction (Tbreakeven = 6)]
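The estimation flow above can be sketched as a lookup over a sleep duration vs. leakage curve. The curve points in `CURVE` below are made-up placeholders, not values read from the graph. They are chosen only so that a sleep of 6 cycles costs the same 95 uW as no power gating. Active cycles are assumed to leak at the full 95 uW, and all names are hypothetical.

```python
from bisect import bisect_right
from typing import List

P_NO_PG_UW = 95.0  # per-channel leakage without PG (from the slides)

# HYPOTHETICAL curve: (sleep length in cycles, average leakage in uW over
# that sleep, including switching overhead). Placeholder values only.
CURVE = [(1, 300.0), (6, 95.0), (20, 40.0), (100, 15.0)]


def avg_sleep_power(length: int) -> float:
    """Linear interpolation on CURVE, clamped at both ends."""
    xs = [d for d, _ in CURVE]
    if length <= xs[0]:
        return CURVE[0][1]
    if length >= xs[-1]:
        return CURVE[-1][1]
    i = bisect_right(xs, length)
    (x0, y0), (x1, y1) = CURVE[i - 1], CURVE[i]
    return y0 + (y1 - y0) * (length - x0) / (x1 - x0)


def average_leakage(sleeps: List[int], n_total: int) -> float:
    """Cycle-weighted average leakage (uW) of one channel over a simulation."""
    n_sleep = sum(sleeps)
    if n_sleep > n_total:
        raise ValueError("sleep cycles exceed total cycles")
    energy = sum(avg_sleep_power(s) * s for s in sleeps)
    energy += P_NO_PG_UW * (n_total - n_sleep)  # awake cycles leak fully
    return energy / n_total


print(average_leakage([], 100))      # 95.0
print(average_leakage([100], 100))   # 15.0
```

Short sleeps pull the average above the no-PG level in this sketch, which mirrors why uncompensated sleeps should be avoided.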
35Look-ahead method the 1st hop?
- Look-ahead for Router 3, Router 4, Router 5, ...
- Look-ahead for Router 1 and Router 2
- Network interface (NI) performs look-ahead
- Packet construction takes several clock cycles
- NI of source node can perform look-ahead
[Figure: Two diagrams of a path from Src through Router (1), Router (2), Router (3), and Router (4) to Dst, marking where look-ahead is performed]
36Look-ahead method Adaptive routing
- Routing algorithms
- Deterministic routing → routing path is predictable
- Adaptive routing → path is dynamically changed
- Adaptive routing
- It is difficult to predict the routing path
- Look-ahead wakeup sometimes fails
- E.g., asserting wakeup signals to wrong input channels
- An extension for adaptive routing
- At low workload,
- Using the output selection function (OSF) that tries to use the same output channel → wakeup rarely fails
We used deterministic routing, because it is popular in simple NoCs