Title: Stabilizing Path Modification of Power-Aware On/Off Interconnection Networks
1Stabilizing Path Modification of Power-Aware
On/Off Interconnection Networks
- Jose Miguel Montanana (NII, Japan)
- Michihiro Koibuchi (NII, Japan)
- Hiroki Matsutani (U of Tokyo, Japan)
- Hideharu Amano (Keio U / NII, Japan)
2Outline
- HPC networks (Infiniband, GbE)
- On/Off link activation method
- Reducing power consumption of HPC networks
- Paths are updated to avoid deactivated links
- Applying network reconfiguration to switches
- Evaluations
- Cycle-accurate network simulator
- Behavior of network during the path change
3Network of High-performance computing
[Figure: number of supercomputers on the Top500 list and percentage of the Top500 list; axis ticks from 0 to 60]
4Examples
RoadRunner (LANL)
BLUEGENE/L (LLNL)
TACC (Univ Texas)
Proprietary
251,904 cores, 5th on Top500
IBA
212,992 processors 2nd on Top500 list
122,400 cores 1st on Top500
IBA
Virginia Tech's X
ABE (NCSA)
ASCI-Q (LANL)
IBA
IBA
2,200 cores 280th on Top500
9,600 cores, 23rd on Top500
Quadrics
8,192 cores
2008
5HPC Networks
- Small switches (24/48-port) provide the lowest cost per port
- When 100,000 cores are connected, a large number of small switches are needed
- This drastically increases the number of links
- Unused and rarely used links should be deactivated for power-aware HPCs
[Figure: link aggregation using 3 links; legend: switch, host]
6Power cons of HPC switches
Unit: W
Product        Port   Other (Xbar)   Total   Ratio of ports (%)
PC5324         1.2    14.9           42.9    65
PC6224         2.0    42.5           91.1    53
PC6248         2.1    56.8           155.2   63
SF-420         1.0    32.6           55.4    41
SFS7000D-SK9   1.0    43.4           66.1    34
Network types covered: GbE, IB
- Power cons is almost constant regardless of traffic load
- The number of activated ports dominates the power cons of switches
- Power cons of a port is reduced down to ZERO by the port-shutdown operation
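The table suggests a simple linear model: a fixed part (crossbar and other logic) plus a per-port part for each activated port. The sketch below is a minimal illustration under stated assumptions: the table figures are in watts, a shut-down port draws zero as the slide states, and the PC6248 has 48 ports (an assumption taken from the product name, not from the slide). The function names are hypothetical.

```python
def switch_power_w(port_w: float, other_w: float, active_ports: int) -> float:
    """Estimated switch power in watts: fixed part plus one share per active port."""
    if active_ports < 0:
        raise ValueError("active_ports must be non-negative")
    return other_w + port_w * active_ports


def port_share(port_w: float, other_w: float, active_ports: int) -> float:
    """Fraction of the estimated total power that is drawn by the active ports."""
    total = switch_power_w(port_w, other_w, active_ports)
    return port_w * active_ports / total if total else 0.0


# PC6248 row of the table; the port count of 48 is an assumption.
PC6248 = {"port_w": 2.1, "other_w": 56.8}

all_on = switch_power_w(**PC6248, active_ports=48)   # every port activated
half_on = switch_power_w(**PC6248, active_ports=24)  # half of the ports shut down
saving = 1.0 - half_on / all_on                      # relative power saving
```

With these inputs the model gives about 157.6 W with all ports on, slightly above the 155.2 W total in the table, and shutting down half of the ports saves about 32% of the estimated power.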
7Outline
- HPC networks (Infiniband, GbE)
- On/Off link activation method
- Reducing power consumption of HPC networks
- Paths are updated to avoid deactivated links
- Applying network reconfiguration to switches
- Evaluations
- Cycle-accurate network simulator
- Behavior of network during the path change
8Overview of the on/off link method
Switch ports consume 40-60% of the total power of a switch
Network load is not always high (e.g. during computation time)
When the traffic load becomes low, a part of the links is turned off
[Figure: switches and hosts connected through TREE 1, TREE 2, TREE 3 and TREE 4]
9A runtime on/off link method
Traffic is monitored (e.g. port monitor, IPTraf, pilot execution)
Low traffic load is detected
Paths are updated (before / after), and the "before" path is deactivated
How is the network stabilized during the path update?
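The detection step can be pictured as a threshold rule on the measured link utilization. The sketch below is only an illustration: the two thresholds, the hysteresis between them, and the names `next_link_state` and `replay` are assumptions, not something the slides specify.

```python
from typing import Iterable, List


def next_link_state(active: bool, utilization: float,
                    off_below: float = 0.2, on_above: float = 0.6) -> bool:
    """Return whether an optional link should be active after one traffic sample.

    Two separate thresholds give hysteresis, so the link does not flap
    when the load hovers around a single threshold.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    if active and utilization < off_below:
        return False   # low load detected: deactivate the link
    if not active and utilization > on_above:
        return True    # load is high again: re-activate the link
    return active      # otherwise keep the current state


def replay(samples: Iterable[float], active: bool = True) -> List[bool]:
    """Apply the rule to a sequence of utilization samples."""
    states = []
    for u in samples:
        active = next_link_state(active, u)
        states.append(active)
    return states
```

Each change of state returned by the rule is a point at which the paths have to be updated, which is where the stabilization question above arises.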
10Stabilizing network during the path update
Network Reconfiguration (deadlock avoidance)
[Figure: network of switches and links under Rold]
Rold: routing table before the update
Rnew: routing table after the update
Rold is deadlock free
Rnew is deadlock free
Rold combined with Rnew may deadlock
11Network Reconfiguration
[Figure: reconfiguration from Rold to Rnew on a network with switches labelled 0, 1, 3, 4 and 5]
Rold is deadlock free
Rnew is deadlock free
Rold combined with Rnew may cause deadlock
Deadlock: old behind new, new behind old
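One common way to reason about this is the channel dependency graph: a deterministic routing function is deadlock free if its dependency graph has no cycle. The sketch below uses a hypothetical four-channel example (not the network in the figure) in which each routing is acyclic on its own but the two together close a cycle. It checks only the union of the two dependency sets, which is a simplification of what can happen while old and new packets coexist.

```python
from typing import Dict, Iterable, Set, Tuple

Dep = Tuple[str, str]  # (channel a packet holds, channel it requests next)


def has_cycle(deps: Iterable[Dep]) -> bool:
    """Depth-first search for a cycle in a channel dependency graph."""
    graph: Dict[str, Set[str]] = {}
    for held, wanted in deps:
        graph.setdefault(held, set()).add(wanted)
        graph.setdefault(wanted, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {channel: WHITE for channel in graph}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: a cycle exists
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[c] == WHITE and visit(c) for c in graph)


# Hypothetical dependencies over four channels c0..c3.
R_OLD = {("c0", "c1"), ("c2", "c3")}
R_NEW = {("c1", "c2"), ("c3", "c0")}

old_ok = not has_cycle(R_OLD)          # Rold alone is acyclic
new_ok = not has_cycle(R_NEW)          # Rnew alone is acyclic
mixed_bad = has_cycle(R_OLD | R_NEW)   # together: c0 -> c1 -> c2 -> c3 -> c0
```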
12Existing NW reconf. tech. on fault-tolerant networks
Static reconfiguration
- Traffic is stopped, new routing is applied, traffic is resumed
- Drawback: high latencies
- Example: Static Reconfiguration (ST)
Dynamic reconfiguration
- Traffic is not stopped; old and new routing coexist
- Drawback: deadlock is difficult to avoid
- Examples: Double Scheme, Simple Reconfiguration
13Current NW Reconfigurations
- SR-PDA: Simple Reconfiguration, Packet Dropping Aware [Lysne08, IEEE TC]
- Tokens are sent before the routing is updated
- Packets are sent after the routing tables are updated
- SR-LA: Simple Reconfiguration, Latency Aware [Lysne08, IEEE TC]
- All new tables are distributed before the new one is used
- Latency due to the tokens is reduced
- DS: Double Scheme [Pinkston03, TPDS]
- Requires 2 virtual channels
- One channel has to be drained
- ST: Static Reconfiguration
- Traffic injection is completely stopped
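The static scheme (ST) is the easiest of these to picture as a sequence of steps. The sketch below is a toy model, not a simulator: `ToyNetwork` and `static_reconfigure` are hypothetical names, and the explicit drain step is an assumption about how "traffic is stopped" is realized before the tables are swapped.

```python
from typing import List


class ToyNetwork:
    """Minimal stand-in for a network: a routing label and a packet counter."""

    def __init__(self, routing: str) -> None:
        self.routing = routing
        self.injecting = True
        self.in_flight = 0
        self.log: List[str] = []

    def inject(self) -> bool:
        """Try to inject one packet; refused while injection is stopped."""
        if not self.injecting:
            return False
        self.in_flight += 1
        return True

    def deliver_one(self) -> None:
        """Deliver one in-flight packet, if any."""
        if self.in_flight:
            self.in_flight -= 1


def static_reconfigure(net: ToyNetwork, new_routing: str) -> None:
    """Stop injection, drain old packets, swap the routing, resume injection."""
    net.injecting = False
    net.log.append("stop")
    while net.in_flight:          # old packets leave under the old routing
        net.deliver_one()
    net.log.append("drain")
    net.routing = new_routing     # old and new routing never coexist
    net.log.append("update")
    net.injecting = True
    net.log.append("resume")
```

Because no packet is in flight when the tables change, old and new dependencies cannot mix, at the cost of the latency incurred while injection is stopped.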
14Outline
- HPC Interconnects (Infiniband, GbE)
- On/Off link activation method
- Reducing power consumption of HPC networks
- Paths are updated to avoid deactivated links
- Applying network reconfiguration to switches
- Evaluations
- Cycle-accurate network simulator
- Behavior of network during the path change
15Simulation Environment
- Switch model (InfiniBand)
- Buffered input (1KB per VL) and output (1KB per VL) ports
- Non-multiplexed crossbar with separate ports per VL
- FIFO-based crossbar arbiter per output crossbar port
- Round-robin arbiter per output port
- 100 ns routing time
- Link model
- Link Speed 2.5 Gbps (1X links)
- Topologies
- 2D mesh networks
- Traffic model
- Packet lengths are 58 bytes
- Uniform
- Full range of traffic, from low load to saturation
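Two quantities follow directly from these parameters: the time a 58-byte packet occupies a link, and the number of links in a 2D mesh. The sketch below is a rough illustration with hypothetical helper names. It treats 2.5 Gbps as the usable rate; since 1X InfiniBand links use 8b/10b encoding, the effective data rate would be 2.0 Gbps, and which of the two the simulator uses is not stated on the slide.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def serialization_ns(packet_bytes: int, link_gbps: float) -> float:
    """Time to push one packet onto a link; bits / (Gbit/s) gives nanoseconds."""
    return packet_bytes * 8 / link_gbps


def mesh_links(width: int, height: int) -> List[Tuple[Node, Node]]:
    """Bidirectional links of a width x height 2D mesh, each listed once."""
    links = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                links.append(((x, y), (x + 1, y)))   # link to the right neighbour
            if y + 1 < height:
                links.append(((x, y), (x, y + 1)))   # link to the upper neighbour
    return links


raw_ns = serialization_ns(58, 2.5)       # at the 2.5 Gbps signalling rate
data_ns = serialization_ns(58, 2.0)      # if 8b/10b overhead is accounted for
links_8x8 = len(mesh_links(8, 8))        # candidate links for on/off control
```

At either rate the serialization time of a packet is of the same order as the 100 ns routing time, and an 8x8 mesh has 112 links in this model.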
16Evaluation Results
- We apply the NW reconf. process twice in each execution
- Deactivating links after the traffic injection decreases
- Re-activating links after the traffic injection increases
- We evaluated the full range of initial traffic injection (from low traffic to near congestion)
17Static Reconfiguration (ST)
[Figure: network behavior under ST; panels (a) Low Traffic Load and (b) High Traffic Load; annotations: traffic load decreases and a link is deactivated, traffic load increases and a link is reactivated, latency is high]
At each on/off link operation, traffic is not stabilized in ST!!
18SR-LA (dynamic reconfiguration)
[Figure: panels (a) Low Traffic Load and (b) High Traffic Load]
Also, at each on/off link operation, traffic is not stabilized in SR-LA!!
19SR-PDA (dynamic reconfiguration)
[Figure: panels (a) Low Traffic Load and (b) High Traffic Load]
Also, at each on/off link operation, traffic is not stabilized in SR-PDA!!
20Double Scheme (dynamic reconfiguration)
[Figure: panels (a) Low Traffic Load and (b) High Traffic Load; annotations: traffic load decreases, traffic load increases, latency is constant]
The path update is stabilized only in Double Scheme!!
21Larger Network (8x8 Mesh)
Similar behavior!!
[Figure: panels ST, SRL, DS]
Only Double Scheme stabilizes networks during the path update!!
22Conclusions
- We apply network reconfiguration techniques to power-aware on/off networks for HPC
- Links consume 63% of switch power
- On/off link activation reduces power
- It must accept the topology change
- Network reconfiguration smoothly supports the path update
- Stabilizing the update of new/old paths
- Avoiding deadlocks of new/old paths
- Cycle-accurate simulation shows its impact on the power-aware on/off networks
- Double Scheme (dynamic NW reconf) maintains performance, stabilizes networks, and avoids deadlock
- Network reconfiguration is essential for realizing the power-aware on/off networks for HPC systems
23Acknowledgment
- This work was partially supported by JST CREST (ULP-HPC: Ultra Low-Power, High-Performance Computing via Modelling and Optimization of Next Generation HPC Technologies)