Title: Michihiro Koibuchi (NII, Japan)
1. An On/Off Link Activation Method for Low-Power Ethernet in PC Clusters
- Michihiro Koibuchi (NII, Japan)
- Tomohiro Otsuka (Keio U, Japan)
- Hiroki Matsutani (U of Tokyo, Japan)
- Hideharu Amano (Keio U / NII, Japan)
2. HPC PC Clusters with Ethernet
- Host/CPU
  - Various low-power techniques are used
    - DVFS
    - Power gating
- Ethernet switch
  - Always active, ready for packet injection
[Figure: PCs connected to an Ethernet switch]
We propose and evaluate a low-power technique for Ethernet switches in PC clusters.
3. Outline
- Ethernet for HPC
  - Link aggregation (channel group) and multi-paths
- On/Off link activation method
- Evaluations
  - Overhead of On/Off link operation
  - Performance and power consumption of PC clusters
4. Ethernet on HPC systems
- Increasing the number of ports of GbE switches
  - 24/48-port switches provide the lowest cost per port
- Improving the computation power of hosts (> 10 GFlops)
- Link aggregation (IEEE 802.3ad) and multi-path topology [Kudoh, IEEE Cluster 2004] [Viking, Infocom 2004]
  - Drastically increasing the number of links
[Figure: hosts and switches, with link aggregation using 3 links]
5. Power consumption of GbE switches
[Figure: measured switch power, unit: W]
- Power consumption is almost constant regardless of traffic load
- The number of activated ports dominates the power consumption of switches
- The power consumption of a port is reduced to zero by the port-shutdown operation
6. Overview of the on/off link method
Switch ports consume 40-60% of the total power
Network load is not always high (e.g. during computation time)
When the traffic load becomes low, a part of the links is turned off
[Figure: switches and nodes connected by four trees, TREE 1 to TREE 4]
7. Outline
- Ethernet for HPC
  - Link aggregation (channel group) and multi-paths
- On/Off link activation method
- Evaluations
  - Overhead of On/Off link operation
  - Performance and power consumption of PC clusters
8. A framework of the on/off link method
- Traffic is monitored (e.g. port monitor, IPTraf, pilot execution)
- Low traffic load is detected
- Paths are changed from "before" to "after", and the "before" path is deactivated
How is it implemented on Ethernet?
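The monitor-then-act loop above can be sketched as a small decision function. This is only an illustration: the function name, the utilization thresholds `low` and `high`, and the link-count limits are assumptions, not values from the talk.

```python
def plan_link_action(utilization, active_links,
                     min_links=1, max_links=6, low=0.3, high=0.8):
    """Decide what to do with one aggregated link group.

    utilization: measured load as a fraction of the group's current capacity.
    low/high:    hypothetical thresholds (not taken from the slides).
    Returns "deactivate", "activate", or "keep".
    """
    if utilization < low and active_links > min_links:
        return "deactivate"   # low traffic detected: turn off one link
    if utilization > high and active_links < max_links:
        return "activate"     # traffic increased: bring one link back
    return "keep"


print(plan_link_action(0.1, active_links=3))
print(plan_link_action(0.9, active_links=3))
```

Keeping a gap between `low` and `high` is one simple way to stop a link from toggling repeatedly when the load sits near a single threshold.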
9. Requirements for the on/off link method
- To achieve a practical on/off link activation method:
  - No update of the MPI communication library
  - Using existing functions of commercial switches
  - Hiding the overhead to activate the link
  - Stabilizing the MAC address tables while updating paths
    - Avoiding broadcast storms and communication interruption
[Figure: switches and hosts connected by TREE 1 to TREE 4, before the update]
10. Changing the paths for the on/off link operation
- Using the switch-tagged VLAN routing method [Otsuka, ICPP06]
  - Specifying the path by attaching a VLAN tag to a frame (Port VLAN ID: PVID)
  - Each host sends and receives usual (untagged) frames
  - When a frame arrives at a switch from a host, the switch adds a VLAN tag (PVID) to it
  - When the frame leaves for a host, the switch removes the VLAN tag
[Figure: hosts 0-7; the path of PVID = v0 uses VLAN v0 and the path of PVID = v1 uses VLAN v1; VLAN tag v0 is attached at the ingress switch]
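The tagging behaviour above can be modelled in a few lines. This is a toy model, not switch firmware: `Frame`, `deliver`, and the `vlan_paths` table are hypothetical names, and the switch names in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    src: int
    dst: int
    vlan: Optional[str] = None  # None means an untagged frame


def ingress_from_host(frame, port_pvid):
    """Edge switch adds the PVID of the receiving port as the VLAN tag."""
    return Frame(frame.src, frame.dst, vlan=port_pvid)


def egress_to_host(frame):
    """Edge switch removes the VLAN tag before the frame reaches the host."""
    return Frame(frame.src, frame.dst, vlan=None)


def deliver(frame, port_pvid, vlan_paths):
    """Return the frame as the destination host sees it, plus the path used.

    vlan_paths maps VLAN id -> {(src, dst): [switch names]} (assumed layout).
    """
    tagged = ingress_from_host(frame, port_pvid)
    path = vlan_paths[tagged.vlan][(frame.src, frame.dst)]
    return egress_to_host(tagged), path


vlan_paths = {
    "v0": {(0, 4): ["sw0", "sw1"]},
    "v1": {(0, 4): ["sw0", "sw2", "sw1"]},
}
received, path = deliver(Frame(0, 4), "v1", vlan_paths)
print(received.vlan, path)
```

Changing only the PVID of the host-facing port selects a different path, while the hosts keep sending and receiving untagged frames.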
11. When a deactivated link is activated
When the traffic increases:
- (1) Activate the target link
  - Using the no-shutdown command of the switch
- (2) Create VLAN v0 for the new path set that includes the target link, and build its MAC address table
- (3) Update the PVIDs of the host-facing ports to v0
[Figure: hosts 0-7; before the update, then updating PVID to v0]
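The ordering of steps (1) to (3) can be written down as a list of abstract actions. The action names and port identifiers below are hypothetical placeholders, not real switch CLI commands; the explicit wait step reflects the point that paths should only be updated once the link operation has completed.

```python
def activation_steps(link, new_vlan, host_ports):
    """Ordered actions for bringing a deactivated link back into use.

    All action names are illustrative placeholders, not vendor commands.
    """
    steps = [
        ("no_shutdown", link),          # (1) activate the target link
        ("wait_link_up", link),         # let the link come up before using it
        ("create_vlan", new_vlan),      # (2) new path set including the link
        ("build_mac_table", new_vlan),  #     and its MAC address table
    ]
    # (3) switch traffic to the new paths only at the very end
    steps += [("set_pvid", port, new_vlan) for port in host_ports]
    return steps


for step in activation_steps("sw0:p24", "v0", ["sw0:p1", "sw1:p1"]):
    print(step)
```

Because the PVID update comes last, frames keep following the old VLAN until the new path set is fully prepared.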
12. When an activated link is deactivated
When the traffic decreases:
- (1) Create VLAN v1 for the new path set that avoids the target link, and build its MAC address table
- (2) Update the PVIDs of the host-facing ports to v1
- (3) Deactivate the link
[Figure: hosts 0-7; the path of PVID = v0 (before), the path of PVID = v1 (after steps 1 and 2), then deactivating the link]
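Step (1) needs a path set that avoids the target link. One simple way to obtain such a set, assuming each VLAN uses a single spanning tree over the switches, is a breadth-first search that skips the link to be turned off. The talk does not state how its path sets are computed, so the function below is only an assumed sketch with hypothetical names.

```python
from collections import deque


def tree_avoiding(links, root, off_link):
    """BFS spanning tree over the switch graph that never uses off_link.

    links:    iterable of (switch_a, switch_b) pairs, one per neighbour pair.
    off_link: the (switch_a, switch_b) pair that is about to be deactivated.
    Returns a set of frozenset edges, or None if the switches would be
    disconnected without off_link (so the link must stay on).
    """
    banned = frozenset(off_link)
    adj = {}
    for a, b in links:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if frozenset((a, b)) != banned:
            adj[a].add(b)
            adj[b].add(a)

    seen = {root}
    tree = set()
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                queue.append(v)
    return tree if len(seen) == len(adj) else None


ring = [("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s0")]
print(tree_avoiding(ring, "s0", ("s0", "s1")))
```

A `None` result is a cheap safety check before step (3): if no tree exists without the link, deactivating it would cut some hosts off.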
13. Outline
- Ethernet for HPC
  - Link aggregation (channel group) and multi-paths
- On/Off link activation method
- Evaluations
  - Overhead of On/Off link operation
    - On/off link operation
    - Overhead to modify the path set
  - Performance and power consumption of PC clusters
Dell 5324, 6224 (24 ports), 6248 (48 ports), Netgear SF-G0420 (24 ports)
We can buy them at 1,000-3,000
14. Fundamental evaluation (1): on/off overhead
[Figure: overhead measured while a link is continuously switched on, off, on]
- When STP is enabled, the overhead becomes some dozens of seconds to 1 min
- To hide this overhead, paths should be updated after completing the on/off operation
15. Fundamental evaluation (2): overhead to update paths
[Figure: before, VLAN v0; after, VLAN v1; PVID is updated to v1]
- Measure the overhead to change paths using VLANs
  - Communication is not interrupted
  - This enables runtime on/off link activation
16. Performance evaluation on a PC cluster
- PC cluster
  - 128 hosts, Dual Opteron 1.8GHz x2
  - MPICH 1.2.7p1
- GbE switch
  - Dell PowerConnect 6248
  - 28host per switch
  - 48port_at_8
- Application
  - NPB 3.2
17. Topology of the cluster
- Peak: 4x2 torus, 6 links between switches
  - Enabling link aggregation (IEEE 802.3ad)
- Pre-executing the applications to estimate the traffic amount
  - Set up the on/off link set before execution
- Two on/off link selection algorithms
  - Conservative: maintains the maximum amount of traffic on a link
  - Aggressive: further power reduction (details are in the proceedings)
[Figure: torus topology]
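One possible reading of the conservative policy is: keep enough links between each pair of switches that no link carries more traffic than the busiest link does with all links on. The sketch below implements only that reading; the actual algorithms are in the proceedings, and the function name and the traffic figures in the example are made up.

```python
import math


def conservative_link_counts(traffic, peak_links=6):
    """Links to keep per switch pair under an assumed 'conservative' rule.

    traffic:    {switch_pair: estimated traffic from the pre-execution run}.
    peak_links: links per pair in the peak configuration (6 in the talk).
    The busiest per-link load of the peak configuration is used as a cap.
    """
    cap = max(traffic.values()) / peak_links
    if cap == 0:
        return {pair: 1 for pair in traffic}  # idle network: one link each
    return {
        pair: max(1, min(peak_links, math.ceil(t / cap)))
        for pair, t in traffic.items()
    }


estimated = {("s0", "s1"): 600, ("s1", "s2"): 300, ("s2", "s3"): 50}
print(conservative_link_counts(estimated))
```

In this example the busiest pair keeps all six links, while lightly loaded pairs drop to as few as one, and no link exceeds the peak configuration's per-link load.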
18. Results of NPB (64 procs, PC6248 switches)
26% of network power consumption is reduced without performance degradation
[Fig 1: Performance. Fig 2: Power consumption of networks, PC6248s. Series: peak (all links), conservative, aggressive; benchmarks: EP, IS, LU, SP; y-axis: relative power consumption, 0.6 to 1.1]
The conservative policy maintained almost the peak performance
19. Results of NPB (64 procs, other switches)
[Fig 3: Power consumption, SF-420s. Fig 4: Power consumption, PC5324. Series: peak (all links), conservative, aggressive; benchmarks: EP, IS, LU, SP; y-axis: relative power consumption, 0.6 to 1.1]
A smaller number of services is always running in the L2 switch (PC5324) than in the L3 switch (PC6248)
The L2 switches achieve a larger ratio of power reduction
20. Related Work
- On/Off interconnection networks
  - Cannot be directly applied to Ethernet
  - [M. Alonso, IPDPS05], [V. Soteriou, TPDS07]
  - Our on/off link method makes it possible to support some of them in Ethernet
- DVFS for interconnection networks
  - [L. Shang, HPCA03], [J. M. Stine, CAL04]
  - Using multi-speed Ethernet (10M/100M/GbE/10GE) is similar to the DVFS approach
  - Dell switch PC6248: 10M 1.1W, 100M 1.3W, GbE 2.1W
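As a rough worked example, the PC6248 figures can be used to compare lowering a port's speed with shutting it down. This assumes the quoted wattages are per port and that a shut-down port draws 0 W, as stated on the power consumption slide; the function name is hypothetical.

```python
# Power by link speed for the Dell PC6248, as quoted above (assumed per port).
PORT_POWER_W = {"10M": 1.1, "100M": 1.3, "GbE": 2.1}


def saving_w(n_ports, from_speed, to_speed=None):
    """Power saved (W) by slowing n_ports down, or shutting them down.

    to_speed=None models port shutdown, assumed to draw 0 W.
    """
    after = 0.0 if to_speed is None else PORT_POWER_W[to_speed]
    return n_ports * (PORT_POWER_W[from_speed] - after)


print(round(saving_w(4, "GbE"), 2))          # shut down 4 GbE ports
print(round(saving_w(4, "GbE", "10M"), 2))   # slow 4 GbE ports to 10M
```

Under these assumptions, shutting a port down saves roughly twice as much as dropping it to 10M, which is the case for on/off control over speed scaling.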
21. Conclusions
- We propose the on/off link method on Ethernet
  - Using the port-shutdown command to reduce power consumption
    - Switch ports consume up to 60% of the power consumption of a GbE switch
  - Stabilizing the update of the MAC address table
- Evaluations on the PC cluster with GbE switches
  - No overhead to update paths
  - Reducing network power consumption by up to 37%
- We will provide a total Ethernet solution for low-power PC clusters
  - Link aggregation, multi-path topology, and on/off links