Title: Integrating New Capabilities into NetPIPE
1 Integrating New Capabilities into NetPIPE
- Dave Turner, Adam Oline, Xuehua Chen, and Troy Benjegerdes
- Scalable Computing Laboratory of Ames Laboratory
- This work was funded by the MICS office of the US Department of Energy
2 ... with or without fence calls. Measure performance or do an integrity test.
http://www.scl.ameslab.gov/Projects/NetPIPE/
3 The NetPIPE utility
- NetPIPE does a series of ping-pong tests between two nodes (a minimal sketch of the pattern follows this list).
- Message sizes are chosen at regular intervals, and with slight perturbations, to fully test the communication system for idiosyncrasies.
- Latencies reported represent half the ping-pong time for messages smaller than 64 Bytes.
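The ping-pong pattern itself is simple; the sketch below shows the core timing loop in MPI terms (illustrative only, not NetPIPE source; the message size and repeat count are assumptions):

    /* Ping-pong timing sketch (illustrative only, not NetPIPE source).
     * Rank 0 sends nbytes to rank 1, which echoes it back; half the
     * average round-trip time approximates the one-way latency for
     * small messages. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, i, nrepeat = 1000;   /* repeat count: an assumption */
        int nbytes = 1024;             /* message size: an assumption */
        char *buf;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < nrepeat; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("%d Bytes: %0.2f us one-way\n", nbytes,
                   0.5 * (t1 - t0) / nrepeat * 1.0e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }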
Some typical uses
- Measuring the overhead of message-passing protocols.
- Helping to tune the optimization parameters of message-passing libraries.
- Optimizing driver and OS parameters (socket buffer sizes, etc.).
- Identifying dropouts in networking hardware and drivers.
What is not measured
- NetPIPE cannot yet measure the load on the CPU.
- The effects of the different methods for maintaining message progress.
- Scalability with system size.
4 Recent additions to NetPIPE
- Can do an integrity test instead of measuring performance.
- Streaming mode measures performance in one direction only.
  - Must reset the sockets to avoid effects from a collapsing window size.
- A bi-directional ping-pong mode has been added (-2).
- One-sided Get and Put calls can be measured (MPI or SHMEM).
  - Can choose whether to use an intervening MPI_Fence call to synchronize (a sketch follows at the end of this slide).
- Messages can be bounced between the same buffers (default mode), or they can be started from a different area of memory each time.
  - There are lots of cache effects in SMP message-passing.
  - InfiniBand can show similar effects since memory must be registered with the card.
[Figure: ping-pong between Process 0 and Process 1, with successive transfers shown starting from buffer regions 0, 1, 2, and 3.]
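For the one-sided measurements, the core of the MPI test looks roughly like the following (a sketch under MPI-2 semantics, not NetPIPE's actual module; the window size is an assumption):

    /* One-sided Put sketch (illustrative, not NetPIPE's actual module).
     * NetPIPE can bracket each Put with fence calls, or skip the
     * intervening fence to see whether the library (e.g. MP_Lite with
     * SIGIO, or polling-based ARMCI) completes the transfer on its own. */
    #include <mpi.h>
    #include <string.h>

    #define NBYTES 1024                  /* message size: an assumption */

    int main(int argc, char **argv)
    {
        int rank;
        char sbuf[NBYTES], rbuf[NBYTES];
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        memset(sbuf, rank, NBYTES);

        /* Expose rbuf on every process as the target window. */
        MPI_Win_create(rbuf, NBYTES, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);                       /* open the epoch */
        if (rank == 0)
            MPI_Put(sbuf, NBYTES, MPI_BYTE, 1, 0, NBYTES, MPI_BYTE, win);
        MPI_Win_fence(0, win);                       /* the intervening
                                                        synchronization that
                                                        NetPIPE can be told
                                                        to use or skip */
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }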
5 Current projects
- Overlapping pair-wise ping-pong tests (a sketch follows at the end of this slide).
  - Must consider synchronization if not using bi-directional communications.
[Diagram: four nodes (n0-n3) attached to an Ethernet switch, comparing line-speed-limited and end-point-limited communication patterns.]
- Investigate other methods for testing the global network.
- Evaluate the full range from simultaneous nearest-neighbor communications to all-to-all.
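One simple way to stage the overlapping pair-wise test (a sketch only; the pairing rule, message size, and repeat count are assumptions, and NetPIPE's eventual scheme may differ):

    /* Pair-wise ping-pong sketch: every even rank pairs with the next
     * odd rank, and all pairs exchange messages at the same time so the
     * switch backplane is loaded.  Illustrative only. */
    #include <mpi.h>
    #include <stdio.h>

    #define NBYTES 65536                /* message size: an assumption */

    int main(int argc, char **argv)
    {
        int rank, nprocs, partner, i, nrepeat = 100;
        static char sbuf[NBYTES], rbuf[NBYTES];
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        if (nprocs % 2) {               /* needs an even process count */
            MPI_Finalize();
            return 1;
        }
        partner = rank ^ 1;             /* 0<->1, 2<->3, ... */

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < nrepeat; i++) {
            /* MPI_Sendrecv exchanges in both directions at once, which
             * sidesteps the synchronization concern noted above. */
            MPI_Sendrecv(sbuf, NBYTES, MPI_BYTE, partner, 0,
                         rbuf, NBYTES, MPI_BYTE, partner, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("%d exchanges of %d Bytes took %f sec per pair\n",
                   nrepeat, NBYTES, t1 - t0);

        MPI_Finalize();
        return 0;
    }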
6 Performance on Mellanox InfiniBand cards
A new NetPIPE module allows us to measure the raw performance across InfiniBand hardware (RDMA and Send/Recv). Burst mode preposts all receives to duplicate the Mellanox test (the idea is sketched below). The no-cache performance is much lower when the memory has to be registered with the card. An MP_Lite InfiniBand module will be incorporated into LAM/MPI.
[Plot includes MVAPICH 0.9.1 for comparison.]
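Expressed as an MPI analogy rather than the actual VAPI code, the burst idea is simply to post every receive before the sender starts (a sketch; the burst length and message size are assumptions):

    /* Burst-mode idea sketched with MPI (the real module preposts
     * Mellanox VAPI receive descriptors, not MPI receives): every
     * receive is posted before the first send, so no transfer ever
     * waits for a receive to be posted. */
    #include <mpi.h>
    #include <stdlib.h>

    #define NBURST 100                  /* burst length: an assumption */
    #define NBYTES 4096                 /* message size: an assumption */

    int main(int argc, char **argv)
    {
        int rank, i;
        char *bufs = malloc((size_t)NBURST * NBYTES);
        MPI_Request req[NBURST];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1)                  /* prepost every receive up front */
            for (i = 0; i < NBURST; i++)
                MPI_Irecv(bufs + (size_t)i * NBYTES, NBYTES, MPI_BYTE,
                          0, i, MPI_COMM_WORLD, &req[i]);

        MPI_Barrier(MPI_COMM_WORLD);    /* receives are now in place */

        if (rank == 0)
            for (i = 0; i < NBURST; i++)
                MPI_Send(bufs + (size_t)i * NBYTES, NBYTES, MPI_BYTE,
                         1, i, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Waitall(NBURST, req, MPI_STATUSES_IGNORE);

        free(bufs);
        MPI_Finalize();
        return 0;
    }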
7 10 Gigabit Ethernet
- Intel 10 Gigabit Ethernet cards
- 133 MHz PCI-X bus
- Single-mode fiber
- Intel ixgb driver
- Can only achieve 2 Gbps now, with a latency of 75 us.
- Streaming mode delivers up to 3 Gbps.
- Much more development work is needed.
8 Channel-bonding Gigabit Ethernet for better communications between nodes
Channel-bonding uses 2 or more Gigabit Ethernet cards per PC to increase the communication rate between nodes in a cluster. GigE cards cost $40 each and 24-port switches cost $1400, or roughly $100 per computer. This is much more cost effective for PC clusters than using more expensive networking hardware, and may deliver similar performance.
9 Performance for channel-bonded Gigabit Ethernet
GigE can deliver 900 Mbps with latencies of 25-62 us for PCs with 64-bit / 66 MHz PCI slots. Channel-bonding 2 GigE cards per PC using MP_Lite doubles the performance for large messages; adding a 3rd card does not help much. Channel-bonding 2 GigE cards per PC using Linux kernel-level bonding actually results in poorer performance. The same tricks that make channel-bonding successful in MP_Lite should make Linux kernel bonding work even better. Any message-passing system could then make use of channel-bonding on Linux systems.
[Figure: Channel-bonding multiple GigE cards using MP_Lite and Linux kernel bonding.]
10 Channel-bonding in MP_Lite
[Diagram: data path on node 0. The application hands messages to MP_Lite in user space, which drives two independent streams (a and b); each stream has its own large socket buffer, TCP/IP stack, dev_q_xmit call, device queue, device driver, and GigE card fed by DMA.]
Flow control may stop a given stream at several
places. With MP_Lite channel-bonding, each
stream is independent of the others.
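The striping itself can be pictured with a sketch like the one below (illustrative only, not MP_Lite source; the helper name send_striped() and the buffer sizes are assumptions):

    /* Striping sketch (illustrative only, not MP_Lite source): half of
     * the message goes out each of two sockets, one per GigE card, so
     * flow control on one stream cannot stall the other. */
    #include <sys/socket.h>
    #include <sys/types.h>

    /* send_striped() is a hypothetical helper; sock_a and sock_b are
     * assumed to be TCP sockets already connected over the two cards. */
    static int send_striped(int sock_a, int sock_b, const char *buf, size_t n)
    {
        int bufsize = 4 * 1024 * 1024;  /* large socket buffers */
        size_t half = n / 2;

        setsockopt(sock_a, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
        setsockopt(sock_b, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));

        /* A real implementation would loop on partial sends and use
         * non-blocking I/O so the two streams truly overlap. */
        if (send(sock_a, buf, half, 0) < 0)
            return -1;
        if (send(sock_b, buf + half, n - half, 0) < 0)
            return -1;
        return 0;
    }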
11 Linux kernel channel-bonding
[Diagram: data path on node 0 with Linux kernel bonding. The application feeds a single large socket buffer and TCP/IP stack; bonding.c then passes packets through dev_q_xmit (dqx) to the two device queues, device drivers, and GigE cards fed by DMA.]
A full device queue will stop the flow at
bonding.c to both device queues. Flow control on
the destination node may stop the flow out of the
socket buffer. In both of these cases, problems
with one stream can affect both streams.
12 Comparison of high-speed interconnects
- InfiniBand can deliver 4500-6500 Mbps at a 7.5 us latency.
- Atoll delivers 1890 Mbps with a 4.7 us latency.
- SCI delivers 1840 Mbps with only a 4.2 us latency.
- Myrinet performance reaches 1820 Mbps with an 8 us latency.
- Channel-bonded GigE offers 1800 Mbps for very large messages.
- Gigabit Ethernet delivers 900 Mbps with a 25-62 us latency.
- 10 GigE only delivers 2 Gbps with a 75 us latency.
13 Conclusions
- NetPIPE provides a consistent set of analytical tools in the same flexible framework to many message-passing and native communication layers.
- New modules have been developed.
  - 1-sided MPI and SHMEM
  - GM, InfiniBand using the Mellanox VAPI, ARMCI, LAPI
  - Internal tests like memcpy
- New modes have been incorporated into NetPIPE.
  - Streaming and bi-directional modes.
  - Testing without cache effects.
  - The ability to test integrity instead of performance.
14 Current projects
- Developing new modules.
  - ATOLL
  - IBM Blue Gene/L
  - I/O performance
- Need to be able to measure CPU load during communications.
- Expanding NetPIPE to do multiple pair-wise communications.
  - Can measure the backplane performance on switches.
  - Compare the line speed to end-point limited performance.
- Working toward measuring more of the global properties of a network.
  - The network topology will need to be considered.
15 Contact information
- Dave Turner - turner@ameslab.gov
- http://www.scl.ameslab.gov/Projects/MP_Lite/
- http://www.scl.ameslab.gov/Projects/NetPIPE/
16 One-sided Puts between two Linux PCs
- MP_Lite is SIGIO based, so MPI_Put() and MPI_Get() finish without a fence.
- LAM/MPI has no message progress, so a fence is required.
- ARMCI uses a polling method, and therefore does not require a fence.
- An MPI-2 implementation of MPICH is under development.
- An MPI-2 implementation of MPI/Pro is under development.
[Test setup: Netgear GA620 fiber GigE cards, 32/64-bit 33/66 MHz PCI, AceNIC driver.]
17 The MP_Lite message-passing library
- A light-weight MPI implementation
- Highly efficient for the architectures supported
- Designed to be very user-friendly
- Ideal for performing message-passing research
- http://www.scl.ameslab.gov/Projects/MP_Lite/
18 A NetPIPE example: Performance on a Cray T3E
- Raw SHMEM (sketched below) delivers
  - 2600 Mbps
  - 2-3 us latency
- Cray MPI originally delivered
  - 1300 Mbps
  - 20 us latency
- MP_Lite delivers
  - 2600 Mbps
  - 9-10 us latency
- The new Cray MPI delivers
  - 2400 Mbps
  - 20 us latency
The tops of the spikes are where the message size is divisible by 8 Bytes.
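For reference, a raw SHMEM transfer looks roughly like this (a sketch only, not NetPIPE's SHMEM module; the message size is an assumption, and newer systems use <shmem.h> rather than the T3E's <mpp/shmem.h>):

    /* Raw SHMEM sketch (illustrative only, not NetPIPE's SHMEM module).
     * Buffers must live in symmetric memory (here, a static array) so
     * that shmem_putmem() can target the same address on the remote PE. */
    #include <stdio.h>
    #include <string.h>
    #include <mpp/shmem.h>      /* Cray T3E header; shmem.h elsewhere */

    #define NBYTES 1024         /* message size: an assumption */

    static char buf[NBYTES];    /* symmetric across all PEs */

    int main(void)
    {
        int me;

        start_pes(0);           /* initialize SHMEM */
        me = shmem_my_pe();

        if (me == 0) {
            memset(buf, 1, NBYTES);
            shmem_putmem(buf, buf, NBYTES, 1);  /* put into PE 1's buf */
            shmem_quiet();                      /* wait for completion */
        }
        shmem_barrier_all();                    /* PE 1 can now read buf */

        if (me == 1)
            printf("PE 1 received %d Bytes, first byte = %d\n",
                   NBYTES, buf[0]);

        return 0;
    }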