Integrating New Capabilities into NetPIPE (presentation transcript)

1
Integrating New Capabilities into NetPIPE
  • Dave Turner, Adam Oline, Xuehua Chen,
    and Troy Benjegerdes
  • Scalable Computing Laboratory of Ames Laboratory
  • This work was funded by the MICS office of the US
    Department of Energy

2

[Slide graphic; captions: "... with or without fence calls." and "Measure performance or do an integrity test."]
http://www.scl.ameslab.gov/Projects/NetPIPE/
3
The NetPIPE utility
  • NetPIPE does a series of ping-pong tests
    between two nodes.
  • Message sizes are chosen at regular intervals,
    and with slight perturbations, to fully test the
    communication system for idiosyncrasies.
  • Latencies reported represent half the ping-pong
    time for messages smaller than 64 Bytes.
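
A minimal sketch of the ping-pong timing at the heart of such a test, written here with plain MPI send/receive calls (the buffer handling and the message-size sweep are simplified; this is not NetPIPE's actual source):

    /* Minimal MPI ping-pong sketch: ranks 0 and 1 bounce an nbytes
     * message back and forth; half the average round-trip time is
     * the reported latency, and nbytes divided by that half-time
     * gives the throughput.  The real benchmark sweeps nbytes over
     * many sizes with small perturbations around each one. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, reps = 1000, nbytes = 64;
        double t0, half_rtt;
        char *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = malloc(nbytes);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        half_rtt = 0.5 * (MPI_Wtime() - t0) / reps;

        if (rank == 0)
            printf("%d bytes: %.2f us latency, %.1f Mbps\n",
                   nbytes, half_rtt * 1e6, 8.0 * nbytes / half_rtt / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }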

Some typical uses
  • Measuring the overhead of message-passing
    protocols.
  • Helping tune the optimization parameters of
    message-passing libraries.
  • Optimizing driver and OS parameters (socket
    buffer sizes, etc.), as in the sketch after this
    list.
  • Identifying dropouts in networking hardware and
    drivers.
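
As one concrete example of that socket-buffer tuning, a hedged sketch of enlarging a TCP socket's buffers with setsockopt (the 4 MB value and the helper name are illustrative, not NetPIPE defaults):

    /* Sketch: enlarge a TCP socket's send and receive buffers so a
     * single stream can keep a fast link full.  The right size depends
     * on the bandwidth-delay product and on kernel limits such as
     * net.core.wmem_max / net.core.rmem_max. */
    #include <sys/socket.h>
    #include <stdio.h>

    int make_tuned_socket(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        int bufsize = 4 * 1024 * 1024;      /* illustrative 4 MB buffers */

        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
            perror("SO_SNDBUF");
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
            perror("SO_RCVBUF");
        return fd;                          /* then connect() or bind() as usual */
    }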

What is not measured
  • NetPIPE cannot measure the load on the CPU yet.
  • The effects from the different methods for
    maintaining message progress.
  • Scalability with system size.

4
Recent additions to NetPIPE
  • Can do an integrity test instead of measuring
    performance.
  • Streaming mode measures performance in 1
    direction only.
  • Must reset sockets to avoid effects from a
    collapsing window size.
  • A bi-directional ping-pong mode has been added
    (-2).
  • One-sided Get and Put calls can be measured
    (MPI or SHMEM).
  • Can choose whether to use an intervening
    MPI_Win_fence call to synchronize.
  • Messages can be bounced between the same
    buffers (default mode), or they can be started
    from a different area of memory each time (see
    the sketch below).
  • There are lots of cache effects in SMP
    message-passing.
  • InfiniBand can show similar effects since
    memory must be registered with the card.

[Slide diagram: Process 0 and Process 1 cycling messages through buffers 0, 1, 2, 3.]
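
A hedged sketch of that no-cache idea (pool size, names, and wrap-around policy are made up for illustration, not NetPIPE's actual code): each transfer starts from a fresh region of a pool larger than any cache, so timings are not flattered by cache-warm buffers.

    /* Sketch: defeat cache reuse by walking the working buffer through
     * a memory pool much larger than the caches.  Default mode reuses
     * one buffer; no-cache mode takes next_buffer() every repetition. */
    #include <stdlib.h>

    #define POOL_BYTES (64 * 1024 * 1024)   /* assumed to exceed all cache levels */

    static char *pool;
    static size_t offset;

    void init_pool(void)  { pool = malloc(POOL_BYTES); offset = 0; }

    char *next_buffer(size_t nbytes)
    {
        char *buf;
        if (offset + nbytes > POOL_BYTES)
            offset = 0;                     /* wrap to the start of the pool */
        buf = pool + offset;
        offset += nbytes;
        return buf;
    }

For InfiniBand, cycling the buffers also changes which memory must be registered with the card, which is why the no-cache curves sit lower.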
5
Current projects
  • Overlapping pair-wise ping-pong tests.
  • Must consider synchronization if not using
    bi-directional communications.

[Slide diagrams: nodes n0-n3 connected through an Ethernet switch, contrasting line-speed-limited and end-point-limited configurations.]
  • Investigate other methods for testing the
    global network.
  • Evaluate the full range from simultaneous
    nearest neighbor communications to all-to-all.

6
Performance on Mellanox InfiniBand cards
A new NetPIPE module allows us to measure the raw
performance across InfiniBand hardware (RDMA and
Send/Recv). Burst mode preposts all receives to
duplicate the Mellanox test. The no-cache
performance is much lower when the memory has to
be registered with the card. An MP_Lite
InfiniBand module will be incorporated into
LAM/MPI.
[Plot label: MVAPICH 0.9.1]
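
The burst-mode idea, sketched here with generic nonblocking MPI calls rather than the Mellanox VAPI receive descriptors the module actually posts (function and parameter names are illustrative):

    /* Sketch: prepost every receive before the sender starts, so no
     * message arrives "unexpected" and has to be buffered or matched
     * late.  This mimics the vendor test that burst mode duplicates. */
    #include <mpi.h>
    #include <stddef.h>

    void burst_receive(char *bufs, int nbytes, int burst, int src)
    {
        MPI_Request req[burst];

        for (int i = 0; i < burst; i++)     /* post all receives up front */
            MPI_Irecv(bufs + (size_t)i * nbytes, nbytes, MPI_BYTE,
                      src, i, MPI_COMM_WORLD, &req[i]);

        /* a small handshake would release the sender here */

        MPI_Waitall(burst, req, MPI_STATUSES_IGNORE);
    }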
7
10 Gigabit Ethernet
  • Intel 10 Gigabit Ethernet cards
  • 133 MHz PCI-X bus
  • Single mode fiber
  • Intel ixgb driver
  • Can only achieve 2 Gbps now. Latency is 75 us.
  • Streaming mode delivers up to 3 Gbps.
  • Much more development work is needed.
8
Channel-bonding Gigabit Ethernet for
better communications between nodes
Channel-bonding uses 2 or more Gigabit Ethernet
cards per PC to increase the communication rate
between nodes in a cluster. GigE cards cost $40
each and 24-port switches cost $1400, or roughly
$100 per computer. This is much more cost
effective for PC clusters than using more
expensive networking hardware, and may deliver
similar performance.
9
Performance for channel-bonded Gigabit Ethernet
GigE can deliver 900 Mbps with latencies of 25-62
us for PCs with 64-bit / 66 MHz PCI
slots. Channel-bonding 2 GigE cards / PC using
MP_Lite doubles the performance for large
messages. Adding a 3rd card does not help
much. Channel-bonding 2 GigE cards / PC using
Linux kernel level bonding actually results in
poorer performance. The same tricks that make
channel-bonding successful in MP_Lite should make
Linux kernel bonding work even better. Any
message-passing system could then make use of
channel-bonding on Linux systems.
[Plot: Channel-bonding multiple GigE cards using MP_Lite and Linux kernel bonding.]
10
Channel-bonding in MP_Lite
[Slide diagram: MP_Lite channel-bonding data path on node 0. The application hands each message to MP_Lite in user space, which writes two independent streams into large socket buffers; each stream passes through its own TCP/IP stack, dev_q_xmit call, device queue, and DMA engine to a separate GigE card.]
Flow control may stop a given stream at several
places. With MP_Lite channel-bonding, each
stream is independent of the others.
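
A hedged sketch of that striping idea in user space (the block size, the blocking send() calls, and the missing error handling are simplifications; MP_Lite actually drives its sockets with SIGIO so both streams progress concurrently):

    /* Sketch: stripe one message across two sockets, one per GigE card.
     * Each socket has its own large buffer, TCP/IP stack, device queue,
     * and DMA engine, so back-pressure on one stream cannot stall the
     * other at the sender. */
    #include <sys/socket.h>
    #include <stddef.h>

    #define STRIPE 65536                           /* illustrative block size */

    void bonded_send(int fd[2], const char *buf, size_t nbytes)
    {
        size_t off = 0;
        int card = 0;

        while (off < nbytes) {
            size_t chunk = nbytes - off < STRIPE ? nbytes - off : STRIPE;
            send(fd[card], buf + off, chunk, 0);   /* returns unchecked in this sketch */
            off += chunk;
            card ^= 1;                             /* alternate between the two cards */
        }
    }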
11
Linux kernel channel-bonding
[Slide diagram: Linux kernel channel-bonding data path on node 0. The application writes into a single large socket buffer; one TCP/IP stack feeds bonding.c, which distributes packets through dev_q_xmit (dqx) to the device queues and DMA engines of both GigE cards.]
A full device queue will stop the flow at
bonding.c to both device queues. Flow control on
the destination node may stop the flow out of the
socket buffer. In both of these cases, problems
with one stream can affect both streams.
12
Comparison of high-speed interconnects
  • InfiniBand can deliver 4500-6500 Mbps at a
    7.5 us latency.
  • Atoll delivers 1890 Mbps with a 4.7 us latency.
  • SCI delivers 1840 Mbps with only a 4.2 us
    latency.
  • Myrinet performance reaches 1820 Mbps with an
    8 us latency.
  • Channel-bonded GigE offers 1800 Mbps for very
    large messages.
  • Gigabit Ethernet delivers 900 Mbps with a
    25-62 us latency.
  • 10 GigE only delivers 2 Gbps with a 75 us
    latency.
13
Conclusions
  • NetPIPE provides a consistent set of analytical
    tools in the same flexible framework to many
    message-passing and native communication layers.
  • New modules have been developed.
  • 1-sided MPI and SHMEM
  • GM, InfiniBand using the Mellanox VAPI, ARMCI,
    LAPI
  • Internal tests like memcpy
  • New modes have been incorporated into NetPIPE.
  • Streaming and bi-directional modes.
  • Testing without cache effects.
  • The ability to test integrity instead of
    performance.

14
Current projects
  • Developing new modules.
  • ATOLL
  • IBM Blue Gene/L
  • I/O performance
  • Need to be able to measure CPU load during
    communications.
  • Expanding NetPIPE to do multiple pair-wise
    communications.
  • Can measure the backplane performance on
    switches.
  • Compare the line speed to end-point limited
    performance.
  • Working toward measuring more of the global
    properties of a network.
  • The network topology will need to be considered.

15
Contact information
  • Dave Turner - turner@ameslab.gov
  • http://www.scl.ameslab.gov/Projects/MP_Lite/
  • http://www.scl.ameslab.gov/Projects/NetPIPE/

16
One-sided Puts between two Linux PCs
  • MP_Lite is SIGIO based, so MPI_Put() and
    MPI_Get() finish without a fence.
  • LAM/MPI has no message progress, so a fence is
    required.
  • ARMCI uses a polling method, and therefore does
    not require a fence.
  • An MPI-2 implementation of MPICH is under
    development.
  • An MPI-2 implementation of MPI/Pro is under
    development.

[Plot hardware: Netgear GA620 fiber GigE, 32/64-bit 33/66 MHz PCI, AceNIC driver.]
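
The measured pattern, sketched with standard MPI-2 one-sided calls (the window setup is repeated per call only to keep the sketch self-contained, and both ranks must call it collectively): whether the closing MPI_Win_fence is needed before the data actually arrives is exactly what separates the libraries above.

    /* Sketch: a one-sided Put into a peer's window.  Strictly, MPI only
     * guarantees completion at the closing fence; a library with
     * independent message progress (MP_Lite's SIGIO approach) delivers
     * the data before the fence, while LAM/MPI needs the fence to make
     * progress.  NetPIPE can time the Put with or without it. */
    #include <mpi.h>

    void put_test(char *src, char *win_mem, int nbytes,
                  int peer, MPI_Comm comm)
    {
        MPI_Win win;

        MPI_Win_create(win_mem, (MPI_Aint)nbytes, 1, MPI_INFO_NULL,
                       comm, &win);

        MPI_Win_fence(0, win);              /* open the access/exposure epoch */
        MPI_Put(src, nbytes, MPI_BYTE,
                peer, 0, nbytes, MPI_BYTE, win);
        MPI_Win_fence(0, win);              /* the optional synchronizing fence */

        MPI_Win_free(&win);
    }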
17
The MP_Lite message-passing library
  • A light-weight MPI implementation
  • Highly efficient for the architectures supported
  • Designed to be very user-friendly
  • Ideal for performing message-passing research
  • http://www.scl.ameslab.gov/Projects/MP_Lite/

18
A NetPIPE example Performance on a Cray T3E
  • Raw SHMEM delivers
  • 2600 Mbps
  • 2-3 us latency
  • Cray MPI originally delivered
  • 1300 Mbps
  • 20 us latency
  • MP_Lite delivers
  • 2600 Mbps
  • 9-10 us latency
  • New Cray MPI delivers
  • 2400 Mbps
  • 20 us latency

The tops of the spikes are where the message size
is divisible by 8 Bytes.