MPI point-to-point protocols and our improvements

1
MPI point-to-point protocols and our improvements
2
MPI point-to-point communication
  • MPI supports many modes of point-to-point
    communication: blocking, non-blocking, buffered,
    immediate, etc.
  • The sender specifies the memory to be sent.
  • The receiver specifies the memory in which to
    store the message.
  • The goal is to move data from user space on the
    sender to user space on the receiver.

3
MPI point-to-point communication
MPI_Send(send_buf, ...)
MPI_Recv(recv_buf, ...)
  • Reliability and performance
  • Requires 100% reliability; cannot drop messages.
  • The sender and receiver may arrive at the
    operation at different times.

4
Current protocol for small messages
  • Eager protocol
  • Sender copies the message to a system buffer,
    issues the send command (send to a designated
    system buffer on the receiver), and completes
    (send_buf can be reused).
  • Receiver copies from the designated system
    buffer when the message arrives.
  • Ideal for small messages (copy overhead
    negligible, memory overhead not excessive).
  • Drawbacks:
  • Copy overhead.
  • Per-process buffer requirement (O(P) memory).
  • Cannot apply to large messages.

Eager protocol
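The eager protocol's two copies can be sketched in a single process. This is an illustrative model, not the library's code: the buffer name, size, and function names are hypothetical, and a plain static buffer stands in for the designated per-peer system buffer.

```c
#include <assert.h>
#include <string.h>

#define EAGER_LIMIT 1024   /* per-peer system buffer size (assumed) */

/* Designated system buffer on the receiver for this sender. */
static char sys_buf[EAGER_LIMIT];
static size_t sys_len = 0;

/* Sender side: copy into the system buffer and return immediately;
   the caller may reuse send_buf as soon as this returns. */
int eager_send(const void *send_buf, size_t len) {
    if (len > EAGER_LIMIT) return -1;   /* too large for eager */
    memcpy(sys_buf, send_buf, len);     /* copy #1 */
    sys_len = len;
    return 0;
}

/* Receiver side: copy out of the system buffer on arrival. */
size_t eager_recv(void *recv_buf, size_t max) {
    size_t n = sys_len < max ? sys_len : max;
    memcpy(recv_buf, sys_buf, n);       /* copy #2 */
    return n;
}
```

The early return in `eager_send` is the point of the protocol: the sender never waits for the receiver, at the price of the two copies and the O(P) buffer memory listed above.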
5
Current protocol for large messages
  • Rendezvous protocol
  • Sender and receiver hand-shake before data is
    transferred.
  • Drawbacks:
  • Unnecessary synchronization.
  • Communication progress issue.
  • A receiver arriving early does not help.

Rendezvous protocol
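The hand-shake constraint can be modeled in a few lines: data moves only after both sides have arrived, so whichever side comes first must wait. A minimal single-process sketch, with hypothetical names (the real protocol exchanges Sender_Ready/Receiver_Ready messages rather than sharing a struct):

```c
#include <assert.h>
#include <string.h>

/* One in-flight rendezvous transfer. */
struct rndv {
    const void *src; size_t len;   /* set when the sender arrives   */
    void *dst; size_t cap;         /* set when the receiver arrives */
    int sender_ready, receiver_ready, done;
};

/* The transfer proceeds only once BOTH sides are present. */
static void try_transfer(struct rndv *r) {
    if (r->sender_ready && r->receiver_ready && !r->done) {
        size_t n = r->len < r->cap ? r->len : r->cap;
        memcpy(r->dst, r->src, n);
        r->done = 1;
    }
}

void rndv_sender_arrives(struct rndv *r, const void *buf, size_t len) {
    r->src = buf; r->len = len; r->sender_ready = 1;
    try_transfer(r);
}

void rndv_receiver_arrives(struct rndv *r, void *buf, size_t cap) {
    r->dst = buf; r->cap = cap; r->receiver_ready = 1;
    try_transfer(r);
}
```

Note that an early arrival accomplishes nothing by itself; this is exactly the synchronization and progress problem the slides criticize.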
6
  • Eager and rendezvous protocols have been around
    for 20 years.
  • They were designed when communication and
    message processing were expensive:
  • Minimize the number of messages needed.
  • They are not optimized for many situations.
  • On newer systems, communication and message
    processing are not as expensive:
  • Can use more complex protocols to improve
    performance.
  • Our recent work (with Matthew Small): improve
    MPI point-to-point communication with new
    protocols on RDMA-enabled systems.

7
  • RDMA: Remote Direct Memory Access
  • RDMA devices allow direct access to memory on
    other nodes (machines) without remote/local
    CPU involvement.
  • Supported by almost all contemporary networking
    systems (InfiniBand, Myrinet, Ethernet (iWARP)).

(Figure) RDMA_Put: moves data from the local user
space to the remote user space. RDMA_Get: moves
data from the remote user space to the local
user space.
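The semantics of the two RDMA operations can be stated as a toy model. In a real RDMA stack (e.g. InfiniBand verbs) the buffers would be pinned, registered, and addressed through remote keys; here a plain pointer stands in for a registered remote region, and the function names are illustrative.

```c
#include <assert.h>
#include <string.h>

/* RDMA_Put: local user space -> remote user space.
   No code runs on the remote CPU. */
void rdma_put(void *remote, const void *local, size_t len) {
    memcpy(remote, local, len);
}

/* RDMA_Get: remote user space -> local user space.
   Again, the remote CPU is not involved. */
void rdma_get(void *local, const void *remote, size_t len) {
    memcpy(local, remote, len);
}
```

The key property the later protocols exploit is that either side can move the data alone, once it knows where the other side's buffer is.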
8
  • More detailed problems with the current
    protocols
  • Eager protocol: near optimal, keep it.
  • Rendezvous protocol:
  • Unnecessary synchronization:
  • The sender may wait for the receiver.
  • Communication progress issue:
  • The MPI call of an early-arriving receiver is
    wasted.

(Timeline figure: the sender posts MPI_Isend,
sends Sender_Ready, and blocks in MPI_Wait until
the receiver posts MPI_Irecv and replies
Receiver_Ready; both MPI_Wait calls return only
after the exchange completes.)
9
  • More detailed issues with the current protocols
  • Rendezvous protocol:
  • Communication progress issue:
  • The MPI call of an early-arriving receiver is
    wasted.

(Timeline figure: the receiver's early MPI_Irecv
makes no progress; Receiver_Ready is only sent
after the sender's MPI_Isend and Sender_Ready
arrive, so the receiver sits idle until both
MPI_Wait calls return.)
10
  • More detailed issues with the current protocols
  • Rendezvous protocol:
  • Communication progress issue:
  • The MPI call of an early-arriving receiver is
    wasted.
  • Rendezvous can perform quite badly depending on
    how users write the program.

(Timeline figure: Sender_Ready arrives while the
receiver is between MPI_Irecv and MPI_Wait;
Receiver_Ready is only generated once the receiver
enters MPI_Wait, so the sender idles and both
MPI_Wait calls return late.)
11
Our idea 1: use a hybrid protocol for
medium-sized messages
Hybrid protocol: make one copy at the sender
side, and use RDMA read to load the data. Why?
No more unnecessary synchronization between the
sender and the receiver.
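A minimal single-process sketch of the hybrid idea, assuming a registered staging buffer on the sender and modeling the receiver's RDMA read as a plain copy; names and the buffer size are hypothetical:

```c
#include <assert.h>
#include <string.h>

#define HYBRID_MAX (40 * 1024)  /* hybrid threshold (from slide 15) */

/* Registered, RDMA-readable staging buffer on the sender side. */
static char staging[HYBRID_MAX];
static size_t staged_len = 0;

/* Sender: ONE local copy into the registered buffer, then done.
   No handshake with the receiver is needed. */
int hybrid_send(const void *send_buf, size_t len) {
    if (len > HYBRID_MAX) return -1;
    memcpy(staging, send_buf, len);
    staged_len = len;
    return 0;   /* send_buf may be reused immediately */
}

/* Receiver: RDMA-read the staged data directly (modeled as memcpy),
   without involving the sender's CPU. */
size_t hybrid_recv(void *recv_buf, size_t max) {
    size_t n = staged_len < max ? staged_len : max;
    memcpy(recv_buf, staging, n);
    return n;
}
```

Compared to eager, the receiver-side copy is replaced by an RDMA read; compared to rendezvous, the sender completes without waiting for the receiver.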
12
Our idea 2: whoever arrives early starts the
communication
Sender-initiated protocol when the sender arrives
early; receiver-initiated protocol when the
receiver arrives early. The receiver-initiated
protocol is much cleaner than the sender-initiated
protocol.
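Why the receiver-initiated case is cleaner can be sketched in one process: the early receiver advertises its buffer, and the late sender can push the data immediately with no further handshake. The function and variable names are hypothetical, and the RDMA write is modeled as a copy:

```c
#include <assert.h>
#include <string.h>

static void *advertised_buf = 0;   /* carried by Receiver_Ready */
static size_t advertised_cap = 0;

/* Early receiver: publish the destination, then go back to
   computing - no waiting. */
void ri_receiver_arrives(void *recv_buf, size_t cap) {
    advertised_buf = recv_buf;
    advertised_cap = cap;
}

/* Late sender: the destination is already known, so the data can
   be RDMA-written at once (modeled as memcpy). */
int ri_sender_arrives(const void *send_buf, size_t len) {
    if (!advertised_buf || len > advertised_cap) return -1;
    memcpy(advertised_buf, send_buf, len);
    return 0;
}
```

Only one control message (the buffer advertisement) sits ahead of the data, and neither side blocks waiting for the other.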
13
What about when both arrive at the same time?
Receiver-initiated protocol: one extra, useless
SENDER_READY message is a small price to pay.
Compared to the original sender-initiated
protocol, SENDER_READY is taken out of the
critical path of the communication.
14
The integrated protocols: the protocol is
selected based on message size and arrival time.
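The selection logic can be sketched as a small dispatch function, using the thresholds quoted on the results slide. The exact decision rule in the prototype may differ; this is an illustrative reading of "selected based on msg size and arrival time":

```c
#include <stddef.h>

/* Thresholds quoted on the results slide (bytes). */
#define EAGER_THRESHOLD  (12 * 1024)
#define HYBRID_THRESHOLD (40 * 1024)

enum proto { EAGER, HYBRID, RNDV_SENDER_INIT, RNDV_RECEIVER_INIT };

/* Small messages go eager, medium messages go hybrid, and large
   messages use the rendezvous variant initiated by whichever side
   arrived first. */
enum proto select_protocol(size_t msg_size, int receiver_arrived_first) {
    if (msg_size <= EAGER_THRESHOLD)  return EAGER;
    if (msg_size <= HYBRID_THRESHOLD) return HYBRID;
    return receiver_arrived_first ? RNDV_RECEIVER_INIT
                                  : RNDV_SENDER_INIT;
}
```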
15
Some performance results
  • Our prototype library is built on top of the
    InfiniBand Verbs API.
  • Supports commonly used MPI point-to-point
    routines.
  • Experiments were done on draco.cs.fsu.edu:
  • Dell PowerEdge 1950, dual 2.33 GHz quad-core
    Xeon E5345, 8 GB memory
  • InfiniBand DDR (20 Gbps)
  • MVAPICH2 1.2rc1
  • EAGER_THRESHOLD = 12KB, HYBRID_THRESHOLD = 40KB

16
Ping-pong performance
17
Progress benchmark
18
Applications
19
Conclusion
  • By using customized rendezvous protocols for
    different situations, our combined protocols:
  • Reduce unnecessary synchronization
  • Decrease the number of control messages in the
    critical path of communication
  • Achieve better communication-computation
    overlap
  • The result is, in general, a more efficient
    point-to-point communication system on
    RDMA-enabled clusters.