1
SCTP-based Middleware for MPI
  • Humaira Kamal, Brad Penoff, Alan Wagner
  • Department of Computer Science
  • University of British Columbia

2
What is MPI and SCTP?
  • Message Passing Interface (MPI)
  • Library that is widely used to parallelize
    scientific and compute-intensive programs
  • Stream Control Transmission Protocol (SCTP)
  • General-purpose unicast transport protocol for IP
    network data communications
  • Recently standardized by the IETF (RFC 2960)
  • Can be used wherever TCP is used

3
What is MPI and SCTP?
  • Question
  • Can we take advantage of SCTP features to better
    support parallel applications using MPI?

4
Communicating MPI Processes
TCP is often used as the transport protocol for MPI
[Figure: MPI processes communicating over SCTP]

5
SCTP Key Features
  • Reliable in-order delivery, flow control, full
    duplex transfer.
  • Selective acknowledgement (SACK) is built into
    the protocol
  • TCP-like congestion control
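As a rough illustration of what a built-in SACK carries, the sketch below computes the two main fields of a SACK chunk from the set of transmission sequence numbers (TSNs) a receiver holds: the cumulative TSN ack and the gap ack blocks, whose start and end are offsets from the cumulative ack. The function name build_sack is hypothetical, and TSN wrap-around and duplicate reporting are deliberately ignored.

```python
from typing import Iterable, List, Tuple


def build_sack(received_tsns: Iterable[int],
               initial_tsn: int) -> Tuple[int, List[Tuple[int, int]]]:
    """Return (cumulative TSN ack, gap ack blocks) for the TSNs received so far.

    Gap block boundaries are offsets relative to the cumulative TSN ack.
    TSN wrap-around is ignored in this sketch.
    """
    got = set(received_tsns)
    # Cumulative ack: highest TSN such that every TSN up to it has arrived.
    cum_ack = initial_tsn - 1
    while cum_ack + 1 in got:
        cum_ack += 1
    # Group the TSNs received beyond the first hole into contiguous blocks.
    blocks: List[Tuple[int, int]] = []
    for tsn in sorted(t for t in got if t > cum_ack):
        offset = tsn - cum_ack
        if blocks and blocks[-1][1] == offset - 1:
            blocks[-1] = (blocks[-1][0], offset)  # extend the current block
        else:
            blocks.append((offset, offset))       # start a new block
    return cum_ack, blocks
```

For example, holding TSNs 1, 2, 3, 5, 6 and 9 gives a cumulative ack of 3 with gap blocks (2, 3) and (6, 6), telling the sender that only TSNs 4, 7 and 8 are missing.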

6
SCTP Key Features
  • Message oriented
  • Use of associations
  • Multihoming
  • Multiple streams within an association

7
Logical View of Multiple Streams in an Association
8
Partially Ordered User Messages Sent on Different Streams
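The partial ordering above can be modeled with a per-stream reorder buffer. In this sketch (the class StreamReassembler is hypothetical), each arrival carries a stream id and a per-stream sequence number; a message is delivered once all earlier messages on its own stream have arrived, regardless of what is still outstanding on other streams.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class StreamReassembler:
    """Deliver messages in order within each stream, independently across streams."""

    def __init__(self) -> None:
        self.next_ssn: Dict[int, int] = defaultdict(int)           # next expected seq per stream
        self.pending: Dict[int, Dict[int, str]] = defaultdict(dict)  # buffered early arrivals

    def receive(self, stream: int, ssn: int, payload: str) -> List[Tuple[int, str]]:
        """Accept one arrival and return the messages that become deliverable."""
        self.pending[stream][ssn] = payload
        delivered: List[Tuple[int, str]] = []
        # Release consecutive messages on this stream only.
        while self.next_ssn[stream] in self.pending[stream]:
            delivered.append((stream, self.pending[stream].pop(self.next_ssn[stream])))
            self.next_ssn[stream] += 1
        return delivered
```

If B2 (stream 1, sequence 1) arrives before B1, it is held back, while C1 on stream 2 is delivered immediately; B1's arrival then releases both B1 and B2.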
9
MPI Middleware
MPI_Send(msg,count,type,dest-rank,tag,context)
MPI_Recv(msg,count,type,source-rank,tag,context)
  • Message matching is done based on Tag, Rank and
    Context (TRC).
  • Sends and receives come in combinations such as
    blocking, non-blocking, synchronous, asynchronous,
    buffered, and unbuffered.
  • Receives may use wildcards (MPI_ANY_SOURCE,
    MPI_ANY_TAG)
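The TRC matching rule, including wildcards, can be sketched as a search over the queue of unexpected messages. ANY_SOURCE and ANY_TAG below are stand-ins for MPI_ANY_SOURCE and MPI_ANY_TAG (their real values are implementation-defined), and Envelope and match_unexpected are hypothetical names. Scanning from the front returns the earliest matching message, which keeps messages that match the same receive in order.

```python
from dataclasses import dataclass
from typing import List, Optional

ANY_SOURCE = -1  # stand-in for MPI_ANY_SOURCE
ANY_TAG = -1     # stand-in for MPI_ANY_TAG


@dataclass
class Envelope:
    rank: int
    tag: int
    context: int
    payload: bytes = b""


def matches(env: Envelope, source: int, tag: int, context: int) -> bool:
    """Tag-rank-context (TRC) matching; the context never takes a wildcard."""
    return (env.context == context
            and (source == ANY_SOURCE or env.rank == source)
            and (tag == ANY_TAG or env.tag == tag))


def match_unexpected(queue: List[Envelope], source: int, tag: int,
                     context: int) -> Optional[Envelope]:
    """Remove and return the earliest queued message that matches the receive."""
    for i, env in enumerate(queue):
        if matches(env, source, tag, context):
            return queue.pop(i)
    return None
```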

10
MPI Messages Using Same Context, Two Processes
11
MPI Messages Using Same Context, Two Processes
Out-of-order delivery of messages with the same tag
violates MPI semantics
12
MPI Middleware
  • Message Progression Layer
  • Short Messages vs. Long Messages
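The slide does not spell out the two protocols; a common approach in MPI implementations, assumed here, is to send short messages eagerly and long messages with a rendezvous. The sketch below only lists the packet sequence for each case; EAGER_LIMIT and send_plan are hypothetical, and the 64 KiB cutoff is an illustrative value, not LAM's setting.

```python
from typing import List, Tuple

EAGER_LIMIT = 64 * 1024  # hypothetical short/long cutoff in bytes


def send_plan(nbytes: int, eager_limit: int = EAGER_LIMIT) -> List[Tuple[str, str]]:
    """Return the (direction, packet) sequence for sending one message.

    Short messages go eagerly: envelope and data together.
    Long messages use a rendezvous: the envelope goes first, and the data
    follows only after the receiver has matched it and acknowledged.
    """
    if nbytes <= eager_limit:
        return [("send", "envelope+data")]
    return [("send", "envelope"), ("recv", "ack"), ("send", "data")]
```

The rendezvous costs an extra round trip but spares the receiver from buffering large unexpected messages.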

13
Design and Implementation
  • LAM (Local Area Multi-computer) is an open-source
    implementation of the MPI library
  • We redesigned LAM-MPI to use SCTP
  • Three-phase iterative process:
  • Use of One-to-One Style Sockets
  • Use of Multiple Streams
  • Use of One-to-Many Style Sockets

14
Using SCTP for MPI
  • Striking similarities between SCTP and MPI

15
Implementation Issues
  • Maintaining State Information
  • Maintain state appropriately for each request
    function to work with the one-to-many style.
  • Message Demultiplexing
  • Extend request progression interface (RPI)
    initialization to map associations to ranks.
  • Demultiplexing of each incoming message to direct
    it to the proper receive function.
  • Concurrency and SCTP Streams
  • Consistently map MPI tag-rank-context to SCTP
    streams, maintaining proper MPI semantics.
  • Resource Management
  • Make RPI more message-driven.
  • Eliminate the use of the select() system call,
    making the implementation more scalable.
  • Eliminate the need to maintain a large number
    of socket descriptors.
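Message demultiplexing and the tag-rank-context mapping can be sketched together. With a one-to-many socket, a single descriptor receives from every peer, and a receive call such as sctp_recvmsg() reports the association id and stream number of each message; the middleware then maps the association to a rank and hands the message on. Everything below (NUM_STREAMS, the mixing function in stream_for, Demultiplexer) is hypothetical. The sketch does not handle wildcard receives (MPI_ANY_TAG), where messages with different tags, and therefore possibly different streams, can match the same receive and extra care is needed to keep MPI's ordering.

```python
from typing import Callable, Dict

NUM_STREAMS = 10  # hypothetical number of streams negotiated per association

Handler = Callable[[int, bytes], None]  # called with (stream, data)


def stream_for(tag: int, context: int, num_streams: int = NUM_STREAMS) -> int:
    """Map an MPI (tag, context) pair to a stream number.

    The rank is implicit because each peer has its own association. The same
    (tag, context) always yields the same stream, so messages that MPI requires
    to stay ordered share one ordered stream.
    """
    return (tag + 31 * context) % num_streams  # hypothetical mixing function


class Demultiplexer:
    """Route messages from one one-to-many socket to per-rank handlers."""

    def __init__(self) -> None:
        self.rank_of: Dict[int, int] = {}       # association id -> MPI rank
        self.handlers: Dict[int, Handler] = {}  # MPI rank -> receive handler

    def register(self, assoc_id: int, rank: int, handler: Handler) -> None:
        """Record the association-to-rank mapping built during initialization."""
        self.rank_of[assoc_id] = rank
        self.handlers[rank] = handler

    def dispatch(self, assoc_id: int, stream: int, data: bytes) -> int:
        """Route one incoming message to its rank's handler and return that rank."""
        rank = self.rank_of[assoc_id]
        self.handlers[rank](stream, data)
        return rank
```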

16
Implementation Issues
  • Eliminating Race Conditions
  • Finding solutions for race conditions due to
    added concurrency.
  • Use of a barrier after the association setup phase.
  • Reliability
  • Modify out-of-band daemons and request
    progression interface (RPI) to use a common
    transport layer protocol to allow for all
    components of LAM to multihome successfully.
  • Support for large messages
  • Devised a long-message protocol to handle
    messages larger than socket send buffer.
  • Experiments with different SCTP stacks

17
Features of Design
  • Head-of-Line Blocking
  • Multihoming and Reliability
  • Security

18
Head-of-Line Blocking
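A toy model makes the head-of-line effect concrete. Assume one message is lost and every other message arrives promptly; the hypothetical function below lists which later messages must wait for the retransmission, either over a single ordered stream (TCP-like) or over SCTP with separate streams.

```python
from typing import List, Tuple


def blocked_messages(sends: List[Tuple[int, str]], lost: int,
                     multi_stream: bool) -> List[str]:
    """Return the messages whose delivery must wait for sends[lost] to be retransmitted.

    sends lists (stream, name) in send order. With a single ordered stream,
    every later message waits. With multiple streams, only later messages on
    the same stream as the lost one wait.
    """
    lost_stream = sends[lost][0]
    return [name for stream, name in sends[lost + 1:]
            if not multi_stream or stream == lost_stream]
```

With A and C on stream 0 and B and D on stream 1, losing A delays B, C and D over a single stream, but only C when streams are used.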
19
Multihoming
  • Heartbeats
  • Failover
  • Retransmissions
  • User adjustable controls
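The failover behaviour can be sketched as per-path error counting. Following the SCTP specification's Path.Max.Retrans parameter (5 by default in RFC 2960), a destination address is marked inactive once its error count exceeds the limit, and traffic moves to an alternate address; path_max_retrans is the kind of user-adjustable control the slide refers to. This is a simplified model with hypothetical class names: real stacks also retransmit to alternate addresses and keep an association-wide error count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Path:
    """One destination address of a multihomed peer."""
    address: str
    errors: int = 0  # timeouts since the last acknowledgement on this path
    active: bool = True


@dataclass
class Association:
    paths: List[Path]
    path_max_retrans: int = 5  # user-adjustable; RFC 2960 default Path.Max.Retrans
    primary: int = 0           # index of the primary path

    def current_path(self) -> Path:
        """Use the primary path while it is active, else the first active alternate."""
        if self.paths[self.primary].active:
            return self.paths[self.primary]
        for p in self.paths:
            if p.active:
                return p
        raise RuntimeError("all paths are inactive")

    def on_timeout(self, path: Path) -> None:
        """A retransmission or heartbeat timed out on this path."""
        path.errors += 1
        if path.errors > self.path_max_retrans:
            path.active = False

    def on_ack(self, path: Path) -> None:
        """An acknowledgement or heartbeat ack arrived: the path is healthy again."""
        path.errors = 0
        path.active = True
```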

20
Added Security
User data can be piggy-backed on the third and fourth
legs of the handshake
[Figure: SCTP's use of a signed cookie]
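SCTP sets up an association with a four-way handshake (INIT, INIT-ACK, COOKIE-ECHO, COOKIE-ACK). The server places the association state in a signed state cookie in its INIT-ACK and commits resources only when a valid cookie comes back in the COOKIE-ECHO, which blunts SYN-flood-style attacks. The sketch below imitates only the sign and verify steps; HMAC-SHA-256, the JSON encoding, and the names make_cookie and verify_cookie are assumptions, not SCTP's wire format, while the 60-second lifetime mirrors RFC 2960's default Valid.Cookie.Life.

```python
import hashlib
import hmac
import json
import os
from typing import Optional

SECRET = os.urandom(32)  # server-side key; never sent to the peer
COOKIE_LIFETIME = 60.0   # seconds, mirroring RFC 2960's default Valid.Cookie.Life


def make_cookie(state: dict, now: float) -> bytes:
    """INIT-ACK step: serialize the association state and sign it."""
    body = json.dumps({"state": state, "t": now}, sort_keys=True).encode()
    mac = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return body + b"." + mac


def verify_cookie(cookie: bytes, now: float) -> Optional[dict]:
    """COOKIE-ECHO step: return the state if signature and lifetime check out."""
    body, _, mac = cookie.rpartition(b".")  # hex MAC contains no dots
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(mac, expected):
        return None  # forged or altered cookie
    data = json.loads(body)
    if now - data["t"] > COOKIE_LIFETIME:
        return None  # stale cookie
    return data["state"]
```

Because the state travels inside the cookie, the server keeps no per-association memory between INIT-ACK and a valid COOKIE-ECHO.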
21
Limitations
  • CRC32c checksum offload to the NIC is not yet
    commonly available
  • SCTP bundles whole messages into packets, so it
    may not always be able to fill a full MTU
  • SCTP stacks are at an early stage and are
    expected to improve over time
  • Performance is stack dependent (the Linux lksctp
    stack is well behind the FreeBSD KAME stack)
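To see why checksum offload matters, the bitwise CRC32c below (reflected Castagnoli polynomial 0x82F63B78) performs eight shift-and-xor steps per byte; without NIC support, every SCTP packet goes through a computation like this on the host CPU. Production code uses table-driven or hardware-instruction variants; this version is only for illustration.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32c (Castagnoli), the checksum SCTP carries in each packet."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, xor in the reflected polynomial.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF
```

The standard check value crc32c(b"123456789") is 0xE3069283.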

22
Experiments for Loss
Performance of MPI Program that Uses Multiple Tags
23
Experiments: Head-of-Line Blocking
Use of Different Tags vs. Same Tags
24
Experiments: SCTP versus TCP
MPBench Ping Pong Test under No Loss
25
Conclusions
  • SCTP is better suited for MPI than TCP
  • Avoids unnecessary head-of-line blocking due to
    use of streams
  • Increased fault tolerance in the presence of
    multihomed hosts
  • Built-in security features
  • SCTP might be key to moving MPI programs from
    LANs to WANs.

26
Thank you!
  • More information about our work is at
  • http://www.cs.ubc.ca/labs/dsg/mpi-sctp/