SCTP versus TCP for MPI (presentation transcript)

1
SCTP versus TCP for MPI
  • Brad Penoff, Humaira Kamal, Alan Wagner
  • Department of Computer Science
  • University of British Columbia

2
Outline
  • Self Introduction
  • Research background
  • Research presentation
  • SCTP MPI background
  • MPI over SCTP design
  • Design features
  • Results
  • Conclusions

3
Who am I?
  • Born and raised in Columbus area
  • OSU alumnus
  • Europa alumnus
  • Worked a few years
  • Grad student finishing my MSc at UBC

4
UBC
  (image-only slide)

5
Who do I work with?
  • Alan Wagner (Prof, UBC)
  • Humaira Kamal (PhD, UBC)
  • Mike Yao Chen Tsai (MSc, UBC)
  • Edith Vong (BSc, UBC)
  • Randall Stewart (Cisco)

6-9
What field do we work in?
  • Parallel computing
  • Concurrently utilize multiple resources

(Illustration: 1 cook vs. 8 cooks)

10
What field do we work in?
  • Message passing programming model
  • Message Passing Interface (MPI)
  • Standardized API for applications

11-12
What field do we work in?
  • Middleware for MPI
  • Glues necessary components together for a parallel
    environment

13
What field do we work in?
  • Parallel library component
  • Implements MPI API for various interconnects
  • Shared memory
  • Myrinet
  • InfiniBand
  • Specialized hardware (BlueGene/L, ASCI Red, etc.)

14
What field do we work in?
  • TCP/IP protocol stack interconnect
  • Stream Control Transmission Protocol

15
SCTP versus TCP for MPI
  • Brad Penoff, Humaira Kamal, Alan Wagner
  • Department of Computer Science
  • University of British Columbia
  • Supercomputing 2005, Seattle, Washington USA

16-17
What are MPI and SCTP?
  • Message Passing Interface (MPI)
  • Library that is widely used to parallelize
    scientific and compute-intensive programs
  • Stream Control Transmission Protocol (SCTP)
  • General-purpose unicast transport protocol for IP
    network data communications
  • Recently standardized by the IETF
  • Can be used anywhere TCP is used
  • Question
  • Can we take advantage of SCTP features to better
    support parallel applications using MPI?

18
Communicating MPI Processes
TCP is often used as the transport protocol for MPI.
(Figure: communicating MPI processes with SCTP as the
transport.)
19
SCTP Key Features
  • Reliable in-order delivery, flow control, full
    duplex transfer.
  • Selective ACK is built into the protocol
  • TCP-like congestion control

20
SCTP Key Features
  • Message oriented
  • Use of associations
  • Multihoming
  • Multiple streams within an association
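
As a concrete illustration of the one-to-many style
and multiple streams, here is a minimal sketch using
the standard SCTP sockets API; the stream counts are
arbitrary example values, not taken from the slides:

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

int open_sctp_endpoint(void)
{
    /* One-to-many style: a single socket can carry
     * many associations. */
    int sd = socket(AF_INET, SOCK_SEQPACKET, IPPROTO_SCTP);

    /* Request multiple streams per association
     * (example values; error handling omitted). */
    struct sctp_initmsg init;
    memset(&init, 0, sizeof init);
    init.sinit_num_ostreams  = 10;
    init.sinit_max_instreams = 10;
    setsockopt(sd, IPPROTO_SCTP, SCTP_INITMSG, &init, sizeof init);

    return sd;
}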

21
Associations and Multihoming
  • Primary address
  • Heartbeats
  • Retransmissions
  • Failover
  • User adjustable controls
  • CMT (concurrent multipath transfer)
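
A hedged sketch of how these user-adjustable controls
look in the standard SCTP sockets API; the heartbeat
interval and retransmission limit below are arbitrary
example values:

#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

void tune_path(int sd, struct sockaddr *peer, socklen_t len)
{
    struct sctp_paddrparams pp;
    memset(&pp, 0, sizeof pp);
    memcpy(&pp.spp_address, peer, len); /* which peer address to tune */
    pp.spp_hbinterval = 1000;           /* heartbeat every 1000 ms */
    pp.spp_pathmaxrxt = 3;              /* fail over after 3 timeouts */
    pp.spp_flags = SPP_HB_ENABLE;
    setsockopt(sd, IPPROTO_SCTP, SCTP_PEER_ADDR_PARAMS, &pp, sizeof pp);
}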

22
Logical View of Multiple Streams in an Association
23-38
Partially Ordered User Messages Sent on Different
Streams
(Animation: messages sent on different streams are
only partially ordered; within a single stream,
messages can be received in the same order as they
were sent, which TCP requires for all messages.)
39
MPI API Implementation
MPI_Send(msg,count,type,dest-rank,tag,context)
MPI_Recv(msg,count,type,source-rank,tag,context)
  • Message matching is done based on Tag, Rank and
    Context (TRC).
  • Combinations such as blocking, non-blocking,
    synchronous, asynchronous, buffered, unbuffered.
  • Use of wildcards for receive
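
To make the matching rules concrete, here is a
minimal sketch (not from the slides) showing
tag-based matching and a wildcard receive:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, data = 42;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* The tag (7 here) is part of the match:
         * tag, rank, context (TRC). */
        MPI_Send(&data, 1, MPI_INT, 1, 7, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Wildcards match any source and any tag. */
        MPI_Recv(&data, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        printf("got %d from rank %d with tag %d\n",
               data, status.MPI_SOURCE, status.MPI_TAG);
    }
    MPI_Finalize();
    return 0;
}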

40
MPI Messages Using Same Context, Two Processes
41
MPI Messages Using Same Context, Two Processes
Out-of-order messages with the same tags violate MPI
semantics
42
MPI API Implementation
  • Request Progression Layer
  • Short Messages vs. Long Messages

43
MPI over SCTP Design and Implementation
  • LAM (Local Area Multicomputer) is an open-source
    implementation of the MPI library.
  • Origins at the Ohio Supercomputing Center
  • We redesigned LAM's TCP RPI module to use SCTP.
  • The RPI module is responsible for maintaining
    state information for all requests.

44
MPI over SCTP Design and Implementation
  • Challenges
  • Lack of documentation
  • Code examination
  • Our document is linked off the LAM/MPI website
  • Extensive instrumentation
  • Diagnostic traces
  • Identification of problems in the SCTP protocol

45
Using SCTP for MPI
  • Striking similarities between SCTP and MPI

46
Implementation Issues
  • Maintaining State Information
  • Maintain state appropriately for each request
    function to work with the one-to-many style.
  • Message Demultiplexing
  • Extend RPI initialization to map associations to
    rank.
  • Demultiplex each incoming message to direct it
    to the proper receive function (see the sketch
    after this list).
  • Concurrency and SCTP Streams
  • Consistently map MPI tag-rank-context to SCTP
    streams, maintaining proper MPI semantics.
  • Resource Management
  • Make RPI more message-driven.
  • Eliminate the use of the select() system call,
    making the implementation more scalable.
  • Eliminate the need to maintain a large number
    of socket descriptors.
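
The demultiplexing step might look like the following
sketch, assuming the standard one-to-many SCTP API;
dispatch() is a hypothetical matching routine, not
LAM's actual code:

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/sctp.h>

ssize_t recv_and_demux(int sd, char *buf, size_t len)
{
    struct sctp_sndrcvinfo info;
    int flags = 0;

    /* On a one-to-many socket, every message arrives
     * with per-message metadata. */
    ssize_t n = sctp_recvmsg(sd, buf, len, NULL, NULL, &info, &flags);
    if (n > 0) {
        /* info.sinfo_assoc_id identifies the peer
         * (maps to an MPI rank); info.sinfo_stream
         * identifies the stream (maps back to a tag).
         * dispatch(info.sinfo_assoc_id, info.sinfo_stream, buf, n); */
    }
    return n;
}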

47
Implementation Issues
  • Eliminating Race Conditions
  • Finding solutions for race conditions due to
    added concurrency.
  • Use of barrier after association setup phase.
  • Reliability
  • Modify the out-of-band daemons and the request
    progression interface (RPI) to use a common
    transport protocol, allowing all components of
    LAM to multihome successfully.
  • Support for large messages
  • Devised a long-message protocol to handle
    messages larger than the socket send buffer (see
    the sketch after this list).
  • Experiments with different SCTP stacks
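
A minimal sketch of such a long-message send loop,
assuming the standard sctp_sendmsg() call; the
fragment size and the helper name are illustrative,
not the slides' actual protocol:

#include <stdint.h>
#include <sys/socket.h>
#include <netinet/sctp.h>

int send_long(int sd, const char *buf, size_t len,
              struct sockaddr *to, socklen_t tolen,
              uint16_t stream)
{
    const size_t CHUNK = 64 * 1024;  /* example fragment size */
    size_t off = 0;

    /* Send fragments on the same stream; per-stream
     * ordering lets the receiver reassemble in order. */
    while (off < len) {
        size_t n = (len - off < CHUNK) ? len - off : CHUNK;
        if (sctp_sendmsg(sd, buf + off, n, to, tolen,
                         0 /* ppid */, 0 /* flags */, stream,
                         0 /* ttl */, 0 /* context */) < 0)
            return -1;
        off += n;
    }
    return 0;
}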

48
Features of Design
  • Scalability
  • Head-of-Line Blocking

49
Scalability
TCP
50
Scalability
SCTP
51
Head-of-Line Blocking
52-58
(Figure-only slides illustrating head-of-line
blocking.)

59
Limitations
  • Comprehensive CRC32c checksum offload to NIC
    not yet commonly available
  • SCTP bundles messages together so it might not
    always be able to pack a full MTU
  • SCTP stack is in early stages and will improve
    over time
  • Performance is stack dependent (Linux lksctp
    stack << FreeBSD KAME stack)

60
Experiments
  • Controlled environment: eight nodes, Dummynet
  • Used standard benchmarks as well as real-world
    programs
  • Fair comparison: matched buffer sizes, Nagle
    disabled, SACK on, no multihoming, CRC32c off
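
For instance, disabling Nagle on both transports is a
one-line socket option on each; this is a sketch of
the setting, not the authors' actual test harness:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netinet/sctp.h>

void disable_nagle(int tcp_sd, int sctp_sd)
{
    int one = 1;
    /* Matching the setting on both transports keeps
     * the comparison fair. */
    setsockopt(tcp_sd,  IPPROTO_TCP,  TCP_NODELAY,  &one, sizeof one);
    setsockopt(sctp_sd, IPPROTO_SCTP, SCTP_NODELAY, &one, sizeof one);
}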

61
Experiments: Benchmarks
MPBench Ping Pong Test under No Loss
62
NAS Benchmarks
  • The NAS benchmarks approximate real-world
    parallel scientific applications
  • We experimented with a suite of 7 benchmarks and
    4 data set sizes
  • SCTP performance is comparable to TCP for large
    data sets.

63
Latency Tolerant Programs
  • Bulk Farm Processor program
  • Real-world application
  • Non-blocking communication
  • Overlap computation with communication
  • Use of multiple tags

64
Farm Program - Short Messages
65
Head-of-line blocking: short messages
66
Conclusions
  • SCTP is better suited for MPI
  • Avoids unnecessary head-of-line blocking due to
    use of streams
  • Increased fault tolerance in the presence of
    multihomed hosts
  • Built-in security features
  • Robust under loss
  • SCTP might be key to moving MPI programs from
    LANs to WANs.

67
Future Work
  • Release LAM SCTP RPI module at SC05
  • Incorporate our work into Open MPI and/or MPICH2
  • Modify real applications to use tags as streams

68
Thank you!
  • More information about our work is at
  • http://www.cs.ubc.ca/labs/dsg/mpi-sctp/

69
Extra Slides
70
Partially Ordered User Messages Sent on Different
Streams
71
Added Security
User data can be piggy-backed on the third and
fourth legs of the four-way handshake
SCTP's Use of a Signed Cookie
72
Added Security
  • 32-bit Verification Tag protects against blind
    reset attacks
  • Autoclose feature
  • No half-closed state
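
Autoclose, for example, is enabled with a single
socket option on a one-to-many socket; a sketch, with
an arbitrary idle time:

#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/sctp.h>

void enable_autoclose(int sd, int idle_seconds)
{
    /* Idle associations close automatically after
     * idle_seconds of inactivity (0 disables). */
    setsockopt(sd, IPPROTO_SCTP, SCTP_AUTOCLOSE,
               &idle_seconds, sizeof idle_seconds);
}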

73
Farm Program - Long Messages
74
Head-of-line blocking: long messages
75-77
Experiments: Benchmarks
  • SCTP outperformed TCP under loss for the
    ping-pong test.