MPI - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: MPI

1
MPI @
  • Brad Penoff, Camilo Rostoker, Alan Wagner,
  • Mike Tsai, Humaira Kamal, Edith Vong
  • Department of Computer Science
  • University of British Columbia

March 15, 2006
2
Overview
  • What is MPI and its role within HPC?
  • What is SCTP and how can it help MPI?
  • MPI middleware: the good and the bad.
  • How do we use MPI?
  • What is the future for IP protocols in HPC?

3
Overview
  • What is MPI and its role within HPC?
  • What is SCTP and how can it help MPI?
  • MPI middleware: the good and the bad.
  • How do we use MPI?
  • What is the future for IP protocols in HPC?

4
Some HPC goals
  • To solve large problems involving large
    computations on large datasets
  • To enable new types of analysis
  • To utilize all available resources
  • Processors
  • Networks
  • I/O means
  • To scale

5
One approach within HPC
  • Parallel programming
  • Models for explicitly expressing a task whose
    parts can be effectively run simultaneously
  • The most well-known use of a model
  • MPI (message-passing interface)
  • API designed 10 years ago by committee
  • Sometimes called the assembly language of
    parallel processing

6
Middleware for MPI
  • Glues necessary components together for parallel
    environment
  • Attempts to allow for portability with maximal
    performance

7
Communication component
  • Implements MPI API for various interconnects
  • Shared memory
  • Myrinet
  • Infiniband
  • Specialized hardware (BlueGene/L, ASCI Red, XD1,
    etc.)
  • Standard TCP/IP transport protocols

8
TCP/IP protocol stack
  • About 40% of machines in the Top500 use TCP
  • SCTP had yet to be used for MPI

9
Overview
  • What is MPI and its role within HPC?
  • What is SCTP and how can it help MPI?
  • MPI middleware: the good and the bad.
  • How do we use MPI?
  • What is the future for IP protocols in HPC?

10
What is SCTP?
  • Stream Control Transmission Protocol
  • General purpose unicast transport protocol for IP
    network data communications
  • Recently standardized by IETF
  • Can be used anywhere TCP is used

11
SCTP Key Similarities
  • Reliable in-order delivery, flow control, full
    duplex transfer.
  • TCP-like congestion control
  • Selective ACK is built into the protocol

12
SCTP Key Differences
  • Message oriented
  • Added security
  • Multihoming, use of associations
  • Multiple streams within an association

13
Associations and Multihoming
[Figure: an SCTP association between two multihomed endpoints, X and Y,
each with two NICs attached to two networks (207.10.x.x and 168.1.x.x);
the interfaces shown are 207.10.3.20 and 207.10.40.1 on the first
network, and 168.1.140.10 and 168.1.10.30 on the second.]
14
Logical View of Multiple Streams in an Association
15-33
Partially Ordered User Messages Sent on Different
Streams
(animation repeated across slides 15-33)
Can be received in the same order as it was sent
(required in TCP).
Delivery constraints: A must be before C, and C
must be before D.
34
Available SCTP stacks
  • BSD / Mac OS X
  • LKSCTP (Linux kernel 2.4.23 and later)
  • Solaris 10
  • HP OpenCall SS7
  • OpenSS7
  • Other implementations listed on sctp.org for
    Windows, AIX, VxWorks, etc.

35
Upcoming annual SCTP Interop
  • July 30-Aug 4, 2006, to be held at UBC
  • Vendors and implementers test their stacks
  • Performance
  • Interoperability

36
MPI over SCTP
  • LAM and MPICH2 are two popular open source
    implementations of the MPI library.
  • We redesigned LAM to use SCTP and take advantage
    of its additional features.
  • Future plans include SCTP support within MPICH2.

37
How can SCTP help MPI?
  • A redesign for SCTP thins the MPI middleware's
    communication component.
  • Use of one-to-many socket-style scales well.
  • SCTP adds resilience to MPI programs.
  • Avoids unnecessary head-of-line blocking with
    streams
  • Increased fault tolerance in presence of
    multihomed hosts
  • Built-in security features
  • Improved congestion control

Full Results Presented @
38
Overview
  • What is MPI and its role within HPC?
  • What is SCTP and how can it help MPI?
  • MPI middleware: the good and the bad.
  • How do we use MPI?
  • What is the future for IP protocols in HPC?

39
Good of IP-based MPI Middleware
  • Ubiquitous
  • it's EVERYWHERE
  • Cheap
  • popularity drives down costs
  • Well-known
  • leverage network research
  • Portable
  • heterogeneous environments
  • Seamlessly connects across networks
  • SMP, cluster, LAN, WAN

40
Bad of IP-based MPI Middleware
  • Control-driven, Event/Interrupt Mismatch
  • NIC/OS interrupt driven
  • User-space usually control-driven
  • Flow control
  • Stuck with transport level flow control
  • Must multiplex incoming message flows
  • How to handle unexpected messages?
  • Excess system calls
  • Context switch for crossing kernel boundary

41
Ugly of MPI Middleware
  • Generalizing the parallel environment
  • Trade-offs with portability and performance
  • Byzantine agreement
  • Has a remote process died or is it just busy?
  • Parallel debugging across a network

42
Overview
  • What is MPI and its role within HPC?
  • What is SCTP and how can it help MPI?
  • MPI middleware: the good and the bad.
  • How do we use MPI?
  • What is the future for IP protocols in HPC?

43
MPI Applications
44
Forget the Grid? Let's just use MPI
  • Can utilize heterogeneous resources and networks
    by focusing on IP-based protocols (Grid-lite).
  • Result: Need to design applications to be more
    flexible in high-latency/high-loss environments.

45
Latency-Tolerant Applications
  • Processor Farm Applications
  • mpiBLAST
  • Parallel workflow environment
  • Computational Finance
  • Gene expression network analysis

46
mpiBLAST
  • MPI version of popular bioinformatics search tool
  • Conforms to parallel farm model

47
Modifying mpiBLAST for WAN (1)
  • Progress multiple independent tasks at once
  • Buffer separate state in case of message loss
  • Each task has its own tag (i.e. SCTP stream)
  • Batch initial work REQuest messages

48
Modifying mpiBLAST for WAN (2)
  • Avoid synchrony (worsened with latency)
  • Additional use of asynchronous MPI calls
  • Use message size that can use eager send within
    the library implementation (i.e. no rendezvous)
  • Have application protocol do less handshaking

49
Overview
  • What is MPI and its role within HPC?
  • What is SCTP and how can it help MPI?
  • How do we use MPI?
  • MPI middleware: the good and the bad.
  • What is the future for IP protocols in HPC?

50
The Problem
X
last mile
51
The Problem
last inch
52
Memory copying
Zero-copy and RDMA?
53
Copying - Don't Do It!
Hennessy and Patterson, 1996
54
Protocol processing
Rule of thumb: 1 Hz for 1 bps
May be 5-6 times more for smaller messages, and
does not seem to scale well as processor speeds
increase.
55
Where to do the processing?
  • On-chip
  • Separate processor core in the chip
  • Kernel / User space
  • On the NIC (TOE, TOE+RDMA)

56
iWarp
  • IETF initiative to support zero-copy and TCP
    off-load
  • A richer interface (like zero-copy, RDMA)
  • Maintains compatibility with existing TCP/IP
  • In 2005, 42.4% of Top500 machines used Ethernet,
    with most using regular GigE adapters.

57
Other solutions
  • Infiniband
  • Specialized
  • Designed from the start to support RDMA
  • Level5
  • User-space memory mapped, Ethernet to NIC
  • Provides protection on the board
  • Trying to speed up and integrate the I/O onto the
    memory bus or a faster interface

58
RDMA-TCP-SCTP and NIC
SCTP better suited:
  • message based
  • does framing
  • multistreaming
  • multihoming
59
Convergence?
  • Everything over IP
  • IP over IB: 10 Gb InfiniBand only 0.7x to 2x that
    of standard 1 Gb Ethernet (Egenera white paper)
  • Latency difference: 7 microseconds (InfiniBand)
    versus 65 microseconds (regular GigE Ethernet)
    (dual 3 GHz Xeons)
  • Level 5: sub-10 microsecond (user-level stacks)

60
Laptop clustering event
  • Live-linux clustering party
  • Bring your laptop, with the recommended
    live-linux distribution
  • We will provide the application
  • Hoping for April 11th.

61
Thank you!
  • More information about our work is at
  • http://www.cs.ubc.ca/labs/dsg/mpi-sctp/

Or Google "sctp mpi"
62
Extra slides
63
MPI Point-to-Point
MPI_Send(msg,cnt,type,dst-rank,tag,context)
MPI_Recv(msg,cnt,type,src-rank,tag,context)
  • Message matching is done based on Tag, Rank and
    Context (TRC).
  • Combinations such as blocking, non-blocking,
    synchronous, asynchronous, buffered, unbuffered.
  • Use of wildcards for receive

64
MPI Messages Using Same Context, Two Processes
65
MPI Messages Using Same Context, Two Processes
Out-of-order messages with the same tags violate
MPI semantics.
66
Using SCTP for MPI
  • TRC-to-stream map matches MPI semantics