Title: The generic Madeleine communication interface: from clusters to grids
1The generic Madeleine communication interface: from clusters to grids
- Luc Bougé
- ReMaP Project
- LIP, ENS Lyon
2Credits
- The ParaDigme group
- Luc Bougé, Jean-François Méhaut
- Jean-Christophe Mignot, Raymond Namyst
- Loïc Prylli
- Gabriel Antoniu, Olivier Aumage
- Alice Bonhomme, Vincent Danjean
- Alexandre Denis, Guillaume Mercier
- Christian Perez
3Madeleine II - Objectives
- PM2 Multithreading Library
- High performance clusters
- RPC-oriented communication
- Efficiency
- Zero-copy transmission
- Multi-paradigm protocols
- Multiple protocols
- Multiple adapters
Node 1
Network
Node 2
4Related works
- Standards: TCP, UDP
- Low-level: BIP, SBP
- High-level: MPI, PVM
- Intermediate level
- Nexus
- Fast Messages
- VIA
- Gamma
5The Madeleine II Interface
mad_begin_packing
mad_pack
mad_end_packing
mad_begin_unpacking
mad_unpack
mad_end_unpacking
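The six calls form two mirrored sequences: a message is built field by field between begin_packing and end_packing, and extracted in the same order between begin_unpacking and end_unpacking. The sketch below imitates that pattern with an in-memory buffer. Every toy_ name is hypothetical and exists only for this illustration; it is not the Madeleine II implementation, and it ignores channels, destinations and the send/receive mode flags.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a connection: one in-memory message buffer.
   This is NOT the Madeleine II data structure. */
typedef struct {
    unsigned char data[256];
    size_t        len;  /* bytes packed so far */
    size_t        pos;  /* bytes unpacked so far */
} toy_connection_t;

/* Start a new, empty message. */
void toy_begin_packing(toy_connection_t *c) { c->len = 0; c->pos = 0; }

/* Append one field to the message under construction. */
void toy_pack(toy_connection_t *c, const void *buf, size_t size)
{
    assert(c->len + size <= sizeof c->data);
    memcpy(c->data + c->len, buf, size);
    c->len += size;
}

/* Nothing to do in the toy model: the message is already complete. */
void toy_end_packing(toy_connection_t *c) { (void)c; }

/* Start reading the message from its first field. */
void toy_begin_unpacking(toy_connection_t *c) { c->pos = 0; }

/* Extract the next field; sizes must mirror the pack calls, in order. */
void toy_unpack(toy_connection_t *c, void *buf, size_t size)
{
    assert(c->pos + size <= c->len);
    memcpy(buf, c->data + c->pos, size);
    c->pos += size;
}

/* Returns 1 if every packed byte was consumed, 0 otherwise. */
int toy_end_unpacking(const toy_connection_t *c) { return c->pos == c->len; }
```

The point of the sketch is the symmetry: the receiver needs no message header describing the fields, because it replays the sender's pack sequence itself.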
6Example
Sending side
int n;
char *s = "Hello, World !";
p_mad_connection_t cnx;
cnx = mad_begin_packing(channel, dest);
n = strlen(s) + 1;
mad_pack(cnx, &n, sizeof(int), send_CHEAPER, receive_EXPRESS);
mad_pack(cnx, s, n, send_CHEAPER, receive_CHEAPER);
mad_end_packing(cnx);
Receiving side
int n;
char *s = NULL;
p_mad_connection_t cnx;
cnx = mad_begin_unpacking(channel);
mad_unpack(cnx, &n, sizeof(int), send_CHEAPER, receive_EXPRESS);
s = malloc(n);
mad_unpack(cnx, s, n, send_CHEAPER, receive_CHEAPER);
mad_end_unpacking(cnx);
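The example unpacks n with receive_EXPRESS and s with receive_CHEAPER. The slides do not spell out the difference. The sketch below assumes the usual reading of these flags: an EXPRESS field is usable as soon as the unpack call returns (the receiver needs n immediately to call malloc), while a CHEAPER field is only guaranteed to be filled in after the end-of-unpacking call, which leaves the library free to delay or batch the transfer. All toy_ names are hypothetical and the "wire" is just a byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_RECEIVE_EXPRESS, TOY_RECEIVE_CHEAPER } toy_receive_mode_t;

#define TOY_MAX_DEFERRED 8

typedef struct {
    const unsigned char *wire;  /* incoming bytes (toy model: already in memory) */
    size_t               pos;   /* read offset into wire */
    struct { void *dst; size_t off, size; } deferred[TOY_MAX_DEFERRED];
    size_t               n_deferred;
} toy_rx_t;

void toy_rx_begin(toy_rx_t *rx, const void *wire)
{
    rx->wire = wire;
    rx->pos = 0;
    rx->n_deferred = 0;
}

/* EXPRESS: copy now. CHEAPER: only remember where the data must go. */
void toy_rx_unpack(toy_rx_t *rx, void *dst, size_t size, toy_receive_mode_t mode)
{
    if (mode == TOY_RECEIVE_EXPRESS) {
        memcpy(dst, rx->wire + rx->pos, size);
    } else {
        assert(rx->n_deferred < TOY_MAX_DEFERRED);
        rx->deferred[rx->n_deferred].dst  = dst;
        rx->deferred[rx->n_deferred].off  = rx->pos;
        rx->deferred[rx->n_deferred].size = size;
        rx->n_deferred++;
    }
    rx->pos += size;
}

/* Every deferred field is filled in once this returns. */
void toy_rx_end(toy_rx_t *rx)
{
    for (size_t i = 0; i < rx->n_deferred; i++)
        memcpy(rx->deferred[i].dst,
               rx->wire + rx->deferred[i].off,
               rx->deferred[i].size);
    rx->n_deferred = 0;
}
```

Under this reading, marking the length as EXPRESS and the payload as CHEAPER is the natural split: only the data that steers the rest of the unpacking has to pay for immediate delivery.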
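14Madeleine II structure
Each side (sender and receiver) has the same layered stack:
- Application
- Generic Buffer Management Modules: Switch over BMM1 ... BMMn (BMM1 ... BMMm on the other side)
- Specific Transmission Modules: Selection among TM1 ... TMn
- Network (shared by both sides)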
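One way to read the Switch and Selection boxes is sketched below: each packed field is assigned a transmission module (TM), each TM is tied to one buffer management module (BMM), and the Switch flushes the current BMM whenever the next field needs a different one, so that fields still leave in pack order. This is an assumption about how the two stages cooperate, not a description of the actual code. The size-based selection rule, the short_limit parameter and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_TM_SHORT, TOY_TM_BULK } toy_tm_t;
typedef enum { TOY_BMM_AGGREGATE, TOY_BMM_EAGER } toy_bmm_t;

/* Selection: pick a transmission module for one field (hypothetical rule). */
toy_tm_t toy_select_tm(size_t size, size_t short_limit)
{
    return size <= short_limit ? TOY_TM_SHORT : TOY_TM_BULK;
}

/* Each TM is served by exactly one buffer policy in this toy model. */
toy_bmm_t toy_bmm_for(toy_tm_t tm)
{
    return tm == TOY_TM_SHORT ? TOY_BMM_AGGREGATE : TOY_BMM_EAGER;
}

typedef struct {
    int       has_current;  /* 0 until the first field is packed */
    toy_bmm_t current;      /* BMM used by the previous field */
    unsigned  flushes;      /* number of BMM changes so far */
} toy_switch_t;

/* Switch: route one field to its BMM, flushing the old one on a change. */
void toy_switch_pack(toy_switch_t *sw, size_t size, size_t short_limit)
{
    assert(sw != NULL);
    toy_bmm_t next = toy_bmm_for(toy_select_tm(size, short_limit));
    if (sw->has_current && sw->current != next)
        sw->flushes++;  /* pending data of the old BMM must go out first */
    sw->current = next;
    sw->has_current = 1;
}
```

In this model, consecutive small fields share one aggregated buffer, and a flush happens only when a large field forces a change of policy.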
15Implementation
- Madeleine II currently available over
- SISCI/SCI
- BIP/Myrinet
- MPI
- VIA
- TCP
- SBP
16BIP/Myrinet
17BIP/Myrinet
18SISCI/SCI
19SISCI/SCI
20Current Limitations
- Currently
- network-homogeneous clusters
- completely connected sessions
Node 4
Node 3
SCI
Node 1
TCP
TCP
Node 2
21Madeleine II
- as a basis for grid communication layers
22Madeleine II Grid Component
- Multiprotocol communication device for
- Nexus
- MPICH
- Provide cluster-level communication support
- generic
- efficient
23Nexus/Madeleine II
- Globus/Nexus: multi-site management
- resource management, security
- inter-cluster oriented
- Madeleine II: high-performance communication
- generic structure
- intra-cluster oriented
- The best of both worlds!
- Madeleine as a Nexus module
24Structure
Nexus
Nexus/Madeleine module
Message Passing module
TCP module
Other modules
TCP protocol
MPL protocol
INX protocol
MAD SCI protocol
MAD TCP protocol
Madeleine
MPL Library
INX Library
Sockets
SCI
TCP
25Latency
26Bandwidth
27MPICH/Madeleine II
- MPICH: general-purpose portable MPI implementation
- well-defined protocol interface
- Abstract Device
- Madeleine: cluster-specific high-performance communication
- generic structure
- available on Gigabit networks
- highly optimized implementation
- The best of both worlds!
- Madeleine as an MPICH device
28MPICH/Madeleine II
MPI API
- Generic part (collective operations, context/group management, ...)
ADI
- Generic ADI code, datatype management, request queue management
Protocol Interface
- CH_MAD device: inter-node communication, polling loops, eager protocol, rendez-vous protocol
- SMP_PLUG device: intra-node communication
- CH_SELF device: self communication
Madeleine II: multi-protocol management
- TCP (Fast Ethernet)
- SISCI (SCI)
- BIP (Myrinet)
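The CH_MAD device is listed with both an eager and a rendez-vous protocol. In MPI implementations the choice between the two is commonly made on message size: small messages are sent at once and buffered by the receiver if needed, while large ones wait for a handshake so the data can land directly in the user buffer. The sketch below shows that common scheme. The eager_limit parameter, the three-message handshake count and all toy_ names are assumptions for illustration, not values or code taken from CH_MAD.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_EAGER, TOY_RENDEZVOUS } toy_protocol_t;

/* Pick the protocol for one message from its size (hypothetical threshold). */
toy_protocol_t toy_choose_protocol(size_t msg_size, size_t eager_limit)
{
    assert(eager_limit > 0);
    return msg_size <= eager_limit ? TOY_EAGER : TOY_RENDEZVOUS;
}

/* Messages exchanged per transfer in this model:
   eager      : data only
   rendez-vous: request, acknowledgement, then data */
unsigned toy_messages_on_wire(toy_protocol_t protocol)
{
    return protocol == TOY_EAGER ? 1u : 3u;
}
```

The trade-off is the usual one: eager saves two control messages per transfer, rendez-vous avoids an intermediate copy of large payloads.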
29Latency
30Bandwidth
31Madeleine II
- on network-heterogeneous clusters
32Network-heterogeneous clusters
- Existing solutions for heterogeneous cluster support
- PACX-MPI, MPIConnect, MPICH-G
- Common features
- use of local MPIs for intra-cluster communication
- limited inter-cluster support
- Making Madeleine heterogeneous?
- MPICH/Madeleine
Myrinet
SCI
34Objectives
- Automatic communication support between nodes on different networks
- Multiprotocol forwarding
- Requirements
- genericity
- efficiency
35Madeleine II structure
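Each side (sender and receiver) has the same layered stack:
- Application
- Buffer Management Layer: Switch over BMM1 ... BMMn (BMM1 ... BMMm on the other side)
- Network Layer: Selection among TM1 ... TMn
- Network (shared by both sides)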
36Endpoint modification
Application
- Use of a generic TM
- Advantages
- symmetric BMM selection on both sides
- Functions
- control over protocol-dependent TM selection
- MTU negotiation
BMM1
BMM2
BMMn
Generic TM
TM1
TM2
Network
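The slide names MTU negotiation as one job of the generic TM without saying how it is done. A simple scheme, assumed here purely for illustration, is to take the smallest MTU among the networks a message will cross, so that a fragment produced at the source never has to be split again on the way. The function names and the min-over-route rule are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Largest fragment size acceptable to every network on the route. */
size_t toy_negotiate_mtu(const size_t *mtus, size_t count)
{
    assert(mtus != NULL && count > 0);
    size_t mtu = mtus[0];
    for (size_t i = 1; i < count; i++)
        if (mtus[i] < mtu)
            mtu = mtus[i];
    return mtu;
}

/* Number of fragments needed to carry msg_size bytes (rounds up). */
size_t toy_fragment_count(size_t msg_size, size_t mtu)
{
    assert(mtu > 0);
    return (msg_size + mtu - 1) / mtu;
}
```

With the same fragment size on both endpoints, the BMM choice can stay symmetric on the sending and receiving sides, which is the advantage the slide lists.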
37Channels
- Real channels
- Correspond to one network adapter
- Do not necessarily cover every node
- Virtual channels
- Cover every node
- Contain 2 kinds of real channels
- Special channels for messages to be retransmitted
- Regular channels for other messages
38Virtual channels
One virtual channel
Real SCI channels
Real Myrinet channels
1
2
3
4
Special channel
Regular channel
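The channel picture can be read as a small routing rule: if source and destination share a network, the message travels directly on a regular channel; otherwise it is sent to a gateway node on a special channel and retransmitted from there. The sketch below encodes that rule for a single gateway. The bitmask representation of networks, the single-gateway restriction and all toy_ names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NET_SCI     1u
#define TOY_NET_MYRINET 2u

typedef struct {
    int      next_hop;     /* node that receives the message on this hop */
    unsigned network;      /* network used for this hop */
    int      use_special;  /* 1: special channel (to be forwarded), 0: regular */
} toy_route_t;

/* Isolate the lowest set bit, i.e. pick one network out of a mask. */
static unsigned lowest_bit(unsigned x) { return x & (~x + 1u); }

/* node_nets[i] is the bitmask of networks node i is attached to. */
toy_route_t toy_route(const unsigned *node_nets, int src, int dst, int gateway)
{
    assert(node_nets != NULL);
    toy_route_t r;
    unsigned common = node_nets[src] & node_nets[dst];
    if (common != 0) {
        /* Direct delivery on a shared network. */
        r.next_hop = dst;
        r.network = lowest_bit(common);
        r.use_special = 0;
    } else {
        /* No shared network: hand the message to the gateway. */
        unsigned to_gateway = node_nets[src] & node_nets[gateway];
        assert(to_gateway != 0);
        r.next_hop = gateway;
        r.network = lowest_bit(to_gateway);
        r.use_special = 1;
    }
    return r;
}
```

Keeping forwarded traffic on its own channel means a gateway can tell from the channel alone whether an incoming message is addressed to it or must be passed on.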
39The gateway
Application
Polling thread
Forwarding thread
Myrinet
SCI
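The gateway pairs a polling thread, which receives from one network, with a forwarding thread, which sends on the other. Some bounded hand-off structure between the two is implied. The sketch below shows one plausible form, a fixed-size FIFO, without the threads themselves: a real gateway would need locking or a lock-free scheme around it, and nothing here is taken from the Madeleine II sources. The slot count and all toy_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUEUE_SLOTS 4

typedef struct {
    int    dest;  /* final destination node */
    size_t size;  /* payload size in bytes */
} toy_packet_t;

typedef struct {
    toy_packet_t slots[TOY_QUEUE_SLOTS];
    size_t       head;   /* next slot to pop */
    size_t       count;  /* packets waiting to be forwarded */
} toy_queue_t;

void toy_queue_init(toy_queue_t *q) { q->head = 0; q->count = 0; }

/* Polling side: returns 0 when the queue is full (back-pressure). */
int toy_queue_push(toy_queue_t *q, toy_packet_t p)
{
    if (q->count == TOY_QUEUE_SLOTS)
        return 0;
    q->slots[(q->head + q->count) % TOY_QUEUE_SLOTS] = p;
    q->count++;
    return 1;
}

/* Forwarding side: returns 0 when nothing is waiting. */
int toy_queue_pop(toy_queue_t *q, toy_packet_t *out)
{
    assert(out != NULL);
    if (q->count == 0)
        return 0;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % TOY_QUEUE_SLOTS;
    q->count--;
    return 1;
}
```

A bounded queue keeps the two sides decoupled while capping gateway memory: when the outgoing network is slower, the polling side sees the full queue and stops pulling data in.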
40Latency
41Bandwidth
42Conclusion
- Madeleine II: a generic communication interface
- portable
- efficient
- Madeleine II as a basis for grid communication layers
- Nexus
- MPICH
- Madeleine II on network-heterogeneous clusters
- Myrinet + SCI
- Heterogeneous MPICH/Madeleine
- Department-level grid computing?