Title: Alexandre DENIS
Meta-communications in Component-based Communication Frameworks for Grids
- Alexandre DENIS
- INRIA/LaBRI, Bordeaux, France
- Alexandre.Denis_at_inria.fr
HPC-GECO, Paris - June 2006
Outline
- Introduction
- Meta-communications
- definition
- goal
- requirements
- A flexible approach for meta-communications
- Implementation in PadicoTM
- Evaluation
- Conclusion
Context and Goals
- Network communications on grids
The Grid Computer
- Hierarchical networks
- WAN: Internet, ATM, VTHD (2.5 Gbit/s), etc.
- LAN: Ethernet
- SAN: Myrinet, SCI, InfiniBand
- Computing resources
- Clusters of PC
- Supercomputers
- Visualization hardware
- ...
- Heterogeneity!
[Figure: homogeneous clusters (SAN) and a parallel computer, interconnected through LAN and WAN links]
Communication schemes
- Example 1
- Two clusters connected through a WAN
- MPI code coupled with CORBA
[Figure: two clusters, each with its own SAN, connected through a WAN]
Communication schemes
- Example 1
- An MPI code on each cluster (over a SAN)
- code coupling through CORBA over the WAN
[Figure: MPI within each cluster, CORBA between the clusters, with a firewall and NAT in the path]
Communication schemes
- Example 2
- One MPI communicator spanning two clusters
[Figure: a single MPI communicator spanning a Myrinet cluster and an InfiniBand cluster]
Communication schemes
- Example 3
- Computation on a cluster
- Cluster protected by a firewall
- Visualization on a dedicated node
[Figure: MPI computation on a firewalled cluster, linked through CORBA to visualization on a laptop with a dynamic IP address]
Our Goals
- Problems raised by communications on grids
- Connectivity
- Sites protected by firewalls
- Non-routed private IP addresses, NAT
- Security
- Ensure privacy of transmitted data
- Protect from intruders
- Performance
- Use high-performance networks where available
- Use WAN-optimized methods where needed
- e.g. parallel streams when the latency x bandwidth product is high
- TCP splicing instead of tunneling where possible
- Component-based communication frameworks are designed to provide such flexibility!
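The "latency x bandwidth" criterion becomes concrete through the bandwidth-delay product (BDP). The sketch below is minimal and assumes that one TCP stream can send at most one window per round trip. The 1 Gbit/s link, the 20 ms round-trip time and the 64 KiB window are illustrative assumptions, not measurements from this work.

```python
import math


def bandwidth_delay_product(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep the link busy."""
    return bandwidth_bps / 8 * rtt_s


def single_stream_throughput(window_bytes: int, rtt_s: float) -> float:
    """Upper bound (bytes/s) for one stream sending one window per round trip."""
    return window_bytes / rtt_s


def streams_needed(bandwidth_bps: float, rtt_s: float, window_bytes: int) -> int:
    """Number of parallel streams whose windows together cover the BDP."""
    return math.ceil(bandwidth_delay_product(bandwidth_bps, rtt_s) / window_bytes)


if __name__ == "__main__":
    # Hypothetical WAN link: 1 Gbit/s, 20 ms round trip, 64 KiB TCP window.
    bw, rtt, win = 1e9, 0.020, 64 * 1024
    print(f"BDP: {bandwidth_delay_product(bw, rtt) / 1e6:.2f} MB")
    print(f"one stream: {single_stream_throughput(win, rtt) * 8 / 1e6:.1f} Mbit/s")
    print(f"streams needed: {streams_needed(bw, rtt, win)}")
```

With these figures, a single stream is capped near 26 Mbit/s and about 39 streams are needed to fill the link. With the same bandwidth on a LAN with a 0.1 ms round trip, one stream is enough.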
Component-based communication framework
- Components as building blocks for the communication stack
- e.g. Globus XIO, NetIbis, PadicoTM
[Figure: middleware (MPI, CORBA, JVM, DSM) on top of a component-based communication framework, itself on top of Myrinet, InfiniBand, Quadrics and TCP/IP]
Meta-communications
Meta-Communications
- Establishing data connections requires auxiliary communications
- transmit the assembly description
- communication methods need brokering, e.g.
- TCP needs port number exchange
- TCP splicing needs synchronization
- Myrinet/InfiniBand/... need channel number negotiation
- ...
- Meta-communications (or control communications)
Meta-communications example
- A client connects to a server
[Figure: a client connecting to a server that listens on TCP port 14235]
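The port-number exchange can be sketched in a few lines. The sketch is minimal and assumes both peers run in one process. A `queue.Queue` stands in for the meta-communication channel. The server binds to a port chosen by the OS instead of a fixed one such as 14235, so it must announce that port before the client can connect.

```python
import queue
import socket
import threading


def server(meta: "queue.Queue[int]") -> None:
    # Data channel: listen on an ephemeral port picked by the OS.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as lsock:
        lsock.bind(("127.0.0.1", 0))
        lsock.listen(1)
        port = lsock.getsockname()[1]
        # Meta-communication: tell the client which port to connect to.
        meta.put(port)
        conn, _ = lsock.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())


def client(meta: "queue.Queue[int]") -> bytes:
    # The client cannot open the data connection before it learns the port.
    port = meta.get(timeout=5)
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(b"hello")
        return sock.recv(1024)


if __name__ == "__main__":
    meta_channel: "queue.Queue[int]" = queue.Queue()
    t = threading.Thread(target=server, args=(meta_channel,))
    t.start()
    print(client(meta_channel))
    t.join()
```

Without the `meta.put(port)` step, the client would have to rely on a well-known port agreed upon in advance.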
The weakest link
- Meta-communication properties determine those of the whole communication framework
- Connectivity
- no data connection without a meta-communication channel
- Security
- anyone able to connect to the meta-communication channel is able to establish a data connection with any assembly
- Performance
- messaging on the meta-communication channel is on the critical path of data connection establishment
- the performance of meta-communications limits data connection establishment time
Do without meta-communications?
- Try to eliminate the weakest link
- no brokered communication methods
- no TCP splicing, no reverse connection
- needs to use well-known ports
- no assembly negotiation
- the server must guess what clients will use!
- all clients must use the same communication method
- Limited to static communication schemes
A flexible approach for meta-communications
- A two-step bootstrap approach
Component-based meta-communications
- The meta-communication channel must meet the requirements
- Connectivity
- Security
- Performance
- Idea: make the meta-communication channel itself component-based
- Problem: this needs a meta²-communication channel!
- meta-communications for the meta-communications
Two-step bootstrap
- Two levels of meta-communications
- control channel
- meta-communications for the data channel
- bootstrap channel
- meta-communications for the control channel
- Requirements for the bootstrap channel
- connectivity, security
- low performance requirements
- bootstrap channel used only for control channel establishment
- routing is acceptable
- uses no meta-communications
- static component stack (no assembly negotiation)
Bootstrap example
- Bootstrap phase 1: establish the bootstrap channel
- a rendez-vous node is listening
- nodes connect to the rendez-vous node
- using a secure connection
- the rendez-vous node routes messages
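Phase 1 amounts to a star topology with message forwarding. This is a minimal in-process sketch. The `RendezVous` class, its methods and the node names are hypothetical illustrations, not PadicoTM's API. Queues replace the secure connections that each node would open towards the rendez-vous node.

```python
import queue
from dataclasses import dataclass


@dataclass
class Message:
    src: str
    dst: str
    payload: bytes


class RendezVous:
    """Hypothetical rendez-vous node: every node connects to it, and it
    forwards messages between nodes that cannot reach each other directly."""

    def __init__(self) -> None:
        self._inboxes: dict[str, "queue.Queue[Message]"] = {}

    def connect(self, node_id: str) -> "queue.Queue[Message]":
        # Stands in for an outgoing secure connection opened by the node,
        # which can cross its firewall or NAT since it is outbound.
        inbox: "queue.Queue[Message]" = queue.Queue()
        self._inboxes[node_id] = inbox
        return inbox

    def route(self, msg: Message) -> None:
        # Forward the message to the destination node's connection.
        if msg.dst not in self._inboxes:
            raise KeyError(f"unknown node {msg.dst!r}")
        self._inboxes[msg.dst].put(msg)


if __name__ == "__main__":
    rdv = RendezVous()
    inbox_a = rdv.connect("cluster-a/node0")
    inbox_b = rdv.connect("laptop")
    rdv.route(Message("laptop", "cluster-a/node0", b"hello"))
    print(inbox_a.get_nowait())
```

All traffic goes through a single node. This is acceptable here because the bootstrap channel is only used to set up control channels.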
Bootstrap example
- Bootstrap phase 2: establish the control channel
- use the bootstrap channel for meta-communications
- establish an optimized control channel
[Figure: InfiniBand parameters (LID 0x1234, QPN 2) exchanged over the bootstrap channel]
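The values in the figure are the kind of parameters a peer must learn before it can connect an InfiniBand queue pair: a 16-bit LID and a 24-bit QPN. This sketch assumes a hypothetical 6-byte wire format for sending them over the bootstrap channel. A real queue-pair setup usually needs more fields, such as a starting packet sequence number.

```python
import struct

# Hypothetical wire format: network byte order, no padding,
# 16-bit LID followed by the 24-bit QPN stored in 32 bits.
_IB_PARAMS = struct.Struct("!HI")


def pack_ib_params(lid: int, qpn: int) -> bytes:
    if not 0 <= lid < 1 << 16:
        raise ValueError("LID must fit in 16 bits")
    if not 0 <= qpn < 1 << 24:
        raise ValueError("QPN must fit in 24 bits")
    return _IB_PARAMS.pack(lid, qpn)


def unpack_ib_params(data: bytes) -> tuple[int, int]:
    lid, qpn = _IB_PARAMS.unpack(data)
    return lid, qpn


if __name__ == "__main__":
    msg = pack_ib_params(0x1234, 2)
    print(msg.hex())  # 123400000002
    print(unpack_ib_params(msg))
```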
The PadicoTM Framework
- Communication framework designed for grids
- Transparent access to various networks
- Myrinet, Infiniband, Quadrics, TCP/IP, ...
- Supports pluggable communication methods
- Supports a wide range of middleware systems
- MPI, CORBA, SOAP, Java, DSM, HLA
[Figure: PadicoTM architecture: middleware 1 to 4 on top, communication methods inside PadicoTM, networks 1 to 4 below]
Implementation in PadicoTM
- Available communication methods
- plain TCP, high-performance networks (Myrinet, InfiniBand, Quadrics, ...), shmem
- methods for WAN: TCP splicing, SOCKS proxy, SSH tunneling
- data filters: LZO and ZIP compression, GnuTLS security
- routing
- Bootstrap connection through TLS/TCP
- bootstrap through SSH tunnels if the rendez-vous node is not directly reachable
- Default configuration for the optimized control channel
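Data filters are stacked components, and both ends must apply the same stack in mirrored order. This sketch assumes a hypothetical textual assembly description such as `"zlib/crc32"`. zlib stands in for the LZO/ZIP filters, and a CRC-32 checksum filter takes the place of GnuTLS, which would require a real TLS library. None of these names are PadicoTM's configuration syntax.

```python
import zlib
from typing import Callable

# Hypothetical filter registry: each component has an encode and a decode step.
Filter = tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]


def _crc_encode(data: bytes) -> bytes:
    # Prefix the payload with its CRC-32 (4 bytes, big-endian).
    return zlib.crc32(data).to_bytes(4, "big") + data


def _crc_decode(data: bytes) -> bytes:
    crc, body = int.from_bytes(data[:4], "big"), data[4:]
    if zlib.crc32(body) != crc:
        raise ValueError("checksum mismatch")
    return body


FILTERS: dict[str, Filter] = {
    "zlib": (zlib.compress, zlib.decompress),
    "crc32": (_crc_encode, _crc_decode),
}


def send_through(assembly: str, payload: bytes) -> bytes:
    # Apply filters in order, e.g. "zlib/crc32": compress, then checksum.
    for name in assembly.split("/"):
        payload = FILTERS[name][0](payload)
    return payload


def receive_through(assembly: str, wire: bytes) -> bytes:
    # Undo the same stack in reverse order. The receiver can only do this
    # if it knows the assembly, hence the need to exchange it beforehand.
    for name in reversed(assembly.split("/")):
        wire = FILTERS[name][1](wire)
    return wire


if __name__ == "__main__":
    msg = b"grid " * 100
    wire = send_through("zlib/crc32", msg)
    print(len(msg), len(wire), receive_through("zlib/crc32", wire) == msg)
```

The assembly string is exactly the kind of information that has to travel over the meta-communication channel before any data is sent.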
Two-step bootstrap optimization
- Lazy connections on the control channel
- establish the control connection when the first data connection is established
- reduces application startup time
- Federation of rendez-vous nodes
- reduces the load on each rendez-vous node
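Lazy establishment is essentially memoization keyed by peer. This is a minimal sketch that assumes a callback performing the costly exchange over the bootstrap channel. `LazyControlChannel` and `open_control` are hypothetical names, not PadicoTM's API.

```python
from typing import Callable


class LazyControlChannel:
    """Hypothetical sketch: the control connection to a peer is opened only
    when the first data connection to that peer is requested."""

    def __init__(self, open_control: Callable[[str], object]) -> None:
        # Costly step: goes through the bootstrap channel.
        self._open_control = open_control
        self._control: dict[str, object] = {}

    def control_for(self, peer: str) -> object:
        # Open the control connection on first use, then reuse it.
        if peer not in self._control:
            self._control[peer] = self._open_control(peer)
        return self._control[peer]

    def open_data_connection(self, peer: str) -> str:
        ctrl = self.control_for(peer)
        # The data channel assembly would be negotiated over `ctrl` here.
        return f"data connection to {peer} via {ctrl}"


if __name__ == "__main__":
    opened: list[str] = []

    def fake_open(peer: str) -> str:
        opened.append(peer)
        return f"ctrl[{peer}]"

    ch = LazyControlChannel(fake_open)  # nothing is opened at startup
    ch.open_data_connection("b")
    ch.open_data_connection("b")
    print(opened)  # ['b']
```

A real implementation would also need locking when several threads open data connections to the same peer concurrently.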
Evaluation
- Connectivity
- achieves all-to-all connectivity in the aforementioned cases
- NAT, private addresses, dynamic IP addresses (not in DNS)
- firewalls
Evaluation
- Security
- user-configurable assembly
- TLS usable for bootstrap/control/data
- firewall crossing doesn't compromise security
- users may opt out of encryption/authentication
Evaluation
- Performance
- high-performance networks are actually usable for both control channel and data communications
- e.g. PadicoTM virtual sockets over InfiniBand
- latency: 5.7 µs
- bandwidth: 708 MB/s
- connection establishment: 30 µs
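The latency and bandwidth figures can be read together through a common first-order model, T(n) = L + n/B. This model is not one proposed in this work. It gives the message size n_1/2 = L x B at which half of the peak bandwidth is reached. The sketch assumes 1 MB = 10^6 bytes.

```python
def transfer_time_s(size_bytes: float, latency_s: float, bandwidth_Bps: float) -> float:
    """Linear model: one-way time = latency + size / bandwidth."""
    return latency_s + size_bytes / bandwidth_Bps


LATENCY = 5.7e-6   # 5.7 µs, as measured above
BANDWIDTH = 708e6  # 708 MB/s, assuming 1 MB = 10**6 bytes

# Size at which effective bandwidth n / T(n) equals BANDWIDTH / 2.
half_bw_size = LATENCY * BANDWIDTH

if __name__ == "__main__":
    print(f"n_1/2 = {half_bw_size:.0f} bytes")
    for size in (8, 4096, 1 << 20):
        t = transfer_time_s(size, LATENCY, BANDWIDTH)
        print(f"{size:>8} B: {t * 1e6:8.1f} µs, {size / t / 1e6:6.1f} MB/s")
```

Under this model, messages of roughly 4 KB already reach half of the measured bandwidth. The 30 µs connection establishment time corresponds to about five one-way latencies.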
Conclusion
- Component-based communication frameworks bring the required flexibility for grids
- they require a meta-communication channel
- Our contribution
- a flexible approach to meta-communications
- no compromise regarding connectivity, security and performance
- model actually implemented in PadicoTM
- code available
- http://runtime.futurs.inria.fr/PadicoTM/
Future work
- More communication methods
- e.g. Grid Security Infrastructure (GSI)
- Large scale experiments
- test scalability with thousands of nodes
- Fault-tolerance
- required for large scale experiments