GridMPI: Grid Enabled MPI
- Yutaka Ishikawa
- University of Tokyo and AIST
Motivation
- MPI has been widely used to program parallel applications
- Users want to run such applications over the Grid environment without any modification of the program
- However, the performance of existing MPI implementations does not scale up in the Grid environment
[Figure: a single (monolithic) MPI application running over the Grid environment, spanning computing resource sites A and B across a wide-area network]
Motivation
- Focus on a metropolitan-area, high-bandwidth environment: ≥ 10 Gbps, ≤ 500 miles (less than 10 ms one-way latency)
- Internet bandwidth in the Grid ≥ interconnect bandwidth in a cluster
  - 10 Gbps vs. 1 Gbps
  - 100 Gbps vs. 10 Gbps
Motivation
- Focus on a metropolitan-area, high-bandwidth environment: ≥ 10 Gbps, ≤ 500 miles (less than 10 ms one-way latency)
- We have already demonstrated, using an emulated WAN environment, that the NAS Parallel Benchmark programs scale up when the one-way latency is smaller than 10 ms
  - Motohiko Matsuda, Yutaka Ishikawa, and Tomohiro Kudoh, "Evaluation of MPI Implementations on Grid-connected Clusters using an Emulated WAN Environment," CCGRID 2003, 2003
Issues
- High-performance communication facilities for MPI on long and fat networks
  - TCP vs. MPI communication patterns
  - Network topology
  - Latency and bandwidth
- Interoperability
  - There are many MPI library implementations, and most of them use their own network protocol
- Fault tolerance and migration
  - To survive a site failure
- Security
GridMPI Features
- MPI-2 implementation
- YAMPII, developed at the University of Tokyo, is used as the core implementation
  - Intra-cluster communication by YAMPII (TCP/IP or SCore)
  - Inter-cluster communication by the IMPI (Interoperable MPI) protocol, with extensions for the Grid (MPI-2, new collective protocols); a sketch of this intra/inter routing idea follows this list
- Integration of vendor MPIs
  - IBM Regatta MPI, MPICH2, Solaris MPI, Fujitsu MPI, (NEC SX MPI)
- Incremental checkpointing
- High-performance TCP/IP implementation
- LAC: Latency-Aware Collectives
  - bcast/allreduce algorithms have been developed (to appear at the Cluster 2006 conference)
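The intra/inter split above can be pictured as a per-destination dispatch inside the library: messages between ranks on the same site take the local YAMPII transport, while messages that cross sites go through IMPI. The following is a minimal, self-contained sketch of that routing decision only; cluster_of(), yampii_send(), impi_send(), and the 8-ranks-per-site layout are hypothetical stand-ins (the layout is borrowed from the experiments later in the talk), not GridMPI's actual internals.

```c
/* Minimal sketch of intra-cluster vs. inter-cluster message routing. */
#include <stdio.h>

#define RANKS_PER_CLUSTER 8   /* assumption: 8 ranks per site, as in the experiments */

static int cluster_of(int rank) { return rank / RANKS_PER_CLUSTER; }

/* Hypothetical transport back ends, stubbed out for illustration. */
static void yampii_send(int src, int dst) { printf("%d -> %d via intra-cluster YAMPII\n", src, dst); }
static void impi_send(int src, int dst)   { printf("%d -> %d via inter-cluster IMPI\n", src, dst); }

static void grid_send(int src, int dst)
{
    if (cluster_of(src) == cluster_of(dst))
        yampii_send(src, dst);   /* same site: low-latency local transport      */
    else
        impi_send(src, dst);     /* different sites: interoperable WAN protocol */
}

int main(void)
{
    grid_send(0, 3);    /* both ranks in cluster 0 */
    grid_send(0, 12);   /* crosses the wide-area network */
    return 0;
}
```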
High-Performance Communication Mechanisms for Long and Fat Networks
- Modifications of TCP behavior
  - M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and Y. Ishikawa, "TCP Adaptation for MPI on Long-and-Fat Networks," IEEE Cluster 2005, 2005
- Precise software pacing
  - R. Takano, T. Kudoh, Y. Kodama, M. Matsuda, H. Tezuka, and Y. Ishikawa, "Design and Evaluation of Precise Software Pacing Mechanisms for Fast Long-Distance Networks," PFLDnet 2005, 2005
- Collective communication algorithms that take network latency and bandwidth into account (a sketch of the hierarchical broadcast idea appears after this list)
  - M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and Y. Ishikawa, "Efficient MPI Collective Operations for Clusters in Long-and-Fast Networks," to appear at IEEE Cluster 2006
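The hierarchical idea behind latency-aware collectives can be illustrated with a plain MPI program: split the world communicator into per-cluster communicators plus a communicator of cluster leaders, cross the wide-area link only once per cluster, and fan the data out on the local interconnect. This is only a minimal sketch under the assumptions of 8 ranks per cluster and the broadcast root at global rank 0; the actual LAC bcast/allreduce algorithms in the Cluster 2006 paper may differ in detail.

```c
/* Minimal sketch of a latency-aware (hierarchical) broadcast. */
#include <mpi.h>
#include <string.h>

#define RANKS_PER_CLUSTER 8   /* assumption: 8 ranks per site */

/* Assumes the broadcast root is global rank 0. */
void latency_aware_bcast(void *buf, int count, MPI_Datatype type, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    int cluster  = rank / RANKS_PER_CLUSTER;   /* which site this rank is on  */
    int local_id = rank % RANKS_PER_CLUSTER;   /* rank within its own cluster */

    MPI_Comm intra, leaders;
    MPI_Comm_split(comm, cluster, local_id, &intra);              /* one comm per cluster */
    MPI_Comm_split(comm, local_id == 0 ? 0 : MPI_UNDEFINED,
                   cluster, &leaders);                            /* cluster leaders only */

    if (leaders != MPI_COMM_NULL)                  /* step 1: one WAN transfer per cluster */
        MPI_Bcast(buf, count, type, 0, leaders);

    MPI_Bcast(buf, count, type, 0, intra);         /* step 2: fan out on the local network */

    if (leaders != MPI_COMM_NULL)
        MPI_Comm_free(&leaders);
    MPI_Comm_free(&intra);
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    char msg[64] = "";
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)
        strcpy(msg, "hello over the WAN");
    latency_aware_bcast(msg, sizeof msg, MPI_CHAR, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```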
Evaluation
- It is almost impossible to reproduce the communication behavior of a real wide-area network for repeatable experiments
- A WAN emulator, GtrcNET-1, is therefore used to examine implementations, protocols, communication algorithms, etc., in a controlled manner
- GtrcNET-1 was developed at AIST
  - injection of delay, jitter, and errors
  - traffic monitoring and frame capture
  - http://www.gtrc.aist.go.jp/gnet/
Experimental Environment
[Figure: two clusters of 8 PCs each (up to Node7 and Node15) connected through the WAN emulator]
- Bandwidth: 1 Gbps
- Delay: 0 ms to 10 ms
- CPU: Pentium 4 / 2.4 GHz; Memory: DDR400, 512 MB
- NIC: Intel PRO/1000 (82547EI)
- OS: Linux 2.6.9-1.6 (Fedora Core 2)
- Socket buffer size: 20 MB (the sketch after this list shows why a buffer of this order is needed)
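The large socket buffer can be motivated by the bandwidth-delay product of the emulated link: at 1 Gbps with a 10 ms one-way delay (20 ms round trip), TCP must keep about 2.5 MB in flight to fill the pipe, so 20 MB leaves ample headroom. The sketch below computes that product and shows the standard SO_SNDBUF/SO_RCVBUF calls; it illustrates the tuning idea and is not GridMPI code. (On Linux, net.core.rmem_max and net.core.wmem_max must also permit such a size.)

```c
/* Minimal sketch: bandwidth-delay product and socket buffer sizing. */
#include <stdio.h>
#include <sys/socket.h>

int main(void)
{
    const double bandwidth_bps = 1e9;     /* 1 Gbps link                 */
    const double rtt_s = 0.020;           /* 10 ms one-way = 20 ms RTT   */
    const double bdp_bytes = bandwidth_bps / 8.0 * rtt_s;

    printf("bandwidth-delay product: %.1f MB\n", bdp_bytes / 1e6);   /* = 2.5 MB */

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int buf_size = 20 * 1024 * 1024;      /* 20 MB, as in the experiments */
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &buf_size, sizeof buf_size) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &buf_size, sizeof buf_size) < 0)
        perror("setsockopt");
    return 0;
}
```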
GridMPI vs. MPICH-G2 (1/4)
[Figure: relative performance of FT (Class B), NAS Parallel Benchmarks 3.2, on 8 x 8 processes, plotted against one-way delay (msec)]
GridMPI vs. MPICH-G2 (2/4)
[Figure: relative performance of IS (Class B), NAS Parallel Benchmarks 3.2, on 8 x 8 processes, plotted against one-way delay (msec)]
GridMPI vs. MPICH-G2 (3/4)
[Figure: relative performance of LU (Class B), NAS Parallel Benchmarks 3.2, on 8 x 8 processes, plotted against one-way delay (msec)]
GridMPI vs. MPICH-G2 (4/4)
[Figure: relative performance of NAS Parallel Benchmarks 3.2, Class B, on 8 x 8 processes, plotted against one-way delay (msec)]
- No parameters were tuned in GridMPI
GridMPI on an Actual Network
- NAS Parallel Benchmarks run on an 8-node (2.4 GHz) cluster at Tsukuba and an 8-node (2.8 GHz) cluster at Akihabara (16 nodes in total)
- The performance is compared with
  - the result on 16 nodes (2.4 GHz)
  - the result on 16 nodes (2.8 GHz)
GridMPI Now and Future
- GridMPI version 1.0 has been released
- Conformance tests
  - MPICH Test Suite: 0/142 (failures/tests)
  - Intel Test Suite: 0/493 (failures/tests)
- GridMPI is integrated into the NaReGI package
- Extension of the IMPI specification
  - Refine the current extensions
  - The collective communication and checkpoint algorithms cannot be fixed once and for all in the specification; the current idea is to specify the mechanisms for
    - dynamic algorithm selection
    - dynamic algorithm shipment and loading
    - a virtual machine on which the algorithms are implemented
Dynamic Algorithm Shipment
- A collective communication algorithm is implemented on the virtual machine
- The code is shipped to all MPI processes
- The MPI runtime library interprets the algorithm to perform the inter-cluster collective communication (a minimal sketch of such an interpreter follows)
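To make the shipment idea concrete, the sketch below shows one way a runtime could interpret a shipped algorithm: the algorithm is a small list of instructions that every process walks through, issuing sends and receives. The opcode set, instr_t, and the stubbed vm_send/vm_recv are hypothetical; the talk does not specify GridMPI's actual virtual machine design.

```c
/* Minimal sketch of interpreting a shipped collective algorithm. */
#include <stdio.h>

typedef enum { OP_SEND, OP_RECV, OP_DONE } opcode_t;

typedef struct {
    opcode_t op;
    int      peer;   /* rank to exchange data with */
} instr_t;

/* Stub transports standing in for the real inter-cluster communication. */
static void vm_send(int me, int peer) { printf("rank %d: send to %d\n", me, peer); }
static void vm_recv(int me, int peer) { printf("rank %d: recv from %d\n", me, peer); }

/* Each MPI process interprets the same shipped program. */
static void interpret(int me, const instr_t *prog)
{
    for (; prog->op != OP_DONE; prog++) {
        if (prog->op == OP_SEND) vm_send(me, prog->peer);
        else                     vm_recv(me, prog->peer);
    }
}

int main(void)
{
    /* A shipped two-step exchange between cluster leaders 0 and 8. */
    const instr_t prog_rank0[] = { { OP_SEND, 8 }, { OP_RECV, 8 }, { OP_DONE, 0 } };
    interpret(0, prog_rank0);
    return 0;
}
```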
Concluding Remarks
- Our main concern is the metropolitan-area network
  - high-bandwidth environment: ≥ 10 Gbps, ≤ 500 miles (less than 10 ms one-way latency)
- Overseas (≈ 100 ms one-way latency)
  - Applications must be aware of the communication latency
  - Data movement using MPI-IO?
- Collaborations
  - We would like to invite people who are interested in this work to collaborate with us