GridMPI: Grid Enabled MPI - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: GridMPI: Grid Enabled MPI


1
GridMPI: Grid Enabled MPI
  • Yutaka Ishikawa
  • University of Tokyo
  • and
  • AIST

2
Motivation
  • MPI has been widely used to program parallel
    applications
  • Users want to run such applications over the Grid
    environment without any modification of the
    programs
  • However, the performance of existing MPI
    implementations does not scale up in a Grid
    environment

[Figure: a single (monolithic) MPI application spanning
computing resource sites A and B over a wide-area network]
3
Motivation
  • Focus on a metropolitan-area, high-bandwidth
    environment: ≥ 10 Gbps, ≤ 500 miles (less than
    10 ms one-way latency)
  • Internet bandwidth in the Grid exceeds the
    interconnect bandwidth within a cluster
      • 10 Gbps vs. 1 Gbps
      • 100 Gbps vs. 10 Gbps

4
Motivation
  • Focus on a metropolitan-area, high-bandwidth
    environment: ≥ 10 Gbps, ≤ 500 miles (less than
    10 ms one-way latency)
  • We have already demonstrated that the performance
    of the NAS Parallel Benchmark programs scales up
    if the one-way latency is smaller than 10 ms,
    using an emulated WAN environment

Motohiko Matsuda, Yutaka Ishikawa, and Tomohiro
Kudoh, "Evaluation of MPI Implementations on
Grid-connected Clusters using an Emulated WAN
Environment," CCGrid 2003, 2003.
5
Issues
  • High Performance Communication Facilities for MPI
    on Long and Fat Networks
      • TCP vs. MPI communication patterns
      • Network topology
      • Latency and bandwidth
  • Interoperability
      • Most MPI library implementations use their own
        network protocol
  • Fault Tolerance and Migration
      • To survive a site failure
  • Security

8
Issues
  • High Performance Communication Facilities for MPI
    on Long and Fat Networks
      • TCP vs. MPI communication patterns
      • Network topology
      • Latency and bandwidth
  • Interoperability
      • There are many MPI library implementations;
        most use their own network protocol
  • Fault Tolerance and Migration
      • To survive a site failure
  • Security

9
GridMPI Features
  • MPI-2 implementation
      • YAMPII, developed at the University of Tokyo, is
        used as the core implementation
      • Intra-cluster communication by YAMPII (TCP/IP,
        SCore)
      • Inter-cluster communication by IMPI (Interoperable
        MPI) protocol, with extensions for the Grid:
        MPI-2 features and new collective protocols
      • Integration of vendor MPI: IBM Regatta MPI,
        MPICH2, Solaris MPI, Fujitsu MPI, (NEC SX MPI)
      • Incremental checkpoint
  • High-performance TCP/IP implementation
  • LAC: Latency-Aware Collectives
      • bcast/allreduce algorithms have been developed
        (to appear at the Cluster 2006 conference); a
        two-level broadcast sketch follows this list
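
The latency-aware collectives are not spelled out here, but the general idea behind such algorithms is to cross the high-latency WAN link as few times as possible. The following is a minimal two-level broadcast written against the standard MPI API; it is only a sketch of that idea, not the GridMPI/LAC implementation, and the names hierarchical_bcast and cluster_id are invented for illustration. It assumes the data originates at global rank 0 and that rank 0 is also the leader (local rank 0) of its cluster.

/* Hedged sketch: a two-level, latency-aware broadcast (not GridMPI code).
 * cluster_id tells each rank which site it runs on; the broadcast crosses
 * the WAN only between per-cluster leaders and then fans out locally.
 * Assumes the data starts at global rank 0, the leader of its cluster. */
#include <mpi.h>

int hierarchical_bcast(void *buf, int count, MPI_Datatype type,
                       MPI_Comm comm, int cluster_id)
{
    int rank, local_rank;
    MPI_Comm local, leaders;

    MPI_Comm_rank(comm, &rank);

    /* Ranks with the same cluster_id form a low-latency local communicator. */
    MPI_Comm_split(comm, cluster_id, rank, &local);
    MPI_Comm_rank(local, &local_rank);

    /* Local rank 0 of every cluster joins the leaders (WAN) communicator. */
    MPI_Comm_split(comm, local_rank == 0 ? 0 : MPI_UNDEFINED, rank, &leaders);

    /* Step 1: one message per remote cluster crosses the WAN. */
    if (leaders != MPI_COMM_NULL) {
        MPI_Bcast(buf, count, type, 0, leaders);  /* leader of rank 0's cluster */
        MPI_Comm_free(&leaders);
    }

    /* Step 2: each leader broadcasts inside its own low-latency cluster. */
    MPI_Bcast(buf, count, type, 0, local);
    MPI_Comm_free(&local);
    return MPI_SUCCESS;
}

Splitting communicators this way is plain MPI-1 functionality; the actual LAC algorithms additionally choose and tune algorithms with respect to latency and bandwidth, as described in the Cluster 2006 paper cited on the next slide.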

10
High-performance Communication Mechanisms in the
Long and Fat Network
  • Modifications of TCP behavior
      • M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and
        Y. Ishikawa, "TCP Adaptation for MPI on
        Long-and-Fat Networks," IEEE Cluster 2005, 2005.
  • Precise software pacing (a user-level sketch of the
    pacing idea follows this list)
      • R. Takano, T. Kudoh, Y. Kodama, M. Matsuda, H.
        Tezuka, and Y. Ishikawa, "Design and Evaluation
        of Precise Software Pacing Mechanisms for Fast
        Long-Distance Networks," PFLDnet 2005, 2005.
  • Collective communication algorithms with respect
    to network latency and bandwidth
      • M. Matsuda, T. Kudoh, Y. Kodama, R. Takano, and
        Y. Ishikawa, "Efficient MPI Collective Operations
        for Clusters in Long-and-Fast Networks," to
        appear at IEEE Cluster 2006.
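
As described in the PFLDnet 2005 paper, PSPacer controls the transmission rate below the socket layer by inserting gap frames between packets, so that bulk transfers leave the host at a precisely controlled rate instead of in bursts. The user-level routine below only illustrates the underlying pacing idea (chunk the data and spread the sends over time); paced_send and its parameters are hypothetical and far coarser than the real driver-level mechanism.

/* Hedged sketch: user-level pacing of a bulk TCP send (illustration only;
 * the real PSPacer paces at the driver level with gap frames). */
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

int paced_send(int sock, const char *buf, size_t len,
               double rate_bps, size_t chunk)
{
    /* time needed to transmit one chunk at the target rate, in nanoseconds */
    double chunk_ns = (double)chunk * 8.0 / rate_bps * 1e9;

    size_t off = 0;
    while (off < len) {
        size_t n = (len - off < chunk) ? (len - off) : chunk;
        ssize_t sent = send(sock, buf + off, n, 0);
        if (sent < 0)
            return -1;                       /* caller inspects errno */
        off += (size_t)sent;

        /* sleep so the average rate stays at or below rate_bps
         * (syscall and scheduling overhead make this only approximate) */
        struct timespec ts = {
            .tv_sec  = (time_t)(chunk_ns / 1e9),
            .tv_nsec = (long)((long long)chunk_ns % 1000000000LL)
        };
        nanosleep(&ts, NULL);
    }
    return 0;
}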

11
Evaluation
  • It is almost impossible to reproduce the
    communication behavior of a wide-area network for
    repeatable performance experiments
  • A WAN emulator, GtrcNET-1, is used to
    scientifically examine implementations,
    protocols, communication algorithms, etc.

GtrcNET-1
  • GtrcNET-1 was developed at AIST
  • Injection of delay, jitter, and errors
  • Traffic monitoring and frame capture

http://www.gtrc.aist.go.jp/gnet/
12
Experimental Environment
[Figure: two clusters of 8 PCs (up to Node7 and Node15)
connected through the WAN emulator]
  • Bandwidth: 1 Gbps
  • Delay: 0 ms -- 10 ms
  • CPU: Pentium 4 / 2.4 GHz, Memory: DDR400 512 MB
  • NIC: Intel PRO/1000 (82547EI)
  • OS: Linux 2.6.9-1.6 (Fedora Core 2)
  • Socket buffer size: 20 MB (see the sketch after
    this list)
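
The 20 MB socket buffer is worth a note: on a long fat path the TCP window must cover at least the bandwidth-delay product, roughly 1 Gbps x 20 ms round trip = 2.5 MB here, and the testbed uses a comfortably larger value. A minimal sketch of requesting such buffers with the standard sockets API follows; it is a generic illustration, not code taken from GridMPI.

/* Hedged sketch: enlarge TCP socket buffers to 20 MB, matching the testbed
 * configuration.  Standard sockets API only; the kernel may clamp the value
 * to net.core.rmem_max / wmem_max, so the effective size is read back. */
#include <stdio.h>
#include <sys/socket.h>

static int set_socket_buffers(int sock)
{
    int size = 20 * 1024 * 1024;   /* 20 MB, well above the ~2.5 MB
                                      bandwidth-delay product of a
                                      1 Gbps / 10 ms one-way path */
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) < 0)
        return -1;

    socklen_t len = sizeof(size);
    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, &len) == 0)
        printf("effective receive buffer: %d bytes\n", size);
    return 0;
}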

13
GridMPI vs. MPICH-G2 (1/4)
FT (Class B) of NAS Parallel Benchmarks 3.2 on 8
x 8 processes
[Chart: relative performance vs. one-way delay (msec)]
14
GridMPI vs. MPICH-G2 (2/4)
IS (Class B) of NAS Parallel Benchmarks 3.2 on 8
x 8 processes
[Chart: relative performance vs. one-way delay (msec)]
15
GridMPI vs. MPICH-G2 (3/4)
LU (Class B) of NAS Parallel Benchmarks 3.2 on 8
x 8 processes
[Chart: relative performance vs. one-way delay (msec)]
16
GridMPI vs. MPICH-G2 (4/4)
NAS Parallel Benchmarks 3.2 Class B on 8 x 8
processes
[Chart: relative performance vs. one-way delay (msec)]
No parameters were tuned in GridMPI
17
GridMPI on Actual Network
  • The NAS Parallel Benchmarks were run using an
    8-node (2.4 GHz) cluster at Tsukuba and an 8-node
    (2.8 GHz) cluster at Akihabara
      • 16 nodes in total
  • The performance is compared with
      • the result using a 16-node (2.4 GHz) cluster
      • the result using a 16-node (2.8 GHz) cluster

18
GridMPI Now and Future
  • GridMPI version 1.0 has been released
  • Conformance tests
      • MPICH Test Suite: 0/142 (fails/tests)
      • Intel Test Suite: 0/493 (fails/tests)
  • GridMPI is integrated into the NaReGI package
  • Extension of the IMPI specification
      • Refine the current extensions
      • The collective communication and checkpoint
        algorithms could not be fixed in the
        specification; the current idea is to specify
        the mechanisms of
          • dynamic algorithm selection
          • dynamic algorithm shipment and loading
          • a virtual machine to implement the algorithms

19
Dynamic Algorithm Shipment
  • A collective communication algorithm is implemented
    in the virtual machine
  • The code is shipped to all MPI processes
  • The MPI runtime library interprets the algorithm to
    perform inter-cluster collective communication (a
    purely illustrative interpreter sketch follows this
    list)
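
Since the virtual machine and its instruction set are explicitly left open on the previous slide, the following is a purely hypothetical sketch of what interpreting a shipped collective algorithm could look like: the algorithm arrives as a small list of operations, and every process walks the list, mapping each operation onto local MPI calls. The opcode names, run_shipped_algorithm, and the two communicators (local within a cluster, leaders across clusters) are all invented for illustration.

/* Purely hypothetical sketch: a collective algorithm is described as a tiny
 * program that is shipped to every process and interpreted locally.  The
 * instruction set is invented; the actual IMPI extension is not yet fixed. */
#include <mpi.h>

enum coll_op { OP_REDUCE_LOCAL, OP_ALLREDUCE_LEADERS, OP_BCAST_LOCAL, OP_END };

static void run_shipped_algorithm(const enum coll_op *prog, double *val,
                                  MPI_Comm local, MPI_Comm leaders)
{
    double tmp;
    int local_rank;
    MPI_Comm_rank(local, &local_rank);

    for (int pc = 0; prog[pc] != OP_END; pc++) {
        switch (prog[pc]) {
        case OP_REDUCE_LOCAL:            /* combine values inside the cluster */
            MPI_Reduce(val, &tmp, 1, MPI_DOUBLE, MPI_SUM, 0, local);
            if (local_rank == 0)
                *val = tmp;              /* only the leader keeps the sum */
            break;
        case OP_ALLREDUCE_LEADERS:       /* cross the WAN once, leaders only */
            if (leaders != MPI_COMM_NULL)
                MPI_Allreduce(MPI_IN_PLACE, val, 1, MPI_DOUBLE, MPI_SUM,
                              leaders);
            break;
        case OP_BCAST_LOCAL:             /* fan the result back out locally */
            MPI_Bcast(val, 1, MPI_DOUBLE, 0, local);
            break;
        default:
            break;
        }
    }
}

A shipped allreduce, for example, would then be the program { OP_REDUCE_LOCAL, OP_ALLREDUCE_LEADERS, OP_BCAST_LOCAL, OP_END }, executed identically on every process.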

20
Concluding Remarks
  • Our main concern is the metropolitan-area network
      • High-bandwidth environment: ≥ 10 Gbps, ≤ 500
        miles (less than 10 ms one-way latency)
  • Overseas (≥ 100 milliseconds one-way latency)
      • Applications must be aware of the communication
        latency
      • Data movement using MPI-IO?
  • Collaborations
      • We would like to ask people who are interested
        in this work for collaboration