Design and Implementation of an Efficient MPI Library for the Grid

1
Design and Implementation of an Efficient MPI
Library for the Grid
  • (2003. 1. 17)
  • Kyung-Lang Park
  • Yonsei Univ. Super Computing Lab.

2
Contents
  • Introduction
  • Characterizing the Grid environments
  • Design of efficient algorithms for the Grid
  • Implementation
  • Experimental Result
  • Conclusion

3
Introduction
  • Emergence of the computational Grid
    • The Grid can provide virtually unlimited resources
    • Users don't need to understand the Grid architecture
  • Challenges
    • Architectural problems
      • Developing new components
      • Adapting existing technologies to Grid
        environments
    • Communication delay
      • Improving the performance of the physical medium
      • Improving the performance of communication
        software and its algorithms

4
Introduction
  • Communication algorithms
    • There is a large body of previous work
      • Communication algorithms: Binomial tree, FNF,
        ECEF, SPOC, alpha tree, MST
      • MPI implementations: MagPIe, MPICH-G2, PACX-MPI
  • Most importantly, we must understand Grid
    environments and apply that understanding to
    algorithm design

5
Introduction
  • Our research steps
    • Characterizing Grid network environments
    • Designing an efficient algorithm for the Grid
    • Implementing the designed algorithm in an MPI
      library

6
Characterizing the Grid
  • Network status can change radically and
    dynamically
    • Intelligent communication algorithms are needed
  • Latency is the most important parameter, and it
    is not a fixed constant
    • Latency can be a crucial factor in designing
      such an algorithm
    • Using the ideal communication cost would be
      better, but it cannot be implemented in practice

7
Algorithm design
  • Design constraints
    • Use latency information
    • Build a quasi-optimal latency tree, but keep it
      simple
    • Reflect the wide-area communication concept

8
Algorithm design
  • Hierarchical Latency Optimal Tree (HLOT) algorithm
  • Input
    • Y: the set of selected nodes; V: the universal
      set of nodes
    • W: 2-dimensional latency array; e(s,r): the
      edge from s to r
    • D[n]: the sum of latencies along the selected
      path from the root to node n
  • Output
    • F: the set of selected edges (the HLOT tree)
  • Algorithm steps
      F ← Ø
      Y ← {root}
      for (i = 0; i < number of nodes; i++)
          D[i] ← W[root][i]        // distance initialization
      while (V - Y ≠ Ø)            // repeat until a path to every node is found
          choose the fastest link e(s,r), where s ∈ Y and r ∈ V - Y
          if (e(s,r) makes a loop or D[s] + W[s][r] is higher than W[root][r])
              continue             // try again
          add e(s,r) to F; D[r] ← D[s] + W[s][r]
          add r to Y
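The HLOT steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. It assumes that W is a latency matrix indexed by rank and that the root defaults to rank 0. It also assumes that "try again" means skipping the rejected edge and trying the next-fastest candidate. The function name hlot is hypothetical.

```python
def hlot(W, root=0):
    """Build an HLOT-style tree from latency matrix W (sketch).

    Returns the selected edges F as (sender, receiver) pairs in the order
    they were chosen. The rejection rule follows the slide; the policy of
    moving on to the next-fastest edge after a rejection is an assumption.
    """
    n = len(W)
    Y = {root}                              # selected nodes
    F = []                                  # selected edges
    D = [W[root][i] for i in range(n)]      # distance initialization
    D[root] = 0
    while len(Y) < n:                       # V - Y is not empty
        # every edge from a selected node s to an unselected node r, fastest first
        candidates = sorted(
            (W[s][r], s, r) for s in Y for r in range(n) if r not in Y
        )
        for w, s, r in candidates:
            # s is in Y and r is not, so e(s, r) can never close a loop
            if D[s] + w > W[root][r]:
                continue                    # relaying via s is slower than the direct link
            D[r] = D[s] + w
            F.append((s, r))
            Y.add(r)
            break
    return F


W = [[0, 5, 20],
     [5, 0, 3],
     [20, 3, 0]]
print(hlot(W))  # [(0, 1), (1, 2)]
```

Here rank 2 is reached through rank 1 because 5 + 3 is less than the direct latency of 20. The direct edge (root, r) always passes the check, so each pass of the while loop adds one node and the loop terminates. Re-sorting every cut edge on each pass keeps the sketch simple, but it is slower than the O(n² log n) the authors quote for HLOT.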

9
Algorithm design
  • Broadcasting (MPI_Bcast)
    • Can be implemented simply using HLOT
    • HLOT is expressed as a sorted list of selected
      edges, as (sender's rank, receiver's rank) pairs
    • All processes search the list:
      • if a process finds its rank in the sender
        part, it sends the message to the receiver
      • if it finds its rank in the receiver part, it
        gets ready to receive a message from the sender
        given in the list
  • Broadcasting steps
    • Every process except the root finds its rank in
      the receiver part
    • The root starts sending the message to its
      receivers
    • When a process receives the message, it searches
      the HLOT and relays the message to its own
      receivers
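The broadcast procedure above can be simulated in one Python process. This is a minimal sketch that does not use MPI. In a real MPI program, each rank would scan the list only for its own entries and issue the matching MPI_Recv and MPI_Send calls. The edge list below is a hypothetical HLOT, and the function name bcast_schedule is our own.

```python
from collections import deque


def bcast_schedule(edges, root=0):
    """Simulate an HLOT broadcast driven by a (sender, receiver) edge list.

    Returns the transfers (sender, receiver) in the order they happen.
    """
    receivers = [r for _, r in edges]
    # every rank except the root appears exactly once in the receiver part
    assert root not in receivers and len(receivers) == len(set(receivers))
    transfers = []
    holders = deque([root])           # ranks that already have the message
    while holders:
        me = holders.popleft()
        for s, r in edges:            # search the list for my rank in the sender part
            if s == me:
                transfers.append((s, r))
                holders.append(r)     # r will relay in turn
    return transfers


edges = [(0, 1), (0, 2), (1, 3), (2, 4)]   # hypothetical HLOT
print(bcast_schedule(edges))               # [(0, 1), (0, 2), (1, 3), (2, 4)]
```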

10
Algorithm design
  • Reducing (MPI_Reduce)
    • Construct the HLOT using uplink latencies
    • All processes search the list in reverse order:
      • if a process finds its rank in the sender
        part, it sends a message to the receiver
      • if it finds its rank in the receiver part, it
        gets ready to receive a message from the sender
        given in the list
  • Reducing steps
    • Every process except the leaf nodes finds its
      rank in the receiver part
    • The leaf nodes start sending messages to their
      receivers
    • When a process has received all of its messages,
      it searches the HLOT and relays the result toward
      the root
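The reverse-order walk described above can be sketched as follows. The sketch assumes that the edge list is in HLOT selection order, so any edge into a rank comes before any edge out of it. Walking the list backwards then guarantees that a rank has combined all of its children's results before it reports to its parent. In the slides, the tree for MPI_Reduce is built from uplink latencies; here we simply take a hypothetical edge list. The name reduce_over_hlot is our own.

```python
import operator


def reduce_over_hlot(edges, values, root=0, op=operator.add):
    """Simulate an HLOT reduction by walking the edge list in reverse.

    Assumes edges are in selection order, so every edge into a rank comes
    before any edge out of it; reversed, a rank's children report first.
    """
    partial = dict(values)            # each rank starts with its own contribution
    for s, r in reversed(edges):
        # in the reduction the edge is used upward: r sends its partial result to s
        partial[s] = op(partial[s], partial[r])
    return partial[root]


edges = [(0, 1), (0, 2), (1, 3), (2, 4)]        # hypothetical HLOT
values = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
print(reduce_over_hlot(edges, values))           # 15
print(reduce_over_hlot(edges, values, op=max))   # 5
```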

11
Algorithm design
  • Scatter/Gather (MPI_Scatter/MPI_Gather)
    • Same as MPI_Bcast/MPI_Reduce, but each node sends
      or receives a different message
    • A new algorithm is needed: Long Data First (LDF)

[Figure: example HLOT over processes P0 to P7 and the corresponding Bcast; edge labels (sender, receiver) include (0,7), (0,1), (7,6), (1,2), (7,5), (0,3), (0,4)]
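The sketch below shows why Scatter needs different handling from Bcast over the same tree. In a scatter, each hop must carry the blocks for every rank in the receiver's subtree, so message sizes shrink as data moves down the tree. The slides do not describe LDF, so sending the largest bundle first is only a hypothetical reading of "Long Data First." The edge list, block sizes, and the name scatter_plan are all illustrative.

```python
from collections import defaultdict, deque


def scatter_plan(edges, block_bytes, root=0):
    """For each tree edge (s, r), how many bytes s must forward to r in a scatter.

    Each hop carries the blocks of every rank in r's subtree. Children are
    served largest-bundle-first, a hypothetical reading of "Long Data First".
    """
    children = defaultdict(list)
    for s, r in edges:
        children[s].append(r)

    def subtree_bytes(n):
        # n's own block plus everything destined for its descendants
        return block_bytes[n] + sum(subtree_bytes(c) for c in children[n])

    plan = []
    queue = deque([root])
    while queue:
        s = queue.popleft()
        sends = [(r, subtree_bytes(r)) for r in children[s]]
        sends.sort(key=lambda t: t[1], reverse=True)   # longest data first (assumed)
        for r, nbytes in sends:
            plan.append((s, r, nbytes))
            queue.append(r)
    return plan


edges = [(0, 1), (0, 2), (1, 3), (2, 4), (2, 5)]   # hypothetical HLOT
blocks = {rank: 100 for rank in range(6)}          # 100 bytes per rank
for step in scatter_plan(edges, blocks):
    print(step)
```

The root first sends 300 bytes to rank 2 (the blocks for ranks 2, 4 and 5). It then sends 200 bytes to rank 1. After that, each relay forwards single 100-byte blocks. For a gather, the same sizes apply with the direction reversed.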
12
Algorithm design
  • Allgather/Alltoall
    • Built by combining Bcast and Scatter

13
Algorithm design
  • Point-to-point communication (MPI_Send/Recv)
    • HLOT can be extended to general point-to-point
      communication
    • HLOT can show the path from one point to another
    • But this can cause confusion
      • Other processes don't know the situation, e.g.
        intermediate processes on the path are not aware
        that they must relay the message
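The point above, that HLOT shows a path between any two processes, can be made concrete with a short sketch. It treats the tree edges as undirected and searches for the path from src to dst. The edge list and the name hlot_path are hypothetical. The example illustrates the confusion the slide mentions: a message from rank 3 to rank 4 would have to be relayed by ranks 1, 0 and 2, none of which is otherwise taking part in that send.

```python
from collections import defaultdict, deque


def hlot_path(edges, src, dst):
    """Ranks on the HLOT path from src to dst, treating tree edges as undirected."""
    adj = defaultdict(list)
    for s, r in edges:
        adj[s].append(r)
        adj[r].append(s)
    prev = {src: None}
    queue = deque([src])
    while queue:                        # breadth-first search from src
        n = queue.popleft()
        if n == dst:
            break
        for m in adj[n]:
            if m not in prev:
                prev[m] = n
                queue.append(m)
    path, n = [], dst
    while n is not None:                # walk back from dst to src
        path.append(n)
        n = prev[n]
    return path[::-1]


edges = [(0, 1), (0, 2), (1, 3), (2, 4)]   # hypothetical HLOT
print(hlot_path(edges, 3, 4))              # [3, 1, 0, 2, 4]
```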

14
Implementation
  • Network Weather Service (NWS)
    • We need a mechanism that measures latency
      accurately without offsetting the efficiency
      gained by HLOT
    • NWS provides measurement components and APIs
    • We built the NMC (Network Measurement Caster)
      • It keeps network information and shares it with
        other nodes
    • We also built an NMC API for programmers'
      convenience

15
Implementation
  • Multi-level communication
    • Needed to overcome the complexity of HLOT,
      O(n² log n)
    • HLOT is useful only in dynamic network
      environments
    • We divide the network into 5 levels and apply
      HLOT only at the highest level, heavy-WAN
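The sketch below shows why applying HLOT only at the heavy-WAN level pays off. It collapses the five levels into two for illustration: one representative per cluster takes part in HLOT, and traffic inside each cluster is handled separately. Choosing the lowest rank as leader and using 10 clusters of 16 processes are assumptions. The cost function uses the n² log n estimate from the slide with constant factors ignored.

```python
import math


def hlot_cost(n):
    """Rough HLOT cost model from the slide, n^2 log n (constant factors ignored)."""
    return n * n * math.log2(n) if n > 1 else 0.0


def wan_leaders(cluster_of):
    """Pick one representative per cluster (the lowest rank, an assumption)."""
    leaders = {}
    for rank in sorted(cluster_of):
        leaders.setdefault(cluster_of[rank], rank)
    return sorted(leaders.values())


# hypothetical grid: 10 clusters of 16 processes each
cluster_of = {rank: rank // 16 for rank in range(160)}
leaders = wan_leaders(cluster_of)
ratio = hlot_cost(160) / hlot_cost(len(leaders))
print(len(leaders))   # 10
print(round(ratio))   # 564
```

Under this cost model, running HLOT over 10 leaders instead of all 160 processes cuts the estimated cost by a factor of about 564.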

16
Experimental result
  • Antz testbed
    • Includes more than 10 distributed clusters
    • Connected by general-purpose Internet
      infrastructure

[Figure: Antz testbed sites KISTI, Ajou, Sogang, Yonsei, Chonbuk, KUT, Postech and Pusan, connected through the general-purpose Internet]
17
Experimental result
  • MPI_Bcast

18
Experimental result
  • MPI_Reduce

19
Experimental result
  • MPI_Scatter

20
Experimental result
  • MPI_Gather

21
Experimental result
  • Integer Sort (IS) from the NAS Parallel Benchmarks
    (NPB)

22
Conclusion
  • We designed the HLOT algorithm, which provides an
    efficient communication topology
  • In our experiments, our MPI library performed
    better than MPICH-G2, a conventional Grid-enabled
    MPI library