Transcript and Presenter's Notes

Title: IPDPS 2004 Presentation


1
Performance Comparison of Pure MPI vs Hybrid
MPI-OpenMP Parallelization Models on SMP Clusters
Nikolaos Drosinos and Nectarios Koziris
National Technical University of Athens
Computing Systems Laboratory
{ndros, nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr
2
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

3
Motivation
  • Active research interest in:
    • SMP clusters
    • Hybrid programming models
  • However:
    • Mostly fine-grain hybrid paradigms (masteronly model)
    • Mostly DOALL multi-threaded parallelization

4
Contribution
  • Comparison of 3 programming models for the
    parallelization of tiled loop algorithms:
    • pure message-passing
    • fine-grain hybrid
    • coarse-grain hybrid
  • Advanced hyperplane scheduling:
    • minimizes synchronization needs
    • overlaps computation with communication
    • preserves data dependencies

5
Algorithmic Model
Tiled nested loops with constant flow data dependencies:

  FORACROSS tile_0 DO
    ...
    FORACROSS tile_n-2 DO
      FOR tile_n-1 DO
        Receive(tile)
        Compute(tile)
        Send(tile)
      END FOR
    END FORACROSS
    ...
  END FORACROSS
6
Target Architecture
SMP clusters
7
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

8
Pure Message-passing Model
  tile_0 = pr_0
  ...
  tile_n-2 = pr_n-2
  FOR tile_n-1 = 0 TO ... DO
    Pack(snd_buf, tile_n-1 - 1, pr)
    MPI_Isend(snd_buf, dest(pr))
    MPI_Irecv(recv_buf, src(pr))
    Compute(tile)
    MPI_Waitall
    Unpack(recv_buf, tile_n-1 + 1, pr)
  END FOR
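For concreteness, here is a minimal C sketch of the non-blocking
communication pattern above, assuming double-precision message buffers
and a 1-D pipeline of processes; the function and parameter names are
illustrative, not taken from the slides:

  /* One pipeline step: post the sends/receives, compute the current
     tile, then wait for the communication to complete. */
  #include <mpi.h>

  void pipeline_step(double *snd_buf, double *recv_buf, int count,
                     int dest, int src, void (*compute_tile)(void))
  {
      MPI_Request reqs[2];
      MPI_Status  stats[2];

      /* Communication of neighbouring tiles overlaps with the
         computation of the current tile. */
      MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &reqs[1]);

      compute_tile();

      MPI_Waitall(2, reqs, stats);
  }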
9
Pure Message-passing Model
10
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

11
Hyperplane Scheduling
  • Implements coarse-grain parallelism assuming
    inter-tile data dependencies
  • Tiles are organized into data-independent
    subsets (groups)
  • Tiles of the same group can be concurrently
    executed by multiple threads
  • Barrier synchronization between threads

12
Hyperplane Scheduling
Each tile, identified by (mpi_rank, omp_tid, tile), is mapped to a group.
13
Hyperplane Scheduling
  #pragma omp parallel
  {
    group_0 = pr_0
    ...
    group_n-2 = pr_n-2
    tile_0 = pr_0 * m_0 + th_0
    ...
    tile_n-2 = pr_n-2 * m_n-2 + th_n-2
    FOR(group_n-1)
    {
      tile_n-1 = group_n-1 - ...
      if(0 <= tile_n-1 <= ...)
        compute(tile)
      #pragma omp barrier
    }
  }
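The slides do not spell out the group formula here, but as an
illustration of the idea, the following C sketch assumes the common
wavefront convention in which a tile's group index equals the sum of
its coordinates, so the last coordinate follows from the group index
and a bounds test decides whether the tile is executed; the function
names and the bound parameter are illustrative:

  #include <stdbool.h>

  /* Illustrative wavefront-style assignment: tiles whose coordinates
     sum to the same value form one group and are mutually independent. */
  int last_coord(int group, const int *tile, int n)
  {
      int sum = 0;
      for (int i = 0; i < n - 1; i++)
          sum += tile[i];          /* sum of the fixed coordinates */
      return group - sum;          /* implied last coordinate */
  }

  /* A tile is computed only if its implied last coordinate lies
     inside the iteration space [0, bound_last]. */
  bool tile_is_valid(int group, const int *tile, int n, int bound_last)
  {
      int t = last_coord(group, tile, n);
      return t >= 0 && t <= bound_last;
  }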

14
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

15
Fine-grain Model
  • Incremental parallelization of computationally
    intensive parts
  • Pure MPI + hyperplane scheduling
  • Inter-node communication outside of
    multi-threaded part (MPI_THREAD_MASTERONLY)
  • Thread synchronization through implicit barrier
    of omp parallel directive

16
Fine-grain Model
  FOR(group_n-1)
  {
    Pack(snd_buf, tile_n-1 - 1, pr)
    MPI_Isend(snd_buf, dest(pr))
    MPI_Irecv(recv_buf, src(pr))
    #pragma omp parallel
    {
      thread_id = omp_get_thread_num()
      if(valid(tile, thread_id, group_n-1))
        Compute(tile)
    }
    MPI_Waitall
    Unpack(recv_buf, tile_n-1 + 1, pr)
  }
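A minimal C sketch of this masteronly structure, with all MPI calls
kept outside the parallel region; the helper routines (pack_face,
unpack_face, tile_valid, compute_tile) are assumptions introduced
for illustration:

  #include <mpi.h>
  #include <omp.h>

  /* Illustrative helpers, assumed to be defined elsewhere. */
  void pack_face(double *buf, int step);
  void unpack_face(double *buf, int step);
  int  tile_valid(int thread_id, int group);
  void compute_tile(int thread_id);

  /* One group step of the fine-grain (masteronly) model: MPI outside
     the parallel region, threads share the tile computation. */
  void fine_grain_step(int group, double *snd_buf, double *recv_buf,
                       int count, int dest, int src)
  {
      MPI_Request reqs[2];
      MPI_Status  stats[2];

      pack_face(snd_buf, group - 1);
      MPI_Isend(snd_buf, count, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Irecv(recv_buf, count, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &reqs[1]);

      /* Threads are spawned and implicitly joined at every group step. */
      #pragma omp parallel
      {
          int tid = omp_get_thread_num();
          if (tile_valid(tid, group))
              compute_tile(tid);
      }

      MPI_Waitall(2, reqs, stats);
      unpack_face(recv_buf, group + 1);
  }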
17
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

18
Coarse-grain Model
  • Threads are initialized only once
  • SPMD paradigm (requires more programming effort)
  • Inter-node communication inside multi-threaded
    part (requires MPI_THREAD_FUNNELED)
  • Thread synchronization through explicit barrier
    (omp barrier directive)

19
Coarse-grain Model
  #pragma omp parallel
  {
    thread_id = omp_get_thread_num()
    FOR(group_n-1)
    {
      #pragma omp master
      {
        Pack(snd_buf, tile_n-1 - 1, pr)
        MPI_Isend(snd_buf, dest(pr))
        MPI_Irecv(recv_buf, src(pr))
      }
      if(valid(tile, thread_id, group_n-1))
        Compute(tile)
      #pragma omp master
      {
        MPI_Waitall
        Unpack(recv_buf, tile_n-1 + 1, pr)
      }
      #pragma omp barrier
    }
  }
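A minimal compilable C sketch of this funneled structure, including the
MPI_Init_thread call that requests MPI_THREAD_FUNNELED; the group bound
and the compute_if_valid stub are placeholders introduced for illustration:

  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  /* Placeholder for the per-thread tile computation. */
  static void compute_if_valid(int thread_id, int group)
  {
      (void)thread_id; (void)group;
  }

  int main(int argc, char **argv)
  {
      int provided;
      /* MPI calls occur inside the parallel region but only on the
         master thread, so MPI_THREAD_FUNNELED support suffices. */
      MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
      if (provided < MPI_THREAD_FUNNELED) {
          fprintf(stderr, "MPI_THREAD_FUNNELED not supported\n");
          MPI_Abort(MPI_COMM_WORLD, 1);
      }

      /* One parallel region for the whole computation (SPMD style). */
      #pragma omp parallel
      {
          int tid = omp_get_thread_num();
          for (int group = 0; group < 4; group++) {  /* illustrative bound */
              #pragma omp master
              {
                  /* pack + MPI_Isend / MPI_Irecv for neighbouring tiles */
              }
              compute_if_valid(tid, group);
              #pragma omp master
              {
                  /* MPI_Waitall + unpack */
              }
              /* Explicit synchronization before the next group. */
              #pragma omp barrier
          }
      }

      MPI_Finalize();
      return 0;
  }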
20
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

21
Experimental Results
  • 8-node SMP Linux Cluster (800 MHz PIII, 128 MB
    RAM, kernel 2.4.20)
  • MPICH v.1.2.5 (--with-device=ch_p4,
    --with-comm=shared)
  • Intel C compiler 7.0 (-O3 -mcpu=pentiumpro -static)
  • FastEthernet interconnection
  • ADI micro-kernel benchmark (3D)

22
Alternating Direction Implicit (ADI)
  • Stencil computation used for solving partial
    differential equations
  • Unitary data dependencies (illustrated in the sketch below)
  • 3D iteration space (X x Y x Z)
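The sketch below is not the ADI kernel itself, only an illustrative 3-D
loop nest with the same unit flow dependencies (each point reads its
three "previous" neighbours), which is the property the tiling and the
hyperplane schedule rely on; array sizes are arbitrary:

  #include <stdio.h>

  #define X 8
  #define Y 8
  #define Z 8

  int main(void)
  {
      static double A[X][Y][Z];

      /* Boundary/initial values. */
      for (int i = 0; i < X; i++)
          for (int j = 0; j < Y; j++)
              for (int k = 0; k < Z; k++)
                  A[i][j][k] = 1.0;

      /* Unit flow dependencies: A[i][j][k] depends on A[i-1][j][k],
         A[i][j-1][k] and A[i][j][k-1]. */
      for (int i = 1; i < X; i++)
          for (int j = 1; j < Y; j++)
              for (int k = 1; k < Z; k++)
                  A[i][j][k] = (A[i-1][j][k] + A[i][j-1][k]
                                + A[i][j][k-1]) / 3.0;

      printf("A[%d][%d][%d] = %f\n", X - 1, Y - 1, Z - 1, A[X-1][Y-1][Z-1]);
      return 0;
  }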

23
ADI on 2 dual SMP nodes
24
ADI X=128, Y=512, Z=8192 (2 nodes)
25
ADI X=256, Y=512, Z=8192 (2 nodes)
26
ADI X=512, Y=512, Z=8192 (2 nodes)
27
ADI X=512, Y=256, Z=8192 (2 nodes)
28
ADI X=512, Y=128, Z=8192 (2 nodes)
29
ADI X=128, Y=512, Z=8192 (2 nodes): computation and communication breakdown
30
ADI X=512, Y=128, Z=8192 (2 nodes): computation and communication breakdown
31
Overview
  • Introduction
  • Pure Message-passing Model
  • Hybrid Models
  • Hyperplane Scheduling
  • Fine-grain Model
  • Coarse-grain Model
  • Experimental Results
  • Conclusions and Future Work

32
Conclusions
  • Tiled loop algorithms with arbitrary data
    dependencies can be adapted to the hybrid
    parallel programming paradigm
  • Hybrid models can be competitive with the pure
    message-passing paradigm
  • The coarse-grain hybrid model can be more efficient
    than the fine-grain one, but is also more complicated
  • Programming efficiently in OpenMP is not easier
    than programming efficiently in MPI

33
Future Work
  • Application of methodology to real applications
    and standard benchmarks
  • Work balancing for coarse-grain model
  • Investigation of alternative topologies,
    irregular communication patterns
  • Performance evaluation on advanced
    interconnection networks (SCI, Myrinet)

34
Thank You!
Questions?