Supporting Efficient Execution in Heterogeneous Distributed Computing Environments with Cactus and Globus - PowerPoint PPT Presentation

1
Supporting Efficient Execution in Heterogeneous
Distributed Computing Environments with Cactus
and Globus
Gabrielle Allen, Thomas Dramlitsch, Ian Foster,
Nick Karonis, Matei Ripeanu, Ed Seidel, Brian Toonen
Proceedings of Supercomputing 2001 (Winning Paper
for the Gordon Bell Prize - Special Category)
  • Presenter
  • Imran Patel

2
Outline
  • Introduction
  • Computational Grids
  • Grid-enabled Cactus Toolkit
  • Experimental Results
  • Ghostzones and Compression
  • Adaptive Strategies
  • Conclusion

3
Introduction
  • Widespread use of numerical simulation techniques
    has led to a high demand for traditional
    high-performance computing resources
    (supercomputers).
  • Low-end computers are becoming increasingly
    powerful and are connected with high-speed
    networks.
  • Computational Grids aim to tie these scattered
    resources into an integrated infrastructure.
  • Grid applications include large-scale
    simulations that demand substantial resources
    for increased throughput.

4
Introduction The Problem
  • Heterogeneous and dynamically behaving resources
    make the development of grid-enabled applications
    extremely difficult.
  • One approach is to develop computational
    frameworks that hide this complexity from the
    programmer.
  • Cactus-G: a simulation framework that combines
    grid-aware components with a grid-enabled
    message-passing library (MPICH-G2).

5
Computational Grids
  • Computational Grids differ from other parallel
    computing environments
  • A grid may have nodes with different processor
    speeds, memory capacities, etc.
  • Grids may have widely different network
    interconnects and topologies.
  • Resource availability varies in a grid.
  • Nodes in a grid may have different software
    configurations.

6
Computational Grids Programming Techniques
  • To overcome these problems, some generic
    techniques have been devised:
  • Irregular Data Distributions: partition data
    using application/network/node information.
  • Grid-aware Communication Schedules:
    overlapping/grouping of messages, dedicated
    communication nodes.
  • Redundant Computation: trade extra computation
    for reduced communication.
  • Protocol Tuning: TCP tweaks, compression.

7
Cactus-G
  • Cactus is a modular and parallel simulation
    environment used by scientists and engineers in
    the fields of numerical relativity, astrophysics,
    climate modeling, etc.
  • Cactus design consists of a core (flesh) which
    connects to application modules (thorns).
  • Various thorns exist for services such as the
    Globus Toolkit, the PETSc library, visualization,
    etc.
  • Cactus is highly portable and parallel since it
    uses abstraction APIs which themselves are
    implemented as thorns.
  • MPICH-G2 exploits Globus services and provides
    faster communication and QOS.

8
Cactus-G Architecture
  • Application thorns need not be grid-aware.
  • Example of a grid-aware Cactus thorn is PUGH,
    which provides MPI-based parallelism.
  • The DUROC library handles process management.

9
Experimental Results Setup
  • An application written in Fortran for solving
    numerical relativity problems.
  • 57 3-d variables, 780 flops per gridpoint per
    iteration.
  • N x N x 6 x ghostzone_size x 8 bytes per variable
    need to be synced at each processor.
  • A total of 1500 CPUs organized in a 5 x 12 x 25
    3-d mesh.

10
Experimental Results Setup
  • 4 supercomputers at SDSC and NCSA:
  • A 1024-CPU IBM Power SP (306 MFlop/s).
  • One 256-CPU and two 128-CPU SGI Origin2000
    systems (168 MFlop/s).
  • Intra-machine bandwidth: 200 MB/s.
  • Inter-machine bandwidth: 100 MB/s.
  • SDSC<->NCSA: 3 MB/s achieved on a 622 Mb/s link.

11
Communication Optimizations
  • Communication/Computation Overlap: processors
    across WANs were given fewer grid points so that
    they could overlap their communication with
    computation.
  • Compression: a Cactus thorn that exploits the
    regularity of the data, compressing it with the
    libz library.
  • Ghostzones: larger ghostzones were used for more
    efficient communication at the expense of
    redundant computation.
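
The compression idea can be sketched with Python's zlib module (the standard binding to the libz library the slide names); the actual thorn is C code inside Cactus, so this is only a minimal stand-in:

```python
import struct
import zlib

def compress_field(values, level=6):
    """Pack a field of doubles into bytes and deflate it with zlib.
    Returns the raw and compressed byte strings; the receiver would
    zlib.decompress() and unpack to recover the field exactly."""
    raw = struct.pack(f"{len(values)}d", *values)
    return raw, zlib.compress(raw, level)

# Regular, repetitive data (standing in for smooth simulation fields)
# compresses very well; noisy data would not.
field = [float(i % 10) for i in range(10_000)]
raw, packed = compress_field(field)
```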

12
Performance Metrics
  • Flop/s rate and efficiency are used as metrics.
  • Total execution time (t_tot): measured using
    MPI_Wtime().
  • Expected computation time (t_comp): the ideal
    time, calculated from a single-node run.
  • Flop count (F): calculated using hardware
    counters; 780 flops per gridpoint per iteration.
  • Flop/s rate = F x num_gridpts x num_iterations / t_tot
  • Efficiency E = t_comp / t_tot
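
The two metrics above, written out as a small Python sketch (the constant 780 is the slide's per-gridpoint flop count; the function names are mine):

```python
def flops_rate(num_gridpts, num_iterations, t_tot, flops_per_point=780):
    """Achieved Flop/s: total operations performed (F flops per
    gridpoint per iteration) divided by measured wall-clock time."""
    return flops_per_point * num_gridpts * num_iterations / t_tot

def efficiency(t_comp, t_tot):
    """E = t_comp / t_tot: ideal single-node computation time over the
    measured total time; 1.0 would mean zero communication overhead."""
    return t_comp / t_tot
```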

13
Performance Figures
  • 4 supercomputers: 42 GFlop/s, 14% efficiency.
  • Compression + ghostzone size 10: 249 GFlop/s,
    63.3% efficiency.
  • A smaller run: 292 GFlop/s, 88% efficiency.

14
Ghostzones
  • Increasing the ghostzone size can reduce latency
    overhead by transferring fewer messages carrying
    the same amount of total data.
  • Increasing the ghostzone size beyond a certain
    point gives no further benefit and wastes memory.
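
The trade-off can be made concrete with a rough cost model. Everything here — the model form and all parameters — is an illustrative assumption, not the paper's analysis:

```python
def per_iteration_cost(g, n, latency, bandwidth, t_point, bytes_per_point=8):
    """Rough per-iteration cost for ghostzone size g on an n^3 subdomain:
    - a ghostzone g deep lets neighbors exchange once every g
      iterations, so the latency term is amortized by 1/g;
    - each exchange moves g layers, so the per-iteration bandwidth
      cost stays roughly constant;
    - the ghost region, (n + 2g)^3 - n^3 points, is computed
      redundantly at t_point seconds per point."""
    comm = latency / g + 6 * n * n * bytes_per_point / bandwidth
    redundant = ((n + 2 * g) ** 3 - n ** 3) * t_point
    return comm + redundant
```

On a latency-dominated WAN link this cost falls as g grows past 1, then rises again once redundant computation (and memory) dominates — matching the slide's two bullets.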

15
Compression
  • Used for increased throughput across WANs.
  • Compression was found to be highly effective
    since the data is regular/smooth.
  • Since the smoothness of the data changes over
    time, the benefit of compression changes too, so
    adaptive compression is needed.

16
Adaptive Strategies - Compression
  • Predicting optimal values of the
    ghostzone/compression parameters would be
    desirable.
  • We don't want to rely on detailed network
    characteristics.
  • For example: change the compression state based
    on the efficiency, averaged over N iterations.
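
One way to realize this idea is a small controller that toggles compression whenever the windowed efficiency degrades. This is a sketch of the slide's suggestion only — the switching rule, class name, and window size are my assumptions, not the paper's algorithm:

```python
class AdaptiveCompression:
    """Toggle compression on/off based on measured efficiency,
    averaged over a window of N iterations."""

    def __init__(self, window=10):
        self.window = window
        self.samples = []
        self.compress = False      # start uncompressed
        self.best_efficiency = 0.0

    def record(self, efficiency):
        """Feed one iteration's efficiency; returns the current state."""
        self.samples.append(efficiency)
        if len(self.samples) < self.window:
            return self.compress
        avg = sum(self.samples) / len(self.samples)
        self.samples.clear()
        # If efficiency dropped since the last window, try the other
        # state; keep the current state while things improve.
        if avg < self.best_efficiency:
            self.compress = not self.compress
        self.best_efficiency = avg
        return self.compress
```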

17
Adaptive Ghostzone Sizes
  • Adapting the ghostzone size is challenging: many
    ghostzone sizes are possible, and changes trigger
    memory re-allocations, ripple effects, and the
    need to fetch extra data from neighbors.
  • Start with a size of 1 and increase/decrease it
    in accordance with the measured efficiency.
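
The increase/decrease rule can be sketched as one step of a hill climb. The rule, function name, and the cap `g_max` are illustrative assumptions, not the paper's exact scheme:

```python
def adapt_ghost_size(g, eff, prev_eff, direction, g_max=8):
    """One hill-climbing step: keep moving the ghostzone size in the
    current direction (+1 or -1) while measured efficiency improves,
    and reverse direction when it degrades. `g_max` bounds memory use;
    g never drops below 1."""
    if eff < prev_eff:
        direction = -direction
    g = min(g_max, max(1, g + direction))
    return g, direction
```

Steps are kept to one layer at a time because each change forces a memory re-allocation and an extra exchange with neighbors to fill the new ghost layers.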

18
Adaptive Ghostzone Sizes
19
Further Information
  • "The Cactus Framework and Toolkit: Design and
    Applications," Tom Goodale et al., Vector and
    Parallel Processing - VECPAR'2002.
  • "Grid Aware Parallelizing Algorithms," Thomas
    Dramlitsch, Gabrielle Allen, Edward Seidel,
    Journal of Parallel and Distributed Computing.