Title: MxN Parallel Data Redistribution
MxN Parallel Data Redistribution
Outline
- Introduction
- Related Work
- SEINE-based MxN Component
- Experimental Evaluation
- Conclusion
- Future Work
Introduction
- Scientific computing has been adopted across many disciplines
  - Climate models
  - Combustion models
  - Fusion simulation
  - Protein simulation
Introduction
- Multi-physics simulations
  - Span multiple scales, domains, and disciplines
  - Are developed by large and diverse teams
  - May be assembled from existing community parallel codes
Introduction
- Approaches
  - Refactor the codes and integrate all components into a single, much larger program
    - More efficient data sharing
    - Large time investment in rewriting the program
  - Component models
    - Simplify overall application development
    - Complicate the efficient coordination and sharing of data between components (the MxN problem)
Problem Statement
- The transfer of data from a parallel program running on M processors to another parallel program running on N processors. Ideally, neither program knows the number of processes in the other.
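The problem above can be sketched in one dimension: with simple block distributions on both sides, each of the M sender blocks overlaps one or more of the N receiver blocks, and the set of overlaps is exactly the communication schedule. A minimal sketch (the function names are illustrative, not part of SEINE):

```python
def block_ranges(n_elems, n_procs):
    """Split [0, n_elems) into contiguous blocks, one per process."""
    base, extra = divmod(n_elems, n_procs)
    ranges, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def mxn_schedule(n_elems, m, n):
    """For each (sender, receiver) pair, the overlapping index range."""
    send = block_ranges(n_elems, m)
    recv = block_ranges(n_elems, n)
    sched = []
    for s, (s0, s1) in enumerate(send):
        for r, (r0, r1) in enumerate(recv):
            lo, hi = max(s0, r0), min(s1, r1)
            if lo < hi:  # non-empty overlap => a message must be sent
                sched.append((s, r, lo, hi))
    return sched
```

Note that neither side's block layout depends on the other's process count; only the schedule computation sees both.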
Related Work
- PAWS (Parallel Application WorkSpace) by the Advanced Computing Laboratory at Los Alamos National Laboratory
  - Provides a framework for coupling parallel applications within a component-like model
  - Point-to-point transfers are performed in which each node sends segments of data to remote nodes directly and in parallel, instead of resorting to global gather/scatter operations
  - The PAWS software package consists of the following:
    - A set of PAWS data types for parallel transfer
    - A PAWS Controller, which coordinates and manages applications
    - A PAWS Application interface for communicating with the controller and other applications
    - A PAWS parallel data transfer library based on Nexus
Related Work
- CUMULVS (Collaborative User Migration, User Library for Visualization and Steering) by Oak Ridge National Laboratory
  - Supports interactive visualization and remote computational steering of distributed applications by multiple collaborators, and provides a mechanism for constructing fault-tolerant, migrating applications in heterogeneous distributed computing environments
  - For data redistribution, it addresses the Mx1 problem
Related Work
- InterComm by the University of Maryland
  - A framework for coupling distributed-memory parallel components
  - Enables efficient communication in the presence of complex data distributions
  - Uses an intermediate linearization space to generate all the information required to execute direct data transfers between the processes of the sender program and those of the receiver program
Related Work
- MCT (Model Coupling Toolkit) by Argonne National Laboratory
  - A set of open-source software tools for coupling message-passing parallel models to create a parallel coupled model
  - Provides communication schedulers for parallel MxN intercomponent data transfer and MxM intracomponent data redistribution
- And much more than that
  - PRMI (Parallel Remote Method Invocation)
  - DCA, SciRun2, ...
Related Work
- Common Component Architecture (CCA)
  - To tame the complexity of high-performance computing applications, the CCA group has proposed the concept of a software component
  - A component is a software object that
    - interacts with other components
    - encapsulates certain functionality or a set of functionalities
    - has a clearly defined interface
    - conforms to a prescribed behavior common to all components within an architecture
    - may be composed to build other components
  - Based on this methodology, increasingly more software components in scientific computing are being composed together to create new large-scale multidisciplinary simulations
Related Work
- CCA MxN Component based on PAWS and CUMULVS
SEINE-based MxN Component
- Three steps in MxN data redistribution
  - Describe parallel data (defined by the CCA data working group)
  - Compute the communication schedule (the key step)
  - Transfer the data (straightforward)
- SEINE supports steps 2 and 3
SEINE-based MxN Component
- Step 1: describe parallel data
  - To use SEINE for MxN data redistribution, the data array's index space is used as the coordinate space and data indices are used as coordinates
  - Data Array Descriptor (DAD)
    - The formal definition is under construction
    - Draft version: Collapsed, Block (regular, cyclic), GenBlock, Implicit, Explicit
SEINE-based MxN Component
- Collapsed
  - The entire dimension is held in the same process
- Block
  - Includes regular block and block-cyclic distributions
- GenBlock
  - Allows blocks of arbitrary sizes on each process, although it is limited to one block per process
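The Block and GenBlock layouts above can be made concrete with a short sketch; `gen_block` and `block_cyclic` are hypothetical helper names for illustration, not part of the DAD draft:

```python
def gen_block(block_sizes):
    """GenBlock: one contiguous block per process, arbitrary sizes.
    Returns [start, end) index ranges, one per process."""
    out, start = [], 0
    for size in block_sizes:
        out.append((start, start + size))
        start += size
    return out

def block_cyclic(n_elems, n_procs, block):
    """Block-cyclic: fixed-size blocks dealt to processes round-robin.
    Returns the owning process for each global index."""
    return {i: (i // block) % n_procs for i in range(n_elems)}
```

A regular block distribution is the special case of `gen_block` where all sizes are (near) equal.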
SEINE-based MxN Component
- Implicit
  - An arbitrary mapping of elements to processes (on a per-axis basis)
SEINE-based MxN Component
- Explicit
  - Completely user-specified; the Explicit representation cannot be combined with any of the other representation types
- We need to retrieve the regions associated with each process from the distributed array descriptor and register those regions with SEINE
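Retrieving per-process regions from a descriptor and registering them can be sketched as follows; `Region`, `register_region`, and `regions_from_block_dad` are hypothetical names standing in for SEINE's actual interface, which is not shown in this deck:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    """An axis-aligned rectangular region of the global index space."""
    lo: tuple  # inclusive lower corner, one entry per array dimension
    hi: tuple  # exclusive upper corner

registry = []  # stands in for the SEINE shared space

def register_region(proc, region):
    """Register the region a process owns with the shared space."""
    registry.append((proc, region))

def regions_from_block_dad(shape, proc_grid):
    """Derive each process's region from a 2-D regular Block descriptor."""
    (nx, ny), (px, py) = shape, proc_grid
    bx, by = -(-nx // px), -(-ny // py)  # ceil division: block extents
    regions = {}
    for p in range(px * py):
        i, j = divmod(p, py)
        lo = (i * bx, j * by)
        hi = (min((i + 1) * bx, nx), min((j + 1) * by, ny))
        regions[p] = Region(lo, hi)
    return regions
```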
SEINE-based MxN Component
- Step 2: compute the communication schedule
  - The SEINE framework is utilized
  - SEINE is a dynamic, geometry-based shared-space interaction framework that is
    - Geometry-based
    - Dynamically created/destructed
    - Derived from the tuple space model
  - The SEINE interaction framework was originally proposed to support extremely dynamic and complex communication/coordination patterns while still enabling scalable implementations of large-scale parallel scientific applications
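Geometry-based matching in step 2 can be illustrated as pairwise intersection of the registered sender and receiver regions. This is only a conceptual sketch of what the schedule computation must produce, not SEINE's implementation:

```python
def intersect(lo_a, hi_a, lo_b, hi_b):
    """Intersection of two axis-aligned regions; None if empty.
    Regions are (lo, hi) corner tuples with exclusive upper bounds."""
    lo = tuple(max(a, b) for a, b in zip(lo_a, lo_b))
    hi = tuple(min(a, b) for a, b in zip(hi_a, hi_b))
    return (lo, hi) if all(l < h for l, h in zip(lo, hi)) else None

def schedule(sender_regions, receiver_regions):
    """Match sender and receiver regions by geometric overlap.
    Each overlap becomes one message: (sender, receiver, region)."""
    msgs = []
    for s, (slo, shi) in sender_regions.items():
        for r, (rlo, rhi) in receiver_regions.items():
            box = intersect(slo, shi, rlo, rhi)
            if box is not None:
                msgs.append((s, r, box))
    return msgs
```

For example, a 4x4 array split into row blocks on 2 senders and column blocks on 2 receivers yields four messages, one per quadrant.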
SEINE-based MxN Component
A Sample Application: Multi-block Oil Reservoir Simulation
SEINE-based MxN Component
- Compute communication schedule in SEINE
Experimental Evaluation
- Experiment environment
  - Ccaffeine framework (direct-connect framework)
  - Simulated sender and receiver components
  - Experiments ran in a non-dedicated environment: a 64-node Beowulf cluster connected by 10/100 Mb/s Ethernet
Experimental Evaluation
Redistribution from 4 to 9 processes vs. from 8 to 27 processes; 3-dimensional array of size 102x102x102
Experimental Evaluation
Redistribution from 8 to 27 processes; 3-dimensional array sizes 102x102x102 vs. 72x72x72
Conclusion
- SEINE-based MxN Component
  - Provides the essential MxN functionality
  - Supports data redistribution for arbitrarily sophisticated data distribution patterns
  - Exploits the Hilbert space-filling curve (SFC) to achieve high efficiency in communication schedule computation
  - Complies with the most recent draft of the CCA MxN component specification
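The Hilbert SFC mentioned above maps a multi-dimensional index space onto a line while keeping nearby cells nearby, which is what makes it attractive for fast region lookup during schedule computation. A minimal 2-D sketch of the standard iterative index-to-curve mapping (not SEINE's actual code):

```python
def hilbert_d(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the Hilbert curve. Standard iterative algorithm:
    at each scale, pick the quadrant, then rotate/reflect into a
    canonical orientation for the next finer level."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # rotate quadrant into canonical orientation
        s //= 2
    return d
```

Consecutive curve positions always correspond to grid-adjacent cells, so contiguous curve segments cover compact regions of the index space.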
Future Work
- Finish the implementation of the SEINE-based MxN Component
- Support the Implicit and Explicit data distribution patterns
- Further improve the efficiency of schedule computation
- Identify current computational challenges in hot areas of scientific applications and adapt the SEINE-based MxN component to real requirements
- Integrate with the PRMI approach