Title: A GCM-Based Runtime Support for Parallel Grid Applications
1. A GCM-Based Runtime Support for Parallel Grid Applications
- Elton Mathias, Françoise Baude and Vincent Cave
- Elton.Mathias, Francoise.Baude, Vincent.Cave_at_inria.fr
- CBHPC08 Component-Based High Performance Computing - Karlsruhe - October 16th, 2008
2. Outline
- Introduction
- Related works and positioning
- Context
- The ProActive Middleware
- The Grid Component Model (GCM)
- Extensions to GCM
- The DiscoGrid Project and Runtime
- Evaluation
- Conclusion and perspectives
3. Introduction
- Strong trend to integrate parallel resources into grids
- Non-embarrassingly parallel applications must deal with heterogeneity, scalability and performance (legacy applications)
- Message Passing (MPI!) is established as the main paradigm to develop scientific apps.
- Asynchronism and group communications at application level
- Applications must adapt to cope with a changing environment
- The DiscoGrid project intends to solve these issues by offering a higher-level API, with treatment of these issues at runtime level
4. Related works and positioning
- Grid-oriented MPI
  - optimizations at the communication layer
  - unmodified apps.
  - strong code coupling
  - Ex. GridMPI, MPICH-G2, PACX-MPI, MagPIE
- Code coupling
  - use of components
  - standalone apps.
  - weak code coupling
  - simplified communication
  - Ex. DCA, Xchangemxn, Seine
- Our approach: code coupling, but with advanced support for multipoint interactions (fine-grained, tightly component-based code coupling)
- DiscoGrid Project/Runtime: MPI boosted with advanced collective operations, supported by a flexible component-based runtime that provides inter-cluster communication
5. ProActive Middleware
- Grid middleware for parallel, distributed and multi-threaded computing
- Featuring
  - Deployment with support for several network protocols and cluster/grid tools
  - Reference implementation of the GCM
  - Legacy code wrapping
    - C <-> Java communication
    - Deployment and control of legacy MPI applications
6. Grid Component Model (GCM)
- Defined in the context of the Institute on Programming Models of the CoreGRID Network of Excellence (EU project)
- Extension to the Fractal component model addressing key grid concerns: programmability, interoperability, code reuse and efficiency
- Main characteristics
  - Hierarchical component model
    - primitive and composite components
  - Collective interfaces
7. ProActive/GCM standard interfaces
- Collective interfaces are complementary (sketched below)
  - gathercast (many-to-one): synchronization, parameter gathering and result dispatch
  - multicast (one-to-many): parallel invocation, parameter dispatch and result gathering
- Standard collective interfaces are enough to support broadcast, scatter, gather and barriers
- But they are not general enough to define many-to-many (MxN) operations
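A minimal Java sketch of the two interface kinds, using illustrative names only; the actual ProActive/GCM declarations rely on dedicated multicast/gathercast annotations that are not reproduced here.

    import java.util.List;

    // Multicast (one-to-many) client interface: a single invocation is dispatched
    // in parallel to every bound server; the List gathers one result per member.
    interface BorderMulticast {
        List<double[]> exchangeBorders(double[] localBorder);
    }

    // Gathercast (many-to-one) server interface: invocations from several clients
    // are synchronized, their parameters gathered into a List, and the results
    // dispatched back to the callers.
    interface BorderGathercast {
        List<double[]> exchangeBorders(List<double[]> gatheredBorders);
    }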
8. Extending GCM collective interfaces: gather-multicast itfs
- server gather-multicast
  - exposes a gathercast itf
  - connected to internal components
- client gather-multicast
  - connects internal components
  - exposes a multicast itf
- The communication semantics rely on 2 policies (see the sketch after this list)
  - gather policy
  - dispatch policy
- A naïve gather-multicast MxN leads to
  - bottlenecks in both communication policies
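A minimal sketch, assuming plain Java interfaces with hypothetical names, of the two policies a gather-multicast interface relies on; this is not the GCM controller API.

    import java.util.List;

    // Gather policy: aggregates the arguments arriving from the M senders before
    // the call crosses the composite membrane (e.g. wait-for-all, append, reduce).
    interface GatherPolicy<T> {
        List<T> gather(List<List<T>> perSenderArguments);
    }

    // Dispatch policy: splits the aggregated data among the N receivers
    // (e.g. broadcast, round-robin, neighborhood-driven scatter).
    interface DispatchPolicy<T> {
        List<List<T>> dispatch(List<T> gatheredArguments, int receiverCount);
    }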
9. Gather-Multicast Optimization
- Efficient communication requires direct bindings
- Solution: controllers responsible for establishing direct MxN bindings and for distributing the communication policies
- Configured along 3 operations (a skeleton is sketched below)
  - (re)binding configuration (R)
  - dispatch policy (D)
  - gather policy (G)
- For now, these operations must be coded by developers
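A hedged skeleton of what a developer currently has to supply for an optimized MxN binding, under the assumption that the three operations can be grouped in one interface; every name below is illustrative.

    import java.util.List;

    interface MxNOptimizationSketch {
        // (R) compute and establish direct bindings between the M senders and
        // the N receivers, bypassing the composite membrane
        void rebind(List<String> senderRanks, List<String> receiverRanks);

        // (D) distributed dispatch policy: which receivers get which data
        // fragment, decided locally at each sender
        List<double[]> dispatch(String senderRank, double[] localData);

        // (G) distributed gather policy: how fragments arriving from several
        // senders are merged, decided locally at each receiver
        double[] gather(List<double[]> receivedFragments);
    }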
10. The DiscoGrid Project
- Promotes a new paradigm to develop non-embarrassingly parallel grid applications
- Target applications
  - domain decomposition, requiring the solution of PDEs
  - Ex. electromagnetic wave propagation, fluid flow problems
- DiscoGrid Runtime
  - Grid-aware partitioner
  - Modeling of the resource hierarchy
  - Support for the DG API
    - converging to MPI when possible
  - Support for inter-cluster communication
- DiscoGrid API (shape sketched below)
  - Resources seen as a hierarchical organization
  - Hierarchical identifiers
  - Neighborhood-based communication (update)
  - C/C++ and Fortran bindings
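The actual DiscoGrid bindings are C/C++ and Fortran; the Java-flavoured interface below only sketches the shape of the API, and every name in it is a hypothetical placeholder.

    import java.util.List;

    interface DiscoGridApiSketch {
        // Hierarchical identifier of the caller, e.g. "site.cluster.process"
        String hierarchicalRank();

        // Subdomain neighbors computed by the grid-aware partitioner
        List<String> neighborhood();

        // Neighborhood-based collective: exchanges subdomain borders with all
        // neighbors (MPI within a cluster, GCM bindings across clusters)
        void update(double[] sendBorders, double[] recvBorders);

        // Hierarchical collectives, converging to plain MPI when possible
        void bcast(double[] buffer, String rootRank);
        double reduceAll(double localValue);
    }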
11. ProActive/GCM DiscoGrid Runtime
12. Optimizing the update operation
DGOptimizationController.optimize(AggregationMode, DispatchMode, DGNeighborhood, ...)
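A hedged reading of what this call configures; the enum values and the signature below are assumptions used for illustration, not the actual controller API.

    interface DGOptimizationControllerSketch {
        // How invocations gathered from the senders are merged (assumed values)
        enum AggregationMode { WAIT_ALL, APPEND, REDUCE }

        // How the merged data is routed to the receivers (assumed values)
        enum DispatchMode { BROADCAST, NEIGHBORHOOD_SCATTER }

        // Establishes direct MxN bindings for update, restricted to the
        // neighborhood produced by the grid-aware partitioner
        void optimize(AggregationMode gather, DispatchMode dispatch,
                      DGNeighborhoodSketch neighbors);
    }

    // Hypothetical descriptor of a subdomain neighborhood (hierarchical ranks)
    class DGNeighborhoodSketch { }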
13. DiscoGrid Communication
- The DG Runtime converts calls to the DG API into MPI or DG calls (routing sketched below)
- Point-to-point communications
- Collective communications
  - bcast, gather, scatter
- Neighborhood-based
  - update
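A minimal sketch of the routing decision, assuming hierarchical ranks of the form "cluster.process" and two hypothetical transport interfaces; the real runtime logic is more involved.

    class DGRouterSketch {
        interface MpiTransport { void send(byte[] data, int localRank); }          // intra-cluster path
        interface GcmTransport { void send(byte[] data, String hierarchicalRank); } // inter-cluster path

        private final MpiTransport mpi;
        private final GcmTransport gcm;
        private final String myCluster;

        DGRouterSketch(MpiTransport mpi, GcmTransport gcm, String myCluster) {
            this.mpi = mpi;
            this.gcm = gcm;
            this.myCluster = myCluster;
        }

        // Sends through plain MPI when the destination is in the same cluster,
        // and through the GCM gather-multicast bindings otherwise.
        void send(byte[] data, String destRank) {
            String[] parts = destRank.split("\\.");
            if (parts[0].equals(myCluster)) {
                mpi.send(data, Integer.parseInt(parts[1]));
            } else {
                gcm.send(data, destRank);
            }
        }
    }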
14. Evaluation
- Conducted on Grid'5000
  - 8 sites (sophia, rennes, grenoble, lyon, bordeaux, toulouse, lille)
  - Machines with different processors (Intel Xeon EM64T 3 GHz and IA32 2.4 GHz, AMD Opteron 218, 246, 248 and 285)
  - Memory of 2 or 4 GB/node
  - 2 clusters with Myrinet-10G and Myrinet-2000, 2.5 Gb/s backbone
- ProActive 3.9, MPICH 1.2.7p1, Java SDK 1.6.0_02
15. Experiment: P3D
- Poisson3D equation discretized by finite differences; iterative resolution by Jacobi
- Bulk-synchronous behavior (loop structure sketched below)
  - Concurrent computation: Jacobi over the subdomain
  - Reduction of results: reduce operation
  - Update of subdomain borders: update operation
- Cubic mesh of 1024³ elements (4 GB of data)
- 100 iterations of the algorithm
- 2 versions: legacy version (pure MPI) and DG version
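A sketch of the bulk-synchronous loop of the DG version, written against the hypothetical DiscoGridApiSketch interface shown earlier; the actual application is legacy MPI code, so this only illustrates the control flow.

    class P3DLoopSketch {
        static void solve(DiscoGridApiSketch dg, double[] subdomain,
                          double[] sendBorders, double[] recvBorders) {
            for (int it = 0; it < 100; it++) {                        // 100 iterations
                double localResidual = jacobiSweep(subdomain);        // concurrent computation on the subdomain
                double globalResidual = dg.reduceAll(localResidual);  // reduction of results (ReduceAll)
                dg.update(sendBorders, recvBorders);                  // update of subdomain borders
            }
        }

        // Placeholder for the finite-difference Jacobi sweep over the local subdomain
        static double jacobiSweep(double[] subdomain) {
            return 0.0;
        }
    }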
16. Experiment: P3D (cont.)
P3D Execution Time (chart)
- The entire DG communication is asynchronous
- The DG update is faster
- The ReduceAll in the DG version happens in parallel
P3D update Time (chart)
- Time decreases as the data per node decreases
- The simple gather-multicast interface is not scalable
- The update with the neighborhood happens in parallel
17. Conclusion
- Extensions to GCM provide many-to-many (MxN) communication
  - versatile mechanism
    - any kind of communication
    - even with limited connectivity
  - optimizations ensure efficiency and scalability
- The goal is not to compete with MPI, but experimental results showed good performance
- The DiscoGrid Runtime itself can be considered a successful component-based grid programming approach supporting an SPMD model
- The API and Runtime also permitted a higher-level approach to develop non-embarrassingly parallel applications, where group communication/synchronization are handled as non-functional aspects
18. Perspectives
- Better evaluate the work through more complex simulations (BHE, CEM) and real-size data meshes
- Evaluate the work done in comparison to grid-oriented versions of MPI
- Explore deeper
  - the separation of concerns in component architectures: consider SPMD parallel programming as a component configuration and (re)assembling activity instead of message passing
  - the adaptation of applications to contexts
  - the definition and usage of collective interfaces (GridComp)
19. Questions
20. Gather-Multicast µBenchmarks
0 MB (void) message (chart)
- For void messages, the optimized version is slower for all-to-all exchanges due to the number of calls (82162 vs 1282)
- The optimized version is considerably faster when the message size increases, because of the reduction of bottlenecks
- Considering that all-to-all messages are avoided, we can expect a real benefit from the optimization
10 MB message (chart)
- Factor: number of neighbors, spread throughout the sites (128 All-to-All)
- The message size is constant, independent of the factor
- Results are based on the average time to send the message and receive the ack