Title: Integrating Large-Scale Distributed and Parallel High Performance Computing (DPHPC) Applications Using a Component-based Architecture
1. Integrating Large-Scale Distributed and Parallel High Performance Computing (DPHPC) Applications Using a Component-based Architecture
- Nanbor Wang (1), Fang (Cherry) Liu (2), Paul Hamil (1), Stephen Tramer (1), Rooparoni Pundaleeka (1), Randall Bramley (2)
- (1) Tech-X Corporation, Boulder, CO, U.S.A.
- (2) Indiana University, Bloomington, IN, U.S.A.
Workshop on Component-Based High-Performance Computing, October 16, 2008, Karlsruhe, Germany
Work partially funded by the US Department of
Energy, Office of Advanced Scientific Computing
Research, Grant DE-FG02-04ER84099
2. Agenda
- Motivation and approach for Distributed and Parallel High-Performance Computing (DPHPC)
- Enabling distributed technologies
- Applications development
3. Distributed and Parallel Component-Based Software Engineering Addresses Modern Needs of Scientific Computing
- Motivating scenarios for Distributed and Parallel HPC (DPHPC)
  - Integrate separately developed and established codes (FSP, climate modeling, space weather modeling), each component needing its own architecture
  - Provide ways to better utilize hardware with high CPU counts and to combine the computing resources of multiple clusters/computing centers
  - Enable parallel data streaming between a computing task and a post-processing task (no feedback to the solver)
  - Integrate multiple parallel codes using heterogeneous architectures
- Existing component standards and frameworks were designed with enterprise applications in mind
  - No support for features that are important for HPC scientific applications: interoperability with scientific programming languages (Fortran) and parallel computing infrastructure (MPI)
- CCA addresses the needs of HPC scientific applications: combustion modeling, global climate modeling, fusion and plasma simulations
- Tasks
  - Explore various distributed technologies and approaches for DPHPC
  - Enhance tool support for DPHPC: F2003 struct support (covered later in Stefan's talk)
4. Typical Parallel CCA Frameworks
[Figure: SCMD and MCMD framework configurations]
- Support both SPMD and MPMD scenarios
- Stay out of the way of component parallelism
- Components handle parallel communication
5. An Illustration of DPHPC Application
Alternative MCMD
- Still support conventional CCA component-managed parallelism
- Provide additional framework-mediated distributed inter-component communication capability
Cooperative Processing (LLNL), PACO (INRIA)
6. Agenda
- Motivation for Distributed and Parallel High-Performance Computing (DPHPC)
- Enabling distributed technologies
- Applications development
7. Babel RMI Allows Multiple Implementations
- Babel generates mappings for remote invocations and has its own transfer protocol, "Simple Protocol", implemented in C
- Thanks to Babel's open architecture and language interoperability, users can take advantage of various distributed technologies through third-party RMI libraries
- We have developed a CORBA protocol library for Babel RMI using TAO (version 1.5.1 or later)
  - The first third-party Babel RMI library
  - TAO is a C++-based CORBA middleware framework
  - This protocol is essentially a bridge between Babel and TAO
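
The idea of pluggable RMI protocols can be illustrated with a small sketch in which the URL scheme selects the protocol library. This is not Babel's API: `register_protocol`, `parse_url`, and `connect` are hypothetical names, the factories only return descriptive strings, and the `simhandle` scheme name is an assumption. The `taoiiophandle` scheme is taken from the log output shown later in this deck.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: URL scheme -> factory(host, port, object_id).
_PROTOCOLS: Dict[str, Callable[[str, int, str], str]] = {}


def register_protocol(scheme: str, factory: Callable[[str, int, str], str]) -> None:
    """Make a protocol library available under a URL scheme."""
    _PROTOCOLS[scheme] = factory


def parse_url(url: str) -> Tuple[str, str, int, str]:
    """Split 'scheme://host:port/object_id' into its four parts."""
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {url!r}")
    authority, _, object_id = rest.partition("/")
    host, sep, port = authority.rpartition(":")
    if not sep:
        raise ValueError(f"missing port in {url!r}")
    return scheme, host, int(port), object_id


def connect(url: str) -> str:
    """Dispatch to whichever protocol library registered the URL's scheme."""
    scheme, host, port, object_id = parse_url(url)
    try:
        factory = _PROTOCOLS[scheme]
    except KeyError:
        raise LookupError(f"no RMI protocol registered for {scheme!r}") from None
    return factory(host, port, object_id)


# Stand-in factories; a real library would open a connection here.
register_protocol(
    "simhandle",  # assumed scheme name for the built-in protocol
    lambda h, p, o: f"Simple Protocol connection to {h}:{p}, object {o!r}",
)
register_protocol(
    "taoiiophandle",
    lambda h, p, o: f"CORBA IIOP connection to {h}:{p}, object {o!r}",
)
```

Calling `connect("taoiiophandle://quartic.txcorp.com:8081/1000")` selects the second factory. Client code stays unchanged when a different protocol library is registered, which is the property the slide attributes to Babel's open architecture.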
8. Using CORBA in Babel RMI Allows CORBA and Babel Objects to Interoperate
- Goals
  - Allow interoperability between existing CORBA and Babel objects
  - Retain the performance of the CORBA IIOP protocol
- Possible approaches for serialization
  - Encapsulate the Babel Simple Protocol wire format in a block of binary data and transport it using CORBA (as an octet sequence)
  - Encapsulate Babel communications in CORBA Any objects (not pursued because of the inefficiency of Any)
  - Map Babel communications to the CORBA format directly (the adopted approach). CORBA uses Common Data Representation (CDR) on the wire.
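
To make the "map directly to CDR" option more concrete, here is a minimal sketch of CDR-style encoding: primitives are aligned to their natural size, strings carry a length that includes the terminating NUL, and sequences carry an element count. It is a simplification, not TAO's marshaling code: the `CdrWriter` class is hypothetical, byte order is fixed to little-endian, and GIOP message headers are omitted.

```python
import struct


class CdrWriter:
    """Tiny CDR-style encoder (little-endian, natural alignment only)."""

    def __init__(self) -> None:
        self.buf = bytearray()

    def _align(self, size: int) -> None:
        # Pad with zero bytes so the next value starts on a multiple of `size`.
        self.buf.extend(b"\x00" * ((-len(self.buf)) % size))

    def _put(self, fmt: str, size: int, value) -> None:
        self._align(size)
        self.buf.extend(struct.pack("<" + fmt, value))

    def short(self, v: int) -> None:
        self._put("h", 2, v)

    def long(self, v: int) -> None:
        self._put("i", 4, v)

    def ulong(self, v: int) -> None:
        self._put("I", 4, v)

    def double(self, v: float) -> None:
        self._put("d", 8, v)

    def string(self, s: str) -> None:
        data = s.encode("ascii") + b"\x00"  # length counts the NUL terminator
        self.ulong(len(data))
        self.buf.extend(data)

    def long_sequence(self, values) -> None:
        self.ulong(len(values))  # element count, then the elements
        for v in values:
            self.long(v)


def encode_dcomplex(real: float, imaginary: float) -> bytes:
    """Encode a two-double struct as two consecutive aligned doubles."""
    w = CdrWriter()
    w.double(real)
    w.double(imaginary)
    return bytes(w.buf)
```

A short followed by a double occupies 16 bytes here (2 bytes of data, 6 of padding, 8 of data), which shows why a direct mapping has to respect alignment rather than simply concatenating values.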
9. Direct Conversions Between CORBA and Babel Types Enable Interoperability with Little Penalty
    module taoiiop
    {
      module rmi
      {
        exception ServerException { string info; };

        struct fcomplex { float real; float imaginary; };

        struct dcomplex { double real; double imaginary; };

        /*
         * SIDL arrays are mapped to CORBA structs which keep all the
         * metadata information, and the array values are stored as a
         * CORBA sequence following the metadata
         */
        typedef sequence<long> ArrayDims;

        struct Array_Metadata
        {
          short ordering;
          short dims;
          ArrayDims stride;
          ArrayDims lower;
          ArrayDims upper;
        };
      };
    };
- After optimization, TaoIIOP 2.0 has performance close to that of a raw socket
- Optimizations: made the CORBA-Babel mapping types native in TAO by implementing optimized, zero-copy versions of the marshaling and demarshaling support
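
The array mapping above pairs a metadata record with a flat sequence of values. The sketch below shows how such metadata can be derived from per-dimension bounds and used to locate an element in the flat sequence. The field names follow the metadata struct on this slide. The `ROW_MAJOR`/`COLUMN_MAJOR` constants and the helper functions are hypothetical, and dense, contiguous storage is assumed.

```python
from dataclasses import dataclass
from typing import List, Sequence

ROW_MAJOR = 1      # hypothetical encoding of the `ordering` field
COLUMN_MAJOR = 0   # hypothetical encoding of the `ordering` field


@dataclass
class ArrayMetadata:
    ordering: int
    dims: int
    stride: List[int]
    lower: List[int]
    upper: List[int]


def describe(lower: Sequence[int], upper: Sequence[int],
             ordering: int = ROW_MAJOR) -> ArrayMetadata:
    """Build metadata for a dense array with inclusive per-dimension bounds."""
    extents = [u - l + 1 for l, u in zip(lower, upper)]
    n = len(extents)
    stride = [1] * n
    if ordering == ROW_MAJOR:
        # Last index varies fastest.
        for i in range(n - 2, -1, -1):
            stride[i] = stride[i + 1] * extents[i + 1]
    else:
        # First index varies fastest.
        for i in range(1, n):
            stride[i] = stride[i - 1] * extents[i - 1]
    return ArrayMetadata(ordering, n, stride, list(lower), list(upper))


def flat_index(meta: ArrayMetadata, index: Sequence[int]) -> int:
    """Position of element `index` in the flat value sequence."""
    return sum((i - lo) * s
               for i, lo, s in zip(index, meta.lower, meta.stride))
```

For a 2 x 3 array with bounds `[0, 0]` to `[1, 2]`, `describe` yields strides `[3, 1]` in row-major order and `[1, 2]` in column-major order. Because the receiver gets the strides and bounds alongside the values, it can interpret the flat sequence without copying it into a different layout first.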
10. Agenda
- Motivation for Distributed and Parallel High-Performance Computing (DPHPC)
- Enabling distributed technologies
- Applications development
11. Leveraging Oneway and Asynchronous Calls to Increase Application Parallelism
[Figure: timelines comparing synchronous invocations with asynchronous/oneway invocations. Labels: simulation cluster (compute-bound tasks, dump data, signal) and remote cluster (data analysis).]
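
The difference between the two invocation styles can be sketched as follows. This is an illustration only: a single-worker thread pool stands in for the remote cluster, and `compute_step` and `analyze` are hypothetical placeholders for the solver step and the remote data analysis.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def compute_step(step: int) -> List[int]:
    """Placeholder for one compute-bound solver step."""
    return [step * i for i in range(4)]


def analyze(data: List[int]) -> int:
    """Placeholder for the data analysis run on the remote side."""
    return sum(data)


def run_synchronous(steps: int) -> List[int]:
    results = []
    for step in range(steps):
        data = compute_step(step)
        # The solver waits here until the analysis has returned.
        results.append(analyze(data))
    return results


def run_asynchronous(steps: int) -> List[int]:
    pending = []
    with ThreadPoolExecutor(max_workers=1) as remote:
        for step in range(steps):
            data = compute_step(step)
            # submit() returns immediately, so the next step can start
            # while the previous dump is still being analyzed.
            pending.append(remote.submit(analyze, data))
        return [f.result() for f in pending]
```

Both functions produce the same results. The asynchronous version only changes when the solver waits: once at the end instead of after every dump. With a true oneway call the solver would not collect results at all, which matches the "no feedback to the solver" scenario.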
12. Performance Comparison: TaoIIOP Async and Oneway Calls
- Figure shows the average time for each time step
- Very lightweight data analysis: emphasis on transport cost
- A payload of 0 actually makes no remote invocation
- Babel team is working on a new RMI implementation
13. VORPAL is a Versatile Framework for Physics Simulations
- Highly flexible, arbitrary-dimension
- Plasma and beam simulations using multiple models
- Utilizes both MPI and parallel I/O
- Use of a robust <init> file to configure a simulation task
    <grid Globalgrid>
      numPhysCells = [NX, NY, NZ]
      length = [LX, LY, LZ]
    </grid>
    <Decomp decomp>
      decompType = regular
    </Decomp>
    <EmField myemfield>
      kind = yeeEmField
    </EmField>
    <Species Electrons>
      kind = relBoris
      mass = ELEMACS
    </Species>
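
A block-structured configuration like the one above can be read with a small stack-based parser. The sketch below is not VORPAL's parser: it assumes `<Type name>` ... `</Type>` blocks containing one `key = value` pair per line, and the dictionary layout it returns is hypothetical.

```python
import re
from typing import Any, Dict

_OPEN = re.compile(r"<(\w+)\s+(\w+)>$")
_CLOSE = re.compile(r"</(\w+)>$")
_ASSIGN = re.compile(r"(\w+)\s*=\s*(.+)$")


def parse_blocks(text: str) -> Dict[str, Any]:
    """Parse nested '<Type name> key = value </Type>' blocks into dicts."""
    root: Dict[str, Any] = {"type": "root", "name": None,
                            "attrs": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = _OPEN.match(line)
        if m:
            block = {"type": m.group(1), "name": m.group(2),
                     "attrs": {}, "children": []}
            stack[-1]["children"].append(block)
            stack.append(block)
            continue
        m = _CLOSE.match(line)
        if m:
            if len(stack) == 1 or stack[-1]["type"] != m.group(1):
                raise ValueError(f"unexpected closing tag: {line}")
            stack.pop()
            continue
        m = _ASSIGN.match(line)
        if m:
            # Values are kept as raw strings; no type conversion is attempted.
            stack[-1]["attrs"][m.group(1)] = m.group(2).strip()
            continue
        raise ValueError(f"cannot parse line: {line}")
    if len(stack) != 1:
        raise ValueError("unclosed block at end of input")
    return root


SAMPLE = """
<grid Globalgrid>
  numPhysCells = [NX, NY, NZ]
</grid>
<Species Electrons>
  kind = relBoris
</Species>
"""
```

`parse_blocks(SAMPLE)` returns a root whose children are the `grid` and `Species` blocks with their attributes. Because the parser keeps a stack of open blocks, a block nested inside another block is handled the same way.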
14. Componentize VORPAL to Perform On-demand Data Processing
15. DPHPC Application Speed-up for On-line Data Analysis
- We developed a prototype that performs online data analysis as a proof of concept
  - Runs in the same cluster as two groups of processors
  - A 20% speedup was observed
  - More speedup is expected with more elaborate data processing
We modified the VORPAL source code separately for this prototype
16. DPHPC Applications: Remote Monitoring/Steering of Simulations
- We have extended the VORPAL component framework to interact with the CCA framework through Babel RMI
- Configurable from VORPAL's initialization file
    <History historyName>
      kind = historyKind
      <Sender mySender>
        kind = babelSender
        babelRmiURL = eclipse.txcorp.com:8081
      </Sender>
    </History>
- Supports specification of a URL group: a list of URLs running parallel tasks
- We are able to connect a running simulation to one or multiple workstations
  - For online data processing/analysis
  - For monitoring the simulation
- Physicists are most interested in
  - Monitoring
  - Steering
VpBabelSender connecting to taoiiophandle://quartic.txcorp.com:8081
VpBabelSender endpoint URL taoiiophandle://quartic.txcorp.com:8081/1000
VorpalClient constructor
update 1 time=6.128014e-13
update 2 time=1.225603e-12
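
One way to picture a sender that is configured with a URL group is a small fan-out object that forwards each update to every endpoint. This is a sketch under assumptions, not the VpBabelSender implementation: `GroupSender`, the `Transport` callable, and the choice to send every update to every URL are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical transport signature: (url, update number, simulation time).
Transport = Callable[[str, int, float], None]


class GroupSender:
    """Forward numbered updates to every endpoint in a URL group."""

    def __init__(self, urls: Iterable[str], transport: Transport) -> None:
        self.urls: List[str] = list(urls)
        if not self.urls:
            raise ValueError("a URL group needs at least one URL")
        self.transport = transport
        self.update_count = 0

    def send(self, sim_time: float) -> int:
        """Send one update to all endpoints and return its number."""
        self.update_count += 1
        for url in self.urls:
            self.transport(url, self.update_count, sim_time)
        return self.update_count


def demo() -> List[Tuple[str, int, float]]:
    """Record what would be sent instead of opening real connections."""
    received: List[Tuple[str, int, float]] = []
    sender = GroupSender(
        ["taoiiophandle://hostA:8081/1000", "taoiiophandle://hostB:8081/1000"],
        lambda url, n, t: received.append((url, n, t)),
    )
    sender.send(6.128014e-13)
    sender.send(1.225603e-12)
    return received
```

Passing the transport in as a callable keeps the fan-out logic independent of the RMI protocol, so the same sender could be exercised with a recording stub, as in `demo`, or with a real connection.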
17. Summary
- Implemented the distributed proxy components and the TaoIIOP Babel RMI protocol for connecting distributed CCA applications into an integrated system
- Conducted performance benchmarking on a preliminary prototype implementation (version 1.0) to identify the key optimizations needed
- Implemented the optimizations to minimize the overhead (version 2.0)
- Interoperability with CORBA can be achieved with little or no performance penalty
18. Summary and Future Directions
- Interoperability with CORBA can be achieved with little or no performance penalty
- Implement more scenarios that mix distributed and high-performance components, involving several clusters and real applications
- Synergy with MCMD
- Support for petascale HPC applications
- Remote monitoring/steering of large-scale simulations on supercomputers (e.g., Franklin)
- Can take advantage of CORBA-Babel RMI interoperability for now and switch to TAOIIOP later