1
Integrating Large-Scale Distributed and Parallel
High Performance Computing (DPHPC) Applications
Using a Component-based Architecture
  • Nanbor Wang¹, Fang (Cherry) Liu², Paul Hamill¹, Stephen Tramer¹, Rooparani Pundaleeka¹, Randall Bramley²
  • ¹Tech-X Corporation, Boulder, CO, U.S.A.  ²Indiana University, Bloomington, IN, U.S.A.

Workshop on Component-Based High-Performance Computing, October 16, 2008, Karlsruhe, Germany
Work partially funded by the US Department of
Energy, Office of Advanced Scientific Computing
Research, Grant DE-FG02-04ER84099
2
Agenda
  • Motivation and approach for Distributed and
    Parallel High-Performance Computing (DPHPC)
  • Enabling distributed technologies
  • Applications development

3
Distributed and Parallel Component-Based Software
Engineering Addresses Modern Needs of Scientific
Computing
  • Motivating scenarios for Distributed and Parallel HPC (DPHPC):
  • Integrating separately developed, established codes (FSP, climate modeling, space weather modeling), where each component needs its own architecture
  • Providing ways to better utilize hardware with high CPU counts and to combine the computing resources of multiple clusters/computing centers
  • Enabling parallel data streaming between a computing task and a post-processing task (no feedback to the solver)
  • Integrating multiple parallel codes that use heterogeneous architectures
  • Existing component standards and frameworks were designed with enterprise applications in mind:
  • No support for features that are important for HPC scientific applications: interoperability with scientific programming languages (FORTRAN) and parallel computing infrastructure (MPI)
  • CCA addresses the needs of HPC scientific applications: combustion modeling, global climate modeling, fusion and plasma simulations
  • Tasks:
  • Explore various distributed technologies and approaches for DPHPC
  • Enhance tool support for DPHPC: Fortran 2003 struct support (covered later in Stefan's talk)

4
Typical Parallel CCA Frameworks
[Figure: SCMD and MCMD parallel framework configurations]
  • Support both SPMD and MPMD scenarios
  • Stay out of the way of component parallelism
  • Components handle parallel communication themselves (see the sketch below)
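
To make the last bullet concrete, here is a minimal, self-contained SCMD sketch (the Component interface and the framework-style main loop are invented for illustration; this is not the actual gov.cca API): the framework starts the same component on every MPI process and stays out of the communication path, while the component's go() method talks to its peers directly through MPI.

  // Illustrative SCMD sketch -- the Component interface here is a
  // stand-in for the CCA component model, not the real gov.cca API.
  #include <mpi.h>
  #include <cstdio>

  class Component {            // stand-in for a CCA-style component
  public:
    virtual ~Component() {}
    virtual int go() = 0;      // framework invokes this; parallelism is internal
  };

  class SumComponent : public Component {
  public:
    int go() override {
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      double local = static_cast<double>(rank + 1);  // each rank's partial value
      double global = 0.0;
      // The component, not the framework, performs the parallel communication.
      MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      if (rank == 0) std::printf("global sum = %g\n", global);
      return 0;
    }
  };

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);    // one framework instance per process (SCMD)
    SumComponent c;
    c.go();                    // framework stays out of the way
    MPI_Finalize();
    return 0;
  }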

5
An Illustration of a DPHPC Application
[Figure: an alternative MCMD configuration combining multiple parallel frameworks]
  • Still supports conventional CCA component-managed parallelism
  • Provides additional framework-mediated distributed inter-component communication capability

(cf. Cooperative Processing at LLNL; PaCO++ at INRIA)
6
Agenda
  • Motivation for Distributed and Parallel
    High-Performance Computing (DPHPC)
  • Enabling distributed technologies
  • Applications development

7
Babel RMI Allows Multiple Implementations
  • Babel generates the mappings for remote invocations and has its own transfer protocol, the Simple Protocol, implemented in C
  • Thanks to Babel's open architecture and language interoperability, users can take advantage of various distributed technologies through third-party RMI libraries
  • We have developed a CORBA protocol library for Babel RMI using TAO (version 1.5.1 or later):
  • The first third-party Babel RMI library
  • TAO is a C++-based CORBA middleware framework
  • This protocol is essentially a bridge between Babel and TAO; a sketch of the pluggable-protocol idea follows
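
The bridge works because Babel's RMI layer picks a wire protocol from the URL scheme. The sketch below illustrates that pluggable-protocol idea with invented class and registry names (it does not reproduce Babel's actual sidl.rmi runtime API); the taoiiophandle URL matches the endpoint format shown on slide 16.

  // Illustrative pluggable-protocol registry; names are invented for
  // this sketch and do not match Babel's actual runtime classes.
  #include <functional>
  #include <iostream>
  #include <map>
  #include <memory>
  #include <stdexcept>
  #include <string>

  struct Protocol {                   // one wire protocol implementation
    virtual ~Protocol() = default;
    virtual void connect(const std::string& url) = 0;
  };

  struct SimpleProtocol : Protocol {  // Babel's built-in transfer protocol
    void connect(const std::string& url) override {
      std::cout << "Simple Protocol connecting to " << url << "\n";
    }
  };

  struct TaoIiopProtocol : Protocol { // bridge to TAO's CORBA IIOP transport
    void connect(const std::string& url) override {
      std::cout << "TaoIIOP (CORBA/TAO bridge) connecting to " << url << "\n";
    }
  };

  class ProtocolRegistry {
    std::map<std::string, std::function<std::unique_ptr<Protocol>()>> factories_;
  public:
    void add(const std::string& scheme,
             std::function<std::unique_ptr<Protocol>()> f) {
      factories_[scheme] = std::move(f);
    }
    // Pick the protocol from the scheme part of "scheme://host:port/obj".
    std::unique_ptr<Protocol> connect(const std::string& url) {
      const auto pos = url.find("://");
      if (pos == std::string::npos) throw std::runtime_error("bad URL");
      auto it = factories_.find(url.substr(0, pos));
      if (it == factories_.end()) throw std::runtime_error("unknown protocol");
      auto p = it->second();
      p->connect(url);
      return p;
    }
  };

  int main() {
    ProtocolRegistry reg;
    reg.add("simhandle", [] { return std::make_unique<SimpleProtocol>(); });
    reg.add("taoiiophandle", [] { return std::make_unique<TaoIiopProtocol>(); });
    reg.connect("taoiiophandle://quartic.txcorp.com:8081/1000");
  }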

8
Using CORBA in Babel RMI Allows CORBA and Babel
Objects to Interoperate
  • Goals:
  • Allow interoperability between existing CORBA and Babel objects
  • Retain the performance of CORBA's IIOP protocol
  • Possible approaches for serialization:
  • Encapsulate the Babel Simple Protocol wire format in a block of binary data and transport it with CORBA (as an octet sequence)
  • Encapsulate Babel communications in CORBA Any objects (not pursued because of the inefficiency of Any)
  • Map Babel communications to the CORBA format directly (the adopted approach); CORBA uses the Common Data Representation (CDR) on the wire

9
Direct Conversions Between CORBA and Babel Types
Enable Interoperability with Little Penalty
  module taoiiop {
    module rmi {
      exception ServerException { string info; };

      struct fcomplex { float real;  float imaginary; };
      struct dcomplex { double real; double imaginary; };

      /*
       * SIDL arrays are mapped to CORBA structs that keep all of the
       * metadata; the array values are stored as a CORBA sequence
       * following the metadata.
       */
      typedef sequence<long> ArrayDims;
      struct Array_Metadata {
        short ordering;
        short dims;
        ArrayDims stride;
        ArrayDims lower;
        ArrayDims upper;
      };
    };
  };
  • After the optimizations, TaoIIOP 2.0 performs close to a raw socket
  • Optimizations: made the CORBA-Babel mapping types native in TAO by implementing optimized, zero-copy marshaling and demarshaling support (see the sketch below)
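
To show what the direct mapping carries, the sketch below flattens a small column-major 2-D array into the Array_Metadata fields from the IDL above plus a contiguous value sequence. It is illustrative only: the ordering codes are assumed, and the real TaoIIOP marshals these types through TAO's CDR streams rather than plain C++ containers.

  // Illustrative mapping of a column-major 2-D array onto the
  // Array_Metadata layout from the IDL above, with the values as a
  // flat sequence that follows the metadata on the wire.
  #include <cstdint>
  #include <iostream>
  #include <vector>

  struct ArrayMetadata {
    int16_t ordering;               // 0 = column-major, 1 = row-major (assumed codes)
    int16_t dims;                   // number of dimensions
    std::vector<int32_t> stride;    // per-dimension stride
    std::vector<int32_t> lower;     // per-dimension lower index bound
    std::vector<int32_t> upper;     // per-dimension upper index bound
  };

  int main() {
    const int32_t rows = 2, cols = 3;
    // Column-major values of the 2x3 array [[1,2,3],[4,5,6]].
    std::vector<double> values = {1, 4, 2, 5, 3, 6};

    ArrayMetadata md;
    md.ordering = 0;                // column-major
    md.dims = 2;
    md.stride = {1, rows};          // step 1 within a column, 'rows' across columns
    md.lower  = {0, 0};
    md.upper  = {rows - 1, cols - 1};

    // On the wire, the metadata struct travels first and the value
    // sequence follows it, so the receiver can rebuild the array.
    std::cout << "dims=" << md.dims << " values:";
    for (double v : values) std::cout << ' ' << v;
    std::cout << '\n';
  }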

10
Agenda
  • Motivation for Distributed and Parallel
    High-Performance Computing (DPHPC)
  • Enabling distributed technologies
  • Applications development

11
Leveraging Oneway and Asynchronous Calls to
Increase Application Parallelism
[Figure: timelines contrasting synchronous with asynchronous/oneway invocations. With synchronous calls, the simulation cluster blocks after each compute-bound task to dump data and signal the remote data-analysis cluster; with asynchronous/oneway calls, the data dumps and analysis signals overlap the next compute-bound task. A sketch of the overlap pattern follows.]
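The deck achieves this overlap with CORBA oneway and Babel asynchronous invocations; the sketch below reproduces the same fire-and-continue pattern with std::async so it runs without an ORB (all function names are invented for illustration).

  // Overlapping compute with data dump/analysis signaling, mimicking
  // the oneway/asynchronous invocation pattern; std::async stands in
  // for the ORB so the sketch is self-contained.
  #include <chrono>
  #include <future>
  #include <iostream>
  #include <thread>
  #include <vector>

  void computeStep(int step) {                  // compute-bound task
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    std::cout << "step " << step << " computed\n";
  }

  void dumpAndSignalAnalysis(std::vector<double> snapshot) {
    // Stands in for a oneway RMI call to the remote analysis cluster:
    // the caller does not wait for the analysis to finish.
    std::this_thread::sleep_for(std::chrono::milliseconds(30));
    std::cout << "analysis signaled for " << snapshot.size() << " values\n";
  }

  int main() {
    std::vector<std::future<void>> pending;
    for (int step = 0; step < 3; ++step) {
      computeStep(step);
      std::vector<double> snapshot(1024, static_cast<double>(step));
      // Fire and continue: the next compute step starts immediately
      // while the dump/signal proceeds in the background.
      pending.push_back(std::async(std::launch::async,
                                   dumpAndSignalAnalysis, std::move(snapshot)));
    }
    for (auto& f : pending) f.get();  // drain outstanding dumps at shutdown
  }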
12
Performance Comparison: TaoIIOP Async and Oneway
Calls
  • The figure shows the average time for each time step
  • Very lightweight data analysis, to emphasize transport cost
  • The 0-byte payload case makes no remote invocation at all
  • The Babel team is working on a new RMI implementation

13
VORPAL is a Versatile Framework for Physics
Simulations
  • Highly flexible, arbitrary-dimension
  • Plasma and beam simulations using multiple models
  • Utilizes both MPI and parallel I/O
  • Uses a robust <init> file to configure a simulation task, for example:

  <grid Globalgrid>
    numPhysCells = NX, NY, NZ
    length = LX, LY, LZ
  </grid>
  <Decomp decomp>
    decompType = regular
  </Decomp>
  <EmField myemfield>
    kind = yeeEmField
  </EmField>
  <Species Electrons>
    kind = relBoris
    mass = ELECMASS
  </Species>
14
Componentize VORPAL to perform On-demand Data
Processing
15
DPHPC Application Speed-up for On-line Data
Analysis
  • We developed a prototype that performs online data analysis as a proof of concept
  • Runs in the same cluster as two groups of processors (see the sketch below)
  • A 20% speedup was observed
  • More speedup is expected with more elaborate data processing

We modified the VORPAL source code separately for
this prototype
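
One way to arrange two groups of processors in a single allocation is to split MPI_COMM_WORLD into a simulation group and an analysis group. The sketch below is a stand-in for the prototype, not the actual VORPAL code: the last-rank-analyzes split and the message hand-off are invented for illustration.

  // Splitting one cluster allocation into a simulation group and an
  // analysis group; illustrative only, not the actual VORPAL prototype.
  #include <mpi.h>
  #include <cstdio>
  #include <vector>

  int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Last rank does analysis; all others run the simulation (assumed split).
    const int analysisRank = size - 1;
    const int color = (rank == analysisRank) ? 1 : 0;
    MPI_Comm group;                       // per-group communicator
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group);

    std::vector<double> field(8, static_cast<double>(rank));
    if (color == 0) {
      // Simulation ranks: compute, then ship a data dump to the analyzer.
      MPI_Send(field.data(), (int)field.size(), MPI_DOUBLE,
               analysisRank, /*tag=*/0, MPI_COMM_WORLD);
    } else {
      // Analysis rank: receive one dump per simulation rank and process it.
      for (int src = 0; src < size - 1; ++src) {
        MPI_Recv(field.data(), (int)field.size(), MPI_DOUBLE,
                 src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        std::printf("analyzed dump from rank %d\n", src);
      }
    }
    MPI_Comm_free(&group);
    MPI_Finalize();
  }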
16
DPHPC Applications Remote Monitoring/Steering
of Simulations
  • We have extended the VORPAL component framework to interact with a CCA framework through Babel RMI
  • Configurable from VORPAL's initialization file:

    <History historyName>
      kind = historyKind
      <Sender mySender>
        kind = babelSender
        babelRmiURL = eclipse.txcorp.com:8081
      </Sender>
    </History>

  • Supports specification of a URL group: a list of URLs running parallel tasks
  • We are able to connect a running simulation to one or multiple workstations
  • For online data processing/analysis
  • For monitoring a simulation
  • Physicists are most interested in:
  • Monitoring
  • Steering

  VpBabelSender connecting to taoiiophandle://quartic.txcorp.com:8081
  VpBabelSender endpoint URL taoiiophandle://quartic.txcorp.com:8081/1000
  VorpalClient constructor
  update 1 time=6.128014e-13
  update 2 time=1.225603e-12
17
Summary
  • Implemented the distributed proxy components and the TaoIIOP Babel RMI protocol for connecting distributed CCA applications into an integrated system
  • Conducted performance benchmarking on the preliminary prototype implementation (version 1.0) to identify the key optimizations needed
  • Implemented those optimizations to minimize the overhead (version 2.0)
  • Interoperability with CORBA can be achieved with little or no performance penalty

18
Summary and Future Directions
  • Interoperability with CORBA can be achieved with little or no performance penalty
  • Implement more scenarios mixing distributed and high-performance components, involving several clusters and real applications
  • Synergy with MCMD
  • Support for petascale HPC applications
  • Remote monitoring/steering of large-scale simulations on supercomputers (e.g., Franklin)
  • Can take advantage of CORBA-Babel RMI interoperability for now and switch to TaoIIOP later