Title: Experiences with Applications on the Grid using PACX-MPI
1. Experiences with Applications on the Grid using PACX-MPI
- Matthias Mueller
- mueller_at_hlrs.de
- HLRS
- www.hlrs.de
- University of Stuttgart
- Germany
2. Outline
- Definition, Scenarios and Success Stories
- Middleware and Tools: The DAMIEN project
- Case Study
3. Grid Scenario
- Standard approach: one big supercomputer
- Grid approach: distributed resources
4. Example of Distributed Resources: Supercomputers
- HLRS Cray T3E 512/900, 460 GFlops
- CSAR Cray T3E 576/1200, 691 GFlops
- PSC Cray T3E 512/900, 460 GFlops
- TACC Hitachi SR8000, 512 CPU/64 nodes, 512 GFlops
- NCHC IBM SP3 Winter Hawk 2, 168 CPU/42 nodes, 252 GFlops
- JAERI NEC SX-4, 4 nodes, 8 GFlops
- Total: 2.383 TFlops
5. Applications
- CFD (HLRS)
- re-entry simulation of spacecraft
- Processing of radio astronomy data (MC)
- pulsar search code
- DSMC (HLRS)
- simulation of granular media
6. GWAAT: Global Wide Area Application Testbed
- NSF Award at SC'99
7. Network Topology
[Figure: network topology of the testbed. Sites in Stuttgart, Manchester, Dallas, Tokyo, Tsukuba/Tokyo and Hsinchu are linked via Belwü/RUS, DFN, TEN-155 (Dante, Frankfurt), JANET, Abilene (New York), APAN, IMnet and SCInet.]
- TACC Hitachi SR8000: sr8k.aist.go.jp (150.29.228.82)
- JAERI NEC SX-4: frente.koma.jaeri.go.jp (202.241.61.92)
- NCHC IBM SP3: ivory.nchc.gov.tw (140.110.7.x)
- MCC Cray T3E: turing.cfs.ac.uk (130.88.212.1)
- HLRS Cray T3E: hwwt3e-at.hww.de (129.69.200.195)
8. Middleware for Scientific Grid Computing: PACX-MPI
- PACX-MPI is a Grid-enabled MPI implementation
- no difference between parallel computing and Grid computing
- higher latencies for external messages (70 ms compared to 20 µs)
- Co-operation with JAERI regarding the communication library Stampi
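Since PACX-MPI implements the standard MPI interface, an existing MPI code like the minimal sketch below should in principle run unchanged across the coupled machines; only the start-up configuration differs (a generic illustration, not code from the slides):

    /* Minimal standard MPI program: the point of PACX-MPI is that code
     * like this needs no source changes to run across several machines. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* global rank across all hosts */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes   */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }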
9. Status of the Implementation (I)
- Full MPI 1.2 implemented
- MPI-2 functionality
- Extended collective operations
- Language interoperability routines
- Canonical Pack/Unpack Functions
- MPI-2 JoD (Journal of Development) cluster attributes
- Other implemented features
- data conversion
- data compression
- data encryption
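As an example of the canonical pack/unpack functions and data conversion listed above, here is a small sketch using the MPI-2 "external32" representation, which makes packed data portable between machines with different native data formats (a generic illustration, not PACX-MPI internals):

    /* Packing data into the machine-independent "external32" representation,
     * so that any receiver can unpack it regardless of native byte order. */
    #include <mpi.h>
    #include <stdlib.h>

    void pack_canonical(void)
    {
        double values[4] = {1.0, 2.0, 3.0, 4.0};
        MPI_Aint bufsize, position = 0;

        /* Size of the canonical representation of 4 doubles. */
        MPI_Pack_external_size("external32", 4, MPI_DOUBLE, &bufsize);

        void *buf = malloc(bufsize);
        MPI_Pack_external("external32", values, 4, MPI_DOUBLE,
                          buf, bufsize, &position);
        /* buf now holds 'position' bytes in canonical form; the receiver
         * would unpack them with MPI_Unpack_external("external32", ...). */
        free(buf);
    }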
10. Status of the Implementation (II)
- PACX-MPI ported and tested on
- Cray T3E
- SGI Origin/Onyx
- Hitachi SR2201 and SR8000
- NEC SX4 and SX5
- IBM RS6000/SP
- SUN platforms
- Alpha platforms
- LINUX IA32 and IA64
11. Challenge: What is the application performance?
- CFD (HLRS)
- re-entry simulation of spacecraft
- Processing of radio astronomy data (MC)
- pulsar search code
- DSMC (HLRS)
- simulation of granular media
- Electronic structure simulation (PSC)
- Risk management for environmental crises (JAERI)
- GFMC (NCHC)
- high-Tc superconductor simulation
- Coupled vibro-acoustic simulation
12. Comparison DSMC <-> MD
[Figure: domain decomposition and speed-up, shown for both DSMC and MD.]
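Both codes rely on a spatial domain decomposition; the sketch below shows how such a decomposition is typically set up with MPI's Cartesian topology routines (a generic illustration of the technique, not the authors' code):

    /* Generic 2-D domain decomposition with an MPI Cartesian topology, as
     * commonly used in DSMC and MD codes: each rank owns one sub-domain and
     * learns its neighbours for particle/halo exchange. */
    #include <mpi.h>

    void setup_decomposition(MPI_Comm *cart,
                             int *left, int *right, int *down, int *up)
    {
        int size, dims[2] = {0, 0}, periods[2] = {1, 1};  /* periodic box */
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Dims_create(size, 2, dims);               /* e.g. 16 ranks -> 4x4 */
        MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, cart);
        MPI_Cart_shift(*cart, 0, 1, left, right);     /* neighbours in x */
        MPI_Cart_shift(*cart, 1, 1, down, up);        /* neighbours in y */
    }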
13. DSMC - Direct Simulation Monte Carlo on a Transatlantic Grid
14. Necessary Tools: The DAMIEN Project
15. The Development Phase
[Workflow: from sequential code(s) to parallel (MPI) code(s); the steps below are iterated until the code is correct (yes/no decision in the diagram).]
- parallelization, code coupling and optimisation: MPI, MpCCI
- compiling and linking with libraries: MpCCI, PACX-MPI
- debugging: Marmot
- performance analysis: MetaVampir
16. MpCCI Basic Functionality
- Communication
- Based on MPI
- Coupling of Sequential and Parallel Codes
- Communicators for Codes (internal) and Coupling (external) - see the sketch after this list
- Neighborhood Search
- Bucket Search Algorithm
- Interface for User-defined Neighborhood Search
- Interpolation
- Linear Surface Interpolation for Standard Elements
- Volume Interpolation
- User-defined Interpolation
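A plain-MPI sketch of the internal/external communicator idea: each coupled code gets its own internal communicator, and an intercommunicator links the two codes for coupling. This only illustrates the concept; it is not MpCCI's actual interface, and the rank layout is an assumption made here for illustration:

    #include <mpi.h>

    /* my_code: 0 for the first code, 1 for the second.
     * ranks_of_code0: number of MPI_COMM_WORLD ranks belonging to code 0;
     * the sketch assumes code 0 occupies ranks 0..ranks_of_code0-1. */
    void split_codes(int my_code, int ranks_of_code0,
                     MPI_Comm *internal, MPI_Comm *coupling)
    {
        /* Internal communicator: all ranks with the same code id. */
        MPI_Comm_split(MPI_COMM_WORLD, my_code, 0, internal);

        /* Coupling (external) communicator: an intercommunicator whose
         * remote leader is rank 0 of the other code in MPI_COMM_WORLD. */
        int remote_leader = (my_code == 0) ? ranks_of_code0 : 0;
        MPI_Intercomm_create(*internal, 0, MPI_COMM_WORLD, remote_leader,
                             99 /* tag */, coupling);
    }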
17. DAMIEN End-User Application
- EADS is distributed over numerous sites all over Europe
- Computing resources are distributed -> natural need for Grid software to couple the resources
- Coupled vibro-acoustic simulations
- structure of rockets during the launch
- noise reduction in airplane cabins
18. MetaVampir - Application-Level Analysis
19. The Production Phase
[Workflow: starting from the Grid-enabled code and a given problem, experience is gathered with small problem sizes (MetaVampir trace, Dimemas trace); Dimemas then determines the optimal number of processors and combination of machines; the job is executed via the Configuration Manager and QoS Manager and analysed with MetaVampir.]
20. Dimemas Tuning Methodology
21. DAMIEN Tools in the Production Phase
[Figure: Dimemas tracefile in the tool chain]
23. PCM
- Direct numerical simulation of turbulent reactive flows
- 2-D flow fields
- detailed chemical reactions
- spatial discretization
- 6th order central derivatives
- integration in time
- 4th order explicit Runge-Kutta
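For reference, the textbook forms of the two schemes named above (the slides do not spell them out, and production codes may use low-storage Runge-Kutta variants):

    % 6th-order central difference for the first derivative on a uniform grid
    \[
      \left.\frac{\partial f}{\partial x}\right|_i \approx
      \frac{-f_{i-3} + 9f_{i-2} - 45f_{i-1} + 45f_{i+1} - 9f_{i+2} + f_{i+3}}{60\,\Delta x}
    \]
    % Classical 4th-order explicit Runge-Kutta step for u' = F(u)
    \[
      \begin{aligned}
        k_1 &= F(u^n), &
        k_2 &= F\bigl(u^n + \tfrac{\Delta t}{2}k_1\bigr), &
        k_3 &= F\bigl(u^n + \tfrac{\Delta t}{2}k_2\bigr), &
        k_4 &= F\bigl(u^n + \Delta t\,k_3\bigr), \\
        u^{n+1} &= u^n + \tfrac{\Delta t}{6}\bigl(k_1 + 2k_2 + 2k_3 + k_4\bigr)
      \end{aligned}
    \]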
24. Requirements for a 3D Simulation
- Components
- density, velocity and energy
- mass fractions for chemical species (9 - 100)
- Spatial discretization
- 100 µm (typical flame front), 1 cm for the computational domain
- 100 grid points in each direction
- Discretization in time
- 10^-8 s for some important intermediate radicals
- 1 s for slowly produced pollutants (e.g. NO)
- Summary
- 100 variables, 10^6 grid points
- 1 ms simulation time with time steps of about 10^-8 s
- 10^5 iterations with 10^8 unknowns (see the arithmetic below)
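The totals in the summary follow directly from the numbers above:

    \[
      \underbrace{100}_{\text{variables}} \times \underbrace{100^3}_{=\,10^6\ \text{grid points}} = 10^8\ \text{unknowns},
      \qquad
      \frac{10^{-3}\,\mathrm{s}\ \text{(simulated time)}}{10^{-8}\,\mathrm{s}\ \text{(time step)}} = 10^5\ \text{iterations}
    \]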
25. Example of a Production Run
- auto-ignition process
- fuel (10% H2 and 90% N2 at T = 298 K) and heated oxidiser (air at T = 1298 K)
- distribution is superimposed with a turbulent flow field
- temporal evolution is computed
26. Performance Analysis with MetaVampir
27. Message statistics - process view
28. Message statistics - cluster view
29. Result of production run: maximum heat release
30. Summary and Conclusion
- To make use of the Grid you need middleware and tools
- A Grid-aware MPI implementation like PACX-MPI offers an incremental, optimized approach to the Grid
- Together with other standard tools, this attracted a lot of scientific applications
- Applications are the driving force for PACX-MPI and DAMIEN
- This kind of scientific Grid computing is very demanding, but the requirements are common to other forms of Grid computing
- performance, resource management, scheduling, security
- The network is the bottleneck, because
- fat networks are only weakly interconnected
- there are political barriers between networks
- performance is not transparent
- nobody takes responsibility for end-to-end performance