Title: The Cactus Code: A Framework for Parallel Computing
1. The Cactus Code: A Framework for Parallel Computing
- Gabrielle Allen
- Albert Einstein Institute
- Max Planck Institute for Gravitational Physics
- allen@cactuscode.org
2. Cactus Code Versions 1, 2, 3
- A code for Numerical Relativity
- collaborative, portable, parallel, ...
- Model of Flesh and Thorns
- Flesh: core code, provides parallelisation, IO
- Thorns: plug-in modules, written in Fortran or C, which provide the applications and infrastructure
- Successful for numerical relativity, but problems as the number of thorns and number of users increased
- Redesign, incorporating lessons learnt from previous versions: Cactus 4.0
3. Current Version: Cactus 4.0
- Cactus 4.0 beta 1 released September 1999
- Flesh and many thorns distributed under GNU GPL
- Currently Cactus 4.0 beta 8
- Supported Architectures
- SGI Origin
- SGI 32/64
- Cray T3E (142GF on 1024 nodes)
- Dec Alpha
- Intel/Mac Linux
- Windows NT
- HP Exemplar
- IBM SP2
- Sun Solaris
- Hitachi SR8000-F
- NEC SX-5
4. Userbase
- Astrophysics (finite differencing, 3D hyperbolic/elliptic PDEs)
- Einstein equations: black holes, gravitational waves
- Relativistic matter: neutron stars, boson stars
- Newtonian matter: ZEUS code
- Aerospace
- Pilot project with DLR, introduction of unstructured grids
- QCD
- Computational Science
- Application code for Grid, Egrid, GrADs projects
- Parallel and Distributed IO
- Remote steering and visualization
- Adaptive mesh refinement
- And more ...
5. Einstein's Equations and Gravitational Waves
- Einstein's General Relativity
- Fundamental theory of physics (gravity)
- Among the most complex equations of physics
- Dozens of coupled, nonlinear hyperbolic-elliptic equations with 1000s of terms
- Barely have the capability to solve them after a century
- Predict black holes, gravitational waves, etc.
- Exciting new field about to be born: Gravitational Wave Astronomy
- Fundamentally new information about the Universe
- What are gravitational waves? Ripples in spacetime curvature, caused by matter motion, causing distances to change
- A last major test of Einstein's theory: do they exist?
- Eddington: "Gravitational waves propagate at the speed of thought"
- 1993 Nobel Prize Committee: Hulse-Taylor Pulsar (indirect evidence)
- 20xx Nobel Committee: ??? (for actual detection)
6. Detecting Gravitational Waves
- LIGO, VIRGO (Pisa), GEO600: $1 billion worldwide
- Was Einstein right? In 5-10 years, we'll see!
- We'll need numerical relativity to:
- Detect them: pattern matching against numerical templates to enhance the signal-to-noise ratio
- Understand them: just what are the waves telling us?
[Figure: Hanford, Washington site, with 4 km arms]
7. Waveforms: What Happens in Nature...
[Figure: waveforms shown in the PACS Virtual Machine Room]
8. Resources for 3D Numerical Relativity
- Explicit finite difference codes
- 10^4 Flops/zone/time step
- 100 3D arrays
- Require 1000^3 zones or more
- 1000 GBytes
- Double resolution: 8x memory, 16x Flops
- TFlop, TByte machine required
- Parallel AMR, I/O essential
- Etc...
[Figure: evolution of the grid from t=0 to t=100]
- Initial Data: 4 coupled nonlinear elliptic equations
- Evolution
- hyperbolic evolution
- coupled with elliptic equations
9. Axisymmetric Black Hole Simulations
Collision of two Black Holes (Misner Data)
Evolution of Highly Distorted Black Hole
10. NSF Black Hole Grand Challenge Alliance
- University of Texas (Matzner, Browne)
- NCSA/Illinois/AEI (Seidel, Saylor, Smarr, Shapiro, Saied)
- North Carolina (Evans, York)
- Syracuse (G. Fox)
- Cornell (Teukolsky)
- Pittsburgh (Winicour)
- Penn State (Laguna, Finn)
Develop Code To Solve Gμν = 0
11. NASA Neutron Star Grand Challenge
A Multipurpose Scalable Code for Relativistic
Astrophysics
- NCSA/Illinois/AEI (Saylor, Seidel, Swesty, Norman)
- Argonne (Foster)
- Washington U (Suen)
- Livermore (Ashby)
- Stony Brook (Lattimer)
Develop Code To Solve Gμν = 8πTμν
12. Cactus Modularity
[Diagram: thorns Boundary, CartGrid3D, WaveToyF77, WaveToyF90, PUGH, GrACE, IOFlexIO and IOHDF5 plugged into the FLESH (Parameters, Variables, Scheduling)]
13. Cactus 4 Design Goals
- Generalization
- meta-code that can be applied e.g. to any system of PDEs
- mainly 3D cartesian finite differencing codes (but changing)
- Abstraction
- identify key concepts that can be abstracted
- evolution skeleton, reduction operators, I/O, etc...
- Encapsulation
- protect the developers of thorns from other thorns ...
- Extension
- prepare for new concepts in future thorns
- overloading, inheritance, etc...
- In some ways, make it a little Object Oriented
14. Design Issues
- Modular and Object Oriented
- Keep the concept of thorns
- Encapsulation, Polymorphism, Inheritance, ...
- Fortran
- Influences most design issues
- Portable Parallelism
- Support for FMR and AMR as well as Unigrid
- Powerful Make system
- Tools such as Testsuite checking technology
15. Realization
- Perl
- The final code is created from thorn configuration files by Perl scripts that are some sort of seed for a new language: the Cactus Configuration Language (CCL)
- variables, (functions),
- parameters,
- scheduling
- Perl scripts also take care of testsuite checking, configuration, ...
- Flesh written in ANSI C
- Thorns written in C, C++, Fortran 77, Fortran 90
16. Cactus Flesh: the interface between Application Thorns and Computational Infrastructure Thorns
17. The Flesh
- Abstract API
- evolve the same PDE with unigrid, AMR (MPI or shared memory, etc.) without having to change any of the application code
- Interfaces
- the set of data structures that a thorn exports to the world (global), to its friends (protected) and to nobody (private), and how these are inherited
- Implementations
- different thorns may implement e.g. the evolution of the same PDE, and we select the one we want at runtime
- Scheduling
- call the routines of every thorn in a certain order and handle their interdependencies
- Parameters
- many types of parameters, with all of their essential consistency checked before running
18. Cactus Computational Toolkit
- Parallel Evolution Drivers
- PUGH
- MPI domain decomposition based unigrid driver
- Can be distributed using Globus
- GrACE/PAGH
- Adaptive Mesh Refinement driver
- Parallel Elliptic Solvers
- PETSc
- BAM
- Parallel Interpolators
- Parallel I/O
- FlexIO, ASCII, HDF5, Panda, Checkpointing, etc...
- Visualization, etc...
19. Data Structures
- Grid Arrays
- a multidimensional, arbitrarily sized array distributed among processors
- Grid Functions
- a field distributed on the multidimensional computational grid (a Grid Array sized to the grid)
- every point in the grid may hold a different value: f(x,y,z)
- Grid Scalars
- values common to all grid points
- Parameters
- values/keywords that affect the behavior of the code (initialization, evolution, output, etc.)
- parameter checking, steerable parameters
20. Data Types
- Cactus data types provide portability across platforms
- CCTK_REAL4, CCTK_REAL8, CCTK_REAL16
- CCTK_INT
- CCTK_INT2, CCTK_INT4, CCTK_INT8
- CCTK_CHAR
- CCTK_COMPLEX
- CCTK_COMPLEX8, CCTK_COMPLEX16, CCTK_COMPLEX32
21. Scheduling
- Thorns schedule
- when their routines should be executed
- what memory for Grid Arrays should be enabled
- which Grid Arrays should be synchronized on exit
- Basic evolution skeleton idea
- standard scheduling points: INITIAL, EVOL, ANALYSIS
- fine control: run this routine BEFORE/AFTER that routine
- Extend/customise with scheduling groups
- define your own scheduling points, e.g. MYEVOL
- add my routine to this group of routines
- run the group WHILE some condition is met
- Future redesign
- the scheduler is really a runtime selector of the computation flow
- we can add much more power to this concept
22. Interface
- The concept: a contract with the rest of the code
- currently only for the data structures: variables and parameters
- adding thorn utility routines and their arguments
- Private
- variables that you want the flesh to allocate/communicate, but no other thorn to see
- Public
- variables that you want everybody to see (which means that everybody can modify them too!)
- inheritance
- Protected
- variables that you want only your friends to see!
- watch out for the change of meaning from C++!
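These access levels are declared in a thorn's interface.ccl; a minimal sketch (the implementation and variable names are invented for illustration):

```
# Hypothetical interface.ccl fragment
implements: wavetoy
inherits:   grid

public:
REAL scalarfield TYPE=GF
{
  phi
} "Field every thorn may see (and modify!)"

private:
REAL scratch TYPE=GF
{
  phi_old
} "Storage the flesh allocates but no other thorn sees"
```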
23. Implementation
- Why?
- two or more thorns that provide the same functionality but different internal implementations
- interchangeable pieces that allow easy comparison and evolution in the development process
- they are compiled together and only one is activated at runtime
- How?
- if all the other thorns need to see the same contract, then thorns implementing a certain functionality must:
- have the same public variables
- and the same protected ones!!
- the same concept applies to parameters and scheduling
- Example
- wildly different evolution approaches for the same equations, so all the analysis and initial data thorns remain the same
24. Parallelism in Cactus
- Cactus is designed around a distributed memory model: each thorn is passed a section of the global grid.
- The actual parallel driver (implemented in a thorn) can use whatever method it likes to decompose the grid across processors and exchange ghost zone information; each thorn is presented with a standard interface, independent of the driver.
- driver::nghostzones = 1
25. PUGH
- The standard parallel driver supplied with Cactus is thorn PUGH
- Driver thorn: sets up grid variables, handles processor decomposition, deals with processor communications
- 1, 2, 3D (soon n-D) Grid Arrays/Functions
- Uses MPI
- Custom processor decomposition/load balancing
- Otherwise decomposes in the z, then y, then x directions
26. Parallelizing an Application Thorn
- All these calls are overloaded by infrastructure thorns:
- CCTK_SyncGroup
- synchronise ghostzones for a group of grid variables
- CCTK_Reduce
- call any registered reduction operator, e.g. maximum value over the grid
- call any registered interpolation operator
- CCTK_MyProc
- unique processor number within the computation
- CCTK_nProcs
- total number of processors
- CCTK_Barrier
- waits for all processors to reach this point
27. Building an Executable
- Compiling Cactus involves two stages
- creating a configuration
- compiling the source files to an executable
- Configuration
- Cactus can be compiled with different compilers, different compilation options, different lists of thorns, different external libraries (e.g. MPICH or LAM), and on different architectures
- To facilitate this, Cactus uses configurations, which store all the distinct information used to build a particular executable (Cactus/configs)
- Each configuration is given a unique name.
28. Configuration Options
- gmake MyConfig-config <options> (or an options file)
- Default options decided by autoconf
- Compiler and tool specification, e.g.
- F77=/weirdplace/pgf90
- Compilation and tool flags, e.g.
- CFLAGS=-save-temps
- DEBUG=ALL
- Library and include file specification
- Precision options, e.g.
- REAL_PRECISION=16
29. Configuring with External Packages
- Cactus currently knows about the external packages
- MPI (NATIVE, MPICH, LAM, WMPI, CUSTOM)
- HDF5
- GRACE
- and will search standard locations for them
- gmake MyConfig MPI=NATIVE
- gmake MyConfig MPI=MPICH MPICH_DEVICE=globus GLOBUS_LIB_DIR=/usr/local/globus/lib
- gmake MyConfig MPI=CUSTOM MPI_LIBS=mpi MPI_LIB_DIRS=/usr/lib MPI_INC_DIRS=/usr/include
30. Compile Options
- gmake MyConfig <options>
- Parallel build
- FJOBS=<n>
- TJOBS=<n>
- Compilation debugging
- SILENT=no
- Compiler warnings
- WARN=yes
31. Running Cactus
- ./exe/cactus_MyConfig MyParameterFile.par
- Additional command line options
- -h  help
- -Ov  details about all parameters
- -o <param>  details about one parameter
- -v  version number, compile date
- -T  list all thorns
- -t <thorn>  is thorn compiled?
- -r  redirect stdout
- -W <num>  reset warning level
- -E <num>  reset error level
32. Parameter Files
- Cactus runs from a user's parameter file, which
- chooses the thorns to be used for the run (so that inactive thorns can't do any damage)
- sets parameters which are different from their default values
- !desc "Demonstrates my new application"
- ActiveThorns = "PUGH WaveToyF77 Boundary CartGrid3D"
- driver::global_size = 30          # Change the grid size
- wavetoy::initial_data = "wave"    # Initial data
33. MetaComputing
- Scientists want easy access to available resources
- authentication, file systems, batch queues ...
- They also want access to many more resources
- Einstein equations require extreme memory and speed
- largest supercomputers too small
- want to access multiple supercomputers for large runs
- with AMR etc. will want to acquire resources dynamically during a simulation
- interactive visualization and steering of simulations from anywhere
34. MetaComputing Experiments
- SC93: remote CM-5 simulation with live viz in CAVE
- SC95: heroic I-Way experiments lead to the development of Globus; Cornell SP-2, Power Challenge, with live viz in San Diego CAVE
- SC97: Garching 512-node T3E launched, controlled, visualized in San Jose
- SC98: HPC Challenge; SDSC, ZIB and Garching T3Es compute the collision of 2 neutron stars, controlled from Orlando
- SC99: colliding black holes using Garching and ZIB T3Es, with remote collaborative interaction and viz at ANL and NCSA booths
- April/May 2000: attempting to use LANL, NCSA, NERSC, SDSC, ... ZIB, Garching for a single simulation
35. Grid-Enabled Cactus
- Collaboration between AEI, ANL, U. Chicago, N. Illinois U. to run a 512x512x2048 black hole collision
- Cactus + Globus/MPICH-G2
- Machines
- 1000-processor IBM SP2 at SDSC
- 512-processor T3E at NERSC
- 1500-processor Origin 2000 at NCSA
- 128-processor Origin 2000 at ANL
- possibly more
- Connected via high-speed networks
- Issues: different processor types, memories, operating systems, resource management, varied networks, bandwidths and latencies
36. Cactus + Globus
[Diagram: layered architecture]
- Cactus Application Thorns: distribution information hidden from the programmer; initial data, evolution, analysis, etc.
- Grid-Aware Application Thorns: drivers for parallelism, IO, communication, data mapping; PUGH: parallelism via MPI (MPICH-G2, a grid-enabled message passing library)
- Grid-Enabled Communication Library: MPICH-G2 implementation of MPI, can run MPI programs across heterogeneous computing resources
- Standard MPI
- Single Proc
37. Remote Steering/Visualization Architecture
38. Coming Up
- Thorns written in Java or Perl
- Cactus communication layer
- parallel driver thorn (e.g. PUGH) currently provides both variable management and communication
- abstract sends and receives etc.
- Abstract communication away from the driver thorn
- easily implement different parallel paradigms
- shared memory, threads, CORBA, OpenMP, PVM, ...
- Compact groups (different layout in memory for improved cache performance)
- Unstructured meshes/finite elements/spectral methods
- Unstructured multigrid solver
- Convergence/multiple coordinate patches
- Capability browsing mechanism
- Command line interface: connect directly to Cactus, scheduling
- GUIs, documentation, GUIs, documentation ...
39. www.CactusCode.org
- Documentation
- IEEE Computer December 1999
- Users' Guide
- Maintainers' Guide
- Download
- CVS distribution (stable and development versions)
- Development
- bugs and feature requests
- mailing lists (e.g. cactususers@cactuscode.org, flesh@cactuscode.org)
- Showcase
- presentations, publications, movies...
- News, and links to related institutions and software