Title: Cactus 4.0
1 Cactus 4.0
2 Cactus Computational Toolkit and Distributed Computing
- Solving Einstein's Equations
- Impact on computation
- Large collaborations essential, and difficult!
- Code becomes the collaborating tool
- Cactus, a new community code for 3D GR-Astrophysics
- Toolkit for many PDE systems
- Suite of solvers for the Einstein system
- Metacomputing for the general user
- Distributed computing experiments with Cactus and Globus

Gabrielle Allen, Ed Seidel
Albert-Einstein-Institut, MPI-Gravitationsphysik
3 Einstein's Equations and Gravitational Waves
- Einstein's General Relativity
  - Fundamental theory of physics (gravity)
  - Black holes, neutron stars, gravitational waves, ...
  - Among the most complex equations of physics: dozens of coupled, nonlinear hyperbolic-elliptic equations with 1000s of terms (the compact form is written out below)
- New field: Gravitational Wave Astronomy
  - Will yield new information about the Universe
- What are gravitational waves? Ripples in the curvature of spacetime
- A last major test of Einstein's theory: do they exist?
  - Eddington: "Gravitational waves propagate at the speed of thought"
  - 1993 Nobel Prize Committee: Hulse-Taylor Pulsar (indirect evidence)
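For reference, the field equations behind both Grand Challenge codes later in this deck can be written compactly (in geometrized units, $G = c = 1$):

$$G_{\mu\nu} \;\equiv\; R_{\mu\nu} - \tfrac{1}{2}\,R\,g_{\mu\nu} \;=\; 8\pi\,T_{\mu\nu},$$

with $T_{\mu\nu} = 0$ for vacuum black-hole spacetimes. The compactness is deceptive: expanded in a 3+1 coordinate form suitable for finite differencing, these ten coupled equations produce the thousands of terms referred to above.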
4 Detecting Gravitational Waves
- LIGO, VIRGO (Pisa), GEO600: $1 billion worldwide
- We need results from numerical relativity to
  - Detect them: pattern matching against numerical templates to enhance the signal/noise ratio
  - Understand them: just what are the waves telling us?
(Image: LIGO Hanford, Washington site; 4 km arms)
5 Merger Waveform Must Be Found Numerically
6 Axisymmetric Black Hole Simulations (Cray C90)
- Collision of two Black Holes (Misner Data)
- Evolution of a Highly Distorted Black Hole
7 Computational Needs for 3D Numerical Relativity
- Finite difference codes
  - 10^4 Flops/zone/time step
  - 100 3D arrays
- Currently use 250^3 zones
  - 15 GBytes
  - 15 TFlops/time step (see the check below)
- Need 1000^3 zones
  - 1000 GBytes
  - 1000 TFlops/time step
- Need a TFlop, TByte machine
- Need parallel AMR and I/O
(Figure: evolution from t=0 to t=100)
- Initial data: 4 coupled nonlinear elliptic equations
- Time step update
  - explicit hyperbolic update
  - also solve elliptic equations
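A quick check of those numbers (assuming 8-byte double-precision values; the quoted 15 GBytes presumably also covers ghost zones and work arrays):

$$100 \text{ arrays} \times 250^3 \text{ zones} \times 8 \text{ bytes} \approx 12.5\ \text{GBytes},$$

and going from $250^3$ to $1000^3$ zones multiplies both memory and work per step by $(1000/250)^3 = 64$, which is how 15 GBytes and 15 TFlops per step become roughly 1000 of each.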
8 Mix of Varied Technologies and Expertise!
- Scientific/Engineering
  - Formulation of equations, equation of state, astrophysics, hydrodynamics, ...
- Numerical Algorithms
  - Finite differences? Finite elements? Structured meshes?
  - Hyperbolic equations: explicit vs implicit, shock treatments, dozens of methods (and presently nothing is fully satisfactory!)
  - Elliptic equations: multigrid, Krylov subspace, spectral, preconditioners (the elliptic solves currently require most of the time)
  - Mesh refinement?
- Computer Science
  - Parallelism (HPF, MPI, PVM, ???)
  - Architecture efficiency (MPP, DSM, Vector, NOW, ???)
  - I/O bottlenecks (gigabytes generated per simulation, checkpointing)
  - Visualization of all that comes out!
9
- Clearly need huge teams, with a huge expertise base, to attack such problems; in fact, need collections of communities
- But how can they work together effectively?
- Need a code environment that encourages this
10 NSF Black Hole Grand Challenge Alliance
- University of Texas (Matzner, Browne)
- NCSA/Illinois/AEI (Seidel, Saylor, Smarr, Shapiro, Saied)
- North Carolina (Evans, York)
- Syracuse (G. Fox)
- Cornell (Teukolsky)
- Pittsburgh (Winicour)
- Penn State (Laguna, Finn)
Develop Code To Solve $G_{\mu\nu} = 0$
11 NASA Neutron Star Grand Challenge
A Multipurpose Scalable Code for Relativistic Astrophysics
- NCSA/Illinois/AEI (Saylor, Seidel, Swesty, Norman)
- Argonne (Foster)
- Washington U (Suen)
- Livermore (Ashby)
- Stony Brook (Lattimer)
Develop Code To Solve $G_{\mu\nu} = 8\pi T_{\mu\nu}$
12 What we learn from Grand Challenges
- Successful, but also problematic
- No existing infrastructure to support collaborative HPC
- Many scientists are Fortran programmers, and NOT computer scientists
- Many sociological issues of large collaborations and different cultures
- Many language barriers
  - Applied mathematicians, computational scientists and physicists have very different concepts and vocabularies
  - Code fragments, styles and routines often clash
  - Successfully merged code (after years) often impossible to transplant into more modern infrastructure (e.g., add AMR or switch to MPI)
- Many serious problems: this is what the Cactus Code seeks to address
13 What Is Cactus?
- Cactus was developed as a general computational framework for solving PDEs (originally in numerical relativity and astrophysics)
- Modular: for easy development, maintenance and collaboration; users supply "thorns" which plug into a compact core "flesh"
- Configurable: thorns register parameter, variable and scheduling information with the runtime function registry (RFR); object-oriented-inspired features
- Scientist friendly: thorns written in F77, F90, C, C++
- Accessible parallelism: the driver layer (a thorn) is hidden from physics thorns by a fixed flesh interface (see the sketch below)
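To make the flesh/thorn split concrete, here is a minimal sketch of what a physics thorn routine can look like in C: an explicit update step for the scalar wave equation (in the spirit of the WaveToyF77/WaveToyF90 thorns shown later), written only against the grid variables and local loop bounds the flesh hands it. The routine name, the grid-function names phi/phi_p/phi_p_p, and the macros follow the conventions of the public Cactus 4 releases; take them as assumptions rather than the exact beta-4.0 API.

```c
#include "cctk.h"
#include "cctk_Arguments.h"

/* One explicit leapfrog step for the 3D scalar wave equation.
 * The flesh passes in the local grid size (cctk_lsh), the grid spacing
 * and time step, and the grid functions phi (new level), phi_p and
 * phi_p_p (previous levels) declared by this thorn.  The parallel
 * driver thorn synchronises ghost zones afterwards, so this routine
 * contains no MPI calls at all. */
void WaveToyC_Evolution(CCTK_ARGUMENTS)
{
  DECLARE_CCTK_ARGUMENTS;

  const int nx = cctk_lsh[0], ny = cctk_lsh[1], nz = cctk_lsh[2];
  const CCTK_REAL dx  = CCTK_DELTA_SPACE(0);   /* assume uniform spacing */
  const CCTK_REAL dt  = CCTK_DELTA_TIME;
  const CCTK_REAL fac = dt*dt/(dx*dx);

  for (int k = 1; k < nz-1; k++)
    for (int j = 1; j < ny-1; j++)
      for (int i = 1; i < nx-1; i++)
      {
        const int idx = CCTK_GFINDEX3D(cctkGH, i, j, k);
        /* standard second-order centred Laplacian of the old level */
        const CCTK_REAL lap =
            phi_p[CCTK_GFINDEX3D(cctkGH, i+1, j, k)]
          + phi_p[CCTK_GFINDEX3D(cctkGH, i-1, j, k)]
          + phi_p[CCTK_GFINDEX3D(cctkGH, i, j+1, k)]
          + phi_p[CCTK_GFINDEX3D(cctkGH, i, j-1, k)]
          + phi_p[CCTK_GFINDEX3D(cctkGH, i, j, k+1)]
          + phi_p[CCTK_GFINDEX3D(cctkGH, i, j, k-1)]
          - 6.0*phi_p[idx];
        /* leapfrog update: phi^{n+1} = 2 phi^n - phi^{n-1} + fac*lap */
        phi[idx] = 2.0*phi_p[idx] - phi_p_p[idx] + fac*lap;
      }
}
```

The same thorn would also declare phi (with three time levels), its parameters and its schedule entry in its configuration files, and a driver thorn such as PUGH would exchange the ghost zones between processors after each step.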
14 What Is Cactus?
- Standard interfaces: interpolation, reduction, IO, coordinates; the actual routines are supplied by thorns
- Portable: Cray T3E, Origin, NT/Win9x, Linux, O2, DEC Alpha, Exemplar, SP2
- Free and open: community code distributed under the GNU GPL; uses as much free software as possible
- Up-to-date: new computational developments and/or thorns immediately available to users (optimisations, AMR, Globus, IO)
- Collaborative: the thorn structure makes it possible for a large number of people to use and develop toolkits; the code becomes the collaborating tool
- New version: Cactus beta-4.0 released 30th August
15 Core Thorn Arrangements Provide Tools
- Parallel drivers (presently MPI-based)
- (Mesh refinement schemes: Nested Boxes, DAGH, HLL)
- Parallel I/O for output, file reading and checkpointing (HDF5, FlexIO, Panda, etc.)
- Elliptic solvers (PETSc, multigrid, SOR, etc.)
- Interpolators
- Visualization tools (IsoSurfacer)
- Coordinates and boundary conditions
- Many relativity thorns
- Groups develop their own thorn arrangements to add to these
16 Cactus 4.0
(Diagram: the FLESH (parameters, variables, scheduling) at the centre, with thorns plugged into it: Boundary, CartGrid3D, WaveToyF77, WaveToyF90, PUGH, GrACE, IOFlexIO, IOHDF5)
17 Current Status
- It works: many people, with different backgrounds, different personalities, on different continents, working together effectively on problems of common interest
- Dozens of physics/astrophysics and computational modules developed and shared by the seed community
- Connected modules work together, largely without collisions
- Test suites used to ensure the integrity of both code and physics
- How to get it
  - Workshop 27 Sept - 1 Oct, NCSA
  - http://www.ncsa.uiuc.edu/SCD/Training/
(Movie from Werner Benger, ZIB)
18 Near Perfect Scaling
- Excellent scaling on many architectures
  - Origin: up to 128 processors
  - T3E: up to 1024 processors
  - NCSA NT cluster: up to 128 processors
- Achieved 142 Gflop/s on a 1024-node T3E-1200 (benchmarked for the NASA NS Grand Challenge)
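For scale (taking the T3E-1200's nominal 1.2 Gflop/s peak per PE, which is where the "1200" comes from):

$$\frac{142\ \text{Gflop/s}}{1024\ \text{PEs}} \approx 0.14\ \text{Gflop/s per PE} \approx 12\%\ \text{of peak per PE}.$$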
19 Many Developers: Physics and Computational Science
20 Metacomputing: harnessing power when and where it is needed
- Easy access to available resources
  - Find resources for interactive use: Garching? ZIB? NCSA? SDSC?
  - Do I have an account there? What's the password?
  - How do I get the executable there?
  - Where to store data?
  - How to launch the simulation? What are the local queue structures/OS idiosyncrasies?
21 Metacomputing: harnessing power when and where it is needed
- Access to more resources
  - Einstein equations require extreme memory and speed
  - Largest supercomputers too small!
  - Networks very fast!
  - DFN gigabit testbed: 622 Mbit/s Potsdam-Berlin-Garching, connecting multiple supercomputers
  - Gigabit networking to the US possible
  - Connect workstations to make a supercomputer
22 Metacomputing: harnessing power when and where it is needed
- Acquire resources dynamically during simulation!
  - Need more resolution in one area
- Interactive visualization, monitoring and steering from anywhere
  - Watch the simulation as it progresses: live visualisation
  - Limited bandwidth: compute visualisation online with the simulation
  - High bandwidth: ship data to be visualised locally
- Interactive steering
  - Are parameters screwed up? Very complex?
  - Is memory running low? AMR! What to do? Refine selectively, acquire additional resources via Globus, or delete unnecessary grids?
23 Metacomputing: harnessing power when and where it is needed
- Call up an expert colleague: let her watch it too
  - Sharing data space
  - Remote collaboration tools
  - Visualization server: all privileged users can log in and check status/adjust if necessary
24 Globus can provide many such services for Cactus
- Information (Metacomputing Directory Service, MDS)
  - Uniform access to structure/state information
  - Where can I run Cactus today?
- Scheduling (Globus Resource Allocation Manager, GRAM)
  - Low-level scheduler API
  - How do I schedule Cactus to run at NCSA? (see the sketch after this list)
- Communications (Nexus)
  - Multimethod communication, QoS management
  - How do I connect Garching and ZIB together for a big run?
- Security (Globus Security Infrastructure)
  - Single sign-on, key management
  - How do I get authority at SDSC for Cactus?
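As a flavour of what "schedule Cactus to run at NCSA" looks like through the GRAM client API, here is a hypothetical sketch: the gatekeeper contact string, the executable path and the RSL values are invented for illustration, and the exact function signatures may differ between Globus releases of this era.

```c
#include <stdio.h>
#include "globus_common.h"
#include "globus_gram_client.h"

/* Hypothetical sketch: submit a Cactus executable to a remote GRAM
 * gatekeeper.  Contact string and paths are made up; the RSL attributes
 * (executable, count, jobtype) are the standard GRAM ones. */
int main(void)
{
    char *contact = "modi4.ncsa.uiuc.edu/jobmanager";   /* assumed gatekeeper */
    char *rsl =
        "&(executable=/u/someuser/Cactus/exe/cactus_wave)"
        "(count=64)(jobtype=mpi)";
    char *job_contact = NULL;

    globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);

    /* No callbacks: zero state mask and NULL callback contact. */
    int rc = globus_gram_client_job_request(contact, rsl, 0, NULL,
                                            &job_contact);
    if (rc == GLOBUS_SUCCESS)
        printf("Job submitted, contact: %s\n", job_contact);
    else
        printf("Submission failed: %s\n",
               globus_gram_client_error_string(rc));

    globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
    return rc;
}
```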
25 Globus can provide many such services for Cactus
- Health and status (Heartbeat Monitor)
  - Is my Cactus run dead?
- Remote file access (Global Access to Secondary Storage, GASS)
  - How do I manage my output, and how do I get the executable to Argonne?
26 Colliding Black Holes and Metacomputing: German project supported by DFN-Verein
- Solving Einstein's Equations
- Developing techniques to exploit high-speed networks
  - Remote visualization
  - Distributed computing across OC-12 networks between AEI (Potsdam), Konrad-Zuse-Institut (Berlin), and RZG (Garching bei München)
27 Distributing Spacetime: SC97 Intercontinental Metacomputing at AEI/Argonne/Garching/NCSA
(Images: Immersadesk; 512-node T3E)
28 Metacomputing the Einstein Equations: Connecting T3Es in Berlin, Garching, San Diego
29 Collaborators
- A distributed astrophysical simulation involving the following institutions:
  - Albert Einstein Institute (Potsdam, Germany)
  - Washington University (St. Louis, MO)
  - Argonne National Laboratory (Chicago, IL)
  - NLANR Distributed Applications Team (Champaign, IL)
- And the following supercomputer centers:
  - San Diego Supercomputer Center (268 proc. T3E)
  - Konrad-Zuse-Zentrum in Berlin (232 proc. T3E)
  - Max-Planck-Institute in Garching (768 proc. T3E)
30 The Grand Plan
- Distribute the simulation across 128 PEs of the SDSC T3E and 128 PEs of the Konrad-Zuse-Zentrum T3E in Berlin, using Globus
- Visualize isosurface data in real time on an Immersadesk in Orlando
- Transatlantic bandwidth from an OC-3 ATM network
(Map: San Diego to Berlin)
31 SC98: Neutron Star Collision
(Movie from Werner Benger, ZIB)
32 Cactus scaling across PEs (Jason Novotny, NLANR)
33 Analysis of metacomputing experiments
- It works! (That's the main thing we wanted at SC98)
- Cactus not optimized for metacomputing: messages too small, lower MPI bandwidth, could be better (see the arithmetic after this list)
- ANL-NCSA
  - Measured bandwidth: 17 Kbit/s (small messages) to 25 Mbit/s (large messages)
  - Latency: 4 ms
- Munich-Berlin
  - Measured bandwidth: 1.5 Kbit/s (small messages) to 4.2 Mbit/s (large messages)
  - Latency: 42.5 ms
- Within a single machine: an order of magnitude better
- Bottom line
  - Expect to be able to improve performance significantly
  - Can run much larger jobs on multiple machines
  - Start using Globus routinely for job submission
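The small-message numbers are mostly latency in disguise: the effective bandwidth of a message of $S$ bits over a link with latency $T_{\text{lat}}$ and raw bandwidth $B$ is roughly

$$B_{\text{eff}}(S) \;=\; \frac{S}{T_{\text{lat}} + S/B} \;\approx\; \frac{S}{T_{\text{lat}}} \quad \text{for small } S .$$

Taking, purely for illustration, an 8-byte (64-bit) message: $64\,\text{bit}/42.5\,\text{ms} \approx 1.5\,\text{kbit/s}$ on the Munich-Berlin link and $64\,\text{bit}/4\,\text{ms} \approx 16\,\text{kbit/s}$ on ANL-NCSA, consistent with the measured small-message figures; only large messages see anything close to the raw link bandwidth.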
34 The Dream: not far away...
(Diagram: a budding Einstein in Berlin composes physics modules (e.g. BH initial data) with the Cactus/Einstein solver and its infrastructure (MPI, MG, AMR, DAGH, Viz, I/O, ...); the Globus Resource Manager then dispatches the run to mass storage and to whatever machine is available: an "Ultra 3000 Whatever-Wherever", the Garching T3E, or the NCSA Origin 2000 array)
35 Cactus 4.0 Credits
- Cactus flesh and design
  - Gabrielle Allen
  - Tom Goodale
  - Joan Massó
  - Paul Walker
- Computational toolkit
  - Flesh authors
  - Gerd Lanferman
  - Thomas Radke
  - John Shalf
- Development toolkit
  - Bernd Bruegmann
  - Manish Parashar
  - Many others
- Relativity and astrophysics
  - Flesh authors
  - Miguel Alcubierre
  - Toni Arbona
  - Carles Bona
  - Steve Brandt
  - Bernd Bruegmann
  - Thomas Dramlitsch
  - Ed Evans
  - Carsten Gundlach
  - Gerd Lanferman
  - Lars Nerger
  - Mark Miller
  - Hisaaki Shinkai
  - Ryoji Takahashi
  - Malcolm Tobias
- Vision and Motivation
  - Bernard Schutz
  - Ed Seidel "the Evangelist"
  - Wai-Mo Suen