Title: The Pencil Code: multi-purpose and multi-user maintained
1. The Pencil Code: multi-purpose and multi-user maintained
- Axel Brandenburg
- (Nordita, Copenhagen)
2. Overview
- Pencil formulation (advantages, headaches)
- Structure of the code, CVS maintenance
- High-order schemes, tests
- Peculiarities on big Linux clusters
- Online data processing
3. Pencil Code
- Started in Sept. 2001 with Wolfgang Dobler
- High order (6th order in space, 3rd order in time)
- Cache-memory efficient
- MPI; can run with PACX-MPI (across countries!)
- Maintained/developed by many people (CVS!)
- Automatic validation (overnight or any time)
- Max resolution so far 1024^3
4. Range of applications
- Isotropic turbulence
- MHD (Nils), passive scalar, cosmic rays
- Stratified layers
- Convection, radiative transport (Tobi)
- Shearing box
- MRI (Nils), planetesimals (Anders), interstellar (Tony)
- Sphere embedded in box
- Fully convective stars (Dobler), geodynamo (McMillan)
5. Pencil formulation
- In Cray days, worked with full chunks f(nx,ny,nz,nvar)
- Now, on SGI, nearly 100% cache misses
- Instead work with f(nx,nvar), i.e. one nx-pencil
- No cache misses, negligible work space, just 2N storage
- Communication before each sub-timestep
- Then evaluate all derivatives, e.g. call curl(f,iA,B)
- Vector potential A = f(:,:,:,iAx:iAz), B = B(nx,3)
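The pencil loop can be sketched in Python as follows. This is a toy under stated assumptions, not the Pencil Code's Fortran: it runs on a single processor, uses periodic wrap-around in place of the ghost zones that MPI communication fills before each sub-timestep, and the names der_6th and div_by_pencils are hypothetical. It computes div u on an n^3 grid one x-pencil at a time, so the only work array is a single pencil of length n.

```python
import math

# 6th-order central-difference weights for offsets 1, 2, 3 (first derivative,
# antisymmetric stencil, result divided by 60*dx).
C1 = (45.0, -9.0, 1.0)


def der_6th(get, i, dx):
    """First derivative at index i; get(j) returns the field at (periodic) index j."""
    s = sum(c * (get(i + k) - get(i - k)) for k, c in zip((1, 2, 3), C1))
    return s / (60.0 * dx)


def div_by_pencils(ux, uy, uz, n, dx):
    """Return div u on a periodic n^3 grid, one x-pencil (fixed iy, iz) at a time.

    Fields are nested lists indexed [iz][iy][ix]; periodic wrap-around stands in
    for the ghost zones that the real code fills by communication.
    """
    out = [[None] * n for _ in range(n)]
    for iz in range(n):
        for iy in range(n):
            pencil = [0.0] * n  # the only work array: one nx-pencil
            for ix in range(n):
                dudx = der_6th(lambda j: ux[iz][iy][j % n], ix, dx)
                dvdy = der_6th(lambda j: uy[iz][j % n][ix], iy, dx)
                dwdz = der_6th(lambda j: uz[j % n][iy][ix], iz, dx)
                pencil[ix] = dudx + dvdy + dwdz
            out[iz][iy] = pencil
    return out


if __name__ == "__main__":
    n = 16
    dx = 2 * math.pi / n
    x = [i * dx for i in range(n)]
    ux = [[[math.sin(x[ix]) for ix in range(n)] for _ in range(n)] for _ in range(n)]
    uy = [[[math.sin(x[iy]) for _ in range(n)] for iy in range(n)] for _ in range(n)]
    uz = [[[math.sin(x[iz]) for _ in range(n)] for _ in range(n)] for iz in range(n)]
    div = div_by_pencils(ux, uy, uz, n, dx)
    err = max(abs(div[iz][iy][ix] - math.cos(x[ix]) - math.cos(x[iy]) - math.cos(x[iz]))
              for iz in range(n) for iy in range(n) for ix in range(n))
    print(f"max error: {err:.2e}")
```

The stencils in y and z still read neighbouring pencils, but all writes go to one short array, which is what keeps the working set small enough for cache.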
6. A few headaches
- All operations must be combined
- curl(curl), max5(smooth(divu)) must be done in one go
- rms and max values for monitoring
- call max_name(b2,i_bmax,lsqrt=.true.)
- call sum_name(b2,i_brms,lsqrt=.true.)
- Similar routines for toroidal averages, etc.
- Online analysis (spectra, slices, vectors)
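The reason for the lsqrt flag can be illustrated with a small Python accumulator. This is a sketch of the idea only: the class and its methods mimic max_name and sum_name but are hypothetical, and the real routines also perform an MPI reduction across processors. Each pencil contributes its maximum or sum of b2; the square root is taken once, after all pencils are reduced, because the root of a sum is not the sum of roots.

```python
import math


class Diagnostics:
    """Hypothetical accumulator for per-pencil max and rms diagnostics."""

    def __init__(self):
        self.maxes = {}
        self.sums = {}
        self.counts = {}
        self.lsqrt = {}

    def max_name(self, values, name, lsqrt=False):
        # Keep the running maximum over all pencils seen so far.
        self.maxes[name] = max(self.maxes.get(name, -math.inf), max(values))
        self.lsqrt[name] = lsqrt

    def sum_name(self, values, name, lsqrt=False):
        # Keep the running sum and the number of points, for a mean later.
        self.sums[name] = self.sums.get(name, 0.0) + sum(values)
        self.counts[name] = self.counts.get(name, 0) + len(values)
        self.lsqrt[name] = lsqrt

    def finalize(self):
        """Reduce, then apply the square root where lsqrt was requested."""
        result = {}
        for name, m in self.maxes.items():
            result[name] = math.sqrt(m) if self.lsqrt[name] else m
        for name, s in self.sums.items():
            mean = s / self.counts[name]
            result[name] = math.sqrt(mean) if self.lsqrt[name] else mean
        return result


if __name__ == "__main__":
    diag = Diagnostics()
    for b2 in ([1.0, 4.0], [9.0, 16.0]):  # b2 = |B|^2 on two pencils
        diag.max_name(b2, "bmax", lsqrt=True)
        diag.sum_name(b2, "brms", lsqrt=True)
    print(diag.finalize())  # bmax = 4.0, brms = sqrt(7.5)
```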
7. CVS maintained
- pserver (password protected)
- Public (check-out only), private (ci/co, 20 people)
- Set of 10 test problems
- Nightly auto-test (different machines, web)
- Before checking in, run the auto-test yourself
- MPI module and nompi dummy module for single-processor machines (or use LAM/MPI on laptops)
8. Switch modules
- magnetic or nomagnetic (e.g. just hydro)
- hydro or nohydro (e.g. kinematic dynamo)
- density or nodensity (burgulence)
- entropy or noentropy (e.g. isothermal)
- radiation or noradiation (see Tobi's talk)
- dustvelocity or nodustvelocity (planetesimals)
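The module/no-module pairing can be sketched in Python: a full module and a dummy module expose the same public interface, and the rest of the code calls that interface without knowing which one was selected. All class and method names here are hypothetical, and the selection happens at runtime, whereas the Pencil Code makes this choice at build time. The helper missing_in_dummy illustrates the kind of mismatch that slide 9 lists as "upgrades forgotten on no-modules".

```python
import inspect


class Magnetic:
    """Full module (physics omitted): registers A and returns a Lorentz-force term."""

    def register_variables(self, names):
        names.extend(["ax", "ay", "az"])

    def daa_dt(self, pencil):
        return {"jxb": pencil.get("jxb", 0.0)}


class NoMagnetic:
    """Dummy module: same public interface, does nothing (e.g. pure hydro)."""

    def register_variables(self, names):
        pass

    def daa_dt(self, pencil):
        return {}


# Hypothetical switch table; the real code picks one implementation at build time.
MAGNETIC_MODULES = {"magnetic": Magnetic, "nomagnetic": NoMagnetic}


def public_interface(cls):
    """Names of the public methods defined on (or inherited by) cls."""
    return {name for name, _ in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")}


def missing_in_dummy(full, dummy):
    """Methods present in the full module but forgotten in its dummy counterpart."""
    return sorted(public_interface(full) - public_interface(dummy))


if __name__ == "__main__":
    for choice in ("magnetic", "nomagnetic"):
        names = ["ux", "uy", "uz"]
        MAGNETIC_MODULES[choice]().register_variables(names)
        print(choice, names)
    print(missing_in_dummy(Magnetic, NoMagnetic))  # [] when the pair is in sync
```

An automated check of this kind is one way to catch a new routine that was added to a module but not to its dummy.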
9. Features, problems
- Namelist (can freely introduce new params)
- Upgrades forgotten in no-modules (caught by auto-test)
- SGI namelist problem (see Pencil Code FAQs)
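Why namelists make new parameters cheap can be mimicked in Python. In Fortran the namelist read is done by the compiler runtime; this sketch only imitates its behaviour under simplifying assumptions (no arrays, no '/' inside strings, no repeat counts), and the default parameter names are illustrative. A new parameter only needs a default; input files that omit it still work, while a misspelled name is rejected.

```python
import re

# Illustrative defaults for one namelist group; adding a parameter means adding
# one entry here, and older input files without it keep working.
RUN_PARS_DEFAULTS = {"nu": 0.0, "eta": 0.0, "ivisc": "nu-const", "lshear": False}


def parse_value(raw):
    """Convert one namelist value: quoted string, logical, integer or real."""
    s = raw.strip()
    if s[0] in "'\"":
        return s[1:-1]
    low = s.lower()
    if low in (".true.", ".t.", "t"):
        return True
    if low in (".false.", ".f.", "f"):
        return False
    try:
        return int(s)
    except ValueError:
        return float(low.replace("d", "e"))  # Fortran 2.d-3 -> 2.e-3


def read_namelist(text, group, defaults):
    """Return defaults overridden by key=value pairs found in '&group ... /'.

    Unknown names raise KeyError, as a Fortran namelist read would fail.
    """
    params = dict(defaults)
    m = re.search(r"&" + re.escape(group) + r"\b(.*?)/", text, re.S | re.I)
    if m is None:
        return params
    pairs = re.findall(r"(\w+)\s*=\s*('[^']*'|\"[^\"]*\"|[^,\s/]+)", m.group(1))
    for key, raw in pairs:
        key = key.lower()
        if key not in params:
            raise KeyError(f"unknown parameter {key!r} in &{group}")
        params[key] = parse_value(raw)
    return params


if __name__ == "__main__":
    text = "&run_pars nu=1e-3, eta=2.d-3, ivisc='hyper3' /"
    print(read_namelist(text, "run_pars", RUN_PARS_DEFAULTS))
```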
10. Pencil Code check-ins
11. High-order schemes
- Alternative to spectral or compact schemes
- Efficiently parallelized
- No transpose necessary
- 6th order central differences in space
- Non-conservative scheme
- Allows use of logarithmic density and entropy
- Copes well with strong stratification and temperature contrasts
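A quick check of the 6th-order central differences is a convergence test on a periodic sine wave: halving dx should reduce the error by about 2^6 = 64. The Python sketch below uses the standard 7-point central stencils for the first and second derivative; the function names are hypothetical, and periodic wrap-around stands in for ghost zones.

```python
import math

# 6th-order central-difference weights for offsets 0..3.
D1 = (0.0, 45.0, -9.0, 1.0)        # first derivative (offset 0 unused), / (60*dx)
D2 = (-490.0, 270.0, -27.0, 2.0)   # second derivative, / (180*dx**2)


def der1(f, i, dx):
    """6th-order first derivative of the periodic sequence f at index i."""
    n = len(f)
    s = sum(D1[k] * (f[(i + k) % n] - f[(i - k) % n]) for k in (1, 2, 3))
    return s / (60.0 * dx)


def der2(f, i, dx):
    """6th-order second derivative of the periodic sequence f at index i."""
    n = len(f)
    s = D2[0] * f[i] + sum(D2[k] * (f[(i + k) % n] + f[(i - k) % n]) for k in (1, 2, 3))
    return s / (180.0 * dx * dx)


def max_error(n):
    """Max errors of der1 and der2 for f = sin(x) on n points over [0, 2*pi)."""
    dx = 2.0 * math.pi / n
    f = [math.sin(i * dx) for i in range(n)]
    e1 = max(abs(der1(f, i, dx) - math.cos(i * dx)) for i in range(n))
    e2 = max(abs(der2(f, i, dx) + math.sin(i * dx)) for i in range(n))
    return e1, e2


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(n, max_error(n))
```

The stencils only need data within three points in one direction, which is why no global transpose is required in a parallel run.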
12. High-order spatial schemes
Main advantage: low phase errors
13. Wavenumber characteristics
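Low phase errors can be quantified with the usual modified-wavenumber analysis (assumed here to be what the wavenumber-characteristics plot shows). Applied to exp(ikx), a central first-derivative stencil returns i k_eff exp(ikx), so a wave advected with this derivative travels at speed c k_eff/k instead of c. The sketch computes k_eff dx for the 2nd-, 4th- and 6th-order stencils and finds the largest fraction of the Nyquist wavenumber with a phase-speed error below 1%; the function names and the 1% threshold are illustrative choices.

```python
import math


def k_eff_dx(theta, order):
    """Modified wavenumber k_eff*dx of the central first-derivative stencil.

    theta = k*dx; an exact derivative would give k_eff*dx = theta.
    """
    if order == 2:
        return math.sin(theta)
    if order == 4:
        return (4.0 / 3.0) * math.sin(theta) - math.sin(2 * theta) / 6.0
    if order == 6:
        return (1.5 * math.sin(theta) - 0.3 * math.sin(2 * theta)
                + math.sin(3 * theta) / 30.0)
    raise ValueError("order must be 2, 4 or 6")


def resolved_fraction(order, tol=0.01, steps=10000):
    """Largest theta/pi (scanned upward from 0) with relative phase-speed error < tol."""
    best = 0.0
    for j in range(1, steps + 1):
        theta = math.pi * j / steps
        if 1.0 - k_eff_dx(theta, order) / theta < tol:
            best = j / steps
        else:
            break
    return best


if __name__ == "__main__":
    for order in (2, 4, 6):
        print(order, resolved_fraction(order))
```

For the 6th-order stencil the well-resolved range extends to roughly a third of the Nyquist wavenumber, against less than a tenth for 2nd order; all stencils give k_eff = 0 at the Nyquist wavenumber itself.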
14. Higher order, less viscosity
15. Less viscosity also in shocks
16. High-order temporal schemes
Main advantage: low amplitude errors
2N-RK3 scheme (Williamson 1980)
[Figure: 1st-, 2nd- and 3rd-order time-stepping schemes compared]
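The 2N-RK3 idea can be sketched in Python: each of the three substeps updates one auxiliary register w and the solution u, so an in-place implementation needs only two arrays (2N storage for N variables). The coefficients below are those usually quoted for Williamson's (1980) third-order low-storage scheme; the function names are hypothetical.

```python
import math

# Coefficients of Williamson's (1980) low-storage third-order Runge-Kutta
# scheme, as commonly quoted for 2N-RK3.
ALPHA = (0.0, -5.0 / 9.0, -153.0 / 128.0)
BETA = (1.0 / 3.0, 15.0 / 16.0, 8.0 / 15.0)


def rk3_2n_step(u, dt, rhs):
    """Advance the state u (a list of floats) by one full time step.

    Each substep does w <- alpha*w + dt*rhs(u), then u <- u + beta*w.
    Python builds new lists here; an in-place array version keeps only
    the two registers u and w.
    """
    w = [0.0] * len(u)
    for alpha, beta in zip(ALPHA, BETA):
        du = rhs(u)
        w = [alpha * wi + dt * di for wi, di in zip(w, du)]
        u = [ui + beta * wi for ui, wi in zip(u, w)]
    return u


def integrate(u0, t_end, dt, rhs):
    """Take round(t_end/dt) steps from u0 (rhs assumed independent of time)."""
    u = list(u0)
    for _ in range(round(t_end / dt)):
        u = rk3_2n_step(u, dt, rhs)
    return u


def decay(u):
    """Right-hand side of du/dt = -u, whose exact solution is exp(-t)."""
    return [-x for x in u]


if __name__ == "__main__":
    for dt in (0.1, 0.05):
        err = abs(integrate([1.0], 1.0, dt, decay)[0] - math.exp(-1.0))
        print(f"dt={dt}: error={err:.2e}")
```

Halving dt should reduce the error at fixed end time by about 2^3 = 8, confirming third order.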
17. Shock tube test
18. Hydromagnetic turbulence and subgrid-scale models?
- Want to shorten diffusive subrange
  - Waste of resources
- Want to prolong inertial range
  - Focus on essential physics
- Reasons to be worried about hyperviscosity
  - Shallower spectra
  - Wrong amplitudes of resulting large-scale fields
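Why hyperviscosity lengthens the inertial range follows from the damping rate of a Fourier mode: under the operator (-1)^(n-1) nu_n del^(2n), a mode of wavenumber k decays at rate nu_n k^(2n). The sketch matches nu_3 to ordinary viscosity at a hypothetical maximum wavenumber k_max and compares the damping at smaller k; the numbers are illustrative, and the sketch says nothing about the spectral side effects listed on this slide.

```python
def damping_rate(k, nu_n, n):
    """Decay rate of a Fourier mode under (-1)**(n-1) * nu_n * del**(2n)."""
    return nu_n * k ** (2 * n)


def matched_hyper_nu(nu1, k_max, n):
    """nu_n giving the same damping at k_max as ordinary viscosity nu1."""
    return nu1 * k_max ** 2 / k_max ** (2 * n)


if __name__ == "__main__":
    nu1, k_max = 1e-4, 512  # illustrative values
    nu3 = matched_hyper_nu(nu1, k_max, 3)
    for k in (32, 128, 512):
        ratio = damping_rate(k, nu3, 3) / damping_rate(k, nu1, 1)
        print(f"k={k:4d}: hyper/normal damping = {ratio:.3g}")
```

At k_max/4 the matched hyperviscosity damps 4^4 = 256 times more weakly than ordinary viscosity, so dissipation is confined to a narrower band near the grid scale.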
19. Simulations at 512^3
Biskamp & Müller (2000)
[Figure panels: normal diffusivity; with hyperdiffusivity]
20. 256-processor run at 1024^3
21. MHD equations
Magnetic vector potential
Induction equation
Momentum and continuity equations
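The equations themselves did not survive extraction. For reference, a common isothermal form of the system in the variables the Pencil Code works with (vector potential A, velocity u, logarithmic density ln ρ) is given below. This is an assumption about what the slide showed: Weyl gauge, constant η and sound speed c_s, a viscous force F_visc and a forcing term f, with D/Dt = ∂/∂t + u·∇.

```latex
\frac{\partial \mathbf{A}}{\partial t} = \mathbf{u}\times\mathbf{B} - \eta\mu_0\mathbf{J},
\qquad \mathbf{B} = \nabla\times\mathbf{A},
\qquad \mu_0\mathbf{J} = \nabla\times\mathbf{B},

\frac{D\mathbf{u}}{Dt} = -c_s^2\nabla\ln\rho + \frac{\mathbf{J}\times\mathbf{B}}{\rho}
  + \mathbf{F}_\mathrm{visc} + \mathbf{f},

\frac{D\ln\rho}{Dt} = -\nabla\cdot\mathbf{u}.
```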
22. Vector potential
- B = curl A; advantage: div B = 0
- J = curl B = curl(curl A) = curl^2 A
- Not a disadvantage: consider Alfvén waves
B-formulation
A-formulation
A 2nd derivative taken once is better than a 1st derivative taken twice!
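The point about derivatives can be checked with a modified-wavenumber analysis (θ = k dx). The 6th-order first-derivative stencil returns zero for the Nyquist mode (θ = π), so applying it twice has no effect on grid-scale oscillations, while the direct 6th-order second-derivative stencil still acts on them. The function names are hypothetical; the stencils are the standard 7-point central ones.

```python
import math


def first_twice(theta):
    """k_eff^2 dx^2 when the 6th-order first-derivative stencil is applied twice."""
    k = 1.5 * math.sin(theta) - 0.3 * math.sin(2 * theta) + math.sin(3 * theta) / 30.0
    return k * k


def second_once(theta):
    """k_eff^2 dx^2 for the direct 6th-order second-derivative stencil
    (2, -27, 270, -490, 270, -27, 2) / 180."""
    return (490.0 - 540.0 * math.cos(theta) + 54.0 * math.cos(2 * theta)
            - 4.0 * math.cos(3 * theta)) / 180.0


if __name__ == "__main__":
    for frac in (0.25, 0.5, 0.75, 1.0):
        theta = frac * math.pi
        print(f"theta={frac:.2f}*pi  exact={theta**2:.3f}  "
              f"twice={first_twice(theta):.3f}  once={second_once(theta):.3f}")
```

At θ = π the composed first derivative gives 0, while the direct second derivative gives 1088/180 ≈ 6.04 (exact value π² ≈ 9.87), so diffusion written with a true second derivative still damps the shortest waves.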
23. Wall-clock time versus number of processors
24. Sensitivity to layout on Linux clusters
Gigabit uplink
100 Mbit link only
- yproc x zproc
- 4 x 32 -> 1 (reference speed)
- 8 x 16 -> 3 times slower
- 16 x 8 -> 17 times slower
24 processors per hub
25. Why this sensitivity to layout?
[Diagram: 16 processors per row, so a group of 24 on one hub spans one and a half rows]
All processors need to communicate with processors outside their group of 24
26. Use exactly 4 columns
Only 2 x 4 = 8 processors need to communicate outside the group of 24 -> optimal use of the speed ratio between the 100 Mbit Ethernet switch and the 1 Gbit uplink
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
16 17 18 19
20 21 22 23
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
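The counting argument of slides 25 and 26 can be reproduced in a few lines of Python. Assumptions not taken from the slides: rank = iy + nprocy*iz with y varying fastest (as in the 4-column diagram), periodic neighbours in y and z, and consecutive blocks of 24 ranks per hub. The hypothetical function counts, for the first hub, how many processors have at least one neighbour on another hub; it does not model the measured slowdowns.

```python
def outside_talkers(nprocy, nprocz, hub_size=24):
    """Processors on the first hub with a y- or z-neighbour on another hub.

    Hypothetical layout: rank = iy + nprocy*iz (y varies fastest), periodic
    neighbours in y and z, and consecutive blocks of hub_size ranks per hub.
    """
    def rank(iy, iz):
        return (iy % nprocy) + nprocy * (iz % nprocz)

    count = 0
    for r in range(min(hub_size, nprocy * nprocz)):
        iy, iz = r % nprocy, r // nprocy
        neighbours = (rank(iy + 1, iz), rank(iy - 1, iz),
                      rank(iy, iz + 1), rank(iy, iz - 1))
        if any(nb // hub_size != r // hub_size for nb in neighbours):
            count += 1
    return count


if __name__ == "__main__":
    for ny, nz in ((4, 32), (8, 16), (16, 8)):
        print(f"{ny} x {nz}: {outside_talkers(ny, nz)} of 24 talk outside the hub")
```

It returns 8 for 4 x 32, 16 for 8 x 16 and 24 for 16 x 8: with 4 columns a hub holds exactly 6 complete rows, so only its first and last rows talk across the uplink.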
27. Fragmentation over many switches
28. Animation of uz
29. Animation of B vectors
30. Animation of B vectors
31. Animation of energy spectra
32. Saturation behavior explained by magnetic helicity conservation
Steady state, closed box
Small-scale and large-scale current helicity in balance
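The slide's equation is missing; the standard argument (assumed here) runs as follows for a closed or periodic box with uniform η. Overbars denote mean (large-scale) fields and lower-case letters the fluctuations, whose cross terms vanish on averaging; k_1 and k_f are the wavenumbers of the large and small scales, both assumed fully helical with opposite signs.

```latex
\frac{d}{dt}\langle\mathbf{A}\cdot\mathbf{B}\rangle
  = -2\eta\mu_0\langle\mathbf{J}\cdot\mathbf{B}\rangle
\;\overset{\text{steady}}{\Longrightarrow}\;
\langle\overline{\mathbf{J}}\cdot\overline{\mathbf{B}}\rangle
  + \langle\mathbf{j}\cdot\mathbf{b}\rangle = 0,

\mu_0\langle\overline{\mathbf{J}}\cdot\overline{\mathbf{B}}\rangle \approx -k_1\langle\overline{\mathbf{B}}^2\rangle,
\quad
\mu_0\langle\mathbf{j}\cdot\mathbf{b}\rangle \approx k_f\langle\mathbf{b}^2\rangle
\;\Longrightarrow\;
\frac{\langle\overline{\mathbf{B}}^2\rangle}{\langle\mathbf{b}^2\rangle} \approx \frac{k_f}{k_1}.
```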
33. With hyperdiffusivity
for ordinary hyperdiffusion
34. Conclusions
- Subgrid-scale modeling can be unsafe (some problems)
  - shallower spectra, longer time scales, different saturation amplitudes
- High-order schemes
  - Low phase and amplitude errors
  - Need less viscosity
- 100 Mbit link close to bandwidth limit
  - Comparable to Origin
  - 2x faster with Gbit switch
  - 100 Mbit switches with Gbit uplink optimal