Title: Terascaling Applications on HPCx: The First 12 Months
Slide 1: Terascaling Applications on HPCx: The First 12 Months
- Mike Ashworth
- HPCx Terascaling Team
- HPCx Service
- CCLRC Daresbury Laboratory
- UK
- m.ashworth@dl.ac.uk
- http://www.hpcx.ac.uk/
Slide 2: Outline
- Terascaling Objectives
- Case Studies
  - DL_POLY
  - CRYSTAL
  - CASTEP
  - AMBER
  - PFARM
  - PCHAN
  - POLCOMS
- Efficiency of Codes
- Summary
Application-driven, not hardware-driven
Slide 4: Terascaling Objectives
- The primary aim of the HPCx service is Capability Computing: jobs which use > 50% of the CPUs
- Key objective: user codes should scale to O(1000) CPUs
- The largest part of our science support is the Terascaling Team
  - Understanding performance and scaling of key codes
  - Enabling world-leading calculations (demonstrators)
  - Closely linked with the Software Engineering Team and the Applications Support Team
Slide 5: Strategy for Capability Computing
- Performance Attributes of Key Applications
- Trouble-shooting with Vampir and Paraver
- Scalability of Numerical Algorithms
  - Parallel eigensolvers, FFTs, etc.
- Optimisation of Communication Collectives
  - e.g., MPI_ALLTOALLV in CASTEP
- New Techniques
  - Mixed-mode programming (sketched after this slide)
  - Memory-driven Approaches
    - e.g., in-core SCF DFT, direct minimisation in CRYSTAL
  - Migration from replicated to distributed data
    - e.g., DL_POLY3
- Scientific Drivers amenable to Capability Computing
  - Enhanced Sampling Methods, Replica Methods
HPCx Terascaling Team
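The mixed-mode programming named above combines MPI between nodes with OpenMP threads inside a shared-memory partition. A minimal sketch in C, written for this summary rather than taken from any code named on the slide:

/* Minimal mixed-mode (MPI + OpenMP) sketch: MPI ranks across
 * nodes, OpenMP threads within a shared-memory partition.
 * Illustrative only -- the slides show no actual code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for an MPI library that tolerates threaded regions */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;

    /* Thread-parallel work inside each MPI rank */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i);   /* stand-in for real work */

    /* Communicate only between ranks, funneled through the
     * master thread outside the parallel region */
    double global;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%d ranks x %d threads, sum = %f\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}

The attraction on a machine like HPCx is fewer MPI ranks per p690 node, with the threads exploiting the shared memory inside each partition.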
Slide 7: Molecular Simulation
- DL_POLY
  - W. Smith and T.R. Forester, CLRC Daresbury Laboratory
  - General-purpose molecular dynamics simulation package
  - http://www.cse.clrc.ac.uk/msi/software/DL_POLY/
Slide 8: DL_POLY3 Coulomb Energy Performance
- Distributed Data
- SPME (smooth particle-mesh Ewald), with a revised FFT scheme; the reciprocal-space term involved is shown at the end of this slide
[Chart: DL_POLY3 performance relative to the Cray T3E/1200E vs. number of CPUs; 216,000 ions, 200 time steps, cutoff 12 Å]
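For background, the reciprocal-space term that SPME evaluates on a mesh with 3D FFTs is the standard Ewald expression (Gaussian units; textbook material, not taken from the slides):

  E_\mathrm{recip} = \frac{2\pi}{V} \sum_{\mathbf{k}\neq 0} \frac{e^{-k^2/4\alpha^2}}{k^2} \left| \sum_{j=1}^{N} q_j \, e^{i\mathbf{k}\cdot\mathbf{r}_j} \right|^2

where V is the cell volume, \alpha the Ewald splitting parameter, and q_j, \mathbf{r}_j the ionic charges and positions; distributing this FFT-based sum across processors is what the revised scheme addresses.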
Slide 9: DL_POLY3 Macromolecular Simulations
[Charts: Gramicidin in water (rigid bonds, SHAKE), 792,960 ions, 50 time steps; measured time (seconds) and performance relative to the SGI Origin 3800/R14k-500, each vs. number of CPUs]
Slide 10: Materials Science
- CRYSTAL
  - Calculates wave-functions and properties of crystalline systems
  - Periodic Hartree-Fock or density functional Kohn-Sham Hamiltonian
  - Various hybrid approximations
  - http://www.cse.clrc.ac.uk/cmg/CRYSTAL/
Slide 11: CRYSTAL
- Electronic structure and related properties of periodic systems
- All-electron, local Gaussian basis set, DFT and Hartree-Fock
- Under continuous development since 1974
- Distributed to over 500 sites worldwide
- Developed jointly by Daresbury and the University of Turin
Slide 12: CRYSTAL Functionality
- Basis Set
  - LCAO (Gaussians)
  - All-electron or pseudopotential
- Hamiltonian
  - Hartree-Fock (UHF, RHF)
  - DFT (LSDA, GGA)
  - Hybrid functionals (B3LYP)
- Techniques
  - Replicated-data parallel
  - Distributed-data parallel
  - Forces
  - Structural optimization
  - Direct SCF
- Visualisation
  - AVS GUI (DLV)
- Properties
  - Energy, structure
  - Vibrations (phonons)
  - Elastic tensor
  - Ferroelectric polarisation
  - Piezoelectric constants
  - X-ray structure factors
  - Density of states / bands
  - Charge/spin densities
  - Magnetic coupling
  - Electrostatics (V, E, EFG classical)
  - Fermi contact (NMR)
  - EMD (Compton, e-2e)
Slide 13: Benchmark Runs on Crambin
- Very small protein from Crambe abyssinica, 1284 atoms per unit cell
- Initial studies using STO-3G (3948 basis functions)
- Improved to 6-31G (12354 functions)
- All calculations Hartree-Fock
- As far as we know, the largest Hartree-Fock calculation ever converged
Slide 14: Scalability of CRYSTAL for Crystalline Crambin
[Chart: CRYSTAL scalability for crystalline crambin, HPCx vs. SGI Origin]
- A faster, more stable version of the parallel Jacobi diagonalizer replaces ScaLAPACK
- Increasing the basis set size increases the scalability
Slide 15: Crambin Results: Electrostatic Potential
- Charge density isosurface coloured according to potential
- Useful to determine possible chemically active groups
Slide 16: Futures: Rusticyanin
- Rusticyanin (from Thiobacillus ferrooxidans) has 6284 atoms (crambin had 1284) and is involved in redox processes
- We have just started calculations using over 33,000 basis functions
- In collaboration with S. Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants
Slide 17: Materials Science
- CASTEP
  - CAmbridge Serial Total Energy Package
  - http://www.cse.clrc.ac.uk/cmg/NETWORKS/UKCP/
Slide 18: What is CASTEP?
- First-principles (DFT) materials simulation code
  - electronic energy
  - geometry optimization
  - surface interactions
  - vibrational spectra
  - materials under pressure, chemical reactions
  - molecular dynamics
- Method: direct minimization (the standard expansion is given below)
  - plane wave expansion of valence electrons
  - pseudopotentials for core electrons
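For reference, the plane-wave expansion named above is the standard one (textbook plane-wave DFT, not a CASTEP-specific formula): each valence orbital is written as

  \psi_{n\mathbf{k}}(\mathbf{r}) = \sum_{|\mathbf{k}+\mathbf{G}| \le G_\mathrm{cut}} c_{n\mathbf{k}}(\mathbf{G}) \, e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}

and the total energy is minimised directly with respect to the coefficients c_{n\mathbf{k}}(\mathbf{G}). Applying the Hamiltonian switches between real and reciprocal space, which is why 3D FFTs dominate the communication (next slide).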
Slide 19: CASTEP 2003 HPCx Performance Gain
- Bottleneck: data traffic in the 3D FFT and MPI_Alltoallv (the transpose step is sketched below)
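The transpose inside a distributed 3D FFT is what drives that MPI_Alltoallv traffic. A schematic C sketch of the step, illustrative rather than actual CASTEP source:

/* Schematic of the transpose step in a distributed 3D FFT:
 * each rank sends a different-sized block to every other rank.
 * Illustrative only -- not actual CASTEP code. */
#include <mpi.h>
#include <stdlib.h>

void fft_transpose(double *in, double *out,
                   int *sendcounts, int *recvcounts, MPI_Comm comm)
{
    int nranks;
    MPI_Comm_size(comm, &nranks);

    int *sdispl = malloc(nranks * sizeof(int));
    int *rdispl = malloc(nranks * sizeof(int));

    /* Displacements: prefix sums of the per-rank block sizes */
    sdispl[0] = rdispl[0] = 0;
    for (int r = 1; r < nranks; r++) {
        sdispl[r] = sdispl[r-1] + sendcounts[r-1];
        rdispl[r] = rdispl[r-1] + recvcounts[r-1];
    }

    /* The communication bottleneck named on the slide: every
     * rank exchanges a block with every other rank */
    MPI_Alltoallv(in,  sendcounts, sdispl, MPI_DOUBLE,
                  out, recvcounts, rdispl, MPI_DOUBLE, comm);

    free(sdispl);
    free(rdispl);
}

At large processor counts the per-rank blocks shrink, so the exchange degenerates into many small messages, which is why optimising this collective pays off.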
Slide 20: CASTEP 2003 HPCx Performance Gain (continued)
Slide 21: Molecular Simulation
- AMBER (Assisted Model Building with Energy Refinement)
  - Weiner and Kollman, University of California, 1981
  - Widely used suite of programs, particularly for biomolecules
  - http://amber.scripps.edu/
Slide 22: AMBER: Initial Scaling
- Factor IX protein with Ca ions, 90,906 atoms
Slide 23: Current Developments: AMBER
- Bob Duke
  - Developed a new version of Sander on HPCx
  - Originally called AMD (Amber Molecular Dynamics)
  - Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics)
- Substantial rewrite of the code
  - Converted to Fortran 90, removed multiple copies of routines
- Likely to be incorporated into AMBER 8
- We are looking at optimising the collective communications, i.e. the reduction/scatter (sketched below)
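The reduction/scatter pattern referred to above sums partial forces across ranks and leaves each rank holding only the portion it owns, which maps naturally onto MPI_Reduce_scatter. A hedged C sketch; function and array names are invented for illustration, not PMEMD source:

/* Sketch of a force reduction/scatter: every rank holds partial
 * forces for ALL atoms; after the call each rank owns the summed
 * forces for only ITS atoms. Illustrative, not PMEMD source. */
#include <mpi.h>
#include <stdlib.h>

void reduce_scatter_forces(double *partial, double *mine,
                           int *natoms_per_rank, MPI_Comm comm)
{
    int nranks;
    MPI_Comm_size(comm, &nranks);

    /* recvcounts[r] = number of force components owned by rank r */
    int *counts = malloc(nranks * sizeof(int));
    for (int r = 0; r < nranks; r++)
        counts[r] = 3 * natoms_per_rank[r];   /* x, y, z per atom */

    /* Combined reduction and scatter in one collective */
    MPI_Reduce_scatter(partial, mine, counts,
                       MPI_DOUBLE, MPI_SUM, comm);

    free(counts);
}

A single MPI_Reduce_scatter avoids an MPI_Allreduce over the full force array followed by each rank discarding the parts it does not own.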
Slide 24: PMEMD Optimisation
Slide 25: Atomic and Molecular Physics
- PFARM
  - Queen's University Belfast, CLRC Daresbury Laboratory
  - R-matrix formalism to treat applications such as the description of the edge region in tokamak plasmas (fusion power research) and the interpretation of astrophysical spectra
Slide 26: External Region Calculation Timings
[Charts: PFARM performance ratio vs. the Cray T3E/1200E and elapsed time (seconds), each vs. number of CPUs; bottleneck: matrix diagonalisation]
Slide 27: PeIGS vs. ScaLAPACK in PFARM
[Chart: timings for the bottleneck matrix diagonalisation, PeIGS vs. ScaLAPACK]
Slide 28: ScaLAPACK Diagonalisation on HPCx
Slide 29: Stage 1 (Sector Diagonalisations) on HPCx
- Sector Hamiltonian matrix size 10032 (× 3 sectors)
Slide 30: Computational Engineering
- UK Turbulence Consortium
  - Led by Prof. Neil Sandham, University of Southampton
  - Focus on compute-intensive methods (Direct Numerical Simulation, Large Eddy Simulation, etc.) for the simulation of turbulent flows
  - Shock/boundary-layer interaction modelling: critical for accurate aerodynamic design but still poorly understood
  - http://www.afm.ses.soton.ac.uk/
Slide 31: Direct Numerical Simulation: 360³ Benchmark
Slide 32: Environmental Science
- Proudman Oceanographic Laboratory Coastal Ocean Modelling System (POLCOMS)
- Coupled marine ecosystem modelling
- http://www.pol.ac.uk/home/research/polcoms/
Slide 33: Coupled Marine Ecosystem Model
Slide 34: POLCOMS Resolution Benchmark on HPCx
Slide 35: POLCOMS 2 km Benchmark: All Systems
Slide 37: Motivation and Strategy
- Scalability of Terascale applications is only half the story
- Absolute performance also depends on
  - single-CPU performance
  - percentage of peak, seen as an important measure
  - comparison with other systems, e.g. vector machines
- Run representative test cases on small numbers of processors, for applications and some important kernels
- Use IBM's hpmlib to measure Mflop/s (a portable sketch of this measurement follows the list)
- Other hpmlib counters can help to understand performance
  - e.g. memory bandwidth, cache miss rates, FMA count, computational intensity, etc.
- Scientific output is the key measure
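As an illustration of the measurement idea: the slides use IBM's hpmlib hardware counters, but the same Mflop/s figure can be estimated portably by timing a kernel whose operation count is known. A C sketch; the 5200 Mflop/s peak (1.3 GHz POWER4 at 4 flops per cycle) is an assumed example figure for a phase 1 HPCx processor:

/* Portable sketch of the percentage-of-peak measurement: time a
 * matrix-matrix multiply and convert to Mflop/s. The slides use
 * IBM's hpmlib hardware counters instead; PEAK_MFLOPS below
 * (1.3 GHz POWER4, 4 flops/cycle) is an assumed figure. */
#include <stdio.h>
#include <time.h>

#define N 512
#define PEAK_MFLOPS 5200.0

int main(void)
{
    static double a[N][N], b[N][N], c[N][N];

    /* Arbitrary initialisation */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = 1.0 / (i + j + 1);
            b[i][j] = 1.0;
            c[i][j] = 0.0;
        }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* Triple loop: exactly 2*N^3 floating-point operations */
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                c[i][j] += a[i][k] * b[k][j];

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec)
                + 1e-9 * (t1.tv_nsec - t0.tv_nsec);

    double mflops = 2.0 * N * N * N / secs / 1e6;
    printf("%.0f Mflop/s = %.1f%% of peak\n",
           mflops, 100.0 * mflops / PEAK_MFLOPS);
    return 0;
}

Because the triple loop performs a known 2N³ operations, Mflop/s and hence percentage of peak follow directly from the elapsed time.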
Slide 38: Matrix-Matrix Multiply Kernel
Slide 39: PCHAN Small Test Case: 120³
Slide 40: Summary of Percentage of Peak
Slide 41: Acknowledgements
- HPCx Terascaling Team
  - Mike Ashworth, Mark Bull, Ian Bush, Martyn Guest, Joachim Hein, David Henty, Adrian Jackson, Chris Johnson, Martin Plummer, Gavin Pringle, Lorna Smith, Kevin Stratford, Andrew Sunderland
- IBM Technical Support
  - Luigi Brochard et al.
- CSAR Computing Service: Cray T3E "turing", Origin 3800/R12k-400 "green"
- ORNL: IBM Regatta "cheetah"
- SARA: Origin 3800/R14k-500
- PSC: AlphaServer SC ES45-1000
Slide 42: The Reality of Capability Computing on HPCx
- The success of the Terascaling strategy is shown by the Nov 2003 HPCx usage
- Capability jobs (≥ 512 processors) account for 48% of usage
- Even without TeraGyroid the figure is 40.7%
Slide 43: Summary
- The HPCx Terascaling Team is addressing scalability for a wide range of codes
- Key Strategic Application Areas
  - Atomic and Molecular Physics, Molecular Simulation, Materials Science, Computational Engineering, Environmental Science
- Reflected by the take-up of Capability Computing on HPCx
  - In Nov 03, >40% of time was used by jobs with 512 processors and greater
- Key Challenges
  - Maintain progress with Terascaling
  - Include new applications and new science areas
  - Address efficiency issues, especially single-processor performance
  - Fully exploit the phase 2 system: 1.7 GHz p690, 32-processor partitions, Federation interconnect