Title: Opportunities for Biological Consortia on HPCx: Code Capabilities and Performance
1. Opportunities for Biological Consortia on HPCx: Code Capabilities and Performance
- HPCx and CCP Staff
- http://www.ccp.ac.uk/
- http://www.hpcx.ac.uk/
2. Welcome to the Meeting
- Background
- HPCx
- Objectives
- to consider whether there is a case to bid
- Agenda
- Introduction to the HPCx service
- Overview of Code Performance
- Contributed Presentations
- Invited Presentation
- Discussion
3. Outline
- Overview of Code Capabilities and Performance
- Macromolecular simulation
- DL_POLY, AMBER, CHARMM, NAMD
- Localised basis molecular codes
- Gaussian, GAMESS-UK, NWChem
- Local basis periodic code
- CRYSTAL
- Plane wave periodic codes
- CASTEP
- CPMD (Alessandro Curioni talk)
- Note: consortium activity is not limited to these codes.
4. The DL_POLY Molecular Dynamics Simulation Package
5. DL_POLY Background
- General purpose parallel MD code
- Developed at Daresbury Laboratory for CCP5, 1994 to the present
- Available free of charge (under licence) to university researchers world-wide
- DL_POLY versions
- DL_POLY_2
- Replicated Data, up to 30,000 atoms
- Full force field and molecular description
- DL_POLY_3
- Domain Decomposition, up to 1,000,000 atoms
- Full force field but no rigid body description
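The replicated-data strategy of DL_POLY_2 is easy to state in code: every processor holds all coordinates, computes part of the pair forces, and a global sum combines the results. Below is a minimal, hypothetical sketch of that pattern in Python with mpi4py and a toy Lennard-Jones potential; it is not DL_POLY source code.

```python
# Minimal sketch of replicated-data MD parallelism (toy example, not DL_POLY):
# every rank stores ALL coordinates, computes forces for its share of atom
# pairs, then an Allreduce combines the partial force arrays.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_atoms = 1000
rng = np.random.default_rng(42)        # same seed: coordinates replicated on every rank
pos = rng.random((n_atoms, 3)) * 20.0

def lj_force(rij):
    """Toy Lennard-Jones force on atom i due to atom j (epsilon = sigma = 1)."""
    r2 = rij @ rij
    inv_r6 = 1.0 / r2 ** 3
    return (48.0 * inv_r6 ** 2 - 24.0 * inv_r6) / r2 * rij

local_f = np.zeros_like(pos)
for i in range(rank, n_atoms, size):   # cyclic split of the outer loop over ranks
    for j in range(i + 1, n_atoms):
        f = lj_force(pos[i] - pos[j])
        local_f[i] += f                # Newton's third law pair
        local_f[j] -= f

forces = np.empty_like(local_f)
comm.Allreduce(local_f, forces, op=MPI.SUM)   # global sum of partial forces
```

The final global sum is the communication step whose cost grows with system size, which is what motivates the domain decomposition of DL_POLY_3 described on the later slides.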
6. DL_POLY Force Field
- Intermolecular forces
- All common van der Waals potentials
- Sutton-Chen many-body potential
- 3-body angle forces (SiO2)
- 4-body inversion forces (BO3)
- Tersoff potential → Brenner
- Intramolecular forces
- Bonds, angles, dihedrals, inversions
- Coulombic forces
- Ewald SPME (3D), HK Ewald (2D), adiabatic shell model, reaction field, neutral groups, truncated Coulombic
- Externally applied fields
- Walled cells, electric field, shear field, etc.
- Not in DL_POLY_3
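The intramolecular, van der Waals and Coulombic terms listed above combine into one potential energy function. A generic textbook form of that sum is sketched below for orientation only; DL_POLY supports many more functional forms than shown.

```latex
E = \sum_{\mathrm{bonds}} k_b (r - r_0)^2
  + \sum_{\mathrm{angles}} k_\theta (\theta - \theta_0)^2
  + \sum_{\mathrm{dihedrals}} k_\phi \left[ 1 + \cos(n\phi - \delta) \right]
  + \sum_{i<j} 4\varepsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12}
      - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]
  + \sum_{i<j} \frac{q_i q_j}{4\pi\varepsilon_0 r_{ij}}
```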
7. Boundary Conditions
- None (e.g. isolated macromolecules)
- Cubic periodic boundaries
- Orthorhombic periodic boundaries
- Parallelepiped periodic boundaries
- Truncated octahedral periodic boundaries
- Rhombic dodecahedral periodic boundaries
- Slabs (i.e. x,y periodic, z nonperiodic)
8. Algorithms and Ensembles
- Algorithms
- Verlet leapfrog (see the sketch after this slide)
- RD-SHAKE
- Euler-Quaternion
- QSHAKE
- All combinations
- Not in DL_POLY_3
- Ensembles
- NVE
- Berendsen NVT
- Hoover NVT
- Evans NVT
- Berendsen NPT
- Hoover NPT
- Berendsen NσT
- Hoover NσT
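A minimal sketch of the Verlet leapfrog step named in the algorithms list above (illustrative Python, not DL_POLY source): velocities are stored at half-steps, positions at whole steps.

```python
# Verlet leapfrog integration step (illustrative sketch, not DL_POLY source).
import numpy as np

def leapfrog_step(pos, vel_half, forces, mass, dt):
    """Advance one step: v(t+dt/2) from v(t-dt/2) ('kick'), then x(t+dt) ('drift').
    pos, vel_half, forces: (n, 3) arrays; mass: (n,) array; dt: timestep."""
    vel_half = vel_half + dt * forces / mass[:, None]   # kick with forces at time t
    pos = pos + dt * vel_half                           # drift with mid-step velocity
    return pos, vel_half
```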
9. Migration from Replicated to Distributed Data: DL_POLY_3 Domain Decomposition
- Distribute atoms and forces across the nodes
- More memory efficient; can address much larger cases (10⁵-10⁷ atoms)
- SHAKE and short-range forces require only neighbour communication
- communications scale linearly with the number of nodes
- Coulombic energy remains global
- strategy depends on problem and machine characteristics
- Adopt the Smooth Particle Mesh Ewald (SPME) scheme
- includes Fourier transform of the smoothed charge density (reciprocal space grid typically 64x64x64 to 128x128x128)
- An alternative FFT algorithm has been designed to reduce communication costs
10. Migration from Replicated to Distributed Data: DL_POLY_3 Coulomb Energy Evaluation
- Conventional routines (e.g. FFTW) assume plane or column distributions
- A global transpose of the data is required to complete the 3D FFT, and additional costs are incurred re-organising the data from the natural block domain decomposition
- An alternative FFT algorithm has been designed to reduce communication costs
- the 3D FFT is performed as a series of 1D FFTs, each involving communications only between blocks in a given column
- more data is transferred, but in far fewer messages
- rather than all-to-all, the communications are column-wise only
[Figure: plane vs. block decompositions of the FFT grid]
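The structural fact behind this alternative algorithm is that a 3D FFT factorises into independent 1D FFTs along each axis, so each sweep only needs data along one direction of the process grid. A serial numpy illustration of the factorisation follows; the MPI column communicators of the real code are omitted.

```python
# A 3D FFT as three sweeps of 1D FFTs -- the factorisation that lets a
# block-decomposed code communicate only along rows/columns of the process
# grid instead of performing a global transpose. (Serial illustration only.)
import numpy as np

def fft3d_by_axes(grid):
    for axis in range(3):                     # one sweep per grid axis
        grid = np.fft.fft(grid, axis=axis)    # independent 1D FFTs along that axis
    return grid

rho = np.random.default_rng(0).random((64, 64, 64))   # typical SPME charge grid size
assert np.allclose(fft3d_by_axes(rho), np.fft.fftn(rho))
```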
11. DL_POLY_2 / _3 Differences
- Rigid bodies not in _3
- MSD not in _3
- Tethered atoms not in _3
- Standard Ewald not in _3
- HK_Ewald not in _3
- DL_POLY_2 I/O files work in _3 but NOT vice versa
- No multiple timestep in _3
12. DL_POLY_2 Developments
- DL_MULTI - Distributed multipoles
- DL_PIMD - Path integral (ionics)
- DL_HYPE - Rare event simulation
- DL_POLY - Symplectic versions 2/3
- DL_POLY - Multiple timestep
- DL_POLY - F90 re-vamp
13. DL_POLY_3 on HPCx
- Test case 1 (552,960 atoms, 300 Δt)
- NaKSi2O5 disilicate glass
- SPME (128³ grid), 3-body terms, 15,625 LC
- 32-512 processors (4-64 nodes)
14. DL_POLY_3 on HPCx
- Test case 2 (792,960 atoms, 10 Δt)
- 64 × Gramicidin (354) + 256,768 H2O
- SHAKE + SPME (256³ grid), 14,812 LC
- 16-256 processors (2-32 nodes)
15. DL_POLY People
- Bill Smith: DL_POLY_2, _3, GUI
- w.smith@dl.ac.uk
- Ilian Todorov: DL_POLY_3
- i.t.todorov@dl.ac.uk
- Maurice Leslie: DL_MULTI
- m.leslie@dl.ac.uk
- Further Information
- W. Smith and T.R. Forester, J. Molec. Graphics (1996), 14, 136
- http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml
- W. Smith, C.W. Yong, P.M. Rodger, Molecular Simulation (2002), 28, 385
16. DL_POLY V2 Replicated Data
- Macromolecular simulations: Bench 7, Gramicidin in water, rigid bonds and SHAKE, 12,390 atoms, 500 time steps
- Ionic simulations: Bench 4, NaCl, 27,000 ions, Ewald, 75 time steps, cutoff 24 Å
[Charts: performance relative to the Cray T3E/1200E vs. number of CPUs, including comparison with CHARMM and AMBER]
17. DL_POLY_3 Macromolecular Simulations
- Gramicidin in water, rigid bonds and SHAKE, 792,960 ions, 50 time steps
[Charts: measured time (seconds) and speedup vs. number of CPUs]
18. AMBER, NAMD and Gaussian
- Lorna Smith and Joachim Hein
19. AMBER
- AMBER (Assisted Model Building with Energy Refinement)
- A molecular dynamics program, particularly for biomolecules
- Weiner and Kollman, University of California, 1981
- Current version: AMBER7
- Widely used suite of programs
- Sander, Gibbs, Roar
- Main program for molecular dynamics: Sander
- Basic energy minimiser and molecular dynamics
- Shared memory version only for SGI and Cray
- MPI version: master/slave, replicated data model
20. AMBER - Initial Scaling
- Factor IX protein with Ca ions, 90,906 atoms
21. Current Developments - AMBER
- Bob Duke
- Developed a new version of Sander on HPCx
- Originally called AMD (Amber Molecular Dynamics)
- Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics)
- Substantial rewrite of the code
- Converted to Fortran90, removed multiple copies of routines
- Likely to be incorporated into AMBER8
- We are looking at optimising the collective communications (the reduction/scatter)
22. Optimisation: PMEMD
23. NAMD
- Molecular dynamics code designed for high-performance simulation of large biomolecular systems
- Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign
- Versions 2.4, 2.5b and 2.5 available on HPCx
- One of the first codes to be awarded a capability incentive rating (bronze)
24. NAMD Performance
- Benchmarks from Prof Peter Coveney
- TCR-peptide-MHC system
25. NAMD Performance
26. Molecular Simulation - NAMD Scaling
http://www.ks.uiuc.edu/Research/namd/
- Parallel, object-oriented MD code
- High-performance simulation of large biomolecular systems
- Scales to 100s of processors on high-end parallel platforms
- Standard NAMD ApoA-I benchmark: a system comprising 92,442 atoms, with 12 Å cutoff and PME every 4 time steps
- Scalability improves with larger simulations: speedup of 778 on 1024 CPUs of TCS-1 in a 327K-particle simulation of F1-ATPase
[Chart: speedup vs. number of CPUs]
27. Performance Comparison
- Performance comparison between AMBER, CHARMM and NAMD
- See http://www.scripps.edu/brooks/Benchmarks/
- Benchmark
- Dihydrofolate reductase protein in an explicit water bath with cubic periodic boundary conditions
- 23,558 atoms
28. Performance
29. Gaussian
- Gaussian 03
- Performs semi-empirical and ab initio molecular orbital calculations
- Gaussian Inc., www.gaussian.com
- Shared memory version available on HPCx
- Limited to the size of a logical partition (8 processors)
- Phase 2 upgrade will allow access to 32 processors
- Task farming option
30. CRYSTAL and CASTEP
- Ian Bush and Martin Plummer
31. CRYSTAL
- Electronic structure and related properties of periodic systems
- All-electron, local Gaussian basis set, DFT and Hartree-Fock
- Under continuous development since 1974
- Distributed to over 500 sites world-wide
- Developed jointly by Daresbury and the University of Turin
32. CRYSTAL Functionality
- Basis Set
- LCAO - Gaussians
- All electron or pseudopotential
- Hamiltonian
- Hartree-Fock (UHF, RHF)
- DFT (LSDA, GGA)
- Hybrid functionals (B3LYP)
- Techniques
- Replicated data parallel
- Distributed data parallel
- Forces
- Structural optimization
- Direct SCF
- Visualisation
- AVS GUI (DLV)
- Properties
- Energy, structure
- Vibrations (phonons), elastic tensor
- Ferroelectric polarisation, piezoelectric constants
- X-ray structure factors, density of states / bands
- Charge/spin densities, magnetic coupling
- Electrostatics (V, E, EFG classical)
- Fermi contact (NMR), EMD (Compton, e-2e)
33. Benchmark Runs on Crambin
- Very small protein from Crambe abyssinica: 1284 atoms per unit cell
- Initial studies using STO-3G (3948 basis functions)
- Improved to 6-31G (12,354 functions)
- All calculations Hartree-Fock
- As far as we know, the largest HF calculation ever converged
34. Crambin - Parallel Performance
- Fit measured data to Amdahl's law to obtain an estimate of the speedup (see the worked expression below)
- Increasing the basis set size increases the scalability
- About 700× speedup on 1024 processors for 6-31G
- Takes about 3 hours instead of about 3 months
- 99.95% parallel
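For reference, the Amdahl's law fit mentioned above: with parallel fraction f, the predicted speedup on p processors is given below, and the quoted 99.95% parallel fraction reproduces the roughly 700× observed on 1024 processors.

```latex
S(p) = \frac{1}{(1-f) + f/p}, \qquad
S(1024)\Big|_{f=0.9995} = \frac{1}{0.0005 + 0.9995/1024} \approx 677
```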
35. Results: Electrostatic Potential
- Charge density isosurface coloured according to
potential - Useful to determine possible chemically active
groups
36. Futures - Rusticyanin
- Rusticyanin (Thiobacillus ferrooxidans) has 6284 atoms and is involved in redox processes
- We have just started calculations using over 33,000 basis functions
- In collaboration with S. Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants
37. What is CASTEP?
- First principles (DFT) materials simulation code
- electronic energy
- geometry optimization
- surface interactions
- vibrational spectra
- materials under pressure, chemical reactions
- molecular dynamics
- Method (direct minimization)
- plane wave expansion of valence electrons
- pseudopotentials for core electrons
38. HPCx Biological Applications
- Examples currently include
- NMR of proteins
- hydroxyapatite (major component of bone)
- chemical processes following stroke
- Possibility of treating systems with a few hundred atoms on HPCx
- May be used in conjunction with classical codes (e.g. DL_POLY) for detailed QM treatment of features of interest
39. CASTEP 2003: HPCx performance gain
40. CASTEP 2003: HPCx performance gain (continued)
41. HPCx Biological Applications
- CASTEP (version 2) is written by
- M. Segall, P. Lindan, M. Probert, C. Pickard, P. Hasnip, S. Clark, K. Refson, V. Milman, B. Montanari, M. Payne
- Easy to understand top-level code
- CASTEP is fully maintained and supported on HPCx
- CASTEP is distributed by Accelrys Ltd
- CASTEP is licensed free to UK academics by the UKCP consortium (contact ukcp@dl.ac.uk)
42. CHARMM, NWChem and GAMESS-UK
43. NWChem
- Objectives
- Highly efficient and portable MPP computational chemistry package
- Distributed data: scalable with respect to chemical system size as well as MPP hardware size
- Extensible Architecture
- Object-oriented design: abstraction, data hiding, handles, APIs
- Parallel programming model: non-uniform memory access, global arrays
- Infrastructure: GA, Parallel I/O, RTDB, MA, ...
- Wide range of parallel functionality essential for HPCx
- Tools
- Global Arrays: portable distributed data tool, used by CCP1 groups (e.g. MOLPRO)
- PeIGS: parallel eigensolver with guaranteed orthogonality of eigenvectors
[Figure: physically distributed data presented as a single, shared data structure]
44. Distributed Data SCF
Pictorial representation of the iterative SCF process in (i) a sequential process and (ii) a distributed data parallel process. MOAO represents the molecular orbitals, P the density matrix and F the Fock or Hamiltonian matrix.
[Figure: sequential vs. distributed data SCF cycles]
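The iteration the figure depicts can be written compactly: build the Fock matrix F from the current density P, solve the Roothaan equations for the MO coefficients C (the MOAO of the caption), then rebuild P from the occupied orbitals. In the distributed-data version, F, P and C are held as Global Arrays spread across the nodes rather than replicated.

```latex
F(P)\,C = S\,C\,\varepsilon, \qquad
P_{\mu\nu} = 2\sum_{i\in\mathrm{occ}} C_{\mu i}\,C_{\nu i}
```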
45. NWChem
- NWChem capabilities (direct, semi-direct and conventional):
- RHF, UHF, ROHF using up to 10,000 basis functions; analytic 1st and 2nd derivatives
- DFT with a wide variety of local and non-local XC potentials, using up to 10,000 basis functions; analytic 1st and 2nd derivatives
- CASSCF; analytic 1st and numerical 2nd derivatives
- Semi-direct and RI-based MP2 calculations for RHF and UHF wave functions using up to 3,000 basis functions; analytic 1st and numerical 2nd derivatives
- Coupled cluster, CCSD and CCSD(T), using up to 3,000 basis functions; numerical 1st and 2nd derivatives of the CC energy
- Classical molecular dynamics and free energy simulations with the forces obtainable from a variety of sources
46. Case Studies - Zeolite Fragments
- DFT calculations with Coulomb fitting (NWChem and GAMESS-UK)
- Basis (Godbout et al.)
- DZVP - O, Si
- DZVP2 - H
- Fitting basis
- DGAUSS-A1 - O, Si
- DGAUSS-A2 - H
- Both codes use an auxiliary fitting basis for the Coulomb energy, with 3-centre 2-electron integrals held in core (see the expression below)
- Fragments (basis functions / fitting functions):
- Si8O7H18: 347/832
- Si8O25H18: 617/1444
- Si26O37H36: 1199/2818
- Si28O67H30: 1687/3928
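The "auxiliary fitting basis" above is standard density fitting; the generic form is sketched below (the determination of the fitting coefficients c_k is omitted). Expanding the density in the fitting basis {g_k} means the Coulomb matrix needs only the 3-centre 2-electron integrals, the quantities held in core.

```latex
\rho(\mathbf{r}) \approx \sum_k c_k\, g_k(\mathbf{r})
\quad\Longrightarrow\quad
J_{\mu\nu} \approx \sum_k c_k\, (\mu\nu \mid g_k)
```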
47. DFT Coulomb Fit - NWChem
- Si28O67H30 (1687/3928) and Si26O37H36 (1199/2818)
[Charts: measured time (seconds) vs. number of CPUs for each fragment]
48. Memory-driven Approaches: NWChem DFT (LDA) Performance on the IBM SP/p690
Zeolite ZSM-5
- DZVP basis (DZV_A2) and DGauss A1_DFT fitting basis
- AO basis: 3554
- CD basis: 12713
- IBM SP/p690 wall time (13 SCF iterations)
- 64 CPUs: 9,184 seconds
- 128 CPUs: 3,966 seconds
- MIPS R14k-500 CPUs (Teras) wall time (13 SCF iterations)
- 64 CPUs: 5,242 seconds
- 128 CPUs: 3,451 seconds
- 3-centre 2e-integrals: 1.00 × 10¹²
- After Schwarz screening: 6.54 × 10⁹
- 3c 2e-integrals in core: 100%
49. GAMESS-UK
- GAMESS-UK is the general purpose ab initio molecular electronic structure program for performing SCF, MCSCF and DFT gradient calculations, together with a variety of techniques for post-Hartree-Fock calculations.
- The program is derived from the original GAMESS code, obtained from Michel Dupuis in 1981 (then at the National Resource for Computational Chemistry, NRCC), and has been extensively modified and enhanced over the past decade.
- This work has included contributions from numerous authors, and has been conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the UK's Collaborative Computational Project No. 1 (CCP1). Other major sources that have assisted in the on-going development and support of the program include various academic funding agencies in the Netherlands, and ICI plc.
- Additional information on the code may be found from links at http://www.dl.ac.uk/CFS
- Authors: M.F. Guest, J.H. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J. Knowles, V. Bonacic-Koutecky, J.H. van Lenthe, J. Kendrick, K. Schoffel, P. Sherwood, with contributions from R.D., W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J. Stone and D. Tozer.
50. GAMESS-UK Features (1)
- Hartree-Fock
- Segmented/GC spherical harmonic basis sets
- SCF energies and gradients: conventional, in-core, direct
- SCF frequencies: numerical and analytic 2nd derivatives
- Restricted, unrestricted open shell SCF and GVB
- Density Functional Theory
- Energies and gradients, conventional and direct, including Dunlap fit
- B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 and LDA functionals
- Numerical 2nd derivatives (analytic implementation in testing)
- Electron Correlation
- MP2 energies, gradients and frequencies; multi-reference MP2; MP3 energies
- MCSCF and CASSCF energies, gradients and numerical 2nd derivatives
- MR-DCI energies, properties and transition moments (semi-direct module)
- CCSD and CCSD(T) energies
- RPA (direct) and MCLR excitation energies / oscillator strengths; RPA gradients
- Full-CI energies
- Green's function calculations of IPs
- Valence bond (TURTLE)
51. GAMESS-UK Features (2)
- Molecular Properties
- Mulliken and Lowdin population analysis, electrostatic potential-derived charges
- Distributed multipole analysis, Morokuma analysis, multipole moments
- Natural Bond Orbital (NBO) and Bader analysis
- IR and Raman intensities, polarizabilities and hyperpolarizabilities
- Solvation and embedding effects (DRF)
- Relativistic effects (ZORA)
- Pseudopotentials
- Local and non-local ECPs
- Visualisation tools, including the CCP1 GUI
- Hybrid QM/MM (ChemShell, CHARMM QM/MM)
- Semi-empirical: MNDO, AM1 and PM3 Hamiltonians
- Parallel Capabilities
- MPP and SMP implementations (GA tools)
- SCF/DFT energies, gradients, frequencies
- MP2 energies and gradients
- Direct RPA
52. Parallel Implementation of GAMESS-UK
- Extensive use of Global Array (GA) tools and parallel linear algebra from the NWChem project (EMSL)
- SCF and DFT
- Replicated data, but:
- GA tools for caching of I/O for restart and checkpoint files
- Storage of 2-centre 2-electron integrals in the DFT J-fit
- Linear algebra (via PeIGS, DIIS/MMOs, inversion of the 2c-2e matrix)
- SCF and DFT second derivatives
- Distribution of <vvoo> and <vovo> integrals via GAs
- MP2 gradients
- Distribution of <vvoo> and <vovo> integrals via GAs
- Direct RPA excited states
- Replicated data with parallelisation of direct integral evaluation
53. GAMESS-UK DFT Calculations
- Valinomycin (DFT HCTH), basis DZVP2_A2 (DGauss), 1620 GTOs
- Cyclosporin (DFT B3LYP), basis 6-31G, 1855 GTOs
[Charts: elapsed time (seconds) and speedup vs. number of CPUs]
54. DFT Analytic 2nd Derivatives Performance: IBM SP/p690, HP/Compaq SC ES45/1000 and SGI O3800
- (C6H4(CF3))2, basis 6-31G (196 GTOs)
- Terms from MO 2e-integrals held in GA storage (CPHF perturbed Fock matrices); calculation dominated by CPHF
[Chart: elapsed time (seconds) vs. number of CPUs]
55. CHARMM
- CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a general purpose molecular mechanics, molecular dynamics and vibrational analysis package for modelling and simulation of the structure and behaviour of macromolecular systems (proteins, nucleic acids, lipids, etc.)
- Supports energy minimisation and MD approaches using a classical parameterised force field
- J. Comp. Chem. 4 (1983) 187-217
- Parallel benchmark: MD calculation of carboxy-myoglobin (MbCO) with 3830 water molecules
- QM/MM model for study of reacting species (see the energy expression below)
- incorporate the QM energy as part of the system into the force field
- coupling between GAMESS-UK (QM) and CHARMM
56. Parallel CHARMM Benchmark
Benchmark: MD calculation of carboxy-myoglobin (MbCO) with 3830 water molecules; 14,026 atoms, 1000 steps (1 ps), 12-14 Å shift.
57. Multiple Time and Length Scales
- QM/MM: first step towards multiple length scales
- QM treatment of the active site
- reacting centre
- problem structures (e.g. transition metal centres)
- excited state processes (e.g. spectroscopy)
- Classical MM treatment of environment
- enzyme structure, zeolite framework, explicit and/or dielectric solvent models
- Multiple time scale algorithms for MD
- Recompute different parts of the energy expression at different intervals, e.g. variants of the Reference System Propagator Algorithm (RESPA); a sketch follows below
- But to date, length / time scales only differ by about 1 order of magnitude
- For an example of an effort to link the atomistic and meso-scales, see RealityGrid: http://www.realitygrid.org/information.html
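As a concrete illustration of the RESPA idea above, here is a hedged Python sketch of a multiple-timestep integrator: cheap, fast-varying forces are recomputed every inner step, while expensive slow forces are evaluated only once per outer step. The force-function split is hypothetical; this is not the CHARMM or DL_POLY implementation.

```python
# RESPA-style multiple-timestep velocity-Verlet sketch (illustrative only).
import numpy as np

def respa_step(pos, vel, mass, dt_outer, n_inner, fast_forces, slow_forces):
    """One outer step. fast_forces/slow_forces map positions -> (n, 3) forces;
    mass is an (n, 1) array; dt_outer is split into n_inner inner steps."""
    dt_inner = dt_outer / n_inner
    vel = vel + 0.5 * dt_outer * slow_forces(pos) / mass   # slow half-kick
    for _ in range(n_inner):                               # inner loop: fast forces only
        vel = vel + 0.5 * dt_inner * fast_forces(pos) / mass
        pos = pos + dt_inner * vel
        vel = vel + 0.5 * dt_inner * fast_forces(pos) / mass
    vel = vel + 0.5 * dt_outer * slow_forces(pos) / mass   # slow half-kick
    return pos, vel
```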
58. QM/MM Applications
- Triosephosphate isomerase (TIM)
- Central reaction in glycolysis: catalytic interconversion of DHAP to GAP
- Demonstration case within QUASI (partners: UZH and BASF)
- QM region: 35 atoms (DFT BLYP)
- includes residues with possible proton donor/acceptor roles
- GAMESS-UK, MNDO, TURBOMOLE
- MM region (4,180 atoms + 2 link atoms)
- CHARMM force field, implemented in CHARMM and DL_POLY
59. Sampling Methods
- Multiple independent simulations
- Replica exchange: Monte Carlo exchange of configurations between an ensemble of replicas at different temperatures (see the acceptance-test sketch after this slide)
- Combinatorial approach to ligand binding
- Replica path method: simultaneously optimise a series of points defining a reaction path or conformational change, subject to path constraints
- Suitable for QM and QM/MM Hamiltonians
- Parallelisation per point
- Communication is limited to adjacent points on the path: global sum of the energy function
- Collaboration with Bernie Brooks (NIH): http://www.cse.clrc.ac.uk/qcg/chmguk
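The replica-exchange move mentioned above accepts a swap between replicas at temperatures T_i and T_j with the standard Metropolis criterion. A minimal sketch (illustrative, not the CHARMM implementation):

```python
# Replica-exchange (parallel tempering) acceptance test: swap configurations
# between replicas i and j with probability min(1, exp(-delta)), where
# delta = (beta_i - beta_j) * (E_j - E_i).
import math
import random

def accept_swap(E_i, E_j, T_i, T_j, k_B=0.0019872):   # k_B in kcal/mol/K
    beta_i, beta_j = 1.0 / (k_B * T_i), 1.0 / (k_B * T_j)
    delta = (beta_i - beta_j) * (E_j - E_i)
    return delta <= 0.0 or random.random() < math.exp(-delta)
```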
60. Summary
- Many of the codes used by the community have quite poor scaling
- Best cases
- large quantum calculations (CRYSTAL, DFT, etc.)
- very large MD simulations (NAMD)
- For a credible consortium bid we need to focus on applications which have
- acceptable scaling now (perhaps involving migration to new codes, e.g. NAMD)
- heavy CPU or memory demands (e.g. CRYSTAL)
- potential for algorithmic development to exploit 1000s of processors (e.g. pathway optimisation, Monte Carlo, etc.)