1
Opportunities for Biological Consortia on HPCx:
Code Capabilities and Performance
  • HPCx and CCP Staff
  • http://www.ccp.ac.uk/
  • http://www.hpcx.ac.uk/

2
Welcome to the Meeting
  • Background
  • HPCx
  • Objectives
  • to consider whether there is a case to bid
  • Agenda
  • Introduction to the HPCx service
  • Overview of Code Performance
  • Contributed Presentations
  • Invited Presentation -
  • Discussion

3
Outline
  • Overview of Code Capabilities and Performance
  • Macromolecular simulation
  • DL_POLY, AMBER, CHARMM, NAMD
  • Localised basis molecular codes
  • Gaussian, GAMESS-UK, NWChem
  • Local basis periodic code
  • CRYSTAL
  • Plane wave periodic codes
  • CASTEP
  • CPMD (Alessandro Curioni talk)
  • Note - consortium activity is not limited to
    these codes.

4
The DL_POLY Molecular Dynamics Simulation Package
  • Bill Smith

5
DL_POLY Background
  • General purpose parallel MD code
  • Developed at Daresbury Laboratory for CCP5
    1994-today
  • Available free of charge (under licence) to
    University researchers world-wide
  • DL_POLY versions
  • DL_POLY_2
  • Replicated Data, up to 30,000 atoms
  • Full force field and molecular description
  • DL_POLY_3
  • Domain Decomposition, up to 1,000,000 atoms
  • Full force field but no rigid body description.

6
DL_POLY Force Field
  • Intermolecular forces
  • All common van der Waals potentials
  • Sutton-Chen many-body potential
  • 3-body angle forces (SiO2)
  • 4-body inversion forces (BO3)
  • Tersoff potential -> Brenner
  • Intramolecular forces
  • Bonds, angles, dihedrals, inversions
  • Coulombic forces
  • Ewald and SPME (3D), HK Ewald (2D), Adiabatic
    shell model, Reaction field, Neutral groups,
    Truncated Coulombic
  • Externally applied field
  • Walled cells, electric field, shear field, etc.
  • Not in DL_POLY_3
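For orientation, the terms listed above typically enter a classical force-field energy of the generic textbook form below (a schematic expression, not DL_POLY's exact functional forms or parameter conventions):

    E = \sum_{\text{bonds}} k_b (r - r_0)^2
      + \sum_{\text{angles}} k_\theta (\theta - \theta_0)^2
      + \sum_{\text{dihedrals}} \tfrac{V_n}{2}\,[1 + \cos(n\phi - \gamma)]
      + \sum_{i<j} 4\epsilon_{ij}\!\left[\left(\tfrac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\tfrac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
      + \sum_{i<j} \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}}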

7
Boundary Conditions
  • None (e.g. isolated macromolecules)
  • Cubic periodic boundaries
  • Orthorhombic periodic boundaries
  • Parallelepiped periodic boundaries
  • Truncated octahedral periodic boundaries
  • Rhombic dodecahedral periodic boundaries
  • Slabs (i.e. x,y periodic, z nonperiodic)

8
Algorithms and Ensembles
  • Algorithms
  • Verlet leapfrog
  • RD-SHAKE
  • Euler-Quaternion
  • QSHAKE
  • All combinations
  • Not in DL_POLY_3
  • Ensembles
  • NVE
  • Berendsen NVT
  • Hoover NVT
  • Evans NVT
  • Berendsen NPT
  • Hoover NPT
  • Berendsen NσT
  • Hoover NσT

9
Migration from Replicated to Distributed data
DL_POLY-3 Domain Decomposition
  • Distribute atoms, forces across the nodes
  • More memory efficient, can address much larger
    cases (10^5-10^7 atoms)
  • SHAKE and short-range forces require only
    neighbour communication
  • communications scale linearly with number of
    nodes
  • Coulombic energy remains global
  • strategy depends on problem and machine
    characteristics
  • Adopt Smooth Particle Mesh Ewald scheme
  • includes Fourier transform of the smoothed charge
    density (reciprocal space grid typically 64x64x64
    - 128x128x128); the real/reciprocal space split is
    sketched below
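The Ewald/SPME scheme splits the Coulomb energy into a short-ranged real-space sum and a smooth reciprocal-space sum evaluated on the FFT grid mentioned above (standard textbook form, written in Gaussian units; not DL_POLY-specific notation):

    E_{\mathrm{coul}} = \tfrac{1}{2}\sum_{i \ne j} \frac{q_i q_j\,\mathrm{erfc}(\alpha r_{ij})}{r_{ij}}
      + \frac{2\pi}{V}\sum_{\mathbf{k} \ne 0} \frac{e^{-k^2/4\alpha^2}}{k^2}\,\bigl|S(\mathbf{k})\bigr|^2
      - \frac{\alpha}{\sqrt{\pi}}\sum_i q_i^2,
    \qquad S(\mathbf{k}) = \sum_j q_j\, e^{i\mathbf{k}\cdot\mathbf{r}_j}

The real-space term needs only neighbour communication, while the structure factor S(k) couples every charge to the whole reciprocal-space grid, which is why the FFT part dictates the parallel strategy.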

An alternative FFT algorithm has been designed to
reduce communication costs
10
Migration from Replicated to Distributed data
DL_POLY-3 Coulomb Energy Evaluation
  • Conventional routines (e.g. fftw) assume plane or
    column distributions
  • A global transpose of the data is required to
    complete the 3D FFT and additional costs are
    incurred re-organising the data from the natural
    block domain decomposition.
  • An alternative FFT algorithm has been designed to
    reduce communication costs.
  • the 3D FFT is performed as a series of 1D FFTs,
    each involving communications only between blocks
    in a given column
  • More data is transferred, but in far fewer
    messages
  • Rather than all-to-all, the communications are
    column-wise only

[Diagrams: plane vs. block data distributions]
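A minimal serial sketch of the idea in the bullets above (NumPy, no MPI; the grid size is illustrative): a 3D FFT factorises exactly into 1D FFTs taken along one axis at a time, so on a block-decomposed grid each stage only needs data from the blocks sharing a column in that direction rather than a global transpose.

    import numpy as np

    # Illustrative charge-density grid (cf. the 64^3-128^3 SPME grids above).
    rho = np.random.rand(64, 64, 64)

    # A 3D FFT is a series of 1D FFTs, one axis at a time.  In the parallel
    # version each stage communicates only between the blocks that share a
    # column along that axis, instead of an all-to-all transpose.
    stage_x = np.fft.fft(rho, axis=0)        # 1D FFTs along x
    stage_xy = np.fft.fft(stage_x, axis=1)   # then along y
    rho_k = np.fft.fft(stage_xy, axis=2)     # then along z

    # Same result as the one-shot 3D transform.
    assert np.allclose(rho_k, np.fft.fftn(rho))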
11
DL_POLY_2 / 3 Differences
  • Rigid bodies not in _3
  • MSD not in _3
  • Tethered atoms not in _3
  • Standard Ewald not in _3
  • HK_Ewald not in _3
  • DL_POLY_2 I/O files work in _3 but NOT vice versa
  • No multiple timestep in _3

12
DL_POLY_2 Developments
  • DL_MULTI - Distributed multipoles
  • DL_PIMD - Path integral (ionics)
  • DL_HYPE - Rare event simulation
  • DL_POLY - Symplectic versions 2/3
  • DL_POLY - Multiple timestep
  • DL_POLY - F90 re-vamp

13
DL_POLY_3 on HPCx
  • Test case 1 (552,960 atoms, 300 timesteps)
  • NaKSi2O5 disilicate glass
  • SPME (128^3 grid), 3-body terms, 15625 link cells
  • 32-512 processors (4-64 nodes)

14
DL_POLY_3 on HPCx
  • Test case 2 (792,960 atoms, 10 timesteps)
  • 64 x Gramicidin (354) + 256,768 H2O
  • SHAKE + SPME (256^3 grid), 14812 link cells
  • 16-256 processors (2-32 nodes)

15
DL_POLY People
  • Bill Smith - DL_POLY_2, _3, GUI
  • w.smith@dl.ac.uk
  • Ilian Todorov - DL_POLY_3
  • i.t.todorov@dl.ac.uk
  • Maurice Leslie - DL_MULTI
  • m.leslie@dl.ac.uk
  • Further Information
  • W. Smith and T.R. Forester, J. Molec. Graphics
    (1996), 14, 136
  • http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml
  • W. Smith, C.W. Yong, P.M. Rodger, Molecular
    Simulation (2002), 28, 385

16
DL_POLY V2 Replicated Data
[Charts: performance relative to the Cray T3E/1200E vs. number of CPUs, with CHARMM and AMBER shown for comparison]
  • Macromolecular simulations - Bench 7: Gramicidin in
    water, rigid bonds and SHAKE, 12,390 atoms, 500 time
    steps
  • Ionic simulations - Bench 4: NaCl, 27,000 ions,
    Ewald, 75 time steps, cutoff 24 Å
17
DL_POLY_3 Macromolecular Simulations
[Charts: measured time (seconds) and speedup vs. number of CPUs]
  • Gramicidin in water, rigid bonds and SHAKE,
    792,960 atoms, 50 time steps
18
AMBER, NAMD and Gaussian
  • Lorna Smith and Joachim Hein

19
AMBER
  • AMBER (Assisted Model Building with Energy
    Refinement)
  • A molecular dynamics program, particularly for
    biomolecules
  • Weiner and Kollman, University of California,
    1981.
  • Current version AMBER7
  • Widely used suite of programs
  • Sander, Gibbs, Roar
  • Main program for molecular dynamics: Sander
  • Basic energy minimiser and molecular dynamics
  • Shared memory version only for SGI and Cray
  • MPI version master / slave, replicated data model
    (see the sketch below)
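A minimal illustration of the replicated-data pattern using mpi4py and NumPy (a hedged sketch, not Sander's actual code): every rank holds all coordinates, computes a subset of the force terms, and a global sum leaves the complete force array on every rank.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, nproc = comm.Get_rank(), comm.Get_size()

    natoms = 1000                    # illustrative system size
    coords = np.zeros((natoms, 3))   # every rank holds *all* coordinates
    forces = np.zeros((natoms, 3))

    # Each rank computes the force terms for its share of the interactions
    # (here simply a round-robin slice of the atoms, with a placeholder force).
    for i in range(rank, natoms, nproc):
        forces[i] = -coords[i]

    # Replicated data: a global sum replicates the full force array everywhere.
    comm.Allreduce(MPI.IN_PLACE, forces, op=MPI.SUM)

The cost of that global sum grows with the system size and does not shrink as processors are added, which is one reason replicated-data codes stop scaling at modest CPU counts.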

20
AMBER - Initial Scaling
  • Factor IX protein with Ca ions, 90,906 atoms

21
Current developments - AMBER
  • Bob Duke
  • Developed a new version of Sander on HPCx
  • Originally called AMD (Amber Molecular Dynamics)
  • Renamed PMEMD (Particle Mesh Ewald Molecular
    Dynamics)
  • Substantial rewrite of the code
  • Converted to Fortran90, removed multiple copies
    of routines,
  • Likely to be incorporated into AMBER8
  • We are looking at optimising the collective
    communications (the reduction / scatter), sketched below
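For illustration of the collective in question (a hedged mpi4py sketch under an assumed data layout, not PMEMD source): instead of an Allreduce that gives every rank the whole force array, a reduce-scatter sums the contributions and hands each rank only the block of atoms it owns.

    import numpy as np
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    nproc = comm.Get_size()

    block = 256                                 # atoms owned per rank (illustrative)
    partial = np.random.rand(nproc * block, 3)  # this rank's partial forces for all atoms
    mine = np.empty((block, 3))                 # summed forces for the atoms this rank owns

    # Sum partial contributions across ranks and scatter equal-sized blocks,
    # so each rank receives only the forces it needs to integrate its own atoms.
    comm.Reduce_scatter_block(partial, mine, op=MPI.SUM)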

22
Optimisation PMEMD
23
NAMD
  • NAMD
  • molecular dynamics code designed for
    high-performance simulation of large biomolecular
    systems.
  • Theoretical and Computational Biophysics Group,
    University of Illinois at Urbana-Champaign.
  • Versions 2.4, 2.5b and 2.5 available on HPCx
  • One of the first codes to be awarded a capability
    incentive rating (bronze)

24
NAMD Performance
  • Benchmarks from Prof Peter Coveney
  • TCR-peptide-MHC system

25
NAMD Performance
26
Molecular Simulation - NAMD Scaling
http://www.ks.uiuc.edu/Research/namd/
  • Parallel, object-oriented MD code
  • High-performance simulation of large biomolecular
    systems
  • Scales to 100s of processors on high-end
    parallel platforms

[Chart: speedup vs. number of CPUs for the benchmarks below]
  • standard NAMD ApoA-I benchmark, a system
    comprising 92,442 atoms, with 12Å cutoff and PME
    every 4 time steps.
  • scalability improves with larger simulations -
    speedup of 778 on 1024 CPUs of TCS-1 in a 327K
    particle simulation of F1-ATPase.

27
Performance Comparison
  • Performance comparison between AMBER, CHARMM and
    NAMD
  • See http://www.scripps.edu/brooks/Benchmarks/
  • Benchmark
  • dihydrofolate reductase protein in an explicit
    water bath with cubic periodic boundary
    conditions.
  • 23,558 atoms

28
Performance
29
Gaussian
  • Gaussian 03
  • Performs semi-empirical and ab initio molecular
    orbital calculations.
  • Gaussian Inc, www.gaussian.com
  • Shared memory version available on HPCx
  • Limited to the size of a logical partition (8
    processors)
  • Phase 2 upgrade will allow access to 32
    processors
  • Task farming option

30
CRYSTAL and CASTEP
  • Ian Bush and Martin Plummer

31
Crystal
  • Electronic structure and related properties of
    periodic systems
  • All electron, local Gaussian basis set, DFT and
    Hartree-Fock
  • Under continuous development since 1974
  • Distributed to over 500 sites world wide
  • Developed jointly by Daresbury and the University
    of Turin

32
Crystal Functionality
  • Basis Set
  • LCAO - Gaussians
  • All electron or pseudopotential
  • Hamiltonian
  • Hartree-Fock (UHF, RHF)
  • DFT (LSDA, GGA)
  • Hybrid funcs (B3LYP)
  • Techniques
  • Replicated data parallel
  • Distributed data parallel
  • Forces
  • Structural optimization
  • Direct SCF
  • Visualisation
  • AVS GUI (DLV)

Properties
  • Energy, Structure
  • Vibrations (phonons)
  • Elastic tensor
  • Ferroelectric polarisation
  • Piezoelectric constants
  • X-ray structure factors
  • Density of States / Bands
  • Charge/Spin Densities
  • Magnetic Coupling
  • Electrostatics (V, E, EFG classical)
  • Fermi contact (NMR)
  • EMD (Compton, e-2e)
33
Benchmark Runs on Crambin
  • Very small protein from Crambe abyssinica - 1284
    atoms per unit cell
  • Initial studies using STO-3G (3948 basis
    functions)
  • Improved to 6-31G (12354 functions)
  • All calculations Hartree-Fock
  • As far as we know, the largest HF calculation ever
    converged

34
Crambin - Parallel Performance
  • Fit measured data to Amdahl's law to obtain an
    estimate of the speedup (worked form below)
  • Increasing the basis set size increases the
    scalability
  • Speedup of about 700 on 1024 processors for 6-31G
  • Takes about 3 hours instead of about 3 months
  • 99.95% parallel
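For reference, Amdahl's law with the quoted parallel fraction reproduces the quoted speedup (standard formula; the numbers are the ones on this slide):

    S(N) = \frac{1}{(1 - p) + p/N}, \qquad
    p = 0.9995,\; N = 1024 \;\Rightarrow\;
    S \approx \frac{1}{0.0005 + 0.00098} \approx 680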

35
Results Electrostatic Potential
  • Charge density isosurface coloured according to
    potential
  • Useful to determine possible chemically active
    groups

36
Futures - Rusticyanin
  • Rusticyanin (Thiobacillus ferrooxidans) has 6284
    atoms and is involved in redox processes
  • We have just started calculations using over
    33000 basis functions
  • In collaboration with S.Hasnain (DL) we want to
    calculate redox potentials for rusticyanin and
    associated mutants

37
What is Castep?
  • First principles (DFT) materials simulation code
  • electronic energy
  • geometry optimization
  • surface interactions
  • vibrational spectra
  • materials under pressure, chemical reactions
  • molecular dynamics
  • Method (direct minimization)
  • plane wave expansion of valence electrons
  • pseudopotentials for core electrons
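The plane-wave/pseudopotential approach listed above expands each valence orbital in plane waves up to a kinetic-energy cutoff (standard notation, not CASTEP-specific):

    \psi_{n\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}} c_{n,\mathbf{k}+\mathbf{G}}\, e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}},
    \qquad \frac{\hbar^2}{2m}\,\lvert\mathbf{k}+\mathbf{G}\rvert^2 \le E_{\mathrm{cut}}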

38
HPCx biological applications
  • Examples currently include
  • NMR of proteins
  • hydroxyapatite (major component of bone)
  • chemical processes following stroke
  • Possibility of treating systems with a few
    hundred atoms on HPCx
  • May be used in conjunction with classical codes
    (e.g. DL_POLY) for detailed QM treatment of
    features of interest

39
Castep 2003 HPCx performance gain
40
Castep 2003 HPCx performance gain
41
HPCx biological applications
  • Castep (version 2) is written by
  • M Segall, P Lindan, M Probert, C Pickard, P
    Hasnip, S Clark, K Refson, V Milman, B Montanari,
    M Payne.
  • Easy to understand top-level code.
  • Castep is fully maintained and supported on HPCx
  • Castep is distributed by Accelrys Ltd
  • Castep is licensed free to UK academics by the
    UKCP consortium (contact ukcp@dl.ac.uk)

42
CHARMM, NWChem and GAMESS-UK
  • Paul Sherwood

43
NWChem
  • Objectives
  • Highly efficient and portable MPP computational
    chemistry package
  • Distributed Data - Scalable with respect to
    chemical system size as well as MPP hardware size
  • Extensible Architecture
  • Object-oriented design
  • abstraction, data hiding, handles, APIs
  • Parallel programming model
  • non-uniform memory access, global arrays
  • Infrastructure
  • GA, Parallel I/O, RTDB, MA,
  • Wide range of parallel functionality essential
    for HPCx
  • Tools
  • Global arrays
  • portable distributed data tool
  • Used by CCP1 groups (e.g. MOLPRO)
  • PeIGS
  • parallel eigensolver,
  • guaranteed orthogonality of
    eigenvectors

[Diagram: Global Arrays present a single, shared data structure over physically distributed data]
44
Distributed Data SCF
Pictorial representation of the iterative SCF
process in (i) a sequential process, and (ii) a
distributed-data parallel process. MOAO represents
the molecular orbitals, P the density matrix and F
the Fock or Hamiltonian matrix.
[Diagrams: Sequential vs. Distributed Data SCF]
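A schematic of the SCF cycle being parallelised (dense NumPy, serial, orthonormal basis assumed; illustrative only - the distributed-data version spreads F, P and the diagonalisation across processors as in the figure):

    import numpy as np

    def scf_iterations(h_core, two_e, n_occ, n_iter=20):
        """Schematic closed-shell SCF loop: F is built from P, diagonalised
        to give the MOs (MOAO), and the occupied MOs define the next P."""
        n = h_core.shape[0]
        p = np.zeros((n, n))                      # density matrix P
        for _ in range(n_iter):
            # Fock build from the current density matrix
            j = np.einsum('pqrs,rs->pq', two_e, p)
            k = np.einsum('prqs,rs->pq', two_e, p)
            f = h_core + j - 0.5 * k              # Fock matrix F
            _, mo_ao = np.linalg.eigh(f)          # molecular orbitals (MOAO)
            c_occ = mo_ao[:, :n_occ]
            p = 2.0 * c_occ @ c_occ.T             # new density matrix P
        return f, p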
45
NWChem
  • NWChem Capabilities (Direct, Semi-direct and
    conventional)
  • RHF, UHF, ROHF using up to 10,000 basis
    functions; analytic 1st and 2nd derivatives.
  • DFT with a wide variety of local and non-local XC
    potentials, using up to 10,000 basis functions;
    analytic 1st and 2nd derivatives.
  • CASSCF; analytic 1st and numerical 2nd
    derivatives.
  • Semi-direct and RI-based MP2 calculations for RHF
    and UHF wave functions using up to 3,000 basis
    functions; analytic 1st derivatives and numerical
    2nd derivatives.
  • Coupled cluster, CCSD and CCSD(T) using up to
    3,000 basis functions; numerical 1st and 2nd
    derivatives of the CC energy.
  • Classical molecular dynamics and free energy
    simulations with the forces obtainable from a
    variety of sources

46
Case Studies - Zeolite Fragments
  • DFT Calculations with Coulomb Fitting
  • Basis (Godbout et al.)
  • DZVP - O, Si
  • DZVP2 - H
  • Fitting Basis
  • DGAUSS-A1 - O, Si
  • DGAUSS-A2 - H
  • NWChem and GAMESS-UK
  • Both codes use an auxiliary fitting basis for the
    Coulomb energy, with the 3-centre 2-electron
    integrals held in core (see the fitted Coulomb
    expression after the fragment list below).

  • Si8O7H18: 347/832
  • Si8O25H18: 617/1444
  • Si26O37H36: 1199/2818
  • Si28O67H30: 1687/3928
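The Coulomb fitting referred to above replaces the four-centre Coulomb contribution by an expansion in the auxiliary basis, so only three-centre two-electron integrals are needed (the standard density-fitting / RI-J expressions, not code-specific ones):

    J_{\mu\nu} \approx \sum_P (\mu\nu|P)\, c_P,
    \qquad \sum_Q (P|Q)\, c_Q = \sum_{\lambda\sigma} (P|\lambda\sigma)\, D_{\lambda\sigma}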
47
DFT Coulomb Fit - NWChem
[Charts: measured time (seconds) vs. number of CPUs for Si26O37H36 (1199/2818) and Si28O67H30 (1687/3928)]
48
Memory-driven Approaches: NWChem DFT (LDA)
Performance on the IBM SP/p690
Zeolite ZSM-5
  • DZVP basis (DZV_A2) and Dgauss A1_DFT
    fitting basis
  • AO basis: 3554
  • CD basis: 12713
  • IBM SP/p690
  • Wall time (13 SCF iterations)
  • 64 CPUs: 9,184 seconds
  • 128 CPUs: 3,966 seconds
  • MIPS R14k-500 CPUs (Teras)
  • Wall time (13 SCF iterations)
  • 64 CPUs: 5,242 seconds
  • 128 CPUs: 3,451 seconds
  • 3-centre 2e-integrals: 1.00 x 10^12
  • Schwarz screening: 6.54 x 10^9
  • 3c 2e-ints. in core: 100%
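The Schwarz screening quoted above discards small integrals using the standard Cauchy-Schwarz bound on two-electron integrals:

    \lvert(\mu\nu|\lambda\sigma)\rvert \le \sqrt{(\mu\nu|\mu\nu)\,(\lambda\sigma|\lambda\sigma)}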

49
GAMESS-UK
  • GAMESS-UK is the general purpose ab initio
    molecular electronic structure program for
    performing SCF-, MCSCF- and DFT-gradient
    calculations, together with a variety of
    techniques for post Hartree Fock calculations.
  • The program is derived from the original GAMESS
    code, obtained from Michel Dupuis in 1981 (then
    at the National Resource for Computational
    Chemistry, NRCC), and has been extensively
    modified and enhanced over the past decade.
  • This work has included contributions from
    numerous authors, and has been conducted largely
    at the CCLRC Daresbury Laboratory, under the
    auspices of the UK's Collaborative Computational
    Project No. 1 (CCP1). Other major sources that
    have assisted in the on-going development and
    support of the program include various academic
    funding agencies in the Netherlands, and ICI plc.
  • Additional information on the code may be found
    from links at http://www.dl.ac.uk/CFS

M.F. Guest, J.H. van Lenthe, J. Kendrick, K.
Schoffel and P. Sherwood, with contributions from
R.D. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis,
N.C. Handy, I.H. Hillier, P.J. Knowles, V.
Bonacic-Koutecky, W. von Niessen, R.J. Harrison,
A.P. Rendell, V.R. Saunders, A.J. Stone and D.
Tozer.
50
GAMESS-UK features 1.
  • Hartree Fock
  • Segmented/ GC spherical harmonic basis sets
  • SCF energies and gradients: conventional,
    in-core, direct
  • SCF frequencies: numerical and analytic 2nd
    derivatives
  • Restricted, unrestricted open shell SCF and GVB.
  • Density Functional Theory
  • Energies and gradients: conventional and direct,
    including Dunlap fit
  • B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 and LDA
    functionals
  • Numerical 2nd derivatives (analytic
    implementation in testing)
  • Electron Correlation
  • MP2 energies, gradients and frequencies,
    Multi-reference MP2, MP3 Energies
  • MCSCF and CASSCF Energies, gradients and
    numerical 2nd derivatives
  • MR-DCI Energies, properties and transition
    moments (semi-direct module)
  • CCSD and CCSD(T) Energies
  • RPA (direct) and MCLR excitation energies /
    oscillator strengths, RPA gradients
  • Full-CI Energies
  • Green's function calculations of IPs.
  • Valence bond (Turtle)

51
GAMESS-UK features 2.
  • Molecular Properties
  • Mulliken and Lowdin population analysis,
    Electrostatic Potential-Derived Charges
  • Distributed Multipole Analysis, Morokuma
    Analysis, Multipole Moments
  • Natural Bond Orbital (NBO) Bader Analysis
  • IR and Raman Intensities, Polarizabilities and
    Hyperpolarizabilities
  • Solvation and Embedding Effects (DRF)
  • Relativistic Effects (ZORA)
  • Pseudopotentials
  • Local and non-local ECPs.
  • Visualisation tools include CCP1 GUI
  • Hybrid QM/MM (ChemShell CHARMM QM/MM)
  • Semi-empirical MNDO, AM1, and PM3 Hamiltonians
  • Parallel Capabilities
  • MPP and SMP implementations (GA tools)
  • SCF/DFT energies, gradients, frequencies
  • MP2 energies and gradients
  • Direct RPA

52
Parallel Implementation of GAMESS-UK
  • Extensive use of Global Array (GA) Tools and
    Parallel Linear Algebra from NWChem Project
    (EMSL)
  • SCF and DFT
  • Replicated data, but
  • GA Tools for caching of I/O for restart and
    checkpoint files
  • Storage of 2-centre 2-e integrals in DFT Jfit
  • Linear Algebra (via PeIGS, DIIS/MMOs, Inversion
    of 2c-2e matrix)
  • SCF and DFT second derivatives
  • Distribution of <vvoo> and <vovo> integrals via
    GAs
  • MP2 gradients
  • Distribution of <vvoo> and <vovo> integrals via
    GAs
  • Direct RPA Excited States
  • Replicated data with parallelisation of direct
    integral evaluation

53
GAMESS-UK DFT Calculations
[Charts: elapsed time (seconds) and speedup vs. number of CPUs]
  • Valinomycin (DFT HCTH), basis DZVP2_A2 (Dgauss),
    1620 GTOs
  • Cyclosporin (DFT B3LYP), basis 6-31G, 1855 GTOs
54
DFT Analytic 2nd Derivatives Performance: IBM
SP/p690, HP/Compaq SC ES45/1000 and SGI O3800
(C6H4(CF3))2, basis 6-31G (196 GTOs)
[Chart: elapsed time (seconds) vs. number of CPUs]
Terms from MO 2e-integrals in GA storage (CPHF
perturbed Fock matrices); the calculation is
dominated by the CPHF.
55
CHARMM
  • CHARMM (Chemistry at HARvard Macromolecular
    Mechanics) is a general purpose molecular
    mechanics, molecular dynamics and vibrational
    analysis package for modelling and simulation of
    the structure and behaviour of macromolecular
    systems (proteins, nucleic acids, lipids etc.)
  • Supports energy minimisation and MD approaches
    using a classical parameterised force field.
  • J. Comp. Chem. 4 (1983) 187-217
  • Parallel Benchmark - MD Calculation of Carboxy
    Myoglobin (MbCO) with 3830 Water Molecules.
  • QM/MM model for the study of reacting species
  • the QM energy of part of the system is
    incorporated into the force field (see the energy
    expression below)
  • coupling between GAMESS-UK (QM) and CHARMM.
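A common additive form of the coupled QM/MM energy (a generic expression for orientation, not the specific GAMESS-UK/CHARMM coupling terms):

    E_{\mathrm{QM/MM}} = E_{\mathrm{QM}}(\text{QM region})
      + E_{\mathrm{MM}}(\text{MM region})
      + E_{\mathrm{QM-MM}}(\text{electrostatic, van der Waals and link-atom terms})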

56
Parallel CHARMM Benchmark
Benchmark: MD calculation of Carboxy Myoglobin
(MbCO) with 3830 water molecules; 14,026 atoms,
1000 steps (1 ps), 12-14 Å shift.
57
Multiple Time and Length Scales
  • QM/MM - first step towards multiple length scales
  • QM treatment of the active site
  • reacting centre
  • problem structures (e.g. transition metal
    centres)
  • excited state processes (e.g. spectroscopy)
  • Classical MM treatment of environment
  • enzyme structure, zeolite framework,
    explicit and/or dielectric solvent models
  • Multiple time scale algorithms for MD
  • Recompute different parts of the energy expression
    at different intervals, e.g. variants of the
    Reference System Propagator Algorithm (RESPA)
    (see the sketch at the end of this slide)

But to date the length / time scales only differ by
about 1 order of magnitude. For an example of an
effort to link the atomistic and meso-scales, see
RealityGrid:
http://www.realitygrid.org/information.html
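A schematic of the multiple-timestep idea (a generic velocity-Verlet / impulse-RESPA style loop with placeholder force functions, not the algorithm of any particular code above): cheap, rapidly varying forces are recomputed every small step, while the expensive slowly varying forces are applied as impulses once per outer step.

    import numpy as np

    def respa_step(x, v, m, fast_force, slow_force, dt_outer, n_inner):
        """One outer multiple-timestep step: slow forces applied every
        dt_outer, fast forces every dt_outer / n_inner."""
        dt_inner = dt_outer / n_inner
        v += 0.5 * dt_outer * slow_force(x) / m      # half-kick, slow forces
        for _ in range(n_inner):                     # inner loop: fast forces only
            v += 0.5 * dt_inner * fast_force(x) / m
            x += dt_inner * v
            v += 0.5 * dt_inner * fast_force(x) / m
        v += 0.5 * dt_outer * slow_force(x) / m      # closing half-kick, slow forces
        return x, v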
58
QM/MM Applications
  • Triosephosphate isomerase (TIM)
  • Central reaction in glycolysis, catalytic
    interconversion of DHAP to GAP
  • Demonstration case within QUASI (partners UZH
    and BASF)
  • QM region 35 atoms (DFT BLYP)
  • include residues with possible proton
    donor/acceptor roles
  • GAMESS-UK, MNDO, TURBOMOLE
  • MM region (4,180 atoms + 2 link atoms)
  • CHARMM force-field, implemented in CHARMM,
    DL_POLY

59
Sampling Methods
  • Multiple independent simulations
  • Replica exchange - Monte Carlo exchange of
    configurations between an ensemble of replicas at
    different temperatures
  • Combinatorial approach to ligand binding
  • Replica path method - simultaneously optimise a
    series of points defining a reaction path or
    conformational change, subject to path
    constraints.
  • Suitable for QM and QM/MM Hamiltonians
  • Parallelisation per point
  • Communication is limited to adjacent points on
    the path - global sum of energy function
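For the replica-exchange moves described above, the standard Metropolis acceptance rule for swapping the configurations of replicas i and j at inverse temperatures β_i and β_j is (textbook form, not tied to a particular implementation):

    P_{\mathrm{acc}}(i \leftrightarrow j) = \min\!\bigl[\,1,\ \exp\{(\beta_i - \beta_j)(E_i - E_j)\}\bigr]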

Collaboration with Bernie Brooks (NIH)
http://www.cse.clrc.ac.uk/qcg/chmguk
60
Summary
  • Many of the codes used by the community have
    quite poor scaling
  • Best cases
  • large quantum calculations (Crystal, DFT etc)
  • very large MD simulations (NAMD)
  • For a credible consortium bid we need to focus on
    applications which have
  • acceptable scaling now (perhaps involving
    migration to new codes, e.g. NAMD)
  • heavy CPU or memory demands (e.g. CRYSTAL)
  • potential for algorithmic development to exploit
    1000s of processors (e.g. pathway optimisation,
    Monte Carlo, etc.)