Terascaling Applications on HPCx: The First 12 Months
1
Terascaling Applications on HPCx: The First 12 Months
  • Mike Ashworth
  • HPCx Terascaling Team
  • HPCx Service
  • CCLRC Daresbury Laboratory
  • UK
  • m.ashworth@dl.ac.uk
  • http://www.hpcx.ac.uk/

2
Outline
  • Terascaling Objectives
  • Case Studies
  • DL-POLY
  • CRYSTAL
  • CASTEP
  • AMBER
  • PFARM
  • PCHAN
  • POLCOMS
  • Efficiency of Codes
  • Summary

Application-driven, not H/W-driven
3
  • Terascaling Objectives

4
Terascaling Objectives
Jobs which use > 50% of the CPUs
  • The primary aim of the HPCx service is Capability
    Computing
  • Key objective that user codes should scale to
    O(1000) cpus
  • Largest part of our science support is the
    Terascaling Team
  • Understanding performance and scaling of key
    codes
  • Enabling world-leading calculations
    (demonstrators)
  • Closely linked with Software Engineering Team and
    Applications Support Team

5
Strategy for Capability Computing
  • Performance Attributes of Key Applications
  • Trouble-shooting with Vampir and Paraver
  • Scalability of Numerical Algorithms
  • Parallel eigensolvers, FFTs, etc.
  • Optimisation of Communication Collectives
  • e.g., MPI_ALLTOALLV in CASTEP
  • New Techniques
  • Mixed-mode programming (see the sketch below)
  • Memory-driven Approaches
  • e.g., in-core SCF and DFT, direct minimisation (CRYSTAL)
  • Migration from replicated to distributed data
  • e.g., DL_POLY3
  • Scientific drivers amenable to Capability Computing
  • Enhanced Sampling Methods, Replica Methods

HPCx Terascaling Team
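As a flavour of the mixed-mode programming listed above, the sketch below combines MPI across the p690 nodes with OpenMP threads within a node. It is a minimal illustration under stated assumptions, not taken from any HPCx application; the partial-sum loop is a hypothetical stand-in for real work.

```c
/* Minimal mixed-mode (MPI + OpenMP) sketch: one MPI process per
 * SMP node, OpenMP threads within it. Illustrative only; not taken
 * from any HPCx application code. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    double local = 0.0, global = 0.0;

    /* Request threaded MPI; FUNNELED suffices when only the
     * master thread makes MPI calls, as here. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Thread-parallel partial sum over this process's share. */
    #pragma omp parallel for reduction(+:local)
    for (int i = rank; i < N; i += nprocs)
        local += 1.0 / (double)(i + 1);

    /* Inter-node reduction across MPI processes. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %f (%d procs x %d threads)\n",
               global, nprocs, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}
```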
6
  • Case Studies

7
Molecular Simulation
  • DL_POLY
  • W. Smith and T.R. Forester, CLRC Daresbury
    Laboratory
  • General purpose molecular dynamics simulation
    package
  • http://www.cse.clrc.ac.uk/msi/software/DL_POLY/

8
DL_POLY3 Coulomb Energy Performance
  • Distributed Data (see the sketch below)
  • SPME, with revised FFT scheme
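The sketch below illustrates the distributed-data idea behind DL_POLY3: processes are arranged on a periodic 3D grid and each owns a spatial sub-domain of the simulation cell. This is a hedged illustration of the general technique, not DL_POLY3 source; the routine name is hypothetical.

```c
/* Hedged sketch of a distributed-data domain decomposition: map
 * processes onto a periodic 3D grid with MPI and give each one a
 * spatial sub-domain. Illustrative only, not DL_POLY3 source. */
#include <mpi.h>

void make_domain(MPI_Comm *cart, int coords[3])
{
    int dims[3] = {0, 0, 0};       /* let MPI choose the grid   */
    int periods[3] = {1, 1, 1};    /* periodic boundaries (MD)  */
    int nprocs, rank;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 3, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods,
                    1 /* allow rank reorder */, cart);
    MPI_Comm_rank(*cart, &rank);
    MPI_Cart_coords(*cart, rank, 3, coords);
    /* Each rank now owns the region of the cell at 'coords';
     * short-range forces need only neighbour halo exchanges,
     * which is what lets memory and time scale with P. */
}
```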

[Figure: DL_POLY3 Coulomb energy performance (216,000 ions, 200 time steps, 12 Å cutoff); performance relative to the Cray T3E/1200E vs. number of CPUs]
9
DL_POLY3 Macromolecular Simulations
[Figures: Gramicidin in water, rigid bonds with SHAKE, 792,960 ions, 50 time steps; measured time (seconds) and performance relative to the SGI Origin 3800/R14k-500 vs. number of CPUs]
10
Materials Science
  • CRYSTAL
  • calculate wave-functions and properties of
    crystalline systems
  • periodic Hartree-Fock or density functional
    Kohn-Sham Hamiltonian
  • various hybrid approximations
  • http://www.cse.clrc.ac.uk/cmg/CRYSTAL/

11
CRYSTAL
  • Electronic structure and related properties of
    periodic systems
  • All electron, local Gaussian basis set, DFT and
    Hartree-Fock
  • Under continuous development since 1974
  • Distributed to over 500 sites worldwide
  • Developed jointly by Daresbury and the University
    of Turin

12
CRYSTAL Functionality
  • Basis Set
  • LCAO - Gaussians
  • All electron or pseudopotential
  • Hamiltonian
  • Hartree-Fock (UHF, RHF)
  • DFT (LSDA, GGA)
  • Hybrid funcs (B3LYP)
  • Techniques
  • Replicated data parallel
  • Distributed data parallel
  • Forces
  • Structural optimization
  • Direct SCF
  • Visualisation
  • AVS GUI (DLV)

  • Properties
  • Energy, Structure
  • Vibrations (phonons), Elastic tensor
  • Ferroelectric polarisation, Piezoelectric constants
  • X-ray structure factors
  • Density of States / Bands
  • Charge/Spin Densities, Magnetic Coupling
  • Electrostatics (V, E, EFG, classical)
  • Fermi contact (NMR)
  • EMD (Compton, e-2e)
13
Benchmark Runs on Crambin
  • Very small protein from Crambe abyssinica: 1284 atoms per unit cell
  • Initial studies using STO-3G (3,948 basis functions)
  • Improved to 6-31G (12,354 functions)
  • All calculations Hartree-Fock
  • As far as we know, the largest Hartree-Fock calculation ever converged

14
Scalability of CRYSTAL for crystalline Crambin
[Figure: scaling on HPCx vs. the SGI Origin]
  • A faster, more stable version of the parallel Jacobi diagonalizer replaces ScaLAPACK
  • Increasing the basis set size increases the scalability
15
Crambin Results Electrostatic Potential
  • Charge density isosurface coloured according to
    potential
  • Useful to determine possible chemically active
    groups

16
Futures - Rusticyanin
  • Rusticyanin (Thiobacillus ferrooxidans) has 6284 atoms (Crambin was 1284) and is involved in redox processes
  • We have just started calculations using over 33,000 basis functions
  • In collaboration with S. Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants

17
Materials Science
  • CASTEP
  • CAmbridge Serial Total Energy Package
  • http://www.cse.clrc.ac.uk/cmg/NETWORKS/UKCP/

18
What is CASTEP?
  • First principles (DFT) materials simulation code
  • electronic energy
  • geometry optimization
  • surface interactions
  • vibrational spectra
  • materials under pressure, chemical reactions
  • molecular dynamics
  • Method (direct minimization)
  • plane wave expansion of valence electrons
  • pseudopotentials for core electrons

19
CASTEP 2003: HPCx performance gain
  • Bottleneck: data traffic in the 3D FFT and MPI_AlltoAllV (sketched below)
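The transpose step of a parallel 3D FFT is an all-to-all personalised exchange, which is why MPI_Alltoallv dominates the data traffic. The sketch below shows the shape of that call; the uniform block layout is a simplifying assumption, since real decompositions (including, presumably, CASTEP's) are in general ragged, which is what motivates the vector variant.

```c
/* Sketch of the MPI_Alltoallv exchange at the heart of a parallel
 * 3D FFT transpose. Block sizes are uniform here for brevity; a
 * real FFT decomposition will in general be ragged, which is
 * exactly why the vector variant is used. */
#include <mpi.h>
#include <stdlib.h>

void fft_transpose(double *sendbuf, double *recvbuf,
                   int block, MPI_Comm comm)
{
    int p, i;
    MPI_Comm_size(comm, &p);

    int *counts = malloc(p * sizeof(int));
    int *displs = malloc(p * sizeof(int));
    for (i = 0; i < p; i++) {
        counts[i] = block;       /* elements for/from rank i   */
        displs[i] = i * block;   /* offset of rank i's section */
    }

    /* Every rank sends a distinct section to every other rank:
     * the data-traffic hot spot identified on this slide. */
    MPI_Alltoallv(sendbuf, counts, displs, MPI_DOUBLE,
                  recvbuf, counts, displs, MPI_DOUBLE, comm);

    free(counts);
    free(displs);
}
```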

20
CASTEP 2003: HPCx performance gain
21
Molecular Simulation
  • AMBER
  • (Assisted Model Building with Energy Refinement)
  • Weiner and Kollman, University of California,
    1981
  • Widely used suite of programs particularly for
    biomolecules
  • http://amber.scripps.edu/

22
AMBER - Initial Scaling
  • Factor IX protein with Ca ions, 90,906 atoms

23
Current developments - AMBER
  • Bob Duke
  • Developed a new version of Sander on HPCx
  • Originally called AMD (Amber Molecular Dynamics)
  • Renamed PMEMD (Particle Mesh Ewald Molecular
    Dynamics)
  • Substantial rewrite of the code
  • Converted to Fortran90 and removed multiple copies of routines
  • Likely to be incorporated into AMBER8
  • We are looking at optimising the collective communications, i.e. the reduction/scatter (see the sketch below)
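The reduction/scatter pattern referred to above commonly arises in MD codes when every rank holds partial forces for all atoms but needs the summed forces only for the atoms it owns. The sketch below shows how MPI_Reduce_scatter expresses this in one call; it is a hedged illustration of the general pattern, not PMEMD source, and the names are hypothetical.

```c
/* Hedged sketch of the reduce/scatter pattern mentioned above, as
 * it commonly appears in MD codes: every rank holds partial forces
 * for all atoms; each rank needs summed forces only for its own
 * atoms. Not taken from the PMEMD source. */
#include <mpi.h>

void sum_forces(double *partial,  /* 3*natoms partial forces      */
                double *mine,     /* 3*natoms_local summed forces */
                int *counts,      /* 3*(atoms owned) per rank     */
                MPI_Comm comm)
{
    /* One call replaces an allreduce over the full force array:
     * each rank receives only the counts[rank] elements it owns. */
    MPI_Reduce_scatter(partial, mine, counts, MPI_DOUBLE,
                       MPI_SUM, comm);
}
```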

24
Optimisation of PMEMD
25
Atomic and Molecular Physics
  • PFARM
  • Queen's University Belfast, CLRC Daresbury Laboratory
    Laboratory
  • R-matrix formalism applied to problems such as the description of the edge region in tokamak plasmas (fusion power research) and the interpretation of astrophysical spectra

26
External Region Calculation Timings
[Figures: elapsed time (seconds) and PFARM performance ratio vs. the Cray T3E/1200E, against number of CPUs; bottleneck: matrix diagonalisation]
27
PeIGS vs. ScaLAPACK in PFARM
Bottleneck: matrix diagonalisation (a serial analogue is sketched below)
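For reference, the bottleneck is a dense real symmetric eigenproblem, H x = λ x. The sketch below shows its serial LAPACK analogue; PeIGS and ScaLAPACK solve the same problem with the matrix distributed across processors. The routine and interface are the standard LAPACKE ones, not PFARM code.

```c
/* Serial analogue of the bottleneck: the dense symmetric
 * eigenproblem H x = E x that PeIGS and ScaLAPACK solve in
 * distributed form on HPCx. Illustrative sketch using the
 * standard LAPACKE interface, not PFARM code. */
#include <lapacke.h>

/* On exit h holds the eigenvectors, e the eigenvalues. */
int diagonalise(int n, double *h, double *e)
{
    return LAPACKE_dsyev(LAPACK_ROW_MAJOR,
                         'V' /* eigenvectors too      */,
                         'U' /* upper triangle stored */,
                         n, h, n /* lda */, e);
}
```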
28
ScaLAPACK diagonalisation on HPCx
29
Stage 1 (Sector Diags) on HPCx
  • Sector Hamiltonian matrix size 10,032 (× 3 sectors)

30
Computational Engineering
  • UK Turbulence Consortium
  • Led by Prof. Neil Sandham, University of
    Southampton
  • Focus on compute-intensive methods (Direct Numerical Simulation, Large Eddy Simulation, etc.) for the simulation of turbulent flows
  • Shock/boundary-layer interaction modelling: critical for accurate aerodynamic design but still poorly understood
  • http://www.afm.ses.soton.ac.uk/

31
Direct Numerical Simulation 360³ benchmark
32
Environmental Science
  • Proudman Oceanographic Laboratory Coastal Ocean
    Modelling System (POLCOMS)
  • Coupled marine ecosystem modelling
  • http://www.pol.ac.uk/home/research/polcoms/

33
Coupled Marine Ecosystem Model
34
POLCOMS resolution benchmark on HPCx
35
POLCOMS 2 km benchmark on all systems
36
  • Efficiency of Codes

37
Motivation and Strategy
  • Scalability of Terascale applications is only
    half the story
  • Absolute performance also depends on
  • single cpu performance
  • Percentage of peak is seen as an
  • important measure
  • Comparison with other systems e.g. vector
    machines
  • Run representative test cases on small numbers of
    processors for applications and some important
    kernels
  • Use IBMs hpmlib to measure Mflop/s
  • Other hpmlib counters can help to understand
    performance
  • e.g. memory bandwidth, cache miss rates, FMA
    count, computational intensity etc.

Scientific output is the key measure
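The sketch below shows how a code section is typically instrumented with IBM's hpmlib to obtain Mflop/s and the other counters mentioned above. The call names follow the IBM HPM Toolkit C interface of the period, but the exact header and signatures vary between toolkit versions, so treat the details as assumptions.

```c
/* Hedged sketch of instrumenting a kernel with IBM's HPM library
 * to obtain Mflop/s and cache counters. Call names follow the HPM
 * Toolkit C interface; exact header and signatures may vary
 * between versions (assumption). */
#include <libhpm.h>   /* assumed header name */

extern void compute_kernel(void);   /* hypothetical kernel */

int main(void)
{
    hpmInit(0, "kernel_bench");      /* task id, program name */

    hpmStart(1, "main_loop");        /* begin counted section */
    compute_kernel();
    hpmStop(1);                      /* end counted section   */

    hpmTerminate(0);                 /* write counter report  */
    return 0;
}
```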
38
Matrix-matrix multiply kernel
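For orientation, a naive kernel of the kind such a benchmark measures is sketched below; the tuned comparison point on the p690 would presumably be ESSL's DGEMM. The multiply performs 2n³ flops, so Mflop/s = 2×10⁻⁶ n³ / time.

```c
/* A minimal triple-loop matrix-matrix multiply of the kind such a
 * kernel benchmark measures (the tuned comparison point would be a
 * vendor BLAS DGEMM). 2*n^3 flops, so Mflop/s = 2e-6*n^3 / time. */
void matmul(int n, const double *a, const double *b, double *c)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += a[i*n + k] * b[k*n + j];  /* row-major */
            c[i*n + j] = s;
        }
}
```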
39
PCHAN small test case 120³
40
Summary of percentage of peak
41
Acknowledgements
  • HPCx Terascaling Team
  • Mike Ashworth, Mark Bull, Ian Bush, Martyn Guest, Joachim Hein, David Henty, Adrian Jackson, Chris Johnson, Martin Plummer, Gavin Pringle, Lorna Smith, Kevin Stratford, Andrew Sunderland
  • IBM Technical Support
  • Luigi Brochard et al.
  • CSAR Computing Service: Cray T3E (turing), Origin 3800/R12k-400 (green)
  • ORNL: IBM Regatta (cheetah)
  • SARA: Origin 3800/R14k-500
  • PSC: AlphaServer SC ES45-1000

42
The Reality of Capability Computing on HPCx
  • The success of the Terascaling strategy is shown by the Nov 2003 HPCx usage
  • Capability jobs (≥512 procs) account for 48% of usage
  • Even without TeraGyroid it is 40.7%

43
Summary
  • HPCx Terascaling team is addressing scalability
    for a wide range of codes
  • Key Strategic Applications Areas
  • Atomic and Molecular Physics, Molecular
    Simulation, Materials Science, Computational
    Engineering, Environmental Science
  • Reflected by the take-up of Capability Computing on HPCx
  • In Nov 03, >40% of time was used by jobs with 512 procs and greater
  • Key challenges
  • Maintain progress with Terascaling
  • Include new applications and new science areas
  • Address efficiency issues, especially single-processor performance
  • Fully exploit the phase 2 system: 1.7 GHz p690, 32-processor partitions, Federation interconnect