Title: TeraGyroid
1. TeraGyroid - HPC Applications ready for UKLight
Stephen Pickles <stephen.pickles@man.ac.uk>
http://www.realitygrid.org
http://www.realitygrid.org/TeraGyroid.html
UKLight Town Meeting, NeSC, Edinburgh, 9/9/2004
2. The TeraGyroid Project
- Funded by EPSRC (UK) and NSF (USA) to join the UK e-Science Grid and the US TeraGrid
- application from RealityGrid, a UK e-Science Pilot Project
- 3-month project including work exhibited at SC03 and SC Global, Nov 2003
- thumbs up from TeraGrid mid-September, funding from EPSRC approved later
- Main objective was to deliver high-impact science that would not be possible without the combined resources of the US and UK grids
- Study of defect dynamics in liquid crystalline surfactant systems using lattice-Boltzmann methods
- featured the world's largest lattice-Boltzmann simulation
- a 1024³-cell simulation of the gyroid phase demands terascale computing - hence TeraGyroid
3. Networking
Architecture diagram: HPC engines, a visualization engine and storage, with flows of checkpoint files, steering control and status, visualization data, and compressed video between them.
4. LB3D: 3-dimensional lattice-Boltzmann simulations
- LB3D code is written in Fortran90 and parallelized using MPI
- Scales linearly on all available resources (Lemieux, HPCx, CSAR, Linux/Itanium II clusters)
- Data produced during a single run can range from hundreds of gigabytes to terabytes (a rough estimate follows below)
- Simulations require supercomputers
- High-end visualization hardware (e.g. SGI Onyx, dedicated viz clusters) and parallel rendering software (e.g. VTK) needed for data analysis
Figure: 3D datasets showing snapshots from a simulation of spinodal decomposition. A binary mixture of water and oil phase separates. Blue areas denote high water densities and red visualizes the interface between both fluids.
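To make these volumes concrete, here is a back-of-envelope sketch; the D3Q19 velocity set, three fluid components and field counts are assumptions for illustration, not figures taken from the LB3D source:

```python
# Rough data-volume estimate for a large lattice-Boltzmann run.
# Assumed layout: D3Q19 velocity set, three fluid components
# (water, oil, surfactant), double-precision (8-byte) reals.

cells = 1024 ** 3                 # lattice sites in a 1024^3 simulation
bytes_per_real = 8                # double precision

# A full checkpoint must store every distribution function:
q = 19                            # velocities per component (D3Q19)
components = 3
checkpoint_bytes = cells * q * components * bytes_per_real
print(f"checkpoint   ~ {checkpoint_bytes / 1e12:.2f} TB")    # ~0.49 TB

# A visualization dump keeps only a few scalar fields
# (e.g. component densities plus an order parameter):
fields_per_dump = 4
dump_bytes = cells * fields_per_dump * bytes_per_real
dumps_per_run = 100
print(f"single dump  ~ {dump_bytes / 1e9:.1f} GB")           # ~34 GB
print(f"whole run    ~ {dumps_per_run * dump_bytes / 1e12:.2f} TB")
```

Under these assumptions the full checkpoint comes out at roughly half a terabyte, consistent with the figure quoted later for the 1024³ Lemieux runs.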
5. Computational Steering of Lattice-Boltzmann Simulations
- LB3D instrumented for steering using the RealityGrid steering library (see the control-loop sketch below).
- Malleable checkpoint/restart functionality allows rewinding of simulations and run-time job migration across architectures.
- Steering reduces storage requirements because the user can adapt data-dumping frequencies.
- CPU time is saved because users do not have to wait for jobs to finish if they can already see that nothing relevant is happening.
- Instead of task farming, parameter searches are accelerated by steering through parameter space.
- Analysis time is significantly reduced because less irrelevant data is produced.
Applied to the study of the gyroid mesophase of amphiphilic liquid crystals at unprecedented space and time scales.
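The control loop below illustrates the kind of instrumentation involved; the method names are hypothetical stand-ins, not the real RealityGrid steering library API:

```python
# Sketch of a steered simulation main loop. poll_steering_commands() and
# emit_status() stand in for the real steering-library calls.

def run_steered(sim, steerer, max_steps=100_000):
    output_freq = 1000        # steerable: how often to dump visualization data
    checkpoint_freq = 10_000  # steerable: how often to write a checkpoint

    for step in range(max_steps):
        sim.advance_one_timestep()

        # Act on any commands issued from a steering client.
        for cmd in steerer.poll_steering_commands():
            if cmd.name == "set_output_freq":
                output_freq = cmd.value    # fewer dumps -> less storage
            elif cmd.name == "checkpoint":
                sim.write_checkpoint()     # enables rewind and job migration
            elif cmd.name == "stop":
                return                     # nothing relevant happening: save CPU time

        if step % output_freq == 0:
            sim.dump_fields()              # feeds the on-line visualization
        if step % checkpoint_freq == 0:
            sim.write_checkpoint()

        # Report progress so clients can decide what to steer next.
        steerer.emit_status(step=step, order_parameter=sim.order_parameter())
```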
6. Parameter space exploration
Figure panel captions:
- Cubic micellar phase, high surfactant density gradient.
- Cubic micellar phase, low surfactant density gradient.
- Initial condition: random water/surfactant mixture.
- Self-assembly starts.
- Lamellar phase: surfactant bilayers between water layers.
- Rewind and restart from checkpoint.
7. Strategy
- Aim: use federated resources of the US TeraGrid and UK e-Science Grid to accelerate the scientific process
- Rapidly map out parameter space using a large number of independent small (128³) simulations
- use job cloning and migration to exploit available resources and save equilibration time
- monitor their behaviour using on-line visualization
- Hence identify parameters for high-resolution simulations on HPCx and Lemieux
- 1024³ on Lemieux (PSC) takes 0.5 TB to checkpoint!
- create initial conditions by stacking smaller simulations with periodic boundary conditions (see the sketch below)
- Selected 128³ simulations were used for long-time studies
- All simulations monitored and steered by a geographically distributed team of computational scientists
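Because the small runs use periodic boundary conditions, an equilibrated 128³ configuration can simply be tiled 8 x 8 x 8 times to seed a 1024³ run with no discontinuities at the seams. A minimal sketch, in which the array layout is an assumption for illustration:

```python
import numpy as np

def stack_initial_condition(small, copies=8):
    """Tile an equilibrated periodic configuration to seed a larger run.

    small  : array of shape (nx, ny, nz, nfields) from a small simulation
    copies : replicas along each axis (8 turns 128^3 into 1024^3)
    """
    # Periodicity guarantees the fields are continuous across every seam,
    # so plain tiling gives a valid starting state for the large run.
    return np.tile(small, (copies, copies, copies, 1))

# Toy example: a 4^3 configuration with two fields stands in for 128^3 data.
small = np.random.rand(4, 4, 4, 2)
large = stack_initial_condition(small, copies=8)
print(large.shape)   # (32, 32, 32, 2)
```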
8. The Architecture of Steering
- OGSI middle tier
- multiple clients: Qt/C++, .NET on PocketPC, GridSphere portlet (Java)
- remote visualization through SGI VizServer, Chromium, and/or streamed to Access Grid
- Computations run at HPCx, CSAR, SDSC, PSC and NCSA
- Visualizations run at Manchester, UCL, Argonne, NCSA, Phoenix
- Scientists at 4 sites steer calculations, collaborating via Access Grid
- Visualizations viewed remotely
- Grid services run anywhere (a minimal sketch of this client / middle-tier / simulation pattern follows below)
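The sketch below shows only the shape of that pattern in plain Python; the class and method names, and the job identifier, are invented for illustration and do not reflect the project's OGSI service interfaces:

```python
# Illustrative three-tier steering pattern: client -> middle tier -> simulation.

class Simulation:
    """Stands in for a running LB3D job instrumented for steering."""
    def __init__(self, name):
        self.name, self.params = name, {"output_freq": 1000}
    def apply(self, key, value):
        self.params[key] = value
        print(f"[{self.name}] {key} set to {value}")

class MiddleTier:
    """Stands in for the per-job Grid service that clients attach to."""
    def __init__(self):
        self.registry = {}                  # job id -> Simulation
    def register(self, job_id, sim):
        self.registry[job_id] = sim
    def steer(self, job_id, key, value):    # callable from any client flavour
        self.registry[job_id].apply(key, value)

# Any of the client flavours (Qt/C++, .NET, portlet) plays the same role:
tier = MiddleTier()
tier.register("lb3d-hpcx-042", Simulation("LB3D on HPCx"))
tier.steer("lb3d-hpcx-042", "output_freq", 250)
```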
9. SC Global 03 Demonstration
10. TeraGyroid Testbed
Network diagram of the testbed: US sites (ANL, NCSA, SDSC, PSC, Caltech, Phoenix) and UK sites (Manchester, Daresbury, UCL), connected through Starlight (Chicago) and Netherlight (Amsterdam); link capacities shown include 10 Gbps and BT-provisioned 2 x 1 Gbps circuits; UK traffic crosses the MB-NG research network and the SJ4 production network. Legend: visualization, computation, Access Grid node, service registry, network PoP, dual-homed system.
11. Trans-Atlantic Network
- Collaborators:
- Manchester Computing
- Daresbury Laboratory Networking Group
- MB-NG and UKERNA
- UCL Computing Service
- BT
- SURFnet (NL)
- Starlight (US)
- Internet2 (US)
12. TeraGyroid Hardware Infrastructure
- Computation (using more than 6000 processors), including:
  - HPCx (Daresbury), 1280 procs, IBM Power4 Regatta, 6.6 Tflops peak, 1.024 TB memory
  - Lemieux (PSC), 3000 procs, HP/Compaq, 3 TB memory, 6 Tflops peak
  - TeraGrid Itanium2 cluster (NCSA), 256 procs, 1.3 Tflops peak
  - TeraGrid Itanium2 cluster (SDSC), 256 procs, 1.3 Tflops peak
  - Green (CSAR), SGI Origin 3800, 512 procs, 0.512 TB memory (shared)
  - Newton (CSAR), SGI Altix 3700, 256 Itanium 2 procs, 384 GB memory (shared)
- Visualization
  - Bezier (Manchester), SGI Onyx 300, 6 x IR3, 32 procs
  - Dirac (UCL), SGI Onyx 2, 2 x IR3, 16 procs
  - SGI loan machine (Phoenix), SGI Onyx, 1 x IR4, 1 x IR3, commissioned on site
  - TeraGrid visualization cluster (ANL), Intel Xeon
  - SGI Onyx (NCSA)
- Service Registry
  - Frik (Manchester), Sony PlayStation 2
- Storage
  - 20 TB of science data generated in project
  - 2 TB moved to long-term storage for on-going analysis: Atlas Petabyte Storage System (RAL)
- Access Grid nodes at Boston University, UCL, Manchester, Martlesham, Phoenix (4)
13. Network lessons
- Less than three weeks to debug networks
- applications people and network people nodded wisely but didn't understand each other
- middleware such as GridFTP is infrastructure to applications folk, but an application to network folk
- rapprochement necessary for success
- Grid middleware not designed with dual-homed systems in mind
- HPCx, CSAR (Green) and Bezier are busy production systems - had to be dual-homed on SJ4 and MB-NG
- great care needed with routing
- complication: we needed to drive everything from laptops that couldn't see the MB-NG network
- Many other problems encountered
- but nothing that can't be fixed once and for all, given persistent infrastructure
14. Measured Transatlantic Bandwidths during SC03
15. TeraGyroid Summary
- Real computational science...
- Gyroid mesophase of amphiphilic liquid crystals
- Unprecedented space and time scales
- investigating phenomena previously out of reach
- ...on real Grids...
- enabled by high-bandwidth networks
- ...to reduce time to insight
Figure labels: Dislocations; Interfacial Surfactant Density
16. TeraGyroid Collaborating Organisations
- Our thanks to hundreds of individuals at...
- Argonne National Laboratory (ANL)
- Boston University
- BT
- BT Exact
- Caltech
- CSC
- Computing Services for Academic Research (CSAR)
- CCLRC Daresbury Laboratory
- Department of Trade and Industry (DTI)
- Edinburgh Parallel Computing Centre
- Engineering and Physical Sciences Research Council (EPSRC)
- Forschungszentrum Jülich
- HLRS (Stuttgart)
- HPCx
- IBM
- Imperial College London
- National Center for Supercomputing Applications (NCSA)
17. The TeraGyroid Experiment
- S. M. Pickles (1), R. J. Blake (2), B. M. Boghosian (3), J. M. Brooke (1), J. Chin (4), P. E. L. Clarke (5), P. V. Coveney (4), N. González-Segredo (4), R. Haines (1), J. Harting (4), M. Harvey (4), M. A. S. Jones (1), M. Mc Keown (1), R. L. Pinning (1), A. R. Porter (1), K. Roy (1), and M. Riding (1)
- (1) Manchester Computing, University of Manchester
- (2) CCLRC Daresbury Laboratory, Daresbury
- (3) Tufts University, Massachusetts
- (4) Centre for Computational Science, University College London
- (5) Department of Physics & Astronomy, University College London
http://www.realitygrid.org
http://www.realitygrid.org/TeraGyroid.html
18. New Application at AHM 2004
Exact calculation of peptide-protein binding energies by steered thermodynamic integration using high-performance computing grids.
- Philip Fowler, Peter Coveney, Shantenu Jha and Shunzhou Wan
- UK e-Science All Hands Meeting
- 31 August - 3 September 2004
19. Why are we studying this system?
- Measuring binding energies is vital for, e.g., designing new drugs.
- Calculating a peptide-protein binding energy can take weeks to months.
- We have developed a grid-based method to accelerate this process.
Goal: to compute ΔΔG_bind during the AHM 2004 conference, i.e. in less than 48 hours, using the federated resources of the UK National Grid Service and the US TeraGrid.
20. Thermodynamic Integration on Computational Grids
Workflow diagram (axes: λ against time): use steering to launch, spawn and terminate λ-jobs. From a starting conformation, seed successive simulations at λ = 0.1, 0.2, 0.3, ..., 0.9 (10 sims, each 2 ns), run each independent job on the Grid, check for convergence, then combine and calculate the integral (a sketch of this final step follows below).
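Thermodynamic integration combines the per-λ ensemble averages of ∂H/∂λ by numerical quadrature, ΔG = ∫₀¹ ⟨∂H/∂λ⟩_λ dλ, and ΔΔG_bind follows from the difference between two such legs (peptide bound to the protein versus free in solution). A minimal sketch of the "combine and calculate the integral" step, with sample values invented for illustration:

```python
import numpy as np

# Per-lambda ensemble averages of dH/dlambda from the independent lambda-jobs.
# These numbers are invented; real values come from the 2 ns simulations.
lam   = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
dh_dl = np.array([45.1, 38.7, 30.2, 22.9, 15.4, 9.8, 4.1, -1.2, -6.0])

# Trapezoidal quadrature over the sampled window of lambda; the endpoints
# (lambda = 0 and 1) need separate treatment, as noted in the results slide.
delta_G = float(np.sum(0.5 * (dh_dl[1:] + dh_dl[:-1]) * np.diff(lam)))
print(f"dG over sampled window = {delta_G:.1f} kcal/mol")

# ddG_bind = dG(bound leg) - dG(free leg), each leg computed the same way.
```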
21. (Architecture diagram)
Diagram labels: checkpointing; steering and control; monitoring.
22. We successfully ran many simulations
- This is the first time we have completed an entire calculation.
- Insight gained will help us improve the throughput.
- The simulations were started at 5pm on Tuesday and the data was collated at 10am on Thursday.
- 26 simulations were run.
- At 4.30pm on Wednesday, we had nine simulations in progress (140 processors): 1x TG-SDSC, 3x TG-NCSA, 3x NGS-Oxford, 1x NGS-Leeds, 1x NGS-RAL
- We simulated over 6.8 ns of classical molecular dynamics in this time.
23. Very preliminary results
ΔΔG (kcal/mol): Experiment: -1.0 ± 0.3; quick-and-dirty analysis: -9 to -12 (as at 41 hours)
We expect our value to improve with further analysis around the endpoints.
24. Conclusions
- We can harness today's grids to accelerate high-end computational science
- On-line visualization and job migration require high-bandwidth networks
- Need persistent network infrastructure
- else set-up costs are too high
- QoS: would like the ability to reserve bandwidth
- and processors, graphics pipes, AG rooms, virtual venues, node ops... (but that's another story)
- Hence our interest in UKLight