Title: eScience: Why it matters, and how to build applications for the Grid
1 e-Science: Why it matters, and how to build applications for the Grid
- David Abramson
- Faculty of Information Technology
- Monash University
2Overview
- New Methods in research
- e-Science and e-Research
- Computational Platforms
- The Grid and the Web
- Supporting a Software Lifecycle
- The role of Grid Services Middleware
- Software Lifecycle Tools
- Applications development
- Deployment
- Test and debugging
- Execution
- Examples from Monash Tools
- The Nimrod Family
- Applications
- Deployment tools
- Active Data
- More
3New Methods in Research
4e-Science
- Pre-Internet
- Theorize and/or experiment, alone or in small teams; publish paper
- Post-Internet
- Construct and mine large databases of observational or simulation data
- Develop simulations and analyses
- Access specialized devices remotely
- Exchange information within distributed multidisciplinary teams
"Grids are not just communities of computers, but communities of researchers, of people." Peter Arzberger, UCSD
Source: Ian Foster
6Typical e-Science Applications
- Characteristics
- High Performance Computation
- Distributed infrastructure
- Instruments are first class resources
- Lots of data
- Not just bigger: fundamentally different
- Some examples
- In-silico biology (See MyGrid)
- Earthquake simulation
- Virtual observatory
- High energy physics
- Medical applications
- Environmental applications.
7Computational Platforms
8The Grid
- Infrastructure (middleware services) for establishing, managing, and evolving multi-organizational federations
- Dynamic, autonomous, domain independent
- On-demand, ubiquitous access to computing, data, and services
- Mechanisms for creating and managing workflow within such federations
- New capabilities constructed dynamically and transparently from distributed services
- Service-oriented, virtualization
Source: Ian Foster
9 The (Power) Grid: On-Demand Access to Electricity
Quality, economies of scale
Time
Source: Ian Foster
10By analogy, some challenges
Voltage: 110, 220, 240 V. Frequency: 50, 60 Hz.
11Grid and Web Services Convergence
- The definition of WSRF means that the Grid and
Web services communities can move forward on a
common base.
Source: Globus Alliance
12Supporting the Software Lifecycle
13Why is this challenging?
Write software for local workstation
14Why is this challenging?
Build heterogeneous testbed
15Why is this challenging?
Deploy Software
16Why is this challenging?
?
?
?
?
Test Software
17Why is this challenging?
Build, schedule and execute virtual application
18Why is this challenging?
Interpret results
19 But this is what I do well!
20Can we support this process better?
21Grid Services Middleware
22Building Software for the Grid
Courtesy IBM
Platform Infrastructure
Unix
Windows
JVM
TCP/IP
MPI
.Net Runtime
VPN
SSH
23Building Software for the Grid
Upper Middleware Tools
Lower Middleware
Courtesy IBM,
Bonds
Platform Infrastructure
Unix
Windows
JVM
TCP/IP
MPI
.Net Runtime
Web Services
VPN
SSH
24Building Software for the Grid
Lower Middleware
Globus GT4
SRB
Platform Infrastructure
Unix
Windows
JVM
TCP/IP
MPI
.Net Runtime
Web Services
VPN
SSH
25Building Software for the Grid
Semantic Gap
Lower Middleware
Globus GT4
SRB
Platform Infrastructure
Unix
Windows
JVM
TCP/IP
MPI
.Net Runtime
Web Services
VPN
SSH
26Why is there a semantic gap?
def build_rsl_file(executable, args, stagein, stageout, cleanup):
    tocleanup = []
    stderr = t5temp.mktempfile()
    stdout = t5temp.mktempfile()
    rstderr = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stderr)
    rstdout = '${GLOBUS_USER_HOME}/.nimrod/' + os.path.basename(stdout)
    rslfile = t5temp.mktempfile()
    f = open(rslfile, 'w')
    f.write("<job>\n  <executable>%s</executable>\n" % executable)
    for arg in args:
        f.write("  <argument>%s</argument>\n" % str(arg))
    f.write("  <stdout>%s</stdout>\n" % rstdout)
    f.write("  <stderr>%s</stderr>\n" % rstderr)
    # User defined stage-in section
    if stagein:
        f.write("  <fileStageIn>")
        for src, dest, leave in stagein:
            if not leave:
                tocleanup.append(dest)
            f.write("""
    <transfer>
      <sourceUrl>gsiftp://%s%s</sourceUrl>
      <destinationUrl>file:///${GLOBUS_USER_HOME}/.nimrod/%s</destinationUrl>
    </transfer>""" % (hostname, src, dest))
        f.write("\n\t</fileStageIn>\n")
    f.write("  <fileStageOut>")
    # User defined stage-out files section
27Software Layers
Upper Middleware /Tools
Lower Middleware
SRB
Globus GT4
Platform Infrastructure
Unix
Windows
JVM
TCP/IP
MPI
.Net Runtime
Web Services
VPN
SSH
28Software Layers
Upper Middleware /Tools
Lower Middleware
Globus GT4
SRB
Platform Infrastructure
Unix
Windows
JVM
TCP/IP
MPI
.Net Runtime
VPN
SSH
29Applications Development
Upper Middleware /Tools
Lower Middleware
Globus GT4
SRB
30Applications Development on the Grid
- New Applications
- Code to middleware standards
- Significant effort
- Exciting new distributed application
- Numerous programming techniques
- Legacy Applications
- Were built before the Grid
- They are fragile
- File based IO
- May be sequential
- Leverage old codes to produce new virtual application
- Amenable to Grid Workflows
31Approaches to Grid programming
- General Purpose Workflows
- Generic solution
- Workflow editor
- Scheduler
- Special purpose workflows
- Solve one class of problem
- Specification language
- Scheduler
32 eNabling Science and Engineering with Nimrod
33High throughput computing
- Ad-hoc supercomputing
- Study or search the behaviour of some of the output variables against a range of different input scenarios.
- Design optimization
- Allows robust analysis
- More realistic simulations
- Computations are loosely coupled (file transfer)
- Very wide range of applications
34Nimrod ...
- Supports workflows for robust design and search
- Vary parameters
- Execute programs
- Copy data in and out
- Sequential and parallel dependencies
- Computational economy drives scheduling
- Computation scheduled near data when appropriate
- Use distributed high performance platforms
- Upper middleware broker for resource discovery
- Wide Community adoption
Nimrod Roadmap (timeline figure spanning 1994 to 2006). Family members shown: Nimrod, Nimrod/G, EnFuzion (www.axceleon.com), Nimrod/O, Active Sheets (Excel), Nimrod/OI, Nimrod/WS, Nimrod/K
35The Nimrod family
Plan File
    parameter pressure float range from 5000 to 6000 points 4
    parameter concent float range from 0.002 to 0.005 points 2
    parameter material text select anyof Fe Al
    task main
        copy compModel node:compModel
        copy inputFile.skel node:inputFile.skel
        node:substitute inputFile.skel inputFile
        node:execute ./compModel < inputFile > results
        copy node:results results.$jobname
    endtask
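The plan file above declares three parameters, and each combination of values becomes one job. A minimal sketch of that expansion follows, assuming "points N" means N evenly spaced values including both ends of the range. The helper names `linspace` and `expand_jobs` are illustrative and are not part of Nimrod.

```python
from itertools import product


def linspace(start, stop, points):
    """Return `points` evenly spaced values from start to stop inclusive."""
    if points == 1:
        return [start]
    step = (stop - start) / (points - 1)
    return [start + i * step for i in range(points)]


# Parameter domains copied from the plan file; the even spacing is an assumption.
parameters = {
    "pressure": linspace(5000.0, 6000.0, 4),
    "concent": linspace(0.002, 0.005, 2),
    "material": ["Fe", "Al"],
}


def expand_jobs(params):
    """Build one dict of parameter values per job (the cross product of all domains)."""
    names = list(params)
    domains = [params[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*domains)]


jobs = expand_jobs(parameters)
print(len(jobs), "jobs; first:", jobs[0])
```

With 4 x 2 x 2 values the sweep yields 16 independent jobs, which is what makes the workload easy to spread over many machines.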
36Nimrod scales from local to remote resources
37 Nimrod's Scheduler
Soft real-time scheduling problem
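To make the deadline-driven flavour of this problem concrete, here is a small sketch of a greedy allocator that fills the cheapest resources first, each up to the number of jobs it can finish before the deadline. It is only an illustration under stated assumptions: the `Resource` fields, the prices and the greedy rule are hypothetical and do not describe the scheduler Nimrod actually uses.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cost_per_job: float      # hypothetical price, arbitrary units
    minutes_per_job: float   # hypothetical run time of one job on this resource


def schedule(n_jobs, resources, deadline_minutes):
    """Assign jobs to the cheapest resources that can still meet the deadline."""
    plan = {}
    remaining = n_jobs
    for res in sorted(resources, key=lambda r: r.cost_per_job):
        # Jobs this resource can run back to back before the deadline.
        capacity = int(deadline_minutes // res.minutes_per_job)
        take = min(capacity, remaining)
        if take > 0:
            plan[res.name] = take
            remaining -= take
        if remaining == 0:
            break
    if remaining > 0:
        raise ValueError("deadline cannot be met with the available resources")
    return plan


resources = [
    Resource("cheap-cluster", cost_per_job=1.0, minutes_per_job=30),
    Resource("fast-grid-node", cost_per_job=4.0, minutes_per_job=10),
]
tight = schedule(10, resources, deadline_minutes=120)
relaxed = schedule(10, resources, deadline_minutes=300)
print(tight, relaxed)
```

A tight deadline forces some jobs onto the expensive resource, while a relaxed one lets everything run on the cheap one. That trade-off between time and cost is the point of the sketch.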
38From drug to aircraft to antenna design
Aerofoil Design
Antenna Design
Drug Docking
39Nimrod Development Cycle
Sent to available machines
Prepare jobs using Portal
Results displayed and interpreted
Jobs scheduled and executed dynamically
40Optimization using Nimrod/O
- Nimrod/G allows exploration of design scenarios
- Search by enumeration
- Search for local/global minima based on objective function
- How do I minimise the cost of this design?
- How do I maximize the life of this object?
- Objective function evaluated by computational model
- Computationally expensive
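The difference between searching by enumeration and searching on an objective function can be sketched by counting model evaluations. The example below uses a hypothetical one-dimensional quadratic as a stand-in for an expensive model, and golden-section search as a simple stand-in optimiser. It is not one of the algorithms the tool itself provides.

```python
def objective(x):
    """Stand-in for an expensive computational model (hypothetical quadratic cost)."""
    return (x - 3.2) ** 2 + 1.0


def enumerate_search(lo, hi, points):
    """Evaluate every grid point; the cost is `points` model runs."""
    step = (hi - lo) / (points - 1)
    candidates = [lo + i * step for i in range(points)]
    best = min(candidates, key=objective)
    return best, points


def golden_section(lo, hi, tol=1e-3):
    """Golden-section search for a unimodal objective; returns (x, evaluations)."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    evals = 2
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
        evals += 1
    return (a + b) / 2, evals


grid_best, grid_evals = enumerate_search(0.0, 10.0, 101)
opt_best, opt_evals = golden_section(0.0, 10.0)
print(grid_best, grid_evals, opt_best, opt_evals)
```

When each evaluation is a long simulation, needing a few dozen runs instead of a full grid is the motivation for adding an optimisation layer on top of a parameter sweep.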
41 How Nimrod/O Works
Genetic Algorithm
Simplex
BFGS
Nimrod Plan File
Nimrod or EnFuzion Dispatcher
Grid or Cluster
42Experimental Design with Nimrod/E
- Want to evaluate effects of parameters and parameter combinations
- Design of Experiments approach
- Dates back to 1950
- Extensively used to generate minimum number of right experiments
- New support in Nimrod/G
- Specify resolution of experiment
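As an illustration of how a Design of Experiments approach cuts the number of runs, the sketch below builds a two-level full factorial design and a half-fraction of it, with levels coded as -1 and +1. The function names are illustrative only. This is the textbook construction, not necessarily the one implemented in the tool.

```python
from itertools import product


def full_factorial(k):
    """All 2**k runs of a two-level design, levels coded as -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def half_fraction(k):
    """2**(k-1) runs: the last factor is set to the product of the other k-1."""
    runs = []
    for base in product((-1, 1), repeat=k - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs


full = full_factorial(3)
half = half_fraction(3)
print(len(full), "runs in the full design,", len(half), "in the half fraction")
for run in half:
    print(run)
```

For three factors the half fraction needs 4 runs instead of 8. The price is that the third factor's main effect is aliased with the interaction of the first two, which makes it a resolution III design. Choosing a higher resolution buys cleaner estimates of interactions at the cost of more runs.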
43Nimrod Applications
- Physics
- Environmental Science
- Systems Biology
- Chemistry
- Engineering
44Physics
45 Ionisation Chamber Design: Lew Kotler, ARPANSA
46 Radiotherapy planning: Giddy, Chin, Lewis, Welsh e-Science Centre, UK
RADIATION SOURCE
PATIENT
IMAGER
www.utsouthwestern.edu/.../270177SynergyS.2.bmp
47Outcomes
CONVOLUTION /SUPERPOSITION
MONTE CARLO
Spezi E 2003 PhD Thesis Med Phys 31(3)
48 SmartPET - A Compton Camera: Toby Beveridge, Monash University
- A SmartPET Detector
- Large Volume - 20 x 60 x 60 mm³
- Operating Range 0.1 to 2 MeV
- Detector resolution depends on Pulse Shape Analysis
- A Compton Camera
- Extensive FoV
- Multi-resolution Data
- Angular precision depends on detector resolution
- Multi-parameter space is difficult to characterise, and optimise, analytically
- Monte-Carlo solutions such as GEANT4 are computationally expensive
49Outcomes
Each pixel (at a particular incident energy) was assigned a separate job
At each point the resolution matrix could be calculated
For a Single Trial:
- 242 point-source locations (11² field over 2 orthogonal planes)
- 5 energies (between 140 keV and 1000 keV)
- 2 different detection conditions
- 242 x 5 x 2 x (20 mins per run) ≈ 806 hours
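The single-trial estimate can be checked with a few lines of arithmetic. The sketch reads the source field as 11 x 11 points on each of the 2 planes, which is the reading that gives 242 locations. The variable names are illustrative.

```python
# Figures from the slide; the 11 x 11 reading of the field is an assumption.
locations = 2 * 11 * 11      # two orthogonal planes, 11 x 11 points each
energies = 5
conditions = 2
minutes_per_run = 20

runs = locations * energies * conditions
total_hours = runs * minutes_per_run / 60
print(runs, "runs,", round(total_hours, 1), "CPU hours")
```

The 2420 runs are independent of one another, so the elapsed time shrinks roughly in proportion to the number of machines available.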
50Environmental Science
51 Climate Studies: Lynch, Abramson, Görgen, Beringer, Uotila, Monash University
- Extensive savanna eco-systems in northern Australia
- Changing fire regime
- Fires lead to abrupt changes in surface properties
- Surface energy budgets
- Partitioning of convective fluxes
- Increased soil heat flux
- Modified surface-atmosphere coupling
- Sensitivity study: do the fires' effects on atmospheric processes lead to changes in the highly variable precipitation regime of the Australian Monsoon?
- Many potential impacts (e.g. agricultural productivity)
(J. Beringer)
52Outcomes
A Workshop On Earth System Models of Intermediate Complexity, 28-29 March 2006 at the Bureau of Meteorology Research Centre, Melbourne
53Systems biology
54 Cardiac Modelling: Sher, Gavaghan, Hinch, Noble, Oxford University
- Heart disease still leading cause of death
- Understanding the underlying physiological mechanisms is cheaper and faster when experimental studies are performed together with mathematical models and computer simulations
- Studying pathologies
- Developing and testing drugs
55Cardiac Modeling
- Based on experimental data, mathematical models have been developed
- ODEs
- Initial conditions
- Ion movement in single cells
Shannon et al. model, 2004
56 Studying ionic models: Anna Sher, Oxford
- Examine the effect of various parameters on Ca2+-induced Ca2+ release and on shape of the action potential
- Fit simulated to experimental data
- Identify parameter(s) that are critical to distinguish Ca2+ dynamics within various species
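Fitting simulated to experimental data by sweeping a parameter can be sketched with a deliberately tiny example. The one-variable decay model, the parameter `k` and the synthetic target trace below are all hypothetical stand-ins. They bear no relation to the real ionic models, which involve many coupled ODEs.

```python
def simulate(k, c0=1.0, dt=0.01, steps=300):
    """Forward-Euler integration of the toy model dc/dt = -k * c."""
    c = c0
    trace = [c]
    for _ in range(steps):
        c += dt * (-k * c)
        trace.append(c)
    return trace


def sum_squared_error(a, b):
    """Misfit between two traces sampled at the same time points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Synthetic "experimental" trace generated with a known rate, for illustration only.
true_k = 1.5
target = simulate(true_k)

# Parameter sweep: one independent simulation per candidate value of k.
candidates = [0.5 + 0.25 * i for i in range(9)]   # 0.5, 0.75, ..., 2.5
errors = {k: sum_squared_error(simulate(k), target) for k in candidates}
best_k = min(errors, key=errors.get)
print("best k:", best_k)
```

Each candidate is an independent run, so the sweep maps naturally onto one job per parameter value.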
57Outcomes
- Single cell ionic models allow us to study
- Whole cell currents during an action potential (AP)
- Currents in response to voltage-clamp stimuli
- Dynamics of ions such as Ca2+ and Na+
- Force-frequency relationship
- etc.
58 More Cardiac Modelling: Dederko, Nevo, Altshuler, Wu, Mcculloch, Mihaylova, Kerckhoffs, UCSD
59Chemistry
60 Quantum Chemistry: Wibke Sudholt, Univ Zurich
61 Drug docking pipeline: Baldridge, Amoreira, Univ Zurich; Berstis, Kondrick, UCSD
- Goal is to minimize the free binding energy
- Use Quantum calculations for more realism
Protein Data Bank
PDB2PQR
WHATIF
QMView
APBS
Compute free binding energy
Add Hydrogen Atoms
Remove the water network
Solve Poisson-Boltzmann equation
Place ligand
62Engineering
63 Flame Kernel Growth in Turbulent Flows: Tom Dunstan, Karl Jenkins, Cranfield University
64Turbulent Flame propagation
65Deployment
Upper Middleware /Tools
IE
Eclipse
Worqbench
Lower Middleware
Globus GT4
SRB
66Why is this challenging?
Deploy Software
67Deployment
- Has largely been ignored in Grid middleware
- Globus supports file transport, execution, data access
- Challenges
- Deployment interfaces lacking
- Heterogeneity
Grid Deploy Aware Clients
CLIENT
RFT
GRAM
Delegation
Index
Trigger
Archiver
CAS
OGSA-DAI
GTCP
Deployment
Your Java Service
Your Java Service
High Performance Virtualization
SERVER
Globus 4.0 Services
68Towards a Grid Deployment Service
[Figure: deployment through the DistAnt service, with steps numbered 1 to 6. Labels: Local Host, DistAnt Deployment Client, Ant Build File, RSL, Application Files, Remote Host, Globus User Hosting Environment, User Security Scope, Reliable File Transfer Service (GridFTP), DistAnt Service, Managed Job Service (GRAM), Un-configured Files, Configured Application, Instantiated Application]
69 High Performance Virtualization: The Motor Runtime
- Our approach is runtime-internal
- Why do Java and .NET support web services, UI, security and other libraries as part of the standard environment?
- Functionality is guaranteed
- Similarly, we aim to provide guaranteed HPC functionality
70Test and Debug
Upper Middleware /Tools
Lower Middleware
Globus GT4
SRB
Deploy
71Why is this challenging?
?
?
?
?
Test Software
72Grid level basic debugging
Hardware
Software
Grid Debug Aware Clients
CLIENT
RFT
GRAM
Delegation
Index
Trigger
Archiver
CAS
OGSA-DAI
Debug
GTCP
Your Java Service
Your Java Service
SERVER
73Grid level basic debugging
[Figure: debugging a Grid job, with steps numbered 1 to 8. Labels: Hardware, Software, User, globusrun-ws, WS-GRAM, Job Scheduler, Debug Client, WS-DBG, Dbg Lib, App, GDBServer, GDB]
74Relative Debugging on the Grid
Server running application Big Endian 64 bit
Grid Infrastructure
Server running application Little endian 32 bit
75Visualize differences
Different Results?
Complex Data Types
Source Code
Assertions
Simple Data Types
Build Assertions
Run Both Applications
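The core idea of relative debugging, comparing the same data structure from two runs and reporting where they diverge, can be sketched as follows. The byte layouts (doubles on one platform, floats on the other), the sample values and the tolerance are assumptions chosen for illustration. A real relative debugger extracts the data from live processes instead of from packed byte strings.

```python
import struct


def decode(raw, byte_order, fmt_char, count):
    """Decode `count` values; byte_order is '>' (big endian) or '<' (little endian)."""
    return list(struct.unpack(f"{byte_order}{count}{fmt_char}", raw))


def compare(reference, suspect, tolerance):
    """Return (index, reference, suspect) for each element differing by more than tolerance."""
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, suspect))
        if abs(r - s) > tolerance
    ]


# Hypothetical dumps of the same array from two platforms.
# The last value on the second platform is deliberately wrong.
big_endian_doubles = struct.pack(">4d", 1.0, 2.5, -3.75, 4.125)
little_endian_floats = struct.pack("<4f", 1.0, 2.5, -3.75, 4.5)

ref = decode(big_endian_doubles, ">", "d", 4)
sus = decode(little_endian_floats, "<", "f", 4)
diffs = compare(ref, sus, tolerance=1e-6)
print(diffs)
```

Normalising byte order and word size before comparing is what lets an assertion such as "these two arrays should be equal" hold across heterogeneous machines.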
76Execution
Upper Middleware /Tools
Lower Middleware
Globus GT4
SRB
77Why is this challenging?
Build, schedule and execute virtual application
78The Nimrod Execution Architecture
79Nimrod/G Architecture
Enfuzion API
Run File
Creator
Nimrod Portal
Job Scheduler
Agent Scheduler
DB Server
Condor Actuator
Legion Actuator
Globus Actuator
Grid Middleware
Grid Information Server(s)
RM TS
G
Agent
Agent
Agent
RM TS
L
Globus enabled node
C
RM TS
Legion enabled node.
Condor enabled node.
RM: Local Resource Manager, TS: Trade Server
80 Flexible Workflow Run Time Machinery
81GriddLeS
- Support a variety of inter-communication mechanisms in workflows
- Legacy applications need to be shielded from IO details in Grid
- Local files
- Remote files
- Replicated files
- Producer-consumer pipes
- Don't want to lock in IO model when application is written (or even Grid Enabled)
- Choice of IO model should be
- Dynamic
- Late bound
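A rough sketch of late-bound IO: the application code below always opens a logical file name, and a mapping supplied at run time decides whether that name is a local file or an in-memory producer-consumer channel. Everything here (`grid_open`, `InMemoryPipe`, the mapping format) is hypothetical and only illustrates the idea. Passing an opener into the application is a simplification made for the sketch, and remote and replicated files are left out.

```python
import os
import tempfile


class InMemoryPipe:
    """Toy producer-consumer channel: reads return everything written so far."""

    def __init__(self):
        self._chunks = []

    def write(self, text):
        self._chunks.append(text)

    def read(self):
        return "".join(self._chunks)

    def close(self):
        pass  # the channel outlives either endpoint


PIPES = {}


def grid_open(name, mode, mapping):
    """Resolve a logical file name to an IO mechanism chosen at run time."""
    kind, target = mapping.get(name, ("local", name))
    if kind == "local":
        return open(target, mode)
    if kind == "pipe":
        return PIPES.setdefault(target, InMemoryPipe())
    raise ValueError(f"unsupported IO model: {kind}")


def make_opener(mapping):
    return lambda name, mode: grid_open(name, mode, mapping)


# "Legacy" code: unchanged regardless of which IO model is bound.
def legacy_producer(open_fn):
    f = open_fn("stage1.out", "w")
    f.write("42\n")
    f.close()


def legacy_consumer(open_fn):
    f = open_fn("stage1.out", "r")
    data = f.read()
    f.close()
    return data


with tempfile.TemporaryDirectory() as tmp:
    local = make_opener({"stage1.out": ("local", os.path.join(tmp, "stage1.out"))})
    legacy_producer(local)
    via_file = legacy_consumer(local)

piped = make_opener({"stage1.out": ("pipe", "stage1-channel")})
legacy_producer(piped)
via_pipe = legacy_consumer(piped)
print(repr(via_file), repr(via_pipe))
```

The two runs differ only in the mapping, so the choice of IO model is made when the workflow is configured and not when the application is written.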
82Flexible IO in GriddLeS
83A Grid Data Life Cycle
- Derived data may be stored as computation procedures
- Virtual Data Grid (e.g. Chimera)
- Re-create deleted data dynamically
- Use buffering for seamless recreation?
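Storing derived data as a procedure can be sketched with a small wrapper that rebuilds a file on demand when it has been deleted. The `VirtualFile` class and the `squares_table` recipe are hypothetical. A real virtual data system would also track provenance and inputs.

```python
import os
import tempfile


class VirtualFile:
    """A derived file stored as (path, recipe) and rebuilt whenever it is missing."""

    def __init__(self, path, recipe):
        self.path = path
        self.recipe = recipe   # zero-argument callable returning the file's text
        self.rebuilds = 0

    def read(self):
        if not os.path.exists(self.path):
            # Data was never materialised or has been deleted: re-run the derivation.
            with open(self.path, "w") as f:
                f.write(self.recipe())
            self.rebuilds += 1
        with open(self.path) as f:
            return f.read()


def squares_table():
    """Hypothetical derivation standing in for an expensive simulation."""
    return "\n".join(f"{n} {n * n}" for n in range(5))


with tempfile.TemporaryDirectory() as tmp:
    derived = VirtualFile(os.path.join(tmp, "squares.txt"), squares_table)
    first = derived.read()       # materialises the file
    os.remove(derived.path)      # simulate deletion to reclaim space
    second = derived.read()      # transparently re-created
    rebuild_count = derived.rebuilds

print(rebuild_count, first == second)
```

The reader of the data never needs to know whether it came from disk or was recomputed. The trade-off is storage saved against the time needed to run the derivation again.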
84 Acknowledgements: MESSAGE Lab
- Faculty Members
- Jeff Tan
- Research Fellows
- Blair Bethwaite
- Clement Chu
- Colin Enticott
- Slavisa Garic
- Tom Peachy
- Admin
- Rob Gray
- Current PhD Students
- Shahaan Ayyub
- Philip Chan
- Tim Ho
- Donny Kurniawan
- Completed PhD Students
- Greg Watson
- Rajkumar Buyya
- Andrew Lewis
- Funding Support
- CRC for Enterprise Distributed Systems (DSTC)
- Australian Research Council
- GrangeNet (DCITA)
- Australian Partnership for Advanced Computing (APAC)
- Microsoft
- Sun Microsystems
- IBM
- Hewlett Packard
- Axceleon
85Questions?
- www.csse.monash.edu.au/davida