Title: Community grid infrastructures for geosciences and materials modelling
1 Community grid infrastructures for geosciences and materials modelling
Martin Dove, University of Cambridge and National Institute for Environmental eScience
2 Hello and thank you: NIEeS
National Institute for Environmental eScience
Partnership between
- aiming to support the development of eScience work within the UK environmental science community
- centre for national capability for eScience within the UK environmental sciences community
3 Hello and thank you: NIEeS
National Institute for Environmental eScience
- Stuart Ballard
- Ian Frame
- Gen-Tao Chiang
- Therese Williams
4 Hello and thank you: NIEeS
National Institute for Environmental eScience
- Training events (e.g. Google Maps, XML from Fortran)
- Source of expertise, capability information
- Grid-enabling geoscience and environmental sciences applications
5 New initiatives: eGY
We are one of the UK representatives for the Electronic Geophysical Year 2007-8
We can achieve a major step forward in
geoscience capability, knowledge, and usage
throughout the world for the benefit of humanity
by accelerating the adoption of modern and
visionary practices for managing and sharing data
and information.
6 Hello and thank you: eMinerals
eMinerals project
NERC-funded eScience project involving Cambridge, Bath, UCL, Royal Institution, CCLRC Daresbury, Birkbeck
- Developing the concept of community grids to support collaborative working in molecular-scale simulations
- Emphasis on grid computing, data management, information delivery and collaborative working
- Core applications in understanding pollutants in the environment at the atomic scale
7 Hello and thank you: eMinerals
- Toby White, Kat Austen, Andrew Walker, Emilio Artacho, Peter Murray-Rust (Cambridge)
- Rik Tyer, Kerstin Kleese (CCLRC)
- Steve Parker, Arnaud Marmier, Corinne Arrouvel (Bath)
- Ismael Bhana (Reading)
8 Scales of geosciences and environmental sciences
9 Our view of eScience
- eScience refers to new science opportunities that require distributed collaborations and are enabled by emerging internet technologies.
- These technologies include grid computing, distributed data management and collaborative tools.
- Many tools are still in the process of rapid development, and in some cases standards are not yet established.
10 Our view of eScience
11 Implicit in the grid area
12 Example: adsorption of dioxin molecules on clay surfaces
Combinatorial study requiring grid computing, data management and collaborative tools
13 eScience: Science beyond the lab book
- Management of many tasks
- Management of the resultant data deluge
- Sharing the information content with collaborators
Stretching science beyond human limitations
whilst maintaining accuracy and accountability
14 eMinerals science: Molecular-scale environmental issues
Radioactive waste disposal
Pollution molecules and atoms on mineral surfaces
Crystal dissolution and weathering
Crystal growth and scale inhibition
16 Using the Virtual Organisation model in environmental science
- A comprehensive assault on the issue of transport of pollutants in the environment
- Heavy metal poisonous waste
- Toxic organic molecules
- Nuclear waste encapsulation
17 Collaborative science and the eMinerals Virtual Organisation
19 So what does our Virtual Organisation need?
- Computing grid infrastructure for running large-scale simulation studies
- Easy-to-use tools for managing large-scale combinatorial computational studies
- Ability to share data between collaborators
- Ability to extract and share information content
- Grid-enabling of simulation codes
- Means to communicate effectively (NOT email!)
20 Components of the compute grid
- Authentication, authorisation and job submission, handled by Globus
21 Researcher
22 Data grid: the San Diego Storage Resource Broker
Distributed file management
Distributed data vaults
23 SRB client tools
- Unix Scommands (e.g. Sput, Sls, Scd, Sget)
- Web interface
- GUI for MS Windows (InQ)
24 Running jobs on our minigrid: Job workflow
- The scientist places input files and application executables into the SRB
- The scientist submits the job to a grid resource
- The job downloads the files from the SRB
- The calculations are performed
- The job places all output files into the SRB
- Metadata and core information are collected from the output files
- The scientist retrieves data files from the SRB and/or core information from the metadata store
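The job workflow listed on this slide can be sketched in a few lines of Python. This is a toy model only: plain dictionaries stand in for the SRB vault and the metadata store, a function call stands in for grid submission, and the names `job_workflow`, `run_simulation`, `vault` and `metadata_store` are hypothetical. No real SRB or Globus calls are made.

```python
# Toy model of the seven workflow steps. Dictionaries stand in for the SRB
# vault and the metadata store; all names here are hypothetical.

def run_simulation(inputs):
    """Placeholder 'application': sums the numbers found in the input files."""
    total = sum(float(x) for text in inputs.values() for x in text.split())
    return {"result.out": f"total {total}"}

def job_workflow(vault, metadata_store, collection, run):
    # Step 3: the job downloads its input files from the vault.
    inputs = dict(vault[collection])
    # Step 4: the calculations are performed.
    outputs = run(inputs)
    # Step 5: all output files are placed back into the vault.
    vault[collection].update(outputs)
    # Step 6: metadata and core information are collected from the outputs.
    metadata_store[collection] = {
        "output_files": sorted(outputs),
        "total": float(outputs["result.out"].split()[1]),
    }

# Step 1: the scientist places input files into the vault.
vault = {"demo": {"a.in": "1 2 3", "b.in": "4.5"}}
metadata_store = {}
# Step 2: the scientist submits the job (here, a plain function call).
job_workflow(vault, metadata_store, "demo", run_simulation)
# Step 7: the scientist retrieves core information from the metadata store.
summary = metadata_store["demo"]
```

The point of the sketch is the division of labour: the scientist only touches the vault and the metadata store, while everything between download and upload happens on the compute resource.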
25 Researcher
4. Job runs on grid compute resources
Application server
26 User interface: my_condor_submit tool
Executable = ossia2004
pathToExe = /home/rty.eminerals/OSSIA2004
preferredMachineList = lv1.nw-grid.ac.uk-serial dl1.nw-grid.ac.uk-serial
jobType = performance
numOfProcs = 1
Output = trans.out
Sdir = /home/mdv.eminerals/RMCSdemo
Sget =
Sput =
GetEnvMetadata = true
RDesc = Test sweep of temperature using ossia
RDatasetID = 263
AgentX = Temperature,trans.xml:/ParameterList[title='Initial System']/Parameter[name='Temperature']
AgentX = Energy,trans.xml:/PropertyList[last]/Property[title='Energy']
AgentX = OrderParameter,trans.xml:/Module[last]/Property[title='Order parameter']
AgentX = HeatCapacity,trans.xml:/Module[last]/Property[title='Heat capacity']
AgentX = Susceptibility,trans.xml:/Module[last]/Property[title='Susceptibility']
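To make the structure of such a job description concrete, here is a minimal Python sketch that reads one. It assumes one `key = value` directive per line and that each AgentX entry has the form `name,file:query`. The functions `parse_description` and `split_agentx` are hypothetical helpers written for illustration and are not part of my_condor_submit.

```python
from collections import defaultdict

# Short sample in the assumed "key = value" layout (an assumption, see above).
SAMPLE = """\
Executable = ossia2004
numOfProcs = 1
Output = trans.out
AgentX = Temperature,trans.xml:/ParameterList[title='Initial System']/Parameter[name='Temperature']
AgentX = Energy,trans.xml:/PropertyList[last]/Property[title='Energy']
"""

def parse_description(text):
    """Collect directives; repeated keys such as AgentX accumulate in a list."""
    directives = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first '=' only, so '=' inside a query is preserved.
        key, _, value = line.partition("=")
        directives[key.strip()].append(value.strip())
    return dict(directives)

def split_agentx(directive):
    """Split 'name,file:query' into (name, file, query)."""
    name, _, rest = directive.partition(",")
    filename, _, query = rest.partition(":")
    return name, filename, query

description = parse_description(SAMPLE)
requests = [split_agentx(d) for d in description["AgentX"]]
```

Each AgentX request names a quantity to harvest, the output file to read it from, and where in that file to find it, which is what lets the metadata be collected without the user post-processing each run by hand.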
27 User profile
- Users do not want portals: portals are for tools, not for the working environment
- Users do not want their applications pre-wrapped as services: they want to have complete control over their applications, e.g. to add capability
- Users do not want a provider/consumer model that does not provide the freedom they need
28 Data sharing: the need for information delivery tools
Classical molecular dynamics methods
Quantum mechanical methods
29 Collaborative grid: Data and information sharing
30 Data and information sharing: XML data representation
<?xml version="1.0" encoding="UTF-8"?>
<cml convention="FoX_wcml-2.0" fileId="cis1.cml" version="2.4" xmlns="http://www.xml-cml.org/schema">
  <metadataList name="Metadata">
    <metadata name="LeadProgramAuthor" content="Martin Dove"/>
    <metadata name="Code name" content="ossia"/>
    ...
  </metadataList>
  <module title="Initial System" dictRef="emin:initialModule">
    <parameterList>
      <parameter dictRef="ossia:temperature" name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
      <parameter dictRef="ossia:NumberOfSteps" name="Number of steps">
        <scalar dataType="xsd:integer" units="units:countable">10000000</scalar>
      </parameter>
      ...
    </parameterList>
  </module>
  ...
  <module title="Finalization" dictRef="emin:finalModule">
    <propertyList>
      <property dictRef="ossia:Energy" title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
      ...
    </propertyList>
  </module>
</cml>
Chemical Markup Language
Capturing metadata
Capturing initial parameters
Capturing computed properties
31 What XML gives us
- Simulation code output that is self-describing (no more mere lists of numbers!)
- XML files can be transformed to give user-centric and information-centric representations of data, including plotted data
- XML files can have key information extracted easily, essential for large combinatorial studies
- XML enables automatic capture of metadata, and metadata is essential for managing data
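As an illustration of how easily key information can be pulled from self-describing output, the following Python sketch reads a temperature parameter and an energy property from a small CML-style document using only the standard library. The embedded sample is an abbreviated stand-in modelled on the example in this talk, and the helper `scalar_value` is hypothetical. It is not the AgentX tool itself.

```python
import xml.etree.ElementTree as ET

CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Abbreviated stand-in document, not real program output.
SAMPLE = """<cml xmlns="http://www.xml-cml.org/schema">
  <module title="Initial System">
    <parameterList>
      <parameter name="Temperature">
        <scalar dataType="xsd:double" units="cmlUnits:eV">1.000000000000e-1</scalar>
      </parameter>
    </parameterList>
  </module>
  <module title="Finalization">
    <propertyList>
      <property title="Energy">
        <scalar dataType="xsd:double" units="cmlUnits:eV">2.052516362912e-1</scalar>
      </property>
    </propertyList>
  </module>
</cml>
"""

def scalar_value(root, list_tag, item_tag, attr, label):
    """Return (value, units) of the scalar whose parent item has attr == label."""
    path = f".//cml:{list_tag}/cml:{item_tag}[@{attr}='{label}']/cml:scalar"
    node = root.find(path, CML_NS)
    if node is None:
        raise KeyError(label)
    return float(node.text), node.get("units")

root = ET.fromstring(SAMPLE)
temperature = scalar_value(root, "parameterList", "parameter", "name", "Temperature")
energy = scalar_value(root, "propertyList", "property", "title", "Energy")
```

Because each number is labelled by name and carries its units, the same few lines work unchanged across every output file in a combinatorial study.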
32 XML → metadata
- Our job submission tools automatically harvest metadata from our output XML files
- We have developed a new set of tools to access the metadata database (RCommands)
- We use metadata for locating data and datasets created by our colleagues
- We also use metadata for extracting core information from data, useful for analysing combinatorial studies
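The idea of using harvested metadata to analyse a combinatorial study can be sketched with an in-memory SQLite table. Everything here is an assumption made for illustration: the table layout, the function names `build_store` and `sweep_table`, and the placeholder values. It does not reproduce the RCommands or the actual metadata database schema.

```python
import sqlite3

def build_store(rows):
    """Create an in-memory table of (dataset, name, value) metadata items."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE metadata (dataset TEXT, name TEXT, value REAL)")
    db.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    return db

def sweep_table(db, x_name, y_name):
    """Pair two harvested quantities per dataset, ordered by the first one."""
    query = """
        SELECT x.value, y.value
        FROM metadata AS x JOIN metadata AS y ON x.dataset = y.dataset
        WHERE x.name = ? AND y.name = ?
        ORDER BY x.value
    """
    return db.execute(query, (x_name, y_name)).fetchall()

# Made-up placeholder values, not simulation results.
rows = [
    ("run2", "Temperature", 0.2), ("run2", "Energy", 20.0),
    ("run1", "Temperature", 0.1), ("run1", "Energy", 10.0),
]
db = build_store(rows)
table = sweep_table(db, "Temperature", "Energy")
```

With one row per harvested quantity, a single query yields the energy-versus-temperature table for the whole sweep without opening any of the underlying data files.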
33 Collaborative grids
Classical molecular dynamics methods
Quantum mechanical methods
34 Researcher A
Web 2.0
Researcher B
35 Example: Compressibility of amorphous silica
- Density is not quite linear in pressure: note that the gradient is larger in the middle of the plot than at either end.
- Bulk modulus (BM)
- BM has a minimum around 2 GPa: compressibility 1/B has a maximum
36 Example: Compressibility of amorphous silica
Molecular dynamics simulations of pressure-dependence of amorphous silica
Volume curve shows that silica gets softer around 2 GPa
Negative derivative defines the compressibility
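The relation behind this slide is the standard definition of compressibility, -(1/V) dV/dP, which is the reciprocal of the bulk modulus. A minimal Python sketch of extracting it from a volume-pressure curve by central differences follows. The data are synthetic, chosen so that the answer is known in advance, and the value of `B0` is an arbitrary test number, not a property of silica.

```python
import math

def compressibility(pressures, volumes):
    """Return -(1/V) dV/dP at the interior points, by central differences."""
    kappa = []
    for i in range(1, len(pressures) - 1):
        dv_dp = (volumes[i + 1] - volumes[i - 1]) / (pressures[i + 1] - pressures[i - 1])
        kappa.append(-dv_dp / volumes[i])
    return kappa

# Synthetic check: V = V0 * exp(-P / B0) has a constant bulk modulus B0.
B0 = 40.0  # GPa, arbitrary test value
pressures = [0.1 * i for i in range(51)]                  # 0 to 5 GPa
volumes = [100.0 * math.exp(-p / B0) for p in pressures]  # arbitrary units

kappa = compressibility(pressures, volumes)
bulk_modulus = [1.0 / k for k in kappa]
```

For real simulation output the derivative would be noisier, so in practice one would probably fit or smooth the volume curve before differentiating. A softening such as the one described here would show up as a maximum in `kappa`, equivalently a minimum in `bulk_modulus`.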
37 The message from this example
- We had to run over 600 sets of simulations and analyses ...
- ... each generating around 2 GB of data
- We used grid computing to run the simulations using our tools, and our data management system to extract and share the key data
- And part of this was run by a third-year project student, Lucy
38 Example: Dioxin molecules on layer silicate neutral surfaces
39 Range of molecule/substrate systems
Calculating how the binding energy varies across the molecular series from all chlorine to no chlorine
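To give a feel for the size of such a molecular series, the Python sketch below counts the symmetry-distinct ways of placing chlorine atoms on the eight substitutable positions (1-4 and 6-9) of dibenzo-p-dioxin. It assumes the standard ring numbering and the symmetry of the planar skeleton. The slide does not say which of these molecules were actually simulated, so this only illustrates the combinatorial space.

```python
from itertools import combinations

POSITIONS = (1, 2, 3, 4, 6, 7, 8, 9)

# Symmetry operations of the planar skeleton, as permutations of positions.
SYMMETRY_OPS = (
    {p: p for p in POSITIONS},                         # identity
    {1: 9, 9: 1, 2: 8, 8: 2, 3: 7, 7: 3, 4: 6, 6: 4},  # mirror swapping the two outer rings
    {1: 4, 4: 1, 2: 3, 3: 2, 6: 9, 9: 6, 7: 8, 8: 7},  # mirror containing the long axis
    {1: 6, 6: 1, 2: 7, 7: 2, 3: 8, 8: 3, 4: 9, 9: 4},  # two-fold rotation
)

def canonical(pattern):
    """Smallest symmetry-equivalent image of a set of chlorinated positions."""
    return min(tuple(sorted(op[p] for p in pattern)) for op in SYMMETRY_OPS)

def distinct_patterns():
    """Map chlorine count to the set of symmetry-distinct substitution patterns."""
    return {
        n: {canonical(c) for c in combinations(POSITIONS, n)}
        for n in range(len(POSITIONS) + 1)
    }

patterns = distinct_patterns()
total = sum(len(v) for v in patterns.values())  # 76: 75 chlorinated molecules plus the parent
```

Multiplying even a subset of these molecules by several substrates and adsorption geometries quickly reaches the point where running and tracking jobs by hand stops being practical.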
40 Summary
- The tools I have discussed can be used for many different application domains within the geosciences and environmental sciences
- Emphasis on all core areas: compute, data and collaborative grids