Title: e-Science Technologies in the Simulation of Complex Materials
1e-Science Technologies in the Simulation of
Complex Materials
L. Blanshard, R. Tyer, K. Kleese
S. A. French, D. S. Coombes, C. R. A. Catlow
B. Butchart, W. Emmerich (CS)
H. Nowell, S. L. Price (Chem)
eMaterials
2Polymorphism
Prediction of polymorphs: a drug substance may
exist as two or more crystalline phases in which
the molecules are packed differently.
Combinatorial Computational Catalysis
Explore which sites are involved in catalysis,
which is used in diverse industries including
petroleum, chemical, polymers, agrochemicals, and
environmental.
4e-Science Issues to Address
- simulations take too long to run
- data are distributed across many sites and systems
- no catalogue system
- output in legacy text files, different for each program
- few tools to access, manage and transfer data
- workflow management is manual
- licensing within distributed environment
5Acid Sites in Zeolites
- Determine the extra-framework cation position
within the zeolite framework.
- Explore which proton sites are involved in
catalysis and then characterise the active sites.
- Produce a database with structural models and
associated vibrational modes for Si/Al ratios.
- Improve understanding of the role of the Si/Al
ratio in zeolite chemistry.
6Chabazite
1 T site, 12 Si centres per unit cell,
8-membered ring channels (3.8 Å × 3.8 Å).
11The Problem
Si/Al ratio   Calculations
11            4
5             160
3             5,760
2             184,320
The number of calculations quickly becomes an
issue when realistic Si/Al ratios are considered.
A Si/Al ratio of 2 would require 184,320
calculations at 100 seconds each: 5,120 hours, or
about 213 days, of CPU time.
When substitution of a second Al is considered
there are now 4 × (10 × 4) = 160 possible
structures, as symmetry has been broken.
Note that this is for a very simple zeolite with
36 ions per unit cell; materials of interest have
296.
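The CPU-time estimate above can be checked with a few lines of Python. This is only a worked version of the slide's arithmetic: the counts and the 100 seconds per calculation are taken from the slide, while the function name `cpu_time` is introduced here for illustration and the calculations are assumed to run one after another on a single CPU.

```python
# Calculation counts per Si/Al ratio, as listed on the slide for chabazite.
CALCULATIONS = {11: 4, 5: 160, 3: 5760, 2: 184_320}
SECONDS_PER_CALCULATION = 100  # figure quoted on the slide


def cpu_time(n_calculations, seconds_each=SECONDS_PER_CALCULATION):
    """Return (hours, days) of serial CPU time for n_calculations."""
    total_seconds = n_calculations * seconds_each
    hours = total_seconds / 3600
    return hours, hours / 24


for ratio, count in CALCULATIONS.items():
    hours, days = cpu_time(count)
    print(f"Si/Al = {ratio:>2}: {count:>7,} calculations, "
          f"{hours:8.1f} h, {days:6.1f} days")
```

For Si/Al = 2 this gives 5,120 hours, a little over 213 days, which matches the slide.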
12MC/EM
A combined Monte Carlo (MC) and energy
minimisation (EM) approach has been developed to
model zeolitic materials with low and medium
Si/Al ratios. First, Al is inserted into a
siliceous unit cell, which is then
charge-compensated with cations.
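The Monte Carlo step can be pictured as drawing random sets of T sites to receive Al. The sketch below is a hypothetical illustration, not the MC_subs program: the function name, the uniform sampling and the default seed are assumptions. It ignores cation placement and any chemical constraints on Al positions. The example sizes (96 T sites, 8 substitutions) are those quoted for Mordenite elsewhere in the talk.

```python
import math
import random


def random_substitutions(n_t_sites, n_al, n_configs, seed=0):
    """Return n_configs distinct sets of T-site indices to hold Al.

    Each configuration is a sorted tuple of n_al distinct indices
    drawn uniformly from range(n_t_sites). Hypothetical helper.
    """
    if n_configs > math.comb(n_t_sites, n_al):
        raise ValueError("more configurations requested than exist")
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n_configs:
        sites = tuple(sorted(rng.sample(range(n_t_sites), n_al)))
        seen.add(sites)  # a repeated draw leaves the set unchanged
    return sorted(seen)


configs = random_substitutions(n_t_sites=96, n_al=8, n_configs=100)
print(len(configs), configs[0])
```

Each resulting configuration would then need cations added and an energy minimisation run, which is where the computational cost lies.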
13RI Condor Pool
Name                 OpSys    Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1-8_at_faraday.r   IRIX65   SGI    Owner      Idle      1.192   128   3030102
vm1-14_at_tyndall.r  IRIX65   SGI    Unclaimed  Idle      0.000   507   0001509
ising2.ri.ac.        LINUX    INTEL  Unclaimed  Idle      0.200   501   ?????
vm1-16_at_strutt1-4  OSF1     ALPHA  Owner      Idle      1.113   1024  002646
xp2.ri.ac.uk         OSF1     ALPHA  Owner      Idle      1.113   256   49122646
xp3.ri.ac.uk         OSF1     ALPHA  Unclaimed  Idle      0.000   256   0005500
d8.ri.ac.uk          WINNT40  INTEL  Unclaimed  Idle      0.000   255   0020945
ATLANTIC             WINNT51  INTEL  Unclaimed  Idle      0.008   256   0010230
BABBLE.ri.ac.        WINNT51  INTEL  Unclaimed  Idle      0.252   512   0002257
D500.ri.ac.uk        WINNT51  INTEL  Owner      Idle      0.533   254   0052606
PCDAVIDC.ri.a        WINNT51  INTEL  Unclaimed  Idle      0.000   504   0035126
e-sam.ri.ac.u        WINNT51  INTEL  Unclaimed  Idle      0.001   512   0031639
pcalexey.ri.a        WINNT51  INTEL  Unclaimed  Idle      0.002   256   0003553

             Machines  Owner  Claimed  Unclaimed  Matched  Preempting
ALPHA/OSF1   18        1      0        1          0        0
We have set up and tested a Condor pool at the
RI, which has 50 heterogeneous nodes ranging from
desktop PCs and machines controlling instruments
to the main servers of the DFRL.
14RI Condor Pool
But where is PC-CRAC???
15Level of Optimisation
50eV
16Level of Optimisation
240eV
17MOR
- Mordenite
- 1 dimensional channel system
- simulation cell contains two unit cells
- 296 atoms, with 96 Si centres (referred to as T
sites).
- Substituting Al at 8 T sites, charge-compensated
with 8 Na cations
18Workflow
MC_subs
Gulp Files
Gulp WinXP
Perl script
MS Excel
SRB
19Workflow II
C
MC_subs
Si-zeo structure, interatomic potentials, input file
Gulp Files
Batch of labelled Gulp files
Script for automatic batch submission; script for cleaning directories
Gulp WinXP
Perl script
f90
Subset of data in formatted file
Scommands
MS Excel
SRB
20Condor Stats
Extensive use has been made of Condor pools (UCL:
950 nodes in teaching pools). 150 cpu-years of
previously unused compute resource have been
utilised in this study. Close collaboration with
the NERC e-minerals project has allowed access to
this resource. 150,000 calculations have been
performed, each with varying numbers of particles
per simulation box, which means a total of
75,000,000 particles have been included in our
simulations of Mordenite to date.
21Condor Specifics
Jobs are submitted in batches of 1,000 because of
stability issues. Shadows are not my game, but it
is a pain when the Condor Master dies due to too
many jobs hitting the queue (guilty feeling, as
the Master was not solely running the pool but
was also being used for science by the pool
administrator). Maximum number of jobs in queue.
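Submitting in batches of 1,000 can be sketched as follows. This is an illustrative sketch only: the file names, the `gulp.exe` executable name and the helper functions are hypothetical, and the submit description uses a small set of standard Condor submit commands rather than the settings actually used on the pools described here. It assumes GULP reads its input from standard input.

```python
def chunk(items, size):
    """Split items into consecutive batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def submit_description(input_files, executable="gulp.exe"):
    """Build the text of a Condor submit description for one batch.

    Each job maps one input file to stdin and one output file to
    stdout. The executable name is a placeholder.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "log = batch.log",
    ]
    for name in input_files:
        stem = name.rsplit(".", 1)[0]
        lines += [f"input = {name}", f"output = {stem}.out",
                  f"error = {stem}.err", "queue"]
    return "\n".join(lines) + "\n"


# Hypothetical input names for 2,500 configurations.
files = [f"config_{i:05d}.gin" for i in range(2500)]
batches = chunk(files, 1000)
print([len(b) for b in batches])  # → [1000, 1000, 500]
```

Each string returned by `submit_description` would be written to a file and handed to `condor_submit` once the previous batch has drained from the queue.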
22Condor Specifics
Handling of data and analysis becomes the
rate-determining step (RDS). However, keeping the
pool full of jobs is also a tedious step when jobs
are short, which is the ideal for the UCL pool
(re: turning off the pool once a day): drip
feeding.
Thought in application design is key: many
applications are TOTALLY unsuitable for the UCL
Condor Pool.
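Drip feeding amounts to topping the queue up to a fixed cap at regular intervals instead of submitting everything at once. The sketch below simulates that idea with made-up numbers. The function names, the cap and the completion rate are all assumptions. A real feeder would read the queue length from the scheduler (for example with `condor_q`) rather than simulate it.

```python
def jobs_to_submit(n_queued, n_pending, queue_cap):
    """How many new jobs to add so the queue reaches queue_cap."""
    return max(0, min(n_pending, queue_cap - n_queued))


def drip_feed(total_jobs, queue_cap, completed_per_cycle):
    """Simulate topping up a queue; return submissions per cycle."""
    if completed_per_cycle <= 0:
        raise ValueError("completed_per_cycle must be positive")
    pending, queued, history = total_jobs, 0, []
    while pending or queued:
        n = jobs_to_submit(queued, pending, queue_cap)
        pending -= n
        queued += n
        history.append(n)
        # The pool finishes a fixed number of jobs each cycle.
        queued -= min(queued, completed_per_cycle)
    return history


print(drip_feed(total_jobs=2500, queue_cap=1000, completed_per_cycle=400))
# → [1000, 400, 400, 400, 300, 0, 0]
```

The queue never holds more than the cap, which is the property that protects the Condor Master from being swamped.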
24 100 Configurations
[Plot labels: 0, 100; 20 eV]
It can be seen that there are two distinct
regions, -12079 eV to -12076 eV and -12075 eV to
-12073 eV, but there is no obvious correlation
between total energy and cell volume.
25 10000 Configurations
[Plot labels: 0, 10000; 25 eV]
However, when 10,000 structures are considered it
is clear that the most stable structures
correspond to cation placements that do not cause
the cell to expand. This requires that the
cations sit in the large channel.
26 10000 Configurations
27Comparison of Regions
-12079.5eV
-12075.04eV
28Analysis
MySQL allows input from a text file, a C/C++
program, or the mysql command line and GUI
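Storing each configuration's results in a relational table makes queries such as "the lowest-energy structures" a one-liner. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. The table name, column names and the three rows are hypothetical placeholders, not results from this work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE configuration ("
    " id INTEGER PRIMARY KEY,"
    " total_energy_ev REAL NOT NULL,"
    " cell_volume_a3 REAL NOT NULL)"
)

# Placeholder rows: (id, total energy / eV, cell volume / cubic angstrom).
rows = [(1, -10.0, 100.0), (2, -12.5, 101.0), (3, -11.0, 105.0)]
conn.executemany("INSERT INTO configuration VALUES (?, ?, ?)", rows)


def lowest_energy(connection, n):
    """Return the n most stable configurations, most stable first."""
    query = (
        "SELECT id, total_energy_ev, cell_volume_a3 "
        "FROM configuration ORDER BY total_energy_ev ASC LIMIT ?"
    )
    return connection.execute(query, (n,)).fetchall()


for row in lowest_energy(conn, 2):
    print(row)
```

The same SQL would apply to a MySQL table, with only the connection code changing.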
29Workflow III
MC_subs
Gulp Files
Gulp WinXP
mysql
db
SRB
30Building an Ensemble
31Validation
Comparison with experiment is very promising,
showing a large difference in the quality of the
fit between the good and bad sets.
32Monitor
33Drip Feeding and Interactive Steering using
Relational Databases
Distributed Computing Portal
User input: structural model (Si/Al, cation types,
H2O etc.)
Model/Configuration Generator
Jobs
db
Analysis (geometry, energy, fit)
Steering
db
Improve generation / model strategy
Analysis
db
User input: diffraction data, chemical analysis,
building units, Si/Al, cation types, H2O etc.
D. Lewis, R. Coates, S. French (UCL Chem / RI)
34Workflow IV
The workflow service needs to be exposed to the
outside world as a web service
SSH
CML
CML
Since we require new WSDL interfaces for each
application, it is a perfect opportunity to employ
a standard representation for chemical
structures. The XML standard in chemistry is CML
(Chemical Markup Language).
CML
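To give a feel for what a CML representation of a structure looks like, the sketch below builds a minimal `molecule` element with Python's standard XML library. It is an illustration only: the helper name `to_cml`, the molecule id and the two atoms with their coordinates are placeholders, and the fragment covers only a small part of CML (the `molecule`, `atomArray` and `atom` elements with Cartesian `x3`, `y3`, `z3` attributes). It is not the schema usage of the project's web services.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def to_cml(mol_id, atoms):
    """Serialise (element, x, y, z) tuples as a CML molecule string."""
    ET.register_namespace("", CML_NS)  # emit CML as the default namespace
    molecule = ET.Element(f"{{{CML_NS}}}molecule", {"id": mol_id})
    atom_array = ET.SubElement(molecule, f"{{{CML_NS}}}atomArray")
    for i, (element, x, y, z) in enumerate(atoms, start=1):
        ET.SubElement(atom_array, f"{{{CML_NS}}}atom", {
            "id": f"a{i}",
            "elementType": element,
            "x3": f"{x:.4f}",
            "y3": f"{y:.4f}",
            "z3": f"{z:.4f}",
        })
    return ET.tostring(molecule, encoding="unicode")


# Placeholder fragment: two atoms with made-up coordinates.
cml = to_cml("m1", [("Si", 0.0, 0.0, 0.0), ("O", 1.6, 0.0, 0.0)])
print(cml)
```

Because every application exchanges the same element vocabulary, a WSDL interface can refer to these types instead of defining a new structure format for each program.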
35Key Achievement
We are now doing science that was not possible
before the advancements made within e-Science.
37FER
- Ferrierite
- 2 dimensional channel system
- simulation cell contains 115 atoms.
- substituting at 4 T sites with 4 Na cations
38 100 Configurations
[Plot label: 14 eV]
Again there are steps in total energy, and again
there is no correlation with volume for the low
number of configurations.
Only 75 out of 100 configurations optimise.
39 10000 Configurations
[Plot label: 15 eV]
However, this time when 10,000 structures are
considered there are no clear steps in the
volume. The volume still increases with
decreasing stability, but this is due to cell
expansion caused by Al-Al interactions.
Only 7,500 out of 10,000 optimise.
40Comparison of Regions
41Comparison of Regions
42MFI
- ZSM-5
- 3 dimensional channel system
- simulation cell contains 292 atoms
- substituting at 4 sites with 4 Na cations
43 10000 Configurations
[Plot label: 10 eV]
There is a step in total energy, but this time
only one, and from then on the trend is smooth.
44What Next
Once the lowest-energy positions of Al have been
confirmed, the cation is exchanged for a proton
and the structure is energy minimised again. This
method will allow us to construct realistic models
of low and medium Si/Al zeolites. Such structures
can be used for further simulations and aid the
interpretation of experimental data.
45Solid Solutions
BaTiO3
46Solid Solutions
BaSrTiO3
47Solid Solutions
SrTiO3
48Ongoing and Future Work
- upload files as part of workflow to SRB
- generate metadata
- upload extracted data from files
- more extensive use of CML
51Achievements To Date
1. First use of CML schema for defining Web
Service port types.
2. Calculation of 50,000 configurations of zeolite
Mordenite (24,000,000 particles) to gain insight
into structure when a realistic ratio of Al
substitution is included in the model.
3. Successfully exposed Fortran codes as OGSI Web
Services - prototype application deployed on 80
nodes. The prototype computational polymorph
application is being ported to a larger production
machine.
4. First use of BPEL standard for orchestrating
web services in a Grid application.
5. Open Source BPEL implementation in development,
enabling late binding and dynamic deployment of
large computational processes.
6. Integration of OGSI and BPEL with Sun Grid
Engine.
7. Development of Graphical User Interface for
polymorph application - connects to relational
database via EJB interface.
8. Infrastructure for metadata and data
management.
9. SRB and dataportal are already being used to
hold datasets and to transfer the data between
different scientists and computer applications.
10. Implementation of Condor pool at RI.
52Polymorph Prediction
- Different crystal structures of a molecule are
called polymorphs.
- Polymorphs may have considerably different
properties (e.g. bioavailability, solubility,
morphology).
- Polymorph prediction is of great importance to
the pharmaceutical industry, where the discovery
of a new polymorph during production or storage
of a drug may be disastrous.
Drug molecules are often flexible, and this makes
the polymorph prediction process more challenging.
53Polymorph Prediction Workflow
For flexible molecules: conformational
optimisation gives n feasible rigid molecular
probes representing energetically plausible
conformers (n = number of conformers).
MOLPAK: generation of 6000 densely packed
crystal structures using a rigid molecular probe
(repeated n times).
DMAREL: lattice energy optimisation.
Morphology.
Data: unit cell volume, density, lattice energy.
Restricted number of structures selected.
Crystal structures and properties stored in
database.
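The size of the search and the final selection step can be sketched in a few lines. The count of 6000 packings per probe comes from the slide. Everything else is an assumption for illustration: the function names, the structure labels, the energy values and the idea of keeping structures within an energy window of the most stable one. The slide says only that a restricted number of structures is selected, not how.

```python
def n_packings(n_conformers, packings_per_probe=6000):
    """Total MOLPAK structures for n rigid probes (6000 each)."""
    return n_conformers * packings_per_probe


def select_low_energy(structures, window):
    """Keep (label, lattice_energy) pairs within `window` of the minimum.

    The energy-window criterion is an assumption for illustration.
    """
    lowest = min(energy for _, energy in structures)
    kept = [s for s in structures if s[1] - lowest <= window]
    return sorted(kept, key=lambda s: s[1])


print(n_packings(3))  # → 18000

# Placeholder labels and lattice energies (arbitrary units).
candidates = [("a", -100.0), ("b", -95.0), ("c", -99.0)]
print(select_low_energy(candidates, window=2.0))
# → [('a', -100.0), ('c', -99.0)]
```

The n-fold multiplication is why flexible molecules make the prediction more expensive: every extra plausible conformer adds another 6000 structures to optimise.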
54Storage Resource Broker
Store data files from simulations in the Storage
Resource Broker