Title: Large Scale Virtual Screening of Drug Design on the Grid: Fighting against Avian Flu

1. Large Scale Virtual Screening of Drug Design on the Grid: Fighting against Avian Flu
- Yun-Ta Wu and Hurng-Chun Lee
- ISGC 2006, Taiwan
2. Credits
- Docking workflow preparation
  - Contact point: Y.T. Wu
  - E. Rovida
  - P. D'Ursi
  - N. Jacq
- Grid resource management
  - Contact point: J. Salzemann
  - TWGrid: H.C. Lee, H.Y. Chen
  - AuverGrid: E. Medernach
  - EGEE: Y. Legré
- Platform deployment on the Grid
  - Contact point: H.C. Lee, J. Salzemann
- M. Reichstadt
- N. Jacq
- Users (deputy)
- J. Salzemann (N. Jacq)
- M. Reichstadt (E. Medernach)
- L. Y. Ho (H. C. Lee)
- I. Merelli, C. Arlandini (L. Milanesi)
- J. Montagnat (T. Glatard)
- R. Mollon (C. Blanchet)
- I. Blanque (D. Segrelles)
- D. Garcia
- Grid operational support from sites and operation centers
- DIANE technical support from the CERN-ARDA group
3. Outline
- The avian flu
- EGEE biomed data challenge II
- Conclusion
4. Influenza A pandemic
[Figure: timeline of influenza A pandemic subtypes (H1N1, H2N2, H3N2, H1N1), labeled by HA and NA surface proteins, up to 2005-2006]
- As of Apr 21, 2006: 113 deaths / 204 cases
- http://www.who.int/csr/disease/avian_influenza
5. A closer look at bird flu
- The bird flu virus is named H5N1. H5 and N1 refer to the proteins (hemagglutinins and neuraminidases) on the virus surface.
- Neuraminidases play a major role in virus multiplication
- Current drugs such as Tamiflu inhibit the action of neuraminidases and stop virus proliferation
- The N1 protein is known to evolve into variants when it comes under drug stress
- To free up medicinal chemists' time for a better response to instant, large-scale threats, a large-scale in-silico screening was set up as an initial investment in the design of new drugs
6. In-silico (virtual) screening for drug design
- Computer-based in-silico screening can help identify the most promising leads for biological tests
  - systematic and productive
  - reduces the cost of the trial-and-error approach
- The required CPU power and storage space increase in proportion to the number of compounds and target receptors involved in the screening
  - massive virtual screening is time-consuming
7. The computing challenge of large-scale in-silico screening
- Molecular docking engines
  - AutoDock
  - FlexX
- Problem size
  - 8 predicted possible variants of influenza A neuraminidase N1 as targets
  - around 300 K compounds from the ZINC database and a chemical combinatorial library
- Computing challenge (a rough estimate based on a Xeon 2.8 GHz)
  - each docking requires 30 min of CPU time
  - required computing power in total: 137 CPU years
- Storage requirement
  - each docking produces results of about 130 KByte
  - required storage space in total: 600 GByte (with 1 back-up)
- To speed up and reduce the cost of developing new drugs, high-throughput screening is in demand
- That's where the Grid can help!
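The estimate above follows from a simple multiplication; a short sketch using the numbers from the slide (8 targets, 300 K compounds, 30 min and 130 KByte per docking, one back-up copy):

```python
# Back-of-the-envelope estimate of the challenge size (numbers from the slide).
targets = 8                  # predicted N1 variants
compounds = 300_000          # ZINC database + combinatorial library
cpu_hours_per_docking = 0.5  # ~30 min on a Xeon 2.8 GHz
kb_per_docking = 130         # result size per docking

dockings = targets * compounds
cpu_years = dockings * cpu_hours_per_docking / (24 * 365)
storage_gb = dockings * kb_per_docking * 2 / 1024**2  # x2 for the back-up

print(f"{dockings:,} dockings, ~{cpu_years:.0f} CPU years, ~{storage_gb:.0f} GB")
```

Running it reproduces the slide's figures: 2.4 million dockings, about 137 CPU years, and roughly 600 GB of storage.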
8. EGEE Biomed DC II objectives
- Biological goal
  - finding potential compounds that can inhibit the activity of influenza A neuraminidase N1 subtype variants
- Biomedical goal
  - accelerating the discovery of novel potent inhibitors by minimizing non-productive trial-and-error approaches
- Grid goals
  - massive throughput: reproducing a grid-enabled in-silico process (exercised in DC I) with a shorter preparation time
  - interactive feedback: evaluating an alternative lightweight grid application framework (DIANE) in terms of stability, scalability and efficiency
9. EGEE Biomed DC II grid resources
- AuverGrid
- BioinfoGrid
- EGEE-II
- Embrace
- TWGrid
- a world-wide infrastructure providing more than 5,000 CPUs
10. EGEE Biomed DC II current status
- The first DC job was submitted on 10 Apr 2006
- The challenge is scheduled to finish in mid-May
- As of today, we have completed 1,500 K dockings
  - 60% of the whole challenge (i.e. 82 CPU years)
  - Grid efficiency: 80%
11. EGEE Biomed DC II: the Grid tools
- WISDOM
  - successfully handled the first EGEE biomed DC
  - a Grid job-handling workflow: automated job submission, status checking and reporting, error recovery
  - push-model job scheduling
  - batch-mode job handling
- DIANE
  - a framework for applications with the master-worker model
  - pull-model job scheduling
  - interactive-mode job handling with a flexible failure-recovery feature
  - we will focus on this framework in the following discussion
12. The WISDOM workflow in DC II
- Developed for the 1st data challenge, fighting against malaria
  - 40 million dockings (80 CPU years) were done in 6 weeks
  - 1,700 CPUs in 15 countries were used simultaneously
- Reproducing a grid-enabled in-silico process with a shorter preparation time (< 1 month has been achieved)
- Testing new submission strategies to improve Grid efficiency
- AutoDock is used in DC II
- http://wisdom.eu-egee.fr
13. The DIANE framework
- DIANE: Distributed Analysis Environment
- A lightweight framework for parallel scientific applications in the master-worker model
  - ideal for applications without communication between parallel tasks (e.g. most bioinformatics applications analyzing huge amounts of independent data)
- The framework takes care of all synchronization, communication and workflow management details on behalf of the application
- http://cern.ch/diane
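The pull-model master-worker scheme can be sketched in a few lines (a hypothetical illustration, not the real DIANE API): the master holds a queue of independent tasks, and each worker pulls the next task as soon as it is free, which is what produces the load balancing shown in the job profiles later.

```python
import queue
import threading

def master(tasks):
    # The master only fills a shared queue; e.g. one task = one docking.
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    return q

def worker(q, results):
    # Pull model: the worker asks for work whenever it is idle,
    # instead of the master pushing a fixed share to each worker.
    while True:
        try:
            t = q.get_nowait()
        except queue.Empty:
            return                  # no work left, worker retires
        results.append(t * t)       # stand-in for running a docking
        q.task_done()

q = master(range(8))
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(3)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(sorted(results))  # all 8 tasks completed regardless of per-worker speed
```

Because faster workers simply come back for more tasks, no synchronization between tasks is needed, matching the "no communication between parallel tasks" assumption above.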
14. The DIANE AutoDock adapter for DC II
15. The DIANE exercise in DC II
- Taking care of the dockings for 1 variant
  - the mission is to complete 300 K dockings
- Taking a small subset of the resources
  - the mission is to handle several hundred concurrent DIANE workers with one DIANE master over a long period
- Testing the stability of the framework
- Evaluating the deployment effort and the usability of the framework
- Demonstrating efficient computing resource integration and usage
16. Statistics of one of the DIANE runs
- Submitted Grid jobs: 300
- Healthy jobs: 261 (87%)
- Total number of dockings: 40,210
- Total CPU time: 55,684,848 sec (1.76 years)
- Job duration: 249,746 sec (2.9 days)
- 9.24 CPU years ≈ 250 CPUs × two weeks
17. Development and deployment effort of DIANE
- Development effort
  - the AutoDock adapter for DC II is around 500 lines of Python code
- Deployment effort
  - the DIANE framework and the AutoDock adapter are installed on-the-fly on the Grid nodes
  - target and compound databases can be prepared on the UI or pre-stored on Grid storage
  - outputs are returned to the UI interactively
18. Intuitive user interface of DIANE
- Start the DIANE job and allocate 64 workers from LCG and the local cluster
- Allocate more workers from LCG if resources are available

diane.startjob job autodock.job ganga w 32@lcg,32@pbs
diane.ganga.submitworkers job autodock.job nw100 bklcg

# -- python --
Application = 'Autodock'
JobInitData = {'macro_repos':   'file:///home/hclee/diane_demo/autodock/macro',
               'ligand_repos':  'file:///home/hclee/diane_demo/autodock/ligand',
               'ligand_list':   '/home/hclee/diane_demo/biomed_dc2/ligand/ligands.list',
               'dpf_parafile':  '/home/hclee/diane_demo/biomed_dc2/parameters/dpf3gen.awk',
               'output_prefix': 'autodock_test'}

# The input files will be staged in to workers
InputFiles = [JobInitData['dpf_parafile']]
19. The profile of a DIANE job
[Figure: task timeline of a simple test on a local cluster, showing good load balance; one DIANE/AutoDock task = 1 docking]
20. The profile of a realistic DIANE job
- Each horizontal line segment: one task = one docking
- Unhealthy workers are removed from the worker list
- Failed tasks are rescheduled to healthy workers
21. Efficiency and throughput of DIANE
- 280 DIANE worker agents were submitted as LCG jobs
- 200 jobs (71%) were healthy
- 16 failures related to middleware errors
- 12 failures related to application errors
22. Logging and bookkeeping, thanks to GANGA

In [1]: print jobs["DIANE_6"]
Statistics: 325 jobs in slice("DIANE_6")
--------------------------------------------------------------------------
  id    status     name     subjobs  application  backend  backend.actualCE
1610    running    DIANE_6           Executable   LCG      melon.ngpp.ngp.org.sg:2119/jobmanager-lcgpbs-
1611    running    DIANE_6           Executable   LCG      node001.grid.auth.gr:2119/jobmanager-lcgpbs-b
1612    running    DIANE_6           Executable   LCG      polgrid1.in2p3.fr:2119/jobmanager-lcgpbs-biom
1613    failed     DIANE_6           Executable   LCG      polgrid1.in2p3.fr:2119/jobmanager-lcgpbs-sdj
1614    submitted  DIANE_6           Executable   LCG      ce01.ariagni.hellasgrid.gr:2119/jobmanager-pb
1615    running    DIANE_6           Executable   LCG      ce01.pic.es:2119/jobmanager-lcgpbs-biomed
1616    running    DIANE_6           Executable   LCG      ce01.tier2.hep.manchester.ac.uk:2119/jobmanag
1617    running    DIANE_6           Executable   LCG      clrlcgce03.in2p3.fr:2119/jobmanager-lcgpbs-bi

- Helpful for tracing execution progress and Grid job errors
- Fairly easy to visualize job statistics
23.
- The in-silico screening provides not only the docking poses of a compound against the target but also the docking energy
- By ranking this information, chemists can select the most promising compounds to carry forward into structure-based drug design for potential drugs
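The ranking step can be illustrated with a minimal sketch (hypothetical data and field names; compound IDs and energies are made up for illustration): keep the best, i.e. lowest, docking energy per compound, then sort.

```python
# Each docking result pairs a compound with the energy of one docked pose.
dockings = [
    {"compound": "ZINC0001", "energy": -7.2},
    {"compound": "ZINC0002", "energy": -9.1},
    {"compound": "ZINC0001", "energy": -8.4},  # another pose, same compound
    {"compound": "ZINC0003", "energy": -6.0},
]

best = {}
for d in dockings:                  # keep the best (lowest) energy per compound
    c, e = d["compound"], d["energy"]
    if c not in best or e < best[c]:
        best[c] = e

# Most negative energy first: the shortlist a chemist would inspect.
ranked = sorted(best.items(), key=lambda kv: kv[1])
print(ranked[0][0])  # prints "ZINC0002", the top-ranked compound
```

In the real challenge this shortlist, rather than the full 2.4 million results, is what goes on to structure-based drug design.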
24. Conclusion
- From the biological point of view
  - We managed to shorten the molecular docking process of structure-based drug design from 137 years to 4 weeks
  - A large set of complexes has been produced on the Grid for further analysis
- From the Grid point of view
  - The DC has demonstrated that large-scale scientific challenges can be tackled on the Grid with little effort
  - The WISDOM system has successfully reproduced the massive throughput of in-silico screening with minimal deployment effort
  - The DIANE framework, which can take control of Grid failures and isolate Grid system latency, benefits Grid applications in terms of efficiency, stability and usability
- Moving toward a service
  - The stability and reliability of the Grid have been tested through the DC activity, and the results encourage the move from prototype to real service
  - A friendly graphical user interface for the upcoming analysis of the large set of outputs is needed
25. Thank you for your attention!