Large Scale Virtual Screening of Drug Design on the Grid Fighting against Avian Flu - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Large Scale Virtual Screening of Drug Design on the Grid Fighting against Avian Flu


1
Large Scale Virtual Screening of Drug Design on the Grid
Fighting against Avian Flu
  • Yun-Ta Wu and Hurng-Chun Lee
  • ISGC 2006, Taiwan

2
Credit
  • Docking workflow preparation
  • Contact point: Y.T. Wu
  • E. Rovida
  • P. D'Ursi
  • N. Jacq
  • Grid resource management
  • Contact point: J. Salzemann
  • TWGrid: H.C. Lee, H. Y. Chen
  • AuverGrid: E. Medernach
  • EGEE: Y. Legré
  • Platform deployment on the Grid
  • Contact point: H.C. Lee, J. Salzemann
  • M. Reichstadt
  • N. Jacq
  • Users (deputy)
  • J. Salzemann (N. Jacq)
  • M. Reichstadt (E. Medernach)
  • L. Y. Ho (H. C. Lee)
  • I. Merelli, C. Arlandini (L. Milanesi)
  • J. Montagnat (T. Glatard)
  • R. Mollon (C. Blanchet)
  • I. Blanque (D. Segrelles)
  • D. Garcia

Grid operational support from sites and operation centers
DIANE technical support from the CERN-ARDA group
3
Outline
  • The avian flu
  • EGEE biomed data challenge II
  • Conclusion

4
Influenza A pandemic
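[Figure: influenza A pandemic subtypes, named after the HA and NA surface proteins: H1N1, H1N1, H2N2, H3N2, H1N1 (year labels: 2005, 2006)]
Apr 21, 2006: 113 deaths / 204 cases
http://www.who.int/csr/disease/avian_influenza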
5
A closer look at bird flu
  • The bird flu virus is named H5N1. H5 and N1
    correspond to the names of the proteins
    (hemagglutinins and neuraminidases) on the virus
    surface.
  • Neuraminidases play a major role in the virus
    multiplication
  • Present drugs such as Tamiflu inhibit the action
    of neuraminidases and stop the virus
    proliferation
  • The N1 protein is known to evolve into variants
    if it comes under drug stress
  • To free up medicinal chemists' time so they can
    better respond to immediate, large-scale threats,
    a large-scale in-silico screening was set up as
    an initial investment in the design of new drugs

6
In-silico (virtual) screening of drug design
  • Computer-based in-silico screening can help to
    identify the most promising leads for biological
    tests
  • systematic and productive
  • reduces the cost of the trial-and-error approach
  • The required CPU power and storage space
    increase in proportion to the number of compounds
    and target receptors involved in the screening
  • massive virtual screening is time consuming

7
The computing challenge of large scale in-silico
screening
  • Molecular docking engine
  • Autodock
  • FlexX
  • Problem size
  • 8 predicted possible variants of Influenza A
    neuraminidase N1 as targets
  • around 300 K compounds from ZINC database and a
    chemical combinatorial library
  • Computing challenge (a rough estimate based on a
    Xeon 2.8 GHz)
  • Each docking requires 30 min of CPU time
  • Required computing power in total: 137 CPU years
  • Storage requirement
  • Each docking produces about 130 KByte of
    results
  • Required storage space in total: 600 GByte (with
    1 back-up)
  • To speed up and reduce the cost of developing new
    drugs, high-throughput screening is needed
  • That's where the Grid can help!
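
The totals on this slide can be checked with a short back-of-the-envelope calculation. The sketch below uses only the slide's figures; the 365-day year and the decimal unit convention (1 GByte = 10^6 KByte) are assumptions, and the function names are illustrative.

```python
# Back-of-the-envelope sizing from the figures quoted on the slide.
TARGETS = 8                 # predicted neuraminidase N1 variants
COMPOUNDS = 300_000         # "around 300 K compounds"
MINUTES_PER_DOCKING = 30    # rough measurement on a Xeon 2.8 GHz
KBYTE_PER_DOCKING = 130
COPIES = 2                  # results plus 1 back-up


def cpu_years(dockings, minutes_each):
    """Total CPU time in years, assuming a 365-day year."""
    return dockings * minutes_each / (60 * 24 * 365)


def storage_gbyte(dockings, kbyte_each, copies):
    """Total storage in GByte, assuming 1 GByte = 10**6 KByte."""
    return dockings * kbyte_each * copies / 1e6


dockings = TARGETS * COMPOUNDS
print("dockings:", dockings)
print("CPU years:", round(cpu_years(dockings, MINUTES_PER_DOCKING)))
print("storage (GByte):", round(storage_gbyte(dockings, KBYTE_PER_DOCKING, COPIES)))
```

Under these assumptions the CPU estimate reproduces the 137 CPU years on the slide, and the storage estimate comes out slightly above the rounded 600 GByte figure.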

8
EGEE Biomed DC II objectives
  • Biological goal
  • finding potential compounds that can inhibit the
    activities of Influenza A neuraminidase N1
    subtype variants
  • Biomedical goal
  • accelerating the discovery of novel potent
    inhibitors by minimizing non-productive
    trial-and-error approaches
  • Grid goal
  • massive throughput: reproducing a grid-enabled
    in-silico process (exercised in DC I) with a
    shorter preparation time
  • interactive feedback: evaluating an alternative
    light-weight grid application framework (DIANE)
    in terms of stability, scalability and efficiency

9
EGEE Biomed DC II grid resources
  • AuverGrid
  • BioinfoGrid
  • EGEE-II
  • Embrace
  • TWGrid
  • a world-wide infrastructure providing more than
    5,000 CPUs

10
EGEE biomed DC II current status
  • The first DC job was submitted on 10 Apr 2006
  • It is scheduled to finish in mid-May
  • As of today, we have completed 1500K dockings
  • 60% of the whole challenge (i.e. 82 CPU years)
  • Grid efficiency: 80%

11
EGEE Biomed DC II: the Grid tools
  • WISDOM
  • successfully handled the first EGEE biomed DC
  • a workflow for Grid job handling: automated job
    submission, status check and report, error
    recovery
  • push model job scheduling
  • batch mode job handling
  • DIANE
  • a framework for applications with master-worker
    model
  • pull mode job scheduling
  • interactive mode job handling with flexible
    failure recovery feature
  • we will focus on this framework in the following
    discussions
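
The difference between push-mode and pull-mode scheduling can be illustrated with a small simulation. This is a minimal sketch, not code from WISDOM or DIANE: it assumes equal-sized tasks, a static round-robin split for the push model, and hypothetical per-worker docking times where one worker is four times slower than the others.

```python
import heapq


def push_makespan(n_tasks, task_times):
    """Push model: tasks are dealt out round-robin before the run starts."""
    n = len(task_times)
    counts = [n_tasks // n + (1 if i < n_tasks % n else 0) for i in range(n)]
    # The run ends when the most heavily loaded (or slowest) worker finishes.
    return max(count * t for count, t in zip(counts, task_times))


def pull_makespan(n_tasks, task_times):
    """Pull model: whichever worker becomes idle first takes the next task."""
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(task_times))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        now, i = heapq.heappop(free_at)
        done = now + task_times[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


# Hypothetical minutes per docking on four workers; the last one is slow.
minutes_per_task = [30.0, 30.0, 30.0, 120.0]
print("push:", push_makespan(100, minutes_per_task))
print("pull:", pull_makespan(100, minutes_per_task))
```

In this toy setting the pull model finishes much earlier, because fast workers keep taking tasks instead of waiting for the slow one to work through a fixed share.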

12
The WISDOM workflow in DC2
  • Developed for the 1st data challenge, fighting
    against Malaria
  • 40 million dockings (80 CPU years) were done in 6
    weeks
  • 1700 CPUs in 15 countries were used
    simultaneously
  • Reproducing a grid-enabled in-silico process
    with a shorter preparation time (< 1 month
    preparation time has been achieved)
  • Testing a new submission strategy to improve the
    Grid efficiency

Use AutoDock in DC2
http://wisdom.eu-egee.fr
13
The DIANE framework
  • DIANE: Distributed Analysis Environment
  • A lightweight framework for parallel scientific
    applications in the master-worker model
  • ideal for applications without communication
    between parallel tasks (e.g. most
    Bioinformatics applications that analyze a huge
    number of independent datasets)
  • The framework takes care of all synchronization,
    communication and workflow management details on
    behalf of the application

http://cern.ch/diane
14
The DIANE Autodock adapter for DC2
15
The DIANE exercise in DC2
  • Taking care of the dockings of 1 variant
  • the mission is to complete 300 K dockings
  • Taking a small subset of the resources
  • the mission is to handle several hundred
    concurrent DIANE workers by one DIANE master for
    a long period
  • Testing the stability of the framework
  • Evaluating the deployment efforts and the
    usability of the framework
  • Demonstrating efficient computing resource
    integration and usage

16
Statistics of one of the DIANE runs
  • Submitted Grid jobs 300
  • Healthy jobs 261 (87%)
  • Total number of dockings 40210
  • Total CPU time 55684848 sec (1.76 year)
  • Job duration 249746 sec (2.9 days)

9.24 CPU years ≈ 250 CPUs x two weeks
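
The unit conversions in these run statistics can be reproduced directly. The sketch below takes the five raw numbers from the slide; the CPU time per docking and the average number of busy CPUs are derived here for illustration and do not appear in the original, and a 365-day year is assumed.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: 365-day year

# Raw figures from the slide.
submitted_jobs = 300
healthy_jobs = 261
dockings = 40_210
total_cpu_sec = 55_684_848
job_duration_sec = 249_746

# Conversions quoted on the slide.
healthy_fraction = healthy_jobs / submitted_jobs
cpu_years = total_cpu_sec / SECONDS_PER_YEAR
duration_days = job_duration_sec / SECONDS_PER_DAY

# Derived quantities (not on the slide).
cpu_min_per_docking = total_cpu_sec / dockings / 60
avg_busy_cpus = total_cpu_sec / job_duration_sec

print(f"healthy jobs: {healthy_fraction:.0%}")
print(f"CPU time: {cpu_years:.2f} years")
print(f"duration: {duration_days:.1f} days")
print(f"CPU minutes per docking: {cpu_min_per_docking:.1f}")
print(f"average busy CPUs: {avg_busy_cpus:.0f}")
```

The derived CPU time per docking is in line with the rough 30-minute estimate used for sizing the challenge.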
17
Development and deployment efforts of DIANE
  • Development efforts
  • The Autodock adapter for DC2 is around 500 lines
    of Python code
  • Deployment efforts
  • The DIANE framework and Autodock adapter are
    installed on-the-fly on the Grid nodes
  • Targets and compound databases can be prepared on
    the UI or pre-stored on the Grid storage
  • Outputs are returned to the UI interactively

18
Intuitive user interface of DIANE
  • Start the DIANE job and allocate 64 workers from
    LCG and local cluster

diane.startjob job autodock.job ganga w 32@lcg,32@pbs

  • Allocate more workers from LCG if resources are
    available

diane.ganga.submitworkers job autodock.job nw100 bklcg

# -*- python -*-
Application = 'Autodock'
JobInitData = {'macro_repos'  : 'file:///home/hclee/diane_demo/autodock/macro',
               'ligand_repos' : 'file:///home/hclee/diane_demo/autodock/ligand',
               'ligand_list'  : '/home/hclee/diane_demo/biomed_dc2/ligand/ligands.list',
               'dpf_parafile' : '/home/hclee/diane_demo/biomed_dc2/parameters/dpf3gen.awk',
               'output_prefix': 'autodock_test'}
# The input files will be staged in to workers
InputFiles = [JobInitData['dpf_parafile']]

19
The profile of DIANE job
[Figure: job profile of a simple test on a local cluster, showing good load balance; one DIANE/Autodock task = one docking]
20
The profile of realistic DIANE job
  • Each horizontal line segment = one task = one
    docking
  • Unhealthy workers are removed from the worker
    list
  • Failed tasks are rescheduled to healthy workers
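
The failure handling described on this slide can be sketched as a simple master loop. This is an illustrative toy, not DIANE code: `run_master` and the `is_healthy` callback are hypothetical names, and a worker is treated as permanently unhealthy after its first failure.

```python
from collections import deque


def run_master(tasks, workers, is_healthy):
    """Hand tasks to workers; drop workers that fail and requeue their task.

    is_healthy(worker) -> bool is a stand-in for a real health check.
    Returns (task -> worker that completed it, remaining healthy workers).
    """
    queue = deque(tasks)
    active = list(workers)
    done = {}
    while queue:
        if not active:
            raise RuntimeError("no healthy workers left")
        for worker in list(active):
            if not queue:
                break
            task = queue.popleft()
            if is_healthy(worker):
                done[task] = worker
            else:
                active.remove(worker)  # unhealthy worker leaves the worker list
                queue.append(task)     # its task is rescheduled
    return done, active


# Hypothetical run: ten dockings, three workers, one of which is unhealthy.
done, active = run_master(range(10), ["w1", "w2", "w3"], lambda w: w != "w2")
print(len(done), "tasks completed by", sorted(set(done.values())))
```

All ten tasks still complete, because the task that landed on the unhealthy worker goes back into the queue and is picked up by a healthy one.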

21
Efficiency and throughput of DIANE
  • 280 DIANE worker agents were submitted as LCG
    jobs
  • 200 jobs (71%) were healthy
  • 16% failures related to middleware errors
  • 12% failures related to application errors

22
Logging and bookkeeping: thanks to GANGA
In [1]: print jobs['DIANE_6']
Statistics: 325 jobs
slice("DIANE_6")
--------------
id    status     name     subjobs  application  backend  backend.actualCE
1610  running    DIANE_6           Executable   LCG      melon.ngpp.ngp.org.sg:2119/jobmanager-lcgpbs-
1611  running    DIANE_6           Executable   LCG      node001.grid.auth.gr:2119/jobmanager-lcgpbs-b
1612  running    DIANE_6           Executable   LCG      polgrid1.in2p3.fr:2119/jobmanager-lcgpbs-biom
1613  failed     DIANE_6           Executable   LCG      polgrid1.in2p3.fr:2119/jobmanager-lcgpbs-sdj
1614  submitted  DIANE_6           Executable   LCG      ce01.ariagni.hellasgrid.gr:2119/jobmanager-pb
1615  running    DIANE_6           Executable   LCG      ce01.pic.es:2119/jobmanager-lcgpbs-biomed
1616  running    DIANE_6           Executable   LCG      ce01.tier2.hep.manchester.ac.uk:2119/jobmanag
1617  running    DIANE_6           Executable   LCG      clrlcgce03.in2p3.fr:2119/jobmanager-lcgpbs-bi
  • Helpful for tracing the execution progress and
    Grid job errors
  • Fairly easy to visualize the job statistics

23
  • The in-silico screening provides not only the
    docking poses of a compound against the target
    but also the docking energy
  • By ranking this information, chemists can select
    the most promising compounds to carry forward
    into structure-based drug design
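
The ranking step can be sketched in a few lines. The compound names and energies below are made up for illustration, `rank_by_energy` is a hypothetical helper, and the sketch assumes that a lower (more negative) docking energy indicates a more favorable pose.

```python
def rank_by_energy(results, top_n):
    """Return the top_n (compound, energy) pairs, lowest energy first."""
    return sorted(results.items(), key=lambda item: item[1])[:top_n]


# Made-up compound IDs and docking energies, for illustration only.
results = {"cmpd_A": -7.2, "cmpd_B": -9.8, "cmpd_C": -5.1, "cmpd_D": -8.4}

top = rank_by_energy(results, 2)
for compound, energy in top:
    print(compound, energy)
```

In practice the ranked list would be combined with inspection of the docking poses before any compound is selected.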

24
Conclusion
  • From a biological point of view
  • We managed to shorten the molecular docking
    process of structure-based drug design from 137
    years to 4 weeks
  • A large set of complexes has been produced on the
    Grid for further analysis
  • From a Grid point of view
  • The DC has demonstrated that a large-scale
    scientific challenge can be effortlessly tackled
    on the Grid
  • The WISDOM system has successfully reproduced the
    massive throughput of in-silico screening with
    minimized deployment effort
  • The DIANE framework, which can take control of
    Grid failures and isolate Grid system latency,
    benefits the Grid application in terms of
    efficiency, stability and usability
  • Moving toward a service
  • The stability and reliability of the Grid have
    been tested through the DC activity, and the
    result encourages the move from prototype to
    real service
  • User-friendly graphical interfaces are needed
    for the upcoming analysis of the large set of
    outputs

25
Thank you for your attention!!