Title: Drug Discovery Grid -- A real grid application
Zhang Wenju, Shen Jianhua
Shanghai Institute of Materia Medica, CAS
Shanghai Jiaotong University
Jiangnan Institute of Computing
The University of Hong Kong
- DDGrid Introduction
- DDGrid Architecture
- DDGrid Application
- DDGrid Demo
Large-scale High-throughput Virtual Screening
- In silico
  - The computational analysis of chemical databases to identify compounds appropriate for a given biological receptor
- In vitro
  - Identification of new compounds showing some activity against a target biological receptor, and the progressive optimization of these leads to yield a compound with improved potency and physicochemical properties in vitro
- In vivo
  - Eventually, improved efficacy, pharmacokinetics, and toxicological profiles in vivo
Process of Drug Discovery and Design
Leads and Optimization
2-3 years
2-3 years
Random Screening: 10,000-20,000 compounds
Drug Candidate
Computer-Aided Drug Design
2-3 years
Clinic (phase I, II, III)
3-4 years
- Time: 10-12 years
- Money: several billion dollars
DDGrid overview
- The Drug Discovery Grid project aims to build a collaboration platform for drug discovery using state-of-the-art grid computing technology.
- The project intends to tackle large-scale, computation- and data-intensive scientific applications in medicinal chemistry and molecular biology with the help of grid middleware developed by our team.
- A database of over one million compounds, with 3-D structures and physicochemical properties, is also provided to identify potential drug candidates. Users can also build and maintain their own customized ligand databases to share on this grid platform.
DDGrid Architecture
[Figures: DDGrid architecture diagrams]
DDGrid Workflow
Job submission; job ID and results returned
Global Server (monitoring, work pool, resource management, result assimilation)
  Job dispatch (downward); return of results and new job requests (upward)
Slave Server (local resource management, monitoring, local work pool, result assimilation)
  Job dispatch (downward); return of results and new job requests (upward)
Computational Client (docking)
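The three-tier flow above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: the class names `GlobalServer` and `SlaveServer`, the `dock` placeholder, and the batch size are all hypothetical, and no real DDGrid middleware call is used. It shows a global work pool feeding local work pools, clients pulling one job each per round, and results being assimilated upward.

```python
from collections import deque


def dock(compound):
    """Placeholder for a docking run on a computational client.

    The score is a deterministic dummy value, not a real docking score.
    """
    return {"compound": compound, "score": -(sum(map(ord, compound)) % 100) / 10}


class SlaveServer:
    """Holds a local work pool and a number of computational clients (assumed model)."""

    def __init__(self, name, n_clients):
        self.name = name
        self.n_clients = n_clients
        self.local_pool = deque()
        self.results = []

    def request_work(self, global_pool, batch_size):
        # "New job request": top up the local pool from the global pool.
        while global_pool and len(self.local_pool) < batch_size:
            self.local_pool.append(global_pool.popleft())

    def run_round(self):
        # "Job dispatch": at most one job per client in each round.
        for _ in range(min(self.n_clients, len(self.local_pool))):
            self.results.append(dock(self.local_pool.popleft()))

    def drain_results(self):
        # "Return of result": hand finished results to the global server.
        done, self.results = self.results, []
        return done


class GlobalServer:
    """Holds the global work pool and assimilates results from slave servers."""

    def __init__(self, slaves):
        self.slaves = slaves
        self.work_pool = deque()
        self.assimilated = {}
        self._next_job_id = 0

    def submit(self, compounds):
        # "Job submit" followed by "ID return".
        job_id = self._next_job_id
        self._next_job_id += 1
        self.work_pool.extend(compounds)
        return job_id

    def run(self, batch_size=4):
        while self.work_pool or any(s.local_pool for s in self.slaves):
            for slave in self.slaves:
                slave.request_work(self.work_pool, batch_size)
                slave.run_round()
                for result in slave.drain_results():
                    self.assimilated[result["compound"]] = result["score"]
        return self.assimilated


server = GlobalServer([SlaveServer("site-a", 2), SlaveServer("site-b", 3)])
job_id = server.submit([f"cmpd{i}" for i in range(10)])
results = server.run()
```

Keeping a local pool at each slave server means a site can keep its clients busy between requests to the global server, which is one plausible reason for the two-level layout shown on the slide.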
DDGrid security
1. PKI-based security
2. All the sites involved must hold a certificate issued by our CA
3. All deployed databases and results are encrypted
4. All the message passing are
DDGrid Web Portal
Test Case 1
- Virtual screening of 20,000 compounds
- Involved sites
  - Shanghai Inst. of Materia Medica (SIMM): Alpha Cluster (32 CPUs)
  - Beijing Mol. Ltd.: Sunway Cluster (224 CPUs)
  - The Univ. of Hong Kong: Gideon Cluster (16 CPUs)
  - Shanghai SuperComp. Centre: Dawning 4000A
  - Dalian Univ. of Tech.: Dawning 4000A
  - London e-Science Centre: Mars Cluster
- Time consumed
  - 5,946 s (approx. 99 min)
- Data sets (CDB)
  - Specs
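The figures in this test case allow a quick throughput estimate. The sketch below uses only the numbers on the slide (20,000 compounds, 5,946 seconds, and the three CPU counts that are stated); the function name `screening_stats` is illustrative. The CPU counts for the Dawning 4000A and Mars clusters are not given, so no per-CPU rate is computed.

```python
def screening_stats(n_compounds, elapsed_seconds):
    """Derive simple wall-clock throughput figures for a screening run."""
    return {
        "minutes": elapsed_seconds / 60,
        "compounds_per_second": n_compounds / elapsed_seconds,
        "seconds_per_compound": elapsed_seconds / n_compounds,
    }


# Values taken from the slide.
stats = screening_stats(n_compounds=20_000, elapsed_seconds=5_946)

# Only the CPU counts that the slide states explicitly.
KNOWN_CPUS = {"SIMM Alpha": 32, "Beijing Mol. Ltd. Sunway": 224, "HKU Gideon": 16}
known_cpu_total = sum(KNOWN_CPUS.values())

print(f"{stats['minutes']:.1f} min, "
      f"{stats['compounds_per_second']:.2f} compounds/s, "
      f"{known_cpu_total} CPUs with a stated count")
```

This works out to about 3.4 compounds per second of wall-clock time across all participating sites.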
Job scheduling
Visualisation of Docking Result
DDGrid message passing
<scheduler_request>
  <authenticator>3333</authenticator>
  <hostid>102</hostid>
  ...or_version>
  <core_client_minor_version>19</c...
  <idle_ncpu>16</idle_ncpu>
  <project_disk_usage...
  ...e>
  <code_sign_key> </code_sign_key>
  <projects>
    <project>
      ...rl>
      <resource_share>100.000000</resource_share>
    </project>
  </projects>
  <result> </result>
  <host_info> </host_info>
</scheduler_request>
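A request of this shape can be assembled with Python's standard `xml.etree.ElementTree` module. The sketch below is an illustration only: the function name `build_scheduler_request` is hypothetical, and it covers only the elements and values that are fully legible on the slide (`authenticator`, `hostid`, `idle_ncpu`, and `resource_share` nested under `projects/project`). The truncated elements are left out rather than guessed.

```python
import xml.etree.ElementTree as ET


def build_scheduler_request(authenticator, hostid, idle_ncpu, resource_share):
    """Build a minimal scheduler request document as an XML string."""
    root = ET.Element("scheduler_request")
    ET.SubElement(root, "authenticator").text = str(authenticator)
    ET.SubElement(root, "hostid").text = str(hostid)
    ET.SubElement(root, "idle_ncpu").text = str(idle_ncpu)

    projects = ET.SubElement(root, "projects")
    project = ET.SubElement(projects, "project")
    # The slide shows the share with six decimal places.
    ET.SubElement(project, "resource_share").text = f"{resource_share:.6f}"

    return ET.tostring(root, encoding="unicode")


# Values as they appear on the slide.
request_xml = build_scheduler_request("3333", 102, 16, 100.0)
print(request_xml)
```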
DDGrid message passing
<scheduler_reply>
  <message priority="low">No work available</message>
  <project_name>Ddg</project_name>
  <user_name>sss</user_name>
  <code_sign_key> </code_sign_key>
  <workunit> </workunit>
  <preferences>
    <low_water_days>1.2</low_water_days>
    <high_water_days>2.5<...
    ...b>
  </preferences>
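On the client side, a reply like this has to be inspected to decide whether any work was handed out. The following is a minimal sketch, not the DDGrid client's actual logic: `SAMPLE_REPLY` is a well-formed document built from the legible fragments of the slide (the closing tags and the omission of the truncated elements are assumptions), and `summarize_reply` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed completion of the slide's fragments; truncated elements are omitted.
SAMPLE_REPLY = """<scheduler_reply>
  <message priority="low">No work available</message>
  <project_name>Ddg</project_name>
  <user_name>sss</user_name>
  <preferences>
    <low_water_days>1.2</low_water_days>
    <high_water_days>2.5</high_water_days>
  </preferences>
</scheduler_reply>"""


def summarize_reply(xml_text):
    """Extract the server message, work availability, and preference values."""
    root = ET.fromstring(xml_text)
    message = root.find("message")
    prefs = root.find("preferences")

    def pref(name):
        if prefs is None or prefs.find(name) is None:
            return None
        return float(prefs.findtext(name))

    return {
        "message": None if message is None else message.text,
        "priority": None if message is None else message.get("priority"),
        # A workunit element counts as work only if it has child elements.
        "has_work": any(len(wu) > 0 for wu in root.findall("workunit")),
        "low_water_days": pref("low_water_days"),
        "high_water_days": pref("high_water_days"),
    }


summary = summarize_reply(SAMPLE_REPLY)
print(summary)
```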
DDGrid message passing
<workunit>
  <file_info> <number>0</number> </file_info>
  <file_info> <number>1</number> </file_info>
  <file_info> <number>2</number> </file_info>
  <file_ref> <open_name>tabfile</open_name> </file_ref>
  <file_ref> <file_number>1</file_number> </file_ref>
  <file_ref> <open_name>sphfile</open_name> </file_ref>
  <command_line>-business</command_line>
</workunit>
DDGrid message passing
<project>
  <scheduler_url>http://www.ddgrid.a...
  ...rl>
  <project_name>Ddg</project_name>
</project>
<app>
  <name>gridapp</name>
</app>
<file_info>
  <name>gridapp/gridapp_2.19_i686-pc-linux-gnu</name>
  <nbytes>260754.000000</nbytes>
  <executable/>
  <signature_required/>
  <file_signature> </file_signature>
  ...2.19_i686-pc-linux-gnu</url>
</file_info>
<file_info> </file_info>
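The executable name on this slide appears to encode the application name, its version, and the target platform. The sketch below assumes a naming pattern of `<app>_<major>.<minor>_<platform>`, inferred from this single example rather than from any DDGrid specification; `parse_app_file_name` is a hypothetical helper. Such a split would let a server pick the right binary for each of the heterogeneous platforms in the grid.

```python
import re

# Assumed pattern: <app>_<major>.<minor>_<platform>, inferred from one example.
FILE_NAME_PATTERN = re.compile(
    r"^(?P<app>[A-Za-z0-9]+)_(?P<major>\d+)\.(?P<minor>\d+)_(?P<platform>.+)$"
)


def parse_app_file_name(name):
    """Split an executable file name into app name, version, and platform."""
    base = name.rsplit("/", 1)[-1]  # drop any leading directory component
    match = FILE_NAME_PATTERN.match(base)
    if match is None:
        raise ValueError(f"unrecognized executable name: {name!r}")
    return {
        "app": match.group("app"),
        "version": (int(match.group("major")), int(match.group("minor"))),
        "platform": match.group("platform"),
    }


info = parse_app_file_name("gridapp/gridapp_2.19_i686-pc-linux-gnu")
print(info)
```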
DDGrid Resources
Computational and Data Resources Integration
Resources aggregated:
- SIMM Sunway 32A Cluster
- Beijing Molecule Inc. Sunway 256P Cluster
- HKU Gideon 300 Cluster
- SSC Dawning 4000A
- LeSC Mars Cluster (test only)
- Singapore Poly-tech Univ.
- Dalian Univ. of Technology
- Shanghai Jiaotong Univ.
Heterogeneous resources:
- OS: IRIX, Digital Unix, Linux (IA32, x86_64)
- CPU: R12000, Alpha, Pentium, AMD
DDGrid Resources
DDGrid Apps.
1. Docking pre-processing software
   - Combimark
2. Docking software
   1) Dock (UCSF)
   2) gsDock (SIMM)
3. CDB build and maintenance software
   - Combilib
4. AutoDock
5. AutoGrid
6. Visualisation
7. Security-related tools
DDGrid Resources
Chemical Databases (CDB)
Each ligand record in a chemical database represents the 3D structural information of a compound. The number of compounds in each CDB can be on the order of tens of thousands, and the database size can range from tens of megabytes to gigabytes and even terabytes.
1. Static databases purchased from commercial chemical companies:
   - Available Chemicals Directory (ACD)
   - Chinese natural product database (CNPD)
   - SPECS database
   - chemical ADME/T database, etc.
2. Dynamic databases built by users themselves, and deployed
Deployed commercial CDB (approx. 700,000 compounds)
Name of Database   Description
Specs              Provides about 230,000 compounds
CMC-3D             Provides 3D models and important biochemical properties (including drug class, logP, and pKa values) for over 8,400 pharmaceutical compounds
ACD-3D             Provides 200,000 commercially available compounds in 3D
NCI-3D             213,000 compounds with 2D information from the National Cancer Institute
CNPD               Collects 12,000 Chinese natural products with chemical structures
TCMD               Contains 9,127 compounds and 3,922 herbs
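The approximate total quoted for the deployed databases can be checked against the individual entries. The sketch below uses only the counts listed in the table; several of them are themselves approximate ("about", "over"), so the sum is indicative only. The listed figures add up to 672,527, which is consistent with the stated figure of approximately 700,000.

```python
# Compound counts as listed in the table (some are rounded in the source).
DEPLOYED_CDB = {
    "Specs": 230_000,
    "CMC-3D": 8_400,
    "ACD-3D": 200_000,
    "NCI-3D": 213_000,
    "CNPD": 12_000,
    "TCMD": 9_127,
}

total_compounds = sum(DEPLOYED_CDB.values())

# List the databases from largest to smallest with their share of the total.
for name, count in sorted(DEPLOYED_CDB.items(), key=lambda item: -item[1]):
    print(f"{name:<8}{count:>9,}  {count / total_compounds:6.1%}")
print(f"{'Total':<8}{total_compounds:>9,}")
```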
Approx. 3,300,000 compounds
Vendor           Num. of Mol.   Vendor                      Num. of Mol.
ACB-Eurochem     98603          Maybridge                   53042
Ambinter         533866         Nanosyn                     68317
Asinex           293385         National Cancer Institute   223536
ChemBridge       562624         Otava                       181195
ChemDiv          361859         Peakdale                    9632
ComGenex         38590          Pharmeks                    116355
Enamine          533111         PubChem                     164031
IBScreen         452728         Ryan Scientific             64205
InterChim        288882         Sigma-Aldrich               49022
KeyOrganics      22294          Specs                       307550
Life Chemicals   44762          TimTec                      127173
CDB example: CNPD (China Natural Products Database)
CNPD is the first and only comprehensive source of chemical, structural and bibliographic data on all known natural products in China. It serves as an information source for chemical, physical and biological properties and for literature, which makes it useful to scientists in the pharmaceutical industry.
CNPD can be searched in flexible ways: by structure, sub-structure, name, molecular formula, molecular weight, CAS registry number, category, etc.
Traditional Chinese Medicine (TCM) applications are pre-indexed in CNPD to provide hints for lead compounds.
CDB example: TCMD (Traditional Chinese Medicine Database)
TCMD is a bibliographical database of approximately 20,000 records with abstracts of TCM articles. Relevant articles are selected from among 150-200 journals from Mainland China, Taiwan, and Hong Kong (most of them in Chinese). English abstracts are written for the selected articles, and other pertinent information is translated into English.
DDGrid applications in reality
- SIMM carried out anti-SARS and anti-diabetes drug research using the DDGrid
  - Anti-SARS drug research
  - Anti-diabetes drug research
Research on Anti-SARS medicine
Virtual screening of the Comprehensive Medicinal Chemistry-3D (CMC-3D) database, which contains 7,900 compounds, found that cinanserin has a distinct anti-SARS effect.
- Department of Virology, Bernhard-Nocht-Institute for Tropical Medicine, Germany
- Research Department, Cantonal Hospital St Gallen, Switzerland
"Basically your inhibitor turned out to be the best compound we have tested so far!"
Applied for domestic patent 03129071.x and PCT patent pi034248.
Research on anti-diabetes medicine
Found an anti-diabetes lead better than Rosiglitazone by targeting PPAR, through virtual screening, optimization design, synthesis, and biological and pharmacological testing.
CADD process
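Research on anti-diabetes medicine
[Figure: screening flowchart; labels listed in extraction order]
Compound counts: 2.4 m, 400 t, 10 t
Screening stages: composite design, virtual screening, virtual screening, manual screening
48: KD < 1 mM; 22: KD < 0.1 mM
Testing stages: protein testing, protein testing, cell testing, animal testing, comprehensive evaluation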
New anti-diabetes drug
Current Progress
1. Applied for patent 200410016460.X and a PCT patent
2. Safety testing and preclinical research
What does the DDGrid provide?
1. Drug design collaboration platform
   - Large-scale virtual screening platform
   - Sharing of large CDB
2. Computational resources sharing
   - SIMM/SSC/HKU/Mol. Ltd/SJTU/DUT
3. Data resources sharing
   - Pre-deployed commercial CDB (ACD/CNPD)
   - Sharing of self-made CDB
4. Medicinal chemistry text and structure search
5. Customization and
Selected Users of DDGrid
DDGrid Demo
Thank you!