Title: Building a Massive Virtual Screening using Grid Infrastructure
1Building a Massive Virtual Screening using Grid Infrastructure
- Putchong Uthayopas
- High Performance Computing and Networking Center, Kasetsart University
- Chak Sangma
- Centre for Cheminformatics, Kasetsart University
2Motivation
- Thailand's medicinal plants are important to Thai society
- Over 1,000 species
- Over 200,000 compounds
- Multiple disease targets
- Problem
- No complete database of compounds exists
- Practice still relies mostly on local knowledge and conventional wisdom
- Lack of systematic verification by scientific methods
[Pictured: Asiatic pennywort; Barleria lupulina Lindl.]
3Kasetsart University Thai Medicinal Plants Effort
- Led by the Center for Cheminformatics, Kasetsart University (Dr. Chak Sangma)
- Goal
- Establish a Thai medicinal plant knowledge base by building a 3D molecular database
- Employ virtual screening to verify active compounds against conventional knowledge
4Reports and Literatures
2D Structures
Approximated 3D Structures
Compute Intensive!
Optimized 3D Structures with GAMESS
Calculated Binding Energy with Autodock 3.0
Structure in 0.5 Å from Binding Site
SOM Neural Network Map
Results
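The stages listed on this slide can be read as a per-compound pipeline. The following is a minimal sketch of that ordering only: the stage functions are placeholders that record that they ran, the names `Compound`, `make_stage`, and `run_pipeline` are hypothetical, and no GAMESS or AutoDock call is made.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Compound:
    """A compound moving through the screening stages (illustrative only)."""
    name: str
    history: List[str] = field(default_factory=list)


Stage = Callable[[Compound], Compound]


def make_stage(label: str) -> Stage:
    """Return a placeholder stage that only records that it ran."""
    def stage(compound: Compound) -> Compound:
        compound.history.append(label)
        return compound
    return stage


# Stage order follows the slide; the real work is done by external programs.
PIPELINE: List[Stage] = [
    make_stage("2D structure"),
    make_stage("approximate 3D structure"),
    make_stage("optimized 3D structure (GAMESS)"),
    make_stage("binding energy (AutoDock)"),
]


def run_pipeline(compound: Compound) -> Compound:
    """Apply every stage in order to one compound."""
    for stage in PIPELINE:
        compound = stage(compound)
    return compound
```

Because each compound passes through the stages independently of the others, the compounds can be spread across many processors, which is what makes the workload a good fit for a Grid.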
5ThaiGrid Drug Design Portal
- Partners
- High Performance Computing and Networking Center, KU
- Center for Cheminformatics, KU
- IBM Thailand
- Goal
- Build a virtual screening infrastructure on the ThaiGrid system
- Start from the KU campus Grid and extend to other ThaiGrid partner universities later
- Links
- http://tgcc.cpe.ku.ac.th
- http://www.thaigrid.net
6Challenge
- Recent project for the National Center for Genetic Engineering and Biotechnology, Thailand
- Screen 3,000 compounds in 3 months
- Computation time on a 2.4 GHz Pentium 4 system
- Over 30 minutes per structure optimization
- Over 30 minutes per docking
- Estimated computing time on a single processor
- (3,000 x 30) + (3,000 x 30) = 180,000 minutes
- 3,000 hours
- 125 days
- About 4 months
- Not fast enough!
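The serial estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the 30-minute figures are the lower bounds quoted on the slide ("over 30 minutes"), and the function names are illustrative. The `ideal_parallel_days` helper assumes perfect scaling with no transfer or scheduling overhead, which a real Grid will not achieve.

```python
COMPOUNDS = 3_000
MINUTES_PER_OPTIMIZATION = 30  # lower bound quoted on the slide
MINUTES_PER_DOCKING = 30       # lower bound quoted on the slide


def serial_estimate(compounds: int, opt_min: int, dock_min: int):
    """Return (minutes, hours, days) for running every job on one processor."""
    total_minutes = compounds * opt_min + compounds * dock_min
    hours = total_minutes / 60
    days = hours / 24
    return total_minutes, hours, days


def ideal_parallel_days(days: float, processors: int) -> float:
    """Best-case wall time if the work splits perfectly across processors."""
    return days / processors


minutes, hours, days = serial_estimate(
    COMPOUNDS, MINUTES_PER_OPTIMIZATION, MINUTES_PER_DOCKING
)
print(minutes, hours, days)  # 180000 3000.0 125.0
```

With the 158 CPUs reported for ThaiGrid elsewhere in this deck, `ideal_parallel_days(days, 158)` comes out at under one day, which should be read as a best case rather than a prediction.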
7Key Technologies
- Three key technologies must be combined to provide the solution
- Cluster Computing
- Grid Computing
- Portal Technology
8What do we want to do?
- Hide the complexity of the Grid and of computational chemistry software from scientists while providing the massive computational power they need
9Infrastructure
- ThaiGrid infrastructure is used
- 10 clusters from 6 organizations
- AMATA - KU
- GASS - KU
- MAEKA - KU
- WARINE - KU
- CAMETA - SUT
- OPTIMA - AIT
- ENQUEUE - KMUTNB
- PALM - KMUTNB
- SPIRIT - CU
- INCA - KMUTT
- 158 CPUs on 110 nodes
10Software Architecture
- Each cluster has a local scheduler
- SGE, OpenPBS, or Condor can be used
- We use our own SQMS scheduler
- Globus 2.4 is used as middleware
- Resource control and security (GSI)
- A Grid-level scheduler controls multi-cluster job submission
- We use KU's own SQMS/G
11The Portal
- Roles
- User interface
- Automate execution flow
- File access and management
- Features
- Create project
- Add ligand, enzyme
- Submit screening job, monitor job status
- Download output
- The current portal is built using Plone
- http://www.plone.org/
- Python-based web content management
- Flexible and extensible
12How things work!
[Diagram labels: Portal; Resource Broker (SQMS/G); Grid Middleware (Globus 2.4); Monitor; Tasks; Compute Resources; KU Campus network]
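As a rough illustration of what a resource broker does with a batch of screening tasks, the sketch below assigns each task to the cluster with the lowest load per CPU. This is not the SQMS/G algorithm, which the slides do not describe. It is a generic greedy policy, and the cluster names and CPU counts in the usage lines are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_tasks(tasks: List[str], cpus: Dict[str, int]) -> Dict[str, List[str]]:
    """Greedy placement: each task goes to the cluster with the lowest
    tasks-per-CPU ratio so far (ties broken by cluster name)."""
    heap = [(0.0, name) for name in sorted(cpus)]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {name: [] for name in cpus}
    for task in tasks:
        _, name = heapq.heappop(heap)
        plan[name].append(task)
        # Re-insert the cluster with its updated load per CPU.
        heapq.heappush(heap, (len(plan[name]) / cpus[name], name))
    return plan


# Hypothetical clusters: one with 2 CPUs, one with 1 CPU.
example_plan = assign_tasks(
    [f"dock-{i}" for i in range(6)],
    {"cluster_a": 2, "cluster_b": 1},
)
```

In this example the larger cluster ends up with twice as many tasks as the smaller one. A production broker would also have to account for data location, queue state, and failures.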
13Results
XK-263
- First version of the compound database (around 3,000 compounds)
- 3,000 compounds screened (30 high-potential compounds found)
- 4 drug targets (Influenza, HIV-RT, HIV-PR, HIV-IN)
14Experiences
- Some files, such as enzyme structures and outputs, are very large
- Good bandwidth between sites is required
- Some simple optimization techniques can help
- Cache the enzyme structure file at the target hosts, which substantially reduces the number of transfers needed
- Batch scheduling works well if the systems are very homogeneous
- Allows dynamic staging of executable code to the target host without installation or recompilation
- Many script tools must be developed to
- Streamline the execution
- Handle data and code staging
- Clean up after execution
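The caching idea mentioned above can be sketched as follows. This is an assumption-laden illustration, not the project's staging scripts: the class name `StagingCache` is hypothetical, files are keyed by a SHA-256 digest of their content, and a local `shutil.copy` stands in for the real network transfer.

```python
import hashlib
import shutil
from pathlib import Path


def digest(path: Path) -> str:
    """Content hash used as the cache key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class StagingCache:
    """Keeps one copy of each distinct input file on the target host."""

    def __init__(self, cache_dir: Path):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.transfers = 0  # counts how many real copies were made

    def fetch(self, source: Path) -> Path:
        """Return the cached path, copying the file only on a cache miss."""
        cached = self.cache_dir / f"{digest(source)}{source.suffix}"
        if not cached.exists():
            shutil.copy(source, cached)  # stand-in for the network transfer
            self.transfers += 1
        return cached
```

When many docking jobs against the same enzyme land on one host, only the first call to `fetch` pays for a transfer; the rest reuse the cached file.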
15Next Generation Massive Screening on Grid
- Move to a service-oriented Grid
- Use Grid and Web services to encapsulate key applications
- Build broker and service discovery infrastructure
- Rely heavily on OGSA and GT 3.x, 4.x
- Portlet-based portal
- JSR 168 Portlet Specification compliance
- More modular, customizable, flexible
- Plan to adopt GridSphere from GridLab (www.gridlab.org)
- Use a database as the backend instead of files
- OGSA-DAI might be used for data access
16Progress
- We are working on
- New portal using GridSphere technology (done, testing)
- Service wrapper for legacy code
- GAMESS, AutoDock (done, testing)
- MMJFS interface (in progress)
- OGSA-DAI integration (in progress)
- Service registration and discovery (partial)
- Broker system (design)
- New monitoring (done)
- Schedule
- Finish and test in Jan-Feb 2005
- Deploy in March 2005
17[Diagram labels: File Server; Molecular DB; GridFTP; GAMESS; Scheduler; OGSA-DAI; MMJFS; GAMESS Service; Portlet; Portal; Registration Server; Broker Server; Backend DB]
18Design Choices
- Mass data transport across sites
- A central FTP server is used to store data and the database
- Each compute node can pull the required data from this FTP server
- Ad hoc FTP, wget/HTTP (firewall friendly)
- Next: GridFTP
- Cluster / single server
- Gridify using a service wrapper that exposes the legacy application to the Grid as a Grid service
- This does not work for clusters, since compute nodes are hidden behind the head node
- Fall back to the MMJFS interface, which talks to the local scheduler
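The service-wrapper idea for legacy code can be illustrated with a small Python class that runs a command-line program and returns its output as a structured result. This is a sketch under stated assumptions: `LegacyCodeService` and `JobResult` are hypothetical names, the wrapper is a plain local object rather than a Grid service, and a one-line Python script stands in for the legacy binary so that no GAMESS or AutoDock options are assumed.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class JobResult:
    returncode: int
    stdout: str
    stderr: str


class LegacyCodeService:
    """Wraps a command-line program behind a single run() call."""

    def __init__(self, command: List[str]):
        self.command = command

    def run(self, args: List[str], timeout: float = 60.0) -> JobResult:
        """Run the wrapped program with extra arguments and capture its output."""
        done = subprocess.run(
            self.command + args,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return JobResult(done.returncode, done.stdout, done.stderr)


# Stand-in for a legacy binary: a script that echoes its arguments.
echo_service = LegacyCodeService(
    [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
)
```

Calling `echo_service.run(["ligand.pdbq"])` returns a `JobResult` whose `stdout` holds the echoed file name. As the slide notes, a wrapper of this kind only reaches the machine it runs on, which is why jobs bound for cluster compute nodes still have to go through the local scheduler.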
19Design Choices
- Service discovery mechanism
- Publish/subscribe model
- Service advertising interface/protocol
- Backend database shared between the registration service component and the broker component
- Adoption of the Grid Notification service and model
- Available from the myGrid project; seems useful for more dynamic environments
- Scalability
Broker Service
Registration Service
Discovery (SQL)
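A shared backend database with SQL-based discovery might look like the sketch below. It uses an in-memory SQLite database purely for illustration; the table layout, the function names, and the `example.org` endpoints are all hypothetical, and the slides do not say which database or schema the project uses.

```python
import sqlite3
from typing import List


def open_registry() -> sqlite3.Connection:
    """Create an in-memory registry with one table of advertised services."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE services ("
        "name TEXT PRIMARY KEY, kind TEXT NOT NULL, endpoint TEXT NOT NULL)"
    )
    return conn


def register(conn: sqlite3.Connection, name: str, kind: str, endpoint: str) -> None:
    """Registration service side: advertise or update a service."""
    conn.execute(
        "INSERT OR REPLACE INTO services (name, kind, endpoint) VALUES (?, ?, ?)",
        (name, kind, endpoint),
    )
    conn.commit()


def discover(conn: sqlite3.Connection, kind: str) -> List[str]:
    """Broker side: look up endpoints of a given kind with a SQL query."""
    rows = conn.execute(
        "SELECT endpoint FROM services WHERE kind = ? ORDER BY name", (kind,)
    )
    return [row[0] for row in rows]


registry = open_registry()
register(registry, "dock-1", "docking", "http://example.org/dock1")
register(registry, "dock-2", "docking", "http://example.org/dock2")
register(registry, "opt-1", "optimizing", "http://example.org/opt1")
```

Here `discover(registry, "docking")` returns the two docking endpoints. Sharing one table keeps the registration service and the broker decoupled, at the cost of making the database a central dependency.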
20Job Submission
Job Status
Result visualization
21Performance Record
System Status
Job Queue Monitoring
22Service Discovery
23Conclusion
- Grid and cluster computing are key technologies that can give us the power. Grid works if used wisely!
- Challenges
- Grid standards are still evolving rapidly
- Things change before you can finish!
- Difficult to configure and maintain; some parts are still unstable
- Firewall and security concerns
- Lack of manpower with expertise
- Opportunity
- Secure infrastructure
- Cost reduction through on-demand integration of networked resources
24Acknowledgement
- HPCNC Team
- Somsak Sriprayoonsakul
- Nuttaphon Thangkittisuwan
- Thanakit Petchprasan
- Isiriya Paireepairit
25The End
26Backup
27Process
[Diagram labels: 2D Structure; 3D Structure; GAMESS (several instances on the Grid); Optimized 3D Structure; Molecular Structure Database; Enzyme; Enzyme Grid; Autodock (several instances); SOM Neural Network Analysis; Results]
28Workflow Engine
[Diagram labels: Grid Portal (four portlets); Grid Middleware (OGSA); OGSA-DAI; Optimizing Services; Docking Services; Broker Services; Monitoring Services; Molecule Database; Resources (Computer, Network)]