Building a Massive Virtual Screening using Grid Infrastructure


1
Building a Massive Virtual Screening using Grid
Infrastructure
Putchong Uthayopas
High Performance Computing and Networking Center
Kasetsart University

Chak Sangma
Centre for Cheminformatics
Kasetsart University

2
Motivation

Thailand's medicinal plants are important to Thai
society
Over 1,000 species
Over 200,000 compounds
Multiple disease targets
Problem
No complete compound database exists
Practice still relies mostly on local
knowledge and conventional wisdom
Lack of systematic verification by scientific
methods

ASIATIC PENNYWORT
Barleria lupulina Lindl.
3
Kasetsart University Thai Medicinal Plants Effort

Led by Center for Cheminformatics, Kasetsart
University (Dr. Chak Sangma)
Goal
Establish a Thai medicinal plant knowledge base by
building a 3D molecular database
Employ Virtual Screening to verify active
compounds with conventional knowledge

4
Reports and Literature
2D Structures
Approximate 3D Structures
Compute Intensive!
Optimized 3D Structures with GAMESS
Calculated Binding Energy with Autodock 3.0
Structures within 0.5 Å of Binding Site
SOM Neural Network Map
Results
5
ThaiGrid Drug Design Portal

Partners
High Performance Computing and Networking Center,
KU
Center for Cheminformatics, KU
IBM Thailand
Goal
Build a virtual screening infrastructure on the
ThaiGrid system
Start from the KU campus grid and extend to other
ThaiGrid partner universities later
Link
http://tgcc.cpe.ku.ac.th
http://www.thaigrid.net

6
Challenge

Recent project for the National Center for Genetic
Engineering and Biotechnology, Thailand
Screen 3,000 compounds in 3 months
Computation time on a 2.4 GHz Pentium 4 system
Over 30 min per structure optimization
Over 30 min per docking
Estimated computing time on a single processor
(3,000 x 30) + (3,000 x 30) = 180,000 minutes
= 3,000 hours
= 125 days
About 4 months
Not fast enough!
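
The single-processor estimate above is plain arithmetic and can be reproduced in a few lines. This is an illustrative sketch only: the function names are made up here, and the best-case parallel figure assumes perfect speedup with no scheduling or transfer overhead, which the slides do not claim.

```python
def serial_hours(n_compounds, opt_min=30, dock_min=30):
    """Single-processor time in hours: one optimization plus one docking per compound."""
    total_minutes = n_compounds * opt_min + n_compounds * dock_min
    return total_minutes / 60


def ideal_parallel_days(n_compounds, n_cpus, opt_min=30, dock_min=30):
    """Best-case wall-clock days on n_cpus, assuming perfect speedup (an assumption)."""
    return serial_hours(n_compounds, opt_min, dock_min) / n_cpus / 24


hours = serial_hours(3000)
print(hours, hours / 24)  # 3000.0 hours, 125.0 days
```

Calling `ideal_parallel_days(3000, n_cpus)` with the number of available CPUs gives a lower bound on the wall-clock time; real runs will be slower because of queueing and data staging.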

7
Key Technologies

Three key technologies must be combined to
provide the solution
Cluster Computing
Grid Computing
Portal Technology

8
What do we want to do? Hide the complexity of the Grid
and computational chemistry software from
scientists while providing the massive computational
power they need
9
Infrastructure

ThaiGrid infrastructure is used
10 clusters from 6 organizations
AMATA - KU
GASS - KU
MAEKA - KU
WARINE - KU
CAMETA - SUT
OPTIMA - AIT
ENQUEUE - KMUTNB
PALM - KMUTNB
SPIRIT - CU
INCA - KMUTT
158 CPUs on 110 nodes

10
Software Architecture

Each cluster has a local scheduler
SGE, OpenPBS, or Condor can be used
We use our own SQMS scheduler
Globus 2.4 is used as middleware
Resource control and security (GSI)
A grid-level scheduler controls multi-cluster job
submission
We use KU's own SQMS/G
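
The two-level arrangement described above (a grid-level scheduler sitting on top of per-cluster local schedulers) can be illustrated with a toy dispatcher. This is a minimal sketch, not SQMS/G or Globus code: the `Cluster` class, the `dispatch` function, and the least-loaded policy are assumptions made for illustration, and the CPU counts are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one cluster and its local scheduler queue."""
    name: str
    cpus: int
    queue: list = field(default_factory=list)

    def load(self):
        # Queued tasks per CPU; lower means the cluster will drain sooner.
        return len(self.queue) / self.cpus


def dispatch(tasks, clusters):
    """Grid-level step: hand each task to the currently least-loaded cluster."""
    for task in tasks:
        target = min(clusters, key=lambda c: c.load())
        target.queue.append(task)  # the local scheduler would take over from here
    return {c.name: len(c.queue) for c in clusters}


# Cluster names come from the slides; the CPU counts are invented.
clusters = [Cluster("AMATA", 32), Cluster("MAEKA", 16)]
placement = dispatch([f"dock-{i}" for i in range(48)], clusters)
print(placement)
```

With this policy the tasks end up split in proportion to CPU count. A real grid scheduler would also have to account for data location, authentication, and failures.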

11
The Portal

Roles
User interface
Automate execution flow
File access and management
Features
Create project
Add ligand, enzyme
Submit screening job, monitor job status
Download output
The current portal is built using Plone
http://www.plone.org/
Python-based web content management
Flexible and extensible
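
The portal features listed above (create a project, add ligands and enzymes, submit a screening job) imply that one submission fans out into many docking tasks. The sketch below shows one way to model that in plain Python. It is not Plone code: the `Project` class, its fields, and the ligand identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Project:
    """Hypothetical in-memory model of a portal project."""
    name: str
    ligands: list = field(default_factory=list)
    enzymes: list = field(default_factory=list)

    def submit(self):
        """Expand the project into one docking task per (ligand, enzyme) pair."""
        return [
            {"ligand": lig, "enzyme": enz, "status": "queued"}
            for lig, enz in product(self.ligands, self.enzymes)
        ]


project = Project("demo")
project.ligands += ["ligand-001", "ligand-002", "ligand-003"]  # made-up IDs
project.enzymes += ["HIV-PR", "HIV-RT"]  # two of the targets named in the slides
tasks = project.submit()
print(len(tasks))
```

The `status` field is where job monitoring would hook in: the portal only needs to update and display it per task.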

12
How things work!
[Architecture diagram. Labels: Portal, Resource Broker (SQMS/G), Grid Middleware (Globus 2.4), Task (repeated), Monitor, Compute Resource (repeated), KU Campus network]
13
Results
XK-263

First version of the compound database (around
3,000 compounds)
3,000 compounds screened (30 high-potential
compounds found)
4 drug targets (Influenza, HIV-RT, HIV-PR, HIV-IN)

14
Experiences

Some files, such as enzyme structures and output,
are very large.
Requires good bandwidth between sites
Some simple optimization techniques can help
Implement caching of the enzyme structure file at
target hosts; this substantially reduces the number of
transfers needed
The batch scheduling approach works well if the systems
are very homogeneous
Allows dynamic staging of executable code to the
target host without installation or recompilation
Many script tools must be developed to
Streamline the execution
Handle data and code staging
Clean up after execution
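
The caching idea above (keep the enzyme structure file on the target host and skip repeated transfers) can be sketched as follows. This is an illustration under stated assumptions, not the project's scripts: `stage_file`, the checksum-named cache layout, and the local file copy standing in for a cross-site transfer are all hypothetical. In a real deployment the checksum would travel with the job description instead of being computed from the remote file.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def stage_file(source, cache_dir, transfers):
    """Copy `source` into `cache_dir` only if an identical copy is not cached yet.

    `transfers` is a list used here only to count how many real copies happen.
    """
    source, cache_dir = Path(source), Path(cache_dir)
    digest = hashlib.sha256(source.read_bytes()).hexdigest()
    cached = cache_dir / f"{digest}-{source.name}"
    if not cached.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, cached)  # stands in for the cross-site transfer
        transfers.append(source.name)
    return cached


with tempfile.TemporaryDirectory() as tmp:
    enzyme = Path(tmp) / "enzyme.pdb"  # hypothetical file name
    enzyme.write_text("ATOM ...\n")    # placeholder content
    transfers = []
    # Many docking tasks on the same host share one enzyme file.
    paths = [stage_file(enzyme, Path(tmp) / "cache", transfers) for _ in range(5)]
    n_transfers = len(transfers)
    print(n_transfers)
```

Five staging requests for the same file trigger a single copy; the remaining four are served from the cache on the target host.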

15
Next Generation Massive Screening on Grid

Move to Service Oriented Grid
Use Grid and Web services to encapsulate key
applications
Build broker and service discovery infrastructure
Rely heavily on OGSA and GT3.X, 4.X
Portlet-based portal
JSR 168 Portlet Specification compliance
More modular, customizable, flexible
Plan to adopt GridSphere from GridLab
(www.gridlab.org)
Use a database as backend instead of files
OGSA-DAI might be used for data access

16
Progress

We are working on
New portal using GridSphere technology (done,
testing)
Service wrapper for legacy code
GAMESS, AutoDock (done, testing)
MMJFS interface (in progress)
OGSA-DAI integration (in progress)
Service registration and discovery (partial)
Broker system (in design)
New monitoring (done)
Schedule
Finish and test in Jan-Feb 2005
Deploy in March 2005

17
[Architecture diagram. Labels: Portal, Portlet, Gamess Service, MMJFS, Scheduler, Gamess, OGSA-DAI, Molecular DB, GridFTP, File Server, Registration Server, Broker Server, Backend DB]
18
Design Choices

Mass data transport across sites
A central FTP server stores the data and database
Each compute node can pull required data from
this FTP server
Ad hoc FTP, wget/HTTP (firewall friendly)
Next: GridFTP
Cluster / single server
Gridify using a service wrapper that exposes the
legacy application to the grid as a grid service
Does not work for clusters, since compute nodes are
hidden behind the head node
Fall back to the MMJFS interface, which talks to the
local scheduler

19
Design Choices

Service Discovery Mechanism
Publish/subscribe model
Service advertising interface/protocol
Backend database shared between the
registration service component and the broker
component
Adoption of the Grid Notification service and model
Available from the myGrid project; seems useful
for more dynamic environments
Scalability.

Broker Service
Registration Service
Discovery (SQL)
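
A shared backend database with SQL-based discovery, as outlined above, might look like the following. This is a minimal sketch using Python's built-in `sqlite3`; the table layout, column names, host names, and the `register`/`discover` helpers are assumptions for illustration and are not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the shared backend database
conn.execute(
    "CREATE TABLE services (name TEXT, host TEXT, free_cpus INTEGER, "
    "PRIMARY KEY (name, host))"
)


def register(name, host, free_cpus):
    """Registration side: a service advertises itself (or refreshes its entry)."""
    conn.execute(
        "INSERT OR REPLACE INTO services VALUES (?, ?, ?)", (name, host, free_cpus)
    )


def discover(name):
    """Broker side: find hosts offering `name`, most free CPUs first."""
    rows = conn.execute(
        "SELECT host FROM services WHERE name = ? AND free_cpus > 0 "
        "ORDER BY free_cpus DESC",
        (name,),
    )
    return [host for (host,) in rows]


register("gamess", "node-a.example", 4)    # hypothetical hosts
register("gamess", "node-b.example", 12)
register("autodock", "node-a.example", 0)  # registered but currently busy
print(discover("gamess"))
print(discover("autodock"))
```

Because both components only touch the table, the registration service and the broker stay decoupled; a publish/subscribe notification layer could later replace the polling query.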
20
Job Submission
Job Status
Result visualization
21
Performance Record
System Status
Job Queue Monitoring
22
Service Discovery
23
Conclusion

Grid and cluster computing are key technologies
that can give us the power. Grid works if used
wisely!
Challenges
Grid standards are still evolving rapidly
Things change before you can finish!
Difficult to configure and maintain; some parts are
still unstable
Firewall and security concerns
Lack of manpower with expertise
Opportunity
Secure infrastructure
Cost reduction by the integration of networked
resources on demand

24
Acknowledgement

HPCNC Team
Somsak Sriprayoonsakul
Nuttaphon Thangkittisuwan
Thanakit Petchprasan
Isiriya Paireepairit

25
The End
26
Backup
27
Process
[Workflow diagram. Labels: 2D Structure, 3D Structure, GRID, GAMESS (repeated), Optimized 3D Structure, Molecular Structure Database, Enzyme, Enzyme Grid, Autodock (repeated), SOM Neural Network Analysis, Results]
28
[Layered architecture diagram. Labels: Grid Portal, Portlet (repeated), Workflow Engine, Grid Middleware (OGSA), OGSA-DAI, Optimizing Services, Docking Services, Broker Services, Monitoring Services, Molecule Database, Resources (Computer, Network)]
