Building a Massive Virtual Screening using Grid Infrastructure
1
Building a Massive Virtual Screening using Grid Infrastructure
Putchong Uthayopas, High Performance Computing and Networking Center, Kasetsart University
  • Chak Sangma
  • Centre for Cheminformatics
  • Kasetsart University

2
Motivation
  • Thailand's medicinal plants are important to Thai society
  • Over 1,000 species
  • Over 200,000 compounds
  • Multiple disease targets
  • Problem
  • No complete compound database exists
  • Practice still relies mostly on local knowledge and conventional wisdom
  • Lack of systematic verification by scientific methods

Asiatic Pennywort
Barleria lupulina Lindl.
3
Kasetsart University Thai Medicinal Plants Effort
  • Led by Center for Cheminformatics, Kasetsart
    University (Dr. Chak Sangma)
  • Goal
  • Establish a Thai medicinal plant knowledge base by building a 3D molecular database
  • Employ virtual screening to verify active compounds known from conventional knowledge

4
Reports and literature → 2D structures → approximated 3D structures → 3D structures optimized with GAMESS (compute intensive!) → binding energies calculated with AutoDock 3.0 → structures within 0.5 Å of the binding site → SOM neural network map → results
5
ThaiGrid Drug Design Portal
  • Partners
  • High Performance Computing and Networking Center, KU
  • Center for Cheminformatics, KU
  • IBM Thailand
  • Goal
  • Build a virtual screening infrastructure on the ThaiGrid system
  • Start from the KU campus grid and extend to other ThaiGrid partner universities later
  • Links
  • http://tgcc.cpe.ku.ac.th
  • http://www.thaigrid.net

6
Challenge
  • Recent project for the National Center for Genetic Engineering and Biotechnology, Thailand
  • Screen 3,000 compounds in 3 months
  • Computation time on a 2.4 GHz Pentium IV system
  • Over 30 min per optimized structure
  • Over 30 min per docking
  • Estimated computing time on a single processor (see the quick check below)
  • (3,000 × 30 min) + (3,000 × 30 min) = 180,000 min
  • = 3,000 hours ≈ 125 days, over 4 months
  • Not fast enough!
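As a quick sanity check of the figures above, the same arithmetic in a few lines of Python (the numbers are the ones quoted on this slide):

```python
# Single-processor estimate from the figures above (2.4 GHz Pentium IV).
compounds = 3000
optimize_min = 30           # minutes per GAMESS structure optimization
docking_min = 30            # minutes per AutoDock docking run

total_min = compounds * (optimize_min + docking_min)
print(total_min / 60)       # 3000.0 hours
print(total_min / 60 / 24)  # 125.0 days -- far beyond the 3-month deadline
```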

7
Key Technologies
  • Three key technologies must be combined to
    provide the solution
  • Cluster Computing
  • Grid Computing
  • Portal Technology

8
What do we want to do? Hide the complexity of the Grid and of the computational chemistry software from scientists, while providing the massive computational power needed.
9
Infrastructure
  • The ThaiGrid infrastructure is used
  • 10 clusters from 6 organizations
  • AMATA - KU
  • GASS - KU
  • MAEKA - KU
  • WARINE - KU
  • CAMETA - SUT
  • OPTIMA - AIT
  • ENQUEUE - KMUTNB
  • PALM - KMUTNB
  • SPIRIT - CU
  • INCA - KMUTT
  • 158 CPUs on 110 nodes

10
Software Architecture
  • Each cluster has a local scheduler
  • SGE, OpenPBS, or Condor can be used
  • We use our own SQMS scheduler
  • Globus 2.4 is used as middleware
  • Resource control and security (GSI)
  • A grid-level scheduler controls multi-cluster job submission (see the sketch below)
  • We use KU's own SQMS/G
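To make the job path concrete, here is a minimal, hypothetical sketch of handing one docking task to a cluster head node through Globus 2.4 GRAM; the contact string, job-manager name, and file paths are invented for illustration, and it assumes the standard globusrun client is available:

```python
# Illustrative only: submit one docking task through GT2 GRAM.
# The host, jobmanager name, and paths below are hypothetical.
import subprocess

contact = "amata.cpe.ku.ac.th/jobmanager-sqms"          # cluster head node + local scheduler
rsl = ("&(executable=/opt/screening/run_autodock.sh)"   # wrapper script already staged there
       "(arguments=ligand_0042.pdbq)"
       "(count=1)")

# -r names the target resource, -o streams stdout/stderr back to the submitter
subprocess.run(["globusrun", "-o", "-r", contact, rsl], check=True)
```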

11
The Portal
  • Roles
  • User interface
  • Automate the execution flow
  • File access and management
  • Features
  • Create a project
  • Add ligands and enzymes
  • Submit screening jobs, monitor job status
  • Download output
  • The current portal is built using Plone
  • http://www.plone.org/
  • Python-based web content management
  • Flexible and extensible

12
How things work
[Diagram] The portal hands tasks to the resource broker (SQMS/G), which dispatches them through the Globus 2.4 grid middleware to the compute resources on the KU campus network; a monitor tracks the running tasks.
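The fan-out of tasks to clusters can be pictured with a toy round-robin loop; this is purely illustrative and is not the actual SQMS/G scheduling policy:

```python
# Toy illustration of fanning screening tasks out across clusters
# (round-robin here; the real SQMS/G broker decides differently).
from itertools import cycle

clusters = ["AMATA", "GASS", "MAEKA", "WARINE", "OPTIMA"]
tasks = [f"compound-{i:04d}" for i in range(1, 11)]

for task, cluster in zip(tasks, cycle(clusters)):
    # in the real system each pair becomes a GRAM submission to that cluster
    print(f"{task} -> {cluster}")
```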
13
Results
XK-263
  • The first version of the compound database (around 3,000 compounds)
  • 3,000 compounds screened (found 30 high-potential compounds)
  • 4 drug targets (Influenza, HIV-RT, HIV-PR, HIV-IN)

14
Experiences
  • Some files, such as enzyme structures and outputs, are very large
  • Require good bandwidth between sites
  • Some simple optimization techniques can help
  • Caching the enzyme structure file at target hosts substantially reduces the number of transfers needed (see the sketch after this list)
  • A batch scheduling approach works well if the systems are very homogeneous
  • Allows dynamic staging of execution code to the target host without installation/recompilation
  • Many script tools had to be developed to
  • Streamline the execution
  • Handle data and code staging
  • Clean up after execution
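A minimal sketch of the caching idea, assuming a host-local scratch directory; the paths and the checksum comparison are illustrative, not the project's actual staging scripts:

```python
# Stage the large enzyme structure into a host-local cache only when it is
# missing or stale, so repeated dockings on the same host reuse one copy.
import hashlib
import os
import shutil

def sha1(path):
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def stage_enzyme(enzyme_file, cache_dir="/scratch/enzyme-cache"):
    os.makedirs(cache_dir, exist_ok=True)
    cached = os.path.join(cache_dir, os.path.basename(enzyme_file))
    if os.path.exists(cached) and sha1(cached) == sha1(enzyme_file):
        return cached                    # already staged, no transfer needed
    shutil.copy(enzyme_file, cached)     # transfer once, reuse afterwards
    return cached
```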

15
Next Generation Massive Screening on Grid
  • Move to a service-oriented Grid
  • Use Grid and Web services to encapsulate key applications
  • Build a broker and service discovery infrastructure
  • Rely heavily on OGSA and GT 3.x/4.x
  • Portlet-based portal
  • JSR 168 Portlet Specification compliance
  • More modular, customizable, flexible
  • Plan to adopt GridSphere from GridLab (www.gridlab.org)
  • Use a database as the backend instead of files
  • OGSA-DAI might be used for data access

16
Progress
  • We are working on
  • A new portal using GridSphere technology (done, testing)
  • Service wrappers for legacy code
  • GAMESS, AutoDock (done, testing)
  • MMJFS interface (in progress)
  • OGSA-DAI integration (in progress)
  • Service registration and discovery (partial)
  • Broker system (design)
  • New monitoring (done)
  • Schedule
  • Finish and test in Jan-Feb 2005
  • Deploy in March 2005

17
[Architecture diagram with components: portal, GAMESS portlet, broker server, registration server, backend DB, GAMESS service, MMJFS, scheduler, GAMESS, file server, molecular DB, GridFTP, OGSA-DAI]
18
Design Choices
  • Mass data transport across sites
  • A central FTP server is used to store the data/databases
  • Each compute node can pull the required data from this server
  • For now: FTP and wget/HTTP (firewall friendly); see the sketch below
  • Next: GridFTP
  • Cluster vs. single server
  • Gridify using a service wrapper that exposes the legacy application as a grid service
  • Does not work for clusters, since the compute nodes are hidden behind the head node
  • Fall back to an MMJFS interface that talks to the local scheduler
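A minimal sketch of the pull model, assuming the data are exposed over plain HTTP; the server URL and file layout are hypothetical:

```python
# Each compute node pulls its input files from the central file server over
# HTTP (firewall friendly) and keeps a local copy so nothing is fetched twice.
# The server URL below is made up for illustration.
import os
import urllib.request

DATA_SERVER = "http://fileserver.example.ac.th/moldb"

def pull(filename, dest_dir="input"):
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, filename)
    if not os.path.exists(dest):
        urllib.request.urlretrieve(f"{DATA_SERVER}/{filename}", dest)
    return dest
```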

19
Design Choices
  • Service discovery mechanism
  • Publish/subscribe model
  • Service advertising interface/protocol
  • A backend database shared between the registration service component and the broker component (a minimal sketch follows)
  • Adoption of a Grid notification service and model
  • Available from the myGrid project; seems useful for more dynamic environments
  • Scalability

[Diagram: broker service and registration service sharing a database; discovery via SQL]
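A minimal sketch of the shared-database idea: the registration service inserts service records and the broker discovers them with plain SQL. The schema, service names, and endpoints are invented for illustration:

```python
# Registration and discovery through one shared table (illustrative schema).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE services (name TEXT, kind TEXT, endpoint TEXT)")

# registration service side: advertise available services
db.execute("INSERT INTO services VALUES (?, ?, ?)",
           ("gamess-amata", "optimize", "http://amata.example/gamess"))
db.execute("INSERT INTO services VALUES (?, ?, ?)",
           ("autodock-gass", "docking", "http://gass.example/autodock"))

# broker side: discover docking services with a plain SQL query
for name, endpoint in db.execute(
        "SELECT name, endpoint FROM services WHERE kind = 'docking'"):
    print(name, endpoint)
```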
20
[Portal screenshots: job submission, job status, result visualization]
21
[Portal screenshots: performance record, system status, job queue monitoring]
22
[Portal screenshot: service discovery]
23
Conclusion
  • Grid and cluster computing are key technologies that can give us this power. The Grid works if used wisely!
  • Challenges
  • Grid standards are still evolving rapidly
  • Things change before you can finish!
  • Difficult to configure and maintain; some parts are still unstable
  • Firewall and security concerns
  • Lack of manpower with expertise
  • Opportunities
  • Secure infrastructure
  • Cost reduction through the integration of networked resources on demand

24
Acknowledgement
  • HPCNC Team
  • Somsak Sriprayoonsakul
  • Nuttaphon Thangkittisuwan
  • Thanakit Petchprasan
  • Isiriya Paireepairit

25
The End
26
Backup
27
Process
[Diagram] On the Grid, 2D structures are converted to 3D structures and optimized in many parallel GAMESS runs; the optimized 3D structures are stored in the molecular structure database, docked against the enzyme (using the enzyme grid) in parallel AutoDock runs, and the docking results are analysed with a SOM neural network.
28
[Architecture diagram: a grid portal with a workflow engine and portlets on top of grid middleware (OGSA); the middleware exposes optimizing, docking, broker, and monitoring services plus OGSA-DAI access to the molecule database, all running over the underlying compute and network resources]