Title: IBM eBusiness on-demand
1Building Science Gateways with EnginFrame: Life Science example
Maurizio Melato e-mail maurizio_at_nice-software.com
2At the beginning
Aliases
Scripts
NFS
FTP
- At the beginning was the command line
Restart
Repository
DOE
- At first glance, simple, light tools and technology
- But the complexity handled by users grew and grew
Teamwork
Versioning
LSF
Scripting
Library
Disk quota
CLI
Windows
CRASH!
Compute-/Data-Grid Middlewares
Queue
Linux
Convert
IP Protection
Resource
Working directory
Password
Execution host
FlexLM
3The Web (r)evolution
- Web interface to the Grid: Grid Portals
- At first glance, the all-purpose, every-day, do-everything solution
- Portals as glue technology: they integrate services, tools and applications
- Users may have various levels of customization of both layout and content
- They are general purpose: any specific need still has to be addressed and developed
Grid Portal
Scripting
CLI
Compute-/Data-Grid Middlewares
4The Science Gateway perspective
- A community-developed set of tools, applications,
and data that is integrated via a portal or a
suite of applications
- SGs are specializations of Portals for specific scientific communities
- An SG is customized to meet the needs of the targeted community
- An SG provides a common interface configured for optimal use
- An SG lets researchers focus on their research and foster collaboration
Portal
Scripting
CLI
Compute-/Data-Grid Middlewares
Other Community specific Data Sources
Distributed and heterogeneous Data Sources
Distributed and heterogeneous Computing
Resources (Grid/Compute/Visualization Farm)
5The Science Gateway perspective
- Gateways are independent projects, each of which has its own guidelines, requirements and constraints
- But they have similar technological challenges
- Compute-/Data-Grid integration
- Authentication/Authorization
- Collaboration mechanisms
- Tools & Application integration
- ...
- Does the wheel need to be reinvented every time?
- -> Need for a Science Gateway Framework technology
6Science Gateway Capabilities
- Depend on the needs of the specific community
- Authentication and Authorization
- Job Execution Services
- Domain-Specific Computational Applications
- Resource Discovery
- Access to Data Collections
- Data Movement Tools
- Visualization Hardware and Software
- Workflows
7SG Authentication and Authorization
- Satisfy the authentication and authorization security constraints of the community
- Integrating with the target authentication technology
- Providing the proper authorization mechanism
- Configurable authentication mechanisms
- NIS
- PAM
- LDAP
- Windows ActiveDirectory
- MyProxy
- X509 Certificates
- Krb5
- Built-in authorization system with extension points
- e.g. custom inheritance of group definitions
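As an illustration of what custom inheritance of group definitions could look like, here is a minimal Python sketch. It is not EnginFrame's authorization API: the group names, the `GROUP_PARENTS` table and both functions are hypothetical, and the only assumption is that a group inherits the rights of its parent groups.

```python
from typing import Dict, Set

# Hypothetical group hierarchy: each group lists the groups it inherits from.
GROUP_PARENTS: Dict[str, Set[str]] = {
    "all-users": set(),
    "bioinf-users": {"all-users"},
    "bioinf-admins": {"bioinf-users"},
}


def effective_groups(group: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the group plus every group it inherits from, directly or indirectly."""
    seen: Set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cyclic definitions
        seen.add(current)
        stack.extend(parents.get(current, set()))
    return seen


def is_authorized(user_groups: Set[str], allowed_groups: Set[str],
                  parents: Dict[str, Set[str]]) -> bool:
    """A user may run a service if any of their effective groups is allowed."""
    return any(effective_groups(g, parents) & allowed_groups for g in user_groups)


if __name__ == "__main__":
    # An admin inherits the rights of plain bioinf users.
    print(is_authorized({"bioinf-admins"}, {"bioinf-users"}, GROUP_PARENTS))
```

Resolving inheritance at check time keeps the group table small; a real system would probably cache the resolved sets.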
8SG Job Execution Services
- Preparation, submission, monitoring and result retrieval
- Born as an abstraction layer and interface on top of the underlying Job Scheduler
- Supports many Job Schedulers
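The idea of an abstraction layer over the job scheduler can be sketched in Python as follows. This is an illustration, not EnginFrame code: `JobSpec`, `SchedulerAdapter` and the two adapters are hypothetical names, and PBS is chosen here only as a second example scheduler. The sketch builds submission command lines (using the standard `bsub` and `qsub` options for job name, queue and output file) and does not execute them.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class JobSpec:
    """Scheduler-independent description of a job (hypothetical structure)."""
    name: str
    queue: str
    script: str       # path of the script to run
    output_file: str  # where the scheduler should write stdout


class SchedulerAdapter(ABC):
    """Common interface the portal would talk to, whatever the scheduler."""

    @abstractmethod
    def submit_command(self, job: JobSpec) -> List[str]:
        """Return the command line that submits the job."""


class LsfAdapter(SchedulerAdapter):
    def submit_command(self, job: JobSpec) -> List[str]:
        return ["bsub", "-J", job.name, "-q", job.queue,
                "-o", job.output_file, job.script]


class PbsAdapter(SchedulerAdapter):
    def submit_command(self, job: JobSpec) -> List[str]:
        return ["qsub", "-N", job.name, "-q", job.queue,
                "-o", job.output_file, job.script]


if __name__ == "__main__":
    job = JobSpec(name="survival", queue="normal",
                  script="./run.sh", output_file="out.log")
    for adapter in (LsfAdapter(), PbsAdapter()):
        print(" ".join(adapter.submit_command(job)))
```

The portal code only depends on `SchedulerAdapter`, so supporting another scheduler means adding one adapter class.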
9SG Domain-Specific Computational Applications
- Provide high-level vertical services
- Computing Portal was initially adopted by industrial communities
- Automotive
- Manufacturing
- Electronics
- Oil & Gas
- Telecommunication
- Life Sciences
- and Research Institutions
- INFN - National Institute of Nuclear Physics
- CILEA - Lombard Inter-university Consortium for Automatic Computation
- CERN
10A growing number of customers
- Energy & Utilities: Addax Petroleum, AECL, Amerada Hess, British Gas, CC of Water Resources, Chevron, Conoco-Phillips, DSC-Libya, ENI/Agip, GazPromNeft, Marathon Oil, Nexen, Rosneft, Schlumberger, Sibneft, Sinopec, Slavneft, Sonatrach, Statoil, Talisman Energy, Telecom Italia, TNK-BP, TNNC, TOTAL, TyumenNIIGaz, VNIIGaz, Xinjiang Oil
- Aerospace & Manufacturing: AIRBUS, Air Products and Chemicals, Procter & Gamble, Galileo Avionica, Hamilton Sundstrand, Kimberly Clark, Magellan Aerospace, MTU, Northrop Grumman, PW, Raytheon, Simpson Strong-Tie
- Automotive & Industrial Equipment: Audi, ARRK, Bridgestone, Bosch, Corus Automotive, Delphi, Elasis/CRF, Ferrari, Brawn GP, Jaguar-LandRover, Lear, Magneti Marelli, McLaren, PZ, PSA, RedBull Engineering, Swagelok, Suzuki, Toyota, TRW, Volkswagen
- Life Sciences: LitBio project, DEISA project, Biolab, Swiss Institute for Bioinformatics, Partners Healthcare, M.D. Anderson Cancer Center
- Research & Education: ASSC, CCLRC, CERN, CILEA, CINECA, CNR, CNRS/IN2P3, ENEA, FzU, ICI, IFAE, INFN, ITEP, Harvard Business School, SSC-Russia, SDSC, Ferrara Uni, ITU, T.U.Dresden, Trinity College Dublin, Huazhong Normal Uni, Yale University
- High Tech: STMicroelectronics, Accent, Samsung SDI, SensorDynamics, Motorola
11Which applications are used in EnginFrame?
12EnginFrame snapshots Technology Overview
- Services are XML descriptions defining:
- Input parameters
- The action to accomplish (Unix/Windows script, Java, ...)
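To make the idea concrete, the sketch below parses a made-up XML service description with Python's standard library and extracts its input parameters and action. The element and attribute names (`service`, `input`, `param`, `action`) and the script name are invented for illustration and are not EnginFrame's actual service definition schema.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# Hypothetical service description: NOT the real EnginFrame schema.
SERVICE_XML = """
<service id="survival-analysis">
  <name>Survival Analysis</name>
  <input>
    <param name="dataset" type="file"/>
    <param name="threshold" type="text"/>
  </input>
  <action type="script">run_survival.sh</action>
</service>
"""


def describe(xml_text: str) -> Dict[str, Any]:
    """Extract the two things a service defines: its inputs and its action."""
    root = ET.fromstring(xml_text.strip())
    action = root.find("action")
    return {
        "id": root.get("id"),
        "params": [p.get("name") for p in root.findall("./input/param")],
        "action_type": action.get("type"),
        "action": action.text.strip(),
    }


if __name__ == "__main__":
    print(describe(SERVICE_XML))
```

A portal can generate the submission form from the parameter list and run the action when the form is submitted, which is the separation the slide describes.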
13EnginFrame Customizable Job Submission
User friendly, Application-oriented Job submission
Flexible and efficient Input file management
Ties in with dynamic enterprise data, such as databases
14Interactive job submission
Hides the complexity of the underlying scheduler
15Monitoring & control
Global Job monitoring
Cluster host monitoring
Job details & control
16Output management
Data lifecycle management
Comprehensive output file manipulation (view, edit, delete, zip, ...)
Follow-up actions support
RESUBMIT jobs: rapidly edit input files and re-submit with the same parameters/settings
17SG Resource Discovery
- The ability to dynamically discover resources and available services
- To build an indexed collection of the resources
- Newly defined services are dynamically published according to authorization settings
- EF relies on the underlying Grid middlewares to query the availability of new hardware or software resources
- In the A-WARE EU Project, custom functionality for dynamic discovery of third-party services
18SG Access to Data Collections
- The ability to access, query and retrieve data
collections and their metadata
- EF plugins provide integration with
- gLite Storage and AMGA metadata system
- SRB / iRODS datagrid middlewares
- Functionalities
- Browse data collections
- search metadata
- Integrated file-system view
- Read and search various audit data
- Seamless authentication and user mapping
- Define and run rules
19SG Data Movement Tools
- The capability to provision the required data to a specific location, considering network, performance and caching concerns
- Browsing of local or remote Grid filesystems can be transparent to users
- Specific services can move data according to users' needs
- No analysis is currently performed on performance or network latency concerns
20EF Data Management
Flexible and efficient Input file management
21EF Data Management
Data lifecycle management
View or stream Output files
22SG Workflows
- The possibility to design and run workflows (aka
virtual experiments) made up of basic tasks
with inter-dependencies
- Workflow technologies integrated
- Taverna, EF used as a third-party web service provider
- Moteur, batch Taverna workflows enactor
- EU Project A-WARE aimed to develop a Grid workflow system
- UNICORE Grid middleware
- BPMN/BPEL
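A workflow made up of basic tasks with inter-dependencies is a directed acyclic graph, and a valid execution order is a topological order of that graph. The Python sketch below shows this with the standard-library `graphlib` module (Python 3.9+). The task names are hypothetical, and the code is unrelated to Taverna, Moteur or any other engine named above.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW: Dict[str, Set[str]] = {
    "load_clinical": set(),
    "load_microarray": set(),
    "merge": {"load_clinical", "load_microarray"},
    "analyse": {"merge"},
    "plot": {"analyse"},
}


def execution_order(workflow: Dict[str, Set[str]]) -> List[str]:
    """Return the tasks in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(workflow).static_order())


if __name__ == "__main__":
    for task in execution_order(WORKFLOW):
        print(task)
```

Tasks with no dependency between them (the two load steps here) could be dispatched in parallel by a real engine; `static_order` simply picks one valid sequence.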
23EF and Workflows
24EF and Workflows
25SG Visualization Software
- Provide high-end visualization tools to
visualize, work and collaborate with complex / 3D
interactive applications
- EF Remote visualization integrates
- RealVNC
- TurboVNC and VirtualGL
- Nomachine NX
- 3D Optimization technologies
- IBM Deep Computing Visualization (DCV)
- HP Remote Graphics Software (RGS)
- Sun OpenGL
- Session Management from the Web
- Collaboration capabilities via session sharing
26EF Visualization
IBM DCV
27EF Visualization Seamless Interactive
Application Integration
28Portal case study Remote 3D visualization
See demo online!
Application isolation (users do not need access to the command line)
Collaborate
29Life Science Application Example
- How many steps do you need to build and run your own application in the EnginFrame portal?
- How much development effort will it take?
- Going practical... Here are the steps an EF developer should follow to build and expose their own application
- The use case is a Survival Analysis service
- The service performs an analysis on data from different domains and with different tools
30Step 0 Use case analysis
- Analyse large microarray datasets for breast cancer prognosis assessment
- Concatenate clinical data and microarray results
- Mix of custom and R/Bioconductor programs
- Automatic analysis and plot creation
Demo available at http://ada.dist.unige.it:8080/enginframe/bioinf
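The step of concatenating clinical data and microarray results amounts to joining two tables on a shared sample identifier. The sketch below shows that join in Python with the standard library only. The original use case relies on custom and R/Bioconductor programs, so this is a stand-in illustration: the column names (`sample_id`, `survival_months`, `event`, `gene_a`, `gene_b`) and all values are invented.

```python
import csv
import io
from typing import Dict, List

# Invented example data; real inputs would be files produced by the pipeline.
CLINICAL_CSV = "sample_id,survival_months,event\nS1,60,0\nS2,24,1\n"
EXPRESSION_CSV = "sample_id,gene_a,gene_b\nS1,2.1,0.4\nS2,3.7,1.2\nS3,1.0,0.9\n"


def read_by_sample(text: str) -> Dict[str, Dict[str, str]]:
    """Index the rows of a CSV table by their sample_id column."""
    return {row["sample_id"]: row for row in csv.DictReader(io.StringIO(text))}


def concatenate(clinical_text: str, expression_text: str) -> List[Dict[str, str]]:
    """Join the two tables, keeping only samples present in both."""
    clinical = read_by_sample(clinical_text)
    expression = read_by_sample(expression_text)
    merged = []
    for sample_id in sorted(clinical.keys() & expression.keys()):
        row = dict(clinical[sample_id])
        row.update(expression[sample_id])
        merged.append(row)
    return merged


if __name__ == "__main__":
    for row in concatenate(CLINICAL_CSV, EXPRESSION_CSV):
        print(row)
```

Samples without clinical follow-up (S3 in the invented data) are dropped, since a survival analysis needs both the outcome and the expression values.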
31Step 1 Prepare Components
- Choose the pieces you already have
- Existing R and Bioconductor analysis scripts
- Existing CLI tools with parameters
- A bit of directory structure on the filesystem
- The Bash (or similar) script you already use to submit code
- Nothing is automagic, but the probability that you will be able to recycle existing work is really high
- If not, we're talking about 50 lines of bash script!
32Step 2 The EF Service Definition File
33Step 3 and the corresponding Web GUI
Just a custom background!
Submission form
34Step 4 Monitor Execution
35Step 5 View Results!
36The End
- Thanks for your attention!
37EnginFrame Architecture (detail)
Plugins
Skins / Themes
Template-based dynamic presentation engine with
AJAX support
Single-Sign-On
Auth. delegation
ACL manager
Channel security
Usage acct./billing engine
User mapping
Session manager
GUI Virtualization
Service chaining
Distributed file manager
Custom XML Application Kits
Data life-cycle manager
Multi-language services
Workflow Engine
GridML virtualization
Data virtualization
App. virtualization
38EnginFrame Process
Service Request
EnginFrame Server
XSLT
Execute
Authorize
40Bioinformatics Challenges
- Analyze large datasets -> big computational effort
- Store big amounts of data -> data management needs
- Access to distributed data
- Make experiments automated and reproducible
- Integrate heterogeneous tools and data
- Ease accessibility & usability
- Increase quality of experiments
42Increasing complexity...
- What about the possibility to create a unique virtual working environment between geographically distributed sites?
- What about the possibility to run applications and data randomly distributed on different sites?
- This scenario has already been experimented in ... the DEISA project!
43The DEISA Project
- Market
- Gov & Life Sciences
- Value proposition
- User-friendly portal for the Life Science user community (VO)
- Context
- 2 largest European Grid
- Connects the 11 largest HPC centers in Europe
- Complex security policies
- Multiple schedulers
- Solution
- NICE EnginFrame
- NICE DataGate
- Custom enhancements (will be included in v5.x)
www.deisa.org
- Benefits
- IT & App managers happy not to re-invent the wheel
- HPC centers want it locally, too
- Excellent reference
44The Portal for Life Sciences community
46Harnessing grid computing to save women's lives
Case Study: HealthCare
Screening Mammography
Together for a better diagnosis/prognosis
47Case Study: Neuroinf.it (EGEE)
- Market
- Research & Life Sciences
- Value proposition
- Distributed access to medical images across hospitals and remote processing
- 3D remote visualization of medical images
- Context
- 1 largest European Grid
- Privacy must be assured
- Solution
- EnginFrame used as application interface and data
grid resource manager
48Remote 3D Visualization
50Why EnginFrame Elements?
- Ease of use for small HPC clusters
- Empowered end users
- Lack of Sysadmin with focus on HPC
- Low customization needs
- SSH replacement
- Simplified sales and delivery process
- No upfront services or customization required
- Next-next-next installation
- Suitable as differentiator for hardware sales
- Low budget solution
- License at 95 / node / year
- Premium support at 3900/year every 50 users
51My HPC Dashboard
52Managing files
53Interactive access
54Job monitoring
55Cluster monitoring
56WYSIWYG service editor