Title: The MiGenAS Workflow Engine
Slide 1: The MiGenAS Workflow Engine
- Thomas Soddemann, RZG
- Markus Ramp, MPG
- MPG MiGenAS Consortium
Slide 2: Overview
- The RZG
- The MiGenAS consortium
- Bio-Informatics workflows
- The MiGenAS pipeline
- The MiGenAS WE
- Perspectives
Slide 3: The Supercomputing Center
RZG (Rechen-Zentrum Garching): Supercomputing Center of the Max Planck Society (MPG)
- Services and involvements
- Supercomputing facility with a 5 TFlop/s IBM Regatta system
- Linux compute farms
- Data Storage
- DEISA
- MiGenAS
- D-Grid (German Grid initiative)
- Data acquisition for ASDEX Upgrade and Wendelstein 7-X (plasma physics)
Slide 4: The Supercomputing Center
Slide 5: The Supercomputing Center
DEISA: Distributed European Infrastructure for Supercomputing Applications
- Consortium of leading national supercomputing centers
- Focuses on deploying a Grid-empowered infrastructure to build a distributed terascale supercomputing facility
Slide 6: The MiGenAS Consortium
- MiGenAS: Microbial Genome Analysis System
- Integrated environment for microbial genome research
- Members of the consortium are:
- Departments from several Max Planck Institutes (MPIs)
- MPI for Biochemistry (Martinsried)
- MPI for Developmental Biology (Tübingen)
- MPI for Marine Biology (Bremen)
- MPI for Computer Science (Saarbrücken)
- CeBiTec, Center for Genome Research, Bielefeld
- RZG, Supercomputing Center of the MPG
Slide 7: Workflows in the Bio-Informatics World
- Tools from the MiGenAS consortium
  - GenDB
    - Focus on ORF prediction and automated annotation
  - MiGenAS WE
    - Focus on alignment, phylogeny, and structure prediction
Slide 8: Functionality
(Diagram: functional stages of the pipeline with their software packages, data structures, and databases.)
- Assembly: sequence data, phred/phrap/consed; result: project assembly (genome)
- Quality control: control center
- Database (MySQL): contigs (P), ORFs (P), observations (P)
- ORF prediction and automatic annotation: GenDB tools (import, blast, export); standard DBs; files
- Public genome ORFs (e.g. HaloLex); dedicated annotation
- Homology search: blast, PSI-blast, HMMer, HMMaccel; private ORFs
- Alignment: ClustalW, blast-align, poa, pcma
- Alignment validation: CluCheck, PAC/MAC
- Phylogeny: distance (protdist, neighbor), parsimony, maximum likelihood (Tree-Puzzle, fastDNAml), ARB; display with tree-view
- Structure prediction
  - Secondary structure: JNET, PSIPred, SignalP, TMHMM
  - Tertiary structure: Arby, Prosa2003
Slide 9: Workflows in the Bio-Informatics World
Simplified view of a typical workflow:
1. The query sequence is BLASTed against a selected DB
2. Selected target sequences are aligned with the query
3. The evolutionary distances between the sequences are calculated
4. The evolutionary tree is constructed
5. The result is displayed and analyzed using a tree viewer
6. (Reanalyze)
Diagram: Blast → ClustalW → ProtDist → Neighbor → TreeView
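The workflow above is a linear chain in which each tool's output is the next tool's input. A minimal Python sketch of that chaining, assuming hypothetical stub functions in place of the real tools (the names `make_stub`, `run_pipeline` and the query string are illustrative only):

```python
from typing import Callable, Dict, List

# A step takes the previous step's output and returns its own output.
Step = Callable[[Dict[str, object]], Dict[str, object]]

def make_stub(tool: str) -> Step:
    """Hypothetical stand-in for a real tool; it only records that it ran."""
    def step(data: Dict[str, object]) -> Dict[str, object]:
        out = dict(data)  # copy, so every intermediate result stays inspectable
        out["history"] = list(data["history"]) + [tool]
        return out
    return step

# Order taken from the workflow description: search, align, distance, tree, view.
PIPELINE: List[Step] = [make_stub(t) for t in
                        ("Blast", "ClustalW", "ProtDist", "Neighbor", "TreeView")]

def run_pipeline(query: str, steps: List[Step]) -> Dict[str, object]:
    data: Dict[str, object] = {"query": query, "history": []}
    for step in steps:
        data = step(data)  # the output of one tool becomes the input of the next
    return data

result = run_pipeline("MKTAYIAKQR", PIPELINE)
print(result["history"])  # ['Blast', 'ClustalW', 'ProtDist', 'Neighbor', 'TreeView']
```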
Slide 10: Workflows in the Bio-Informatics World
Task from the MiGenAS consortium: connect all tools into a pipeline that can be used step by step by the scientist.
Result: the MiGenAS pipeline, a semi-automatic workflow engine.
Diagram: Blast → ClustalW → ProtDist → Neighbor → TreeView
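In a semi-automatic engine the user, not the engine, decides when the next step runs and may repeat a step with other parameters after inspecting its output. A minimal sketch of that control model, assuming hypothetical stub tools; the class and method names are illustrative and not taken from MiGenAS:

```python
from typing import Callable, Dict, List, Tuple

# A tool maps (input, parameters) to an output.
Tool = Callable[[object, Dict[str, object]], object]

class SemiAutomaticPipeline:
    """Runs exactly one step per user action; nothing advances on its own."""

    def __init__(self, steps: List[Tuple[str, Tool]], initial_input: object):
        self.steps = steps
        self.results: List[object] = [initial_input]  # results[i] is the input of step i

    @property
    def position(self) -> int:
        return len(self.results) - 1  # index of the next step to run

    def run_next(self, **params: object) -> object:
        if self.position >= len(self.steps):
            raise RuntimeError("pipeline already finished")
        _name, tool = self.steps[self.position]
        output = tool(self.results[-1], params)
        self.results.append(output)
        return output

    def rerun_last(self, **params: object) -> object:
        """Discard the latest result and repeat that step with new parameters."""
        if self.position == 0:
            raise RuntimeError("no step has been run yet")
        self.results.pop()
        return self.run_next(**params)

# Hypothetical stubs standing in for Blast and ClustalW.
def blast(seq: object, params: Dict[str, object]) -> List[str]:
    return [f"hit{i}" for i in range(int(params.get("max_hits", 3)))]

def align(hits: object, params: Dict[str, object]) -> str:
    return "|".join(hits)

p = SemiAutomaticPipeline([("Blast", blast), ("ClustalW", align)], "MKTAYIAKQR")
p.run_next(max_hits=5)    # the scientist inspects five hits ...
p.rerun_last(max_hits=2)  # ... and decides to keep only two
print(p.run_next())       # hit0|hit1
```

Keeping every intermediate result in `results` is what makes stepping back and re-running cheap; the price is that all intermediate data stays in memory.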
Slide 11: MiGenAS pipeline
Semi-automatic workflow engine: flow control is handled manually by the user at processing time.
- Advanced use cases
  - Same workflow, different data sets
  - Same workflow, different parameter sets
  - A combination of the above
  - Predefined result selection criteria
These use cases form a typical set of requirements for automation.
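The combined use case (same workflow, different data sets and parameter sets, with predefined selection criteria) amounts to running the workflow over the cross product of inputs and filtering the outcomes. A minimal sketch, assuming the workflow returns a single numeric score; the toy workflow and threshold are purely illustrative:

```python
from itertools import product
from typing import Callable, Dict, Iterable, List, Tuple

Params = Dict[str, float]

def sweep(workflow: Callable[[str, Params], float],
          data_sets: Iterable[str],
          parameter_sets: Iterable[Params],
          accept: Callable[[float], bool]) -> List[Tuple[str, Params, float]]:
    """Run the same workflow on every (data set, parameter set) pair and keep
    only the results that meet a predefined selection criterion."""
    selected = []
    for data, params in product(data_sets, parameter_sets):
        score = workflow(data, params)
        if accept(score):
            selected.append((data, params, score))
    return selected

# Toy workflow and criterion, purely for illustration.
def toy_workflow(seq: str, params: Params) -> float:
    return len(seq) * params["weight"]

hits = sweep(toy_workflow,
             data_sets=["AAAA", "AAAAAAAA"],
             parameter_sets=[{"weight": 0.5}, {"weight": 1.0}],
             accept=lambda score: score >= 4.0)
print([score for _, _, score in hits])  # [4.0, 4.0, 8.0]
```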
Slide 12: MiGenAS pipeline
- Automated Data Processing
- Automated Parameter Space scanning
- Automated Data Set deployment
- Definition of complex workflows (workflow description)
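A complex workflow description can be treated as a dependency graph rather than a fixed chain. A minimal sketch using Python's standard `graphlib` (3.9+), assuming a hypothetical description in which ClustalW's alignment feeds both ProtDist and the validation tool CluCheck:

```python
from graphlib import TopologicalSorter

# Hypothetical workflow description: each step lists the steps whose
# output it needs (its predecessors).
WORKFLOW = {
    "Blast": set(),
    "ClustalW": {"Blast"},
    "CluCheck": {"ClustalW"},
    "ProtDist": {"ClustalW"},
    "Neighbor": {"ProtDist"},
    "TreeView": {"Neighbor"},
}

def execution_plan(description):
    """Return groups of steps; steps within one group do not depend on each
    other and could be dispatched in parallel."""
    sorter = TopologicalSorter(description)
    sorter.prepare()  # raises graphlib.CycleError for cyclic descriptions
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(execution_plan(WORKFLOW))
# [['Blast'], ['ClustalW'], ['CluCheck', 'ProtDist'], ['Neighbor'], ['TreeView']]
```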
Slide 13: MiGenAS pipeline
- Computational Steering
- Adapting the flow at execution time
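Computational steering means the remaining flow can change while the workflow is already running. One simple way to sketch this, assuming a hypothetical command vocabulary (`skip`, `insert`, `set`) sent through a thread-safe queue and applied between steps:

```python
import queue
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

# Steering commands, e.g. ("skip", "CluCheck"), ("insert", "PSIPred"),
# ("set", "evalue", 1e-5). The command vocabulary is hypothetical.
Command = Tuple[object, ...]

def run_steered(steps: List[str],
                run_step: Callable[[str, Dict[str, object]], None],
                commands: "queue.Queue[Command]",
                params: Dict[str, object]) -> List[str]:
    """Execute steps in order, applying steering commands between steps."""
    pending: Deque[str] = deque(steps)
    executed: List[str] = []
    while pending:
        # Apply every command that arrived while the previous step was running.
        while True:
            try:
                cmd = commands.get_nowait()
            except queue.Empty:
                break
            if cmd[0] == "skip" and cmd[1] in pending:
                pending.remove(cmd[1])
            elif cmd[0] == "insert":
                pending.appendleft(cmd[1])  # run the new step next
            elif cmd[0] == "set":
                params[cmd[1]] = cmd[2]
        if not pending:
            break
        step = pending.popleft()
        run_step(step, params)
        executed.append(step)
    return executed

cmds: "queue.Queue[Command]" = queue.Queue()
seen = []

def fake_run(step: str, params: Dict[str, object]) -> None:
    """Hypothetical tool launcher; the 'user' reacts to the Blast result."""
    seen.append((step, params["evalue"]))
    if step == "Blast":
        cmds.put(("skip", "CluCheck"))
        cmds.put(("insert", "PSIPred"))
        cmds.put(("set", "evalue", 1e-5))

order = run_steered(["Blast", "ClustalW", "CluCheck", "ProtDist"],
                    fake_run, cmds, {"evalue": 1e-3})
print(order)  # ['Blast', 'PSIPred', 'ClustalW', 'ProtDist']
```

Applying commands only between steps keeps each tool run consistent; steering inside a running tool would need support from the tool itself.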
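Slide 14: MiGenAS pipeline
- Integration of problematic tools
Diagram: the MiGenAS WE with entry, exit, and reentry points; at the exit point the flow leaves for a local resource or a tool on the Internet and returns at the reentry point.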
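The entry, exit, and reentry points can be sketched as a workflow that suspends itself at a step it cannot run, hands out a token, and resumes when the external result is posted back. In this minimal sketch the tool name `WebOnlyTool`, the token scheme, and the in-memory state store are assumptions; a real engine would persist state in a database:

```python
import uuid
from typing import Dict, List, Optional, Set, Tuple

class ReentrantWorkflow:
    """Steps listed in `external` cannot be run by the engine itself (e.g. a
    tool reachable only through a web page). The engine exits there, hands
    out a reentry token and resumes when the result is posted back."""

    def __init__(self, steps: List[str], external: Set[str]):
        self.steps = steps
        self.external = external
        self.suspended: Dict[str, dict] = {}  # token -> saved state (a DB in practice)

    def _advance(self, state: dict) -> Optional[str]:
        while state["pos"] < len(self.steps):
            step = self.steps[state["pos"]]
            if step in self.external:
                token = str(uuid.uuid4())
                self.suspended[token] = state
                return token  # exit point: `step` must be run elsewhere
            state["log"].append(f"{step}:local")
            state["pos"] += 1
        return None  # workflow completed

    def enter(self, data: str) -> Tuple[dict, Optional[str]]:
        """Entry point: start a new workflow instance."""
        state = {"pos": 0, "data": data, "log": []}
        return state, self._advance(state)

    def reenter(self, token: str, external_result: str) -> Optional[str]:
        """Reentry point: resume with the external tool's result."""
        state = self.suspended.pop(token)  # KeyError for unknown or reused tokens
        state["log"].append(f"{self.steps[state['pos']]}:{external_result}")
        state["pos"] += 1
        return self._advance(state)

wf = ReentrantWorkflow(["Blast", "WebOnlyTool", "ClustalW"], external={"WebOnlyTool"})
state, token = wf.enter("MKTAYIAKQR")  # runs Blast, then exits
# ... the scientist runs WebOnlyTool by hand and posts the result back:
done = wf.reenter(token, "uploaded")
print(done, state["log"])
# None ['Blast:local', 'WebOnlyTool:uploaded', 'ClustalW:local']
```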
Slide 15: MiGenAS pipeline
- Distributed instances of the MiGenAS WE
Diagram: MiGenAS WE at site A utilizing the MiGenAS WE at site B over the Internet
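With distributed instances, a WE at one site can run the steps whose tools it hosts and delegate the remaining steps to a peer site. A minimal sketch, assuming in-process peer objects as stand-ins for what would be remote Web service calls over the Internet; the class name and lookup strategy are hypothetical:

```python
from typing import Callable, Dict, List

class WorkflowEngineInstance:
    """Hypothetical WE instance: runs the tools it hosts and delegates the
    rest to peer instances (a remote call in a real deployment)."""

    def __init__(self, site: str, tools: Dict[str, Callable[[str], str]]):
        self.site = site
        self.tools = tools
        self.peers: List["WorkflowEngineInstance"] = []

    def offers(self, step: str) -> bool:
        return step in self.tools

    def run_step(self, step: str, data: str) -> str:
        if self.offers(step):
            return self.tools[step](data)
        # Delegate to the first peer that hosts the tool.
        peer = next((p for p in self.peers if p.offers(step)), None)
        if peer is None:
            raise LookupError(f"no site offers {step!r}")
        return peer.run_step(step, data)

    def run(self, steps: List[str], data: str) -> str:
        for step in steps:
            data = self.run_step(step, data)
        return data

site_a = WorkflowEngineInstance("A", {"Blast": lambda d: d + " > Blast@A"})
site_b = WorkflowEngineInstance("B", {"ClustalW": lambda d: d + " > ClustalW@B"})
site_a.peers.append(site_b)
print(site_a.run(["Blast", "ClustalW"], "query"))
# query > Blast@A > ClustalW@B
```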
Slide 16: MiGenAS WE
- Web-service-centric solution
  - Equip atomic components with WS endpoint interfaces
  - Define relevant workflows, e.g., in BPEL4WS
  - Deploy the workflow descriptions and WSs
  - Publish entry, exit, and reentry points
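Equipping an atomic component with an endpoint means wrapping a self-contained function so that it can be invoked over the network. The deck's stack uses SOAP Web services via JAX-RPC (Apache Axis, JBossWS); the minimal sketch below swaps that for plain JSON over HTTP from Python's standard library, and the component and the `/revcomp` path are hypothetical:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def reverse_complement(seq: str) -> str:
    """Hypothetical atomic component: any self-contained tool would do."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

class ComponentHandler(BaseHTTPRequestHandler):
    # Exposes the component at POST /revcomp with a JSON body {"seq": ...}.
    def do_POST(self):
        if self.path != "/revcomp":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        reply = json.dumps({"result": reverse_complement(body["seq"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the example quiet
        pass

def start_endpoint() -> HTTPServer:
    server = HTTPServer(("127.0.0.1", 0), ComponentHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def call_endpoint(server: HTTPServer, seq: str) -> str:
    host, port = server.server_address
    req = Request(f"http://{host}:{port}/revcomp",
                  data=json.dumps({"seq": seq}).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())["result"]

server = start_endpoint()
print(call_endpoint(server, "AACG"))  # CGTT
server.shutdown()
server.server_close()
```

A SOAP endpoint would additionally publish a WSDL describing its port types, which is what workflow languages such as BPEL4WS bind to.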
Slide 17: MiGenAS WE
Web-service-centric solution (flow diagram):
1. User input: the user invokes the workflow by contacting the entry point WS and providing the necessary input.
2. Set-up: the entry point WS performs a set-up by
   - storing the user input in a DB,
   - further initialization, i.e. spawning the WF processes,
   - setting exit and reentry points.
3. Processing of the predefined WF, e.g. Blast → ClustalW → …; atomic components take care of the necessary data access as defined.
4. Finally, the WF WS sends a user notification message informing the user that the results are available.
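The entry-point flow (store input, process the predefined WF, notify the user) can be sketched with an in-memory SQLite database. This is a minimal synchronous sketch under stated assumptions: the table layout, the stub tools, and the callback notifier are hypothetical, exit/reentry handling is omitted, and in the J2EE setting of the deck the WF processes would presumably be spawned asynchronously rather than run inline:

```python
import sqlite3
import uuid
from typing import Callable, List

def entry_point(db: sqlite3.Connection, owner: str, query: str,
                workflow: List[Callable[[str], str]],
                notify: Callable[[str, str], None]) -> str:
    """Entry-point service: store the input, run the predefined workflow,
    record the result and notify the user (synchronously, for simplicity)."""
    job_id = str(uuid.uuid4())
    # Set-up: persist the user input before any processing starts.
    db.execute("INSERT INTO jobs (id, owner, input, status) VALUES (?, ?, ?, 'running')",
               (job_id, owner, query))
    db.commit()
    # Processing of the predefined workflow (e.g. Blast -> ClustalW -> ...).
    data = query
    for step in workflow:
        data = step(data)
    db.execute("UPDATE jobs SET result = ?, status = 'done' WHERE id = ?", (data, job_id))
    db.commit()
    # Finally inform the user that the results are available.
    notify(owner, f"job {job_id} finished")
    return job_id

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, owner TEXT, input TEXT, "
           "result TEXT, status TEXT)")
messages = []
job = entry_point(db, "alice", "MKTAYIAKQR",
                  [lambda s: s + "|blast", lambda s: s + "|clustalw"],  # stub tools
                  lambda owner, msg: messages.append((owner, msg)))    # stub notifier
print(db.execute("SELECT status, result FROM jobs WHERE id = ?", (job,)).fetchone())
# ('done', 'MKTAYIAKQR|blast|clustalw')
```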
Slide 18: MiGenAS WE
- Technological realization
  - J2EE 1.4 (EJB, JSP, Servlet, JMS)
  - JBoss 4 (http://www.jboss.org)
  - JAX-RPC: Apache Axis, JBossWS
  - CORBA: JacORB, omniORB
  - BPEL4WS: ActiveBPEL (http://www.ActiveBPEL.org/)
Slide 19: MiGenAS WE
BPEL4WS
- Advantages
  - Various implementations (e.g. ActiveBPEL, Oracle PEM), hence no vendor lock-in
  - Provides a relatively easy-to-use framework for the developer
  - Integrates seamlessly into J2EE and CORBA environments
  - Some implementations allow non-WS endpoints (e.g. a method of an EJB)
- Problems
  - Our end users cannot define a BPEL WF themselves, so e.g. a rich client is needed
  - Port types etc. must be known in advance, hence it is difficult to define abstract WFs
Slide 20: Perspectives
W3C WS Choreography
- The MiGenAS WE could employ abstract choreographies
  - for user-defined workflows
  - as a meta-workflow language
  - to be processed and converted into a concrete choreography
- Abstract choreographies
  - Define
    - the type of information
    - the sequence and conditions
  - Do not define
    - the physical structure (e.g. port types)
    - how flow control conditions are determined
    - where messages should be sent
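The split between abstract and concrete choreographies can be sketched as a conversion step: the user states only information types and ordering, and the engine checks them and binds each step to a physical endpoint. This minimal sketch covers sequence and information types but not conditions; the data classes, endpoint URLs, and operation names are hypothetical and are not WS-CDL syntax:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class AbstractStep:
    """What the user specifies: type of information and ordering only."""
    name: str
    consumes: str  # information type, e.g. "ProteinSequence"
    produces: str

@dataclass(frozen=True)
class ConcreteStep:
    name: str
    endpoint: str   # where messages are sent
    operation: str  # physical detail such as a port type operation

def concretize(abstract: List[AbstractStep],
               bindings: Dict[str, Tuple[str, str]]) -> List[ConcreteStep]:
    """Check that consecutive steps agree on information types, then bind
    every abstract step to a physical endpoint known to the engine."""
    for prev, nxt in zip(abstract, abstract[1:]):
        if prev.produces != nxt.consumes:
            raise TypeError(f"{prev.name} produces {prev.produces}, "
                            f"but {nxt.name} expects {nxt.consumes}")
    missing = [s.name for s in abstract if s.name not in bindings]
    if missing:
        raise LookupError(f"no binding for {missing}")
    return [ConcreteStep(s.name, *bindings[s.name]) for s in abstract]

abstract = [
    AbstractStep("HomologySearch", "ProteinSequence", "SequenceSet"),
    AbstractStep("Alignment", "SequenceSet", "Alignment"),
    AbstractStep("Phylogeny", "Alignment", "Tree"),
]
bindings = {  # hypothetical endpoints, known only to the engine
    "HomologySearch": ("http://site-a.example/blast", "runBlast"),
    "Alignment": ("http://site-a.example/clustalw", "align"),
    "Phylogeny": ("http://site-b.example/neighbor", "buildTree"),
}
plan = concretize(abstract, bindings)
for step in plan:
    print(step.name, "->", step.endpoint, step.operation)
```

Keeping the bindings out of the user's description is what lets the same abstract workflow run against different sites or tool versions.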