Provided by: thomasso

Title: The MiGenAS Workflow Engine


1
The MiGenAS Workflow Engine
  • Thomas Soddemann, RZG
  • Markus Ramp, MPG
  • MPG MiGenAS Consortium

2
  • Overview
  • The RZG
  • The MiGenAS consortium
  • Bio-Informatics workflows
  • The MiGenAS pipeline
  • The MiGenAS WE
  • Perspectives

3
The Supercomputing Center
RZG: Rechenzentrum Garching, Supercomputing Center of the Max Planck Society (MPG)
  • Services and involvements
  • Supercomputing facility with a 5 TFlop IBM Regatta system
  • Linux compute farms
  • Data Storage
  • DEISA
  • MiGenAS
  • D-Grid: German Grid initiative
  • Data acquisition for ASDEX Upgrade and Wendelstein 7-X (plasma physics)

4
The Supercomputing Center
5
The Supercomputing Center
DEISA: Distributed European Infrastructure for Supercomputing Applications
  • Consortium of leading national supercomputing centers
  • Focuses on deploying a Grid-empowered infrastructure to build a
    distributed terascale supercomputing facility

6
The MiGenAS Consortium
  • Microbial Genome Analysis System - MiGenAS
  • Integrated environment for microbial genome
    research
  • Members of the consortium are
  • Departments from several Max Planck Institutes (MPIs)
  • MPI for Biochemistry (Martinsried)
  • MPI for Developmental Biology (Tübingen)
  • MPI for Marine Biology (Bremen)
  • MPI for Computer Science (Saarbrücken)
  • CeBiTec, Center for Genome Research, Bielefeld
  • RZG, Supercomputing Center of the MPG

7
Workflows in the Bio-Informatics World
  • Tools from the MiGenAS consortium
  • GenDB: focus on ORF prediction and automated annotation
  • MiGenAS WE: focus on alignment, phylogeny, and structure prediction

8
Functionality
[Diagram: functionality mapped to software packages, data structures, and databases]
  • Assembly: phred/phrap/consed; sequence data; project assembly result (genome)
  • Quality control: control center; database (MySQL) with contig (P), ORFs (P),
    Observations (P)
  • ORF prediction and automatic annotation: GenDB tools (import, blast, export);
    standard db; files; public genome ORFs, e.g. halolex
  • Dedicated annotation: private ORFs
  • Homology search: blast, PSI-blast, HMMaccel, HMMer
  • Alignment: ClustalW, blast-align, poa, pcma
  • Alignment validation: CluCheck, PAC / MAC
  • Phylogeny: protdist, neighbor, tree-view; distance, parsimony, maximum
    likelihood; Tree-Puzzle, fastDNAml, ARB
  • Structure prediction (secondary and tertiary structure): JNET, SignalP,
    PSIPred, TMHMM, Arby, Prosa2003
9
Workflows in the Bio-Informatics World
NGPSTKDFGKISES REFDNQNGPSTKD FGKISESREFDNQ
  • Simplified view of a typical workflow
  • 1. The query sequence is blasted against a selected DB (Blast)
  • 2. Selected target sequences are aligned with the query (ClustalW)
  • 3. The evolutionary distance between the sequences is calculated (ProtDist)
  • 4. The evolutionary tree is constructed (Neighbor)
  • 5. The result is displayed and analyzed using a tree viewer (TreeView)
  • 6. (Reanalyze)
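
The distance step of the workflow above can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: it computes the plain p-distance (the fraction of differing aligned positions), which is not the substitution-model distance that ProtDist calculates, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every pair of sequences in the alignment."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# Hypothetical toy alignment (not real MiGenAS data).
alignment = {
    "query": "NGPSTKDFGK",
    "hit_1": "NGPSTKDFGR",
    "hit_2": "NGASTK-FGR",
}
matrix = distance_matrix(alignment)
```

A tree-building step such as Neighbor would take a matrix of this kind as its input.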
10
Workflows in the Bio-Informatics World
NGPSTKDFGKISES REFDNQNGPSTKD FGKISESREFDNQ
Task from the MiGenAS consortium: Connect all tools into a pipeline that the
scientist can use step by step.
Result: The MiGenAS pipeline, a semi-automatic workflow engine.
Blast -> ClustalW -> ProtDist -> Neighbor -> TreeView
11
MiGenAS pipeline
Semi-automatic Workflow Engine: Flow control is handled manually by the user at
processing time.
  • Advanced use cases
  • Same workflow, different data sets
  • Same workflow, different parameter sets
  • Combination of the above
  • Predefined result selection criteria

Typical set of requirements for automation
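
The advanced use cases listed on this slide amount to a sweep over data sets and parameter sets followed by a selection step. The following Python sketch shows one possible shape of such a sweep. Everything in it is an assumption made for illustration: `run_workflow` is a hypothetical stand-in that only records its inputs, and the file names and the `evalue` parameter are invented.

```python
from itertools import product


def run_workflow(dataset: str, params: dict) -> dict:
    """Hypothetical stand-in for one workflow run; it only records its inputs."""
    return {"dataset": dataset, **params}


def select(results: list[dict], criterion) -> list[dict]:
    """Apply a predefined result selection criterion."""
    return [r for r in results if criterion(r)]


datasets = ["genome_a.fasta", "genome_b.fasta"]         # hypothetical file names
parameter_sets = [{"evalue": 1e-5}, {"evalue": 1e-10}]  # hypothetical settings

# Same workflow, every combination of data set and parameter set.
results = [run_workflow(d, p) for d, p in product(datasets, parameter_sets)]

# Keep only the runs that satisfy the predefined criterion.
strict = select(results, lambda r: r["evalue"] <= 1e-10)
```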
12
MiGenAS pipeline
  • Automated Data Processing
  • Automated Parameter Space scanning
  • Automated Data Set deployment
  • Definition of complex workflows

Description
13
MiGenAS pipeline
  • Computational Steering
  • Adapting the flow at execution time
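
One way to picture adapting the flow at execution time is a runner that consults a steering function before each step. The Python sketch below is only an illustration under stated assumptions: the step names, the `steer` callback, and the "run" / "skip" / "stop" decisions are hypothetical and do not describe the MiGenAS implementation.

```python
from typing import Callable

Step = Callable[[list[str]], list[str]]


def run_steered(
    steps: list[tuple[str, Step]],
    data: list[str],
    steer: Callable[[str, list[str]], str],
) -> tuple[list[str], list[str]]:
    """Run steps in order; before each one, `steer` returns 'run', 'skip' or 'stop'."""
    executed = []
    for name, step in steps:
        decision = steer(name, data)
        if decision == "stop":
            break
        if decision == "skip":
            continue
        data = step(data)
        executed.append(name)
    return data, executed


# Toy steps operating on a list of sequence strings (hypothetical).
steps = [
    ("filter_short", lambda seqs: [s for s in seqs if len(s) >= 5]),
    ("uppercase", lambda seqs: [s.upper() for s in seqs]),
    ("deduplicate", lambda seqs: sorted(set(seqs))),
]


def steer(name: str, data: list[str]) -> str:
    # Skip deduplication when at most one sequence is left.
    if name == "deduplicate" and len(data) <= 1:
        return "skip"
    return "run"


result, executed = run_steered(steps, ["ngpstk", "fgk", "refdnq"], steer)
single_result, single_executed = run_steered(steps, ["ngpstk"], steer)
```

In an interactive setting the decision would come from the user or from intermediate results rather than from a fixed function.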

14
MiGenAS pipeline
  • Integration of problematic tools

[Diagram labels: Entry, MiGenAS WE, Local Resource, Internet, Exit, Reentry]
15
MiGenAS pipeline
  • Distributed Instances of the MiGenAS WE

[Diagram: MiGenAS WE Site A utilizing MiGenAS WE Site B via the Internet]
16
MiGenAS WE
  • Web Service centric Solution
  • Equip atomic components with WS endpoint
    interfaces
  • Define relevant workflows, e.g., in BPEL4WS
  • Deploy the workflow descriptions and WS
  • Publish entry, exit, reentry points

MiGenAS WE
17
MiGenAS WE
Web Service centric Solution
User Input
  • User invokes the workflow by
  • Contacting the Entry Point
  • Providing the necessary input

Set Up
  • The Web service performs a setup by
  • Storing user input in a DB
  • Further initialization and spawning of WF processes
  • Setting exit and reentry points

Processing
  • Processing of the predefined WF, e.g. Blast -> ClustalW -> ...
  • Note: Atomic components take care of necessary data access as defined

User notification
  • Finally, the user is informed of the availability of results

[Diagram: the user contacts the Entry Point WS and provides input; the WF WS
sends a user notification message]
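
The sequence on this slide (entry point, setup, processing, notification) can be sketched in a few lines of Python. This is a plain in-process illustration, not a Web service and not the MiGenAS code: the `jobs` table, the function names, and the stand-in `blast` and `clustalw` functions are all hypothetical, and an in-memory SQLite database replaces the real DB.

```python
import sqlite3
import uuid


def blast(seq: str) -> str:
    """Hypothetical stand-in for an atomic component."""
    return f"hits({seq})"


def clustalw(hits: str) -> str:
    """Hypothetical stand-in for an atomic component."""
    return f"alignment({hits})"


WORKFLOW = [blast, clustalw]  # predefined WF: Blast -> ClustalW


def entry_point(db: sqlite3.Connection, user_name: str, sequence: str) -> str:
    """Set up: store the user input and return a job id usable for reentry."""
    job_id = uuid.uuid4().hex
    db.execute(
        "INSERT INTO jobs VALUES (?, ?, ?, NULL)", (job_id, user_name, sequence)
    )
    db.commit()
    return job_id


def process(db: sqlite3.Connection, job_id: str, notify) -> None:
    """Run the predefined workflow, store the result, then notify the user."""
    user_name, data = db.execute(
        "SELECT user_name, input_data FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    for step in WORKFLOW:
        data = step(data)
    db.execute("UPDATE jobs SET result = ? WHERE id = ?", (data, job_id))
    db.commit()
    notify(user_name, f"Results for job {job_id} are available")


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, user_name TEXT, "
    "input_data TEXT, result TEXT)"
)
messages = []
job = entry_point(db, "scientist", "NGPSTKDFGKISES")
process(db, job, lambda user, text: messages.append((user, text)))
stored = db.execute("SELECT result FROM jobs WHERE id = ?", (job,)).fetchone()[0]
```

Keeping the input and result in the database under a job id is what would let a user leave at an exit point and come back later through a reentry point.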
18
MiGenAS WE
  • Technological Realization
  • J2EE 1.4 (EJB, JSP, Servlet, JMS)
  • JBoss 4 (http://www.jboss.org)
  • JAX-RPC
  • Apache Axis, JBossWS
  • CORBA
  • JacORB, omniORB
  • BPEL4WS
  • ActiveBPEL (http://www.ActiveBPEL.org/)

19
MiGenAS WE
BPEL4WS
  • Advantages
  • Various implementations (e.g. ActiveBPEL, Oracle PEM), hence no vendor
    lock-in
  • Provides a relatively easy-to-use framework for the developer
  • Integrates seamlessly into J2EE and CORBA
    environments
  • Some implementations allow non-WS endpoints
    (e.g. method of an EJB)
  • Problems
  • Impossible for our end users to define a BPEL WF themselves
  • -> e.g. a rich client is needed
  • One needs to know port types etc. in advance, hence it is difficult to
    define an abstract WF

20
Perspectives
W3C WS Choreography
  • MiGenAS WE could employ Abstract Choreographies
  • for user-defined workflows
  • as a meta-workflow language
  • to be processed and converted to a Concrete Choreography
  • Abstract Choreographies define
  • Type of information
  • Sequence and conditions
  • Abstract Choreographies do not define
  • Physical structure (e.g. port types)
  • How flow control conditions are determined
  • Where messages should be sent
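
The split between abstract and concrete choreographies can be illustrated with a small Python sketch. It is not WS-CDL syntax and not part of the MiGenAS WE: the class names, the `bind` function, the registry, and the example.org URLs are all hypothetical. The abstract steps carry only the type of information and the sequence; the physical details are added when binding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AbstractStep:
    """What is exchanged and in which order, without endpoint details."""
    role: str      # logical participant, e.g. "homology_search"
    consumes: str  # type of information received
    produces: str  # type of information passed on


@dataclass(frozen=True)
class ConcreteStep:
    role: str
    endpoint: str   # where messages should be sent
    port_type: str  # physical interface detail


def bind(abstract: list[AbstractStep],
         registry: dict[str, tuple[str, str]]) -> list[ConcreteStep]:
    """Convert an abstract choreography into a concrete one via a registry."""
    # Check that each step produces what the next one consumes.
    for prev, nxt in zip(abstract, abstract[1:]):
        if prev.produces != nxt.consumes:
            raise ValueError(f"{prev.role} -> {nxt.role}: information types differ")
    return [ConcreteStep(s.role, *registry[s.role]) for s in abstract]


abstract = [
    AbstractStep("homology_search", "sequence", "hit_list"),
    AbstractStep("alignment", "hit_list", "alignment"),
]
registry = {  # hypothetical endpoints and port types
    "homology_search": ("http://site-a.example.org/ws/blast", "BlastPortType"),
    "alignment": ("http://site-b.example.org/ws/clustalw", "ClustalWPortType"),
}
concrete = bind(abstract, registry)
```

Under this split, an end user would only have to write the abstract list, which avoids the need to know port types in advance.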
