Title: Ilkay Altintas
1KEPLER Collaboration for Scientific Workflows and
ROADNet
- Ilkay Altintas
- Lead, Scientific Workflow Automation Technologies
Laboratory
- SDSC Project Manager, Kepler Scientific Workflow
Project
- San Diego Supercomputer Center, UCSD
2Cyberinfrastructure Needs
- Goal is for NSF facilities to provide capability
over the whole space
- SDSC HEC DATA
Cyberinfrastructure is the coordinated
aggregate of software, hardware and other
technologies, as well as human expertise,
required to support current and future
discoveries in science and engineering.
3Scientific Workflow Systems are a Glue
- Tools to combine different CI technologies
- Mission of scientific workflow systems
- Promote scientific discovery by providing
tools and methods to generate scientific
workflows
- Create a generic customizable graphical user
interface for scientists from different
scientific domains
- Support computational experiment creation,
execution, sharing, reuse and provenance
- Design frameworks which define efficient ways to
connect to the existing data and integrate
heterogeneous data from multiple resources
- Bring CI to the user's monitor!
4SWF Systems Requirements (1/2)
- It should work (no kidding!)
- USER REQUIREMENTS
- Design tools, especially for non-expert users
- Ease of use: a fairly simple user interface with
more complex features hidden in the background
- Reusable generic features
- Generic enough to serve different communities
but specific enough to serve one domain (e.g.
geosciences)
- Extensibility for the expert user: almost a
visual programming interface
- Registration and publication of data products and
process products (workflows), with provenance
5SWF Systems Requirements (2/2)
- TECHNICAL REQUIREMENTS
- Error detection and recovery from failure
- Logging information for each workflow
- Allow data-intensive and compute-intensive tasks
- (Maybe at the same time)
- HPC Data management/integration
- Allow status checks and on the fly updates
- Remote execution
- Visualization
- Semantics, metadata based data access
- Certification, trust, security
6Kepler is a Scientific Workflow System
www.kepler-project.org
- and a cross-project collaboration
- Latest alpha release out last week!
- Builds upon the open-source Ptolemy II framework
7Kepler is a Team Effort
Griddles
SKIDL
Resurgence
SRB
CIPRes
NLADR
Contributor names and funding info are at the
Kepler website!
New contributors
- Cheshire (UK Text Mining Center)
- SCEC
LOOKING
8Strategic Plan/Position
- A multi-project, multi-institution,
multi-national collaboration driven by
application pull from each project
- Development principles
- Define your requirements
- Reuse existing development if possible
- Extend features if needed
- Add new components if they don't exist
- Merge features if they can be
generalized
- Create a core of production-quality programmers
who share experiences via online media like
mailing lists, IRC, wiki, shared code and
documents (Currently 24 developers, 10 active)
- Develop methodology for scientific software and
workflow development
- Make your community happy
9Kepler Software Practice
- Joint CVS
- Open-source (BSD)
- Website & Wiki
- Communications
- Busy IRC channel
- Mailing lists
- Kepler-dev
- Kepler-users
- 6-monthly hackathons
10A co-development in KEPLER: GEON Dataset
Generation & Registration
Makefile > ant run
SQL database access (JDBC)
Matt, Chad, Dan et al. (SEEK)
Ilkay (SDM)
Efrat (GEON)
Yang (Ptolemy)
Xiaowen (SDM)
Edward et al. (Ptolemy)
11Actors are the Processing Components
- Actor
- Encapsulation of parameterized actions
- Interface defined by ports and parameters
- Port
- Communication between input and output data
- Without call-return semantics
- Model of computation
- Communication semantics among ports
- Flow of control
- Implementation is a framework
- Examples
- Simulink (The MathWorks)
- LabVIEW (from National Instruments)
- Easy 5x (from Boeing)
- ROOM (Real-time object-oriented modeling)
- ADL (Wright)
Actor-Oriented Design
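The actor and port ideas above can be made concrete with a small sketch. This is not the Ptolemy II or Kepler API (both are Java); the class names `Port`, `Actor`, and `Scale` are hypothetical, and the example only assumes that an actor's interface consists of parameters and ports, with no call-return between sender and receiver.

```python
from collections import deque


class Port:
    """A one-way channel: the producer puts tokens, the consumer takes them later."""

    def __init__(self, name):
        self.name = name
        self._tokens = deque()

    def put(self, token):
        # No call-return: the sender does not wait for a result.
        self._tokens.append(token)

    def has_token(self):
        return bool(self._tokens)

    def get(self):
        return self._tokens.popleft()


class Actor:
    """Hypothetical base class: an actor is parameters plus input/output ports."""

    def __init__(self, **parameters):
        self.parameters = parameters
        self.input = Port("input")
        self.output = Port("output")

    def fire(self):
        raise NotImplementedError


class Scale(Actor):
    """Multiplies each input token by the 'factor' parameter."""

    def fire(self):
        while self.input.has_token():
            self.output.put(self.input.get() * self.parameters["factor"])


scale = Scale(factor=3)
for value in (1, 2, 3):
    scale.input.put(value)
scale.fire()
scaled = [scale.output.get() for _ in range(3)]
print(scaled)
```

Nothing in `Scale` says when `fire` is called or how tokens reach its input port; in the actor-oriented view that decision belongs to the model of computation.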
12Directors are the WF Engines that
- Implement different computational models
- Define the semantics of
- execution of actors and workflows
- interactions between actors
- Ptolemy and Kepler are unique in combining
different execution models in heterogeneous
models!
- Kepler is extending Ptolemy directors with
specialized ones for web service based workflows
and distributed workflows.
- Process Networks
- Rendezvous
- Publish and Subscribe
- Continuous Time
- Finite State Machines
- Dataflow
- Time Triggered
- Synchronous/reactive model
- Discrete Event
- Wireless
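To illustrate how a director can change execution semantics without touching the actors, here is a minimal sketch. The two director classes are hypothetical and far simpler than any real Ptolemy II director; they only show that the same actor chain can be scheduled in different orders.

```python
class Actor:
    """Hypothetical actor: a named function applied to one token per firing."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def fire(self, token, log):
        log.append(self.name)  # record the firing order chosen by the director
        return self.func(token)


class TokenAtATimeDirector:
    """Pushes each token through the whole chain before reading the next one."""

    def run(self, actors, tokens):
        log, results = [], []
        for token in tokens:
            for actor in actors:
                token = actor.fire(token, log)
            results.append(token)
        return results, log


class StageAtATimeDirector:
    """Lets each actor consume the full stream before the next actor starts."""

    def run(self, actors, tokens):
        log, stream = [], list(tokens)
        for actor in actors:
            stream = [actor.fire(token, log) for token in stream]
        return stream, log


chain = [Actor("double", lambda x: 2 * x), Actor("increment", lambda x: x + 1)]
results_a, order_a = TokenAtATimeDirector().run(chain, [1, 2])
results_b, order_b = StageAtATimeDirector().run(chain, [1, 2])
print(results_a, order_a)
print(results_b, order_b)
```

Both directors produce the same results here, but the firing logs differ, which is the part of the behavior a director owns.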
13Vergil is the GUI for Kepler
Actor Search
Data Search
- Actor ontology and semantic search for actors
- Search -> Drag and drop -> Link via ports
- Metadata-based search for datasets
14Actor Search
- Kepler Actor Ontology
- Used in searching actors and creating conceptual
views (folders)
- Currently 160 Kepler actors added!
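One way to picture ontology-based actor search is a concept hierarchy plus per-actor annotations. The sketch below is an assumption-laden toy: the concept names, the annotations, and the functions `is_a` and `search_actors` are all hypothetical and do not reflect the actual Kepler Actor Ontology.

```python
# Hypothetical concept hierarchy: child concept -> parent concept.
ONTOLOGY = {
    "WebService": "RemoteExecution",
    "GridJob": "RemoteExecution",
    "RemoteExecution": "Actor",
    "Plot": "Visualization",
    "Visualization": "Actor",
}

# Hypothetical annotations: actor name -> concept it is tagged with.
ANNOTATIONS = {
    "Web Service Client": "WebService",
    "Globus Job Runner": "GridJob",
    "XY Plotter": "Plot",
}


def is_a(concept, ancestor):
    """Return True if `concept` equals `ancestor` or descends from it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


def search_actors(concept):
    """Find every actor annotated with `concept` or one of its subconcepts."""
    return sorted(name for name, tag in ANNOTATIONS.items() if is_a(tag, concept))


remote = search_actors("RemoteExecution")
print(remote)
```

Searching for a broad concept returns actors tagged with any narrower concept, which is also how a conceptual folder view could be populated.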
15Some actors in place for
- Generic Web Service Client and Web Service
Harvester
- Customizable RDBMS query and update
- Command Line wrapper tools
- Some Grid actors: Globus Job Runner,
GridFTP-based file access, Proxy Certificate
Generator
- SRB support
- Native R support
- Interaction with Nimrod and APST
- Communication with ORBs through actors and
services
- Imaging, Gridding, Vis Support
- Textual and Graphical Output
- more generic and domain-oriented actors
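As an illustration of what a customizable database query actor might look like, the sketch below wraps a parameterized SQL query. It uses Python's built-in `sqlite3` with an in-memory database in place of a JDBC connection; the class `DatabaseQueryActor`, the table, and the station data are made up for the example.

```python
import sqlite3


class DatabaseQueryActor:
    """Hypothetical actor: the SQL text is a parameter, rows are the output tokens."""

    def __init__(self, connection, query):
        self.connection = connection
        self.query = query

    def fire(self, *arguments):
        # Placeholders keep the query customizable without string concatenation.
        return self.connection.execute(self.query, arguments).fetchall()


# An in-memory database with made-up station data stands in for a real RDBMS.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE stations (name TEXT, elevation_m REAL)")
connection.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [("A", 120.0), ("B", 950.0), ("C", 1430.0)],
)

high_stations = DatabaseQueryActor(
    connection,
    "SELECT name FROM stations WHERE elevation_m > ? ORDER BY name",
)
rows = high_stations.fire(500.0)
print(rows)
```

The query string plays the role of an actor parameter, and the value passed to `fire` plays the role of an input token.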
16Data Search and Usage of Results
- Kepler DataGrid
- Discovery of data resources through local and
remote services
- SRB
- Grid and Web Services
- DB connections
- Registry of datasets on the fly using workflows
17Promoter Identification Workflow
20Enter initial inputs, Run and Display results
21Custom Output Visualizer
22Kepler and ROADNet
- Interaction with ORB
- To handle different data packets
- convert them into Kepler objects
- textually and graphically display information
- plot, visualize, and monitor data values
- QA and QC of data using user constraints
- Distribution in ROADNet stack
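The packet handling and QA/QC steps listed above can be sketched as follows. This does not use the real ORB interface; the packet layout (a plain dict with `src`, `chan`, and `val` keys), the `Reading` class, and the constraint format are assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """Hypothetical in-workflow object built from one raw packet."""
    source: str
    channel: str
    value: float


def to_reading(packet):
    """Convert a raw packet (here a plain dict, an assumed format) to a Reading."""
    return Reading(packet["src"], packet["chan"], float(packet["val"]))


def passes_qc(reading, constraints):
    """Check a reading against user-supplied (low, high) bounds per channel."""
    low, high = constraints.get(reading.channel, (float("-inf"), float("inf")))
    return low <= reading.value <= high


# Made-up packets and constraints, for illustration only.
packets = [
    {"src": "buoy1", "chan": "temp_c", "val": "14.2"},
    {"src": "buoy1", "chan": "temp_c", "val": "99.0"},
    {"src": "buoy2", "chan": "wind_ms", "val": "7.5"},
]
constraints = {"temp_c": (-5.0, 40.0)}

readings = [to_reading(p) for p in packets]
accepted = [r for r in readings if passes_qc(r, constraints)]
rejected = [r for r in readings if not passes_qc(r, constraints)]
print(accepted)
print(rejected)
```

Channels without a user constraint pass unchanged, so adding a new sensor type does not require touching the QC step.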
24Data Packet Handling -- Streaming
26Coming soon in Kepler
- MORE INFRASTRUCTURE TO SUPPORT SCIENCE!
- Full support for distributed execution
- Plug-in Kepler archives and better versioning
support
- Semantic and hybrid typed actors and workflow
construction
- Portal support and registration of products
- Support for process and data provenance
- Standardization of data interfaces
- Integration with SCIRun and SDSC vis modules
- Documentation of generated products in addition
to the existing manuals and documentation
27Hot Topics in Kepler Development
28What can LOOKING get from Kepler?
- Streaming applications generated through a visual
programming interface
- Resource search and usage in analysis workflows
- Deployment of control and analysis workflows as
web services
- Easy archiving of and access to data through
actors
- 24x7 runs of analysis and control tasks on
identified servers
- Use it as a web service composition tool for RAD
- Can make use of different computation models in
different layers!
29Questions and System Demonstration
Thanks!
Ilkay Altintas, altintas_at_sdsc.edu, 1 (858)
822-5453, http://www.sdsc.edu
30Examples of Models of Computation
- Dataflow
- Connections represent data streams
- Actors compute their output data streams from
input streams
- Useful for designing signal processing algorithms
and sampled control laws
- Time Triggered
- Follows the principle of global progress of time
- Strong composability, diagnosability, and formal
analysis
- Synchronous/reactive model
- Stimulated by events from the environment, but
responds instantaneously
- Excellent for applications with concurrent and
complex control logic
- Discrete Events
- Actors share a global notion of time and
communicate through events that are placed on a
continuous time line
- Used in modeling hardware and software timing
properties, communication networks, and queuing
systems
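The discrete-event idea of a shared time line can be sketched with a priority queue of timestamped events. This is a toy scheduler, not the Ptolemy II DE director; the function `run_discrete_events` and the three handlers are hypothetical.

```python
import heapq


def run_discrete_events(initial_events, handlers, end_time):
    """Process (time, actor, value) events in global timestamp order."""
    queue = list(initial_events)
    heapq.heapify(queue)
    trace = []
    while queue:
        time, actor, value = heapq.heappop(queue)
        if time > end_time:
            break
        trace.append((time, actor, value))
        # A handler may schedule new events at the current time or later.
        for event in handlers[actor](time, value):
            heapq.heappush(queue, event)
    return trace


def source(time, value):
    # Forward the value to the delay actor at the same timestamp.
    return [(time, "delay", value)]


def delay(time, value):
    # Model a processing latency of 2.0 time units before the sink sees it.
    return [(time + 2.0, "sink", value)]


def sink(time, value):
    return []


handlers = {"source": source, "delay": delay, "sink": sink}
trace = run_discrete_events(
    [(1.0, "source", "a"), (0.0, "source", "b")], handlers, end_time=10.0
)
print(trace)
```

Events are handled strictly in timestamp order regardless of the order in which they were scheduled, which is what makes the model suitable for timing analysis.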
31Examples of Models of Computation (Cont.)
- Process Networks
- Asynchronous communication between processes
- Excellent for signal processing
- Difficult to interoperate with models that
include a notion of time
- Rendezvous
- Synchronous communication between processes or
threads
- Excellent for applications where resource sharing
is a key element
- Poor in maintaining determinacy
- Difficult to interoperate with models that
include a notion of time
- Publish and Subscribe
- Connections are event streams; components produce
or consume events
- Good for distributed applications
- Continuous Time
- Connections carry continuous-time signals;
actors denote the relations among these signals
- Used in control system design for modeling
physical dynamics and continuous control laws
- Finite State Machines
- A component is called a state or node; the
connections represent transitions, or transfer of
control between states
- Sequential execution
- Excellent for describing control logic
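Of the models above, process networks are easy to approximate with threads and queues. The sketch below assumes unbounded FIFO channels with blocking reads; the process functions and the `DONE` sentinel are hypothetical and not part of any Kepler or Ptolemy II API.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a stream


def producer(out_channel, count):
    for n in range(count):
        out_channel.put(n)
    out_channel.put(DONE)


def squarer(in_channel, out_channel):
    while True:
        token = in_channel.get()  # blocking read: wait until data arrives
        if token is DONE:
            out_channel.put(DONE)
            return
        out_channel.put(token * token)


def collector(in_channel, results):
    while True:
        token = in_channel.get()
        if token is DONE:
            return
        results.append(token)


a_to_b, b_to_c = queue.Queue(), queue.Queue()
results = []
processes = [
    threading.Thread(target=producer, args=(a_to_b, 5)),
    threading.Thread(target=squarer, args=(a_to_b, b_to_c)),
    threading.Thread(target=collector, args=(b_to_c, results)),
]
for process in processes:
    process.start()
for process in processes:
    process.join()
print(results)
```

Each process runs at its own pace, yet the collected stream is the same on every run because every channel has one writer and one reader and reads block until data is available.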