Title: NGAS Presentation
1 REI: Recipe Execution Infrastructure
2 Purpose of REI
- Main Objectives of REI
  - Provide the services of a parallel Batch Queue System.
  - Make it easy to control and monitor complicated batches with job synchronization.
  - Make it possible to distribute tasks (processing load) over a cluster of CPUs/nodes.
- Not Provided in the Present Implementation
  - Services for distributing data within the cluster to the nodes doing the processing (data sharing/distribution is done via a common storage area/file server).
  - Services for resource management and advertising.
  - Services for explicit load balancing (optimized job distribution).
  - Special features for GRID applications.
3 Main Features
- Main Features of REI
  - Implemented in C (in-house implementation from scratch).
  - Uses an RDBMS for information sharing and task synchronization.
  - Execution of shell commands or native execution of CPL Recipes (no generic interfacing to shared object files).
  - The Pworker task execution daemon can take one of three roles:
    - Process Master Commands (Master Pworker).
    - Process Standard Commands (Standard Pworker).
    - Process both Master and Standard Commands.
  - Command line utilities are provided to add/remove/monitor commands and to control Pworkers.
  - An API is provided for implementing Master Command Libraries (also referred to as Recipe Planners) and Standard Command Libraries.
4 Command Line Interface
- Interaction with REI
  - Command line interface provided:
    - addcmd: Add a Master Command to the Master Command Queue (handles ABs and SOFs, which are not part of the core of REI).
    - cmdstat: Query the status of all commands or of a specific command. Tail feature provided.
    - rmcmd: Remove information for one command or all commands from the Command Queues (clean up).
    - pworker: The Pworker daemon.
    - stopworker: Stop one specific Pworker or all running Pworkers.
    - listworkers: List the Pworkers running in the system.
    - rmworker: Remove a Pworker (make it exit) or all Pworkers.
  - These commands are not part of the core REI system, but should be seen as convenience features. They are based on the REI libraries.
  - Commands can also be added to the DB directly via the REI libraries, i.e., the operation of REI can be controlled and monitored programmatically.
5 Command Lifecycle
- Command States
  - Each submitted command is in one of 7 states, indicating its current status.
6 Command Transitions
7 Interprocess Synchronization
- Interprocess Synchronization/Information Sharing
  - Pworkers synchronize themselves via the DB.
  - The DB is also used for exchanging information between processes in the system.
- Tables
  - pworker_registry: Information about Pworkers in the system (ID, node, Master and/or Standard Commands, ...).
  - pworker_master_command_queue: Information for the Master Commands waiting to be executed, under execution, and executed.
  - pworker_master_sequencer: Information about Master Commands that are BLOCKED.
  - pworker_command_queue: Standard Commands waiting to be executed, under execution, and executed.
  - pworker_command_sequencer: Used to sequence Standard Commands.
  - pworker_log: Log messages from Pworker processes.
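The idea of workers synchronizing through a shared queue table can be illustrated with a minimal sketch. This is not the REI implementation (which is written in C against an RDBMS that the slides do not name). Only the table name pworker_command_queue comes from the slides; the columns, the state names WAITING and EXECUTING, and the function claim_next are assumptions made for illustration, and SQLite stands in for the real database.

```python
import sqlite3

# Hypothetical schema: only the table name appears in the slides.
# Column names and state names are assumptions for this sketch.
SCHEMA = """
CREATE TABLE pworker_command_queue (
    cmd_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    state  TEXT NOT NULL DEFAULT 'WAITING',
    worker TEXT
)
"""


def claim_next(conn, worker_id):
    """Claim the oldest waiting command for worker_id; return its name or None."""
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT cmd_id, name FROM pworker_command_queue "
            "WHERE state = 'WAITING' ORDER BY cmd_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cmd_id, name = row
        # Re-checking the state in the WHERE clause guards against another
        # worker having claimed the same row after our SELECT.
        cur = conn.execute(
            "UPDATE pworker_command_queue SET state = 'EXECUTING', worker = ? "
            "WHERE cmd_id = ? AND state = 'WAITING'",
            (worker_id, cmd_id),
        )
        return name if cur.rowcount == 1 else None


def demo():
    """Queue two commands and let two hypothetical workers claim them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO pworker_command_queue (name) VALUES (?)",
        [("split_A",), ("split_B",)],
    )
    conn.commit()
    return [claim_next(conn, "pw1"), claim_next(conn, "pw2"), claim_next(conn, "pw1")]
```

The design point is that the database itself arbitrates between workers: a command is handed to exactly one worker because the state change is a single conditional update, so no direct worker-to-worker communication is needed.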
8 OmegaCam Demo Science Reduction Cascade/1
- OmegaCam Science Demo Cascade Example
  - Used adapted WFI frames (8 extensions).
  - Provided:
    - OCAM REI Recipe Planner Plug-In to schedule tasks for the recipes (a general Recipe Planner was made for all Recipes).
    - REI Standard Command Library Plug-Ins to do FITS file splitting and joining.
    - Cascade Scheduler Script to submit Master Commands and to create the SOFs needed.
  - 6 Recipes executed during the cascade (6 Master Commands issued to REI).
  - Total number of commands scheduled within REI for the cascade: 100.
  - Total number of intermediate/temporary and final data products: 200.
  - Number of SOFs involved: 10.
9 OmegaCam Demo Science Reduction Cascade/2
- Setting up Cascade Example

    addcmd -name ocam_reduce_sci_W_2005-02-08T162905 -bg \
        -waitfor ocam_reduce_std_W_2005-02-08T162905 \
        -recipe ocam_reduce_sci /data/ocam/sof/ocam_reduce_sci_W_2005-02-08T162905.sof \
        -out /raid/data/ocam/products/ocam_reduce_sci_W_2005-02-08T162905

    addcmd -name ocam_reduce_std_W_2005-02-08T162905 -bg \
        -waitfor ocam_mflat_W_2005-02-08T162905 \
        -trigger ocam_reduce_std_W_2005-02-08T162905 \
        -recipe ocam_reduce_std /raid/data/ocam/sof/ocam_reduce_std_W_2005-02-08T162905.sof \
        -out /raid/data/ocam/products/ocam_reduce_std_W_2005-02-08T162905

    addcmd -name ocam_mflat_W_2005-02-08T162905 -bg \
        -waitfor ocam_mtwilight_W_2005-02-08T162905 \
        -trigger ocam_mflat_W_2005-02-08T162905 \
        -recipe ocam_mflat /raid/data/ocam/sof/ocam_mflat_W_2005-02-08T162905.sof \
        -out /raid/data/ocam/products/ocam_mflat_W_2005-02-08T162905
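The three addcmd calls form a dependency chain through their -waitfor and -trigger options. The sketch below simulates the resulting release order. It assumes, from the naming in the example only, that a command fires its -trigger name when it completes and that a command with -waitfor stays blocked until that name has fired; the function execution_order and the dictionary layout are hypothetical and not part of REI.

```python
STAMP = "W_2005-02-08T162905"

# The three commands from the slide, in submission order.
CASCADE = [
    {"name": f"ocam_reduce_sci_{STAMP}",
     "waitfor": f"ocam_reduce_std_{STAMP}", "trigger": None},
    {"name": f"ocam_reduce_std_{STAMP}",
     "waitfor": f"ocam_mflat_{STAMP}", "trigger": f"ocam_reduce_std_{STAMP}"},
    {"name": f"ocam_mflat_{STAMP}",
     "waitfor": f"ocam_mtwilight_{STAMP}", "trigger": f"ocam_mflat_{STAMP}"},
]


def execution_order(commands, already_fired=()):
    """Return command names in an order that respects waitfor/trigger.

    Assumed semantics (not confirmed by the slides): a command is released
    once its 'waitfor' name has fired, and fires its 'trigger' on completion.
    Raises ValueError if some commands can never be released.
    """
    fired = set(already_fired)
    pending = list(commands)
    order = []
    while pending:
        ready = [c for c in pending
                 if c["waitfor"] is None or c["waitfor"] in fired]
        if not ready:
            blocked = ", ".join(c["name"] for c in pending)
            raise ValueError("blocked forever: " + blocked)
        for cmd in ready:
            order.append(cmd["name"])
            if cmd["trigger"] is not None:
                fired.add(cmd["trigger"])
        ready_names = {c["name"] for c in ready}
        pending = [c for c in pending if c["name"] not in ready_names]
    return order


# The mtwilight trigger is not fired by any of the three commands shown,
# so the sketch treats it as already fired by an earlier command.
ORDER = execution_order(CASCADE, already_fired=[f"ocam_mtwilight_{STAMP}"])
```

Under these assumptions the commands run in the reverse of their submission order: mflat first, then reduce_std, then reduce_sci. This is why the cascade can be submitted in one go and left to the queue.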
10 Task Synchronization
[Figure: task synchronization diagram.]
11 Command Scheduling
[Figure: command scheduling diagram. Labels: Frame A, Frame B, Split, Recipe, Join.]
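One reading of the Split / Recipe / Join scheduling pattern is that a Recipe Planner expands a single Master Command into a split command, several recipe commands that can run in parallel, and a join command that waits for all of them. The sketch below illustrates that expansion. The function plan_frame, the command naming scheme, and the choice of one recipe command per FITS extension are assumptions for illustration; the slides do not describe the planner API.

```python
def plan_frame(frame, recipe, n_ext):
    """Expand one frame into split, per-extension recipe, and join commands.

    Hypothetical planner sketch: each command is a dict with a 'name' and
    the list of command names it has to wait for.
    """
    split = {"name": f"split_{frame}", "waitfor": []}
    recipes = [
        {"name": f"{recipe}_{frame}_ext{i}", "waitfor": [split["name"]]}
        for i in range(1, n_ext + 1)
    ]
    # The join can only start once every per-extension recipe has finished.
    join = {"name": f"join_{frame}", "waitfor": [r["name"] for r in recipes]}
    return [split, *recipes, join]


# Two frames, each with 8 extensions as in the adapted WFI frames.
PLAN = plan_frame("A", "ocam_mflat", 8) + plan_frame("B", "ocam_mflat", 8)
```

With 8 extensions per frame, each frame expands into 10 Standard Commands (1 split, 8 recipe runs, 1 join), and the recipe runs of both frames are independent of each other, so they can be spread over the available Pworkers.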
12 DFO Cascading
- Controlling REI: DFO Environment
  - Already used in operations by DFO for some time.
  - DFO uses REI to control the scheduling of a UNIX shell script, which itself controls the execution of the recipes (calling esorex internally).
  - DFO uses parallelism at frame level; there is no parallelism within the processing of each frame.
  - REI is used as a queue system: jobs are submitted, and the scheduling and execution of the jobs are carried out by REI.
- Example addcmd in the DFO environment

    addcmd -name SINFO.2004-08-21T202528.895_tpl.ab -bg \
        -trigger mflat_SINFO.2004-08-21T202528.895_tpl.ab \
        -exe processAB -a SINFO.2004-08-21T202528.895_tpl.ab

    addcmd -name SINFO.2004-08-21T195507.961_tpl.ab -bg \
        -trigger mwave_SINFO.2004-08-21T195507.961_tpl.ab \
        -waitfor mflat_SINFO.2004-08-21T202528.895_tpl.ab \
        -exe processAB -a SINFO.2004-08-21T195507.961_tpl.ab
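An external control script could generate such addcmd lines rather than having them typed by hand. The sketch below only assembles the command line as text and uses only options that appear in the example (-name, -bg, -trigger, -waitfor, -exe) in the same token order. The helper name dfo_addcmd and its parameters are hypothetical, and whether -exe expects its argument quoted is not clear from the slide, so the tokens are passed through unchanged.

```python
import shlex


def dfo_addcmd(ab, trigger, waitfor=None):
    """Build the addcmd argument list for one AB, mirroring the slide example.

    Hypothetical helper: it does not run anything, it only assembles tokens.
    """
    argv = ["addcmd", "-name", ab, "-bg", "-trigger", trigger]
    if waitfor is not None:
        argv += ["-waitfor", waitfor]
    # Token order after -exe follows the slide; quoting rules are unknown.
    argv += ["-exe", "processAB", "-a", ab]
    return argv


FLAT_AB = "SINFO.2004-08-21T202528.895_tpl.ab"
WAVE_AB = "SINFO.2004-08-21T195507.961_tpl.ab"

FLAT_CMD = dfo_addcmd(FLAT_AB, trigger="mflat_" + FLAT_AB)
WAVE_CMD = dfo_addcmd(WAVE_AB, trigger="mwave_" + WAVE_AB,
                      waitfor="mflat_" + FLAT_AB)

# shlex.join renders an argument list as a single shell command line.
FLAT_LINE = shlex.join(FLAT_CMD)
WAVE_LINE = shlex.join(WAVE_CMD)
```

Keeping the commands as argument lists until the last moment avoids quoting mistakes; the same lists could be handed to subprocess.run on a system where addcmd is installed.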
13 Using REI
- How to Integrate a Pipeline in REI (Simplified)
  - Decide how to execute the recipes:
    - Natively, in the form of CPL Recipes.
    - Invoke the recipe library methods/functions from within Standard Commands.
    - Execute via jacket scripts/applications encapsulating the recipe.
  - Define the necessary/desirable level of parallelism.
  - Define execution plans for the various cascades.
  - Implement a Recipe Planner, if necessary, to do the internal coordination of the command scheduling (producing data for the Standard Commands).
  - Implement a Standard Command Library with special commands that should execute internally within the REI environment (if required).
  - Implement external control scripts to submit Master Commands, defining dependencies and providing data for the command execution if necessary.
  - Decide the architecture of the processing cluster (number of Master Pworkers, Pworkers, CPUs, nodes, amount of memory per CPU, ...).
  - Start up the Pworkers, defining their proper role and referring to the Command Plug-in Libraries provided (if any) and/or possible CPL Recipe Plug-in Libraries.