SCEC Workflows on the Grid

1
SCEC Workflows on the Grid
  • Gaurang Mehta
  • Center for Grid Technologies
  • USC Information Sciences Institute

2
Acknowledgements
  • Ewa Deelman, Sridhar Gullapalli, Carl Kesselman,
    Gurmeet Singh, Mei-Hui Su, Karan Vahi, (Center
    for Grid Technologies, ISI)
  • James Blythe, Yolanda Gil (Intelligent Systems
    Division, ISI)
  • Phil Maechling, Vipin Gupta (SCEC)
  • http://pegasus.isi.edu
  • Research funded as part of the NSF GriPhyN, NVO
    and SCEC projects and EU-funded GridLab

3
Scientific Applications and the Need for Workflows
  • Increasing level of complexity
      • Use of individual application components
      • Reuse of individual intermediate data products
        (files)
      • Description of data products using metadata
        attributes
  • Execution environment is complex and very dynamic
      • Resources come and go
      • Data is replicated
      • Components can be found at various locations or
        staged in on demand
  • Separation between
      • the application description
      • the actual execution description

4
Workflow Definitions
  • Workflow template: shows the main steps in the
    scientific analysis and their dependencies
    without specifying particular data products
  • Abstract workflow / workflow instance: depicts the
    scientific analysis including the data used and
    generated, but does not include information about
    the resources needed for execution
  • Concrete workflow / executable workflow: a workflow
    that includes details of the execution environment
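
The three levels above can be sketched as plain data structures. This is only an illustration: the class names, field names, step names, file names and URLs are all hypothetical and are not taken from Pegasus or the SCEC tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateStep:
    """Workflow template: a step and its dependencies, no data named."""
    name: str
    depends_on: tuple = ()


@dataclass(frozen=True)
class AbstractJob:
    """Abstract workflow (workflow instance): logical files, no resources."""
    step: str
    inputs: tuple
    outputs: tuple


@dataclass(frozen=True)
class ConcreteJob:
    """Concrete (executable) workflow: adds execution-environment details."""
    step: str
    site: str
    executable: str
    input_urls: tuple
    output_urls: tuple


def bind(job, site, executable, base_url):
    """Turn an abstract job into a concrete one for a chosen site."""
    return ConcreteJob(
        step=job.step,
        site=site,
        executable=executable,
        input_urls=tuple(f"{base_url}/{name}" for name in job.inputs),
        output_urls=tuple(f"{base_url}/{name}" for name in job.outputs),
    )


# Hypothetical two-step analysis, described at each level.
template = [
    TemplateStep("extract"),
    TemplateStep("simulate", depends_on=("extract",)),
]
abstract = [
    AbstractJob("extract", ("ruptures.db",), ("rupture_001.txt",)),
    AbstractJob("simulate", ("rupture_001.txt",), ("seismogram_001.dat",)),
]
concrete = [
    bind(job, "site_a", f"/opt/bin/{job.step}",
         "gsiftp://site-a.example.org/data")
    for job in abstract
]
```

Only the last level mentions a site, an executable path or a physical URL, which is the separation between application description and execution description that the slides call for.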

5
Integrated Workflow Architecture
  • [Architecture diagram; only its labels survive in this
    transcript]
  • Components shown: Workflow Template Editor (CAT),
    Domain Ontology, Workflow Library, Component
    Library, Data Selection, Conceptual Data Query
    Engine (DataFinder), Metadata Catalog, Workflow
    Mapping (Pegasus), Grid information services, Grid,
    Tools, Engineer
  • Products shown: Workflow Template (WT), Workflow
    Instance (WI), Executable Workflow
  • Connector labels: Query for components, Query for
    WT, Query for data given metadata, I/O data
    descriptions, Execution requirements
  • People named on the diagram: J. Zechar at USC
    (Teamwork Geo CS), D. Okaya at USC, L. Hearn at
    UBC, K. Olsen at SDSU

6
Concrete Workflow Generation and Mapping
  • [Diagram; surviving labels: application-dependent
    jobs, application-independent jobs]

7
Pegasus: Planning for Execution in Grids
  • Maps from abstract to concrete workflow
  • Algorithmic and AI-based techniques
  • Automatically locates physical locations for both
    workflow components and data
  • Finds appropriate resources to execute
  • Reuses existing data products where applicable
  • Publishes newly derived data products
  • Provides provenance information
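
Reuse of existing data products can be pictured as pruning the workflow: a job is dropped when everything it would produce that is still needed already exists. The function below is a minimal sketch under that assumption; it is not Pegasus's actual algorithm, and the job and file names are hypothetical.

```python
def reduce_workflow(jobs, available, requested):
    """Return the names of the jobs that still have to run.

    jobs:      dict mapping job name -> (input files, output files)
    available: set of logical files that already exist somewhere
    requested: final data products the user asked for
    """
    # Map each file to the job that would produce it.
    producer = {out: name for name, (_, outs) in jobs.items() for out in outs}
    needed = set()
    pending = [f for f in requested if f not in available]
    while pending:
        missing = pending.pop()
        name = producer.get(missing)
        if name is None:
            raise KeyError(f"{missing!r} is neither available nor produced")
        if name in needed:
            continue
        needed.add(name)
        # Walk backwards only through inputs that do not exist yet.
        pending.extend(f for f in jobs[name][0] if f not in available)
    return needed


# Hypothetical three-step chain: raw -> a -> b -> c
jobs = {
    "extract": (["raw"], ["a"]),
    "simulate": (["a"], ["b"]),
    "peak": (["b"], ["c"]),
}
run_all = reduce_workflow(jobs, available={"raw"}, requested=["c"])
run_less = reduce_workflow(jobs, available={"raw", "b"}, requested=["c"])
```

With only `raw` available all three jobs run; once `b` already exists, only `peak` is left.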

8
Generating a Concrete Workflow
  • Information
      • Location of files and component instances
      • State of the Grid resources
  • Select specific
      • Resources
      • Files
  • Add jobs required to form a concrete workflow
    that can be executed in the Grid environment
      • Data movement
      • Data registration
  • Each component in the abstract workflow is turned
    into an executable job
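
As a rough sketch of the steps above, the function below turns each abstract job into a compute job and adds the data movement and data registration jobs around it. The dictionaries stand in for the replica and transformation information; every name, path and URL is hypothetical, and the single-site plan is a simplifying assumption.

```python
def make_concrete(abstract_jobs, replicas, executables, site_url):
    """Build an ordered list of (kind, source, target) steps.

    abstract_jobs: jobs in dependency order, each a dict with
                   'name', 'transformation', 'inputs', 'outputs'
    replicas:      logical file name -> list of physical URLs
    executables:   transformation name -> executable path on the site
    site_url:      storage URL prefix of the chosen site
    """
    on_site = set()   # files already staged to, or created on, the site
    plan = []
    for job in abstract_jobs:
        # Data movement: stage in inputs that are not on the site yet.
        for lfn in job["inputs"]:
            if lfn not in on_site:
                plan.append(("transfer", replicas[lfn][0], f"{site_url}/{lfn}"))
                on_site.add(lfn)
        # The abstract component becomes an executable job.
        plan.append(("compute", executables[job["transformation"]], job["name"]))
        # Data registration: record where each new product lives.
        for lfn in job["outputs"]:
            on_site.add(lfn)
            plan.append(("register", lfn, f"{site_url}/{lfn}"))
    return plan


abstract_jobs = [
    {"name": "extract_1", "transformation": "extract",
     "inputs": ["ruptures.db"], "outputs": ["rupture.txt"]},
    {"name": "simulate_1", "transformation": "simulate",
     "inputs": ["rupture.txt"], "outputs": ["seismogram.dat"]},
]
replicas = {"ruptures.db": ["gsiftp://archive.example.org/ruptures.db"]}
executables = {"extract": "/opt/bin/extract", "simulate": "/opt/bin/simulate"}
plan = make_concrete(abstract_jobs, replicas, executables,
                     "gsiftp://site-a.example.org/scratch")
```

The intermediate file `rupture.txt` is produced on the site, so no transfer job is added for it.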

9
Information Components used by Pegasus
  • Globus Monitoring and Discovery Service (MDS)
      • Locates available resources
      • Finds resource properties
          • Dynamic: load, queue length
          • Static: location of GridFTP server, RLS, etc.
  • Globus Replica Location Service (RLS)
      • Stores mappings of logical files to their
        physical instances
      • Locates data that may be replicated
      • Registers new data products
  • Transformation Catalog
      • Stores information about transformations
        (executables) on remote resources either in
        installed or stageable form
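
To show how these three sources of information might be combined, the sketch below uses in-memory dictionaries in place of MDS, RLS and the Transformation Catalog. It does not call any Globus or Pegasus API; the field names, site names, paths and the shortest-queue selection rule are all assumptions made for illustration.

```python
def locate(lfn, replica_catalog):
    """RLS-style lookup: logical file name -> known physical URLs."""
    return replica_catalog.get(lfn, [])


def choose_site(transformation, sites, transformation_catalog):
    """Pick a site that offers the transformation, preferring short queues.

    sites: list of dicts with 'name', 'queue_length' and 'load'
           (stand-ins for MDS dynamic properties)
    transformation_catalog: (transformation, site name) -> executable path
    """
    candidates = [s for s in sites
                  if (transformation, s["name"]) in transformation_catalog]
    if not candidates:
        raise LookupError(f"no site provides {transformation!r}")
    best = min(candidates, key=lambda s: (s["queue_length"], s["load"]))
    return best["name"], transformation_catalog[(transformation, best["name"])]


sites = [
    {"name": "site_a", "queue_length": 12, "load": 0.9},
    {"name": "site_b", "queue_length": 3, "load": 0.4},
    {"name": "site_c", "queue_length": 0, "load": 0.0},
]
transformation_catalog = {
    ("synthesize", "site_a"): "/opt/scec/bin/synthesize",
    ("synthesize", "site_b"): "/usr/local/scec/synthesize",
}
replica_catalog = {
    "rupture_001.txt": ["gsiftp://site-a.example.org/data/rupture_001.txt"],
}

site, executable = choose_site("synthesize", sites, transformation_catalog)
urls = locate("rupture_001.txt", replica_catalog)
```

`site_c` has the shortest queue but no catalog entry for the transformation, so it is never considered; the choice falls to `site_b`.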

10
Data Management Components
  • GridFTP: a grid extension to the regular FTP
    protocol that allows third-party transfers,
    parallel streams and striping.
  • SRB/GridFTP DSI: a data service interface that
    provides GridFTP protocol access for storing and
    retrieving files in SRB (Storage Resource
    Broker).
  • Reliable File Transfer: builds on the GridFTP
    server. Makes transfers reliable through
    restarts, retries and other capabilities.

11
Benefits of the workflow and Pegasus approach
  • The workflow exposes
      • the structure of the application
      • the maximum parallelism of the application
  • Pegasus can take advantage of the structure to
      • Set a planning horizon (how far into the workflow
        to plan)
      • Cluster a set of workflow nodes to be executed as
        one (for performance)
      • Cluster a set of workflow nodes to be executed on
        the same site (to reduce data transfers)
  • Pegasus shields the user from the Grid details
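
One way to see how the workflow structure exposes parallelism is to group nodes by their depth in the dependency graph: nodes at the same depth do not depend on each other. The sketch below does that and then clusters one level into fixed-size groups. It illustrates the idea only and is not Pegasus's clustering algorithm; the node names are hypothetical.

```python
def levels(parents):
    """Group workflow nodes by depth in the dependency graph.

    parents: dict mapping every node -> set of nodes it depends on.
    Nodes that share a level can run in parallel.
    """
    depth = {}

    def visit(node):
        if node not in depth:
            # Roots get depth 0; others sit one below their deepest parent.
            depth[node] = 1 + max((visit(p) for p in parents[node]), default=-1)
        return depth[node]

    for node in parents:
        visit(node)
    grouped = [[] for _ in range(max(depth.values(), default=-1) + 1)]
    for node in sorted(depth):
        grouped[depth[node]].append(node)
    return grouped


def cluster(nodes, size):
    """Bundle nodes from one level into groups submitted as a single job."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


parents = {
    "extract": set(),
    "sim1": {"extract"},
    "sim2": {"extract"},
    "sim3": {"extract"},
    "curve": {"sim1", "sim2", "sim3"},
}
by_level = levels(parents)
clusters = cluster(by_level[1], 2)
```

Here the middle level holds three independent simulation nodes, which is the maximum parallelism of this small workflow; clustering them in pairs yields two submitted jobs instead of three.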

12
Benefits of the workflow and Pegasus approach
  • Pegasus can run the workflow on a variety of
    resources
  • Pegasus can run a single workflow across multiple
    resources
  • Pegasus can opportunistically take advantage of
    available resources (through dynamic workflow
    mapping)
  • Pegasus can take advantage of pre-existing
    intermediate data products
  • Pegasus can improve the performance of the
    application.

13
Nagios Monitoring

14
CyberShake Workflow

15
CyberShake Workflow
  • Tests done with ruptures from a 50 km region
    around USC
  • Approx. 2,350 ruptures with about 415,000 points
  • Tests done on multiple sites, including the HPC
    cluster at USC and TeraGrid at SDSC
  • System uses Pegasus to generate and plan the
    workflows to run on the grid
  • The peak acceleration values are used to construct
    a final hazard curve
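
The slides do not say how the hazard curve is built from the peak acceleration values. As a generic illustration only, a hazard curve can be read as the annual rate at which a ground-motion level is exceeded, summed over ruptures. The sketch below assumes each rupture has one peak acceleration and one annual occurrence rate; the rates and all numbers are invented for the example and are not CyberShake data.

```python
def hazard_curve(peaks, rates, thresholds):
    """Annual rate of exceedance for each acceleration threshold.

    peaks:      peak acceleration per rupture (in g)
    rates:      assumed annual occurrence rate per rupture
    thresholds: acceleration levels at which to evaluate the curve
    """
    if len(peaks) != len(rates):
        raise ValueError("peaks and rates must have the same length")
    curve = []
    for level in thresholds:
        # Sum the rates of all ruptures whose peak exceeds this level.
        rate = sum(r for p, r in zip(peaks, rates) if p > level)
        curve.append((level, rate))
    return curve


# Made-up values for three ruptures.
peaks = [0.1, 0.3, 0.5]
rates = [0.01, 0.002, 0.0005]
curve = hazard_curve(peaks, rates, thresholds=[0.05, 0.2, 0.4, 0.6])
```

The resulting rates fall as the threshold rises, which gives the familiar downward-sloping shape of a hazard curve.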

16
Future Goal
  • Goal: run a CyberShake analysis on ruptures
    within 200 km of USC
  • Generate a hazard curve using seismogram and peak
    acceleration values from 45,000 ruptures or
    300,000 ruptures with moments.
  • QUESTIONS?