Title: Ewa Deelman, deelman_at_isi.edu, www.isi.edu/deelman, pegasus.isi.edu
1 Workflow Optimization and Sharing
- Ewa Deelman, USC Information Sciences Institute
- Presented by Rizos Sakellariou, University of Manchester
2 Acknowledgments
- Pegasus: Gaurang Mehta, Mei-Hui Su, Karan Vahi (developers); Nandita Mandal, Arun Ramakrishnan, Tsai-Ming Tseng (students)
- DAGMan: Miron Livny and the Condor team
- Other collaborators: Yolanda Gil, Jihie Kim, Varun Ratnakar (Wings system); Henan Zhao, Rizos Sakellariou
- LIGO: Kent Blackburn, Duncan Brown, Stephen Fairhurst, David Meyers
- Montage: Bruce Berriman, John Good, Dan Katz, and Joe Jacobs
- SCEC: Tom Jordan, Robert Graves, Phil Maechling, David Okaya, Li Zhao
3 Scientific (Computational) Workflows
- Enable the assembly of community codes into large-scale analyses
- Montage example: generating science-grade mosaics of the sky (Bruce Berriman, Caltech)
4 Pegasus and Condor DAGMan
- Automatically map high-level, resource-independent workflow descriptions onto distributed resources such as the Open Science Grid and the TeraGrid
- Improve the performance of applications through:
  - Data reuse, to avoid duplicate computations and provide reliability
  - Workflow restructuring, to improve resource allocation
  - Automated task and data-transfer scheduling, to improve overall runtime
- Provide reliability through dynamic workflow remapping and execution
- Pegasus and DAGMan applications include LIGO's Binary Inspiral Analysis, NVO's Montage, SCEC's CyberShake simulations, neuroscience, artificial intelligence, genomics (GADU), and others
- Workflows with thousands of tasks and terabytes of data
- Use Condor and Globus to provide the middleware for distributed environments
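To make the mapping idea concrete, here is a minimal sketch of how a resource-independent workflow description can be turned into an executable ordering: the workflow is a DAG of tasks linked by data dependencies, and a task becomes runnable once all of its parents have finished (this is the ordering DAGMan enforces at execution time). The task names are hypothetical, not from any real application.

```python
from collections import deque

def executable_order(deps):
    """Topologically order workflow tasks.

    deps maps each task name to the set of parent tasks whose
    outputs it consumes -- an abstract, resource-independent DAG.
    Returns one valid execution order, or raises on a cycle.
    """
    indeg = {t: len(p) for t, p in deps.items()}
    children = {t: [] for t in deps}
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)
    # Tasks with no unfinished parents are ready to run.
    ready = deque(sorted(t for t, d in indeg.items() if d == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in sorted(children[t]):
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("workflow contains a cycle")
    return order
```

A real mapper additionally binds each task to a concrete site and inserts data stage-in/stage-out jobs, but the dependency ordering above is the skeleton everything else hangs on.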
5 Pegasus Workflow Mapping
[Figure: original workflow of 15 compute nodes, devoid of resource assignment, mapped onto distributed resources]
6 Typical Pegasus and DAGMan Deployment
7 Scalability
SCEC workflows run each week using Pegasus and DAGMan on the TeraGrid and USC resources. Cumulatively, the workflows consisted of over half a million tasks and used over 2.5 CPU-years.

"Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example", Ewa Deelman, Scott Callaghan, Edward Field, Hunter Francoeur, Robert Graves, Nitin Gupta, Vipin Gupta, Thomas H. Jordan, Carl Kesselman, Philip Maechling, John Mehringer, Gaurang Mehta, David Okaya, Karan Vahi, Li Zhao, e-Science 2006, Amsterdam, December 4-6, 2006 (best paper award).
8 Performance optimization through workflow restructuring
Montage application: 7,000 compute jobs in the instance; 10,000 nodes in the executable workflow; the same number of clusters as processors; speedup of 15 on 32 processors.

Small 1,200-node Montage workflow.

"Pegasus: a Framework for Mapping Complex Scientific Workflows onto Distributed Systems", Ewa Deelman, Gurmeet Singh, Mei-Hui Su, James Blythe, Yolanda Gil, Carl Kesselman, Gaurang Mehta, Karan Vahi, G. Bruce Berriman, John Good, Anastasia Laity, Joseph C. Jacob, Daniel S. Katz, Scientific Programming Journal, Volume 13, Number 3, 2005.
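The restructuring described above clusters many short tasks into fewer batch jobs, so per-job scheduling overhead is paid once per cluster rather than once per task. The sketch below is a toy model only, assuming independent tasks, uniform task runtimes, and a fixed per-job overhead; the numbers are illustrative, not from the Montage runs.

```python
def cluster_tasks(tasks, processors):
    """Round-robin independent tasks into as many clusters as
    processors; each cluster then runs as one batch job."""
    clusters = [[] for _ in range(processors)]
    for i, task in enumerate(tasks):
        clusters[i % processors].append(task)
    return [c for c in clusters if c]

def estimated_speedup(n_tasks, processors, task_time, overhead):
    """Toy model: serial execution pays the overhead per task;
    clustered execution runs one cluster per processor in parallel,
    paying the overhead once per cluster."""
    serial = n_tasks * (task_time + overhead)
    tasks_per_cluster = -(-n_tasks // processors)  # ceiling division
    parallel = tasks_per_cluster * task_time + overhead
    return serial / parallel
```

In practice overheads are not uniform and tasks are not independent, which is why measured speedups (such as 15 on 32 processors) fall short of the model's ideal.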
9 Data Reuse
- Sometimes it is cheaper to access the data than to regenerate it
- Keeping track of data as it is generated supports workflow-level checkpointing

"Mapping Complex Workflows onto Grid Environments", E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, K. Blackburn, A. Arbree, R. Cavanaugh, S. Koranda, Journal of Grid Computing, Vol. 1, No. 1, 2003, pp. 25-39.
10 Data Reuse
- Share the full version of the workflow?
- or
- Share a shorter version with the data files?

"Mapping Complex Workflows onto Grid Environments", E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, K. Blackburn, A. Arbree, R. Cavanaugh, S. Koranda, Journal of Grid Computing, Vol. 1, No. 1, 2003, pp. 25-39.
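The "shorter version" of a workflow comes from reduction: tasks whose outputs are already registered (e.g. in a replica catalog) are pruned, and those outputs are staged in instead of recomputed. The sketch below is a simplified, one-pass version under assumed data structures; the real reduction also prunes ancestors whose outputs are no longer needed by any remaining task.

```python
def reduce_workflow(tasks, registered):
    """Prune tasks whose outputs already exist (data reuse).

    tasks: {name: (inputs, outputs)}, with inputs/outputs as sets
    of file names. registered: files already available, e.g. in a
    replica catalog. Returns (kept_tasks, stage_in): the tasks that
    still must run, and the registered files to transfer in for them.
    """
    # Drop a task when every one of its outputs is already available:
    # it is cheaper to fetch the data than to regenerate it.
    kept = {n: (i, o) for n, (i, o) in tasks.items()
            if not o <= registered}
    produced = set().union(*(o for _, o in kept.values())) if kept else set()
    stage_in = set()
    for ins, _ in kept.values():
        # Inputs no kept task produces must be staged in from storage.
        stage_in |= (ins - produced) & registered
    return kept, stage_in
```

Tracking outputs this way is also what gives workflow-level checkpointing: after a failure, re-planning the same workflow prunes everything that already completed.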
11 Efficient data handling
- Workflow input data is staged dynamically; new data products are generated during execution
- For large workflows: 10,000 input files (and a similar order of intermediate/output files)
- If there is not enough space, failures occur
- Solution:
  - Determine which data are no longer needed, and when
  - Add nodes to the workflow to clean up data along the way
  - Take into account the disk space available on resources
- Benefits: simulations show up to 57% space improvement for LIGO-like workflows

"Scheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources", A. Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R. Sakellariou, K. Vahi, K. Blackburn, D. Meyers, and M. Samidi, CCGrid 2007.
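The cleanup idea above can be sketched minimally: for each file, find the last task in the execution order that touches it, and emit a cleanup job that depends on that task, so disk space is reclaimed as the workflow runs. This is an illustrative simplification with hypothetical data structures; a real planner also exempts final outputs the user wants kept and weighs cleanup against the space actually available on each resource.

```python
def add_cleanup_nodes(tasks, order):
    """tasks: {name: (inputs, outputs)} with file-name sets.
    order: task names in execution order.
    Returns {cleanup_job: task_it_depends_on}: one cleanup job per
    file, runnable once the file's last user has finished.
    """
    last_use = {}
    for name in order:
        ins, outs = tasks[name]
        for f in ins | outs:
            last_use[f] = name  # later tasks overwrite earlier ones
    # One cleanup node per file, anchored after its last use.
    # (Final workflow outputs would normally be excluded here.)
    return {f"rm_{f}": task for f, task in last_use.items()}
```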
12 44% improvement in footprint for Montage workflow running on OSG
13 Efficient data handling
- Sharing a workflow with nodes that clean up data is resource-independent.
- Taking into account space constraints on resources is resource-dependent; sharing the workflow would also require information about the specific resource constraints.

"Scheduling Data-Intensive Workflows onto Storage-Constrained Distributed Resources", A. Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R. Sakellariou, K. Vahi, K. Blackburn, D. Meyers, and M. Samidi, CCGrid 2007.
14 LIGO Inspiral Analysis Workflow
Small workflow: 164 nodes. Full-scale analysis: 185,000 nodes and 466,000 edges; 10 TB of input data and 1 TB of output data.

LIGO workflow running on OSG.

"Optimizing Workflow Data Footprint", G. Singh, K. Vahi, A. Ramakrishnan, G. Mehta, E. Deelman, H. Zhao, R. Sakellariou, K. Blackburn, D. Brown, S. Fairhurst, D. Meyers, G. B. Berriman, J. Good, D. S. Katz, submitted.
15 LIGO Workflows
26% improvement in disk space usage; 50% slower runtime.
16 LIGO Workflows
56% improvement in space usage; 3 times slower runtime.

Looking into new DAGMan capabilities for workflow node prioritization. Automated techniques are needed to determine priorities.
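One common automated heuristic for the node priorities mentioned above is the "bottom level": rank each task by the length of its longest chain of descendants, so tasks heading long dependent chains (critical-path tasks) are released first. This is a generic scheduling heuristic sketched here as an illustration, not the specific technique the slides propose.

```python
def bottom_level_priorities(children):
    """children: task -> set of child tasks in the workflow DAG.
    Returns a priority for each task equal to the number of tasks
    on its longest downward path (itself included); higher values
    should be dispatched first.
    """
    memo = {}
    def bl(task):
        if task not in memo:
            # Exit tasks get priority 1; others add 1 to their
            # highest-priority child.
            memo[task] = 1 + max((bl(c) for c in children[task]), default=0)
        return memo[task]
    for task in children:
        bl(task)
    return memo
```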
17 Aggressive Optimizations for Workflow Footprint
- They will affect the performance of the executable workflow

"Optimizing Workflow Data Footprint", G. Singh, K. Vahi, A. Ramakrishnan, G. Mehta, E. Deelman, H. Zhao, R. Sakellariou, K. Blackburn, D. Brown, S. Fairhurst, D. Meyers, G. B. Berriman, J. Good, D. S. Katz, submitted.
18
- What information related to optimizations do we need to keep track of for efficient workflow sharing?
19 What do Pegasus and DAGMan do for an application?
- Provide a level of abstraction above gridftp, condor_submit, globus-job-run, and similar commands
- Provide automated mapping and execution of workflow applications onto distributed resources
- Manage data files; can store and catalog intermediate and final data products
- Improve successful application execution
- Improve application performance
- Provide provenance-tracking capabilities
- Provide a Grid-aware workflow management tool
20 Relevant Links
- Pegasus: pegasus.isi.edu
  - Currently released as part of VDS and VDT
  - Standalone Pegasus distribution v2.0 coming out in May 2007; will remain part of VDT
- DAGMan: www.cs.wisc.edu/condor/dagman
- NSF Workshop on Challenges of Scientific Workflows: www.isi.edu/nsf-workflows06, E. Deelman and Y. Gil (chairs)
- Workflows for e-Science, Taylor, I.J., Deelman, E., Gannon, D.B., Shields, M. (Eds.), Dec. 2006
- Open Science Grid: www.opensciencegrid.org
- LIGO: www.ligo.caltech.edu
- SCEC: www.scec.org
- Montage: montage.ipac.caltech.edu
- Condor: www.cs.wisc.edu/condor
- Globus: www.globus.org
- TeraGrid: www.teragrid.org
Ewa Deelman, deelman_at_isi.edu, www.isi.edu/deelman, pegasus.isi.edu