Title: Workflow Task Clustering
Workflow Task Clustering for Best Effort Systems with Pegasus
pegasus.isi.edu
Gurmeet Singh, Mei-Hui Su, Karan Vahi, Ewa Deelman, Gaurang Mehta
Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292
Bruce Berriman, John Good
Infrared Processing and Analysis Center, California Institute of Technology, Pasadena, CA 91125
Daniel S. Katz
Center for Computation and Technology, Louisiana State University, Baton Rouge, LA 70803
A view of the Rho Oph dark cloud constructed with
Montage from deep exposures made with the Two
Micron All Sky Survey (2MASS) Extended Mission
Automatic Node Clustering
The structure of a small Montage workflow
Two clusters per level
Two tasks per cluster
1 degree² Montage on TeraGrid (plot series: level-based clustering with clustering factor 5; no clustering)
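The two level-based options named above (a fixed number of tasks per cluster, or a fixed number of clusters per level) can be illustrated with a minimal sketch. This is not Pegasus code: the function names, the (parent, child) edge representation, and the definition of a task's level as its longest distance from a root task are assumptions made for illustration, and the workflow is assumed to be acyclic.

```python
from collections import defaultdict, deque


def task_levels(tasks, edges):
    """Assign each task a level: roots are level 1, and every other task
    sits one level below its deepest parent (assumed definition)."""
    children = defaultdict(list)
    indegree = {t: 0 for t in tasks}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    level = {t: 1 for t in tasks if indegree[t] == 0}
    queue = deque(level)
    while queue:  # topological traversal of the acyclic workflow
        t = queue.popleft()
        for c in children[t]:
            level[c] = max(level.get(c, 1), level[t] + 1)
            indegree[c] -= 1
            if indegree[c] == 0:  # all parents seen, level is final
                queue.append(c)
    return level


def _group_by_level(tasks, edges):
    level = task_levels(tasks, edges)
    by_level = defaultdict(list)
    for t in tasks:
        by_level[level[t]].append(t)
    return by_level


def cluster_by_tasks_per_cluster(tasks, edges, tasks_per_cluster):
    """Split each level into consecutive chunks of at most
    `tasks_per_cluster` tasks (the clustering factor)."""
    by_level = _group_by_level(tasks, edges)
    clusters = []
    for lvl in sorted(by_level):
        group = by_level[lvl]
        for i in range(0, len(group), tasks_per_cluster):
            clusters.append((lvl, group[i:i + tasks_per_cluster]))
    return clusters


def cluster_by_clusters_per_level(tasks, edges, clusters_per_level):
    """Distribute the tasks of each level round-robin over at most
    `clusters_per_level` clusters."""
    by_level = _group_by_level(tasks, edges)
    clusters = []
    for lvl in sorted(by_level):
        group = by_level[lvl]
        k = min(clusters_per_level, len(group))
        for i in range(k):
            clusters.append((lvl, group[i::k]))
    return clusters
```

Each returned cluster is a (level, task list) pair that a scheduler could submit as a single job. Tasks on the same level have no dependencies on one another, so grouping them does not change the order imposed by the workflow edges.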
SCEC CyberShake workflows run using Pegasus and DAGMan on the TeraGrid and USC resources.
Cumulatively, the workflows consisted of over half a million tasks and used over 2.5 CPU years.
The largest CyberShake workflow contained on the order of 100,000 nodes and accessed 10 TB of data.
Support for LIGO on Open Science Grid
LIGO workflows: 185,000 nodes, 466,000 edges, 10 TB of input data, 1 TB of output data.