Title: Running Inspiral Workflows on the OSG


1
Running Inspiral Work-flows on the OSG
  • Britta Daudert
  • LIGO-Pegasus Face to Face
  • February 11/12 2008

G070489-00-R
2
Work-flow Planning
Dax: Abstract Work-flow
  • Resource independent
  • Logical File Names
  • No site selection
Pegasus: Work-flow Planner
  • Finds Physical File Names
  • Writes submit files
Dag: Concrete Work-flow
  • Resource dependent
  • Ready for submission to OSG site(s)
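The planning step on this slide (bind the abstract work-flow to a site, turn Logical File Names into Physical File Names) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how Pegasus is implemented: the class names, the `plan` function, the catalogue dictionaries and all paths are hypothetical, and this planner handles only a single site and writes no submit files.

```python
from dataclasses import dataclass


@dataclass
class AbstractJob:
    """A job in the abstract work-flow: no site, only logical names."""
    transformation: str      # logical executable name
    inputs: list[str]        # Logical File Names (LFNs)
    outputs: list[str]


@dataclass
class ConcreteJob:
    """The same job after planning: bound to a site and to physical paths."""
    transformation: str
    site: str
    executable: str          # physical path taken from the transformation catalogue
    inputs: list[str]        # Physical File Names (PFNs)
    outputs: list[str]


def plan(jobs, site, transformations, replicas, scratch_dir):
    """Toy planner (hypothetical): map an abstract work-flow onto one site.

    transformations: {(site, transformation): physical executable path}
    replicas:        {LFN: PFN} for files that already exist somewhere
    Files not in `replicas` are assumed to be produced by earlier jobs
    and written to the site's scratch directory.
    """
    concrete = []
    for job in jobs:
        key = (site, job.transformation)
        if key not in transformations:
            raise RuntimeError(f"could not map job {job.transformation} to site {site}")
        inputs = [replicas.get(lfn, f"{scratch_dir}/{lfn}") for lfn in job.inputs]
        outputs = [f"{scratch_dir}/{lfn}" for lfn in job.outputs]
        concrete.append(
            ConcreteJob(job.transformation, site, transformations[key], inputs, outputs)
        )
    return concrete


if __name__ == "__main__":
    # All names and paths below are made up for illustration.
    abstract = [
        AbstractJob("lalapps_tmpltbank", ["frames.cache"], ["bank.xml"]),
        AbstractJob("lalapps_inspiral", ["frames.cache", "bank.xml"], ["triggers.xml"]),
    ]
    tc = {
        ("STAR_BNL", "lalapps_tmpltbank"): "/opt/lalapps/bin/lalapps_tmpltbank",
        ("STAR_BNL", "lalapps_inspiral"): "/opt/lalapps/bin/lalapps_inspiral",
    }
    rc = {"frames.cache": "gsiftp://example.org/data/frames.cache"}
    for job in plan(abstract, "STAR_BNL", tc, rc, "/scratch/run0001"):
        print(job)
```

The point of the split is visible in the data: the abstract jobs mention no site or path, so the same description can be re-planned for another site by swapping the catalogues.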
3
Running Workflows on the OSG
[Diagram: Dax → Dag]
4
What Pegasus needs
  • 1) The dax file, e.g. inspiral-0.dax (abstract work-flow)
  • 2) The site catalogue sites.xml (site layout)
  • 3) The transformation catalogue tc.data (location of executables)
  • 4) The property file properties.bundle.STAR_BNL (Pegasus config file)
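Because pegasus-plan needs all four of these inputs, a small pre-flight check can save a failed planning run. The helper `missing_pegasus_inputs` below is hypothetical and only checks that each file exists as a regular file; it does not validate the contents.

```python
from pathlib import Path


def missing_pegasus_inputs(dax, site_catalogue, transformation_catalogue, properties):
    """Return a description of each planner input that is not an existing file.

    Hypothetical pre-flight check: existence only, no content validation.
    """
    required = [
        ("dax (abstract work-flow)", dax),
        ("site catalogue", site_catalogue),
        ("transformation catalogue", transformation_catalogue),
        ("property file", properties),
    ]
    return [f"{label}: {path}" for label, path in required if not Path(path).is_file()]


if __name__ == "__main__":
    problems = missing_pegasus_inputs(
        "inspiral-0.dax", "sites.xml", "tc.data", "properties.bundle.STAR_BNL"
    )
    for problem in problems:
        print("missing", problem)
```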

5
How to get what Pegasus needs
  • 1) The Dax is generated with the LIGO software Glue, LAL and LALApps
  • 2) and 3) tc.data and sites.xml are generated from the command line:
    • pegasus-get-sites --source vors --grid osg
    • pegasus-get-sites queries VORS (VO Resource Selector),
      http://vors.grid.iu.edu/cgi-bin/index.cgi
  • 4) The property file is written manually (by me)

6
Pegasus-plan and pegasus-run
  • From the command line we generate the Dag:

pegasus-plan -Dpegasus.user.properties=./properties.bundle.STAR_BNL --dir ./test_STAR_BNL --sites STAR_BNL --output local --dax inspiral-0.dax
(--dir ./test_STAR_BNL: storage of submit files)
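When the same command is run for several sites, it can help to build the argument list in a script rather than edit it by hand. The sketch below is a hypothetical helper (`pegasus_plan_argv` is not part of Pegasus) that uses only the options shown on this slide; the comments give our reading of each option.

```python
import shlex


def pegasus_plan_argv(properties, submit_dir, site, dax, output_site="local"):
    """Build the pegasus-plan argument list from the slide (hypothetical helper)."""
    return [
        "pegasus-plan",
        f"-Dpegasus.user.properties={properties}",  # Pegasus config (property) file
        "--dir", submit_dir,                        # where the submit files are stored
        "--sites", site,                            # execution site(s)
        "--output", output_site,                    # site the outputs are staged to
        "--dax", dax,                               # the abstract work-flow
    ]


if __name__ == "__main__":
    argv = pegasus_plan_argv(
        "./properties.bundle.STAR_BNL", "./test_STAR_BNL", "STAR_BNL", "inspiral-0.dax"
    )
    print(shlex.join(argv))
```

To actually run the planner, `subprocess.run(argv, check=True)` would execute it; the sketch stops at printing the command.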
7
Pegasus-plan and pegasus-run(2)
  • This generates the prompt:

I have concretized your abstract workflow. The workflow has been entered into the workflow database with a state of "planned". The next step is to start or execute your workflow. The invocation required is
pegasus-run -Dpegasus.user.properties=/usr2/bdaudert/test_STAR_BNL/bdaudert/pegasus/inspiral/run0001/pegasus.32712.properties --nodatabase /usr2/bdaudert/test_STAR_BNL/bdaudert/pegasus/inspiral
8
Pegasus-plan and pegasus-run(3)
  • All that is left to do to submit the work-flow to STAR_BNL is to copy and paste the command into the command line:

pegasus-run -Dpegasus.user.properties=/usr2/bdaudert/test_STAR_BNL/bdaudert/pegasus/inspiral/run0001/pegasus.32712.properties --nodatabase /usr2/bdaudert/test_STAR_BNL/bdaudert/pegasus/inspiral
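Instead of copying the command by hand, a script could pull it out of the planner's message. This sketch assumes, as in the message quoted on the previous slide, that the command follows the phrase "The invocation required is" and runs to the end of the output; `extract_run_command` is a hypothetical helper, and the wording of the message may differ between Pegasus versions.

```python
import re

# Assumed phrase that precedes the suggested command in pegasus-plan's output.
_RUN_RE = re.compile(r"The invocation required is\s+(pegasus-run\b.*)", re.DOTALL)


def extract_run_command(planner_output: str) -> str:
    """Return the suggested pegasus-run command as a single line."""
    match = _RUN_RE.search(planner_output)
    if match is None:
        raise ValueError("no pegasus-run invocation found")
    # Collapse line wrapping and repeated whitespace into single spaces.
    return " ".join(match.group(1).split())


if __name__ == "__main__":
    message = (
        "I have concretized your abstract workflow. ... "
        "The invocation required is\n"
        "pegasus-run -Dpegasus.user.properties=/tmp/run0001/pegasus.1.properties\n"
        "--nodatabase /tmp/run0001"
    )
    print(extract_run_command(message))
```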
9
The Inspiral Pipeline on UWMilwaukee vs Local Cluster
10
Monitoring with Gratia
11
Challenges and Solutions
Problem: Large Data Sets
  • Long Data Transfer Times
  • Disk Space Problems
12
Challenges and Solutions
Solutions for Large Data Sets:
  • Compressing data (LIGO)
  • Restructure Work-flow (LIGO)
  • SRM, Storage Resource Manager (OSG)
13
Challenges and Solutions
Problem: Work-flow design
  • Cleanup issues
  • Disk space problems
14
Challenges and Solutions
Solutions for Work-flow design:
  • Dynamic Cleanup (Pegasus 2.0)
  • Depth First (Condor)
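Why depth-first execution helps with disk space can be seen in a small simulation of dynamic cleanup, where each intermediate file is deleted as soon as the last job that reads it has finished. This is a hypothetical sketch of the idea, not Pegasus's cleanup algorithm or Condor's scheduler; the two-branch work-flow and the file sizes (arbitrary units) are made up.

```python
def peak_disk_usage(order, produces, consumes, sizes):
    """Simulate disk usage for one execution order with dynamic cleanup.

    produces[job] -> files the job writes; consumes[job] -> files it reads.
    A file is removed as soon as the last job that reads it has finished;
    files nobody reads (final outputs) stay on disk.
    Returns the largest amount of data on disk at any point.
    """
    # Count how many jobs still have to read each file.
    readers_left = {}
    for job in order:
        for f in consumes.get(job, []):
            readers_left[f] = readers_left.get(f, 0) + 1

    on_disk, usage, peak = set(), 0, 0
    for job in order:
        # While a job runs, its inputs and its new outputs are both on disk.
        for f in produces.get(job, []):
            on_disk.add(f)
            usage += sizes[f]
        peak = max(peak, usage)
        # Dynamic cleanup: drop inputs that no later job needs.
        for f in consumes.get(job, []):
            readers_left[f] -= 1
            if readers_left[f] == 0 and f in on_disk:
                on_disk.remove(f)
                usage -= sizes[f]
    return peak


if __name__ == "__main__":
    # Two independent branches A -> B -> C (hypothetical jobs and sizes).
    produces = {"A1": ["raw1"], "B1": ["mid1"], "C1": ["out1"],
                "A2": ["raw2"], "B2": ["mid2"], "C2": ["out2"]}
    consumes = {"B1": ["raw1"], "C1": ["mid1"], "B2": ["raw2"], "C2": ["mid2"]}
    sizes = {"raw1": 10, "raw2": 10, "mid1": 5, "mid2": 5, "out1": 1, "out2": 1}

    breadth_first = ["A1", "A2", "B1", "B2", "C1", "C2"]
    depth_first = ["A1", "B1", "C1", "A2", "B2", "C2"]
    print("breadth-first peak:", peak_disk_usage(breadth_first, produces, consumes, sizes))
    print("depth-first peak:  ", peak_disk_usage(depth_first, produces, consumes, sizes))
```

With these sizes the breadth-first order peaks at 25 units and the depth-first order at 16, because depth-first finishes one branch, and frees its intermediate files, before starting the next.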
15
When things go wrong
Error analysis: possible sources of errors
  • Britta
  • Condor-G/Globus
  • Pegasus
  • LIGO software
16
When things go wrong (2)
  • Britta Errors
    • Check for typos
    • Check tc.data and sites.xml for the correct locations of executables

17
When things go wrong (3)
  • Pegasus Errors
    • java.lang.RuntimeException: There are no entries for the sites
      Cause: the site was down at the time of the VORS query, or the site does not support LIGO
    • java.lang.RuntimeException: Site Selector could not map the job ligo::lalapps_tmpltbank
      Cause: executable lalapps_tmpltbank was compiled on a 32-bit machine but listed as 64-bit
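A 32/64-bit mismatch like the second error can be checked directly on the binary: byte 4 of an ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. The helpers below are hypothetical, and `arch_matches` assumes the architecture is the part of the tc.data system-info string before `::` (e.g. `INTEL32::LINUX`) and that it ends in the word size.

```python
ELF_MAGIC = b"\x7fELF"


def elf_class_bits(header: bytes) -> int:
    """Return 32 or 64 from the first bytes of an ELF file (EI_CLASS at offset 4)."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    ei_class = header[4]  # 1 = ELFCLASS32, 2 = ELFCLASS64
    if ei_class == 1:
        return 32
    if ei_class == 2:
        return 64
    raise ValueError(f"unknown ELF class {ei_class}")


def elf_word_size(path) -> int:
    """Read just enough of an executable to determine its word size."""
    with open(path, "rb") as fh:
        return elf_class_bits(fh.read(5))


def arch_matches(bits: int, sysinfo: str) -> bool:
    """Compare the binary's word size with an assumed 'ARCH::OS' sysinfo string."""
    arch = sysinfo.split("::")[0].upper()
    return arch.endswith(str(bits))


if __name__ == "__main__":
    import sys
    # Usage: python check_arch.py /path/to/lalapps_tmpltbank INTEL64::LINUX
    bits = elf_word_size(sys.argv[1])
    print(f"{bits}-bit binary; matches {sys.argv[2]}: {arch_matches(bits, sys.argv[2])}")
```

On Linux, the `file` command reports the same information (e.g. "ELF 32-bit LSB executable").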

18
When things go wrong (4)
  • LIGO Software issues
  • Error messages in the .out file of a failed job could look like:
    • XLAL Error XLALFrGenerateCache Gap in Frame data found
      Cause: bug in the Glue software (3 months to identify and fix)
    • Or: XLAL Error XLALFrGenerateCache No Frame Files found in ./gwf
      Possible causes: data corruption during file transfer (re-submit); worker nodes (WNs) can't see the files in DATA; a Condor issue?
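A "gap in frame data" means the frame files do not cover a contiguous stretch of GPS time. The sketch below checks a list of frame file names for such gaps before a job runs. It assumes the LIGO frame file naming convention `<observatory>-<description>-<GPS start>-<duration>.gwf`; `find_gaps` is a hypothetical helper, unrelated to how XLALFrGenerateCache works, and the file names in the example are made up. In the case on this slide the gap came from a Glue bug, so a check like this only tells you whether the input file list itself has a hole.

```python
import re

# Assumed naming convention: OBS-DESCRIPTION-GPSSTART-DURATION.gwf
FRAME_RE = re.compile(r"^([A-Z]+)-([^-]+)-(\d+)-(\d+)\.gwf$")


def find_gaps(filenames):
    """Return (gap_start, gap_end) GPS intervals not covered by the frame files."""
    spans = []
    for name in filenames:
        m = FRAME_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected frame file name: {name}")
        start, duration = int(m.group(3)), int(m.group(4))
        spans.append((start, start + duration))
    if not spans:
        return []

    spans.sort()
    gaps = []
    covered_until = spans[0][1]
    for start, end in spans[1:]:
        if start > covered_until:
            gaps.append((covered_until, start))
        covered_until = max(covered_until, end)  # tolerate overlapping files
    return gaps


if __name__ == "__main__":
    files = [
        "H-H1_RDS_C03_L2-815155213-128.gwf",
        "H-H1_RDS_C03_L2-815155341-128.gwf",
        "H-H1_RDS_C03_L2-815155597-128.gwf",
    ]
    print(find_gaps(files))
```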

19
When things go wrong (5)
  • Condor-G/Globus errors
  • Recently, at STAR_BNL:
    • XLAL Error XLALFrGenerateCache No Frame Files found in ./gwf
    • Cause: Condor upgrade on the gatekeeper needed!
    • Fix: opened a GOC ticket, troubleshooting with the sys admin, config changes, testing
    • Took 2 months to identify and resolve
