Chris Brew - PowerPoint PPT Presentation


1
UK-SPGrid: Where we are and where we're heading
  • Chris Brew
  • RAL

2
Approach
  • BaBar UK already had distributed infrastructure
    with 7 farms, each with the BaBar software
    installed and local Objy CondDB etc.
  • Decision was to initially add grid to SP
    production on these farms rather than gridify
    the SP software

3
Approach (2)
  • Used Globus to turn the remote farms into a
    single pseudo batch system
  • SP jobs are built, merged and exported as at any
    other SP site
  • Submission is to Globus rather than a local batch
    system
  • Once we have a working prototype, look at moving
    it to LCG

4
Components
  • The software is made up of three parts:
    • Submitter
    • Unpacker
    • Monitoring

5
SPTools Interface/Submitter
  • Does most of the interfacing to the SPTools
  • Calls spcheck as needed to get lists of jobs to
    submit and of failed jobs to rebuild
  • Calls spsub to submit the jobs, which calls a
    modified batchUtils.pl, which:
    • Tars up the rundir and puts it on a GridFTP
      server
    • Uses Globus to run a job wrapper script at the
      remote site
    • The wrapper script gets the rundir, sets up the
      env, runs the Moose job, then GridFTPs the
      rundir back
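
The staging and submission steps above can be sketched in Python. This is only an illustration, not the modified batchUtils.pl (which is Perl): the function names, the example URLs and the wrapper path are hypothetical, and the choice of `globus-url-copy` and `globus-job-submit` as the Globus commands is an assumption, since the slides only say "GridFTP" and "Globus". The commands are built but not executed.

```python
import tarfile
from pathlib import Path


def pack_rundir(rundir: Path, out_dir: Path) -> Path:
    """Tar up one run directory; returns the path of the tarball."""
    tarball = out_dir / f"{rundir.name}.tar.gz"
    with tarfile.open(tarball, "w:gz") as tar:
        tar.add(rundir, arcname=rundir.name)
    return tarball


def submission_commands(tarball: Path, gridftp_url: str,
                        gatekeeper: str, wrapper: str) -> list[list[str]]:
    """Build (but do not run) the two commands for one job.

    gridftp_url, gatekeeper and wrapper are placeholders supplied by
    the caller; none of them come from the original slides.
    """
    remote_tar = f"{gridftp_url}/{tarball.name}"
    return [
        # Stage the tarball onto the GridFTP server.
        ["globus-url-copy", f"file://{tarball.resolve()}", remote_tar],
        # Ask the remote gatekeeper to run the wrapper on that tarball.
        ["globus-job-submit", gatekeeper, wrapper, remote_tar],
    ]
```

The wrapper at the remote site would then do the reverse: fetch the tarball, unpack it, set up the environment, run the job and copy the rundir back.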

6
  • Submitter runs cyclically in the background
    sleeping for a set time between runs
  • It reads a list of sites from a config file
  • Builds lists of jobs that are ready to run,
    pending, running or failed
  • Rebuilds the failed jobs
  • It then submits jobs to sites where the number of
    pending and/or running jobs is below the site
    thresholds
  • Have a prototype submitter/batchUtils.pl that
    submits to LCG sites
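
The threshold rule in the submit cycle can be illustrated with a small Python sketch. It assumes a single combined limit per site on pending plus running jobs; the slides leave open whether the real submitter keeps separate thresholds. The function name and data layout are hypothetical.

```python
def plan_submissions(ready, active, thresholds):
    """Assign ready jobs to sites that are below their job threshold.

    ready:      list of job names, in submission order
    active:     dict site -> number of pending or running jobs
    thresholds: dict site -> maximum jobs allowed at that site
    Returns a list of (job, site) pairs; jobs with no free slot are
    left for the next cycle.
    """
    load = dict(active)          # copy so the caller's counts are untouched
    queue = list(ready)
    plan = []
    for site, limit in thresholds.items():
        # Fill this site until it reaches its limit or jobs run out.
        while queue and load.get(site, 0) < limit:
            plan.append((queue.pop(0), site))
            load[site] = load.get(site, 0) + 1
    return plan
```

A background submitter would call something like this once per cycle, submit the planned jobs, then sleep for the configured interval before rebuilding its lists.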

7
Unpacker
  • Job GridFTPs its own output back to the
    submitter
  • spunpacker daemon checks the return dir:
    • checks the returned tar file
    • unpacks it to the allruns folder
    • spchecks it
    • removes the tar files
    • copies in the log files
  • Starting the process of parallelising unpacking
    and checking (which is slow) to improve
    throughput
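
A minimal Python sketch of the unpack steps, run in parallel with a thread pool, might look like the following. It is not the spunpacker daemon: the function names, the `*.tar.gz` naming and the worker count are assumptions, the spcheck call is only marked by a comment, and the log-copy step is omitted.

```python
import tarfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def unpack_one(tar_path: Path, allruns: Path) -> bool:
    """Check one returned tarball, unpack it into allruns, then delete it."""
    if not tarfile.is_tarfile(tar_path):
        return False                      # leave bad files for inspection
    with tarfile.open(tar_path) as tar:
        root = allruns.resolve()
        for member in tar.getmembers():
            # Refuse archives whose members would land outside allruns.
            if not (root / member.name).resolve().is_relative_to(root):
                return False
        tar.extractall(allruns)
    # A real unpacker would run spcheck here, before removing the tarball.
    tar_path.unlink()
    return True


def unpack_all(return_dir: Path, allruns: Path, workers: int = 4) -> dict:
    """Unpack every *.tar.gz in return_dir in parallel; map name -> success."""
    tars = sorted(return_dir.glob("*.tar.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda t: unpack_one(t, allruns), tars)
        return {t.name: ok for t, ok in zip(tars, results)}
```

Threads are used here because unpacking is mostly I/O; if the checking step turned out to be CPU-bound, a process pool would be the more natural choice.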

8
Monitoring Process
  • Script checks:
    • submitter and unpacker processes are running
    • numbers of jobs in various states on disk
    • number of jobs at each site and their status
    • number of jobs submitted and returned in last
      day/4hr/1hr
    • number of jobs that failed in last day
    • number of jobs waiting for unpacking
  • Written to:
    http://hepunx.rl.ac.uk/BaBar/uk-spgrid/reports/uk-spgrid-report.txt
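
The day/4hr/1hr counts in the report can be computed from a list of event timestamps. The Python sketch below shows one way to do it; the function names and the line format are hypothetical and are not taken from the actual report file.

```python
from datetime import datetime, timedelta

# Reporting windows named on the slide.
WINDOWS = {"1hr": timedelta(hours=1),
           "4hr": timedelta(hours=4),
           "day": timedelta(days=1)}


def count_recent(events, now):
    """Count event timestamps falling inside each reporting window."""
    return {label: sum(1 for t in events if now - span <= t <= now)
            for label, span in WINDOWS.items()}


def report_line(kind, events, now):
    """Format one line of the text report, e.g. for submitted jobs."""
    counts = count_recent(events, now)
    return f"{kind}: " + ", ".join(f"{n} in last {w}" for w, n in counts.items())
```

The same helper works for both submitted and returned jobs, given the timestamps at which each kind of event was recorded.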

9
Status
  • Has run at Manchester, Bristol, RAL and RHUL
  • Maximum number of concurrent:
    • Sites: 3
    • Jobs: 100
  • Total production >30M events
  • Max rate >2.5M/week
  • System and/or firewall problems/changes mean
    it's only running at RHUL now (Liverpool in test)

10
Future
  • Started rolling reinstalls of the UK farms with
    SL (Fergus's talk)
  • All BaBar UK farm sites are also GridPP Tier 2
    sites
  • New LCG installation methods mean we can add LCG
    with minimal changes on our farms
  • Overlap of work done between UK-SPGrid and
    Grid.It SP is quite small

11
Future (2)
  • Rewrite submitter to:
    • Submit Grid.It style jobs and full install
      jobs (or merge these so the job chooses the
      appropriate type when it runs)
    • Probably do it in Python to make use of native
      edg-job-* commands
    • Abstract the submit function so we can use
      other Grids
  • Rewrite unpacker to:
    • Unpack Grid.It jobs
    • Do it in parallel
  • Rewrite monitoring to monitor LCG jobs
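
One way to abstract the submit function in a Python rewrite is an abstract base class with one subclass per grid. The sketch below is illustrative only: the class names, constructor arguments and the per-job JDL file layout are hypothetical, and the commands are built rather than executed. `globus-job-submit` and `edg-job-submit` are used as the assumed entry points for Globus and LCG respectively.

```python
from abc import ABC, abstractmethod


class GridBackend(ABC):
    """One submission target; subclasses only say how a job is launched."""

    @abstractmethod
    def submit_command(self, job_name: str) -> list[str]:
        """Return the command line that would submit job_name."""


class GlobusBackend(GridBackend):
    def __init__(self, gatekeeper: str, wrapper: str):
        self.gatekeeper = gatekeeper
        self.wrapper = wrapper

    def submit_command(self, job_name):
        return ["globus-job-submit", self.gatekeeper, self.wrapper, job_name]


class LCGBackend(GridBackend):
    def __init__(self, jdl_dir: str):
        self.jdl_dir = jdl_dir

    def submit_command(self, job_name):
        # Assumes a JDL file per job has already been written (hypothetical layout).
        return ["edg-job-submit", f"{self.jdl_dir}/{job_name}.jdl"]


def submit_all(jobs, backend: GridBackend):
    """Build the submit command for every job without knowing the grid type."""
    return [backend.submit_command(job) for job in jobs]
```

With this shape the submitter loop stays the same for every grid, and supporting another one means adding a single subclass.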