1
Status of UTA MC production farm and Its Software
  • David Adams
  • Karthik Gopalratnam
  • Drew Meyer
  • Tomasz Wlodek
  • Jae Yu

2
UTA D0 Monte Carlo Farm
  • UTA operates two Linux MC farms: HEP and CSE
  • HEP farm: 6 x 566 MHz and 36 x 866 MHz
    processors, 3 file servers (250 GB), one job
    server, one 8mm tape drive
  • CSE farm: 10 x 866 MHz processors, 1 file
    server (20 GB), 1 job server
  • There is a possibility of adding a third
    farm (ACS, 36 x 866 MHz)
  • A possibility of a fourth one emerged a few
    days ago
  • Control software (job submission, load
    balancing, archiving, bookkeeping, job execution
    control etc.) was developed entirely at UTA by a
    former UTA student, Drew Meyer
  • Scalable: started with 7, then 25, now 52
    processors
  • http://www-hep.uta.edu/mcfarm/mcfarm/main.html

3
HEP Monte Carlo farm at UTA
4
MCFARM: the UTA farm control system
  • MCFARM is a specialized batch system for
    Pythia, Isajet, D0gstar, D0sim, D0reco and
    recoanalyze
  • Can be adapted for ATLAS, CDF
  • It is intelligent: it knows how to handle, and
    in most cases recover from, typical error
    conditions
  • Hard to break: even if several nodes crash, the
    production can continue for a few hours
  • Interfaced to SAM and a bookkeeping package
    (more about bookkeeping later)
  • http://www-hep.uta.edu/mcfarm/mcfarm/main.html
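The "handle and recover from typical errors" behavior could be sketched as a simple retry loop (a toy model with hypothetical names; the real MCFARM daemons are far more elaborate):

```python
# Toy sketch of MCFARM-style error recovery: a recoverable failure
# (e.g. a crashed worker node) is retried a few times before the
# job is flagged for operator attention. Names are illustrative.
MAX_RETRIES = 3

def run_stage(job, stage):
    """Pretend to run one production stage; return True on success.

    In MCFARM this would launch D0gstar/D0sim/D0reco on a worker node.
    Here we just replay a scripted list of outcomes for testing.
    """
    return job["outcomes"].pop(0) if job["outcomes"] else True

def execute_with_recovery(job, stages):
    """Run each stage in order, retrying recoverable failures."""
    for stage in stages:
        for attempt in range(1, MAX_RETRIES + 1):
            if run_stage(job, stage):
                break
            # A crashed node is treated as recoverable: retry the stage.
            print(f"{stage} failed (attempt {attempt}), retrying")
        else:
            return "failed"   # retries exhausted -> needs an operator
    return "done"
```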

5
A couple of experimental groups in D0 have
expressed interest in our software and plan to
install it on their farms
  • LSU
  • Boston
  • Dubna
  • ???

6
Main server (job manager): can read and write to
all other nodes; contains the executables and the
job archive.
Execution node (worker): mounts its home
directory from the main server; can read and
write to the file server disk.
File server: mounts /home from the main server;
its disk stores min-bias and generator files and
is readable and writable by everybody.
Both the CSE and HEP farms share the same layout;
they differ only in the number of nodes involved
and in the software which exports completed jobs
to their final destination. The layout is
flexible enough to allow for farm expansion when
new nodes become available.
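The cross-mounts described above could be expressed as NFS entries along these lines (hostnames are illustrative, not the actual UTA machine names):

```
# /etc/fstab on an execution node (worker):
mainserver:/home   /home   nfs  rw,hard,intr  0 0
fileserver:/data   /data   nfs  rw,hard,intr  0 0

# /etc/fstab on the file server:
mainserver:/home   /home   nfs  rw,hard,intr  0 0
```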
7
D0 Monte Carlo production chain
Generator job (Pythia, Isajet, …)
D0gstar (D0 GEANT)
Background events (prepared in advance)
D0sim (Detector response)
D0reco (reconstruction)
SAM storage in FNAL
RecoA (root tuple)
SAM storage in FNAL
8
MC farm software daemons and their control
WWW
Root daemon
Lock manager
Bookkeeper
Monitor daemon
Distribute daemon
Execute daemon
Gather daemon
Job archive
Cache disk
Tape
SAM
Remote machine
9
UTA cluster of Linux farms
SAM mass storage in FNAL
32 x 866 MHz
bb_ftp (planned)
ACS farm at UTA (planned)
UTA www server
bb_ftp
UTA analysis server (300 GB)
8 mm tape
12 x 866 MHz


25 dual 866 MHz
CSE farm at UTA (1 supervisor, 1 file server, 10
workers)
HEP farm at UTA (1 supervisor, 3 file servers, 21
workers, tape drive)
10
Production bookkeeping
  • During a running period the farms produce a few
    thousand jobs
  • Some jobs crash and need to be restarted
  • Users must be kept up to date about the status
    of their MC requests (waiting? running? done?)
  • Dedicated bookkeeping software is needed
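A bookkeeper along these lines could be sketched as follows (a toy model; the real package tracks D0 request IDs and far more metadata than this):

```python
from collections import Counter

class Bookkeeper:
    """Toy job-state tracker: waiting -> running -> done,
    with crashed jobs requeued for a restart, as on the slide."""

    def __init__(self):
        self.jobs = {}                      # job id -> state

    def submit(self, job_id):
        self.jobs[job_id] = "waiting"

    def update(self, job_id, state):
        if state == "crashed":
            # Crashed jobs go back to the queue rather than being lost.
            self.jobs[job_id] = "waiting"
        else:
            self.jobs[job_id] = state

    def summary(self):
        """Counts per state, for the user-facing status report."""
        return dict(Counter(self.jobs.values()))
```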

11
Original bookkeeping
WWW server
Each farm server runs a bookkeeper which keeps
track of production progress.
Every few hours the bookkeeper compiles an HTML
table with the production status and publishes it
to the WWW server.
HEP farm
This works fine as long as you have a small number
of farms! We need a GRID-enabled bookkeeper.
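Compiling the status table might look like this minimal sketch (illustrative only; the slides do not show the actual report format):

```python
def status_table(farms):
    """Render per-farm job counts as a small HTML table.

    `farms` maps a farm name to {state: count}; the result is the
    kind of page the bookkeeper would publish every few hours.
    """
    states = ["waiting", "running", "done"]
    rows = ["<tr><th>farm</th>"
            + "".join(f"<th>{s}</th>" for s in states) + "</tr>"]
    for name, counts in sorted(farms.items()):
        cells = "".join(f"<td>{counts.get(s, 0)}</td>" for s in states)
        rows.append(f"<tr><td>{name}</td>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"
```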
12
Before we started writing a Globus-based
bookkeeper, we had to learn a little bit about
Globus!
  • Installed Globus 2.0-beta on the UTA MC farm
  • Checked that we can communicate between the D0
    and Atlas farms and that authentication works
  • Executed simple Hello World programs between
    farms
  • Executed shell scripts from farm to farm
  • Executed simple python scripts
  • Executed python scripts with module dependencies

Everything works, we are experts! Now we can
write the Globus-enabled bookkeeper.
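The tests above correspond to Globus Toolkit 2 commands along these lines (hostnames are illustrative, not the real UTA gatekeepers):

```
# Obtain a grid proxy from your certificate
grid-proxy-init

# Run a simple command on a remote farm's gatekeeper
globus-job-run hep-farm.uta.edu /bin/echo "Hello World"

# Stage a file between farms with GridFTP
globus-url-copy file:///tmp/report.html \
    gsiftp://cse-farm.uta.edu/tmp/report.html
```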
13
The new, Globus-enabled, bookkeeper
A machine on the Atlas farm runs the bookkeeper
and the WWW server
Globus-job-run / Grid-ftp
Globus domain
HEP farm
CSE farm
14
The new bookkeeper
  • One dedicated bookkeeper machine can serve any
    number of MC production farms running the mcfarm
    software
  • The communication with remote centers is done
    using Globus tools only
  • No need to install the bookkeeper on every farm;
    this makes life simpler if many farms participate!

15
What next?
  • Right now MC runs are submitted from every farm
    server
  • I would like to start runs on the farms from the
    bookkeeping machine, via Globus
  • In this world the bookkeeper becomes a
    "supervisor of farm servers", also known as...
King of the world
16
Conclusions
  • The UTA farm is very successful
  • UTA MCFARM software is solid and robust
  • First step towards Grid-enabling the farm
    clusters has been made