Title: Status of the UTA MC Production Farm and Its Software
1. Status of the UTA MC Production Farm and Its Software
- David Adams
- Karthik Gopalratnam
- Drew Meyer
- Tomasz Wlodek
- Jae Yu
2. UTA D0 Monte Carlo Farm
- UTA operates two Linux MC farms: HEP and CSE
- HEP farm: 6 566 MHz and 36 866 MHz processors, 3 file servers (250 GB), one job server, 8 mm tape drive
- CSE farm: 10 866 MHz processors, 1 file server (20 GB), 1 job server
- There is a possibility of adding a third farm (ACS, 36 866 MHz)
- A possibility of a fourth one emerged a few days ago
- Control software (job submission, load balancing, archiving, bookkeeping, job execution control, etc.) was developed entirely at UTA by a former UTA student, Drew Meyer
- Scalable: started with 7 processors, then 25, now 52
- http://www-hep.uta.edu/mcfarm/mcfarm/main.html
3. HEP Monte Carlo farm at UTA
4. MCFARM: the UTA farm control system
- MCFARM is a specialized batch system for Pythia, Isajet, D0gstar, D0sim, D0reco, and recoanalyze
- Can be adapted for ATLAS and CDF
- It is intelligent: it knows how to handle, and in most cases recover from, typical error conditions
- Hard to break: even if several nodes crash, production can continue for a few hours
- Interfaced to SAM and to a bookkeeping package (more about bookkeeping later)
- http://www-hep.uta.edu/mcfarm/mcfarm/main.html
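The error handling described above can be sketched as a retry loop around a production stage. This is a minimal illustration only; the function names and retry policy are assumptions, not the actual MCFARM code:

```python
# Illustrative sketch of MCFARM-style error recovery (hypothetical names,
# not the real MCFARM implementation).

def run_with_recovery(stage, max_retries=3):
    """Run one production stage, retrying on typical error conditions."""
    for attempt in range(1, max_retries + 1):
        try:
            return stage()
        except RuntimeError as err:
            # A real farm would classify the error (node crash, stale NFS
            # mount, ...) and decide whether a retry can succeed.
            print(f"attempt {attempt} failed: {err}")
    raise RuntimeError(f"stage failed after {max_retries} attempts")

# Toy stage that fails twice, then succeeds.
state = {"calls": 0}

def flaky_stage():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("simulated node crash")
    return "done"

result = run_with_recovery(flaky_stage)
```

This is why the farm is "hard to break": a crashed stage is retried rather than aborting the whole run.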
5. A couple of experimental groups in D0 have expressed interest in our software and plan to install it on their farms.
6. Main server (the job manager): can read and write to all other nodes; contains the executables and the job archive.

Execution node (the worker): mounts its home directory on the main server; can read and write to the file server disk.

File server: mounts /home on the main server; its disk stores min bias and generator files and is readable and writable by everybody.

Both the CSE and HEP farms share the same layout; they differ only in the number of nodes involved and in the software which exports completed jobs to their final destination. The layout is flexible enough to allow for farm expansion when new nodes become available.
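The three node roles above can be captured as plain data; the dictionary below is a hypothetical description based only on the slide, useful for seeing why adding nodes is cheap:

```python
# Hypothetical data model of the farm layout (roles follow the slide;
# nothing here is taken from the real mcfarm configuration files).
layout = {
    "main_server": {
        "role": "job manager",
        "holds": ["executables", "job archive"],
        "access": "read/write to all other nodes",
    },
    "execution_node": {
        "role": "worker",
        "mounts": ["home directory on main server"],
        "access": "read/write on file server disk",
    },
    "file_server": {
        "mounts": ["/home on main server"],
        "stores": ["min bias files", "generator files"],
        "access": "readable and writable by everybody",
    },
}

def add_workers(farm_nodes, count):
    # Expansion is just more worker entries; no other role changes.
    return farm_nodes + ["execution_node"] * count

hep_nodes = add_workers(["main_server", "file_server"], 21)
```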
7. D0 Monte Carlo production chain
Generator job (Pythia, Isajet, ...)
D0gstar (D0 GEANT)
Background events (prepared in advance)
D0sim (detector response)
D0reco (reconstruction)
SAM storage at FNAL
RecoA (ROOT tuple)
SAM storage at FNAL
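The chain above is a strictly sequential pipeline: each stage consumes the previous stage's output. A minimal sketch, with stand-in stage functions instead of the real Pythia/d0gstar/d0sim/d0reco binaries:

```python
# Toy version of the D0 MC production chain. Each function is a stand-in
# for a real executable; the tags only record which stages an event passed.

def generator(events):   return [f"gen({e})" for e in events]
def d0gstar(events):     return [f"geant({e})" for e in events]
def d0sim(events):       return [f"sim({e})" for e in events]    # detector response
def d0reco(events):      return [f"reco({e})" for e in events]   # reconstruction
def recoanalyze(events): return [f"tuple({e})" for e in events]  # ROOT tuple

CHAIN = [generator, d0gstar, d0sim, d0reco, recoanalyze]

def run_chain(events):
    for stage in CHAIN:
        events = stage(events)
    return events  # final output, shipped to SAM storage

out = run_chain(["evt1"])
```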
8. MC farm software daemons and their control
- WWW
- Root daemon
- Lock manager
- Bookkeeper
- Monitor daemon
- Distribute daemon
- Execute daemon
- Gather daemon
- Job archive
- Cache disk
- Tape
- SAM
- Remote machine
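One plausible reading of the distribute/execute/gather daemons is a hand-off pipeline over queues; the sketch below is an assumption based on the daemon names, not the real mcfarm code:

```python
# Hypothetical sketch of the distribute -> execute -> gather hand-off.
import queue

to_execute = queue.Queue()   # jobs waiting for a worker
to_gather = queue.Queue()    # finished jobs waiting to be collected
archive = []                 # stands in for the job archive / SAM export

def distribute(job):
    # Distribute daemon: assign a job to a worker's input queue.
    to_execute.put(job)

def execute():
    # Execute daemon: run the production chain on one job.
    job = to_execute.get()
    to_gather.put(job + ":done")

def gather():
    # Gather daemon: move completed output to the archive.
    archive.append(to_gather.get())

distribute("job42")
execute()
gather()
```

In the real system each daemon runs continuously on its own node, with the lock manager and monitor daemon coordinating access; here each runs once to show the flow.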
9. UTA cluster of Linux farms
- HEP farm at UTA (1 supervisor, 3 file servers, 21 workers, tape drive): 25 dual 866 MHz nodes, 8 mm tape; exports to SAM mass storage at FNAL via bb_ftp
- CSE farm at UTA (1 supervisor, 1 file server, 10 workers): 12 866 MHz nodes
- ACS farm at UTA (planned): 32 866 MHz nodes; bb_ftp (planned)
- UTA analysis server (300 GB)
- UTA www server
10. Production bookkeeping
- During a running period the farms produce a few thousand jobs
- Some jobs crash and need to be restarted
- Users must be kept up to date about the status of their MC requests (waiting? running? done?)
- Dedicated bookkeeping software is needed
11. Original bookkeeping
- Each farm server (e.g. on the HEP farm) runs a bookkeeper which keeps track of production progress
- Every few hours the bookkeeper compiles an HTML table with the production status and pushes it to the WWW server
- This works fine as long as you have a small number of farms! We need a Grid-enabled bookkeeper
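The bookkeeper's periodic report amounts to turning a list of request states into an HTML table. A minimal sketch, with made-up request names and field layout:

```python
# Sketch of the bookkeeper's HTML status table. The request names and the
# two-column layout are illustrative assumptions, not the real report format.

def status_table(jobs):
    """Render (request, status) pairs as an HTML table for the WWW server."""
    rows = "".join(
        f"<tr><td>{name}</td><td>{state}</td></tr>" for name, state in jobs
    )
    return f"<table><tr><th>request</th><th>status</th></tr>{rows}</table>"

html = status_table([
    ("mc_request_1", "running"),   # hypothetical request names
    ("mc_request_2", "done"),
])
```

The farm-local bookkeeper would write this page out every few hours; with many farms, every farm needs its own copy of this machinery, which is the scaling problem the next slides address.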
12. Before we started writing the Globus-based bookkeeper, we had to learn a little bit about Globus!

- Installed Globus 2.0-beta on the UTA MC farm
- Checked that we can communicate between the D0 and Atlas farms and that authentication works
- Executed simple "Hello World" programs between farms
- Executed shell scripts from farm to farm
- Executed simple Python scripts
- Executed Python scripts with module dependencies

Everything works, we are experts! Now we can write the Globus-enabled bookkeeper.
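A farm-to-farm "Hello World" of the kind listed above would be launched with the Globus 2 command-line tool globus-job-run. The sketch below only builds the argument list (the gatekeeper host name is a placeholder), so it runs without a Globus installation:

```python
# Building a globus-job-run invocation for a cross-farm hello-world test.
# globus-job-run takes a gatekeeper contact string, then the executable
# and its arguments. The host name below is a placeholder, not a real
# UTA gatekeeper address.

def hello_cmd(gatekeeper):
    return ["globus-job-run", gatekeeper, "/bin/echo", "hello world"]

cmd = hello_cmd("hep-farm.example.edu")
# On a machine with Globus installed, one would then run this command,
# e.g. with subprocess.run(cmd, check=True).
```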
13. The new, Globus-enabled bookkeeper
A machine on the Atlas farm runs the bookkeeper and the www server. It talks to the HEP and CSE farms inside the Globus domain via globus-job-run and GridFTP.
14. The new bookkeeper
- One dedicated bookkeeper machine can serve any number of MC production farms running the mcfarm software
- Communication with remote centers is done using Globus tools only
- No need to install the bookkeeper on every farm; this makes life simpler when many farms participate!
15. What next?
- Right now MC runs are submitted from every farm server
- I would like to start runs on the farms from the bookkeeping machine, via Globus
- In this world the bookkeeper becomes a supervisor of the farm servers, also known as the "King of the world"
16. Conclusions
- The UTA farm is very successful
- The UTA MCFARM software is solid and robust
- The first step towards Grid-enabling the farm clusters has been made