Title: Farm Management
https://bbrweb.pd.infn.it:5212/farm/
D. Andreotti (1), A. Crescente (2), A. Dorigo (2), F. Galeazzi (2),
M. Marzolla (3), M. Morandin (2), F. Safai Tehrani (4), R. Stroili (2),
G. Tiozzo (2), G. Vedovato (2)
(1) I.N.F.N. of Ferrara, Italy
(2) Univ. and I.N.F.N. of Padova, Italy
(3) Univ. Ca' Foscari, Venezia and I.N.F.N. of Padova, Italy
(4) I.N.F.N. of Roma, Italy
and the BaBar Computing Group
A new dedicated facility for (re)processing of
BaBar raw data, supported by INFN, was installed
in Padova (Italy) in 2002 as part of the
distributed Tier A system at the disposal of the
experiment. The facility consists of four
independent farms, each capable of processing 2
million events (corresponding to 160 pb^-1 of raw
data) per day. Reconstructed data are stored in
an Objectivity federation, checked and finally
transferred to SLAC. The facility exploits
commodity CPU and disk storage while preserving
good reliability, high performance and
well-organized system management. The center,
which now counts on approx. 200 dual-CPU PIII
machines and 30 TB of disk space, has been in
operation since October 2002, and experience so
far has been very satisfactory.
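As a rough cross-check of the throughput figures quoted on this poster (2,000,000 events, 160 pb^-1 and 160 GB of raw input per farm per day, 330 GB of Objectivity output per farm per week), the arithmetic can be sketched as below. The use of decimal gigabytes and the function names are assumptions made for illustration only.

```python
# Figures quoted on the poster (per farm unless noted otherwise).
N_FARMS = 4
EVENTS_PER_DAY = 2_000_000       # events/day/farm
LUMI_PB_PER_DAY = 160            # pb^-1/day/farm
RAW_GB_PER_DAY = 160             # GB/day/farm of raw input
OBJY_GB_PER_WEEK = 330           # GB/week/farm of Objectivity output

GB = 1e9  # assumption: decimal gigabytes (the poster does not say)


def facility_totals():
    """Scale the per-farm daily figures to the whole four-farm facility."""
    return {
        "events_per_day": N_FARMS * EVENTS_PER_DAY,
        "lumi_pb_per_day": N_FARMS * LUMI_PB_PER_DAY,
        "raw_gb_per_day": N_FARMS * RAW_GB_PER_DAY,
    }


def per_event_sizes_kb():
    """Average raw and output size per event, in kB."""
    raw_kb = RAW_GB_PER_DAY * GB / EVENTS_PER_DAY / 1e3
    out_kb = OBJY_GB_PER_WEEK / 7 * GB / EVENTS_PER_DAY / 1e3
    return raw_kb, out_kb


def events_per_inverse_pb():
    """Events processed per pb^-1 of integrated luminosity."""
    return EVENTS_PER_DAY / LUMI_PB_PER_DAY


if __name__ == "__main__":
    print(facility_totals())
    print(per_event_sizes_kb())
    print(events_per_inverse_pb())
```

Under these assumptions the raw input works out to about 80 kB per event, and the weekly output figure to roughly 24 kB per event.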
- First BaBar data processing farm fully based on
  - Linux
  - cheap hardware
Farm Performance
System is continuously stressed!
- Existing hardware
  - All machines: 2 x 1.26 GHz CPU, 1 GB RAM
  - 140 clients, 40 GB local IDE disk (software RAID)
  - 20 servers, same configuration as clients, Gigabit Ethernet
  - 30 storage servers, 1.28 TB IDE disk with 3ware RAID controller, Gigabit Ethernet
  - 5 PR servers, up to 0.35 TB SCSI disk (10k RPM) with ServeRAID SCSI controller, Gigabit Ethernet
  - one tape library for 700 LTO tapes (70 TB uncompressed)
- New acquisitions
  - new tape library for 700 LTO2 tapes (140 TB uncompressed)
  - 103 clients, 2 x Xeon 2.4 GHz, 2 GB RAM
  - 14 storage servers, 2 x Xeon 2.4 GHz, 2 GB RAM, 1.4 TB IDE disk
  - 10 PR servers, 2 x Xeon 2.4 GHz, 2 GB RAM
Extensive work done to optimize resources and to
reduce bottlenecks (e.g., minimizing usage of NFS)
[Figure: two plots versus time_of_day]
Farm Monitoring
- Machines are organized into
  - 4 identical farms, 60 CPUs each
  - 160 pb^-1/day/farm
  - 2,000,000 events/day/farm (output)
  - 160 GB/day/farm input (raw) data
  - 330 GB/week/farm output (Objy) data
- Based on
  - SNMP, to be compatible with the widest variety of hardware, using asynchronous non-blocking SNMPv2 bulk Get requests
  - RRDtool library, for graphs
  - PerfMC (presented at CHEP03), a high-performance monitoring program developed for this farm
    - scalable
    - efficient
    - requires low resources
    - easily configurable using XML
    - operates in background (no GUI)
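The idea of asynchronous, non-blocking polling mentioned in the list above can be illustrated with a small sketch. This is not PerfMC code and it does not speak SNMP: `fetch_counters` is a hypothetical stand-in that sleeps instead of sending a bulk Get request, and the host names and returned fields are invented for the example. It only shows how many requests can be in flight at once without one slow host blocking the others.

```python
import asyncio
import random


async def fetch_counters(host: str) -> dict:
    """Hypothetical stand-in for one SNMP bulk request (sleeps, no network I/O)."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return {
        "host": host,
        "cpu_load": random.random(),
        "if_in_octets": random.randrange(10**9),
    }


async def poll_all(hosts, timeout=1.0):
    """Start one request per host concurrently and wait for all of them.

    Results come back in the same order as `hosts`; a host that does not
    answer within `timeout` seconds yields an error record instead.
    """
    async def one(host):
        try:
            return await asyncio.wait_for(fetch_counters(host), timeout)
        except asyncio.TimeoutError:
            return {"host": host, "error": "timeout"}

    return await asyncio.gather(*(one(h) for h in hosts))


if __name__ == "__main__":
    hosts = [f"node{i:03d}" for i in range(1, 11)]  # hypothetical names
    for sample in asyncio.run(poll_all(hosts)):
        print(sample)
```

Because every request is awaited concurrently, the total polling time is close to that of the slowest host rather than the sum over all hosts.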
Farm Management
- Using IBM's xCAT (eXtreme Cluster Administration Toolkit), allowing
  - remote power control (*)
  - remote BIOS console (*)
  - remote OS console
  - remote software reset
  - parallel remote shell
  - network installation
  - ...
  - (*) on IBM machines only
- Monitored quantities
  - CPU
  - Disk I/O
  - Network I/O
  - Temperatures
- Total disk needed for whole farm: 5 GB
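The "parallel remote shell" feature listed among the management tools can be pictured with the following minimal sketch. It is not xCAT's implementation; it simply fans one command out to many hosts through the standard `ssh` client using a thread pool. The `runner` parameter is an assumption added so the function can be tried without a cluster, and any host names passed in are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_ssh(host: str, command: str, timeout: float = 30.0):
    """Run `command` on `host` over ssh; return (exit_code, combined output)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def parallel_shell(hosts, command, runner=run_ssh, max_workers=32):
    """Run `command` on every host concurrently; return {host: (code, output)}."""
    hosts = list(hosts)

    def safe(host):
        try:
            return runner(host, command)
        except Exception as exc:  # unreachable host, timeout, missing ssh binary
            return -1, str(exc)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each host with its result.
        return dict(zip(hosts, pool.map(safe, hosts)))
```

A call such as `parallel_shell(["node001", "node002"], "uptime")` would return one entry per host, with failures reported as exit code -1 rather than raising.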
Screenshot of parallel installation of >100 clients
MySQL is widely used for farm monitoring, management
and production: 12 databases, 3.5 GB total.
First boot: machines must support PXE.
SysAlarm: home-made Perl tool that parses system
logfiles and saves errors in a MySQL database.
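The parse-and-store pattern described for SysAlarm can be sketched as follows. This is not the SysAlarm code, which is written in Perl and is not shown on this poster: the sketch is in Python, uses the standard-library `sqlite3` module as a stand-in for MySQL, and assumes classic BSD-style syslog lines. The table layout and the error keywords are assumptions chosen for illustration.

```python
import re
import sqlite3

# Classic syslog layout: "Mon DD HH:MM:SS host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<tag>[^:\[\s]+)(?:\[\d+\])?: (?P<msg>.*)$"
)
# Hypothetical keyword list marking a line as an error.
ERROR_RE = re.compile(r"error|fail|panic|I/O", re.IGNORECASE)


def open_db(path=":memory:"):
    """Open the database (sqlite3 here, as a stand-in for MySQL)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS alarms (ts TEXT, host TEXT, tag TEXT, msg TEXT)"
    )
    return db


def store_errors(lines, db):
    """Insert every parsable line whose message looks like an error; return the count."""
    rows = []
    for line in lines:
        m = SYSLOG_RE.match(line.rstrip("\n"))
        if m and ERROR_RE.search(m["msg"]):
            rows.append((m["ts"], m["host"], m["tag"], m["msg"]))
    db.executemany("INSERT INTO alarms VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)
```

Keeping the errors in a database rather than in flat files makes it straightforward to query them per host or per time window.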
Software installation: the Kickstart installation
method is preferred because it is easier to
configure according to machine type. Cloning (hard
disk copy) or imaging (partition copy) methods are
also possible. Second-level repositories can be used.
- Problems
  - vendor driver availability and support for different Linux releases
  - had to recompile for large file support
  - NFS not optimal on Linux under heavy load
Network configuration: all machines are on a private
network. A few front-end machines have two
interfaces. Public machines resolve private names
using a NIS server.
Log server: used to centralize system logs on
one machine.