Title: The D0 NIKHEF Farm
1. The D0 NIKHEF Farm
Kors Bos
Fermilab, May 23 2001
2. Layout of this talk
- D0 Monte Carlo needs
- The NIKHEF D0 farm
- The data we produce
- The SAM data base
- A Grid intermezzo
- The network
- The next steps
3. D0 Monte Carlo needs
- The D0 trigger rate is 100 Hz, 10^7 seconds/yr → 10^9 events/yr
- We want 10% of that to be simulated → 10^8 events/yr
- Simulating 1 QCD event takes 3 minutes (size 2 MByte)
  - on an 800 MHz PIII
- So 1 CPU can produce 10^5 events/yr (200 GByte)
  - assuming a 60% overall efficiency
- So our 100-CPU farm can produce 10^7 events/yr (20 TByte)
- And this is only 10% of the goal we set ourselves
  - not counting the Nijmegen D0 farm yet
- So we need another 900 CPUs
  - UTA (50), Lyon (200), Prague (10), BU (64), Nijmegen (50), Lancaster (200), Rio (25), ...
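The arithmetic behind these bullets can be checked in a few lines. The sketch below (Python) assumes, beyond what the slide states, that a farm CPU runs for the full calendar year while the 10^7 s/yr refers to detector live time; with that assumption the quoted 10^5 events/yr per CPU comes out.

```python
# Back-of-the-envelope check of the Monte Carlo numbers above.
# Assumption (not on the slide): a farm CPU runs the whole calendar year,
# while the 10^7 s/yr is detector live time.

TRIGGER_RATE_HZ   = 100
LIVE_SECONDS      = 1e7                        # detector live time per year
events_per_year   = TRIGGER_RATE_HZ * LIVE_SECONDS   # 10^9 events/yr
mc_goal           = 0.10 * events_per_year            # simulate 10% -> 10^8 events/yr

WALL_SECONDS      = 365 * 24 * 3600            # one CPU, one calendar year
MINUTES_PER_EVENT = 3                          # one QCD event on an 800 MHz PIII
EFFICIENCY        = 0.60                       # assumed overall efficiency
EVENT_SIZE_MB     = 2

events_per_cpu = EFFICIENCY * WALL_SECONDS / (MINUTES_PER_EVENT * 60)   # ~1e5
farm_events    = 100 * events_per_cpu                                    # ~1e7

print(f"per CPU : {events_per_cpu:.1e} events/yr, "
      f"{events_per_cpu * EVENT_SIZE_MB / 1e3:.0f} GB")
print(f"farm    : {farm_events:.1e} events/yr, "
      f"{farm_events * EVENT_SIZE_MB / 1e6:.0f} TB")
print(f"CPUs needed for the full goal: {mc_goal / events_per_cpu:.0f}")
```

With these inputs one CPU gives about 1.1e5 events (roughly 200 GB), the 100-CPU farm about 1e7 events (roughly 20 TB), and the full 10^8-event goal needs on the order of 1000 CPUs, hence the extra 900.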
4. How it looks
5. The NIKHEF D0 Farm
6. 50 farm nodes (100 CPUs): Dell Precision Workstation 220
- Dual Pentium III processors, 800 MHz / 256 kB cache each
- 512 MB PC800 ECC RDRAM
- 40 GB (7200 rpm) ATA-66 disk drive
- no screen
- no keyboard
- no mouse
- Wake-on-LAN functionality
7. The File Server and the Farm Server
The File Server: Elonex EIDE server
- Dual Pentium III 700 MHz
- 512 MB SDRAM
- 20 GByte EIDE disk
- 1.2 TByte in 75 GB EIDE disks
- 2 x Gigabit Netgear GA620 network cards
The Farm Server: Dell Precision 620 workstation
- Dual Pentium III Xeon 1 GHz
- 512 MB RDRAM
- 72.8 GByte SCSI disk
- Will also serve as D0 software server for the NIKHEF/D0 people
8. Software on the farm
- Boot via the network
- Standard Red Hat Linux 6.2
- ups/upd on the server
- D0 software on the server
- FBSNG on the server, daemon on the nodes
- SAM on the file server
- Used to test new machines
9. What we run on the farm
- Particle generator: Pythia or Isajet
- GEANT detector simulation: d0gstar
- Digitization, adding min. bias: psim
- Check the data: mc_analyze
- Reconstruction: preco
- Analysis: reco_analyze
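Conceptually the farm just chains these programs, each stage reading the previous stage's output. The sketch below only illustrates that order; the program names come from this slide, but the arguments and file names are invented for illustration and are not the real mc_runjob interface.

```python
# Illustrative sketch of the MC chain on one node (order of the stages only).
# Program names are from the slide; arguments and file names are hypothetical.
import subprocess

def run(cmd):
    """Run one stage; stop the chain if it fails."""
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

job = "qcdJob308161443"
run(["isajet",       f"isajet_{job}.params"])   # event generation -> gen_ file
run(["d0gstar",      f"gen_{job}"])             # GEANT detector simulation -> d0g_ file (hits)
run(["psim",         f"d0g_{job}"])             # digitization + min.bias overlay -> sim_ file (digis)
run(["mc_analyze",   f"sim_{job}"])             # check the produced data
run(["preco",        f"sim_{job}"])             # reconstruction
run(["reco_analyze", f"reco_{job}"])            # analysis of the reconstructed events
```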
10. Example: min. bias
- Did a run with 1000 events on all CPUs
- Took 2 min/event
- So about 1.5 days for the whole run
- Output file size: 575 MByte
- We left those files on the nodes
  - a reason to have enough local disk space
- We intend to repeat this from time to time
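As a quick cross-check of the timing and the average event size, using only the numbers on this slide:

```python
# 1000 min.bias events at 2 min/event on each CPU
events, minutes_per_event = 1000, 2
print(f"{events * minutes_per_event / 60 / 24:.1f} days")   # ~1.4 days, i.e. the ~1.5 days quoted
# assuming the 575 MByte output file holds all 1000 events of the run
print(f"{575 / events:.2f} MB per min.bias event")
```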
11. Output data
-rw-r--r-- 1 a03 computer        298 Nov  5 19:25 RunJob_farm_qcdJob308161443.params
-rw-r--r-- 1 a03 computer 1583995325 Nov  5 10:35 d0g_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer        791 Nov  5 19:25 d0gstar_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        809 Nov  5 19:25 d0sim_qcdJob308161443.params
-rw-r--r-- 1 a03 computer   47505408 Nov  3 16:15 gen_mcp03_pmc03.00.01_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-none_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       1003 Nov  5 19:25 import_d0g_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        912 Nov  5 19:25 import_gen_qcdJob308161443.py
-rw-r--r-- 1 a03 computer       1054 Nov  5 19:26 import_sim_qcdJob308161443.py
-rw-r--r-- 1 a03 computer        752 Nov  5 19:25 isajet_qcdJob308161443.params
-rw-r--r-- 1 a03 computer        636 Nov  5 19:25 samglobal_qcdJob308161443.params
-rw-r--r-- 1 a03 computer  777098777 Nov  5 19:24 sim_mcp03_psim01.02.00_nikhef.d0farm_isajet_qcd-incl-PtGt2.0_mb-poisson-2.5_p1.1_308161443_2000
-rw-r--r-- 1 a03 computer       2132 Nov  5 19:26 summary.conf
12. Output data translated
- 0.047 GByte gen_
- 1.5 GByte d0g_
- 0.7 GByte sim_
- import_gen_.py
- import_d0g_.py
- import_sim_.py
- isajet_.params, RunJob_farm_.params, d0gstar_.params, d0sim_.params, samglobal_.params, summary.conf
- 12 files for generator, d0gstar and psim, but of course only 3 big ones. Total: 2 GByte
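The total follows directly from the three big files in the listing on the previous slide:

```python
# File sizes (bytes) of the three big output files from the listing above
sizes = {
    "gen_": 47_505_408,      # generator output
    "d0g_": 1_583_995_325,   # GEANT output (hits)
    "sim_": 777_098_777,     # digitized output (digis)
}
for name, size in sizes.items():
    print(f"{name:5s} {size / 1e9:5.2f} GB")
print(f"total {sum(sizes.values()) / 1e9:5.2f} GB")   # just over 2 GB, dominated by the d0g_ file
```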
13. Data management
[Diagram: the data flow and the SAM import scripts. Each output type is declared by its own script: import_gen.py for the generator output (with its parameters), import_d0g.py for the GEANT data (hits), import_sim.py for the sim data (digis), and import_reco.py for the reconstruction output.]
14. Automation
- mc_runjob (modified)
  - prepares MC jobs (gen → sim → reco → anal)
  - e.g. 300 events per job/CPU
  - repeated e.g. 500 times
  - submits them into the batch system (FBS)
- The jobs run on the nodes
- Output is copied to the file server after completion
  - a separate batch job on the file server
- The files are submitted into SAM
  - SAM does the file transfers to Fermilab and SARA
- Runs for a week
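A minimal sketch of what this loop amounts to, assuming a simple wrapper around the two real tools named above (mc_runjob and FBS); the command names, options and job-file layout shown are hypothetical:

```python
# Illustrative sketch of the automation described above.
# mc_runjob and FBS are the real tools; the exact commands, options and
# file names below are invented for illustration.
import subprocess

EVENTS_PER_JOB = 300    # e.g. 300 events per job/CPU
N_JOBS         = 500    # repeated e.g. 500 times

for i in range(N_JOBS):
    tag = f"qcdJob{i:06d}"
    # 1. (modified) mc_runjob prepares a gen -> sim -> reco -> anal job
    subprocess.run(["mc_runjob", "--events", str(EVENTS_PER_JOB), "--tag", tag],
                   check=True)
    # 2. submit it to FBS; the job runs on a node, a follow-up FBS job on the
    #    file server copies the output there and declares it to SAM, and SAM
    #    then ships the files to Fermilab and SARA
    subprocess.run(["fbs", "submit", f"{tag}.jobdef"], check=True)
```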
15. [Diagram: the production data flow. An mcc request arrives at the farm server, which submits an FBS job in three steps: 1 mcc, 2 rcp, 3 sam. fbs(mcc) runs the Monte Carlo on the 50 nodes (40 GB local disk each), fbs(rcp) copies the mcc output to the 1.2 TB file server, and fbs(sam) stores the files, sending the metadata to the SAM DB and the data to the datastores at FNAL and SARA. The farm server carries the control flow; mcc input and output move between the nodes, the file server and the datastore.]
16. This is a grid!
17. The Grid
- Not just D0, but for the LHC experiments
- Not just SAM, but for any database
- Not just farms, but any CPU resource
- Not just SARA, but any mass storage
- Not just FBS, but any batch system
- Not just HEP, but any science: EO, ...
18. European DataGrid Project
- A 3-year project for 10 M
- Manpower to develop grid tools
- CERN, IN2P3, INFN, PPARC, ESA, FOM
  - NIKHEF, SARA, KNMI
- Farm management
- Mass storage management
- Network management
- Testbed
- HEP and EO applications
19. LHC Regional Centres
[Diagram: CERN as Tier 0; Tier-1 centres at KEK, INFN, BNL, IN2P3, NIKHEF/SARA, RAL and FNAL; Tier-2 and department-level sites connected via SURFnet (Utrecht, Vrije Univ., Nijmegen, Amsterdam, Brussel, Leuven) for Atlas, LHCb and possibly Alice.]
20. DataGrid testbed sites
[Map of the DataGrid testbed sites, including NIKHEF.]
21. The NL-Datagrid Project
22. NL-Datagrid Goals
- National test bed for middleware development
  - WP4, WP5, WP6, WP7, WP8, WP9
- To become an LHC Tier-1 centre
  - ATLAS, LHCb, Alice
- To use it for the existing program
  - D0, Antares
- To use it for other sciences
  - EO, Astronomy, Biology
- For tests with other (transatlantic) grids
  - D0
  - PPDG, GriPhyN
23. NL-Datagrid Testbed Sites
- Univ. Amsterdam (Atlas)
- Vrije Univ. (LHCb)
- Nijmegen Univ. (Atlas)
- Univ. Utrecht (Alice)
- CERN, RAL, FNAL, ESA
24. Dutch Grid topology
[Diagram: the Dutch Grid topology, connecting Utrecht Univ. (Alice), Nijmegen Univ. (D0, Atlas) and an LHCb site to a central hub serving D0, Atlas, LHCb and Alice.]
25. End of the Grid intermezzo
Back to the NIKHEF D0 farm and Fermilab: the network.
26. Network bandwidth
- NIKHEF → SURFnet: 1 Gbit/s
- SURFnet Amsterdam → Chicago: 622 Mbit/s
- ESnet Chicago → Fermilab: 155 Mbit/s ATM
- But ftp gives us 4 Mbit/s
  - bbftp gives us 25 Mbit/s
  - bbftp processes in parallel: 45 Mbit/s
- For 2002:
  - NIKHEF → SURFnet: 2.5 Gbit/s
  - SURFnet Amsterdam → Chicago: 622 Mbit/s → 2.5 Gbit/s optical
  - Chicago → Fermilab: ?, but more ...
27. ftp
- ftp gives you 4 Mbit/s to Fermilab
- bbftp: increased buffer, streams
- gsiftp: security layer, increased buffer, ...
- grid_ftp: increased buffer, streams, sockets, fail-over protection, security
- bbftp → 20 Mbit/s
- grid_ftp → 25 Mbit/s
- Multiple ftp transfers in parallel → a factor 2 seen
- Should get to > 100 Mbit/s?
  - or 1 GByte/minute
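The factor 2 from parallel transfers is simply a matter of keeping several streams in flight at once. A minimal sketch of that idea follows; the transfer command and destination are placeholders, not the actual bbftp or grid_ftp invocation.

```python
# Minimal sketch: run several file transfers in parallel to fill the pipe.
# 'some_ftp_client' and the destination are placeholders; the farm used
# bbftp / grid_ftp, whose real options are not shown here.
import subprocess
from concurrent.futures import ThreadPoolExecutor

files = [f"sim_qcdJob{i:06d}" for i in range(8)]    # hypothetical file names

def transfer(path: str) -> int:
    cmd = ["some_ftp_client", path, "remote:/incoming/"]
    return subprocess.run(cmd).returncode

# each single stream tops out around 20-25 Mbit/s, so a few concurrent
# streams get noticeably closer to the available bandwidth
with ThreadPoolExecutor(max_workers=4) as pool:
    failures = sum(rc != 0 for rc in pool.map(transfer, files))
print("failed transfers:", failures)
```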
28. SURFnet5 access capacity
29. TA access capacity
[Diagram: transatlantic access capacity, showing New York (Abilene), Chicago (STAR-LIGHT, STAR-TAP, MREN), Geneva and ESnet, with 2.5 Gb and 622 Mb links.]
30. Network load last week
- Needed for 100 MC CPUs: 10 Mbit/s (200 GB/day)
- Available to Chicago: 622 Mbit/s
- Available to FNAL: 155 Mbit/s
- Needed next year (double capacity): 25 Mbit/s
- Available to Chicago: 2.5 Gbit/s, a factor 100 more!!
- Available to FNAL: ??
31. New nodes for D0
- In a 2U 19-inch mounting
- Dual 1 GHz PIII
- 1 GByte RAM
- 40 GByte disk
- 100 Mbit Ethernet
- Cost: 2k
  - the Dell machines were 4k (tax incl.)
  - a FACTOR 2 cheaper!!
  - assembly time: 1 per hour
- 1 switch: 2.5k (24 ports)
- 1 rack: 2k (46U high)
- Requested for 2001: 60k
  - 22 dual-CPU nodes
  - 1 switch
  - 1 19-inch rack
32. [Image-only slide, no transcript]
33. The End
Kors Bos
Fermilab, May 23 2001