Title: Les Robertson - cernit0899
1 Offline Computing Farms for LHC
- Summary of the requirements of the LHC experiments
- Current ideas about components
- Strawman LHC computing farm
- Space, power, cooling requirements
- Questions to ST
2 Units
- Data storage
  - PetaByte (PB) = 10^15 Bytes
  - TeraByte (TB) = 10^12 Bytes
- 1 PetaByte =
  - 20,000 Redwood tapes (>3 StorageTek silos)
  - 30,000 Cheetah 36 disks (largest hard disk used today)
  - 100,000 dual-sided DVD-RAM disks
  - 1,500,000 sets of the Encyclopaedia Britannica
- Processors - SPECint95 (SI95)
  - 1 SI95 ≈ 10 CERN-units ≈ 40 MIPS
  - 400 MHz Pentium II ≈ 8 SI95 (CERN benchmark)
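As an illustration only (not from the original slides), a small Python sketch that derives the per-medium capacity implied by these equivalences, using decimal units; the media counts and SI95 conversions are the ones quoted above, everything else is simple arithmetic.

```python
# Implied capacity per medium for 1 PB = 1e15 bytes,
# using the counts quoted on the slide above.
PB = 1e15

media_counts = {
    "Redwood tape": 20_000,
    "Cheetah 36 disk": 30_000,
    "dual-sided DVD-RAM": 100_000,
}

for name, count in media_counts.items():
    per_unit_gb = PB / count / 1e9
    print(f"{name:20s}: about {per_unit_gb:.0f} GB each")

# SPECint95 conversions quoted on the slide:
# 1 SI95 ~ 10 CERN-units ~ 40 MIPS, a 400 MHz Pentium II ~ 8 SI95
print("400 MHz Pentium II ~", 8 * 40, "MIPS")
```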
3 Raw Data requirements - recording via the network to B.513
- CMS, ATLAS
  - 100 MB/sec
  - 1 PetaByte per year during the proton run
- LHCb
  - 50 MB/sec
  - 500 TeraBytes per year during the proton run
- ALICE
  - 1 GigaByte/sec
  - 1 PetaByte per year during the ion run
Current data recording rates: NA48 - 25 MB/sec, COMPASS (next year) - 35 MB/sec (roughly 30% of the CMS rate, 3% of the ALICE rate)
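For illustration, a short sketch (assuming decimal units) of the effective recording time implied by the quoted rates and yearly volumes; the run lengths are derived here, not stated on the slide.

```python
# Effective recording time implied by the quoted rates and yearly volumes
# (decimal units; the run lengths are derived, not quoted).
experiments = {
    # name: (recording rate in bytes/s, volume per year in bytes)
    "CMS / ATLAS (proton run)": (100e6, 1e15),
    "LHCb (proton run)":        (50e6, 500e12),
    "ALICE (ion run)":          (1e9, 1e15),
}

for name, (rate, volume) in experiments.items():
    seconds = volume / rate
    print(f"{name:26s}: {seconds:.1e} s, about {seconds / 86400:.0f} days of recording")
```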
4 Offline Capacity Estimates (i.e. capacity in B.513)
- 1998 estimates
- Estimate uses figures from CMS in mid-98; ATLAS would be similar, ALICE and LHCb about half the size
5 Evolution of Computing Capacity - SPECint95
[Chart: installed capacity in K SPECint95 units (0 to 1,000) per year, 1997-2005, broken down into COMPASS, LHC and others. Annotation: 5K SI95 ≈ 1,100 processors.]
6 Long Term Tape Storage Estimates
[Chart: tape storage in TeraBytes (0 to 14,000) per year, 1995-2006, broken down into LHC, COMPASS and current experiments.]
7 Basic Principles
- HEP computing has the property of event independence, so we can process any number of events in parallel
- CERN distributed architecture - SHIFT 99
  - simplest components (hyper-sensitive to cost, aversion to complication)
  - throughput (before performance)
  - resilience (mostly up all of the time)
  - computing fabric for flexibility, scalability
Mass Computing rather than Supercomputing
8 Components (i)
- off-the-shelf, mass market components whenever possible
- Processors
  - low-end PCs (simple boxes intended for the home or small office)
  - assembled into clusters and sub-farms
  - according to practical considerations like
    - throughput of the first-level LAN switch
    - rack capacity
    - power, cooling, ...
  - each cluster comes with a suitable chunk of I/O capacity
  - each sub-farm fits in a rack
9 Processor cluster
- basic box: four 100 SI95 processors, standard network connection (2 Gbps); 15% of systems configured as I/O servers (disk server, disk-tape mover, Objy AMS, ...) with an additional connection to the storage network
- cluster: 9 basic boxes with a network switch (<10 Gbps)
- sub-farm: 4 clusters, with a second-level network switch (<50 Gbps); one sub-farm fits in one rack
- sub-farm totals: 36 boxes, 144 cpus, 5 m2
- cluster and sub-farm sizing adjusted to fit conveniently the capabilities of the network switch, racking and power distribution components
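The strawman sub-farm numbers above can be cross-checked with a few lines of Python; this sketch is illustrative only and uses the per-box and per-cluster figures quoted on the slide.

```python
# Strawman sub-farm arithmetic using the figures quoted on this slide;
# the totals are derived from the per-box and per-cluster numbers.
SI95_PER_CPU = 100         # "four 100 SI95 processors" per basic box
CPUS_PER_BOX = 4
BOXES_PER_CLUSTER = 9
CLUSTERS_PER_SUBFARM = 4
IO_SERVER_FRACTION = 0.15  # 15% of systems configured as I/O servers

boxes_per_subfarm = BOXES_PER_CLUSTER * CLUSTERS_PER_SUBFARM   # 36 boxes
cpus_per_subfarm = boxes_per_subfarm * CPUS_PER_BOX            # 144 cpus
si95_per_subfarm = cpus_per_subfarm * SI95_PER_CPU             # 14,400 SI95
io_servers = round(IO_SERVER_FRACTION * boxes_per_subfarm)     # about 5 boxes

print(f"sub-farm: {boxes_per_subfarm} boxes, {cpus_per_subfarm} cpus, "
      f"{si95_per_subfarm} SI95, ~{io_servers} I/O server boxes")
```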
10 (No Transcript)
11 Components (ii)
- Disks
  - inexpensive disks designed for the PC market
  - packaged with a smart controller (probably a PC) to provide data caching, data redundancy and recovery
12 Disk sub-system
- rack: integral number of arrays, with first-level network switches. In the main model, half-height 3.5" disks are assumed, 16 per shelf of a 19" rack. With space for 18 shelves in the rack (two-sided), half of the shelves are populated with disks, the remainder housing controllers, network switches and power distribution.
- array: two RAID controllers, dual-attached disks. Controllers connect to the storage network. Sizing of the array is subject to the components available.
- disk size restricted to give a disk count which matches the number of processors (and thus the number of active processes)
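A small illustrative sketch of the rack population described above; the per-disk size at the end is inferred from the circa-2006 farm figures on a later slide (0.5 PByte on 5,400 disks), not stated here.

```python
# Rack population as described above, plus the per-disk size implied by the
# circa-2006 farm figures (0.5 PB on 5,400 disks) - that last figure is inferred.
DISKS_PER_SHELF = 16
SHELVES_PER_RACK = 18                  # two-sided 19" rack
disk_shelves = SHELVES_PER_RACK // 2   # half of the shelves hold disks

disks_per_rack = disk_shelves * DISKS_PER_SHELF
print("disks per rack:", disks_per_rack)   # 144, matching the 144 cpus of a processor rack

total_disk_bytes, total_disks = 0.5e15, 5_400
print(f"implied disk size: about {total_disk_bytes / total_disks / 1e9:.0f} GB")
```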
13 Components (iii)
- Tapes
  - a mass market solution will probably NOT be available
  - possibly we shall still be using robots like the ones installed today
- General disclaimer
  - these are just estimates
  - how the technology evolves is only one component; the market decides on the capacity of the products
14 CMS Offline Farm at CERN circa 2006 (schematic)
[Diagram of the farm: 0.5 M SPECint95 in 5,600 processors / 1,400 boxes / 160 clusters / 40 sub-farms; 0.5 PByte of disk in 5,400 disks / 340 arrays; about 100 tape drives. Labelled interconnects: farm network ~960 Gbps, storage network, LAN-WAN routers ~250 Gbps, a 0.8 Gbps DAQ link, and individual links of roughly 0.5 to 24 Gbps.]
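For illustration, a quick consistency check of the diagram's totals against the per-unit figures used on the earlier slides; all inputs are the quoted slide numbers, the ratios are derived.

```python
# Consistency check of the circa-2006 CMS farm totals against the
# per-unit figures used earlier; the ratios are derived, the inputs are quoted.
processors, boxes, subfarms = 5_600, 1_400, 40
total_si95, total_disk_bytes, disks = 0.5e6, 0.5e15, 5_400

print("processors per box :", processors // boxes)                    # 4, as in the basic box
print("boxes per sub-farm :", boxes // subfarms)                      # 35, close to the 36-box rack
print("SI95 per processor :", round(total_si95 / processors))         # ~89, vs the nominal 100 SI95
print("GB per disk        :", round(total_disk_bytes / disks / 1e9))  # ~93 GB
```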
15 Layout and power - CMS or ATLAS
[Floor layout, approximately 24 m x 18 m: a tape area of 120 m2 drawing 14 kW, an equipment area drawing 245 kW; overall totals 400 kW and 370 m2.]
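A rough, illustrative calculation of the power densities implied by these layout figures; the split for the non-tape area is an approximation, since the quoted sub-totals do not account for the full 400 kW.

```python
# Power densities implied by the layout figures above (areas and powers as
# quoted; the densities, and the split for the non-tape area, are derived).
total_kw, total_m2 = 400, 370
tape_kw, tape_m2 = 14, 120

print(f"overall        : {total_kw / total_m2:.2f} kW/m2")
print(f"tape area      : {tape_kw / tape_m2:.2f} kW/m2")
print(f"remaining area : {(total_kw - tape_kw) / (total_m2 - tape_m2):.2f} kW/m2")
```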
16 Caution
- These are only estimates
  - of the requirements
  - of the technology
- http://nicewww.cern.ch/les/pasta/welcome.html
- http://nicewww.cern.ch/omartin/nt3-99-ohm.html
17 Space available in B.513
- Total space in technical rooms in B.513
  - Computer room: 1,400 m2
  - Tape vault: 1,100 m2
  - MG room: 200 m2
  - Total: 2,700 m2
- Estimate for LHC: 1,600 m2 of cleared space
18 Questions
- Power
  - about 2 MW REQUIRED, surviving short power cuts (UPS)
  - what infrastructure needs to be changed?
  - power distribution within the building, rooms?
  - what about backup generators?
  - cost estimates?
- Cooling (52 weeks per year usage)
  - How much cooling capacity is required, and how much exists?
  - Power requirements for cooling?
  - Advice on packaging of the equipment (e.g. should we buy cards in racks rather than use flow-through office systems on shelves?)
  - Cooling in the B.513 basement (sous-sol)?
  - Cost estimates?
19 Questions (ii)
- smoke/fire detection
  - current discussion between IT (Dave Underhill) and ST, TIS
  - is the current method sufficient?
  - what is the best practice for computer halls?
  - can smoke/heat sources be localised in an open hall?
  - does it make sense to tie detection to power?
- What questions should we be asking?
20 Each silo has 6,000 slots, each of which can hold a 50 GB cartridge => theoretical capacity 1.2 PetaBytes
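The per-silo arithmetic, as an illustrative sketch; note that the 1.2 PB total corresponds to four silos of this size, which is an inference from the quoted numbers rather than a statement on the slide.

```python
# Capacity of one StorageTek silo from the quoted figures; the number of
# silos behind the 1.2 PB total is inferred, not stated on the slide.
slots_per_silo = 6_000
cartridge_gb = 50

silo_tb = slots_per_silo * cartridge_gb / 1_000
print(f"one silo: {silo_tb:.0f} TB")                                  # 300 TB

total_pb = 1.2
print("silos implied by 1.2 PB:", round(total_pb * 1_000 / silo_tb))  # 4
```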
21 (No Transcript)
22 (No Transcript)
23 About 250 PCs, with 500 Pentium processors, are currently installed for offline physics data processing
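Finally, an illustrative comparison of today's installation with the circa-2006 CMS farm, assuming roughly 8 SI95 per installed Pentium processor (the Pentium II figure from the Units slide); the growth factors are estimates, not slide figures.

```python
# Scale of the step up from today's installation to the circa-2006 CMS farm,
# assuming roughly 8 SI95 per installed Pentium processor (an assumption).
today_cpus, today_si95_per_cpu = 500, 8
lhc_cpus, lhc_si95 = 5_600, 0.5e6

today_si95 = today_cpus * today_si95_per_cpu
print("processor count grows by about", round(lhc_cpus / today_cpus), "x")   # ~11x
print("capacity in SI95 grows by about", round(lhc_si95 / today_si95), "x")  # ~125x
```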