Building a Regional Centre - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Building a Regional Centre
  • A few ideas, a personal view
  • CHEP 2000 Padova
  • 10 February 2000
  • Les Robertson
  • CERN/IT

2
Summary
  • LHC regional computing centre topology
  • Some capacity and performance parameters
  • From components to computing fabrics
  • Remarks about regional centres
  • Policies & sociology
  • Conclusions

3
Why Regional Centres?
  • Bring computing facilities closer to home
  • final analysis on a compact cluster in the
    physics department
  • Exploit established computing expertise & infrastructure
  • Reduce dependence on links to CERN
  • full ESD available nearby - through a fat, fast,
    reliable network link
  • Tap funding sources not otherwise available to
    HEP
  • Devolve control over resource allocation
  • national interests?
  • regional interests?
  • at the expense of physics interests?

4
The MONARC RC Topology
CERN Tier 0
  • University physics department
  • Final analysis
  • Dedicated to local users
  • Limited data capacity, cached only via the network
  • Zero administration costs (fully automated)
  • Tier 0: CERN
  • Data recording, reconstruction, ~20% of analysis
  • Full data sets on permanent mass storage: raw, ESD, simulated data
  • Hefty WAN capability
  • Range of export/import media
  • 24 x 7 availability
  • Tier 1: established data centre or new facility hosted by a lab
  • Major subset of data: all/most of the ESD, selected raw data
  • Mass storage, managed data operation
  • ESD analysis, AOD generation, major analysis capacity
  • Fat pipe to CERN
  • High availability
  • User consultancy, library & collaboration software support
  • Tier 2: smaller labs, smaller countries; probably hosted by an existing data centre
  • Mainly AOD analysis
  • Data cached from Tier 1, Tier 0 centres
  • No mass storage management
  • Minimal staffing costs

MONARC report: http://home.cern.ch/barone/monarc/RCArchitecture.html
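The tier roles above can be captured in a small data structure. A minimal illustrative sketch in Python, not from the original slides: the structure and field names are assumptions made for this sketch, and the role lists paraphrase the slide.

# Illustrative only: the MONARC tier roles from the slide above as a simple
# Python structure. Field names and layout are assumptions, not part of MONARC.

TIERS = {
    "Tier 0 (CERN)": {
        "data": ["raw", "ESD", "simulated data"],
        "roles": ["data recording", "reconstruction", "~20% of analysis"],
        "mass_storage": True,
    },
    "Tier 1 (regional centre)": {
        "data": ["all/most ESD", "selected raw data", "AOD"],
        "roles": ["ESD analysis", "AOD generation", "major analysis capacity"],
        "mass_storage": True,
    },
    "Tier 2 (smaller lab / smaller country)": {
        "data": ["AOD cached from Tier 1 / Tier 0"],
        "roles": ["AOD analysis"],
        "mass_storage": False,
    },
    "Department (university physics dept.)": {
        "data": ["limited capacity, cached via the network"],
        "roles": ["final analysis"],
        "mass_storage": False,
    },
}

if __name__ == "__main__":
    for name, tier in TIERS.items():
        store = "mass storage" if tier["mass_storage"] else "disk cache only"
        print(f"{name}: {', '.join(tier['roles'])} [{store}]")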
5
The MONARC RC Topology
[Diagram: CERN (Tier 0) at the hub; Tier 1 centres such as IN2P3, RAL and FNAL; Tier 2 centres and labs/universities (Uni n, Lab a, Uni b, Lab c, ...); physics department systems at the leaves]
MONARC report: http://home.cern.ch/barone/monarc/RCArchitecture.html
6
More realistically - a Grid Topology
[Diagram: the same sites as above (CERN Tier 0, IN2P3, RAL, FNAL, DHL, Uni n, Lab a, Uni b, Lab c, departments), but interconnected as a grid rather than a strict hierarchy]
7
Capacity / Performance
Based on CMS/Monarc estimates (early 1999); rounded, extended and adapted by LMR.

                                         CERN (CMS or ATLAS)            Tier 1      Tier 1
                                         2006       Annual increase     1 expt.     2 expts.
  CPU (K SPECint95)                      600        200                 120         240
  Disk (TB)                              550        200                 110         220
  Tape (PB, incl. copies at CERN)        3.4        2                   0.4         <1
  I/O rates: disk (GB/s) / tape (MB/s)   50 / 400   -                   10 / 50     20 / 100
  WAN bandwidth (Gbps)                   2.5        -                   -           2.5

Notes: all of CERN today is ~15K SI95, 25 TB of disk, 100 MB/sec;
a one-experiment Tier 1 is ~20% of CERN (a quick arithmetic check follows below);
1 SPECint95 = 10 CERN units = 40 MIPS.
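A quick check of the "~20% of CERN" note, using only the figures in the table above (illustrative Python sketch, not from the slides):

# Arithmetic check, not from the slides: compare the single-experiment Tier 1
# figures with the per-experiment CERN figures from the table above.

cern = {"cpu_kSI95": 600, "disk_TB": 550}            # CERN, one experiment, 2006
tier1_one_expt = {"cpu_kSI95": 120, "disk_TB": 110}  # Tier 1, one experiment, 2006

for key in cern:
    frac = tier1_one_expt[key] / cern[key]
    print(f"{key}: Tier 1 (1 expt.) is {frac:.0%} of CERN")  # both come out at 20%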
8
Capacity / Performance
Based on CMS/Monarc estimates (early 1999); rounded, extended and adapted by LMR.

                                         Tier 1, 2 expts. (2006)   What that means in hardware
  CPU (K SPECint95)                      240                       ~1,200 cpus / ~600 boxes
                                                                   (approx. the number of farm PCs at CERN today)
  Disk (TB)                              220                       At least 2,400 disks? 100 GB/disk (only!)
  Tape (PB, incl. copies at CERN)        <1
  I/O rates: disk (GB/s) / tape (MB/s)   20 / 100                  40 MB/sec/cpu, 20 MB/sec/disk;
                                                                   effective throughput of the LAN backbone
  WAN bandwidth (Gbps)                   2.5                       ~300 MB/sec, i.e. ~1.5% of the LAN figure

We may not find disks as small as that! But we need a high disk count for access
performance, RAID/mirroring, etc. We probably have to buy more disks, larger disks,
use the disks that come with the PCs → much more disk space.
(The WAN/LAN ratio and the drive count are sketched below.)
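A back-of-the-envelope check of two numbers above: the 2.5 Gbps WAN link expressed in MB/s and compared with the 20 GB/s disk figure, and the raw drive count implied by 220 TB at 100 GB per disk (illustrative Python sketch, not from the slides):

# Arithmetic check, not from the slides.

wan_gbps = 2.5
wan_MBps = wan_gbps * 1000 / 8      # ~312 MB/s, quoted as ~300 MB/sec on the slide
disk_io_MBps = 20 * 1000            # 20 GB/sec effective disk / LAN backbone throughput
print(f"WAN ~{wan_MBps:.0f} MB/s, i.e. ~{wan_MBps / disk_io_MBps:.1%} of the LAN figure")

# Raw drive count if only 100 GB drives are available (before any mirroring/RAID overhead):
disk_TB, drive_GB = 220, 100
print(f"{disk_TB * 1000 / drive_GB:.0f} drives of {drive_GB} GB for {disk_TB} TB")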
9
Building a Regional Centre
  • Commodity components are just fine for HEP
  • Masses of experience with inexpensive farms
  • LAN technology is going the right way
  • Inexpensive high performance PC attachments
  • Compatible with hefty backbone switches
  • Good ideas for improving automated operation and
    management

10
Evolution of today's analysis farms
  • Computing & Storage Fabric
  • built up from commodity components
  • Simple PCs
  • Inexpensive network-attached disk
  • Standard network interface (whatever Ethernet
    happens to be in 2006)
  • with a minimum of high(er)-end components
  • LAN backbone
  • WAN connection

11
Standard components
  • Computing & Storage Fabric
  • built up from commodity components
  • Simple PCs
  • Inexpensive network-attached disk
  • Standard network interface (whatever Ethernet
    happens to be in 2006)
  • with a minimum of high(er)-end components
  • LAN backbone
  • WAN connection

12
HEP's not special, just more cost conscious
  • Computing & Storage Fabric
  • built up from commodity components
  • Simple PCs
  • Inexpensive network-attached disk
  • Standard network interface (whatever Ethernet
    happens to be in 2006)
  • with a minimum of high(er)-end components
  • LAN backbone
  • WAN connection

13
Limit the role of high end equipment
  • Computing & Storage Fabric
  • built up from commodity components
  • Simple PCs
  • Inexpensive network-attached disk
  • Standard network interface (whatever Ethernet
    happens to be in 2006)
  • with a minimum of high(er)-end components
  • LAN backbone, WAN connection

14
Components → building blocks
2005 building blocks (standard, cost-optimised, Internet-warehouse equipment):
36 dual 200 SI95 cpus → ~14K SI95 → ~100K
224 3.5" disks → 25-100 TB → ~50K-200K
For comparison, in 2000 (standard office equipment): 36 dual cpus → ~900 SI95; 120 72 GB disks → ~9 TB.
For capacity and cost estimates see the 1999 PASTA Report: http://nicewww.cern.ch/les/pasta/welcome.html
(The rack arithmetic is sketched below.)
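The building-block figures are simple multiplication; a short check using only the numbers quoted above (illustrative Python sketch, not from the slides):

# Arithmetic check, not from the slides.

boxes, cpus_per_box, si95_per_cpu = 36, 2, 200
print(f"CPU rack: {boxes * cpus_per_box * si95_per_cpu} SI95 (~14K SI95)")    # 14,400

disks, low_TB, high_TB = 224, 25, 100
print(f"Disk rack: {low_TB * 1000 / disks:.0f}-{high_TB * 1000 / disks:.0f} GB "
      f"per drive to reach {low_TB}-{high_TB} TB")                            # ~110-450 GB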
15
The Physics Department System
  • Two 19-inch racks: ~200K
  • CPU: 14K SI95 (~10% of a Tier 1 centre)
  • Disk: 50 TB (~50% of a Tier 1 centre)
  • A rather comfortable analysis machine
  • → Small Regional Centres are not going to be competitive
  • → We need to rethink the storage capacity at the Tier 1 centres
    (the ~10% and ~50% figures are checked in the sketch below)
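Where the ~10% and ~50% figures come from, using the single-experiment Tier 1 numbers from the earlier capacity table (illustrative Python sketch, not from the slides):

# Arithmetic check, not from the slides.

dept  = {"cpu_kSI95": 14,  "disk_TB": 50}    # the two-rack department system
tier1 = {"cpu_kSI95": 120, "disk_TB": 110}   # Tier 1, one experiment, 2006

for key in dept:
    print(f"{key}: department is {dept[key] / tier1[key]:.0%} of a Tier 1 centre")
# cpu ~12% (quoted as ~10%), disk ~45% (quoted as ~50%)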

16
Tier 1, Tier 2 RCs, CERN
  • A few general remarks
  • A major motivation for the RCs is that we are
    hard pressed to finance the scale of computing
    needed for LHC
  • We need to start now to work together towards
    minimising costs
  • Standardisation among experiments, regional
    centres, CERN
  • so that we can use the same tools and practices
    to
  • Automate everything
  • Operation & monitoring
  • Disk data management
  • Work scheduling
  • Data export/import (prefer the network to mail)
  • in order to
  • Minimise operation, staffing
  • Trade off mass storage for disk & network bandwidth
  • Acquire contingency capacity rather than fighting
    bottlenecks
  • Outsource what you can (at a sensible price)
  • .

Keep it simple. Work together.
17
The middleware
  • The issues are
  • integration of this amorphous collection of
    Regional Centres
  • Data
  • Workload
  • Network performance
  • application monitoring
  • quality of data analysis service
  • Leverage the Grid developments
  • Extending Meta-computing to Mass-computing
  • Emphasis on data management & caching
  • and on production reliability & quality

Keep it simple. Work together.
18
A 2-experiment Tier 1 Centre
Requirement: 240K SI95, 220 TB
Basic equipment (cpus/disks): ~3M
Processors: 20 standard racks → 1,440 cpus → ~280K SI95
Disks: 12 standard racks → 2,688 disks → ~300 TB (with low-capacity disks)
(the rack arithmetic is sketched below)
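How the rack counts map onto the requirement, reusing the per-rack building blocks from slide 14 (illustrative Python sketch, not from the slides):

# Arithmetic check, not from the slides.

cpu_racks, boxes_per_rack, cpus_per_box, si95_per_cpu = 20, 36, 2, 200
cpus = cpu_racks * boxes_per_rack * cpus_per_box
print(f"{cpus} cpus -> {cpus * si95_per_cpu / 1000:.0f}K SI95")        # 1440 cpus, 288K SI95

disk_racks, disks_per_rack, target_TB = 12, 224, 300
disks = disk_racks * disks_per_rack
print(f"{disks} disks -> ~{target_TB * 1000 / disks:.0f} GB per drive for {target_TB} TB")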
19
The full costs?
  • Space
  • Power, cooling
  • Software
  • LAN
  • Replacement/expansion: ~30% per year (see the sketch after this list)
  • Mass storage
  • People
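A rough illustration of what a 30%-per-year replacement/expansion budget implies over the first few years of running. The base hardware cost here is a hypothetical placeholder, not a figure from the talk (illustrative Python sketch):

# Illustrative only: hypothetical base cost, arbitrary units.

base_hw_cost = 3.0    # hypothetical initial hardware spend (cf. the ~3M on slide 18)
annual_rate = 0.30    # replacement/expansion rate from the list above
years = 5

total = base_hw_cost + base_hw_cost * annual_rate * years
print(f"Initial {base_hw_cost:.1f} + {years} years at {annual_rate:.0%}/yr "
      f"-> ~{total:.1f} in the same units")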

20
Mass storage?
  • Do all Tier 1 centres really need a full mass
    storage operation?
  • Tapes, robots, storage management software?
  • Need support for export/import media
  • But think hard before getting into mass storage
  • Rather
  • more disks, bigger disks, mirrored disks
  • cache data across the network from another
    centre (that is willing to tolerate the stresses
    of mass storage management)
  • Mass storage is person-power intensive → long-term costs

21
Consider outsourcing
  • Massive growth in co-location centres, ISP
    warehouses, ASPs, storage renters, etc.
  • Level 3, Intel, Hot Office, Network Storage Inc, PSI, ...
  • There will probably be one near you
  • Check it out: compare costs & prices
  • Maybe personnel savings can be made

22
Policies & sociology
  • Access policy?
  • Collaboration-wide? Or restricted access (regional, national, ...)?
  • A rich source of unnecessary complexity
  • Data distribution policies
  • Analysis models
  • Monarc work will help to plan the centres
  • But the real analysis models will evolve when the
    data arrives
  • Keep everything flexible
  • simple architecture
  • simple policies
  • minimal politics

23
Concluding remarks I
  • Lots of experience with farms of inexpensive
    components
  • We need to scale them up: lots of work, but we think we understand it
  • But we have to learn how to integrate distributed
    farms into a coherent analysis facility
  • Leverage other developments
  • But we need to learn through practice and
    experience
  • Retain a healthy scepticism for scalability
    theories
  • Check it all out on a realistically sized testbed

24
Concluding remarks II
  • Don't get hung up on optimising component costs. Do be very careful with head-count
  • Personnel costs will probably dominate
  • Define clear objectives for the centre
  • Efficiency, capacity, quality
  • Think hard if you really need mass storage
  • Discourage empires & egos
  • Encourage collaboration & out-sourcing
  • In fact maybe we can just buy all this as an
    Internet service