1 LHC Computing: a Challenge for Grids
- Swiss Computer Science Conference SCSC'02
- 22 February 2002, ETH Zürich
- Les Robertson
- CERN - IT Division
- les.robertson_at_cern.ch
2 - LHC Computing
- Grid technology
- Grids for LHC
3 - On-line System
- Multi-level trigger
- Filter out background
- Reduce data volume
- 24 x 7 operation
Trigger cascade (from the slide's diagram; the implied reduction factors are worked out below):
- 40 MHz (1000 TB/sec) collision rate
- Level 1 - special hardware → 75 kHz (75 GB/sec)
- Level 2 - embedded processors → 5 kHz (5 GB/sec)
- Level 3 - farm of commodity CPUs → 100 Hz (100 MB/sec)
- Data recording & offline analysis
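Working the slide's numbers through (my arithmetic, not figures quoted in the talk), the rejection factor at each stage and the size of a recorded event are:

```latex
\frac{40\,\mathrm{MHz}}{75\,\mathrm{kHz}} \approx 530, \qquad
\frac{75\,\mathrm{kHz}}{5\,\mathrm{kHz}} = 15, \qquad
\frac{5\,\mathrm{kHz}}{100\,\mathrm{Hz}} = 50, \qquad
\text{overall} \approx 4\times 10^{5};
\qquad
\frac{100\,\mathrm{MB/s}}{100\,\mathrm{Hz}} \approx 1\,\mathrm{MB/event}.
```

So only about one collision in 400,000 survives through to data recording.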
4 The Large Hadron Collider Project - 4 detectors
- CMS, ATLAS, LHCb
- Storage - raw recording rate 0.1 - 1 GByte/sec
- Accumulating at 5-8 PetaBytes/year (plus copies)
- 10 PetaBytes of disk
- Processing - 200,000 of today's fastest PCs
5 2.4 PetaBytes Today
6 - Problem 1: Cost
- Problem 2: Number of components
7 Data Handling and Computation for Physics Analysis
[Data-flow diagram] detector → event filter (selection & reconstruction) → raw data → reconstruction → event summary data → batch physics analysis / event reprocessing → analysis objects (extracted by physics topic) → interactive physics analysis; event simulation feeds simulated data into the same chain, and the processed data comprise the event summary data and the analysis objects.
8 LHC Physics Data - Summary
- 40 MHz collision rate
- Reduced by real-time triggers/filters to a few 100 Hz
- Digitised collision data - few MBytes (a volume cross-check follows this list)
- Collisions recorded - ~10^11 per year
- Embarrassingly parallel - each collision independent
- Reconstruction - transform from detector view to physics view (tracks, energies, particles, ..)
- Analysis
  - Find collisions with similar features
  - Physics extracted by collective iterative discovery
  - Keys are dynamically defined by professors and students using complex algorithms
- Simulation - start from the theory and detector characteristics and compute what the detector should have seen
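A rough consistency check of these numbers (my arithmetic, not a figure from the talk): recording a few hundred collisions per second at a few MBytes each, over roughly 10^7 seconds of running per year, gives

```latex
V \;\approx\; R \times S \times T
  \;\approx\; (2\times 10^{2}\,\mathrm{Hz}) \times (3\,\mathrm{MB}) \times (10^{7}\,\mathrm{s})
  \;\approx\; 6\,\mathrm{PB/year},
```

of the same order as the 5-8 PetaBytes/year quoted on the earlier slide.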
9 High Energy Physics Computing = High Throughput Computing
- Large numbers of independent events - trivial parallelism (illustrated in the sketch after this list)
- Small records - mostly read-only
- Modest I/O rates - few MB/sec per fast processor
- Modest floating point requirement - SPECint performance
Good fit for clusters of PCs
- Chaotic workload
  - research environment → unpredictable, no limit to the requirements
- Very large aggregate requirements - computation, data, I/O
  - exceeds the capabilities of a single geographical installation
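To illustrate the trivial parallelism this slide relies on, here is a minimal Python sketch: every event is independent, so a pool of workers (standing in for a farm of commodity PCs) can simply map over them. reconstruct_event and the event records are invented placeholders, not real experiment code.

```python
# Minimal sketch of embarrassingly parallel event processing:
# each collision event is independent, so a process pool can
# reconstruct them concurrently with no communication between workers.
from multiprocessing import Pool

def reconstruct_event(raw_event):
    """Transform one raw event from 'detector view' to 'physics view'."""
    # Placeholder for track finding, energy clustering, particle ID, ...
    return {"event_id": raw_event["event_id"], "tracks": [], "energy_sum": 0.0}

def main():
    raw_events = [{"event_id": i, "payload": b""} for i in range(10_000)]
    with Pool() as pool:  # one worker per CPU core
        summaries = pool.map(reconstruct_event, raw_events, chunksize=100)
    print(f"reconstructed {len(summaries)} independent events")

if __name__ == "__main__":
    main()
```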
10 CERN's Users in the World
- Europe: 267 institutes, 4603 users
- Elsewhere: 208 institutes, 1632 users
11 - Problem 1: Cost
- Problem 2: Number of components
- Problem 3: Global community
- Solution 1: Global community → access to national and regional computer centres
12 [Diagram labels: analysis, reconstruction]
- Worldwide distributed computing system
- Small fraction of the analysis at CERN
- High-level analysis using 12-20 large regional centres
  - how to use the resources efficiently
  - establishing and maintaining a uniform physics environment
- Data exchange with tens of smaller regional centres, universities, labs
- Importance of cost containment
  - components & architecture
  - utilisation efficiency
  - maintenance, capacity evolution
  - personnel & management costs
  - ease of use (usability efficiency)
13 - Moore's law
- capacity growth with -
  - a fixed CPU count
  - or a fixed annual budget (a rough formulation follows)
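One way to make the fixed-budget point concrete (my formulation; the doubling time T is an assumption, not a number from the talk): if price/performance doubles every T months, a constant annual budget buys capacity that grows as

```latex
C_n = C_0 \, 2^{\,12 n / T}, \qquad
T \approx 18\ \text{months} \;\Rightarrow\; \frac{C_{n+1}}{C_n} = 2^{2/3} \approx 1.6,
```

so a fixed budget buys roughly 60% more capacity each year, while a fixed CPU count grows only as fast as the individual boxes are replaced by faster ones.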
15 The MONARC Multi-Tier Model (1999)
16 Are Grids a solution?
- Analogous with the electrical power grid
- Unlimited ubiquitous distributed computing
- Transparent access to multi-petabyte distributed databases
- Easy to plug in
- Hidden complexity of the infrastructure
Ian Foster and Carl Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1999, http://www.mkp.com/grids
17 What is the Grid for?
- creates a virtual computer centre spanning different geographical locations that are managed independently
- frees the end-user from the details of the different centres' resources, policies, data location, ..
- enables flexible sharing of large-scale, distributed resources by users from different communities
18 LHC Computing Model 2002 - evolving
The opportunity of Grid technology
The LHC Computing Centre
19 The Promise of Grid Technology
What should the Grid do for you?
- you submit your work
- and the Grid
  - Finds convenient places for it to be run
  - Optimises use of the widely dispersed resources
  - Organises efficient access to your data
    - Caching, migration, replication
  - Deals with authentication to the different sites that you will be using
  - Interfaces to local site authorisation and resource allocation mechanisms, policies
  - Runs your jobs
  - Monitors progress
  - Recovers from problems
  - .. and .. tells you when your work is complete (a rough user-side sketch of this workflow follows)
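As a rough illustration of the experience this slide promises, the sketch below shows what job submission could look like from the user's side once the grid hides all of the above. The GridClient class and its methods are invented for illustration only; they are not DataGrid or Globus APIs.

```python
# Hypothetical user-side view of the workflow promised on this slide.
# Site selection, data access, authentication and recovery are all hidden
# behind the (invented) GridClient interface.
import time

class GridClient:
    def submit(self, executable, input_files):
        """Hand the work to the grid and get back an opaque job identifier."""
        return "job-0001"

    def status(self, job_id):
        return "Done"  # e.g. Submitted / Running / Done / Failed

    def fetch_output(self, job_id):
        return ["histograms.root", "job.log"]  # output sandbox contents

def run_analysis():
    grid = GridClient()
    job = grid.submit("analysis.sh", ["cuts.cfg"])
    while grid.status(job) not in ("Done", "Failed"):  # monitor progress
        time.sleep(30)
    return grid.fetch_output(job)  # told when the work is complete

if __name__ == "__main__":
    print(run_analysis())
```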
20 The DataGRID Project
www.eu-datagrid.org
21 DataGRID Partners
- Managing partners
  - UK: PPARC, Italy: INFN
  - France: CNRS, Holland: NIKHEF
  - Italy: ESA/ESRIN, CERN (proj. mgt. - Fabrizio Gagliardi)
- Industry
  - IBM (UK), Communications Systems (F), Datamat (I)
- Associate partners
  - University of Heidelberg, CEA/DAPNIA (F), IFAE Barcelona, CNR (I), CESNET (CZ), KNMI (NL), SARA (NL), SZTAKI (HU)
  - Finland - Helsinki Institute of Physics & CSC, Swedish Natural Science Research Council (Parallelldatorcentrum KTH, Karolinska Institute), Istituto Trentino di Cultura, Zuse Institut Berlin
22 DataGRID Challenges
- Data
- Scaling
- Reliability
- Manageability
- Usability
23 Part i - Grid Middleware
- Building on an existing framework (Globus)
- Develop enhanced Grid middleware for data-intensive computing
  - workload management
  - data management
  - application and grid monitoring
24 Datagrid Middleware Architecture
25 Workload Management
- Optimal co-allocation of data, CPU and network for specific grid/network-aware jobs
- Distributed scheduling (data/code migration) of unscheduled/scheduled jobs (a toy matchmaking sketch follows this list)
- Uniform interface to various local resource managers
- Priorities, policies on resource usage (CPU, Data, Network)
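A toy sketch of the kind of matchmaking a broker might perform, preferring compute elements close to a replica of the job's input data and with free CPUs. All data structures and names here are invented for illustration; this is not the DataGrid workload-management implementation.

```python
# Rank candidate compute elements: data-local sites first, then free CPUs.
def choose_compute_element(job, compute_elements, replica_catalogue):
    """Return the best-ranked compute element for a job, or None."""
    sites_with_data = replica_catalogue.get(job["input_file"], set())
    candidates = [
        ce for ce in compute_elements
        if ce["free_cpus"] > 0 and job["vo"] in ce["allowed_vos"]  # policy check
    ]
    if not candidates:
        return None
    return max(candidates,
               key=lambda ce: (ce["site"] in sites_with_data, ce["free_cpus"]))

if __name__ == "__main__":
    ces = [{"site": "CERN", "free_cpus": 40,  "allowed_vos": {"cms"}},
           {"site": "INFN", "free_cpus": 250, "allowed_vos": {"cms", "atlas"}}]
    replicas = {"lfn:run42.raw": {"INFN"}}
    job = {"input_file": "lfn:run42.raw", "vo": "cms"}
    print(choose_compute_element(job, ces, replicas)["site"])  # -> INFN
```

A real broker would of course also weigh queue lengths, priorities and network costs, as the bullets above indicate.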
26 Data Management
- Data Transfer
  - Efficient, secure and reliable transfer of data between sites
- Data Replication
  - Replicate data consistently across sites
- Data Access Optimization
  - Optimize data access using replication and remote open (a replica-selection sketch follows this list)
- Data Access Control
  - Authentication, ownership, access rights on data
- Metadata Storage
  - Grid-wide persistent metadata store for all kinds of Grid information
Data granularity: Files, not Objects.
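A minimal sketch of replica-based access optimisation, assuming a catalogue that maps a logical file name to the sites holding physical copies; the catalogue, cost table and site names are invented for illustration.

```python
# Pick a local replica when one exists, otherwise the cheapest remote copy.
def open_best_replica(lfn, replica_catalogue, transfer_cost, local_site):
    """Return (site, access mode) for the cheapest physical replica of a file."""
    replicas = replica_catalogue.get(lfn)
    if not replicas:
        raise FileNotFoundError(f"no replica registered for {lfn}")
    best_site = min(replicas,
                    key=lambda s: 0 if s == local_site else transfer_cost[s])
    mode = "local read" if best_site == local_site else "remote open"
    return best_site, mode

if __name__ == "__main__":
    catalogue = {"lfn:esd/run42.esd": {"CERN", "RAL"}}
    cost = {"CERN": 5, "RAL": 12}
    print(open_best_replica("lfn:esd/run42.esd", catalogue, cost, "RAL"))
    # -> ('RAL', 'local read')
```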
27 A Job Submission Example
[Diagram - components involved: Input Sandbox, Job Submission Service, Resource Broker, Information Service, Replica Catalogue, Logging & Book-keeping, Compute Element, Storage Element]
28 Part ii - Local Computing Fabrics
- Fabric management
- Mass storage
29 Management of Grid Nodes
- Before tackling the Grid, better know how to manage efficiently giant local clusters → fabrics
- commodity components - processors, disks, network switches
- massive mass storage
- new level of automation required
  - self-diagnosing
  - self-healing (a toy self-healing loop is sketched after this list)
- Key Issues
  - scale
  - efficiency & performance
  - resilience & fault tolerance
  - cost - acquisition, maintenance, operation
  - usability
  - security
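A toy self-healing loop illustrating the level of automation argued for here: check each node, apply a known automated repair where one exists, otherwise drain the node for operator attention. Node names, health checks and repair actions are invented placeholders, not an existing fabric-management tool.

```python
# Sketch of a "self-diagnosing, self-healing" fabric-management pass.
import random

def check_health(node):
    """Return a list of detected problems (empty means healthy)."""
    return random.sample(["disk_full", "daemon_dead", "overheating"],
                         k=random.choice([0, 0, 0, 1]))

def heal(node, problems, repairs):
    for p in problems:
        action = repairs.get(p)
        if action:
            action(node)  # automated repair, no operator needed
        else:
            print(f"{node}: draining, needs operator attention ({p})")

def management_pass(nodes):
    repairs = {"disk_full":   lambda n: print(f"{n}: cleaning scratch space"),
               "daemon_dead": lambda n: print(f"{n}: restarting daemon")}
    for node in nodes:  # in reality, thousands of commodity nodes
        problems = check_health(node)
        if problems:
            heal(node, problems, repairs)

if __name__ == "__main__":
    management_pass([f"lxnode{i:04d}" for i in range(10)])
```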
30 Part iii - Applications
- HEP
  - The four LHC experiments
  - Live proof-of-concept prototype of the Regional Centre model
- Earth Observation
  - ESA-ESRIN
  - KNMI (Dutch meteo): climatology
  - Processing of atmospheric ozone data derived from ERS GOME and ENVISAT SCIAMACHY sensors
- Biology
  - CNRS (France), Karolinska (Sweden)
31 Part iv - Deployment
32 Datagrid Monitor
http://ccwp7.in2p3.fr/mapcenter/
33 The Data Grid Project - Summary
- European dimension
  - EC funding 3 years, 10M Euro
  - Closely coupled to several national initiatives
- Multi-science
- Technology leverage
  - Globus, Condor, HEP farming & MSS, Monarc, INFN-Grid, Géant
- Emphasis
  - Data - Scaling - Reliability
  - Rapid deployment of working prototypes - production quality
  - Collaboration with other European and US projects
- Status
  - Started 1 January 2001
  - Testbed 1 in operation now
- Open
  - Open-source and communication
  - Global GRID Forum
  - Industry and Research Forum
34 Trans-Atlantic Testbeds with High Energy Physics Leadership
- US projects
- European projects
35 - EU-Funded Project
- Partners: CERN, PPARC (UK), Amsterdam (NL), INFN (IT)
- Géant research network backbone
- Extending Datagrid to collaboration with US projects - iVDGL, GriPhyN
- EU, DoE & NSF funded HEP Transatlantic network
- Main Aims
  - Compatibility and interoperability between US and EU projects
  - Transatlantic testbed for advanced network research
  - 2.5 Gbps wavelength-based US-CERN Link - 6/2002 → 10 Gbps 2003 or 2004 (rough transfer-time arithmetic follows this list)
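As an illustration of why the link upgrade matters (my arithmetic, not from the talk): moving one PetaByte over a fully used 2.5 Gbps link takes

```latex
\frac{1\,\mathrm{PB}}{2.5\,\mathrm{Gb/s}}
  \;=\; \frac{8\times 10^{15}\,\mathrm{bits}}{2.5\times 10^{9}\,\mathrm{bits/s}}
  \;\approx\; 3.2\times 10^{6}\,\mathrm{s} \;\approx\; 37\ \text{days},
```

dropping to roughly 9 days at 10 Gbps, before any protocol overhead.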
36 DataTAG Project
[Network map: GENEVA - New York link connecting to ABILENE, ESNET, STARLIGHT, MREN and STAR-TAP; "Triangle" topology label]
37 - LHC Computing Environment
- Many Grid Projects
- Very many Regional Computing Centres
- How can this be moulded into a coherent, reliable computing facility for LHC?
38 The LHC Computing Grid Project
Goal: Prepare and deploy the LHC computing environment
- applications - tools, frameworks, environment, persistency
- computing system → evolution of traditional services
  - cluster → automated fabric
  - collaborating computer centres → grid
  - CERN-centric analysis → global analysis environment
Borrow, buy or build
- foster collaboration, coherence of LHC regional computing centres
- central role of production testbeds
39 This is not yet another grid technology project - it is a grid deployment project
Two phases
- Phase 1: 2002-05
  - Development and prototyping
  - Build a 50% model of the grid needed for one experiment
- Phase 2: 2006-08
  - Installation and operation of the full world-wide initial production Grid
40 The LHC Computing Grid Project
- Multi-disciplinary, collaborative, open environment
  - Grid projects, other sciences, Globus, Global Grid Forum
- Mission oriented
  - LHC computing facility focus
  - Succession of prototypes
- Acquire technology, rather than develop
- Industrial participation
  - www.cern.ch/openlab
41 Final Remarks
- Rapid pace of grid technology development
  - public investments
  - enthusiasm of potential user communities
  - industrial interest
- LHC has a real need and a good application
- International networks available to research will be critical to success
- Importance of reliability and usability in early grids