Title: The Grid approach for the HEP computing problem
- Massimo Sgaravatto
- INFN Padova
- massimo.sgaravatto_at_pd.infn.it
2 What is a Grid?
- Dependable, consistent, pervasive access to resources
- Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals, in the absence of central control, omniscience, and trust relationships
- Make it easy to use diverse, geographically distributed, locally managed and controlled computing facilities as if they formed a coherent local cluster
3 What does the Grid do for you?
- You submit your work
- And the Grid
  - Partitions your work into convenient execution units, based on the available resources and data distribution, where there is scope for parallelism
  - Finds convenient places for it to be run
  - Organises efficient access to your data
    - Caching, migration, replication
  - Deals with authentication and authorization at the different sites that you will be using
  - Interfaces to local site resource allocation mechanisms and policies
  - Runs your jobs
  - Monitors progress
  - Recovers from problems
  - Tells you when your work is complete
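The first two steps above (partitioning work and finding places to run it) can be illustrated with a minimal sketch. It is not how any particular Grid middleware works: it assumes a hypothetical replica map from input file names to the sites holding them, assigns each file to the first site listed, and chunks the files at each site into execution units of bounded size, so every unit runs where its data already is.

```python
from collections import defaultdict


def partition_by_data(file_sites, max_files_per_unit):
    """Group input files into execution units placed where their data lives.

    file_sites maps a (hypothetical) logical file name to the list of sites
    holding a replica. Each file goes to the first site in its list; files
    at the same site are then chunked into units of at most
    max_files_per_unit files. Returns a list of (site, [files]) units.
    """
    if max_files_per_unit < 1:
        raise ValueError("max_files_per_unit must be at least 1")
    by_site = defaultdict(list)
    for lfn, sites in sorted(file_sites.items()):
        if not sites:
            raise ValueError(f"no replica known for {lfn}")
        by_site[sites[0]].append(lfn)

    units = []
    for site in sorted(by_site):
        files = by_site[site]
        for start in range(0, len(files), max_files_per_unit):
            units.append((site, files[start:start + max_files_per_unit]))
    return units


if __name__ == "__main__":
    # Hypothetical file names and sites, for illustration only.
    replicas = {
        "run1.dat": ["padova"],
        "run2.dat": ["cern", "padova"],
        "run3.dat": ["padova"],
        "run4.dat": ["lyon"],
    }
    for site, files in partition_by_data(replicas, max_files_per_unit=2):
        print(site, files)
```

With these inputs the sketch produces three units: one at `cern`, one at `lyon`, and one at `padova` holding both `run1.dat` and `run3.dat`. A real scheduler would also weigh free CPUs, queue lengths and site policies rather than simply taking the first replica.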
4 Grid approach in many sciences and disciplines
5 Mathematicians Solve NUG30
- Looking for the solution to the NUG30 quadratic assignment problem
- An informal collaboration of mathematicians and computer scientists
- Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) in the U.S. and Italy (8 sites)
Solution: 14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23
MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
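A quick consistency check on these figures (plain arithmetic, assuming the delivered CPU time is averaged uniformly over the full 7 days) gives the mean number of busy processors and its ratio to the peak:

```latex
\bar{N} = \frac{3.46\times 10^{8}\,\text{CPU-s}}{7 \times 86\,400\,\text{s}}
        = \frac{3.46\times 10^{8}}{604\,800} \approx 572,
\qquad
\frac{\bar{N}}{N_{\text{peak}}} = \frac{572}{1009} \approx 0.57
```

So over the week the run kept, on average, a little over half of its peak processor count busy.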
6 Network for Earthquake Engineering Simulation
- NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, and each other
- On-demand access to experiments, data streams, computing, archives, and collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
7 Grid approach to address the High Energy Physics (HEP) computing problem
8 HEP computing characteristics
- Large numbers of independent events to process
- Large data sets, mostly read-only
- Modest floating-point requirements
- Batch processing for production and selection; interactive for analysis
- Commodity components are just fine for HEP
- Very large aggregate requirements: computation and data
- The LHC challenge
  - A jump of orders of magnitude with respect to previous experiments
  - Geographical dispersion of people and of resources
  - Scale
    - Petabytes per year of data
    - Thousands of processors
    - Thousands of disks
    - Terabits/second of I/O bandwidth
  - Complexity
  - Lifetime (20 years)
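To give a feel for the "petabytes per year" figure, here is illustrative arithmetic (not from the slides), assuming 1 PB = 10^15 bytes arriving uniformly over a 365.25-day year:

```latex
\frac{10^{15}\,\text{B}}{365.25 \times 86\,400\,\text{s}}
  = \frac{10^{15}\,\text{B}}{3.156\times 10^{7}\,\text{s}}
  \approx 3.17\times 10^{7}\,\text{B/s}
  \approx 31.7\,\text{MB/s}
  \approx 254\,\text{Mbit/s}
```

This is the sustained average for a single petabyte per year, before any replication to regional centres; several petabytes per year scale it linearly.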
9 World-wide collaboration ⇒ distributed computing and storage capacity
CMS: 1800 physicists, 150 institutes, 32 countries
10 Solution?
- Regional Computing Centres
  - Better serve the needs of the world-wide distributed community
  - Data available nearby
  - Reduce dependence on links to CERN
  - Exploit established computing expertise and infrastructure in national labs and universities
- See http://www.cern.ch/monarc
12 Grid as a possible approach
- Various technical issues to address
  - Resource Discovery
  - Resource Management
    - Distributed scheduling, optimal co-allocation of CPU, data and network resources, uniform interface to different local resource managers, …
  - Data Management
    - Petabyte-scale information volumes, high-speed data movement and replication, replica synchronization, data caching, uniform interface to mass storage management systems, …
  - Automated system management techniques for large computing fabrics
  - Monitoring Services
  - Security
    - Authentication, Authorization
  - Scalability, Robustness, Resilience
- ⇒ Grid model to address such problems
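The data-management issues above (replication, uniform access to distributed copies) typically rest on a replica catalog that maps a logical file name to its physical copies. The sketch below is a hypothetical in-memory illustration: the class and method names, the example `gsiftp://` URLs, and the per-site "access cost" table used to choose the nearest copy are all assumptions, not the interface of any real catalog.

```python
class ReplicaCatalog:
    """Hypothetical in-memory replica catalog: logical name -> physical replicas."""

    def __init__(self):
        # logical file name -> list of (site, physical file URL)
        self._replicas = {}

    def register(self, lfn, site, pfn):
        """Record that a physical copy of `lfn` exists at `site` under URL `pfn`."""
        self._replicas.setdefault(lfn, []).append((site, pfn))

    def lookup(self, lfn):
        """Return all known (site, physical URL) pairs for `lfn` (empty if none)."""
        return list(self._replicas.get(lfn, []))

    def best_replica(self, lfn, access_cost):
        """Return the physical URL of the cheapest-to-reach replica of `lfn`.

        access_cost maps site -> cost (e.g. an estimated transfer time from
        the job's site). Sites absent from the dict are treated as unreachable.
        Ties are broken by site name, then URL.
        """
        candidates = [(access_cost[site], site, pfn)
                      for site, pfn in self.lookup(lfn) if site in access_cost]
        if not candidates:
            raise LookupError(f"no reachable replica for {lfn}")
        return min(candidates)[2]


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("lfn:run-001", "cern", "gsiftp://cern.example.org/data/run-001")
    catalog.register("lfn:run-001", "padova", "gsiftp://pd.example.org/data/run-001")
    print(catalog.best_replica("lfn:run-001", {"cern": 10.0, "padova": 1.0}))
```

Keeping the logical-to-physical mapping separate from the choice of copy lets the same catalog serve jobs at different sites, each passing its own cost table.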
13 State (HEP-centric view) circa 2.5 years ago
- Globus project
  - Globus Toolkit: core services for Grid tools and applications (authentication, information service, resource management, etc.)
  - Good basis to build on, but
    - No higher-level services
    - Handling of lots of data not addressed
    - No production-quality implementations
- Not possible to do real work with Grids yet
14 DataGrid Project (EDG)
- Project started in January 2001; duration 3 years
- Goals
  - To build a significant prototype of the LHC computing model
  - To collaborate with and complement other European and US projects
  - To develop a sustainable computing model applicable to other sciences and industry: biology, earth observation, etc.
- Specific project objectives
  - Middleware for fabric and Grid management: evaluation, testing, and integration of existing middleware software, and research and development of new software as appropriate
  - Large-scale testbed
  - Production-quality demonstrations
  - Open source and technology transfer
- See http://www.eu-datagrid.org
15 Main Partners
- CERN
- CNRS - France
- ESA/ESRIN - Italy
- INFN - Italy
- NIKHEF - The Netherlands
- PPARC - UK
16 Associated Partners
- Research and Academic Institutes
  - CESNET (Czech Republic)
  - Commissariat à l'énergie atomique (CEA) - France
  - Computer and Automation Research Institute, Hungarian Academy of Sciences (MTA SZTAKI)
  - Consiglio Nazionale delle Ricerche (Italy)
  - Helsinki Institute of Physics - Finland
  - Institut de Fisica d'Altes Energies (IFAE) - Spain
  - Istituto Trentino di Cultura (IRST) - Italy
  - Konrad-Zuse-Zentrum für Informationstechnik Berlin - Germany
  - Royal Netherlands Meteorological Institute (KNMI)
  - Ruprecht-Karls-Universität Heidelberg - Germany
  - Stichting Academisch Rekencentrum Amsterdam (SARA) - Netherlands
  - Swedish Natural Science Research Council (NFR) - Sweden
- Industry Partners
  - Datamat (Italy)
  - IBM (UK)
  - Compagnie des Signaux (France)
17
- The Middleware Working Group coordinates the development of the software modules, leveraging existing and long-tested open-standard solutions. Five parallel development teams implement the software: job scheduling, data management, grid monitoring, fabric management, and mass storage management.
- The Infrastructure Working Group focuses on integrating the middleware software with systems and networks to provide testbeds that demonstrate the effectiveness of DataGrid in production-quality operation over high-performance networks.
- The Applications Working Group exploits the project developments to process large amounts of data produced by experiments in the fields of High Energy Physics (HEP), Earth Observation (EO) and Biology.
- The Management Working Group is in charge of the day-to-day coordination of the entire project and of disseminating the results to industry and research institutes.
[Figure: Applications, Testbed, Middleware, Management, Infrastructure]
18 DataGrid Architecture
19 DataGrid achievements
- Testbed 1: first release of EDG middleware
  - First workload management system
    - "Super-scheduling" component using application data and computing element requirements
  - File replication tools (GDMP), Replica Catalog, SQL Grid Database Service, …
  - Tools for farm installation and configuration
  - Used for real productions
- Towards Testbed 2: new functionality and increased reliability
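The "super-scheduling" step can be illustrated with a small matchmaking sketch: filter the computing elements (CEs) by the job's requirements, then rank the survivors. The attribute names (`os`, `free_cpus`, `close_se`) and the ranking rule (prefer CEs close to the storage element holding the input data, then those with the most free CPUs) are assumptions made for illustration; the EDG workload management system expresses requirements in its own job description language, which this does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    os: str
    free_cpus: int
    close_se: str  # hypothetical: storage element "close" to this CE


def match(job_os, min_cpus, input_se, ces):
    """Return names of CEs that satisfy the job's requirements, best first."""
    # Requirements: matching operating system and enough free CPUs.
    eligible = [ce for ce in ces if ce.os == job_os and ce.free_cpus >= min_cpus]
    # Rank: CEs close to the input data first, then by free CPUs (descending).
    eligible.sort(key=lambda ce: (ce.close_se != input_se, -ce.free_cpus))
    return [ce.name for ce in eligible]


if __name__ == "__main__":
    ces = [
        ComputingElement("ce-a", "linux", 10, "se-x"),
        ComputingElement("ce-b", "linux", 50, "se-y"),
        ComputingElement("ce-c", "solaris", 100, "se-x"),
        ComputingElement("ce-d", "linux", 5, "se-x"),
    ]
    print(match("linux", 8, "se-x", ces))
```

Here `ce-c` fails the OS requirement and `ce-d` has too few free CPUs; of the remaining two, `ce-a` ranks first because it sits next to the input data even though `ce-b` has more free CPUs.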
20 Job submission scenario
21 Other HEP Grid initiatives
- PPDG (US)
- GriPhyN (US)
- DataTag and iVDGL
  - Transatlantic testbeds (to address interoperability)
- LCG (LHC Computing Grid Project)
22 The Grid World: current status
- Dozens of major Grid projects in scientific and technical computing, research and education
- Considerable consensus on key concepts and technologies
- Open-source Globus Toolkit: a de facto standard for major protocols and services
- Industrial interest emerging rapidly
- Opportunity: convergence of eScience and eBusiness requirements and technologies
23 Problems
- Almost all projects have developed specialized services layered on top of standard services (security, remote job execution, etc.)
- Patchwork of protocols, non-interoperable standards, and implementations that are difficult to re-use
- ⇒ Exploit Web Services
24 Web Services
- Increasingly popular standards-based framework for accessing network applications
- W3C standardization: Microsoft, IBM, Sun, others
- WSDL: Web Services Description Language
  - Interface Definition Language for Web services
- SOAP: Simple Object Access Protocol
  - XML-based RPC protocol; common WSDL target
- WS-Inspection
  - Conventions for locating service descriptions
- UDDI: Universal Description, Discovery and Integration
  - Directory for Web services
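To make "XML-based RPC protocol" concrete, the sketch below builds and parses a SOAP 1.1 request envelope with Python's standard `xml.etree.ElementTree`. The operation name `getJobStatus` and the `urn:example:jobservice` namespace are hypothetical; for a real service they would come from its WSDL description. Transport details such as the HTTP POST and the `SOAPAction` header are omitted.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:jobservice"  # hypothetical service namespace

ET.register_namespace("soap", SOAP_ENV)  # serialize the envelope with a "soap:" prefix


def build_request(operation, **params):
    """Build a SOAP 1.1 envelope whose Body invokes `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover (operation, {param: text}) from an envelope built as above."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    request = build_request("getJobStatus", jobId=42)
    print(request)
    print(parse_request(request))
```

All parameter values travel as XML text, which is why `42` comes back as the string `"42"`; in practice the WSDL's type definitions tell both sides how to interpret them.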
25 Open Grid Services Architecture (OGSA)
- Service orientation
  - Computational resources, storage resources, networks, programs, databases, etc. are all represented as services
- Allows standard interface definition mechanisms: multiple protocol bindings, multiple implementations, local/remote transparency
- Grid service = Web service with semantics for service interactions
  - Management of transient instances (and state)
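One way to picture the management of transient service instances is a factory that creates instances with a termination time: clients extend it with keep-alive calls, and instances whose time has passed are destroyed (a "soft-state" lifetime). The class and method names below are hypothetical, and times are plain numbers supplied by the caller so the behaviour is deterministic; this illustrates the idea only and is not the OGSA interface.

```python
import itertools


class ServiceFactory:
    """Hypothetical factory for transient service instances with lifetimes."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._expiry = {}  # instance handle -> termination time

    def create(self, now, lifetime):
        """Create an instance that lives until now + lifetime; return its handle."""
        handle = f"instance-{next(self._ids)}"
        self._expiry[handle] = now + lifetime
        return handle

    def keep_alive(self, handle, now, lifetime):
        """Soft-state refresh: push the termination time to at least now + lifetime."""
        if handle not in self._expiry:
            raise KeyError(f"{handle} no longer exists")
        self._expiry[handle] = max(self._expiry[handle], now + lifetime)

    def sweep(self, now):
        """Destroy instances whose termination time has passed; return their handles."""
        expired = [h for h, t in self._expiry.items() if t <= now]
        for h in expired:
            del self._expiry[h]
        return expired

    def alive(self):
        """Handles of instances that still exist, in sorted order."""
        return sorted(self._expiry)


if __name__ == "__main__":
    factory = ServiceFactory()
    a = factory.create(now=0, lifetime=10)
    b = factory.create(now=0, lifetime=10)
    factory.keep_alive(a, now=8, lifetime=10)  # only `a` is refreshed
    print(factory.sweep(now=12), factory.alive())
```

The appeal of soft state is that an instance abandoned by a crashed client is eventually reclaimed without any explicit destroy call.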
26 Global Grid Forum
- Mission
  - To focus on the promotion and development of Grid technologies and applications via the development and documentation of "best practices," implementation guidelines, and standards, with an emphasis on "rough consensus and running code"
- An Open Process for Development of Standards
- A Forum for Information Exchange
- A Regular Gathering to Encourage Shared Effort
- See http://www.globalgridforum.org