Title: Distributed Computing and Analysis
1 Distributed Computing and Analysis
- Lamberto Luminari
- Italo Hellenic School of Physics 2004
- Martignano - May 20, 2004
2 Outline
- Introduction
- General remarks
- Distributed computing
- Principles
- Projects
- Computing facilities: testbeds and production infrastructures
- Database Systems
- Principles
- Distributed analysis
- Requirements and issues
3 General remarks
- Schematic approach
- For the purpose of clarity, differences among possible alternatives are stressed; in reality, solutions are often a mix or a compromise
- Only the main features of relevant items are described; no claim of being exhaustive
- HEP (LHC) oriented presentation
- Examples are mainly taken from the HEP world
- Projects with HEP community involvement are preferred
- Options chosen by LHC
4 Distributed Computing
5 Distributed computing
- What is it?
- processing of data and objects across a network of connected systems
- a hardware and software infrastructure that provides pervasive (and inexpensive) access to computational capabilities
- A long story
- mainframes: more and more expensive
- cluster technology
- RISC machines: very powerful
- What makes it appealing now?
- CPU power!
- Storage capacity!!
- Network bandwidth!!!
- ... but distributed computing is not a choice, rather a necessity or an opportunity
6 Network performance
7 Advantages of distributed computing
- Scalability and flexibility
- in principle, distributed computing systems are infinitely scalable: simply add more units and get more computing power. Moreover, you can add or remove specific resources and adapt the system to your needs
- Efficiency
- private resources are usually poorly used; pooling them greatly increases their exploitation
- Reliability
- the failure of a component affects the overall performance only slightly
- Load balancing and averaging
- distributing tasks according to the availability of resources optimizes the behavior of the whole system and minimizes the execution time
- load peaks arising from different user communities rarely add up, so the use of resources is averaged (and optimized) over long periods
8 Disadvantages of distributed computing
- Difficult integration and coordination
- many heterogeneous computing systems have to be integrated
- data sets are split over different storage systems
- many users have to cooperate and share resources
- Unpredictability
- the quantity of available resources may fluctuate widely
- computing units may become unavailable or unreachable suddenly and for long periods, making the completion time of the tasks running there unpredictable
- Security problems
- distributed systems are prone to intrusion
9 Applications and distributed computing
- Suitable
- high compute-to-data ratio
- batch processes
- loosely coupled tasks
- statistical evaluations dependent on random trials
- data mining through distributed filesystems or databases
- Unsuitable
- real-time processes
- interactive processes
- strongly coupled tasks
- sequential tasks
10 Distributed computing architectures
- Peer-to-peer
- flat organization of components with similar functionalities, talking to each other
- suitable for
- independent tasks or tasks with little inter-task communication
- access to sparse data organized in a non-hierarchical way
- Client-server
- components with different functionalities and roles
- processing unit (client): provided with a lightweight agent able to perform simple operations: detect the system status and notify it to the server, ask (or wait) for tasks, accept and send data, execute processes according to priorities or in spare cycles, ...
- dedicated unit (server): provided with complex software able to take or send computing requests, monitor the status of the jobs sent to the clients, receive the results and assemble them, possibly in a database. It also takes care of security and access policy, and stores statistics and accounting data
- suitable for
- complex architectures and tasks
11 Multi-tier computing systems
- Components with different levels of service, arranged in tiers
- computing centers (multi-processors, PC farms, data storage systems)
- clusters of dedicated machines
- individual, general-use PCs
- Different functionalities for each tier
- amount of CPU power installed and data stored
- quality and schedule of user support
- level of reliability and security
12 (No transcript)
13 Distributed computing models
- Clusters
- groups of homogeneous, tightly coupled components, sharing file systems and peripheral devices (e.g., Beowulf)
- Pools of desktop PCs
- loosely interconnected private machines (e.g., Condor)
- Grids
- heterogeneous systems of (mainly dedicated) resources (e.g., LCG)
14 Comparison of computing models
15 Condor
- Condor is a specialized workload management system for compute-intensive jobs. It provides
- a job queueing mechanism
- a scheduling policy
- a priority scheme
- resource monitoring
- resource management
- Users submit their serial or parallel jobs to Condor, which places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion
- Unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. Condor is able to transparently produce a checkpoint and migrate a job to a different machine
- Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine (a submit-file sketch follows below)
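To make the submission workflow concrete, here is a minimal sketch of a Condor submit description file; the executable, file names and transfer settings are hypothetical examples, not taken from the slides.

    # Minimal Condor submit description file (hypothetical job and file names).
    # The vanilla universe runs unmodified binaries; the standard universe
    # would additionally enable Condor's transparent checkpoint/migration.
    universe   = vanilla
    executable = analyze
    arguments  = run42.dat
    # No shared file system is assumed: ship input and fetch output explicitly.
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = run42.dat
    output = analyze.out
    error  = analyze.err
    log    = analyze.log
    queue

Saved as job.sub, this would be submitted with condor_submit job.sub and monitored with condor_q.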
16 (Figure: resources, data, network)
17 Distributed computing environment
- DCE standards
- A distributed computing network may include many different systems. The Distributed Computing Environment (DCE), formulated by The Open Group, formalizes the technologies needed to make the components communicate with each other, such as remote procedure calls and middleware. DCE runs on all major computing platforms and is designed to support distributed applications in heterogeneous hardware and software environments
- DCE provides a complete infrastructure, with services, interfaces, protocols and encoding rules for
- authentication and security (Kerberos, public key certificates)
- object interoperability across different platforms (CORBA, the Common Object Request Broker Architecture)
- directories (with global names and cell names) for distributed resources
- time services (including synchronization)
- distributed file systems
- remote procedure calls (a minimal interface sketch follows after this list)
- Internet/intranet communications
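As an illustration of the remote procedure call service listed above, a minimal sketch of a DCE-style IDL interface definition; the UUID and the operation are invented for the example.

    /* Hypothetical DCE IDL interface: a client calls add() as if it were
       a local function, while the DCE runtime handles marshalling,
       transport, authentication and server lookup via the directory. */
    [
      uuid(3f2504e0-4f89-11d3-9a0c-0305e82c3301),
      version(1.0)
    ]
    interface arithmetic
    {
        long add([in] long a, [in] long b);
    }

An IDL compiler generates the client and server stubs from such a definition.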
18 Grid computing specifications
- The Global Grid Forum (GGF) is the primary organization whose purpose is to define specifications for Grid computing. It is a forum for information exchange and collaboration among people who are
- doing Grid research,
- designing and building Grid software,
- deploying Grids,
- using Grids,
- spanning technology areas: scheduling, data handling, security, ...
- The Globus Toolkit (developed at Argonne National Laboratory and the University of Southern California) is an implementation of these standards, and has become a de facto standard for grid middleware because of some attractive features
- an object-oriented approach, which allows developers of specific applications to take just what meets their needs, to introduce tools one at a time, and to make programs increasingly "Grid-enabled"
- the toolkit software is open source: this allows developers to freely make and add improvements
19 Globus Toolkit
- Practically all major Grid projects are being built on protocols and services provided by the Globus Toolkit, a software "work in progress" developed by the Globus Alliance, which involves primarily Ian Foster's team at Argonne National Laboratory and Carl Kesselman's team at the University of Southern California in Los Angeles
- The toolkit provides a set of software tools to implement the basic services and capabilities required to construct a computational Grid, such as security, resource location, resource management, and communications. Its main components are listed on the next slide
20 Globus Toolkit
- GRAM (Globus Resource Allocation Manager): converts a request for resources into commands that local computers can understand
- GSI (Grid Security Infrastructure): provides authentication of the user and works out that person's access rights
- MDS (Monitoring and Discovery Service): collects information about resources (processing capacity, bandwidth capacity, type of storage, etc.)
- GRIS (Grid Resource Information Service): queries resources for their current configuration, capabilities, and status
- GIIS (Grid Index Information Service): coordinates arbitrary GRIS services
- GridFTP: provides a high-performance, secure and robust data transfer mechanism
- Replica Catalog: a catalog that allows other Globus tools to look up where on the Grid other replicas of a given dataset can be found
- Replica Management system: ties together the Replica Catalog and GridFTP technologies, allowing applications to create and manage replicas of large datasets (a command-line sketch follows below)
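A hedged sketch of how these components are typically exercised from the command line with the Globus Toolkit 2 tools; the hostnames and file names are hypothetical.

    # Authenticate once with GSI: creates a short-lived proxy certificate
    # from the user's long-lived credential.
    grid-proxy-init

    # Run a simple job on a remote resource through GRAM.
    globus-job-run ce.example.org /bin/hostname

    # High-performance, GSI-authenticated file transfer with GridFTP.
    globus-url-copy file:///tmp/run42.dat gsiftp://se.example.org/data/run42.dat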
21 OGSA: the future?
22 Grid projects
- ... and many others!
23 Grid projects
- UK: GridPP
- Netherlands: DutchGrid
- Germany: UNICORE, Grid project
- France: Grid funding approved
- Italy: INFN Grid
- Eire: Grid project
- Switzerland: Network/Grid project
- Hungary: DemoGrid
- Norway, Sweden: NorduGrid
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF TeraGrid
- DOE ASCI Grid
- DOE Earth Systems Grid
- DARPA CoABS Grid
- NEESGrid
- DOH BIRN
- NSF iVDGL
- Grid2003
- ...
- DataGrid (CERN, ...)
- EuroGrid (UNICORE)
- DataTAG (CERN, ...)
- Astrophysical Virtual Observatory
- GRIP (Globus/UNICORE)
- GRIA (industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (infrastructure components)
- EGSO (solar physics)
- EGEE
- ...
24 Middleware projects relevant for HEP
- EDG
- European DataGrid (EU project)
- EGEE
- Enabling Grids for E-science in Europe (EU project)
- Grid2003
- joint project of the U.S. Grid projects iVDGL, GriPhyN and PPDG, and the U.S. participants in the LHC experiments ATLAS and CMS
25 (No transcript)
26 (No transcript)
27 (No transcript)
28 LCG hierarchical information service
29 Replica management
30 (No transcript)
31 (No transcript)
32 (No transcript)
33 (No transcript)
34 Job submission steps (1)
35 Job submission steps (2)
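As a hedged illustration of the job submission steps on an EDG/LCG-2 style infrastructure, a minimal sketch of a JDL (Job Description Language) file; the script name, sandbox contents, requirement and rank expression are typical examples, not taken from the slides.

    // job.jdl -- hypothetical EDG/LCG Job Description Language file
    // (ClassAd syntax, interpreted by the Resource Broker).
    Executable    = "analyze.sh";
    Arguments     = "run42";
    StdOutput     = "std.out";
    StdError      = "std.err";
    // Small files shipped with the job and retrieved afterwards.
    InputSandbox  = {"analyze.sh"};
    OutputSandbox = {"std.out", "std.err"};
    // Constraint and preference used by the Resource Broker for matchmaking.
    Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
    Rank          = -other.GlueCEStateEstimatedResponseTime;

From a User Interface machine the job would then be handled with edg-job-submit job.jdl, tracked with edg-job-status, and collected with edg-job-get-output.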
36 Portals
- Why a portal?
- It can be accessed from everywhere and by everything (desktop, laptop, PDA, phone)
- It can keep the same user interface independently of the underlying middleware
- It must be redundantly secure at all levels
- secure for web transactions,
- secure for user credentials,
- secure for user authentication,
- secure at VO level
- All available grid services must be incorporated in a logical way, just one mouse click away
- Its layout must be easily understandable and user friendly
37 (No transcript)
38 (No transcript)
39 Computing facilities (1)
- Computing facilities (testbeds or production infrastructures) are made up of one or more nodes. Each node (computer center or cluster of resources) contains a certain number of components, which may play different roles. Some are site specific
- Computing Element: receives job requests and delivers them to the Worker Nodes, which will perform the real work. The Computing Element provides an interface to the local batch queuing systems. A Computing Element can manage one or more Worker Nodes
- Worker Node: the machine that will actually process data. Typically managed via a local batch system. A Worker Node can also be installed on the same machine as the Computing Element
- Storage Element: provides storage space to the facility. The Storage Element may control large disk arrays, mass storage systems and the like; however, the SE interface hides the differences between these systems, allowing uniform user access
- User Interface: the machine that allows users to access the facility. This is typically the machine the end-user logs into to submit jobs to the grid and to retrieve the output from those jobs
40 Computing facilities (2)
- Some other roles are shared by groups of users or by the whole grid
- Resource Broker: receives users' requests and queries the Information Index to find suitable resources
- Information Index: resides on the same machine as the Resource Broker; keeps information about the available resources
- Replica Manager: coordinates file replication from one Storage Element to another. Useful for data redundancy, but also to move data closer to the machines which will perform the computation
- Replica Catalog: can reside on the same machine as the Replica Manager; keeps information about file replicas. A logical file can be associated with one or more physical files which are replicas of the same data. Thus a logical file name can refer to one or more physical file names (a command-line sketch follows below)
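As a hedged illustration of how these replica services are driven on LCG-2, here are the lcg-utils data management commands; the VO, hostnames and file names are hypothetical, and exact options varied between releases.

    # Copy a local file to a Storage Element and register it under a
    # logical file name (lfn) in the Replica Catalog.
    lcg-cr --vo atlas -d se.example.org -l lfn:run42.dat file:/tmp/run42.dat

    # Replicate it to a second Storage Element, closer to the Worker Nodes.
    lcg-rep --vo atlas -d se2.example.org lfn:run42.dat

    # List all physical replicas registered for the logical file.
    lcg-lr --vo atlas lfn:run42.dat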
41 Computing facilities relevant for HEP
- EDG
- Testbed
- LCG
- Production infrastructure
- EGEE
- Production infrastructure
- Grid3
- Production infrastructure operated jointly by the
U.S. Grid projects iVDGL, GriPhyN and PPDG, and
the U.S. participants in the LHC experiments
ATLAS and CMS.
42 (No transcript)
43 (No transcript)
44 LCG hybrid architecture
(Figure: multi-tier hierarchy combined with Grids)
45 (No transcript)
46 EGEE Timeline
- May 2003: proposal submitted
- July 2003: proposal accepted
- April 2004: project start
47 Grid3 infrastructure
48 Virtual Organizations (User Communities)
(Figure after I. Foster)
49 Multi-VO and one Grid
(Figure: one Grid of shared resources and services, used by several VOs)
50 One VO and multi-Grid
(Figure: the ATLAS Production System spanning several Grids)
51 Multi-VO and multi-Grid
(Figure: several Grids, each with shared resources and services plus VO services and private resources)
52 HEP Requirements
- User requirements
- Concerning services, the HEP community has already done a lot of work within EDG and LCG. The basic requirements have already been specified as use cases for HEP data processing (HEPCAL report, May 2002). Using the HEPCAL document to provide templates for requirements analysis, the EDG/AWG (Application Working Group) aims at defining requirements for a high-level common application layer based on the needs of HEP, bio-medicine and Earth sciences. High-level APIs for Grid services have also been defined by the EU-funded project GridLab
- Concerning resources, the production service must provide a continuous, stable, robust environment and controlled, reliable access to the resources. The agreed sharing policies must be fully implemented and easily changeable
- Besides implementing the user requirements, practical help should be given in interfacing the experiment applications to grid services, and in evaluating the performance of the software deployed within the production environment, as well as in pre-production testbeds
53 Security
- Security policy
- The security organizational model, so far often tailored to the needs and characteristics of homogeneous communities, should in the future be based on the service needs of many heterogeneous VOs, introducing a new complexity into the Grid organizational and security model
- CA policy
- A European Grid Policy Management Authority is a prerequisite for running a Grid infrastructure both in Europe and worldwide. The Grid Security Infrastructure relies on trusted Certification Authorities (CAs). It is therefore essential that a network of CAs, based on a commonly agreed set of requirements, is established and maintained in Europe
54 VO management
- As more and more communities join common production infrastructures, VO management is becoming crucial
- Current technology offers support for rather static and large communities. The assignment of access rights is separated into two parts: local resource administrators grant rights to the VO as a whole, while VO administrators grant them to individual members of the community
- In the future there will be a need for small (even only two people), short-lived (of the order of a few days) and unforeseen (dynamically discovered) VOs. The goal would be to provide a very fine-grained authorization and access control mechanism, where applicable based on global standards
55 Resource allocation and usage
- Resource allocation and reservation
- In order to meet the needs of all the different Grid users, mechanisms will be required to control and balance usage of the resources (including networks) by highly demanding applications, and to categorise and prioritise jobs so that they can receive the required level of service
- In particular, users should be able to allocate resources both immediately and in advance. Allocations must be restricted to authenticated users acting within authorized roles, the services available must be determined by policies agreed with the user organisations, and the aggregate services made available to VOs must be monitored to ensure adherence to the agreements
- Resource usage and accounting
- A major issue is the control of usage of resources, once access to them has been established. This includes interfaces to traditional usage control mechanisms such as quotas and limits, and also the extraction and recording of usage for budgeting, accounting and auditing purposes
- Usage quotas may be owned either by individuals or by VOs, and specified in both site-specific and Grid-wide protocols. This will include the ability to allow enforcement of quotas across a set of distributed resources
56 Organizational issues
- The need for resource sharing gives rise to a set of organisational issues to be faced, analysed and solved. Indeed, when a given organisation makes its own resources available on line:
- Each organisation has its own decision-making and management independence: sharing resources with other organisations should not jeopardize such independence
- Each organisation has its own access policies. It is not true that everybody in the Grid can use everything, but it is true that new generations of network and grid technologies make it possible to define new sharing models. Each organisation should be able to decide, for each individual data set and each individual resource, which organisations have the access/use right
- Each organisation has its own security policies: university security policies are usually completely different from those of a physics laboratory that works in close co-operation with government and the army. In order to guarantee real resource sharing among different kinds of organisations, it is necessary to ensure the maximum level of flexibility in the management of the above mentioned issues
57 Requirements in LCG
- Requirements are set by the experiments in the SC2
- Requirements and Technical Assessment Groups (RTAGs)
- On applications
- data persistency
- software support process
- mathematical libraries
- detector geometry description
- Monte Carlo generators
- applications architectural blueprint
- detector simulation
- On Fabrics
- mass storage requirements
- On Grid technology and deployment area
- Grid technology use cases
- Regional Center categorization
58 HEPCAL
LCG RTAG: Common Use Cases for a HEP Common Application Layer. Requirements are given as a set of use cases, free of implementation details
- GENERAL USE CASES
- Obtain Grid Authorization
- Revoke Grid Authorization
- Grid Login
- Browse Grid Resources
59 HEPCAL
- DATA MANAGEMENT USE CASES
- Data Set (DS) Metadata Update
- DS Metadata Access
- DS Registration
- Virtual DS Declaration
- Virtual DS Materialization
- DS Upload
- Catalogue Creation
- DS Access
- DS transfer to non-Grid storage
- DS Replica Upload
- DS Access Cost Evaluation
- DS Replication
- Physical DS Instance Deletion
- DS Deletion
- Catalogue Deletion
- Read from Remote DS
- DS Verification
- DS Browsing
60 HEPCAL
- JOB MANAGEMENT USE CASES
- Job Catalogue Update
- Job Catalogue Query
- Job Submission
- Job Output Access or Retrieval
- Job Error Recovery
- Job Control
- Steer Job Submission
- Job Resource Estimation
- Job Environment Modification
- Job Splitting
- Production Job
- Analysis
- DS Transformation
- Job Monitoring
- Conditions Publishing
- Software Publishing
- Simulation Job
- Expt Software Dev for Grid
61 HEPCAL
- VO MANAGEMENT USE CASES
- Configuring the VO
- Configuring the DS metadata catalogue (either initially or reconfiguring)
- Configuring the job catalogue (either initially or reconfiguring)
- Configuring the user profile (if this is possible at all on a VO basis)
- Adding or removing VO elements, e.g. computing elements, storage elements, etc.
- Configuring VO elements, including quotas, privileges, etc.
- Managing the users
- Add and remove users to/from the VO
- Modify the user information (privileges, quotas, priorities), either for single users or for subgroups of users within a VO
- VO-wide resource reservation
- The Grid should provide a tool to estimate the time-to-completion, given as input an estimate of the resources needed by the job. This is needed in particular to estimate the access cost
- There should be use cases for releasing reserved resources, and system use cases for what to do in case a user does not submit a job for which resources are reserved
- VO-wide resource allocation to users or groups of users of a VO
- Software (or condition set) publishing, i.e. making it available on the Grid
62 Database Systems
63 Database Systems
CERN IT DB Group
- Database
- one or more large, structured sets of persistent data, usually associated with software to update and query the data. A simple database might be a single file containing many records, each of which contains the same set of fields, where each field has a certain fixed width. A database is one component of a database management system
- Database Management System (DBMS)
- a set of programs (functions) that makes it possible to manage the large, structured sets of persistent data which make up the database, and to provide access to the data for multiple, concurrent users whilst maintaining the integrity of the data. The DBMS is in charge of all the functionalities related to database access, security, storage, ...
64 Database Management Systems
CERN IT DB Group
- A DBMS provides
- security facilities to prevent unauthorized users from accessing the system, using names and passwords to identify operators, programs and individual machines, and sets of privileges assigned to them; these privileges can include the ability to read, write and update data in the database
- lock facilities to maintain data integrity: locks are used for reads and writes of chunks of data; by doing this, only one user at a time can alter data, or users can be prevented from accessing data being changed. These requirements are referred to as ACID (Atomicity, Consistency, Isolation and Durability); a SQL sketch follows after this list
- Atomicity: all the parts of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. This ensures that there is no erroneous data in the system, or data which does not correspond to other data as it should
- Consistency: the database is transformed from one valid state to another valid state. A transaction is legal only if it obeys user-defined integrity constraints. Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied, the transaction is rolled back to its previously valid state and the user is informed that the transaction has failed
- Isolation: the results of a transaction are invisible to other transactions until the transaction is complete
- Durability: once a transaction has been committed (completed), its results are permanent and survive future system and media failures
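A minimal SQL sketch of an atomic transaction, with a hypothetical accounts table: either both updates take effect or, on failure, neither does.

    -- Hypothetical money transfer; ACID guarantees all-or-nothing.
    BEGIN;  -- START TRANSACTION in some SQL dialects

    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;

    -- If an integrity constraint (e.g., balance >= 0) were violated,
    -- the DBMS would roll the whole transaction back instead:
    -- ROLLBACK;
    COMMIT;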
65 Database Systems
CERN IT DB Group
- Databases are based on many different models, each of which is designed with a specific problem, industry or set of functions in mind. Here we attempt to look at the main types in some depth
- Relational databases: data are structured in a series of tables, which have columns representing the variables and rows that contain specific instances of data. Currently the most widespread model
- Object-oriented databases: information is stored as persistent objects, not as rows in tables. The user defines objects and the operations which can be executed on them
- Object-relational databases: relational systems to which object-oriented functions are added. They allow data to be manipulated in the form of objects, as well as providing the traditional relational interface
- Distributed databases: data are stored on two or more computers, called nodes, connected over a network across a country, continent or planet
- Multimedia databases: a model for storing several different types of file, i.e. text, audio, video and images, in a single database
- Network databases: organize data in a network of linked records. A very early form of database, fast but not very adaptable, which is little used at present
- Hierarchical databases: data are stored as records, linked with parent-child relationships. Mostly used in the past on mainframes
66 Relational Database Systems
CERN IT DB Group
- The relational model is one of the oldest models used for creating a database, and the one used by the majority of businesses today. It was first outlined in a paper published by Ted Codd in 1970. The relational model is based on set theory and predicate logic
- set theory allows data to be structured in a series of tables, which have columns representing the variables and rows that contain specific instances of data. These tables are organized using normalization, a process (derived from normal forms theory) of reducing the occurrence of repeated data by breaking it into smaller pieces and creating new tables (e.g., the personal data of a customer)
- predicate logic is the basis of the query language, i.e. the set of commands that makes it possible to insert, retrieve, modify or delete data according to some specified criteria. Data can also be virtually or effectively joined into new tables (a small sketch follows below)
- The current standard for relational databases is set out in the Structured Query Language (SQL). Version 2 of the language is currently in use, with Version 3 expected to be released in the near future by the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI)
- The most widely used relational database systems are produced by Oracle Corporation, Microsoft, Sybase and IBM, but there is a large number of other RDBMSs, designed either as general systems or for specific applications, used in HEP, like MySQL and PostgreSQL
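A small SQL sketch of the two ideas above, with hypothetical tables: customer data is normalized into its own table, and a query (a predicate over rows) joins it back to the orders.

    -- Normalization: customer details live in one table...
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(80),
        city VARCHAR(80)
    );

    -- ...and each order references a customer instead of repeating its data.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      NUMERIC(10,2)
    );

    -- Predicate logic in action: WHERE selects the rows satisfying the
    -- condition, and the JOIN virtually recombines the two tables.
    SELECT c.name, o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.city = 'Geneva' AND o.amount > 100;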
67 Object Oriented Database Systems
CERN IT DB Group
- ODBMSs were introduced to overcome many restrictions imposed by the relational model on certain types of data (mainly in the case of huge amounts or complex structures). Their main advantage is the degree of low-level control of the system they allow the programmer. This gives the programmer control of how the data is to be stored and manipulated
- information is stored as persistent objects (not as rows in tables). This makes it more efficient in terms of storage space requirements and ensures that users can only manipulate data in the ways the programmer has specified. It also saves on the disk space needed for queries: instead of having to allocate resources for the results, the space required is already there in the objects themselves
- Because of the specific low-level methods used in an ODBMS, it is very difficult for third parties to produce add-on products. Whilst relational databases can benefit from software produced by other vendors, users of ODBMSs have to produce additional software in house, by contracting other firms, or in collaboration with other organizations using the same system
- The first commercially available object-oriented DBMS appeared in the mid-1980s. By the early 1990s there was a range of ODBMSs available from a variety of vendors. Objectivity/DB is the most widely used in the HEP community
68 Distributed Database Systems
CERN IT DB Group
- Distributed databases have the common characteristic that they are stored on two or more computers, called nodes, connected over a network. They are classified as homogeneous or heterogeneous
- homogeneous databases use the same DBMS software and have the same applications on each node. They have a common schema (a file specifying the structure of the database), and can have varying degrees of local autonomy. They can be based on any DBMS which supports this function, but it is not possible to have more than one DBMS type in the system. To be efficient, they need very large network connections and a lot of processing power
- heterogeneous databases have a very high degree of local autonomy. Each node in the system has its own local users, applications and data, deals with them itself, and only connects to other nodes for information it does not have. This type of distributed database is often just called a federated system or a federation. It is becoming more popular with organizations, for its scalability, for the reduced cost of being able to add extra nodes when necessary, and for the ability to mix software packages. Unlike homogeneous systems, heterogeneous systems can include different database management systems. This makes them appealing to organizations, since they can incorporate legacy systems and data into new systems
69 Beyond standard Database Systems
70 (No transcript)
71 (No transcript)
72 (No transcript)
73 Distributed Analysis
74 Distributed Analysis
- Within LCG a working group with representatives from all LHC experiments is working on a blueprint architecture for grid services: ARDA (A Roadmap to Distributed Analysis). This will serve as a first input to the EGEE architecture team. The HEPCAL work is continuing in the framework of the LCG/GAG (Grid Applications Group), developing use cases and requirements for the analysis of physics data. This will also give important input to architecture and design work
- GAG reports
- HEPCAL
- Systematic descriptions of HEP Grid use cases
- CERN-LCG-2002-020 (29 May 2002) lcg.web.cern.ch/LCG/sc2/RTAG4/finalreport.doc
- HEPCAL-prime: cern.ch/fca/HEPCAL-prime.doc
- HEPCAL II
- Analysis use cases
- CERN-LCG-2003-032 (29 October 2003) lcg.web.cern.ch/LCG/SC2/GAG/HEPCAL-II.doc
75 ARDA working group mandate
- To review the current distributed analysis activities and to capture their architectures in a consistent way
- To confront these existing projects with the HEPCAL II use cases and the users' potential work environments, in order to explore potential shortcomings
- To consider the interfaces between Grid, LCG and experiment-specific services
- review the functionality of experiment-specific packages, their state of advancement and their role in the experiment
- identify similar functionalities in the different packages
- identify functionalities and components that could be integrated into the generic Grid middleware
- To confront the current projects with critical Grid areas
- To develop a roadmap specifying, wherever possible, the architecture, the components and potential sources of deliverables to guide the medium-term (2 year) work of the LCG and the distributed analysis planning in the experiments
76 ARDA Architecture
77 SEAL Overview
Shared Environment for Applications at LHC
- SEAL aims to
- provide the software infrastructure, basic frameworks, libraries and tools that are common among the LHC experiments
- select, integrate, develop and support foundation and utility class libraries
- develop a coherent set of basic framework services to facilitate the integration of LCG and non-LCG software
- The scope of the SEAL project is basically the scope of the LCG Applications Area
78 PROOF (Parallel ROOT Facility)
- A collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
- Part of, and based on, the ROOT framework
- Makes heavy use of ROOT networking and other infrastructure classes
- Currently no external technologies
- The PROOF system allows
- parallel analysis of trees in a set of files
- parallel analysis of objects in a set of files
- parallel execution of scripts
- on a cluster of heterogeneous machines (a session sketch follows below)
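A hedged sketch of a PROOF session as a ROOT macro, following the usage pattern of that period; the master host, tree name, file URLs and selector script are hypothetical.

    {
       // Connect to the PROOF master, which fans the work out to its slaves.
       gROOT->Proof("proofmaster.example.org");

       // Describe the data set: a set of files, each containing tree "h42".
       TDSet *set = new TDSet("TTree", "h42");
       set->Add("root://se.example.org//data/run1.root");
       set->Add("root://se.example.org//data/run2.root");

       // Process the trees in parallel with a TSelector-based script;
       // PROOF merges the partial results from the slaves transparently.
       set->Process("ana.C");
    }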
79 (No transcript)
80 (No transcript)
81 (No transcript)
82 Useful links
- Projects
- EDG (European DataGrid): http://eu-datagrid.web.cern.ch/eu-datagrid/
- GGF (Global Grid Forum): http://www.gridforum.org/
- Globus: http://www.globus.org/
- LCG (LHC Computing Grid): http://lcg.web.cern.ch/LCG/
- POOL (Pool Of persistent Objects for LHC): http://pool.cern.ch/