Title: Distributed Computing and Analysis
1 Distributed Computing and Analysis
- Lamberto Luminari
- Italo Hellenic School of Physics 2004
- Martignano - May 20, 2004
2 Outline
- Introduction
- General remarks
- Distributed computing
- Principles
- Projects
- Computing facilities: testbeds and production infrastructures
- Database Systems
- Principles
- Distributed analysis
- Requirements and issues
3 General remarks
- Schematic approach
- For the purpose of clarity, differences among possible alternatives are stressed; in reality, solutions are often a mix or a compromise
- Only the main features of relevant items are described; no claim of being exhaustive
- HEP (LHC) oriented presentation
- Examples are mainly taken from the HEP world
- Projects with HEP community involvement are preferred
- Options chosen by LHC
4 Distributed Computing
5 Distributed computing
- What is it?
- processing of data and objects across a network of connected systems
- a hardware and software infrastructure that provides pervasive (and inexpensive) access to computational capabilities
- A long story
- mainframes: more and more expensive
- cluster technology
- RISC machines: very powerful
- What makes it appealing now?
- CPU power!
- Storage capacity!!
- Network bandwidth!!!
- ... but distributed computing is not a choice, rather a necessity or an opportunity
6 Network performance
7 Advantages of distributed computing
- Scalability and flexibility
- in principle, distributed computing systems are infinitely scalable: simply add more units and get more computing power. Moreover, you can add or remove specific resources and adapt the system to your needs
- Efficiency
- private resources are usually poorly used; pooling them greatly increases their exploitation
- Reliability
- the failure of a component affects the overall performance only slightly
- Load balancing and averaging
- distributing tasks according to the availability of resources optimizes the behavior of the whole system and minimizes the execution time
- load peaks arising from different user communities rarely add up, so the use of resources is averaged (and optimized) over long periods
8 Disadvantages of distributed computing
- Difficult integration and coordination
- many heterogeneous computing systems have to be integrated
- data sets are split over different storage systems
- many users have to cooperate and share resources
- Unpredictability
- the quantity of available resources may fluctuate widely
- computing units may become unavailable or unreachable suddenly and for long periods, making the completion time of the tasks running there unpredictable
- Security problems
- distributed systems are prone to intrusion
9 Applications and distributed computing
- Suitable
- high compute-to-data ratio
- batch processes
- loosely coupled tasks
- statistical evaluations dependent on random trials
- data mining through distributed filesystems or databases
- Unsuitable
- real-time processes
- interactive processes
- strongly coupled tasks
- sequential tasks
10 Distributed computing architectures
- Peer-to-peer
- flat organization of components with similar functionalities, talking to each other
- suitable for
- independent tasks or tasks with little inter-task communication
- access to sparse data organized in a non-hierarchical way
- Client-server
- components with different functionalities and roles
- processing unit (client): provided with a lightweight agent able to perform simple operations: detect the system status and notify it to the server, ask (or wait) for tasks, accept and send data, execute processes according to priorities or in spare cycles, ...
- dedicated unit (server): provided with complex software able to take or send computing requests, monitor the status of the jobs sent to the clients, receive the results and assemble them, possibly in a database. It also takes care of security and access policy, and stores statistics and accounting data
- suitable for
- complex architectures and tasks
11 Multi-tier computing systems
- Components with different levels of service, arranged in tiers
- computing centers (multi-processors, PC farms, data storage systems)
- clusters of dedicated machines
- individual, general-use PCs
- Different functionalities for each tier
- amount of CPU power installed and data stored
- quality and schedule of user support
- level of reliability and security
12 (No transcript)
13 Distributed computing models
- Clusters
- groups of homogeneous, tightly coupled components, sharing file systems and peripheral devices (e.g., Beowulf)
- Pools of desktop PCs
- loosely interconnected private machines (e.g., Condor)
- Grids
- heterogeneous systems of (mainly dedicated) resources (e.g., LCG)
14 Comparison of computing models
15 Condor
- Condor is a specialized workload management system for compute-intensive jobs. It provides
- a job queueing mechanism
- a scheduling policy
- a priority scheme
- resource monitoring
- resource management
- Users submit their serial or parallel jobs to Condor, which places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion
- Unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. Condor is able to transparently produce a checkpoint and migrate a job to a different machine
- Condor does not require a shared file system across machines: if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine (a submit-file sketch follows below)
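To make the submission workflow concrete, here is a minimal sketch of a Condor submit description file; the executable, file names and transfer settings are hypothetical examples, not taken from the slides.

    # Minimal Condor submit description file (hypothetical job and file names).
    # The vanilla universe runs unmodified binaries; the standard universe
    # would additionally enable Condor's transparent checkpoint/migration.
    universe   = vanilla
    executable = analyze
    arguments  = run42.dat
    # No shared file system is assumed: ship input and fetch output explicitly.
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = run42.dat
    output = analyze.out
    error  = analyze.err
    log    = analyze.log
    queue

Saved as job.sub, this would be submitted with condor_submit job.sub and monitored with condor_q.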
16 (Figure: resources, data, network)
17 Distributed computing environment
- DCE standards
- A distributed computing network may include many different systems. The Distributed Computing Environment (DCE), formulated by The Open Group, formalizes the technologies needed to make the components communicate with each other, such as remote procedure calls and middleware. DCE runs on all major computing platforms and is designed to support distributed applications in heterogeneous hardware and software environments
- DCE provides a complete infrastructure, with services, interfaces, protocols and encoding rules for
- authentication and security (Kerberos, public key certificates)
- object interoperability across different platforms (CORBA, the Common Object Request Broker Architecture)
- directories (with global names and cell names) for distributed resources
- time services (including synchronization)
- distributed file systems
- remote procedure calls (a minimal interface sketch follows after this list)
- Internet/intranet communications
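As an illustration of the remote procedure call service listed above, a minimal sketch of a DCE-style IDL interface definition; the UUID and the operation are invented for the example.

    /* Hypothetical DCE IDL interface: a client calls add() as if it were
       a local function, while the DCE runtime handles marshalling,
       transport, authentication and server lookup via the directory. */
    [
      uuid(3f2504e0-4f89-11d3-9a0c-0305e82c3301),
      version(1.0)
    ]
    interface arithmetic
    {
        long add([in] long a, [in] long b);
    }

An IDL compiler generates the client and server stubs from such a definition.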
18 Grid computing specifications
- The Global Grid Forum (GGF) is the primary organization whose purpose is to define specifications for Grid computing. It is a forum for information exchange and collaboration among people who are
- doing Grid research,
- designing and building Grid software,
- deploying Grids,
- using Grids,
- spanning technology areas: scheduling, data handling, security, ...
- The Globus Toolkit (developed at Argonne National Laboratory and the University of Southern California) is an implementation of these standards, and has become a de facto standard for grid middleware because of some attractive features
- an object-oriented approach, which allows developers of specific applications to take just what meets their needs, to introduce tools one at a time, and to make programs increasingly "Grid-enabled"
- the toolkit software is open source: this allows developers to freely make and add improvements
19 Globus Toolkit
- Practically all major Grid projects are being built on protocols and services provided by the Globus Toolkit, a software "work in progress" developed by the Globus Alliance, which involves primarily Ian Foster's team at Argonne National Laboratory and Carl Kesselman's team at the University of Southern California in Los Angeles
- The toolkit provides a set of software tools to implement the basic services and capabilities required to construct a computational Grid, such as security, resource location, resource management, and communications. Its main components are listed on the next slide
20 Globus Toolkit
- GRAM (Globus Resource Allocation Manager): converts a request for resources into commands that local computers can understand
- GSI (Grid Security Infrastructure): provides authentication of the user and works out that person's access rights
- MDS (Monitoring and Discovery Service): collects information about resources (processing capacity, bandwidth capacity, type of storage, etc.)
- GRIS (Grid Resource Information Service): queries resources for their current configuration, capabilities, and status
- GIIS (Grid Index Information Service): coordinates arbitrary GRIS services
- GridFTP: provides a high-performance, secure and robust data transfer mechanism
- Replica Catalog: a catalog that allows other Globus tools to look up where on the Grid other replicas of a given dataset can be found
- Replica Management system: ties together the Replica Catalog and GridFTP technologies, allowing applications to create and manage replicas of large datasets (a command-line sketch follows below)
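A hedged sketch of how these components are typically exercised from the command line with the Globus Toolkit 2 tools; the hostnames and file names are hypothetical.

    # Authenticate once with GSI: creates a short-lived proxy certificate
    # from the user's long-lived credential.
    grid-proxy-init

    # Run a simple job on a remote resource through GRAM.
    globus-job-run ce.example.org /bin/hostname

    # High-performance, GSI-authenticated file transfer with GridFTP.
    globus-url-copy file:///tmp/run42.dat gsiftp://se.example.org/data/run42.dat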
21 OGSA: the future?
22 Grid projects
- ... and many others!
23 Grid projects
- UK: GridPP
- Netherlands: DutchGrid
- Germany: UNICORE, Grid project
- France: Grid funding approved
- Italy: INFN Grid
- Eire: Grid project
- Switzerland: Network/Grid project
- Hungary: DemoGrid
- Norway, Sweden: NorduGrid
- NASA Information Power Grid
- DOE Science Grid
- NSF National Virtual Observatory
- NSF GriPhyN
- DOE Particle Physics Data Grid
- NSF TeraGrid
- DOE ASCI Grid
- DOE Earth Systems Grid
- DARPA CoABS Grid
- NEESGrid
- DOH BIRN
- NSF iVDGL
- Grid2003
- ...
- DataGrid (CERN, ...)
- EuroGrid (UNICORE)
- DataTAG (CERN, ...)
- Astrophysical Virtual Observatory
- GRIP (Globus/UNICORE)
- GRIA (industrial applications)
- GridLab (Cactus Toolkit)
- CrossGrid (infrastructure components)
- EGSO (solar physics)
- EGEE
- ...
24 Middleware projects relevant for HEP
- EDG
- European DataGrid (EU project)
- EGEE
- Enabling Grids for E-science in Europe (EU project)
- Grid2003
- joint project of the U.S. Grid projects iVDGL, GriPhyN and PPDG, and the U.S. participants in the LHC experiments ATLAS and CMS
25 (No transcript)
26 (No transcript)
27 (No transcript)
28 LCG hierarchical information service
29 Replica management
30 (No transcript)
31 (No transcript)
32 (No transcript)
33 (No transcript)
34 Job submission steps (1)
35 Job submission steps (2)
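As a hedged illustration of the job submission steps on an EDG/LCG-2 style infrastructure, a minimal sketch of a JDL (Job Description Language) file; the script name, sandbox contents, requirement and rank expression are typical examples, not taken from the slides.

    // job.jdl -- hypothetical EDG/LCG Job Description Language file
    // (ClassAd syntax, interpreted by the Resource Broker).
    Executable    = "analyze.sh";
    Arguments     = "run42";
    StdOutput     = "std.out";
    StdError      = "std.err";
    // Small files shipped with the job and retrieved afterwards.
    InputSandbox  = {"analyze.sh"};
    OutputSandbox = {"std.out", "std.err"};
    // Constraint and preference used by the Resource Broker for matchmaking.
    Requirements  = other.GlueCEPolicyMaxCPUTime > 720;
    Rank          = -other.GlueCEStateEstimatedResponseTime;

From a User Interface machine the job would then be handled with edg-job-submit job.jdl, tracked with edg-job-status, and collected with edg-job-get-output.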
36 Portals
- Why a portal?
- It can be accessed from everywhere and by everything (desktop, laptop, PDA, phone)
- It can keep the same user interface independently of the underlying middleware
- It must be redundantly secure at all levels
- secure for web transactions,
- secure for user credentials,
- secure for user authentication,
- secure at VO level
- All available grid services must be incorporated in a logical way, just one mouse click away
- Its layout must be easily understandable and user friendly
37 (No transcript)
38 (No transcript)
39 Computing facilities (1)
- Computing facilities (testbeds or production infrastructures) are made up of one or more nodes. Each node (computer center or cluster of resources) contains a certain number of components, which may play different roles. Some are site specific
- Computing Element: receives job requests and delivers them to the Worker Nodes, which will perform the real work. The Computing Element provides an interface to the local batch queuing systems. A Computing Element can manage one or more Worker Nodes
- Worker Node: the machine that will actually process data. Typically managed via a local batch system. A Worker Node can also be installed on the same machine as the Computing Element
- Storage Element: provides storage space to the facility. The Storage Element may control large disk arrays, mass storage systems and the like; however, the SE interface hides the differences between these systems, allowing uniform user access
- User Interface: the machine that allows users to access the facility. This is typically the machine the end-user logs into to submit jobs to the grid and to retrieve the output from those jobs
40 Computing facilities (2)
- Some other roles are shared by groups of users or by the whole grid
- Resource Broker: receives users' requests and queries the Information Index to find suitable resources
- Information Index: resides on the same machine as the Resource Broker; keeps information about the available resources
- Replica Manager: coordinates file replication from one Storage Element to another. Useful for data redundancy, but also to move data closer to the machines which will perform the computation
- Replica Catalog: can reside on the same machine as the Replica Manager; keeps information about file replicas. A logical file can be associated with one or more physical files which are replicas of the same data. Thus a logical file name can refer to one or more physical file names (a command-line sketch follows below)
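As a hedged illustration of how these replica services are driven on LCG-2, here are the lcg-utils data management commands; the VO, hostnames and file names are hypothetical, and exact options varied between releases.

    # Copy a local file to a Storage Element and register it under a
    # logical file name (lfn) in the Replica Catalog.
    lcg-cr --vo atlas -d se.example.org -l lfn:run42.dat file:/tmp/run42.dat

    # Replicate it to a second Storage Element, closer to the Worker Nodes.
    lcg-rep --vo atlas -d se2.example.org lfn:run42.dat

    # List all physical replicas registered for the logical file.
    lcg-lr --vo atlas lfn:run42.dat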
41 Computing facilities relevant for HEP
- EDG
- Testbed
- LCG
- Production infrastructure
- EGEE
- Production infrastructure
- Grid3
- Production infrastructure operated jointly by the
U.S. Grid projects iVDGL, GriPhyN and PPDG, and
the U.S. participants in the LHC experiments
ATLAS and CMS.
42 (No transcript)
43 (No transcript)
44 LCG hybrid architecture
(Figure: multi-tier hierarchy combined with Grids)
45 (No transcript)
46 EGEE Timeline
- May 2003: proposal submitted
- July 2003: proposal accepted
- April 2004: project start
47 Grid3 infrastructure
48 Virtual Organizations (User Communities)
(Figure after I. Foster)
49 Multi-VO and one Grid
(Figure: one Grid of shared resources and services, used by several VOs)
50 One VO and multi-Grid
(Figure: the ATLAS Production System spanning several Grids)
51 Multi-VO and multi-Grid
(Figure: several Grids, each with shared resources and services plus VO services and private resources)
52 HEP Requirements
- User requirements
- Concerning services, the HEP community has already done a lot of work within EDG and LCG. The basic requirements have already been specified as use cases for HEP data processing (HEPCAL report, May 2002). Using the HEPCAL document to provide templates for requirements analysis, the EDG/AWG (Application Working Group) aims at defining requirements for a high-level common application layer based on the needs of HEP, bio-medicine and Earth sciences. High-level APIs for Grid services have also been defined by the EU-funded project GridLab
- Concerning resources, the production service must provide a continuous, stable, robust environment and controlled, reliable access to the resources. The agreed sharing policies must be fully implemented and easily changeable
- Besides implementing the user requirements, practical help should be given in interfacing the experiment applications to grid services, and in evaluating the performance of the software deployed within the production environment, as well as in pre-production testbeds
53 Security
- Security policy
- The security organizational model, so far often tailored to the needs and characteristics of homogeneous communities, should in the future be based on the service needs of many heterogeneous VOs, introducing a new complexity into the Grid organizational and security model
- CA policy
- A European Grid Policy Management Authority is a prerequisite for running a Grid infrastructure both in Europe and worldwide. The Grid Security Infrastructure relies on trusted Certification Authorities (CAs). It is therefore essential that a network of CAs, based on a commonly agreed set of requirements, is established and maintained in Europe
54 VO management
- As more and more communities join common production infrastructures, VO management is becoming crucial
- Current technology offers support for rather static and large communities. The assignment of access rights is separated into two parts: local resource administrators grant rights to the VO as a whole, while VO administrators grant them to individual members of the community
- In the future there will be a need for small (even only two people), short-lived (of the order of a few days) and unforeseen (dynamically discovered) VOs. The goal would be to provide a very fine-grained authorization and access control mechanism, where applicable based on global standards
55 Resource allocation and usage
- Resource allocation and reservation
- In order to meet the needs of all the different Grid users, mechanisms will be required to control and balance usage of the resources (including networks) by highly demanding applications, and to categorise and prioritise jobs so that they can receive the required level of service
- In particular, users should be able to allocate resources both immediately and in advance. Allocations must be restricted to authenticated users acting within authorized roles, the services available must be determined by policies agreed with the user organisations, and the aggregate services made available to VOs must be monitored to ensure adherence to the agreements
- Resource usage and accounting
- A major issue is the control of usage of resources, once access to them has been established. This includes interfaces to traditional usage control mechanisms such as quotas and limits, and also the extraction and recording of usage for budgeting, accounting and auditing purposes
- Usage quotas may be owned either by individuals or by VOs, and specified in both site-specific and Grid-wide protocols. This will include the ability to allow enforcement of quotas across a set of distributed resources
56 Organizational issues
- The need for resource sharing gives rise to a set of organisational issues to be faced, analysed and solved. Indeed, when a given organisation makes its own resources available on line:
- Each organisation has its own decision-making and management independence: sharing resources with other organisations should not jeopardize such independence
- Each organisation has its own access policies. It is not true that everybody in the Grid can use everything, but it is true that new generations of network and grid technologies make it possible to define new sharing models. Each organisation should be able to decide, for each individual data set and each individual resource, which organisations have the access/use right
- Each organisation has its own security policies: university security policies are usually completely different from those of a physics laboratory that works in close co-operation with government and the army. In order to guarantee real resource sharing among different kinds of organisations, it is necessary to ensure the maximum level of flexibility in the management of the above mentioned issues
57 Requirements in LCG
- Requirements are set by the experiments in the SC2
- Requirements and Technical Assessment Groups (RTAGs)
- On applications
- data persistency
- software support process
- mathematical libraries
- detector geometry description
- Monte Carlo generators
- applications architectural blueprint
- detector simulation
- On Fabrics
- mass storage requirements
- On Grid technology and deployment area
- Grid technology use cases
- Regional Center categorization
58 HEPCAL
LCG RTAG: Common Use Cases for a HEP Common Application Layer. Requirements are given as a set of use cases, free of implementation details
- GENERAL USE CASES
- Obtain Grid Authorization
- Revoke Grid Authorization
- Grid Login
- Browse Grid Resources
59 HEPCAL
- DATA MANAGEMENT USE CASES
- Data Set (DS) Metadata Update
- DS Metadata Access
- DS Registration
- Virtual DS Declaration
- Virtual DS Materialization
- DS Upload
- Catalogue Creation
- DS Access
- DS transfer to non-Grid storage
- DS Replica Upload
- DS Access Cost Evaluation
- DS Replication
- Physical DS Instance Deletion
- DS Deletion
- Catalogue Deletion
- Read from Remote DS
- DS Verification
- DS Browsing
60 HEPCAL
- JOB MANAGEMENT USE CASES
- Job Catalogue Update
- Job Catalogue Query
- Job Submission
- Job Output Access or Retrieval
- Job Error Recovery
- Job Control
- Steer Job Submission
- Job Resource Estimation
- Job Environment Modification
- Job Splitting
- Production Job
- Analysis
- DS Transformation
- Job Monitoring
- Conditions Publishing
- Software Publishing
- Simulation Job
- Expt Software Dev for Grid
61 HEPCAL
- VO MANAGEMENT USE CASES
- Configuring the VO
- Configuring the DS metadata catalogue (either initially or reconfiguring)
- Configuring the job catalogue (either initially or reconfiguring)
- Configuring the user profile (if this is possible at all on a VO basis)
- Adding or removing VO elements, e.g. computing elements, storage elements, etc.
- Configuring VO elements, including quotas, privileges, etc.
- Managing the users
- Add and remove users to/from the VO
- Modify the user information (privileges, quotas, priorities), either for single users or for subgroups of users within a VO
- VO-wide resource reservation
- The Grid should provide a tool to estimate the time-to-completion, given as input an estimate of the resources needed by the job. This is needed in particular to estimate the access cost
- There should be use cases for releasing reserved resources, and system use cases for what to do in case a user does not submit a job for which resources are reserved
- VO-wide resource allocation to users or groups of users of a VO
- Software (or condition set) publishing, i.e. making it available on the Grid
62 Database Systems
63 Database Systems
CERN IT DB Group
- Database
- one or more large, structured sets of persistent data, usually associated with software to update and query the data. A simple database might be a single file containing many records, each of which contains the same set of fields, where each field has a certain fixed width. A database is one component of a database management system
- Database Management System (DBMS)
- a set of programs (functions) that makes it possible to manage the large, structured sets of persistent data which make up the database, and to provide access to the data for multiple, concurrent users whilst maintaining the integrity of the data. The DBMS is in charge of all the functionalities related to database access, security, storage, ...
64 Database Management Systems
CERN IT DB Group
- A DBMS provides
- security facilities to prevent unauthorized users from accessing the system, using names and passwords to identify operators, programs and individual machines, and sets of privileges assigned to them; these privileges can include the ability to read, write and update data in the database
- lock facilities to maintain data integrity: locks are used for reads and writes of chunks of data; by doing this, only one user at a time can alter data, or users can be prevented from accessing data being changed. These requirements are referred to as ACID (Atomicity, Consistency, Isolation and Durability); a SQL sketch follows after this list
- Atomicity: all the parts of a transaction's execution are either all committed or all rolled back. All changes take effect, or none do. This ensures that there is no erroneous data in the system, or data which does not correspond to other data as it should
- Consistency: the database is transformed from one valid state to another valid state. A transaction is legal only if it obeys user-defined integrity constraints. Illegal transactions aren't allowed and, if an integrity constraint can't be satisfied, the transaction is rolled back to its previously valid state and the user is informed that the transaction has failed
- Isolation: the results of a transaction are invisible to other transactions until the transaction is complete
- Durability: once a transaction has been committed (completed), its results are permanent and survive future system and media failures
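A minimal SQL sketch of an atomic transaction, with a hypothetical accounts table: either both updates take effect or, on failure, neither does.

    -- Hypothetical money transfer; ACID guarantees all-or-nothing.
    BEGIN;  -- START TRANSACTION in some SQL dialects

    UPDATE accounts SET balance = balance - 100 WHERE id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;

    -- If an integrity constraint (e.g., balance >= 0) were violated,
    -- the DBMS would roll the whole transaction back instead:
    -- ROLLBACK;
    COMMIT;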
65 Database Systems
CERN IT DB Group
- Databases are based on many different models, each of which is designed with a specific problem, industry or set of functions in mind. Here we attempt to look at the main types in some depth
- Relational databases: data are structured in a series of tables, which have columns representing the variables and rows that contain specific instances of data. Currently the most widespread model
- Object-oriented databases: information is stored as persistent objects, not as rows in tables. The user defines objects and the operations which can be executed on them
- Object-relational databases: relational systems to which object-oriented functions are added. They allow data to be manipulated in the form of objects, as well as providing the traditional relational interface
- Distributed databases: data are stored on two or more computers, called nodes, connected over a network across a country, continent or planet
- Multimedia databases: a model for storing several different types of file, i.e. text, audio, video and images, in a single database
- Network databases: organize data in a network of linked records. A very early form of database, fast but not very adaptable, which is little used at present
- Hierarchical databases: data are stored as records, linked with parent-child relationships. Mostly used in the past on mainframes
66 Relational Database Systems
CERN IT DB Group
- The relational model is one of the oldest models used for creating a database, and the one used by the majority of businesses today. It was first outlined in a paper published by Ted Codd in 1970. The relational model is based on set theory and predicate logic
- set theory allows data to be structured in a series of tables, which have columns representing the variables and rows that contain specific instances of data. These tables are organized using normalization, a process (derived from normal forms theory) of reducing the occurrence of repeated data by breaking it into smaller pieces and creating new tables (e.g., the personal data of a customer)
- predicate logic is the basis of the query language, i.e. the set of commands that makes it possible to insert, retrieve, modify or delete data according to some specified criteria. Data can also be virtually or effectively joined into new tables (a small sketch follows below)
- The current standard for relational databases is set out in the Structured Query Language (SQL). Version 2 of the language is currently in use, with Version 3 expected to be released in the near future by the International Organization for Standardization (ISO) and the American National Standards Institute (ANSI)
- The most widely used relational database systems are produced by Oracle Corporation, Microsoft, Sybase and IBM, but there is a large number of other RDBMSs, designed either as general systems or for specific applications, used in HEP, like MySQL and PostgreSQL
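A small SQL sketch of the two ideas above, with hypothetical tables: customer data is normalized into its own table, and a query (a predicate over rows) joins it back to the orders.

    -- Normalization: customer details live in one table...
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name VARCHAR(80),
        city VARCHAR(80)
    );

    -- ...and each order references a customer instead of repeating its data.
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount      NUMERIC(10,2)
    );

    -- Predicate logic in action: WHERE selects the rows satisfying the
    -- condition, and the JOIN virtually recombines the two tables.
    SELECT c.name, o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.city = 'Geneva' AND o.amount > 100;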
67 Object Oriented Database Systems
CERN IT DB Group
- ODBMSs were introduced to overcome many restrictions imposed by the relational model on certain types of data (mainly in the case of huge amounts or complex structures). Their main advantage is the degree of low-level control of the system they allow the programmer. This gives the programmer control of how the data is to be stored and manipulated
- information is stored as persistent objects (not as rows in tables). This makes it more efficient in terms of storage space requirements and ensures that users can only manipulate data in the ways the programmer has specified. It also saves on the disk space needed for queries: instead of having to allocate resources for the results, the space required is already there in the objects themselves
- Because of the specific low-level methods used in an ODBMS, it is very difficult for third parties to produce add-on products. Whilst relational databases can benefit from software produced by other vendors, users of ODBMSs have to produce additional software in house, by contracting other firms, or in collaboration with other organizations using the same system
- The first commercially available object-oriented DBMS appeared in the mid-1980s. By the early 1990s there was a range of ODBMSs available from a variety of vendors. Objectivity/DB is the most widely used in the HEP community
68 Distributed Database Systems
CERN IT DB Group
- Distributed databases have the common characteristic that they are stored on two or more computers, called nodes, connected over a network. They are classified as homogeneous or heterogeneous
- homogeneous databases use the same DBMS software and have the same applications on each node. They have a common schema (a file specifying the structure of the database), and can have varying degrees of local autonomy. They can be based on any DBMS which supports this function, but it is not possible to have more than one DBMS type in the system. To be efficient, they need very large network connections and a lot of processing power
- heterogeneous databases have a very high degree of local autonomy. Each node in the system has its own local users, applications and data, deals with them itself, and only connects to other nodes for information it does not have. This type of distributed database is often just called a federated system or a federation. It is becoming more popular with organizations, for its scalability, for the reduced cost of being able to add extra nodes when necessary, and for the ability to mix software packages. Unlike homogeneous systems, heterogeneous systems can include different database management systems. This makes them appealing to organizations, since they can incorporate legacy systems and data into new systems
69 Beyond standard Database Systems
70 (No transcript)
71 (No transcript)
72 (No transcript)
73 Distributed Analysis
74 Distributed Analysis
- Within LCG a working group with representatives from all LHC experiments is working on a blueprint architecture for grid services: ARDA (A Roadmap to Distributed Analysis). This will serve as a first input to the EGEE architecture team. The HEPCAL work is continuing in the framework of the LCG/GAG (Grid Applications Group), developing use cases and requirements for the analysis of physics data. This will also give important input to architecture and design work
- GAG reports
- HEPCAL
- Systematic descriptions of HEP Grid use cases
- CERN-LCG-2002-020 (29 May 2002) lcg.web.cern.ch/LCG/sc2/RTAG4/finalreport.doc
- HEPCAL-prime: cern.ch/fca/HEPCAL-prime.doc
- HEPCAL II
- Analysis use cases
- CERN-LCG-2003-032 (29 October 2003) lcg.web.cern.ch/LCG/SC2/GAG/HEPCAL-II.doc
75 ARDA working group mandate
- To review the current distributed analysis activities and to capture their architectures in a consistent way
- To confront these existing projects with the HEPCAL II use cases and the users' potential work environments, in order to explore potential shortcomings
- To consider the interfaces between Grid, LCG and experiment-specific services
- review the functionality of experiment-specific packages, their state of advancement and their role in the experiment
- identify similar functionalities in the different packages
- identify functionalities and components that could be integrated into the generic Grid middleware
- To confront the current projects with critical Grid areas
- To develop a roadmap specifying, wherever possible, the architecture, the components and potential sources of deliverables to guide the medium-term (2 year) work of the LCG and the distributed analysis planning in the experiments
76 ARDA Architecture
77 SEAL Overview
Shared Environment for Applications at LHC
- SEAL aims to
- provide the software infrastructure, basic frameworks, libraries and tools that are common among the LHC experiments
- select, integrate, develop and support foundation and utility class libraries
- develop a coherent set of basic framework services to facilitate the integration of LCG and non-LCG software
- The scope of the SEAL project is basically the scope of the LCG Applications Area
78 PROOF (Parallel ROOT Facility)
- A collaboration between the core ROOT group at CERN and the MIT Heavy Ion Group
- Part of, and based on, the ROOT framework
- Makes heavy use of ROOT networking and other infrastructure classes
- Currently no external technologies
- The PROOF system allows
- parallel analysis of trees in a set of files
- parallel analysis of objects in a set of files
- parallel execution of scripts
- on a cluster of heterogeneous machines (a session sketch follows below)
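A hedged sketch of a PROOF session as a ROOT macro, following the usage pattern of that period; the master host, tree name, file URLs and selector script are hypothetical.

    {
       // Connect to the PROOF master, which fans the work out to its slaves.
       gROOT->Proof("proofmaster.example.org");

       // Describe the data set: a set of files, each containing tree "h42".
       TDSet *set = new TDSet("TTree", "h42");
       set->Add("root://se.example.org//data/run1.root");
       set->Add("root://se.example.org//data/run2.root");

       // Process the trees in parallel with a TSelector-based script;
       // PROOF merges the partial results from the slaves transparently.
       set->Process("ana.C");
    }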
79 (No transcript)
80 (No transcript)
81 (No transcript)
82 Useful links
- Projects
- EDG (European DataGrid): http://eu-datagrid.web.cern.ch/eu-datagrid/
- GGF (Global Grid Forum): http://www.gridforum.org/
- Globus: http://www.globus.org/
- LCG (LHC Computing Grid): http://lcg.web.cern.ch/LCG/
- POOL (Pool Of persistent Objects for LHC): http://pool.cern.ch/