Third LCB Workshop

Slides: 71

Provided by: cms561

Transcript and Presenter's Notes

1

Third LCB Workshop
Distributed Computing and Regional Centres Session
Harvey B. Newman (CIT)
Marseilles, September 29, 1999
http://l3www.cern.ch/newman/marseillessep29.ppt
http://l3www.cern.ch/newman/marseillessep29/index.htm

2
LHC Computing: Different from Previous Experiment Generations

Geographical dispersion: of people and resources
Complexity: the detector and the LHC environment
Scale: Petabytes per year of data

1800 Physicists, 150 Institutes, 32 Countries
Major challenges associated with:
Coordinated use of distributed computing resources
Remote software development and physics analysis
Communication and collaboration at a distance
R&D: New Forms of Distributed Systems
3
Challenges: Complexity

Events:
Signal event is obscured by 20 overlapping uninteresting collisions in the same crossing
Track reconstruction time at 10^34 luminosity is several times that at 10^33
Time does not scale from previous generations

4
HEP Bandwidth Needs and Price Evolution

HEP GROWTH
1989 - 1999: A Factor of One to Several Hundred on Principal Transoceanic Links
A Factor of Up to 1000 in Domestic Academic and Research Nets
HEP NEEDS
1999 - 2006: Continued Study by ICFA-SCIC
1998 Results of ICFA-NTF Show A Factor of One to Several Hundred (2X Per Year)
COSTS (to Vendors)
Optical Fibers and WDM: a factor > 2/year reduction now
Limits of Transmission Speed, Electronics, Protocol Speed
PRICE to HEP?
Complex Market, but Increased Budget likely to be needed
Reference BW/Price Evolution: 1.5 times/year
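The slide contrasts HEP needs growing at about 2X per year with a reference bandwidth/price evolution of 1.5 times/year. A minimal Python sketch of the compound arithmetic, assuming both rates stay constant from 1999 to 2006 and that the budget has to absorb the ratio between them (our reading of "Increased Budget likely to be needed", not a figure from the slide):

```python
def growth(rate_per_year: float, years: int) -> float:
    """Compound growth factor after `years` years at a constant yearly multiplier."""
    return rate_per_year ** years


def budget_multiplier(need_rate: float, price_perf_rate: float, years: int) -> float:
    """Factor by which spending must grow if needs grow at `need_rate` per year
    while bandwidth per unit price improves at `price_perf_rate` per year."""
    return growth(need_rate, years) / growth(price_perf_rate, years)


years = 2006 - 1999                  # planning horizon on the slide
need = growth(2.0, years)            # "2X per year" growth in needs
bw_per_price = growth(1.5, years)    # reference BW/price evolution
print(f"needs x{need:.0f}, bandwidth per unit price x{bw_per_price:.1f}, "
      f"budget x{budget_multiplier(2.0, 1.5, years):.1f}")
```

Under these assumptions needs grow 128-fold against a roughly 17-fold improvement in bandwidth per unit price, so spending would have to rise by about 7.5 times over the period.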

5
Cost Evolution: CMS 1996 Versus 1999 Technology Tracking Team
[Chart: CMS 1996 Estimates]

Compare to 1999 Technology Tracking Team Projections for 2005
CPU: Unit cost will be close to early prediction
Disk: Will be more expensive (by a factor of 2) than early prediction
Tape: Currently Zero to 10% Annual Cost Decrease (Potential Problem)

6
LHC (and HENP) Computing and Software Challenges

Software: Modern Languages, Methods and Tools, the Key to Managing Complexity
FORTRAN: The End of an Era; OBJECTS: A Coming of Age
TRANSPARENT Access To Data: Location and Storage Medium Independence
Data Grids: A New Generation of Data-Intensive Network-Distributed Systems for Analysis
A Deep Heterogeneous Client/Server Hierarchy, of Up to 5 Levels
An Ensemble of Tape and Disk Mass Stores
LHC Object Database Federations
Interaction of the Software and Data Handling Architectures: The Emergence of New Classes of Operating Systems

7
Four Experiments: The Petabyte to Exabyte Challenge

ATLAS, CMS, ALICE, LHCb
Higgs and New Particles; Quark-Gluon Plasma; CP Violation

Data written to tape: 5 Petabytes/Year and UP (1 PB = 10^15 Bytes)
0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2010) (2020?) Total for the LHC Experiments
8
To Solve: the HENP Data Problem

While the proposed future computing and data handling facilities are large by present-day standards, they will not support FREE access, transport or reconstruction for more than a minute portion of the data.
Need effective global strategies to handle and prioritise requests, based on both policies and marginal utility
Strategies must be studied and prototyped, to ensure viability: acceptable turnaround times; efficient resource utilization
Problems to be Explored: How To
Meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
Prioritise hundreds to thousands of requests from local and remote communities
Ensure that the system is dimensioned optimally, for the aggregate demand

9
MONARC

Models Of Networked Analysis At Regional Centers
Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI Munich, Orsay, Oxford, Tufts
GOALS
Specify the main parameters characterizing the Models' performance: throughputs, latencies
Develop Baseline Models in the feasible category
Verify resource requirement baselines (computing, data handling, networks)
COROLLARIES
Define the Analysis Process
Define RC Architectures and Services
Provide Guidelines for the final Models
Provide a Simulation Toolset for Further Model studies

[Diagram: Model Circa 2006. CERN (6×10^7 MIPS, 2000 Tbyte Robot), FNAL/BNL (4×10^6 MIPS, 200 Tbyte Robot) and University (n×10^6 MIPS, 100 Tbyte Robot) centres with desktops, linked at 622 Mbits/s; optional air freight]
10
MONARC General Conclusions on LHC Computing

Following discussions of computing and network requirements, technology evolution and projected costs, support requirements etc.:
The scale of LHC Computing is such that it requires a worldwide effort to accumulate the necessary technical and financial resources
The uncertainty in the affordable network BW implies that several scenarios of computing resource-distribution must be developed
A distributed hierarchy of computing centres will lead to better use of the financial and manpower resources of CERN, the Collaborations, and the nations involved, than a highly centralised model focused at CERN
Hence: The distributed model also provides better use of physics opportunities at the LHC by physicists and students
At the top of the hierarchy is the CERN Center, with the ability to perform all analysis-related functions, but not the ability to do them completely
At the next step in the hierarchy is a collection of large, multi-service Tier1 Regional Centres, each with 10-20% of the CERN capacity devoted to one experiment
There will be Tier2 or smaller special purpose centers in many regions

11
Grid-Hierarchy Concept

Matched to the Worldwide-Distributed Collaboration Structure of LHC Experiments
Best Suited for the Multifaceted Balance Between:
Proximity of the data to centralized processing resources
Proximity to end-users for frequently accessed data
Efficient use of limited network bandwidth (especially transoceanic and in many world regions) through organized caching/mirroring/replication
Appropriate use of (world-) regional and local computing and data handling resources
Effective involvement of scientists and students in each world region, in the data analysis and the physics

12
MONARC Phase 1 and 2 Deliverables

September 1999: Benchmark test validating the simulation
Milestone completed
Fall 1999: A Baseline Model representing a possible (somewhat simplified) solution for LHC Computing.
Baseline numbers for a set of system and analysis process parameters
CPU times, data volumes, frequency and site of jobs and data...
Reasonable ranges of parameters
Derivatives: How the effectiveness depends on some of the more sensitive parameters
Agreement of the experiments on the reasonableness of the Baseline Model
Chapter on Computing Models in the CMS and ATLAS Computing Technical Progress Reports

13
MONARC and Regional Centres

MONARC RC Representative Meetings in April and August
Regional Centre Planning well-advanced, with optimistic outlook, in US (FNAL for CMS; BNL for ATLAS), France (CCIN2P3), Italy
Proposals to be submitted late this year or early next
Active R&D and prototyping underway, especially in US, Italy, Japan, and UK (LHCb), Russia (MSU, ITEP), Finland (HIP/Tuovi)
Discussions in the national communities also underway in Japan, Finland, Russia, UK, Germany
Varying situations according to the funding structure and outlook
Need for more active planning outside of US, Europe, Japan, Russia
Important for R&D and overall planning
There is a near-term need to understand the level and sharing of support for LHC computing between CERN and the outside institutes, to enable the planning in several countries to advance.
MONARC and CMS/SCB assumption: traditional 1/3 : 2/3 sharing

14
MONARC Working Groups and Chairs

Analysis Process Design
P. Capiluppi (Bologna, CMS)
Architectures
Joel Butler (FNAL, CMS)
Simulation
Krzysztof Sliwa (Tufts, ATLAS)
Testbeds
Lamberto Luminari (Rome, ATLAS)
Steering
Laura Perini (Milan, ATLAS)
Harvey Newman (Caltech, CMS)
Regional Centres Committee

15
MONARC Architectures WG

Discussion and study of Site Requirements
Analysis task division between CERN and RC
Facilities required with different analysis scenarios, and network bandwidth
Support required to (a) sustain the Centre, and (b) contribute effectively to the distributed system
Reports
Rough Sizing Estimates for a Large LHC Experiment Facility
Computing Architectures of Existing Experiments:
LEP, FNAL Run2, CERN Fixed Target (NA45, NA48), FNAL Fixed Target (KTeV, FOCUS)
Regional Centres for LHC Computing (functionality and services)
Computing Architectures of Future Experiments (in progress):
BaBar, RHIC, COMPASS
Conceptual Designs, Drawings and Specifications for Candidate Site Architecture

16
Comparisons with an LHC-Sized Experiment: CMS or ATLAS

Total CPU, CMS or ATLAS: 1.5-2,000,000 SI95, i.e. 1.5-2 MSI95 (Current Concepts; maybe for 10^33 Luminosity)

17
Architectural Sketch: One Major LHC Experiment, At CERN (L. Robertson)

Mass Market Commodity PC Farms
LAN-SAN and LAN-WAN Stars (Switch/Routers)
Tapes (Many Drives for ALICE): an archival medium only?

18
MONARC Architectures WG: Lessons and Challenges for LHC

SCALE: 100 Times more CPU and 10 Times more Data than CDF at Run2 (2000-2003)
DISTRIBUTION: Mostly Achieved in HEP Only for Simulation. For Analysis (and some re-Processing), it will not happen without advance planning and commitments
REGIONAL CENTRES: Require coherent support, continuity, and the ability to keep the code base, calibrations and job parameters up-to-date
HETEROGENEITY: Of facility architecture and mode of use, and of operating systems, must be accommodated.
FINANCIAL PLANNING: Analysis of the early planning for the LEP era showed a definite tendency to underestimate the requirements (by more than an order of magnitude)
Partly due to budgetary considerations

19
Regional Centre Architecture: Example by I. Gaines
[Diagram: Tape Mass Storage and Disk Servers; Database Servers; Data Import and Data Export links to CERN, Tier 2 centres, local institutes, tapes and desktops]
Production Reconstruction: Raw/Sim → ESD; scheduled, predictable; experiment/physics groups
Production Analysis: ESD → AOD, AOD → DPD; scheduled; physics groups
Individual Analysis: AOD → DPD and plots; chaotic; physicists
Support: Physics Software Development; R&D Systems and Testbeds; Info servers, Code servers; Web Servers, Telepresence Servers; Training, Consulting, Help Desk
20
MONARC Architectures WG: Regional Centre Facilities and Services

Regional Centres Should Provide:
All technical and data services required to do physics analysis
All Physics Objects, Tags and Calibration data
Significant fraction of raw data
Caching or mirroring calibration constants
Excellent network connectivity to CERN and the region's users
Manpower to share in the development of common maintenance, validation and production software
A fair share of post- and re-reconstruction processing
Manpower to share in the work on Common R&D Projects
Service to members of other regions on a (?) best effort basis
Excellent support services for training, documentation, troubleshooting at the Centre or remote sites served by it
Long Term Commitment for staffing, hardware evolution and support for R&D, as part of the distributed data analysis architecture

21
MONARC Analysis Process WG

How much data is processed by how many people, how often, in how many places, with which priorities?
Analysis Process Design: Initial Steps
Consider number and type of processing and analysis jobs, frequency, number of events, data volumes, CPU etc.
Consider physics goals, triggers, signals and background rates
Studies covered: Reconstruction, Selection/Sample Reduction (one or more passes), Analysis, Simulation
Lessons from existing experiments are limited: each case is tuned to the detector, run conditions, physics goals and technology of the time
Limited studies so far, from the user rather than the system point of view; more as feedback from simulations is obtained
Limitations on CPU dictate a largely Physics Analysis Group oriented approach to reprocessing of data
And Regional (local) support for individual activities
Implies dependence on the RC Hierarchy

22
MONARC Analysis Process: Initial Sharing Assumptions

Assume similar computing capacity available outside CERN for re-processing and data analysis
There is no allowance for event simulation and reconstruction of simulated data, which it is assumed will be performed entirely outside CERN
Investment, services and infrastructure should be optimised to reduce overall costs (TCO)
Tape sharing makes sense if ALICE needs so much more at a different time of the year
First two assumptions would likely result in at least a 1/3 : 2/3 CERN : Outside ratio of resources (i.e., likely to be larger outside).

23
MONARC Analysis Process Example

24
MONARC Analysis Process Baseline: Group-Oriented Analysis

25
MONARC Baseline Analysis Process: ATLAS/CMS Reconstruction Step

26
MONARC Analysis Model Baseline: Event Sizes and CPU Times

Sizes
Raw data: 1 MB/event
ESD: 100 KB/event
AOD: 10 KB/event
TAG or DPD: 1 KB/event
CPU Time in SI95 seconds (without ODBMS overhead of 20%)
Creating ESD (from Raw): 350
Selecting ESD: 0.25
Creating AOD (from ESD): 2.5
Creating TAG (from AOD): 0.5
Analyzing TAG or DPD: 3.0
Analyzing AOD: 3.0
Analyzing ESD: 3.0
Analyzing RAW: 350
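The per-event sizes and SI95 times above are enough to estimate yearly storage and sustained CPU for a processing pass. A minimal Python sketch, assuming an illustrative 10^9 events per year (about 1 PB of raw data at 1 MB/event), a year of 3.15×10^7 fully used CPU seconds, and that the ODBMS overhead is a 20% surcharge added on top of the tabulated times; the dictionary keys and function names are hypothetical:

```python
# Per-event baseline figures from the MONARC table above.
EVENT_SIZE_BYTES = {"RAW": 1e6, "ESD": 100e3, "AOD": 10e3, "TAG": 1e3}
CPU_SI95_S = {"create_esd": 350.0, "create_aod": 2.5, "create_tag": 0.5,
              "analyze_aod": 3.0, "analyze_raw": 350.0}

SECONDS_PER_YEAR = 3.15e7  # assumption: one calendar year of fully used CPU


def volume_tb(n_events: float, tier: str) -> float:
    """Storage for `n_events` events of the given data tier, in TB (10^12 bytes)."""
    return n_events * EVENT_SIZE_BYTES[tier] / 1e12


def sustained_ksi95(n_events: float, step: str, passes: float = 1.0,
                    odbms_overhead: float = 0.20) -> float:
    """Sustained CPU (kSI95) needed to run `step` over `n_events`, `passes`
    times per year, inflated by an assumed fractional ODBMS overhead."""
    si95_seconds = n_events * CPU_SI95_S[step] * passes * (1.0 + odbms_overhead)
    return si95_seconds / SECONDS_PER_YEAR / 1e3


n = 1e9  # illustrative: ~1 PB/year of raw data at 1 MB/event
print(f"RAW {volume_tb(n, 'RAW'):.0f} TB, ESD {volume_tb(n, 'ESD'):.0f} TB")
print(f"one reconstruction pass: {sustained_ksi95(n, 'create_esd'):.1f} kSI95")
```

A single reconstruction pass over such a year comes to roughly 13 kSI95 sustained; reprocessing passes, analysis jobs and imperfect utilisation all multiply this figure, which is why the CERN and Regional Centre baselines are much larger.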

27
MONARC Analysis Model Baseline: ATLAS or CMS at CERN Center

CPU Power: 520 KSI95
Disk space: 540 TB
Tape capacity: 3 PB, 400 MB/sec
Link speed to RC: 40 MB/sec (1/2 of 622 Mbps)
Data stored (fraction of total, yearly volume):
Raw data: 100%, 1-1.5 PB/year
ESD data: 100%, 100-150 TB/year
Selected ESD: 100%, 20 TB/year
Revised ESD: 100%, 40 TB/year
AOD data: 100%, 2 TB/year
Revised AOD: 100%, 4 TB/year
TAG/DPD: 100%, 200 GB/year
Simulated data: 100%, 100 TB/year (repository)
Covering all Analysis Groups, each selecting 1% of Total ESD or AOD data for a Typical Analysis

28
MONARC Analysis Model Baseline: ATLAS or CMS at CERN Center, with LHCb (Prelim.)

CPU Power: 520 KSI95
Disk space: 540 TB
Tape capacity: 3 PB, 400 MB/sec
Link speed to RC: 40 MB/sec (1/2 of 622 Mbps)
Data stored (fraction of total, yearly volume):
Raw data: 100%, 1-1.5 PB/year
ESD data: 100%, 100-150 TB/year
Selected ESD: 100%, 20 TB/year
Revised ESD: 100%, 40 TB/year
AOD data: 100%, 2 TB/year
Revised AOD: 100%, 4 TB/year
TAG/DPD: 100%, 200 GB/year
Simulated data: 100%, 100 TB/year (repository)
Some of these Basic Numbers require further Study

LHCb (Prelim.) figures shown on the slide: 300 KSI95 ?, 200 TB/yr, 140 TB/yr, 1-10 TB/yr, 70 TB/yr
29
MONARC Analysis Model Baseline: ATLAS or CMS Typical Tier1 RC

CPU Power: 100 KSI95
Disk space: 100 TB
Tape capacity: 300 TB, 100 MB/sec
Link speed to Tier2: 10 MB/sec (1/2 of 155 Mbps)
Data stored (fraction of total, yearly volume):
Raw data: 1%, 10-15 TB/year
ESD data: 100%, 100-150 TB/year
Selected ESD: 25%, 5 TB/year
Revised ESD: 25%, 10 TB/year
AOD data: 100%, 2 TB/year
Revised AOD: 100%, 4 TB/year
TAG/DPD: 100%, 200 GB/year
Simulated data: 25%, 25 TB/year (repository)
Covering Five Analysis Groups, each selecting 1% of Total ESD or AOD data for a Typical Analysis
Covering All Analysis Groups
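The link speeds in the CERN and Tier1 baselines are quoted as half of the nominal line rate. A minimal Python sketch of that conversion and of the resulting transfer times, assuming 8 bits per byte, 1 TB = 10^12 bytes, and a constant 50% usable fraction (the function names are hypothetical):

```python
def usable_mbytes_per_s(link_mbits_per_s: float, utilisation: float = 0.5) -> float:
    """Usable throughput (MB/s) of a link rated in Mbit/s, at a given utilisation."""
    return link_mbits_per_s * utilisation / 8.0


def transfer_hours(volume_tb: float, rate_mbytes_per_s: float) -> float:
    """Hours to move `volume_tb` TB (10^12 bytes) at a sustained rate in MB/s."""
    return volume_tb * 1e6 / rate_mbytes_per_s / 3600.0


for name, mbps in [("CERN -> Tier1", 622.0), ("Tier1 -> Tier2", 155.0)]:
    rate = usable_mbytes_per_s(mbps)
    print(f"{name}: {rate:.1f} MB/s, 1 TB in {transfer_hours(1.0, rate):.1f} h")
```

Half of 622 Mbps is about 39 MB/s and half of 155 Mbps about 9.7 MB/s, consistent with the rounded 40 and 10 MB/sec in the tables. At 40 MB/s, moving a year's 100 TB of ESD would take about 694 hours, roughly 29 days of continuous transfer.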

30
MONARC Analysis Process WG: A Short List of Upcoming Issues

Priorities, schedules and policies
Production vs. Analysis Group vs. Individual activities
Allowed percentage of access to higher data tiers (TAG / Physics Objects / Reconstructed / RAW)
Improved understanding of the Data Model, and ODBMS
Including MC production; simulated data storage and access
Mapping the Analysis Process onto heterogeneous distributed resources
Determining the role of Institutes' workgroup servers and desktops, in the Regional Centre Hierarchy
Understanding how to manage persistent data: e.g. storage / migration / transport / re-compute strategies
Deriving a methodology for Model testing and optimisation
Metrics for evaluating the global efficiency of a Model: cost vs. throughput; turnaround; reliability of data access

31
MONARC Testbeds WG

Measurements of Key Parameters governing the behavior and scalability of the Models
Simple testbed configuration defined and implemented
Sun Solaris 2.6, C++ compiler version 4.2
Objectivity 5.1 with /C++, /stl, /FTO, /Java options
Set up at CNAF, FNAL, Genova, Milano, Padova, Roma, KEK, Tufts, CERN
Four Use Case Applications Using Objectivity: ATLASFAST, GIOD/JavaCMS, ATLAS 1 TB Milestone, CMS Test Beams
System Performance Tests; Simulation Validation Milestone Carried Out: See I. Legrand talk

32
MONARC Testbed Systems
33
MONARC Testbeds WG: Isolation of Key Parameters

Some Parameters Measured, Installed in the MONARC Simulation Models, and Used in First Round Validation of Models.
Objectivity AMS Response Time-Function, and its dependence on:
Object clustering, page-size, data class-hierarchy and access pattern
Mirroring and caching (e.g. with the Objectivity DRO option)
Scalability of the System Under Stress:
Performance as a function of the number of jobs, relative to the single-job performance
Performance and Bottlenecks for a variety of data access patterns
Frequency of following TAG → AOD; AOD → ESD; ESD → RAW
Data volume accessed remotely
Fraction on Tape, and on Disk
As Function of Net Bandwidth; Use of QoS

34
MONARC Simulation

A CPU- and code-efficient approach for the simulation of distributed systems has been developed for MONARC:
provides an easy way to map the distributed data processing, transport, and analysis tasks onto the simulation
can handle dynamically any Model configuration, including very elaborate ones with hundreds of interacting complex Objects
can run on real distributed computer systems, and may interact with real components

The Java (JDK 1.2) environment is well-suited for developing a flexible and distributed process-oriented simulation.
This Simulation program is still under development, and dedicated measurements to evaluate realistic parameters and validate the simulation program are in progress.
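The core idea of a discrete-event simulation is a clock that jumps from one scheduled event to the next. The MONARC tool itself is a process-oriented Java program; the sketch below instead uses the simpler event-list style in Python, assuming a single farm of identical CPUs, a FIFO job queue, and jobs described as hypothetical (arrival time, service time) pairs:

```python
import heapq
from collections import deque


def simulate_farm(jobs, n_cpus):
    """Discrete-event simulation of jobs on a farm of identical CPUs (FIFO queue).

    jobs: list of (arrival_time, service_time) pairs, in seconds.
    Returns {job_index: finish_time}.
    """
    ARRIVAL, FINISH = 0, 1
    # Event list ordered by (time, kind, job); arrivals sort before finishes at equal times.
    events = [(t, ARRIVAL, i) for i, (t, _) in enumerate(jobs)]
    heapq.heapify(events)
    free_cpus = n_cpus
    queue = deque()
    finished = {}

    while events:
        now, kind, job = heapq.heappop(events)  # advance the clock to the next event
        if kind == FINISH:
            finished[job] = now
            free_cpus += 1
        else:
            queue.append(job)
        # Start as many queued jobs as there are idle CPUs.
        while free_cpus and queue:
            nxt = queue.popleft()
            free_cpus -= 1
            heapq.heappush(events, (now + jobs[nxt][1], FINISH, nxt))
    return finished


print(simulate_farm([(0.0, 10.0), (0.0, 10.0), (0.0, 10.0), (5.0, 1.0)], n_cpus=2))
```

With two CPUs, the third 10 s job waits until t = 10 and finishes at t = 20, while the short job arriving at t = 5 is queued behind it in FIFO order but starts on the second CPU freed at t = 10 and finishes at t = 11. Network links, tape robots and database servers can be added as further resource types with their own event kinds.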

35
Example: Physics Analysis at Regional Centres

Similar data processing jobs are performed in several RCs
Each Centre has TAG and AOD databases replicated.
Main Centre provides ESD and RAW data
Each job processes AOD data, and also a fraction of ESD and RAW.
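Because ESD and RAW events are 10 and 100 times larger than AOD, even small ESD/RAW fractions can dominate what a Regional Centre job pulls from the main centre. A minimal Python sketch of that bookkeeping, using the baseline event sizes and purely hypothetical fractions (10% of events needing ESD, 1% needing RAW):

```python
# Per-event sizes in bytes, from the MONARC baseline (AOD 10 KB, ESD 100 KB, RAW 1 MB).
SIZE = {"AOD": 10e3, "ESD": 100e3, "RAW": 1e6}


def job_io_gb(n_events: int, esd_fraction: float, raw_fraction: float):
    """Return (local_gb, remote_gb) read by one analysis job at a Regional Centre.

    AOD is read locally from the replicated database; the given fractions of
    events also need ESD and RAW, which are fetched from the main centre.
    """
    local = n_events * SIZE["AOD"]
    remote = n_events * (esd_fraction * SIZE["ESD"] + raw_fraction * SIZE["RAW"])
    return local / 1e9, remote / 1e9


local_gb, remote_gb = job_io_gb(1_000_000, esd_fraction=0.10, raw_fraction=0.01)
print(f"local AOD: {local_gb:.0f} GB, remote ESD+RAW: {remote_gb:.0f} GB")
```

With these illustrative fractions a job over 10^6 events reads 10 GB of AOD locally but 20 GB of ESD and RAW remotely.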

36
Example Physics Analysis

37
Simple Validation Measurements: The AMS Data Access Case
[Figure: simulation compared with measurements; 4-CPU client reading a Raw Data DB over the LAN]
38
MONARC Strategy and Tools for Phase 2

Strategy: Vary System Capacity and Network Performance Parameters Over a Wide Range
Avoid complex, multi-step decision processes that could require protracted study.
Keep for a possible Phase 3
Majority of the workload satisfied in an acceptable time
Up to minutes for interactive queries, up to hours for short jobs, up to a few days for the whole workload
Determine requirements baselines and/or flaws in certain Analysis Processes in this way
Perform a comparison of a CERN-tralised Model, and suitable variations of Regional Centre Models
Tools and Operations to be Designed in Phase 2
Query estimators
Affinity evaluators, to determine proximity of multiple requests in space or time
Strategic algorithms for caching, reclustering, mirroring, or pre-emptively moving data (or jobs or parts of jobs)
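Affinity evaluators are listed as a Phase 2 tool without a specification. A minimal Python sketch of one possible evaluator, assuming a hypothetical `Request` record (submission time, dataset name, inclusive event range) and defining proximity as the same dataset, overlapping event ranges, and submission within a time window; requests grouped this way could then share one pass over the data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    t: float        # submission time (s)
    dataset: str
    first: int      # first event index
    last: int       # last event index (inclusive)


def affine(a: Request, b: Request, window: float) -> bool:
    """Requests are 'close' if they touch the same dataset, their event ranges
    overlap, and they were submitted within `window` seconds of each other."""
    overlap = a.dataset == b.dataset and a.first <= b.last and b.first <= a.last
    return overlap and abs(a.t - b.t) <= window


def group_requests(reqs, window):
    """Greedy grouping in submission order: a request joins the first group
    containing an affine request, otherwise it starts a new group."""
    groups = []
    for r in sorted(reqs, key=lambda r: r.t):
        for g in groups:
            if any(affine(r, other, window) for other in g):
                g.append(r)
                break
        else:
            groups.append([r])
    return groups


reqs = [Request(0, "ESD", 0, 100), Request(10, "ESD", 50, 150),
        Request(20, "AOD", 0, 100), Request(500, "ESD", 0, 10)]
for g in group_requests(reqs, window=60.0):
    print([(r.dataset, r.first, r.last) for r in g])
```

The greedy, single-pass grouping keeps the sketch short; a production evaluator would more likely weigh the size of the overlap and the cost of delaying a request against the I/O saved.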

39
MONARC Phase 2: Detailed Milestones
July 1999: Complete Phase 1; Begin Second Cycle of Simulations with More Refined Models
40
MONARC Possible Phase 3

TIMELINESS and USEFUL IMPACT
Facilitate the efficient planning and design of mutually compatible site and network architectures, and services
Among the experiments, the CERN Centre and Regional Centres
Provide modelling consultancy and service to the experiments and Centres
Provide a core of advanced R&D activities, aimed at LHC computing system optimisation and production prototyping
Take advantage of work on distributed data-intensive computing for HENP this year in other next generation projects
For example in US: Particle Physics Data Grid (PPDG) of DoE/NGI; A Physics Optimized Grid Environment for Experiments (APOGEE) to DoE/HENP; joint GriPhyN proposal to NSF by ATLAS/CMS/LIGO
See H. Newman, http://www.cern.ch/MONARC/progress_report/longc7.html

41
MONARC Phase 3

Possible Technical Goal: System Optimisation; Maximise Throughput and/or Reduce Long Turnaround
Include long and potentially complex decision-processes in the studies and simulations
Potential for substantial gains in the work performed or resources saved
Phase 3 System Design Elements
RESILIENCE, resulting from flexible management of each data transaction, especially over WANs
FAULT TOLERANCE, resulting from robust fall-back strategies to recover from abnormal conditions
SYSTEM STATE and PERFORMANCE TRACKING, to match and co-schedule requests and resources, detect or predict faults
Synergy with PPDG and other Advanced R&D Projects.
Potential Importance for Scientific Research and Industry: Simulation of Distributed Systems for Data-Intensive Computing.

42
MONARC Status: Conclusions

MONARC is well on its way to specifying baseline Models representing cost-effective solutions to LHC Computing.
Initial discussions have shown that LHC computing has a new scale and level of complexity.
A Regional Centre hierarchy of networked centres appears to be the most promising solution.
A powerful simulation system has been developed, and we are confident of delivering a very useful toolset for further model studies by the end of the project.
Synergy with other advanced R&D projects has been identified. This may be of considerable mutual benefit.
We will deliver important information, and example Models
That are very timely for the Hoffmann Review and discussions of LHC Computing over the next months
In time for the Computing Progress Reports of ATLAS and CMS

43
LHC Data Models: RD45

HEP data models are complex!
Rich hierarchy of hundreds of complex data types (classes)
Many relations between them
Different access patterns (Multiple Viewpoints)
LHC experiments rely on OO technology
OO applications deal with networks of objects (and containers)
Pointers (or references) are used to describe relations
Existing solutions do not scale
Solution suggested by RD45: ODBMS coupled to a Mass Storage System

[Diagram: object network linking Event, Tracker, Calorimeter, TrackList, HitList, Track and Hit objects]
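The point of the diagram is that event data form a network of objects connected by references, not a set of flat records. A minimal in-memory Python sketch of such a network, with hypothetical classes named after the diagram labels (this is not the RD45 or Objectivity schema); two tracks share one Hit object, which is exactly the kind of relation an ODBMS must persist without copying:

```python
from dataclasses import dataclass, field


@dataclass(eq=False)          # eq=False: objects compare by identity, like persistent object IDs
class Hit:
    x: float
    y: float
    z: float


@dataclass(eq=False)
class Track:
    hits: list[Hit]           # references to Hit objects held in the HitList


@dataclass(eq=False)
class Tracker:
    hit_list: list[Hit] = field(default_factory=list)
    track_list: list[Track] = field(default_factory=list)


@dataclass(eq=False)
class Calorimeter:
    cell_energies: list[float] = field(default_factory=list)


@dataclass(eq=False)
class Event:
    number: int
    tracker: Tracker
    calorimeter: Calorimeter


# A tiny event: three hits and two tracks that share the middle hit.
h1, h2, h3 = Hit(0.0, 0.0, 1.0), Hit(0.0, 0.1, 2.0), Hit(0.0, 0.2, 3.0)
tracker = Tracker(hit_list=[h1, h2, h3])
tracker.track_list = [Track([h1, h2]), Track([h2, h3])]
event = Event(number=1, tracker=tracker, calorimeter=Calorimeter([12.5, 3.1]))
shared = event.tracker.track_list[0].hits[1] is event.tracker.track_list[1].hits[0]
print("tracks share a Hit object:", shared)
```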
44
System View of Data Analysis by 2005

Multi-Petabyte Object Database Federation
Backed by a Networked Set of Archival Stores
High Availability and Immunity from Corruption
Seamless response to database queries
Location Independence: storage brokers, caching
Clustering and Reclustering of Objects
Transfer only useful data: tape/disk across networks, disk/client
Access and Processing Flexibility
Resource and application profiling, state tracking, co-scheduling
Continuous retrieval/recalculation/storage decisions
Trade off data storage, CPU and network capabilities to optimize performance and costs

45
CMS Analysis and Persistent Object Store

Data Organized In a(n Object) Hierarchy
Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags
Data Distribution
All raw, reconstructed and master parameter DBs at CERN
All event TAGs and AODs at all regional centers
Selected reconstructed data sets at each regional center
HOT data (frequently accessed) moved to RCs

[Diagram: CMS L1, L2/L3 and L4 trigger and filtering stages; Slow Control and Detector Monitoring; Persistent Object Store (Object Database Management System) serving Simulation, Calibrations and Group Analyses, and User Analysis; Common Filters and Pre-Emptive Object Creation; On Demand Object Creation]
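The rule that HOT (frequently accessed) data is moved to the Regional Centres can be illustrated with a simple counting policy. A minimal Python sketch, assuming a hypothetical `HotDataPolicy` that replicates a dataset to the local centre once it has been read remotely `threshold` times; the threshold rule is an illustrative choice, not the CMS mechanism:

```python
from collections import Counter


class HotDataPolicy:
    """Serve reads remotely from CERN until a dataset has been read `threshold`
    times, then treat it as HOT and keep a local replica at the Regional Centre."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.remote_reads = Counter()
        self.local_replicas = set()

    def read(self, dataset: str) -> str:
        if dataset in self.local_replicas:
            return "local"
        self.remote_reads[dataset] += 1
        if self.remote_reads[dataset] >= self.threshold:
            self.local_replicas.add(dataset)  # copy made while serving this read
        return "remote"


policy = HotDataPolicy(threshold=3)
print([policy.read("higgs_ESD") for _ in range(4)])  # first three reads are remote
```

A real policy would also weigh dataset size, free disk and link load, and would evict replicas that cool down.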
46
GIOD Summary (Caltech/CERN/FNAL/HP/SDSC)

GIOD has:
Constructed a Terabyte-scale set of fully simulated events and used these to create a large OO database
Learned how to create large database federations
Completed 100 (to 170) Mbyte/sec CMS Milestone
Developed prototype reconstruction and analysis codes, and Java 3D OO visualization prototypes, that work seamlessly with persistent objects over networks
Deployed facilities and database federations as useful testbeds for Computing Model studies

[Figure: Hit, Track, Detector]
47
BaBar OOFS: Putting The Pieces Together
48
Dynamic Load Balancing Hierarchical Secure AMS

Defer Request Protocol
Transparently delays client while data is made
available
Accommodates high latency storage systems (e.g.,
tape)
Request Redirect Protocol
Redirects client to an alternate AMS
Provides for dynamic replication and real-time
load balancing
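The two AMS extensions change what a client must be prepared to receive: "wait and retry" while data is staged, or "ask another server". A minimal client-side Python sketch of that loop, assuming hypothetical reply kinds `DATA`, `DEFER` and `REDIRECT` and servers modelled as plain functions (this is not the Objectivity AMS wire protocol):

```python
import time
from dataclasses import dataclass


@dataclass
class Reply:
    kind: str                 # "DATA", "DEFER" or "REDIRECT" (hypothetical message types)
    payload: bytes = b""
    retry_after: float = 0.0  # seconds to wait before asking again (DEFER)
    alternate: str = ""       # server to ask instead (REDIRECT)


def fetch(page_id, servers, start, sleep=time.sleep, max_replies=20):
    """Request a page, honouring DEFER (wait, then retry) and REDIRECT (switch server)."""
    server = start
    for _ in range(max_replies):
        reply = servers[server](page_id)
        if reply.kind == "DATA":
            return server, reply.payload
        if reply.kind == "DEFER":
            sleep(reply.retry_after)      # data is being staged, e.g. from tape
        elif reply.kind == "REDIRECT":
            server = reply.alternate      # replica or less-loaded server
        else:
            raise ValueError(f"unexpected reply kind {reply.kind!r}")
    raise TimeoutError(f"page {page_id} not delivered after {max_replies} replies")


def make_staging_server():
    """Toy server that defers its first request (tape staging), then serves data."""
    state = {"calls": 0}

    def serve(page_id):
        state["calls"] += 1
        if state["calls"] == 1:
            return Reply("DEFER", retry_after=5.0)
        return Reply("DATA", payload=b"page-%d" % page_id)
    return serve


servers = {
    "ams1": lambda page_id: Reply("REDIRECT", alternate="ams2"),  # overloaded server
    "ams2": make_staging_server(),
}
waits = []
print(fetch(7, servers, "ams1", sleep=waits.append), waits)
```

Injecting `sleep` keeps the example instantaneous; in real use the default `time.sleep` would block the client transparently, as the Defer Request Protocol describes.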

49
Regional Centers Concept: A Data Grid Hierarchy

LHC Grid Hierarchy Example
Tier0: CERN
Tier1: National Regional Center
Tier2: Regional Center
Tier3: Institute Workgroup Server
Tier4: Individual Desktop
Total: 5 Levels

50
Background: Why Grids?
For transparent, rapid access and delivery of Petabyte-scale data (and Multi-TIPS computing resources)

I. Foster, ANL/Chicago:
Because the resources needed to solve complex problems are rarely colocated
Advanced scientific instruments
Large amounts of storage
Large amounts of computing
Groups of smart people
For a variety of reasons
Resource allocations not optimized for one application
Required resource configurations change
Different views of priorities and truth

51
Grid Services Architecture
Adapted from Ian Foster: there are computing grids, data grids, access (collaborative) grids, ...
52
Roles of HENP Projects for Distributed Analysis (→ Grids)

RD45, GIOD: Networked Object Databases
Clipper/GC: High speed access to Objects or File data
FNAL/SAM: for processing and analysis
SLAC/OOFS: Distributed File System and Objectivity Interface
NILE, Condor: Fault Tolerant Distributed Computing with Heterogeneous CPU Resources
MONARC: LHC Computing Models: Architecture, Simulation, Testbeds; Strategy, Politics
PPDG: First Distributed Data Services and Grid System Prototype
ALDAP: OO Database Structures and Access Methods for Astrophysics and HENP Data
APOGEE: Full-Scale Grid Design; Instrumentation, System Modeling and Simulation, Evaluation/Optimization
GriPhyN: Production Prototype Grid in Hardware and Software, then Production

53
ALDAP (NSF/KDI) Project

ALDAP: Accessing Large Data Archives in Astronomy and Particle Physics
NSF Knowledge Discovery Initiative (KDI)
CALTECH, Johns Hopkins, FNAL (SDSS)
Explore advanced adaptive database structures, physical data storage hierarchies for archival storage of next generation astronomy and particle physics data
Develop spatial indexes, novel data organizations, distribution and delivery strategies, for efficient and transparent access to data across networks
Create prototype network-distributed data query execution systems using Autonomous Agent workers
Explore commonalities and find effective common solutions for particle physics and astrophysics data

54
The China Clipper Project: A Data Intensive Grid
ANL-SLAC-Berkeley

China Clipper Goal
Develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks.
Builds on expertise and assets at ANL, LBNL and SLAC
NERSC, ESnet
Builds on Globus Middleware and high-performance distributed storage system (DPSS from LBNL)
Initial focus on large DOE HENP applications
RHIC/STAR, BaBar
Demonstrated data rates to 57 Mbytes/sec.

55
HENP Grand Challenge/Clipper Testbed and Tasks

High-Speed Testbed
Computing and networking (NTON, ESnet) infrastructure
Differentiated Network Services
Traffic shaping on ESnet
End-to-end Monitoring Architecture (QE, QM, CM)
Traffic analysis, event monitor agents to support traffic shaping and CPU scheduling
Transparent Data Management Architecture
OOFS/HPSS, DPSS/ADSM
Application Demonstration
Standard Analysis Framework (STAF)
Access data at SLAC, LBNL, or ANL (net and data quality)

56
The Particle Physics Data Grid (PPDG)

DoE/NGI Next Generation Internet Project
ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
Goal: To be able to query and partially retrieve data from PB data stores across Wide Area Networks within seconds
Drive progress in the development of the necessary middleware, networks and fundamental computer science of distributed systems.
Deliver some of the infrastructure for widely distributed data analysis at multi-PetaByte scales by 100s to 1000s of physicists

57
PPDG First Year Deliverable: Site-to-Site Replication Service
PRIMARY SITE: Data Acquisition, CPU, Disk, Tape Robot
SECONDARY SITE: CPU, Disk, Tape Robot

Network Protocols Tuned for High Throughput
Use of DiffServ for (1) Predictable high priority delivery of high-bandwidth data streams; (2) Reliable background transfers
Use of integrated instrumentation to detect/diagnose/correct problems in long-lived high speed transfers (NetLogger and DoE/NGI developments)
Coordinated reservation/allocation techniques for storage-to-storage performance
First Year Goal: Optimized cached read access to 1-10 Gbytes, drawn from a total data set of order One Petabyte

58
PPDG Multi-site Cached File Access System
[Diagram: a PRIMARY SITE (Data Acquisition, Tape, CPU, Disk, Robot), three Satellite Sites (Tape, CPU, Disk, Robot) and three Universities (CPU, Disk, Users)]
59
First Year PPDG System Components

Middleware Components (Initial Choice): See PPDG Proposal Page 15
Object and File-Based Application Services: Objectivity/DB (SLAC enhanced); GC Query Object, Event Iterator, Query Monitor; FNAL SAM System
Resource Management: Start with Human Intervention (but begin to deploy resource discovery and management tools)
File Access Service: Components of OOFS (SLAC)
Cache Manager: GC Cache Manager (LBNL)
Mass Storage Manager: HPSS, Enstore, OSM (Site-dependent)
Matchmaking Service: Condor (U. Wisconsin)
File Replication Index: MCAT (SDSC)
Transfer Cost Estimation Service: Globus (ANL)
File Fetching Service: Components of OOFS
File Mover(s): SRB (SDSC); Site specific
End-to-end Network Services: Globus tools for QoS reservation
Security and authentication: Globus (ANL)

60
PPDG Middleware Architecture for Reliable High Speed Data Delivery
[Diagram: Object-based and File-based Application Services; Resource Management; File Replication Index; Matchmaking Service; Cost Estimation; File Access Service; Cache Manager; File Fetching Service; Mass Storage Manager; File Movers; End-to-End Network Services; Site Boundary and Security Domain markers]
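The File Replication Index, Cost Estimation and Matchmaking boxes combine into one decision: which copy of a file should be delivered. A minimal Python sketch under a toy cost model, assuming a hypothetical `Replica` record and made-up latency and tape-staging constants (not the Globus or Condor interfaces named on the previous slide):

```python
from dataclasses import dataclass


@dataclass
class Replica:
    site: str
    on_disk: bool              # False: must be staged from tape first
    bandwidth_mbytes_s: float  # expected throughput from this site


def estimated_seconds(r: Replica, size_mbytes: float, latency_s: float = 0.5,
                      tape_stage_s: float = 120.0) -> float:
    """Toy cost estimate: fixed latency + optional tape staging + size / bandwidth."""
    stage = 0.0 if r.on_disk else tape_stage_s
    return latency_s + stage + size_mbytes / r.bandwidth_mbytes_s


def choose_replica(replicas, size_mbytes):
    """Matchmaking step: pick the replica with the lowest estimated delivery time."""
    return min(replicas, key=lambda r: estimated_seconds(r, size_mbytes))


replicas = [Replica("CERN", on_disk=False, bandwidth_mbytes_s=40.0),
            Replica("FNAL", on_disk=True, bandwidth_mbytes_s=10.0)]
for size in (1_000, 10_000):
    print(size, "MB ->", choose_replica(replicas, size).site)
```

Under these made-up numbers a 1 GB file is fetched from the slower disk-resident copy, while a 10 GB file justifies waiting for the tape stage on the faster link; the crossover is what a real cost estimator would have to track as load and bandwidth change.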
61
PPDG Developments In 2000-2001

Co-Scheduling algorithms
Matchmaking and Prioritization
Dual-Metric Prioritization
Policy and Marginal Utility
DiffServ on Networks to segregate tasks
Performance Classes
Transaction Management
Cost Estimators
Application/TM Interaction
Checkpoint/Rollback
Autonomous Agent Hierarchy
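Dual-metric prioritization pairs a policy ranking with marginal utility. A minimal Python sketch of one way to combine them, assuming a hypothetical `DualMetricQueue` that orders requests first by policy class (0 = highest) and then by utility per unit cost, with FIFO order breaking ties; this is an illustration, not the PPDG design:

```python
import heapq
import itertools


class DualMetricQueue:
    """Priority queue keyed on (policy class, -utility/cost, arrival order)."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-breaker

    def push(self, name: str, policy_class: int, utility: float, cost: float) -> None:
        score = utility / cost  # marginal utility per unit of resource
        heapq.heappush(self._heap, (policy_class, -score, next(self._seq), name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[-1]

    def __len__(self) -> int:
        return len(self._heap)


q = DualMetricQueue()
q.push("user-plot", policy_class=2, utility=5.0, cost=1.0)
q.push("group-skim", policy_class=1, utility=10.0, cost=10.0)
q.push("production-reco", policy_class=0, utility=1.0, cost=100.0)
q.push("user-fit", policy_class=2, utility=8.0, cost=1.0)
print([q.pop() for _ in range(len(q))])
```

Policy always dominates here (production before group before individual work), and utility only reorders requests within a class; a weighted sum of the two metrics would be an equally plausible alternative.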

62
Beyond Traditional Architectures: Mobile Agents (Java Aglets)
"Agents are objects with rules and legs" -- D. Taylor
[Diagram: Agent, Service, Agent, Application]

Mobile Agents: Reactive, Autonomous, Goal Driven, Adaptive
Execute Asynchronously
Reduce Network Load: Local Conversations
Overcome Network Latency; Some Outages
Adaptive → Robust, Fault Tolerant
Naturally Heterogeneous
Extensible Concept: Agent Hierarchies

63
Distributed Data Delivery and LHC Software Architecture

LHC Software and/or the Analysis Process must account for data- and resource-related realities
Delay for data location, queueing, scheduling; sometimes for transport and reassembly
Allow for long transaction times, performance shifts, errors, out-of-order arrival of data
Software Architectural Choices
Traditional, single-threaded applications
Allow for data arrival and reassembly
OR
Performance-Oriented (Complex)
I/O requests up-front; multi-threaded; data driven; respond to ensemble of (changing) cost estimates
Possible code movement as well as data movement
Loosely coupled, dynamic: e.g. Agent-based implementation
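The performance-oriented choice issues all I/O requests up front and processes data as it arrives, in whatever order. A minimal Python sketch using the standard `concurrent.futures` thread pool, where the hypothetical `fetch_chunk` stands in for a remote read and a random delay models variable WAN latency; reassembly by index restores the original order:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_chunk(i: int) -> tuple[int, bytes]:
    """Stand-in for a remote read; the random delay models variable WAN latency."""
    time.sleep(random.uniform(0.0, 0.05))
    return i, f"chunk{i};".encode()


def fetch_all(n_chunks: int, workers: int = 8) -> tuple[bytes, list[int]]:
    """Issue every request up front, consume results in arrival order,
    then reassemble them by index."""
    parts = {}
    arrival_order = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_chunk, i) for i in range(n_chunks)]
        for fut in as_completed(futures):
            i, data = fut.result()
            arrival_order.append(i)   # may differ from request order
            parts[i] = data
    return b"".join(parts[i] for i in range(n_chunks)), arrival_order


data, order = fetch_all(5)
print(data, "arrived as", order)
```

The traditional single-threaded alternative would request chunk 0, wait, request chunk 1, and so on, paying the full latency of every request in sequence.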

64
GriPhyN: First Production Scale Grid Physics Network

Develop a New Form of Integrated Distributed System, while Meeting Primary Goals of the US LIGO and LHC Programs
Single Unified GRID System Concept; Hierarchical Structure
(Sub-)Implementations, for LIGO, SDSS, US CMS, US ATLAS
20 Centers: A Few Each in US for LIGO, CMS, ATLAS, SDSS
Aspects Complementary to Centralized DoE Funding
University-Based Regional Tier2 Centers, Partnering with the Tier1 Centers
Emphasis on Training, Mentoring and Remote Collaboration
Making the Process of Search and Discovery Accessible to Students
GriPhyN Web Site: http://www.phys.ufl.edu/avery/mre/
White Paper: http://www.phys.ufl.edu/avery/mre/white_paper.html

65
APOGEE/GriPhyN Data Grid Implementation

An Integrated Distributed System of Tier1 and Tier2 Centers
Flexible, relatively low-cost (PC-based) Tier2 architectures
Medium-scale (for the LHC era) data storage and I/O capability
Well-adapted to local operation; modest system engineer support
Meet changing local and regional needs in the active, early phases of data analysis
Interlinked with Gbps Network Links: Internet2 and Regional Nets, Circa 2001-2005
State-of-the-Art QoS techniques to prioritise and shape traffic and to manage bandwidth; preview transoceanic BW within the US
A working Production-Prototype (2001-2003) for Petabyte-Scale Distributed Computing Models
Focus on LIGO (plus BaBar and Run2) handling of real data, and LHC Mock Data Challenges with simulated data
Meet the needs, and learn from system performance under stress

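The slide does not name a specific QoS mechanism. One standard way to shape traffic is a token bucket, which lets a flow burst but holds it to an average rate. The Python sketch below assumes packet sizes in bytes and explicit timestamps in seconds so that it is deterministic. In a DiffServ-style setup, each performance class could have its own bucket. That mapping is an assumption and is not given on the slide.

```python
class TokenBucket:
    """Token-bucket shaper: bursts up to `capacity` bytes, long-run rate `rate` bytes/s."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # time of the previous decision (s)

    def allow(self, size, now):
        """Return True if a packet of `size` bytes may be sent at time `now` (s)."""
        elapsed = now - self.last
        # Refill at the configured rate, but never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False  # non-conforming: delay (shape) or drop (police) the packet
```

With `rate=1000` and `capacity=1500`, a 1500-byte burst passes at t = 0. A 1000-byte packet is refused at t = 0.5 s and accepted at t = 1.0 s.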
66
VRVS: From Videoconferencing to Collaborative Environments

More than 1400 registered hosts, 22 reflectors, 34 countries
Running in the U.S., Europe and Asia
Switzerland: CERN (2)
Italy: CNAF Bologna
UK: Rutherford Lab
France: IN2P3 Lyon, Marseilles
Germany: Heidelberg Univ.
Finland: FUNET
Spain: IFCA-Univ. Cantabria
Russia: Moscow State Univ., Tver. U.
U.S.: Caltech, LBNL, SLAC, FNAL, ANL, BNL, Jefferson Lab., DoE HQ Germantown
Asia: Academia Sinica, Taiwan
South America: CeCalcula, Venezuela

67
(No Transcript)
68
Role of Simulation for Distributed Systems

Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems
From battlefields to agriculture; from the factory floor to telecommunications systems
Discrete event simulations with an appropriate and high level of abstraction are powerful tools
Time intervals, interrupts and performance/load characteristics are the essentials
Not yet an integral part of the HENP culture, but
Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models
Simulation is a vital part of the study of site architectures, network behavior, and data access/processing/delivery strategies, for HENP Grid Design and Optimization

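The core of a discrete-event simulator is a priority queue of timestamped events, processed in time order. Each event may schedule further events. The minimal Python sketch below assumes a single FIFO server (for example, one disk or tape server) with a fixed service time per request. Models used for real site and network studies are far richer. This sketch shows only the event loop.

```python
import heapq
from collections import deque

FINISH, ARRIVE = 0, 1  # at equal times, process completions before arrivals


def simulate(arrivals, service_time):
    """Discrete-event simulation of one FIFO server.

    arrivals: request arrival times (s); service_time: fixed time (s) per request.
    Returns each request's completion time, in arrival order.
    """
    events = [(t, ARRIVE, i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    waiting = deque()
    busy = False
    done = {}
    while events:
        now, kind, req = heapq.heappop(events)  # advance the clock to the next event
        if kind == ARRIVE:
            waiting.append(req)
        else:
            done[req] = now
            busy = False
        if not busy and waiting:
            nxt = waiting.popleft()
            busy = True
            heapq.heappush(events, (now + service_time, FINISH, nxt))
    return [done[i] for i in range(len(arrivals))]
```

For example, `simulate([0, 0, 1], 2)` returns `[2, 4, 6]`. The second and third requests queue behind the first, and that queueing delay is exactly what such studies try to measure.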
69
Monitoring Architecture: Use of NetLogger as in CLIPPER

End-to-end monitoring of grid assets is needed to
Resolve network throughput problems
Dynamically schedule resources
Add precision-timed event monitor agents to
ATM switches
DPSS servers
Testbed computational resources
Produce trend analysis modules for monitor agents
Make results available to applications
See talk by B. Tierney

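One simple form a trend analysis module might take is sketched below in Python. It fits a least-squares line to timestamped throughput samples and reports the slope. A sustained negative slope would flag a degrading path that a scheduler could route around. The `(timestamp, throughput)` sample format is an assumption, and this is not NetLogger's API or log format.

```python
def throughput_trend(samples):
    """Least-squares slope of (timestamp_s, throughput) samples.

    Returns throughput units per second; a negative value means a falling trend.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((t - mean_t) * (y - mean_y) for t, y in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den
```

For samples falling by 10 units every 10 s, the slope is -1.0 units per second. A monitor agent could publish this value so that applications can read it.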
70
Summary

The HENP/LHC Data Analysis Problem
Worldwide-distributed, Petabyte-scale compacted binary data, and computing resources
Development of a robust networked data access and analysis system is mission-critical
An aggressive R&D program is required to develop systems for reliable data access, processing and analysis across an ensemble of networks
An effective inter-field partnership is now developing through many R&D projects
HENP analysis is now one of the driving forces for the development of Data Grids
Solutions to this problem could be widely applicable in other scientific fields and industry, by LHC startup

71
LHC Computing Upcoming Issues

Cost of Computing at CERN for the LHC Program May Exceed 100 MCHF; Correspondingly More in Total
Some ATLAS/CMS Basic Numbers (CPU, 100 kB Reco. Event) from 1996 Require Further Study
We cannot scale up from previous generations (new methods)
CERN/Outside Sharing: MONARC and CMS/SCB Use the 1/3 : 2/3 Rule
Computing Architecture and Cost Evaluation
Integration and Total Cost of Ownership
Possible Role of Central I/O Servers
Manpower Estimates
CERN versus scaled Regional Centre estimates
Scope of services and support provided
Limits of CERN support and service, and the need for Regional Centres
Understanding that LHC Computing is Different
A different scale, and worldwide distributed computing for the first time
Continuing System R&D is required
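The two cost items can be read together. Assume, for illustration only, that the 1/3 : 2/3 CERN/outside sharing applies to cost as well as to capacity. The CERN figure then implies a total of roughly three times as much:

```latex
C_{\text{total}} = \frac{C_{\text{CERN}}}{1/3} = 3\,C_{\text{CERN}} \gtrsim 3 \times 100~\text{MCHF} = 300~\text{MCHF}
```

The actual total depends on how Regional Centre costs scale relative to CERN's, which is one of the open issues listed above.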