Title: Third LCB Workshop
1- Third LCB Workshop
- Distributed Computing and Regional Centres Session
- Harvey B. Newman (CIT)
- Marseilles, September 29, 1999
- http://l3www.cern.ch/newman/marseillessep29.ppt
- http://l3www.cern.ch/newman/marseillessep29/index.htm
2LHC Computing: Different from Previous Experiment Generations
- Geographical dispersion: of people and resources
- Complexity: the detector and the LHC environment
- Scale: Petabytes per year of data
1800 Physicists, 150 Institutes, 32 Countries
Major challenges associated with:
- Coordinated Use of Distributed computing resources
- Remote software development and physics analysis
- Communication and collaboration at a distance
R&D: New Forms of Distributed Systems
3Challenges: Complexity
- Events
- Signal event is obscured by 20 overlapping uninteresting collisions in same crossing
- Track reconstruction time at 10^34 Luminosity is several times that at 10^33
- Time does not scale from previous generations
4HEP Bandwidth Needs and Price Evolution
- HEP GROWTH
- 1989 - 1999: A Factor of one to Several Hundred on Principal Transoceanic Links
- A Factor of Up to 1000 in Domestic Academic and Research Nets
- HEP NEEDS
- 1999 - 2006: Continued Study by ICFA-SCIC; 1998 Results of ICFA-NTF Show A Factor of One to Several Hundred (2X Per Year)
- COSTS (to Vendors)
- Optical Fibers and WDM: a factor > 2/year reduction now? Limits of Transmission Speed, Electronics, Protocol Speed
- PRICE to HEP?
- Complex Market, but Increased Budget likely to be needed
- Reference BW/Price Evolution: 1.5 times/year
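The growth figures on this slide compound quickly, which is why an increased budget is likely. The following is a rough sketch in Python, not a result from the talk. It assumes that "2X per year" applies uniformly over 1999-2006 and that the reference "BW/Price Evolution" means bandwidth per unit price improves by 1.5 times each year; both readings are assumptions.

```python
def compound(factor_per_year: float, years: int) -> float:
    """Total growth after compounding a yearly factor for a number of years."""
    return factor_per_year ** years

YEARS = 2006 - 1999                          # 7 yearly steps (assumed uniform)

need_growth = compound(2.0, YEARS)           # bandwidth need at 2X per year
bw_per_price_growth = compound(1.5, YEARS)   # bandwidth bought per unit price

# If need grows faster than bandwidth-per-price, the budget must grow
# by the ratio of the two to keep up.
budget_growth = need_growth / bw_per_price_growth

print(f"need x{need_growth:.0f}, BW/price x{bw_per_price_growth:.1f}, "
      f"budget x{budget_growth:.1f}")
```

Under these assumptions the need grows by a factor of 128, inside the "one to several hundred" range quoted above, while the budget would have to grow several-fold.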
5Cost Evolution: CMS 1996 Versus 1999 Technology Tracking Team
CMS 1996 Estimates
- Compare to 1999 Technology Tracking Team Projections for 2005
- CPU: Unit cost will be close to early prediction
- Disk: Will be more expensive (by a factor of 2) than early prediction
- Tape: Currently Zero to 10% Annual Cost Decrease (Potential Problem)
6LHC (and HENP) Computing and Software Challenges
- Software: Modern Languages, Methods and Tools: The Key to Manage Complexity
- FORTRAN: The End of an Era; OBJECTS: A Coming of Age
- TRANSPARENT Access To Data
- Location and Storage Medium Independence
- Data Grids: A New Generation of Data-Intensive Network-Distributed Systems for Analysis
- A Deep Heterogeneous Client/Server Hierarchy, of Up to 5 Levels
- An Ensemble of Tape and Disk Mass Stores
- LHC Object Database Federations
- Interaction of the Software and Data Handling Architectures: The Emergence of New Classes of Operating Systems
7Four Experiments: The Petabyte to Exabyte Challenge
- ATLAS, CMS, ALICE, LHCB
- Higgs and New particles; Quark-Gluon Plasma; CP Violation
Data written to tape: 5 Petabytes/Year and UP (1 PB = 10^15 Bytes)
0.1 to 1 Exabyte (1 EB = 10^18 Bytes) (2010) (2020 ?) Total for the LHC Experiments
8To Solve the HENP Data Problem
- While the proposed future computing and data handling facilities are large by present-day standards, they will not support FREE access, transport or reconstruction for more than a minute portion of the data.
- Need effective global strategies to handle and prioritise requests, based on both policies and marginal utility
- Strategies must be studied and prototyped, to ensure viability: acceptable turnaround times; efficient resource utilization
- Problems to be Explored: How To
- Meet the demands of hundreds of users who need transparent access to local and remote data, in disk caches and tape stores
- Prioritise hundreds to thousands of requests from local and remote communities
- Ensure that the system is dimensioned optimally, for the aggregate demand
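One way to picture prioritising requests by both policy and marginal utility is a priority queue keyed on a combined score. The sketch below is illustrative only: the scoring formula (policy weight times marginal utility, divided by estimated cost), the `RequestQueue` class and the request names are all hypothetical and do not come from the talk.

```python
import heapq
import itertools

def score(policy_weight: float, marginal_utility: float, cost: float) -> float:
    """Hypothetical score: favour high-policy, high-utility, low-cost requests."""
    return policy_weight * marginal_utility / cost

class RequestQueue:
    """Serve the highest-scoring request first; ties go to the earlier arrival."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()

    def submit(self, name: str, policy_weight: float,
               marginal_utility: float, cost: float) -> None:
        s = score(policy_weight, marginal_utility, cost)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-s, next(self._arrival), name))

    def next_request(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)

queue = RequestQueue()
queue.submit("tape-recall", policy_weight=1.0, marginal_utility=5.0, cost=10.0)
queue.submit("cached-aod-scan", policy_weight=1.0, marginal_utility=2.0, cost=1.0)
queue.submit("production-reco", policy_weight=3.0, marginal_utility=4.0, cost=4.0)

served = [queue.next_request() for _ in range(len(queue))]
print(served)
```

A real strategy would also need to re-score requests as costs change, which is one reason the slide calls for studying and prototyping before committing to a design.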
9MONARC
- Models Of Networked Analysis At Regional Centers
- Caltech, CERN, Columbia, FNAL, Heidelberg, Helsinki, INFN, IN2P3, KEK, Marseilles, MPI, Munich, Orsay, Oxford, Tufts
- GOALS
- Specify the main parameters characterizing the Models' performance: throughputs, latencies
- Develop Baseline Models in the feasible category
- Verify resource requirement baselines (computing, data handling, networks)
- COROLLARIES
- Define the Analysis Process
- Define RC Architectures and Services
- Provide Guidelines for the final Models
- Provide a Simulation Toolset for Further Model studies
Figure: Model Circa 2006
- CERN: 6 x 10^7 MIPS, 2000 Tbyte, Robot
- FNAL/BNL: 4 x 10^6 MIPS, 200 Tbyte, Robot
- University: n x 10^6 MIPS, 100 Tbyte, Robot
- Desk tops; 622 Mbits/s links; Optional Air Freight
10MONARC General Conclusions on LHC Computing
- Following discussions of computing and network requirements, technology evolution and projected costs, support requirements etc.
- The scale of LHC Computing is such that it requires a worldwide effort to accumulate the necessary technical and financial resources
- The uncertainty in the affordable network BW implies that several scenarios of computing resource-distribution must be developed
- A distributed hierarchy of computing centres will lead to better use of the financial and manpower resources of CERN, the Collaborations, and the nations involved, than a highly centralised model focused at CERN
- Hence: The distributed model also provides better use of physics opportunities at the LHC by physicists and students
- At the top of the hierarchy is the CERN Center, with the ability to perform all analysis-related functions, but not the ability to do them completely
- At the next step in the hierarchy is a collection of large, multi-service Tier1 Regional Centres, each with
- 10-20% of the CERN capacity devoted to one experiment
- There will be Tier2 or smaller special purpose centers in many regions
11Grid-Hierarchy Concept
- Matched to the Worldwide-Distributed Collaboration Structure of LHC Experiments
- Best Suited for the Multifaceted Balance Between
- Proximity of the data to centralized processing resources
- Proximity to end-users for frequently accessed data
- Efficient use of limited network bandwidth (especially transoceanic and many world regions) through organized caching/mirroring/replication
- Appropriate use of (world-) regional and local computing and data handling resources
- Effective involvement of scientists and students in each world region, in the data analysis and the physics
12MONARC Phase 1 and 2 Deliverables
- September 1999: Benchmark test validating the simulation
- Milestone completed
- Fall 1999: A Baseline Model representing a possible (somewhat simplified) solution for LHC Computing.
- Baseline numbers for a set of system and analysis process parameters
- CPU times, data volumes, frequency and site of jobs and data...
- Reasonable ranges of parameters
- Derivatives: How the effectiveness depends on some of the more sensitive parameters
- Agreement of the experiments on the reasonableness of the Baseline Model
- Chapter on Computing Models in the CMS and ATLAS Computing Technical Progress Reports
13MONARC and Regional Centres
- MONARC RC Representative Meetings in April and August
- Regional Centre Planning well-advanced, with optimistic outlook, in US (FNAL for CMS; BNL for ATLAS), France (CCIN2P3), Italy
- Proposals to be submitted late this year or early next
- Active R&D and prototyping underway, especially in US, Italy, Japan; and UK (LHCb), Russia (MSU, ITEP), Finland (HIP/Tuovi)
- Discussions in the national communities also underway in Japan, Finland, Russia, UK, Germany
- Varying situations according to the funding structure and outlook
- Need for more active planning outside of US, Europe, Japan, Russia
- Important for R&D and overall planning
- There is a near-term need to understand the level and sharing of support for LHC computing between CERN and the outside institutes, to enable the planning in several countries to advance.
- MONARC and CMS/SCB assumption: traditional 1/3 : 2/3 sharing
14MONARC Working Groups and Chairs
- Analysis Process Design
- P. Capiluppi (Bologna, CMS)
- Architectures
- Joel Butler (FNAL, CMS)
- Simulation
- Krzysztof Sliwa (Tufts, ATLAS)
- Testbeds
- Lamberto Luminari (Rome, ATLAS)
- Steering
- Laura Perini (Milan, ATLAS)
- Harvey Newman (Caltech, CMS)
- Regional Centres Committee
15MONARC Architectures WG
- Discussion and study of Site Requirements
- Analysis task division between CERN and RC
- Facilities required with different analysis scenarios, and network bandwidth
- Support required to (a) sustain the Centre, and (b) contribute effectively to the distributed system
- Reports
- Rough Sizing Estimates for a Large LHC Experiment Facility
- Computing Architectures of Existing Experiments
- LEP, FNAL Run2, CERN Fixed Target (NA45, NA48), FNAL Fixed Target (KTeV, FOCUS)
- Regional Centres for LHC Computing (functionality and services)
- Computing Architectures of Future Experiments (in progress)
- Babar, RHIC, COMPASS
- Conceptual Designs, Drawings and Specifications for Candidate Site Architecture
16Comparisons with LHC sized experiment: CMS or ATLAS
- Total CPU: CMS or ATLAS 1.5-2,000,000 MSi95 (Current Concepts; maybe for 10^33 Luminosity)
17Architectural Sketch: One Major LHC Experiment, At CERN (L. Robertson)
- Mass Market Commodity PC Farms
- LAN-SAN and LAN-WAN Stars (Switch/Routers)
- Tapes (Many Drives for ALICE): an archival medium only?
18MONARC Architectures WG: Lessons and Challenges for LHC
- SCALE: 100 Times more CPU and 10 Times more Data than CDF at Run2 (2000-2003)
- DISTRIBUTION: Mostly Achieved in HEP Only for Simulation. For Analysis (and some re-Processing), it will not happen without advance planning and commitments
- REGIONAL CENTRES: Require Coherent support, continuity, the ability to maintain the code base, calibrations and job parameters up-to-date
- HETEROGENEITY: Of facility architecture and mode of use, and of operating systems, must be accommodated.
- FINANCIAL PLANNING: Analysis of the early planning for the LEP era showed a definite tendency to underestimate the requirements (by more than an order of magnitude)
- Partly due to budgetary considerations
19Regional Centre Architecture: Example by I. Gaines
- Tape Mass Storage; Disk Servers; Database Servers
- Data Import and Data Export, with links to CERN, Tapes, Tier 2, Local institutes and Desktops
- Production Reconstruction: Raw/Sim -> ESD; Scheduled, predictable; experiment/physics groups
- Production Analysis: ESD -> AOD, AOD -> DPD; Scheduled; Physics groups
- Individual Analysis: AOD -> DPD and plots; Chaotic; Physicists
- Physics Software Development
- R&D Systems and Testbeds
- Info servers; Code servers
- Web Servers; Telepresence Servers
- Training; Consulting; Help Desk
20MONARC Architectures WG: Regional Centre Facilities and Services
- Regional Centres Should Provide
- All technical and data services required to do physics analysis
- All Physics Objects, Tags and Calibration data
- Significant fraction of raw data
- Caching or mirroring calibration constants
- Excellent network connectivity to CERN and the region's users
- Manpower to share in the development of common maintenance, validation and production software
- A fair share of post- and re-reconstruction processing
- Manpower to share in the work on Common R&D Projects
- Service to members of other regions on a (?) best effort basis
- Excellent support services for training, documentation, troubleshooting at the Centre or remote sites served by it
- Long Term Commitment for staffing, hardware evolution and support for R&D, as part of the distributed data analysis architecture
21MONARC Analysis Process WG
- How much data is processed by how many people, how often, in how many places, with which priorities
- Analysis Process Design: Initial Steps
- Consider number and type of processing and analysis jobs, frequency, number of events, data volumes, CPU etc.
- Consider physics goals, triggers, signals and background rates
- Studies covered Reconstruction, Selection/Sample Reduction (one or more passes), Analysis, Simulation
- Lessons from existing experiments are limited: each case is tuned to the detector, run conditions, physics goals and technology of the time
- Limited studies so far, from the user rather than the system point of view; more as feedback from simulations is obtained
- Limitations on CPU dictate a largely Physics Analysis Group oriented approach to reprocessing of data
- And Regional (local) support for individual activities
- Implies dependence on the RC Hierarchy
22MONARC Analysis Process: Initial Sharing Assumptions
- Assume similar computing capacity available outside CERN for re-processing and data analysis
- There is no allowance for event simulation and reconstruction of simulated data, which it is assumed will be performed entirely outside CERN
- Investment, services and infrastructure should be optimised to reduce overall costs (TCO)
- Tape sharing makes sense if ALICE needs so much more at a different time of the year
- First two assumptions would likely result in at least a 1/3 : 2/3 CERN : Outside ratio of resources (i.e., likely to be larger outside).
23MONARC Analysis Process Example
24MONARC Analysis Process Baseline: Group-Oriented Analysis
25MONARC Baseline Analysis Process: ATLAS/CMS Reconstruction Step
26Monarc Analysis Model Baseline: Event Sizes and CPU Times
- Sizes
- Raw data: 1 MB/event
- ESD: 100 KB/event
- AOD: 10 KB/event
- TAG or DPD: 1 KB/event
- CPU Time in SI95 seconds
- (without ODBMS overhead, 20%)
- Creating ESD (from Raw): 350
- Selecting ESD: 0.25
- Creating AOD (from ESD): 2.5
- Creating TAG (from AOD): 0.5
- Analyzing TAG or DPD: 3.0
- Analyzing AOD: 3.0
- Analyzing ESD: 3.0
- Analyzing RAW: 350
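The per-event numbers above translate into yearly volumes and sustained CPU load once an event count is fixed. The sketch below assumes 10^9 events per year, a figure inferred here from 1 MB/event and roughly 1 PB/year of raw data, and it assumes perfectly efficient, continuous CPU use; neither assumption is stated on this slide. The AOD and TAG/DPD yearly volumes quoted in the baseline tables are smaller than this simple product, so the sketch should not be read as reproducing them.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
EVENTS_PER_YEAR = 1e9  # assumption: 1 PB/year of raw data at 1 MB/event

# Per-event sizes in bytes, from the slide.
EVENT_SIZE_BYTES = {"RAW": 1e6, "ESD": 100e3, "AOD": 10e3, "TAG": 1e3}

# Per-event CPU cost in SI95 seconds, from the slide (no ODBMS overhead).
CPU_SI95_SEC = {"create_ESD": 350.0, "create_AOD": 2.5, "create_TAG": 0.5}

def yearly_volume_tb(kind: str, events: float = EVENTS_PER_YEAR) -> float:
    """Yearly data volume in TB (1 TB = 1e12 bytes) for one data tier."""
    return EVENT_SIZE_BYTES[kind] * events / 1e12

def sustained_si95(step: str, events: float = EVENTS_PER_YEAR) -> float:
    """SI95 units needed to run one processing step once over a year's events,
    spread evenly over the year at 100% efficiency (an idealisation)."""
    return CPU_SI95_SEC[step] * events / SECONDS_PER_YEAR

print(yearly_volume_tb("RAW"), "TB/year raw")
print(yearly_volume_tb("ESD"), "TB/year ESD")
print(round(sustained_si95("create_ESD")), "SI95 for one reconstruction pass")
```

On these assumptions a single reconstruction pass needs on the order of 11 KSI95 sustained, so most of a centre's capacity would go to repeated passes and analysis rather than to the first pass.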
27Monarc Analysis Model Baseline: ATLAS or CMS at CERN Center
- CPU Power: 520 KSI95
- Disk space: 540 TB
- Tape capacity: 3 PB, 400 MB/sec
- Link speed to RC: 40 MB/sec (1/2 of 622 Mbps)
- Raw data: 100%, 1-1.5 PB/year
- ESD data: 100%, 100-150 TB/year
- Selected ESD: 100%, 20 TB/year
- Revised ESD: 100%, 40 TB/year
- AOD data: 100%, 2 TB/year
- Revised AOD: 100%, 4 TB/year
- TAG/DPD: 100%, 200 GB/year
- Simulated data: 100%, 100 TB/year (repository)
- Covering all Analysis Groups, each selecting 1% of Total ESD or AOD data for a Typical Analysis
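The link speed quoted in MB/sec follows from the line rate in Mbps. As a quick check, the sketch below converts bits to bytes and applies the one-half usable fraction from the slide, then estimates how long one dataset would take to move. The function names are illustrative, and sustained transfer at the full usable rate is an optimistic assumption.

```python
def usable_mb_per_s(link_mbps: float, usable_fraction: float = 0.5) -> float:
    """Usable rate in MB/s: 8 bits per byte, times the usable share of the link."""
    return link_mbps / 8.0 * usable_fraction

def transfer_days(volume_tb: float, rate_mb_per_s: float) -> float:
    """Days needed to move a volume at a constant rate (1 TB = 1e6 MB)."""
    seconds = volume_tb * 1e6 / rate_mb_per_s
    return seconds / 86400.0

cern_to_rc = usable_mb_per_s(622)   # about 39 MB/s, quoted as 40 MB/sec
rc_to_tier2 = usable_mb_per_s(155)  # about 9.7 MB/s, quoted as 10 MB/sec

print(f"{cern_to_rc:.1f} MB/s, {rc_to_tier2:.1f} MB/s")
print(f"20 TB of selected ESD at 40 MB/s: {transfer_days(20, 40):.1f} days")
```

Moving the 20 TB/year of selected ESD would thus occupy the CERN-to-RC link for nearly six days at full usable rate, which illustrates why caching and replication choices matter.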
28Monarc Analysis Model Baseline: ATLAS or CMS at CERN Center
- CPU Power: 520 KSI95
- Disk space: 540 TB
- Tape capacity: 3 PB, 400 MB/sec
- Link speed to RC: 40 MB/sec (1/2 of 622 Mbps)
- Raw data: 100%, 1-1.5 PB/year
- ESD data: 100%, 100-150 TB/year
- Selected ESD: 100%, 20 TB/year
- Revised ESD: 100%, 40 TB/year
- AOD data: 100%, 2 TB/year
- Revised AOD: 100%, 4 TB/year
- TAG/DPD: 100%, 200 GB/year
- Simulated data: 100%, 100 TB/year (repository)
- Some of these Basic Numbers require further Study
- LHCb (Prelim.) values shown alongside: 300 KSI95; ?; 200 TB/yr; 140 TB/yr; 1-10 TB/yr; 70 TB/yr
29Monarc Analysis Model Baseline: ATLAS or CMS Typical Tier1 RC
- CPU Power: 100 KSI95
- Disk space: 100 TB
- Tape capacity: 300 TB, 100 MB/sec
- Link speed to Tier2: 10 MB/sec (1/2 of 155 Mbps)
- Raw data: 1%, 10-15 TB/year
- ESD data: 100%, 100-150 TB/year
- Selected ESD: 25%, 5 TB/year
- Revised ESD: 25%, 10 TB/year
- AOD data: 100%, 2 TB/year
- Revised AOD: 100%, 4 TB/year
- TAG/DPD: 100%, 200 GB/year
- Simulated data: 25%, 25 TB/year (repository)
- Covering Five Analysis Groups, each selecting 1% of Total ESD or AOD data for a Typical Analysis
- Covering All Analysis Groups
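The Tier1 volumes are the CERN Center volumes scaled by the share held at a Tier1 Regional Centre. The short sketch below checks that arithmetic for the tiers whose share is below 100%, reading the bare numbers in the baseline tables as percentages; the dictionary and function names are illustrative.

```python
# CERN Center yearly volumes in TB as (low, high), from the CERN baseline.
CERN_TB = {
    "Raw data": (1000, 1500),        # 1-1.5 PB/year
    "Selected ESD": (20, 20),
    "Revised ESD": (40, 40),
    "Simulated data": (100, 100),
}

# Share of each tier held at a typical Tier1 Regional Centre, in percent.
TIER1_PERCENT = {
    "Raw data": 1,
    "Selected ESD": 25,
    "Revised ESD": 25,
    "Simulated data": 25,
}

def tier1_volume_tb(kind: str) -> tuple:
    """Scale the CERN (low, high) volume by the Tier1 share."""
    low, high = CERN_TB[kind]
    share = TIER1_PERCENT[kind]
    return (low * share / 100, high * share / 100)

for kind in CERN_TB:
    print(kind, tier1_volume_tb(kind))
```

The results (10-15 TB, 5 TB, 10 TB and 25 TB per year) match the Tier1 figures listed on this slide.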
30MONARC Analysis Process WG: A Short List of Upcoming Issues
- Priorities, schedules and policies
- Production vs. Analysis Group vs. Individual activities
- Allowed percentage of access to higher data tiers (TAG / Physics Objects / Reconstructed / RAW)
- Improved understanding of the Data Model, and ODBMS
- Including MC production; simulated data storage and access
- Mapping the Analysis Process onto heterogeneous distributed resources
- Determining the role of Institutes' workgroup servers and desktops, in the Regional Centre Hierarchy
- Understanding how to manage persistent data: e.g. storage / migration / transport / re-compute strategies
- Deriving a methodology for Model testing and optimisation
- Metrics for evaluating the global efficiency of a Model: Cost vs throughput; turnaround; reliability of data access
31MONARC Testbeds WG
- Measurements of Key Parameters governing the behavior and scalability of the Models
- Simple testbed configuration defined and implemented
- Sun Solaris 2.6, C++ compiler version 4.2
- Objectivity 5.1 with /C++, /stl, /FTO, /Java options
- Set up at CNAF, FNAL, Genova, Milano, Padova, Roma, KEK, Tufts, CERN
- Four Use Case Applications Using Objectivity
- ATLASFAST, GIOD/JavaCMS, ATLAS 1 TB Milestone, CMS Test Beams
- System Performance Tests; Simulation Validation Milestone Carried Out: See I. Legrand talk
32MONARC Testbed Systems
33MONARC Testbeds WG: Isolation of Key Parameters
- Some Parameters Measured, Installed in the MONARC Simulation Models, and Used in First Round Validation of Models.
- Objectivity AMS Response Time-Function, and its dependence on
- Object clustering, page-size, data class-hierarchy and access pattern
- Mirroring and caching (e.g. with the Objectivity DRO option)
- Scalability of the System Under Stress
- Performance as a function of the number of jobs, relative to the single-job performance
- Performance and Bottlenecks for a variety of data access patterns
- Frequency of following TAG -> AOD; AOD -> ESD; ESD -> RAW
- Data volume accessed remotely
- Fraction on Tape, and on Disk
- As Function of Net Bandwidth; Use of QoS
34MONARC Simulation
- A CPU- and code-efficient approach for the simulation of distributed systems has been developed for MONARC
- provides an easy way to map the distributed data processing, transport, and analysis tasks onto the simulation
- can handle dynamically any Model configuration, including very elaborate ones with hundreds of interacting complex Objects
- can run on real distributed computer systems, and may interact with real components
- The Java (JDK 1.2) environment is well-suited for developing a flexible and distributed process oriented simulation.
- This Simulation program is still under development, and dedicated measurements to evaluate realistic parameters and validate the simulation program are in progress.
35Example: Physics Analysis at Regional Centres
- Similar data processing jobs are performed in several RCs
- Each Centre has TAG and AOD databases replicated.
- Main Centre provides ESD and RAW data
- Each job processes AOD data, and also a fraction of ESD and RAW.
36Example Physics Analysis
37Simple Validation Measurements: The AMS Data Access Case
Figure: Simulation compared with Measurements; 4 CPUs Client, LAN, Raw Data DB
38MONARC Strategy and Tools for Phase 2
- Strategy: Vary System Capacity and Network Performance Parameters Over a Wide Range
- Avoid complex, multi-step decision processes that could require protracted study.
- Keep for a possible Phase 3
- Majority of the workload satisfied in an acceptable time
- Up to minutes for interactive queries, up to hours for short jobs, up to a few days for the whole workload
- Determine requirements baselines and/or flaws in certain Analysis Processes in this way
- Perform a comparison of a CERN-tralised Model, and suitable variations of Regional Centre Models
- Tools and Operations to be Designed in Phase 2
- Query estimators
- Affinity evaluators, to determine proximity of multiple requests in space or time
- Strategic algorithms for caching, reclustering, mirroring, or pre-emptively moving data (or jobs or parts of jobs)
39MONARC Phase 2 Detailed Milestones
July 1999: Complete Phase 1; Begin Second Cycle of Simulations with More Refined Models
40MONARC Possible Phase 3
- TIMELINESS and USEFUL IMPACT
- Facilitate the efficient planning and design of mutually compatible site and network architectures, and services
- Among the experiments, the CERN Centre and Regional Centres
- Provide modelling consultancy and service to the experiments and Centres
- Provide a core of advanced R&D activities, aimed at LHC computing system optimisation and production prototyping
- Take advantage of work on distributed data-intensive computing for HENP this year in other next generation projects
- For example in US: Particle Physics Data Grid (PPDG) of DoE/NGI; A Physics Optimized Grid Environment for Experiments (APOGEE) to DoE/HENP; joint GriPhyN proposal to NSF by ATLAS/CMS/LIGO
- See H. Newman, http://www.cern.ch/MONARC/progress_report/longc7.html
41MONARC Phase 3
- Possible Technical Goal: System Optimisation; Maximise Throughput and/or Reduce Long Turnaround
- Include long and potentially complex decision-processes in the studies and simulations
- Potential for substantial gains in the work performed or resources saved
- Phase 3 System Design Elements
- RESILIENCE, resulting from flexible management of each data transaction, especially over WANs
- FAULT TOLERANCE, resulting from robust fall-back strategies to recover from abnormal conditions
- SYSTEM STATE and PERFORMANCE TRACKING, to match and co-schedule requests and resources, detect or predict faults
- Synergy with PPDG and other Advanced R&D Projects.
- Potential Importance for Scientific Research and Industry: Simulation of Distributed Systems for Data-Intensive Computing.
42MONARC Status: Conclusions
- MONARC is well on its way to specifying baseline Models representing cost-effective solutions to LHC Computing.
- Initial discussions have shown that LHC computing has a new scale and level of complexity.
- A Regional Centre hierarchy of networked centres appears to be the most promising solution.
- A powerful simulation system has been developed, and we are confident of delivering a very useful toolset for further model studies by the end of the project.
- Synergy with other advanced R&D projects has been identified. This may be of considerable mutual benefit.
- We will deliver important information, and example Models
- That is very timely for the Hoffmann Review and discussions of LHC Computing over the next months
- In time for the Computing Progress Reports of ATLAS and CMS
43LHC Data Models: RD45
- HEP data models are complex!
- Rich hierarchy of hundreds of complex data types (classes)
- Many relations between them
- Different access patterns (Multiple Viewpoints)
- LHC experiments rely on OO technology
- OO applications deal with networks of objects (and containers)
- Pointers (or references) are used to describe relations
- Existing solutions do not scale
- Solution suggested by RD45: ODBMS coupled to a Mass Storage System
Figure: network of objects (Event, Tracker, Calorimeter, TrackList, HitList, Track, Hit)
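The object network named in the figure can be pictured with a few small classes in which relations are held as references rather than copies. This is only a sketch: the class names come from the figure labels, but the fields, the containment layout and the helper function are assumptions made for illustration, and nothing here represents the RD45 or Objectivity schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Hit:
    x: float
    y: float
    z: float

@dataclass
class Track:
    # A track refers to hits that are owned by the tracker's hit list.
    hits: List[Hit] = field(default_factory=list)

@dataclass
class Tracker:
    track_list: List[Track] = field(default_factory=list)
    hit_list: List[Hit] = field(default_factory=list)

@dataclass
class Calorimeter:
    total_energy: float = 0.0  # hypothetical field

@dataclass
class Event:
    tracker: Tracker
    calorimeter: Calorimeter

def unused_hits(event: Event) -> List[Hit]:
    """Hits in the hit list that no track refers to (compared by identity)."""
    used = {id(h) for t in event.tracker.track_list for h in t.hits}
    return [h for h in event.tracker.hit_list if id(h) not in used]

hits = [Hit(0.0, 0.0, 1.0), Hit(0.1, 0.0, 2.0), Hit(5.0, 5.0, 1.0)]
track = Track(hits=[hits[0], hits[1]])
event = Event(Tracker(track_list=[track], hit_list=hits), Calorimeter(12.5))

print(len(unused_hits(event)))
```

Following such references across a persistent store is the kind of access pattern that an ODBMS coupled to mass storage has to serve efficiently.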
44System View of Data Analysis by 2005
- Multi-Petabyte Object Database Federation
- Backed by a Networked Set of Archival Stores
- High Availability and Immunity from Corruption
- Seamless response to database queries
- Location Independence; storage brokers; caching
- Clustering and Reclustering of Objects
- Transfer only useful data
- Tape/disk; across networks; disk/client
- Access and Processing Flexibility
- Resource and application profiling, state tracking, co-scheduling
- Continuous retrieval/recalculation/storage decisions
- Trade off data storage, CPU and network capabilities to optimize performance and costs
45CMS Analysis and Persistent Object Store
- Data Organized In a(n Object) Hierarchy
- Raw, Reconstructed (ESD), Analysis Objects (AOD), Tags
- Data Distribution
- All raw, reconstructed and master parameter DBs at CERN
- All event TAG and AODs at all regional centers
- Selected reconstructed data sets at each regional center
- HOT data (frequently accessed) moved to RCs
Figure: CMS data flow around the Persistent Object Store (Object Database Management System): L1, L2/L3, L4 Filtering; Slow Control and Detector Monitoring; Simulation; Calibrations, Group Analyses; User Analysis; Common Filters and Pre-Emptive Object Creation; On Demand Object Creation
46GIOD Summary (Caltech/CERN/FNAL/HP/SDSC)
- GIOD has
- Constructed a Terabyte-scale set of fully simulated events and used these to create a large OO database
- Learned how to create large database federations
- Completed 100 (to 170) Mbyte/sec CMS Milestone
- Developed prototype reconstruction and analysis codes, and Java 3D OO visualization prototypes, that work seamlessly with persistent objects over networks
- Deployed facilities and database federations as useful testbeds for Computing Model studies
Figure labels: Hit, Track, Detector
47Babar OOFS Putting The Pieces Together
48Dynamic Load Balancing: Hierarchical Secure AMS
- Defer Request Protocol
- Transparently delays client while data is made available
- Accommodates high latency storage systems (e.g., tape)
- Request Redirect Protocol
- Redirects client to an alternate AMS
- Provides for dynamic replication and real-time load balancing
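From the client's side, the two protocols amount to a loop that either waits or switches server until data arrives. The sketch below is a toy model of that behaviour only: the `Reply` type, the reply kinds and the `send` callback are hypothetical and are not the AMS wire protocol or any Objectivity API.

```python
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Reply:
    kind: str                  # "DATA", "DEFER" or "REDIRECT" (hypothetical)
    payload: bytes = b""
    wait_seconds: float = 0.0  # used by DEFER
    alternate: str = ""        # used by REDIRECT

def fetch(server: str, object_id: str,
          send: Callable[[str, str], Reply],
          sleep: Callable[[float], None] = time.sleep,
          max_steps: int = 10) -> bytes:
    """Keep asking until data arrives, honouring defer and redirect replies."""
    for _ in range(max_steps):
        reply = send(server, object_id)
        if reply.kind == "DATA":
            return reply.payload
        if reply.kind == "DEFER":
            sleep(reply.wait_seconds)   # e.g. while a tape is being staged
        elif reply.kind == "REDIRECT":
            server = reply.alternate    # try a less loaded or replica server
        else:
            raise ValueError(f"unexpected reply kind: {reply.kind}")
    raise TimeoutError("no data after max_steps attempts")

# A scripted fake server pair, used only to exercise the loop.
script = {
    "ams1": [Reply("REDIRECT", alternate="ams2")],
    "ams2": [Reply("DEFER", wait_seconds=0.0), Reply("DATA", payload=b"event-42")],
}
calls = []

def fake_send(server: str, object_id: str) -> Reply:
    calls.append(server)
    return script[server].pop(0)

data = fetch("ams1", "event-42", fake_send, sleep=lambda s: None)
print(data, calls)
```

Because the waiting and switching happen inside `fetch`, the calling application sees only a slower read, which is the sense in which the delay is transparent.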
49Regional Centers Concept: A Data Grid Hierarchy
- LHC Grid Hierarchy Example
- Tier0: CERN
- Tier1: National Regional Center
- Tier2: Regional Center
- Tier3: Institute Workgroup Server
- Tier4: Individual Desktop
- Total 5 Levels
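A simple way to read the five levels is as a lookup chain: a request starts at the desktop and moves up toward CERN until some tier holds the data. The sketch below illustrates that reading only; the `nearest_copy` function and the example holdings are hypothetical, and actual placement would follow the data distribution policies of each experiment.

```python
from typing import Dict, Optional, Set

# Tiers ordered from closest to the user to farthest.
TIERS = ["Tier4", "Tier3", "Tier2", "Tier1", "Tier0"]

def nearest_copy(dataset: str, holdings: Dict[str, Set[str]]) -> Optional[str]:
    """Return the closest tier holding the dataset, or None if no tier has it."""
    for tier in TIERS:
        if dataset in holdings.get(tier, set()):
            return tier
    return None

# Hypothetical holdings: smaller data tiers are replicated closer to the user.
holdings = {
    "Tier0": {"RAW", "ESD", "AOD", "TAG"},
    "Tier1": {"ESD", "AOD", "TAG"},
    "Tier2": {"AOD", "TAG"},
    "Tier4": {"TAG"},
}

for name in ["TAG", "AOD", "ESD", "RAW"]:
    print(name, "->", nearest_copy(name, holdings))
```

In this toy layout small, frequently used data is found close to the user, while raw data requires going all the way to Tier0.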
50Background: Why Grids?
For transparent, rapid access and delivery of Petabyte-scale data (and Multi-TIPS computing resources)
- I. Foster, ANL/Chicago
- Because the resources needed to solve complex problems are rarely colocated
- Advanced scientific instruments
- Large amounts of storage
- Large amounts of computing
- Groups of smart people
- For a variety of reasons
- Resource allocations not optimized for one application
- Required resource configurations change
- Different views of priorities and truth
51Grid Services Architecture
Adapted from Ian Foster: there are computing grids, data grids, access (collaborative) grids,...
52Roles of HENP Projects for Distributed Analysis (-> Grids)
- RD45, GIOD: Networked Object Databases
- Clipper/GC, FNAL/SAM: High speed access to Objects or File data for processing and analysis
- SLAC/OOFS: Distributed File System, Objectivity Interface
- NILE, Condor: Fault Tolerant Distributed Computing with Heterogeneous CPU Resources
- MONARC: LHC Computing Models: Architecture, Simulation, Testbeds; Strategy, Politics
- PPDG: First Distributed Data Services and Grid System Prototype
- ALDAP: OO Database Structures and Access Methods for Astrophysics and HENP Data
- APOGEE: Full-Scale Grid Design; Instrumentation, System Modeling and Simulation, Evaluation/Optimization
- GriPhyN: Production Prototype Grid in Hardware and Software; then Production
53ALDAP (NSF/KDI) Project
- ALDAP: Accessing Large Data Archives in Astronomy and Particle Physics
- NSF Knowledge Discovery Initiative (KDI)
- CALTECH, Johns Hopkins, FNAL(SDSS)
- Explore advanced adaptive database structures, physical data storage hierarchies for archival storage of next generation astronomy and particle physics data
- Develop spatial indexes, novel data organizations, distribution and delivery strategies, for Efficient and transparent access to data across networks
- Create prototype network-distributed data query execution systems using Autonomous Agent workers
- Explore commonalities and find effective common solutions for particle physics and astrophysics data
54The China Clipper Project: A Data Intensive Grid
ANL-SLAC-Berkeley
- China Clipper Goal
- Develop and demonstrate middleware allowing applications transparent, high-speed access to large data sets distributed over wide-area networks.
- Builds on expertise and assets at ANL, LBNL and SLAC
- NERSC, ESnet
- Builds on Globus Middleware and high-performance distributed storage system (DPSS from LBNL)
- Initial focus on large DOE HENP applications
- RHIC/STAR, BaBar
- Demonstrated data rates to 57 Mbytes/sec.
55HENP Grand Challenge/Clipper Testbed and Tasks
- High-Speed Testbed
- Computing and networking (NTON, ESnet) infrastructure
- Differentiated Network Services
- Traffic shaping on ESnet
- End-to-end Monitoring Architecture (QE, QM, CM)
- Traffic analysis, event monitor agents to support traffic shaping and CPU scheduling
- Transparent Data Management Architecture
- OOFS/HPSS, DPSS/ADSM
- Application Demonstration
- Standard Analysis Framework (STAF)
- Access data at SLAC, LBNL, or ANL (net and data quality)
56The Particle Physics Data Grid (PPDG)
- DoE/NGI Next Generation Internet Project
- ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
- Goal: To be able to query and partially retrieve data from PB data stores across Wide Area Networks within seconds
- Drive progress in the development of the necessary middleware, networks and fundamental computer science of distributed systems.
- Deliver some of the infrastructure for widely distributed data analysis at multi-PetaByte scales by 100s to 1000s of physicists
57PPDG First Year Deliverable: Site-to-Site Replication Service
PRIMARY SITE: Data Acquisition, CPU, Disk, Tape Robot
SECONDARY SITE: CPU, Disk, Tape Robot
- Network Protocols Tuned for High Throughput
- Use of DiffServ for (1) Predictable high priority delivery of high-bandwidth data streams (2) Reliable background transfers
- Use of integrated instrumentation to detect/diagnose/correct problems in long-lived high speed transfers (NetLogger, DoE/NGI developments)
- Coordinated reservation/allocation techniques for storage-to-storage performance
- First Year Goal: Optimized cached read access to 1-10 Gbytes, drawn from a total data set of order One Petabyte
58PPDG Multi-site Cached File Access System
- PRIMARY SITE: Data Acquisition, Tape, CPU, Disk, Robot
- Satellite Site (three shown): Tape, CPU, Disk, Robot
- University (three shown): CPU, Disk, Users
59First Year PPDG System Components
- Middleware Components (Initial Choice): See PPDG Proposal Page 15
- Object and File-Based Application Services: Objectivity/DB (SLAC enhanced); GC Query Object, Event Iterator, Query Monitor; FNAL SAM System
- Resource Management: Start with Human Intervention (but begin to deploy resource discovery and mgmnt tools)
- File Access Service: Components of OOFS (SLAC)
- Cache Manager: GC Cache Manager (LBNL)
- Mass Storage Manager: HPSS, Enstore, OSM (Site-dependent)
- Matchmaking Service: Condor (U. Wisconsin)
- File Replication Index: MCAT (SDSC)
- Transfer Cost Estimation Service: Globus (ANL)
- File Fetching Service: Components of OOFS
- File Mover(s): SRB (SDSC); Site specific
- End-to-end Network Services: Globus tools for QoS reservation
- Security and authentication: Globus (ANL)
60PPDG Middleware Architecture for Reliable High
Speed Data Delivery
Resource Management
Object-based and File-based Application Services
File Replication Index
Matchmaking Service
File Access Service
Cost Estimation
Cache Manager
File Fetching Service
Mass Storage Manager
File Mover
File Mover
End-to-End Network Services
Site Boundary
Security Domain
61PPDG Developments In 2000-2001
- Co-Scheduling algorithms
- Matchmaking and Prioritization
- Dual-Metric Prioritization
- Policy and Marginal Utility
- DiffServ on Networks to segregate tasks
- Performance Classes
- Transaction Management
- Cost Estimators
- Application/TM Interaction
- Checkpoint/Rollback
- Autonomous Agent Hierarchy
62Beyond Traditional Architectures: Mobile Agents (Java Aglets)
"Agents are objects with rules and legs" -- D. Taylor
Figure labels: Agent, Service, Application
- Mobile Agents: Reactive, Autonomous, Goal Driven, Adaptive
- Execute Asynchronously
- Reduce Network Load: Local Conversations
- Overcome Network Latency; Some Outages
- Adaptive -> Robust, Fault Tolerant
- Naturally Heterogeneous
- Extensible Concept: Agent Hierarchies
63Distributed Data Delivery and LHC Software Architecture
- LHC Software and/or Analysis Process must account for data and resource-related realities
- Delay for data location, queueing, scheduling; sometimes for transport and reassembly
- Allow for long transaction times, performance shifts, errors, out-of-order arrival of data
- Software Architectural Choices
- Traditional, single-threaded applications
- Allow for data arrival and reassembly OR
- Performance-Oriented (Complex)
- I/O requests up-front; multi-threaded; data driven; respond to ensemble of (changing) cost estimates
- Possible code movement as well as data movement
- Loosely coupled, dynamic: e.g. Agent-based implementation
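The performance-oriented choice, with I/O requests issued up-front and data handled as it arrives, can be sketched with a thread pool. This is a minimal illustration using the Python standard library: `fetch_chunk` merely simulates a remote read with a delay, and the chunk ids and delays are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

def fetch_chunk(chunk_id: int, delay: float) -> Tuple[int, bytes]:
    """Stand-in for a remote read: wait, then return the chunk with its id."""
    time.sleep(delay)
    return chunk_id, f"data-{chunk_id}".encode()

def fetch_all(requests: List[Tuple[int, float]], workers: int = 4) -> List[bytes]:
    """Issue every request up-front, accept arrivals in any order,
    then reassemble the chunks in the order they were requested."""
    arrived = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch_chunk, cid, delay) for cid, delay in requests]
        for future in as_completed(futures):   # data driven: handle as it lands
            cid, data = future.result()
            arrived[cid] = data
    return [arrived[cid] for cid, _ in requests]

# Chunk 0 is the slowest, so it arrives last but is still returned first.
result = fetch_all([(0, 0.05), (1, 0.0), (2, 0.01)])
print(result)
```

A traditional single-threaded application would instead wait for each chunk in turn, so the total delay would be the sum of the individual delays rather than roughly the longest one.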
64GriPhyN: First Production Scale Grid Physics Network
- Develop a New Form of Integrated Distributed System, while Meeting Primary Goals of the US LIGO and LHC Programs
- Single Unified GRID System Concept; Hierarchical Structure
- (Sub-)Implementations, for LIGO, SDSS, US CMS, US ATLAS
- 20 Centers: Few Each in US for LIGO, CMS, ATLAS, SDSS
- Aspects Complementary to Centralized DoE Funding
- University-Based Regional Tier2 Centers, Partnering with the Tier1 Centers
- Emphasis on Training, Mentoring and Remote Collaboration
- Making the Process of Search and Discovery Accessible to Students
- GriPhyN Web Site: http://www.phys.ufl.edu/avery/mre/
- White Paper: http://www.phys.ufl.edu/avery/mre/white_paper.html
65APOGEE/GriPhyN Data Grid Implementation
- An Integrated Distributed System of Tier1 and Tier2 Centers
- Flexible, relatively low-cost (PC-based) Tier2 architectures
- Medium-scale (for the LHC era) data storage and I/O capability
- Well-adapted to local operation; modest system engineer support
- Meet changing local and regional needs in the active, early phases of data analysis
- Interlinked with Gbps Network Links: Internet2 and Regional Nets, Circa 2001-2005
- State of the Art QoS techniques to prioritise and shape traffic, to manage bandwidth. Preview transoceanic BW, within the US
- A working Production-Prototype (2001-2003) for Petabyte-Scale Distributed Computing Models
- Focus on LIGO (+ BaBar and Run2) handling of real data, and LHC Mock Data Challenges with simulated data
- Meet the needs, and learn from system performance under stress
66VRVS: From Videoconferencing to Collaborative Environments
- > 1400 registered hosts, 22 reflectors, 34 Countries
- Running in U.S., Europe and Asia
- Switzerland: CERN (2)
- Italy: CNAF Bologna
- UK: Rutherford Lab
- France: IN2P3 Lyon, Marseilles
- Germany: Heidelberg Univ.
- Finland: FUNET
- Spain: IFCA-Univ. Cantabria
- Russia: Moscow State Univ., Tver. U.
- U.S.: Caltech, LBNL, SLAC, FNAL, ANL, BNL, Jefferson Lab., DoE HQ Germantown
- Asia: Academia Sinica, Taiwan
- South America: CeCalcula, Venezuela
68Role of Simulation for Distributed Systems
- Simulations are widely recognized and used as essential tools for the design, performance evaluation and optimisation of complex distributed systems
- From battlefields to agriculture; from the factory floor to telecommunications systems
- Discrete event simulations with an appropriate and high level of abstraction are powerful tools
- Time intervals, interrupts and performance/load characteristics are the essentials
- Not yet an integral part of the HENP culture, but
- Some experience in trigger, DAQ and tightly coupled computing systems: CERN CS2 models
- Simulation is a vital part of the study of site architectures, network behavior, data access/processing/delivery strategies, for HENP Grid Design and Optimization
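To make the idea of a discrete event simulation concrete, here is a minimal Python sketch of a single-server job queue driven by a time-ordered event list. It is a toy model under stated assumptions (one server, fixed service time, FIFO order), not any of the simulation tools referred to in the talk; the function name `simulate` and its parameters are hypothetical.

```python
import heapq
from collections import deque


def simulate(arrivals, service_time):
    """Simulate one FIFO server; return {job index: completion time}."""
    events = []   # heap of (time, sequence, kind, job), ordered by time
    seq = 0       # tie-breaker so simultaneous events keep insertion order
    for job, t in enumerate(arrivals):
        heapq.heappush(events, (t, seq, "arrive", job))
        seq += 1

    waiting = deque()
    busy = False
    done = {}
    while events:
        now, _, kind, job = heapq.heappop(events)   # jump to the next event
        if kind == "arrive":
            waiting.append(job)
        else:                                       # kind == "finish"
            done[job] = now
            busy = False
        if not busy and waiting:
            nxt = waiting.popleft()
            busy = True
            heapq.heappush(events, (now + service_time, seq, "finish", nxt))
            seq += 1
    return done
```

The simulated clock advances only from event to event, so the cost of a run depends on the number of events rather than on the length of the simulated time span.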
69Monitoring Architecture: Use of NetLogger as in CLIPPER
- End-to-end monitoring of grid assets is needed to
- Resolve network throughput problems
- Dynamically schedule resources
- Add precision-timed event monitor agents to
- ATM switches
- DPSS servers
- Testbed computational resources
- Produce trend analysis modules for monitor agents
- Make results available to applications
- See talk by B. Tierney
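The value of precision-timed event records for locating a throughput problem can be illustrated with a short Python sketch. It does not use NetLogger's API or log format; the `EventLog` class, the event names and the `slowest_stage` helper are all hypothetical and serve only to show the principle.

```python
import time


class EventLog:
    """Collects (timestamp, event name, request id) records from monitor points."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.events = []

    def record(self, name, request_id):
        self.events.append((self.clock(), name, request_id))

    def stage_durations(self, request_id):
        """Elapsed time between consecutive events of one request."""
        stamps = sorted((t, n) for t, n, r in self.events if r == request_id)
        return [(f"{a} -> {b}", tb - ta)
                for (ta, a), (tb, b) in zip(stamps, stamps[1:])]


def slowest_stage(log, request_id):
    """Name of the stage with the largest elapsed time, or None if unknown."""
    durations = log.stage_durations(request_id)
    if not durations:
        return None
    return max(durations, key=lambda d: d[1])[0]
```

Comparing the per-stage times of one request end to end shows whether the delay lies in the network, the storage server or the application, which is the kind of result a scheduler or application could then act on.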
70Summary
- The HENP/LHC Data Analysis Problem
- Worldwide-distributed Petabyte scale compacted binary data, and computing resources
- Development of a robust networked data access and analysis system is mission-critical
- An aggressive R&D program is required
- to develop systems for reliable data access, processing and analysis across an ensemble of networks
- An effective inter-field partnership is now developing through many R&D projects
- HENP analysis is now one of the driving forces for the development of Data Grids
- Solutions to this problem could be widely applicable in other scientific fields and industry, by LHC startup
71LHC Computing: Upcoming Issues
- Cost of Computing at CERN for the LHC Program
- May Exceed 100 MCHF at CERN; Correspondingly More in Total
- Some ATLAS/CMS Basic Numbers (CPU, 100 kB Reco. Event) from 1996 Require Further Study
- We cannot scale up from previous generations (new methods)
- CERN/Outside Sharing: MONARC and CMS/SCB Use 1/3 : 2/3 Rule
- Computing Architecture and Cost Evaluation
- Integration and Total Cost of Ownership
- Possible Role of Central I/O Servers
- Manpower Estimates
- CERN versus scaled Regional Centre estimates
- Scope of services and support provided
- Limits of CERN support and service, and the need for Regional Centres
- Understanding that LHC Computing is Different
- A different scale, and worldwide distributed computing for the first time
- Continuing System R&D is required