Title: Data Grids for Next Generation Experiments
1 Data Grids for Next Generation Experiments
- Harvey B. Newman, California Institute of Technology
- ACAT2000, Fermilab, October 19, 2000
- http://l3www.cern.ch/newman/grids_acat2k.ppt
2 Physics and Technical Goals
- The extraction of small or subtle new discovery signals from large and potentially overwhelming backgrounds, or precision analysis of large samples
- Providing rapid access to event samples and subsets from massive data stores, growing from 300 Terabytes in 2001 to Petabytes by 2003, 10 Petabytes by 2006, and 100 Petabytes by 2010
- Providing analyzed results with rapid turnaround, by coordinating and managing the LIMITED computing, data handling and network resources effectively
- Enabling rapid access to the data and the collaboration, across an ensemble of networks of varying capability, using heterogeneous resources
3 Four LHC Experiments: The Petabyte to Exabyte Challenge
- ATLAS, CMS, ALICE, LHCb: Higgs and New particles; Quark-Gluon Plasma; CP Violation
- Data written to tape: 25 Petabytes/Year and UP (CPU: 6 MSi95 and UP)
- 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) total for the LHC Experiments (2010) (2020?)
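A back-of-the-envelope check of how the two numbers on this slide fit together: writing 25 PB/year and up accumulates into the 0.1-1 Exabyte range over one to a few decades. A minimal sketch; the rates above 25 PB/year and the year counts are illustrative assumptions, not from the slide.

```python
# Rough scale check (illustration only): accumulate a per-year tape-writing
# rate and express the total in Exabytes.
PB_PER_EB = 1000            # 1 Exabyte = 10^18 bytes = 1000 Petabytes

def accumulated_eb(pb_per_year: float, years: int) -> float:
    """Total data volume in Exabytes after `years` of running."""
    return pb_per_year * years / PB_PER_EB

# 25 PB/year is the figure quoted on the slide; 100 PB/year is an assumed
# later rate ("and UP"), used only to show how 1 EB becomes reachable.
for rate, years in [(25, 4), (25, 10), (100, 10)]:
    print(f"{rate:4d} PB/year for {years:2d} years -> {accumulated_eb(rate, years):.2f} EB")
```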
4 LHC Vision: Data Grid Hierarchy
- 1 Bunch crossing = 17 interactions per 25 nsecs; 100 triggers per second; event size is 1 MByte
- Online System: receives data from the experiment at PBytes/sec, sends 100 MBytes/sec to the Offline Farm / CERN Computer Ctr (> 30 TIPS) with HPSS: Tier 0 + 1
- Tier 1: regional centers (FNAL Center, Italy Center, UK Center, France Center); CERN to Tier 1 links at 0.6-2.5 Gbits/sec
- Tier 2: centers fed from Tier 1 at 2.5 Gbits/sec
- Tier 3: institute servers (0.25 TIPS each), fed at 622 Mbits/sec; physicists work on analysis channels, each institute has 10 physicists working on one or more channels
- Tier 4: workstations and physics data caches, connected at 100 - 1000 Mbits/sec
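The 100 MBytes/sec link into the offline farm follows directly from the trigger rate and event size quoted on this slide. A minimal sketch of the arithmetic; the 10^7 seconds of live running per year is an assumption (a commonly used HEP figure), not a number from the slide.

```python
# Online-to-offline data rate implied by the trigger rate and event size.
TRIGGER_RATE_HZ = 100          # ~100 triggers per second (from the slide)
EVENT_SIZE_MB = 1              # event size ~1 MByte (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7    # assumed live time per year

rate_mb_per_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
raw_pb_per_year = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9  # 1 PB = 10^9 MB

print(f"Offline farm input rate: {rate_mb_per_s} MBytes/sec")
print(f"Raw data per experiment per year: {raw_pb_per_year:.1f} PB (before sim/ESD/AOD)")
```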
5 Why Worldwide Computing? Regional Center Concept Advantages
- Managed, fair-shared access for physicists everywhere
- Maximize total funding resources while meeting the total computing and data handling needs
- Balance between proximity of datasets to appropriate resources, and proximity to the users
  - Tier-N Model
- Efficient use of network: higher throughput
  - Per flow: local > regional > national > international
- Utilizing all intellectual resources, in several time zones
  - CERN, national labs, universities, remote sites
  - Involving physicists and students at their home institutions
- Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
  - And/or by common interests (physics topics, subdetectors, ...)
- Manage the system's complexity
  - Partitioning facility tasks, to manage and focus resources
6 SDSS Data Grid (In GriPhyN): A Shared Vision
- Three main functions:
  - Raw data processing on a Grid (FNAL): rapid turnaround with TBs of data; accessible storage of all image data
  - Fast science analysis environment (JHU): combined data access and analysis of calibrated data; distributed I/O layer and processing layer shared by the whole collaboration
  - Public data access: SDSS data browsing for astronomers and students; complex query engine for the public
7 US-CERN BW Requirements Projection (PRELIMINARY)
- Includes 1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and others
- D0 and CDF needs at Run2 presumed to be comparable to BaBar
8 Daily, Weekly, Monthly and Yearly Statistics on the 45 Mbps US-CERN Link
9 Regional Center Architecture (I. Gaines)
- Data Import and Data Export: to/from CERN (tapes), Tier 2 centers, local institutes, and desktops
- Tape Mass Storage and Disk Servers; Database Servers
- Production Reconstruction: Raw/Sim -> ESD; scheduled, predictable; experiment/physics groups
- Production Analysis: ESD -> AOD, AOD -> DPD; scheduled; physics groups
- Individual Analysis: AOD -> DPD and plots; chaotic; physicists
- Support Services: Info servers, Code servers; Web Servers, Telepresence Servers; Training, Consulting, Help Desk
- R&D Systems and Testbeds; Physics Software Development
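The three workflows on this slide differ mainly in which data tier they read, which they write, and how they are scheduled. A small illustrative encoding of that chain (the class and field names are my own, not from any of the projects cited in this talk):

```python
from dataclasses import dataclass

# Each regional-center workflow reads one data tier, writes the next one,
# and is either centrally scheduled or "chaotic" (driven by individuals).
@dataclass
class Workflow:
    name: str
    reads: str
    writes: str
    scheduling: str      # "scheduled" or "chaotic"
    run_by: str

WORKFLOWS = [
    Workflow("Production Reconstruction", "Raw/Sim", "ESD", "scheduled", "experiment/physics groups"),
    Workflow("Production Analysis", "ESD", "AOD", "scheduled", "physics groups"),
    Workflow("Production Analysis", "AOD", "DPD", "scheduled", "physics groups"),
    Workflow("Individual Analysis", "AOD", "DPD and plots", "chaotic", "physicists"),
]

for w in WORKFLOWS:
    print(f"{w.name:26s} {w.reads:8s} -> {w.writes:14s} [{w.scheduling}, {w.run_by}]")
```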
10 MONARC Architectures WG: Regional Centre Services Required
- All data and technical services required to do physics analysis
- All Physics Objects, Tags and Calibration data
- Significant fraction of raw data
- Excellent network connectivity to CERN and the region's users
- A fair share of post- and re-reconstruction processing
- Manpower to share in the development of common validation and production software
- Manpower to share in ongoing work on Common (Grid and Other) R&D Projects
- Excellent support services for training, documentation, troubleshooting at the Centre or remote sites served by it
- Service to members of other regions
- Long-term commitment: staffing, hardware evolution, support
- See http://monarc.web.cern.ch/MONARC/docs/phase2report/Phase2Report.pdf
11 LHC Tier 2 Center In 2001
- Tier2 Prototype (CMS)
  - Distributed Caltech/UCSD, over CALREN (NTON); OC-12 link
  - UC Davis, Riverside, UCLA clients
  - University (UC) fund sharing
  - 2 x 40 dual nodes = 160 CPUs, rackmounted 2U
  - 2 TB RAID array
  - Multi-Scheduler
  - GDMP testbed
  - Startup by end October 2000 (CMS HLT Production)
12 Roles of Projects for HENP Distributed Analysis
- RD45, GIOD: Networked Object Databases
- Clipper/GC: High-speed access to Objects or File data; FNAL/SAM for processing and analysis
- SLAC/OOFS: Distributed File System and Objectivity Interface
- NILE, Condor: Fault-Tolerant Distributed Computing
- MONARC: LHC Computing Models: Architecture, Simulation, Strategy, Politics
- ALDAP: OO Database Structures and Access Methods for Astrophysics and HENP Data
- PPDG: First Distributed Data Services and Data Grid System Prototype
- GriPhyN: Production-Scale Data Grids
- EU Data Grid
13 Grid Services Architecture
- Applns: A rich set of HEP data-analysis related applications
- Appln Toolkits: Remote viz toolkit, remote comp. toolkit, remote data toolkit, remote sensors toolkit, remote collab. toolkit, ...
- Grid Services: Protocols, authentication, policy, resource management, instrumentation, discovery, etc.
- Grid Fabric: Data stores, networks, computers, display devices, associated local services
- Adapted from Ian Foster: there are computing grids, access (collaborative) grids, data grids, ...
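A purely illustrative sketch of this layering in Python: an application toolkit talks only to generic Grid Services, which in turn hide the heterogeneous Grid Fabric (storage, networks, computers). The class and method names are my own, not from Globus or any toolkit named in this talk.

```python
from __future__ import annotations
from typing import Protocol

class GridService(Protocol):
    """Grid Services layer: authentication, policy, discovery, etc."""
    def authenticate(self, user: str, credential: str) -> bool: ...
    def discover(self, resource_type: str) -> list[str]: ...

class FabricBackedService:
    """A Grid Service implementation sitting directly on the local fabric."""
    def __init__(self, fabric_resources: dict[str, list[str]]):
        self._resources = fabric_resources
    def authenticate(self, user: str, credential: str) -> bool:
        return bool(credential)                # placeholder policy check
    def discover(self, resource_type: str) -> list[str]:
        return self._resources.get(resource_type, [])

class RemoteDataToolkit:
    """Application toolkit layer: uses Grid Services, never the fabric directly."""
    def __init__(self, services: GridService):
        self._svc = services
    def locate_datasets(self, user: str, credential: str) -> list[str]:
        if not self._svc.authenticate(user, credential):
            raise PermissionError("authentication failed")
        return self._svc.discover("data store")

svc = FabricBackedService({"data store": ["hpss://cern", "disk://fnal"]})
print(RemoteDataToolkit(svc).locate_datasets("physicist", "x509-proxy"))
```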
14 The Particle Physics Data Grid (PPDG)
- ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
- Site-to-Site Data Replication Service at 100 Mbytes/sec: PRIMARY SITE (Data Acquisition, CPU, Disk, Tape Robot) to SECONDARY SITE (CPU, Disk, Tape Robot)
- First Round Goal: Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to 1 Petabyte
- Multi-Site Cached File Access Service
- Matchmaking, Co-Scheduling: SRB, Condor, Globus services; HRM, NWS
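The "optimized cached read access" goal boils down to a familiar pattern: read from a local disk cache if the file is there, otherwise stage it in once from the primary site. A simplified sketch of that pattern under assumed names (cache directory, transfer function); it is not the PPDG implementation, which used SRB, Condor, Globus services and HRM.

```python
import os

CACHE_DIR = "/tmp/ppdg_cache"          # assumed local cache location

def fetch_from_primary(logical_name: str, dest: str) -> None:
    """Placeholder for a wide-area transfer from the primary site."""
    raise NotImplementedError("wire up the real transfer service here")

def cached_open(logical_name: str):
    """Open a file via the cache: stage in on a miss, read locally thereafter."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_path = os.path.join(CACHE_DIR, logical_name.replace("/", "_"))
    if not os.path.exists(local_path):          # cache miss: stage in once
        fetch_from_primary(logical_name, local_path)
    return open(local_path, "rb")               # subsequent reads are local
```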
15 PPDG WG1: Request Manager
- A CLIENT submits a Logical Request to the REQUEST MANAGER
- The Request Manager uses the Event-file Index and the Replica Catalog to resolve it into a Logical Set of Files
- Physical file transfer requests are issued over the GRID to Disk Resource Managers (DRMs) with their Disk Caches, and to a Hierarchical Resource Manager (HRM) in front of the tape system
- A Network Weather Service supplies network performance information
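A minimal sketch of the request flow on this slide: a logical request is resolved via an event-file index and a replica catalog into physical file transfer requests. All names below (classes, catalogs, URLs) are hypothetical illustrations, not the PPDG WG1 interfaces.

```python
from typing import Dict, List

class RequestManager:
    def __init__(self, event_file_index: Dict[str, List[str]],
                 replica_catalog: Dict[str, List[str]]):
        self.event_file_index = event_file_index    # event selection -> logical files
        self.replica_catalog = replica_catalog      # logical file -> physical replicas

    def handle(self, logical_request: str) -> List[str]:
        """Return one physical transfer request per logical file."""
        transfers = []
        for lfn in self.event_file_index.get(logical_request, []):
            replicas = self.replica_catalog.get(lfn, [])
            if not replicas:
                raise FileNotFoundError(f"no replica registered for {lfn}")
            # A real system would rank replicas using e.g. Network Weather
            # Service forecasts; here we simply take the first one.
            transfers.append(f"transfer {replicas[0]} -> local disk cache")
        return transfers

rm = RequestManager(
    event_file_index={"higgs-candidates-2000": ["lfn:run17.events"]},
    replica_catalog={"lfn:run17.events": ["hrm://slac/tape/run17.events",
                                          "drm://lbnl/disk/run17.events"]},
)
print(rm.handle("higgs-candidates-2000"))
```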
16 Earth Grid System Prototype: Inter-communication Diagram
- Components shown: a Client and Request Manager (LLNL, ANL), a Replica Catalog accessed via an LDAP script or the LDAP C API, a GIS with NWS, and CORBA for client-to-manager communication
- Storage endpoints reached via GSI-ncftp: LBNL GSI-wuftpd (disk), NCAR GSI-wuftpd, SDSC GSI-pftpd with HPSS, and an HRM with HPSS and disk on Clipper
17 Grid Data Management Prototype (GDMP)
- Distributed Job Execution and Data Handling Goals:
  - Transparency
  - Performance
  - Security
  - Fault Tolerance
  - Automation
- Operation across Sites A, B and C: submit job; the job writes its data locally; the data is then replicated to the other sites
- Jobs are executed locally or remotely
- Data is always written locally
- Data is replicated to remote sites
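A sketch of the data-handling pattern on this slide: the job always writes its output to local storage, and the new file is then copied to the other sites. This is a local-filesystem stand-in for illustration only (the site paths are assumptions); the real GDMP used Globus-based transfers and a subscription model between sites.

```python
import shutil, pathlib

SITES = {                                   # hypothetical site storage roots
    "A": pathlib.Path("/tmp/gdmp_demo/siteA"),
    "B": pathlib.Path("/tmp/gdmp_demo/siteB"),
    "C": pathlib.Path("/tmp/gdmp_demo/siteC"),
}

def run_job(site: str, filename: str, payload: bytes) -> pathlib.Path:
    """Execute a 'job' at `site`: write its output to local storage only."""
    SITES[site].mkdir(parents=True, exist_ok=True)
    out = SITES[site] / filename
    out.write_bytes(payload)
    return out

def replicate(source_site: str, filename: str) -> None:
    """Copy a locally written file to every other site."""
    src = SITES[source_site] / filename
    for name, root in SITES.items():
        if name != source_site:
            root.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, root / filename)

run_job("A", "events.dat", b"simulated events")   # data written locally at Site A
replicate("A", "events.dat")                       # then replicated to Sites B and C
```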
18 EU-Grid Project Work Packages
19 GriPhyN PetaScale Virtual Data Grids
- Build the Foundation for Petascale Virtual Data Grids
- Users (production teams, individual investigators, workgroups) enter through Interactive User Tools
- Tool layer: Virtual Data Tools; Request Planning and Scheduling Tools; Request Execution and Management Tools
- Service layer: Resource Management Services; Security and Policy Services; Other Grid Services
- Below: Transforms operating on distributed resources (code, storage, computers, and network) and the raw data source
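An illustrative sketch of the "virtual data" idea behind GriPhyN: a derived data product is named by the transform and inputs that produce it; if it has already been materialized it is simply located, otherwise it is (re)derived on demand from the raw data and registered. The names and in-memory storage here are hypothetical, not the GriPhyN virtual data toolkit.

```python
from typing import Callable, Dict, Tuple

materialized: Dict[Tuple[str, str], bytes] = {}   # (transform, input) -> product
TRANSFORMS: Dict[str, Callable[[bytes], bytes]] = {
    "calibrate": lambda raw: raw.upper(),         # stand-in for a real transform
}
RAW_DATA = {"run17": b"raw detector bytes"}

def request(transform: str, raw_name: str) -> bytes:
    """Return a derived product, materializing it only if it does not yet exist."""
    key = (transform, raw_name)
    if key in materialized:                        # already derived: just return it
        return materialized[key]
    product = TRANSFORMS[transform](RAW_DATA[raw_name])   # derive on demand
    materialized[key] = product                    # register for future requests
    return product

print(request("calibrate", "run17"))   # derived now
print(request("calibrate", "run17"))   # served from the catalog of materialized data
```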
20 Data Grids: Better Global Resource Use and Faster Turnaround
- Build Information and Security Infrastructures
  - Across several world regions
  - Authentication, Prioritization, Resource Allocation
- Coordinated use of computing, data handling and network resources through:
  - Data caching, query estimation, co-scheduling
  - Network and site instrumentation: performance tracking, monitoring, problem trapping and handling
  - Robust Transactions
  - Agent-Based: Autonomous, Adaptive, Network-Efficient, Resilient
  - Heuristic, Adaptive Load-Balancing, e.g. Self-Organizing Neural Nets (Legrand)
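A generic sketch of heuristic, adaptive load balancing across sites: each site keeps a running estimate of its job turnaround time, and new jobs go to the site whose estimate is currently best. This is a simple exponentially weighted heuristic for illustration only; it is not the self-organizing neural net approach by Legrand referenced above, and the site names and timings are made up.

```python
import random

ALPHA = 0.3                                   # weight given to the newest measurement

class Site:
    def __init__(self, name: str, initial_estimate_s: float):
        self.name = name
        self.estimate_s = initial_estimate_s  # estimated job turnaround time

    def observe(self, measured_s: float) -> None:
        """Adapt the estimate toward the newly measured turnaround time."""
        self.estimate_s = (1 - ALPHA) * self.estimate_s + ALPHA * measured_s

def dispatch(sites: list) -> Site:
    """Send the next job to the site with the best current estimate."""
    return min(sites, key=lambda s: s.estimate_s)

sites = [Site("CERN", 100.0), Site("FNAL", 120.0), Site("INFN", 110.0)]
for _ in range(10):
    chosen = dispatch(sites)
    measured = random.uniform(80, 160)        # stand-in for a real job completion time
    chosen.observe(measured)
    print(f"job -> {chosen.name:4s}  (new estimate {chosen.estimate_s:6.1f} s)")
```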
21 GRIDs In 2000: Summary
- Grids will change the way we do science and engineering: computation extended to large-scale data
- Key services and concepts have been identified, and development has started
- Major IT challenges remain
  - An Opportunity and Obligation for HEP/CS Collaboration
- Transition of services and applications to production use is starting to occur
- In the future, more sophisticated integrated services and toolsets (Inter- and IntraGrids) could drive advances in many fields of science and engineering
- HENP, facing the need for Petascale Virtual Data, is both an early adopter and a leading developer of Data Grid technology