Title: Data Grids for Next Generation Experiments
1 Data Grids for Next Generation Experiments
- Harvey B. Newman, California Institute of Technology
- ACAT2000, Fermilab, October 19, 2000
- http://l3www.cern.ch/newman/grids_acat2k.ppt
2 Physics and Technical Goals
- The extraction of small or subtle new discovery signals from large and potentially overwhelming backgrounds, or precision analysis of large samples
- Providing rapid access to event samples and subsets from massive data stores, growing from 300 Terabytes in 2001 to Petabytes by 2003, 10 Petabytes by 2006, and 100 Petabytes by 2010
- Providing analyzed results with rapid turnaround, by coordinating and managing the LIMITED computing, data handling and network resources effectively
- Enabling rapid access to the data and the collaboration, across an ensemble of networks of varying capability, using heterogeneous resources
3 Four LHC Experiments: The Petabyte to Exabyte Challenge
- ATLAS, CMS, ALICE, LHCb: Higgs and New particles; Quark-Gluon Plasma; CP Violation
- Data written to tape: 25 Petabytes/Year and UP (CPU: 6 MSi95 and UP)
- 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) total for the LHC Experiments (2010) (2020?)
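A back-of-the-envelope check of how the two numbers on this slide fit together: writing 25 PB/year and up accumulates into the 0.1-1 Exabyte range over one to a few decades. A minimal sketch; the rates above 25 PB/year and the year counts are illustrative assumptions, not from the slide.

```python
# Rough scale check (illustration only): accumulate a per-year tape-writing
# rate and express the total in Exabytes.
PB_PER_EB = 1000            # 1 Exabyte = 10^18 bytes = 1000 Petabytes

def accumulated_eb(pb_per_year: float, years: int) -> float:
    """Total data volume in Exabytes after `years` of running."""
    return pb_per_year * years / PB_PER_EB

# 25 PB/year is the figure quoted on the slide; 100 PB/year is an assumed
# later rate ("and UP"), used only to show how 1 EB becomes reachable.
for rate, years in [(25, 4), (25, 10), (100, 10)]:
    print(f"{rate:4d} PB/year for {years:2d} years -> {accumulated_eb(rate, years):.2f} EB")
```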
4 LHC Vision: Data Grid Hierarchy
- 1 Bunch crossing = 17 interactions per 25 nsecs; 100 triggers per second; event size is 1 MByte
- Online System: receives data from the experiment at PBytes/sec, sends 100 MBytes/sec to the Offline Farm / CERN Computer Ctr (> 30 TIPS) with HPSS: Tier 0 + 1
- Tier 1: regional centers (FNAL Center, Italy Center, UK Center, France Center); CERN to Tier 1 links at 0.6-2.5 Gbits/sec
- Tier 2: centers fed from Tier 1 at 2.5 Gbits/sec
- Tier 3: institute servers (0.25 TIPS each), fed at 622 Mbits/sec; physicists work on analysis channels, each institute has 10 physicists working on one or more channels
- Tier 4: workstations and physics data caches, connected at 100 - 1000 Mbits/sec
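The 100 MBytes/sec link into the offline farm follows directly from the trigger rate and event size quoted on this slide. A minimal sketch of the arithmetic; the 10^7 seconds of live running per year is an assumption (a commonly used HEP figure), not a number from the slide.

```python
# Online-to-offline data rate implied by the trigger rate and event size.
TRIGGER_RATE_HZ = 100          # ~100 triggers per second (from the slide)
EVENT_SIZE_MB = 1              # event size ~1 MByte (from the slide)
LIVE_SECONDS_PER_YEAR = 1e7    # assumed live time per year

rate_mb_per_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
raw_pb_per_year = rate_mb_per_s * LIVE_SECONDS_PER_YEAR / 1e9  # 1 PB = 10^9 MB

print(f"Offline farm input rate: {rate_mb_per_s} MBytes/sec")
print(f"Raw data per experiment per year: {raw_pb_per_year:.1f} PB (before sim/ESD/AOD)")
```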
5 Why Worldwide Computing? Regional Center Concept Advantages
- Managed, fair-shared access for physicists everywhere
- Maximize total funding resources while meeting the total computing and data handling needs
- Balance between proximity of datasets to appropriate resources, and proximity to the users
  - Tier-N Model
- Efficient use of network: higher throughput
  - Per flow: local > regional > national > international
- Utilizing all intellectual resources, in several time zones
  - CERN, national labs, universities, remote sites
  - Involving physicists and students at their home institutions
- Greater flexibility to pursue different physics interests, priorities, and resource allocation strategies by region
  - And/or by common interests (physics topics, subdetectors, ...)
- Manage the system's complexity
  - Partitioning facility tasks, to manage and focus resources
6 SDSS Data Grid (In GriPhyN): A Shared Vision
- Three main functions:
  - Raw data processing on a Grid (FNAL): rapid turnaround with TBs of data; accessible storage of all image data
  - Fast science analysis environment (JHU): combined data access and analysis of calibrated data; distributed I/O layer and processing layer shared by the whole collaboration
  - Public data access: SDSS data browsing for astronomers and students; complex query engine for the public
7 US-CERN BW Requirements Projection (PRELIMINARY)
- Includes 1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and others
- D0 and CDF needs at Run2 presumed to be comparable to BaBar
8 Daily, Weekly, Monthly and Yearly Statistics on the 45 Mbps US-CERN Link
9 Regional Center Architecture (I. Gaines)
- Data Import and Data Export: to/from CERN (tapes), Tier 2 centers, local institutes, and desktops
- Tape Mass Storage and Disk Servers; Database Servers
- Production Reconstruction: Raw/Sim -> ESD; scheduled, predictable; experiment/physics groups
- Production Analysis: ESD -> AOD, AOD -> DPD; scheduled; physics groups
- Individual Analysis: AOD -> DPD and plots; chaotic; physicists
- Support Services: Info servers, Code servers; Web Servers, Telepresence Servers; Training, Consulting, Help Desk
- R&D Systems and Testbeds; Physics Software Development
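The three workflows on this slide differ mainly in which data tier they read, which they write, and how they are scheduled. A small illustrative encoding of that chain (the class and field names are my own, not from any of the projects cited in this talk):

```python
from dataclasses import dataclass

# Each regional-center workflow reads one data tier, writes the next one,
# and is either centrally scheduled or "chaotic" (driven by individuals).
@dataclass
class Workflow:
    name: str
    reads: str
    writes: str
    scheduling: str      # "scheduled" or "chaotic"
    run_by: str

WORKFLOWS = [
    Workflow("Production Reconstruction", "Raw/Sim", "ESD", "scheduled", "experiment/physics groups"),
    Workflow("Production Analysis", "ESD", "AOD", "scheduled", "physics groups"),
    Workflow("Production Analysis", "AOD", "DPD", "scheduled", "physics groups"),
    Workflow("Individual Analysis", "AOD", "DPD and plots", "chaotic", "physicists"),
]

for w in WORKFLOWS:
    print(f"{w.name:26s} {w.reads:8s} -> {w.writes:14s} [{w.scheduling}, {w.run_by}]")
```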
10 MONARC Architectures WG: Regional Centre Services Required
- All data and technical services required to do physics analysis
- All Physics Objects, Tags and Calibration data
- Significant fraction of raw data
- Excellent network connectivity to CERN and the region's users
- A fair share of post- and re-reconstruction processing
- Manpower to share in the development of common validation and production software
- Manpower to share in ongoing work on Common (Grid and Other) R&D Projects
- Excellent support services for training, documentation, troubleshooting at the Centre or remote sites served by it
- Service to members of other regions
- Long-term commitment: staffing, hardware evolution, support
- See http://monarc.web.cern.ch/MONARC/docs/phase2report/Phase2Report.pdf
11 LHC Tier 2 Center In 2001
- Tier2 Prototype (CMS)
  - Distributed Caltech/UCSD, over CALREN (NTON); OC-12 link
  - UC Davis, Riverside, UCLA clients
  - University (UC) fund sharing
  - 2 x 40 dual nodes = 160 CPUs, rackmounted 2U
  - 2 TB RAID array
  - Multi-Scheduler
  - GDMP testbed
  - Startup by end October 2000 (CMS HLT Production)
12 Roles of Projects for HENP Distributed Analysis
- RD45, GIOD: Networked Object Databases
- Clipper/GC: High-speed access to Objects or File data; FNAL/SAM for processing and analysis
- SLAC/OOFS: Distributed File System and Objectivity Interface
- NILE, Condor: Fault-Tolerant Distributed Computing
- MONARC: LHC Computing Models: Architecture, Simulation, Strategy, Politics
- ALDAP: OO Database Structures and Access Methods for Astrophysics and HENP Data
- PPDG: First Distributed Data Services and Data Grid System Prototype
- GriPhyN: Production-Scale Data Grids
- EU Data Grid
13 Grid Services Architecture
- Applns: A rich set of HEP data-analysis related applications
- Appln Toolkits: Remote viz toolkit, remote comp. toolkit, remote data toolkit, remote sensors toolkit, remote collab. toolkit, ...
- Grid Services: Protocols, authentication, policy, resource management, instrumentation, discovery, etc.
- Grid Fabric: Data stores, networks, computers, display devices, associated local services
- Adapted from Ian Foster: there are computing grids, access (collaborative) grids, data grids, ...
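A purely illustrative sketch of this layering in Python: an application toolkit talks only to generic Grid Services, which in turn hide the heterogeneous Grid Fabric (storage, networks, computers). The class and method names are my own, not from Globus or any toolkit named in this talk.

```python
from __future__ import annotations
from typing import Protocol

class GridService(Protocol):
    """Grid Services layer: authentication, policy, discovery, etc."""
    def authenticate(self, user: str, credential: str) -> bool: ...
    def discover(self, resource_type: str) -> list[str]: ...

class FabricBackedService:
    """A Grid Service implementation sitting directly on the local fabric."""
    def __init__(self, fabric_resources: dict[str, list[str]]):
        self._resources = fabric_resources
    def authenticate(self, user: str, credential: str) -> bool:
        return bool(credential)                # placeholder policy check
    def discover(self, resource_type: str) -> list[str]:
        return self._resources.get(resource_type, [])

class RemoteDataToolkit:
    """Application toolkit layer: uses Grid Services, never the fabric directly."""
    def __init__(self, services: GridService):
        self._svc = services
    def locate_datasets(self, user: str, credential: str) -> list[str]:
        if not self._svc.authenticate(user, credential):
            raise PermissionError("authentication failed")
        return self._svc.discover("data store")

svc = FabricBackedService({"data store": ["hpss://cern", "disk://fnal"]})
print(RemoteDataToolkit(svc).locate_datasets("physicist", "x509-proxy"))
```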
14 The Particle Physics Data Grid (PPDG)
- ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
- Site-to-Site Data Replication Service at 100 Mbytes/sec: PRIMARY SITE (Data Acquisition, CPU, Disk, Tape Robot) to SECONDARY SITE (CPU, Disk, Tape Robot)
- First Round Goal: Optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to 1 Petabyte
- Multi-Site Cached File Access Service
- Matchmaking, Co-Scheduling: SRB, Condor, Globus services; HRM, NWS
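The "optimized cached read access" goal boils down to a familiar pattern: read from a local disk cache if the file is there, otherwise stage it in once from the primary site. A simplified sketch of that pattern under assumed names (cache directory, transfer function); it is not the PPDG implementation, which used SRB, Condor, Globus services and HRM.

```python
import os

CACHE_DIR = "/tmp/ppdg_cache"          # assumed local cache location

def fetch_from_primary(logical_name: str, dest: str) -> None:
    """Placeholder for a wide-area transfer from the primary site."""
    raise NotImplementedError("wire up the real transfer service here")

def cached_open(logical_name: str):
    """Open a file via the cache: stage in on a miss, read locally thereafter."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    local_path = os.path.join(CACHE_DIR, logical_name.replace("/", "_"))
    if not os.path.exists(local_path):          # cache miss: stage in once
        fetch_from_primary(logical_name, local_path)
    return open(local_path, "rb")               # subsequent reads are local
```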
15 PPDG WG1: Request Manager
- A CLIENT submits a Logical Request to the REQUEST MANAGER
- The Request Manager uses the Event-file Index and the Replica Catalog to resolve it into a Logical Set of Files
- Physical file transfer requests are issued over the GRID to Disk Resource Managers (DRMs) with their Disk Caches, and to a Hierarchical Resource Manager (HRM) in front of the tape system
- A Network Weather Service supplies network performance information
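A minimal sketch of the request flow on this slide: a logical request is resolved via an event-file index and a replica catalog into physical file transfer requests. All names below (classes, catalogs, URLs) are hypothetical illustrations, not the PPDG WG1 interfaces.

```python
from typing import Dict, List

class RequestManager:
    def __init__(self, event_file_index: Dict[str, List[str]],
                 replica_catalog: Dict[str, List[str]]):
        self.event_file_index = event_file_index    # event selection -> logical files
        self.replica_catalog = replica_catalog      # logical file -> physical replicas

    def handle(self, logical_request: str) -> List[str]:
        """Return one physical transfer request per logical file."""
        transfers = []
        for lfn in self.event_file_index.get(logical_request, []):
            replicas = self.replica_catalog.get(lfn, [])
            if not replicas:
                raise FileNotFoundError(f"no replica registered for {lfn}")
            # A real system would rank replicas using e.g. Network Weather
            # Service forecasts; here we simply take the first one.
            transfers.append(f"transfer {replicas[0]} -> local disk cache")
        return transfers

rm = RequestManager(
    event_file_index={"higgs-candidates-2000": ["lfn:run17.events"]},
    replica_catalog={"lfn:run17.events": ["hrm://slac/tape/run17.events",
                                          "drm://lbnl/disk/run17.events"]},
)
print(rm.handle("higgs-candidates-2000"))
```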
16 Earth Grid System Prototype: Inter-communication Diagram
- Components shown: a Client and Request Manager (LLNL, ANL), a Replica Catalog accessed via an LDAP script or the LDAP C API, a GIS with NWS, and CORBA for client-to-manager communication
- Storage endpoints reached via GSI-ncftp: LBNL GSI-wuftpd (disk), NCAR GSI-wuftpd, SDSC GSI-pftpd with HPSS, and an HRM with HPSS and disk on Clipper
17 Grid Data Management Prototype (GDMP)
- Distributed Job Execution and Data Handling Goals:
  - Transparency
  - Performance
  - Security
  - Fault Tolerance
  - Automation
- Operation across Sites A, B and C: submit job; the job writes its data locally; the data is then replicated to the other sites
- Jobs are executed locally or remotely
- Data is always written locally
- Data is replicated to remote sites
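A sketch of the data-handling pattern on this slide: the job always writes its output to local storage, and the new file is then copied to the other sites. This is a local-filesystem stand-in for illustration only (the site paths are assumptions); the real GDMP used Globus-based transfers and a subscription model between sites.

```python
import shutil, pathlib

SITES = {                                   # hypothetical site storage roots
    "A": pathlib.Path("/tmp/gdmp_demo/siteA"),
    "B": pathlib.Path("/tmp/gdmp_demo/siteB"),
    "C": pathlib.Path("/tmp/gdmp_demo/siteC"),
}

def run_job(site: str, filename: str, payload: bytes) -> pathlib.Path:
    """Execute a 'job' at `site`: write its output to local storage only."""
    SITES[site].mkdir(parents=True, exist_ok=True)
    out = SITES[site] / filename
    out.write_bytes(payload)
    return out

def replicate(source_site: str, filename: str) -> None:
    """Copy a locally written file to every other site."""
    src = SITES[source_site] / filename
    for name, root in SITES.items():
        if name != source_site:
            root.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, root / filename)

run_job("A", "events.dat", b"simulated events")   # data written locally at Site A
replicate("A", "events.dat")                       # then replicated to Sites B and C
```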
18 EU-Grid Project Work Packages
19 GriPhyN PetaScale Virtual Data Grids
- Build the Foundation for Petascale Virtual Data Grids
- Users (production teams, individual investigators, workgroups) enter through Interactive User Tools
- Tool layer: Virtual Data Tools; Request Planning and Scheduling Tools; Request Execution and Management Tools
- Service layer: Resource Management Services; Security and Policy Services; Other Grid Services
- Below: Transforms operating on distributed resources (code, storage, computers, and network) and the raw data source
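An illustrative sketch of the "virtual data" idea behind GriPhyN: a derived data product is named by the transform and inputs that produce it; if it has already been materialized it is simply located, otherwise it is (re)derived on demand from the raw data and registered. The names and in-memory storage here are hypothetical, not the GriPhyN virtual data toolkit.

```python
from typing import Callable, Dict, Tuple

materialized: Dict[Tuple[str, str], bytes] = {}   # (transform, input) -> product
TRANSFORMS: Dict[str, Callable[[bytes], bytes]] = {
    "calibrate": lambda raw: raw.upper(),         # stand-in for a real transform
}
RAW_DATA = {"run17": b"raw detector bytes"}

def request(transform: str, raw_name: str) -> bytes:
    """Return a derived product, materializing it only if it does not yet exist."""
    key = (transform, raw_name)
    if key in materialized:                        # already derived: just return it
        return materialized[key]
    product = TRANSFORMS[transform](RAW_DATA[raw_name])   # derive on demand
    materialized[key] = product                    # register for future requests
    return product

print(request("calibrate", "run17"))   # derived now
print(request("calibrate", "run17"))   # served from the catalog of materialized data
```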
20 Data Grids: Better Global Resource Use and Faster Turnaround
- Build Information and Security Infrastructures
  - Across several world regions
  - Authentication, Prioritization, Resource Allocation
- Coordinated use of computing, data handling and network resources through:
  - Data caching, query estimation, co-scheduling
  - Network and site instrumentation: performance tracking, monitoring, problem trapping and handling
  - Robust Transactions
  - Agent-Based: Autonomous, Adaptive, Network-Efficient, Resilient
  - Heuristic, Adaptive Load-Balancing, e.g. Self-Organizing Neural Nets (Legrand)
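A generic sketch of heuristic, adaptive load balancing across sites: each site keeps a running estimate of its job turnaround time, and new jobs go to the site whose estimate is currently best. This is a simple exponentially weighted heuristic for illustration only; it is not the self-organizing neural net approach by Legrand referenced above, and the site names and timings are made up.

```python
import random

ALPHA = 0.3                                   # weight given to the newest measurement

class Site:
    def __init__(self, name: str, initial_estimate_s: float):
        self.name = name
        self.estimate_s = initial_estimate_s  # estimated job turnaround time

    def observe(self, measured_s: float) -> None:
        """Adapt the estimate toward the newly measured turnaround time."""
        self.estimate_s = (1 - ALPHA) * self.estimate_s + ALPHA * measured_s

def dispatch(sites: list) -> Site:
    """Send the next job to the site with the best current estimate."""
    return min(sites, key=lambda s: s.estimate_s)

sites = [Site("CERN", 100.0), Site("FNAL", 120.0), Site("INFN", 110.0)]
for _ in range(10):
    chosen = dispatch(sites)
    measured = random.uniform(80, 160)        # stand-in for a real job completion time
    chosen.observe(measured)
    print(f"job -> {chosen.name:4s}  (new estimate {chosen.estimate_s:6.1f} s)")
```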
21 GRIDs In 2000: Summary
- Grids will change the way we do science and engineering: computation extended to large-scale data
- Key services and concepts have been identified, and development has started
- Major IT challenges remain
  - An Opportunity and Obligation for HEP/CS Collaboration
- Transition of services and applications to production use is starting to occur
- In the future, more sophisticated integrated services and toolsets (Inter- and IntraGrids) could drive advances in many fields of science and engineering
- HENP, facing the need for Petascale Virtual Data, is both an early adopter and a leading developer of Data Grid technology