Data Grids for Next Generation Experiments

Transcript and Presenter's Notes

Title: Data Grids for Next Generation Experiments


1
  • Data Grids for Next Generation Experiments
  • Harvey B. Newman, California Institute of Technology
  • ACAT2000, Fermilab, October 19, 2000
  • http://l3www.cern.ch/~newman/grids_acat2k.ppt

2
Physics and Technical Goals
  • The extraction of small or subtle new discovery signals from large and potentially overwhelming backgrounds, or precision analysis of large samples
  • Providing rapid access to event samples and subsets from massive data stores: from 300 Terabytes in 2001 to Petabytes by 2003, 10 Petabytes by 2006, and 100 Petabytes by 2010
  • Providing analyzed results with rapid turnaround, by coordinating and managing the LIMITED computing, data handling and network resources effectively
  • Enabling rapid access to the data and the collaboration, across an ensemble of networks of varying capability, using heterogeneous resources

3
Four LHC Experiments: The Petabyte to Exabyte Challenge
  • ATLAS, CMS, ALICE, LHCb: Higgs and new particles; Quark-Gluon Plasma; CP Violation

Data written to tape: 25 Petabytes/year and up (CPU: 6 MSi95 and up), reaching 0.1 to 1 Exabyte (1 EB = 10^18 Bytes) in 2010 to 2020 (?), totaled over the LHC experiments
4
LHC Vision: Data Grid Hierarchy
  • 1 bunch crossing = 17 interactions per 25 nsec; 100 triggers per second; each event is 1 MByte in size (see the back-of-envelope rate calculation below)
  • Online System at the experiment: PBytes/sec off the detector, 100 MBytes/sec to the offline farm
  • Tier 0+1: Offline Farm and CERN Computer Centre (> 30 TIPS), with HPSS mass storage; 0.6-2.5 Gbits/sec links to the Tier 1 centers
  • Tier 1 regional centers (FNAL Center, Italy Center, UK Center, France Center); 2.5 Gbits/sec onward
  • Tier 2 centers; 622 Mbits/sec onward
  • Tier 3: institutes (0.25 TIPS each); each institute has ~10 physicists working on one or more analysis channels; 100-1000 Mbits/sec to the desktop
  • Tier 4: physics data caches and workstations
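The bandwidth figures in this hierarchy follow directly from the trigger rate and event size. The short calculation below is only a back-of-envelope sketch; the assumed effective running time of ~10^7 seconds per year is not a number from the slide.

```python
# Back-of-envelope rates implied by the slide's numbers (illustrative only).

trigger_rate_hz = 100        # ~100 triggers per second after online selection
event_size_bytes = 1.0e6     # each stored event is ~1 MByte
seconds_per_year = 1.0e7     # assumed effective running time per year (~10^7 s)

rate_to_offline = trigger_rate_hz * event_size_bytes        # bytes/sec into the offline farm
raw_volume_per_year = rate_to_offline * seconds_per_year    # bytes/year of raw data

print(f"Rate to offline farm: {rate_to_offline / 1e6:.0f} MBytes/sec")
print(f"Raw data per experiment per year: {raw_volume_per_year / 1e15:.0f} PBytes")
```

This reproduces the 100 MBytes/sec figure and about 1 PByte/year of raw data per experiment; the larger 25 PBytes/year figure on the previous slide presumably covers all four experiments together with simulated and derived data.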
5
Why Worldwide Computing? Regional Center Concept: Advantages
  • Managed, fair-shared access for physicists everywhere
  • Maximize total funding resources while meeting the total computing and data handling needs
  • Balance between proximity of datasets to appropriate resources, and proximity to the users
  • Tier-N model
  • Efficient use of the network: higher throughput
  • Per flow: local > regional > national > international
  • Utilizing all intellectual resources, in several time zones
  • CERN, national labs, universities, remote sites
  • Involving physicists and students at their home institutions
  • Greater flexibility to pursue different physics interests, priorities, and resource-allocation strategies by region
  • And/or by common interests (physics topics, subdetectors, ...)
  • Manage the system's complexity
  • Partitioning facility tasks, to manage and focus resources

6
SDSS Data Grid (in GriPhyN): A Shared Vision
  • Three main functions:
  • Raw data processing on a Grid (FNAL)
  • Rapid turnaround with TBs of data
  • Accessible storage of all image data
  • Fast science analysis environment (JHU)
  • Combined data access and analysis of calibrated data
  • Distributed I/O layer and processing layer, shared by the whole collaboration
  • Public data access
  • SDSS data browsing for astronomers and students
  • Complex query engine for the public

7
US-CERN BW Requirements Projection (PRELIMINARY)
Includes 1.5 Gbps each for ATLAS and CMS, plus BaBar, Run2 and other needs; D0 and CDF needs at Run2 are presumed to be comparable to BaBar's
8
Daily, Weekly, Monthly and Yearly Statistics on
the 45 Mbps US-CERN Link
9
Regional Center Architecture (I. Gaines)
  • Tape mass storage, disk servers and database servers, with data import from and export to CERN, the Tier 2 centers and local institutes
  • Production Reconstruction: Raw/Sim -> ESD; scheduled, predictable; run for the experiment and physics groups
  • Production Analysis: ESD -> AOD and AOD -> DPD; scheduled; run for physics groups
  • Individual Analysis: AOD -> DPD and plots; chaotic; run by individual physicists, with results served to desktops (the three activities are summarized in the sketch below)
  • Support services: R&D systems and testbeds; info servers and code servers; physics software development; web servers and telepresence servers; training, consulting and help desk
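The three analysis activities above differ mainly in the data tiers they read and write and in how they are scheduled. The snippet below is a minimal illustrative summary of that classification; the class and field names are invented for this sketch.

```python
from dataclasses import dataclass

@dataclass
class Activity:
    """One class of work run at a regional centre (illustrative sketch)."""
    name: str
    input_tier: str     # data tier consumed
    output_tier: str    # data tier produced
    scheduling: str     # "scheduled" (planned by groups) or "chaotic" (individuals)
    requested_by: str

ACTIVITIES = [
    Activity("Production Reconstruction", "Raw/Sim", "ESD", "scheduled", "experiment / physics groups"),
    Activity("Production Analysis", "ESD", "AOD / DPD", "scheduled", "physics groups"),
    Activity("Individual Analysis", "AOD", "DPD and plots", "chaotic", "physicists"),
]

for a in ACTIVITIES:
    print(f"{a.name}: {a.input_tier} -> {a.output_tier} ({a.scheduling}, {a.requested_by})")
```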
10
MONARC Architectures WG: Regional Centre Services Required
  • All data and technical services required to do physics analysis
  • All Physics Objects, Tags and Calibration data
  • A significant fraction of the raw data
  • Excellent network connectivity to CERN and to the region's users
  • A fair share of post- and re-reconstruction processing
  • Manpower to share in the development of common validation and production software
  • Manpower to share in ongoing work on common (Grid and other) R&D projects
  • Excellent support services for training, documentation and troubleshooting at the Centre or at the remote sites it serves
  • Service to members of other regions
  • Long-term commitment: staffing, hardware evolution, support

See http://monarc.web.cern.ch/MONARC/docs/phase2report/Phase2Report.pdf
11
LHC Tier 2 Center in 2001
  • Tier 2 prototype (CMS)
  • Distributed between Caltech and UCSD, over CALREN (and NTON)
  • UC Davis, Riverside and UCLA as clients
  • University (UC) fund sharing
  • 2 x 40 dual nodes = 160 CPUs, rackmounted (2U)
  • 2 TB RAID array
  • Multi-scheduler
  • GDMP testbed
  • OC-12 network link
  • Startup by end of October 2000 (CMS HLT production)
12
Roles of Projects for HENP Distributed Analysis
  • RD45, GIOD: Networked object databases
  • Clipper/GC: High-speed access to object or file data; FNAL/SAM for processing and analysis
  • SLAC/OOFS: Distributed file system with Objectivity interface
  • NILE, Condor: Fault-tolerant distributed computing
  • MONARC: LHC computing models: architecture, simulation, strategy, politics
  • ALDAP: OO database structures and access methods for astrophysics and HENP data
  • PPDG: First distributed data services and Data Grid system prototype
  • GriPhyN: Production-scale Data Grids
  • EU Data Grid

13
Grid Services Architecture
  • Applications: a rich set of HEP data-analysis related applications
  • Application Toolkits: remote visualization, remote computation, remote data, remote sensors and remote collaboration toolkits, ...
  • Grid Services: protocols, authentication, policy, resource management, instrumentation, discovery, etc.
  • Grid Fabric: data stores, networks, computers, display devices, and associated local services (the layering rule is sketched below)

Adapted from Ian Foster: there are computing grids, access (collaborative) grids, data grids, ...
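One way to read this layering is as a strict dependency rule: applications call toolkits, toolkits call the common Grid services, and only the services touch the fabric. The sketch below illustrates that rule with invented class and method names; it is not an API from any of the projects on these slides.

```python
# Hypothetical sketch of the layered Grid architecture on this slide.
# Each layer talks only to the layer directly below it.

class GridFabric:
    """Data stores, networks, computers, displays and their local services."""
    def stage_file(self, path: str) -> str:
        return f"staged {path} from a local store"

class GridServices:
    """Common services: authentication, policy, resource management, discovery."""
    def __init__(self, fabric: GridFabric):
        self.fabric = fabric
    def authenticated_access(self, user: str, path: str) -> str:
        # authentication and policy checks would go here
        return self.fabric.stage_file(path)

class RemoteDataToolkit:
    """One of the application toolkits, built on the common services."""
    def __init__(self, services: GridServices):
        self.services = services
    def fetch(self, user: str, dataset: str) -> str:
        return self.services.authenticated_access(user, f"/store/{dataset}")

# An HEP data-analysis application sits on top of the toolkits.
toolkit = RemoteDataToolkit(GridServices(GridFabric()))
print(toolkit.fetch("physicist", "higgs_candidates"))
```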
14
The Particle Physics Data Grid (PPDG)
ANL, BNL, Caltech, FNAL, JLAB, LBNL, SDSC, SLAC, U.Wisc/CS
  • Site-to-Site Data Replication Service: 100 Mbytes/sec between a primary site (data acquisition, CPU, disk, tape robot) and a secondary site (CPU, disk, tape robot)
  • First Round Goal: optimized cached read access to 10-100 Gbytes drawn from a total data set of 0.1 to 1 Petabyte
  • Multi-Site Cached File Access Service: matchmaking and co-scheduling with SRB, Condor and Globus services, plus HRM and NWS

15
PPDG WG1 Request Manager
  • Clients submit a logical request to the Request Manager
  • An event-file index expands the logical request into a logical set of files, and a replica catalog maps those files to physical locations
  • Physical file transfer requests go over the Grid to Disk Resource Managers (DRMs) with their disk caches, and to a Hierarchical Resource Manager (HRM) whose disk cache fronts the tape system
  • The Network Weather Service informs the choice of source (this flow is sketched below)
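Read as a control flow, the diagram says: resolve the logical request to files, resolve the files to replicas, and then issue physical transfers to the resource managers, using Network Weather Service measurements to pick sources. The sketch below illustrates that flow under those assumptions; all names and data structures are hypothetical, not the WG1 software.

```python
# Hypothetical sketch of the PPDG WG1 request-manager flow shown in the diagram.

def choose_source(replicas, network_weather):
    """Pick the replica site with the best currently measured bandwidth."""
    return max(replicas, key=lambda site: network_weather.get(site, 0.0))

def handle_logical_request(events, event_file_index, replica_catalog, network_weather):
    transfers = []
    # 1. Event-file index: logical request -> logical set of files
    logical_files = {event_file_index[e] for e in events}
    # 2. Replica catalog: logical file -> physical replicas; pick one source per file
    for lfn in logical_files:
        site = choose_source(replica_catalog[lfn], network_weather)
        # 3. Physical file transfer request, served by that site's DRM or HRM cache
        transfers.append((lfn, site))
    return transfers

# Toy inputs standing in for the real index, catalog and NWS measurements.
event_file_index = {"evt-1": "run17.f001", "evt-2": "run17.f001", "evt-3": "run17.f002"}
replica_catalog = {"run17.f001": ["fnal", "cern"], "run17.f002": ["lbnl"]}
network_weather = {"fnal": 90.0, "cern": 40.0, "lbnl": 10.0}

for t in handle_logical_request(["evt-1", "evt-2", "evt-3"], event_file_index,
                                replica_catalog, network_weather):
    print(t)
```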
16
Earth Grid System Prototype: Inter-communication Diagram
  • At LLNL and ANL: a client, a Replica Catalog (accessed via an LDAP script), a Request Manager (accessed via an LDAP C API or script), local disk, and a GIS with NWS
  • File transfers use GSI-ncftp to the storage sites: LBNL (GSI-wuftpd, with an HRM, disk on Clipper and HPSS; the HRM is reached via CORBA), NCAR (GSI-wuftpd with disk), and SDSC (GSI-pftpd with HPSS)
17
Grid Data Management Prototype (GDMP)
  • Distributed Job Execution and Data Handling: Goals
  • Transparency
  • Performance
  • Security
  • Fault Tolerance
  • Automation
  • A job submitted at Site A may execute locally or remotely (at Site B or Site C)
  • Data is always written locally by the job
  • Data is then replicated to the remote sites (sketched below)
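A minimal sketch of the replication discipline in the last three bullets, assuming a simple push-to-all-sites model; the Site class and its methods are invented for illustration and are not GDMP's actual interface.

```python
# Illustrative sketch: jobs always write output locally, and newly written
# files are then replicated to the other sites.

class Site:
    def __init__(self, name):
        self.name = name
        self.files = {}                     # local storage: filename -> contents

    def run_job(self, job_name, all_sites):
        # The job writes its output to local storage first ...
        output = f"{job_name}.out"
        self.files[output] = f"data produced at Site {self.name}"
        # ... and the new file is then replicated to the remote sites.
        for site in all_sites:
            if site is not self:
                site.files[output] = self.files[output]
        return output

sites = [Site("A"), Site("B"), Site("C")]
sites[0].run_job("cms_hlt_production", sites)   # submit at Site A
print({s.name: list(s.files) for s in sites})   # every site now holds a replica
```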
18
EU-Grid Project: Work Packages
19
GriPhyN: PetaScale Virtual Data Grids
  • Build the foundation for Petascale Virtual Data Grids
  • Production teams, individual investigators and workgroups use interactive user tools for request planning and scheduling and for request execution management (the request-planning idea is sketched below)
  • These tools rest on virtual data tools, resource management services, security and policy services, and other Grid services
  • Underneath, transforms act on raw data sources and on the distributed resources (code, storage, computers, and network)
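The "virtual" in virtual data means a requested product may either already exist as a cataloged replica or be derivable by running a registered transform on its inputs; request planning chooses between fetching and re-deriving. The sketch below illustrates that idea with invented catalogs and names; it is not GriPhyN's actual planner.

```python
# Hypothetical sketch of virtual-data request planning: deliver an existing
# replica if one is cataloged, otherwise materialize the product by running
# its registered transform on its (recursively planned) inputs.

replica_catalog = {                          # products that already exist somewhere
    "calibrated_run17": ["fnal_disk"],
    "raw_run17": ["cern_tape"],
}
transform_catalog = {                        # how each virtual product can be derived
    "higgs_plots_v2": {"transform": "make_plots", "inputs": ["calibrated_run17"]},
    "calibrated_run17": {"transform": "calibrate", "inputs": ["raw_run17"]},
}

def plan(product, steps=None):
    """Return the ordered list of steps needed to deliver `product`."""
    steps = steps if steps is not None else []
    if product in replica_catalog:           # already materialized: just fetch it
        steps.append(f"fetch {product} from {replica_catalog[product][0]}")
    else:                                    # virtual: plan its inputs, then run the transform
        recipe = transform_catalog[product]
        for inp in recipe["inputs"]:
            plan(inp, steps)
        steps.append(f"run {recipe['transform']} -> {product}")
    return steps

print(plan("higgs_plots_v2"))
# ['fetch calibrated_run17 from fnal_disk', 'run make_plots -> higgs_plots_v2']
```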
20
Data Grids: Better Global Resource Use and Faster Turnaround
  • Build information and security infrastructures across several world regions
  • Authentication; prioritization; resource allocation
  • Coordinated use of computing, data handling and network resources through:
  • Data caching, query estimation, co-scheduling
  • Network and site instrumentation: performance tracking, monitoring, problem trapping and handling
  • Robust transactions
  • Agent-based: autonomous, adaptive, network-efficient, resilient
  • Heuristic, adaptive load-balancing, e.g. self-organizing neural nets (Legrand) (a simplified sketch follows this list)
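The self-organizing neural-net balancer cited above is not reproduced here. The sketch below shows only the simpler idea underneath any adaptive scheme: send each job to the site with the best current estimate, then keep updating the estimates from observed performance. All site names and numbers are invented.

```python
# Simplified adaptive load-balancing heuristic (a stand-in for the
# self-organizing neural-net approach cited on the slide, not a copy of it).

estimated_time = {"cern": 1.0, "fnal": 1.0, "in2p3": 1.0}   # rolling estimates (arbitrary units)
ALPHA = 0.3                                                  # weight given to new measurements

def dispatch(job):
    # A real balancer would also consider the job's data locality and size.
    return min(estimated_time, key=estimated_time.get)

def record(site, observed_time):
    # Exponentially weighted update keeps the estimate adaptive to changing load.
    estimated_time[site] = (1 - ALPHA) * estimated_time[site] + ALPHA * observed_time

for job, observed in [("job1", 2.0), ("job2", 0.5), ("job3", 3.0)]:
    site = dispatch(job)
    record(site, observed)
    print(job, "->", site, {s: round(t, 2) for s, t in estimated_time.items()})
```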

21
GRIDs in 2000: Summary
  • Grids will change the way we do science and engineering: computation to large-scale data
  • Key services and concepts have been identified, and development has started
  • Major IT challenges remain
  • An opportunity and obligation for HEP/CS collaboration
  • The transition of services and applications to production use is starting to occur
  • In future, more sophisticated integrated services and toolsets (Inter- and IntraGrids) could drive advances in many fields of science and engineering
  • HENP, facing the need for Petascale Virtual Data, is both an early adopter and a leading developer of Data Grid technology