Title: US Grid Projects and LHC Computing
1. US Grid Projects and LHC Computing
Paul Avery, University of Florida (avery@phys.ufl.edu)
Joint Review of US-LHC Computing, Fermilab, Nov. 27, 2001
2. Overview
- Introduction and context
- Major U.S. Data Grid projects
- PPDG
- GriPhyN
- iVDGL
- Other efforts
- Benefits to US-CMS computing
- HENP Grid coordination effort
3. LHC Data Grid Overview
- LHC presents unprecedented computing challenges
  - CPU (100s of TeraOps), storage (10s of petabytes), complexity
  - Globally distributed community (1000s of physicists)
  - New approaches needed for efficient, effective remote participation
- Computational infrastructure based on a Data Grid
  - Hierarchical distribution of resources (next two slides)
  - Resources split across Tier0 : Tier1 : Tier2 at roughly 1:1:1
  - This configuration permits an optimal balance of resources
  - Large Tier1 resources for collaboration-wide priority tasks
  - Regional Tier2 and local Tier3 resources close to users
  - Effective, efficient resource use; high flexibility; rapid turnaround
- The US has an appropriate (co-)leading role
  - Development and use of necessary tools (with other disciplines)
  - Shared distributed production milestones
4. Schematic LHC Data Grid Hierarchy
Tier0 (CERN) → Tier1 (National Lab) → Tier2 (Regional Center at University) → Tier3 (University workgroup) → Tier4 (Workstation)
- Key features
  - Hierarchical resources
  - Grid configuration
  - Tier2 centers
5. Example CMS Data Grid Hierarchy
CERN/outside resource ratio ~1:2; Tier0 : (sum of Tier1) : (sum of Tier2) roughly 1:1:1
- Experiment → Online System: PBytes/sec off the detector (one bunch crossing per 25 ns, 100 triggers per second, 1 MByte per event; see the back-of-envelope sketch below)
- Online System → Tier 0+1 (CERN Computer Center, > 20 TIPS, HPSS): 100 MBytes/sec
- Tier 0 → Tier 1 (France, Italy, UK, USA centers): > 2.5 Gbits/sec
- Tier 1 → Tier 2: 0.6-2.5 Gbits/sec
- Tier 2 → Tier 3 (institutes, ~0.25 TIPS each, with a physics data cache): 0.6-2.5 Gbits/sec; physicists work on analysis channels, with each institute having 10 physicists working on one or more channels
- Tier 3 → Tier 4 (workstations, other portals): 0.1-1 Gbits/sec
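The quoted rates are easy to sanity-check. A minimal back-of-envelope sketch in Python follows; the 1e7 seconds of effective running time per year is an assumed planning figure, not a number taken from this talk:

```python
# Back-of-envelope check of the Tier 0 ingest rate quoted on this slide.
# LIVE_SECONDS_PER_YEAR is an assumption for illustration only.

TRIGGER_RATE_HZ = 100          # events accepted per second (from slide)
EVENT_SIZE_MB = 1.0            # MByte per event (from slide)
LIVE_SECONDS_PER_YEAR = 1e7    # assumed effective running time per year

ingest_rate_mb_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB
raw_data_per_year_pb = ingest_rate_mb_s * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB

print(f"Ingest rate into Tier 0: {ingest_rate_mb_s:.0f} MB/s")
print(f"Raw data per year:       {raw_data_per_year_pb:.1f} PB")
```

This reproduces the 100 MBytes/sec figure on the slide and gives roughly a petabyte of raw data per year, consistent with the "10s of petabytes" storage scale quoted earlier.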
6. HEP-Related Data Grid Projects
- Funded projects
  - PPDG (USA, DOE): $2M + $9.5M, 1999-2004
  - GriPhyN (USA, NSF): $11.9M + $1.6M, 2000-2005
  - iVDGL (USA, NSF): $13.7M + $2M, 2001-2006
  - EU DataGrid (EU, EC): €10M, 2001-2004
  - LCGP (LHC Computing Grid Project, CERN): > 120M CHF?, 2001-?
- Supportive funded proposals
  - TeraGrid (USA, NSF): $53M, 2001-?
  - DataTAG (EU, EC): €4M, 2002-2004
- About-to-be-funded projects
  - GridPP (UK, PPARC): > £15M?, 2001-2004
- Many national projects
  - Initiatives in US, UK, Italy, France, NL, Germany, Japan, ...
  - EU networking initiatives (Géant, SURFNet)
7. Data Grid Project Timeline
8. Infrastructure Data Grid Projects
- GriPhyN (US, NSF)
  - Petascale Virtual-Data Grids
  - http://www.griphyn.org/
- Particle Physics Data Grid (US, DOE)
  - Data Grid applications for HENP
  - http://www.ppdg.net/
- European Data Grid (EC, EU)
  - Data Grid technologies, EU deployment
  - http://www.eu-datagrid.org/
- TeraGrid Project (US, NSF)
  - Distributed supercomputer resources
  - http://www.teragrid.org/
- iVDGL + DataTAG (NSF, EC, others)
  - Global Grid lab + transatlantic network
- Collaborations of application scientists and computer scientists
- Focus on infrastructure development and deployment
- Globus infrastructure
- Broad application to HENP and other sciences
9. Why Do We Need All These Projects?
- Agencies see LHC Grid computing in a wider context (next slide)
- DOE priorities
  - LHC, D0, CDF, BaBar, RHIC, JLab
  - Computer science
  - ESnet
- NSF priorities
  - Computer science
  - Networks
  - LHC, other physics, astronomy
  - Other basic sciences
  - Education and outreach
  - International reach
  - Support for universities
10. Projects (cont.)
- We must justify the investment
  - Benefit to a wide scientific base
  - Education and outreach
  - Oversight from Congress always present
  - Much more competitive funding environment
- We have no choice anyway
  - This is the mechanism by which we will get funds
- Pros
  - Exploits initiatives, brings new funds and facilities (e.g., TeraGrid)
  - Drives deployment of high-speed networks
  - Brings many new technologies and tools
  - Attracts attention and help from computing experts and vendors
- Cons
  - Diverts effort from the mission, makes management more complex
11. Particle Physics Data Grid
- Funded 7/2001 at $9.5M for 3 years (DOE MICS/HENP)
- High Energy and Nuclear Physics projects (DOE labs)
- DB file/object replication, caching, catalogs, end-to-end
- Practical orientation: networks, instrumentation, monitoring
12. PPDG Collaboratory Pilot
The Particle Physics Data Grid Collaboratory Pilot will develop, evaluate and deliver vitally needed Grid-enabled tools for data-intensive collaboration in particle and nuclear physics. Novel mechanisms and policies will be vertically integrated with Grid middleware, experiment-specific applications and computing resources to provide effective end-to-end capability.
- Physicist involvement
  - D0, BaBar, RHIC, CMS, ATLAS → SLAC, LBNL, JLab, FNAL, BNL
  - CMS/ATLAS: Caltech, UCSD, FNAL, BNL, ANL, LBNL
- Computer Science Program of Work
  - CS1: Job description language
  - CS2: Schedule and manage data processing and data placement activities
  - CS3: Monitoring and status reporting (with GriPhyN)
  - CS4: Storage resource management
  - CS5: Reliable replication services
  - CS6: File transfer services
  - CS7: Collect and document experiment practices → generalize
13. The GriPhyN Project
- NSF funded 9/2000 at $11.9M + $1.6M
  - US-CMS: high energy physics
  - US-ATLAS: high energy physics
  - LIGO/LSC: gravity wave research
  - SDSS: Sloan Digital Sky Survey
- Strong partnership with computer scientists
- Design and implement production-scale grids
  - Develop common infrastructure, tools and services (Globus based)
  - Integration into the 4 experiments
  - Broad application to other sciences via the Virtual Data Toolkit
- Research organized around Virtual Data (see figure)
  - Derived data, calculable via algorithm
  - Instantiated 0, 1, or many times (e.g., caches)
  - Fetch the data value vs. execute the algorithm (sketched below)
  - Very complex (versions, consistency, cost calculation, etc.)
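To make the "fetch the data value vs. execute the algorithm" idea concrete, here is a minimal illustrative sketch. The class and method names are invented for this example; they are not GriPhyN or Virtual Data Toolkit interfaces.

```python
# Sketch of the virtual-data idea: a derived product is defined by the
# transformation that produces it, and a request is satisfied either from
# an existing replica or by re-running the transformation.

from typing import Callable, Dict, Optional

class VirtualDataCatalog:
    def __init__(self) -> None:
        self.recipes: Dict[str, Callable[[], bytes]] = {}   # name -> transformation
        self.replicas: Dict[str, bytes] = {}                 # name -> materialized data

    def define(self, name: str, transformation: Callable[[], bytes]) -> None:
        """Register how a derived product can be (re)computed."""
        self.recipes[name] = transformation

    def request(self, name: str) -> bytes:
        """Fetch the data value if a replica exists, else execute the algorithm."""
        cached: Optional[bytes] = self.replicas.get(name)
        if cached is not None:
            return cached                       # fetch an existing instantiation
        data = self.recipes[name]()             # execute the transformation
        self.replicas[name] = data              # cache the instantiation for later requests
        return data

catalog = VirtualDataCatalog()
catalog.define("hits.DB", lambda: b"simulated hits")
print(catalog.request("hits.DB"))   # computed on first request
print(catalog.request("hits.DB"))   # served from the cache afterwards
```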
14. Virtual Data in Action
- A data request may
  - Compute locally
  - Compute remotely
  - Access local data
  - Access remote data
- Scheduling based on
  - Local policies
  - Global policies
  - Cost (see the sketch below)
(Figure labels: major facilities/archives, regional facilities/caches, local facilities/caches)
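A toy illustration of that scheduling decision follows, assuming a simple cost model (transfer time plus weighted CPU time). None of this is project code, and the numbers are placeholders.

```python
# Compare the cost of moving data to the computation against moving the
# computation to the data, subject to a policy check.  Cost model and
# numbers are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class Option:
    name: str
    transfer_gb: float     # data that must be moved for this option
    cpu_hours: float       # compute required at the chosen site
    allowed: bool          # outcome of local/global policy checks

def estimate_cost(opt: Option, net_gb_per_hour: float, cpu_weight: float) -> float:
    """Toy cost model: transfer time plus weighted CPU time."""
    return opt.transfer_gb / net_gb_per_hour + cpu_weight * opt.cpu_hours

options = [
    Option("compute locally, fetch remote data", transfer_gb=100.0, cpu_hours=8.0, allowed=True),
    Option("compute remotely, near the data",    transfer_gb=0.5,   cpu_hours=8.0, allowed=True),
]

viable = [o for o in options if o.allowed]
best = min(viable, key=lambda o: estimate_cost(o, net_gb_per_hour=40.0, cpu_weight=0.1))
print("Chosen plan:", best.name)
```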
15. GriPhyN Institutions
- U Florida
- U Chicago
- Boston U
- Caltech
- U Wisconsin, Madison
- USC/ISI
- Harvard
- Indiana
- Johns Hopkins
- Northwestern
- Stanford
- U Illinois at Chicago
- U Penn
- U Texas, Brownsville
- U Wisconsin, Milwaukee
- UC Berkeley
- UC San Diego
- San Diego Supercomputer Center
- Lawrence Berkeley Lab
- Argonne
- Fermilab
- Brookhaven
16. GriPhyN: PetaScale Virtual-Data Grids
(Architecture figure: production teams, individual investigators and research groups work through interactive user tools; these drive request planning and scheduling tools, request execution and management tools, and virtual data tools; beneath them sit resource management services, security and policy services, and other Grid services; at the bottom are distributed resources (code, storage, computers, and network), transforms, and the raw data source.)
17. GriPhyN Research Agenda (cont.)
- Execution management
  - Co-allocation of resources (CPU, storage, network transfers)
  - Fault tolerance, error reporting
  - Agents (co-allocation, execution)
  - Reliable event service across the Grid
  - Interaction, feedback to planning
- Performance analysis (with PPDG)
  - Instrumentation and measurement of all grid components
  - Understand and optimize grid performance
  - Simulations (MONARC project at CERN)
- Virtual Data Toolkit (VDT)
  - VDT = virtual data services + virtual data tools
  - One of the primary deliverables of the R&D effort
  - Ongoing activity with feedback from experiments (5-year plan)
  - Technology transfer mechanism to other scientific domains
18. GriPhyN Progress
- New US-LHC hires from NSF and matching funds
  - 4 CMS physicists at Florida (2 scientists, 1 postdoc, 1 grad student)
  - 0.5 CMS support person at Caltech
  - 2 ATLAS physicists at Indiana (1 postdoc, 1 grad student)
  - 1 matching ATLAS postdoc at Boston U (1 year)
- Documents
  - CMS Grid requirements (K. Holtman)
  - GriPhyN planning (M. Wilde, R. Cavanaugh)
- Major meetings held (http://www.griphyn.org/)
  - Oct. 2000: All-hands meeting
  - Dec. 2000: Architecture meeting
  - Apr. 2001: All-hands meeting
  - Aug. 2001: CS-Applications meeting
  - Oct. 2001: All-hands meeting
19. Recent GriPhyN Activities
- CMS success story (Apr. 2001)
  - See the GriPhyN web site
- DAG (Directed Acyclic Graph) work at the University of Wisconsin
  - Job description language
  - DAGMan, the DAG Manager (a minimal sketch of the idea follows below)
- PACMAN (Boston U)
  - Package manager
- SC2001 demos (Nov. 2001)
  - GriPhyN, PPDG
- Data Grid architecture (Dec. 2001)
  - I. Foster, C. Kesselman, et al.
  - Joint GriPhyN + PPDG effort
  - Major effort in the Data Grid movement
(Figure: a simple DAG and a more complex DAG)
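As a rough illustration of what a DAG manager such as DAGMan provides, the sketch below runs jobs only after all of their parents have completed, using the four pipeline stages that appear later in this talk. It mimics the idea only; it does not use DAGMan's actual interface.

```python
# Run each node of a job DAG only after its predecessors have finished.
from graphlib import TopologicalSorter

# node -> set of predecessors, for the CMS production pipeline
# pythia -> cmsim -> writeHits -> writeDigis
dag = {
    "cmsim": {"pythia"},
    "writeHits": {"cmsim"},
    "writeDigis": {"writeHits"},
}

def run_job(name: str) -> None:
    print(f"submitting {name}")   # a real manager would submit to the batch system

for job in TopologicalSorter(dag).static_order():
    run_job(job)                   # pythia, cmsim, writeHits, writeDigis in order
```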
20. Early GriPhyN Challenge Problem: CMS Data Reconstruction (April 2001)
Sites: Caltech, NCSA, Wisconsin. A master Condor job runs on a Caltech workstation, a secondary Condor job runs on the UW Condor pool, and an NCSA Linux cluster plus NCSA UniTree (a GridFTP-enabled FTP server) provide reconstruction and mass storage.
2) Launch the secondary job on the Wisconsin pool; input files shipped via Globus GASS
3) 100 Monte Carlo jobs run on the Wisconsin Condor pool
4) 100 data files transferred via GridFTP, 1 GB each
5) Secondary job reports completion to the master
6) Master starts reconstruction jobs via the Globus jobmanager on the NCSA cluster
7) GridFTP fetches data from UniTree
8) Processed Objectivity database stored back to UniTree
9) Reconstruction job reports completion to the master
21. Trace of a Condor-G Physics Run
22. GriPhyN/PPDG Data Grid Architecture
- Release Dec. 2001; an initial solution is operational
(Architecture figure: an Application hands a DAG to the Planner, which draws on Catalog Services, Monitoring and Information Services; the resulting DAG goes to the Executor, which uses Replica Management, Policy/Security and a Reliable Transfer Service to drive Compute and Storage Resources. A small illustrative sketch of this layering follows.)
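The layering can be illustrated with a short sketch, assuming invented class and method names (they are not the actual GriPhyN/PPDG interfaces): an application request is planned into concrete steps using a catalog lookup, then carried out by an executor.

```python
# Illustrative layering only: planner turns an abstract request into
# concrete steps via a (pretend) replica lookup; the executor runs them.

from typing import List

class CatalogService:
    def locate(self, logical_file: str) -> str:
        return f"gsiftp://some.site/{logical_file}"   # pretend replica lookup

class Planner:
    def __init__(self, catalog: CatalogService) -> None:
        self.catalog = catalog

    def plan(self, request: str) -> List[str]:
        """Turn an abstract request into an ordered list of concrete steps."""
        source = self.catalog.locate(request)
        return [f"transfer {source}", f"process {request}"]

class Executor:
    def execute(self, steps: List[str]) -> None:
        for step in steps:
            print("executing:", step)   # a real executor would drive Grid services

planner = Planner(CatalogService())
Executor().execute(planner.plan("hits.DB"))
```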
23. Virtual Catalog Architecture
24. SC2001 Demo: CMS Production Pipeline
One run corresponds to 500 events (the SC2001 demo version processed 1 event per run); totals for an assumed production size are sketched below.

    Stage        CPU per run   Data per run   Output file
    pythia       2 min         0.5 MB         truth.ntpl
    cmsim        8 hours       175 MB         hits.fz
    writeHits    5 min         275 MB         hits.DB
    writeDigis   45 min        105 MB         digis.DB
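A quick arithmetic pass over the table, with the production size (1000 runs) chosen purely for illustration:

```python
# CPU time and output data per 500-event run, scaled to an assumed
# production of 1000 runs (the production size is not from the slide).

stages = {
    # stage:      (cpu_minutes, output_mb)
    "pythia":     (2,           0.5),
    "cmsim":      (8 * 60,      175),
    "writeHits":  (5,           275),
    "writeDigis": (45,          105),
}

cpu_minutes_per_run = sum(cpu for cpu, _ in stages.values())
data_mb_per_run = sum(mb for _, mb in stages.values())

N_RUNS = 1000  # assumed production size
print(f"CPU per run:  {cpu_minutes_per_run / 60:.1f} hours")
print(f"Data per run: {data_mb_per_run:.1f} MB")
print(f"Production:   {N_RUNS * cpu_minutes_per_run / 60:.0f} CPU-hours, "
      f"{N_RUNS * data_mb_per_run / 1e6:.2f} TB")
```

The dominant cost is clearly the 8-hour cmsim stage, which is why the simulation step is the one farmed out to remote Grid resources in the demos above.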
25. TeraGrid: 13.6 TFlops / 40 Gbps WAN
(Figure: the four TeraGrid sites, NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech and Argonne, each with local site resources (HPSS or UniTree archival storage) and connections to external networks over the 40 Gbps backbone.)
26. iVDGL
"We propose to create, operate and evaluate, over a sustained period of time, an international research laboratory for data-intensive science." (from the NSF proposal, 2001)
- International Virtual-Data Grid Laboratory
  - A global Grid laboratory (US, EU, Asia, ...)
  - A place to conduct Data Grid tests at scale
  - A mechanism to create common Grid infrastructure
  - A facility to perform production exercises for LHC experiments
  - A laboratory for other disciplines to perform Data Grid tests
27. iVDGL Summary Information
- GriPhyN + PPDG project
- Principal components (as seen by the USA)
  - Tier1 + proto-Tier2 + selected Tier3 sites
  - Fast networks: US, Europe, transatlantic (DataTAG), transpacific?
  - Grid Operations Center (GOC)
  - Computer Science support teams
  - Coordination, management
- Experiments
  - HEP: ATLAS, CMS (ALICE, CMS Heavy Ion, BTeV, others?)
  - Non-HEP: LIGO, SDSS, NVO, biology (small)
- Proposed international participants
  - 6 Fellows funded by the UK for 5 years, working in the US
  - US, UK, EU, Japan, Australia
  - Discussions with Russia, China, Pakistan, India, Brazil
28. iVDGL Partners
- US national partners
  - TeraGrid
- Complementary EU project: DataTAG
  - Transatlantic network from CERN to STAR-TAP (plus people)
  - Initially 2.5 Gb/s
- Industry
  - Discussions, but nothing firm yet
29. Initial US-LHC Data Grid
(Map of initial testbed sites, including Caltech/UCSD; site list possibly incomplete)
30. iVDGL Map, Circa 2002-2003
(Map figure; labeled network links include SURFNet and DataTAG)
31. US iVDGL Proposal Participants
- U Florida: CMS
- Caltech: CMS, LIGO
- UC San Diego: CMS, CS
- Indiana U: ATLAS, iGOC
- Boston U: ATLAS
- U Wisconsin, Milwaukee: LIGO
- Penn State: LIGO
- Johns Hopkins: SDSS, NVO
- U Chicago: CS
- U Southern California: CS
- U Wisconsin, Madison: CS
- Salish Kootenai: Outreach, LIGO
- Hampton U: Outreach, ATLAS
- U Texas, Brownsville: Outreach, LIGO
- Fermilab: CMS, SDSS, NVO
- Brookhaven: ATLAS
- Argonne Lab: ATLAS, CS
Site categories: T2 / Software; CS support; T3 / Outreach; T1 / Labs (not funded)
32. US iVDGL Management and Coordination
- Collaboration Board (advisory)
- External Advisory Board
- Project Directors: P. Avery, I. Foster
- Project Coordination Group: Project Directors, Coordinator, Coordinators of Systems Integration and Outreach, Physics Experiment Representatives, University Research Center or Group Representatives, PACI Representatives
(Org chart boxes below the coordination group: iVDGL Design and Deployment; Integration with Applications; University Research Centers / Groups; International Grid Operations Center)
33. US iVDGL Budget
(Budget table not reproduced; units are $1K)
34. International Grid Coordination Effort
- Participants
  - GriPhyN, PPDG, TeraGrid, EU-DataGrid, CERN
  - National efforts (USA, France, Italy, UK, NL, Japan, ...)
- Have agreed to collaborate and develop joint infrastructure
  - 1st meeting: Mar. 2001, Amsterdam (GGF1)
  - 2nd meeting: Jun. 2001, Rome (GGF2)
  - 3rd meeting: Oct. 2001, Rome
  - 4th meeting: Feb. 2002, Toronto (GGF4)
- Coordination details
  - Coordination and technical boards, open software agreement
  - Inter-project dependencies, mostly high energy physics
  - Grid middleware development and integration into applications
  - Major Grid and network testbeds → iVDGL + DataTAG (complementary)
35. Summary
- Robust set of U.S. Data Grid projects underway
  - GriPhyN
  - PPDG
  - iVDGL
  - TeraGrid
- Projects demonstrating great value to ATLAS/CMS
  - Grid R&D
  - Virtual Data Toolkit
  - ATLAS/CMS physics hires
  - Funding for proto-Tier2 sites (hardware, people)
  - Testbed for ATLAS/CMS Data Grids
- Collaboration among projects
  - Joint PPDG-GriPhyN architecture
  - Collaboration with EU DataGrid
  - Coordination among the various Data Grid projects
36. Grid References
- Grid book: www.mkp.com/grids
- Globus: www.globus.org
- Global Grid Forum: www.gridforum.org
- TeraGrid: www.teragrid.org
- EU DataGrid: www.eu-datagrid.org
- PPDG: www.ppdg.net
- GriPhyN: www.griphyn.org
- iVDGL: www.ivdgl.org
37. Extra slides
38. Selected Major Grid Projects
39. Selected Major Grid Projects (cont.)
40. Selected Major Grid Projects (cont.)
41. Selected Major Grid Projects (cont.)
(Project tables not reproduced; newly started projects were marked "New".)
- Also many technology R&D projects, e.g., Condor, NetSolve, Ninf, NWS
- See also www.gridforum.org