Title: HENP Grid Testbeds, Applications and Demonstrations
1. HENP Grid Testbeds, Applications and Demonstrations
Ruth Pordes, Fermilab
- Rob Gardner
- University of Chicago
- CHEP03
- March 29, 2003
2. Overview
- High-altitude survey of contributions
  - group, application, testbed, services/tools
- Discuss common and recurring issues
  - grid building, services development, use
- Concluding thoughts
- Acknowledgement to all the speakers who gave fine presentations, and my apologies in advance for providing only this very limited sampling
3. Testbeds, applications, and development of tools and services
- Testbeds
  - AliEn grids
  - BaBar Grid
  - CrossGrid
  - DataTAG
  - EDG Testbed(s)
  - Grid Canada
  - IGT Testbed (US CMS)
  - Korean DataGrid
  - NorduGrid(s)
  - SAMGrid
  - US ATLAS Testbed
  - WorldGrid
- Evaluations
  - EDG testbed evaluations and experience in multiple experiments
  - Testbed management experience
- Applications
  - ALICE production
  - ATLAS production
  - BaBar analysis, file replication
  - CDF/D0 analysis
  - CMS production
  - LHCb production
  - Medical applications in Italy
  - PHENIX
  - Sloan Digital Sky Survey
- Tools development
  - Use cases (HEPCAL)
  - PROOF/Grid analysis
  - LCG POOL and grid catalogs
  - SRM, Magda
  - Clarens, Ganga, Genius, Grappa, JAS
4. EDG Testbed History (Emanuele Leonardi)
- Early running (ATLAS Phase 1 start)
  - Successes: matchmaking/job management; basic data management
  - Known problems: high-rate submissions; long FTP transfers; GASS cache coherency; race conditions in the gatekeeper; unstable MDS
- CMS stress test, Nov 30 - Dec 20 (CMS, ATLAS, LHCb, ALICE)
  - Successes: improved MDS stability; FTP transfers OK
  - Known problems: interactions with the RC
  - Intense use by applications!
  - Limitations: resource exhaustion; size of logical collections
5. Resumé of experiment DC use of EDG (see experiment talks elsewhere at CHEP) - Stephen Burke
- ATLAS were first, in August 2002. The aim was to repeat part of the Data Challenge. They found two serious problems, which were fixed in 1.3.
- The CMS stress-test production (Nov-Dec 2002) found more problems in the areas of job submission and RC handling, leading to 1.4.x.
- ALICE started production on Mar 4: 5,000 central Pb-Pb events, 9 TB, 40,000 output files, 120k CPU hours
  - Progressing with efficiency levels similar to CMS
  - About 5% done by Mar 14
  - Pull architecture
- LHCb started in mid-February
  - 70K events for physics
  - Like ALICE, using a pull architecture
- BaBar/D0
  - Have so far done small-scale tests
  - Larger scale planned with EDG 2
6. CMS Data Challenge 2002 on Grid (C. Grande)
- Two official CMS productions on the grid in 2002
- CMS-EDG Stress Test on the EDG testbed and CMS sites
  - 260K events, CMKIN and CMSIM steps
  - Top-down approach: more functionality but less robust, large manpower needed
- US-CMS IGT Production in the US
  - 1M events, Ntuple-only (full chain in a single job)
  - 500K up to CMSIM (two steps in a single job)
  - Bottom-up approach: less functionality but more stable, little manpower needed
- See talk by P. Capiluppi
7. CMS production components interfaced to EDG
- Four submitting UIs: Bologna/CNAF (IT), Ecole Polytechnique (FR), Imperial College (UK), Padova/INFN (IT)
- Several Resource Brokers (WMS), both CMS-dedicated and shared with other applications; one RB for each CMS UI, plus a backup
- Replica Catalog at CNAF; MDS (and II) at CERN and CNAF; VO server at NIKHEF
- CMS ProdTools on the UI (a minimal submission sketch follows below)
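The submission path just described can be sketched as a thin wrapper on the UI that hands a JDL file to dg-job-submit and falls back to a backup Resource Broker if the primary refuses the job. The function name, the configuration file names, and the use of a per-RB UI configuration file via --config are assumptions for illustration, not the actual ProdTools implementation.

```python
# Hypothetical sketch: submit a JDL file from a UI, retrying against a
# backup Resource Broker. The per-RB UI configuration files and the
# --config option are assumptions for illustration, not a documented recipe.
import subprocess

# One dedicated RB per CMS UI plus a backup (file names are made up).
RB_CONFIGS = ["ui_rb_primary.conf", "ui_rb_backup.conf"]

def submit_jdl(jdl_path: str) -> str:
    """Return the grid job identifier printed by dg-job-submit on success."""
    for config in RB_CONFIGS:
        result = subprocess.run(
            ["dg-job-submit", "--config", config, jdl_path],
            capture_output=True, text=True,
        )
        if result.returncode == 0:
            return result.stdout.strip()
    raise RuntimeError(f"all Resource Brokers rejected {jdl_path}")

if __name__ == "__main__":
    print(submit_jdl("cmkin_assignment_00001.jdl"))
```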
8. CMS/EDG Production
- 260K events produced; 7 sec/event average; 2.5 sec/event peak (12-14 Dec)
[Plot: events produced vs. time, 30 Nov - 20 Dec (CMS Week); annotated "Upgrade of MW" and "Hit some limit of implementation"]
- See P. Capiluppi's talk
9. US-CMS IGT Production
- > 1M events
- 4.7 sec/event average; 2.5 sec/event peak (14-20 Dec 2002)
- Sustained efficiency about 44%
[Plot: events produced vs. time, 25 Oct - 28 Dec]
- See P. Capiluppi's talk (a back-of-the-envelope check of these rates follows below)
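As a quick consistency check, the quoted average rates roughly account for the length of each campaign. The sketch below assumes the sec/event figures are wall-clock averages over the whole running period (an assumption, not stated explicitly on the slides).

```python
# Back-of-the-envelope check of the quoted production rates, assuming the
# sec/event figures are wall-clock averages over the whole campaign.
def campaign_days(events: int, sec_per_event: float) -> float:
    """Wall-clock days needed at a given average rate."""
    return events * sec_per_event / 86_400

# CMS/EDG stress test: 260K events at ~7 s/event -> about 21 days,
# consistent with the 30 Nov - 20 Dec window.
print(f"CMS/EDG stress test: {campaign_days(260_000, 7.0):.1f} days")

# US-CMS IGT: ~1M events at ~4.7 s/event -> about 54 days, the same order
# as the 25 Oct - 28 Dec running period.
print(f"US-CMS IGT:          {campaign_days(1_000_000, 4.7):.1f} days")
```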
10. Grid in ATLAS DC1 (G. Poulard)
- Grids used in DC1: US-ATLAS testbed, EDG testbed, NorduGrid
- Roles ranged from reproducing part of Phase 1, through production of Phase 1 data and several tests, to full Phase 2 production
- See other ATLAS talks for more details
11. ATLAS DC1 Phase 1, July-August 2002 (G. Poulard)
- 3200 CPUs, 110 kSI95, 71,000 CPU-days
- 39 institutes in 18 countries:
  - Australia
  - Austria
  - Canada
  - CERN
  - Czech Republic
  - France
  - Germany
  - Israel
  - Italy
  - Japan
  - Nordic
  - Russia
  - Spain
  - Taiwan
  - UK
  - USA
- Grid tools used at 11 sites
- 5x10^7 events generated, 1x10^7 events simulated, 3x10^7 single particles, 30 TBytes, 35,000 files
12. Meta Systems (G. Graham)
- MCRunJob approach by the CMS production team
- A framework for dealing with multiple grid resources and testbeds (EDG, IGT); see the plug-in sketch below
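The "meta system" idea, one framework dispatching the same production request to different grid or testbed back-ends, can be sketched as a small plug-in registry. The class and function names here are illustrative only, not the actual MCRunJob interfaces.

```python
# Illustrative sketch of a meta-system dispatching one production request
# to different back-ends (names are hypothetical, not MCRunJob's own API).
from typing import Callable, Dict

BACKENDS: Dict[str, Callable[[dict], None]] = {}

def backend(name: str):
    """Register a submission function under a back-end name."""
    def register(func: Callable[[dict], None]):
        BACKENDS[name] = func
        return func
    return register

@backend("edg")
def submit_edg(request: dict) -> None:
    print(f"writing JDL and submitting {request['dataset']} to an EDG RB")

@backend("igt")
def submit_igt(request: dict) -> None:
    print(f"building a DAG and submitting {request['dataset']} via DAGMan/MOP")

def run(request: dict, target: str) -> None:
    BACKENDS[target](request)   # same request, different grid/testbed

run({"dataset": "jet_17", "events": 500_000}, target="igt")
```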
13. Hybrid production model (C. Grande)
[Diagram: a physics group asks for an official dataset; the Production Manager defines assignments in RefDB; a Site Manager starts an assignment at their site (or a user starts a private production). MCRunJob then drives the resources through one of several routes: shell scripts for a Local Batch Manager, JDL for the EDG Scheduler (LCG-1 testbed), DAGMan (MOP), or Chimera VDL with a Virtual Data Catalogue and Planner.]
14. Interoperability glue
15. Integrated Grid Systems
- Two examples of integrating advanced production and analysis across multiple grids:
  - SamGrid
  - AliEn
16. SamGrid Map
- CDF
  - Kyungpook National University, Korea
  - Rutgers State University, New Jersey, US
  - Rutherford Appleton Laboratory, UK
  - Texas Tech, Texas, US
  - University of Toronto, Canada
- DØ
  - Imperial College, London, UK
  - Michigan State University, Michigan, US
  - University of Michigan, Michigan, US
  - University of Texas at Arlington, Texas, US
17. Physics with SAM-Grid (S. Stonjek)
- Standard CDF analysis job submitted via SAM-Grid and executed somewhere
[Plot: z0(µ1) and z0(µ2) distributions for J/ψ → µ+µ- candidates]
18. The BaBar Grid as of March 2003 (D. Boutigny)
[Diagram: an RB, VO and RC serving multiple sites, each with a CE, SE and WNs]
- Special challenges faced by a running experiment with heterogeneous data requirements (ROOT, Objectivity)
19. Grid Applications, Interfaces, Portals
- Clarens
- Ganga
- Genius
- Grappa
- JAS-Grid
- Magda
- Proof-Grid
- ... and higher-level services
  - Storage Resource Manager (SRM)
  - Magda data management
  - POOL-Grid interface
20. PROOF and Data Grids (Fons Rademakers)
- Many grid services are a good fit:
  - Authentication
  - File catalog, replication services
  - Resource brokers
  - Monitoring
  - → Use abstract interfaces (see the sketch below)
- Phased integration
  - Static configuration
  - Use of one or multiple Grid services
  - Driven by Grid infrastructure
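A minimal sketch of the "abstract interfaces" point: the analysis layer depends only on abstract service interfaces, and a concrete binding (here a trivial static one, matching the phased-integration idea of starting from a static configuration) is plugged in behind them. All class names are illustrative, not PROOF or AliEn APIs.

```python
# Minimal sketch of the "abstract interfaces" idea: the analysis layer
# depends only on abstract services; concrete grid bindings are plugged in.
# All class names here are illustrative, not PROOF or AliEn APIs.
from abc import ABC, abstractmethod
from typing import List

class FileCatalog(ABC):
    @abstractmethod
    def replicas(self, lfn: str) -> List[str]:
        """Return physical locations for a logical file name."""

class ResourceBroker(ABC):
    @abstractmethod
    def pick_site(self, lfn: str) -> str:
        """Choose a site close to the data."""

class StaticCatalog(FileCatalog):
    """Phase-1 binding: a static config table instead of a real grid catalog."""
    def __init__(self, table: dict):
        self.table = table
    def replicas(self, lfn: str) -> List[str]:
        return self.table.get(lfn, [])

class NearestReplicaBroker(ResourceBroker):
    def __init__(self, catalog: FileCatalog):
        self.catalog = catalog
    def pick_site(self, lfn: str) -> str:
        sites = self.catalog.replicas(lfn)
        return sites[0] if sites else "cern.ch"   # trivial placement policy

catalog = StaticCatalog({"/alice/run42.root": ["bologna.infn.it", "nikhef.nl"]})
print(NearestReplicaBroker(catalog).pick_site("/alice/run42.root"))
```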
21. Different PROOF-Grid Scenarios (Fons Rademakers)
- Static, stand-alone
  - Current version; static config file; pre-installed
- Dynamic, PROOF in control
  - Using the grid file catalog and resource broker; pre-installed
- Dynamic, AliEn in control
  - Idem, but installed and started on the fly by AliEn
- Dynamic, Condor in control
  - Idem, but additionally allowing slave migration within a Condor pool
22. WorldGrid (see the WorldGrid poster at this conference)
Executable = "/usr/bin/env";
Arguments = "zsh prod.dc1_wrc 00001";
VirtualOrganization = "datatag";
Requirements = Member(other.GlueHostApplicationSoftwareRunTimeEnvironment, "ATLAS-3.2.1");
Rank = other.GlueCEStateFreeCPUs;
InputSandbox = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};
OutputSandbox = {"dc1.002000.test.00001.hlt.pythia_jet_17.log", "dc1.002000.test.00001.hlt.pythia_jet_17.his", "dc1.002000.test.00001.hlt.pythia_jet_17.err", "plot.kumac"};
ReplicaCatalog = "ldap://dell04.cnaf.infn.it:9211/lc=ATLAS,rc=GLUE,dc=dell04,dc=cnaf,dc=infn,dc=it";
InputData = "LF:dc1.002000.evgen.0001.hlt.pythia_jet_17.root";
StdOutput = "dc1.002000.test.00001.hlt.pythia_jet_17.log";
StdError = "dc1.002000.test.00001.hlt.pythia_jet_17.err";
DataAccessProtocol = "file";
JDL: GLUE-aware files
[Diagram: a GLUE-aware JDL job is submitted to the RB/JSS, which uses the GLUE-Schema based Information System (II, TOP GIIS) and the Replica Catalog for input data location; the job runs on a CE/WN with the ATLAS software installed and registers its output data.]
23. Ganga: ATLAS and LHCb (C. Tull)
24. Ganga EDG Grid Interface (C. Tull)
[Diagram: Ganga components (Job class, Job Handler class, JobsRegistry class) mapped onto EDG UI services:
- Job submission: dg-job-list-match, dg-job-submit, dg-job-cancel
- Job monitoring: dg-job-status, dg-job-get-logging-info, GRM/PROVE
- Security service: grid-proxy-init, MyProxy
- Data management service: edg-replica-manager, dg-job-get-output, globus-url-copy, GDMP]
(A rough sketch of such a wrapper layer follows below.)
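The mapping in the diagram, job-handling classes shelling out to the EDG UI commands listed above, can be sketched roughly as follows. The class and method names are illustrative only and do not reproduce the real Ganga interfaces.

```python
# Rough sketch of a job-handler layer over the EDG UI commands listed above.
# Class/method names are illustrative only, not the actual Ganga classes.
import subprocess
from typing import List

def _run(*cmd: str) -> str:
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

class EDGJobHandler:
    """Submission, monitoring and output retrieval via the EDG UI."""
    def submit(self, jdl_path: str) -> str:
        return _run("dg-job-submit", jdl_path).strip()   # returns the job id
    def status(self, job_id: str) -> str:
        return _run("dg-job-status", job_id)
    def get_output(self, job_id: str) -> str:
        return _run("dg-job-get-output", job_id)

class JobsRegistry:
    """Keep track of submitted jobs so they can be monitored later."""
    def __init__(self):
        self.jobs: List[str] = []
    def add(self, job_id: str) -> None:
        self.jobs.append(job_id)

handler, registry = EDGJobHandler(), JobsRegistry()
# registry.add(handler.submit("analysis.jdl"))   # requires a working EDG UI
```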
25. Comment: Building Grid Applications
- P is a dynamic configuration script
  - Turns an abstract bundle into a concrete one
- Challenges
  - Building integrated systems
  - Distributed developers and support
[Diagram: script P takes attributes, user info, and grid info as input; a minimal sketch follows below]
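A minimal sketch of what such a dynamic configuration step might look like: merging package attributes with user and grid information to turn an abstract bundle into a concrete, site-specific one. All field names here are hypothetical.

```python
# Minimal sketch (hypothetical field names): a configuration script that
# turns an abstract application bundle into a concrete, site-specific one.
def concretize(bundle: dict, user_info: dict, grid_info: dict) -> dict:
    """Fill the abstract bundle's unresolved attributes from user/grid info."""
    concrete = dict(bundle)
    concrete["install_dir"] = grid_info["app_area"] + "/" + bundle["name"]
    concrete["proxy"] = user_info["proxy_file"]
    concrete["runtime_env"] = grid_info["runtime_tags"]
    return concrete

abstract_bundle = {"name": "atlas-3.2.1", "steps": ["untar", "configure", "run"]}
print(concretize(
    abstract_bundle,
    user_info={"proxy_file": "/tmp/x509up_u501"},
    grid_info={"app_area": "/opt/exp_soft", "runtime_tags": ["ATLAS-3.2.1"]},
))
```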
26. In summary: common issues
- Installation and configuration of middleware
- Application packaging, run-time environments
- Authentication mechanisms
- Policies differing among sites
- Private networks, firewalls, ports
- Fragility of services and of the job-submission chain
- Inaccuracies and poor performance of information services
- Monitoring at several levels
- Debugging, site cleanup
27. Conclusions
- Progress in the past 18 months has been dramatic!
  - Lots of experience gained in building integrated grid systems
  - Demonstrated functionality with large-scale production
  - More attention now being given to analysis
- Many pitfalls exposed and areas for improvement identified
  - Some of these are in core middleware → feedback given to the technology providers
- Policy issues remain: use of shared resources, authorization
  - Operation of production services
  - User interactions and support models still to be developed
- Many thanks to the contributors to this session