HENP Grid Testbeds, Applications and Demonstrations
1
HENP Grid Testbeds, Applications and
Demonstrations
Ruth Pordes Fermilab
  • Rob Gardner
  • University of Chicago
  • CHEP03
  • March 29, 2003

2
Overview
  • High altitude survey of contributions
  • group, application, testbed, services/tools
  • Discuss common and recurring issues
  • grid building, services development, use
  • Concluding thoughts
  • Acknowledgements to all the speakers who gave fine
    presentations, and my apologies in advance that
    this is only a very limited sampling

3
Testbeds, applications, and development of tools
and services
  • Testbeds
  • AliEn grids
  • BaBar Grid
  • CrossGrid
  • DataTAG
  • EDG Testbed(s)
  • Grid Canada
  • IGT Testbed (US CMS)
  • Korean DataGrid
  • NorduGrid(s)
  • SAMGrid
  • US ATLAS Testbed
  • WorldGrid
  • Evaluations
  • EDG testbed evaluations and experience in
    multiple experiments
  • Testbed management experience
  • Applications
  • ALICE production
  • ATLAS production
  • BaBar analysis, file replication
  • CDF/D0 analysis
  • CMS production
  • LHCb production
  • Medical applications in Italy
  • Phenix
  • Sloan sky survey
  • Tools development
  • Use cases (HEPCAL)
  • Proof/Grid analysis
  • LCG Pool and grid catalogs
  • SRM, Magda
  • Clarens, Ganga, Genius, Grappa, JAS

4
EDG TB History
  • Successes
  • Matchmaking/Job Mgt.
  • Basic Data Mgt.
  • Known Problems
  • High Rate Submissions
  • Long FTP Transfers
  • GASS Cache Coherency
  • Race Conditions in Gatekeeper
  • Unstable MDS

ATLAS phase 1 start
CMS stress test Nov.30 - Dec. 20
  • Successes
  • Improved MDS Stability
  • FTP Transfers OK
  • Known Problems
  • Interactions with RC
  • Intense Use by Applications!
  • Limitations
  • Resource Exhaustion
  • Size of Logical Collections

CMS, ATLAS, LHCB, ALICE
Emanuele Leonardi
5
Résumé of experiment DC use of EDG (see experiment
talks elsewhere at CHEP)
Stephen Burke
  • ATLAS were first, in August 2002. The aim was to
    repeat part of the Data Challenge. Found two
    serious problems which were fixed in 1.3
  • CMS stress test production (Nov-Dec 2002) found
    more problems in the areas of job submission and
    RC handling, which led to 1.4.x
  • ALICE started production on Mar 4: 5,000 central
    Pb-Pb events - 9 TB, 40,000 output files,
    120k CPU hours
  • Progressing with similar efficiency levels to CMS
  • About 5% done by Mar 14
  • Pull architecture
  • LHCb started mid Feb
  • 70K events for physics
  • Like ALICE, using a pull architecture
  • BaBar/D0
  • Have so far done small scale tests
  • Larger scale planned with EDG 2

6
CMS Data Challenge 2002 on Grid
C. Grande
  • Two official CMS productions on the grid in
    2002
  • CMS-EDG Stress Test on EDG testbed CMS sites
  • 260K events CMKIN and CMSIM steps
  • Top-down approach: more functionality but less
    robust; large manpower needed
  • USCMS IGT Production in the US
  • 1M events Ntuple-only (full chain in single job)
  • 500K up to CMSIM (two steps in single job)
  • Bottom-up approach: less functionality but more
    stable; little manpower needed
  • See talk by P.Capiluppi

7
CMS production components interfaced to EDG
  • Four submitting UIs: Bologna/CNAF (IT), Ecole
    Polytechnique (FR), Imperial College (UK),
    Padova/INFN (IT)
  • Several Resource Brokers (WMS), CMS-dedicated and
    shared with other applications; one RB for each
    CMS UI, plus a backup
  • Replica Catalog at CNAF, MDS (and II) at CERN and
    CNAF, VO server at NIKHEF

CMS ProdTools run on each UI (a sketch of this layout follows)
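As a hedged illustration of this layout (not taken from any CMS configuration file; all hostnames are invented placeholders), a Python sketch of the one-Resource-Broker-per-UI arrangement with a shared backup might look like this:

    # Illustrative sketch only: the CMS/EDG submission layout described above,
    # with one Resource Broker (RB) per submitting UI plus a shared backup.
    # All hostnames are invented placeholders.

    LAYOUT = {
        "uis": ["cnaf", "polytechnique", "imperial", "padova"],
        "brokers": {
            "cnaf": "rb1.example.org",
            "polytechnique": "rb2.example.org",
            "imperial": "rb3.example.org",
            "padova": "rb4.example.org",
        },
        "backup_broker": "rb-backup.example.org",
        "replica_catalog": "rc.cnaf.example.org",  # Replica Catalog at CNAF
        "vo_server": "vo.nikhef.example.org",      # VO server at NIKHEF
    }

    def pick_broker(ui, available):
        """Use the UI's dedicated RB if it is reachable, otherwise the shared backup."""
        primary = LAYOUT["brokers"][ui]
        return primary if primary in available else LAYOUT["backup_broker"]

    if __name__ == "__main__":
        # Pretend only the backup RB is currently reachable.
        print(pick_broker("cnaf", available={"rb-backup.example.org"}))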
8
CMS/EDG Production
260K events produced; 7 sec/event average, 2.5
sec/event peak (12-14 Dec)
[Plot: events produced vs. time, 30 Nov - 20 Dec, annotated with the CMS Week, an upgrade of the middleware, and hitting some limit of the implementation]
P. Capiluppi talk
9
US-CMS IGT Production
  • > 1 M events
  • 4.7 sec/event average
  • 2.5 sec/event peak (14-20 Dec 2002)
  • Sustained efficiency about 44%

[Plot: events produced vs. time, 25 Oct - 28 Dec 2002]
P. Capiluppi talk
10
G.Poulard
Grid in ATLAS DC1
[Diagram: grids in ATLAS DC1 - US-ATLAS testbed, EDG testbed and NorduGrid; production of part of the Phase 1 data, reproduction of part of Phase 1, several test productions, and full Phase 2 production]
See other ATLAS talks for more details
11
ATLAS DC1 Phase 1 July-August 02
G.Poulard
3200 CPUs, 110 kSI95, 71 000 CPU-days
39 Institutes in 18 Countries
  • Australia
  • Austria
  • Canada
  • CERN
  • Czech Republic
  • France
  • Germany
  • Israel
  • Italy
  • Japan
  • Nordic
  • Russia
  • Spain
  • Taiwan
  • UK
  • USA

grid tools used at 11 sites
5×10⁷ events generated, 1×10⁷ events simulated,
3×10⁷ single particles, 30 TBytes, 35 000 files
12
Meta Systems
G.Graham
  • MCRunJob approach by CMS production team
  • Framework for dealing with multiple grid
    resources and testbeds (EDG, IGT)

13
Hybrid production model
C. Grande
[Diagram: hybrid production model. A physics group asks for an official dataset in the RefDB; the Production Manager defines assignments; a Site Manager starts an assignment, or a user starts a private production, using the resources at the user's site. MCRunJob then produces shell scripts for a local batch manager, JDL for the EDG scheduler on the LCG-1 testbed, DAGMan (MOP) submissions, or Chimera VDL for the Virtual Data Catalogue and its planner.]
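As a rough sketch of the dispatch idea behind MCRunJob and this hybrid model (illustrative only, not CMS production code; class names, option names and formats are invented), one abstract assignment could be turned into a concrete job description for a local batch manager, the EDG scheduler, or DAGMan/MOP:

    # Illustrative sketch: turn one abstract production assignment into a
    # concrete job description for different back ends (local batch, EDG JDL,
    # Condor DAGMan/MOP). Names and formats are simplified placeholders.
    from dataclasses import dataclass

    @dataclass
    class Assignment:
        dataset: str
        n_events: int
        executable: str = "run_cmsim.sh"  # invented wrapper-script name

    def to_local_script(a):
        """Shell script for a local batch manager."""
        return f"#!/bin/sh\n{a.executable} --dataset {a.dataset} --events {a.n_events}\n"

    def to_jdl(a):
        """JDL fragment for the EDG scheduler / LCG-1 testbed."""
        return (f'Executable = "{a.executable}";\n'
                f'Arguments = "--dataset {a.dataset} --events {a.n_events}";\n'
                f'InputSandbox = {{"{a.executable}"}};\n')

    def to_dagman(a):
        """Minimal DAGMan (MOP-style) node description."""
        return f'JOB {a.dataset} {a.executable}.submit\nVARS {a.dataset} events="{a.n_events}"\n'

    BACKENDS = {"local": to_local_script, "edg": to_jdl, "dagman": to_dagman}

    def plan(assignment, backend):
        """Concretize one abstract assignment for the chosen back end."""
        return BACKENDS[backend](assignment)

    if __name__ == "__main__":
        job = Assignment(dataset="example_dataset", n_events=500)
        for name in BACKENDS:
            print(f"--- {name} ---")
            print(plan(job, name))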
14
Interoperability glue
15
Integrated Grid Systems
  • Two examples of integrating advanced production
    and analysis to multiple grids

SamGrid
AliEn
16
SamGrid Map
  • CDF
  • Kyungpook National University, Korea
  • Rutgers State University, New Jersey, US
  • Rutherford Appleton Laboratory, UK
  • Texas Tech, Texas, US
  • University of Toronto, Canada
  • DØ
  • Imperial College, London, UK
  • Michigan State University, Michigan, US
  • University of Michigan, Michigan, US
  • University of Texas at Arlington, Texas, US

17
Physics with SAM-Grid
S. Stonjek
Standard CDF analysis job submitted via SAM-Grid
and executed somewhere
[Plot: z0(µ1) vs z0(µ2) for J/ψ → µ⁺µ⁻ candidates]
18
The BaBar Grid as of March 2003
D. Boutigny
[Diagram: BaBar Grid sites, each with a CE, SE and WNs, connected to a Resource Broker and to VO and Replica Catalog services]
Special challenges are faced by a running experiment
with heterogeneous data requirements (ROOT, Objectivity)
19
Grid Applications, Interfaces, Portals
  • Clarens
  • Ganga
  • Genius
  • Grappa
  • JAS-Grid
  • Magda
  • Proof-Grid
  • and higher level services
  • Storage Resource Manager (SRM)
  • Magda data management
  • POOL-Grid interface

20
PROOF and Data Grids
Fons Rademakers
  • Many services are a good fit
  • Authentication
  • File Catalog, replication services
  • Resource brokers
  • Monitoring
  • → Use abstract interfaces (see the sketch after
    this list)
  • Phased integration
  • Static configuration
  • Use of one or multiple Grid services
  • Driven by Grid infrastructure
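To make the "abstract interfaces" point concrete, here is a hedged Python sketch (ROOT/PROOF itself is C++; this is only an illustration, with invented class names) of an analysis layer that talks to either a static configuration or a grid file catalogue through a single interface:

    # Illustrative sketch only (not PROOF code): the analysis layer depends on
    # an abstract file-catalogue interface, and concrete back ends are chosen
    # at run time (static configuration vs. grid catalogue service).
    from abc import ABC, abstractmethod

    class FileCatalog(ABC):
        """Abstract interface used by the analysis layer."""

        @abstractmethod
        def replicas(self, lfn):
            """Return physical replica locations for a logical file name."""

    class StaticCatalog(FileCatalog):
        """Static configuration: replicas read from a pre-installed mapping."""

        def __init__(self, mapping):
            self._mapping = mapping

        def replicas(self, lfn):
            return self._mapping.get(lfn, [])

    class GridCatalog(FileCatalog):
        """Placeholder for a real grid replica-catalogue client."""

        def replicas(self, lfn):
            raise NotImplementedError("would query the grid replica catalogue")

    def first_replica(catalog, lfn):
        """Pick the first available replica; a resource broker would rank them."""
        replicas = catalog.replicas(lfn)
        return replicas[0] if replicas else None

    if __name__ == "__main__":
        static = StaticCatalog({"lfn:example.root": ["gsiftp://siteA/example.root"]})
        print(first_replica(static, "lfn:example.root"))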

21
Different PROOF-Grid Scenarios
Fons Rademakers
  • Static stand-alone
  • Current version, static config file,
    pre-installed
  • Dynamic, PROOF in control
  • Using grid file catalog and resource broker,
    pre-installed
  • Dynamic, AliEn in control
  • Idem, but installed and started on the fly by
    AliEn
  • Dynamic, Condor in control
  • Idem, but allowing in addition slave migration in
    a Condor pool

22
See the WorldGrid poster at this conference.
Executable "/usr/bin/env" Arguments "zsh
prod.dc1_wrc 00001" VirtualOrganization"datatag"
RequirementsMember(other.GlueHostApplicationSof
tware RunTimeEnvironment,"ATLAS-3.2.1" ) Rank
other.GlueCEStateFreeCPUs InputSandbox"prod.dc1
_wrc",rc.conf","plot.kumac" OutputSandbox"dc1
.002000.test.00001.hlt.pythia_jet_17.log","dc1.002
000.test.00001.hlt.pythia_jet_17.his","dc1.002000.
test.00001.hlt.pythia_jet_17.err","plot.kumac" R
eplicaCatalog"ldap//dell04.cnaf.infn.it9211/lc
ATLAS,rcGLUE,dcdell04,dccnaf,dcinfn,dcit" In
putData "LFdc1.002000.evgen.0001.hlt.pythia_je
t_17.root" StdOutput " dc1.002000.test.00001.h
lt.pythia_jet_17.log" StdError
"dc1.002000.test.00001.hlt.pythia_jet_17.err" Dat
aAccessProtocol "file"
GLUE-aware JDL files
[Diagram: GLUE testbed. The RB/JSS uses the GLUE-Schema based Information System (II, TOP GIIS) and the Replica Catalog to locate the input data; the job runs on a CE whose WNs have the ATLAS software installed, and registers its output data.]
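As a hedged illustration of how such a JDL file is used (not from the talk; the exact dg-job-submit options varied between EDG releases, and the shortened JDL here is a placeholder), a small Python wrapper could write the JDL to disk and hand it to the EDG user-interface command, assuming a valid grid proxy already exists:

    # Illustrative only: save a (shortened) JDL and submit it with the EDG UI
    # command dg-job-submit. Requires an EDG UI installation and a valid proxy
    # (grid-proxy-init); file names here are placeholders.
    import subprocess

    JDL_TEXT = (
        'Executable = "/usr/bin/env";\n'
        'Arguments = "zsh prod.dc1_wrc 00001";\n'
        'VirtualOrganization = "datatag";\n'
        'InputSandbox = {"prod.dc1_wrc", "rc.conf", "plot.kumac"};\n'
        'StdOutput = "job.log";\n'
        'StdError = "job.err";\n'
    )

    def submit(jdl_text, path="job.jdl"):
        """Write the JDL to a file and call the EDG submission command."""
        with open(path, "w") as f:
            f.write(jdl_text)
        subprocess.run(["dg-job-submit", path], check=True)

    if __name__ == "__main__":
        submit(JDL_TEXT)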
23
Ganga ATLAS and LHCb
C. Tull
24
C. Tull
Ganga EDG Grid Interface
[Diagram: Ganga classes (Job Handler, Job, JobsRegistry) interfaced to the EDG UI through four services:]
  • Security service: grid-proxy-init, MyProxy
  • Job submission: dg-job-list-match, dg-job-submit, dg-job-cancel
  • Job monitoring: dg-job-status, dg-job-get-logging-info, GRM/PROVE
  • Data management service: edg-replica-manager, dg-job-get-output, globus-url-copy, GDMP
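As a hedged sketch of this class-to-command mapping (not actual Ganga source code, although Ganga itself is implemented in Python), a minimal job handler could shell out to the EDG UI commands listed above:

    # Hedged sketch, not Ganga source code: map high-level job operations onto
    # the EDG user-interface commands shown in the diagram above.
    import subprocess

    class EDGJobHandler:
        """Thin wrapper around the EDG UI command-line tools."""

        def submit(self, jdl_file):
            """Submit a JDL file; the command output contains the job identifier."""
            result = subprocess.run(["dg-job-submit", jdl_file],
                                    capture_output=True, text=True, check=True)
            return result.stdout

        def status(self, job_id):
            result = subprocess.run(["dg-job-status", job_id],
                                    capture_output=True, text=True, check=True)
            return result.stdout

        def cancel(self, job_id):
            subprocess.run(["dg-job-cancel", job_id], check=True)

        def get_output(self, job_id):
            subprocess.run(["dg-job-get-output", job_id], check=True)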
25
Comment: Building Grid Applications
  • P is a dynamic configuration script
  • Turns abstract bundle into a concrete one
  • Challenge
  • building integrated systems
  • distributed developers and support

[Diagram: the configuration script P takes attributes, user info and grid info as inputs]
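A hypothetical Python sketch of such a script P (all field names are invented for illustration): it merges package attributes with user and grid information to turn an abstract bundle description into a concrete, site-specific one.

    # Hypothetical sketch of the dynamic configuration script "P": it combines
    # package attributes with user and grid information to turn an abstract
    # bundle into a concrete one. All field names are invented.

    def configure(abstract_bundle, user_info, grid_info):
        """Resolve an abstract bundle description into a concrete one."""
        concrete = dict(abstract_bundle)
        concrete["install_dir"] = f"{user_info['home']}/{abstract_bundle['name']}"
        concrete["storage_element"] = grid_info["default_se"]
        concrete["runtime_env"] = grid_info["runtime_tags"]
        return concrete

    if __name__ == "__main__":
        bundle = {"name": "example-app-1.0", "requires": ["root"]}
        user = {"home": "/home/griduser"}
        grid = {"default_se": "se.example.org", "runtime_tags": ["EXAMPLE-1.0"]}
        print(configure(bundle, user, grid))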
26
In summary: Common issues
  • Installation and configuration of MW
  • Application packaging, run time environments
  • Authentication mechanisms
  • Policies differing among sites
  • Private networks, firewalls, ports
  • Fragility of services, job submission chain
  • Inaccuracies, poor performance of information
    services
  • Monitoring at several levels
  • Debugging, site cleanup

27
Conclusions
  • Progress in the past 18 months has been dramatic!
  • lots of experience gained in building integrated
    grid systems
  • demonstrated functionality with large scale
    production
  • more attention being given to analysis
  • Many pitfalls exposed, areas for improvement
    identified
  • some of these are core middleware → feedback
    given to technology providers
  • Policy issues remain: using shared resources,
    authorization
  • operation of production services
  • user interactions, support models to be developed
  • Many thanks to the contributors to this session