Title: Grid3: Practice and Principles
1 Grid3: Practice and Principles
- Rob Gardner
- University of Chicago
- rwg_at_hep.uchicago.edu
- April 12, 2004
2 Introduction
- Today I'd like to review a little about Grid3, its principles and practice
- I'd also like to show how we intend to use it for ATLAS Data Challenge 2
- and how we will evolve it over the course of the next several months
- Acknowledgements
- I'm entirely reporting on others' work!
- Much of the project was summarized at the iVDGL NSF review (Feb 04) and the US LHC project reviews (Jan 04)
- I am especially indebted to Marco Mambelli, Ian Fisk, Ruth Pordes, Jorge Rodriguez, and Leigh Grundhoefer for slides
3 Outline
- Intro/project background
- Grid3 infrastructure and operations
- Applications
- Metrics and Lessons
- Grid3, development, and ATLAS DC2
- Conclusions
4 Grid themes, then and now
- E.g., the proposal (simple, naïve?)
- Internationally integrated virtual data grid system; interoperable, controlled sharing of data and compute resources
- Common grid infrastructure, exported to other application domains
- Large-scale research laboratory for VDT development
- Now
- Many new initiatives, in the US and worldwide
- Dynamic resources and organizations
- Dynamic project priorities
- Ability to adapt to change is a key factor for success
- → Grid2003 (Grid3)
5 Grid2003 Project history
- Joint project of US ATLAS, US CMS, iVDGL, PPDG, and GriPhyN
- Organized as a project: Grid2003
- Developed Summer/Fall 2003: the Grid3 grid
- Benefited from STAR cycles and local efforts at BNL/ITD
- Uses US-developed components
- VDT-based (GRAM, GridFTP, MDS, monitoring components) applications
- iGOC monitoring and VO-level services
- Interoperate, or federate, with other grids such as LCG
- ATLAS successfully used Chimera → LCG-1 last December
- US CMS storage element interoperability
6 23 institutes
- Argonne National Laboratory: Jerry Gieraltowski, Scott Gose, Natalia Maltsev, Ed May, Alex Rodriguez, Dinanath Sulakhe
- Boston University: Jim Shank, Saul Youssef
- Brookhaven National Laboratory: David Adams, Rich Baker, Wensheng Deng, Jason Smith, Dantong Yu
- Caltech: Iosif Legrand, Suresh Singh, Conrad Steenberg, Yang Xia
- Fermi National Accelerator Laboratory: Anzar Afaq, Eileen Berman, James Annis, Lothar Bauerdick, Michael Ernst, Ian Fisk, Lisa Giacchetti, Greg Graham, Anne Heavey, Joe Kaiser, Nickolai Kuropatkin, Ruth Pordes, Vijay Sekhri, John Weigand, Yujun Wu
- Hampton University: Keith Baker, Lawrence Sorrillo
- Harvard University: John Huth
- Indiana University: Matt Allen, Leigh Grundhoefer, John Hicks, Fred Luehring, Steve Peck, Rob Quick, Stephen Simms
- Johns Hopkins University: George Fekete, Jan vandenBerg
- Kyungpook National University / KISTI: Kihyeon Cho, Kihwan Kwon, Dongchul Son, Hyoungwoo Park
- Lawrence Berkeley National Laboratory: Shane Canon, Jason Lee, Doug Olson, Iwona Sakrejda, Brian Tierney
- University at Buffalo: Mark Green, Russ Miller
- University of California San Diego: James Letts, Terrence Martin
- University of Chicago: David Bury, Catalin Dumitrescu, Daniel Engh, Ian Foster, Robert Gardner, Marco Mambelli, Yuri Smirnov, Jens Voeckler, Mike Wilde, Yong Zhao, Xin Zhao
- University of Florida: Paul Avery, Richard Cavanaugh, Bockjoo Kim, Craig Prescott, Jorge L. Rodriguez, Andrew Zahn
- University of Michigan: Shawn McKee
- University of New Mexico: Christopher T. Jordan, James E. Prewett, Timothy L. Thomas
- University of Oklahoma: Horst Severini
- University of Southern California: Ben Clifford, Ewa Deelman, Larry Flon, Carl Kesselman, Gaurang Mehta, Nosa Olomu, Karan Vahi
- University of Texas, Arlington: Kaushik De, Patrick McGuigan, Mark Sosebee
- University of Wisconsin-Madison: Dan Bradley, Peter Couvares, Alan De Smet, Carey Kireyev, Erik Paulson, Alain Roy
- University of Wisconsin-Milwaukee: Scott Koranda, Brian Moe
- Vanderbilt University: Bobby Brown, Paul Sheldon
Contact authors
60 people working directly: 8 full time, 10 half time, 20 site admins at ¼ time
7 Grid3 services, roughly
- Site Software
- VO Services
- Information Services
- Monitoring
- Applications
8 Operations - Site Software
- Design of the Grid3 site software distribution base
- Largely based upon the successful WorldGrid installation deployment
- Create and maintain installation guides
- Coordinate upgrades after initial installation (Installation Fests)
[Diagram: Pacman delivers the iVDGL Grid3 site software (VDT, VO service, GIIS registration, information providers, Grid3 schema, log management) onto a Grid3 site's compute facility and storage]
9 Operations - Security Base
- The iVDGL Registration Authority is one of the Virtual Organization Registration Authorities (VO RAs) operating with delegated authority from the DOE Grids Certificate Authority
- The iVDGL RA is used to check the identity of individuals requesting certificates
- 282 iVDGL certificates have been issued for iVDGL use
10 Operations - Security Model
- Provide Grid3 compute resources with an automated multi-VO authorization model, using VOMS and mkgridmap
- Each VO manages a service and its members
- Each Grid3 site is able to generate a Globus authorization file via an authenticated SOAP query to each VO service (sketched below)
[Diagram: Grid3 sites build their Globus authorization files from the SDSS, USCMS, US ATLAS, BTeV, LSC, and iVDGL VOMS servers]
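The production path uses VOMS servers queried over SOAP by the mkgridmap tooling; the sketch below only mimics the final step of that model, merging per-VO member lists into a single Globus grid-mapfile. The file names and the plain-text member-list format are simplifying assumptions, not the real VOMS interface.

```python
# Sketch only: merge per-VO membership into a Globus grid-mapfile.
# In Grid3 the member lists come from VOMS via an authenticated SOAP
# query; here each VO's membership is assumed to be a plain text file
# of certificate DNs, and every member maps to a shared VO account.
vo_member_files = {            # hypothetical file names
    "usatlas": "usatlas_members.txt",
    "uscms":   "uscms_members.txt",
    "ivdgl":   "ivdgl_members.txt",
}

def build_gridmap(vo_member_files, out_path="grid-mapfile"):
    lines = []
    for vo, path in vo_member_files.items():
        with open(path) as f:
            for dn in f:
                dn = dn.strip()
                if dn:
                    # grid-mapfile format: "<quoted DN>" local_account
                    lines.append(f'"{dn}" {vo}')
    with open(out_path, "w") as out:
        out.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    build_gridmap(vo_member_files)
```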
11 Operations - Support and Policy
- Investigation and resolution of grid middleware problems at the level of 16-20 contacts per week
- Develop service level agreements for grid service systems and iGOC support services, and for other centralized support points such as the LHC Tier1
- Membership charter completed, defining the process to add new VOs, sites, and applications to the Grid Laboratory
- Support matrix defining Grid3 and VO service providers and contact information
12 Operations - Index Services
- Hierarchical Globus Information Index Service (GIIS) design (an example client query is sketched below)
- Automated resource registration to the index service
- MDS Grid3 schema development and information provider verification
- MDS tuning for a large heterogeneous grid
[Diagram: site GRIS servers (e.g. Boston U, UofChicago, ANL, BNL, UFL, FNAL, RiceU, CalTech) register into per-VO index services (USAtlas GIIS, USCMS GIIS, ...; 6 in all), which feed the Grid3 Index Service; published attributes include the Grid3 location and the Grid3 data, applications, and temporary directories]
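As a rough illustration of how a client reads this hierarchy, the sketch below shells out to ldapsearch against a GIIS (MDS 2.x speaks LDAP). The host name is a placeholder, and the port and base DN follow the usual Globus MDS 2.x conventions, which individual sites or VO indexes may override.

```python
# Sketch: query a GIIS/GRIS with ldapsearch.  Host, port, and base DN
# are assumptions; the attributes returned would include the Grid3
# application/data/temporary directory values mentioned above.
import subprocess

def query_giis(host="giis.example.org", port=2135,
               base="mds-vo-name=local,o=grid", filt="(objectClass=*)"):
    cmd = ["ldapsearch", "-x", "-LLL",
           "-H", f"ldap://{host}:{port}", "-b", base, filt]
    out = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return out.stdout

if __name__ == "__main__":
    print(query_giis())
```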
13 Grid Operations - Site Monitoring
- Ganglia
- Open source tool to collect cluster monitoring information such as CPU and network load, memory and disk usage
- MonALISA
- Monitoring tool to support resource discovery, access to information, and a gateway to other information-gathering systems
- ACDC Job Monitoring System
- Application using grid-submitted jobs to query the job managers and collect information about jobs; this information is stored in a DB and available for aggregated queries and browsing
- Metrics Data Viewer (MDViewer)
- Analyzes and plots information collected by the different monitoring tools, such as the DBs at the iGOC
- Globus MDS
- Grid2003 schema for the information services and index services
14 Monitoring services
[Diagram: producers (OS syscalls and /proc, GRIS, log files, system configuration, job managers) feed intermediaries (MonALISA clients) which serve consumers (WWW reports, user clients, MDViewer)]
15 Ganglia
- Usage information
- CPU load
- NIC traffic
- Memory usage
- Disk usage
- Used directly and indirectly
- Site Web pages
- Central Web pages
- MonALISA agent
16 MonALISA
- Flexible framework
- Java based
- JINI directory
- Multiple agents
- Nice graphic interface
- 3D globe
- Elastic connection
- ML repository
- Persistent repository
17 Metrics Data Viewer
- Flexible tool for information analysis
- Databases, log files
- Multiple plot capabilities
- Predefined plots, possibility to add new ones
- Customizable, possibility to export the plots
18 MDViewer (2)
- CPU provided
- CPU used
- Load
- Usage per VO
- IO
- NIC
- File transfers per VO
- Jobs
- Submitted, Failed,
- Running, Idle,
19 MDViewer (3)
20 Operations - Site Catalog Map
21 Grid2003 Applications
22 Application Overview
- 7 scientific applications and 3 CS demonstrators
- All iVDGL experiments participated in the Grid2003 project
- A third HEP and two bio-chemical experiments also participated
- Over 100 users authorized to run on Grid3
- Application execution performed by dedicated individuals
- Typically 1, 2, or 3 users ran the applications from a particular experiment
- Participation from all Grid3 sites
- Sites categorized according to policies and resources
- Applications ran concurrently on most of the sites
- Large sites with generous local-use policies were more popular
23 Scientific Applications
- High energy physics simulation and analysis
- US CMS MOP: GEANT-based full MC simulation and reconstruction
- Workflow and batch job scripts generated by McRunJob
- Jobs generated at the MOP master (outside of Grid3), which submits to Grid3 sites via Condor-G
- Data products are archived at the Fermilab SRM/dCache
- US ATLAS GCE: GEANT-based full MC simulation and reconstruction
- Workflow is generated by the Chimera VDS, with the Pegasus grid scheduler and Globus MDS for resource discovery
- Data products archived at BNL; Magda and the Globus RLS are employed
- US ATLAS DIAL: distributed analysis application
- Dataset catalogs built, n-tuple analysis and histogramming (data generated on Grid3)
- BTeV: full MC simulation
- Also utilizes the Chimera workflow generator and Condor-G (VDT)
24 Scientific Applications, cont.
- Astrophysics and astronomy
- LIGO/LSC: blind search for continuous gravitational waves
- SDSS: maxBcg, cluster finding package
- Bio-chemical
- SnB: bio-molecular program, analyses of X-ray diffraction to find molecular structures
- GADU/Gnare: genome analysis, compares protein sequences
- Computer science
- Evaluation of adaptive data placement and scheduling algorithms
25 CS Demonstrator Applications
- Exerciser
- Periodically runs low-priority jobs at each site to test operational status
- NetLogger-grid2003
- Monitored data transfers between Grid3 sites via a NetLogger-instrumented pyglobus-url-copy
- GridFTP Demo
- Data mover application using GridFTP, designed to meet the 2 TB/day metric
26 Running on Grid3
- With information provided by the Grid3 information system, the user:
- Composes a list of target sites
- Resources available
- Local site policies
- Finds where to install the application and where to write data
- The MDS information system is used
- Provides pathnames for the APP, DATA, TMP, and WNTMP areas
- The user sends and remotely installs the application from a local site
- The user submits job(s) through Globus GRAM
- The user does not need to interact with local site administrators (see the sketch below)
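A minimal sketch of these per-site steps, driven from a submit host holding a valid grid proxy, is shown below. globus-url-copy and globus-job-run are the standard Globus 2.x clients shipped in the VDT; the gatekeeper contact, jobmanager name, and directory paths are placeholders that in practice come from the MDS attributes just described.

```python
# Sketch of running on a Grid3 site: stage the application kit, then
# submit through the gatekeeper.  Host names, jobmanager, and paths are
# hypothetical and would come from the Grid3 information system.
import subprocess

SITE = "gatekeeper.example.edu"           # hypothetical Grid3 site
JOBMANAGER = f"{SITE}/jobmanager-condor"  # site-dependent
APP_DIR = "/grid3/app/myexp"              # from MDS application dir
TMP_DIR = "/grid3/tmp/myexp"              # from MDS temporary dir

def stage_application(tarball="/home/user/myapp.tar.gz"):
    # Copy the application kit into the site's shared application area.
    subprocess.run(["globus-url-copy",
                    f"file://{tarball}",
                    f"gsiftp://{SITE}{APP_DIR}/myapp.tar.gz"], check=True)

def submit_job(executable=f"{APP_DIR}/run_myapp.sh"):
    # Fork a job through the gatekeeper; the local jobmanager queues it.
    subprocess.run(["globus-job-run", JOBMANAGER,
                    executable, TMP_DIR], check=True)

if __name__ == "__main__":
    stage_application()
    submit_job()
```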
27 US CMS use of Grid3
[Plot: history of US CMS usage over the past three months, in CPU-days]
28 ATLAS PreDC2 on Grid3 (Fall 2003)
- US ATLAS PreDC2 exercise
- Development of ATLAS tools for DC2
- Collaborative work on Grid2003 project
- Gain experience with the LCG grid
[Diagram: US ATLAS Testbed - shared, heterogeneous US ATLAS resources contributed to Grid2003]
29 PreDC2 approach
- Monte Carlo production on Grid3 sites
- Dynamic software installation of ATLAS releases on Grid3
- Integrated GCE client-server based on VDT tools (Chimera, Pegasus, control database, lightweight grid scheduler, Globus MDS)
- Measure job performance metrics using MonALISA and MDViewer (metrics data viewer)
- Collect and archive output to BNL
- MAGDA and Globus RLS used (distributed LRC/RLI at two sites)
- Reconstruct a fraction of the data at CERN and other LCG-1 sites
- To pursue Chimera/VDT interoperability issues with LCG-1
- Copy data back to BNL to exercise Tier1-CERN links
- Successful integration post SC2003!
- Analysis: distributed analysis of datasets using DIAL
- Dataset catalogs built, n-tuple analysis and histogramming
- Metrics collected and archived on all of the above
30 US ATLAS Datasets on Grid3
- Grid3 resources used
- 16 sites, 1500 CPUs exercised, peak of 400 jobs, over a three-week period
- Higgs → 4 lepton sample
- Simulation and reconstruction
- 2000 jobs (x 6 subjobs), 100-200 events per job (~200K events)
- 500 GB output data files
- Top sample
- Reproduce DC1 dataset simulation and reconstruction steps
- 1200 jobs (x 6 subjobs), 100 events per job (120K sample)
- 480 GB input data files
- Data used by PhD student at U. Geneva
[Plot: H → 4e production progress; 11/18/03: 50% of sample, 800 jobs]
31 US ATLAS during SC2003
[Plots: CPU usage totals and CPU usage by day, for ATLAS and CMS]
32 US ATLAS and LCG
- ATLSIM output staged on disk at BNL
- Use the GCE-Client host to submit to an LCG-1 server
- Jobs executing on LCG-1
- Input files registered at the BNL RLI
- Stage data from BNL to the local scratch area
- Run Athena reconstruction using release 6.5.0
- Write data to the local storage element
- Copy back to the disk cache at BNL, register in RLS
- Implementing 3rd-party transfer between LCG SEs and BNL GridFTP servers
- Post-job
- Magda registration of ESD (combined ntuple output), verification
- DIAL dataset catalog and analysis
- Many lessons learned about site architectural differences and service configurations
- Concrete steps towards combined use of LCG and US grids!
33 Grid2003 Metrics and Lessons
34 Metrics Summary
35 Grid3 Metrics Collection
- Grid3 monitoring system
- MonALISA
- Metrics Data Viewer
- Queries to the persistent storage DB
- MonALISA plots
- MDViewer plots
36 Grid2003 Metrics Results
- Hardware resources
- Total of 2762 CPUs
- Maximum CPU count
- Off-project contribution > 60%
- Total of 27 sites
- 27 administrative domains with local policies in effect
- Across the US and Korea
- Running jobs
- Peak number of jobs: 1100
- During SC2003, various applications were running simultaneously across various Grid3 sites
37 Data transfers around Grid3 sites
- Data transfer metric
- GridFTP demo
- Data transfer application
- Used concurrently with application runs
- Target met 11.12.03 (4.4 TB); see the rate check below
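The 2 TB/day metric corresponds to a fairly modest sustained rate, as the one-line check below shows (the arithmetic only, nothing here is project data).

```python
# Back-of-the-envelope check of the 2 TB/day transfer metric.
tb_per_day = 2
mb_per_s = tb_per_day * 1024 * 1024 / 86400   # ~24 MB/s sustained
print(f"{tb_per_day} TB/day ~= {mb_per_s:.1f} MB/s sustained")
```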
38 Global Lessons
- Grid2003 was an evolutionary change in environment
- The infrastructure was built on established testbed efforts from the participating experiments
- However, it was a revolutionary change in scale
- The available computing and storage resources increased
- A factor of 4-5 over individual VO environments
- The human interactions in the project increased
- More participants from more experiments and closer interactions with CS colleagues
- It is difficult to find many positive examples of successful common projects between experiments
- Grid2003 is an exemplar of how this can work
39 Lessons Categories
The Grid2003 lessons can be divided into two categories:
- Architectural lessons
- Service setup on the clusters
- Processing, storage, and information-providing issues
- Scale that can realistically be achieved
- Interoperability issues with other grid projects
- Identifying single-point failures
- Operational lessons
- Software configuration and software updates
- Troubleshooting
- Support and contacts
- Information exchange and coordination
- Service levels
- Grid operations
These lessons are guiding the next phase of development.
40 Identifying System Bottlenecks
- Grid2003 attempted to keep requirements on sites light
- Install software on interface nodes
- Do not install software on worker nodes
- Makes the system flexible and deployable on many architectures
- Maximizes participation
- Can place heavy loads on the gateway systems
- As these services become overloaded, they also become unreliable
- Need to improve the scalability of the architecture
- Difficult to make larger clusters
- Head nodes already need to be powerful systems to keep up with the requirements
[Diagram: a head node carrying the Grid2003 packages bridges the WAN to the cluster's LAN of worker nodes and storage]
41 Gateway Functionality
- Gateway systems are currently expected to:
- Handle the GRAM bridge for processing
- Involves querying running and queued jobs regularly
- Significant load (see the sketch below)
- Handle data transfers in and out for all jobs running on the cluster
- Allows worker nodes to not need WAN access
- Significant load
- Monitor and publish the status of cluster elements
- Audit and record cluster usage
- Small load
- Collect and publish information
- Site configuration and GLUE schema elements
- Small load
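Why the GRAM bridge is a significant load can be seen from a toy poller: with one status query per managed job per cycle, a gateway tracking hundreds of jobs fires hundreds of scheduler queries each minute. The snippet below is purely illustrative (Condor is used as the example scheduler; the polling loop is not how the real jobmanagers are written).

```python
# Toy illustration of per-job status polling on the gateway: each
# managed job triggers its own scheduler query every cycle.
import subprocess
import time

def poll_jobs(job_ids, interval=60):
    while True:
        for job in job_ids:
            # One scheduler query per job per cycle (illustrative only).
            subprocess.run(["condor_q", str(job)], capture_output=True)
        time.sleep(interval)

# At the SC2003 peak of ~1100 running jobs, a naive one-query-per-job
# poller would issue ~1100 scheduler queries per minute on the head node.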
42 Processing Requirements
- Grid2003 is a diverse set of sites
- We are attempting to provide applications with a richer description of resource requirements and to collect job execution requirements
- We have not deployed resource brokers and have limited use of job optimizers
- Even with manually submitted grid jobs, it is hard to know and then satisfy the site requirements
- We have periodically fallen back to e-mail information exchange
- It is equally hard for sites to know what the requirements and activities of incoming processes will be
- The Grid2003 lesson was that it is difficult to submit jobs with requirement exchange
- Need to improve and automate information exchange
43 Single Point Failures
- The Grid2003 team already knew that single-point failures are bad
- Fortunately, we also learned that in our environment they are rare
- The grid certificate authorities are, by design, a single point of information
- The only grid-wide meltdowns were a result of certificate authority problems
- Unfortunately, this happened more than once
- The Grid2003 information providers were not protected against loss
- Even during failures, grid functionality was not lost
- The information provided is currently fairly static
44 Operational Issue - Configuration
- One of the first operational lessons Grid2003 learned was the need for better tools to check and verify the configuration
- We had good packaging and deployment tools through Pacman, but we spent an enormous amount of time and effort diagnosing configuration problems
- Early on, Grid2003 implemented tools and scripts to check site status, but there were few tools that allowed a site admin to check site configuration
- Too much manual information exchange was required before a site was operational
- In the next phase of the project we expect to improve this (a sketch of such a check follows)
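A minimal sketch of the kind of local self-check a site admin could run is shown below: verify that the directories the site advertises through the information system actually exist and are writable. The path names are hypothetical examples, not Grid3 defaults, and a real verification tool would cover far more (certificates, gatekeeper, schema attributes).

```python
# Sketch of a local site self-check: confirm that advertised Grid3
# directories exist and are writable.  Paths are hypothetical.
import os

ADVERTISED = {
    "GRID3_APP_DIR":  "/grid3/app",
    "GRID3_DATA_DIR": "/grid3/data",
    "GRID3_TMP_DIR":  "/grid3/tmp",
}

def check_site():
    ok = True
    for name, path in ADVERTISED.items():
        if not os.path.isdir(path):
            print(f"FAIL {name}: {path} does not exist")
            ok = False
        elif not os.access(path, os.W_OK):
            print(f"FAIL {name}: {path} not writable")
            ok = False
        else:
            print(f"ok   {name}: {path}")
    return ok

if __name__ == "__main__":
    raise SystemExit(0 if check_site() else 1)
```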
45 Software Updates
- The software updates in Grid2003 varied from relatively easy to extremely painful
- Some software packages could self-update
- The monitoring framework MonALISA could upgrade its entire distribution
- This allowed developers to make improvements and satisfy changing requirements without involving site admins
- Could be set to update regularly with a cron job, or triggered manually
- Grid2003 is investigating whether this is a reasonable operating mode
- This functionality is becoming more common
46 Grid3 Evolution
47 Grid3 → Grid3+
- Endorsement in December for extension of the project
- US LHC and Trillium grid projects support adiabatic upgrades to Grid3 and continued operation for the next 6 months
- Begin as well a development activity focusing on grid/web services, but at a much reduced level
- Planning exercise in January-February to collect key issues from stakeholders
- Each VO: US ATLAS, US CMS, SDSS, LIGO, BTeV, iVDGL
- iGOC coordinating operations
- VDT as the main supplier of core middleware
- Two key recommendations:
- Development of a storage element interface and its introduction into Grid3
- Support for a common development test grid environment
48 Grid Deployment Goals for 2004
- Detailed instructions for operating grid middleware through firewalls and other port-limiting software
- Introduction of VOMS-extended X.509 proxies to allow callout authorization at the resource
- Secure web service for user registration for VO members
- Evaluate new methods for VO accounts at sites
- Move to the new Pacman version 3 for software distribution and installation
- Adoption of new VDT core software, which provides install-time service configuration
- Support and testing for other distributed file system architectures
- Install-time support for multi-homed systems
49 iGOC Infrastructure Goals for 2004
- Status of site monitoring to enable an effective job scheduling system: collection and archival of monitoring data
- Reporting of available storage resources at each site
- Status display of current and queued workload for each site
- Integrate the Grid3 schema into the GLUE schema
- Acceptable Use Policy in development
- Grid3 Operational Model in negotiation
50 New Development Grid Infrastructure
Grid3 Common Environment:
- Authentication Service
- Approved VOMS servers
- Monitoring Service
- catalog
- MonALISA
- ganglia
- ACDC
- Stable Grid3 software cache
grid3dev Environment:
- Authentication Service
- test VOMS server
- approved VOMS servers
- new VOMS server(s)
- Monitoring Service
- catalog (test version)
- MonALISA (test version)
- ganglia (test version)
- Development s/w caches
51 Grid3 upgrades
- Tests of the new Grid3 installation on the US ATLAS DTG (Development Test Grid) for the past two weeks
- These tests were based on VDT 1.1.13
- Decision to wait for VDT 1.1.14
- VDT 1.1.14 has many desirable features
- Globus upgrade to 2.4 has many patches
- MDS and the rest of the Globus software are now in sync
- Much improved MonALISA installation
- The upgrade can proceed adiabatically
- Submission to a mixed-VDT grid is okay
- Support for the storage element will come later, during DC2
52 US ATLAS DC2 and Grid3
53 Execution System for DC2
- The execution system for US ATLAS is based on the ATLAS supervisor/executor job model
- ATLAS jobs described in the ATLAS production database must be translated by the executor into the job description language and execution environment of the US grid system (a sketch of this translation follows)
- Based on the VDT and divided into two parts
- Client side (job submission)
- Supervisor (Windmill) client
- GCE-Client (Grid Component Environment software: VDT-Client, Chimera virtual data system, Pegasus DAG builder)
- Capone (ATLAS job execution framework) web service
- Server side (job execution): the Grid
- Grid3 middleware and ACE (ATLAS software kit releases, GCE-Server, DC2 transformations, Pacman readiness kit)
- And other services: information, monitoring, VO management
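A minimal sketch of the translation step is shown below: a job record from the production database (represented here as a plain dict, a simplification) is rendered into a Globus RSL string. The real Capone/GCE chain builds Chimera/Pegasus DAGs rather than bare RSL, and the field names below are illustrative only.

```python
# Sketch: translate a production-DB job record into a GRAM RSL string.
# The dict layout and field names are hypothetical; the real executor
# goes through Chimera/Pegasus rather than emitting RSL directly.
def job_to_rsl(job):
    args = " ".join(f'"{a}"' for a in job["arguments"])
    return ("&(executable={exe})"
            "(arguments={args})"
            "(count={count})"
            "(stdout={out})").format(exe=job["transformation"],
                                     args=args,
                                     count=job.get("count", 1),
                                     out=job["logfile"])

example_job = {                      # hypothetical production-DB record
    "transformation": "dc2.g4sim.sh",
    "arguments": ["--events", "100", "--seed", "12345"],
    "logfile": "dc2.g4sim.12345.log",
}

if __name__ == "__main__":
    print(job_to_rsl(example_job))
```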
54 Phase 1
[Diagram: the Windmill supervisor host (Oracle client, user certificate) connects to the ATLAS production database (PDB) and, through a Jabber switch/proxy (XMPP/XML) and SOAP, to the Capone web service on the GCE-Client submit host (user certificate); Capone uses the Globus RLS, the ATLAS production DB, and Don Quijote (DQ), and submits via GRAM to Grid3 (VDT) compute elements carrying ACE, GCE-Server, the site readiness kit, and ATLAS releases; generator input and simulation output storage elements sit at BNL or any grid-visible location; a Capone diagnostic client is also shown]
55 DC2 Environments (Saul Youssef)
- Use Pacman 3 to capture working environments (ACE, Windmill, Capone, Don Quijote) as things come together for DC2
- Monitor sites, update software, add sites
56 Windmill-Capone Communication
- Supervisor request (web service / Jabber)
- XML message
- Request message translated
- Processing
- CPE elaboration
- Grid interactions
- Don Quijote interactions
- Response
- XML response
- Response to supervisor
[Diagram: message flow from the Supervisor through the messaging layer (WS/Jabber) and translation into the Capone process engine (CPE) and process DB, out to the Grid (GCE) and DQ]
57 Capone Processing
- Jobs received from the supervisor undergo 13 processing steps (sketched below)
- Receive, Translate, DAXgen, RLSreg, Schedule, cDAGgen, Submit, Run, Check, stageOut, Clean, Finish, Kill
- Status codes
- Success/failure (each step)
- Completion/failure (job)
[State diagram: executeJob moves a received job through the steps above, with recovery paths around stageOut, a fixJob repair path, and an end state]
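A minimal sketch of this per-job walk is given below: each step reports success or failure, and the job ends in a completed or failed state. The step names come from the slide; the handler is a stub and the real Capone recovery and kill paths are not modeled.

```python
# Sketch of the Capone-style step sequence: per-step success/failure,
# overall completion/failure.  Handlers are stubs.
STEPS = ["receive", "translate", "DAXgen", "RLSreg", "schedule",
         "cDAGgen", "submit", "run", "check", "stageOut", "clean",
         "finish"]                      # "kill" is an external abort path

def run_step(job, step):
    # Placeholder for the real work done at each stage.
    print(f"job {job['id']}: {step}")
    return True                         # pretend every step succeeds

def execute_job(job):
    for step in STEPS:
        if not run_step(job, step):
            job["status"] = f"failed:{step}"
            return job
    job["status"] = "completed"
    return job

if __name__ == "__main__":
    print(execute_job({"id": 1}))
```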
58 DC2 Metrics, to be collected
- CPU used: average number of CPUs used during the day
- CPU provided: average number of CPUs provided (by ATLAS and by the Grid) during the day
- Response time: completion time minus submission time (see the sketch below)
- Jobs wanted, submitted, successfully completed, failed
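The response-time metric as defined above is just a timestamp difference; a minimal sketch with made-up timestamps:

```python
# Response time = completion time - submission time (definition above).
from datetime import datetime

def response_time(submitted, completed, fmt="%Y-%m-%d %H:%M:%S"):
    t0 = datetime.strptime(submitted, fmt)
    t1 = datetime.strptime(completed, fmt)
    return (t1 - t0).total_seconds()

# Example with made-up timestamps:
print(response_time("2004-05-01 10:00:00", "2004-05-01 16:30:00") / 3600,
      "hours")
```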
59 Conclusions and Outlook
- Grid2003 taught us many lessons about how to deploy and run a grid
- Grid3 will be a critical resource for continued data challenges, which are driving the next phase of development
- The challenge will be to maintain a vital R&D effort while doing sustained production operations
- ATLAS DC2 begins May 1
- It will provide a new class of lessons and directions for future development