Title: NorduGrid and Advanced Resource Connector
1. NorduGrid and Advanced Resource Connector
- Oxana Smirnova, Lund/CERN
- NorduGrid/LCG/ATLAS
- Reykjavik, November 17, 2004
2. Outlook
- NorduGrid background
- Challenges of Grid computing
- Advanced Resource Connector
3. Background
4. Nordic Testbed for Wide Area Computing and Data Handling (1/2)
- Ran in 2001-2002 as a part of the NORDUNet2 program, aimed to enable Grid middleware and applications in the Nordic countries
  - Middleware: EDG
  - Applications: ATLAS DC1, theory (Lund, NORDITA)
- Participants: academic groups from 4 Nordic countries
  - DK: Research Center COM, DIKU, NBI
  - FI: HIP
  - NO: U. of Bergen, U. of Oslo
  - SE: KTH, Stockholm U., Lund U., Uppsala U. (ATLAS groups)
- Funded resources
  - 3 FTEs
  - 4 test Linux clusters, 4-6 CPUs each
  - Variety of GNU/Linux OS: RedHat, Mandrake, Slackware
- Other resources
  - 2-3 x 0.5 FTEs
  - Rented CPU cycles
5. Nordic Testbed for Wide Area Computing and Data Handling (2/2)
- Strong links with EDG
  - WP6: active work with the ITeam, Nordic CA
  - WP8: active work with ATLAS DC1
  - WP2: contribution to GDMP
  - Attempts to contribute to RC, Infosystem
- Had to diverge from EDG in 2002
  - January 2002: became increasingly aware that EDG would not deliver production-level middleware
  - February 2002: developed own lightweight Grid architecture
  - March 2002: prototypes of the core services in place
  - April 2002: first live demos ran
  - May 2002: entered a continuous production mode
- This is what became known as the NorduGrid (http://www.nordugrid.org)
6. NorduGrid
- Since end-2002, a research collaboration between Nordic academic institutes
  - Open to anybody, non-binding
- Contributed up to 15% to ATLAS DC1 (2002-2003), using local institute clusters and rented HPC resources
- Since end-2003, focuses only on middleware support and development
  - The middleware was baptized Advanced Resource Connector (ARC) at end-2003
  - 6 core developers, many contributing student projects
- Provides middleware to research groups (ATLAS, theory) and national Grid projects
- ARC is installed on 40 sites (5000 CPUs) in 10 countries
7. ARC Grid
- A Grid based on the ARC middleware
- Driven (so far) mostly by the needs of the LHC experiments
- One of the world's largest production-level Grids
- Close cooperation with other Grid projects
  - EU DataGrid (2001-2003)
  - SWEGRID, DCGC
  - NDGF
  - LCG
  - EGEE
- Assistance in Grid deployment outside the Nordic area
8. Challenges
9. Grid computing: the challenges
- Network connectivity is NOT a problem (normally)
  - Bandwidth is yet to be saturated
  - Storage/data management servers are the bottlenecks
- Computing and storage resources
  - Different ownership
  - Often incompatible purposes, practical and political
  - Often incompatible allocation and usage policies
  - Often competition/distrust within a single country, let alone between different ones
  - Different technical characteristics
    - Whole spectrum of operating systems (mostly GNU/Linux though)
    - Whole range of hardware (CPUs from Pentium II to Opteron, RAM from 128 MB to 2 GB, disk space from 1 GB to 2 TB, network connectivity from 10 Mbps to Gbps, etc.)
    - Big variety of cluster configurations (PBS in many flavours, SGE, Condor, standalone workstations)
10. Grid challenges, continued
- Users and applications
  - Different user backgrounds
    - Ranging from novice users to sysadmins
    - Everybody has a preferred OS (many prefer MS Windows)
    - Most are reluctant to learn new ways
  - Very different applications
    - Whole spectrum from data-intensive to CPU-intensive tasks
    - Very different requirements on CPU, memory, disk and network consumption
    - Each application needs a certain runtime environment, which is sometimes an obscure application-specific piece of software and sometimes licensed s/w
  - Users and resources are not in the same administrative domain
11. Middleware R&D versus production facility deployment and support
- Technical solutions for distributed computing and data management are plentiful; political and sociological obstacles, however, are even more so
- NorduGrid focuses on providing technical solutions for Grid computing, trying to leave testbed management and politics to others
  - In reality, developers inevitably get involved in management to some degree
  - Political considerations are ever nagging
12. Advanced Resource Connector
13. Philosophy
- The system must be
  - Light-weight
  - Portable
  - Non-intrusive
    - Resource owners retain full control: the Grid Manager is effectively yet another user (with many faces though)
    - No requirements w.r.t. OS, resource configuration, etc.
    - Clusters need not be dedicated
    - Runs independently of other existing Grid installations
  - Client part must be easily installable by a novice user
- Strategy: start with something simple that works for users and add functionality gradually
14. Architecture
- Oriented towards serial batch jobs
  - Parallel jobs are perfectly possible, but only within a cluster; ARC is, however, not optimized for this (yet)
- Dynamic, heterogeneous set of resources
  - Computing: Linux clusters (pools) or workstations
    - Addition of non-Linux resources is possible via Linux front-ends
  - Storage: disk storage (no tape storage offered so far)
- Each resource has a front-end
  - Custom GridFTP server for all the communications
  - Local information database: LDAP DB at the Grid front-end (the so-called GRIS)
- Each user can have a lightweight brokering client
- Grid topology is achieved by a hierarchical, multi-rooted set of indexing services (customized Globus MDS structure); a query sketch follows below
  - LDAP DB at the Grid front-end (the so-called GIIS)
  - These serve as dynamic lists of GRISes (via down-up registrations)
  - Several levels (project → country → international)
- Matchmaking is performed by every client independently
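As a rough sketch of how a client can discover resources in this structure with a plain LDAP query (the host name is hypothetical; port 2135 and the mds-vo-name bases follow the usual Globus MDS conventions used by NorduGrid):

  # Ask an index service (GIIS) which resources have registered to it
  ldapsearch -x -h index.example.org -p 2135 \
    -b 'mds-vo-name=NorduGrid,o=grid' -s base giisregistrationstatus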
15. ARC components
Goal: no single point of failure
16. Implementation
- Based on the Globus Toolkit 2 API and libraries
  - A very limited subset is actually used: mostly GSI and parts of MDS
  - Newly developed components follow the Web services framework
  - Can be built upon GT3 libraries, but does not use its services
- Stable by design
- The heart(s): Grid Manager(s)
  - Front-end service that accepts job requests and formulates jobs for the LRMS/fork
  - Performs most data movement (stage-in and stage-out) and cache management; interacts with replica catalogs
  - Manages the user work area
- The nervous system: the Information System (MDS)
  - Provides the pseudo-mesh architecture, similar to file-sharing networks
  - Information is never older than 30 seconds
- The brain(s): User Interface(s)
  - Queries the InfoSys, selects the best resource, submits jobs
  - Provides all the necessary job and data manipulation/monitoring tools (a typical session is sketched below)
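To illustrate how these components interact from the user's side, a typical session with the client tools could look as follows (the job ID format and host name are indicative only):

  ngsub '&(executable="/bin/echo")(arguments="hello")(stdout="out.txt")(cputime=5)'
                                                          # UI queries the InfoSys, picks a resource, hands the job to its Grid Manager
  ngstat gsiftp://cluster.example.org:2811/jobs/12345     # job status, as published in the Information System
  ngcat  gsiftp://cluster.example.org:2811/jobs/12345     # stdout of the running job, served by the front-end
  ngget  gsiftp://cluster.example.org:2811/jobs/12345     # retrieve the results once the job has finished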
17. Information System
- Uses Globus MDS 2.2
  - Soft-state registration allows creation of any dynamic structure
  - Multi-rooted tree
  - GIIS caching is not used by the clients
  - Several patches and bug fixes are applied
- A new schema was developed to serve clusters (an example query is shown below)
  - Clusters are expected to be fairly homogeneous
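For illustration, a few attributes of this schema can be read directly from a front-end's local information database (hypothetical host; attribute names as in the NorduGrid schema):

  ldapsearch -x -h cluster.example.org -p 2135 -b 'mds-vo-name=local,o=grid' \
    '(objectclass=nordugrid-cluster)' nordugrid-cluster-name nordugrid-cluster-totalcpus nordugrid-cluster-usedcpus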
18. Front-end and the Grid Manager
- The Grid Manager replaces Globus GRAM, still using Globus Toolkit 2 libraries
- All transfers are made via GridFTP
- Possibility to pre- and post-stage files, optionally using information from data indexing systems (RC, RLS)
- Caching of pre-staged files is enabled
- Application-specific runtime environment support (a sketch is given below)
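A runtime environment is, in essence, a shell script installed on the front-end under an agreed name and sourced for jobs that request it. The sketch below is only illustrative (paths and variable names are hypothetical), matching the runTimeEnvironment="ATLAS-6.0.2" request used in the job description two slides further on:

  #!/bin/sh
  # Hypothetical runtime environment script published as "ATLAS-6.0.2".
  # Sourced for jobs that request (runTimeEnvironment="ATLAS-6.0.2"),
  # so that the application software ends up on the job's PATH.
  export ATLAS_ROOT=/opt/atlas/6.0.2
  export PATH=$ATLAS_ROOT/bin:$PATH
  export LD_LIBRARY_PATH=$ATLAS_ROOT/lib:$LD_LIBRARY_PATH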
19. The User Interface
- Provides a set of utilities to be invoked from the command line (listed below; an example submission follows the list)
- Contains a broker that polls MDS and decides to which queue at which cluster a job should be submitted
  - The user must be authorized to use the cluster and the queue
  - The cluster's and queue's characteristics must match the requirements specified in the xRSL string (max CPU time, required free disk space, installed software, etc.)
  - If the job requires a file that is registered in a data indexing service, the brokering gives priority to clusters where a copy of the file is already present
  - From all queues that fulfil the criteria, one is chosen randomly, with a weight proportional to the number of free CPUs available for the user in each queue
  - If there are no available CPUs in any of the queues, the job is submitted to the queue with the lowest number of queued jobs per processor
ngsub to submit a task
ngstat to obtain the status of jobs and clusters
ngcat to display the stdout or stderr of a running job
ngget to retrieve the result from a finished job
ngkill to cancel a job request
ngclean to delete a job from a remote cluster
ngrenew to renew the user's proxy
ngsync to synchronize the local job info with the MDS
ngls to list storage element contents
ngcopy to transfer files to, from and between clusters
ngrequest to transfer files asynchronously (requires SSE)
ngremove to remove files
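For example, a small job carrying exactly the kind of requirements the broker matches against (CPU time, disk space, installed software) can be submitted straight from the command line; all values are illustrative:

  ngsub '&(executable="run.sh")(inputfiles ("run.sh" ""))
         (cputime=120)(disk=500)
         (runTimeEnvironment="ATLAS-6.0.2")
         (stdout="out.txt")(stderr="err.txt")'

The broker will only consider queues that advertise at least 120 minutes of CPU time, 500 MB of disk and the ATLAS-6.0.2 runtime environment, and then chooses among them as described above.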
20. Job Description: extended Globus RSL (xRSL)
(&(executable="recon.gen.v5.NG")
  (arguments="dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"
             "dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.ntuple"
             "eg7.602.job" "999")
  (stdout="dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.log")
  (stdlog="gridlog.txt")(join="yes")
  (|
   (&(|(cluster="farm.hep.lu.se")(cluster="lscf.nbi.dk")
       (cluster="seth.hpc2n.umu.se")(cluster="login-3.monolith.nsc.liu.se"))
     (inputfiles
       ("dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"
        "rc://grid.uio.no/lc=dc1.lumi02.002000,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra")
       ("recon.gen.v5.NG" "http://www.nordugrid.org/applications/dc1/recon/recon.gen.v5.NG.db")
       ("eg7.602.job" "http://www.nordugrid.org/applications/dc1/recon/eg7.602.job.db")
       ("noisedb.tgz" "http://www.nordugrid.org/applications/dc1/recon/noisedb.tgz")))
   (inputfiles
     ("dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"
      "rc://grid.uio.no/lc=dc1.lumi02.002000,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra")
     ("recon.gen.v5.NG" "http://www.nordugrid.org/applications/dc1/recon/recon.gen.v5.NG")
     ("eg7.602.job" "http://www.nordugrid.org/applications/dc1/recon/eg7.602.job")))
  (outputFiles
    ("dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.log"
     "rc://grid.uio.no/lc=dc1.lumi02.recon.002000,rc=NorduGrid,dc=nordugrid,dc=org/log/dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.log")
    ("histo.hbook"
     "rc://grid.uio.no/lc=dc1.lumi02.recon.002000,rc=NorduGrid,dc=nordugrid,dc=org/histo/dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.histo")
    ("dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.ntuple"
     "rc://grid.uio.no/lc=dc1.lumi02.recon.002000,rc=NorduGrid,dc=nordugrid,dc=org/ntuple/dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.ntuple"))
  (jobname="dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602")
  (runTimeEnvironment="ATLAS-6.0.2")
  (CpuTime=1440)(Disk=3000)(ftpThreads=10))
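In practice such a description is kept in a file and handed to ngsub; assuming the classic client's option for reading the description from a file, the submission would look roughly like

  ngsub -f dc1.recon.xrsl

where dc1.recon.xrsl is a hypothetical file name holding the xRSL above.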
21. More components
- Storage Elements (SE); see the data-handling example below
  - Regular SE: a GridFTP-enabled disk server
  - Smart SE: a very new addition
    - Provides reliable file transfer
    - Communicates with various data indexing services
    - Asynchronous data manipulation
- Monitor: a PHP4 client for the InfoSys (localized so far in 3 languages)
- VO lists: anything from an HTTP-served text file to an LDAP database to VOMS; ca. 20 VOs in total (over 800 potential users)
- Logging service: a job provenance database, filled by the GM
- Data indexing services: Globus products
  - Replica Catalog: scalability and stability problems, many practical limitations; not supported by Globus
  - Replica Location Service: a history of stability problems, no support for data collections, very coarse-grained access and authorization
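To give an idea of how such storage is used from the client side, the data-handling utilities listed earlier address a GridFTP-based SE directly (host and paths are hypothetical):

  ngls gsiftp://se.example.org/dc1/                        # list the contents of a storage element
  ngcopy file:///home/user/histo.hbook gsiftp://se.example.org/dc1/histo.hbook
                                                           # upload a local file to the SE
  ngremove gsiftp://se.example.org/dc1/histo.hbook         # remove the file again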
22. Performance
- 2002-2003: the only Grid running massive production (more than 2000 successful jobs, ca. 4 TB of data processed)
  - Physics (ATLAS) tasks
- 2003: Sweden starts allocating CPU slots for users on SweGrid, running ARC
  - All kinds of research tasks
- 2004:
  - ARC-connected Grid resources are used by the ATLAS production system on an equal footing with LCG/EGEE (EU) and Grid3 (USA)
  - Many Nordic Grid projects use ARC as the basic Grid middleware
23. ARC middleware status
- Current stable release: 0.4.4
  - GPL license
  - Available for 12 Linux flavors
  - Builds on top of the NorduGrid-patched Globus Toolkit 2
  - EDG VOMS integrated (voms-1.1.39-5ng)
  - Globus RLS support included
- Current development series: 0.5.x
  - Contains the Smart Storage Element and other newly introduced features
- Anybody is free to use it; best-effort support is guaranteed
  - Support: nordugrid-support@nordugrid.org
  - Download at http://ftp.nordugrid.org and cvs.nordugrid.org
  - Bug reports: http://bugzilla.nordugrid.org
- Everybody is welcome to contribute
  - Join nordugrid-discuss@nordugrid.org (a very busy list!)
  - Write access to CVS will be given upon consultation with the rest of the developers
24. Conclusion
- NorduGrid's ARC is a reliable and robust Grid middleware, supporting distributed production facilities for more than 2 years already
- The middleware is under development; everybody is welcome to use it and to contribute
- Using ARC does not give automatic access to any resource: please negotiate with the resource owners (create Virtual Organizations)
- Deploying ARC does not open the doors to all users: only resource owners decide whom to authorize
- ARC developers are getting deeply involved in global Grid standardization and interoperability efforts