Transcript and Presenter's Notes

Title: NorduGrid and Advanced Resource Connector


1
NorduGrid and Advanced Resource Connector
  • Oxana Smirnova, Lund/CERN
  • NorduGrid/LCG/ATLAS
  • Reykjavik, November 17, 2004

2
Outline
  • NorduGrid background
  • Challenges of Grid computing
  • Advanced Resource Connector

3
Background
4
Nordic Testbed for Wide Area Computing and Data
Handling (1/2)
  • Ran in 2001-2002 as a part of the NORDUNet2
    program, aimed at enabling Grid middleware and
    applications in the Nordic countries
  • Middleware: EDG
  • Applications: ATLAS DC1, theory (Lund, NORDITA)
  • Participants: academic groups from 4 Nordic
    countries
  • DK: Research Center COM, DIKU, NBI
  • FI: HIP
  • NO: U. of Bergen, U. of Oslo
  • SE: KTH, Stockholm U., Lund U., Uppsala U. (ATLAS
    groups)
  • Funded resources:
  • 3 FTEs
  • 4 test Linux clusters
  • 4-6 CPUs each
  • Variety of GNU/Linux OSes: RedHat, Mandrake,
    Slackware
  • Other resources:
  • 2-3 × 0.5 FTE
  • Rented CPU cycles

5
Nordic Testbed for Wide Area Computing and Data
Handling (2/2)
  • Strong links with EDG
  • WP6: active work with the ITeam, Nordic CA
  • WP8: active work with ATLAS DC1
  • WP2: contribution to GDMP
  • Attempts to contribute to RC, Infosystem
  • Had to diverge from EDG in 2002
  • January 2002: became increasingly aware that EDG
    would not deliver production-level middleware
  • February 2002: developed our own lightweight Grid
    architecture
  • March 2002: prototypes of the core services in
    place
  • April 2002: first live demos ran
  • May 2002: entered continuous production mode
  • This is what became known as NorduGrid
    (http://www.nordugrid.org)

6
NorduGrid
  • Since end-2002, a research collaboration between
    Nordic academic institutes
  • Open to anybody, non-binding
  • Contributed up to 15% of the ATLAS DC1
    (2002-2003) using local institute clusters and
    rented HPC resources
  • Since end-2003, focuses only on middleware
    support and development
  • The middleware was baptized Advanced Resource
    Connector (ARC) at the end of 2003
  • 6 core developers, many contributing student
    projects
  • Provides middleware to research groups (ATLAS,
    theory) and national Grid projects
  • ARC is installed on 40 sites (5000 CPUs) in 10
    countries

7
ARC Grid
  • A Grid based on ARC middleware
  • Driven (so far) mostly by the needs of the LHC
    experiments
  • One of the world's largest production-level Grids
  • Close cooperation with other Grid projects
  • EU DataGrid (2001-2003)
  • SWEGRID, DCGC
  • NDGF
  • LCG
  • EGEE
  • Assistance in Grid deployment outside the Nordic
    area

8
Challenges
9
Grid computing: the challenges
  • Network connectivity is NOT a problem (normally)
  • Bandwidth is yet to be saturated
  • Storage/data management servers are the
    bottlenecks
  • Computing and storage resources
  • Different ownership
  • Often incompatible purposes, practical and
    political
  • Often incompatible allocation and usage policies
  • Often competition/distrust within a single
    country, let alone different ones
  • Different technical characteristics
  • Whole spectrum of operating systems (mostly
    GNU/Linux though)
  • Whole range of hardware (CPUs from Pentium II to
    Opteron, RAM from 128MB to 2GB, disk space from
    1GB to 2TB, network connectivity from 10Mbps to
    Gbps etc)
  • Big variety of cluster configurations (PBS in
    many flavours, SGE, Condor, standalone
    workstations)

10
Grid challenges continued
  • Users and applications
  • Different user backgrounds
  • Ranging from a novice user to a sysadmin
  • Everybody has a preferred OS (many prefer MS
    Windows)
  • Most are reluctant to learn new ways
  • Very different applications
  • Whole spectrum from data-intensive to
    CPU-intensive tasks
  • Very different requirements on CPU, memory, disk
    and network consumption
  • Each application needs a certain runtime
    environment, which is sometimes an obscure
    application-specific piece of software and
    sometimes licensed software
  • Users and resources are not in the same
    administrative domain

11
Middleware R&D versus production facility
deployment and support
  • Technical solutions for distributed computing and
    data management are plentiful; however, political
    and sociological obstacles are even more so
  • NorduGrid focuses on providing technical
    solutions for Grid computing, trying to leave
    testbed management and politics to others
  • In reality, developers inevitably get involved
    in management to some degree
  • Political considerations are ever nagging

12
Advanced Resource Connector
13
Philosophy
  • The system must be
  • Light-weight
  • Portable
  • Non-intrusive
  • Resource owners retain full control: the Grid
    Manager is effectively yet another user (with
    many faces, though)
  • No requirements w.r.t. OS, resource
    configuration, etc.
  • Clusters need not be dedicated
  • Runs independently of other existing Grid
    installations
  • Client part must be easily installable by a
    novice user
  • Strategy: start with something simple that works
    for users, and add functionality gradually

14
Architecture
  • Oriented towards serial batch jobs
  • Parallel jobs are perfectly possible, but only
    within a cluster; ARC is, however, not optimized
    for this (yet)
  • Dynamic, heterogeneous set of resources
  • Computing: Linux clusters (pools) or workstations
  • Addition of non-Linux resources is possible via
    Linux front-ends
  • Storage: disk storage (no tape storage offered
    so far)
  • Each resource has a front-end
  • Custom GridFTP server for all the communications
  • Local information database: an LDAP database at
    the Grid front-end (the so-called GRIS)
  • Each user can have a lightweight brokering client
  • Grid topology is achieved by a hierarchical,
    multi-rooted set of indexing services (customized
    Globus MDS structure)
  • LDAP databases, as at the Grid front-end (the
    so-called GIIS)
  • Serve as a dynamic list of GRISes (via bottom-up
    registrations)
  • Several levels (project → country → international)
  • Matchmaking is performed by every client
    independently

15
ARC components
Goal: no single point of failure
16
Implementation
  • Based on Globus Toolkit 2 API and libraries
  • A very limited subset is actually used: mostly GSI
    and parts of MDS
  • Newly developed components follow the Web services
    framework
  • Can be built upon GT3 libraries, but does not use
    its services
  • Stable by design
  • The heart(s): Grid Manager(s)
  • The front-end accepts job requests and formulates
    jobs for the LRMS/fork
  • Performs most data movement (stage-in and
    stage-out) and cache management; interacts with
    replica catalogs
  • Manages the user work area
  • The nervous system: the Information System (MDS)
  • Provides the pseudo-mesh architecture, similar to
    file-sharing networks
  • Information is never older than 30 seconds
  • The brain(s): User Interface(s)
  • Query the InfoSys, select the best resource,
    submit jobs
  • All the necessary job and data
    manipulation/monitoring tools

17
Information System
  • Uses Globus MDS 2.2
  • Soft-state registration allows creation of any
    dynamic structure
  • Multi-rooted tree
  • GIIS caching is not used by the clients
  • Several patches and bug fixes are applied
  • A new schema is developed, to serve clusters
  • Clusters are expected to be fairly homogeneous
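Since the GRIS is a plain LDAP server, it can be inspected with any LDAP
client. Below is a minimal sketch using OpenLDAP's ldapsearch, assuming
the standard MDS port 2135, the usual mds-vo-name=local,o=grid base and
the nordugrid-cluster object class of the NorduGrid schema; the host name
is a placeholder.

ldapsearch -x -h grid.example.org -p 2135 \
    -b 'mds-vo-name=local,o=grid' \
    '(objectClass=nordugrid-cluster)' \
    nordugrid-cluster-name nordugrid-cluster-totalcpus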

18
Front-end and the Grid Manager
  • Grid Manager replaces Globus GRAM, still using
    Globus Toolkit™ 2 libraries
  • All transfers are made via GridFTP
  • Possibility to pre- and post-stage files,
    optionally using information from data indexing
    systems (RC, RLS)
  • Caching of pre-staged files is enabled
  • Application-specific runtime environment support
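From the user's point of view, these Grid Manager features are driven by
attributes in the job description. The fragment below is an illustrative
sketch (the local file name is a placeholder; the rc:// location and the
runtime environment name follow the full example two slides ahead):

(inputfiles=("data.zebra"
  "rc://grid.uio.no/lc=dc1.lumi02.002000,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"))
(runTimeEnvironment="ATLAS-6.0.2")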

19
The User Interface
  • Provides a set of utilities to be invoked from
    the command line
  • Contains a broker that polls MDS and decides to
    which queue at which cluster a job should be
    submitted
  • The user must be authorized to use the cluster
    and the queue
  • The cluster's and queue's characteristics must
    match the requirements specified in the xRSL
    string (max CPU time, required free disk space,
    installed software, etc.)
  • If the job requires a file that is registered in
    a data indexing service, the brokering gives
    priority to clusters where a copy of the file is
    already present
  • From all queues that fulfil the criteria, one is
    chosen randomly, with a weight proportional to
    the number of free CPUs available to the user in
    each queue
  • If there are no available CPUs in any of the
    queues, the job is submitted to the queue with
    the lowest number of queued jobs per processor

ngsub to submit a task
ngstat to obtain the status of jobs and clusters
ngcat to display the stdout or stderr of a running job
ngget to retrieve the result from a finished job
ngkill to cancel a job request
ngclean to delete a job from a remote cluster
ngrenew to renew the user's proxy
ngsync to synchronize the local job info with the MDS
ngls to list storage element contents
ngcopy to transfer files to, from and between clusters
ngrequest to transfer files asynchronously (requires SSE)
ngremove to remove files
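A typical command-line session might look like the sketch below
(hello.xrsl and the job ID are placeholders; the -f option is assumed to
read the xRSL description from a file):

ngsub -f hello.xrsl     # the broker picks a cluster and queue, prints a job ID
ngstat <jobid>          # follow the job status
ngcat <jobid>           # display the stdout of the running job
ngget <jobid>           # retrieve the results once the job has finished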
20
Job Description: extended Globus RSL (xRSL)
(&(executable="recon.gen.v5.NG")
  (arguments="dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"
             "dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.ntuple"
             "eg7.602.job" "999")
  (stdout="dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.log")
  (stdlog="gridlog.txt")(join="yes")
  (|(&(|(cluster="farm.hep.lu.se")(cluster="lscf.nbi.dk")
        (cluster="seth.hpc2n.umu.se")(cluster="login-3.monolith.nsc.liu.se"))
      (inputfiles=
        ("dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"
         "rc://grid.uio.no/lc=dc1.lumi02.002000,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra")
        ("recon.gen.v5.NG" "http://www.nordugrid.org/applications/dc1/recon/recon.gen.v5.NG.db")
        ("eg7.602.job" "http://www.nordugrid.org/applications/dc1/recon/eg7.602.job.db")
        ("noisedb.tgz" "http://www.nordugrid.org/applications/dc1/recon/noisedb.tgz")))
    (inputfiles=
      ("dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra"
       "rc://grid.uio.no/lc=dc1.lumi02.002000,rc=NorduGrid,dc=nordugrid,dc=org/zebra/dc1.002000.lumi02.01101.hlt.pythia_jet_17.zebra")
      ("recon.gen.v5.NG" "http://www.nordugrid.org/applications/dc1/recon/recon.gen.v5.NG")
      ("eg7.602.job" "http://www.nordugrid.org/applications/dc1/recon/eg7.602.job")))
  (outputFiles=
    ("dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.log"
     "rc://grid.uio.no/lc=dc1.lumi02.recon.002000,rc=NorduGrid,dc=nordugrid,dc=org/log/dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.log")
    ("histo.hbook"
     "rc://grid.uio.no/lc=dc1.lumi02.recon.002000,rc=NorduGrid,dc=nordugrid,dc=org/histo/dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.histo")
    ("dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.ntuple"
     "rc://grid.uio.no/lc=dc1.lumi02.recon.002000,rc=NorduGrid,dc=nordugrid,dc=org/ntuple/dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602.ntuple"))
  (jobname="dc1.002000.lumi02.recon.007.01101.hlt.pythia_jet_17.eg7.602")
  (runTimeEnvironment="ATLAS-6.0.2")
  (CpuTime=1440)(Disk=3000)(ftpThreads=10))
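A minimal job description has the same structure. The sketch below is
illustrative only: the file names, the storage element URL and the
resource limits are placeholders.

&(executable="run.sh")
 (arguments="input.dat")
 (inputFiles=("input.dat" "gsiftp://se.example.org/data/input.dat"))
 (outputFiles=("result.dat" ""))
 (stdout="out.txt")(join="yes")
 (cpuTime=60)(disk=100)
 (jobName="hello-arc")

Leaving the output URL empty, as for result.dat above, asks the Grid
Manager to keep the file in the job's session directory so that it can
later be retrieved with ngget.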

21
More components
  • Storage Elements (SE)
  • Regular SE: a GridFTP-enabled disk server (see the
    data-movement sketch after this list)
  • Smart SE: a very new addition
  • Provides reliable file transfer
  • Communicates with various data indexing services
  • Asynchronous data manipulation
  • Monitor: a PHP4 client for the InfoSys (localized
    so far in 3 languages)
  • VO lists: anything from an HTTP-served text file
    to an LDAP database, to VOMS; ca. 20 VOs in total
    (over 800 potential users)
  • Logging service: a job provenance database, filled
    by the GM
  • Data indexing services: Globus products
  • Replica Catalog: scalability and stability
    problems, many practical limitations; not
    supported by Globus
  • Replica Location Service: a history of stability
    problems, no support for data collections, very
    coarse-grained access and authorization
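The ngls and ngcopy utilities listed earlier operate directly on such
storage elements. A short sketch, assuming gsiftp:// and file:// URLs are
accepted as source and destination (host and paths are placeholders):

ngls gsiftp://se.example.org/data/
ngcopy gsiftp://se.example.org/data/input.dat file:///tmp/input.dat
ngcopy file:///tmp/result.dat gsiftp://se.example.org/data/result.dat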

22
Performance
  • 2002-2003: the only Grid running massive
    production (more than 2000 successful jobs, ca. 4
    TB of data processed)
  • Physics (ATLAS) tasks
  • 2003: Sweden starts allocating CPU slots for
    users on SweGrid, running ARC
  • All kinds of research tasks
  • 2004:
  • ARC-connected Grid resources are used by the ATLAS
    production system on equal footing with LCG/EGEE
    (EU) and Grid3 (USA)
  • Many Nordic Grid projects use ARC as the basic
    Grid middleware

23
ARC middleware status
  • Current stable release: 0.4.4
  • GPL license
  • Available in 12 Linux flavors
  • Builds on top of the NorduGrid-patched Globus
    Toolkit 2
  • EDG VOMS integrated (voms-1.1.39-5ng)
  • Globus RLS support included
  • Current development series: 0.5.x
  • Contains the Smart Storage Element and other
    newly introduced features
  • Anybody is free to use it; best-effort support is
    guaranteed
  • Support: nordugrid-support@nordugrid.org
  • Download at http://ftp.nordugrid.org and
    cvs.nordugrid.org
  • Bug reports: http://bugzilla.nordugrid.org
  • Everybody is welcome to contribute
  • Join nordugrid-discuss@nordugrid.org (a very busy
    list!)
  • Write access to CVS will be given upon
    consultation with the rest of the developers

24
Conclusion
  • NorduGrid's ARC is reliable and robust Grid
    middleware, already supporting distributed
    production facilities for more than 2 years
  • The middleware is under development; everybody is
    welcome to use it and to contribute
  • Using ARC does not give automatic access to any
    resource: please negotiate with the resource
    owners (create Virtual Organizations)
  • Deploying ARC does not open the doors to all
    users: only resource owners decide whom to
    authorize
  • ARC developers are getting deeply involved in
    global Grid standardization and interoperability
    efforts