gLite Lecture 1

1
gLite Lecture 1
  • Peter Kunszt
  • EGEE Middleware Activity (JRA1), Data Management
    Cluster Leader

2
Outline
  • gLite: why? What is it? An overview and
    motivation.
  • General differences to the currently deployed
    middleware
  • Overall security concepts
  • Workload Management
  • This is a differential presentation that tries
    not to repeat what has been said earlier. The focus
    is on the differences to LCG, as presented in the
    lectures on Monday and Tuesday.

3
Chapter: Overview
  • OVERVIEW
  • Quick EGEE / JRA1 intro
  • Motivation of gLite
  • Overview of Concepts and Services

4
EGEE
  • EU 6th Framework project
  • See http://eu-egee.org for details on size,
    number of partners, etc.
  • Focus on Grid deployment
  • > 120 sites currently participating
  • Project is organized into 11 Activities
  • Service Activities 1, 2
  • SA1: Deployment (48% of EGEE)
  • SA2: Network provisioning
  • Joint Research Activities 1-4
  • JRA1: Middleware development
  • JRA2: Quality Assurance
  • JRA3: Security
  • JRA4: Networking
  • Networking Activities 1-5: management,
    administration, dissemination, tutorials, project
    relations, website, ...

5
JRA1 management
[Organisation chart: F. Hemmer (deputy: E. Laure); M. Barroso; E. Laure; LCG SPI; A. Di Meglio; F. Prelz; J. Hahkala; P. Kunszt; S. Fisher]
6
Why another Middleware?
  • EU DataGrid, Globus, AliEn, NorduGrid and others
    have successfully provided a first working stack
    of Grid middleware.
  • EDG stack in use by LCG / EGEE SA1
  • Improvements provided by SA1 in the course of
    the last year
  • However, some issues cannot be fixed by simple
    incremental improvements
  • Scalability issues: 100s of sites, billions of
    files
  • Functionality issues: interactive jobs,
    checkpointing, filesystem-like view of data, DAG
    jobs, managed transfer and replication
  • EGEE JRA1: Middleware re-engineering
  • Build on existing experience and middleware:
    initial decision to take AliEn, not EDG, as
    starting point (LCG-ARDA); this was changed
    later
  • Hardening and quality
  • Baseline services focus: expect
    applications/others to build some of the
    high-level services

7
EGEE Middleware: gLite
  • Aim: Improve on LCG
  • Name: permanent name in a world of projects
  • EDG called its services edg, which could not
    be taken up by EGEE for obvious reasons
  • gLite is just a cool name; the middleware
    aspires to be LIGHTWEIGHT in USAGE, but it is not
    a slim, lightweight middleware (yet).
  • Re-engineering: Exploit experience and existing
    components from AliEn, VDT (Condor-G, Globus),
    EDG/LCG, NorduGrid and others
  • Develop a stack of generic middleware useful to
    EGEE applications (HEP and Biomedical)
  • Should eventually deploy dynamically (e.g. as a
    Globus job)
  • Pluggable components: cater for different
    implementations
  • Ease of use
  • Standards: Build on available standards

8
Guiding Principles
  • Service Oriented Architecture
  • Interoperability
  • Portability
  • Modularity
  • Scalability
  • Web Services
  • Building on existing components in a lightweight
    manner: AliEn, LCG, Condor, Globus, SRM, ...
9
Web Services
  • Principles
  • (Almost) every service has a public Web Service
    interface described by a WSDL file
  • WS-I compliance (getting there)
  • Auto-generated clients in many languages
  • Uniform programming model
  • Every service adheres to the same security model
  • Not WS-Security yet because supporting tooling is
    not mature enough
  • Transport-level security (HTTPS, PKI, GSI)
  • Additional attributes in certificate (VOMS)
  • Modularity
  • Easy to build new services that make use of
    existing ones
  • Easy to replace services by custom ones if the
    WSDL is identical
  • Federation of Services as opposed to a Monolithic
    stack

10
gLite Middleware Services
[Diagram: gLite middleware services; legend: available gLite implementation]
  • Access: API, CLI
  • Security Services: Authentication, Authorization,
    Auditing
  • Information & Monitoring Services: Information &
    Monitoring, Application Monitoring, Service
    Discovery
  • Data Management: Metadata Catalog, File & Replica
    Catalog, Storage Element, Data Movement
  • Workload Mgmt Services: Job Provenance, Package
    Manager, Accounting, Computing Element, Workload
    Management
  • Connectivity
11
Security Concepts
  • Current issues with LCG
  • No service-level security (any service may be
    used by anyone)
  • Insecure CE
  • Insecure Storage (Castor)
  • No fine-grained authorization for files
  • No possibility for VOs to assign capabilities to
    their members (e.g. groups)
  • No possibility to apply and enforce VO policies
    (preferred users)
  • No possibility to distinguish between VOs and
    apply inter-VO policies (which VO is preferred?)
  • gLite additions and improvements
  • VO Management Service (VOMS) for proxy management
  • Use voms-proxy-init instead of grid-proxy-init
  • File Authorization Service for fine grained file
    security semantics
  • Services may act on VOMS roles and groups
  • Users need to authenticate with all services
  • Users may delegate cert to service
  • More consistent usage of LCAS/LCMAPS

12
WMS Concepts
  • Scalability problems with LCG
  • Necessitates updated information from all sites
    at all times
  • Push model implies that all information is
    up-to-date based on which the Resource Broker
    schedules a Job at a given Site
  • Delays may cause the information to be stale and
    jobs may be sent to unsuitable sites
  • gLite: Adapt concepts from AliEn
  • Introduce pull model where CE notifies the
    Resource Broker of its ability to run jobs.
  • Provide a Task Queue that may be reordered based
    on policies
  • Have a local proxy for data (sandboxes)
  • gLite: Introduce new functionality
  • Interactive jobs
  • DAG support
  • Accounting
  • Monitoring
  • Security on CE level
  • Slottable: Allow VO-managed CEs to run on
    the head node
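
The pull model and the reorderable Task Queue described on this slide can be sketched roughly as follows. This is only an illustration, not the gLite implementation: the class names, the `priority` field and the VO-based matching rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    job_id: str
    vo: str
    priority: int = 0  # hypothetical policy value; higher runs first


@dataclass
class TaskQueue:
    """Holds submitted tasks until a CE announces that it can run one."""
    tasks: list = field(default_factory=list)

    def submit(self, task: Task) -> None:
        self.tasks.append(task)

    def reorder(self, policy) -> None:
        # A policy is any key function over tasks, e.g. a VO-based priority.
        self.tasks.sort(key=policy)

    def pull(self, supported_vos: set):
        """Called on behalf of a CE that reports free capacity (pull model)."""
        for index, task in enumerate(self.tasks):
            if task.vo in supported_vos:
                return self.tasks.pop(index)
        return None  # nothing suitable; remaining tasks stay queued


queue = TaskQueue()
queue.submit(Task("job-1", vo="cms"))
queue.submit(Task("job-2", vo="biomed", priority=5))
queue.submit(Task("job-3", vo="cms", priority=9))
queue.reorder(lambda task: -task.priority)  # highest priority first
picked = queue.pull({"cms"})
print(picked.job_id)
```

The point of the sketch is that the broker never needs a global, always-fresh view of all sites: a task waits in the queue until a CE asks for work it can actually run.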

13
DM Concepts
  • Known problems with LCG (2004)
  • RLS: scalability, single central instance (RLIs
    were never deployed)
  • RLS/RMC duality with poor performance, no bulk
    operations
  • No filesystem hierarchy (flat)
  • Hardlink issue (LFN-GUID is an N:1 relation)
  • No replica management service
  • No secure data access and no coherent security
  • gLite: Learn from AliEn
  • Hierarchical File Catalog, Metadata
  • File Transfer service with queuing
  • Shell-like view of the system
  • gLite: Added functionality
  • File catalog with hierarchy, ACL control and
    metadata
  • POSIX-like I/O for direct file access
  • Bulk operations everywhere for performance
  • Catalogs built to be distributed (no single
    central instance necessary)
  • However, we were told not to introduce
    distributed catalogs in gLite 1
  • Security concept completely new: everything is
    secured homogeneously
  • File Transfer and File Placement service;
    high-level Data Scheduler also foreseen
  • Meanwhile, LCG/SA1 has evolved the DM stack
    itself
  • Introduction of GFAL, LFC, lcg-utils

14
Information System Concepts
  • Issues with LCG (2004)
  • Information system scalability and stability
  • Glue Schema deficiencies
  • gLite: R-GMA
  • Evolve from EDG
  • Monitoring and information provisioning
  • Distributed messaging infrastructure
  • gLite: Service Discovery
  • Instead of schema, standardize on API
  • Back-end can be anything (LDAP/BDII, R-GMA, plain
    file, ...)
  • Meanwhile: LCG/SA1 improvements
  • BDII scalability and stability
  • Glue Schema extensions
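
The idea of standardizing on an API while leaving the back-end open can be illustrated with a small sketch. It is not the gLite Service Discovery interface: the class names, the `find` method, the plain-file line format and the example endpoint are all assumptions chosen for illustration.

```python
from abc import ABC, abstractmethod


class ServiceDiscovery(ABC):
    """The API that clients program against; back-ends are interchangeable."""

    @abstractmethod
    def find(self, service_type: str) -> list:
        """Return the endpoints registered for a service type."""


class InMemoryBackend(ServiceDiscovery):
    def __init__(self, table: dict):
        self._table = table

    def find(self, service_type: str) -> list:
        return list(self._table.get(service_type, []))


class PlainFileBackend(ServiceDiscovery):
    """Parses lines of the (assumed) form '<service type> <endpoint>'."""

    def __init__(self, text: str):
        self._entries = []
        for line in text.splitlines():
            parts = line.split(None, 1)
            if len(parts) == 2:  # skip blank or malformed lines
                self._entries.append((parts[0], parts[1].strip()))

    def find(self, service_type: str) -> list:
        return [endpoint for kind, endpoint in self._entries if kind == service_type]


def first_endpoint(discovery: ServiceDiscovery, service_type: str):
    """Client code depends only on the API, not on the back-end."""
    endpoints = discovery.find(service_type)
    return endpoints[0] if endpoints else None


file_backend = PlainFileBackend("catalog https://catalog.example.org:8443/\n")
memory_backend = InMemoryBackend({"catalog": ["https://catalog.example.org:8443/"]})
print(first_endpoint(file_backend, "catalog") == first_endpoint(memory_backend, "catalog"))
```

An LDAP/BDII or R-GMA back-end would be one more subclass; `first_endpoint` and any other client code would not change.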

15
Processes
  • Experience from EDG and other projects:
  • Software development processes were insufficient
  • Release cycle was too long
  • Bugfixing took too long
  • Software quality assurance weak
  • Installation and configuration very difficult and
    cumbersome
  • gLite: introduce the necessary processes
  • Developer guidelines for homogeneous code
    development
  • Automatic, homogeneous build system
  • Integration and deployment modules for easy
    configuration
  • Dependency management
  • Still a lot of room for improvement!
  • Meanwhile, LCG introduced:
  • YAIM
  • Savannah

16
Chapter: Security
  • SECURITY
  • Difference to LCG
  • VOMS
  • Service Security Model (overview)

17
Security: Differences to LCG
  • VO Management Service
  • Each person has to be a member of a VO
  • Services are categorized into Site and VO
    services
  • VO services will only accept certificates that
    are signed by the VOMS
  • Computing Element Authz
  • Secure scheduling and user mapping through LCMAPS
  • MyProxy renewal for long-running jobs
  • Renewal of VOMS attributes
  • Fine Grained File Authorization
  • The File Catalog contains Access Control bits for
    each file
  • This is enforced through the Grid Services (see
    Data Management talk later for the details)
  • Secure Info System

18
VOMS
  • VO Membership / Management Service
  • Every VO needs to have one
  • Extends the proxy certificate with attributes
    (X.509 allows for optional attributes in
    certificates)
  • Specifying the VO
  • The roles/groups the user requests and is
    actually a member of
  • Administrator
  • Reader
  • Writer
  • Etc.
  • Signs the VOMS attributes
  • The signature itself has a lifetime of 12 hours
  • MyProxy does not extend this, and does not even
    return VOMS attributes, so after a MyProxy
    retrieval, VOMS needs to be contacted again (done
    automatically by WMS)
  • Attribute format is described in
    http://cern.ch/edg-wp2/security/voms/edg-voms-credential.pdf
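
The renewal rule on this slide (a 12-hour signature lifetime, and no VOMS attributes returned by MyProxy) can be expressed as a small decision function. This is a sketch of the logic only; the function names and parameters are hypothetical and do not correspond to any VOMS or MyProxy API.

```python
from datetime import datetime, timedelta

# Lifetime of the VOMS signature as stated on the slide.
VOMS_ATTRIBUTE_LIFETIME = timedelta(hours=12)


def attributes_valid(signed_at: datetime, now: datetime,
                     lifetime: timedelta = VOMS_ATTRIBUTE_LIFETIME) -> bool:
    """True while the signed VOMS attributes are still within their lifetime."""
    return signed_at <= now < signed_at + lifetime


def must_recontact_voms(signed_at: datetime, now: datetime,
                        retrieved_via_myproxy: bool) -> bool:
    # A proxy retrieved from MyProxy carries no VOMS attributes,
    # so they always have to be requested again in that case.
    return retrieved_via_myproxy or not attributes_valid(signed_at, now)


signed = datetime(2005, 7, 4, 8, 0, 0)
print(must_recontact_voms(signed, signed + timedelta(hours=13), False))
```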

19
VOMS Admin
  • Web interface to administer VOMS
  • Every user needs to sign up through this website
  • The certificate needs to be loaded into the
    browser
  • Group memberships can be managed through this
    interface by the administrators

20
Delegation
  • Services need to perform tasks on behalf of the
    user
  • Contact other services AS the user
  • Need to delegate rights/proxy of the user to the
    given service
  • WMS
  • Transfer services
  • glite-IO
  • Currently available delegation mechanisms are
    insufficient
  • Globus CoG httpg
  • JRA3: Web Service method to do the delegation step
  • Also allow for restricted delegation: allow only
    certain operations to be performed on the user's
    behalf

21
Security Model
  • Some Services are managed by the Site
  • Resource services
  • Local batch system
  • Storage Element
  • Transfer Channel
  • Mapping of users into local accounts
  • Usage of LCAS/LCMAPS
  • VOMS-generated maps for each VO
  • Site allow/deny lists managed through LCMAPS
  • Apply Site policies. VO granularity (fair
    share/configurable share among VOs)
  • Some Services are managed by the VO
  • VO services
  • CE
  • File Placement
  • Catalogs
  • Apply VO policies based on VOMS groups and roles
  • More on security in Data Management lecture
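
The site-side steps listed above (allow/deny lists, then mapping a grid user into a local account) could look roughly like the sketch below. It does not reproduce LCAS or LCMAPS; the class, the pool-account naming scheme (`cms001`, `cms002`, ...) and the example DNs are assumptions made for illustration.

```python
class SiteMapper:
    """Toy site policy: check allow/deny lists, then lease a local account."""

    def __init__(self, allowed_vos, denied_users, pool_size=3):
        self.allowed_vos = set(allowed_vos)
        self.denied_users = set(denied_users)
        self.pool_size = pool_size  # local accounts available per VO
        self.leases = {}            # (vo, dn) -> local account name

    def map_user(self, dn: str, vo: str):
        # Site policy comes first: denied users and unknown VOs are rejected.
        if dn in self.denied_users or vo not in self.allowed_vos:
            return None
        key = (vo, dn)
        if key not in self.leases:
            used = sum(1 for (leased_vo, _) in self.leases if leased_vo == vo)
            if used >= self.pool_size:
                return None  # no free account left for this VO
            self.leases[key] = f"{vo}{used + 1:03d}"
        return self.leases[key]


mapper = SiteMapper(allowed_vos={"cms"}, denied_users={"/CN=Mallory"})
print(mapper.map_user("/CN=Alice", "cms"))
```

A returning user keeps the same local account, which is the property that makes per-user accounting and file ownership on the site workable.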

22
Chapter: WMS
  • WORKLOAD MANAGEMENT
  • Difference to LCG
  • Some internals
  • The new and old commands
  • DAGs
  • Interactive Jobs

23
LCG - Architecture Overview
[Diagram: Resource Broker Node (Workload Manager, WM), job status, Storage Element]
24
gLite Architecture Overview
[Diagram: Resource Broker Node (Workload Manager, WM), job status, Storage Element]
25
Architecture
26
Architecture (2)
Job management requests (submission,
cancellation) expressed via a Job
Description Language (JDL)
27
Architecture (3)
Keeps submission requests (tasks). Requests are
kept for a while if no matching resources are
available.
28
Architecture (4)
Repository of resource information available to
the Matchmaker. Updated via notifications and/or
active polling of resources.
29
Architecture (5)
Finds an appropriate Computing Element for each
submission request.
30
Architecture (6)
Performs the actual job submission and
monitoring
31
Possible Job States
[Diagram: job states]
32
Job Submission Command Line Interface
  • glite-job-submit -r <res_id> -c <config
    file> --vo <VO> -o <output file>
    <job.jdl>
  • -r: the job is submitted directly to the computing
    element identified by <res_id>
  • -c: the UI uses the configuration file <config file>
    instead of the standard configuration file
  • --vo: the Virtual Organisation (if the user is not
    happy with the one specified in the UI
    configuration file)
  • -o: the generated edg_jobId is written to the
    <output file>
  • Useful for other commands, e.g.
  • glite-job-status -i <input file> (or jobId)
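
For scripted submission, the command line above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of gLite; it uses only the options listed on this slide and assumes that the option shown as "r" in the transcript is `-r`. It builds the argument list without running anything.

```python
import shlex


def submit_command(jdl, resource=None, config=None, vo=None, output=None):
    """Build the argument list for glite-job-submit from the slide's options."""
    argv = ["glite-job-submit"]
    if resource:
        argv += ["-r", resource]   # submit directly to this computing element
    if config:
        argv += ["-c", config]     # alternative UI configuration file
    if vo:
        argv += ["--vo", vo]       # override the VO from the UI configuration
    if output:
        argv += ["-o", output]     # file that receives the generated job ID
    argv.append(jdl)
    return argv


command = submit_command("HelloWorld.jdl", vo="cms", output="jobids.txt")
print(shlex.join(command))
```

Passing the list form to a process launcher avoids shell quoting problems when file names contain spaces.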

33
Job Resubmission
  • If something goes wrong, the WMS tries to
    reschedule and resubmit the job (possibly on a
    different resource satisfying all the
    requirements)
  • Maximum number of resubmissions: min(RetryCount,
    MaxRetryCount)
  • RetryCount: JDL attribute
  • MaxRetryCount: attribute in the RB
    configuration file
  • One can disable job resubmission for a particular
    job: RetryCount = 0 in the JDL file
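
The resubmission limit is a plain minimum of the two values. The sketch below shows the arithmetic and a retry loop built on it; the function names and the `attempt` callback are hypothetical, and clamping negative values to zero is an assumption rather than documented WMS behaviour.

```python
def max_resubmissions(retry_count: int, max_retry_count: int) -> int:
    """Limit from the slide: min(RetryCount, MaxRetryCount), never below 0."""
    return max(0, min(retry_count, max_retry_count))


def run_with_resubmission(attempt, retry_count: int, max_retry_count: int):
    """Call attempt(n) until it succeeds or the resubmission limit is hit.

    Returns (number of submissions made, success flag).
    """
    limit = max_resubmissions(retry_count, max_retry_count)
    for n in range(limit + 1):  # one initial submission plus resubmissions
        if attempt(n):
            return n + 1, True
    return limit + 1, False


# The JDL asks for 5 retries, but the RB configuration caps it at 3.
print(max_resubmissions(5, 3))
```

With `RetryCount = 0` the loop body runs exactly once, which matches the slide's way of disabling resubmission for a single job.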

34
Other (most relevant) UI commands
  • glite-job-list-match
  • Lists resources matching a job description
  • Performs the matchmaking without submitting the
    job
  • glite-job-cancel
  • Cancels a given job
  • glite-job-status
  • Displays the status of the job
  • glite-job-output
  • Returns the job-output (the OutputSandbox files)
    to the user
  • glite-job-logging-info
  • Displays logging information about submitted jobs
    (all the events pushed by the various
    components of the WMS)
  • Very useful for debugging purposes

35
WMS Matchmaking
  • The RB (Matchmaker) has to find the most
    suitable Computing Element (CE) where the job
    will be executed
  • It interacts with data management services and
    Information Services
  • They provide all the information required for the
    resolution of the matches
  • The CE chosen by RB has to match the job
    requirements (e.g. runtime environment, data
    access requirements, and so on)
  • If FuzzyRank = False (default)
  • If 2 or more CEs satisfy all the requirements,
    the one with the best Rank is chosen
  • If there are two or more CEs with the same best
    rank, the choice is made at random among
    them
  • If FuzzyRank = True in the JDL
  • Fuzziness in CE choice: the CE with the highest rank
    has the highest probability of being chosen
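
The two selection modes can be sketched as follows. The non-fuzzy branch follows the slide directly (best rank, random tie-break). The fuzzy branch uses a simple rank-proportional weighting that is only an assumption: the slide states that higher rank means higher probability, not how the WMS computes that probability.

```python
import random


def choose_ce(candidates, fuzzy_rank=False, rng=random):
    """Pick a CE from (ce_id, rank) pairs that already satisfy the requirements."""
    if not candidates:
        return None
    if not fuzzy_rank:
        best = max(rank for _, rank in candidates)
        # Several CEs may share the best rank: choose among them at random.
        return rng.choice([ce for ce, rank in candidates if rank == best])
    # Fuzzy mode (assumed weighting): shift ranks so every weight is positive,
    # then draw with probability proportional to the shifted rank.
    lowest = min(rank for _, rank in candidates)
    weights = [rank - lowest + 1 for _, rank in candidates]
    return rng.choices([ce for ce, _ in candidates], weights=weights, k=1)[0]


matches = [("ce-a", 1), ("ce-b", 5), ("ce-c", 5)]
print(choose_ce(matches) in {"ce-b", "ce-c"})
```

In fuzzy mode a lower-ranked CE such as `ce-a` is still picked occasionally, which spreads load when many jobs are matched against slightly stale rank information.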

36
WMS Matchmaking Scenarios
  • Possible scenarios for matchmaking
  • Direct job submission
  • glite-job-submit -r <CEId>
  • Corresponds to job submission with Globus clients
    (globus-job-submit)
  • Job submission with computational requirements
    only
  • No InputData nor OutputSE specified in the JDL
  • Job submission with data access requirements
  • InputData and/or OutputSE specified in the JDL
  • Details will be given in the Data Management
    lecture

37
Example of Job Submission (1)
  • User logs in UI (User Interface) machine
  • User issues a voms-proxy-init, enters her
    certificate's password and gets a valid Grid
    proxy
  • User sets up her JDL file
  • Example of Hello World JDL file
  • Executable = "/bin/echo";
  • Arguments = "Hello World";
  • StdOutput = "Message.txt";
  • StdError = "stderr.log";
  • OutputSandbox =
    {"Message.txt","stderr.log"};
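
A JDL file like the Hello World example is a list of `attribute = value;` statements in ClassAd syntax, so it is easy to generate from a script. The helper below is a hypothetical sketch, not a gLite tool; it covers only strings, numbers, booleans and lists, and the backslash-escaping of quotes inside strings is an assumption.

```python
def to_jdl(attributes: dict) -> str:
    """Render a dict as JDL statements of the form 'Name = value;'."""

    def render(value):
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(render(item) for item in value) + "}"
        if isinstance(value, bool):  # check bool before int: True is an int
            return "true" if value else "false"
        if isinstance(value, (int, float)):
            return str(value)
        # Strings are double-quoted; embedded quotes are backslash-escaped.
        return '"' + str(value).replace('"', '\\"') + '"'

    return "\n".join(f"{name} = {render(value)};" for name, value in attributes.items())


hello_world = {
    "Executable": "/bin/echo",
    "Arguments": "Hello World",
    "StdOutput": "Message.txt",
    "StdError": "stderr.log",
    "OutputSandbox": ["Message.txt", "stderr.log"],
}
print(to_jdl(hello_world))
```

Writing the result to `HelloWorld.jdl` gives a file that can be passed to the submission command on the next slide.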

38
Example of Job Submission (2)
  • User issues a glite-job-submit HelloWorld.jdl
  • and gets back a unique Job Identifier (JobId)
  • User issues a glite-job-status JobId
  • to get logging information about the current
    status of her Job
  • When the Output status is reached, the user
    can issue a glite-job-output JobId
  • and the system returns the name of the temporary
    directory where the job output can be found on
    the UI machine.

39
Example of Job Submission (3)
  • glite-job-submit HelloWorld.jdl
  • Selected Virtual Organisation name (from --vo
    option): cms
  • Connecting to host egee-rb-01.mi.infn.it, port
    7772
  • Logging to host egee-rb-01.mi.infn.it, port 9002

  • JOB SUBMIT OUTCOME
  • The job has been successfully submitted to the
    Network Server.
  • Use glite-job-status command to check job
    current status. Your job identifier is:
  • - https://egee-rb-01.mi.infn.it:9000/LYVQ8JEjZVVqBgbbQw6KMw


40
Example of Job Submission (4)
  • glite-job-status
    https://egee-rb-01.mi.infn.it:9000/LYVQ8JEjZVVqBgbbQw6KMw

  • BOOKKEEPING INFORMATION
  • Status info for the Job:
    https://egee-rb-01.mi.infn.it:9000/LYVQ8JEjZVVqBgbbQw6KMw
  • Current Status: Done (Success)
  • Exit code: 0
  • Status Reason: Job terminated successfully
  • Destination:
    ce3.egee.unile.it:2119/jobmanager-lcgpbs-cms
  • Submitted: Mon Jul 4 11:18:55 2005 CEST


41
Example of Job Submission (5)
  • glite-job-output --dir /tmp/mydir
    https://egee-rb-01.mi.infn.it:9000/LYVQ8JEjZVVqBgbbQw6KMw

  • JOB OUTPUT OUTCOME
  • Output sandbox files for the job
    https://egee-rb-01.mi.infn.it:9000/LYVQ8JEjZVVqBgbbQw6KMw
  • have been successfully retrieved and stored in
    the directory
  • /tmp/mydir/KoBA-IgxZyVpLKhANfrhHw

  • more /tmp/mydir/KoBA-IgxZyVpLKhANfrhHw/Message.txt
  • Hello World
  • more /tmp/mydir/KoBA-IgxZyVpLKhANfrhHw/stderr.log

42
Job Dependencies
Condor's DAGMan allows for job dependencies
DAG: Directed Acyclic Graph
43
DAG JDL Structure
  • JobType = DAG
  • VirtualOrganisation = yourVO
  • Max_Nodes_Running = int > 0
  • MyProxyServer
  • Requirements
  • Rank
  • InputSandbox
  • OutSandbox
  • Nodes = nodeX (more later)
  • Dependencies (more later)

44
Attribute: Nodes
  • The Nodes attribute is the core of the DAG
    description

45
Attribute: Dependencies
  • It is a list of lists representing the
    dependencies between the nodes of the DAG.
  • Mandatory: yes
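
A list-of-lists dependency description can be turned into a valid execution order with a topological sort. The sketch below models each dependency as a `[parents, children]` pair in which either side may be a single node name or a list of names; that representation and the node names are assumptions for illustration and are not the exact JDL syntax. It relies on `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def as_list(value):
    """Accept either a single node name or a list of node names."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def execution_order(nodes, dependencies):
    """Return node names in an order that respects every dependency."""
    sorter = TopologicalSorter({name: set() for name in nodes})
    for parents, children in dependencies:
        for child in as_list(children):
            # The child may only run after all of its parents have finished.
            sorter.add(child, *as_list(parents))
    return list(sorter.static_order())  # raises CycleError if not acyclic


nodes = ["nodeA", "nodeB", "nodeC", "nodeD"]
dependencies = [
    ["nodeA", "nodeB"],             # nodeB runs after nodeA
    ["nodeA", "nodeC"],             # nodeC runs after nodeA
    [["nodeB", "nodeC"], "nodeD"],  # nodeD runs after both nodeB and nodeC
]
order = execution_order(nodes, dependencies)
print(order[0], order[-1])
```

A dependency list that contains a cycle is not a DAG; `static_order` reports this with `CycleError`, which is a convenient validation step before submission.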
46
Interactive Jobs
  • Specified by setting JobType = Interactive in the JDL
  • When an interactive job is executed, a window for
    the stdin, stdout, stderr streams is opened
  • Possibility to send the stdin to the job
  • Possibility to have the stderr and stdout of the
    job while it is running
  • Possibility to start a window for the standard
    streams of a previously submitted interactive
    job with the command glite-job-attach

47
Further information
  • Workload Management
  • http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/
  • In particular WMS User & Admin Guide and JDL docs
  • Condor ClassAd
  • http://www.cs.wisc.edu/condor/classad
  • Condor DAGMan
  • http://www.cs.wisc.edu/condor/dagman/

48
Abbreviations
  • JDL: Job Description Language
  • RB: Resource Broker
  • WMS: Workload Management System
  • CE: Computing Element
  • SE: Storage Element
  • UI: User Interface
  • Acknowledgements: slides used here were authored
    by
  • Erwin Laure (CERN) and Heinz Stockinger (U of
    Vienna)
  • Salvo Monforte, Marco Pappalardo, Valeria
    Ardizzone (INFN Catania)

49
More Information
  • The EGEE Project
  • http://www.eu-egee.org
  • The LCG Project
  • http://cern.ch/lcg
  • The gLite middleware
  • http://www.glite.org
  • The Condor Project
  • http://www.cs.wisc.edu/condor
  • The Globus Project
  • http://www.globus.org