Job Submission - PowerPoint PPT Presentation

Learn more at: http://www.dutchgrid.org

1
Job Submission
  • Fokke Dijkstra RuG/SARA
  • Grid tutorial Groningen September 2006

2
Contents
  • The LCG Workload Management System (WMS) in gLite
  • Job Submission to EGEE / NL-Grid
  • Job Preparation
  • A simple example: Job Lifecycle
  • Job Description Language (JDL)
  • Job Submission & Monitoring
  • Some more advanced topics

3
WMS
?
4
The LCG WMS
  • The user submits jobs via the Workload Management
    System
  • The goal of the WMS is distributed scheduling and
    resource management in a Grid environment.
  • What does it allow Grid users to do?
    • Submit their jobs
    • Execute them
    • Get information about their status
    • Retrieve their output
  • The WMS tries to
    • Optimize the usage of resources
    • Execute user jobs as fast as possible

5
WMS components
6
Job Preparation
  • You need to provide
    • A complete (enough) job description
      • What program?
      • What data?
      • Any requirements on OS, installed software, ...
    • Possibly a program
      • You're submitting in unknown territory!
      • Program portably!
      • Don't rely on hard-coded paths or special
        locations
      • The program you send may not even be in $HOME!
    • Perhaps some input data
    • Perhaps instructions on what to do with the output

7
How to Write a Job Description
  • Here is a minimal job description (call it
    hello.jdl)
  • We specified
    • The program to run and its arguments
    • The files to which the standard error and output
      streams are directed
    • What to do with the output

Executable = "/bin/echo";
Arguments = "Goedemiddag";
StdError = "stderr.log";
StdOutput = "stdout.log";
OutputSandbox = {"stderr.log", "stdout.log"};
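
A job description like this can also be generated from a script instead of typed by hand. The following is a minimal sketch, not part of the original tutorial: `render_jdl` is a hypothetical helper, and its quoting rules cover only strings, integers and lists of strings.

```python
def render_jdl(attributes):
    """Render a dict of JDL attributes as 'Name = value;' lines (hypothetical helper)."""
    def fmt(value):
        if isinstance(value, (list, tuple)):
            # Lists are written in curly braces, as in OutputSandbox.
            return "{" + ", ".join(fmt(v) for v in value) + "}"
        if isinstance(value, int):
            return str(value)
        return '"' + str(value) + '"'
    return "\n".join(f"{name} = {fmt(value)};" for name, value in attributes.items())

# The attributes of the minimal hello.jdl example.
hello = {
    "Executable": "/bin/echo",
    "Arguments": "Goedemiddag",
    "StdError": "stderr.log",
    "StdOutput": "stdout.log",
    "OutputSandbox": ["stderr.log", "stdout.log"],
}
hello_jdl = render_jdl(hello)
print(hello_jdl)
```

Writing `hello_jdl` to a file named hello.jdl would give a description that can be passed to the submit command.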
8
Job Submission Example
  • User issues voms-proxy-init
    • Enters the certificate's password
    • Receives a valid Globus proxy
  • User issues edg-job-submit mytest.jdl
    • and gets back from the system a unique Job
      Identifier (JobId)
  • User issues edg-job-status JobId
    • to get logging information about the current
      status of the job
  • When the OutputReady status is reached, the
    user can issue edg-job-get-output JobId
    • and the system returns the name of the temporary
      directory where the job output can be found on
      the UI machine.

9
Submitting it
  • voms-proxy-init --voms tutor
  • Your identity: /O=edgtutorial/O=users/O=rug/OU=rc/CN=Fokke Dijkstra
  • Enter GRID pass phrase:
  • Creating temporary proxy .........................
    ........................................ Done
  • Contacting mu4.matrix.sara.nl:30007
    /O=dutchgrid/O=hosts/OU=sara.nl/CN=mu4.matrix.sara.nl
    "tutor" Done
  • Creating proxy ...................................
    ........... Done
  • Your proxy is valid until Mon Sep 11 23:22:12
    2006
  • edg-job-submit hello.jdl
  • Selected Virtual Organisation name (from UI conf
    file): tutor
  • Connecting to host mu3.matrix.sara.nl, port 7772
  • Logging to host mu3.matrix.sara.nl, port 9002

  • JOB SUBMIT OUTCOME
  • The job has been successfully submitted to the
    Network Server.
  • Use edg-job-status command to check job current
    status. Your job identifier (edg_jobId) is:
  • - https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q

JobId
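
Scripts that wrap the submit step usually need to pull the job identifier out of this output. A minimal sketch, assuming the identifier is an https URL of the form `https://<host>:<port>/<unique string>`; the function name and the sample text are illustrative.

```python
import re

# Assumed shape of an edg_jobId: https://<host>:<port>/<unique string>
JOB_ID_RE = re.compile(r"https://[\w.-]+:\d+/[\w-]+")

def extract_job_id(submit_output):
    """Return the first job identifier found in the submit output, or None."""
    match = JOB_ID_RE.search(submit_output)
    return match.group(0) if match else None

sample = """The job has been successfully submitted to the Network Server.
Your job identifier (edg_jobId) is:
 - https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q
"""
job_id = extract_job_id(sample)
print(job_id)
```

The -o option of edg-job-submit, which writes the identifier to a file, avoids this parsing altogether.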
10
A Job Submission Example
  • Diagram: job status "submitted"
  • Components shown: LCG File Catalog (LFC),
    Information System (IS), User Interface (UI),
    Resource Broker (RB), Storage Element (SE),
    Logging & Bookkeeping (LB), Job Submission
    Service (JSS), Computing Element (CE)
11
Checking the status
  • edg-job-status https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q

  • BOOKKEEPING INFORMATION
  • Status info for the Job: https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q
  • Current Status: Done (Success)
  • Exit code: 0
  • Status Reason: Job terminated successfully
  • Destination: mu6.matrix.sara.nl:2119/jobmanager-pbs-long
  • reached on: Tue Jun 1 08:14:25 2004
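
To act on the status from a script, the bookkeeping output has to be turned into fields. A rough sketch, assuming each field is printed as `Name: value` on its own line; `parse_status` and the sample text are illustrative, and the real output may carry extra decoration.

```python
def parse_status(text):
    """Parse 'Name: value' lines into a dict; lines without a value are skipped."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields

sample = """BOOKKEEPING INFORMATION
Status info for the Job: https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
"""
status = parse_status(sample)
finished_ok = (status.get("Current Status", "").startswith("Done")
               and status.get("Exit code") == "0")
print(status["Current Status"], finished_ok)
```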


12
A Job Submission Example
  • Diagram: job status "submitted"
  • Components shown: LCG File Catalog (LFC),
    Information System (IS), User Interface (UI),
    Resource Broker (RB), Storage Element (SE),
    Logging & Bookkeeping (LB), Job Submission
    Service (JSS), Computing Element (CE)
13
Getting the Output
  • edg-job-get-output https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q
  • Retrieving files from host mu3.matrix.sara.nl (
    for https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q )

  • JOB GET OUTPUT OUTCOME
  • Output sandbox files for the job:
  • - https://mu3.matrix.sara.nl:9000/Nz6PWWJCjtT7YY3PJWDu5Q
  • have been successfully retrieved and stored in
    the directory:
  • /tmp/jobOutput/fokke_Nz6PWWJCjtT7YY3PJWDu5Q

  • cat /tmp/jobOutput/fokke_Nz6PWWJCjtT7YY3PJWDu5Q/stdout.log
  • Goedemiddag

14
A Job Submission Example
  • Diagram: job status sequence submitted, waiting,
    ready, scheduled, running, done, outputready
  • Components shown: LCG File Catalog (LFC),
    Information System (IS), Resource Broker (RB),
    Storage Element (SE), Logging & Bookkeeping (LB),
    Job Submission Service (JSS), Computing Element
    (CE)
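
The status names on this slide form a simple linear lifecycle for a successful job. The sketch below encodes that order in Python. It is illustrative only: the function names are hypothetical, and states such as aborted or cancelled jobs are not modelled.

```python
# Status names as listed on the slide, in the order a successful job passes through them.
LIFECYCLE = ["submitted", "waiting", "ready", "scheduled", "running", "done", "outputready"]

def is_forward_step(current, new):
    """True if `new` is the status directly after `current` in the normal lifecycle."""
    i = LIFECYCLE.index(current)
    return i + 1 < len(LIFECYCLE) and LIFECYCLE[i + 1] == new

def can_fetch_output(status):
    """Output retrieval only makes sense once the job has reached 'outputready'."""
    return status.lower() == "outputready"

print(is_forward_step("ready", "scheduled"), can_fetch_output("running"))
```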
15
Job Description Language (JDL)
  • Based upon Condor's CLASSified ADvertisement
    language (ClassAd)
  • ClassAd is an extensible language
  • Sequence of attributes (key = value pairs)
    separated by semicolons

Executable = "/bin/echo";
Arguments = "Goedemiddag";
StdError = "stderr.log";
StdOutput = "stdout.log";
OutputSandbox = {"stderr.log", "stdout.log"};
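
Because a JDL file is just a sequence of `name = value` pairs ending in semicolons, a very small reader is enough for simple cases. The sketch below is not a ClassAd parser: it assumes there are no semicolons inside quoted strings and no comments, and `parse_jdl` is a hypothetical name.

```python
def parse_jdl(text):
    """Split a simple JDL text into (name, raw value) pairs.

    Assumes no semicolons inside quoted strings and no comments.
    """
    pairs = []
    for statement in text.split(";"):
        if not statement.strip():
            continue
        # Split on the first '=' only, so '==' inside an expression survives.
        name, _, value = statement.partition("=")
        pairs.append((name.strip(), value.strip()))
    return pairs

source = ('Executable = "/bin/echo"; Arguments = "Goedemiddag";\n'
          'Requirements = other.GlueCEStateStatus == "Production";')
pairs = parse_jdl(source)
print(pairs)
```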
16
Types of Attributes
  • The supported attributes are grouped in two
    categories
    • Job
      • Define the job itself
    • Resources
      • Taken into account by the RB for carrying out the
        matchmaking algorithm
      • Computing Resource (Attributes)
        • Used to build expressions of Requirements and/or
          Rank attributes by the user
        • Have to be prefixed with "other."
      • Data and Storage resources (Attributes)
        • Input data to process, SE where to store output
          data, protocols spoken by the application when
          accessing SEs

17
Job Definition Attributes
  • Executable (mandatory)
    • The command name
  • Arguments (optional)
    • Job command line arguments
  • StdInput, StdOutput, StdError (optional)
    • Standard input/output/error of the job
  • Environment (optional)
    • List of environment settings
  • InputSandbox (optional)
    • List of files on the UI local disk needed by the
      job for running
    • The listed files are staged from the UI to the
      remote CE
  • OutputSandbox (optional)
    • List of files, generated by the job, which have
      to be retrieved

18
Resource Attributes
  • Requirements
    • Job requirements on computing resources
    • Specified using attributes of resources published
      in the Information System
    • If not specified, the default value defined in the
      UI configuration file is used
    • Default: other.GlueCEStateStatus == "Production"
      (the resource has to be in the Production grid)
  • Rank
    • Expresses preference (how to rank resources that
      have already met the Requirements expression)
    • Specified using attributes of resources published
      in the Information System
    • If not specified, the default value defined in the
      UI configuration file is used
    • Default: other.GlueCEStateFreeCPUs (the highest
      number of free CPUs)

19
Data Attributes
  • InputData (optional)
    • Refers to data used as input by the job (these
      data are published in the Replica Catalog and
      stored in the SEs)
    • PFNs and/or LFNs
  • DataAccessProtocol (mandatory if InputData is
    specified)
    • The protocol or the list of protocols which the
      application is able to use for accessing
      InputData on a given SE
  • OutputSE (optional)
    • The hostname of the output SE
    • The RB uses it to choose a CE that is compatible
      with the job and is close to the SE
  • OutputData (optional)
    • Output data that will be registered at the end of
      the job

20
Example JDL File
  • Executable = "gridTest";
  • StdError = "stderr.log";
  • StdOutput = "stdout.log";
  • InputSandbox = {"/home/joda/test/gridTest"};
  • OutputSandbox = {"stderr.log", "stdout.log"};
  • InputData = "lfn:/grid/tutor/testbed0-00019";
  • DataAccessProtocol = "gridftp";
  • Requirements = other.Architecture == "INTEL" && \
    other.OpSys == "LINUX" && other.FreeCpus >= 4;
  • Rank = other.GlueHostBenchmarkSF00;

21
Job Submission
  • edg-job-submit [-r <res_id>] [-n <user e-mail address>]
    [-c <config file>] [-o <output file>] <job.jdl>
    • -r: the job is submitted by the RB directly to the
      computing element identified by <res_id>
    • -c: the configuration file <config file> is used
      by the UI instead of the standard configuration
      file
    • -o: the generated edg_jobId is written to the
      <output file>
      • Useful for other commands, e.g.
        edg-job-status -i <input file> (or edg_jobId)
      • -i: the status information about the edg_jobIds
        contained in the <input file> is displayed
    • --vo: the VO under which the job will be run
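
When submission is driven from a script, the command line can be assembled from these options. A minimal sketch that only builds the argument list and does not run anything; `build_submit_command` is a hypothetical helper that uses only the options described on this slide.

```python
def build_submit_command(jdl_file, resource=None, config=None, output=None, vo=None):
    """Assemble an edg-job-submit argument list from the options on this slide."""
    argv = ["edg-job-submit"]
    if vo:
        argv += ["--vo", vo]
    if resource:
        argv += ["-r", resource]   # submit directly to this CE
    if config:
        argv += ["-c", config]     # alternative UI configuration file
    if output:
        argv += ["-o", output]     # file that receives the edg_jobId
    argv.append(jdl_file)          # the JDL file goes last on the command line
    return argv

cmd = build_submit_command("hello.jdl", output="jobids.txt", vo="tutor")
print(" ".join(cmd))
```

On a UI machine, the resulting list could be passed to `subprocess.run`.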

22
Other WMS UI Commands
  • edg-job-list-match
    • Lists resources matching a job description
    • Performs the matchmaking without submitting the
      job
  • edg-job-cancel
    • Cancels a given job
  • edg-job-status
    • Displays the status of the job
  • edg-job-get-output
    • Returns the job output (the OutputSandbox files)
      to the user
  • edg-job-get-logging-info
    • Displays logging information about submitted jobs
      (all the events pushed by the various
      components of the WMS)
    • Very useful for debugging

23
WMS Match Making
  • The RB is the core component of the WMS.
  • It has to find the most suitable computing
    resource (CE) where the job will be executed
  • It interacts with the Data Management service and
    the Information System
    • They supply the RB with all the information
      required for the resolution of the matches
  • The CE chosen by the RB has to match the job
    requirements (e.g. runtime environment, data
    access requirements, and so on)
  • If two or more CEs satisfy all the requirements,
    the one with the best Rank is chosen

24
Direct Job submission
  • The RB has to deal with three possible scenarios.
  • Scenario 1: Direct Job Submission
    • Job is scheduled on a given CE (specified in the
      edg-job-submit command via the -r option)
    • The RB doesn't perform any matchmaking algorithm
    • Take care if InputData is specified!

25
Brokered Job Submission, No InputData
  • Scenario 2: Job Submission without data-access
    requirements
    • Neither CE nor input data are specified.
    • The RB starts the matchmaking algorithm, which
      consists of two phases
      • Requirements check (the RB contacts the IS to
        check which CEs satisfy all the requirements)
      • Rank computation (if more than one CE satisfies
        the job requirements, the CE with the best rank
        is chosen by the RB)
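
The two phases can be sketched in a few lines of Python. This illustrates the idea only and is not the RB implementation. The CE records are invented, and the requirement and rank functions mirror the default Requirements and Rank expressions quoted in this tutorial.

```python
def matchmake(ces, requirements, rank):
    """Phase 1: keep CEs satisfying `requirements`; phase 2: pick the highest `rank`."""
    candidates = [ce for ce in ces if requirements(ce)]
    if not candidates:
        return None
    return max(candidates, key=rank)

# Hypothetical CE records, as the Information System might publish them.
ces = [
    {"id": "ce-a", "GlueCEStateStatus": "Production", "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueCEStateStatus": "Production", "GlueCEStateFreeCPUs": 12},
    {"id": "ce-c", "GlueCEStateStatus": "Closed", "GlueCEStateFreeCPUs": 40},
]
best = matchmake(
    ces,
    requirements=lambda ce: ce["GlueCEStateStatus"] == "Production",
    rank=lambda ce: ce["GlueCEStateFreeCPUs"],
)
print(best["id"])
```

Note that ce-c has the most free CPUs but never reaches the ranking phase, because it fails the requirements check.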

26
Brokered Job Submission, Grid Data
  • Scenario 3: CE is not specified in the JDL, but
    input data is
    • The RB contacts the Data Management service to
      find out which SEs have copies of the requested
      input data sets
    • The RB makes a best-effort match between
      • Computing resources for which the user is
        authorized
      • Nearby SEs which can provide the requested
        data sets via the requested transfer protocol
      • Any optional output SE specified in the job
        description
    • The RB strategy consists of submitting jobs close
      to the data!
  • The two main phases of the matchmaking algorithm
    remain unchanged
    • Requirements check
    • Rank computation
  • The matchmaking is only performed for CEs
    satisfying the data-access requirements (i.e.
    which are close to the data)
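
The data-access restriction acts as a filter applied before the usual requirements check and ranking. A small sketch under invented assumptions: the replica catalogue is a plain dict, each CE lists its "close" SEs, and all host names are placeholders.

```python
def ces_close_to_data(ces, replicas, lfn):
    """Keep only CEs that list a 'close' SE holding a replica of `lfn`."""
    ses_with_data = replicas.get(lfn, set())
    return [ce for ce in ces if ses_with_data & set(ce["close_ses"])]

# Hypothetical catalogue: logical file name -> SEs that hold a copy.
replicas = {"lfn:/grid/tutor/testbed0-00019": {"se1.example.org"}}
ces = [
    {"id": "ce-a", "close_ses": ["se1.example.org"], "free_cpus": 2},
    {"id": "ce-b", "close_ses": ["se2.example.org"], "free_cpus": 50},
]
eligible = ces_close_to_data(ces, replicas, "lfn:/grid/tutor/testbed0-00019")
chosen = max(eligible, key=lambda ce: ce["free_cpus"])
print(chosen["id"])
```

Here ce-b would win on rank alone, but it is never considered because it has no close SE with the data.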

27
Proxy Renewal
  • Why?
    • To avoid job failure because the job outlived the
      validity of the initial proxy
  • The WMS supports an automatic proxy renewal
    mechanism as long as the user credentials are
    handled by a proxy server.
  • Create a proxy using
    • voms-proxy-init
  • Register this proxy with the MyProxy server using
    • myproxy-init -s <server> -c <cred> -t <proxy>
      -d -n
    • server is the server address (e.g.
      px.matrix.sara.nl)
    • cred is the number of hours the proxy should be
      valid on the server
    • proxy is the number of hours renewed proxies
      should be valid
  • Short-term proxies can then be used to start jobs
    using
    • grid-proxy-init -hours <hours>
  • The proxy is automatically renewed by the WMS
    without user intervention for the whole lifetime
    of the job
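
Whether renewal matters for a given job comes down to simple arithmetic on the proxy expiry time. A sketch with assumed numbers: the 12-hour proxy, the 30-minute safety margin and the function name `needs_renewal` are illustrative choices, not values from the tutorial.

```python
from datetime import datetime, timedelta

def needs_renewal(proxy_expiry, expected_runtime, now, margin=timedelta(minutes=30)):
    """True if the job is expected to still be running when the proxy expires."""
    return now + expected_runtime + margin > proxy_expiry

now = datetime(2006, 9, 11, 11, 22, 12)
expiry = datetime(2006, 9, 11, 23, 22, 12)  # assumed 12-hour proxy

short_job = needs_renewal(expiry, timedelta(hours=2), now)
long_job = needs_renewal(expiry, timedelta(hours=48), now)
print(short_job, long_job)
```

Queueing time counts too, so in practice `expected_runtime` should include the expected wait before the job starts.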

28
MPI jobs
  • MPI
    • Message passing
    • Link with parallel library
    • Run on multiple processors
  • gLite
    • Limited support
    • Some sites can run MPI jobs
  • JobType
    • JobType = "MPICH";
    • NodeNumber = 8;
    • Adds MPICH support as a requirement
    • Executable runs in parallel on 8 CPUs

29
Other JobTypes
  • Interactive
    • StdOutput, StdInput and StdError forwarded to
      user
    • default X window
    • Other tools
  • Checkpointable
    • Job must save checkpoints
    • Checkpoints can be retrieved
    • Not fully supported yet

30
Further Information
  • The gLite User Guide!
    • http://glite.web.cern.ch/glite/documentation/default.asp
  • ClassAd: https://www.cs.wisc.edu/condor/classad/
  • SARA Grid pages: http://www.sara.nl/userinfo/grid/

31
UI configuration file
  • Can be set if the (expert) user is not happy with
    the default one
  • Most relevant attributes
    • RB(s)
      • When submitting a job, the first specified RB is
        tried; if the operation fails, the second one is
        considered, etc.
    • LBserver(s)
      • The LB to be used for a job is chosen by the RB
      • So when edg-job-status <edg-jobid> is issued,
        the LB to contact is specified in the edg-jobid
      • This list specifies the LB(s) that must be
        contacted when issuing edg-job-status --all /
        edg-job-get-logging-info --all (to get
        information for all the jobs belonging to that
        user)
    • Default JDL Requirements
      • other.GlueCEStateStatus == "Production"
    • Default JDL Rank
      • other.GlueCEStateFreeCPUs
    • Default Virtual Organisation
      • Which VO the job should use to run
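
The "try the first RB, then the next" behaviour is an ordinary failover loop. The sketch below shows the pattern only. The `submit` callable, the host names and the use of `ConnectionError` are stand-ins, not the actual UI implementation.

```python
def submit_with_failover(resource_brokers, submit):
    """Try each RB in order; return (rb, job_id) from the first that succeeds."""
    errors = []
    for rb in resource_brokers:
        try:
            return rb, submit(rb)
        except ConnectionError as exc:
            errors.append(f"{rb}: {exc}")
    raise RuntimeError("all Resource Brokers failed: " + "; ".join(errors))

# Stand-in for the real submission call: the first RB is unreachable.
def fake_submit(rb):
    if rb == "rb1.example.org:7772":
        raise ConnectionError("not available")
    return "https://lb.example.org:9000/abc123"

used_rb, job_id = submit_with_failover(
    ["rb1.example.org:7772", "rb2.example.org:7772"], fake_submit
)
print(used_rb, job_id)
```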

32
UI Command Error Messages
  • The UI commands accept some arguments as input.
    If the user makes a mistake on the command line,
    the following messages can appear
    • Argument is not allowed (the argument is not
      known)
    • Argument must be specified at the end of the
      command (both the jobId and the JDL file name
      must be put at the end of the command line)
    • Argument is missing for the output option
      (the user forgot to add the parameter required
      by the argument)
    • Argument --all cannot be specified with argument
      --input (some arguments are mutually exclusive)
    • CEId format is <full hostname>:<port
      number>/jobmanager-<service>. The provided CEId
      http://lx01.absolute.com:10854/jobmanager has a
      wrong format. (the user has misspelled the CE
      identifier after --resource)
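
The CEId format in the last message can be checked before submitting. A minimal sketch using a regular expression written from that format description. The pattern is an approximation and not the check the UI itself performs.

```python
import re

# <full hostname>:<port number>/jobmanager-<service>
CE_ID_RE = re.compile(
    r"^[A-Za-z0-9]([A-Za-z0-9.-]*[A-Za-z0-9])?:\d+/jobmanager-[\w-]+$"
)

def is_valid_ce_id(ce_id):
    """True if `ce_id` looks like host:port/jobmanager-service."""
    return CE_ID_RE.match(ce_id) is not None

good = "mu6.matrix.sara.nl:2119/jobmanager-pbs-long"
bad = "http://lx01.absolute.com:10854/jobmanager"
print(is_valid_ce_id(good), is_valid_ce_id(bad))
```

The rejected example fails twice over: it carries an http:// prefix, and it lacks the -<service> suffix after jobmanager.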

33
Resource Broker errors
  • While calling the RB API, the following
    can happen
    • Resource Broker grid013g.cnaf.infn.it:7771 not
      available (can't open a connection with the RB
      specified in the UI configuration file)
    • Unable to get LB address from RB
      grid013g.cnaf.infn.it (the function
      get_lb_contact returned an error)

34
JDL & Proxy Error Messages
  • While the UI commands are checking the JDL file,
    the following errors may occur
    • Mandatory Attribute default error in the
      configuration file /opt/edg/etc/UI_ConfigENV.cfg
      (there aren't any default values)
    • Mandatory Attribute missing in JDL file:
      Executable (Executable is one of the mandatory
      attributes)
    • Multiple InputSandbox attribute found in JDL
      file (the InputSandbox attribute is repeated)
    • Wrong function call for list attribute.
      Function usage is Member/IsMember(List, Value)
      (e.g. in the Requirements attribute the function
      Member/IsMember is used with a wrong syntax)
  • Proxy (this refers to the grid security proxy and
    not to a proxy machine)
    • If the user specifies a duration for the proxy
      that he wants to provide, using the -h option of
      edg-job-submit, a possible message is
      • Proxy certificate will expire in less than X
        hours. Creating a new X-hours-duration
        certificate (this is to make sure that at least
        the required proxy validity is granted)
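
Two of the JDL checks listed on this slide, a missing mandatory attribute and a repeated attribute, are easy to run locally before submitting. A sketch, assuming the attribute names have already been extracted from the JDL file; the message texts imitate those on the slide and are not produced by the real UI.

```python
from collections import Counter

MANDATORY = ["Executable"]  # the unconditionally mandatory attribute named in this tutorial

def check_attributes(names):
    """Return error strings for missing mandatory or repeated attribute names."""
    counts = Counter(names)
    errors = [f"Mandatory attribute missing in JDL file: {m}"
              for m in MANDATORY if m not in counts]
    errors += [f"Multiple {n} attribute found in JDL file"
               for n, c in counts.items() if c > 1]
    return errors

problems = check_attributes(["Arguments", "InputSandbox", "InputSandbox"])
for message in problems:
    print(message)
```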