1
Generic MPI Job Submission by the P-GRADE Grid Portal
Zoltán Farkas (zfarkas_at_sztaki.hu)
MTA SZTAKI
2
Contents
  • MPI
  • Standards
  • Implementations
  • P-GRADE Grid Portal
  • Workflow execution, file handling
  • Direct job submission
  • Brokered job submission

3
MPI
  • MPI stands for Message Passing Interface
  • Standards 1.1 and 2.0
  • MPI Standard features
  • Collective communication (1.1)
  • Point-to-Point communication (1.1)
  • Group management (1.1)
  • Dynamic Processes (2.0)
  • Programming language APIs (C, C++, Fortran)

4
MPI Implementations
  • MPICH
  • Freely available implementation of MPI
  • Runs on many architectures (even on Windows)
  • Implements Standards 1.1 (MPICH) and 2.0 (MPICH2)
  • Supports Globus (MPICH-G2)
  • Nodes are allocated upon application execution
  • LAM/MPI
  • Open-source implementation of MPI
  • Implements Standard 1.1 and parts of 2.0
  • Many additional features (e.g. checkpointing)
  • Nodes are allocated before application execution
  • Open MPI
  • Implements Standard 2.0
  • Builds on technologies of earlier MPI projects
    (LAM/MPI, LA-MPI, FT-MPI)

5
MPICH execution on x86 clusters
  • Applications are started using mpirun
  • specifying
  • the number of requested nodes (-np <nodenumber>),
  • a file containing the nodes to be allocated
    (-machinefile <arg>), optional,
  • the executable,
  • the executable's arguments.
  • Example: mpirun -np 7 ./cummu N M p 32
  • Processes are spawned using rsh or ssh,
    depending on the configuration (see the sketch
    below)
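
A minimal sketch of such an MPICH start-up; the node names and the file layout are made up:

# machines.txt lists one worker node host name per line
printf 'node01\nnode02\nnode03\n' > machines.txt

# start 7 processes of ./cummu on the listed nodes; MPICH spawns them
# with rsh or ssh, depending on how it was configured
mpirun -np 7 -machinefile machines.txt ./cummu N M p 32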

6
MPICH x86 execution requirements
  • Executable (and input files) must be present on
    worker nodes
  • Using a shared filesystem, or
  • the user distributes the files before invoking
    mpirun.
  • Worker nodes must be accessible from the host
    running mpirun
  • using rsh or ssh,
  • without user interaction (host-based
    authentication); see the sketch below
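
A minimal sketch of the "no shared filesystem" case, assuming password-less (host-based) ssh access to every node listed in machines.txt; the paths are illustrative:

# distribute the executable and the input files to every worker node
while read node; do
    scp ./cummu ./input.dat "$node:/tmp/"
done < machines.txt

# with the files in place, mpirun can spawn the processes
mpirun -np 7 -machinefile machines.txt /tmp/cummu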

7
P-GRADE Grid Portal
  • Technologies
  • Apache Tomcat
  • GridSphere
  • Java Web Start
  • Condor
  • Globus
  • EGEE Middleware
  • Scripts

8
P-GRADE Grid Portal
  • Workflow execution
  • DAGMan as workflow scheduler
  • pre and post scripts perform tasks around job
    execution
  • Direct job execution using GT-2
  • GridFTP, GRAM
  • pre: create a temporary storage directory, copy
    input files
  • job: Condor-G executes a wrapper script
  • post: download results
  • Job execution using the EGEE broker (both LCG and
    gLite)
  • pre: create the application context as the input
    sandbox
  • job: a Scheduler universe Condor job executes a
    script, which does job submission, status
    polling and output downloading. A wrapper script
    is submitted to the broker
  • post: error checking
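
The pre/job/post structure above maps onto DAGMan node scripts. Below is a minimal sketch of such a generated workflow description; the job and script names are illustrative, not the portal's actual naming:

# sketch: two workflow jobs wrapped by pre/post scripts (names made up)
cat > workflow.dag <<'EOF'
# pre scripts stage inputs, post scripts fetch results / check errors
JOB    jobA jobA.submit
SCRIPT PRE  jobA pre_jobA.sh
SCRIPT POST jobA post_jobA.sh
JOB    jobB jobB.submit
SCRIPT PRE  jobB pre_jobB.sh
SCRIPT POST jobB post_jobB.sh
# jobB consumes jobA's output, so it only starts after jobA finishes
PARENT jobA CHILD jobB
EOF

# DAGMan schedules the jobs and runs the pre/post scripts around each
condor_submit_dag workflow.dag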

9
Workflow Manager Portlet
10
Workflow example
  • Jobs
  • Input/output files
  • Data transfers

11
Portal File handling
  • Local files
  • User has access to these files through the Portal
  • Local input files are uploaded from the user
    machine
  • Local output files are downloaded to the user
    machine
  • Remote files
  • Files reside on EGEE Storage Elements or are
    accessible using GridFTP
  • EGEE SE files
  • lfn:
  • guid:
  • GridFTP files: gsiftp://
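
As an illustration (not the portal's internal code), files referenced in these formats can be fetched with the standard grid command-line tools; the VO, LFN and host names below are made up:

# copy a file identified by its logical file name from an EGEE
# Storage Element to local disk
lcg-cp --vo myvo lfn:/grid/myvo/inputs/input.dat file:///tmp/input.dat

# copy a file from a GridFTP server
globus-url-copy gsiftp://gridftp.example.org/data/input.dat \
    file:///tmp/input.dat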

12
Workflow Files
  • File Types
  • In/Out
  • Local/Remote
  • File Names
  • Internal
  • Global
  • File Lifetime
  • Permanent
  • Volatile

13
Portal Direct job execution
  • The resource to be used is known before job
    execution
  • The user must have a valid, accepted certificate
  • Local files are supported
  • Remote GridFTP files are supported, even for
    grid-unaware applications
  • Jobs may be sequential or MPI applications

14
Direct exec step-by-step I.
  • Pre script
  • creates a storage directory on the selected
    site's front-end node, using the fork
    jobmanager
  • local input files are copied to this directory
    from the Portal machine using GridFTP
  • remote input files are copied using GridFTP (in
    case of errors, a two-phase copy via the Portal
    machine is tried)
  • Condor-G job
  • a wrapper script (wrapperp) is specified as the
    real executable
  • a single job is submitted to the requested
    jobmanager; for MPI jobs the hostcount RSL
    attribute specifies the number of requested
    nodes (see the sketch below)
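
A condensed sketch of these two steps; the host names, paths, argument list and the Condor-G attribute spellings (taken from older Condor-G releases) are illustrative:

SITE=ce.example.org
WORKDIR=/tmp/pgrade_job_1234

# pre script: create the temporary storage directory through the fork
# jobmanager, then stage a local input file from the Portal machine
globus-job-run "$SITE/jobmanager-fork" /bin/mkdir -p "$WORKDIR"
globus-url-copy file:///var/portal/jobs/1234/input.dat \
    "gsiftp://$SITE$WORKDIR/input.dat"

# Condor-G job: wrapperp is the real executable; the hostCount RSL
# attribute reserves 7 nodes, wrapperp itself will call mpirun
cat > job.submit <<'EOF'
universe        = globus
globusscheduler = ce.example.org/jobmanager-pbs
executable      = wrapperp.sh
arguments       = ./cummu N M p 32
globus_rsl      = (jobType=single)(hostCount=7)
log             = job.log
queue
EOF
condor_submit job.submit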

15
Direct exec step-by-step II.
  • LRMS
  • allocate the number of requested nodes (if
    needed)
  • start wrapperp on one of the allocated nodes
    (master worker node)
  • Wrapperp (running on master worker node)
  • copies the executable and input files from the
    front-end node (scp or rcp)
  • in case of PBS jobmanagers, executable and input
    files are copied to the allocated nodes
    (PBS_NODEFILE). In case of non-PBS jobmanagers,
    shared filesystem is required, as the host names
    of the allocated nodes cannot be determined
  • wrapperp searches for mpirun
  • the real executable is started using the found
    mpirun
  • in case of PBS jobmanagers, output files are
    copied from the allocated worker nodes to the
    master worker node
  • output files are copied to the front-end node
    (see the sketch below)
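
A heavily simplified sketch of wrapperp for the PBS case; the host names, paths and output-file pattern are hypothetical and the real script is more elaborate:

#!/bin/sh
# wrapperp sketch - runs on the master worker node; "$@" holds the
# real executable and its arguments

FRONTEND=ce.example.org
WORKDIR=/tmp/pgrade_job_1234

# 1. copy the executable and input files from the front-end node
scp "$FRONTEND:$WORKDIR/*" . || rcp "$FRONTEND:$WORKDIR/*" .

# 2. PBS case: distribute the files to all allocated worker nodes
if [ -n "$PBS_NODEFILE" ]; then
    for node in $(sort -u "$PBS_NODEFILE"); do
        ssh "$node" mkdir -p "$PWD"
        scp ./* "$node:$PWD/"
    done
fi

# 3. locate mpirun and start the real executable on all nodes
MPIRUN=$(command -v mpirun)
NP=$(wc -l < "$PBS_NODEFILE")
"$MPIRUN" -np "$NP" -machinefile "$PBS_NODEFILE" "$@"

# 4. PBS case: collect output files from the worker nodes, then copy
#    everything back to the front-end node
for node in $(sort -u "$PBS_NODEFILE"); do
    scp "$node:$PWD/output*" .
done
scp output* "$FRONTEND:$WORKDIR/"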

16
Direct exec step-by-step III.
  • Post script
  • local output files are copied from the temporary
    working directory created by the pre script to
    the Portal machine using GridFTP
  • remote output files are copied using GridFTP (in
    case of errors, a two-phase copy via the Portal
    machine is tried)
  • DAGMan continues scheduling the remaining jobs
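
A sketch of the remote-output step with its two-phase fallback; the host and file names are made up:

# try a direct third-party GridFTP transfer from the site to the
# remote storage; if it fails, fall back to a two-phase copy that
# goes through the Portal machine
SRC=gsiftp://ce.example.org/tmp/pgrade_job_1234/result.dat
DST=gsiftp://storage.example.org/data/result.dat

if ! globus-url-copy "$SRC" "$DST"; then
    globus-url-copy "$SRC" file:///tmp/result.dat
    globus-url-copy file:///tmp/result.dat "$DST"
fi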

17
Direct execution
(Diagram: direct execution. The Portal machine stages the executable and input files via GridFTP into temporary storage on the site's front-end node, using the fork jobmanager; the job goes to the PBS cluster, where wrapperp runs on the master worker node, copies the inputs and executable to the slave worker nodes, starts the real executable with mpirun, and collects the outputs, which travel back through the front-end node to the Portal machine and the remote file storage.)
18
Direct Submission Summary
  • Pros
  • Users can add remote file support to legacy
    applications
  • Works for both sequential and MPI(CH)
    applications
  • For PBS jobmanagers, there is no need for a
    shared filesystem (support for other jobmanagers
    can be added, depending on the information
    provided by the jobmanagers)
  • Works with jobmanagers that do not support MPI
  • Faster than submitting through the broker
  • Cons
  • the user needs to specify the execution resource
  • currently doesn't work with non-PBS jobmanagers
    without a shared filesystem

19
Portal Brokered job submission
  • EGEE Resource Broker is used
  • The resource to be used is unknown before job
    execution
  • The user must have a valid, accepted certificate
  • Local files are supported
  • Remote files residing on Storage Elements are
    supported, even for grid-unaware applications
  • Jobs may be sequential or MPI applications

20
Broker exec step-by-step I.
  • Pre script
  • creates the Scheduler universe Condor submit file
  • Scheduler Universe Condor job
  • the job is a shell script
  • the script is responsible for
  • job submission: a wrapper script (wrapperrb) is
    specified as the real executable in the JDL file
  • job status polling
  • job output downloading (see the sketch below)
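
A condensed sketch of that script, assuming the LCG command-line interface (gLite offers glite-wms-job-* equivalents); the VO, JDL attribute values, file names, polling interval and option spellings are illustrative:

# sketch of the script run by the Scheduler universe Condor job

# 1. submission: wrapperrb is named as the executable in the JDL
cat > job.jdl <<'EOF'
Type          = "Job";
JobType       = "MPICH";
NodeNumber    = 7;
Executable    = "wrapperrb.sh";
Arguments     = "cummu N M p 32";
StdOutput     = "std.out";
StdError      = "std.err";
InputSandbox  = {"wrapperrb.sh", "cummu", "input.dat"};
OutputSandbox = {"std.out", "std.err"};
EOF
edg-job-submit --vo myvo -o jobid.txt job.jdl

# 2. status polling until the job reaches a terminal state
while true; do
    STATUS=$(edg-job-status --noint -i jobid.txt)
    echo "$STATUS" | grep -qE 'Done|Aborted|Cancelled' && break
    sleep 300
done

# 3. output downloading
edg-job-get-output --noint -i jobid.txt --dir ./output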

21
Broker exec step-by-step II.
  • Resource Broker
  • handles requests of the Scheduler universe Condor
    job
  • sends the job to a CE
  • watches its execution
  • reports errors
  • LRMS on CE
  • allocates the requested number of nodes
  • starts wrapperrb on the master worker node using
    mpirun

22
Broker exec step-by-step III.
  • Wrapperrb
  • the script is started by mpirun, so it starts on
    every allocated worker node just like an MPICH
    process would
  • checks whether the remote input files are already
    present. If not, they are downloaded from the
    storage element
  • if the user specified any remote output files,
    they are removed from the storage element
  • the real executable is started with the arguments
    passed to the script. These arguments already
    contain the MPICH-specific ones
  • after the executable has finished, remote output
    files are uploaded to the storage element (only
    in case of gLite; see the sketch below)
  • Post script
  • nothing special
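
A simplified sketch of the wrapperrb logic; the VO, LFN paths and file names are made up and the real script is more elaborate:

#!/bin/sh
# wrapperrb sketch - started by mpirun on every allocated worker node;
# "$@" holds the arguments for the real executable

VO=myvo
LFN_IN=lfn:/grid/myvo/inputs/input.dat
LFN_OUT=lfn:/grid/myvo/outputs/output.dat

# download the remote input file if it is not already present locally
if [ ! -f input.dat ]; then
    lcg-cp --vo "$VO" "$LFN_IN" file://"$PWD"/input.dat
fi

# remove a previously registered remote output file, if any
lcg-del -a --vo "$VO" "$LFN_OUT" 2>/dev/null

# start the real executable with the arguments passed to this script
# (they already contain the MPICH-specific ones)
./cummu "$@"
RC=$?

# after the executable has finished, upload and register the remote
# output file (gLite case)
lcg-cr --vo "$VO" -l "$LFN_OUT" file://"$PWD"/output.dat

exit $RC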

23
Broker execution
(Diagram: broker execution. The Portal machine submits the job to the EGEE Resource Broker, which forwards it through Globus to the front-end node of a CE; PBS allocates the worker nodes and mpirun starts wrapperrb on the master and slave worker nodes; each wrapperrb instance downloads its input from the Storage Element, starts the real executable, and the outputs are uploaded back to the Storage Element.)
24
Broker Submission Summary
  • Pros
  • adds remote file handling support to legacy
    applications
  • extends the functionality of the EGEE broker
  • one solution supports both sequential and MPI
    applications
  • Cons
  • slow application execution
  • status polling generates a high load with 500
    jobs

25
(No Transcript)
26
  • Thank you for your attention
  • ?