Software stack on the worker nodes
(Transcript and Presenter's Notes)

1
Software stack on the worker nodes
  • Ganglia (monitoring)
  • Portland (PGI), GNU and Intel compilers
  • Sun Grid Engine v6
  • Red Hat based 64-bit Scientific Linux
  • OpenMPI
  • AMD Opteron / Intel Westmere processors
2
Application packages
  • The following application packages are available on
    the worker nodes and are started via the name of
    the package:
  • maple , xmaple
  • matlab
  • R
  • abaquscae
  • abaqus
  • ansys
  • fluent
  • fidap
  • dyna
  • idl
  • comsol
  • paraview
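  • For example, a minimal sketch of launching a package
    interactively (qsh and interactive jobs are covered
    later in this talk):
    qsh        # start an interactive session on a worker node
    matlab     # then launch the package by typing its name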

3
Keeping up-to-date with application packages
  • From time to time the application packages get
    updated. When this happens, news articles inform us
    of the changes via iceberg news. To read the news,
    type news on iceberg or check the URL
    http://www.wrgrid.group.shef.ac.uk/icebergdocs/news.dat
  • Previous versions, or new test versions, of the
    software are normally accessed via the version
    number. For example:
  • abaqus69 , nagexample23 , matlab2011a

4
Running application packages in batch queues
  • iceberg provides local commands for running some
    of the popular applications in batch mode. These
    are: runfluent , runansys , runmatlab , runabaqus
  • To find out more, just type the name of the
    command on its own while on iceberg.

5
Setting up your software development environment
  • Except for the scratch areas on worker nodes, the
    view of the filestore is identical on every
    worker.
  • You can set up your software environment for each
    job by means of the module commands.
  • All the available software environments can be
    listed by using the module avail command.
  • Having discovered what software is available on
    iceberg, you can then select the ones you wish to
    use with the module add or module load commands.
  • You can load as many non-clashing modules as you
    need by consecutive module add commands.
  • You can use the module list command to check the
    list of currently loaded modules.
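  • A minimal sketch of a typical session (the module
    name is one of those listed later in this talk):
    module avail                  # list available environments
    module load mpi/gcc/openmpi   # select GNU compiler + OpenMPI
    module list                   # confirm what is loaded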

6
Software development environment
  • Compilers
    • PGI compilers
    • Intel compilers
    • GNU compilers
  • Libraries
    • NAG Fortran libraries (MK22, MK23)
    • NAG C libraries (MK8)
  • Parallel programming related
    • OpenMP
    • MPI (OpenMPI, mpich2, mvapich)

7
Managing Your Jobs: Sun Grid Engine Overview
  • SGE is the resource management, job scheduling
    and batch control system. (Others are available,
    such as PBS, Torque/Maui, Platform LSF.)
  • Starts up interactive jobs on available workers
  • Schedules all batch oriented (i.e.
    non-interactive) jobs
  • Attempts to create a fair-share environment
  • Optimizes resource utilization

8
Job scheduling on the cluster
[Diagram: jobs flow from Queue-A, Queue-B and Queue-C,
 managed by the SGE MASTER node, onto the SGE worker nodes]

SGE MASTER node
  • Queues
  • Policies
  • Priorities
  • Share/Tickets
  • Resources
  • Users/Projects


10
Submitting your job
  • There are two SGE commands for submitting jobs:
  • qsh or qrsh : to start an interactive job
  • qsub : to submit a batch job
  • There is also a list of home produced commands
    for submitting some of the popular applications
    to the batch system. They all make use of the
    qsub command. These are: runfluent , runansys ,
    runmatlab , runabaqus

11
Managing Jobs: monitoring and controlling your
jobs
  • There are a number of commands for querying and
    modifying the status of a job running or waiting
    to run. These are:
  • qstat or Qstat (query job status)
  • qdel (delete a job)
  • qmon (a GUI interface for SGE)

12
Running Jobs Example: Submitting a serial batch
job
  • Use an editor to create a job script in a file
    (e.g. example.sh):
    #!/bin/bash
    # Scalar benchmark
    echo "This code is running on" `hostname`
    date
    ./linpack
  • Submit the job:
    qsub example.sh

13
Running Jobs: qsub and qsh options

-l h_rt=hh:mm:ss   The wall clock time. This parameter must be specified;
                   failure to include it will result in the error message
                   "Error: no suitable queues". Current default is 8 hours.
-l arch=intel
-l arch=amd        Force SGE to select either Intel or AMD architecture
                   nodes. No need to use this parameter unless the code has
                   processor dependency.
-l mem=memory      Sets the virtual-memory limit, e.g. -l mem=10G (for
                   parallel jobs this is per processor and not total).
                   Current default if not specified is 6 GB.
-l rmem=memory     Sets the limit of real-memory required. Current default
                   is 2 GB. Note: the rmem parameter must always be less
                   than mem.
-help              Prints a list of options.
-pe ompigige np
-pe openmpi-ib np
-pe openmp np      Specifies the parallel environment to be used. np is the
                   number of processors required for the parallel job.
14
Running Jobs: qsub and qsh options (continued)

-N jobname         By default a job's name is constructed from the
                   job-script-file-name and the job-id that is allocated to
                   the job by SGE. This option defines the jobname. Make
                   sure it is unique, because the job output files are
                   constructed from the jobname.
-o output_file     Output is directed to a named file. Make sure not to
                   overwrite important files by accident.
-j y               Join the standard output and standard error output
                   streams. Recommended.
-m bea
-M email-address   Sends emails about the progress of the job to the
                   specified email address. If used, both -m and -M must be
                   specified. Select any or all of b, e and a to imply
                   emailing when the job begins, ends or aborts.
-P project_name    Runs a job using the specified project's allocation of
                   resources.
-S shell           Use the specified shell to interpret the script rather
                   than the default bash shell. Use with care. A better
                   option is to specify the shell in the first line of the
                   job script, e.g. #!/bin/bash
-V                 Export all environment variables currently in effect to
                   the job.
15
Running Jobs: batch job example
  • qsub example:
    qsub -l h_rt=10:00:00 -o myoutputfile -j y myjob
  • OR alternatively the first few lines of the
    submit script myjob contain:
    #!/bin/bash
    #$ -l h_rt=10:00:00
    #$ -o myoutputfile
    #$ -j y
  • and you simply type: qsub myjob

16
Running Jobs: Interactive jobs
  • qsh , qrsh
  • These two commands find a free worker node and
    start an interactive session for you on that
    worker node.
  • This ensures good response, as the worker node
    will be dedicated to your job.
  • The only difference between qsh and qrsh is that
    qsh starts a session in a new command window
    whereas qrsh uses the existing window. Therefore,
    if your terminal connection does not support
    graphics (i.e. X Windows) then qrsh will continue
    to work whereas qsh will fail to start.
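  • A minimal sketch of an interactive session (the
    time request shown is the default, given here only
    for illustration):
    qrsh -l h_rt=8:00:00   # request an interactive session
    hostname               # confirm you are on a worker node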

17
Running Jobs: A note on interactive jobs
  • Software that requires intensive computing should
    be run on the worker nodes and not the head node.
  • You should run compute intensive interactive jobs
    on the worker nodes by using the qsh or qrsh
    commands.
  • The maximum (and also default) time limit for
    interactive jobs is 8 hours.

18
Managing Jobs: Monitoring your jobs by qstat or
Qstat
  • Most of the time you will be interested only in
    the progress of your own jobs through the system.
  • The Qstat command gives a list of all your jobs,
    interactive and batch, that are known to the job
    scheduler. As you are in direct control of your
    interactive jobs, this command is useful mainly
    to find out about your batch jobs (i.e. jobs
    submitted with qsub).
  • The qstat command (note the lower-case q) gives a
    list of all the executing and waiting jobs by
    everyone.
  • Having obtained the job-id of your job by using
    Qstat, you can get further information about a
    particular job by typing: qstat -f -j job_id
    You may find that the information produced by
    this command is far more than you care for; in
    that case the following command can be used to
    find out about memory usage, for example:
  • qstat -f -j job_id | grep usage

19
Managing Jobs: qstat example output
State can be: r=running, qw=waiting in the queue,
E=error state, t=transferring (just before starting
to run), h=hold (waiting for other jobs to finish).

job-ID  prior    name        user      state  submit/start at      queue                           slots
--------------------------------------------------------------------------------------------------------
206951  0.51000  INTERACTIV  bo1mrl    r      07/05/2005 09:30:20  bigmem.q@comp58.iceberg.shef.a  1
206933  0.51000  do_batch4   pc1mdh    r      07/04/2005 16:28:20  long.q@comp04.iceberg.shef.ac.  1
206700  0.51000  l100-100.m  mb1nam    r      07/04/2005 13:30:14  long.q@comp05.iceberg.shef.ac.  1
206698  0.51000  l50-100.ma  mb1nam    r      07/04/2005 13:29:44  long.q@comp12.iceberg.shef.ac.  1
206697  0.51000  l24-200.ma  mb1nam    r      07/04/2005 13:29:29  long.q@comp17.iceberg.shef.ac.  1
206943  0.51000  do_batch1   pc1mdh    r      07/04/2005 17:49:45  long.q@comp20.iceberg.shef.ac.  1
206701  0.51000  l100-200.m  mb1nam    r      07/04/2005 13:30:44  long.q@comp22.iceberg.shef.ac.  1
206705  0.51000  l100-100sp  mb1nam    r      07/04/2005 13:42:07  long.q@comp28.iceberg.shef.ac.  1
206699  0.51000  l50-200.ma  mb1nam    r      07/04/2005 13:29:59  long.q@comp30.iceberg.shef.ac.  1
206632  0.56764  job_optim2  mep02wsw  r      07/03/2005 22:55:30  parallel.q@comp43.iceberg.shef  18
206600  0.61000  mrbayes.sh  bo1nsh    qw     07/02/2005 11:22:19  parallel.q@comp51.iceberg.shef  24
206911  0.51918  fluent      cpp02cg   qw     07/04/2005 14:19:06  parallel.q@comp52.iceberg.shef  4
20
Managing Jobs: Deleting/cancelling jobs
  • The qdel command can be used to cancel running
    jobs or remove waiting jobs from the queue.
  • To cancel an individual job: qdel job_id
  • Example: qdel 123456
  • To cancel a list of jobs:
  • qdel job_id1 , job_id2 , and so on
  • To cancel all jobs belonging to a given username:
  • qdel -u username

21
Managing Jobs: Job output files
  • When a job is queued it is allocated a jobid (an
    integer).
  • Once the job starts to run, normal output is sent
    to the output (.o) file and the error output is
    sent to the error (.e) file.
  • The default output file name is
    <script>.o<jobid>
  • The default error output filename is
    <script>.e<jobid>
  • If the -N parameter to qsub is specified, the
    respective output files become <name>.o<jobid>
    and <name>.e<jobid>
  • The -o or -e parameters can be used to define the
    output files to use.
  • The -j y parameter forces the job to send both
    error and normal output into the same (output)
    file (RECOMMENDED)

22
Monitoring the job output files
  • The following is an example of submitting an SGE
    job and checking the output produced:
  • qsub myjob.sh
  • job <123456> submitted
  • qstat -f -j 123456   (is the job running?)
  • When the job starts to run, type:
  • tail -f myjob.sh.o123456

23
Problem Session
  • Problem 10
  • Only up to test 5

24
Managing Jobs: Reasons for job failures
  • SGE cannot find the binary file specified in the
    job script
  • You ran out of file storage. It is possible to
    exceed your filestore allocation limits during a
    job that is producing large output files. Use the
    quota command to check this.
  • Required input files are missing from the startup
    directory
  • An environment variable is not set correctly
    (LM_LICENSE_FILE etc.)
  • Hardware failure (e.g. MPI ch_p4 or ch_gm errors)

25
Finding out the memory requirements of a job
  • Virtual Memory Limits
  • The default virtual memory limit for each job is
    6 GBytes.
  • Jobs will be killed if the virtual memory used by
    the job exceeds the amount requested via the
    -l mem parameter.
  • Real Memory Limits
  • The default real memory allocation is 2 GBytes.
  • Real memory resource can be requested by using
    -l rmem
  • Jobs exceeding the real memory allocation will
    not be deleted but will run with reduced
    efficiency, and the user will be emailed about
    the memory deficiency.
  • When you get warnings of that kind, increase the
    real memory allocation for your job by using the
    -l rmem parameter.
  • rmem must always be less than mem.
  • Determining the virtual memory requirements for a
    job:
  • qstat -f -j jobid | grep mem
  • The reported figures will indicate: the currently
    used memory (vmem), the maximum memory needed
    since startup (maxvmem), and the cumulative
    memory usage x seconds (mem).
  • When you next run the job you need to use the
    reported value of vmem to specify the memory
    requirement.
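  • For example (figures purely illustrative), a job
    that reported a maxvmem of about 9 GB might be
    resubmitted as:
    qsub -l mem=10G -l rmem=8G myjob.sh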

26
Managing Jobs: Running arrays of jobs
  • Add the -t parameter to the qsub command, or to
    the script file (with #$ at the beginning of the
    line).
  • Example: -t 1-10
  • This will create 10 tasks from one job.
  • Each task will have its environment variable
    SGE_TASK_ID set to a single unique value ranging
    from 1 to 10.
  • There is no guarantee that task number m will
    start before task number n, where m < n.
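  • A minimal array-job sketch, assuming a program
    my_prog and input files input_1.dat ... input_10.dat
    (both names hypothetical):
    #!/bin/bash
    #$ -t 1-10                           # create tasks 1..10
    #$ -l h_rt=1:00:00                   # illustrative time request
    ./my_prog input_${SGE_TASK_ID}.dat   # each task reads its own input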

27
Managing Jobs: Running CPU-parallel jobs
  • The parallel environment needed for a job can be
    specified by the -pe <env> nn parameter of the
    qsub command, where <env> is:
  • openmp : These are shared memory OpenMP jobs and
    therefore must run on a single node using its
    multiple processors (see the sketch after this
    list).
  • ompigige : OpenMPI library, Gigabit Ethernet.
    These are MPI jobs running on multiple hosts
    using the ethernet fabric (1 Gbit/sec).
  • openmpi-ib : OpenMPI library, InfiniBand. These
    are MPI jobs running on multiple hosts using the
    InfiniBand connection (32 Gbit/sec).
  • mvapich2-ib : MVAPICH library, InfiniBand. As
    above but using the MVAPICH MPI library.
  • Compilers that support MPI:
  • PGI
  • Intel
  • GNU
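  • A minimal OpenMP job sketch (the executable name
    my_omp_prog is hypothetical):
    #!/bin/bash
    #$ -pe openmp 4            # 4 processors on a single node
    #$ -l h_rt=1:00:00         # illustrative time request
    export OMP_NUM_THREADS=4   # match threads to requested slots
    ./my_omp_prog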

28
Setting up the parallel environment in the job
script
  • Having selected the parallel environment to use
    via the qsub -pe parameter, the job script can
    define a corresponding environment/compiler
    combination to be used for MPI tasks.
  • MODULE commands help set up the correct compiler
    and MPI transport environment combination needed.
  • The list of currently available MPI modules is:
    mpi/pgi/openmpi mpi/pgi/mvapich2
    mpi/intel/openmpi mpi/intel/mvapich2
    mpi/gcc/openmpi mpi/gcc/mvapich2
  • For GPU programming with CUDA: libs/cuda
  • Example: module load mpi/pgi/openmpi
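  • Putting it together, a minimal MPI job sketch
    (my_mpi_prog is hypothetical; NSLOTS is set by SGE
    to the number of slots requested via -pe):
    #!/bin/bash
    #$ -pe openmpi-ib 8             # 8 processors over InfiniBand
    #$ -l h_rt=1:00:00              # illustrative time request
    module load mpi/gcc/openmpi     # GNU compiler + OpenMPI transport
    mpirun -np $NSLOTS ./my_mpi_prog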

29
Summary of module load parameters for parallel
MPI environments

COMPILER   -pe openmpi-ib / -pe ompigige   -pe mvapich2-ib
PGI        mpi/pgi/openmpi                 mpi/pgi/mvapich2
Intel      mpi/intel/openmpi               mpi/intel/mvapich2
GNU        mpi/gcc/openmpi                 mpi/gcc/mvapich2
30
Running GPU parallel jobs
  • GPU parallel processing is supported on 8 Nvidia
    Tesla Fermi M2070 GPU units attached to iceberg.
  • In order to use the GPU hardware you will need
    to join the GPU project by emailing
    iceberg-admins@sheffield.ac.uk
  • You can then submit jobs that use the GPU
    facilities by using the following three
    parameters to the qsub command:
  • -P gpu
  • -l arch=intel
  • -l gpu=nn   where 1 <= nn <= 8 is the number of
    GPU modules to be used by the job.
  • -P stands for the project that you belong to. See
    next slide.
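  • For example (script name hypothetical), a job
    using two GPU units might be submitted as:
    qsub -P gpu -l arch=intel -l gpu=2 mygpujob.sh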

31
Special projects and resource management
  • The bulk of the iceberg cluster is shared equally
    amongst the users, i.e. each user has the same
    privileges for running jobs as another user.
  • However, there are extra nodes connected to the
    iceberg cluster that are owned by individual
    research groups.
  • It is possible for new research project groups to
    purchase their own compute nodes/clusters and
    make them part of the iceberg cluster.
  • We define such groups as special project groups
    in SGE parlance and give priority to their jobs
    on the machines that they have purchased.
  • This scheme allows such research groups to use
    iceberg as ordinary users, with an equal share to
    other users, or to use their privileges (via the
    -P parameter) to run jobs on their own part of
    the cluster without having to compete with other
    users.
  • This way everybody benefits, as the machines
    currently unused by a project group can be
    utilised to run normal users' short jobs.

32
Job queues on iceberg
Queue name   Time limit (hours)   System specification
short.q      8
long.q       168                  Long running serial jobs
parallel.q   168                  Jobs requiring multiple nodes
openmp.q     168                  Shared memory jobs using OpenMP
gpu.q        168                  Jobs using the GPU units
33
Beyond Iceberg
  • Iceberg is OK for many compute problems.
  • N8: a tier 2 facility for more demanding compute
    problems.
  • Hector/Archer: larger facilities for grand
    challenge problems (peer review process to
    access).

34
N8 HPC Polaris Compute Nodes
  • 316 nodes (5,056 cores) with 4 GByte of RAM per
    core (each node has 8 DDR3 DIMMS each of 8
    GByte). These are known as "thin nodes".
  • 16 nodes (256 cores) with 16 GByte of RAM per
    core (each node has 16 DDR3 DIMMS each of 16
    GByte). These are known as "fat nodes".
  • Each node comprises two Intel 2.6 GHz Sandy
    Bridge E5-2670 processors.
  • Each processor has 8 cores, giving a total core
    count of 5,312 cores.
  • Each processor has a 115 Watts TDP and the Sandy
    Bridge architecture supports "turbo" mode
  • Each node has a single 500 GB SATA HDD
  • There are 4 nodes in each Steelhead chassis. 79
    chassis have 4 GB/core and 4 more have 16 GB/core

35
N8 HPC Polaris Storage and Connectivity
  • Connectivity
  • The compute nodes are fully connected by Mellanox
    QDR InfiniBand high performance interconnects
  • File System Storage
  • 174 TBytes Lustre v2 parallel file system with 2
    OSSes (object storage servers). This is mounted
    as /nobackup and has no quota control. It is not
    backed up and files are automatically expired
    after 90 days.
  • 109 TBytes NFS filesystem where user HOME is
    mounted. This is backed up.

36
Getting Access to N8
  • Note: N8 is for users whose research problems
    require greater resource than that available
    through Iceberg.
  • Registration is through projects:
  • Ask your supervisor or project leader to register
    their project with the N8.
  • Users obtain a project code from their supervisor
    or project leader.
  • Complete the online form, providing an outline of
    the work and explaining why N8 resources are
    required.

37
Steps to Apply for an N8 User Account
  • Request an N8 project code from your
    supervisor/project leader
  • Complete the user account online application form
    at
  • http://n8hpc.org.uk/research/gettingstarted/
  • Note:
  • If your project leader does not have a project
    code
  • Project leader should apply for a project account
    at
  • http://n8hpc.org.uk/research/gettingstarted/
  • Link to N8 information is at
  • http://www.shef.ac.uk/wrgrid/n8

38
Getting help
  • Web site
  • http://www.shef.ac.uk/wrgrid/
  • Documentation
  • http://www.shef.ac.uk/wrgrid/using
  • Training (also uses the learning management
    system)
  • http://www.shef.ac.uk/wrgrid/training
  • Uspace
  • http://www.shef.ac.uk/wrgrid/uspace
  • Contacts
  • http://www.shef.ac.uk/wrgrid/contacts.html