Title: LSF for Users
1 LSF for Users
- Mike Page
- mpage_at_ucar.edu
- SCD Consulting Services Group
- SCD/HSS/CSG
2 What is LSF?
LSF - Load Sharing Facility
- Batch management subsystem for multi-host, multi-vendor complexes
- Same role as LoadLeveler or NQE, with the capability to manage computing resources across multiple platforms
- LSF runs on the Lightning cluster
Documentation: /usr/local/docs/LSF/6.0/.pdf
Hardware description: http://www.scd.ucar.edu/docs/lightning/overview.html
At a lightning command line, enter: man lsfintro
Further reading: http://accl.grc.nasa.gov/lsf/about.html
3 To be able to access LSF
This has been added to your login processing:
  . /usr/local/lsf/conf/profile.lsf (sh users)
or
  source /usr/local/lsf/conf/cshrc.lsf (csh users)
These commands are executed before you receive a command prompt. There is no need for you to add anything to your login files in order to use LSF.
These commands define the LSF environment: LSF_SERVERDIR, LSF_BINDIR, LSF_LIBDIR, XLSF_UIDDIR, LSF_ENVDIR, PATH, MANPATH
Check: env | grep -i lsf
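The check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell; `check_lsf_env` is a hypothetical helper name, not part of LSF, and it tests only the LSF-specific variables listed on this slide (PATH and MANPATH are always set, so they are skipped).

```shell
#!/bin/sh
# check_lsf_env: hypothetical helper (not an LSF command).
# Reports which of the LSF variables named on the slide are unset or empty.
check_lsf_env() {
    missing=""
    for name in LSF_SERVERDIR LSF_BINDIR LSF_LIBDIR XLSF_UIDDIR LSF_ENVDIR; do
        # Look the variable up by name; an unset variable expands to "".
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "LSF environment looks complete"
}
```

Calling `check_lsf_env` after login should print the "complete" message if the profile was sourced; a non-zero return status means at least one variable is absent.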
4 Essential Commands for Users
- bhosts
- bqueues
- bsub
- bjobs
- bhist
- bpeek
- bmod
- bbot/btop
- bswitch
- bstop/bresume
- bkill
5 Essential Commands: Purpose
- bhosts - information about available hosts (lshosts)
- bqueues - information about available queues
- bsub - submit jobs to the batch subsystem
- bjobs - list jobs in the batch subsystem
- bhist - displays historical information about a user's jobs
- bpeek - displays stdout and stderr of a user's unfinished job
- bmod - modifies job submission options for a user's job
6 Essential Commands: Purpose (contd)
- bbot/btop - moves a pending job relative to the user's last/first job in a queue
- bswitch - switches a user's unfinished jobs from one queue to another
- bstop/bresume - suspends/resumes a user's unfinished jobs
- bkill - kills, suspends or resumes a user's jobs
7 Essential Commands: bhosts
- bhosts [-w | -l] [-R "res_req"] [host_name | host_group]
- Displays information about hosts/platforms
- lshosts [-w | -l] [-R "res_req"] [host_name | cluster_name]
- lshosts -s [shared_resource_name ...]
- Displays hosts and their static resource information
Example, run on ln0126en: bhosts
  HOST_NAME  STATUS  JL/U  MAX  NJOBS  RUN  SSUSP  USUSP  RSV
  ln0126en   ok      -     2    0      0    0      0      0
  ln0127en   ok      -     2    0      0    0      0      0
  ln0128en   ok      -     2    0      0    0      0      0
  ...
  ln0440en   ok      -     2    0      0    0      0      0
  ln0441en   ok      -     2    0      0    0      0      0
  ln0442en   ok      -     2    0      0    0      0      0
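On a cluster with hundreds of nodes the full bhosts listing is long, so a summary can be handy. The sketch below assumes the column layout shown in the sample output (host name in column 1, STATUS in column 2, one header line); `count_ok_hosts` is a hypothetical helper, not an LSF command.

```shell
#!/bin/sh
# count_ok_hosts: hypothetical helper (not an LSF command).
# Reads bhosts-style output on stdin and prints the number of hosts
# whose STATUS column is exactly "ok". The header line is skipped.
count_ok_hosts() {
    awk 'NR > 1 && $2 == "ok" { n++ } END { print n + 0 }'
}
```

Typical use would be `bhosts | count_ok_hosts`.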
8 Essential Commands: bqueues
- bqueues [-w | -l | -r] [-m host_name | -m all] [-u user_name | -u all] [queue_name]
- Displays information about queues.
- By default, returns the following information about all queues: queue name, queue priority, queue status, job slot statistics, and job state statistics.
Example, run on ln0126en: bqueues
  QUEUE_NAME  PRIO  STATUS      MAX  JL/U  JL/P  JL/H  NJOBS  PEND  RUN  SUSP
  special     500   OpenActive  -    -     -     -     0      0     0    0
  premium     300   OpenActive  -    -     -     -     0      0     0    0
  regular     200   OpenActive  -    -     -     -     0      0     0    0
  economy     160   OpenActive  -    -     -     -     0      0     0    0
  hold        104   OpenActive  -    -     -     -     0      0     0    0
  standby     100   OpenActive  -    -     -     -     0      0     0    0
  share       100   OpenActive  -    -     -     -     0      0     0    0
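When choosing a queue for bsub -q, it can help to look up a single queue's priority. This is a small sketch that assumes the default bqueues layout shown on this slide (queue name in column 1, PRIO in column 2); `queue_priority` is a hypothetical helper name.

```shell
#!/bin/sh
# queue_priority: hypothetical helper (not an LSF command).
# Reads bqueues-style output on stdin and prints the PRIO value of the
# queue named in $1. Prints nothing if the queue is not listed.
queue_priority() {
    awk -v q="$1" 'NR > 1 && $1 == q { print $2 }'
}
```

For example, `bqueues | queue_priority regular` would print 200 for the listing shown on this slide.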
9 Essential Commands: bsub
- bsub [options] command [cmd_args]
- Submits a job for batch execution
10 Essential Commands: bsub (contd)
- bsub [options] command [cmd_args]
11 Essential Commands: bsub (contd)
- bsub [options] command [cmd_args]
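12 The Importance of Being <
LSF usage is different from LL/NQS:
  bsub a.out
  bsub -n 2 a.out
  bsub myscript
  bsub -q queuename a.out
  bsub -i infile -o outfile -e errfile a.out
  bsub < myscript
Only bsub < myscript passes the script to bsub on standard input, so only that form reads the #BSUB options embedded in the script.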
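Because the redirected form is the one that reads options embedded in the script, it can be useful to list those options before submitting. The following is a minimal sketch, assuming directives start in column 1 with the literal text `#BSUB`; `list_bsub_directives` is a hypothetical helper and does not call LSF at all.

```shell
#!/bin/sh
# list_bsub_directives: hypothetical helper (not an LSF command).
# Prints the option part of every line in file $1 that begins with "#BSUB".
# Any trailing "# comment" text on the directive line is left in place.
list_bsub_directives() {
    sed -n 's/^#BSUB[[:space:]]*//p' "$1"
}
```

Running `list_bsub_directives myscript` shows the options that would accompany `bsub < myscript`; with plain `bsub myscript` those lines are ordinary shell comments.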
13 Sample LSF script: Serial Job
#!/bin/ksh
#
# LSF batch script to run a serial code
#
#BSUB -P 93300070             # Project 93300070
#BSUB -n 1                    # number of tasks
#BSUB -J seriallsf.test       # job name
#BSUB -o seriallsf.out        # output filename
#BSUB -e seriallsf.err        # error filename
#BSUB -q regular              # queue

# Fortran example
pgf90 -o samp_f -Mextend samp.f
./samp_f
# C example
pgcc -o samp_c samp.c
./samp_c
# C++ example
pgCC --no_auto_instantiation -o samp_cc samp.cc
./samp_cc

Submit with: bsub < serial.lsf
14 Sample LSF script: MPI Job
#!/bin/ksh
#
# LSF batch script to run the test MPI code
#
#BSUB -P 93300070             # Project 93300070
#BSUB -a mpich_gm             # select the mpich-gm elim
#BSUB -x                      # exclusive use of node (not_shared)
#BSUB -n 2                    # number of total tasks
#BSUB -R "span[ptile=1]"      # run 1 task per node
#BSUB -J mpilsf.test          # job name
#BSUB -o mpilsf.out           # output filename
#BSUB -e mpilsf.err           # error filename
#BSUB -q regular              # queue

# Fortran example
mpif90 -o mpi_samp_f mpisamp.f
mpirun.lsf ./mpi_samp_f
# C example
mpicc -o mpi_samp_c mpisamp.c
mpirun.lsf ./mpi_samp_c
# C++ example
mpicxx -o mpi_samp_cc mpisamp.cc
mpirun.lsf ./mpi_samp_cc

Submit with: bsub < mpi.lsf
15 Sample LSF script: OpenMP Job
#!/bin/ksh
#
# LSF script to run the test OMP codes
#
#BSUB -P 93300070             # Proposal group 2 - Project 93300070
#BSUB -a mpich_gm             # select the mpich-gm elim
#BSUB -x                      # exclusive use of node
#BSUB -n 2                    # number of tasks
#BSUB -R "span[hosts=1]"      # jobs run on one host
#BSUB -J omplsf.test          # job name
#BSUB -o omplsf.out           # output filename
#BSUB -e omplsf.err           # error filename
#BSUB -q regular              # queue

# Fortran example
pgf90 -o samp_f -Mextend -mp samp.f
export OMP_NUM_THREADS=1
./samp_f
export OMP_NUM_THREADS=2
./samp_f
# C example
pgcc -mp -o samp_c samp.c
export OMP_NUM_THREADS=1
./samp_c
export OMP_NUM_THREADS=2
./samp_c
# C++ example
pgCC --no_auto_instantiation -mp -o samp_cc samp.cc
export OMP_NUM_THREADS=1
./samp_cc
export OMP_NUM_THREADS=2
./samp_cc

Submit with: bsub < omp.lsf
16 Sample LSF script: MPMD Job
#!/bin/ksh
#
# LSF batch script to run the test MPMD codes
#
#BSUB -P 93300070             # Project 93300070
#BSUB -a mpich_gm
#BSUB -n 2
#BSUB -x
#BSUB -R "span[ptile=1]"
#BSUB -o mpmdlsf.out          # output filename
#BSUB -e mpmdlsf.err          # error filename
#BSUB -J mpmdlsf.test         # job name
#BSUB -q regular              # queue

# Build pgfile for mpmd run
rm -f pgfile
touch pgfile
EXE=../bin/itmpmd
j=0
for h in `echo $LSB_HOSTS`
do
  echo $h" "$j" "$EXE$j >> pgfile
  j=`expr $j + 1`
done
cat pgfile

# Fortran example
mpif90 -Mextend -o $EXE'0' ../src/mpmd/itmpmd.f
mpif90 -Mextend -o $EXE'1' ../src/mpmd/itmpmd.f
mpirun -pg pgfile /bin/pwd
# C example
mpicc -o $EXE'0' ../src/mpmd/itmpmd.c
mpicc -o $EXE'1' ../src/mpmd/itmpmd.c
mpirun -pg pgfile /bin/pwd
# C++ example
mpicxx --no_auto_instantiation -o $EXE'0' ../src/mpmd/itmpmd.cc
mpicxx --no_auto_instantiation -o $EXE'1' ../src/mpmd/itmpmd.cc
mpirun -pg pgfile /bin/pwd
rm $EXE'0' $EXE'1' pgfile

Submit with: bsub < mpmd.lsf
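The pgfile-building loop on this slide depends on LSB_HOSTS, which LSF sets only inside a running job. To try the logic outside LSF, the loop can be wrapped in a function that takes the host list as an argument. This is a sketch only: `build_pgfile` is a hypothetical name, and the "host index executable" line format is copied from the slide's loop rather than from any mpirun documentation.

```shell
#!/bin/sh
# build_pgfile: hypothetical helper mirroring the slide's pgfile loop.
#   $1 = space-separated host list (inside a job: "$LSB_HOSTS")
#   $2 = executable prefix; the host's index is appended to it
# Writes one "host index executable" line per host to stdout.
build_pgfile() {
    hosts=$1
    exe=$2
    j=0
    for h in $hosts; do
        echo "$h $j $exe$j"
        j=$((j + 1))
    done
}
```

Inside a batch job this would be used as `build_pgfile "$LSB_HOSTS" ../bin/itmpmd > pgfile`; at an interactive prompt any host list can be passed to inspect the result.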
17 Sample LSF script: Hybrid Job
#!/bin/ksh
#
# LSF batch script to run the test mixed MPI/OMP codes
#
#BSUB -a mpich_gm             # select mpich_gm elim
#BSUB -x                      # exclusive use of node
#BSUB -n 2                    # sum of number of tasks
#BSUB -R "span[ptile=1]"      # number of processes per node
#BSUB -o mixlsf.out           # output filename
#BSUB -e mixlsf.err           # error filename
#BSUB -J mixlsf.test          # job name
#BSUB -q regular              # queue

# Build pgfile for mix run
rm -f pgfile
touch pgfile
EXE=$PWD/mix
echo $LSB_HOSTS
j=0
for h in `echo $LSB_HOSTS`
do
  echo $h" "$j" "$EXE >> pgfile
  j=`expr $j + 1`
done

# Fortran example
mpif90 -Mextend -mp -lmp -o mix mix.f
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE
# C example
mpicc -mp -o mix mix.c
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE
# C++ example
mpicxx --no_auto_instantiation -mp -o mix mix.cc
export OMP_NUM_THREADS=1
mpirun-env.pl -pg pgfile $EXE
export OMP_NUM_THREADS=2
mpirun-env.pl -pg pgfile $EXE
rm pgfile

Submit with: bsub < mix.lsf
18 Essential Commands: bjobs
- bjobs - Displays information about LSF jobs
- bjobs -u user_name
- bjobs -u all
- bjobs -l
- bjobs -r
- bjobs -s
- bjobs -q queue_name
19 Essential Commands: bhist
- bhist - displays historical information about jobs
- bhist -J job_name
- bhist -C start_time,end_time
- bhist -D start_time,end_time
- bhist -S start_time,end_time
- bhist -T start_time,end_time
20 Essential Commands: bpeek
- bpeek - displays stdout and stderr of a user's selected, unfinished job
- bpeek -f uses tail -f to display output instead of cat
- bpeek [-q queue_name | -m host_name | -J job_name | job_ID | "job_ID[index_list]"]
21 Essential Commands: bmod
- bmod - modifies job submission options of a job
- bmod [bsub options] [job_ID | "job_ID[index]"]
- bmod -g job_group_name | -gn [job_ID]
- bmod [-sla service_class_name | -slan] [job_ID]
- bmod [-h | -V]
22 Essential Commands: bbot, btop
- bbot - moves a pending job relative to the last job in the queue
- bbot job_ID | "job_ID[index_list]" [position]
- bbot -h | -V
- btop - moves a pending job relative to the first job in the queue
- btop job_ID | "job_ID[index_list]" [position]
- btop -h | -V
23 Essential Commands: bswitch
- bswitch - switches unfinished jobs from one queue to another
- bswitch [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u user_name | -u user_group | -u all] destination_queue [0]
- bswitch destination_queue [job_ID | "job_ID[index_list]"] ...
- bswitch [-h | -V]
24 Essential Commands: bstop/bresume
- bstop - suspends unfinished jobs
- bstop [-a] [-d] [-g job_group_name | -sla service_class_name] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-u user_name | -u user_group | -u all] [0] [job_ID | "job_ID[index]"] ...
- bstop [-h | -V]
- bresume - resumes one or more suspended jobs
- bresume [-g job_group_name] [-J job_name] [-m host_name] [-q queue_name] [-u user_name | -u user_group | -u all] [0]
- bresume [job_ID | "job_ID[index_list]"] ...
- bresume [-h | -V]
25 Essential Commands: bkill
- bkill - sends signals to kill, suspend, or resume unfinished jobs
- bkill [-l] [-g job_group_name | -sla service_class_name] [-J job_name] [-m host_name | -m host_group] [-q queue_name] [-r | -s (signal_value | signal_name)] [-u user_name | -u user_group | -u all] [job_ID ... | 0 | "job_ID[index]" ...]
- bkill [-h | -V]
26 Questions? Comments?