Title: Using the BYU SP-2
1. Using the BYU SP-2
2. Our System
- Interactive nodes (2)
  - used for login, compilation, and testing
  - marylou10.et.byu.edu
- I/O and scheduling nodes (7)
  - used for the batch scheduling system and the parallel file system
- Compute nodes (26)
  - 22 with 4 processors each
  - 4 with 16 processors each
3. Compilers
- xlc: C
- xlC: C++
- xlf: Fortran
- Parallel compilers
  - mpcc
  - mpCC
  - mpxlf
- Optimization
  - -O5 -qarch=pwr3 -qtune=pwr3 -qhot
- Libraries
  - -lblas, -lfftw, -llapack, -lessl
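As a rough sketch of how the items on this slide combine, an MPI C program might be built as follows. The file names thing.c and thingx are hypothetical, and linking only ESSL is an arbitrary choice from the library list. The script just prints the command on machines where mpcc is not installed.

```shell
#!/bin/sh
# Hypothetical names: thing.c (source) and thingx (executable).
SRC=thing.c
EXE=thingx

# Optimization flags and one of the libraries listed on the slide.
OPT="-O5 -qarch=pwr3 -qtune=pwr3 -qhot"
LIBS="-lessl"

CMD="mpcc $OPT -o $EXE $SRC $LIBS"

# Compile only where the IBM parallel compiler is available.
if command -v mpcc >/dev/null 2>&1; then
    $CMD
else
    echo "mpcc not found; the command would be: $CMD"
fi
```

For C++ or Fortran sources the same pattern should apply with mpCC or mpxlf in place of mpcc.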
4. Other Stuff
- Documentation
  - http://www-1.ibm.com/servers/eserver/pseries/library/sp_books/
  - http://marylou.byu.edu
- Launching parallel jobs
  - done through the batch scheduler
  - Your job is a shell script that you hand to the batch scheduler for execution
  - Can look at xloadl for help creating the script
5. Batch job scheduler
- Batch schedulers
  - PBS (Portable Batch System): open source
  - LoadLeveler: descendant of Condor
- The process
  - user submits jobs to a queue
  - machines register with the scheduler, offering to run jobs of a certain class
  - scheduler allocates jobs to machines and tracks them
  - once started, jobs are scheduled by the kernel
6. Scheduling parallel jobs
- jobs can ask for
  - number of nodes (1 CPU)
  - number of tasks per node (multiple CPUs)
  - non-shared nodes (multiple CPUs)
- mixing jobs can be bad
  - two I/O-intensive processes on a 2-CPU node can ruin performance for both
  - same for two RAM-intensive processes
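A minimal sketch of how the requests above might be written in a LoadLeveler job command file, assuming the keywords node, tasks_per_node, and node_usage. The class, the counts, and the program name ./my_mpi_program are hypothetical.

```shell
#!/bin/ksh
# Sketch of a job command file: LoadLeveler reads the "# @" lines,
# while the shell treats them as ordinary comments.
# @ job_type       = parallel
# @ class          = short
# @ node           = 2
# @ tasks_per_node = 4
# @ node_usage     = not_shared
# @ output         = $(Executable).$(Cluster).$(Process).out
# @ error          = $(Executable).$(Cluster).$(Process).err
# @ queue

# Hypothetical program; 2 nodes x 4 tasks per node = 8 tasks in total.
PROGRAM=./my_mpi_program
if [ -x "$PROGRAM" ]; then
    "$PROGRAM"
else
    echo "$PROGRAM not found; nothing to run"
fi
```

Asking for not_shared nodes keeps other jobs off the node, which avoids the I/O and RAM contention described above, at the cost of possibly waiting longer for free nodes.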
7. Scheduling parallel jobs (2)
- All allocated nodes, processors, and resources are held for the duration of the entire job
- No dynamic adjustments, except by creating jobs with multiple steps
  - each step can have different requirements
  - each step can express dependencies on other steps
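A hedged sketch of a two-step job, assuming the LoadLeveler keywords step_name and dependency and the LOADL_STEP_NAME environment variable. The step names, node counts, and the programs ./prepare_input and ./solver are hypothetical.

```shell
#!/bin/ksh
# @ job_type = parallel
# @ class    = short
#
# Step 1: small setup step on a single node.
# @ step_name      = prepare
# @ node           = 1
# @ tasks_per_node = 1
# @ output         = prepare.out
# @ error          = prepare.err
# @ queue
#
# Step 2: larger step, started only if "prepare" exited with status 0.
# @ step_name      = solve
# @ dependency     = (prepare == 0)
# @ node           = 4
# @ tasks_per_node = 4
# @ output         = solve.out
# @ error          = solve.err
# @ queue

# Both steps run this same script, so branch on the step name.
STEP=${LOADL_STEP_NAME:-none}
case "$STEP" in
    prepare) ./prepare_input ;;   # hypothetical program
    solve)   ./solver ;;          # hypothetical program
    *)       echo "not running under LoadLeveler (step: $STEP)" ;;
esac
```

Each step holds only the nodes it asks for, so the setup step does not tie up the four nodes needed later.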
8. Scheduling parallel jobs (3)
- Management must
  - allow some jobs to use the entire machine
  - allow short jobs to get started quickly; they should not have to wait weeks in the queue
- Some very long jobs may be needed, but are to be avoided
9. Backfill scheduling
[Figure: jobs A, B, C, and D scheduled on a 10-node system over time]
10. Backfill scheduling
- Requires a real (wall-clock) time limit to be set
- A more accurate (shorter) estimate gives a job a better chance of running earlier
- Short jobs can move through the system quicker
- Uses the system better by avoiding wasted cycles during waits
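One way to act on this, sketched under assumptions: take a measured run time, add a safety margin, and put the result in the job command file with the wall_clock_limit keyword. The 90-minute estimate and the 20% margin are hypothetical numbers.

```shell
#!/bin/sh
# Hypothetical measured run time: 90 minutes, in seconds.
EST_SECONDS=5400

# Add a 20% safety margin so the job is not killed just before finishing.
LIMIT=$((EST_SECONDS + EST_SECONDS / 5))

# Convert to the hh:mm:ss form used in the job command file.
H=$((LIMIT / 3600))
M=$((LIMIT % 3600 / 60))
S=$((LIMIT % 60))
WALL=$(printf '%02d:%02d:%02d' "$H" "$M" "$S")

# Line to paste into the job script; prints "# @ wall_clock_limit = 01:48:00".
echo "# @ wall_clock_limit = $WALL"
```

A limit close to the real run time lets the scheduler fit the job into gaps that a padded multi-day limit would not fit.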
11. Using LoadLeveler
- Graphical user interface: xloadl
- Make a shell script with LoadLeveler keywords as shell comments

#@output = thing.log
#@error = thing.err
#@class = short
#@executable = thingx
#@node = 6,10
#@tasks_per_node = 4
#@requirements = (Adapter == "hps_us")
#@queue
12. Sample LoadLeveler Script
#!/bin/ksh
# @ job_type = parallel
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y/directory
# @ notify_user = student_rt_y@byu.edu
# @ class = short
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3")
# @ blocking = unlimited
# @ total_tasks = 4
# @ network.MPI = switch,shared,US
# @ queue
./your_exe_and_any_args
13. Sample serial job
#!/bin/ksh
# @ job_type = serial
# @ input = /dev/null
# @ output = $(Executable).$(Cluster).$(Process).out
# @ error = $(Executable).$(Cluster).$(Process).err
# @ initialdir = /gstudent/student_rt_y
# @ notify_user = student_rt_y@byu.edu
# @ class = medium
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ queue
paupnew Hlav3ashort.paup
14. LoadLeveler commands
- llq: shows all jobs
  - can also use showq
- llq -s JobID: shows why a job is not running
- llclass: shows classes
- llstatus: shows machines
- llcancel JobID: cancels a job
- llhold JobID: puts a job in the hold state
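A sketch of a typical session with these commands. It assumes llsubmit as the command that hands a script to LoadLeveler and the -u option of llq for filtering by user, neither of which appears on the slide. The file name thing.cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical job command file name.
JOBFILE=thing.cmd
ME=$(id -un)

if command -v llsubmit >/dev/null 2>&1; then
    llsubmit "$JOBFILE"   # hand the script to LoadLeveler
    llq -u "$ME"          # list only your own jobs
    llclass               # see which classes exist
    llstatus              # see which machines are available
else
    echo "LoadLeveler not installed; would submit $JOBFILE with llsubmit"
fi

# For a job that stays idle, use its ID from the llq listing:
#   llq -s JobID       explain why it is not running
#   llcancel JobID     remove it from the queue
```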
15. Sample llq output
bash-2.05a$ llq
Id                       Owner      Submitted   ST PRI Class        Running On
------------------------ ---------- ----------- -- --- ------------ -----------
m1015i.1127.0            mdt36      8/7 12:41   R  50  long         m1009i
m1015i.1128.0            mdt36      8/7 12:41   R  50  long         m1019i
m1015i.1497.0            jl447      8/12 16:25  R  50  long         m1012i
m1015i.1544.0            to5        8/13 08:44  R  50  long         m1045i
m1015i.1545.0            to5        8/13 08:44  R  50  long         m1045i
m1015i.1602.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1598.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1601.0            taskman    8/14 08:13  R  50  short        m1017i
m1015i.1599.0            taskman    8/14 08:13  R  50  short        m1014i
m1015i.1600.0            taskman    8/14 08:13  R  50  short        m1011i
m1015i.1626.0            mendez     8/14 13:07  I  50  long
m1015i.1625.0            cr66       8/14 12:40  I  50  medium
m1015i.1513.0            jl447      8/13 07:08  I  50  long
m1015i.1572.0            dvd        8/13 10:45  I  50  medium
m1015i.1576.0            dvd        8/13 11:22  I  50  medium
m1015i.1577.0            dvd        8/13 11:25  I  50  medium
m1015i.1566.0            mdt36      8/13 08:51  I  50  long
m1015i.1564.0            mdt36      8/13 08:50  I  50  long
m1015i.1612.0            taskman    8/14 08:27  I  50  short
m1015i.1624.0            taskman    8/14 08:57  I  50  short
m1015i.1623.0            taskman    8/14 08:57  I  50  short

58 job step(s) in queue, 23 waiting, 0 pending, 35 running, 0 held, 0 preempted
16. Sample showq output
bash-2.05a$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME   STATE    PROC    REMAINING  STARTTIME

m1015i.1581.0      taskman    Running     1     18:39:00  Wed Aug 14 08:06:24
m1015i.1582.0      taskman    Running     1     18:39:00  Wed Aug 14 08:06:24
m1015i.1580.0      taskman    Running     1     18:39:00  Wed Aug 14 08:06:24
m1015i.1615.0      taskman    Running     1     21:33:42  Wed Aug 14 11:01:06
m1015i.1613.0      taskman    Running     1     23:43:05  Wed Aug 14 13:10:29
m1015i.1575.0      dvd        Running     4   2:15:10:38  Wed Aug 14 04:38:02
m1015i.1127.0      mdt36      Running     8   2:23:14:21  Wed Aug  7 12:41:45
m1015i.1567.0      jar65      Running     4   9:04:07:44  Tue Aug 13 17:35:08
m1015i.1569.0      jar65      Running     4   9:08:28:16  Tue Aug 13 21:55:40
m1015i.1547.0      to5        Running     8   9:21:11:49  Wed Aug 14 10:39:13
m1015i.1546.0      to5        Running     8   9:21:11:49  Wed Aug 14 10:39:13

35 Active Jobs     150 of 184 Processors Active (81.52%)
                    26 of  34 Nodes Active      (76.47%)

IDLE JOBS----------------------
JOBNAME            USERNAME   STATE    PROC      WCLIMIT  QUEUETIME

m1015i.1513.0      jl447      Idle        2   5:00:00:00  Tue Aug 13 07:08:09
m1015i.1572.0      dvd        Idle        8   3:00:00:00  Tue Aug 13 10:45:18

23 Idle Jobs

NON-QUEUED JOBS----------------
JOBNAME            USERNAME   STATE    PROC      WCLIMIT  QUEUETIME

Total Jobs: 58   Active Jobs: 35   Idle Jobs: 23   Non-Queued Jobs: 0
17. LoadLeveler environment
- Normally the same as your login environment
- Limits are set; use llclass -l to see values
  - ulimit -S -a
  - ulimit -H -a
- Big heap requirements
  - -bmaxdata:0x80000000: up to 2 GB data (heap)
  - -q64 -bmaxdata:0x...: up to 8 EB
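A small sketch tying these points together: it checks that 0x80000000 bytes is the 2 GB figure quoted above, prints the soft and hard limits of the current shell, and builds a link line with the larger heap. The file names thing.c and thingx are hypothetical, and the compile is skipped where xlc is not installed.

```shell
#!/bin/sh
# Heap size requested from the linker, in bytes (hex).
MAXDATA=0x80000000

# 0x80000000 bytes = 2048 MB = 2 GB.
HEAP_MB=$(($MAXDATA / 1048576))
echo "maxdata $MAXDATA is $HEAP_MB MB"

# Soft and hard resource limits of the current shell.
ulimit -S -a
ulimit -H -a

# Hypothetical 32-bit build with the larger heap.
CMD="xlc -o thingx thing.c -bmaxdata:$MAXDATA"
if command -v xlc >/dev/null 2>&1; then
    $CMD
else
    echo "xlc not found; the command would be: $CMD"
fi
```

Running the two ulimit commands inside a batch job script, rather than at the login prompt, shows the limits the job's class actually imposes.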