Title: Let's put our hands on the (Int.EU.)Grid

1. Let's put our hands on the (Int.EU.)Grid
Parallel Processing and Interactivity
EGEE Int.EU.Grid Tutorial, Lisbon, 14th November 2007
- Gonçalo Borges, Mário David, Jorge Gomes
- LIP
2. Preliminaries
- Log in to the UI with your account and start your proxy (a quick proxy check is sketched after this list)
- myself@MyTopPC > ssh -l user01 ui03
- user01@ui03 > voms-proxy-init --voms itut
- Enter the mpi_and_interactivity working directory
- user01@ui03 > cd mpi_and_interactivity
- Inside this working directory you'll find 6 exercises
- exe1: parallel hostname job example
- exe2: parallel Open MPI job example
- exe3: parallel Open MPI job example with hooks
- exe4: parallel PACX-MPI job example
- exe5: interactivity job example
- exe6: interactivity and Open MPI job example
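If you want to confirm the proxy before moving on, the standard voms-proxy-info tool (not shown on the original slides) prints its lifetime and VOMS attributes:

    # Sketch: verify the freshly created proxy and its itut VOMS attributes.
    user01@ui03 > voms-proxy-info --all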
Very good!!! The hard part is finished. Now comes the fun part.
3. Exe1: A parallel hostname job
- Build a parallel application that runs the hostname command simultaneously on several nodes
- Start by building your JDL file

user01@ui03 exe1 > cat hostname_openmpi.jdl
Type = "job";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 4;
VirtualOrganisation = "itut";
RetryCount = 0;
Executable = "/bin/hostname";
StdOutput = "stdout.out";
StdError = "stderr.err";
OutputSandbox = {"stdout.out","stderr.err"};
Environment = {"OMPI_MCA_mpi_yield_when_idle=1"};
Requirements = (other.GlueCEInfoHostName == "i2g-ce01.lip.pt");
4. Exe1: Start the parallel hostname job
- Comment out the Requirements instruction in your JDL
- Check where your job can run
- user01@ui03 exe1 > i2g-job-list-match hostname_openmpi.jdl
- Re-include the Requirements instruction and...
- Submit your JDL
- user01@ui03 exe1 > i2g-job-submit hostname_openmpi.jdl
- Query the job status
- user01@ui03 exe1 > i2g-job-status https://...
- Get the job logging info
- user01@ui03 exe1 > i2g-job-get-logging-info -v 0|1|2 https://...
- Retrieve the job output
- user01@ui03 exe1 > i2g-job-output https://...
- The full cycle is recapped in the sketch below
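For reference, here is the whole exe1 lifecycle as one shell session; the job identifier is a placeholder for whatever i2g-job-submit prints:

    i2g-job-list-match hostname_openmpi.jdl               # which CEs match the requirements?
    i2g-job-submit hostname_openmpi.jdl                   # prints the job ID https://<rb>:9000/<id>
    i2g-job-status https://<rb>:9000/<id>                 # poll the job state
    i2g-job-get-logging-info -v 2 https://<rb>:9000/<id>  # detailed event log (verbosity 0, 1 or 2)
    i2g-job-output https://<rb>:9000/<id>                 # retrieve the output sandbox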
5. Exe1: Output
ui03 /home/tutorial/userXX/mpi_and_interactivity/exe1 > cat /tmp/jobOutput/userXX_TO55kCIfiBwYFPajEYYjHg/stdout.out
Scientific Linux CERN SLC release 4.5 (Beryllium)
ssh(11409) Warning: No xauth data; using fake authentication data for X11 forwarding.
/usr/bin/X11/xauth: creating new authority file /home/itut143/.Xauthority
Scientific Linux CERN SLC release 4.5 (Beryllium)
Scientific Linux CERN SLC release 4.5 (Beryllium)
ssh(11427) Warning: No xauth data; using fake authentication data for X11 forwarding.
/usr/bin/X11/xauth: creating new authority file /home/itut143/.Xauthority
Scientific Linux CERN SLC release 4.5 (Beryllium)
Scientific Linux CERN SLC release 4.5 (Beryllium)
ssh(11445) Warning: No xauth data; using fake authentication data for X11 forwarding.
/usr/bin/X11/xauth: creating new authority file /home/itut143/.Xauthority
Scientific Linux CERN SLC release 4.5 (Beryllium)
lflip22.lip.pt
lflip26.lip.pt
lflip30.lip.pt
lflip28.lip.pt
Scientific Linux CERN SLC release 4.5 (Beryllium)
ssh(11478) Warning: No xauth data; using fake authentication data for X11 forwarding.
Scientific Linux CERN SLC release 4.5 (Beryllium)
ssh(11484) Warning: No xauth data; using fake authentication data for X11 forwarding.
Scientific Linux CERN SLC release 4.5 (Beryllium)
ssh(11490) Warning: No xauth data; using fake authentication data for X11 forwarding.
6. Exe2: Open MPI example
- Let's compute π
- Copy the cpi.c program from goncalo/tutorial/exe2/cpi.c
- Compile the program with Open MPI and build a JDL for it (a local sanity check is sketched after the JDL)...
- /opt/i2g/openmpi/bin/mpicc -o cpi-openmpi cpi.c

user01@ui03 exe2 > cat cpi_openmpi.jdl
Type = "job";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 4;
VirtualOrganisation = "itut";
RetryCount = 0;
Executable = "cpi-openmpi";
StdOutput = "cpi-openmpi.out";
StdError = "cpi-openmpi.err";
OutputSandbox = {"cpi-openmpi.out","cpi-openmpi.err"};
InputSandbox = {"cpi-openmpi"};
Environment = {"I2G_MPI_START_DEBUG=1", "I2G_MPI_START_VERBOSE=1", "I2G_MPI_START_TRACE=1", "OMPI_MCA_mpi_yield_when_idle=1"};
Requirements = (other.GlueCEInfoHostName == "i2g-ce01.lip.pt");
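The slides never list cpi.c itself; judging by its output on the next slides, it is the classic MPI example that estimates π by midpoint-rule integration of 4/(1+x^2). Assuming the UI has Open MPI under the same /opt/i2g prefix as the WNs, you can sanity-check the binary locally before submitting:

    # Sketch (assumes /opt/i2g/openmpi is also installed on the UI).
    /opt/i2g/openmpi/bin/mpicc -o cpi-openmpi cpi.c    # build with the Open MPI wrapper
    /opt/i2g/openmpi/bin/mpiexec -np 4 ./cpi-openmpi   # 4 local processes, no grid involved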
7. Exe2: Start the Open MPI example
- Including the Requirements instruction...
- Submit your JDL
- user01@ui03 exe2 > i2g-job-submit cpi_openmpi.jdl
- Query the job status
- user01@ui03 exe2 > i2g-job-status <https://...>
- Retrieve the job output
- user01@ui03 exe2 > i2g-job-output https://...
- Analyse...
- The standard output (cpi-openmpi.out)
- The standard error (cpi-openmpi.err)
8. Exe2: Std Output
ui03 /home/tutorial/userXX/mpi_and_interactivity/exe2 > cat /tmp/jobOutput/userXX_26H7zXIlOm-sAxWFDbNnSQ/cpi-openmpi.out
UID     = itut143
HOST    = lflip27.lip.pt
DATE    = Wed Nov 7 15:25:21 WET 2007
VERSION = 0.0.44
mpi-start [DEBUG]: dump configuration
(...)
START
mpi-start [DEBUG]: /opt/i2g/openmpi/bin/mpiexec -x X509_USER_PROXY --prefix /opt/i2g/openmpi -machinefile /tmp/122976.1.itutgridsdj/tmp.SYthpM4211 -np 4 /home/itut143/globus-tmp.lflip27.3648.0/https_3a_2f_2fi2g-rb02.lip.pt_3a9000_2f26H7zXIlOm-sAxWFDbNnSQ_0/cpi-openmpi
Process 2 of 4 is on lflip28.lip.pt
Process 0 of 4 is on lflip27.lip.pt
Process 1 of 4 is on lflip30.lip.pt
Process 3 of 4 is on lflip25.lip.pt
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.722569
FINISHED
(...)
9. Exe2: Std Error
ui03 /home/tutorial/userXX/mpi_and_interactivity/exe2 > cat /tmp/jobOutput/userXX_26H7zXIlOm-sAxWFDbNnSQ/cpi-openmpi.err
+ '[' x/opt/i2g/bin/mpi-start != x ']'
++ dirname /opt/i2g/bin/mpi-start
+ MPI_START_PREFIX=/opt/i2g/bin
+ MPI_START_MACHINEFILE=
+ MPI_START_READY=-1
+ info_msg 'search for scheduler'
+ '[' x1 == x1 ']'
+ echo 'mpi-start [INFO]: search for scheduler'
+ for i in '$MPI_START_PREFIX/../etc/mpi-start/*.scheduler'
+ unset scheduler_available
+ unset scheduler_get_machinefile
+ debug_msg 'source /opt/i2g/bin/../etc/mpi-start/lsf.scheduler'
(...)
10. Exe3: Hooks in Open MPI
- Let's simulate an example where you need to do some work on the WNs before running the MPI job
- For example, get files from a SE
- Hook functions will be of great use...
- Let's copy a file to the LIP SE and register it in the LFC (an upload sketch follows this list)
- A file called BIGDATAFILE.data is already stored in the LIP SE and registered in the LFC
- Use pre_run_hook to copy the data from the SE to the WN
- Use post_run_hook to list the replicas of that file
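The upload and registration step itself is not shown on the slides; a minimal sketch with the standard lcg-cr tool would look as follows (using stcms02.lip.pt as destination is an assumption, taken from the SE host that appears later in the exe3 output):

    # Sketch: upload a local file to the LIP SE and register it in the LFC.
    # Destination SE (-d stcms02.lip.pt) is assumed, not taken from the slides.
    export LFC_HOST=lfc01.lip.pt
    /opt/lcg/bin/lcg-cr -v --vo itut -d stcms02.lip.pt \
        -l lfn:/grid/itut/tut-14-11-07/mpi_and_interactivity/BIGDATAFILE.data \
        file://$PWD/BIGDATAFILE.data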
11. Exe3: Hooks in Open MPI
- Check the Pre/Post Hook functions

user01@ui03 exe3 > cat pre_run_hook.sh
pre_run_hook () {
    echo "USER PRE RUN HOOK CALLED: Copy DATA from SE"
    export LFC_HOST=lfc01.lip.pt
    export MY_VO=itut
    export DATA=BIGDATAFILE.data
    export LFC_DIR="/grid/itut/tut-14-11-07/mpi_and_interactivity/"
    echo "/opt/lcg/bin/lcg-cp -v --vo $MY_VO lfn:$LFC_DIR/$DATA file://`pwd`/$DATA"
    /opt/lcg/bin/lcg-cp -v --vo $MY_VO lfn:$LFC_DIR/$DATA file://`pwd`/$DATA
    return 0
}

user01@ui03 exe3 > cat post_run_hook.sh
post_run_hook () {
    echo "USER POST RUN HOOK CALLED: List replicas of DATA in the LFC"
    export LFC_HOST=lfc01.lip.pt
    export MY_VO=itut
    export DATA=BIGDATAFILE.data
    export LFC_DIR="/grid/itut/tut-14-11-07/mpi_and_interactivity/"
    echo "/opt/lcg/bin/lcg-lr -v --vo $MY_VO lfn:$LFC_DIR/$DATA"
    /opt/lcg/bin/lcg-lr -v --vo $MY_VO lfn:$LFC_DIR/$DATA
    return 0
}
12. Exe3: Submit Hooks in Open MPI
- The JDL for hooking into an Open MPI job
- Submit your job
- user01@ui03 exe3 > i2g-job-submit cpi_openmpi_hooks.jdl

user01@ui03 exe3 > cat cpi_openmpi_hooks.jdl
Type = "job";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 4;
VirtualOrganisation = "itut";
RetryCount = 0;
Executable = "cpi-openmpi";
StdOutput = "cpi-openmpi.out";
StdError = "cpi-openmpi.err";
OutputSandbox = {"cpi-openmpi.out","cpi-openmpi.err"};
InputSandbox = {"cpi-openmpi","pre_run_hook.sh","post_run_hook.sh"};
Environment = {"I2G_MPI_START_DEBUG=1", "I2G_MPI_START_VERBOSE=1", "I2G_MPI_PRE_RUN_HOOK=./pre_run_hook.sh", "I2G_MPI_POST_RUN_HOOK=./post_run_hook.sh", "OMPI_MCA_mpi_yield_when_idle=1"};
Requirements = (other.GlueCEInfoHostName == "i2g-ce01.lip.pt");
13. Exe3: Output
ui03 /home/tutorial/userXX/mpi_and_interactivity/exe3 > cat /tmp/jobOutput/userXX_qhV2_fnF9VAGQiw8-D-oQw/cpi-openmpi.out
(...)
-<START PRE-RUN HOOK>---------------------------------------------------
USER PRE RUN HOOK CALLED: Copy DATA from SE
/opt/lcg/bin/lcg-cp -v --vo itut lfn:/grid/itut/tut-14-11-07/mpi_and_interactivity//BIGDATAFILE.data file:///home/itut172/globus-tmp.lflip22.21676.0/https_3a_2f_2fi2g-rb02.lip.pt_3a9000_2fqhV2_5ffnF9VAGQiw8-D-oQw_0/BIGDATAFILE.data
Using grid catalog type: lfc
Using grid catalog: lfc01.lip.pt
VO name: itut
Source URL: lfn:/grid/itut/tut-14-11-07/mpi_and_interactivity//BIGDATAFILE.data
File size: 39
Source URL for copy: gsiftp://stcms02.lip.pt:2811//pnfs/lip.pt/data/itut/generated/2007-11-08/fileb66a707a-7c7e-4e11-ad48-bec9aead0c44
Destination URL: file:///home/itut172/globus-tmp.lflip22.21676.0/https_3a_2f_2fi2g-rb02.lip.pt_3a9000_2fqhV2_5ffnF9VAGQiw8-D-oQw_0/BIGDATAFILE.data
streams: 1
set timeout to 0 (seconds)
Transfer took 1000 ms
-<STOP PRE-RUN HOOK>----------------------------------------------------
(...)
14. Exe4: PACX-MPI example
- Let's compute π with more CPUs
- Continue to use the same cpi.c...
- Compile the program with PACX-MPI and build a JDL for it...
- /opt/i2g/pacx-openmpi/bin/pacxcc -o cpi-pacxmpi cpi.c

user01@ui03 exe4 > cat cpi_pacxmpi.jdl
Type = "job";
JobType = "Parallel";
SubJobType = "pacx-mpi";
NodeNumber = 200;
VirtualOrganisation = "itut";
RetryCount = 0;
Executable = "cpi-pacxmpi";
StdOutput = "cpi-pacxmpi.out";
StdError = "cpi-pacxmpi.err";
OutputSandbox = {"cpi-pacxmpi.out","cpi-pacxmpi.err"};
InputSandbox = {"cpi-pacxmpi"};
Environment = {"I2G_MPI_START_DEBUG=1", "I2G_MPI_START_VERBOSE=1", "I2G_MPI_START_TRACE=1", "OMPI_MCA_mpi_yield_when_idle=1"};
Requirements = (other.GlueCEInfoHostName == "i2g-ce01.lip.pt");
15. Exe4: Start the PACX-MPI example
- Comment out the Requirements instruction...
- Start by requesting 50 CPUs. Check where your job can run...
- Decrease to 20 CPUs. Check where your job can run...
- user01@ui03 exe4 > i2g-job-list-match cpi_pacxmpi.jdl
- Uncomment the Requirements instruction...
- Decrease to 4 CPUs. Submit your job
- user01@ui03 exe4 > i2g-job-submit cpi_pacxmpi.jdl
- Query the job status
- user01@ui03 exe4 > i2g-job-status https://...
- Retrieve the job output
- user01@ui03 exe4 > i2g-job-output https://...
- Analyse (diff with the Open MPI outputs)
- The standard output (cpi-pacxmpi.out)
- The standard error (cpi-pacxmpi.err)
16. Exe4: PACX-MPI Output
(...)
mpi-start [DEBUG]: > I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG]: > I2G_MPI_TYPE=pacx-mpi
mpi-start [DEBUG]: > I2G_MPI_VERSION=
mpi-start [DEBUG]: > I2G_MPI_PRE_RUN_HOOK=
mpi-start [DEBUG]: > I2G_MPI_POST_RUN_HOOK=
mpi-start [DEBUG]: > I2G_MPI_PRECOMMAND=
mpi-start [DEBUG]: > I2G_MPI_FLAVOUR=openmpi
mpi-start [DEBUG]: > I2G_MPI_JOB_NUMBER=0
mpi-start [DEBUG]: > I2G_MPI_STARTUP_INFO=i2g-rb02.lip.pt 20090
mpi-start [DEBUG]: > I2G_MPI_RELAY=i2g-ce01.lip.pt
(...)
START
mpi-start [DEBUG]: /opt/i2g/openmpi/bin/mpiexec -x X509_USER_PROXY --prefix /opt/i2g/openmpi -machinefile /tmp/123082.1.itutgridsdj/tmp.zbOKdz6142 -mca pls tm,gridengine -mca ras tm,gridengine -x GLOBUS_TCP_PORT_RANGE="20000 25000" -np 6 /home/itut143/globus-tmp.lflip22.5547.0/https_3a_2f_2fi2g-rb02.lip.pt_3a9000_2fwnBfX2uYw3QOLOweO6oATw_0/cpi-pacxmpi
Process 1 of 4 is on lflip18.lip.pt
Process 2 of 4 is on lflip24.lip.pt
Process 0 of 4 is on lflip22.lip.pt
Process 3 of 4 is on lflip31.lip.pt
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.380859
FINISHED
(...)
17. Exe4: Compare with Open MPI Output
(...)
mpi-start [DEBUG]: > I2G_MPI_APPLICATION_ARGS=
mpi-start [DEBUG]: > I2G_MPI_TYPE=openmpi
mpi-start [DEBUG]: > I2G_MPI_VERSION=
mpi-start [DEBUG]: > I2G_MPI_PRE_RUN_HOOK=
mpi-start [DEBUG]: > I2G_MPI_POST_RUN_HOOK=
mpi-start [DEBUG]: > I2G_MPI_PRECOMMAND=
mpi-start [DEBUG]: > I2G_MPI_FLAVOUR=openmpi
mpi-start [DEBUG]: > I2G_MPI_JOB_NUMBER=
mpi-start [DEBUG]: > I2G_MPI_STARTUP_INFO=
mpi-start [DEBUG]: > I2G_MPI_RELAY=i2g-ce01.lip.pt
(...)
START
mpi-start [DEBUG]: /opt/i2g/openmpi/bin/mpiexec -x X509_USER_PROXY --prefix /opt/i2g/openmpi -machinefile /tmp/122976.1.itutgridsdj/tmp.SYthpM4211 -np 4 /home/itut143/globus-tmp.lflip27.3648.0/https_3a_2f_2fi2g-rb02.lip.pt_3a9000_2f26H7zXIlOm-sAxWFDbNnSQ_0/cpi-openmpi
Process 2 of 4 is on lflip28.lip.pt
Process 0 of 4 is on lflip27.lip.pt
Process 1 of 4 is on lflip30.lip.pt
Process 3 of 4 is on lflip25.lip.pt
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.722569
FINISHED
(...)
18. Exe5: Interactivity
- From a UI03 terminal, start i2glogin
- user01@ui03 exe5 > i2glogin -p 21016
- -p 21016:193.136.6.71
- Everyone should use a different TCP port between 20000 and 25000
- i2glogin on UI03 waits for the connection to be established...
- Build a JDL for an interactive job
- Submit your job in a 2nd UI03 terminal and see what happens in the 1st one (the two-terminal flow is recapped after the JDL)...
- user01@ui03 exe5 > i2g-job-submit interactive.jdl

user01@ui03 exe5 > cat interactive.jdl
JobType = "normal";
Executable = "/bin/sh";
Arguments = "";
Interactive = True;
InteractiveAgent = "i2glogin";
InteractiveAgentArguments = "-r -p 21016:193.136.6.71 -t -c";
InputSandbox = {"/opt/i2g/bin/i2glogin"};
Requirements = (other.GlueCEInfoHostName == "i2g-ce01.lip.pt");
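Putting the two terminals together (commands exactly as on these slides):

    # Terminal 1: open the listener; i2glogin prints the "port:IP" pair that
    # goes into InteractiveAgentArguments, then blocks until the job connects.
    i2glogin -p 21016          # prints e.g. "-p 21016:193.136.6.71"

    # Terminal 2: submit the interactive job; when it starts on a WN,
    # the remote shell appears back in terminal 1.
    i2g-job-submit interactive.jdl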
19. Exe5: Output
ui03 /home/liplisbon/userXX/tutorial/exe5 > i2glogin -p 21016
-p 21016:193.136.6.71

ui03 /home/liplisbon/userXX/tutorial/exe5 > i2glogin -p 21016
-p 21016:193.136.6.71
lflip22 /home/itut143 > whoami
itut143
lflip22 /home/itut143 > pwd
/home/itut143
lflip22 /home/itut143 > hostname
lflip22.lip.pt
lflip22 /home/itut143 >

ui03 /home/liplisbon/userXX/tutorial/exe5 > i2g-job-submit interactive.jdl
Selected Virtual Organisation name (from proxy certificate extension): itut
Connecting to host i2g-rb02.lip.pt, port 7772
Logging to host i2g-rb02.lip.pt, port 9002

JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use i2g-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://i2g-rb02.lip.pt:9000/pkAEboavWMLBwqP1WQfEag
20. Exe6: Interactivity and Open MPI
- How can we use interactivity with other kinds of jobs?
- Let's compute π using interactivity... Here is the JDL...

user01@ui03 exe6 > cat cpi_openmpi.jdl
Type = "job";
JobType = "Parallel";
SubJobType = "openmpi";
NodeNumber = 4;
VirtualOrganisation = "itut";
RetryCount = 0;
Executable = "cpi-openmpi";
StdOutput = "cpi-openmpi.out";
StdError = "cpi-openmpi.err";
InteractiveAgent = "i2glogin";
InteractiveAgentArguments = "-r -p 21016:193.136.6.71 -t -c";
Interactive = true;
InputSandbox = {"/opt/i2g/bin/i2glogin","cpi-openmpi"};
OutputSandbox = {"cpi-openmpi.out","cpi-openmpi.err"};
Environment = {"I2G_MPI_START_DEBUG=1", "I2G_MPI_START_VERBOSE=1", "I2G_MPI_START_TRACE=1", "OMPI_MCA_mpi_yield_when_idle=1"};
Requirements = (other.GlueCEInfoHostName == "i2g-ce01.lip.pt");
21. Exe6: Interactivity and Open MPI
- From a UI03 terminal, start i2glogin
- user01@ui03 exe6 > i2glogin -p 21016
- -p 21016:193.136.6.71
- Everyone should use a different TCP port between 20000 and 25000
- i2glogin on UI03 waits for the connection to be established...
- Submit your job in a 2nd UI03 terminal and see what happens in the 1st one...
- user01@ui03 exe6 > i2g-job-submit cpi_openmpi.jdl
22. Exe6: Output
ui03 /home/liplisbon/userXX/tutorial/exe6 > i2glogin -p 21016
-p 21016:193.136.6.71

ui03 /home/liplisbon/userXX/tutorial/exe6 > i2glogin -p 21016
-p 21016:193.136.6.71
[lflip22.lip.pt:09357] mca: base: component_find: unable to open ras: gridengine.la: file not found (ignored)
[lflip22.lip.pt:09357] mca: base: component_find: unable to open ras: gridengine.so: file not found (ignored)
[lflip22.lip.pt:09357] mca: base: component_find: unable to open pls: gridengine.la: file not found (ignored)
[lflip22.lip.pt:09357] mca: base: component_find: unable to open pls: gridengine.so: file not found (ignored)
Scientific Linux CERN SLC release 4.5 (Beryllium)
Scientific Linux CERN SLC release 4.5 (Beryllium)
Scientific Linux CERN SLC release 4.5 (Beryllium)
Process 1 of 4 is on lflip26.lip.pt
Process 0 of 4 is on lflip22.lip.pt
Process 3 of 4 is on lflip29.lip.pt
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.033887
Process 2 of 4 is on lflip19.lip.pt
Connection closed by foreign host.

ui03 /home/liplisbon/userXX/tutorial/exe6 > i2g-job-submit cpi_openmpi.jdl
Selected Virtual Organisation name (from JDL): itut
Connecting to host i2g-rb02.lip.pt, port 7772
Logging to host i2g-rb02.lip.pt, port 9002

JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use i2g-job-status command to check job current status.
Your job identifier (edg_jobId) is:
- https://i2g-rb02.lip.pt:9000/atjku2ExIt6yXF04VsPViQ