Title: How to use the System
1 How to use the System
- SSCK Workshop: Introduction to HP XC6000 Cluster
- Karlsruhe, March 9-11, 2005
- Hartmut Häfner, SSCK, Universität Karlsruhe (TH)
- haefner_at_rz.uni-karlsruhe.de
2 Interactive Login
3 Available Services (1/2)
[Figure: access to XC1 through the HWW-Firewall via ssh (scp) and passive ftp]
- No print manager
- No exported file system
4 Available Services (2/2)
- Login to the HP XC6000 Cluster:
  ssh <user-id>@hwwxc1.hww.de
- or from within Universität Karlsruhe:
  ssh <user-id>@xc1.rz.uni-karlsruhe.de
- SSH2 from RZ-administrated workstations:
  ssh2 -p 22 <user-id>@hwwxc1.hww.de
5 File Systems (1/2)
[Figure: file system layout. Labels: TMP, Quadrics QsNet II (single rail), FC Network, HOME, WORK, 10 TB]
6 File Systems (2/2)
- global: all nodes access the parallel file system HP SFS, based on Lustre
- local: each node has its own file system
- permanent: files are stored permanently
- temporary: files are removed at end of job or session
7 Moving Files (HP XC <-> Workstations)
- Either by the command scp or by passive ftp:
  scp <user-id>@ws.institute.uni-karlsruhe.de:mydata $HOME
  ftp ws.institute.uni-karlsruhe.de
8 Module Concept
- module is a user interface to the Modules package.
- Typically modulefiles instruct the module command to set or alter environment variables like PATH, MANPATH, ...
- Syntax is: module switches sub-command modulefile|path|directory
- Important switches are:
  - --force, -f: Force active dependency resolution. This will result in modules found on a prereq command inside a modulefile being loaded automatically.
  - --verbose, -v: Enable verbose messages during module command execution.
- Further switches control the amount of output of the module command.
9 Modules (1/2)
- module help modulefile...: Print the usage of each subcommand. If an argument is given, print the module-specific help information for the modulefile.
- module add|load modulefile modulefile...: Load modulefile into the shell environment.
- module unload|rm modulefile modulefile...: Remove modulefile from the shell environment.
- module switch|swap modulefile1 modulefile2: Switch loaded modulefile1 with modulefile2.
- module display|show modulefile modulefile...: Display information about the modulefile.
- module list: List loaded modules.
- module avail path...: List all available modulefiles in the current MODULEPATH.
- module purge: Unload all loaded modulefiles.
- Further commands add directories to MODULEPATH and add/remove modulefiles to/from the shell-dependent startup files.
10 Modules (2/2)
11 Modulefiles containing modifications to the environment
- A modulefile is a file containing Tcl code extensions for the Modules package.
- A modulefile contains the changes to a user's environment needed to access an application.
- Modulefiles can also be used to implement site policies regarding the access and use of applications.
- Modulefiles also hide the notion of different types of shells. From the modulefile writer's perspective, this means one set of information will take care of every type of shell.
- Change the default module environment by inserting "module add <modulefile>" in the setup file .bash_profile.
- Add your own modulefiles by extending the MODULEPATH environment variable.
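Extending MODULEPATH by hand can be sketched as follows. This is only an illustration: the helper name add_modulepath and the directory $HOME/modulefiles are assumptions, not part of the XC6000 setup. The helper prepends a directory only if it is not already listed, so it is safe to call repeatedly from .bash_profile.

```shell
# Hypothetical helper: prepend a private modulefile directory to MODULEPATH
# unless it is already listed there.
add_modulepath() {
    dir=$1
    case ":${MODULEPATH:-}:" in
        *":$dir:"*) ;;   # already present, nothing to do
        *) MODULEPATH="$dir${MODULEPATH:+:$MODULEPATH}" ;;
    esac
    export MODULEPATH
}

# Assumed location of the user's own modulefiles
add_modulepath "$HOME/modulefiles"
echo "$MODULEPATH"
```

The Modules package also offers the sub-command "module use <directory>" for the same purpose; the helper above only shows what happens to the variable.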
12 Compilers (1/4)
- Fortran: 2 Intel Compilers (ifort in V8.1 and efc in V7.1), NAG Compiler (f95), GNU Compiler (g77 - only Fortran77)
- C/C++: 2 Intel Compilers (icc in V8.1 and ecc in V7.1), GNU Compiler (gcc)
- General options: -c, -I<path>, -g, -O0,1,2,3, -L<path>, -l<library>, -o <name>
- NAG Fortran Compiler: best choice to check the Fortran90/95 conformity of your program
- Important specific options of the NAG Fortran Compiler:
  - -Ounsafe: performs possibly unsafe optimizations
  - -dusty: allows the compilation of legacy software (errors -> warnings)
  - -ieee=full|nonstd|stop: enables/disables all IEEE and deallocation facilities
  - -C: compiles code with all possible runtime checks
  - -mtrace: traces memory allocation and deallocation
  - -gline: compiles code to generate a traceback in case of runtime errors
  - -gc: enables automatic garbage collection of the executable
  - -thread_safe: compiles code for safe execution in a multi-threaded environment
  - -static: prevents linking with shared libraries
13 Compilers (2/4)
- Intel Fortran suffix names
- NAG Fortran suffix names
14 Compilers (3/4)
- Change compiler by a simple module command (by default the Intel compiler in version 8.1 is used):
  module add|load intel-compilers/7.1
- Using different compilers:
  - don't use explicit compiler names
  - use the $FC environment variable for the Fortran compiler name
  - use the $CC environment variable for the C/C++ compiler name
15 Compilers (4/4)
- Compiling Fortran90/95 source code with the Intel compiler:
  ifort -c -O3 my_prog.f90
- Compiling Fortran90/95 source code with an arbitrary Fortran compiler:
  $FC -c -O3 my_prog.f90
- Compiling C source code with the Intel compiler:
  icc -c -O3 my_prog.c
- Compiling C++ source code with the compiler named in $CC:
  $CC -c -O3 my_prog.C
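A build script that follows the advice to avoid explicit compiler names might look like the sketch below. It assumes that the loaded compiler module sets FC and CC; the fallback to ifort and icc and the helper name compile_cmd are illustrative assumptions. The helper only prints the command line so that it can be checked before it is executed.

```shell
# Take the compiler names from the environment; fall back to the Intel
# compilers (an assumption for this sketch) if the variables are unset.
FC=${FC:-ifort}
CC=${CC:-icc}

# Hypothetical helper: print the compile command instead of running it.
# $1 = compiler, $2 = source file
compile_cmd() {
    printf '%s -c -O3 %s\n' "$1" "$2"
}

compile_cmd "$FC" my_prog.f90
compile_cmd "$CC" my_prog.c
```

After switching the compiler module, the same script picks up the new compiler without being edited.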
16 Linking
- Special compiler scripts to (compile and) link MPI programs (the scripts don't work together with the GNU compilers):
  - mpicc: (compile and) link C programs
  - mpicc.mpich: (compile and) link C programs in MPICH compatibility mode
  - mpiCC: (compile and) link C++ programs
  - mpiCC.mpich: (compile and) link C++ programs in MPICH compatibility mode
  - mpif77 or mpif90: (compile and) link Fortran programs. If MPICH compatibility mode is required, call mpif77.mpich or mpif90.mpich
- Example for Fortran90/95 object code with the Intel compiler:
  mpif90 -o my_prog my_prog.o sub1.o sub2.o
17 Benchmarks
Measurements of Itanium2 (1.5 GHz) on HP XC6000 Cluster
What is remarkable? The dot product runs very slowly! The scatter of the performance rates when the data are stored in the L2 cache is very high (up to 40 percent!).
18 Benchmarks: Ping Pong within a node

Neighbor send/receive speed test
--- Multiple simple Ping/Pong ---
Clock overhead is 0.1736E-07 secs per snd/rcv.

    bytes       ms       MB/s
        0    0.001      0.000
        4    0.001      4.590
        8    0.001      7.875
       16    0.001     15.526
       32    0.001     34.528
       64    0.001     73.807
      128    0.001    127.790
      256    0.001    209.114
      512    0.001    436.936
     1024    0.002    674.397
     2048    0.007    308.211
     4096    0.007    550.674
     8192    0.010    834.013
    16384    0.014   1181.921
    32768    0.022   1507.639
    65536    0.036   1835.203
   131072    0.071   1854.967
   262144    0.126   2074.492
   524288    0.254   2060.727
  1048576    0.502   2089.745

Neighbor send/receive speed test
--- Multiple double Ping/Pong ---
Clock overhead is 0.2670E-08 secs per snd/rcv.

    bytes       ms       MB/s
        0    0.003      0.000
        4    0.004      1.131
        8    0.003      2.381
       16    0.003      4.744
       32    0.004      8.936
       64    0.003     19.438
      128    0.003     37.425
      256    0.004     65.514
      512    0.004    134.188
     1024    0.004    253.168
     2048    0.006    343.425
     4096    0.008    541.139
     8192    0.011    729.931
    16384    0.018    914.383
    32768    0.033   1002.130
    65536    0.064   1021.981
   131072    0.124   1055.018
   262144    0.233   1127.460
   524288    0.486   1078.485
  1048576    0.911   1151.049
19 Benchmarks: Ping Pong between nodes

Neighbor send/receive speed test
--- Multiple simple Ping/Pong ---
Clock overhead is 0.1736E-07 secs per snd/rcv.

    bytes       ms       MB/s
        0    0.003      0.000
        4    0.003      1.441
        8    0.003      2.905
       16    0.003      5.828
       32    0.003     11.605
       64    0.004     16.514
      128    0.004     30.021
      256    0.006     45.949
      512    0.006     87.778
     1024    0.006    161.227
     2048    0.008    271.353
     4096    0.010    408.196
     8192    0.015    546.295
    16384    0.025    659.058
    32768    0.045    735.468
    65536    0.084    781.339
   131072    0.164    797.490
   262144    0.320    818.153
   524288    0.660    794.346
  1048576    1.266    828.447

Neighbor send/receive speed test
--- Multiple double Ping/Pong ---
Clock overhead is 0.2666E-08 secs per snd/rcv.

    bytes       ms       MB/s
        0    0.009      0.000
        4    0.009      0.443
        8    0.009      0.899
       16    0.009      1.739
       32    0.009      3.497
       64    0.010      6.495
      128    0.010     12.508
      256    0.012     22.125
      512    0.012     43.344
     1024    0.013     80.759
     2048    0.016    129.800
     4096    0.021    197.767
     8192    0.031    267.897
    16384    0.050    329.511
    32768    0.089    367.656
    65536    0.172    381.144
   131072    0.334    392.161
   262144    0.673    389.346
   524288    1.313    399.440
  1048576    2.816    372.366
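The MB/s column of the ping-pong tables can be cross-checked from the bytes and ms columns. The sketch below assumes that 1 MB means 10^6 bytes, which is inferred from the tabulated values and not stated in the benchmark output; the function name bandwidth is hypothetical. Because the ms column is rounded to three decimals, the recomputed rates for short messages will differ noticeably from the table.

```shell
# Hypothetical helper: bandwidth BYTES MILLISECONDS -> MB/s
# Assumes 1 MB = 10^6 bytes (inferred from the table, not stated there).
bandwidth() {
    LC_ALL=C awk -v bytes="$1" -v ms="$2" 'BEGIN {
        if (ms <= 0) { print "n/a"; exit }     # avoid division by zero
        printf "%.1f\n", bytes / ms / 1000     # bytes per ms = kB/s, /1000 = MB/s
    }'
}

# Last row of the double ping-pong between nodes (table value: 372.366)
bandwidth 1048576 2.816    # prints 372.4
```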
20 Benchmarks: Overlap for short messages between nodes

Neighbor send/receive overlap test
------ Short messages ------

The used message length during computation is ... 10
the used vectorlength during computation is . . . 10
all times in seconds, >>ol_fac in percent!!!

  Bal_fac  Rep_fac_comm  Rep_fac_comp  T_comm  T_comp  T_all  T_ol  ol_fac
        1        103548      12267030    1.03    1.03   1.90  0.16    15.6
        3        103548      12267030    1.02    3.08   3.94  0.17    16.6

The used message length during computation is ... 100
the used vectorlength during computation is . . . 100
all times in seconds, >>ol_fac in percent!!!

  Bal_fac  Rep_fac_comm  Rep_fac_comp  T_comm  T_comp  T_all  T_ol  ol_fac
        1         69722       5725738    1.03    1.03   1.72  0.34    32.9
        3         69722       5725738    1.03    3.09   3.78  0.35    33.8

The used message length during computation is ... 1000
the used vectorlength during computation is . . . 1000
all times in seconds, >>ol_fac in percent!!!

  Bal_fac  Rep_fac_comm  Rep_fac_comp  T_comm  T_comp  T_all  T_ol  ol_fac
        1         28496        979641    1.03    1.03   1.42  0.64    62.1
        3         28496        979641    1.03    3.09   3.49  0.63    61.4
21 Benchmarks: Overlap for long messages between nodes

Neighbor send/receive overlap test
------ Long messages ------

The used message length during computation is ... 10000
the used vectorlength during computation is . . . 10000
all times in seconds, >>ol_fac in percent!!!

  Bal_fac  Rep_fac_comm  Rep_fac_comp  T_comm  T_comp  T_all  T_ol  ol_fac
        1          4670         68699    1.03    1.03   1.13  0.92    89.7
        3          4670         68699    1.03    3.00   3.23  0.80    78.0

The used message length during computation is ... 100000
the used vectorlength during computation is . . . 100000
all times in seconds, >>ol_fac in percent!!!

  Bal_fac  Rep_fac_comm  Rep_fac_comp  T_comm  T_comp  T_all  T_ol  ol_fac
        1           503          6101    1.06    1.05   1.13  0.98    92.2
        3           503          6101    1.06    3.13   3.19  1.00    94.0

The used message length during computation is ... 1000000
the used vectorlength during computation is . . . 1000000
all times in seconds, >>ol_fac in percent!!!

  Bal_fac  Rep_fac_comm  Rep_fac_comp  T_comm  T_comp  T_all  T_ol  ol_fac
        1            49           101    1.05    1.06   1.30  0.82    78.0
        3            49           101    1.05    3.18   3.35  0.88    83.7
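The benchmark output does not define T_ol and ol_fac. The tabulated values are consistent with T_ol = T_comm + T_comp - T_all and ol_fac = 100 * T_ol / T_comm, so the sketch below uses these two relations as an assumption inferred from the numbers, not as the benchmark's documented definition. The function name overlap is hypothetical. Small deviations from the tables are to be expected because the printed times are rounded to two decimals.

```shell
# Hypothetical helper: overlap T_COMM T_COMP T_ALL -> "T_ol ol_fac"
# Both formulas are inferred from the tabulated values (assumption).
overlap() {
    LC_ALL=C awk -v tcomm="$1" -v tcomp="$2" -v tall="$3" 'BEGIN {
        tol = tcomm + tcomp - tall                   # time hidden by overlapping
        printf "%.2f %.1f\n", tol, 100 * tol / tcomm # overlap in percent of T_comm
    }'
}

# Row Bal_fac = 1 for message length 1000 (table: T_ol 0.64, ol_fac 62.1)
overlap 1.03 1.03 1.42    # prints 0.64 62.1
```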
22 Debugging with DDT
- Commands:
  module add ddt
  ddt hello
23 HP MPI: Execution of Parallel Programs
- The syntax to start a parallel application interactively is:
  mpirun mpirun_options <program>
  or
  mpirun mpirun_options -f <appfile>
24 HP MPI: Environment Variables
- Many environment variables
25 Numerical Libraries
- HP XC Mathematical LIBrary (MLIB)
- Intel Mathematical Kernel Library (MKL)
- NAG Libraries (non-commercial users)
- LINear SOLver package (LINSOL)
26 Well Established Open Source Libraries
- BLAS
  - BLAS1,2,3 included in HP XC MLIB and Intel MKL
- LAPACK
  - included in HP XC MLIB and Intel MKL
  - contains many functions for the solution of linear systems and eigenvalue problems for dense and banded matrices
- ScaLAPACK
  - included in HP XC MLIB
  - contains above mentioned functions for parallel computers
- Metis
  - included in HP XC MLIB
  - contains a special implementation of the graph partitioning and matrix reordering library
27 HP XC MLIB (1/2)
- Functions from several areas: linear equations, least squares, eigenvalue problems, singular value decomposition, vector and matrix computations, convolutions and Fourier transforms
- Four components: VECLIB, LAPACK, ScaLAPACK and SuperLU_DIST
- VECLIB includes all BLAS1,2,3 and sparse BLAS subroutines, sparse linear equation solvers, sparse eigenvalue and eigenvector solvers, FFTs, correlation and convolution subprograms, random number generators and METIS V4.0.1
- Load before use:
  module add hp-mlib/7.1   (for Intel compiler V7.1)
  module add hp-mlib       (for Intel compiler V8.1)
28 HP XC MLIB (2/2)
- Appropriate options at link time:
  - VECLIB: $FC -L$MLIBPATH -lveclib -openmp -o myprog myprog.f90
  - LAPACK: $FC -L$MLIBPATH -llapack -openmp -o myprog myprog.f90
  - ScaLAPACK: mpif90 -L$MLIBPATH -lscalapack -openmp -o myprog myprog.f90
  - SuperLU_DIST: mpif90 -L$MLIBPATH -lsuperlu_dist -openmp -o myprog myprog.f90
- More details: http://www.rz.uni-karlsruhe.de/ssc/hpxc-mlib
29 Intel MKL (1/2)
- Many components:
  - BLAS,
  - Sparse BLAS,
  - LAPACK,
  - direct sparse solver PARDISO,
  - Vector Mathematical Library (VML) for core mathematical functions on vector arguments,
  - Vector Statistical Library (VSL) for generating vectors of pseudorandom numbers,
  - general Discrete Fourier Transform functions (DFT) and
  - a subset of FFTs
- Load before use: module add mkl
30 Intel MKL (2/2)
- Appropriate options at link time:
  - BLAS, FFT, VML, VSL etc.: $FC -L$MKLPATH -lmkl_ipf -lguide -lpthread -o myprog myprog.f90
  - LAPACK: $FC -L$MKLPATH -lmkl_lapack -lmkl_ipf -lguide -lpthread -o myprog myprog.f90
  - PARDISO: mpif90 -L$MKLPATH -lmkl_solver -lmkl_ipf -lguide -lpthread -o myprog myprog.f90
- More details: http://www.rz.uni-karlsruhe.de/ssc/hpxc-mkl
31 NAG Libraries
- NAG Fortran, NAG Fortran90 and NAG C libraries only for non-commercial customers
- Load before use:
  module add naglib/7.1
  module add mkl/7.1       (for Intel compiler V7.1)
  and
  module add naglib
  module add mkl           (for Intel compiler V8.1)
- Appropriate options at compile and link time:
  - NAG Fortran Library:
    $FC myprog.f90 -I$NAGLIBPATH/interface_blocks -L$NAGLIBPATH \
        -lnag-mkl -L$MKLPATH -lmkl_lapack -lmkl_ipf -lguide -lpthread
  - NAG Fortran90 Library:
    $FC myprog.f90 -I$NAGLIBPATH/nag_mod_dir -L$NAGLIBPATH \
        -lnagfl90-noblas -L$MKLPATH -lmkl_lapack -lmkl_ipf -lguide -lpthread
  - NAG C Library:
    $CC myprog.c -I$NAGLIBPATH/include -L$NAGLIBPATH/nagc
- More details: http://www.rz.uni-karlsruhe.de/ssc/hpxc-nag
32 LINSOL
- LINSOL is a program package to solve large sparse linear systems
  - many iterative solvers
  - several polyalgorithms
  - (I)LU direct solvers as preconditioners
  - optimized for workstations (cache reuse), vector computers and parallel computers (MPI)
  - supporting 7 different storage patterns for sparse matrices (automatic optimization to the architecture of the computer)
- Load before use: module add linsol
- Appropriate options at compile and link time:
  mpif90 -L$LINSOLPATH -llinsol -lMPI myprog.o      (running an MPI job)
  $FC -L$LINSOLPATH -llinsol -lnocomm myprog.o      (running a serial job)
- More details: http://www.rz.uni-karlsruhe.de/produkte/linsol