Title: Compiling and Using the Best R
1. Compiling and Using the Best R
- Vipin Sachdeva
- IBM Computational Science Division
2. Improving R performance
- Performance improvements
- Hardware (number of cores, etc.)
  - Intel Core 2 Quad Q6600, quad-core at 2.4 GHz
- Compilers
  - Intel versus GNU
  - Compiler flags (unoptimized versus optimized)
- Libraries (BLAS)
  - netlib BLAS, GotoBLAS2, Intel MKL, Intel MKL-SMP
3. Benchmark for R
- R-benchmark-25.R
- http://r.research.att.com/benchmarks/R-benchmark-25.R
- Measures timings for
  - B = A' * A
  - C = A/B
  - Eigenvalues, Determinant, Cholesky, Inverse (BLAS)
- Needs SuppDists package
- ./Rscript --vanilla R-benchmark-25.R
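The run command on this slide could be wrapped in a small helper so that the same benchmark is run against each R build being compared. This is only a sketch: it assumes that Rscript sits in the bin directory under each installation prefix and that R-benchmark-25.R has already been downloaded. The function names benchmark_cmd and run_benchmark are made up for illustration.

```shell
#!/bin/sh
# Sketch: run R-benchmark-25.R with a given R build.
# Assumptions: Rscript lives in <prefix>/bin, the benchmark script is in
# the current directory, and the SuppDists package is installed in that
# R build. Function names are illustrative, not part of R.

# Print the command line that would run the benchmark.
benchmark_cmd() {
    r_prefix=$1                       # installation prefix of the R build
    script=${2:-R-benchmark-25.R}     # benchmark script (default from the slide)
    printf '%s/bin/Rscript --vanilla %s\n' "$r_prefix" "$script"
}

# Run the benchmark and save its output to a log file for later comparison.
run_benchmark() {
    r_prefix=$1
    log=$2
    script=${3:-R-benchmark-25.R}
    "$r_prefix/bin/Rscript" --vanilla "$script" > "$log" 2>&1
}
```

For example, `run_benchmark "$installdir" netlib-blas.log` would leave the timings of one build in netlib-blas.log, where `installdir` is whatever prefix was passed to configure.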
4. Base R
- ./configure --prefix=/home/vsachde/R-install
- Configure summary:
  - Source directory: .
  - Installation directory: /home/vsachde/R-project/all-R/GNU-R/R-native-unoptimized
  - C compiler: gcc -std=gnu99 -g -O2
  - Fortran 77 compiler: gfortran -g -O
  - C++ compiler: g++ -g -O2
  - Fortran 90/95 compiler: gfortran -g -O
  - Obj-C compiler:
  - Interfaces supported: X11, tcltk
  - External libraries: readline
  - Additional capabilities: PNG, JPEG, TIFF, NLS, cairo
  - Options enabled: static R library, shared BLAS, R profiling, Java
- Callouts on the summary: compiler flags; GNU compilers; external libraries being used
5. Somewhat Optimized R
- export optim_flags="-O3 -funroll-loops -ffast-math -march=core2"
- CC="gcc" CFLAGS="$optim_flags" CXX="g++" CXXFLAGS="$optim_flags" F77="gfortran" FFLAGS="$optim_flags" FC="gfortran" FCFLAGS="$optim_flags" ./configure --prefix=$installdir
- Configure summary:
  - C compiler: gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
  - Fortran 77 compiler: gfortran -O3 -funroll-loops -ffast-math -march=core2
  - C++ compiler: g++ -O3 -funroll-loops -ffast-math -march=core2
  - Fortran 90/95 compiler: gfortran -O3 -funroll-loops -ffast-math -march=core2
- Compilers can be changed through the variables CC, CXX, F77.
- CC=icc CXX=icpc F77=ifort will use the Intel compilers.
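The two configurations on this slide (GNU with optimization flags, Intel via CC/CXX/F77) could be generated by one helper that prints the configure command line. This is a sketch under stated assumptions: the function name configure_cmd and the family labels "gnu" and "intel" are invented for illustration, setting FC alongside F77 for the Intel case is an assumption, and no Intel-specific optimization flags are assumed (pass your own as the third argument).

```shell
#!/bin/sh
# Sketch: print the configure invocation for a compiler family.
# Assumptions: helper name and family labels are illustrative; the GNU
# default flags are the ones from the slide; Intel flags are left to the
# caller because the slide does not list any.

configure_cmd() {
    family=$1
    prefix=$2
    flags=${3:-}
    case $family in
        gnu)
            cc=gcc; cxx=g++; f77=gfortran
            # Fall back to the slide's flags when none are given.
            : "${flags:=-O3 -funroll-loops -ffast-math -march=core2}"
            ;;
        intel)
            cc=icc; cxx=icpc; f77=ifort
            ;;
        *)
            echo "unknown compiler family: $family" >&2
            return 1
            ;;
    esac
    # FC is set to the same Fortran compiler as F77 (an assumption for Intel).
    line="CC=\"$cc\" CXX=\"$cxx\" F77=\"$f77\" FC=\"$f77\""
    # Only override the flag variables when flags were chosen, so that
    # configure keeps its own defaults otherwise.
    if [ -n "$flags" ]; then
        line="$line CFLAGS=\"$flags\" CXXFLAGS=\"$flags\" FFLAGS=\"$flags\" FCFLAGS=\"$flags\""
    fi
    printf '%s ./configure --prefix=%s\n' "$line" "$prefix"
}
```

Printing the command rather than running it makes it easy to review the exact flags before starting a long build; the output can be pasted into the shell from the R source directory.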
6. Linking external BLAS with R
- R uses unoptimized routines for linear algebra if it is not linked with an external BLAS.
- ./configure --with-blas=<location of BLAS lib>
  - Configure tries to link the BLAS library.
- Various sources of BLAS
  - Netlib BLAS: generic and unoptimized
  - GotoBLAS2: optimized and multi-threaded
  - Intel MKL: optimized library from Intel (sequential)
  - Intel MKL-SMP (multi-threaded)
  - Many others, including ACML and ATLAS
- Kernel performance changes with the library used.
7. Linking external BLAS with R
- If everything goes well, the configure summary reads:
  - Source directory: .
  - Installation directory: /home/vsachde/R-project/all-R/GNU-R/R-netlib-blas
  - C compiler: gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
  - Fortran 77 compiler: gfortran -O3 -funroll-loops -ffast-math -march=core2
  - C++ compiler: g++ -O3 -funroll-loops -ffast-math -march=core2
  - Fortran 90/95 compiler: gfortran -O3 -funroll-loops -ffast-math -march=core2
  - Obj-C compiler:
  - Interfaces supported: X11, tcltk
  - External libraries: readline, BLAS(generic)
  - Additional capabilities: PNG, JPEG, TIFF, NLS, cairo
  - Options enabled: static R library, R profiling, Java
  - Recommended packages: yes
- "BLAS(generic)" under External libraries shows that BLAS was linked in properly.
8. Linking external BLAS with R
- What does --with-blas do?
- Configure tries to link and run a small test program (conftest) that uses dgemm from the given library:
  - configure:28567: checking for dgemm_ in /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a
  - configure:28588: gcc -std=gnu99 -o conftest -g -O2 -I/usr/local/include -L/usr/local/lib64 conftest.c /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a -lgfortran -lm -ldl -lm >&5
  - configure:28595: result: yes
- If the above linking step fails
  - Installation won't fail, but BLAS will not be linked in.
  - The summary at the end won't show external BLAS linking.
  - Search for dgemm in config.log and look for errors.
- Advice: compile static libraries, as they are easier to link.
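The advice to search config.log for dgemm can be scripted. The following is a minimal sketch: it assumes the log contains a "checking for dgemm_" line followed later by a "result: yes" or "result: no" line, as in the excerpt on this slide. The function name blas_link_ok is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether configure's dgemm_ link test succeeded.
# Assumption: config.log has a "checking for dgemm_" line followed by a
# "result: yes" or "result: no" line. If the check appears several
# times, the last outcome wins. The function name is illustrative.

blas_link_ok() {
    log=${1:-config.log}
    awk '
        /checking for dgemm_/ { seen = 1; next }
        seen && /result:/ {
            ok = ($0 ~ /result: yes/)   # outcome of this dgemm_ check
            seen = 0
        }
        END { exit(ok ? 0 : 1) }        # status 0 only if the last check said yes
    ' "$log"
}
```

A possible use after running configure: `blas_link_ok config.log || grep -n dgemm config.log`, which prints the relevant log lines only when the external BLAS was not picked up.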
9. Linking with different BLAS
- Netlib BLAS
  - Download source from netlib.org; unoptimized.
- GotoBLAS2
  - Download from the TACC website.
  - Optimized and multi-threaded.
  - Turn off CPU throttling to compile.
- Intel MKL
  - Sequential and SMP
- The linking step is the same for most BLAS libraries except the Intel libs.
10. Linking with Intel MKL libs
- export MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/
- Intel MKL sequential
  - --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread"
- Intel MKL SMP
  - --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
- Intel MKL SMP and GotoBLAS2 should show performance improvements on the quad-core (run 4 threads).
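Because the two MKL link lines differ only in the threading library and the trailing system libraries, they could be produced by one helper. This is a sketch, not a definitive recipe: the function name mkl_blas_arg and the labels "sequential" and "smp" are invented here, and the default MKLPATH is simply the directory shown on this slide.

```shell
#!/bin/sh
# Sketch: build the --with-blas argument for Intel MKL.
# Assumptions: MKLPATH points at the directory with the static MKL
# libraries (default taken from the slide); the function name and the
# "sequential"/"smp" labels are illustrative only.

MKLPATH=${MKLPATH:-/opt/intel/Compiler/11.1/072/mkl/lib/em64t}

mkl_blas_arg() {
    case $1 in
        sequential)
            threading=libmkl_sequential.a
            extra="-lpthread"
            ;;
        smp)
            threading=libmkl_intel_thread.a
            extra="-liomp5 -lpthread"
            ;;
        *)
            echo "usage: mkl_blas_arg sequential|smp" >&2
            return 1
            ;;
    esac
    # The start/end group lets the linker resolve the circular
    # dependencies between the static MKL libraries.
    arg="--with-blas=-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/$threading $MKLPATH/libmkl_core.a -Wl,--end-group $extra"
    printf '%s\n' "$arg"
}
```

The result is meant to be passed as a single quoted word, for example `./configure "$(mkl_blas_arg smp)"`, so that the whole link line reaches configure as the value of --with-blas.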
11. Performance: Single-thread BLAS
12. Performance: BLAS
Benchmark run time went down by 15-20X through compilers, compiler options and hardware (4 threads).
Revolution R uses Intel MKL-SMP.
13. Results
- Generic R can be optimized for performance.
- Intel MKL libraries give the best performance results, with the freely available GotoBLAS2 a close second.
- Experiment with LAPACK as well.
- Question: How important is performance for R users?