Title: Experiences deploying Clusterfinder on the grid
- Arthur Carlson (MPE)
- 7th AstroGrid-D Meeting
- TUM, 11th-12th June 2007
2Experiences deploying Clusterfinder on the grid
- What is the deployment problem?
- A prototype solution using
  - grid-modules
  - environments
- Status and conclusions
3Deployment is when ...
each of many users can (build and) run each of many applications on each of many hosts.
- certificates/password files
- VOs (update of grid-mapfile, sharing software)
- firewalls
- repository/distribution/version control
- data access
- standard software (compiler, ...)
- environment
- "each" means > 90%; "many" means > 10
7grid-modules
- A prototype system for getting software from where it is maintained to where it is used.
- Inspired by the environment modules package:
  - load/unload (PATH)
  - initadd/initclear (.profile)
- Additional commands for software from a remote repository:
  - update/deinstall
  - build/clean
  - test
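To make the load/unload idea concrete, here is a minimal sketch of how such commands can be implemented as edits to PATH. The function names `module_load` and `module_unload` are hypothetical and are not taken from grid-modules itself; the sketch only illustrates the technique.

```shell
# Hypothetical helpers: "load" prepends a directory to PATH,
# "unload" removes every occurrence of it again.
module_load() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already on PATH, nothing to do
        *) PATH=$1:$PATH ;;
    esac
    export PATH
}

module_unload() {
    new_path=
    old_ifs=$IFS
    IFS=:                            # split PATH at colons
    for entry in $PATH; do
        [ "$entry" = "$1" ] && continue
        new_path=${new_path:+$new_path:}$entry
    done
    IFS=$old_ifs
    PATH=$new_path
    export PATH
}
```

A call such as `module_load "$HOME/grid-modules/gridmod/bin"` would then make the tools in that directory available in the current shell. The initadd/initclear pair would presumably write or remove the corresponding line in .profile instead.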
9grid-modules: install and use
- grid-modules-clone NEWHOST(LIST)
  - also copies ~/.subversion for passwords
- grid-module {update|load|initadd|build|test} {gridmod|env|gmon|cf|proc|gat}
10grid-modules: adding modules
- set_module_info
    agd_rep='svn://svn.gac-grid.org/software'
    all_modules='gridmod cf'
    case $module in
      gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin ;;
      cf)      rep=$agd_rep/clusterfinder; frag=unknown ;;
      *)       rep=unknown; frag=unknown ;;
    esac
- customization scripts
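The lookup table above can be exercised as follows. This is a sketch under assumptions: wrapping the table in a function that takes the module name as an argument, the helper `show_update`, and the checkout target `$HOME/grid-modules/<module>` are all illustrative and not confirmed by the slides. The helper only prints the `svn checkout` command rather than running it.

```shell
# Module table as on the slide, wrapped in a function (assumption).
set_module_info() {
    module=$1
    agd_rep='svn://svn.gac-grid.org/software'
    case $module in
        gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin ;;
        cf)      rep=$agd_rep/clusterfinder; frag=unknown ;;
        *)       rep=unknown; frag=unknown ;;
    esac
}

# Hypothetical dry run: print the checkout command for a module.
# The target directory is an assumed layout, not taken from the source.
show_update() {
    set_module_info "$1"
    if [ "$rep" = unknown ]; then
        echo "unknown module: $1" >&2
        return 1
    fi
    echo "svn checkout $rep $HOME/grid-modules/$1"
}
```

Adding a module then amounts to one more `case` branch, which keeps all repository knowledge in a single place.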
11grid-modules: adding modules
- set_module_info
    agd_rep='svn://svn.gac-grid.org/software'
    planck_rep='http://www.mpa-garching.mpg.de/svn/planck-group/planckbranches'
    all_modules='gridmod cf proc'
    case $module in
      gridmod) rep=$agd_rep/grid-modules; frag=gridmod/bin ;;
      cf)      rep=$agd_rep/clusterfinder; frag=unknown ;;
      proc)    rep=$planck_rep/ProC-2.3; frag=proc/build/dist/bin ;;
      *)       rep=unknown; frag=unknown ;;
    esac
- customization scripts
  - proc.build
      cd ~/grid-modules/proc/ProC-base
      ant
  - proc.load
      mkdir -p $HOME/.planck
      echo "allowIncompleteConf true" > "$HOME/.planck/pipelinecoordinator.pref"
12environments
- A prototype system for making different hosts look alike.
- Does a required software package exist on a remote host, and where is it installed?
    export IMAGEMAGICK_HOME=/usr/local/ImageMagick-6.3.2
- Make it available!
    export PATH=$PATH:/usr/local/ImageMagick-6.3.2/bin
- Host-specific information must be maintained by somebody somewhere.
  - Require modules, or take the bull by the horns.
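The two steps above (find out whether a package is installed, then make it available) can be combined in one small helper. This is only a sketch: the function name `use_package_dir` is hypothetical, and checking for a `bin` subdirectory is an assumed convention.

```shell
# Hypothetical helper: if the package directory exists on this host,
# record its location in the named variable and append its bin/ to PATH.
use_package_dir() {
    var_name=$1                      # e.g. IMAGEMAGICK_HOME
    dir=$2                           # e.g. /usr/local/ImageMagick-6.3.2
    [ -d "$dir/bin" ] || return 1    # package not installed here
    export "$var_name=$dir"
    case ":$PATH:" in
        *":$dir/bin:"*) ;;           # already on PATH
        *) PATH=$PATH:$dir/bin; export PATH ;;
    esac
}
```

For example, `use_package_dir IMAGEMAGICK_HOME /usr/local/ImageMagick-6.3.2 || echo "ImageMagick not found"` reports a missing installation instead of silently exporting a wrong path.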
14environments: load_env
- The trick is to find the right scripts to execute for each host.
    if ! hostname=$(hostname -f 2>/dev/null); then hostname=$(hostname); fi
    scripts=$(sed -n "s/^ *$hostname  *//p" <<EOF
    astrogrid.aei.mpg.de aei
    buran.aei.mpg.de aei
    lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
    lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
    ...
    EOF
    )
    cd ~/grid-modules/env/bin
    source ./default
- The host table may need to be changed when adding a new host.
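The host lookup can be tried in isolation with a sketch like the following. The function name `scripts_for_host` is hypothetical, the table holds only the four hosts named on the slide, and the dots in the host name are escaped so that they match literally.

```shell
# Hypothetical helper: print the script names registered for a host.
scripts_for_host() {
    # Escape dots so they are not treated as regex wildcards by sed.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^${pattern}  *//p" <<EOF
astrogrid.aei.mpg.de aei
buran.aei.mpg.de aei
lx32i1.cos.lrz-muenchen.de lrz g95 lrz-32
lx64a2.cos.lrz-muenchen.de lrz g95 lrz-64
EOF
}
```

A host that is not in the table yields an empty list, so only the default script would apply. One plausible way to use the result, assuming the scripts sit in the current directory, is `for s in $(scripts_for_host "$hostname"); do . "./$s"; done`.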
15environments: scripts
- The work is done in the scripts.
- default
    export GSL_INCL=-I/usr/include
    export GSL_LIBS=-L/usr/lib
    export IMAGEMAGICK_INCL=-I/usr/include/
    export IMAGEMAGICK_LIBS=-L/usr/lib/
    export FC='gfortran -std=gnu -fno-second-underscore'
    export F_PORTABILITY_FLAGS=-DPLANCK_GFORTRAN
    export F_COMMONFLAGS='-W -Wall -Wno-uninitialized -Wno-unused -O2 -Wfatal-errors $(F_PORTABILITY_FLAGS)'
    export FCFLAGS='-c $(F_COMMONFLAGS) -I$(INCDIR)'
    export CC=gcc
    export CCFLAGS_NO_C='-W -Wall -I$(INCDIR) $(GSL_INCL) $(IMAGEMAGICK_INCL) -fno-strict-aliasing -O2 -g0 -s -ffast-math'
    export CCFLAGS='$(CCFLAGS_NO_C) -c'
- lrz
- New scripts may need to be written for new hosts.
- Defaults work in most cases.
- Cooperates with modules.
- Defaults can be overridden.
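The override mechanism can be illustrated with a short sketch in which shell functions stand in for the script files. The contents of the `g95` override are an assumption made for illustration; the slides only show that a script of that name is listed for the LRZ hosts.

```shell
# Stand-in for the "default" script (two settings taken from the slide).
env_default() {
    export CC=gcc
    export FC='gfortran -std=gnu -fno-second-underscore'
}

# Hypothetical override for hosts that provide g95 instead of gfortran.
env_g95() {
    export FC=g95
}

# Apply the defaults first, then each host-specific script in order,
# so that later settings override earlier ones.
apply_env() {
    env_default
    for name in "$@"; do
        "env_$name"
    done
}

apply_env g95
echo "CC=$CC FC=$FC"
```

Because the defaults are applied first, a host-specific script only needs to state what differs on that host.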
16Status
- ca. 23 AGD hosts and 9 DGI hosts are accessible
- F90 build of Clusterfinder successful on 22 hosts (70%)
- Some of the problems experienced:
  - difficulty finding FQDNs of resources, hosts listed by mistake
  - gsissh disabled
  - default job factory type disabled for globusrun-ws
  - no gsiscp installed, or unexpected default ports
  - svn not installed, too old, or not allowed to make connections
  - shell not bash, .profile not processed for batch jobs
  - file quota too small
  - some hosts (lx3264ia1 at LRZ) share a file system
  - no F90 compiler installed, or hard to find
  - deep changes in grid-modules are hard to update
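Several of these problems (missing svn, missing or hard-to-find F90 compiler) can be detected before a build is attempted. The following is a minimal sketch of such a check; the helper name `first_available` is hypothetical, and which commands to probe for on a given host is an assumption.

```shell
# Hypothetical helper: print the first command from the argument list
# that is available on this host; fail if none of them is.
first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}
```

For example, `fc=$(first_available gfortran g95) || echo "no F90 compiler found"` and `first_available svn >/dev/null || echo "svn not installed"` give an early, readable diagnosis on a new host.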
18Conclusions
- Clusterfinder has been deployed on many hosts using a prototype deployment system that is easily extendable to many users and many applications.
- The system handles diversity without standing in the way of defining standards.
- AGD should use this system or decide on something better, but should not diverge.