1
SEE-GRID Site installation and configuration
SEE-GRID-2 Training Event, Tirana, Albania, 2 April 2008
Antun Balaz, WP3 Leader, Institute of Physics, Belgrade, antun_at_phy.bg.ac.yu
Emanouil Atanassov, Institute for Parallel Processing, BAS, emanouil_at_parallel.bas.bg
2
Outline
  • Decisions about site configuration
  • OS installation, NTP, firewall issues
  • Java installation
  • Shared storage
  • Middleware installation
  • Configuration
  • Adding MPI support
  • APEL, Accounting configuration

3
Decisions about site configuration
  • Operating system: SL 4.x recommended; SLC,
    CentOS and RHEL are compatible
  • 32 or 64 bit: 64 bit is the future
  • Required service nodes: CE, SE (DPM
    recommended), MON
  • WNs can be 32 or 64 bit, with a preference for 64
  • Virtualization is the recommended way to
    combine several nodes on one physical machine
  • An SL5 host running Xen with SL4 para- or fully
    virtualized guests is a usable configuration
  • The UI must be close to the user

4
Decisions about site configuration
  • Storage: 1 TB for the SE
  • RAM: at least 1 GB per job/core
  • Internal networking: the goal should be to have
    all WNs on a single 1 Gbps switch
  • External networking: the more the better
  • Firewalls
  • Avoid NATed worker nodes
  • Service nodes MUST have public IPs, and DNS
    resolution MUST work for them both ways
    (forward and reverse)
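
The "both ways" DNS requirement can be checked before installing any middleware. The following is a minimal sketch, assuming getent is available on the node; the function name dns_both_ways is made up for illustration. Note that getent also consults /etc/hosts, so a local entry can hide a missing DNS record.

```shell
# Sketch: check that a host name resolves to an address and that the
# address resolves back to the same name.
dns_both_ways() {
    fqdn=$1
    # Forward lookup: the first field of the first answer is the address.
    ip=$(getent hosts "$fqdn" | awk 'NR == 1 { print $1 }')
    [ -n "$ip" ] || { echo "no forward record for $fqdn"; return 1; }
    # Reverse lookup: the second field is the canonical name.
    name=$(getent hosts "$ip" | awk 'NR == 1 { print $2 }')
    if [ "$name" = "$fqdn" ]; then
        echo "OK $fqdn <-> $ip"
    else
        echo "MISMATCH $fqdn -> $ip -> ${name:-nothing}"
        return 1
    fi
}
```

Run it once per service node, for example dns_both_ways followed by the FQDN of the CE, and repeat from a machine outside the site to confirm the public view.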

5
OS installation
  • Install the latest SL 4.x (for some node types
    3.0.x)
  • Keep the WNs homogeneous (cloning)
  • Be generous: install development packages and
    compilers
  • Yum is recommended over apt-get because of its
    multi-arch support (x86_64/i386)
  • Locate a reliable, nearby NTP server for time
    synchronization!
  • Enable the DAG repository
  • Do not allow automatic upgrades from the
    middleware repositories

6
Java installation
  • Use the latest Java 1.5. Follow the advice from
  • https://twiki.cern.ch/twiki/bin/view/EGEE/GLite31JPackage
  • or
  • http://wiki.egee-see.org/index.php/SL4_WN_glite-3.1

7
Middleware installation
  • Add the CA repository as shown in
  • http://grid-deployment.web.cern.ch/grid-deployment/yaim/repos/lcg-CA.repo
  • Install with: yum install lcg-CA
  • Pick the right repository:
  • https://twiki.cern.ch/twiki/bin/view/LCG/GenericInstallGuide310Updates
  • Install the SEE-GRID VOMS server rpms (and any
    additional rpms for additional VOs). Currently:
  • http://www.irb.hr/users/vvidic/seegrid/seegrid-0.5-1.noarch.rpm
  • http://www.grid.auth.gr/services/voms/SEE/GridAUTH-vomscert-1.2-5.noarch.rpm
  • Install the gLite middleware with yum, using the
    right target:
  • CE: lcg-CE glite-TORQUE_utils glite-TORQUE_server
    BDII_site
  • SE (DPM): glite-SE_dpm_mysql
  • MON: glite-MON
  • WN: glite-WN glite-TORQUE_client
  • UI: glite-UI
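
The node-type-to-target list above lends itself to a small lookup helper, so that the same script can be used on every machine. This is only a sketch: the function names yum_targets and install_cmd are invented here, and the helper prints the yum command instead of running it.

```shell
# Map a node type (labels as on the slide) to its yum install targets.
yum_targets() {
    case "$1" in
        CE)  echo "lcg-CE glite-TORQUE_utils glite-TORQUE_server BDII_site" ;;
        SE)  echo "glite-SE_dpm_mysql" ;;
        MON) echo "glite-MON" ;;
        WN)  echo "glite-WN glite-TORQUE_client" ;;
        UI)  echo "glite-UI" ;;
        *)   echo "unknown node type: $1" >&2; return 1 ;;
    esac
}

# Print (do not execute) the install command for one node type.
install_cmd() {
    targets=$(yum_targets "$1") || return 1
    echo "yum install $targets"
}
```

Calling install_cmd WN prints the command for a worker node; review it, then run it by hand as root.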

8
Middleware configuration
  • Configuration consists mostly of editing several
    configuration files, which you can steal from
    http://glite.phy.bg.ac.yu/GLITE-3/AEGIS/
  • Ideally these files should be the same on all
    your nodes.
  • Be careful with MON_HOST and REG_HOST. MON_HOST
    is the FQDN of your MON box. REG_HOST should be
    gserv1.ipp.acad.bg for SEE-GRID-only sites, but
    lcgic01.gridpp.rl.ac.uk for EGEE sites.
  • Configuration is done with one single command:
  • /opt/glite/yaim/bin/yaim -c -s site-info.def -n <node_type1> -n <node_type2>
  • where you list node types as in
  • /opt/glite/yaim/bin/yaim -c -s site-info.def -n lcg-CE -n TORQUE_server -n TORQUE_utils
  • IMPORTANT: More than one service node on the same
    logical computer is not supported and may result
    in a severe headache.
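
Since the yaim call differs between nodes only in its list of -n arguments, the command line can be assembled from a list of node types. A minimal sketch follows; yaim_cmd is a made-up helper name, and it only prints the command so it can be checked before running.

```shell
# Build the yaim configuration command for the given node types.
yaim_cmd() {
    [ "$#" -gt 0 ] || { echo "usage: yaim_cmd NODE_TYPE..." >&2; return 1; }
    cmd="/opt/glite/yaim/bin/yaim -c -s site-info.def"
    for node in "$@"; do
        cmd="$cmd -n $node"
    done
    echo "$cmd"
}
```

For the CE example above this would be called as yaim_cmd lcg-CE TORQUE_server TORQUE_utils.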

9
Patching the configuration
  • Sometimes the resulting configuration has known
    problems. Then we manually patch the holes.
  • Examples: pool users must be able to ssh from WN
    to CE without a password (and without annoying
    warning messages)
  • The info provider on the CE uses the maui diagnose
    command. This is better avoided.
  • Timeouts in the info provider may have to be
    increased
  • On WNs, cat /var/spool/pbs/mom_priv/config should
    return:
  • $pbsserver CEFQDN
  • $restricted CEFQDN
  • $logevent 255
  • For pbs, a line could be added to use NFS instead
    of scp.
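
The expected contents of mom_priv/config can be verified on every WN with a short script rather than by eye. The sketch below assumes the three directives listed above and accepts them with or without the leading $ that Torque directives normally carry; the function names has_directive and check_mom_config are hypothetical.

```shell
# Succeed if FILE contains a line "KEY VALUE" (optionally "$KEY VALUE").
has_directive() {  # usage: has_directive FILE KEY VALUE
    awk -v k="$2" -v v="$3" '
        { sub(/^\$/, "", $1) }              # drop a leading "$" from the key
        $1 == k && $2 == v { found = 1 }
        END { exit !found }
    ' "$1"
}

# Report every expected directive that is missing; non-zero exit if any is.
check_mom_config() {  # usage: check_mom_config FILE CE_FQDN
    rc=0
    has_directive "$1" pbsserver  "$2" || { echo "missing: pbsserver $2";  rc=1; }
    has_directive "$1" restricted "$2" || { echo "missing: restricted $2"; rc=1; }
    has_directive "$1" logevent   255  || { echo "missing: logevent 255";  rc=1; }
    return $rc
}
```

On a worker node it would be invoked as check_mom_config /var/spool/pbs/mom_priv/config followed by the FQDN of the CE.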

10
Maui and qmgr configuration
  • Recommendation: hard limits on the queues are
    best imposed with the qmgr command:
  • Qmgr> set queue seegrid max_running=21
  • Necessary changes in qmgr for MPI support: if
    you have the default setting of max 48 hours CPU
    time, MPI jobs taking more than 48 hours of total
    CPU time will be aborted. We suggest setting CPU
    time to much more than 48 hours and relying on
    wall clock time to impose a reasonable limit.
    Example:
  • Qmgr> set queue seegrid resources_max.cput=17280000
  • to enable MPI jobs taking up to 200 CPU days,
    but
  • Qmgr> set queue seegrid resources_max.walltime=259200
  • to allow up to 3 days of wall clock usage.
  • In maui you can make reservations for specific
    groups (VOs) and
  • users (by DN, if you have added the patched
    torque-submitter script, which also improves MPI
    support; see later).
  • Necessary changes in maui for MPI support:
  • ENABLEMULTIREQJOBS TRUE
11
iptables configuration
  • Outbound connectivity: unlimited.
  • Inbound: usually required to 20000-25000 TCP.
  • For MON box: 8443, 8088, 2135 (or 2170), 2136.
  • For CE: 2170, 2135, 2119, 2811
  • For SE: 2170, 2811, 8443
  • If you suspect a firewall problem on a service
    node, look at
  • netstat -anlp | grep LISTEN
  • to determine which ports have daemons
    listening.
  • The UI can be under NAT, but this prevents some
    useful commands from working.

12
iptables configuration
  • Outbound connectivity: unlimited.
  • Public inbound connectivity: usually required to
    20000-25000 TCP.
  • For MON box: 8443, 8088, 2135, 2136.
  • For CE: 2170, 2119, 2811
  • For SE: 2170, 2811, 8443
  • Intra-cluster: difficult to manage effectively.
  • If you suspect a firewall problem, look at
  • netstat -anlp | grep LISTEN
  • to determine which ports have daemons
    listening.
  • Control ssh access: penetration usually happens
    because of weak admin or user passwords. Ideally
    replace password access with private key access.
    Teach users not to use unprotected private keys:
    hackers are looking for these.
  • UIs are a security nightmare (but can be
    installed on SL 4.x VMware virtual machines
    behind NAT on the user's laptop).
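
The per-node port lists can be turned into iptables rules mechanically. The following sketch prints ACCEPT rules for one node type, using the port lists from this slide plus the 20000-25000 TCP range; it prints the rules instead of applying them. The function names are made up, and appending to the INPUT chain is an assumption about the local firewall layout.

```shell
# Inbound TCP ports per service node type, as listed on the slide.
ports_for() {
    case "$1" in
        MON) echo "8443 8088 2135 2136" ;;
        CE)  echo "2170 2119 2811" ;;
        SE)  echo "2170 2811 8443" ;;
        *)   echo "unknown node type: $1" >&2; return 1 ;;
    esac
}

# Print iptables commands opening the inbound ports for one node type.
iptables_rules() {
    ports=$(ports_for "$1") || return 1
    echo "iptables -A INPUT -p tcp --dport 20000:25000 -j ACCEPT"
    for p in $ports; do
        echo "iptables -A INPUT -p tcp --dport $p -j ACCEPT"
    done
}
```

Review the output of iptables_rules CE (or MON, SE) against what netstat reports as listening before applying anything.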

13
Shared filesystem
  • Home directories must be shared between CE and
    WNs for good MPI support (read: if you are
    serious about MPI, they must be shared).
  • The /opt/exp_soft directory must be exported to
    the worker nodes (all sites). Appropriate
    permissions must be set (read/write for SGM
    accounts, read for user accounts).
  • Other POSIX-compliant shared filesystems are also
    possible

14
MPI support
  • MPI support usually requires pool users to be
    able to ssh between WNs without a password.
    Mpiexec can avoid that, but users have problems
    with mpiexec.
  • A cron script that kills runaway processes
    (processes run by users that do not have an
    active job on that node) must be in place.
  • Jobs run with mpiexec produce correct accounting
    (but are killed by the batch system if they go
    above the max CPU time limit for the queue).
    Solution: set max CPU time much higher than wall
    clock time.
  • Jobs run with mpich2 also result in correct
    accounting, and can be run across sites (tested
    in SEE-GRID)!
  • For WAN MPI support some new protocols are more
    promising.
  • MPI can be based on SCTP instead of TCP (some
    success, but requires some changes in site
    configuration).
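
The core of the runaway-process cron script mentioned above is a comparison of two user lists: users with an active job on the node, and users that own processes there. The sketch below covers only that comparison and kills nothing. How the two lists are produced is site specific and left out; the function name and the input file format (one user name per line, pool accounts only) are assumptions.

```shell
# Print users that own processes on this node but have no active job here.
#   $1: file with users that currently have a job on the node
#   $2: file with users that own at least one process on the node
runaway_users() {
    awk '
        FILENAME == ARGV[1] { active[$1] = 1; next }   # first file: job owners
        NF && !($1 in active) && !seen[$1]++ { print $1 }
    ' "$1" "$2"
}
```

The process list must exclude root and other system accounts before it is passed in, otherwise they would be reported as runaway.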

15
MPI support
  • EGEE is moving towards standardized use of
    mpi-start
  • http://glite.web.cern.ch/glite/packages/R3.1/deployment/glite-MPI_utils/3.1.1-0/glite-MPI_utils-3.1.1-0-update.html
  • http://www.grid.ie/mpi/wiki/YaimConfig
  • [glite-MPI_utils]
  • name=glite 3.1 MPI
  • enabled=1
  • gpgcheck=0
  • baseurl=http://glitesoft.cern.ch/EGEE/gLite/R3.1/glite-MPI_utils/sl4/i386/
  • yum install glite-MPI_utils
  • Configure becomes
  • /opt/glite/yaim/bin/yaim -c -s site-info.def -n MPI_CE
  • /opt/glite/yaim/bin/yaim -c -s site-info.def -n MPI_WN -n glite-WN -n TORQUE_client
  • Use a submit filter for torque
  • Edit /var/spool/pbs/torque.cfg and add
  • SUBMITFILTER /var/spool/pbs/submit_filter.pl
  • steal mine from ce002.ipp.acad.bg:
  • globus-url-copy gsiftp://ce002.ipp.acad.bg/var/spool/pbs/submit_filter.pl file:///tmp/submit_filter.pl
  • My advice: do not allow LAM for grid jobs

16
MPI support
  • Changes in site-info.def:
  • jobmanager=pbs
  • CE_BATCH_SYS=pbs (not torque!)
  • Add
  • MPI-START
  • MPICH
  • MPICH-1.2.7
  • MPICH2
  • MPICH2-1.0.4
  • OPENMPI
  • OPENMPI-1.1
  • just after R-GMA for the
    GlueHostSoftwareEnvironment.

17
Accounting issues
  • Install the MON box on SL 3.0.x
  • Make sure the LcgRecords table has the correct
    format (on old installations some fields need to
    be made wider).
  • If using pbs (shared home dirs), change pbs.pm on
    the CE:
  • http://glite.phy.bg.ac.yu/GLITE-3/AEGIS/pbs.pm
  • Make sure ports 8088 and 8443 are open.
  • After installation and configuration of MON box,
