Transcript and Presenter's Notes

Title: FermiGrid


1
FermiGrid
  • Steven Timm
  • Fermilab
  • Computing Division
  • Fermilab Grid Support Center

2
People
  • FermiGrid Operations Team
  • Keith Chadwick (CD/CCF/FTP), Project Leader
  • Steve Timm (CD/CSS/FCS), Linux OS Support
  • Dan Yocum (CD/CCF/FTP), Application Support
  • Thanks to
  • Condor Team: M. Livny, J. Frey, A. Roy, and many
    others.
  • Globus developers: C. Bacon, S. Martin.
  • GridX1: R. Walker, D. Vanderster, et al.
  • Fermilab grid developers: G. Garzoglio, T.
    Levshina.
  • Representatives of the following OSG Virtual
    Organizations: CDF, DZERO, USCMS, DES, SDSS,
    FERMILAB, I2U2, NANOHUB, GADU.
  • FermiGrid Web Site / Additional Documentation:
  • http://fermigrid.fnal.gov/

3
FCC - Feynman Computing Center
4
Fermilab Grid Computing Center
5
Computing at Fermilab
  • Reconstruction and analysis of data for High
    Energy Physics Experiments
  • > 4 petabytes on tape
  • Fast I/O to read files, many hours of computing,
    fast I/O to write
  • Each job independent of other jobs.
  • Simulation for future experiments (CMS at CERN)
  • In two years, need to scale to >50K jobs/day
  • Each big experiment has independent cluster or
    clusters
  • Diverse file systems, batch systems, management
    methods.
  • More than 3000 dual-processor Linux systems in
    all

6
FermiGrid Project
  • FermiGrid is a meta-facility established by
    Fermilab Computing Division
  • Four elements
  • Common Site Grid Services
  • Virtual Organization hosting (VOMS, VOMRS),
    Site-wide Globus GRAM gateway, Site
    AuthoriZation, MyProxy, GUMS.
  • Bi-lateral Interoperability between various
    experimental stakeholders
  • Interfaces to the Open Science Grid
  • Grid interfaces to mass storage systems.

7
(No Transcript)
8
Hardware
  • Dell 2850 servers with dual 3.6 GHz Xeons, 4 GB
    of memory, 1000TX, hardware RAID, Scientific
    Linux 3.0.4, VDT 1.3.9
  • FermiGrid1: Site-wide Globus gateway
  • FermiGrid2: Site-wide VOMS and VOMRS server
  • FermiGrid3: Site-wide GUMS server
  • FermiGrid4: MyProxy server and Site AuthoriZation
    server
9
(No Transcript)
10
Site Wide Gateway Technique
  • This technique is closely adapted from a
    technique first used at GridX1 in Canada to
    forward jobs from the LCG into their clusters.
  • We begin by creating a new Job Manager script in
    $VDT_LOCATION/globus/lib/perl/Globus/GRAM/JobManager/condorg.pm
  • This script takes incoming jobs and resubmits
    them to Condor-G on fermigrid1
  • Condor matchmaking is used so that the jobs will
    be forwarded to the member cluster with the most
    open slots.
  • Each member cluster runs a cron job every five
    minutes to generate a ClassAD for their cluster.
    This is sent to fermigrid1 using
    condor_advertise.
  • Credentials to successfully forward the job are
    obtained in the following manner (sketched in the
    commands after this list):
  • User obtains a voms-qualified proxy in the normal
    fashion with voms-proxy-init
  • User sets X509_USER_CERT and X509_USER_KEY to
    point to the proxy instead of the usercert.pem
    and userkey.pem files
  • User uses myproxy-init to store the credentials
    on the Fermilab MyProxy server, myproxy.fnal.gov
  • jobmanager-condorg, which runs as the UID that
    the job will run under on FermiGrid, executes a
    myproxy-get-delegation to get a proxy with full
    rights to resubmit the job.
  • Documentation of the steps a user must follow is
    in the FermiGrid User Guide:
    http://fermigrid.fnal.gov/user-guide.html
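  A minimal command-line sketch of the credential sequence above (the VO name
  and the proxy file path are illustrative, not taken from the talk):
    # Obtain a VOMS-qualified proxy (example VO name)
    voms-proxy-init -voms fermilab
    # Point the X509 environment variables at the proxy instead of
    # usercert.pem/userkey.pem (the default proxy location is shown)
    export X509_USER_CERT=/tmp/x509up_u$(id -u)
    export X509_USER_KEY=/tmp/x509up_u$(id -u)
    # Store the credentials on the Fermilab MyProxy server
    myproxy-init -s myproxy.fnal.gov
    # On fermigrid1, jobmanager-condorg later retrieves a proxy with full
    # rights, e.g. with:  myproxy-get-delegation -s myproxy.fnal.gov -l <user>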

11
(No Transcript)
12
(No Transcript)
13
OSG Interfaces for Fermilab
  • Four Fermilab clusters are directly accessible to
    OSG right now:
  • General Purpose Grid Cluster (FNAL_GPFARM)
  • US CMS Tier 1 Cluster (USCMS_FNAL_WC1_CE)
  • LQCD cluster (FNAL_LQCD)
  • SDSS cluster (SDSS_TAM)
  • Two more clusters (CDF) are accessible only
    through the FermiGrid site gateway.
  • Future Fermilab clusters will also be accessible
    only through the FermiGrid site gateway.
  • A shell script is used to make a Condor ClassAd
    and send it with condor_advertise (a sketch
    follows this list).
  • Matching is done based on the number of free CPUs
    and the number of waiting jobs.
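  A hypothetical version of that advertising script (the ClassAd attribute
  names and the cluster name are illustrative; the real schema is not shown in
  this talk):
    #!/bin/sh
    # Run from cron every five minutes on each member cluster head node.
    AD=/tmp/fermigrid_ad.$$
    FREE=$(condor_status -avail -format "%s\n" Name | wc -l)
    WAITING=$(condor_q -constraint 'JobStatus == 1' -format "%d\n" ClusterId | wc -l)
    {
      echo 'MyType = "Machine"'
      echo 'Name = "fnal_gpfarm"'
      echo "FreeCpus = $FREE"
      echo "WaitingJobs = $WAITING"
    } > $AD
    # Send the ad to the Condor collector on the site gateway
    condor_advertise -pool fermigrid1.fnal.gov UPDATE_STARTD_AD $AD
    rm -f $AD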

14
OSG Requirements
  • OSG job flow (sketched after this list):
  • User pre-stages applications and data via
    gridftp/srmcp to shared areas on the cluster (can
    be NFS or an SRM-based storage element).
  • User submits a set of jobs to cluster
  • Jobs take applications and data from cluster-wide
    shared directories.
  • Results are written to local storage on cluster,
    then transferred across WAN
  • Most OSG jobs expect common shared disk areas for
    applications, data, and user home directories.
    Our clusters do not currently share these areas.
  • Most OSG jobs don't use MyProxy in the submission
    sequence.
  • OSG makes use of monitoring to detect free
    resources; ours are not currently reported
    correctly.
  • Need to make the gateway transparent to the OSG
    so it looks like any other OSG resource. Right
    now it only reports 4 CPUs.
  • Want to add possibility for VO affinity to the
    classad advertising of the gateway.
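  A sketch of that job flow from the user side (host names and paths are
  illustrative):
    # Pre-stage the application and data to the cluster's shared area
    globus-url-copy file:///home/me/myapp.tar.gz \
        gsiftp://fnal-gpfarm-head.fnal.gov/grid/app/myvo/myapp.tar.gz
    # Run a job through the cluster's GRAM gatekeeper
    globus-job-run fnal-gpfarm-head.fnal.gov/jobmanager-condor \
        /grid/app/myvo/run_analysis.sh
    # Transfer the results back across the WAN afterwards
    globus-url-copy gsiftp://fnal-gpfarm-head.fnal.gov/grid/data/myvo/out.tar.gz \
        file:///home/me/out.tar.gz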

15
(No Transcript)
16
Shared data areas and storage elements
  • At the moment OSG requires shared Application and
    Data areas
  • Also needed: a shared home directory area for all
    users (FermiGrid has 226).
  • It is planned to use a BlueArc NAS appliance to
    serve these to all the member clusters of
    FermiGrid. 24 TB of disk is in the process of
    being ordered; the NAS head is already in hand.
  • Also being commissioned: a shared volatile
    Storage Element for FermiGrid that supports
    SRM/dCache access for all grid users.

17
Getting rid of MyProxy
  • Configure each individual cluster gatekeeper to
    accept a restricted Globus proxy from just one
    host, the site gateway.
  • On the CDF clusters, for example, the gatekeeper
    is already restricted via tcp-wrappers to refuse
    any connections from off-site. It could be
    restricted further to take connections only from
    the GlideCAF head node and fermigrid1 (see the
    sketch below).
  • Then change the gatekeeper configuration to call
    it with the accept_limited option; we would then
    be able to forward jobs without MyProxy and could
    call this jobmanager-condor rather than
    jobmanager-condorg. This has been tested in our
    test cluster and will move to production soon.
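  A hypothetical tcp-wrappers restriction of the kind described above (the
  service name depends on how the gatekeeper is started from xinetd, and the
  GlideCAF host name is illustrative):
    # /etc/hosts.allow
    globus-gatekeeper: fermigrid1.fnal.gov, glidecaf-head.fnal.gov
    # /etc/hosts.deny
    globus-gatekeeper: ALL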

18
Reporting all resources
  • MonALISA: just need a unified Ganglia view of all
    of FermiGrid and MonALISA will show the right
    number of CPUs, etc. Also make it so that
    MonALISA queries all Condor pools in FermiGrid (a
    quick check is sketched below).
  • GridCat/ACDC: have to change the Condor
    subroutines in MIS-CI to get the right total
    number of CPUs from the cluster ClassAds. Fairly
    straightforward.
  • GIP: need to change the lcg-info-dynamic-condor
    script to report the right number of job slots
    per VO. We already had to do this once; not
    difficult.
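  A quick, hypothetical sanity check of what each member pool currently
  reports (pool host names are illustrative):
    # Count the slots each Condor pool's collector advertises right now
    for pool in fermigrid1.fnal.gov fnal-gpfarm.fnal.gov fnal-lqcd.fnal.gov; do
        echo -n "$pool: "
        condor_status -pool $pool -format "%s\n" Name | wc -l
    done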

19
Globus Gatekeeper Calls
20
VOMS access
21
GUMS user mappings