1
Grid Laboratory Of Wisconsin (GLOW)
UW Madison's Campus Grid
Dan Bradley, Department of Physics / CS
Representing the GLOW, Condor, and DISUN Teams
2
Grid Laboratory of Wisconsin
2003 initiative funded by NSF/UW
Six initial GLOW sites:
  • Computational Genomics, Chemistry
  • Amanda, Ice-cube, Space Science
  • CMS, High Energy Physics
  • Materials by Design, Chemical Engineering
  • Radiation Therapy, Medical Physics
  • Condor, Computer Science
  • Plus New Members
  • ATLAS, High Energy Physics
  • Plasma Physics
  • Multiscalar, Computer Science

Diverse users with different deadlines and usage
patterns → high CPU utilization.
3
UW Madison Campus Grid
  • Condor pools in various departments, made
    accessible via Condor flocking
  • Users submit jobs to their own private or
    department Condor scheduler (a submit-file sketch
    follows this list).
  • Jobs are dynamically matched to available
    machines.
  • Crosses multiple administrative domains.
  • No common uid-space across campus.
  • No cross-campus NFS for file access.
  • Users rely on Condor remote I/O, file-staging,
    AFS, SRM, gridftp, etc.
  • Must deal with firewalls.
  • Need to interoperate with outside collaborators.
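For concreteness, a minimal sketch of the kind of submit description
file a user might hand to condor_submit on their own department
schedd; the executable and data file names are hypothetical, and file
transfer is used because there is no cross-campus NFS:

    # Hypothetical vanilla-universe job; Condor matches it to any
    # available machine in the department, GLOW, or CS pools.
    universe                = vanilla
    executable              = analyze        # hypothetical program
    arguments               = run01.dat
    transfer_input_files    = run01.dat      # stage input explicitly
    should_transfer_files   = YES            # no campus-wide NFS
    when_to_transfer_output = ON_EXIT
    output                  = run01.out
    error                   = run01.err
    log                     = run01.log
    queue

Once submitted, matchmaking and flocking decide where the job actually
runs; the user never has to name a target cluster.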

4
UW Campus Grid Machines
  • The GLOW Condor pool is distributed across the
    campus to provide locality for the big users.
  • 1200 2.8 GHz Xeon CPUs
  • 400 1.8 GHz Opteron cores
  • 100 TB disk
  • Computer Science Condor pool
  • 1000 1 GHz CPUs
  • testbed for new Condor releases
  • Other private pools
  • job submission and execution
  • private storage space
  • excess jobs flock to GLOW and CS pools

5
What About The Grid?
  • Who needs a campus grid?
  • Why not have each cluster join The Grid
    independently?

6
The Value of Campus Scale
  • simplicity: software stack is just Linux + Condor
  • fluidity: a high common denominator makes sharing
    easier and provides a richer feature set
  • collective buying power: we speak to vendors with
    one voice
  • standardized administration: e.g. GLOW uses one
    centralized cfengine
  • synergy: face-to-face technical meetings; the
    mailing list scales well at campus level

7
The Value of the Big G
  • Our users want to collaborate outside the bounds
    of the campus (e.g. ATLAS and CMS are
    international).
  • We don't want to be limited to sharing resources
    with people who have made identical technological
    choices.
  • The Open Science Grid gives us the opportunity to
    operate at both scales, which is ideal.

8
On the OSG Map
Any GLOW member is free to link their
resources to other grids.
(OSG map: facility WISC, site UWMadisonCMS)
9
Submitting Jobs within UW Campus Grid
(Diagram: a UW HEP user submits to the HEP matchmaker; jobs flock to
the CS, GLOW, and NCSA matchmakers and beyond.)
  • Supports the full feature set of Condor (a
    standard-universe submit sketch follows this list)
  • matchmaking
  • remote system calls
  • checkpointing
  • MPI
  • suspension VMs
  • preemption policies
  • computing on demand

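The remote system calls and checkpointing above come from Condor's
standard universe; a hedged sketch (the program name is hypothetical,
and the binary must first be relinked with condor_compile):

    # Hypothetical standard-universe job: remote system calls do I/O
    # back at the submit machine; checkpointing lets the job migrate
    # when a machine is reclaimed.
    universe   = standard
    executable = sim.exe            # built with condor_compile
    output     = sim.$(Process).out
    error      = sim.$(Process).err
    log        = sim.log
    queue 100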
10
Submitting Jobs through OSG to UW Campus Grid
(Diagram: an Open Science Grid user submits jobs into the UW campus
grid through its OSG interface; a hedged submit sketch follows.)
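One hedged sketch of how an outside OSG user can target the campus
grid: a Condor-G style grid-universe job aimed at a Globus (GT2)
gatekeeper that feeds the local Condor pool. The gatekeeper hostname
and program name are hypothetical:

    # Hypothetical Condor-G submission from an OSG site into UW
    universe      = grid
    grid_resource = gt2 gatekeeper.glow.wisc.edu/jobmanager-condor
    executable    = analyze                  # hypothetical program
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output = osg_run.out
    error  = osg_run.err
    log    = osg_run.log
    queue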
11
Routing Jobs from UW Campus Grid to OSG
(Diagram: a GLOW user submits locally; the HEP, CS, and GLOW
matchmakers are linked, and a Grid JobRouter transforms eligible jobs
into grid jobs bound for OSG.)
  • Combining both worlds
  • simple, feature-rich local mode
  • when possible, transform to a grid job for
    traveling globally (a JobRouter config sketch
    follows this list)
  • using the OSG Registration Authority within the
    DOE Certification Authority
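A hedged sketch of what one JobRouter route might look like in
condor_config; the route name and gatekeeper are placeholders, and the
exact knobs varied across Condor releases of this era:

    # Run the JobRouter alongside the schedd
    DAEMON_LIST = $(DAEMON_LIST), JOB_ROUTER

    # One candidate route: transform eligible local jobs into grid
    # jobs bound for a hypothetical OSG gatekeeper
    JOB_ROUTER_ENTRIES = \
       [ name = "osg-example-route"; \
         GridResource = "gt2 gatekeeper.example-osg-site.org/jobmanager-condor"; \
         MaxJobs = 200; \
         MaxIdleJobs = 10; \
       ]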

12
Condor Glidein
(Diagram: a GLOW user submits locally; the condor_gridmanager launches
glidein startds on remote grid resources, and those startds report
back to the home matchmaker.)
  • full Condor feature set on top of generic
    resources
  • GCB required to provide connectivity to worker
    nodes (a hedged glidein config sketch follows)

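In outline, a glidein is a grid job whose payload starts a
condor_startd configured to report home. A hedged fragment of the
configuration such a glidein startd might read (the collector hostname
is hypothetical; the GCB knobs needed for connectivity are omitted):

    # Read by the glidein's condor_master/condor_startd on the remote
    # worker node: advertise this slot back to the GLOW matchmaker.
    CONDOR_HOST = glow-cm.example.wisc.edu   # hypothetical home collector
    DAEMON_LIST = MASTER, STARTD
    START       = TRUE                        # accept GLOW jobs right away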
13
Interoperating from a Condor-centric point of view
  • Question: can we live in one Condor pool?
  • Appropriate scale: a few thousand CPUs (and
    counting!)
  • Need a port range through the firewall, or use GCB.
  • Agree on pool-wide policies (a config sketch
    follows this list)
  • user priorities, preemption, and fair share (e.g.
    immediate rights to one's own machines)
  • security policy: GSI, Kerberos, IP-based
  • Worker node policies may still be independent
    (e.g. requirements and ranking of jobs)
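A hedged condor_config fragment illustrating the kinds of pool-wide
agreements above; the port range, authentication methods, and the
custom job attribute are placeholders:

    # Confine Condor's ephemeral ports so one firewall rule covers them
    LOWPORT  = 9600
    HIGHPORT = 9700

    # Agree on authentication methods pool-wide
    SEC_DEFAULT_AUTHENTICATION_METHODS = GSI, KERBEROS

    # Startd policy: strongly prefer (and preempt in favor of) jobs
    # that advertise the hypothetical attribute +IsOwnerGroupJob = True
    # in their submit file, so owners get their machines back first
    RANK = 1000000 * (TARGET.IsOwnerGroupJob =?= True)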

14
Interoperating by Flocking
  • We don't want to merge pools, but we both run
    Condor, so we will flock.
  • Requires minimal change to existing configuration
    (a sketch of the flocking knobs follows this list).
  • More independence in pool policies.
  • Must still open firewalls or use GCB. (Current
    work on GCB aims to make this a less intrusive
    option.)
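A hedged sketch of the flocking knobs involved; all hostnames are
placeholders:

    # On the schedd that wants to flock out (a department submit node):
    FLOCK_TO = cm.cs.example.wisc.edu, cm.glow.example.wisc.edu

    # On the central manager and execute nodes of the pool being
    # flocked to (HOSTALLOW_WRITE in older Condor releases):
    FLOCK_FROM  = submit.physics.example.wisc.edu
    ALLOW_WRITE = $(ALLOW_WRITE), submit.physics.example.wisc.edu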

15
Interoperating through OSG
  • Are you a member of a Virtual Organization in the
    Open Science Grid?
  • If so, our GUMS service probably already
    recognizes you.
  • Collaborators within GLOW may grant you something
    better than opportunistic priority (e.g. CS is
    collaborating with CMS, ATLAS, NanoHub, ...).
  • Allocating storage is currently a trickier
    proposition.

16
A to-do list
  • Core Condor technologies
  • Scalability, interconnectivity, security
  • Exporting jobs to non-Condor resources
  • Mixed success so far. Can we get our non-HEP
    users on board too?
  • Dealing with data
  • We don't seem to be heading towards a common
    storage solution within our campus grid. Should
    we?