1. Grid Laboratory of Wisconsin (GLOW)
UW-Madison's Campus Grid
Dan Bradley, Department of Physics & CS
Representing the GLOW, Condor, and DISUN Teams
2. Grid Laboratory of Wisconsin
2003 initiative funded by NSF/UW. Six initial GLOW sites:
- Computational Genomics, Chemistry
- AMANDA, IceCube, Space Science
- CMS, High Energy Physics
- Materials by Design, Chemical Engineering
- Radiation Therapy, Medical Physics
- Condor, Computer Science
- Plus New Members
- ATLAS, High Energy Physics
- Plasma Physics
- Multiscalar, Computer Science
Diverse users with different deadlines and usage patterns → high CPU utilization.
3. UW-Madison Campus Grid
- Condor pools in various departments, made accessible via Condor flocking (a configuration sketch follows this list).
- Users submit jobs to their own private or department Condor scheduler.
- Jobs are dynamically matched to available machines.
- Crosses multiple administrative domains.
- No common uid-space across campus.
- No cross-campus NFS for file access.
- Users rely on Condor remote I/O, file staging, AFS, SRM, GridFTP, etc.
- Must deal with firewalls.
- Need to interoperate with outside collaborators.
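To illustrate the flocking arrangement above, a department submit machine only needs to name the central managers it is allowed to flock to. A minimal sketch of the submit-side Condor configuration, with hypothetical host names:

    ## condor_config.local on a department submit machine (host names are hypothetical)
    ## Jobs that cannot be matched locally flock to these central managers, in order.
    FLOCK_TO = cm.glow.wisc.edu, condor.cs.wisc.edu

The pools being flocked to must in turn authorize this scheduler; a sketch of that side appears with slide 14.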
4. UW Campus Grid Machines
- GLOW Condor pool is distributed across the campus to provide locality for big users.
  - 1200 2.8 GHz Xeon CPUs
  - 400 1.8 GHz Opteron cores
  - 100 TB disk
- Computer Science Condor pool
  - 1000 1 GHz CPUs
  - testbed for new Condor releases
- Other private pools
  - job submission and execution
  - private storage space
  - excess jobs flock to GLOW and CS pools
5. What About The Grid?
- Who needs a campus grid?
- Why not have each cluster join The Grid independently?
6. The Value of Campus Scale
- simplicity: software stack is just Linux + Condor
- fluidity: high common denominator makes sharing easier and provides a richer feature-set
- collective buying power: we speak to vendors with one voice
- standardized administration: e.g. GLOW uses one centralized cfengine
- synergy: face-to-face technical meetings; mailing list scales well at campus level
7. The Value of the Big G
- Our users want to collaborate outside the bounds of the campus (e.g. ATLAS and CMS are international).
- We don't want to be limited to sharing resources with people who have made identical technological choices.
- The Open Science Grid gives us the opportunity to operate at both scales, which is ideal.
8. On the OSG Map
Any GLOW member is free to link their resources to other grids.
(OSG map: facility WISC, site UWMadisonCMS.)
9. Submitting Jobs within UW Campus Grid
(Diagram: a UW HEP user's jobs go to the HEP matchmaker, flock to the CS and GLOW matchmakers, and beyond, e.g. an NCSA matchmaker.)
- Supports the full feature-set of Condor (a submit-file sketch follows this list):
  - matchmaking
  - remote system calls
  - checkpointing
  - MPI
  - suspension / VMs
  - preemption policies
  - computing on demand
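A sketch of what a user-level submission looks like; the file and executable names are made up, but the submit commands are standard Condor. The standard universe is what supplies the checkpointing and remote system calls listed above:

    # simulate.sub -- standard-universe job: checkpointing + remote system calls.
    # The executable must first be relinked with condor_compile; names are illustrative.
    universe   = standard
    executable = simulate
    arguments  = --seed 42
    output     = simulate.$(Cluster).$(Process).out
    error      = simulate.$(Cluster).$(Process).err
    log        = simulate.log
    queue 50

The user's own scheduler matches these jobs locally first and flocks them to the GLOW and CS pools when local machines are busy.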
10. Submitting Jobs through OSG to UW Campus Grid
(Diagram: an Open Science Grid user's jobs enter the UW campus grid through OSG.)
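A hedged sketch of how an outside OSG user might reach the campus grid with Condor-G: a grid-universe job aimed at a UW gatekeeper, which hands it to the Condor pool. The gatekeeper host name below is a placeholder, and a valid grid proxy is assumed to be in place:

    # osg-to-uw.sub -- Condor-G job sent through a UW-Madison gatekeeper
    # (gatekeeper host name is a placeholder)
    universe      = grid
    grid_resource = gt2 gatekeeper.hep.wisc.edu/jobmanager-condor
    executable    = analyze.sh
    transfer_executable = true
    output        = analyze.out
    error         = analyze.err
    log           = analyze.log
    queue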
11. Routing Jobs from UW Campus Grid to OSG
(Diagram: a GLOW user's jobs go to the HEP, CS, and GLOW matchmakers and, via the Grid JobRouter, out to OSG.)
- Combining both worlds:
  - simple, feature-rich local mode
  - when possible, transform into a grid job for traveling globally (a JobRouter configuration sketch follows this list)
  - using the OSG Registration Authority within the DOE Certification Authority
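A rough sketch of how a JobRouter route could be configured to turn idle local jobs into grid jobs bound for an OSG site. The route name, gatekeeper, and limits are placeholders, not GLOW's actual settings:

    ## condor_config on the submit host running the JobRouter (values are placeholders)
    DAEMON_LIST = $(DAEMON_LIST) JOB_ROUTER

    JOB_ROUTER_ENTRIES = \
      [ name = "OSG_Site_Example"; \
        GridResource = "gt2 osg-gw.example.edu/jobmanager-condor"; \
        Requirements = target.WantJobRouter is True; \
        MaxIdleJobs = 10; \
        MaxJobs = 200; \
      ]

With a route like this, a job opts in by setting +WantJobRouter = True in its submit file; jobs that stay local keep the full Condor feature set.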
12. Condor Glidein
(Diagram: the condor_gridmanager launches startds on remote grid resources; the glidein startds report back to the GLOW user's matchmaker.)
- full Condor feature set on top of generic resources (a glidein submit sketch follows this list)
- GCB required to provide connectivity to worker nodes
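One way to picture a glidein is as an ordinary grid job whose payload is a Condor startd. A sketch under that assumption; the wrapper script and gatekeeper name are hypothetical:

    # glidein.sub -- run a Condor startd on a remote grid resource as a grid job;
    # once started, the worker advertises itself to the home pool's matchmaker
    # (via GCB when the worker has no inbound connectivity).
    universe      = grid
    grid_resource = gt2 remote-gw.example.edu/jobmanager-pbs
    executable    = glidein_startup.sh          # hypothetical wrapper: unpacks Condor, starts the startd
    arguments     = -collector cm.glow.wisc.edu
    transfer_executable = true
    output        = glidein.$(Process).out
    error         = glidein.$(Process).err
    log           = glidein.log
    queue 20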
13. Interoperating from a Condor-centric point of view
- Question: can we live in one Condor pool?
- Appropriate scale: a few thousand CPUs (and counting!)
- Need a port range through the firewall, or use GCB.
- Agree on pool-wide policies (a configuration sketch follows this list):
  - user priorities, preemption, and fair share (e.g. immediate rights to one's own machines)
  - security policy: GSI, Kerberos, IP-based
- Worker node policies may still be independent (e.g. requirements and ranking of jobs).
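A sketch of the kinds of configuration knobs involved; the port range, user names, and thresholds are illustrative, not the actual GLOW policy:

    ## Worker-node / negotiator policy sketch (values are illustrative only)

    # Confine Condor's dynamically chosen ports to a range the campus firewall opens:
    LOWPORT  = 9600
    HIGHPORT = 9700

    # Give the owning group immediate rights to its own machines by ranking
    # its users' jobs above everyone else's (user names are hypothetical):
    RANK = (Owner == "hep_user1") || (Owner == "hep_user2")

    # Pool-wide fair share: preempt only when the incoming user's priority
    # is substantially better than the running user's:
    PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio * 1.2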
14. Interoperating by Flocking
- We don't want to merge pools, but we both run Condor, so we will flock (the accept-side configuration sketch follows this list).
- Requires minimal change to existing configuration.
- More independence in pool policies.
- Must still open firewalls or use GCB. (Current work on GCB aims to make this a less intrusive option.)
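The accept side of flocking is similarly small, complementing the submit-side FLOCK_TO sketch earlier. A sketch with hypothetical host names:

    ## On the central manager of the pool accepting flocked jobs (host names hypothetical)
    FLOCK_FROM = submit.physics.wisc.edu, submit.cs.wisc.edu

    ## Execute machines must also authorize the remote schedulers:
    ALLOW_WRITE = $(ALLOW_WRITE), submit.physics.wisc.edu, submit.cs.wisc.edu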
15. Interoperating through OSG
- Are you a member of a Virtual Organization in the Open Science Grid?
- If so, our GUMS service probably already recognizes you.
- Collaborators within GLOW may grant you something better than opportunistic priority (e.g. CS is collaborating with CMS, ATLAS, NanoHub, ...).
- Allocating storage is currently a trickier proposition.
16. A to-do list
- Core Condor technologies: scalability, interconnectivity, security.
- Exporting jobs to non-Condor resources: mixed success so far. Can we get our non-HEP users on board too?
- Dealing with data: we don't seem to be heading towards a common storage solution within our campus grid. Should we?