Title: The Globus Toolkit: A System Manager's Perspective
1. The Globus Toolkit: A System Manager's Perspective
- Bruce Beckles, Systems Manager
- Cambridge eScience Centre
2. Introduction
- Computational grids (or grid computing, the Grid, etc.) are a rapidly emerging development in the field of distributed computing.
- The Globus Toolkit is one of the principal software toolkits used to build and develop computational grid infrastructure (usually described as middleware); there are others.
- I'll briefly review a few key concepts for the benefit of those new to this field.
3. What is a Computational Grid?
- Modelled on the electricity power grid: computing resources on tap
- Logically and geographically distinct (distributed) computing resources
- Dynamic resource location and allocation
- Single sign-on (one hopes!)
- Homogeneous or heterogeneous in composition
4. Virtual Organisations (VOs)
- Key concept: Virtual Organisations (VOs)
- Simple definition: a group of disparate individuals coming together for some common purpose
- Typically VOs cross institutional boundaries and have a finite lifetime
- Examples:
  - Research groups/projects involving multiple organisations/institutions
  - Disaster relief/containment teams
  - Safety/QA teams
- In the Grid paradigm, authorisation, resource allocation and administration are all done around the concept of VOs.
5. How VOs are used in a grid environment (1)
- As we will see later, the authorisation mechanism in the Globus Toolkit is very simple and extremely inflexible.
- Using the VO concept, various solutions such as the Community Authorisation Service (CAS) and the EU DataGrid's Virtual Organization Membership Service (VOMS) have been developed to overcome this.
6. How VOs are used in a grid environment (2)
- The basic idea is that membership of a VO authorises the user to use (at least some of) the resources available to that VO.
- A VO management system such as VOMS, CAS, etc. interfaces with the resources which have agreed to allow that VO access, and updates each resource's local authorisation file(s) appropriately for each of the VO's members.
- This may be a manual process, e.g. sending the resource's administrator an e-mail with details of the changes to make, or an automatic one.
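As a rough illustration of the automatic case, the sketch below renders authorisation-file entries for the members of a VO. It assumes the authorisation file is a Globus Gridmap file with one double-quoted certificate subject followed by a local account name per line. The function name, the example subjects and the shared account `vo001` are hypothetical, and real systems such as VOMS or CAS differ in detail.

```python
def gridmap_lines(member_dns, local_account):
    """Return one Gridmap-style line per VO member.

    Duplicates are removed and the result is sorted so that
    successive runs produce output that is easy to compare.
    """
    lines = []
    for dn in sorted(set(member_dns)):
        # A double quote inside the subject would break the quoting.
        if '"' in dn:
            raise ValueError(f"subject contains a double quote: {dn!r}")
        lines.append(f'"{dn}" {local_account}')
    return lines


if __name__ == "__main__":
    # Hypothetical VO membership list.
    members = [
        "/C=UK/O=eScience/OU=Cambridge/CN=jane doe",
        "/C=UK/O=eScience/OU=Cambridge/CN=john smith",
    ]
    for line in gridmap_lines(members, "vo001"):
        print(line)
```

A resource administrator could review this output before merging it into the local file, which keeps a manual check in an otherwise automatic process.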
7. The Globus Toolkit: Overview
http://www-unix.globus.org/toolkit/
- Open source toolkit
- Principally for Linux/UNIX platforms (some very limited Windows implementations)
- Incorporates modified versions of other open source software, principally:
  - OpenSSL (authentication & secure transport)
  - OpenLDAP (directory services: resource description & discovery)
  - WU-FTPD (data transfer); the components using WU-FTPD (GridFTP) are scheduled to be completely rewritten using a new solution that is currently in development
8. The Globus Toolkit: current releases
- Currently there are two versions:
  - GT 2.4 (currently 2.4.3): entirely concerned with building grid infrastructure (at a comparatively low level)
  - GT 3.0 (currently 3.0.2): principally concerned with developing grid services (a type of web service); incorporates the current version of GT 2.4
- In this talk, I am going to focus on GT 2.4 because it:
  - More clearly illustrates the underlying grid concepts and paradigms
  - Is more widely in use than GT 3.0, especially in the UK & Europe
- In any case, a full installation of GT 3.0 incorporates GT 2.4 (except on the Windows platform)
9. Globus Toolkit - Components
- The Globus Toolkit provides:
  - Resource allocation and job management: Globus Resource Allocation Manager (GRAM), in the Resource Management bundle
  - Data management and transfer: GridFTP, in the Data Management bundle; note also GASS (Global Access to Secondary Storage), a component of GRAM
  - Resource location and description: Monitoring and Discovery Service (MDS), in the Information Services bundle
  - Security framework: Grid Security Infrastructure (GSI), using X.509 certificate-based authentication (OpenSSL)
  - APIs for grid-enabling applications or making them grid-aware (via SDK)
10. The Globus Toolkit: What it doesn't do
- The Toolkit does not:
  - Provide resource harvesting facilities
  - Provide resource brokering or job scheduling (or prioritisation) facilities
  - Assist with parallelising code, nor does it provide a parallel computing environment
  - Enable distributed computing (at least not as we normally use that term)
11. So what does it do, then?
[Diagram: a User, the Globus Gatekeeper, the Jobmanager, two Computing Resources and a Data Resource]
The Gatekeeper and Jobmanager run on the same machine, but the Jobmanager may simply pass the job to a job manager or a job scheduler on a remote machine.
12. Let's have some technical details
- Gatekeeper (a GRAM component):
  - Listens for job requests on the gatekeeper machine (usually on TCP port 2119)
  - Usually started via inetd or xinetd
  - Runs as root
  - Performs mutual authentication: confirms the user's identity, and proves its own identity to the user
  - Starts the job manager process as the local user (UID) corresponding to the authenticated remote user
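A quick way to confirm that a gatekeeper is at least listening is to attempt a TCP connection to its port. The following is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper, not part of the Toolkit, and a successful connection says nothing about whether mutual authentication would succeed.

```python
import socket

GATEKEEPER_PORT = 2119  # the usual gatekeeper port mentioned above


def port_is_open(host, port=GATEKEEPER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and connects; the context
        # manager closes the socket again straight away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `port_is_open("gatekeeper.example.org")` (a made-up host name) would report whether anything accepts connections on port 2119 there. Since inetd and xinetd accept the connection before starting the service, this only shows that the port is reachable through any intervening firewall.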
13. Technical details: Authorisation
- "Local user (UID) corresponding to authenticated remote user": what does that mean?
- Well, it is to do with the authorisation mechanism of the Toolkit.
- A user authenticates to the gatekeeper (or any other Toolkit component requiring authentication) by presenting an X.509 certificate (a kind of digital certificate), whose Subject contains a Distinguished Name (DN) which uniquely identifies the user.
- The gatekeeper (or other component) then looks that DN up in a plain text file (the Gridmap file, normally /etc/grid-security/grid-mapfile) which maps authorised DNs to local user accounts.
- This local account is then used to run all Globus-related processes on this machine for that authenticated remote user.
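The lookup described above can be sketched in a few lines. This is an illustration only, not the Toolkit's own code: it assumes each Gridmap line holds a double-quoted DN followed by a local account name (or a comma-separated list of names, of which the first is taken as the default), and that lines starting with `#` are comments. The sample DNs and account names are invented.

```python
import shlex

SAMPLE_GRIDMAP = '''
# hypothetical entries, for illustration only
"/C=UK/O=eScience/OU=Cambridge/CN=jane doe" jdoe
"/C=UK/O=eScience/OU=Cambridge/CN=john smith" jsmith,grid01
'''


def parse_gridmap(text):
    """Return a dict mapping each DN to its list of local account names."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # shlex.split honours the double quotes around the DN.
        parts = shlex.split(line)
        if len(parts) != 2:
            continue  # skip lines that do not look like "DN" account
        dn, accounts = parts
        mapping[dn] = accounts.split(",")
    return mapping


def local_account_for(dn, mapping):
    """Return the default local account for a DN, or None if not authorised."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None
```

A DN that is absent from the file simply yields `None`: the user may hold a perfectly valid certificate and still not be authorised on this machine. This all-or-nothing mapping is what makes the mechanism simple and also inflexible.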
14. Technical details: Jobmanager
- Jobmanager (a GRAM component):
  - Runs as an ordinary local user on the gatekeeper machine
  - Communicates with the remote user over HTTPS (secure HTTP)
  - Listens for requests from the user on an arbitrary TCP port above 1024 (the range can be restricted)
  - The port is chosen when the connection is originally made to the Gatekeeper, and is returned to the user as a URL
  - May also connect back to remote users on an arbitrary TCP port above 1024 (e.g. to return job status & output)
  - Is usually a front end to another job scheduler/manager, either on the gatekeeper machine (by default the Jobmanager simply calls fork()) or remotely (e.g. a Condor pool)
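Because these ports have to be opened in any intervening firewall, it can be worth validating a restricted range before applying it. The sketch below parses a range written as `min,max`, which is assumed here to be the form taken by the Toolkit's GLOBUS_TCP_PORT_RANGE environment variable; check the Globus firewall documentation before relying on that. Both function names are hypothetical.

```python
def parse_port_range(value):
    """Parse 'min,max' (or 'min max') into a (low, high) tuple of ints."""
    parts = value.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError(f"expected two port numbers, got {value!r}")
    low, high = int(parts[0]), int(parts[1])
    # The ports described above are unprivileged ones, i.e. above 1024.
    if not 1024 < low <= high <= 65535:
        raise ValueError(f"invalid unprivileged port range: {low}-{high}")
    return low, high


def port_allowed(port, port_range):
    """Return True if port lies inside the inclusive (low, high) range."""
    low, high = port_range
    return low <= port <= high
```

For instance, `parse_port_range("40000,40100")` gives a range of 101 ports. A narrow range limits how many simultaneous Jobmanager connections can be served, so the size is a trade-off between firewall exposure and capacity.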
15. Important note about the Jobmanager
- As the Toolkit does not implement any resource brokering or job scheduling itself, it relies on an external job manager of some description to provide these facilities (by default it simply forks a new user process on the gatekeeper machine).
- Interfaces to various job schedulers (PBS, LSF, Condor) are available; each consists of a simple shell script which acts as a wrapper for the vendor-supplied job submission tools. It should be relatively simple to create such interfaces for most other job schedulers.
- Therefore, when considering installing the Toolkit, or evaluating its security model or system requirements, one must also consider the job scheduler which is to be used.
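To give a feel for how thin such a wrapper can be, here is a sketch that turns a job into the command line for the scheduler's own submission tool. It is written in Python rather than shell purely for illustration and is not the Toolkit's actual interface: the function name and its arguments are hypothetical, and only the standard `qsub` (PBS) and `condor_submit` (Condor) commands are assumed.

```python
def submission_argv(scheduler, job_path, queue=None):
    """Return the argument list that would submit job_path.

    For "fork" the job is simply run directly; for the others the
    scheduler's own submission command is wrapped.
    """
    if scheduler == "fork":
        return [job_path]
    if scheduler == "pbs":
        argv = ["qsub"]
        if queue:
            argv += ["-q", queue]  # -q selects the destination queue
        return argv + [job_path]
    if scheduler == "condor":
        # Here job_path is taken to be a Condor submit description file.
        return ["condor_submit", job_path]
    raise ValueError(f"no interface for scheduler {scheduler!r}")
```

The resulting list could be handed to `subprocess.run`. Everything of substance (queues, priorities, accounting, and the scheduler's own security model) lives behind that one command, which is why the choice of scheduler matters so much when assessing an installation.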
16. Data Transfer
- Two main mechanisms:
  - GridFTP: a form of WU-FTPD modified to support multiple data streams (to make it faster and more efficient) and authentication using X.509 certificates. This is the principal (and preferred) mechanism for data transfer in the Toolkit.
  - GASS (Global Access to Secondary Storage): supports anonymous HTTP & FTP and its own (authenticated) method of access. It is designed to be used for the creation of local GASS caches, which store data used by users' jobs on the system where the job is executing, so that those jobs do not have to transfer their data (and results) themselves. GASS has some of the features of a distributed file system, but without actually being one.
17. GridFTP
- We shall not discuss GASS, as it is intended to be invisible to the user and is thus rarely used explicitly; it may well be phased out in later versions of the Toolkit.
- GridFTP, as has been mentioned, is based on WU-FTPD. It uses the same authorisation and authentication methods as the gatekeeper, and most gatekeeper machines also run a GridFTP daemon. It has two components: the server (daemon) and the client (you cannot connect to a GridFTP server with an ordinary FTP client!).
18. GridFTP daemon
- GridFTP server (daemon):
  - Listens for GridFTP connections (usually on TCP port 2811)
  - Like standard FTP, has a control channel and a data channel (although, unlike standard FTP, it can have multiple data channels); the control channel is on TCP port 2811 (or whichever port the daemon is listening on, if a non-standard port is used)
  - Usually started via inetd or xinetd
  - Runs as root
  - Mutual authentication using X.509 certificates on control and data channels
  - Data channels are on arbitrary TCP ports above 1024 (the range can be restricted)
  - The control channel is integrity protected; the data channel is optionally integrity protected and encrypted at the client's request
19. GridFTP client
- GridFTP client:
  - Does not require the server components to be installed on the local machine
  - Uses arbitrary TCP ports above 1024
  - Does not require root privilege to run
  - Can initiate third-party data transfers between two remote GridFTP servers: a very useful feature in a grid environment
  - Cannot connect to an ordinary FTP server
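A third-party transfer is requested by giving the client two remote URLs. The sketch below builds such a command line for `globus-url-copy`, the Toolkit's command-line GridFTP client, using `gsiftp://` URLs. The host names and paths are invented, the helper function is hypothetical, and the `-p` option for requesting parallel data streams should be checked against the documentation for the installed version.

```python
def third_party_copy_argv(src_host, src_path, dst_host, dst_path, streams=None):
    """Build a globus-url-copy command for a server-to-server transfer."""
    for path in (src_path, dst_path):
        if not path.startswith("/"):
            raise ValueError(f"expected an absolute path, got {path!r}")
    argv = ["globus-url-copy"]
    if streams is not None:
        # Assumed option: number of parallel data streams to use.
        argv += ["-p", str(streams)]
    # Both URLs are remote, so the data flows directly between the two
    # servers rather than through the machine running the client.
    argv.append(f"gsiftp://{src_host}{src_path}")
    argv.append(f"gsiftp://{dst_host}{dst_path}")
    return argv
```

The list can be passed to `subprocess.run` on any machine with the client installed and a valid user credential. Note that both servers must authorise the same user, and both must be able to reach each other on their data channel ports.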
20. Information Services: Resource Description and Discovery
- Due to the fundamentally distributed nature of computational grids, a key requirement of any large grid, or any heterogeneous grid, is the ability to identify and describe its components.
- In the Globus Toolkit, this is handled by the MDS, which in GT 2 is essentially a customised LDAP server, based on OpenLDAP. (In GT 3, the MDS architecture is completely different: it is based on XML technologies, and is not compatible with the GT 2 MDS.)
- The MDS is a very complex component of the Toolkit, so I shall restrict myself to merely mentioning a few points concerning the GT 2 MDS likely to be of interest to system administrators.
21. MDS (GT 2)
- MDS in GT 2:
  - Uses TCP port 2135 (can be configured to use another port, but this would cause serious interoperability problems)
  - Does not require root privilege to run (normally a dedicated user account is used)
  - Extremely difficult to configure and troubleshoot
  - Highly unstable: easily the least stable component in the Toolkit
22. Security
- As key components of the Toolkit run as root, security is a matter of some concern.
- The Globus Toolkit seeks to address this by using X.509 certificates for authentication.
- There are both technical and usability issues with this which must be considered.
- Also, such a paradigm treats authentication as the only central security issue; clearly this is flawed.
23. Security: Certificates (1)
- In the UK e-Science community, certificates are issued by the UK e-Science Certification Authority (CA) at RAL.
- Identity is verified locally by a Registration Authority (RA): in our case, the UCS.
- Certificates are valid for a year, and then must be renewed.
- Both the user and the resource they wish to use must have certificates.
- The user's client machine and the resource they wish to use must each know about the CA which issued the other's certificate.
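Since certificates expire after a year, it is easy for a host or user certificate to lapse unnoticed. As a small sketch, the function below computes how many days remain given a certificate's expiry time in the textual form that OpenSSL prints (for example from `openssl x509 -enddate -noout`). It uses `ssl.cert_time_to_seconds` from the Python standard library; the function name and the dates in the example are made up.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Return whole days until the given notAfter time (negative if past).

    not_after uses OpenSSL's format, e.g. "Jan 31 00:00:00 2004 GMT".
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    if now is None:
        now = datetime.now(timezone.utc)
    return (expires - now).days


if __name__ == "__main__":
    # Hypothetical expiry date, checked against a fixed "current" time.
    today = datetime(2004, 1, 1, tzinfo=timezone.utc)
    print(days_until_expiry("Jan 31 00:00:00 2004 GMT", now=today))
```

Run regularly against each resource's certificate, a check like this gives some warning before renewal is due; remember that the users' certificates expire on their own schedules too.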
24. Security: Certificates (2)
- Requesting a certificate from the UK e-Science CA is regrettably less than straightforward:
  - Serious browser dependencies mean only old versions of Netscape (4.79, 4.8) and some versions of Internet Explorer work.
  - For certificate requests, a Java client has now been created, but it is still a prototype, and requires the user to install and configure some software (Sun's JRE, etc.) on their machine.
  - Otherwise, manually create a PKCS#10 PEM-formatted request (don't do this!)
25. Security Considerations
- Anyone whose identity the RA can verify can legitimately request a server certificate (a certificate to identify a resource) for any machine whatsoever in the cam.ac.uk domain.
- The RA only verifies that the user is who they say they are, not their association with a particular VO, nor, in our case (given the UCS user database), that they are still a current member of the University.
- The Globus CA (thankfully due to be shut down on January 24, 2004) will issue certificates to anyone, and by default the Toolkit comes configured to accept certificates certified by the Globus CA (obviously for user certificates, you still need to authorise the certificate, but...)
26. Installation Issues
- Distributed in both binary and source form
- DO NOT install the binary form unless there is no alternative! Installations from the binaries are likely to break when updated.
- Uses its own packaging toolkit / package manager, the Grid Packaging Toolkit (GPT); even if the world needed another package manager, there is nothing to recommend this one.
- Be aware that although it makes use of the OpenSSL toolkit and other open source packages which may already be present on the system, it installs its own customised versions of these and will not use the standard libraries.
- Configuration can be difficult, and difficult to troubleshoot.
27. Additional Considerations
- Obviously, any additional installed software increases your administrative burden, particularly as regards security; this is particularly true of the Globus Toolkit.
- The Toolkit, and the underlying paradigms, are quite unintuitive, so the support burden is quite significant if you are required to support users, particularly new users.
- If you are part of a grid which has a relatively small number of users, or where there are very few changes to its user community, then you can manage your Gridmap file manually; otherwise you'll need to look into using a VO management system of some sort (e.g. CAS, VOMS).
28. Useful References (1)
- Discussion of usability issues with the Toolkit:
  - Guy Rixon, "Problems with Globus Toolkit 3 and some possible solutions", http://wiki.astrogrid.org/bin/view/Astrogrid/GlobusToolkit3Problems
  - "Towards tractable toolkits for the Grid: a plea for lightweight, usable middleware" (Chin & Coveney, 2003), http://www.realitygrid.org/lgpaper.html
- Discussion of certificate security issues:
  - "Grid Security and its use of X.509 Certificates" (Lock & Sommerville, 2002), http://www.comp.lancs.ac.uk/computing/research/cseg/projects/dirc/papers/gridpaper.pdf
29. Useful References (2)
- Globus and Firewalls:
  - http://www-fp.globus.org/security/v2.0/firewalls.html
  - Not explicitly talked about today, but hopefully you got some idea of the Toolkit's firewall requirements from the description of the components
- The Globus Toolkit Documentation:
  - http://www-unix.globus.org/toolkit/documentation.html
- UK e-Science CA documentation:
  - http://www.grid-support.ac.uk/ca/documentation.htm
30. Useful Contacts
- CeSC contact for technical issues:
  - technical_at_escience.cam.ac.uk
- My e-mail address:
  - mbb10_at_cam.ac.uk
- Globus and Condor issues:
  - Paul Wilson, who will be speaking after coffee
  - Mark Calleja, speaking after Paul; also a useful person to talk to about user experiences of the Toolkit and related software, and running programs in a grid environment
31. Questions?
- I have just given a preliminary overview of the Toolkit, highlighting some of the issues system administrators may wish to consider. There is much more to say on this subject...
- Questions?
- Thank you!