Title: Condor: What It Is and Why You Should Worry About It
1. Condor: What It Is and Why You Should Worry About It
- Bruce Beckles
- e-Science Specialist
- University of Cambridge Computing Service
- TECHLINK SEMINAR 23 JUNE 2004
2. Overview of Seminar
- What is Condor and what is it good for?
- How does it work?
- What are we planning to do with it in the CS?
- Why should I care?
- What will it do for me?
- What are the security implications?
3. What is Condor?
http://www.cs.wisc.edu/condor/description.html
- A full-featured, cross-platform batch scheduling system, like PBS or Sun Grid Engine (SGE)
- A specialised workload management system for compute-intensive jobs, specially designed for utilising spare CPU cycles, i.e. "resource harvesting" or "resource scavenging"
- Especially good for high-throughput computing: processing large numbers of independent jobs ("embarrassingly parallel" jobs) as efficiently as possible
4. What is Condor good for? (1)
- Consider the following scenario:
- I have a simulation which takes two hours to run on my high-end PC workstation
- I need to run it 1000 times, with slightly different parameters each time
- If I do this serially it will take at least 2000 hours, which is about 83 days, or 12 weeks, or 3 months. If I try running more than one simulation on my PC at once, this won't really improve things, as each simulation will then take much longer than 2 hours to run
5. What is Condor good for? (1a)
- Suppose my department has 100 PCs like mine that are mostly sitting around idle overnight (say for an average of 8 hours a day).
- If I could run jobs on those machines when they were idle, i.e. when their legitimate users aren't using them, so I don't inconvenience anyone, then I could get about 800 CPU hours a day!
- This is an ideal scenario for Condor, and so using Condor in this situation means it would only take me 2.5 days to run my simulations instead of months. Hurrah!
6. What is Condor good for? (2)
- Suppose I manage either a Linux or UNIX cluster, or a collection of general-purpose machines that many people are supposed to be able to run jobs on (or both). I could just let users access the machines individually and run jobs on them, but that might be a bad idea because:
- It is too much trouble for the users to do that for each machine they want to use
- I want to make sure that use of the machines is fair, i.e. no one user hogging the machines at the expense of everyone else
- I want jobs to be distributed evenly across machines, to stop individual machines being overloaded
- I may want certain users' jobs to have priority over other users' jobs
- etc.
7. What is Condor good for? (2a)
- So I want a batch queuing system of some description to manage jobs on these machines. I probably want it to have the following features:
- Easy to install, use and administer
- Free (because I probably don't have much money for such things!)
- Allows me to control who can use particular resources, how much use they can make of them, who has priority on them, and so on
- Distributes jobs evenly across all the machines
- If machines die, I want to be able to restart jobs somewhere else; if the queue handler dies, I don't want all jobs in the queue to be lost
- Can preempt or suspend jobs (if desired)
- May want it to support parallel jobs using MPI or PVM
- May want it to be cross-platform (e.g. my users require both Solaris and Linux machines)
- May want it to provide mechanisms for checkpointing
- Can be made secure at both the machine and user levels (if desired), and can encrypt its network traffic (if desired)
- Condor has all these features! (with some caveats, of course)
8. How does Condor work?
- A collection of machines running Condor is called a pool. Individual pools can be joined together by a process known as flocking.
- Machines in Condor have one (or more) of four different roles; see the following two slides for details.
- Each machine can have more than one role, so it is possible to have a Condor pool consisting of one machine(!), or, less trivially, two machines.
9. Machine Roles in Condor (1)
- Central Manager: the resource broker for a pool; it keeps track of which machines are available and what jobs are running, negotiates which machine will run which job, etc. The central manager is also sometimes (confusingly) called the "master", or (mistakenly) the "server". There must be exactly one central manager per pool. For large pools this machine should not have any other role in the pool.
- Submit Machine (or Submit Host): a machine which submits jobs to the pool. There must be at least one submit host in a pool.
10. Machine Roles in Condor (2)
- Execution Machine (or Execute Host): a machine on which jobs can be run. There must be at least one execute host in a pool.
- Checkpoint Server: a machine which stores all the checkpoint files produced by jobs which checkpoint. Having such a machine is optional, and there can be at most one checkpoint server per pool.
- So, typically you would have one central manager, one or more submit hosts and one or more execute hosts. It is often the case that all submit hosts are also execute hosts (and vice versa).
11. A typical Condor Pool
[Diagram] The Central Manager monitors the status of the execute hosts, matches jobs from submit hosts to appropriate execute hosts, and assigns jobs to them. Some machines are both submit and execute hosts. Checkpoint files from jobs that checkpoint are stored on the (optional) checkpoint server.
12. Let's follow a job under Condor (1)
- A job is submitted at a submit host.
- The submit host tells the central manager about the job using Condor's ClassAd mechanism, which provides a (user-customisable) way of describing what the job requires in order to run, as well as what it desires from the execute host. For example, a job might require a minimum of 256 MB of RAM but run significantly better with 512 MB; it would therefore prefer to run on machines with 512 MB or more (if any are available), but would accept a machine with 256 MB if that is all that is available.
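In submit-description-file terms, the 256 MB/512 MB example above can be sketched as follows (this is not from the talk; sim.exe is a placeholder name, but requirements, rank and the Memory machine attribute are standard ClassAd/condor_submit vocabulary):

```
# Hard constraint: never match a machine with less than 256 MB of RAM.
requirements = (Memory >= 256)

# Soft preference: among machines that satisfy the requirements,
# prefer those with more memory, so a 512 MB machine is chosen over
# a 256 MB one whenever both are available.
rank = Memory

executable = sim.exe
queue
```

During matchmaking, requirements acts as a hard filter and rank is used only to order the machines that pass it.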
13. Let's follow a job under Condor (2)
- The central manager has been monitoring all the execute hosts, so it knows which are available and what sort of machine they are (OS, memory, etc.): execute hosts periodically send a ClassAd describing themselves to the central manager.
- Every so often the central manager enters a negotiation cycle, where it matches waiting jobs with available execute hosts.
- So eventually the job from the submit host is matched to a suitable execute host (unless there are no suitable execute hosts in the pool, of course).
14. Let's follow a job under Condor (3)
- The central manager informs the chosen execute host that it has been claimed, and gives it a ticket.
- The central manager then informs the submit host which execute host to use, and gives it a matching ticket.
- The submit host contacts the execute host, presenting its matching ticket, and transfers the job's executable and data files to the execute host (if necessary; Condor can make use of a shared filesystem between submit and execute hosts), which then begins to run the job.
- For some types of job, the job running on the execute host will access files and resources on the submit host via remote procedure calls.
15. Let's follow a job under Condor (4)
- For most jobs, a TCP connection is maintained
between the submit and execute host while the job
is running. If the submit host dies (or the TCP
connection is broken) the execute host aborts the
job. If the execute host dies (or the TCP
connection is broken) the job is re-submitted to
the Condor pool. - Certain sorts of jobs can checkpoint, both
periodically (for safety) and when interrupted.
If a job is interrupted and has successfully
checkpointed, it will resume from its last
checkpointed state when it starts to run again
(possibly on a different execute host) rather
than starting from the beginning. Checkpointing
is only supported for certain sorts of jobs, and
on certain platforms (most notably, it is not
supported under Windows). - When the job finishes, the results are returned
to the submit host (unless a shared filesystem is
in use between submit and execute hosts).
16. Let's see that in a diagram
[Diagram] The Central Manager, Submit Host and Execute Host each run Condor daemons (which normally listen on ports 9614 and 9618). The Execute Host tells the Central Manager about itself, and the Central Manager tells it when to accept a job from the Submit Host. The Submit Host tells the Central Manager about the job, and the Central Manager tells it to which Execute Host it should send the job. The Submit Host sends the job to the Execute Host, and the Execute Host sends the results back to the Submit Host. On the Submit Host, a condor_shadow process services the job; on the Execute Host, a Condor daemon spawns the user's job and signals it when to abort, suspend, or checkpoint. The user's job consists of the user's executable code linked with the Condor libraries; all its system calls are performed as remote procedure calls back to the Submit Host, and its checkpoint file (if any) is saved to disk.
17. Types of Jobs
- Condor classifies jobs according to the type of environment it provides for them to run in. Each of these environments is called a universe. There are currently 7 different universes, as outlined on the next three slides.
- Not all universes exist on all platforms; in particular, only the vanilla universe is supported on Windows.
18. Job Universes (1)
- Standard: for jobs compiled with the Condor libraries; this universe provides checkpointing and remote system calls. Jobs must be single-threaded and use a supported compiler. This universe does not exist under Windows.
- Vanilla: for jobs which cannot be compiled with the Condor libraries, and for shell scripts and (Windows) batch files. For these jobs Condor simply spawns a process to run the given executable, shell script or batch file. No checkpointing or remote system calls are provided by Condor for jobs in this universe.
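For concreteness, a minimal vanilla-universe submit description might look something like the following sketch (not from the talk; the file names are placeholders, but the keywords are standard condor_submit commands):

```
universe   = vanilla
executable = analyse.sh
arguments  = $(Process)              # each queued job gets its own index
output     = analyse.$(Process).out
error      = analyse.$(Process).err
log        = analyse.log

# Vanilla jobs have no remote system calls, so ask Condor to ship
# files to and from the execute host if there is no shared filesystem.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

queue 10                             # submit 10 instances (Process = 0..9)
```

Submitting the file with condor_submit and monitoring it with condor_q is then enough to run the whole batch.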
19. Job Universes (2)
- PVM: for programs written to the Parallel Virtual Machine interface.
- MPI: for programs written to the MPICH interface, i.e. MPI jobs. Currently supports MPICH versions 1.2.2, 1.2.3 and 1.2.4.
- Globus: simply a mechanism to submit jobs to resources managed by the Globus Toolkit 2.2 or higher.
20. Job Universes (3)
- Java: for jobs written for the Java Virtual Machine (JVM). All JVMs should be supported by this universe.
- Scheduler: for special circumstances, such as the Condor workflow tool DAGMan. Scheduler universe jobs ignore any machine requirements given, run on the submit host immediately upon submission, and will never be preempted. Normally an end-user would never explicitly use this universe.
21. Some Features of note (1)
- DAGMan: the Directed Acyclic Graph Manager (DAGMan) is a meta-scheduler, or workflow tool, for Condor. It handles running sequences of jobs where there are dependencies between the jobs (e.g. one job has to finish before another can start). A directed acyclic graph (DAG) is a way of representing such sequences of jobs.
- Flexible resource management: Condor's ClassAd mechanism provides great flexibility in managing resources. Resources can specify what sorts of jobs they are prepared to run (e.g. only jobs from certain users) as well as advertise unique features (e.g. special software installed on that resource).
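A DAGMan input file is just a list of jobs and the dependencies between them. As an illustrative sketch (the node names and .sub files are hypothetical; JOB and PARENT/CHILD are the standard DAGMan keywords):

```
# Each JOB line names a node and its ordinary submit description file.
JOB Prepare  prepare.sub
JOB Simulate simulate.sub
JOB Analyse  analyse.sub

# Simulate may only start once Prepare has finished successfully,
# and Analyse only after Simulate.
PARENT Prepare  CHILD Simulate
PARENT Simulate CHILD Analyse
```

The DAG itself is submitted with condor_submit_dag, which runs DAGMan as a scheduler universe job on the submit host.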
22. Some Features of note (2)
- User priorities: Condor supports a priority system for users. By default it implements a "fair share" policy (similar to SGE's share-based policy), so that the priority of users who frequently run jobs gradually gets worse until they run fewer jobs, after which their priority gradually returns to normal.
- Job priorities: Condor supports a priority system for jobs, independent of its user priority system. Job priorities determine which of a user's jobs is given priority for appropriate, available resources. By default all jobs have the same priority.
23. Some Features of note (3)
- Backfilling: backfilling is a method by which a batch queuing system utilises resources more efficiently: if a high-priority job cannot run because the resources it needs are unavailable, the scheduler will look for lower-priority jobs which can run on the currently available resources, to ensure resources are maximally utilised. Because of the way Condor matches jobs and resources, a resource will not remain idle if there is some job it can run (OK, there are some caveats to that statement!), and this results in resources being utilised as efficiently as under a queuing system which supports backfilling.
24. Some Features of note (4)
- Job preemption: Condor supports job preemption; it can be configured so that low-priority jobs already running will be killed or checkpointed, and returned to the queue, so that higher-priority jobs can run instead.
- Computing on Demand: Computing on Demand (COD) allows Condor to immediately run short-term jobs on instantly available resources (at the expense of any other jobs running on those resources), so that it can support compute-intensive jobs which require interactive response times: e.g. the rendering stage of a graphics application which waits for user input and then renders the chosen image (where the rendering requires a burst of intensive computing power).
25. Some Features of note (5)
- Perl interface: the Condor Perl module provides a Perl interface to Condor that allows job submission, job monitoring and administration of Condor.
- Management of Globus jobs: Condor-G is Condor's interface to Globus, and provides many of the benefits of Condor's job management system for such jobs. In particular, Condor's job submission and monitoring tools are much easier to use than those provided by the Globus Toolkit.
- DRMAA support: the current stable release of Condor does not provide support for the Distributed Resource Management Application API (DRMAA), but the current experimental release will, and this will eventually be incorporated in the next major stable release.
26. Some Features of note (6)
- Cygwin: under Windows, Condor works well with jobs that use the Cygwin libraries, although this is not an official feature of Condor. This may provide a method of using some UNIX or Linux programs on Windows execute hosts with Condor.
- Free: Condor is freely available to anyone. A paid support option is available, but is unlikely to be necessary except for people with extremely unusual environments, or thousands of machines in their pool(s).
- "Open source": technically, Condor is open source. That is, its source is released under an open-source licence to anyone who can convince the Condor Team that they should have it (they aren't that difficult to convince). Yes, we have it; but no, I'm not going to give it to just anyone.
27. Some points worth knowing (1)
- Condor is useful for much more than just cycle scavenging, but its default configuration is one designed for such resource harvesting.
- Condor is extremely happy to work with a heterogeneous pool of machines; in particular, the central manager can run on a completely different platform to any other machine in the pool.
- By default, all the security (including encryption) features in Condor are switched off!
- When used on more than one machine, Condor wants to run its daemons as root (Administrator under Windows). This is not compulsory, but there are consequences to not running it as root that need to be considered.
28. Some points worth knowing (2)
- Condor is extremely simple to install (normally a few minutes per machine) and its installation can be automated very simply.
- However, precisely because it is so flexible and powerful, its configuration can be quite complex and is not entirely intuitive; its manual, although reasonably comprehensive (except where security is concerned), requires considerable work before it can be considered helpful.
- Condor has a very active and friendly user community who are usually happy to provide help and advice.
29. Some points worth knowing (3)
- In its default configuration Condor starts a negotiation cycle every 300 seconds (5 minutes), and it waits for a machine to be idle for 15 minutes before trying to run a job on it. This is why new pool administrators often report that it takes Condor up to 20 minutes to run jobs even when all machines in the pool are idle; this can easily be fixed by adjusting the appropriate configuration file parameters.
- By default Condor will run one job per "virtual machine". A virtual machine is normally a CPU (so dual-CPU machines normally count as two "machines" for Condor's purposes). However, if you want Condor to run more jobs on a particular machine than that machine has CPUs, then you can configure the machine to claim to have as many virtual machines as you wish.
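As a configuration sketch (macro names from the Condor 6.x configuration vocabulary; the values are illustrative, not recommendations), the behaviour just described is governed by entries such as:

```
# Start a negotiation cycle every 60 seconds instead of every 300.
NEGOTIATOR_INTERVAL = 60

# Part of the default owner-friendly policy: only start jobs once the
# machine's keyboard and mouse have been idle for 15 minutes.
START = KeyboardIdle > (15 * $(MINUTE))

# Advertise 4 "virtual machines" on this host, irrespective of the
# number of physical CPUs Condor detects.
NUM_VIRTUAL_MACHINES = 4
```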
30. Some points worth knowing (4)
- Much of the communication in Condor uses UDP rather than TCP. This may have implications if machines running Condor are to communicate across firewalls, or on congested networks.
- Communication in Condor happens on two well-known (configurable) ports (9614 and 9618) and many, many transient ports. You can restrict the range of transient ports it uses, but you can't restrict it too much: it won't re-use ports as quickly as one might like, and so will quickly run out if the range is too small (from experimentation, a range of 10 ports is definitely too small!).
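For firewall planning, the transient-port range can be pinned down in the configuration file. A sketch (LOWPORT and HIGHPORT are the Condor macro names; the particular range here is a hypothetical example, and per the warning above it should be generous):

```
# Force Condor's dynamically allocated ports into a fixed range, so a
# firewall can allow just this range plus the well-known ports.
LOWPORT  = 9600
HIGHPORT = 9700
```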
31. Some points worth knowing (5)
- Condor uses a very different metaphor from other batch queuing systems when describing its scheduling process: in Condor there is no notion of queues, so it is not usually possible to straightforwardly translate the behaviour of other scheduling systems into Condor terms.
- Despite this, I have so far found only one significant feature of other batch queuing systems that Condor doesn't implement: it doesn't support so-called "interactive" jobs, where the input and output of the job are directed to a terminal window or console for user input and control.
32. Officially Supported Platforms
- UNIX
- HP-UX 10.20 (on PA7000 and PA8000)
- Solaris 2.6, 2.7, 8, 9 (all on SPARC)
- IRIX 6.5 (on R5000, R8000, R10000)
- MacOS X (10.2, 10.3) (on PowerPC)
- Digital Unix 4.0 (on Alpha)
- Tru64 5.1 (on Alpha)
- AIX 5.2L (on PowerPC)
- Linux: Red Hat Linux 7.x (on Alpha, Intel x86 and Itanium), 8.0 (on Intel x86), 9.0 (on Intel x86)
- Windows: Windows 2000 Professional and Server, Windows XP Professional, Windows 2003 Server
Support is being dropped for several of the older platforms above (italicised and underlined in the original slides). Not all the features of Condor are available on the Windows platforms; most notably, checkpointing and remote system calls (the standard universe) are not available.
33. But it will also run on…
- Linux
- Debian GNU/Linux 3.0r2 ("woody"), and "sarge" (although there are some reported problems with compiling user programs against the Condor libraries)
- SuSE Linux 8.1 and 9.0
- Fedora Core 1 (although there are some reported problems)
- Gentoo Linux
- Probably also on Windows NT 4.0 (Service Pack 6a) and Windows XP Home
34. Official Support is planned for…
- UNIX
- HP-UX 11.11 (on PA-RISC)
- Solaris 10
- FreeBSD (on Intel x86)
- Linux
- Fedora Core 2
- Also support for PowerPC and AMD64 architectures
35. University Computing Service plans for Condor
36. CS Plans for Condor (1)
- Deploy Condor across all the CS-owned PWF machines, followed by those departments which make use of the MCS Service and wish to join. Any user who would normally be eligible for CS resources will be able, on application to the CS, to use Condor on CS-owned PWF machines.
- Initially on PWF Linux only, then expanding to include PWF Windows and the PWF Macs
- Part of the CamGrid project:
- Seeks to deploy a University-wide grid
- Initially using Condor; a Globus gateway to be added later?
- Wants to use as many PCs as possible, so attempting to support as many different requirements from resource owners as possible
- Different deployment models for CS-managed resources and non-CS-managed resources
37. CS Plans for Condor (2)
- The default behaviour of PWF Linux machines (rebooting into PWF Windows when idle) will be changed:
- Machines with PWF Linux installed will boot into PWF Linux when idle, at times when the room is believed to be largely unused (e.g. at night, outside of the University term).
- These machines will then reboot into Windows just before the room is due to start being seriously used by ordinary users (e.g. the next morning).
- Clarification (added after the talk was given): this has not been finalised and is currently under negotiation.
- This means that during term, jobs will probably only have about 8 hours or so in which they can run on a machine before they are killed by the machine rebooting.
38. Condor pool architecture
- A dedicated machine will be used as the central manager (it may also act as a Kerberos domain controller for machines in this Condor pool)
- Initially there will be only a single submit host (with more added as necessary to cope with more execute hosts), and 1 TB of short-term storage will be provided. Access to this submit host will initially be via SSH only.
- No checkpoint server will be provided (maybe later, if there is sufficient demand).
39. CS Condor pool
[Diagram] The submit host (SSH access only, with 1 TB of dedicated short-term storage) tells the central manager (which runs the Condor daemons, and is possibly also the Kerberos domain controller) about a job; the central manager tells it to which execute host it should send the job. Each execute host (the PWF Linux machines) tells the central manager about itself, and the central manager tells each execute host when to accept a job from the submit host. The submit host sends the job to the execute host, and the execute host returns the results to the submit host. All daemon-daemon (i.e. machine-machine) communication in the pool is authenticated via Kerberos.
40. Security Configuration
- Users will have to SSH into the submit host to submit jobs, and this will also provide user authentication (other secure access mechanisms, e.g. a web portal over HTTPS, will be added later).
- Within the Condor pool, authentication will be machine-based and will use Kerberos.
- Most (probably all) Condor daemons will not run as root; the most significant security consequence of this for end-users is that any files left behind by a Condor job will be accessible to the next Condor job that runs on that machine. We will probably implement a script to clean up after jobs.
- Network transmissions will not be encrypted.
41. Job Universes supported
- Initially we will only provide "official" support for the vanilla universe, and possibly the Java universe.
- "Official" support for the standard universe will be provided in due course.
- We are unlikely ever to provide "official" support for the PVM, Globus and scheduler universes; in fact, we will probably not install support for the PVM universe at all. It is possible, but not very likely, that "official" support may eventually be provided for the MPI universe.
42. Authorisation and Priority
- As mentioned before, any user normally eligible for CS resources will be able, on application to the CS, to use Condor on CS-owned PWF machines.
- At least some Departments want to restrict use of Condor on their MCS machines to members of the Department, and we will support this.
- We will also support prioritising a Department's machines for jobs from members of that Department, if anyone wants this.
- For CS-owned PWF machines we will probably use Condor's default "fair share" policy (or a close variant) for user priorities.
43. Potential Users
- Particle Physics (HEP Group at the Cavendish Laboratory)
- Molecular Informatics (Unilever Centre for Molecular Informatics, Department of Chemistry): so far, these will be our most intensive users, with plans to analyse the molecular structure of 100,000 molecules using the General Atomic and Molecular Electronic Structure System (GAMESS).
- Mineral Science (Department of Earth Sciences)
- Anyone else? Please get in touch!
44. Why Should I Care?
- Use of Condor amongst scientific researchers is rapidly increasing, so for the science departments the chances are that someone in your department is either already using Condor or will soon want to.
- This means you may be asked to support it, either explicitly, or implicitly (by allowing it to be used on your network).
- There are security implications to having Condor running on your network; more on this later.
45. So, What Will Condor on the PWF Do For Me?
- Condor provides a way of increasing the utilisation of your institution's computing resources at no extra financial cost, and so of maximising the return on your institution's investment in those resources. For some Departments it is particularly important that they can show they are making maximal use of their IT equipment. And if you already use the MCS Service, we will set this up for you.
- Also, if you wanted a cluster of Linux machines, but didn't want the MCS Service with both PWF Windows and PWF Linux, we could offer a Linux-only installation with Condor as a scheduler for your cluster (provided the way we configure Condor and PWF Linux meets your needs) at a reduced cost (currently 500 15 per machine, per annum).
- So what more do you want!?! (asks a quizzical kitten)
46. What Else Will Condor on the PWF Do For Me?
- You want more!?! Well…
- The CS will be providing a significant computational resource which will be freely available to members of your Department or College. It may provide much-needed computing power that they can't get in the Department, which may (if you're lucky!) mean that they harass you less for more machines, a greater share of your cluster(s), etc. So tell them about it!
- We will be a source of expertise in Condor, so we will be able to provide some help and advice with Condor pools in your institution.
47. Why Should I Worry About Condor?
- or, The Security Implications of Having a Condor Pool on Your Network
48. What Hackers Want
- Before considering the security implications of a system like Condor, it is useful to consider just what an attacker will be trying to achieve. Typically they want to achieve one or more of the following overlapping goals:
- A higher level of access to your system(s) than they are permitted (worst case: a root compromise)
- The ability to run arbitrary code on your system(s)
- A denial-of-service (DoS) attack against you, or against someone who can be attacked from your system(s)
- If they can achieve either of the first two goals, they can often achieve all of them.
49. Condor: a Hacker's Dream?
- Condor can help them achieve all of these goals, but particularly the second (the ability to run arbitrary code):
- Condor is designed to allow users to run arbitrary code (no questions asked). For some attackers this will be enough, but others may want more, and if there is anything on your system that could allow it to be compromised, Condor may well allow an attacker to exploit this.
- By design, Condor allows remote users (who do not have an account on your system) to have access to your system(s). By default, it allows the whole world to have access to your system(s).
- By default, Condor will want you to install it as root, so a vulnerability in Condor could lead to a root compromise.
- Condor-related processes can consume a lot of CPU cycles (perfect for a DoS attack against you), and there are ways to force some of the Condor daemons to do this. Even if the attacker only manages a DoS attack against Condor itself, this may cause you significant problems if your users make serious use of Condor.
50. And now the bad news…
- No one seems to have seriously attacked Condor, so we don't have any idea of how secure it is…
- …and because no one has attacked it, or attacked using it, many muppets out there will tell you that it "must be secure", because no one has managed to use it to compromise anyone (yet).
- It is being increasingly deployed over very large collections of machines (800 at UCL alone), which will make it a very attractive target.
- As previously mentioned, by default Condor wants to be installed as root, has all security features (authentication and encryption) disabled, and gives everyone in the whole world access to your machine. Go, evil hacker, go!
- We don't yet have any way of remotely probing machines to see if they are running Condor.
51. Authentication in Condor
- Condor supports the authentication methods described on the following four slides.
- None of the so-called "strong" authentication methods is supported on all platforms, so heterogeneous Condor pools may well have problems here.
- User authentication and machine (i.e. daemon) authentication are controlled separately.
- Authentication and encryption are implemented and controlled separately.
- Authentication and encryption can each be set to one of NEVER, OPTIONAL, PREFERRED or REQUIRED, with the obvious meanings.
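In configuration-file terms these settings look something like the following sketch (the SEC_* macro names are from the Condor 6.6 security configuration; the chosen values are purely illustrative):

```
# Insist that daemon-to-daemon traffic is authenticated, via Kerberos.
SEC_DAEMON_AUTHENTICATION         = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = KERBEROS

# Encrypt traffic when both ends can, but do not refuse connections
# from parties that cannot.
SEC_DEFAULT_ENCRYPTION = OPTIONAL
```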
52. Strong Authentication Methods in Condor (1)
- Kerberos: uses the MIT implementation of Kerberos V5. Only supported under Linux and (I believe) most versions of UNIX, excluding MacOS X. Support for Kerberos under Windows is due to be added in the current development release of Condor.
- Windows (NTSSPI) authentication: uses Microsoft's Security Support Provider Interface (SSPI) to enforce NT LAN Manager (NTLM) authentication, which is based on challenge and response, using the user's password as a key. NTLM authentication apparently bears some similarity to Kerberos. (Obviously) only available under Windows.
53. Strong Authentication Methods in Condor (2)
- File System authentication: utilises file ownership to verify identity. The authenticating daemon requires the party being authenticated to write a file to a specific location, and then checks the ownership of this file. Only available on non-Windows platforms. There are only specific circumstances under which one should regard this as strong authentication.
- Remote File System authentication: utilises file ownership on a remote filesystem to verify identity. The authenticating daemon requires the party being authenticated to write a file to a specific location on a remote filesystem, and then checks the ownership of this file. Only available on non-Windows platforms, and an undocumented authentication mechanism. Again, there are only specific circumstances under which one should regard this as strong authentication.
54. Strong Authentication Methods in Condor (3)
- GSI: the Grid Security Infrastructure, developed by the Globus Alliance. It is a PKI which uses X.509 digital certificates, and is so appallingly implemented that my little sister's cat could probably break it. The Globus Alliance have yet to produce a stable implementation of GSI under Windows, so Condor only supports GSI under Linux (and possibly some versions of UNIX?).
55. Other Authentication Methods in Condor
- IP/host-based security: this form of security is now considered outdated, but remains available. It allows or denies access based on the IP address or DNS name of the remote machine.
- "Claim To Be" authentication: accept whatever identity is presented by the client, i.e. no authentication. Normally used for testing purposes.
- "Anonymous" authentication: skip authentication checks, i.e. no authentication. Normally used for testing purposes.
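The IP/host-based mechanism is driven by the HOSTALLOW_*/HOSTDENY_* configuration macros. A sketch (real macro names from Condor 6.6; the hostnames are hypothetical placeholders):

```
# Let any machine in the department query pool status...
HOSTALLOW_READ = *.example.ac.uk

# ...but only the designated submit host send jobs or updates.
HOSTALLOW_WRITE = submit.example.ac.uk
```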
56. Other Security Considerations
- Condor will, at least in some circumstances, overwrite existing files if they have the same name as the output file(s) produced by a job. Also, output files can have any legal file name the job chooses. The inventive amongst you can probably see how these (particularly in combination) could lead to problems.
- It is possible to submit a Condor job which will spawn a process that does not die or get killed when the job completes or when Condor terminates it. Apparently Condor's behaviour in this regard will improve under the 2.6 kernel (that is, when Condor officially supports that kernel).
- There exists a trivial DoS attack against Condor, which anyone who can submit jobs to a Condor pool can carry out, and which will cripple the central manager. This will eventually be addressed in a future version of Condor.
57. And now…
[Photo: California Condor, by Jamie Spangler]
58. What I haven't discussed
- How Condor compares to batch queuing systems and other batch scheduling systems, such as PBS, LSF and Sun Grid Engine (now N1 Grid Engine)
- The roles of the different Condor daemons
- The consequences of Condor daemons not running as root; some of these consequences require a more detailed knowledge of how Condor works than I have given here, so if you really need to know, ask me privately or consult the Condor manual, Sections 3.7.1.1 and 3.7.2
- Compiling code against the Condor libraries for the standard universe
- The intricacies of job submission; this is a seminar in itself. Consult the Condor manual, Sections 2.4 and 2.5 and the condor_submit section in Section 9, and then ask (nicely) myself or Mark Calleja (Earth Sciences) for guidance
- Job monitoring
59. References: Condor Project and documentation
- Condor Project
- http://www.cs.wisc.edu/condor/
- Condor Manual
- Current Stable Release: http://www.cs.wisc.edu/condor/manual/v6.6
- Current Development Release: http://www.cs.wisc.edu/condor/manual/v6.7
- Previous Stable Release: http://www.cs.wisc.edu/condor/manual/v6.4
60. References: other users of Condor
- Testimonials from Condor users
- http://www.cs.wisc.edu/condor/wts/stories.html
- UCL Condor
- http://grid.ucl.ac.uk/Condor.html
- eMinerals minigrid (Department of Earth Sciences)
- http://www.esc.cam.ac.uk/mcal00/grid.html
- Southampton University Computing Services Windows Condor Pilot Service
- http://www.iss.soton.ac.uk/research/e-science/condor/
- University of Reading Department of Meteorology GRID
- http://www.met.rdg.ac.uk/swsellis/system/grid/
- Condor at the University of Essex
- http://cswww.essex.ac.uk/intranet/students/TechnicalGroup/TechnicalHelp/condor.htm
61. References: security and firewalls
- Condor v6.6 Manual, Section 3.7, "Security In Condor"
- http://www.cs.wisc.edu/condor/manual/v6.6/3_7Security_In.html
- MIT Kerberos
- http://web.mit.edu/kerberos/www/
- Microsoft's Security Support Provider Interface
- http://msdn.microsoft.com/library/en-us/security/security/sspi.asp
- Grid Security Infrastructure (GSI)
- http://www-unix.globus.org/toolkit/docs/3.2/gsi/index.html
- http://www.globus.org/security/v2.0/
- Documentation from the CamGrid project on Condor and firewalls
- http://www.escience.cam.ac.uk/projects/camgrid/documentation.html
62. References: other schedulers
- Sun Grid Engine (now known as N1 Grid Engine)
- http://wwws.sun.com/software/gridware/
- http://www.sun.com/products-n-solutions/hardware/docs/Software/Sun_Grid_Engine/
- Portable Batch System (PBS)
- http://www.openpbs.org/
- http://www.pbspro.com/
- Platform LSF
- http://www.platform.com/products/LSF/
63. Contacts
- Myself: Bruce Beckles, e-Science Specialist
- condor-support@ucs.cam.ac.uk
- mbb10@cam.ac.uk
- Mark Calleja, Department of Earth Sciences
- Probably the most experienced user of Condor in the University.
64. Questions?