Title: Condor in the University of Oxford OxGrid
1 Condor in the University of Oxford - OxGrid
- David Spence, Tiejun Ma, Xin Xiong and David Wallom
- Oxford e-Research Centre
2 Overview
- Background
- OxGrid Design
- Linux Clients for Windows
- Networking
- Job management
- Packaging
- OxGrid Projects
- Conclusion
3 Background
- Oxford University has a complex structure, which makes rolling out a Campus Grid tricky
- There are 38 colleges which are legally independent of the University...
- ... and at least 63 academic departments which are semi-autonomous
- The colleges provide the bulk of PCs accessible to students
- Some departments provide teaching laboratories
- Each has its own independent IT support staff
- OeRC is just one of the departments
4 OxGrid Architecture: Administration
[Diagram. Labels: Oxford e-Research Centre; Department/College (x3); Resource Broker/Login (Condor); Storage (SRB); BDII, VOMS, SSO CA...; Condor pool (x2); Departmental Clusters; Other University/Institution (x3); Microsoft Cluster; National Grid Service Cluster; Super-computing centre; National Grid Service Resource]
5 OxGrid Architecture: Technical
[Diagram. Labels: OxGrid Resource Broker; Job Submit Scripts; condor_schedd; RB Advertise Script; condor_collector; condor_negotiator; condor_gridmanager; Globus; Glue; Condor; OxGrid BDII; NGS BDII; Oxford Condor Pool Headnodes; Oxford Condor Pool; OeRC Resources; Other Oxford Resources; National Grid Service Resource; Machines. Annotation: adverts include dynamic information, e.g. FreeCPU]
6 User Interface
- getcert
- Obtain a local, low assurance certificate
- Uses University SSO architecture and MyProxy
- submit-job
- Test accessibility of resources
- Matchmaking based on free CPUs
- Submit to different resource types (Campus Grid, NGS...)
- Flexible parameter/file/filename sweeps
- Various waiting primitives for implementing workflows
- job-submission-script
- Simpler, one job
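The accessibility test and free-CPU matchmaking listed for submit-job can be pictured with a minimal Python sketch. This is not the OxGrid implementation: the `Resource` fields, the resource names and the "most free CPUs wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Resource:
    name: str        # hypothetical resource name
    kind: str        # hypothetical label, e.g. "campus" or "ngs"
    free_cpus: int   # advertised number of free CPUs
    reachable: bool  # outcome of an accessibility test


def pick_resource(resources: Iterable[Resource],
                  cpus_needed: int = 1,
                  kind: Optional[str] = None) -> Optional[Resource]:
    """Return the reachable resource with the most free CPUs, or None."""
    candidates = [
        r for r in resources
        if r.reachable
        and r.free_cpus >= cpus_needed
        and (kind is None or r.kind == kind)
    ]
    if not candidates:
        return None
    # Assumed ranking rule: prefer the resource advertising most free CPUs.
    return max(candidates, key=lambda r: r.free_cpus)


example = [
    Resource("pool-a", "campus", 4, True),
    Resource("pool-b", "campus", 16, False),  # fails the accessibility test
    Resource("ngs-x", "ngs", 8, True),
]
chosen = pick_resource(example, kind="campus")
print(chosen.name if chosen else "no match")
```

Restricting `kind` mirrors the ability to submit to a particular resource type; leaving it as `None` considers every reachable resource.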
7 Departmental Condor pools
- We are signing up Colleges and Departments
- We set up a head node for the department
- Automatic installation script using VDT
- Called condor.DEPT.ox.ac.uk
- This is placed in their machine room
- Firewalls only need to be open for Globus traffic between this headnode and the resource broker in OeRC
- We then need to get IT staff to install clients
- Linux installation instructions
- But we really want Linux on Windows
- Electricity cost is becoming an issue
8 Windows Client Requirements
- We want to support Windows and Linux jobs
- With mixed Condor pools
- Desktop and Lab PCs are not centrally owned or managed
- Most departments have Windows PCs
- It is not easy to persuade departmental IT staff to support the Campus Grid
- We need to provide an easy-to-install client which can be used with existing management frameworks
9 Using CoLinux
- We decided to use the CoLinux system for running Linux Condor under Windows
- CoLinux is the Linux kernel ported to Windows
- It has proved to be reliable in our experience
- It is nearly as fast as native (0.1 overhead, see http://www.ibm.com/developerworks/linux/library/l-colinux/)
- It requires no porting of code
- Appears as a normal Windows service to Windows
- Can be stopped/started from Services
- Or net start/stop colinux
10 CoLinux Networking
- Must use the same interface/IP as Windows in the general case
- Use Slirp, which makes all of CoLinux look like a Windows process
- Internal 10.x.x.x network
- Great for security: specify exactly which ports Linux can listen on.
- There is a small range of ports, configured in Condor with IN_LOWPORT and IN_HIGHPORT (Condor's settings for incoming, i.e. listening, ports).
11 CoLinux Network Configuration
[Diagram. Labels: Windows; Win Sock API; Connection REAL_IP; Internal Network; Gateway 10.0.0.2; CoLinux; Socket API; Condor; eth0 10.0.2.15. Packet annotations: "From 10.0.2.15 To RemoteIP, ClassAd Contact 10.0.2.15:port"; "From RealIP To RemoteIP, ClassAd Contact 10.0.2.15:port"; "From 10.0.2.15 To RemoteIP, ClassAd Contact RealIP:port"; "From RealIP To RemoteIP, ClassAd Contact RealIP:port"]
12 CoLinux Network Configuration
- On start-up re-configure the networking in CoLinux
- IP of Windows passed to CoLinux as kernel parameter
- Create a new IP alias for eth0 with the Windows IP address.
- ifconfig eth0:1 ip netmask 255.255.255.255
- Set up IP Tables to re-route requests
- iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to 10.0.2.15
- iptables -t nat -A PREROUTING -i eth0 -j DNAT --to ip
- Make up a unique hostname: colinux.<windows IP>
- hostname colinux.ip
- Add to /etc/hosts
- Export some variables to Condor, which are used in the Condor configuration files
- HOST_OS_IP=ip, used for NETWORK_INTERFACE
- CONDOR_HOST=condor.domain, used for CONDOR_HOST
- CONDOR_DOMAIN=.domain, used for access control
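As a rough illustration of the start-up steps above, the following Python sketch only builds the command strings and exported variables from the Windows IP address; it does not execute anything. The function names are hypothetical, the `eth0:1` alias and the leading dot in `CONDOR_DOMAIN` are readings of the slide text, and the `/etc/hosts` line format is an assumption.

```python
import ipaddress
from typing import Dict, List

SLIRP_GUEST_IP = "10.0.2.15"  # CoLinux eth0 address as given in the slides


def _checked_ip(windows_ip: str) -> str:
    """Return the address in dotted form; raises ValueError if malformed."""
    return str(ipaddress.IPv4Address(windows_ip))


def reconfigure_commands(windows_ip: str) -> List[str]:
    """Commands that alias the Windows IP and re-route traffic via NAT."""
    ip = _checked_ip(windows_ip)
    return [
        f"ifconfig eth0:1 {ip} netmask 255.255.255.255",
        f"iptables -t nat -A POSTROUTING -o eth0 -j SNAT --to {SLIRP_GUEST_IP}",
        f"iptables -t nat -A PREROUTING -i eth0 -j DNAT --to {ip}",
        f"hostname colinux.{ip}",
    ]


def hosts_entry(windows_ip: str) -> str:
    """Line to append to /etc/hosts (assumed 'address hostname' layout)."""
    ip = _checked_ip(windows_ip)
    return f"{ip} colinux.{ip}"


def condor_environment(windows_ip: str, domain: str) -> Dict[str, str]:
    """Variables exported for use in the Condor configuration files."""
    ip = _checked_ip(windows_ip)
    return {
        "HOST_OS_IP": ip,                  # used for NETWORK_INTERFACE
        "CONDOR_HOST": f"condor.{domain}",
        "CONDOR_DOMAIN": f".{domain}",     # used for access control
    }


for command in reconfigure_commands("192.0.2.10"):
    print(command)
```

Validating the address first means a malformed kernel parameter fails early rather than producing a broken iptables rule.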
13 Controlling CoLinux Jobs
- Providing feedback about user activity in Windows
- University of Nebraska: monitoring scripts in Windows
- University of Reading: start up service on power-on/log-off, stop at log-in (Group Policy)
- Oxford teaching labs: run in background at low priority
- Many Oxford Departments support Linux AND Windows
- Let Condor for Windows monitor the machine!
- Periodically check Windows Condor status
- Add HostState to the ClassAd for CoLinux Condor
- Used to control when jobs can run
- e.g. Start = (HostState != "Claimed")
- Run job if there is at least one free core reported
- Best to only use with multiple cores
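One way to read the HostState idea is sketched below in Python. How the Windows Condor status is actually queried is not described in the slides, so the sketch takes a list of slot states as input. The rule that HostState is "Claimed" only when every slot is claimed is an assumption derived from the "at least one free core" bullet, and the function names are hypothetical.

```python
from typing import Sequence


def host_state(slot_states: Sequence[str]) -> str:
    """Summarise the Windows Condor slot states as one HostState value.

    Assumed rule: report "Claimed" only if every slot is claimed, so that
    a single free core is enough for a CoLinux job to start. Otherwise the
    state of the first slot that is not claimed is passed through.
    """
    if not slot_states:
        raise ValueError("no slot states reported")
    for state in slot_states:
        if state != "Claimed":
            return state
    return "Claimed"


def classad_line(state: str) -> str:
    """Attribute line to add to the ClassAd for CoLinux Condor."""
    return f'HostState = "{state}"'


def may_start(state: str) -> bool:
    """Python equivalent of the example policy HostState != "Claimed"."""
    return state != "Claimed"


state = host_state(["Claimed", "Unclaimed"])
print(classad_line(state), may_start(state))
```

On a single-core machine the list has one entry, so the CoLinux job is blocked whenever Windows Condor is running a job, which fits the advice to use this mainly with multiple cores.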
14 Packaging
- For ease of installation we create an MSI
- One-click installation and configuration of CoLinux
- Can include Condor for Windows and set it up to work with the CoLinux Condor setup
- Can be used with all Windows remote management systems
- Will automatically set up configuration files based on machine characteristics
- Debian-based filesystem image
- 1 GB limit to cab files: compress image with bzip2 (2.2 GB -> 336 MB)
- Once we have set up a head-node for the department/college, all the local IT staff need to do is run the installer
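The compression step can be sketched in Python with the standard bz2 module. The figures quoted above amount to a compressed image of roughly 15% of the original size. This is an illustration only, not the actual packaging script: the function names are hypothetical, and treating the 1 GB limit as 1024^3 bytes is an assumption.

```python
import bz2
import os

CAB_LIMIT_BYTES = 1024 ** 3  # 1 GB cab limit; binary gigabyte is assumed


def compress_image(src_path: str, dst_path: str,
                   chunk_size: int = 1024 * 1024) -> int:
    """Stream-compress src_path to dst_path with bzip2.

    Reads in chunks so a multi-gigabyte filesystem image never has to
    fit in memory. Returns the compressed size in bytes.
    """
    with open(src_path, "rb") as src, bz2.open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
    return os.path.getsize(dst_path)


def fits_in_cab(size_bytes: int, limit: int = CAB_LIMIT_BYTES) -> bool:
    """True if a file of this size stays within the cab file limit."""
    return size_bytes <= limit
```

A packaging script could call `compress_image` on the filesystem image and refuse to build the installer when `fits_in_cab` returns False.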
15 Ongoing work: OxGrid Projects
- Low-Carbon ICT
- Research and communications in low carbon technologies: University-wide wake-on-LAN service and Condor integration (with OUCS, Oxford Environmental Change Institute).
- GridBS
- Extending the GridSAM middleware with Condor Matchmaking (with Imperial, OMII at Southampton). Basis of the current Resource Broker
- SARoNGS
- Integrating the access to e-Research facilities with Federated Access Management (Shibboleth) being rolled out across the UK (with STFC and Manchester).
16 Conclusion
- Departmental and College Condor pools are accessed through a central pool of pools that also supports access to other resources
- CoLinux is used to provide Linux support on Windows
- No networking configuration required
- Can be combined with Condor for Windows
- Zero-configuration installer for all components