Title: The Condor JobRouter
aka schedd on the side
Status
- It's in the current Condor development series, 7.1.0, on Unix (Windows support to follow soon)
- Used heavily by the CMS physics experiment for simulation on the Open Science Grid (millions of jobs routed)
What is job routing?
[Figure: the JobRouter takes the original (vanilla) job, uses its routing table (Site 1, Site 2) to produce a routed (grid) job, and reports the routed job's final status back to the original job.]
Original (vanilla) job:
  Universe = vanilla
  Executable = sim
  Arguments = seed345
  Output = stdout.345
  Error = stderr.345
  ShouldTransferFiles = True
  WhenToTransferOutput = ON_EXIT
Routed (grid) job:
  Universe = grid
  GridType = gt2
  GridResource = cmsgrid01.hep.wisc.edu/jobmanager-condor
  Executable = sim
  Arguments = seed345
  Output = stdout
  Error = stderr
  ShouldTransferFiles = True
  WhenToTransferOutput = ON_EXIT
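The core of the transformation above is "copy the job ad, then overwrite or insert a few attributes." The sketch below models that in Python, assuming a job ad is a plain dict of attribute names to values and a route is a dict of attributes to set on the copy. The names `route_job` and `ROUTE` are hypothetical; this is an illustration, not the JobRouter's actual code, and it models only the universe and grid attributes, not every difference between the two ads.

```python
from copy import deepcopy

# Hypothetical route: attributes to set on the routed copy of the job.
ROUTE = {
    "Universe": "grid",
    "GridType": "gt2",
    "GridResource": "cmsgrid01.hep.wisc.edu/jobmanager-condor",
}

def route_job(original: dict, route: dict) -> dict:
    """Return a routed copy of `original`; the original ad is left untouched."""
    routed = deepcopy(original)
    # Change existing values (e.g. Universe) and insert new ones (e.g. GridResource).
    routed.update(route)
    return routed

original = {
    "Universe": "vanilla",
    "Executable": "sim",
    "Arguments": "seed345",
    "Output": "stdout.345",
    "Error": "stderr.345",
    "ShouldTransferFiles": True,
    "WhenToTransferOutput": "ON_EXIT",
}

routed = route_job(original, ROUTE)
print(routed["Universe"], routed["GridResource"])
# prints: grid cmsgrid01.hep.wisc.edu/jobmanager-condor
```

Keeping the original ad intact mirrors the picture above: the original job stays in the local queue while the routed copy runs elsewhere.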
Routing is just site-level matchmaking
- With feedback from the job queue:
  - number of jobs currently routed to site X
  - number of idle jobs routed to site X
  - rate of recent success/failure at site X
- And with the power to modify the job ad:
  - change attribute values (e.g. Universe)
  - insert new attributes (e.g. GridResource)
  - add a portal grid proxy if desired
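Site-level matchmaking with queue feedback can be sketched as a filter over routes followed by a choice among the survivors. This is a hypothetical Python model, not the JobRouter's implementation: `Route` and `pick_route` are illustrative names, `max_idle_jobs` and `failure_rate_threshold` mirror the MaxIdleJobs and FailureRateThreshold route attributes, and the "fewest idle jobs wins" tie-break is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Route:
    # Hypothetical per-site route entry plus the feedback counters tracked for it.
    name: str
    grid_resource: str
    max_idle_jobs: int
    failure_rate_threshold: float
    routed: int = 0          # jobs currently routed to this site
    idle: int = 0            # routed jobs still idle in the remote queue
    recent_success: int = 0
    recent_failure: int = 0

    def failure_rate(self) -> float:
        total = self.recent_success + self.recent_failure
        return self.recent_failure / total if total else 0.0

    def accepts(self) -> bool:
        # Stop feeding a site that already has too many idle jobs or fails too often.
        return (self.idle < self.max_idle_jobs
                and self.failure_rate() <= self.failure_rate_threshold)

def pick_route(routes: List[Route]) -> Optional[Route]:
    candidates = [r for r in routes if r.accepts()]
    if not candidates:
        return None  # no site accepts more work; the job waits locally
    # Illustrative tie-break: prefer the site with the fewest idle routed jobs.
    return min(candidates, key=lambda r: r.idle)

sites = [
    Route("Grid Site 1", "gt2 gatekeeper1", 10, 0.01, idle=10),  # idle limit reached
    Route("Grid Site 2", "gt2 gatekeeper2", 10, 0.01, idle=3),
    Route("Grid Site 3", "gt2 gatekeeper3", 10, 0.01,
          recent_success=9, recent_failure=1),                   # 10% failures
]
print(pick_route(sites).name)  # prints: Grid Site 2
```

The feedback counters are what distinguish this from static load balancing: a site that backs up or starts failing drops out of the candidate list without any manual intervention.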
Configuring the Routing Table
- JOB_ROUTER_ENTRIES
  - list site ClassAds in the configuration file
- JOB_ROUTER_ENTRIES_FILE
  - read site ClassAds periodically from a file
- JOB_ROUTER_ENTRIES_CMD
  - read site ClassAds periodically from the output of a script
  - example: query a collector such as the Open Science Grid Resource Selection Service
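A JOB_ROUTER_ENTRIES_CMD script only has to print route ClassAds on standard output. The sketch below assumes the bracketed `[ attr = value; ... ]` ClassAd form for route entries and hard-codes two hypothetical sites; a real script would build the list by querying a collector such as the Open Science Grid Resource Selection Service. The `SITES` list and the gatekeeper host names are invented for illustration.

```python
import sys

# Hypothetical site list; a production script would build this from a collector query.
SITES = [
    {"Name": "Grid Site 1",
     "GridResource": "gt2 gatekeeper1.example.edu/jobmanager-condor",
     "MaxIdleJobs": 10, "FailureRateThreshold": 0.01},
    {"Name": "Grid Site 2",
     "GridResource": "gt2 gatekeeper2.example.edu/jobmanager-condor",
     "MaxIdleJobs": 20, "FailureRateThreshold": 0.05},
]

def classad_value(value) -> str:
    # ClassAd booleans are lowercase; check bool before int/float since bool is an int.
    if isinstance(value, bool):
        return "true" if value else "false"
    # Strings are double-quoted with backslashes and quotes escaped.
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    # Numbers are written as-is.
    return repr(value)

def format_route(site: dict) -> str:
    body = " ".join(f"{key} = {classad_value(val)};" for key, val in site.items())
    return f"[ {body} ]"

def main(out=sys.stdout) -> None:
    # One route ClassAd per line on stdout.
    for site in SITES:
        print(format_route(site), file=out)

if __name__ == "__main__":
    main()
```

Keeping the site discovery in a separate script means the routing table can track changes in the grid without editing the Condor configuration itself.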
Syntax
- Read the 7.1 manual.
- It's in the chapter on Grid Computing.
- Example route entry:
    [ Name = "Grid Site 1";
      GridResource = "gt2 gatekeeper";
      MaxIdleJobs = 10;
      FailureRateThreshold = 0.01; ]
What Types of Input Jobs?
- Vanilla universe
- Self-contained (everything needed is in the file transfer list)
- High throughput (many more jobs than CPUs)
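The first two input-job requirements can be approximated by a filter over the job ad. This is a minimal sketch, assuming job ads are dicts using the attribute names from the routing example; `is_routable` is a hypothetical name. It can only check what is visible in the ad, so whether the transfer list really contains everything the job needs remains the submitter's responsibility, and "high throughput" describes the workload rather than any single job.

```python
def is_routable(job: dict) -> bool:
    """Rough check that a job fits the JobRouter's input-job profile (illustrative only)."""
    universe = str(job.get("Universe", "")).lower()
    # Accept either the boolean shown in the routing example or a "YES" string value.
    transfers_files = job.get("ShouldTransferFiles") in (True, "YES")
    return universe == "vanilla" and transfers_files

jobs = [
    {"Universe": "vanilla", "ShouldTransferFiles": True},   # routable
    {"Universe": "grid", "ShouldTransferFiles": True},      # already a grid job
    {"Universe": "vanilla", "ShouldTransferFiles": False},  # relies on a shared filesystem
]
print([is_routable(j) for j in jobs])  # prints: [True, False, False]
```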
What Target Grid Types?
- Globus and Condor-C work well
- Others are untested but should be fine
- Why only target the grid universe?
  - No reason at all
  - 7.1.1 now allows any destination universe
Grid Gotchas
- Globus gt2
  - no exit status from the job (reported as 0)
  - must explicitly list the desired output files
JobRouter vs. Glidein
- Glidein: Condor overlays the grid
  - job never waits in a remote queue
  - job runs in its normal universe
  - private networks are doable but add complexity
  - need something to submit glideins on demand
- JobRouter
  - some jobs wait in the remote queue (MaxIdleJobs)
  - job must be compatible with the target grid's semantics
  - simple to set up, fully automatic to run