Title: Using New Features in Condor 7.2
2. Outline
- Startd Hooks
- Job Router
- Job Router Hooks
- Power Management
- Dynamic Slot Partitioning
- Concurrency Limits
- Variable Substitution
- Preemption Attributes
3. Startd Job Hooks
- Users wanted to take advantage of Condor's resource management daemon (condor_startd) to run jobs, but they had their own scheduling system
  - Specialized scheduling needs
  - Jobs live in their own database or other storage rather than a Condor job queue
4. Our solution
- Make a system of generic hooks that you can plug into
  - A hook is a point during the life-cycle of a job where the Condor daemons will invoke an external program
- Hook Condor to your existing job management system without modifying the Condor code
5. How does Condor communicate with hooks?
- Passing around ASCII ClassAds via standard input and standard output
- Some hooks get control data via a command-line argument (argv)
- Hooks can be written in any language (scripts, binaries, whatever you want) so long as you can read/write stdin/stdout
6. What hooks are available?
- Hooks for fetching work (startd)
  - FETCH_WORK
  - REPLY_FETCH
  - EVICT_CLAIM
- Hooks for running jobs (starter)
  - PREPARE_JOB
  - UPDATE_JOB_INFO
  - JOB_EXIT
7. HOOK_FETCH_WORK
- Invoked by the startd whenever it wants to try to fetch new work
  - The FetchWorkDelay expression controls how long the startd waits between attempts
- Stdin: slot ClassAd
- Stdout: job ClassAd
  - If stdout is empty, there's no work
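A fetch hook along these lines could be sketched in Python. This is only an illustration, not Condor's own code: `parse_classad` handles just simple `Attr = value` lines, `next_job_for` is a hypothetical stand-in for a query against your own job store, and the job attributes shown (`Cmd`, `Arguments`, `Owner`) are example values, not the full set a real job ClassAd needs.

```python
def parse_classad(text):
    """Parse simple 'Attr = value' lines into a dict (values kept as raw strings)."""
    ad = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        name, _, value = line.partition("=")
        ad[name.strip()] = value.strip()
    return ad


def next_job_for(slot_ad):
    """Hypothetical stand-in for a lookup in your own job database."""
    # Assumption for the example: only hand out work to slots with >= 512 MB.
    if int(slot_ad.get("Memory", "0")) < 512:
        return None
    return {"Cmd": '"/bin/sleep"', "Arguments": '"60"', "Owner": '"nobody"'}


def format_classad(ad):
    """Serialize a dict back into 'Attr = value' lines."""
    return "".join(f"{name} = {value}\n" for name, value in ad.items())


def fetch_work(stdin_text):
    """Return what the hook should print: a job ad, or '' when there is no work."""
    job = next_job_for(parse_classad(stdin_text))
    return format_classad(job) if job else ""

# As a real hook script, the file would end with something like:
#   import sys; sys.stdout.write(fetch_work(sys.stdin.read()))
```

Keeping the logic in `fetch_work` rather than at module level makes it easy to exercise without a running startd.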
8. HOOK_REPLY_FETCH
- Invoked by the startd once it decides what to do with the job ClassAd returned by HOOK_FETCH_WORK
- Gives your external system a chance to know what happened
- argv[1]: "accept" or "reject"
- Stdin: slot and job ClassAds
- Stdout: ignored
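A reply hook could be sketched as follows. The slides do not say how the two ClassAds are delimited on stdin, so the `-----` separator line used here is an assumption to check against the manual for your version; `split_ads` and `handle_reply` are hypothetical helper names.

```python
def split_ads(text, separator="-----"):
    """Split hook input into individual ad texts (the separator is an assumption)."""
    ads, current = [], []
    for line in text.splitlines():
        if line.strip() == separator:
            ads.append("\n".join(current))
            current = []
        else:
            current.append(line)
    ads.append("\n".join(current))
    # Drop empty chunks, e.g. from a trailing separator.
    return [ad for ad in ads if ad.strip()]


def handle_reply(decision, stdin_text):
    """Return a log message describing the startd's decision (argv[1])."""
    if decision not in ("accept", "reject"):
        raise ValueError(f"unexpected argv[1]: {decision!r}")
    ads = split_ads(stdin_text)
    return f"startd chose to {decision} the job ({len(ads)} ads received)"
```

In a real hook, `decision` would come from `sys.argv[1]`, and the message would go to your own system or log, since stdout is ignored.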
9. HOOK_EVICT_CLAIM
- Invoked if the startd has to evict a claim that's running fetched work
- Informational only: you can't stop or delay this train once it's left the station
- Stdin: both slot and job ClassAds
- Stdout: ignored
10. HOOK_PREPARE_JOB
- Invoked by the condor_starter when it first starts up (only if defined)
- Opportunity to prepare the job execution environment
  - Transfer input files, executables, etc.
- Stdin: both slot and job ClassAds
- Stdout: ignored, but the starter won't continue until this hook exits
- Not specific to fetched work
11. HOOK_UPDATE_JOB_INFO
- Periodically invoked by the starter to let you know what's happening with the job
- Stdin: slot and job ClassAds
  - Job ClassAd is updated with additional attributes computed by the starter
  - ImageSize, JobState, RemoteUserCpu, etc.
- Stdout: ignored
12. HOOK_JOB_EXIT
- Invoked by the starter whenever the job exits for any reason
- argv[1] indicates what happened
  - "exit": Died a natural death
  - "evict": Booted off prematurely by the startd (PREEMPT == TRUE, condor_off, etc.)
  - "remove": Removed by condor_rm
  - "hold": Held by condor_hold
13. HOOK_JOB_EXIT
- HUH!?! condor_rm? What are you talking about?
  - The starter hooks can be defined even for regular Condor jobs, local universe, etc.
- Stdin: copy of the job ClassAd with extra attributes about what happened
  - ExitCode, JobDuration, etc.
- Stdout: ignored
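The four argv[1] values can be handled with a small dispatch table. A minimal sketch, assuming the job ClassAd has already been parsed into a Python dict; `describe_exit` is a hypothetical helper, and the wording of the messages is only for illustration.

```python
EXIT_REASONS = {
    "exit": "job finished on its own",
    "evict": "job was evicted by the startd",
    "remove": "job was removed (condor_rm)",
    "hold": "job was put on hold (condor_hold)",
}


def describe_exit(reason, job_ad):
    """Build a one-line summary from argv[1] and a parsed job ad (a dict)."""
    if reason not in EXIT_REASONS:
        raise ValueError(f"unknown exit reason: {reason!r}")
    summary = EXIT_REASONS[reason]
    # ExitCode is one of the extra attributes the starter adds to the ad.
    if reason == "exit" and "ExitCode" in job_ad:
        summary += f", exit code {job_ad['ExitCode']}"
    return summary
```

A real hook would write this summary back to the external job system, since the starter ignores whatever the hook prints.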
14. Defining hooks
- Each slot can have its own hook keyword
  - Prefix for config file parameters
  - Can use different sets of hooks to talk to different external systems on each slot
  - Global keyword used when the per-slot keyword is not defined
- Keyword is inserted by the startd into its copy of the job ClassAd and given to the starter
15. Defining hooks: example
- Most slots fetch work from the database system
  - STARTD_JOB_HOOK_KEYWORD = DATABASE
- Slot4 fetches and runs work from a web service
  - SLOT4_JOB_HOOK_KEYWORD = WEB
- The database system needs to both provide work and know the reply for each attempted claim
  - DB_DIR = /usr/local/condor/fetch/db
  - DATABASE_HOOK_FETCH_WORK = $(DB_DIR)/fetch_work.php
  - DATABASE_HOOK_REPLY_FETCH = $(DB_DIR)/reply_fetch.php
- The web system only needs to fetch work
  - WEB_DIR = /usr/local/condor/fetch/web
  - WEB_HOOK_FETCH_WORK = $(WEB_DIR)/fetch_work.php
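The keyword lookup described above (per-slot keyword first, then the global one, then the keyword as a prefix) can be modelled in a few lines of Python. This is a sketch of the lookup rule only, with the configuration already expanded into a plain dict; `hook_path` is a hypothetical helper, not a Condor API.

```python
def hook_path(config, slot_id, hook):
    """Look up the program for `hook` (e.g. 'FETCH_WORK') on the given slot."""
    # The per-slot keyword wins; otherwise fall back to the global keyword.
    keyword = config.get(f"SLOT{slot_id}_JOB_HOOK_KEYWORD",
                         config.get("STARTD_JOB_HOOK_KEYWORD"))
    if keyword is None:
        return None
    # The keyword is the prefix of the hook parameters.
    return config.get(f"{keyword}_HOOK_{hook}")


config = {
    "STARTD_JOB_HOOK_KEYWORD": "DATABASE",
    "SLOT4_JOB_HOOK_KEYWORD": "WEB",
    "DATABASE_HOOK_FETCH_WORK": "/usr/local/condor/fetch/db/fetch_work.php",
    "DATABASE_HOOK_REPLY_FETCH": "/usr/local/condor/fetch/db/reply_fetch.php",
    "WEB_HOOK_FETCH_WORK": "/usr/local/condor/fetch/web/fetch_work.php",
}
```

With this configuration, slot 4 has a fetch hook but no reply hook, so `hook_path(config, 4, "REPLY_FETCH")` returns `None`, matching the statement that the web system only needs to fetch work.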
16. Semantics of fetched jobs
- The condor_startd treats them just like any other kind of job
  - All the standard resource policy expressions apply (START, SUSPEND, PREEMPT, RANK, etc.)
  - Fetched jobs can coexist in the same pool with jobs pushed by Condor, COD, etc.
- Fetched work != Backfill
17. Semantics continued
- If the startd is unclaimed and fetches a job, a claim is created
- If that job completes, the claim is reused and the startd fetches again
- Keep fetching until either
  - The claim is evicted by Condor
  - The fetch hook returns no more work
18. Limitations of the hooks
- If the starter can't run your fetched job because your ClassAd is bogus, no hook is invoked to tell you about it
  - We need a HOOK_STARTER_FAILURE
- No hook when the starter is about to evict you (so you can checkpoint)
  - Can implement this yourself with a wrapper script and the SoftKillSig attribute
19. Job Router
- Automated way to let jobs run on a wider array of resources
  - Transform jobs into different forms
  - Reroute jobs to different destinations
20. What is job routing?
- Original (vanilla) job
    Universe = vanilla
    Executable = sim
    Arguments = seed345
    Output = stdout.345
    Error = stderr.345
    ShouldTransferFiles = True
    WhenToTransferOutput = ON_EXIT
- Routed (grid) job
    Universe = grid
    GridType = gt2
    GridResource = cmsgrid01.hep.wisc.edu/jobmanager-condor
    Executable = sim
    Arguments = seed345
    Output = stdout
    Error = stderr
    ShouldTransferFiles = True
    WhenToTransferOutput = ON_EXIT
- [Diagram: the JobRouter uses its routing table (Site 1, Site 2) to turn the original job into the routed job; the final status is passed back.]
21. Routing is just site-level matchmaking
- With feedback from job queue
  - number of jobs currently routed to site X
  - number of idle jobs routed to site X
  - rate of recent success/failure at site X
- And with power to modify job ad
  - change attribute values (e.g. Universe)
  - insert new attributes (e.g. GridResource)
  - add a portal grid proxy if desired
22. Configuring the Routing Table
- JOB_ROUTER_ENTRIES
  - list site ClassAds in configuration file
- JOB_ROUTER_ENTRIES_FILE
  - read site ClassAds periodically from a file
- JOB_ROUTER_ENTRIES_CMD
  - read periodically from a script
  - example: query a collector such as Open Science Grid Resource Selection Service
23. Syntax
- List of sites in new ClassAd format
    [ Name = "Grid Site 1"; ... ]
    [ Name = "Grid Site 2"; ... ]
    [ Name = "Grid Site 3"; ... ]
24. Syntax
    [ Name = "Site 1";
      GridResource = "gt2 gk.foo.edu";
      MaxIdleJobs = 10;
      MaxJobs = 200;
      FailureRateThreshold = 0.01;
      JobFailureTest = other.RemoteWallClockTime < 1800;
      Requirements = target.WantJobRouter is True;
      delete_WantJobRouter = true;
      set_PeriodicRemove = JobStatus == 5;
    ]
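To make the effect of such a route entry concrete, here is a small Python model of two things the entry controls: the MaxJobs/MaxIdleJobs throttles and the set_/delete_ edits to the job ad. It is a simplified sketch, not the JobRouter's implementation: ads are plain dicts, expression values such as `JobStatus == 5` are kept as uninterpreted strings, and the switch to the grid universe is hard-coded to mirror the routed-job example.

```python
def route_accepts(route, routed_jobs, idle_routed_jobs):
    """True if the route still has room under MaxJobs and MaxIdleJobs."""
    return (routed_jobs < route["MaxJobs"]
            and idle_routed_jobs < route["MaxIdleJobs"])


def apply_route(route, job_ad):
    """Return a copy of job_ad with the route's set_/delete_ edits applied."""
    routed = dict(job_ad)
    routed["Universe"] = "grid"
    routed["GridResource"] = route["GridResource"]
    for key, value in route.items():
        if key.startswith("set_"):
            routed[key[len("set_"):]] = value
        elif key.startswith("delete_") and value:
            routed.pop(key[len("delete_"):], None)
    return routed


route = {
    "Name": "Site 1",
    "GridResource": "gt2 gk.foo.edu",
    "MaxIdleJobs": 10,
    "MaxJobs": 200,
    "delete_WantJobRouter": True,
    "set_PeriodicRemove": "JobStatus == 5",
}
job = {"Universe": "vanilla", "Executable": "sim", "WantJobRouter": True}
routed = apply_route(route, job)
```

The original `job` dict is left untouched, which mirrors the way the original job stays in the queue while its routed copy runs elsewhere.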
25. What Types of Input Jobs?
- Vanilla Universe
- Self Contained (everything needed is in file transfer list)
- High Throughput (many more jobs than cpus)
26. Grid Gotchas
- Globus gt2
  - no exit status from job (reported as 0)
- Most grid universe types
  - must explicitly list desired output files
27. JobRouter vs. Glidein
- Glidein: Condor overlays the grid
  - job never waits in remote queue
  - job runs in its normal universe
  - private networks doable, but add to complexity
  - need something to submit glideins on demand
- JobRouter
  - some jobs wait in remote queue (MaxIdleJobs)
  - job must be compatible with target grid semantics
  - simple to set up, fully automatic to run
28. Job Router Hooks
- Truly transform jobs, not just reroute them
  - E.g. stuff a job into a virtual machine (either VM universe or Amazon EC2)
- Hooks invoked like startd ones
29. HOOK_TRANSLATE
- Invoked when a job is matched to a route
- Stdin: route name and job ad
- Stdout: transformed job ad
- Transformed job is submitted to Condor
30. HOOK_UPDATE_JOB_INFO
- Invoked periodically to obtain extra information about routed job
- Stdin: routed job ad
- Stdout: attributes to update in routed job ad
31. HOOK_JOB_FINALIZE
- Invoked when routed job has completed
- Stdin: ads of original and routed jobs
- Stdout: modified original job ad or nothing (no updates)
32. HOOK_JOB_CLEANUP
- Invoked when original job returned to schedd (both success and failure)
- Stdin: original job ad
- Use for cleanup of external resources
33. Power Management
- Hibernate execute machines when not needed
- Condor doesn't handle waking machines up yet
- Information to wake machines available in machine ads
34. Configuring Power Management
- HIBERNATE
  - Expression evaluated periodically by all slots to decide when to hibernate
  - All slots must agree to hibernate
- HIBERNATE_CHECK_INTERVAL
  - Number of seconds between hibernation checks
35. Setting HIBERNATE
- HIBERNATE must evaluate to one of these strings
  - "NONE", "0"
  - "S1", "1", "STANDBY", "SLEEP"
  - "S2", "2"
  - "S3", "3", "RAM", "MEM"
  - "S4", "4", "DISK", "HIBERNATE"
  - "S5", "5", "SHUTDOWN"
- These numbers are ACPI power states
36. Power Management on Linux
- On Linux, these methods are tried in order for setting power level
  - pm-utils tools
  - /sys/power
  - /proc/acpi
- LINUX_HIBERNATION_METHOD can be set to pick a favored method
37. Sample Configuration
    ShouldHibernate = \
      ( (KeyboardIdle > $(StartIdleTime)) \
        && $(CPUIdle) \
        && ($(StateTimer) > (2 * $(HOUR))) )
    HIBERNATE = ifThenElse( \
      $(ShouldHibernate), "RAM", "NONE" )
    HIBERNATE_CHECK_INTERVAL = 300
    LINUX_HIBERNATION_METHOD = /proc
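The logic of this sample policy can be restated in Python to make the conditions easier to read. This is only a model of the expression, with the ClassAd attributes and macros passed in as plain arguments (all times in seconds); the `machine_hibernates` helper is a simplified reading of "all slots must agree".

```python
HOUR = 3600


def hibernate_level(keyboard_idle, start_idle_time, cpu_idle, state_timer):
    """Mirror the sample policy: "RAM" when idle long enough, else "NONE"."""
    should_hibernate = (keyboard_idle > start_idle_time
                        and cpu_idle
                        and state_timer > 2 * HOUR)
    return "RAM" if should_hibernate else "NONE"


def machine_hibernates(slot_levels):
    """All slots must agree: hibernate only if no slot answers "NONE"."""
    return bool(slot_levels) and all(level != "NONE" for level in slot_levels)
```

For example, a slot whose keyboard has been idle for 20000 seconds, with an idle CPU and 8000 seconds in its current state, asks for "RAM"; a single slot answering "NONE" keeps the whole machine awake.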
38. Dynamic Slot Partitioning
- Divide slots into chunks sized for matched jobs
- Readvertise remaining resources
- Partitionable resources are cpus, memory, and disk
39. How It Works
- When match is made
  - New sub-slot is created for job and advertised
  - Slot is readvertised with remaining resources
- Slot can be partitioned multiple times
- Original slot ad never enters Claimed state
  - But may eventually have too few resources to be matched
- When claim on sub-slot is released, resources are added back to original slot
40. Configuration
- Resources still statically partitioned between slots
- SLOT_TYPE_<N>_PARTITIONABLE
  - Set to True to enable dynamic partitioning within indicated slot
41. New Machine Attributes
- In original slot machine ad
  - PartitionableSlot = True
- In ad for dynamically-created slots
  - DynamicSlot = True
- Can reference these in startd policy expressions
42. Job Submit File
- Jobs can request how much of partitionable resources they need
  - request_cpus = 3
  - request_memory = 1024
  - request_disk = 10240
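The bookkeeping behind partitioning (carve a sub-slot out of the remaining resources, give the resources back on release) can be modelled with a toy class. This sketch only tracks the three partitionable quantities; the class and method names are hypothetical, and the starting sizes in the usage lines are made-up example values.

```python
class PartitionableSlot:
    """Toy model of a partitionable slot's cpus, memory, and disk."""

    def __init__(self, cpus, memory, disk):
        self.free = {"cpus": cpus, "memory": memory, "disk": disk}

    def carve(self, **request):
        """Create a sub-slot for a matched job, or return None if it doesn't fit."""
        if any(self.free[name] < amount for name, amount in request.items()):
            return None
        for name, amount in request.items():
            self.free[name] -= amount
        return dict(request)

    def release(self, sub_slot):
        """Add a sub-slot's resources back when its claim is released."""
        for name, amount in sub_slot.items():
            self.free[name] += amount


slot = PartitionableSlot(cpus=8, memory=8192, disk=100000)
sub = slot.carve(cpus=3, memory=1024, disk=10240)   # the request shown above
too_big = slot.carve(cpus=6, memory=1024, disk=10240)  # only 5 cpus left
```

After the first `carve`, the slot would be readvertised with 5 cpus left, so the second request does not fit and `too_big` is `None`; this is the starvation risk for large jobs in miniature.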
43. Dynamic Partitioning Caveats
- Cannot preempt original slot or group of sub-slots
  - Potential starvation of jobs with large resource requirements
- Partitioning happens once per slot each negotiation cycle
  - Scheduling of large slots may be slow
44. Concurrency Limits
- Limit job execution based on admin-defined consumable resources
  - E.g. licenses
- Can have many different limits
- Jobs say what resources they need
- Negotiator enforces limits pool-wide
45. Concurrency Example
- Negotiator config file
  - MATLAB_LIMIT = 5
  - NFS_LIMIT = 20
- Job submit file
  - concurrency_limits = matlab,nfs:3
  - This requests 1 Matlab token and 3 NFS tokens
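The token arithmetic in this example can be sketched in Python. This is a model of the rule, not the negotiator's code: `parse_concurrency_limits` and `can_start` are hypothetical helpers, and treating a limit with no configured value as unlimited is an assumption made for the sketch.

```python
def parse_concurrency_limits(value):
    """Parse 'matlab,nfs:3' into {'matlab': 1, 'nfs': 3}."""
    tokens = {}
    for item in value.split(","):
        name, _, count = item.strip().partition(":")
        # A bare name requests one token.
        tokens[name.lower()] = int(count) if count else 1
    return tokens


def can_start(request, limits, in_use):
    """True if every requested limit has enough free tokens pool-wide."""
    # Assumption: a limit that is not configured is treated as unlimited.
    return all(in_use.get(name, 0) + count <= limits.get(name, float("inf"))
               for name, count in request.items())


limits = {"matlab": 5, "nfs": 20}
request = parse_concurrency_limits("matlab,nfs:3")
```

With 4 Matlab tokens and 17 NFS tokens already in use, the job fits exactly; one more running Matlab job would block it.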
46. New Variable Substitution
- $$(Foo) in submit file
  - Existing feature
  - Attribute Foo from machine ad substituted
- $$([Memory * 0.9]) in submit file
  - New feature
  - Expression is evaluated and then substituted
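The difference between the two forms, written `$$(Attr)` and `$$([expression])` in a submit file, can be illustrated with a small Python model. Condor performs the real substitution using ClassAd evaluation against the matched machine ad; the sketch below only imitates that for numbers, attribute names, and the four basic arithmetic operators, and `substitute` is a hypothetical helper.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node, machine_ad):
    """Evaluate a tiny arithmetic subset: numbers, attribute names, + - * /."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, machine_ad)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return machine_ad[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left, machine_ad),
                                   _evaluate(node.right, machine_ad))
    raise ValueError("unsupported expression")


def substitute(text, machine_ad):
    """Expand $$([expr]) and $$(Attr) using values from machine_ad."""
    def expand_expr(match):
        tree = ast.parse(match.group(1).strip(), mode="eval")
        return str(_evaluate(tree, machine_ad))
    # Expressions first, then plain attribute references.
    text = re.sub(r"\$\$\(\[(.*?)\]\)", expand_expr, text)
    return re.sub(r"\$\$\((\w+)\)", lambda m: str(machine_ad[m.group(1)]), text)
```

For a machine ad with `Memory` 2048 and `Arch` "X86_64", the text `--mem $$([Memory * 0.9]) --arch $$(Arch)` expands to `--mem 1843.2 --arch X86_64`.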
47. More Info For Preemption
- New attributes for these preemption expressions in the negotiator
  - PREEMPTION_REQUIREMENTS
  - PREEMPTION_RANK
- Used for controlling preemption due to user priorities
48. Preemption Attributes
- Submitter/RemoteUserPrio
  - User priority of candidate and running jobs
- Submitter/RemoteUserResourcesInUse
  - Number of slots in use by user of each job
- Submitter/RemoteGroupResourcesInUse
  - Number of slots in use by each user's group
- Submitter/RemoteGroupQuota
  - Slot quota for each user's group
49. Thank You!