Upgrading Condor Best Practices

About This Presentation


Slides: 23

Provided by: con92

Learn more at: https://research.cs.wisc.edu


Transcript and Presenter's Notes

1
Upgrading Condor: Best Practices
2
The problem

More frequent releases of Condor
Every six to nine months?
We understand this is a problem for users
We're willing to help out

3
Overview

Config file management
Condor testing strategies
Standard Universe issues

4
Config files

LOCAL_CONFIG_FILE
Used for include-like behaviour
LOCAL_CONFIG_FILE = \
$(HOSTS), $(GLOBAL), $(POLICY)

5
Typical Config file

## Try to save this much swap space by not
## starting new shadows.
## Specified in megabytes.
#RESERVED_SWAP = 5
A commented-out entry lists the default value

6
Config file editing

Never edit the base condor_config file
Except to specify the local file
Put all edits in a local file
One local file per config type
E.g. for schedds, CMs, types of execute machines
Can mix and match

7
Dealing with a new config

Diff base config with your config
Understand new items
Documented in the manual's version history
Existing ones rarely change
Usually capacity changes
Overwriting the base file almost always works
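
A minimal sketch of the "diff base config with your config" step, in Python. It assumes both files contain only simple single-line `NAME = value` assignments, and it treats setting names as case-insensitive; continuation lines and other config syntax are ignored. The function names and `NEW_KNOB` are hypothetical, used only for illustration.

```python
import re

# Matches simple "NAME = value" assignments. This is a simplification:
# continuation lines and any other config syntax are not handled.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*?)\s*$")


def parse_config(text):
    """Return {NAME: value} for uncommented single-line assignments."""
    settings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comments, including commented-out defaults
        match = ASSIGNMENT.match(line)
        if match:
            # Assumption: names are compared case-insensitively.
            settings[match.group(1).upper()] = match.group(2)
    return settings


def compare_configs(old_text, new_text):
    """Classify settings as added, removed, or changed between two configs."""
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(k for k in set(old) & set(new) if old[k] != new[k]),
    }


if __name__ == "__main__":
    old_base = "RESERVED_SWAP = 5\nMAX_JOBS_RUNNING = 200\n"
    new_base = "RESERVED_SWAP = 5\nMAX_JOBS_RUNNING = 500\nNEW_KNOB = True\n"
    print(compare_configs(old_base, new_base))
```

The "added" list is the set of new items to look up in the version history; the "changed" list is where capacity-style changes would show up.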

8
Managing config files

Centralized management is key
Cfengine, rsync, nfs (!) etc.

9
Testing new versions
10
Compatibility Guarantees

No guarantees
But we try very hard!
Both forward and backward
Especially within one machine
Federation techniques require this

11
Incremental testing!

Three basic components of Condor
Central Manager
Submit points
Execute machines
Test each independently

12
Testing Central Manager

Take advantage of statelessness
Condor HAD can help out here
If it breaks, existing jobs keep running

13
Testing schedds

Adding a new test schedd is easy
Real test jobs are useful too, not just sleep jobs
The schedd can be a bottleneck
Probably the only place you need to check CPU performance

14
Testing startds

Easy to test a few at once
Be careful when running standard universe jobs
Glidein can be very helpful
But beware of root-specific issues
Admin slots are helpful

15
Now that we've tested

Always be undo-able!
(never overwrite files)
Rely on the master restarting daemons when a binary's stat information changes

16
Big bang approach

What we do at CS
Just change a symlink to the binaries
Master does the rest
Can be a big hit on shared filesystems
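
The symlink switch, combined with the "never overwrite files" rule from the previous slide, can be sketched in Python as follows. This is an illustration under assumptions, not the procedure actually used at CS: it assumes a POSIX filesystem, each release unpacked into its own directory, and a single symlink that the daemons are started through. The function name and directory names are hypothetical.

```python
import os
import tempfile


def switch_release(link_path, new_target):
    """Atomically repoint link_path at new_target; return the old target.

    The old release directory is never modified, so undoing the upgrade
    is just another call with the returned path.
    """
    old_target = os.readlink(link_path) if os.path.islink(link_path) else None
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted run
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)  # rename over the old link in one step
    return old_target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "condor-old")
        new = os.path.join(root, "condor-new")
        os.mkdir(old)
        os.mkdir(new)
        current = os.path.join(root, "condor")
        switch_release(current, old)              # initial install
        previous = switch_release(current, new)   # upgrade
        switch_release(current, previous)         # roll back
        print(os.readlink(current) == old)
```

Creating the new link under a temporary name and renaming it into place means there is no moment at which the path is missing, which matters when many machines read it from a shared filesystem.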

17
Incremental restart

First, restart CM
No jobs lost
Second, reboot the schedd
If the restart happens within 20 minutes, jobs keep running
What about the startds?
Might be OK for standard uni
Work on this coming soon

18
Standard Universe

More sensitive to backward compatibility
CheckpointPlatform clarifications
condor_qedit -constraint 'LastCheckpointPlatform =?= "LINUX INTEL 2.6.x normal"' \
LastCheckpointPlatform '"LINUX INTEL 2.6.x normal 0xffffe000"'
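
The quoting in the condor_qedit command above is easy to get wrong by hand, so here is a small Python sketch that builds the argument list for a given pair of platform strings. It only constructs and prints the command; it does not run it. The helper name is hypothetical, and the `=?=` comparison in the constraint is an assumption about what the slide originally showed.

```python
import shlex


def build_qedit_command(old_platform, new_platform):
    """Build a condor_qedit argv that relabels jobs whose
    LastCheckpointPlatform equals old_platform with new_platform."""
    # Assumption: the constraint uses the ClassAd "=?=" comparison.
    constraint = 'LastCheckpointPlatform =?= "{}"'.format(old_platform)
    new_value = '"{}"'.format(new_platform)  # ClassAd string literal
    return ["condor_qedit", "-constraint", constraint,
            "LastCheckpointPlatform", new_value]


if __name__ == "__main__":
    argv = build_qedit_command("LINUX INTEL 2.6.x normal",
                               "LINUX INTEL 2.6.x normal 0xffffe000")
    print(shlex.join(argv))  # review the output before running it by hand
```

Passing the result as a list to a process launcher, rather than pasting a string into a shell, avoids the nested single and double quotes entirely.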

19
Draining old Std Uni

Keep a few old startds around
To finish old standard uni jobs
Set START to JobUniverse == 1
Or maybe RANK
Only on the old platforms

20
When to upgrade?

Zeroth law of software engineering
Development series actually pretty stable
We'll let you know about security issues
You probably don't need every minor version
Don't be more than one major stable version behind

21
In summary

Keep config files under control
Test each component in isolation
Be aware of standard universe issues

22
Any questions?

Thank you!
