Title: Fair Share Scheduling
1Fair Share Scheduling
Ethan Bolker Mathematics Computer Science UMass
Boston eb_at_cs.umb.edu www.cs.umb.edu/eb Queens
University March 23, 2001
2References
- www.bmc.com/patrol/fairshare
- www/cs.umb.edu/eb/goalmode
3Coming Attractions
- Queueing theory primer
- Fair share semantics
- Priority scheduling conservation laws
- Predicting response times from shares
- analytic formula
- experimental validation
- applet simulation
- Implementation geometry
4Transaction Workload
- Stream of jobs visiting a server
(ATM, time shared CPU, printer, ) - Jobs queue when server is busy
- Input
- Arrival rate ? job/sec
- Service demand s sec/job
- Performance metrics
- server utilization u ?s (must be ? 1)
- response time r ??? sec/job (average)
- degradation d r/s
5Response time computations
- r, d measure queueing delay
- r ? s (d ? 1), unless parallel processing
possible - Randomness really matters
- r s (d 1) if arrivals scheduled (best case,
no waiting) - r gtgt s for bulk arrivals (worst case, maximum
delays) - Theorem. If arrivals are Poisson and service is
exponentially distributed (M/M/1) then - d 1/(1- u) r s/(1- u)
- Think virtual server with speed 1-u
6M/M/1
- Essential nonlinearity often counterintuitive
- at u 95 average degradation is 1/(1-0.95)
20, - but 1 customer in 20 has no wait at all (5 idle
time) - A useful guide even when hypotheses fail
- accurate enough (? 30) for real computer systems
- d depends only on u many small jobs have same
impact as few large jobs - faster system ? smaller s ? smaller u
r s/(1-u) ? double win less
service, less wait - waiting costly, server cheap (telephones) want u
? 0 - server costly (doctors) want u ? 1 but scheduled
7Scheduling for Performance
- Customers want good response times
- Decreasing u is expensive
- High end Unix offerings from HP, IBM, Sun offer
fair share scheduling packages that allow an
administrator to allocate scarce resources (CPU,
processes, bandwidth) among workloads - How do these packages behave?
- Model as a black box, independent of internals
- Limit study to CPU shares on a uniprocessor
8Multiple Job Streams
- Multiple workloads, utilizations u1, u2,
- U ? ui lt 1
- If no workload prioritization
then all degradations are equal
di 1/(1-U) - Share allocations are de facto prioritizations
- Study degradation vector V (d1, d2, )
9Share Semantics
- Suppose workload w has CPU share fw
- Normalize shares so that ?w fw 1
- w gets fraction fw of CPU time slices when at
least one of its jobs is ready for service - Can it use more if competing workloads idle?
- No think share cap
- Yes think share guarantee
10Shares As Caps
- Good for accounting (sell fraction of web server)
- Available now from IBM, HP, soon from Sun
- Straightforward (boring) - workloads are isolated
- Each runs on a virtual processor with speed f
share f
dedicated system
u/f need f gt u !
utilization u
response time r r(1 ? u)/(f
? u)
11Shares As Guarantees
- Good for performance economy
(use otherwise idle resources) - Shares make a difference only when there are
multiple workloads - Large share resembles high priority
share may be less than utilization - Workload interaction is subtle, often
unintuitive, hard to explain
12Modeling
13Modeling
- Real system
- Complex, dynamic, frequent state changes
- Hard to tease out cause and effect
- Model
- Static snapshot, deals in averages and
probabilities - Fast enlightening answers to what if questions
- Abstraction helps you understand real system
- Start with a study of priority scheduling
14Priority Scheduling
- Priority state order workloads by priority (ties
OK) - two workloads, 3 states 12, 21, 12
- three workloads, 13 states
- 123 (6 3! of these ordered states),
- 123 (3 of these),
- 123 (3 of these),
- 123 (1 state with no priorities)
- n wkls, f(n) states, n! ordered (simplex lock
combos) - p(s) prob( state s ) fraction of time in
state s - V(s) degradation vector when state s
(measure this, or compute it using queueing
theory) - V ?s p(s)V(s) (time avg is convex combination)
- Achievable region is convex hull of vectors V(s)
15Two workloads
d1 d2
d2
V(12) (wkl 1 high prio)
?
?
V(12) (no priorities)
achievable region
?
V(21)
d1
16Two workloads
d1 d2
d2
V(12) (wkl 1 high prio)
?
?
V(12) (no priorities)
?
0.5 V(12) 0.5V(21) ? V(12)
?
V(21)
d1
17Two workloads
d1 d2
d2
V(12) (wkl 1 high prio)
?
?
V(12) (no priorities)
note u1 lt u2 ? wkl 2 effect on wkl 1 large
?
V(21)
d1
18Conservation
- No Free Lunch Theorem. Weighted average
degradation is constant, independent of priority
scheduling scheme - ?i (ui /U) di 1/(1-U)
- Provable from some hypotheses
- Observable in some real systems
- Sometimes false shortest job first minimizes
average response time (printer queues,
supermarket express checkout lines)
19Conservation
- For any proper set A of workloads
- Imagine giving those workloads top priority.
- Then can pretend other wkls dont exist. In that
case - ?i ? A (ui /U(A)) di 1/(1-U(A))
- When wkls in A have lower priorities they have
- higher degradations, so in general
- ?i ? A (ui /U(A)) di ? 1/(1-U(A))
- These 2n -2 linear inequalities determine the
convex achievable region R - R is a permutahedron only n! vertices
20Two Workloads
conservation law (d1 , d2 ) lies on the line
d2 workload 2 degradation
u 1d1 u 2d2 1/(1-U)
d1 workload 1 degradation
21Two Workloads
d2 workload 2 degradation
constraint resulting from workload 1
d 1 ? 1/(1- u1 )
d1 workload 1 degradation
22Two Workloads
Workload 1 runs at high priority V(1,2) (1
/(1- u1 ), 1 /(1- u1 )(1-U) )
?
d2 workload 2 degradation
constraint resulting from workload 1
d1 ? 1 /(1- u1 )
d1 workload 1 degradation
23Two Workloads
?
d2 workload 2 degradation
u 1d1 u 2d2 1/(1-U)
d2 ? 1 /(1- u2 )
?
V(2,1)
d1 workload 1 degradation
24Two Workloads
?
V(1,2) (1 /(1- u1 ), 1 /(1- u1 )(1-U) )
d2 workload 2 degradation
achievable region R
u 1d1 u 2d2 1/(1-U)
d2 ? 1 /(1- u2 )
?
V(2,1)
d1 ? 1 /(1- u1 )
d1 workload 1 degradation
25Three Workloads
- Degradation vector (d1,d2, d3) lies on plane
u1 d1 u2 d2 u3dr3 C - We know a constraint for each workload w
uw dw ? Cw - Conservation applies to each pair of wkls as
well u1 d1 u2 d2 ? C12 - Achievable region has one vertex for each
priority ordering of workloads 3! 6 in all - Hence its name the permutahedron
26Three Workload Permutahedron
3! 6 vertices (priority orders) 23 - 2 6
edges (conservation constraints)
u1 r1 u2 d2 u3 d3 C
d3
V(1,2,3)
?
?
V(2,1,3)
d2
d1
27Experimental evidence
28Four workload permutahedron
4! 24 vertices (ordered states) 24 - 2 14
facets (proper subsets) (conservation
constraints) 74 faces (states)
Simplicial geometry and transportation
polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
29Map shares to degradations- two workloads -
- Suppose f1 and f2 gt 0 , f1 f2 1
- Model System operates in state
- 12 with probability f1
- 21 with probability f2
- (independent of who is on queue)
- Average degradation vector
- V f1 V(12) f2 V(21)
30Predict Degradations From Shares(Two Workloads)
- Reasonable modeling assumption f1 1, f2 0
means workload 1 runs at high priority - For arbitrary shares workload priority order is
- (1,2) with probability f1
- (2,1) with probability f2
(probability fraction of time) - Compute average workload degradation
d1 f1 ? (wkl 1 degradation at high
priority) f2 ? (wkl
1 degradation at low priority )
31Model validation
32Model validation
33Map shares to degradations- three (n) workloads -
- f1
f2 f3 - prob(123) ------------------------------
- (f1 f2 f3) (f2 f3)
(f3) - Theorem These n! probabilities sum to 1
- interesting identity generalizing adding
fractions - prove by induction, or by coupon collecting
- V ?ordered states s prob(s) V(s)
- O(n!), ?(n!), good enough for n ? 9 (12)
34Model validation
35Model validation
36The Fair Share Applet
- Screen captures on next slides are from
www.bmc.com/patrol/fairshare - Experiment with what if fair share modeling
- Watch a simulation
- Random virtual job generator for the simulation
is the same one used to generate random real jobs
for our benchmark studies
37Three Transaction Workloads
???
- Three workloads, each with utilization
0.32 jobs/second ? 1.0
seconds/job 0.32 32 - CPU 96 busy, so average (conserved) response
time is 1.0/(1?0.96) 25 seconds - Individual workload average response times depend
on shares
???
???
38Three Transaction Workloads
- Normalized f3 0.20 means 20 of the time
workload 3 (development) would be dispatched at
highest priority
- During that time, workload priority order is
(3,1,2) for 32/80 of the time, (3,2,1) for 48/80 - Probability( priority order is 312 )
0.20?(32/80) 0.08
39Three Transaction Workloads
- Formulas on previous slide
- Average predicted response time weighted by
throughput 25 seconds (as expected) - Hard to understand intuitively
- Software helps
40Three Transaction Workloads
note change from 32
41Simulation
42When the Model Fails
- Real CPU uses round robin scheduling to deliver
time slices - Short jobs never wait for long jobs to complete
- That resembles shortest job first, so response
time conservation law fails - At high utilization, simulation shows smaller
response times than predicted by model - Response time conservation law yields
conservative predictions
43Scaling Degradation Predictions
- V ?ordered states s prob(s) V(s)
- Each s is a permutation of (1,2, , n)
- Think of it as a vector in n-space
- Those n! vectors lie on of a sphere
- For n large they are pretty densely packed
- Think of prob(s) as a discrete approximation to a
probability distribution on the sphere - V is an integral
44Monte Carlo
- loop sampleSize times
- choose a permutation s at random from the
distribution determined by the
shares - compute degradation vector V(s)
- accumulate V prob(s)V(s)
- sampleSize 40000 works well independent of n!
45Map shares to degradations(geometry)
- Interpret shares as barycentric coordinates in
the n-1 simplex - Study the geometry of the map from the simplex to
the n-1 dimensional permutahedron - Easy when n2 each is a line segment and map is
linear
46Mapping a triangle to a hexagon
f3 1
f1 0
312
132
f1 1
?
M
f3 0
321
123
wkl 1 high priority
213
231
wkl 1 low priority
47Mapping a triangle to a hexagon
f1 0
f1 1
?
23
48Mapping a triangle to a hexagon
49What This Means
- Add a strong statement that summarizes how you
feel or think about this topic - Summarize key points you want your audience to
remember