Fair Share Scheduling - PowerPoint PPT Presentation

About This Presentation

Title:

Fair Share Scheduling

Description:

If arrivals are Poisson and service is exponentially distributed (M/M/1) then ... M/M/1. Essential nonlinearity often counterintuitive ... – PowerPoint PPT presentation

Number of Views:887

Avg rating:3.0/5.0

Slides: 50

Provided by: ethanb8

Learn more at: https://www.cs.umb.edu

Category:

more less

Transcript and Presenter's Notes

Title: Fair Share Scheduling

1
Fair Share Scheduling
Ethan Bolker Mathematics Computer Science UMass
Boston eb_at_cs.umb.edu www.cs.umb.edu/eb Queens
University March 23, 2001
2
References

www.bmc.com/patrol/fairshare
www/cs.umb.edu/eb/goalmode

3
Coming Attractions

Queueing theory primer
Fair share semantics
Priority scheduling conservation laws
Predicting response times from shares
analytic formula
experimental validation
applet simulation
Implementation geometry

4
Transaction Workload

Stream of jobs visiting a server
(ATM, time shared CPU, printer, )
Jobs queue when server is busy
Input
Arrival rate ? job/sec
Service demand s sec/job
Performance metrics
server utilization u ?s (must be ? 1)
response time r ??? sec/job (average)
degradation d r/s

5
Response time computations

r, d measure queueing delay
r ? s (d ? 1), unless parallel processing
possible
Randomness really matters
r s (d 1) if arrivals scheduled (best case,
no waiting)
r gtgt s for bulk arrivals (worst case, maximum
delays)
Theorem. If arrivals are Poisson and service is
exponentially distributed (M/M/1) then
d 1/(1- u) r s/(1- u)
Think virtual server with speed 1-u

6
M/M/1

Essential nonlinearity often counterintuitive
at u 95 average degradation is 1/(1-0.95)
20,
but 1 customer in 20 has no wait at all (5 idle
time)
A useful guide even when hypotheses fail
accurate enough (? 30) for real computer systems
d depends only on u many small jobs have same
impact as few large jobs
faster system ? smaller s ? smaller u
r s/(1-u) ? double win less
service, less wait
waiting costly, server cheap (telephones) want u
? 0
server costly (doctors) want u ? 1 but scheduled

7
Scheduling for Performance

Customers want good response times
Decreasing u is expensive
High end Unix offerings from HP, IBM, Sun offer
fair share scheduling packages that allow an
administrator to allocate scarce resources (CPU,
processes, bandwidth) among workloads
How do these packages behave?
Model as a black box, independent of internals
Limit study to CPU shares on a uniprocessor

8
Multiple Job Streams

Multiple workloads, utilizations u1, u2,
U ? ui lt 1
If no workload prioritization
then all degradations are equal
di 1/(1-U)
Share allocations are de facto prioritizations
Study degradation vector V (d1, d2, )

9
Share Semantics

Suppose workload w has CPU share fw
Normalize shares so that ?w fw 1
w gets fraction fw of CPU time slices when at
least one of its jobs is ready for service
Can it use more if competing workloads idle?
No think share cap
Yes think share guarantee

10
Shares As Caps

Good for accounting (sell fraction of web server)
Available now from IBM, HP, soon from Sun
Straightforward (boring) - workloads are isolated
Each runs on a virtual processor with speed f

share f
dedicated system
u/f need f gt u !
utilization u
response time r r(1 ? u)/(f
? u)
11
Shares As Guarantees

Good for performance economy
(use otherwise idle resources)
Shares make a difference only when there are
multiple workloads
Large share resembles high priority
share may be less than utilization
Workload interaction is subtle, often
unintuitive, hard to explain

12
Modeling
13
Modeling

Real system
Complex, dynamic, frequent state changes
Hard to tease out cause and effect
Model
Static snapshot, deals in averages and
probabilities
Fast enlightening answers to what if questions
Abstraction helps you understand real system
Start with a study of priority scheduling

14
Priority Scheduling

Priority state order workloads by priority (ties
OK)
two workloads, 3 states 12, 21, 12
three workloads, 13 states
123 (6 3! of these ordered states),
123 (3 of these),
123 (3 of these),
123 (1 state with no priorities)
n wkls, f(n) states, n! ordered (simplex lock
combos)
p(s) prob( state s ) fraction of time in
state s
V(s) degradation vector when state s
(measure this, or compute it using queueing
theory)
V ?s p(s)V(s) (time avg is convex combination)
Achievable region is convex hull of vectors V(s)

15
Two workloads
d1 d2
d2
V(12) (wkl 1 high prio)
?
?
V(12) (no priorities)
achievable region
?
V(21)
d1
16
Two workloads
d1 d2
d2
V(12) (wkl 1 high prio)
?
?
V(12) (no priorities)
?
0.5 V(12) 0.5V(21) ? V(12)
?
V(21)
d1
17
Two workloads
d1 d2
d2
V(12) (wkl 1 high prio)
?
?
V(12) (no priorities)
note u1 lt u2 ? wkl 2 effect on wkl 1 large
?
V(21)
d1
18
Conservation

No Free Lunch Theorem. Weighted average
degradation is constant, independent of priority
scheduling scheme
?i (ui /U) di 1/(1-U)
Provable from some hypotheses
Observable in some real systems
Sometimes false shortest job first minimizes
average response time (printer queues,
supermarket express checkout lines)

19
Conservation

For any proper set A of workloads
Imagine giving those workloads top priority.
Then can pretend other wkls dont exist. In that
case
?i ? A (ui /U(A)) di 1/(1-U(A))
When wkls in A have lower priorities they have
higher degradations, so in general
?i ? A (ui /U(A)) di ? 1/(1-U(A))
These 2n -2 linear inequalities determine the
convex achievable region R
R is a permutahedron only n! vertices

20
Two Workloads
conservation law (d1 , d2 ) lies on the line
d2 workload 2 degradation
u 1d1 u 2d2 1/(1-U)
d1 workload 1 degradation
21
Two Workloads
d2 workload 2 degradation
constraint resulting from workload 1
d 1 ? 1/(1- u1 )
d1 workload 1 degradation
22
Two Workloads
Workload 1 runs at high priority V(1,2) (1
/(1- u1 ), 1 /(1- u1 )(1-U) )
?
d2 workload 2 degradation
constraint resulting from workload 1
d1 ? 1 /(1- u1 )
d1 workload 1 degradation
23
Two Workloads
?
d2 workload 2 degradation
u 1d1 u 2d2 1/(1-U)
d2 ? 1 /(1- u2 )
?
V(2,1)
d1 workload 1 degradation
24
Two Workloads
?
V(1,2) (1 /(1- u1 ), 1 /(1- u1 )(1-U) )
d2 workload 2 degradation
achievable region R
u 1d1 u 2d2 1/(1-U)
d2 ? 1 /(1- u2 )
?
V(2,1)
d1 ? 1 /(1- u1 )
d1 workload 1 degradation
25
Three Workloads

Degradation vector (d1,d2, d3) lies on plane
u1 d1 u2 d2 u3dr3 C
We know a constraint for each workload w
uw dw ? Cw
Conservation applies to each pair of wkls as
well u1 d1 u2 d2 ? C12
Achievable region has one vertex for each
priority ordering of workloads 3! 6 in all
Hence its name the permutahedron

26
Three Workload Permutahedron
3! 6 vertices (priority orders) 23 - 2 6
edges (conservation constraints)
u1 r1 u2 d2 u3 d3 C
d3
V(1,2,3)
?
?
V(2,1,3)
d2
d1
27
Experimental evidence
28
Four workload permutahedron
4! 24 vertices (ordered states) 24 - 2 14
facets (proper subsets) (conservation
constraints) 74 faces (states)
Simplicial geometry and transportation
polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
29
Map shares to degradations- two workloads -

Suppose f1 and f2 gt 0 , f1 f2 1
Model System operates in state
12 with probability f1
21 with probability f2
(independent of who is on queue)
Average degradation vector
V f1 V(12) f2 V(21)

30
Predict Degradations From Shares(Two Workloads)

Reasonable modeling assumption f1 1, f2 0
means workload 1 runs at high priority
For arbitrary shares workload priority order is
(1,2) with probability f1
(2,1) with probability f2

(probability fraction of time)
Compute average workload degradation
d1 f1 ? (wkl 1 degradation at high
priority) f2 ? (wkl
1 degradation at low priority )

31
Model validation
32
Model validation
33
Map shares to degradations- three (n) workloads -

f1
f2 f3
prob(123) ------------------------------
(f1 f2 f3) (f2 f3)
(f3)
Theorem These n! probabilities sum to 1
interesting identity generalizing adding
fractions
prove by induction, or by coupon collecting
V ?ordered states s prob(s) V(s)
O(n!), ?(n!), good enough for n ? 9 (12)

34
Model validation
35
Model validation
36
The Fair Share Applet

Screen captures on next slides are from
www.bmc.com/patrol/fairshare
Experiment with what if fair share modeling
Watch a simulation
Random virtual job generator for the simulation
is the same one used to generate random real jobs
for our benchmark studies

37
Three Transaction Workloads
???

Three workloads, each with utilization
0.32 jobs/second ? 1.0
seconds/job 0.32 32
CPU 96 busy, so average (conserved) response
time is 1.0/(1?0.96) 25 seconds
Individual workload average response times depend
on shares

???
???
38
Three Transaction Workloads

Normalized f3 0.20 means 20 of the time
workload 3 (development) would be dispatched at
highest priority

During that time, workload priority order is
(3,1,2) for 32/80 of the time, (3,2,1) for 48/80
Probability( priority order is 312 )
0.20?(32/80) 0.08

39
Three Transaction Workloads

Formulas on previous slide
Average predicted response time weighted by
throughput 25 seconds (as expected)
Hard to understand intuitively
Software helps

40
Three Transaction Workloads
note change from 32
41
Simulation
42
When the Model Fails

Real CPU uses round robin scheduling to deliver
time slices
Short jobs never wait for long jobs to complete
That resembles shortest job first, so response
time conservation law fails
At high utilization, simulation shows smaller
response times than predicted by model
Response time conservation law yields
conservative predictions

43
Scaling Degradation Predictions

V ?ordered states s prob(s) V(s)
Each s is a permutation of (1,2, , n)
Think of it as a vector in n-space
Those n! vectors lie on of a sphere
For n large they are pretty densely packed
Think of prob(s) as a discrete approximation to a
probability distribution on the sphere
V is an integral

44
Monte Carlo

loop sampleSize times
choose a permutation s at random from the
distribution determined by the
shares
compute degradation vector V(s)
accumulate V prob(s)V(s)
sampleSize 40000 works well independent of n!

45
Map shares to degradations(geometry)

Interpret shares as barycentric coordinates in
the n-1 simplex
Study the geometry of the map from the simplex to
the n-1 dimensional permutahedron
Easy when n2 each is a line segment and map is
linear

46
Mapping a triangle to a hexagon
f3 1
f1 0
312
132
f1 1
?
M
f3 0
321
123
wkl 1 high priority
213
231
wkl 1 low priority
47
Mapping a triangle to a hexagon
f1 0
f1 1
?
23
48
Mapping a triangle to a hexagon
49
What This Means