Title: Influence of heavy-tailed distributions on load balancing
1. Mor Harchol-Balter, Computer Science Dept., CMU
2. Definition: Queueing Theory
The study of queues, congestion, resource management, and stochastic (probabilistic) modeling.
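3. TERMINOLOGY WARMUP
[Figure: incoming jobs arrive at a single-server queue]
λ = avg. rate at which jobs arrive (jobs/sec)
μ = avg. rate at which jobs are served (server speed)
ρ = λ/μ < 1 (server load; must stay below 1 for the queue to be stable)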
Examples of single-server queues:
- Router
- Supercomputing center
- A database lock queue
- Web server
4. Q1: TERMINOLOGY WARMUP
QUESTION 1: Suppose λ → 2λ (the arrival rate doubles).
We want to keep E[T] (mean response time) unchanged. Should we:
(a) Double the service rate?
(b) More than double the service rate?
(c) Less than double the service rate?
Impact: Be careful not to overprovision!
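For Question 1, a worked example helps if we assume the simplest model, an M/M/1 queue (Poisson arrivals, exponential job sizes); the slide does not fix the model, so treat this as a sketch under that assumption. In M/M/1, E[T] = 1/(μ - λ), so keeping E[T] fixed when λ → 2λ requires μ' - 2λ = μ - λ, i.e. μ' = μ + λ, which is less than 2μ because λ < μ. The function names below are illustrative only.

```python
def mm1_mean_response(lam, mu):
    """Mean response time E[T] = 1/(mu - lam) of an M/M/1 queue."""
    if lam >= mu:
        raise ValueError("unstable queue: need lam < mu")
    return 1.0 / (mu - lam)


def service_rate_for_same_et(lam, mu, factor=2.0):
    """Service rate that keeps M/M/1 E[T] fixed when lam is scaled by `factor`.

    Solves 1/(mu_new - factor*lam) = 1/(mu - lam), giving
    mu_new = mu + (factor - 1) * lam.
    """
    return mu + (factor - 1.0) * lam


lam, mu = 0.9, 1.0  # load rho = 0.9 before the arrival rate doubles
mu_new = service_rate_for_same_et(lam, mu)
print(f"new service rate = {mu_new:.2f} ({mu_new / mu:.2f}x the old rate)")
# new service rate = 1.90 (1.90x the old rate)
print(mm1_mean_response(lam, mu), mm1_mean_response(2 * lam, mu_new))  # equal up to rounding
```

Even at load 0.9 only 1.9x the service rate is needed, and the lower the starting load, the closer the required speedup gets to 1x.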
5. Workload Distribution Warmup
Huge variability
Heavy tail: the largest 1% of jobs comprise half the load
Heavy tails are everywhere in CS:
- CPU lifetimes of UNIX jobs [Harchol-Balter, Downey 96]
- Supercomputing job sizes [Harchol-Balter, Schroeder 00]
- Web file sizes [Crovella, Bestavros 98; Barford, Crovella 98]
- Internet node degree [Faloutsos, Faloutsos, Faloutsos 99]
- IP flow durations [Rexford 99]
- Self-similar arrival processes [Willinger 93]
- Many, many more...
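The "largest 1% carry half the load" property can be checked against a model. Assuming an unbounded Pareto tail Pr{X > x} = (x_m/x)^α with α > 1 (a modeling choice, not taken from the slides), the largest fraction p of jobs carries a fraction p^(1 - 1/α) of the total work, independent of the scale x_m; for an exponential distribution the same share is p(1 - ln p). The sketch solves for the α at which the top 1% carries half the load and contrasts it with the exponential case.

```python
import math


def pareto_top_load_share(p, alpha):
    """Share of total work carried by the largest fraction p of jobs when
    Pr{X > x} = (x_m / x)**alpha; equals p**(1 - 1/alpha), independent of x_m."""
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    return p ** (1.0 - 1.0 / alpha)


def exponential_top_load_share(p):
    """Same share for an exponential distribution: p * (1 - ln p)."""
    return p * (1.0 - math.log(p))


# Solve p**(1 - 1/alpha) = 0.5 for p = 0.01.
alpha_half = 1.0 / (1.0 - math.log(0.5) / math.log(0.01))
print(f"alpha = {alpha_half:.2f}")                                          # alpha = 1.18
print(f"Pareto top 1%:      {pareto_top_load_share(0.01, alpha_half):.2f}")  # 0.50
print(f"Exponential top 1%: {exponential_top_load_share(0.01):.3f}")         # 0.056
```

Under the exponential model the largest 1% of jobs carry only about 5.6% of the load, which is what "huge variability" means by contrast.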
6. Q2: Exponential Distribution
[Figure: exponential workload vs. heavy-tailed workload; the heavy-tailed workload has huge variability]
QUESTION 2: Under exponentially distributed job demands, which scheduling policy wins for E[T]: FCFS or PS?
7. Q3: Heavy-tailed workload
QUESTION 3: Under heavy-tailed job demands, which scheduling policy wins for E[T]: FCFS or PS?
Impact: Know your workload before you choose the scheduling policy.
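Questions 2 and 3 can be checked with two standard M/G/1 results (mean job size E[S], load ρ = λE[S] < 1): for FCFS, the Pollaczek-Khinchine formula E[T] = E[S] + λE[S²]/(2(1 - ρ)); for PS, E[T] = E[S]/(1 - ρ), which does not depend on variability at all. In this sketch variability enters through the squared coefficient of variation C², with E[S²] = (1 + C²)E[S]²; C² = 1 is the exponential case. Treating "heavy-tailed" as simply "large C²" is a simplification for illustration: a Pareto tail with α < 2 has infinite E[S²], which makes FCFS's E[T] unbounded.

```python
def mg1_fcfs_mean_response(lam, mean_s, scv):
    """M/G/1 FCFS mean response time (Pollaczek-Khinchine formula).

    scv is C^2 = Var[S] / E[S]^2, so E[S^2] = (1 + C^2) * E[S]^2.
    """
    rho = lam * mean_s
    if rho >= 1:
        raise ValueError("unstable queue: need rho < 1")
    second_moment = (1 + scv) * mean_s ** 2
    return mean_s + lam * second_moment / (2 * (1 - rho))


def mg1_ps_mean_response(lam, mean_s):
    """M/G/1 PS mean response time E[S] / (1 - rho); insensitive to C^2."""
    rho = lam * mean_s
    if rho >= 1:
        raise ValueError("unstable queue: need rho < 1")
    return mean_s / (1 - rho)


lam, mean_s = 0.8, 1.0  # load rho = 0.8
for scv in (1, 10, 100):  # C^2 = 1 is exponential; larger means more variable
    fcfs = mg1_fcfs_mean_response(lam, mean_s, scv)
    ps = mg1_ps_mean_response(lam, mean_s)
    print(f"C^2 = {scv:>3}: FCFS E[T] = {fcfs:6.1f}, PS E[T] = {ps:.1f}")
```

With C² = 1 the two policies tie at E[T] = 5; at C² = 100, FCFS's E[T] is about 40 times that of PS, because short jobs get stuck behind long ones.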
8. Q4: Scheduling to minimize E[T]
QUESTION 4: Under heavy-tailed job demands, in the M/G/1 queue, order these scheduling policies from high E[T] to low E[T]:
FCFS
PS
SJF
SRPT
RANDOM
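9. Scheduling to minimize E[T]
Answer: Under heavy-tailed job demands, in the M/G/1 queue, from high E[T] to low E[T]:
RANDOM = FCFS > SJF > PS > SRPT
(RANDOM and FCFS tie because, in M/G/1, all non-preemptive policies that ignore job size have the same mean response time.)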
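Part of this ordering can be explored with a small event-driven simulator. This is a minimal sketch covering only FCFS, non-preemptive SJF, and preemptive SRPT (PS and RANDOM are omitted for brevity), run on a hypothetical two-job trace rather than a heavy-tailed workload: a big job is already in service when a small one arrives, so only SRPT can let the small job through. The function names are illustrative.

```python
import math


def pick_index(waiting, policy):
    """Index of the next job to serve; entries of `waiting` are [arrival, remaining]."""
    if policy == "FCFS":
        key = lambda k: waiting[k][0]   # earliest arrival first
    else:
        key = lambda k: waiting[k][1]   # SJF and SRPT: least remaining work first
    return min(range(len(waiting)), key=key)


def mean_response_time(jobs, policy):
    """Event-driven single-server simulation.

    jobs: list of (arrival_time, size); policy: "FCFS", "SJF" (non-preemptive)
    or "SRPT" (preemptive). Returns the mean response time over all jobs.
    """
    if policy not in ("FCFS", "SJF", "SRPT"):
        raise ValueError(f"unknown policy {policy!r}")
    jobs = sorted(jobs)
    n = len(jobs)
    t, i, done, total = 0.0, 0, 0, 0.0
    waiting, running = [], None
    while done < n:
        # Admit every job that has arrived by time t.
        while i < n and jobs[i][0] <= t:
            waiting.append([jobs[i][0], jobs[i][1]])
            i += 1
        # Non-preemptive policies choose only when idle; SRPT re-chooses at every event.
        if running is None or policy == "SRPT":
            if running is not None:
                waiting.append(running)
            if not waiting:
                t = jobs[i][0]          # server idles until the next arrival
                continue
            running = waiting.pop(pick_index(waiting, policy))
        next_arrival = jobs[i][0] if i < n else math.inf
        finish = t + running[1]
        if policy == "SRPT" and next_arrival < finish:
            running[1] -= next_arrival - t  # serve up to the arrival, then re-decide
            t = next_arrival
        else:
            t = finish                      # job runs to completion
            total += t - running[0]
            done += 1
            running = None
    return total / n


trace = [(0.0, 10.0), (1.0, 1.0)]  # big job first, small job arrives at t = 1
for policy in ("FCFS", "SJF", "SRPT"):
    print(policy, mean_response_time(trace, policy))
# FCFS 10.0, SJF 10.0, SRPT 6.0
```

SJF cannot help here because it never interrupts the big job; feeding the simulator a long trace of heavy-tailed sizes is the natural next step toward the full comparison.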
10. Single-server questions
11. Growing trend towards server farms
Server farms: cheap, easy to scale
[Figure: jobs arrive at rate λ to a dispatcher, which sends each job to one of several servers]
12. Growing trend towards server farms
Supercomputing/Manufacturing: a router dispatches jobs to FCFS servers
- Jobs non-preemptible
- Run-to-completion
- Served in FCFS order
- Often variable job size
Web server farm: a router dispatches HTTP requests to PS servers
- HTTP requests fully preemptible
- Commodity PS servers
- Highly variable job size
- Examples:
  - Cisco Local Director
  - IBM Network Dispatcher
  - Microsoft SharePoint, etc.
13. Q5: 1 Fast versus Many Slow?
QUESTION 5: Which has lower E[T] for a heavy-tailed workload?
[Figure: one fast FCFS server vs. several slow FCFS servers, with arrivals at rate λ split by a dispatcher]
Dispatch policy: RANDOM
14. Q5: 1 Fast versus Many Slow?
Under Least-Work-Left dispatch: multiple servers are way better under a variable workload.
[Figure: optimal number of servers (1, 2, 3) as a function of job-size variability and load ρ]
Wierman, Osogami, Harchol-Balter, Scheller-Wolf, Perf. Eval. 06
15. Q5: 1 Fast versus Many Slow?
Under Size-Based dispatch: multiple servers are way better under a high-variability workload.
Harchol-Balter, Crovella, Murta, J. Parallel Distrib. Comput. 99
16. Q5: 1 Fast versus Many Slow?
Under Size-Based dispatch (small jobs to one server, big jobs to another), and even under Unknown-Size dispatch, multiple servers are way better under a variable workload.
Harchol-Balter, J. ACM 02
17. Q5: 1 Fast versus Many Slow?
Impact: The best architecture can be cheaper!
18. Q6: Which routing policy is best?
A: Supercomputing/Manufacturing (FCFS hosts)
B: Web Server Farm
[Figure: Poisson arrivals of heavy-tailed, highly variable jobs pass through a router to a set of hosts]
Candidate routing policies:
- Least-Work-Left: go to the host with the least total work.
- Random: pick each host with equal probability.
- Size-Based Splitting: jobs are split up by size.
- Join-Shortest-Queue: go to the host with the fewest jobs.
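The four routing rules can be written as small functions. This is a sketch under a hypothetical representation: each host is a list of the remaining sizes of the jobs at it, and the Size-Based cutoffs are given in increasing order. Note that Least-Work-Left and Size-Based Splitting need to know job sizes, while Random and Join-Shortest-Queue do not.

```python
import random


def route_random(hosts, rng):
    """Random: every host is chosen with equal probability."""
    return rng.randrange(len(hosts))


def route_jsq(hosts):
    """Join-Shortest-Queue: the host holding the fewest jobs."""
    return min(range(len(hosts)), key=lambda h: len(hosts[h]))


def route_lwl(hosts):
    """Least-Work-Left: the host with the least total remaining work."""
    return min(range(len(hosts)), key=lambda h: sum(hosts[h]))


def route_size_based(job_size, cutoffs):
    """Size-Based Splitting: host h takes sizes in (cutoffs[h-1], cutoffs[h]];
    jobs larger than every cutoff go to the last host, index len(cutoffs)."""
    for h, cutoff in enumerate(cutoffs):
        if job_size <= cutoff:
            return h
    return len(cutoffs)


# Host 0 holds one big job; host 1 holds two small ones.
hosts = [[9.0], [1.0, 1.0]]
print(route_jsq(hosts))                      # 0: fewer jobs
print(route_lwl(hosts))                      # 1: less work (2.0 < 9.0)
print(route_size_based(50.0, [1.0, 100.0]))  # 1: falls in (1, 100]
print(route_random(hosts, random.Random(0)))
```

The toy state shows Join-Shortest-Queue and Least-Work-Left disagreeing: counting jobs sends the new arrival behind the big job, while counting work does not.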
19. Q6: Which routing policy is best?
A: Supercomputing/Manufacturing (Poisson arrivals; heavy-tailed, highly variable jobs; router; FCFS hosts)
Answer to A, from high E[T] to low E[T]:
1. Random
2. Join-Shortest-Queue
3. Least-Work-Left
4. Size-Based (best!)
Harchol-Balter, Crovella, Murta, JPDC 99
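Why Size-Based routing beats Random for FCFS hosts can be seen analytically in a toy case. Assuming two unit-speed FCFS hosts, Poisson arrivals, and a hypothetical two-point size distribution (not from the slides), Random routing gives each host an M/G/1 queue with the full size mix, while Size-Based routing gives each host a low-variability stream; both are evaluated with the Pollaczek-Khinchine formula. Join-Shortest-Queue and Least-Work-Left depend on the queue state and would need simulation, so they are left out.

```python
def pk_fcfs(lam, es, es2):
    """M/G/1 FCFS mean response time via Pollaczek-Khinchine."""
    rho = lam * es
    if rho >= 1:
        raise ValueError("unstable host: need rho < 1")
    return es + lam * es2 / (2 * (1 - rho))


# Hypothetical two-point workload: 99% of jobs have size 1, 1% have size 100.
p_small, small, big = 0.99, 1.0, 100.0
lam = 0.8  # total arrival rate into 2 unit-speed FCFS hosts (load ~0.8 per host)

# Random: each host sees rate lam/2 and the full mix of sizes.
es = p_small * small + (1 - p_small) * big
es2 = p_small * small ** 2 + (1 - p_small) * big ** 2
t_random = pk_fcfs(lam / 2, es, es2)

# Size-Based: host 0 serves only small jobs, host 1 only big jobs.
t_small = pk_fcfs(lam * p_small, small, small ** 2)
t_big = pk_fcfs(lam * (1 - p_small), big, big ** 2)
t_size = p_small * t_small + (1 - p_small) * t_big

print(f"Random:     E[T] = {t_random:.1f}")  # about 101.0
print(f"Size-Based: E[T] = {t_size:.1f}")    # about 5.9
```

Both hosts run at roughly the same load under either policy; the gain comes entirely from small jobs no longer queueing behind big ones.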
20. Single-server questions
21. N-sharing model
Cycle stealing: the Donor helps the Beneficiary with her work when he's free.
But we can do better with threshold policies.
Studied by S. Bell, R. Williams, M. Harrison, M. Lopez, M. Squillante, C. Xia, D. Yao, L. Zhang, R. Shumsky, L. Green, S. Meyn, A. Ahn, D. Stanford, W. Grassmann, and others.
22. Q9: Who gets control, man or woman?
QUESTION 9: Who should have control: Dan (donor) or Betty (beneficiary)?
23. Q9: Who gets control, man or woman?
The analysis is difficult because the underlying Markov chain is infinite in two dimensions. We introduce Markov-based Dimensionality Reduction.
Harchol-Balter, Osogami, Scheller-Wolf: SPAA 03, SIGMETRICS 03, Allerton 04, QUESTA 05, Perf. Eval. 06
24. Q9: Who gets control, man or woman?
Answer: Mean response time E[T] is minimized when the woman (Betty, the beneficiary) controls!
25. Q10: Which policy is more robust?
QUESTION 10: We want a policy that is robust against mis-estimation of the load.
26. Q10: Which policy is more robust?
Answer: Donor control helps, but even better is to let the Beneficiary have 2 thresholds, where the Donor controls which threshold is used.
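The threshold idea can be sketched as decision rules. This is one simplified reading, with hypothetical function names and a hypothetical switching point `donor_switch`: under a single Beneficiary threshold T_B, the Donor works on the Beneficiary's jobs only once her queue reaches T_B; under a dual threshold, the Beneficiary owns two thresholds and the Donor's own queue length decides which one is in force. The values 6 and 20 echo the T_B = 6 (opt) and T_B = 20 (robust) settings in the results; the policies analyzed in the cited papers may differ in detail.

```python
def donor_helps_single(benef_queue, t_b):
    """Single threshold: the Donor helps once the Beneficiary has >= t_b jobs."""
    return benef_queue >= t_b


def donor_helps_adt(donor_queue, benef_queue, t_low, t_high, donor_switch):
    """Adaptive dual threshold (simplified sketch).

    While the Donor's own queue is short (< donor_switch), the Beneficiary's
    lower threshold t_low applies; once it is long, the higher threshold
    t_high applies, so the Donor helps only when she is badly backed up.
    """
    active = t_low if donor_queue < donor_switch else t_high
    return benef_queue >= active


# Hypothetical settings: t_low = 6, t_high = 20, switch at 3 Donor jobs.
print(donor_helps_adt(0, 8, 6, 20, 3))   # True: Donor lightly loaded, 8 >= 6
print(donor_helps_adt(5, 8, 6, 20, 3))   # False: Donor busy, 8 < 20
print(donor_helps_adt(5, 25, 6, 20, 3))  # True: 25 >= 20
```

In this sketch, a mis-estimated Donor load shows up as a longer Donor queue, which switches the active threshold without re-tuning.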
27. Results: Adaptive Dual Threshold (ADT) policy
[Figure: mean response time vs. Dan's load, comparing single-threshold policies T_B = 6 (opt) and T_B = 20 (robust) against ADT]
ADT meets both goals.
Impact: Robustness is as important as efficiency.
28. Conclusion
We've covered many themes in system design.
29. If you want to know more
Come take my class!
30. BACKUP
31. Q7: To balance or not to balance?
[Figure: size-based dispatch to four hosts serving S, M, L, and XL jobs; the size cutoffs between them are unknown]
QUESTION 7: How to choose the size cutoffs?
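32. To Balance or Not to Balance?
[Figure: jobs split by size x between two FCFS hosts, one for small (S) jobs and one for large (L) jobs]
Answer: Recent research on heavy-tailed workloads with Pr{Job size > x} ~ x^(-α):
- α < 1: UNBALANCE load (favor smalls)
- α = 1: BALANCE LOAD
- α > 1: UNBALANCE load (favor larges)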
Impact: We may want to rethink all those load-balancing policies.
Harchol-Balter, Vesilo 06; Glynn, Harchol-Balter, Ramanan 06
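Choosing a cutoff can be explored numerically. A sketch under stated assumptions: two unit-speed FCFS hosts, Poisson arrivals, and job sizes drawn from a Bounded Pareto B(k, p, α), a common way to model a heavy tail while keeping all moments finite; the parameter values are hypothetical. It evaluates E[T] for any cutoff by applying Pollaczek-Khinchine to each host and scans a log-spaced grid. Comparing the best cutoff's host-0 load with 0.5 shows whether the optimum balances or unbalances load for the chosen α; it does not reproduce the cited papers' analysis.

```python
import math


def bp_partial_moment(j, a, b, k, p, alpha):
    """Integral of x**j f(x) over [a, b] for a Bounded Pareto B(k, p, alpha),
    f(x) = alpha * k**alpha * x**(-alpha - 1) / (1 - (k/p)**alpha) on [k, p].
    j = 0 gives a probability; j = 1 and j = 2 give partial moments."""
    c = alpha * k ** alpha / (1.0 - (k / p) ** alpha)
    e = j - alpha
    if abs(e) < 1e-12:                 # x**-1 integrates to a logarithm
        return c * math.log(b / a)
    return c * (b ** e - a ** e) / e


def sita_mean_response(lam, cutoff, k, p, alpha):
    """Mean response time of a 2-host size-based farm: jobs <= cutoff go to
    host 0, larger jobs to host 1; each host is a unit-speed FCFS M/G/1 queue."""
    total = 0.0
    for a, b in ((k, cutoff), (cutoff, p)):
        prob = bp_partial_moment(0, a, b, k, p, alpha)
        if prob <= 0.0:
            continue                   # this host receives no jobs
        m1 = bp_partial_moment(1, a, b, k, p, alpha)
        m2 = bp_partial_moment(2, a, b, k, p, alpha)
        rho = lam * m1                 # host rate (lam*prob) times E[S | host]
        if rho >= 1.0:
            return math.inf            # host overloaded
        es = m1 / prob
        wait = lam * m2 / (2.0 * (1.0 - rho))  # (lam*prob) * E[S^2|host] / (2(1-rho))
        total += prob * (es + wait)
    return total


# Hypothetical workload: sizes in [1, 10**6], alpha = 1.1, total load 1.0
# (0.5 per host if load were balanced).
k, p, alpha = 1.0, 1e6, 1.1
lam = 1.0 / bp_partial_moment(1, k, p, k, p, alpha)
grid = [k * (p / k) ** (i / 400) for i in range(1, 400)]
best = min(grid, key=lambda c: sita_mean_response(lam, c, k, p, alpha))
host0_load = lam * bp_partial_moment(1, k, best, k, p, alpha)
print(f"best cutoff ~ {best:.3g}, host-0 load = {host0_load:.3f} (0.5 = balanced)")
```

Re-running with α below and above 1 is a quick way to see whether this toy setting agrees with the balance/unbalance pattern on the slide.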